Adding self-adjusting label size to SpectralCluster in scikit-learn.
We are often faced with copious amounts of data and want a way to locate similar groups within it. This is where vector clustering comes into play. Instead of manually selecting each group, we can let an algorithm do the hard work for us.
Which clustering method?
There are many different clustering methods ranging from the overused k-means clustering to density-based non-parametric algorithms such as DBSCAN and OPTICS — just see the list provided by scikit-learn.
Since most real-world data is non-linear and often non-convex, we need to compare several different algorithms and determine which gives the best cluster separation.
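As a rough illustration of that kind of comparison (not the exact benchmark used here), the sketch below runs a few scikit-learn clustering algorithms on a non-convex toy dataset and scores each with the silhouette coefficient. The dataset choice, parameter values, and scoring metric are assumptions for demonstration only.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN, OPTICS
from sklearn.datasets import make_moons
from sklearn.metrics import silhouette_score

# Non-convex toy data (two interleaving half-moons), chosen for illustration
X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)

candidates = {
    "k-means": KMeans(n_clusters=2, n_init=10, random_state=0),
    "DBSCAN": DBSCAN(eps=0.3),
    "OPTICS": OPTICS(min_samples=10),
}

for name, model in candidates.items():
    labels = model.fit_predict(X)
    # silhouette_score needs at least two distinct labels
    if len(np.unique(labels)) > 1:
        print(f"{name}: silhouette = {silhouette_score(X, labels):.3f}")
    else:
        print(f"{name}: produced a single cluster")
```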
It therefore makes sense to select a set of pseudo-self-adjusting algorithms for automated testing. One thing that struck me as odd was the explicit constraint on the number of clusters, set by the n_clusters input parameter of scikit-learn's spectral clustering implementation.
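To make the constraint concrete, here is a minimal sketch of how scikit-learn's SpectralClustering is normally used: the number of clusters has to be fixed up front via n_clusters, so the algorithm cannot adapt the label count to the structure of the data. The dataset and parameter values are again illustrative assumptions.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)

# n_clusters must be chosen explicitly before fitting
model = SpectralClustering(
    n_clusters=2,
    affinity="nearest_neighbors",
    n_neighbors=10,
    random_state=0,
)
labels = model.fit_predict(X)
print(f"Requested 2 clusters, got {len(set(labels))} labels")
```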