
Adding self-adjusting label size to SpectralClustering in scikit-learn

Daniel Ellis Research
5 min read · Oct 24, 2019


We are often faced with copious amounts of data and want a method of locating groups of similar points. This is where vector clustering comes into play: instead of selecting each group by hand, we can let an algorithm do the hard work for us.

Which clustering method?

There are many different clustering methods, ranging from the overused k-means to density-based, non-parametric algorithms such as DBSCAN and OPTICS; the clustering section of the scikit-learn user guide lists many more.
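To make the distinction concrete, here is a minimal sketch using estimators from `sklearn.cluster`: k-means has to be told the number of clusters up front, while DBSCAN and OPTICS work it out from the density of the data. The blob dataset, the parameter values and the `n_found` helper are assumptions chosen for illustration, not values from this article.

```python
import numpy as np
from sklearn.cluster import DBSCAN, OPTICS, KMeans
from sklearn.datasets import make_blobs

# Toy data: three well-separated blobs (an assumption for illustration).
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [5, 5], [-5, 5]],
                  cluster_std=0.5, random_state=0)


def n_found(labels):
    """Hypothetical helper: count clusters, ignoring the -1 'noise' label
    used by DBSCAN and OPTICS."""
    return int(np.sum(np.unique(labels) != -1))


results = {
    # k-means must be told how many clusters to find.
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    # DBSCAN and OPTICS infer the number of clusters from point density;
    # eps and min_samples are illustrative guesses, not tuned values.
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5).fit_predict(X),
    "OPTICS": OPTICS(min_samples=10).fit_predict(X),
}

for name, labels in results.items():
    print(f"{name}: {n_found(labels)} clusters")
```

The density-based results depend on `eps` and `min_samples`, so they trade a "number of clusters" parameter for density parameters rather than removing tuning altogether.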

Since real-world data is often non-linear and its clusters frequently non-convex, we needed to compare a series of different algorithms and determine which gave the best cluster separation.
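One way to run such a comparison, sketched under assumptions: on a toy non-convex dataset with known labels (scikit-learn's `make_moons`), fit several estimators and score each partition with the adjusted Rand index (ARI). The candidate list, the parameters and the choice of metric are ours, not the author's test harness. On real data without ground truth an internal score such as `silhouette_score` would be needed instead, bearing in mind that it is generally higher for convex clusters.

```python
from sklearn.cluster import DBSCAN, KMeans, SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-moons: a small non-convex toy problem (assumed for illustration).
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

candidates = {
    "k-means": KMeans(n_clusters=2, n_init=10, random_state=0),
    "DBSCAN": DBSCAN(eps=0.2, min_samples=5),
    "spectral": SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                                   n_neighbors=10, random_state=0),
}

scores = {}
for name, model in candidates.items():
    labels = model.fit_predict(X)
    # ARI compares against the known labels: 1.0 is a perfect match, ~0 is chance level.
    scores[name] = adjusted_rand_score(y_true, labels)

best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name:>8}: ARI = {score:.2f}")
print("best separation:", best)
```

On data like this, k-means typically cuts each moon in half because it can only draw convex boundaries, whereas the nearest-neighbour spectral model can follow the curved shapes.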

Figure: The sample dataset.

It therefore makes sense to select a set of pseudo-self-adjusting algorithms for automated testing. One thing that struck me as odd was the explicit constraint on the number of clusters in scikit-learn's spectral clustering: it is fixed by the `n_clusters` input parameter rather than determined from the data.
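A common workaround, and not necessarily the approach this article goes on to describe, is the eigengap heuristic: compute the eigenvalues of the normalised graph Laplacian of the affinity matrix and pick the number of clusters at the largest jump between consecutive eigenvalues, then pass that value to `SpectralClustering` as `n_clusters`. The helper names `estimate_n_clusters` and `self_adjusting_spectral`, the `gamma` and `max_k` values, and the toy blobs are hypothetical choices for this sketch.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel


def estimate_n_clusters(affinity, max_k=10):
    """Eigengap heuristic (hypothetical helper): choose k where the gap between
    consecutive eigenvalues of the normalised graph Laplacian is largest."""
    L = laplacian(affinity, normed=True)
    eigvals = np.linalg.eigvalsh(L)[: max_k + 1]  # ascending; keep the smallest max_k + 1
    gaps = np.diff(eigvals)
    return int(np.argmax(gaps)) + 1  # largest gap after the k-th eigenvalue -> k clusters


def self_adjusting_spectral(X, gamma=1.0, max_k=10, random_state=0):
    """Hypothetical wrapper: estimate n_clusters, then hand it to SpectralClustering."""
    A = rbf_kernel(X, gamma=gamma)  # RBF affinity, as SpectralClustering uses by default
    k = estimate_n_clusters(A, max_k=max_k)
    model = SpectralClustering(n_clusters=k, affinity="precomputed",
                               random_state=random_state)
    return k, model.fit_predict(A)


# Toy data: four well-separated blobs; the true count is never passed to the clustering.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [6, 0], [0, 6], [6, 6]],
                  cluster_std=0.6, random_state=0)
k, labels = self_adjusting_spectral(X)
print(f"estimated n_clusters = {k}")
```

The heuristic works best when clusters are well separated relative to the kernel width; with overlapping clusters the eigengap becomes less pronounced and the estimate less reliable.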

Spectral Clustering: how does it work?
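In outline, spectral clustering builds a similarity graph over the points, uses the eigenvectors of its graph Laplacian with the smallest eigenvalues as a new low-dimensional embedding, and runs k-means in that embedding. The sketch below is a textbook dense NumPy version in the normalised style of Ng, Jordan and Weiss; it is not scikit-learn's implementation (which uses sparse eigensolvers selectable via `eigen_solver`), and the function name `spectral_clustering_sketch`, the `gamma` value and the toy data are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score


def spectral_clustering_sketch(X, n_clusters, gamma=1.0, random_state=0):
    """Textbook normalised spectral clustering (hypothetical helper)."""
    # 1. Affinity: Gaussian (RBF) similarity between every pair of points,
    #    with self-similarity removed.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-gamma * sq_dists)
    np.fill_diagonal(W, 0.0)

    # 2. Symmetric normalised Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # 3. Spectral embedding: eigenvectors of the n_clusters smallest eigenvalues
    #    (eigh returns them in ascending order), each row scaled to unit length.
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :n_clusters]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)

    # 4. Ordinary k-means on the embedded rows gives the final labels.
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=random_state).fit_predict(U)


# Toy data: three well-separated blobs (an assumption for illustration).
X_demo, y_demo = make_blobs(n_samples=200, centers=[[0, 0], [5, 5], [-5, 5]],
                            cluster_std=0.5, random_state=0)
labels_demo = spectral_clustering_sketch(X_demo, n_clusters=3)
ari = adjusted_rand_score(y_demo, labels_demo)
print(f"ARI against the true blobs: {ari:.2f}")
```

Note that step 4 still needs `n_clusters`: the eigenvectors only supply a better space to cluster in, which is why the number of clusters remains an explicit input unless it is estimated separately. The dense affinity matrix and full eigendecomposition also make this sketch suitable only for small datasets.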


Research Software Engineer specialising in High-Performance Computing and Data Visualisation. PhD in Atmospheric Chemistry and Master's in Theoretical Physics.
