
Cross validation and clustering

Hi!
For our project in an EPFL lab, we were assigned the task of optimizing the set of genes that can predict the family of cells.
The method of prediction was already given to us by the host lab: we use a hierarchical clustering algorithm. Up to now, to evaluate the prediction error, we have used cross-validation. However, we were wondering whether it is relevant to cross-validate our results in this case. From what we understand from the course, cross-validation is a good way to evaluate the error of an already-fitted model on new data. But in our case no model is fitted: we compute a distance matrix using Spearman correlation, and from that the algorithm determines which cells should be in the same cluster.
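To make the setup concrete, here is a minimal sketch of the kind of pipeline we mean (toy data and illustrative names only, not the host lab's actual code):

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
expression = rng.poisson(5, size=(50, 200)).astype(float)  # toy cells x genes matrix

# Spearman correlation between cells, turned into a distance.
corr, _ = spearmanr(expression, axis=1)   # 50 x 50 cell-cell correlations
dist = 1.0 - corr
np.fill_diagonal(dist, 0.0)

# Hierarchical clustering on the condensed distance matrix.
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=4, criterion="maxclust")  # cut the tree into 4 clusters
```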
So, is it actually useful to cross-validate our error?

Thank you in advance for your help

It is generally useful to do cross-validation; however, it is less straightforward in clustering than in supervised ML, as the loss might not be well defined.

If you can identify the loss that your algorithm is minimizing, then you can use it as a measure of performance and study how different hyper-parameters affect your model. Another interesting thing to study is how stable your cluster assignment is with respect to your training data.
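To illustrate the stability idea, here is one possible sketch (assuming a cells x genes matrix and a Spearman-based hierarchical clustering like the one you describe; toy data and names are illustrative): recluster random subsamples of cells and compare the labels on the shared cells with the adjusted Rand index.

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.metrics import adjusted_rand_score

def cluster_cells(X, n_clusters=4):
    """Hierarchical clustering of cells with a Spearman-based distance."""
    corr, _ = spearmanr(X, axis=1)
    dist = 1.0 - corr
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

rng = np.random.default_rng(0)
expression = rng.poisson(5, size=(60, 200)).astype(float)  # toy cells x genes matrix
full_labels = cluster_cells(expression)

scores = []
for _ in range(20):
    # Draw a random 80% subsample of cells and recluster it.
    idx = rng.choice(len(expression), size=int(0.8 * len(expression)), replace=False)
    sub_labels = cluster_cells(expression[idx])
    # Compare the subsample labels to the full-data labels on the same cells.
    scores.append(adjusted_rand_score(full_labels[idx], sub_labels))

print("mean ARI over subsamples:", np.mean(scores))
```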

There are other methods for doing cross-validation for clustering, like Gabriel's cross-validation; for these you need to read some articles and choose the method that you think works best for your particular case. A simple Google search for "cross validation for clustering" will set you on the right path.
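For example, a simple hold-out scheme (not Gabriel's CV itself, just a sketch in the same spirit, again with toy data and illustrative names) is to cluster the training cells, assign each held-out cell to the cluster it correlates with most, and check how consistent those assignments are with the full-data clustering:

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import KFold

def cluster_cells(X, n_clusters=4):
    """Hierarchical clustering of cells with a Spearman-based distance."""
    corr, _ = spearmanr(X, axis=1)
    dist = 1.0 - corr
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

rng = np.random.default_rng(0)
expression = rng.poisson(5, size=(60, 200)).astype(float)  # toy cells x genes matrix
full_labels = cluster_cells(expression)

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(expression):
    train_labels = cluster_cells(expression[train_idx])
    # Spearman correlations between held-out cells and training cells
    # (the joint correlation matrix has test rows first, then train rows).
    corr, _ = spearmanr(expression[test_idx], expression[train_idx], axis=1)
    cross = corr[: len(test_idx), len(test_idx):]  # test x train block
    # Assign each held-out cell to the training cluster it correlates with most.
    test_labels = np.array([
        max(set(train_labels), key=lambda c: row[train_labels == c].mean())
        for row in cross
    ])
    scores.append(adjusted_rand_score(full_labels[test_idx], test_labels))

print("mean ARI of held-out assignments vs full clustering:", np.mean(scores))
```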

Thank you so much for your quick answer! We will look into it and get back to you if we have any further questions.
