Q11 exam 2016

Can you explain why c) and d) are correct in this question? I don't understand what "optimal representation points" or "optimal clusters" are.
Thanks a lot!
Screenshot 2022-01-13 at 10.02.21.jpg

Top comment

A K-means solution consists of two things:

  • k cluster centers (representation points), and
  • assignments of the original data to those clusters (optimal clusters).

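A solution is optimal if it minimizes the cost function. For standard K-means, that is the sum of squared distances between each data point and the center of the cluster it is assigned to.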

We usually do clustering by alternating between optimizing the cluster centers for a fixed cluster assignment, and optimizing the assignment for fixed cluster centers. The algorithm finishes as soon as either of those doesn't change anymore. The two cases (c) and (d) correspond to the last steps of this algorithm.
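To make the alternation described above concrete, here is a minimal sketch in plain Python. It assumes the standard squared-distance cost; the function names (`assign`, `update_centers`, `cost`, `kmeans`) and the toy 1-D data are made up for illustration and do not come from the course material.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Assignment step: for fixed centers, pick the nearest center per point."""
    return [
        min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
        for p in points
    ]


def update_centers(points, labels, centers):
    """Center step: for fixed assignments, move each center to its cluster mean.

    An empty cluster keeps its old center (one possible convention).
    """
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, z in zip(points, labels) if z == k]
        if members:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers


def cost(points, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    return sum(sq_dist(p, centers[z]) for p, z in zip(points, labels))


def kmeans(points, centers, max_iter=100):
    """Alternate the two steps until the assignments stop changing."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centers)
        if new_labels == labels:  # nothing changed, so the centers won't either
            break
        labels = new_labels
        centers = update_centers(points, labels, centers)
    return centers, labels


# Toy data: two obvious groups on a line, started from poor centers.
points = [(0.0,), (1.0,), (9.0,), (10.0,)]
centers, labels = kmeans(points, centers=[(0.0,), (1.0,)])
print(centers, labels, cost(points, labels, centers))
```

On this toy data the loop settles on the centers 0.5 and 9.5 with the first two points in one cluster and the last two in the other. At that point both steps leave the solution unchanged, which is the situation (c) and (d) describe.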

Thank you so much for your answer.
But in the lesson it is said that you initialize K-means with the cluster centers, not with the assignment. That is why I thought that (c) was false :/

I agree with the question asker!

Our k-means clustering algorithm does:
1) compute the assignment z_nk using centers
2) compute the new centers using new assignments z_nk
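Written out, the two steps above would look roughly as follows. This assumes the usual squared-distance cost; x_n (data points) and \mu_k (centers) are my notation, only z_nk is taken from the steps above.

```latex
% Cost being minimized (assumed: standard K-means objective)
J(z, \mu) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \, \lVert x_n - \mu_k \rVert^2

% Step 1: assignments for fixed centers
z_{nk} =
\begin{cases}
  1 & \text{if } k = \arg\min_{j} \lVert x_n - \mu_j \rVert^2 \\
  0 & \text{otherwise}
\end{cases}

% Step 2: centers for fixed assignments
\mu_k = \frac{\sum_{n=1}^{N} z_{nk} \, x_n}{\sum_{n=1}^{N} z_{nk}}
```

Each step minimizes J over one set of variables while the other is held fixed, so neither step can increase the cost.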

For c), if we initialize with optimal clusters but have horrible centers, step 1 will recompute the clusters using the bad centers, so we will not keep the optimal clusters.

The question assumes that for c), after initializing with the assignments, you proceed by re-computing the centers from them, rather than throwing the assignments away and recomputing them from some other centers. Does that make sense?
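Here is a small sketch of the difference between the two readings, in plain Python. The data, the "optimal" assignment and the bad centers are toy values I made up, and the helper names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Assignment step: index of the nearest center for each point."""
    return [
        min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
        for p in points
    ]


def centers_from(points, labels, k):
    """Center step: mean of each cluster (assumes no cluster is empty)."""
    centers = []
    for j in range(k):
        members = [p for p, z in zip(points, labels) if z == j]
        centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return centers


# Toy data with an obvious best 2-clustering.
points = [(0.0,), (1.0,), (9.0,), (10.0,)]
optimal_labels = [0, 0, 1, 1]

# Reading assumed by the question: start from the assignments and
# run the center step first, then the assignment step.
centers = centers_from(points, optimal_labels, k=2)
labels_after = assign(points, centers)

# Other reading: ignore the assignments and run the assignment step
# first with arbitrary (bad) centers.
bad_centers = [(100.0,), (200.0,)]
labels_lost = assign(points, bad_centers)

print(labels_after == optimal_labels)  # assignments survive
print(labels_lost == optimal_labels)   # assignments are overwritten
```

With the center step first, the recomputed centers reproduce the same assignment, so the algorithm stops right away. With the assignment step first, the optimal clusters are overwritten by whatever the bad centers imply, which is the scenario the objection above describes.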
