EM, GMM, and K-means - Covariance matrix choice

Hi!

I have a question regarding lecture 11 material (GMM).
From the lecture and the formulae, it is clear that GMMs are a generalization of the K-means algorithm.
However, in part 1 of the lecture, we say the K-means algorithm is obtained by setting the covariance matrix (Sigma) to the identity matrix in the GMM algorithm.
In contrast, in part 2, we demonstrate that the EM algorithm is equivalent to the K-means algorithm by setting the covariance matrix to Sigma = sigma^2 * I and letting sigma go to 0 (where sigma is a scalar).

I followed both demonstrations, but they seem contradictory to me; I have probably mixed something up.
Hence, I am wondering which covariance matrix (Sigma) we should use to obtain the K-means algorithm from the more general GMM algorithm.

Regards,
Niko.

Part 2 is the more accurate statement.
A GMM is an optimization problem, not an algorithm: it is a likelihood maximization, of either the original likelihood or the marginalized one.
EM is one method for solving this problem, and EM with the covariance matrix set to Sigma = sigma^2 * I, letting sigma -> 0, becomes identical to the K-means algorithm, as sketched below.
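To make the limit concrete, here is a minimal sketch in my own notation (assuming equal mixing weights pi_k = 1/K, which is an assumption on my part, not something the lecture necessarily fixed) of what happens to the E-step responsibilities when Sigma_k = sigma^2 * I:

```latex
r_{nk}
= \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \sigma^2 I)}
       {\sum_j \pi_j \, \mathcal{N}(x_n \mid \mu_j, \sigma^2 I)}
= \frac{\exp\!\big(-\|x_n - \mu_k\|^2 / 2\sigma^2\big)}
       {\sum_j \exp\!\big(-\|x_n - \mu_j\|^2 / 2\sigma^2\big)}
\;\xrightarrow{\;\sigma \to 0\;}\;
\begin{cases}
1 & \text{if } k = \arg\min_j \|x_n - \mu_j\|^2 \\
0 & \text{otherwise}
\end{cases}
```

That is, the soft responsibilities harden into K-means' nearest-mean assignments.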

In the lecture we have not shown whether you can get the K-means optimization problem directly from the GMM optimization problem; it would depend on how you deal with the z variables, and it is not obvious.

Thanks for the prompt reply, and sorry for the terminology mix-up between algorithm and optimization problem.

I am still confused.
Let's assume we use the EM method to solve the GMM optimization problem, and we let the covariance matrix of cluster k be Sigma_k = sigma^2 * I.
We have shown that the EM algorithm becomes identical to the K-means algorithm when sigma -> 0 (a small numerical check of this limit is sketched below).
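As a sanity check on that limit, here is a minimal numerical sketch (not lecture code; the helper name responsibilities and the toy data are my own, and it assumes equal mixing weights):

```python
import numpy as np

def responsibilities(x, mus, sigma):
    """E-step responsibilities for a GMM with shared covariance sigma^2 * I
    and equal mixing weights (hypothetical helper, not from the lecture)."""
    # Squared distances from x to each cluster mean.
    d2 = np.sum((mus - x) ** 2, axis=1)
    # Softmax of -d2 / (2 sigma^2); subtract the max for numerical stability.
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max()
    w = np.exp(logits)
    return w / w.sum()

mus = np.array([[0.0, 0.0], [3.0, 0.0]])  # two cluster means
x = np.array([1.0, 0.0])                  # closer to the first mean

for sigma in [1.0, 0.5, 0.1, 0.01]:
    print(sigma, responsibilities(x, mus, sigma))
# As sigma -> 0, the responsibilities approach (1, 0): a hard,
# K-means-style assignment to the nearest mean.
```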

In contrast, if we let the covariance matrices of all clusters be the identity matrix, we said that the GMM optimization problem is equivalent to maximizing the likelihood of the probabilistic model for K-means. We showed in part 1 that minimizing the objective function of K-means is equivalent to minimizing the negative log-likelihood of that probabilistic model, and this made sense to me: in the probabilistic model for K-means, we assumed every point is generated spherically around its center, i.e., the likelihood of x given the parameters mu and z is a Gaussian with mean mu_z and the identity as its covariance matrix.
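For reference, here is the part-1 correspondence as I understand it, in my own notation (assuming the model p(x_n | z_n = k, mu) = N(x_n | mu_k, I) in D dimensions):

```latex
-\log \prod_{n=1}^{N} \mathcal{N}\!\left(x_n \mid \mu_{z_n}, I\right)
= \frac{1}{2} \sum_{n=1}^{N} \|x_n - \mu_{z_n}\|^2 + \frac{ND}{2}\log(2\pi),
```

so minimizing the negative log-likelihood jointly over z and mu is the same as minimizing the K-means objective, the second term being constant.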

Therefore, we can say we set Sigma_k = sigma^2 * I in both cases: in one case we let sigma -> 0, while in the other we set sigma = 1.
This seems contradictory to me, and I am unsure where I am mistaken.

Many thanks.
