"Identifiable model" ?

Good evening,

While doing exercise set 6, I did not understand what you meant in the multi-class classification exercise (2) when you said that we must set
$$w_k = 0$$
so that the model is identifiable. What do you mean by identifiable, and how can we see that having a different k-th weight vector would make the model "unidentifiable"?

P.S.: Thanks a lot for your answers. Your replies are quick and clear, and they really help me keep up with the material in spite of the special working setup.

Also, would it be possible to access the solutions of hw6, since it's a theory homework? Thanks

Solutions to all homeworks 1-6 will be uploaded as soon as possible in the next few days.

Good evening,

Thanks for your question! Sorry that it is not clear in the exercise sheet.

The model is identifiable if the MLE always recovers the true parameters asymptotically, as the number of samples goes to infinity (see also exercise 3, 5b for the definition).

In this exercise, if we have K learnable parameter vectors w_1 ... w_K, then the model is not identifiable. This is because adding the same constant vector v to each of your MLE solution vectors w_1 ... w_K gives exactly the same softmax outputs, and therefore the same log-likelihood, so the shifted vectors are also an MLE solution. (exp(w_i x + v x) = exp(w_i x) exp(v x), and the factor exp(v x) cancels in the softmax.) So the MLE has no way to find the right shift v.
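
To spell out the cancellation for class i (writing the products w_i x as inner products, which is an assumption about the notation of the exercise sheet):

$$\frac{\exp((w_i + v)^\top x)}{\sum_{j=1}^{K} \exp((w_j + v)^\top x)} = \frac{\exp(v^\top x)\,\exp(w_i^\top x)}{\exp(v^\top x)\,\sum_{j=1}^{K} \exp(w_j^\top x)} = \frac{\exp(w_i^\top x)}{\sum_{j=1}^{K} \exp(w_j^\top x)}$$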

Fixing this shift to v = -w_K makes the model identifiable, and it is exactly equivalent to setting w_K = 0.
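
If you want to convince yourself numerically, here is a minimal sketch in NumPy. It is not part of the exercise: the names `softmax`, `W`, `x`, `K` and `d` and the random data are assumptions made up for illustration. It shifts all weight vectors by v = -w_K and checks that the predicted probabilities do not change while the last weight vector becomes zero.

```python
import numpy as np

def softmax(z):
    # Subtracting the max is itself a shift that leaves the output unchanged
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
K, d = 4, 3                      # hypothetical number of classes and features
W = rng.normal(size=(K, d))      # rows play the role of w_1 ... w_K
x = rng.normal(size=d)           # one input point

v = -W[-1]                       # the shift v = -w_K
W_shifted = W + v                # broadcasting adds v to every row

p_original = softmax(W @ x)
p_shifted = softmax(W_shifted @ x)

print(np.allclose(p_original, p_shifted))  # same class probabilities
print(np.allclose(W_shifted[-1], 0.0))     # last weight vector is now zero
```

Both checks should print True, so the two parameter sets cannot be told apart from the data alone.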

An informal intuition here could be that we have K - 1 degrees of freedom, not K (since the sum of all probabilities has to be equal to 1).

Oh right, I understand now, thanks a lot for your help :)
