Exercise 6 - Problem 3

Hi,
I have three questions in Problem 3 from Problem Set 6:

  • [3.3] I did not understand why we suddenly assumed independence between \(r_n\) and each of \(x_n\), \(y_n\), and \(w\), and thus moved from \(p(r_n|x_n,y_n,w,\pi)\) to \(p(r_n|\pi)\). Also, why did we move from \(p(y_n|x_n, r_n, w, \pi)\) to \(p(y_n|x_n, r_n, w)\)?

  • [3.5.a] Is there a way to tell at first glance that this model is not convex, so that I can then look for a counterexample as we did in the solution?

  • [3.5.b] What does it mean for the model to be identifiable?

  • [3.3] \(r_n\) is not independent of \((x_n, y_n)\), but knowing the parameter \(\pi\) fully determines the distribution of \(r_n\). In the solution we used the law of total probability and then the definition of conditional probability.

  • [3.5.a] The main point is that if you permute the clusters, the final loss does not change. You can use this fact to prove that the model is not convex.

  • [3.5.b] An identifiable model is one in which each model corresponds to one and only one choice of the parameters (the map \(w \mapsto f(x; w)\) is injective).
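
To make the [3.3] step concrete, here is a sketch of the derivation. It assumes the usual mixture setup, which is not spelled out in this thread: \(r_n\) takes values \(k = 1, \dots, K\) with \(p(r_n = k|\pi) = \pi_k\), it is drawn without looking at \(x_n\) or \(w\), and once \(r_n\) is known, \(y_n\) no longer depends on \(\pi\). The index \(K\) and the notation \(\pi_k\) are assumptions introduced here.

\[
\begin{aligned}
p(y_n|x_n, w, \pi)
&= \sum_{k=1}^{K} p(y_n, r_n = k|x_n, w, \pi) && \text{(law of total probability)} \\
&= \sum_{k=1}^{K} p(y_n|x_n, r_n = k, w, \pi)\, p(r_n = k|x_n, w, \pi) && \text{(definition of conditional probability)} \\
&= \sum_{k=1}^{K} p(y_n|x_n, r_n = k, w)\, \pi_k && \text{(model assumptions above)}
\end{aligned}
\]

Under these assumptions, the term for \(r_n\) in the second line is conditioned on \(x_n\), \(w\), and \(\pi\) but not on \(y_n\), so no independence between \(r_n\) and \(y_n\) is needed.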

Thank you very much.
For the second point, why does permuting the clusters mean that the model is not convex?

In general, there is no reason why permutation invariance and convexity can't coexist. Take for example
\(f(x) = \sum_i g(x_i)\), with \(g\) convex.

What I meant is to use this fact to come up with a counterexample. In the solution we considered the case of two clusters, and the counterexample is built by picking two extreme cases, where we put all the data in one of the two clusters, and then looking at what happens in between.
I hope this is clear.
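
As a numerical illustration of the same idea, here is a minimal sketch. It is not the counterexample from the solution. It uses a stand-in model: a two-component 1-D Gaussian mixture with unit variance and equal weights, where the loss is the negative log-likelihood as a function of the two means. The function `nll` and the data values are made up for illustration. Swapping the two means gives the same loss, yet the midpoint of the two parameter vectors has a higher loss, which a convex function would not allow.

```python
import math

def nll(mu1, mu2, data):
    """Negative log-likelihood of an equal-weight, unit-variance
    two-component Gaussian mixture (hypothetical stand-in model)."""
    norm = 1.0 / math.sqrt(2.0 * math.pi)
    total = 0.0
    for x in data:
        p1 = norm * math.exp(-0.5 * (x - mu1) ** 2)
        p2 = norm * math.exp(-0.5 * (x - mu2) ** 2)
        total -= math.log(0.5 * p1 + 0.5 * p2)
    return total

# Made-up data with two well-separated groups.
data = [-2.1, -1.9, 1.9, 2.1]

# Two parameter settings that differ only by permuting the clusters.
loss_a = nll(-2.0, 2.0, data)
loss_b = nll(2.0, -2.0, data)

# Midpoint of the two parameter vectors: both means collapse to 0.
loss_mid = nll(0.0, 0.0, data)

print("loss at (-2, 2):", loss_a)
print("loss at (2, -2):", loss_b)
print("loss at midpoint (0, 0):", loss_mid)

# Convexity would require loss_mid <= 0.5 * loss_a + 0.5 * loss_b.
print("convexity violated:", loss_mid > 0.5 * loss_a + 0.5 * loss_b)
```

Permutation invariance alone does not give non-convexity. What matters is that the two permuted solutions are distinct good fits while their average is a poor one.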
