
Explanation of Bayes' Formula

Hello! In the 2018 midterm, question 2, we are asked to write a probabilistic model such that its solution coincides with the MAP estimate. In the solutions, we deduce from Bayes' formula that

\( \mathbf{w}_{\mathrm{MAP}}^{\star}=\arg \max _{\mathbf{w}} p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}) p(\mathbf{w}) \)

which, I guess, results from this:
\( p(\mathbf{w}|\mathbf{X},\mathbf{y}) \propto p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}) p(\mathbf{w}) \)

However, when we apply Bayes' formula to the LHS, we don't arrive at the same RHS; in particular, we get:
\( p(\mathbf{w}|\mathbf{X},\mathbf{y}) = \frac{ p(\mathbf{y}, \mathbf{X} \mid \mathbf{w})p(\mathbf{w})}{p(\mathbf{y},\mathbf{X})} \)

As you can see, the numerator is not the same. Can someone explain the steps more precisely?

Let's start by writing out Bayes' Theorem:

$$\begin{align} p(\mathbf{w}|\mathbf{X},\mathbf{y}) &= \frac{ p(\mathbf{y}, \mathbf{X} \mid \mathbf{w})p(\mathbf{w})}{p(\mathbf{y},\mathbf{X})}\\ &= \frac{ p(\mathbf{y}\mid \mathbf{X}, \mathbf{w})p(\mathbf{X} \mid \mathbf{w})p(\mathbf{w})}{p(\mathbf{y},\mathbf{X})}\\ &= \frac{ p(\mathbf{y}\mid \mathbf{X}, \mathbf{w})p(\mathbf{X})p(\mathbf{w})}{p(\mathbf{y},\mathbf{X})} \end{align}$$

In the last step, we use the fact that the distribution of the input features \(\mathbf{X}\) does not depend on the model parameters \(\mathbf{w}\), i.e., \(p(\mathbf{X} \mid \mathbf{w}) = p(\mathbf{X})\).

Now, computing the MAP estimate, we get:

$$\begin{align} \mathbf{w}_{\mathrm{MAP}}^{\star} &= \arg \max _{\mathbf{w}} p(\mathbf{w}|\mathbf{X},\mathbf{y})\\ &= \arg \max _{\mathbf{w}} \frac{ p(\mathbf{y}\mid \mathbf{X}, \mathbf{w})p(\mathbf{X})p(\mathbf{w})}{p(\mathbf{y},\mathbf{X})}\\ &= \arg \max _{\mathbf{w}} p(\mathbf{y}\mid \mathbf{X}, \mathbf{w})p(\mathbf{w}) \end{align}$$

In the last step, we drop all factors that do not depend on the model parameters \(\mathbf{w}\), since they do not affect the arg max.
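
To see the last step concretely: the denominator \(p(\mathbf{y}, \mathbf{X})\) is a constant with respect to \(\mathbf{w}\), so dropping it shifts the (log-)posterior but cannot move its maximizer. Below is a minimal numerical sketch of this, using a toy 1-D Gaussian model with made-up values for the noise scale \(\sigma\) and prior scale \(\tau\) (not from the course material):

```python
import numpy as np

# Toy 1-D model: y ~ N(x * w, sigma^2), prior w ~ N(0, tau^2).
# sigma, tau and the data below are assumed values for illustration only.
rng = np.random.default_rng(0)
sigma, tau = 0.5, 1.0
x = rng.normal(size=20)
y = 2.0 * x + sigma * rng.normal(size=20)  # data generated with true w = 2

w_grid = np.linspace(-5, 5, 10001)
log_lik = np.array([-np.sum((y - w * x) ** 2) / (2 * sigma**2) for w in w_grid])
log_prior = -w_grid**2 / (2 * tau**2)
log_unnorm = log_lik + log_prior  # log[ p(y | X, w) p(w) ], up to constants

# The evidence p(y | X) does not depend on w, so subtracting its log shifts
# the curve vertically but leaves the arg max unchanged.
log_evidence = log_unnorm.max() + np.log(np.trapz(np.exp(log_unnorm - log_unnorm.max()), w_grid))
log_posterior = log_unnorm - log_evidence

assert np.argmax(log_unnorm) == np.argmax(log_posterior)
print("w_MAP ≈", w_grid[np.argmax(log_unnorm)])  # close to the true w = 2
```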

It's perfectly clear, thank you!

Will Bayes' Theorem and Bayes Nets be on this exam?

@Anonymous said:
Will Bayes' Theorem and Bayes Nets be on this exam?

Yes, Bayes' Theorem is certainly material to study for the exam. In relation to this topic, it is used, for example, in the interpretation of Ridge Regression as a MAP estimator in Lecture 3: https://github.com/epfml/ML_course/blob/master/lectures/03/lecture03d_ridge.pdf
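
As a concrete (unofficial) sketch of that interpretation: assuming a Gaussian likelihood \(\mathbf{y} \sim \mathcal{N}(\mathbf{X}\mathbf{w}, \sigma^2 I)\) and a Gaussian prior \(\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \tau^2 I)\) (the slides may use different notation), the MAP estimate coincides with the ridge solution with \(\lambda = \sigma^2 / \tau^2\). The values of \(\sigma\), \(\tau\) and the data below are made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, d = 50, 3
sigma, tau = 0.3, 1.0  # assumed noise and prior scales
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + sigma * rng.normal(size=n)

# Closed-form ridge solution: (X^T X + lambda I)^{-1} X^T y, lambda = sigma^2 / tau^2.
lam = sigma**2 / tau**2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Negative log posterior (up to constants): the ridge objective in disguise.
def neg_log_posterior(w):
    return np.sum((y - X @ w) ** 2) / (2 * sigma**2) + np.sum(w**2) / (2 * tau**2)

w_map = minimize(neg_log_posterior, np.zeros(d)).x
print(np.allclose(w_ridge, w_map, atol=1e-4))  # True: same maximizer
```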

For Bayesian Networks, the answer will be given in the (existing) topic here: http://oknoname.herokuapp.com/forum/topic/485/exam-bayes-net-2019/

The interpretation of Ridge Regression as a MAP estimator is in the lecture notes but was not covered during the lecture.
Should we study everything in the additional notes? Or can we assume that it is extra material?

I have the same question as the last commenter ^ isn't it considered extra material? But I am also confused about when we should use MAP and when we should use MLE. For example, in the probabilistic approach to least squares, we don't consider the prior when calculating the likelihood.
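
(On the MAP vs. MLE part of this question, a brief note in the notation of this thread: the only difference is whether the prior term is included,

$$\mathbf{w}_{\mathrm{MLE}}^{\star}=\arg \max _{\mathbf{w}} p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}), \qquad \mathbf{w}_{\mathrm{MAP}}^{\star}=\arg \max _{\mathbf{w}} p(\mathbf{y} \mid \mathbf{X}, \mathbf{w})\, p(\mathbf{w}).$$

With Gaussian noise, the MLE gives ordinary least squares; adding a Gaussian prior on \(\mathbf{w}\) turns the same problem into Ridge Regression, which is the MAP interpretation referenced above.)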

As usual, exam material is only what was covered in the (video) lectures and lab sessions.

But, also as usual, some of the additional bits might help you better understand the lecture materials.
