
Difference between MAP and MLE

Hello,
I don't really understand the difference between MAP and MLE. Can someone explain it?

Top comment

In a regression task the parameter is \(w\) and the observed data are the pairs \((x,y)\), but since \(p(y,x\mid w)= p(y\mid x, w)\,p(x\mid w) = p(y\mid x, w)\,p(x)\) (because \(x\) does not depend on \(w\)), we only care about \(p(y\mid x, w)\).
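Spelling out that step: since the \(p(x)\) factor does not involve \(w\), it drops out of the maximization,
\[
\arg\max_w p(y,x\mid w) = \arg\max_w p(y\mid x, w)\, p(x) = \arg\max_w p(y\mid x, w).
\]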

I will try to help you get a general understanding of MAP vs MLE. In MLE you let the data, and only the data, decide which weights best explain it. In MAP, however, you give some weights a head start based on prior knowledge that you have about them. This is the only difference.
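A tiny toy sketch of the "head start" idea (not from the lecture; the data, the Beta prior and its parameters are all made up for illustration): estimating a coin's bias from a few flips, MLE is the raw frequency of heads, while MAP is pulled toward what the prior believes.

```python
import numpy as np

rng = np.random.default_rng(0)
flips = rng.random(10) < 0.7            # 10 flips of a coin whose (unknown) bias is 0.7
k, n = int(flips.sum()), flips.size

# MLE: only the data speaks -> observed frequency of heads
theta_mle = k / n

# MAP with a Beta(a, b) prior expressing a belief that the coin is roughly fair.
# The posterior is Beta(k + a, n - k + b); its mode is (k + a - 1) / (n + a + b - 2).
a, b = 5.0, 5.0
theta_map = (k + a - 1) / (n + a + b - 2)

print(f"MLE = {theta_mle:.2f}, MAP = {theta_map:.2f}")  # MAP is shrunk toward 0.5 by the prior
```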

Note: regularization can be understood as adding prior knowledge. For example, ridge regression corresponds to a MAP estimate (you assume a priori that your weights come from a normal distribution with mean zero and variance \(1/\lambda\)).
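To spell out the ridge example (a standard derivation; I take the Gaussian noise variance to be 1 just to keep the constants simple): assume \(y_i \mid x_i, w \sim \mathcal{N}(w^\top x_i, 1)\) and the prior \(w \sim \mathcal{N}(0, \lambda^{-1} I)\). Taking the negative log of the posterior,
\[
w_{\text{MAP}} = \arg\max_w \Big[\prod_i p(y_i\mid x_i, w)\Big]\, p(w)
= \arg\min_w \sum_i \tfrac{1}{2}\,(y_i - w^\top x_i)^2 + \tfrac{\lambda}{2}\,\|w\|^2,
\]
which is exactly the ridge-regression objective.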

I am also interested in this question. Could you please explain it for regression / supervised classification / unsupervised learning?
I guess we always assume that "the data" is generated following a certain model M.
Do we assume that Y is generated by M for a regression task and that X is generated by M for a classification task?

Basics: MAP = maximum a posteriori and MLE = maximum likelihood estimation. Both are statistical tools for estimating the parameters of a model from data (assumed to obey said model).

Now assume you have data \(X\) and a model defined by \(p(X\mid \theta)\), also known as the likelihood. The MLE estimator is \(\theta_{\text{MLE}}(X)= \arg\max_\theta p(X\mid \theta)\).
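A concrete example (mine, just for illustration): take \(X = (x_1,\dots,x_n)\) i.i.d. \(\mathcal{N}(\mu, 1)\) with \(\theta = \mu\). Then
\[
\theta_{\text{MLE}} = \arg\max_\mu \prod_{i=1}^n \frac{1}{\sqrt{2\pi}}\, e^{-(x_i-\mu)^2/2}
= \arg\min_\mu \sum_{i=1}^n (x_i-\mu)^2 = \frac{1}{n}\sum_{i=1}^n x_i,
\]
i.e. the sample mean.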

The MAP estimator is \(\theta_{\text{MAP}}(X)= \arg\max_\theta p(\theta\mid X) = \arg\max_\theta p(X\mid \theta)\,p(\theta)\), with \(p(\theta)\) being the prior distribution on \(\theta\), i.e. what we know about \(\theta\) before observing the "current" data \(X\).
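Continuing the same toy example (the prior \(\mathcal{N}(0,\tau^2)\) is my choice, purely for illustration): with \(x_i \sim \mathcal{N}(\mu,1)\) and the prior \(\mu \sim \mathcal{N}(0, \tau^2)\), the posterior is Gaussian again and its mode gives
\[
\theta_{\text{MAP}} = \arg\max_\mu \Big[\prod_{i=1}^n p(x_i\mid \mu)\Big]\, p(\mu)
= \frac{\sum_{i=1}^n x_i}{\,n + 1/\tau^2\,},
\]
i.e. the sample mean shrunk toward the prior mean 0; as \(\tau^2 \to \infty\) (a flatter and flatter prior) it recovers the MLE.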

Sometimes \(p(\theta)\) need not be a probability measure (i.e. integrate to 1); this is referred to as an improper prior.

So, for example, when you take \(p(\theta)=1\) you get back the MLE estimator (this is sometimes referred to as the principle of indifference in Bayesian statistics).

One final point: MAP only has meaning in Bayesian statistics.

So, in a regression task, the parameters are \(X^\top W\) and the observations are \(X\).
(Lecture 3 says \(p(y\mid x,w)\) is the MLE and \(p(w\mid x,y) = p(y\mid x,w)\,p(w)\) the MAP.)
In classification the parameters are \(Y\) and the observations are \(X\),
so \(p(y\mid x)\) is the MAP and \(p(x\mid y)\) is the MLE.

What about unsupervised learning?
