Least squares

Can you please provide some explanation of how these results were found (the first and third ones)?

Thank you!

The first one comes from the maximum likelihood estimation (MLE) method, i.e. the probabilistic approach. Since p(w) does not depend on w, minimizing -log( p(y|X,w) p(w) ) is the same as minimizing -log p(y|X,w), which in turn is the same as maximizing the log-likelihood.
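Written out step by step, the argument looks roughly like this. The result being asked about is not reproduced in this thread, so the notation is only a sketch. The last line additionally assumes the usual Gaussian noise model y = Xw + ε with ε ~ N(0, σ²I), which is an assumption on my part and is what would link this to least squares:

```latex
\begin{align*}
\hat{w}
  &= \arg\min_w \; -\log\bigl[\, p(y \mid X, w)\, p(w) \,\bigr] \\
  &= \arg\min_w \; \bigl[ -\log p(y \mid X, w) - \log p(w) \bigr] \\
  &= \arg\min_w \; -\log p(y \mid X, w)
     && \text{($\log p(w)$ is constant in $w$)} \\
  &= \arg\max_w \; \log p(y \mid X, w) \\
  &= \arg\min_w \; \tfrac{1}{2\sigma^2}\, \lVert y - Xw \rVert^2
     && \text{(assuming Gaussian noise)}
\end{align*}
```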

The third one is the dual representation of ridge regression. If you take the gradient of the objective with respect to w and set it to zero, you will find that w can be written as a linear combination of the data points x. You can find the derivation in the lecture notes called "kernelized ridge regression".
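If it helps, here is a small numerical check of that claim. It is only a sketch: it assumes the ridge objective is ||y - Xw||² + λ||w||², and the names `lam`, `alpha`, `w_primal` and `w_dual` are made up for illustration, not taken from the lecture notes. Under that assumption, setting the gradient to zero gives w = Xᵀα with α = (XXᵀ + λI)⁻¹y.

```python
import numpy as np

# Random toy data (illustrative only): n data points, d features.
rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
lam = 0.1  # assumed regularization strength

# Primal ridge solution: w = (X^T X + lam * I_d)^{-1} X^T y
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Dual solution: alpha = (X X^T + lam * I_n)^{-1} y, then w = X^T alpha,
# i.e. w is a linear combination of the rows of X (the data points).
alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
w_dual = X.T @ alpha

# Compare the two weight vectors.
print(np.allclose(w_primal, w_dual))
```

The two weight vectors should agree up to floating-point error. The dual form only touches the data through XXᵀ, the matrix of inner products between data points, which is what makes the kernelized version possible.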
