
L1 regularization for feature selection vs. L2

Hi,
Why is it a "better" / "good" idea to use L1 regularization as a feature selector rather than L2? This was discussed during today's Q&A, but I'm not sure I understand why.

Thanks in advance for your answer.

In the picture below:

  • [right] The L1 penalty's contours (a diamond) have sharp corners that sit on the axes, which makes it likely that the optimum of the regularized loss lies on an axis. Lying on an axis means a sparse weight vector (some weights are exactly 0), which is what gives the feature-selection effect (a minimal Lasso vs. Ridge sketch after the links below illustrates this).
  • [left] L2 regularization only puts the weight vector on an axis when the (non-regularized) optimum is on one of the axes itself, which rarely happens compared to all the other positions in weight space. Note that L2 regularization, while having less of a feature-selection effect, is more stable when features are correlated (see the optional reading).

[Image: contour plots of L2 (left) and L1 (right) regularization]

image credit + graphical intuition: https://www.youtube.com/watch?v=sO4ZirJh9ds

Optional reading: https://blog.datadive.net/selecting-good-features-part-ii-linear-models-and-regularization/
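
To make this concrete, here is a minimal sketch (assuming scikit-learn and a synthetic dataset of my own choosing, not anything from the lecture) comparing the coefficients learned with an L1 penalty (Lasso) and an L2 penalty (Ridge) when only a few features are informative; the alpha values are arbitrary illustration choices, not tuned:

```python
# Compare L1 (Lasso) vs. L2 (Ridge) coefficients on synthetic data where
# only the first 3 of 10 features actually influence the target.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -1.5, 1.0]              # only the first 3 features are informative
y = X @ true_w + 0.1 * rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)         # L1 penalty
ridge = Ridge(alpha=10.0).fit(X, y)        # L2 penalty

print("L1 (Lasso) coefficients:", np.round(lasso.coef_, 3))
print("L2 (Ridge) coefficients:", np.round(ridge.coef_, 3))
# Typically the Lasso sets the 7 uninformative coefficients to exactly 0
# (feature selection), while Ridge only shrinks them toward 0.
```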

Thank you for this! It is very clear :))

I have one extra question:
If we transform our data with PCA (so the data is decorrelated), then both L1 and L2 are stable, correct?
And could using L1 then be a way to choose how many of the first components to keep?
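
In case it helps, here is a small sketch of what that idea could look like (purely illustrative; the synthetic data, alpha, and component counts are my own assumptions, not a claim that this is the recommended way to pick the number of components): fit PCA, then a Lasso on the decorrelated components, and count the nonzero coefficients.

```python
# PCA to decorrelate, then an L1 penalty on the components; the nonzero
# Lasso coefficients indicate which components it keeps.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, d = 300, 20
latent = rng.normal(size=(n, 3))                       # 3 strong underlying directions
loadings = rng.normal(size=(3, d))
X = latent @ loadings + 0.1 * rng.normal(size=(n, d))  # most variance lives in 3 directions
y = latent[:, 0] - 2.0 * latent[:, 1] + 0.1 * rng.normal(size=n)

Z = PCA(n_components=d).fit_transform(X)               # decorrelated, ordered by variance
lasso = Lasso(alpha=0.05).fit(Z, y)

print("components with nonzero weight:", np.flatnonzero(lasso.coef_))
# With this setup the nonzero coefficients tend to fall on the first few
# (high-variance) components, but note that the L1 penalty keeps the
# components that predict y, which need not all be the leading ones.
```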
