
Exercise 4: loss vs RMSE

Hello,

I was wondering why the first function, "cross_validation", outputs losses (which the lecture notes define with the lambda penalty term), while in the second function, "cross_validation_demo", we need to compute RMSE.
I don't really understand where we should use the losses and how the first function helps in the RMSE calculation.

Thank you for your help,

I was wondering why the first function, "cross_validation", outputs losses (which the lecture notes define with the lambda penalty term), while in the second function, "cross_validation_demo", we need to compute RMSE.

  • The first function cross_validation() computes the cross-validation losses (k train and k test losses, one train/test pair for every fold) for a single \(\lambda\).
  • The second function cross_validation_demo() then uses cross_validation() while sweeping through the different values of \(\lambda\): for every \(\lambda\) it calls cross_validation() on every fold and stores the average train/test loss across the k folds.

The output/visualization will then be based on the lists rmse_tr and rmse_te, containing the average (k-fold) train/test loss as a function of \(\lambda\) (lambdas).
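The two functions can be sketched roughly as follows. This is a minimal, hypothetical NumPy sketch, not the actual exercise template: the helper names, the 2N scaling of \(\lambda\) in the closed-form solution, and the exact RMSE convention are assumptions that may differ from the course code.

```python
import numpy as np

def ridge_regression(y, tx, lambda_):
    """Closed-form ridge solution; the 2*N scaling of lambda is one common convention."""
    n, d = tx.shape
    a = tx.T @ tx + 2 * n * lambda_ * np.eye(d)
    return np.linalg.solve(a, tx.T @ y)

def rmse(y, tx, w):
    """Plain (unregularized) root-mean-square error."""
    e = y - tx @ w
    return np.sqrt(np.mean(e ** 2))

def cross_validation(y, tx, k_indices, k, lambda_):
    """Use fold k as the test set, the rest as the train set; return (rmse_train, rmse_test)."""
    te = k_indices[k]
    tr = np.concatenate([k_indices[i] for i in range(len(k_indices)) if i != k])
    w = ridge_regression(y[tr], tx[tr], lambda_)
    return rmse(y[tr], tx[tr], w), rmse(y[te], tx[te], w)

def cross_validation_demo(y, tx, k_fold, lambdas, seed=1):
    """For every lambda, average the k-fold train/test RMSE returned by cross_validation()."""
    rng = np.random.default_rng(seed)
    k_indices = np.array_split(rng.permutation(len(y)), k_fold)
    rmse_tr, rmse_te = [], []
    for lambda_ in lambdas:
        folds = [cross_validation(y, tx, k_indices, k, lambda_) for k in range(k_fold)]
        rmse_tr.append(np.mean([f[0] for f in folds]))
        rmse_te.append(np.mean([f[1] for f in folds]))
    return rmse_tr, rmse_te
```

Note how the inner function handles one fold for one \(\lambda\), while the outer function owns the double loop (over lambdas, then over folds) and the averaging.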

Thank you,
Yes, but the loss function we are optimizing in the lecture notes on ridge regression is the sum of squared errors + lambda * (magnitude of w)^2; it isn't the same definition as MSE or RMSE. How do we know what 'loss' means in general?

thank you

Correct; it is not the same. The (global) loss can be composed of many things: e.g., we can add some regularization to the original MSE loss to create a new loss objective.

The loss expresses directly (or as a proxy, if not directly expressible) what we wish to optimize (usually minimize, hence the name loss and not, for example, gain).

In ridge regression, and regularization in general, the focus shifts (a little) from the sole goal of minimizing the original loss function (MSE in this case) on the training data to also staying close to the origin (minimizing \(\left\lVert w \right\rVert^2\) keeps weights w close to the origin 0, independent of data). The balance between original objective (MSE) and regularization is tuned by the hyperparameter \(\lambda\).

The hope is that, by adding regularization to the original loss function, thus creating a new objective (or loss function), the trained model will generalize better on the test set.

Note that the model will perform worse in terms of raw loss on the training data because it has other things to take into account (the regularization term). This is fine, as long as we perform well or better on the test data.
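This trade-off can be checked numerically. The snippet below is a hypothetical illustration (made-up data; the 2N scaling of \(\lambda\) is just one convention): as \(\lambda\) grows, the weight norm \(\left\lVert w \right\rVert\) shrinks toward 0 and the raw training RMSE rises.

```python
import numpy as np

# Made-up data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

def ridge_w(lmbda):
    # Closed-form ridge solution (2*N scaling of lambda, one common convention)
    n, d = X.shape
    return np.linalg.solve(X.T @ X + 2 * n * lmbda * np.eye(d), X.T @ y)

def train_rmse(w):
    # Raw training error, without the penalty term
    return np.sqrt(np.mean((y - X @ w) ** 2))

lambdas = [0.0, 0.1, 1.0, 10.0]
norms = [np.linalg.norm(ridge_w(l)) for l in lambdas]
errors = [train_rmse(ridge_w(l)) for l in lambdas]
# norms shrink and the raw training RMSE grows as lambda increases
```

The training RMSE can only get worse with regularization, since \(\lambda = 0\) minimizes it by construction; the point is that the test RMSE may improve.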

Hello,
I feel like I am misunderstanding something here. The loss in ridge regression is indeed defined as the sum of squared errors plus the penalty term. And, as you say, the function cross_validation() is defined in the exercises as 'computing the cross-validation losses (k train & test losses, one train/test for every fold)'.

But RMSE is not the same as the loss function, as we agreed; yet in the second function 'cross_validation_demo()' we are supposed to use cross_validation() to store values in lists called rmse, even though that function is defined as outputting a loss. Is the first function really outputting RMSE instead of the ridge regression loss, or is the RMSE list of the second function storing ridge regression losses instead of RMSE?

I am trying to be consistent with the nomenclature of the course. RMSE isn't a synonym for loss in the case of ridge regression. On exams, won't we be expected to make the distinction?

Thank you for your help

RMSE is not the same as the loss function, as we agreed

RMSE is an example of a loss function. Loss functions are composable (mostly by summing) into another loss function.
For the case of regularized least-squares linear regression, the composition of the RMSE loss and the regularization loss makes up the overall loss. The model parameters (also called weights) are optimized w.r.t. this loss (on the training data!).

Is the first function really outputting RMSE instead of the ridge regression loss, or is the RMSE list of the second function storing ridge regression losses instead of RMSE?

When optimizing the model parameters we use the ridge regression objective (RMSE + regularization). However, when evaluating, we only care about the true loss (RMSE). So, rmse_tr and rmse_te will contain the pure RMSE losses.
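Concretely, training minimizes the penalized objective while the reported numbers are the plain RMSE. A hypothetical sketch (made-up data; the 1/(2N) MSE and \(\sqrt{2\,\mathrm{MSE}}\) RMSE definitions here follow one common course convention, which may differ from yours):

```python
import numpy as np

# Made-up data for illustration
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = X @ np.array([1.0, -2.0]) + 0.05 * rng.normal(size=40)

lambda_ = 0.5
n, d = X.shape
# Training: minimize MSE + lambda * ||w||^2 (closed form, 2*N scaling convention)
w = np.linalg.solve(X.T @ X + 2 * n * lambda_ * np.eye(d), X.T @ y)

mse = np.mean((y - X @ w) ** 2) / 2   # MSE with the 1/(2N) convention
rmse_val = np.sqrt(2 * mse)           # plain RMSE: what goes into rmse_tr / rmse_te
objective = mse + lambda_ * (w @ w)   # regularized loss: used only during optimization
# objective exceeds mse by exactly the penalty term lambda * ||w||^2
```

Only `rmse_val` would be stored and plotted; the penalty term influences which w is found, not how the model is evaluated.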

I am trying to be consistent with the nomenclature of the course. RMSE isn't a synonym for loss in the case of ridge regression. On exams, won't we be expected to make the distinction?

This terminology of loss, cost, objective, ... can be confusing at times. Saying that RMSE is the loss for ridge regression with nonzero lambda is indeed wrong.
Generally, when speaking about "the loss", we mean the true loss of the model on the data (e.g. RMSE). Yet in some cases, such as when optimizing the parameters in the ridge regression case, we use the regularized loss as the optimization objective (but for evaluation, and e.g. for comparison with the unregularized case, we still report the raw RMSE).
