Is cross-validation necessary for hyperparameter optimization?

Hello,
I have several problems regarding the use of validation sets.

  1. Is cross-validation necessary for hyperparameter optimization? After the initial training/validation/test split, can we just use a fixed validation set to optimize the hyperparameters? We understand that cross-validation is more rigorous and gives a fairer evaluation, but for project 2 our time and computing power are limited. With K-fold cross-validation we would need roughly K times as long.

  2. Why is a validation set necessary? For project 2, can we use a larger training set that contains the previous training and validation sets to optimize the parameters/hyperparameters, and then use the test set to evaluate the models' performance?

Thank you for your help.

Hi,

  1. Yes, you can use a fixed validation set to decide which hyperparameters are optimal. Using a validation set rather than cross-validation is common for deep learning models, for the reason you mentioned: cross-validation can be too expensive.

  2. If you had only a train-test split, you would choose the hyperparameters that are optimal for that particular test set, which would mean that you are using the test set in the training of the model. The test set is a simulation of how your model behaves on unseen data, and you should keep it untouched until the final evaluation. If you choose hyperparameters based on the test set, the model has effectively seen the test data and will be biased towards it, which defeats the purpose of the evaluation.

However, once you have decided on the hyperparameters based on the results you obtain on the validation set, you should retrain the model on the whole training+validation set (with those fixed hyperparameters) and treat that as your final model, which you then evaluate on the test set; see the sketch below.
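
To make the workflow concrete, here is a minimal sketch, assuming scikit-learn; the logistic regression, its regularization strength C, the synthetic dataset, and the candidate values are placeholders standing in for whatever model and hyperparameters you are actually tuning:

```python
# Fixed train/validation/test workflow: select hyperparameters on the
# validation set, retrain on train+validation, evaluate once on the test set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# 60/20/20 train/validation/test split.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# 1. Pick the hyperparameter using the fixed validation set only.
candidates = [0.01, 0.1, 1.0, 10.0]  # regularization strengths C
best_C, best_val_acc = None, -1.0
for C in candidates:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    val_acc = model.score(X_val, y_val)
    if val_acc > best_val_acc:
        best_C, best_val_acc = C, val_acc

# 2. Retrain on train + validation with the chosen hyperparameter,
#    then touch the test set exactly once for the final evaluation.
X_final = np.concatenate([X_train, X_val])
y_final = np.concatenate([y_train, y_val])
final_model = LogisticRegression(C=best_C, max_iter=1000).fit(X_final, y_final)
print("chosen C:", best_C, "test accuracy:", final_model.score(X_test, y_test))
```

The key point is that the test set appears only in the very last line, after the hyperparameter has already been fixed.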

I hope this helps.

Best,
Maja

Thank you for your reply!

We will use a training/validation/test split. But I still want to ask: why can't we choose hyperparameters based on the results we obtain on the training set? In that case, the test set is still untouched.

Hi,
Sometimes your model is complex and flexible enough that it can do a perfect classification/regression on the training set. However, your goal is not to design a system that perfectly recovers the training set, but one that performs best on the unseen data you will encounter in the future. As a result, we use a test set as a proxy for unseen data. If your test set is small or not a good representative of unseen data, your final system will perform poorly on unseen data. It often happens that people in data science challenges make so many modifications to their model based on the test set that the assumption of the test set being unseen data becomes false. There is a pretty good course by Percy Liang on statistical learning that studies this phenomenon for a few families of models and will give you good intuition if you are interested in learning more.

The goal of training is to get the best model that generalizes to unseen data. Metrics on the training data show how well you are fitting the data you already have, but that is not generalization. So if you choose the best hyperparameters on the training data (seen data), that is not aligned with the goal, which is to choose the hyperparameters that work best on unseen data. With the validation set you introduce that "unseen" moment, which gives you a clue about the results you can expect on truly unseen data. The hope is that the hyperparameters that give the best performance on the unseen validation data will also give the best performance on the unseen test data.
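
To illustrate, here is a small sketch, assuming scikit-learn; a decision tree's max_depth stands in for any capacity-controlling hyperparameter, and the synthetic dataset is only for illustration:

```python
# Why selecting hyperparameters on the training set misleads:
# training accuracy keeps rising with model capacity, while
# validation accuracy typically peaks at a moderate capacity.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise so that overfitting is visible.
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in [1, 2, 4, 8, 16, None]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train acc={tree.score(X_train, y_train):.2f}, "
          f"val acc={tree.score(X_val, y_val):.2f}")

# Deep trees approach 100% training accuracy, so picking max_depth by
# training accuracy would always favor them, even though validation
# accuracy (the proxy for unseen data) is usually best at a smaller depth.
```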

Let me know if it is still unclear.

Best,
Maja

Yes, optimizing hyperparameters based on the validation set will help design models that perform the best on unseen data.

Thank you very much!
