[Splitting train/val/test]
Hi there!
I have a small question regarding the train/val/test split. I've looked into some answers online, but I am still not convinced. The question is simple: why don't we train the model (generally a NN) on a bigger training set that combines the train and validation sets, and validate the model on the test set?
With this strategy we have:

- more data to train on;
- control over which hyperparameters make the model perform best on the test set.

Thank you!
This strategy is problematic because you would be explicitly optimizing over the test set. You would not be optimizing the model's parameters on the test set, which is good, but you would still be optimizing its hyperparameters on it. If you try a large number of hyperparameter configurations (e.g., a grid search over the learning rate, the regularization strength, the number of units per layer, etc.), then purely by chance some lucky configuration can achieve a very low test error, and that low error will not generalize to unseen data.

That's why a 3-way split is necessary: one set to optimize the model's parameters, one to optimize the hyperparameters, and one to report the final performance of the overall training scheme.
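You can see the effect in a toy simulation (a hedged sketch, not real training: I assume 50 hyperparameter configurations that all have the same true error of 0.5, with independent evaluation noise on each split). Tuning and reporting on the same held-out set picks the luckiest noise draw, so the reported error is optimistically biased; reporting on an untouched test set is not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 hyperparameter configs, all with the same
# true error (0.5); each evaluation adds independent noise.
n_configs = 50
true_error = 0.5
noise = 0.05

val_errors = true_error + noise * rng.standard_normal(n_configs)
test_errors = true_error + noise * rng.standard_normal(n_configs)

# 2-way split: tune AND report on the same held-out set.
# The minimum over 50 noisy draws is optimistically biased.
best_2way = val_errors.min()

# 3-way split: tune on val, then report on an untouched test set.
best_idx = val_errors.argmin()
best_3way = test_errors[best_idx]

print(f"reported error, tuning on the reporting set: {best_2way:.3f}")
print(f"reported error, separate test set:           {best_3way:.3f}")
```

The first number lands well below the true error of 0.5, while the second stays close to it, which is exactly the "lucky hyperparameters" problem described above.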
I hope this helps.
It makes perfect sense! Thank you very much.