K cross validation vs split train test
Hi,
In the 4th lecture, it is written that K-fold cross-validation returns an unbiased estimate of the generalization error. What does that mean? Can we also find a bound on the generalization error that depends on the number of samples? Is the train/test split biased?
Thank you
Regarding your first question, here is how I understand it: K-fold CV is an estimator of the generalization error. It is an unbiased estimator, meaning that the expected value of the estimator equals the generalization error (the quantity we are trying to estimate).
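As a toy illustration (my own sketch, not from the lecture), here is K-fold CV estimating the squared error of a trivial mean-predictor: the data are split into K folds, the model is fit on K-1 folds, scored on the held-out fold, and the fold errors are averaged to form the estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical setup: data y ~ N(0, 1), and our "model" just predicts
# the training mean. Its true generalization (squared) error is close
# to Var(y) = 1.
y = rng.normal(size=100)

def kfold_cv_error(y, k=5):
    """K-fold CV estimate of the mean-predictor's squared error."""
    folds = np.array_split(np.arange(len(y)), k)
    fold_errors = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        pred = y[train_mask].mean()                        # "train"
        fold_errors.append(np.mean((y[test_idx] - pred) ** 2))  # "test"
    return np.mean(fold_errors)  # average over the K held-out folds

print(kfold_cv_error(y, k=5))
```

Averaged over many datasets drawn from the same distribution, this estimate centers on the true error, which is what "unbiased" means here.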
Second question: I'm pretty sure this can be derived from Hoeffding's inequality in the lecture on model selection (04a). You can see that increasing |S| (the number of samples drawn from the distribution) tightens the bound on the error.
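Concretely, for a loss bounded in [0, 1], Hoeffding's inequality gives that with probability at least 1 - delta, the gap between empirical and true error is at most sqrt(ln(2/delta) / (2|S|)). A quick sketch (my own, with delta = 0.05 as an assumed confidence level) showing the bound shrink as |S| grows:

```python
import math

def hoeffding_bound(n, delta=0.05):
    """Half-width of the Hoeffding interval for a [0, 1]-bounded loss:
    with prob. >= 1 - delta, |empirical error - true error| <= this."""
    return math.sqrt(math.log(2 / delta) / (2 * n))

for n in (100, 1000, 10000):
    print(n, hoeffding_bound(n))  # shrinks like 1 / sqrt(n)
```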
Using a single train/test split will be biased, as I understand it. Note that a single train/test split is more or less one fold of a cross-validator, so a poor estimator that will most likely be off.
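One way to see why a single split is a worse estimator: its value varies a lot from split to split, while averaging over K folds smooths this out. A toy Monte Carlo sketch (my own illustration, reusing the mean-predictor setup above) comparing the spread of the two estimators over many redrawn datasets:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_predictor_error(train, test):
    """Squared error of predicting the training mean on the test set."""
    return np.mean((test - train.mean()) ** 2)

def single_split(y):
    """One 50/50 train/test split."""
    half = len(y) // 2
    return mean_predictor_error(y[:half], y[half:])

def kfold(y, k=5):
    """K-fold CV: average the error over K held-out folds."""
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for idx in folds:
        mask = np.ones(len(y), dtype=bool)
        mask[idx] = False
        errs.append(mean_predictor_error(y[mask], y[idx]))
    return np.mean(errs)

single, cv = [], []
for _ in range(500):           # redraw the dataset many times
    y = rng.normal(size=100)
    single.append(single_split(y))
    cv.append(kfold(y))

# Spread of each estimator across datasets; CV is typically tighter.
print(np.std(single), np.std(cv))
```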
https://en.wikipedia.org/wiki/Bias_of_an_estimator