K cross validation vs split train test
Hi,
In the 4th lecture, it is written that K-fold cross-validation returns an unbiased estimate of the generalization error. What does that mean? Can we also find a bound on the generalization error that depends on the number of samples? Is the train/test split biased?
Thank you
Regarding your first question, here is how I understand it: K-fold CV is an estimator of the generalization error. It is an unbiased estimator, meaning that the expected value of the estimator equals the generalization error (the quantity we are trying to estimate).
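As a toy illustration (my own sketch, not from the lecture), here is K-fold CV estimating the squared error of a trivial mean-predictor: the data are split into K folds, the model is fit on K-1 folds, scored on the held-out fold, and the fold errors are averaged to form the estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical setup: data y ~ N(0, 1), and our "model" just predicts
# the training mean. Its true generalization (squared) error is close
# to Var(y) = 1.
y = rng.normal(size=100)

def kfold_cv_error(y, k=5):
    """K-fold CV estimate of the mean-predictor's squared error."""
    folds = np.array_split(np.arange(len(y)), k)
    fold_errors = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        pred = y[train_mask].mean()                        # "train"
        fold_errors.append(np.mean((y[test_idx] - pred) ** 2))  # "test"
    return np.mean(fold_errors)  # average over the K held-out folds

print(kfold_cv_error(y, k=5))
```

Averaged over many datasets drawn from the same distribution, this estimate centers on the true error, which is what "unbiased" means here.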
Second question: I'm pretty sure this can be derived from Hoeffding's inequality in the lecture on model selection (04a). You can see that increasing |S| (the number of samples drawn from the distribution) tightens the bound on the error.
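Concretely, for a loss bounded in [0, 1], Hoeffding's inequality gives that with probability at least 1 - delta, the gap between empirical and true error is at most sqrt(ln(2/delta) / (2|S|)). A quick sketch (my own, with delta = 0.05 as an assumed confidence level) showing the bound shrink as |S| grows:

```python
import math

def hoeffding_bound(n, delta=0.05):
    """Half-width of the Hoeffding interval for a [0, 1]-bounded loss:
    with prob. >= 1 - delta, |empirical error - true error| <= this."""
    return math.sqrt(math.log(2 / delta) / (2 * n))

for n in (100, 1000, 10000):
    print(n, hoeffding_bound(n))  # shrinks like 1 / sqrt(n)
```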
Using a single train/test split will be biased, as I understand it. Note that a single train/test split is more or less one fold of a cross-validator, so a poor estimator that will most likely be off.
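One way to see why a single split is a worse estimator: its value varies a lot from split to split, while averaging over K folds smooths this out. A toy Monte Carlo sketch (my own illustration, reusing the mean-predictor setup above) comparing the spread of the two estimators over many redrawn datasets:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_predictor_error(train, test):
    """Squared error of predicting the training mean on the test set."""
    return np.mean((test - train.mean()) ** 2)

def single_split(y):
    """One 50/50 train/test split."""
    half = len(y) // 2
    return mean_predictor_error(y[:half], y[half:])

def kfold(y, k=5):
    """K-fold CV: average the error over K held-out folds."""
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for idx in folds:
        mask = np.ones(len(y), dtype=bool)
        mask[idx] = False
        errs.append(mean_predictor_error(y[mask], y[idx]))
    return np.mean(errs)

single, cv = [], []
for _ in range(500):           # redraw the dataset many times
    y = rng.normal(size=100)
    single.append(single_split(y))
    cv.append(kfold(y))

# Spread of each estimator across datasets; CV is typically tighter.
print(np.std(single), np.std(cv))
```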
https://en.wikipedia.org/wiki/Bias_of_an_estimator