Hey, why is the y-axis in the graph called "True Error"? We had a different definition for the true error in the previous lecture.

The attached pictures show:
2) the definition of the true error
3) what the sum of the three terms equals

Top comment

TL;DR: there is no ambiguity. The true error is the expected error, but it is still random because of the randomness of the training set $$S$$ (which results in a random $$f_S$$). In the bias-variance decomposition slides, we want to compute the expectation of this "expected error (true error)" over the randomness of the training set $$S$$.

First of all, the true error (risk) and the expected error (expected risk) are the same thing. Both refer to the error $$L(f) = E_{(x,y)\sim \mathcal D}[l(y,f(x))]$$.

Note that a machine learning model is trained on training data $$S$$, so the function $$f$$ can be written as $$f_S$$: different training data $$S$$ result in a different function $$f$$. The true error (expected error, true risk, expected risk, ...) is then written as $$L(f_S) = E_{(x,y)\sim \mathcal D}[l(y,f_S(x))]$$.
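To make this concrete, here is a minimal sketch (the distribution, the squared loss, and the predictor `f_S` are all hypothetical choices, not from the lecture) of estimating the true error $$L(f_S)$$ of one fixed, already-trained model by Monte Carlo sampling from $$\mathcal D$$:

```python
import random

random.seed(0)

def sample_point():
    # Hypothetical data distribution D: y = 2x + Gaussian noise.
    x = random.uniform(-1.0, 1.0)
    y = 2.0 * x + random.gauss(0.0, 0.1)
    return x, y

def f_S(x):
    # A fixed (already trained) predictor; its slope 1.9 stands in for
    # whatever the particular training set S produced.
    return 1.9 * x

def true_error(f, n_samples=100_000):
    # Monte Carlo estimate of L(f) = E_{(x,y)~D}[ (y - f(x))^2 ].
    total = 0.0
    for _ in range(n_samples):
        x, y = sample_point()
        total += (y - f(x)) ** 2
    return total / n_samples

print(true_error(f_S))
```

Note that once $$f_S$$ is fixed, $$L(f_S)$$ is just a number; its randomness only enters through which $$S$$ produced $$f_S$$.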

In the bias-variance decomposition lecture, the quantity of interest is the expectation of this true error (expected error, true risk, expected risk, ...) over the random training set $$S$$. If you want, you can expand it explicitly as $$E_{S\sim \mathcal D} L(f_S) = E_{S\sim \mathcal D} E_{(x,y)\sim \mathcal D}[l(y,f_S(x))]$$.
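The two nested expectations can be simulated directly. Below is a sketch under the same assumptions as before (a hypothetical linear-plus-noise distribution and squared loss, not the lecture's setup): each draw of $$S$$ yields a different $$f_S$$ and hence a different value of $$L(f_S)$$, and averaging those values estimates $$E_{S\sim \mathcal D} L(f_S)$$:

```python
import random

random.seed(0)

def sample_point():
    # Hypothetical data distribution D: y = 2x + Gaussian noise.
    x = random.uniform(-1.0, 1.0)
    y = 2.0 * x + random.gauss(0.0, 0.1)
    return x, y

def train(S):
    # Least-squares slope through the origin: f_S(x) = a * x.
    num = sum(x * y for x, y in S)
    den = sum(x * x for x, _ in S)
    a = num / den
    return lambda x: a * x

def true_error(f, n=50_000):
    # Inner expectation: Monte Carlo estimate of L(f) over (x, y) ~ D.
    total = 0.0
    for _ in range(n):
        x, y = sample_point()
        total += (y - f(x)) ** 2
    return total / n

# Outer expectation: average L(f_S) over independent draws of S.
errors = []
for _ in range(20):
    S = [sample_point() for _ in range(10)]
    errors.append(true_error(train(S)))

print(min(errors), max(errors))   # L(f_S) varies from one S to another
print(sum(errors) / len(errors))  # estimate of E_S[ L(f_S) ]
```

The spread between `min(errors)` and `max(errors)` is exactly the point of the answer: the true error is an expectation over $$(x,y)$$, yet it remains a random variable through $$S$$.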

Hope this makes it clearer.

Tianzong
