Cost Function: MSE /2 (?)
Hello :)
In last week's lecture we defined the MSE as:
$$ \operatorname{MSE}(\mathbf{w}):=\frac{1}{N} \sum_{n=1}^{N}\left[y_{n}-f_{\mathbf{w}}\left(\mathbf{x}_{n}\right)\right]^{2} $$
Problem Set 2, however, defines MSE the following way:
$$ \mathcal{L}\left(w_{0}, w_{1}\right)=\frac{1}{2 N} \sum_{n=1}^{N}\left(y_{n}-f\left(x_{n 1}\right)\right)^{2}=\frac{1}{2 N} \sum_{n=1}^{N}\left(y_{n}-w_{0}-w_{1} x_{n 1}\right)^{2} $$
Are the two definitions equivalent? What is the effect of dividing the MSE by 2? Does it make the loss easier to optimize, or does it only look nicer?
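The two expressions are not equal as functions (they differ by a factor of 1/2), but the constant doesn't matter for the optimization: multiplying the loss by a positive constant does not move its minimum, so you will still find the same optimal weight(s). The 1/2 is often included because it cancels the factor of 2 that comes from differentiating the square, so the gradient looks cleaner.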
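To make the cancellation concrete: for the Problem Set 2 loss, the gradient is $\partial \mathcal{L} / \partial w_{0} = -\frac{1}{N} \sum_{n=1}^{N}\left(y_{n}-w_{0}-w_{1} x_{n 1}\right)$ and $\partial \mathcal{L} / \partial w_{1} = -\frac{1}{N} \sum_{n=1}^{N}\left(y_{n}-w_{0}-w_{1} x_{n 1}\right) x_{n 1}$, whereas without the 1/2 each term carries an extra factor of 2. Below is a small numerical sketch of the same point. The data, the learning rate, the step count, and the helper names `grad` and `gradient_descent` are all made up for illustration and are not taken from the lecture or the problem set.

```python
import numpy as np

# Synthetic 1-D regression data (an assumption, for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
y = 1.5 + 2.0 * x + 0.1 * rng.normal(size=50)
N = len(x)


def grad(w0, w1, scale):
    """Gradient of scale * sum_n (y_n - w0 - w1 * x_n)^2 w.r.t. (w0, w1)."""
    r = y - w0 - w1 * x  # residuals
    return scale * np.array([-2.0 * r.sum(), -2.0 * (r * x).sum()])


def gradient_descent(scale, lr=0.1, steps=5000):
    """Plain gradient descent from w = (0, 0) on the scaled squared error."""
    w = np.zeros(2)
    for _ in range(steps):
        w = w - lr * grad(w[0], w[1], scale)
    return w


w_mse = gradient_descent(scale=1.0 / N)         # loss = MSE
w_half = gradient_descent(scale=1.0 / (2 * N))  # loss = MSE / 2

# Closed-form least-squares solution for reference.
X = np.column_stack([np.ones(N), x])
w_exact, *_ = np.linalg.lstsq(X, y, rcond=None)

print("MSE     :", w_mse)
print("MSE / 2 :", w_half)
print("exact   :", w_exact)
```

Both runs should end at the same weights. One practical difference remains: the gradient of MSE/2 is exactly half the gradient of MSE, so with the same learning rate the halved loss takes steps half as large. That changes how fast gradient descent gets there, not where it ends up.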