
Exam 2017 Problem 19

Hi,

Could you please explain the solution to this problem?

2017.jpg

Aren't we supposed to apply regularization or dropout before overfitting occurs?

Why is the third answer false? I thought L(w) was a sum of L_n(w). Here L_n(w) would be 2nw.

Thank you for your time.

SGD is faster: each update uses a single sample instead of the full sum, so there is nothing more to explain.

Overfitting: it is better to add regularization / dropout layers only after you have observed your model overfitting; if the model does not overfit, you don't need regularization in the first place.
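To make the dropout idea concrete, here is a minimal sketch of (inverted) dropout written in plain Python; the function name and the drop probability `p` are illustrative, not from the exam:

```python
import random

def dropout(xs, p=0.5, training=True):
    """A minimal sketch of inverted dropout, the kind of layer you would
    add once you observe overfitting; p is the drop probability."""
    if not training:
        return list(xs)  # at test time, dropout is a no-op
    # Drop each activation with probability p and scale survivors by
    # 1/(1-p) so the expected activation is unchanged.
    return [0.0 if random.random() < p else x / (1 - p) for x in xs]
```

Note the 1/(1-p) rescaling: it keeps the expected value of each activation the same between training and test time.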

Why do you need to multiply by 30 (in general N, when you have a sum)? You need your gradient estimate to be unbiased, i.e. its expectation must equal the true gradient. If you pick a sample uniformly (with probability 1/30), then without the factor of 30 the expectation of your estimate would be true_gradient/30; hence you multiply your gradient estimate by 30.

The last point is a special case of matrix factorization, which is not convex.

Can you explain the last point in more detail?
I have u = (u1, u2) and v = (v1, v2). So uv^T is a 2x2 matrix with first row [u1 v1, u1 v2] and second row [u2 v1, u2 v2]. So now g(uv^T) = u1 v2 + u2 v1.
How do I prove this is non-convex?


Regarding the last point: \(f(u_1, u_2, v_1, v_2) = u_1 v_2 + u_2 v_1\), let's try this:

\(f(-1, -1, 1, 1) = -2\) and \(f(1,1,-1,-1) = -2\), and \(f(0,0,0,0) = 0\).

If \(f\) were convex, then \(f(-1,-1,1,1) + f(1,1,-1,-1) \ge 2 f(0,0,0,0)\) would have to hold (midpoint convexity, since \((0,0,0,0)\) is the midpoint of the two points), but \(-2 + (-2) = -4 < 0\), so it fails.

You can also compute the Hessian: \(\nabla^2 f = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}\), and it is not positive semi-definite (its eigenvalues are \(\pm 1\)).
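The midpoint-convexity counterexample above is easy to verify numerically; this is just a sanity check of the argument, not part of the exam solution:

```python
def f(u1, u2, v1, v2):
    # The function from the last point: g(uv^T) = u1*v2 + u2*v1.
    return u1 * v2 + u2 * v1

# Convexity would require f((x+y)/2) <= (f(x)+f(y))/2 for all x, y.
x = (-1, -1, 1, 1)
y = (1, 1, -1, -1)
mid = tuple((a + b) / 2 for a, b in zip(x, y))  # the midpoint (0, 0, 0, 0)

assert f(*x) == -2 and f(*y) == -2 and f(*mid) == 0
# 0 > (-2 + -2) / 2 = -2, so the convexity inequality is violated:
assert f(*mid) > (f(*x) + f(*y)) / 2
```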

