
Exam 2017 Problem 19

Hi,

Could you please explain the solution to this problem?

2017.jpg

Aren't we supposed to apply regularization or dropout before overfitting occurs?

Why is the third answer false? I thought L(w) was a sum of L_n(w). Here L_n(w) would be 2nw.

Thank you for your time.

SGD is faster: each update uses a single sample instead of the full sum, so there is nothing more to explain.

Overfitting: it is better to add regularization / dropout layers only after you have observed your model overfitting; if the model does not overfit, you don't need regularization in the first place.
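To make the dropout idea concrete, here is a minimal sketch of (inverted) dropout written in plain Python; the function name and the drop probability `p` are illustrative, not from the exam:

```python
import random

def dropout(xs, p=0.5, training=True):
    """A minimal sketch of inverted dropout, the kind of layer you would
    add once you observe overfitting; p is the drop probability."""
    if not training:
        return list(xs)  # at test time, dropout is a no-op
    # Drop each activation with probability p and scale survivors by
    # 1/(1-p) so the expected activation is unchanged.
    return [0.0 if random.random() < p else x / (1 - p) for x in xs]
```

Note the 1/(1-p) rescaling: it keeps the expected value of each activation the same between training and test time.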

Why do you need to multiply by 30 (in general N, when you have a sum)? You need your gradient estimate to be unbiased, i.e. its expectation must equal the true gradient. If you pick a sample uniformly (with probability 1/30), then without the factor of 30 the expectation of your estimate would be true_gradient/30; hence you multiply your gradient estimate by 30.

The last point is a special case of matrix factorization, which is not convex.

Can you explain the last point in more detail?
I have u = (u1, u2) and v = (v1, v2). So uv^T is a 2x2 matrix with first row [u1 v1, u1 v2] and second row [u2 v1, u2 v2]. So now g(uv^T) = u1 v2 + u2 v1.
How do I prove this is non-convex?


Regarding the last point: \(f(u_1, u_2, v_1, v_2) = u_1 v_2 + u_2 v_1\), let's try this:

\(f(-1, -1, 1, 1) = -2\) and \(f(1,1,-1,-1) = -2\), and \(f(0,0,0,0) = 0\).

If \(f\) were convex, then \(f(-1,-1,1,1) + f(1,1,-1,-1) \ge 2 f(0,0,0,0)\) would have to hold (midpoint convexity, since \((0,0,0,0)\) is the midpoint of the two points), but \(-2 + (-2) = -4 < 0\), so it fails.

You can also compute the Hessian: \(\nabla^2 f = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}\), and it is not positive semi-definite (its eigenvalues are \(\pm 1\)).
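The midpoint-convexity counterexample above is easy to verify numerically; this is just a sanity check of the argument, not part of the exam solution:

```python
def f(u1, u2, v1, v2):
    # The function from the last point: g(uv^T) = u1*v2 + u2*v1.
    return u1 * v2 + u2 * v1

# Convexity would require f((x+y)/2) <= (f(x)+f(y))/2 for all x, y.
x = (-1, -1, 1, 1)
y = (1, 1, -1, -1)
mid = tuple((a + b) / 2 for a, b in zip(x, y))  # the midpoint (0, 0, 0, 0)

assert f(*x) == -2 and f(*y) == -2 and f(*mid) == 0
# 0 > (-2 + -2) / 2 = -2, so the convexity inequality is violated:
assert f(*mid) > (f(*x) + f(*y)) / 2
```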

