Some questions on the 2019 exam

-Q5: I would have assumed that oracle 2 has higher variance, since it is in part SGD (which has higher variance than GD, at least I think so). But the answer is "none is correct", so does this mean that they have equal variance?

-Q20: What is meant by 'graph' here? Is it talking about the epigraph?

-Q21: Why is this False? What would be the correct answer for L-smooth and convex functions? In the 2018 exam, a similar question had the same rate but did not mention smoothness, so I am guessing that this is what makes the difference.

-Q28: I am having a hard time understanding why \( L_g = \eta + d \). I understand that the Hessian is \( A \), but why is \( \|A\| \) equal to \( \eta + d \)?
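
One guess, in case it helps: if \( A = \eta I_d + \mathbf{1}\mathbf{1}^\top \) with \( \mathbf{1} \) the all-ones vector (I am not certain this is the exam's \( A \); it is just a matrix that makes the number come out), then

\[
A\mathbf{1} = \eta\,\mathbf{1} + \mathbf{1}\,(\mathbf{1}^\top\mathbf{1}) = (\eta + d)\,\mathbf{1},
\qquad
Av = \eta\,v \quad \text{for all } v \perp \mathbf{1},
\]

so the eigenvalues of \( A \) are \( \eta + d \) and \( \eta \), and \( L_g = \|A\| = \lambda_{\max}(A) = \eta + d \). Is that the idea?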

Q5: I am also confused about this one, although I noted that the stochastic oracle \( g_{SG}(x) \) simply outputs the full gradient plus some Gaussian noise. I computed

\( Var[g_A] = E[g_A^2] - E[g_A]^2 = \frac{1}{2}(2g_G)^2-(g_G)^2 = (g_G)^2 \)

\( Var[g_B] = \frac{1}{2}g_G^2 + \frac{1}{2}g_{SG}^2 = ...\)
I am not sure how to proceed, since \( g_{SG} = g_G + \xi \).

Beware that the gradients are not scalars, so the squares above should really be squared norms.
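
Building on that correction, here is how the computation can be finished in squared norms, assuming (as stated above) that \( \xi \) is zero-mean Gaussian noise, so \( E[\xi] = 0 \):

\[
E\big[\|g_{SG}\|^2\big] = E\big[\|g_G + \xi\|^2\big] = \|g_G\|^2 + 2\langle g_G, E[\xi]\rangle + E\big[\|\xi\|^2\big] = \|g_G\|^2 + E\big[\|\xi\|^2\big],
\]

and since \( E[g_B] = g_G \),

\[
Var[g_B] = \tfrac{1}{2}\|g_G\|^2 + \tfrac{1}{2}\big(\|g_G\|^2 + E[\|\xi\|^2]\big) - \|g_G\|^2 = \tfrac{1}{2}\,E\big[\|\xi\|^2\big].
\]

So, if I am reading the oracles correctly (A returns \( 2g_G \) or \( 0 \) with equal probability, B returns \( g_G \) or \( g_{SG} \) with equal probability; inferred from the formulas above, not from the exam sheet), then \( Var[g_A] = \|g_G\|^2 \) while \( Var[g_B] = \tfrac{1}{2}E[\|\xi\|^2] \), and neither dominates the other in general, which would be consistent with "none is correct". A quick Monte Carlo sanity check under those same assumed oracle definitions:

```python
import numpy as np

# Monte Carlo check of the two oracle variances computed above.
# The oracle definitions are my reading of the formulas in this thread,
# NOT quoted from the exam:
#   oracle A: outputs 2*g_G with prob 1/2, and 0 with prob 1/2
#   oracle B: outputs g_G with prob 1/2, and g_G + xi with prob 1/2
# "Variance" of a vector-valued output = E[||output - E[output]||^2].

rng = np.random.default_rng(0)
d, sigma, n = 5, 2.0, 500_000
g = rng.standard_normal(d)                  # a fixed gradient vector g_G

coin = rng.integers(0, 2, size=(n, 1))      # fair coin flips
xi = sigma * rng.standard_normal((n, d))    # xi ~ N(0, sigma^2 * I)

A = coin * (2 * g)                          # 2*g_G or 0, each w.p. 1/2
B = g + coin * xi                           # g_G or g_G + xi, each w.p. 1/2

# Both oracles are unbiased, so we can center at the true mean g.
print(np.mean(np.sum((A - g) ** 2, axis=1)), np.sum(g ** 2))      # ~ ||g_G||^2
print(np.mean(np.sum((B - g) ** 2, axis=1)), 0.5 * sigma**2 * d)  # ~ E[||xi||^2] / 2
```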

I am also confused by the questions raised by OP. Any answers would be much appreciated.

I just want to add regarding Q21: the slides say that derivative-free random search is "Always d times the complexity of gradient descent on the function class." For smooth and convex functions the complexity is \( O(1/\epsilon) \), so why isn't the answer True here? (And in exam 2018 Q14, why isn't the answer \( O(dL/\epsilon^2) \), following the same logic?) I have tried to spell out the arithmetic below.

Thanks in advance.
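
Spelling out that arithmetic (treating \( L \) and \( \|x_0 - x^\star\| \) as constants inside the \( O(\cdot) \); this is just my own reasoning made explicit, not the official answer):

\[
\text{GD on } L\text{-smooth convex functions: } O\!\left(\frac{1}{\epsilon}\right) \text{ steps}
\quad\xrightarrow{\ \times\, d\ }\quad
\text{random search: } O\!\left(\frac{d}{\epsilon}\right),
\]

and for 2018 Q14, if the base rate on that function class is \( O(L/\epsilon^2) \), the same multiplication by \( d \) would give \( O(dL/\epsilon^2) \).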

Q21: It's False because the statement should use line search rather than the (fixed) stepsize \( 1/L \).

@Anonymous said:
Q21: It's False because the statement should use line search rather than the (fixed) stepsize \( 1/L \).

Ah yes, I see it now; you're right. So, in general, do the results we found in class for random search not require smoothness at all? There was always a mention of an L term, so I had been assuming that f is also L-smooth.

Top comment

It requires smoothness, as the convergence proof for random search states (slide 11 of Lecture 10: "Assume that f is an L-smooth, convex, differentiable function").
