### Q16 2018

Hi,
I understand the reason behind the relative ordering of the algorithms, although why is the case that Algorithm 2 cannot represent a Gradient descent meanwhile Algorithm 1 does?
Thanks

Top comment

I think the intention of the question is to remind that the gradient descent with correct step size (and if the objective is smooth) leads to a monotonic decreasing objective value, e.g. lemma 2.6 in the lecture notes. Though the gradient descent can diverge with incorrect learning rate, its learning curve will not be like algorithm 2.

Also, since the question doesn’t mention the function it smooth, does Accelerated GD still provide much speed up? I would have thought it doesn’t necessarily do that and therefore that it was Alg 2.

Page 1 of 1