SGD_not_converging

Hi!
I was wondering if it's normal that, for SGD, the loss and the weights still vary after quite a number of iterations (I interpret this as it not converging...). This is in lab_2. It's in contrast to gradient descent, where the weights and the loss "stabilize". For SGD, the loss starts in the thousands and then oscillates around 20 +- 10; the weights end up similar to GD's but keep varying as well (of course).

No worries, this is totally normal!

With stochastic gradient descent, at each step you follow a noisy direction that is equal to the negative gradient (the direction of steepest decrease of your function value) only in expectation. Your function value will therefore not always decrease, since you suffer from the extra variance of the noisy gradient estimate.
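To make this concrete, here is a minimal sketch (not the lab_2 code; the least-squares problem, sizes, and step-size value are made up) of SGD with a constant step size. Each per-sample gradient equals the full gradient only in expectation, so the loss keeps fluctuating around a floor instead of stabilizing:

import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.standard_normal(n)   # noisy targets

w = np.zeros(d)
gamma = 0.01  # constant step size (illustrative value)
for k in range(1, 5001):
    i = rng.integers(n)                      # pick one sample at random
    g = (A[i] @ w - b[i]) * A[i]             # stochastic gradient; E[g] = full gradient
    w -= gamma * g                           # noisy step
    if k % 1000 == 0:
        loss = 0.5 * np.mean((A @ w - b) ** 2)
        print(f"iter {k}: loss {loss:.4f}")  # keeps oscillating, does not settle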

If you want to achieve convergence, you will have to use a decreasing step size (often proportional to 1/k or 1/sqrt(k), where k is the iteration number). The effect of this decreasing step size is to kill the variance introduced by the stochastic gradient estimate. However, since you are using a smaller step size, convergence will be slower than for deterministic GD: you are taking smaller steps at each iteration.
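Continuing the same toy sketch (reusing A, b, n, d, and rng from above), the only change for a decreasing step size gamma_k = gamma_0 / sqrt(k) is in the update:

w = np.zeros(d)
gamma0 = 0.05  # illustrative initial step size
for k in range(1, 5001):
    i = rng.integers(n)
    g = (A[i] @ w - b[i]) * A[i]
    w -= (gamma0 / np.sqrt(k)) * g           # step size shrinks as k grows
    if k % 1000 == 0:
        loss = 0.5 * np.mean((A @ w - b) ** 2)
        print(f"iter {k}: loss {loss:.4f}")  # fluctuations die out over time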

Note that in practice (and in particular for non-convex problems such as DNN training), people tend to use a constant step size that they tune by cross-validation.
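A rough sketch of what that tuning can look like (a simple hold-out split with made-up candidate values, still reusing the toy data above):

A_tr, b_tr = A[:800], b[:800]                # training split
A_val, b_val = A[800:], b[800:]              # validation split

def run_sgd(gamma, iters=2000):
    w = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(len(b_tr))
        w -= gamma * (A_tr[i] @ w - b_tr[i]) * A_tr[i]
    return w

candidates = [1e-3, 1e-2, 1e-1]
best = min(candidates, key=lambda g: np.mean((A_val @ run_sgd(g) - b_val) ** 2))
print("selected constant step size:", best)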

Best,
Nicolas

Thank you for this answer! It makes more sense now, and I believe you mentioned the "decreasing step size" in class!
