Some questions regarding gradient descent. Do I understand correctly that:
1) stochastic gradient descent works best with a large number of iterations (by the law of large numbers)?
2) all the minima we compute using gradient descent are local, i.e. close to w(0)?
3) Also: does gradient descent with N data points converge k times faster than stochastic gradient descent (where k could perhaps be determined analytically, e.g. from the standard deviation of the gradient computed on a single data point)?
Thank you in advance for your answers.
1) Yes, in general you would need more iterations with SGD than with GD (I don't know the link with the law of large numbers, though). However, the cost of a single iteration is much cheaper with SGD.
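As a rough illustration (a minimal sketch; the least-squares setup and all variable names are mine, not from the course): one GD step touches all N points, one SGD step touches a single point, yet the single-point gradient is an unbiased estimate of the full one.

```python
import numpy as np

# Toy least-squares problem: loss L(w) = (1/2N) * ||X w - y||^2.
rng = np.random.default_rng(0)
N, d = 1000, 5
X = rng.normal(size=(N, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=N)

def full_gradient(w):
    # One GD step uses all N data points: cost O(N * d).
    return X.T @ (X @ w - y) / N

def stochastic_gradient(w, i):
    # One SGD step uses a single data point i: cost O(d),
    # i.e. roughly N times cheaper than the full gradient.
    return X[i] * (X[i] @ w - y[i])

w = np.zeros(d)
g_full = full_gradient(w)
g_stoch = stochastic_gradient(w, rng.integers(N))
# Averaged over all data points, the stochastic gradient
# recovers the full gradient exactly (unbiasedness).
```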
2) Yes, most of the time we converge to a minimum somewhere close to w(0).
If you are minimizing a convex function, you will converge to a global minimizer.
If you are minimizing a non-convex and smooth function, you will converge to a stationary point (a point where the gradient is zero). Characterizing convergence to a local minimum, and showing that your algorithm escapes saddle points, requires additional work and assumptions (even though it is often, if not always, the case in practice).
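To make the convex case concrete, here is a minimal sketch (the quadratic, its coefficients, and the step size are illustrative assumptions): GD on a smooth convex function drives the gradient to zero and, by convexity, that stationary point is the global minimizer.

```python
import numpy as np

# Smooth convex quadratic f(w) = 0.5 * w^T A w - b^T w,
# with A positive definite (eigenvalues strictly > 0).
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

w = np.array([5.0, 5.0])                  # starting point w(0)
step = 1.0 / np.linalg.eigvalsh(A).max()  # 1/L, L = smoothness constant
for _ in range(500):
    grad = A @ w - b
    w = w - step * grad

# The unique global minimizer solves A w* = b; GD converges to it,
# and the gradient norm at the final iterate is ~0.
w_star = np.linalg.solve(A, b)
```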
3) To precisely compare the convergence rates of SGD and GD, you have to fix a class of functions. For example, on non-smooth convex functions both algorithms converge at the same rate, but on smooth and strongly convex functions (when the eigenvalues of the Hessian are bounded strictly away from zero), GD provably converges faster than SGD.
What is certain is that, in its most common version, one step of SGD is roughly N times cheaper than computing the full gradient.
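A hedged sketch of that comparison (problem sizes, noise level, and step-size schedules are my assumptions, not values from the course): on a smooth, strongly convex least-squares problem, GD with a fixed 1/L step converges linearly, while SGD with a decaying step size, given the same total number of data-point accesses, ends further from the minimizer because of gradient noise.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 200, 3
X = rng.normal(size=(N, d))
y = X @ np.array([1.0, -1.0, 2.0]) + 0.5 * rng.normal(size=N)  # noisy labels
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)  # exact minimizer

L = np.linalg.eigvalsh(X.T @ X / N).max()  # smoothness constant

# GD: 100 full-gradient steps (100 * N data-point accesses in total).
w_gd = np.zeros(d)
for _ in range(100):
    w_gd -= (1.0 / L) * (X.T @ (X @ w_gd - y) / N)

# SGD: 100 * N single-point steps (the same total data-access budget),
# with a decaying step size to damp the gradient noise.
w_sgd = np.zeros(d)
for t in range(1, 100 * N + 1):
    i = rng.integers(N)
    w_sgd -= (1.0 / (L * t**0.7)) * X[i] * (X[i] @ w_sgd - y[i])

# Under the same budget, GD ends much closer to w_star than SGD here.
```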
If you are interested in these questions, you should follow our next lecture on optimization for ML; we will cover all these topics.
Cheers,
Nicolas
Thank you very much for the answers.
I would really enjoy following the lecture on Optimization for ML, but it is very unlikely, as I am an exchange student at EPFL for only one semester. Do you maybe have some books to recommend on the topic?
Best regards,
Jean-Baptiste
Sure, you will find some materials (related courses and recommended books) on the course website (https://github.com/epfml/OptML_course). I particularly recommend Sébastien Bubeck's book.
Best,
Nicolas