
SGD Lab 7

Hello,

I do not understand why, in the update of the weights, gamma is divided by the iteration number plus one (#iters + 1).

Thank you for your answer.

[Attachment: update_weights.jpg]
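(The attachment is not reproduced here, so the following is only an assumed form of the update being asked about: a stochastic gradient step with a stepsize that decays with the iteration count.)

\[
w^{(t+1)} = w^{(t)} - \gamma_t \, \nabla L_{i_t}\!\bigl(w^{(t)}\bigr),
\qquad \gamma_t = \frac{\gamma}{t+1},
\]

where \(\nabla L_{i_t}\) is the gradient computed on the sample (or mini-batch) drawn at iteration \(t\) and \(\gamma\) is the initial stepsize.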

Top comment

Unlike GD, SGD does not fully converge with a fixed stepsize: convergence can only be guaranteed up to a ball whose radius is determined by the noise of the gradients used to perform the updates and by the stepsize (the radius is proportional to both). For this reason there are strategies (called schedulers) that decrease the stepsize over the iterations; one of them is to choose \(\gamma_t = \gamma/(t+1)\). More generally, there is a classical condition (the Robbins–Monro conditions) which says that if the stepsizes are chosen such that \(\sum_t \gamma_t = \infty\) and \(\sum_t \gamma_t^2 < \infty\), then convergence is guaranteed. Another strategy used in practice is to keep the stepsize fixed and, each time the learning stalls, divide it by, say, 2.
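Here is a minimal sketch (not the lab's actual code) of SGD on a toy least-squares problem, comparing a fixed stepsize with the decaying schedule \(\gamma_t = \gamma/(t+1)\); all names and values below are illustrative assumptions.

import numpy as np

def sgd_least_squares(A, b, gamma, num_iters, decay=True, seed=0):
    """Minimize (1/2n) * ||A w - b||^2 with single-sample stochastic gradients."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for t in range(num_iters):
        i = rng.integers(n)                    # sample one data point
        grad = (A[i] @ w - b[i]) * A[i]        # stochastic gradient estimate
        gamma_t = gamma / (t + 1) if decay else gamma   # scheduler vs. fixed stepsize
        w = w - gamma_t * grad                 # weight update
    return w

# Toy usage: with a fixed stepsize the iterates keep bouncing around the optimum
# inside a noise ball, while the decaying schedule shrinks the steps over time.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
b = A @ w_true + 0.1 * rng.standard_normal(200)
w_fixed = sgd_least_squares(A, b, gamma=0.05, num_iters=5000, decay=False)
w_decay = sgd_least_squares(A, b, gamma=0.5, num_iters=5000, decay=True)
print("error (fixed):", np.linalg.norm(w_fixed - w_true))
print("error (decay):", np.linalg.norm(w_decay - w_true))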
