
SGD Lab 7

Hello,

I do not understand why, in the update of the weights, gamma is divided by the iteration number plus one (#iters + 1).

Thank you for your answer.

[Attachment: update_weights.jpg]
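(The attachment is not reproduced here, so the following is only an assumed form of the update being asked about: a stochastic gradient step with a stepsize that decays with the iteration count.)

\[
w^{(t+1)} = w^{(t)} - \gamma_t \, \nabla L_{i_t}\!\bigl(w^{(t)}\bigr),
\qquad \gamma_t = \frac{\gamma}{t+1},
\]

where \(\nabla L_{i_t}\) is the gradient computed on the sample (or mini-batch) drawn at iteration \(t\) and \(\gamma\) is the initial stepsize.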

Top comment

Unlike GD, SGD does not fully converge with a fixed stepsize: convergence can only be guaranteed up to a ball whose radius is determined by the noise of the gradients used to perform the updates and by the stepsize (the radius is proportional to both). For this reason there are strategies (called schedulers) that decrease the stepsize over the iterations; one of them is to choose \(\gamma_t = \gamma/(t+1)\). More generally, there is a classical condition (the Robbins–Monro conditions) which says that if the stepsizes are chosen such that \(\sum_t \gamma_t = \infty\) and \(\sum_t \gamma_t^2 < \infty\), then convergence is guaranteed. Another strategy used in practice is to keep the stepsize fixed and, each time the learning stalls, divide it by, say, 2.
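Here is a minimal sketch (not the lab's actual code) of SGD on a toy least-squares problem, comparing a fixed stepsize with the decaying schedule \(\gamma_t = \gamma/(t+1)\); all names and values below are illustrative assumptions.

import numpy as np

def sgd_least_squares(A, b, gamma, num_iters, decay=True, seed=0):
    """Minimize (1/2n) * ||A w - b||^2 with single-sample stochastic gradients."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for t in range(num_iters):
        i = rng.integers(n)                    # sample one data point
        grad = (A[i] @ w - b[i]) * A[i]        # stochastic gradient estimate
        gamma_t = gamma / (t + 1) if decay else gamma   # scheduler vs. fixed stepsize
        w = w - gamma_t * grad                 # weight update
    return w

# Toy usage: with a fixed stepsize the iterates keep bouncing around the optimum
# inside a noise ball, while the decaying schedule shrinks the steps over time.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
b = A @ w_true + 0.1 * rng.standard_normal(200)
w_fixed = sgd_least_squares(A, b, gamma=0.05, num_iters=5000, decay=False)
w_decay = sgd_least_squares(A, b, gamma=0.5, num_iters=5000, decay=True)
print("error (fixed):", np.linalg.norm(w_fixed - w_true))
print("error (decay):", np.linalg.norm(w_decay - w_true))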
