
minibatch implementation

Good evening,

Looking online, I find different ways to implement minibatch GD. Some select B random samples out of the N with replacement, some without. Some do this random sampling at every iteration, while others split the dataset into batches of size B and sweep through each batch over a number of epochs. Which implementation is preferred, or which one should we remember for the exam? It also leads me to believe that, depending on the implementation, the minibatch gradient won't always be equal in expectation to the full gradient, right?
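For concreteness, here is a rough sketch of the two variants I mean (assuming NumPy and a toy least-squares gradient; the function names are just illustrative, not from the course):

import numpy as np

# Variant 1: draw B indices uniformly at random, with replacement, at every step.
def minibatch_with_replacement(X, y, B, rng):
    idx = rng.integers(0, len(X), size=B)
    return X[idx], y[idx]

# Variant 2: shuffle once per epoch, then sweep through disjoint batches of size B.
def epoch_minibatches(X, y, B, rng):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), B):
        idx = perm[start:start + B]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X, y = rng.standard_normal((100, 5)), rng.standard_normal(100)
w, lr = np.zeros(5), 0.1

# With-replacement sampling: one fresh random draw per iteration.
for _ in range(50):
    Xb, yb = minibatch_with_replacement(X, y, B=10, rng=rng)
    w -= lr * Xb.T @ (Xb @ w - yb) / len(Xb)  # minibatch least-squares gradient step

# Shuffle-and-partition: several full passes (epochs) over the data.
for _ in range(5):
    for Xb, yb in epoch_minibatches(X, y, B=10, rng=rng):
        w -= lr * Xb.T @ (Xb @ w - yb) / len(Xb)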

thanks!

Both are fine for me. Your last claim is false: in both cases the minibatch gradient is equal to the full gradient in expectation.

Thanks for the answer. I understand the proof showing that the uniformly sampled SGD gradient equals the full gradient in expectation, but I can't convince myself with a proof that this holds for all the different minibatch implementations. By linearity of expectation I can prove it for uniform random sampling with replacement, but not for the others :/
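One way the argument extends to sampling without replacement (a sketch using indicator variables; \nabla f_i denotes the gradient on sample i, which is my notation): for a batch S of size B drawn uniformly without replacement from \{1, \dots, N\}, each index i lands in S with probability B/N, so

\[
\mathbb{E}\!\left[\frac{1}{B}\sum_{i \in S} \nabla f_i(w)\right]
= \frac{1}{B}\sum_{i=1}^{N} \Pr(i \in S)\,\nabla f_i(w)
= \frac{1}{B}\sum_{i=1}^{N} \frac{B}{N}\,\nabla f_i(w)
= \frac{1}{N}\sum_{i=1}^{N} \nabla f_i(w).
\]

The same reasoning should cover the shuffle-and-partition scheme: any single batch of the shuffled epoch contains each index with probability B/N, so each individual step is still unbiased; the batches within one epoch are just not independent of each other.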

