
minibatch implementation

Good evening,

Looking online, I find different ways to implement minibatch GD. Some select B random samples out of the N with replacement, some without. Some do this random sampling at every iteration, while others split the dataset into batches of size B and sweep through each batch over a number of epochs. Which implementation is preferred, or which one should we remember for the exam? It also leads me to believe that, depending on the implementation, the minibatch gradient won't always be equal in expectation to the full gradient, right?
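For concreteness, here is a rough sketch of the two variants I mean (assuming NumPy and a toy least-squares gradient; the function names are just illustrative, not from the course):

import numpy as np

# Variant 1: draw B indices uniformly at random, with replacement, at every step.
def minibatch_with_replacement(X, y, B, rng):
    idx = rng.integers(0, len(X), size=B)
    return X[idx], y[idx]

# Variant 2: shuffle once per epoch, then sweep through disjoint batches of size B.
def epoch_minibatches(X, y, B, rng):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), B):
        idx = perm[start:start + B]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X, y = rng.standard_normal((100, 5)), rng.standard_normal(100)
w, lr = np.zeros(5), 0.1

# With-replacement sampling: one fresh random draw per iteration.
for _ in range(50):
    Xb, yb = minibatch_with_replacement(X, y, B=10, rng=rng)
    w -= lr * Xb.T @ (Xb @ w - yb) / len(Xb)  # minibatch least-squares gradient step

# Shuffle-and-partition: several full passes (epochs) over the data.
for _ in range(5):
    for Xb, yb in epoch_minibatches(X, y, B=10, rng=rng):
        w -= lr * Xb.T @ (Xb @ w - yb) / len(Xb)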

thanks!

Both are fine for me. Your last claim is false: in both cases the minibatch gradient is equal to the full gradient in expectation.

Thanks for the answer. I understand the proof showing that the uniformly sampled SGD gradient equals the full gradient in expectation, but I can't convince myself with a proof that this holds for all the different minibatch implementations. By linearity of expectation I can prove it for uniform random sampling with replacement, but not for the others :/
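One way the argument extends to sampling without replacement (a sketch using indicator variables; \nabla f_i denotes the gradient on sample i, which is my notation): for a batch S of size B drawn uniformly without replacement from \{1, \dots, N\}, each index i lands in S with probability B/N, so

\[
\mathbb{E}\!\left[\frac{1}{B}\sum_{i \in S} \nabla f_i(w)\right]
= \frac{1}{B}\sum_{i=1}^{N} \Pr(i \in S)\,\nabla f_i(w)
= \frac{1}{B}\sum_{i=1}^{N} \frac{B}{N}\,\nabla f_i(w)
= \frac{1}{N}\sum_{i=1}^{N} \nabla f_i(w).
\]

The same reasoning should cover the shuffle-and-partition scheme: any single batch of the shuffled epoch contains each index with probability B/N, so each individual step is still unbiased; the batches within one epoch are just not independent of each other.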

