Mini-Batch SGD

Hello, could you explain the second equality? And why does the inner product disappear after the third equality?
[Attached image: Screenshot 2021-06-25 154901.jpg]

Top comment

The second equality uses linearity of expectation. The expectation of a sum equals the sum of the expectations, so the expectation and the summation can swap order.
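Linearity of expectation does not require the summed terms to be independent. Here is a minimal empirical check with hypothetical, deliberately correlated terms \(X_1, \dots, X_B\) (this setup is an illustration, not the expression from the screenshot). Sample averages are linear too, so "average of the batch sums" and "sum of the per-term averages" agree up to floating-point rounding.

```python
import numpy as np

# Hypothetical setup (not from the thread): T samples of B terms X_1..X_B.
# The terms share a common factor z, so they are correlated; linearity of
# expectation still lets us swap "expectation" and "sum".
rng = np.random.default_rng(1)
T, B = 10_000, 8
z = rng.normal(size=(T, 1))                              # shared factor
X = z * np.arange(1, B + 1) + rng.normal(size=(T, B))    # shape (T, B)

mean_of_sum = X.sum(axis=1).mean()   # estimate of E[sum_i X_i]
sum_of_means = X.mean(axis=0).sum()  # sum_i of estimates of E[X_i]
print(mean_of_sum, sum_of_means)     # equal up to floating-point rounding
```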

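The third equality uses the fact that the stochastic gradient is unbiased: conditioned on \(\mathbf{x}_t\), \(\mathbb{E}[\mathbf{g}_t] = \nabla f(\mathbf{x}_t)\). Because \(\nabla f(\mathbf{x}_t)\) is fixed once \(\mathbf{x}_t\) is given, it can be pulled out of the expectation, so each cross-term (inner product) becomes \(\mathbb{E}[-2\langle \mathbf{g}_t, \nabla f(\mathbf{x}_t)\rangle] = -2\langle \mathbb{E}[\mathbf{g}_t], \nabla f(\mathbf{x}_t)\rangle = -2\|\nabla f(\mathbf{x}_t)\|^2\). That is why no inner product remains after the third equality.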
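A quick Monte Carlo check of the unbiasedness argument. It is a sketch under assumptions that are not in the thread: a hypothetical quadratic finite-sum objective, mini-batch indices drawn uniformly with replacement, and a fixed iterate \(\mathbf{x}_t\) (so everything is conditional on \(\mathbf{x}_t\)). It estimates \(\mathbb{E}\langle \mathbf{g}_t, \nabla f(\mathbf{x}_t)\rangle\) and compares it with \(\|\nabla f(\mathbf{x}_t)\|^2\). The last lines use \(\mathbb{E}\|\mathbf{g}_t - \nabla f(\mathbf{x}_t)\|^2\) only as an example of an expansion whose cross-term collapses this way; whether the screenshot expands exactly this quantity is an assumption.

```python
import numpy as np

# Hypothetical finite-sum objective, chosen only for illustration:
#   f(x) = (1/n) * sum_j 0.5 * ||x - c_j||^2
# so the per-sample gradient is x - c_j and grad f(x) = x - mean_j(c_j).
rng = np.random.default_rng(0)
n, d, B, T = 50, 3, 8, 100_000  # data points, dimension, batch size, trials
c = rng.normal(size=(n, d))     # hypothetical data points c_j
x_t = rng.normal(size=d)        # current iterate, held fixed (we condition on it)

grad_f = x_t - c.mean(axis=0)   # exact gradient at x_t

# T independent mini-batch gradients: each averages B per-sample gradients
# with indices drawn uniformly *with replacement*, which makes g_t unbiased.
idx = rng.integers(0, n, size=(T, B))
g = (x_t - c[idx]).mean(axis=1)        # shape (T, d)

cross = g @ grad_f                     # <g_t, grad f(x_t)> for each trial
print(cross.mean(), grad_f @ grad_f)   # approx. equal: E<g_t, grad f> = ||grad f||^2

# Example consequence: E||g_t - grad f||^2 = E||g_t||^2 - 2||grad f||^2 + ||grad f||^2
lhs = np.mean(np.sum((g - grad_f) ** 2, axis=1))
rhs = np.mean(np.sum(g ** 2, axis=1)) - grad_f @ grad_f
print(lhs, rhs)                        # approx. equal, up to Monte Carlo error
```

Sampling with replacement is used because it makes each of the \(B\) per-sample gradients an unbiased estimate on its own, so their average is unbiased as well; sampling without replacement is also unbiased but would change the variance.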

Hope this helps.
