In the second step, when calculating p*, shouldn't the norm of grad(f) be squared?
Yes, good catch. The answer seems correct, though?
Yes, I agree.
Hello, could you please explain the first line of the solution for this exercise? I am not sure why we are allowed to split the expectation of (g_t - grad_f)^2. Is this a general property of expectation that I may have forgotten?
Yes, see the definition of variance on Wikipedia; that should make it clear! :)
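To spell the step out: a sketch, assuming the split in question is the usual variance identity and that g_t is an unbiased stochastic gradient, i.e. E[g_t] = grad f(x_t) with the expectation taken conditionally on x_t. Neither assumption is stated in this thread, so check both against the exercise. Expanding the square and using linearity of expectation gives:

```latex
\begin{align*}
\mathbb{E}\,\|g_t - \nabla f(x_t)\|^2
  &= \mathbb{E}\,\|g_t\|^2
     - 2\,\mathbb{E}\,\langle g_t, \nabla f(x_t) \rangle
     + \|\nabla f(x_t)\|^2 \\
  &= \mathbb{E}\,\|g_t\|^2
     - 2\,\langle \mathbb{E}[g_t], \nabla f(x_t) \rangle
     + \|\nabla f(x_t)\|^2 \\
  &= \mathbb{E}\,\|g_t\|^2 - \|\nabla f(x_t)\|^2 .
\end{align*}
```

The last line uses E[g_t] = grad f(x_t), so the cross term equals -2 ||grad f(x_t)||^2. This is the vector version of Var(X) = E[X^2] - (E[X])^2.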