At the end of the notebook is the question:
What do you observe about the rates of convergence of the two methods? Can you explain this difference?
The fixed-point iteration with the Newton update clearly converges faster, but I cannot work out why. Would someone care to explain?
This is related to the difference between gradient descent and Newton's method explained in the lecture. In short, Newton's method converges faster because it uses second-order information (the Hessian) to rescale each step: near the solution its error shrinks quadratically, whereas gradient descent, a fixed-point iteration that uses only the gradient, converges linearly, at a rate governed by the step size and the conditioning of the Hessian.
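Since the notebook itself isn't shown here, a small sketch on a toy 1D problem (not the notebook's actual function) illustrates the gap. Minimizing f(x) = cosh(x), which has its minimum at x* = 0, the Newton iterate drives the error down superlinearly while gradient descent only shaves off a constant factor per step:

```python
import math

# Toy example: minimize f(x) = cosh(x), minimizer x* = 0.
# f'(x) = sinh(x), f''(x) = cosh(x).

def gradient_descent(x, step=0.5, iters=10):
    """First-order update x <- x - step * f'(x); linear convergence."""
    errs = []
    for _ in range(iters):
        x = x - step * math.sinh(x)
        errs.append(abs(x))  # error |x - x*|
    return errs

def newton(x, iters=10):
    """Newton update x <- x - f'(x)/f''(x) = x - tanh(x);
    error shrinks superlinearly near the minimizer."""
    errs = []
    for _ in range(iters):
        x = x - math.sinh(x) / math.cosh(x)
        errs.append(abs(x))
    return errs

gd = gradient_descent(1.0)
nt = newton(1.0)
# Gradient descent roughly halves the error each step,
# while Newton reaches machine precision within a few iterations.
```

Starting from x = 1, gradient descent is still at an error around 1e-2 after five steps, while Newton is already far below 1e-12 by then, which matches the behaviour asked about in the notebook.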