### exam 2017 problem 11

In exam 2017 problem 11, the answer is as follows: "Since I use a linear activation function, I in fact just use a linear scheme. And since the problem is convex, SGD will give the same result as least squares. So we will get exactly the same result."
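Here is a minimal numerical sketch of that claim. It assumes a single linear layer with no hidden layers, trained with mean squared error on synthetic data. Deterministic full-batch gradient descent stands in for SGD so the run is reproducible. The data, the helper names `fit_least_squares` / `fit_gradient_descent`, and the iteration count are illustrative assumptions, not taken from the exam.

```python
import numpy as np

def fit_least_squares(X, y):
    # Closed-form least-squares solution via numpy's solver.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def fit_gradient_descent(X, y, n_iter=2000):
    # Minimise L(w) = ||Xw - y||^2 / n with full-batch gradient descent
    # (a deterministic stand-in for SGD, chosen for reproducibility).
    n, d = X.shape
    hessian = 2.0 / n * X.T @ X                      # constant, positive semidefinite
    step = 1.0 / np.linalg.eigvalsh(hessian).max()   # 1/L step size for a quadratic
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = 2.0 / n * X.T @ (X @ w - y)
        w -= step * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                                  # synthetic inputs
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)  # noisy linear targets

w_ls = fit_least_squares(X, y)
w_gd = fit_gradient_descent(X, y)
print(np.allclose(w_ls, w_gd))  # expected: True
```

The loss is convex and `X` has full column rank here, so there is a single minimizer. Any method that converges to a minimum therefore lands on the least-squares solution. On noisy data, plain SGD with a constant step keeps jittering around that point. Under the usual decaying step-size conditions, it approaches the same minimizer.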

When we state that "since the problem is convex, SGD will give the same result as least squares", how do we formally prove that the problem is convex?
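One standard route is to show that the Hessian of the loss is positive semidefinite everywhere. This sketch assumes the model is a single linear layer $f(x) = w^\top x$ (bias folded into $x$), trained with squared error on a data matrix $X \in \mathbb{R}^{n \times d}$ with targets $y$:

$$
\begin{aligned}
L(w) &= \frac{1}{n}\lVert Xw - y\rVert_2^2, \\
\nabla L(w) &= \frac{2}{n} X^\top (Xw - y), \\
\nabla^2 L(w) &= \frac{2}{n} X^\top X, \\
v^\top \nabla^2 L(w)\, v &= \frac{2}{n}\lVert Xv\rVert_2^2 \ge 0 \quad \text{for all } v \in \mathbb{R}^d.
\end{aligned}
$$

A twice-differentiable function whose Hessian is positive semidefinite everywhere is convex, so $L$ is convex. If $X$ has full column rank, the inequality is strict for $v \neq 0$ and $L$ is strictly convex. Setting $\nabla L = 0$ then gives the unique minimizer, which is the least-squares solution $w^\ast = (X^\top X)^{-1} X^\top y$.

One caveat applies to hidden layers with a linear activation. There, the loss is generally not convex in the individual weight matrices, even though the network still computes a linear map. The scalar case $g(a, b) = (1 - ab)^2$ already shows this: its Hessian at $(0, 0)$ is $\begin{pmatrix} 0 & -2 \\ -2 & 0 \end{pmatrix}$, which has eigenvalues $\pm 2$.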

Not sure about your question, tbh, but for the original question, the point I took away is that you need a non-linear activation function to get anything interesting out of NNs :).
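Here is a small sketch of why the activation matters. It uses hypothetical random weights `W1` and `W2` for a two-layer network without biases. With the identity activation, the two layers collapse into the single matrix `W2 @ W1`, so the network is still just a linear model. Inserting a ReLU breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))   # hypothetical hidden layer: 3 inputs -> 4 units
W2 = rng.normal(size=(2, 4))   # hypothetical output layer: 4 units -> 2 outputs
X = rng.normal(size=(5, 3))    # 5 example inputs, one per row

def linear_net(X):
    # Identity activation: output = (X W1^T) W2^T
    return (X @ W1.T) @ W2.T

def relu_net(X):
    # Same weights, but with a ReLU on the hidden layer
    return np.maximum(X @ W1.T, 0.0) @ W2.T

W_collapsed = W2 @ W1          # a single 2x3 linear map
print(np.allclose(linear_net(X), X @ W_collapsed.T))  # expected: True
print(np.allclose(relu_net(X), X @ W_collapsed.T))    # expected: False
```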

