Q 8 exam 2020

Hello,

I was wondering how to formally prove that the output of a linear network with more than one layer, \( \hat{Y} = W_l W_{l-1} \cdots W_1 X \), is a non-convex function of the weight matrices?

Thanks for any help!

Cheers,

Yann

Top comment

You can, for example, prove it for \( l = 2 \) with the matrices being just numbers: \( w_2 w_1 \), viewed as a function on the two-parameter plane \( (w_1, w_2) \), has a saddle at the origin, so it is a non-convex function.
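
A short worked version of this scalar argument, assuming for illustration that the input \( x \) is a fixed nonzero number (this choice is not from the exam statement):

\[
f(w_1, w_2) = w_2 w_1 x, \qquad
\nabla^2 f(w_1, w_2) = \begin{pmatrix} 0 & x \\ x & 0 \end{pmatrix}, \qquad
\text{eigenvalues } +x \text{ and } -x .
\]

The Hessian has one positive and one negative eigenvalue, so \( f \) is neither convex nor concave. An explicit counterexample with \( x = 1 \): \( f(1, -1) = f(-1, 1) = -1 \), but at the midpoint \( f(0, 0) = 0 > -1 \), which violates the convexity inequality \( f\left(\tfrac{u + v}{2}\right) \le \tfrac{1}{2} f(u) + \tfrac{1}{2} f(v) \).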

But to me the resulting function is a linear function, so it represents a plane. How can there be saddle points? We need at least x² complexity to have saddle points, no?

OK, I got it: it is a function of the weights!

So now I understand why there are saddle points.

But now I don't understand why it is "equally or less expressive than a one-layer linear network". To my understanding, the one-layer linear network must have fewer weights, so we are not in the same weight-space dimension.

What do we mean by expressivity?

Hi, check page 104 of the lecture notes; it is explained there.
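
In case it helps, here is a minimal sketch of the standard argument (not necessarily the wording of the lecture notes): expressivity is about which input-to-output maps the network can represent, not about how many weights it has. A product of matrices is again a matrix, so a multi-layer linear network always computes some single linear map \( W = W_l \cdots W_1 \), and the rank of \( W \) is at most the smallest layer width. The matrices `W1`, `W2`, `X` and the helpers `matmul` and `rank` below are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rank(M):
    """Rank via Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear column c in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


# Hypothetical two-layer linear network: 3 inputs -> 1 hidden unit -> 3 outputs.
W1 = [[1, 2, 3]]               # shape 1 x 3
W2 = [[1], [0], [2]]           # shape 3 x 1
X = [[1, 0], [0, 1], [1, 1]]   # two input columns, shape 3 x 2

W = matmul(W2, W1)             # equivalent single layer, shape 3 x 3

two_layer = matmul(W2, matmul(W1, X))
one_layer = matmul(W, X)

print(two_layer == one_layer)  # same outputs from one layer
print(rank(W))                 # bounded by the hidden width (here 1)
```

So the two-layer network is "equally expressive" when no hidden layer is narrower than the input and output, and "less expressive" when a narrow hidden layer forces \( W \) to be low-rank, as in this example: a one-layer network with a free 3 x 3 matrix can represent full-rank maps that this two-layer network cannot.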
