Matrix factorization ALS
Hi,
Could you please detail the steps to find the gradient with respect to Z ?
I tried first with respect to \(Z_{i,j}\); the result is:
\(\sum_{d=1}^{D} -w_{d,j}\,(x_{d,i} - (WZ^\top)_{d,i})\)
Thus, I think that the derivative with respect to \(Z_i\) is:
\(\sum_{d=1}^{D} -(x_{d,i} - (WZ^\top)_{d,i})\, w_d\)
However, I don't know what the derivative with respect to the full matrix \(Z\) could be, or how to arrive at a matrix form.
Thank you for your help.
edit: I think this solution should work.
The derivative with respect to \(Z_i\) is:
\(\sum_{d=1}^{D} -W_d^\top\,(x_{d,i} - W_d Z_i^\top)\)
\(= -W^\top (x_i - W Z_i^\top) = 0\)
\(\Rightarrow W^\top x_i = W^\top W Z_i^\top,\)
where \(W_d\) is the \(d\)-th row of \(W\) and \(x_i\) is the \(i\)-th column of \(X\).
For some reason that I can't explain, replacing \(Z_i\) by \(Z\) and \(x_i\) by \(X\) seems to give the derivative with respect to \(Z\).
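As a quick numerical sanity check of the normal equations \(W^\top X = W^\top W Z^\top\), here is a small NumPy sketch (the dimensions and variable names are my own, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, K = 6, 5, 3                  # illustrative sizes: X is D x N, W is D x K, Z is N x K
X = rng.normal(size=(D, N))
W = rng.normal(size=(D, K))

# Setting the gradient (W Z^T - X)^T W to zero gives the normal
# equations W^T W Z^T = W^T X, i.e. Z = X^T W (W^T W)^{-1}.
Z = np.linalg.solve(W.T @ W, W.T @ X).T

# The gradient at this Z should vanish (up to round-off).
grad = (W @ Z.T - X).T @ W
print(np.max(np.abs(grad)))
```

The `solve` handles all columns \(x_i\) at once, which is exactly the "replace \(Z_i\) by \(Z\)" step.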
However, I am still struggling with the ALS derivative when some of the data is missing.
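For the missing-data case, one common approach (weighted/masked ALS) restricts the sum over \(d\) to observed entries, so each column \(i\) gets its own normal equations built from only the observed rows of \(W\). A sketch under that assumption, with a boolean mask `M` and a ridge term `lam` standing in for \(\lambda_z\) (both my notation):

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, K = 6, 5, 3
X = rng.normal(size=(D, N))
W = rng.normal(size=(D, K))
M = rng.random((D, N)) < 0.7       # True where x_{d,i} is observed
lam = 0.1                          # regularization weight lambda_z

# With missing data the sum over d runs only over observed entries,
# so each column i has its own normal equations:
#   (W_i^T W_i + lam I) z_i = W_i^T x_i,  W_i = rows of W where M[:, i] is True
Z = np.zeros((N, K))
for i in range(N):
    obs = M[:, i]
    Wi = W[obs]
    Z[i] = np.linalg.solve(Wi.T @ Wi + lam * np.eye(K), Wi.T @ X[obs, i])

# Gradient restricted to observed entries (plus the ridge term) should vanish.
grad = ((W @ Z.T - X) * M).T @ W + lam * Z
print(np.max(np.abs(grad)))
```

The only change from the full-data case is that the mask zeros out the residual on unobserved entries, which is why the solve can no longer be done for all columns in one shot.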
Hello,
To directly have a compact form and when in doubt I highly advise going back to the basics.
How is the gradient of a real-valued function defined? \(f(x+h)= f(x) + \langle\nabla f(x),h\rangle + o(h)\); when the input \(x\) is a matrix, the inner product is \(\langle x,y\rangle = \mathrm{tr}(x^\top y)\).
Now if your function is \(f(Z) = \frac{1}{2}\Vert X-WZ^\top\Vert^2_F\), you can take inspiration from least squares and say the gradient is \((WZ^\top-X)^\top W\). If you can't see it, go back to the definition: \(f(Z+H) = f(Z)+ \mathrm{Tr}((WH^\top)^\top(WZ^\top-X))+o(H)\). We can write \(\mathrm{Tr}((WH^\top)^\top(WZ^\top-X))\) as \(\mathrm{Tr}(HW^\top(WZ^\top-X)) = \mathrm{Tr}(W^\top(WZ^\top-X)H)=\mathrm{Tr}([(WZ^\top-X)^\top W]^\top H)\), which means the gradient is \((WZ^\top-X)^\top W\).
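When in doubt about a matrix gradient, it can also be checked numerically against finite differences. A small sketch (dimensions and names are mine; since \(f\) is quadratic, a central difference is exact up to round-off):

```python
import numpy as np

rng = np.random.default_rng(2)
D, N, K = 5, 4, 3
X = rng.normal(size=(D, N))
W = rng.normal(size=(D, K))
Z = rng.normal(size=(N, K))

f = lambda Z: 0.5 * np.linalg.norm(X - W @ Z.T, 'fro') ** 2
grad = (W @ Z.T - X).T @ W          # the claimed gradient

# Compare each entry against a central finite difference.
eps = 1e-6
num = np.zeros_like(Z)
for i in range(N):
    for j in range(K):
        E = np.zeros_like(Z)
        E[i, j] = eps
        num[i, j] = (f(Z + E) - f(Z - E)) / (2 * eps)

print(np.max(np.abs(grad - num)))
```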
If you add to this the term \(\lambda_z Z\) and set everything to zero, you should recover the first expression.
Thank you for your answer. The only part that I don't understand is how you find, from the definition, that
\(f(Z+H)=f(Z)+\mathrm{Tr}((WH^\top)^\top(WZ^\top-X))+o(H)\).
You substitute and expand; you know the function explicitly.
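Concretely, writing out the expansion of the squared Frobenius norm:

```latex
\begin{aligned}
f(Z+H) &= \tfrac{1}{2}\Vert X - W(Z+H)^\top\Vert_F^2
        = \tfrac{1}{2}\Vert (X - WZ^\top) - WH^\top\Vert_F^2 \\
       &= \tfrac{1}{2}\Vert X - WZ^\top\Vert_F^2
          - \langle X - WZ^\top,\, WH^\top\rangle
          + \tfrac{1}{2}\Vert WH^\top\Vert_F^2 \\
       &= f(Z) + \mathrm{Tr}\!\big((WH^\top)^\top (WZ^\top - X)\big) + o(H),
\end{aligned}
```

where the cross term uses \(\langle A,B\rangle = \mathrm{Tr}(A^\top B)\) with a sign flip, and the last term is \(O(\Vert H\Vert^2) = o(H)\).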