Matrix factorization ALS
Hi,
Could you please detail the steps to find the gradient with respect to Z ?
I tried first with respect to \(Z_{i,j}\); the result is:
\(\sum_{d=1}^{D} -w_{d,j}\,(x_{d,i} - (WZ^\top)_{d,i})\)
Thus, I think that the derivative with respect to \(Z_i\) is:
\(\sum_{d=1}^{D} -(x_{d,i} - (WZ^\top)_{d,i})\, w_d\)
However, I don't know what the derivative with respect to the full matrix \(Z\) could be, or how to arrive at a matrix form.
Thank you for your help.
edit: I think this solution should work.
The derivative with respect to \(Z_i\) is:
\(\sum_{d=1}^{D} -W_d^\top\,(x_{d,i} - W_d Z_i^\top)\)
\(= -W^\top (x_i - W Z_i^\top) = 0\)
\(\Rightarrow W^\top x_i = W^\top W Z_i^\top,\)
where \(W_d\) is the \(d\)-th row of \(W\) and \(x_i\) is the \(i\)-th column of \(X\).
For some reason that I can't explain, replacing \(Z_i\) by \(Z\) and \(x_i\) by \(X\) seems to give the derivative with respect to \(Z\).
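As a quick numerical sanity check of the normal equations \(W^\top X = W^\top W Z^\top\), here is a small NumPy sketch (the dimensions and variable names are my own, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, K = 6, 5, 3                  # illustrative sizes: X is D x N, W is D x K, Z is N x K
X = rng.normal(size=(D, N))
W = rng.normal(size=(D, K))

# Setting the gradient (W Z^T - X)^T W to zero gives the normal
# equations W^T W Z^T = W^T X, i.e. Z = X^T W (W^T W)^{-1}.
Z = np.linalg.solve(W.T @ W, W.T @ X).T

# The gradient at this Z should vanish (up to round-off).
grad = (W @ Z.T - X).T @ W
print(np.max(np.abs(grad)))
```

The `solve` handles all columns \(x_i\) at once, which is exactly the "replace \(Z_i\) by \(Z\)" step.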
However, I am still struggling with the ALS derivative when some of the data is missing.
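For the missing-data case, one common approach (weighted/masked ALS) restricts the sum over \(d\) to observed entries, so each column \(i\) gets its own normal equations built from only the observed rows of \(W\). A sketch under that assumption, with a boolean mask `M` and a ridge term `lam` standing in for \(\lambda_z\) (both my notation):

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, K = 6, 5, 3
X = rng.normal(size=(D, N))
W = rng.normal(size=(D, K))
M = rng.random((D, N)) < 0.7       # True where x_{d,i} is observed
lam = 0.1                          # regularization weight lambda_z

# With missing data the sum over d runs only over observed entries,
# so each column i has its own normal equations:
#   (W_i^T W_i + lam I) z_i = W_i^T x_i,  W_i = rows of W where M[:, i] is True
Z = np.zeros((N, K))
for i in range(N):
    obs = M[:, i]
    Wi = W[obs]
    Z[i] = np.linalg.solve(Wi.T @ Wi + lam * np.eye(K), Wi.T @ X[obs, i])

# Gradient restricted to observed entries (plus the ridge term) should vanish.
grad = ((W @ Z.T - X) * M).T @ W + lam * Z
print(np.max(np.abs(grad)))
```

The only change from the full-data case is that the mask zeros out the residual on unobserved entries, which is why the solve can no longer be done for all columns in one shot.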
Hello,
To directly have a compact form and when in doubt I highly advise going back to the basics.
How is the gradient of a real-valued function defined? \(f(x+h)= f(x) + \langle\nabla f(x),h\rangle + o(h)\); when the input \(x\) is a matrix, the inner product is \(\langle x,y\rangle = \mathrm{tr}(x^\top y)\).
Now if your function is \(f(Z) = \frac{1}{2}\Vert X-WZ^\top\Vert^2_F\), you can take inspiration from least squares and say the gradient is \((WZ^\top-X)^\top W\). If you can't see it, go back to the definition: \(f(Z+H) = f(Z)+ \mathrm{Tr}((WH^\top)^\top(WZ^\top-X))+o(H)\). We can write \(\mathrm{Tr}((WH^\top)^\top(WZ^\top-X))\) as \(\mathrm{Tr}(HW^\top(WZ^\top-X)) = \mathrm{Tr}(W^\top(WZ^\top-X)H)=\mathrm{Tr}([(WZ^\top-X)^\top W]^\top H)\), which means the gradient is \((WZ^\top-X)^\top W\).
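When in doubt about a matrix gradient, it can also be checked numerically against finite differences. A small sketch (dimensions and names are mine; since \(f\) is quadratic, a central difference is exact up to round-off):

```python
import numpy as np

rng = np.random.default_rng(2)
D, N, K = 5, 4, 3
X = rng.normal(size=(D, N))
W = rng.normal(size=(D, K))
Z = rng.normal(size=(N, K))

f = lambda Z: 0.5 * np.linalg.norm(X - W @ Z.T, 'fro') ** 2
grad = (W @ Z.T - X).T @ W          # the claimed gradient

# Compare each entry against a central finite difference.
eps = 1e-6
num = np.zeros_like(Z)
for i in range(N):
    for j in range(K):
        E = np.zeros_like(Z)
        E[i, j] = eps
        num[i, j] = (f(Z + E) - f(Z - E)) / (2 * eps)

print(np.max(np.abs(grad - num)))
```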
If you add to this the term \(\lambda_z Z\) and set everything to zero, you should recover the first expression.
Thank you for your answer. The only part that I don't understand is how you find, from the definition, that
\(f(Z+H)=f(Z)+\mathrm{Tr}((WH^\top)^\top(WZ^\top-X))+o(H)\).
You substitute and expand; you know the function explicitly.
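Concretely, writing out the expansion of the squared Frobenius norm:

```latex
\begin{aligned}
f(Z+H) &= \tfrac{1}{2}\Vert X - W(Z+H)^\top\Vert_F^2
        = \tfrac{1}{2}\Vert (X - WZ^\top) - WH^\top\Vert_F^2 \\
       &= \tfrac{1}{2}\Vert X - WZ^\top\Vert_F^2
          - \langle X - WZ^\top,\, WH^\top\rangle
          + \tfrac{1}{2}\Vert WH^\top\Vert_F^2 \\
       &= f(Z) + \mathrm{Tr}\!\big((WH^\top)^\top (WZ^\top - X)\big) + o(H),
\end{aligned}
```

where the cross term uses \(\langle A,B\rangle = \mathrm{Tr}(A^\top B)\) with a sign flip, and the last term is \(O(\Vert H\Vert^2) = o(H)\).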