
ALS optimisation

I was wondering if you could give more information about the derivation of the ALS update rule (for instance, the derivation for the matrix W)?
For the ALS update rule with missing entries, is the update rule available in closed (matrix) form? If so, could you explain how to derive it? Or can it only be written as a sum?


Top comment

Hi Alessio,

Or is it only writable using a sum?

I'm not sure if this is the only way, but I would certainly recommend writing the objective as a sum over the dataset. This way it is also more intuitive to differentiate the objective with respect to any of the rows of the two matrices. The ALS updates are then given by setting those derivatives to zero. Once you have done those derivations, you can check the results: if the expressions are linear, with sums in them, you can try to write them back in matrix form.
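To make the sum-over-the-dataset approach concrete, here is a minimal sketch of the per-row ALS update with missing entries. All names and sizes are illustrative assumptions (a small random \(X\), a boolean observation mask, fixed \(Z\)): for each row \(w_d\), setting the derivative of the masked objective to zero gives \(w_d = (\sum_n z_n z_n^\top + \lambda_w I)^{-1} \sum_n x_{dn} z_n\), where \(n\) ranges over the observed entries of row \(d\).

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, K = 6, 5, 3                      # illustrative sizes: rows, columns, latent dim
X = rng.normal(size=(D, N))            # toy data matrix
mask = rng.random((D, N)) < 0.7        # True where an entry is observed
Z = rng.normal(size=(N, K))            # current factor Z (held fixed in this step)
lam_w = 0.1                            # regularization strength for W

# Per-row ALS update: w_d = (sum_n z_n z_n^T + lam_w I)^{-1} (sum_n x_{dn} z_n),
# with n ranging only over the observed entries of row d.
W = np.empty((D, K))
for d in range(D):
    obs = mask[d]                                        # observed columns of row d
    A = Z[obs].T @ Z[obs] + lam_w * np.eye(K)            # sum of z_n z_n^T + lam_w I
    b = Z[obs].T @ X[d, obs]                             # sum of x_{dn} z_n
    W[d] = np.linalg.solve(A, b)

# Sanity check: the gradient of the masked objective w.r.t. W should vanish.
grad = ((W @ Z.T - X) * mask) @ Z + lam_w * W
print(np.abs(grad).max())              # close to zero up to numerical precision
```

Note that with missing entries the system matrix \(A\) differs from row to row, which is exactly why no single matrix inverse covers all rows and the sum form is the natural starting point.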


As a follow-up to Alessio's question, I was trying to compute the derivative with respect to W of the closed form of the loss function given for ALS. For some reason this task is a bit challenging, and I struggle to recover the result for \(W^\top\) presented in the slides when I set the derivative to zero.

For instance, if this is the closed-form representation of the loss function (I added regularization terms):

\(\frac{1}{2} \|X - WZ^\top\|_F^2 + \frac{1}{2} \lambda_w \|W\|_F^2 + \frac{1}{2} \lambda_z \|Z\|_F^2\)

Whenever I take the derivative of this loss with respect to W, set it to 0, and solve for W, I get dimension errors or a result that is not equivalent to the one given in the slides. Additionally, why is the solution given for \(W^\top\) instead of simply W? Does this imply it is easier to take the derivative of the loss with respect to \(W^\top\)? Even so, I cannot seem to recover the solution given; I am most likely doing something wrong. Taking the derivative as a sum over the whole dataset also seems like a much more arduous and time-consuming task. Is that usually the way you would recommend doing these computations?

Would it be possible to give us a bit more detail on how this solution was derived, or at least a few additional hints on ways to compute it? Getting the solution for \(Z^\top\) is not a problem, though.

Thank you very much in advance!

The gradient with respect to \(W\) is \((WZ^\top - X)Z + \lambda_w W\).
Setting this to zero gives \(W (Z^\top Z + \lambda_w I) = XZ\),
in other words \(W = XZ (Z^\top Z + \lambda_w I)^{-1}\).
Now transpose everything and you get the formula given in the lecture.
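As a quick numerical sanity check of this closed form (with illustrative random data and sizes, all names my own), we can compute \(W = XZ(Z^\top Z + \lambda_w I)^{-1}\) and verify that the gradient above vanishes:

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, K = 8, 6, 3                      # illustrative sizes
X = rng.normal(size=(D, N))
Z = rng.normal(size=(N, K))
lam_w = 0.5

# Closed-form update: W = X Z (Z^T Z + lam_w I)^{-1}
W = X @ Z @ np.linalg.inv(Z.T @ Z + lam_w * np.eye(K))

# The gradient (W Z^T - X) Z + lam_w W should be (numerically) zero.
grad = (W @ Z.T - X) @ Z + lam_w * W
print(np.abs(grad).max())              # close to zero up to numerical precision
```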

Transposing does not change anything, since you can always transpose a second time. (Note that \(Z^\top Z + \lambda_w I\) is symmetric, so its inverse is its own transpose, which is why the transposed form looks so clean.)
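The equivalence of the two forms can be checked directly. A small sketch with illustrative random data: since \(A = Z^\top Z + \lambda_w I\) is symmetric, \(W^\top = A^{-1} Z^\top X^\top\) matches the transpose of \(W = XZA^{-1}\).

```python
import numpy as np

rng = np.random.default_rng(2)
D, N, K = 8, 6, 3                      # illustrative sizes
X = rng.normal(size=(D, N))
Z = rng.normal(size=(N, K))
lam_w = 0.5

A = Z.T @ Z + lam_w * np.eye(K)        # symmetric, so (A^{-1})^T = A^{-1}
W = X @ Z @ np.linalg.inv(A)           # W  = X Z A^{-1}
Wt = np.linalg.solve(A, Z.T @ X.T)     # W^T = A^{-1} Z^T X^T (transposed form)
print(np.allclose(W.T, Wt))            # True
```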

