Connect your moderator Slack workspace to receive post notifications:
Sign in with Slack

Coordinate Ascent algorithm clarifications

Hello,

The recent discussion (on Discord and on the forum) has really confused me regarding the Coordinate descent algorithm.

As far as I understood it, the way to perform an update on \( \alpha_n \) is to calculate the partial derivative wrt. \( \alpha_n \) of \( G(\alpha, w) = \alpha^T1 - \frac{1}{2\lambda}\alpha YXX^TY\alpha \), setting it equal to 0, and solving that to obtain the new \( \alpha_n \). Since \(G(\alpha, w)\) is concave in \(\alpha\), this should give us the coordinate of the maximum of G for that specific n. This obtained \( \alpha_n \) might not be in the range [0, 1], so we need to clamp it to that range.

Am I making sense so far? Or am I going on completely the wrong path?

Now, when it comes to solving that equation, there was a hint given to obtain the update in terms of w, among other factors. Calculating the partial derivative gives me \(1 - \frac{1}{\lambda}(\alpha^TYXX^TY)_n = 1 - (YXw)_n = 0\).
I might be making a really stupid mistake with this, but I do not see \(\alpha_n\) in that equation at all, giving me no way to explicitly solve for it.

I'd like to ask the TAs for some advice on what could be going wrong with my reasoning, and some hints as to which path to follow.

Thanks in advance for you help.

Hi Carlo,

Did you check the updated version of the lab sheet ? An announcement was put on moodle concerning the fact we added some intermediate questions to help you. You can find this updated version on the github repository.

As far as I understood it, the way to perform an update on αn is to calculate the partial derivative wrt. αn of G(α,w)=αT1−12λαYXXTYα, setting it equal to 0, and solving that to obtain the new αn.
This is correct (however it is equivalent but easier to follow the reasoning from the lab sheet where a parameter \( \gamma \) is introduced).

Since G(α,w) is concave in α, this should give us the coordinate of the maximum of G for that specific n
I am not sure I understand this statement, could you please reformulate it ?

This obtained αn might not be in the range [0, 1], so we need to clamp it to that range.
This is correct, however you need to show that clamping it does indeed solve the initial constrained problem.

Am I making sense so far? Or am I going on completely the wrong path?
Apart from the second sentence where I am not sure what you meant, you are making total sense ;)

Now, when it comes to solving that equation, there was a hint given to obtain the update in terms of w, among other factors. Calculating the partial derivative gives me 1−1λ(αTYXXTY)n=1−(YXw)n=0.
I might be making a really stupid mistake with this, but I do not see αn in that equation at all, giving me no way to explicitly solve for it.

\( 1 - \frac{1}{\lambda} (\alpha^T Y X X^T Y)_n = 0 \) is exactly the equation you need to solve to update \( \alpha_n \), notice that this expression involves all the values \(\alpha_i\) , \( 1 \leq i \leq N \), you then need to isolate the term \( \alpha_n\) in order to express it in terms of all the other \(\alpha_i\) , \( 1 \leq i \leq N \), \( i \neq n \). By playing with this equation a bit you should be able to express the update in terms of the required parameters. However, though in the end it is equivalent, I would encourage you to follow the reasonning which is proposed in the lab sheet (where a parameter \( \gamma \) is introduced), this way the update appears more naturally.

Hope this helps !

Best,

Scott

Page 1 of 1