Why transform the prediction to {-1, 1} label again?

Hi,
In exercise 7, I'm wondering why we use the transformation \(y = 2\tilde{y}-1\) when computing the prediction from X@w, given that w should have been calculated with the hinge loss on labels \(y \in \{1, -1\}\) and not on \(\tilde{y} \in \{1, 0\}\). With w calculated from the first option, transforming the result X@w shouldn't be necessary, or so I understood.
Also, do we select only X@w > 0 to make sure we get positive values that are more likely to match the labels {0, 1}, or does it have to do with the margin?
Thanks!

Hello,

I am not sure I totally understood what your questions were, but here is my attempt to answer them. Let me know if I did not understand your actual questions.

In exercise 7, I'm wondering why we use the transformation \(y = 2\tilde{y}-1\) when computing the prediction from X@w, given that w should have been calculated with the hinge loss on labels \(y \in \{1, -1\}\) and not on \(\tilde{y} \in \{1, 0\}\). With w calculated from the first option, transforming the result X@w shouldn't be necessary, or so I understood.
When calculating the prediction for a given sample \( x \), we want the predicted label to be consistent with the training labels. Hence in our case we want all predictions to be in \( \{ -1, 1 \} \). However, \(x^T w\) is not a prediction label, it is a real number. We have to convert this real number to \( 1 \) if \( x^T w \) is positive and to \( -1 \) if \( x^T w \) is negative. One way to do this in Python is to return 2 * (X@w > 0) - 1 \( \in \{ -1, 1 \} \).
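To make this concrete, here is a minimal sketch (assuming NumPy and made-up numbers; predict_labels is just an illustrative name, not a function from the exercise):

import numpy as np

def predict_labels(w, X):
    # X @ w gives a real-valued score per sample;
    # (X @ w > 0) is a boolean array, and 2 * bool - 1
    # maps True -> 1 and False -> -1.
    return 2 * (X @ w > 0) - 1

X = np.array([[1.0, 2.0],
              [-1.0, 0.5]])
w = np.array([0.5, -1.0])
print(X @ w)                 # real-valued scores: [-1.5 -1. ]
print(predict_labels(w, X))  # labels in {-1, 1}: [-1 -1]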

Also, do we select only X@w > 0 to make sure we get positive values that are more likely to match the labels {0, 1}, or does it have to do with the margin?
I do not understand this question, could you please reformulate? We do not "select" only "X@w > 0": in Python, X@w > 0 returns a boolean vector (True/False entries, which behave like \(1\)'s and \(0\)'s in arithmetic).
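For instance, a small sketch (assuming NumPy and made-up numbers):

import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 1.0]])
w = np.array([2.0, -3.0])
print(X @ w)                # [ 2. -3.]     real-valued scores
print(X @ w > 0)            # [ True False] boolean vector
print(2 * (X @ w > 0) - 1)  # [ 1 -1]       labels in {-1, 1}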

Let me know if there are still some unclear things. Best,

Scott

Thank you for your reply. I now realize that X@w produces a real number, and it's clearer to me why we write 2 * (X@w > 0) - 1. I think my confusion came from following the theory on the side, where we don't handle this part:
We have to convert this real number to \(1\) if \(x^T w\) is positive and to \(-1\) if \(x^T w\) is negative.
and hence I thought the only time we do 2*...-1 is when transforming labels from {0, 1} to {-1, 1}, but now I realize that's not what's at stake here.
Thank you very much !
