Exam - SVM

Hi,

I have a question regarding the problem 13 in the 2018 final exam. (image below)

[image: screenshot of problem 13 from the 2018 final exam]

If I understood correctly, the points that lie exactly on the margin have a loss value of 0, since \(y_{n} x_{n}^{\top} w = 1\). So they don't affect the value of the loss function, and thus the weight update. Then why does the margin increase when we remove these points?
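
For reference, the loss I am assuming here is the regularized hinge loss (possibly without the \(\frac{1}{N}\) factor, depending on the convention):

\[
\mathcal{L}(w) \;=\; \frac{1}{N}\sum_{n=1}^{N}\max\bigl(0,\; 1 - y_n x_n^\top w\bigr) \;+\; \frac{\lambda}{2}\,\|w\|^2 ,
\]

so a point with \(y_n x_n^\top w = 1\) contributes exactly 0 to the data term.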

Thanks

I'm not a TA, but the way I understood it is that the solution is usually very sparse (in other words, there are usually only a few support vectors compared to the amount of data). So if you remove a support vector, the margin will probably increase, because a new point further away will now determine the margin and make it wider. If that makes sense?
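
If you want to check this numerically, here is a minimal sketch (my own illustration, assuming scikit-learn and a synthetic two-blob dataset, not the exam's setup): fit a linear SVM, drop one support vector, refit, and compare the margin width 2/||w||.

import numpy as np
from sklearn.svm import SVC

# Two well-separated Gaussian blobs (made-up toy data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 0.7, (50, 2)), rng.normal(3.0, 0.7, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

def fit_and_margin(X, y):
    # Linear SVM; for a linear kernel the margin width is 2 / ||w||.
    clf = SVC(kernel="linear", C=10.0).fit(X, y)
    return clf, 2.0 / np.linalg.norm(clf.coef_.ravel())

clf, width_before = fit_and_margin(X, y)

# Drop one support vector and refit. The margin usually gets wider,
# though it can stay the same if other points also sit on the margin.
idx = clf.support_[0]
clf2, width_after = fit_and_margin(np.delete(X, idx, axis=0), np.delete(y, idx))
print(f"margin before: {width_before:.3f}  after: {width_after:.3f}")

(Removing a point that is not a support vector would leave the solution unchanged.)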

Edit: I think the key here is that lambda is very large, which restricts the complexity of the model.

If we use the dual formulation, then the points exactly on the margin have an alpha between 0 and 1. With alpha greater than 0, a point is taken into account when computing the loss function and the hyperplane; that is why the margin will change. Given that the point was on the margin, and that what we want is the hyperplane with maximal spacing, once the point is no longer there the margin can increase.
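
To make this concrete, and assuming the box constraint on alpha is normalized to \([0, 1]\) as above, the complementary-slackness conditions read roughly:

\[
\alpha_n = 0 \;\Rightarrow\; y_n x_n^\top w \ge 1, \qquad
0 < \alpha_n < 1 \;\Rightarrow\; y_n x_n^\top w = 1, \qquad
\alpha_n = 1 \;\Rightarrow\; y_n x_n^\top w \le 1 ,
\]

so a point exactly on the margin is precisely the case where \(\alpha_n\) may lie strictly between 0 and 1.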

[image: svm.jpg]

Having said that, I am still not quite sure, because you are right that the points that lie exactly on the margin have a loss value of 0, and even if alpha is 0.9 we are multiplying 0.9 by 0.

@anonymous, could you explain why the key is the size of lambda? How would the results change if lambda was smaller?

I agree with you and would love to hear more about your reasoning, @anonymous1

I think the point is that you can now decrease the loss function, including the regularization term, by making the norm of the weights smaller without incurring a higher classification error. That is, you can increase the margin without making any wrong predictions (as we have just removed the support vector) while still reducing your regularization error. Note that the function being optimized above includes the L2 term.

Adding to the last comment, the total width of the margin is equal to 2/||w||. Thus, if ||w|| is smaller, the total width of the margin will increase.
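
As a tiny worked example with made-up numbers: if \(\|w\| = 2\), the margin width is \(2/2 = 1\); if the optimizer can shrink the weights to \(\|w\| = 1.6\) without creating any margin violations, the width grows to \(2/1.6 = 1.25\) while the regularization term \(\frac{\lambda}{2}\|w\|^2\) drops from \(2\lambda\) to \(1.28\lambda\).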

@anonymous, why does removing a point on the margin imply that ||w|| gets smaller?

"@anonymous why removing a point in the margin implies that w gets smaller ?"

  1. We remove a point in the margin
  2. We can therefore make the margin bigger
    • The width of the margin is equal to 2/||w||, so increasing the margin actually means making the norm of the weights ||w|| smaller.
    • Notice that this also decreases the loss function, especially because the very large value of lambda makes the regularization term outweigh the misclassification term (a short sketch of the argument follows this list).
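
Sketch of step 2, assuming the removed point was the only one sitting exactly on the margin: after removal, the closest remaining point satisfies \(y_n x_n^\top w = 1 + \epsilon\) for some \(\epsilon > 0\). Replacing \(w\) by \(w' = w/(1+\epsilon)\) keeps \(y_n x_n^\top w' \ge 1\) for every remaining point, so all hinge terms stay at zero, while

\[
\frac{\lambda}{2}\|w'\|^2 \;=\; \frac{1}{(1+\epsilon)^2}\,\frac{\lambda}{2}\|w\|^2 \;<\; \frac{\lambda}{2}\|w\|^2 ,
\]

so the objective strictly decreases and the margin grows to \(2/\|w'\| = (1+\epsilon)\cdot 2/\|w\|\).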

OK, thanks for the clear explanation. Is it still true if we have more than one point on the margin?

Good point!!

I'm guessing that maybe the hyperplane could rotate and the margin could still increase even if there are other points on the margin. However, you are completely right that there might be cases in which the margin does not increase, for example if there is another point right next to the removed one on the margin.

Nevertheless, the question asks for the most likely scenario, and I guess we should assume that there are no duplicate points and that the margin is not completely covered with points.

Alright! Thanks :)

Hello, I have a question about this task.
We have the following formula for w:
[image: SVM w.jpg — formula for w in terms of the dual variables alpha]
So w depends on the values of alpha. But if a point lies exactly on the boundary of the margin, then its alpha can be anywhere from 0 to 1 inclusive. If alpha is set to 0, the point does not contribute to w, and nothing will change if it is removed. But if alpha is taken to be non-zero, then after removing this point, w will change. It seems to come out somewhat ambiguous.
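
For reference, the formula in the image is presumably the usual dual expansion of the weights (possibly with a course-specific scaling factor such as \(\frac{1}{\lambda N}\) in front):

\[
w \;=\; \sum_{n=1}^{N} \alpha_n\, y_n\, x_n ,
\]

which is why removing a point with \(\alpha_n > 0\) changes \(w\), while removing one with \(\alpha_n = 0\) does not.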

