
Set of classifiers is not convex

Hello,

In the first part of the first lecture (w7), when discussing the convexity of the loss function, I don't see why we need the set of classifiers to be convex (we saw this concept earlier, but I don't see how it applies here). Indeed, if the set contains only convex functions, then shouldn't the loss be convex?

Thanks for your help

Hi,

In order to have a convex problem (which you can in theory solve), you need to optimize a convex function over a convex domain. That is why you need the set of classifiers to be convex. Since a classifier takes values in a discrete set, the set of classifiers is in general not convex. Here is a simple counterexample: if you average the constant classifier that always outputs +1 with the constant classifier that always outputs -1, you obtain the value 0, so the average is not a classifier, since it does not take values in \(\{-1, +1\}\). That is why we restrict ourselves to a convex set of real-valued functions (in the course we consider the set of linear functions \( x \mapsto w^\top x \), parameterized by \( w \)).
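If it helps, here is a minimal Python sketch of that counterexample (the function names and the toy batch are just for illustration, not from the course):

```python
import numpy as np

def f_plus(x):   # constant classifier: always predicts +1
    return np.ones(len(x))

def f_minus(x):  # constant classifier: always predicts -1
    return -np.ones(len(x))

x = np.zeros((5, 3))  # any batch of 5 inputs; the values do not matter here

# Their average, i.e. theta * f + (1 - theta) * g with theta = 1/2
avg = 0.5 * f_plus(x) + 0.5 * f_minus(x)

print(avg)                          # [0. 0. 0. 0. 0.]
print(np.isin(avg, [-1, 1]).all())  # False: 0 is not in {-1, +1},
                                    # so the average is not a classifier
```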

Best,
Nicolas

Hello again,

Thanks for your answer. I am sorry but I don't really understand... I see that when the function to optimize is convex on a convex domain (e.g. \( \mathbb{R}^n \)), we can find an optimum. But why do we need the condition that the set of functions to optimize over is convex (i.e., all convex combinations of functions in the set remain in the set)? If we have a discrete set of classifiers, for example with two functions, both of them can be convex on a convex domain (the "input" of each function is a convex domain), so we can find an optimum for each. Hence, I would say we have an "easy" problem... Why do we need the set of classifiers to be convex?

Maybe I did not understand what it means for a set of classifiers to be convex? I think it means that for any functions \(f, g\) in the set and for any \(\theta \in [0, 1]\), the function \(\theta f + (1 - \theta)g\) is also in the set... If that is correct, I don't understand why we need this condition to find the optima of \(f\) and \(g\).

Sorry for the inconvenience; I feel like I am missing an obvious detail.

I think there is some confusion.

You have an optimization problem where the variable is a function (i.e., a classifier in this precise case). You want to find the classifier with the smallest training loss, so you are searching over a set of candidate classifiers. You are not trying to minimize the classifier itself.

And you know that, in order to be solvable, it is better for an optimization problem to be convex, i.e. to minimize a convex function over a convex set. That is why you want the set you are minimizing over to be convex, and since in this precise example that set is the set of functions you are searching over, you restrict yourself to a set of functions which is convex.
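To make this concrete, here is a small sketch (the toy data, logistic loss, and step size are my own illustrative choices, not from the course): restricting to linear functions \( x \mapsto w^\top x \) turns the search over functions into a search over \( w \in \mathbb{R}^d \), which is a convex set, and with a convex loss the whole problem becomes convex:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                 # toy inputs
y = np.sign(X @ np.array([1.0, -2.0]) + 0.1)  # toy {-1,+1} labels

def logistic_loss_grad(w, X, y):
    """Gradient of the (convex) logistic loss  mean(log(1 + exp(-y * Xw)))."""
    margins = y * (X @ w)
    coef = -y / (1.0 + np.exp(margins))  # derivative of loss w.r.t. margin
    return X.T @ coef / len(y)

w = np.zeros(2)
for _ in range(500):                        # plain gradient descent: convex
    w -= 0.5 * logistic_loss_grad(w, X, y)  # loss over the convex domain R^2,
                                            # so it converges to a global minimum

train_acc = np.mean(np.sign(X @ w) == y)  # sign(w^T x) recovers a classifier
print(w, train_acc)
```

The point is that the minimization runs over \( w \in \mathbb{R}^2 \) (convex), not over \(\{-1,+1\}\)-valued classifiers (non-convex); only at prediction time do we take the sign to get a classifier back.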

If it is still not clear, come Thursday to the Q&A session and I will explain it again.

Oh, I see, there was indeed some confusion on my part. Thanks for the clarification.

