Analysis of the Bayes classifier: scaling everything by a constant
Hi,
I have a question concerning the end of the analysis of the Bayes classifier using all features. I don't see how we can rescale everything by \(1/(1 + \log(D))\) and get an equivalent situation. For me, if we start from the last expression of the argmax and rescale it, we indeed get noise with smaller variance, but the \(y \in \{\pm 1\}\) becomes \(\tilde{y} \in \{ \pm \frac{1}{1 + \log(D)}\}\), so we don't gain anything. Obviously I am missing something, so it would be great if you could tell me what.
Thanks,
Justin
Hi Justin,
Great question! Maybe I have been a little clumsy in my explanation; I hope it is better explained in the lecture notes.
Let's start from what we have at the end of page 12:
$$ \arg\max_{\hat y\in \{ -1,+1\}} \hat y y (1 + \log(D)) + \hat y Z \text{ with } Z \sim \mathcal{N} (0, 1+\log(D) ) $$
The key point is that we rescale the whole objective, not \(y\) itself: dividing the quantity inside the argmax by the positive constant \(1+\log(D)\) does not change which \(\hat y\) attains the maximum. Writing \(\tilde Z = Z/(1+\log(D))\), whose variance is \((1+\log(D))/(1+\log(D))^2 = 1/(1+\log(D))\), we obtain
$$ \arg\max_{\hat y\in \{ -1,+1\}} \hat y (y+ \tilde Z) \text{ with } \tilde Z \sim \mathcal{N} (0, 1/(1+\log(D)) ) $$
When the dimension becomes large, the variance of this Gaussian noise becomes very small, and the sign of \(y+ \tilde Z\) will be the same as the sign of \(y\) with high probability. Therefore, by taking the argmax you will have \( \hat y = \operatorname{sign} (y+ \tilde Z ) = y\), and you will recover the correct \(y\).
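If it helps, here is a small Monte Carlo sketch (not from the notes) of this concentration effect: it estimates \(P(\operatorname{sign}(y+\tilde Z) \neq y)\) for \(\tilde Z \sim \mathcal N(0, 1/(1+\log D))\), taking \(y = +1\) by symmetry and assuming the natural logarithm.

```python
import math
import random

def error_rate(D, trials=20000, seed=0):
    """Estimate P(sign(y + Z~) != y) for y = +1, where
    Z~ ~ N(0, 1/(1 + log D)) as in the rescaled argmax."""
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 / (1.0 + math.log(D)))
    # An error occurs when the noise flips the sign of y + Z~.
    errors = sum(1 for _ in range(trials) if 1.0 + rng.gauss(0.0, sigma) < 0.0)
    return errors / trials

for D in (10, 1000, 10**6, 10**12):
    print(D, error_rate(D))
```

The estimated error probability shrinks as \(D\) grows, matching the claim that the argmax recovers \(y\) with high probability in high dimension.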
Is it clearer now?
Best,
Nicolas
Hi,
Thanks a lot for your answer. It is perfectly clear now; I was not careful enough while reading the notes, and everything makes sense.
Best,
Justin