Problem 4 final-exam 2017
Hello, I was wondering why having more data makes k-NN more accurate. Doesn't that depend on the extra data? Since the question didn't say that the extra data was balanced, I thought of a counterexample: for a fixed k, if we add a lot more data from just one of the classes, won't that shift the decision boundary to one side and therefore decrease the accuracy? Am I wrong? I made a drawing to illustrate what I think:
This is assuming the question asks which answer is always true. I guess that in general, yes, it's better to have more data, but it's not always better, is it?
(Sorry for the bad handwriting. The label on the right says "new decision boundary".)
Thank you for your time,
final exam 2017 problem 4 - k-nn more data
I think that if the extra data is sampled randomly, it preserves the class distribution, so k-NN's performance should improve.
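To check both points, here is a small sketch that compares the two ways of adding data. Everything in it is my own assumption and not taken from the exam: a 1-D toy problem with two Gaussian classes, k = 15, the sample sizes, the seed, and the helper names `sample`, `knn_predict` and `accuracy`. It only illustrates the effect; it does not prove anything about the exam question.

```python
import heapq
import random

def sample(n, label, rng):
    """Draw n 1-D points for one class: class 0 ~ N(-1, 1), class 1 ~ N(+1, 1)."""
    mean = -1.0 if label == 0 else 1.0
    return [(rng.gauss(mean, 1.0), label) for _ in range(n)]

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (k is odd, so no ties)."""
    nearest = heapq.nsmallest(k, train, key=lambda p: abs(p[0] - x))
    votes_for_1 = sum(label for _, label in nearest)
    return 1 if 2 * votes_for_1 > k else 0

def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

rng = random.Random(0)  # fixed seed, chosen arbitrarily
k = 15

# Small balanced training set and a balanced test set.
base = sample(50, 0, rng) + sample(50, 1, rng)
test = sample(200, 0, rng) + sample(200, 1, rng)

# Case from the question: all extra points come from class 0 only.
skewed_extra = sample(1000, 0, rng)

# Case from the answer: extra points drawn at random from both classes.
random_extra = [sample(1, rng.randrange(2), rng)[0] for _ in range(1000)]

acc_base = accuracy(base, test, k)
acc_skewed = accuracy(base + skewed_extra, test, k)
acc_random = accuracy(base + random_extra, test, k)

print(f"base data only:          {acc_base:.3f}")
print(f"+ extra from class 0:    {acc_skewed:.3f}")
print(f"+ extra sampled at random: {acc_random:.3f}")
```

In this setup the one-sided extra data pushes the boundary towards class 1, as in the drawing, and accuracy on the balanced test set drops. The randomly sampled extra data keeps the class proportions and does not cause that shift. So "more data helps" seems to rely on the extra data coming from the same distribution as the test data.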