
theory question complexity

Hello,

I understand that we consider D different Wi for each dimension, but how is the ×N factor obtained? If N is the number of samples, I'm not sure why that's true, because for every combination of W there isn't just a one-step process to evaluate the loss.

thanks for your help

complexity.JPG

Hi, the assumption is that calculating the loss for a single sample is O(1), so doing it for all N samples is O(N). The O(N) factor matters because SGD does not require O(N) work for each step, whereas grid search does.
Best,
Semih

Thank you for your quick response. I understood that was the assumption, but I am not sure why we can say that calculating the loss has complexity O(1).

It is more a matter of choice: here we choose to say that calculating the loss has complexity O(N) rather than O(1). We use O(N) to stress that some algorithms (like SGD) do not look at the whole dataset to make a single decision. An example may make this clearer. Suppose we design a new algorithm, batched grid search, which samples B examples randomly from the training set and evaluates a single grid location on that batch only. If we assumed a loss calculation costs O(1), we could not show that batched grid search is faster than grid search. But if we assume a full loss calculation costs O(N), then the batched version costs only O(B) per grid point, and the complexity reduction becomes visible.
Here we choose O(N) because we will later compare grid search with SGD, and SGD indeed does batched computation.
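The batched-grid-search example above can be sketched in code. This is a minimal illustration (the data, grid, and function names are mine, not from the course): we count per-sample loss evaluations to make the O(N)-per-grid-point vs O(B)-per-grid-point cost concrete.

```python
import itertools
import random

random.seed(0)
N, B, D = 1000, 32, 2                      # dataset size, batch size, weight dims
grid = [w / 10 for w in range(-10, 11)]    # 21 candidate values per weight dimension

# Synthetic linear-regression data: (feature vector, target) pairs.
data = [([random.uniform(-1, 1) for _ in range(D)], random.uniform(-1, 1))
        for _ in range(N)]

evals = 0  # global counter of per-sample loss evaluations

def mse(w, samples):
    """Mean squared error over `samples`; cost is O(len(samples))."""
    global evals
    evals += len(samples)
    return sum((y - sum(wi * xi for wi, xi in zip(w, x))) ** 2
               for x, y in samples) / len(samples)

# Full grid search: every grid point evaluates the loss on all N samples.
evals = 0
best_full = min(itertools.product(grid, repeat=D), key=lambda w: mse(w, data))
full_evals = evals          # 21**D grid points * N samples each

# Batched grid search: every grid point uses a random batch of B samples.
evals = 0
best_batched = min(itertools.product(grid, repeat=D),
                   key=lambda w: mse(w, random.sample(data, B)))
batched_evals = evals       # 21**D grid points * B samples each

print(full_evals)     # 441 * 1000 = 441000
print(batched_evals)  # 441 * 32   = 14112
```

With 21 values per dimension and D = 2, both variants visit 441 grid points, but the full search pays N = 1000 loss terms per point while the batched one pays only B = 32, which is exactly the O(N) vs O(B) distinction above.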

