
A few solutions

Hi,

What was the complexity of the K-fold cross-validation algorithm?
We can see it as a function of K, N and D I guess, but we can also see it as training K times, each time on K-1 groups of N/K samples of dimension D. It's not very clear whether the complexity should be O(K) or O(K(K-1)).
i.e. take N=K: we train on K-1 samples and test on 1 sample, and we do it K times, which obviously gives O(K(K-1)).
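
For concreteness, here is a minimal sketch of the K-fold loop (the `fit` and `evaluate` callables are placeholders, not anything from the course): each of the K iterations merges K-1 folds into a single training set of about (K-1)N/K samples and runs exactly one training on it.

```python
import numpy as np

def k_fold_cv(X, y, K, fit, evaluate):
    """Sketch of K-fold CV: K training runs, each on (K-1)/K of the data.

    `fit` and `evaluate` are hypothetical placeholders for whatever model is
    used; the point is only to show how often training happens and on how
    much data. X and y are assumed to be numpy arrays.
    """
    N = len(X)
    folds = np.array_split(np.random.permutation(N), K)
    scores = []
    for k in range(K):                                   # K iterations
        val_idx = folds[k]                               # ~N/K samples held out
        train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
        model = fit(X[train_idx], y[train_idx])          # ONE training on (K-1)N/K samples
        scores.append(evaluate(model, X[val_idx], y[val_idx]))
    return np.mean(scores)
```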

About the question on the neural network with L layers of M nodes where we keep only one node: was the answer "less than 1/M^L"?

And finally, about the function where there was a typo, could you please give us the answer? I think that if we apply a ReLU with a threshold of 0 we always get "YES", but if we multiply the weights of the sigmoid layer by 2, it's still possible to obtain a "NO".

edit:
Just remembered the threshold was "NO" if f(x) <= 0 and "YES" otherwise. So the ReLU still works with this threshold. I guess the combination of sigmoid + ReLU doesn't work because it always outputs "YES", which is what I ticked.

For the last question, I think it was not a combination of ReLU + sigmoid, but ReLU with doubled weights (a combination of the two previous cases). So I think the accuracy is the same after all.

For K-fold, we train K times, so it is O(K). You don't train K-1 times on the little sets; you take their union and train once, then test once on the remaining fold.

@Anonymous said:
For K-fold, we train K times, so it is O(K). You don't train K-1 times on the little sets; you take their union and train once, then test once on the remaining fold.

No, but you train one time on the K-1 little sets, which is equivalent to taking their union anyway.

The training set size depends on K. The best way to notice it is to set K=N.

@Anonymous said:
For the last question, I think it was not a combination of ReLU + sigmoid, but ReLU with doubled weights (a combination of the two previous cases). So I think the accuracy is the same after all.

Agreed, replacing \( w \) with \( 2w \) only makes the sigmoid a little steeper. When \( w^Tx \) is positive or negative, \( 2w^Tx \) retains the same sign, so \( \sigma(2w^Tx) \) stays on the same side of 1/2 and the decision does not change.

This is also true when changing the sigmoid (threshold 1/2) to a ReLU (threshold 0): when \( w^Tx \) is positive or negative, the ReLU output is that positive number or 0, respectively.

Lastly, when \( w^Tx \) is positive or negative, \( 2w^Tx \) retains the same sign. Since the sign stays the same, the ReLU decision also remains the same.

So, these changes do not affect the accuracy.
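
A minimal numeric sketch of that argument (the weights and inputs below are made up, and the decision rule is assumed to be "YES" iff the activation exceeds its threshold, 1/2 for the sigmoid and 0 for the ReLU):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def label_sigmoid(w, x):
    # "YES" iff sigmoid(w^T x) > 1/2, which happens exactly when w^T x > 0
    return "YES" if sigmoid(w @ x) > 0.5 else "NO"

def label_relu(w, x):
    # "YES" iff relu(w^T x) > 0, which also happens exactly when w^T x > 0
    return "YES" if max(w @ x, 0.0) > 0 else "NO"

rng = np.random.default_rng(0)
w = rng.normal(size=5)                     # made-up weight vector
for _ in range(1000):
    x = rng.normal(size=5)                 # made-up input
    # sigmoid, ReLU, and doubled weights all give the same label,
    # because all of them only depend on the sign of w^T x
    assert (label_sigmoid(w, x) == label_relu(w, x)
            == label_sigmoid(2 * w, x) == label_relu(2 * w, x))
```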

For the second question I ticked that we do not have enough information; one reason is that biases are not mentioned anywhere in the description, and I think they would affect the probability.

@Anonymous said:
For the second question I ticked that we do not have enough information; one reason is that biases are not mentioned anywhere in the description, and I think they would affect the probability.

I put "less than 1/M^L" because there are M^L paths from the input to the output, and I thought it is possible that one of them is more predictive than the others. We have a 1/M^L chance of picking this path.
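
For what it's worth, a quick way to see that count (assuming a fully connected network with L layers of M nodes each, and that "keeping one node" means keeping one node per layer chosen uniformly at random):

\[
\#\text{paths from input to output} \;=\; \underbrace{M \times M \times \dots \times M}_{L \text{ layers}} \;=\; M^L,
\qquad
P(\text{keeping one particular path}) \;=\; \frac{1}{M^L}.
\]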

@Anonymous said:
For the second question I ticked that we do not have enough information; one reason is that biases are not mentioned anywhere in the description, and I think they would affect the probability.

I was wondering about that too, but then I saw that each layer is just \( \sigma(Wx) \), so I assume there is no bias term.

For the first question, I think they were asking for the complexity with regard to K. The overall complexity w.r.t. N and K should be O((K-1)/K · N · K) = O(NK), so O(K) w.r.t. K.

Also, assuming N=K doesn't make sense, because you could just as well assume N=K^2; that part doesn't contribute to the complexity w.r.t. K.
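
As a worked version of that count, assuming the cost of one training run is roughly linear in the number of training samples (any per-sample dependence on D just multiplies everything by the same factor):

\[
\underbrace{K}_{\text{training runs}} \times \underbrace{\tfrac{K-1}{K}\,N}_{\text{samples per run}} \;=\; (K-1)\,N \;=\; O(NK),
\]

i.e. linear in K for fixed N and linear in N for fixed K. The O(K(K-1)) count would only appear if each of the K-1 folds were trained on separately instead of being merged into one training set.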

@Anonymous said:

@Anonymous said:
For the second question I ticked that we do not have enough information; one reason is that biases are not mentioned anywhere in the description, and I think they would affect the probability.

I put "less than 1/M^L" because there are M^L paths from the input to the output, and I thought it is possible that one of them is more predictive than the others. We have a 1/M^L chance of picking this path.

The question actually says that \(\textbf{all the parameters}\), which should include both weights and biases, are equal to zero.
