What was the complexity of the K-fold cross-validation algorithm?
I guess we can see it as a function of K, N and D, but we can also see it as training K times, each time on K-1 groups of N/K samples of dimension D. It's not very clear whether the complexity should be O(K) or O(K(K-1)).
E.g., take N = K: we train on K-1 samples and test on 1 sample, and we do it K times, so here we would get O(K(K-1)).
About the question on the neural network with L layers of M nodes where we keep only one node: was the answer "less than 1/M^L"?
And finally, about the function where there was a typo, could you please give us the answer? I think that if we apply a ReLU with a threshold of 0 we always get "YES", but if we multiply the weights of the sigmoid layer by 2, it's still possible to obtain a "NO".
Edit:
I just remembered the "threshold" was "NO" if f(x) <= 0, else "YES". So the ReLU still works with this threshold. I guess the combination of the sigmoid + ReLU doesn't work because it always outputs "YES", which is what I ticked.
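(Not from the exam, just a quick sanity check of the claim above, assuming the output f(x) comes from a sigmoid: the sigmoid is strictly positive, so with the rule "NO" if f(x) <= 0 the answer is always "YES".)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activations of a final sigmoid unit, including very negative ones.
z = np.linspace(-50.0, 50.0, 101)
f = sigmoid(z)

# sigmoid(z) lies in (0, 1) for every z, so f(x) <= 0 can never happen:
# under the rule "NO if f(x) <= 0, else YES", we always answer "YES".
print(np.all(f > 0))  # True
```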
For the last question, I think it was not a combination of ReLU + sigmoid, but a ReLU with doubled weights (a combination of the two previous cases). So I think it is the same accuracy after all.
For K-fold, we train K times, so it is O(K). You don't train K-1 times on the little sets; you take their union and train once, then test on the remaining fold.
@Anonymous said:
For K-fold, we train K times, so it is O(K). You don't train K-1 times on the little sets; you take their union and train once, then test on the remaining fold.
No, but you do train one time on K-1 little sets, which is equivalent to taking their union.
The training size depends on K. The best way to notice it is to set K = N.
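Here is a minimal sketch of the loop in question (train_fn and eval_fn are placeholders, not anything from the course): the outer loop runs K times, and each run trains once on the union of the other K-1 folds, so each training set has roughly N(K-1)/K samples.

```python
import numpy as np

def k_fold_cv(X, y, k, train_fn, eval_fn):
    """K-fold CV sketch: K training runs, each on the union of K-1 folds."""
    folds = np.array_split(np.arange(len(X)), k)  # K roughly equal folds
    scores = []
    for i, test_idx in enumerate(folds):
        # Union of the other K-1 folds: one training set of ~N(K-1)/K samples.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = train_fn(X[train_idx], y[train_idx])  # exactly one training run per fold
        scores.append(eval_fn(model, X[test_idx], y[test_idx]))
    return float(np.mean(scores))
```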
@Anonymous said:
For the last question, I think it was not a combination of ReLU + sigmoid, but a ReLU with doubled weights (a combination of the two previous cases). So I think it is the same accuracy after all.
Agreed, going from \( w \) to \( 2w \) only makes the sigmoid a little steeper. When \( w^T x \) is positive or negative, \( 2w^T x \) retains the same sign, so this doesn't make \( \sigma(z) \) change the decision.
This is also true when changing the sigmoid to a ReLU, using the thresholds 1/2 and 0 respectively: when \( w^T x \) is positive or negative, the ReLU outputs that positive number or 0, respectively.
Lastly, when \( w^T x \) is positive or negative, \( 2w^T x \) retains the same sign. Since the sign stays the same, the ReLU decision also remains the same. So these changes do not affect the accuracy.
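A quick numeric check of this argument (random data, purely illustrative): doubling \( w \) rescales \( w^T x \) but never flips its sign, so the sigmoid decision with threshold 1/2 and the ReLU decision with threshold 0 all agree.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)
X = rng.normal(size=(1000, 5))  # random inputs, purely illustrative

z, z2 = X @ w, X @ (2 * w)      # w^T x and (2w)^T x = 2 * w^T x

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
relu = lambda t: np.maximum(t, 0.0)

# sigma(z) > 1/2  <=>  z > 0  <=>  2z > 0, and relu(z) > 0  <=>  z > 0,
# so all the decision rules agree on every input.
print(np.array_equal(sigmoid(z) > 0.5, sigmoid(z2) > 0.5))  # True
print(np.array_equal(relu(z) > 0, relu(z2) > 0))            # True
print(np.array_equal(sigmoid(z) > 0.5, relu(z) > 0))        # True
```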
For the second question I checked "we do not have enough information", one reason being that biases are not mentioned anywhere in the description, and I think they would affect the probability.
@Anonymous said:
For the second question I checked "we do not have enough information", one reason being that biases are not mentioned anywhere in the description, and I think they would affect the probability.
I put "less than 1/M^L" because there are M^L paths from the input to the output, and I thought it was possible that one of them was more predictive than the others. We have a 1/M^L chance of picking that path.
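For intuition, a tiny enumeration (M and L here are small made-up values, not the exam's): keeping exactly one node per layer selects one of \( M^L \) possible paths, so picking one particular path uniformly at random has probability \( 1/M^L \).

```python
from itertools import product

M, L = 3, 4  # small made-up values, just for illustration
paths = list(product(range(M), repeat=L))  # one chosen node per layer
print(len(paths) == M ** L)  # True: there are M^L paths
print(1 / len(paths))        # probability of picking one specific path uniformly
```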
@Anonymous said:
For the second question I checked "we do not have enough information", one reason being that biases are not mentioned anywhere in the description, and I think they would affect the probability.
I was wondering about that too, but then I saw that each layer is just \( \sigma(Wx) \), so I assume there is no bias term.
For the first question, I think they were asking for the complexity with respect to K. The overall complexity w.r.t. N and K should be \( O\!\left(\frac{K-1}{K} \cdot N \cdot K\right) = O(NK) \), so O(K) w.r.t. K.
Also, assuming N = K doesn't make sense, because you could just as well assume N = K^2; that choice doesn't contribute to the complexity w.r.t. K.
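Written out (under the assumption that one training run costs time linear in its number of samples, with the per-sample cost in D folded into the constant):
\[
\underbrace{K}_{\text{folds}} \times \underbrace{\tfrac{K-1}{K} N}_{\text{training size per fold}} = (K-1)\,N = O(NK),
\]
so the cost is linear in N and linear in K; the K test passes add another \( K \cdot \tfrac{N}{K} = N \), which doesn't change the order.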
@Anonymous said:
For the second question I checked "we do not have enough information", one reason being that biases are not mentioned anywhere in the description, and I think they would affect the probability.
I put "less than 1/M^L" because there are M^L paths from the input to the output, and I thought it was possible that one of them was more predictive than the others. We have a 1/M^L chance of picking that path.
The question actually says that \( \textbf{all the parameters} \), which should include both weights and biases, are equal to zero.