
A few solutions

Hi,

What was the complexity of the K-fold cross-validation algorithm?
We can see it as a function of K, N and D I guess, but we can also see it as training K times, each time on K-1 groups of N/K samples of dimension D. It's not very clear whether the complexity should be O(K) or O(K(K-1)).
i.e. take N=K: we train on K-1 samples and test on 1 sample, and we do it K times, which obviously gives O(K(K-1)).
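
For concreteness, here is a minimal sketch of the K-fold loop (the `fit` and `evaluate` callables are placeholders, not anything from the course): each of the K iterations merges K-1 folds into a single training set of about (K-1)N/K samples and runs exactly one training on it.

```python
import numpy as np

def k_fold_cv(X, y, K, fit, evaluate):
    """Sketch of K-fold CV: K training runs, each on (K-1)/K of the data.

    `fit` and `evaluate` are hypothetical placeholders for whatever model is
    used; the point is only to show how often training happens and on how
    much data. X and y are assumed to be numpy arrays.
    """
    N = len(X)
    folds = np.array_split(np.random.permutation(N), K)
    scores = []
    for k in range(K):                                   # K iterations
        val_idx = folds[k]                               # ~N/K samples held out
        train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
        model = fit(X[train_idx], y[train_idx])          # ONE training on (K-1)N/K samples
        scores.append(evaluate(model, X[val_idx], y[val_idx]))
    return np.mean(scores)
```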

About the question on the neural network with L layers of M nodes where we keep only one node: was the answer "less than 1/M^L"?

And finally, about the function where there was a typo, could you please give us the answer? I think that if we apply a ReLU with a threshold of 0 we always get "YES", but if we multiply the weights of the sigmoid layer by 2, it's still possible to obtain a "NO".

edit:
Just remembered the threshold was "NO" if f(x) <= 0 and "YES" otherwise. So the ReLU still works with this threshold. I guess the combination of sigmoid + ReLU doesn't work because it always outputs "YES", which is what I ticked.

For the last question, I think it was not a combination of ReLU + sigmoid, but ReLU with doubled weights (a combination of the two previous cases). So I think the accuracy is the same after all.

For K-fold, we train K times, so it is O(K). You don't train K-1 times on the little sets; you take their union and train once, then test once on the remaining fold.

@Anonymous said:
For K-fold, we train K times, so it is O(K). You don't train K-1 times on the little sets; you take their union and train once, then test once on the remaining fold.

No, but you train one time on the K-1 little sets, which is equivalent to taking their union anyway.

The training set size depends on K. The best way to notice it is to set K=N.

@Anonymous said:
For the last question, I think it was not a combination of ReLU + sigmoid, but ReLU with doubled weights (a combination of the two previous cases). So I think the accuracy is the same after all.

Agreed, replacing \( w \) with \( 2w \) only makes the sigmoid a little steeper. When \( w^Tx \) is positive or negative, \( 2w^Tx \) retains the same sign, so \( \sigma(2w^Tx) \) stays on the same side of 1/2 and the decision does not change.

This is also true when changing the sigmoid (threshold 1/2) to a ReLU (threshold 0): when \( w^Tx \) is positive or negative, the ReLU output is that positive number or 0, respectively.

Lastly, when \( w^Tx \) is positive or negative, \( 2w^Tx \) retains the same sign. Since the sign stays the same, the ReLU decision also remains the same.

So, these changes do not affect the accuracy.
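
A minimal numeric sketch of that argument (the weights and inputs below are made up, and the decision rule is assumed to be "YES" iff the activation exceeds its threshold, 1/2 for the sigmoid and 0 for the ReLU):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def label_sigmoid(w, x):
    # "YES" iff sigmoid(w^T x) > 1/2, which happens exactly when w^T x > 0
    return "YES" if sigmoid(w @ x) > 0.5 else "NO"

def label_relu(w, x):
    # "YES" iff relu(w^T x) > 0, which also happens exactly when w^T x > 0
    return "YES" if max(w @ x, 0.0) > 0 else "NO"

rng = np.random.default_rng(0)
w = rng.normal(size=5)                     # made-up weight vector
for _ in range(1000):
    x = rng.normal(size=5)                 # made-up input
    # sigmoid, ReLU, and doubled weights all give the same label,
    # because all of them only depend on the sign of w^T x
    assert (label_sigmoid(w, x) == label_relu(w, x)
            == label_sigmoid(2 * w, x) == label_relu(2 * w, x))
```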

For the second question I ticked that we do not have enough information; one reason is that biases are not mentioned anywhere in the description, and I think they would affect the probability.

@Anonymous said:
For the second question I ticked that we do not have enough information; one reason is that biases are not mentioned anywhere in the description, and I think they would affect the probability.

I put "less than 1/M^L" because there are M^L paths from the input to the output, and I thought it is possible that one of them is more predictive than the others. We have a 1/M^L chance of picking this path.
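
For what it's worth, a quick way to see that count (assuming a fully connected network with L layers of M nodes each, and that "keeping one node" means keeping one node per layer chosen uniformly at random):

\[
\#\text{paths from input to output} \;=\; \underbrace{M \times M \times \dots \times M}_{L \text{ layers}} \;=\; M^L,
\qquad
P(\text{keeping one particular path}) \;=\; \frac{1}{M^L}.
\]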

@Anonymous said:
For the second question I ticked that we do not have enough information; one reason is that biases are not mentioned anywhere in the description, and I think they would affect the probability.

I was wondering about that too, but then I saw that each layer is just \( \sigma(Wx) \), so I assume there is no bias term.

For the first question, I think they were asking for the complexity with regard to K. The overall complexity w.r.t. N and K should be O((K-1)/K · N · K) = O(NK), so O(K) w.r.t. K.

Also, assuming N=K doesn't make sense, because you could just as well assume N=K^2; that part doesn't contribute to the complexity w.r.t. K.
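
As a worked version of that count, assuming the cost of one training run is roughly linear in the number of training samples (any per-sample dependence on D just multiplies everything by the same factor):

\[
\underbrace{K}_{\text{training runs}} \times \underbrace{\tfrac{K-1}{K}\,N}_{\text{samples per run}} \;=\; (K-1)\,N \;=\; O(NK),
\]

i.e. linear in K for fixed N and linear in N for fixed K. The O(K(K-1)) count would only appear if each of the K-1 folds were trained on separately instead of being merged into one training set.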

@Anonymous said:

@Anonymous said:
For the second question I ticked that we do not have enough information; one reason is that biases are not mentioned anywhere in the description, and I think they would affect the probability.

I put "less than 1/M^L" because there are M^L paths from the input to the output, and I thought it is possible that one of them is more predictive than the others. We have a 1/M^L chance of picking this path.

The question actually says that \(\textbf{all the parameters}\), which should include both weights and biases, are equal to zero.
