
Different dropout used in part 1

Good evening,

I wonder why we used different ways to implement dropout in the first part of the exercise session. For the convolutional part, we use a Dropout2D module, while in the fully connected layers we use the dropout from the "torch.nn.functional" package. I read online that the Dropout2D module was better to use, as it lets you choose when dropout is enabled (via model.train() or model.eval()), while the functional alternative does not care about the "state" of the model. Does that mean we want to use dropout in the fully connected layers even when predicting (after training)?

Thanks for your help,
Justin

Hi Justin,
First, just to prevent a misunderstanding, both Dropout and Dropout2D behave differently in train and eval modes (i.e. after you call model.train() or model.eval()). Both modules zero out part of the feature maps in train mode and behave as an identity function in eval mode (you can have a closer look here: https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html).
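You can check this yourself with a tiny snippet (the tensor shape here is arbitrary, just for illustration):

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
drop2d = nn.Dropout2d(p=0.5)
x = torch.ones(1, 4, 3, 3)  # (batch, channels, height, width)

# Train mode: both randomly zero out activations (and rescale the survivors by 1/(1-p)).
drop.train(); drop2d.train()
print(drop(x))    # individual elements are zeroed
print(drop2d(x))  # whole channels are zeroed

# Eval mode: both act as the identity, so the input passes through unchanged.
drop.eval(); drop2d.eval()
print(torch.equal(drop(x), x), torch.equal(drop2d(x), x))  # True True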
The difference between the two is that Dropout drops each neuron (or feature) independently at random, whereas Dropout2D drops random channels completely. The reason is that during convolution the values within a channel are highly correlated, since they are computed with the same convolution kernel on the same image (or feature map). Dropping individual values therefore has very little effect, because the model can recover them from their neighbors. It is much more meaningful to drop channels completely, which is exactly what Dropout2D does. Dropout is used before fully connected layers, whereas Dropout2D is used before (or between) convolutional layers.
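As a rough sketch of the usual placement (the layer names and sizes here are made up for illustration, not the exact ones from the exercise): Dropout2D goes between the convolutional layers and Dropout (or its functional form) before the last fully connected layer. If you use the functional form, passing training=self.training makes it follow model.train()/model.eval() just like the modules do.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):  # hypothetical architecture, for illustration only
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.drop2d = nn.Dropout2d(p=0.25)   # drops whole feature maps (channels)
        self.fc1 = nn.Linear(32 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = self.drop2d(x)                               # between conv layers
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, p=0.5, training=self.training)  # before the last FC layer
        return self.fc2(x)

net = SmallNet()
net.train()                           # dropout active while training
net.eval()                            # dropout disabled when predicting
out = net(torch.randn(8, 1, 28, 28))  # shape (8, 10) in either mode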

Oh, I see, thanks a lot for the clarification.

