What would be the definition (in simpler terms) of an architecture in the context of NN?
An NN architecture is the way the hyperparameters (number of layers, neurons per layer, ...) organize the weights: how they are multiplied, added, and combined in general, including special transformations such as activation functions.
Do we obtain a different architecture by removing a node/edge (a.k.a. a different graph)?
Yes. With regard to dropout, which sets some weights to 0 (= removes edges), you get a new architecture every time you sample a subset of edges.
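A minimal NumPy sketch of this idea (the shapes and the dropout rate are illustrative assumptions): sampling a Bernoulli mask and multiplying it into a layer's activations is equivalent to removing the corresponding nodes, so each sampled mask yields a different subnetwork.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_mask(shape, p, rng):
    """Sample a binary mask: each unit is dropped (set to 0) with probability p."""
    return (rng.random(shape) >= p).astype(float)

h = np.ones(6)                          # activations of a hidden layer
mask = dropout_mask(h.shape, p=0.5, rng=rng)
h_dropped = h * mask                    # dropped units contribute nothing downstream,
                                        # as if their edges were removed from the graph
```

Every call to `dropout_mask` samples a new mask, i.e. a new "architecture" in the sense above.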
Do we obtain a different architecture by changing the value of some weight edges?
No, because only the values of the weights change, not the way they are organized. Setting a value to exactly 0 is a borderline case here.
When we are training our model do we change the architecture at every SGD step by removing some nodes according to probability \(p^{l}_{j}\)?
Correct. Usually dropout is defined more locally, with respect to a subset of all the weights. See below.
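The per-step resampling can be sketched as follows (a toy loop; the weight shapes and rate are assumptions, and the gradient update is elided):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))             # weights of one layer
x = rng.normal(size=3)
p = 0.5

for step in range(3):
    # A fresh mask is sampled at every SGD step, so each step
    # trains a different sub-network of the full model.
    mask = (rng.random(x.shape) >= p).astype(float)
    h = W @ (x * mask)                  # forward pass through the thinned layer
    # ... compute loss and gradients, update W ...
```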
In practice should we use a global value of \(p\) for all the nodes/edges or different values \(p^{l}_{j}\) for different nodes/edges?
Usually, dropout is instantiated as a 'gating' layer and can technically be used anywhere (or everywhere). You have the same \(p^{l}_{j}\) for all nodes/edges in the same layer, but can have different values for different layers.
See also: Where should I place dropout layers in a neural network?
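A sketch of such a gating layer in NumPy (the class name, rates, and the inverted-dropout rescaling are illustrative choices, not a specific library's API): each instance carries its own rate, so different layers can use different values of \(p\).

```python
import numpy as np

class Dropout:
    """Gating layer: multiplies its input by a Bernoulli mask with drop prob p."""
    def __init__(self, p, rng):
        self.p, self.rng = p, rng

    def __call__(self, x, train=True):
        if not train:
            return x                    # no dropping at inference time
        # Inverted dropout: rescale kept units by 1/(1-p) so the
        # expected activation matches the inference-time behavior.
        mask = (self.rng.random(x.shape) >= self.p) / (1.0 - self.p)
        return x * mask

rng = np.random.default_rng(2)
drop1 = Dropout(p=0.5, rng=rng)         # rate for one hidden layer
drop2 = Dropout(p=0.2, rng=rng)         # a different rate for another layer
```

Within each layer every unit shares the same rate; the two instances show how rates can differ across layers.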