What would be the definition (in simpler terms) of an architecture in the context of NN?
An NN architecture is the way the hyperparameters (number of layers, neurons per layer, ...) organize the weights: how they are multiplied, added, and combined in general, including special transformations such as activation functions.
Do we obtain a different architecture by removing a node/edge (a.k.a. a different graph)?
Yes. With regard to dropout, which sets some weights to 0 (= removes edges), you get a new architecture every time you sample a subset of edges.
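A minimal NumPy sketch of this idea (the shapes and the dropout rate are illustrative assumptions): sampling a Bernoulli mask and multiplying it into a layer's activations is equivalent to removing the corresponding nodes, so each sampled mask yields a different subnetwork.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_mask(shape, p, rng):
    """Sample a binary mask: each unit is dropped (set to 0) with probability p."""
    return (rng.random(shape) >= p).astype(float)

h = np.ones(6)                          # activations of a hidden layer
mask = dropout_mask(h.shape, p=0.5, rng=rng)
h_dropped = h * mask                    # dropped units contribute nothing downstream,
                                        # as if their edges were removed from the graph
```

Every call to `dropout_mask` samples a new mask, i.e. a new "architecture" in the sense above.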
Do we obtain a different architecture by changing the value of some weight edges?
No, because only the values of the weights change, not the way they are organized. Setting a value to exactly 0 is a borderline case here.
When we are training our model do we change the architecture at every SGD step by removing some nodes according to probability \(p^{l}_{j}\)?
Correct. Usually dropout is defined more locally, with respect to a subset of all the weights. See below.
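The per-step resampling can be sketched as follows (a toy loop; the weight shapes and rate are assumptions, and the gradient update is elided):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))             # weights of one layer
x = rng.normal(size=3)
p = 0.5

for step in range(3):
    # A fresh mask is sampled at every SGD step, so each step
    # trains a different sub-network of the full model.
    mask = (rng.random(x.shape) >= p).astype(float)
    h = W @ (x * mask)                  # forward pass through the thinned layer
    # ... compute loss and gradients, update W ...
```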
In practice should we use a global value of \(p\) for all the nodes/edges or different values \(p^{l}_{j}\) for different nodes/edges?
Usually, dropout is instantiated as a 'gating' layer and can technically be used anywhere (or everywhere). You have the same \(p^{l}_{j}\) for all nodes/edges in the same layer, but can have different values for different layers.
See also: Where should I place dropout layers in a neural network?
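A sketch of such a gating layer in NumPy (the class name, rates, and the inverted-dropout rescaling are illustrative choices, not a specific library's API): each instance carries its own rate, so different layers can use different values of \(p\).

```python
import numpy as np

class Dropout:
    """Gating layer: multiplies its input by a Bernoulli mask with drop prob p."""
    def __init__(self, p, rng):
        self.p, self.rng = p, rng

    def __call__(self, x, train=True):
        if not train:
            return x                    # no dropping at inference time
        # Inverted dropout: rescale kept units by 1/(1-p) so the
        # expected activation matches the inference-time behavior.
        mask = (self.rng.random(x.shape) >= self.p) / (1.0 - self.p)
        return x * mask

rng = np.random.default_rng(2)
drop1 = Dropout(p=0.5, rng=rng)         # rate for one hidden layer
drop2 = Dropout(p=0.2, rng=rng)         # a different rate for another layer
```

Within each layer every unit shares the same rate; the two instances show how rates can differ across layers.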