
Theorem from Barron's paper

Hello,
In the lesson on neural nets and representation power, I don't understand why the lemma on page 3 applies only to activation functions that are "sigmoid-like" and not to others.
Thank you for your answer.

Hi,

Barron's approximation result says that certain functions can be approximated by a one-hidden-layer NN with sigmoid-like activation functions (see the paper: http://www.stat.yale.edu/~arb4/publications_files/UniversalApproximationBoundsForSuperpositionsOfASigmoidalFunction.pdf).
The reason it requires sigmoid-like functions comes from the proof:

  • first, you approximate your function by a sum of rectangles (scaled indicator functions);
  • then you approximate each rectangle by the difference of two sharp sigmoids.

You can see that the particular shape of the sigmoid is important here: you need your activation function to be able to approximate a step function.
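To make the rectangle step concrete, here is a minimal NumPy sketch (the function names and the sharpness parameter `k` are illustrative, not from Barron's paper): subtracting two sigmoids shifted to the rectangle's edges, with a large slope `k`, approximates the indicator of an interval.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rectangle(x, a, b, k):
    # Two sharp sigmoids with opposite signs: close to 1 on [a, b],
    # close to 0 outside, with a transition region of width ~1/k.
    return sigmoid(k * (x - a)) - sigmoid(k * (x - b))

x = np.linspace(-2.0, 2.0, 401)
approx = rectangle(x, a=-0.5, b=0.5, k=50.0)
target = ((x >= -0.5) & (x <= 0.5)).astype(float)

# Away from the two jumps, the approximation error is tiny
away_from_jumps = np.abs(np.abs(x) - 0.5) > 0.1
print(np.max(np.abs(approx - target)[away_from_jumps]))
```

Increasing `k` shrinks the transition region, which is why the sigmoid's step-like shape is exactly what the proof needs.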

Note, however, that other approximation results exist for NNs with other activation functions. For example, we saw one about pointwise approximation using ReLU NNs at the end of the lecture.
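As a small illustration of the ReLU case (a sketch of the general idea, not necessarily the lecture's construction): a one-hidden-layer ReLU network can represent any piecewise-linear interpolant, so interpolating a smooth function like x² on a grid already gives a good approximation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

f = lambda x: x ** 2
knots = np.linspace(0.0, 1.0, 11)          # interpolation nodes
vals = f(knots)
slopes = np.diff(vals) / np.diff(knots)    # slope on each segment
# The ReLU weights are the slope of the first segment, then the
# slope *changes* at each interior knot.
weights = np.concatenate([[slopes[0]], np.diff(slopes)])

def relu_net(x):
    # One-hidden-layer ReLU net == piecewise-linear interpolant of f
    return vals[0] + sum(w * relu(x - k) for w, k in zip(weights, knots[:-1]))

x = np.linspace(0.0, 1.0, 1001)
err = np.max(np.abs(relu_net(x) - f(x)))
print(err)
```

With 10 segments the worst-case error for x² is h²/8 = 0.00125, and refining the grid shrinks it quadratically; this is the pointwise flavor of approximation, as opposed to Barron's dimension-free L² bound.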

Best,
Nicolas

