Adversarial ML
Hey, is there a reason why the gradient is being normalized at each step? In gradient descent we didn't do that.
Thanks,
Hi,
Indeed, that's a departure from standard gradient descent, which uses unnormalized updates (and could be used for this problem as well). However, in the literature on adversarial examples it's more common to use normalized updates, such as \(\nabla/\|\nabla\|_2\) for \(\ell_2\) perturbations or \(\mathrm{sign}(\nabla)\) for \(\ell_\infty\) perturbations; both discard the magnitude of the gradient \(\nabla\). The reason is simply that it's usually easier to select an appropriate step size \(\alpha\) as a multiple of the perturbation radius \(\epsilon\). In most cases this avoids a grid search over the step size, which is why people prefer it.
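To make the two update rules concrete, here is a minimal NumPy sketch of a single normalized ascent step; the helper name `normalized_step` and the small constant added for numerical stability are my own choices for illustration, not anything from a specific attack library.

```python
import numpy as np

def normalized_step(x, grad, alpha, norm="linf"):
    """One ascent step with a normalized gradient update.

    norm="l2"   uses grad / ||grad||_2  (keeps direction, unit length)
    norm="linf" uses sign(grad)         (magnitude fully discarded)
    """
    if norm == "l2":
        # Small constant avoids division by zero when the gradient vanishes.
        g = grad / (np.linalg.norm(grad) + 1e-12)
    elif norm == "linf":
        g = np.sign(grad)
    else:
        raise ValueError(f"unknown norm: {norm}")
    return x + alpha * g
```

With either rule, a common heuristic is to set \(\alpha\) to a fixed fraction of \(\epsilon\) (e.g. \(\alpha = \epsilon/4\)) and take several steps, projecting back onto the \(\epsilon\)-ball after each one.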
I hope that helps.
Best,
Maksym