[Lecture 9c] Query-based gradient estimation

Hi,

In a score-based black-box attack, I don't understand why we approximate the gradient by summing the approximate directional derivatives over all directions. Why should this work?

To me it would make more sense to compute an approximation of each directional derivative but then keep only the largest one, since the direction of the gradient is the one with the greatest change.

Thanks in advance

10 Jan '22 ·

anonymous

Top comment

Hi,

The lecture slides suggest the following: we approximate each of the \(d\) partial derivatives of the function \(g(x)\), which together form an approximate gradient \(\nabla_x g(x)\). Summing over the directions \(e_i\) simply places each approximate partial derivative in its own coordinate. This works because as \(\alpha \rightarrow 0\) we recover the exact gradient \(\nabla_x g(x)\). Of course, in practice it's not a perfect scheme: we may run into numerical issues if \(\alpha\) is too small, and even if this were not a problem, querying the function \(g(x)\) \(d\) times may be prohibitively expensive (e.g., on ImageNet \(d \approx 150'000\)). To mitigate these problems, in practice random directions (say, drawn from a Gaussian distribution) are sometimes used instead of the basis vectors \(e_i\), which leads to an unbiased gradient estimate. So there are indeed multiple options available here.
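To make the two options concrete, here is a minimal NumPy sketch. It is only an illustration under assumptions: the score function `g`, its analytic gradient `grad_g`, the forward-difference form, and the values of `alpha` and `n_queries` are all made up for this example and are not taken from the lecture slides.

```python
import numpy as np

def g(x):
    # Hypothetical smooth score function standing in for the black-box model.
    return np.sum(x ** 2) + np.sin(x[0])

def grad_g(x):
    # Analytic gradient of the toy g, used only to check the estimates.
    grad = 2 * x
    grad[0] += np.cos(x[0])
    return grad

def fd_gradient(g, x, alpha=1e-4):
    # Coordinate-wise forward differences: one extra query per basis vector e_i.
    d = x.size
    gx = g(x)
    grad = np.zeros(d)
    for i in range(d):
        e_i = np.zeros(d)
        e_i[i] = 1.0
        # The directional derivative along e_i fills only coordinate i,
        # so summing over i assembles the full gradient vector.
        grad += (g(x + alpha * e_i) - gx) / alpha * e_i
    return grad

def random_direction_gradient(g, x, alpha=1e-4, n_queries=5000, seed=0):
    # Same idea with Gaussian directions u instead of basis vectors:
    # average (directional derivative along u) * u over many samples.
    rng = np.random.default_rng(seed)
    gx = g(x)
    grad = np.zeros_like(x)
    for _ in range(n_queries):
        u = rng.standard_normal(x.size)
        grad += (g(x + alpha * u) - gx) / alpha * u
    return grad / n_queries

x = np.array([0.5, -1.0, 2.0, 0.3, -0.7])
print("exact:      ", grad_g(x))
print("coordinate: ", fd_gradient(g, x))
print("random dir.:", random_direction_gradient(g, x))
```

In this toy setting the coordinate-wise estimate needs one query per dimension, while the random-direction estimate trades accuracy for a query budget you can choose freely; it is noisy for small `n_queries`.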

About your suggestion: if I understand it correctly, you propose to use an estimate with only one non-zero element (the one with the largest directional derivative). If this is the idea, then it's not a correct gradient estimate. The direction of the gradient is indeed the direction of greatest change, but only when we restrict ourselves to a neighbourhood defined by an \(\ell_2\)-ball. What you describe would be the steepest direction with respect to an \(\ell_1\)-ball and would correspond to coordinate descent, not gradient descent. More details are given, e.g., here: https://people.seas.harvard.edu/~yaron/AM221-S16/lecture_notes/AM221_lecture10.pdf (see Section 2).
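The difference between the two balls can be seen in a small sketch. It assumes we maximize the linearized change \(\nabla g^\top \delta\) over a unit ball; the function names and the example vector are hypothetical and chosen only for illustration.

```python
import numpy as np

def steepest_l2(grad):
    # Maximizer of grad @ delta over the unit l2-ball: the normalized gradient.
    return grad / np.linalg.norm(grad)

def steepest_l1(grad):
    # Maximizer of grad @ delta over the unit l1-ball: a single signed
    # coordinate, the one with the largest absolute partial derivative.
    i = np.argmax(np.abs(grad))
    delta = np.zeros_like(grad)
    delta[i] = np.sign(grad[i])
    return delta

grad = np.array([3.0, -4.0, 1.0])  # made-up gradient
d2 = steepest_l2(grad)
d1 = steepest_l1(grad)
print("l2 direction:", d2, "linear gain:", d2 @ grad)  # gain equals ||grad||_2
print("l1 direction:", d1, "linear gain:", d1 @ grad)  # gain equals max_i |grad_i|
```

The \(\ell_1\) direction has exactly one non-zero entry, which is the "keep only the largest one" idea from the question, whereas the \(\ell_2\) direction keeps every coordinate in proportion to its partial derivative.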

I hope that helps.

Best,
Maksym

10 Jan '22 ·

Maksym Andriushchenko admin
