Week 12: Differentiable Nash equilibrium

CS-433 Machine Learning

Hi, I have a question regarding the equations shown in slide 5 of the first lecture of week 12:

We first define the Nash equilibrium as follows, where \(\theta\) is the minimizer and \(\phi\) the maximizer.

But in the following lines we define the differentiable Nash equilibrium as follows:

But if the second derivative of the loss with respect to \(\theta\) is > 0, doesn't that mean that the loss is minimized at \(\theta^*\), not maximized as stated in the previous expression?

3 Jan '21 ·

anonymous

Player G (Generator): "I want to minimize the loss, and the only parameter I control is theta, which lies in the set BIG_THETA."

Player D (Discriminator/Adversary): "I want to maximize the loss, and the only parameter I control is phi, which lies in the set BIG_PHI."

So a DNE is defined by two conditions:

1) It is a "classic" Nash equilibrium, i.e.
for every theta in the set BIG_THETA
and every phi in the set BIG_PHI,
it holds that
loss(theta_star, phi) <= loss(theta_star, phi_star) <= loss(theta, phi_star)

At this point, both players say the following:

Player G (Generator): "As long as Player D stays at phi_star, there is no theta in the set BIG_THETA that I can pick to lower the loss below loss(theta_star, phi_star), its value at the point (theta_star, phi_star). So I am happy with what I have, and I will not move the only parameter I control, theta."

Player D (Discriminator/Adversary): "As long as Player G stays at theta_star, there is no phi in the set BIG_PHI that I can pick to raise the loss above loss(theta_star, phi_star), its value at the point (theta_star, phi_star). So I am happy with what I have, and I will not move the only parameter I control, phi."

And this is why it is called an equilibrium: at the point (theta_star, phi_star), neither player has a reason to change their decision, i.e. the players stick to their choices (player G with theta_star and player D with phi_star).

2) The second part of the definition of a DNE is stationarity: both gradients (with respect to theta and with respect to phi) must be zero at (theta_star, phi_star). Also, because this is a locally optimal point, the curvature at (theta_star, phi_star) must go up in theta (player G wants to minimize, so the loss must be locally strictly convex in theta there, i.e. the Hessian with respect to theta is positive definite) and down in phi (player D wants to maximize, so the loss must be locally strictly concave in phi there, i.e. the Hessian with respect to phi is negative definite).
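To make the two conditions concrete, here is a minimal sketch on a hypothetical toy loss L(theta, phi) = theta^2 - phi^2 (not from the lecture), with scalar theta and phi. The helper names (`is_nash_on_grid`, `is_dne`), the step sizes and the tolerance are illustrative assumptions: condition 1 is only checked on a finite grid (a sanity check, not a proof), and condition 2 uses finite differences instead of exact derivatives. The last lines flip the signs to mimic the situation in the question: the curvature in each variable then has the wrong sign for that player's goal, and both checks fail.

```python
from typing import Callable, Sequence, Tuple

Loss = Callable[[float, float], float]

# Hypothetical toy game (not from the lecture): G picks theta to minimize,
# D picks phi to maximize. Its saddle point is (theta*, phi*) = (0, 0).
def loss(theta: float, phi: float) -> float:
    return theta ** 2 - phi ** 2

def grad(f: Loss, theta: float, phi: float, h: float = 1e-5) -> Tuple[float, float]:
    """Central finite differences for (dL/dtheta, dL/dphi)."""
    d_theta = (f(theta + h, phi) - f(theta - h, phi)) / (2 * h)
    d_phi = (f(theta, phi + h) - f(theta, phi - h)) / (2 * h)
    return d_theta, d_phi

def curvature(f: Loss, theta: float, phi: float, h: float = 1e-4) -> Tuple[float, float]:
    """Second derivatives (d2L/dtheta2, d2L/dphi2) by central differences."""
    c = f(theta, phi)
    d2_theta = (f(theta + h, phi) - 2 * c + f(theta - h, phi)) / h ** 2
    d2_phi = (f(theta, phi + h) - 2 * c + f(theta, phi - h)) / h ** 2
    return d2_theta, d2_phi

def is_nash_on_grid(f: Loss, theta_s: float, phi_s: float, grid: Sequence[float]) -> bool:
    """Condition 1 on a finite grid only:
    f(theta*, phi) <= f(theta*, phi*) <= f(theta, phi*)."""
    v = f(theta_s, phi_s)
    return all(f(theta_s, p) <= v for p in grid) and all(v <= f(t, phi_s) for t in grid)

def is_dne(f: Loss, theta_s: float, phi_s: float, tol: float = 1e-6) -> bool:
    """Condition 2: zero gradients, curvature up in theta and down in phi."""
    g_theta, g_phi = grad(f, theta_s, phi_s)
    c_theta, c_phi = curvature(f, theta_s, phi_s)
    return abs(g_theta) < tol and abs(g_phi) < tol and c_theta > 0 and c_phi < 0

grid = [i / 10 for i in range(-20, 21)]
print(is_nash_on_grid(loss, 0.0, 0.0, grid))  # True
print(is_dne(loss, 0.0, 0.0))                 # True
print(is_dne(loss, 0.5, 0.0))                 # False: dL/dtheta = 1 there

# Flipping the signs swaps the roles: at (0, 0) theta now maximizes and phi
# minimizes, so (0, 0) is not an equilibrium of this min-max game.
flipped = lambda theta, phi: -theta ** 2 + phi ** 2
print(is_nash_on_grid(flipped, 0.0, 0.0, grid))  # False
print(is_dne(flipped, 0.0, 0.0))                 # False
```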

Hope it helps! And I hope that what I have written is okay :))

If I am wrong, please correct me!

If you want, we can talk it over; email me at:
milos.novakovic@epfl.ch

3 Jan '21 ·

anonymous

I did not ask the question, but I think the explanation is clear. However, I believe the question is more about the notation, and indeed there seems to be a small error. When the Hessian is positive definite, the function is "curved upwards", so it increases, at least locally, when \(\theta\) moves away from the stationary point in any direction. Therefore, it is an equilibrium for the player that tries to minimize the function (hence the \(\phi\) player).
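A second-order Taylor expansion makes the "curved upwards" step precise. Assuming \(L\) is twice differentiable and \((\theta^*, \phi^*)\) is a stationary point, perturb only \(\theta\) by a small \(\delta \neq 0\):

```latex
L(\theta^* + \delta, \phi^*)
  = L(\theta^*, \phi^*)
  + \underbrace{\nabla_\theta L(\theta^*, \phi^*)^\top \delta}_{=\,0}
  + \tfrac{1}{2}\, \delta^\top \nabla^2_{\theta\theta} L(\theta^*, \phi^*)\, \delta
  + o(\|\delta\|^2)
  \;\ge\; L(\theta^*, \phi^*) + \tfrac{1}{2}\, \lambda_{\min} \|\delta\|^2 + o(\|\delta\|^2)
```

Here \(\lambda_{\min} > 0\) is the smallest eigenvalue of the positive definite Hessian \(\nabla^2_{\theta\theta} L(\theta^*, \phi^*)\). For small enough \(\delta\) the quadratic term dominates, so \(L(\theta^* + \delta, \phi^*) > L(\theta^*, \phi^*)\): \(\theta^*\) is a strict local minimizer of \(L(\cdot, \phi^*)\), which is what the player controlling \(\theta\) needs if that player minimizes. The same argument with a negative definite \(\nabla^2_{\phi\phi} L\) shows that \(\phi^*\) is a strict local maximizer of \(L(\theta^*, \cdot)\).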

If you look at the equation on the first line, I think it does not match what you say in your first sentence:

@milos5 said:
Player G (Generator): "I want to minimize the loss, and the only parameter I control is theta, which lies in the set BIG_THETA."

Player D (Discriminator/Adversary): "I want to maximize the loss, and the only parameter I control is phi, which lies in the set BIG_PHI."

3 Jan '21 ·

anonymous
