Ex7 Question 1.2

In the solution for this question, why do we multiply the gradient by num_examples, since we only consider one data point when calculating the gradient?
import numpy as np

def calculate_stochastic_gradient(y, X, w, lambda_, n, num_examples):
    """Compute the stochastic (sub)gradient of the SVM objective at data point n."""
    def is_support(y_n, x_n, w):
        """A data point is a support vector if max{0, 1 - y_n * x_n @ w} is not 0."""
        return y_n * x_n @ w < 1

    x_n, y_n = X[n], y[n]
    # Hinge-loss subgradient for the sampled point (zero if it is not a support vector).
    grad = -y_n * x_n.T if is_support(y_n, x_n, w) else np.zeros_like(x_n.T)
    # Rescale by num_examples to keep the estimate unbiased, then add the
    # gradient of the regularizer.
    grad = num_examples * np.squeeze(grad) + lambda_ * w
    return grad
You want the stochastic gradient to be unbiased. The SVM objective is a sum over all data points, and you pick one data point uniformly at random with probability 1/num_examples, so you need to multiply its gradient by num_examples to recover the true gradient in expectation: the average of num_examples * g_n over a uniformly chosen n is exactly the sum of g_n over all n.

It is not a big problem if you drop this factor, but it changes the scale of the learning rate and the regularization parameter.
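As a quick sanity check, here is a minimal sketch (with made-up data and parameters, reusing calculate_stochastic_gradient from the snippet above) showing that averaging the rescaled stochastic gradients over all data points reproduces the full subgradient of the summed objective:

import numpy as np

rng = np.random.default_rng(0)
num_examples, dim = 50, 3          # arbitrary synthetic sizes
X = rng.normal(size=(num_examples, dim))
y = rng.choice([-1.0, 1.0], size=num_examples)
w = rng.normal(size=dim)
lambda_ = 0.1

# Full subgradient of  sum_n max{0, 1 - y_n * x_n @ w} + lambda/2 * ||w||^2.
margins = y * (X @ w)
full_grad = -(X.T * y).T[margins < 1].sum(axis=0) + lambda_ * w

# Average of the rescaled stochastic gradients over every possible draw of n.
avg_grad = np.mean(
    [calculate_stochastic_gradient(y, X, w, lambda_, n, num_examples)
     for n in range(num_examples)],
    axis=0,
)
print(np.allclose(full_grad, avg_grad))  # True

Note that the lambda_ * w term is added on every draw, so it survives the averaging unchanged; only the data term needs the num_examples rescaling.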