
Ex7 Question 1.2

In the solution for this question, why do we multiply the gradient by num_examples, given that we only use one data point to compute the gradient?

import numpy as np

def calculate_stochastic_gradient(y, X, w, lambda_, n, num_examples):
    """Compute a stochastic (sub)gradient of the SVM objective at data point n."""
    def is_support(y_n, x_n, w):
        """A data point is a support vector iff max{0, 1 - y_n x_n^T w} is not 0."""
        return y_n * x_n @ w < 1

    x_n, y_n = X[n], y[n]
    # Subgradient of the hinge loss at the sampled point.
    grad = -y_n * x_n.T if is_support(y_n, x_n, w) else np.zeros_like(x_n.T)
    # Scale by num_examples so the estimate is unbiased for the sum over all
    # data points, then add the gradient of the regularizer.
    grad = num_examples * np.squeeze(grad) + lambda_ * w
    return grad

You want the gradient estimate to be unbiased. In the SVM, the objective is a sum over data points; since you pick one data point uniformly at random with probability 1/num_examples, you need to multiply by num_examples so that, in expectation, you recover the true gradient.
It is not a big problem if you don't multiply by it, but it will change the scale of the learning rate and the regularization parameter.
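As a quick sanity check (not part of the exercise: the calculate_full_gradient helper and the random data below are my own illustration), averaging the scaled stochastic gradient over all n, i.e. taking the expectation under the uniform choice of data point, recovers the full-batch gradient exactly:

import numpy as np

def calculate_full_gradient(y, X, w, lambda_):
    # Hinge subgradient summed over all data points, plus the regularizer,
    # assuming the objective sum_n max{0, 1 - y_n x_n^T w} + (lambda/2)||w||^2.
    support = (y * (X @ w)) < 1
    return -(X[support].T @ y[support]) + lambda_ * w

rng = np.random.default_rng(0)
num_examples, dim = 50, 3
X = rng.standard_normal((num_examples, dim))
y = rng.choice([-1.0, 1.0], size=num_examples)
w = rng.standard_normal(dim)
lambda_ = 0.1

# Expectation over a uniformly chosen n = plain average over all n.
avg = np.mean(
    [calculate_stochastic_gradient(y, X, w, lambda_, n, num_examples)
     for n in range(num_examples)],
    axis=0,
)
print(np.allclose(avg, calculate_full_gradient(y, X, w, lambda_)))  # True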
