Ex7 Question 1.2

In the solution for this question, why do we multiply the gradient by num_examples, since we only consider one data point when calculating the gradient?
import numpy as np

def calculate_stochastic_gradient(y, X, w, lambda_, n, num_examples):
    """Compute the stochastic (sub)gradient of the SVM objective at data point n."""
    def is_support(y_n, x_n, w):
        """A data point is a support vector if max{0, 1 - y_n * x_n @ w} is not 0."""
        return y_n * x_n @ w < 1

    x_n, y_n = X[n], y[n]
    # Hinge-loss subgradient for the sampled point (zero if it is not a support vector).
    grad = -y_n * x_n.T if is_support(y_n, x_n, w) else np.zeros_like(x_n.T)
    # Rescale by num_examples to keep the estimate unbiased, then add the
    # gradient of the regularizer.
    grad = num_examples * np.squeeze(grad) + lambda_ * w
    return grad
You want the stochastic gradient to be unbiased. The SVM objective is a sum over all data points, and you pick one data point uniformly at random with probability 1/num_examples, so you need to multiply its gradient by num_examples to recover the true gradient in expectation: the average of num_examples * g_n over a uniformly chosen n is exactly the sum of g_n over all n.

It is not a big problem if you drop this factor, but it changes the scale of the learning rate and the regularization parameter.
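As a quick sanity check, here is a minimal sketch (with made-up data and parameters, reusing calculate_stochastic_gradient from the snippet above) showing that averaging the rescaled stochastic gradients over all data points reproduces the full subgradient of the summed objective:

import numpy as np

rng = np.random.default_rng(0)
num_examples, dim = 50, 3          # arbitrary synthetic sizes
X = rng.normal(size=(num_examples, dim))
y = rng.choice([-1.0, 1.0], size=num_examples)
w = rng.normal(size=dim)
lambda_ = 0.1

# Full subgradient of  sum_n max{0, 1 - y_n * x_n @ w} + lambda/2 * ||w||^2.
margins = y * (X @ w)
full_grad = -(X.T * y).T[margins < 1].sum(axis=0) + lambda_ * w

# Average of the rescaled stochastic gradients over every possible draw of n.
avg_grad = np.mean(
    [calculate_stochastic_gradient(y, X, w, lambda_, n, num_examples)
     for n in range(num_examples)],
    axis=0,
)
print(np.allclose(full_grad, avg_grad))  # True

Note that the lambda_ * w term is added on every draw, so it survives the averaging unchanged; only the data term needs the num_examples rescaling.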