
GD on Hyperparameters

In the lecture, Martin mentioned that Grid Search is the simplest but not the best optimization algorithm. My question is: why, then, is it used to tune hyperparameters?

Would it be possible to do gradient descent or a variation to optimize the hyperparameters?

Is there literature on adaptive grid search algorithms?

My guess is that the range of the hyperparameters is much smaller than that of the regular parameters. In that scenario, a grid search does the trick without costing too much.
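For reference, here is a minimal sketch of what such a grid search looks like in practice, using scikit-learn's GridSearchCV; the estimator, the dataset, and the grid values are just placeholder choices:

```python
# Minimal grid search sketch: exhaustively evaluate every combination
# in a small hyperparameter grid with cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# The grid is tiny (3 x 3 = 9 combinations), so exhaustive search is cheap.
param_grid = {
    "C": [0.1, 1.0, 10.0],      # regularization strength (hyperparameter)
    "gamma": [0.01, 0.1, 1.0],  # RBF kernel width (hyperparameter)
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```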

Would it be possible to do gradient descent or a variation to optimize the hyperparameters?

You could: https://stackoverflow.com/questions/43420493/sklearn-hyperparameter-tuning-by-gradient-descent
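The basic idea there can be sketched as: treat the validation loss as a function of a continuous hyperparameter and follow an estimate of its gradient. Below is a toy illustration of my own (the ridge model, the log-alpha parameterization, the finite-difference gradient, and the step sizes are all just assumptions for the demo, not a prescribed recipe):

```python
# Sketch of gradient-based tuning of one continuous hyperparameter
# (the ridge regularization strength), using a finite-difference
# estimate of the gradient of the validation loss.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=20, noise=10.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

def val_loss(log_alpha):
    """Validation MSE of a ridge model trained with alpha = exp(log_alpha)."""
    model = Ridge(alpha=np.exp(log_alpha)).fit(X_tr, y_tr)
    return mean_squared_error(y_val, model.predict(X_val))

log_alpha, lr, eps = 0.0, 0.1, 1e-3  # work in log-space so alpha stays positive
for step in range(50):
    # Central finite difference approximates d(val_loss)/d(log_alpha).
    grad = (val_loss(log_alpha + eps) - val_loss(log_alpha - eps)) / (2 * eps)
    log_alpha -= lr * grad

print("tuned alpha:", np.exp(log_alpha), "val MSE:", val_loss(log_alpha))
```

Note that this only works for continuous hyperparameters where the validation loss responds smoothly; it says nothing about discrete choices like the number of layers.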

The main reason is that the training objective is often not differentiable with respect to the hyperparameters. A hyperparameter can be, for example, the mini-batch size, the number of neurons per layer, or many other discrete choices. The step size is also a hyperparameter, and some people have tried gradient descent on it; this is called meta-learning.
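For the step-size case specifically, here is a toy sketch (my own, on a simple quadratic, not from the lecture) of the "hypergradient" idea: differentiate the loss after one SGD step with respect to the step size, and update the step size with that gradient.

```python
# Tiny sketch of hypergradient descent: adapting the step size alpha
# itself by gradient descent, here on a simple quadratic objective.
# Since w_t = w_{t-1} - alpha * g_{t-1}, the derivative of the loss at w_t
# with respect to alpha is -g_t . g_{t-1}, giving the update below.
import numpy as np

w_star = np.array([3.0, -2.0])   # optimum of the toy quadratic (made up for the demo)

def grad(w):
    # gradient of f(w) = 0.5 * ||w - w_star||^2
    return w - w_star

w = np.zeros(2)
alpha, beta = 0.01, 0.001        # step size and "hyper step size"

g_prev = grad(w)
for t in range(100):
    w = w - alpha * g_prev
    g = grad(w)
    alpha = alpha + beta * (g @ g_prev)  # hypergradient update on the step size
    g_prev = g

print("final w:", w, "adapted alpha:", alpha)
```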

