Hello,

My team and I are trying to implement the Random Search algorithm in order to compare zero-order and first-order methods. I have a few questions about this:

1. In the lecture 10 slides, the line search is done over all gammas on the real line. However, I don’t see how we can do this – wouldn’t we need the gradient to be able to find the best gamma that minimizes the function in the given direction? In that case, are we supposed to approximate the gradient somehow? Or is it enough to minimize over a grid of gammas?

2. Since zero-order methods don’t require computing the gradient, there seems to be no point in using batches. But this makes it difficult to compare the method to the first-order algorithms, where the loss per epoch is much higher since it is accumulated over many batches. For the sake of the comparison, would it be sensible to run the Random Search in batches, even though it slows it down unnecessarily?

Top comment

Hi,

wouldn’t we need the gradient to be able to find the best gamma that minimizes the function in the given direction?

Such a line search is typically executed with binary search, or by trying learning rates on an exponential grid and then refining with binary search.
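To make this concrete, here is a minimal sketch of that idea (my own illustration, not code from the lecture): evaluate the loss along the direction for step sizes on an exponential grid, then refine around the best grid point with a ternary search. This uses only function evaluations, no gradients, but the refinement step assumes the loss is roughly unimodal along the direction near the best grid point.

```python
import numpy as np

def line_search(f, x, d, exponents=range(-10, 3), refine_steps=20):
    """Zero-order line search along direction d.

    1) Try step sizes gamma = 2**k on an exponential grid.
    2) Refine around the best grid point with a ternary search
       (assumes f(x + gamma * d) is roughly unimodal there).
    Only loss evaluations are used -- no gradients.
    """
    grid = [2.0 ** k for k in exponents]
    losses = [f(x + g * d) for g in grid]
    i = int(np.argmin(losses))
    # Bracket around the best grid point (clamped at the grid ends).
    lo = grid[max(i - 1, 0)]
    hi = grid[min(i + 1, len(grid) - 1)]
    for _ in range(refine_steps):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(x + m1 * d) < f(x + m2 * d):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# Toy example: a quadratic whose minimum along d from x is at gamma = 1.
f = lambda v: float(np.sum((v - 1.0) ** 2))
x = np.zeros(3)
d = np.ones(3)
gamma = line_search(f, x, d)
```

So "binary search" here really means repeatedly shrinking a bracket around the minimizer using only loss comparisons; the exponential grid is what makes an initial bracket cheap to find.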

Since zero-order methods don’t require the computation of the gradient, there is no point in using batches.

Why? I think batching can still yield better utilization of the compute hardware (GPUs).

For the sake of the comparison, would it be sensible to do the Random Search in batches, even though it slows it down unnecessarily?

I'm not sure I follow why it slows things down unnecessarily, but you could probably run all methods without batches?
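For the comparison question, one option is to run the random search over minibatches anyway and accumulate the per-batch losses, so its epoch loss is on the same scale as the first-order methods'. A rough sketch (my own simplified variant, not necessarily the lecture's exact algorithm): per minibatch, propose a random perturbation of the weights and keep it only if the minibatch loss improves.

```python
import numpy as np

def random_search_epoch(loss_fn, w, data, batch_size=32, step=0.1, rng=None):
    """One epoch of a simple batched random search (illustrative sketch).

    On each minibatch, propose a random perturbation of w and accept it
    only if the minibatch loss improves. The per-batch losses are summed,
    so the returned epoch loss is directly comparable to an SGD-style
    epoch loss accumulated over the same batches.
    """
    rng = rng or np.random.default_rng(0)
    epoch_loss = 0.0
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        current = loss_fn(w, batch)
        candidate = w + step * rng.standard_normal(w.shape)
        cand_loss = loss_fn(candidate, batch)
        if cand_loss < current:
            w, current = candidate, cand_loss
        epoch_loss += current
    return w, epoch_loss

# Toy example: move w toward the mean of some synthetic data.
rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=0.5, size=(256, 3))
loss_fn = lambda w, batch: float(np.mean((batch - w) ** 2))
w = np.zeros(3)
for _ in range(50):
    w, epoch_loss = random_search_epoch(loss_fn, w, data, rng=rng)
```

Whether the batched variant is "slower" depends on the cost model: it does one loss evaluation per batch rather than per epoch, but each evaluation is much cheaper, and on a GPU the batched evaluations may actually be the faster option.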

Hi and thank you for your response,

Unfortunately, I don't see how we can apply binary search in this case. Could you please specify what exactly the search is applied to, or refer me to some literature? To my understanding, we can't simply apply binary search to the gammas; we would need to have all the corresponding losses as well.
