Project topic

Hello,

I have a question regarding a suggested topic for the project:

"AdaGrad / Adam / signSGD: Can you suggest/try different data-dependent coordinate-wise learning rate schemes and compare them?"

What does "data-dependent" mean? If we work with images, would any type of image fit?

Thank you in advance!

Top comment

Hello,

Algorithms like Adam use a different step size per coordinate. That's the coordinate-wise part. Those step sizes depend on the gradients observed in the past. If you run them on different data, you would get different step sizes. That's the data-dependent part.
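
To make this concrete, here is a minimal sketch of my own (not taken from the course material). It uses AdaGrad rather than Adam because its update is shorter, and runs it on a toy least-squares problem. The helper names `adagrad_step_sizes` and `make_data`, the feature scales, and the hyperparameters are all assumptions chosen for illustration. The two datasets differ only in which feature has the larger scale.

```python
import math
import random


def adagrad_step_sizes(data, lr=0.1, eps=1e-8, epochs=5):
    """Run AdaGrad on a least-squares loss; return the final per-coordinate step sizes."""
    dim = len(data[0][0])
    w = [0.0] * dim
    g2_sum = [0.0] * dim  # running sum of squared gradients, one entry per coordinate
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            grad = [err * xi for xi in x]  # gradient of 0.5 * (w.x - y)^2
            for i in range(dim):
                g2_sum[i] += grad[i] ** 2
                # coordinate-wise: each coordinate i has its own step size
                w[i] -= lr / (math.sqrt(g2_sum[i]) + eps) * grad[i]
    return [lr / (math.sqrt(s) + eps) for s in g2_sum]


def make_data(scales, n=50, seed=0):
    """Hypothetical toy data: Gaussian features with the given per-feature scales."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, s) for s in scales]
        y = sum(x)  # targets generated with true weights all equal to 1
        data.append((x, y))
    return data


# Same algorithm, same hyperparameters, different data.
steps_a = adagrad_step_sizes(make_data([1.0, 0.1]))
steps_b = adagrad_step_sizes(make_data([0.1, 1.0]))
print("dataset A step sizes:", steps_a)
print("dataset B step sizes:", steps_b)
```

In this sketch, the coordinate whose feature has the larger scale sees larger gradients and therefore should end up with the smaller step size, so the ordering of the step sizes flips between the two datasets. Nothing in the algorithm changed, only the data.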

Hope this clarifies the confusion. Let me know otherwise!

Ok, that's clear! Thank you!
