
Notes on the Keras manual: optimizers. Detailed explanations of each algorithm to be added later.

Optimizers: https://keras.io/optimizers/

Optimizers for which it is recommended to leave the parameters at their default values are marked with an asterisk (*) next to their names.

SGD

 keras.optimizers.SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)
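
A minimal usage sketch (the two-layer toy model and its dimensions are made up for illustration; assumes the standalone Keras package matching the lr-style signatures above): build the optimizer explicitly to override the defaults, then pass it to model.compile.

from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

# Hypothetical toy classifier: 20 input features, 10 output classes
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dense(10, activation='softmax'))

# SGD with momentum and Nesterov acceleration; decay shrinks lr after each update
sgd = optimizers.SGD(lr=0.01, momentum=0.9, decay=1e-6, nesterov=True)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])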

RMSprop*, a good choice for recurrent neural networks (RNNs)

keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)

Link: http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
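
Since the defaults are the recommended setting (only the learning rate is typically tuned), a usage sketch paired with a small recurrent model (the LSTM layer and its input shape are hypothetical, just to show the RNN pairing):

from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras import optimizers

# Hypothetical sequence model: 50 timesteps, 16 features per step
model = Sequential()
model.add(LSTM(32, input_shape=(50, 16)))
model.add(Dense(1, activation='sigmoid'))

# Keep rho/epsilon/decay at their defaults; only lr is commonly tuned
model.compile(optimizer=optimizers.RMSprop(lr=0.001), loss='binary_crossentropy')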

Adagrad*

keras.optimizers.Adagrad(lr=0.01, epsilon=None, decay=0.0)

Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the learning rate.

Link: http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
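
To make the "parameter-specific learning rate" idea concrete, here is a minimal NumPy sketch of the update rule from the paper (not Keras's implementation; function and variable names are illustrative):

import numpy as np

def adagrad_step(w, g, accum, lr=0.01, epsilon=1e-7):
    # Accumulate the squared gradient of every parameter over all past steps
    accum = accum + g ** 2
    # Frequently updated parameters (large accum) get a smaller effective step
    w = w - lr * g / (np.sqrt(accum) + epsilon)
    return w, accum

Because accum only ever grows, the effective learning rate shrinks monotonically; that is the weakness Adadelta addresses next.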

Adadelta*, a more robust extension of Adagrad

keras.optimizers.Adadelta(lr=1.0, rho=0.95, epsilon=None, decay=0.0)

Link: https://arxiv.org/abs/1212.5701
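
A matching illustrative NumPy sketch of Zeiler's update rule (again not the Keras code): the full sum of squared gradients is replaced by decaying averages, so the step size no longer vanishes over time.

import numpy as np

def adadelta_step(w, g, avg_g2, avg_dx2, rho=0.95, epsilon=1e-6):
    # Decaying average of squared gradients instead of Adagrad's ever-growing sum
    avg_g2 = rho * avg_g2 + (1 - rho) * g ** 2
    # Step scaled by the ratio of past update magnitudes to gradient magnitudes
    dx = -np.sqrt(avg_dx2 + epsilon) / np.sqrt(avg_g2 + epsilon) * g
    avg_dx2 = rho * avg_dx2 + (1 - rho) * dx ** 2
    return w + dx, avg_g2, avg_dx2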

Adam

keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)

Links:

https://arxiv.org/abs/1412.6980v8

https://openreview.net/forum?id=ryQu7f-RZ
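
Adam combines a momentum-like first moment with an RMSprop-like second moment, both bias-corrected. A short NumPy sketch following Algorithm 1 of the paper (names are illustrative; the amsgrad variant is not shown):

import numpy as np

def adam_step(w, g, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    # Exponential moving averages of the gradient and the squared gradient
    m = beta_1 * m + (1 - beta_1) * g
    v = beta_2 * v + (1 - beta_2) * g ** 2
    # Bias correction for the zero-initialized moments (t counts from 1)
    m_hat = m / (1 - beta_1 ** t)
    v_hat = v / (1 - beta_2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return w, m, v

Setting amsgrad=True in the Keras signature above switches to the AMSGrad variant from the second link, which uses the running maximum of past second-moment estimates instead of the current one.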

Adamax, a variant of Adam based on the infinity norm

keras.optimizers.Adamax(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0)

Links: same as for Adam.
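
The only change from Adam is the second-moment term: an exponentially weighted element-wise maximum of |g| (the infinity norm) replaces the squared-gradient average. An illustrative NumPy sketch (the small epsilon is not in the paper's algorithm; it is added here, as Keras does, to avoid division by zero):

import numpy as np

def adamax_step(w, g, m, u, t, lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    # First moment as in Adam
    m = beta_1 * m + (1 - beta_1) * g
    # Infinity norm: exponentially weighted element-wise max of past gradients
    u = np.maximum(beta_2 * u, np.abs(g))
    # Only the first moment needs bias correction
    w = w - (lr / (1 - beta_1 ** t)) * m / (u + epsilon)
    return w, m, u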

Nadam*, Nesterov Adam. Much like Adam is essentially RMSprop with momentum, Nadam is Adam with Nesterov momentum.

keras.optimizers.Nadam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=None, schedule_decay=0.004)

Links:

http://cs229.stanford.edu/proj2015/054_report.pdf

http://www.cs.toronto.edu/~fritz/absps/momentum.pdf
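
As with RMSprop, the defaults are the recommended setting, so typical usage just passes an instance at its defaults (or the string 'nadam') to model.compile; the one-layer model below is purely illustrative.

from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

# Hypothetical regression model compiled with Nadam at its recommended defaults
model = Sequential([Dense(1, input_dim=8)])
model.compile(optimizer=optimizers.Nadam(), loss='mse')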
