Optimizers: https://keras.io/optimizers/
For the optimizers below where it is recommended to leave the parameters at their default values, an asterisk is placed next to the name.
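A minimal usage sketch (assuming standalone Keras 2.x, matching the lr-style signatures below): an optimizer can be passed to compile() by name, which uses its defaults, or as a configured instance.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])  # toy model for illustration

# by name: the optimizer's default parameters are used
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# as an instance: parameters can be set explicitly
model.compile(optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True),
              loss='categorical_crossentropy')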
SGD
keras.optimizers.SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)
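A rough NumPy sketch of the update rule these parameters control (an illustration, not the Keras source): momentum accumulates a velocity, and nesterov takes a look-ahead step along that velocity.

import numpy as np

def sgd_step(w, grad, v, lr=0.01, momentum=0.0, nesterov=False):
    v = momentum * v - lr * grad          # velocity update
    if nesterov:
        w = w + momentum * v - lr * grad  # look-ahead step
    else:
        w = w + v
    return w, v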
RMSprop*, usually a good choice for recurrent neural networks
keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
Link: http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
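Sketch: RMSprop on a small recurrent model (the shapes are illustrative); typically only lr is tuned and the other parameters are left at their defaults.

from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import RMSprop

model = Sequential([
    LSTM(32, input_shape=(100, 8)),  # 100 timesteps, 8 features
    Dense(1),
])
model.compile(optimizer=RMSprop(lr=0.001), loss='mse')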
Adagrad*
keras.optimizers.Adagrad(lr=0.01, epsilon=None, decay=0.0)
Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the learning rate.
Link: http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
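A rough NumPy sketch of the per-parameter adaptation described above (an illustration, not the Keras source): the squared-gradient accumulator only grows, so frequently updated parameters see a shrinking effective learning rate.

import numpy as np

def adagrad_step(w, grad, accum, lr=0.01, epsilon=1e-7):
    accum = accum + grad ** 2                       # per-parameter gradient history
    w = w - lr * grad / (np.sqrt(accum) + epsilon)  # bigger history -> smaller step
    return w, accum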
Adadelta*, a more robust extension of Adagrad
keras.optimizers.Adadelta(lr=1.0, rho=0.95, epsilon=None, decay=0.0)
Link: https://arxiv.org/abs/1212.5701
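A rough NumPy sketch of why it is more robust (following the paper above, not the Keras source): both accumulators are decaying averages controlled by rho, so unlike Adagrad the effective learning rate does not shrink toward zero; the default lr=1.0 means the computed delta is applied as-is.

import numpy as np

def adadelta_step(w, grad, avg_sq_grad, avg_sq_delta, rho=0.95, epsilon=1e-6):
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad ** 2
    delta = -np.sqrt(avg_sq_delta + epsilon) / np.sqrt(avg_sq_grad + epsilon) * grad
    avg_sq_delta = rho * avg_sq_delta + (1 - rho) * delta ** 2
    return w + delta, avg_sq_grad, avg_sq_delta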
Adam
keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
Links:
https://arxiv.org/abs/1412.6980v8
https://openreview.net/forum?id=ryQu7f-RZ
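Sketch: amsgrad=True switches on the AMSGrad variant from the second link ("On the Convergence of Adam and Beyond"); the toy model is only for illustration.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential([Dense(1, input_shape=(16,))])
model.compile(optimizer=Adam(lr=0.001, amsgrad=True), loss='mse')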
Adamax, a variant of Adam based on the infinity norm
keras.optimizers.Adamax(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0)
Links: same as Adam
Nadam* (Nesterov Adam): much like Adam is essentially RMSprop with momentum, Nadam is Adam with Nesterov momentum.
keras.optimizers.Nadam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=None, schedule_decay=0.004)
Links:
http://cs229.stanford.edu/proj2015/054_report.pdf
http://www.cs.toronto.edu/~fritz/absps/momentum.pdf