entry
- class SgdOptimizer(learning_rate=None)
  Stochastic gradient descent optimizer. With parameter x and gradient grad, the i-th update is (a minimal sketch follows the parameter list)
  \[x_{i+1} = x_{i} - \eta \cdot grad\]
  - Parameters:
    - learning_rate (float) – learning rate
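  The update rule above is simple enough to spell out directly. The following is a minimal NumPy sketch of a single SGD step; the array shapes and values are illustrative only and not part of this API.

  ```python
  import numpy as np

  def sgd_step(x, grad, learning_rate):
      # x_{i+1} = x_i - eta * grad
      return x - learning_rate * grad

  x = np.array([1.0, 2.0])
  grad = np.array([0.1, -0.2])
  x = sgd_step(x, grad, learning_rate=0.01)  # -> [0.999, 2.002]
  ```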
- class AdagradOptimizer(learning_rate=None, initial_accumulator_value=None, hessian_compression_times=1, warmup_steps=0, weight_decay_factor=0.0)
  Adagrad optimizer; see http://jmlr.org/papers/v12/duchi11a.html. With parameter x and gradient grad, the i-th update is (a minimal sketch follows the parameter list)
  \[\begin{aligned} g_{i+1} &= g_{i} + grad^2 \\ x_{i+1} &= x_{i} - \frac{\eta}{\sqrt{g_{i+1} + \epsilon}} \cdot grad \end{aligned}\]
  - Parameters:
    - learning_rate (float) – learning rate
    - initial_accumulator_value (float) – initial value of the accumulator
    - hessian_compression_times (float) – compresses the accumulator with a Hessian sketching algorithm during training; 1 means no compression, and larger values compress more aggressively
    - warmup_steps (int) – deprecated
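  A minimal NumPy sketch of the documented accumulator update, assuming the accumulator is seeded with initial_accumulator_value; the epsilon default is illustrative, and the Hessian-sketching compression and weight decay are omitted.

  ```python
  import numpy as np

  def adagrad_step(x, accum, grad, learning_rate, epsilon=1e-8):
      # g_{i+1} = g_i + grad^2
      accum = accum + grad ** 2
      # x_{i+1} = x_i - eta / sqrt(g_{i+1} + eps) * grad
      x = x - learning_rate / np.sqrt(accum + epsilon) * grad
      return x, accum

  x = np.array([1.0, 2.0])
  accum = np.full_like(x, 0.1)   # initial_accumulator_value
  grad = np.array([0.1, -0.2])
  x, accum = adagrad_step(x, accum, grad, learning_rate=0.05)
  ```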
- class AdamOptimizer(learning_rate=None, beta1=0.9, beta2=0.99, use_beta1_warmup=False, weight_decay_factor=0.0, use_nesterov=False, epsilon=0.01, warmup_steps=0)
  Adam optimizer; see https://arxiv.org/abs/1412.6980. With parameter w and gradient grad, the i-th update is (a minimal sketch follows the parameter list)
  \[\begin{aligned} m_{i+1} &= \beta_1 \cdot m_i + (1 - \beta_1) \cdot grad \\ v_{i+1} &= \beta_2 \cdot v_i + (1 - \beta_2) \cdot grad^2 \\ w_{i+1} &= w_i - \eta \cdot \frac{m_{i+1}}{\sqrt{v_{i+1} + \epsilon}} \end{aligned}\]
  - Parameters:
    - learning_rate (float) – learning rate
    - beta1 (float) – exponential decay rate for the first-moment estimate
    - beta2 (float) – exponential decay rate for the second-moment estimate
    - epsilon (float) – small offset that keeps the divisor away from zero
    - warmup_steps (int) – deprecated
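  A minimal NumPy sketch of the documented update, using the constructor defaults beta1=0.9, beta2=0.99, epsilon=0.01. Bias correction, Nesterov momentum, warmup, and weight decay are left out because the formulas above do not include them.

  ```python
  import numpy as np

  def adam_step(w, m, v, grad, learning_rate, beta1=0.9, beta2=0.99, epsilon=0.01):
      # m_{i+1} = beta1 * m_i + (1 - beta1) * grad
      m = beta1 * m + (1.0 - beta1) * grad
      # v_{i+1} = beta2 * v_i + (1 - beta2) * grad^2
      v = beta2 * v + (1.0 - beta2) * grad ** 2
      # w_{i+1} = w_i - eta * m_{i+1} / sqrt(v_{i+1} + eps)
      w = w - learning_rate * m / np.sqrt(v + epsilon)
      return w, m, v

  w = np.array([1.0, 2.0])
  m = np.zeros_like(w)
  v = np.zeros_like(w)
  w, m, v = adam_step(w, m, v, np.array([0.1, -0.2]), learning_rate=0.001)
  ```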
- class BatchSoftmaxOptimizer(learning_rate=None)
  Batch softmax optimizer; see https://research.google/pubs/pub48840/ (a usage sketch follows the parameter list).
  - Parameters:
    - learning_rate (float) – learning rate
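  The page documents only the constructor, so the example below is limited to instantiation. The import path is hypothetical and depends on how the `entry` module is packaged in your installation.

  ```python
  # Hypothetical import path: these classes live in the `entry` module documented
  # here, but the package prefix depends on your installation.
  from entry import BatchSoftmaxOptimizer

  # learning_rate is the only documented argument.
  opt = BatchSoftmaxOptimizer(learning_rate=0.01)
  ```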
- class FtrlOptimizer(learning_rate=None, initial_accumulator_value=None, beta=None, warmup_steps=0, l1_regularization=None, l2_regularization=None)
  FTRL optimizer; see https://dl.acm.org/citation.cfm?id=2488200 (an update sketch follows the parameter list).
  - Parameters:
    - initial_accumulator_value (float) – initial value of the accumulator
    - beta (float) – the beta value from the paper
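  The update rule itself is not spelled out on this page; the sketch below follows the per-coordinate FTRL-Proximal update from the cited McMahan et al. paper, with alpha standing in for learning_rate, l1/l2 for the regularization arguments, and beta for the documented beta (vectorized with NumPy for brevity).

  ```python
  import numpy as np

  def ftrl_step(w, z, n, grad, alpha, beta, l1, l2):
      # sigma rescales the per-coordinate learning rate as the accumulator n grows
      sigma = (np.sqrt(n + grad ** 2) - np.sqrt(n)) / alpha
      z = z + grad - sigma * w
      n = n + grad ** 2
      # Closed-form solution of the FTRL-Proximal subproblem with L1/L2 regularization
      w = np.where(
          np.abs(z) <= l1,
          0.0,
          -(z - np.sign(z) * l1) / ((beta + np.sqrt(n)) / alpha + l2),
      )
      return w, z, n

  w = np.zeros(2)
  z = np.zeros(2)
  n = np.full(2, 0.1)                  # initial_accumulator_value
  grad = np.array([0.1, -0.2])
  w, z, n = ftrl_step(w, z, n, grad, alpha=0.05, beta=1.0, l1=0.001, l2=0.001)
  ```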