Embedding¶
initializer¶
- class ZerosInitializer¶
All-zeros initializer: sets the initial embedding values to all zeros.
Example
>>> from monolith.native_training.entry import ZerosInitializer
>>> initializer = ZerosInitializer()
- class ConstantsInitializer(constant)¶
Constant initializer: sets the initial embedding values to the given constant.
Example
>>> from monolith.native_training.entry import ConstantsInitializer
>>> initializer = ConstantsInitializer(0.0)
- class RandomUniformInitializer(minval=None, maxval=None)¶
Random uniform initializer: draws the initial embedding values uniformly from the interval [minval, maxval].
Example
>>> from monolith.native_training.entry import RandomUniformInitializer
>>> initializer = RandomUniformInitializer(-0.015625, 0.015625)
- Parameters:
  - minval (float) – lower bound of the initialization interval
  - maxval (float) – upper bound of the initialization interval
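For intuition, a minimal NumPy sketch of the values each of the three initializers above would produce for a 4-dimensional embedding (NumPy stands in for Monolith's internal table here, purely as an illustration):
>>> import numpy as np
>>> dim = 4
>>> np.zeros(dim)                                        # ZerosInitializer
array([0., 0., 0., 0.])
>>> np.full(dim, 0.5)                                    # ConstantsInitializer(0.5)
array([0.5, 0.5, 0.5, 0.5])
>>> emb = np.random.uniform(-0.015625, 0.015625, dim)    # RandomUniformInitializer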
optimizer¶
- class SgdOptimizer(learning_rate=None)¶
Stochastic gradient descent optimizer. Let the parameter be \(x\) and the gradient be \(grad\); the \(i\)-th update is
\[x_{i+1} = x_{i} - \eta * grad\]
Example
>>> from monolith.native_training.entry import SgdOptimizer
>>> optimizer = SgdOptimizer(0.01)
- Parameters:
  - learning_rate (float) – learning rate
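A minimal NumPy sketch of the update rule above (an illustration only, not the Monolith implementation):
>>> import numpy as np
>>> def sgd_step(x, grad, lr=0.01):
...     # x_{i+1} = x_i - eta * grad
...     return x - lr * grad
>>> x = np.array([0.10, -0.20])
>>> x = sgd_step(x, grad=np.array([0.5, 0.5]))   # -> approximately [0.095, -0.205]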
- class AdagradOptimizer(learning_rate=None, initial_accumulator_value=None, hessian_compression_times=1, warmup_steps=0, weight_decay_factor=0.0)¶
Adagrad optimizer; see http://jmlr.org/papers/v12/duchi11a.html for the paper. Let the parameter be \(x\) and the gradient be \(grad\); the \(i\)-th update is
\[\begin{aligned}g_{i+1} &= g_{i} + grad^2\\x_{i+1} &= x_{i} - \frac{\eta}{\sqrt{g_{i+1} + \epsilon}}\, grad\end{aligned}\]
Example
>>> from monolith.native_training.entry import AdagradOptimizer
>>> optimizer = AdagradOptimizer(0.01)
- Parameters:
  - learning_rate (float) – learning rate
  - initial_accumulator_value (float) – initial value of the accumulator
  - hessian_compression_times (float) – during training, compresses the accumulator with the hessian sketching algorithm; 1 means no compression, and larger values give stronger compression
  - warmup_steps (int) – deprecated
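A minimal NumPy sketch of one Adagrad step as described above (the epsilon value is an assumed placeholder; an illustration only, not the Monolith implementation):
>>> import numpy as np
>>> def adagrad_step(x, acc, grad, lr=0.01, eps=1e-8):
...     # accumulate squared gradients, then scale the step by 1/sqrt(accumulator)
...     acc = acc + grad ** 2
...     x = x - lr / np.sqrt(acc + eps) * grad
...     return x, acc
>>> x, acc = np.array([0.1]), np.array([0.1])    # acc starts at initial_accumulator_value
>>> x, acc = adagrad_step(x, acc, grad=np.array([0.5]))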
- class AdamOptimizer(learning_rate=None, beta1=0.9, beta2=0.99, use_beta1_warmup=False, weight_decay_factor=0.0, use_nesterov=False, epsilon=0.01, warmup_steps=0)¶
Adam optimizer; see https://arxiv.org/abs/1412.6980 for the paper.
Let the parameter be \(w\) and the gradient be \(grad\); the \(i\)-th update is
\[\begin{aligned}m_{i+1} &= \beta_1 * m_i + (1 - \beta_1) * grad\\v_{i+1} &= \beta_2 * v_i + (1 - \beta_2) * grad^2\\w_{i+1} &= w_i - \eta * \frac{m_{i+1}}{\sqrt{v_{i+1} + \epsilon}}\end{aligned}\]
Example
>>> from monolith.native_training.entry import AdamOptimizer
>>> optimizer = AdamOptimizer(0.01)
- Parameters:
  - learning_rate (float) – learning rate
  - beta1 (float) – exponential decay rate for the first-moment estimates
  - beta2 (float) – exponential decay rate for the second-moment estimates
  - epsilon (float) – small offset that keeps the divisor away from zero
  - warmup_steps (int) – deprecated
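A minimal NumPy sketch of one Adam step following the formula above (defaults for beta1, beta2 and epsilon are taken from the signature; no bias-correction term is shown, matching the formula; an illustration only, not the Monolith implementation):
>>> import numpy as np
>>> def adam_step(w, m, v, grad, lr=0.01, beta1=0.9, beta2=0.99, eps=0.01):
...     m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
...     v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
...     w = w - lr * m / np.sqrt(v + eps)            # parameter update
...     return w, m, v
>>> w = m = v = np.zeros(2)
>>> w, m, v = adam_step(w, m, v, grad=np.array([0.5, -0.5]))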
- class FtrlOptimizer(learning_rate=None, initial_accumulator_value=None, beta=None, warmup_steps=0, l1_regularization=None, l2_regularization=None)¶
FTRL optimizer; see https://dl.acm.org/citation.cfm?id=2488200 for the paper.
Example
>>> from monolith.native_training.entry import FtrlOptimizer
>>> optimizer = FtrlOptimizer(0.01)
- Parameters:
  - initial_accumulator_value (float) – initial value of the accumulator
  - beta (float) – the beta value from the paper
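Every argument in the signature above can also be passed by keyword; for example, a sketch that enables the L1/L2 regularization terms (the values here are placeholders, not recommended settings):
>>> from monolith.native_training.entry import FtrlOptimizer
>>> optimizer = FtrlOptimizer(learning_rate=0.01,
...                           initial_accumulator_value=0.1,
...                           beta=1.0,
...                           l1_regularization=0.001,
...                           l2_regularization=0.001)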
compressor¶
- class Fp16Compressor¶
At inference time, encodes embeddings as Fp16 in order to save memory during serving.
Example
>>> from monolith.native_training.entry import Fp16Compressor
>>> compressor = Fp16Compressor()
- class Fp32Compressor¶
At inference time, encodes embeddings as Fp32.
Example
>>> from monolith.native_training.entry import Fp32Compressor
>>> compressor = Fp32Compressor()
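To see why Fp16 encoding halves serving memory relative to Fp32, a NumPy sketch of the per-embedding storage cost (an illustration of the encoding idea only, not Monolith's internal storage format):
>>> import numpy as np
>>> emb = np.random.uniform(-0.015625, 0.015625, 16).astype(np.float32)
>>> emb.nbytes                        # 16 values * 4 bytes with Fp32
64
>>> emb.astype(np.float16).nbytes     # 16 values * 2 bytes with Fp16
32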