Embedding

initializer

class ZerosInitializer

Zeros initializer. Initializes the embedding to all zeros.

Example

>>> from monolith.native_training.entry import ZerosInitializer
>>> initializer = ZerosInitializer()
class ConstantsInitializer(constant)

Constant initializer. Initializes the embedding to the given constant value.

Example

>>> from monolith.native_training.entry import ConstantsInitializer
>>> initializer = ConstantsInitializer(0.0)
class RandomUniformInitializer(minval=None, maxval=None)

Random uniform initializer. Initial embedding values are drawn uniformly from the interval [minval, maxval].

Example

>>> from monolith.native_training.entry import RandomUniformInitializer
>>> initializer = RandomUniformInitializer(-0.015625, 0.015625)
Parameters:
  • minval (float) – lower bound of the initialization interval

  • maxval (float) – upper bound of the initialization interval
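
Conceptually, each embedding dimension receives an independent uniform draw from [minval, maxval]. The NumPy sketch below only illustrates the interval semantics of minval and maxval; it is not Monolith's actual initialization code.

>>> import numpy as np
>>> minval, maxval = -0.015625, 0.015625
>>> values = np.random.uniform(minval, maxval, size=4)  # one 4-dim embedding (illustrative)
>>> bool(np.all((values >= minval) & (values <= maxval)))
True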

optimizer

class SgdOptimizer(learning_rate=None)

Stochastic gradient descent (SGD) optimizer. Let \(x\) be the parameter and \(grad\) its gradient; the \(i\)-th update is

\[x_{i+1} = x_{i} - \eta * grad\]

Example

>>> from monolith.native_training.entry import SgdOptimizer
>>> optimizer = SgdOptimizer(0.01)
Parameters:
  • learning_rate (float) – learning rate
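
The update above is plain SGD: the gradient scaled by the learning rate \(\eta\) is subtracted from the current value. A minimal NumPy sketch of one step, shown as an illustration of the formula rather than Monolith's internal kernel:

>>> import numpy as np
>>> eta = 0.01                     # learning_rate
>>> x = np.array([0.5, -0.25])     # current embedding values (illustrative)
>>> grad = np.array([0.1, -0.2])   # gradient for this step
>>> x = x - eta * grad             # x_{i+1} = x_i - eta * grad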

class AdagradOptimizer(learning_rate=None, initial_accumulator_value=None, hessian_compression_times=1, warmup_steps=0, weight_decay_factor=0.0)

Adagrad optimizer; see http://jmlr.org/papers/v12/duchi11a.html. Let \(x\) be the parameter and \(grad\) its gradient; the \(i\)-th update is

\[\begin{aligned}
g_{i+1} &= g_{i} + grad^2 \\
x_{i+1} &= x_{i} - \frac{\eta}{\sqrt{g_{i+1} + \epsilon}} \, grad
\end{aligned}\]

Example

>>> from monolith.native_training.entry import AdagradOptimizer
>>> optimizer = AdagradOptimizer(0.01)
Parameters:
  • learning_rate (float) – learning rate

  • initial_accumulator_value (float) – initial value of the accumulator

  • hessian_compression_times (float) – compresses the accumulator during training using a hessian sketching algorithm; 1 means no compression, and the larger the value, the stronger the compression

  • warmup_steps (int) – deprecated
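
A minimal NumPy sketch of the recurrence above, one step per call. It illustrates the formula only, not Monolith's hash-table implementation; the epsilon value here is an assumption:

>>> import numpy as np
>>> eta, eps = 0.01, 1e-8                  # learning_rate; eps is illustrative
>>> x = np.array([0.5, -0.25])             # embedding values
>>> g = np.full_like(x, 0.1)               # accumulator seeded with initial_accumulator_value
>>> grad = np.array([0.1, -0.2])
>>> g = g + grad ** 2                      # g_{i+1} = g_i + grad^2
>>> x = x - eta / np.sqrt(g + eps) * grad  # x_{i+1} = x_i - eta / sqrt(g_{i+1} + eps) * grad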

class AdamOptimizer(learning_rate=None, beta1=0.9, beta2=0.99, use_beta1_warmup=False, weight_decay_factor=0.0, use_nesterov=False, epsilon=0.01, warmup_steps=0)

Adam optimizer; see https://arxiv.org/abs/1412.6980

Let \(x\) be the parameter and \(grad\) its gradient; the \(i\)-th update is

\[\begin{aligned}
m_{i+1} &= \beta_1 * m_i + (1 - \beta_1) * grad \\
v_{i+1} &= \beta_2 * v_i + (1 - \beta_2) * grad^2 \\
x_{i+1} &= x_i - \eta * \frac{m_{i+1}}{\sqrt{v_{i+1} + \epsilon}}
\end{aligned}\]

Example

>>> from monolith.native_training.entry import AdamOptimizer
>>> optimizer = AdamOptimizer(0.01)
Parameters:
  • learning_rate (float) – learning rate

  • beta1 (float) – exponential decay rate for the first-moment estimates

  • beta2 (float) – exponential decay rate for the second-moment estimates

  • epsilon (float) – small constant that keeps the denominator away from zero

  • warmup_steps (int) – deprecated
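
A minimal NumPy sketch of the recurrence above. It follows the formula as written (no bias correction) and uses the constructor defaults for beta1, beta2, and epsilon; it is an illustration, not Monolith's implementation:

>>> import numpy as np
>>> eta, beta1, beta2, eps = 0.01, 0.9, 0.99, 0.01
>>> x = np.array([0.5, -0.25])                # embedding values
>>> m = np.zeros_like(x)                      # first-moment estimate
>>> v = np.zeros_like(x)                      # second-moment estimate
>>> grad = np.array([0.1, -0.2])
>>> m = beta1 * m + (1 - beta1) * grad        # m_{i+1}
>>> v = beta2 * v + (1 - beta2) * grad ** 2   # v_{i+1}
>>> x = x - eta * m / np.sqrt(v + eps)        # x_{i+1}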

class FtrlOptimizer(learning_rate=None, initial_accumulator_value=None, beta=None, warmup_steps=0, l1_regularization=None, l2_regularization=None)

FTRL optimizer; see https://dl.acm.org/citation.cfm?id=2488200

Example

>>> from monolith.native_training.entry import FtrlOptimizer
>>> optimizer = FtrlOptimizer(0.01)
Parameters:
  • initial_accumulator_value (float) – initial value of the accumulator

  • beta (float) – the beta value from the paper
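
The referenced paper defines a per-coordinate FTRL-Proximal update. The NumPy sketch below follows that paper; the constructor's learning_rate, beta, l1_regularization, and l2_regularization presumably correspond to the paper's \(\alpha\), \(\beta\), \(\lambda_1\), and \(\lambda_2\). It is an illustration of the algorithm, not Monolith's implementation:

>>> import numpy as np
>>> alpha, beta, l1, l2 = 0.01, 1.0, 0.0, 0.0   # illustrative values
>>> w = np.array([0.5, -0.25])                  # current weights
>>> z = np.zeros_like(w)                        # FTRL linear term
>>> n = np.zeros_like(w)                        # sum of squared gradients
>>> grad = np.array([0.1, -0.2])
>>> sigma = (np.sqrt(n + grad ** 2) - np.sqrt(n)) / alpha
>>> z = z + grad - sigma * w
>>> n = n + grad ** 2
>>> w = np.where(np.abs(z) <= l1, 0.0,
...              -(z - np.sign(z) * l1) / ((beta + np.sqrt(n)) / alpha + l2))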

compressor

class Fp16Compressor

At model inference time, encodes embeddings as Fp16 in order to save memory during serving.

Example

>>> from monolith.native_training.entry import Fp16Compressor
>>> compressor = Fp16Compressor()
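
Fp16 halves the per-value footprint relative to Fp32. The small NumPy example below illustrates the memory saving only; how the compressor is applied at serving time is handled by Monolith itself:

>>> import numpy as np
>>> emb = np.zeros((1, 16), dtype=np.float32)   # one 16-dim embedding (illustrative)
>>> emb.nbytes, emb.astype(np.float16).nbytes
(64, 32)
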
class Fp32Compressor

At model inference time, encodes embeddings as Fp32.

Example

>>> from monolith.native_training.entry import Fp32Compressor
>>> compressor = Fp32Compressor()