merlin.models.tf.MultiOptimizer#
- class merlin.models.tf.MultiOptimizer(optimizers_and_blocks: Sequence[merlin.models.tf.blocks.optimizer.OptimizerBlocks], default_optimizer: Union[str, keras.optimizers.optimizer.Optimizer] = 'rmsprop', name: str = 'MultiOptimizer', **kwargs)[source]#
Bases: keras.optimizers.optimizer.Optimizer
An optimizer that composes multiple individual optimizers.
It allows different optimizers to be applied to different subsets of the model's variables. For example, it is possible to apply one optimizer to the blocks that contain the model's embeddings (sparse variables) and another optimizer to the rest of its variables (other blocks).
To specify which optimizer should apply to each block, pass a list of pairs of (optimizer instance, blocks the optimizer should apply to).
import merlin.models.tf as ml

user_tower = ml.InputBlock(schema.select_by_tag(Tags.USER)).connect(ml.MLPBlock([512, 256]))
item_tower = ml.InputBlock(schema.select_by_tag(Tags.ITEM)).connect(ml.MLPBlock([512, 256]))
third_tower = ml.InputBlock(schema.select_by_tag(Tags.ITEM)).connect(ml.MLPBlock([64]))
three_tower = ml.ParallelBlock({"user": user_tower, "item": item_tower, "third": third_tower})
model = ml.Model(three_tower, ml.BinaryClassificationTask("click"))

# The third_tower would be assigned the default_optimizer ("adagrad" in this example)
optimizer = ml.MultiOptimizer(
    default_optimizer="adagrad",
    optimizers_and_blocks=[
        ml.OptimizerBlocks(tf.keras.optimizers.SGD(), user_tower),
        ml.OptimizerBlocks(tf.keras.optimizers.Adam(), item_tower),
    ],
)

# The string identifier of an optimizer is also acceptable, here "sgd" for the third_tower
# The variables of BinaryClassificationTask("click") would still use the default_optimizer
optimizer = ml.MultiOptimizer(
    default_optimizer="adam",
    optimizers_and_blocks=[
        ml.OptimizerBlocks("sgd", [user_tower, third_tower]),
        ml.OptimizerBlocks("adam", item_tower),
    ],
)
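Conceptually, MultiOptimizer routes each trainable variable to the optimizer registered for the block that owns it, falling back to the default optimizer for everything else. The following is a minimal pure-Python sketch of that dispatch pattern; the class and names here (SimpleMultiOptimizer, RecordingOptimizer) are illustrative assumptions, not the Merlin or Keras implementation.

```python
# Minimal sketch of the dispatch pattern behind MultiOptimizer:
# each (optimizer, variables) pair claims a subset of variables, and
# unclaimed variables fall back to the default optimizer.
# Illustrative only -- not the Merlin/Keras implementation.

class SimpleMultiOptimizer:
    def __init__(self, optimizers_and_vars, default_optimizer):
        # optimizers_and_vars: list of (optimizer, set of variable names)
        self.optimizers_and_vars = optimizers_and_vars
        self.default_optimizer = default_optimizer

    def apply_gradients(self, grads_and_vars):
        # Dispatch each (grad, var) pair to the optimizer that owns the var.
        for grad, var in grads_and_vars:
            owner = self.default_optimizer
            for opt, names in self.optimizers_and_vars:
                if var in names:
                    owner = opt
                    break
            owner.apply(grad, var)


class RecordingOptimizer:
    """Stands in for a real optimizer; records which variables it updated."""

    def __init__(self, name):
        self.name = name
        self.updated = []

    def apply(self, grad, var):
        self.updated.append(var)


sgd = RecordingOptimizer("sgd")
adam = RecordingOptimizer("adam")
default = RecordingOptimizer("rmsprop")

multi = SimpleMultiOptimizer(
    optimizers_and_vars=[(sgd, {"user/w"}), (adam, {"item/w"})],
    default_optimizer=default,
)
multi.apply_gradients([(0.1, "user/w"), (0.2, "item/w"), (0.3, "task/w")])

print(sgd.updated)      # ['user/w']
print(adam.updated)     # ['item/w']
print(default.updated)  # ['task/w']
```

Note how the variable "task/w", claimed by no pair, falls through to the default optimizer, mirroring how the third_tower above receives the default_optimizer.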
- __init__(optimizers_and_blocks: Sequence[merlin.models.tf.blocks.optimizer.OptimizerBlocks], default_optimizer: Union[str, keras.optimizers.optimizer.Optimizer] = 'rmsprop', name: str = 'MultiOptimizer', **kwargs)[source]#
Initializes a MultiOptimizer instance.
- Parameters
optimizers_and_blocks (Sequence[OptimizerBlocks]) – List of OptimizerBlocks (a dataclass). Each OptimizerBlocks holds two items: an optimizer, and a block (or list of blocks) the optimizer should apply to. See 'class OptimizerBlocks'.
default_optimizer (Union[str, tf.keras.optimizers.Optimizer]) – Default optimizer for the remaining variables not covered by optimizers_and_blocks, by default "rmsprop".
name (str) – The name of MultiOptimizer.
Methods

__init__(optimizers_and_blocks[, ...])  Initializes a MultiOptimizer instance.
add(optimizer_blocks)  Add another optimizer and specify which block to apply this optimizer to.
add_variable(shape[, dtype, initializer, name])  Create an optimizer variable.
add_variable_from_reference(model_variable, ...)
aggregate_gradients(grads_and_vars)  Aggregate gradients on all devices.
apply_gradients(grads_and_vars[, name, ...])
build(var_list)  Initialize the optimizer's variables, such as momentum variables.
compute_gradients(loss, var_list[, tape])  Compute gradients of loss on trainable variables.
exclude_from_weight_decay([var_list, var_names])  Exclude variables from weight decay.
finalize_variable_values(var_list)  Set the final value of the model's trainable variables.
from_config(config)
minimize(loss, var_list[, tape])  Minimize loss by updating var_list.
set_weights(weights)  Set the weights of the optimizer.
update(optimizer_blocks)  Update the optimizer of a block, replacing the block's optimizer regardless of which one it previously used.
update_step(gradient, variable)  Update a variable's value based on the given gradient.
variables()  Returns the optimizer's variables.

Attributes

iterations  See base class.
learning_rate
lr  Alias of learning_rate().
optimizers  Returns the optimizers in MultiOptimizer (in the original order); default_optimizer is included here.
weights  Returns the optimizer's variables.
- apply_gradients(grads_and_vars: Sequence[Tuple[Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor], Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor]]], name: Optional[str] = None, experimental_aggregate_gradients: bool = True) None [source]#
- add(optimizer_blocks: merlin.models.tf.blocks.optimizer.OptimizerBlocks)[source]#
Add another optimizer and specify which block this optimizer should apply to.
- update(optimizer_blocks: merlin.models.tf.blocks.optimizer.OptimizerBlocks)[source]#
Update the optimizer of a block: the block's optimizer is replaced regardless of which optimizer it previously used. If the block was not assigned an optimizer before, this function behaves the same as self.add().
Note: the optimizer_blocks are kept in self.update_optimizers_and_blocks, instead of self.optimizers_and_blocks.
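The add() vs. update() semantics described above can be illustrated with a small standalone sketch. The OptimizerRegistry class below is a hypothetical stand-in (not the Merlin implementation) that keeps (optimizer, block) pairs the way the docstrings describe: add() registers another pair, while update() replaces a block's optimizer unconditionally.

```python
# Toy registry mirroring the documented add()/update() semantics.
# Illustrative only -- not the Merlin implementation.

class OptimizerRegistry:
    def __init__(self, default="rmsprop"):
        self.default = default
        self.pairs = []  # list of (optimizer, block); first match wins

    def add(self, optimizer, block):
        # Register another (optimizer, block) pair.
        self.pairs.append((optimizer, block))

    def update(self, optimizer, block):
        # Replace the block's optimizer no matter what it used before;
        # behaves like add() if the block had no optimizer yet.
        self.pairs = [(opt, b) for opt, b in self.pairs if b != block]
        self.pairs.append((optimizer, block))

    def optimizer_for(self, block):
        for opt, b in self.pairs:
            if b == block:
                return opt
        return self.default


reg = OptimizerRegistry()
reg.add("sgd", "user_tower")
reg.add("adam", "user_tower")            # first registration still wins
print(reg.optimizer_for("user_tower"))   # sgd
reg.update("adagrad", "user_tower")      # replaces unconditionally
print(reg.optimizer_for("user_tower"))   # adagrad
print(reg.optimizer_for("item_tower"))   # rmsprop (the default)
```

The sketch shows why update() is needed in addition to add(): adding a second pair for an already-registered block does not change which optimizer is used, whereas update() always replaces it.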
- property iterations#
See base class.
- property weights: List[tensorflow.python.ops.variables.Variable]#
Returns the optimizer’s variables.
- property optimizers: List[keras.optimizers.optimizer.Optimizer]#
Returns the optimizers in MultiOptimizer (in the original order). Note: default_optimizer is included here.