merlin.models.tf.MultiOptimizer#
- class merlin.models.tf.MultiOptimizer(optimizers_and_blocks: Sequence[merlin.models.tf.blocks.optimizer.OptimizerBlocks], default_optimizer: Union[str, keras.optimizers.optimizer.Optimizer] = 'rmsprop', name: str = 'MultiOptimizer', **kwargs)[source]#
Bases: keras.optimizers.optimizer.Optimizer
An optimizer that composes multiple individual optimizers.
It allows different optimizers to be applied to different subsets of the model's variables. For example, it is possible to apply one optimizer to the blocks that contain the model's embeddings (sparse variables) and another optimizer to the rest of its variables (other blocks).
To specify which optimizer should apply to each block, pass a list of pairs of (optimizer instance, blocks the optimizer should apply to).
import merlin.models.tf as ml

user_tower = ml.InputBlock(schema.select_by_tag(Tags.USER)).connect(ml.MLPBlock([512, 256]))
item_tower = ml.InputBlock(schema.select_by_tag(Tags.ITEM)).connect(ml.MLPBlock([512, 256]))
third_tower = ml.InputBlock(schema.select_by_tag(Tags.ITEM)).connect(ml.MLPBlock([64]))
three_tower = ml.ParallelBlock({"user": user_tower, "item": item_tower, "third": third_tower})
model = ml.Model(three_tower, ml.BinaryClassificationTask("click"))

# The third_tower would be assigned the default_optimizer ("adagrad" in this example)
optimizer = ml.MultiOptimizer(
    default_optimizer="adagrad",
    optimizers_and_blocks=[
        ml.OptimizerBlocks(tf.keras.optimizers.SGD(), user_tower),
        ml.OptimizerBlocks(tf.keras.optimizers.Adam(), item_tower),
    ],
)

# The string identifier of an optimizer is also acceptable, here "sgd" for the third_tower
# The variables of BinaryClassificationTask("click") would still use the default_optimizer
optimizer = ml.MultiOptimizer(
    default_optimizer="adam",
    optimizers_and_blocks=[
        ml.OptimizerBlocks("sgd", [user_tower, third_tower]),
        ml.OptimizerBlocks("adam", item_tower),
    ],
)
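Conceptually, MultiOptimizer routes each trainable variable to the optimizer registered for the block that owns it, falling back to the default optimizer for everything else. The following is a minimal pure-Python sketch of that dispatch pattern; the class and names here (SimpleMultiOptimizer, RecordingOptimizer) are illustrative assumptions, not the Merlin or Keras implementation.

```python
# Minimal sketch of the dispatch pattern behind MultiOptimizer:
# each (optimizer, variables) pair claims a subset of variables, and
# unclaimed variables fall back to the default optimizer.
# Illustrative only -- not the Merlin/Keras implementation.

class SimpleMultiOptimizer:
    def __init__(self, optimizers_and_vars, default_optimizer):
        # optimizers_and_vars: list of (optimizer, set of variable names)
        self.optimizers_and_vars = optimizers_and_vars
        self.default_optimizer = default_optimizer

    def apply_gradients(self, grads_and_vars):
        # Dispatch each (grad, var) pair to the optimizer that owns the var.
        for grad, var in grads_and_vars:
            owner = self.default_optimizer
            for opt, names in self.optimizers_and_vars:
                if var in names:
                    owner = opt
                    break
            owner.apply(grad, var)


class RecordingOptimizer:
    """Stands in for a real optimizer; records which variables it updated."""

    def __init__(self, name):
        self.name = name
        self.updated = []

    def apply(self, grad, var):
        self.updated.append(var)


sgd = RecordingOptimizer("sgd")
adam = RecordingOptimizer("adam")
default = RecordingOptimizer("rmsprop")

multi = SimpleMultiOptimizer(
    optimizers_and_vars=[(sgd, {"user/w"}), (adam, {"item/w"})],
    default_optimizer=default,
)
multi.apply_gradients([(0.1, "user/w"), (0.2, "item/w"), (0.3, "task/w")])

print(sgd.updated)      # ['user/w']
print(adam.updated)     # ['item/w']
print(default.updated)  # ['task/w']
```

Note how the variable "task/w", claimed by no pair, falls through to the default optimizer, mirroring how the third_tower above receives the default_optimizer.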
- __init__(optimizers_and_blocks: Sequence[merlin.models.tf.blocks.optimizer.OptimizerBlocks], default_optimizer: Union[str, keras.optimizers.optimizer.Optimizer] = 'rmsprop', name: str = 'MultiOptimizer', **kwargs)[source]#
Initializes a MultiOptimizer instance.
- Parameters
optimizers_and_blocks (Sequence[OptimizerBlocks]) – List of OptimizerBlocks (a dataclass). Each OptimizerBlocks holds two items: an optimizer, and a block (or list of blocks) the optimizer should apply to. See 'class OptimizerBlocks'.
default_optimizer (Union[str, tf.keras.optimizers.Optimizer]) – Default optimizer for the remaining variables not covered by optimizers_and_blocks, by default "rmsprop".
name (str) – The name of MultiOptimizer.
Methods

__init__(optimizers_and_blocks[, ...])  Initializes a MultiOptimizer instance.
add(optimizer_blocks)  Add another optimizer and specify which block to apply this optimizer to.
add_variable(shape[, dtype, initializer, name])  Create an optimizer variable.
add_variable_from_reference(model_variable, ...)
aggregate_gradients(grads_and_vars)  Aggregate gradients on all devices.
apply_gradients(grads_and_vars[, name, ...])
build(var_list)  Initialize the optimizer's variables, such as momentum variables.
compute_gradients(loss, var_list[, tape])  Compute gradients of loss on trainable variables.
exclude_from_weight_decay([var_list, var_names])  Exclude variables from weight decay.
finalize_variable_values(var_list)  Set the final value of the model's trainable variables.
from_config(config)
minimize(loss, var_list[, tape])  Minimize loss by updating var_list.
set_weights(weights)  Set the weights of the optimizer.
update(optimizer_blocks)  Update the optimizer of a block, replacing the block's optimizer regardless of which one it previously used.
update_step(gradient, variable)  Update a variable's value based on the given gradient.
variables()  Returns the optimizer's variables.

Attributes

iterations  See base class.
learning_rate
lr  Alias of learning_rate().
optimizers  Returns the optimizers in MultiOptimizer (in the original order); default_optimizer is included here.
weights  Returns the optimizer's variables.
- apply_gradients(grads_and_vars: Sequence[Tuple[Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor], Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor]]], name: Optional[str] = None, experimental_aggregate_gradients: bool = True) None [source]#
- add(optimizer_blocks: merlin.models.tf.blocks.optimizer.OptimizerBlocks)[source]#
Add another optimizer and specify which block this optimizer should apply to.
- update(optimizer_blocks: merlin.models.tf.blocks.optimizer.OptimizerBlocks)[source]#
Update the optimizer of a block: the block's optimizer is replaced regardless of which optimizer it previously used. If the block was not assigned an optimizer before, this function behaves the same as self.add().
Note: the optimizer_blocks are kept in self.update_optimizers_and_blocks, instead of self.optimizers_and_blocks.
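The add() vs. update() semantics described above can be illustrated with a small standalone sketch. The OptimizerRegistry class below is a hypothetical stand-in (not the Merlin implementation) that keeps (optimizer, block) pairs the way the docstrings describe: add() registers another pair, while update() replaces a block's optimizer unconditionally.

```python
# Toy registry mirroring the documented add()/update() semantics.
# Illustrative only -- not the Merlin implementation.

class OptimizerRegistry:
    def __init__(self, default="rmsprop"):
        self.default = default
        self.pairs = []  # list of (optimizer, block); first match wins

    def add(self, optimizer, block):
        # Register another (optimizer, block) pair.
        self.pairs.append((optimizer, block))

    def update(self, optimizer, block):
        # Replace the block's optimizer no matter what it used before;
        # behaves like add() if the block had no optimizer yet.
        self.pairs = [(opt, b) for opt, b in self.pairs if b != block]
        self.pairs.append((optimizer, block))

    def optimizer_for(self, block):
        for opt, b in self.pairs:
            if b == block:
                return opt
        return self.default


reg = OptimizerRegistry()
reg.add("sgd", "user_tower")
reg.add("adam", "user_tower")            # first registration still wins
print(reg.optimizer_for("user_tower"))   # sgd
reg.update("adagrad", "user_tower")      # replaces unconditionally
print(reg.optimizer_for("user_tower"))   # adagrad
print(reg.optimizer_for("item_tower"))   # rmsprop (the default)
```

The sketch shows why update() is needed in addition to add(): adding a second pair for an already-registered block does not change which optimizer is used, whereas update() always replaces it.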
- property iterations#
See base class.
- property weights: List[tensorflow.python.ops.variables.Variable]#
Returns the optimizer’s variables.
- property optimizers: List[keras.optimizers.optimizer.Optimizer]#
Returns the optimizers in MultiOptimizer (in the original order). Note: default_optimizer is included here.