merlin.models.tf.EmbeddingTable

class merlin.models.tf.EmbeddingTable(*args, **kwargs)

Bases: merlin.models.tf.inputs.embedding.EmbeddingTableBase

Embedding table backed by a standard Keras Embedding layer. It accepts tf.Tensor, tf.RaggedTensor, and tf.SparseTensor inputs for lookup. Inputs may be 2D, with shape (batch_size, 1), for scalar features, or 3D, with shape (batch_size, seq_length, 1), for sequential features.

Parameters

dim (int) – The dimension of the dense embedding.
col_schemas (ColumnSchema) – The schema of the column(s) used to infer the cardinality.
embeddings_initializer (str, optional) – The initializer for the embeddings matrix (see keras.initializers), by default “uniform”.
embeddings_regularizer (str, optional) – The regularizer function applied to the embeddings matrix (see keras.regularizers), by default None.
embeddings_constraint (str, optional) – The constraint function applied to the embeddings matrix (see keras.constraints), by default None.
mask_zero (bool, optional) – Whether or not the input value 0 is a special “padding” value that should be masked out. This is useful when using recurrent layers which may take variable length input, by default False.
input_length (int, optional) – The length of input sequences when it is constant, by default None.
sequence_combiner (CombinerType, optional) – A string specifying how to combine embedding results for each entry (“mean”, “sqrtn” and “sum” are supported) or a layer. Default is None (no combiner used).
trainable (bool, optional) – Whether the layer’s variables should be trainable, by default True.
name (str, optional) – The name of the layer, by default None.
dtype (str, optional) – The data type of the layer’s computations and weights. It can also be a tf.keras.mixed_precision.Policy, which allows the computation and weight dtype to differ, by default None.
dynamic (bool, optional) – Set this to True if the layer should only be run eagerly and should not be used to generate a static computation graph, by default False.
l2_batch_regularization_factor (float, optional) – The factor for L2 regularization of the embeddings vectors (from the current batch only), by default 0.0.
**kwargs – Other keyword arguments forwarded to the Keras Layer.
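
The three string values accepted by sequence_combiner can be illustrated with a small pure-Python sketch. It assumes the names carry their conventional TensorFlow meaning for unit weights ("sum" adds the vectors, "mean" divides the sum by the number of entries, "sqrtn" divides it by the square root of that number). The helper `combine` is hypothetical and is not part of Merlin Models.

```python
import math


def combine(embeddings, combiner):
    """Reduce a list of embedding vectors (one per sequence entry) to one vector."""
    n = len(embeddings)
    dim = len(embeddings[0])
    # Element-wise sum across all sequence entries.
    summed = [sum(vec[i] for vec in embeddings) for i in range(dim)]
    if combiner == "sum":
        return summed
    if combiner == "mean":
        return [value / n for value in summed]
    if combiner == "sqrtn":
        # Assumes unit weights, so the normaliser is sqrt(number of entries).
        return [value / math.sqrt(n) for value in summed]
    raise ValueError(f"Unsupported combiner: {combiner!r}")


# Four looked-up embeddings of dimension 2 for a single sequential feature.
sequence = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

summed = combine(sequence, "sum")    # [16.0, 20.0]
mean = combine(sequence, "mean")     # [4.0, 5.0]
sqrtn = combine(sequence, "sqrtn")   # [8.0, 10.0]
```

With the default of None, no reduction of this kind is applied.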

__init__(dim: int, *col_schemas: merlin.schema.schema.ColumnSchema, embeddings_initializer='uniform', embeddings_regularizer=None, activity_regularizer=None, embeddings_constraint=None, mask_zero=False, input_length=None, sequence_combiner: Optional[Union[str, keras.engine.base_layer.Layer]] = None, trainable=True, name=None, dtype=None, dynamic=False, table=None, l2_batch_regularization_factor=0.0, weights=None, **kwargs)

Create an EmbeddingTable.

Methods

`__init__`(dim, *col_schemas[, ...])	Create an EmbeddingTable.
`add_feature`(col_schema)	Add a feature to the table.
`add_loss`(losses, **kwargs)	Add loss tensor(s), potentially dependent on layer inputs.
`add_metric`(value[, name])	Adds metric tensor to the layer.
`add_update`(updates)	Add update op(s), potentially dependent on layer inputs.
`add_variable`(*args, **kwargs)	Deprecated, do NOT use! Alias for add_weight.
`add_weight`([name, shape, dtype, ...])	Adds a new variable to the layer.
`as_tabular`([name])
`build`(input_shapes)	Builds the EmbeddingTable based on the input shapes.
`build_from_config`(config)
`call`(inputs, **kwargs)	Returns the embeddings for the input batch (tensors or a dictionary of tensors).
`call_outputs`(outputs[, training])
`check_schema`([schema])
`compute_call_output_shape`(input_shapes)	Computes the shape of the output of a call to this layer.
`compute_mask`(inputs[, mask])	Computes an output mask tensor.
`compute_output_shape`(input_shape)	Computes the shape of the output tensors.
`compute_output_signature`(input_signature)	Compute the output tensor signature of the layer based on the inputs.
`connect`(*block[, block_name, context])	Connect the block to other blocks sequentially.
`connect_branch`(*branches[, add_rest, post, ...])	Connect the block to one or multiple branches.
`connect_debug_block`([append])	Connect the block to a debug block.
`connect_with_residual`(block[, activation])	Connect the block to other blocks sequentially with a residual connection.
`connect_with_shortcut`(block[, ...])	Connect the block to other blocks sequentially with a shortcut connection.
`copy`()
`count_params`()	Count the total number of scalars composing the weights.
`finalize_state`()	Finalizes the layer's state after updating layer weights.
`from_config`(config[, table])	Creates an EmbeddingTable from its configuration.
`from_dataset`(data[, trainable, name, col_schema])	Creates an EmbeddingTable from pre-trained embeddings in a Dataset or DataFrame.
`from_layer`(layer)
`from_pretrained`(data[, trainable, name, ...])	Creates an EmbeddingTable from pre-trained embeddings in a Dataset or DataFrame.
`get_build_config`()
`get_config`()	Returns the configuration of this EmbeddingTable.
`get_input_at`(node_index)	Retrieves the input tensor(s) of a layer at a given node.
`get_input_mask_at`(node_index)	Retrieves the input mask tensor(s) of a layer at a given node.
`get_input_shape_at`(node_index)	Retrieves the input shape(s) of a layer at a given node.
`get_item_ids_from_inputs`(inputs)
`get_output_at`(node_index)	Retrieves the output tensor(s) of a layer at a given node.
`get_output_mask_at`(node_index)	Retrieves the output mask tensor(s) of a layer at a given node.
`get_output_shape_at`(node_index)	Retrieves the output shape(s) of a layer at a given node.
`get_padding_mask_from_item_id`(inputs[, ...])
`get_weights`()	Returns the current weights of the layer, as NumPy arrays.
`parse`(*block)
`parse_block`(input)
`prepare`([block, post, aggregation])	Transform the inputs of this block.
`register_features`(feature_shapes)
`repeat`([num])	Repeat the block num times.
`repeat_in_parallel`([num, prefix, names, ...])	Repeat the block num times in parallel.
`select_by_name`(name)
`select_by_tag`(tags)	Select features in EmbeddingTable by tags.
`set_schema`([schema])
`set_weights`(weights)	Sets the weights of the layer, from NumPy arrays.
`to_dataset`([gpu])	Converts the EmbeddingTable to a merlin.io.Dataset.
`to_df`([gpu])	Converts the EmbeddingTable to a DataFrame.
`with_name_scope`(method)	Decorator to automatically enter the module name scope.

Attributes

`REQUIRES_SCHEMA`
`activity_regularizer`	Optional regularizer function for the output of this layer.
`compute_dtype`	The dtype of the layer's computations.
`context`
`dtype`	The dtype of the layer weights.
`dtype_policy`	The dtype policy associated with this layer.
`dynamic`	Whether the layer is dynamic (eager-only); set in the constructor.
`has_schema`
`inbound_nodes`	Return Functional API nodes upstream of this layer.
`input`	Retrieves the input tensor(s) of a layer.
`input_dim`
`input_mask`	Retrieves the input mask tensor(s) of a layer.
`input_shape`	Retrieves the input shape(s) of a layer.
`input_spec`	InputSpec instance(s) describing the input format for this layer.
`losses`	List of losses added using the add_loss() API.
`metrics`	List of metrics added using the add_metric() API.
`name`	Name of the layer (string), set in the constructor.
`name_scope`	Returns a tf.name_scope instance for this class.
`non_trainable_variables`
`non_trainable_weights`	List of all non-trainable weights tracked by this layer.
`outbound_nodes`	Return Functional API nodes downstream of this layer.
`output`	Retrieves the output tensor(s) of a layer.
`output_mask`	Retrieves the output mask tensor(s) of a layer.
`output_shape`	Retrieves the output shape(s) of a layer.
`registry`
`schema`
`stateful`
`submodules`	Sequence of all sub-modules.
`supports_masking`	Whether this layer supports computing a mask using compute_mask.
`table_name`
`trainable`
`trainable_variables`
`trainable_weights`	List of all trainable weights tracked by this layer.
`updates`
`variable_dtype`	Alias of Layer.dtype, the dtype of the weights.
`variables`	Returns the list of all layer variables/weights.
`weights`	Returns the list of all layer variables/weights.

select_by_tag(tags: Union[merlin.schema.tags.Tags, Sequence[merlin.schema.tags.Tags]]) → Optional[merlin.models.tf.inputs.embedding.EmbeddingTable]

Select features in EmbeddingTable by tags.

Since an EmbeddingTable can be a shared-embedding table, this method filters the schema for features that match the tags.

If none of the features match the tags, it will return None.

Parameters: tags (Union[Tags, Sequence[Tags]]) – A list of tags.
Return type: An EmbeddingTable if the tags match. If no features match, it returns None.
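
The filter-or-None behaviour described above can be sketched with toy classes. `ToyColumn` and `ToyTable` are hypothetical stand-ins, not Merlin classes, and the sketch assumes a column matches when it carries at least one of the requested tags.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class ToyColumn:
    # Hypothetical stand-in for a column schema: a name plus a set of tags.
    name: str
    tags: frozenset = frozenset()


@dataclass
class ToyTable:
    # Hypothetical stand-in for a shared embedding table over several columns.
    dim: int
    columns: List[ToyColumn] = field(default_factory=list)

    def select_by_tag(self, tags: Sequence[str]) -> Optional["ToyTable"]:
        wanted = set(tags)
        # Keep the columns that carry at least one requested tag (assumption).
        matched = [col for col in self.columns if wanted & col.tags]
        if not matched:
            return None
        return ToyTable(self.dim, matched)


shared = ToyTable(
    16,
    [
        ToyColumn("item_id", frozenset({"item"})),
        ToyColumn("last_item_id", frozenset({"item", "sequence"})),
    ],
)

seq_only = shared.select_by_tag(["sequence"])  # table restricted to one column
missing = shared.select_by_tag(["user"])       # None, because nothing matches
```

Callers should therefore check the result for None before using it.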

classmethod from_pretrained(data: Union[merlin.io.dataset.Dataset, pandas.core.frame.DataFrame], trainable=True, name=None, col_schema=None, **kwargs) → merlin.models.tf.inputs.embedding.EmbeddingTable

Creates an EmbeddingTable from pre-trained embeddings in a Dataset or DataFrame.

Parameters

data (Union[Dataset, DataFrameType]) – A dataset containing the pre-trained embedding weights.
trainable (bool) – Whether the layer should be trained or not.
name (str) – The name of the layer.

classmethod from_dataset(data: Union[merlin.io.dataset.Dataset, pandas.core.frame.DataFrame], trainable=True, name=None, col_schema=None, **kwargs) → merlin.models.tf.inputs.embedding.EmbeddingTable

Creates an EmbeddingTable from pre-trained embeddings in a Dataset or DataFrame.

Parameters

data (Union[Dataset, DataFrameType]) – A dataset containing the pre-trained embedding weights.
trainable (bool) – Whether the layer should be trained or not.
name (str) – The name of the layer.

to_dataset(gpu=None) → merlin.io.dataset.Dataset

Converts the EmbeddingTable to a merlin.io.Dataset.

Parameters: gpu (bool) – Whether to use the GPU.
Returns: The dataset representation of the EmbeddingTable.
Return type: merlin.io.Dataset

to_df(gpu=None)

Converts the EmbeddingTable to a DataFrame.

Parameters: gpu (bool) – Whether to use the GPU.
Returns: The DataFrame representation of the EmbeddingTable.
Return type: cudf or pandas DataFrame

build(input_shapes)

Builds the EmbeddingTable based on the input shapes.

Parameters: input_shapes (tf.TensorShape or dictionary of shapes) – The shapes of the input tensors.

call(inputs: Union[tensorflow.python.framework.ops.Tensor, Dict[str, tensorflow.python.framework.ops.Tensor]], **kwargs) → Union[tensorflow.python.framework.ops.Tensor, Dict[str, tensorflow.python.framework.ops.Tensor]]

Parameters: inputs (Union[tf.Tensor, tf.RaggedTensor, tf.SparseTensor]) – Tensors or dictionary of tensors representing the input batch.
Return type: A tensor or dict of tensors corresponding to the embeddings for inputs

compute_output_shape(input_shape: Union[tensorflow.python.framework.tensor_shape.TensorShape, Dict[str, tensorflow.python.framework.tensor_shape.TensorShape]]) → Union[tensorflow.python.framework.tensor_shape.TensorShape, Dict[str, tensorflow.python.framework.tensor_shape.TensorShape]]

Computes the shape of the output tensors.

Parameters: input_shape (Union[tf.TensorShape, Dict[str, tf.TensorShape]]) – The shape of the input tensors.
Returns: The shape of the output tensors.
Return type: Union[tf.TensorShape, Dict[str, tf.TensorShape]]
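
As a rough illustration of the shape arithmetic, the sketch below maps the two input layouts named in the class description to output shapes. It assumes that a trailing axis of size 1 is replaced by the embedding dimension, which this page does not state explicitly. The helper `sketch_output_shape` is hypothetical and works on plain tuples rather than tf.TensorShape.

```python
from typing import Optional, Tuple

Shape = Tuple[Optional[int], ...]


def sketch_output_shape(input_shape: Shape, dim: int) -> Shape:
    """Assumed rule: a trailing size-1 axis is replaced by the embedding dim."""
    if len(input_shape) > 1 and input_shape[-1] == 1:
        return input_shape[:-1] + (dim,)
    # Otherwise append the embedding dim, as a plain embedding lookup would.
    return input_shape + (dim,)


# None stands for an unknown batch size.
scalar_out = sketch_output_shape((None, 1), dim=16)        # (None, 16)
sequence_out = sketch_output_shape((None, 20, 1), dim=16)  # (None, 20, 16)
```

A sequence_combiner, when set, may change the result further; that case is not covered by this sketch.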

compute_call_output_shape(input_shapes)

Computes the shape of the output of a call to this layer.

Parameters: input_shapes (tf.TensorShape or dictionary of shapes) – The shapes of the input tensors.
Returns: The shape of the output of a call to this layer.
Return type: Union[tf.TensorShape, Dict[str, tf.TensorShape]]

classmethod from_config(config, table=None)

Creates an EmbeddingTable from its configuration.

Parameters

config (dict) – Configuration dictionary.
table (tf.keras.layers.Embedding, optional) – An optional embedding layer.

Returns

A newly created EmbeddingTable.

Return type

EmbeddingTable

get_config()

Returns the configuration of this EmbeddingTable.

Returns: Configuration dictionary.
Return type: dict
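
get_config and from_config follow the usual Keras serialization pattern: the config dictionary holds constructor arguments, and from_config rebuilds a layer from it. The sketch below shows that round trip with a toy class. `ToyEmbeddingLayer` is hypothetical, and the use of `table` to reuse an existing embedding object is an assumption about the intent of that argument, not documented behaviour.

```python
class ToyEmbeddingLayer:
    """Hypothetical stand-in for a layer; not the Merlin implementation."""

    def __init__(self, dim, name=None, trainable=True, table=None):
        self.dim = dim
        self.name = name
        self.trainable = trainable
        # Reuse the supplied table if given, otherwise create a placeholder.
        self.table = table if table is not None else {"weights": []}

    def get_config(self):
        # Only plain constructor arguments go into the config, not the weights.
        return {"dim": self.dim, "name": self.name, "trainable": self.trainable}

    @classmethod
    def from_config(cls, config, table=None):
        return cls(**config, table=table)


original = ToyEmbeddingLayer(dim=8, name="item_id")

# Rebuild from config alone: same settings, separate table object.
clone = ToyEmbeddingLayer.from_config(original.get_config())

# Rebuild while passing the existing table: settings and table are shared.
shared = ToyEmbeddingLayer.from_config(original.get_config(), table=original.table)
```

In this sketch the config never contains weights, so a layer rebuilt without `table` starts from its own fresh table.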