merlin.models.tf.EmbeddingFeatures#

class merlin.models.tf.EmbeddingFeatures(*args, **kwargs)[source]#

Bases: merlin.models.tf.core.tabular.TabularBlock

Input block that performs embedding lookups for categorical features.

For multi-hot features, the embeddings will be aggregated into a single tensor using the mean.
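The mean aggregation described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the `TABLE` dictionary and the `lookup_mean` helper are hypothetical names, and real embedding tables are trainable TensorFlow variables rather than lists.

```python
from typing import Dict, List

# Hypothetical toy embedding table: category id -> embedding vector.
# Plain Python lists stand in for trainable variables, for illustration only.
TABLE: Dict[int, List[float]] = {
    0: [0.0, 0.0],
    1: [1.0, 2.0],
    2: [3.0, 4.0],
    3: [5.0, 0.0],
}


def lookup_mean(ids: List[int], table: Dict[int, List[float]]) -> List[float]:
    """Look up every id and average the resulting vectors element-wise.

    Assumes `ids` is non-empty; an empty multi-hot value would need a
    separate policy (for example, returning zeros).
    """
    vectors = [table[i] for i in ids]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# A one-hot feature carries a single id, so the mean is the row itself.
single_hot = lookup_mean([2], TABLE)

# A multi-hot feature carries several ids; their rows collapse into one vector.
multi_hot = lookup_mean([1, 2, 3], TABLE)
```

Either way, each feature yields a single fixed-size vector per example, regardless of how many ids the multi-hot value contains.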

Parameters

feature_config (Dict[str, FeatureConfig]) – Specifies which TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.
item_id (str, optional) – The name of the feature used as the item_id.

pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply to the inputs when the module is called (so before call).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply to the outputs after the module is called (so after call).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after the call method to output a single Tensor. Besides passing an instance of a class that extends TabularAggregation, you can pass the name under which the class is registered in the tabular_aggregation_registry. Out of the box, the registry contains “concat”, “stack”, “element-wise-sum”, and “element-wise-sum-item-multi”.
schema (Optional[Schema]) – Schema containing the columns used in this block.
name (Optional[str]) – Name of the layer.
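To give an intuition for what an aggregation does with the per-feature outputs, here is a rough pure-Python sketch of three of the registered names, applied to a single example. It assumes the names mean what they suggest; the helper functions, the sorting by feature name, and the axis along which "stack" combines the vectors are assumptions of this sketch, not the library's code.

```python
from typing import Dict, List

Vector = List[float]


def concat(features: Dict[str, Vector]) -> Vector:
    """Join all feature vectors end to end into one longer vector."""
    out: Vector = []
    for name in sorted(features):  # sorted only to make the order deterministic
        out.extend(features[name])
    return out


def stack(features: Dict[str, Vector]) -> List[Vector]:
    """Keep the vectors separate but collect them along a new dimension."""
    return [list(features[name]) for name in sorted(features)]


def element_wise_sum(features: Dict[str, Vector]) -> Vector:
    """Add the vectors position by position; all must share one length."""
    rows = stack(features)
    if len({len(row) for row in rows}) != 1:
        raise ValueError("element-wise sum needs equally sized vectors")
    return [sum(column) for column in zip(*rows)]


# Hypothetical per-feature embeddings for one example.
outputs = {"category": [1.0, 2.0], "item_id": [3.0, 4.0]}

concatenated = concat(outputs)      # one vector of length 4
stacked = stack(outputs)            # two vectors of length 2
summed = element_wise_sum(outputs)  # one vector of length 2
```

In this sketch, stacking and summing require every embedding to have the same dimension, whereas concatenation accepts vectors of different sizes.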

__init__(feature_config: Dict[str, tensorflow.python.tpu.tpu_embedding_v2_utils.FeatureConfig], pre: Optional[Union[merlin.models.tf.core.base.Block, str, Sequence[str]]] = None, post: Optional[Union[merlin.models.tf.core.base.Block, str, Sequence[str]]] = None, aggregation: Optional[Union[str, merlin.models.tf.core.tabular.TabularAggregation]] = None, schema: Optional[merlin.schema.schema.Schema] = None, name=None, add_default_pre=True, l2_reg: Optional[float] = 0.0, **kwargs)[source]#

Methods

`__init__`(feature_config[, pre, post, ...])
`add_loss`(losses, **kwargs)	Add loss tensor(s), potentially dependent on layer inputs.
`add_metric`(value[, name])	Adds metric tensor to the layer.
`add_update`(updates)	Add update op(s), potentially dependent on layer inputs.
`add_variable`(*args, **kwargs)	Deprecated, do NOT use! Alias for add_weight.
`add_weight`([name, shape, dtype, ...])	Adds a new variable to the layer.
`apply_to_all`(inputs[, columns_to_filter])
`as_tabular`([name])
`build`(input_shapes)
`build_from_config`(config)
`calculate_batch_size_from_input_shapes`(...)
`call`(inputs, **kwargs)
`call_outputs`(outputs[, training])
`check_schema`([schema])
`compute_call_output_shape`(input_shapes)
`compute_mask`(inputs[, mask])	Computes an output mask tensor.
`compute_output_shape`(input_shapes)
`compute_output_signature`(input_signature)	Compute the output tensor signature of the layer based on the inputs.
`connect`(*block[, block_name, context])	Connect the block to other blocks sequentially.
`connect_branch`(*branches[, add_rest, post, ...])	Connect the block to one or multiple branches.
`connect_debug_block`([append])	Connect the block to a debug block.
`connect_with_residual`(block[, activation])	Connect the block to other blocks sequentially with a residual connection.
`connect_with_shortcut`(block[, ...])	Connect the block to other blocks sequentially with a shortcut connection.
`copy`()
`count_params`()	Count the total number of scalars composing the weights.
`embedding_table_dataset`(table_name[, ...])	Creates a Dataset for the embedding table
`embedding_table_df`(table_name[, ...])	Retrieves a dataframe with the embedding table
`export_embedding_table`(table_name, export_path)	Exports the embedding table to a parquet file
`finalize_state`()	Finalizes the layers state after updating layer weights.
`from_config`(config)
`from_features`(features[, pre, post, ...])	Initializes a TabularLayer instance where the contents of features will be filtered out
`from_layer`(layer)
`from_schema`(schema[, embedding_options, ...])	Instantiates embedding features from the schema
`get_build_config`()
`get_config`()
`get_embedding_table`(table_name[, ...])
`get_input_at`(node_index)	Retrieves the input tensor(s) of a layer at a given node.
`get_input_mask_at`(node_index)	Retrieves the input mask tensor(s) of a layer at a given node.
`get_input_shape_at`(node_index)	Retrieves the input shape(s) of a layer at a given node.
`get_item_ids_from_inputs`(inputs)
`get_output_at`(node_index)	Retrieves the output tensor(s) of a layer at a given node.
`get_output_mask_at`(node_index)	Retrieves the output mask tensor(s) of a layer at a given node.
`get_output_shape_at`(node_index)	Retrieves the output shape(s) of a layer at a given node.
`get_padding_mask_from_item_id`(inputs[, ...])
`get_weights`()	Returns the current weights of the layer, as NumPy arrays.
`lookup_feature`(name, val[, output_sequence])
`parse`(*block)
`parse_block`(input)
`post_call`(inputs[, transformations, ...])	Method that's typically called after the forward method for post-processing.
`pre_call`(inputs[, transformations])	Method that's typically called before the forward method for pre-processing.
`prepare`([block, post, aggregation])	Transform the inputs of this block.
`register_features`(feature_shapes)
`repeat`([num])	Repeat the block num times.
`repeat_in_parallel`([num, prefix, names, ...])	Repeat the block num times in parallel.
`repr_add`()
`repr_extra`()
`repr_ignore`()
`select_by_name`(name)
`select_by_tag`(tags)
`set_aggregation`(value)
`set_post`(value)
`set_pre`(value)
`set_schema`([schema])
`set_weights`(weights)	Sets the weights of the layer, from NumPy arrays.
`super`()
`table_config`(feature_name)
`with_name_scope`(method)	Decorator to automatically enter the module name scope.

Attributes

`REQUIRES_SCHEMA`
`activity_regularizer`	Optional regularizer function for the output of this layer.
`aggregation`	Return type: TabularAggregation, optional
`compute_dtype`	The dtype of the layer's computations.
`context`
`dtype`	The dtype of the layer weights.
`dtype_policy`	The dtype policy associated with this layer.
`dynamic`	Whether the layer is dynamic (eager-only); set in the constructor.
`has_schema`
`inbound_nodes`	Return Functional API nodes upstream of this layer.
`input`	Retrieves the input tensor(s) of a layer.
`input_mask`	Retrieves the input mask tensor(s) of a layer.
`input_shape`	Retrieves the input shape(s) of a layer.
`input_spec`	InputSpec instance(s) describing the input format for this layer.
`is_input`
`is_tabular`
`losses`	List of losses added using the add_loss() API.
`metrics`	List of metrics added using the add_metric() API.
`name`	Name of the layer (string), set in the constructor.
`name_scope`	Returns a tf.name_scope instance for this class.
`non_trainable_variables`
`non_trainable_weights`	List of all non-trainable weights tracked by this layer.
`outbound_nodes`	Return Functional API nodes downstream of this layer.
`output`	Retrieves the output tensor(s) of a layer.
`output_mask`	Retrieves the output mask tensor(s) of a layer.
`output_shape`	Retrieves the output shape(s) of a layer.
`post`	Return type: SequentialTabularTransformations, optional
`pre`	Return type: SequentialTabularTransformations, optional
`registry`
`schema`
`stateful`
`submodules`	Sequence of all sub-modules.
`supports_masking`	Whether this layer supports computing a mask using compute_mask.
`trainable`
`trainable_variables`
`trainable_weights`	List of all trainable weights tracked by this layer.
`updates`
`variable_dtype`	Alias of Layer.dtype, the dtype of the weights.
`variables`	Returns the list of all layer variables/weights.
`weights`	Returns the list of all layer variables/weights.

classmethod from_schema(schema: merlin.schema.schema.Schema, embedding_options: merlin.models.tf.inputs.embedding.EmbeddingOptions = EmbeddingOptions(embedding_dims=None, embedding_dim_default=64, infer_embedding_sizes=False, infer_embedding_sizes_multiplier=2.0, infer_embeddings_ensure_dim_multiple_of_8=False, embeddings_initializers=None, embeddings_l2_reg=0.0, combiner='mean'), tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]]]] = None, max_sequence_length: Optional[int] = None, **kwargs) → Optional[merlin.models.tf.inputs.embedding.EmbeddingFeatures][source]#

Instantiates embedding features from the schema

Parameters

schema (Schema) – The feature schema
embedding_options (EmbeddingOptions, optional) – An EmbeddingOptions instance, which allows for a number of options for the embedding table, by default EmbeddingOptions()
tags (Optional[TagsType], optional) – If provided, keeps only features from those tags, by default None
max_sequence_length (Optional[int], optional) – Maximum sequence length of sparse features (if any), by default None

Returns

An instance of the EmbeddingFeatures block, with the embedding layers created under the hood

Return type

EmbeddingFeatures

build(input_shapes)[source]#

call(inputs: Dict[str, tensorflow.python.framework.ops.Tensor], **kwargs) → Dict[str, tensorflow.python.framework.ops.Tensor][source]#

compute_call_output_shape(input_shapes)[source]#

lookup_feature(name, val, output_sequence=False)[source]#

table_config(feature_name: str)[source]#

get_embedding_table(table_name: Union[str, merlin.schema.tags.Tags], l2_normalization: bool = False)[source]#

embedding_table_df(table_name: Union[str, merlin.schema.tags.Tags], l2_normalization: bool = False, gpu: bool = True)[source]#

Retrieves a dataframe with the embedding table

Parameters

table_name (Union[str, Tags]) – Tag or name of the embedding table
l2_normalization (bool, optional) – Whether to apply L2 normalization to the embeddings (a common approach for Matrix Factorization and Retrieval models in general), by default False
gpu (bool, optional) – Whether to use the GPU, by default True

Returns

A dataframe (cudf or pandas), depending on the gpu argument

Return type

Union[pd.DataFrame, cudf.DataFrame]
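The effect of the l2_normalization flag can be pictured with a small pure-Python sketch: every row of the table is divided by its Euclidean norm, so each non-zero embedding ends up with unit length. The `l2_normalize` helper and the `eps` guard against division by zero are assumptions made for illustration; the library's own implementation may differ.

```python
import math
from typing import List


def l2_normalize(rows: List[List[float]], eps: float = 1e-12) -> List[List[float]]:
    """Scale every row to unit Euclidean length.

    `eps` is a hypothetical guard so that all-zero rows do not divide by zero;
    such rows simply stay all-zero.
    """
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        normalized.append([x / max(norm, eps) for x in row])
    return normalized


# Toy embedding table with one row per category id.
table = [[3.0, 4.0], [0.0, 2.0], [0.0, 0.0]]
unit_rows = l2_normalize(table)

# Length of each normalized row: 1.0 for non-zero rows, 0.0 for the zero row.
lengths = [math.sqrt(sum(x * x for x in row)) for row in unit_rows]
```

Once rows have unit length, the dot product between two embeddings equals their cosine similarity, which is why this step is common before nearest-neighbor retrieval.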

embedding_table_dataset(table_name: Union[str, merlin.schema.tags.Tags], l2_normalization: bool = False, gpu=True) → merlin.io.dataset.Dataset[source]#

Creates a Dataset for the embedding table

Parameters

table_name (Union[str, Tags]) – Tag or name of the embedding table
l2_normalization (bool, optional) – Whether to apply L2 normalization to the embeddings (a common approach for Matrix Factorization and Retrieval models in general), by default False
gpu (bool, optional) – Whether to use the GPU, by default True

Returns

A Dataset with the embeddings

Return type

merlin.io.Dataset

export_embedding_table(table_name: Union[str, merlin.schema.tags.Tags], export_path: str, l2_normalization: bool = False, gpu=True)[source]#

Exports the embedding table to a parquet file

Parameters

table_name (Union[str, Tags]) – Tag or name of the embedding table
export_path (str) – Path for the generated parquet file
l2_normalization (bool, optional) – Whether to apply L2 normalization to the embeddings (a common approach for Matrix Factorization and Retrieval models in general), by default False
gpu (bool, optional) – Whether to use the GPU, by default True

get_config()[source]#

classmethod from_config(config)[source]#