transformers4rec.torch.features package

Submodules

transformers4rec.torch.features.base module

class transformers4rec.torch.features.base.InputBlock(pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]: Bases: transformers4rec.torch.tabular.base.TabularBlock, abc.ABC

transformers4rec.torch.features.continuous module

class transformers4rec.torch.features.continuous.ContinuousFeatures(features: List[str], pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]

Bases: transformers4rec.torch.features.base.InputBlock

Input block for continuous features.

Parameters

features (List[str]) – List of continuous features to include in this module.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

classmethod from_features(features, **kwargs)[source]

forward(inputs, **kwargs)[source]

forward_output_size(input_sizes)[source]

transformers4rec.torch.features.embedding module

class transformers4rec.torch.features.embedding.EmbeddingFeatures(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], item_id: Optional[str] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source]

Bases: transformers4rec.torch.features.base.InputBlock

Input block for embedding-lookups for categorical features.

For multi-hot features, the embeddings will be aggregated into a single tensor using the mean.

Parameters

feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.
item_id (str, optional) – The name of the feature that’s used for the item_id.

pre: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional: Transformations to apply on the inputs when the module is called (so before forward).
post: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional: Transformations to apply on the inputs after the module is called (so after forward).
aggregation: Union[str, TabularAggregation], optional: Aggregation to apply after processing the forward-method to output a single Tensor.

property item_embedding_table

table_to_embedding_module(table: transformers4rec.torch.features.embedding.TableConfig) → torch.nn.modules.module.Module [source]

classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, embedding_dims: Optional[Dict[str, int]] = None, embedding_dim_default: int = 64, infer_embedding_sizes: bool = False, infer_embedding_sizes_multiplier: float = 2.0, embeddings_initializers: Optional[Dict[str, Callable[[Any], None]]] = None, combiner: str = 'mean', tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]]]] = None, item_id: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, aggregation=None, pre=None, post=None, **kwargs) → Optional[transformers4rec.torch.features.embedding.EmbeddingFeatures][source]

Instantitates EmbeddingFeatures from a DatasetSchema.

Parameters

schema (DatasetSchema) – Dataset schema
embedding_dims (Optional[Dict[str, int]], optional) – The dimension of the embedding table for each feature (key), by default None by default None
default_embedding_dim (Optional[int], optional) – Default dimension of the embedding table, when the feature is not found in default_soft_embedding_dim, by default 64
infer_embedding_sizes (bool, optional) – Automatically defines the embedding dimension from the feature cardinality in the schema, by default False
infer_embedding_sizes_multiplier (Optional[int], by default 2.0) – multiplier used by the heuristic to infer the embedding dimension from its cardinality. Generally reasonable values range between 2.0 and 10.0
embeddings_initializers (Optional[Dict[str, Callable[[Any], None]]]) – Dict where keys are feature names and values are callable to initialize embedding tables
combiner (Optional[str], optional) – Feature aggregation option, by default “mean”
tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter columns, by default None
item_id (Optional[str], optional) – Name of the item id column (feature), by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features,, by default None

Returns

Returns the EmbeddingFeatures for the dataset schema

Return type

Optional[EmbeddingFeatures]

item_ids(inputs) → torch.Tensor [source]

forward(inputs, **kwargs)[source]

forward_output_size(input_sizes)[source]

class transformers4rec.torch.features.embedding.EmbeddingBagWrapper(num_embeddings: int, embedding_dim: int, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, mode: str = 'mean', sparse: bool = False, _weight: Optional[torch.Tensor] = None, include_last_offset: bool = False, padding_idx: Optional[int] = None, device=None, dtype=None)[source]

Bases: torch.nn.modules.sparse.EmbeddingBag

forward(input, **kwargs)[source]

num_embeddings: int

embedding_dim: int

max_norm: Optional[float]

norm_type: float

scale_grad_by_freq: bool

weight: torch.Tensor

mode: str

sparse: bool

include_last_offset: bool

padding_idx: Optional[int]

class transformers4rec.torch.features.embedding.SoftEmbeddingFeatures(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], layer_norm: bool = True, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, **kwarg)[source]

Bases: transformers4rec.torch.features.embedding.EmbeddingFeatures

Encapsulate continuous features encoded using the Soft-one hot encoding embedding technique (SoftEmbedding), from https://arxiv.org/pdf/1708.00065.pdf In a nutshell, it keeps an embedding table for each continuous feature, which is represented as a weighted average of embeddings.

Parameters

feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.
layer_norm (boolean) – When layer_norm is true, TabularLayerNorm will be used in post.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, soft_embedding_cardinalities: Optional[Dict[str, int]] = None, soft_embedding_cardinality_default: int = 10, soft_embedding_dims: Optional[Dict[str, int]] = None, soft_embedding_dim_default: int = 8, embeddings_initializers: Optional[Dict[str, Callable[[Any], None]]] = None, layer_norm: bool = True, combiner: str = 'mean', tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]]]] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, **kwargs) → Optional[transformers4rec.torch.features.embedding.SoftEmbeddingFeatures][source]

Instantitates SoftEmbeddingFeatures from a DatasetSchema.

Parameters

schema (DatasetSchema) – Dataset schema
soft_embedding_cardinalities (Optional[Dict[str, int]], optional) – The cardinality of the embedding table for each feature (key), by default None
soft_embedding_cardinality_default (Optional[int], optional) – Default cardinality of the embedding table, when the feature is not found in soft_embedding_cardinalities, by default 10
soft_embedding_dims (Optional[Dict[str, int]], optional) – The dimension of the embedding table for each feature (key), by default None
soft_embedding_dim_default (Optional[int], optional) – Default dimension of the embedding table, when the feature is not found in soft_embedding_dim_default, by default 8
embeddings_initializers (Optional[Dict[str, Callable[[Any], None]]]) – Dict where keys are feature names and values are callable to initialize embedding tables
combiner (Optional[str], optional) – Feature aggregation option, by default “mean”
tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter columns, by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None

Returns

Returns a SoftEmbeddingFeatures instance from the dataset schema

Return type

Optional[SoftEmbeddingFeatures]

table_to_embedding_module(table: transformers4rec.torch.features.embedding.TableConfig) → transformers4rec.torch.features.embedding.SoftEmbedding [source]

class transformers4rec.torch.features.embedding.TableConfig(vocabulary_size: int, dim: int, initializer: Optional[Callable[[torch.Tensor], None]] = None, combiner: str = 'mean', name: Optional[str] = None)[source]: Bases: object

class transformers4rec.torch.features.embedding.FeatureConfig(table: transformers4rec.torch.features.embedding.TableConfig, max_sequence_length: int = 0, name: Optional[str] = None)[source]: Bases: object

class transformers4rec.torch.features.embedding.SoftEmbedding(num_embeddings, embeddings_dim, emb_initializer=None)[source]

Bases: torch.nn.modules.module.Module

Soft-one hot encoding embedding technique, from https://arxiv.org/pdf/1708.00065.pdf In a nutshell, it represents a continuous feature as a weighted average of embeddings

forward(input_numeric)[source]

training: bool

class transformers4rec.torch.features.embedding.PretrainedEmbeddingsInitializer(weight_matrix: Union[torch.Tensor, List[List[float]]], trainable: bool = False, **kwargs)[source]

Bases: torch.nn.modules.module.Module

Initializer of embedding tables with pre-trained weights

Parameters

weight_matrix (Union[torch.Tensor, List[List[float]]]) – A 2D torch or numpy tensor or lists of lists with the pre-trained weights for embeddings. The expect dims are (embedding_cardinality, embedding_dim). The embedding_cardinality can be inferred from the column schema, for example, schema.select_by_name(“item_id”).feature[0].int_domain.max + 1. The first position of the embedding table is reserved for padded items (id=0).
trainable (bool) – Whether the embedding table should be trainable or not

training: bool

forward(x)[source]

transformers4rec.torch.features.sequence module

class transformers4rec.torch.features.sequence.SequenceEmbeddingFeatures(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], item_id: Optional[str] = None, padding_idx: int = 0, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source]

Bases: transformers4rec.torch.features.embedding.EmbeddingFeatures

Input block for embedding-lookups for categorical features. This module produces 3-D tensors, this is useful for sequential models like transformers.

Parameters

feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.
item_id (str, optional) – The name of the feature that’s used for the item_id.
padding_idx (int) – The symbol to use for padding.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

table_to_embedding_module(table: transformers4rec.torch.features.embedding.TableConfig) → torch.nn.modules.sparse.Embedding [source]

forward_output_size(input_sizes)[source]

class transformers4rec.torch.features.sequence.TabularSequenceFeatures(continuous_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, categorical_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, text_embedding_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, projection_module: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock, torch.nn.modules.module.Module]] = None, masking: Optional[transformers4rec.torch.masking.MaskSequence] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]

Bases: transformers4rec.torch.features.tabular.TabularFeatures

Input module that combines different types of features to a sequence: continuous, categorical & text.

Parameters

continuous_module (TabularModule, optional) – Module used to process continuous features.
categorical_module (TabularModule, optional) – Module used to process categorical features.
text_embedding_module (TabularModule, optional) – Module used to process text features.
projection_module (BlockOrModule, optional) – Module that’s used to project the output of this module, typically done by an MLPBlock.
masking (MaskSequence, optional) – Masking to apply to the inputs.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

EMBEDDING_MODULE_CLASS: alias of transformers4rec.torch.features.sequence.SequenceEmbeddingFeatures

classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, continuous_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CONTINUOUS: 'continuous'>,), categorical_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CATEGORICAL: 'categorical'>,), aggregation: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, continuous_projection: Optional[Union[List[int], int]] = None, continuous_soft_embeddings: bool = False, projection: Optional[Union[torch.nn.modules.module.Module, transformers4rec.torch.block.base.BuildableBlock]] = None, d_output: Optional[int] = None, masking: Optional[Union[str, transformers4rec.torch.masking.MaskSequence]] = None, **kwargs) → transformers4rec.torch.features.sequence.TabularSequenceFeatures [source]

Instantiates TabularFeatures from a DatasetSchema

Parameters

schema (DatasetSchema) – Dataset schema
continuous_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the continuous features, by default Tags.CONTINUOUS
categorical_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the categorical features, by default Tags.CATEGORICAL
aggregation (Optional[str], optional) – Feature aggregation option, by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features by default None
continuous_projection (Optional[Union[List[int], int]], optional) – If set, concatenate all numerical features and project them by a number of MLP layers. The argument accepts a list with the dimensions of the MLP layers, by default None
continuous_soft_embeddings (bool) – Indicates if the soft one-hot encoding technique must be used to represent continuous features, by default False
projection (Optional[Union[torch.nn.Module, BuildableBlock]], optional) – If set, project the aggregated embeddings vectors into hidden dimension vector space, by default None
d_output (Optional[int], optional) – If set, init a MLPBlock as projection module to project embeddings vectors, by default None
masking (Optional[Union[str, MaskSequence]], optional) – If set, Apply masking to the input embeddings and compute masked labels, It requires a categorical_module including an item_id column, by default None

Returns

Returns TabularFeatures from a dataset schema

Return type

TabularFeatures

property masking

set_masking(value)[source]

property item_id

property item_embedding_table

forward(inputs, training=False, testing=False, **kwargs)[source]

project_continuous_features(dimensions)[source]

forward_output_size(input_size)[source]

transformers4rec.torch.features.tabular module

class transformers4rec.torch.features.tabular.TabularFeatures(continuous_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, categorical_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, text_embedding_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]

Bases: transformers4rec.torch.tabular.base.MergeTabular

Input module that combines different types of features: continuous, categorical & text.

Parameters

continuous_module (TabularModule, optional) – Module used to process continuous features.
categorical_module (TabularModule, optional) – Module used to process categorical features.
text_embedding_module (TabularModule, optional) – Module used to process text features.

pre: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional: Transformations to apply on the inputs when the module is called (so before forward).
post: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional: Transformations to apply on the inputs after the module is called (so after forward).
aggregation: Union[str, TabularAggregation], optional: Aggregation to apply after processing the forward-method to output a single Tensor.

CONTINUOUS_MODULE_CLASS: alias of transformers4rec.torch.features.continuous.ContinuousFeatures

EMBEDDING_MODULE_CLASS: alias of transformers4rec.torch.features.embedding.EmbeddingFeatures

SOFT_EMBEDDING_MODULE_CLASS: alias of transformers4rec.torch.features.embedding.SoftEmbeddingFeatures

project_continuous_features(mlp_layers_dims: Union[List[int], int]) → transformers4rec.torch.features.tabular.TabularFeatures [source]

Combine all concatenated continuous features with stacked MLP layers

Parameters: mlp_layers_dims (Union[List[int], int]) – The MLP layer dimensions
Returns: Returns the same TabularFeatures object with the continuous features projected
Return type: TabularFeatures

classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, continuous_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CONTINUOUS: 'continuous'>,), categorical_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CATEGORICAL: 'categorical'>,), aggregation: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, continuous_projection: Optional[Union[List[int], int]] = None, continuous_soft_embeddings: bool = False, **kwargs) → transformers4rec.torch.features.tabular.TabularFeatures [source]

Instantiates TabularFeatures from a DatasetSchema

Parameters

schema (DatasetSchema) – Dataset schema
continuous_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the continuous features, by default Tags.CONTINUOUS
categorical_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the categorical features, by default Tags.CATEGORICAL
aggregation (Optional[str], optional) – Feature aggregation option, by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features by default None
continuous_projection (Optional[Union[List[int], int]], optional) – If set, concatenate all numerical features and project them by a number of MLP layers. The argument accepts a list with the dimensions of the MLP layers, by default None
continuous_soft_embeddings (bool) – Indicates if the soft one-hot encoding technique must be used to represent continuous features, by default False

Returns

Returns TabularFeatures from a dataset schema

Return type

TabularFeatures

forward_output_size(input_size)[source]

property continuous_module

property categorical_module

transformers4rec.torch.features package

Submodules

transformers4rec.torch.features.base module

transformers4rec.torch.features.continuous module

transformers4rec.torch.features.embedding module

transformers4rec.torch.features.sequence module

transformers4rec.torch.features.tabular module

transformers4rec.torch.features.text module

Module contents