transformers4rec.torch.features package
Submodules
transformers4rec.torch.features.base module
-
class
transformers4rec.torch.features.base.
InputBlock
(pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source] Bases:
transformers4rec.torch.tabular.base.TabularBlock
,abc.ABC
transformers4rec.torch.features.continuous module
-
class
transformers4rec.torch.features.continuous.
ContinuousFeatures
(features: List[str], pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source] Bases:
transformers4rec.torch.features.base.InputBlock
Input block for continuous features.
- Parameters
features (List[str]) – List of continuous features to include in this module.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
transformers4rec.torch.features.embedding module
-
class
transformers4rec.torch.features.embedding.
EmbeddingFeatures
(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], item_id: Optional[str] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source] Bases:
transformers4rec.torch.features.base.InputBlock
Input block for embedding-lookups for categorical features.
For multi-hot features, the embeddings will be aggregated into a single tensor using the mean.
- Parameters
feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.
item_id (str, optional) – The name of the feature that’s used for the item_id.
- pre: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional
Transformations to apply on the inputs when the module is called (so before forward).
- post: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional
Transformations to apply on the inputs after the module is called (so after forward).
- aggregation: Union[str, TabularAggregation], optional
Aggregation to apply after processing the forward-method to output a single Tensor.
-
property
item_embedding_table
-
table_to_embedding_module
(table: transformers4rec.torch.features.embedding.TableConfig) → torch.nn.modules.module.Module[source]
-
classmethod
from_schema
(schema: merlin_standard_lib.schema.schema.Schema, embedding_dims: Optional[Dict[str, int]] = None, embedding_dim_default: int = 64, infer_embedding_sizes: bool = False, infer_embedding_sizes_multiplier: float = 2.0, embeddings_initializers: Optional[Dict[str, Callable[[Any], None]]] = None, combiner: str = 'mean', tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]]]] = None, item_id: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, aggregation=None, pre=None, post=None, **kwargs) → Optional[transformers4rec.torch.features.embedding.EmbeddingFeatures][source] Instantitates
EmbeddingFeatures
from aDatasetSchema
.- Parameters
schema (DatasetSchema) – Dataset schema
embedding_dims (Optional[Dict[str, int]], optional) – The dimension of the embedding table for each feature (key), by default None by default None
default_embedding_dim (Optional[int], optional) – Default dimension of the embedding table, when the feature is not found in
default_soft_embedding_dim
, by default 64infer_embedding_sizes (bool, optional) – Automatically defines the embedding dimension from the feature cardinality in the schema, by default False
infer_embedding_sizes_multiplier (Optional[int], by default 2.0) – multiplier used by the heuristic to infer the embedding dimension from its cardinality. Generally reasonable values range between 2.0 and 10.0
embeddings_initializers (Optional[Dict[str, Callable[[Any], None]]]) – Dict where keys are feature names and values are callable to initialize embedding tables
combiner (Optional[str], optional) – Feature aggregation option, by default “mean”
tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter columns, by default None
item_id (Optional[str], optional) – Name of the item id column (feature), by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features,, by default None
- Returns
Returns the
EmbeddingFeatures
for the dataset schema- Return type
Optional[EmbeddingFeatures]
-
item_ids
(inputs) → torch.Tensor[source]
-
class
transformers4rec.torch.features.embedding.
EmbeddingBagWrapper
(num_embeddings: int, embedding_dim: int, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, mode: str = 'mean', sparse: bool = False, _weight: Optional[torch.Tensor] = None, include_last_offset: bool = False, padding_idx: Optional[int] = None, device=None, dtype=None)[source] Bases:
torch.nn.modules.sparse.EmbeddingBag
-
weight
: torch.Tensor
-
-
class
transformers4rec.torch.features.embedding.
SoftEmbeddingFeatures
(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], layer_norm: bool = True, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, **kwarg)[source] Bases:
transformers4rec.torch.features.embedding.EmbeddingFeatures
Encapsulate continuous features encoded using the Soft-one hot encoding embedding technique (SoftEmbedding), from https://arxiv.org/pdf/1708.00065.pdf In a nutshell, it keeps an embedding table for each continuous feature, which is represented as a weighted average of embeddings.
- Parameters
feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.
layer_norm (boolean) – When layer_norm is true, TabularLayerNorm will be used in post.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
-
classmethod
from_schema
(schema: merlin_standard_lib.schema.schema.Schema, soft_embedding_cardinalities: Optional[Dict[str, int]] = None, soft_embedding_cardinality_default: int = 10, soft_embedding_dims: Optional[Dict[str, int]] = None, soft_embedding_dim_default: int = 8, embeddings_initializers: Optional[Dict[str, Callable[[Any], None]]] = None, layer_norm: bool = True, combiner: str = 'mean', tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]]]] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, **kwargs) → Optional[transformers4rec.torch.features.embedding.SoftEmbeddingFeatures][source] Instantitates
SoftEmbeddingFeatures
from aDatasetSchema
.- Parameters
schema (DatasetSchema) – Dataset schema
soft_embedding_cardinalities (Optional[Dict[str, int]], optional) – The cardinality of the embedding table for each feature (key), by default None
soft_embedding_cardinality_default (Optional[int], optional) – Default cardinality of the embedding table, when the feature is not found in
soft_embedding_cardinalities
, by default 10soft_embedding_dims (Optional[Dict[str, int]], optional) – The dimension of the embedding table for each feature (key), by default None
soft_embedding_dim_default (Optional[int], optional) – Default dimension of the embedding table, when the feature is not found in
soft_embedding_dim_default
, by default 8embeddings_initializers (Optional[Dict[str, Callable[[Any], None]]]) – Dict where keys are feature names and values are callable to initialize embedding tables
combiner (Optional[str], optional) – Feature aggregation option, by default “mean”
tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter columns, by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None
- Returns
Returns a
SoftEmbeddingFeatures
instance from the dataset schema- Return type
Optional[SoftEmbeddingFeatures]
-
table_to_embedding_module
(table: transformers4rec.torch.features.embedding.TableConfig) → transformers4rec.torch.features.embedding.SoftEmbedding[source]
-
class
transformers4rec.torch.features.embedding.
TableConfig
(vocabulary_size: int, dim: int, initializer: Optional[Callable[[torch.Tensor], None]] = None, combiner: str = 'mean', name: Optional[str] = None)[source] Bases:
object
-
class
transformers4rec.torch.features.embedding.
FeatureConfig
(table: transformers4rec.torch.features.embedding.TableConfig, max_sequence_length: int = 0, name: Optional[str] = None)[source] Bases:
object
-
class
transformers4rec.torch.features.embedding.
SoftEmbedding
(num_embeddings, embeddings_dim, emb_initializer=None)[source] Bases:
torch.nn.modules.module.Module
Soft-one hot encoding embedding technique, from https://arxiv.org/pdf/1708.00065.pdf In a nutshell, it represents a continuous feature as a weighted average of embeddings
-
class
transformers4rec.torch.features.embedding.
PretrainedEmbeddingsInitializer
(weight_matrix: Union[torch.Tensor, List[List[float]]], trainable: bool = False, **kwargs)[source] Bases:
torch.nn.modules.module.Module
Initializer of embedding tables with pre-trained weights
- Parameters
weight_matrix (Union[torch.Tensor, List[List[float]]]) – A 2D torch or numpy tensor or lists of lists with the pre-trained weights for embeddings. The expect dims are (embedding_cardinality, embedding_dim). The embedding_cardinality can be inferred from the column schema, for example, schema.select_by_name(“item_id”).feature[0].int_domain.max + 1. The first position of the embedding table is reserved for padded items (id=0).
trainable (bool) – Whether the embedding table should be trainable or not
transformers4rec.torch.features.sequence module
-
class
transformers4rec.torch.features.sequence.
SequenceEmbeddingFeatures
(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], item_id: Optional[str] = None, padding_idx: int = 0, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source] Bases:
transformers4rec.torch.features.embedding.EmbeddingFeatures
Input block for embedding-lookups for categorical features. This module produces 3-D tensors, this is useful for sequential models like transformers.
- Parameters
feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.
item_id (str, optional) – The name of the feature that’s used for the item_id.
padding_idx (int) – The symbol to use for padding.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
-
table_to_embedding_module
(table: transformers4rec.torch.features.embedding.TableConfig) → torch.nn.modules.sparse.Embedding[source]
-
class
transformers4rec.torch.features.sequence.
TabularSequenceFeatures
(continuous_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, categorical_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, text_embedding_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, projection_module: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock, torch.nn.modules.module.Module]] = None, masking: Optional[transformers4rec.torch.masking.MaskSequence] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source] Bases:
transformers4rec.torch.features.tabular.TabularFeatures
Input module that combines different types of features to a sequence: continuous, categorical & text.
- Parameters
continuous_module (TabularModule, optional) – Module used to process continuous features.
categorical_module (TabularModule, optional) – Module used to process categorical features.
text_embedding_module (TabularModule, optional) – Module used to process text features.
projection_module (BlockOrModule, optional) – Module that’s used to project the output of this module, typically done by an MLPBlock.
masking (MaskSequence, optional) – Masking to apply to the inputs.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
-
EMBEDDING_MODULE_CLASS
alias of
transformers4rec.torch.features.sequence.SequenceEmbeddingFeatures
-
classmethod
from_schema
(schema: merlin_standard_lib.schema.schema.Schema, continuous_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CONTINUOUS: 'continuous'>,), categorical_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CATEGORICAL: 'categorical'>,), aggregation: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, continuous_projection: Optional[Union[int, List[int]]] = None, continuous_soft_embeddings: bool = False, projection: Optional[Union[torch.nn.modules.module.Module, transformers4rec.torch.block.base.BuildableBlock]] = None, d_output: Optional[int] = None, masking: Optional[Union[str, transformers4rec.torch.masking.MaskSequence]] = None, **kwargs) → transformers4rec.torch.features.sequence.TabularSequenceFeatures[source] Instantiates
TabularFeatures
from aDatasetSchema
- Parameters
schema (DatasetSchema) – Dataset schema
continuous_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the continuous features, by default Tags.CONTINUOUS
categorical_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the categorical features, by default Tags.CATEGORICAL
aggregation (Optional[str], optional) – Feature aggregation option, by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features by default None
continuous_projection (Optional[Union[List[int], int]], optional) – If set, concatenate all numerical features and project them by a number of MLP layers. The argument accepts a list with the dimensions of the MLP layers, by default None
continuous_soft_embeddings (bool) – Indicates if the soft one-hot encoding technique must be used to represent continuous features, by default False
projection (Optional[Union[torch.nn.Module, BuildableBlock]], optional) – If set, project the aggregated embeddings vectors into hidden dimension vector space, by default None
d_output (Optional[int], optional) – If set, init a MLPBlock as projection module to project embeddings vectors, by default None
masking (Optional[Union[str, MaskSequence]], optional) – If set, Apply masking to the input embeddings and compute masked labels, It requires a categorical_module including an item_id column, by default None
- Returns
Returns
TabularFeatures
from a dataset schema- Return type
-
property
masking
-
property
item_id
-
property
item_embedding_table
transformers4rec.torch.features.tabular module
-
class
transformers4rec.torch.features.tabular.
TabularFeatures
(continuous_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, categorical_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, text_embedding_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source] Bases:
transformers4rec.torch.tabular.base.MergeTabular
Input module that combines different types of features: continuous, categorical & text.
- Parameters
continuous_module (TabularModule, optional) – Module used to process continuous features.
categorical_module (TabularModule, optional) – Module used to process categorical features.
text_embedding_module (TabularModule, optional) – Module used to process text features.
- pre: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional
Transformations to apply on the inputs when the module is called (so before forward).
- post: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional
Transformations to apply on the inputs after the module is called (so after forward).
- aggregation: Union[str, TabularAggregation], optional
Aggregation to apply after processing the forward-method to output a single Tensor.
-
CONTINUOUS_MODULE_CLASS
alias of
transformers4rec.torch.features.continuous.ContinuousFeatures
-
EMBEDDING_MODULE_CLASS
alias of
transformers4rec.torch.features.embedding.EmbeddingFeatures
-
SOFT_EMBEDDING_MODULE_CLASS
alias of
transformers4rec.torch.features.embedding.SoftEmbeddingFeatures
-
project_continuous_features
(mlp_layers_dims: Union[List[int], int]) → transformers4rec.torch.features.tabular.TabularFeatures[source] Combine all concatenated continuous features with stacked MLP layers
-
classmethod
from_schema
(schema: merlin_standard_lib.schema.schema.Schema, continuous_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CONTINUOUS: 'continuous'>,), categorical_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CATEGORICAL: 'categorical'>,), aggregation: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, continuous_projection: Optional[Union[int, List[int]]] = None, continuous_soft_embeddings: bool = False, **kwargs) → transformers4rec.torch.features.tabular.TabularFeatures[source] Instantiates
TabularFeatures
from aDatasetSchema
- Parameters
schema (DatasetSchema) – Dataset schema
continuous_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the continuous features, by default Tags.CONTINUOUS
categorical_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the categorical features, by default Tags.CATEGORICAL
aggregation (Optional[str], optional) – Feature aggregation option, by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features by default None
continuous_projection (Optional[Union[List[int], int]], optional) – If set, concatenate all numerical features and project them by a number of MLP layers. The argument accepts a list with the dimensions of the MLP layers, by default None
continuous_soft_embeddings (bool) – Indicates if the soft one-hot encoding technique must be used to represent continuous features, by default False
- Returns
Returns
TabularFeatures
from a dataset schema- Return type
-
property
continuous_module
-
property
categorical_module