transformers4rec.torch package

Subpackages

Submodules

transformers4rec.torch.masking module

class transformers4rec.torch.masking.MaskingInfo(schema: torch.Tensor, targets: torch.Tensor)[source]

Bases: object

schema: torch.Tensor

targets: torch.Tensor

class transformers4rec.torch.masking.MaskSequence(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, **kwargs)[source]

Bases: transformers4rec.torch.utils.torch_utils.OutputSizeMixin, torch.nn.modules.module.Module

Base class to prepare masked items inputs/labels for language modeling tasks.

Transformer architectures can be trained in different ways. Depending on the training method, there is a specific masking schema. The masking schema sets the items to be predicted (labels) and masks (hides) their positions in the sequence so that they are not used by the Transformer layers for prediction.

We currently provide 4 different masking schemes out of the box:

Causal LM (clm)
Masked LM (mlm)
Permutation LM (plm)
Replacement Token Detection (rtd)

This class can be extended to add a different masking scheme.

Parameters

hidden_size – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.
padding_idx (int, default = 0) – Index of the padding token, used to pad the sequences of a batch to the same length

compute_masked_targets(item_ids: torch.Tensor, training: bool = False, testing: bool = False) → transformers4rec.torch.masking.MaskingInfo [source]

Method to prepare masked labels based on the sequence of item ids. It returns the true labels of masked positions and the related boolean mask. The class attributes mask_schema and masked_targets are also updated so that they can be reused in other modules.

item_ids: torch.Tensor
The sequence of input item ids used for deriving labels of next item prediction task.

training: bool

Flag to indicate whether we are in Training mode or not. During training, the labels can be any items within the sequence based on the selected masking task.

testing: bool

Flag to indicate whether we are in Evaluation (=True) or Inference (=False) mode. During evaluation, we are predicting all next items or last item only in the sequence based on the param eval_on_last_item_seq_only. During inference, we don’t mask the input sequence and use all available information to predict the next item.

Return type: Tuple[MaskingSchema, MaskedTargets]

apply_mask_to_inputs(inputs: torch.Tensor, schema: torch.Tensor, training: bool = False, testing: bool = False) → torch.Tensor [source]

Control the masked positions in the inputs by replacing the true interaction by a learnable masked embedding.

Parameters

inputs (torch.Tensor) – The 3-D tensor of interaction embeddings resulting from the ops: TabularFeatures + aggregation + projection(optional)
schema (MaskingSchema) – The boolean mask indicating masked positions.
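As a rough, framework-free illustration of the replacement described above, the sketch below swaps the embedding at each masked position for one shared vector. Plain Python lists stand in for tensors, and `apply_mask` and `masked_embedding` are hypothetical names chosen for this example; in the class itself the masked embedding is a trainable vector of size hidden_size.

```python
from typing import List

Vector = List[float]


def apply_mask(inputs: List[List[Vector]], schema: List[List[bool]],
               masked_embedding: Vector) -> List[List[Vector]]:
    """Return a copy of `inputs` where masked positions hold `masked_embedding`."""
    return [
        [list(masked_embedding) if is_masked else list(vec)
         for vec, is_masked in zip(seq, seq_mask)]
        for seq, seq_mask in zip(inputs, schema)
    ]


# One sequence of three positions, hidden size 2; the last position is masked.
inputs = [[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]]
schema = [[False, False, True]]
masked_embedding = [0.0, 0.0]  # stand-in for the learnable vector

masked_inputs = apply_mask(inputs, schema, masked_embedding)
```

The original `inputs` list is left untouched, so the true interaction embeddings remain available to the caller.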

predict_all(item_ids: torch.Tensor) → transformers4rec.torch.masking.MaskingInfo [source]

Prepare labels for all next item predictions instead of last-item predictions in a user’s sequence.

Parameters: item_ids (torch.Tensor) – The sequence of input item ids used for deriving labels of next item prediction task.
Returns
Return type: Tuple[MaskingSchema, MaskedTargets]
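The difference between predicting all next items and predicting only the last item can be sketched with plain lists. This is a minimal illustration, assuming that next-item labels are obtained by shifting the item ids one step to the left and that 0 is the padding index; `all_next_item_labels` and `last_item_only_labels` are hypothetical helper names, not library functions.

```python
from typing import List

PAD = 0  # assumed padding index, matching the padding_idx default


def all_next_item_labels(item_ids: List[int]) -> List[int]:
    """Label at position t is the item at position t + 1 (PAD at the end)."""
    return item_ids[1:] + [PAD]


def last_item_only_labels(item_ids: List[int]) -> List[int]:
    """Keep only the label of the last real (non-padded) item."""
    labels = all_next_item_labels(item_ids)
    non_pad = [i for i, label in enumerate(labels) if label != PAD]
    if not non_pad:
        return [PAD] * len(labels)
    last = non_pad[-1]
    return [label if i == last else PAD for i, label in enumerate(labels)]


session = [11, 12, 13, 14, 0, 0]  # right-padded session of four items
all_labels = all_next_item_labels(session)
last_label = last_item_only_labels(session)
```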

forward(inputs: torch.Tensor, item_ids: torch.Tensor, training: bool = False, testing: bool = False) → torch.Tensor [source]

forward_output_size(input_size)[source]

transformer_required_arguments() → Dict[str, Any][source]

transformer_optional_arguments() → Dict[str, Any][source]

property transformer_arguments: Prepare additional arguments to pass to the Transformer forward methods.

class transformers4rec.torch.masking.CausalLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, train_on_last_item_seq_only: bool = False, **kwargs)[source]

Bases: transformers4rec.torch.masking.MaskSequence

In Causal Language Modeling (clm) you predict the next item based on past positions of the sequence. Future positions are masked.

Parameters

hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.
padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length
eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation
train_on_last_item_seq_only (bool, default = False) – Predict only last item during training
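The clm constraint that each position only sees the past can be pictured as a lower-triangular visibility matrix. The sketch below is a conceptual illustration only, not a description of how this class is implemented; `causal_visibility` is a hypothetical helper.

```python
from typing import List


def causal_visibility(seq_len: int) -> List[List[bool]]:
    """visible[t][s] is True when position t may use position s."""
    return [[s <= t for s in range(seq_len)] for t in range(seq_len)]


visible = causal_visibility(4)
# Print one row per position: "x" marks a visible position, "." a hidden one.
for row in visible:
    print("".join("x" if v else "." for v in row))
```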

apply_mask_to_inputs(inputs: torch.Tensor, mask_schema: torch.Tensor, training: bool = False, testing: bool = False) → torch.Tensor [source]

class transformers4rec.torch.masking.MaskedLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, mlm_probability: float = 0.15, **kwargs)[source]

Bases: transformers4rec.torch.masking.MaskSequence

In Masked Language Modeling (mlm) you randomly select some positions of the sequence to be predicted, which are masked. During training, the Transformer layer is allowed to use positions on the right (future info). During inference, all past items are visible for the Transformer layer, which tries to predict the next item.

Parameters

hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.
padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length
eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation
mlm_probability (Optional[float], default = 0.15) – Probability of an item to be selected (masked) as a label of the given sequence. Note that at least one item is masked in each sequence, so that the network always has something to learn from.
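A minimal sketch of the selection rule, assuming each non-padded position is masked independently with probability mlm_probability and that one position is forced when none was picked. `sample_mlm_mask` is a hypothetical helper written with the standard library, not the class's own code.

```python
import random
from typing import List

PAD = 0  # assumed padding index


def sample_mlm_mask(item_ids: List[int], mlm_probability: float,
                    rng: random.Random) -> List[bool]:
    """Pick positions to mask; padded positions are never selected."""
    real = [i for i, item in enumerate(item_ids) if item != PAD]
    mask = [item != PAD and rng.random() < mlm_probability for item in item_ids]
    # Guarantee at least one masked item per non-empty sequence.
    if real and not any(mask):
        mask[rng.choice(real)] = True
    return mask


rng = random.Random(0)
session = [11, 12, 13, 14, 0, 0]
mask = sample_mlm_mask(session, mlm_probability=0.15, rng=rng)
# Labels keep the true item id at masked positions and PAD elsewhere.
labels = [item if m else PAD for item, m in zip(session, mask)]
```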

apply_mask_to_inputs(inputs: torch.Tensor, mask_schema: torch.Tensor, training=False, testing=False) → torch.Tensor [source]

Control the masked positions in the inputs by replacing the true interaction by a learnable masked embedding.

inputs: torch.Tensor
The 3-D tensor of interaction embeddings resulting from the ops: TabularFeatures + aggregation + projection(optional)

schema: MaskingSchema
The boolean mask indicating masked positions.

training: bool: Flag to indicate whether we are in Training mode or not. During training, the labels can be any items within the sequence based on the selected masking task.
testing: bool: Flag to indicate whether we are in Evaluation (=True) or Inference (=False) mode. During evaluation, we are predicting all next items or last item only in the sequence based on the param eval_on_last_item_seq_only. During inference, we don’t mask the input sequence and use all available information to predict the next item.

class transformers4rec.torch.masking.PermutationLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, plm_probability: float = 0.16666666666666666, max_span_length: int = 5, permute_all: bool = False, **kwargs)[source]

Bases: transformers4rec.torch.masking.MaskSequence

In Permutation Language Modeling (plm) you use a permutation factorization at the level of the self-attention layer to define the accessible bidirectional context.

Parameters

hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.
padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length
eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation
max_span_length (int) – maximum length of a span of masked items
plm_probability (float) – The ratio of surrounding items to unmask to define the context of the span-based prediction segment of items
permute_all (bool) – Compute partial span-based prediction (=False) or not.

compute_masked_targets(item_ids: torch.Tensor, training=False, **kwargs) → transformers4rec.torch.masking.MaskingInfo [source]

transformer_required_arguments() → Dict[str, Any][source]

class transformers4rec.torch.masking.ReplacementLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, sample_from_batch: bool = False, **kwargs)[source]

Bases: transformers4rec.torch.masking.MaskedLanguageModeling

In Replacement Language Modeling (rtd), you use MLM to randomly select some items, but replace them with random tokens. Then a discriminator model (which may or may not share weights with the generator) is asked to classify whether the item at each position belongs to the original sequence or not. The generator-discriminator architecture was jointly trained using Masked LM and RTD tasks.

Parameters

hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.
padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length
eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation
sample_from_batch (bool) – Whether to sample replacement item ids from the same batch or not

get_fake_tokens(itemid_seq, target_flat, logits)[source]

The second task of RTD is binary classification to train the discriminator. The task consists of generating fake data by replacing [MASK] positions with random items; the ELECTRA discriminator then learns to detect the fake replacements.

Parameters

itemid_seq (torch.Tensor of shape (bs, max_seq_len)) – input sequence of item ids
target_flat (torch.Tensor of shape (bs*max_seq_len)) – flattened masked label sequences
logits (torch.Tensor of shape (#pos_item, vocab_size or #pos_item)) – mlm probabilities of positive items computed by the generator model. The logits are over the whole corpus if sample_from_batch = False, over the positive items (masked) of the current batch otherwise

Returns

corrupted_inputs (torch.Tensor of shape (bs, max_seq_len)) – input sequence of item ids with fake replacement
discriminator_labels (torch.Tensor of shape (bs, max_seq_len)) – binary labels to distinguish between original and replaced items
batch_updates (torch.Tensor of shape (#pos_item)) – the indices of replacement item within the current batch if sample_from_batch is enabled
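The relationship between the inputs and the first two outputs can be sketched on a single sequence. This is an illustration under stated assumptions: 0 marks unmasked positions in the label sequence, the replacement ids are supplied by the caller rather than sampled from logits, and a replacement that happens to equal the original item is labeled as original (the ELECTRA convention). `corrupt_sequence` is a hypothetical helper.

```python
from typing import List, Tuple

PAD = 0  # assumed padding index, also used for "not masked" in the labels


def corrupt_sequence(item_ids: List[int], masked_labels: List[int],
                     replacements: List[int]) -> Tuple[List[int], List[int]]:
    """Replace masked positions with sampled items; label replaced positions 1."""
    sampled = iter(replacements)
    corrupted, discriminator_labels = [], []
    for item, label in zip(item_ids, masked_labels):
        if label != PAD:  # this position was masked by the MLM step
            fake = next(sampled)
            corrupted.append(fake)
            # A replacement equal to the original item counts as "original".
            discriminator_labels.append(int(fake != label))
        else:
            corrupted.append(item)
            discriminator_labels.append(0)
    return corrupted, discriminator_labels


item_ids = [11, 12, 13, 14]
masked_labels = [0, 12, 0, 14]   # positions 1 and 3 were masked
replacements = [57, 14]          # one replacement id per masked position
corrupted, disc_labels = corrupt_sequence(item_ids, masked_labels, replacements)
```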

sample_from_softmax(logits: torch.Tensor) → torch.Tensor [source]

Sampling method for replacement token modeling (ELECTRA)

Parameters: logits (torch.Tensor(pos_item, vocab_size)) – scores of probability of masked positions returned by the generator model
Returns: samples – ids of replacement items.
Return type: torch.Tensor(#pos_item)
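One common way to draw a sample from softmax(logits) without normalizing is the Gumbel-max trick: add independent Gumbel noise to each logit and take the argmax. The sketch below shows that technique with the standard library; whether the method uses exactly this procedure is an assumption, and `gumbel_max_sample` is a hypothetical name.

```python
import math
import random
from typing import List


def gumbel_max_sample(logits: List[float], rng: random.Random) -> int:
    """Draw an index with probability softmax(logits) via the Gumbel-max trick."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)        # keep u in (0, 1) to avoid log(0)
        gumbel = -math.log(-math.log(u))    # standard Gumbel noise
        noisy.append(logit + gumbel)
    return max(range(len(noisy)), key=noisy.__getitem__)


rng = random.Random(0)
# With one overwhelmingly large logit the sample is effectively deterministic.
dominant = [gumbel_max_sample([0.0, 1000.0, 0.0], rng) for _ in range(5)]
# With comparable logits, every index can be drawn.
mixed = [gumbel_max_sample([1.0, 2.0, 3.0], rng) for _ in range(5)]
```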

transformers4rec.torch.ranking_metric module

class transformers4rec.torch.ranking_metric.RankingMetric(top_ks=None, labels_onehot=False)[source]

Bases: torchmetrics.metric.Metric

Metric wrapper for computing ranking metrics@K for session-based task.

Parameters

top_ks (list, default [2, 5]) – list of cutoffs
labels_onehot (bool) – Whether to transform the labels to a one-hot representation

update(preds: torch.Tensor, target: torch.Tensor, **kwargs)[source]

compute()[source]

class transformers4rec.torch.ranking_metric.PrecisionAt(top_ks=None, labels_onehot=False)[source]: Bases: transformers4rec.torch.ranking_metric.RankingMetric

class transformers4rec.torch.ranking_metric.RecallAt(top_ks=None, labels_onehot=False)[source]: Bases: transformers4rec.torch.ranking_metric.RankingMetric

class transformers4rec.torch.ranking_metric.AvgPrecisionAt(top_ks=None, labels_onehot=False)[source]: Bases: transformers4rec.torch.ranking_metric.RankingMetric

class transformers4rec.torch.ranking_metric.DCGAt(top_ks=None, labels_onehot=False)[source]: Bases: transformers4rec.torch.ranking_metric.RankingMetric

class transformers4rec.torch.ranking_metric.NDCGAt(top_ks=None, labels_onehot=False)[source]: Bases: transformers4rec.torch.ranking_metric.RankingMetric

class transformers4rec.torch.ranking_metric.MeanReciprocalRankAt(top_ks=None, labels_onehot=False)[source]: Bases: transformers4rec.torch.ranking_metric.RankingMetric
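For intuition, the sketch below computes three of these metrics at a cutoff k for one example. It assumes the session-based setting with a single relevant item per example, so the ideal DCG is 1 and NDCG reduces to the discounted gain of the target. `rank_of_target` and `metrics_at_k` are hypothetical helpers, not the classes' implementation.

```python
import math
from typing import Dict, List


def rank_of_target(scores: List[float], target: int) -> int:
    """1-based rank of the target item when sorting by descending score."""
    return 1 + sum(1 for s in scores if s > scores[target])


def metrics_at_k(scores: List[float], target: int, k: int) -> Dict[str, float]:
    """Recall, NDCG and MRR at cutoff k for a single relevant item."""
    rank = rank_of_target(scores, target)
    hit = rank <= k
    return {
        "recall": 1.0 if hit else 0.0,
        "ndcg": 1.0 / math.log2(rank + 1) if hit else 0.0,
        "mrr": 1.0 / rank if hit else 0.0,
    }


scores = [0.1, 0.5, 0.2, 0.9]                # predicted scores for items 0..3
at_2 = metrics_at_k(scores, target=1, k=2)   # item 1 is ranked second
at_1 = metrics_at_k(scores, target=1, k=1)   # outside the top-1 cutoff
```

Averaging these per-example values over a batch gives the batch-level metric for each cutoff.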

transformers4rec.torch.trainer module

class transformers4rec.torch.trainer.Trainer(model: transformers4rec.torch.model.base.Model, args: transformers4rec.config.trainer.T4RecTrainingArguments, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, train_dataset_or_path=None, eval_dataset_or_path=None, test_dataset_or_path=None, train_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, eval_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, test_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, callbacks: Optional[List[transformers.trainer_callback.TrainerCallback]] = [], compute_metrics=None, incremental_logging: bool = False, **kwargs)[source]

Bases: transformers.trainer.Trainer

A Trainer specialized for sequential recommendation, including session-based and sequential recommendation.

Parameters

model (Model) – The Model defined using Transformers4Rec api.
args (T4RecTrainingArguments) – The training arguments needed to set up training and evaluation experiments.
schema (Optional[Dataset.schema], optional) – The schema object including features to use and their properties. by default None
train_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or DataSet to use for training. by default None
eval_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or DataSet to use for evaluation. by default None
train_dataloader (Optional[DataLoader], optional) – The data generator to use for training. by default None
eval_dataloader (Optional[DataLoader], optional) – The data generator to use for evaluation. by default None
compute_metrics (Optional[bool], optional) – Whether to compute metrics defined by Model class or not. by default None
incremental_logging (bool) – Whether to enable incremental logging or not. If True, it ensures that global steps are incremented over many trainer.train() calls, so that train and eval metrics steps do not overlap and can be seen properly in reports like W&B and Tensorboard

get_train_dataloader()[source]: Set the train dataloader to be used by the Trainer. It supports a user-defined data loader set as an attribute in the constructor. When the attribute is None, the data loader is defined using train_dataset and the data_loader_engine specified in the Training Arguments.

get_eval_dataloader(eval_dataset=None)[source]: Set the eval dataloader to be used by the Trainer. It supports a user-defined data loader set as an attribute in the constructor. When the attribute is None, the data loader is defined using eval_dataset and the data_loader_engine specified in the Training Arguments.

get_test_dataloader(test_dataset=None)[source]: Set the test dataloader to be used by the Trainer. It supports a user-defined data loader set as an attribute in the constructor. When the attribute is None, the data loader is defined using test_dataset and the data_loader_engine specified in the Training Arguments.

num_examples(dataloader: torch.utils.data.dataloader.DataLoader)[source]: Overrides the Trainer.num_examples() method because the data loaders for this project do not return the dataset size, but the number of steps. The dataset size is therefore estimated by multiplying the number of steps by the batch size.
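The estimate amounts to a single multiplication, sketched here with a hypothetical helper `estimate_num_examples`. The numbers are made up for illustration.

```python
def estimate_num_examples(num_steps: int, batch_size: int) -> int:
    """Estimate the dataset size from the number of steps and the batch size."""
    return num_steps * batch_size


# 40 steps with a batch size of 128. If the final batch is only partially
# filled, the true dataset size is somewhat smaller than this estimate.
estimated = estimate_num_examples(num_steps=40, batch_size=128)
```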

reset_lr_scheduler() → None [source]: Resets the LR scheduler of the previous Trainer.train() call, so that a new LR scheduler is created by the next Trainer.train() call. This is important for LR schedules like get_linear_schedule_with_warmup(), which decays the LR to 0 at the end of training.
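To see why the reset matters, the sketch below reproduces the usual linear warm-up and decay multiplier in plain Python. It is a stand-alone illustration of the schedule's shape, with a hypothetical function name, and it does not call the scheduler itself.

```python
def linear_warmup_decay(step: int, num_warmup_steps: int,
                        num_training_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given step."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


total, warmup = 100, 10
steps = (0, 5, 10, 55, 100, 150)
factors = [linear_warmup_decay(s, warmup, total) for s in steps]
```

The multiplier is 0 at step 100 and stays at 0 for any later step, so a second train() call that kept the old scheduler would run with a learning rate of 0.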

create_scheduler(num_training_steps: int, optimizer: Optional[torch.optim.optimizer.Optimizer] = None)[source]

static get_scheduler(name: Union[str, transformers.trainer_utils.SchedulerType], optimizer: torch.optim.optimizer.Optimizer, num_warmup_steps: Optional[int] = None, num_training_steps: Optional[int] = None, num_cycles: Optional[int] = 0.5)[source]

Unified API to get any scheduler from its name.

Parameters

name (str or SchedulerType) – The name of the scheduler to use.
optimizer (torch.optim.Optimizer) – The optimizer that will be used during training.
num_warmup_steps (int, optional) – The number of warm-up steps to perform. This is not required by all schedulers (hence the argument being optional); the function will raise an error if it’s unset and the scheduler type requires it.
num_training_steps (int, optional) – The number of training steps to do. This is not required by all schedulers (hence the argument being optional); the function will raise an error if it’s unset and the scheduler type requires it.
num_cycles (int, optional) – The number of waves in the cosine schedule / hard restarts to use for cosine scheduler
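The effect of num_cycles can be illustrated with the common cosine-with-warm-up formula. The sketch below assumes that formula and uses a hypothetical function name; it shows the shape of the multiplier only and is not the scheduler returned by this method.

```python
import math


def cosine_warmup_factor(step: int, num_warmup_steps: int,
                         num_training_steps: int,
                         num_cycles: float = 0.5) -> float:
    """Learning-rate multiplier for a cosine schedule with linear warm-up."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(
        1, num_training_steps - num_warmup_steps
    )
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


total, warmup = 100, 10
half_wave_mid = cosine_warmup_factor(55, warmup, total, num_cycles=0.5)
half_wave_end = cosine_warmup_factor(100, warmup, total, num_cycles=0.5)
full_wave_mid = cosine_warmup_factor(55, warmup, total, num_cycles=1.0)
full_wave_end = cosine_warmup_factor(100, warmup, total, num_cycles=1.0)
```

With num_cycles = 0.5 the multiplier falls from 1 to 0 over the post-warm-up steps; with num_cycles = 1.0 it reaches 0 halfway through and returns to 1 by the last step.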

compute_loss(model, inputs, return_outputs=False)[source]: Overrides Trainer.compute_loss() to allow passing the targets to the model’s forward method, and defines how the loss is computed by the Trainer. By default, all Transformers4Rec models return a dictionary of three elements: ‘loss’, ‘predictions’, and ‘labels’.

prediction_step(model: torch.nn.modules.module.Module, inputs: Dict[str, torch.Tensor], prediction_loss_only: bool, ignore_keys: Optional[List[str]] = None, training: bool = False, testing: bool = True) → Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor], Optional[Dict[str, Any]]][source]: Overriding Trainer.prediction_step() to provide more flexibility to unpack results from the model, like returning labels that are not exactly one input feature model

evaluation_loop(dataloader: torch.utils.data.dataloader.DataLoader, description: str, prediction_loss_only: Optional[bool] = None, ignore_keys: Optional[List[str]] = None, metric_key_prefix: Optional[str] = 'eval') → transformers.trainer_utils.EvalLoopOutput[source]

Overriding Trainer.prediction_loop() (shared by Trainer.evaluate() and Trainer.predict()) to provide more flexibility to work with streaming metrics (computed at each eval batch) and to log with the outputs of the model (e.g. prediction scores, prediction metadata, attention weights)

Parameters

dataloader (DataLoader) – DataLoader object to use to iterate over evaluation data
description (str) – Parameter to describe the evaluation experiment. e.g: Prediction, test
prediction_loss_only (Optional[bool]) – Whether or not to return the loss only. by default None
ignore_keys (Optional[List[str]]) – Columns not accepted by the model.forward() method are automatically removed. by default None
metric_key_prefix (Optional[str]) – Prefix to use when logging evaluation metrics. by default eval

load_model_trainer_states_from_checkpoint(checkpoint_path, model=None)[source]

This method loads the checkpoint states of the model, trainer and random states. If model is None, the serialized model class is loaded from the checkpoint. It does not load the optimizer and LR scheduler states (for that, call trainer.train() with the resume_from_checkpoint argument for a complete load).

Parameters

checkpoint_path (str) – Path to the checkpoint directory.
model (Optional[Model]) – Model class used by Trainer. by default None

property log_predictions_callback

log(logs: Dict[str, float]) → None [source]

transformers4rec.torch.trainer.process_metrics(metrics, prefix='', to_cpu=True)[source]

class transformers4rec.torch.trainer.IncrementalLoggingCallback(trainer: transformers4rec.torch.trainer.Trainer)[source]

Bases: transformers.trainer_callback.TrainerCallback

A TrainerCallback that changes the state of the Trainer on specific hooks for the purpose of incremental logging.

Parameters: trainer (Trainer)

on_train_begin(args, state, control, model=None, **kwargs)[source]

on_train_end(args, state, control, model=None, **kwargs)[source]

on_epoch_end(args, state, control, model=None, **kwargs)[source]

class transformers4rec.torch.trainer.DatasetMock(nsteps=1)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

Mock to inform HF Trainer that the dataset is sized, and can be obtained via the generated/provided data loader

transformers4rec.torch.typing module

Module contents

class transformers4rec.torch.Schema(feature: Sequence[merlin_standard_lib.proto.schema_bp.Feature] = <betterproto._PLACEHOLDER object>, sparse_feature: List[merlin_standard_lib.proto.schema_bp.SparseFeature] = <betterproto._PLACEHOLDER object>, weighted_feature: List[merlin_standard_lib.proto.schema_bp.WeightedFeature] = <betterproto._PLACEHOLDER object>, string_domain: List[merlin_standard_lib.proto.schema_bp.StringDomain] = <betterproto._PLACEHOLDER object>, float_domain: List[merlin_standard_lib.proto.schema_bp.FloatDomain] = <betterproto._PLACEHOLDER object>, int_domain: List[merlin_standard_lib.proto.schema_bp.IntDomain] = <betterproto._PLACEHOLDER object>, default_environment: List[str] = <betterproto._PLACEHOLDER object>, annotation: merlin_standard_lib.proto.schema_bp.Annotation = <betterproto._PLACEHOLDER object>, dataset_constraints: merlin_standard_lib.proto.schema_bp.DatasetConstraints = <betterproto._PLACEHOLDER object>, tensor_representation_group: Dict[str, merlin_standard_lib.proto.schema_bp.TensorRepresentationGroup] = <betterproto._PLACEHOLDER object>)[source]

Bases: merlin_standard_lib.proto.schema_bp._Schema

A collection of column schemas for a dataset.

feature: List[merlin_standard_lib.schema.schema.ColumnSchema] = Field(name=None,type=None,default=<betterproto._PLACEHOLDER object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({'betterproto': FieldMetadata(number=1, proto_type='message', map_types=None, group=None, wraps=None)}),_field_type=None)

classmethod create(column_schemas: Optional[Union[List[Union[merlin_standard_lib.schema.schema.ColumnSchema, str]], Dict[str, Union[merlin_standard_lib.schema.schema.ColumnSchema, str]]]] = None, **kwargs)[source]

with_tags_based_on_properties(using_value_count=True, using_domain=True) → merlin_standard_lib.schema.schema.Schema [source]

apply(selector) → merlin_standard_lib.schema.schema.Schema [source]

apply_inverse(selector) → merlin_standard_lib.schema.schema.Schema [source]

filter_columns_from_dict(input_dict)[source]

select_by_type(to_select) → merlin_standard_lib.schema.schema.Schema [source]

remove_by_type(to_remove) → merlin_standard_lib.schema.schema.Schema [source]

select_by_tag(to_select) → merlin_standard_lib.schema.schema.Schema [source]

remove_by_tag(to_remove) → merlin_standard_lib.schema.schema.Schema [source]

select_by_name(to_select) → merlin_standard_lib.schema.schema.Schema [source]

remove_by_name(to_remove) → merlin_standard_lib.schema.schema.Schema [source]

map_column_schemas(map_fn: Callable[[merlin_standard_lib.schema.schema.ColumnSchema], merlin_standard_lib.schema.schema.ColumnSchema]) → merlin_standard_lib.schema.schema.Schema [source]

filter_column_schemas(filter_fn: Callable[[merlin_standard_lib.schema.schema.ColumnSchema], bool], negate=False) → merlin_standard_lib.schema.schema.Schema [source]

property column_names

property column_schemas

property item_id_column_name

from_json(value: Union[str, bytes]) → merlin_standard_lib.schema.schema.Schema [source]

to_proto_text() → str [source]

from_proto_text(path_or_proto_text: str) → merlin_standard_lib.schema.schema.Schema [source]

copy(**kwargs) → merlin_standard_lib.schema.schema.Schema [source]

add(other, allow_overlap=True) → merlin_standard_lib.schema.schema.Schema [source]

transformers4rec.torch.requires_schema(module)[source]

class transformers4rec.torch.T4RecConfig[source]

Bases: object

to_huggingface_torch_model()[source]

to_torch_model(input_features, *prediction_task, task_blocks=None, task_weights=None, loss_reduction='mean', **kwargs)[source]

property transformers_config_cls

classmethod build(*args, **kwargs)[source]

class transformers4rec.torch.GPT2Config(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12, n_head=12, n_inner=None, activation_function='gelu_new', resid_pdrop=0.1, embd_pdrop=0.1, attn_pdrop=0.1, layer_norm_epsilon=1e-05, initializer_range=0.02, summary_type='cls_index', summary_use_proj=True, summary_activation=None, summary_proj_to_labels=True, summary_first_dropout=0.1, scale_attn_weights=True, use_cache=True, bos_token_id=50256, eos_token_id=50256, scale_attn_by_inverse_layer_idx=False, reorder_and_upcast_attn=False, **kwargs)[source]

Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.gpt2.configuration_gpt2.GPT2Config

classmethod build(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source]

class transformers4rec.torch.XLNetConfig(vocab_size=32000, d_model=1024, n_layer=24, n_head=16, d_inner=4096, ff_activation='gelu', untie_r=True, attn_type='bi', initializer_range=0.02, layer_norm_eps=1e-12, dropout=0.1, mem_len=512, reuse_len=None, use_mems_eval=True, use_mems_train=False, bi_data=False, clamp_len=- 1, same_length=False, summary_type='last', summary_use_proj=True, summary_activation='tanh', summary_last_dropout=0.1, start_n_top=5, end_n_top=5, pad_token_id=5, bos_token_id=1, eos_token_id=2, **kwargs)[source]

Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.xlnet.configuration_xlnet.XLNetConfig

classmethod build(d_model, n_head, n_layer, total_seq_length=None, attn_type='bi', hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, mem_len=1, **kwargs)[source]

class transformers4rec.torch.TransfoXLConfig(vocab_size=267735, cutoffs=[20000, 40000, 200000], d_model=1024, d_embed=1024, n_head=16, d_head=64, d_inner=4096, div_val=4, pre_lnorm=False, n_layer=18, mem_len=1600, clamp_len=1000, same_length=True, proj_share_all_but_first=True, attn_type=0, sample_softmax=- 1, adaptive=True, dropout=0.1, dropatt=0.0, untie_r=True, init='normal', init_range=0.01, proj_init_std=0.01, init_std=0.02, layer_norm_epsilon=1e-05, eos_token_id=0, **kwargs)[source]

Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.transfo_xl.configuration_transfo_xl.TransfoXLConfig

classmethod build(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source]

class transformers4rec.torch.LongformerConfig(attention_window: Union[List[int], int] = 512, sep_token_id: int = 2, pad_token_id: int = 1, bos_token_id: int = 0, eos_token_id: int = 2, vocab_size: int = 30522, hidden_size: int = 768, num_hidden_layers: int = 12, num_attention_heads: int = 12, intermediate_size: int = 3072, hidden_act: str = 'gelu', hidden_dropout_prob: float = 0.1, attention_probs_dropout_prob: float = 0.1, max_position_embeddings: int = 512, type_vocab_size: int = 2, initializer_range: float = 0.02, layer_norm_eps: float = 1e-12, onnx_export: bool = False, **kwargs)[source]

Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.longformer.configuration_longformer.LongformerConfig

classmethod build(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source]

class transformers4rec.torch.AlbertConfig(vocab_size=30000, embedding_size=128, hidden_size=4096, num_hidden_layers=12, num_hidden_groups=1, num_attention_heads=64, intermediate_size=16384, inner_group_num=1, hidden_act='gelu_new', hidden_dropout_prob=0, attention_probs_dropout_prob=0, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, classifier_dropout_prob=0.1, position_embedding_type='absolute', pad_token_id=0, bos_token_id=2, eos_token_id=3, **kwargs)[source]

Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.albert.configuration_albert.AlbertConfig

classmethod build(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source]

class transformers4rec.torch.ReformerConfig(attention_head_size=64, attn_layers=['local', 'lsh', 'local', 'lsh', 'local', 'lsh'], axial_norm_std=1.0, axial_pos_embds=True, axial_pos_shape=[64, 64], axial_pos_embds_dim=[64, 192], chunk_size_lm_head=0, eos_token_id=2, feed_forward_size=512, hash_seed=None, hidden_act='relu', hidden_dropout_prob=0.05, hidden_size=256, initializer_range=0.02, is_decoder=False, layer_norm_eps=1e-12, local_num_chunks_before=1, local_num_chunks_after=0, local_attention_probs_dropout_prob=0.05, local_attn_chunk_length=64, lsh_attn_chunk_length=64, lsh_attention_probs_dropout_prob=0.0, lsh_num_chunks_before=1, lsh_num_chunks_after=0, max_position_embeddings=4096, num_attention_heads=12, num_buckets=None, num_hashes=1, pad_token_id=0, vocab_size=320, tie_word_embeddings=False, use_cache=True, classifier_dropout=None, **kwargs)[source]

Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.reformer.configuration_reformer.ReformerConfig

classmethod build(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, axial_pos_shape_first_dim=4, **kwargs)[source]

class transformers4rec.torch.ElectraConfig(vocab_size=30522, embedding_size=128, hidden_size=256, num_hidden_layers=12, num_attention_heads=4, intermediate_size=1024, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, summary_type='first', summary_use_proj=True, summary_activation='gelu', summary_last_dropout=0.1, pad_token_id=0, position_embedding_type='absolute', use_cache=True, classifier_dropout=None, **kwargs)[source]

Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.electra.configuration_electra.ElectraConfig

classmethod build(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source]

class transformers4rec.torch.T4RecTrainingArguments(output_dir: str, overwrite_output_dir: bool = False, do_train: bool = False, do_eval: bool = False, do_predict: bool = False, evaluation_strategy: Union[transformers.trainer_utils.IntervalStrategy, str] = 'no', prediction_loss_only: bool = False, per_device_train_batch_size: int = 8, per_device_eval_batch_size: int = 8, per_gpu_train_batch_size: Optional[int] = None, per_gpu_eval_batch_size: Optional[int] = None, gradient_accumulation_steps: int = 1, eval_accumulation_steps: Optional[int] = None, eval_delay: Optional[float] = 0, learning_rate: float = 5e-05, weight_decay: float = 0.0, adam_beta1: float = 0.9, adam_beta2: float = 0.999, adam_epsilon: float = 1e-08, max_grad_norm: float = 1.0, num_train_epochs: float = 3.0, max_steps: int = - 1, lr_scheduler_type: Union[transformers.trainer_utils.SchedulerType, str] = 'linear', warmup_ratio: float = 0.0, warmup_steps: int = 0, log_level: Optional[str] = 'passive', log_level_replica: Optional[str] = 'warning', log_on_each_node: bool = True, logging_dir: Optional[str] = None, logging_strategy: Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps', logging_first_step: bool = False, logging_steps: float = 500, logging_nan_inf_filter: bool = True, save_strategy: Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps', save_steps: float = 500, save_total_limit: Optional[int] = None, save_safetensors: Optional[bool] = False, save_on_each_node: bool = False, no_cuda: bool = False, use_mps_device: bool = False, seed: int = 42, data_seed: Optional[int] = None, jit_mode_eval: bool = False, use_ipex: bool = False, bf16: bool = False, fp16: bool = False, fp16_opt_level: str = 'O1', half_precision_backend: str = 'auto', bf16_full_eval: bool = False, fp16_full_eval: bool = False, tf32: Optional[bool] = None, local_rank: int = - 1, ddp_backend: Optional[str] = None, tpu_num_cores: Optional[int] = None, tpu_metrics_debug: bool = False, debug: str = '', 
dataloader_drop_last: bool = False, eval_steps: Optional[float] = None, dataloader_num_workers: int = 0, past_index: int = - 1, run_name: Optional[str] = None, disable_tqdm: Optional[bool] = None, remove_unused_columns: Optional[bool] = True, label_names: Optional[List[str]] = None, load_best_model_at_end: Optional[bool] = False, metric_for_best_model: Optional[str] = None, greater_is_better: Optional[bool] = None, ignore_data_skip: bool = False, sharded_ddp: str = '', fsdp: str = '', fsdp_min_num_params: int = 0, fsdp_config: Optional[str] = None, fsdp_transformer_layer_cls_to_wrap: Optional[str] = None, deepspeed: Optional[str] = None, label_smoothing_factor: float = 0.0, optim: Union[transformers.training_args.OptimizerNames, str] = 'adamw_hf', optim_args: Optional[str] = None, adafactor: bool = False, group_by_length: bool = False, length_column_name: Optional[str] = 'length', report_to: Optional[List[str]] = None, ddp_find_unused_parameters: Optional[bool] = None, ddp_bucket_cap_mb: Optional[int] = None, dataloader_pin_memory: bool = True, skip_memory_metrics: bool = True, use_legacy_prediction_loop: bool = False, push_to_hub: bool = False, resume_from_checkpoint: Optional[str] = None, hub_model_id: Optional[str] = None, hub_strategy: Union[transformers.trainer_utils.HubStrategy, str] = 'every_save', hub_token: Optional[str] = None, hub_private_repo: bool = False, gradient_checkpointing: bool = False, include_inputs_for_metrics: bool = False, fp16_backend: str = 'auto', push_to_hub_model_id: Optional[str] = None, push_to_hub_organization: Optional[str] = None, push_to_hub_token: Optional[str] = None, mp_parameters: str = '', auto_find_batch_size: bool = False, full_determinism: bool = False, torchdynamo: Optional[str] = None, ray_scope: Optional[str] = 'last', ddp_timeout: Optional[int] = 1800, torch_compile: bool = False, torch_compile_backend: Optional[str] = None, torch_compile_mode: Optional[str] = None, xpu_backend: Optional[str] = None, 
max_sequence_length: Optional[int] = None, shuffle_buffer_size: int = 0, data_loader_engine: str = 'merlin', eval_on_test_set: bool = False, eval_steps_on_train_set: int = 20, predict_top_k: int = 0, learning_rate_num_cosine_cycles_by_epoch: float = 1.25, log_predictions: bool = False, compute_metrics_each_n_steps: int = 1, experiments_group: str = 'default')[source]

Bases: transformers.training_args.TrainingArguments

Class that inherits from HF TrainingArguments and adds the arguments needed for session-based and sequential recommendation.

Parameters

shuffle_buffer_size (int) –
validate_every (Optional[int], int) – Run validation once every this many epochs; -1 means no validation is run, by default -1
eval_on_test_set (bool) –
eval_steps_on_train_set (int) –
predict_top_k (Optional[int], int) – Truncate the recommendation list to the top-K predicted items (does not affect the computation of evaluation metrics). This parameter is specific to NextItemPredictionTask, by default 0
log_predictions (Optional[bool], bool) – Log predictions, labels and metadata features every --compute_metrics_each_n_steps steps (test set only), by default False
log_attention_weights (Optional[bool], bool) – Log the inputs and attention weights every --eval_steps steps (test set only), by default False
learning_rate_num_cosine_cycles_by_epoch (Optional[float], float) – Number of cycles per epoch when --lr_scheduler_type = cosine_with_warmup. This is the number of waves in the cosine schedule (e.g. 0.5 just decreases from the max value to 0, following a half-cosine), by default 1.25
experiments_group (Optional[str], str) – Name of the experiments group, used to organize job runs logged to W&B, by default “default”
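To make the "number of waves" wording concrete, the sketch below evaluates a cosine multiplier of the same form as the one in Hugging Face's cosine-with-warmup schedule, once warmup has finished. The names cosine_multiplier and progress are illustrative only. How the library turns the per-epoch value into a total number of cycles is not shown here and is assumed to happen elsewhere.

```python
import math


def cosine_multiplier(progress: float, num_cycles: float) -> float:
    """Learning-rate multiplier in [0, 1] for a cosine schedule.

    progress:   fraction of post-warmup training completed, in [0, 1]
    num_cycles: number of cosine waves over the whole run (assumed input)
    """
    value = 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress))
    return max(0.0, value)


# With 0.5 cycles the multiplier only decays from 1 to 0 (a half-cosine).
half_wave = [cosine_multiplier(p / 4, num_cycles=0.5) for p in range(5)]

# With 1.25 cycles it completes one full wave plus a quarter wave,
# so the run ends at about half of the peak value.
default_wave = [cosine_multiplier(p / 4, num_cycles=1.25) for p in range(5)]
```

Multiplying the base learning rate by this value gives the scheduled rate at each point of training.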

max_sequence_length: Optional[int] = None

shuffle_buffer_size: int = 0

data_loader_engine: str = 'merlin'

eval_on_test_set: bool = False

eval_steps_on_train_set: int = 20

predict_top_k: int = 0

learning_rate_num_cosine_cycles_by_epoch: float = 1.25

log_predictions: bool = False

compute_metrics_each_n_steps: int = 1

experiments_group: str = 'default'
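As an illustration of what truncating to the top-K predicted items means for predict_top_k, here is a minimal pure-Python sketch. The helper truncate_to_top_k is hypothetical and not part of the library. Treating 0 as "no truncation" is an assumption about the default value.

```python
from typing import List, Tuple


def truncate_to_top_k(scores: List[float], k: int) -> Tuple[List[int], List[float]]:
    """Return ids and scores of the k highest-scoring items.

    Item ids are taken to be positions in `scores`. A value of k == 0 is
    treated as "keep everything" (an assumption, see the text above).
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if k > 0:
        ranked = ranked[:k]
    return ranked, [scores[i] for i in ranked]


item_ids, item_scores = truncate_to_top_k([0.1, 0.7, 0.05, 0.15], k=2)
all_ids, _ = truncate_to_top_k([0.1, 0.7, 0.05, 0.15], k=0)
```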

property place_model_on_device: Overrides the method to allow running training on CPU

output_dir: str

class transformers4rec.torch.SequentialBlock(*args, output_size=None)[source]

Bases: transformers4rec.torch.block.base.BlockBase, torch.nn.modules.container.Sequential

property inputs

add_module(name: str, module: Optional[torch.nn.modules.module.Module]) → None [source]

add_module_and_maybe_build(name: str, module, parent, idx) → torch.nn.modules.module.Module [source]

forward(input, training=False, testing=False, **kwargs)[source]

build(input_size, schema=None, **kwargs)[source]

as_tabular(name=None)[source]

forward_output_size(input_size)[source]

static get_children_by_class_name(parent, *class_name)[source]

transformers4rec.torch.right_shift_block(self, other)[source]

transformers4rec.torch.build_blocks(*modules)[source]

class transformers4rec.torch.BlockBase(*args, **kwargs)[source]

Bases: transformers4rec.torch.utils.torch_utils.OutputSizeMixin, torch.nn.modules.module.Module

to_model(prediction_task_or_head, inputs=None, **kwargs)[source]

as_tabular(name=None)[source]

class transformers4rec.torch.TabularBlock(pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]

Bases: transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.tabular.base.TabularModule, abc.ABC

TabularBlock extends TabularModule to turn it into a block with output size info.

Parameters

pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
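The order in which pre, the module's forward, post and aggregation run can be sketched without the library. The code below is a simplified stand-in that uses plain dicts of lists instead of tensors. run_tabular_block, scale_by_ten and concat are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Optional

Features = Dict[str, List[float]]


def run_tabular_block(
    inputs: Features,
    forward: Callable[[Features], Features],
    pre: Optional[Callable[[Features], Features]] = None,
    post: Optional[Callable[[Features], Features]] = None,
    aggregation: Optional[Callable[[Features], List[float]]] = None,
):
    """Apply pre -> forward -> post -> aggregation, skipping unset stages."""
    if pre is not None:
        inputs = pre(inputs)
    outputs = forward(inputs)
    if post is not None:
        outputs = post(outputs)
    if aggregation is not None:
        return aggregation(outputs)
    return outputs


def scale_by_ten(features: Features) -> Features:
    return {name: [v * 10 for v in values] for name, values in features.items()}


def concat(features: Features) -> List[float]:
    # Concatenate features in sorted-name order into one flat vector.
    return [v for name in sorted(features) for v in features[name]]


result = run_tabular_block(
    {"price": [1.0, 2.0], "age": [3.0]},
    forward=lambda feats: feats,  # identity stands in for the module's forward
    pre=scale_by_ten,
    aggregation=concat,
)
```

Without an aggregation the block still returns a dict of features. With one, it returns a single vector.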

to_module(shape_or_module, device=None)[source]

output_size(input_size=None)[source]

build(input_size, schema=None, **kwargs)[source]

class transformers4rec.torch.Block(module: torch.nn.modules.module.Module, output_size: Union[List[int], torch.Size])[source]

Bases: transformers4rec.torch.block.base.BlockBase

forward(inputs, **kwargs)[source]

forward_output_size(input_size)[source]

class transformers4rec.torch.MLPBlock(dimensions, activation=<class 'torch.nn.modules.activation.ReLU'>, use_bias: bool = True, dropout=None, normalization=None, filter_features=None)[source]

Bases: transformers4rec.torch.block.base.BuildableBlock

build(input_shape) → transformers4rec.torch.block.base.SequentialBlock [source]

class transformers4rec.torch.TabularTransformation(*args, **kwargs)[source]

Bases: transformers4rec.torch.utils.torch_utils.OutputSizeMixin, torch.nn.modules.module.Module, abc.ABC

Transformation that takes in TabularData and outputs TabularData.

forward(inputs: Dict[str, torch.Tensor], **kwargs) → Dict[str, torch.Tensor][source]

classmethod parse(class_or_str)[source]

class transformers4rec.torch.SequentialTabularTransformations(*transformation: Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]])[source]

Bases: transformers4rec.torch.block.base.SequentialBlock

A sequential container. Modules are added to it in the order they are passed in.

Parameters: transformation (TabularTransformationType) – transformations that are passed in here will be called in order.
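Because the transformations are called in order, the order they are passed in changes the result. The sketch below shows this with plain functions on dicts. apply_in_order, add_one and double are hypothetical helpers, not library API.

```python
from typing import Callable, Dict, List

Features = Dict[str, List[float]]
Transform = Callable[[Features], Features]


def apply_in_order(transformations: List[Transform], inputs: Features) -> Features:
    """Feed the output of each transformation into the next one."""
    outputs = inputs
    for transform in transformations:
        outputs = transform(outputs)
    return outputs


def add_one(features: Features) -> Features:
    return {k: [v + 1 for v in vs] for k, vs in features.items()}


def double(features: Features) -> Features:
    return {k: [v * 2 for v in vs] for k, vs in features.items()}


add_then_double = apply_in_order([add_one, double], {"x": [1.0, 2.0]})
double_then_add = apply_in_order([double, add_one], {"x": [1.0, 2.0]})
```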

append(transformation)[source]

class transformers4rec.torch.TabularAggregation(*args, **kwargs)[source]

Bases: transformers4rec.torch.utils.torch_utils.OutputSizeMixin, torch.nn.modules.module.Module, abc.ABC

Aggregation of TabularData that outputs a single Tensor

forward(inputs: Dict[str, torch.Tensor]) → torch.Tensor [source]

classmethod parse(class_or_str)[source]

class transformers4rec.torch.StochasticSwapNoise(schema=None, pad_token=0, replacement_prob=0.1)[source]

Bases: transformers4rec.torch.tabular.base.TabularTransformation

Applies stochastic replacement of sequence features. It can be applied as a pre transform, e.g. TransformerBlock(pre="stochastic-swap-noise")
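One plausible reading of stochastic replacement is sketched below in pure Python. Each non-padded position is replaced, with probability replacement_prob, by a value drawn from the non-padded values of the same batch. Sampling from the batch, the swap_noise helper and its seed argument are assumptions made for illustration. The library operates on tensors and may differ in detail.

```python
import random
from typing import List


def swap_noise(
    sequences: List[List[int]],
    pad_token: int = 0,
    replacement_prob: float = 0.1,
    seed: int = 0,
) -> List[List[int]]:
    """Randomly replace non-padded values with other non-padded values."""
    rng = random.Random(seed)
    # Candidate replacements: every non-padded value in the batch.
    pool = [v for seq in sequences for v in seq if v != pad_token]
    noisy = []
    for seq in sequences:
        row = []
        for value in seq:
            if value != pad_token and pool and rng.random() < replacement_prob:
                row.append(rng.choice(pool))
            else:
                row.append(value)  # padding is never touched
        noisy.append(row)
    return noisy


batch = [[5, 8, 0, 0], [3, 7, 9, 0]]
noisy_batch = swap_noise(batch, replacement_prob=0.5)
unchanged_batch = swap_noise(batch, replacement_prob=0.0)
```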

forward(inputs: Union[torch.Tensor, Dict[str, torch.Tensor]], input_mask: Optional[torch.Tensor] = None, **kwargs) → Union[torch.Tensor, Dict[str, torch.Tensor]][source]

forward_output_size(input_size)[source]

augment(input_tensor: torch.Tensor, mask: Optional[torch.Tensor] = None) → torch.Tensor [source]

class transformers4rec.torch.TabularLayerNorm(features_dim: Optional[Dict[str, int]] = None)[source]

Bases: transformers4rec.torch.tabular.base.TabularTransformation

Applies Layer norm to each input feature individually, before the aggregation
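A minimal sketch of normalizing each feature individually, using plain lists in place of tensors. It omits the learnable scale and bias that a real layer norm usually carries. layer_norm and tabular_layer_norm are illustrative names.

```python
import math
from typing import Dict, List


def layer_norm(vector: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one feature vector to zero mean and (near) unit variance."""
    mean = sum(vector) / len(vector)
    variance = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(variance + eps) for v in vector]


def tabular_layer_norm(features: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Apply layer norm to every feature separately, keeping the dict layout."""
    return {name: layer_norm(values) for name, values in features.items()}


normalized = tabular_layer_norm(
    {"price_emb": [1.0, 2.0, 3.0, 4.0], "category_emb": [10.0, 30.0]}
)
```

Each feature is normalized with its own statistics, so features on very different scales end up comparable before they are aggregated.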

classmethod from_feature_config(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig])[source]

forward(inputs: Dict[str, torch.Tensor], **kwargs) → Dict[str, torch.Tensor][source]

forward_output_size(input_size)[source]

build(input_size, **kwargs)[source]

class transformers4rec.torch.TabularDropout(dropout_rate=0.0)[source]

Bases: transformers4rec.torch.tabular.base.TabularTransformation

Applies dropout transformation.

forward(inputs: Union[torch.Tensor, Dict[str, torch.Tensor]], **kwargs) → Union[torch.Tensor, Dict[str, torch.Tensor]][source]

forward_output_size(input_size)[source]

class transformers4rec.torch.TransformerBlock(transformer: Union[transformers.modeling_utils.PreTrainedModel, transformers.configuration_utils.PretrainedConfig], masking: Optional[transformers4rec.torch.masking.MaskSequence] = None, prepare_module: Optional[Type[transformers4rec.torch.block.transformer.TransformerPrepare]] = None, output_fn=<function TransformerBlock.<lambda>>)[source]

Bases: transformers4rec.torch.block.base.BlockBase

Class to support HF Transformers for session-based and sequential-based recommendation models.

Parameters

transformer (TransformerBody) – The T4RecConfig or a pre-trained HF object related to specific transformer architecture.
masking – Needed when masking is applied on the inputs.

TRANSFORMER_TO_PREPARE: Dict[Type[transformers.modeling_utils.PreTrainedModel], Type[transformers4rec.torch.block.transformer.TransformerPrepare]] = {<class 'transformers.models.gpt2.modeling_gpt2.GPT2Model'>: <class 'transformers4rec.torch.block.transformer.GPT2Prepare'>}

classmethod from_registry(transformer: str, d_model: int, n_head: int, n_layer: int, total_seq_length: int, masking: Optional[transformers4rec.torch.masking.MaskSequence] = None)[source]

Load the HF transformer architecture based on its name

Parameters

transformer (str) – Name of the Transformer to use. Possible values are: [“reformer”, “gpt2”, “longformer”, “electra”, “albert”, “xlnet”]
d_model (int) – Size of hidden states for Transformers
n_head (int) – Number of attention heads for Transformers
n_layer (int) – Number of layers for RNNs and Transformers
total_seq_length (int) – The maximum sequence length

forward(inputs_embeds, **kwargs)[source]: Transformer Models

forward_output_size(input_size)[source]

class transformers4rec.torch.ContinuousFeatures(features: List[str], pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]

Bases: transformers4rec.torch.features.base.InputBlock

Input block for continuous features.

Parameters

features (List[str]) – List of continuous features to include in this module.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

classmethod from_features(features, **kwargs)[source]

forward(inputs, **kwargs)[source]

forward_output_size(input_sizes)[source]

class transformers4rec.torch.EmbeddingFeatures(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], item_id: Optional[str] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source]

Bases: transformers4rec.torch.features.base.InputBlock

Input block for embedding-lookups for categorical features.

For multi-hot features, the embeddings will be aggregated into a single tensor using the mean.
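The mean aggregation for a multi-hot feature can be illustrated as follows. The table values and the mean_combiner helper are hypothetical. The library performs the equivalent operation on tensors.

```python
from typing import List


def mean_combiner(embedding_table: List[List[float]], active_ids: List[int]) -> List[float]:
    """Average the embedding rows selected by a multi-hot feature."""
    rows = [embedding_table[i] for i in active_ids]
    dim = len(embedding_table[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]


# Hypothetical table with 4 categories and embedding dimension 2.
table = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# A multi-hot value such as categories {1, 3} collapses to a single vector.
pooled = mean_combiner(table, active_ids=[1, 3])
```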

Parameters

feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.
item_id (str, optional) – The name of the feature that’s used for the item_id.

pre: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional: Transformations to apply on the inputs when the module is called (so before forward).
post: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional: Transformations to apply on the inputs after the module is called (so after forward).
aggregation: Union[str, TabularAggregation], optional: Aggregation to apply after processing the forward-method to output a single Tensor.

property item_embedding_table

table_to_embedding_module(table: transformers4rec.torch.features.embedding.TableConfig) → torch.nn.modules.module.Module [source]

classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, embedding_dims: Optional[Dict[str, int]] = None, embedding_dim_default: int = 64, infer_embedding_sizes: bool = False, infer_embedding_sizes_multiplier: float = 2.0, embeddings_initializers: Optional[Dict[str, Callable[[Any], None]]] = None, combiner: str = 'mean', tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]]]] = None, item_id: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, aggregation=None, pre=None, post=None, **kwargs) → Optional[transformers4rec.torch.features.embedding.EmbeddingFeatures][source]

Instantiates EmbeddingFeatures from a DatasetSchema.

Parameters

schema (DatasetSchema) – Dataset schema
embedding_dims (Optional[Dict[str, int]], optional) – The dimension of the embedding table for each feature (key), by default None
embedding_dim_default (Optional[int], optional) – Default dimension of the embedding table, when the feature is not found in embedding_dims, by default 64
infer_embedding_sizes (bool, optional) – Automatically defines the embedding dimension from the feature cardinality in the schema, by default False
infer_embedding_sizes_multiplier (Optional[float], optional) – Multiplier used by the heuristic to infer the embedding dimension from its cardinality. Generally reasonable values range between 2.0 and 10.0, by default 2.0
embeddings_initializers (Optional[Dict[str, Callable[[Any], None]]]) – Dict where keys are feature names and values are callable to initialize embedding tables
combiner (Optional[str], optional) – Feature aggregation option, by default “mean”
tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter columns, by default None
item_id (Optional[str], optional) – Name of the item id column (feature), by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None
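To give a feel for how a cardinality-based heuristic with a multiplier behaves, the sketch below uses a common fourth-root rule. Whether the library uses exactly this formula is an assumption. infer_embedding_dim is a hypothetical helper, not the library's function.

```python
import math


def infer_embedding_dim(cardinality: int, multiplier: float = 2.0) -> int:
    """Fourth-root heuristic: the dimension grows slowly with the cardinality."""
    return int(math.ceil(math.pow(cardinality, 0.25) * multiplier))


small = infer_embedding_dim(16)      # fourth root of 16 is 2
large = infer_embedding_dim(65536)   # fourth root of 65536 is 16
wider = infer_embedding_dim(65536, multiplier=10.0)
```

Under this rule a larger multiplier widens every inferred table by the same factor, which is why values between 2.0 and 10.0 cover a broad range of sizes.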

Returns

Returns the EmbeddingFeatures for the dataset schema

Return type

Optional[EmbeddingFeatures]

item_ids(inputs) → torch.Tensor [source]

forward(inputs, **kwargs)[source]

forward_output_size(input_sizes)[source]

class transformers4rec.torch.SoftEmbeddingFeatures(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], layer_norm: bool = True, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, **kwarg)[source]

Bases: transformers4rec.torch.features.embedding.EmbeddingFeatures

Encapsulates continuous features encoded using the soft one-hot encoding embedding technique (SoftEmbedding), from https://arxiv.org/pdf/1708.00065.pdf. In a nutshell, it keeps an embedding table for each continuous feature, which is represented as a weighted average of embeddings.
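The "weighted average of embeddings" idea can be sketched in pure Python. A scalar is projected to one logit per table row, the logits are turned into weights with a softmax, and the rows are averaged with those weights. The projection parameters and table values below are made up for illustration. In a real model they would be learned.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_embedding(
    value: float,
    projection_weight: List[float],
    projection_bias: List[float],
    embedding_table: List[List[float]],
) -> List[float]:
    """Embed a scalar as a softmax-weighted average of the table rows."""
    logits = [w * value + b for w, b in zip(projection_weight, projection_bias)]
    weights = softmax(logits)
    dim = len(embedding_table[0])
    return [
        sum(weight * row[d] for weight, row in zip(weights, embedding_table))
        for d in range(dim)
    ]


# Hypothetical parameters: cardinality 3, embedding dimension 2.
table = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
embedded = soft_embedding(0.7, [1.0, -1.0, 0.5], [0.0, 0.0, 0.0], table)
uniform = soft_embedding(0.7, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], table)
```

Since the weights are non-negative and sum to one, the result always lies inside the range spanned by the table rows.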

Parameters

feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.
layer_norm (boolean) – When layer_norm is true, TabularLayerNorm will be used in post.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, soft_embedding_cardinalities: Optional[Dict[str, int]] = None, soft_embedding_cardinality_default: int = 10, soft_embedding_dims: Optional[Dict[str, int]] = None, soft_embedding_dim_default: int = 8, embeddings_initializers: Optional[Dict[str, Callable[[Any], None]]] = None, layer_norm: bool = True, combiner: str = 'mean', tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]]]] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, **kwargs) → Optional[transformers4rec.torch.features.embedding.SoftEmbeddingFeatures][source]

Instantiates SoftEmbeddingFeatures from a DatasetSchema.

Parameters

schema (DatasetSchema) – Dataset schema
soft_embedding_cardinalities (Optional[Dict[str, int]], optional) – The cardinality of the embedding table for each feature (key), by default None
soft_embedding_cardinality_default (Optional[int], optional) – Default cardinality of the embedding table, when the feature is not found in soft_embedding_cardinalities, by default 10
soft_embedding_dims (Optional[Dict[str, int]], optional) – The dimension of the embedding table for each feature (key), by default None
soft_embedding_dim_default (Optional[int], optional) – Default dimension of the embedding table, when the feature is not found in soft_embedding_dims, by default 8
embeddings_initializers (Optional[Dict[str, Callable[[Any], None]]]) – Dict where keys are feature names and values are callable to initialize embedding tables
combiner (Optional[str], optional) – Feature aggregation option, by default “mean”
tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter columns, by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None

Returns

Returns a SoftEmbeddingFeatures instance from the dataset schema

Return type

Optional[SoftEmbeddingFeatures]

table_to_embedding_module(table: transformers4rec.torch.features.embedding.TableConfig) → transformers4rec.torch.features.embedding.SoftEmbedding [source]

class transformers4rec.torch.PretrainedEmbeddingsInitializer(weight_matrix: Union[torch.Tensor, List[List[float]]], trainable: bool = False, **kwargs)[source]

Bases: torch.nn.modules.module.Module

Initializer of embedding tables with pre-trained weights

Parameters

weight_matrix (Union[torch.Tensor, List[List[float]]]) – A 2D torch or numpy tensor, or a list of lists, with the pre-trained weights for the embeddings. The expected dims are (embedding_cardinality, embedding_dim). The embedding_cardinality can be inferred from the column schema, for example, schema.select_by_name("item_id").feature[0].int_domain.max + 1. The first position of the embedding table is reserved for padded items (id=0).
trainable (bool) – Whether the embedding table should be trainable or not
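The expected layout of the weight matrix, with row 0 kept for padding, can be sketched as below. build_weight_matrix and its arguments are hypothetical helpers for illustration. max_item_id stands in for the int_domain.max value read from the schema.

```python
from typing import Dict, List


def build_weight_matrix(
    pretrained: Dict[int, List[float]], max_item_id: int, dim: int
) -> List[List[float]]:
    """Lay out pre-trained vectors as (max_item_id + 1, dim) rows.

    Row 0 stays all zeros for the padding id. Ids without a pre-trained
    vector are also left as zeros in this sketch.
    """
    matrix = [[0.0] * dim for _ in range(max_item_id + 1)]
    for item_id, vector in pretrained.items():
        if item_id == 0:
            raise ValueError("id 0 is reserved for padding")
        if len(vector) != dim:
            raise ValueError(f"item {item_id} has dim {len(vector)}, expected {dim}")
        matrix[item_id] = list(vector)
    return matrix


# Hypothetical vectors for items 1 and 3, with embedding dimension 2.
weights = build_weight_matrix({1: [0.1, 0.2], 3: [0.5, 0.6]}, max_item_id=3, dim=2)
```

The resulting nested list has the List[List[float]] form that weight_matrix accepts.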

training: bool

forward(x)[source]

class transformers4rec.torch.TabularSequenceFeatures(continuous_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, categorical_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, text_embedding_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, projection_module: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock, torch.nn.modules.module.Module]] = None, masking: Optional[transformers4rec.torch.masking.MaskSequence] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]

Bases: transformers4rec.torch.features.tabular.TabularFeatures

Input module that combines different types of features to a sequence: continuous, categorical & text.

Parameters

continuous_module (TabularModule, optional) – Module used to process continuous features.
categorical_module (TabularModule, optional) – Module used to process categorical features.
text_embedding_module (TabularModule, optional) – Module used to process text features.
projection_module (BlockOrModule, optional) – Module that’s used to project the output of this module, typically done by an MLPBlock.
masking (MaskSequence, optional) – Masking to apply to the inputs.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

EMBEDDING_MODULE_CLASS: alias of transformers4rec.torch.features.sequence.SequenceEmbeddingFeatures

classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, continuous_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CONTINUOUS: 'continuous'>,), categorical_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CATEGORICAL: 'categorical'>,), aggregation: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, continuous_projection: Optional[Union[int, List[int]]] = None, continuous_soft_embeddings: bool = False, projection: Optional[Union[torch.nn.modules.module.Module, transformers4rec.torch.block.base.BuildableBlock]] = None, d_output: Optional[int] = None, masking: Optional[Union[str, transformers4rec.torch.masking.MaskSequence]] = None, **kwargs) → transformers4rec.torch.features.sequence.TabularSequenceFeatures [source]

Instantiates TabularSequenceFeatures from a DatasetSchema

Parameters

schema (DatasetSchema) – Dataset schema
continuous_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the continuous features, by default Tags.CONTINUOUS
categorical_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the categorical features, by default Tags.CATEGORICAL
aggregation (Optional[str], optional) – Feature aggregation option, by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None
continuous_projection (Optional[Union[List[int], int]], optional) – If set, concatenate all numerical features and project them by a number of MLP layers. The argument accepts a list with the dimensions of the MLP layers, by default None
continuous_soft_embeddings (bool) – Indicates if the soft one-hot encoding technique must be used to represent continuous features, by default False
projection (Optional[Union[torch.nn.Module, BuildableBlock]], optional) – If set, project the aggregated embedding vectors into the hidden-dimension vector space, by default None
d_output (Optional[int], optional) – If set, init an MLPBlock as the projection module to project the embedding vectors, by default None
masking (Optional[Union[str, MaskSequence]], optional) – If set, apply masking to the input embeddings and compute masked labels. It requires a categorical_module including an item_id column, by default None

Returns

Returns TabularSequenceFeatures from a dataset schema

Return type

TabularSequenceFeatures

property masking

set_masking(value)[source]

property item_id

property item_embedding_table

forward(inputs, training=False, testing=False, **kwargs)[source]

project_continuous_features(dimensions)[source]

forward_output_size(input_size)[source]

class transformers4rec.torch.SequenceEmbeddingFeatures(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], item_id: Optional[str] = None, padding_idx: int = 0, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source]

Bases: transformers4rec.torch.features.embedding.EmbeddingFeatures

Input block for embedding-lookups for categorical features. This module produces 3-D tensors, which is useful for sequential models like transformers.

Parameters

feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.
item_id (str, optional) – The name of the feature that’s used for the item_id.
padding_idx (int) – The symbol to use for padding.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
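The 3-D output and the role of padding_idx can be illustrated with nested lists. In this sketch padded positions map to zero vectors, which mirrors how an embedding layer with a padding index typically behaves. embed_sequences and the table values are hypothetical.

```python
from typing import List


def embed_sequences(
    batch: List[List[int]], embedding_table: List[List[float]], padding_idx: int = 0
) -> List[List[List[float]]]:
    """Look up one vector per position, giving (batch, seq_len, dim) nesting."""
    dim = len(embedding_table[0])
    return [
        [
            [0.0] * dim if item == padding_idx else list(embedding_table[item])
            for item in sequence
        ]
        for sequence in batch
    ]


table = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0]]  # row 0 belongs to the padding id
embedded = embed_sequences([[1, 2, 0], [2, 0, 0]], table)
shape = (len(embedded), len(embedded[0]), len(embedded[0][0]))
```

Here shape is (batch size, sequence length, embedding dimension), the layout a transformer expects for its input embeddings.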

table_to_embedding_module(table: transformers4rec.torch.features.embedding.TableConfig) → torch.nn.modules.sparse.Embedding [source]

forward_output_size(input_sizes)[source]

class transformers4rec.torch.FeatureConfig(table: transformers4rec.torch.features.embedding.TableConfig, max_sequence_length: int = 0, name: Optional[str] = None)[source]: Bases: object

class transformers4rec.torch.TableConfig(vocabulary_size: int, dim: int, initializer: Optional[Callable[[torch.Tensor], None]] = None, combiner: str = 'mean', name: Optional[str] = None)[source]: Bases: object

class transformers4rec.torch.TabularFeatures(continuous_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, categorical_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, text_embedding_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]

Bases: transformers4rec.torch.tabular.base.MergeTabular

Input module that combines different types of features: continuous, categorical & text.

Parameters

continuous_module (TabularModule, optional) – Module used to process continuous features.
categorical_module (TabularModule, optional) – Module used to process categorical features.
text_embedding_module (TabularModule, optional) – Module used to process text features.

pre: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional: Transformations to apply on the inputs when the module is called (so before forward).
post: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional: Transformations to apply on the inputs after the module is called (so after forward).
aggregation: Union[str, TabularAggregation], optional: Aggregation to apply after processing the forward-method to output a single Tensor.

CONTINUOUS_MODULE_CLASS: alias of transformers4rec.torch.features.continuous.ContinuousFeatures

EMBEDDING_MODULE_CLASS: alias of transformers4rec.torch.features.embedding.EmbeddingFeatures

SOFT_EMBEDDING_MODULE_CLASS: alias of transformers4rec.torch.features.embedding.SoftEmbeddingFeatures

project_continuous_features(mlp_layers_dims: Union[List[int], int]) → transformers4rec.torch.features.tabular.TabularFeatures [source]

Combine all concatenated continuous features with stacked MLP layers

Parameters: mlp_layers_dims (Union[List[int], int]) – The MLP layer dimensions
Returns: Returns the same TabularFeatures object with the continuous features projected
Return type: TabularFeatures

classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, continuous_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CONTINUOUS: 'continuous'>,), categorical_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CATEGORICAL: 'categorical'>,), aggregation: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, continuous_projection: Optional[Union[int, List[int]]] = None, continuous_soft_embeddings: bool = False, **kwargs) → transformers4rec.torch.features.tabular.TabularFeatures [source]

Instantiates TabularFeatures from a DatasetSchema

Parameters

schema (DatasetSchema) – Dataset schema
continuous_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the continuous features, by default Tags.CONTINUOUS
categorical_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the categorical features, by default Tags.CATEGORICAL
aggregation (Optional[str], optional) – Feature aggregation option, by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None
continuous_projection (Optional[Union[List[int], int]], optional) – If set, concatenate all numerical features and project them by a number of MLP layers. The argument accepts a list with the dimensions of the MLP layers, by default None
continuous_soft_embeddings (bool) – Indicates if the soft one-hot encoding technique must be used to represent continuous features, by default False

Returns

Returns TabularFeatures from a dataset schema

Return type

TabularFeatures

forward_output_size(input_size)[source]

property continuous_module

property categorical_module

Bases: torch.nn.modules.module.Module, transformers4rec.torch.utils.torch_utils.LossMixin, transformers4rec.torch.utils.torch_utils.MetricsMixin

Head of a Model, a head has a single body but could have multiple prediction-tasks.

Parameters

body (Block) – TODO
prediction_tasks (Union[List[PredictionTask], PredictionTask], optional) – TODO
task_blocks – TODO
task_weights (List[float], optional) – TODO
loss_reduction (str, default="mean") – TODO
inputs (TabularFeaturesType, optional) – TODO

build(inputs=None, device=None, task_blocks=None)[source]

Build each prediction task that’s part of the head.

Parameters

body –
inputs –
device –
task_blocks –

Instantiate a Head from a Schema through tagged targets.

Parameters

schema (DatasetSchema) – Schema to use for inferring all targets based on the tags.
body –
task_blocks –
task_weight_dict –
loss_reduction –
inputs –

Returns
Return type: Head

pop_labels(inputs: Dict[str, torch.Tensor]) → Dict[str, torch.Tensor][source]

Pop the labels from the different prediction_tasks from the inputs.

Parameters: inputs (TabularData) – Input dictionary containing all targets.

Returns
Return type: TabularData

forward(body_outputs: Union[torch.Tensor, Dict[str, torch.Tensor]], training: bool = False, testing: bool = False, targets: Optional[Union[torch.Tensor, Dict[str, torch.Tensor]]] = None, call_body: bool = False, top_k: Optional[int] = None, **kwargs) → Union[torch.Tensor, Dict[str, torch.Tensor]][source]

calculate_metrics(predictions: Union[torch.Tensor, Dict[str, torch.Tensor]], targets: Union[torch.Tensor, Dict[str, torch.Tensor]]) → Dict[str, Union[Dict[str, torch.Tensor], torch.Tensor]][source]

Calculate metrics of the task(s) set in the Head instance.

Parameters

predictions – The prediction tensors used to calculate the metrics. They can be either a torch.Tensor if a single task is used or a dictionary of torch.Tensor if multiple tasks are used. In the second case, the dictionary is indexed by the task names.
targets – The tensor or dictionary of targets to use for computing the metrics of one or multiple tasks.

compute_metrics(mode: Optional[str] = None) → Dict[str, Union[float, torch.Tensor]][source]

reset_metrics()[source]

property task_blocks

to_model(**kwargs) → transformers4rec.torch.model.base.Model[source]

Convert the head to a Model.

Return type: Model

training: bool

class transformers4rec.torch.Model(*head: transformers4rec.torch.model.base.Head, head_weights: Optional[List[float]] = None, head_reduction: str = 'mean', optimizer: Type[torch.optim.optimizer.Optimizer] = <class 'torch.optim.adam.Adam'>, name: Optional[str] = None, max_sequence_length: Optional[int] = None, top_k: Optional[int] = None)[source]

Bases: torch.nn.modules.module.Module, transformers4rec.torch.utils.torch_utils.LossMixin, transformers4rec.torch.utils.torch_utils.MetricsMixin

forward(inputs: Dict[str, torch.Tensor], targets=None, training=False, testing=False, **kwargs)[source]

calculate_metrics(predictions: Union[torch.Tensor, Dict[str, torch.Tensor]], targets: Union[torch.Tensor, Dict[str, torch.Tensor]]) → Dict[str, Union[Dict[str, torch.Tensor], torch.Tensor]][source]

Calculate metrics of the task(s) set in the Head instance.

Parameters

predictions – The prediction tensors returned by the model. They can be either a torch.Tensor if a single task is used or a dictionary of torch.Tensor if multiple heads/tasks are used. In the second case, the dictionary is indexed by the task names.
targets – The tensor or dictionary of targets returned by the model. They are used for computing the metrics of one or multiple tasks.

compute_metrics(mode=None) → Dict[str, Union[float, torch.Tensor]][source]

reset_metrics()[source]

to_lightning()[source]

fit(dataloader, optimizer=<class 'torch.optim.adam.Adam'>, eval_dataloader=None, num_epochs=1, amp=False, train=True, verbose=True, compute_metric=True)[source]

evaluate(dataloader, targets=None, training=False, testing=True, verbose=True, mode='eval')[source]

property input_schema

property output_schema

property prediction_tasks

save(path: Union[str, os.PathLike], model_name='t4rec_model_class')[source]

Saves the model to f"{path}/{model_name}.pkl" using cloudpickle.

Parameters

path (Union[str, os.PathLike]) – Path to the directory where the T4Rec model should be saved.
model_name – The name given to the pickle file storing the T4Rec model, by default 't4rec_model_class'.

classmethod load(path: Union[str, os.PathLike], model_name='t4rec_model_class') → transformers4rec.torch.model.base.Model[source]

Loads a T4Rec model that was saved with model.save().

Parameters

path (Union[str, os.PathLike]) – Path to the directory where the T4Rec model is saved.
model_name – The name given to the pickle file storing the T4Rec model, by default 't4rec_model_class'.
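As a rough illustration of the directory-plus-file-name convention described for save() and load(), the sketch below round-trips a plain Python object. It is not the library's implementation: the helper names save_object and load_object are hypothetical, the directory is created on demand as a convenience, and the standard-library pickle module stands in for cloudpickle.

```python
import os
import pickle
import tempfile


def save_object(obj, path, model_name="t4rec_model_class"):
    """Hypothetical helper: write obj to {path}/{model_name}.pkl and return the file path."""
    os.makedirs(path, exist_ok=True)
    file_path = os.path.join(path, f"{model_name}.pkl")
    with open(file_path, "wb") as out_file:
        pickle.dump(obj, out_file)  # pickle is a stand-in for cloudpickle here
    return file_path


def load_object(path, model_name="t4rec_model_class"):
    """Hypothetical helper: read back the object stored by save_object."""
    file_path = os.path.join(path, f"{model_name}.pkl")
    with open(file_path, "rb") as in_file:
        return pickle.load(in_file)


# Round trip inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_to = save_object({"d_model": 64}, tmp_dir)
    saved_name = os.path.basename(saved_to)
    restored = load_object(tmp_dir)
```

With the real classes, the equivalent calls would be model.save(path) followed by Model.load(path), using the same model_name on both sides.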

training: bool

class transformers4rec.torch.PredictionTask(loss: torch.nn.modules.module.Module, metrics: Optional[Iterable[torchmetrics.metric.Metric]] = None, target_name: Optional[str] = None, task_name: Optional[str] = None, forward_to_prediction_fn: Callable[[torch.Tensor], torch.Tensor] = <function PredictionTask.<lambda>>, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, pre: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, summary_type: str = 'last')[source]

Bases: torch.nn.modules.module.Module, transformers4rec.torch.utils.torch_utils.LossMixin, transformers4rec.torch.utils.torch_utils.MetricsMixin

Individual prediction-task of a model.

Parameters

loss (torch.nn.Module) – The loss to use during training of this task.
metrics (torch.nn.Module) – The metrics to calculate during training & evaluation.
target_name (str, optional) – Name of the target, this is needed when there are multiple targets.
task_name – Name of the prediction task, if not provided a name will be automatically constructed based on the target-name & class-name.
forward_to_prediction_fn (Callable[[torch.Tensor], torch.Tensor]) – Function to apply before the prediction
task_block (BlockType) – Module to transform input tensor before computing predictions.
pre (BlockType) – Module to compute the predictions probabilities.
summary_type (str) –
This is used to summarize a sequence into a single tensor. Accepted values are:
- "last" – Take the last token hidden state (like XLNet)
- "first" – Take the first token hidden state (like Bert)
- "mean" – Take the mean of all tokens hidden states
- "cls_index" – Supply a Tensor of classification token position (GPT/GPT-2)
- "attn" – Not implemented now, use multi-head attention
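To make the summary_type options more concrete, here is a minimal sketch of the "last", "first", and "mean" reductions for a single sequence. It uses nested Python lists in place of tensors, and summarize_sequence is a hypothetical name rather than a library function; "cls_index" and "attn" are left out.

```python
from typing import List


def summarize_sequence(hidden_states: List[List[float]], summary_type: str = "last") -> List[float]:
    """Reduce one sequence of token vectors (seq_len x hidden) to a single vector."""
    if not hidden_states:
        raise ValueError("hidden_states must contain at least one token")
    if summary_type == "last":
        return list(hidden_states[-1])
    if summary_type == "first":
        return list(hidden_states[0])
    if summary_type == "mean":
        seq_len = len(hidden_states)
        # zip(*...) walks the hidden dimension, so each column is averaged over tokens.
        return [sum(column) / seq_len for column in zip(*hidden_states)]
    raise ValueError(f"unsupported summary_type: {summary_type!r}")


# Three tokens, hidden size 2.
sequence = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
last_state = summarize_sequence(sequence, "last")
mean_state = summarize_sequence(sequence, "mean")
```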

build(body: Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock], input_size, inputs: Optional[transformers4rec.torch.features.base.InputBlock] = None, device=None, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, pre=None)[source]

The method will be called when block is converted to a model, i.e. when linked to prediction head.

Parameters

block – the model block to link with head
device – set the device for the metrics and layers of the task

forward(inputs: torch.Tensor, targets: Optional[torch.Tensor] = None, training: bool = False, testing: bool = False)[source]

property task_name

child_name(name)[source]

set_metrics(metrics)[source]

calculate_metrics(predictions: torch.Tensor, targets: torch.Tensor) → Dict[str, torch.Tensor][source]

compute_metrics(**kwargs)[source]

metric_name(metric: torchmetrics.metric.Metric) → str [source]

reset_metrics()[source]

to_head(body, inputs=None, **kwargs) → transformers4rec.torch.model.base.Head[source]

to_model(body, inputs=None, **kwargs) → transformers4rec.torch.model.base.Model[source]

training: bool

class transformers4rec.torch.AsTabular(output_name: str)[source]

Bases: transformers4rec.torch.tabular.base.TabularBlock

Converts a Tensor to TabularData by converting it to a dictionary.

Parameters: output_name (str) – Name that should be used as the key in the output dictionary.

forward(inputs: torch.Tensor, **kwargs) → Dict[str, torch.Tensor][source]

forward_output_size(input_size)[source]

class transformers4rec.torch.ConcatFeatures(*args, **kwargs)[source]

Bases: transformers4rec.torch.tabular.base.TabularAggregation

Aggregation by concatenating all values in TabularData; all non-sequential values will be converted to a sequence.

The output of this concatenation will have 3 dimensions.

forward(inputs: Dict[str, torch.Tensor]) → torch.Tensor [source]

forward_output_size(input_size)[source]

class transformers4rec.torch.FilterFeatures(to_include: List[str], pop: bool = False)[source]

Bases: transformers4rec.torch.tabular.base.TabularTransformation

Module that filters out certain features from TabularData.

Parameters

to_include (List[str]) – List of features to include in the result of calling the module
pop (bool) – Boolean indicating whether to pop the features to exclude from the inputs dictionary.
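The filtering behaviour can be sketched on a plain dictionary. This is only an illustration: filter_features is a hypothetical function, lists stand in for tensors, and the handling of pop (removing the returned features from the caller's dictionary) is an assumption about how the flag is meant to work.

```python
from typing import Dict, List


def filter_features(inputs: Dict[str, list], to_include: List[str], pop: bool = False) -> Dict[str, list]:
    """Keep only the features whose names appear in to_include."""
    outputs = {name: value for name, value in inputs.items() if name in to_include}
    if pop:
        # Assumption: popping removes the returned features from the caller's dictionary.
        for name in outputs:
            inputs.pop(name)
    return outputs


batch = {"item_id": [1, 2], "price": [0.5, 0.7], "category": [3, 4]}
kept = filter_features(batch, ["item_id", "price"])
```

Without pop, the input dictionary is left untouched, so the same batch can be passed to several filters.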

forward(inputs: Dict[str, torch.Tensor], **kwargs) → Dict[str, torch.Tensor][source]

Parameters

inputs (TabularData) – Input dictionary containing features to filter.

Returns

Filtered TabularData that only contains the feature-names in self.to_include.

forward_output_size(input_shape)[source]

Parameters: input_shape –

class transformers4rec.torch.ElementwiseSum[source]

Bases: transformers4rec.torch.tabular.aggregation.ElementwiseFeatureAggregation

Aggregation by first stacking all values in TabularData in the first dimension, and then summing the result.
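A small sketch of this aggregation, assuming every feature has the same (batch, dim) shape. Nested lists stand in for tensors and elementwise_sum is a hypothetical name, not the library's code.

```python
from typing import Dict, List

Matrix = List[List[float]]


def elementwise_sum(inputs: Dict[str, Matrix]) -> Matrix:
    """Sum equally shaped (batch x dim) features position by position."""
    features = list(inputs.values())
    if not features:
        raise ValueError("at least one feature is required")
    shapes = {tuple(len(row) for row in feature) for feature in features}
    if len(shapes) != 1:
        raise ValueError("all features must share the same shape")
    # Outer zip pairs up the rows of each feature; inner zip pairs up their values.
    return [[sum(values) for values in zip(*rows)] for rows in zip(*features)]


embeddings = {
    "item_id": [[1.0, 2.0], [3.0, 4.0]],
    "category": [[10.0, 20.0], [30.0, 40.0]],
}
summed = elementwise_sum(embeddings)
```

The shape check mirrors the practical constraint of this kind of aggregation: the features must already have matching dimensions before they can be summed.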

forward(inputs: Dict[str, torch.Tensor]) → torch.Tensor [source]

forward_output_size(input_size)[source]

class transformers4rec.torch.ElementwiseSumItemMulti(schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source]

Bases: transformers4rec.torch.tabular.aggregation.ElementwiseFeatureAggregation

Aggregation by applying the ElementwiseSum aggregation to all features except the item-id, and then multiplying this with the item-ids.

Parameters: schema (DatasetSchema) –

forward(inputs: Dict[str, torch.Tensor]) → torch.Tensor [source]

forward_output_size(input_size)[source]

REQUIRES_SCHEMA = True

class transformers4rec.torch.MergeTabular(*modules_to_merge: Union[transformers4rec.torch.tabular.base.TabularModule, Dict[str, transformers4rec.torch.tabular.base.TabularModule]], pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]

Bases: transformers4rec.torch.tabular.base.TabularBlock

Merge multiple TabularModule’s into a single output of TabularData.

Parameters

modules_to_merge (Union[TabularModule, Dict[str, TabularModule]]) – TabularModules to merge into, this can also be one or multiple dictionaries keyed by the name the module should have.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

property merge_values

forward(inputs: Dict[str, torch.Tensor], training=True, **kwargs) → Dict[str, torch.Tensor][source]

forward_output_size(input_size)[source]

build(input_size, **kwargs)[source]

class transformers4rec.torch.StackFeatures(axis: int = -1)[source]

Bases: transformers4rec.torch.tabular.base.TabularAggregation

Aggregation by stacking all values in input dictionary in the given dimension.

Parameters: axis (int, default=-1) – Axis to use for the stacking operation.

forward(inputs: Dict[str, torch.Tensor]) → torch.Tensor [source]

forward_output_size(input_size)[source]

class transformers4rec.torch.BinaryClassificationTask(target_name: Optional[str] = None, task_name: Optional[str] = None, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, loss=BCELoss(), metrics=(BinaryPrecision(), BinaryRecall(), BinaryAccuracy()), summary_type='first')[source]

Bases: transformers4rec.torch.model.base.PredictionTask

Returns a PredictionTask for binary classification.

Example usage:

# Define the input module to process the tabular input features.
input_module = tr.TabularSequenceFeatures.from_schema(
    schema,
    max_sequence_length=max_sequence_length,
    continuous_projection=d_model,
    aggregation="concat",
    masking=None,
)

# Define XLNetConfig class and set default parameters for HF XLNet config.
transformer_config = tr.XLNetConfig.build(
    d_model=d_model, n_head=4, n_layer=2, total_seq_length=max_sequence_length
)

# Define the model block including: inputs, masking, projection and transformer block.
body = tr.SequentialBlock(
    input_module,
    tr.MLPBlock([64]),
    tr.TransformerBlock(
        transformer_config,
        masking=input_module.masking
    )
)

# Define a head with BinaryClassificationTask.
head = tr.Head(
    body,
    tr.BinaryClassificationTask(
        "click",
        summary_type="mean",
        metrics=[
            tm.Precision(task='binary'),
            tm.Recall(task='binary'),
            tm.Accuracy(task='binary'),
            tm.F1Score(task='binary')
        ]
    ),
    inputs=input_module,
)

# Get the end-to-end Model class.
model = tr.Model(head)

Parameters

target_name (Optional[str] = None) – Specifies the variable name that represents the positive and negative values.
task_name (Optional[str] = None) – Specifies the name of the prediction task. If this parameter is not specified, a name is automatically constructed based on target_name and the Python class name of the model.
task_block (Optional[BlockType] = None) – Specifies a module to transform the input tensor before computing predictions.
loss (torch.nn.Module) – Specifies the loss function for the task. The default class is torch.nn.BCELoss.
metrics (Tuple[torch.nn.Module, ..]) – Specifies the metrics to calculate during training and evaluation. The default metrics are Precision, Recall, and Accuracy.
summary_type (str) –
Summarizes a sequence into a single tensor. Accepted values are:
- last – Take the last token hidden state (like XLNet)
- first – Take the first token hidden state (like Bert)
- mean – Take the mean of all tokens hidden states
- cls_index – Supply a Tensor of classification token position (GPT/GPT-2)
- attn – Not implemented now, use multi-head attention

DEFAULT_LOSS = BCELoss()

DEFAULT_METRICS = (BinaryPrecision(), BinaryRecall(), BinaryAccuracy())

training: bool

class transformers4rec.torch.RegressionTask(target_name: Optional[str] = None, task_name: Optional[str] = None, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, loss=MSELoss(), metrics=(MeanSquaredError()), summary_type='first')[source]

Bases: transformers4rec.torch.model.base.PredictionTask

DEFAULT_LOSS = MSELoss()

DEFAULT_METRICS = (MeanSquaredError(),)

training: bool

class transformers4rec.torch.NextItemPredictionTask(loss: torch.nn.modules.module.Module = CrossEntropyLoss(), metrics: Iterable[torchmetrics.metric.Metric] = (NDCGAt(), AvgPrecisionAt(), RecallAt()), task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, task_name: str = 'next-item', weight_tying: bool = False, softmax_temperature: float = 1, padding_idx: int = 0, target_dim: Optional[int] = None, sampled_softmax: Optional[bool] = False, max_n_samples: Optional[int] = 100)[source]

Bases: transformers4rec.torch.model.base.PredictionTask

This block performs the item prediction task for session-based and sequential models. It requires a body containing a masking schema to use for training and target generation. For the supported masking schemes, please refer to: https://nvidia-merlin.github.io/Transformers4Rec/main/model_definition.html#sequence-masking

Parameters

loss (torch.nn.Module) – Loss function to use. Defaults to CrossEntropyLoss.
metrics (Iterable[torchmetrics.Metric]) – List of ranking metrics to use for evaluation.
task_block – Module to transform input tensor before computing predictions.
task_name (str, optional) – Name of the prediction task, if not provided a name will be automatically constructed based on the target-name & class-name.
weight_tying (bool) – The item id embedding table weights are shared with the prediction network layer.
softmax_temperature (float) – Softmax temperature, used to reduce model overconfidence, so that softmax(logits / T). Value 1.0 reduces to regular softmax.
padding_idx (int) – pad token id.
target_dim (int) – vocabulary size of item ids
sampled_softmax (Optional[bool]) – Enables sampled softmax. By default False
max_n_samples (Optional[int]) – Number of samples for sampled softmax. By default 100
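The effect of softmax_temperature can be illustrated with a few lines of plain Python computing softmax(logits / T). The function name is hypothetical and the sketch is independent of the library's own implementation.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Compute softmax(logits / temperature) in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    max_scaled = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(value - max_scaled) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.0]
regular = softmax_with_temperature(logits, temperature=1.0)
softened = softmax_with_temperature(logits, temperature=2.0)
```

A temperature above 1 flattens the distribution, so the largest probability in softened is lower than in regular, which is the sense in which the parameter reduces overconfidence.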

DEFAULT_METRICS = (NDCGAt(), AvgPrecisionAt(), RecallAt())

build(body, input_size, device=None, inputs=None, task_block=None, pre=None)[source]: Build method, this is called by the Head.

forward(inputs: torch.Tensor, targets=None, training=False, testing=False, top_k=None, **kwargs)[source]

remove_pad_3d(inp_tensor, non_pad_mask)[source]

calculate_metrics(predictions, targets) → Dict[str, torch.Tensor][source]

compute_metrics()[source]

training: bool

class transformers4rec.torch.TabularModule(pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, **kwargs)[source]

Bases: torch.nn.modules.module.Module

PyTorch Module that’s specialized for tabular-data by integrating many often used operations.

Parameters

pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, tags=None, **kwargs) → Optional[transformers4rec.torch.tabular.base.TabularModule][source]

Instantiate a TabularModule instance from a DatasetSchema.

Parameters

schema –
tags –
kwargs –

Returns

Return type

Optional[TabularModule]

classmethod from_features(features: List[str], pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None) → transformers4rec.torch.tabular.base.TabularModule[source]

Initializes a TabularModule instance where the contents of features will be filtered out

Parameters

features (List[str]) – A list of feature-names that will be used as the first pre-processing op to filter out all other features not in this list.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

Returns

Return type

TabularModule

property pre: returns: :rtype: SequentialTabularTransformations, optional

property post: returns: :rtype: SequentialTabularTransformations, optional

property aggregation: returns: :rtype: TabularAggregation, optional

pre_forward(inputs: Dict[str, torch.Tensor], transformations: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None) → Dict[str, torch.Tensor][source]

Method that’s typically called before the forward method for pre-processing.

Parameters

inputs (TabularData) – input-data, typically the raw inputs passed to the module before the forward method.
transformations (TabularAggregationType, optional) –

Returns

Return type

TabularData

forward(x: Dict[str, torch.Tensor], *args, **kwargs) → Dict[str, torch.Tensor][source]

post_forward(inputs: Dict[str, torch.Tensor], transformations: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, merge_with: Optional[Union[transformers4rec.torch.tabular.base.TabularModule, List[transformers4rec.torch.tabular.base.TabularModule]]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None) → Union[torch.Tensor, Dict[str, torch.Tensor]][source]

Method that’s typically called after the forward method for post-processing.

Parameters

inputs (TabularData) – input-data, typically the output of the forward method.
transformations (TabularTransformationType, optional) – Transformations to apply on the input data.
merge_with (Union[TabularModule, List[TabularModule]], optional) – Other TabularModule’s to call and merge the outputs with.
aggregation (TabularAggregationType, optional) – Aggregation to aggregate the output to a single Tensor.

Returns

Return type

TensorOrTabularData (Tensor when aggregation is set, else TabularData)

merge(other)

training: bool

class transformers4rec.torch.SoftEmbedding(num_embeddings, embeddings_dim, emb_initializer=None)[source]

Bases: torch.nn.modules.module.Module

Soft one-hot encoding embedding technique, from https://arxiv.org/pdf/1708.00065.pdf. In a nutshell, it represents a continuous feature as a weighted average of embeddings.
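The "weighted average of embeddings" idea can be sketched for one scalar input. This assumes the weights come from a softmax over a linear projection of the scalar, which is one common reading of the technique and not something stated above; soft_embedding and its parameters are hypothetical names, and lists stand in for tensors.

```python
import math
from typing import List


def soft_embedding(
    value: float,
    proj_weight: List[float],
    proj_bias: List[float],
    embedding_table: List[List[float]],
) -> List[float]:
    """Represent a scalar as a softmax-weighted average of embedding rows."""
    # Assumption: one logit per embedding row, obtained from a linear projection.
    logits = [w * value + b for w, b in zip(proj_weight, proj_bias)]
    max_logit = max(logits)
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(embedding_table[0])
    return [
        sum(weight * row[d] for weight, row in zip(weights, embedding_table))
        for d in range(dim)
    ]


table = [[0.0, 2.0], [2.0, 4.0]]
# Zero projection gives uniform weights, so the result is the plain average of the rows.
uniform_mix = soft_embedding(0.7, [0.0, 0.0], [0.0, 0.0], table)
skewed_mix = soft_embedding(0.7, [5.0, -5.0], [0.0, 0.0], table)
```

Because the weights are non-negative and sum to one, each output coordinate stays between the smallest and largest value of that coordinate in the table.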

forward(input_numeric)[source]

training: bool

class transformers4rec.torch.Trainer(model: transformers4rec.torch.model.base.Model, args: transformers4rec.config.trainer.T4RecTrainingArguments, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, train_dataset_or_path=None, eval_dataset_or_path=None, test_dataset_or_path=None, train_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, eval_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, test_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, callbacks: Optional[List[transformers.trainer_callback.TrainerCallback]] = [], compute_metrics=None, incremental_logging: bool = False, **kwargs)[source]

Bases: transformers.trainer.Trainer

A Trainer specialized for sequential recommendation, including session-based and sequential recommendation.

Parameters

model (Model) – The Model defined using Transformers4Rec api.
args (T4RecTrainingArguments) – The training arguments needed to setup training and evaluation experiments.
schema (Optional[Dataset.schema], optional) – The schema object including features to use and their properties, by default None
train_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or DataSet to use for training, by default None
eval_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or DataSet to use for evaluation, by default None
train_dataloader (Optional[DataLoader], optional) – The data generator to use for training, by default None
eval_dataloader (Optional[DataLoader], optional) – The data generator to use for evaluation, by default None
compute_metrics (Optional[bool], optional) – Whether to compute metrics defined by Model class or not, by default None
incremental_logging (bool) – Whether to enable incremental logging or not. If True, it ensures that global steps are incremented over many trainer.train() calls, so that train and eval metrics steps do not overlap and can be seen properly in reports like W&B and Tensorboard

get_train_dataloader()[source]

Set the train dataloader to use by Trainer. It supports a user-defined data loader set as an attribute in the constructor. When the attribute is None, the data loader is defined using train_dataset and the data_loader_engine specified in the training arguments.

get_eval_dataloader(eval_dataset=None)[source]

Set the eval dataloader to use by Trainer. It supports a user-defined data loader set as an attribute in the constructor. When the attribute is None, the data loader is defined using eval_dataset and the data_loader_engine specified in the training arguments.

get_test_dataloader(test_dataset=None)[source]

Set the test dataloader to use by Trainer. It supports a user-defined data loader set as an attribute in the constructor. When the attribute is None, the data loader is defined using test_dataset and the data_loader_engine specified in the training arguments.

num_examples(dataloader: torch.utils.data.dataloader.DataLoader)[source]

Overrides the Trainer.num_examples() method because the data loaders for this project do not return the dataset size, but the number of steps. The dataset size is therefore estimated by multiplying the number of steps by the batch size.
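The estimate described for num_examples() amounts to the following arithmetic. StepCountingLoader is a hypothetical stand-in for a data loader whose length is a number of batches, and estimate_num_examples is likewise an illustrative name.

```python
import math


class StepCountingLoader:
    """Hypothetical loader whose len() is the number of batches, not the number of rows."""

    def __init__(self, num_rows: int, batch_size: int):
        self.batch_size = batch_size
        self._num_rows = num_rows

    def __len__(self) -> int:
        return math.ceil(self._num_rows / self.batch_size)


def estimate_num_examples(loader: StepCountingLoader) -> int:
    """Estimate the dataset size as number of steps times batch size."""
    return len(loader) * loader.batch_size


exact = estimate_num_examples(StepCountingLoader(num_rows=1024, batch_size=128))
rounded_up = estimate_num_examples(StepCountingLoader(num_rows=1000, batch_size=128))
```

When the last batch is only partially filled, as in the second call, the estimate is an upper bound on the true number of rows rather than an exact count.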

reset_lr_scheduler() → None [source]

Resets the LR scheduler of the previous Trainer.train() call, so that a new LR scheduler is created by the next Trainer.train() call. This is important for LR schedules like get_linear_schedule_with_warmup(), which decays the LR to 0 at the end of training.
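To see why a stale schedule is a problem, consider the learning-rate multiplier of a linear schedule with warmup. The function below is a plain-Python sketch of that shape under the usual definition (linear ramp up, then linear decay to zero); the name linear_warmup_multiplier is hypothetical and the code does not call the Hugging Face helper itself.

```python
def linear_warmup_multiplier(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """Multiplier applied to the base LR: ramps 0 -> 1 during warmup, then decays 1 -> 0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    decay_span = max(1, num_training_steps - num_warmup_steps)
    return max(0.0, remaining / decay_span)


# One training run of 100 steps with 10 warmup steps.
at_peak = linear_warmup_multiplier(10, 10, 100)
at_end = linear_warmup_multiplier(100, 10, 100)
# Continuing to step with the same schedule after the run has finished.
after_end = linear_warmup_multiplier(150, 10, 100)
```

Once the step counter passes num_training_steps, the multiplier stays at zero, so a second training run that reused the old schedule would make no progress. Creating a fresh scheduler for each train() call avoids this.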

create_scheduler(num_training_steps: int, optimizer: Optional[torch.optim.optimizer.Optimizer] = None)[source]

static get_scheduler(name: Union[str, transformers.trainer_utils.SchedulerType], optimizer: torch.optim.optimizer.Optimizer, num_warmup_steps: Optional[int] = None, num_training_steps: Optional[int] = None, num_cycles: Optional[int] = 0.5)[source]

Unified API to get any scheduler from its name.

Parameters

name (str or SchedulerType) – The name of the scheduler to use.
optimizer (torch.optim.Optimizer) – The optimizer that will be used during training.
num_warmup_steps (int, optional) – The number of warm-up steps to perform. This is not required by all schedulers (hence the argument being optional), the function will raise an error if it’s unset and the scheduler type requires it.
num_training_steps (int, optional) – The number of training steps to do. This is not required by all schedulers (hence the argument being optional), the function will raise an error if it’s unset and the scheduler type requires it.
num_cycles (int, optional) – The number of waves in the cosine schedule / hard restarts to use for cosine scheduler

compute_loss(model, inputs, return_outputs=False)[source]

Overrides Trainer.compute_loss() to allow passing the targets to the model’s forward method. This method defines how the loss is computed by Trainer. By default, all Transformers4Rec models return a dictionary of three elements: 'loss', 'predictions', and 'labels'.

prediction_step(model: torch.nn.modules.module.Module, inputs: Dict[str, torch.Tensor], prediction_loss_only: bool, ignore_keys: Optional[List[str]] = None, training: bool = False, testing: bool = True) → Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor], Optional[Dict[str, Any]]][source]

Overrides Trainer.prediction_step() to provide more flexibility to unpack results from the model, such as returning labels that are not exactly one input feature of the model.

evaluation_loop(dataloader: torch.utils.data.dataloader.DataLoader, description: str, prediction_loss_only: Optional[bool] = None, ignore_keys: Optional[List[str]] = None, metric_key_prefix: Optional[str] = 'eval') → transformers.trainer_utils.EvalLoopOutput[source]

Overriding Trainer.prediction_loop() (shared by Trainer.evaluate() and Trainer.predict()) to provide more flexibility to work with streaming metrics (computed at each eval batch) and to log with the outputs of the model (e.g. prediction scores, prediction metadata, attention weights)

Parameters

dataloader (DataLoader) – DataLoader object to use to iterate over evaluation data
description (str) – Parameter to describe the evaluation experiment. e.g: Prediction, test
prediction_loss_only (Optional[bool]) – Whether or not to return the loss only. by default None
ignore_keys (Optional[List[str]]) – Columns not accepted by the model.forward() method are automatically removed. by default None
metric_key_prefix (Optional[str]) – Prefix to use when logging evaluation metrics. by default eval

load_model_trainer_states_from_checkpoint(checkpoint_path, model=None)[source]

This method loads the checkpoint states of the model, trainer and random states. If model is None, the serialized model class is loaded from the checkpoint. It does not load the optimizer and LR scheduler states (for that, call trainer.train() with the resume_from_checkpoint argument for a complete load).

Parameters

checkpoint_path (str) – Path to the checkpoint directory.
model (Optional[Model]) – Model class used by Trainer. by default None

property log_predictions_callback

log(logs: Dict[str, float]) → None [source]

transformers4rec.torch.LabelSmoothCrossEntropyLoss(smoothing: float = 0.0, reduction: str = 'mean', **kwargs)[source]

Cross-entropy loss with label smoothing. This is going to be deprecated. You should use torch.nn.CrossEntropyLoss() directly, which in recent PyTorch versions already supports the label_smoothing argument.
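For intuition, label smoothing on a single example can be written out in plain Python. The sketch assumes the common formulation in which the one-hot target is mixed with a uniform distribution over the classes; the function name is hypothetical and the code is not taken from this library or from PyTorch.

```python
import math
from typing import List


def label_smoothing_cross_entropy(logits: List[float], target: int, smoothing: float = 0.0) -> float:
    """Cross-entropy against a target mixed with a uniform distribution."""
    if not 0.0 <= smoothing <= 1.0:
        raise ValueError("smoothing must be between 0 and 1")
    max_logit = max(logits)
    log_norm = max_logit + math.log(sum(math.exp(v - max_logit) for v in logits))
    log_probs = [v - log_norm for v in logits]
    nll = -log_probs[target]                      # loss against the true class
    uniform = -sum(log_probs) / len(log_probs)    # loss against the uniform distribution
    return (1.0 - smoothing) * nll + smoothing * uniform


plain = label_smoothing_cross_entropy([2.0, 0.0], target=0, smoothing=0.0)
smoothed = label_smoothing_cross_entropy([2.0, 0.0], target=0, smoothing=0.1)
```

With smoothing set to 0 the result is the ordinary cross-entropy; a positive value penalizes confident predictions, so smoothed is larger than plain for this example.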

Parameters

smoothing (float) – The label smoothing factor. Specify a value between 0 and 1.
reduction (str) – Specifies the reduction to apply to the output. Specify one of none, sum, or mean.
from https (Adapted) –