transformers4rec.torch package

Submodules

transformers4rec.torch.masking module

class transformers4rec.torch.masking.MaskingInfo(schema: torch.Tensor, targets: torch.Tensor)[source]

Bases: object

schema: torch.Tensor
targets: torch.Tensor
class transformers4rec.torch.masking.MaskSequence(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, **kwargs)[source]

Bases: transformers4rec.torch.utils.torch_utils.OutputSizeMixin, torch.nn.modules.module.Module

Base class to prepare masked items inputs/labels for language modeling tasks.

Transformer architectures can be trained in different ways. Depending on the training method, there is a specific masking schema. The masking schema sets the items to be predicted (labels) and masks (hides) their positions in the sequence so that they are not used by the Transformer layers for prediction.

We currently provide 4 different masking schemes out of the box:
  • Causal LM (clm)

  • Masked LM (mlm)

  • Permutation LM (plm)

  • Replacement Token Detection (rtd)

This class can be extended to add a different masking scheme.

Parameters
  • hidden_size – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.

  • padding_idx (int, default = 0) – Index of the padding token, used to pad the sequences of a batch to the same length

compute_masked_targets(item_ids: torch.Tensor, training: bool = False, testing: bool = False) → transformers4rec.torch.masking.MaskingInfo[source]

Method to prepare masked labels based on the sequence of item ids. It returns the true labels of masked positions and the related boolean mask. The class attributes mask_schema and masked_targets are also updated so that they can be re-used in other modules.

Parameters
  • item_ids (torch.Tensor) – The sequence of input item ids used for deriving labels of next item prediction task.

  • training (bool) – Flag to indicate whether we are in Training mode or not. During training, the labels can be any items within the sequence based on the selected masking task.

  • testing (bool) – Flag to indicate whether we are in Evaluation (=True) or Inference (=False) mode. During evaluation, we predict either all next items or only the last item in the sequence, depending on the parameter eval_on_last_item_seq_only. During inference, we don’t mask the input sequence and use all available information to predict the next item.

Return type

Tuple[MaskingSchema, MaskedTargets]
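
As a rough illustration of the returned pair, the sketch below builds a boolean mask and the matching targets for the case where only the last item of each sequence is evaluated. It uses plain Python lists rather than torch.Tensor, and the helper name last_item_targets is hypothetical, not part of the library.

```python
from typing import List, Tuple


def last_item_targets(
    item_ids: List[List[int]], padding_idx: int = 0
) -> Tuple[List[List[bool]], List[List[int]]]:
    """Hypothetical helper: mark the last non-padded item of each row as the label."""
    mask_schema, targets = [], []
    for seq in item_ids:
        # Index of the last non-padding item, or -1 if the row is all padding.
        last = max((i for i, x in enumerate(seq) if x != padding_idx), default=-1)
        mask_schema.append([i == last for i in range(len(seq))])
        # Non-target positions are filled with the padding index.
        targets.append([x if i == last else padding_idx for i, x in enumerate(seq)])
    return mask_schema, targets


schema, targets = last_item_targets([[5, 8, 3, 0], [7, 2, 0, 0]])
```

Here schema marks one position per row and targets keeps the item id only at that position.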

apply_mask_to_inputs(inputs: torch.Tensor, schema: torch.Tensor, training: bool = False, testing: bool = False) → torch.Tensor[source]

Control the masked positions in the inputs by replacing the true interaction with a learnable masked embedding.

Parameters
  • inputs (torch.Tensor) – The 3-D tensor of interaction embeddings resulting from the ops: TabularFeatures + aggregation + projection(optional)

  • schema (MaskingSchema) – The boolean mask indicating masked positions.
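
The replacement step can be pictured with the sketch below, which swaps the embedding at every masked position for one shared vector. It is a framework-free illustration on nested lists; the function name apply_mask and the fixed masked_embedding values are assumptions (in the library the vector is a trainable parameter).

```python
from typing import List


def apply_mask(
    inputs: List[List[List[float]]],
    schema: List[List[bool]],
    masked_embedding: List[float],
) -> List[List[List[float]]]:
    """Hypothetical helper: replace masked positions with a shared embedding."""
    return [
        [list(masked_embedding) if is_masked else list(vec)
         for vec, is_masked in zip(seq, seq_mask)]
        for seq, seq_mask in zip(inputs, schema)
    ]


# One sequence of two interactions, each embedded in 2 dimensions.
inputs = [[[1.0, 2.0], [3.0, 4.0]]]
schema = [[False, True]]
masked = apply_mask(inputs, schema, masked_embedding=[0.5, -0.5])
```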

predict_all(item_ids: torch.Tensor) → transformers4rec.torch.masking.MaskingInfo[source]

Prepare labels for all next item predictions instead of last-item predictions in a user’s sequence.

Parameters

item_ids (torch.Tensor) – The sequence of input item ids used for deriving labels of next item prediction task.

Returns

Return type

Tuple[MaskingSchema, MaskedTargets]
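
A minimal sketch of "all next item" labels, assuming the usual shift-by-one construction: the label at position t is the item at position t+1, and positions whose label is padding are excluded. The helper name next_item_targets is hypothetical and the code uses plain lists instead of tensors.

```python
from typing import List, Tuple


def next_item_targets(
    item_ids: List[List[int]], padding_idx: int = 0
) -> Tuple[List[List[bool]], List[List[int]]]:
    """Hypothetical helper: shift each sequence left by one to get next-item labels."""
    # Drop the first item and pad on the right so the length is unchanged.
    targets = [seq[1:] + [padding_idx] for seq in item_ids]
    # Only positions with a real (non-padding) label take part in the loss.
    schema = [[label != padding_idx for label in row] for row in targets]
    return schema, targets


schema, targets = next_item_targets([[5, 8, 3, 0]])
```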

forward(inputs: torch.Tensor, item_ids: torch.Tensor, training: bool = False, testing: bool = False) → torch.Tensor[source]
forward_output_size(input_size)[source]
transformer_required_arguments() → Dict[str, Any][source]
transformer_optional_arguments() → Dict[str, Any][source]
property transformer_arguments

Prepare additional arguments to pass to the Transformer forward methods.

class transformers4rec.torch.masking.CausalLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, train_on_last_item_seq_only: bool = False, **kwargs)[source]

Bases: transformers4rec.torch.masking.MaskSequence

In Causal Language Modeling (clm) you predict the next item based on past positions of the sequence. Future positions are masked.

Parameters
  • hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.

  • padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length

  • eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation

  • train_on_last_item_seq_only (bool, default = False) – Predict only last item during training
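
One common way to express "future positions are masked" is a lower-triangular visibility matrix, where position i may only attend to positions j <= i. The sketch below shows that idea only; whether the library builds such a matrix itself or relies on the Transformer implementation is not stated on this page, and causal_visibility is a hypothetical name.

```python
from typing import List


def causal_visibility(seq_len: int) -> List[List[bool]]:
    """Hypothetical helper: row i lists which positions are visible from position i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


visibility = causal_visibility(3)
```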

apply_mask_to_inputs(inputs: torch.Tensor, mask_schema: torch.Tensor, training: bool = False, testing: bool = False) → torch.Tensor[source]
class transformers4rec.torch.masking.MaskedLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, mlm_probability: float = 0.15, **kwargs)[source]

Bases: transformers4rec.torch.masking.MaskSequence

In Masked Language Modeling (mlm) you randomly select some positions of the sequence to be predicted, which are masked. During training, the Transformer layer is allowed to use positions on the right (future info). During inference, all past items are visible for the Transformer layer, which tries to predict the next item.

Parameters
  • hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.

  • padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length

  • eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation

  • mlm_probability (Optional[float], default = 0.15) – Probability of an item to be selected (masked) as a label of the given sequence. Note: at least one item is always masked in each sequence, so that the network can learn from it.
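
The selection rule can be sketched as follows: each non-padded position is masked independently with probability mlm_probability, and one position is forced if nothing was selected. This is a plain-Python illustration with a hypothetical helper name (mlm_mask) and a seed argument added for reproducibility; it does not reproduce any other constraint the library may apply.

```python
import random
from typing import List


def mlm_mask(
    item_ids: List[List[int]],
    mlm_probability: float = 0.15,
    padding_idx: int = 0,
    seed: int = 0,
) -> List[List[bool]]:
    """Hypothetical helper: random masking with at least one masked item per row."""
    rng = random.Random(seed)
    schema = []
    for seq in item_ids:
        valid = [i for i, x in enumerate(seq) if x != padding_idx]
        # Padding positions are never selected.
        row = [x != padding_idx and rng.random() < mlm_probability for x in seq]
        # Enforce at least one masked item when the row has any real item.
        if valid and not any(row):
            row[rng.choice(valid)] = True
        schema.append(row)
    return schema


schema = mlm_mask([[5, 8, 3, 0], [7, 0, 0, 0]], mlm_probability=0.15, seed=1)
```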

apply_mask_to_inputs(inputs: torch.Tensor, mask_schema: torch.Tensor, training=False, testing=False) → torch.Tensor[source]

Control the masked positions in the inputs by replacing the true interaction with a learnable masked embedding.

Parameters
  • inputs (torch.Tensor) – The 3-D tensor of interaction embeddings resulting from the ops: TabularFeatures + aggregation + projection(optional)

  • mask_schema (MaskingSchema) – The boolean mask indicating masked positions.

  • training (bool) – Flag to indicate whether we are in Training mode or not. During training, the labels can be any items within the sequence based on the selected masking task.

  • testing (bool) – Flag to indicate whether we are in Evaluation (=True) or Inference (=False) mode. During evaluation, we predict either all next items or only the last item in the sequence, depending on the parameter eval_on_last_item_seq_only. During inference, we don’t mask the input sequence and use all available information to predict the next item.

class transformers4rec.torch.masking.PermutationLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, plm_probability: float = 0.16666666666666666, max_span_length: int = 5, permute_all: bool = False, **kwargs)[source]

Bases: transformers4rec.torch.masking.MaskSequence

In Permutation Language Modeling (plm) you use a permutation factorization at the level of the self-attention layer to define the accessible bidirectional context.

Parameters
  • hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.

  • padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length

  • eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation

  • max_span_length (int) – maximum length of a span of masked items

  • plm_probability (float) – The ratio of surrounding items to unmask to define the context of the span-based prediction segment of items

  • permute_all (bool) – Compute partial span-based prediction (=False) or not.

compute_masked_targets(item_ids: torch.Tensor, training=False, **kwargs) → transformers4rec.torch.masking.MaskingInfo[source]
transformer_required_arguments() → Dict[str, Any][source]
class transformers4rec.torch.masking.ReplacementLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, sample_from_batch: bool = False, **kwargs)[source]

Bases: transformers4rec.torch.masking.MaskedLanguageModeling

In Replacement Language Modeling (rtd), you use MLM to randomly select some items, but replace them with random tokens. Then, a discriminator model (which may or may not share its weights with the generator) is asked to classify whether the item at each position belongs to the original sequence or not. The generator-discriminator architecture was jointly trained using Masked LM and RTD tasks.

Parameters
  • hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.

  • padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length

  • eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation

  • sample_from_batch (bool) – Whether to sample replacement item ids from the same batch or not

get_fake_tokens(itemid_seq, target_flat, logits)[source]

The second task of RTD is a binary classification used to train the discriminator. The task consists of generating fake data by replacing [MASK] positions with random items; the ELECTRA discriminator then learns to detect the fake replacements.

Parameters
  • itemid_seq (torch.Tensor of shape (bs, max_seq_len)) – input sequence of item ids

  • target_flat (torch.Tensor of shape (bs*max_seq_len)) – flattened masked label sequences

  • logits (torch.Tensor of shape (#pos_item, vocab_size) or (#pos_item, #pos_item)) – MLM probabilities of positive items computed by the generator model. The logits are over the whole corpus if sample_from_batch = False, and over the positive (masked) items of the current batch otherwise

Returns

  • corrupted_inputs (torch.Tensor of shape (bs, max_seq_len)) – input sequence of item ids with fake replacement

  • discriminator_labels (torch.Tensor of shape (bs, max_seq_len)) – binary labels to distinguish between original and replaced items

  • batch_updates (torch.Tensor of shape (#pos_item)) – the indices of replacement item within the current batch if sample_from_batch is enabled
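
A minimal sketch of the corruption step, assuming the replacement ids have already been sampled: masked positions receive a replacement, and the discriminator label is 1 only where the resulting item differs from the original (as in ELECTRA, a replacement that happens to equal the original counts as real). The function name corrupt is hypothetical, and plain lists stand in for tensors.

```python
from typing import List, Tuple


def corrupt(
    itemid_seq: List[List[int]],
    mask_schema: List[List[bool]],
    replacements: List[int],
) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: build corrupted inputs and binary discriminator labels."""
    sampled = iter(replacements)  # one sampled id per masked position, in order
    corrupted_inputs, discriminator_labels = [], []
    for seq, seq_mask in zip(itemid_seq, mask_schema):
        new_seq = [next(sampled) if m else x for x, m in zip(seq, seq_mask)]
        corrupted_inputs.append(new_seq)
        # 1 = replaced by a different item, 0 = original item kept.
        discriminator_labels.append([int(a != b) for a, b in zip(new_seq, seq)])
    return corrupted_inputs, discriminator_labels


corrupted, labels = corrupt([[5, 8, 3]], [[False, True, True]], replacements=[8, 9])
```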

sample_from_softmax(logits: torch.Tensor) → torch.Tensor[source]

Sampling method for replacement token modeling (ELECTRA)

Parameters

logits (torch.Tensor(pos_item, vocab_size)) – scores of probability of masked positions returned by the generator model

Returns

samples – ids of replacement items.

Return type

torch.Tensor(#pos_item)
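
One standard way to draw a sample from softmax(logits) is the Gumbel-max trick: add Gumbel noise to each logit and take the argmax. The sketch below illustrates that technique in plain Python; it is an assumption that this matches the library's sampling, and sample_from_logits and its seed argument are hypothetical.

```python
import math
import random
from typing import List


def sample_from_logits(logits: List[List[float]], seed: int = 0) -> List[int]:
    """Hypothetical helper: one Gumbel-max sample per row of logits."""
    rng = random.Random(seed)
    samples = []
    for row in logits:
        noisy = []
        for logit in row:
            # Keep u strictly inside (0, 1) so both logarithms are defined.
            u = rng.uniform(1e-9, 1.0 - 1e-9)
            noisy.append(logit - math.log(-math.log(u)))
        # argmax over the noisy logits is a draw from softmax(row).
        samples.append(max(range(len(row)), key=noisy.__getitem__))
    return samples


samples = sample_from_logits([[0.0, 0.0, 100.0], [50.0, 0.0, 0.0]])
```

With logits this far apart the noise cannot change the winner, so each row returns its dominant index; with closer logits the result varies with the seed.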

transformers4rec.torch.ranking_metric module

class transformers4rec.torch.ranking_metric.RankingMetric(top_ks=None, labels_onehot=False)[source]

Bases: torchmetrics.metric.Metric

Metric wrapper for computing ranking metrics@K for session-based task.

Parameters
  • top_ks (list, default [2, 5]) – list of cutoffs

  • labels_onehot (bool) – Whether to transform the labels to a one-hot representation
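
To make the cutoff idea concrete, here is a small sketch of precision@K and recall@K for a single example, using the textbook definitions. It works on plain lists and does not use torchmetrics; the function names are hypothetical and the library's exact tie-breaking and batching are not reproduced.

```python
from typing import List, Set


def top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scored items, best first."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


def precision_at(scores: List[float], relevant: Set[int], k: int) -> float:
    hits = sum(1 for i in top_k(scores, k) if i in relevant)
    return hits / k


def recall_at(scores: List[float], relevant: Set[int], k: int) -> float:
    if not relevant:
        return 0.0
    hits = sum(1 for i in top_k(scores, k) if i in relevant)
    return hits / len(relevant)


scores = [0.1, 0.7, 0.2, 0.9]  # predicted score per item id
relevant = {1}                 # the true next item
results = {k: (precision_at(scores, relevant, k), recall_at(scores, relevant, k))
           for k in (1, 2)}
```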

update(preds: torch.Tensor, target: torch.Tensor, **kwargs)[source]
compute()[source]
class transformers4rec.torch.ranking_metric.PrecisionAt(top_ks=None, labels_onehot=False)[source]

Bases: transformers4rec.torch.ranking_metric.RankingMetric

class transformers4rec.torch.ranking_metric.RecallAt(top_ks=None, labels_onehot=False)[source]

Bases: transformers4rec.torch.ranking_metric.RankingMetric

class transformers4rec.torch.ranking_metric.AvgPrecisionAt(top_ks=None, labels_onehot=False)[source]

Bases: transformers4rec.torch.ranking_metric.RankingMetric

class transformers4rec.torch.ranking_metric.DCGAt(top_ks=None, labels_onehot=False)[source]

Bases: transformers4rec.torch.ranking_metric.RankingMetric

class transformers4rec.torch.ranking_metric.NDCGAt(top_ks=None, labels_onehot=False)[source]

Bases: transformers4rec.torch.ranking_metric.RankingMetric

class transformers4rec.torch.ranking_metric.MeanReciprocalRankAt(top_ks=None, labels_onehot=False)[source]

Bases: transformers4rec.torch.ranking_metric.RankingMetric
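
The rank-aware metrics above can be sketched with their textbook definitions for binary relevance: DCG discounts each hit by log2(rank + 1), NDCG divides by the best achievable DCG, and MRR is the reciprocal rank of the first hit. The function names are hypothetical and the library's own implementation may differ in details.

```python
import math
from typing import List, Set


def top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scored items, best first."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


def dcg_at(scores: List[float], relevant: Set[int], k: int) -> float:
    # pos is 0-based, so the discount for 1-based rank r is log2(r + 1) = log2(pos + 2).
    return sum(1.0 / math.log2(pos + 2)
               for pos, i in enumerate(top_k(scores, k)) if i in relevant)


def ndcg_at(scores: List[float], relevant: Set[int], k: int) -> float:
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(k, len(relevant))))
    return dcg_at(scores, relevant, k) / ideal if ideal else 0.0


def mrr_at(scores: List[float], relevant: Set[int], k: int) -> float:
    for pos, i in enumerate(top_k(scores, k)):
        if i in relevant:
            return 1.0 / (pos + 1)
    return 0.0


scores = [0.1, 0.7, 0.2, 0.9]
relevant = {1}  # the true item is ranked second
ndcg2 = ndcg_at(scores, relevant, 2)
mrr2 = mrr_at(scores, relevant, 2)
```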

transformers4rec.torch.trainer module

class transformers4rec.torch.trainer.Trainer(model: transformers4rec.torch.model.base.Model, args: transformers4rec.config.trainer.T4RecTrainingArguments, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, train_dataset_or_path=None, eval_dataset_or_path=None, test_dataset_or_path=None, train_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, eval_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, test_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, callbacks: Optional[List[transformers.trainer_callback.TrainerCallback]] = [], compute_metrics=None, incremental_logging: bool = False, **kwargs)[source]

Bases: transformers.trainer.Trainer

A Trainer specialized for sequential recommendation, including session-based and sequential recommendation.

Parameters
  • model (Model) – The Model defined using Transformers4Rec api.

  • args (T4RecTrainingArguments) – The training arguments needed to set up training and evaluation experiments.

  • schema (Optional[Dataset.schema], optional) – The schema object including features to use and their properties. By default None.

  • train_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or Dataset to use for training. By default None.

  • eval_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or Dataset to use for evaluation. By default None.

  • train_dataloader (Optional[DataLoader], optional) – The data generator to use for training. By default None.

  • eval_dataloader (Optional[DataLoader], optional) – The data generator to use for evaluation. By default None.

  • compute_metrics (Optional[bool], optional) – Whether to compute metrics defined by Model class or not. By default None.

  • incremental_logging (bool) – Whether to enable incremental logging or not. If True, it ensures that global steps are incremented over many trainer.train() calls, so that train and eval metrics steps do not overlap and can be seen properly in reports like W&B and TensorBoard.

get_train_dataloader()[source]

Set the train dataloader to use by Trainer. It supports a user-defined data loader set as an attribute in the constructor. When the attribute is None, the data loader is defined using train_dataset and the data_loader_engine specified in Training Arguments.

get_eval_dataloader(eval_dataset=None)[source]

Set the eval dataloader to use by Trainer. It supports a user-defined data loader set as an attribute in the constructor. When the attribute is None, the data loader is defined using eval_dataset and the data_loader_engine specified in Training Arguments.

get_test_dataloader(test_dataset=None)[source]

Set the test dataloader to use by Trainer. It supports a user-defined data loader set as an attribute in the constructor. When the attribute is None, the data loader is defined using test_dataset and the data_loader_engine specified in Training Arguments.

num_examples(dataloader: torch.utils.data.dataloader.DataLoader)[source]

Overrides the Trainer.num_examples() method because the data loaders for this project do not return the dataset size, but the number of steps. The dataset size is therefore estimated here by multiplying the number of steps by the batch size.
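
The estimate is simple arithmetic, sketched below with hypothetical helper names. It assumes the loader yields a final partial batch rather than dropping it, in which case the estimate rounds the true size up to a multiple of the batch size.

```python
import math


def num_steps(dataset_size: int, batch_size: int) -> int:
    """Steps a loader would yield if it keeps the final partial batch (assumption)."""
    return math.ceil(dataset_size / batch_size)


def estimate_num_examples(steps: int, batch_size: int) -> int:
    """Estimate of the dataset size from the number of steps."""
    return steps * batch_size


steps = num_steps(1000, 64)
estimate = estimate_num_examples(steps, 64)
```

For 1000 examples and a batch size of 64, the loader reports 16 steps and the estimate is 1024.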

reset_lr_scheduler() → None[source]

Resets the LR scheduler of the previous Trainer.train() call, so that a new LR scheduler is created by the next Trainer.train() call. This is important for LR schedules like get_linear_schedule_with_warmup(), which decays the LR to 0 at the end of training.
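
To see why a reset is needed, the sketch below reproduces the shape of a linear schedule with warmup as a multiplier on the base learning rate. The function name linear_warmup_factor is hypothetical; the formula follows the usual linear warmup then linear decay definition.

```python
def linear_warmup_factor(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """LR multiplier: linear ramp from 0 to 1, then linear decay to 0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


# Schedule planned for 10 steps with 2 warmup steps, evaluated past its end.
factors = [linear_warmup_factor(s, 2, 10) for s in range(13)]
```

Once the planned steps are exhausted the multiplier stays at 0, so a second train() call that reused the same scheduler would train with a zero learning rate.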

create_scheduler(num_training_steps: int, optimizer: Optional[torch.optim.optimizer.Optimizer] = None)[source]
static get_scheduler(name: Union[str, transformers.trainer_utils.SchedulerType], optimizer: torch.optim.optimizer.Optimizer, num_warmup_steps: Optional[int] = None, num_training_steps: Optional[int] = None, num_cycles: Optional[int] = 0.5)[source]

Unified API to get any scheduler from its name.

Parameters
  • name (str or SchedulerType) – The name of the scheduler to use.

  • optimizer (torch.optim.Optimizer) – The optimizer that will be used during training.

  • num_warmup_steps (int, optional) – The number of warm-up steps to perform. This is not required by all schedulers (hence the argument being optional); the function will raise an error if it’s unset and the scheduler type requires it.

  • num_training_steps (int, optional) – The number of training steps to do. This is not required by all schedulers (hence the argument being optional); the function will raise an error if it’s unset and the scheduler type requires it.

  • num_cycles (int, optional) – The number of waves in the cosine schedule / hard restarts to use for cosine scheduler
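
The effect of num_cycles can be sketched with the usual cosine-with-warmup multiplier, where the default of 0.5 gives half a cosine wave, that is, a smooth decay from 1 to 0. The function name cosine_warmup_factor is hypothetical and the sketch is independent of any optimizer.

```python
import math


def cosine_warmup_factor(
    step: int,
    num_warmup_steps: int,
    num_training_steps: int,
    num_cycles: float = 0.5,
) -> float:
    """LR multiplier: linear warmup, then num_cycles cosine waves over the rest."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


start = cosine_warmup_factor(0, 0, 10)
middle = cosine_warmup_factor(5, 0, 10)
end = cosine_warmup_factor(10, 0, 10)
```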

compute_loss(model, inputs, return_outputs=False)[source]

Overrides Trainer.compute_loss() to allow passing the targets to the model’s forward method. This defines how the loss is computed by Trainer. By default, all Transformers4Rec models return a dictionary of three elements: ‘loss’, ‘predictions’, and ‘labels’.

prediction_step(model: torch.nn.modules.module.Module, inputs: Dict[str, torch.Tensor], prediction_loss_only: bool, ignore_keys: Optional[List[str]] = None, training: bool = False, testing: bool = True) → Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor], Optional[Dict[str, Any]]][source]

Overrides Trainer.prediction_step() to provide more flexibility to unpack results from the model, such as returning labels that are not exactly one of the model’s input features.

evaluation_loop(dataloader: torch.utils.data.dataloader.DataLoader, description: str, prediction_loss_only: Optional[bool] = None, ignore_keys: Optional[List[str]] = None, metric_key_prefix: Optional[str] = 'eval') → transformers.trainer_utils.EvalLoopOutput[source]

Overrides Trainer.prediction_loop() (shared by Trainer.evaluate() and Trainer.predict()) to provide more flexibility to work with streaming metrics (computed at each eval batch) and to log the outputs of the model (e.g. prediction scores, prediction metadata, attention weights).

Parameters
  • dataloader (DataLoader) – DataLoader object to use to iterate over evaluation data

  • description (str) – Parameter to describe the evaluation experiment, e.g. Prediction, test

  • prediction_loss_only (Optional[bool]) – Whether or not to return the loss only. By default None.

  • ignore_keys (Optional[List[str]]) – Columns not accepted by the model.forward() method are automatically removed. By default None.

  • metric_key_prefix (Optional[str]) – Prefix to use when logging evaluation metrics. By default eval.
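
"Streaming" here means the metric state is updated once per evaluation batch and only reduced at the end, following the update()/compute() pattern. The sketch below shows that pattern for a simple mean; the class name StreamingMean is hypothetical and it is not the library's metric implementation.

```python
from typing import Iterable


class StreamingMean:
    """Hypothetical metric: accumulates per-example values across batches."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, batch_values: Iterable[float]) -> None:
        # Called once per evaluation batch; only running sums are kept.
        for value in batch_values:
            self.total += value
            self.count += 1

    def compute(self) -> float:
        # Called once at the end of the evaluation loop.
        return self.total / self.count if self.count else 0.0


metric = StreamingMean()
for batch in ([1.0, 0.0], [1.0]):  # e.g. per-example recall values of two batches
    metric.update(batch)
mean_value = metric.compute()
```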

load_model_trainer_states_from_checkpoint(checkpoint_path, model=None)[source]

This method loads the checkpoint states of the model, the trainer, and the random states. If model is None, the serialized model class is loaded from the checkpoint. It does not load the optimizer and LR scheduler states (for that, call trainer.train() with the resume_from_checkpoint argument for a complete load).

Parameters
  • checkpoint_path (str) – Path to the checkpoint directory.

  • model (Optional[Model]) – Model class used by Trainer. by default None

property log_predictions_callback
log(logs: Dict[str, float]) → None[source]
transformers4rec.torch.trainer.process_metrics(metrics, prefix='', to_cpu=True)[source]
class transformers4rec.torch.trainer.IncrementalLoggingCallback(trainer: transformers4rec.torch.trainer.Trainer)[source]

Bases: transformers.trainer_callback.TrainerCallback

A TrainerCallback that changes the state of the Trainer on specific hooks for the purpose of incremental logging.

Parameters

trainer (Trainer) – The Trainer whose state is changed.

on_train_begin(args, state, control, model=None, **kwargs)[source]
on_train_end(args, state, control, model=None, **kwargs)[source]
on_epoch_end(args, state, control, model=None, **kwargs)[source]
class transformers4rec.torch.trainer.DatasetMock(nsteps=1)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

Mock to inform HF Trainer that the dataset is sized, and can be obtained via the generated/provided data loader

transformers4rec.torch.typing module

Module contents

class transformers4rec.torch.Schema(feature: Sequence[merlin_standard_lib.proto.schema_bp.Feature] = <betterproto._PLACEHOLDER object>, sparse_feature: List[merlin_standard_lib.proto.schema_bp.SparseFeature] = <betterproto._PLACEHOLDER object>, weighted_feature: List[merlin_standard_lib.proto.schema_bp.WeightedFeature] = <betterproto._PLACEHOLDER object>, string_domain: List[merlin_standard_lib.proto.schema_bp.StringDomain] = <betterproto._PLACEHOLDER object>, float_domain: List[merlin_standard_lib.proto.schema_bp.FloatDomain] = <betterproto._PLACEHOLDER object>, int_domain: List[merlin_standard_lib.proto.schema_bp.IntDomain] = <betterproto._PLACEHOLDER object>, default_environment: List[str] = <betterproto._PLACEHOLDER object>, annotation: merlin_standard_lib.proto.schema_bp.Annotation = <betterproto._PLACEHOLDER object>, dataset_constraints: merlin_standard_lib.proto.schema_bp.DatasetConstraints = <betterproto._PLACEHOLDER object>, tensor_representation_group: Dict[str, merlin_standard_lib.proto.schema_bp.TensorRepresentationGroup] = <betterproto._PLACEHOLDER object>)[source]

Bases: merlin_standard_lib.proto.schema_bp._Schema

A collection of column schemas for a dataset.

feature: List[merlin_standard_lib.schema.schema.ColumnSchema] = Field(name=None,type=None,default=<betterproto._PLACEHOLDER object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({'betterproto': FieldMetadata(number=1, proto_type='message', map_types=None, group=None, wraps=None)}),_field_type=None)
classmethod create(column_schemas: Optional[Union[List[Union[merlin_standard_lib.schema.schema.ColumnSchema, str]], Dict[str, Union[merlin_standard_lib.schema.schema.ColumnSchema, str]]]] = None, **kwargs)[source]
with_tags_based_on_properties(using_value_count=True, using_domain=True) → merlin_standard_lib.schema.schema.Schema[source]
apply(selector) → merlin_standard_lib.schema.schema.Schema[source]
apply_inverse(selector) → merlin_standard_lib.schema.schema.Schema[source]
filter_columns_from_dict(input_dict)[source]
select_by_type(to_select) → merlin_standard_lib.schema.schema.Schema[source]
remove_by_type(to_remove) → merlin_standard_lib.schema.schema.Schema[source]
select_by_tag(to_select) → merlin_standard_lib.schema.schema.Schema[source]
remove_by_tag(to_remove) → merlin_standard_lib.schema.schema.Schema[source]
select_by_name(to_select) → merlin_standard_lib.schema.schema.Schema[source]
remove_by_name(to_remove) → merlin_standard_lib.schema.schema.Schema[source]
map_column_schemas(map_fn: Callable[[merlin_standard_lib.schema.schema.ColumnSchema], merlin_standard_lib.schema.schema.ColumnSchema]) → merlin_standard_lib.schema.schema.Schema[source]
filter_column_schemas(filter_fn: Callable[[merlin_standard_lib.schema.schema.ColumnSchema], bool], negate=False) → merlin_standard_lib.schema.schema.Schema[source]
property column_names
property column_schemas
property item_id_column_name
from_json(value: Union[str, bytes]) → merlin_standard_lib.schema.schema.Schema[source]
to_proto_text() → str[source]
from_proto_text(path_or_proto_text: str) → merlin_standard_lib.schema.schema.Schema[source]
copy(**kwargs) → merlin_standard_lib.schema.schema.Schema[source]
add(other, allow_overlap=True) → merlin_standard_lib.schema.schema.Schema[source]
transformers4rec.torch.requires_schema(module)[source]
class transformers4rec.torch.T4RecConfig[source]

Bases: object

to_huggingface_torch_model()[source]
to_torch_model(input_features, *prediction_task, task_blocks=None, task_weights=None, loss_reduction='mean', **kwargs)[source]
property transformers_config_cls
classmethod build(*args, **kwargs)[source]
class transformers4rec.torch.GPT2Config(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12, n_head=12, n_inner=None, activation_function='gelu_new', resid_pdrop=0.1, embd_pdrop=0.1, attn_pdrop=0.1, layer_norm_epsilon=1e-05, initializer_range=0.02, summary_type='cls_index', summary_use_proj=True, summary_activation=None, summary_proj_to_labels=True, summary_first_dropout=0.1, scale_attn_weights=True, use_cache=True, bos_token_id=50256, eos_token_id=50256, scale_attn_by_inverse_layer_idx=False, reorder_and_upcast_attn=False, **kwargs)[source]

Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.gpt2.configuration_gpt2.GPT2Config

classmethod build(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source]
class transformers4rec.torch.XLNetConfig(vocab_size=32000, d_model=1024, n_layer=24, n_head=16, d_inner=4096, ff_activation='gelu', untie_r=True, attn_type='bi', initializer_range=0.02, layer_norm_eps=1e-12, dropout=0.1, mem_len=512, reuse_len=None, use_mems_eval=True, use_mems_train=False, bi_data=False, clamp_len=- 1, same_length=False, summary_type='last', summary_use_proj=True, summary_activation='tanh', summary_last_dropout=0.1, start_n_top=5, end_n_top=5, pad_token_id=5, bos_token_id=1, eos_token_id=2, **kwargs)[source]

Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.xlnet.configuration_xlnet.XLNetConfig

classmethod build(d_model, n_head, n_layer, total_seq_length=None, attn_type='bi', hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, mem_len=1, **kwargs)[source]
class transformers4rec.torch.TransfoXLConfig(vocab_size=267735, cutoffs=[20000, 40000, 200000], d_model=1024, d_embed=1024, n_head=16, d_head=64, d_inner=4096, div_val=4, pre_lnorm=False, n_layer=18, mem_len=1600, clamp_len=1000, same_length=True, proj_share_all_but_first=True, attn_type=0, sample_softmax=- 1, adaptive=True, dropout=0.1, dropatt=0.0, untie_r=True, init='normal', init_range=0.01, proj_init_std=0.01, init_std=0.02, layer_norm_epsilon=1e-05, eos_token_id=0, **kwargs)[source]

Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.transfo_xl.configuration_transfo_xl.TransfoXLConfig

classmethod build(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source]
class transformers4rec.torch.LongformerConfig(attention_window: Union[List[int], int] = 512, sep_token_id: int = 2, pad_token_id: int = 1, bos_token_id: int = 0, eos_token_id: int = 2, vocab_size: int = 30522, hidden_size: int = 768, num_hidden_layers: int = 12, num_attention_heads: int = 12, intermediate_size: int = 3072, hidden_act: str = 'gelu', hidden_dropout_prob: float = 0.1, attention_probs_dropout_prob: float = 0.1, max_position_embeddings: int = 512, type_vocab_size: int = 2, initializer_range: float = 0.02, layer_norm_eps: float = 1e-12, onnx_export: bool = False, **kwargs)[source]

Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.longformer.configuration_longformer.LongformerConfig

classmethod build(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source]
class transformers4rec.torch.AlbertConfig(vocab_size=30000, embedding_size=128, hidden_size=4096, num_hidden_layers=12, num_hidden_groups=1, num_attention_heads=64, intermediate_size=16384, inner_group_num=1, hidden_act='gelu_new', hidden_dropout_prob=0, attention_probs_dropout_prob=0, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, classifier_dropout_prob=0.1, position_embedding_type='absolute', pad_token_id=0, bos_token_id=2, eos_token_id=3, **kwargs)[source]

Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.albert.configuration_albert.AlbertConfig

classmethod build(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source]
class transformers4rec.torch.ReformerConfig(attention_head_size=64, attn_layers=['local', 'lsh', 'local', 'lsh', 'local', 'lsh'], axial_norm_std=1.0, axial_pos_embds=True, axial_pos_shape=[64, 64], axial_pos_embds_dim=[64, 192], chunk_size_lm_head=0, eos_token_id=2, feed_forward_size=512, hash_seed=None, hidden_act='relu', hidden_dropout_prob=0.05, hidden_size=256, initializer_range=0.02, is_decoder=False, layer_norm_eps=1e-12, local_num_chunks_before=1, local_num_chunks_after=0, local_attention_probs_dropout_prob=0.05, local_attn_chunk_length=64, lsh_attn_chunk_length=64, lsh_attention_probs_dropout_prob=0.0, lsh_num_chunks_before=1, lsh_num_chunks_after=0, max_position_embeddings=4096, num_attention_heads=12, num_buckets=None, num_hashes=1, pad_token_id=0, vocab_size=320, tie_word_embeddings=False, use_cache=True, classifier_dropout=None, **kwargs)[source]

Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.reformer.configuration_reformer.ReformerConfig

classmethod build(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, axial_pos_shape_first_dim=4, **kwargs)[source]
class transformers4rec.torch.ElectraConfig(vocab_size=30522, embedding_size=128, hidden_size=256, num_hidden_layers=12, num_attention_heads=4, intermediate_size=1024, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, summary_type='first', summary_use_proj=True, summary_activation='gelu', summary_last_dropout=0.1, pad_token_id=0, position_embedding_type='absolute', use_cache=True, classifier_dropout=None, **kwargs)[source]

Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.electra.configuration_electra.ElectraConfig

classmethod build(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source]
class transformers4rec.torch.T4RecTrainingArguments(output_dir: str, overwrite_output_dir: bool = False, do_train: bool = False, do_eval: bool = False, do_predict: bool = False, evaluation_strategy: Union[transformers.trainer_utils.IntervalStrategy, str] = 'no', prediction_loss_only: bool = False, per_device_train_batch_size: int = 8, per_device_eval_batch_size: int = 8, per_gpu_train_batch_size: Optional[int] = None, per_gpu_eval_batch_size: Optional[int] = None, gradient_accumulation_steps: int = 1, eval_accumulation_steps: Optional[int] = None, eval_delay: Optional[float] = 0, learning_rate: float = 5e-05, weight_decay: float = 0.0, adam_beta1: float = 0.9, adam_beta2: float = 0.999, adam_epsilon: float = 1e-08, max_grad_norm: float = 1.0, num_train_epochs: float = 3.0, max_steps: int = - 1, lr_scheduler_type: Union[transformers.trainer_utils.SchedulerType, str] = 'linear', warmup_ratio: float = 0.0, warmup_steps: int = 0, log_level: Optional[str] = 'passive', log_level_replica: Optional[str] = 'warning', log_on_each_node: bool = True, logging_dir: Optional[str] = None, logging_strategy: Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps', logging_first_step: bool = False, logging_steps: float = 500, logging_nan_inf_filter: bool = True, save_strategy: Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps', save_steps: float = 500, save_total_limit: Optional[int] = None, save_safetensors: Optional[bool] = False, save_on_each_node: bool = False, no_cuda: bool = False, use_mps_device: bool = False, seed: int = 42, data_seed: Optional[int] = None, jit_mode_eval: bool = False, use_ipex: bool = False, bf16: bool = False, fp16: bool = False, fp16_opt_level: str = 'O1', half_precision_backend: str = 'auto', bf16_full_eval: bool = False, fp16_full_eval: bool = False, tf32: Optional[bool] = None, local_rank: int = - 1, ddp_backend: Optional[str] = None, tpu_num_cores: Optional[int] = None, tpu_metrics_debug: bool = False, debug: str = '', 
dataloader_drop_last: bool = False, eval_steps: Optional[float] = None, dataloader_num_workers: int = 0, past_index: int = - 1, run_name: Optional[str] = None, disable_tqdm: Optional[bool] = None, remove_unused_columns: Optional[bool] = True, label_names: Optional[List[str]] = None, load_best_model_at_end: Optional[bool] = False, metric_for_best_model: Optional[str] = None, greater_is_better: Optional[bool] = None, ignore_data_skip: bool = False, sharded_ddp: str = '', fsdp: str = '', fsdp_min_num_params: int = 0, fsdp_config: Optional[str] = None, fsdp_transformer_layer_cls_to_wrap: Optional[str] = None, deepspeed: Optional[str] = None, label_smoothing_factor: float = 0.0, optim: Union[transformers.training_args.OptimizerNames, str] = 'adamw_hf', optim_args: Optional[str] = None, adafactor: bool = False, group_by_length: bool = False, length_column_name: Optional[str] = 'length', report_to: Optional[List[str]] = None, ddp_find_unused_parameters: Optional[bool] = None, ddp_bucket_cap_mb: Optional[int] = None, dataloader_pin_memory: bool = True, skip_memory_metrics: bool = True, use_legacy_prediction_loop: bool = False, push_to_hub: bool = False, resume_from_checkpoint: Optional[str] = None, hub_model_id: Optional[str] = None, hub_strategy: Union[transformers.trainer_utils.HubStrategy, str] = 'every_save', hub_token: Optional[str] = None, hub_private_repo: bool = False, gradient_checkpointing: bool = False, include_inputs_for_metrics: bool = False, fp16_backend: str = 'auto', push_to_hub_model_id: Optional[str] = None, push_to_hub_organization: Optional[str] = None, push_to_hub_token: Optional[str] = None, mp_parameters: str = '', auto_find_batch_size: bool = False, full_determinism: bool = False, torchdynamo: Optional[str] = None, ray_scope: Optional[str] = 'last', ddp_timeout: Optional[int] = 1800, torch_compile: bool = False, torch_compile_backend: Optional[str] = None, torch_compile_mode: Optional[str] = None, xpu_backend: Optional[str] = None, 
max_sequence_length: Optional[int] = None, shuffle_buffer_size: int = 0, data_loader_engine: str = 'merlin', eval_on_test_set: bool = False, eval_steps_on_train_set: int = 20, predict_top_k: int = 0, learning_rate_num_cosine_cycles_by_epoch: float = 1.25, log_predictions: bool = False, compute_metrics_each_n_steps: int = 1, experiments_group: str = 'default')[source]

Bases: transformers.training_args.TrainingArguments

Class that inherits from HF TrainingArguments and adds the arguments needed for session-based and sequential recommendation.

Parameters
  • shuffle_buffer_size (int) –

  • validate_every (Optional[int], int) – Run validation once every this many epochs. -1 means no validation is used, by default -1

  • eval_on_test_set (bool) –

  • eval_steps_on_train_set (int) –

  • predict_top_k (Optional[int], int) – Truncate the recommendation list to the top-K highest-scored items (this does not affect the computation of evaluation metrics). This parameter is specific to NextItemPredictionTask, by default 0

  • log_predictions (Optional[bool], bool) – Log predictions, labels and metadata features every --compute_metrics_each_n_steps (for the test set), by default False

  • log_attention_weights (Optional[bool], bool) – Log the inputs and attention weights every --eval_steps (test set only), by default False

  • learning_rate_num_cosine_cycles_by_epoch (Optional[float], float) – Number of cosine cycles per epoch when --lr_scheduler_type = cosine_with_warmup, i.e. the number of waves in the cosine schedule (e.g. 0.5 just decreases from the max value to 0, following a half-cosine), by default 1.25

  • experiments_group (Optional[str], str) – Name of the Experiments Group, for organizing job runs logged on W&B, by default “default”
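
The learning_rate_num_cosine_cycles_by_epoch entry describes the schedule shape only in words. As a minimal sketch, assuming the multiplier follows the usual cosine-with-cycles form, the learning-rate factor could be computed as below. The helper name cosine_lr_factor is hypothetical and not part of the library, and progress stands for the fraction completed of the span over which the cycles are counted.

```python
import math


def cosine_lr_factor(progress: float, num_cycles: float = 1.25) -> float:
    """Multiplier in [0, 1] applied to the peak learning rate.

    Hypothetical helper for illustration only; the exact schedule used by
    the library is an assumption here.
    """
    angle = math.pi * 2.0 * num_cycles * progress
    # Clamp at zero so the factor never turns negative.
    return max(0.0, 0.5 * (1.0 + math.cos(angle)))


# With 0.5 cycles the factor only decreases, from 1 down to 0 (a half-cosine).
half_cosine = [round(cosine_lr_factor(p / 4, num_cycles=0.5), 3) for p in range(5)]
```

Larger values of num_cycles make the factor rise and fall several times over the same span instead of decaying once.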

max_sequence_length: Optional[int] = None
shuffle_buffer_size: int = 0
data_loader_engine: str = 'merlin'
eval_on_test_set: bool = False
eval_steps_on_train_set: int = 20
predict_top_k: int = 0
learning_rate_num_cosine_cycles_by_epoch: float = 1.25
log_predictions: bool = False
compute_metrics_each_n_steps: int = 1
experiments_group: str = 'default'
property place_model_on_device

Overrides the method to allow running training on CPU.

output_dir: str
class transformers4rec.torch.SequentialBlock(*args, output_size=None)[source]

Bases: transformers4rec.torch.block.base.BlockBase, torch.nn.modules.container.Sequential

property inputs
add_module(name: str, module: Optional[torch.nn.modules.module.Module])None[source]
add_module_and_maybe_build(name: str, module, parent, idx)torch.nn.modules.module.Module[source]
forward(input, training=False, testing=False, **kwargs)[source]
build(input_size, schema=None, **kwargs)[source]
as_tabular(name=None)[source]
forward_output_size(input_size)[source]
static get_children_by_class_name(parent, *class_name)[source]
transformers4rec.torch.right_shift_block(self, other)[source]
transformers4rec.torch.build_blocks(*modules)[source]
class transformers4rec.torch.BlockBase(*args, **kwargs)[source]

Bases: transformers4rec.torch.utils.torch_utils.OutputSizeMixin, torch.nn.modules.module.Module

to_model(prediction_task_or_head, inputs=None, **kwargs)[source]
as_tabular(name=None)[source]
class transformers4rec.torch.TabularBlock(pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]

Bases: transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.tabular.base.TabularModule, abc.ABC

TabularBlock extends TabularModule to turn it into a block with output size info.

Parameters
to_module(shape_or_module, device=None)[source]
output_size(input_size=None)[source]
build(input_size, schema=None, **kwargs)[source]
class transformers4rec.torch.Block(module: torch.nn.modules.module.Module, output_size: Union[List[int], torch.Size])[source]

Bases: transformers4rec.torch.block.base.BlockBase

forward(inputs, **kwargs)[source]
forward_output_size(input_size)[source]
class transformers4rec.torch.MLPBlock(dimensions, activation=<class 'torch.nn.modules.activation.ReLU'>, use_bias: bool = True, dropout=None, normalization=None, filter_features=None)[source]

Bases: transformers4rec.torch.block.base.BuildableBlock

build(input_shape)transformers4rec.torch.block.base.SequentialBlock[source]
class transformers4rec.torch.TabularTransformation(*args, **kwargs)[source]

Bases: transformers4rec.torch.utils.torch_utils.OutputSizeMixin, torch.nn.modules.module.Module, abc.ABC

Transformation that takes in TabularData and outputs TabularData.

forward(inputs: Dict[str, torch.Tensor], **kwargs)Dict[str, torch.Tensor][source]
classmethod parse(class_or_str)[source]
class transformers4rec.torch.SequentialTabularTransformations(*transformation: Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]])[source]

Bases: transformers4rec.torch.block.base.SequentialBlock

A sequential container. Modules are added to it in the order they are passed in.

Parameters

transformation (TabularTransformationType) – transformations that are passed in here will be called in order.
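
The ordering guarantee matters because transformations generally do not commute. The following is a minimal pure-Python sketch of "called in order", not the library's implementation: features are plain dictionaries of lists, and apply_in_order, double and add_one are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List

Features = Dict[str, List[float]]
Transform = Callable[[Features], Features]


def apply_in_order(inputs: Features, *transformations: Transform) -> Features:
    """Feed the output of each transformation into the next one."""
    outputs = inputs
    for transform in transformations:
        outputs = transform(outputs)
    return outputs


def double(features: Features) -> Features:
    return {name: [2.0 * v for v in values] for name, values in features.items()}


def add_one(features: Features) -> Features:
    return {name: [v + 1.0 for v in values] for name, values in features.items()}


# double runs first, then add_one: 1.0 -> 2.0 -> 3.0 and 2.0 -> 4.0 -> 5.0
result = apply_in_order({"price": [1.0, 2.0]}, double, add_one)
# Swapping the order gives a different result: 1.0 -> 2.0 -> 4.0
swapped = apply_in_order({"price": [1.0, 2.0]}, add_one, double)
```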

append(transformation)[source]
class transformers4rec.torch.TabularAggregation(*args, **kwargs)[source]

Bases: transformers4rec.torch.utils.torch_utils.OutputSizeMixin, torch.nn.modules.module.Module, abc.ABC

Aggregation of TabularData that outputs a single Tensor
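
To make the "dictionary in, single tensor out" contract concrete, here is a small pure-Python sketch that uses concatenation as one illustrative aggregation. This section does not list which aggregations are available, so concat_aggregation is a hypothetical helper and plain lists stand in for tensors.

```python
from typing import Dict, List


def concat_aggregation(inputs: Dict[str, List[float]]) -> List[float]:
    """Concatenate per-feature vectors into one vector.

    Features are visited in sorted-name order so the output layout is stable.
    """
    merged: List[float] = []
    for name in sorted(inputs):
        merged.extend(inputs[name])
    return merged


# "category_emb" sorts before "price", so its values come first.
row = concat_aggregation({"price": [0.5], "category_emb": [0.1, 0.2]})
```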

forward(inputs: Dict[str, torch.Tensor])torch.Tensor[source]
classmethod parse(class_or_str)[source]
class transformers4rec.torch.StochasticSwapNoise(schema=None, pad_token=0, replacement_prob=0.1)[source]

Bases: transformers4rec.torch.tabular.base.TabularTransformation

Applies stochastic replacement of sequence features. It can be applied as a pre transform, like TransformerBlock(pre="stochastic-swap-noise")
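
The idea behind swap noise can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the library's code: it assumes that each non-padded value is replaced, with probability replacement_prob, by a value drawn from the other non-padded values of the same feature in the batch. The function swap_noise and its seed argument are hypothetical.

```python
import random
from typing import List, Optional


def swap_noise(
    sequences: List[List[int]],
    pad_token: int = 0,
    replacement_prob: float = 0.1,
    seed: Optional[int] = None,
) -> List[List[int]]:
    """Randomly replace non-padded values with other values seen in the batch."""
    rng = random.Random(seed)
    # Candidate replacements: every non-padded value in the batch.
    pool = [v for seq in sequences for v in seq if v != pad_token]
    noisy = []
    for seq in sequences:
        new_seq = []
        for value in seq:
            if value != pad_token and pool and rng.random() < replacement_prob:
                new_seq.append(rng.choice(pool))
            else:
                new_seq.append(value)  # padding is never touched
        noisy.append(new_seq)
    return noisy


batch = [[5, 7, 0, 0], [9, 0, 0, 0]]
noisy = swap_noise(batch, replacement_prob=1.0, seed=0)
```

Padded positions stay padded, so sequence lengths and masks derived from pad_token remain valid after the noise is applied.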

forward(inputs: Union[torch.Tensor, Dict[str, torch.Tensor]], input_mask: Optional[torch.Tensor] = None, **kwargs)Union[torch.Tensor, Dict[str, torch.Tensor]][source]
forward_output_size(input_size)[source]
augment(input_tensor: torch.Tensor, mask: Optional[torch.Tensor] = None)torch.Tensor[source]
class transformers4rec.torch.TabularLayerNorm(features_dim: Optional[Dict[str, int]] = None)[source]

Bases: transformers4rec.torch.tabular.base.TabularTransformation

Applies layer normalization to each input feature individually, before the aggregation.
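
"Individually" means that each feature is normalized with its own statistics rather than with statistics shared across features. A minimal sketch with plain lists follows; the learnable scale and shift of a real layer-norm module are omitted, and the function names are hypothetical.

```python
import math
from typing import Dict, List


def layer_norm(vector: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one vector to zero mean and (approximately) unit variance."""
    mean = sum(vector) / len(vector)
    variance = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(variance + eps) for v in vector]


def normalize_each_feature(inputs: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Apply layer_norm separately to every feature in the dictionary."""
    return {name: layer_norm(vector) for name, vector in inputs.items()}


normalized = normalize_each_feature({"a": [1.0, 2.0, 3.0], "b": [10.0, 30.0]})
```

Because the features are handled separately, the very different scales of "a" and "b" do not influence each other.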

classmethod from_feature_config(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig])[source]
forward(inputs: Dict[str, torch.Tensor], **kwargs)Dict[str, torch.Tensor][source]
forward_output_size(input_size)[source]
build(input_size, **kwargs)[source]
class transformers4rec.torch.TabularDropout(dropout_rate=0.0)[source]

Bases: transformers4rec.torch.tabular.base.TabularTransformation

Applies dropout transformation.

forward(inputs: Union[torch.Tensor, Dict[str, torch.Tensor]], **kwargs)Union[torch.Tensor, Dict[str, torch.Tensor]][source]
forward_output_size(input_size)[source]
class transformers4rec.torch.TransformerBlock(transformer: Union[transformers.modeling_utils.PreTrainedModel, transformers.configuration_utils.PretrainedConfig], masking: Optional[transformers4rec.torch.masking.MaskSequence] = None, prepare_module: Optional[Type[transformers4rec.torch.block.transformer.TransformerPrepare]] = None, output_fn=<function TransformerBlock.<lambda>>)[source]

Bases: transformers4rec.torch.block.base.BlockBase

Class to support HF Transformers for session-based and sequential-based recommendation models.

Parameters
  • transformer (TransformerBody) – The T4RecConfig or a pre-trained HF object related to specific transformer architecture.

  • masking – Needed when masking is applied on the inputs.

TRANSFORMER_TO_PREPARE: Dict[Type[transformers.modeling_utils.PreTrainedModel], Type[transformers4rec.torch.block.transformer.TransformerPrepare]] = {<class 'transformers.models.gpt2.modeling_gpt2.GPT2Model'>: <class 'transformers4rec.torch.block.transformer.GPT2Prepare'>}
classmethod from_registry(transformer: str, d_model: int, n_head: int, n_layer: int, total_seq_length: int, masking: Optional[transformers4rec.torch.masking.MaskSequence] = None)[source]

Load the HF transformer architecture based on its name

Parameters
  • transformer (str) – Name of the Transformer to use. Possible values are: ["reformer", "gpt2", "longformer", "electra", "albert", "xlnet"]

  • d_model (int) – Size of the hidden states for Transformers

  • n_head (int) – Number of attention heads for Transformers

  • n_layer (int) – Number of layers for RNNs and Transformers

  • total_seq_length (int) – The maximum sequence length

forward(inputs_embeds, **kwargs)[source]

Transformer Models

forward_output_size(input_size)[source]
class transformers4rec.torch.ContinuousFeatures(features: List[str], pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]

Bases: transformers4rec.torch.features.base.InputBlock

Input block for continuous features.

Parameters
classmethod from_features(features, **kwargs)[source]
forward(inputs, **kwargs)[source]
forward_output_size(input_sizes)[source]
class transformers4rec.torch.EmbeddingFeatures(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], item_id: Optional[str] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source]

Bases: transformers4rec.torch.features.base.InputBlock

Input block for embedding-lookups for categorical features.

For multi-hot features, the embeddings will be aggregated into a single tensor using the mean.
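
As a rough illustration of mean aggregation for a multi-hot feature, the sketch below looks up several rows of an embedding table and averages them element-wise. Plain lists stand in for tensors, and mean_combiner is a hypothetical helper rather than a library function.

```python
from typing import List


def mean_combiner(table: List[List[float]], ids: List[int]) -> List[float]:
    """Average the embedding rows selected by `ids` into a single vector."""
    rows = [table[i] for i in ids]
    dim = len(table[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]


embedding_table = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# A multi-hot feature holding ids 1 and 3 is reduced to one 2-dimensional vector.
pooled = mean_combiner(embedding_table, [1, 3])
```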

Parameters
  • feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.

  • item_id (str, optional) – The name of the feature that’s used for the item_id.

  • pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).

  • post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).

  • aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

property item_embedding_table
table_to_embedding_module(table: transformers4rec.torch.features.embedding.TableConfig)torch.nn.modules.module.Module[source]
classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, embedding_dims: Optional[Dict[str, int]] = None, embedding_dim_default: int = 64, infer_embedding_sizes: bool = False, infer_embedding_sizes_multiplier: float = 2.0, embeddings_initializers: Optional[Dict[str, Callable[[Any], None]]] = None, combiner: str = 'mean', tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]]]] = None, item_id: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, aggregation=None, pre=None, post=None, **kwargs)Optional[transformers4rec.torch.features.embedding.EmbeddingFeatures][source]

Instantiates EmbeddingFeatures from a DatasetSchema.

Parameters
  • schema (DatasetSchema) – Dataset schema

  • embedding_dims (Optional[Dict[str, int]], optional) – The dimension of the embedding table for each feature (key), by default None

  • embedding_dim_default (Optional[int], optional) – Default dimension of the embedding table, when the feature is not found in embedding_dims, by default 64

  • infer_embedding_sizes (bool, optional) – Automatically defines the embedding dimension from the feature cardinality in the schema, by default False

  • infer_embedding_sizes_multiplier (Optional[float], by default 2.0) – Multiplier used by the heuristic to infer the embedding dimension from its cardinality. Generally reasonable values range between 2.0 and 10.0

  • embeddings_initializers (Optional[Dict[str, Callable[[Any], None]]]) – Dict where keys are feature names and values are callable to initialize embedding tables

  • combiner (Optional[str], optional) – Feature aggregation option, by default “mean”

  • tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter columns, by default None

  • item_id (Optional[str], optional) – Name of the item id column (feature), by default None

  • automatic_build (bool, optional) – Automatically infers input size from features, by default True

  • max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None

Returns

Returns the EmbeddingFeatures for the dataset schema

Return type

Optional[EmbeddingFeatures]
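
The parameter list mentions a heuristic that derives an embedding dimension from a feature's cardinality but does not spell it out. One common choice, assumed here purely for illustration, scales the fourth root of the cardinality by the multiplier. Both helpers below are hypothetical names; the second one shows the documented fallback to the default dimension when a feature is missing from embedding_dims.

```python
import math
from typing import Dict, Optional


def infer_embedding_dim(cardinality: int, multiplier: float = 2.0) -> int:
    """Assumed heuristic: ceil(multiplier * cardinality ** 0.25)."""
    fourth_root = math.sqrt(math.sqrt(cardinality))
    return int(math.ceil(fourth_root * multiplier))


def lookup_embedding_dim(
    feature: str,
    embedding_dims: Optional[Dict[str, int]] = None,
    embedding_dim_default: int = 64,
) -> int:
    """Use the explicit dimension if given, otherwise fall back to the default."""
    if embedding_dims and feature in embedding_dims:
        return embedding_dims[feature]
    return embedding_dim_default


# A feature with 10,000 distinct values: fourth root is 10, times 2.0 gives 20.
dim_for_items = infer_embedding_dim(10_000, multiplier=2.0)
```

Under this assumption the dimension grows slowly with cardinality, which keeps the tables of very large vocabularies from becoming excessively wide.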

item_ids(inputs)torch.Tensor[source]
forward(inputs, **kwargs)[source]
forward_output_size(input_sizes)[source]
class transformers4rec.torch.SoftEmbeddingFeatures(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], layer_norm: bool = True, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, **kwarg)[source]

Bases: transformers4rec.torch.features.embedding.EmbeddingFeatures

Encapsulates continuous features encoded using the soft one-hot encoding embedding technique (SoftEmbedding), from https://arxiv.org/pdf/1708.00065.pdf. In a nutshell, it keeps an embedding table for each continuous feature, and each value is represented as a weighted average of that table's embeddings.
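
The "weighted average of embeddings" can be sketched as follows. This is one reading of the technique, stated as an assumption: the scalar is projected to one logit per embedding row, a softmax turns the logits into weights, and the weights mix the rows. The names soft_embedding, projection_weights and projection_bias are hypothetical, and plain lists stand in for tensors.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_embedding(
    value: float,
    projection_weights: List[float],
    projection_bias: List[float],
    embedding_table: List[List[float]],
) -> List[float]:
    """Represent a scalar as a softmax-weighted average of embedding rows."""
    logits = [w * value + b for w, b in zip(projection_weights, projection_bias)]
    weights = softmax(logits)
    dim = len(embedding_table[0])
    return [
        sum(w * row[d] for w, row in zip(weights, embedding_table))
        for d in range(dim)
    ]


table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vector = soft_embedding(0.7, [0.5, -0.5, 1.0], [0.0, 0.0, 0.0], table)
```

Here the number of rows in the table plays the role of the soft embedding cardinality, and the row length is the soft embedding dimension.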

Parameters
  • feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.

  • layer_norm (boolean) – When layer_norm is true, TabularLayerNorm will be used in post.

  • pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).

  • post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).

  • aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, soft_embedding_cardinalities: Optional[Dict[str, int]] = None, soft_embedding_cardinality_default: int = 10, soft_embedding_dims: Optional[Dict[str, int]] = None, soft_embedding_dim_default: int = 8, embeddings_initializers: Optional[Dict[str, Callable[[Any], None]]] = None, layer_norm: bool = True, combiner: str = 'mean', tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]]]] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, **kwargs)Optional[transformers4rec.torch.features.embedding.SoftEmbeddingFeatures][source]

Instantiates SoftEmbeddingFeatures from a DatasetSchema.

Parameters
  • schema (DatasetSchema) – Dataset schema

  • soft_embedding_cardinalities (Optional[Dict[str, int]], optional) – The cardinality of the embedding table for each feature (key), by default None

  • soft_embedding_cardinality_default (Optional[int], optional) – Default cardinality of the embedding table, when the feature is not found in soft_embedding_cardinalities, by default 10

  • soft_embedding_dims (Optional[Dict[str, int]], optional) – The dimension of the embedding table for each feature (key), by default None

  • soft_embedding_dim_default (Optional[int], optional) – Default dimension of the embedding table, when the feature is not found in soft_embedding_dims, by default 8

  • embeddings_initializers (Optional[Dict[str, Callable[[Any], None]]]) – Dict where keys are feature names and values are callable to initialize embedding tables

  • combiner (Optional[str], optional) – Feature aggregation option, by default “mean”

  • tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter columns, by default None

  • automatic_build (bool, optional) – Automatically infers input size from features, by default True

  • max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None

Returns

Returns a SoftEmbeddingFeatures instance from the dataset schema

Return type

Optional[SoftEmbeddingFeatures]

table_to_embedding_module(table: transformers4rec.torch.features.embedding.TableConfig)transformers4rec.torch.features.embedding.SoftEmbedding[source]
class transformers4rec.torch.PretrainedEmbeddingsInitializer(weight_matrix: Union[torch.Tensor, List[List[float]]], trainable: bool = False, **kwargs)[source]

Bases: torch.nn.modules.module.Module

Initializer of embedding tables with pre-trained weights

Parameters
  • weight_matrix (Union[torch.Tensor, List[List[float]]]) – A 2D torch tensor, numpy array, or list of lists with the pre-trained weights for embeddings. The expected dimensions are (embedding_cardinality, embedding_dim). The embedding_cardinality can be inferred from the column schema, for example, schema.select_by_name("item_id").feature[0].int_domain.max + 1. The first position of the embedding table is reserved for padded items (id=0).

  • trainable (bool) – Whether the embedding table should be trainable or not
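
The shape requirement and the reserved padding row are easy to get wrong, so here is a small sketch of how a weight matrix could be assembled as a list of lists before being handed to the initializer. The helper build_weight_matrix is hypothetical, and leaving items without a pre-trained vector as zero rows is an assumption made for the example.

```python
from typing import Dict, List


def build_weight_matrix(
    pretrained: Dict[int, List[float]], max_item_id: int, dim: int
) -> List[List[float]]:
    """Return a (max_item_id + 1, dim) matrix with row 0 kept for padding."""
    cardinality = max_item_id + 1  # mirrors int_domain.max + 1
    matrix = [[0.0] * dim for _ in range(cardinality)]
    for item_id, vector in pretrained.items():
        if item_id == 0:
            raise ValueError("id 0 is reserved for padded items")
        if len(vector) != dim:
            raise ValueError(f"item {item_id}: expected {dim} values, got {len(vector)}")
        matrix[item_id] = list(vector)
    return matrix


# Items 1 and 3 have pre-trained vectors; item 2 has none in this toy example.
weights = build_weight_matrix({1: [0.1, 0.2], 3: [0.3, 0.4]}, max_item_id=3, dim=2)
```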

training: bool
forward(x)[source]
class transformers4rec.torch.TabularSequenceFeatures(continuous_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, categorical_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, text_embedding_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, projection_module: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock, torch.nn.modules.module.Module]] = None, masking: Optional[transformers4rec.torch.masking.MaskSequence] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]

Bases: transformers4rec.torch.features.tabular.TabularFeatures

Input module that combines different types of features to a sequence: continuous, categorical & text.

Parameters
  • continuous_module (TabularModule, optional) – Module used to process continuous features.

  • categorical_module (TabularModule, optional) – Module used to process categorical features.

  • text_embedding_module (TabularModule, optional) – Module used to process text features.

  • projection_module (BlockOrModule, optional) – Module that’s used to project the output of this module, typically done by an MLPBlock.

  • masking (MaskSequence, optional) – Masking to apply to the inputs.

  • pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).

  • post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).

  • aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

EMBEDDING_MODULE_CLASS

alias of transformers4rec.torch.features.sequence.SequenceEmbeddingFeatures

classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, continuous_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CONTINUOUS: 'continuous'>,), categorical_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CATEGORICAL: 'categorical'>,), aggregation: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, continuous_projection: Optional[Union[int, List[int]]] = None, continuous_soft_embeddings: bool = False, projection: Optional[Union[torch.nn.modules.module.Module, transformers4rec.torch.block.base.BuildableBlock]] = None, d_output: Optional[int] = None, masking: Optional[Union[str, transformers4rec.torch.masking.MaskSequence]] = None, **kwargs)transformers4rec.torch.features.sequence.TabularSequenceFeatures[source]

Instantiates TabularSequenceFeatures from a DatasetSchema

Parameters
  • schema (DatasetSchema) – Dataset schema

  • continuous_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the continuous features, by default Tags.CONTINUOUS

  • categorical_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the categorical features, by default Tags.CATEGORICAL

  • aggregation (Optional[str], optional) – Feature aggregation option, by default None

  • automatic_build (bool, optional) – Automatically infers input size from features, by default True

  • max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None

  • continuous_projection (Optional[Union[List[int], int]], optional) – If set, concatenate all numerical features and project them by a number of MLP layers. The argument accepts a list with the dimensions of the MLP layers, by default None

  • continuous_soft_embeddings (bool) – Indicates if the soft one-hot encoding technique must be used to represent continuous features, by default False

  • projection (Optional[Union[torch.nn.Module, BuildableBlock]], optional) – If set, project the aggregated embeddings vectors into hidden dimension vector space, by default None

  • d_output (Optional[int], optional) – If set, init a MLPBlock as projection module to project embeddings vectors, by default None

  • masking (Optional[Union[str, MaskSequence]], optional) – If set, apply masking to the input embeddings and compute masked labels. It requires a categorical_module including an item_id column, by default None

Returns

Returns TabularSequenceFeatures from a dataset schema

Return type

TabularSequenceFeatures

property masking
set_masking(value)[source]
property item_id
property item_embedding_table
forward(inputs, training=False, testing=False, **kwargs)[source]
project_continuous_features(dimensions)[source]
forward_output_size(input_size)[source]
class transformers4rec.torch.SequenceEmbeddingFeatures(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], item_id: Optional[str] = None, padding_idx: int = 0, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source]

Bases: transformers4rec.torch.features.embedding.EmbeddingFeatures

Input block for embedding-lookups for categorical features. This module produces 3-D tensors, which is useful for sequential models like transformers.

Parameters
  • feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.

  • item_id (str, optional) – The name of the feature that’s used for the item_id.

  • padding_idx (int) – The symbol to use for padding.

  • pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).

  • post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).

  • aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

table_to_embedding_module(table: transformers4rec.torch.features.embedding.TableConfig)torch.nn.modules.sparse.Embedding[source]
forward_output_size(input_sizes)[source]
class transformers4rec.torch.FeatureConfig(table: transformers4rec.torch.features.embedding.TableConfig, max_sequence_length: int = 0, name: Optional[str] = None)[source]

Bases: object

class transformers4rec.torch.TableConfig(vocabulary_size: int, dim: int, initializer: Optional[Callable[[torch.Tensor], None]] = None, combiner: str = 'mean', name: Optional[str] = None)[source]

Bases: object

class transformers4rec.torch.TabularFeatures(continuous_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, categorical_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, text_embedding_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]

Bases: transformers4rec.torch.tabular.base.MergeTabular

Input module that combines different types of features: continuous, categorical & text.

Parameters
  • continuous_module (TabularModule, optional) – Module used to process continuous features.

  • categorical_module (TabularModule, optional) – Module used to process categorical features.

  • text_embedding_module (TabularModule, optional) – Module used to process text features.

  • pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).

  • post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).

  • aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

CONTINUOUS_MODULE_CLASS

alias of transformers4rec.torch.features.continuous.ContinuousFeatures

EMBEDDING_MODULE_CLASS

alias of transformers4rec.torch.features.embedding.EmbeddingFeatures

SOFT_EMBEDDING_MODULE_CLASS

alias of transformers4rec.torch.features.embedding.SoftEmbeddingFeatures

project_continuous_features(mlp_layers_dims: Union[List[int], int])transformers4rec.torch.features.tabular.TabularFeatures[source]

Combine all concatenated continuous features with stacked MLP layers

Parameters

mlp_layers_dims (Union[List[int], int]) – The MLP layer dimensions

Returns

Returns the same TabularFeatures object with the continuous features projected

Return type

TabularFeatures

classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, continuous_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CONTINUOUS: 'continuous'>,), categorical_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CATEGORICAL: 'categorical'>,), aggregation: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, continuous_projection: Optional[Union[int, List[int]]] = None, continuous_soft_embeddings: bool = False, **kwargs)transformers4rec.torch.features.tabular.TabularFeatures[source]

Instantiates TabularFeatures from a DatasetSchema

Parameters
  • schema (DatasetSchema) – Dataset schema

  • continuous_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the continuous features, by default Tags.CONTINUOUS

  • categorical_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the categorical features, by default Tags.CATEGORICAL

  • aggregation (Optional[str], optional) – Feature aggregation option, by default None

  • automatic_build (bool, optional) – Automatically infers input size from features, by default True

  • max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None

  • continuous_projection (Optional[Union[List[int], int]], optional) – If set, concatenate all numerical features and project them by a number of MLP layers. The argument accepts a list with the dimensions of the MLP layers, by default None

  • continuous_soft_embeddings (bool) – Indicates if the soft one-hot encoding technique must be used to represent continuous features, by default False

Returns

Returns TabularFeatures from a dataset schema

Return type

TabularFeatures

forward_output_size(input_size)[source]
property continuous_module
property categorical_module
class transformers4rec.torch.Head(body: transformers4rec.torch.block.base.BlockBase, prediction_tasks: Union[List[transformers4rec.torch.model.base.PredictionTask], transformers4rec.torch.model.base.PredictionTask], task_blocks: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock, Dict[str, Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]]]] = None, task_weights: Optional[List[float]] = None, loss_reduction: str = 'mean', inputs: Optional[Union[transformers4rec.torch.features.sequence.TabularSequenceFeatures, transformers4rec.torch.features.tabular.TabularFeatures]] = None)[source]

Bases: torch.nn.modules.module.Module, transformers4rec.torch.utils.torch_utils.LossMixin, transformers4rec.torch.utils.torch_utils.MetricsMixin

Head of a Model. A head has a single body but can have multiple prediction tasks.

Parameters
  • body (Block) – TODO

  • prediction_tasks (Union[List[PredictionTask], PredictionTask], optional) – TODO

  • task_blocks – TODO

  • task_weights (List[float], optional) – TODO

  • loss_reduction (str, default="mean") – TODO

  • inputs (TabularFeaturesType, optional) – TODO

build(inputs=None, device=None, task_blocks=None)[source]

Build each prediction task that’s part of the head.

Parameters
  • inputs

  • device

  • task_blocks

classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, body: transformers4rec.torch.block.base.BlockBase, task_blocks: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock, Dict[str, Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]]]] = None, task_weight_dict: Optional[Dict[str, float]] = None, loss_reduction: str = 'mean', inputs: Optional[Union[transformers4rec.torch.features.sequence.TabularSequenceFeatures, transformers4rec.torch.features.tabular.TabularFeatures]] = None)transformers4rec.torch.model.base.Head[source]

Instantiate a Head from a Schema through tagged targets.

Parameters
  • schema (DatasetSchema) – Schema to use for inferring all targets based on the tags.

  • body

  • task_blocks

  • task_weight_dict

  • loss_reduction

  • inputs

Returns

Return type

Head

pop_labels(inputs: Dict[str, torch.Tensor])Dict[str, torch.Tensor][source]

Pop the labels of the different prediction tasks from the inputs.

Parameters

inputs (TabularData) – Input dictionary containing all targets.

Returns

Return type

TabularData

forward(body_outputs: Union[torch.Tensor, Dict[str, torch.Tensor]], training: bool = False, testing: bool = False, targets: Optional[Union[torch.Tensor, Dict[str, torch.Tensor]]] = None, call_body: bool = False, top_k: Optional[int] = None, **kwargs)Union[torch.Tensor, Dict[str, torch.Tensor]][source]
calculate_metrics(predictions: Union[torch.Tensor, Dict[str, torch.Tensor]], targets: Union[torch.Tensor, Dict[str, torch.Tensor]])Dict[str, Union[Dict[str, torch.Tensor], torch.Tensor]][source]

Calculate metrics of the task(s) set in the Head instance.

Parameters
  • predictions – The prediction tensors to use for calculating metrics. They can be either a torch.Tensor if a single task is used or a dictionary of torch.Tensor if multiple tasks are used. In the second case, the dictionary is indexed by the task names.

  • targets – The tensor or dictionary of targets to use for computing the metrics of one or multiple tasks.

compute_metrics(mode: Optional[str] = None)Dict[str, Union[float, torch.Tensor]][source]
reset_metrics()[source]
property task_blocks
to_model(**kwargs)transformers4rec.torch.model.base.Model[source]

Convert the head to a Model.

Return type

Model

training: bool
class transformers4rec.torch.Model(*head: transformers4rec.torch.model.base.Head, head_weights: Optional[List[float]] = None, head_reduction: str = 'mean', optimizer: Type[torch.optim.optimizer.Optimizer] = <class 'torch.optim.adam.Adam'>, name: Optional[str] = None, max_sequence_length: Optional[int] = None, top_k: Optional[int] = None)[source]

Bases: torch.nn.modules.module.Module, transformers4rec.torch.utils.torch_utils.LossMixin, transformers4rec.torch.utils.torch_utils.MetricsMixin

forward(inputs: Dict[str, torch.Tensor], targets=None, training=False, testing=False, **kwargs)[source]
calculate_metrics(predictions: Union[torch.Tensor, Dict[str, torch.Tensor]], targets: Union[torch.Tensor, Dict[str, torch.Tensor]])Dict[str, Union[Dict[str, torch.Tensor], torch.Tensor]][source]

Calculate metrics of the task(s) set in the Head instance.

Parameters
  • predictions – The prediction tensors returned by the model. They can be either a torch.Tensor if a single task is used or a dictionary of torch.Tensor if multiple heads/tasks are used. In the second case, the dictionary is indexed by the task names.

  • targets – The tensor or dictionary of targets returned by the model. They are used for computing the metrics of one or multiple tasks.

compute_metrics(mode=None)Dict[str, Union[float, torch.Tensor]][source]
reset_metrics()[source]
to_lightning()[source]
fit(dataloader, optimizer=<class 'torch.optim.adam.Adam'>, eval_dataloader=None, num_epochs=1, amp=False, train=True, verbose=True, compute_metric=True)[source]
evaluate(dataloader, targets=None, training=False, testing=True, verbose=True, mode='eval')[source]
property input_schema
property output_schema
property prediction_tasks
save(path: Union[str, os.PathLike], model_name='t4rec_model_class')[source]

Saves the model to f"{path}/{model_name}.pkl" using cloudpickle.

Parameters
  • path (Union[str, os.PathLike]) – Path to the directory where the T4Rec model should be saved.

  • model_name – The name given to the pickle file storing the T4Rec model, by default 't4rec_model_class'.

classmethod load(path: Union[str, os.PathLike], model_name='t4rec_model_class')transformers4rec.torch.model.base.Model[source]

Loads a T4Rec model that was saved with model.save().

Parameters
  • path (Union[str, os.PathLike]) – Path to the directory where the T4Rec model is saved.

  • model_name – The name given to the pickle file storing the T4Rec model, by default 't4rec_model_class'.
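
The save/load round trip described above can be sketched as follows. This is only an illustration of the file layout, not the library's implementation: it uses the standard-library pickle module as a stand-in for cloudpickle, and the helper names save_object and load_object are hypothetical.

```python
import os
import pickle
import tempfile


def save_object(obj, path, model_name="t4rec_model_class"):
    """Pickle obj to f"{path}/{model_name}.pkl" (hypothetical helper)."""
    os.makedirs(path, exist_ok=True)
    file_path = os.path.join(path, f"{model_name}.pkl")
    with open(file_path, "wb") as handle:
        pickle.dump(obj, handle)
    return file_path


def load_object(path, model_name="t4rec_model_class"):
    """Load the object stored by save_object from the same directory."""
    with open(os.path.join(path, f"{model_name}.pkl"), "rb") as handle:
        return pickle.load(handle)


# Round trip through a temporary directory; a plain dict stands in for a model.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_to = save_object({"d_model": 64}, tmp_dir)
    restored = load_object(tmp_dir)
```

With the library itself, the corresponding calls would be model.save(path) followed by Model.load(path), using the same model_name in both.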

training: bool
class transformers4rec.torch.PredictionTask(loss: torch.nn.modules.module.Module, metrics: Optional[Iterable[torchmetrics.metric.Metric]] = None, target_name: Optional[str] = None, task_name: Optional[str] = None, forward_to_prediction_fn: Callable[[torch.Tensor], torch.Tensor] = <function PredictionTask.<lambda>>, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, pre: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, summary_type: str = 'last')[source]

Bases: torch.nn.modules.module.Module, transformers4rec.torch.utils.torch_utils.LossMixin, transformers4rec.torch.utils.torch_utils.MetricsMixin

Individual prediction-task of a model.

Parameters
  • loss (torch.nn.Module) – The loss to use during training of this task.

  • metrics (torch.nn.Module) – The metrics to calculate during training & evaluation.

  • target_name (str, optional) – Name of the target, this is needed when there are multiple targets.

  • task_name (str, optional) – Name of the prediction task, if not provided a name will be automatically constructed based on the target-name & class-name.

  • forward_to_prediction_fn (Callable[[torch.Tensor], torch.Tensor]) – Function to apply before the prediction

  • task_block (BlockType) – Module to transform input tensor before computing predictions.

  • pre (BlockType) – Module to compute the prediction probabilities.

  • summary_type (str) –

    This is used to summarize a sequence into a single tensor. Accepted values are:
    • ”last” – Take the last token hidden state (like XLNet)

    • ”first” – Take the first token hidden state (like Bert)

    • ”mean” – Take the mean of all tokens hidden states

    • ”cls_index” – Supply a Tensor of classification token position (GPT/GPT-2)

    • ”attn” – Not implemented now, use multi-head attention
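
To make the summary_type options concrete, here is a minimal framework-free sketch of the "last", "first", and "mean" reductions on a single sequence of hidden-state vectors. The function name summarize is hypothetical, padding is ignored, and the "cls_index" and "attn" options are left out; the library itself operates on batched tensors.

```python
def summarize(hidden_states, summary_type="last"):
    """Reduce a sequence of hidden-state vectors to a single vector."""
    if summary_type == "last":
        return list(hidden_states[-1])
    if summary_type == "first":
        return list(hidden_states[0])
    if summary_type == "mean":
        count = len(hidden_states)
        # zip(*...) groups the values of each hidden dimension together.
        return [sum(column) / count for column in zip(*hidden_states)]
    raise ValueError(f"unsupported summary_type: {summary_type}")


# Three positions, hidden size 2.
sequence = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
last = summarize(sequence, "last")
first = summarize(sequence, "first")
mean = summarize(sequence, "mean")
```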

build(body: Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock], input_size, inputs: Optional[transformers4rec.torch.features.base.InputBlock] = None, device=None, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, pre=None)[source]

Called when the block is converted to a model, i.e. when it is linked to a prediction head.

Parameters
  • body – The model block to link with the head.

  • device – The device to set for the metrics and layers of the task.

forward(inputs: torch.Tensor, targets: Optional[torch.Tensor] = None, training: bool = False, testing: bool = False)[source]
property task_name
child_name(name)[source]
set_metrics(metrics)[source]
calculate_metrics(predictions: torch.Tensor, targets: torch.Tensor)Dict[str, torch.Tensor][source]
compute_metrics(**kwargs)[source]
metric_name(metric: torchmetrics.metric.Metric)str[source]
reset_metrics()[source]
to_head(body, inputs=None, **kwargs)transformers4rec.torch.model.base.Head[source]
to_model(body, inputs=None, **kwargs)transformers4rec.torch.model.base.Model[source]
training: bool
class transformers4rec.torch.AsTabular(output_name: str)[source]

Bases: transformers4rec.torch.tabular.base.TabularBlock

Converts a Tensor to TabularData by converting it to a dictionary.

Parameters

output_name (str) – Name that should be used as the key in the output dictionary.

forward(inputs: torch.Tensor, **kwargs)Dict[str, torch.Tensor][source]
forward_output_size(input_size)[source]
class transformers4rec.torch.ConcatFeatures(*args, **kwargs)[source]

Bases: transformers4rec.torch.tabular.base.TabularAggregation

Aggregation by stacking all values in TabularData, all non-sequential values will be converted to a sequence.

The output of this concatenation will have 3 dimensions.

forward(inputs: Dict[str, torch.Tensor])torch.Tensor[source]
forward_output_size(input_size)[source]
class transformers4rec.torch.FilterFeatures(to_include: List[str], pop: bool = False)[source]

Bases: transformers4rec.torch.tabular.base.TabularTransformation

Module that filters out certain features from TabularData.

Parameters
  • to_include (List[str]) – List of features to include in the result of calling the module

  • pop (bool) – Boolean indicating whether to pop the features to exclude from the inputs dictionary.

forward(inputs: Dict[str, torch.Tensor], **kwargs)Dict[str, torch.Tensor][source]
Parameters

inputs (TabularData) – Input dictionary containing features to filter.

Returns

Filtered TabularData that only contains the feature-names in self.to_include.
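
The filtering behaviour can be sketched on a plain dictionary. This is an assumption-laden illustration rather than the module's code: filter_features is a hypothetical name, lists stand in for tensors, and the pop option is not modelled.

```python
def filter_features(inputs, to_include):
    """Return a new dict holding only the features named in to_include."""
    return {name: value for name, value in inputs.items() if name in to_include}


batch = {
    "item_id": [3, 7, 9],
    "category": [1, 1, 2],
    "price": [0.5, 1.2, 0.9],
}
filtered = filter_features(batch, ["item_id", "price"])
```

The input dictionary is left untouched here; only the returned dictionary is restricted to the requested names.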

forward_output_size(input_shape)[source]
Parameters

input_shape

class transformers4rec.torch.ElementwiseSum[source]

Bases: transformers4rec.torch.tabular.aggregation.ElementwiseFeatureAggregation

Aggregation by first stacking all values in TabularData in the first dimension, and then summing the result.
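
As a rough illustration of this aggregation, the sketch below sums equally shaped feature vectors position by position. It uses plain Python lists instead of tensors, and elementwise_sum is a hypothetical name; the shape check reflects the assumption that stacking requires all features to share one shape.

```python
def elementwise_sum(inputs):
    """Sum equally shaped feature vectors position by position."""
    vectors = list(inputs.values())
    if len({len(vector) for vector in vectors}) != 1:
        raise ValueError("all features must have the same length")
    # zip(*vectors) lines up the values that share a position.
    return [sum(values) for values in zip(*vectors)]


features = {
    "item_emb": [1.0, 2.0, 3.0],
    "category_emb": [0.5, 0.5, 0.5],
    "price_emb": [2.0, 0.0, 1.0],
}
summed = elementwise_sum(features)
```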

forward(inputs: Dict[str, torch.Tensor])torch.Tensor[source]
forward_output_size(input_size)[source]
class transformers4rec.torch.ElementwiseSumItemMulti(schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source]

Bases: transformers4rec.torch.tabular.aggregation.ElementwiseFeatureAggregation

Aggregation by applying the ElementwiseSum aggregation to all features except the item-id, and then multiplying this with the item-ids.

Parameters

schema (DatasetSchema) –
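
A minimal sketch of the described computation, on plain lists: every feature except the item-id is summed element-wise, and the result is multiplied element-wise by the item-id values. The function name and the explicit item_id_name argument are assumptions made for illustration; the class itself takes a schema, presumably to locate the item-id feature.

```python
def elementwise_sum_item_multi(inputs, item_id_name):
    """Sum all non-item features, then multiply by the item-id feature."""
    item = inputs[item_id_name]
    others = [value for name, value in inputs.items() if name != item_id_name]
    summed = [sum(values) for values in zip(*others)]
    return [i * s for i, s in zip(item, summed)]


features = {
    "item_id": [2.0, 1.0, 0.5],
    "category": [1.0, 2.0, 3.0],
    "price": [0.5, 0.5, 1.0],
}
combined = elementwise_sum_item_multi(features, "item_id")
```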

forward(inputs: Dict[str, torch.Tensor])torch.Tensor[source]
forward_output_size(input_size)[source]
REQUIRES_SCHEMA = True
class transformers4rec.torch.MergeTabular(*modules_to_merge: Union[transformers4rec.torch.tabular.base.TabularModule, Dict[str, transformers4rec.torch.tabular.base.TabularModule]], pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]

Bases: transformers4rec.torch.tabular.base.TabularBlock

Merge multiple TabularModule’s into a single output of TabularData.

Parameters
property merge_values
forward(inputs: Dict[str, torch.Tensor], training=True, **kwargs)Dict[str, torch.Tensor][source]
forward_output_size(input_size)[source]
build(input_size, **kwargs)[source]
class transformers4rec.torch.StackFeatures(axis: int = - 1)[source]

Bases: transformers4rec.torch.tabular.base.TabularAggregation

Aggregation by stacking all values in input dictionary in the given dimension.

Parameters

axis (int, default=-1) – Axis to use for the stacking operation.
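
The effect of the axis argument can be illustrated with one-dimensional features held in plain lists. The helper stack_features is hypothetical and only handles the 1-D case, where the last axis is axis 1.

```python
def stack_features(inputs, axis=-1):
    """Stack equally long 1-D features along axis 0 or the last axis."""
    vectors = [list(value) for value in inputs.values()]
    if axis == 0:
        # One row per feature.
        return vectors
    if axis in (-1, 1):
        # One row per position, one column per feature.
        return [list(row) for row in zip(*vectors)]
    raise ValueError("this sketch only supports axis 0, 1, or -1")


features = {"a": [1, 2, 3], "b": [10, 20, 30]}
stacked_last = stack_features(features)
stacked_first = stack_features(features, axis=0)
```

Stacking along the last axis keeps the features of each position side by side, which is the layout the default axis=-1 produces.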

forward(inputs: Dict[str, torch.Tensor])torch.Tensor[source]
forward_output_size(input_size)[source]
class transformers4rec.torch.BinaryClassificationTask(target_name: Optional[str] = None, task_name: Optional[str] = None, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, loss=BCELoss(), metrics=(BinaryPrecision(), BinaryRecall(), BinaryAccuracy()), summary_type='first')[source]

Bases: transformers4rec.torch.model.base.PredictionTask

Returns a PredictionTask for binary classification.

Example usage:

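# Assumes `schema`, `max_sequence_length`, and `d_model` are already defined.
import torchmetrics as tm
import transformers4rec.torch as tr

# Define the input module to process the tabular input features.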
input_module = tr.TabularSequenceFeatures.from_schema(
    schema,
    max_sequence_length=max_sequence_length,
    continuous_projection=d_model,
    aggregation="concat",
    masking=None,
)

# Define XLNetConfig class and set default parameters for HF XLNet config.
transformer_config = tr.XLNetConfig.build(
    d_model=d_model, n_head=4, n_layer=2, total_seq_length=max_sequence_length
)

# Define the model block including: inputs, masking, projection and transformer block.
body = tr.SequentialBlock(
    input_module,
    tr.MLPBlock([64]),
    tr.TransformerBlock(
        transformer_config,
        masking=input_module.masking
    )
)

# Define a head with BinaryClassificationTask.
head = tr.Head(
    body,
    tr.BinaryClassificationTask(
        "click",
        summary_type="mean",
        metrics=[
            tm.Precision(task='binary'),
            tm.Recall(task='binary'),
            tm.Accuracy(task='binary'),
            tm.F1Score(task='binary')
        ]
    ),
    inputs=input_module,
)

# Get the end-to-end Model class.
model = tr.Model(head)
Parameters
  • target_name (Optional[str] = None) – Specifies the variable name that represents the positive and negative values.

  • task_name (Optional[str] = None) – Specifies the name of the prediction task. If this parameter is not specified, a name is automatically constructed based on target_name and the Python class name of the model.

  • task_block (Optional[BlockType] = None) – Specifies a module to transform the input tensor before computing predictions.

  • loss (torch.nn.Module) – Specifies the loss function for the task. The default class is torch.nn.BCELoss.

  • metrics (Tuple[torch.nn.Module, ..]) – Specifies the metrics to calculate during training and evaluation. The default metrics are Precision, Recall, and Accuracy.

  • summary_type (str) –

    Summarizes a sequence into a single tensor. Accepted values are:

    • last – Take the last token hidden state (like XLNet)

    • first – Take the first token hidden state (like Bert)

    • mean – Take the mean of all tokens hidden states

    • cls_index – Supply a Tensor of classification token position (GPT/GPT-2)

    • attn – Not implemented now, use multi-head attention

DEFAULT_LOSS = BCELoss()
DEFAULT_METRICS = (BinaryPrecision(), BinaryRecall(), BinaryAccuracy())
training: bool
class transformers4rec.torch.RegressionTask(target_name: Optional[str] = None, task_name: Optional[str] = None, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, loss=MSELoss(), metrics=(MeanSquaredError()), summary_type='first')[source]

Bases: transformers4rec.torch.model.base.PredictionTask

DEFAULT_LOSS = MSELoss()
DEFAULT_METRICS = (MeanSquaredError(),)
training: bool
class transformers4rec.torch.NextItemPredictionTask(loss: torch.nn.modules.module.Module = CrossEntropyLoss(), metrics: Iterable[torchmetrics.metric.Metric] = (NDCGAt(), AvgPrecisionAt(), RecallAt()), task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, task_name: str = 'next-item', weight_tying: bool = False, softmax_temperature: float = 1, padding_idx: int = 0, target_dim: Optional[int] = None, sampled_softmax: Optional[bool] = False, max_n_samples: Optional[int] = 100)[source]

Bases: transformers4rec.torch.model.base.PredictionTask

This block performs the item prediction task for session-based and sequential models. It requires a body containing a masking schema to use for training and target generation. For the supported masking schemes, please refer to: https://nvidia-merlin.github.io/Transformers4Rec/main/model_definition.html#sequence-masking

Parameters
  • loss (torch.nn.Module) – Loss function to use. Defaults to CrossEntropyLoss.

  • metrics (Iterable[torchmetrics.Metric]) – List of ranking metrics to use for evaluation.

  • task_block – Module to transform input tensor before computing predictions.

  • task_name (str, optional) – Name of the prediction task, if not provided a name will be automatically constructed based on the target-name & class-name.

  • weight_tying (bool) – The item id embedding table weights are shared with the prediction network layer.

  • softmax_temperature (float) – Softmax temperature T, used to reduce model overconfidence: predictions are computed as softmax(logits / T). A value of 1.0 reduces to the regular softmax.

  • padding_idx (int) – pad token id.

  • target_dim (int) – vocabulary size of item ids

  • sampled_softmax (Optional[bool]) – Enables sampled softmax. By default False

  • max_n_samples (Optional[int]) – Number of samples for sampled softmax. By default 100
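
The effect of softmax_temperature can be illustrated with a small standalone sketch of softmax(logits / T). The function name is hypothetical and the code is plain Python rather than the task's own implementation.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Compute softmax(logits / temperature) in a numerically stable way."""
    scaled = [value / temperature for value in logits]
    largest = max(scaled)
    # Subtracting the maximum does not change the result but avoids overflow.
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.1]
regular = softmax_with_temperature(logits, temperature=1.0)
softened = softmax_with_temperature(logits, temperature=2.0)
```

A temperature above 1 flattens the distribution, so the top probability in softened is lower than in regular while the ranking of the items is unchanged.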

DEFAULT_METRICS = (NDCGAt(), AvgPrecisionAt(), RecallAt())
build(body, input_size, device=None, inputs=None, task_block=None, pre=None)[source]

Build method, this is called by the Head.

forward(inputs: torch.Tensor, targets=None, training=False, testing=False, top_k=None, **kwargs)[source]
remove_pad_3d(inp_tensor, non_pad_mask)[source]
calculate_metrics(predictions, targets)Dict[str, torch.Tensor][source]
compute_metrics()[source]
training: bool
class transformers4rec.torch.TabularModule(pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, **kwargs)[source]

Bases: torch.nn.modules.module.Module

PyTorch Module that’s specialized for tabular-data by integrating many often used operations.

Parameters
classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, tags=None, **kwargs)Optional[transformers4rec.torch.tabular.base.TabularModule][source]

Instantiate a TabularModule instance from a DatasetSchema.

Parameters
  • schema

  • tags

  • kwargs

Returns

Return type

Optional[TabularModule]

classmethod from_features(features: List[str], pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None)transformers4rec.torch.tabular.base.TabularModule[source]
Initializes a TabularModule instance that keeps only the features named in features and filters out all others.

Parameters
  • features (List[str]) – A list of feature-names that will be used as the first pre-processing op to filter out all other features not in this list.

  • pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).

  • post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).

  • aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.

Returns

Return type

TabularModule

property pre

Return type

SequentialTabularTransformations, optional

property post

Return type

SequentialTabularTransformations, optional

property aggregation

Return type

TabularAggregation, optional

pre_forward(inputs: Dict[str, torch.Tensor], transformations: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None)Dict[str, torch.Tensor][source]

Method that’s typically called before the forward method for pre-processing.

Parameters
  • inputs (TabularData) – input-data, typically the output of the forward method.

  • transformations (TabularAggregationType, optional) –

Returns

Return type

TabularData

forward(x: Dict[str, torch.Tensor], *args, **kwargs)Dict[str, torch.Tensor][source]
post_forward(inputs: Dict[str, torch.Tensor], transformations: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, merge_with: Optional[Union[transformers4rec.torch.tabular.base.TabularModule, List[transformers4rec.torch.tabular.base.TabularModule]]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None)Union[torch.Tensor, Dict[str, torch.Tensor]][source]

Method that’s typically called after the forward method for post-processing.

Parameters
  • inputs (TabularData) – input-data, typically the output of the forward method.

  • transformations (TabularTransformationType, optional) – Transformations to apply on the input data.

  • merge_with (Union[TabularModule, List[TabularModule]], optional) – Other TabularModule’s to call and merge the outputs with.

  • aggregation (TabularAggregationType, optional) – Aggregation to aggregate the output to a single Tensor.

Returns

Return type

TensorOrTabularData (Tensor when aggregation is set, else TabularData)

merge(other)
training: bool
class transformers4rec.torch.SoftEmbedding(num_embeddings, embeddings_dim, emb_initializer=None)[source]

Bases: torch.nn.modules.module.Module

Soft one-hot encoding embedding technique, from https://arxiv.org/pdf/1708.00065.pdf. In a nutshell, it represents a continuous feature as a weighted average of embeddings.
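
The idea of a weighted average of embeddings can be sketched without any framework. The sketch assumes the weights come from a softmax over a learned linear projection of the scalar input; the parameter names proj_weight and proj_bias and the function name are hypothetical, and the values below are arbitrary rather than trained.

```python
import math


def soft_embedding(value, proj_weight, proj_bias, embedding_table):
    """Embed a scalar as a softmax-weighted average of embedding rows."""
    # One score per embedding row, from a linear projection of the scalar.
    scores = [w * value + b for w, b in zip(proj_weight, proj_bias)]
    largest = max(scores)
    exps = [math.exp(score - largest) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(embedding_table[0])
    return [
        sum(weight * row[d] for weight, row in zip(weights, embedding_table))
        for d in range(dim)
    ]


embedding_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 embeddings, dim 2
vector = soft_embedding(0.7, [1.0, -1.0, 0.5], [0.0, 0.0, 0.0], embedding_table)
uniform = soft_embedding(0.7, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], embedding_table)
```

Because the weights are positive and sum to one, every output component stays between the smallest and largest value of the corresponding embedding column.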

forward(input_numeric)[source]
training: bool
class transformers4rec.torch.Trainer(model: transformers4rec.torch.model.base.Model, args: transformers4rec.config.trainer.T4RecTrainingArguments, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, train_dataset_or_path=None, eval_dataset_or_path=None, test_dataset_or_path=None, train_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, eval_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, test_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, callbacks: Optional[List[transformers.trainer_callback.TrainerCallback]] = [], compute_metrics=None, incremental_logging: bool = False, **kwargs)[source]

Bases: transformers.trainer.Trainer

A Trainer specialized for sequential recommendation, including session-based and sequential recommendation.

Parameters
  • model (Model) – The Model defined using Transformers4Rec api.

  • args (T4RecTrainingArguments) – The training arguments needed to setup training and evaluation experiments.

  • schema (Optional[Dataset.schema], optional) – The schema object including features to use and their properties, by default None

  • train_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or Dataset to use for training, by default None

  • eval_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or Dataset to use for evaluation, by default None

  • train_dataloader (Optional[DataLoader], optional) – The data generator to use for training, by default None

  • eval_dataloader (Optional[DataLoader], optional) – The data generator to use for evaluation, by default None

  • compute_metrics (Optional[bool], optional) – Whether to compute metrics defined by Model class or not, by default None

  • incremental_logging (bool) – Whether to enable incremental logging or not. If True, it ensures that global steps are incremented over many trainer.train() calls, so that train and eval metrics steps do not overlap and can be seen properly in reports like W&B and Tensorboard

get_train_dataloader()[source]

Set the train dataloader to use by Trainer. It supports a user-defined data-loader set as an attribute in the constructor. When the attribute is None, the data-loader is defined using train_dataset and the data_loader_engine specified in Training Arguments.

get_eval_dataloader(eval_dataset=None)[source]

Set the eval dataloader to use by Trainer. It supports a user-defined data-loader set as an attribute in the constructor. When the attribute is None, the data-loader is defined using eval_dataset and the data_loader_engine specified in Training Arguments.

get_test_dataloader(test_dataset=None)[source]

Set the test dataloader to use by Trainer. It supports a user-defined data-loader set as an attribute in the constructor. When the attribute is None, the data-loader is defined using test_dataset and the data_loader_engine specified in Training Arguments.

num_examples(dataloader: torch.utils.data.dataloader.DataLoader)[source]

Overriding Trainer.num_examples() because the data loaders for this project return the number of steps rather than the dataset size. The dataset size is therefore estimated as the number of steps multiplied by the batch size.
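
The estimate amounts to a single multiplication, sketched below with a hypothetical helper name and made-up numbers.

```python
def estimate_num_examples(num_steps, batch_size):
    """Estimate the dataset size from the number of batches per epoch."""
    return num_steps * batch_size


estimated = estimate_num_examples(num_steps=125, batch_size=64)
```

If the last batch is smaller than batch_size, this figure slightly overestimates the true number of examples.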

reset_lr_scheduler()None[source]

Resets the LR scheduler of the previous Trainer.train() call, so that a new LR scheduler is created by the next Trainer.train() call. This is important for LR schedules like get_linear_schedule_with_warmup(), which decays the LR to 0 at the end of training.
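
To see why a reset matters, consider a linear schedule with warmup written as a plain multiplier on the base learning rate. This is a simplified sketch with a hypothetical function name, not the scheduler object itself: once the step count reaches num_training_steps the factor is 0, so a further train() call that reused the old scheduler would train with a learning rate of 0.

```python
def linear_warmup_decay_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base LR at a given optimizer step."""
    if step < num_warmup_steps:
        # Linear ramp from 0 to 1 during warmup.
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    # Linear decay from 1 to 0, clamped at 0 after the last step.
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


base_lr = 1e-3
schedule = [base_lr * linear_warmup_decay_factor(s, 2, 10) for s in range(11)]
after_training = linear_warmup_decay_factor(15, 2, 10)
```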

create_scheduler(num_training_steps: int, optimizer: Optional[torch.optim.optimizer.Optimizer] = None)[source]
static get_scheduler(name: Union[str, transformers.trainer_utils.SchedulerType], optimizer: torch.optim.optimizer.Optimizer, num_warmup_steps: Optional[int] = None, num_training_steps: Optional[int] = None, num_cycles: Optional[int] = 0.5)[source]

Unified API to get any scheduler from its name.

Parameters
  • name (str or SchedulerType) – The name of the scheduler to use.

  • optimizer (torch.optim.Optimizer) – The optimizer that will be used during training.

  • num_warmup_steps (int, optional) – The number of warm-up steps to perform. This is not required by all schedulers (hence the argument being optional), the function will raise an error if it’s unset and the scheduler type requires it.

  • num_training_steps (int, optional) – The number of training steps to do. This is not required by all schedulers (hence the argument being optional), the function will raise an error if it’s unset and the scheduler type requires it.

  • num_cycles (int, optional) – The number of waves in the cosine schedule / hard restarts to use for cosine scheduler

compute_loss(model, inputs, return_outputs=False)[source]

Overriding Trainer.compute_loss() to allow passing the targets to the model’s forward method. Defines how the loss is computed by Trainer. By default, all Transformers4Rec models return a dictionary of three elements: 'loss', 'predictions', and 'labels'.

prediction_step(model: torch.nn.modules.module.Module, inputs: Dict[str, torch.Tensor], prediction_loss_only: bool, ignore_keys: Optional[List[str]] = None, training: bool = False, testing: bool = True)Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor], Optional[Dict[str, Any]]][source]

Overriding Trainer.prediction_step() to provide more flexibility to unpack results from the model, such as returning labels that do not correspond exactly to one input feature of the model.

evaluation_loop(dataloader: torch.utils.data.dataloader.DataLoader, description: str, prediction_loss_only: Optional[bool] = None, ignore_keys: Optional[List[str]] = None, metric_key_prefix: Optional[str] = 'eval')transformers.trainer_utils.EvalLoopOutput[source]

Overriding Trainer.prediction_loop() (shared by Trainer.evaluate() and Trainer.predict()) to provide more flexibility to work with streaming metrics (computed at each eval batch) and to log with the outputs of the model (e.g. prediction scores, prediction metadata, attention weights)

Parameters
  • dataloader (DataLoader) – DataLoader object to use to iterate over evaluation data

  • description (str) – Parameter to describe the evaluation experiment. e.g: Prediction, test

  • prediction_loss_only (Optional[bool]) – Whether or not to return the loss only, by default None

  • ignore_keys (Optional[List[str]]) – Columns not accepted by the model.forward() method are automatically removed, by default None

  • metric_key_prefix (Optional[str]) – Prefix to use when logging evaluation metrics, by default eval

load_model_trainer_states_from_checkpoint(checkpoint_path, model=None)[source]

This method loads the checkpoint states of the model, the trainer, and the random number generators. If model is None, the serialized model class is loaded from the checkpoint. It does not load the optimizer and LR scheduler states (for a complete load, call trainer.train() with the resume_from_checkpoint argument).

Parameters
  • checkpoint_path (str) – Path to the checkpoint directory.

  • model (Optional[Model]) – Model class used by Trainer, by default None

property log_predictions_callback
log(logs: Dict[str, float])None[source]
transformers4rec.torch.LabelSmoothCrossEntropyLoss(smoothing: float = 0.0, reduction: str = 'mean', **kwargs)[source]

Cross-entropy loss with label smoothing. This is going to be deprecated. You should use torch.nn.CrossEntropyLoss() directly, which in recent PyTorch versions already supports the label_smoothing argument.

Parameters
  • smoothing (float) – The label smoothing factor. Specify a value between 0 and 1.

  • reduction (str) – Specifies the reduction to apply to the output. Specify one of none, sum, or mean.

  • from https (Adapted) –