transformers4rec.torch package
Subpackages
- transformers4rec.torch.block package
- transformers4rec.torch.features package
- Submodules
- transformers4rec.torch.features.base module
- transformers4rec.torch.features.continuous module
- transformers4rec.torch.features.embedding module
- transformers4rec.torch.features.sequence module
- transformers4rec.torch.features.tabular module
- transformers4rec.torch.features.text module
- Module contents
 
- transformers4rec.torch.model package
- transformers4rec.torch.tabular package
- transformers4rec.torch.utils package
Submodules
transformers4rec.torch.masking module
- 
class transformers4rec.torch.masking.MaskingInfo(schema: torch.Tensor, targets: torch.Tensor)[source]
- Bases: object
 - 
schema: torch.Tensor
 - 
targets: torch.Tensor
 
- 
- 
class transformers4rec.torch.masking.MaskSequence(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, **kwargs)[source]
- Bases: transformers4rec.torch.utils.torch_utils.OutputSizeMixin, torch.nn.modules.module.Module
Base class to prepare masked item inputs/labels for language modeling tasks.
Transformer architectures can be trained in different ways. Depending on the training method, there is a specific masking schema. The masking schema sets the items to be predicted (labels) and masks (hides) their positions in the sequence so that they are not used by the Transformer layers for prediction.
We currently provide 4 different masking schemes out of the box:
- Causal LM (clm) 
- Masked LM (mlm) 
- Permutation LM (plm) 
- Replacement Token Detection (rtd) 
 
This class can be extended to add a different masking scheme.
- Parameters
- hidden_size (int) – The hidden dimension of input tensors, needed to initialize the trainable vector of masked positions. 
- padding_idx (int, default = 0) – Index of the padding token used for getting batches of sequences with the same length. 
 
 - 
compute_masked_targets(item_ids: torch.Tensor, training=False) → transformers4rec.torch.masking.MaskingInfo[source]
- Method to prepare masked labels based on the sequence of item ids. It returns the true labels of masked positions and the related boolean mask. The class attributes mask_schema and masked_targets are also updated so that they can be reused in other modules.
- Parameters
- item_ids (torch.Tensor) – The sequence of input item ids used for deriving labels of next item prediction task. 
- training (bool) – Flag to indicate whether we are in Training mode or not. During training, the labels can be any items within the sequence based on the selected masking task. During evaluation, we are predicting the last item in the sequence. 
 
- Returns
- Return type
- Tuple[MaskingSchema, MaskedTargets] 
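As a rough illustration of the training/evaluation distinction described above, the following pure-Python sketch derives next-item labels and a boolean mask from right-padded sequences. The function name `next_item_targets` and the list-based representation are assumptions made for this example; the library itself works on `torch.Tensor` objects and its exact logic depends on the masking class.

```python
from typing import List, Tuple


def next_item_targets(
    item_ids: List[List[int]], padding_idx: int = 0, training: bool = False
) -> Tuple[List[List[bool]], List[List[int]]]:
    """Return (mask, targets) for a batch of right-padded item-id sequences."""
    masks, targets = [], []
    for seq in item_ids:
        # Shift left: the label at position t is the item at position t + 1.
        labels = seq[1:] + [padding_idx]
        mask = [label != padding_idx for label in labels]
        if not training:
            # Evaluation: keep only the last real label of the sequence.
            last = max((i for i, m in enumerate(mask) if m), default=None)
            mask = [i == last for i in range(len(mask))]
            labels = [lab if keep else padding_idx for lab, keep in zip(labels, mask)]
        masks.append(mask)
        targets.append(labels)
    return masks, targets


batch = [[5, 6, 7, 0], [3, 4, 0, 0]]
train_mask, train_targets = next_item_targets(batch, training=True)
eval_mask, eval_targets = next_item_targets(batch, training=False)
```

During training every non-padded next item is a label; during evaluation only the last one is kept.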
 
 - 
apply_mask_to_inputs(inputs: torch.Tensor, schema: torch.Tensor) → torch.Tensor[source]
- Control the masked positions in the inputs by replacing the true interaction with a learnable masked embedding.
- Parameters
- inputs (torch.Tensor) – The 3-D tensor of interaction embeddings resulting from the ops: TabularFeatures + aggregation + projection(optional) 
- schema (MaskingSchema) – The boolean mask indicating masked positions. 
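A minimal sketch of the replacement step, using nested Python lists in place of a 3-D tensor. The helper name `apply_mask` and the fixed `masked_embedding` vector are hypothetical; in the library the masked embedding is a trainable parameter.

```python
from typing import List

Vector = List[float]


def apply_mask(
    inputs: List[List[Vector]], schema: List[List[bool]], masked_embedding: Vector
) -> List[List[Vector]]:
    """Replace the embedding at every masked position with masked_embedding."""
    return [
        [
            list(masked_embedding) if is_masked else list(vec)
            for vec, is_masked in zip(seq, seq_mask)
        ]
        for seq, seq_mask in zip(inputs, schema)
    ]


# One sequence of two interactions, each with a 2-dimensional embedding.
inputs = [[[1.0, 2.0], [3.0, 4.0]]]
schema = [[False, True]]  # hide the second position
masked = apply_mask(inputs, schema, masked_embedding=[0.5, 0.5])
```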
 
 
 - 
predict_all(item_ids: torch.Tensor) → transformers4rec.torch.masking.MaskingInfo[source]
- Prepare labels for all next-item predictions instead of last-item predictions in a user’s sequence.
- Parameters
- item_ids (torch.Tensor) – The sequence of input item ids used for deriving labels of next item prediction task. 
- Returns
- Return type
- Tuple[MaskingSchema, MaskedTargets] 
 
 - 
forward(inputs: torch.Tensor, item_ids: torch.Tensor, training=False) → torch.Tensor[source]
 - 
property transformer_arguments
- Prepare additional arguments to pass to the Transformer forward methods. 
 
- 
class transformers4rec.torch.masking.CausalLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, train_on_last_item_seq_only: bool = False, **kwargs)[source]
- Bases: transformers4rec.torch.masking.MaskSequence
In Causal Language Modeling (clm) you predict the next item based on past positions of the sequence. Future positions are masked.
- Parameters
- hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions. 
- padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length 
- eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation 
- train_on_last_item_seq_only (bool, default = False) – Predict only last item during training 
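To make "future positions are masked" concrete, here is a small sketch of a causal visibility matrix, where position i may only attend to positions j <= i. The function name `causal_mask` is hypothetical and this is only an illustration of the idea, not the library's internal representation.

```python
from typing import List


def causal_mask(seq_len: int) -> List[List[bool]]:
    """Row i marks which positions are visible when predicting from position i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


for row in causal_mask(3):
    print(row)
# [True, False, False]
# [True, True, False]
# [True, True, True]
```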
 
 - 
apply_mask_to_inputs(inputs: torch.Tensor, mask_schema: torch.Tensor) → torch.Tensor[source]
 
- 
class transformers4rec.torch.masking.MaskedLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, mlm_probability: float = 0.15, **kwargs)[source]
- Bases: transformers4rec.torch.masking.MaskSequence
In Masked Language Modeling (mlm) you randomly select some positions of the sequence to be predicted, which are masked. During training, the Transformer layer is allowed to use positions on the right (future info). During inference, all past items are visible for the Transformer layer, which tries to predict the next item.
- Parameters
- hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions. 
- padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length 
- eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation 
- mlm_probability (Optional[float], default = 0.15) – Probability of an item to be selected (masked) as a label of the given sequence. Note: at least one item is masked in each sequence, so that the network can learn from it. 
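The selection rule for mlm_probability, including the "at least one masked item per sequence" guarantee, can be sketched as follows. The function name `sample_mlm_mask` and the list-based inputs are assumptions for this example; the actual implementation operates on tensors and may apply further constraints.

```python
import random
from typing import List, Optional


def sample_mlm_mask(
    item_ids: List[List[int]],
    mlm_probability: float = 0.15,
    padding_idx: int = 0,
    rng: Optional[random.Random] = None,
) -> List[List[bool]]:
    """Pick positions to mask; padding is never masked."""
    rng = rng or random.Random()
    mask = []
    for seq in item_ids:
        real = [i for i, item in enumerate(seq) if item != padding_idx]
        chosen = {i for i in real if rng.random() < mlm_probability}
        if not chosen and real:
            # Enforce at least one masked item per non-empty sequence.
            chosen = {rng.choice(real)}
        mask.append([i in chosen for i in range(len(seq))])
    return mask


mask = sample_mlm_mask([[5, 6, 7, 0], [3, 0, 0, 0]], rng=random.Random(0))
```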
 
 
- 
class transformers4rec.torch.masking.PermutationLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, plm_probability: float = 0.16666666666666666, max_span_length: int = 5, permute_all: bool = False, **kwargs)[source]
- Bases: transformers4rec.torch.masking.MaskSequence
In Permutation Language Modeling (plm) you use a permutation factorization at the level of the self-attention layer to define the accessible bidirectional context.
- Parameters
- hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions. 
- padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length 
- eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation 
- max_span_length (int) – maximum length of a span of masked items 
- plm_probability (float) – The ratio of surrounding items to unmask to define the context of the span-based prediction segment of items 
- permute_all (bool) – Whether to permute all items (True) or to compute partial span-based prediction (False). 
 
 - 
compute_masked_targets(item_ids: torch.Tensor, training=False) → transformers4rec.torch.masking.MaskingInfo[source]
 
- 
class transformers4rec.torch.masking.ReplacementLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, sample_from_batch: bool = False, **kwargs)[source]
- Bases: transformers4rec.torch.masking.MaskedLanguageModeling
In Replacement Language Modeling (rtd) you use MLM to randomly select some items, but replace them with random tokens. Then, a discriminator model (which may or may not share its weights with the generator) is asked to classify whether the item at each position belongs to the original sequence or not. The generator-discriminator architecture is jointly trained using the Masked LM and RTD tasks.
- Parameters
- hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions. 
- padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length 
- eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation 
- sample_from_batch (bool) – Whether to sample replacement item ids from the same batch or not 
 
 - 
get_fake_tokens(itemid_seq, target_flat, logits)[source]
- The second task of RTD is a binary classification that trains the discriminator. The task consists of generating fake data by replacing [MASK] positions with random items; the ELECTRA discriminator learns to detect the fake replacements.
- Parameters
- itemid_seq (torch.Tensor of shape (bs, max_seq_len)) – input sequence of item ids 
- target_flat (torch.Tensor of shape (bs*max_seq_len)) – flattened masked label sequences 
- logits (torch.Tensor of shape (#pos_item, vocab_size or #pos_item)) – mlm probabilities of positive items computed by the generator model. The logits are over the whole corpus if sample_from_batch = False, and over the positive (masked) items of the current batch otherwise. 
 
- Returns
- corrupted_inputs (torch.Tensor of shape (bs, max_seq_len)) – input sequence of item ids with fake replacement 
- discriminator_labels (torch.Tensor of shape (bs, max_seq_len)) – binary labels to distinguish between original and replaced items 
- batch_updates (torch.Tensor of shape (#pos_item)) – the indices of replacement item within the current batch if sample_from_batch is enabled 
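The corruption step can be pictured with a short list-based sketch: masked positions receive replacement items, and the discriminator label is 1 wherever the resulting item differs from the original. The helper name `corrupt_sequence` and the `(row, col)` position format are hypothetical; treating a replacement equal to the original as "not fake" follows the usual ELECTRA convention and is an assumption here.

```python
from typing import List, Tuple


def corrupt_sequence(
    itemid_seq: List[List[int]],
    masked_positions: List[Tuple[int, int]],
    replacements: List[int],
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (corrupted_inputs, discriminator_labels)."""
    corrupted = [list(seq) for seq in itemid_seq]
    for (row, col), new_item in zip(masked_positions, replacements):
        corrupted[row][col] = new_item
    # Label 1 marks a position whose item no longer matches the original.
    labels = [
        [int(new != old) for new, old in zip(new_seq, old_seq)]
        for new_seq, old_seq in zip(corrupted, itemid_seq)
    ]
    return corrupted, labels


corrupted, labels = corrupt_sequence([[1, 2, 3]], [(0, 1)], [9])
print(corrupted)  # [[1, 9, 3]]
print(labels)     # [[0, 1, 0]]
```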
 
 
 - 
sample_from_softmax(logits: torch.Tensor) → torch.Tensor[source]
- Sampling method for replacement token modeling (ELECTRA).
- Parameters
- logits (torch.Tensor(pos_item, vocab_size)) – probability scores of masked positions returned by the generator model 
- Returns
- samples – ids of replacement items. 
- Return type
- torch.Tensor(#pos_item) 
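As an illustration of sampling one replacement id per masked position, the sketch below converts each row of logits into softmax weights and draws an index with `random.choices`. This is a stdlib approximation written for this page; the library's own method works on tensors and may use a different sampling technique.

```python
import math
import random
from typing import List, Optional


def sample_from_softmax(
    logits: List[List[float]], rng: Optional[random.Random] = None
) -> List[int]:
    """Draw one item id per row, with probability proportional to softmax(row)."""
    rng = rng or random.Random()
    samples = []
    for row in logits:
        top = max(row)
        # Subtracting the max keeps exp() numerically stable.
        weights = [math.exp(x - top) for x in row]
        samples.append(rng.choices(range(len(row)), weights=weights, k=1)[0])
    return samples


samples = sample_from_softmax([[0.1, 2.0, -1.0], [1.5, 0.2, 0.3]], rng=random.Random(0))
```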
 
 
transformers4rec.torch.ranking_metric module
- 
class transformers4rec.torch.ranking_metric.RankingMetric(top_ks=None, labels_onehot=False)[source]
- Bases: torchmetrics.metric.Metric
Metric wrapper for computing ranking metrics@K for session-based tasks.
- Parameters
 - 
update(preds: torch.Tensor, target: torch.Tensor, **kwargs)[source]
 
- 
class transformers4rec.torch.ranking_metric.AvgPrecisionAt(top_ks=None, labels_onehot=False)[source]
transformers4rec.torch.trainer module
- 
class transformers4rec.torch.trainer.Trainer(model: transformers4rec.torch.model.base.Model, args: transformers4rec.config.trainer.T4RecTrainingArguments, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, train_dataset_or_path=None, eval_dataset_or_path=None, train_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, eval_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, callbacks: Optional[List[transformers.trainer_callback.TrainerCallback]] = [], compute_metrics=None, incremental_logging: bool = False, **kwargs)[source]
- Bases: transformers.trainer.Trainer
A Trainer specialized for sequential recommendation, including session-based and sequential recommendation.
- Parameters
- model (Model) – The Model defined using Transformers4Rec api. 
- args (T4RecTrainingArguments) – The training arguments needed to setup training and evaluation experiments. 
- schema (Optional[Dataset.schema], optional) – The schema object including features to use and their properties. by default None 
- train_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or DataSet to use for training. by default None 
- eval_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or DataSet to use for evaluation. by default None 
- train_dataloader (Optional[DataLoader], optional) – The data generator to use for training. by default None 
- eval_dataloader (Optional[DataLoader], optional) – The data generator to use for evaluation. by default None 
- compute_metrics (Optional[bool], optional) – Whether to compute metrics defined by Model class or not. by default None 
- incremental_logging (bool) – Whether to enable incremental logging or not. If True, it ensures that global steps are incremented over many trainer.train() calls, so that train and eval metrics steps do not overlap and can be seen properly in reports like W&B and Tensorboard 
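The idea behind incremental_logging can be sketched with a small step-offset tracker: each run's local steps are shifted by the number of steps already logged, so consecutive runs never reuse a step number. The class `StepOffsetLogger` is purely hypothetical and does not mirror the library's callback implementation.

```python
from typing import Any, List, Tuple


class StepOffsetLogger:
    """Keeps global steps increasing across several training runs."""

    def __init__(self) -> None:
        self.offset = 0
        self.records: List[Tuple[int, Any]] = []

    def log_run(self, metrics_per_step: List[Any]) -> None:
        # Local steps start at 1 in each run; the offset makes them global.
        for local_step, metric in enumerate(metrics_per_step, start=1):
            self.records.append((self.offset + local_step, metric))
        self.offset += len(metrics_per_step)


logger = StepOffsetLogger()
logger.log_run([0.9, 0.7])  # first train() call
logger.log_run([0.6])       # second train() call continues at step 3
print(logger.records)       # [(1, 0.9), (2, 0.7), (3, 0.6)]
```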
 
 - 
get_train_dataloader()[source]
- Set the train dataloader to use by Trainer. It supports a user-defined data loader set as an attribute in the constructor. When the attribute is None, the data loader is defined using train_dataset and the data_loader_engine specified in the training arguments. 
 - 
get_eval_dataloader(eval_dataset=None)[source]
- Set the eval dataloader to use by Trainer. It supports a user-defined data loader set as an attribute in the constructor. When the attribute is None, the data loader is defined using eval_dataset and the data_loader_engine specified in the training arguments. 
 - 
num_examples(dataloader: torch.utils.data.dataloader.DataLoader)[source]
- Overriding the Trainer.num_examples() method because the data loaders for this project do not return the dataset size, but the number of steps. So we estimate the dataset size here by multiplying the number of steps by the batch size.
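The estimate amounts to the arithmetic below. `StepCountingLoader` is a hypothetical stand-in for a data loader whose `len()` returns the number of steps rather than the number of rows.

```python
class StepCountingLoader:
    """Hypothetical loader: len() gives the number of batches (steps)."""

    def __init__(self, num_steps: int, batch_size: int) -> None:
        self.num_steps = num_steps
        self.batch_size = batch_size

    def __len__(self) -> int:
        return self.num_steps


def estimate_num_examples(loader: StepCountingLoader) -> int:
    # Steps times batch size; this overestimates if the last batch is partial.
    return len(loader) * loader.batch_size


print(estimate_num_examples(StepCountingLoader(num_steps=125, batch_size=8)))  # 1000
```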
 - 
reset_lr_scheduler() → None[source]
- Resets the LR scheduler of the previous Trainer.train() call, so that a new LR scheduler is created by the next Trainer.train() call. This is important for LR schedules like get_linear_schedule_with_warmup(), which decays the LR to 0 at the end of training.
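To see why a stale scheduler is a problem, consider a sketch of the learning-rate multiplier used by a linear schedule with warmup. The function `linear_warmup_decay` is written for this page as an approximation of that schedule shape; once the step count reaches the planned total, the multiplier stays at 0, so a second training run that reused the scheduler would not learn.

```python
def linear_warmup_decay(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given step."""
    if step < num_warmup_steps:
        # Linear ramp from 0 to 1 during warmup.
        return step / max(1, num_warmup_steps)
    # Linear decay from 1 to 0 over the remaining steps, clamped at 0.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


print(linear_warmup_decay(10, 10, 100))   # 1.0
print(linear_warmup_decay(55, 10, 100))   # 0.5
print(linear_warmup_decay(100, 10, 100))  # 0.0
print(linear_warmup_decay(150, 10, 100))  # 0.0
```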
 - 
create_scheduler(num_training_steps: int, optimizer: Optional[torch.optim.optimizer.Optimizer] = None)[source]
 - 
static get_scheduler(name: Union[str, transformers.trainer_utils.SchedulerType], optimizer: torch.optim.optimizer.Optimizer, num_warmup_steps: Optional[int] = None, num_training_steps: Optional[int] = None, num_cycles: Optional[int] = 0.5)[source]
- Unified API to get any scheduler from its name.
- Parameters
- name (str or SchedulerType) – The name of the scheduler to use.
- optimizer (torch.optim.Optimizer) – The optimizer that will be used during training.
- num_warmup_steps (int, optional) – The number of warmup steps to do. This is not required by all schedulers (hence the argument being optional); the function will raise an error if it is unset and the scheduler type requires it.
- num_training_steps (int, optional) – The number of training steps to do. This is not required by all schedulers (hence the argument being optional); the function will raise an error if it is unset and the scheduler type requires it.
- num_cycles (int, optional) – The number of waves in the cosine schedule / hard restarts to use for the cosine scheduler.
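The effect of num_cycles can be illustrated with a sketch of a cosine multiplier with warmup. The function `cosine_multiplier` is an approximation written for this page, not the library's code: with the default of 0.5 the curve is a single half-cosine from 1 down to 0, while larger values produce more waves.

```python
import math


def cosine_multiplier(
    step: int, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5
) -> float:
    """Multiplier applied to the base learning rate at a given step."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


# Half a cycle: start of decay, midpoint, and end of training.
for step in (10, 55, 100):
    print(step, round(cosine_multiplier(step, 10, 100), 3))
```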
 
 
 - 
prediction_step(model: torch.nn.modules.module.Module, inputs: Dict[str, torch.Tensor], prediction_loss_only: bool, ignore_keys: Optional[List[str]] = None) → Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor], Optional[Dict[str, Any]]][source]
- Overriding Trainer.prediction_step() to provide more flexibility to unpack results from the model, like returning labels that are not exactly one input feature model
 - 
evaluation_loop(dataloader: torch.utils.data.dataloader.DataLoader, description: str, prediction_loss_only: Optional[bool] = None, ignore_keys: Optional[List[str]] = None, metric_key_prefix: Optional[str] = 'eval') → transformers.trainer_utils.EvalLoopOutput[source]
- Overriding Trainer.prediction_loop() (shared by Trainer.evaluate() and Trainer.predict()) to provide more flexibility to work with streaming metrics (computed at each eval batch) and to log the outputs of the model (e.g. prediction scores, prediction metadata, attention weights).
- Parameters
- dataloader (DataLoader) – DataLoader object to use to iterate over evaluation data 
- description (str) – Parameter to describe the evaluation experiment. e.g: Prediction, test 
- prediction_loss_only (Optional[bool]) – Whether or not to return the loss only. by default None 
- ignore_keys (Optional[List[str]]) – Columns not accepted by the model.forward() method are automatically removed. by default None
- metric_key_prefix (Optional[str]) – Prefix to use when logging evaluation metrics. by default eval 
 
 
 - 
load_model_trainer_states_from_checkpoint(checkpoint_path, model=None)[source]
- This method loads the checkpoint states of the model, the trainer and the random states. If model is None, the serialized model class is loaded from the checkpoint. It does not load the optimizer and LR scheduler states (for that, call trainer.train() with the resume_from_checkpoint argument for a complete load). 
 - 
property log_predictions_callback
 
- 
class transformers4rec.torch.trainer.IncrementalLoggingCallback(trainer: transformers4rec.torch.trainer.Trainer)[source]
- Bases: transformers.trainer_callback.TrainerCallback
A TrainerCallback that changes the state of the Trainer on specific hooks for the purpose of incremental logging.
- Parameters
- trainer (Trainer) – 
transformers4rec.torch.typing module
Module contents
- 
class transformers4rec.torch.Schema(feature: Sequence[merlin_standard_lib.proto.schema_bp.Feature] = <betterproto._PLACEHOLDER object>, sparse_feature: List[merlin_standard_lib.proto.schema_bp.SparseFeature] = <betterproto._PLACEHOLDER object>, weighted_feature: List[merlin_standard_lib.proto.schema_bp.WeightedFeature] = <betterproto._PLACEHOLDER object>, string_domain: List[merlin_standard_lib.proto.schema_bp.StringDomain] = <betterproto._PLACEHOLDER object>, float_domain: List[merlin_standard_lib.proto.schema_bp.FloatDomain] = <betterproto._PLACEHOLDER object>, int_domain: List[merlin_standard_lib.proto.schema_bp.IntDomain] = <betterproto._PLACEHOLDER object>, default_environment: List[str] = <betterproto._PLACEHOLDER object>, annotation: merlin_standard_lib.proto.schema_bp.Annotation = <betterproto._PLACEHOLDER object>, dataset_constraints: merlin_standard_lib.proto.schema_bp.DatasetConstraints = <betterproto._PLACEHOLDER object>, tensor_representation_group: Dict[str, merlin_standard_lib.proto.schema_bp.TensorRepresentationGroup] = <betterproto._PLACEHOLDER object>)[source]
- Bases: merlin_standard_lib.proto.schema_bp._Schema
A collection of column schemas for a dataset.
 - 
feature: List[merlin_standard_lib.schema.schema.ColumnSchema] = Field(name=None,type=None,default=<betterproto._PLACEHOLDER object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({'betterproto': FieldMetadata(number=1, proto_type='message', map_types=None, group=None, wraps=None)}),_field_type=None)
 - 
classmethod create(column_schemas: Optional[Union[List[Union[merlin_standard_lib.schema.schema.ColumnSchema, str]], Dict[str, Union[merlin_standard_lib.schema.schema.ColumnSchema, str]]]] = None, **kwargs)[source]
 - 
apply(selector) → merlin_standard_lib.schema.schema.Schema[source]
 - 
apply_inverse(selector) → merlin_standard_lib.schema.schema.Schema[source]
 - 
select_by_type(to_select) → merlin_standard_lib.schema.schema.Schema[source]
 - 
remove_by_type(to_remove) → merlin_standard_lib.schema.schema.Schema[source]
 - 
select_by_tag(to_select) → merlin_standard_lib.schema.schema.Schema[source]
 - 
remove_by_tag(to_remove) → merlin_standard_lib.schema.schema.Schema[source]
 - 
select_by_name(to_select) → merlin_standard_lib.schema.schema.Schema[source]
 - 
remove_by_name(to_remove) → merlin_standard_lib.schema.schema.Schema[source]
 - 
map_column_schemas(map_fn: Callable[[merlin_standard_lib.schema.schema.ColumnSchema], merlin_standard_lib.schema.schema.ColumnSchema]) → merlin_standard_lib.schema.schema.Schema[source]
 - 
filter_column_schemas(filter_fn: Callable[[merlin_standard_lib.schema.schema.ColumnSchema], bool], negate=False) → merlin_standard_lib.schema.schema.Schema[source]
 - 
property column_names
 - 
property column_schemas
 - 
property item_id_column_name
 - 
from_json(value: Union[str, bytes]) → merlin_standard_lib.schema.schema.Schema[source]
 - 
from_proto_text(path_or_proto_text: str) → merlin_standard_lib.schema.schema.Schema[source]
 - 
copy(**kwargs) → merlin_standard_lib.schema.schema.Schema[source]
 - 
add(other, allow_overlap=True) → merlin_standard_lib.schema.schema.Schema[source]
 
- 
- 
class transformers4rec.torch.Tag(value)[source]
- Bases: enum.Enum
An enumeration.
 - 
CATEGORICAL= 'categorical'
 - 
CONTINUOUS= 'continuous'
 - 
LIST= 'list'
 - 
TEXT= 'text'
 - 
TEXT_TOKENIZED= 'text_tokenized'
 - 
TIME= 'time'
 - 
USER= 'user'
 - 
USER_ID= 'user_id'
 - 
ITEM= 'item'
 - 
ITEM_ID= 'item_id'
 - 
SESSION= 'session'
 - 
SESSION_ID= 'session_id'
 - 
CONTEXT= 'context'
 - 
TARGETS= 'target'
 - 
BINARY_CLASSIFICATION= 'binary_classification'
 - 
MULTI_CLASS_CLASSIFICATION= 'multi_class'
 - 
REGRESSION= 'regression'
 
- 
- 
class transformers4rec.torch.T4RecConfig[source]
- Bases: object
 - 
to_torch_model(input_features, *prediction_task, task_blocks=None, task_weights=None, loss_reduction='mean', **kwargs)[source]
 - 
to_tf_model(input_features, *prediction_task, task_blocks=None, task_weights=None, loss_reduction=<function reduce_mean>, **kwargs)[source]
 - 
property transformers_config_cls
 
- 
- 
class transformers4rec.torch.GPT2Config(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12, n_head=12, n_inner=None, activation_function='gelu_new', resid_pdrop=0.1, embd_pdrop=0.1, attn_pdrop=0.1, layer_norm_epsilon=1e-05, initializer_range=0.02, summary_type='cls_index', summary_use_proj=True, summary_activation=None, summary_proj_to_labels=True, summary_first_dropout=0.1, scale_attn_weights=True, use_cache=True, bos_token_id=50256, eos_token_id=50256, scale_attn_by_inverse_layer_idx=False, reorder_and_upcast_attn=False, **kwargs)[source]
- Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.gpt2.configuration_gpt2.GPT2Config
- 
class transformers4rec.torch.XLNetConfig(vocab_size=32000, d_model=1024, n_layer=24, n_head=16, d_inner=4096, ff_activation='gelu', untie_r=True, attn_type='bi', initializer_range=0.02, layer_norm_eps=1e-12, dropout=0.1, mem_len=512, reuse_len=None, use_mems_eval=True, use_mems_train=False, bi_data=False, clamp_len=- 1, same_length=False, summary_type='last', summary_use_proj=True, summary_activation='tanh', summary_last_dropout=0.1, start_n_top=5, end_n_top=5, pad_token_id=5, bos_token_id=1, eos_token_id=2, **kwargs)[source]
- Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.xlnet.configuration_xlnet.XLNetConfig
- 
class transformers4rec.torch.TransfoXLConfig(vocab_size=267735, cutoffs=[20000, 40000, 200000], d_model=1024, d_embed=1024, n_head=16, d_head=64, d_inner=4096, div_val=4, pre_lnorm=False, n_layer=18, mem_len=1600, clamp_len=1000, same_length=True, proj_share_all_but_first=True, attn_type=0, sample_softmax=- 1, adaptive=True, dropout=0.1, dropatt=0.0, untie_r=True, init='normal', init_range=0.01, proj_init_std=0.01, init_std=0.02, layer_norm_epsilon=1e-05, eos_token_id=0, **kwargs)[source]
- Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.transfo_xl.configuration_transfo_xl.TransfoXLConfig
- 
class transformers4rec.torch.LongformerConfig(attention_window: Union[List[int], int] = 512, sep_token_id: int = 2, **kwargs)[source]
- Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.longformer.configuration_longformer.LongformerConfig
- 
class transformers4rec.torch.AlbertConfig(vocab_size=30000, embedding_size=128, hidden_size=4096, num_hidden_layers=12, num_hidden_groups=1, num_attention_heads=64, intermediate_size=16384, inner_group_num=1, hidden_act='gelu_new', hidden_dropout_prob=0, attention_probs_dropout_prob=0, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, classifier_dropout_prob=0.1, position_embedding_type='absolute', pad_token_id=0, bos_token_id=2, eos_token_id=3, **kwargs)[source]
- Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.albert.configuration_albert.AlbertConfig
- 
class transformers4rec.torch.ReformerConfig(attention_head_size=64, attn_layers=['local', 'lsh', 'local', 'lsh', 'local', 'lsh'], axial_norm_std=1.0, axial_pos_embds=True, axial_pos_shape=[64, 64], axial_pos_embds_dim=[64, 192], chunk_size_lm_head=0, eos_token_id=2, feed_forward_size=512, hash_seed=None, hidden_act='relu', hidden_dropout_prob=0.05, hidden_size=256, initializer_range=0.02, is_decoder=False, layer_norm_eps=1e-12, local_num_chunks_before=1, local_num_chunks_after=0, local_attention_probs_dropout_prob=0.05, local_attn_chunk_length=64, lsh_attn_chunk_length=64, lsh_attention_probs_dropout_prob=0.0, lsh_num_chunks_before=1, lsh_num_chunks_after=0, max_position_embeddings=4096, num_attention_heads=12, num_buckets=None, num_hashes=1, pad_token_id=0, vocab_size=320, tie_word_embeddings=False, use_cache=True, classifier_dropout=None, **kwargs)[source]
- Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.reformer.configuration_reformer.ReformerConfig
- 
class transformers4rec.torch.ElectraConfig(vocab_size=30522, embedding_size=128, hidden_size=256, num_hidden_layers=12, num_attention_heads=4, intermediate_size=1024, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, summary_type='first', summary_use_proj=True, summary_activation='gelu', summary_last_dropout=0.1, pad_token_id=0, position_embedding_type='absolute', use_cache=True, classifier_dropout=None, **kwargs)[source]
- Bases: transformers4rec.config.transformer.T4RecConfig, transformers.models.electra.configuration_electra.ElectraConfig
- 
class transformers4rec.torch.T4RecTrainingArguments(output_dir: str, overwrite_output_dir: bool = False, do_train: bool = False, do_eval: bool = False, do_predict: bool = False, evaluation_strategy: transformers.trainer_utils.IntervalStrategy = 'no', prediction_loss_only: bool = False, per_device_train_batch_size: int = 8, per_device_eval_batch_size: int = 8, per_gpu_train_batch_size: Optional[int] = None, per_gpu_eval_batch_size: Optional[int] = None, gradient_accumulation_steps: int = 1, eval_accumulation_steps: Optional[int] = None, eval_delay: Optional[float] = 0, learning_rate: float = 5e-05, weight_decay: float = 0.0, adam_beta1: float = 0.9, adam_beta2: float = 0.999, adam_epsilon: float = 1e-08, max_grad_norm: float = 1.0, num_train_epochs: float = 3.0, max_steps: int = - 1, lr_scheduler_type: transformers.trainer_utils.SchedulerType = 'linear', warmup_ratio: float = 0.0, warmup_steps: int = 0, log_level: Optional[str] = 'passive', log_level_replica: Optional[str] = 'passive', log_on_each_node: bool = True, logging_dir: Optional[str] = None, logging_strategy: transformers.trainer_utils.IntervalStrategy = 'steps', logging_first_step: bool = False, logging_steps: int = 500, logging_nan_inf_filter: bool = True, save_strategy: transformers.trainer_utils.IntervalStrategy = 'steps', save_steps: int = 500, save_total_limit: Optional[int] = None, save_on_each_node: bool = False, no_cuda: bool = False, seed: int = 42, data_seed: Optional[int] = None, bf16: bool = False, fp16: bool = False, fp16_opt_level: str = 'O1', half_precision_backend: str = 'auto', bf16_full_eval: bool = False, fp16_full_eval: bool = False, tf32: Optional[bool] = None, local_rank: int = - 1, xpu_backend: Optional[str] = None, tpu_num_cores: Optional[int] = None, tpu_metrics_debug: bool = False, debug: str = '', dataloader_drop_last: bool = False, eval_steps: Optional[int] = None, dataloader_num_workers: int = 0, past_index: int = - 1, run_name: Optional[str] = None, disable_tqdm: 
Optional[bool] = None, remove_unused_columns: Optional[bool] = True, label_names: Optional[List[str]] = None, load_best_model_at_end: Optional[bool] = False, metric_for_best_model: Optional[str] = None, greater_is_better: Optional[bool] = None, ignore_data_skip: bool = False, sharded_ddp: str = '', deepspeed: Optional[str] = None, label_smoothing_factor: float = 0.0, optim: transformers.training_args.OptimizerNames = 'adamw_hf', adafactor: bool = False, group_by_length: bool = False, length_column_name: Optional[str] = 'length', report_to: Optional[List[str]] = None, ddp_find_unused_parameters: Optional[bool] = None, ddp_bucket_cap_mb: Optional[int] = None, dataloader_pin_memory: bool = True, skip_memory_metrics: bool = True, use_legacy_prediction_loop: bool = False, push_to_hub: bool = False, resume_from_checkpoint: Optional[str] = None, hub_model_id: Optional[str] = None, hub_strategy: transformers.trainer_utils.HubStrategy = 'every_save', hub_token: Optional[str] = None, gradient_checkpointing: bool = False, fp16_backend: str = 'auto', push_to_hub_model_id: Optional[str] = None, push_to_hub_organization: Optional[str] = None, push_to_hub_token: Optional[str] = None, mp_parameters: str = '', max_sequence_length: Optional[int] = None, shuffle_buffer_size: int = 0, data_loader_engine: str = 'nvtabular', eval_on_test_set: bool = False, eval_steps_on_train_set: int = 20, predict_top_k: int = 10, learning_rate_num_cosine_cycles_by_epoch: float = 1.25, log_predictions: bool = False, compute_metrics_each_n_steps: int = 1, experiments_group: str = 'default')[source]
- Bases: transformers.training_args.TrainingArguments
Class that inherits HF TrainingArguments and adds on top of it the arguments needed for session-based and sequential recommendation.
- Parameters
- shuffle_buffer_size (int) – 
- validate_every (Optional[int], int) – Run validation every this many epochs. -1 means no validation is used. by default -1 
- eval_on_test_set (bool) – 
- eval_steps_on_train_set (int) – 
- predict_top_k (Optional[int], int) – Truncate the recommendation list to the top-K predicted items (does not affect the computation of evaluation metrics). by default 10 
- log_predictions (Optional[bool], bool) – Log predictions, labels and metadata features every --compute_metrics_each_n_steps (for test set). by default False 
- log_attention_weights (Optional[bool], bool) – Log the inputs and attention weights every --eval_steps (test set only). by default False 
- learning_rate_num_cosine_cycles_by_epoch (Optional[float], float) – Number of cycles by epoch when --lr_scheduler_type = cosine_with_warmup. The number of waves in the cosine schedule (e.g. 0.5 is to just decrease from the max value to 0, following a half-cosine). by default 1.25 
- experiments_group (Optional[str], str) – Name of the Experiments Group, for organizing job runs logged on W&B. by default “default” 
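The truncation controlled by predict_top_k can be sketched as ranking item indices by score and keeping the first K. The helper `top_k_items` is a hypothetical, list-based illustration; the library operates on score tensors.

```python
from typing import List


def top_k_items(scores: List[float], k: int = 10) -> List[int]:
    """Return the indices of the k highest-scoring items, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


print(top_k_items([0.1, 0.9, 0.3, 0.7], k=2))  # [1, 3]
```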
 
 - 
property place_model_on_device
- Override the method to allow running training on CPU. 
 
- 
class transformers4rec.torch.SequentialBlock(*args, output_size=None)[source]
- Bases: transformers4rec.torch.block.base.BlockBase, torch.nn.modules.container.Sequential
 - 
property inputs
 
- 
property 
- 
class transformers4rec.torch.BlockBase[source]
- Bases: transformers4rec.torch.utils.torch_utils.OutputSizeMixin, torch.nn.modules.module.Module
- 
class transformers4rec.torch.TabularBlock(pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]
- Bases: transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.tabular.base.TabularModule, abc.ABC
TabularBlock extends TabularModule to turn it into a block with output size info.
- Parameters
- pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward). 
- post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward). 
- aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor. 
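The order in which pre, forward, post and aggregation are applied can be sketched with plain functions over a dictionary of features. Everything here (`run_block`, `concat_aggregation`, the list-of-floats features) is a hypothetical stand-in for the tensor-based modules.

```python
from typing import Callable, Dict, List, Optional, Union

Tabular = Dict[str, List[float]]


def concat_aggregation(outputs: Tabular) -> List[float]:
    """Merge all features into one flat list, ordered by feature name."""
    merged: List[float] = []
    for name in sorted(outputs):
        merged.extend(outputs[name])
    return merged


def run_block(
    inputs: Tabular,
    forward: Callable[[Tabular], Tabular],
    pre: Optional[Callable[[Tabular], Tabular]] = None,
    post: Optional[Callable[[Tabular], Tabular]] = None,
    aggregation: Optional[Callable[[Tabular], List[float]]] = None,
) -> Union[Tabular, List[float]]:
    if pre is not None:
        inputs = pre(inputs)          # applied before forward
    outputs = forward(inputs)
    if post is not None:
        outputs = post(outputs)       # applied after forward
    if aggregation is not None:
        return aggregation(outputs)   # collapses the dict into a single output
    return outputs


double = lambda d: {k: [2 * x for x in v] for k, v in d.items()}
plus_one = lambda d: {k: [x + 1 for x in v] for k, v in d.items()}
result = run_block({"b": [2.0], "a": [1.0]}, forward=dict, pre=double,
                   post=plus_one, aggregation=concat_aggregation)
print(result)  # [3.0, 5.0]
```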
 
 
- 
class transformers4rec.torch.Block(module: torch.nn.modules.module.Module, output_size: Union[List[int], torch.Size])[source]
- 
class transformers4rec.torch.MLPBlock(dimensions, activation=<class 'torch.nn.modules.activation.ReLU'>, use_bias: bool = True, dropout=None, normalization=None, filter_features=None)[source]
- Bases: - transformers4rec.torch.block.base.BuildableBlock- 
build(input_shape) → transformers4rec.torch.block.base.SequentialBlock[source]
 
- 
- 
class transformers4rec.torch.TabularTransformation[source]
- Bases: - transformers4rec.torch.utils.torch_utils.OutputSizeMixin,- torch.nn.modules.module.Module,- abc.ABC- Transformation that takes in TabularData and outputs TabularData. - 
forward(inputs: Dict[str, torch.Tensor], **kwargs) → Dict[str, torch.Tensor][source]
 
- 
- 
class transformers4rec.torch.SequentialTabularTransformations(*transformation: Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]])[source]
- Bases: - transformers4rec.torch.block.base.SequentialBlock- A sequential container; modules are added to it in the order they are passed in. - Parameters
- transformation (TabularTransformationType) – transformations that are passed in here will be called in order. 
 
- 
class transformers4rec.torch.TabularAggregation[source]
- Bases: - transformers4rec.torch.utils.torch_utils.OutputSizeMixin,- torch.nn.modules.module.Module,- abc.ABC- Aggregation of TabularData that outputs a single Tensor - 
forward(inputs: Dict[str, torch.Tensor]) → torch.Tensor[source]
 
- 
- 
class transformers4rec.torch.StochasticSwapNoise(schema=None, pad_token=0, replacement_prob=0.1)[source]
- Bases: - transformers4rec.torch.tabular.base.TabularTransformation- Applies Stochastic replacement of sequence features. It can be applied as a pre transform like TransformerBlock(pre=”stochastic-swap-noise”) - 
forward(inputs: Union[torch.Tensor, Dict[str, torch.Tensor]], input_mask: Optional[torch.Tensor] = None, **kwargs) → Union[torch.Tensor, Dict[str, torch.Tensor]][source]
 - 
augment(input_tensor: torch.Tensor, mask: Optional[torch.Tensor] = None) → torch.Tensor[source]
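
The idea behind stochastic swap noise can be sketched in plain Python. This is an illustration only, assuming that each non-padded position is replaced with probability `replacement_prob` by a value drawn from the other non-padded values of the same feature in the batch; the helper name `swap_noise` and the `seed` argument are hypothetical and not part of the library.

```python
import random


def swap_noise(batch, pad_token=0, replacement_prob=0.1, seed=None):
    """Randomly swap non-padded values with other values seen in the batch."""
    rng = random.Random(seed)
    # Candidate replacements: every non-padded value of this feature in the batch.
    pool = [v for row in batch for v in row if v != pad_token]
    noisy = []
    for row in batch:
        new_row = []
        for v in row:
            if v != pad_token and pool and rng.random() < replacement_prob:
                new_row.append(rng.choice(pool))
            else:
                # Padded positions and unselected positions stay unchanged.
                new_row.append(v)
        noisy.append(new_row)
    return noisy


sessions = [[5, 8, 3, 0, 0], [7, 2, 0, 0, 0]]
noisy_sessions = swap_noise(sessions, replacement_prob=0.5, seed=0)
```

Padding is never touched, so sequence lengths are preserved while the non-padded values are perturbed.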
 
- 
- 
class transformers4rec.torch.TabularLayerNorm(features_dim: Optional[Dict[str, int]] = None)[source]
- Bases: - transformers4rec.torch.tabular.base.TabularTransformation- Applies Layer norm to each input feature individually, before the aggregation - 
classmethod from_feature_config(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig])[source]
 - 
forward(inputs: Dict[str, torch.Tensor], **kwargs) → Dict[str, torch.Tensor][source]
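
Normalizing each feature individually, before any aggregation, can be sketched as follows. This is a library-free illustration that assumes standard layer normalization without the learnable scale and shift; the function names and the example features are hypothetical.

```python
import math


def layer_norm(vector, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance."""
    mean = sum(vector) / len(vector)
    var = sum((x - mean) ** 2 for x in vector) / len(vector)
    return [(x - mean) / math.sqrt(var + eps) for x in vector]


def tabular_layer_norm(features):
    """Apply layer normalization to every feature independently."""
    return {name: layer_norm(values) for name, values in features.items()}


features = {
    "price_emb": [10.0, 20.0, 30.0, 40.0],
    "category_emb": [0.1, -0.2, 0.05, 0.3],
}
normalized = tabular_layer_norm(features)
```

Because each feature has its own statistics, features with very different scales end up comparable before they are concatenated or summed.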
 
- 
class transformers4rec.torch.TabularDropout(dropout_rate=0.0)[source]
- Bases: - transformers4rec.torch.tabular.base.TabularTransformation- Applies dropout transformation. - 
forward(inputs: Union[torch.Tensor, Dict[str, torch.Tensor]], **kwargs) → Union[torch.Tensor, Dict[str, torch.Tensor]][source]
 
- 
- 
class transformers4rec.torch.TransformerBlock(transformer: Union[transformers.modeling_utils.PreTrainedModel, transformers.configuration_utils.PretrainedConfig], masking: Optional[transformers4rec.torch.masking.MaskSequence] = None, prepare_module: Optional[Type[transformers4rec.torch.block.transformer.TransformerPrepare]] = None, output_fn=<function TransformerBlock.<lambda>>)[source]
- Bases: - transformers4rec.torch.block.base.BlockBase- Class to support HF Transformers for session-based and sequential-based recommendation models. - Parameters
- transformer (TransformerBody) – The T4RecConfig or a pre-trained HF object related to specific transformer architecture. 
- masking – Needed when masking is applied on the inputs. 
 
 - 
TRANSFORMER_TO_PREPARE: Dict[Type[transformers.modeling_utils.PreTrainedModel], Type[transformers4rec.torch.block.transformer.TransformerPrepare]] = {<class 'transformers.models.gpt2.modeling_gpt2.GPT2Model'>: <class 'transformers4rec.torch.block.transformer.GPT2Prepare'>}
 - 
classmethod from_registry(transformer: str, d_model: int, n_head: int, n_layer: int, total_seq_length: int, masking: Optional[transformers4rec.torch.masking.MaskSequence] = None)[source]
- Load the HF transformer architecture based on its name - Parameters
- transformer (str) – Name of the Transformer to use. Possible values are: [“reformer”, “gpt2”, “longformer”, “electra”, “albert”, “xlnet”] 
- d_model (int) – size of hidden states for Transformers 
- n_head (int) – Number of attention heads for Transformers 
- n_layer (int) – Number of layers for RNNs and Transformers 
- total_seq_length (int) – The maximum sequence length 
 
 
 
- 
class transformers4rec.torch.ContinuousFeatures(features: List[str], pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]
- Bases: - transformers4rec.torch.features.base.InputBlock- Input block for continuous features. - Parameters
- features (List[str]) – List of continuous features to include in this module. 
- pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward). 
- post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward). 
- aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor. 
 
 
- 
class transformers4rec.torch.EmbeddingFeatures(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], item_id: Optional[str] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source]
- Bases: - transformers4rec.torch.features.base.InputBlock- Input block for embedding-lookups for categorical features. - For multi-hot features, the embeddings will be aggregated into a single tensor using the mean. - Parameters
- feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features. 
- item_id (str, optional) – The name of the feature that’s used for the item_id. 
 
 - pre: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional
- Transformations to apply on the inputs when the module is called (so before forward). 
- post: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional
- Transformations to apply on the inputs after the module is called (so after forward). 
- aggregation: Union[str, TabularAggregation], optional
- Aggregation to apply after processing the forward-method to output a single Tensor. 
 - 
property item_embedding_table
 - 
table_to_embedding_module(table: transformers4rec.torch.features.embedding.TableConfig) → torch.nn.modules.module.Module[source]
 - 
classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, embedding_dims: Optional[Dict[str, int]] = None, embedding_dim_default: int = 64, infer_embedding_sizes: bool = False, infer_embedding_sizes_multiplier: float = 2.0, embeddings_initializers: Optional[Dict[str, Callable[[Any], None]]] = None, combiner: str = 'mean', tags: Optional[Union[merlin_standard_lib.schema.tag.Tag, list, str]] = None, item_id: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, aggregation=None, pre=None, post=None, **kwargs) → Optional[transformers4rec.torch.features.embedding.EmbeddingFeatures][source]
- Instantiates EmbeddingFeatures from a DatasetSchema. - Parameters
- schema (DatasetSchema) – Dataset schema 
- embedding_dims (Optional[Dict[str, int]], optional) – The dimension of the embedding table for each feature (key), by default None 
- embedding_dim_default (Optional[int], optional) – Default dimension of the embedding table, used when the feature is not found in embedding_dims, by default 64
- infer_embedding_sizes (bool, optional) – Automatically defines the embedding dimension from the feature cardinality in the schema, by default False 
- infer_embedding_sizes_multiplier (Optional[float], optional) – Multiplier used by the heuristic to infer the embedding dimension from its cardinality. Generally reasonable values range between 2.0 and 10.0, by default 2.0 
- embeddings_initializers (Optional[Dict[str, Callable[[Any], None]]]) – Dict where keys are feature names and values are callable to initialize embedding tables 
- combiner (Optional[str], optional) – Feature aggregation option, by default “mean” 
- tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter columns, by default None 
- item_id (Optional[str], optional) – Name of the item id column (feature), by default None 
- automatic_build (bool, optional) – Automatically infers input size from features, by default True 
- max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None 
 
- Returns
- Returns the EmbeddingFeatures for the dataset schema
- Return type
- Optional[EmbeddingFeatures] 
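
The interaction between feature cardinality and infer_embedding_sizes_multiplier can be illustrated with a small sketch. The fourth-root formula below is an assumption chosen for illustration, since this page does not state the exact heuristic; the helper name and the example cardinalities are hypothetical.

```python
import math


def infer_embedding_dim(cardinality, multiplier=2.0):
    """Assumed heuristic: scale the fourth root of the cardinality."""
    return int(math.ceil(math.pow(cardinality, 0.25) * multiplier))


# Hypothetical cardinalities, as they might be read from a schema.
cardinalities = {"item_id": 50000, "category": 300}
dims = {name: infer_embedding_dim(card) for name, card in cardinalities.items()}
wide_dims = {name: infer_embedding_dim(card, multiplier=10.0) for name, card in cardinalities.items()}
```

Under this assumption, high-cardinality features get larger tables, and raising the multiplier widens every table proportionally.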
 
 - 
item_ids(inputs) → torch.Tensor[source]
 
- 
class transformers4rec.torch.SoftEmbeddingFeatures(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], layer_norm: bool = True, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, **kwarg)[source]
- Bases: - transformers4rec.torch.features.embedding.EmbeddingFeatures- Encapsulate continuous features encoded using the soft one-hot encoding embedding technique (SoftEmbedding), from https://arxiv.org/pdf/1708.00065.pdf. In a nutshell, it keeps an embedding table for each continuous feature, which is represented as a weighted average of embeddings. - Parameters
- feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features. 
- layer_norm (boolean) – When layer_norm is true, TabularLayerNorm will be used in post. 
- pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward). 
- post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward). 
- aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor. 
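
The "weighted average of embeddings" can be sketched in plain Python. This is a minimal illustration, assuming the scalar is projected to one logit per table row and passed through a softmax to obtain the weights; the function names and the random parameters are hypothetical stand-ins for trained weights.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_embedding(value, proj_weight, proj_bias, table):
    """Embed one continuous value as a weighted average of table rows."""
    # Project the scalar to one logit per embedding row, then normalize.
    weights = softmax([w * value + b for w, b in zip(proj_weight, proj_bias)])
    dim = len(table[0])
    vector = [sum(wt * row[d] for wt, row in zip(weights, table)) for d in range(dim)]
    return vector, weights


rng = random.Random(0)
cardinality, dim = 10, 8  # same values as the from_schema defaults
proj_weight = [rng.uniform(-1, 1) for _ in range(cardinality)]
proj_bias = [rng.uniform(-1, 1) for _ in range(cardinality)]
table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(cardinality)]
vector, weights = soft_embedding(0.37, proj_weight, proj_bias, table)
```

Since the weights are non-negative and sum to one, the resulting vector is a convex combination of the rows of the table.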
 
 - 
classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, soft_embedding_cardinalities: Optional[Dict[str, int]] = None, soft_embedding_cardinality_default: int = 10, soft_embedding_dims: Optional[Dict[str, int]] = None, soft_embedding_dim_default: int = 8, embeddings_initializers: Optional[Dict[str, Callable[[Any], None]]] = None, layer_norm: bool = True, combiner: str = 'mean', tags: Optional[Union[merlin_standard_lib.schema.tag.Tag, list, str]] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, **kwargs) → Optional[transformers4rec.torch.features.embedding.SoftEmbeddingFeatures][source]
- Instantiates SoftEmbeddingFeatures from a DatasetSchema. - Parameters
- schema (DatasetSchema) – Dataset schema 
- soft_embedding_cardinalities (Optional[Dict[str, int]], optional) – The cardinality of the embedding table for each feature (key), by default None 
- soft_embedding_cardinality_default (Optional[int], optional) – Default cardinality of the embedding table, when the feature is not found in soft_embedding_cardinalities, by default 10
- soft_embedding_dims (Optional[Dict[str, int]], optional) – The dimension of the embedding table for each feature (key), by default None 
- soft_embedding_dim_default (Optional[int], optional) – Default dimension of the embedding table, when the feature is not found in soft_embedding_dims, by default 8
- embeddings_initializers (Optional[Dict[str, Callable[[Any], None]]]) – Dict where keys are feature names and values are callable to initialize embedding tables 
- combiner (Optional[str], optional) – Feature aggregation option, by default “mean” 
- tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter columns, by default None 
- automatic_build (bool, optional) – Automatically infers input size from features, by default True 
- max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None 
 
- Returns
- Returns a SoftEmbeddingFeatures instance from the dataset schema
- Return type
- Optional[SoftEmbeddingFeatures] 
 
 - 
table_to_embedding_module(table: transformers4rec.torch.features.embedding.TableConfig) → transformers4rec.torch.features.embedding.SoftEmbedding[source]
 
- 
class transformers4rec.torch.TabularSequenceFeatures(continuous_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, categorical_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, text_embedding_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, projection_module: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock, torch.nn.modules.module.Module]] = None, masking: Optional[transformers4rec.torch.masking.MaskSequence] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]
- Bases: - transformers4rec.torch.features.tabular.TabularFeatures- Input module that combines different types of features to a sequence: continuous, categorical & text. - Parameters
- continuous_module (TabularModule, optional) – Module used to process continuous features. 
- categorical_module (TabularModule, optional) – Module used to process categorical features. 
- text_embedding_module (TabularModule, optional) – Module used to process text features. 
- projection_module (BlockOrModule, optional) – Module that’s used to project the output of this module, typically done by an MLPBlock. 
- masking (MaskSequence, optional) – Masking to apply to the inputs. 
- pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward). 
- post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward). 
- aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor. 
 
 - 
EMBEDDING_MODULE_CLASS
- alias of - transformers4rec.torch.features.sequence.SequenceEmbeddingFeatures
 - 
classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, continuous_tags: Optional[Union[List[str], List[merlin_standard_lib.schema.tag.Tag], List[Union[merlin_standard_lib.schema.tag.Tag, str]], Tuple[merlin_standard_lib.schema.tag.Tag]]] = (<Tag.CONTINUOUS: 'continuous'>,), categorical_tags: Optional[Union[List[str], List[merlin_standard_lib.schema.tag.Tag], List[Union[merlin_standard_lib.schema.tag.Tag, str]], Tuple[merlin_standard_lib.schema.tag.Tag]]] = (<Tag.CATEGORICAL: 'categorical'>,), aggregation: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, continuous_projection: Optional[Union[List[int], int]] = None, continuous_soft_embeddings: bool = False, projection: Optional[Union[torch.nn.modules.module.Module, transformers4rec.torch.block.base.BuildableBlock]] = None, d_output: Optional[int] = None, masking: Optional[Union[str, transformers4rec.torch.masking.MaskSequence]] = None, **kwargs) → transformers4rec.torch.features.sequence.TabularSequenceFeatures[source]
- Instantiates TabularFeatures from a DatasetSchema - Parameters
- schema (DatasetSchema) – Dataset schema 
- continuous_tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter the continuous features, by default Tag.CONTINUOUS 
- categorical_tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter the categorical features, by default Tag.CATEGORICAL 
- aggregation (Optional[str], optional) – Feature aggregation option, by default None 
- automatic_build (bool, optional) – Automatically infers input size from features, by default True 
- max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None 
- continuous_projection (Optional[Union[List[int], int]], optional) – If set, concatenate all numerical features and project them by a number of MLP layers. The argument accepts a list with the dimensions of the MLP layers, by default None 
- continuous_soft_embeddings (bool) – Indicates if the soft one-hot encoding technique must be used to represent continuous features, by default False 
- projection (Optional[Union[torch.nn.Module, BuildableBlock]], optional) – If set, project the aggregated embeddings vectors into hidden dimension vector space, by default None 
- d_output (Optional[int], optional) – If set, init a MLPBlock as projection module to project embeddings vectors, by default None 
- masking (Optional[Union[str, MaskSequence]], optional) – If set, apply masking to the input embeddings and compute masked labels. It requires a categorical_module including an item_id column, by default None 
 
- Returns
- Returns TabularFeatures from a dataset schema
- Return type
 
 - 
property masking
 - 
property item_id
 - 
property item_embedding_table
 
- 
class transformers4rec.torch.SequenceEmbeddingFeatures(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], item_id: Optional[str] = None, padding_idx: int = 0, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source]
- Bases: - transformers4rec.torch.features.embedding.EmbeddingFeatures- Input block for embedding-lookups for categorical features. This module produces 3-D tensors, this is useful for sequential models like transformers. - Parameters
- feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features. 
- item_id (str, optional) – The name of the feature that’s used for the item_id. 
- padding_idx (int) – The symbol to use for padding. 
- pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward). 
- post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward). 
- aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor. 
 
 - 
table_to_embedding_module(table: transformers4rec.torch.features.embedding.TableConfig) → torch.nn.modules.sparse.Embedding[source]
 
- 
class transformers4rec.torch.FeatureConfig(table: transformers4rec.torch.features.embedding.TableConfig, max_sequence_length: int = 0, name: Optional[str] = None)[source]
- Bases: - object
- 
class transformers4rec.torch.TableConfig(vocabulary_size: int, dim: int, initializer: Optional[Callable[[torch.Tensor], None]] = None, combiner: str = 'mean', name: Optional[str] = None)[source]
- Bases: - object
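
How one table can back several features (shared embeddings) can be sketched with stand-in dataclasses. The classes below are hypothetical and only mirror some fields of the FeatureConfig and TableConfig signatures above; they are not the library classes, and the feature names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableConfigSketch:
    """Stand-in mirroring some TableConfig constructor arguments."""
    vocabulary_size: int
    dim: int
    combiner: str = "mean"
    name: Optional[str] = None


@dataclass
class FeatureConfigSketch:
    """Stand-in mirroring the FeatureConfig constructor arguments."""
    table: TableConfigSketch
    max_sequence_length: int = 0
    name: Optional[str] = None


# One table reused by two features: both look up the same item vocabulary.
item_table = TableConfigSketch(vocabulary_size=50000, dim=64, name="item_id")
category_table = TableConfigSketch(vocabulary_size=300, dim=16, name="category")

feature_config = {
    "item_id": FeatureConfigSketch(item_table, max_sequence_length=20, name="item_id"),
    "last_viewed_item": FeatureConfigSketch(item_table, name="last_viewed_item"),
    "category": FeatureConfigSketch(category_table, max_sequence_length=20, name="category"),
}
distinct_tables = {id(cfg.table) for cfg in feature_config.values()}
```

Three features are configured here, but only two distinct tables exist, because `item_id` and `last_viewed_item` point at the same table object.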
- 
class transformers4rec.torch.TabularFeatures(continuous_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, categorical_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, text_embedding_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]
- Bases: - transformers4rec.torch.tabular.base.MergeTabular- Input module that combines different types of features: continuous, categorical & text. - Parameters
- continuous_module (TabularModule, optional) – Module used to process continuous features. 
- categorical_module (TabularModule, optional) – Module used to process categorical features. 
- text_embedding_module (TabularModule, optional) – Module used to process text features. 
 
 - pre: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional
- Transformations to apply on the inputs when the module is called (so before forward). 
- post: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional
- Transformations to apply on the inputs after the module is called (so after forward). 
- aggregation: Union[str, TabularAggregation], optional
- Aggregation to apply after processing the forward-method to output a single Tensor. 
 - 
CONTINUOUS_MODULE_CLASS
- alias of - transformers4rec.torch.features.continuous.ContinuousFeatures
 - 
EMBEDDING_MODULE_CLASS
- alias of - transformers4rec.torch.features.embedding.EmbeddingFeatures
 - 
SOFT_EMBEDDING_MODULE_CLASS
- alias of - transformers4rec.torch.features.embedding.SoftEmbeddingFeatures
 - 
project_continuous_features(mlp_layers_dims: Union[List[int], int]) → transformers4rec.torch.features.tabular.TabularFeatures[source]
- Combine all concatenated continuous features with stacked MLP layers 
 - 
classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, continuous_tags: Optional[Union[List[str], List[merlin_standard_lib.schema.tag.Tag], List[Union[merlin_standard_lib.schema.tag.Tag, str]], Tuple[merlin_standard_lib.schema.tag.Tag]]] = (<Tag.CONTINUOUS: 'continuous'>,), categorical_tags: Optional[Union[List[str], List[merlin_standard_lib.schema.tag.Tag], List[Union[merlin_standard_lib.schema.tag.Tag, str]], Tuple[merlin_standard_lib.schema.tag.Tag]]] = (<Tag.CATEGORICAL: 'categorical'>,), aggregation: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, continuous_projection: Optional[Union[List[int], int]] = None, continuous_soft_embeddings: bool = False, **kwargs) → transformers4rec.torch.features.tabular.TabularFeatures[source]
- Instantiates TabularFeatures from a DatasetSchema - Parameters
- schema (DatasetSchema) – Dataset schema 
- continuous_tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter the continuous features, by default Tag.CONTINUOUS 
- categorical_tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter the categorical features, by default Tag.CATEGORICAL 
- aggregation (Optional[str], optional) – Feature aggregation option, by default None 
- automatic_build (bool, optional) – Automatically infers input size from features, by default True 
- max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None 
- continuous_projection (Optional[Union[List[int], int]], optional) – If set, concatenate all numerical features and project them by a number of MLP layers. The argument accepts a list with the dimensions of the MLP layers, by default None 
- continuous_soft_embeddings (bool) – Indicates if the soft one-hot encoding technique must be used to represent continuous features, by default False 
 
- Returns
- Returns TabularFeatures from a dataset schema
- Return type
 
 - 
property continuous_module
 - 
property categorical_module
 
- 
class transformers4rec.torch.Head(body: transformers4rec.torch.block.base.BlockBase, prediction_tasks: Union[List[transformers4rec.torch.model.base.PredictionTask], transformers4rec.torch.model.base.PredictionTask], task_blocks: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock, Dict[str, Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]]]] = None, task_weights: Optional[List[float]] = None, loss_reduction: str = 'mean', inputs: Optional[Union[transformers4rec.torch.features.sequence.TabularSequenceFeatures, transformers4rec.torch.features.tabular.TabularFeatures]] = None)[source]
- Bases: - torch.nn.modules.module.Module,- transformers4rec.torch.utils.torch_utils.LossMixin,- transformers4rec.torch.utils.torch_utils.MetricsMixin- Head of a Model, a head has a single body but could have multiple prediction-tasks. - Parameters
- body (Block) – TODO 
- prediction_tasks (Union[List[PredictionTask], PredictionTask], optional) – TODO 
- task_blocks – TODO 
- task_weights (List[float], optional) – TODO 
- loss_reduction (str, default="mean") – TODO 
- inputs (TabularFeaturesType, optional) – TODO 
 
 - 
build(inputs=None, device=None, task_blocks=None)[source]
- Build each prediction task that’s part of the head. - Parameters
- body – 
- inputs – 
- device – 
- task_blocks – 
 
 
 - 
classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, body: transformers4rec.torch.block.base.BlockBase, task_blocks: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock, Dict[str, Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]]]] = None, task_weight_dict: Optional[Dict[str, float]] = None, loss_reduction: str = 'mean', inputs: Optional[Union[transformers4rec.torch.features.sequence.TabularSequenceFeatures, transformers4rec.torch.features.tabular.TabularFeatures]] = None) → transformers4rec.torch.model.base.Head[source]
- Instantiate a Head from a Schema through tagged targets. - Parameters
- schema (DatasetSchema) – Schema to use for inferring all targets based on the tags. 
- body – 
- task_blocks – 
- task_weight_dict – 
- loss_reduction – 
- inputs – 
 
- Returns
- Return type
 
 - 
pop_labels(inputs: Dict[str, torch.Tensor]) → Dict[str, torch.Tensor][source]
- Pop the labels from the different prediction_tasks from the inputs. - Parameters
- inputs (TabularData) – Input dictionary containing all targets. 
- Returns
- Return type
- TabularData 
 
 - 
forward(body_outputs: Union[torch.Tensor, Dict[str, torch.Tensor]], training: bool = True, call_body: bool = False, always_output_dict: bool = False, **kwargs) → Union[torch.Tensor, Dict[str, torch.Tensor]][source]
 - 
compute_loss(body_outputs: Union[torch.Tensor, Dict[str, torch.Tensor]], targets: Union[torch.Tensor, Dict[str, torch.Tensor]], training: bool = True, compute_metrics: bool = True, call_body: bool = False, **kwargs) → torch.Tensor[source]
 - 
calculate_metrics(body_outputs: Union[torch.Tensor, Dict[str, torch.Tensor]], targets: Union[torch.Tensor, Dict[str, torch.Tensor]], mode: str = 'val', forward=True, call_body=False, **kwargs) → Dict[str, Union[Dict[str, torch.Tensor], torch.Tensor]][source]
 - 
property task_blocks
 
- 
class transformers4rec.torch.Model(*head: transformers4rec.torch.model.base.Head, head_weights: Optional[List[float]] = None, head_reduction: str = 'mean', optimizer: Type[torch.optim.optimizer.Optimizer] = <class 'torch.optim.adam.Adam'>, name=None)[source]
- Bases: - torch.nn.modules.module.Module,- transformers4rec.torch.utils.torch_utils.LossMixin,- transformers4rec.torch.utils.torch_utils.MetricsMixin- Model class that can aggregate one or multiple heads. - Parameters
- head (Head) – One or more heads of the model. 
- head_weights (List[float], optional) – Weight-value to use for each head. 
- head_reduction (str, optional) – How to reduce the losses into a single tensor when multiple heads are used. 
- optimizer (Type[torch.optim.Optimizer]) – Optimizer-class to use during fitting 
- name (str, optional) – Name of the model. 
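
The roles of head_weights and head_reduction can be sketched as follows. This is an illustration under the assumption that each head loss is scaled by its weight and the scaled losses are then reduced with the chosen operation; the function name is hypothetical and plain floats stand in for loss tensors.

```python
def reduce_head_losses(losses, head_weights=None, head_reduction="mean"):
    """Combine per-head losses into a single scalar."""
    if head_weights is None:
        # Assumed default: every head contributes equally.
        head_weights = [1.0] * len(losses)
    if len(head_weights) != len(losses):
        raise ValueError("head_weights must have one entry per head")
    weighted = [loss * weight for loss, weight in zip(losses, head_weights)]
    if head_reduction == "mean":
        return sum(weighted) / len(weighted)
    if head_reduction == "sum":
        return sum(weighted)
    raise ValueError(f"unsupported head_reduction: {head_reduction!r}")


mean_loss = reduce_head_losses([0.8, 0.2], head_weights=[1.0, 0.5], head_reduction="mean")
sum_loss = reduce_head_losses([0.8, 0.2], head_weights=[1.0, 0.5], head_reduction="sum")
```

Lowering the weight of a head reduces its influence on the combined loss without removing the head from the model.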
 
 - 
forward(inputs: Union[torch.Tensor, Dict[str, torch.Tensor]], training=True, **kwargs)[source]
 - 
compute_loss(inputs, targets, compute_metrics=True, **kwargs) → torch.Tensor[source]
 - 
calculate_metrics(inputs, targets, mode='val', call_body=True, forward=True, **kwargs) → Dict[str, Union[Dict[str, torch.Tensor], torch.Tensor]][source]
 - 
compute_metrics(mode=None) → Dict[str, Union[float, torch.Tensor]][source]
 
- 
class transformers4rec.torch.PredictionTask(loss: torch.nn.modules.module.Module, metrics: Optional[Iterable[torchmetrics.metric.Metric]] = None, target_name: Optional[str] = None, task_name: Optional[str] = None, forward_to_prediction_fn: Callable[[torch.Tensor], torch.Tensor] = <function PredictionTask.<lambda>>, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, pre: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, summary_type: str = 'last')[source]
- Bases: - torch.nn.modules.module.Module,- transformers4rec.torch.utils.torch_utils.LossMixin,- transformers4rec.torch.utils.torch_utils.MetricsMixin- Individual prediction-task of a model. - Parameters
- loss (torch.nn.Module) – The loss to use during training of this task. 
- metrics (torch.nn.Module) – The metrics to calculate during training & evaluation. 
- target_name (str, optional) – Name of the target, this is needed when there are multiple targets. 
- task_name (str, optional) – Name of the prediction task, if not provided a name will be automatically constructed based on the target-name & class-name. 
- forward_to_prediction_fn (Callable[[torch.Tensor], torch.Tensor]) – Function to apply before the prediction 
- task_block (BlockType) – Module to transform input tensor before computing predictions. 
- pre (BlockType) – Module to compute the prediction probabilities. 
- summary_type (str) – - This is used to summarize a sequence into a single tensor. Accepted values are:
- “last” – Take the last token hidden state (like XLNet) 
- “first” – Take the first token hidden state (like Bert) 
- “mean” – Take the mean of all tokens hidden states 
- “cls_index” – Supply a Tensor of classification token position (GPT/GPT-2) 
- “attn” – Not implemented now, use multi-head attention 
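
Three of the summary types listed above can be sketched on a single sequence of hidden states. This is a simplified, library-free illustration: the function name is hypothetical, padding is ignored, and the “cls_index” and “attn” options are left out.

```python
def summarize(hidden_states, summary_type="last"):
    """Reduce a sequence of hidden-state vectors to a single vector."""
    if summary_type == "last":
        return hidden_states[-1]
    if summary_type == "first":
        return hidden_states[0]
    if summary_type == "mean":
        count = len(hidden_states)
        # zip(*...) iterates over hidden dimensions instead of positions.
        return [sum(column) / count for column in zip(*hidden_states)]
    raise ValueError(f"unsupported summary_type: {summary_type!r}")


# One sequence with three positions and a hidden size of two.
sequence = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
last_state = summarize(sequence, "last")
mean_state = summarize(sequence, "mean")
```

For next-item prediction the last position is the natural summary, which is consistent with “last” being the default in the signature.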
 
 
 
 - 
build(body: Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock], input_size, inputs: Optional[transformers4rec.torch.features.base.InputBlock] = None, device=None, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, pre=None)[source]
- The method will be called when the block is converted to a model, i.e. when it is linked to a prediction head. - Parameters
- block – the model block to link with head 
- device – set the device for the metrics and layers of the task 
 
 
 - 
property task_name
 - 
compute_loss(inputs: Union[torch.Tensor, Dict[str, torch.Tensor]], targets: Union[torch.Tensor, Dict[str, torch.Tensor]], compute_metrics: bool = True, training: bool = False, **kwargs) → torch.Tensor[source]
 - 
calculate_metrics(predictions: Union[torch.Tensor, Dict[str, torch.Tensor]], targets: Union[torch.Tensor, Dict[str, torch.Tensor]], mode: str = 'val', forward: bool = True, **kwargs) → Dict[str, torch.Tensor][source]
 
- 
class transformers4rec.torch.AsTabular(output_name: str)[source]
- Bases: - transformers4rec.torch.tabular.base.TabularBlock- Converts a Tensor to TabularData by converting it to a dictionary. - Parameters
- output_name (str) – Name that should be used as the key in the output dictionary. 
 - 
forward(inputs: torch.Tensor, **kwargs) → Dict[str, torch.Tensor][source]
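Conceptually, the conversion wraps a single value in a one-entry dictionary. A minimal sketch, assuming a hypothetical `as_tabular` function and a plain list in place of a tensor:

```python
from typing import Any, Dict


def as_tabular(value: Any, output_name: str) -> Dict[str, Any]:
    """Wrap a single value in a dictionary keyed by output_name."""
    return {output_name: value}


out = as_tabular([0.1, 0.9], "scores")
```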
 
- 
class transformers4rec.torch.ConcatFeatures[source]
- Bases: - transformers4rec.torch.tabular.base.TabularAggregation- Aggregation by stacking all values in TabularData; all non-sequential values will be converted to a sequence. - The output of this concatenation will have 3 dimensions. - 
forward(inputs: Dict[str, torch.Tensor]) → torch.Tensor[source]
 
- 
- 
class transformers4rec.torch.FilterFeatures(to_include: List[str], pop: bool = False)[source]
- Bases: - transformers4rec.torch.tabular.base.TabularTransformation- Module that filters out certain features from TabularData. - Parameters
 - 
forward(inputs: Dict[str, torch.Tensor], **kwargs) → Dict[str, torch.Tensor][source]
- Parameters
- inputs (TabularData) – Input dictionary containing features to filter. 
- Returns
- Filtered TabularData that only contains the feature-names in self.to_include. 
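The filtering described above amounts to a dictionary comprehension. The sketch below uses a hypothetical `filter_features` function with plain lists as feature values, and does not model the `pop` argument.

```python
from typing import Dict, List


def filter_features(inputs: Dict[str, List[float]], to_include: List[str]) -> Dict[str, List[float]]:
    """Keep only the entries whose name appears in to_include."""
    return {name: value for name, value in inputs.items() if name in to_include}


data = {"item_id": [1.0, 2.0], "price": [9.5, 3.0], "age": [30.0, 41.0]}
filtered = filter_features(data, ["item_id", "price"])
```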
 
 
 
- 
class transformers4rec.torch.ElementwiseSum[source]
- Bases: - transformers4rec.torch.tabular.aggregation.ElementwiseFeatureAggregation- Aggregation by first stacking all values in TabularData in the first dimension, and then summing the result. - 
forward(inputs: Dict[str, torch.Tensor]) → torch.Tensor[source]
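A rough sketch of the stack-then-sum idea, using plain lists of equal length instead of tensors. The function name is hypothetical, and all features are assumed to share the same shape.

```python
from typing import Dict, List


def elementwise_sum(inputs: Dict[str, List[float]]) -> List[float]:
    """Stack all feature values along a new first dimension, then sum over it."""
    stacked = list(inputs.values())  # shape: [n_features][length]
    return [sum(column) for column in zip(*stacked)]


features = {"a": [1.0, 2.0], "b": [10.0, 20.0], "c": [100.0, 200.0]}
summed = elementwise_sum(features)  # [111.0, 222.0]
```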
 
- 
- 
class transformers4rec.torch.ElementwiseSumItemMulti(schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source]
- Bases: - transformers4rec.torch.tabular.aggregation.ElementwiseFeatureAggregation- Aggregation by applying the ElementwiseSum aggregation to all features except the item-id, and then multiplying this with the item-ids. - Parameters
- schema (DatasetSchema) – 
 - 
forward(inputs: Dict[str, torch.Tensor]) → torch.Tensor[source]
 - 
REQUIRES_SCHEMA= True
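The aggregation can be sketched as follows, assuming plain lists in place of tensors. In the library the item-id feature is identified through the schema; here the hypothetical `item_id_name` argument plays that role.

```python
from typing import Dict, List


def elementwise_sum_item_multi(inputs: Dict[str, List[float]], item_id_name: str) -> List[float]:
    """Sum all features except the item-id, then multiply by the item-id values."""
    item = inputs[item_id_name]
    others = [value for name, value in inputs.items() if name != item_id_name]
    summed = [sum(column) for column in zip(*others)]
    return [i * s for i, s in zip(item, summed)]


features = {"item_id": [2.0, 3.0], "category": [1.0, 1.0], "price": [0.5, 1.0]}
result = elementwise_sum_item_multi(features, "item_id")  # [3.0, 6.0]
```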
 
- 
class transformers4rec.torch.MergeTabular(*modules_to_merge: Union[transformers4rec.torch.tabular.base.TabularModule, Dict[str, transformers4rec.torch.tabular.base.TabularModule]], pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source]
- Bases: - transformers4rec.torch.tabular.base.TabularBlock- Merge multiple TabularModules into a single output of TabularData. - Parameters
- modules_to_merge (Union[TabularModule, Dict[str, TabularModule]]) – TabularModules to merge; this can also be one or more dictionaries keyed by the name each module should have. 
- pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward). 
- post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward). 
- aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor. 
 
 - 
property merge_values
 - 
forward(inputs: Dict[str, torch.Tensor], training=True, **kwargs) → Dict[str, torch.Tensor][source]
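The merge can be pictured as calling every module on the same inputs and combining the resulting dictionaries. In this sketch the two "modules" are hypothetical plain functions, and `pre`, `post` and `aggregation` are left out.

```python
from typing import Callable, Dict, List

TabularData = Dict[str, List[float]]


def merge_tabular(modules: List[Callable[[TabularData], TabularData]], inputs: TabularData) -> TabularData:
    """Call each module on the same inputs and merge the output dictionaries."""
    merged: TabularData = {}
    for module in modules:
        merged.update(module(inputs))
    return merged


def embed_category(inputs: TabularData) -> TabularData:
    # Stand-in for an embedding module.
    return {"category_emb": [c * 0.1 for c in inputs["category"]]}


def scale_price(inputs: TabularData) -> TabularData:
    # Stand-in for a continuous-feature module.
    return {"price_scaled": [p / 100.0 for p in inputs["price"]]}


merged = merge_tabular(
    [embed_category, scale_price],
    {"category": [1.0, 2.0], "price": [50.0, 200.0]},
)
```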
 
- 
class transformers4rec.torch.StackFeatures(axis: int = -1)[source]
- Bases: - transformers4rec.torch.tabular.base.TabularAggregation- Aggregation by stacking all values in input dictionary in the given dimension. - Parameters
- axis (int, default=-1) – Axis to use for the stacking operation. 
 - 
forward(inputs: Dict[str, torch.Tensor]) → torch.Tensor[source]
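For one-dimensional features, stacking along the last axis turns N features of length B into a B x N result, which is what `torch.stack(values, dim=-1)` produces for 1-D tensors. A plain-Python sketch with a hypothetical helper name:

```python
from typing import Dict, List


def stack_last_axis(inputs: Dict[str, List[float]]) -> List[List[float]]:
    """Stack 1-D features along a new last axis: result[i] holds every feature's i-th value."""
    return [list(row) for row in zip(*inputs.values())]


features = {"age": [0.1, 0.2, 0.3], "price": [1.0, 2.0, 3.0]}
stacked = stack_last_axis(features)  # shape: [3][2]
```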
 
- 
class transformers4rec.torch.BinaryClassificationTask(target_name: Optional[str] = None, task_name: Optional[str] = None, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, loss=BCELoss(), metrics=(Precision(), Recall(), Accuracy()), summary_type='first')[source]
- Bases: - transformers4rec.torch.model.base.PredictionTask- 
DEFAULT_LOSS= BCELoss()
 - 
DEFAULT_METRICS= (Precision(), Recall(), Accuracy())
 
- 
- 
class transformers4rec.torch.RegressionTask(target_name: Optional[str] = None, task_name: Optional[str] = None, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, loss=MSELoss(), metrics=(MeanSquaredError()), summary_type='first')[source]
- Bases: - transformers4rec.torch.model.base.PredictionTask- 
DEFAULT_LOSS= MSELoss()
 - 
DEFAULT_METRICS= (MeanSquaredError(),)
 
- 
- 
class transformers4rec.torch.NextItemPredictionTask(loss: torch.nn.modules.module.Module = NLLLoss(), metrics: Iterable[torchmetrics.metric.Metric] = (NDCGAt(), AvgPrecisionAt(), RecallAt()), task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, task_name: str = 'next-item', weight_tying: bool = False, softmax_temperature: float = 1, padding_idx: int = 0, target_dim: Optional[int] = None, hf_format=False)[source]
- Bases: - transformers4rec.torch.model.base.PredictionTask- Next-item prediction task. - Parameters
- loss (torch.nn.Module) – Loss function to use. Defaults to NLLLoss. 
- metrics (Iterable[torchmetrics.Metric]) – List of ranking metrics to use for evaluation. 
- task_block – Module to transform input tensor before computing predictions. 
- task_name (str, optional) – Name of the prediction task, if not provided a name will be automatically constructed based on the target-name & class-name. 
- weight_tying (bool) – The item id embedding table weights are shared with the prediction network layer. 
- softmax_temperature (float) – Softmax temperature, used to reduce model overconfidence: the output is computed as softmax(logits / T). A value of 1.0 reduces to the regular softmax. 
- padding_idx (int) – pad token id. 
- target_dim (int) – vocabulary size of item ids 
- hf_format (bool) – Output the dictionary of outputs needed by RecSysTrainer, if set to False, return the predictions tensor. 
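The effect of softmax_temperature can be illustrated with a small standalone function. This is a sketch of softmax(logits / T) only, not the task's actual forward pass; the function name is made up for the example.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Compute softmax(logits / T); T > 1 flattens the distribution."""
    scaled = [x / temperature for x in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
regular = softmax_with_temperature(logits, temperature=1.0)
softer = softmax_with_temperature(logits, temperature=2.0)
```

With a temperature above 1, the top probability shrinks while the ranking of the items stays the same.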
 
 - 
DEFAULT_METRICS= (NDCGAt(), AvgPrecisionAt(), RecallAt())
 - 
build(body, input_size, device=None, inputs=None, task_block=None, pre=None)[source]
- Build method, this is called by the Head. 
 - 
forward(inputs: torch.Tensor, **kwargs)[source]
 - 
calculate_metrics(predictions, targets, mode='val', forward=True, **kwargs) → Dict[str, torch.Tensor][source]
 
- 
class transformers4rec.torch.TabularModule(pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, **kwargs)[source]
- Bases: - torch.nn.modules.module.Module- PyTorch Module that’s specialized for tabular-data by integrating many often used operations. - Parameters
- pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward). 
- post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward). 
- aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor. 
 
 - 
classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, tags=None, **kwargs) → Optional[transformers4rec.torch.tabular.base.TabularModule][source]
- Instantiate a TabularModule instance from a DatasetSchema. - Parameters
- schema – 
- tags – 
- kwargs – 
 
- Returns
- Return type
- Optional[TabularModule] 
 
 - 
classmethod from_features(features: List[str], pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None) → transformers4rec.torch.tabular.base.TabularModule[source]
- Initializes a TabularModule instance that filters out all features not listed in features. 
 - Parameters
- features (List[str]) – A list of feature-names that will be used as the first pre-processing op to filter out all other features not in this list. 
- pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward). 
- post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward). 
- aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor. 
 
- Returns
- Return type
 
 - 
property pre
- Return type
- SequentialTabularTransformations, optional 
 - 
property post
- Return type
- SequentialTabularTransformations, optional 
 - 
property aggregation
- Return type
- TabularAggregation, optional 
 - 
pre_forward(inputs: Dict[str, torch.Tensor], transformations: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None) → Dict[str, torch.Tensor][source]
- Method that’s typically called before the forward method for pre-processing. - Parameters
- inputs (TabularData) – input-data, typically the inputs passed to the module before the forward method. 
- transformations (TabularTransformationType, optional) – Transformations to apply on the input data. 
 
- Returns
- Return type
- TabularData 
 
 - 
forward(x: Dict[str, torch.Tensor], *args, **kwargs) → Dict[str, torch.Tensor][source]
 - 
post_forward(inputs: Dict[str, torch.Tensor], transformations: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, merge_with: Optional[Union[transformers4rec.torch.tabular.base.TabularModule, List[transformers4rec.torch.tabular.base.TabularModule]]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None) → Union[torch.Tensor, Dict[str, torch.Tensor]][source]
- Method that’s typically called after the forward method for post-processing. - Parameters
- inputs (TabularData) – input-data, typically the output of the forward method. 
- transformations (TabularTransformationType, optional) – Transformations to apply on the input data. 
- merge_with (Union[TabularModule, List[TabularModule]], optional) – Other TabularModules to call and merge the outputs with. 
- aggregation (TabularAggregationType, optional) – Aggregation to aggregate the output to a single Tensor. 
 
- Returns
- Return type
- TensorOrTabularData (Tensor when aggregation is set, else TabularData) 
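The order of operations (pre, then forward, then post, then the optional aggregation) can be sketched with plain functions. Everything here is illustrative: `run_tabular` and the helper transforms are hypothetical, lists stand in for tensors, and `merge_with` is not modelled.

```python
from typing import Callable, Dict, List, Optional, Union

TabularData = Dict[str, List[float]]
Transform = Callable[[TabularData], TabularData]


def run_tabular(
    inputs: TabularData,
    forward: Transform,
    pre: Optional[List[Transform]] = None,
    post: Optional[List[Transform]] = None,
    aggregation: Optional[Callable[[TabularData], List[float]]] = None,
) -> Union[List[float], TabularData]:
    """Apply pre-transforms, forward, post-transforms, then the optional aggregation."""
    x = inputs
    for transform in pre or []:
        x = transform(x)
    x = forward(x)
    for transform in post or []:
        x = transform(x)
    # A single "tensor" is returned only when an aggregation is set.
    return aggregation(x) if aggregation is not None else x


def keep_a_and_b(data: TabularData) -> TabularData:
    return {k: v for k, v in data.items() if k in ("a", "b")}


def double(data: TabularData) -> TabularData:
    return {k: [2.0 * x for x in v] for k, v in data.items()}


def elementwise_sum(data: TabularData) -> List[float]:
    return [sum(column) for column in zip(*data.values())]


inputs = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [9.0, 9.0]}
as_dict = run_tabular(inputs, forward=lambda d: d, pre=[keep_a_and_b], post=[double])
as_tensor = run_tabular(
    inputs, forward=lambda d: d, pre=[keep_a_and_b], post=[double], aggregation=elementwise_sum
)
```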
 
 - 
merge(other)
 
- 
class transformers4rec.torch.SoftEmbedding(num_embeddings, embeddings_dim, emb_initializer=None)[source]
- Bases: - torch.nn.modules.module.Module- Soft one-hot encoding embedding technique, from https://arxiv.org/pdf/1708.00065.pdf. In a nutshell, it represents a continuous feature as a weighted average of embeddings. 
- 
class transformers4rec.torch.Trainer(model: transformers4rec.torch.model.base.Model, args: transformers4rec.config.trainer.T4RecTrainingArguments, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, train_dataset_or_path=None, eval_dataset_or_path=None, train_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, eval_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, callbacks: Optional[List[transformers.trainer_callback.TrainerCallback]] = [], compute_metrics=None, incremental_logging: bool = False, **kwargs)[source]
- Bases: - transformers.trainer.Trainer- A Trainer specialized for sequential recommendation, including session-based and sequential recommendation. - Parameters
- model (Model) – The Model defined using Transformers4Rec api. 
- args (T4RecTrainingArguments) – The training arguments needed to setup training and evaluation experiments. 
- schema (Optional[Dataset.schema], optional) – The schema object including features to use and their properties. Defaults to None. 
- train_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or Dataset to use for training. Defaults to None. 
- eval_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or Dataset to use for evaluation. Defaults to None. 
- train_dataloader (Optional[DataLoader], optional) – The data generator to use for training. Defaults to None. 
- eval_dataloader (Optional[DataLoader], optional) – The data generator to use for evaluation. Defaults to None. 
- compute_metrics (Optional[bool], optional) – Whether to compute metrics defined by Model class or not. Defaults to None. 
- incremental_logging (bool) – Whether to enable incremental logging or not. If True, it ensures that global steps are incremented over many trainer.train() calls, so that train and eval metrics steps do not overlap and can be seen properly in reports like W&B and Tensorboard 
 
 - 
get_train_dataloader()[source]
- Set the train dataloader to use by Trainer. It supports a user-defined data-loader set as an attribute in the constructor. When the attribute is None, the data-loader is defined using train_dataset and the data_loader_engine specified in Training Arguments. 
 - 
get_eval_dataloader(eval_dataset=None)[source]
- Set the eval dataloader to use by Trainer. It supports a user-defined data-loader set as an attribute in the constructor. When the attribute is None, the data-loader is defined using eval_dataset and the data_loader_engine specified in Training Arguments. 
 - 
num_examples(dataloader: torch.utils.data.dataloader.DataLoader)[source]
- Overrides the Trainer.num_examples() method because the data loaders for this project do not return the dataset size, but the number of steps. The dataset size is therefore estimated by multiplying the number of steps by the batch size. 
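The estimate is simple arithmetic, sketched below with a hypothetical helper. Assuming the loader keeps a partial last batch, the estimate can exceed the true dataset size by up to one batch.

```python
import math


def estimate_num_examples(num_steps: int, batch_size: int) -> int:
    """Estimate dataset size as number of steps times batch size."""
    return num_steps * batch_size


true_size = 100
batch_size = 32
# Assumes the partial last batch is kept, so 100 examples take 4 steps.
num_steps = math.ceil(true_size / batch_size)
estimate = estimate_num_examples(num_steps, batch_size)  # 128, not 100
```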
 - 
reset_lr_scheduler() → None[source]
- Resets the LR scheduler of the previous Trainer.train() call, so that a new LR scheduler is created by the next Trainer.train() call. This is important for LR schedules like get_linear_schedule_with_warmup(), which decays the LR to 0 at the end of training. 
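To see why the reset matters, consider the multiplicative factor that a linear schedule with warmup applies to the base learning rate. The function below is a standalone sketch of that shape (linear ramp up, then linear decay to 0); the name `linear_warmup_factor` is made up for this example.

```python
def linear_warmup_factor(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """LR multiplier: ramps 0 -> 1 during warmup, then decays linearly to 0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


warmup, total = 10, 100
at_start = linear_warmup_factor(0, warmup, total)      # 0.0
at_peak = linear_warmup_factor(10, warmup, total)      # 1.0
halfway = linear_warmup_factor(55, warmup, total)      # 0.5
at_end = linear_warmup_factor(100, warmup, total)      # 0.0
past_end = linear_warmup_factor(150, warmup, total)    # 0.0
```

If a second training run reused the old scheduler, every step would fall in the `past_end` regime and the learning rate would stay at 0.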
 - 
create_scheduler(num_training_steps: int, optimizer: Optional[torch.optim.optimizer.Optimizer] = None)[source]
 - 
static get_scheduler(name: Union[str, transformers.trainer_utils.SchedulerType], optimizer: torch.optim.optimizer.Optimizer, num_warmup_steps: Optional[int] = None, num_training_steps: Optional[int] = None, num_cycles: Optional[int] = 0.5)[source]
- Unified API to get any scheduler from its name. - Parameters
- name (str or SchedulerType) – The name of the scheduler to use. 
- optimizer (torch.optim.Optimizer) – The optimizer that will be used during training. 
- num_warmup_steps (int, optional) – The number of warmup steps to do. This is not required by all schedulers (hence the argument being optional), the function will raise an error if it’s unset and the scheduler type requires it. 
- num_training_steps (int, optional) – The number of training steps to do. This is not required by all schedulers (hence the argument being optional), the function will raise an error if it’s unset and the scheduler type requires it. 
- num_cycles (int, optional) – The number of waves in the cosine schedule / hard restarts to use for cosine scheduler. 
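The role of num_cycles in a cosine schedule can be sketched as below. The formula follows the usual cosine-with-warmup shape, and it is an assumption that the scheduler returned here behaves the same way; `cosine_factor` is a hypothetical name.

```python
import math


def cosine_factor(step: int, num_warmup_steps: int, num_training_steps: int,
                  num_cycles: float = 0.5) -> float:
    """LR multiplier for a cosine schedule with linear warmup."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    # num_cycles=0.5 gives half a cosine wave: a smooth decay from 1 to 0.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


warmup, total = 10, 100
peak = cosine_factor(10, warmup, total)
middle = cosine_factor(55, warmup, total)
end = cosine_factor(100, warmup, total)
```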
 
 
 - 
prediction_step(model: torch.nn.modules.module.Module, inputs: Dict[str, torch.Tensor], prediction_loss_only: bool, ignore_keys: Optional[List[str]] = None) → Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor], Optional[Dict[str, Any]]][source]
- Overrides Trainer.prediction_step() to provide more flexibility to unpack results from the model, like returning labels that are not exactly one input feature of the model. 
 - 
evaluation_loop(dataloader: torch.utils.data.dataloader.DataLoader, description: str, prediction_loss_only: Optional[bool] = None, ignore_keys: Optional[List[str]] = None, metric_key_prefix: Optional[str] = 'eval') → transformers.trainer_utils.EvalLoopOutput[source]
- Overrides Trainer.prediction_loop() (shared by Trainer.evaluate() and Trainer.predict()) to provide more flexibility to work with streaming metrics (computed at each eval batch) and to log with the outputs of the model (e.g. prediction scores, prediction metadata, attention weights). - Parameters
- dataloader (DataLoader) – DataLoader object to use to iterate over evaluation data 
- description (str) – Parameter to describe the evaluation experiment, e.g. Prediction, test. 
- prediction_loss_only (Optional[bool]) – Whether or not to return the loss only. Defaults to None. 
- ignore_keys (Optional[List[str]]) – Columns not accepted by the model.forward() method are automatically removed. Defaults to None. 
- metric_key_prefix (Optional[str]) – Prefix to use when logging evaluation metrics. Defaults to eval. 
 
 
 - 
load_model_trainer_states_from_checkpoint(checkpoint_path, model=None)[source]
- This method loads the checkpoint states of the model, trainer and random states. If model is None, the serialized model class is loaded from the checkpoint. It does not load the optimizer and LR scheduler states (for that, call trainer.train() with the resume_from_checkpoint argument for a complete load). 
 - 
property log_predictions_callback
 
- 
class transformers4rec.torch.LabelSmoothCrossEntropyLoss(weight: Optional[torch.Tensor] = None, reduction: str = 'mean', smoothing=0.0)[source]
- Bases: - torch.nn.modules.loss._WeightedLoss- Constructor for cross-entropy loss with label smoothing. - smoothing: float
- The label smoothing factor. It should be between 0 and 1. 
- weight: torch.Tensor
- The tensor of weights given to each class. 
- reduction: str
- Specifies the reduction to apply to the output. Possible values are none | sum | mean. 
 - Adapted from https://github.com/NingAnMe/Label-Smoothing-for-CrossEntropyLoss-PyTorch
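As an illustration of label smoothing for a single example, the sketch below replaces the one-hot target with a distribution that puts 1 - smoothing on the true class and spreads smoothing uniformly over the other classes. This is one common variant and an assumption about the details; the function name is hypothetical, and the `weight` and `reduction` arguments are not modelled.

```python
import math
from typing import List


def label_smooth_cross_entropy(logits: List[float], target: int, smoothing: float = 0.0) -> float:
    """Cross-entropy of one example against a smoothed one-hot target."""
    n_classes = len(logits)
    shift = max(logits)  # log-sum-exp with a shift for numerical stability
    log_z = shift + math.log(sum(math.exp(x - shift) for x in logits))
    log_probs = [x - log_z for x in logits]
    off_value = smoothing / (n_classes - 1)
    weights = [1.0 - smoothing if i == target else off_value for i in range(n_classes)]
    return -sum(w * lp for w, lp in zip(weights, log_probs))


logits = [4.0, 0.0, 0.0]
plain = label_smooth_cross_entropy(logits, target=0, smoothing=0.0)
smoothed = label_smooth_cross_entropy(logits, target=0, smoothing=0.1)
```

With smoothing set to 0 this reduces to the ordinary cross-entropy; with a positive value, a confident correct prediction is penalized slightly more.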