transformers4rec.torch package
Subpackages
- transformers4rec.torch.block package
- transformers4rec.torch.features package
- Submodules
- transformers4rec.torch.features.base module
- transformers4rec.torch.features.continuous module
- transformers4rec.torch.features.embedding module
- transformers4rec.torch.features.sequence module
- transformers4rec.torch.features.tabular module
- transformers4rec.torch.features.text module
- Module contents
- transformers4rec.torch.model package
- transformers4rec.torch.tabular package
- transformers4rec.torch.utils package
Submodules
transformers4rec.torch.masking module
-
class
transformers4rec.torch.masking.
MaskingInfo
(schema: torch.Tensor, targets: torch.Tensor)[source] Bases:
object
-
schema
: torch.Tensor
-
targets
: torch.Tensor
-
-
class
transformers4rec.torch.masking.
MaskSequence
(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, **kwargs)[source] Bases:
transformers4rec.torch.utils.torch_utils.OutputSizeMixin
,torch.nn.modules.module.Module
Base class to prepare masked items inputs/labels for language modeling tasks.
Transformer architectures can be trained in different ways. Depending on the training method, there is a specific masking schema. The masking schema sets the items to be predicted (labels) and masks (hides) their positions in the sequence so that they are not used by the Transformer layers for prediction.
- We currently provide 4 different masking schemes out of the box:
Causal LM (clm)
Masked LM (mlm)
Permutation LM (plm)
Replacement Token Detection (rtd)
This class can be extended to add a different masking scheme.
- Parameters
hidden_size – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.
padding_idx (int, default = 0) – Index of the padding token used to pad a batch of sequences to the same length
-
compute_masked_targets
(item_ids: torch.Tensor, training: bool = False, testing: bool = False) → transformers4rec.torch.masking.MaskingInfo[source] Method to prepare masked labels based on the sequence of item ids. It returns the true labels of the masked positions and the related boolean mask. The class attributes mask_schema and masked_targets are also updated so that they can be reused by other modules.
- item_ids: torch.Tensor
The sequence of input item ids used for deriving labels of next item prediction task.
- training: bool
Flag to indicate whether we are in Training mode or not. During training, the labels can be any items within the sequence based on the selected masking task.
- testing: bool
Flag to indicate whether we are in Evaluation (=True) or Inference (=False) mode. During evaluation, we are predicting all next items or last item only in the sequence based on the param eval_on_last_item_seq_only. During inference, we don’t mask the input sequence and use all available information to predict the next item.
Tuple[MaskingSchema, MaskedTargets]
-
apply_mask_to_inputs
(inputs: torch.Tensor, schema: torch.Tensor, training: bool = False, testing: bool = False) → torch.Tensor[source] Control the masked positions in the inputs by replacing the true interaction by a learnable masked embedding.
- Parameters
inputs (torch.Tensor) – The 3-D tensor of interaction embeddings resulting from the ops: TabularFeatures + aggregation + projection(optional)
schema (MaskingSchema) – The boolean mask indicating masked positions.
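As a rough illustration of the replacement described above, the sketch below uses plain Python lists in place of tensors. The function name `replace_masked_positions` and the fixed `masked_embedding` values are hypothetical; in the library the masked embedding is a trainable vector of size hidden_size, and this is not the library's implementation.

```python
from typing import List


def replace_masked_positions(
    inputs: List[List[float]],      # one sequence: [seq_len][hidden_size]
    schema: List[bool],             # True where the position is masked
    masked_embedding: List[float],  # stand-in for the learnable vector
) -> List[List[float]]:
    """Return a copy of `inputs` with masked positions swapped for the mask embedding."""
    return [
        list(masked_embedding) if is_masked else list(step)
        for step, is_masked in zip(inputs, schema)
    ]


inputs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
schema = [False, True, False]
masked = replace_masked_positions(inputs, schema, masked_embedding=[9.0, 9.0])
print(masked)
```

Only the position flagged in `schema` changes; the original `inputs` list is left untouched.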
-
predict_all
(item_ids: torch.Tensor) → transformers4rec.torch.masking.MaskingInfo[source] Prepare labels for all next item predictions instead of last-item predictions in a user’s sequence.
- Parameters
item_ids (torch.Tensor) – The sequence of input item ids used for deriving labels of next item prediction task.
- Returns
- Return type
Tuple[MaskingSchema, MaskedTargets]
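A minimal sketch of what "labels for all next item predictions" can look like, assuming the label at each position is simply the item that follows it and that padded positions are excluded. The helper name `all_next_item_labels` is hypothetical, and plain lists stand in for tensors.

```python
from typing import List, Tuple


def all_next_item_labels(
    item_ids: List[int], padding_idx: int = 0
) -> Tuple[List[bool], List[int]]:
    """Shift the sequence left by one so each position is labelled with its next item."""
    # The last position has no next item, so it receives the padding index.
    labels = item_ids[1:] + [padding_idx]
    # Positions whose label is padding are not used as prediction targets.
    mask = [label != padding_idx for label in labels]
    return mask, labels


mask, labels = all_next_item_labels([5, 8, 3, 0, 0])
print(mask, labels)
```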
-
forward
(inputs: torch.Tensor, item_ids: torch.Tensor, training: bool = False, testing: bool = False) → torch.Tensor[source]
-
property
transformer_arguments
Prepare additional arguments to pass to the Transformer forward methods.
-
class
transformers4rec.torch.masking.
CausalLanguageModeling
(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, train_on_last_item_seq_only: bool = False, **kwargs)[source] Bases:
transformers4rec.torch.masking.MaskSequence
In Causal Language Modeling (clm) you predict the next item based on past positions of the sequence. Future positions are masked.
- Parameters
hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.
padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length
eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation
train_on_last_item_seq_only (bool, default = False) – Predict only last item during training
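One way to read the two "last item only" flags is that every label except the last non-padded one is dropped. The sketch below shows that idea on a plain list; the function name `keep_last_label_only` is hypothetical and the library's tensor-based logic may differ in detail.

```python
from typing import List


def keep_last_label_only(labels: List[int], padding_idx: int = 0) -> List[int]:
    """Keep the last non-padded label and replace all other labels with padding."""
    # Index of the last real (non-padding) label, or None for an all-padding row.
    last = max(
        (i for i, label in enumerate(labels) if label != padding_idx),
        default=None,
    )
    return [label if i == last else padding_idx for i, label in enumerate(labels)]


last_only = keep_last_label_only([8, 3, 0, 0, 0])
print(last_only)
```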
-
apply_mask_to_inputs
(inputs: torch.Tensor, mask_schema: torch.Tensor, training: bool = False, testing: bool = False) → torch.Tensor[source]
-
class
transformers4rec.torch.masking.
MaskedLanguageModeling
(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, mlm_probability: float = 0.15, **kwargs)[source] Bases:
transformers4rec.torch.masking.MaskSequence
In Masked Language Modeling (mlm) you randomly select some positions of the sequence to be predicted, which are masked. During training, the Transformer layer is allowed to use positions on the right (future info). During inference, all past items are visible for the Transformer layer, which tries to predict the next item.
- Parameters
hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.
padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length
eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation
mlm_probability (Optional[float], default = 0.15) – Probability of an item to be selected (masked) as a label of the given sequence. Note: at least one item is masked in each sequence, so that the network can learn from every sequence.
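The selection rule for mlm_probability can be sketched as follows, assuming each non-padded position is drawn independently and one position is forced when nothing was selected. The function name `sample_mlm_mask` and the `rng` argument are hypothetical, and the library may apply further constraints that this sketch does not cover.

```python
import random
from typing import List, Optional


def sample_mlm_mask(
    item_ids: List[int],
    mlm_probability: float = 0.15,
    padding_idx: int = 0,
    rng: Optional[random.Random] = None,
) -> List[bool]:
    """Randomly pick positions to mask, never padding, and at least one per sequence."""
    rng = rng or random.Random()
    valid = [i for i, item in enumerate(item_ids) if item != padding_idx]
    mask = [False] * len(item_ids)
    for i in valid:
        if rng.random() < mlm_probability:
            mask[i] = True
    # Guarantee at least one masked item so the sequence contributes a label.
    if valid and not any(mask):
        mask[rng.choice(valid)] = True
    return mask


mask = sample_mlm_mask([5, 8, 3, 0, 0], rng=random.Random(0))
print(mask)
```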
-
apply_mask_to_inputs
(inputs: torch.Tensor, mask_schema: torch.Tensor, training=False, testing=False) → torch.Tensor[source] Control the masked positions in the inputs by replacing the true interaction by a learnable masked embedding.
- inputs: torch.Tensor
The 3-D tensor of interaction embeddings resulting from the ops: TabularFeatures + aggregation + projection(optional)
- schema: MaskingSchema
The boolean mask indicating masked positions.
- training: bool
Flag to indicate whether we are in Training mode or not. During training, the labels can be any items within the sequence based on the selected masking task.
- testing: bool
Flag to indicate whether we are in Evaluation (=True) or Inference (=False) mode. During evaluation, we are predicting all next items or last item only in the sequence based on the param eval_on_last_item_seq_only. During inference, we don’t mask the input sequence and use all available information to predict the next item.
-
class
transformers4rec.torch.masking.
PermutationLanguageModeling
(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, plm_probability: float = 0.16666666666666666, max_span_length: int = 5, permute_all: bool = False, **kwargs)[source] Bases:
transformers4rec.torch.masking.MaskSequence
In Permutation Language Modeling (plm) you use a permutation factorization at the level of the self-attention layer to define the accessible bidirectional context.
- Parameters
hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.
padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length
eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation
max_span_length (int) – maximum length of a span of masked items
plm_probability (float) – The ratio of surrounding items to unmask to define the context of the span-based prediction segment of items
permute_all (bool) – Compute partial span-based prediction (=False) or not.
-
compute_masked_targets
(item_ids: torch.Tensor, training=False, **kwargs) → transformers4rec.torch.masking.MaskingInfo[source]
-
class
transformers4rec.torch.masking.
ReplacementLanguageModeling
(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, sample_from_batch: bool = False, **kwargs)[source] Bases:
transformers4rec.torch.masking.MaskedLanguageModeling
In Replacement Language Modeling (rtd) you use MLM to randomly select some items, but replace them with random tokens. Then a discriminator model (which may or may not share its weights with the generator) is asked to classify whether the item at each position belongs to the original sequence. The generator-discriminator architecture is jointly trained using the Masked LM and RTD tasks.
- Parameters
hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.
padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length
eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation
sample_from_batch (bool) – Whether to sample replacement item ids from the same batch or not
-
get_fake_tokens
(itemid_seq, target_flat, logits)[source] The second task of RTD is binary classification to train the discriminator. The task consists of generating fake data by replacing [MASK] positions with random items; the ELECTRA discriminator then learns to detect the fake replacements.
- Parameters
itemid_seq (torch.Tensor of shape (bs, max_seq_len)) – input sequence of item ids
target_flat (torch.Tensor of shape (bs*max_seq_len)) – flattened masked label sequences
logits (torch.Tensor of shape (#pos_item, vocab_size or #pos_item),) – mlm probabilities of positive items computed by the generator model. The logits are over the whole corpus if sample_from_batch = False, over the positive items (masked) of the current batch otherwise
- Returns
corrupted_inputs (torch.Tensor of shape (bs, max_seq_len)) – input sequence of item ids with fake replacement
discriminator_labels (torch.Tensor of shape (bs, max_seq_len)) – binary labels to distinguish between original and replaced items
batch_updates (torch.Tensor of shape (#pos_item)) – the indices of replacement item within the current batch if sample_from_batch is enabled
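To make the relationship between corrupted_inputs and discriminator_labels concrete, here is a small sketch on a single sequence. It assumes label 1 means "replaced" and that a sampled item equal to the original counts as original; both are assumptions, and the helper name `corrupt_sequence` is hypothetical.

```python
from typing import List, Tuple


def corrupt_sequence(
    itemid_seq: List[int],
    masked_positions: List[int],
    replacements: List[int],
) -> Tuple[List[int], List[int]]:
    """Swap masked positions for sampled items and flag positions that actually changed."""
    corrupted = list(itemid_seq)
    for pos, new_item in zip(masked_positions, replacements):
        corrupted[pos] = new_item
    # 1 where the item differs from the original sequence, 0 otherwise.
    discriminator_labels = [
        int(fake != original) for fake, original in zip(corrupted, itemid_seq)
    ]
    return corrupted, discriminator_labels


corrupted, disc_labels = corrupt_sequence([5, 8, 3, 7], [1, 3], [8, 2])
print(corrupted, disc_labels)
```

Position 1 was masked but the sampled item happens to match the original, so under the stated assumption it is labelled as original.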
-
sample_from_softmax
(logits: torch.Tensor) → torch.Tensor[source] Sampling method for replacement token modeling (ELECTRA)
- Parameters
logits (torch.Tensor(pos_item, vocab_size)) – scores of probability of masked positions returned by the generator model
- Returns
samples – ids of replacement items.
- Return type
torch.Tensor(#pos_item)
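One common way to draw samples from softmax scores without computing the softmax explicitly is the Gumbel-max trick: add Gumbel noise to each logit and take the argmax. Whether the library uses exactly this approach is an assumption; the sketch below, with the hypothetical name `sample_from_logits`, only illustrates the technique on nested lists.

```python
import math
import random
from typing import List


def sample_from_logits(logits: List[List[float]], rng: random.Random) -> List[int]:
    """Draw one index per row with probability proportional to softmax(row)."""
    samples = []
    for row in logits:
        noisy = []
        for logit in row:
            # Clamp away from 0 so both logarithms stay finite.
            u = max(rng.random(), 1e-12)
            gumbel = -math.log(-math.log(u))
            noisy.append(logit + gumbel)
        # Argmax over the perturbed logits is a sample from the softmax.
        samples.append(max(range(len(row)), key=noisy.__getitem__))
    return samples


samples = sample_from_logits([[0.0, 100.0, 0.0], [100.0, 0.0, 0.0]], random.Random(0))
print(samples)
```

With one logit far larger than the others, the sampled index is that dominant item; with closer logits the output varies from draw to draw.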
transformers4rec.torch.ranking_metric module
-
class
transformers4rec.torch.ranking_metric.
RankingMetric
(top_ks=None, labels_onehot=False)[source] Bases:
torchmetrics.metric.Metric
Metric wrapper for computing ranking metrics@K for session-based task.
- Parameters
-
update
(preds: torch.Tensor, target: torch.Tensor, **kwargs)[source]
-
class
transformers4rec.torch.ranking_metric.
AvgPrecisionAt
(top_ks=None, labels_onehot=False)[source]
transformers4rec.torch.trainer module
-
class
transformers4rec.torch.trainer.
Trainer
(model: transformers4rec.torch.model.base.Model, args: transformers4rec.config.trainer.T4RecTrainingArguments, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, train_dataset_or_path=None, eval_dataset_or_path=None, test_dataset_or_path=None, train_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, eval_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, test_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, callbacks: Optional[List[transformers.trainer_callback.TrainerCallback]] = [], compute_metrics=None, incremental_logging: bool = False, **kwargs)[source] Bases:
transformers.trainer.Trainer
A
Trainer
specialized for sequential recommendation, including session-based and sequential recommendation.
- Parameters
model (Model) – The Model defined using Transformers4Rec api.
args (T4RecTrainingArguments) – The training arguments needed to setup training and evaluation experiments.
schema (Optional[Dataset.schema], optional) – The schema object including features to use and their properties. by default None
train_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or DataSet to use for training. by default None
eval_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or DataSet to use for evaluation. by default None
train_dataloader (Optional[DataLoader], optional) – The data generator to use for training. by default None
eval_dataloader (Optional[DataLoader], optional) – The data generator to use for evaluation. by default None
compute_metrics (Optional[bool], optional) – Whether to compute metrics defined by Model class or not. by default None
incremental_logging (bool) – Whether to enable incremental logging or not. If True, it ensures that global steps are incremented over many trainer.train() calls, so that train and eval metrics steps do not overlap and can be seen properly in reports like W&B and Tensorboard
-
get_train_dataloader
()[source] Set the train dataloader used by Trainer. A user-defined data loader passed to the constructor takes precedence. When that attribute is None, the data loader is built from train_dataset and the data_loader_engine specified in the training arguments.
-
get_eval_dataloader
(eval_dataset=None)[source] Set the eval dataloader used by Trainer. A user-defined data loader passed to the constructor takes precedence. When that attribute is None, the data loader is built from eval_dataset and the data_loader_engine specified in the training arguments.
-
get_test_dataloader
(test_dataset=None)[source] Set the test dataloader used by Trainer. A user-defined data loader passed to the constructor takes precedence. When that attribute is None, the data loader is built from test_dataset and the data_loader_engine specified in the training arguments.
-
num_examples
(dataloader: torch.utils.data.dataloader.DataLoader)[source] Overriding
Trainer.num_examples()
method because the data loaders for this project do not return the dataset size, but the number of steps. The dataset size is therefore estimated by multiplying the number of steps by the batch size.
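The estimate is plain arithmetic. In the sketch below the function name `estimate_num_examples` and the example numbers are hypothetical.

```python
def estimate_num_examples(num_steps: int, batch_size: int) -> int:
    """Approximate the dataset size from the number of batches and the batch size."""
    return num_steps * batch_size


estimate = estimate_num_examples(num_steps=125, batch_size=64)
print(estimate)
```

The result is an upper bound when the final batch is only partially filled.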
-
reset_lr_scheduler
() → None[source] Resets the LR scheduler of the previous
Trainer.train()
call, so that a new LR scheduler is created by the next Trainer.train()
call. This is important for LR schedules like get_linear_schedule_with_warmup(), which decays the LR to 0 at the end of training.
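To see why a reset matters, consider the learning-rate multiplier of a linear schedule with warmup. The sketch below reproduces the usual shape of such a schedule in plain Python; the function name `linear_warmup_decay` and the step counts are hypothetical.

```python
def linear_warmup_decay(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """Multiplier applied to the base LR: linear ramp up, then linear decay to 0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


multipliers = [linear_warmup_decay(s, 10, 100) for s in (0, 10, 55, 100, 120)]
print(multipliers)
```

Once the step count reaches num_training_steps the multiplier stays at 0, so a further train() call that reused the same scheduler would train with a zero learning rate.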
-
create_scheduler
(num_training_steps: int, optimizer: Optional[torch.optim.optimizer.Optimizer] = None)[source]
-
static
get_scheduler
(name: Union[str, transformers.trainer_utils.SchedulerType], optimizer: torch.optim.optimizer.Optimizer, num_warmup_steps: Optional[int] = None, num_training_steps: Optional[int] = None, num_cycles: Optional[int] = 0.5)[source] Unified API to get any scheduler from its name.
- Parameters
name (str or SchedulerType) – The name of the scheduler to use.
optimizer (torch.optim.Optimizer) – The optimizer that will be used during training.
num_warmup_steps (int, optional) – The number of warm-up steps to perform. This is not required by all schedulers (hence the argument being optional), the function will raise an error if it’s unset and the scheduler type requires it.
num_training_steps (int, optional) – The number of training steps to do. This is not required by all schedulers (hence the argument being optional), the function will raise an error if it’s unset and the scheduler type requires it.
num_cycles (int, optional) – The number of waves in the cosine schedule / hard restarts to use for cosine scheduler
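The effect of num_cycles on a cosine schedule can be sketched with the standard cosine-with-warmup multiplier. This is an illustration under that assumption, not the library's code; the function name `cosine_multiplier` is hypothetical.

```python
import math


def cosine_multiplier(
    step: int,
    num_warmup_steps: int,
    num_training_steps: int,
    num_cycles: float = 0.5,
) -> float:
    """Multiplier applied to the base LR for a cosine schedule with linear warmup."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    # num_cycles = 0.5 traces half a cosine wave: from 1 down to 0.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress)))


start = cosine_multiplier(10, 10, 110)
middle = cosine_multiplier(60, 10, 110)
end = cosine_multiplier(110, 10, 110)
print(start, middle, end)
```

With the default of 0.5 the multiplier falls monotonically from 1 to 0; larger values add further waves within the same number of training steps.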
-
compute_loss
(model, inputs, return_outputs=False)[source] Overriding
Trainer.compute_loss()
to allow passing the targets to the model’s forward method. This defines how the loss is computed by Trainer. By default, all Transformers4Rec models return a dictionary of three elements: {‘loss’, ‘predictions’, ‘labels’}
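A minimal sketch of how a loss can be pulled out of such a dictionary, assuming the three keys named above and the usual return_outputs convention. The helper name `unpack_loss` and the example values are hypothetical.

```python
from typing import Any, Dict, Tuple, Union


def unpack_loss(
    outputs: Dict[str, Any], return_outputs: bool = False
) -> Union[Any, Tuple[Any, Dict[str, Any]]]:
    """Return the loss alone, or the loss together with the full model outputs."""
    loss = outputs["loss"]
    return (loss, outputs) if return_outputs else loss


# Stand-in for a model's forward output; real values would be tensors.
outputs = {"loss": 0.42, "predictions": [[0.1, 0.9]], "labels": [1]}
loss_only = unpack_loss(outputs)
loss, full = unpack_loss(outputs, return_outputs=True)
print(loss_only, sorted(full))
```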
-
prediction_step
(model: torch.nn.modules.module.Module, inputs: Dict[str, torch.Tensor], prediction_loss_only: bool, ignore_keys: Optional[List[str]] = None, training: bool = False, testing: bool = True) → Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor], Optional[Dict[str, Any]]][source] Overriding
Trainer.prediction_step()
to provide more flexibility to unpack results from the model, such as returning labels that are not exactly one input feature.
-
evaluation_loop
(dataloader: torch.utils.data.dataloader.DataLoader, description: str, prediction_loss_only: Optional[bool] = None, ignore_keys: Optional[List[str]] = None, metric_key_prefix: Optional[str] = 'eval') → transformers.trainer_utils.EvalLoopOutput[source] Overriding
Trainer.prediction_loop()
(shared by Trainer.evaluate()
and Trainer.predict()
) to provide more flexibility to work with streaming metrics (computed at each eval batch) and to log with the outputs of the model (e.g. prediction scores, prediction metadata, attention weights).
- Parameters
dataloader (DataLoader) – DataLoader object to use to iterate over evaluation data
description (str) – Parameter to describe the evaluation experiment. e.g: Prediction, test
prediction_loss_only (Optional[bool]) – Whether or not to return the loss only. by default None
ignore_keys (Optional[List[str]]) – Columns not accepted by the
model.forward()
method are automatically removed. by default None
metric_key_prefix (Optional[str]) – Prefix to use when logging evaluation metrics. by default eval
-
load_model_trainer_states_from_checkpoint
(checkpoint_path, model=None)[source] This method loads the checkpoint states of the model, the trainer and the random number generators. If model is None, the serialized model class is loaded from the checkpoint. It does not load the optimizer and LR scheduler states (for a complete load, call trainer.train() with the resume_from_checkpoint argument).
-
property
log_predictions_callback
-
class
transformers4rec.torch.trainer.
IncrementalLoggingCallback
(trainer: transformers4rec.torch.trainer.Trainer)[source] Bases:
transformers.trainer_callback.TrainerCallback
A
TrainerCallback
that changes the state of the Trainer on specific hooks for the purpose of incremental logging.
- Parameters
trainer (Trainer) –
transformers4rec.torch.typing module
Module contents
-
class
transformers4rec.torch.
Schema
(feature: Sequence[merlin_standard_lib.proto.schema_bp.Feature] = <betterproto._PLACEHOLDER object>, sparse_feature: List[merlin_standard_lib.proto.schema_bp.SparseFeature] = <betterproto._PLACEHOLDER object>, weighted_feature: List[merlin_standard_lib.proto.schema_bp.WeightedFeature] = <betterproto._PLACEHOLDER object>, string_domain: List[merlin_standard_lib.proto.schema_bp.StringDomain] = <betterproto._PLACEHOLDER object>, float_domain: List[merlin_standard_lib.proto.schema_bp.FloatDomain] = <betterproto._PLACEHOLDER object>, int_domain: List[merlin_standard_lib.proto.schema_bp.IntDomain] = <betterproto._PLACEHOLDER object>, default_environment: List[str] = <betterproto._PLACEHOLDER object>, annotation: merlin_standard_lib.proto.schema_bp.Annotation = <betterproto._PLACEHOLDER object>, dataset_constraints: merlin_standard_lib.proto.schema_bp.DatasetConstraints = <betterproto._PLACEHOLDER object>, tensor_representation_group: Dict[str, merlin_standard_lib.proto.schema_bp.TensorRepresentationGroup] = <betterproto._PLACEHOLDER object>)[source] Bases:
merlin_standard_lib.proto.schema_bp._Schema
A collection of column schemas for a dataset.
-
feature
: List[merlin_standard_lib.schema.schema.ColumnSchema] = Field(name=None,type=None,default=<betterproto._PLACEHOLDER object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({'betterproto': FieldMetadata(number=1, proto_type='message', map_types=None, group=None, wraps=None)}),_field_type=None)
-
classmethod
create
(column_schemas: Optional[Union[List[Union[merlin_standard_lib.schema.schema.ColumnSchema, str]], Dict[str, Union[merlin_standard_lib.schema.schema.ColumnSchema, str]]]] = None, **kwargs)[source]
-
apply
(selector) → merlin_standard_lib.schema.schema.Schema[source]
-
apply_inverse
(selector) → merlin_standard_lib.schema.schema.Schema[source]
-
select_by_type
(to_select) → merlin_standard_lib.schema.schema.Schema[source]
-
remove_by_type
(to_remove) → merlin_standard_lib.schema.schema.Schema[source]
-
select_by_tag
(to_select) → merlin_standard_lib.schema.schema.Schema[source]
-
remove_by_tag
(to_remove) → merlin_standard_lib.schema.schema.Schema[source]
-
select_by_name
(to_select) → merlin_standard_lib.schema.schema.Schema[source]
-
remove_by_name
(to_remove) → merlin_standard_lib.schema.schema.Schema[source]
-
map_column_schemas
(map_fn: Callable[[merlin_standard_lib.schema.schema.ColumnSchema], merlin_standard_lib.schema.schema.ColumnSchema]) → merlin_standard_lib.schema.schema.Schema[source]
-
filter_column_schemas
(filter_fn: Callable[[merlin_standard_lib.schema.schema.ColumnSchema], bool], negate=False) → merlin_standard_lib.schema.schema.Schema[source]
-
property
column_names
-
property
column_schemas
-
property
item_id_column_name
-
from_json
(value: Union[str, bytes]) → merlin_standard_lib.schema.schema.Schema[source]
-
from_proto_text
(path_or_proto_text: str) → merlin_standard_lib.schema.schema.Schema[source]
-
copy
(**kwargs) → merlin_standard_lib.schema.schema.Schema[source]
-
add
(other, allow_overlap=True) → merlin_standard_lib.schema.schema.Schema[source]
-
-
class
transformers4rec.torch.
T4RecConfig
[source] Bases:
object
-
to_torch_model
(input_features, *prediction_task, task_blocks=None, task_weights=None, loss_reduction='mean', **kwargs)[source]
-
property
transformers_config_cls
-
-
class
transformers4rec.torch.
GPT2Config
(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12, n_head=12, n_inner=None, activation_function='gelu_new', resid_pdrop=0.1, embd_pdrop=0.1, attn_pdrop=0.1, layer_norm_epsilon=1e-05, initializer_range=0.02, summary_type='cls_index', summary_use_proj=True, summary_activation=None, summary_proj_to_labels=True, summary_first_dropout=0.1, scale_attn_weights=True, use_cache=True, bos_token_id=50256, eos_token_id=50256, scale_attn_by_inverse_layer_idx=False, reorder_and_upcast_attn=False, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig
,transformers.models.gpt2.configuration_gpt2.GPT2Config
-
class
transformers4rec.torch.
XLNetConfig
(vocab_size=32000, d_model=1024, n_layer=24, n_head=16, d_inner=4096, ff_activation='gelu', untie_r=True, attn_type='bi', initializer_range=0.02, layer_norm_eps=1e-12, dropout=0.1, mem_len=512, reuse_len=None, use_mems_eval=True, use_mems_train=False, bi_data=False, clamp_len=- 1, same_length=False, summary_type='last', summary_use_proj=True, summary_activation='tanh', summary_last_dropout=0.1, start_n_top=5, end_n_top=5, pad_token_id=5, bos_token_id=1, eos_token_id=2, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig
,transformers.models.xlnet.configuration_xlnet.XLNetConfig
-
class
transformers4rec.torch.
TransfoXLConfig
(vocab_size=267735, cutoffs=[20000, 40000, 200000], d_model=1024, d_embed=1024, n_head=16, d_head=64, d_inner=4096, div_val=4, pre_lnorm=False, n_layer=18, mem_len=1600, clamp_len=1000, same_length=True, proj_share_all_but_first=True, attn_type=0, sample_softmax=- 1, adaptive=True, dropout=0.1, dropatt=0.0, untie_r=True, init='normal', init_range=0.01, proj_init_std=0.01, init_std=0.02, layer_norm_epsilon=1e-05, eos_token_id=0, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig
,transformers.models.transfo_xl.configuration_transfo_xl.TransfoXLConfig
-
class
transformers4rec.torch.
LongformerConfig
(attention_window: Union[List[int], int] = 512, sep_token_id: int = 2, pad_token_id: int = 1, bos_token_id: int = 0, eos_token_id: int = 2, vocab_size: int = 30522, hidden_size: int = 768, num_hidden_layers: int = 12, num_attention_heads: int = 12, intermediate_size: int = 3072, hidden_act: str = 'gelu', hidden_dropout_prob: float = 0.1, attention_probs_dropout_prob: float = 0.1, max_position_embeddings: int = 512, type_vocab_size: int = 2, initializer_range: float = 0.02, layer_norm_eps: float = 1e-12, onnx_export: bool = False, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig
,transformers.models.longformer.configuration_longformer.LongformerConfig
-
class
transformers4rec.torch.
AlbertConfig
(vocab_size=30000, embedding_size=128, hidden_size=4096, num_hidden_layers=12, num_hidden_groups=1, num_attention_heads=64, intermediate_size=16384, inner_group_num=1, hidden_act='gelu_new', hidden_dropout_prob=0, attention_probs_dropout_prob=0, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, classifier_dropout_prob=0.1, position_embedding_type='absolute', pad_token_id=0, bos_token_id=2, eos_token_id=3, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig
,transformers.models.albert.configuration_albert.AlbertConfig
-
class
transformers4rec.torch.
ReformerConfig
(attention_head_size=64, attn_layers=['local', 'lsh', 'local', 'lsh', 'local', 'lsh'], axial_norm_std=1.0, axial_pos_embds=True, axial_pos_shape=[64, 64], axial_pos_embds_dim=[64, 192], chunk_size_lm_head=0, eos_token_id=2, feed_forward_size=512, hash_seed=None, hidden_act='relu', hidden_dropout_prob=0.05, hidden_size=256, initializer_range=0.02, is_decoder=False, layer_norm_eps=1e-12, local_num_chunks_before=1, local_num_chunks_after=0, local_attention_probs_dropout_prob=0.05, local_attn_chunk_length=64, lsh_attn_chunk_length=64, lsh_attention_probs_dropout_prob=0.0, lsh_num_chunks_before=1, lsh_num_chunks_after=0, max_position_embeddings=4096, num_attention_heads=12, num_buckets=None, num_hashes=1, pad_token_id=0, vocab_size=320, tie_word_embeddings=False, use_cache=True, classifier_dropout=None, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig
,transformers.models.reformer.configuration_reformer.ReformerConfig
-
class
transformers4rec.torch.
ElectraConfig
(vocab_size=30522, embedding_size=128, hidden_size=256, num_hidden_layers=12, num_attention_heads=4, intermediate_size=1024, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, summary_type='first', summary_use_proj=True, summary_activation='gelu', summary_last_dropout=0.1, pad_token_id=0, position_embedding_type='absolute', use_cache=True, classifier_dropout=None, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig
,transformers.models.electra.configuration_electra.ElectraConfig
-
class
transformers4rec.torch.
T4RecTrainingArguments
(output_dir: str, overwrite_output_dir: bool = False, do_train: bool = False, do_eval: bool = False, do_predict: bool = False, evaluation_strategy: Union[transformers.trainer_utils.IntervalStrategy, str] = 'no', prediction_loss_only: bool = False, per_device_train_batch_size: int = 8, per_device_eval_batch_size: int = 8, per_gpu_train_batch_size: Optional[int] = None, per_gpu_eval_batch_size: Optional[int] = None, gradient_accumulation_steps: int = 1, eval_accumulation_steps: Optional[int] = None, eval_delay: Optional[float] = 0, learning_rate: float = 5e-05, weight_decay: float = 0.0, adam_beta1: float = 0.9, adam_beta2: float = 0.999, adam_epsilon: float = 1e-08, max_grad_norm: float = 1.0, num_train_epochs: float = 3.0, max_steps: int = - 1, lr_scheduler_type: Union[transformers.trainer_utils.SchedulerType, str] = 'linear', warmup_ratio: float = 0.0, warmup_steps: int = 0, log_level: Optional[str] = 'passive', log_level_replica: Optional[str] = 'warning', log_on_each_node: bool = True, logging_dir: Optional[str] = None, logging_strategy: Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps', logging_first_step: bool = False, logging_steps: float = 500, logging_nan_inf_filter: bool = True, save_strategy: Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps', save_steps: float = 500, save_total_limit: Optional[int] = None, save_safetensors: Optional[bool] = False, save_on_each_node: bool = False, no_cuda: bool = False, use_mps_device: bool = False, seed: int = 42, data_seed: Optional[int] = None, jit_mode_eval: bool = False, use_ipex: bool = False, bf16: bool = False, fp16: bool = False, fp16_opt_level: str = 'O1', half_precision_backend: str = 'auto', bf16_full_eval: bool = False, fp16_full_eval: bool = False, tf32: Optional[bool] = None, local_rank: int = - 1, ddp_backend: Optional[str] = None, tpu_num_cores: Optional[int] = None, tpu_metrics_debug: bool = False, debug: str = '', dataloader_drop_last: bool = False, eval_steps: 
Optional[float] = None, dataloader_num_workers: int = 0, past_index: int = - 1, run_name: Optional[str] = None, disable_tqdm: Optional[bool] = None, remove_unused_columns: Optional[bool] = True, label_names: Optional[List[str]] = None, load_best_model_at_end: Optional[bool] = False, metric_for_best_model: Optional[str] = None, greater_is_better: Optional[bool] = None, ignore_data_skip: bool = False, sharded_ddp: str = '', fsdp: str = '', fsdp_min_num_params: int = 0, fsdp_config: Optional[str] = None, fsdp_transformer_layer_cls_to_wrap: Optional[str] = None, deepspeed: Optional[str] = None, label_smoothing_factor: float = 0.0, optim: Union[transformers.training_args.OptimizerNames, str] = 'adamw_hf', optim_args: Optional[str] = None, adafactor: bool = False, group_by_length: bool = False, length_column_name: Optional[str] = 'length', report_to: Optional[List[str]] = None, ddp_find_unused_parameters: Optional[bool] = None, ddp_bucket_cap_mb: Optional[int] = None, dataloader_pin_memory: bool = True, skip_memory_metrics: bool = True, use_legacy_prediction_loop: bool = False, push_to_hub: bool = False, resume_from_checkpoint: Optional[str] = None, hub_model_id: Optional[str] = None, hub_strategy: Union[transformers.trainer_utils.HubStrategy, str] = 'every_save', hub_token: Optional[str] = None, hub_private_repo: bool = False, gradient_checkpointing: bool = False, include_inputs_for_metrics: bool = False, fp16_backend: str = 'auto', push_to_hub_model_id: Optional[str] = None, push_to_hub_organization: Optional[str] = None, push_to_hub_token: Optional[str] = None, mp_parameters: str = '', auto_find_batch_size: bool = False, full_determinism: bool = False, torchdynamo: Optional[str] = None, ray_scope: Optional[str] = 'last', ddp_timeout: Optional[int] = 1800, torch_compile: bool = False, torch_compile_backend: Optional[str] = None, torch_compile_mode: Optional[str] = None, xpu_backend: Optional[str] = None, max_sequence_length: Optional[int] = None, shuffle_buffer_size: 
int = 0, data_loader_engine: str = 'merlin', eval_on_test_set: bool = False, eval_steps_on_train_set: int = 20, predict_top_k: int = 0, learning_rate_num_cosine_cycles_by_epoch: float = 1.25, log_predictions: bool = False, compute_metrics_each_n_steps: int = 1, experiments_group: str = 'default')[source] Bases:
transformers.training_args.TrainingArguments
Class that inherits from HF TrainingArguments and adds the arguments needed for session-based and sequential recommendation
- Parameters
shuffle_buffer_size (int) –
validate_every (Optional[int], int) – Run validation every this many epochs. -1 means no validation is used. by default -1
eval_on_test_set (bool) –
eval_steps_on_train_set (int) –
predict_top_k (Optional[int], int) – Truncate the recommendation list to the top-K predicted items (does not affect the computation of evaluation metrics). This parameter is specific to NextItemPredictionTask, by default 0
log_predictions (Optional[bool], bool) – Log predictions, labels and metadata features every --compute_metrics_each_n_steps (for the test set), by default False
log_attention_weights (Optional[bool], bool) – Log the inputs and attention weights every --eval_steps (test set only), by default False
learning_rate_num_cosine_cycles_by_epoch (Optional[float], float) – Number of cycles per epoch when --lr_scheduler_type = cosine_with_warmup, that is, the number of waves in the cosine schedule (e.g. 0.5 just decreases from the max value to 0, following a half-cosine), by default 1.25
experiments_group (Optional[str], str) – Name of the experiments group, used to organize job runs logged on W&B, by default "default"
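The meaning of the cycle count can be sketched with the usual cosine-with-warmup multiplier. This is an illustration rather than the library's code: the function name cosine_lr_multiplier is hypothetical, and multiplying the per-epoch cycle count by the number of epochs to obtain the total number of cycles is an assumption.

```python
import math


def cosine_lr_multiplier(step, total_steps, num_cycles, warmup_steps=0):
    """Return the factor (between 0 and 1) applied to the base learning rate."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the base learning rate.
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # num_cycles counts full cosine waves; 0.5 is a single half-cosine decay.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress)))


# Assumption: total cycles = cycles per epoch * number of epochs.
num_epochs = 2
total_cycles = 1.25 * num_epochs
schedule = [cosine_lr_multiplier(s, 100, total_cycles) for s in range(0, 101, 10)]
```

With num_cycles = 0.5 the multiplier starts at 1, passes through 0.5 at the midpoint, and reaches 0 at the last step, which is the half-cosine decay mentioned in the parameter description.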
-
property
place_model_on_device
Override the method to allow running training on cpu
-
class
transformers4rec.torch.
SequentialBlock
(*args, output_size=None)[source] Bases:
transformers4rec.torch.block.base.BlockBase
,torch.nn.modules.container.Sequential
-
property
inputs
-
add_module
(name: str, module: Optional[torch.nn.modules.module.Module]) → None[source]
-
add_module_and_maybe_build
(name: str, module, parent, idx) → torch.nn.modules.module.Module[source]
-
property
-
class
transformers4rec.torch.
BlockBase
(*args, **kwargs)[source] Bases:
transformers4rec.torch.utils.torch_utils.OutputSizeMixin
,torch.nn.modules.module.Module
-
class
transformers4rec.torch.
TabularBlock
(pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source] Bases:
transformers4rec.torch.block.base.BlockBase
,transformers4rec.torch.tabular.base.TabularModule
,abc.ABC
TabularBlock extends TabularModule to turn it into a block with output size info.
- Parameters
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
-
class
transformers4rec.torch.
Block
(module: torch.nn.modules.module.Module, output_size: Union[List[int], torch.Size])[source]
-
class
transformers4rec.torch.
MLPBlock
(dimensions, activation=<class 'torch.nn.modules.activation.ReLU'>, use_bias: bool = True, dropout=None, normalization=None, filter_features=None)[source] Bases:
transformers4rec.torch.block.base.BuildableBlock
-
build
(input_shape) → transformers4rec.torch.block.base.SequentialBlock[source]
-
-
class
transformers4rec.torch.
TabularTransformation
(*args, **kwargs)[source] Bases:
transformers4rec.torch.utils.torch_utils.OutputSizeMixin
,torch.nn.modules.module.Module
,abc.ABC
Transformation that takes in TabularData and outputs TabularData.
-
forward
(inputs: Dict[str, torch.Tensor], **kwargs) → Dict[str, torch.Tensor][source]
-
-
class
transformers4rec.torch.
SequentialTabularTransformations
(*transformation: Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]])[source] Bases:
transformers4rec.torch.block.base.SequentialBlock
A sequential container. Modules are added to it in the order they are passed in.
- Parameters
transformation (TabularTransformationType) – Transformations that are passed in here will be called in order.
-
class
transformers4rec.torch.
TabularAggregation
(*args, **kwargs)[source] Bases:
transformers4rec.torch.utils.torch_utils.OutputSizeMixin
,torch.nn.modules.module.Module
,abc.ABC
Aggregation of TabularData that outputs a single Tensor
-
forward
(inputs: Dict[str, torch.Tensor]) → torch.Tensor[source]
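As an illustration of the contract (a dictionary of per-feature tensors in, a single tensor out), here is a minimal sketch of a concatenation-style aggregation. It uses nested Python lists in place of torch.Tensor, and the name concat_aggregation and the sorted feature order are assumptions made for this example, not the library's implementation.

```python
from typing import Dict, List

# Feature name -> one vector per row of the batch.
TabularData = Dict[str, List[List[float]]]


def concat_aggregation(inputs: TabularData) -> List[List[float]]:
    """Concatenate every feature's vector row by row, in sorted name order."""
    names = sorted(inputs)
    batch_size = len(inputs[names[0]])
    # Every feature must describe the same number of rows.
    assert all(len(inputs[name]) == batch_size for name in names)
    return [
        [value for name in names for value in inputs[name][row]]
        for row in range(batch_size)
    ]


features = {
    "price": [[0.5], [0.1]],
    "category_emb": [[1.0, 2.0], [3.0, 4.0]],
}
aggregated = concat_aggregation(features)
```

Each output row has as many values as the feature dimensions add up to (here 2 + 1 = 3).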
-
-
class
transformers4rec.torch.
StochasticSwapNoise
(schema=None, pad_token=0, replacement_prob=0.1)[source] Bases:
transformers4rec.torch.tabular.base.TabularTransformation
Applies stochastic replacement of sequence features. It can be applied as a pre transform, like TransformerBlock(pre="stochastic-swap-noise")
-
forward
(inputs: Union[torch.Tensor, Dict[str, torch.Tensor]], input_mask: Optional[torch.Tensor] = None, **kwargs) → Union[torch.Tensor, Dict[str, torch.Tensor]][source]
-
augment
(input_tensor: torch.Tensor, mask: Optional[torch.Tensor] = None) → torch.Tensor[source]
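The idea behind the transformation can be sketched on a single padded sequence. The sketch below is a plain-Python illustration under stated assumptions, not the library's augment method: the function name swap_noise is hypothetical, and drawing replacements uniformly from the non-padded values of the same input is an assumption about the sampling scheme.

```python
import random
from typing import List, Optional


def swap_noise(sequence: List[int], replacement_prob: float = 0.1,
               pad_token: int = 0,
               rng: Optional[random.Random] = None) -> List[int]:
    """Replace each non-padded item, with probability replacement_prob,
    by a value drawn from the non-padded items of the same input."""
    rng = rng or random.Random()
    candidates = [item for item in sequence if item != pad_token]
    noisy = []
    for item in sequence:
        if item != pad_token and rng.random() < replacement_prob:
            noisy.append(rng.choice(candidates))
        else:
            # Padding positions and unselected items are left untouched.
            noisy.append(item)
    return noisy


session = [5, 7, 9, 0, 0]  # item ids followed by padding
noisy_session = swap_noise(session, replacement_prob=0.5, rng=random.Random(0))
```

Padding positions are never replaced, so the sequence length and the padding layout are preserved.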
-
-
class
transformers4rec.torch.
TabularLayerNorm
(features_dim: Optional[Dict[str, int]] = None)[source] Bases:
transformers4rec.torch.tabular.base.TabularTransformation
Applies Layer norm to each input feature individually, before the aggregation
-
classmethod
from_feature_config
(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig])[source]
-
forward
(inputs: Dict[str, torch.Tensor], **kwargs) → Dict[str, torch.Tensor][source]
-
classmethod
-
class
transformers4rec.torch.
TabularDropout
(dropout_rate=0.0)[source] Bases:
transformers4rec.torch.tabular.base.TabularTransformation
Applies dropout transformation.
-
forward
(inputs: Union[torch.Tensor, Dict[str, torch.Tensor]], **kwargs) → Union[torch.Tensor, Dict[str, torch.Tensor]][source]
-
-
class
transformers4rec.torch.
TransformerBlock
(transformer: Union[transformers.modeling_utils.PreTrainedModel, transformers.configuration_utils.PretrainedConfig], masking: Optional[transformers4rec.torch.masking.MaskSequence] = None, prepare_module: Optional[Type[transformers4rec.torch.block.transformer.TransformerPrepare]] = None, output_fn=<function TransformerBlock.<lambda>>)[source] Bases:
transformers4rec.torch.block.base.BlockBase
Class to support HF Transformers for session-based and sequential-based recommendation models.
- Parameters
transformer (TransformerBody) – The T4RecConfig or a pre-trained HF object related to a specific transformer architecture.
masking – Needed when masking is applied on the inputs.
-
TRANSFORMER_TO_PREPARE
: Dict[Type[transformers.modeling_utils.PreTrainedModel], Type[transformers4rec.torch.block.transformer.TransformerPrepare]] = {<class 'transformers.models.gpt2.modeling_gpt2.GPT2Model'>: <class 'transformers4rec.torch.block.transformer.GPT2Prepare'>}
-
classmethod
from_registry
(transformer: str, d_model: int, n_head: int, n_layer: int, total_seq_length: int, masking: Optional[transformers4rec.torch.masking.MaskSequence] = None)[source] Load the HF transformer architecture based on its name
- Parameters
transformer (str) – Name of the Transformer to use. Possible values are: ["reformer", "gpt2", "longformer", "electra", "albert", "xlnet"]
d_model (int) – Size of the hidden states for Transformers
n_head (int) – Number of attention heads for Transformers
n_layer (int) – Number of layers for RNNs and Transformers
total_seq_length (int) – The maximum sequence length
-
class
transformers4rec.torch.
ContinuousFeatures
(features: List[str], pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source] Bases:
transformers4rec.torch.features.base.InputBlock
Input block for continuous features.
- Parameters
features (List[str]) – List of continuous features to include in this module.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
-
class
transformers4rec.torch.
EmbeddingFeatures
(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], item_id: Optional[str] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source] Bases:
transformers4rec.torch.features.base.InputBlock
Input block for embedding-lookups for categorical features.
For multi-hot features, the embeddings will be aggregated into a single tensor using the mean.
- Parameters
feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.
item_id (str, optional) – The name of the feature that’s used for the item_id.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
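The mean aggregation of multi-hot features mentioned above can be sketched as follows. This is a plain-Python illustration with nested lists standing in for tensors; the helper mean_combiner and the toy table are hypothetical.

```python
from typing import List


def mean_combiner(table: List[List[float]], ids: List[int]) -> List[float]:
    """Look up each id in the table and average the embeddings element-wise."""
    rows = [table[i] for i in ids]
    dim = len(table[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]


# Hypothetical 4 x 2 embedding table (row index = category id).
table = [[0.0, 0.0], [1.0, 3.0], [3.0, 5.0], [8.0, 8.0]]
pooled = mean_combiner(table, [1, 2])  # multi-hot feature holding ids 1 and 2
```

A feature holding several category ids therefore still yields one vector of the table's embedding dimension.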
-
property
item_embedding_table
-
table_to_embedding_module
(table: transformers4rec.torch.features.embedding.TableConfig) → torch.nn.modules.module.Module[source]
-
classmethod
from_schema
(schema: merlin_standard_lib.schema.schema.Schema, embedding_dims: Optional[Dict[str, int]] = None, embedding_dim_default: int = 64, infer_embedding_sizes: bool = False, infer_embedding_sizes_multiplier: float = 2.0, embeddings_initializers: Optional[Dict[str, Callable[[Any], None]]] = None, combiner: str = 'mean', tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]]]] = None, item_id: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, aggregation=None, pre=None, post=None, **kwargs) → Optional[transformers4rec.torch.features.embedding.EmbeddingFeatures][source] Instantiates
EmbeddingFeatures
from a DatasetSchema.
- Parameters
schema (DatasetSchema) – Dataset schema
embedding_dims (Optional[Dict[str, int]], optional) – The dimension of the embedding table for each feature (key), by default None
embedding_dim_default (Optional[int], optional) – Default dimension of the embedding table, used when the feature is not found in embedding_dims, by default 64
infer_embedding_sizes (bool, optional) – Automatically defines the embedding dimension from the feature cardinality in the schema, by default False
infer_embedding_sizes_multiplier (Optional[float], optional) – Multiplier used by the heuristic to infer the embedding dimension from the feature cardinality. Reasonable values generally range between 2.0 and 10.0, by default 2.0
embeddings_initializers (Optional[Dict[str, Callable[[Any], None]]]) – Dict where keys are feature names and values are callable to initialize embedding tables
combiner (Optional[str], optional) – Feature aggregation option, by default “mean”
tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter columns, by default None
item_id (Optional[str], optional) – Name of the item id column (feature), by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None
- Returns
Returns the EmbeddingFeatures for the dataset schema
- Return type
Optional[EmbeddingFeatures]
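This page does not spell out the heuristic behind infer_embedding_sizes. The sketch below assumes a fourth-root rule scaled by the multiplier, which is a common choice for this kind of inference; both the formula and the helper name infer_embedding_dim are assumptions made for illustration, so check the library source before relying on exact values.

```python
import math


def infer_embedding_dim(cardinality: int, multiplier: float = 2.0) -> int:
    """Assumed heuristic: dim = ceil(cardinality ** 0.25 * multiplier)."""
    return int(math.ceil(math.pow(cardinality, 0.25) * multiplier))


# Hypothetical feature cardinalities read from a schema.
cardinalities = {"item_id": 50000, "category": 1000}
dims = {name: infer_embedding_dim(card) for name, card in cardinalities.items()}
```

Under this rule the dimension grows slowly with cardinality, and a larger multiplier scales every inferred dimension proportionally.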
-
item_ids
(inputs) → torch.Tensor[source]
-
class
transformers4rec.torch.
SoftEmbeddingFeatures
(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], layer_norm: bool = True, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, **kwarg)[source] Bases:
transformers4rec.torch.features.embedding.EmbeddingFeatures
Encapsulates continuous features encoded using the soft one-hot encoding embedding technique (SoftEmbedding), from https://arxiv.org/pdf/1708.00065.pdf. In a nutshell, it keeps an embedding table for each continuous feature, which is represented as a weighted average of embeddings.
- Parameters
feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.
layer_norm (boolean) – When layer_norm is true, TabularLayerNorm will be used in post.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
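The "weighted average of embeddings" can be sketched for a single scalar value. The version below is a plain-Python illustration, assuming the weights come from a softmax over a learned linear projection of the scalar; the function name soft_embedding, the random initialization, and the list-based tensors are all assumptions for this example.

```python
import math
import random
from typing import List


def soft_embedding(value: float, proj_w: List[float], proj_b: List[float],
                   table: List[List[float]]) -> List[float]:
    """Embed a scalar as a softmax-weighted average of the rows of the table."""
    logits = [w * value + b for w, b in zip(proj_w, proj_b)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    weights = [e / total for e in exps]  # non-negative, sums to 1
    dim = len(table[0])
    return [sum(wt * row[d] for wt, row in zip(weights, table))
            for d in range(dim)]


rng = random.Random(0)
cardinality, dim = 10, 8  # table rows and embedding dimension for one feature
proj_w = [rng.gauss(0, 1) for _ in range(cardinality)]
proj_b = [0.0] * cardinality
table = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(cardinality)]
embedded = soft_embedding(0.37, proj_w, proj_b, table)
```

Because the weights form a convex combination, every component of the result lies between the smallest and largest value of the corresponding table column.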
-
classmethod
from_schema
(schema: merlin_standard_lib.schema.schema.Schema, soft_embedding_cardinalities: Optional[Dict[str, int]] = None, soft_embedding_cardinality_default: int = 10, soft_embedding_dims: Optional[Dict[str, int]] = None, soft_embedding_dim_default: int = 8, embeddings_initializers: Optional[Dict[str, Callable[[Any], None]]] = None, layer_norm: bool = True, combiner: str = 'mean', tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]]]] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, **kwargs) → Optional[transformers4rec.torch.features.embedding.SoftEmbeddingFeatures][source] Instantiates
SoftEmbeddingFeatures
from a DatasetSchema.
- Parameters
schema (DatasetSchema) – Dataset schema
soft_embedding_cardinalities (Optional[Dict[str, int]], optional) – The cardinality of the embedding table for each feature (key), by default None
soft_embedding_cardinality_default (Optional[int], optional) – Default cardinality of the embedding table, used when the feature is not found in soft_embedding_cardinalities, by default 10
soft_embedding_dims (Optional[Dict[str, int]], optional) – The dimension of the embedding table for each feature (key), by default None
soft_embedding_dim_default (Optional[int], optional) – Default dimension of the embedding table, used when the feature is not found in soft_embedding_dims, by default 8
embeddings_initializers (Optional[Dict[str, Callable[[Any], None]]]) – Dict where keys are feature names and values are callables to initialize embedding tables
combiner (Optional[str], optional) – Feature aggregation option, by default “mean”
tags (Optional[Union[DefaultTags, list, str]], optional) – Tags to filter columns, by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None
- Returns
Returns a SoftEmbeddingFeatures instance from the dataset schema
- Return type
Optional[SoftEmbeddingFeatures]
-
table_to_embedding_module
(table: transformers4rec.torch.features.embedding.TableConfig) → transformers4rec.torch.features.embedding.SoftEmbedding[source]
-
class
transformers4rec.torch.
PretrainedEmbeddingsInitializer
(weight_matrix: Union[torch.Tensor, List[List[float]]], trainable: bool = False, **kwargs)[source] Bases:
torch.nn.modules.module.Module
Initializer of embedding tables with pre-trained weights
- Parameters
weight_matrix (Union[torch.Tensor, List[List[float]]]) – A 2D torch or numpy tensor, or a list of lists, with the pre-trained weights for embeddings. The expected dimensions are (embedding_cardinality, embedding_dim). The embedding_cardinality can be inferred from the column schema, for example, schema.select_by_name("item_id").feature[0].int_domain.max + 1. The first position of the embedding table is reserved for padded items (id=0).
trainable (bool) – Whether the embedding table should be trainable or not
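The expected layout of weight_matrix can be sketched as below. The helper build_weight_matrix and the toy vectors are hypothetical; the sketch only shows how to arrange pre-trained vectors into an (embedding_cardinality, embedding_dim) list of lists with row 0 left for padding. Filling items that have no pre-trained vector with zeros is a choice made for this example.

```python
from typing import Dict, List


def build_weight_matrix(vectors: Dict[int, List[float]], max_item_id: int,
                        dim: int) -> List[List[float]]:
    """Build a (max_item_id + 1, dim) matrix; row 0 stays zero for padding."""
    matrix = [[0.0] * dim for _ in range(max_item_id + 1)]
    for item_id, vector in vectors.items():
        if not 1 <= item_id <= max_item_id or len(vector) != dim:
            raise ValueError(f"unexpected id or dimension for item {item_id}")
        matrix[item_id] = list(vector)
    return matrix


# Hypothetical pre-trained vectors for items 1 and 3; item 2 has none.
weights = build_weight_matrix({1: [0.1, 0.2], 3: [0.5, 0.6]},
                              max_item_id=3, dim=2)
```

A nested list such as weights matches the List[List[float]] form accepted by the weight_matrix parameter, with max_item_id + 1 rows as described above.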
-
class
transformers4rec.torch.
TabularSequenceFeatures
(continuous_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, categorical_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, text_embedding_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, projection_module: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock, torch.nn.modules.module.Module]] = None, masking: Optional[transformers4rec.torch.masking.MaskSequence] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source] Bases:
transformers4rec.torch.features.tabular.TabularFeatures
Input module that combines different types of features to a sequence: continuous, categorical & text.
- Parameters
continuous_module (TabularModule, optional) – Module used to process continuous features.
categorical_module (TabularModule, optional) – Module used to process categorical features.
text_embedding_module (TabularModule, optional) – Module used to process text features.
projection_module (BlockOrModule, optional) – Module that’s used to project the output of this module, typically done by an MLPBlock.
masking (MaskSequence, optional) – Masking to apply to the inputs.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
-
EMBEDDING_MODULE_CLASS
alias of
transformers4rec.torch.features.sequence.SequenceEmbeddingFeatures
-
classmethod
from_schema
(schema: merlin_standard_lib.schema.schema.Schema, continuous_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CONTINUOUS: 'continuous'>,), categorical_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CATEGORICAL: 'categorical'>,), aggregation: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, continuous_projection: Optional[Union[int, List[int]]] = None, continuous_soft_embeddings: bool = False, projection: Optional[Union[torch.nn.modules.module.Module, transformers4rec.torch.block.base.BuildableBlock]] = None, d_output: Optional[int] = None, masking: Optional[Union[str, transformers4rec.torch.masking.MaskSequence]] = None, **kwargs) → transformers4rec.torch.features.sequence.TabularSequenceFeatures[source] Instantiates
TabularFeatures
from a DatasetSchema
- Parameters
schema (DatasetSchema) – Dataset schema
continuous_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the continuous features, by default Tags.CONTINUOUS
categorical_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the categorical features, by default Tags.CATEGORICAL
aggregation (Optional[str], optional) – Feature aggregation option, by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None
continuous_projection (Optional[Union[List[int], int]], optional) – If set, concatenate all numerical features and project them by a number of MLP layers. The argument accepts a list with the dimensions of the MLP layers, by default None
continuous_soft_embeddings (bool) – Indicates if the soft one-hot encoding technique must be used to represent continuous features, by default False
projection (Optional[Union[torch.nn.Module, BuildableBlock]], optional) – If set, project the aggregated embeddings vectors into hidden dimension vector space, by default None
d_output (Optional[int], optional) – If set, init a MLPBlock as projection module to project embeddings vectors, by default None
masking (Optional[Union[str, MaskSequence]], optional) – If set, apply masking to the input embeddings and compute masked labels. It requires a categorical_module including an item_id column, by default None
- Returns
Returns TabularFeatures from a dataset schema
- Return type
-
property
masking
-
property
item_id
-
property
item_embedding_table
-
class
transformers4rec.torch.
SequenceEmbeddingFeatures
(feature_config: Dict[str, transformers4rec.torch.features.embedding.FeatureConfig], item_id: Optional[str] = None, padding_idx: int = 0, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source] Bases:
transformers4rec.torch.features.embedding.EmbeddingFeatures
Input block for embedding-lookups for categorical features. This module produces 3-D tensors, this is useful for sequential models like transformers.
- Parameters
feature_config (Dict[str, FeatureConfig]) – This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.
item_id (str, optional) – The name of the feature that’s used for the item_id.
padding_idx (int) – The symbol to use for padding.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
-
table_to_embedding_module
(table: transformers4rec.torch.features.embedding.TableConfig) → torch.nn.modules.sparse.Embedding[source]
-
class
transformers4rec.torch.
FeatureConfig
(table: transformers4rec.torch.features.embedding.TableConfig, max_sequence_length: int = 0, name: Optional[str] = None)[source] Bases:
object
-
class
transformers4rec.torch.
TableConfig
(vocabulary_size: int, dim: int, initializer: Optional[Callable[[torch.Tensor], None]] = None, combiner: str = 'mean', name: Optional[str] = None)[source] Bases:
object
-
class
transformers4rec.torch.
TabularFeatures
(continuous_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, categorical_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, text_embedding_module: Optional[transformers4rec.torch.tabular.base.TabularModule] = None, pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source] Bases:
transformers4rec.torch.tabular.base.MergeTabular
Input module that combines different types of features: continuous, categorical & text.
- Parameters
continuous_module (TabularModule, optional) – Module used to process continuous features.
categorical_module (TabularModule, optional) – Module used to process categorical features.
text_embedding_module (TabularModule, optional) – Module used to process text features.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
-
CONTINUOUS_MODULE_CLASS
alias of
transformers4rec.torch.features.continuous.ContinuousFeatures
-
EMBEDDING_MODULE_CLASS
alias of
transformers4rec.torch.features.embedding.EmbeddingFeatures
-
SOFT_EMBEDDING_MODULE_CLASS
alias of
transformers4rec.torch.features.embedding.SoftEmbeddingFeatures
-
project_continuous_features
(mlp_layers_dims: Union[List[int], int]) → transformers4rec.torch.features.tabular.TabularFeatures[source] Combine all concatenated continuous features with stacked MLP layers
-
classmethod
from_schema
(schema: merlin_standard_lib.schema.schema.Schema, continuous_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CONTINUOUS: 'continuous'>,), categorical_tags: Optional[Union[merlin.schema.tags.TagSet, List[str], List[merlin.schema.tags.Tags], List[Union[str, merlin.schema.tags.Tags]], Tuple[merlin.schema.tags.Tags]]] = (<Tags.CATEGORICAL: 'categorical'>,), aggregation: Optional[str] = None, automatic_build: bool = True, max_sequence_length: Optional[int] = None, continuous_projection: Optional[Union[int, List[int]]] = None, continuous_soft_embeddings: bool = False, **kwargs) → transformers4rec.torch.features.tabular.TabularFeatures[source] Instantiates
TabularFeatures
from a DatasetSchema
- Parameters
schema (DatasetSchema) – Dataset schema
continuous_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the continuous features, by default Tags.CONTINUOUS
categorical_tags (Optional[Union[TagsType, Tuple[Tags]]], optional) – Tags to filter the categorical features, by default Tags.CATEGORICAL
aggregation (Optional[str], optional) – Feature aggregation option, by default None
automatic_build (bool, optional) – Automatically infers input size from features, by default True
max_sequence_length (Optional[int], optional) – Maximum sequence length for list features, by default None
continuous_projection (Optional[Union[List[int], int]], optional) – If set, concatenate all numerical features and project them by a number of MLP layers. The argument accepts a list with the dimensions of the MLP layers, by default None
continuous_soft_embeddings (bool) – Indicates if the soft one-hot encoding technique must be used to represent continuous features, by default False
- Returns
Returns TabularFeatures from a dataset schema
- Return type
-
property
continuous_module
-
property
categorical_module
-
class
transformers4rec.torch.
Head
(body: transformers4rec.torch.block.base.BlockBase, prediction_tasks: Union[List[transformers4rec.torch.model.base.PredictionTask], transformers4rec.torch.model.base.PredictionTask], task_blocks: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock, Dict[str, Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]]]] = None, task_weights: Optional[List[float]] = None, loss_reduction: str = 'mean', inputs: Optional[Union[transformers4rec.torch.features.sequence.TabularSequenceFeatures, transformers4rec.torch.features.tabular.TabularFeatures]] = None)[source] Bases:
torch.nn.modules.module.Module
,transformers4rec.torch.utils.torch_utils.LossMixin
,transformers4rec.torch.utils.torch_utils.MetricsMixin
Head of a Model. A head has a single body but can have multiple prediction tasks.
- Parameters
body (Block) – TODO
prediction_tasks (Union[List[PredictionTask], PredictionTask], optional) – TODO
task_blocks – TODO
task_weights (List[float], optional) – TODO
loss_reduction (str, default="mean") – TODO
inputs (TabularFeaturesType, optional) – TODO
-
build
(inputs=None, device=None, task_blocks=None)[source] Build each prediction task that’s part of the head.
- Parameters
body –
inputs –
device –
task_blocks –
-
classmethod
from_schema
(schema: merlin_standard_lib.schema.schema.Schema, body: transformers4rec.torch.block.base.BlockBase, task_blocks: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock, Dict[str, Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]]]] = None, task_weight_dict: Optional[Dict[str, float]] = None, loss_reduction: str = 'mean', inputs: Optional[Union[transformers4rec.torch.features.sequence.TabularSequenceFeatures, transformers4rec.torch.features.tabular.TabularFeatures]] = None) → transformers4rec.torch.model.base.Head[source] Instantiate a Head from a Schema through tagged targets.
- Parameters
schema (DatasetSchema) – Schema to use for inferring all targets based on the tags.
body –
task_blocks –
task_weight_dict –
loss_reduction –
inputs –
- Returns
- Return type
-
pop_labels
(inputs: Dict[str, torch.Tensor]) → Dict[str, torch.Tensor][source] Pop the labels from the different prediction_tasks from the inputs.
- Parameters
inputs (TabularData) – Input dictionary containing all targets.
- Returns
- Return type
TabularData
-
forward
(body_outputs: Union[torch.Tensor, Dict[str, torch.Tensor]], training: bool = False, testing: bool = False, targets: Optional[Union[torch.Tensor, Dict[str, torch.Tensor]]] = None, call_body: bool = False, top_k: Optional[int] = None, **kwargs) → Union[torch.Tensor, Dict[str, torch.Tensor]][source]
-
calculate_metrics
(predictions: Union[torch.Tensor, Dict[str, torch.Tensor]], targets: Union[torch.Tensor, Dict[str, torch.Tensor]]) → Dict[str, Union[Dict[str, torch.Tensor], torch.Tensor]][source] Calculate metrics of the task(s) set in the Head instance.
- Parameters
predictions – The prediction tensors to use for calculating metrics. They can be either a torch.Tensor if a single task is used or a dictionary of torch.Tensor if multiple tasks are used. In the second case, the dictionary is indexed by the task names.
targets – The tensor or dictionary of targets to use for computing the metrics of one or multiple tasks.
-
property
task_blocks
-
-
class
transformers4rec.torch.
Model
(*head: transformers4rec.torch.model.base.Head, head_weights: Optional[List[float]] = None, head_reduction: str = 'mean', optimizer: Type[torch.optim.optimizer.Optimizer] = <class 'torch.optim.adam.Adam'>, name: Optional[str] = None, max_sequence_length: Optional[int] = None, top_k: Optional[int] = None)[source] Bases:
torch.nn.modules.module.Module
,transformers4rec.torch.utils.torch_utils.LossMixin
,transformers4rec.torch.utils.torch_utils.MetricsMixin
-
forward
(inputs: Dict[str, torch.Tensor], targets=None, training=False, testing=False, **kwargs)[source]
-
calculate_metrics
(predictions: Union[torch.Tensor, Dict[str, torch.Tensor]], targets: Union[torch.Tensor, Dict[str, torch.Tensor]]) → Dict[str, Union[Dict[str, torch.Tensor], torch.Tensor]][source] Calculate metrics of the task(s) set in the Head instance.
- Parameters
predictions – The prediction tensors returned by the model. They can be either a torch.Tensor if a single task is used or a dictionary of torch.Tensor if multiple heads/tasks are used. In the second case, the dictionary is indexed by the task names.
targets – The tensor or dictionary of targets returned by the model. They are used for computing the metrics of one or multiple tasks.
-
compute_metrics
(mode=None) → Dict[str, Union[float, torch.Tensor]][source]
-
fit
(dataloader, optimizer=<class 'torch.optim.adam.Adam'>, eval_dataloader=None, num_epochs=1, amp=False, train=True, verbose=True, compute_metric=True)[source]
-
evaluate
(dataloader, targets=None, training=False, testing=True, verbose=True, mode='eval')[source]
-
property
input_schema
-
property
output_schema
-
property
prediction_tasks
-
save
(path: Union[str, os.PathLike], model_name='t4rec_model_class')[source] Saves the model to f"{export_path}/{model_name}.pkl" using cloudpickle.
- Parameters
path (Union[str, os.PathLike]) – Path to the directory where the T4Rec model should be saved.
model_name – The name given to the pickle file storing the T4Rec model, by default ‘t4rec_model_class’.
-
classmethod
load
(path: Union[str, os.PathLike], model_name='t4rec_model_class') → transformers4rec.torch.model.base.Model[source] Loads a T4Rec model that was saved with model.save().
- Parameters
path (Union[str, os.PathLike]) – Path to the directory where the T4Rec model is saved.
model_name – The name given to the pickle file storing the T4Rec model, by default ‘t4rec_model_class’.
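The save/load round trip described above can be sketched as follows. This is only an illustration of the file layout (`<path>/<model_name>.pkl`): it uses the standard-library `pickle` module as a stand-in for cloudpickle, and `save_object`, `load_object`, and `toy_model` are hypothetical names introduced for this example, not part of Transformers4Rec.

```python
import os
import pickle
import tempfile


def save_object(obj, path, model_name="t4rec_model_class"):
    """Pickle `obj` to <path>/<model_name>.pkl and return the file path."""
    os.makedirs(path, exist_ok=True)
    file_path = os.path.join(path, f"{model_name}.pkl")
    with open(file_path, "wb") as handle:
        pickle.dump(obj, handle)
    return file_path


def load_object(path, model_name="t4rec_model_class"):
    """Load the object previously written by `save_object`."""
    with open(os.path.join(path, f"{model_name}.pkl"), "rb") as handle:
        return pickle.load(handle)


# A plain dict stands in for a model so the example has no dependencies.
toy_model = {"name": "demo", "top_k": 10}

with tempfile.TemporaryDirectory() as tmp_dir:
    saved_name = os.path.basename(save_object(toy_model, tmp_dir))
    restored = load_object(tmp_dir)
```

Both functions take the directory and the file stem separately, mirroring the `path` and `model_name` arguments documented above.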
-
-
class
transformers4rec.torch.
PredictionTask
(loss: torch.nn.modules.module.Module, metrics: Optional[Iterable[torchmetrics.metric.Metric]] = None, target_name: Optional[str] = None, task_name: Optional[str] = None, forward_to_prediction_fn: Callable[[torch.Tensor], torch.Tensor] = <function PredictionTask.<lambda>>, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, pre: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, summary_type: str = 'last')[source] Bases:
torch.nn.modules.module.Module
,transformers4rec.torch.utils.torch_utils.LossMixin
,transformers4rec.torch.utils.torch_utils.MetricsMixin
Individual prediction-task of a model.
- Parameters
loss (torch.nn.Module) – The loss to use during training of this task.
metrics (torch.nn.Module) – The metrics to calculate during training & evaluation.
target_name (str, optional) – Name of the target, this is needed when there are multiple targets.
task_name – Name of the prediction task, if not provided a name will be automatically constructed based on the target-name & class-name.
forward_to_prediction_fn (Callable[[torch.Tensor], torch.Tensor]) – Function to apply before the prediction
task_block (BlockType) – Module to transform input tensor before computing predictions.
pre (BlockType) – Module to compute the prediction probabilities.
summary_type (str) –
- This is used to summarize a sequence into a single tensor. Accepted values are:
“last” – Take the last token hidden state (like XLNet)
“first” – Take the first token hidden state (like Bert)
“mean” – Take the mean of all tokens hidden states
“cls_index” – Supply a Tensor of classification token position (GPT/GPT-2)
“attn” – Not implemented now, use multi-head attention
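To make the `summary_type` options concrete, here is a minimal plain-Python sketch of the “last”, “first”, and “mean” reductions over a sequence of token hidden states. The function `summarize` is a hypothetical helper written for this illustration; it ignores padding and is not the library's implementation.

```python
def summarize(hidden_states, summary_type="last"):
    """Reduce a sequence of per-token vectors to a single vector.

    hidden_states: list of equal-length lists of floats, one per token.
    """
    if summary_type == "last":
        return list(hidden_states[-1])
    if summary_type == "first":
        return list(hidden_states[0])
    if summary_type == "mean":
        count = len(hidden_states)
        # zip(*...) groups the values of each hidden dimension together.
        return [sum(column) / count for column in zip(*hidden_states)]
    raise ValueError(f"unsupported summary_type: {summary_type!r}")


sequence = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
last_state = summarize(sequence, "last")
first_state = summarize(sequence, "first")
mean_state = summarize(sequence, "mean")
```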
-
build
(body: Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock], input_size, inputs: Optional[transformers4rec.torch.features.base.InputBlock] = None, device=None, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, pre=None)[source] This method is called when the block is converted to a model, i.e. when it is linked to a prediction head.
- Parameters
block – the model block to link with head
device – set the device for the metrics and layers of the task
-
forward
(inputs: torch.Tensor, targets: Optional[torch.Tensor] = None, training: bool = False, testing: bool = False)[source]
-
property
task_name
-
calculate_metrics
(predictions: torch.Tensor, targets: torch.Tensor) → Dict[str, torch.Tensor][source]
-
class
transformers4rec.torch.
AsTabular
(output_name: str)[source] Bases:
transformers4rec.torch.tabular.base.TabularBlock
Converts a Tensor to TabularData by converting it to a dictionary.
- Parameters
output_name (str) – Name that should be used as the key in the output dictionary.
-
forward
(inputs: torch.Tensor, **kwargs) → Dict[str, torch.Tensor][source]
-
class
transformers4rec.torch.
ConcatFeatures
(*args, **kwargs)[source] Bases:
transformers4rec.torch.tabular.base.TabularAggregation
Aggregation by stacking all values in TabularData; all non-sequential values will be converted to a sequence.
The output of this concatenation will have 3 dimensions.
-
forward
(inputs: Dict[str, torch.Tensor]) → torch.Tensor[source]
-
-
class
transformers4rec.torch.
FilterFeatures
(to_include: List[str], pop: bool = False)[source] Bases:
transformers4rec.torch.tabular.base.TabularTransformation
Module that filters out certain features from TabularData.
- Parameters
-
forward
(inputs: Dict[str, torch.Tensor], **kwargs) → Dict[str, torch.Tensor][source]
- Parameters
inputs (TabularData) – Input dictionary containing features to filter.
- Returns
Filtered TabularData that only contains the feature-names in self.to_include.
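The filtering behaviour described for `forward` can be illustrated with a plain dictionary. This is a sketch of the idea only: `filter_features` is a hypothetical helper, Python lists stand in for tensors, and the `pop` option is deliberately left out.

```python
def filter_features(inputs, to_include):
    """Return a new dict holding only the features named in `to_include`."""
    return {name: value for name, value in inputs.items() if name in to_include}


batch = {
    "item_id": [3, 7, 9],
    "category": [1, 1, 2],
    "price": [0.5, 1.2, 0.3],
}
filtered = filter_features(batch, ["item_id", "price"])
```

The input dictionary is left untouched; only the returned dictionary is restricted to the requested names.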
-
class
transformers4rec.torch.
ElementwiseSum
[source] Bases:
transformers4rec.torch.tabular.aggregation.ElementwiseFeatureAggregation
Aggregation by first stacking all values in TabularData in the first dimension, and then summing the result.
-
forward
(inputs: Dict[str, torch.Tensor]) → torch.Tensor[source]
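A small plain-Python sketch of the "stack, then sum over the first dimension" aggregation follows. It assumes every feature is a vector of the same length and uses lists in place of tensors; `elementwise_sum` is a hypothetical name for this illustration.

```python
def elementwise_sum(inputs):
    """Sum equally shaped feature vectors position by position."""
    vectors = list(inputs.values())  # conceptually: stack along a new first axis
    if len({len(vector) for vector in vectors}) > 1:
        raise ValueError("all features must have the same length")
    # Summing over the stacked axis adds the values found at each position.
    return [sum(values) for values in zip(*vectors)]


features = {"category": [1.0, 2.0, 3.0], "brand": [10.0, 20.0, 30.0]}
summed = elementwise_sum(features)
```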
-
-
class
transformers4rec.torch.
ElementwiseSumItemMulti
(schema: Optional[merlin_standard_lib.schema.schema.Schema] = None)[source] Bases:
transformers4rec.torch.tabular.aggregation.ElementwiseFeatureAggregation
Aggregation by applying the ElementwiseSum aggregation to all features except the item-id, and then multiplying this with the item-ids.
- Parameters
schema (DatasetSchema) –
-
forward
(inputs: Dict[str, torch.Tensor]) → torch.Tensor[source]
-
REQUIRES_SCHEMA
= True
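The aggregation described for ElementwiseSumItemMulti can be sketched in plain Python as below. The sketch assumes the multiplication is element-wise and that all features share one shape; `elementwise_sum_item_multi` and the feature names are hypothetical and only serve the illustration.

```python
def elementwise_sum_item_multi(inputs, item_id_name):
    """Sum all features except the item id, then multiply by the item-id vector."""
    item_vector = inputs[item_id_name]
    others = [value for name, value in inputs.items() if name != item_id_name]
    summed = [sum(values) for values in zip(*others)]
    # Assumption: the product with the item-id representation is element-wise.
    return [item * other for item, other in zip(item_vector, summed)]


features = {"item_id": [2.0, 0.5], "category": [1.0, 2.0], "price": [3.0, 4.0]}
result = elementwise_sum_item_multi(features, "item_id")
```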
-
class
transformers4rec.torch.
MergeTabular
(*modules_to_merge: Union[transformers4rec.torch.tabular.base.TabularModule, Dict[str, transformers4rec.torch.tabular.base.TabularModule]], pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, **kwargs)[source] Bases:
transformers4rec.torch.tabular.base.TabularBlock
Merge multiple TabularModule’s into a single output of TabularData.
- Parameters
modules_to_merge (Union[TabularModule, Dict[str, TabularModule]]) – TabularModules to merge; this can also be one or multiple dictionaries keyed by the name the module should have.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
-
property
merge_values
-
forward
(inputs: Dict[str, torch.Tensor], training=True, **kwargs) → Dict[str, torch.Tensor][source]
-
class
transformers4rec.torch.
StackFeatures
(axis: int = - 1)[source] Bases:
transformers4rec.torch.tabular.base.TabularAggregation
Aggregation by stacking all values in input dictionary in the given dimension.
- Parameters
axis (int, default=-1) – Axis to use for the stacking operation.
-
forward
(inputs: Dict[str, torch.Tensor]) → torch.Tensor[source]
-
class
transformers4rec.torch.
BinaryClassificationTask
(target_name: Optional[str] = None, task_name: Optional[str] = None, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, loss=BCELoss(), metrics=(BinaryPrecision(), BinaryRecall(), BinaryAccuracy()), summary_type='first')[source] Bases:
transformers4rec.torch.model.base.PredictionTask
Returns a PredictionTask for binary classification.
Example usage:
    # Assumes `schema`, `max_sequence_length`, and `d_model` are already defined.
    from transformers4rec import torch as tr
    import torchmetrics as tm

    # Define the input module to process the tabular input features.
    input_module = tr.TabularSequenceFeatures.from_schema(
        schema,
        max_sequence_length=max_sequence_length,
        continuous_projection=d_model,
        aggregation="concat",
        masking=None,
    )

    # Define XLNetConfig class and set default parameters for HF XLNet config.
    transformer_config = tr.XLNetConfig.build(
        d_model=d_model, n_head=4, n_layer=2, total_seq_length=max_sequence_length
    )

    # Define the model block including: inputs, masking, projection and transformer block.
    body = tr.SequentialBlock(
        input_module,
        tr.MLPBlock([64]),
        tr.TransformerBlock(transformer_config, masking=input_module.masking),
    )

    # Define a head with BinaryClassificationTask.
    head = tr.Head(
        body,
        tr.BinaryClassificationTask(
            "click",
            summary_type="mean",
            metrics=[
                tm.Precision(task='binary'),
                tm.Recall(task='binary'),
                tm.Accuracy(task='binary'),
                tm.F1Score(task='binary'),
            ],
        ),
        inputs=input_module,
    )

    # Get the end-to-end Model class.
    model = tr.Model(head)
- Parameters
target_name (Optional[str] = None) – Specifies the variable name that represents the positive and negative values.
task_name (Optional[str] = None) – Specifies the name of the prediction task. If this parameter is not specified, a name is automatically constructed based on target_name and the Python class name of the model.
task_block (Optional[BlockType] = None) – Specifies a module to transform the input tensor before computing predictions.
loss (torch.nn.Module) – Specifies the loss function for the task. The default class is torch.nn.BCELoss.
metrics (Tuple[torch.nn.Module, ..]) – Specifies the metrics to calculate during training and evaluation. The default metrics are Precision, Recall, and Accuracy.
summary_type (str) –
Summarizes a sequence into a single tensor. Accepted values are:
last – Take the last token hidden state (like XLNet)
first – Take the first token hidden state (like Bert)
mean – Take the mean of all tokens hidden states
cls_index – Supply a Tensor of classification token position (GPT/GPT-2)
attn – Not implemented now, use multi-head attention
-
DEFAULT_LOSS
= BCELoss()
-
DEFAULT_METRICS
= (BinaryPrecision(), BinaryRecall(), BinaryAccuracy())
-
class
transformers4rec.torch.
RegressionTask
(target_name: Optional[str] = None, task_name: Optional[str] = None, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, loss=MSELoss(), metrics=(MeanSquaredError()), summary_type='first')[source] Bases:
transformers4rec.torch.model.base.PredictionTask
-
DEFAULT_LOSS
= MSELoss()
-
DEFAULT_METRICS
= (MeanSquaredError(),)
-
-
class
transformers4rec.torch.
NextItemPredictionTask
(loss: torch.nn.modules.module.Module = CrossEntropyLoss(), metrics: Iterable[torchmetrics.metric.Metric] = (NDCGAt(), AvgPrecisionAt(), RecallAt()), task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, task_name: str = 'next-item', weight_tying: bool = False, softmax_temperature: float = 1, padding_idx: int = 0, target_dim: Optional[int] = None, sampled_softmax: Optional[bool] = False, max_n_samples: Optional[int] = 100)[source] Bases:
transformers4rec.torch.model.base.PredictionTask
This block performs the item prediction task for session-based and sequential models. It requires a body containing a masking schema to use for training and target generation. For the supported masking schemes, please refer to: https://nvidia-merlin.github.io/Transformers4Rec/main/model_definition.html#sequence-masking
- Parameters
loss (torch.nn.Module) – Loss function to use. Defaults to CrossEntropyLoss, as shown in the signature above.
metrics (Iterable[torchmetrics.Metric]) – List of ranking metrics to use for evaluation.
task_block – Module to transform input tensor before computing predictions.
task_name (str, optional) – Name of the prediction task, if not provided a name will be automatically constructed based on the target-name & class-name.
weight_tying (bool) – The item id embedding table weights are shared with the prediction network layer.
softmax_temperature (float) – Softmax temperature T, used to reduce model overconfidence: predictions are computed as softmax(logits / T). A value of 1.0 reduces to the regular softmax.
padding_idx (int) – pad token id.
target_dim (int) – vocabulary size of item ids
sampled_softmax (Optional[bool]) – Enables sampled softmax. By default False
max_n_samples (Optional[int]) – Number of samples for sampled softmax. By default 100
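The effect of `softmax_temperature` can be seen in a short plain-Python sketch of softmax(logits / T). `softmax_with_temperature` is a hypothetical helper for illustration and not the task's own code; the example logits are made up.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Compute softmax(logits / temperature) in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    shift = max(scaled)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(value - shift) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.0]
regular = softmax_with_temperature(logits, temperature=1.0)
softened = softmax_with_temperature(logits, temperature=2.0)
```

With a temperature above 1 the distribution flattens, so the top probability in `softened` is lower than in `regular`, which is the "reduced overconfidence" the parameter description refers to.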
-
DEFAULT_METRICS
= (NDCGAt(), AvgPrecisionAt(), RecallAt())
-
build
(body, input_size, device=None, inputs=None, task_block=None, pre=None)[source] Build method, this is called by the Head.
-
forward
(inputs: torch.Tensor, targets=None, training=False, testing=False, top_k=None, **kwargs)[source]
-
calculate_metrics
(predictions, targets) → Dict[str, torch.Tensor][source]
-
class
transformers4rec.torch.
TabularModule
(pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None, **kwargs)[source] Bases:
torch.nn.modules.module.Module
PyTorch Module that’s specialized for tabular data by integrating many commonly used operations.
- Parameters
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
-
classmethod
from_schema
(schema: merlin_standard_lib.schema.schema.Schema, tags=None, **kwargs) → Optional[transformers4rec.torch.tabular.base.TabularModule][source] Instantiate a TabularModule instance from a DatasetSchema.
- Parameters
schema –
tags –
kwargs –
- Returns
- Return type
Optional[TabularModule]
-
classmethod
from_features
(features: List[str], pre: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, post: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None) → transformers4rec.torch.tabular.base.TabularModule[source] Initializes a TabularModule instance where the contents of features will be filtered out.
- Parameters
features (List[str]) – A list of feature-names that will be used as the first pre-processing op to filter out all other features not in this list.
pre (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs when the module is called (so before forward).
post (Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional) – Transformations to apply on the inputs after the module is called (so after forward).
aggregation (Union[str, TabularAggregation], optional) – Aggregation to apply after processing the forward-method to output a single Tensor.
- Returns
- Return type
-
property
pre
- Return type
SequentialTabularTransformations, optional
-
property
post
- Return type
SequentialTabularTransformations, optional
-
property
aggregation
- Return type
TabularAggregation, optional
-
pre_forward
(inputs: Dict[str, torch.Tensor], transformations: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None) → Dict[str, torch.Tensor][source] Method that’s typically called before the forward method for pre-processing.
- Parameters
inputs (TabularData) – input-data, typically the output of the forward method.
transformations (TabularAggregationType, optional) –
- Returns
- Return type
TabularData
-
forward
(x: Dict[str, torch.Tensor], *args, **kwargs) → Dict[str, torch.Tensor][source]
-
post_forward
(inputs: Dict[str, torch.Tensor], transformations: Optional[Union[str, transformers4rec.torch.tabular.base.TabularTransformation, List[Union[str, transformers4rec.torch.tabular.base.TabularTransformation]]]] = None, merge_with: Optional[Union[transformers4rec.torch.tabular.base.TabularModule, List[transformers4rec.torch.tabular.base.TabularModule]]] = None, aggregation: Optional[Union[str, transformers4rec.torch.tabular.base.TabularAggregation]] = None) → Union[torch.Tensor, Dict[str, torch.Tensor]][source] Method that’s typically called after the forward method for post-processing.
- Parameters
inputs (TabularData) – input-data, typically the output of the forward method.
transformations (TabularTransformationType, optional) – Transformations to apply on the input data.
merge_with (Union[TabularModule, List[TabularModule]], optional) – Other TabularModule’s to call and merge the outputs with.
aggregation (TabularAggregationType, optional) – Aggregation to aggregate the output to a single Tensor.
- Returns
- Return type
TensorOrTabularData (Tensor when aggregation is set, else TabularData)
-
merge
(other)
-
class
transformers4rec.torch.
SoftEmbedding
(num_embeddings, embeddings_dim, emb_initializer=None)[source] Bases:
torch.nn.modules.module.Module
Soft one-hot encoding embedding technique, from https://arxiv.org/pdf/1708.00065.pdf. In a nutshell, it represents a continuous feature as a weighted average of embeddings.
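The "weighted average of embeddings" idea can be sketched in plain Python as follows. This is an assumption-laden illustration, not the module's code: it assumes the weights come from a softmax over a linear projection of the scalar feature, and `soft_embedding`, `proj_weights`, `proj_bias`, and the toy table are all hypothetical.

```python
import math


def soft_embedding(value, proj_weights, proj_bias, embedding_table):
    """Embed a scalar as a softmax-weighted average of the table rows."""
    # One score per embedding row, from a linear projection of the scalar.
    scores = [w * value + b for w, b in zip(proj_weights, proj_bias)]
    shift = max(scores)
    exps = [math.exp(score - shift) for score in scores]
    total = sum(exps)
    weights = [value_exp / total for value_exp in exps]
    dim = len(embedding_table[0])
    embedding = [
        sum(weight * row[d] for weight, row in zip(weights, embedding_table))
        for d in range(dim)
    ]
    return embedding, weights


table = [[1.0, 0.0], [0.0, 1.0]]  # two embeddings of dimension 2
embedding, weights = soft_embedding(0.8, [1.0, -1.0], [0.0, 0.0], table)
uniform_embedding, _ = soft_embedding(0.8, [0.0, 0.0], [0.0, 0.0], table)
```

When the projection is all zeros the weights are uniform and the result is the plain mean of the rows; otherwise the scalar value shifts the mix towards some rows.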
-
class
transformers4rec.torch.
Trainer
(model: transformers4rec.torch.model.base.Model, args: transformers4rec.config.trainer.T4RecTrainingArguments, schema: Optional[merlin_standard_lib.schema.schema.Schema] = None, train_dataset_or_path=None, eval_dataset_or_path=None, test_dataset_or_path=None, train_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, eval_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, test_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, callbacks: Optional[List[transformers.trainer_callback.TrainerCallback]] = [], compute_metrics=None, incremental_logging: bool = False, **kwargs)[source] Bases:
transformers.trainer.Trainer
A Trainer specialized for sequential recommendation, including session-based and sequential recommendation.
- Parameters
model (Model) – The Model defined using Transformers4Rec api.
args (T4RecTrainingArguments) – The training arguments needed to setup training and evaluation experiments.
schema (Optional[Dataset.schema], optional) – The schema object including features to use and their properties. By default None.
train_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or Dataset to use for training. By default None.
eval_dataset_or_path (Optional[Union[str, Dataset]], optional) – Path of parquet files or Dataset to use for evaluation. By default None.
train_dataloader (Optional[DataLoader], optional) – The data generator to use for training. By default None.
eval_dataloader (Optional[DataLoader], optional) – The data generator to use for evaluation. By default None.
compute_metrics (Optional[bool], optional) – Whether to compute metrics defined by Model class or not. By default None.
incremental_logging (bool) – Whether to enable incremental logging or not. If True, it ensures that global steps are incremented over many trainer.train() calls, so that train and eval metrics steps do not overlap and can be seen properly in reports like W&B and TensorBoard.
-
get_train_dataloader
()[source] Sets the train dataloader used by Trainer. A user-defined data loader can be set as an attribute in the constructor. When that attribute is None, the data loader is defined using train_dataset and the data_loader_engine specified in the training arguments.
-
get_eval_dataloader
(eval_dataset=None)[source] Sets the eval dataloader used by Trainer. A user-defined data loader can be set as an attribute in the constructor. When that attribute is None, the data loader is defined using eval_dataset and the data_loader_engine specified in the training arguments.
-
get_test_dataloader
(test_dataset=None)[source] Sets the test dataloader used by Trainer. A user-defined data loader can be set as an attribute in the constructor. When that attribute is None, the data loader is defined using test_dataset and the data_loader_engine specified in the training arguments.
-
num_examples
(dataloader: torch.utils.data.dataloader.DataLoader)[source] Overrides the Trainer.num_examples() method because the data loaders for this project do not return the dataset size, but the number of steps. The dataset size is therefore estimated as the number of steps multiplied by the batch size.
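The estimate described here is simple arithmetic, sketched below. `StepLoader` is a hypothetical stand-in for a data loader whose `len()` is the number of steps (batches) rather than the number of rows, and `estimate_num_examples` is an illustrative helper, not the Trainer method itself.

```python
class StepLoader:
    """Hypothetical loader whose len() is the number of steps (batches)."""

    def __init__(self, num_steps, batch_size):
        self.num_steps = num_steps
        self.batch_size = batch_size

    def __len__(self):
        return self.num_steps


def estimate_num_examples(loader):
    """Estimate the dataset size as number of steps * batch size."""
    return len(loader) * loader.batch_size


loader = StepLoader(num_steps=125, batch_size=64)
estimated = estimate_num_examples(loader)
```

Because the last batch of an epoch may be smaller than `batch_size`, a product like this is an upper-bound style estimate rather than an exact row count.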
-
reset_lr_scheduler
() → None[source] Resets the LR scheduler of the previous Trainer.train() call, so that a new LR scheduler is created by the next Trainer.train() call. This is important for LR schedules like get_linear_schedule_with_warmup(), which decays the LR to 0 at the end of training.
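To see why a reset matters, consider the multiplier a linear warmup/decay schedule applies to the base learning rate. The sketch below is written from the general description of such schedules; `linear_warmup_decay_factor` is a hypothetical helper and the step counts are made-up values.

```python
def linear_warmup_decay_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base LR at a given step."""
    if step < num_warmup_steps:
        # Linear ramp from 0 to 1 during warmup.
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    # Linear decay from 1 to 0, clamped at 0 once training steps are used up.
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


factors = [linear_warmup_decay_factor(step, 2, 10) for step in (0, 1, 2, 6, 10, 12)]
```

Once the step counter passes `num_training_steps` the factor stays at 0, so a second training run that reused the old scheduler would train with a learning rate of zero.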
-
create_scheduler
(num_training_steps: int, optimizer: Optional[torch.optim.optimizer.Optimizer] = None)[source]
-
static
get_scheduler
(name: Union[str, transformers.trainer_utils.SchedulerType], optimizer: torch.optim.optimizer.Optimizer, num_warmup_steps: Optional[int] = None, num_training_steps: Optional[int] = None, num_cycles: Optional[int] = 0.5)[source] Unified API to get any scheduler from its name.
- Parameters
name (str or SchedulerType) – The name of the scheduler to use.
optimizer (torch.optim.Optimizer) – The optimizer that will be used during training.
num_warmup_steps (int, optional) – The number of warm-up steps to perform. This is not required by all schedulers (hence the argument being optional); the function will raise an error if it’s unset and the scheduler type requires it.
num_training_steps (int, optional) – The number of training steps to do. This is not required by all schedulers (hence the argument being optional); the function will raise an error if it’s unset and the scheduler type requires it.
num_cycles (int, optional) – The number of waves in the cosine schedule / hard restarts to use for cosine scheduler.
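As an illustration of what `num_cycles` controls, the sketch below computes the learning-rate multiplier of a cosine schedule with warmup. It assumes the commonly used half-cosine formula; `cosine_factor` is a hypothetical helper, and the scheduler returned by get_scheduler may differ in details.

```python
import math


def cosine_factor(step, num_warmup_steps, num_training_steps, num_cycles=0.5):
    """Multiplier applied to the base LR under a cosine schedule with warmup."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    # num_cycles=0.5 covers half a cosine wave: from 1 down to 0.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress)))


start = cosine_factor(0, 0, 100)
middle = cosine_factor(50, 0, 100)
end = cosine_factor(100, 0, 100)
```

With the default of 0.5 the multiplier falls smoothly from 1 to 0 over training; larger values would make it oscillate through additional waves.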
-
compute_loss
(model, inputs, return_outputs=False)[source] Overrides Trainer.compute_loss() to allow passing the targets to the model’s forward method. This is how the loss is computed by Trainer. By default, all Transformers4Rec models return a dictionary of three elements: ‘loss’, ‘predictions’, and ‘labels’.
-
prediction_step
(model: torch.nn.modules.module.Module, inputs: Dict[str, torch.Tensor], prediction_loss_only: bool, ignore_keys: Optional[List[str]] = None, training: bool = False, testing: bool = True) → Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor], Optional[Dict[str, Any]]][source] Overriding
Trainer.prediction_step()
to provide more flexibility to unpack results from the model, like returning labels that are not exactly one input feature model
-
evaluation_loop
(dataloader: torch.utils.data.dataloader.DataLoader, description: str, prediction_loss_only: Optional[bool] = None, ignore_keys: Optional[List[str]] = None, metric_key_prefix: Optional[str] = 'eval') → transformers.trainer_utils.EvalLoopOutput[source] Overrides Trainer.prediction_loop() (shared by Trainer.evaluate() and Trainer.predict()) to provide more flexibility to work with streaming metrics (computed at each eval batch) and to log with the outputs of the model (e.g. prediction scores, prediction metadata, attention weights).
- Parameters
dataloader (DataLoader) – DataLoader object to use to iterate over evaluation data.
description (str) – Parameter to describe the evaluation experiment, e.g. Prediction, test.
prediction_loss_only (Optional[bool]) – Whether or not to return the loss only. By default None.
ignore_keys (Optional[List[str]]) – Columns not accepted by the model.forward() method are automatically removed. By default None.
metric_key_prefix (Optional[str]) – Prefix to use when logging evaluation metrics. By default eval.
-
load_model_trainer_states_from_checkpoint
(checkpoint_path, model=None)[source] This method loads the checkpoint states of the model, the trainer, and the random states. If model is None, the serialized model class is loaded from the checkpoint. It does not load the optimizer and LR scheduler states (for that, call trainer.train() with the resume_from_checkpoint argument for a complete load).
-
property
log_predictions_callback
-
transformers4rec.torch.
LabelSmoothCrossEntropyLoss
(smoothing: float = 0.0, reduction: str = 'mean', **kwargs)[source] Cross-entropy loss with label smoothing. This is going to be deprecated. Use torch.nn.CrossEntropyLoss() directly; in recent PyTorch versions it already supports the label_smoothing argument.
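For readers unfamiliar with label smoothing, here is a plain-Python sketch of the usual formulation for a single example: the one-hot target is mixed with a uniform distribution over the classes. `log_softmax` and `label_smoothing_cross_entropy` are hypothetical helpers written for this illustration; they are not the class documented above.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of logits."""
    shift = max(logits)
    log_total = shift + math.log(sum(math.exp(value - shift) for value in logits))
    return [value - log_total for value in logits]


def label_smoothing_cross_entropy(logits, target, smoothing=0.0):
    """Cross-entropy against (1 - smoothing) * one_hot + smoothing / num_classes."""
    log_probs = log_softmax(logits)
    nll = -log_probs[target]                       # loss against the true class
    uniform = -sum(log_probs) / len(log_probs)     # loss against a uniform target
    return (1.0 - smoothing) * nll + smoothing * uniform


logits = [2.0, 1.0, 0.0]
plain = label_smoothing_cross_entropy(logits, target=0)
smoothed = label_smoothing_cross_entropy(logits, target=0, smoothing=0.1)
```

With `smoothing=0.0` this reduces to the ordinary cross-entropy; a positive value penalises predictions that put nearly all probability mass on one class.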