transformers4rec.torch.utils package

Submodules

transformers4rec.torch.utils.data_utils module

class transformers4rec.torch.utils.data_utils.T4RecDataLoader[source]

Bases: abc.ABC

Base helper class to build a dataloader from the schema, with the properties required by the T4Rec Trainer class.

classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, paths_or_dataset, batch_size, max_sequence_length, **kwargs)[source]
set_dataset(paths_or_dataset)[source]
classmethod parse(class_or_str)[source]
class transformers4rec.torch.utils.data_utils.PyarrowDataLoader(paths_or_dataset, batch_size, max_sequence_length, cols_to_read=None, target_names=None, shuffle=False, shuffle_buffer_size=0, num_workers=1, pin_memory=True, drop_last=False, **kwargs)[source]

Bases: Generic[torch.utils.data.dataloader.T_co]

batch_size: Optional[int]
num_workers: int
pin_memory: bool
drop_last: bool
set_dataset(cols_to_read, target_names)[source]

Set the Parquet dataset.

Parameters

cols_to_read (List[str]) – The list of feature names to load.

classmethod from_schema(schema, paths_or_dataset, batch_size, max_sequence_length, continuous_features=None, categorical_features=None, targets=None, shuffle=False, shuffle_buffer_size=0, num_workers=1, pin_memory=True, **kwargs)[source]

Instantiates PyarrowDataLoader from a DatasetSchema.

Parameters
  • schema (DatasetSchema) – Dataset schema

  • paths_or_dataset (Union[str, Dataset]) – Path to Parquet data or a Dataset object.

  • batch_size (int) – Batch size of the dataloader.

  • max_sequence_length (int) – The maximum length of list features.

dataset: torch.utils.data.dataset.Dataset[T_co]
timeout: float
sampler: Union[torch.utils.data.sampler.Sampler, Iterable]
pin_memory_device: str
prefetch_factor: Optional[int]
class transformers4rec.torch.utils.data_utils.DLDataLoader(*args, **kwargs)[source]

Bases: Generic[torch.utils.data.dataloader.T_co]

This class is an extension of the torch dataloader. It is required to support the FastAI framework.

Setting the batch size directly on DLDataLoader makes it 3x slower, so the batch size is instead stored in an alternative attribute that the T4Rec Trainer uses during evaluation. TODO: run experiments with the new merlin-dataloader.

property device
dataset: torch.utils.data.dataset.Dataset[T_co]
batch_size: Optional[int]
num_workers: int
pin_memory: bool
drop_last: bool
timeout: float
sampler: Union[torch.utils.data.sampler.Sampler, Iterable]
pin_memory_device: str
prefetch_factor: Optional[int]
class transformers4rec.torch.utils.data_utils.MerlinDataLoader(paths_or_dataset, batch_size, max_sequence_length=None, conts=None, cats=None, labels=None, lists=None, collate_fn=<function MerlinDataLoader.<lambda>>, engine=None, buffer_size=0.1, reader_kwargs=None, shuffle=False, seed_fn=None, parts_per_chunk=1, device=None, global_size=None, global_rank=None, drop_last=False, schema=None, row_groups_per_part=True, transforms=None, **kwargs)[source]

Bases: Generic[torch.utils.data.dataloader.T_co]

This class extends the Merlin data loader (https://github.com/NVIDIA-Merlin/dataloader/blob/stable/merlin/dataloader/torch.py). The input data must be a merlin.io.Dataset or a path to the data files. The class also sets the dataset’s schema with the properties needed to prepare the input list features as dense tensors (i.e., padded to the specified max_sequence_length), which is the representation required by the Transformers4Rec input modules.

Parameters
  • paths_or_dataset (Union[str, merlin.io.Dataset]) – The dataset to load.

  • batch_size (int) – The size of each batch to supply to the model.

  • max_sequence_length (int) – The maximum sequence length to use for padding list columns. By default, 0 is used as the padding index.

  • cats (List[str], optional) – The list of categorical columns in the dataset. By default None.

  • conts (List[str], optional) – The list of continuous columns in the dataset. By default None.

  • labels (List[str], optional) – The list of label columns in the dataset. By default None.

  • lists (List[str], optional) – The list of sequential columns in the dataset. By default None.

  • shuffle (bool, optional) – Enable/disable shuffling of dataset. By default False.

  • parts_per_chunk (int) – The number of partitions from the iterator, a Merlin Dataset, to concatenate into a “chunk”. By default 1.

  • device (int, optional) – The device ID of the selected GPU. By default None.

  • drop_last (bool, optional) – Whether or not to drop the last batch in an epoch. This is useful when you need to guarantee that each batch contains exactly batch_size rows, since the last batch will usually contain fewer rows. By default False.

  • seed_fn (callable) – Function used to initialize the random state.

  • parts_per_chunk – Number of dataset partitions, with size dictated by buffer_size, to load and concatenate asynchronously. More partitions lead to better epoch-level randomness but can negatively impact throughput.

  • global_size (int, optional) – When doing distributed training, this indicates the number of total processes that are training the model.

  • global_rank (int, optional) – When doing distributed training, this indicates the rank of the current process among all training processes.

  • schema (Schema, optional) – The Schema with the input features.

  • reader_kwargs – Extra arguments to pass to the merlin.io.Dataset object when a path to data files is provided in the paths_or_dataset argument.

  • row_groups_per_part (bool, optional) – If True, preserve the group partitions when loading the dataset from Parquet files.

  • collate_fn (Callable, optional) – A processing function to collect and prepare the list samples (tuple of (input, target) Tensor(s)) returned by the Merlin DataLoader.

  • transforms (List[merlin.dag.BaseOperator]) – A list of operators that the Merlin dataloader applies on top of the loaded batch, which is a tuple of input and target tensors.
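The dense representation described above pads (or trims) every list column to max_sequence_length, with 0 as the default padding value. The following is a minimal pure-Python sketch of that transformation, assuming right-padding and assuming that over-long sequences keep their first items; the helper to_dense is hypothetical and not part of the Transformers4Rec API.

```python
from typing import List, Sequence


def to_dense(sequences: Sequence[Sequence[int]], max_sequence_length: int,
             padding_value: int = 0) -> List[List[int]]:
    """Hypothetical helper: right-pad or trim each sequence to max_sequence_length items."""
    dense = []
    for seq in sequences:
        trimmed = list(seq[:max_sequence_length])  # assumption: keep the first items if too long
        padding = [padding_value] * (max_sequence_length - len(trimmed))
        dense.append(trimmed + padding)
    return dense


batch = [[12, 7, 3], [5], [9, 4, 4, 2, 8, 1]]
print(to_dense(batch, max_sequence_length=4))
# [[12, 7, 3, 0], [5, 0, 0, 0], [9, 4, 4, 2]]
```

Every row now has the same length, so the batch can be stacked into a single 2-D tensor.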

batch_size: Optional[int]
drop_last: bool
dataset: torch.utils.data.dataset.Dataset[T_co]
set_dataset(buffer_size, engine, reader_kwargs, schema=None)[source]
classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, paths_or_dataset, batch_size, max_sequence_length=None, continuous_features=None, categorical_features=None, list_features=None, targets=None, collate_fn=<function MerlinDataLoader.<lambda>>, shuffle=True, buffer_size=0.06, parts_per_chunk=1, transforms=None, **kwargs)[source]

Instantiates MerlinDataLoader from a DatasetSchema.

Parameters
  • schema (DatasetSchema) – Dataset schema

  • paths_or_dataset (Union[str, Dataset]) – Path to Parquet data or a Dataset object.

  • batch_size (int) – Batch size of the dataloader.

property output_schema
num_workers: int
pin_memory: bool
timeout: float
sampler: Union[torch.utils.data.sampler.Sampler, Iterable]
pin_memory_device: str
prefetch_factor: Optional[int]
class transformers4rec.torch.utils.data_utils.ParquetDataset(parquet_file, cols_to_read, target_names, seq_features_len_pad_trim)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

pad_seq_column_if_needed(values)[source]
class transformers4rec.torch.utils.data_utils.ShuffleDataset(dataset, buffer_size)[source]

Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co], Iterable[torch.utils.data.dataset.T_co]

transformers4rec.torch.utils.data_utils.to_core_schema(t4rec_schema)[source]

transformers4rec.torch.utils.examples_utils module

transformers4rec.torch.utils.examples_utils.list_files(startpath)[source]

Utility function to print the nested structure of a directory.

transformers4rec.torch.utils.examples_utils.visualize_response(batch, response, top_k, session_col='session_id')[source]

Utility function to extract the top-k encoded item IDs from logits.

Parameters
  • batch (cudf.DataFrame) – The batch of raw data sent to the Triton server.

  • response (tritonclient.grpc.InferResult) – The response returned by the gRPC client.

  • top_k (int) – The number of top items to retrieve from the predictions.
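Extracting the top-k item IDs amounts to sorting each row of logits in descending order. The following is a minimal NumPy sketch, assuming the logits have already been pulled out of the Triton response as a 2-D array of shape (num_sessions, num_items) and that the column index is the encoded item ID; topk_item_ids is a hypothetical helper, not the body of visualize_response.

```python
import numpy as np


def topk_item_ids(logits: np.ndarray, top_k: int) -> np.ndarray:
    """Hypothetical helper: column indices of the top_k highest scores per row, best first."""
    # Sorting the negated scores ascending gives a descending order;
    # the stable sort breaks ties by the lower item index.
    order = np.argsort(-logits, axis=1, kind="stable")
    return order[:, :top_k]


logits = np.array([[0.1, 2.5, 0.3, 1.7],
                   [3.0, 0.2, 0.9, 0.4]])
print(topk_item_ids(logits, top_k=2).tolist())  # [[1, 3], [0, 2]]
```

For very large catalogs, np.argpartition followed by a sort of only the top_k candidates avoids a full sort. The full sort is kept here for clarity.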

transformers4rec.torch.utils.examples_utils.fit_and_evaluate(trainer, start_time_index, end_time_index, input_dir)[source]

Utility function for time-window-based fine-tuning using the T4Rec Trainer class. It iteratively trains on the data of a given time index and evaluates on the validation data of the following index.

Parameters
  • start_time_index (int) – The start index for training. It should match the partitions of the data directory.

  • end_time_index (int) – The end index for training. It should match the partitions of the data directory.

  • input_dir (str) – The input directory where the Parquet files were saved, based on the partition column.

Returns

indexed_by_time_metrics – The dictionary of ranking metrics: each item is the list of scores over time indices.

Return type

dict
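The train-then-evaluate-on-the-next-window loop can be sketched in plain Python. The callables train_fn and eval_fn are hypothetical stand-ins for the Trainer's training and evaluation steps, and the sketch assumes end_time_index is inclusive. It only illustrates how indexed_by_time_metrics accumulates one score per time index.

```python
from typing import Callable, Dict, List


def fit_and_evaluate_sketch(
    train_fn: Callable[[int], None],             # hypothetical: trains on one time window
    eval_fn: Callable[[int], Dict[str, float]],  # hypothetical: returns metrics for one window
    start_time_index: int,
    end_time_index: int,
) -> Dict[str, List[float]]:
    """Train on window t, evaluate on window t + 1, and collect each metric over time."""
    indexed_by_time_metrics: Dict[str, List[float]] = {}
    # Assumption: end_time_index is inclusive.
    for time_index in range(start_time_index, end_time_index + 1):
        train_fn(time_index)               # fit on the data partition for time_index
        metrics = eval_fn(time_index + 1)  # evaluate on the following time index
        for name, value in metrics.items():
            indexed_by_time_metrics.setdefault(name, []).append(value)
    return indexed_by_time_metrics


trained = []
result = fit_and_evaluate_sketch(
    train_fn=trained.append,
    eval_fn=lambda t: {"ndcg_at_10": t / 10},
    start_time_index=1,
    end_time_index=3,
)
print(trained)  # [1, 2, 3]
print(result)   # {'ndcg_at_10': [0.2, 0.3, 0.4]}
```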

transformers4rec.torch.utils.examples_utils.wipe_memory()[source]

transformers4rec.torch.utils.schema_utils module

transformers4rec.torch.utils.schema_utils.random_data_from_schema(schema: merlin_standard_lib.schema.schema.Schema, num_rows: int, max_session_length: Optional[int] = None, min_session_length: int = 5, device=None, ragged=False, seed=0) → Dict[str, torch.Tensor][source]

Generates random tabular data based on a given schema. The generated data can be used for testing data preprocessing or model training pipelines.

Parameters
  • schema (Schema) – The schema to be used for generating the random tabular data.

  • num_rows (int) – The number of rows.

  • max_session_length (Optional[int]) – The maximum session length. If None, the session length will not be limited. By default None.

  • min_session_length (int) – The minimum session length. By default 5.

  • device (torch.device) – The device on which the synthetic data should be created. If None, the synthetic data will be created on the CPU. By default None.

  • ragged (bool) – If True, the sequence features will be represented with __values and __offsets. By default False.

Returns

A dictionary where each key is a feature name and each value is the generated tensor.

Return type

TabularData
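To make the session-length bounds and the ragged option concrete, here is a small pure-Python sketch. It draws sessions whose lengths lie between min_session_length and max_session_length, then flattens them into a __values list plus an __offsets list. The helpers random_sessions and to_ragged, the feature name item_id, the vocabulary size, and the offsets convention (num_rows + 1 boundaries starting at 0) are all assumptions for illustration, not the library's implementation.

```python
import random
from typing import Dict, List


def random_sessions(num_rows: int, max_session_length: int, min_session_length: int = 5,
                    vocab_size: int = 100, seed: int = 0) -> List[List[int]]:
    """Hypothetical helper: num_rows sessions with lengths in [min, max] (inclusive)."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    sessions = []
    for _ in range(num_rows):
        length = rng.randint(min_session_length, max_session_length)
        # Item IDs start at 1 because 0 is assumed to be the padding index.
        sessions.append([rng.randint(1, vocab_size - 1) for _ in range(length)])
    return sessions


def to_ragged(name: str, sessions: List[List[int]]) -> Dict[str, List[int]]:
    """Hypothetical helper: flatten sessions into <name>__values and <name>__offsets."""
    values: List[int] = []
    offsets = [0]
    for session in sessions:
        values.extend(session)
        offsets.append(len(values))  # end boundary of this row
    return {f"{name}__values": values, f"{name}__offsets": offsets}


sessions = random_sessions(num_rows=3, max_session_length=8, seed=42)
ragged = to_ragged("item_id", sessions)
offsets = ragged["item_id__offsets"]
# Row i spans values[offsets[i]:offsets[i + 1]].
print(ragged["item_id__values"][offsets[1]:offsets[2]] == sessions[1])  # True
```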

transformers4rec.torch.utils.torch_utils module

class transformers4rec.torch.utils.torch_utils.OutputSizeMixin[source]

Bases: transformers4rec.config.schema.SchemaMixin, abc.ABC

build(input_size, schema=None, **kwargs)[source]
output_size(input_size=None)[source]
forward_output_size(input_size)[source]
class transformers4rec.torch.utils.torch_utils.LossMixin[source]

Bases: object

Mixin to use for a torch.nn.Module that can calculate a loss.

compute_loss(inputs: Union[torch.Tensor, Dict[str, torch.Tensor]], targets: Union[torch.Tensor, Dict[str, torch.Tensor]], compute_metrics: bool = True, **kwargs) → torch.Tensor[source]

Compute the loss on a batch of data.

Parameters
  • inputs (Union[torch.Tensor, TabularData]) – TODO

  • targets (Union[torch.Tensor, TabularData]) – TODO

  • compute_metrics (bool, default=True) – Boolean indicating whether or not to update the state of the metrics (if they are defined).

class transformers4rec.torch.utils.torch_utils.MetricsMixin[source]

Bases: object

Mixin to use for a torch.nn.Module that can calculate metrics.

calculate_metrics(inputs: Union[torch.Tensor, Dict[str, torch.Tensor]], targets: Union[torch.Tensor, Dict[str, torch.Tensor]]) → Dict[str, torch.Tensor][source]

Calculate metrics on a batch of data. Each metric is stateful, and this method updates its state.

The state of each metric can be retrieved by calling the compute_metrics method.

Parameters
  • inputs (Union[torch.Tensor, TabularData]) – Tensor or dictionary of predictions returned by the T4Rec model

  • targets (Union[torch.Tensor, TabularData]) – Tensor or dictionary of true labels returned by the T4Rec model

compute_metrics(mode: Optional[str] = None) → Dict[str, Union[float, torch.Tensor]][source]

Returns the current state of each metric.

The state is typically updated each batch by calling the calculate_metrics method.

Parameters

mode (str, default="val") –

Returns

Return type

Dict[str, Union[float, torch.Tensor]]

reset_metrics()[source]

Reset all metrics.
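The calculate_metrics / compute_metrics / reset_metrics cycle follows the usual stateful-metric pattern: update per batch, read the accumulated value, then clear it. The following is a pure-Python sketch of that pattern with hypothetical MeanMetric and MetricsHolder classes. For brevity, calculate_metrics here takes precomputed per-example hit values instead of (inputs, targets), compute_metrics drops the mode argument, and an empty metric reports 0.0. None of this reflects the mixin's actual signatures.

```python
from typing import Dict, List


class MeanMetric:
    """Hypothetical stateful metric: update() accumulates, compute() reads, reset() clears."""

    def __init__(self) -> None:
        self.reset()

    def update(self, values: List[float]) -> None:
        self.total += sum(values)
        self.count += len(values)

    def compute(self) -> float:
        # Design choice for the sketch: report 0.0 before any update.
        return self.total / self.count if self.count else 0.0

    def reset(self) -> None:
        self.total, self.count = 0.0, 0


class MetricsHolder:
    """Hypothetical stand-in for a module using MetricsMixin-style methods."""

    def __init__(self) -> None:
        self.metrics = {"hit_rate": MeanMetric()}

    def calculate_metrics(self, hits: List[float]) -> None:
        for metric in self.metrics.values():
            metric.update(hits)  # per-batch state update

    def compute_metrics(self) -> Dict[str, float]:
        return {name: metric.compute() for name, metric in self.metrics.items()}

    def reset_metrics(self) -> None:
        for metric in self.metrics.values():
            metric.reset()


holder = MetricsHolder()
holder.calculate_metrics([1.0, 0.0])  # batch 1
holder.calculate_metrics([1.0, 1.0])  # batch 2
print(holder.compute_metrics())       # {'hit_rate': 0.75}
holder.reset_metrics()
print(holder.compute_metrics())       # {'hit_rate': 0.0}
```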

transformers4rec.torch.utils.torch_utils.requires_schema(module)[source]
transformers4rec.torch.utils.torch_utils.check_gpu(module)[source]
transformers4rec.torch.utils.torch_utils.get_output_sizes_from_schema(schema: merlin_standard_lib.schema.schema.Schema, batch_size=-1, max_sequence_length=None)[source]
transformers4rec.torch.utils.torch_utils.calculate_batch_size_from_input_size(input_size)[source]
transformers4rec.torch.utils.torch_utils.check_inputs(ks, scores, labels)[source]
transformers4rec.torch.utils.torch_utils.extract_topk(ks, scores, labels)[source]
transformers4rec.torch.utils.torch_utils.create_output_placeholder(scores, ks)[source]
transformers4rec.torch.utils.torch_utils.tranform_label_to_onehot(labels, vocab_size)[source]
transformers4rec.torch.utils.torch_utils.nested_detach(tensors)[source]

Detach tensors (even if it’s a nested list/tuple/dict of tensors). TODO: this method was copied from the latest version of the Hugging Face Transformers library to support dict outputs; remove it once T4Rec is updated to use that version.

transformers4rec.torch.utils.torch_utils.nested_concat(tensors, new_tensors, padding_index=-100)[source]

Concatenate new_tensors to tensors on the first dimension, padding them on the second if needed. Works for tensors or nested lists/tuples/dicts of tensors. TODO: this method was copied from the latest version of the Hugging Face Transformers library to support dict outputs; remove it once T4Rec is updated to use that version.

transformers4rec.torch.utils.torch_utils.torch_pad_and_concatenate(tensor1, tensor2, padding_index=-100)[source]

Concatenates tensor1 and tensor2 on the first axis, applying padding on the second axis as needed.

TODO: this method was copied from the latest version of the Hugging Face Transformers library to support dict outputs; remove it once T4Rec is updated to use that version.

transformers4rec.torch.utils.torch_utils.atleast_1d(tensor_or_array: Union[torch.Tensor, numpy.ndarray])[source]
transformers4rec.torch.utils.torch_utils.nested_numpify(tensors)[source]

Numpify tensors (even if it’s a nested list/tuple/dict of tensors). TODO: this method was copied from the latest version of the Hugging Face Transformers library to support dict outputs; remove it once T4Rec is updated to use that version.

transformers4rec.torch.utils.torch_utils.nested_truncate(tensors, limit)[source]

Truncate tensors at limit (even if it’s a nested list/tuple/dict of tensors). TODO: this method was copied from the latest version of the Hugging Face Transformers library to support dict outputs; remove it once T4Rec is updated to use that version.
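The "even if it’s a nested list/tuple/dict" behaviour is a simple recursion over containers. The following is a minimal sketch using NumPy arrays as the leaves. The name nested_truncate_sketch is hypothetical, and the sketch is not the library's code.

```python
from typing import Any

import numpy as np


def nested_truncate_sketch(tensors: Any, limit: int) -> Any:
    """Keep the first `limit` rows of every array inside a nested list/tuple/dict."""
    if isinstance(tensors, (list, tuple)):
        # Rebuild the same container type (list or tuple) from the truncated children.
        return type(tensors)(nested_truncate_sketch(t, limit) for t in tensors)
    if isinstance(tensors, dict):
        return {key: nested_truncate_sketch(value, limit) for key, value in tensors.items()}
    return tensors[:limit]  # leaf: slice along the first dimension


outputs = {"predictions": np.zeros((5, 3)),
           "labels": (np.arange(5), np.arange(10).reshape(5, 2))}
truncated = nested_truncate_sketch(outputs, limit=2)
print(truncated["predictions"].shape, truncated["labels"][1].shape)  # (2, 3) (2, 2)
```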

transformers4rec.torch.utils.torch_utils.numpy_pad_and_concatenate(array1, array2, padding_index=-100)[source]

Concatenates array1 and array2 on the first axis, applying padding on the second axis if necessary. TODO: this method was copied from the latest version of the Hugging Face Transformers library to support dict outputs; remove it once T4Rec is updated to use that version.
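Padding on the second axis is needed when two batches of predictions have different sequence widths. The following NumPy sketch follows the behaviour described above: plain concatenation when the widths already match (or the arrays are 1-D), and otherwise a result pre-filled with padding_index into which both inputs are copied. The name pad_and_concatenate is hypothetical. This is an illustration, not the library's source.

```python
import numpy as np


def pad_and_concatenate(array1: np.ndarray, array2: np.ndarray,
                        padding_index: int = -100) -> np.ndarray:
    """Stack two arrays along axis 0, padding axis 1 with padding_index when widths differ."""
    array1, array2 = np.atleast_1d(array1), np.atleast_1d(array2)
    if array1.ndim == 1 or array1.shape[1] == array2.shape[1]:
        return np.concatenate((array1, array2), axis=0)
    rows = array1.shape[0] + array2.shape[0]
    cols = max(array1.shape[1], array2.shape[1])
    # Start from an array full of padding, then copy each input into its block.
    result = np.full((rows, cols) + array1.shape[2:], padding_index, dtype=array1.dtype)
    result[: array1.shape[0], : array1.shape[1]] = array1
    result[array1.shape[0]:, : array2.shape[1]] = array2
    return result


a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6, 7]])
print(pad_and_concatenate(a, b).tolist())  # [[1, 2, -100], [3, 4, -100], [5, 6, 7]]
```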

transformers4rec.torch.utils.torch_utils.one_hot_1d(labels: torch.Tensor, num_classes: int, device: Optional[torch.device] = None, dtype: Optional[torch.dtype] = torch.float32) → torch.Tensor[source]

Converts a 1d label tensor to a one-hot representation.

Parameters
  • labels (torch.Tensor) – tensor with labels of shape (N,), where N is the batch size. Each value is an integer representing the correct class.

  • num_classes (int) – number of classes in labels.

  • device (Optional[torch.device]) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

  • dtype (Optional[torch.dtype]) – the desired data type of returned tensor. Default: torch.float32

Returns

the labels in one hot tensor.

Return type

torch.Tensor

Examples:
>>> import torch
>>> from transformers4rec.torch.utils.torch_utils import one_hot_1d
>>> labels = torch.LongTensor([0, 1, 2, 0])
>>> one_hot_1d(labels, num_classes=3)
tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [1., 0., 0.]])
class transformers4rec.torch.utils.torch_utils.LambdaModule(lambda_fn)[source]

Bases: torch.nn.modules.module.Module

forward(x)[source]
training: bool
class transformers4rec.torch.utils.torch_utils.MappingTransformerMasking[source]

Bases: object

class CausalLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, train_on_last_item_seq_only: bool = False, **kwargs)

Bases: transformers4rec.torch.masking.MaskSequence

In Causal Language Modeling (clm) you predict the next item based on past positions of the sequence. Future positions are masked.

Parameters
  • hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.

  • padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length

  • eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation

  • train_on_last_item_seq_only (bool, default = False) – Predict only last item during training
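In causal language modeling, each position is trained to predict the item that follows it. The following is a minimal pure-Python sketch of building those next-item targets for one padded sequence, including the "last item only" option that eval_on_last_item_seq_only / train_on_last_item_seq_only describe. The helper clm_targets is hypothetical, and it assumes right-padding with padding_idx. It illustrates the idea rather than reproducing the masking class.

```python
from typing import List


def clm_targets(item_ids: List[int], last_item_only: bool = False,
                padding_idx: int = 0) -> List[int]:
    """Hypothetical helper: position i is labelled with item_ids[i + 1]."""
    # Shift left by one; the final position has no next item, so it gets padding.
    labels = list(item_ids[1:]) + [padding_idx]
    if last_item_only:
        positions = [i for i, label in enumerate(labels) if label != padding_idx]
        last = positions[-1] if positions else -1
        # Keep only the label of the last real next item and pad everything else.
        labels = [label if i == last else padding_idx for i, label in enumerate(labels)]
    return labels


print(clm_targets([5, 8, 3, 0, 0]))                       # [8, 3, 0, 0, 0]
print(clm_targets([5, 8, 3, 0, 0], last_item_only=True))  # [0, 3, 0, 0, 0]
```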

apply_mask_to_inputs(inputs: torch.Tensor, mask_schema: torch.Tensor, training: bool = False, testing: bool = False) → torch.Tensor
class MaskedLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, mlm_probability: float = 0.15, **kwargs)

Bases: transformers4rec.torch.masking.MaskSequence

In Masked Language Modeling (mlm) you randomly select some positions of the sequence to be predicted, which are masked. During training, the Transformer layer is allowed to use positions on the right (future info). During inference, all past items are visible for the Transformer layer, which tries to predict the next item.

Parameters
  • hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.

  • padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length

  • eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation

  • mlm_probability (Optional[float], default = 0.15) – Probability of an item to be selected (masked) as a label of the given sequence. Note: at least one item is masked in each sequence, so that the network always has something to learn from.
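Selecting MLM positions amounts to an independent coin flip per non-padding item with probability mlm_probability, plus a fallback that forces one selection when none was drawn. The following is a pure-Python sketch of that rule. The helper mlm_mask and the choice of a uniformly random fallback position are assumptions, not the library's implementation.

```python
import random
from typing import List


def mlm_mask(item_ids: List[int], mlm_probability: float = 0.15,
             padding_idx: int = 0, seed: int = 0) -> List[bool]:
    """Hypothetical helper: pick non-padding positions with probability mlm_probability."""
    rng = random.Random(seed)
    candidates = [i for i, item in enumerate(item_ids) if item != padding_idx]
    mask = [False] * len(item_ids)
    for i in candidates:
        if rng.random() < mlm_probability:
            mask[i] = True
    if candidates and not any(mask):
        # Enforce at least one masked item (assumption: chosen uniformly at random).
        mask[rng.choice(candidates)] = True
    return mask


print(mlm_mask([4, 9, 2, 7, 0, 0], mlm_probability=1.0))
# [True, True, True, True, False, False]
```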

apply_mask_to_inputs(inputs: torch.Tensor, mask_schema: torch.Tensor, training=False, testing=False) → torch.Tensor

Control the masked positions in the inputs by replacing the true interaction by a learnable masked embedding.

inputs: torch.Tensor

The 3-D tensor of interaction embeddings resulting from the ops TabularFeatures + aggregation + projection (optional).

mask_schema: torch.Tensor

The boolean mask indicating masked positions.

training: bool

Flag to indicate whether we are in Training mode or not. During training, the labels can be any items within the sequence based on the selected masking task.

testing: bool

Flag to indicate whether we are in Evaluation (=True) or Inference (=False) mode. During evaluation, we are predicting all next items or last item only in the sequence based on the param eval_on_last_item_seq_only. During inference, we don’t mask the input sequence and use all available information to predict the next item.
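Replacing masked interactions with a shared masked embedding is a broadcasted select over the hidden dimension. The following NumPy sketch shows that step for a (batch, seq_len, hidden) input and a boolean (batch, seq_len) mask. In the library the masked embedding is trainable. Here it is a fixed array, and the function name is hypothetical.

```python
import numpy as np


def apply_mask_to_inputs_sketch(inputs: np.ndarray, mask_schema: np.ndarray,
                                masked_embedding: np.ndarray) -> np.ndarray:
    """Replace masked positions of a (batch, seq_len, hidden) array with one shared vector."""
    # mask_schema[..., None] has shape (batch, seq_len, 1) and broadcasts over hidden;
    # masked_embedding of shape (hidden,) broadcasts over batch and seq_len.
    return np.where(mask_schema[..., None], masked_embedding, inputs)


inputs = np.ones((1, 3, 2))
mask = np.array([[False, True, False]])
out = apply_mask_to_inputs_sketch(inputs, mask, masked_embedding=np.array([-1.0, -1.0]))
print(out[0].tolist())  # [[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0]]
```

np.where returns a new array, so the original embeddings are left untouched.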

class PermutationLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, plm_probability: float = 0.16666666666666666, max_span_length: int = 5, permute_all: bool = False, **kwargs)

Bases: transformers4rec.torch.masking.MaskSequence

In Permutation Language Modeling (plm) you use a permutation factorization at the level of the self-attention layer to define the accessible bidirectional context.

Parameters
  • hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.

  • padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length

  • eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation

  • max_span_length (int) – Maximum length of a span of masked items.

  • plm_probability (float) – The ratio of surrounding items to unmask to define the context of the span-based prediction segment of items.

  • permute_all (bool) – Compute partial span-based prediction (=False) or not.

compute_masked_targets(item_ids: torch.Tensor, training=False, **kwargs) → transformers4rec.torch.masking.MaskingInfo
transformer_required_arguments() → Dict[str, Any]
class ReplacementLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, sample_from_batch: bool = False, **kwargs)

Bases: transformers4rec.torch.masking.MaskedLanguageModeling

In Replacement Language Modeling (rtd), you use MLM to randomly select some items, but replace them with random tokens. Then a discriminator model (which may or may not share weights with the generator) is asked to classify whether the item at each position belongs to the original sequence. The generator-discriminator architecture is jointly trained using the Masked LM and RTD tasks.

Parameters
  • hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.

  • padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length

  • eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation

  • sample_from_batch (bool) – Whether to sample replacement item ids from the same batch or not

get_fake_tokens(itemid_seq, target_flat, logits)

The second task of RTD is binary classification to train the discriminator. It consists of generating fake data by replacing [MASK] positions with random items; the ELECTRA discriminator learns to detect these fake replacements.

Parameters
  • itemid_seq (torch.Tensor of shape (bs, max_seq_len)) – input sequence of item ids

  • target_flat (torch.Tensor of shape (bs*max_seq_len)) – flattened masked label sequences

  • logits (torch.Tensor of shape (#pos_item, vocab_size or #pos_item)) – MLM probabilities of positive items computed by the generator model. The logits are over the whole corpus if sample_from_batch = False, and over the positive (masked) items of the current batch otherwise.

Returns

  • corrupted_inputs (torch.Tensor of shape (bs, max_seq_len)) – input sequence of item ids with fake replacement

  • discriminator_labels (torch.Tensor of shape (bs, max_seq_len)) – binary labels to distinguish between original and replaced items

  • batch_updates (torch.Tensor of shape (#pos_item)) – the indices of replacement item within the current batch if sample_from_batch is enabled
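The corrupted inputs and discriminator labels can be illustrated on a single sequence. The following pure-Python sketch writes the sampled replacements into the masked positions and labels each position 1 if the item changed. Following the ELECTRA convention, a sampled item that happens to equal the original counts as original (label 0). Whether get_fake_tokens applies exactly this rule is an assumption, and fake_tokens is a hypothetical helper.

```python
from typing import List, Tuple


def fake_tokens(item_ids: List[int], masked_positions: List[int],
                replacements: List[int]) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: corrupted inputs and 0/1 discriminator labels (1 = replaced)."""
    corrupted = list(item_ids)
    for pos, new_item in zip(masked_positions, replacements):
        corrupted[pos] = new_item
    # Compare with the original sequence, so an unchanged item is labelled 0.
    labels = [int(c != o) for c, o in zip(corrupted, item_ids)]
    return corrupted, labels


corrupted, labels = fake_tokens([11, 12, 13, 14], masked_positions=[1, 3], replacements=[40, 14])
print(corrupted)  # [11, 40, 13, 14]
print(labels)     # [0, 1, 0, 0]
```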

sample_from_softmax(logits: torch.Tensor) → torch.Tensor

Sampling method for replacement token modeling (ELECTRA).

Parameters

logits (torch.Tensor(#pos_item, vocab_size)) – Probability scores of the masked positions returned by the generator model.

Returns

samples – IDs of the replacement items.

Return type

torch.Tensor(#pos_item)
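One standard way to draw one replacement per masked position from softmax(logits), without computing the softmax explicitly, is the Gumbel-max trick: add independent Gumbel(0, 1) noise to the logits and take the argmax of each row. The following NumPy sketch uses that technique. Whether sample_from_softmax uses the same trick is an assumption, and the function name is hypothetical.

```python
import numpy as np


def sample_from_softmax_sketch(logits: np.ndarray, seed: int = 0) -> np.ndarray:
    """Draw one index per row from softmax(logits) via the Gumbel-max trick."""
    rng = np.random.default_rng(seed)
    gumbel_noise = rng.gumbel(size=logits.shape)  # standard Gumbel(0, 1) noise
    return np.argmax(logits + gumbel_noise, axis=-1)


logits = np.array([[0.0, 50.0, 0.0],
                   [50.0, 0.0, 0.0]])
print(sample_from_softmax_sketch(logits).tolist())  # [1, 0]
```

With a large logit gap the result is effectively deterministic. With equal logits, every index is equally likely.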

DEFAULT_MASKING = [<class 'transformers4rec.torch.masking.CausalLanguageModeling'>, <class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>, <class 'transformers4rec.torch.masking.PermutationLanguageModeling'>]
BertConfig = [<class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]
ConvBertConfig = [<class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]
DebertaConfig = [<class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]
DistilBertConfig = [<class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]
GPT2Config = [<class 'transformers4rec.torch.masking.CausalLanguageModeling'>]
LongformerConfig = [<class 'transformers4rec.torch.masking.CausalLanguageModeling'>, <class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]
MegatronBertConfig = [<class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]
MPNetConfig = [<class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]
RobertaConfig = [<class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]
RoFormerConfig = [<class 'transformers4rec.torch.masking.CausalLanguageModeling'>, <class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]
TransfoXLConfig = [<class 'transformers4rec.torch.masking.CausalLanguageModeling'>]
XLNetConfig = [<class 'transformers4rec.torch.masking.CausalLanguageModeling'>, <class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>, <class 'transformers4rec.torch.masking.PermutationLanguageModeling'>]
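The attributes above map each transformer configuration class to the masking tasks it supports. The following is a hypothetical helper that checks a (configuration, masking) pair against an abbreviated copy of that mapping, using class names as strings. Falling back to DEFAULT_MASKING for configurations that are not listed is an assumption of this sketch.

```python
from typing import Dict, List

DEFAULT_MASKING = ["CausalLanguageModeling", "MaskedLanguageModeling",
                   "ReplacementLanguageModeling", "PermutationLanguageModeling"]

# Abbreviated copy of the mapping listed above.
SUPPORTED_MASKING: Dict[str, List[str]] = {
    "GPT2Config": ["CausalLanguageModeling"],
    "BertConfig": ["MaskedLanguageModeling", "ReplacementLanguageModeling"],
    "XLNetConfig": ["CausalLanguageModeling", "MaskedLanguageModeling",
                    "ReplacementLanguageModeling", "PermutationLanguageModeling"],
}


def check_masking(config_name: str, masking_name: str) -> None:
    """Hypothetical check: raise if masking_name is not allowed for config_name."""
    # Assumption: configurations missing from the mapping accept DEFAULT_MASKING.
    allowed = SUPPORTED_MASKING.get(config_name, DEFAULT_MASKING)
    if masking_name not in allowed:
        raise ValueError(f"{masking_name} is not supported by {config_name}; "
                         f"choose one of {allowed}")


check_masking("BertConfig", "MaskedLanguageModeling")  # passes silently
```

For example, check_masking("GPT2Config", "MaskedLanguageModeling") raises ValueError, because only causal language modeling is listed for GPT2Config.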

Module contents