transformers4rec.torch.utils package

Submodules

transformers4rec.torch.utils.data_utils module

class transformers4rec.torch.utils.data_utils.T4RecDataLoader[source]

Bases: abc.ABC

Base helper class for building a dataloader from the schema, with the properties required by the T4Rec Trainer class.

classmethod from_schema(schema: merlin_standard_lib.schema.schema.Schema, paths_or_dataset, batch_size, max_sequence_length, **kwargs)[source]

set_dataset(paths_or_dataset)[source]

classmethod parse(class_or_str)[source]

class transformers4rec.torch.utils.data_utils.PyarrowDataLoader(paths_or_dataset, batch_size, max_sequence_length, cols_to_read=None, shuffle=False, shuffle_buffer_size=0, num_workers=1, pin_memory=True, drop_last=False, **kwargs)[source]

Bases: Generic[torch.utils.data.dataloader.T_co]

batch_size: Optional[int]

num_workers: int

pin_memory: bool

drop_last: bool

set_dataset(cols_to_read)[source]

Set the Parquet dataset.

Parameters: cols_to_read (str) – The list of feature names to load

classmethod from_schema(schema, paths_or_dataset, batch_size, max_sequence_length, continuous_features=None, categorical_features=None, targets=None, shuffle=False, shuffle_buffer_size=0, num_workers=1, pin_memory=True, **kwargs)[source]

Instantiates PyarrowDataLoader from a DatasetSchema.

Parameters

schema (DatasetSchema) – Dataset schema
paths_or_dataset (Union[str, Dataset]) – Path to parquet data or a Dataset object.
batch_size (int) – batch size of Dataloader.
max_sequence_length (int) – The maximum length of list features.
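
As a rough illustration of what max_sequence_length implies for list features, the sketch below pads or trims a plain Python list to a fixed length. The helper name pad_or_trim, the right-side padding, and the padding value 0 are assumptions made for this example; they are not taken from the loader's implementation.

```python
from typing import List


def pad_or_trim(values: List[int], max_sequence_length: int, pad_value: int = 0) -> List[int]:
    """Return a list with exactly max_sequence_length items.

    Hypothetical helper: longer lists are cut on the right and shorter
    lists are right-padded with pad_value (both choices are assumptions).
    """
    trimmed = values[:max_sequence_length]
    return trimmed + [pad_value] * (max_sequence_length - len(trimmed))


short = pad_or_trim([7, 3], max_sequence_length=4)             # padded
long = pad_or_trim([1, 2, 3, 4, 5, 6], max_sequence_length=4)  # trimmed
```

With a fixed length, every list feature in a batch can be stacked into a rectangular tensor.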

dataset: torch.utils.data.dataset.Dataset[T_co]

timeout: float

sampler: Union[torch.utils.data.sampler.Sampler, Iterable]

pin_memory_device: str

prefetch_factor: Optional[int]

class transformers4rec.torch.utils.data_utils.ParquetDataset(parquet_file, cols_to_read, seq_features_len_pad_trim)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

pad_seq_column_if_needed(values)[source]

class transformers4rec.torch.utils.data_utils.ShuffleDataset(dataset, buffer_size)[source]

Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]
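
The buffer_size argument suggests buffer-based shuffling of a stream. The sketch below shows one common way to do that in plain Python; the function name buffered_shuffle, the seed argument, and the exact replacement strategy are assumptions and may differ from what ShuffleDataset does internally.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: Optional[int] = None) -> Iterator[T]:
    """Yield items in a partially shuffled order using a bounded buffer."""
    if buffer_size <= 0:
        # Assumption: a non-positive buffer size means "no shuffling".
        yield from items
        return
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random buffered element, keep the new one.
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(buffered_shuffle(range(10), buffer_size=4, seed=0))
```

A larger buffer gives a more thorough shuffle at the cost of more memory; every input element is still emitted exactly once.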

transformers4rec.torch.utils.examples_utils module

transformers4rec.torch.utils.examples_utils.list_files(startpath)[source]

Util function to print the nested structure of a directory

transformers4rec.torch.utils.examples_utils.visualize_response(batch, response, top_k, session_col='session_id')[source]

Util function to extract top-k encoded item-ids from logits

Parameters

batch (cudf.DataFrame) – the batch of raw data sent to the Triton server.
response (tritonclient.grpc.InferResult) – the response returned by the gRPC client.
top_k (int) – the number of top items to retrieve from predictions.
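
The top-k extraction described above can be sketched without Triton or cuDF. The example assumes that the position of each logit is the encoded item id; the function name top_k_item_ids is hypothetical and this is not the body of visualize_response.

```python
from typing import List, Sequence, Tuple


def top_k_item_ids(logits: Sequence[float], top_k: int) -> List[Tuple[int, float]]:
    """Return the top_k (encoded_item_id, score) pairs, best first.

    Assumption: the index of a logit is the encoded item id.
    """
    ranked = sorted(enumerate(logits), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


scores = [0.1, 2.5, -1.0, 1.7, 0.3]
best = top_k_item_ids(scores, top_k=3)
```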

transformers4rec.torch.utils.examples_utils.fit_and_evaluate(trainer, start_time_index, end_time_index, input_dir)[source]

Util function for time-window based fine-tuning using the T4Rec Trainer class. Iteratively trains on the data of a given time index and evaluates on the validation data of the following index.

Parameters

start_time_index (int) – The start index for training, it should match the partitions of the data directory
end_time_index (int) – The end index for training, it should match the partitions of the data directory
input_dir (str) – The input directory where the parquet files were saved based on partition column

Returns

indexed_by_time_metrics – The dictionary of ranking metrics: each item is the list of scores over time indices.

Return type

dict
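
The train-on-one-index, evaluate-on-the-next pattern can be sketched as a plain loop. Everything specific here is an assumption for illustration: the callbacks train_fn and eval_fn stand in for the Trainer, the directory layout input_dir/<index>/train.parquet and valid.parquet is hypothetical, and the end index is treated as inclusive.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def time_window_loop(
    train_fn: Callable[[str], None],
    eval_fn: Callable[[str], Dict[str, float]],
    start_time_index: int,
    end_time_index: int,
    input_dir: str,
) -> Dict[str, List[float]]:
    """Train on window t, evaluate on window t + 1, collect metrics over time."""
    indexed_by_time_metrics: Dict[str, List[float]] = defaultdict(list)
    for time_index in range(start_time_index, end_time_index + 1):
        # Hypothetical partition layout; adapt to the real data directory.
        train_path = f"{input_dir}/{time_index}/train.parquet"
        eval_path = f"{input_dir}/{time_index + 1}/valid.parquet"
        train_fn(train_path)
        for name, value in eval_fn(eval_path).items():
            indexed_by_time_metrics[name].append(value)
    return dict(indexed_by_time_metrics)


# Stub callbacks so the loop can run without a model.
seen_train_paths: List[str] = []
metrics = time_window_loop(
    train_fn=seen_train_paths.append,
    eval_fn=lambda path: {"recall@10": 0.5},
    start_time_index=1,
    end_time_index=3,
    input_dir="data",
)
```

Each metric name maps to one score per time index, which matches the shape of the returned indexed_by_time_metrics dictionary described above.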

transformers4rec.torch.utils.examples_utils.wipe_memory()[source]

transformers4rec.torch.utils.schema_utils module

transformers4rec.torch.utils.schema_utils.random_data_from_schema(schema: merlin_standard_lib.schema.schema.Schema, num_rows: int, max_session_length: Optional[int] = None, min_session_length: int = 5, device=None) → Dict[str, torch.Tensor][source]

transformers4rec.torch.utils.torch_utils module

class transformers4rec.torch.utils.torch_utils.OutputSizeMixin[source]

Bases: transformers4rec.config.schema.SchemaMixin, abc.ABC

build(input_size, schema=None, **kwargs)[source]

output_size(input_size=None)[source]

forward_output_size(input_size)[source]

class transformers4rec.torch.utils.torch_utils.LossMixin[source]

Bases: object

Mixin to use for a torch.nn.Module that can calculate a loss.

compute_loss(inputs: Union[torch.Tensor, Dict[str, torch.Tensor]], targets: Union[torch.Tensor, Dict[str, torch.Tensor]], compute_metrics: bool = True, **kwargs) → torch.Tensor [source]

Compute the loss on a batch of data.

Parameters

inputs (Union[torch.Tensor, TabularData]) – TODO
targets (Union[torch.Tensor, TabularData]) – TODO
compute_metrics (bool, default=True) – Boolean indicating whether or not to update the state of the metrics (if they are defined).

class transformers4rec.torch.utils.torch_utils.MetricsMixin[source]

Bases: object

Mixin to use for a torch.nn.Module that can calculate metrics.

calculate_metrics(inputs: Union[torch.Tensor, Dict[str, torch.Tensor]], targets: Union[torch.Tensor, Dict[str, torch.Tensor]], mode: str = 'val', forward=True, **kwargs) → Dict[str, torch.Tensor][source]

Calculate metrics on a batch of data. Each metric is stateful, and this call updates its state.

The state of each metric can be retrieved by calling the compute_metrics method.

Parameters

inputs (Union[torch.Tensor, TabularData]) – TODO
targets (Union[torch.Tensor, TabularData]) – TODO
forward (bool, default True) –
mode (str, default="val") –

compute_metrics(mode: Optional[str] = None) → Dict[str, Union[float, torch.Tensor]][source]

Returns the current state of each metric.

The state is typically updated each batch by calling the calculate_metrics method.

Parameters: mode (str, default="val") –
Return type: Dict[str, Union[float, torch.Tensor]]

reset_metrics()[source]

Reset all metrics.
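
The interplay between calculate_metrics, compute_metrics and reset_metrics can be illustrated with a small stateful metric in plain Python. The RunningAccuracy class is hypothetical and only mirrors the method names; in particular, returning the per-batch value from calculate_metrics is an assumption.

```python
from typing import Dict, List


class RunningAccuracy:
    """Toy stateful metric that accumulates counts across batches."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def calculate_metrics(self, predictions: List[int], targets: List[int]) -> Dict[str, float]:
        # Update the running state and return the value for this batch only.
        batch_correct = sum(int(p == t) for p, t in zip(predictions, targets))
        self.correct += batch_correct
        self.total += len(targets)
        return {"accuracy": batch_correct / len(targets)}

    def compute_metrics(self) -> Dict[str, float]:
        # Report the state accumulated since the last reset.
        return {"accuracy": self.correct / self.total if self.total else 0.0}

    def reset_metrics(self) -> None:
        self.correct = 0
        self.total = 0


metric = RunningAccuracy()
metric.calculate_metrics([1, 2, 3], [1, 2, 0])  # batch 1: 2 of 3 correct
metric.calculate_metrics([4], [4])              # batch 2: 1 of 1 correct
accumulated = metric.compute_metrics()          # 3 of 4 correct overall
metric.reset_metrics()
after_reset = metric.compute_metrics()
```

Resetting between evaluation runs keeps scores from one run from leaking into the next.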

transformers4rec.torch.utils.torch_utils.requires_schema(module)[source]

transformers4rec.torch.utils.torch_utils.check_gpu(module)[source]

transformers4rec.torch.utils.torch_utils.get_output_sizes_from_schema(schema: merlin_standard_lib.schema.schema.Schema, batch_size=-1, max_sequence_length=None)[source]

transformers4rec.torch.utils.torch_utils.calculate_batch_size_from_input_size(input_size)[source]

transformers4rec.torch.utils.torch_utils.check_inputs(ks, scores, labels)[source]

transformers4rec.torch.utils.torch_utils.extract_topk(ks, scores, labels)[source]

transformers4rec.torch.utils.torch_utils.create_output_placeholder(scores, ks)[source]

transformers4rec.torch.utils.torch_utils.tranform_label_to_onehot(labels, vocab_size)[source]

transformers4rec.torch.utils.torch_utils.one_hot_1d(labels: torch.Tensor, num_classes: int, device: Optional[torch.device] = None, dtype: Optional[torch.dtype] = torch.float32) → torch.Tensor [source]

Converts a 1d label tensor to a one-hot representation.

Parameters

labels (torch.Tensor) – tensor with labels of shape (N,), where N is batch size. Each value is an integer representing correct classification.
num_classes (int) – number of classes in labels.
device (Optional[torch.device]) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
dtype (Optional[torch.dtype]) – the desired data type of returned tensor. Default: torch.float32

Returns

the labels in one hot tensor.

Return type

torch.Tensor

Examples

>>> labels = torch.LongTensor([0, 1, 2, 0])
>>> one_hot_1d(labels, num_classes=3)
tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [1., 0., 0.]])

class transformers4rec.torch.utils.torch_utils.LambdaModule(lambda_fn)[source]

Bases: torch.nn.modules.module.Module

forward(x)[source]

training: bool

class transformers4rec.torch.utils.torch_utils.MappingTransformerMasking[source]

Bases: object

class CausalLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, train_on_last_item_seq_only: bool = False, **kwargs)

Bases: transformers4rec.torch.masking.MaskSequence

In Causal Language Modeling (clm) you predict the next item based on past positions of the sequence. Future positions are masked.

Parameters

hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.
padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length
eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation
train_on_last_item_seq_only (bool, default = False) – Predict only last item during training

apply_mask_to_inputs(inputs: torch.Tensor, mask_schema: torch.Tensor) → torch.Tensor 
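
A simplified view of causal language modeling on a single padded session: each position predicts the item that follows it, and the last-item-only options keep just the final real item as a label. The function causal_targets is a hypothetical, framework-free sketch and does not reproduce the tensor operations of the class.

```python
from typing import List, Tuple


def causal_targets(
    item_ids: List[int], padding_idx: int = 0, last_item_only: bool = False
) -> Tuple[List[int], List[int]]:
    """Return (inputs, labels) for next-item prediction.

    labels[i] is the item that follows inputs[i]; positions holding
    padding_idx are meant to be ignored by the loss.
    """
    inputs = item_ids[:-1]
    labels = item_ids[1:]
    if last_item_only:
        # Keep only the label of the last non-padded position.
        non_pad = [i for i, label in enumerate(labels) if label != padding_idx]
        keep = non_pad[-1] if non_pad else None
        labels = [label if i == keep else padding_idx for i, label in enumerate(labels)]
    return inputs, labels


session = [5, 8, 2, 9, 0, 0]  # two trailing padding positions
train_inputs, train_labels = causal_targets(session)
eval_inputs, eval_labels = causal_targets(session, last_item_only=True)
```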

class MaskedLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, mlm_probability: float = 0.15, **kwargs)

Bases: transformers4rec.torch.masking.MaskSequence

In Masked Language Modeling (mlm) you randomly select some positions of the sequence to be predicted, which are masked. During training, the Transformer layer is allowed to use positions on the right (future info). During inference, all past items are visible for the Transformer layer, which tries to predict the next item.

Parameters

hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.
padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length
eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation
mlm_probability (Optional[float], default = 0.15) – Probability of an item to be selected (masked) as a label of the given sequence. Note: at least one item is always masked in each sequence, so that the network has something to learn from it.
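
The selection rule for mlm_probability, including the guarantee that at least one item is masked per sequence, can be sketched for a single session as follows. The function sample_mlm_mask and its seed argument are assumptions for illustration, not the library's code.

```python
import random
from typing import List, Optional


def sample_mlm_mask(
    item_ids: List[int],
    mlm_probability: float = 0.15,
    padding_idx: int = 0,
    seed: Optional[int] = None,
) -> List[bool]:
    """Return a boolean mask; True marks positions selected as labels."""
    rng = random.Random(seed)
    # Padding positions are never candidates for masking.
    candidates = [i for i, item in enumerate(item_ids) if item != padding_idx]
    mask = [False] * len(item_ids)
    for i in candidates:
        mask[i] = rng.random() < mlm_probability
    if candidates and not any(mask):
        # Enforce at least one masked item per sequence.
        mask[rng.choice(candidates)] = True
    return mask


session = [5, 8, 2, 9, 0, 0]
mask = sample_mlm_mask(session, mlm_probability=0.15, seed=1)
labels = [item if selected else 0 for item, selected in zip(session, mask)]
```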

class PermutationLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, plm_probability: float = 0.16666666666666666, max_span_length: int = 5, permute_all: bool = False, **kwargs)

Bases: transformers4rec.torch.masking.MaskSequence

In Permutation Language Modeling (plm) you use a permutation factorization at the level of the self-attention layer to define the accessible bidirectional context.

Parameters

hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.
padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length
eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation
max_span_length (int) – maximum length of a span of masked items
plm_probability (float) – The ratio of surrounding items to unmask to define the context of the span-based prediction segment of items
permute_all (bool) – If False, compute partial span-based prediction; if True, do not restrict prediction to partial spans.

compute_masked_targets(item_ids: torch.Tensor, training=False) → transformers4rec.torch.masking.MaskingInfo 

transformer_required_arguments() → Dict[str, Any]

class ReplacementLanguageModeling(hidden_size: int, padding_idx: int = 0, eval_on_last_item_seq_only: bool = True, sample_from_batch: bool = False, **kwargs)

Bases: transformers4rec.torch.masking.MaskedLanguageModeling

In Replacement Language Modeling (rtd) you use MLM to randomly select some items, but replace them with random tokens. Then a discriminator model (which may or may not share weights with the generator) is asked to classify whether the item at each position belongs to the original sequence or not. The generator-discriminator architecture is trained jointly on the Masked LM and RTD tasks.

Parameters

hidden_size (int) – The hidden dimension of input tensors, needed to initialize trainable vector of masked positions.
padding_idx (int, default = 0) – Index of padding item used for getting batch of sequences with the same length
eval_on_last_item_seq_only (bool, default = True) – Predict only last item during evaluation
sample_from_batch (bool) – Whether to sample replacement item ids from the same batch or not

get_fake_tokens(itemid_seq, target_flat, logits)

The second task of RTD is binary classification to train the discriminator. The task consists of generating fake data by replacing [MASK] positions with random items; the ELECTRA discriminator learns to detect the fake replacements.

Parameters

itemid_seq (torch.Tensor of shape (bs, max_seq_len)) – input sequence of item ids
target_flat (torch.Tensor of shape (bs*max_seq_len)) – flattened masked label sequences
logits (torch.Tensor of shape (#pos_item, vocab_size or #pos_item)) – MLM probabilities of positive items computed by the generator model. The logits are over the whole corpus if sample_from_batch = False, and over the positive (masked) items of the current batch otherwise

Returns

corrupted_inputs (torch.Tensor of shape (bs, max_seq_len)) – input sequence of item ids with fake replacement
discriminator_labels (torch.Tensor of shape (bs, max_seq_len)) – binary labels to distinguish between original and replaced items
batch_updates (torch.Tensor of shape (#pos_item)) – the indices of replacement items within the current batch if sample_from_batch is enabled
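
To make the relationship between corrupted_inputs and discriminator_labels concrete, here is a single-sequence sketch in plain Python. The helper build_discriminator_inputs and its arguments are hypothetical; the real method works on batched tensors and also handles the sample_from_batch case.

```python
from typing import List, Tuple


def build_discriminator_inputs(
    itemid_seq: List[int], masked_positions: List[int], replacements: List[int]
) -> Tuple[List[int], List[int]]:
    """Replace masked positions and label each position as original (0) or replaced (1)."""
    corrupted = list(itemid_seq)
    for position, new_item in zip(masked_positions, replacements):
        corrupted[position] = new_item
    # A position counts as "replaced" only if the item actually changed,
    # so a replacement that happens to equal the original is labeled 0.
    labels = [int(new != old) for new, old in zip(corrupted, itemid_seq)]
    return corrupted, labels


original = [5, 8, 2, 9]
corrupted_inputs, discriminator_labels = build_discriminator_inputs(
    original, masked_positions=[1, 3], replacements=[7, 9]
)
```

In this example the generator reproduced the true item at position 3, so only position 1 is labeled as a replacement.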

sample_from_softmax(logits: torch.Tensor) → torch.Tensor 

Sampling method for replacement token modeling (ELECTRA)

Parameters: logits (torch.Tensor(#pos_item, vocab_size)) – scores of probability of masked positions returned by the generator model
Returns: samples – ids of replacement items.
Return type: torch.Tensor(#pos_item)
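
Sampling replacement ids from generator logits amounts to drawing from a categorical distribution per masked position. The sketch below uses only the standard library; the function names are hypothetical, and the library may use a different sampling technique to obtain equivalent draws.

```python
import math
import random
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def sample_ids_from_logits(
    logits_per_position: Sequence[Sequence[float]], seed: Optional[int] = None
) -> List[int]:
    """Draw one item id per masked position, proportionally to softmax(logits)."""
    rng = random.Random(seed)
    samples = []
    for logits in logits_per_position:
        probabilities = softmax(logits)
        item_id = rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]
        samples.append(item_id)
    return samples


# Two masked positions over a vocabulary of three items; one logit dominates each row.
samples = sample_ids_from_logits([[0.0, 50.0, 0.0], [50.0, 0.0, 0.0]], seed=0)
```

Sampling, rather than always taking the highest-scoring item, gives the discriminator varied and plausible replacements to detect.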

DEFAULT_MASKING = [<class 'transformers4rec.torch.masking.CausalLanguageModeling'>, <class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>, <class 'transformers4rec.torch.masking.PermutationLanguageModeling'>]

BertConfig = [<class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]

ConvBertConfig = [<class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]

DebertaConfig = [<class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]

DistilBertConfig = [<class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]

GPT2Config = [<class 'transformers4rec.torch.masking.CausalLanguageModeling'>]

LongformerConfig = [<class 'transformers4rec.torch.masking.CausalLanguageModeling'>, <class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]

MegatronBertConfig = [<class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]

MPNetConfig = [<class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]

RobertaConfig = [<class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]

RoFormerConfig = [<class 'transformers4rec.torch.masking.CausalLanguageModeling'>, <class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>]

TransfoXLConfig = [<class 'transformers4rec.torch.masking.CausalLanguageModeling'>]

XLNetConfig = [<class 'transformers4rec.torch.masking.CausalLanguageModeling'>, <class 'transformers4rec.torch.masking.MaskedLanguageModeling'>, <class 'transformers4rec.torch.masking.ReplacementLanguageModeling'>, <class 'transformers4rec.torch.masking.PermutationLanguageModeling'>]
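
The attributes above act as a lookup from a transformer configuration to the masking tasks it supports. A plain-Python sketch of such a compatibility check follows; the dictionary reproduces a subset of the entries listed above using the abbreviations clm, mlm, rtd and plm, while the names SUPPORTED_MASKING and is_masking_supported are hypothetical.

```python
from typing import Dict, Set

# Subset of the mapping listed above, keyed by configuration class name.
SUPPORTED_MASKING: Dict[str, Set[str]] = {
    "BertConfig": {"mlm", "rtd"},
    "GPT2Config": {"clm"},
    "TransfoXLConfig": {"clm"},
    "LongformerConfig": {"clm", "mlm", "rtd"},
    "XLNetConfig": {"clm", "mlm", "rtd", "plm"},
}


def is_masking_supported(config_name: str, masking: str) -> bool:
    """Return True if the masking task is listed for the given configuration."""
    if config_name not in SUPPORTED_MASKING:
        raise KeyError(f"No masking information for {config_name!r}")
    return masking in SUPPORTED_MASKING[config_name]


gpt2_clm = is_masking_supported("GPT2Config", "clm")
gpt2_mlm = is_masking_supported("GPT2Config", "mlm")
xlnet_plm = is_masking_supported("XLNetConfig", "plm")
```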
