transformers4rec.torch.model package


transformers4rec.torch.model.head module

transformers4rec.torch.model.model module

transformers4rec.torch.model.prediction_task module

class transformers4rec.torch.model.prediction_task.BinaryClassificationPrepareBlock[source]

Bases: transformers4rec.torch.block.base.BuildableBlock

class transformers4rec.torch.model.prediction_task.BinaryClassificationTask(target_name: Optional[str] = None, task_name: Optional[str] = None, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, loss=BCELoss(), metrics=(BinaryPrecision(), BinaryRecall(), BinaryAccuracy()), summary_type='first')[source]

Bases: transformers4rec.torch.model.base.PredictionTask

Returns a PredictionTask for binary classification.

Example usage:

from transformers4rec import torch as tr

# Define the input module to process the tabular input features.
input_module = tr.TabularSequenceFeatures.from_schema(...)

# Define XLNetConfig class and set default parameters for HF XLNet config.
transformer_config = tr.XLNetConfig.build(
    d_model=d_model, n_head=4, n_layer=2, total_seq_length=max_sequence_length
)

# Define the model block including: inputs, masking, projection and transformer block.
body = tr.SequentialBlock(...)

# Define a head with BinaryClassificationTask.
head = tr.Head(...)

# Get the end-to-end Model class.
model = tr.Model(head)

Parameters

  • target_name (Optional[str] = None) – Specifies the variable name that represents the positive and negative values.

  • task_name (Optional[str] = None) – Specifies the name of the prediction task. If this parameter is not specified, a name is automatically constructed based on target_name and the Python class name of the model.

  • task_block (Optional[BlockType] = None) – Specifies a module to transform the input tensor before computing predictions.

  • loss (torch.nn.Module) – Specifies the loss function for the task. The default class is torch.nn.BCELoss.

  • metrics (Tuple[torch.nn.Module, ...]) – Specifies the metrics to calculate during training and evaluation. The default metrics are BinaryPrecision, BinaryRecall, and BinaryAccuracy.

  • summary_type (str) –

    Summarizes a sequence into a single tensor. Accepted values are:

    • last – Take the last token's hidden state (like XLNet)

    • first – Take the first token's hidden state (like BERT)

    • mean – Take the mean of all tokens' hidden states

    • cls_index – Supply a Tensor of classification token positions (GPT/GPT-2)

    • attn – Not implemented yet; intended to use multi-head attention
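
The pooling choices listed above can be illustrated with a minimal sketch. It uses plain Python lists in place of tensors and a hypothetical helper named summarize, so it shows only the idea behind each summary_type value and is not the library's implementation.

```python
from typing import List, Optional

Vector = List[float]


def summarize(
    hidden_states: List[Vector],
    summary_type: str = "first",
    cls_index: Optional[int] = None,
) -> Vector:
    """Reduce a sequence of hidden-state vectors to a single vector.

    Hypothetical helper for illustration; plain lists stand in for tensors.
    """
    if summary_type == "last":
        return hidden_states[-1]
    if summary_type == "first":
        return hidden_states[0]
    if summary_type == "mean":
        n = len(hidden_states)
        # Average each hidden dimension over the sequence positions.
        return [sum(column) / n for column in zip(*hidden_states)]
    if summary_type == "cls_index":
        if cls_index is None:
            raise ValueError("cls_index is required for summary_type='cls_index'")
        return hidden_states[cls_index]
    raise NotImplementedError(f"summary_type={summary_type!r} is not covered here")


# One sequence with three positions and a hidden size of 2.
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
first = summarize(states, "first")
last = summarize(states, "last")
mean = summarize(states, "mean")
```

In this sketch, "first" and "last" simply index the sequence, while "mean" averages over positions, which is why it is the only option that depends on every token.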

DEFAULT_METRICS = (BinaryPrecision(), BinaryRecall(), BinaryAccuracy())
training: bool
class transformers4rec.torch.model.prediction_task.RegressionPrepareBlock[source]

Bases: transformers4rec.torch.block.base.BuildableBlock

class transformers4rec.torch.model.prediction_task.RegressionTask(target_name: Optional[str] = None, task_name: Optional[str] = None, task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, loss=MSELoss(), metrics=(MeanSquaredError(),), summary_type='first')[source]

Bases: transformers4rec.torch.model.base.PredictionTask

DEFAULT_METRICS = (MeanSquaredError(),)
training: bool
class transformers4rec.torch.model.prediction_task.NextItemPredictionTask(loss: torch.nn.modules.module.Module = CrossEntropyLoss(), metrics: Iterable[torchmetrics.metric.Metric] = (NDCGAt(), AvgPrecisionAt(), RecallAt()), task_block: Optional[Union[transformers4rec.torch.block.base.BlockBase, transformers4rec.torch.block.base.BuildableBlock]] = None, task_name: str = 'next-item', weight_tying: bool = False, softmax_temperature: float = 1, padding_idx: int = 0, target_dim: Optional[int] = None, sampled_softmax: Optional[bool] = False, max_n_samples: Optional[int] = 100)[source]

Bases: transformers4rec.torch.model.base.PredictionTask

This block performs the item prediction task for session-based and sequential models. It requires a body containing a masking schema to use for training and target generation. For the supported masking schemes, please refer to the library's masking documentation.

Parameters

  • loss (torch.nn.Module) – Loss function to use. Defaults to CrossEntropyLoss.

  • metrics (Iterable[torchmetrics.Metric]) – List of ranking metrics to use for evaluation.

  • task_block – Module to transform input tensor before computing predictions.

  • task_name (str, optional) – Name of the prediction task. If not provided, a name is automatically constructed from the target name and the class name.

  • weight_tying (bool) – The item id embedding table weights are shared with the prediction network layer.

  • softmax_temperature (float) – Softmax temperature T, used to reduce model overconfidence: predictions are computed as softmax(logits / T). A value of 1.0 reduces to the regular softmax.

  • padding_idx (int) – Pad token id.

  • target_dim (int) – Vocabulary size of the item ids.

  • sampled_softmax (Optional[bool]) – Enables sampled softmax. Defaults to False.

  • max_n_samples (Optional[int]) – Number of samples for sampled softmax. Defaults to 100.
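
The effect of softmax_temperature can be seen in a small sketch of softmax(logits / T). It is written in plain Python with a hypothetical function name, softmax_with_temperature, and is not the library's tensor code.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Compute softmax(logits / T) in a numerically stable way (illustrative helper)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
regular = softmax_with_temperature(logits, temperature=1.0)   # plain softmax
softened = softmax_with_temperature(logits, temperature=2.0)  # flatter distribution
```

With T greater than 1 the top probability shrinks and the distribution flattens, which is the sense in which temperature reduces overconfidence.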

DEFAULT_METRICS = (NDCGAt(), AvgPrecisionAt(), RecallAt())
build(body, input_size, device=None, inputs=None, task_block=None, pre=None)[source]

Build method; this is called by the Head.

forward(inputs: torch.Tensor, targets=None, training=False, testing=False, top_k=None, **kwargs)[source]
remove_pad_3d(inp_tensor, non_pad_mask)[source]
calculate_metrics(predictions, targets) → Dict[str, torch.Tensor][source]
training: bool
class transformers4rec.torch.model.prediction_task.NextItemPredictionPrepareBlock(target_dim: int, weight_tying: bool = False, item_embedding_table: Optional[torch.nn.modules.module.Module] = None, softmax_temperature: float = 0, sampled_softmax: Optional[bool] = False, max_n_samples: Optional[int] = 100, min_id: Optional[int] = 0)[source]

Bases: transformers4rec.torch.block.base.BuildableBlock

class transformers4rec.torch.model.prediction_task.LogUniformSampler(max_n_samples: int, max_id: int, min_id: Optional[int] = 0, unique_sampling: bool = True, n_samples_multiplier_before_unique: int = 2)[source]

Bases: torch.nn.modules.module.Module

get_log_uniform_distr(max_id: int, min_id: int = 0) → torch.Tensor[source]

Approximates the item frequency distribution with a log-uniform probability distribution, where P(class) = (log(class + 2) - log(class + 1)) / log(max_id + 1). It assumes that item ids are sorted in decreasing order of frequency.


Parameters

max_id (int) – Maximum discrete value for sampling (e.g. cardinality of the item id)

Returns

The log-uniform probability distribution

Return type

torch.Tensor
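
The log-uniform formula can be checked numerically with a short sketch. The function name log_uniform_probs is hypothetical, and the sketch assumes min_id = 0 and classes 0 to max_id - 1, the range over which the terms telescope to a total of 1; the library's own tensor implementation may differ in detail.

```python
import math
from typing import List


def log_uniform_probs(max_id: int) -> List[float]:
    """P(class) = (log(class + 2) - log(class + 1)) / log(max_id + 1).

    Assumes classes 0 .. max_id - 1 (an assumption of this sketch).
    """
    norm = math.log(max_id + 1)
    return [(math.log(c + 2) - math.log(c + 1)) / norm for c in range(max_id)]


probs = log_uniform_probs(1000)
total = sum(probs)  # the log terms telescope, so this is 1 up to rounding
```

Because the probabilities decrease with the class index, low ids are sampled most often, which is why the ids are assumed to be sorted from most to least frequent.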


get_unique_sampling_distr(dist, n_sample)[source]

Returns the probability that each item is sampled at least once given the specified number of trials. This is meant to be used when self.unique_sampling == True. That probability can be approximated by 1 - (1 - p)^n; a numerically stable version is used instead: -expm1(num_tries * log1p(-p))
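
To see why the stable form matters, the following sketch compares the two expressions in plain Python using the standard math module. The function name prob_sampled_at_least_once is hypothetical and the values are chosen only for illustration.

```python
import math


def prob_sampled_at_least_once(p: float, num_tries: int) -> float:
    """Stable evaluation of 1 - (1 - p)^num_tries (illustrative helper)."""
    return -math.expm1(num_tries * math.log1p(-p))


# For moderate p the two forms agree.
p, n = 0.1, 3
naive_moderate = 1.0 - (1.0 - p) ** n
stable_moderate = prob_sampled_at_least_once(p, n)

# For very small p the naive form loses all precision:
# 1.0 - 1e-18 rounds to 1.0 in double precision, so the result collapses to 0.0.
p_small, n_small = 1e-18, 10
naive_small = 1.0 - (1.0 - p_small) ** n_small
stable_small = prob_sampled_at_least_once(p_small, n_small)
```

Rare items in a large catalog have very small p, so the stable form keeps their probabilities non-zero where the naive form would not.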

sample(labels: torch.Tensor)[source]

Sample negative samples and calculate their probabilities.

If unique_sampling==True, only unique sampled items are returned. In that case the actual number of samples varies from run to run: sampling without replacement (torch.multinomial(…, replacement=False)) is slow, so torch.multinomial(…, replacement=True).unique() is used instead, which doesn’t guarantee the same number of unique sampled items. Increasing n_samples_multiplier_before_unique raises the chance of getting more unique samples.


Parameters

labels (torch.Tensor, dtype=torch.long, shape=(batch_size,)) – The input labels for which negative samples should be generated.

Returns

  • neg_samples (torch.Tensor, dtype=torch.long, shape=(n_samples,)) – The unique negative samples drawn from the log-uniform distribution.

  • true_probs (torch.Tensor, dtype=torch.float32, shape=(batch_size,)) – The probabilities of the input labels according to the log-uniform distribution (depends on self.unique_sampling choice).

  • samp_log_probs (torch.Tensor, dtype=torch.float32, shape=(n_samples,)) – The probabilities of the sampled negatives according to the log-uniform distribution (depends on self.unique_sampling choice).
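
The "sample with replacement, then deduplicate" strategy described for sample can be sketched with the standard library. The function sample_unique_negatives and its arguments are hypothetical, random.choices stands in for torch.multinomial(…, replacement=True), and capping the result at max_n_samples is an assumption of this sketch rather than a statement about the library's behavior.

```python
import math
import random
from typing import List, Optional


def sample_unique_negatives(
    probs: List[float],
    max_n_samples: int,
    multiplier: int = 2,
    seed: Optional[int] = None,
) -> List[int]:
    """Draw with replacement, then keep unique ids (illustrative helper)."""
    rng = random.Random(seed)
    # Oversample so that enough distinct ids are likely to survive deduplication.
    n_tries = max_n_samples * multiplier
    draws = rng.choices(range(len(probs)), weights=probs, k=n_tries)
    unique_draws = list(dict.fromkeys(draws))  # deduplicate, keeping draw order
    return unique_draws[:max_n_samples]


# Log-uniform weights over 50 item ids, as in get_log_uniform_distr.
n_items = 50
norm = math.log(n_items + 1)
probs = [(math.log(i + 2) - math.log(i + 1)) / norm for i in range(n_items)]

# The number of unique negatives can differ from one seed to the next.
counts = [len(sample_unique_negatives(probs, max_n_samples=10, seed=s)) for s in range(5)]
```

A larger multiplier makes it more likely that each run reaches max_n_samples distinct ids, at the cost of more draws.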

training: bool

Module contents