API Documentation#

TensorFlow Models#

Ranking Model Constructors#

DCNModel(schema, depth[, deep_block, ...])

Create a model using the architecture proposed in DCN V2: Improved Deep & Cross Network [1].

DeepFMModel(schema[, embedding_dim, ...])

DeepFM-model architecture, which is the sum of the 1-dim output of a Factorization Machine [2] and a Deep Neural Network

DLRMModel(schema, *[, embeddings, ...])

DLRM-model architecture.

WideAndDeepModel(schema, deep_block[, ...])

The Wide&Deep architecture [1] was proposed by Google in 2016 to balance the ability of neural networks to generalize with the capacity of linear models to memorize relevant feature interactions.

Retrieval Model Constructors#

Encoder(*args, **kwargs)

Block that can be used for prediction and evaluation but not for training

EmbeddingEncoder(*args, **kwargs)

ItemRetrievalScorer(*args, **kwargs)

Block for ItemRetrieval, which expects query/user and item embeddings as input and uses dot product to score the positive item (inputs["item"]) and also sampled negative items (during training).

:param samplers: List of item samplers that provide negative samples when training=True
:type samplers: List[ItemSampler], optional
:param sampling_downscore_false_negatives: Identify false negatives (sampled item ids equal to the positive item) and downscore them to the sampling_downscore_false_negatives_value, by default True
:type sampling_downscore_false_negatives: bool, optional
:param sampling_downscore_false_negatives_value: Value to be used to downscore false negatives when sampling_downscore_false_negatives=True, by default np.finfo(np.float32).min / 100.0
:type sampling_downscore_false_negatives_value: int, optional
:param item_id_feature_name: Name of the column containing the item ids. Defaults to item_id
:type item_id_feature_name: str
:param query_name: Identify query tower for query/user embeddings, by default 'query'
:type query_name: str
:param item_name: Identify item tower for item embeddings, by default 'item'
:type item_name: str
:param cache_query: Add query embeddings to the context block, by default False
:type cache_query: bool
:param sampled_softmax_mode: Use sampled softmax for scoring, by default False
:type sampled_softmax_mode: bool
:param store_negative_ids: Returns negative items ids as part of the output, by default False
:type store_negative_ids: bool
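The scoring behaviour described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the function names, the list-based embeddings, and the `FALSE_NEGATIVE_SCORE` constant (a stand-in for the float32 minimum divided by 100) are assumptions made for illustration only.

```python
# Hypothetical stand-in for np.finfo(np.float32).min / 100.0.
FALSE_NEGATIVE_SCORE = -3.4028234663852886e38 / 100.0


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def score_with_negatives(query, positive_item, positive_id, negatives,
                         downscore_false_negatives=True):
    """Return [positive score, negative scores...] for a single query.

    `negatives` is a list of (item_id, embedding) pairs (assumed layout).
    """
    scores = [dot(query, positive_item)]
    for item_id, emb in negatives:
        if downscore_false_negatives and item_id == positive_id:
            # The sampled "negative" is really the positive item:
            # push its score down so it cannot compete with the positive.
            scores.append(FALSE_NEGATIVE_SCORE)
        else:
            scores.append(dot(query, emb))
    return scores


query = [1.0, 2.0]
scores = score_with_negatives(
    query,
    positive_item=[0.5, 0.5],
    positive_id=7,
    negatives=[(3, [1.0, 0.0]), (7, [0.5, 0.5])],
)
```

Here the second sampled negative shares the positive item's id, so it receives the very low placeholder score instead of its dot product.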

RetrievalModelV2(*args, **kwargs)

MatrixFactorizationModelV2(schema, dim[, ...])

Builds a matrix factorization (MF) model.

MatrixFactorizationModel(schema, dim[, ...])

Builds a matrix factorization model.

TwoTowerModelV2(query_tower, candidate_tower)

Builds the Two-tower architecture, as proposed in [1].

TwoTowerModel(schema, query_tower[, ...])

Builds the Two-tower architecture, as proposed in [1].

YoutubeDNNRetrievalModelV2(schema[, ...])

Build the Youtube-DNN retrieval model.

YoutubeDNNRetrievalModel(schema[, ...])

Build the Youtube-DNN retrieval model.

Input Block Constructors#

Embeddings(schema[, dim, infer_dim_fn, ...])

Creates a ParallelBlock with an EmbeddingTable for each categorical feature in the schema.

EmbeddingTable(*args, **kwargs)

Embedding table that is backed by a standard Keras Embedding Layer.

AverageEmbeddingsByWeightFeature(*args, **kwargs)

ReplaceMaskedEmbeddings(*args, **kwargs)

Takes a 3D input tensor (batch size x seq. length x embedding dim) and replaces

L2Norm(*args, **kwargs)

Apply L2-normalization to input tensors along a given axis

InputBlockV2([schema, categorical, ...])

The entry block of the model to process input features from a schema.

InputBlock(schema[, branches, post, ...])

The entry block of the model to process input features from a schema.

Continuous(*args, **kwargs)

Filters (keeps) only the continuous features.

ContinuousFeatures(*args, **kwargs)

Input block for continuous features.

ContinuousEmbedding(inputs, embedding_block)

ContinuousProjection(schema, projection)

Concatenates the continuous features and combines them using a layer

SequenceEmbeddingFeatures(*args, **kwargs)

Input block for embedding-lookups for categorical features. This module produces 3-D tensors, which is useful for sequential models like transformers.

:param feature_config: This specifies what TableConfig to use for each feature. For shared embeddings, the same TableConfig can be used for multiple features.
:type feature_config: Dict[str, FeatureConfig]
:param item_id: The name of the feature that's used for the item_id.
:type item_id: str, optional
:param padding_idx: The symbol to use for padding.
:type padding_idx: int
:param pre: Transformations to apply on the inputs when the module is called (so before call).
:type pre: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional
:param post: Transformations to apply on the inputs after the module is called (so after call).
:type post: Union[str, TabularTransformation, List[str], List[TabularTransformation]], optional
:param aggregation: Aggregation to apply after processing the call-method to output a single Tensor.

Model Building Block Constructors#

DLRMBlock(schema, *[, embedding_dim, ...])

Builds the DLRM architecture, as proposed in the following paper: https://arxiv.org/pdf/1906.00091.pdf [1].

MLPBlock(dimensions[, activation, use_bias, ...])

A block that applies a multi-layer perceptron to the input.

CrossBlock([depth, filter, low_rank_dim, ...])

This block provides a way to create high-order feature interactions

TwoTowerBlock(*args, **kwargs)

Builds the Two-tower architecture, as proposed in the following paper: https://doi.org/10.1145/3298689.3346996 [Xinyang19].

MatrixFactorizationBlock(schema, dim[, ...])

Returns a block for Matrix Factorization, which creates the user and item embeddings based on the schema and computes the dot product between the L2-normalized user and item embeddings
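As a rough illustration of that scoring step, the sketch below normalizes two embedding vectors and takes their dot product, which amounts to cosine similarity. The helper names and the `eps` guard are assumptions for this example, not part of the library.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def mf_score(user_emb, item_emb):
    """Dot product of the L2-normalized user and item embeddings."""
    u = l2_normalize(user_emb)
    v = l2_normalize(item_emb)
    return sum(a * b for a, b in zip(u, v))


score = mf_score([3.0, 4.0], [4.0, 3.0])
same_direction = mf_score([1.0, 0.0], [2.0, 0.0])
```

Because both vectors are normalized first, the score depends only on the angle between them and always lies in [-1, 1].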

DotProductInteraction(*args, **kwargs)

FMBlock(schema[, fm_input_block, ...])

Implements the Factorization Machine, as introduced in [1].

FMPairwiseInteraction(*args, **kwargs)

Computes pairwise (2nd-order) feature interactions as defined in Factorization Machines [1].
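The second-order term of a Factorization Machine is commonly computed with the "square of sum minus sum of squares" identity rather than by enumerating every pair. The sketch below checks that identity on plain Python lists; it reduces the result to a single scalar, which is an assumption of this example and may differ from the shape the block actually returns.

```python
from itertools import combinations


def fm_pairwise_naive(embeddings):
    """Sum of dot products over all feature pairs i < j."""
    return sum(
        sum(a * b for a, b in zip(u, v))
        for u, v in combinations(embeddings, 2)
    )


def fm_pairwise_fast(embeddings):
    """Same quantity via 0.5 * ((sum v)^2 - sum v^2), per embedding dimension."""
    dim = len(embeddings[0])
    total = 0.0
    for d in range(dim):
        column = [e[d] for e in embeddings]
        s = sum(column)
        total += s * s - sum(x * x for x in column)
    return 0.5 * total


embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
naive = fm_pairwise_naive(embeddings)
fast = fm_pairwise_fast(embeddings)
```

The fast form runs in time linear in the number of features, whereas the naive form is quadratic.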

Modeling Prediction Task Constructors#


The modeling prediction task classes are deprecated in favor of the prediction output classes.

PredictionTasks(schema[, task_blocks, ...])

Creates Multi-task prediction Blocks from schema

PredictionTask(*args, **kwargs)

Base-class for prediction tasks.

BinaryClassificationTask(*args, **kwargs)

Prediction task for binary classification.

MultiClassClassificationTask(*args, **kwargs)

Prediction task for multi-class classification.

RegressionTask(*args, **kwargs)

Prediction task for regression-task.

ItemRetrievalTask(*args, **kwargs)

Prediction-task for item-retrieval.

Modeling Prediction Output Constructors#

OutputBlock(schema[, model_outputs, pre, ...])

Creates model output(s) based on the columns tagged as target in the schema.

ModelOutput(*args, **kwargs)

Base-class for prediction blocks.

BinaryOutput(*args, **kwargs)

Binary-classification prediction block.

CategoricalOutput(*args, **kwargs)

Categorical output

ContrastiveOutput(*args, **kwargs)

Categorical output

RegressionOutput(*args, **kwargs)

Regression prediction block

ColumnBasedSampleWeight(*args, **kwargs)

Allows using columns (features or targets) as sample weights for a given ModelOutput.

Model Pipeline Constructors#

SequentialBlock(*args, **kwargs)

The SequentialLayer represents a sequence of Keras layers. It is a Keras Layer that can be used instead of tf.keras.layers.Sequential, which is actually a Keras Model. In contrast to Keras Sequential, this layer can be used as a pure Layer in tf.functions and when exporting SavedModels, without having to pre-declare input and output shapes. In turn, this layer is usable as a preprocessing layer for TF Agents Networks, and can be exported via PolicySaver.

ParallelBlock(*args, **kwargs)

Merge multiple layers or TabularModules into a single output of TabularData.

ParallelPredictionBlock(*args, **kwargs)

Multi-task prediction block.

DenseResidualBlock([low_rank_dim, ...])

A block that applies a dense residual block to the input.

DualEncoderBlock(*args, **kwargs)

ResidualBlock(*args, **kwargs)

TabularBlock(*args, **kwargs)

Layer that's specialized for tabular-data by integrating many often used operations.

Filter(*args, **kwargs)

Transformation that filters out certain features from TabularData.

Cond(*args, **kwargs)

Layer that enables applying layers conditionally.

Model Evaluation Constructors#

TopKEncoder(*args, **kwargs)

Block that can be used for top-k prediction & evaluation, initialized from a trained retrieval model

Model Optimizer Constructors#

MultiOptimizer(optimizers_and_blocks[, ...])

An optimizer that composes multiple individual optimizers.

LazyAdam([learning_rate, beta_1, beta_2, ...])

Variant of the Adam optimizer that handles sparse updates more efficiently.

OptimizerBlocks(optimizer, blocks)

Dataclass for a pair of an optimizer and the blocks that the optimizer should apply to.

split_embeddings_on_size(embeddings, threshold)

Splits the embedding tables in a ParallelBlock based on a size threshold (the first dimension of each embedding table) and returns a tuple of two lists, which contain the large embeddings and the small embeddings.
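The splitting rule can be sketched without any framework by representing each table as a (rows, dim) shape. The function name, the dictionary input, and the choice of a strict "greater than" comparison are all assumptions for illustration; the real function operates on embedding blocks, and its boundary behaviour is not stated here.

```python
def split_on_size(table_shapes, threshold):
    """Split table names into (large, small) by their first dimension.

    `table_shapes` maps a table name to a (num_rows, embedding_dim) tuple.
    Tables with num_rows > threshold count as large (assumed convention).
    """
    large, small = [], []
    for name, (num_rows, _dim) in table_shapes.items():
        if num_rows > threshold:
            large.append(name)
        else:
            small.append(name)
    return large, small


shapes = {"item_id": (100_000, 64), "country": (200, 8), "user_id": (50_000, 64)}
large, small = split_on_size(shapes, threshold=1_000)
```

A split like this is typically paired with MultiOptimizer so that large and small tables can be trained with different optimizers.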

Transformation Block Constructors#

CategoryEncoding(*args, **kwargs)

A preprocessing layer which encodes integer features.

MapValues(*args, **kwargs)

Layer to map values of a dictionary of tensors.

ListToDense(*args, **kwargs)

Convert all list-inputs to dense-tensors.

ListToRagged(*args, **kwargs)

Convert all list (multi-hot/sequential) features to tf.RaggedTensor

ListToSparse(*args, **kwargs)

Convert all list-inputs to sparse-tensors.

ToSparse(*args, **kwargs)

Convert the features provided in the schema to sparse tensors.

ToDense(*args, **kwargs)

Convert the features provided in the schema to dense tensors.

ToTarget(*args, **kwargs)

Transform columns to targets

ToOneHot(*args, **kwargs)

Transform the categorical encoded labels into a one-hot representation

HashedCross(*args, **kwargs)

A transformation block which crosses categorical features using the "hashing trick". Conceptually, the transformation can be thought of as: hash(concatenation of features) % num_bins

Example usage::

    model_body = ParallelBlock(
        TabularBlock.from_schema(schema=cross_schema, pre=ml.HashedCross(cross_schema, num_bins=1000)),
        is_input=True,
    ).connect(ml.MLPBlock([64, 32]))
    model = ml.Model(model_body, ml.BinaryClassificationTask("click"))

:param schema: The Schema with the input features
:type schema: Schema
:param num_bins: Number of hash bins.
:type num_bins: int
:param output_mode: Specification for the output of the layer. Defaults to "one_hot". Values can be "int" or "one_hot", configuring the layer as follows:
    - "int": Return the integer bin indices directly.
    - "one_hot": Encodes each individual element in the input into an array with the same size as num_bins, containing a 1 at the input's bin index.
:type output_mode: string
:param sparse: Boolean. Only applicable to "one_hot" mode. If True, returns a SparseTensor instead of a dense Tensor. Defaults to False.
:type sparse: bool
:param output_name: Name of the output feature. If not specified, the default is cross_<feature_name>_<feature_name>_<...>
:type output_name: string
:param infer_num_bins: If True, num_bins is set to the product of the feature cardinalities; if that product is bigger than max_num_bins, it is clipped to max_num_bins
:type infer_num_bins: bool
:param max_num_bins: Upper bound of num_bins, by default 100000.
:type max_num_bins: int
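The "hash(concatenation of features) % num_bins" idea can be sketched in a few lines of standard-library Python. This is only a conceptual illustration: `zlib.crc32` and the `"_X_"` separator are stand-ins chosen for determinism here, not the hash function or separator the layer actually uses.

```python
import zlib


def hashed_cross(values, num_bins, separator="_X_"):
    """Map a tuple of categorical values to a bin index in [0, num_bins)."""
    key = separator.join(str(v) for v in values).encode("utf-8")
    return zlib.crc32(key) % num_bins


def one_hot(index, num_bins):
    """Dense one-hot encoding of a bin index."""
    return [1 if i == index else 0 for i in range(num_bins)]


# Cross a (hypothetical) user country with an item category.
bin_index = hashed_cross(("BR", "electronics"), num_bins=1000)
encoded = one_hot(bin_index, num_bins=1000)
```

The "int" output mode corresponds to `bin_index`, and "one_hot" to `encoded`. Distinct crosses can collide in the same bin, which is the usual trade-off of the hashing trick.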

HashedCrossAll(schema[, num_bins, ...])

Parallel block that consists of HashedCross blocks for all combinations of schema with all levels

BroadcastToSequence(*args, **kwargs)

Broadcast context features to match the timesteps of sequence features.

SequencePredictNext(*args, **kwargs)

Prepares sequential inputs and targets for next-item prediction.

SequencePredictLast(*args, **kwargs)

Prepares sequential inputs and targets for last-item prediction.
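A simplified view of the next-item and last-item preparations, using a plain Python list as the session. The function names are hypothetical, and details such as padding, masking, and handling of multiple sequential features are deliberately left out.

```python
def predict_next(sequence):
    """Inputs are all items but the last; targets are the inputs shifted by one."""
    return sequence[:-1], sequence[1:]


def predict_last(sequence):
    """Inputs are all items but the last; the single target is the last item."""
    return sequence[:-1], sequence[-1]


session = [10, 20, 30, 40]
next_inputs, next_targets = predict_next(session)
last_inputs, last_target = predict_last(session)
```

In the next-item setup every position has a target (the item that follows it), while the last-item setup produces one target per sequence.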

SequencePredictRandom(*args, **kwargs)

Prepares sequential inputs and targets for random-item prediction.

SequenceTargetAsInput(*args, **kwargs)

Creates targets to be equal to one of the sequential input features.

SequenceMaskLast(*args, **kwargs)

This block copies one of the sequence input features to be the target feature.

SequenceMaskRandom(*args, **kwargs)

This block implements the Masked Language Modeling (MLM) training approach introduced in BERT (NLP) and later adapted to RecSys by BERT4Rec [1].

ExpandDims(*args, **kwargs)

Expand dims of selected input tensors.

Example::

    inputs = {
        "cont_feat1": tf.random.uniform((NUM_ROWS,)),
        "cont_feat2": tf.random.uniform((NUM_ROWS,)),
        "multi_hot_categ_feat": tf.random.uniform(
            (NUM_ROWS, 4), minval=1, maxval=100, dtype=tf.int32
        ),
    }
    expand_dims_op = tr.ExpandDims(expand_dims={"cont_feat2": 0, "multi_hot_categ_feat": 1})
    expanded_inputs = expand_dims_op(inputs)

StochasticSwapNoise(*args, **kwargs)

Applies stochastic replacement of sequence features

AsTabular(*args, **kwargs)

Converts a Tensor to TabularData by converting it to a dictionary.

Multi-Task Block Constructors#

MMOEBlock(outputs, expert_block, num_experts)

Implements the Multi-gate Mixture-of-Experts (MMoE) introduced in [1].
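The core of MMoE is that each task has its own softmax gate that mixes the outputs of shared experts. The sketch below shows only that mixing step, with hand-written expert outputs and gate logits; in a real model both would come from trainable layers, and the names used here are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mmoe_combine(expert_outputs, gate_logits):
    """Weighted sum of expert outputs using softmax-normalized gate weights."""
    weights = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [
        sum(w * out[d] for w, out in zip(weights, expert_outputs))
        for d in range(dim)
    ]


expert_outputs = [[1.0, 0.0], [0.0, 1.0]]  # two experts, 2-dim outputs
gates_per_task = {"click": [0.0, 0.0], "purchase": [100.0, 0.0]}
task_inputs = {
    task: mmoe_combine(expert_outputs, logits)
    for task, logits in gates_per_task.items()
}
```

The "click" gate weights both experts equally, while the "purchase" gate relies almost entirely on the first expert.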

CGCBlock(*args, **kwargs)

Implements the Customized Gate Control (CGC) proposed in [1].

PLEBlock(num_layers, outputs, expert_block)

Implements the Progressive Layered Extraction (PLE) model from [1], by stacking CGC blocks (CGCBlock).

Data Loader Customization Constructor#

merlin.models.tf.Loader(paths_or_dataset, ...)

Override class to customize data loading for backward compatibility with older NVTabular releases.


AvgPrecisionAt(*args, **kwargs)

MRRAt(*args, **kwargs)

NDCGAt(*args, **kwargs)

PrecisionAt(*args, **kwargs)

RecallAt(*args, **kwargs)

TopKMetricsAggregator(*args, **kwargs)

Aggregator for top-k metrics (TopkMetric) that is optimized to sort top-k predictions only once for all metrics.
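The benefit of sorting once can be sketched as follows: the top-k labels are extracted a single time and then reused by several metrics. These helper functions are simplified, single-example illustrations with hypothetical names, not the library's metric classes.

```python
def top_k_labels(scores, labels, k):
    """Sort once by score (descending) and return the labels of the top-k items."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order[:k]]


def recall_at(sorted_labels, total_relevant):
    """Fraction of all relevant items that appear in the top-k."""
    return sum(sorted_labels) / total_relevant if total_relevant else 0.0


def precision_at(sorted_labels):
    """Fraction of the top-k items that are relevant."""
    return sum(sorted_labels) / len(sorted_labels)


def mrr_at(sorted_labels):
    """Reciprocal rank of the first relevant item in the top-k (0 if none)."""
    for rank, label in enumerate(sorted_labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


scores = [0.1, 0.9, 0.3, 0.7]
labels = [1, 0, 1, 0]          # 1 marks a relevant item
top = top_k_labels(scores, labels, k=3)

metrics = {
    "recall": recall_at(top, total_relevant=sum(labels)),
    "precision": precision_at(top),
    "mrr": mrr_at(top),
}
```

All three metrics read from the same `top` list, so the sort cost is paid only once.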


ItemSampler(*args, **kwargs)

InBatchSampler(*args, **kwargs)

Provides in-batch sampling [1] for two-tower item retrieval models.

PopularityBasedSampler(*args, **kwargs)

Provides a popularity-based negative sampling for the softmax layer to ensure training efficiency when the catalog of items is very large.



Extends tf.keras.losses.SparseCategoricalCrossentropy by making from_logits=True by default (in this case an optimized softmax activation is applied within this loss, so you should not include a softmax activation manually in the output layer).
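To see why the softmax should not be applied twice, it helps to look at what a cross-entropy computed directly from logits does. The following is a generic, standard-library sketch of sparse categorical cross-entropy for one example using the log-sum-exp trick; it is not the Keras or Merlin code.

```python
import math


def sparse_categorical_crossentropy_from_logits(logits, target):
    """Cross-entropy for one example, computed directly from raw logits.

    Uses log-sum-exp so that the softmax is folded into the loss in a
    numerically stable way.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


uniform_loss = sparse_categorical_crossentropy_from_logits([0.0, 0.0], target=0)
confident_loss = sparse_categorical_crossentropy_from_logits([1000.0, 0.0], target=0)
```

Feeding already-softmaxed probabilities into a loss like this would apply the softmax a second time and flatten the predictions, which is why the output layer should emit raw logits.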


Extends tf.keras.losses.SparseCategoricalCrossentropy by making from_logits=True by default (in this case an optimized softmax activation is applied within this loss, so you should not include a softmax activation manually in the output layer).

BPRLoss([reduction, name])

The Bayesian Personalised Ranking (BPR) pairwise loss [1]


The BPR-max pairwise loss proposed in [1]

HingeLoss([reduction, name])

Pairwise hinge loss, as described in [1]: max(0, 1 + r_uj - r_ui), where r_ui is the score of the positive item and r_uj the score of negative items.
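The hinge formula above can be evaluated directly for one positive score and a few negative scores. The function name is hypothetical and the sketch omits any reduction over the batch.

```python
def pairwise_hinge(positive_score, negative_scores):
    """max(0, 1 + r_uj - r_ui) for each negative score r_uj."""
    return [max(0.0, 1.0 + r_uj - positive_score) for r_uj in negative_scores]


losses = pairwise_hinge(2.0, [0.5, 1.5, 3.0])
# → [0.0, 0.5, 2.0]
```

A negative contributes zero loss once the positive item outscores it by a margin of at least 1.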

LogisticLoss([reduction, name])

Pairwise log loss, as described in [1]: log(1 + exp(r_uj - r_ui)), where r_ui is the score of the positive item and r_uj the score of negative items.
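The expression log(1 + exp(x)) overflows for large x if evaluated naively, so a sketch of this loss usually goes through a stable softplus. The helper names below are assumptions for illustration, and no batch reduction is applied.

```python
import math


def softplus(x):
    """Stable log(1 + exp(x)): max(x, 0) + log1p(exp(-|x|))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_logistic(positive_score, negative_scores):
    """log(1 + exp(r_uj - r_ui)) for each negative score r_uj."""
    return [softplus(r_uj - positive_score) for r_uj in negative_scores]


tied = pairwise_logistic(1.0, [1.0])             # positive and negative tie
badly_ranked = pairwise_logistic(0.0, [1000.0])  # no overflow despite the gap
```

Unlike the hinge loss, this loss never reaches exactly zero in principle; it only decays smoothly as the positive item pulls ahead.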

TOP1Loss([reduction, name])

The TOP1 pairwise loss proposed in [1]

TOP1maxLoss([reduction, name])

The TOP1-max pairwise loss proposed in [1]

TOP1v2Loss([reduction, name])

An adapted version of the TOP1 pairwise loss proposed in [1], but following the current GRU4Rec implementation [2].

Schema Functions#







Filters out entries from input_dict, returns a dictionary where every entry corresponds to a column in the schema




Provides a heuristic (from Google) that suggests the embedding sizes as a function (fourth root) of the cardinalities of the categorical features, obtained from the schema.


Provides a heuristic (from Google) that suggests the embedding dimension as a function (fourth root) of the feature cardinality.
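A minimal sketch of a fourth-root heuristic is shown below. The function name, the `multiplier` parameter with its default of 2.0, and the rounding up to an integer are assumptions made for this example; the library's own function may scale or round differently.

```python
import math


def fourth_root_dim(cardinality, multiplier=2.0):
    """Suggest an embedding dimension as multiplier * cardinality ** (1/4).

    The fourth root is taken as two square roots, and the result is
    rounded up to the next integer (assumed convention).
    """
    fourth_root = math.sqrt(math.sqrt(cardinality))
    return int(math.ceil(multiplier * fourth_root))


# Hypothetical cardinalities, as they might be read from a schema.
cardinalities = {"country": 16, "category": 100, "item_id": 10_000}
dims = {name: fourth_root_dim(card) for name, card in cardinalities.items()}
```

The dimension grows slowly with cardinality: going from 16 to 10,000 distinct values only raises the suggestion from 4 to 20 in this sketch.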


Tensor Utilities#

TensorInitializer(weights, **kwargs)

Initializer that returns a tensor (e.g.

Miscellaneous Utility Functions#





Analyses the feature map config and returns the name of the label feature (e.g.


Analyses the feature map config and returns the name of the label feature (e.g.



A context manager that prints the execution time of the block it manages
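A context manager of this kind can be written with `contextlib` in a few lines. The name `timed`, the `label` argument, and the `sink` parameter (which defaults to `print` and exists only to make the sketch easy to test) are hypothetical and not taken from the library.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label="block", sink=print):
    """Report the wall-clock time spent inside the `with` block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so the timing is always reported.
        elapsed = time.perf_counter() - start
        sink(f"{label} took {elapsed:.4f}s")


messages = []
with timed("sum of squares", sink=messages.append):
    total = sum(i * i for i in range(10_000))
```

With the default `sink`, the message is printed to standard output when the block exits.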


Recursively finds size of objects


Util function to load NVTabular Dataset from disk

Registry Functions#




Default name for a class or function.


merlin.models.utils.registry.Registry(...[, ...])

Dict-like class for managing function registrations.
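The general pattern of a dict-like function registry can be sketched as a `dict` subclass with a decorator-based `register` method. The class name `SimpleRegistry` and its behaviour on duplicate names are assumptions for illustration and do not describe the actual Registry API.

```python
class SimpleRegistry(dict):
    """Minimal dict-like registry that maps names to callables."""

    def register(self, name=None):
        """Return a decorator that stores the function under `name`."""
        def decorator(fn):
            key = name or fn.__name__
            if key in self:
                raise KeyError(f"{key!r} is already registered")
            self[key] = fn
            return fn
        return decorator


aggregations = SimpleRegistry()


@aggregations.register("concat")
def concat(parts):
    """Flatten a list of lists into one list."""
    return [x for part in parts for x in part]


@aggregations.register()
def element_sum(parts):
    """Element-wise sum of equally sized lists."""
    return [sum(values) for values in zip(*parts)]


result = aggregations["concat"]([[1, 2], [3]])
```

Looking functions up by string name is what lets configuration values such as an aggregation name be resolved to an implementation at runtime.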



Creates a help string for names_list grouped by prefix.