# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ================================
Taking the Next Step with Merlin Models: Define Your Own Architecture
In Iterating over Deep Learning Models using Merlin Models, we conducted a benchmark of standard and deep learning-based ranking models provided by the high-level Merlin Models API. The library also includes the standard components of deep learning that let RecSys practitioners and researchers define custom models, train them, and export them for inference.
In this example, we combine pre-existing blocks and demonstrate how to create the DLRM architecture.
Learning objectives
Understand the building blocks of Merlin Models
Define a model architecture from scratch
Introduction to Merlin Models core building blocks
The Block is the core abstraction in Merlin Models and is the class from which all blocks inherit.
The class extends the tf.keras.layers.Layer base class and implements a number of properties that simplify the creation of custom blocks and models. These properties include the Schema object for determining the embedding dimensions, input shapes, and output shapes. Additionally, the Block has a BlockContext instance to store and retrieve public variables and share them with other blocks in the same model as additional metadata.
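To make the "shared context" idea concrete, here is a framework-free sketch under stated assumptions: the classes `BlockContext`, `Block`, `MaskProducer`, and `MaskConsumer` below are hypothetical stand-ins, not the Merlin Models classes. One block publishes a variable (a padding mask, chosen only for illustration) and another block in the same model reads it instead of recomputing it.

```python
# Hypothetical sketch of "blocks sharing variables through a context".
# These classes are illustrations only, NOT the Merlin Models API.


class BlockContext:
    """Holds named variables that several blocks in one model can share."""

    def __init__(self):
        self._variables = {}

    def set(self, name, value):
        self._variables[name] = value

    def get(self, name):
        return self._variables[name]


class Block:
    """Minimal stand-in for a layer that carries a schema and a shared context."""

    def __init__(self, schema, context):
        self.schema = schema  # e.g. {"userId": {"cardinality": 300}}
        self.context = context

    def __call__(self, inputs):
        raise NotImplementedError


class MaskProducer(Block):
    def __call__(self, inputs):
        # Publish a padding mask (id 0 treated as padding) for other blocks.
        mask = [value != 0 for value in inputs["userId"]]
        self.context.set("padding_mask", mask)
        return inputs


class MaskConsumer(Block):
    def __call__(self, inputs):
        # Reuse the mask published by another block instead of recomputing it.
        mask = self.context.get("padding_mask")
        return [v for v, keep in zip(inputs["userId"], mask) if keep]


context = BlockContext()
schema = {"userId": {"cardinality": 300}}
producer = MaskProducer(schema, context)
consumer = MaskConsumer(schema, context)
result = consumer(producer({"userId": [178, 0, 183]}))
```

Because both blocks hold the same `context` object, the consumer sees whatever the producer stored during the same forward pass.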
Before deep-diving into the definition of the DLRM architecture, let’s start by listing the core components you need to know to define a model from scratch:
Features Blocks
They include input blocks to process various inputs based on their types and shapes. Merlin Models supports three main blocks:
EmbeddingFeatures: Input block for embedding lookups for categorical features.
SequenceEmbeddingFeatures: Input block for embedding lookups for sequential categorical features (3D tensors).
ContinuousFeatures: Input block for continuous features.
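As a rough mental model of what the two non-sequential input blocks produce, the plain-Python sketch below looks up one embedding row per categorical id and passes continuous columns through unchanged. The table sizes, the embedding width of 8, and the helper names are illustrative assumptions, not Merlin Models code.

```python
import random


def make_embedding_table(cardinality, dim, seed=0):
    # One randomly initialized row of width `dim` per category id.
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(cardinality)]


def embedding_features(batch, tables):
    # Each categorical column of shape (batch, 1) becomes (batch, dim).
    return {name: [tables[name][row[0]] for row in batch[name]] for name in tables}


def continuous_features(batch, names):
    # Continuous columns are returned as-is, shape (batch, 1).
    return {name: batch[name] for name in names}


batch = {
    "userId": [[178], [81], [183]],
    "gender": [[2], [1], [1]],
    "TE_zipcode_rating": [[-0.25], [1.85], [0.37]],
}
tables = {"userId": make_embedding_table(300, 8), "gender": make_embedding_table(3, 8)}
emb = embedding_features(batch, tables)
cont = continuous_features(batch, ["TE_zipcode_rating"])
```

Rows with the same id (here `gender == 1`) receive the same vector, which is what makes the embedding a learned per-category representation.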
Transformations Blocks
They include various operators commonly used to transform tensors in various parts of the model, such as:
AsDenseFeatures: Takes a dictionary of raw input tensors and transforms the sparse tensors into dense tensors.
L2Norm: Takes a single tensor or a dictionary of hidden tensors and applies L2 normalization along a given axis.
LogitsTemperatureScaler: Scales the output tensor of predicted logits to lower the model’s confidence.
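The sketch below illustrates, in plain Python, what two of these transformations compute: padding variable-length (ragged) rows such as the multi-hot `genres` column into a dense rectangle, and L2-normalizing each row along the last axis. The helper names and the pad value of 0 are assumptions for illustration; the real blocks operate on TensorFlow tensors.

```python
import math


def as_dense(ragged, pad_value=0):
    # Pad variable-length rows (e.g. multi-hot genres) to the longest row.
    width = max(len(row) for row in ragged)
    return [row + [pad_value] * (width - len(row)) for row in ragged]


def l2_norm(vectors, eps=1e-12):
    # Scale each row to unit Euclidean length (normalization along axis -1).
    out = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        out.append([x / max(norm, eps) for x in v])
    return out


genres = [[3, 7, 14], [1], [1, 9]]  # genres of the first validation rows
dense_genres = as_dense(genres)
unit = l2_norm([[3.0, 4.0], [0.0, 2.0]])
```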
Aggregations Blocks
They include common aggregation operations to combine multiple tensors, such as:
ConcatFeatures: Concatenates a dictionary of tensors along a given dimension.
StackFeatures: Stacks a dictionary of tensors along a given dimension.
CosineSimilarity: Calculates the cosine similarity between two tensors.
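The difference between concatenating and stacking is easiest to see on tiny inputs. This plain-Python sketch assumes keys are processed in sorted order (a choice made here for determinism, not necessarily the library's ordering) and works on lists of rows rather than tensors.

```python
import math


def concat_features(tensors):
    # (batch, d1) and (batch, d2) -> (batch, d1 + d2), joined along the last axis.
    names = sorted(tensors)
    batch_size = len(tensors[names[0]])
    return [[x for n in names for x in tensors[n][i]] for i in range(batch_size)]


def stack_features(tensors):
    # k tensors of shape (batch, d) -> (batch, k, d), joined along a new axis 1.
    names = sorted(tensors)
    batch_size = len(tensors[names[0]])
    return [[tensors[n][i] for n in names] for i in range(batch_size)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


feats = {"a": [[1.0, 0.0], [0.0, 1.0]], "b": [[2.0, 2.0], [3.0, 0.0]]}
concat = concat_features(feats)
stacked = stack_features(feats)
cos_first = cosine_similarity(feats["a"][0], feats["b"][0])
```

Concatenation requires only a common batch size, while stacking also requires every tensor to share the same width `d`.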
Connect Methods
The base class Block implements different connect methods that control how a given block is linked to other blocks:
connect: Connects the block to other blocks sequentially. The output is the tensor returned by the last block.
connect_branch: Links the block to other blocks in parallel. The output is a dictionary containing the output tensor of each block.
connect_with_shortcut: Connects the block to other blocks sequentially and applies a skip connection with the block’s output.
connect_with_residual: Connects the block to other blocks sequentially and applies a residual sum with the block’s output.
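To clarify how the four composition styles differ, here is a sketch that models blocks as plain functions on lists. The function names mirror the methods above but are hypothetical helpers, not the Merlin Models implementation; in particular, the shortcut is modeled as list concatenation and the residual as an element-wise sum.

```python
def connect(*blocks):
    # Sequential: feed each block's output into the next block.
    def run(x):
        for block in blocks:
            x = block(x)
        return x
    return run


def connect_branch(**branches):
    # Parallel: apply every branch to the same input, return a dict of outputs.
    def run(x):
        return {name: branch(x) for name, branch in branches.items()}
    return run


def connect_with_shortcut(block, *next_blocks):
    # Skip connection: concatenate the block's output with the downstream output.
    def run(x):
        h = block(x)
        return h + connect(*next_blocks)(h)
    return run


def connect_with_residual(block, *next_blocks):
    # Residual: element-wise sum of the block's output and the downstream output.
    def run(x):
        h = block(x)
        out = connect(*next_blocks)(h)
        return [a + b for a, b in zip(h, out)]
    return run


def double(v):
    return [2 * x for x in v]


def add_one(v):
    return [x + 1 for x in v]


seq = connect(double, add_one)([1, 2])
branch = connect_branch(d=double, a=add_one)([1, 2])
shortcut = connect_with_shortcut(double, add_one)([1, 2])
residual = connect_with_residual(double, add_one)([1, 2])
```

Note that a residual sum requires the downstream output to have the same shape as the block's output, whereas a concatenating shortcut does not.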
Prediction Tasks
Merlin Models introduces the PredictionTask layer that defines the necessary blocks and transformation operations to compute the final prediction scores. It also provides the default loss and metrics related to the given prediction task.
Merlin Models supports the core tasks: BinaryClassificationTask, MultiClassClassificationTask, and RegressionTask. In addition to the preceding tasks, Merlin Models provides tasks that are specific to recommender systems: NextItemPredictionTask and ItemRetrievalTask.
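Conceptually, a binary classification task maps each hidden vector to one logit, converts it to a probability, and scores it with a loss. The sketch below assumes a linear head, a sigmoid, and binary cross-entropy (the conventional choice for binary targets); the weights and helper names are illustrative and are not taken from the library.

```python
import math


def binary_classification_task(hidden, weights, bias):
    # Linear head: one logit per example, then a sigmoid to get a probability.
    logits = [sum(h * w for h, w in zip(row, weights)) + bias for row in hidden]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    # Mean log loss over the batch; probabilities are clipped for stability.
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


probs = binary_classification_task([[0.0, 0.0], [1.0, 2.0]], weights=[0.5, -0.25], bias=0.0)
loss = binary_cross_entropy([1, 0], probs)
```

Both logits are 0 here, so both probabilities are 0.5 and the loss equals ln 2, the loss of an uninformed classifier.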
Implement the DLRM model with MovieLens-1M data
Now that we have introduced the core blocks of Merlin Models, let’s take a look at how we can combine them to define the DLRM architecture:
import tensorflow as tf
import merlin.models.tf as mm
from merlin.datasets.entertainment import get_movielens
from merlin.schema.tags import Tags
We use the get_movielens function to download, extract, and preprocess the MovieLens 1M dataset:
train, valid = get_movielens(variant="ml-1m")
We display the first five rows of the validation data and use them to check the outputs of each building block:
valid.head()
| | userId | movieId | title | genres | gender | age | occupation | zipcode | TE_age_rating | TE_gender_rating | TE_occupation_rating | TE_zipcode_rating | TE_movieId_rating | TE_userId_rating | rating_binary | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 178 | 60 | 60 | [3, 7, 14] | 2 | 1 | 8 | 178 | -0.520464 | 1.792874 | -0.076353 | -0.251986 | -0.320740 | -0.461858 | 1 | 4.0 |
| 1 | 81 | 1408 | 1409 | [1] | 1 | 5 | 11 | 240 | 1.955704 | -0.537666 | 1.541820 | 1.849453 | 0.161224 | 1.619103 | 1 | 4.0 |
| 2 | 183 | 349 | 352 | [1, 9] | 1 | 1 | 4 | 58 | -0.561167 | -0.602045 | -0.140828 | 0.369887 | -0.701068 | -0.095035 | 0 | 3.0 |
| 3 | 153 | 1310 | 1311 | [2, 6] | 1 | 1 | 10 | 338 | -0.535551 | -0.506479 | 0.173980 | 0.671975 | -0.082473 | 0.599116 | 1 | 4.0 |
| 4 | 297 | 1491 | 1496 | [5, 4] | 2 | 1 | 11 | 408 | -0.523482 | 1.630173 | 1.541820 | -0.721210 | -3.000164 | -0.781899 | 0 | 1.0 |
We convert the first five rows of the valid dataset to a batch of input tensors:
batch = mm.sample_batch(valid, batch_size=5, shuffle=False, include_targets=False)
batch["userId"]
<tf.Tensor: shape=(5, 1), dtype=int32, numpy=
array([[178],
[ 81],
[183],
[153],
[297]], dtype=int32)>
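The batch is column-oriented: each feature becomes its own tensor of shape (batch_size, 1). The plain-Python sketch below reproduces that layout from row records taken from the table above; `to_column_batch` is a hypothetical helper for illustration and says nothing about how `mm.sample_batch` is implemented.

```python
def to_column_batch(rows, columns):
    # Turn row records into {column: [[value], ...]}: one (batch, 1) column per feature.
    return {col: [[row[col]] for row in rows] for col in columns}


rows = [
    {"userId": 178, "movieId": 60},
    {"userId": 81, "movieId": 1408},
    {"userId": 183, "movieId": 349},
    {"userId": 153, "movieId": 1310},
    {"userId": 297, "movieId": 1491},
]
batch = to_column_batch(rows, ["userId", "movieId"])
```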
Define the inputs block
For the sake of simplicity, let’s create a schema with a subset of the continuous and categorical features, plus the rating_binary target that we use later for training:
sub_schema = train.schema.select_by_name(
[
"userId",
"movieId",
"title",
"gender",
"TE_zipcode_rating",
"TE_movieId_rating",
"rating_binary",
]
)
We define the continuous layer based on the schema:
continuous_block = mm.ContinuousFeatures.from_schema(sub_schema, tags=Tags.CONTINUOUS)
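Selecting columns by name and then by tag is what lets `from_schema` decide which columns each input block consumes. The sketch below models a schema as a hypothetical dict of column name to tag set; it is not the `merlin.schema` API, and tag strings such as `"binary_classification"` are assumptions.

```python
# Hypothetical in-memory schema: column name -> set of tags.
SCHEMA = {
    "userId": {"categorical"},
    "movieId": {"categorical"},
    "title": {"categorical"},
    "gender": {"categorical"},
    "TE_age_rating": {"continuous"},
    "TE_zipcode_rating": {"continuous"},
    "TE_movieId_rating": {"continuous"},
    "rating_binary": {"target", "binary_classification"},
}


def select_by_name(schema, names):
    # Keep only the named columns, in the order requested.
    return {name: schema[name] for name in names}


def select_by_tag(schema, tag):
    # Names of columns carrying the given tag.
    return [name for name, tags in schema.items() if tag in tags]


sub = select_by_name(
    SCHEMA, ["userId", "gender", "TE_zipcode_rating", "TE_movieId_rating", "rating_binary"]
)
continuous = select_by_tag(sub, "continuous")
```

Only the two continuous columns that survive the name selection are picked up, which matches the two tensors returned by the continuous block.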
We display the output tensor of the continuous block by using the data from the first batch. We can see the raw tensors of the continuous features:
continuous_block(batch)
{'TE_zipcode_rating': <tf.Tensor: shape=(5, 1), dtype=float32, numpy=
array([[-0.25198567],
[ 1.8494534 ],
[ 0.36988667],
[ 0.67197526],
[-0.7212096 ]], dtype=float32)>,
'TE_movieId_rating': <tf.Tensor: shape=(5, 1), dtype=float32, numpy=
array([[-0.3207402 ],
[ 0.16122401],
[-0.70106816],
[-0.08247337],
[-3.0001638 ]], dtype=float32)>}
We connect the continuous block to an MLPBlock instance to project the continuous features into the same dimensionality as the embedding width of the categorical features:
deep_continuous_block = continuous_block.connect(mm.MLPBlock([64]))
deep_continuous_block(batch).shape
TensorShape([5, 64])
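The projection is a dense layer that maps 2 continuous inputs to 64 outputs per example. The sketch below implements one such layer in plain Python; the ReLU activation, the Gaussian weight initialization, and the zero biases are assumptions made for illustration, and the input values are rounded from the continuous block output above.

```python
import random


def dense_relu(x, out_dim, seed=0):
    # One fully connected layer with ReLU: (batch, in_dim) -> (batch, out_dim).
    rng = random.Random(seed)
    in_dim = len(x[0])
    w = [[rng.gauss(0.0, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]
    b = [0.0] * out_dim
    return [
        [max(0.0, sum(row[i] * w[i][j] for i in range(in_dim)) + b[j]) for j in range(out_dim)]
        for row in x
    ]


# Two continuous features per example (TE_zipcode_rating, TE_movieId_rating).
x = [[-0.2520, -0.3207], [1.8495, 0.1612], [0.3699, -0.7011]]
h = dense_relu(x, out_dim=64)
```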
We define the categorical embedding block based on the schema:
embedding_block = mm.EmbeddingFeatures.from_schema(sub_schema)
We display the output of the categorical embedding block using the data from the first batch. We can see the embedding tensors of the categorical features, with a default dimension of 64:
embeddings = embedding_block(batch)
embeddings.keys(), embeddings["userId"].shape
(dict_keys(['userId', 'movieId', 'title', 'gender']), TensorShape([5, 64]))
Let’s store the continuous and categorical representations in a single dictionary using a ParallelBlock instance:
dlrm_input_block = mm.ParallelBlock(
{"embeddings": embedding_block, "deep_continuous": deep_continuous_block}
)
print("Output shapes of DLRM input block:")
for key, val in dlrm_input_block(batch).items():
    print("\t%s : %s" % (key, val.shape))
Output shapes of DLRM input block:
userId : (5, 64)
movieId : (5, 64)
title : (5, 64)
gender : (5, 64)
deep_continuous : (5, 64)
The output shows that the ParallelBlock class applies the embedding and continuous blocks in parallel to the same input batch and merges the resulting tensors into one dictionary.
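The printed keys (userId, movieId, title, gender, deep_continuous) show that the dictionary returned by the embedding branch is flattened into the result, while the single tensor from the continuous branch is stored under its branch name. The sketch below reproduces that merge rule in plain Python; it is an assumption inferred from the output above, not the library's implementation, and it ignores key collisions.

```python
def parallel_block(branches, inputs):
    # Apply every branch to the same inputs and merge the results into one flat dict.
    merged = {}
    for name, branch in branches.items():
        out = branch(inputs)
        if isinstance(out, dict):
            merged.update(out)   # dict outputs contribute their own keys
        else:
            merged[name] = out   # single tensors are stored under the branch name
    return merged


def embeddings(batch):
    n = len(batch["userId"])
    return {"userId": [[0.1, 0.2]] * n, "gender": [[0.3, 0.4]] * n}


def deep_continuous(batch):
    return [[1.0, 0.0] for _ in batch["userId"]]


out = parallel_block(
    {"embeddings": embeddings, "deep_continuous": deep_continuous},
    {"userId": [[178], [81]], "gender": [[2], [1]]},
)
```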
Define the interaction block
Now that we have a vector representation of each input feature, we will create the DLRM interaction block. It consists of three operations:
Apply a dot product between all continuous and categorical features to learn pairwise interactions.
Concatenate the resulting pairwise interactions with the deep representation of continuous features (skip connection).
Apply an MLPBlock with a series of dense layers to the concatenated tensor.
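The first two operations can be sketched in plain Python following the formulation from the DLRM paper: one dot product per unordered pair of distinct feature vectors, concatenated with the deep continuous vector. This is an illustrative assumption; the library's DotProductInteractionBlock may choose pairs, axes, or self-interactions differently, so the width it reports need not match the formula in `interaction_width`.

```python
import itertools


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dlrm_interaction(features, shortcut_key):
    # Per example: dot products for every pair (i < j) of feature vectors,
    # followed by the shortcut (deep continuous) vector.
    names = list(features)
    batch_size = len(features[names[0]])
    out = []
    for i in range(batch_size):
        vectors = [features[n][i] for n in names]
        pairwise = [dot(a, b) for a, b in itertools.combinations(vectors, 2)]
        out.append(pairwise + features[shortcut_key][i])
    return out


def interaction_width(num_features, dim):
    # k * (k - 1) / 2 pairwise scores plus the dim-wide shortcut vector.
    return num_features * (num_features - 1) // 2 + dim


features = {
    "userId": [[1.0, 0.0]],
    "movieId": [[0.0, 1.0]],
    "deep_continuous": [[2.0, 3.0]],
}
# Pairs: userId.movieId = 0, userId.deep = 2, movieId.deep = 3; then the shortcut [2, 3].
z = dlrm_interaction(features, "deep_continuous")
```

With the five 64-wide vectors of this example, this formulation yields 10 pairwise scores plus 64 shortcut values, i.e. `interaction_width(5, 64) == 74`.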
First, we use the connect_with_shortcut method to create the first two operations of the DLRM interaction block:
from merlin.models.tf.blocks.dlrm import DotProductInteractionBlock
dlrm_interaction = dlrm_input_block.connect_with_shortcut(
DotProductInteractionBlock(), shortcut_filter=mm.Filter("deep_continuous"), aggregation="concat"
)
The Filter operation allows us to select the deep_continuous tensor from the dlrm_input_block outputs.
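In effect, the filter keeps a subset of the dictionary keys so that only `deep_continuous` feeds the shortcut branch. A minimal plain-Python sketch, where `filter_features` is a hypothetical helper rather than the `mm.Filter` implementation:

```python
def filter_features(tensors, names):
    # Keep only the requested keys; a single name is treated as a one-item list.
    wanted = [names] if isinstance(names, str) else list(names)
    return {name: tensors[name] for name in wanted}


outputs = {"userId": [[0.1]], "gender": [[0.2]], "deep_continuous": [[0.3]]}
shortcut_input = filter_features(outputs, "deep_continuous")
```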
The following diagram provides a visualization of the operations that we constructed in the dlrm_interaction object.
dlrm_interaction(batch)
<tf.Tensor: shape=(5, 2080), dtype=float32, numpy=
array([[ 0.03531839, 0. , 0.02178912, ..., 0.00348584,
0.01123738, 0.05082896],
[ 0. , 0.06999855, 0.38183114, ..., 0.02661334,
0.00329179, -0.0324194 ],
[ 0.03445464, 0. , 0.25753298, ..., -0.0443273 ,
0.08484615, -0.04135836],
[ 0. , 0. , 0.17358088, ..., -0.0163713 ,
0.02033711, -0.03035038],
[ 0.25441766, 0. , 0.5767709 , ..., 0.01078878,
-0.02322949, 0.04039076]], dtype=float32)>
Then, we project the learned interaction using a series of dense layers:
deep_dlrm_interaction = dlrm_interaction.connect(mm.MLPBlock([64, 128, 512]))
deep_dlrm_interaction(batch)
<tf.Tensor: shape=(5, 512), dtype=float32, numpy=
array([[0.00196931, 0. , 0.01411253, ..., 0.00167978, 0.00330653,
0. ],
[0.00134648, 0.02019053, 0.04212135, ..., 0.00931738, 0. ,
0. ],
[0.00061063, 0.00702857, 0. , ..., 0. , 0. ,
0. ],
[0.00954365, 0. , 0.00239623, ..., 0. , 0. ,
0. ],
[0.01073698, 0.04097259, 0. , ..., 0.00655706, 0.01244057,
0. ]], dtype=float32)>
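A quick way to gauge the size of this projection is to count its trainable parameters. The sketch below assumes plain dense layers with biases and no extra normalization or dropout parameters, and uses the 2080-wide interaction output shown earlier as the input width.

```python
def mlp_param_count(input_dim, units):
    # Each dense layer has in_dim * out_dim weights plus out_dim biases.
    total, in_dim = 0, input_dim
    for out_dim in units:
        total += in_dim * out_dim + out_dim
        in_dim = out_dim
    return total


# 2080*64 + 64 = 133184; 64*128 + 128 = 8320; 128*512 + 512 = 66048
n_params = mlp_param_count(2080, [64, 128, 512])
```

Most of the parameters sit in the first layer, because it consumes the wide interaction vector.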
Define the Prediction block
At this stage, we have created the DLRM block that accepts a dictionary of categorical and continuous tensors as input. The output of this block is the interaction representation vector of shape 512. The next step is to use this hidden representation to conduct a given prediction task. In our case, we use the rating_binary label, and the objective is to predict whether a user A will give a high rating to a movie B.
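The five rows shown earlier suggest how the label relates to the raw rating: ratings of 4.0 map to 1, and ratings of 3.0 and 1.0 map to 0. The sketch below assumes a threshold of 4 stars; this threshold is inferred from those rows only and is not confirmed by the get_movielens preprocessing code.

```python
def binarize_rating(rating, threshold=4.0):
    # Assumed rule: ratings at or above the threshold count as "high" (label 1).
    return 1 if rating >= threshold else 0


ratings = [4.0, 4.0, 3.0, 4.0, 1.0]  # `rating` column of valid.head()
labels = [binarize_rating(r) for r in ratings]
```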
We use the BinaryClassificationTask class and evaluate performance using the AUC metric. We also use the LogitsTemperatureScaler block as a pre-transformation operation that scales the logits returned by the task before computing the loss and metrics:
from merlin.models.tf.blocks.core.transformations import LogitsTemperatureScaler
binary_task = mm.BinaryClassificationTask(
sub_schema,
metrics=[tf.keras.metrics.AUC],
pre=LogitsTemperatureScaler(temperature=2),
)
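To see why temperature scaling lowers the model's confidence, the sketch below divides logits by a temperature T (here 2, as in the task definition) before the sigmoid, which pulls every probability toward 0.5. It assumes the scaler divides logits by the temperature, which is the standard definition of temperature scaling; the logit values are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def temperature_scale(logits, temperature):
    # Divide logits by T; T > 1 softens the resulting probabilities.
    return [z / temperature for z in logits]


logits = [4.0, -2.0, 0.0]
p_raw = [sigmoid(z) for z in logits]                              # about 0.982, 0.119, 0.5
p_scaled = [sigmoid(z) for z in temperature_scale(logits, 2.0)]   # about 0.881, 0.269, 0.5
```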
Define, train, and evaluate the final DLRM Model
We connect the deep DLRM interaction to the binary task, and the connect method automatically builds a Model for us. Note that the Model class inherits from the tf.keras.Model class:
model = deep_dlrm_interaction.connect(binary_task)
type(model)
merlin.models.tf.models.base.Model
We train the model using the built-in tf.keras fit method:
model.compile(optimizer="adam")
model.fit(train, batch_size=1024, epochs=1)
WARNING:tensorflow:Gradients do not exist for variables ['embedding_features/userId:0', 'embedding_features/movieId:0', 'embedding_features/title:0', 'embedding_features/gender:0', 'parallel_block/userId:0', 'parallel_block/movieId:0', 'parallel_block/title:0', 'parallel_block/gender:0', 'sequential_block_7/userId:0', 'sequential_block_7/movieId:0', 'sequential_block_7/title:0', 'sequential_block_7/gender:0', 'sequential_block_9/userId:0', 'sequential_block_9/movieId:0', 'sequential_block_9/title:0', 'sequential_block_9/gender:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?
782/782 [==============================] - 13s 12ms/step - rating_binary/binary_classification_task/auc: 0.7175 - loss: 0.6489 - regularization_loss: 0.0000e+00 - total_loss: 0.6489
<keras.callbacks.History at 0x7f1551a2cf10>
Let’s check out the model evaluation scores:
metrics = model.evaluate(valid, batch_size=1024, return_dict=True)
metrics
196/196 [==============================] - 3s 8ms/step - rating_binary/binary_classification_task/auc: 0.7464 - loss: 2.2079 - regularization_loss: 0.0000e+00 - total_loss: 2.2079
{'rating_binary/binary_classification_task/auc': 0.746394157409668,
'loss': 2.0516271591186523,
'regularization_loss': 0.0,
'total_loss': 2.0516271591186523}
Note that the evaluate() progress bar shows the loss score for every batch, whereas the final loss stored in the dictionary represents the total loss across all batches.
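AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes that pairwise definition exactly, counting ties as half. The labels are the rating_binary values of the five validation rows, and the scores are invented for illustration. Keras' tf.keras.metrics.AUC approximates the same quantity with a fixed set of thresholds (200 by default), so its values can differ slightly.

```python
def roc_auc(y_true, scores):
    # Fraction of (positive, negative) pairs ranked correctly; ties count as half.
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 5 of the 6 positive/negative pairs are ordered correctly (0.3 < 0.35 is the miss).
auc = roc_auc([1, 1, 0, 1, 0], [0.9, 0.3, 0.35, 0.8, 0.1])
```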
Save the model so we can use it for serving predictions in production or for resuming training with new observations:
model.save("custom_dlrm")
WARNING:absl:Function `_wrapped_model` contains input name(s) TE_age_rating, TE_gender_rating, TE_movieId_rating, TE_occupation_rating, TE_userId_rating, TE_zipcode_rating, movieId, userId with unsupported characters which will be renamed to te_age_rating, te_gender_rating, te_movieid_rating, te_occupation_rating, te_userid_rating, te_zipcode_rating, movieid, userid in the SavedModel.
WARNING:absl:Found untraced functions such as sequential_block_9_layer_call_fn, sequential_block_9_layer_call_and_return_conditional_losses, binary_classification_task_layer_call_fn, binary_classification_task_layer_call_and_return_conditional_losses, sequential_block_9_layer_call_fn while saving (showing 5 of 155). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: custom_dlrm/assets
Conclusion
Merlin Models provides common and state-of-the-art RecSys architectures in a high-level API, as well as all the required low-level building blocks for you to create your own architecture (input blocks, MLP layers, prediction tasks, loss functions, etc.). In this example, we explored a subset of these pre-existing blocks to create the DLRM model, but you can view our documentation to discover more. You can also contribute to the library by submitting new RecSys architectures and custom building blocks.
Next steps
To learn more about how to deploy the trained DLRM model, visit the Merlin Systems library and run the Serving-Ranking-Models-With-Merlin-Systems.ipynb notebook, which deploys an ensemble of an NVTabular Workflow and a trained model from Merlin Models to Triton Inference Server.