# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#     http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

# Each user is responsible for checking the content of datasets and the
# applicable licenses and determining if suitable for the intended use.

HugeCTR to ONNX Converter


To improve compatibility and interoperability with other deep-learning frameworks, we provide a Python module that converts HugeCTR models to ONNX, an open format for AI models. The converter takes the model graph in JSON, the dense model, and the sparse models as inputs and saves the converted ONNX model to the specified path. All required input files can be obtained with the HugeCTR training APIs, so the whole workflow can be completed in Python.

This notebook demonstrates how to access and use the HugeCTR to ONNX converter. It assumes familiarity with the HugeCTR training APIs, which are also covered here for completeness. For more details on using the converter, refer to the HugeCTR to ONNX Converter in the onnx_converter directory of the repository.

Set Up HugeCTR

To set up the environment, refer to HugeCTR Example Notebooks and follow the instructions there before running the following cells.

Access the HugeCTR to ONNX Converter

Make sure that you start the notebook inside a running NGC Docker container, version 22.09 or later, such as nvcr.io/nvidia/merlin/merlin-hugectr:22.09. The ONNX converter module is installed to /usr/local/lib/python3.8/dist-packages. For the HugeCTR Python interface, a dynamic link to the hugectr.so library is installed to /usr/local/hugectr/lib/. You can access both the ONNX converter and the HugeCTR Python interface from anywhere within the container.

Run the following cell to confirm that the HugeCTR Python interface can be accessed correctly.

import hugectr

Run the following cell to confirm that the HugeCTR to ONNX converter can be accessed correctly.

import hugectr2onnx
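
If either import fails, it can help to probe for the modules without importing them. The following is a minimal sketch; the helper name module_available is illustrative and is not part of HugeCTR.

```python
import importlib.util

def module_available(name):
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None

# Report which of the two modules this notebook relies on can be found.
for name in ("hugectr", "hugectr2onnx"):
    status = "found" if module_available(name) else "NOT found"
    print(f"{name}: {status}")
```

A module reported as "NOT found" usually means the notebook was started outside the container described above.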

Wide and Deep Model

To download and prepare the dataset, follow the steps below. At the end of this section, we provide the shell commands you can run in a terminal to get the data ready for this notebook.

Note: If you have already downloaded the data, skip to the preprocessing step (2). If preprocessing is also done, skip to step (3), which creates the soft link from the processed data to the notebooks/ directory.

Data Preparation

  1. Download the Criteo dataset

To preprocess the downloaded Kaggle Criteo dataset, we perform the following operations:

  • Reduce the amount of data to speed up the preprocessing

  • Fill missing values

  • Remove the feature values whose occurrences are very rare, etc.

  2. Preprocessing by Pandas:

    Meanings of the command line arguments:

    • The 1st argument represents the dataset postfix. It is 0 here since day_0 is used.

    • The 2nd argument wdl_data is where the preprocessed data is stored.

    • The 3rd argument pandas selects the processing script; here we choose pandas.

    • The 4th argument 1 means that normalization is applied to dense features.

    • The 5th argument 1 means that feature crossing is applied.

    • The 6th argument 100 is the number of data files in each file list.

    For more details about the data preprocessing, please refer to the “Preprocess the Criteo Dataset” section of the README in the samples/criteo directory of the repository on GitHub.

  3. Create a soft link of the dataset folder to the path of this notebook

Run the following commands on the terminal to prepare the data for this notebook

export project_root=/home/hugectr # set this to the directory where hugectr is downloaded
cd ${project_root}/tools
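# Step 1: download the dataset
wget https://storage.googleapis.com/criteo-cail-datasets/day_0.gz
# Step 2: preprocess with pandas
bash preprocess.sh 0 wdl_data pandas 1 1 100
# Step 3: link the preprocessed data into the notebooks directory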
ln -s ${project_root}/tools/wdl_data ${project_root}/notebooks/wdl_data
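
Before training, it can be useful to confirm that the data files named in a file list actually exist. The sketch below assumes a file list is a text file whose first line holds the number of data files, followed by one path per line; that layout is an assumption here, so check it against your own wdl_data output. The helper name check_file_list is hypothetical.

```python
import os

def check_file_list(list_path):
    """Return the data files named in a file list that do not exist on disk.

    Assumes the first line holds the file count and every following
    non-empty line holds one path (an assumption about the layout).
    """
    with open(list_path) as f:
        lines = [line.strip() for line in f if line.strip()]
    declared = int(lines[0])
    paths = lines[1:]
    if declared != len(paths):
        raise ValueError(
            f"{list_path}: header says {declared} files, found {len(paths)}")
    return [p for p in paths if not os.path.exists(p)]

# Only run the check when the training file list used in this notebook exists.
if os.path.exists("./wdl_data/file_list.0.txt"):
    print("missing:", check_file_list("./wdl_data/file_list.0.txt"))
```

An empty list means every file named in the list was found relative to the current working directory.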

Train the HugeCTR Model

We can train from scratch, dump the model graph to a JSON file, and save the model weights and optimizer states by performing the following steps with the Python APIs:

  1. Create the solver, reader and optimizer, then initialize the model.

  2. Construct the model graph by adding input, sparse embedding and dense layers in order.

  3. Compile the model and have an overview of the model graph.

  4. Dump the model graph to the JSON file.

  5. Fit the model; the model weights and optimizer states are saved implicitly.

Please note that the training mode is determined by repeat_dataset within hugectr.CreateSolver. If it is True, the non-epoch mode training is adopted and the maximum iterations should be specified by max_iter within hugectr.Model.fit. If it is False, the epoch-mode training is adopted and the number of epochs should be specified by num_epochs within hugectr.Model.fit.
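
The rule above can be sketched as a small helper that picks the keyword matching the training mode. The function fit_arguments is hypothetical: it only mirrors the rule and does not call HugeCTR.

```python
def fit_arguments(repeat_dataset, max_iter=None, num_epochs=None):
    """Pick the fit keyword that matches the training mode.

    Hypothetical helper: repeat_dataset=True maps to max_iter
    (non-epoch mode), repeat_dataset=False maps to num_epochs (epoch mode).
    """
    if repeat_dataset:
        if max_iter is None:
            raise ValueError("non-epoch mode needs max_iter")
        return {"max_iter": max_iter}
    if num_epochs is None:
        raise ValueError("epoch mode needs num_epochs")
    return {"num_epochs": num_epochs}

print(fit_arguments(True, max_iter=2300))   # {'max_iter': 2300}
print(fit_arguments(False, num_epochs=1))   # {'num_epochs': 1}
```

The returned dictionary could be merged with the remaining arguments of hugectr.Model.fit, for example display and eval_interval.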

The optimizer that is used to initialize the model applies to the weights of dense layers, while the optimizer for each sparse embedding layer can be specified independently within hugectr.SparseEmbedding.

%%writefile wdl_train.py
import hugectr
from mpi4py import MPI
solver = hugectr.CreateSolver(max_eval_batches = 300,
                              batchsize_eval = 16384,
                              batchsize = 16384,
                              lr = 0.001,
                              vvgpu = [[0]],
                              repeat_dataset = True)
reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Norm,
                                  source = ["./wdl_data/file_list.0.txt"],
                                  eval_source = "./wdl_data/file_list_test.0.txt",
                                  check_type = hugectr.Check_t.Sum)
optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam,
                                    update_type = hugectr.Update_t.Global,
                                    beta1 = 0.9,
                                    beta2 = 0.999,
                                    epsilon = 0.0000001)
model = hugectr.Model(solver, reader, optimizer)
model.add(hugectr.Input(label_dim = 1, label_name = "label",
                        dense_dim = 13, dense_name = "dense",
                        data_reader_sparse_param_array = 
                        [hugectr.DataReaderSparseParam("wide_data", 2, True, 1),
                        hugectr.DataReaderSparseParam("deep_data", 1, True, 26)]))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash, 
                            workspace_size_per_gpu_in_mb = 75,
                            embedding_vec_size = 1,
                            combiner = "sum",
                            sparse_embedding_name = "sparse_embedding2",
                            bottom_name = "wide_data",
                            optimizer = optimizer))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash, 
                            workspace_size_per_gpu_in_mb = 1074,
                            embedding_vec_size = 16,
                            combiner = "sum",
                            sparse_embedding_name = "sparse_embedding1",
                            bottom_name = "deep_data",
                            optimizer = optimizer))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                            bottom_names = ["sparse_embedding1"],
                            top_names = ["reshape1"],
                            leading_dim = 416)) # 26 slots x 16-dim vectors
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                            bottom_names = ["sparse_embedding2"],
                            top_names = ["reshape2"],
                            leading_dim = 1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                            bottom_names = ["reshape1", "dense"],
                            top_names = ["concat1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["concat1"],
                            top_names = ["fc1"],
                            num_output = 1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                            bottom_names = ["fc1"],
                            top_names = ["relu1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                            bottom_names = ["relu1"],
                            top_names = ["dropout1"])) # dropout_rate left at its default
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["dropout1"],
                            top_names = ["fc2"],
                            num_output = 1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                            bottom_names = ["fc2"],
                            top_names = ["relu2"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                            bottom_names = ["relu2"],
                            top_names = ["dropout2"])) # dropout_rate left at its default
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["dropout2"],
                            top_names = ["fc3"],
                            num_output = 1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Add,
                            bottom_names = ["fc3", "reshape2"],
                            top_names = ["add1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
                            bottom_names = ["add1", "label"],
                            top_names = ["loss"]))
# Dump the graph, then compile and print the summary before training
model.graph_to_json(graph_config_file = "wdl.json")
model.compile()
model.summary()
model.fit(max_iter = 2300, display = 200, eval_interval = 1000, snapshot = 2000, snapshot_prefix = "wdl")
Overwriting wdl_train.py
!python3 wdl_train.py
====================================================Model Init=====================================================
[17d09h39m52s][HUGECTR][INFO]: Global seed is 2566812942
[17d09h39m53s][HUGECTR][INFO]: Device to NUMA mapping:
  GPU 0 ->  node 0

[17d09h39m55s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
[17d09h39m55s][HUGECTR][INFO]: Start all2all warmup
[17d09h39m55s][HUGECTR][INFO]: End all2all warmup
[17d09h39m55s][HUGECTR][INFO]: Using All-reduce algorithm OneShot
Device 0: Tesla V100-SXM2-16GB
[17d09h39m55s][HUGECTR][INFO]: num of DataReader workers: 12
[17d09h39m55s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=6553600
[17d09h39m55s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=5865472
[17d09h39m55s][HUGECTR][INFO]: Save the model graph to wdl.json, successful
===================================================Model Compile===================================================
[17d09h40m31s][HUGECTR][INFO]: gpu0 start to init embedding
[17d09h40m31s][HUGECTR][INFO]: gpu0 init embedding done
[17d09h40m31s][HUGECTR][INFO]: gpu0 start to init embedding
[17d09h40m31s][HUGECTR][INFO]: gpu0 init embedding done
[17d09h40m31s][HUGECTR][INFO]: Starting AUC NCCL warm-up
[17d09h40m31s][HUGECTR][INFO]: Warm-up done
===================================================Model Summary===================================================
Label                                   Dense                         Sparse                        
label                                   dense                          wide_data,deep_data           
(None, 1)                               (None, 13)                              
Layer Type                              Input Name                    Output Name                   Output Shape                  
DistributedSlotSparseEmbeddingHash      wide_data                     sparse_embedding2             (None, 1, 1)                  
DistributedSlotSparseEmbeddingHash      deep_data                     sparse_embedding1             (None, 26, 16)                
Reshape                                 sparse_embedding1             reshape1                      (None, 416)                   
Reshape                                 sparse_embedding2             reshape2                      (None, 1)                     
Concat                                  reshape1,dense                concat1                       (None, 429)                   
InnerProduct                            concat1                       fc1                           (None, 1024)                  
ReLU                                    fc1                           relu1                         (None, 1024)                  
Dropout                                 relu1                         dropout1                      (None, 1024)                  
InnerProduct                            dropout1                      fc2                           (None, 1024)                  
ReLU                                    fc2                           relu2                         (None, 1024)                  
Dropout                                 relu2                         dropout2                      (None, 1024)                  
InnerProduct                            dropout2                      fc3                           (None, 1)                     
Add                                     fc3,reshape2                  add1                          (None, 1)                     
BinaryCrossEntropyLoss                  add1,label                    loss                                                        
=====================================================Model Fit=====================================================
[17d90h40m31s][HUGECTR][INFO]: Use non-epoch mode with number of iterations: 2300
[17d90h40m31s][HUGECTR][INFO]: Training batchsize: 16384, evaluation batchsize: 16384
[17d90h40m31s][HUGECTR][INFO]: Evaluation interval: 1000, snapshot interval: 2000
[17d90h40m31s][HUGECTR][INFO]: Sparse embedding trainable: 1, dense network trainable: 1
[17d90h40m31s][HUGECTR][INFO]: Use mixed precision: 0, scaler: 1.000000, use cuda graph: 1
[17d90h40m31s][HUGECTR][INFO]: lr: 0.001000, warmup_steps: 1, decay_start: 0, decay_steps: 1, decay_power: 2.000000, end_lr: 0.000000
[17d90h40m31s][HUGECTR][INFO]: Training source file: ./wdl_data/file_list.txt
[17d90h40m31s][HUGECTR][INFO]: Evaluation source file: ./wdl_data/file_list_test.txt
[17d90h40m35s][HUGECTR][INFO]: Iter: 200 Time(200 iters): 4.140480s Loss: 0.130462 lr:0.001000
[17d90h40m39s][HUGECTR][INFO]: Iter: 400 Time(200 iters): 4.015980s Loss: 0.127865 lr:0.001000
[17d90h40m43s][HUGECTR][INFO]: Iter: 600 Time(200 iters): 4.015593s Loss: 0.125415 lr:0.001000
[17d90h40m47s][HUGECTR][INFO]: Iter: 800 Time(200 iters): 4.014245s Loss: 0.132623 lr:0.001000
[17d90h40m51s][HUGECTR][INFO]: Iter: 1000 Time(200 iters): 4.016144s Loss: 0.128454 lr:0.001000
[17d90h40m53s][HUGECTR][INFO]: Evaluation, AUC: 0.771219
[17d90h40m53s][HUGECTR][INFO]: Eval Time for 300 iters: 1.542828s
[17d90h40m57s][HUGECTR][INFO]: Iter: 1200 Time(200 iters): 5.570898s Loss: 0.130795 lr:0.001000
[17d90h41m10s][HUGECTR][INFO]: Iter: 1400 Time(200 iters): 4.013813s Loss: 0.135299 lr:0.001000
[17d90h41m50s][HUGECTR][INFO]: Iter: 1600 Time(200 iters): 4.016540s Loss: 0.130995 lr:0.001000
[17d90h41m90s][HUGECTR][INFO]: Iter: 1800 Time(200 iters): 4.017910s Loss: 0.132997 lr:0.001000
[17d90h41m13s][HUGECTR][INFO]: Iter: 2000 Time(200 iters): 4.015891s Loss: 0.119502 lr:0.001000
[17d90h41m14s][HUGECTR][INFO]: Evaluation, AUC: 0.778875
[17d90h41m14s][HUGECTR][INFO]: Eval Time for 300 iters: 1.545023s
[17d90h41m14s][HUGECTR][INFO]: Rank0: Write hash table to file
[17d90h41m16s][HUGECTR][INFO]: Rank0: Write hash table to file
[17d90h41m18s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[17d90h41m18s][HUGECTR][INFO]: Rank0: Write optimzer state to file
[17d90h41m18s][HUGECTR][INFO]: Done
[17d90h41m18s][HUGECTR][INFO]: Rank0: Write optimzer state to file
[17d90h41m18s][HUGECTR][INFO]: Done
[17d90h41m20s][HUGECTR][INFO]: Rank0: Write optimzer state to file
[17d90h41m20s][HUGECTR][INFO]: Done
[17d90h41m21s][HUGECTR][INFO]: Rank0: Write optimzer state to file
[17d90h41m21s][HUGECTR][INFO]: Done
[17d90h41m32s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[17d90h41m32s][HUGECTR][INFO]: Dumping dense weights to file, successful
[17d90h41m32s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[17d90h41m32s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[17d90h41m37s][HUGECTR][INFO]: Iter: 2200 Time(200 iters): 24.012700s Loss: 0.128822 lr:0.001000
Finish 2300 iterations with batchsize: 16384 in 67.90s
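
The snapshot taken at iteration 2000 produces the weight files that the conversion step consumes. The sketch below derives their names from the snapshot prefix and iteration. The naming pattern is inferred from the file names used later in this notebook (wdl_dense_2000.model, wdl0_sparse_2000.model, wdl1_sparse_2000.model), and the helper snapshot_files is hypothetical.

```python
import os

def snapshot_files(prefix, iteration, num_sparse):
    """Return (dense_model, sparse_models) names for one snapshot.

    The pattern is inferred from the names used in this notebook and is
    an assumption, not a documented guarantee.
    """
    dense = f"{prefix}_dense_{iteration}.model"
    sparse = [f"{prefix}{i}_sparse_{iteration}.model" for i in range(num_sparse)]
    return dense, sparse

dense_model, sparse_models = snapshot_files("wdl", 2000, num_sparse=2)
for path in [dense_model] + sparse_models:
    # os.path.exists is true for both files and directories
    print(path, "exists" if os.path.exists(path) else "missing")
```

Running this in the notebook directory after training gives a quick check that every input of the converter is in place.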

Convert to ONNX

We can convert the trained HugeCTR model to ONNX with a call to hugectr2onnx.converter.convert. The flag convert_embedding specifies whether to convert the sparse embeddings; if it is set to False, the sparse models do not need to be provided. In this notebook, both the dense and sparse parts of the HugeCTR model are converted to ONNX, so that we can check the correctness of the conversion by comparing the inference results of HugeCTR and ONNX Runtime.

import hugectr2onnx
hugectr2onnx.converter.convert(onnx_model_path = "wdl.onnx",
                            graph_config = "wdl.json",
                            dense_model = "wdl_dense_2000.model",
                            convert_embedding = True,
                            sparse_models = ["wdl0_sparse_2000.model", "wdl1_sparse_2000.model"])
The model is checked!
The model is saved at wdl.onnx

Inference with ONNX Runtime and HugeCTR

To run inference with ONNX Runtime, we need to read samples from the data and feed them to the ONNX inference session. Specifically, we need to extract the dense features, wide sparse features, and deep sparse features from the preprocessed Wide&Deep dataset. To ensure a fair comparison with HugeCTR inference, we use the first data file listed in ./wdl_data/file_list_test.txt, i.e., ./wdl_data/val/sparse_embedding0.data, and run inference on the same number of samples (this number should be less than the total number of samples in ./wdl_data/val/sparse_embedding0.data).
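
The per-sample layout that the reader in this section walks through can be illustrated with a synthetic round trip. This is a sketch under stated assumptions: little-endian byte order, a length prefix that counts the sample body, and a zero check byte are all placeholders chosen for illustration, and pack_sample and unpack_sample are hypothetical helpers rather than HugeCTR functions.

```python
import struct

DENSE_DIM = 13

def pack_sample(label, dense, slots, key_format="I"):
    """Serialize one synthetic sample: length, label, dense, (nnz, keys) per slot, check byte."""
    body = struct.pack("<i", label) + struct.pack(f"<{DENSE_DIM}f", *dense)
    for keys in slots:
        body += struct.pack("<i", len(keys))
        body += struct.pack(f"<{len(keys)}{key_format}", *keys)
    # Placeholder length prefix and check byte (assumed values, not a real checksum)
    return struct.pack("<i", len(body)) + body + b"\x00"

def unpack_sample(buf, slot_num, key_format="I", key_size=4):
    """Read one sample back in the same order as it was written."""
    offset = 4  # skip the length prefix
    (label,) = struct.unpack_from("<i", buf, offset)
    offset += 4
    dense = struct.unpack_from(f"<{DENSE_DIM}f", buf, offset)
    offset += 4 * DENSE_DIM
    keys = []
    for _ in range(slot_num):
        (nnz,) = struct.unpack_from("<i", buf, offset)
        offset += 4
        keys += struct.unpack_from(f"<{nnz}{key_format}", buf, offset)
        offset += key_size * nnz
    offset += 1  # trailing check byte
    return label, list(dense), keys, offset

# One wide slot with 2 keys, followed by 26 deep slots with 1 key each
slots = [[7, 9]] + [[100 + i] for i in range(26)]
sample = pack_sample(1, [0.5] * DENSE_DIM, slots)
label, dense, keys, consumed = unpack_sample(sample, slot_num=27)
wide, deep = keys[0:2], keys[2:28]
print(label, wide, len(deep), consumed == len(sample))
```

The split keys[0:2] and keys[2:28] matches the (num_samples, 1, 2) and (num_samples, 26, 1) shapes that the ONNX model inputs use in this notebook.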

import struct
import numpy as np
def read_samples_for_wdl(data_file, num_samples, key_type="I32", slot_num=27):
    key_type_map = {"I32": ["I", 4], "I64": ["q", 8]}
    with open(data_file, 'rb') as file:
        # skip data_header
        file.seek(4 + 64 + 1, 0)
        batch_label = []
        batch_dense = []
        batch_wide_data = []
        batch_deep_data = []
        for _ in range(num_samples):
            # one sample
            length_buffer = file.read(4) # int
            length = struct.unpack('i', length_buffer)[0]
            label_buffer = file.read(4) # int
            label = struct.unpack('i', label_buffer)[0]
            dense_buffer = file.read(4 * 13) # dense_dim * float
            dense = struct.unpack("13f", dense_buffer)
            keys = []
            for _ in range(slot_num):
                nnz_buffer = file.read(4) # int
                nnz = struct.unpack("i", nnz_buffer)[0]
                key_buffer = file.read(key_type_map[key_type][1] * nnz) # nnz * sizeof(key_type)
                key = struct.unpack(str(nnz) + key_type_map[key_type][0], key_buffer)
                keys += list(key)
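            check_bit_buffer = file.read(1) # char
            check_bit = struct.unpack("c", check_bit_buffer)[0]
            # the first slot holds the 2 wide keys; the other 26 slots hold one deep key each
            batch_label.append(label)
            batch_dense.append(dense)
            batch_wide_data.append(keys[0:2])
            batch_deep_data.append(keys[2:28])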
    batch_label = np.reshape(np.array(batch_label, dtype=np.float32), newshape=(num_samples, 1))
    batch_dense = np.reshape(np.array(batch_dense, dtype=np.float32), newshape=(num_samples, 13))
    batch_wide_data = np.reshape(np.array(batch_wide_data, dtype=np.int64), newshape=(num_samples, 1, 2))
    batch_deep_data = np.reshape(np.array(batch_deep_data, dtype=np.int64), newshape=(num_samples, 26, 1))
    return batch_label, batch_dense, batch_wide_data, batch_deep_data
batch_size = 64
num_batches = 100
data_file = "./wdl_data/val/sparse_embedding0.data" # 40960 samples in total
onnx_model_path = "wdl.onnx"

label, dense, wide_data, deep_data = read_samples_for_wdl(data_file, batch_size*num_batches, key_type="I32", slot_num = 27)
import onnxruntime as ort
sess = ort.InferenceSession(onnx_model_path)
res = sess.run(output_names=[sess.get_outputs()[0].name],
                  input_feed={sess.get_inputs()[0].name: dense, sess.get_inputs()[1].name: wide_data, sess.get_inputs()[2].name: deep_data})
onnx_preds = res[0].reshape((batch_size*num_batches,))
print("ONNX Runtime Predictions:", onnx_preds)
ONNX Runtime Predictions: [0.02525118 0.00920713 0.0080741  ... 0.02893934 0.02577347 0.1296753 ]

We can then run inference with the HugeCTR APIs and compare the prediction results.

dense_model = "wdl_dense_2000.model"
sparse_models = ["wdl0_sparse_2000.model", "wdl1_sparse_2000.model"]
graph_config = "wdl.json"
data_source = "./wdl_data/file_list_test.0.txt"
import hugectr
from mpi4py import MPI
from hugectr.inference import InferenceParams, CreateInferenceSession
inference_params = InferenceParams(model_name = "wdl",
                                max_batchsize = batch_size,
                                hit_rate_threshold = 0.6,
                                dense_model_file = dense_model,
                                sparse_model_files = sparse_models,
                                device_id = 0,
                                use_gpu_embedding_cache = True,
                                cache_size_percentage = 0.6,
                                i64_input_key = False)
inference_session = CreateInferenceSession(graph_config, inference_params)
hugectr_preds = inference_session.predict(num_batches, data_source, hugectr.DataReaderType_t.Norm, hugectr.Check_t.Sum)
print("HugeCTR Predictions: ", hugectr_preds)
[17d09h43m49s][HUGECTR][INFO]: default_emb_vec_value is not specified using default: 0.000000
[17d09h43m49s][HUGECTR][INFO]: default_emb_vec_value is not specified using default: 0.000000
[17d09h43m53s][HUGECTR][INFO]: Global seed is 3782721491
[17d09h43m55s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
[17d09h43m55s][HUGECTR][INFO]: Start all2all warmup
[17d09h43m55s][HUGECTR][INFO]: End all2all warmup
[17d09h43m55s][HUGECTR][INFO]: Use mixed precision: 0
[17d09h43m55s][HUGECTR][INFO]: start create embedding for inference
[17d09h43m55s][HUGECTR][INFO]: sparse_input name wide_data
[17d09h43m55s][HUGECTR][INFO]: sparse_input name deep_data
[17d09h43m55s][HUGECTR][INFO]: create embedding for inference success
[17d09h43m55s][HUGECTR][INFO]: Inference stage skip BinaryCrossEntropyLoss layer, replaced by Sigmoid layer
HugeCTR Predictions:  [0.02525118 0.00920718 0.00807416 ... 0.0289393  0.02577345 0.12967525]
print("Min absolute error: ", np.min(np.abs(onnx_preds-hugectr_preds)))
print("Mean absolute error: ", np.mean(np.abs(onnx_preds-hugectr_preds)))
print("Max absolute error: ", np.max(np.abs(onnx_preds-hugectr_preds)))
Min absolute error:  0.0
Mean absolute error:  2.3289697e-08
Max absolute error:  1.1920929e-07
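
The same comparison can be wrapped in a helper that also applies a pass/fail tolerance. This is a minimal sketch: the function compare_predictions and the tolerance of 1e-6 are assumptions chosen for illustration, and the sample values are the six predictions printed above.

```python
import numpy as np

def compare_predictions(a, b, atol=1e-6):
    """Summarize the absolute error between two prediction arrays."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    err = np.abs(a - b)
    return {
        "min": float(err.min()),
        "mean": float(err.mean()),
        "max": float(err.max()),
        # True only if every element agrees within the chosen tolerance
        "within_tolerance": bool(np.all(err <= atol)),
    }

onnx_sample = [0.02525118, 0.00920713, 0.0080741, 0.02893934, 0.02577347, 0.1296753]
hugectr_sample = [0.02525118, 0.00920718, 0.00807416, 0.0289393, 0.02577345, 0.12967525]
print(compare_predictions(onnx_sample, hugectr_sample))
```

In the notebook, the full arrays onnx_preds and hugectr_preds can be passed in place of the six sample values.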

API Signature for hugectr2onnx.converter


    convert(onnx_model_path, graph_config, dense_model, convert_embedding=False, sparse_models=[], ntp_file=None, graph_name='hugectr')
        Convert a HugeCTR model to an ONNX model
            onnx_model_path: the path to store the ONNX model
            graph_config: the graph configuration JSON file of the HugeCTR model
            dense_model: the file of the dense weights for the HugeCTR model
            convert_embedding: whether to convert the sparse embeddings for the HugeCTR model (optional)
            sparse_models: the files of the sparse embeddings for the HugeCTR model (optional)
            ntp_file: the file of the non-trainable parameters for the HugeCTR model (optional)
            graph_name: the graph name for the ONNX model (optional)
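
As an illustration of how the optional arguments fit together, the sketch below assembles keyword arguments for a dense-only conversion. The helper build_convert_kwargs and the output name wdl_dense.onnx are hypothetical; only the parameter names come from the signature above. The check that sparse models accompany convert_embedding=True reflects the usage described in this notebook, not a verified rule of the converter.

```python
def build_convert_kwargs(onnx_model_path, graph_config, dense_model,
                         convert_embedding=False, sparse_models=None,
                         ntp_file=None, graph_name="hugectr"):
    """Assemble keyword arguments for hugectr2onnx.converter.convert (hypothetical helper)."""
    sparse_models = list(sparse_models or [])
    if convert_embedding and not sparse_models:
        raise ValueError("convert_embedding=True needs at least one sparse model file")
    return {
        "onnx_model_path": onnx_model_path,
        "graph_config": graph_config,
        "dense_model": dense_model,
        "convert_embedding": convert_embedding,
        "sparse_models": sparse_models,
        "ntp_file": ntp_file,
        "graph_name": graph_name,
    }

# Dense-only conversion: no sparse model files are listed
dense_only = build_convert_kwargs("wdl_dense.onnx", "wdl.json", "wdl_dense_2000.model")
print(dense_only["convert_embedding"], dense_only["sparse_models"])  # False []
# Inside the HugeCTR container, the call would then be:
# hugectr2onnx.converter.convert(**dense_only)
```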