HugeCTR to ONNX Converter


To improve compatibility and interoperability with other deep-learning frameworks, we provide a Python module to convert HugeCTR models to ONNX. ONNX serves as an open-source format for AI models. Basically, this converter requires the model graph in JSON, dense model, and sparse models as inputs and saves the converted ONNX model to the specified path. All the required input files can be obtained with HugeCTR training APIs and the whole workflow can be accomplished seamlessly in Python.

This notebook demonstrates how to access and use the HugeCTR to ONNX converter. Please make sure that you are familiar with HugeCTR training APIs which will be covered here to ensure the completeness. For more details of the usage of this converter, refer to the HugeCTR to ONNX Converter in the onnx_converter directory of the repository.

Access the HugeCTR to ONNX Converter

Make sure that you start the notebook inside a running 22.10 or later NGC docker container: The module of the ONNX converter is installed to the path /usr/local/lib/python3.8/dist-packages. As for HugeCTR Python interface, a dynamic link to the library is installed to the path /usr/local/hugectr/lib/. You can access the ONNX converter as well as HugeCTR Python interface anywhere within the container.

Run the following cell to confirm that the HugeCTR Python interface can be accessed correctly.

import hugectr

Run the following cell to confirm that the HugeCTR to ONNX converter can be accessed correctly.

import hugectr2onnx

Wide and Deep Model

Download and Preprocess Data

  1. Download the Criteo dataset using the following command:

    $ cd ${project_root}/tools
    $ wget

    In preprocessing, we will further reduce the amounts of data to speedup the preprocessing, fill missing values, remove the feature values whose occurrences are very rare, etc. Here we choose pandas preprocessing method to make the dataset ready for HugeCTR training.

  2. Preprocessing by Pandas using the following command:

    $ bash 1 wdl_data pandas 1 1 100

    The first argument represents the dataset postfix. It is 1 here since day_1 is used. The second argument wdl_data is where the preprocessed data is stored. The fourth argument (one after pandas) 1 embodies that the normalization is applied to dense features. The fifth argument 1 means that the feature crossing is applied. The last argument 100 means the number of data files in each file list.

  3. Create a soft link to the dataset folder using the following command:

    $ ln -s ${project_root}/tools/wdl_data ${project_root}/notebooks/wdl_data

Train the HugeCTR Model

We can train fom scratch, dump the model graph to a JSON file, and save the model weights and optimizer states by performing the following with Python APIs:

  1. Create the solver, reader and optimizer, then initialize the model.

  2. Construct the model graph by adding input, sparse embedding and dense layers in order.

  3. Compile the model and have an overview of the model graph.

  4. Dump the model graph to the JSON file.

  5. Fit the model, save the model weights and optimizer states implicitly.

Please note that the training mode is determined by repeat_dataset within hugectr.CreateSolver. If it is True, the non-epoch mode training is adopted and the maximum iterations should be specified by max_iter within If it is False, the epoch-mode training is adopted and the number of epochs should be specified by num_epochs within

The optimizer that is used to initialize the model applies to the weights of dense layers, while the optimizer for each sparse embedding layer can be specified independently within hugectr.SparseEmbedding.

import hugectr
from mpi4py import MPI
solver = hugectr.CreateSolver(max_eval_batches = 300,
                              batchsize_eval = 16384,
                              batchsize = 16384,
                              lr = 0.001,
                              vvgpu = [[0]],
                              repeat_dataset = True)
reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Norm,
                                  source = ["./wdl_data/file_list.txt"],
                                  eval_source = "./wdl_data/file_list_test.txt",
                                  check_type = hugectr.Check_t.Sum)
optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam,
                                    update_type = hugectr.Update_t.Global,
                                    beta1 = 0.9,
                                    beta2 = 0.999,
                                    epsilon = 0.0000001)
model = hugectr.Model(solver, reader, optimizer)
model.add(hugectr.Input(label_dim = 1, label_name = "label",
                        dense_dim = 13, dense_name = "dense",
                        data_reader_sparse_param_array = 
                        [hugectr.DataReaderSparseParam("wide_data", 2, True, 1),
                        hugectr.DataReaderSparseParam("deep_data", 1, True, 26)]))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash, 
                            workspace_size_per_gpu_in_mb = 75,
                            embedding_vec_size = 1,
                            combiner = "sum",
                            sparse_embedding_name = "sparse_embedding2",
                            bottom_name = "wide_data",
                            optimizer = optimizer))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash, 
                            workspace_size_per_gpu_in_mb = 1074,
                            embedding_vec_size = 16,
                            combiner = "sum",
                            sparse_embedding_name = "sparse_embedding1",
                            bottom_name = "deep_data",
                            optimizer = optimizer))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                            bottom_names = ["sparse_embedding1"],
                            top_names = ["reshape1"],
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                            bottom_names = ["sparse_embedding2"],
                            top_names = ["reshape2"],
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                            bottom_names = ["reshape1", "dense"],
                            top_names = ["concat1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["concat1"],
                            top_names = ["fc1"],
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                            bottom_names = ["fc1"],
                            top_names = ["relu1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                            bottom_names = ["relu1"],
                            top_names = ["dropout1"],
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["dropout1"],
                            top_names = ["fc2"],
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                            bottom_names = ["fc2"],
                            top_names = ["relu2"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                            bottom_names = ["relu2"],
                            top_names = ["dropout2"],
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["dropout2"],
                            top_names = ["fc3"],
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Add,
                            bottom_names = ["fc3", "reshape2"],
                            top_names = ["add1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
                            bottom_names = ["add1", "label"],
                            top_names = ["loss"]))
model.summary() = 2300, display = 200, eval_interval = 1000, snapshot = 2000, snapshot_prefix = "wdl")
====================================================Model Init=====================================================
[17d09h39m52s][HUGECTR][INFO]: Global seed is 2566812942
[17d09h39m53s][HUGECTR][INFO]: Device to NUMA mapping:
  GPU 0 ->  node 0

[17d09h39m55s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
[17d09h39m55s][HUGECTR][INFO]: Start all2all warmup
[17d09h39m55s][HUGECTR][INFO]: End all2all warmup
[17d09h39m55s][HUGECTR][INFO]: Using All-reduce algorithm OneShot
Device 0: Tesla V100-SXM2-16GB
[17d09h39m55s][HUGECTR][INFO]: num of DataReader workers: 12
[17d09h39m55s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=6553600
[17d09h39m55s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=5865472
[17d09h39m55s][HUGECTR][INFO]: Save the model graph to wdl.json, successful
===================================================Model Compile===================================================
[17d09h40m31s][HUGECTR][INFO]: gpu0 start to init embedding
[17d09h40m31s][HUGECTR][INFO]: gpu0 init embedding done
[17d09h40m31s][HUGECTR][INFO]: gpu0 start to init embedding
[17d09h40m31s][HUGECTR][INFO]: gpu0 init embedding done
[17d09h40m31s][HUGECTR][INFO]: Starting AUC NCCL warm-up
[17d09h40m31s][HUGECTR][INFO]: Warm-up done
===================================================Model Summary===================================================
Label                                   Dense                         Sparse                        
label                                   dense                          wide_data,deep_data           
(None, 1)                               (None, 13)                              
Layer Type                              Input Name                    Output Name                   Output Shape                  
DistributedSlotSparseEmbeddingHash      wide_data                     sparse_embedding2             (None, 1, 1)                  
DistributedSlotSparseEmbeddingHash      deep_data                     sparse_embedding1             (None, 26, 16)                
Reshape                                 sparse_embedding1             reshape1                      (None, 416)                   
Reshape                                 sparse_embedding2             reshape2                      (None, 1)                     
Concat                                  reshape1,dense                concat1                       (None, 429)                   
InnerProduct                            concat1                       fc1                           (None, 1024)                  
ReLU                                    fc1                           relu1                         (None, 1024)                  
Dropout                                 relu1                         dropout1                      (None, 1024)                  
InnerProduct                            dropout1                      fc2                           (None, 1024)                  
ReLU                                    fc2                           relu2                         (None, 1024)                  
Dropout                                 relu2                         dropout2                      (None, 1024)                  
InnerProduct                            dropout2                      fc3                           (None, 1)                     
Add                                     fc3,reshape2                  add1                          (None, 1)                     
BinaryCrossEntropyLoss                  add1,label                    loss                                                        
=====================================================Model Fit=====================================================
[17d90h40m31s][HUGECTR][INFO]: Use non-epoch mode with number of iterations: 2300
[17d90h40m31s][HUGECTR][INFO]: Training batchsize: 16384, evaluation batchsize: 16384
[17d90h40m31s][HUGECTR][INFO]: Evaluation interval: 1000, snapshot interval: 2000
[17d90h40m31s][HUGECTR][INFO]: Sparse embedding trainable: 1, dense network trainable: 1
[17d90h40m31s][HUGECTR][INFO]: Use mixed precision: 0, scaler: 1.000000, use cuda graph: 1
[17d90h40m31s][HUGECTR][INFO]: lr: 0.001000, warmup_steps: 1, decay_start: 0, decay_steps: 1, decay_power: 2.000000, end_lr: 0.000000
[17d90h40m31s][HUGECTR][INFO]: Training source file: ./wdl_data/file_list.txt
[17d90h40m31s][HUGECTR][INFO]: Evaluation source file: ./wdl_data/file_list_test.txt
[17d90h40m35s][HUGECTR][INFO]: Iter: 200 Time(200 iters): 4.140480s Loss: 0.130462 lr:0.001000
[17d90h40m39s][HUGECTR][INFO]: Iter: 400 Time(200 iters): 4.015980s Loss: 0.127865 lr:0.001000
[17d90h40m43s][HUGECTR][INFO]: Iter: 600 Time(200 iters): 4.015593s Loss: 0.125415 lr:0.001000
[17d90h40m47s][HUGECTR][INFO]: Iter: 800 Time(200 iters): 4.014245s Loss: 0.132623 lr:0.001000
[17d90h40m51s][HUGECTR][INFO]: Iter: 1000 Time(200 iters): 4.016144s Loss: 0.128454 lr:0.001000
[17d90h40m53s][HUGECTR][INFO]: Evaluation, AUC: 0.771219
[17d90h40m53s][HUGECTR][INFO]: Eval Time for 300 iters: 1.542828s
[17d90h40m57s][HUGECTR][INFO]: Iter: 1200 Time(200 iters): 5.570898s Loss: 0.130795 lr:0.001000
[17d90h41m10s][HUGECTR][INFO]: Iter: 1400 Time(200 iters): 4.013813s Loss: 0.135299 lr:0.001000
[17d90h41m50s][HUGECTR][INFO]: Iter: 1600 Time(200 iters): 4.016540s Loss: 0.130995 lr:0.001000
[17d90h41m90s][HUGECTR][INFO]: Iter: 1800 Time(200 iters): 4.017910s Loss: 0.132997 lr:0.001000
[17d90h41m13s][HUGECTR][INFO]: Iter: 2000 Time(200 iters): 4.015891s Loss: 0.119502 lr:0.001000
[17d90h41m14s][HUGECTR][INFO]: Evaluation, AUC: 0.778875
[17d90h41m14s][HUGECTR][INFO]: Eval Time for 300 iters: 1.545023s
[17d90h41m14s][HUGECTR][INFO]: Rank0: Write hash table to file
[17d90h41m16s][HUGECTR][INFO]: Rank0: Write hash table to file
[17d90h41m18s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[17d90h41m18s][HUGECTR][INFO]: Rank0: Write optimzer state to file
[17d90h41m18s][HUGECTR][INFO]: Done
[17d90h41m18s][HUGECTR][INFO]: Rank0: Write optimzer state to file
[17d90h41m18s][HUGECTR][INFO]: Done
[17d90h41m20s][HUGECTR][INFO]: Rank0: Write optimzer state to file
[17d90h41m20s][HUGECTR][INFO]: Done
[17d90h41m21s][HUGECTR][INFO]: Rank0: Write optimzer state to file
[17d90h41m21s][HUGECTR][INFO]: Done
[17d90h41m32s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[17d90h41m32s][HUGECTR][INFO]: Dumping dense weights to file, successful
[17d90h41m32s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[17d90h41m32s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[17d90h41m37s][HUGECTR][INFO]: Iter: 2200 Time(200 iters): 24.012700s Loss: 0.128822 lr:0.001000
Finish 2300 iterations with batchsize: 16384 in 67.90s

Convert to ONNX

We can convert the trained HugeCTR model to ONNX with a call to hugectr2onnx.converter.convert. We can specify whether to convert the sparse embeddings via the flag convert_embedding and do not need to provide the sparse models if it is set as False. In this notebook, both dense and sparse parts of the HugeCTR model will be converted to ONNX, in order that we can check the correctness of the conversion more easily by comparing inference results based on HugeCTR and ONNX Runtime.

import hugectr2onnx
hugectr2onnx.converter.convert(onnx_model_path = "wdl.onnx",
                            graph_config = "wdl.json",
                            dense_model = "wdl_dense_2000.model",
                            convert_embedding = True,
                            sparse_models = ["wdl0_sparse_2000.model", "wdl1_sparse_2000.model"])
The model is checked!
The model is saved at wdl.onnx

Inference with ONNX Runtime and HugeCTR

To make inferences with the ONNX runtime, we need to read samples from the data and feed them to the ONNX inference session. Specifically, we need to extract dense features, wide sparse features and deep sparse features from the preprocessed Wide&Deep dataset. To guarantee fair comparison with HugeCTR inference, we will use the first data file within ./wdl_data/file_list_test.txt, i.e., ./wdl_data/val/, and make inference for the same number of samples (should be less than the total number of samples within ./wdl_data/val/

import struct
import numpy as np
def read_samples_for_wdl(data_file, num_samples, key_type="I32", slot_num=27):
    key_type_map = {"I32": ["I", 4], "I64": ["q", 8]}
    with open(data_file, 'rb') as file:
        # skip data_header + 64 + 1, 0)
        batch_label = []
        batch_dense = []
        batch_wide_data = []
        batch_deep_data = []
        for _ in range(num_samples):
            # one sample
            length_buffer = # int
            length = struct.unpack('i', length_buffer)
            label_buffer = # int
            label = struct.unpack('i', label_buffer)[0]
            dense_buffer = * 13) # dense_dim * float
            dense = struct.unpack("13f", dense_buffer)
            keys = []
            for _ in range(slot_num):
                nnz_buffer = # int
                nnz = struct.unpack("i", nnz_buffer)[0]
                key_buffer =[key_type][1] * nnz) # nnz * sizeof(key_type)
                key = struct.unpack(str(nnz) + key_type_map[key_type][0], key_buffer)
                keys += list(key)
            check_bit_buffer = # char
            check_bit = struct.unpack("c", check_bit_buffer)[0]
    batch_label = np.reshape(np.array(batch_label, dtype=np.float32), newshape=(num_samples, 1))
    batch_dense = np.reshape(np.array(batch_dense, dtype=np.float32), newshape=(num_samples, 13))
    batch_wide_data = np.reshape(np.array(batch_wide_data, dtype=np.int64), newshape=(num_samples, 1, 2))
    batch_deep_data = np.reshape(np.array(batch_deep_data, dtype=np.int64), newshape=(num_samples, 26, 1))
    return batch_label, batch_dense, batch_wide_data, batch_deep_data
batch_size = 64
num_batches = 100
data_file = "./wdl_data/val/" # there are totally 40960 samples
onnx_model_path = "wdl.onnx"

label, dense, wide_data, deep_data = read_samples_for_wdl(data_file, batch_size*num_batches, key_type="I32", slot_num = 27)
import onnxruntime as ort
sess = ort.InferenceSession(onnx_model_path)
res =[sess.get_outputs()[0].name],
                  input_feed={sess.get_inputs()[0].name: dense, sess.get_inputs()[1].name: wide_data, sess.get_inputs()[2].name: deep_data})
onnx_preds = res[0].reshape((batch_size*num_batches,))
print("ONNX Runtime Predicions:", onnx_preds)
ONNX Runtime Predicions: [0.02525118 0.00920713 0.0080741  ... 0.02893934 0.02577347 0.1296753 ]

We can then make inference based on HugeCTR APIs and compare the prediction results.

dense_model = "wdl_dense_2000.model"
sparse_models = ["wdl0_sparse_2000.model", "wdl1_sparse_2000.model"]
graph_config = "wdl.json"
data_source = "./wdl_data/file_list_test.txt"
import hugectr
from mpi4py import MPI
from hugectr.inference import InferenceParams, CreateInferenceSession
inference_params = InferenceParams(model_name = "wdl",
                                max_batchsize = batch_size,
                                hit_rate_threshold = 0.6,
                                dense_model_file = dense_model,
                                sparse_model_files = sparse_models,
                                device_id = 0,
                                use_gpu_embedding_cache = True,
                                cache_size_percentage = 0.6,
                                i64_input_key = False)
inference_session = CreateInferenceSession(graph_config, inference_params)
hugectr_preds = inference_session.predict(num_batches, data_source, hugectr.DataReaderType_t.Norm, hugectr.Check_t.Sum)
print("HugeCTR Predictions: ", hugectr_preds)
[17d09h43m49s][HUGECTR][INFO]: default_emb_vec_value is not specified using default: 0.000000
[17d09h43m49s][HUGECTR][INFO]: default_emb_vec_value is not specified using default: 0.000000
[17d09h43m53s][HUGECTR][INFO]: Global seed is 3782721491
[17d09h43m55s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
[17d09h43m55s][HUGECTR][INFO]: Start all2all warmup
[17d09h43m55s][HUGECTR][INFO]: End all2all warmup
[17d09h43m55s][HUGECTR][INFO]: Use mixed precision: 0
[17d09h43m55s][HUGECTR][INFO]: start create embedding for inference
[17d09h43m55s][HUGECTR][INFO]: sparse_input name wide_data
[17d09h43m55s][HUGECTR][INFO]: sparse_input name deep_data
[17d09h43m55s][HUGECTR][INFO]: create embedding for inference success
[17d09h43m55s][HUGECTR][INFO]: Inference stage skip BinaryCrossEntropyLoss layer, replaced by Sigmoid layer
HugeCTR Predictions:  [0.02525118 0.00920718 0.00807416 ... 0.0289393  0.02577345 0.12967525]
print("Min absolute error: ", np.min(np.abs(onnx_preds-hugectr_preds)))
print("Mean absolute error: ", np.mean(np.abs(onnx_preds-hugectr_preds)))
print("Max absolute error: ", np.max(np.abs(onnx_preds-hugectr_preds)))
Min absolute error:  0.0
Mean absolute error:  2.3289697e-08
Max absolute error:  1.1920929e-07

API Signature for hugectr2onnx.converter


    convert(onnx_model_path, graph_config, dense_model, convert_embedding=False, sparse_models=[], ntp_file=None, graph_name='hugectr')
        Convert a HugeCTR model to an ONNX model
            onnx_model_path: the path to store the ONNX model
            graph_config: the graph configuration JSON file of the HugeCTR model
            dense_model: the file of the dense weights for the HugeCTR model
            convert_embedding: whether to convert the sparse embeddings for the HugeCTR model (optional)
            sparse_models: the files of the sparse embeddings for the HugeCTR model (optional)
            ntp_file: the file of the non-trainable parameters for the HugeCTR model (optional)
            graph_name: the graph name for the ONNX model (optional)