# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# Each user is responsible for checking the content of datasets and the
# applicable licenses and determining if suitable for the intended use.
HugeCTR to ONNX Converter
Overview
To improve compatibility and interoperability with other deep-learning frameworks, we provide a Python module to convert HugeCTR models to ONNX. ONNX is an open-source format for AI models. The converter takes the model graph in JSON, the dense model file, and the sparse model files as inputs and saves the converted ONNX model to the specified path. All the required input files can be obtained with the HugeCTR training APIs, so the whole workflow can be accomplished seamlessly in Python.
This notebook demonstrates how to access and use the HugeCTR to ONNX converter. For completeness, the relevant HugeCTR training APIs are also covered here, so some familiarity with them is recommended. For more details on the usage of this converter, refer to the HugeCTR to ONNX Converter documentation in the onnx_converter directory of the repository.
Setup HugeCTR
To set up the environment, refer to HugeCTR Example Notebooks and follow the instructions there before running the following.
Access the HugeCTR to ONNX Converter
Make sure that you start the notebook inside a running NGC Docker container, version 22.09 or later: nvcr.io/nvidia/merlin/merlin-hugectr:22.09.
The ONNX converter module is installed to the path /usr/local/lib/python3.8/dist-packages.
As for the HugeCTR Python interface, the dynamically linked hugectr.so library is installed to the path /usr/local/hugectr/lib/.
You can access both the ONNX converter and the HugeCTR Python interface anywhere within the container.
Run the following cell to confirm that the HugeCTR Python interface can be accessed correctly.
import hugectr
Run the following cell to confirm that the HugeCTR to ONNX converter can be accessed correctly.
import hugectr2onnx
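As an optional sanity check, you can print where Python resolved each module from; this is only a sketch, and the exact paths may vary between container releases.
import hugectr
import hugectr2onnx
# Optional check of the install locations mentioned above.
print(hugectr.__file__)
print(hugectr2onnx.__file__)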
Wide and Deep Model
To download and prepare the dataset, we will go through the following steps. At the end of this section, we provide the shell commands you can run in a terminal to get the data ready for this notebook.
Note: If you have already downloaded the data, skip to the preprocessing step (2). If preprocessing is also done, skip to creating the soft link from the processed data to the notebooks/ directory (3).
Data Preparation
Download the Criteo dataset
To preprocess the downloaded Kaggle Criteo dataset, we perform the following operations:
Reduce the amount of data to speed up the preprocessing
Fill in missing values
Remove feature values that occur very rarely, etc.
Preprocessing by Pandas:
Meanings of the command line arguments (as used in the preprocess.sh command below):
The 1st argument is the dataset postfix. It is 0 here since day_0 is used.
The 2nd argument, wdl_data, is the directory where the preprocessed data is stored.
The 3rd argument, pandas, selects the preprocessing script to use; here we choose Pandas.
The 4th argument, 1, indicates that normalization is applied to the dense features.
The 5th argument, 1, indicates that feature crossing is applied.
The 6th argument, 100, is the number of data files in each file list.
For more details about the data preprocessing, please refer to the “Preprocess the Criteo Dataset” section of the README in the samples/criteo directory of the repository on GitHub.
Create a soft link of the dataset folder to the path of this notebook
Run the following commands on the terminal to prepare the data for this notebook
export project_root=/home/hugectr # set this to the directory where hugectr is downloaded
cd ${project_root}/tools
# Step 1
wget https://storage.googleapis.com/criteo-cail-datasets/day_0.gz
# Step 2
bash preprocess.sh 0 wdl_data pandas 1 1 100
# Step 3
ln -s ${project_root}/tools/wdl_data ${project_root}/notebooks/wdl_data
Train the HugeCTR Model
We can train from scratch, dump the model graph to a JSON file, and save the model weights and optimizer states by performing the following steps with the Python APIs:
Create the solver, reader and optimizer, then initialize the model.
Construct the model graph by adding input, sparse embedding and dense layers in order.
Compile the model and have an overview of the model graph.
Dump the model graph to the JSON file.
Fit the model, save the model weights and optimizer states implicitly.
Please note that the training mode is determined by repeat_dataset within hugectr.CreateSolver.
If it is True, non-epoch mode training is adopted and the maximum number of iterations should be specified by max_iter within hugectr.Model.fit.
If it is False, epoch mode training is adopted and the number of epochs should be specified by num_epochs within hugectr.Model.fit.
The optimizer used to initialize the model applies to the weights of the dense layers, while the optimizer for each sparse embedding layer can be specified independently within hugectr.SparseEmbedding.
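The training script below uses non-epoch mode (repeat_dataset = True). For reference, here is a minimal sketch of the epoch-mode alternative; it is not executed in this notebook and assumes the same reader, optimizer, and model graph as the script below.
import hugectr
# Epoch-mode sketch: only the solver flag and the fit() call differ
# from the non-epoch-mode script in wdl_train.py below.
solver = hugectr.CreateSolver(max_eval_batches = 300,
                              batchsize_eval = 16384,
                              batchsize = 16384,
                              lr = 0.001,
                              vvgpu = [[0]],
                              repeat_dataset = False)  # False selects epoch mode
# ... define the reader, optimizer, and model graph exactly as in wdl_train.py below ...
model.fit(num_epochs = 1, display = 200, eval_interval = 1000,
          snapshot = 2000, snapshot_prefix = "wdl")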
%%writefile wdl_train.py
import hugectr
from mpi4py import MPI
solver = hugectr.CreateSolver(max_eval_batches = 300,
batchsize_eval = 16384,
batchsize = 16384,
lr = 0.001,
vvgpu = [[0]],
repeat_dataset = True)
reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Norm,
source = ["./wdl_data/file_list.0.txt"],
eval_source = "./wdl_data/file_list_test.0.txt",
check_type = hugectr.Check_t.Sum)
optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam,
update_type = hugectr.Update_t.Global,
beta1 = 0.9,
beta2 = 0.999,
epsilon = 0.0000001)
model = hugectr.Model(solver, reader, optimizer)
model.add(hugectr.Input(label_dim = 1, label_name = "label",
dense_dim = 13, dense_name = "dense",
data_reader_sparse_param_array =
[hugectr.DataReaderSparseParam("wide_data", 2, True, 1),
hugectr.DataReaderSparseParam("deep_data", 1, True, 26)]))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
workspace_size_per_gpu_in_mb = 75,
embedding_vec_size = 1,
combiner = "sum",
sparse_embedding_name = "sparse_embedding2",
bottom_name = "wide_data",
optimizer = optimizer))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
workspace_size_per_gpu_in_mb = 1074,
embedding_vec_size = 16,
combiner = "sum",
sparse_embedding_name = "sparse_embedding1",
bottom_name = "deep_data",
optimizer = optimizer))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
bottom_names = ["sparse_embedding1"],
top_names = ["reshape1"],
leading_dim=416))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
bottom_names = ["sparse_embedding2"],
top_names = ["reshape2"],
leading_dim=1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
bottom_names = ["reshape1", "dense"],
top_names = ["concat1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
bottom_names = ["concat1"],
top_names = ["fc1"],
num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
bottom_names = ["fc1"],
top_names = ["relu1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
bottom_names = ["relu1"],
top_names = ["dropout1"],
dropout_rate=0.5))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
bottom_names = ["dropout1"],
top_names = ["fc2"],
num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
bottom_names = ["fc2"],
top_names = ["relu2"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
bottom_names = ["relu2"],
top_names = ["dropout2"],
dropout_rate=0.5))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
bottom_names = ["dropout2"],
top_names = ["fc3"],
num_output=1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Add,
bottom_names = ["fc3", "reshape2"],
top_names = ["add1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
bottom_names = ["add1", "label"],
top_names = ["loss"]))
model.graph_to_json("wdl.json")
model.compile()
model.summary()
model.fit(max_iter = 2300, display = 200, eval_interval = 1000, snapshot = 2000, snapshot_prefix = "wdl")
Overwriting wdl_train.py
!python3 wdl_train.py
====================================================Model Init=====================================================
[17d09h39m52s][HUGECTR][INFO]: Global seed is 2566812942
[17d09h39m53s][HUGECTR][INFO]: Device to NUMA mapping:
GPU 0 -> node 0
[17d09h39m55s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
[17d09h39m55s][HUGECTR][INFO]: Start all2all warmup
[17d09h39m55s][HUGECTR][INFO]: End all2all warmup
[17d09h39m55s][HUGECTR][INFO]: Using All-reduce algorithm OneShot
Device 0: Tesla V100-SXM2-16GB
[17d09h39m55s][HUGECTR][INFO]: num of DataReader workers: 12
[17d09h39m55s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=6553600
[17d09h39m55s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=5865472
[17d09h39m55s][HUGECTR][INFO]: Save the model graph to wdl.json, successful
===================================================Model Compile===================================================
[17d09h40m31s][HUGECTR][INFO]: gpu0 start to init embedding
[17d09h40m31s][HUGECTR][INFO]: gpu0 init embedding done
[17d09h40m31s][HUGECTR][INFO]: gpu0 start to init embedding
[17d09h40m31s][HUGECTR][INFO]: gpu0 init embedding done
[17d09h40m31s][HUGECTR][INFO]: Starting AUC NCCL warm-up
[17d09h40m31s][HUGECTR][INFO]: Warm-up done
===================================================Model Summary===================================================
Label Dense Sparse
label dense wide_data,deep_data
(None, 1) (None, 13)
------------------------------------------------------------------------------------------------------------------
Layer Type Input Name Output Name Output Shape
------------------------------------------------------------------------------------------------------------------
DistributedSlotSparseEmbeddingHash wide_data sparse_embedding2 (None, 1, 1)
DistributedSlotSparseEmbeddingHash deep_data sparse_embedding1 (None, 26, 16)
Reshape sparse_embedding1 reshape1 (None, 416)
Reshape sparse_embedding2 reshape2 (None, 1)
Concat reshape1,dense concat1 (None, 429)
InnerProduct concat1 fc1 (None, 1024)
ReLU fc1 relu1 (None, 1024)
Dropout relu1 dropout1 (None, 1024)
InnerProduct dropout1 fc2 (None, 1024)
ReLU fc2 relu2 (None, 1024)
Dropout relu2 dropout2 (None, 1024)
InnerProduct dropout2 fc3 (None, 1)
Add fc3,reshape2 add1 (None, 1)
BinaryCrossEntropyLoss add1,label loss
------------------------------------------------------------------------------------------------------------------
=====================================================Model Fit=====================================================
[17d90h40m31s][HUGECTR][INFO]: Use non-epoch mode with number of iterations: 2300
[17d90h40m31s][HUGECTR][INFO]: Training batchsize: 16384, evaluation batchsize: 16384
[17d90h40m31s][HUGECTR][INFO]: Evaluation interval: 1000, snapshot interval: 2000
[17d90h40m31s][HUGECTR][INFO]: Sparse embedding trainable: 1, dense network trainable: 1
[17d90h40m31s][HUGECTR][INFO]: Use mixed precision: 0, scaler: 1.000000, use cuda graph: 1
[17d90h40m31s][HUGECTR][INFO]: lr: 0.001000, warmup_steps: 1, decay_start: 0, decay_steps: 1, decay_power: 2.000000, end_lr: 0.000000
[17d90h40m31s][HUGECTR][INFO]: Training source file: ./wdl_data/file_list.txt
[17d90h40m31s][HUGECTR][INFO]: Evaluation source file: ./wdl_data/file_list_test.txt
[17d90h40m35s][HUGECTR][INFO]: Iter: 200 Time(200 iters): 4.140480s Loss: 0.130462 lr:0.001000
[17d90h40m39s][HUGECTR][INFO]: Iter: 400 Time(200 iters): 4.015980s Loss: 0.127865 lr:0.001000
[17d90h40m43s][HUGECTR][INFO]: Iter: 600 Time(200 iters): 4.015593s Loss: 0.125415 lr:0.001000
[17d90h40m47s][HUGECTR][INFO]: Iter: 800 Time(200 iters): 4.014245s Loss: 0.132623 lr:0.001000
[17d90h40m51s][HUGECTR][INFO]: Iter: 1000 Time(200 iters): 4.016144s Loss: 0.128454 lr:0.001000
[17d90h40m53s][HUGECTR][INFO]: Evaluation, AUC: 0.771219
[17d90h40m53s][HUGECTR][INFO]: Eval Time for 300 iters: 1.542828s
[17d90h40m57s][HUGECTR][INFO]: Iter: 1200 Time(200 iters): 5.570898s Loss: 0.130795 lr:0.001000
[17d90h41m10s][HUGECTR][INFO]: Iter: 1400 Time(200 iters): 4.013813s Loss: 0.135299 lr:0.001000
[17d90h41m50s][HUGECTR][INFO]: Iter: 1600 Time(200 iters): 4.016540s Loss: 0.130995 lr:0.001000
[17d90h41m90s][HUGECTR][INFO]: Iter: 1800 Time(200 iters): 4.017910s Loss: 0.132997 lr:0.001000
[17d90h41m13s][HUGECTR][INFO]: Iter: 2000 Time(200 iters): 4.015891s Loss: 0.119502 lr:0.001000
[17d90h41m14s][HUGECTR][INFO]: Evaluation, AUC: 0.778875
[17d90h41m14s][HUGECTR][INFO]: Eval Time for 300 iters: 1.545023s
[17d90h41m14s][HUGECTR][INFO]: Rank0: Write hash table to file
[17d90h41m16s][HUGECTR][INFO]: Rank0: Write hash table to file
[17d90h41m18s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[17d90h41m18s][HUGECTR][INFO]: Rank0: Write optimzer state to file
[17d90h41m18s][HUGECTR][INFO]: Done
[17d90h41m18s][HUGECTR][INFO]: Rank0: Write optimzer state to file
[17d90h41m18s][HUGECTR][INFO]: Done
[17d90h41m20s][HUGECTR][INFO]: Rank0: Write optimzer state to file
[17d90h41m20s][HUGECTR][INFO]: Done
[17d90h41m21s][HUGECTR][INFO]: Rank0: Write optimzer state to file
[17d90h41m21s][HUGECTR][INFO]: Done
[17d90h41m32s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[17d90h41m32s][HUGECTR][INFO]: Dumping dense weights to file, successful
[17d90h41m32s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[17d90h41m32s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[17d90h41m37s][HUGECTR][INFO]: Iter: 2200 Time(200 iters): 24.012700s Loss: 0.128822 lr:0.001000
Finish 2300 iterations with batchsize: 16384 in 67.90s
Convert to ONNX
We can convert the trained HugeCTR model to ONNX with a call to hugectr2onnx.converter.convert. The flag convert_embedding specifies whether to convert the sparse embeddings; if it is set to False, the sparse model files do not need to be provided. In this notebook, both the dense and sparse parts of the HugeCTR model are converted to ONNX, so that we can more easily check the correctness of the conversion by comparing inference results from HugeCTR and ONNX Runtime. The model files used below (wdl.json, wdl_dense_2000.model, wdl0_sparse_2000.model, and wdl1_sparse_2000.model) come from the graph dump and the snapshot taken at iteration 2000 during training.
import hugectr2onnx
hugectr2onnx.converter.convert(onnx_model_path = "wdl.onnx",
graph_config = "wdl.json",
dense_model = "wdl_dense_2000.model",
convert_embedding = True,
sparse_models = ["wdl0_sparse_2000.model", "wdl1_sparse_2000.model"])
The model is checked!
The model is saved at wdl.onnx
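If you only need the dense part of the model, the embeddings can be left unconverted and the sparse model files omitted. The following is a minimal sketch based on the converter signature documented at the end of this notebook; the output path wdl_dense_only.onnx is just an illustrative name.
import hugectr2onnx
# Dense-only conversion sketch: convert_embedding is False, so the
# sparse model files are not required.
hugectr2onnx.converter.convert(onnx_model_path = "wdl_dense_only.onnx",
                               graph_config = "wdl.json",
                               dense_model = "wdl_dense_2000.model",
                               convert_embedding = False)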
Inference with ONNX Runtime and HugeCTR
To make inferences with ONNX Runtime, we need to read samples from the data and feed them to the ONNX inference session. Specifically, we need to extract the dense features, wide sparse features, and deep sparse features from the preprocessed Wide&Deep dataset. To guarantee a fair comparison with HugeCTR inference, we use the first data file within ./wdl_data/file_list_test.txt, i.e., ./wdl_data/val/sparse_embedding0.data, and run inference for the same number of samples in both cases (which should be no more than the total number of samples within ./wdl_data/val/sparse_embedding0.data).
import struct
import numpy as np
def read_samples_for_wdl(data_file, num_samples, key_type="I32", slot_num=27):
key_type_map = {"I32": ["I", 4], "I64": ["q", 8]}
with open(data_file, 'rb') as file:
# skip data_header
file.seek(4 + 64 + 1, 0)
batch_label = []
batch_dense = []
batch_wide_data = []
batch_deep_data = []
for _ in range(num_samples):
# one sample
length_buffer = file.read(4) # int
length = struct.unpack('i', length_buffer)
label_buffer = file.read(4) # int
label = struct.unpack('i', label_buffer)[0]
dense_buffer = file.read(4 * 13) # dense_dim * float
dense = struct.unpack("13f", dense_buffer)
keys = []
for _ in range(slot_num):
nnz_buffer = file.read(4) # int
nnz = struct.unpack("i", nnz_buffer)[0]
key_buffer = file.read(key_type_map[key_type][1] * nnz) # nnz * sizeof(key_type)
key = struct.unpack(str(nnz) + key_type_map[key_type][0], key_buffer)
keys += list(key)
check_bit_buffer = file.read(1) # char
check_bit = struct.unpack("c", check_bit_buffer)[0]
batch_label.append(label)
batch_dense.append(dense)
batch_wide_data.append(keys[0:2])
batch_deep_data.append(keys[2:28])
batch_label = np.reshape(np.array(batch_label, dtype=np.float32), newshape=(num_samples, 1))
batch_dense = np.reshape(np.array(batch_dense, dtype=np.float32), newshape=(num_samples, 13))
batch_wide_data = np.reshape(np.array(batch_wide_data, dtype=np.int64), newshape=(num_samples, 1, 2))
batch_deep_data = np.reshape(np.array(batch_deep_data, dtype=np.int64), newshape=(num_samples, 26, 1))
return batch_label, batch_dense, batch_wide_data, batch_deep_data
batch_size = 64
num_batches = 100
data_file = "./wdl_data/val/sparse_embedding0.data" # there are totally 40960 samples
onnx_model_path = "wdl.onnx"
label, dense, wide_data, deep_data = read_samples_for_wdl(data_file, batch_size*num_batches, key_type="I32", slot_num = 27)
import onnxruntime as ort
sess = ort.InferenceSession(onnx_model_path)
res = sess.run(output_names=[sess.get_outputs()[0].name],
input_feed={sess.get_inputs()[0].name: dense, sess.get_inputs()[1].name: wide_data, sess.get_inputs()[2].name: deep_data})
onnx_preds = res[0].reshape((batch_size*num_batches,))
print("ONNX Runtime Predicions:", onnx_preds)
ONNX Runtime Predicions: [0.02525118 0.00920713 0.0080741 ... 0.02893934 0.02577347 0.1296753 ]
We can then run inference with the HugeCTR Python APIs and compare the prediction results.
dense_model = "wdl_dense_2000.model"
sparse_models = ["wdl0_sparse_2000.model", "wdl1_sparse_2000.model"]
graph_config = "wdl.json"
data_source = "./wdl_data/file_list_test.0.txt"
import hugectr
from mpi4py import MPI
from hugectr.inference import InferenceParams, CreateInferenceSession
inference_params = InferenceParams(model_name = "wdl",
max_batchsize = batch_size,
hit_rate_threshold = 0.6,
dense_model_file = dense_model,
sparse_model_files = sparse_models,
device_id = 0,
use_gpu_embedding_cache = True,
cache_size_percentage = 0.6,
i64_input_key = False)
inference_session = CreateInferenceSession(graph_config, inference_params)
hugectr_preds = inference_session.predict(num_batches, data_source, hugectr.DataReaderType_t.Norm, hugectr.Check_t.Sum)
print("HugeCTR Predictions: ", hugectr_preds)
[17d09h43m49s][HUGECTR][INFO]: default_emb_vec_value is not specified using default: 0.000000
[17d09h43m49s][HUGECTR][INFO]: default_emb_vec_value is not specified using default: 0.000000
[17d09h43m53s][HUGECTR][INFO]: Global seed is 3782721491
[17d09h43m55s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
[17d09h43m55s][HUGECTR][INFO]: Start all2all warmup
[17d09h43m55s][HUGECTR][INFO]: End all2all warmup
[17d09h43m55s][HUGECTR][INFO]: Use mixed precision: 0
[17d09h43m55s][HUGECTR][INFO]: start create embedding for inference
[17d09h43m55s][HUGECTR][INFO]: sparse_input name wide_data
[17d09h43m55s][HUGECTR][INFO]: sparse_input name deep_data
[17d09h43m55s][HUGECTR][INFO]: create embedding for inference success
[17d09h43m55s][HUGECTR][INFO]: Inference stage skip BinaryCrossEntropyLoss layer, replaced by Sigmoid layer
HugeCTR Predictions: [0.02525118 0.00920718 0.00807416 ... 0.0289393 0.02577345 0.12967525]
print("Min absolute error: ", np.min(np.abs(onnx_preds-hugectr_preds)))
print("Mean absolute error: ", np.mean(np.abs(onnx_preds-hugectr_preds)))
print("Max absolute error: ", np.max(np.abs(onnx_preds-hugectr_preds)))
Min absolute error: 0.0
Mean absolute error: 2.3289697e-08
Max absolute error: 1.1920929e-07
API Signature for hugectr2onnx.converter
NAME
hugectr2onnx.converter
FUNCTIONS
convert(onnx_model_path, graph_config, dense_model, convert_embedding=False, sparse_models=[], ntp_file=None, graph_name='hugectr')
Convert a HugeCTR model to an ONNX model
Args:
onnx_model_path: the path to store the ONNX model
graph_config: the graph configuration JSON file of the HugeCTR model
dense_model: the file of the dense weights for the HugeCTR model
convert_embedding: whether to convert the sparse embeddings for the HugeCTR model (optional)
sparse_models: the files of the sparse embeddings for the HugeCTR model (optional)
ntp_file: the file of the non-trainable parameters for the HugeCTR model (optional)
graph_name: the graph name for the ONNX model (optional)
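The listing above can be reproduced inside the container with Python's built-in help:
import hugectr2onnx
# Show the docstring and signature of the converter module.
help(hugectr2onnx.converter)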