HugeCTR Example Notebooks
This directory contains a set of Jupyter notebooks that demonstrate how to use HugeCTR.
Quickstart
The simplest way to run one of our notebooks is with a Docker container. A container provides a self-contained, isolated, and reproducible environment for repetitive experiments. Docker images are available from the NVIDIA GPU Cloud (NGC). If you prefer to build the HugeCTR Docker image on your own, refer to Set Up the Development Environment With Merlin Containers.
Pull the NGC Docker
Pull the container using the following command:
docker pull nvcr.io/nvidia/merlin/merlin-training:22.04
To run the Sparse Operation Kit notebooks, pull the nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.04 container.
Clone the HugeCTR Repository
Use the following command to clone the HugeCTR repository:
git clone https://github.com/NVIDIA/HugeCTR
Start the Jupyter Notebook
Launch the container in interactive mode (mount the HugeCTR root directory into the container for your convenience) by running this command:
docker run --runtime=nvidia --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr -p 8888:8888 nvcr.io/nvidia/merlin/merlin-training:22.04
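To make the role of each flag in the command above easier to see, here is a minimal Python sketch that assembles the same argument list. The function name build_docker_run_command and its parameters are hypothetical helpers introduced for illustration; they are not part of HugeCTR or Docker. The sketch assumes a Unix-like host and only prints the command; it does not start a container.

```python
import os
import shlex


def build_docker_run_command(image, host_dir, host_port=8888):
    """Return the docker run argument list used in this guide (illustrative helper)."""
    return [
        "docker", "run",
        "--runtime=nvidia",                    # use the NVIDIA container runtime for GPU access
        "--rm",                                # remove the container when it exits
        "-it",                                 # interactive session with a terminal
        "--cap-add", "SYS_NICE",               # add the SYS_NICE capability, as in the command above
        "-u", f"{os.getuid()}:{os.getgid()}",  # run as the current user and group
        "-v", f"{host_dir}:/hugectr",          # mount the repository at /hugectr
        "-w", "/hugectr",                      # start in the mounted directory
        "-p", f"{host_port}:8888",             # publish the container's port 8888 on the host
        image,
    ]


# Assumes the current directory is the HugeCTR repository root.
command = build_docker_run_command(
    "nvcr.io/nvidia/merlin/merlin-training:22.04", os.getcwd()
)
print(shlex.join(command))
```

Passing a different image name as the first argument produces the equivalent command for another container.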
To run the Sparse Operation Kit notebooks, specify the nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.04 container.
Start Jupyter using these commands:
cd /hugectr/notebooks
jupyter-notebook --allow-root --ip 0.0.0.0 --port 8888 --NotebookApp.token='hugectr'
Connect to your host machine using the 8888 port by accessing its IP address or name from your web browser:
http://[host machine]:8888
To log in, use the token shown in the output of the command above. For example:
http://[host machine]:8888/?token=aae96ae9387cd28151868fee318c3b3581a2d794f3b25c6b
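As a small illustration of how the login URL is put together, the following Python sketch builds it from a host name, a port, and a token. The helper name build_notebook_url and the host value localhost are assumptions made for this example; substitute the name or IP address of your host machine. The token hugectr matches the --NotebookApp.token value passed to jupyter-notebook earlier.

```python
from urllib.parse import urlencode


def build_notebook_url(host, port=8888, token=None):
    """Return the Jupyter URL for host and port, with an optional login token."""
    base = f"http://{host}:{port}/"
    if token is None:
        return base
    # urlencode escapes any characters in the token that are not URL-safe.
    return f"{base}?{urlencode({'token': token})}"


# "localhost" is a placeholder for your host machine.
print(build_notebook_url("localhost", token="hugectr"))
# prints http://localhost:8888/?token=hugectr
```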
Optional: Import MPI.
By default, HugeCTR initializes and finalizes MPI when you run the import hugectr statement within the NGC Merlin container. If you build and install HugeCTR yourself, specify the ENABLE_MULTINODES=ON argument when you build. See Build HugeCTR from Source.
If your program uses MPI for a reason other than interacting with HugeCTR, initialize MPI with the from mpi4py import MPI statement before you import HugeCTR.
Important Note:
HugeCTR is written in CUDA/C++ and exposed to Python through pybind11. The C++ output does not display in notebook cells unless you run the Python script from the command line.
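One way to run code "from the command line" while staying in a notebook is to write it to a file and launch it in a separate Python process, then print what that process wrote. The sketch below shows the idea with the standard library only. The helper run_as_script and the EXAMPLE_SCRIPT body are hypothetical and written for illustration; the notebooks themselves may use a different mechanism, such as a shell escape. The commented lines in the script body mark where the optional mpi4py import and the hugectr import would go inside the container.

```python
import os
import subprocess
import sys
import tempfile

# Illustrative script body. The two commented imports show the order
# described above: mpi4py first (only if your program needs MPI), then hugectr.
EXAMPLE_SCRIPT = """\
# from mpi4py import MPI
# import hugectr
print("hello from a separate Python process")
"""


def run_as_script(source):
    """Write source to a temporary file, run it with the current interpreter,
    and return everything the child process wrote to stdout and stderr."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(source)
        path = handle.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the captured text
            text=True,
            check=False,
        )
    finally:
        os.remove(path)
    return result.stdout


print(run_as_script(EXAMPLE_SCRIPT))
```

Because the output is captured from the child process as a whole, text written by native code in that process is captured along with the Python output.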
Notebook List
The notebooks are located within the container and can be found in the /hugectr/notebooks directory.
Here’s a list of notebooks that you can run:
ecommerce-example.ipynb: Explains how to train and run inference with the eCommerce dataset.
movie-lens-example.ipynb: Explains how to train and run inference with the MovieLens dataset.
hugectr-criteo.ipynb: Explains how to use the HugeCTR Python interface with the Criteo dataset.
hugectr2onnx_demo.ipynb: Explains how to convert a trained HugeCTR model to ONNX.
continuous_training.ipynb: Introduces how to deploy continuous training with HugeCTR.
hugectr_wdl_prediction.ipynb: Tutorial on how to train a Wide & Deep Learning (WDL) model using the HugeCTR high-level Python API.
news-example.ipynb: Tutorial that demonstrates how to use NVTabular for ETL and HugeCTR for training deep neural network models on the MIND dataset.
multi_gpu_offline_inference.ipynb: Explains how to do multi-GPU offline inference with the HugeCTR Python APIs.
hps_demo.ipynb: Demonstrates how to use the HPS Python APIs together with the ONNX Runtime APIs to create an ensemble inference model.
training_with_hdfs.ipynb: Demonstrates how to train a model with data that is stored in Hadoop HDFS.
The multi-modal-data series of notebooks demonstrates how to use multi-modal data, such as text and images, for the task of movie recommendation. The notebooks use the MovieLens-25M dataset.
For Sparse Operation Kit notebooks, refer to the sparse_operation_kit/notebooks/ directory of the repository or the documentation.
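To check which notebooks are actually present in your checkout, a short Python sketch such as the following can list them. The helper list_notebooks is a hypothetical name used for illustration. It looks only at .ipynb files directly inside the given directory, so notebooks kept in subdirectories are not included.

```python
from pathlib import Path


def list_notebooks(directory):
    """Return the sorted names of .ipynb files directly inside directory."""
    root = Path(directory)
    if not root.is_dir():
        # Return an empty list when the directory does not exist,
        # for example when running outside the container.
        return []
    return sorted(path.name for path in root.glob("*.ipynb"))


# /hugectr/notebooks is the location used inside the container in this guide.
for name in list_notebooks("/hugectr/notebooks"):
    print(name)
```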
System Specifications
The following table summarizes the systems on which the notebooks were verified to run successfully. These specifications describe the verification environment and are not minimum requirements.
| Notebook | CPU | GPU | #GPUs | Author |
|---|---|---|---|---|
| | Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz | Tesla V100-SXM2-32GB | 1 | Vinh Nguyen |
| | Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz | Tesla V100-SXM2-32GB | 1 | Xiaolei Shi |
| | Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz | Tesla V100-SXM2-16GB | 8 | Vinh Nguyen |
| | Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz | Tesla V100-SXM2-16GB | 1 | Kingsley Liu |
| | Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz | Tesla V100-SXM2-32GB | 1 | Kingsley Liu |
| | Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz | Tesla V100-SXM2-32GB | 4 | Kingsley Liu |
| | Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz | Tesla V100-SXM2-32GB | 1 | Kingsley Liu |
| | AMD Ryzen 9 3900X 12-Core | GeForce RTX 2080Ti | 1 | Yingcan Wei |
| | Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz | Tesla V100-SXM2-32GB | 1 | Vinh Nguyen |
| | Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz | Tesla V100-SXM2-32GB | 4 | Ashish Sardana |