Hierarchical Parameter Server Notebooks

This directory contains a set of Jupyter notebooks that demonstrate how to use HPS in TensorFlow.

Quickstart

The simplest way to run one of our notebooks is with a Docker container. A container provides a self-contained, isolated, and reproducible environment for repeatable experiments. Docker images are available from the NVIDIA GPU Cloud (NGC). If you prefer to build the HugeCTR Docker image on your own, refer to Set Up the Development Environment With Merlin Containers.

Pull the NGC Docker Image

Pull the container image using the following command:

docker pull nvcr.io/nvidia/merlin/merlin-tensorflow:23.02

Clone the HugeCTR Repository

Use the following command to clone the HugeCTR repository:

git clone https://github.com/NVIDIA/HugeCTR
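
Later steps expect the notebooks under hps_tf/notebooks inside the repository, so it can help to confirm the clone before continuing. The following is a minimal sketch: the function name has_hps_notebooks is hypothetical, and the check assumes the clone went to the default HugeCTR directory.

```shell
# Hypothetical helper: succeeds if the given directory contains hps_tf/notebooks.
has_hps_notebooks() {
  [ -d "$1/hps_tf/notebooks" ]
}

# Assumes the default clone directory name, HugeCTR.
if has_hps_notebooks "HugeCTR"; then
  echo "Found HugeCTR/hps_tf/notebooks"
else
  echo "hps_tf/notebooks not found; check that the clone completed" >&2
fi
```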

Start the Jupyter Notebook

Change to the root directory of the cloned HugeCTR repository, and then launch the container in interactive mode. The following command mounts the current directory into the container at /hugectr:

docker run --runtime=nvidia --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr -p 8888:8888 nvcr.io/nvidia/merlin/merlin-tensorflow:23.02
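
To inspect the command before running it, or to change the host port when 8888 is already in use, the pieces can be assembled from variables and printed first. This is only a sketch: IMAGE, HOST_PORT, and print_run_cmd are hypothetical names introduced for illustration, and the function prints the command rather than executing it.

```shell
# Hypothetical variables for illustration; the values match the command above.
IMAGE="nvcr.io/nvidia/merlin/merlin-tensorflow:23.02"
HOST_PORT=8888

# Print the docker run command instead of executing it (a dry run).
# -u runs as the current user, -v mounts the current directory at /hugectr,
# -w sets the working directory, and -p publishes container port 8888.
print_run_cmd() {
  printf 'docker run --runtime=nvidia --rm -it --cap-add SYS_NICE -u %s:%s -v %s:/hugectr -w /hugectr -p %s:8888 %s\n' \
    "$(id -u)" "$(id -g)" "$(pwd)" "$HOST_PORT" "$IMAGE"
}

print_run_cmd
```

If HOST_PORT is changed, the browser URL must use the new host port, because the -p option maps the host port to port 8888 inside the container.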

Inside the container, start Jupyter using these commands:

cd /hugectr/hps_tf/notebooks
jupyter-notebook --allow-root --ip 0.0.0.0 --port 8888 --NotebookApp.token='hugectr'

From your web browser, connect to port 8888 on the host machine by using its IP address or host name: http://[host machine]:8888

Log in with the token set by the --NotebookApp.token option in the command above, which is hugectr. For example:

http://[host machine]:8888/?token=hugectr
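
The login URL is just the host, the published port, and the token joined together, which the sketch below spells out. The function name jupyter_url is hypothetical, and localhost is an assumption that holds only when the browser runs on the machine that hosts the container.

```shell
# Hypothetical helper: build the login URL from a host, a port, and a token.
jupyter_url() {
  printf 'http://%s:%s/?token=%s\n' "$1" "$2" "$3"
}

# "hugectr" is the token passed through --NotebookApp.token when Jupyter starts.
# Replace "localhost" with the IP address or name of the host machine if needed.
jupyter_url "localhost" 8888 "hugectr"
# prints: http://localhost:8888/?token=hugectr
```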

Notebook List

Here’s a list of notebooks that you can run:

hierarchical_parameter_server_demo.ipynb: Demonstrates how to train with native TF layers and run inference with HPS.
hps_multi_table_sparse_input_demo.ipynb: Demonstrates how to train with native TF layers and run inference with HPS when there are multiple embedding tables and the input keys are sparse tensors.
hps_pretrained_model_training_demo.ipynb: Demonstrates how to use HPS to load pre-trained embedding tables for new training tasks and how to use HPS with the TensorFlow Mirrored Strategy.
sok_to_hps_dlrm_demo.ipynb: Demonstrates how to train a DLRM model with SparseOperationKit (SOK) and then run inference with HPS.
hps_tensorflow_triton_deployment_demo.ipynb: Demonstrates how to deploy an inference SavedModel that uses HPS with the Triton TensorFlow backend. This notebook relies on implicit HPS initialization. It also shows how to apply TF-TRT optimization to a SavedModel whose embedding lookup is based on HPS.
hps_table_fusion_demo.ipynb: Demonstrates how to fuse embedding tables that have the same embedding vector size with the HPS plugin for TensorFlow.

System Specifications

The following table summarizes the system on which each notebook was verified to run successfully. These specifications describe the verified configuration, not the minimum requirements.

Notebook	CPU	GPU	Number of GPUs	Author
hierarchical_parameter_server_demo.ipynb	Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB Memory	Tesla V100-SXM2-32GB, 32 GB Memory	1	Kingsley Liu
hps_multi_table_sparse_input_demo.ipynb	Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB Memory	Tesla V100-SXM2-32GB, 32 GB Memory	1	Kingsley Liu
hps_pretrained_model_training_demo.ipynb	Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB Memory	Tesla V100-SXM2-32GB, 32 GB Memory	4	Kingsley Liu
sok_to_hps_dlrm_demo.ipynb	Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB Memory	Tesla V100-SXM2-32GB, 32 GB Memory	1	Kingsley Liu
hps_tensorflow_triton_deployment_demo.ipynb	Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB Memory	Tesla V100-SXM2-32GB, 32 GB Memory	1	Kingsley Liu
hps_table_fusion_demo.ipynb	Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB Memory	Tesla V100-SXM2-32GB, 32 GB Memory	1	Kingsley Liu
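
Because hps_pretrained_model_training_demo.ipynb was verified with 4 GPUs, it may be useful to check how many GPUs the host exposes before choosing a notebook. The sketch below counts the device lines printed by nvidia-smi -L. The function name count_gpus is hypothetical, and the check assumes that nvidia-smi is installed on the host.

```shell
# Hypothetical helper: count the GPU entries in `nvidia-smi -L` style output
# read from standard input (one "GPU <index>: ..." line per device).
count_gpus() {
  grep -c '^GPU [0-9]' || true
}

# Assumption: nvidia-smi is installed on the host; skip the check otherwise.
if command -v nvidia-smi >/dev/null 2>&1; then
  echo "GPUs visible: $(nvidia-smi -L | count_gpus)"
else
  echo "nvidia-smi not found; cannot count GPUs" >&2
fi
```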