Scaling Large Datasets with Criteo

Criteo provides the largest publicly available dataset for recommender systems: 1 TB of uncompressed click logs containing 4 billion examples.
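To get a feel for these numbers, the sketch below works out the average size of one example and how many partitions the dataset would split into. It assumes decimal units (1 TB = 10^12 bytes) and a hypothetical partition size of 256 MB; neither assumption comes from the Criteo documentation or the notebooks.

```python
import math

# Figures quoted above; decimal units (1 TB = 10**12 bytes) are an assumption.
DATASET_BYTES = 1 * 10**12
NUM_EXAMPLES = 4 * 10**9

# Hypothetical partition size, chosen only for illustration.
PARTITION_BYTES = 256 * 10**6


def average_example_bytes(total_bytes: int, num_examples: int) -> float:
    """Average uncompressed size of one click-log example."""
    return total_bytes / num_examples


def partition_count(total_bytes: int, partition_bytes: int) -> int:
    """Number of partitions needed to cover the dataset, rounding up."""
    return math.ceil(total_bytes / partition_bytes)


avg_bytes = average_example_bytes(DATASET_BYTES, NUM_EXAMPLES)
partitions = partition_count(DATASET_BYTES, PARTITION_BYTES)
print(f"average example size: {avg_bytes:.0f} bytes")
print(f"partitions of {PARTITION_BYTES // 10**6} MB: {partitions}")
```

Under these assumptions each example averages a few hundred bytes and the dataset splits into several thousand partitions, which is the kind of workload that can be spread across multiple GPUs.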

The examples demonstrate how to scale NVTabular and how to:

Use multiple GPUs and nodes with NVTabular for feature engineering.
Train recommender system models with Merlin Models for TensorFlow.
Train recommender system models with HugeCTR using multiple GPUs.
Run inference with the Triton Inference Server and Merlin Models for TensorFlow or HugeCTR.

We recommend running the examples in the latest stable Merlin containers. Each notebook names the container it requires.

Explore the following notebooks:

Training and Deployment with TensorFlow:

Download and Convert
Feature Engineering with NVTabular
Training with TensorFlow
Deploy the TensorFlow Model with Triton Inference Server

Training and Deployment with HugeCTR:

Download and Convert
Feature Engineering with NVTabular
Training with HugeCTR
Deploy the HugeCTR Model with Triton Inference Server
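
The Download and Convert notebooks start from the raw Criteo click logs. As a rough illustration of what a single raw record looks like, the sketch below parses one line in plain Python. It assumes the commonly documented layout of the logs (tab-separated fields: a click label, 13 integer features, and 26 hashed categorical features, with missing values left empty). The column names `I1`..`I13` and `C1`..`C26`, the `parse_criteo_line` helper, and the sample record are assumptions made for illustration; the notebooks perform the conversion at scale with GPU libraries rather than line by line.

```python
from typing import Dict, Optional, Union

NUM_INTEGER = 13      # assumed number of integer features per record
NUM_CATEGORICAL = 26  # assumed number of hashed categorical features

Value = Optional[Union[int, str]]


def parse_criteo_line(line: str) -> Dict[str, Value]:
    """Parse one tab-separated record into a column-name -> value mapping.

    Empty fields are treated as missing and mapped to None.
    """
    fields = line.rstrip("\n").split("\t")
    expected = 1 + NUM_INTEGER + NUM_CATEGORICAL
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")

    record: Dict[str, Value] = {"label": int(fields[0])}
    for i in range(NUM_INTEGER):
        raw = fields[1 + i]
        record[f"I{i + 1}"] = int(raw) if raw else None
    for j in range(NUM_CATEGORICAL):
        raw = fields[1 + NUM_INTEGER + j]
        # Categorical values are kept as opaque strings.
        record[f"C{j + 1}"] = raw if raw else None
    return record


# Made-up sample record: label 1, I2 missing, C3 missing.
ints = ["5", "", "12"] + ["0"] * 10
cats = ["68fd1e64", "80e26c9b", ""] + ["a1b2c3d4"] * 23
sample = "\t".join(["1"] + ints + cats) + "\n"
row = parse_criteo_line(sample)
print(row["label"], row["I1"], row["I2"], row["C1"], row["C3"])
```

Missing values are kept as `None` here so that a later feature engineering step can decide how to fill them.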