Scaling Large Datasets with Criteo

Criteo provides the largest publicly available dataset for recommender systems: 1 TB of uncompressed click logs containing 4 billion examples.
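To give a feel for what those numbers mean in practice, here is a rough back-of-the-envelope sketch. Only the 1 TB and 4 billion figures come from the text above. The partition size, the GPU count, and the choice of decimal terabytes are assumptions made for illustration, not values taken from the notebooks.

```python
# Back-of-the-envelope sizing for the Criteo click logs.
# The dataset figures come from the text above; every value marked
# "assumption" is a hypothetical number chosen for illustration only.

DATASET_BYTES = 10**12      # 1 TB, read as a decimal terabyte (assumption)
NUM_EXAMPLES = 4 * 10**9    # 4 billion examples

PARTITION_BYTES = 10**9     # assumption: 1 GB per partition
NUM_GPUS = 8                # assumption: a single node with 8 GPUs


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


# Average uncompressed size of one click-log example.
bytes_per_example = DATASET_BYTES // NUM_EXAMPLES

# Number of partitions needed to cover the dataset, and how many
# each GPU would process if the work were split evenly.
num_partitions = ceil_div(DATASET_BYTES, PARTITION_BYTES)
partitions_per_gpu = ceil_div(num_partitions, NUM_GPUS)

print(f"average example size: {bytes_per_example} bytes")
print(f"partitions: {num_partitions}")
print(f"partitions per GPU: {partitions_per_gpu}")
```

Under these assumptions each example averages about 250 bytes, and an even split gives each GPU 125 partitions of 1 GB. Real partition sizes depend on available GPU memory and on the file format, so treat the numbers as an order-of-magnitude guide.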

These examples demonstrate how to scale NVTabular and how to:

Use multiple GPUs and nodes with NVTabular for feature engineering.
Train recommender system models with Merlin Models for TensorFlow.
Train recommender system models with HugeCTR using multiple GPUs.
Run inference with the Triton Inference Server and Merlin Models for TensorFlow or HugeCTR.

We recommend running the examples in the latest stable Merlin containers. Each notebook specifies the container it requires.

Explore the following notebooks: