Scaling to Large Datasets with Criteo
Criteo provides the largest publicly available dataset for recommender systems: 1 TB of uncompressed click logs that contain 4 billion examples. The example notebooks show how to scale NVTabular in the following ways:
Using multiple GPUs and multiple nodes with NVTabular for ETL (see the first sketch after this list).
Training a recommender system model with the NVTabular dataloader for PyTorch (see the second sketch after this list).
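As a rough illustration of the first point, the sketch below runs NVTabular preprocessing on a Dask-CUDA cluster. The file paths, column names, and `part_size` value are placeholder assumptions, and the `client` keyword reflects the NVTabular 0.x `Workflow` API; the notebooks are the authoritative reference.

```python
# A minimal sketch of multi-GPU ETL with NVTabular on a Dask-CUDA cluster.
# Paths and column names are placeholders, not taken from the notebooks.
import nvtabular as nvt
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

if __name__ == "__main__":
    # One Dask worker per visible GPU on this node. For multiple nodes,
    # connect Client to the address of a running Dask scheduler instead.
    cluster = LocalCUDACluster()
    client = Client(cluster)

    # Criteo-style schema: 26 categorical and 13 continuous columns.
    cats = [f"C{i}" for i in range(1, 27)]
    conts = [f"I{i}" for i in range(1, 14)]

    # Encode categoricals; impute and standardize continuous columns.
    cat_features = cats >> nvt.ops.Categorify()
    cont_features = conts >> nvt.ops.FillMissing() >> nvt.ops.Normalize()

    workflow = nvt.Workflow(cat_features + cont_features + ["label"], client=client)

    # nvt.Dataset reads the parquet files lazily in GPU-sized partitions.
    dataset = nvt.Dataset("/data/criteo/*.parquet", engine="parquet", part_size="256MB")

    workflow.fit(dataset)
    workflow.transform(dataset).to_parquet(output_path="/data/criteo_processed/")
```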
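For the second point, a minimal sketch of streaming the preprocessed output into PyTorch with `TorchAsyncItr` and `DLDataLoader` from `nvtabular.loader.torch`. The batch size, paths, and column names are again placeholder assumptions, and the model and training step are elided.

```python
# A minimal sketch of the NVTabular dataloader for PyTorch, assuming the
# preprocessed parquet output and column names from the ETL sketch above.
import nvtabular as nvt
from nvtabular.loader.torch import TorchAsyncItr, DLDataLoader

cats = [f"C{i}" for i in range(1, 27)]
conts = [f"I{i}" for i in range(1, 14)]

dataset = nvt.Dataset("/data/criteo_processed/*.parquet", engine="parquet")

# TorchAsyncItr produces GPU-resident batches directly from parquet,
# so batching happens here rather than in the DataLoader.
batches = TorchAsyncItr(
    dataset,
    batch_size=65536,
    cats=cats,
    conts=conts,
    labels=["label"],
)
loader = DLDataLoader(
    batches, batch_size=None, collate_fn=lambda x: x, pin_memory=False, num_workers=0
)

for batch in loader:
    # Each batch holds the categorical, continuous, and label tensors;
    # the forward/backward pass of your model goes here.
    pass
```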
Refer to the following notebooks:
Training a model: HugeCTR | TensorFlow
Serving a model with Triton Inference Server: HugeCTR | TensorFlow