DLRM using SparseOperationKit

This tutorial demonstrates how to build a DLRM model with SparseOperationKit (SOK).

You can find the source code in sparse_operation_kit/documents/tutorials/DLRM/.

Steps

Generate datasets

The Criteo Terabyte dataset will be used. Download the files first; then there are two options for generating the CSV datasets.

[Option 1]

Follow TensorFlow's instructions to process these files and save them as CSV files; a rough sketch of the conversion follows.
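
As an illustration only (not the official preprocessing script), the sketch below converts one raw Criteo Terabyte day file into CSV. It assumes the raw tab-separated layout of one click label, 13 integer features, and 26 hexadecimal categorical features per line; the bucket count VOCAB_SIZE_PER_SLOT and the helper name day_tsv_to_csv are hypothetical.

# Rough sketch, not the official script: convert one raw Criteo day file (TSV) to CSV.
# Assumed layout per line: <label> \t 13 integer features \t 26 hex categorical features.
import csv

VOCAB_SIZE_PER_SLOT = 2 ** 24  # hypothetical hashing bucket count

def day_tsv_to_csv(tsv_path, csv_path):
    with open(tsv_path) as fin, open(csv_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        for line in fin:
            fields = line.rstrip("\n").split("\t")
            fields += [""] * (40 - len(fields))  # pad lines with trailing blanks
            label = fields[0] or "0"
            dense = [x if x else "0" for x in fields[1:14]]
            cats = [str(int(x, 16) % VOCAB_SIZE_PER_SLOT) if x else "0"
                    for x in fields[14:40]]
            writer.writerow([label] + dense + cats)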

[Option 2]

Follow HugeCTR's instructions to process these files, then convert the generated binary files to CSV files with the provided bin2csv.py script (a sketch of the conversion follows the commands):

$ python3 bin2csv.py \
    --input_file="YourBinaryFilePath/train.bin" \
    --num_output_files=1024 \
    --output_path="./train/" \
    --save_prefix="train_"
$ python3 bin2csv.py \
    --input_file="YourBinaryFilePath/test.bin" \
    --num_output_files=64 \
    --output_path="./test/" \
    --save_prefix="test_"
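
For reference, here is a minimal sketch of what such a conversion does, assuming HugeCTR's one-hot binary layout of 40 contiguous int32 values per sample (1 label + 13 dense features + 26 categorical IDs). The shipped bin2csv.py is authoritative, and unlike this sketch it also splits the output across many files.

# Hypothetical sketch of the binary-to-CSV conversion; assumes 40 int32 values per sample.
# Note: np.fromfile loads the whole file, so this is only practical for small slices.
import csv
import numpy as np

INTS_PER_SAMPLE = 1 + 13 + 26  # assumed record layout: label, dense, categorical

def bin_to_csv(bin_path, csv_path, num_samples=None):
    data = np.fromfile(bin_path, dtype=np.int32).reshape(-1, INTS_PER_SAMPLE)
    if num_samples is not None:
        data = data[:num_samples]
    with open(csv_path, "w", newline="") as fout:
        csv.writer(fout).writerows(data.tolist())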

Set common parameters

EMBEDDING_DIM is used both as the embedding vector size and as the width of the bottom MLP's final layer, since DLRM's dot-product feature interaction requires the two to match.

$ export EMBEDDING_DIM=32

Run DLRM with TensorFlow

$ mpiexec --allow-run-as-root -np 4 \
    python3 main.py \
        --global_batch_size=16384 \
        --train_file_pattern="./train/*.csv" \
        --test_file_pattern="./test/*.csv" \
        --embedding_layer="TF" \
        --embedding_vec_size=$EMBEDDING_DIM \
        --bottom_stack 512 256 $EMBEDDING_DIM \
        --top_stack 1024 1024 512 256 1 \
        --distribute_strategy="multiworker" \
        --TF_MP=1
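
Presumably, --distribute_strategy="multiworker" selects tf.distribute.MultiWorkerMirroredStrategy inside main.py, and --TF_MP=1 appears to enable model-parallel placement of the TensorFlow embedding tables; see main.py for the details. A minimal sketch of the multi-worker setup, assuming a hypothetical build_dlrm helper:

# Minimal multi-worker sketch; cluster resolution (e.g. TF_CONFIG or MPI ranks) is omitted.
import tensorflow as tf

strategy = tf.distribute.MultiWorkerMirroredStrategy()
with strategy.scope():
    model = build_dlrm(embedding_vec_size=32)  # hypothetical model-construction helper
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)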

Run DLRM with SOK

$ mpiexec --allow-run-as-root -np 4 \
    python3 main.py \
        --global_batch_size=16384 \
        --train_file_pattern="./train/*.csv" \
        --test_file_pattern="./test/*.csv" \
        --embedding_layer="SOK" \
        --embedding_vec_size=$EMBEDDING_DIM \
        --bottom_stack 512 256 $EMBEDDING_DIM \
        --top_stack 1024 1024 512 256 1 \
        --distribute_strategy="multiworker"
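
With --embedding_layer="SOK", main.py swaps the TensorFlow embedding for a SparseOperationKit layer, which handles model-parallel sharding of the embedding tables itself, so no --TF_MP flag is passed here. A rough sketch of the SOK calls involved, with hypothetical sizes:

# Rough sketch of SOK usage inside a tf.distribute strategy scope.
import tensorflow as tf
import sparse_operation_kit as sok

strategy = tf.distribute.MultiWorkerMirroredStrategy()
with strategy.scope():
    sok.Init(global_batch_size=16384)  # must run inside the strategy scope
    embedding = sok.All2AllDenseEmbedding(
        max_vocabulary_size_per_gpu=2 ** 22,  # hypothetical per-GPU capacity
        embedding_vec_size=32,                # matches $EMBEDDING_DIM
        slot_num=26,                          # 26 categorical features
        nnz_per_slot=1)                       # one ID per slot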

References

  1. DLRM (https://arxiv.org/pdf/1906.00091.pdf)

  2. Criteo Terabyte dataset (https://labs.criteo.com/2013/12/download-terabyte-click-logs/)

  3. TensorFlow DLRM model (https://github.com/tensorflow/models/tree/master/official/recommendation/ranking)