DLRM using SparseOperationKit
This tutorial demonstrates how to build a DLRM model with SparseOperationKit (SOK).
You can find the source code in sparse_operation_kit/documents/tutorials/DLRM/.
Steps
Generate datasets
This tutorial uses the Criteo Terabyte dataset. Download the dataset files, then generate the training and test datasets using one of the following options.
[Option1]
Follow TensorFlow’s instructions to process these files and save them as CSV files.
[Option2]
Follow HugeCTR’s instructions to process these files, then convert the generated binary files to CSV files:
$ python3 bin2csv.py \
--input_file="YourBinaryFilePath/train.bin" \
--num_output_files=1024 \
--output_path="./train/" \
--save_prefix="train_"
$ python3 bin2csv.py \
--input_file="YourBinaryFilePath/test.bin" \
--num_output_files=64 \
--output_path="./test/" \
--save_prefix="test_"
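After the conversion, it may be worth confirming that the expected number of CSV files exists before starting a run. The sketch below counts the files matched by the same glob patterns that the training commands pass to main.py. It assumes that bin2csv.py writes exactly --num_output_files files with a .csv extension into --output_path; the helper name count_csv_files is hypothetical and is not part of SOK.

```python
import glob


def count_csv_files(pattern):
    """Return the number of files matching a glob pattern."""
    return len(glob.glob(pattern))


if __name__ == "__main__":
    # Expected counts mirror --num_output_files in the commands above.
    # Assumption: bin2csv.py writes one .csv file per requested output file.
    expected = {"./train/*.csv": 1024, "./test/*.csv": 64}
    for pattern, want in expected.items():
        got = count_csv_files(pattern)
        status = "ok" if got == want else "mismatch"
        print(f"{pattern}: found {got}, expected {want} ({status})")
```

A mismatch usually means the conversion was interrupted or written to a different directory, so rerunning the corresponding bin2csv.py command is the first thing to try.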
Set common params
$ export EMBEDDING_DIM=32
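EMBEDDING_DIM is exported once because each run command uses it twice: as --embedding_vec_size and as the last entry of --bottom_stack. In the standard DLRM architecture, the output of the bottom MLP is combined with the embedding vectors in the feature-interaction step, so the two widths need to agree. The following is a minimal sketch of that consistency check; check_dlrm_dims is a hypothetical helper, and main.py is not known to expose anything like it.

```python
def check_dlrm_dims(embedding_vec_size, bottom_stack, top_stack):
    """Validate DLRM layer widths as they would be passed on the command line.

    Returns True when the widths are consistent, otherwise raises ValueError.
    """
    # The bottom MLP output is interacted with the embedding vectors,
    # so its last layer must have the same width as one embedding vector.
    if bottom_stack[-1] != embedding_vec_size:
        raise ValueError(
            f"last bottom_stack width {bottom_stack[-1]} does not match "
            f"embedding_vec_size {embedding_vec_size}"
        )
    # The top MLP ends in a single unit that produces the click prediction.
    if top_stack[-1] != 1:
        raise ValueError(f"last top_stack width must be 1, got {top_stack[-1]}")
    return True


if __name__ == "__main__":
    embedding_dim = 32  # same value as EMBEDDING_DIM above
    check_dlrm_dims(
        embedding_dim,
        bottom_stack=[512, 256, embedding_dim],
        top_stack=[1024, 1024, 512, 256, 1],
    )
    print("layer widths are consistent")
```

Keeping the value in one environment variable avoids the case where the embedding size is changed in one flag but not in the other.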
Run DLRM with TensorFlow
$ mpiexec --allow-run-as-root -np 4 \
python3 main.py \
--global_batch_size=16384 \
--train_file_pattern="./train/*.csv" \
--test_file_pattern="./test/*.csv" \
--embedding_layer="TF" \
--embedding_vec_size=$EMBEDDING_DIM \
--bottom_stack 512 256 $EMBEDDING_DIM \
--top_stack 1024 1024 512 256 1 \
--distribute_strategy="multiworker" \
--TF_MP=1
Run DLRM with SOK
$ mpiexec --allow-run-as-root -np 4 \
python3 main.py \
--global_batch_size=16384 \
--train_file_pattern="./train/*.csv" \
--test_file_pattern="./test/*.csv" \
--embedding_layer="SOK" \
--embedding_vec_size=$EMBEDDING_DIM \
--bottom_stack 512 256 $EMBEDDING_DIM \
--top_stack 1024 1024 512 256 1 \
--distribute_strategy="multiworker"
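Both run commands launch four MPI processes with a global batch size of 16384. Assuming the global batch is split evenly across processes, with one GPU per process (an assumption about main.py, not something stated in this tutorial), each process handles 4096 samples per step. A small sketch of that arithmetic, using a hypothetical helper:

```python
def per_process_batch_size(global_batch_size, num_processes):
    """Return the number of samples each process handles per step.

    Assumes the global batch is divided evenly across all processes.
    """
    if num_processes <= 0:
        raise ValueError("num_processes must be positive")
    if global_batch_size % num_processes != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible "
            f"by {num_processes} processes"
        )
    return global_batch_size // num_processes


if __name__ == "__main__":
    # Values taken from --global_batch_size and mpiexec -np in the commands above.
    print(per_process_batch_size(16384, 4))
```

When changing the -np value, choosing a global batch size that divides evenly keeps the per-process workload uniform.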
References
Criteo Terabyte Click Logs dataset (https://labs.criteo.com/2013/12/download-terabyte-click-logs/)
TensorFlow DLRM model (https://github.com/tensorflow/models/tree/master/official/recommendation/ranking)