# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

# Each user is responsible for checking the content of datasets and the
# applicable licenses and determining if suitable for the intended use.

HugeCTR Embedding Collection

About this Notebook

This notebook covers the following:

  • The API of the embedding collection.

  • The embedding table placement strategy (ETPS) and how to configure it in an embedding collection.

  • How to use an embedding collection in a DLRM model with the Criteo dataset for training and evaluation. Two different ETPS configurations are shown for reference.

Concepts and API Reference

The following key classes and configuration file are used in this notebook:

  • hugectr.EmbeddingTableConfig

  • hugectr.EmbeddingPlanner

  • JSON plan file for the ETPS

For concepts and API reference information about these classes and the plan file, see "Overview of Using the HugeCTR Embedding Collection" in the HugeCTR Layer Classes and Methods documentation.
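As a rough illustration of the JSON plan file, the sketch below builds a toy plan for two GPUs and three tables by hand and serializes it. The field names (local_embedding_list, table_placement_strategy, global_embedding_list) are the ones that appear in the plan generated later in this notebook. The table IDs and the two-GPU layout are made up for illustration, and the authoritative schema is the one in the HugeCTR documentation.

```python
import json

# Hypothetical toy setup: tables 0 and 1 are model parallel ("mp"),
# table 2 is data parallel ("dp") and is therefore listed on every GPU.
mp_per_gpu = [[0], [1]]  # GPU 0 holds table 0, GPU 1 holds table 1
dp_per_gpu = [[2], [2]]  # both GPUs hold a replica of table 2

plan = []
for gpu_id in range(2):
    plan.append([
        {
            "local_embedding_list": mp_per_gpu[gpu_id],
            "table_placement_strategy": "mp",
            "global_embedding_list": mp_per_gpu,
        },
        {
            "local_embedding_list": dp_per_gpu[gpu_id],
            "table_placement_strategy": "dp",
            "global_embedding_list": dp_per_gpu,
        },
    ])

# One list entry per GPU, each holding one dict per placement strategy.
plan_json = json.dumps(plan, indent=4)
print(plan_json)
```

Each GPU entry carries both its own tables (local view) and the tables of every GPU (global view), so any single entry describes the whole placement.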

Setup

To set up the environment, refer to HugeCTR Example Notebooks and follow the instructions there before running the following cells.

Use an Embedding Collection with a DLRM Model

Data Preparation

To download and prepare the dataset, follow the steps below. The shell commands at the end of this section run all three steps from a terminal.

Note: If you have already downloaded the data, skip to the preprocessing step (2). If preprocessing is also done, skip to step (3), which creates the soft link from the notebooks/ directory to the processed data.

  1. Download the Criteo dataset.

  2. Preprocess the dataset. Preprocessing the downloaded Kaggle Criteo dataset involves the following operations:

    • Reduce the amount of data to speed up the preprocessing.

    • Fill in missing values.

    • Remove feature values that occur very rarely.

    Meanings of the command-line arguments of the preprocessing script, using the Pandas invocation 1 wdl_data pandas 1 1 100 as an example:

    • The 1st argument represents the dataset postfix. It is 1 here since day_1 is used.

    • The 2nd argument, wdl_data, is where the preprocessed data is stored.

    • The 3rd argument, pandas, is the processing script to use. Here we choose pandas.

    • The 4th argument, 1, means that normalization is applied to the dense features.

    • The 5th argument, 1, means that feature crossing is applied.

    • The 6th argument, 100, is the number of data files in each file list.

    Note that the commands at the end of this section invoke the script with different values (0 deepfm_data_nvt nvt 1 0 0), which select day_0, the deepfm_data_nvt output directory, and the nvt processing script. Check the README for the meaning of the remaining arguments in that mode.

    For more details about the data preprocessing, refer to the “Preprocess the Criteo Dataset” section of the README in the samples/criteo directory of the repository on GitHub.

  3. Create a soft link from the directory of this notebook to the dataset folder.

Run the following commands in a terminal to prepare the data for this notebook:

export project_root=/home/hugectr # set this to the directory where hugectr is downloaded
cd ${project_root}/tools
# Step 1: download the dataset
wget https://storage.googleapis.com/criteo-cail-datasets/day_0.gz
# Step 2: preprocess the dataset
bash preprocess.sh 0 deepfm_data_nvt nvt 1 0 0
# Step 3: link the preprocessed data into the notebooks directory
ln -s ${project_root}/tools/deepfm_data_nvt ${project_root}/notebooks/deepfm_data_nvt
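
Two of the preprocessing operations listed above, filling missing values and removing rare feature values, can be sketched in plain Python. This is an illustration only and not the logic of preprocess.sh: the sample column, the fill value "missing", the replacement token "rare", and the threshold of 2 occurrences are all assumptions chosen for the example. Mapping rare values to one shared token is one possible reading of "remove".

```python
from collections import Counter

# Toy categorical column with missing entries (None); the values are made up.
column = ["a", "b", None, "a", "c", "a", None, "b"]

FILL_VALUE = "missing"  # assumed placeholder for missing entries
RARE_TOKEN = "rare"     # assumed token that replaces infrequent values
MIN_COUNT = 2           # assumed frequency threshold


def fill_missing(values, fill_value):
    """Replace None entries with a fixed placeholder."""
    return [fill_value if v is None else v for v in values]


def drop_rare(values, min_count, rare_token):
    """Map values that occur fewer than min_count times to one shared token."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else rare_token for v in values]


filled = fill_missing(column, FILL_VALUE)
cleaned = drop_rare(filled, MIN_COUNT, RARE_TOKEN)
print(cleaned)
# ['a', 'b', 'missing', 'a', 'rare', 'a', 'missing', 'b']
```

Collapsing rare values reduces the number of distinct keys per feature, which in turn keeps the corresponding embedding table smaller.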

Prepare the Training Script

This notebook was developed on a single DGX-1, which has 8 V100-SXM2 GPUs, to run the DLRM model. The GPU information for the DGX-1 is as follows:

! nvidia-smi
Thu Jun 23 00:14:56 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   33C    P0    42W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   35C    P0    45W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
| N/A   36C    P0    44W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
| N/A   33C    P0    42W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
| N/A   36C    P0    44W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   35C    P0    42W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   36C    P0    44W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   34C    P0    41W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The training script, dlrm_train.py, uses the embedding collection API. The script accepts one command-line argument that specifies the plan file, so we can run the script several times and evaluate different ETPS configurations:

%%writefile dlrm_train.py
import sys
import hugectr

plan_file = sys.argv[1]
slot_size_array = [203931, 18598, 14092, 7012, 18977, 4, 6385, 1245, 49,
                   186213, 71328, 67288, 11, 2168, 7338, 61, 4, 932, 15,
                   204515, 141526, 199433, 60919, 9137, 71, 34]

solver = hugectr.CreateSolver(
    max_eval_batches=70,
    batchsize_eval=65536,
    batchsize=65536,
    lr=0.5,
    warmup_steps=300,
    vvgpu=[[0, 1, 2, 3, 4, 5, 6, 7]],
    repeat_dataset=True,
    i64_input_key=True,
    metrics_spec={hugectr.MetricsType.AverageLoss: 0.0},
    use_embedding_collection=True,
)

reader = hugectr.DataReaderParams(
    data_reader_type=hugectr.DataReaderType_t.Parquet,
    source=["./deepfm_data_nvt/train/_file_list.txt"],
    eval_source="./deepfm_data_nvt/val/_file_list.txt",
    check_type=hugectr.Check_t.Non,
    slot_size_array=slot_size_array
)

optimizer = hugectr.CreateOptimizer(
    optimizer_type=hugectr.Optimizer_t.SGD,
    update_type=hugectr.Update_t.Local,
    atomic_update=True
)

model = hugectr.Model(solver, reader, optimizer)

model.add(
    hugectr.Input(
        label_dim=1,
        label_name="label",
        dense_dim=13,
        dense_name="dense",
        data_reader_sparse_param_array=[
            hugectr.DataReaderSparseParam("data{}".format(i), 1, False, 1)
            for i in range(len(slot_size_array))
        ],
    )
)

# Create one embedding table configuration per categorical feature (slot).
embedding_table_list = []
for i in range(len(slot_size_array)):
    embedding_table_list.append(
        hugectr.EmbeddingTableConfig(
            table_id=i,
            max_vocabulary_size=slot_size_array[i],
            ev_size=128,
            min_key=0,
            max_key=slot_size_array[i],
        )
    )

# Register one lookup per table with the planner, then build the embedding
# collection from the placement plan file.
embedding_planner = hugectr.EmbeddingPlanner()
for i in range(len(slot_size_array)):
    embedding_planner.embedding_lookup(
        table_config=embedding_table_list[i],
        bottom_name="data{}".format(i),
        top_name="emb_vec{}".format(i),
        combiner="sum"
    )

embedding_collection = embedding_planner.create_embedding_collection(plan_file)

model.add(embedding_collection)
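# The Interaction layer below takes a single embedding tensor, so concatenate
# the per-table lookup outputs along axis 1.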
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.Concat,
        bottom_names=["emb_vec{}".format(i) for i in range(len(slot_size_array))],
        top_names=["sparse_embedding1"],
        axis=1
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["dense"],
        top_names=["fc1"],
        num_output=512
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc1"], top_names=["relu1"]
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu1"],
        top_names=["fc2"],
        num_output=256
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc2"], top_names=["relu2"]
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu2"],
        top_names=["fc3"],
        num_output=128
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc3"], top_names=["relu3"]
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.Interaction,  # Interaction supports only 3-D input
        bottom_names=["relu3", "sparse_embedding1"],
        top_names=["interaction1"],
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["interaction1"],
        top_names=["fc4"],
        num_output=1024,
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc4"], top_names=["relu4"]
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu4"],
        top_names=["fc5"],
        num_output=1024,
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc5"], top_names=["relu5"]
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu5"],
        top_names=["fc6"],
        num_output=512,
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc6"], top_names=["relu6"]
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu6"],
        top_names=["fc7"],
        num_output=256,
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc7"], top_names=["relu7"]
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu7"],
        top_names=["fc8"],
        num_output=1,
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.BinaryCrossEntropyLoss,
        bottom_names=["fc8", "label"],
        top_names=["loss"],
    )
)

model.compile()
model.summary()
model.fit(
    max_iter=1000,
    display=100,
    eval_interval=100,
    snapshot=10000000,
    snapshot_prefix="dlrm",
)
Overwriting dlrm_train.py
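
To get a feel for the size of the embedding tables configured in the script, the following back-of-the-envelope calculation multiplies each table's vocabulary size by the embedding vector size. It assumes 4-byte (float32) embedding values and ignores optimizer state and any padding or allocator overhead, so treat the result as a rough estimate rather than the actual GPU memory usage.

```python
# Vocabulary sizes and embedding vector size copied from dlrm_train.py.
slot_size_array = [203931, 18598, 14092, 7012, 18977, 4, 6385, 1245, 49,
                   186213, 71328, 67288, 11, 2168, 7338, 61, 4, 932, 15,
                   204515, 141526, 199433, 60919, 9137, 71, 34]
EV_SIZE = 128
BYTES_PER_VALUE = 4  # assumption: embeddings are stored as float32

total_rows = sum(slot_size_array)
total_bytes = total_rows * EV_SIZE * BYTES_PER_VALUE

# Index and size of the largest single table.
largest_id = max(range(len(slot_size_array)), key=lambda i: slot_size_array[i])
largest_bytes = slot_size_array[largest_id] * EV_SIZE * BYTES_PER_VALUE

print("tables:", len(slot_size_array))
print("total rows:", total_rows)
print("total size: {:.1f} MiB".format(total_bytes / 2**20))
print("largest table: id {} ({:.1f} MiB)".format(largest_id, largest_bytes / 2**20))
```

The total of 1,221,286 rows agrees with the "Vocabulary size: 1221286" line in the training log, and under the float32 assumption all 26 tables together take about 596 MiB.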

Embedding Table Placement Strategy: Data Parallel and Model Parallel

The following generate_plan() function shows how to configure small tables as data parallel and larger tables as model parallel. Tables with more than 6,000 rows are model parallel: each such table is placed on a single GPU and different GPUs hold different tables, in the same way as hugectr.LocalizedHashEmbedding. The remaining small tables are data parallel and appear in the plan of every GPU.

def print_plan(plan):
    # Print the placement groups of each GPU, one GPU at a time.
    for gpu_id, single_gpu_plan in enumerate(plan):
        print("single_gpu_plan index = {}".format(gpu_id))
        for plan_attr in single_gpu_plan:
            for key in plan_attr:
                if key != "global_embedding_list":
                    print("\t{}:{}".format(key, plan_attr[key]))
                else:
                    prefix_len = len(key)
                    left_space_fill = " " * prefix_len
                    print("\t{}:{}".format(key, plan_attr[key][0]))
                    for index in range(1, len(plan_attr[key])):
                        print("\t{}:{}".format(left_space_fill, plan_attr[key][index]))


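def generate_plan(slot_size_array, gpu_count, plan_file):
    # Tables with more than 6000 rows are model parallel (mp);
    # the remaining tables are data parallel (dp).
    mp_table = [i for i in range(len(slot_size_array)) if slot_size_array[i] > 6000]
    dp_table = [i for i in range(len(slot_size_array)) if slot_size_array[i] <= 6000]

    # Assign the mp tables to GPUs round-robin and list every dp table on each GPU.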
    plan = []
    for gpu_id in range(gpu_count):
        single_gpu_plan = []
        mp_plan = {
            "local_embedding_list": [
                table_id
                for i, table_id in enumerate(mp_table)
                if i % gpu_count == gpu_id
            ],
            "table_placement_strategy": "mp",
        }
        dp_plan = {"local_embedding_list": dp_table, "table_placement_strategy": "dp"}
        single_gpu_plan.append(mp_plan)
        single_gpu_plan.append(dp_plan)
        plan.append(single_gpu_plan)

    # Generate the global view of table placement.
    mp_global_embedding_list = []
    dp_global_embedding_list = []
    for single_gpu_plan in plan:
        mp_global_embedding_list.append(single_gpu_plan[0]["local_embedding_list"])
        dp_global_embedding_list.append(single_gpu_plan[1]["local_embedding_list"])
    for single_gpu_plan in plan:
        single_gpu_plan[0]["global_embedding_list"] = mp_global_embedding_list
        single_gpu_plan[1]["global_embedding_list"] = dp_global_embedding_list
    print_plan(plan)

    # Write the plan file to disk.
    import json
    with open(plan_file, "w") as f:
        json.dump(plan, f, indent=4)


# Vocabulary size of each of the 26 tables, matching dlrm_train.py.
slot_size_array = [
    203931,
    18598,
    14092,
    7012,
    18977,
    4,
    6385,
    1245,
    49,
    186213,
    71328,
    67288,
    11,
    2168,
    7338,
    61,
    4,
    932,
    15,
    204515,
    141526,
    199433,
    60919,
    9137,
    71,
    34,
]

generate_plan(
    slot_size_array=slot_size_array,
    gpu_count=8,
    plan_file="./dp_and_localized_plan.json",
)
single_gpu_plan index = 0
	local_embedding_list:[0, 11]
	table_placement_strategy:mp
	global_embedding_list:[0, 11]
	                     :[1, 14]
	                     :[2, 19]
	                     :[3, 20]
	                     :[4, 21]
	                     :[6, 22]
	                     :[9, 23]
	                     :[10]
	local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	table_placement_strategy:dp
	global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
single_gpu_plan index = 1
	local_embedding_list:[1, 14]
	table_placement_strategy:mp
	global_embedding_list:[0, 11]
	                     :[1, 14]
	                     :[2, 19]
	                     :[3, 20]
	                     :[4, 21]
	                     :[6, 22]
	                     :[9, 23]
	                     :[10]
	local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	table_placement_strategy:dp
	global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
single_gpu_plan index = 2
	local_embedding_list:[2, 19]
	table_placement_strategy:mp
	global_embedding_list:[0, 11]
	                     :[1, 14]
	                     :[2, 19]
	                     :[3, 20]
	                     :[4, 21]
	                     :[6, 22]
	                     :[9, 23]
	                     :[10]
	local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	table_placement_strategy:dp
	global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
single_gpu_plan index = 3
	local_embedding_list:[3, 20]
	table_placement_strategy:mp
	global_embedding_list:[0, 11]
	                     :[1, 14]
	                     :[2, 19]
	                     :[3, 20]
	                     :[4, 21]
	                     :[6, 22]
	                     :[9, 23]
	                     :[10]
	local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	table_placement_strategy:dp
	global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
single_gpu_plan index = 4
	local_embedding_list:[4, 21]
	table_placement_strategy:mp
	global_embedding_list:[0, 11]
	                     :[1, 14]
	                     :[2, 19]
	                     :[3, 20]
	                     :[4, 21]
	                     :[6, 22]
	                     :[9, 23]
	                     :[10]
	local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	table_placement_strategy:dp
	global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
single_gpu_plan index = 5
	local_embedding_list:[6, 22]
	table_placement_strategy:mp
	global_embedding_list:[0, 11]
	                     :[1, 14]
	                     :[2, 19]
	                     :[3, 20]
	                     :[4, 21]
	                     :[6, 22]
	                     :[9, 23]
	                     :[10]
	local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	table_placement_strategy:dp
	global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
single_gpu_plan index = 6
	local_embedding_list:[9, 23]
	table_placement_strategy:mp
	global_embedding_list:[0, 11]
	                     :[1, 14]
	                     :[2, 19]
	                     :[3, 20]
	                     :[4, 21]
	                     :[6, 22]
	                     :[9, 23]
	                     :[10]
	local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	table_placement_strategy:dp
	global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
single_gpu_plan index = 7
	local_embedding_list:[10]
	table_placement_strategy:mp
	global_embedding_list:[0, 11]
	                     :[1, 14]
	                     :[2, 19]
	                     :[3, 20]
	                     :[4, 21]
	                     :[6, 22]
	                     :[9, 23]
	                     :[10]
	local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	table_placement_strategy:dp
	global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
	                     :[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
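
Before passing a plan file to the training script, it can help to sanity-check it. The sketch below checks properties that the printed plan above satisfies: every model-parallel table is assigned to exactly one GPU, every data-parallel table appears on every GPU, and no table uses both strategies. The function name check_plan and the toy plan are hypothetical. The first check applies only to plans of the kind generated above, where each model-parallel table is placed whole on a single GPU. It would not hold for a plan that shards one table across several GPUs, and HugeCTR may impose further constraints that are not covered here.

```python
from collections import Counter


def check_plan(plan):
    """Return (mp_tables, dp_tables) after checking basic placement invariants.

    plan: list with one entry per GPU; each entry is a list of dicts with
    "local_embedding_list" and "table_placement_strategy" keys.
    """
    gpu_count = len(plan)
    mp_counts = Counter()
    dp_counts = Counter()
    for single_gpu_plan in plan:
        for group in single_gpu_plan:
            strategy = group["table_placement_strategy"]
            counts = mp_counts if strategy == "mp" else dp_counts
            counts.update(group["local_embedding_list"])

    # A model-parallel table must live on exactly one GPU.
    for table_id, n in mp_counts.items():
        if n != 1:
            raise ValueError(f"mp table {table_id} is placed on {n} GPUs")
    # A data-parallel table must be listed on every GPU.
    for table_id, n in dp_counts.items():
        if n != gpu_count:
            raise ValueError(f"dp table {table_id} is on {n} of {gpu_count} GPUs")
    # No table may use both strategies.
    overlap = set(mp_counts) & set(dp_counts)
    if overlap:
        raise ValueError(f"tables with both strategies: {sorted(overlap)}")
    return sorted(mp_counts), sorted(dp_counts)


# Toy two-GPU plan, made up for illustration.
toy_plan = [
    [{"local_embedding_list": [0], "table_placement_strategy": "mp"},
     {"local_embedding_list": [2], "table_placement_strategy": "dp"}],
    [{"local_embedding_list": [1], "table_placement_strategy": "mp"},
     {"local_embedding_list": [2], "table_placement_strategy": "dp"}],
]
mp_tables, dp_tables = check_plan(toy_plan)
print(mp_tables, dp_tables)
# [0, 1] [2]
```

To check a real plan, load it with json.load() from the file written by generate_plan() and pass the resulting list to check_plan().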
!python3 dlrm_train.py ./dp_and_localized_plan.json
HugeCTR Version: 3.7
====================================================Model Init=====================================================
[HCTR][08:41:17.942][WARNING][RK0][main]: The model name is not specified when creating the solver.
[HCTR][08:41:17.942][INFO][RK0][main]: Global seed is 1323844045
[HCTR][08:41:18.400][INFO][RK0][main]: Device to NUMA mapping:
  GPU 0 ->  node 0
  GPU 1 ->  node 0
  GPU 2 ->  node 0
  GPU 3 ->  node 0
  GPU 4 ->  node 1
  GPU 5 ->  node 1
  GPU 6 ->  node 1
  GPU 7 ->  node 1
[HCTR][08:41:29.902][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
[HCTR][08:41:29.903][INFO][RK0][main]: Start all2all warmup
[HCTR][08:41:30.083][INFO][RK0][main]: End all2all warmup
[HCTR][08:41:30.095][INFO][RK0][main]: Using All-reduce algorithm: NCCL
[HCTR][08:41:30.097][INFO][RK0][main]: Device 0: Tesla V100-SXM2-16GB
[HCTR][08:41:30.097][INFO][RK0][main]: Device 1: Tesla V100-SXM2-16GB
[HCTR][08:41:30.098][INFO][RK0][main]: Device 2: Tesla V100-SXM2-16GB
[HCTR][08:41:30.099][INFO][RK0][main]: Device 3: Tesla V100-SXM2-16GB
[HCTR][08:41:30.100][INFO][RK0][main]: Device 4: Tesla V100-SXM2-16GB
[HCTR][08:41:30.100][INFO][RK0][main]: Device 5: Tesla V100-SXM2-16GB
[HCTR][08:41:30.101][INFO][RK0][main]: Device 6: Tesla V100-SXM2-16GB
[HCTR][08:41:30.102][INFO][RK0][main]: Device 7: Tesla V100-SXM2-16GB
[HCTR][08:41:30.103][INFO][RK0][main]: num of DataReader workers: 8
[HCTR][08:41:30.133][DEBUG][RK0][tid #140397011531520]: file_name_ deepfm_data_nvt/train/1.c7b6f2423fec47ff97a09ec95f6346f9.parquet file_total_rows_ 4585117
[HCTR][08:41:30.133][DEBUG][RK0][tid #140397774890752]: file_name_ deepfm_data_nvt/train/5.c5b89db1e82d4842998d560796eab838.parquet file_total_rows_ 4583901
[HCTR][08:41:30.140][DEBUG][RK0][tid #140397783283456]: file_name_ deepfm_data_nvt/train/4.4f7e95ed8f9b4bcc9b63c5f3278e6905.parquet file_total_rows_ 4580476
[HCTR][08:41:30.140][DEBUG][RK0][tid #140394788546304]: file_name_ deepfm_data_nvt/train/6.92133f3ee3664684854969202958122f.parquet file_total_rows_ 4581782
[HCTR][08:41:30.140][DEBUG][RK0][tid #140406197053184]: file_name_ deepfm_data_nvt/train/2.6b134d3f8f0a4f0d9453f1d7c08f74d5.parquet file_total_rows_ 4584304
[HCTR][08:41:30.140][DEBUG][RK0][tid #140397829383936]: file_name_ deepfm_data_nvt/train/3.4b192542e2ad4cc8b745feb142d1878a.parquet file_total_rows_ 4581022
[HCTR][08:41:30.140][DEBUG][RK0][tid #140394780153600]: file_name_ deepfm_data_nvt/train/7.9345ade3421b40a5803f518c48ae436f.parquet file_total_rows_ 4589169
[HCTR][08:41:30.141][DEBUG][RK0][tid #140397003138816]: file_name_ deepfm_data_nvt/train/0.1738817c5c5c47dba75a428d0837cbc3.parquet file_total_rows_ 4586722
[HCTR][08:41:30.155][INFO][RK0][main]: Vocabulary size: 1221286
[HCTR][08:41:30.156][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:41:30.156][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:41:30.156][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:41:30.156][DEBUG][RK0][tid #140394528503552]: [HCTR][08:41:30.156][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:41:30.156][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:41:30.174][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:41:30.179][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:41:30.767][INFO][RK0][main]: Graph analysis to resolve tensor dependency
===================================================Model Compile===================================================
===================================================Model Summary===================================================
[HCTR][08:42:08.060][INFO][RK0][main]: label                                   Dense                         Sparse                        
label                                   dense                          data0,data1,data2,data3,data4,data5,data6,data7,data8,data9,data10,data11,data12,data13,data14,data15,data16,data17,data18,data19,data20,data21,data22,data23,data24,data25
(None, 1)                               (None, 13)                              
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
Layer Type                              Input Name                    Output Name                   Output Shape                  
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
EmbeddingCollection                     data0                         emb_vec0                      (None, 1, 128)                
                                        data1                         emb_vec1                      (None, 1, 128)                
                                        data2                         emb_vec2                      (None, 1, 128)                
                                        data3                         emb_vec3                      (None, 1, 128)                
                                        data4                         emb_vec4                      (None, 1, 128)                
                                        data5                         emb_vec5                      (None, 1, 128)                
                                        data6                         emb_vec6                      (None, 1, 128)                
                                        data7                         emb_vec7                      (None, 1, 128)                
                                        data8                         emb_vec8                      (None, 1, 128)                
                                        data9                         emb_vec9                      (None, 1, 128)                
                                        data10                        emb_vec10                     (None, 1, 128)                
                                        data11                        emb_vec11                     (None, 1, 128)                
                                        data12                        emb_vec12                     (None, 1, 128)                
                                        data13                        emb_vec13                     (None, 1, 128)                
                                        data14                        emb_vec14                     (None, 1, 128)                
                                        data15                        emb_vec15                     (None, 1, 128)                
                                        data16                        emb_vec16                     (None, 1, 128)                
                                        data17                        emb_vec17                     (None, 1, 128)                
                                        data18                        emb_vec18                     (None, 1, 128)                
                                        data19                        emb_vec19                     (None, 1, 128)                
                                        data20                        emb_vec20                     (None, 1, 128)                
                                        data21                        emb_vec21                     (None, 1, 128)                
                                        data22                        emb_vec22                     (None, 1, 128)                
                                        data23                        emb_vec23                     (None, 1, 128)                
                                        data24                        emb_vec24                     (None, 1, 128)                
                                        data25                        emb_vec25                     (None, 1, 128)                
------------------------------------------------------------------------------------------------------------------
Concat                                  emb_vec0                      sparse_embedding1             (None, 26, 128)               
                                        emb_vec1                                                                                  
                                        emb_vec2                                                                                  
                                        emb_vec3                                                                                  
                                        emb_vec4                                                                                  
                                        emb_vec5                                                                                  
                                        emb_vec6                                                                                  
                                        emb_vec7                                                                                  
                                        emb_vec8                                                                                  
                                        emb_vec9                                                                                  
                                        emb_vec10                                                                                 
                                        emb_vec11                                                                                 
                                        emb_vec12                                                                                 
                                        emb_vec13                                                                                 
                                        emb_vec14                                                                                 
                                        emb_vec15                                                                                 
                                        emb_vec16                                                                                 
                                        emb_vec17                                                                                 
                                        emb_vec18                                                                                 
                                        emb_vec19                                                                                 
                                        emb_vec20                                                                                 
                                        emb_vec21                                                                                 
                                        emb_vec22                                                                                 
                                        emb_vec23                                                                                 
                                        emb_vec24                                                                                 
                                        emb_vec25                                                                                 
------------------------------------------------------------------------------------------------------------------
InnerProduct                            dense                         fc1                           (None, 512)                   
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc1                           relu1                         (None, 512)                   
------------------------------------------------------------------------------------------------------------------
InnerProduct                            relu1                         fc2                           (None, 256)                   
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc2                           relu2                         (None, 256)                   
------------------------------------------------------------------------------------------------------------------
InnerProduct                            relu2                         fc3                           (None, 128)                   
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc3                           relu3                         (None, 128)                   
------------------------------------------------------------------------------------------------------------------
Interaction                             relu3                         interaction1                  (None, 480)                   
                                        sparse_embedding1                                                                         
------------------------------------------------------------------------------------------------------------------
InnerProduct                            interaction1                  fc4                           (None, 1024)                  
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc4                           relu4                         (None, 1024)                  
------------------------------------------------------------------------------------------------------------------
InnerProduct                            relu4                         fc5                           (None, 1024)                  
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc5                           relu5                         (None, 1024)                  
------------------------------------------------------------------------------------------------------------------
InnerProduct                            relu5                         fc6                           (None, 512)                   
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc6                           relu6                         (None, 512)                   
------------------------------------------------------------------------------------------------------------------
InnerProduct                            relu6                         fc7                           (None, 256)                   
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc7                           relu7                         (None, 256)                   
------------------------------------------------------------------------------------------------------------------
InnerProduct                            relu7                         fc8                           (None, 1)                     
------------------------------------------------------------------------------------------------------------------
BinaryCrossEntropyLoss                  fc8                           loss                                                        
                                        label                                                                                     
------------------------------------------------------------------------------------------------------------------
=====================================================Model Fit=====================================================
[HCTR][08:42:08.061][INFO][RK0][main]: Use non-epoch mode with number of iterations: 1000
[HCTR][08:42:08.061][INFO][RK0][main]: Training batchsize: 65536, evaluation batchsize: 65536
[HCTR][08:42:08.061][INFO][RK0][main]: Evaluation interval: 100, snapshot interval: 10000000
[HCTR][08:42:08.061][INFO][RK0][main]: Dense network trainable: True
[HCTR][08:42:08.061][INFO][RK0][main]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
[HCTR][08:42:08.061][INFO][RK0][main]: lr: 0.500000, warmup_steps: 300, end_lr: 0.000000
[HCTR][08:42:08.061][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
[HCTR][08:42:08.061][INFO][RK0][main]: Training source file: ./deepfm_data_nvt/train/_file_list.txt
[HCTR][08:42:08.061][INFO][RK0][main]: Evaluation source file: ./deepfm_data_nvt/val/_file_list.txt
[HCTR][08:42:16.322][INFO][RK0][main]: Iter: 100 Time(100 iters): 8.19237s Loss: 0.140113 lr:0.168333
[HCTR][08:42:18.453][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:42:18.491][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:42:18.534][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:42:18.572][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:42:18.610][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:42:18.651][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:42:18.684][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:42:18.720][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:42:18.957][INFO][RK0][main]: Evaluation, AverageLoss: 0.141261
[HCTR][08:42:18.957][INFO][RK0][main]: Eval Time for 70 iters: 2.63429s
[HCTR][08:42:27.041][INFO][RK0][main]: Iter: 200 Time(100 iters): 10.6496s Loss: 0.142313 lr:0.335
[HCTR][08:42:29.077][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:42:29.115][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:42:29.157][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:42:29.195][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:42:29.237][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:42:29.275][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:42:29.312][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:42:29.351][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:42:29.532][INFO][RK0][main]: Evaluation, AverageLoss: 0.141891
[HCTR][08:42:29.532][INFO][RK0][main]: Eval Time for 70 iters: 2.4907s
[HCTR][08:42:37.639][INFO][RK0][main]: Iter: 300 Time(100 iters): 10.5395s Loss: 0.154403 lr:0.5
[HCTR][08:42:39.748][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:42:39.785][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:42:39.824][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:42:39.862][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:42:39.905][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:42:39.952][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:42:39.987][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:42:40.021][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:42:40.125][INFO][RK0][main]: Evaluation, AverageLoss: 0.147726
[HCTR][08:42:40.125][INFO][RK0][main]: Eval Time for 70 iters: 2.48534s
[HCTR][08:42:48.262][INFO][RK0][main]: Iter: 400 Time(100 iters): 10.5647s Loss: 0.141461 lr:0.5
[HCTR][08:42:50.199][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:42:50.274][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:42:50.311][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:42:50.462][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:42:50.533][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:42:50.638][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:42:50.675][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:42:50.714][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:42:50.788][INFO][RK0][main]: Evaluation, AverageLoss: 0.140187
[HCTR][08:42:50.788][INFO][RK0][main]: Eval Time for 70 iters: 2.52533s
[HCTR][08:42:58.948][INFO][RK0][main]: Iter: 500 Time(100 iters): 10.605s Loss: 0.142035 lr:0.5
[HCTR][08:43:00.914][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:43:00.951][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:43:00.990][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:43:01.023][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:43:01.057][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:43:01.094][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:43:01.127][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:43:01.163][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:43:01.403][INFO][RK0][main]: Evaluation, AverageLoss: 0.140354
[HCTR][08:43:01.403][INFO][RK0][main]: Eval Time for 70 iters: 2.45442s
[HCTR][08:43:04.871][DEBUG][RK0][tid #140397003138816]: file_name_ deepfm_data_nvt/train/0.1738817c5c5c47dba75a428d0837cbc3.parquet file_total_rows_ 4586722
[HCTR][08:43:04.951][DEBUG][RK0][tid #140397011531520]: file_name_ deepfm_data_nvt/train/1.c7b6f2423fec47ff97a09ec95f6346f9.parquet file_total_rows_ 4585117
[HCTR][08:43:05.031][DEBUG][RK0][tid #140406197053184]: file_name_ deepfm_data_nvt/train/2.6b134d3f8f0a4f0d9453f1d7c08f74d5.parquet file_total_rows_ 4584304
[HCTR][08:43:05.111][DEBUG][RK0][tid #140397829383936]: file_name_ deepfm_data_nvt/train/3.4b192542e2ad4cc8b745feb142d1878a.parquet file_total_rows_ 4581022
[HCTR][08:43:05.192][DEBUG][RK0][tid #140397783283456]: file_name_ deepfm_data_nvt/train/4.4f7e95ed8f9b4bcc9b63c5f3278e6905.parquet file_total_rows_ 4580476
[HCTR][08:43:05.274][DEBUG][RK0][tid #140397774890752]: file_name_ deepfm_data_nvt/train/5.c5b89db1e82d4842998d560796eab838.parquet file_total_rows_ 4583901
[HCTR][08:43:05.354][DEBUG][RK0][tid #140394788546304]: file_name_ deepfm_data_nvt/train/6.92133f3ee3664684854969202958122f.parquet file_total_rows_ 4581782
[HCTR][08:43:06.072][DEBUG][RK0][tid #140394780153600]: file_name_ deepfm_data_nvt/train/7.9345ade3421b40a5803f518c48ae436f.parquet file_total_rows_ 4589169
[HCTR][08:43:09.539][INFO][RK0][main]: Iter: 600 Time(100 iters): 10.5255s Loss: 0.140006 lr:0.5
[HCTR][08:43:11.577][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:43:11.615][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:43:11.653][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:43:11.690][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:43:11.734][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:43:11.780][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:43:11.813][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:43:11.851][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:43:12.020][INFO][RK0][main]: Evaluation, AverageLoss: 0.141187
[HCTR][08:43:12.020][INFO][RK0][main]: Eval Time for 70 iters: 2.4811s
[HCTR][08:43:20.138][INFO][RK0][main]: Iter: 700 Time(100 iters): 10.5241s Loss: 0.143169 lr:0.5
[HCTR][08:43:22.305][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:43:22.343][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:43:22.382][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:43:22.420][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:43:22.463][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:43:22.507][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:43:22.551][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:43:22.588][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:43:22.694][INFO][RK0][main]: Evaluation, AverageLoss: 0.140917
[HCTR][08:43:22.694][INFO][RK0][main]: Eval Time for 70 iters: 2.55575s
[HCTR][08:43:30.768][INFO][RK0][main]: Iter: 800 Time(100 iters): 10.5603s Loss: 0.143395 lr:0.5
[HCTR][08:43:32.698][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:43:32.771][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:43:32.809][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:43:32.950][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:43:33.023][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:43:33.131][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:43:33.169][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:43:33.212][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:43:33.292][INFO][RK0][main]: Evaluation, AverageLoss: 0.139397
[HCTR][08:43:33.292][INFO][RK0][main]: Eval Time for 70 iters: 2.52409s
[HCTR][08:43:41.361][INFO][RK0][main]: Iter: 900 Time(100 iters): 10.5237s Loss: 0.141716 lr:0.5
[HCTR][08:43:43.361][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:43:43.399][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:43:43.436][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:43:43.474][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:43:43.518][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:43:43.555][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:43:43.589][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:43:43.626][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:43:43.867][INFO][RK0][main]: Evaluation, AverageLoss: 0.141604
[HCTR][08:43:43.867][INFO][RK0][main]: Eval Time for 70 iters: 2.50584s
[HCTR][08:43:51.826][INFO][RK0][main]: Finish 1000 iterations with batchsize: 65536 in 103.76s.
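
For comparing placement strategies, the final log line can be reduced to a throughput figure. The following is a minimal sketch, not part of the original notebook, that uses only the numbers printed above. Note that the reported 103.76 s appears to cover the whole fit call, including the periodic evaluations, because it matches the difference between the first and last timestamps in the log.

```python
# Values copied from the final "Finish 1000 iterations" log line above.
iterations = 1000
batch_size = 65536
elapsed_seconds = 103.76  # wall-clock time; appears to include evaluation passes

# Samples processed per second over the whole run.
samples_per_second = iterations * batch_size / elapsed_seconds

# Average wall-clock time per training iteration.
seconds_per_iteration = elapsed_seconds / iterations

print(f"throughput: {samples_per_second:,.0f} samples/s")
print(f"average time per iteration: {seconds_per_iteration * 1000:.1f} ms")
```

Running the same calculation on the log of another placement strategy gives a like-for-like comparison, provided the batch size and evaluation interval are unchanged.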

Embedding Table Placement Strategy: Distributed

The generate_distributed_plan() function shows how to distribute all tables across all GPUs: the plan for each GPU lists every table and assigns that GPU one of gpu_count shards. This strategy is similar to hugectr.DistributedHashEmbedding.

def generate_distributed_plan(slot_size_array, gpu_count, plan_file):
    # Place every table on every GPU; each GPU is assigned one shard
    # (shard_id) out of shards_count.
    plan = []
    for gpu_id in range(gpu_count):
        distributed_plan = {
            "local_embedding_list": [
                table_id for table_id in range(len(slot_size_array))
            ],
            "table_placement_strategy": "mp",
            "shard_id": gpu_id,
            "shards_count": gpu_count,
        }
        plan.append([distributed_plan])

    # Generate the global view of table placement.
    distributed_global_embedding_list = []
    for single_gpu_plan in plan:
        distributed_global_embedding_list.append(
            single_gpu_plan[0]["local_embedding_list"]
        )
    for single_gpu_plan in plan:
        single_gpu_plan[0]["global_embedding_list"] = distributed_global_embedding_list
    print_plan(plan)

    # Write the plan file to disk.
    import json
    with open(plan_file, "w") as f:
        json.dump(plan, f, indent=4)
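
The structure that this function produces is easier to inspect on a small case. The following is a minimal sketch, not part of the original notebook: build_distributed_plan and check_distributed_plan are hypothetical helper names, and the 3-table, 2-GPU sizes are chosen only for illustration. It rebuilds the same plan in memory, without calling print_plan or writing a file, and checks the properties visible in the code above.

```python
import json


def build_distributed_plan(num_tables, gpu_count):
    """Rebuild the distributed plan in memory (no printing, no file I/O)."""
    table_ids = list(range(num_tables))
    plan = [
        [
            {
                "local_embedding_list": table_ids,
                "table_placement_strategy": "mp",
                "shard_id": gpu_id,
                "shards_count": gpu_count,
            }
        ]
        for gpu_id in range(gpu_count)
    ]
    # The global view is the list of every GPU's local table list.
    global_view = [gpu_plan[0]["local_embedding_list"] for gpu_plan in plan]
    for gpu_plan in plan:
        gpu_plan[0]["global_embedding_list"] = global_view
    return plan


def check_distributed_plan(plan, num_tables):
    """Verify that every GPU lists all tables and owns exactly one shard."""
    gpu_count = len(plan)
    for gpu_id, gpu_plan in enumerate(plan):
        entry = gpu_plan[0]
        assert entry["local_embedding_list"] == list(range(num_tables))
        assert entry["shard_id"] == gpu_id
        assert entry["shards_count"] == gpu_count
        assert len(entry["global_embedding_list"]) == gpu_count
    return True


plan = build_distributed_plan(num_tables=3, gpu_count=2)
check_distributed_plan(plan, num_tables=3)
print(json.dumps(plan, indent=4))
```

The printed JSON has the same layout as the plan file written by generate_distributed_plan(), only smaller: one single-element list per GPU, each carrying its local table list, its shard assignment, and the shared global view.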

# Number of rows in each of the 26 embedding tables, one table per slot.
slot_size_array = [
    203931,
    18598,
    14092,
    7012,
    18977,
    4,
    6385,
    1245,
    49,
    186213,
    71328,
    67288,
    11,
    2168,
    7338,
    61,
    4,
    932,
    15,
    204515,
    141526,
    199433,
    60919,
    9137,
    71,
    34,
]

generate_distributed_plan(
    slot_size_array=slot_size_array,
    gpu_count=8,
    plan_file="./distributed_plan.json"
)
single_gpu_plan index = 0
	local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	table_placement_strategy:mp
	shard_id:0
	shards_count:8
	global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
single_gpu_plan index = 1
	local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	table_placement_strategy:mp
	shard_id:1
	shards_count:8
	global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
single_gpu_plan index = 2
	local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	table_placement_strategy:mp
	shard_id:2
	shards_count:8
	global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
single_gpu_plan index = 3
	local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	table_placement_strategy:mp
	shard_id:3
	shards_count:8
	global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
single_gpu_plan index = 4
	local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	table_placement_strategy:mp
	shard_id:4
	shards_count:8
	global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
single_gpu_plan index = 5
	local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	table_placement_strategy:mp
	shard_id:5
	shards_count:8
	global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
single_gpu_plan index = 6
	local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	table_placement_strategy:mp
	shard_id:6
	shards_count:8
	global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
single_gpu_plan index = 7
	local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	table_placement_strategy:mp
	shard_id:7
	shards_count:8
	global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
	                     :[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
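The printout above lists the same fields for every GPU: all 26 tables in `local_embedding_list`, the `mp` strategy, a `shard_id` equal to the GPU index, and a `shards_count` of 8. The following is a minimal sketch of a plain-Python structure with those fields. The function name `build_mp_plan` and the list-of-dicts layout are hypothetical and only mirror the printed field names; the JSON schema that HugeCTR actually expects in the plan file may differ.

```python
import json


def build_mp_plan(num_tables: int, num_gpus: int) -> list:
    """Return one dict per GPU, each sharding every table across all GPUs.

    Hypothetical layout: the keys copy the field names in the printout,
    not a confirmed HugeCTR plan-file schema.
    """
    all_tables = list(range(num_tables))
    plan = []
    for gpu_id in range(num_gpus):
        plan.append(
            {
                "local_embedding_list": list(all_tables),
                "table_placement_strategy": "mp",
                "shard_id": gpu_id,
                "shards_count": num_gpus,
                # One row per GPU, as in the printout.
                "global_embedding_list": [list(all_tables) for _ in range(num_gpus)],
            }
        )
    return plan


plan = build_mp_plan(num_tables=26, num_gpus=8)
# Serialize to confirm the structure is JSON-compatible.
serialized = json.dumps(plan)
```

With 26 tables and 8 GPUs, `plan[4]` carries `shard_id` 4 and `shards_count` 8, which matches the `single_gpu_plan index = 4` entry shown above.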
!python3 dlrm_train.py ./distributed_plan.json
HugeCTR Version: 3.7
====================================================Model Init=====================================================
[HCTR][08:44:05.384][WARNING][RK0][main]: The model name is not specified when creating the solver.
[HCTR][08:44:05.384][INFO][RK0][main]: Global seed is 1510630763
[HCTR][08:44:05.843][INFO][RK0][main]: Device to NUMA mapping:
  GPU 0 ->  node 0
  GPU 1 ->  node 0
  GPU 2 ->  node 0
  GPU 3 ->  node 0
  GPU 4 ->  node 1
  GPU 5 ->  node 1
  GPU 6 ->  node 1
  GPU 7 ->  node 1
[HCTR][08:44:17.340][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
[HCTR][08:44:17.341][INFO][RK0][main]: Start all2all warmup
[HCTR][08:44:17.532][INFO][RK0][main]: End all2all warmup
[HCTR][08:44:17.544][INFO][RK0][main]: Using All-reduce algorithm: NCCL
[HCTR][08:44:17.545][INFO][RK0][main]: Device 0: Tesla V100-SXM2-16GB
[HCTR][08:44:17.546][INFO][RK0][main]: Device 1: Tesla V100-SXM2-16GB
[HCTR][08:44:17.547][INFO][RK0][main]: Device 2: Tesla V100-SXM2-16GB
[HCTR][08:44:17.548][INFO][RK0][main]: Device 3: Tesla V100-SXM2-16GB
[HCTR][08:44:17.548][INFO][RK0][main]: Device 4: Tesla V100-SXM2-16GB
[HCTR][08:44:17.549][INFO][RK0][main]: Device 5: Tesla V100-SXM2-16GB
[HCTR][08:44:17.550][INFO][RK0][main]: Device 6: Tesla V100-SXM2-16GB
[HCTR][08:44:17.551][INFO][RK0][main]: Device 7: Tesla V100-SXM2-16GB
[HCTR][08:44:17.552][INFO][RK0][main]: num of DataReader workers: 8
[HCTR][08:44:17.578][DEBUG][RK0][tid #139614253741824]: file_name_ deepfm_data_nvt/train/0.1738817c5c5c47dba75a428d0837cbc3.parquet file_total_rows_ 4586722
[HCTR][08:44:17.578][DEBUG][RK0][tid #139614119524096]: file_name_ deepfm_data_nvt/train/6.92133f3ee3664684854969202958122f.parquet file_total_rows_ 4581782
[HCTR][08:44:17.579][DEBUG][RK0][tid #139614371174144]: file_name_ deepfm_data_nvt/train/1.c7b6f2423fec47ff97a09ec95f6346f9.parquet file_total_rows_ 4585117
[HCTR][08:44:17.579][DEBUG][RK0][tid #139618506761984]: file_name_ deepfm_data_nvt/train/2.6b134d3f8f0a4f0d9453f1d7c08f74d5.parquet file_total_rows_ 4584304
[HCTR][08:44:17.579][DEBUG][RK0][tid #139614505387776]: file_name_ deepfm_data_nvt/train/3.4b192542e2ad4cc8b745feb142d1878a.parquet file_total_rows_ 4581022
[HCTR][08:44:17.579][DEBUG][RK0][tid #139614387959552]: file_name_ deepfm_data_nvt/train/4.4f7e95ed8f9b4bcc9b63c5f3278e6905.parquet file_total_rows_ 4580476
[HCTR][08:44:17.579][DEBUG][RK0][tid #139614111131392]: file_name_ deepfm_data_nvt/train/7.9345ade3421b40a5803f518c48ae436f.parquet file_total_rows_ 4589169
[HCTR][08:44:17.583][INFO][RK0][main]: Vocabulary size: 1221286
[HCTR][08:44:17.583][DEBUG][RK0][tid #139614379566848]: file_name_ deepfm_data_nvt/train/5.c5b89db1e82d4842998d560796eab838.parquet file_total_rows_ 4583901
[HCTR][08:44:17.583][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:44:17.583][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:44:17.583][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:44:17.583][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:44:17.583][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:44:17.583][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:44:17.584][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:44:17.584][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:44:18.246][INFO][RK0][main]: Graph analysis to resolve tensor dependency
===================================================Model Compile===================================================
===================================================Model Summary===================================================
[HCTR][08:44:55.511][INFO][RK0][main]: label                                   Dense                         Sparse                        
label                                   dense                          data0,data1,data2,data3,data4,data5,data6,data7,data8,data9,data10,data11,data12,data13,data14,data15,data16,data17,data18,data19,data20,data21,data22,data23,data24,data25
(None, 1)                               (None, 13)                              
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
Layer Type                              Input Name                    Output Name                   Output Shape                  
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
EmbeddingCollection                     data0                         emb_vec0                      (None, 1, 128)                
                                        data1                         emb_vec1                      (None, 1, 128)                
                                        data2                         emb_vec2                      (None, 1, 128)                
                                        data3                         emb_vec3                      (None, 1, 128)                
                                        data4                         emb_vec4                      (None, 1, 128)                
                                        data5                         emb_vec5                      (None, 1, 128)                
                                        data6                         emb_vec6                      (None, 1, 128)                
                                        data7                         emb_vec7                      (None, 1, 128)                
                                        data8                         emb_vec8                      (None, 1, 128)                
                                        data9                         emb_vec9                      (None, 1, 128)                
                                        data10                        emb_vec10                     (None, 1, 128)                
                                        data11                        emb_vec11                     (None, 1, 128)                
                                        data12                        emb_vec12                     (None, 1, 128)                
                                        data13                        emb_vec13                     (None, 1, 128)                
                                        data14                        emb_vec14                     (None, 1, 128)                
                                        data15                        emb_vec15                     (None, 1, 128)                
                                        data16                        emb_vec16                     (None, 1, 128)                
                                        data17                        emb_vec17                     (None, 1, 128)                
                                        data18                        emb_vec18                     (None, 1, 128)                
                                        data19                        emb_vec19                     (None, 1, 128)                
                                        data20                        emb_vec20                     (None, 1, 128)                
                                        data21                        emb_vec21                     (None, 1, 128)                
                                        data22                        emb_vec22                     (None, 1, 128)                
                                        data23                        emb_vec23                     (None, 1, 128)                
                                        data24                        emb_vec24                     (None, 1, 128)                
                                        data25                        emb_vec25                     (None, 1, 128)                
------------------------------------------------------------------------------------------------------------------
Concat                                  emb_vec0                      sparse_embedding1             (None, 26, 128)               
                                        emb_vec1                                                                                  
                                        emb_vec2                                                                                  
                                        emb_vec3                                                                                  
                                        emb_vec4                                                                                  
                                        emb_vec5                                                                                  
                                        emb_vec6                                                                                  
                                        emb_vec7                                                                                  
                                        emb_vec8                                                                                  
                                        emb_vec9                                                                                  
                                        emb_vec10                                                                                 
                                        emb_vec11                                                                                 
                                        emb_vec12                                                                                 
                                        emb_vec13                                                                                 
                                        emb_vec14                                                                                 
                                        emb_vec15                                                                                 
                                        emb_vec16                                                                                 
                                        emb_vec17                                                                                 
                                        emb_vec18                                                                                 
                                        emb_vec19                                                                                 
                                        emb_vec20                                                                                 
                                        emb_vec21                                                                                 
                                        emb_vec22                                                                                 
                                        emb_vec23                                                                                 
                                        emb_vec24                                                                                 
                                        emb_vec25                                                                                 
------------------------------------------------------------------------------------------------------------------
InnerProduct                            dense                         fc1                           (None, 512)                   
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc1                           relu1                         (None, 512)                   
------------------------------------------------------------------------------------------------------------------
InnerProduct                            relu1                         fc2                           (None, 256)                   
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc2                           relu2                         (None, 256)                   
------------------------------------------------------------------------------------------------------------------
InnerProduct                            relu2                         fc3                           (None, 128)                   
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc3                           relu3                         (None, 128)                   
------------------------------------------------------------------------------------------------------------------
Interaction                             relu3                         interaction1                  (None, 480)                   
                                        sparse_embedding1                                                                         
------------------------------------------------------------------------------------------------------------------
InnerProduct                            interaction1                  fc4                           (None, 1024)                  
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc4                           relu4                         (None, 1024)                  
------------------------------------------------------------------------------------------------------------------
InnerProduct                            relu4                         fc5                           (None, 1024)                  
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc5                           relu5                         (None, 1024)                  
------------------------------------------------------------------------------------------------------------------
InnerProduct                            relu5                         fc6                           (None, 512)                   
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc6                           relu6                         (None, 512)                   
------------------------------------------------------------------------------------------------------------------
InnerProduct                            relu6                         fc7                           (None, 256)                   
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc7                           relu7                         (None, 256)                   
------------------------------------------------------------------------------------------------------------------
InnerProduct                            relu7                         fc8                           (None, 1)                     
------------------------------------------------------------------------------------------------------------------
BinaryCrossEntropyLoss                  fc8                           loss                                                        
                                        label                                                                                     
------------------------------------------------------------------------------------------------------------------
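The summary lists an output shape of `(None, 480)` for the `Interaction` layer, whose inputs are the 128-wide `relu3` tensor and the 26 embedding vectors of width 128. The following sketch shows one decomposition that reproduces 480. It assumes the layer concatenates the dense vector with the pairwise dot products of all 27 vectors and appends one padding element. That decomposition, the function name, and the `pad` parameter are assumptions for illustration; the summary itself only reports the final shape.

```python
def interaction_output_width(num_embeddings: int, vec_size: int, pad: int = 1) -> int:
    """Width of a DLRM-style interaction output under the stated assumptions."""
    # The embeddings plus the bottom-MLP output take part in the interaction.
    num_vectors = num_embeddings + 1
    # Number of distinct pairs among the vectors.
    pairwise_terms = num_vectors * (num_vectors - 1) // 2
    # Dense vector, pairwise terms, and the assumed padding element.
    return vec_size + pairwise_terms + pad


width = interaction_output_width(num_embeddings=26, vec_size=128)
print(width)
```

For 26 embeddings of width 128 this gives 128 + 351 + 1 = 480, which agrees with the shape in the summary.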
=====================================================Model Fit=====================================================
[HCTR][08:44:55.512][INFO][RK0][main]: Use non-epoch mode with number of iterations: 1000
[HCTR][08:44:55.512][INFO][RK0][main]: Training batchsize: 65536, evaluation batchsize: 65536
[HCTR][08:44:55.512][INFO][RK0][main]: Evaluation interval: 100, snapshot interval: 10000000
[HCTR][08:44:55.512][INFO][RK0][main]: Dense network trainable: True
[HCTR][08:44:55.512][INFO][RK0][main]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
[HCTR][08:44:55.512][INFO][RK0][main]: lr: 0.500000, warmup_steps: 300, end_lr: 0.000000
[HCTR][08:44:55.512][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
[HCTR][08:44:55.512][INFO][RK0][main]: Training source file: ./deepfm_data_nvt/train/_file_list.txt
[HCTR][08:44:55.512][INFO][RK0][main]: Evaluation source file: ./deepfm_data_nvt/val/_file_list.txt
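The settings above report `lr: 0.500000` with `warmup_steps: 300`, and the iteration lines that follow report `lr:0.168333` at iteration 100, `lr:0.335` at iteration 200, and `lr:0.5` from iteration 300 onward. The following sketch shows a linear warmup rule that reproduces those three values. The `step + 1` offset is inferred from fitting the logged numbers and is not taken from the HugeCTR source; the function name is hypothetical.

```python
def warmup_lr(step: int, base_lr: float = 0.5, warmup_steps: int = 300) -> float:
    """Ramp linearly to base_lr over warmup_steps, then hold it constant."""
    # The +1 offset is an inference from the logged values.
    return base_lr * min(1.0, (step + 1) / warmup_steps)


# Compare against the learning rates printed at iterations 100, 200, and 300.
logged = {100: 0.168333, 200: 0.335, 300: 0.5}
computed = {step: round(warmup_lr(step), 6) for step in logged}
print(computed)
```

Because the log shows `decay_start: 0`, this sketch leaves out decay and holds the rate at 0.5 after warmup, as the later iteration lines do.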
[HCTR][08:45:11.057][INFO][RK0][main]: Iter: 100 Time(100 iters): 15.3926s Loss: 0.144649 lr:0.168333
[HCTR][08:45:14.328][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:45:14.386][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:45:14.444][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:45:14.503][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:45:14.562][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:45:14.620][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:45:14.677][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:45:14.736][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:45:15.139][INFO][RK0][main]: Evaluation, AverageLoss: 0.146034
[HCTR][08:45:15.139][INFO][RK0][main]: Eval Time for 70 iters: 4.08247s
[HCTR][08:45:30.553][INFO][RK0][main]: Iter: 200 Time(100 iters): 19.4961s Loss: 0.149704 lr:0.335
[HCTR][08:45:33.943][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:45:34.001][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:45:34.060][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:45:34.118][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:45:34.177][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:45:34.235][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:45:34.292][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:45:34.350][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:45:34.645][INFO][RK0][main]: Evaluation, AverageLoss: 0.146364
[HCTR][08:45:34.645][INFO][RK0][main]: Eval Time for 70 iters: 4.09159s
[HCTR][08:45:50.040][INFO][RK0][main]: Iter: 300 Time(100 iters): 19.3843s Loss: 0.158335 lr:0.5
[HCTR][08:45:53.544][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:45:53.603][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:45:53.660][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:45:53.720][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:45:53.778][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:45:53.836][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:45:53.894][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:45:53.953][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:45:54.121][INFO][RK0][main]: Evaluation, AverageLoss: 0.148422
[HCTR][08:45:54.121][INFO][RK0][main]: Eval Time for 70 iters: 4.08104s
[HCTR][08:46:09.558][INFO][RK0][main]: Iter: 400 Time(100 iters): 19.5178s Loss: 0.139716 lr:0.5
[HCTR][08:46:12.751][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:46:12.868][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:46:12.926][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:46:13.164][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:46:13.280][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:46:13.457][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:46:13.514][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:46:13.574][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:46:13.683][INFO][RK0][main]: Evaluation, AverageLoss: 0.139018
[HCTR][08:46:13.683][INFO][RK0][main]: Eval Time for 70 iters: 4.12495s
[HCTR][08:46:29.073][INFO][RK0][main]: Iter: 500 Time(100 iters): 19.4974s Loss: 0.139979 lr:0.5
[HCTR][08:46:32.347][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:46:32.403][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:46:32.462][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:46:32.521][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:46:32.579][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:46:32.637][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:46:32.696][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:46:32.754][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:46:33.157][INFO][RK0][main]: Evaluation, AverageLoss: 0.138041
[HCTR][08:46:33.157][INFO][RK0][main]: Eval Time for 70 iters: 4.08385s
[HCTR][08:46:39.671][DEBUG][RK0][tid #139614253741824]: file_name_ deepfm_data_nvt/train/0.1738817c5c5c47dba75a428d0837cbc3.parquet file_total_rows_ 4586722
[HCTR][08:46:39.823][DEBUG][RK0][tid #139614371174144]: file_name_ deepfm_data_nvt/train/1.c7b6f2423fec47ff97a09ec95f6346f9.parquet file_total_rows_ 4585117
[HCTR][08:46:39.973][DEBUG][RK0][tid #139618506761984]: file_name_ deepfm_data_nvt/train/2.6b134d3f8f0a4f0d9453f1d7c08f74d5.parquet file_total_rows_ 4584304
[HCTR][08:46:40.125][DEBUG][RK0][tid #139614505387776]: file_name_ deepfm_data_nvt/train/3.4b192542e2ad4cc8b745feb142d1878a.parquet file_total_rows_ 4581022
[HCTR][08:46:40.284][DEBUG][RK0][tid #139614387959552]: file_name_ deepfm_data_nvt/train/4.4f7e95ed8f9b4bcc9b63c5f3278e6905.parquet file_total_rows_ 4580476
[HCTR][08:46:40.431][DEBUG][RK0][tid #139614379566848]: file_name_ deepfm_data_nvt/train/5.c5b89db1e82d4842998d560796eab838.parquet file_total_rows_ 4583901
[HCTR][08:46:40.586][DEBUG][RK0][tid #139614119524096]: file_name_ deepfm_data_nvt/train/6.92133f3ee3664684854969202958122f.parquet file_total_rows_ 4581782
[HCTR][08:46:41.968][DEBUG][RK0][tid #139614111131392]: file_name_ deepfm_data_nvt/train/7.9345ade3421b40a5803f518c48ae436f.parquet file_total_rows_ 4589169
[HCTR][08:46:48.555][INFO][RK0][main]: Iter: 600 Time(100 iters): 19.4819s Loss: 0.134819 lr:0.5
[HCTR][08:46:51.959][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:46:52.017][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:46:52.075][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:46:52.134][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:46:52.193][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:46:52.253][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:46:52.313][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:46:52.372][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:46:52.663][INFO][RK0][main]: Evaluation, AverageLoss: 0.137611
[HCTR][08:46:52.663][INFO][RK0][main]: Eval Time for 70 iters: 4.1073s
[HCTR][08:47:08.048][INFO][RK0][main]: Iter: 700 Time(100 iters): 19.4769s Loss: 0.140394 lr:0.5
[HCTR][08:47:11.590][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:47:11.647][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:47:11.706][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:47:11.765][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:47:11.824][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:47:11.881][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:47:11.940][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:47:11.998][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:47:12.171][INFO][RK0][main]: Evaluation, AverageLoss: 0.138108
[HCTR][08:47:12.171][INFO][RK0][main]: Eval Time for 70 iters: 4.1189s
[HCTR][08:47:27.578][INFO][RK0][main]: Iter: 800 Time(100 iters): 19.5118s Loss: 0.141259 lr:0.5
[HCTR][08:47:30.764][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:47:30.880][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:47:30.938][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:47:31.175][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:47:31.292][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:47:31.467][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:47:31.525][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:47:31.584][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:47:31.692][INFO][RK0][main]: Evaluation, AverageLoss: 0.137271
[HCTR][08:47:31.692][INFO][RK0][main]: Eval Time for 70 iters: 4.11364s
[HCTR][08:47:47.105][INFO][RK0][main]: Iter: 900 Time(100 iters): 19.3756s Loss: 0.13619 lr:0.5
[HCTR][08:47:50.398][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:47:50.455][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:47:50.513][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:47:50.573][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:47:50.631][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:47:50.690][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:47:50.747][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:47:50.805][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:47:51.207][INFO][RK0][main]: Evaluation, AverageLoss: 0.137273
[HCTR][08:47:51.207][INFO][RK0][main]: Eval Time for 70 iters: 4.1011s
[HCTR][08:48:06.342][INFO][RK0][main]: Finish 1000 iterations with batchsize: 65536 in 190.83s.

Performance Comparison for the Different ETPS

Training for 1,000 iterations with a batch size of 65,536 takes 103.45s with the data parallel and model parallel strategy. With the distributed strategy, the same run takes 190.83s, which is about 1.8 times longer. This comparison shows that the choice of ETPS can greatly affect embedding performance. When an embedding table fits on a single GPU, configuring it as data parallel or localized gives better performance than distributing it across GPUs.
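
To make the comparison concrete, the following minimal sketch turns the two reported durations into a speedup ratio and an approximate throughput in samples per second. The durations, iteration count, and batch size are copied from the logs and text in this notebook. The variable names and the `throughput` helper are illustrative and are not part of the HugeCTR API. The sketch also assumes that each reported duration covers the whole 1,000-iteration run, including the periodic evaluation passes that the logs interleave with training.

```python
# Figures copied from this notebook's training logs and text.
ITERATIONS = 1000
BATCH_SIZE = 65536
DP_MP_SECONDS = 103.45        # data parallel + model parallel strategy
DISTRIBUTED_SECONDS = 190.83  # distributed strategy


def throughput(total_seconds, iterations=ITERATIONS, batch_size=BATCH_SIZE):
    """Return the average number of samples processed per second."""
    return iterations * batch_size / total_seconds


# How many times longer the distributed run takes than the DP + MP run.
speedup = DISTRIBUTED_SECONDS / DP_MP_SECONDS

print(f"DP + MP throughput:      {throughput(DP_MP_SECONDS):,.0f} samples/s")
print(f"Distributed throughput:  {throughput(DISTRIBUTED_SECONDS):,.0f} samples/s")
print(f"DP + MP is {speedup:.2f}x faster than distributed")
```

These numbers come from a single run on one machine, so treat the ratio as an indication of the trend rather than a benchmark result.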