HugeCTR Embedding Collection
About this Notebook
This notebook demonstrates the following:
Introduces the API of the embedding collection.
Introduces the embedding table placement strategy (ETPS) and how to configure ETPS in an embedding collection.
Shows how to use an embedding collection in a DLRM model with the Criteo dataset for training and evaluation. The notebook shows two different ETPS configurations for reference.
Concepts and API Reference
The following key classes and configuration file are used in this notebook:
hugectr.EmbeddingTableConfig
hugectr.EmbeddingPlanner
JSON plan file for the ETPS
For the concepts and API reference information about the classes and the plan file, see the Overview of Using the HugeCTR Embedding Collection in the HugeCTR Layer Classes and Methods documentation.
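To make the plan file concrete, the following sketch builds a tiny plan by hand and prints it as JSON. It uses the same three keys that the generate_plan() function in this notebook writes: local_embedding_list, table_placement_strategy, and global_embedding_list. The setup of two GPUs and three tables is a hypothetical example chosen for brevity, not a configuration from the Criteo run.

```python
import json

# Hypothetical setup: 2 GPUs, 3 tables.
# Tables 0 and 1 are model parallel ("mp"): each lives on one GPU.
# Table 2 is data parallel ("dp"): it is listed on every GPU.
gpu_count = 2
mp_per_gpu = [[0], [1]]
dp_per_gpu = [[2], [2]]

plan = []
for gpu_id in range(gpu_count):
    plan.append([
        {
            "local_embedding_list": mp_per_gpu[gpu_id],
            "table_placement_strategy": "mp",
            # The global list repeats every GPU's local list, in GPU order.
            "global_embedding_list": mp_per_gpu,
        },
        {
            "local_embedding_list": dp_per_gpu[gpu_id],
            "table_placement_strategy": "dp",
            "global_embedding_list": dp_per_gpu,
        },
    ])

plan_json = json.dumps(plan, indent=4)
print(plan_json)
```

The outer list has one entry per GPU, and each entry holds one dictionary per placement strategy. That mirrors the layout printed later in this notebook.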
Use an Embedding Collection with a DLRM Model
Prepare the Data
Follow the instructions under the heading “Preprocess the Dataset through NVTabular” in the README in the samples/deepfm directory of the repository to prepare the data.
Prepare the Training Script
This notebook was developed on a single DGX-1 to run the DLRM model. The DGX-1 has eight V100-SXM2 GPUs, as the following output shows.
! nvidia-smi
Thu Jun 23 00:14:56 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:06:00.0 Off | 0 |
| N/A 33C P0 42W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:07:00.0 Off | 0 |
| N/A 35C P0 45W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:0A:00.0 Off | 0 |
| N/A 36C P0 44W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000000:0B:00.0 Off | 0 |
| N/A 33C P0 42W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... On | 00000000:85:00.0 Off | 0 |
| N/A 36C P0 44W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... On | 00000000:86:00.0 Off | 0 |
| N/A 35C P0 42W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... On | 00000000:89:00.0 Off | 0 |
| N/A 36C P0 44W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |
| N/A 34C P0 41W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
The training script, dlrm_train.py, uses the embedding collection API.
The script accepts one command-line argument that specifies the plan file so we can run the script several times and evaluate different ETPS:
%%writefile dlrm_train.py
import sys
import hugectr
plan_file = sys.argv[1]
slot_size_array = [203931, 18598, 14092, 7012, 18977, 4, 6385, 1245, 49,
186213, 71328, 67288, 11, 2168, 7338, 61, 4, 932, 15,
204515, 141526, 199433, 60919, 9137, 71, 34]
solver = hugectr.CreateSolver(
max_eval_batches=70,
batchsize_eval=65536,
batchsize=65536,
lr=0.5,
warmup_steps=300,
vvgpu=[[0, 1, 2, 3, 4, 5, 6, 7]],
repeat_dataset=True,
i64_input_key=True,
metrics_spec={hugectr.MetricsType.AverageLoss: 0.0},
use_embedding_collection=True,
)
reader = hugectr.DataReaderParams(
data_reader_type=hugectr.DataReaderType_t.Parquet,
source=["./deepfm_data_nvt/train/_file_list.txt"],
eval_source="./deepfm_data_nvt/val/_file_list.txt",
check_type=hugectr.Check_t.Non,
slot_size_array=slot_size_array
)
optimizer = hugectr.CreateOptimizer(
optimizer_type=hugectr.Optimizer_t.SGD,
update_type=hugectr.Update_t.Local,
atomic_update=True
)
model = hugectr.Model(solver, reader, optimizer)
model.add(
hugectr.Input(
label_dim=1,
label_name="label",
dense_dim=13,
dense_name="dense",
data_reader_sparse_param_array=[
hugectr.DataReaderSparseParam("data{}".format(i), 1, False, 1)
for i in range(len(slot_size_array))
],
)
)
# Create the embedding table.
embedding_table_list = []
for i in range(len(slot_size_array)):
    embedding_table_list.append(
        hugectr.EmbeddingTableConfig(
            table_id=i,
            max_vocabulary_size=slot_size_array[i],
            ev_size=128,
            min_key=0,
            max_key=slot_size_array[i],
        )
    )
# Create the embedding planner and embedding collection.
embedding_planner = hugectr.EmbeddingPlanner()
emb_vec_list = []
for i in range(len(slot_size_array)):
    embedding_planner.embedding_lookup(
        table_config=embedding_table_list[i],
        bottom_name="data{}".format(i),
        top_name="emb_vec{}".format(i),
        combiner="sum"
    )
embedding_collection = embedding_planner.create_embedding_collection(plan_file)
model.add(embedding_collection)
# Concatenate the per-table embedding vectors into one tensor for the Interaction layer.
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.Concat,
bottom_names=["emb_vec{}".format(i) for i in range(len(slot_size_array))],
top_names=["sparse_embedding1"],
axis=1
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.InnerProduct,
bottom_names=["dense"],
top_names=["fc1"],
num_output=512
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc1"], top_names=["relu1"]
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.InnerProduct,
bottom_names=["relu1"],
top_names=["fc2"],
num_output=256
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc2"], top_names=["relu2"]
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.InnerProduct,
bottom_names=["relu2"],
top_names=["fc3"],
num_output=128
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc3"], top_names=["relu3"]
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.Interaction, # The Interaction layer only supports 3-D input
bottom_names=["relu3", "sparse_embedding1"],
top_names=["interaction1"],
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.InnerProduct,
bottom_names=["interaction1"],
top_names=["fc4"],
num_output=1024,
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc4"], top_names=["relu4"]
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.InnerProduct,
bottom_names=["relu4"],
top_names=["fc5"],
num_output=1024,
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc5"], top_names=["relu5"]
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.InnerProduct,
bottom_names=["relu5"],
top_names=["fc6"],
num_output=512,
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc6"], top_names=["relu6"]
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.InnerProduct,
bottom_names=["relu6"],
top_names=["fc7"],
num_output=256,
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc7"], top_names=["relu7"]
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.InnerProduct,
bottom_names=["relu7"],
top_names=["fc8"],
num_output=1,
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.BinaryCrossEntropyLoss,
bottom_names=["fc8", "label"],
top_names=["loss"],
)
)
model.compile()
model.summary()
model.fit(
max_iter=1000,
display=100,
eval_interval=100,
snapshot=10000000,
snapshot_prefix="dlrm",
)
Overwriting dlrm_train.py
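As a rough sanity check on the table sizes used in the script, the following sketch sums slot_size_array and estimates how much memory the embedding weights need. It assumes 4-byte float32 weights and ignores optimizer state and any internal padding, so treat the byte count as an estimate only.

```python
slot_size_array = [203931, 18598, 14092, 7012, 18977, 4, 6385, 1245, 49,
                   186213, 71328, 67288, 11, 2168, 7338, 61, 4, 932, 15,
                   204515, 141526, 199433, 60919, 9137, 71, 34]
ev_size = 128        # embedding vector size used in dlrm_train.py
bytes_per_value = 4  # assumption: float32 weights

# Total number of embedding rows across all 26 tables.
total_rows = sum(slot_size_array)
# Approximate bytes for one full copy of every table.
total_bytes = total_rows * ev_size * bytes_per_value

print("total rows:", total_rows)
print("approx. size of one full copy: {:.1f} MiB".format(total_bytes / 2**20))
```

The row total is 1221286, which matches the "Vocabulary size" line that HugeCTR logs during model initialization. At this scale every table would fit on a single 16 GB V100, so the placement strategies in this notebook serve as a reference for larger models rather than a necessity here.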
Embedding Table Placement Strategy: Data Parallel and Model Parallel
The following generate_plan() function shows how to configure small tables as data parallel and larger tables as model parallel.
Each model-parallel table resides on a single GPU, and different GPUs hold different tables, in the same way that hugectr.LocalizedHashEmbedding places its tables.
def print_plan(plan):
    for gpu_id, single_gpu_plan in enumerate(plan):
        print("single_gpu_plan index = {}".format(gpu_id))
        for plan_attr in single_gpu_plan:
            for key in plan_attr:
                if key != "global_embedding_list":
                    print("\t{}:{}".format(key, plan_attr[key]))
                else:
                    # Print one GPU's list per line, aligned under the key.
                    prefix_len = len(key)
                    left_space_fill = " " * prefix_len
                    print("\t{}:{}".format(key, plan_attr[key][0]))
                    for index in range(1, len(plan_attr[key])):
                        print("\t{}:{}".format(left_space_fill, plan_attr[key][index]))
def generate_plan(slot_size_array, gpu_count, plan_file):
    # Tables with more than 6000 rows are model parallel; the rest are data parallel.
    mp_table = [i for i in range(len(slot_size_array)) if slot_size_array[i] > 6000]
    dp_table = [i for i in range(len(slot_size_array)) if slot_size_array[i] <= 6000]
    # Place the tables across all GPUs: mp tables round-robin, dp tables on every GPU.
    plan = []
    for gpu_id in range(gpu_count):
        single_gpu_plan = []
        mp_plan = {
            "local_embedding_list": [
                table_id
                for i, table_id in enumerate(mp_table)
                if i % gpu_count == gpu_id
            ],
            "table_placement_strategy": "mp",
        }
        dp_plan = {"local_embedding_list": dp_table, "table_placement_strategy": "dp"}
        single_gpu_plan.append(mp_plan)
        single_gpu_plan.append(dp_plan)
        plan.append(single_gpu_plan)
    # Generate the global view of table placement.
    mp_global_embedding_list = []
    dp_global_embedding_list = []
    for single_gpu_plan in plan:
        mp_global_embedding_list.append(single_gpu_plan[0]["local_embedding_list"])
        dp_global_embedding_list.append(single_gpu_plan[1]["local_embedding_list"])
    for single_gpu_plan in plan:
        single_gpu_plan[0]["global_embedding_list"] = mp_global_embedding_list
        single_gpu_plan[1]["global_embedding_list"] = dp_global_embedding_list
    print_plan(plan)
    # Write the plan file to disk.
    import json
    with open(plan_file, "w") as f:
        json.dump(plan, f, indent=4)
slot_size_array = [
203931,
18598,
14092,
7012,
18977,
4,
6385,
1245,
49,
186213,
71328,
67288,
11,
2168,
7338,
61,
4,
932,
15,
204515,
141526,
199433,
60919,
9137,
71,
34,
]
generate_plan(
slot_size_array=slot_size_array,
gpu_count=8,
plan_file="./dp_and_localized_plan.json",
)
single_gpu_plan index = 0
local_embedding_list:[0, 11]
table_placement_strategy:mp
global_embedding_list:[0, 11]
:[1, 14]
:[2, 19]
:[3, 20]
:[4, 21]
:[6, 22]
:[9, 23]
:[10]
local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
table_placement_strategy:dp
global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
single_gpu_plan index = 1
local_embedding_list:[1, 14]
table_placement_strategy:mp
global_embedding_list:[0, 11]
:[1, 14]
:[2, 19]
:[3, 20]
:[4, 21]
:[6, 22]
:[9, 23]
:[10]
local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
table_placement_strategy:dp
global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
single_gpu_plan index = 2
local_embedding_list:[2, 19]
table_placement_strategy:mp
global_embedding_list:[0, 11]
:[1, 14]
:[2, 19]
:[3, 20]
:[4, 21]
:[6, 22]
:[9, 23]
:[10]
local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
table_placement_strategy:dp
global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
single_gpu_plan index = 3
local_embedding_list:[3, 20]
table_placement_strategy:mp
global_embedding_list:[0, 11]
:[1, 14]
:[2, 19]
:[3, 20]
:[4, 21]
:[6, 22]
:[9, 23]
:[10]
local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
table_placement_strategy:dp
global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
single_gpu_plan index = 4
local_embedding_list:[4, 21]
table_placement_strategy:mp
global_embedding_list:[0, 11]
:[1, 14]
:[2, 19]
:[3, 20]
:[4, 21]
:[6, 22]
:[9, 23]
:[10]
local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
table_placement_strategy:dp
global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
single_gpu_plan index = 5
local_embedding_list:[6, 22]
table_placement_strategy:mp
global_embedding_list:[0, 11]
:[1, 14]
:[2, 19]
:[3, 20]
:[4, 21]
:[6, 22]
:[9, 23]
:[10]
local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
table_placement_strategy:dp
global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
single_gpu_plan index = 6
local_embedding_list:[9, 23]
table_placement_strategy:mp
global_embedding_list:[0, 11]
:[1, 14]
:[2, 19]
:[3, 20]
:[4, 21]
:[6, 22]
:[9, 23]
:[10]
local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
table_placement_strategy:dp
global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
single_gpu_plan index = 7
local_embedding_list:[10]
table_placement_strategy:mp
global_embedding_list:[0, 11]
:[1, 14]
:[2, 19]
:[3, 20]
:[4, 21]
:[6, 22]
:[9, 23]
:[10]
local_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
table_placement_strategy:dp
global_embedding_list:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
:[5, 7, 8, 12, 13, 15, 16, 17, 18, 24, 25]
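Before training with a generated plan, it can help to confirm that the plan places every table the way the scheme intends. The sketch below is one possible check. The helper name check_plan and the small demo plan are hypothetical, and the rules it enforces reflect the layout used in this notebook rather than a documented HugeCTR requirement.

```python
from collections import Counter


def check_plan(plan, table_count):
    """Return True if every table is placed as the mp/dp scheme expects."""
    gpu_count = len(plan)
    mp_seen = Counter()  # how many GPUs hold each mp table
    dp_seen = Counter()  # how many GPUs hold each dp table
    for single_gpu_plan in plan:
        for entry in single_gpu_plan:
            if entry["table_placement_strategy"] == "mp":
                mp_seen.update(entry["local_embedding_list"])
            else:
                dp_seen.update(entry["local_embedding_list"])
    # Each mp table must live on exactly one GPU.
    mp_ok = all(count == 1 for count in mp_seen.values())
    # Each dp table must be replicated on every GPU.
    dp_ok = all(count == gpu_count for count in dp_seen.values())
    # No table may be both mp and dp, and no table may be missing.
    disjoint = not (set(mp_seen) & set(dp_seen))
    complete = (set(mp_seen) | set(dp_seen)) == set(range(table_count))
    return mp_ok and dp_ok and disjoint and complete


# Hypothetical demo plan: 2 GPUs, tables 0 and 1 are mp, table 2 is dp.
demo_plan = [
    [{"local_embedding_list": [0], "table_placement_strategy": "mp"},
     {"local_embedding_list": [2], "table_placement_strategy": "dp"}],
    [{"local_embedding_list": [1], "table_placement_strategy": "mp"},
     {"local_embedding_list": [2], "table_placement_strategy": "dp"}],
]
print(check_plan(demo_plan, table_count=3))
```

To check the file written by generate_plan(), load ./dp_and_localized_plan.json with json.load() and call check_plan() with table_count=26.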
!python3 dlrm_train.py ./dp_and_localized_plan.json
HugeCTR Version: 3.7
====================================================Model Init=====================================================
[HCTR][08:41:17.942][WARNING][RK0][main]: The model name is not specified when creating the solver.
[HCTR][08:41:17.942][INFO][RK0][main]: Global seed is 1323844045
[HCTR][08:41:18.400][INFO][RK0][main]: Device to NUMA mapping:
GPU 0 -> node 0
GPU 1 -> node 0
GPU 2 -> node 0
GPU 3 -> node 0
GPU 4 -> node 1
GPU 5 -> node 1
GPU 6 -> node 1
GPU 7 -> node 1
[HCTR][08:41:29.902][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
[HCTR][08:41:29.903][INFO][RK0][main]: Start all2all warmup
[HCTR][08:41:30.083][INFO][RK0][main]: End all2all warmup
[HCTR][08:41:30.095][INFO][RK0][main]: Using All-reduce algorithm: NCCL
[HCTR][08:41:30.097][INFO][RK0][main]: Device 0: Tesla V100-SXM2-16GB
[HCTR][08:41:30.097][INFO][RK0][main]: Device 1: Tesla V100-SXM2-16GB
[HCTR][08:41:30.098][INFO][RK0][main]: Device 2: Tesla V100-SXM2-16GB
[HCTR][08:41:30.099][INFO][RK0][main]: Device 3: Tesla V100-SXM2-16GB
[HCTR][08:41:30.100][INFO][RK0][main]: Device 4: Tesla V100-SXM2-16GB
[HCTR][08:41:30.100][INFO][RK0][main]: Device 5: Tesla V100-SXM2-16GB
[HCTR][08:41:30.101][INFO][RK0][main]: Device 6: Tesla V100-SXM2-16GB
[HCTR][08:41:30.102][INFO][RK0][main]: Device 7: Tesla V100-SXM2-16GB
[HCTR][08:41:30.103][INFO][RK0][main]: num of DataReader workers: 8
[HCTR][08:41:30.133][DEBUG][RK0][tid #140397011531520]: file_name_ deepfm_data_nvt/train/1.c7b6f2423fec47ff97a09ec95f6346f9.parquet file_total_rows_ 4585117
[HCTR][08:41:30.133][DEBUG][RK0][tid #140397774890752]: file_name_ deepfm_data_nvt/train/5.c5b89db1e82d4842998d560796eab838.parquet file_total_rows_ 4583901
[HCTR][08:41:30.140][DEBUG][RK0][tid #140397783283456]: file_name_ deepfm_data_nvt/train/4.4f7e95ed8f9b4bcc9b63c5f3278e6905.parquet file_total_rows_ 4580476
[HCTR][08:41:30.140][DEBUG][RK0][tid #140394788546304]: file_name_ deepfm_data_nvt/train/6.92133f3ee3664684854969202958122f.parquet file_total_rows_ 4581782
[HCTR][08:41:30.140][DEBUG][RK0][tid #140406197053184]: file_name_ deepfm_data_nvt/train/2.6b134d3f8f0a4f0d9453f1d7c08f74d5.parquet file_total_rows_ 4584304
[HCTR][08:41:30.140][DEBUG][RK0][tid #140397829383936]: file_name_ deepfm_data_nvt/train/3.4b192542e2ad4cc8b745feb142d1878a.parquet file_total_rows_ 4581022
[HCTR][08:41:30.140][DEBUG][RK0][tid #140394780153600]: file_name_ deepfm_data_nvt/train/7.9345ade3421b40a5803f518c48ae436f.parquet file_total_rows_ 4589169
[HCTR][08:41:30.141][DEBUG][RK0][tid #140397003138816]: file_name_ deepfm_data_nvt/train/0.1738817c5c5c47dba75a428d0837cbc3.parquet file_total_rows_ 4586722
[HCTR][08:41:30.155][INFO][RK0][main]: Vocabulary size: 1221286
[HCTR][08:41:30.156][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:41:30.156][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:41:30.156][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:41:30.156][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:41:30.156][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:41:30.156][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:41:30.174][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:41:30.179][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:41:30.767][INFO][RK0][main]: Graph analysis to resolve tensor dependency
===================================================Model Compile===================================================
===================================================Model Summary===================================================
[HCTR][08:42:08.060][INFO][RK0][main]: label Dense Sparse
label dense data0,data1,data2,data3,data4,data5,data6,data7,data8,data9,data10,data11,data12,data13,data14,data15,data16,data17,data18,data19,data20,data21,data22,data23,data24,data25
(None, 1) (None, 13)
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
Layer Type Input Name Output Name Output Shape
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
EmbeddingCollection data0 emb_vec0 (None, 1, 128)
data1 emb_vec1 (None, 1, 128)
data2 emb_vec2 (None, 1, 128)
data3 emb_vec3 (None, 1, 128)
data4 emb_vec4 (None, 1, 128)
data5 emb_vec5 (None, 1, 128)
data6 emb_vec6 (None, 1, 128)
data7 emb_vec7 (None, 1, 128)
data8 emb_vec8 (None, 1, 128)
data9 emb_vec9 (None, 1, 128)
data10 emb_vec10 (None, 1, 128)
data11 emb_vec11 (None, 1, 128)
data12 emb_vec12 (None, 1, 128)
data13 emb_vec13 (None, 1, 128)
data14 emb_vec14 (None, 1, 128)
data15 emb_vec15 (None, 1, 128)
data16 emb_vec16 (None, 1, 128)
data17 emb_vec17 (None, 1, 128)
data18 emb_vec18 (None, 1, 128)
data19 emb_vec19 (None, 1, 128)
data20 emb_vec20 (None, 1, 128)
data21 emb_vec21 (None, 1, 128)
data22 emb_vec22 (None, 1, 128)
data23 emb_vec23 (None, 1, 128)
data24 emb_vec24 (None, 1, 128)
data25 emb_vec25 (None, 1, 128)
------------------------------------------------------------------------------------------------------------------
Concat emb_vec0 sparse_embedding1 (None, 26, 128)
emb_vec1
emb_vec2
emb_vec3
emb_vec4
emb_vec5
emb_vec6
emb_vec7
emb_vec8
emb_vec9
emb_vec10
emb_vec11
emb_vec12
emb_vec13
emb_vec14
emb_vec15
emb_vec16
emb_vec17
emb_vec18
emb_vec19
emb_vec20
emb_vec21
emb_vec22
emb_vec23
emb_vec24
emb_vec25
------------------------------------------------------------------------------------------------------------------
InnerProduct dense fc1 (None, 512)
------------------------------------------------------------------------------------------------------------------
ReLU fc1 relu1 (None, 512)
------------------------------------------------------------------------------------------------------------------
InnerProduct relu1 fc2 (None, 256)
------------------------------------------------------------------------------------------------------------------
ReLU fc2 relu2 (None, 256)
------------------------------------------------------------------------------------------------------------------
InnerProduct relu2 fc3 (None, 128)
------------------------------------------------------------------------------------------------------------------
ReLU fc3 relu3 (None, 128)
------------------------------------------------------------------------------------------------------------------
Interaction relu3 interaction1 (None, 480)
sparse_embedding1
------------------------------------------------------------------------------------------------------------------
InnerProduct interaction1 fc4 (None, 1024)
------------------------------------------------------------------------------------------------------------------
ReLU fc4 relu4 (None, 1024)
------------------------------------------------------------------------------------------------------------------
InnerProduct relu4 fc5 (None, 1024)
------------------------------------------------------------------------------------------------------------------
ReLU fc5 relu5 (None, 1024)
------------------------------------------------------------------------------------------------------------------
InnerProduct relu5 fc6 (None, 512)
------------------------------------------------------------------------------------------------------------------
ReLU fc6 relu6 (None, 512)
------------------------------------------------------------------------------------------------------------------
InnerProduct relu6 fc7 (None, 256)
------------------------------------------------------------------------------------------------------------------
ReLU fc7 relu7 (None, 256)
------------------------------------------------------------------------------------------------------------------
InnerProduct relu7 fc8 (None, 1)
------------------------------------------------------------------------------------------------------------------
BinaryCrossEntropyLoss fc8 loss
label
------------------------------------------------------------------------------------------------------------------
=====================================================Model Fit=====================================================
[HCTR][08:42:08.061][INFO][RK0][main]: Use non-epoch mode with number of iterations: 1000
[HCTR][08:42:08.061][INFO][RK0][main]: Training batchsize: 65536, evaluation batchsize: 65536
[HCTR][08:42:08.061][INFO][RK0][main]: Evaluation interval: 100, snapshot interval: 10000000
[HCTR][08:42:08.061][INFO][RK0][main]: Dense network trainable: True
[HCTR][08:42:08.061][INFO][RK0][main]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
[HCTR][08:42:08.061][INFO][RK0][main]: lr: 0.500000, warmup_steps: 300, end_lr: 0.000000
[HCTR][08:42:08.061][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
[HCTR][08:42:08.061][INFO][RK0][main]: Training source file: ./deepfm_data_nvt/train/_file_list.txt
[HCTR][08:42:08.061][INFO][RK0][main]: Evaluation source file: ./deepfm_data_nvt/val/_file_list.txt
[HCTR][08:42:16.322][INFO][RK0][main]: Iter: 100 Time(100 iters): 8.19237s Loss: 0.140113 lr:0.168333
[HCTR][08:42:18.453][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:42:18.491][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:42:18.534][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:42:18.572][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:42:18.610][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:42:18.651][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:42:18.684][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:42:18.720][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:42:18.957][INFO][RK0][main]: Evaluation, AverageLoss: 0.141261
[HCTR][08:42:18.957][INFO][RK0][main]: Eval Time for 70 iters: 2.63429s
[HCTR][08:42:27.041][INFO][RK0][main]: Iter: 200 Time(100 iters): 10.6496s Loss: 0.142313 lr:0.335
[HCTR][08:42:29.077][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:42:29.115][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:42:29.157][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:42:29.195][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:42:29.237][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:42:29.275][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:42:29.312][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:42:29.351][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:42:29.532][INFO][RK0][main]: Evaluation, AverageLoss: 0.141891
[HCTR][08:42:29.532][INFO][RK0][main]: Eval Time for 70 iters: 2.4907s
[HCTR][08:42:37.639][INFO][RK0][main]: Iter: 300 Time(100 iters): 10.5395s Loss: 0.154403 lr:0.5
[HCTR][08:42:39.748][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:42:39.785][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:42:39.824][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:42:39.862][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:42:39.905][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:42:39.952][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:42:39.987][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:42:40.021][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:42:40.125][INFO][RK0][main]: Evaluation, AverageLoss: 0.147726
[HCTR][08:42:40.125][INFO][RK0][main]: Eval Time for 70 iters: 2.48534s
[HCTR][08:42:48.262][INFO][RK0][main]: Iter: 400 Time(100 iters): 10.5647s Loss: 0.141461 lr:0.5
[HCTR][08:42:50.199][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:42:50.274][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:42:50.311][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:42:50.462][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:42:50.533][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:42:50.638][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:42:50.675][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:42:50.714][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:42:50.788][INFO][RK0][main]: Evaluation, AverageLoss: 0.140187
[HCTR][08:42:50.788][INFO][RK0][main]: Eval Time for 70 iters: 2.52533s
[HCTR][08:42:58.948][INFO][RK0][main]: Iter: 500 Time(100 iters): 10.605s Loss: 0.142035 lr:0.5
[HCTR][08:43:00.914][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:43:00.951][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:43:00.990][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:43:01.023][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:43:01.057][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:43:01.094][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:43:01.127][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:43:01.163][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:43:01.403][INFO][RK0][main]: Evaluation, AverageLoss: 0.140354
[HCTR][08:43:01.403][INFO][RK0][main]: Eval Time for 70 iters: 2.45442s
[HCTR][08:43:04.871][DEBUG][RK0][tid #140397003138816]: file_name_ deepfm_data_nvt/train/0.1738817c5c5c47dba75a428d0837cbc3.parquet file_total_rows_ 4586722
[HCTR][08:43:04.951][DEBUG][RK0][tid #140397011531520]: file_name_ deepfm_data_nvt/train/1.c7b6f2423fec47ff97a09ec95f6346f9.parquet file_total_rows_ 4585117
[HCTR][08:43:05.031][DEBUG][RK0][tid #140406197053184]: file_name_ deepfm_data_nvt/train/2.6b134d3f8f0a4f0d9453f1d7c08f74d5.parquet file_total_rows_ 4584304
[HCTR][08:43:05.111][DEBUG][RK0][tid #140397829383936]: file_name_ deepfm_data_nvt/train/3.4b192542e2ad4cc8b745feb142d1878a.parquet file_total_rows_ 4581022
[HCTR][08:43:05.192][DEBUG][RK0][tid #140397783283456]: file_name_ deepfm_data_nvt/train/4.4f7e95ed8f9b4bcc9b63c5f3278e6905.parquet file_total_rows_ 4580476
[HCTR][08:43:05.274][DEBUG][RK0][tid #140397774890752]: file_name_ deepfm_data_nvt/train/5.c5b89db1e82d4842998d560796eab838.parquet file_total_rows_ 4583901
[HCTR][08:43:05.354][DEBUG][RK0][tid #140394788546304]: file_name_ deepfm_data_nvt/train/6.92133f3ee3664684854969202958122f.parquet file_total_rows_ 4581782
[HCTR][08:43:06.072][DEBUG][RK0][tid #140394780153600]: file_name_ deepfm_data_nvt/train/7.9345ade3421b40a5803f518c48ae436f.parquet file_total_rows_ 4589169
[HCTR][08:43:09.539][INFO][RK0][main]: Iter: 600 Time(100 iters): 10.5255s Loss: 0.140006 lr:0.5
[HCTR][08:43:11.577][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:43:11.615][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:43:11.653][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:43:11.690][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:43:11.734][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:43:11.780][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:43:11.813][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:43:11.851][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:43:12.020][INFO][RK0][main]: Evaluation, AverageLoss: 0.141187
[HCTR][08:43:12.020][INFO][RK0][main]: Eval Time for 70 iters: 2.4811s
[HCTR][08:43:20.138][INFO][RK0][main]: Iter: 700 Time(100 iters): 10.5241s Loss: 0.143169 lr:0.5
[HCTR][08:43:22.305][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:43:22.343][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:43:22.382][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:43:22.420][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:43:22.463][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:43:22.507][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:43:22.551][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:43:22.588][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:43:22.694][INFO][RK0][main]: Evaluation, AverageLoss: 0.140917
[HCTR][08:43:22.694][INFO][RK0][main]: Eval Time for 70 iters: 2.55575s
[HCTR][08:43:30.768][INFO][RK0][main]: Iter: 800 Time(100 iters): 10.5603s Loss: 0.143395 lr:0.5
[HCTR][08:43:32.698][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:43:32.771][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:43:32.809][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:43:32.950][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:43:33.023][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:43:33.131][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:43:33.169][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:43:33.212][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:43:33.292][INFO][RK0][main]: Evaluation, AverageLoss: 0.139397
[HCTR][08:43:33.292][INFO][RK0][main]: Eval Time for 70 iters: 2.52409s
[HCTR][08:43:41.361][INFO][RK0][main]: Iter: 900 Time(100 iters): 10.5237s Loss: 0.141716 lr:0.5
[HCTR][08:43:43.361][DEBUG][RK0][tid #140394662721280]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:43:43.399][DEBUG][RK0][tid #140394654328576]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:43:43.436][DEBUG][RK0][tid #140394645935872]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:43:43.474][DEBUG][RK0][tid #140394528503552]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:43:43.518][DEBUG][RK0][tid #140394520110848]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:43:43.555][DEBUG][RK0][tid #140394511718144]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:43:43.589][DEBUG][RK0][tid #140394394285824]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:43:43.626][DEBUG][RK0][tid #140394385893120]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:43:43.867][INFO][RK0][main]: Evaluation, AverageLoss: 0.141604
[HCTR][08:43:43.867][INFO][RK0][main]: Eval Time for 70 iters: 2.50584s
[HCTR][08:43:51.826][INFO][RK0][main]: Finish 1000 iterations with batchsize: 65536 in 103.76s.
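The progress lines in this log can be turned into a rough throughput figure. The following is a minimal sketch and not part of the notebook: the samples_per_second helper and its regular expression are hypothetical names written for this illustration, and the two sample lines are copied from the log above. Judging from the timestamps, each 100-iteration window also seems to include the evaluation pass, so the result is closer to wall-clock throughput than to pure training throughput.

```python
import re

# Matches HugeCTR progress lines such as:
#   "... Iter: 600 Time(100 iters): 10.5255s Loss: 0.140006 lr:0.5"
ITER_LINE = re.compile(r"Iter: (\d+) Time\((\d+) iters\): ([\d.]+)s")


def samples_per_second(log_text, batch_size):
    """Return {iteration: samples per second} for each progress line found."""
    rates = {}
    for match in ITER_LINE.finditer(log_text):
        iteration = int(match.group(1))
        window_iters = int(match.group(2))
        seconds = float(match.group(3))
        rates[iteration] = window_iters * batch_size / seconds
    return rates


# Two progress lines copied from the training log above.
log_text = """
[HCTR][08:43:09.539][INFO][RK0][main]: Iter: 600 Time(100 iters): 10.5255s Loss: 0.140006 lr:0.5
[HCTR][08:43:20.138][INFO][RK0][main]: Iter: 700 Time(100 iters): 10.5241s Loss: 0.143169 lr:0.5
"""

# 65536 is the training batch size reported in the log.
for iteration, rate in samples_per_second(log_text, batch_size=65536).items():
    print(f"iter {iteration}: {rate:,.0f} samples/s")
```

The same helper can be applied to the log of any other run to compare placement strategies on equal terms.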
Embedding Table Placement Strategy: Distributed
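The generate_distributed_plan() function shows how to distribute all tables across all GPUs. Every GPU lists every table in its local_embedding_list and holds one shard of each: shard_id is the GPU index and shards_count is the number of GPUs. This strategy is similar to hugectr.DistributedHashEmbedding.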
import json

def generate_distributed_plan(slot_size_array, gpu_count, plan_file):
    # Shard every table across all GPUs: each GPU lists every table
    # and owns one of gpu_count shards.
    plan = []
    for gpu_id in range(gpu_count):
        distributed_plan = {
            "local_embedding_list": list(range(len(slot_size_array))),
            "table_placement_strategy": "mp",
            "shard_id": gpu_id,
            "shards_count": gpu_count,
        }
        plan.append([distributed_plan])

    # Generate the global view of table placement: one local list per GPU.
    distributed_global_embedding_list = []
    for single_gpu_plan in plan:
        distributed_global_embedding_list.append(
            single_gpu_plan[0]["local_embedding_list"]
        )
    for single_gpu_plan in plan:
        single_gpu_plan[0]["global_embedding_list"] = distributed_global_embedding_list

    # print_plan() is the helper defined earlier in this notebook.
    print_plan(plan)

    # Write the plan file to disk.
    with open(plan_file, "w") as f:
        json.dump(plan, f, indent=4)
slot_size_array = [
    203931,
    18598,
    14092,
    7012,
    18977,
    4,
    6385,
    1245,
    49,
    186213,
    71328,
    67288,
    11,
    2168,
    7338,
    61,
    4,
    932,
    15,
    204515,
    141526,
    199433,
    60919,
    9137,
    71,
    34,
]
generate_distributed_plan(
    slot_size_array=slot_size_array,
    gpu_count=8,
    plan_file="./distributed_plan.json",
)
single_gpu_plan index = 0
local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
table_placement_strategy:mp
shard_id:0
shards_count:8
global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
single_gpu_plan index = 1
local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
table_placement_strategy:mp
shard_id:1
shards_count:8
global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
single_gpu_plan index = 2
local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
table_placement_strategy:mp
shard_id:2
shards_count:8
global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
single_gpu_plan index = 3
local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
table_placement_strategy:mp
shard_id:3
shards_count:8
global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
single_gpu_plan index = 4
local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
table_placement_strategy:mp
shard_id:4
shards_count:8
global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
single_gpu_plan index = 5
local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
table_placement_strategy:mp
shard_id:5
shards_count:8
global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
single_gpu_plan index = 6
local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
table_placement_strategy:mp
shard_id:6
shards_count:8
global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
single_gpu_plan index = 7
local_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
table_placement_strategy:mp
shard_id:7
shards_count:8
global_embedding_list:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
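To get a feel for what the distributed plan means for memory, the sketch below estimates how much embedding storage each GPU holds when every table is split into eight shards. It is a back-of-the-envelope calculation and not HugeCTR code. It assumes that rows are divided evenly between shards and that weights are stored as 4-byte floats (the training log reports mixed precision as off), and it ignores optimizer state and workspace buffers. estimate_shard_bytes is a hypothetical helper; the table sizes and the embedding width of 128 come from this notebook.

```python
# Table sizes copied from slot_size_array in this notebook.
slot_size_array = [
    203931, 18598, 14092, 7012, 18977, 4, 6385, 1245, 49,
    186213, 71328, 67288, 11, 2168, 7338, 61, 4, 932, 15,
    204515, 141526, 199433, 60919, 9137, 71, 34,
]

EMBEDDING_DIM = 128   # width of emb_vec0..emb_vec25 in the model summary
BYTES_PER_VALUE = 4   # assumption: fp32 weights
GPU_COUNT = 8         # shards_count in the distributed plan


def estimate_shard_bytes(table_rows, shards_count,
                         dim=EMBEDDING_DIM, bytes_per_value=BYTES_PER_VALUE):
    """Approximate bytes of one shard, assuming rows are split evenly."""
    rows_per_shard = -(-table_rows // shards_count)  # ceiling division
    return rows_per_shard * dim * bytes_per_value


total_rows = sum(slot_size_array)
full_copy_bytes = total_rows * EMBEDDING_DIM * BYTES_PER_VALUE
per_gpu_bytes = sum(
    estimate_shard_bytes(rows, GPU_COUNT) for rows in slot_size_array
)

print(f"total rows across all tables: {total_rows}")
print(f"all tables, unsharded: {full_copy_bytes / 2**20:.1f} MiB")
print(f"per GPU with {GPU_COUNT} shards: {per_gpu_bytes / 2**20:.1f} MiB")
```

The row total equals the Vocabulary size of 1221286 that HugeCTR prints during model initialization, which is a quick way to confirm that slot_size_array matches the dataset.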
!python3 dlrm_train.py ./distributed_plan.json
HugeCTR Version: 3.7
====================================================Model Init=====================================================
[HCTR][08:44:05.384][WARNING][RK0][main]: The model name is not specified when creating the solver.
[HCTR][08:44:05.384][INFO][RK0][main]: Global seed is 1510630763
[HCTR][08:44:05.843][INFO][RK0][main]: Device to NUMA mapping:
GPU 0 -> node 0
GPU 1 -> node 0
GPU 2 -> node 0
GPU 3 -> node 0
GPU 4 -> node 1
GPU 5 -> node 1
GPU 6 -> node 1
GPU 7 -> node 1
[HCTR][08:44:17.340][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
[HCTR][08:44:17.341][INFO][RK0][main]: Start all2all warmup
[HCTR][08:44:17.532][INFO][RK0][main]: End all2all warmup
[HCTR][08:44:17.544][INFO][RK0][main]: Using All-reduce algorithm: NCCL
[HCTR][08:44:17.545][INFO][RK0][main]: Device 0: Tesla V100-SXM2-16GB
[HCTR][08:44:17.546][INFO][RK0][main]: Device 1: Tesla V100-SXM2-16GB
[HCTR][08:44:17.547][INFO][RK0][main]: Device 2: Tesla V100-SXM2-16GB
[HCTR][08:44:17.548][INFO][RK0][main]: Device 3: Tesla V100-SXM2-16GB
[HCTR][08:44:17.548][INFO][RK0][main]: Device 4: Tesla V100-SXM2-16GB
[HCTR][08:44:17.549][INFO][RK0][main]: Device 5: Tesla V100-SXM2-16GB
[HCTR][08:44:17.550][INFO][RK0][main]: Device 6: Tesla V100-SXM2-16GB
[HCTR][08:44:17.551][INFO][RK0][main]: Device 7: Tesla V100-SXM2-16GB
[HCTR][08:44:17.552][INFO][RK0][main]: num of DataReader workers: 8
[HCTR][08:44:17.578][DEBUG][RK0][tid #139614253741824]: file_name_ deepfm_data_nvt/train/0.1738817c5c5c47dba75a428d0837cbc3.parquet file_total_rows_ 4586722
[HCTR][08:44:17.578][DEBUG][RK0][tid #139614119524096]: file_name_ deepfm_data_nvt/train/6.92133f3ee3664684854969202958122f.parquet file_total_rows_ 4581782
[HCTR][08:44:17.579][DEBUG][RK0][tid #139614371174144]: file_name_ deepfm_data_nvt/train/1.c7b6f2423fec47ff97a09ec95f6346f9.parquet file_total_rows_ 4585117
[HCTR][08:44:17.579][DEBUG][RK0][tid #139618506761984]: file_name_ deepfm_data_nvt/train/2.6b134d3f8f0a4f0d9453f1d7c08f74d5.parquet file_total_rows_ 4584304
[HCTR][08:44:17.579][DEBUG][RK0][tid #139614505387776]: file_name_ deepfm_data_nvt/train/3.4b192542e2ad4cc8b745feb142d1878a.parquet file_total_rows_ 4581022
[HCTR][08:44:17.579][DEBUG][RK0][tid #139614387959552]: file_name_ deepfm_data_nvt/train/4.4f7e95ed8f9b4bcc9b63c5f3278e6905.parquet file_total_rows_ 4580476
[HCTR][08:44:17.579][DEBUG][RK0][tid #139614111131392]: file_name_ deepfm_data_nvt/train/7.9345ade3421b40a5803f518c48ae436f.parquet file_total_rows_ 4589169
[HCTR][08:44:17.583][INFO][RK0][main]: Vocabulary size: 1221286
[HCTR][08:44:17.583][DEBUG][RK0][tid #139614379566848]: file_name_ deepfm_data_nvt/train/5.c5b89db1e82d4842998d560796eab838.parquet file_total_rows_ 4583901
[HCTR][08:44:17.583][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:44:17.583][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:44:17.583][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:44:17.583][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:44:17.583][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:44:17.583][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:44:17.584][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:44:17.584][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:44:18.246][INFO][RK0][main]: Graph analysis to resolve tensor dependency
===================================================Model Compile===================================================
===================================================Model Summary===================================================
[HCTR][08:44:55.511][INFO][RK0][main]: label Dense Sparse
label dense data0,data1,data2,data3,data4,data5,data6,data7,data8,data9,data10,data11,data12,data13,data14,data15,data16,data17,data18,data19,data20,data21,data22,data23,data24,data25
(None, 1) (None, 13)
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
Layer Type Input Name Output Name Output Shape
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
EmbeddingCollection data0 emb_vec0 (None, 1, 128)
data1 emb_vec1 (None, 1, 128)
data2 emb_vec2 (None, 1, 128)
data3 emb_vec3 (None, 1, 128)
data4 emb_vec4 (None, 1, 128)
data5 emb_vec5 (None, 1, 128)
data6 emb_vec6 (None, 1, 128)
data7 emb_vec7 (None, 1, 128)
data8 emb_vec8 (None, 1, 128)
data9 emb_vec9 (None, 1, 128)
data10 emb_vec10 (None, 1, 128)
data11 emb_vec11 (None, 1, 128)
data12 emb_vec12 (None, 1, 128)
data13 emb_vec13 (None, 1, 128)
data14 emb_vec14 (None, 1, 128)
data15 emb_vec15 (None, 1, 128)
data16 emb_vec16 (None, 1, 128)
data17 emb_vec17 (None, 1, 128)
data18 emb_vec18 (None, 1, 128)
data19 emb_vec19 (None, 1, 128)
data20 emb_vec20 (None, 1, 128)
data21 emb_vec21 (None, 1, 128)
data22 emb_vec22 (None, 1, 128)
data23 emb_vec23 (None, 1, 128)
data24 emb_vec24 (None, 1, 128)
data25 emb_vec25 (None, 1, 128)
------------------------------------------------------------------------------------------------------------------
Concat emb_vec0 sparse_embedding1 (None, 26, 128)
emb_vec1
emb_vec2
emb_vec3
emb_vec4
emb_vec5
emb_vec6
emb_vec7
emb_vec8
emb_vec9
emb_vec10
emb_vec11
emb_vec12
emb_vec13
emb_vec14
emb_vec15
emb_vec16
emb_vec17
emb_vec18
emb_vec19
emb_vec20
emb_vec21
emb_vec22
emb_vec23
emb_vec24
emb_vec25
------------------------------------------------------------------------------------------------------------------
InnerProduct dense fc1 (None, 512)
------------------------------------------------------------------------------------------------------------------
ReLU fc1 relu1 (None, 512)
------------------------------------------------------------------------------------------------------------------
InnerProduct relu1 fc2 (None, 256)
------------------------------------------------------------------------------------------------------------------
ReLU fc2 relu2 (None, 256)
------------------------------------------------------------------------------------------------------------------
InnerProduct relu2 fc3 (None, 128)
------------------------------------------------------------------------------------------------------------------
ReLU fc3 relu3 (None, 128)
------------------------------------------------------------------------------------------------------------------
Interaction relu3 interaction1 (None, 480)
sparse_embedding1
------------------------------------------------------------------------------------------------------------------
InnerProduct interaction1 fc4 (None, 1024)
------------------------------------------------------------------------------------------------------------------
ReLU fc4 relu4 (None, 1024)
------------------------------------------------------------------------------------------------------------------
InnerProduct relu4 fc5 (None, 1024)
------------------------------------------------------------------------------------------------------------------
ReLU fc5 relu5 (None, 1024)
------------------------------------------------------------------------------------------------------------------
InnerProduct relu5 fc6 (None, 512)
------------------------------------------------------------------------------------------------------------------
ReLU fc6 relu6 (None, 512)
------------------------------------------------------------------------------------------------------------------
InnerProduct relu6 fc7 (None, 256)
------------------------------------------------------------------------------------------------------------------
ReLU fc7 relu7 (None, 256)
------------------------------------------------------------------------------------------------------------------
InnerProduct relu7 fc8 (None, 1)
------------------------------------------------------------------------------------------------------------------
BinaryCrossEntropyLoss fc8 loss
label
------------------------------------------------------------------------------------------------------------------
=====================================================Model Fit=====================================================
[HCTR][08:44:55.512][INFO][RK0][main]: Use non-epoch mode with number of iterations: 1000
[HCTR][08:44:55.512][INFO][RK0][main]: Training batchsize: 65536, evaluation batchsize: 65536
[HCTR][08:44:55.512][INFO][RK0][main]: Evaluation interval: 100, snapshot interval: 10000000
[HCTR][08:44:55.512][INFO][RK0][main]: Dense network trainable: True
[HCTR][08:44:55.512][INFO][RK0][main]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
[HCTR][08:44:55.512][INFO][RK0][main]: lr: 0.500000, warmup_steps: 300, end_lr: 0.000000
[HCTR][08:44:55.512][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
[HCTR][08:44:55.512][INFO][RK0][main]: Training source file: ./deepfm_data_nvt/train/_file_list.txt
[HCTR][08:44:55.512][INFO][RK0][main]: Evaluation source file: ./deepfm_data_nvt/val/_file_list.txt
[HCTR][08:45:11.057][INFO][RK0][main]: Iter: 100 Time(100 iters): 15.3926s Loss: 0.144649 lr:0.168333
[HCTR][08:45:14.328][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:45:14.386][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:45:14.444][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:45:14.503][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:45:14.562][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:45:14.620][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:45:14.677][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:45:14.736][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:45:15.139][INFO][RK0][main]: Evaluation, AverageLoss: 0.146034
[HCTR][08:45:15.139][INFO][RK0][main]: Eval Time for 70 iters: 4.08247s
[HCTR][08:45:30.553][INFO][RK0][main]: Iter: 200 Time(100 iters): 19.4961s Loss: 0.149704 lr:0.335
[HCTR][08:45:33.943][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:45:34.001][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:45:34.060][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:45:34.118][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:45:34.177][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:45:34.235][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:45:34.292][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:45:34.350][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:45:34.645][INFO][RK0][main]: Evaluation, AverageLoss: 0.146364
[HCTR][08:45:34.645][INFO][RK0][main]: Eval Time for 70 iters: 4.09159s
[HCTR][08:45:50.040][INFO][RK0][main]: Iter: 300 Time(100 iters): 19.3843s Loss: 0.158335 lr:0.5
[HCTR][08:45:53.544][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:45:53.603][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:45:53.660][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:45:53.720][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:45:53.778][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:45:53.836][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:45:53.894][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:45:53.953][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:45:54.121][INFO][RK0][main]: Evaluation, AverageLoss: 0.148422
[HCTR][08:45:54.121][INFO][RK0][main]: Eval Time for 70 iters: 4.08104s
[HCTR][08:46:09.558][INFO][RK0][main]: Iter: 400 Time(100 iters): 19.5178s Loss: 0.139716 lr:0.5
[HCTR][08:46:12.751][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:46:12.868][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:46:12.926][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:46:13.164][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:46:13.280][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:46:13.457][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:46:13.514][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:46:13.574][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:46:13.683][INFO][RK0][main]: Evaluation, AverageLoss: 0.139018
[HCTR][08:46:13.683][INFO][RK0][main]: Eval Time for 70 iters: 4.12495s
[HCTR][08:46:29.073][INFO][RK0][main]: Iter: 500 Time(100 iters): 19.4974s Loss: 0.139979 lr:0.5
[HCTR][08:46:32.347][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:46:32.403][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:46:32.462][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:46:32.521][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:46:32.579][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:46:32.637][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:46:32.696][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:46:32.754][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:46:33.157][INFO][RK0][main]: Evaluation, AverageLoss: 0.138041
[HCTR][08:46:33.157][INFO][RK0][main]: Eval Time for 70 iters: 4.08385s
[HCTR][08:46:39.671][DEBUG][RK0][tid #139614253741824]: file_name_ deepfm_data_nvt/train/0.1738817c5c5c47dba75a428d0837cbc3.parquet file_total_rows_ 4586722
[HCTR][08:46:39.823][DEBUG][RK0][tid #139614371174144]: file_name_ deepfm_data_nvt/train/1.c7b6f2423fec47ff97a09ec95f6346f9.parquet file_total_rows_ 4585117
[HCTR][08:46:39.973][DEBUG][RK0][tid #139618506761984]: file_name_ deepfm_data_nvt/train/2.6b134d3f8f0a4f0d9453f1d7c08f74d5.parquet file_total_rows_ 4584304
[HCTR][08:46:40.125][DEBUG][RK0][tid #139614505387776]: file_name_ deepfm_data_nvt/train/3.4b192542e2ad4cc8b745feb142d1878a.parquet file_total_rows_ 4581022
[HCTR][08:46:40.284][DEBUG][RK0][tid #139614387959552]: file_name_ deepfm_data_nvt/train/4.4f7e95ed8f9b4bcc9b63c5f3278e6905.parquet file_total_rows_ 4580476
[HCTR][08:46:40.431][DEBUG][RK0][tid #139614379566848]: file_name_ deepfm_data_nvt/train/5.c5b89db1e82d4842998d560796eab838.parquet file_total_rows_ 4583901
[HCTR][08:46:40.586][DEBUG][RK0][tid #139614119524096]: file_name_ deepfm_data_nvt/train/6.92133f3ee3664684854969202958122f.parquet file_total_rows_ 4581782
[HCTR][08:46:41.968][DEBUG][RK0][tid #139614111131392]: file_name_ deepfm_data_nvt/train/7.9345ade3421b40a5803f518c48ae436f.parquet file_total_rows_ 4589169
[HCTR][08:46:48.555][INFO][RK0][main]: Iter: 600 Time(100 iters): 19.4819s Loss: 0.134819 lr:0.5
[HCTR][08:46:51.959][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:46:52.017][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:46:52.075][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:46:52.134][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:46:52.193][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:46:52.253][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:46:52.313][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:46:52.372][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:46:52.663][INFO][RK0][main]: Evaluation, AverageLoss: 0.137611
[HCTR][08:46:52.663][INFO][RK0][main]: Eval Time for 70 iters: 4.1073s
[HCTR][08:47:08.048][INFO][RK0][main]: Iter: 700 Time(100 iters): 19.4769s Loss: 0.140394 lr:0.5
[HCTR][08:47:11.590][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:47:11.647][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:47:11.706][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:47:11.765][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:47:11.824][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:47:11.881][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:47:11.940][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:47:11.998][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:47:12.171][INFO][RK0][main]: Evaluation, AverageLoss: 0.138108
[HCTR][08:47:12.171][INFO][RK0][main]: Eval Time for 70 iters: 4.1189s
[HCTR][08:47:27.578][INFO][RK0][main]: Iter: 800 Time(100 iters): 19.5118s Loss: 0.141259 lr:0.5
[HCTR][08:47:30.764][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:47:30.880][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:47:30.938][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:47:31.175][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:47:31.292][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:47:31.467][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:47:31.525][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:47:31.584][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:47:31.692][INFO][RK0][main]: Evaluation, AverageLoss: 0.137271
[HCTR][08:47:31.692][INFO][RK0][main]: Eval Time for 70 iters: 4.11364s
[HCTR][08:47:47.105][INFO][RK0][main]: Iter: 900 Time(100 iters): 19.3756s Loss: 0.13619 lr:0.5
[HCTR][08:47:50.398][DEBUG][RK0][tid #139614102738688]: file_name_ deepfm_data_nvt/val/0.35ab81b16b4a409ba42a1baf89dcba52.parquet file_total_rows_ 571942
[HCTR][08:47:50.455][DEBUG][RK0][tid #139609623230208]: file_name_ deepfm_data_nvt/val/1.01854d707a564342aef3af44b814de1c.parquet file_total_rows_ 573919
[HCTR][08:47:50.513][DEBUG][RK0][tid #139609614837504]: file_name_ deepfm_data_nvt/val/2.7d7593c16af64625973ed246f68af624.parquet file_total_rows_ 572137
[HCTR][08:47:50.573][DEBUG][RK0][tid #139609606444800]: file_name_ deepfm_data_nvt/val/3.eec657484d40418cbf2648541592d09e.parquet file_total_rows_ 572545
[HCTR][08:47:50.631][DEBUG][RK0][tid #139609220577024]: file_name_ deepfm_data_nvt/val/4.e60c2f9421d84490bbc4de5f15ec5a0f.parquet file_total_rows_ 573664
[HCTR][08:47:50.690][DEBUG][RK0][tid #139609212184320]: file_name_ deepfm_data_nvt/val/5.883be83fecd74c1fbac00321911f2787.parquet file_total_rows_ 573448
[HCTR][08:47:50.747][DEBUG][RK0][tid #139609203791616]: file_name_ deepfm_data_nvt/val/6.0f6ed30e74dc49668d1e1011e819e9e3.parquet file_total_rows_ 573727
[HCTR][08:47:50.805][DEBUG][RK0][tid #139609086359296]: file_name_ deepfm_data_nvt/val/7.9e48c14d9bde498a8ef5d840d636d276.parquet file_total_rows_ 572680
[HCTR][08:47:51.207][INFO][RK0][main]: Evaluation, AverageLoss: 0.137273
[HCTR][08:47:51.207][INFO][RK0][main]: Eval Time for 70 iters: 4.1011s
[HCTR][08:48:06.342][INFO][RK0][main]: Finish 1000 iterations with batchsize: 65536 in 190.83s.
Performance Comparison for the Different ETPS
The total training duration for the data parallel and model parallel strategy is 103.45s.
For the distributed strategy, the duration for 1000 iterations is 190.83s, as reported in the final log line.
This comparison shows that the choice of ETPS can greatly affect embedding performance: in these runs, the distributed strategy takes roughly 1.8 times as long.
The results suggest that when an embedding table fits on a single GPU, configuring it as data parallel or localized gives better performance than distributing it.
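The comparison above can be reproduced from the logs with a few lines of Python. The following is a minimal sketch, not part of HugeCTR: the helper names parse_finish and throughput are hypothetical, the regular expression assumes the "Finish ... iterations with batchsize: ... in ...s" wording shown in the log, and the 103.45s figure is copied from the text because the corresponding log line belongs to the earlier run.

```python
import re

# Final summary line copied from the distributed-strategy log above.
FINISH_LINE = (
    "[HCTR][08:48:06.342][INFO][RK0][main]: "
    "Finish 1000 iterations with batchsize: 65536 in 190.83s."
)

# Assumed pattern for the summary line; adjust it if the log wording differs.
FINISH_RE = re.compile(
    r"Finish (\d+) iterations with batchsize: (\d+) in ([\d.]+)s"
)


def parse_finish(line):
    """Return (iterations, batch_size, seconds) from a 'Finish' log line."""
    match = FINISH_RE.search(line)
    if match is None:
        raise ValueError("line does not look like a 'Finish' summary")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def throughput(iterations, batch_size, seconds):
    """Average number of samples processed per second over the whole run."""
    return iterations * batch_size / seconds


iters, batch, distributed_s = parse_finish(FINISH_LINE)

# Duration of the data parallel and model parallel run, taken from the text.
dp_mp_s = 103.45

# Ratio of the two durations: how many times longer the distributed run takes.
slowdown = distributed_s / dp_mp_s

print(f"distributed: {throughput(iters, batch, distributed_s):,.0f} samples/s")
print(f"distributed run takes {slowdown:.2f}x as long")
```

The throughput covers the whole run, including the evaluation passes that the log interleaves with training, so treat it as an end-to-end figure rather than pure training speed.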