Training HugeCTR Model with Pre-trained Embeddings
In this notebook, we train a deep neural network to predict a user's rating, a binary target that is 1 for ratings > 3 and 0 for ratings <= 3. The two categorical features are userId and movieId.
We also make use of the movies' pretrained embeddings, extracted in the previous notebooks.
Loading pretrained movie features into non-trainable embedding layer
# loading NVTabular movie encoding
import pandas as pd
import os
INPUT_DATA_DIR = './data'
movie_mapping = pd.read_parquet(os.path.join(INPUT_DATA_DIR, "workflow-hugectr/categories/unique.movieId.parquet"))
movie_mapping.tail()
|  | movieId | movieId_size |
|---|---|---|
| 56581 | 209155 | 1 |
| 56582 | 209157 | 1 |
| 56583 | 209159 | 1 |
| 56584 | 209169 | 1 |
| 56585 | 209171 | 1 |
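The row position in this mapping is the encoded id that NVTabular assigns to the categorical value, and the stored value is the original movieId. A quick lookup sketch, assuming the encoding shown in the tail above:

# Example: the encoded id is the row position in the mapping,
# and the stored value is the raw movieId (56585 -> 209171 in the tail above).
encoded_id = 56585
raw_movie_id = movie_mapping['movieId'].iloc[encoded_id]
print(raw_movie_id)  # 209171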
feature_df = pd.read_parquet('feature_df.parquet')
print(feature_df.shape)
feature_df.head()
(62423, 3073)
|  | movieId | poster_feature_0 | poster_feature_1 | poster_feature_2 | ... | text_feature_1022 | text_feature_1023 |
|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.026260 | 0.857608 | 0.410247 | ... | 0.727113 | 0.085829 |
| 1 | 2.0 | 0.141265 | 0.721758 | 0.679958 | ... | 0.638048 | 0.685537 |
| 2 | 3.0 | 0.119418 | 0.911146 | 0.470762 | ... | 0.565142 | 0.164083 |
| 3 | 4.0 | 0.538184 | 0.980678 | 0.643513 | ... | 0.902779 | 0.580946 |
| 4 | 5.0 | 0.772951 | 0.239788 | 0.061874 | ... | 0.002342 | 0.932321 |

5 rows × 3073 columns
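As a quick sanity check (a sketch that assumes the poster_feature_*/text_feature_* column naming shown above), the 3072 feature columns should split into 2048 image features and 1024 text features:

# Count the poster (image) and text feature columns; the remaining column is movieId.
poster_cols = [c for c in feature_df.columns if c.startswith('poster_feature_')]
text_cols = [c for c in feature_df.columns if c.startswith('text_feature_')]
print(len(poster_cols), len(text_cols))  # expected: 2048 1024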
feature_df.set_index('movieId', inplace=True)
from tqdm import tqdm
import numpy as np
num_tokens = len(movie_mapping)
embedding_dim = 2048+1024
hits = 0
misses = 0
# Prepare embedding matrix
embedding_matrix = np.zeros((num_tokens, embedding_dim))
print("Loading pretrained embedding matrix...")
for i, row in tqdm(movie_mapping.iterrows(), total=len(movie_mapping)):
movieId = row['movieId']
if movieId in feature_df.index:
embedding_vector = feature_df.loc[movieId]
# embedding found
embedding_matrix[i] = embedding_vector
hits += 1
else:
misses += 1
print("Found features for %d movies (%d misses)" % (hits, misses))
Loading pretrained embedding matrix...
100%|████████████████████████████████████| 56586/56586 [00:14<00:00, 3967.46it/s]
Found features for 56585 movies (1 misses)
embedding_dim
3072
embedding_matrix
array([[0. , 0. , 0. , ..., 0. , 0. ,
0. ],
[0.17294852, 0.15285189, 0.26095702, ..., 0.75369112, 0.29602144,
0.78917433],
[0.13539355, 0.84843078, 0.70951219, ..., 0.10441725, 0.72871966,
0.11719463],
...,
[0.18514273, 0.72422918, 0.04273015, ..., 0.1404219 , 0.54169348,
0.96875489],
[0.08307642, 0.3673532 , 0.15777258, ..., 0.01297393, 0.36267638,
0.14848055],
[0.82188376, 0.56516905, 0.70838085, ..., 0.45119769, 0.9273439 ,
0.42464321]])
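The single miss reported above most likely corresponds to the first mapping row, which NVTabular reserves for null/out-of-vocabulary values; its embedding stays a zero vector. As an aside, the same matrix can be built without the Python loop. A minimal sketch, assuming the dtypes of movie_mapping['movieId'] and the feature_df index line up:

# Vectorized alternative: reindex the feature table in mapping order and
# fill movieIds missing from feature_df (e.g., the null/OOV row) with zeros.
embedding_matrix_alt = (
    feature_df.reindex(movie_mapping['movieId'].astype('float64').values)
    .fillna(0.0)
    .to_numpy()
)
assert embedding_matrix_alt.shape == (num_tokens, embedding_dim)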
Next, we write the pretrained embeddings to a raw format supported by HugeCTR.
Note: As of version 3.2, HugeCTR supports a maximum embedding size of 1024. We therefore use the first 512 elements of the image embedding plus the first 512 elements of the text embedding.
import struct
PRETRAINED_EMBEDDING_SIZE = 1024
def convert_pretrained_embeddings_to_sparse_model(keys, pre_trained_sparse_embeddings, hugectr_sparse_model, embedding_vec_size):
    os.system("mkdir -p {}".format(hugectr_sparse_model))
    with open("{}/key".format(hugectr_sparse_model), 'wb') as key_file, \
         open("{}/emb_vector".format(hugectr_sparse_model), 'wb') as vec_file:
        for i, key in enumerate(keys):
            # Keep the first 512 image-embedding values and the first 512
            # text-embedding values (the text features start at column 2048).
            vec = np.concatenate([
                pre_trained_sparse_embeddings[i, :PRETRAINED_EMBEDDING_SIZE // 2],
                pre_trained_sparse_embeddings[i, 2048:2048 + PRETRAINED_EMBEDDING_SIZE // 2],
            ])
            # HugeCTR sparse model format: int64 keys and float32 embedding vectors.
            key_struct = struct.pack('q', key)
            vec_struct = struct.pack(str(embedding_vec_size) + "f", *vec)
            key_file.write(key_struct)
            vec_file.write(vec_struct)
keys = list(movie_mapping.index)
convert_pretrained_embeddings_to_sparse_model(keys, embedding_matrix, 'hugectr_pretrained_embedding.model', embedding_vec_size=PRETRAINED_EMBEDDING_SIZE) # HugeCTR not supporting embedding size > 1024
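A small sanity check on the files just written (a sketch based on the layout above: key holds one int64 per movie, emb_vector holds PRETRAINED_EMBEDDING_SIZE float32 values per movie):

# Read the binary files back and verify the expected sizes.
check_keys = np.fromfile('hugectr_pretrained_embedding.model/key', dtype=np.int64)
check_vecs = np.fromfile('hugectr_pretrained_embedding.model/emb_vector', dtype=np.float32)
assert len(check_keys) == len(keys)
assert len(check_vecs) == len(keys) * PRETRAINED_EMBEDDING_SIZE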
Define and train model
In this section, we define and train the model. The model comprises trainable embedding layers for the categorical features (userId, movieId) and a non-trainable embedding layer holding the pretrained movie features.
We write the model to ./model.py and execute it afterwards.
First, we need the cardinality of each categorical feature, which we assign to slot_size_array in the model below.
import nvtabular as nvt
from nvtabular.ops import get_embedding_sizes
workflow = nvt.Workflow.load(os.path.join(INPUT_DATA_DIR, "workflow-hugectr"))
embeddings = get_embedding_sizes(workflow)
print(embeddings)
#{'userId': (162542, 512), 'movieId': (56586, 512), 'movieId_duplicate': (56586, 512)}
{'userId': (162542, 512), 'movieId': (56586, 512), 'movieId_duplicate': (56586, 512)}
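The slot_size_array passed to the data reader in model.py below lists these cardinalities in the order the slots appear in the data: userId, movieId, genres, and the duplicated movieId that feeds the pretrained embedding. A sketch of how it could be assembled (the genres cardinality of 21 is an assumption carried over from the preprocessing workflow; it is not part of the embeddings dict printed above):

# Hypothetical assembly of slot_size_array from the workflow cardinalities.
GENRES_CARDINALITY = 21  # assumed from the preprocessing notebook
slot_size_array = [
    embeddings['userId'][0],             # 162542
    embeddings['movieId'][0],            # 56586
    GENRES_CARDINALITY,
    embeddings['movieId_duplicate'][0],  # 56586
]
print(slot_size_array)  # [162542, 56586, 21, 56586]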
We use graph_to_json to convert the model graph to a JSON configuration, which is required for inference.
%%writefile './model.py'
import hugectr
from mpi4py import MPI # noqa
INPUT_DATA_DIR = './data/'
solver = hugectr.CreateSolver(
vvgpu=[[0]],
batchsize=2048,
batchsize_eval=2048,
max_eval_batches=160,
i64_input_key=True,
use_mixed_precision=False,
repeat_dataset=True,
)
optimizer = hugectr.CreateOptimizer(optimizer_type=hugectr.Optimizer_t.Adam)
reader = hugectr.DataReaderParams(
data_reader_type=hugectr.DataReaderType_t.Parquet,
source=[INPUT_DATA_DIR + "train-hugectr/_file_list.txt"],
eval_source=INPUT_DATA_DIR + "valid-hugectr/_file_list.txt",
check_type=hugectr.Check_t.Non,
slot_size_array=[162542, 56586, 21, 56586],
)
model = hugectr.Model(solver, reader, optimizer)
model.add(
hugectr.Input(
label_dim=1,
label_name="label",
dense_dim=0,
dense_name="dense",
data_reader_sparse_param_array=[
hugectr.DataReaderSparseParam("data1", nnz_per_slot=[1, 1, 2], is_fixed_length=False, slot_num=3),
hugectr.DataReaderSparseParam("movieId", nnz_per_slot=[1], is_fixed_length=True, slot_num=1)
],
)
)
model.add(
hugectr.SparseEmbedding(
embedding_type=hugectr.Embedding_t.LocalizedSlotSparseEmbeddingHash,
workspace_size_per_gpu_in_mb=3000,
embedding_vec_size=16,
combiner="sum",
sparse_embedding_name="sparse_embedding1",
bottom_name="data1",
optimizer=optimizer,
)
)
# pretrained embedding
model.add(
hugectr.SparseEmbedding(
embedding_type=hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
workspace_size_per_gpu_in_mb=3000,
embedding_vec_size=1024,
combiner="sum",
sparse_embedding_name="pretrained_embedding",
bottom_name="movieId",
optimizer=optimizer,
)
)
model.add(hugectr.DenseLayer(layer_type=hugectr.Layer_t.Reshape,
                             bottom_names=["sparse_embedding1"],
                             top_names=["reshape1"],
                             leading_dim=48))
model.add(hugectr.DenseLayer(layer_type=hugectr.Layer_t.Reshape,
                             bottom_names=["pretrained_embedding"],
                             top_names=["reshape2"],
                             leading_dim=1024))
model.add(hugectr.DenseLayer(layer_type=hugectr.Layer_t.Concat,
                             bottom_names=["reshape1", "reshape2"],
                             top_names=["concat1"]))
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.InnerProduct,
bottom_names=["concat1"],
top_names=["fc1"],
num_output=128,
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.ReLU,
bottom_names=["fc1"],
top_names=["relu1"],
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.InnerProduct,
bottom_names=["relu1"],
top_names=["fc2"],
num_output=128,
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.ReLU,
bottom_names=["fc2"],
top_names=["relu2"],
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.InnerProduct,
bottom_names=["relu2"],
top_names=["fc3"],
num_output=1,
)
)
model.add(
hugectr.DenseLayer(
layer_type=hugectr.Layer_t.BinaryCrossEntropyLoss,
bottom_names=["fc3", "label"],
top_names=["loss"],
)
)
model.compile()
model.summary()
# Load the pretrained embedding layer
model.load_sparse_weights({"pretrained_embedding": "./hugectr_pretrained_embedding.model"})
model.freeze_embedding("pretrained_embedding")
model.fit(max_iter=10001, display=100, eval_interval=200, snapshot=5000)
model.graph_to_json(graph_config_file="hugectr-movielens.json")
Overwriting ./model.py
We now train the model by executing model.py.
!python model.py
HugeCTR Version: 3.2
====================================================Model Init=====================================================
[HUGECTR][01:09:00][INFO][RANK0]: Global seed is 476440390
[HUGECTR][01:09:00][INFO][RANK0]: Device to NUMA mapping:
GPU 0 -> node 0
[HUGECTR][01:09:01][WARNING][RANK0]: Peer-to-peer access cannot be fully enabled.
[HUGECTR][01:09:01][INFO][RANK0]: Start all2all warmup
[HUGECTR][01:09:01][INFO][RANK0]: End all2all warmup
[HUGECTR][01:09:01][INFO][RANK0]: Using All-reduce algorithm: NCCL
[HUGECTR][01:09:01][INFO][RANK0]: Device 0: Tesla V100-SXM2-16GB
[HUGECTR][01:09:01][INFO][RANK0]: num of DataReader workers: 1
[HUGECTR][01:09:01][INFO][RANK0]: Vocabulary size: 275735
[HUGECTR][01:09:01][INFO][RANK0]: max_vocabulary_size_per_gpu_=16384000
[HUGECTR][01:09:01][INFO][RANK0]: max_vocabulary_size_per_gpu_=256000
[HUGECTR][01:09:01][INFO][RANK0]: Graph analysis to resolve tensor dependency
===================================================Model Compile===================================================
[HUGECTR][01:09:04][INFO][RANK0]: gpu0 start to init embedding
[HUGECTR][01:09:04][INFO][RANK0]: gpu0 init embedding done
[HUGECTR][01:09:04][INFO][RANK0]: gpu0 start to init embedding
[HUGECTR][01:09:04][INFO][RANK0]: gpu0 init embedding done
[HUGECTR][01:09:04][INFO][RANK0]: Starting AUC NCCL warm-up
[HUGECTR][01:09:04][INFO][RANK0]: Warm-up done
===================================================Model Summary===================================================
label Dense Sparse
label dense data1,movieId
(None, 1) (None, 0)
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
Layer Type Input Name Output Name Output Shape
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
LocalizedSlotSparseEmbeddingHash data1 sparse_embedding1 (None, 3, 16)
------------------------------------------------------------------------------------------------------------------
DistributedSlotSparseEmbeddingHash movieId pretrained_embedding (None, 1, 1024)
------------------------------------------------------------------------------------------------------------------
Reshape sparse_embedding1 reshape1 (None, 48)
------------------------------------------------------------------------------------------------------------------
Reshape pretrained_embedding reshape2 (None, 1024)
------------------------------------------------------------------------------------------------------------------
Concat reshape1 concat1 (None, 1072)
reshape2
------------------------------------------------------------------------------------------------------------------
InnerProduct concat1 fc1 (None, 128)
------------------------------------------------------------------------------------------------------------------
ReLU fc1 relu1 (None, 128)
------------------------------------------------------------------------------------------------------------------
InnerProduct relu1 fc2 (None, 128)
------------------------------------------------------------------------------------------------------------------
ReLU fc2 relu2 (None, 128)
------------------------------------------------------------------------------------------------------------------
InnerProduct relu2 fc3 (None, 1)
------------------------------------------------------------------------------------------------------------------
BinaryCrossEntropyLoss fc3 loss
label
------------------------------------------------------------------------------------------------------------------
[HUGECTR][01:09:04][INFO][RANK0]: Loading sparse model: ./hugectr_pretrained_embedding.model
=====================================================Model Fit=====================================================
[HUGECTR][01:09:06][INFO][RANK0]: Use non-epoch mode with number of iterations: 10001
[HUGECTR][01:09:06][INFO][RANK0]: Training batchsize: 2048, evaluation batchsize: 2048
[HUGECTR][01:09:06][INFO][RANK0]: Evaluation interval: 200, snapshot interval: 5000
[HUGECTR][01:09:06][INFO][RANK0]: Dense network trainable: True
[HUGECTR][01:09:06][INFO][RANK0]: Sparse embedding pretrained_embedding trainable: False
[HUGECTR][01:09:06][INFO][RANK0]: Sparse embedding sparse_embedding1 trainable: True
[HUGECTR][01:09:06][INFO][RANK0]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
[HUGECTR][01:09:06][INFO][RANK0]: lr: 0.001000, warmup_steps: 1, end_lr: 0.000000
[HUGECTR][01:09:06][INFO][RANK0]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
[HUGECTR][01:09:06][INFO][RANK0]: Training source file: ./data/train-hugectr/_file_list.txt
[HUGECTR][01:09:06][INFO][RANK0]: Evaluation source file: ./data/valid-hugectr/_file_list.txt
[HUGECTR][01:09:08][INFO][RANK0]: Iter: 100 Time(100 iters): 2.297110s Loss: 0.581705 lr:0.001000
[HUGECTR][01:09:11][INFO][RANK0]: Iter: 200 Time(100 iters): 2.274680s Loss: 0.574425 lr:0.001000
[HUGECTR][01:09:11][INFO][RANK0]: Evaluation, AUC: 0.746443
[HUGECTR][01:09:11][INFO][RANK0]: Eval Time for 160 iters: 0.054157s
[HUGECTR][01:09:13][INFO][RANK0]: Iter: 300 Time(100 iters): 2.332273s Loss: 0.564224 lr:0.001000
[HUGECTR][01:09:15][INFO][RANK0]: Iter: 400 Time(100 iters): 2.277900s Loss: 0.550730 lr:0.001000
[HUGECTR][01:09:15][INFO][RANK0]: Evaluation, AUC: 0.764630
[HUGECTR][01:09:15][INFO][RANK0]: Eval Time for 160 iters: 0.054009s
[HUGECTR][01:09:18][INFO][RANK0]: Iter: 500 Time(100 iters): 2.434429s Loss: 0.536507 lr:0.001000
[HUGECTR][01:09:20][INFO][RANK0]: Iter: 600 Time(100 iters): 2.279014s Loss: 0.525059 lr:0.001000
[HUGECTR][01:09:20][INFO][RANK0]: Evaluation, AUC: 0.773702
[HUGECTR][01:09:20][INFO][RANK0]: Eval Time for 160 iters: 0.054287s
[HUGECTR][01:09:22][INFO][RANK0]: Iter: 700 Time(100 iters): 2.335757s Loss: 0.532503 lr:0.001000
[HUGECTR][01:09:25][INFO][RANK0]: Iter: 800 Time(100 iters): 2.278661s Loss: 0.526352 lr:0.001000
[HUGECTR][01:09:25][INFO][RANK0]: Evaluation, AUC: 0.779897
[HUGECTR][01:09:25][INFO][RANK0]: Eval Time for 160 iters: 0.167787s
[HUGECTR][01:09:27][INFO][RANK0]: Iter: 900 Time(100 iters): 2.447136s Loss: 0.547141 lr:0.001000
[HUGECTR][01:09:29][INFO][RANK0]: Iter: 1000 Time(100 iters): 2.376035s Loss: 0.548916 lr:0.001000
[HUGECTR][01:09:30][INFO][RANK0]: Evaluation, AUC: 0.784775
[HUGECTR][01:09:30][INFO][RANK0]: Eval Time for 160 iters: 0.054224s
[HUGECTR][01:09:32][INFO][RANK0]: Iter: 1100 Time(100 iters): 2.334735s Loss: 0.540766 lr:0.001000
[HUGECTR][01:09:34][INFO][RANK0]: Iter: 1200 Time(100 iters): 2.277728s Loss: 0.515882 lr:0.001000
[HUGECTR][01:09:34][INFO][RANK0]: Evaluation, AUC: 0.786808
[HUGECTR][01:09:34][INFO][RANK0]: Eval Time for 160 iters: 0.054551s
[HUGECTR][01:09:36][INFO][RANK0]: Iter: 1300 Time(100 iters): 2.336372s Loss: 0.531510 lr:0.001000
[HUGECTR][01:09:39][INFO][RANK0]: Iter: 1400 Time(100 iters): 2.277408s Loss: 0.511901 lr:0.001000
[HUGECTR][01:09:39][INFO][RANK0]: Evaluation, AUC: 0.791416
[HUGECTR][01:09:39][INFO][RANK0]: Eval Time for 160 iters: 0.165986s
[HUGECTR][01:09:41][INFO][RANK0]: Iter: 1500 Time(100 iters): 2.554217s Loss: 0.522047 lr:0.001000
[HUGECTR][01:09:44][INFO][RANK0]: Iter: 1600 Time(100 iters): 2.279548s Loss: 0.540521 lr:0.001000
[HUGECTR][01:09:44][INFO][RANK0]: Evaluation, AUC: 0.793460
[HUGECTR][01:09:44][INFO][RANK0]: Eval Time for 160 iters: 0.054801s
[HUGECTR][01:09:46][INFO][RANK0]: Iter: 1700 Time(100 iters): 2.336303s Loss: 0.525447 lr:0.001000
[HUGECTR][01:09:48][INFO][RANK0]: Iter: 1800 Time(100 iters): 2.278906s Loss: 0.523558 lr:0.001000
[HUGECTR][01:09:48][INFO][RANK0]: Evaluation, AUC: 0.793137
[HUGECTR][01:09:48][INFO][RANK0]: Eval Time for 160 iters: 0.054431s
[HUGECTR][01:09:51][INFO][RANK0]: Iter: 1900 Time(100 iters): 2.336023s Loss: 0.511348 lr:0.001000
[HUGECTR][01:09:53][INFO][RANK0]: Iter: 2000 Time(100 iters): 2.384979s Loss: 0.515268 lr:0.001000
[HUGECTR][01:09:53][INFO][RANK0]: Evaluation, AUC: 0.796599
[HUGECTR][01:09:53][INFO][RANK0]: Eval Time for 160 iters: 0.172160s
[HUGECTR][01:09:55][INFO][RANK0]: Iter: 2100 Time(100 iters): 2.453174s Loss: 0.526615 lr:0.001000
[HUGECTR][01:09:58][INFO][RANK0]: Iter: 2200 Time(100 iters): 2.278781s Loss: 0.536789 lr:0.001000
[HUGECTR][01:09:58][INFO][RANK0]: Evaluation, AUC: 0.798459
[HUGECTR][01:09:58][INFO][RANK0]: Eval Time for 160 iters: 0.054509s
[HUGECTR][01:10:00][INFO][RANK0]: Iter: 2300 Time(100 iters): 2.335596s Loss: 0.508902 lr:0.001000
[HUGECTR][01:10:02][INFO][RANK0]: Iter: 2400 Time(100 iters): 2.277901s Loss: 0.520411 lr:0.001000
[HUGECTR][01:10:02][INFO][RANK0]: Evaluation, AUC: 0.798726
[HUGECTR][01:10:02][INFO][RANK0]: Eval Time for 160 iters: 0.054518s
[HUGECTR][01:10:05][INFO][RANK0]: Iter: 2500 Time(100 iters): 2.444557s Loss: 0.490832 lr:0.001000
[HUGECTR][01:10:07][INFO][RANK0]: Iter: 2600 Time(100 iters): 2.279310s Loss: 0.507799 lr:0.001000
[HUGECTR][01:10:07][INFO][RANK0]: Evaluation, AUC: 0.801325
[HUGECTR][01:10:07][INFO][RANK0]: Eval Time for 160 iters: 0.164203s
[HUGECTR][01:10:10][INFO][RANK0]: Iter: 2700 Time(100 iters): 2.443310s Loss: 0.519460 lr:0.001000
[HUGECTR][01:10:12][INFO][RANK0]: Iter: 2800 Time(100 iters): 2.277569s Loss: 0.512426 lr:0.001000
[HUGECTR][01:10:12][INFO][RANK0]: Evaluation, AUC: 0.800731
[HUGECTR][01:10:12][INFO][RANK0]: Eval Time for 160 iters: 0.054590s
[HUGECTR][01:10:14][INFO][RANK0]: Iter: 2900 Time(100 iters): 2.336213s Loss: 0.512216 lr:0.001000
[HUGECTR][01:10:17][INFO][RANK0]: Iter: 3000 Time(100 iters): 2.384833s Loss: 0.522102 lr:0.001000
[HUGECTR][01:10:17][INFO][RANK0]: Evaluation, AUC: 0.803801
[HUGECTR][01:10:17][INFO][RANK0]: Eval Time for 160 iters: 0.054133s
[HUGECTR][01:10:19][INFO][RANK0]: Iter: 3100 Time(100 iters): 2.334245s Loss: 0.507463 lr:0.001000
[HUGECTR][01:10:21][INFO][RANK0]: Iter: 3200 Time(100 iters): 2.279046s Loss: 0.526148 lr:0.001000
[HUGECTR][01:10:21][INFO][RANK0]: Evaluation, AUC: 0.802950
[HUGECTR][01:10:21][INFO][RANK0]: Eval Time for 160 iters: 0.070003s
[HUGECTR][01:10:24][INFO][RANK0]: Iter: 3300 Time(100 iters): 2.352114s Loss: 0.504611 lr:0.001000
[HUGECTR][01:10:26][INFO][RANK0]: Iter: 3400 Time(100 iters): 2.277292s Loss: 0.502907 lr:0.001000
[HUGECTR][01:10:26][INFO][RANK0]: Evaluation, AUC: 0.804364
[HUGECTR][01:10:26][INFO][RANK0]: Eval Time for 160 iters: 0.054315s
[HUGECTR][01:10:28][INFO][RANK0]: Iter: 3500 Time(100 iters): 2.442956s Loss: 0.512927 lr:0.001000
[HUGECTR][01:10:31][INFO][RANK0]: Iter: 3600 Time(100 iters): 2.277974s Loss: 0.519042 lr:0.001000
[HUGECTR][01:10:31][INFO][RANK0]: Evaluation, AUC: 0.806404
[HUGECTR][01:10:31][INFO][RANK0]: Eval Time for 160 iters: 0.054291s
[HUGECTR][01:10:33][INFO][RANK0]: Iter: 3700 Time(100 iters): 2.335365s Loss: 0.499368 lr:0.001000
[HUGECTR][01:10:35][INFO][RANK0]: Iter: 3800 Time(100 iters): 2.277786s Loss: 0.509683 lr:0.001000
[HUGECTR][01:10:35][INFO][RANK0]: Evaluation, AUC: 0.805164
[HUGECTR][01:10:35][INFO][RANK0]: Eval Time for 160 iters: 0.064908s
[HUGECTR][01:10:38][INFO][RANK0]: Iter: 3900 Time(100 iters): 2.344106s Loss: 0.508182 lr:0.001000
[HUGECTR][01:10:40][INFO][RANK0]: Iter: 4000 Time(100 iters): 2.387872s Loss: 0.493841 lr:0.001000
[HUGECTR][01:10:40][INFO][RANK0]: Evaluation, AUC: 0.808367
[HUGECTR][01:10:40][INFO][RANK0]: Eval Time for 160 iters: 0.054222s
[HUGECTR][01:10:42][INFO][RANK0]: Iter: 4100 Time(100 iters): 2.335361s Loss: 0.508106 lr:0.001000
[HUGECTR][01:10:45][INFO][RANK0]: Iter: 4200 Time(100 iters): 2.278802s Loss: 0.519000 lr:0.001000
[HUGECTR][01:10:45][INFO][RANK0]: Evaluation, AUC: 0.808897
[HUGECTR][01:10:45][INFO][RANK0]: Eval Time for 160 iters: 0.054320s
[HUGECTR][01:10:47][INFO][RANK0]: Iter: 4300 Time(100 iters): 2.334094s Loss: 0.502797 lr:0.001000
[HUGECTR][01:10:49][INFO][RANK0]: Iter: 4400 Time(100 iters): 2.388990s Loss: 0.508890 lr:0.001000
[HUGECTR][01:10:49][INFO][RANK0]: Evaluation, AUC: 0.809649
[HUGECTR][01:10:49][INFO][RANK0]: Eval Time for 160 iters: 0.074584s
[HUGECTR][01:10:52][INFO][RANK0]: Iter: 4500 Time(100 iters): 2.355005s Loss: 0.505778 lr:0.001000
[HUGECTR][01:10:54][INFO][RANK0]: Iter: 4600 Time(100 iters): 2.277275s Loss: 0.532776 lr:0.001000
[HUGECTR][01:10:54][INFO][RANK0]: Evaluation, AUC: 0.810962
[HUGECTR][01:10:54][INFO][RANK0]: Eval Time for 160 iters: 0.054498s
[HUGECTR][01:10:56][INFO][RANK0]: Iter: 4700 Time(100 iters): 2.335553s Loss: 0.503001 lr:0.001000
[HUGECTR][01:10:59][INFO][RANK0]: Iter: 4800 Time(100 iters): 2.279237s Loss: 0.495762 lr:0.001000
[HUGECTR][01:10:59][INFO][RANK0]: Evaluation, AUC: 0.808618
[HUGECTR][01:10:59][INFO][RANK0]: Eval Time for 160 iters: 0.054287s
[HUGECTR][01:11:01][INFO][RANK0]: Iter: 4900 Time(100 iters): 2.449926s Loss: 0.503213 lr:0.001000
[HUGECTR][01:11:03][INFO][RANK0]: Iter: 5000 Time(100 iters): 2.277141s Loss: 0.481138 lr:0.001000
[HUGECTR][01:11:03][INFO][RANK0]: Evaluation, AUC: 0.810767
[HUGECTR][01:11:03][INFO][RANK0]: Eval Time for 160 iters: 0.064807s
[HUGECTR][01:11:04][INFO][RANK0]: Rank0: Dump hash table from GPU0
[HUGECTR][01:11:04][INFO][RANK0]: Rank0: Write hash table <key,value> pairs to file
[HUGECTR][01:11:04][INFO][RANK0]: Done
[HUGECTR][01:11:04][INFO][RANK0]: Rank0: Write hash table to file
[HUGECTR][01:11:13][INFO][RANK0]: Dumping sparse weights to files, successful
[HUGECTR][01:11:13][INFO][RANK0]: Rank0: Write optimzer state to file
[HUGECTR][01:11:14][INFO][RANK0]: Done
[HUGECTR][01:11:14][INFO][RANK0]: Rank0: Write optimzer state to file
[HUGECTR][01:11:15][INFO][RANK0]: Done
[HUGECTR][01:11:34][INFO][RANK0]: Rank0: Write optimzer state to file
[HUGECTR][01:11:35][INFO][RANK0]: Done
[HUGECTR][01:11:35][INFO][RANK0]: Rank0: Write optimzer state to file
[HUGECTR][01:11:36][INFO][RANK0]: Done
[HUGECTR][01:11:55][INFO][RANK0]: Dumping sparse optimzer states to files, successful
[HUGECTR][01:11:55][INFO][RANK0]: Dumping dense weights to file, successful
[HUGECTR][01:11:55][INFO][RANK0]: Dumping dense optimizer states to file, successful
[HUGECTR][01:11:55][INFO][RANK0]: Dumping untrainable weights to file, successful
[HUGECTR][01:11:57][INFO][RANK0]: Iter: 5100 Time(100 iters): 53.630313s Loss: 0.485568 lr:0.001000
[HUGECTR][01:11:59][INFO][RANK0]: Iter: 5200 Time(100 iters): 2.278359s Loss: 0.518924 lr:0.001000
[HUGECTR][01:11:59][INFO][RANK0]: Evaluation, AUC: 0.811217
[HUGECTR][01:11:59][INFO][RANK0]: Eval Time for 160 iters: 0.054624s
[HUGECTR][01:12:02][INFO][RANK0]: Iter: 5300 Time(100 iters): 2.336246s Loss: 0.516505 lr:0.001000
[HUGECTR][01:12:04][INFO][RANK0]: Iter: 5400 Time(100 iters): 2.384571s Loss: 0.512404 lr:0.001000
[HUGECTR][01:12:04][INFO][RANK0]: Evaluation, AUC: 0.811464
[HUGECTR][01:12:04][INFO][RANK0]: Eval Time for 160 iters: 0.054350s
[HUGECTR][01:12:06][INFO][RANK0]: Iter: 5500 Time(100 iters): 2.334675s Loss: 0.500305 lr:0.001000
[HUGECTR][01:12:09][INFO][RANK0]: Iter: 5600 Time(100 iters): 2.279563s Loss: 0.484969 lr:0.001000