
Training HugeCTR Model with Pre-trained Embeddings

In this notebook, we will train a deep neural network for predicting a user's rating (a binary target: 1 for ratings > 3 and 0 for ratings <= 3). The two categorical features are userId and movieId.

We will also make use of the movies' pretrained embeddings extracted in the previous notebooks.
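For reference, the binary label can be derived from the raw ratings along these lines (a minimal, illustrative sketch; the actual label was created in the earlier NVTabular preprocessing notebooks, and the toy data here is made up):

import pandas as pd

# Toy ratings frame standing in for the MovieLens ratings used in the
# earlier notebooks (values are made up for illustration).
ratings = pd.DataFrame(
    {"userId": [1, 1, 2], "movieId": [10, 20, 10], "rating": [4.0, 2.5, 5.0]}
)
# Binary target: 1 for ratings > 3, 0 otherwise
ratings["label"] = (ratings["rating"] > 3).astype("int8")
print(ratings)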

Loading pretrained movie features into a non-trainable embedding layer

# loading NVTabular movie encoding
import pandas as pd
import os

INPUT_DATA_DIR = './data'
movie_mapping = pd.read_parquet(os.path.join(INPUT_DATA_DIR, "workflow-hugectr/categories/unique.movieId.parquet"))
movie_mapping.tail()
       movieId  movieId_size
56581   209155             1
56582   209157             1
56583   209159             1
56584   209169             1
56585   209171             1
feature_df = pd.read_parquet('feature_df.parquet')
print(feature_df.shape)
feature_df.head()
(62423, 3073)
movieId poster_feature_0 poster_feature_1 poster_feature_2 poster_feature_3 poster_feature_4 poster_feature_5 poster_feature_6 poster_feature_7 poster_feature_8 ... text_feature_1014 text_feature_1015 text_feature_1016 text_feature_1017 text_feature_1018 text_feature_1019 text_feature_1020 text_feature_1021 text_feature_1022 text_feature_1023
0 1.0 0.026260 0.857608 0.410247 0.066654 0.382803 0.899998 0.511562 0.592291 0.565434 ... 0.636716 0.578369 0.996169 0.402107 0.412318 0.859952 0.293852 0.341114 0.727113 0.085829
1 2.0 0.141265 0.721758 0.679958 0.955634 0.391091 0.324611 0.505211 0.258331 0.048264 ... 0.161505 0.431864 0.836532 0.525013 0.654566 0.823841 0.818313 0.856280 0.638048 0.685537
2 3.0 0.119418 0.911146 0.470762 0.762258 0.626335 0.768947 0.241833 0.775992 0.236340 ... 0.865548 0.387806 0.668321 0.552122 0.750238 0.863707 0.382173 0.894487 0.565142 0.164083
3 4.0 0.538184 0.980678 0.643513 0.928519 0.794906 0.201022 0.744666 0.962188 0.915320 ... 0.777534 0.904200 0.167337 0.875194 0.180481 0.815904 0.808288 0.036711 0.902779 0.580946
4 5.0 0.772951 0.239788 0.061874 0.162997 0.388310 0.236311 0.162757 0.207134 0.111078 ... 0.250022 0.335043 0.091674 0.121507 0.418124 0.150020 0.803506 0.059504 0.002342 0.932321

5 rows × 3073 columns
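As a quick sanity check, the 3073 columns break down into the movieId key, 2048 poster-image features, and 1024 text features (a minimal sketch based on the column names shown above):

poster_cols = [c for c in feature_df.columns if c.startswith("poster_feature_")]
text_cols = [c for c in feature_df.columns if c.startswith("text_feature_")]
print(len(poster_cols), len(text_cols))  # expected: 2048 1024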

feature_df.set_index('movieId', inplace=True)
from tqdm import tqdm
import numpy as np

num_tokens = len(movie_mapping)
embedding_dim = 2048 + 1024  # 2048 poster features + 1024 text features
hits = 0
misses = 0

# Prepare the embedding matrix; rows without pretrained features remain all-zero
embedding_matrix = np.zeros((num_tokens, embedding_dim))

print("Loading pretrained embedding matrix...")
for i, row in tqdm(movie_mapping.iterrows(), total=len(movie_mapping)):
    movieId = row['movieId']
    if movieId in feature_df.index: 
        embedding_vector = feature_df.loc[movieId]
        # embedding found
        embedding_matrix[i] = embedding_vector
        hits += 1
    else:
        misses += 1
print("Found features for %d movies (%d misses)" % (hits, misses))
Loading pretrained embedding matrix...
100%|████████████████████████████████████| 56586/56586 [00:14<00:00, 3967.46it/s]
Found features for 56585 movies (1 misses)

embedding_dim
3072
embedding_matrix
array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.17294852, 0.15285189, 0.26095702, ..., 0.75369112, 0.29602144,
        0.78917433],
       [0.13539355, 0.84843078, 0.70951219, ..., 0.10441725, 0.72871966,
        0.11719463],
       ...,
       [0.18514273, 0.72422918, 0.04273015, ..., 0.1404219 , 0.54169348,
        0.96875489],
       [0.08307642, 0.3673532 , 0.15777258, ..., 0.01297393, 0.36267638,
        0.14848055],
       [0.82188376, 0.56516905, 0.70838085, ..., 0.45119769, 0.9273439 ,
        0.42464321]])

Next, we write the pretrained embeddings to a raw format supported by HugeCTR.

Note: As of version 3.2, HugeCTR supports a maximum embedding size of 1024. Hence, we use the first 512 elements of the image embedding concatenated with the first 512 elements of the text embedding.

import struct

PRETRAINED_EMBEDDING_SIZE = 1024

def convert_pretrained_embeddings_to_sparse_model(keys, pre_trained_sparse_embeddings, hugectr_sparse_model, embedding_vec_size):
    """Write keys and embedding vectors in HugeCTR's raw sparse-model format."""
    os.makedirs(hugectr_sparse_model, exist_ok=True)
    with open("{}/key".format(hugectr_sparse_model), 'wb') as key_file, \
        open("{}/emb_vector".format(hugectr_sparse_model), 'wb') as vec_file:

        for i, key in enumerate(keys):
            # Concatenate the first 512 poster-image features with the first 512
            # text features (the text features start at column 2048 of the matrix).
            vec = np.concatenate([
                pre_trained_sparse_embeddings[i, :PRETRAINED_EMBEDDING_SIZE // 2],
                pre_trained_sparse_embeddings[i, 2048:2048 + PRETRAINED_EMBEDDING_SIZE // 2],
            ])
            key_struct = struct.pack('q', key)                             # int64 key
            vec_struct = struct.pack(str(embedding_vec_size) + "f", *vec)  # float32 vector
            key_file.write(key_struct)
            vec_file.write(vec_struct)

keys = list(movie_mapping.index)
convert_pretrained_embeddings_to_sparse_model(
    keys, embedding_matrix, 'hugectr_pretrained_embedding.model',
    embedding_vec_size=PRETRAINED_EMBEDDING_SIZE,  # HugeCTR (as of 3.2) supports embedding sizes of at most 1024
)
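Optionally, the binary files can be read back to verify the export (a minimal sketch that assumes the layout written above: int64 keys in key and float32 vectors of length PRETRAINED_EMBEDDING_SIZE in emb_vector):

keys_check = np.fromfile("hugectr_pretrained_embedding.model/key", dtype=np.int64)
vecs_check = np.fromfile("hugectr_pretrained_embedding.model/emb_vector", dtype=np.float32)
vecs_check = vecs_check.reshape(len(keys_check), PRETRAINED_EMBEDDING_SIZE)
print(keys_check.shape, vecs_check.shape)  # expected: (56586,) and (56586, 1024)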

Define and train model

In this section, we define and train the model. The model comprises trainable embedding layers for the categorical features (userId, movieId) and a pretrained, non-trainable embedding layer for the movie features.

We will write the model to ./model.py and execute it afterwards.

First, we need the cardinality of each categorical feature, which we assign to slot_size_array in the model below.

import nvtabular as nvt
from nvtabular.ops import get_embedding_sizes

workflow = nvt.Workflow.load(os.path.join(INPUT_DATA_DIR, "workflow-hugectr"))

embeddings = get_embedding_sizes(workflow)
print(embeddings)

#{'userId': (162542, 512), 'movieId': (56586, 512), 'movieId_duplicate': (56586, 512)}
{'userId': (162542, 512), 'movieId': (56586, 512), 'movieId_duplicate': (56586, 512)}
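The slot_size_array passed to the data reader below lists these cardinalities in the order of the categorical columns in the Parquet data (userId, movieId, genres, movieId_duplicate). A minimal sketch of how it can be assembled; the genres cardinality of 21 is hard-coded here and comes from the earlier preprocessing notebook:

cardinalities = {name: card for name, (card, _) in embeddings.items()}
slot_size_array = [
    cardinalities["userId"],
    cardinalities["movieId"],
    21,  # genres (multi-hot); cardinality taken from the previous notebook
    cardinalities["movieId_duplicate"],
]
print(slot_size_array)  # [162542, 56586, 21, 56586]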

We use graph_to_json to convert the model to a JSON configuration, which is required for inference.

%%writefile './model.py'

import hugectr
from mpi4py import MPI  # noqa
INPUT_DATA_DIR = './data/'

solver = hugectr.CreateSolver(
    vvgpu=[[0]],
    batchsize=2048,
    batchsize_eval=2048,
    max_eval_batches=160,
    i64_input_key=True,
    use_mixed_precision=False,
    repeat_dataset=True,
)
optimizer = hugectr.CreateOptimizer(optimizer_type=hugectr.Optimizer_t.Adam)
reader = hugectr.DataReaderParams(
    data_reader_type=hugectr.DataReaderType_t.Parquet,
    source=[INPUT_DATA_DIR + "train-hugectr/_file_list.txt"],
    eval_source=INPUT_DATA_DIR + "valid-hugectr/_file_list.txt",
    check_type=hugectr.Check_t.Non,
    slot_size_array=[162542, 56586, 21, 56586],
)

model = hugectr.Model(solver, reader, optimizer)

model.add(
    hugectr.Input(
        label_dim=1,
        label_name="label",
        dense_dim=0,
        dense_name="dense",
        data_reader_sparse_param_array=[
            hugectr.DataReaderSparseParam("data1", nnz_per_slot=[1, 1, 2], is_fixed_length=False, slot_num=3),
            hugectr.DataReaderSparseParam("movieId", nnz_per_slot=[1], is_fixed_length=True, slot_num=1)
        ],
    )
)
model.add(
    hugectr.SparseEmbedding(
        embedding_type=hugectr.Embedding_t.LocalizedSlotSparseEmbeddingHash,
        workspace_size_per_gpu_in_mb=3000,
        embedding_vec_size=16,
        combiner="sum",
        sparse_embedding_name="sparse_embedding1",
        bottom_name="data1",
        optimizer=optimizer,
    )
)

# pretrained embedding
model.add(
    hugectr.SparseEmbedding(
        embedding_type=hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
        workspace_size_per_gpu_in_mb=3000,
        embedding_vec_size=1024,
        combiner="sum",
        sparse_embedding_name="pretrained_embedding",
        bottom_name="movieId",
        optimizer=optimizer,
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.Reshape,
        bottom_names=["sparse_embedding1"],
        top_names=["reshape1"],
        leading_dim=48,
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.Reshape,
        bottom_names=["pretrained_embedding"],
        top_names=["reshape2"],
        leading_dim=1024,
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.Concat,
        bottom_names=["reshape1", "reshape2"],
        top_names=["concat1"],
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["concat1"],
        top_names=["fc1"],
        num_output=128,
    )
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.ReLU,
        bottom_names=["fc1"],
        top_names=["relu1"],
    )
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu1"],
        top_names=["fc2"],
        num_output=128,
    )
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.ReLU,
        bottom_names=["fc2"],
        top_names=["relu2"],
    )
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu2"],
        top_names=["fc3"],
        num_output=1,
    )
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.BinaryCrossEntropyLoss,
        bottom_names=["fc3", "label"],
        top_names=["loss"],
    )
)
model.compile()
model.summary()

# Load the pretrained embedding layer
model.load_sparse_weights({"pretrained_embedding": "./hugectr_pretrained_embedding.model"})
model.freeze_embedding("pretrained_embedding")

model.fit(max_iter=10001, display=100, eval_interval=200, snapshot=5000)
model.graph_to_json(graph_config_file="hugectr-movielens.json")
Overwriting ./model.py

We train our model.

!python model.py
HugeCTR Version: 3.2
====================================================Model Init=====================================================
[HUGECTR][01:09:00][INFO][RANK0]: Global seed is 476440390
[HUGECTR][01:09:00][INFO][RANK0]: Device to NUMA mapping:
  GPU 0 ->  node 0

[HUGECTR][01:09:01][WARNING][RANK0]: Peer-to-peer access cannot be fully enabled.
[HUGECTR][01:09:01][INFO][RANK0]: Start all2all warmup
[HUGECTR][01:09:01][INFO][RANK0]: End all2all warmup
[HUGECTR][01:09:01][INFO][RANK0]: Using All-reduce algorithm: NCCL
[HUGECTR][01:09:01][INFO][RANK0]: Device 0: Tesla V100-SXM2-16GB
[HUGECTR][01:09:01][INFO][RANK0]: num of DataReader workers: 1
[HUGECTR][01:09:01][INFO][RANK0]: Vocabulary size: 275735
[HUGECTR][01:09:01][INFO][RANK0]: max_vocabulary_size_per_gpu_=16384000
[HUGECTR][01:09:01][INFO][RANK0]: max_vocabulary_size_per_gpu_=256000
[HUGECTR][01:09:01][INFO][RANK0]: Graph analysis to resolve tensor dependency
===================================================Model Compile===================================================
[HUGECTR][01:09:04][INFO][RANK0]: gpu0 start to init embedding
[HUGECTR][01:09:04][INFO][RANK0]: gpu0 init embedding done
[HUGECTR][01:09:04][INFO][RANK0]: gpu0 start to init embedding
[HUGECTR][01:09:04][INFO][RANK0]: gpu0 init embedding done
[HUGECTR][01:09:04][INFO][RANK0]: Starting AUC NCCL warm-up
[HUGECTR][01:09:04][INFO][RANK0]: Warm-up done
===================================================Model Summary===================================================
label                                   Dense                         Sparse                        
label                                   dense                          data1,movieId                 
(None, 1)                               (None, 0)                               
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
Layer Type                              Input Name                    Output Name                   Output Shape                  
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
LocalizedSlotSparseEmbeddingHash        data1                         sparse_embedding1             (None, 3, 16)                 
------------------------------------------------------------------------------------------------------------------
DistributedSlotSparseEmbeddingHash      movieId                       pretrained_embedding          (None, 1, 1024)               
------------------------------------------------------------------------------------------------------------------
Reshape                                 sparse_embedding1             reshape1                      (None, 48)                    
------------------------------------------------------------------------------------------------------------------
Reshape                                 pretrained_embedding          reshape2                      (None, 1024)                  
------------------------------------------------------------------------------------------------------------------
Concat                                  reshape1                      concat1                       (None, 1072)                  
                                        reshape2                                                                                  
------------------------------------------------------------------------------------------------------------------
InnerProduct                            concat1                       fc1                           (None, 128)                   
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc1                           relu1                         (None, 128)                   
------------------------------------------------------------------------------------------------------------------
InnerProduct                            relu1                         fc2                           (None, 128)                   
------------------------------------------------------------------------------------------------------------------
ReLU                                    fc2                           relu2                         (None, 128)                   
------------------------------------------------------------------------------------------------------------------
InnerProduct                            relu2                         fc3                           (None, 1)                     
------------------------------------------------------------------------------------------------------------------
BinaryCrossEntropyLoss                  fc3                           loss                                                        
                                        label                                                                                     
------------------------------------------------------------------------------------------------------------------
[HUGECTR][01:09:04][INFO][RANK0]: Loading sparse model: ./hugectr_pretrained_embedding.model
=====================================================Model Fit=====================================================
[HUGECTR][01:09:06][INFO][RANK0]: Use non-epoch mode with number of iterations: 10001
[HUGECTR][01:09:06][INFO][RANK0]: Training batchsize: 2048, evaluation batchsize: 2048
[HUGECTR][01:09:06][INFO][RANK0]: Evaluation interval: 200, snapshot interval: 5000
[HUGECTR][01:09:06][INFO][RANK0]: Dense network trainable: True
[HUGECTR][01:09:06][INFO][RANK0]: Sparse embedding pretrained_embedding trainable: False
[HUGECTR][01:09:06][INFO][RANK0]: Sparse embedding sparse_embedding1 trainable: True
[HUGECTR][01:09:06][INFO][RANK0]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
[HUGECTR][01:09:06][INFO][RANK0]: lr: 0.001000, warmup_steps: 1, end_lr: 0.000000
[HUGECTR][01:09:06][INFO][RANK0]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
[HUGECTR][01:09:06][INFO][RANK0]: Training source file: ./data/train-hugectr/_file_list.txt
[HUGECTR][01:09:06][INFO][RANK0]: Evaluation source file: ./data/valid-hugectr/_file_list.txt
[HUGECTR][01:09:08][INFO][RANK0]: Iter: 100 Time(100 iters): 2.297110s Loss: 0.581705 lr:0.001000
[HUGECTR][01:09:11][INFO][RANK0]: Iter: 200 Time(100 iters): 2.274680s Loss: 0.574425 lr:0.001000
[HUGECTR][01:09:11][INFO][RANK0]: Evaluation, AUC: 0.746443
[HUGECTR][01:09:11][INFO][RANK0]: Eval Time for 160 iters: 0.054157s
[HUGECTR][01:09:13][INFO][RANK0]: Iter: 300 Time(100 iters): 2.332273s Loss: 0.564224 lr:0.001000
[HUGECTR][01:09:15][INFO][RANK0]: Iter: 400 Time(100 iters): 2.277900s Loss: 0.550730 lr:0.001000
[HUGECTR][01:09:15][INFO][RANK0]: Evaluation, AUC: 0.764630
[HUGECTR][01:09:15][INFO][RANK0]: Eval Time for 160 iters: 0.054009s
[HUGECTR][01:09:18][INFO][RANK0]: Iter: 500 Time(100 iters): 2.434429s Loss: 0.536507 lr:0.001000
[HUGECTR][01:09:20][INFO][RANK0]: Iter: 600 Time(100 iters): 2.279014s Loss: 0.525059 lr:0.001000
[HUGECTR][01:09:20][INFO][RANK0]: Evaluation, AUC: 0.773702
[HUGECTR][01:09:20][INFO][RANK0]: Eval Time for 160 iters: 0.054287s
[HUGECTR][01:09:22][INFO][RANK0]: Iter: 700 Time(100 iters): 2.335757s Loss: 0.532503 lr:0.001000
[HUGECTR][01:09:25][INFO][RANK0]: Iter: 800 Time(100 iters): 2.278661s Loss: 0.526352 lr:0.001000
[HUGECTR][01:09:25][INFO][RANK0]: Evaluation, AUC: 0.779897
[HUGECTR][01:09:25][INFO][RANK0]: Eval Time for 160 iters: 0.167787s
[HUGECTR][01:09:27][INFO][RANK0]: Iter: 900 Time(100 iters): 2.447136s Loss: 0.547141 lr:0.001000
[HUGECTR][01:09:29][INFO][RANK0]: Iter: 1000 Time(100 iters): 2.376035s Loss: 0.548916 lr:0.001000
[HUGECTR][01:09:30][INFO][RANK0]: Evaluation, AUC: 0.784775
[HUGECTR][01:09:30][INFO][RANK0]: Eval Time for 160 iters: 0.054224s
[HUGECTR][01:09:32][INFO][RANK0]: Iter: 1100 Time(100 iters): 2.334735s Loss: 0.540766 lr:0.001000
[HUGECTR][01:09:34][INFO][RANK0]: Iter: 1200 Time(100 iters): 2.277728s Loss: 0.515882 lr:0.001000
[HUGECTR][01:09:34][INFO][RANK0]: Evaluation, AUC: 0.786808
[HUGECTR][01:09:34][INFO][RANK0]: Eval Time for 160 iters: 0.054551s
[HUGECTR][01:09:36][INFO][RANK0]: Iter: 1300 Time(100 iters): 2.336372s Loss: 0.531510 lr:0.001000
[HUGECTR][01:09:39][INFO][RANK0]: Iter: 1400 Time(100 iters): 2.277408s Loss: 0.511901 lr:0.001000
[HUGECTR][01:09:39][INFO][RANK0]: Evaluation, AUC: 0.791416
[HUGECTR][01:09:39][INFO][RANK0]: Eval Time for 160 iters: 0.165986s
[HUGECTR][01:09:41][INFO][RANK0]: Iter: 1500 Time(100 iters): 2.554217s Loss: 0.522047 lr:0.001000
[HUGECTR][01:09:44][INFO][RANK0]: Iter: 1600 Time(100 iters): 2.279548s Loss: 0.540521 lr:0.001000
[HUGECTR][01:09:44][INFO][RANK0]: Evaluation, AUC: 0.793460
[HUGECTR][01:09:44][INFO][RANK0]: Eval Time for 160 iters: 0.054801s
[HUGECTR][01:09:46][INFO][RANK0]: Iter: 1700 Time(100 iters): 2.336303s Loss: 0.525447 lr:0.001000
[HUGECTR][01:09:48][INFO][RANK0]: Iter: 1800 Time(100 iters): 2.278906s Loss: 0.523558 lr:0.001000
[HUGECTR][01:09:48][INFO][RANK0]: Evaluation, AUC: 0.793137
[HUGECTR][01:09:48][INFO][RANK0]: Eval Time for 160 iters: 0.054431s
[HUGECTR][01:09:51][INFO][RANK0]: Iter: 1900 Time(100 iters): 2.336023s Loss: 0.511348 lr:0.001000
[HUGECTR][01:09:53][INFO][RANK0]: Iter: 2000 Time(100 iters): 2.384979s Loss: 0.515268 lr:0.001000
[HUGECTR][01:09:53][INFO][RANK0]: Evaluation, AUC: 0.796599
[HUGECTR][01:09:53][INFO][RANK0]: Eval Time for 160 iters: 0.172160s
[HUGECTR][01:09:55][INFO][RANK0]: Iter: 2100 Time(100 iters): 2.453174s Loss: 0.526615 lr:0.001000
[HUGECTR][01:09:58][INFO][RANK0]: Iter: 2200 Time(100 iters): 2.278781s Loss: 0.536789 lr:0.001000
[HUGECTR][01:09:58][INFO][RANK0]: Evaluation, AUC: 0.798459
[HUGECTR][01:09:58][INFO][RANK0]: Eval Time for 160 iters: 0.054509s
[HUGECTR][01:10:00][INFO][RANK0]: Iter: 2300 Time(100 iters): 2.335596s Loss: 0.508902 lr:0.001000
[HUGECTR][01:10:02][INFO][RANK0]: Iter: 2400 Time(100 iters): 2.277901s Loss: 0.520411 lr:0.001000
[HUGECTR][01:10:02][INFO][RANK0]: Evaluation, AUC: 0.798726
[HUGECTR][01:10:02][INFO][RANK0]: Eval Time for 160 iters: 0.054518s
[HUGECTR][01:10:05][INFO][RANK0]: Iter: 2500 Time(100 iters): 2.444557s Loss: 0.490832 lr:0.001000
[HUGECTR][01:10:07][INFO][RANK0]: Iter: 2600 Time(100 iters): 2.279310s Loss: 0.507799 lr:0.001000
[HUGECTR][01:10:07][INFO][RANK0]: Evaluation, AUC: 0.801325
[HUGECTR][01:10:07][INFO][RANK0]: Eval Time for 160 iters: 0.164203s
[HUGECTR][01:10:10][INFO][RANK0]: Iter: 2700 Time(100 iters): 2.443310s Loss: 0.519460 lr:0.001000
[HUGECTR][01:10:12][INFO][RANK0]: Iter: 2800 Time(100 iters): 2.277569s Loss: 0.512426 lr:0.001000
[HUGECTR][01:10:12][INFO][RANK0]: Evaluation, AUC: 0.800731
[HUGECTR][01:10:12][INFO][RANK0]: Eval Time for 160 iters: 0.054590s
[HUGECTR][01:10:14][INFO][RANK0]: Iter: 2900 Time(100 iters): 2.336213s Loss: 0.512216 lr:0.001000
[HUGECTR][01:10:17][INFO][RANK0]: Iter: 3000 Time(100 iters): 2.384833s Loss: 0.522102 lr:0.001000
[HUGECTR][01:10:17][INFO][RANK0]: Evaluation, AUC: 0.803801
[HUGECTR][01:10:17][INFO][RANK0]: Eval Time for 160 iters: 0.054133s
[HUGECTR][01:10:19][INFO][RANK0]: Iter: 3100 Time(100 iters): 2.334245s Loss: 0.507463 lr:0.001000
[HUGECTR][01:10:21][INFO][RANK0]: Iter: 3200 Time(100 iters): 2.279046s Loss: 0.526148 lr:0.001000
[HUGECTR][01:10:21][INFO][RANK0]: Evaluation, AUC: 0.802950
[HUGECTR][01:10:21][INFO][RANK0]: Eval Time for 160 iters: 0.070003s
[HUGECTR][01:10:24][INFO][RANK0]: Iter: 3300 Time(100 iters): 2.352114s Loss: 0.504611 lr:0.001000
[HUGECTR][01:10:26][INFO][RANK0]: Iter: 3400 Time(100 iters): 2.277292s Loss: 0.502907 lr:0.001000
[HUGECTR][01:10:26][INFO][RANK0]: Evaluation, AUC: 0.804364
[HUGECTR][01:10:26][INFO][RANK0]: Eval Time for 160 iters: 0.054315s
[HUGECTR][01:10:28][INFO][RANK0]: Iter: 3500 Time(100 iters): 2.442956s Loss: 0.512927 lr:0.001000
[HUGECTR][01:10:31][INFO][RANK0]: Iter: 3600 Time(100 iters): 2.277974s Loss: 0.519042 lr:0.001000
[HUGECTR][01:10:31][INFO][RANK0]: Evaluation, AUC: 0.806404
[HUGECTR][01:10:31][INFO][RANK0]: Eval Time for 160 iters: 0.054291s
[HUGECTR][01:10:33][INFO][RANK0]: Iter: 3700 Time(100 iters): 2.335365s Loss: 0.499368 lr:0.001000
[HUGECTR][01:10:35][INFO][RANK0]: Iter: 3800 Time(100 iters): 2.277786s Loss: 0.509683 lr:0.001000
[HUGECTR][01:10:35][INFO][RANK0]: Evaluation, AUC: 0.805164
[HUGECTR][01:10:35][INFO][RANK0]: Eval Time for 160 iters: 0.064908s
[HUGECTR][01:10:38][INFO][RANK0]: Iter: 3900 Time(100 iters): 2.344106s Loss: 0.508182 lr:0.001000
[HUGECTR][01:10:40][INFO][RANK0]: Iter: 4000 Time(100 iters): 2.387872s Loss: 0.493841 lr:0.001000
[HUGECTR][01:10:40][INFO][RANK0]: Evaluation, AUC: 0.808367
[HUGECTR][01:10:40][INFO][RANK0]: Eval Time for 160 iters: 0.054222s
[HUGECTR][01:10:42][INFO][RANK0]: Iter: 4100 Time(100 iters): 2.335361s Loss: 0.508106 lr:0.001000
[HUGECTR][01:10:45][INFO][RANK0]: Iter: 4200 Time(100 iters): 2.278802s Loss: 0.519000 lr:0.001000
[HUGECTR][01:10:45][INFO][RANK0]: Evaluation, AUC: 0.808897
[HUGECTR][01:10:45][INFO][RANK0]: Eval Time for 160 iters: 0.054320s
[HUGECTR][01:10:47][INFO][RANK0]: Iter: 4300 Time(100 iters): 2.334094s Loss: 0.502797 lr:0.001000
[HUGECTR][01:10:49][INFO][RANK0]: Iter: 4400 Time(100 iters): 2.388990s Loss: 0.508890 lr:0.001000
[HUGECTR][01:10:49][INFO][RANK0]: Evaluation, AUC: 0.809649
[HUGECTR][01:10:49][INFO][RANK0]: Eval Time for 160 iters: 0.074584s
[HUGECTR][01:10:52][INFO][RANK0]: Iter: 4500 Time(100 iters): 2.355005s Loss: 0.505778 lr:0.001000
[HUGECTR][01:10:54][INFO][RANK0]: Iter: 4600 Time(100 iters): 2.277275s Loss: 0.532776 lr:0.001000
[HUGECTR][01:10:54][INFO][RANK0]: Evaluation, AUC: 0.810962
[HUGECTR][01:10:54][INFO][RANK0]: Eval Time for 160 iters: 0.054498s
[HUGECTR][01:10:56][INFO][RANK0]: Iter: 4700 Time(100 iters): 2.335553s Loss: 0.503001 lr:0.001000
[HUGECTR][01:10:59][INFO][RANK0]: Iter: 4800 Time(100 iters): 2.279237s Loss: 0.495762 lr:0.001000
[HUGECTR][01:10:59][INFO][RANK0]: Evaluation, AUC: 0.808618
[HUGECTR][01:10:59][INFO][RANK0]: Eval Time for 160 iters: 0.054287s
[HUGECTR][01:11:01][INFO][RANK0]: Iter: 4900 Time(100 iters): 2.449926s Loss: 0.503213 lr:0.001000
[HUGECTR][01:11:03][INFO][RANK0]: Iter: 5000 Time(100 iters): 2.277141s Loss: 0.481138 lr:0.001000
[HUGECTR][01:11:03][INFO][RANK0]: Evaluation, AUC: 0.810767
[HUGECTR][01:11:03][INFO][RANK0]: Eval Time for 160 iters: 0.064807s
[HUGECTR][01:11:04][INFO][RANK0]: Rank0: Dump hash table from GPU0
[HUGECTR][01:11:04][INFO][RANK0]: Rank0: Write hash table <key,value> pairs to file
[HUGECTR][01:11:04][INFO][RANK0]: Done
[HUGECTR][01:11:04][INFO][RANK0]: Rank0: Write hash table to file
[HUGECTR][01:11:13][INFO][RANK0]: Dumping sparse weights to files, successful
[HUGECTR][01:11:13][INFO][RANK0]: Rank0: Write optimzer state to file
[HUGECTR][01:11:14][INFO][RANK0]: Done
[HUGECTR][01:11:14][INFO][RANK0]: Rank0: Write optimzer state to file
[HUGECTR][01:11:15][INFO][RANK0]: Done
[HUGECTR][01:11:34][INFO][RANK0]: Rank0: Write optimzer state to file
[HUGECTR][01:11:35][INFO][RANK0]: Done
[HUGECTR][01:11:35][INFO][RANK0]: Rank0: Write optimzer state to file
[HUGECTR][01:11:36][INFO][RANK0]: Done
[HUGECTR][01:11:55][INFO][RANK0]: Dumping sparse optimzer states to files, successful
[HUGECTR][01:11:55][INFO][RANK0]: Dumping dense weights to file, successful
[HUGECTR][01:11:55][INFO][RANK0]: Dumping dense optimizer states to file, successful
[HUGECTR][01:11:55][INFO][RANK0]: Dumping untrainable weights to file, successful
[HUGECTR][01:11:57][INFO][RANK0]: Iter: 5100 Time(100 iters): 53.630313s Loss: 0.485568 lr:0.001000
[HUGECTR][01:11:59][INFO][RANK0]: Iter: 5200 Time(100 iters): 2.278359s Loss: 0.518924 lr:0.001000
[HUGECTR][01:11:59][INFO][RANK0]: Evaluation, AUC: 0.811217
[HUGECTR][01:11:59][INFO][RANK0]: Eval Time for 160 iters: 0.054624s
[HUGECTR][01:12:02][INFO][RANK0]: Iter: 5300 Time(100 iters): 2.336246s Loss: 0.516505 lr:0.001000
[HUGECTR][01:12:04][INFO][RANK0]: Iter: 5400 Time(100 iters): 2.384571s Loss: 0.512404 lr:0.001000
[HUGECTR][01:12:04][INFO][RANK0]: Evaluation, AUC: 0.811464
[HUGECTR][01:12:04][INFO][RANK0]: Eval Time for 160 iters: 0.054350s
[HUGECTR][01:12:06][INFO][RANK0]: Iter: 5500 Time(100 iters): 2.334675s Loss: 0.500305 lr:0.001000
[HUGECTR][01:12:09][INFO][RANK0]: Iter: 5600 Time(100 iters): 2.279563s Loss: 0.484969 lr:0.001000