# Copyright 2020 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

HugeCTR demo on MovieLens data

Overview

HugeCTR is a recommender-specific framework that is capable of distributed training across multiple GPUs and nodes for click-through-rate (CTR) estimation. HugeCTR is a component of NVIDIA Merlin (documentation | GitHub), a framework that accelerates the entire pipeline from data ingestion and training to deploying GPU-accelerated recommender systems.

Learning objectives

  • Train a deep learning recommender model (DLRM) on the MovieLens 20M dataset.

  • Walk through data preprocessing, training a DLRM model with HugeCTR, and then using the movie embeddings to answer item similarity queries.

Prerequisites

Docker containers

Start the notebook inside a running 22.06 or later NGC Docker container: nvcr.io/nvidia/merlin/merlin-training:22.06. The HugeCTR Python interface is installed under /usr/local/hugectr/lib/, and that path is added to the PYTHONPATH environment variable, so you can use the HugeCTR Python interface inside the Docker container without any additional configuration.
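To verify the environment, you can import the HugeCTR Python module inside the container; a minimal sketch:

# This import should succeed without extra configuration because
# /usr/local/hugectr/lib is already on PYTHONPATH inside the container.
import hugectr
print("hugectr imported from:", hugectr.__file__)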

Hardware

This notebook requires a Pascal, Volta, Turing, Ampere, or newer GPU, such as a P100, V100, T4, or A100. You can view the GPU information with the nvidia-smi command:

!nvidia-smi
Mon Jul 12 06:54:46 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:1A:00.0 Off |                    0 |
| N/A   29C    P0    23W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  On   | 00000000:1B:00.0 Off |                    0 |
| N/A   27C    P0    22W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-PCIE...  On   | 00000000:3D:00.0 Off |                    0 |
| N/A   26C    P0    23W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-PCIE...  On   | 00000000:3E:00.0 Off |                    0 |
| N/A   28C    P0    23W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-PCIE...  On   | 00000000:88:00.0 Off |                    0 |
| N/A   25C    P0    24W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-PCIE...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   25C    P0    22W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-PCIE...  On   | 00000000:B1:00.0 Off |                    0 |
| N/A   26C    P0    23W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-PCIE...  On   | 00000000:B2:00.0 Off |                    0 |
| N/A   25C    P0    24W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Data download and preprocessing

We first install a few extra utilities (torch, tqdm, and the unzip command) for data preprocessing.

print("Downloading and installing 'tqdm' package.")
!pip3 -q install torch tqdm

print("Downloading and installing 'unzip' command")
!conda install -y -q -c conda-forge unzip
Downloading and installing 'tqdm' package.
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Downloading and installing 'unzip' command
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /opt/conda

  added / updated specs:
    - unzip


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    unzip-6.0                  |       h7f98852_2         143 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         143 KB

The following NEW packages will be INSTALLED:

  unzip              conda-forge/linux-64::unzip-6.0-h7f98852_2


Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done

Next, we download and unzip the MovieLens 20M dataset.

print("Downloading and extracting 'Movie Lens 20M' dataset.")
!wget -nc http://files.grouplens.org/datasets/movielens/ml-20m.zip -P data -q --show-progress
!unzip -n data/ml-20m.zip -d data
!ls ./data
Downloading and extracting 'Movie Lens 20M' dataset.
ml-20m.zip          100%[===================>] 189.50M  46.1MB/s    in 4.5s    
Archive:  data/ml-20m.zip
   creating: data/ml-20m/
  inflating: data/ml-20m/genome-scores.csv  
  inflating: data/ml-20m/genome-tags.csv  
  inflating: data/ml-20m/links.csv   
  inflating: data/ml-20m/movies.csv  
  inflating: data/ml-20m/ratings.csv  
  inflating: data/ml-20m/README.txt  
  inflating: data/ml-20m/tags.csv    
ml-20m	ml-20m.zip

MovieLens data preprocessing

import pandas as pd
import torch
import tqdm

MIN_RATINGS = 20
USER_COLUMN = 'userId'
ITEM_COLUMN = 'movieId'

Next, we read the data into a pandas DataFrame and encode the user and movie IDs as contiguous integers.

df = pd.read_csv('./data/ml-20m/ratings.csv')
print("Filtering out users with less than {} ratings".format(MIN_RATINGS))
grouped = df.groupby(USER_COLUMN)
df = grouped.filter(lambda x: len(x) >= MIN_RATINGS)

print("Mapping original user and item IDs to new sequential IDs")
df[USER_COLUMN], unique_users = pd.factorize(df[USER_COLUMN])
df[ITEM_COLUMN], unique_items = pd.factorize(df[ITEM_COLUMN])

nb_users = len(unique_users)
nb_items = len(unique_items)

print("Number of users: %d\nNumber of items: %d"%(len(unique_users), len(unique_items)))

# Save the mapping to do the inference later on
import pickle
with open('./mappings.pickle', 'wb') as handle:
    pickle.dump({"users": unique_users, "items": unique_items}, handle, protocol=pickle.HIGHEST_PROTOCOL)
Filtering out users with less than 20 ratings
Mapping original user and item IDs to new sequential IDs
Number of users: 138493
Number of items: 26744

Next, we split the data into a train and a test set. The movie each user rated most recently is held out for the test set.

# Need to sort before popping to get the last item
df.sort_values(by='timestamp', inplace=True)
    
# clean up data
del df['rating'], df['timestamp']
df = df.drop_duplicates() # assuming it keeps order

# now we have filtered and sorted by time data, we can split test data out
grouped_sorted = df.groupby(USER_COLUMN, group_keys=False)
test_data = grouped_sorted.tail(1).sort_values(by=USER_COLUMN)

# need to pop for each group
train_data = grouped_sorted.apply(lambda x: x.iloc[:-1])
train_data['target']=1
test_data['target']=1
train_data.head()
    userId  movieId  target
20       0       20       1
19       0       19       1
86       0       86       1
61       0       61       1
23       0       23       1

Because the MovieLens data contains only positive examples, we first define a utility function to generate negative samples.

class _TestNegSampler:
    def __init__(self, train_ratings, nb_users, nb_items, nb_neg):
        self.nb_neg = nb_neg
        self.nb_users = nb_users 
        self.nb_items = nb_items 

        # encode (user, item) pairs as unique ids to build a hash set for fast lookup
        ids = (train_ratings[:, 0] * self.nb_items) + train_ratings[:, 1]
        self.set = set(ids)

    def generate(self, batch_size=128*1024):
        users = torch.arange(0, self.nb_users).reshape([1, -1]).repeat([self.nb_neg, 1]).transpose(0, 1).reshape(-1)

        items = [-1] * len(users)

        random_items = torch.LongTensor(batch_size).random_(0, self.nb_items).tolist()
        print('Generating validation negatives...')
        for idx, u in enumerate(tqdm.tqdm(users.tolist())):
            if not random_items:
                random_items = torch.LongTensor(batch_size).random_(0, self.nb_items).tolist()
            j = random_items.pop()
            while u * self.nb_items + j in self.set:
                if not random_items:
                    random_items = torch.LongTensor(batch_size).random_(0, self.nb_items).tolist()
                j = random_items.pop()

            items[idx] = j
        items = torch.LongTensor(items)
        return items

Next, we generate the negative samples for training.

sampler = _TestNegSampler(df.values, nb_users, nb_items, 500)  # using 500 negative samples
train_negs = sampler.generate()
train_negs = train_negs.reshape(-1, 500)

sampler = _TestNegSampler(df.values, nb_users, nb_items, 100)  # using 100 negative samples
test_negs = sampler.generate()
test_negs = test_negs.reshape(-1, 100)
Generating validation negatives...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 69246500/69246500 [00:44<00:00, 1566380.37it/s]
Generating validation negatives...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13849300/13849300 [00:08<00:00, 1594800.54it/s]
import numpy as np

# generating negative samples for training
train_data_neg = np.zeros((train_negs.shape[0]*train_negs.shape[1],3), dtype=int)
idx = 0
for i in tqdm.tqdm(range(train_negs.shape[0])):
    for j in range(train_negs.shape[1]):
        train_data_neg[idx, 0] = i # user ID
        train_data_neg[idx, 1] = train_negs[i, j] # negative item ID
        idx += 1
    
# generating negative samples for testing
test_data_neg = np.zeros((test_negs.shape[0]*test_negs.shape[1],3), dtype=int)
idx = 0
for i in tqdm.tqdm(range(test_negs.shape[0])):
    for j in range(test_negs.shape[1]):
        test_data_neg[idx, 0] = i
        test_data_neg[idx, 1] = test_negs[i, j]
        idx += 1
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 138493/138493 [04:07<00:00, 558.71it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 138493/138493 [00:49<00:00, 2819.57it/s]
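The nested Python loops above are straightforward but slow. As a sketch, the same (user, negative item, target) layout for the training negatives could be built with vectorized NumPy operations, assuming train_negs is still a torch tensor of shape (nb_users, 500):

# Vectorized equivalent of the training-negatives loop above (a sketch):
# column 0 is the user ID, column 1 the sampled negative item, column 2 stays 0.
train_users = np.repeat(np.arange(train_negs.shape[0]), train_negs.shape[1])
train_data_neg_fast = np.stack(
    [train_users, train_negs.reshape(-1).numpy(), np.zeros_like(train_users)], axis=1)
assert (train_data_neg_fast == train_data_neg).all()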
train_data_np= np.concatenate([train_data_neg, train_data.values])
np.random.shuffle(train_data_np)

test_data_np= np.concatenate([test_data_neg, test_data.values])
np.random.shuffle(test_data_np)
# HugeCTR expects user IDs and item IDs to occupy disjoint index ranges, so we keep
# 0 -> nb_users-1 for user IDs and shift item IDs to nb_users -> nb_users+nb_items-1.
train_data_np[:,1] += nb_users 
test_data_np[:,1] += nb_users 
np.max(train_data_np[:,1])
165236
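As a quick check on the offsetting, the user and item columns should now occupy disjoint index ranges; a minimal sketch using the arrays defined above:

# User IDs occupy [0, nb_users); offset item IDs occupy [nb_users, nb_users + nb_items).
assert train_data_np[:, 0].max() < nb_users
assert train_data_np[:, 1].min() >= nb_users
assert train_data_np[:, 1].max() < nb_users + nb_items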

Write HugeCTR data files

After preprocessing, we write the data to disk in the HugeCTR Norm dataset format.

from ctypes import c_longlong as ll
from ctypes import c_uint
from ctypes import c_float
from ctypes import c_int

def write_hugeCTR_data(huge_ctr_data, filename='huge_ctr_data.dat'):
    print("Writing %d samples"%huge_ctr_data.shape[0])
    with open(filename, 'wb') as f:
        #write header
        f.write(ll(0)) # 0: no error check; 1: check_num
        f.write(ll(huge_ctr_data.shape[0])) # the number of samples in this data file
        f.write(ll(1)) # dimension of label
        f.write(ll(1)) # dimension of dense feature
        f.write(ll(2)) # long long slot_num
        for _ in range(3): f.write(ll(0)) # reserved for future use

        for i in tqdm.tqdm(range(huge_ctr_data.shape[0])):
            f.write(c_float(huge_ctr_data[i,2])) # float label[label_dim];
            f.write(c_float(0)) # dummy dense feature
            f.write(c_int(1)) # slot 1 nnz: user ID
            f.write(c_uint(huge_ctr_data[i,0]))
            f.write(c_int(1)) # slot 2 nnz: item ID
            f.write(c_uint(huge_ctr_data[i,1]))

Train data

def generate_filelist(filelist_name, num_files, filename_prefix):
    with open(filelist_name, 'wt') as f:
        f.write('{0}\n'.format(num_files))
        for i in range(num_files):
            f.write('{0}_{1}.dat\n'.format(filename_prefix, i))
!rm -rf ./data/hugeCTR
!mkdir ./data/hugeCTR

for i, data_arr in enumerate(np.array_split(train_data_np,10)):
    write_hugeCTR_data(data_arr, filename='./data/hugeCTR/train_huge_ctr_data_%d.dat'%i)

generate_filelist('./data/hugeCTR/train_filelist.txt', 10, './data/hugeCTR/train_huge_ctr_data')
Writing 8910827 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:17<00:00, 513695.42it/s]
Writing 8910827 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:16<00:00, 526049.22it/s]
Writing 8910827 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:16<00:00, 525218.45it/s]
Writing 8910827 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:16<00:00, 528084.97it/s]
Writing 8910827 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:16<00:00, 525638.15it/s]
Writing 8910827 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:16<00:00, 528931.43it/s]
Writing 8910827 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:16<00:00, 531191.33it/s]
Writing 8910827 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:16<00:00, 532537.58it/s]
Writing 8910827 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:16<00:00, 528103.37it/s]
Writing 8910827 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:17<00:00, 522249.44it/s]
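As a sanity check, we can read the 64-byte Norm header back from one of the files just written and confirm the sample count and slot configuration. This is a minimal sketch that mirrors the writer above (native byte order, eight int64 fields):

import struct

# Header layout written above: error-check flag, number of samples, label_dim,
# dense_dim, slot_num, followed by three reserved fields.
with open('./data/hugeCTR/train_huge_ctr_data_0.dat', 'rb') as f:
    header = struct.unpack('8q', f.read(64))
print("samples:", header[1], "label_dim:", header[2],
      "dense_dim:", header[3], "slot_num:", header[4])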

Test data

for i, data_arr in enumerate(np.array_split(test_data_np,10)):
    write_hugeCTR_data(data_arr, filename='./data/hugeCTR/test_huge_ctr_data_%d.dat'%i)
    
generate_filelist('./data/hugeCTR/test_filelist.txt', 10, './data/hugeCTR/test_huge_ctr_data')
Writing 1398780 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398780/1398780 [00:02<00:00, 510667.93it/s]
Writing 1398780 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398780/1398780 [00:02<00:00, 523734.65it/s]
Writing 1398780 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398780/1398780 [00:02<00:00, 512399.13it/s]
Writing 1398779 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:02<00:00, 519540.59it/s]
Writing 1398779 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:02<00:00, 522322.45it/s]
Writing 1398779 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:02<00:00, 525051.49it/s]
Writing 1398779 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:02<00:00, 527603.11it/s]
Writing 1398779 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:02<00:00, 521668.76it/s]
Writing 1398779 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:02<00:00, 517335.28it/s]
Writing 1398779 samples
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:02<00:00, 522761.79it/s]

HugeCTR DLRM training

In this section, we train a DLRM network on the augmented MovieLens data. First, we write the training Python script.

%%writefile hugectr_dlrm_movielens.py
import hugectr
from mpi4py import MPI
solver = hugectr.CreateSolver(max_eval_batches = 1000,
                              batchsize_eval = 65536,
                              batchsize = 65536,
                              lr = 0.1,
                              warmup_steps = 1000,
                              decay_start = 10000,
                              decay_steps = 40000,
                              decay_power = 2.0,
                              end_lr = 1e-5,
                              vvgpu = [[0]],
                              repeat_dataset = True,
                              use_mixed_precision = True,
                              scaler = 1024)
reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Norm,
                                  source = ["./data/hugeCTR/train_filelist.txt"],
                                  eval_source = "./data/hugeCTR/test_filelist.txt",
                                  check_type = hugectr.Check_t.Non)
optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.SGD,
                                    update_type = hugectr.Update_t.Local,
                                    atomic_update = True)
model = hugectr.Model(solver, reader, optimizer)
model.add(hugectr.Input(label_dim = 1, label_name = "label",
                        dense_dim = 1, dense_name = "dense",
                        data_reader_sparse_param_array = 
                        [hugectr.DataReaderSparseParam("data1", 1, True, 2)]))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.LocalizedSlotSparseEmbeddingHash, 
                            workspace_size_per_gpu_in_mb = 41,
                            embedding_vec_size = 64,
                            combiner = "sum",
                            sparse_embedding_name = "sparse_embedding1",
                            bottom_name = "data1",
                            optimizer = optimizer))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
                            bottom_names = ["dense"],
                            top_names = ["fc1"],
                            num_output=64))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
                            bottom_names = ["fc1"],
                            top_names = ["fc2"],
                            num_output=128))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
                            bottom_names = ["fc2"],
                            top_names = ["fc3"],
                            num_output=64))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Interaction,
                            bottom_names = ["fc3","sparse_embedding1"],
                            top_names = ["interaction1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
                            bottom_names = ["interaction1"],
                            top_names = ["fc4"],
                            num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
                            bottom_names = ["fc4"],
                            top_names = ["fc5"],
                            num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
                            bottom_names = ["fc5"],
                            top_names = ["fc6"],
                            num_output=512))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
                            bottom_names = ["fc6"],
                            top_names = ["fc7"],
                            num_output=256))                                                  
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["fc7"],
                            top_names = ["fc8"],
                            num_output=1))                                                                                           
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
                            bottom_names = ["fc8", "label"],
                            top_names = ["loss"]))
model.compile()
model.summary()
model.fit(max_iter = 50000, display = 1000, eval_interval = 3000, snapshot = 3000, snapshot_prefix = "./hugeCTR_saved_model_DLRM/")
Overwriting hugectr_dlrm_movielens.py
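For reference, the warm-up plus polynomial-decay learning-rate schedule configured in the solver can be approximated in plain Python. This is a sketch of how we read the solver parameters, not HugeCTR's internal implementation:

def approx_lr(step, base_lr=0.1, warmup_steps=1000, decay_start=10000,
              decay_steps=40000, decay_power=2.0, end_lr=1e-5):
    # Linear warm-up, constant plateau, then polynomial decay down to end_lr.
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    if step < decay_start:
        return base_lr
    progress = min((step - decay_start) / decay_steps, 1.0)
    return max(base_lr * (1.0 - progress) ** decay_power, end_lr)

print([round(approx_lr(s), 6) for s in (500, 5000, 11000, 20000, 45000)])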
!rm -rf ./hugeCTR_saved_model_DLRM/
!mkdir ./hugeCTR_saved_model_DLRM/
!CUDA_VISIBLE_DEVICES=0 python3 hugectr_dlrm_movielens.py
====================================================Model Init=====================================================
[12d06h55m13s][HUGECTR][INFO]: Global seed is 2552343530
[12d06h55m15s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
Device 0: Tesla V100-PCIE-32GB
[12d06h55m15s][HUGECTR][INFO]: num of DataReader workers: 12
[12d06h55m15s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=167936
[12d06h55m15s][HUGECTR][INFO]: All2All Warmup Start
[12d06h55m15s][HUGECTR][INFO]: All2All Warmup End
===================================================Model Compile===================================================
[12d06h56m10s][HUGECTR][INFO]: gpu0 start to init embedding
[12d06h56m10s][HUGECTR][INFO]: gpu0 init embedding done
===================================================Model Summary===================================================
Label                                   Dense                         Sparse                        
label                                   dense                          data1                         
(None, 1)                               (None, 1)                               
------------------------------------------------------------------------------------------------------------------
Layer Type                              Input Name                    Output Name                   Output Shape                  
------------------------------------------------------------------------------------------------------------------
LocalizedSlotSparseEmbeddingHash        data1                         sparse_embedding1             (None, 2, 64)                 
FusedInnerProduct                       dense                         fc1                           (None, 64)                    
FusedInnerProduct                       fc1                           fc2                           (None, 128)                   
FusedInnerProduct                       fc2                           fc3                           (None, 64)                    
Interaction                             fc3,sparse_embedding1         interaction1                  (None, 68)                    
FusedInnerProduct                       interaction1                  fc4                           (None, 1024)                  
FusedInnerProduct                       fc4                           fc5                           (None, 1024)                  
FusedInnerProduct                       fc5                           fc6                           (None, 512)                   
FusedInnerProduct                       fc6                           fc7                           (None, 256)                   
InnerProduct                            fc7                           fc8                           (None, 1)                     
BinaryCrossEntropyLoss                  fc8,label                     loss                                                        
------------------------------------------------------------------------------------------------------------------
=====================================================Model Fit=====================================================
[12d60h56m10s][HUGECTR][INFO]: Use non-epoch mode with number of iterations: 50000
[12d60h56m10s][HUGECTR][INFO]: Training batchsize: 65536, evaluation batchsize: 65536
[12d60h56m10s][HUGECTR][INFO]: Evaluation interval: 3000, snapshot interval: 3000
[12d60h56m10s][HUGECTR][INFO]: Sparse embedding trainable: 1, dense network trainable: 1
[12d60h56m10s][HUGECTR][INFO]: Use mixed precision: 1, scaler: 1024.000000, use cuda graph: 1
[12d60h56m10s][HUGECTR][INFO]: lr: 0.100000, warmup_steps: 1000, decay_start: 10000, decay_steps: 40000, decay_power: 2.000000, end_lr: 0.000010
[12d60h56m10s][HUGECTR][INFO]: Training source file: ./data/hugeCTR/train_filelist.txt
[12d60h56m10s][HUGECTR][INFO]: Evaluation source file: ./data/hugeCTR/test_filelist.txt
[12d60h56m25s][HUGECTR][INFO]: Iter: 1000 Time(1000 iters): 14.895018s Loss: 0.534868 lr:0.100000
[12d60h56m40s][HUGECTR][INFO]: Iter: 2000 Time(1000 iters): 14.917098s Loss: 0.526272 lr:0.100000
[12d60h56m55s][HUGECTR][INFO]: Iter: 3000 Time(1000 iters): 14.945527s Loss: 0.504054 lr:0.100000
[12d60h57m10s][HUGECTR][INFO]: Evaluation, AUC: 0.698215
[12d60h57m10s][HUGECTR][INFO]: Eval Time for 1000 iters: 5.962128s
[12d60h57m10s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d60h57m10s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d60h57m10s][HUGECTR][INFO]: Done
[12d60h57m10s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d60h57m10s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d60h57m10s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d60h57m10s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d60h57m10s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d60h57m16s][HUGECTR][INFO]: Iter: 4000 Time(1000 iters): 21.357401s Loss: 0.286658 lr:0.100000
[12d60h57m31s][HUGECTR][INFO]: Iter: 5000 Time(1000 iters): 15.037847s Loss: 0.249509 lr:0.100000
[12d60h57m46s][HUGECTR][INFO]: Iter: 6000 Time(1000 iters): 15.048834s Loss: 0.239949 lr:0.100000
[12d60h57m52s][HUGECTR][INFO]: Evaluation, AUC: 0.928999
[12d60h57m52s][HUGECTR][INFO]: Eval Time for 1000 iters: 5.993647s
[12d60h57m52s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d60h57m52s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d60h57m52s][HUGECTR][INFO]: Done
[12d60h57m52s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d60h57m52s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d60h57m52s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d60h57m52s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d60h57m52s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d60h58m80s][HUGECTR][INFO]: Iter: 7000 Time(1000 iters): 21.364920s Loss: 0.242271 lr:0.100000
[12d60h58m23s][HUGECTR][INFO]: Iter: 8000 Time(1000 iters): 15.036863s Loss: 0.236050 lr:0.100000
[12d60h58m38s][HUGECTR][INFO]: Iter: 9000 Time(1000 iters): 15.042685s Loss: 0.235748 lr:0.100000
[12d60h58m44s][HUGECTR][INFO]: Evaluation, AUC: 0.937590
[12d60h58m44s][HUGECTR][INFO]: Eval Time for 1000 iters: 5.990306s
[12d60h58m44s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d60h58m44s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d60h58m44s][HUGECTR][INFO]: Done
[12d60h58m44s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d60h58m44s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d60h58m44s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d60h58m44s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d60h58m44s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d60h58m59s][HUGECTR][INFO]: Iter: 10000 Time(1000 iters): 21.408894s Loss: 0.233947 lr:0.099995
[12d60h59m14s][HUGECTR][INFO]: Iter: 11000 Time(1000 iters): 15.050379s Loss: 0.231177 lr:0.095058
[12d60h59m29s][HUGECTR][INFO]: Iter: 12000 Time(1000 iters): 15.047381s Loss: 0.230662 lr:0.090245
[12d60h59m35s][HUGECTR][INFO]: Evaluation, AUC: 0.940782
[12d60h59m35s][HUGECTR][INFO]: Eval Time for 1000 iters: 5.990065s
[12d60h59m35s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d60h59m35s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d60h59m35s][HUGECTR][INFO]: Done
[12d60h59m36s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d60h59m36s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d60h59m36s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d60h59m36s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d60h59m36s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d60h59m51s][HUGECTR][INFO]: Iter: 13000 Time(1000 iters): 21.492720s Loss: 0.229246 lr:0.085558
[12d70h00m60s][HUGECTR][INFO]: Iter: 14000 Time(1000 iters): 15.051535s Loss: 0.227302 lr:0.080996
[12d70h00m21s][HUGECTR][INFO]: Iter: 15000 Time(1000 iters): 15.062830s Loss: 0.22.067 lr:0.076558
[12d70h00m27s][HUGECTR][INFO]: Evaluation, AUC: 0.941291
[12d70h00m27s][HUGECTR][INFO]: Eval Time for 1000 iters: 6.004500s
[12d70h00m27s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d70h00m27s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d70h00m27s][HUGECTR][INFO]: Done
[12d70h00m27s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d70h00m27s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d70h00m27s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d70h00m27s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d70h00m27s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d70h00m42s][HUGECTR][INFO]: Iter: 16000 Time(1000 iters): 21.480675s Loss: 0.220782 lr:0.072246
[12d70h00m57s][HUGECTR][INFO]: Iter: 17000 Time(1000 iters): 15.057642s Loss: 0.214406 lr:0.068058
[12d70h10m12s][HUGECTR][INFO]: Iter: 18000 Time(1000 iters): 15.068874s Loss: 0.211810 lr:0.063996
[12d70h10m18s][HUGECTR][INFO]: Evaluation, AUC: 0.943403
[12d70h10m18s][HUGECTR][INFO]: Eval Time for 1000 iters: 5.994943s
[12d70h10m18s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d70h10m18s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d70h10m18s][HUGECTR][INFO]: Done
[12d70h10m19s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d70h10m19s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d70h10m19s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d70h10m19s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d70h10m19s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d70h10m34s][HUGECTR][INFO]: Iter: 19000 Time(1000 iters): 21.541020s Loss: 0.208731 lr:0.060059
[12d70h10m49s][HUGECTR][INFO]: Iter: 20000 Time(1000 iters): 15.051771s Loss: 0.206068 lr:0.056246
[12d70h20m40s][HUGECTR][INFO]: Iter: 21000 Time(1000 iters): 15.067925s Loss: 0.205040 lr:0.052559
[12d70h20m10s][HUGECTR][INFO]: Evaluation, AUC: 0.945471
[12d70h20m10s][HUGECTR][INFO]: Eval Time for 1000 iters: 6.037830s
[12d70h20m10s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d70h20m10s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d70h20m10s][HUGECTR][INFO]: Done
[12d70h20m11s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d70h20m11s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d70h20m11s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d70h20m11s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d70h20m11s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d70h20m26s][HUGECTR][INFO]: Iter: 22000 Time(1000 iters): 22.271977s Loss: 0.199577 lr:0.048997
[12d70h20m41s][HUGECTR][INFO]: Iter: 23000 Time(1000 iters): 15.047657s Loss: 0.194625 lr:0.045559
[12d70h20m56s][HUGECTR][INFO]: Iter: 24000 Time(1000 iters): 15.054897s Loss: 0.197816 lr:0.042247
[12d70h30m20s][HUGECTR][INFO]: Evaluation, AUC: 0.946273
[12d70h30m20s][HUGECTR][INFO]: Eval Time for 1000 iters: 6.023635s
[12d70h30m20s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d70h30m20s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d70h30m20s][HUGECTR][INFO]: Done
[12d70h30m40s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d70h30m40s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d70h30m40s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d70h30m40s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d70h30m40s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d70h30m19s][HUGECTR][INFO]: Iter: 25000 Time(1000 iters): 22.792095s Loss: 0.195353 lr:0.039059
[12d70h30m34s][HUGECTR][INFO]: Iter: 26000 Time(1000 iters): 15.069135s Loss: 0.194946 lr:0.035997
[12d70h30m49s][HUGECTR][INFO]: Iter: 27000 Time(1000 iters): 15.044690s Loss: 0.196138 lr:0.033060
[12d70h30m55s][HUGECTR][INFO]: Evaluation, AUC: 0.946479
[12d70h30m55s][HUGECTR][INFO]: Eval Time for 1000 iters: 6.036560s
[12d70h30m55s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d70h30m55s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d70h30m55s][HUGECTR][INFO]: Done
[12d70h30m56s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d70h30m56s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d70h30m56s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d70h30m56s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d70h30m56s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d70h40m11s][HUGECTR][INFO]: Iter: 28000 Time(1000 iters): 21.477826s Loss: 0.196544 lr:0.030247
[12d70h40m26s][HUGECTR][INFO]: Iter: 29000 Time(1000 iters): 15.047754s Loss: 0.192916 lr:0.027560
[12d70h40m41s][HUGECTR][INFO]: Iter: 30000 Time(1000 iters): 15.076476s Loss: 0.193249 lr:0.024998
[12d70h40m47s][HUGECTR][INFO]: Evaluation, AUC: 0.946866
[12d70h40m47s][HUGECTR][INFO]: Eval Time for 1000 iters: 6.019900s
[12d70h40m47s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d70h40m47s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d70h40m47s][HUGECTR][INFO]: Done
[12d70h40m47s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d70h40m47s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d70h40m47s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d70h40m47s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d70h40m47s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d70h50m20s][HUGECTR][INFO]: Iter: 31000 Time(1000 iters): 21.420334s Loss: 0.191549 lr:0.022560
[12d70h50m17s][HUGECTR][INFO]: Iter: 32000 Time(1000 iters): 15.056377s Loss: 0.192337 lr:0.020248
[12d70h50m32s][HUGECTR][INFO]: Iter: 33000 Time(1000 iters): 15.049432s Loss: 0.190889 lr:0.018060
[12d70h50m38s][HUGECTR][INFO]: Evaluation, AUC: 0.947067
[12d70h50m38s][HUGECTR][INFO]: Eval Time for 1000 iters: 6.038870s
[12d70h50m39s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d70h50m39s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d70h50m39s][HUGECTR][INFO]: Done
[12d70h50m39s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d70h50m39s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d70h50m39s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d70h50m39s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d70h50m39s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d70h50m54s][HUGECTR][INFO]: Iter: 34000 Time(1000 iters): 21.957504s Loss: 0.190454 lr:0.015998
[12d70h60m90s][HUGECTR][INFO]: Iter: 35000 Time(1000 iters): 15.051283s Loss: 0.188163 lr:0.014061
[12d70h60m24s][HUGECTR][INFO]: Iter: 36000 Time(1000 iters): 15.057633s Loss: 0.192510 lr:0.012248
[12d70h60m31s][HUGECTR][INFO]: Evaluation, AUC: 0.947169
[12d70h60m31s][HUGECTR][INFO]: Eval Time for 1000 iters: 6.039515s
[12d70h60m31s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d70h60m31s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d70h60m31s][HUGECTR][INFO]: Done
[12d70h60m31s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d70h60m31s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d70h60m31s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d70h60m31s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d70h60m31s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d70h60m46s][HUGECTR][INFO]: Iter: 37000 Time(1000 iters): 21.491865s Loss: 0.190069 lr:0.010561
[12d70h70m10s][HUGECTR][INFO]: Iter: 38000 Time(1000 iters): 15.070367s Loss: 0.192338 lr:0.008999
[12d70h70m16s][HUGECTR][INFO]: Iter: 39000 Time(1000 iters): 15.056408s Loss: 0.189535 lr:0.007561
[12d70h70m22s][HUGECTR][INFO]: Evaluation, AUC: 0.947164
[12d70h70m22s][HUGECTR][INFO]: Eval Time for 1000 iters: 5.993091s
[12d70h70m22s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d70h70m22s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d70h70m22s][HUGECTR][INFO]: Done
[12d70h70m22s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d70h70m22s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d70h70m22s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d70h70m22s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d70h70m22s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d70h70m38s][HUGECTR][INFO]: Iter: 40000 Time(1000 iters): 21.440558s Loss: 0.188189 lr:0.006249
[12d70h70m53s][HUGECTR][INFO]: Iter: 41000 Time(1000 iters): 15.057426s Loss: 0.187295 lr:0.005061
[12d70h80m80s][HUGECTR][INFO]: Iter: 42000 Time(1000 iters): 15.075448s Loss: 0.188529 lr:0.003999
[12d70h80m14s][HUGECTR][INFO]: Evaluation, AUC: 0.947195
[12d70h80m14s][HUGECTR][INFO]: Eval Time for 1000 iters: 6.011289s
[12d70h80m14s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d70h80m14s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d70h80m14s][HUGECTR][INFO]: Done
[12d70h80m14s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d70h80m14s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d70h80m14s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d70h80m14s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d70h80m14s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d70h80m29s][HUGECTR][INFO]: Iter: 43000 Time(1000 iters): 21.454947s Loss: 0.188799 lr:0.003062
[12d70h80m44s][HUGECTR][INFO]: Iter: 44000 Time(1000 iters): 15.055168s Loss: 0.190610 lr:0.002249
[12d70h80m59s][HUGECTR][INFO]: Iter: 45000 Time(1000 iters): 15.067865s Loss: 0.191055 lr:0.001562
[12d70h90m50s][HUGECTR][INFO]: Evaluation, AUC: 0.947241
[12d70h90m50s][HUGECTR][INFO]: Eval Time for 1000 iters: 6.046591s
[12d70h90m50s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d70h90m50s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d70h90m50s][HUGECTR][INFO]: Done
[12d70h90m60s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d70h90m60s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d70h90m60s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d70h90m60s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d70h90m60s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d70h90m21s][HUGECTR][INFO]: Iter: 46000 Time(1000 iters): 21.669764s Loss: 0.187626 lr:0.001000
[12d70h90m36s][HUGECTR][INFO]: Iter: 47000 Time(1000 iters): 15.044369s Loss: 0.188257 lr:0.000562
[12d70h90m51s][HUGECTR][INFO]: Iter: 48000 Time(1000 iters): 15.050518s Loss: 0.190723 lr:0.000250
[12d70h90m57s][HUGECTR][INFO]: Evaluation, AUC: 0.947264
[12d70h90m57s][HUGECTR][INFO]: Eval Time for 1000 iters: 6.008485s
[12d70h90m57s][HUGECTR][INFO]: Rank0: Dump hash table from GPU0
[12d70h90m57s][HUGECTR][INFO]: Rank0: Write hash table <key,value> pairs to file
[12d70h90m57s][HUGECTR][INFO]: Done
[12d70h90m58s][HUGECTR][INFO]: Dumping sparse weights to files, successful
[12d70h90m58s][HUGECTR][INFO]: Dumping sparse optimzer states to files, successful
[12d70h90m58s][HUGECTR][INFO]: Dumping dense weights to file, successful
[12d70h90m58s][HUGECTR][INFO]: Dumping dense optimizer states to file, successful
[12d70h90m58s][HUGECTR][INFO]: Dumping untrainable weights to file, successful
[12d70h10m13s][HUGECTR][INFO]: Iter: 49000 Time(1000 iters): 21.945730s Loss: 0.188774 lr:0.000062

Answer item similarity with DLRM embedding

In this section, we demonstrate how the output of HugeCTR training can be used for simple inference tasks. Specifically, we show that the trained movie embeddings can answer item-to-item similarity queries. Such a lightweight lookup can serve as an efficient candidate generator, producing a small candidate set prior to re-ranking with a deep learning model.

First, we read the saved embedding table and extract the movie embeddings.

import struct 
import pickle
import numpy as np

key_type = 'I64'
key_type_map = {"I32": ["I", 4], "I64": ["q", 8]}

embedding_vec_size = 64

HUGE_CTR_VERSION = 2.21 # set HugeCTR version here, 2.2 for v2.2, 2.21 for v2.21

if HUGE_CTR_VERSION <= 2.2:
    each_key_size = key_type_map[key_type][1] + key_type_map[key_type][1] + 4 * embedding_vec_size
else:
    each_key_size = key_type_map[key_type][1] + 8 + 4 * embedding_vec_size
embedding_table = {}
        
with open("./hugeCTR_saved_model_DLRM/0_sparse_9000.model" + "/key", 'rb') as key_file, \
     open("./hugeCTR_saved_model_DLRM/0_sparse_9000.model" + "/emb_vector", 'rb') as vec_file:
    try:
        while True:
            key_buffer = key_file.read(key_type_map[key_type][1])
            vec_buffer = vec_file.read(4 * embedding_vec_size)
            if len(key_buffer) == 0 or len(vec_buffer) == 0:
                break
            key = struct.unpack(key_type_map[key_type][0], key_buffer)[0]
            values = struct.unpack(str(embedding_vec_size) + "f", vec_buffer)

            embedding_table[key] = values

    except BaseException as error:
        print(error)
# Item keys were offset by nb_users when the HugeCTR data files were written,
# so movie i is stored under key nb_users + i.
nb_users, nb_items = 138493, 26744
item_embedding = np.zeros((nb_items, embedding_vec_size), dtype='float')
for i in range(nb_items):
    if nb_users + i in embedding_table:
        item_embedding[i] = embedding_table[nb_users + i]

Answer nearest neighbor queries

from scipy.spatial.distance import cdist

def find_similar_movies(nn_movie_id, item_embedding, k=10, metric="euclidean"):
    # Find the top-k most similar items under the chosen distance metric
    # ("euclidean" or "cosine"). Since 1 - distance is a monotone transform,
    # the ranking is the same as sorting by smallest distance.
    sim = 1 - cdist(item_embedding, item_embedding[nn_movie_id].reshape(1, -1), metric=metric)
    return sim.squeeze().argsort()[-k:][::-1]
with open('./mappings.pickle', 'rb') as handle:
    movies_mapping = pickle.load(handle)["items"]

nn_to_movies = movies_mapping
movies_to_nn = {}
for i in range(len(movies_mapping)):
    movies_to_nn[movies_mapping[i]] = i

import pandas as pd
movies = pd.read_csv("./data/ml-20m/movies.csv", index_col="movieId")
for movie_ID in range(1,10):
    try:
        print("Query: ", movies.loc[movie_ID]["title"], movies.loc[movie_ID]["genres"])

        print("Similar movies: ")
        similar_movies = find_similar_movies(movies_to_nn[movie_ID], item_embedding)

        for i in similar_movies:
            print(nn_to_movies[i], movies.loc[nn_to_movies[i]]["title"], movies.loc[nn_to_movies[i]]["genres"])
        print("=================================\n")
    except Exception:
        # some movie IDs may be absent from the filtered ratings; skip them
        pass
Query:  Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy
Similar movies: 
110510 Série noire (1979) Film-Noir
32361 Come and Get It (1936) Drama
67999 Global Metal (2008) Documentary
69356 Zulu Dawn (1979) Action|Drama|Thriller|War
69381 Hitman, The (1991) Action|Crime|Thriller
69442 Pekka ja Pätkä neekereinä (1960) Comedy
69818 Franklyn (2008) Drama|Fantasy|Romance|Thriller
70344 Cold Souls (2009) Comedy|Drama
70495 Kill Buljo: The Movie (2007) Action|Comedy
70864 Botched (2007) Comedy|Crime|Horror|Thriller
=================================

Query:  Jumanji (1995) Adventure|Children|Fantasy
Similar movies: 
2 Jumanji (1995) Adventure|Children|Fantasy
1333 Birds, The (1963) Horror|Thriller
1240 Terminator, The (1984) Action|Sci-Fi|Thriller
1089 Reservoir Dogs (1992) Crime|Mystery|Thriller
593 Silence of the Lambs, The (1991) Crime|Horror|Thriller
1387 Jaws (1975) Action|Horror
112 Rumble in the Bronx (Hont faan kui) (1995) Action|Adventure|Comedy|Crime
1198 Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981) Action|Adventure
1036 Die Hard (1988) Action|Crime|Thriller
1246 Dead Poets Society (1989) Drama
=================================

Query:  Grumpier Old Men (1995) Comedy|Romance
Similar movies: 
110510 Série noire (1979) Film-Noir
32361 Come and Get It (1936) Drama
67999 Global Metal (2008) Documentary
69356 Zulu Dawn (1979) Action|Drama|Thriller|War
69381 Hitman, The (1991) Action|Crime|Thriller
69442 Pekka ja Pätkä neekereinä (1960) Comedy
69818 Franklyn (2008) Drama|Fantasy|Romance|Thriller
70344 Cold Souls (2009) Comedy|Drama
70495 Kill Buljo: The Movie (2007) Action|Comedy
70864 Botched (2007) Comedy|Crime|Horror|Thriller
=================================

Query:  Waiting to Exhale (1995) Comedy|Drama|Romance
Similar movies: 
110510 Série noire (1979) Film-Noir
32361 Come and Get It (1936) Drama
67999 Global Metal (2008) Documentary
69356 Zulu Dawn (1979) Action|Drama|Thriller|War
69381 Hitman, The (1991) Action|Crime|Thriller
69442 Pekka ja Pätkä neekereinä (1960) Comedy
69818 Franklyn (2008) Drama|Fantasy|Romance|Thriller
70344 Cold Souls (2009) Comedy|Drama
70495 Kill Buljo: The Movie (2007) Action|Comedy
70864 Botched (2007) Comedy|Crime|Horror|Thriller
=================================

Query:  Father of the Bride Part II (1995) Comedy
Similar movies: 
110510 Série noire (1979) Film-Noir
32361 Come and Get It (1936) Drama
67999 Global Metal (2008) Documentary
69356 Zulu Dawn (1979) Action|Drama|Thriller|War
69381 Hitman, The (1991) Action|Crime|Thriller
69442 Pekka ja Pätkä neekereinä (1960) Comedy
69818 Franklyn (2008) Drama|Fantasy|Romance|Thriller
70344 Cold Souls (2009) Comedy|Drama
70495 Kill Buljo: The Movie (2007) Action|Comedy
70864 Botched (2007) Comedy|Crime|Horror|Thriller
=================================

Query:  Heat (1995) Action|Crime|Thriller
Similar movies: 
110510 Série noire (1979) Film-Noir
32361 Come and Get It (1936) Drama
67999 Global Metal (2008) Documentary
69356 Zulu Dawn (1979) Action|Drama|Thriller|War
69381 Hitman, The (1991) Action|Crime|Thriller
69442 Pekka ja Pätkä neekereinä (1960) Comedy
69818 Franklyn (2008) Drama|Fantasy|Romance|Thriller
70344 Cold Souls (2009) Comedy|Drama
70495 Kill Buljo: The Movie (2007) Action|Comedy
70864 Botched (2007) Comedy|Crime|Horror|Thriller
=================================

Query:  Sabrina (1995) Comedy|Romance
Similar movies: 
110510 Série noire (1979) Film-Noir
32361 Come and Get It (1936) Drama
67999 Global Metal (2008) Documentary
69356 Zulu Dawn (1979) Action|Drama|Thriller|War
69381 Hitman, The (1991) Action|Crime|Thriller
69442 Pekka ja Pätkä neekereinä (1960) Comedy
69818 Franklyn (2008) Drama|Fantasy|Romance|Thriller
70344 Cold Souls (2009) Comedy|Drama
70495 Kill Buljo: The Movie (2007) Action|Comedy
70864 Botched (2007) Comedy|Crime|Horror|Thriller
=================================

Query:  Tom and Huck (1995) Adventure|Children
Similar movies: 
110510 Série noire (1979) Film-Noir
32361 Come and Get It (1936) Drama
67999 Global Metal (2008) Documentary
69356 Zulu Dawn (1979) Action|Drama|Thriller|War
69381 Hitman, The (1991) Action|Crime|Thriller
69442 Pekka ja Pätkä neekereinä (1960) Comedy
69818 Franklyn (2008) Drama|Fantasy|Romance|Thriller
70344 Cold Souls (2009) Comedy|Drama
70495 Kill Buljo: The Movie (2007) Action|Comedy
70864 Botched (2007) Comedy|Crime|Horror|Thriller
=================================

Query:  Sudden Death (1995) Action
Similar movies: 
110510 Série noire (1979) Film-Noir
32361 Come and Get It (1936) Drama
67999 Global Metal (2008) Documentary
69356 Zulu Dawn (1979) Action|Drama|Thriller|War
69381 Hitman, The (1991) Action|Crime|Thriller
69442 Pekka ja Pätkä neekereinä (1960) Comedy
69818 Franklyn (2008) Drama|Fantasy|Romance|Thriller
70344 Cold Souls (2009) Comedy|Drama
70495 Kill Buljo: The Movie (2007) Action|Comedy
70864 Botched (2007) Comedy|Crime|Horror|Thriller
=================================