# Copyright 2020 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

HugeCTR demo on MovieLens data

Overview

HugeCTR is a recommender-specific framework capable of distributed training across multiple GPUs and nodes for click-through-rate (CTR) estimation. HugeCTR is a component of NVIDIA Merlin (documentation | GitHub). Merlin is a framework that accelerates the entire pipeline, from data ingestion and training to deploying GPU-accelerated recommender systems.

Learning objectives

Train a deep learning recommendation model (DLRM) on the MovieLens 20M dataset.
Walk through data preprocessing, training the DLRM model with HugeCTR, and then using the movie embeddings to answer item-similarity queries.

Prerequisites

Docker containers

Start the notebook inside a running 22.07 or later NGC Docker container: nvcr.io/nvidia/merlin/merlin-hugectr:22.07. The HugeCTR Python interface is installed to the path /usr/local/hugectr/lib/ and the path is added to the environment variable PYTHONPATH. You can use the HugeCTR Python interface within the Docker container without any additional configuration.

Hardware

This notebook requires a Pascal, Volta, Turing, Ampere, or newer GPU, such as a P100, V100, T4, or A100. You can view the GPU information with the nvidia-smi command:

!nvidia-smi

Mon Aug 15 07:05:22 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   30C    P0    41W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   32C    P0    43W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
| N/A   33C    P0    43W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
| N/A   31C    P0    41W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
| N/A   32C    P0    42W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   32C    P0    42W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   35C    P0    42W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   31C    P0    42W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Data download and preprocessing

We first install the extra packages used for data preprocessing: torch and tqdm from pip, and zip from apt (which also pulls in unzip, as the log below shows).

print("Downloading and installing 'tqdm' package.")
!pip3 -q install torch tqdm

print("Downloading and installing 'unzip' command")
!apt-get update
!apt-get install -y zip

Downloading and installing 'tqdm' package.
Downloading and installing 'unzip' command
Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease [1581 B]
Get:2 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]      
Get:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Packages [663 kB]
Get:4 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]                
Get:5 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2087 kB]
Get:6 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1461 kB]
Get:7 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [888 kB]
Get:8 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]       
Get:9 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [27.5 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]     
Get:11 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
Get:13 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]
Get:15 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1176 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [30.2 kB]
Get:17 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2539 kB]
Get:18 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1569 kB]
Get:19 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [55.2 kB]
Get:20 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [27.5 kB]
Fetched 24.0 MB in 3s (7097 kB/s)                            
Reading package lists... Done
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  unzip
The following NEW packages will be installed:
  unzip zip
0 upgraded, 2 newly installed, 0 to remove and 51 not upgraded.
Need to get 336 kB of archives.
After this operation, 1231 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal/main amd64 unzip amd64 6.0-25ubuntu1 [169 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal/main amd64 zip amd64 3.0-11build1 [167 kB]
Fetched 336 kB in 1s (314 kB/s)
Selecting previously unselected package unzip.
(Reading database ... 43708 files and directories currently installed.)
Preparing to unpack .../unzip_6.0-25ubuntu1_amd64.deb ...
Unpacking unzip (6.0-25ubuntu1) ...
Selecting previously unselected package zip.
Preparing to unpack .../zip_3.0-11build1_amd64.deb ...
Unpacking zip (3.0-11build1) ...
Setting up unzip (6.0-25ubuntu1) ...
Setting up zip (3.0-11build1) ...
Processing triggers for mime-support (3.64ubuntu1) ...

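Next, we download and unzip the MovieLens 20M dataset. The wget line in the cell below is commented out; uncomment it if data/ml-20m.zip is not already present.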

print("Downloading and extracting 'Movie Lens 20M' dataset.")
#!wget -nc http://files.grouplens.org/datasets/movielens/ml-20m.zip -P data -q --show-progress
!unzip -n data/ml-20m.zip -d data
!ls ./data

Downloading and extracting 'Movie Lens 20M' dataset.
Archive:  data/ml-20m.zip
   creating: data/ml-20m/
  inflating: data/ml-20m/genome-scores.csv  
  inflating: data/ml-20m/genome-tags.csv  
  inflating: data/ml-20m/links.csv   
  inflating: data/ml-20m/movies.csv  
  inflating: data/ml-20m/ratings.csv  
  inflating: data/ml-20m/README.txt  
  inflating: data/ml-20m/tags.csv    
ml-20m	ml-20m.zip

MovieLens data preprocessing

import pandas as pd
import torch
import tqdm

MIN_RATINGS = 20
USER_COLUMN = 'userId'
ITEM_COLUMN = 'movieId'

Next, we read the data into a Pandas dataframe and encode userID and itemID with integers.

df = pd.read_csv('./data/ml-20m/ratings.csv')
print("Filtering out users with less than {} ratings".format(MIN_RATINGS))
grouped = df.groupby(USER_COLUMN)
df = grouped.filter(lambda x: len(x) >= MIN_RATINGS)

print("Mapping original user and item IDs to new sequential IDs")
df[USER_COLUMN], unique_users = pd.factorize(df[USER_COLUMN])
df[ITEM_COLUMN], unique_items = pd.factorize(df[ITEM_COLUMN])

nb_users = len(unique_users)
nb_items = len(unique_items)

print("Number of users: %d\nNumber of items: %d"%(len(unique_users), len(unique_items)))

Filtering out users with less than 20 ratings
Mapping original user and item IDs to new sequential IDs
Number of users: 138493
Number of items: 26744
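
To make the ID remapping concrete, here is a minimal pure-Python sketch of what the cell above relies on pd.factorize to do: each distinct value receives an integer code in order of first appearance. The helper name factorize and the toy IDs are illustrative only and are not part of the notebook.

```python
def factorize(values):
    """Map each distinct value to an integer code, in order of first appearance."""
    codes = []
    uniques = []
    index = {}  # original value -> assigned code
    for v in values:
        if v not in index:
            index[v] = len(uniques)
            uniques.append(v)
        codes.append(index[v])
    return codes, uniques


raw_user_ids = [42, 7, 42, 1000, 7]  # toy IDs, not MovieLens data
codes, uniques = factorize(raw_user_ids)
print(codes)    # [0, 1, 0, 2, 1]
print(uniques)  # [42, 7, 1000]
```

The list of unique values doubles as the reverse lookup table: uniques[code] recovers the original ID.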

Next, we split the data into a training set and a test set. For each user, the most recently rated movie is held out for the test set, and all earlier ratings are used for training.

# Need to sort before popping to get the last item
df.sort_values(by='timestamp', inplace=True)
    
# clean up data
del df['rating'], df['timestamp']
df = df.drop_duplicates() # assuming it keeps order

df.head()

	userId	movieId
4182421	28506	3258
18950979	131159	23
18950936	131159	3
18950930	131159	630
12341178	85251	1867

# HugeCTR expects user IDs and item IDs to occupy different key ranges, so we add nb_users to movieId to prevent overlap
df['movieId'] = df['movieId'] + nb_users

# now that the data is filtered and sorted by time, we can split out the test data
grouped_sorted = df.groupby(USER_COLUMN, group_keys=False)
test_data = grouped_sorted.tail(1).sort_values(by=USER_COLUMN)

# need to pop for each group
train_data = grouped_sorted.apply(lambda x: x.iloc[:-1])

train_data['target']=1
test_data['target']=1
train_data.head()

	movieId	target
20	138513	1
19	138512	1
86	138579	1
61	138554	1
23	138516	1
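
The leave-last-out split above can be illustrated on a toy example. The sketch below is a pure-Python stand-in for the pandas groupby logic, assuming interactions are given as (user, item, timestamp) tuples; the function name leave_last_out and the toy data are hypothetical.

```python
from collections import defaultdict


def leave_last_out(interactions):
    """Split (user, item, timestamp) tuples into train and test pairs.

    The most recent item of each user goes to the test set;
    all earlier items of that user go to the training set.
    """
    by_user = defaultdict(list)
    # Sort by timestamp so the last element of each user's list is the newest.
    for user, item, _ts in sorted(interactions, key=lambda r: r[2]):
        by_user[user].append(item)

    train, test = [], []
    for user, items in by_user.items():
        test.append((user, items[-1]))
        train.extend((user, item) for item in items[:-1])
    return train, test


toy = [(0, 10, 5), (0, 11, 9), (0, 12, 1), (1, 10, 3), (1, 13, 2)]
train, test = leave_last_out(toy)
print(sorted(test))   # [(0, 11), (1, 10)]
print(sorted(train))  # [(0, 10), (0, 12), (1, 13)]
```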

Because the MovieLens data contains only positive examples, we first define a utility class that generates negative samples: for each user, it draws random items and rejects any item the user has already rated.

class _TestNegSampler:
    def __init__(self, train_ratings, nb_users, nb_items, nb_neg):
        self.nb_neg = nb_neg
        self.nb_users = nb_users 
        self.nb_items = nb_items 

        # encode each (user, item) pair as a single integer so that membership tests against a hash set are fast
        ids = (train_ratings[:, 0] * self.nb_items) + train_ratings[:, 1]
        self.set = set(ids)

    def generate(self, batch_size=128*1024):
        users = torch.arange(0, self.nb_users).reshape([1, -1]).repeat([self.nb_neg, 1]).transpose(0, 1).reshape(-1)

        items = [-1] * len(users)

        random_items = torch.LongTensor(batch_size).random_(0, self.nb_items).tolist()
        print('Generating validation negatives...')
        for idx, u in enumerate(tqdm.tqdm(users.tolist())):
            if not random_items:
                random_items = torch.LongTensor(batch_size).random_(0, self.nb_items).tolist()
            j = random_items.pop()
            while u * self.nb_items + j in self.set:
                if not random_items:
                    random_items = torch.LongTensor(batch_size).random_(0, self.nb_items).tolist()
                j = random_items.pop()

            items[idx] = j
        items = torch.LongTensor(items)
        return items
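
The core idea of the sampler, encoding each (user, item) pair as the single integer user * nb_items + item and rejecting random draws that hit the set of known positives, can be sketched without torch. This is a simplified illustration rather than the class above: the function name sample_negatives and the toy inputs are hypothetical, and it assumes every user has at least one unrated item (otherwise the loop would never finish).

```python
import random


def sample_negatives(positives, nb_users, nb_items, nb_neg, seed=0):
    """Draw nb_neg random items per user that do not appear in positives.

    positives: iterable of (user, item) pairs with item in [0, nb_items).
    Returns a dict mapping each user to a list of negative item indices.
    """
    rng = random.Random(seed)
    # One integer per (user, item) pair; unique because item < nb_items.
    seen = {u * nb_items + i for u, i in positives}

    negatives = {}
    for u in range(nb_users):
        picked = []
        while len(picked) < nb_neg:
            j = rng.randrange(nb_items)
            if u * nb_items + j not in seen:  # reject items the user rated
                picked.append(j)
        negatives[u] = picked
    return negatives


toy_positives = [(0, 0), (0, 1), (1, 2)]
negs = sample_negatives(toy_positives, nb_users=2, nb_items=4, nb_neg=3)
print(negs)
```

As in the class above, the same negative item may be drawn more than once for a user; only collisions with positives are rejected.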

Next, we generate the negative samples for training (500 per user) and for testing (100 per user).

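# The sampler draws raw item indices in [0, nb_items), so build its lookup set from
# ratings with the nb_users offset on movieId removed.
ratings = df.values.copy()
ratings[:, 1] -= nb_users

sampler = _TestNegSampler(ratings, nb_users, nb_items, 500)  # using 500 negative samples
train_negs = sampler.generate()
train_negs = train_negs.reshape(-1, 500)

sampler = _TestNegSampler(ratings, nb_users, nb_items, 100)  # using 100 negative samples
test_negs = sampler.generate()
test_negs = test_negs.reshape(-1, 100)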

Generating validation negatives...

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 69246500/69246500 [00:57<00:00, 1197676.54it/s]

Generating validation negatives...

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13849300/13849300 [00:11<00:00, 1181648.22it/s]

import numpy as np

# generating negative samples for training
# columns: [user ID, item ID, label]; the label column stays 0 for negatives
train_data_neg = np.zeros((train_negs.shape[0]*train_negs.shape[1],3), dtype=int)
idx = 0
for i in tqdm.tqdm(range(train_negs.shape[0])):
    for j in range(train_negs.shape[1]):
        train_data_neg[idx, 0] = i # user ID
        # negative item ID, shifted by nb_users so it shares the key range of the positive movieId values
        train_data_neg[idx, 1] = train_negs[i, j] + nb_users
        idx += 1
    
# generating negative samples for testing
test_data_neg = np.zeros((test_negs.shape[0]*test_negs.shape[1],3), dtype=int)
idx = 0
for i in tqdm.tqdm(range(test_negs.shape[0])):
    for j in range(test_negs.shape[1]):
        test_data_neg[idx, 0] = i # user ID
        test_data_neg[idx, 1] = test_negs[i, j] + nb_users # negative item ID, same offset as above
        idx += 1

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 138493/138493 [06:23<00:00, 360.91it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 138493/138493 [01:16<00:00, 1804.65it/s]
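
The nested Python loops above take several minutes on the full dataset. As a possible alternative, the same rows can be built with NumPy array operations. This is a sketch under assumptions: the function name build_negative_rows and its item_offset parameter are hypothetical, and the input is taken to be an integer array of shape (nb_users, nb_neg) holding raw item indices.

```python
import numpy as np


def build_negative_rows(negs, item_offset=0):
    """Turn a (nb_users, nb_neg) array of item indices into rows [user, item, 0].

    item_offset is added to every item index, e.g. to move item keys
    out of the user key range. The last column is the label (0 = negative).
    """
    negs = np.asarray(negs)
    nb_users, nb_neg = negs.shape
    rows = np.zeros((nb_users * nb_neg, 3), dtype=np.int64)
    # User i is repeated nb_neg times, matching the row-major order of negs.
    rows[:, 0] = np.repeat(np.arange(nb_users), nb_neg)
    rows[:, 1] = negs.reshape(-1) + item_offset
    return rows


toy_negs = np.array([[5, 6], [7, 8]])  # 2 users, 2 negatives each
rows = build_negative_rows(toy_negs)
shifted = build_negative_rows(toy_negs, item_offset=100)
print(rows)
print(shifted)
```

The row order matches what the loops produce: all negatives of user 0 first, then user 1, and so on.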

train_data_np= np.concatenate([train_data_neg, train_data.values])
np.random.shuffle(train_data_np)

test_data_np= np.concatenate([test_data_neg, test_data.values])
np.random.shuffle(test_data_np)

Write HugeCTR data files

After preprocessing, we write the data to disk in the HugeCTR Norm dataset format.

from ctypes import c_longlong as ll
from ctypes import c_uint
from ctypes import c_float
from ctypes import c_int

def write_hugeCTR_data(huge_ctr_data, filename='huge_ctr_data.dat'):
    print("Writing %d samples"%huge_ctr_data.shape[0])
    with open(filename, 'wb') as f:
        #write header
        f.write(ll(0)) # 0: no error check; 1: check_sum
        f.write(ll(huge_ctr_data.shape[0])) # the number of samples in this data file
        f.write(ll(1)) # dimension of label
        f.write(ll(1)) # dimension of dense feature
        f.write(ll(2)) # long long slot_num
        for _ in range(3): f.write(ll(0)) # reserved for future use

        for i in tqdm.tqdm(range(huge_ctr_data.shape[0])):
            f.write(c_float(huge_ctr_data[i,2])) # float label[label_dim];
            f.write(c_float(0)) # dummy dense feature
            f.write(c_int(1)) # slot 1 nnz: user ID
            f.write(c_uint(huge_ctr_data[i,0]))
            f.write(c_int(1)) # slot 2 nnz: item ID
            f.write(c_uint(huge_ctr_data[i,1]))
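
To check the byte layout that write_hugeCTR_data produces, the following sketch writes a tiny file with the same field order using the standard struct module and reads it back. It mirrors only the fixed case used in this notebook (one label, one dense feature, two slots with one key each, no checksum) and assumes native byte order; the helper names write_norm and read_norm are hypothetical and are not part of HugeCTR.

```python
import os
import struct
import tempfile

HEADER = struct.Struct("=8q")      # 8 x long long: error_check, num_samples, label_dim, dense_dim, slot_num, 3 reserved
SAMPLE = struct.Struct("=ffiIiI")  # label, dense, nnz(slot 1), user key, nnz(slot 2), item key


def write_norm(path, rows):
    """Write (user, item, label) rows in the same field order as the notebook's writer."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(0, len(rows), 1, 1, 2, 0, 0, 0))
        for user, item, label in rows:
            f.write(SAMPLE.pack(float(label), 0.0, 1, user, 1, item))


def read_norm(path):
    """Read the header and all samples back as (user, item, label) tuples."""
    with open(path, "rb") as f:
        header = HEADER.unpack(f.read(HEADER.size))
        rows = []
        for _ in range(header[1]):  # header[1] is the number of samples
            label, _dense, _nnz1, user, _nnz2, item = SAMPLE.unpack(f.read(SAMPLE.size))
            rows.append((user, item, int(label)))
    return header, rows


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.dat")
    write_norm(path, [(3, 105, 1), (4, 101, 0)])
    header, rows = read_norm(path)

print(HEADER.size, SAMPLE.size)  # 64 24
print(header)                    # (0, 2, 1, 1, 2, 0, 0, 0)
print(rows)                      # [(3, 105, 1), (4, 101, 0)]
```

With this layout each file is 64 bytes of header plus 24 bytes per sample, which gives a quick sanity check on the sizes of the generated .dat files.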

Train data

def generate_filelist(filelist_name, num_files, filename_prefix):
    with open(filelist_name, 'wt') as f:
        f.write('{0}\n'.format(num_files))
        for i in range(num_files):
            f.write('{0}_{1}.dat\n'.format(filename_prefix, i))

!rm -rf ./data/hugeCTR
!mkdir ./data/hugeCTR

for i, data_arr in enumerate(np.array_split(train_data_np,10)):
    write_hugeCTR_data(data_arr, filename='./data/hugeCTR/train_huge_ctr_data_%d.dat'%i)

generate_filelist('./data/hugeCTR/train_filelist.txt', 10, './data/hugeCTR/train_huge_ctr_data')

Writing 8910827 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 313062.86it/s]

Writing 8910827 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 314545.08it/s]

Writing 8910827 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 313687.26it/s]

Writing 8910827 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 316105.12it/s]

Writing 8910827 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 313179.63it/s]

Writing 8910827 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 314053.42it/s]

Writing 8910827 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 312377.54it/s]

Writing 8910827 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 313288.65it/s]

Writing 8910827 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 313456.87it/s]

Writing 8910827 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 312600.20it/s]
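
The file list written by generate_filelist above is a plain-text file whose first line is the number of data files, followed by one file path per line. The sketch below reproduces that layout in a temporary directory and parses it back; the helper read_filelist and the toy prefix are hypothetical and only illustrate the format.

```python
import os
import tempfile


def read_filelist(path):
    """Parse a file list: first line is the file count, remaining lines are paths."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    count = int(lines[0])
    files = lines[1:]
    if count != len(files):
        raise ValueError("expected %d files, found %d" % (count, len(files)))
    return files


with tempfile.TemporaryDirectory() as tmp:
    filelist_path = os.path.join(tmp, "toy_filelist.txt")
    # Same layout as generate_filelist: count, then '<prefix>_<i>.dat' per line.
    with open(filelist_path, "wt") as f:
        f.write("3\n")
        for i in range(3):
            f.write("toy_data_{0}.dat\n".format(i))
    files = read_filelist(filelist_path)

print(files)  # ['toy_data_0.dat', 'toy_data_1.dat', 'toy_data_2.dat']
```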

Test data

for i, data_arr in enumerate(np.array_split(test_data_np,10)):
    write_hugeCTR_data(data_arr, filename='./data/hugeCTR/test_huge_ctr_data_%d.dat'%i)
    
generate_filelist('./data/hugeCTR/test_filelist.txt', 10, './data/hugeCTR/test_huge_ctr_data')

Writing 1398780 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398780/1398780 [00:04<00:00, 314708.42it/s]

Writing 1398780 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398780/1398780 [00:04<00:00, 313743.84it/s]

Writing 1398780 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398780/1398780 [00:04<00:00, 316072.53it/s]

Writing 1398779 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:04<00:00, 315541.63it/s]

Writing 1398779 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:04<00:00, 315705.03it/s]

Writing 1398779 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:04<00:00, 315520.94it/s]

Writing 1398779 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:04<00:00, 313371.66it/s]

Writing 1398779 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:04<00:00, 314972.66it/s]

Writing 1398779 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:04<00:00, 314166.37it/s]

Writing 1398779 samples

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:04<00:00, 315031.06it/s]

HugeCTR DLRM training

In this section, we train a DLRM network on the MovieLens data augmented with negative samples. First, we write the training Python script.

%%writefile hugectr_dlrm_movielens.py
import hugectr
from mpi4py import MPI
solver = hugectr.CreateSolver(max_eval_batches = 1000,
                              batchsize_eval = 65536,
                              batchsize = 65536,
                              lr = 0.1,
                              warmup_steps = 1000,
                              decay_start = 10000,
                              decay_steps = 40000,
                              decay_power = 2.0,
                              end_lr = 1e-5,
                              vvgpu = [[0, 1]],
                              repeat_dataset = True,
                              use_mixed_precision = True,
                              scaler = 1024)
reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Norm,
                                  source = ["./data/hugeCTR/train_filelist.txt"],
                                  eval_source = "./data/hugeCTR/test_filelist.txt",
                                  num_workers = 2,
                                  check_type = hugectr.Check_t.Non)
optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.SGD,
                                    update_type = hugectr.Update_t.Local,
                                    atomic_update = True)
model = hugectr.Model(solver, reader, optimizer)
model.add(hugectr.Input(label_dim = 1, label_name = "label",
                        dense_dim = 1, dense_name = "dense",
                        data_reader_sparse_param_array = 
                        [hugectr.DataReaderSparseParam("data1", 1, True, 2)]))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.LocalizedSlotSparseEmbeddingHash, 
                            workspace_size_per_gpu_in_mb = 150,
                            embedding_vec_size = 64,
                            combiner = "sum",
                            sparse_embedding_name = "sparse_embedding1",
                            bottom_name = "data1",
                            optimizer = optimizer))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
                            bottom_names = ["dense"],
                            top_names = ["fc1"],
                            num_output=64))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
                            bottom_names = ["fc1"],
                            top_names = ["fc2"],
                            num_output=128))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
                            bottom_names = ["fc2"],
                            top_names = ["fc3"],
                            num_output=64))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Interaction,
                            bottom_names = ["fc3","sparse_embedding1"],
                            top_names = ["interaction1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
                            bottom_names = ["interaction1"],
                            top_names = ["fc4"],
                            num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
                            bottom_names = ["fc4"],
                            top_names = ["fc5"],
                            num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
                            bottom_names = ["fc5"],
                            top_names = ["fc6"],
                            num_output=512))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
                            bottom_names = ["fc6"],
                            top_names = ["fc7"],
                            num_output=256))                                                  
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["fc7"],
                            top_names = ["fc8"],
                            num_output=1))                                                                                           
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
                            bottom_names = ["fc8", "label"],
                            top_names = ["loss"]))
model.compile()
model.summary()
model.fit(max_iter = 50000, display = 1000, eval_interval = 3000, snapshot = 49000, snapshot_prefix = "./hugeCTR_saved_model_DLRM/")

Overwriting hugectr_dlrm_movielens.py
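
The solver arguments lr, warmup_steps, decay_start, decay_steps, decay_power, and end_lr in the script above describe a learning-rate schedule with a warm-up phase followed by polynomial decay. The sketch below shows one plausible reading of those parameters: linear warm-up, a constant plateau, then polynomial decay down to end_lr. It is an assumption for illustration, not HugeCTR's implementation, and the exact formula HugeCTR uses may differ in details such as how the warm-up step is counted or how end_lr enters the decay term. The function name learning_rate is hypothetical.

```python
def learning_rate(step, base_lr=0.1, warmup_steps=1000, decay_start=10000,
                  decay_steps=40000, decay_power=2.0, end_lr=1e-5):
    """Illustrative warm-up + polynomial-decay schedule (assumed form)."""
    if step < warmup_steps:
        # Linear warm-up from 0 to base_lr.
        return base_lr * step / warmup_steps
    if step <= decay_start:
        # Constant plateau between warm-up and the start of decay.
        return base_lr
    decay_end = decay_start + decay_steps
    if step >= decay_end:
        return end_lr
    # Fraction of the decay window still remaining, in (0, 1).
    remaining = (decay_end - step) / decay_steps
    return end_lr + (base_lr - end_lr) * remaining ** decay_power


for step in (500, 5000, 30000, 50000):
    print(step, learning_rate(step))
```

Under this reading, with max_iter = 50000 the decay window (steps 10000 to 50000) ends exactly at the last training iteration.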

!rm -rf ./hugeCTR_saved_model_DLRM/
!mkdir ./hugeCTR_saved_model_DLRM/

!python3 hugectr_dlrm_movielens.py

HugeCTR Version: 3.8
====================================================Model Init=====================================================
[HCTR][08:52:43.612][WARNING][RK0][main]: The model name is not specified when creating the solver.
[HCTR][08:52:43.612][INFO][RK0][main]: Global seed is 3027443801
[HCTR][08:52:43.615][INFO][RK0][main]: Device to NUMA mapping:
  GPU 0 ->  node 0
  GPU 1 ->  node 0
[HCTR][08:52:46.789][INFO][RK0][main]: Start all2all warmup
[HCTR][08:52:46.796][INFO][RK0][main]: End all2all warmup
[HCTR][08:52:46.798][INFO][RK0][main]: Using All-reduce algorithm: NCCL
[HCTR][08:52:46.799][INFO][RK0][main]: Device 0: Tesla V100-SXM2-32GB
[HCTR][08:52:46.800][INFO][RK0][main]: Device 1: Tesla V100-SXM2-32GB
[HCTR][08:52:46.800][INFO][RK0][main]: num of DataReader workers for train: 2
[HCTR][08:52:46.800][INFO][RK0][main]: num of DataReader workers for eval: 2
[HCTR][08:52:46.809][INFO][RK0][main]: max_vocabulary_size_per_gpu_=614400
[HCTR][08:52:46.812][INFO][RK0][main]: Graph analysis to resolve tensor dependency
===================================================Model Compile===================================================
[HCTR][08:53:14.439][INFO][RK0][main]: gpu0 start to init embedding
[HCTR][08:53:14.439][INFO][RK0][tid #140704957323008]: gpu1 start to init embedding
[HCTR][08:53:14.440][INFO][RK0][main]: gpu0 init embedding done
[HCTR][08:53:14.440][INFO][RK0][tid #140704957323008]: gpu1 init embedding done
[HCTR][08:53:14.450][INFO][RK0][main]: Starting AUC NCCL warm-up
[HCTR][08:53:14.488][INFO][RK0][main]: Warm-up done
===================================================Model Summary===================================================
[HCTR][08:53:14.489][INFO][RK0][main]: label                                   Dense                         Sparse                        
label                                   dense                          data1                         
(None, 1)                               (None, 1)                               
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
Layer Type                              Input Name                    Output Name                   Output Shape                  
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
LocalizedSlotSparseEmbeddingHash        data1                         sparse_embedding1             (None, 2, 64)                 
------------------------------------------------------------------------------------------------------------------
FusedInnerProduct                       dense                         fc1                           (None, 64)                    
------------------------------------------------------------------------------------------------------------------
FusedInnerProduct                       fc1                           fc2                           (None, 128)                   
------------------------------------------------------------------------------------------------------------------
FusedInnerProduct                       fc2                           fc3                           (None, 64)                    
------------------------------------------------------------------------------------------------------------------
Interaction                             fc3                           interaction1                  (None, 68)                    
                                        sparse_embedding1                                                                         
------------------------------------------------------------------------------------------------------------------
FusedInnerProduct                       interaction1                  fc4                           (None, 1024)                  
------------------------------------------------------------------------------------------------------------------
FusedInnerProduct                       fc4                           fc5                           (None, 1024)                  
------------------------------------------------------------------------------------------------------------------
FusedInnerProduct                       fc5                           fc6                           (None, 512)                   
------------------------------------------------------------------------------------------------------------------
FusedInnerProduct                       fc6                           fc7                           (None, 256)                   
------------------------------------------------------------------------------------------------------------------
InnerProduct                            fc7                           fc8                           (None, 1)                     
------------------------------------------------------------------------------------------------------------------
BinaryCrossEntropyLoss                  fc8                           loss                                                        
                                        label                                                                                     
------------------------------------------------------------------------------------------------------------------
=====================================================Model Fit=====================================================
[HCTR][08:53:14.489][INFO][RK0][main]: Use non-epoch mode with number of iterations: 50000
[HCTR][08:53:14.489][INFO][RK0][main]: Training batchsize: 65536, evaluation batchsize: 65536
[HCTR][08:53:14.489][INFO][RK0][main]: Evaluation interval: 3000, snapshot interval: 49000
[HCTR][08:53:14.489][INFO][RK0][main]: Dense network trainable: True
[HCTR][08:53:14.489][INFO][RK0][main]: Sparse embedding sparse_embedding1 trainable: True
[HCTR][08:53:14.489][INFO][RK0][main]: Use mixed precision: True, scaler: 1024.000000, use cuda graph: True
[HCTR][08:53:14.489][INFO][RK0][main]: lr: 0.100000, warmup_steps: 1000, end_lr: 0.000010
[HCTR][08:53:14.489][INFO][RK0][main]: decay_start: 10000, decay_steps: 40000, decay_power: 2.000000
[HCTR][08:53:14.489][INFO][RK0][main]: Training source file: ./data/hugeCTR/train_filelist.txt
[HCTR][08:53:14.489][INFO][RK0][main]: Evaluation source file: ./data/hugeCTR/test_filelist.txt
[HCTR][08:53:23.393][INFO][RK0][main]: Iter: 1000 Time(1000 iters): 8.8962s Loss: 0.528513 lr:0.1
[HCTR][08:53:32.267][INFO][RK0][main]: Iter: 2000 Time(1000 iters): 8.86544s Loss: 0.528953 lr:0.1
[HCTR][08:53:41.173][INFO][RK0][main]: Iter: 3000 Time(1000 iters): 8.89732s Loss: 0.52741 lr:0.1
[HCTR][08:53:46.649][INFO][RK0][main]: Evaluation, AUC: 0.615216
[HCTR][08:53:46.649][INFO][RK0][main]: Eval Time for 1000 iters: 5.47524s
[HCTR][08:53:55.561][INFO][RK0][main]: Iter: 4000 Time(1000 iters): 14.38s Loss: 0.173515 lr:0.1
[HCTR][08:54:04.467][INFO][RK0][main]: Iter: 5000 Time(1000 iters): 8.89775s Loss: 0.0643398 lr:0.1
[HCTR][08:54:13.378][INFO][RK0][main]: Iter: 6000 Time(1000 iters): 8.90232s Loss: 0.055425 lr:0.1
[HCTR][08:54:18.613][INFO][RK0][main]: Evaluation, AUC: 0.983938
[HCTR][08:54:18.613][INFO][RK0][main]: Eval Time for 1000 iters: 5.23426s
[HCTR][08:54:27.527][INFO][RK0][main]: Iter: 7000 Time(1000 iters): 14.14s Loss: 0.0444573 lr:0.1
[HCTR][08:54:36.392][INFO][RK0][main]: Iter: 8000 Time(1000 iters): 8.85758s Loss: 0.354917 lr:0.1
[HCTR][08:54:45.263][INFO][RK0][main]: Iter: 9000 Time(1000 iters): 8.86224s Loss: 0.0637668 lr:0.1
[HCTR][08:54:50.450][INFO][RK0][main]: Evaluation, AUC: 0.966228
[HCTR][08:54:50.450][INFO][RK0][main]: Eval Time for 1000 iters: 5.18632s
[HCTR][08:54:59.305][INFO][RK0][main]: Iter: 10000 Time(1000 iters): 14.0335s Loss: 0.0474014 lr:0.099995
[HCTR][08:55:08.171][INFO][RK0][main]: Iter: 11000 Time(1000 iters): 8.8579s Loss: 0.0336978 lr:0.0950576
[HCTR][08:55:16.985][INFO][RK0][main]: Iter: 12000 Time(1000 iters): 8.80581s Loss: 0.0208526 lr:0.0902453
[HCTR][08:55:22.100][INFO][RK0][main]: Evaluation, AUC: 0.990911
[HCTR][08:55:22.100][INFO][RK0][main]: Eval Time for 1000 iters: 5.11441s
[HCTR][08:55:30.936][INFO][RK0][main]: Iter: 13000 Time(1000 iters): 13.9421s Loss: 0.0173013 lr:0.0855579
[HCTR][08:55:39.769][INFO][RK0][main]: Iter: 14000 Time(1000 iters): 8.82507s Loss: 0.0128202 lr:0.0809955
[HCTR][08:55:48.619][INFO][RK0][main]: Iter: 15000 Time(1000 iters): 8.84112s Loss: 0.0100981 lr:0.0765581
[HCTR][08:55:53.942][INFO][RK0][main]: Evaluation, AUC: 0.996372
[HCTR][08:55:53.942][INFO][RK0][main]: Eval Time for 1000 iters: 5.32278s
[HCTR][08:56:02.785][INFO][RK0][main]: Iter: 16000 Time(1000 iters): 14.1583s Loss: 0.00852386 lr:0.0722457
[HCTR][08:56:11.624][INFO][RK0][main]: Iter: 17000 Time(1000 iters): 8.82997s Loss: 0.00812518 lr:0.0680584
[HCTR][08:56:20.473][INFO][RK0][main]: Iter: 18000 Time(1000 iters): 8.84099s Loss: 0.00878625 lr:0.063996
[HCTR][08:56:25.671][INFO][RK0][main]: Evaluation, AUC: 0.997613
[HCTR][08:56:25.671][INFO][RK0][main]: Eval Time for 1000 iters: 5.19794s
[HCTR][08:56:34.533][INFO][RK0][main]: Iter: 19000 Time(1000 iters): 14.0519s Loss: 0.00652799 lr:0.0600586
[HCTR][08:56:43.383][INFO][RK0][main]: Iter: 20000 Time(1000 iters): 8.84127s Loss: 0.00636787 lr:0.0562463
[HCTR][08:56:52.245][INFO][RK0][main]: Iter: 21000 Time(1000 iters): 8.85349s Loss: 0.00630231 lr:0.0525589
[HCTR][08:56:57.272][INFO][RK0][main]: Evaluation, AUC: 0.998177
[HCTR][08:56:57.272][INFO][RK0][main]: Eval Time for 1000 iters: 5.02735s
[HCTR][08:57:06.114][INFO][RK0][main]: Iter: 22000 Time(1000 iters): 13.8611s Loss: 0.00599465 lr:0.0489965
[HCTR][08:57:14.971][INFO][RK0][main]: Iter: 23000 Time(1000 iters): 8.84889s Loss: 0.00456903 lr:0.0455591
[HCTR][08:57:23.829][INFO][RK0][main]: Iter: 24000 Time(1000 iters): 8.84915s Loss: 0.0048366 lr:0.0422468
[HCTR][08:57:28.904][INFO][RK0][main]: Evaluation, AUC: 0.998516
[HCTR][08:57:28.904][INFO][RK0][main]: Eval Time for 1000 iters: 5.07521s
[HCTR][08:57:37.757][INFO][RK0][main]: Iter: 25000 Time(1000 iters): 13.9202s Loss: 0.00472847 lr:0.0390594
[HCTR][08:57:46.597][INFO][RK0][main]: Iter: 26000 Time(1000 iters): 8.8316s Loss: 0.00477947 lr:0.035997
[HCTR][08:57:55.448][INFO][RK0][main]: Iter: 27000 Time(1000 iters): 8.84248s Loss: 0.00496196 lr:0.0330596
[HCTR][08:58:00.628][INFO][RK0][main]: Evaluation, AUC: 0.998732
[HCTR][08:58:00.628][INFO][RK0][main]: Eval Time for 1000 iters: 5.17941s
[HCTR][08:58:09.475][INFO][RK0][main]: Iter: 28000 Time(1000 iters): 14.0191s Loss: 0.00393799 lr:0.0302472
[HCTR][08:58:18.304][INFO][RK0][main]: Iter: 29000 Time(1000 iters): 8.82012s Loss: 0.00410887 lr:0.0275599
[HCTR][08:58:27.122][INFO][RK0][main]: Iter: 30000 Time(1000 iters): 8.80965s Loss: 0.00343625 lr:0.0249975
[HCTR][08:58:32.205][INFO][RK0][main]: Evaluation, AUC: 0.998878
[HCTR][08:58:32.205][INFO][RK0][main]: Eval Time for 1000 iters: 5.08249s
[HCTR][08:58:41.057][INFO][RK0][main]: Iter: 31000 Time(1000 iters): 13.9267s Loss: 0.00338647 lr:0.0225601
[HCTR][08:58:49.898][INFO][RK0][main]: Iter: 32000 Time(1000 iters): 8.83291s Loss: 0.00431207 lr:0.0202478
[HCTR][08:58:58.759][INFO][RK0][main]: Iter: 33000 Time(1000 iters): 8.85196s Loss: 0.00314963 lr:0.0180604
[HCTR][08:59:04.056][INFO][RK0][main]: Evaluation, AUC: 0.998967
[HCTR][08:59:04.056][INFO][RK0][main]: Eval Time for 1000 iters: 5.29728s
[HCTR][08:59:12.903][INFO][RK0][main]: Iter: 34000 Time(1000 iters): 14.1363s Loss: 0.00491561 lr:0.015998
[HCTR][08:59:21.769][INFO][RK0][main]: Iter: 35000 Time(1000 iters): 8.85741s Loss: 0.00385364 lr:0.0140606
[HCTR][08:59:30.614][INFO][RK0][main]: Iter: 36000 Time(1000 iters): 8.8366s Loss: 0.00431366 lr:0.0122482
[HCTR][08:59:35.777][INFO][RK0][main]: Evaluation, AUC: 0.999021
[HCTR][08:59:35.777][INFO][RK0][main]: Eval Time for 1000 iters: 5.16256s
[HCTR][08:59:44.585][INFO][RK0][main]: Iter: 37000 Time(1000 iters): 13.9628s Loss: 0.00293767 lr:0.0105609
[HCTR][08:59:53.412][INFO][RK0][main]: Iter: 38000 Time(1000 iters): 8.81858s Loss: 0.00274502 lr:0.0089985
[HCTR][09:00:02.255][INFO][RK0][main]: Iter: 39000 Time(1000 iters): 8.83457s Loss: 0.00254011 lr:0.00756112
[HCTR][09:00:07.380][INFO][RK0][main]: Evaluation, AUC: 0.999059
[HCTR][09:00:07.380][INFO][RK0][main]: Eval Time for 1000 iters: 5.1243s
[HCTR][09:00:16.245][INFO][RK0][main]: Iter: 40000 Time(1000 iters): 13.982s Loss: 0.00315883 lr:0.00624875
[HCTR][09:00:25.106][INFO][RK0][main]: Iter: 41000 Time(1000 iters): 8.85296s Loss: 0.0038635 lr:0.00506138
[HCTR][09:00:33.969][INFO][RK0][main]: Iter: 42000 Time(1000 iters): 8.85403s Loss: 0.0034295 lr:0.003999
[HCTR][09:00:39.221][INFO][RK0][main]: Evaluation, AUC: 0.999073
[HCTR][09:00:39.221][INFO][RK0][main]: Eval Time for 1000 iters: 5.2517s
[HCTR][09:00:48.067][INFO][RK0][main]: Iter: 43000 Time(1000 iters): 14.0899s Loss: 0.00349809 lr:0.00306162
[HCTR][09:00:56.913][INFO][RK0][main]: Iter: 44000 Time(1000 iters): 8.83807s Loss: 0.0017837 lr:0.00224925
[HCTR][09:01:05.775][INFO][RK0][main]: Iter: 45000 Time(1000 iters): 8.85327s Loss: 0.00304943 lr:0.00156188
[HCTR][09:01:10.893][INFO][RK0][main]: Evaluation, AUC: 0.999083
[HCTR][09:01:10.893][INFO][RK0][main]: Eval Time for 1000 iters: 5.11746s
[HCTR][09:01:19.722][INFO][RK0][main]: Iter: 46000 Time(1000 iters): 13.9386s Loss: 0.00260634 lr:0.0009995
[HCTR][09:01:28.590][INFO][RK0][main]: Iter: 47000 Time(1000 iters): 8.85932s Loss: 0.00273577 lr:0.000562125
[HCTR][09:01:37.437][INFO][RK0][main]: Iter: 48000 Time(1000 iters): 8.8387s Loss: 0.00348975 lr:0.00024975
[HCTR][09:01:42.659][INFO][RK0][main]: Evaluation, AUC: 0.999091
[HCTR][09:01:42.659][INFO][RK0][main]: Eval Time for 1000 iters: 5.22141s
[HCTR][09:01:51.535][INFO][RK0][main]: Iter: 49000 Time(1000 iters): 14.0898s Loss: 0.00397105 lr:6.23751e-05
[HCTR][09:01:51.576][INFO][RK0][main]: Rank0: Dump hash table from GPU0
[HCTR][09:01:51.576][INFO][RK0][main]: Rank0: Dump hash table from GPU1
[HCTR][09:01:51.583][INFO][RK0][main]: Rank0: Write hash table <key,value> pairs to file
[HCTR][09:01:51.662][INFO][RK0][main]: Done
[HCTR][09:01:51.671][INFO][RK0][main]: Dumping sparse weights to files, successful
[HCTR][09:01:51.671][INFO][RK0][main]: Dumping sparse optimzer states to files, successful
[HCTR][09:01:51.680][INFO][RK0][main]: Dumping dense weights to file, successful
[HCTR][09:01:51.681][INFO][RK0][main]: Dumping dense optimizer states to file, successful
[HCTR][09:02:00.584][INFO][RK0][main]: Finish 50000 iterations with batchsize: 65536 in 526.10s.
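
The `lr` values in the log can be checked against the schedule parameters printed at the start of the Model Fit section (`lr`, `warmup_steps`, `decay_start`, `decay_steps`, `decay_power`, `end_lr`). The sketch below is a reconstruction under stated assumptions, not HugeCTR's own code: the function name `learning_rate` is hypothetical, the warm-up is assumed to be linear, the floor at `end_lr` is assumed, and the one-step offset between the logged iteration and the schedule step is inferred only because it reproduces the logged numbers.

```python
def learning_rate(step, base_lr=0.1, warmup_steps=1000, decay_start=10000,
                  decay_steps=40000, decay_power=2.0, end_lr=1e-5):
    """Learning rate for a 1-based training step (assumed schedule, see text)."""
    if step <= warmup_steps:
        # Assumption: linear warm-up from 0 to base_lr
        return base_lr * step / warmup_steps
    if step <= decay_start:
        # Constant plateau between warm-up and decay
        return base_lr
    if step <= decay_start + decay_steps:
        # Polynomial decay, floored at end_lr (the floor is an assumption)
        remaining = (decay_start + decay_steps - step) / decay_steps
        return max(base_lr * remaining ** decay_power, end_lr)
    return end_lr

# Assumption: the log line "Iter: N" reports the rate for step N + 1.
for logged_iter in (10000, 11000, 12000, 48000, 49000):
    print(logged_iter, f"{learning_rate(logged_iter + 1):.6g}")
```

With these assumptions the printed rates agree with the `lr` column of the log at the same iterations, for example 0.0950576 at iteration 11000.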

Answer item similarity with DLRM embedding

In this section, we demonstrate how the output of HugeCTR training can be used to carry out simple inference tasks. Specifically, we show that the movie embeddings can answer item-to-item similarity queries. Such a lookup can serve as an efficient candidate generator that produces a small set of candidates before a deep learning model re-ranks them.

First, we read the embedding tables and extract the movie embeddings.

import struct 
import pickle
import numpy as np

key_type = 'I64'
key_type_map = {"I32": ["I", 4], "I64": ["q", 8]}

embedding_vec_size = 64

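HUGE_CTR_VERSION = 2.21 # set HugeCTR version here, 2.2 for v2.2, 2.21 for v2.21

# Size in bytes of one embedding record. It is computed for reference only:
# the reading loop below takes keys and vectors from separate files and never uses it.
if HUGE_CTR_VERSION <= 2.2:
    each_key_size = key_type_map[key_type][1] + key_type_map[key_type][1] + 4 * embedding_vec_size
else:
    each_key_size = key_type_map[key_type][1] + 8 + 4 * embedding_vec_size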

embedding_table = {}
        
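sparse_model_dir = "./hugeCTR_saved_model_DLRM/0_sparse_49000.model"
key_format, key_size = key_type_map[key_type]  # struct format character and byte width of one key
vec_format = str(embedding_vec_size) + "f"      # one embedding vector of 4-byte floats
vec_size = 4 * embedding_vec_size               # bytes per embedding vector

# Keys and vectors are read in lockstep: record i of "key" pairs with record i of "emb_vector".
with open(sparse_model_dir + "/key", 'rb') as key_file, \
     open(sparse_model_dir + "/emb_vector", 'rb') as vec_file:
    while True:
        key_buffer = key_file.read(key_size)
        vec_buffer = vec_file.read(vec_size)
        # Stop at the end of either file, or at a truncated trailing record
        if len(key_buffer) < key_size or len(vec_buffer) < vec_size:
            break
        key = struct.unpack(key_format, key_buffer)[0]
        values = struct.unpack(vec_format, vec_buffer)

        embedding_table[key] = values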

# Create mapping between the MovieId and the keys in the embedding table
def mid_to_key(mid):
    return mid + nb_users

def key_to_mid(key):
    return key - nb_users

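# Dense lookup matrix indexed by embedding-table key.
# It holds every key in the table; keys that are absent keep an all-zero row.
max_key = max(embedding_table.keys())
item_embedding = np.zeros((max_key + 1, embedding_vec_size), dtype='float')
for key, values in embedding_table.items():
    item_embedding[key] = values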
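
The record-by-record loop above can also be expressed with vectorized NumPy reads. The following is a minimal sketch, assuming the same layout that loop relies on (a `key` file of 64-bit integer keys and a parallel `emb_vector` file of 32-bit floats, both in native byte order). To stay self-contained it writes a tiny synthetic table to a temporary directory first; the names `demo_keys`, `demo_vecs` and `dense` are illustrative and the vector size is reduced to 4.

```python
import os
import tempfile

import numpy as np

embedding_vec_size = 4  # small size for the demonstration; the notebook uses 64

# Synthetic table in the assumed layout: int64 keys, float32 vectors
demo_keys = np.array([3, 0, 7], dtype=np.int64)
demo_vecs = np.arange(3 * embedding_vec_size, dtype=np.float32).reshape(3, embedding_vec_size)

with tempfile.TemporaryDirectory() as model_dir:
    demo_keys.tofile(os.path.join(model_dir, "key"))
    demo_vecs.tofile(os.path.join(model_dir, "emb_vector"))

    # Read each file in a single call instead of one record at a time
    keys = np.fromfile(os.path.join(model_dir, "key"), dtype=np.int64)
    vecs = np.fromfile(os.path.join(model_dir, "emb_vector"), dtype=np.float32)
    vecs = vecs.reshape(-1, embedding_vec_size)

# Scatter the vectors into a dense matrix indexed by key;
# rows for keys that are not in the table stay zero.
dense = np.zeros((keys.max() + 1, embedding_vec_size), dtype=np.float32)
dense[keys] = vecs
print(dense.shape)
```

For a table with a few hundred thousand keys, reading whole files this way avoids the per-record Python overhead, at the cost of holding both arrays in memory at once.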

Answer nearest neighbor queries

from scipy.spatial.distance import cdist

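def find_similar_movies(nn_movie_id, item_embedding, k=10, metric="euclidean"):
    # Find the top-k rows most similar to row nn_movie_id under a distance metric
    # such as "cosine" or "euclidean".
    # For "cosine", 1 - distance is the cosine similarity; for "euclidean" it only
    # flips the ordering so that a larger score means a closer row.
    sim = 1 - cdist(item_embedding, item_embedding[nn_movie_id].reshape(1, -1), metric=metric)

    # argsort is ascending: keep the k largest scores and return them best-first.
    # The query row has distance 0 to itself, so it is normally the first result.
    return sim.squeeze().argsort()[-k:][::-1]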
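
Because `mid_to_key` offsets movie IDs by `nb_users`, the embedding matrix is indexed by keys that cover more than movies, so a search over every row can return neighbors that are not movies. The sketch below shows one possible variant that restricts the search to a given set of candidate keys. The function `find_similar_in_subset` and the toy data are hypothetical illustrations, not part of the notebook.

```python
import numpy as np
from scipy.spatial.distance import cdist

def find_similar_in_subset(query_key, embedding, candidate_keys, k=3, metric="euclidean"):
    """Return the k candidate keys closest to query_key, nearest first (query excluded)."""
    candidate_keys = np.asarray(candidate_keys)
    candidate_keys = candidate_keys[candidate_keys != query_key]
    query = embedding[query_key].reshape(1, -1)
    # Distances from every candidate row to the query row
    dist = cdist(embedding[candidate_keys], query, metric=metric).ravel()
    order = np.argsort(dist)[:k]  # ascending distance, so nearest first
    return candidate_keys[order]

# Toy data: keys 0-1 play the role of users, keys 2-5 the role of movies
toy_embedding = np.array([
    [0.0, 0.0],  # user
    [1.0, 1.0],  # user, deliberately close to movie 2
    [1.0, 1.1],  # movie (the query)
    [1.0, 1.3],  # movie
    [5.0, 5.0],  # movie
    [1.3, 1.1],  # movie
])
movie_keys = [2, 3, 4, 5]

nearest = find_similar_in_subset(2, toy_embedding, movie_keys, k=2)
print(nearest)
```

In the toy data the user at key 1 is the closest row to the query overall, yet it is never returned because only the movie keys are searched.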

import pandas as pd
movies = pd.read_csv("./data/ml-20m/movies.csv", index_col="movieId")

movies.index[:10]

Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype='int64', name='movieId')

item_embedding.shape

(165237, 64)

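for movie_ID in movies.index[:10]:
    try:
        print("Query: ", movies.loc[movie_ID]["title"], movies.loc[movie_ID]["genres"])

        print("Similar movies: ")
        similar_movies = find_similar_movies(mid_to_key(movie_ID), item_embedding)

        # Skip the first result, which is normally the query itself
        for i in similar_movies[1:]:
            mid = key_to_mid(i)
            # Neighbors whose key does not map to a movieId in movies.csv are skipped,
            # so a query can list fewer than k - 1 movies.
            if mid in movies.index:
                print(mid, movies.loc[mid]["title"], movies.loc[mid]["genres"])
        print("=================================\n")
    except IndexError:
        # The movie's key lies outside the embedding matrix; move on to the next query
        pass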

Query:  Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy
Similar movies: 
339 While You Were Sleeping (1995) Comedy|Romance
2549 Wing Commander (1999) Action|Sci-Fi
=================================

Query:  Jumanji (1995) Adventure|Children|Fantasy
Similar movies: 
511 Program, The (1993) Action|Drama
1897 High Art (1998) Drama|Romance
314 Secret of Roan Inish, The (1994) Children|Drama|Fantasy|Mystery
28 Persuasion (1995) Drama|Romance
194 Smoke (1995) Comedy|Drama
80 White Balloon, The (Badkonake sefid) (1995) Children|Drama
10 GoldenEye (1995) Action|Adventure|Thriller
1084 Bonnie and Clyde (1967) Crime|Drama
649 Cold Fever (Á köldum klaka) (1995) Comedy|Drama
=================================

Query:  Grumpier Old Men (1995) Comedy|Romance
Similar movies: 
626 Thin Line Between Love and Hate, A (1996) Comedy
952 Around the World in 80 Days (1956) Adventure|Comedy
1119 Drunks (1995) Drama
353 Crow, The (1994) Action|Crime|Fantasy|Thriller
791 Last Klezmer: Leopold Kozlowski, His Life and Music, The (1994) Documentary
1115 Sleepover (1995) Drama
237 Forget Paris (1995) Comedy|Romance
389 Colonel Chabert, Le (1994) Drama|Romance|War
=================================

Query:  Waiting to Exhale (1995) Comedy|Drama|Romance
Similar movies: 
406 Federal Hill (1994) Drama
827 Convent, The (O Convento) (1995) Drama
266 Legends of the Fall (1994) Drama|Romance|War|Western
261 Little Women (1994) Drama
264 Enfer, L' (1994) Drama
511 Program, The (1993) Action|Drama
2506 Other Sister, The (1999) Comedy|Drama|Romance
1061 Sleepers (1996) Thriller
206 Unzipped (1995) Documentary
=================================

Query:  Father of the Bride Part II (1995) Comedy
Similar movies: 
2965 Omega Code, The (1999) Action
1050 Looking for Richard (1996) Documentary|Drama
=================================

Query:  Heat (1995) Action|Crime|Thriller
Similar movies: 
5370 Big Bad Mama II (1987) Action|Comedy
1528 Intimate Relations (1996) Comedy
1679 Chairman of the Board (1998) Comedy
603 Bye Bye, Love (1995) Comedy
2786 Haunted Honeymoon (1986) Comedy
=================================

Query:  Sabrina (1995) Comedy|Romance
Similar movies: 
260 Star Wars: Episode IV - A New Hope (1977) Action|Adventure|Sci-Fi
603 Bye Bye, Love (1995) Comedy
726 Last Dance (1996) Drama
47 Seven (a.k.a. Se7en) (1995) Mystery|Thriller
2162 NeverEnding Story II: The Next Chapter, The (1990) Adventure|Children|Fantasy
82 Antonia's Line (Antonia) (1995) Comedy|Drama
=================================

Query:  Tom and Huck (1995) Adventure|Children
Similar movies: 
368 Maverick (1994) Adventure|Comedy|Western
3579 I Dreamed of Africa (2000) Drama
477 What's Love Got to Do with It? (1993) Drama|Musical
423 Blown Away (1994) Action|Thriller
339 While You Were Sleeping (1995) Comedy|Romance
1693 Amistad (1997) Drama|Mystery
35 Carrington (1995) Drama|Romance
400 Homage (1995) Drama
=================================

Query:  Sudden Death (1995) Action
Similar movies: 
742 Thinner (1996) Horror|Thriller
481 Kalifornia (1993) Drama|Thriller
715 Horseman on the Roof, The (Hussard sur le toit, Le) (1995) Drama|Romance
237 Forget Paris (1995) Comedy|Romance
640 Diabolique (1996) Drama|Thriller
574 Spanking the Monkey (1994) Comedy|Drama
32 Twelve Monkeys (a.k.a. 12 Monkeys) (1995) Mystery|Sci-Fi|Thriller
8 Tom and Huck (1995) Adventure|Children
=================================

Query:  GoldenEye (1995) Action|Adventure|Thriller
Similar movies: 
257 Just Cause (1995) Mystery|Thriller
1913 Picnic at Hanging Rock (1975) Drama|Mystery
1224 Henry V (1989) Action|Drama|Romance|War
1542 Brassed Off (1996) Comedy|Drama|Romance
243 Gordy (1995) Children|Comedy|Fantasy
2335 Waterboy, The (1998) Comedy
1218 Killer, The (Die xue shuang xiong) (1989) Action|Crime|Drama|Thriller
477 What's Love Got to Do with It? (1993) Drama|Musical
1894 Six Days Seven Nights (1998) Adventure|Comedy|Romance
=================================