# Copyright 2020 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
HugeCTR demo on MovieLens data
Overview
HugeCTR is a recommender-specific framework capable of distributed training across multiple GPUs and nodes for click-through-rate (CTR) estimation. HugeCTR is a component of NVIDIA Merlin (documentation | GitHub), a framework that accelerates the entire pipeline for GPU-accelerated recommender systems, from data ingestion and training to deployment.
Prerequisites
Docker containers
Start the notebook inside a running 22.07 or later NGC Docker container: nvcr.io/nvidia/merlin/merlin-hugectr:22.07.
The HugeCTR Python interface is installed to the path /usr/local/hugectr/lib/ and the path is added to the environment variable PYTHONPATH.
You can use the HugeCTR Python interface within the Docker container without any additional configuration.
Hardware
This notebook requires a Pascal, Volta, Turing, Ampere, or newer GPU, such as a P100, V100, T4, or A100.
You can view the GPU information with the nvidia-smi command:
!nvidia-smi
Mon Aug 15 07:05:22 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:06:00.0 Off | 0 |
| N/A 30C P0 41W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:07:00.0 Off | 0 |
| N/A 32C P0 43W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:0A:00.0 Off | 0 |
| N/A 33C P0 43W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000000:0B:00.0 Off | 0 |
| N/A 31C P0 41W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... On | 00000000:85:00.0 Off | 0 |
| N/A 32C P0 42W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... On | 00000000:86:00.0 Off | 0 |
| N/A 32C P0 42W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... On | 00000000:89:00.0 Off | 0 |
| N/A 35C P0 42W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... On | 00000000:8A:00.0 Off | 0 |
| N/A 31C P0 42W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Data download and preprocessing
We first install a few extra utilities for data preprocessing.
print("Downloading and installing 'tqdm' package.")
!pip3 -q install torch tqdm
print("Downloading and installing 'unzip' command")
!apt-get update
!apt-get install -y zip
Downloading and installing 'tqdm' package.
Downloading and installing 'unzip' command
Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease [1581 B]
Get:2 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Packages [663 kB]
Get:4 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
Get:5 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2087 kB]
Get:6 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1461 kB]
Get:7 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [888 kB]
Get:8 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:9 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [27.5 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
Get:13 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]
Get:15 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1176 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [30.2 kB]
Get:17 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2539 kB]
Get:18 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1569 kB]
Get:19 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [55.2 kB]
Get:20 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [27.5 kB]
Fetched 24.0 MB in 3s (7097 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
unzip
The following NEW packages will be installed:
unzip zip
0 upgraded, 2 newly installed, 0 to remove and 51 not upgraded.
Need to get 336 kB of archives.
After this operation, 1231 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal/main amd64 unzip amd64 6.0-25ubuntu1 [169 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal/main amd64 zip amd64 3.0-11build1 [167 kB]
Fetched 336 kB in 1s (314 kB/s)
Selecting previously unselected package unzip.
(Reading database ... 43708 files and directories currently installed.)
Preparing to unpack .../unzip_6.0-25ubuntu1_amd64.deb ...
Unpacking unzip (6.0-25ubuntu1) ...
Selecting previously unselected package zip.
Preparing to unpack .../zip_3.0-11build1_amd64.deb ...
Unpacking zip (3.0-11build1) ...
Setting up unzip (6.0-25ubuntu1) ...
Setting up zip (3.0-11build1) ...
Processing triggers for mime-support (3.64ubuntu1) ...
Next, we download and unzip the MovieLens 20M dataset. The wget command below is commented out; uncomment it if data/ml-20m.zip is not already present.
print("Downloading and extracting 'Movie Lens 20M' dataset.")
#!wget -nc http://files.grouplens.org/datasets/movielens/ml-20m.zip -P data -q --show-progress
!unzip -n data/ml-20m.zip -d data
!ls ./data
Downloading and extracting 'Movie Lens 20M' dataset.
Archive: data/ml-20m.zip
creating: data/ml-20m/
inflating: data/ml-20m/genome-scores.csv
inflating: data/ml-20m/genome-tags.csv
inflating: data/ml-20m/links.csv
inflating: data/ml-20m/movies.csv
inflating: data/ml-20m/ratings.csv
inflating: data/ml-20m/README.txt
inflating: data/ml-20m/tags.csv
ml-20m ml-20m.zip
MovieLens data preprocessing
import pandas as pd
import torch
import tqdm
MIN_RATINGS = 20
USER_COLUMN = 'userId'
ITEM_COLUMN = 'movieId'
/usr/local/lib/python3.8/dist-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
Next, we read the data into a pandas DataFrame, drop users with fewer than MIN_RATINGS ratings, and encode the user and item IDs as consecutive integers.
df = pd.read_csv('./data/ml-20m/ratings.csv')
print("Filtering out users with less than {} ratings".format(MIN_RATINGS))
grouped = df.groupby(USER_COLUMN)
df = grouped.filter(lambda x: len(x) >= MIN_RATINGS)
print("Mapping original user and item IDs to new sequential IDs")
df[USER_COLUMN], unique_users = pd.factorize(df[USER_COLUMN])
df[ITEM_COLUMN], unique_items = pd.factorize(df[ITEM_COLUMN])
nb_users = len(unique_users)
nb_items = len(unique_items)
print("Number of users: %d\nNumber of items: %d"%(len(unique_users), len(unique_items)))
Filtering out users with less than 20 ratings
Mapping original user and item IDs to new sequential IDs
Number of users: 138493
Number of items: 26744
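With its default settings, pd.factorize assigns integer codes in order of first appearance, so the re-encoded IDs are dense and start at 0. The following is a dependency-free sketch of that behavior; the helper name factorize_in_order and the sample IDs are made up for illustration and are not part of the notebook.

```python
def factorize_in_order(values):
    """Map each distinct value to 0, 1, 2, ... in order of first appearance."""
    codes = []      # integer code for every input value
    uniques = []    # distinct values, in the order they were first seen
    index = {}      # value -> code lookup
    for v in values:
        if v not in index:
            index[v] = len(uniques)
            uniques.append(v)
        codes.append(index[v])
    return codes, uniques

# Hypothetical raw IDs with gaps; the codes come out dense.
codes, uniques = factorize_in_order([31, 7, 31, 1029, 7])
print(codes)    # [0, 1, 0, 2, 1]
print(uniques)  # [31, 7, 1029]
```

The length of uniques plays the role of nb_users or nb_items in the cell above.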
Next, we split the data into a training set and a test set. The most recently rated movie of each user is held out for the test set; all earlier ratings go to the training set.
# Need to sort before popping to get the last item
df.sort_values(by='timestamp', inplace=True)
# clean up data
del df['rating'], df['timestamp']
df = df.drop_duplicates() # assuming it keeps order
df.head()
 | userId | movieId |
---|---|---|
4182421 | 28506 | 3258 |
18950979 | 131159 | 23 |
18950936 | 131159 | 3 |
18950930 | 131159 | 630 |
12341178 | 85251 | 1867 |
# HugeCTR expects user IDs and item IDs to occupy different key ranges, so we add nb_users to movieId to prevent the ranges from overlapping
df['movieId'] = df['movieId'] + nb_users
# the data is now filtered and sorted by time, so we can split out the test data
grouped_sorted = df.groupby(USER_COLUMN, group_keys=False)
test_data = grouped_sorted.tail(1).sort_values(by=USER_COLUMN)
# need to pop for each group
train_data = grouped_sorted.apply(lambda x: x.iloc[:-1])
train_data['target']=1
test_data['target']=1
train_data.head()
 | userId | movieId | target |
---|---|---|---|
20 | 0 | 138513 | 1 |
19 | 0 | 138512 | 1 |
86 | 0 | 138579 | 1 |
61 | 0 | 138554 | 1 |
23 | 0 | 138516 | 1 |
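The leave-last-out split can be checked on a handful of rows without pandas. This is a minimal sketch under the assumption that each interaction is a (user, item, timestamp) tuple; the function name leave_last_out and the toy data are illustrative only.

```python
from collections import defaultdict

def leave_last_out(interactions):
    """Split (user, item, timestamp) tuples: each user's latest item goes to test."""
    by_user = defaultdict(list)
    # Sort by timestamp first so the last item per user is the most recent one.
    for user, item, _ts in sorted(interactions, key=lambda r: r[2]):
        by_user[user].append(item)
    train, test = [], []
    for user, items in by_user.items():
        train.extend((user, item) for item in items[:-1])
        test.append((user, items[-1]))
    return train, test

toy = [(0, 10, 5), (0, 11, 9), (1, 10, 2), (0, 12, 1), (1, 13, 7)]
train, test = leave_last_out(toy)
print(sorted(train))  # [(0, 10), (0, 12), (1, 10)]
print(sorted(test))   # [(0, 11), (1, 13)]
```

Every user contributes exactly one test row, which is why the test set in the notebook has one positive per user.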
Because the MovieLens data contains only positive examples, we first define a utility class to generate negative samples.
class _TestNegSampler:
    def __init__(self, train_ratings, nb_users, nb_items, nb_neg):
        self.nb_neg = nb_neg
        self.nb_users = nb_users
        self.nb_items = nb_items
        # compute unique ids for quickly created hash set and fast lookup
        ids = (train_ratings[:, 0] * self.nb_items) + train_ratings[:, 1]
        self.set = set(ids)
    def generate(self, batch_size=128*1024):
        users = torch.arange(0, self.nb_users).reshape([1, -1]).repeat([self.nb_neg, 1]).transpose(0, 1).reshape(-1)
        items = [-1] * len(users)
        random_items = torch.LongTensor(batch_size).random_(0, self.nb_items).tolist()
        print('Generating validation negatives...')
        for idx, u in enumerate(tqdm.tqdm(users.tolist())):
            if not random_items:
                random_items = torch.LongTensor(batch_size).random_(0, self.nb_items).tolist()
            j = random_items.pop()
            while u * self.nb_items + j in self.set:
                if not random_items:
                    random_items = torch.LongTensor(batch_size).random_(0, self.nb_items).tolist()
                j = random_items.pop()
            items[idx] = j
        items = torch.LongTensor(items)
        return items
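The sampler relies on rejection sampling: each (user, item) pair is encoded as the single integer user * nb_items + item, and a random item is redrawn until the encoded pair is not in the set of known positives. The sketch below shows the same idea with only the standard library; sample_negatives and the toy data are hypothetical, and it assumes item indices lie in [0, nb_items) and that every user has at least one unrated item.

```python
import random

def sample_negatives(positives, nb_users, nb_items, nb_neg, seed=0):
    """For each user, draw nb_neg items (with replacement) the user has not rated."""
    rng = random.Random(seed)
    # Encode each positive pair as one integer; unique while 0 <= item < nb_items.
    seen = {u * nb_items + i for u, i in positives}
    negatives = []
    for u in range(nb_users):
        row = []
        while len(row) < nb_neg:
            j = rng.randrange(nb_items)
            if u * nb_items + j not in seen:   # reject items the user already rated
                row.append(j)
        negatives.append(row)
    return negatives

positives = [(0, 0), (0, 1), (1, 2)]
negs = sample_negatives(positives, nb_users=2, nb_items=4, nb_neg=3)
print(len(negs), len(negs[0]))  # 2 3
```

As in the class above, the same negative item may be drawn more than once for a user.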
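Next, we generate the negative samples for training and testing.
# df already stores offset item IDs (movieId + nb_users), but the sampler draws raw item
# indices in [0, nb_items), so undo the offset before building its lookup set
ratings = df.values.copy()
ratings[:, 1] -= nb_users
sampler = _TestNegSampler(ratings, nb_users, nb_items, 500) # using 500 negative samples
train_negs = sampler.generate()
train_negs = train_negs.reshape(-1, 500)
sampler = _TestNegSampler(ratings, nb_users, nb_items, 100) # using 100 negative samples
test_negs = sampler.generate()
test_negs = test_negs.reshape(-1, 100)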
Generating validation negatives...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 69246500/69246500 [00:57<00:00, 1197676.54it/s]
Generating validation negatives...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13849300/13849300 [00:11<00:00, 1181648.22it/s]
import numpy as np
# generating negative samples for training
train_data_neg = np.zeros((train_negs.shape[0]*train_negs.shape[1],3), dtype=int)
idx = 0
for i in tqdm.tqdm(range(train_negs.shape[0])):
    for j in range(train_negs.shape[1]):
        train_data_neg[idx, 0] = i # user ID
        train_data_neg[idx, 1] = train_negs[i, j] + nb_users # negative item ID, offset by nb_users like the positive item IDs
        idx += 1
# generating negative samples for testing
test_data_neg = np.zeros((test_negs.shape[0]*test_negs.shape[1],3), dtype=int)
idx = 0
for i in tqdm.tqdm(range(test_negs.shape[0])):
    for j in range(test_negs.shape[1]):
        test_data_neg[idx, 0] = i # user ID
        test_data_neg[idx, 1] = test_negs[i, j] + nb_users # negative item ID, offset by nb_users like the positive item IDs
        idx += 1
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 138493/138493 [06:23<00:00, 360.91it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 138493/138493 [01:16<00:00, 1804.65it/s]
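The nested loops fill the negative-sample arrays one element at a time, which accounts for the minutes shown in the progress bars. The same [user, item, label] layout can be produced with array operations. This is a sketch on a toy array, not a drop-in replacement; negatives_to_rows and item_offset are illustrative names, and a torch tensor would first need converting with .numpy().

```python
import numpy as np

def negatives_to_rows(negs, item_offset=0):
    """Return an (n_users * n_neg, 3) array of [user, item, label=0] rows."""
    negs = np.asarray(negs)
    n_users, n_neg = negs.shape
    rows = np.zeros((n_users * n_neg, 3), dtype=np.int64)
    # Row i of negs belongs to user i, so repeat each user index n_neg times.
    rows[:, 0] = np.repeat(np.arange(n_users), n_neg)
    # Flatten row by row; the optional offset shifts items into their own key range.
    rows[:, 1] = negs.reshape(-1) + item_offset
    return rows

toy_negs = np.array([[5, 7], [2, 9], [4, 4]])
rows = negatives_to_rows(toy_negs, item_offset=10)
print(rows.tolist())
# [[0, 15, 0], [0, 17, 0], [1, 12, 0], [1, 19, 0], [2, 14, 0], [2, 14, 0]]
```

The third column stays 0, which is the label the notebook uses for negative samples.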
train_data_np= np.concatenate([train_data_neg, train_data.values])
np.random.shuffle(train_data_np)
test_data_np= np.concatenate([test_data_neg, test_data.values])
np.random.shuffle(test_data_np)
Write HugeCTR data files
After preprocessing, we write the data to disk in the HugeCTR Norm dataset format.
from ctypes import c_longlong as ll
from ctypes import c_uint
from ctypes import c_float
from ctypes import c_int
def write_hugeCTR_data(huge_ctr_data, filename='huge_ctr_data.dat'):
    print("Writing %d samples"%huge_ctr_data.shape[0])
    with open(filename, 'wb') as f:
        #write header
        f.write(ll(0)) # 0: no error check; 1: check_num
        f.write(ll(huge_ctr_data.shape[0])) # the number of samples in this data file
        f.write(ll(1)) # dimension of label
        f.write(ll(1)) # dimension of dense feature
        f.write(ll(2)) # long long slot_num
        for _ in range(3): f.write(ll(0)) # reserved for future use
        for i in tqdm.tqdm(range(huge_ctr_data.shape[0])):
            f.write(c_float(huge_ctr_data[i,2])) # float label[label_dim];
            f.write(c_float(0)) # dummy dense feature
            f.write(c_int(1)) # slot 1 nnz: user ID
            f.write(c_uint(huge_ctr_data[i,0]))
            f.write(c_int(1)) # slot 2 nnz: item ID
            f.write(c_uint(huge_ctr_data[i,1]))
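Reading a file back is a quick way to confirm the byte layout the writer produces: a header of eight 8-byte integers, then 24 bytes per sample (label, dense value, and a count plus key for each of the two slots). The sketch below mirrors that layout with the standard struct module on a toy file. It assumes the usual C type sizes (8-byte long long, 4-byte float, int, and unsigned int); write_norm and read_norm are hypothetical helpers, not HugeCTR functions.

```python
import os
import struct
import tempfile

# "=" selects native byte order with standard sizes and no padding.
HEADER = struct.Struct("=8q")      # error_check, num_samples, label_dim, dense_dim, slot_num, 3 reserved
SAMPLE = struct.Struct("=ffiIiI")  # label, dense, nnz, user key, nnz, item key

def write_norm(path, rows):
    """Write [user, item, label] rows using the same layout as the notebook's writer."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(0, len(rows), 1, 1, 2, 0, 0, 0))
        for user, item, label in rows:
            f.write(SAMPLE.pack(float(label), 0.0, 1, user, 1, item))

def read_norm(path):
    """Return the header tuple and the decoded (user, item, label) rows."""
    with open(path, "rb") as f:
        header = HEADER.unpack(f.read(HEADER.size))
        rows = []
        for _ in range(header[1]):
            label, _dense, _nnz1, user, _nnz2, item = SAMPLE.unpack(f.read(SAMPLE.size))
            rows.append((user, item, int(label)))
    return header, rows

rows = [(0, 138513, 1), (0, 138497, 0), (1, 138600, 1)]
path = os.path.join(tempfile.mkdtemp(), "toy.dat")
write_norm(path, rows)
header, decoded = read_norm(path)
file_size = os.path.getsize(path)
print(header[:5], file_size)  # (0, 3, 1, 1, 2) 136
```

The expected file size is 64 + 24 * num_samples bytes, which gives a cheap sanity check on the files written below.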
Train data
def generate_filelist(filelist_name, num_files, filename_prefix):
    with open(filelist_name, 'wt') as f:
        f.write('{0}\n'.format(num_files))
        for i in range(num_files):
            f.write('{0}_{1}.dat\n'.format(filename_prefix, i))
!rm -rf ./data/hugeCTR
!mkdir ./data/hugeCTR
for i, data_arr in enumerate(np.array_split(train_data_np,10)):
    write_hugeCTR_data(data_arr, filename='./data/hugeCTR/train_huge_ctr_data_%d.dat'%i)
generate_filelist('./data/hugeCTR/train_filelist.txt', 10, './data/hugeCTR/train_huge_ctr_data')
Writing 8910827 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 313062.86it/s]
Writing 8910827 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 314545.08it/s]
Writing 8910827 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 313687.26it/s]
Writing 8910827 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 316105.12it/s]
Writing 8910827 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 313179.63it/s]
Writing 8910827 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 314053.42it/s]
Writing 8910827 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 312377.54it/s]
Writing 8910827 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 313288.65it/s]
Writing 8910827 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 313456.87it/s]
Writing 8910827 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8910827/8910827 [00:28<00:00, 312600.20it/s]
Test data
for i, data_arr in enumerate(np.array_split(test_data_np,10)):
    write_hugeCTR_data(data_arr, filename='./data/hugeCTR/test_huge_ctr_data_%d.dat'%i)
generate_filelist('./data/hugeCTR/test_filelist.txt', 10, './data/hugeCTR/test_huge_ctr_data')
Writing 1398780 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398780/1398780 [00:04<00:00, 314708.42it/s]
Writing 1398780 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398780/1398780 [00:04<00:00, 313743.84it/s]
Writing 1398780 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398780/1398780 [00:04<00:00, 316072.53it/s]
Writing 1398779 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:04<00:00, 315541.63it/s]
Writing 1398779 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:04<00:00, 315705.03it/s]
Writing 1398779 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:04<00:00, 315520.94it/s]
Writing 1398779 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:04<00:00, 313371.66it/s]
Writing 1398779 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:04<00:00, 314972.66it/s]
Writing 1398779 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:04<00:00, 314166.37it/s]
Writing 1398779 samples
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1398779/1398779 [00:04<00:00, 315031.06it/s]
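The per-file sample counts in the output follow from how np.array_split divides rows that do not split evenly: the first n_rows % n_chunks chunks receive one extra row. A small sketch of that arithmetic for the test set, with array_split_sizes as an illustrative helper name:

```python
def array_split_sizes(n_rows, n_chunks):
    """Chunk lengths produced by np.array_split along axis 0."""
    base, extra = divmod(n_rows, n_chunks)
    # The first `extra` chunks get one additional row each.
    return [base + 1] * extra + [base] * (n_chunks - extra)

nb_users = 138493
test_rows = nb_users * 100 + nb_users   # 100 negatives plus 1 held-out positive per user
sizes = array_split_sizes(test_rows, 10)
print(sizes[:4])   # [1398780, 1398780, 1398780, 1398779]
print(sum(sizes))  # 13987793
```

This matches the three files with 1398780 samples followed by seven files with 1398779 samples.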
HugeCTR DLRM training
In this section, we will train a DLRM network on the augmented MovieLens data. First, we write the training Python script.
%%writefile hugectr_dlrm_movielens.py
import hugectr
from mpi4py import MPI
solver = hugectr.CreateSolver(max_eval_batches = 1000,
batchsize_eval = 65536,
batchsize = 65536,
lr = 0.1,
warmup_steps = 1000,
decay_start = 10000,
decay_steps = 40000,
decay_power = 2.0,
end_lr = 1e-5,
vvgpu = [[0, 1]],
repeat_dataset = True,
use_mixed_precision = True,
scaler = 1024)
reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Norm,
source = ["./data/hugeCTR/train_filelist.txt"],
eval_source = "./data/hugeCTR/test_filelist.txt",
num_workers = 2,
check_type = hugectr.Check_t.Non)
optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.SGD,
update_type = hugectr.Update_t.Local,
atomic_update = True)
model = hugectr.Model(solver, reader, optimizer)
model.add(hugectr.Input(label_dim = 1, label_name = "label",
dense_dim = 1, dense_name = "dense",
data_reader_sparse_param_array =
[hugectr.DataReaderSparseParam("data1", 1, True, 2)]))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.LocalizedSlotSparseEmbeddingHash,
workspace_size_per_gpu_in_mb = 150,
embedding_vec_size = 64,
combiner = "sum",
sparse_embedding_name = "sparse_embedding1",
bottom_name = "data1",
optimizer = optimizer))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
bottom_names = ["dense"],
top_names = ["fc1"],
num_output=64))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
bottom_names = ["fc1"],
top_names = ["fc2"],
num_output=128))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
bottom_names = ["fc2"],
top_names = ["fc3"],
num_output=64))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Interaction,
bottom_names = ["fc3","sparse_embedding1"],
top_names = ["interaction1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
bottom_names = ["interaction1"],
top_names = ["fc4"],
num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
bottom_names = ["fc4"],
top_names = ["fc5"],
num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
bottom_names = ["fc5"],
top_names = ["fc6"],
num_output=512))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedInnerProduct,
bottom_names = ["fc6"],
top_names = ["fc7"],
num_output=256))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
bottom_names = ["fc7"],
top_names = ["fc8"],
num_output=1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
bottom_names = ["fc8", "label"],
top_names = ["loss"]))
model.compile()
model.summary()
model.fit(max_iter = 50000, display = 1000, eval_interval = 3000, snapshot = 49000, snapshot_prefix = "./hugeCTR_saved_model_DLRM/")
Overwriting hugectr_dlrm_movielens.py
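The solver settings lr, warmup_steps, decay_start, decay_steps, decay_power, and end_lr together define a learning-rate schedule: a warmup phase, a constant phase, then a polynomial decay. The sketch below is one reading of that schedule, inferred from the lr values in the training log rather than taken from the HugeCTR source. The linear warmup, the floor at end_lr, and the observation that the value logged at iteration N matches step N + 1 are all assumptions.

```python
def learning_rate(step, lr=0.1, warmup_steps=1000, decay_start=10000,
                  decay_steps=40000, decay_power=2.0, end_lr=1e-5):
    """Assumed schedule: linear warmup, constant phase, polynomial decay to end_lr."""
    if step < warmup_steps:
        return lr * step / warmup_steps          # assumed linear warmup
    if step <= decay_start:
        return lr                                # constant phase
    decay_end = decay_start + decay_steps
    if step >= decay_end:
        return end_lr                            # assumed floor once decay finishes
    factor = ((decay_end - step) / decay_steps) ** decay_power
    return max(lr * factor, end_lr)

# Compare with the "lr:" field printed at these iterations in the training log.
for it in (10000, 11000, 12000, 25000):
    print(it, round(learning_rate(it + 1), 7))
```

With decay_power = 2.0 the rate falls quadratically toward end_lr over the 40000 steps after decay_start, so it reaches the floor at iteration 50000, which is also max_iter in model.fit.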
!rm -rf ./hugeCTR_saved_model_DLRM/
!mkdir ./hugeCTR_saved_model_DLRM/
!python3 hugectr_dlrm_movielens.py
HugeCTR Version: 3.8
====================================================Model Init=====================================================
[HCTR][08:52:43.612][WARNING][RK0][main]: The model name is not specified when creating the solver.
[HCTR][08:52:43.612][INFO][RK0][main]: Global seed is 3027443801
[HCTR][08:52:43.615][INFO][RK0][main]: Device to NUMA mapping:
GPU 0 -> node 0
GPU 1 -> node 0
[HCTR][08:52:46.789][INFO][RK0][main]: Start all2all warmup
[HCTR][08:52:46.796][INFO][RK0][main]: End all2all warmup
[HCTR][08:52:46.798][INFO][RK0][main]: Using All-reduce algorithm: NCCL
[HCTR][08:52:46.799][INFO][RK0][main]: Device 0: Tesla V100-SXM2-32GB
[HCTR][08:52:46.800][INFO][RK0][main]: Device 1: Tesla V100-SXM2-32GB
[HCTR][08:52:46.800][INFO][RK0][main]: num of DataReader workers for train: 2
[HCTR][08:52:46.800][INFO][RK0][main]: num of DataReader workers for eval: 2
[HCTR][08:52:46.809][INFO][RK0][main]: max_vocabulary_size_per_gpu_=614400
[HCTR][08:52:46.812][INFO][RK0][main]: Graph analysis to resolve tensor dependency
===================================================Model Compile===================================================
[HCTR][08:53:14.439][INFO][RK0][main]: gpu0 start to init embedding
[HCTR][08:53:14.439][INFO][RK0][tid #140704957323008]: gpu1 start to init embedding
[HCTR][08:53:14.440][INFO][RK0][main]: gpu0 init embedding done
[HCTR][08:53:14.440][INFO][RK0][tid #140704957323008]: gpu1 init embedding done
[HCTR][08:53:14.450][INFO][RK0][main]: Starting AUC NCCL warm-up
[HCTR][08:53:14.488][INFO][RK0][main]: Warm-up done
===================================================Model Summary===================================================
[HCTR][08:53:14.489][INFO][RK0][main]: label Dense Sparse
label dense data1
(None, 1) (None, 1)
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
Layer Type Input Name Output Name Output Shape
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
LocalizedSlotSparseEmbeddingHash data1 sparse_embedding1 (None, 2, 64)
------------------------------------------------------------------------------------------------------------------
FusedInnerProduct dense fc1 (None, 64)
------------------------------------------------------------------------------------------------------------------
FusedInnerProduct fc1 fc2 (None, 128)
------------------------------------------------------------------------------------------------------------------
FusedInnerProduct fc2 fc3 (None, 64)
------------------------------------------------------------------------------------------------------------------
Interaction fc3 interaction1 (None, 68)
sparse_embedding1
------------------------------------------------------------------------------------------------------------------
FusedInnerProduct interaction1 fc4 (None, 1024)
------------------------------------------------------------------------------------------------------------------
FusedInnerProduct fc4 fc5 (None, 1024)
------------------------------------------------------------------------------------------------------------------
FusedInnerProduct fc5 fc6 (None, 512)
------------------------------------------------------------------------------------------------------------------
FusedInnerProduct fc6 fc7 (None, 256)
------------------------------------------------------------------------------------------------------------------
InnerProduct fc7 fc8 (None, 1)
------------------------------------------------------------------------------------------------------------------
BinaryCrossEntropyLoss fc8 loss
label
------------------------------------------------------------------------------------------------------------------
=====================================================Model Fit=====================================================
[HCTR][08:53:14.489][INFO][RK0][main]: Use non-epoch mode with number of iterations: 50000
[HCTR][08:53:14.489][INFO][RK0][main]: Training batchsize: 65536, evaluation batchsize: 65536
[HCTR][08:53:14.489][INFO][RK0][main]: Evaluation interval: 3000, snapshot interval: 49000
[HCTR][08:53:14.489][INFO][RK0][main]: Dense network trainable: True
[HCTR][08:53:14.489][INFO][RK0][main]: Sparse embedding sparse_embedding1 trainable: True
[HCTR][08:53:14.489][INFO][RK0][main]: Use mixed precision: True, scaler: 1024.000000, use cuda graph: True
[HCTR][08:53:14.489][INFO][RK0][main]: lr: 0.100000, warmup_steps: 1000, end_lr: 0.000010
[HCTR][08:53:14.489][INFO][RK0][main]: decay_start: 10000, decay_steps: 40000, decay_power: 2.000000
[HCTR][08:53:14.489][INFO][RK0][main]: Training source file: ./data/hugeCTR/train_filelist.txt
[HCTR][08:53:14.489][INFO][RK0][main]: Evaluation source file: ./data/hugeCTR/test_filelist.txt
[HCTR][08:53:23.393][INFO][RK0][main]: Iter: 1000 Time(1000 iters): 8.8962s Loss: 0.528513 lr:0.1
[HCTR][08:53:32.267][INFO][RK0][main]: Iter: 2000 Time(1000 iters): 8.86544s Loss: 0.528953 lr:0.1
[HCTR][08:53:41.173][INFO][RK0][main]: Iter: 3000 Time(1000 iters): 8.89732s Loss: 0.52741 lr:0.1
[HCTR][08:53:46.649][INFO][RK0][main]: Evaluation, AUC: 0.615216
[HCTR][08:53:46.649][INFO][RK0][main]: Eval Time for 1000 iters: 5.47524s
[HCTR][08:53:55.561][INFO][RK0][main]: Iter: 4000 Time(1000 iters): 14.38s Loss: 0.173515 lr:0.1
[HCTR][08:54:04.467][INFO][RK0][main]: Iter: 5000 Time(1000 iters): 8.89775s Loss: 0.0643398 lr:0.1
[HCTR][08:54:13.378][INFO][RK0][main]: Iter: 6000 Time(1000 iters): 8.90232s Loss: 0.055425 lr:0.1
[HCTR][08:54:18.613][INFO][RK0][main]: Evaluation, AUC: 0.983938
[HCTR][08:54:18.613][INFO][RK0][main]: Eval Time for 1000 iters: 5.23426s
[HCTR][08:54:27.527][INFO][RK0][main]: Iter: 7000 Time(1000 iters): 14.14s Loss: 0.0444573 lr:0.1
[HCTR][08:54:36.392][INFO][RK0][main]: Iter: 8000 Time(1000 iters): 8.85758s Loss: 0.354917 lr:0.1
[HCTR][08:54:45.263][INFO][RK0][main]: Iter: 9000 Time(1000 iters): 8.86224s Loss: 0.0637668 lr:0.1
[HCTR][08:54:50.450][INFO][RK0][main]: Evaluation, AUC: 0.966228
[HCTR][08:54:50.450][INFO][RK0][main]: Eval Time for 1000 iters: 5.18632s
[HCTR][08:54:59.305][INFO][RK0][main]: Iter: 10000 Time(1000 iters): 14.0335s Loss: 0.0474014 lr:0.099995
[HCTR][08:55:08.171][INFO][RK0][main]: Iter: 11000 Time(1000 iters): 8.8579s Loss: 0.0336978 lr:0.0950576
[HCTR][08:55:16.985][INFO][RK0][main]: Iter: 12000 Time(1000 iters): 8.80581s Loss: 0.0208526 lr:0.0902453
[HCTR][08:55:22.100][INFO][RK0][main]: Evaluation, AUC: 0.990911
[HCTR][08:55:22.100][INFO][RK0][main]: Eval Time for 1000 iters: 5.11441s
[HCTR][08:55:30.936][INFO][RK0][main]: Iter: 13000 Time(1000 iters): 13.9421s Loss: 0.0173013 lr:0.0855579
[HCTR][08:55:39.769][INFO][RK0][main]: Iter: 14000 Time(1000 iters): 8.82507s Loss: 0.0128202 lr:0.0809955
[HCTR][08:55:48.619][INFO][RK0][main]: Iter: 15000 Time(1000 iters): 8.84112s Loss: 0.0100981 lr:0.0765581
[HCTR][08:55:53.942][INFO][RK0][main]: Evaluation, AUC: 0.996372
[HCTR][08:55:53.942][INFO][RK0][main]: Eval Time for 1000 iters: 5.32278s
[HCTR][08:56:02.785][INFO][RK0][main]: Iter: 16000 Time(1000 iters): 14.1583s Loss: 0.00852386 lr:0.0722457
[HCTR][08:56:11.624][INFO][RK0][main]: Iter: 17000 Time(1000 iters): 8.82997s Loss: 0.00812518 lr:0.0680584
[HCTR][08:56:20.473][INFO][RK0][main]: Iter: 18000 Time(1000 iters): 8.84099s Loss: 0.00878625 lr:0.063996
[HCTR][08:56:25.671][INFO][RK0][main]: Evaluation, AUC: 0.997613
[HCTR][08:56:25.671][INFO][RK0][main]: Eval Time for 1000 iters: 5.19794s
[HCTR][08:56:34.533][INFO][RK0][main]: Iter: 19000 Time(1000 iters): 14.0519s Loss: 0.00652799 lr:0.0600586
[HCTR][08:56:43.383][INFO][RK0][main]: Iter: 20000 Time(1000 iters): 8.84127s Loss: 0.00636787 lr:0.0562463
[HCTR][08:56:52.245][INFO][RK0][main]: Iter: 21000 Time(1000 iters): 8.85349s Loss: 0.00630231 lr:0.0525589
[HCTR][08:56:57.272][INFO][RK0][main]: Evaluation, AUC: 0.998177
[HCTR][08:56:57.272][INFO][RK0][main]: Eval Time for 1000 iters: 5.02735s
[HCTR][08:57:06.114][INFO][RK0][main]: Iter: 22000 Time(1000 iters): 13.8611s Loss: 0.00599465 lr:0.0489965
[HCTR][08:57:14.971][INFO][RK0][main]: Iter: 23000 Time(1000 iters): 8.84889s Loss: 0.00456903 lr:0.0455591
[HCTR][08:57:23.829][INFO][RK0][main]: Iter: 24000 Time(1000 iters): 8.84915s Loss: 0.0048366 lr:0.0422468
[HCTR][08:57:28.904][INFO][RK0][main]: Evaluation, AUC: 0.998516
[HCTR][08:57:28.904][INFO][RK0][main]: Eval Time for 1000 iters: 5.07521s
[HCTR][08:57:37.757][INFO][RK0][main]: Iter: 25000 Time(1000 iters): 13.9202s Loss: 0.00472847 lr:0.0390594
[HCTR][08:57:46.597][INFO][RK0][main]: Iter: 26000 Time(1000 iters): 8.8316s Loss: 0.00477947 lr:0.035997
[HCTR][08:57:55.448][INFO][RK0][main]: Iter: 27000 Time(1000 iters): 8.84248s Loss: 0.00496196 lr:0.0330596
[HCTR][08:58:00.628][INFO][RK0][main]: Evaluation, AUC: 0.998732
[HCTR][08:58:00.628][INFO][RK0][main]: Eval Time for 1000 iters: 5.17941s
[HCTR][08:58:09.475][INFO][RK0][main]: Iter: 28000 Time(1000 iters): 14.0191s Loss: 0.00393799 lr:0.0302472
[HCTR][08:58:18.304][INFO][RK0][main]: Iter: 29000 Time(1000 iters): 8.82012s Loss: 0.00410887 lr:0.0275599
[HCTR][08:58:27.122][INFO][RK0][main]: Iter: 30000 Time(1000 iters): 8.80965s Loss: 0.00343625 lr:0.0249975
[HCTR][08:58:32.205][INFO][RK0][main]: Evaluation, AUC: 0.998878
[HCTR][08:58:32.205][INFO][RK0][main]: Eval Time for 1000 iters: 5.08249s
[HCTR][08:58:41.057][INFO][RK0][main]: Iter: 31000 Time(1000 iters): 13.9267s Loss: 0.00338647 lr:0.0225601
[HCTR][08:58:49.898][INFO][RK0][main]: Iter: 32000 Time(1000 iters): 8.83291s Loss: 0.00431207 lr:0.0202478
[HCTR][08:58:58.759][INFO][RK0][main]: Iter: 33000 Time(1000 iters): 8.85196s Loss: 0.00314963 lr:0.0180604
[HCTR][08:59:04.056][INFO][RK0][main]: Evaluation, AUC: 0.998967
[HCTR][08:59:04.056][INFO][RK0][main]: Eval Time for 1000 iters: 5.29728s
[HCTR][08:59:12.903][INFO][RK0][main]: Iter: 34000 Time(1000 iters): 14.1363s Loss: 0.00491561 lr:0.015998
[HCTR][08:59:21.769][INFO][RK0][main]: Iter: 35000 Time(1000 iters): 8.85741s Loss: 0.00385364 lr:0.0140606
[HCTR][08:59:30.614][INFO][RK0][main]: Iter: 36000 Time(1000 iters): 8.8366s Loss: 0.00431366 lr:0.0122482
[HCTR][08:59:35.777][INFO][RK0][main]: Evaluation, AUC: 0.999021
[HCTR][08:59:35.777][INFO][RK0][main]: Eval Time for 1000 iters: 5.16256s
[HCTR][08:59:44.585][INFO][RK0][main]: Iter: 37000 Time(1000 iters): 13.9628s Loss: 0.00293767 lr:0.0105609
[HCTR][08:59:53.412][INFO][RK0][main]: Iter: 38000 Time(1000 iters): 8.81858s Loss: 0.00274502 lr:0.0089985
[HCTR][09:00:02.255][INFO][RK0][main]: Iter: 39000 Time(1000 iters): 8.83457s Loss: 0.00254011 lr:0.00756112
[HCTR][09:00:07.380][INFO][RK0][main]: Evaluation, AUC: 0.999059
[HCTR][09:00:07.380][INFO][RK0][main]: Eval Time for 1000 iters: 5.1243s
[HCTR][09:00:16.245][INFO][RK0][main]: Iter: 40000 Time(1000 iters): 13.982s Loss: 0.00315883 lr:0.00624875
[HCTR][09:00:25.106][INFO][RK0][main]: Iter: 41000 Time(1000 iters): 8.85296s Loss: 0.0038635 lr:0.00506138
[HCTR][09:00:33.969][INFO][RK0][main]: Iter: 42000 Time(1000 iters): 8.85403s Loss: 0.0034295 lr:0.003999
[HCTR][09:00:39.221][INFO][RK0][main]: Evaluation, AUC: 0.999073
[HCTR][09:00:39.221][INFO][RK0][main]: Eval Time for 1000 iters: 5.2517s
[HCTR][09:00:48.067][INFO][RK0][main]: Iter: 43000 Time(1000 iters): 14.0899s Loss: 0.00349809 lr:0.00306162
[HCTR][09:00:56.913][INFO][RK0][main]: Iter: 44000 Time(1000 iters): 8.83807s Loss: 0.0017837 lr:0.00224925
[HCTR][09:01:05.775][INFO][RK0][main]: Iter: 45000 Time(1000 iters): 8.85327s Loss: 0.00304943 lr:0.00156188
[HCTR][09:01:10.893][INFO][RK0][main]: Evaluation, AUC: 0.999083
[HCTR][09:01:10.893][INFO][RK0][main]: Eval Time for 1000 iters: 5.11746s
[HCTR][09:01:19.722][INFO][RK0][main]: Iter: 46000 Time(1000 iters): 13.9386s Loss: 0.00260634 lr:0.0009995
[HCTR][09:01:28.590][INFO][RK0][main]: Iter: 47000 Time(1000 iters): 8.85932s Loss: 0.00273577 lr:0.000562125
[HCTR][09:01:37.437][INFO][RK0][main]: Iter: 48000 Time(1000 iters): 8.8387s Loss: 0.00348975 lr:0.00024975
[HCTR][09:01:42.659][INFO][RK0][main]: Evaluation, AUC: 0.999091
[HCTR][09:01:42.659][INFO][RK0][main]: Eval Time for 1000 iters: 5.22141s
[HCTR][09:01:51.535][INFO][RK0][main]: Iter: 49000 Time(1000 iters): 14.0898s Loss: 0.00397105 lr:6.23751e-05
[HCTR][09:01:51.576][INFO][RK0][main]: Rank0: Dump hash table from GPU0
[HCTR][09:01:51.576][INFO][RK0][main]: Rank0: Dump hash table from GPU1
[HCTR][09:01:51.583][INFO][RK0][main]: Rank0: Write hash table <key,value> pairs to file
[HCTR][09:01:51.662][INFO][RK0][main]: Done
[HCTR][09:01:51.671][INFO][RK0][main]: Dumping sparse weights to files, successful
[HCTR][09:01:51.671][INFO][RK0][main]: Dumping sparse optimzer states to files, successful
[HCTR][09:01:51.680][INFO][RK0][main]: Dumping dense weights to file, successful
[HCTR][09:01:51.681][INFO][RK0][main]: Dumping dense optimizer states to file, successful
[HCTR][09:02:00.584][INFO][RK0][main]: Finish 50000 iterations with batchsize: 65536 in 526.10s.
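The `lr` values printed in the log above are consistent with a polynomial decay driven by the logged `decay_start`, `decay_steps`, and `decay_power` settings. The sketch below is a reconstruction inferred from those printed values, not code taken from HugeCTR. The function name `poly_decay_lr`, the one-based step counter, and the linear warmup branch are assumptions made for illustration.

```python
def poly_decay_lr(iteration, base_lr=0.1, warmup_steps=1000,
                  decay_start=10000, decay_steps=40000,
                  decay_power=2.0, end_lr=1e-5):
    """Learning rate at a logged iteration, under an assumed schedule."""
    # Assumption: the scheduler counts steps from 1, so the value logged
    # at "Iter: N" corresponds to step N + 1.
    step = iteration + 1
    if step <= warmup_steps:
        # Assumption: linear warmup (not visible in the log above).
        return base_lr * step / warmup_steps
    if step <= decay_start:
        return base_lr
    if step <= decay_start + decay_steps:
        remaining = (decay_start + decay_steps - step) / decay_steps
        # end_lr acts as a floor on the decayed value.
        return max(end_lr, base_lr * remaining ** decay_power)
    return end_lr


# Compare against a few iterations that appear in the log.
for it in (9000, 10000, 11000, 30000, 49000):
    print(it, poly_decay_lr(it))
```

Under these assumptions the function returns roughly 0.0950576 at iteration 11000 and 0.0249975 at iteration 30000, which matches the logged values.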
Answer item similarity with DLRM embedding
In this section, we demonstrate how the output of HugeCTR training can be used to carry out simple inference tasks. Specifically, we show that the movie embeddings can answer simple item-to-item similarity queries. Such queries can serve as an efficient candidate generator that produces a small set of candidates before a deep learning model re-ranks them.
First, we read the embedding tables and extract the movie embeddings.
import struct
import pickle
import numpy as np

key_type = 'I64'
# Maps each key type to its struct format character and its size in bytes
key_type_map = {"I32": ["I", 4], "I64": ["q", 8]}
embedding_vec_size = 64

HUGE_CTR_VERSION = 2.21  # set HugeCTR version here, 2.2 for v2.2, 2.21 for v2.21
if HUGE_CTR_VERSION <= 2.2:
    each_key_size = key_type_map[key_type][1] + key_type_map[key_type][1] + 4 * embedding_vec_size
else:
    each_key_size = key_type_map[key_type][1] + 8 + 4 * embedding_vec_size

embedding_table = {}

with open("./hugeCTR_saved_model_DLRM/0_sparse_49000.model" + "/key", 'rb') as key_file, \
     open("./hugeCTR_saved_model_DLRM/0_sparse_49000.model" + "/emb_vector", 'rb') as vec_file:
    try:
        while True:
            # Read one key and its embedding vector (4-byte floats) per iteration
            key_buffer = key_file.read(key_type_map[key_type][1])
            vec_buffer = vec_file.read(4 * embedding_vec_size)
            if len(key_buffer) == 0 or len(vec_buffer) == 0:
                break  # end of file
            key = struct.unpack(key_type_map[key_type][0], key_buffer)[0]
            values = struct.unpack(str(embedding_vec_size) + "f", vec_buffer)
            embedding_table[key] = values
    except BaseException as error:
        print(error)
# Create mapping between the MovieId and the keys in the embedding table
def mid_to_key(mid):
    return mid + nb_users

def key_to_mid(key):
    return key - nb_users

max_key = max(embedding_table.keys())
item_embedding = np.zeros((max_key + 1, embedding_vec_size), dtype='float')
for i in embedding_table.keys():
    item_embedding[i] = embedding_table[i]
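The record-by-record loop above can also be expressed as two bulk reads. The following is a minimal sketch, assuming the `key` file is a flat array of 64-bit integers and the `emb_vector` file is a flat array of 32-bit floats in native byte order, which is the layout the `struct`-based loop reads. The helper name `load_embedding_table` is hypothetical, and the demo uses small synthetic files rather than a real HugeCTR model directory.

```python
import os
import tempfile

import numpy as np


def load_embedding_table(model_dir, embedding_vec_size, key_dtype=np.int64):
    """Read keys and embedding vectors from the two files in one pass each."""
    keys = np.fromfile(os.path.join(model_dir, "key"), dtype=key_dtype)
    vectors = np.fromfile(os.path.join(model_dir, "emb_vector"), dtype=np.float32)
    # One row per key; reshape raises ValueError if the file size does not fit.
    vectors = vectors.reshape(-1, embedding_vec_size)
    if len(keys) != len(vectors):
        raise ValueError("key and emb_vector files have different record counts")
    return keys, vectors


# Demo on synthetic files (3 keys, vector size 4) written to a temp directory.
with tempfile.TemporaryDirectory() as demo_dir:
    np.array([3, 0, 5], dtype=np.int64).tofile(os.path.join(demo_dir, "key"))
    np.arange(12, dtype=np.float32).reshape(3, 4).tofile(
        os.path.join(demo_dir, "emb_vector"))
    keys, vectors = load_embedding_table(demo_dir, embedding_vec_size=4)

# Scatter the vectors into a dense matrix indexed by key, as done above.
dense = np.zeros((keys.max() + 1, vectors.shape[1]), dtype=np.float32)
dense[keys] = vectors
print(dense.shape)
```

As in the loop-based version, rows for keys that never appear in the table stay at zero.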
Answer nearest neighbor queries
from scipy.spatial.distance import cdist

def find_similar_movies(nn_movie_id, item_embedding, k=10, metric="euclidean"):
    # Find the top-k most similar items according to one of the distance metrics: cosine or euclidean
    sim = 1 - cdist(item_embedding, item_embedding[nn_movie_id].reshape(1, -1), metric=metric)
    return sim.squeeze().argsort()[-k:][::-1]
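To see what such a query returns, it helps to run the same idea on a few hand-made vectors. The sketch below uses a hypothetical helper, `top_k_similar`, that sorts by distance in ascending order. This yields the same ranking as sorting `1 - distance` in descending order, as `find_similar_movies` does. The toy embeddings are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist


def top_k_similar(query_index, embeddings, k=3, metric="cosine"):
    """Return the indices of the k rows closest to the query row."""
    query = embeddings[query_index].reshape(1, -1)
    distances = cdist(embeddings, query, metric=metric).squeeze()
    # Smallest distance first; the query itself is normally at position 0.
    return distances.argsort()[:k]


# Four 2-D toy embeddings: row 1 points almost the same way as row 0,
# row 2 is orthogonal to it, and row 3 points the opposite way.
toy = np.array([[1.0, 0.0],
                [0.9, 0.1],
                [0.0, 1.0],
                [-1.0, 0.0]])

neighbors = top_k_similar(0, toy, k=3)
print(neighbors)
```

Cosine distance is not defined for all-zero rows, so a dense matrix that contains unused zero rows would need those rows filtered out before it is queried with `metric="cosine"`.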
import pandas as pd
movies = pd.read_csv("./data/ml-20m/movies.csv", index_col="movieId")
movies.index[:10]
Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype='int64', name='movieId')
item_embedding.shape
(165237, 64)
for movie_ID in movies.index[:10]:
    try:
        print("Query: ", movies.loc[movie_ID]["title"], movies.loc[movie_ID]["genres"])
        print("Similar movies: ")
        similar_movies = find_similar_movies(mid_to_key(movie_ID), item_embedding)
        # The first result is the query movie itself, so skip it
        for i in similar_movies[1:]:
            try:
                print(key_to_mid(i), movies.loc[key_to_mid(i)]["title"], movies.loc[key_to_mid(i)]["genres"])
            except Exception as e:
                # Skip keys that do not map to a movie in movies.csv
                pass
        print("=================================\n")
    except Exception as e:
        pass
Query: Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy
Similar movies:
339 While You Were Sleeping (1995) Comedy|Romance
2549 Wing Commander (1999) Action|Sci-Fi
=================================
Query: Jumanji (1995) Adventure|Children|Fantasy
Similar movies:
511 Program, The (1993) Action|Drama
1897 High Art (1998) Drama|Romance
314 Secret of Roan Inish, The (1994) Children|Drama|Fantasy|Mystery
28 Persuasion (1995) Drama|Romance
194 Smoke (1995) Comedy|Drama
80 White Balloon, The (Badkonake sefid) (1995) Children|Drama
10 GoldenEye (1995) Action|Adventure|Thriller
1084 Bonnie and Clyde (1967) Crime|Drama
649 Cold Fever (Á köldum klaka) (1995) Comedy|Drama
=================================
Query: Grumpier Old Men (1995) Comedy|Romance
Similar movies:
626 Thin Line Between Love and Hate, A (1996) Comedy
952 Around the World in 80 Days (1956) Adventure|Comedy
1119 Drunks (1995) Drama
353 Crow, The (1994) Action|Crime|Fantasy|Thriller
791 Last Klezmer: Leopold Kozlowski, His Life and Music, The (1994) Documentary
1115 Sleepover (1995) Drama
237 Forget Paris (1995) Comedy|Romance
389 Colonel Chabert, Le (1994) Drama|Romance|War
=================================
Query: Waiting to Exhale (1995) Comedy|Drama|Romance
Similar movies:
406 Federal Hill (1994) Drama
827 Convent, The (O Convento) (1995) Drama
266 Legends of the Fall (1994) Drama|Romance|War|Western
261 Little Women (1994) Drama
264 Enfer, L' (1994) Drama
511 Program, The (1993) Action|Drama
2506 Other Sister, The (1999) Comedy|Drama|Romance
1061 Sleepers (1996) Thriller
206 Unzipped (1995) Documentary
=================================
Query: Father of the Bride Part II (1995) Comedy
Similar movies:
2965 Omega Code, The (1999) Action
1050 Looking for Richard (1996) Documentary|Drama
=================================
Query: Heat (1995) Action|Crime|Thriller
Similar movies:
5370 Big Bad Mama II (1987) Action|Comedy
1528 Intimate Relations (1996) Comedy
1679 Chairman of the Board (1998) Comedy
603 Bye Bye, Love (1995) Comedy
2786 Haunted Honeymoon (1986) Comedy
=================================
Query: Sabrina (1995) Comedy|Romance
Similar movies:
260 Star Wars: Episode IV - A New Hope (1977) Action|Adventure|Sci-Fi
603 Bye Bye, Love (1995) Comedy
726 Last Dance (1996) Drama
47 Seven (a.k.a. Se7en) (1995) Mystery|Thriller
2162 NeverEnding Story II: The Next Chapter, The (1990) Adventure|Children|Fantasy
82 Antonia's Line (Antonia) (1995) Comedy|Drama
=================================
Query: Tom and Huck (1995) Adventure|Children
Similar movies:
368 Maverick (1994) Adventure|Comedy|Western
3579 I Dreamed of Africa (2000) Drama
477 What's Love Got to Do with It? (1993) Drama|Musical
423 Blown Away (1994) Action|Thriller
339 While You Were Sleeping (1995) Comedy|Romance
1693 Amistad (1997) Drama|Mystery
35 Carrington (1995) Drama|Romance
400 Homage (1995) Drama
=================================
Query: Sudden Death (1995) Action
Similar movies:
742 Thinner (1996) Horror|Thriller
481 Kalifornia (1993) Drama|Thriller
715 Horseman on the Roof, The (Hussard sur le toit, Le) (1995) Drama|Romance
237 Forget Paris (1995) Comedy|Romance
640 Diabolique (1996) Drama|Thriller
574 Spanking the Monkey (1994) Comedy|Drama
32 Twelve Monkeys (a.k.a. 12 Monkeys) (1995) Mystery|Sci-Fi|Thriller
8 Tom and Huck (1995) Adventure|Children
=================================
Query: GoldenEye (1995) Action|Adventure|Thriller
Similar movies:
257 Just Cause (1995) Mystery|Thriller
1913 Picnic at Hanging Rock (1975) Drama|Mystery
1224 Henry V (1989) Action|Drama|Romance|War
1542 Brassed Off (1996) Comedy|Drama|Romance
243 Gordy (1995) Children|Comedy|Fantasy
2335 Waterboy, The (1998) Comedy
1218 Killer, The (Die xue shuang xiong) (1989) Action|Crime|Drama|Thriller
477 What's Love Got to Do with It? (1993) Drama|Musical
1894 Six Days Seven Nights (1998) Adventure|Comedy|Romance
=================================