Hierarchical Parameter Server Demo


In HugeCTR version 3.5, we provide Python APIs for embedding table lookup with the HugeCTR Hierarchical Parameter Server (HPS). HPS supports different database backends and GPU embedding caches.

This notebook demonstrates how to use HPS with the HugeCTR Python APIs. Without loss of generality, the HPS APIs are used together with the ONNX Runtime APIs to create an ensemble inference model. In this ensemble, HPS is responsible for embedding table lookup and the ONNX model handles the feed-forward pass of the dense neural network.
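The following NumPy sketch shows how work is divided between the two parts of the ensemble. Random weights stand in for the trained model, and plain arrays stand in for HPS and ONNX Runtime. The shapes follow the model trained below: two embedding tables with vector sizes 16 and 32, two slots per table, 10 dense features, and a 1024-unit hidden layer. `embedding_lookup` and `dense_forward` are hypothetical names used only for illustration. They are not part of the HugeCTR or ONNX Runtime APIs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes mirroring the demo model (assumption: batch of 4 for brevity).
BATCH, SLOTS, DENSE_DIM = 4, 2, 10
VEC1, VEC2, HIDDEN = 16, 32, 1024
VOCAB = 40000  # 4 slots x 10000 keys in one global key space

# Hypothetical stand-ins for the two sparse embedding tables (the HPS side).
table1 = rng.normal(scale=0.01, size=(VOCAB, VEC1)).astype(np.float32)
table2 = rng.normal(scale=0.01, size=(VOCAB, VEC2)).astype(np.float32)

# Hypothetical dense weights (the ONNX side): concat -> fc1 -> ReLU -> fc2 -> sigmoid.
in_dim = SLOTS * VEC1 + SLOTS * VEC2 + DENSE_DIM  # 32 + 64 + 10 = 106
w1 = rng.normal(scale=0.05, size=(in_dim, HIDDEN)).astype(np.float32)
b1 = np.zeros(HIDDEN, dtype=np.float32)
w2 = rng.normal(scale=0.05, size=(HIDDEN, 1)).astype(np.float32)
b2 = np.zeros(1, dtype=np.float32)

def embedding_lookup(table, keys):
    """Sparse part: one vector per key, returned as a flat array."""
    return table[keys.ravel()].ravel()

def dense_forward(dense, emb1, emb2):
    """Dense part: flatten embeddings, concatenate with dense features, run the MLP."""
    x = np.concatenate([emb1.reshape(BATCH, -1), emb2.reshape(BATCH, -1), dense], axis=1)
    h = np.maximum(x @ w1 + b1, 0.0)        # fc1 + ReLU
    logits = h @ w2 + b2                    # fc2
    return 1.0 / (1.0 + np.exp(-logits))    # sigmoid, as in the converted ONNX graph

dense = rng.random((BATCH, DENSE_DIM), dtype=np.float32)
keys1 = rng.integers(0, 20000, size=(BATCH, SLOTS))      # slots 0-1 -> table 1
keys2 = rng.integers(20000, 40000, size=(BATCH, SLOTS))  # slots 2-3 -> table 2

emb1 = embedding_lookup(table1, keys1).reshape(BATCH, SLOTS, VEC1)
emb2 = embedding_lookup(table2, keys2).reshape(BATCH, SLOTS, VEC2)
pred = dense_forward(dense, emb1, emb2)
print(pred.shape)  # (4, 1)
```

Only the lookup step needs to know the key space. The dense network sees fixed-size float tensors only, which is why the ONNX model can be exported without its embedding tables.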

  1. Inference with HPS & ONNX

  2. Lookup the Embedding Vector from DLPack

  3. Multi-process inference

  4. Redis Cluster deployment (without TLS/SSL)

  5. Redis Cluster deployment (with TLS/SSL)


To set up the environment, refer to the HugeCTR Example Notebooks and follow the instructions there before running the following cells.

Data Generation

HugeCTR provides a tool to generate synthetic datasets. The Data Generator can generate datasets in different file formats and with different distributions. For this notebook, we generate one-hot Parquet datasets with a power-law distribution:

import hugectr
from hugectr.tools import DataGeneratorParams, DataGenerator

data_generator_params = DataGeneratorParams(
  format = hugectr.DataReaderType_t.Parquet,
  label_dim = 1,
  dense_dim = 10,
  num_slot = 4,
  i64_input_key = True,
  nnz_array = [1, 1, 1, 1],
  source = "./data_parquet/file_list.txt",
  eval_source = "./data_parquet/file_list_test.txt",
  slot_size_array = [10000, 10000, 10000, 10000],
  check_type = hugectr.Check_t.Non,
  dist_type = hugectr.Distribution_t.PowerLaw,
  power_law_type = hugectr.PowerLaw_t.Short,
  num_files = 16,
  eval_num_files = 4,
  num_samples_per_file = 40960)
data_generator = DataGenerator(data_generator_params)
data_generator.generate()
[HCTR][06:31:47.413][INFO][RK0][main]: Generate Parquet dataset
[HCTR][06:31:47.413][INFO][RK0][main]: train data folder: ./data_parquet, eval data folder: ./data_parquet, slot_size_array: 10000, 10000, 10000, 10000, nnz array: 1, 1, 1, 1, #files for train: 16, #files for eval: 4, #samples per file: 40960, Use power law distribution: 1, alpha of power law: 1.3
[HCTR][06:31:47.416][INFO][RK0][main]: ./data_parquet exist
[HCTR][06:31:47.423][INFO][RK0][main]: ./data_parquet/train/gen_0.parquet
[HCTR][06:31:50.739][INFO][RK0][main]: ./data_parquet/train/gen_1.parquet
[HCTR][06:31:50.846][INFO][RK0][main]: ./data_parquet/train/gen_2.parquet
[HCTR][06:31:50.929][INFO][RK0][main]: ./data_parquet/train/gen_3.parquet
[HCTR][06:31:51.011][INFO][RK0][main]: ./data_parquet/train/gen_4.parquet
[HCTR][06:31:51.092][INFO][RK0][main]: ./data_parquet/train/gen_5.parquet
[HCTR][06:31:51.171][INFO][RK0][main]: ./data_parquet/train/gen_6.parquet
[HCTR][06:31:51.250][INFO][RK0][main]: ./data_parquet/train/gen_7.parquet
[HCTR][06:31:51.329][INFO][RK0][main]: ./data_parquet/train/gen_8.parquet
[HCTR][06:31:51.407][INFO][RK0][main]: ./data_parquet/train/gen_9.parquet
[HCTR][06:31:51.485][INFO][RK0][main]: ./data_parquet/train/gen_10.parquet
[HCTR][06:31:51.562][INFO][RK0][main]: ./data_parquet/train/gen_11.parquet
[HCTR][06:31:51.638][INFO][RK0][main]: ./data_parquet/train/gen_12.parquet
[HCTR][06:31:51.715][INFO][RK0][main]: ./data_parquet/train/gen_13.parquet
[HCTR][06:31:51.792][INFO][RK0][main]: ./data_parquet/train/gen_14.parquet
[HCTR][06:31:51.868][INFO][RK0][main]: ./data_parquet/train/gen_15.parquet
[HCTR][06:31:51.962][INFO][RK0][main]: ./data_parquet/file_list.txt done!
[HCTR][06:31:51.986][INFO][RK0][main]: ./data_parquet/val/gen_0.parquet
[HCTR][06:31:52.064][INFO][RK0][main]: ./data_parquet/val/gen_1.parquet
[HCTR][06:31:52.142][INFO][RK0][main]: ./data_parquet/val/gen_2.parquet
[HCTR][06:31:52.218][INFO][RK0][main]: ./data_parquet/val/gen_3.parquet
[HCTR][06:31:52.296][INFO][RK0][main]: ./data_parquet/file_list_test.txt done!

Train from Scratch

We can train from scratch by performing the following steps with Python APIs:

  1. Create the solver, reader and optimizer, then initialize the model.

  2. Construct the model graph by adding input, sparse embedding and dense layers in order.

  3. Compile the model and print a summary of the model graph.

  4. Dump the model graph to a JSON file.

  5. Fit the model, which implicitly saves the model weights and optimizer states.

  6. Dump one batch of evaluation results to files.

import os
import hugectr
from mpi4py import MPI
import numpy as np
solver = hugectr.CreateSolver(model_name = "hps_demo",
                              max_eval_batches = 1,
                              batchsize_eval = 1024,
                              batchsize = 1024,
                              lr = 0.001,
                              vvgpu = [[0]],
                              i64_input_key = True,
                              repeat_dataset = True,
                              use_cuda_graph = True)
reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Parquet,
                                  source = ["./data_parquet/file_list.txt"],
                                  eval_source = "./data_parquet/file_list_test.txt",
                                  check_type = hugectr.Check_t.Non,
                                  slot_size_array = [10000, 10000, 10000, 10000])
optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam)
model = hugectr.Model(solver, reader, optimizer)
model.add(hugectr.Input(label_dim = 1, label_name = "label",
                        dense_dim = 10, dense_name = "dense",
                        data_reader_sparse_param_array = 
                        [hugectr.DataReaderSparseParam("data1", [1, 1], True, 2),
                        hugectr.DataReaderSparseParam("data2", [1, 1], True, 2)]))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash, 
                            workspace_size_per_gpu_in_mb = 4,
                            embedding_vec_size = 16,
                            combiner = "sum",
                            sparse_embedding_name = "sparse_embedding1",
                            bottom_name = "data1",
                            optimizer = optimizer))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash, 
                            workspace_size_per_gpu_in_mb = 8,
                            embedding_vec_size = 32,
                            combiner = "sum",
                            sparse_embedding_name = "sparse_embedding2",
                            bottom_name = "data2",
                            optimizer = optimizer))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                            bottom_names = ["sparse_embedding1"],
                            top_names = ["reshape1"],
                            leading_dim = 32))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                            bottom_names = ["sparse_embedding2"],
                            top_names = ["reshape2"],
                            leading_dim = 64))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                            bottom_names = ["reshape1", "reshape2", "dense"], top_names = ["concat1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["concat1"],
                            top_names = ["fc1"],
                            num_output = 1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                            bottom_names = ["fc1"],
                            top_names = ["relu1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["relu1"],
                            top_names = ["fc2"],
                            num_output = 1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
                            bottom_names = ["fc2", "label"],
                            top_names = ["loss"]))
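model.compile()
model.summary()
model.graph_to_json("hps_demo.json")
model.fit(max_iter = 1100, display = 200, eval_interval = 1000, snapshot = 1000, snapshot_prefix = "hps_demo")

# Dump one batch of evaluation results of fc2 as the reference for the inference checks below
ground_truth = model.check_out_tensor("fc2", hugectr.Tensor_t.Evaluate)
np.save("ground_truth.npy", ground_truth)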
HugeCTR Version: 23.8
====================================================Model Init=====================================================
[HCTR][06:32:11.556][INFO][RK0][main]: Initialize model: hps_demo
[HCTR][06:32:11.556][INFO][RK0][main]: Global seed is 2598678435
[HCTR][06:32:11.561][INFO][RK0][main]: Device to NUMA mapping:
[HCTR][06:32:11.642][INFO][RK0][main]:   GPU 0 ->  node 0
[HCTR][06:32:15.564][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
[HCTR][06:32:15.564][DEBUG][RK0][main]: [device 0] allocating 0.0000 GB, available 30.0886 
[HCTR][06:32:15.564][INFO][RK0][main]: Start all2all warmup
[HCTR][06:32:15.565][INFO][RK0][main]: End all2all warmup
[HCTR][06:32:15.566][INFO][RK0][main]: Using All-reduce algorithm: NCCL
[HCTR][06:32:15.567][INFO][RK0][main]: Device 0: Tesla V100-SXM2-32GB
[HCTR][06:32:15.636][INFO][RK0][main]: eval source ./data_parquet/file_list_test.txt max_row_group_size 40960
[HCTR][06:32:15.808][INFO][RK0][main]: train source ./data_parquet/file_list.txt max_row_group_size 40960
[HCTR][06:32:15.810][INFO][RK0][main]: num of DataReader workers for train: 1
[HCTR][06:32:15.810][INFO][RK0][main]: num of DataReader workers for eval: 1
[HCTR][06:32:15.937][INFO][RK0][main]: max_vocabulary_size_per_gpu_=21845
[HCTR][06:32:15.938][DEBUG][RK0][main]: [device 0] allocating 0.0047 GB, available 29.6921 
[HCTR][06:32:15.939][INFO][RK0][main]: max_vocabulary_size_per_gpu_=21845
[HCTR][06:32:15.940][DEBUG][RK0][main]: [device 0] allocating 0.0092 GB, available 29.6824 
[HCTR][06:32:15.940][INFO][RK0][main]: Graph analysis to resolve tensor dependency
[HCTR][06:32:15.940][WARNING][RK0][main]: You are using reshape layer with parameter leading_dim. This will be deprecated in the future. Please switch to parameter shape.
[HCTR][06:32:15.940][WARNING][RK0][main]: You are using reshape layer with parameter leading_dim. This will be deprecated in the future. Please switch to parameter shape.
[HCTR][06:32:15.946][WARNING][RK0][main]: You are using reshape layer with parameter leading_dim. This will be deprecated in the future. Please switch to parameter shape.
[HCTR][06:32:15.946][WARNING][RK0][main]: You are using reshape layer with parameter leading_dim. This will be deprecated in the future. Please switch to parameter shape.
===================================================Model Compile===================================================
[HCTR][06:32:17.205][INFO][RK0][main]: gpu0 start to init embedding
[HCTR][06:32:17.205][INFO][RK0][main]: gpu0 init embedding done
[HCTR][06:32:17.205][INFO][RK0][main]: gpu0 start to init embedding
[HCTR][06:32:17.206][INFO][RK0][main]: gpu0 init embedding done
[HCTR][06:32:17.207][INFO][RK0][main]: Starting AUC NCCL warm-up
[HCTR][06:32:17.208][INFO][RK0][main]: Warm-up done
===================================================Model Summary===================================================
[HCTR][06:32:17.208][INFO][RK0][main]: Model structure on each GPU
Label                                   Dense                         Sparse                        
label                                   dense                          data1,data2                   
(1024, 1)                               (1024, 10)                              
Layer Type                              Input Name                    Output Name                   Output Shape                  
DistributedSlotSparseEmbeddingHash      data1                         sparse_embedding1             (1024, 2, 16)                 
DistributedSlotSparseEmbeddingHash      data2                         sparse_embedding2             (1024, 2, 32)                 
Reshape                                 sparse_embedding1             reshape1                      (1024, 32)                    
Reshape                                 sparse_embedding2             reshape2                      (1024, 64)                    
Concat                                  reshape1                      concat1                       (1024, 106)                   
InnerProduct                            concat1                       fc1                           (1024, 1024)                  
ReLU                                    fc1                           relu1                         (1024, 1024)                  
InnerProduct                            relu1                         fc2                           (1024, 1)                     
BinaryCrossEntropyLoss                  fc2                           loss                                                        
[HCTR][06:32:17.212][INFO][RK0][main]: Save the model graph to hps_demo.json successfully
=====================================================Model Fit=====================================================
[HCTR][06:32:17.213][INFO][RK0][main]: Use non-epoch mode with number of iterations: 1100
[HCTR][06:32:17.213][INFO][RK0][main]: Training batchsize: 1024, evaluation batchsize: 1024
[HCTR][06:32:17.213][INFO][RK0][main]: Evaluation interval: 1000, snapshot interval: 1000
[HCTR][06:32:17.213][INFO][RK0][main]: Dense network trainable: True
[HCTR][06:32:17.213][INFO][RK0][main]: Sparse embedding sparse_embedding1 trainable: True
[HCTR][06:32:17.213][INFO][RK0][main]: Sparse embedding sparse_embedding2 trainable: True
[HCTR][06:32:17.213][INFO][RK0][main]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
[HCTR][06:32:17.213][INFO][RK0][main]: lr: 0.001000, warmup_steps: 1, end_lr: 0.000000
[HCTR][06:32:17.213][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
[HCTR][06:32:17.213][INFO][RK0][main]: Training source file: ./data_parquet/file_list.txt
[HCTR][06:32:17.213][INFO][RK0][main]: Evaluation source file: ./data_parquet/file_list_test.txt
[HCTR][06:32:17.658][INFO][RK0][main]: Iter: 200 Time(200 iters): 0.444961s Loss: 0.693355 lr:0.001
[HCTR][06:32:18.167][INFO][RK0][main]: Iter: 400 Time(200 iters): 0.508793s Loss: 0.694358 lr:0.001
[HCTR][06:32:18.589][INFO][RK0][main]: Iter: 600 Time(200 iters): 0.422282s Loss: 0.695494 lr:0.001
[HCTR][06:32:18.764][INFO][RK0][main]: Iter: 800 Time(200 iters): 0.175263s Loss: 0.691037 lr:0.001
[HCTR][06:32:18.939][INFO][RK0][main]: Iter: 1000 Time(200 iters): 0.174492s Loss: 0.688767 lr:0.001
[HCTR][06:32:18.940][INFO][RK0][main]: Evaluation, AUC: 0.503806
[HCTR][06:32:18.940][INFO][RK0][main]: Eval Time for 1 iters: 0.000913s
[HCTR][06:32:18.941][INFO][RK0][main]: Rank0: Write hash table to file
[HCTR][06:32:19.024][INFO][RK0][main]: Rank0: Write hash table to file
[HCTR][06:32:19.092][INFO][RK0][main]: Dumping sparse weights to files, successful
[HCTR][06:32:19.093][INFO][RK0][main]: Rank0: Write optimzer state to file
[HCTR][06:32:19.123][INFO][RK0][main]: Done
[HCTR][06:32:19.123][INFO][RK0][main]: Rank0: Write optimzer state to file
[HCTR][06:32:19.148][INFO][RK0][main]: Done
[HCTR][06:32:19.150][INFO][RK0][main]: Rank0: Write optimzer state to file
[HCTR][06:32:19.203][INFO][RK0][main]: Done
[HCTR][06:32:19.203][INFO][RK0][main]: Rank0: Write optimzer state to file
[HCTR][06:32:19.252][INFO][RK0][main]: Done
[HCTR][06:32:19.252][INFO][RK0][main]: Dumping sparse optimzer states to files, successful
[HCTR][06:32:19.262][INFO][RK0][main]: Dumping dense weights to file, successful
[HCTR][06:32:19.279][INFO][RK0][main]: Dumping dense optimizer states to file, successful
[HCTR][06:32:19.368][INFO][RK0][main]: Finish 1100 iterations with batchsize: 1024 in 2.16s.

Convert HugeCTR to ONNX

We will convert the saved HugeCTR models to ONNX using the HugeCTR to ONNX Converter. For more information about the converter, refer to the README in the onnx_converter directory of the repository.

To double-check correctness, we convert the model twice. The first conversion includes the sparse embedding models (convert_embedding = True). The second conversion leaves them out (convert_embedding = False).

import hugectr2onnx
hugectr2onnx.converter.convert(onnx_model_path = "hps_demo_with_embedding.onnx",
                            graph_config = "hps_demo.json",
                            dense_model = "hps_demo_dense_1000.model",
                            convert_embedding = True,
                            sparse_models = ["hps_demo0_sparse_1000.model", "hps_demo1_sparse_1000.model"])

hugectr2onnx.converter.convert(onnx_model_path = "hps_demo_without_embedding.onnx",
                            graph_config = "hps_demo.json",
                            dense_model = "hps_demo_dense_1000.model",
                            convert_embedding = False)
[HUGECTR2ONNX][INFO]: Converting Data layer to ONNX
[HUGECTR2ONNX][INFO]: Converting DistributedSlotSparseEmbeddingHash layer to ONNX
[HUGECTR2ONNX][INFO]: Converting DistributedSlotSparseEmbeddingHash layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Reshape layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Reshape layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Concat layer to ONNX
[HUGECTR2ONNX][INFO]: Converting InnerProduct layer to ONNX
[HUGECTR2ONNX][INFO]: Converting ReLU layer to ONNX
[HUGECTR2ONNX][INFO]: Converting InnerProduct layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Sigmoid layer to ONNX
[HUGECTR2ONNX][INFO]: The model is checked!
[HUGECTR2ONNX][INFO]: The model is saved at hps_demo_with_embedding.onnx
[HUGECTR2ONNX][INFO]: Converting Data layer to ONNX
Skip sparse embedding layers in converted ONNX model
[HUGECTR2ONNX][INFO]: Converting DistributedSlotSparseEmbeddingHash layer to ONNX
Skip sparse embedding layers in converted ONNX model
[HUGECTR2ONNX][INFO]: Converting DistributedSlotSparseEmbeddingHash layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Reshape layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Reshape layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Concat layer to ONNX
[HUGECTR2ONNX][INFO]: Converting InnerProduct layer to ONNX
[HUGECTR2ONNX][INFO]: Converting ReLU layer to ONNX
[HUGECTR2ONNX][INFO]: Converting InnerProduct layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Sigmoid layer to ONNX
[HUGECTR2ONNX][INFO]: The model is checked!
[HUGECTR2ONNX][INFO]: The model is saved at hps_demo_without_embedding.onnx

1. Inference with HPS & ONNX

We will run inference by performing the following steps with Python APIs:

  1. Configure the HPS hyperparameters. Refer to the HPS configuration documentation for details.

  2. Initialize the HPS object, which is responsible for embedding table lookup.

  3. Load the Parquet data.

  4. Run inference with the HPS object and the ONNX inference session of hps_demo_without_embedding.onnx.

  5. Check correctness by comparing the predictions with the dumped evaluation results.

  6. Run inference with the ONNX inference session of hps_demo_with_embedding.onnx (double check).
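Before each lookup, the inference code below shifts each slot's categorical keys by `key_offset`. The sparse models appear to be keyed by these global IDs. The keys printed later fall into slot-specific ranges, for example 10028 for slot 1 and 30047 for slot 3. The following sketch reuses the demo's `slot_size_array` but uses made-up local keys. It shows how `key_offset` is derived and how many values a flat lookup returns for each table. The lookup sizes follow the `reshape(batch_size, 2, 16)` and `reshape(batch_size, 2, 32)` calls in the notebook.

```python
import numpy as np

slot_size_array = [10000, 10000, 10000, 10000]
# Exclusive prefix sum: slot i starts at sum(slot_size_array[:i]).
key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]
print(key_offset.tolist())  # [0, 10000, 20000, 30000]

# Made-up local keys: one key per slot (one-hot), each in [0, slot_size).
local_keys = np.array([[85, 28, 5, 47],
                       [0, 4, 4, 1],
                       [10, 0, 37, 1]], dtype=np.int64)
batch_size = local_keys.shape[0]
global_keys = local_keys + key_offset  # broadcast the offsets across the batch

# Slots 0-1 feed embedding table 0 (vector size 16); slots 2-3 feed table 1 (vector size 32).
cat_input1 = global_keys[:, 0:2]
cat_input2 = global_keys[:, 2:4]

# A flat lookup yields num_keys * embedding_vec_size floats, i.e. batch_size * 2 * vec_size.
expected_len1 = batch_size * 2 * 16
expected_len2 = batch_size * 2 * 32
print(cat_input1.flatten(), expected_len1)
print(cat_input2.flatten(), expected_len2)
```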

from hugectr.inference import HPS, ParameterServerConfig, InferenceParams

import pandas as pd
import numpy as np

import onnxruntime as ort

slot_size_array = [10000, 10000, 10000, 10000]
key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]
batch_size = 1024

# 1. Configure the HPS hyperparameters
ps_config = ParameterServerConfig(
           emb_table_name = {"hps_demo": ["sparse_embedding1", "sparse_embedding2"]},
           embedding_vec_size = {"hps_demo": [16, 32]},
           max_feature_num_per_sample_per_emb_table = {"hps_demo": [2, 2]},
           inference_params_array = [
              InferenceParams(
                model_name = "hps_demo",
                max_batchsize = batch_size,
                hit_rate_threshold = 1.0,
                dense_model_file = "",
                sparse_model_files = ["hps_demo0_sparse_1000.model", "hps_demo1_sparse_1000.model"],
                deployed_devices = [0],
                use_gpu_embedding_cache = True,
                cache_size_percentage = 0.5,
                i64_input_key = True)
           ])

# 2. Initialize the HPS object
hps = HPS(ps_config)

# 3. Loading the Parquet data.
df = pd.read_parquet("data_parquet/val/gen_0.parquet")
dense_input_columns = df.columns[1:11]
cat_input1_columns = df.columns[11:13]
cat_input2_columns = df.columns[13:15]
dense_input = df[dense_input_columns].loc[0:batch_size-1].to_numpy(dtype=np.float32)
cat_input1 = (df[cat_input1_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[0:2]).reshape((batch_size, 2, 1))
cat_input2 = (df[cat_input2_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[2:4]).reshape((batch_size, 2, 1))

# 4. Make inference from the HPS object and the ONNX inference session of `hps_demo_without_embedding.onnx`.
embedding1 = hps.lookup(cat_input1.flatten(), "hps_demo", 0).reshape(batch_size, 2, 16)
embedding2 = hps.lookup(cat_input2.flatten(), "hps_demo", 1).reshape(batch_size, 2, 32)
sess = ort.InferenceSession("hps_demo_without_embedding.onnx")
res = sess.run(output_names=[sess.get_outputs()[0].name],
               input_feed={sess.get_inputs()[0].name: dense_input,
               sess.get_inputs()[1].name: embedding1,
               sess.get_inputs()[2].name: embedding2})
pred = res[0]

# 5. Check the correctness by comparing with dumped evaluation results.
ground_truth = np.load("ground_truth.npy").flatten()
print("ground_truth: ", ground_truth)

diff = pred.flatten()-ground_truth
mse = np.mean(diff*diff)
print("pred: ", pred)
print("mse between pred and ground_truth: ", mse)

# 6. Make inference with the ONNX inference session of `hps_demo_with_embedding.onnx` (double check).
sess_ref = ort.InferenceSession("hps_demo_with_embedding.onnx")
res_ref = sess_ref.run(output_names=[sess_ref.get_outputs()[0].name],
                       input_feed={sess_ref.get_inputs()[0].name: dense_input,
                       sess_ref.get_inputs()[1].name: cat_input1,
                       sess_ref.get_inputs()[2].name: cat_input2})
pred_ref = res_ref[0]
diff_ref = pred_ref.flatten()-ground_truth
mse_ref = np.mean(diff_ref*diff_ref)
print("pred_ref: ", pred_ref)
print("mse between pred_ref and ground_truth: ", mse_ref)
[HCTR][06:32:40.791][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
====================================================HPS Create====================================================
[HCTR][06:32:40.791][INFO][RK0][main]: Creating HashMap CPU database backend...
[HCTR][06:32:40.791][DEBUG][RK0][main]: Created blank database backend in local memory!
[HCTR][06:32:40.791][INFO][RK0][main]: Volatile DB: initial cache rate = 1
[HCTR][06:32:40.791][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
[HCTR][06:32:40.791][DEBUG][RK0][main]: Created raw model loader in local memory!
[HCTR][06:32:41.123][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding1; cached 18488 / 18488 embeddings in volatile database (HashMapBackend); load: 18488 / 18446744073709551615 (0.00%).
[HCTR][06:32:41.431][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding2; cached 18470 / 18470 embeddings in volatile database (HashMapBackend); load: 18470 / 18446744073709551615 (0.00%).
[HCTR][06:32:41.431][DEBUG][RK0][main]: Real-time subscribers created!
[HCTR][06:32:41.431][INFO][RK0][main]: Creating embedding cache in device 0.
[HCTR][06:32:41.437][INFO][RK0][main]: Model name: hps_demo
[HCTR][06:32:41.437][INFO][RK0][main]: Max batch size: 1024
[HCTR][06:32:41.437][INFO][RK0][main]: Fuse embedding tables: False
[HCTR][06:32:41.437][INFO][RK0][main]: Number of embedding tables: 2
[HCTR][06:32:41.437][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
[HCTR][06:32:41.437][INFO][RK0][main]: Embedding cache type: dynamic
[HCTR][06:32:41.437][INFO][RK0][main]: Use I64 input key: True
[HCTR][06:32:41.437][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
[HCTR][06:32:41.437][INFO][RK0][main]: The size of thread pool: 80
[HCTR][06:32:41.437][INFO][RK0][main]: The size of worker memory pool: 2
[HCTR][06:32:41.437][INFO][RK0][main]: The size of refresh memory pool: 1
[HCTR][06:32:41.437][INFO][RK0][main]: The refresh percentage : 0.000000
[HCTR][06:32:41.453][INFO][RK0][main]: LookupSession i64_input_key: True
[HCTR][06:32:41.453][INFO][RK0][main]: Creating lookup session for hps_demo on device: 0
ground_truth:  [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
pred:  [[0.48954916]
 [0.5065045 ]
 [0.4792769 ]]
mse between pred and ground_truth:  2.3887142e-15
pred_ref:  [[0.48954916]
 [0.5065045 ]
 [0.4792769 ]]
mse between pred_ref and ground_truth:  2.3887142e-15
2023-09-20 06:32:41.566238532 [W:onnxruntime:, CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.

2. Lookup the Embedding Vector from DLPack

We also provide a lookup_fromdlpack interface. It queries embedding keys stored on the CPU and returns the embedding vectors on either the GPU or the CPU.

  1. Suppose you have created a PyTorch/TensorFlow tensor that stores the embedding keys.

  2. Convert the embedding key tensor to a DLPack capsule through the corresponding platform's to_dlpack function.

  3. Create an empty tensor as a buffer to store the embedding vectors.

  4. Convert the buffer tensor to a DLPack capsule.

  5. Look up the embedding vectors of the corresponding embedding keys directly through the lookup_fromdlpack interface, which writes them into the embedding vector buffer tensor.

  6. If the output capsule is allocated on the GPU, specify a device_id in the lookup_fromdlpack interface to select the corresponding embedding cache. If it is not specified, the default is device 0.
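Steps 3 to 5 work because DLPack exchanges tensors without copying them. The capsule exported from the buffer refers to the buffer's own memory. Once the lookup has filled that memory, a tensor re-imported with `from_dlpack` sees the vectors. The following CPU-only NumPy sketch illustrates this zero-copy contract and makes the following assumptions:

- `fake_lookup_into` is a hypothetical stand-in for `lookup_fromdlpack` that writes into a caller-allocated buffer.
- The table contents are made up.
- `np.from_dlpack` requires NumPy 1.22 or later.

```python
import numpy as np

VEC_SIZE = 16
# Hypothetical embedding table standing in for the HPS-managed table.
table = np.arange(100 * VEC_SIZE, dtype=np.float32).reshape(100, VEC_SIZE)

def fake_lookup_into(keys, out):
    """Hypothetical stand-in for lookup_fromdlpack: fills a caller-allocated buffer in place."""
    if out.size != keys.size * VEC_SIZE:
        raise ValueError("output buffer must hold len(keys) * VEC_SIZE floats")
    out.reshape(-1)[:] = table[keys].reshape(-1)  # reshape of a contiguous array is a view

keys = np.array([3, 7, 42], dtype=np.int64)
out = np.empty((1, keys.size * VEC_SIZE), dtype=np.float32)  # step 3: empty buffer

# Steps 2 and 4: exchange through DLPack; the imported arrays alias the original memory.
key_view = np.from_dlpack(keys)
out_view = np.from_dlpack(out)

# Step 5: the lookup writes into the original buffer...
fake_lookup_into(key_view, out)
# ...and the DLPack view observes the result without any copy.
print(out_view[0, :4])  # [48. 49. 50. 51.]
```

The caller owns the output allocation, so it also decides where the vectors land: a CPU tensor or a `cuda:0` tensor. This matches how the PyTorch and TensorFlow examples below choose the output device.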

Note: Make sure that TensorFlow or PyTorch is installed correctly in the merlin-hugectr container:

pip install tensorflow
pip install torch
embedding1 = hps.lookup(cat_input1.flatten(), "hps_demo", 0).reshape(batch_size, 2, 16)
embedding2 = hps.lookup(cat_input2.flatten(), "hps_demo", 1).reshape(batch_size, 2, 32)

# 1. Look up from dlpack for Pytorch tensor on CPU
print(" Look up from dlpack for Pytorch tensor")
import torch.utils.dlpack
import os
print("************Look up from pytorch dlpack on CPU")
device = torch.device("cpu")
key = torch.tensor(cat_input1.flatten(),dtype=torch.int64, device=device)
out = torch.empty((1,cat_input1.flatten().shape[0]*16), dtype=torch.float32, device=device)
key_capsule = torch.utils.dlpack.to_dlpack(key)
print("The device type of embedding keys that lookup dlpack from hps interface for embedding table 0 of hps_demo: {}, the keys: {}".format(key.device, key))
out_capsule = torch.utils.dlpack.to_dlpack(out)
# Lookup the embedding vectors from dlpack
hps.lookup_fromdlpack(key_capsule, out_capsule,"hps_demo", 0)
out_put = torch.utils.dlpack.from_dlpack(out_capsule)
print("[The device type of embedding vectors that lookup dlpack from hps interface for embedding table 0 of hps_demo: {}, the vectors: {}\n".format(out_put.device, out_put))
diff = out_put-embedding1.reshape(1,cat_input1.flatten().shape[0]*16)
# Use the mean squared error so that positive and negative differences cannot cancel out
mse = (diff*diff).mean()
if mse > 1e-4:
    raise RuntimeError("Too large mse between pytorch dlpack on cpu and native HPS lookup api: {}".format(mse))
else:
    print("Pytorch dlpack on cpu  results are consistent with native HPS lookup api, mse: {}".format(mse))

# 2. Look up from dlpack for Pytorch tensor on GPU
print("************Look up from pytorch dlpack on GPU")
cuda_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
key = torch.tensor(cat_input1.flatten(),dtype=torch.int64, device=device)
key_capsule = torch.utils.dlpack.to_dlpack(key)
out = torch.empty((cat_input1.flatten().shape[0]*16), dtype=torch.float32, device=cuda_device)
out_capsule = torch.utils.dlpack.to_dlpack(out)
hps.lookup_fromdlpack(key_capsule, out_capsule,"hps_demo", 0)
out_put = torch.utils.dlpack.from_dlpack(out_capsule)
print("The device type of embedding vectors that lookup dlpack from hps interface for embedding table 0 of hps_demo: {}, the vectors: {}\n\n".format(out_put.device, out_put))
diff = out_put.cpu()-embedding1.reshape(1,cat_input1.flatten().shape[0]*16)
mse = (diff*diff).mean()
if mse > 1e-3:
    raise RuntimeError("Too large mse between pytorch dlpack on GPU and native HPS lookup api: {}".format(mse))
else:
    print("Pytorch dlpack on GPU results are consistent with native HPS lookup api, mse: {}".format(mse))
 Look up from dlpack for Pytorch tensor
************Look up from pytorch dlpack on CPU
The device type of embedding keys that lookup dlpack from hps interface for embedding table 0 of hps_demo: cpu, the keys: tensor([   85, 10028,     0,  ..., 10004,    10, 10000])
[The device type of embedding vectors that lookup dlpack from hps interface for embedding table 0 of hps_demo: cpu, the vectors: tensor([[-0.0307,  0.0264, -0.0294,  ...,  0.0151, -0.0281,  0.0088]])

Pytorch dlpack on cpu  results are consistent with native HPS lookup api, mse: 0.0
************Look up from pytorch dlpack on GPU
The device type of embedding vectors that lookup dlpack from hps interface for embedding table 0 of hps_demo: cuda:0, the vectors: tensor([-0.0307,  0.0264, -0.0294,  ...,  0.0151, -0.0281,  0.0088],

Pytorch dlpack on GPU results are consistent with native HPS lookup api, mse: 0.0
# 3. Look up from dlpack for tensorflow tensor on CPU
print("Look up from dlpack for Tensorflow tensor")
from tensorflow.python.dlpack import dlpack  
import tensorflow as tf
from tensorflow.python.eager import context
from tensorflow.python.framework import dtypes
print("***************Look up from tensorflow dlpack on CPU**********")
with tf.device('/CPU:0'):
    key_tensor = tf.constant(cat_input2.flatten(),dtype=tf.int64)
    out_tensor = tf.zeros([1, cat_input2.flatten().shape[0]*32],dtype=tf.float32)
    print("The device type of embedding keys that lookup dlpack from hps interface for embedding table 1 of hps_demo: {}, the keys: {}".format(key_tensor.device, key_tensor))
    key_capsule = tf.experimental.dlpack.to_dlpack(key_tensor)
    out_dlcapsule = tf.experimental.dlpack.to_dlpack(out_tensor)
hps.lookup_fromdlpack(key_capsule,out_dlcapsule, "hps_demo", 1)
out = tf.experimental.dlpack.from_dlpack(out_dlcapsule)
print("The device type of embedding vectors that lookup dlpack from hps interface for embedding table 1 of hps_demo: {}, the vectors: {}\n".format(out.device, out))
diff = out-embedding2.reshape(1,cat_input2.flatten().shape[0]*32)
mse = tf.reduce_mean(tf.square(diff))
if mse > 1e-3:
    raise RuntimeError("Too large mse between tensorflow dlpack on cpu and native HPS lookup api: {}".format(mse))
else:
    print("tensorflow dlpack on CPU results are consistent with native HPS lookup api, mse: {}".format(mse))
# 4. Look up from dlpack for tensorflow tensor on GPU
print("***************Look up from tensorflow dlpack on GPU**********")
with tf.device('/GPU:0'):
    key_tensor = tf.constant(cat_input2.flatten(),dtype=tf.int64)
    out_tensor = tf.zeros([cat_input2.flatten().shape[0]*32],dtype=tf.float32)
    key_capsule = tf.experimental.dlpack.to_dlpack(key_tensor)
    out_dlcapsule = tf.experimental.dlpack.to_dlpack(out_tensor)
hps.lookup_fromdlpack(key_capsule,out_dlcapsule, "hps_demo", 1)
out= tf.experimental.dlpack.from_dlpack(out_dlcapsule)
print("[HUGECTR][INFO] The device type of embedding vectors that lookup dlpack from hps interface for embedding table 1 of hps_demo: {}, the vectors: {}\n".format(out.device, out))
diff = out-embedding2.reshape(1,cat_input2.flatten().shape[0]*32)
mse = tf.reduce_mean(tf.square(diff))
if mse > 1e-3:
    raise RuntimeError("Too large mse between tensorflow dlpack on GPU and native HPS lookup api: {}".format(mse))
else:
    print("tensorflow dlpack on GPU results are consistent with native HPS lookup api, mse: {}".format(mse))
Look up from dlpack for Tensorflow tensor
2023-09-20 06:34:21.729218: I tensorflow/core/platform/] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
***************Look up from tensorflow dlpack on CPU**********
2023-09-20 06:34:44.168630: I tensorflow/core/common_runtime/gpu/] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30048 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
2023-09-20 06:34:44.170043: I tensorflow/core/common_runtime/gpu/] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 30184 MB memory:  -> device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0
2023-09-20 06:34:44.171618: I tensorflow/core/common_runtime/gpu/] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 30184 MB memory:  -> device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0
2023-09-20 06:34:44.173095: I tensorflow/core/common_runtime/gpu/] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 30184 MB memory:  -> device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0b:00.0, compute capability: 7.0
2023-09-20 06:34:44.174795: I tensorflow/core/common_runtime/gpu/] Created device /job:localhost/replica:0/task:0/device:GPU:4 with 30184 MB memory:  -> device: 4, name: Tesla V100-SXM2-32GB, pci bus id: 0000:85:00.0, compute capability: 7.0
2023-09-20 06:34:44.176299: I tensorflow/core/common_runtime/gpu/] Created device /job:localhost/replica:0/task:0/device:GPU:5 with 30184 MB memory:  -> device: 5, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0
2023-09-20 06:34:44.177782: I tensorflow/core/common_runtime/gpu/] Created device /job:localhost/replica:0/task:0/device:GPU:6 with 30184 MB memory:  -> device: 6, name: Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0
2023-09-20 06:34:44.179411: I tensorflow/core/common_runtime/gpu/] Created device /job:localhost/replica:0/task:0/device:GPU:7 with 30184 MB memory:  -> device: 7, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8a:00.0, compute capability: 7.0
The device type of embedding keys that lookup dlpack from hps interface for embedding table 1 of hps_demo: /job:localhost/replica:0/task:0/device:CPU:0, the keys: [20005 30047 20004 ... 30001 20037 30001]
The device type of embedding vectors that lookup dlpack from hps interface for embedding table 1 of hps_demo: /job:localhost/replica:0/task:0/device:CPU:0, the vectors: [[ 0.02182689  0.01806355  0.01985828 ...  0.0136845  -0.01738386

tensorflow dlpack on CPU results are consistent with native HPS lookup api, mse: 0.0
***************Look up from tensorflow dlpack on GPU**********
[HUGECTR][INFO] The device type of embedding vectors that lookup dlpack from hps interface for embedding table 1 of wdl: /job:localhost/replica:0/task:0/device:GPU:0, the vectors: [ 0.02182689  0.01806355  0.01985828 ...  0.0136845  -0.01738386

tensorflow dlpack on GPU results are consistent with native HPS lookup api, mse: 0.0
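The consistency checks above compare two lookups of the same keys. Averaging the raw differences lets positive and negative errors cancel, so a mean of zero does not by itself prove that the vectors match; squaring before averaging avoids this. A minimal pure-Python sketch of the comparison follows; the helper names `mean_diff`, `mean_squared_error`, and `assert_consistent` are our own and are not part of HPS.

```python
def mean_diff(a, b):
    """Average of raw element-wise differences; errors of opposite sign cancel."""
    return sum(x - y for x, y in zip(a, b)) / len(a)


def mean_squared_error(a, b):
    """Average of squared element-wise differences; every mismatch adds a positive amount."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def assert_consistent(a, b, tol=1e-3):
    """Raise if two equally sized vectors differ by more than `tol` in MSE, else return the MSE."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    mse = mean_squared_error(a, b)
    if mse > tol:
        raise RuntimeError(f"Too large mse: {mse}")
    return mse


reference = [0.1, 0.2, 0.3, 0.4]
shifted = [0.2, 0.1, 0.4, 0.3]  # every element is off by +/-0.1

print(mean_diff(shifted, reference))           # the errors cancel out
print(mean_squared_error(shifted, reference))  # the mismatch is visible
```

The same reasoning applies to the TensorFlow checks: `tf.reduce_mean(diff * diff)` is a mean squared error, while `tf.reduce_mean(diff)` is only a mean difference.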

3. Multi-process inference

It is possible to share a hashmap database between multiple processes. The following example launches 3 processes that do this through the operating system’s shared memory, which is located at /dev/shm on most Unix systems. The processes are split into one primary and multiple secondary processes: only the primary process initializes the shared memory database, and the secondary processes wait until the shared memory has been fully initialized. This is only one possible scheme. Because inter-process database access is guaranteed to be thread-safe, you can also implement more complex initialization/refresh mechanisms for your use case.
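The coordination scheme described above (the primary initializes, the secondaries wait, and nobody leaves until everyone has attached) can also be expressed with an event and a barrier instead of polling a shared counter. The following is a minimal sketch, assuming threads as stand-ins for processes so that it runs anywhere; `multiprocessing.Event` and `multiprocessing.Barrier` offer the same interface across processes. No HPS calls are made, and the worker names and log messages are illustrative only.

```python
import threading

NUM_WORKERS = 3


def run_worker(name, primary_ready, all_attached, log, log_lock):
    if name == "primary":
        # Stand-in for initializing the shared database
        # (the notebook's `initialize_after_startup = name == 'primary'`).
        with log_lock:
            log.append(f"{name} attached")
        primary_ready.set()  # release the secondaries
    else:
        # Secondaries block here instead of polling a shared counter.
        primary_ready.wait()
        with log_lock:
            log.append(f"{name} attached")

    # Nobody proceeds until every worker has attached, so the shared
    # storage is never released while another worker still needs it.
    all_attached.wait()
    with log_lock:
        log.append(f"{name} done")


def main():
    primary_ready = threading.Event()
    all_attached = threading.Barrier(NUM_WORKERS)
    log, log_lock = [], threading.Lock()
    names = ["primary"] + [f"secondary{i}" for i in range(1, NUM_WORKERS)]
    workers = [
        threading.Thread(target=run_worker,
                         args=(n, primary_ready, all_attached, log, log_lock))
        for n in names
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return log


if __name__ == "__main__":
    for entry in main():
        print(entry)
```

Compared with busy-wait loops, the event and the barrier block without consuming CPU. With real processes you would typically also pass a `timeout` to `wait()` so that a crashed peer cannot hang the others.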

import os
import time
import multiprocessing as mp
import pandas as pd
import numpy as np
import onnxruntime as ort
from hugectr import DatabaseType_t
from hugectr.inference import HPS, ParameterServerConfig, InferenceParams, VolatileDatabaseParams

slot_size_array = [10000, 10000, 10000, 10000]
key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]
batch_size = 1024

def create_hps(name, initialized, device_id, num_max_processes):
    # 1. Let secondary processes wait until shared memory is initialized.
    while name != 'primary' and initialized.value == 0:
        print(f'Subprocess {name} awaiting SHM initialization...')
        time.sleep(1)

    # 2. Configure the HPS hyperparameters
    ps_config = ParameterServerConfig(
           emb_table_name = {"hps_demo": ["sparse_embedding1", "sparse_embedding2"]},
           embedding_vec_size = {"hps_demo": [16, 32]},
           max_feature_num_per_sample_per_emb_table = {"hps_demo": [2, 2]},
           inference_params_array = [
              InferenceParams(
                model_name = "hps_demo",
                max_batchsize = batch_size,
                hit_rate_threshold = 1.0,
                dense_model_file = "",
                sparse_model_files = ["hps_demo0_sparse_1000.model", "hps_demo1_sparse_1000.model"],
                deployed_devices = [device_id],
                use_gpu_embedding_cache = True,
                cache_size_percentage = 0.5,
                i64_input_key = True)
           ],
           volatile_db = VolatileDatabaseParams(
                DatabaseType_t.multi_process_hash_map,  # Use /dev/shm instead of normal memory for storage.
                # Only the primary loads the model. If we run HPS in multiple processes, only one needs to initialize.
                initialize_after_startup = name == 'primary',
           ))

    # 3. Initialize the HPS object
    hps = HPS(ps_config)
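    # `+=` on a shared Value is a read followed by a write, so hold its lock to keep it atomic.
    with initialized.get_lock():
        initialized.value += 1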
    print(f'Subprocess {name} initialized')
    # 4. In (1) the secondary processes wait until the primary process has completed initializing
    #    the shared memory. If the last process disconnects, the shared memory is erased.
    #    Therefore, if the processes currently attached to the shared memory complete their
    #    program before another process has attached, the contents of the shared memory are
    #    lost and the new process will instead construct an empty shared memory. To avoid this
    #    situation, we have multiple options.
    #   a) Setting `shared_memory_auto_remove = False` in the `VolatileDatabaseParams`
    #      configuration [default: True]. This prevents the deletion of the shared memory when
    #      the last process disconnects. In other words, disabling this flag allows you to preserve
    #      and use the state of a shared memory across multiple program restarts. However, while
    #      desirable in some situations, this is not the behavior we need here, because this
    #      notebook cell should be allowed to be executed repeatedly without relying on residual
    #      state.
    #   b) Another approach is to ensure that all other processes that should attach have
    #      attached. Here we achieve this by simply monitoring the `initialized` cross-process
    #      counter variable that we used in (1). Once it hits `num_max_processes`, we can be sure
    #      that each subprocess has properly connected.
    while initialized.value != num_max_processes:
        print(f'Subprocess {name} await other processes...')
        time.sleep(1)
    # 5. Load query data.
    df = pd.read_parquet("data_parquet/val/gen_0.parquet")
    dense_input_columns = df.columns[1:11]
    cat_input1_columns = df.columns[11:13]
    cat_input2_columns = df.columns[13:15]
    dense_input = df[dense_input_columns].loc[0:batch_size-1].to_numpy(dtype=np.float32)
    cat_input1 = (df[cat_input1_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[0:2]).reshape((batch_size, 2, 1))
    cat_input2 = (df[cat_input2_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[2:4]).reshape((batch_size, 2, 1))

    # 6. Make inference from the HPS object and the ONNX inference session of `hps_demo_without_embedding.onnx`.
    embedding1 = hps.lookup(cat_input1.flatten(), "hps_demo", 0,device_id).reshape(batch_size, 2, 16)
    embedding2 = hps.lookup(cat_input2.flatten(), "hps_demo", 1,device_id).reshape(batch_size, 2, 32)
    sess = ort.InferenceSession("hps_demo_without_embedding.onnx")
    res = sess.run(output_names=[sess.get_outputs()[0].name],
                   input_feed={sess.get_inputs()[0].name: dense_input,
                   sess.get_inputs()[1].name: embedding1,
                   sess.get_inputs()[2].name: embedding2})
    pred = res[0]

    # 7. Check the correctness by comparing with dumped evaluation results.
    ground_truth = np.load("ground_truth.npy").flatten()
    print(f'Subprocess {name}; ground_truth: {ground_truth}')
    diff = pred.flatten()-ground_truth
    mse = np.mean(diff*diff)
    print(f'Subprocess {name}; pred: {pred}')
    print(f'Subprocess {name}; mse between pred and ground_truth: {mse}')

    # 8. Make inference with the ONNX inference session of `hps_demo_with_embedding.onnx` (double check).
    sess_ref = ort.InferenceSession("hps_demo_with_embedding.onnx")
    res_ref = sess_ref.run(output_names=[sess_ref.get_outputs()[0].name],
                   input_feed={sess_ref.get_inputs()[0].name: dense_input,
                   sess_ref.get_inputs()[1].name: cat_input1,
                   sess_ref.get_inputs()[2].name: cat_input2})
    pred_ref = res_ref[0]
    diff_ref = pred_ref.flatten()-ground_truth
    mse_ref = np.mean(diff_ref*diff_ref)
    print(f'Subprocess {name}; pred_ref: {pred_ref}')
    print(f'Subprocess {name}; mse between pred_ref and ground_truth: {mse_ref}')

    print(f'Subprocess {name} exiting...')

if __name__ == '__main__':
    # Destroy any stale shared memory left over from a previous (e.g., crashed) run.
    shm_path = '/dev/shm/hctr_mp_hash_map_database'
    if os.path.exists(shm_path):
        os.remove(shm_path)

    # Cross-process counter of how many subprocesses have finished initializing HPS.
    initialized = mp.Value('i', 0)

    # Create sub processes.
    processes = [
        mp.Process(target=create_hps, args=('primary', initialized, 0, 3)),
        mp.Process(target=create_hps, args=('secondary', initialized, 1, 3)),
        mp.Process(target=create_hps, args=('secondary', initialized, 2, 3)),
    ]
    for p in processes:
        p.start()

    # Go to sleep until subprocesses are initialized.
    while initialized.value < len(processes):
        print(f'Main process; awaiting subprocess initialization... So far {initialized.value} initialized...')
        time.sleep(1)

    # Wait for subprocesses to exit.
    for i, p in enumerate(processes):
        print(f'Main process; awaiting subprocess {i} to exit...')
        p.join()
    print('Main process; exiting...')
[HCTR][06:48:37.272][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
====================================================HPS Create====================================================
[HCTR][06:48:37.272][INFO][RK0][main]: Creating Multi-Process HashMap CPU database backend...
[HCTR][06:48:37.272][INFO][RK0][main]: Connecting to shared memory 'hctr_mp_hash_map_database'...
Subprocess secondary awaiting SHM initialization...
Main process; awaiting subprocess initialization... So far 0 initialized...
Subprocess secondary awaiting SHM initialization...
[HCTR][06:48:37.772][INFO][RK0][main]: Connected to shared memory 'hctr_mp_hash_map_database'; OS total = 270453215232 bytes, OS available = 269706559488 bytes, HCTR allocated = 17179869184 bytes, HCTR free = 17179868672 bytes; other processes connected = 0
[HCTR][06:48:37.773][INFO][RK0][main]: Volatile DB: initial cache rate = 1
[HCTR][06:48:37.773][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
[HCTR][06:48:37.773][DEBUG][RK0][main]: Created raw model loader in local memory!
Subprocess secondary awaiting SHM initialization...
Main process; awaiting subprocess initialization... So far 0 initialized...
Subprocess secondary awaiting SHM initialization...
[HCTR][06:48:38.313][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding1; cached 18488 / 18488 embeddings in volatile database (MultiProcessHashMapBackend); load: 18488 / 18446744073709551615 (0.00%).
[HCTR][06:48:38.947][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding2; cached 18470 / 18470 embeddings in volatile database (MultiProcessHashMapBackend); load: 18470 / 18446744073709551615 (0.00%).
Subprocess secondary awaiting SHM initialization...
Main process; awaiting subprocess initialization... So far 0 initialized...
Subprocess secondary awaiting SHM initialization...
Subprocess secondary awaiting SHM initialization...
Main process; awaiting subprocess initialization... So far 0 initialized...
Subprocess secondary awaiting SHM initialization...
Subprocess secondary awaiting SHM initialization...
Main process; awaiting subprocess initialization... So far 0 initialized...
Subprocess secondary awaiting SHM initialization...
[HCTR][06:48:41.289][DEBUG][RK0][main]: Real-time subscribers created!
[HCTR][06:48:41.289][INFO][RK0][main]: Creating embedding cache in device 0.
[HCTR][06:48:41.295][INFO][RK0][main]: Model name: hps_demo
[HCTR][06:48:41.295][INFO][RK0][main]: Max batch size: 1024
[HCTR][06:48:41.295][INFO][RK0][main]: Fuse embedding tables: False
[HCTR][06:48:41.295][INFO][RK0][main]: Number of embedding tables: 2
[HCTR][06:48:41.295][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
[HCTR][06:48:41.295][INFO][RK0][main]: Embedding cache type: dynamic
[HCTR][06:48:41.295][INFO][RK0][main]: Use I64 input key: True
[HCTR][06:48:41.295][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
[HCTR][06:48:41.295][INFO][RK0][main]: The size of thread pool: 80
[HCTR][06:48:41.295][INFO][RK0][main]: The size of worker memory pool: 2
[HCTR][06:48:41.295][INFO][RK0][main]: The size of refresh memory pool: 1
[HCTR][06:48:41.295][INFO][RK0][main]: The refresh percentage : 0.000000
[HCTR][06:48:41.311][INFO][RK0][main]: LookupSession i64_input_key: True
[HCTR][06:48:41.311][INFO][RK0][main]: Creating lookup session for hps_demo on device: 0
Subprocess primary initialized
Subprocess primary await other processes...
Main process; awaiting subprocess initialization... So far 1 initialized...
[HCTR][06:48:42.279][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
====================================================HPS Create====================================================
[HCTR][06:48:42.280][INFO][RK0][main]: Creating Multi-Process HashMap CPU database backend...
[HCTR][06:48:42.281][INFO][RK0][main]: Connecting to shared memory 'hctr_mp_hash_map_database'...
[HCTR][06:48:42.281][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
====================================================HPS Create====================================================
[HCTR][06:48:42.282][INFO][RK0][main]: Creating Multi-Process HashMap CPU database backend...
[HCTR][06:48:42.282][INFO][RK0][main]: Connecting to shared memory 'hctr_mp_hash_map_database'...
Subprocess primary await other processes...
[HCTR][06:48:42.781][INFO][RK0][main]: Connected to shared memory 'hctr_mp_hash_map_database'; OS total = 270453215232 bytes, OS available = 260310085632 bytes, HCTR allocated = 17179869184 bytes, HCTR free = 7783505728 bytes; other processes connected = 1
[HCTR][06:48:42.781][INFO][RK0][main]: Volatile DB: initial cache rate = 1
[HCTR][06:48:42.781][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
[HCTR][06:48:42.781][DEBUG][RK0][main]: Created raw model loader in local memory!
Main process; awaiting subprocess initialization... So far 1 initialized...
[HCTR][06:48:43.281][INFO][RK0][main]: Connected to shared memory 'hctr_mp_hash_map_database'; OS total = 270453215232 bytes, OS available = 260310085632 bytes, HCTR allocated = 17179869184 bytes, HCTR free = 7783505728 bytes; other processes connected = 1
[HCTR][06:48:43.281][INFO][RK0][main]: Volatile DB: initial cache rate = 1
[HCTR][06:48:43.281][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
[HCTR][06:48:43.281][DEBUG][RK0][main]: Created raw model loader in local memory!
Subprocess primary await other processes...
Main process; awaiting subprocess initialization... So far 1 initialized...
Subprocess primary await other processes...
Main process; awaiting subprocess initialization... So far 1 initialized...
Subprocess primary await other processes...
[HCTR][06:48:45.440][DEBUG][RK0][main]: Real-time subscribers created!
[HCTR][06:48:45.441][INFO][RK0][main]: Creating embedding cache in device 1.
[HCTR][06:48:45.463][INFO][RK0][main]: Model name: hps_demo
[HCTR][06:48:45.463][INFO][RK0][main]: Max batch size: 1024
[HCTR][06:48:45.463][INFO][RK0][main]: Fuse embedding tables: False
[HCTR][06:48:45.463][INFO][RK0][main]: Number of embedding tables: 2
[HCTR][06:48:45.463][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
[HCTR][06:48:45.463][INFO][RK0][main]: Embedding cache type: dynamic
[HCTR][06:48:45.463][INFO][RK0][main]: Use I64 input key: True
[HCTR][06:48:45.463][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
[HCTR][06:48:45.463][INFO][RK0][main]: The size of thread pool: 80
[HCTR][06:48:45.463][INFO][RK0][main]: The size of worker memory pool: 2
[HCTR][06:48:45.463][INFO][RK0][main]: The size of refresh memory pool: 1
[HCTR][06:48:45.463][INFO][RK0][main]: The refresh percentage : 0.000000
[HCTR][06:48:45.706][DEBUG][RK0][main]: Real-time subscribers created!
[HCTR][06:48:45.706][INFO][RK0][main]: Creating embedding cache in device 2.
[HCTR][06:48:45.711][INFO][RK0][main]: Model name: hps_demo
[HCTR][06:48:45.711][INFO][RK0][main]: Max batch size: 1024
[HCTR][06:48:45.711][INFO][RK0][main]: Fuse embedding tables: False
[HCTR][06:48:45.711][INFO][RK0][main]: Number of embedding tables: 2
[HCTR][06:48:45.711][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
[HCTR][06:48:45.711][INFO][RK0][main]: Embedding cache type: dynamic
[HCTR][06:48:45.711][INFO][RK0][main]: Use I64 input key: True
[HCTR][06:48:45.711][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
[HCTR][06:48:45.711][INFO][RK0][main]: The size of thread pool: 80
[HCTR][06:48:45.711][INFO][RK0][main]: The size of worker memory pool: 2
[HCTR][06:48:45.711][INFO][RK0][main]: The size of refresh memory pool: 1
[HCTR][06:48:45.711][INFO][RK0][main]: The refresh percentage : 0.000000
Main process; awaiting subprocess initialization... So far 1 initialized...
Subprocess primary await other processes...
[HCTR][06:48:46.699][INFO][RK0][main]: LookupSession i64_input_key: True
[HCTR][06:48:46.699][INFO][RK0][main]: Creating lookup session for hps_demo on device: 1
Subprocess secondary initialized
Subprocess secondary await other processes...
[HCTR][06:48:46.764][INFO][RK0][main]: LookupSession i64_input_key: True
[HCTR][06:48:46.764][INFO][RK0][main]: Creating lookup session for hps_demo on device: 2
Subprocess secondary initialized
2023-09-20 06:48:46.842594773 [W:onnxruntime:, CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
Subprocess secondary; ground_truth: [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
Subprocess secondary; pred: [[0.48954916]
 [0.5065045 ]
 [0.4792769 ]]
Subprocess secondary; mse between pred and ground_truth: 2.3887142264200634e-15
Subprocess secondary; pred_ref: [[0.48954916]
 [0.5065045 ]
 [0.4792769 ]]
Subprocess secondary; mse between pred_ref and ground_truth: 2.3887142264200634e-15
Subprocess secondary exiting...
[HCTR][06:48:46.900][INFO][RK0][main]: Disconnecting from shared memory 'hctr_mp_hash_map_database'.
Main process; awaiting subprocess 0 to exit...
2023-09-20 06:48:47.497305659 [W:onnxruntime:, CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
Subprocess primary; ground_truth: [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
Subprocess primary; pred: [[0.48954916]
 [0.5065045 ]
 [0.4792769 ]]
Subprocess primary; mse between pred and ground_truth: 2.3887142264200634e-15
Subprocess primary; pred_ref: [[0.48954916]
 [0.5065045 ]
 [0.4792769 ]]
Subprocess primary; mse between pred_ref and ground_truth: 2.3887142264200634e-15
Subprocess primary exiting...
[HCTR][06:48:47.568][INFO][RK0][main]: Disconnecting from shared memory 'hctr_mp_hash_map_database'.
2023-09-20 06:48:48.101124718 [W:onnxruntime:, CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
Subprocess secondary; ground_truth: [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
Subprocess secondary; pred: [[0.48954916]
 [0.5065045 ]
 [0.4792769 ]]
Subprocess secondary; mse between pred and ground_truth: 2.3887142264200634e-15
Subprocess secondary; pred_ref: [[0.48954916]
 [0.5065045 ]
 [0.4792769 ]]
Subprocess secondary; mse between pred_ref and ground_truth: 2.3887142264200634e-15
Subprocess secondary exiting...
[HCTR][06:48:48.176][INFO][RK0][main]: Disconnecting from shared memory 'hctr_mp_hash_map_database'.
Main process; awaiting subprocess 1 to exit...
[HCTR][06:48:48.687][INFO][RK0][main]: Detached last process from shared memory 'hctr_mp_hash_map_database'. Auto remove in progress...
Main process; awaiting subprocess 2 to exit...
Main process; exiting...
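The lifetime rule explained in step 4 of the code comments (the shared memory is erased when the last process detaches, unless `shared_memory_auto_remove` is disabled) can be illustrated with a toy reference-counted store. This is a hypothetical model written for illustration only, not how HugeCTR implements its multi-process hash map; `ToySharedStore` and `entries_seen_by_late_secondary` are names of our own.

```python
class ToySharedStore:
    """Toy stand-in for a shared-memory segment that counts attached processes."""

    def __init__(self, auto_remove=True):
        self.auto_remove = auto_remove  # plays the role of shared_memory_auto_remove
        self.data = {}
        self.attached = 0

    def attach(self):
        self.attached += 1

    def detach(self):
        self.attached -= 1
        if self.attached == 0 and self.auto_remove:
            # The last process left: the segment and its contents are removed.
            self.data.clear()


def entries_seen_by_late_secondary(auto_remove, wait_for_all):
    """Return how many entries a secondary finds after the primary initialized the store."""
    store = ToySharedStore(auto_remove=auto_remove)
    store.attach()                                # primary attaches...
    store.data["sparse_embedding1"] = [0.1, 0.2]  # ...and initializes the contents
    if wait_for_all:
        store.attach()                            # option (b): the secondary attaches first,
        store.detach()                            # then the primary exits
    else:
        store.detach()                            # the primary exits first,
        store.attach()                            # then the secondary attaches
    seen = len(store.data)
    store.detach()                                # the secondary exits
    return seen


for auto_remove, wait_for_all in [(True, False), (False, False), (True, True)]:
    seen = entries_seen_by_late_secondary(auto_remove, wait_for_all)
    print(f"auto_remove={auto_remove}, wait_for_all={wait_for_all}: secondary sees {seen} entries")
```

Only the first combination loses the data, which corresponds to the situation that options (a) and (b) in the comments are meant to avoid.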

4. Redis Cluster deployment (without TLS/SSL)

HugeCTR can use Redis clusters as backing storage. In the following steps, we show how to set up a mock Redis / HugeCTR deployment on a single machine. We assume that you have started this notebook in a HugeCTR Docker container.
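A single-machine mock cluster typically consists of several `redis-server` instances listening on different ports, each with cluster mode enabled and its own working directory. The following is a minimal sketch that writes one configuration file per node, assuming three nodes on ports 7000-7002 and a `redis-cluster/<port>` directory layout; both are hypothetical choices, and the ports and paths used later in this notebook may differ. The directives `port`, `cluster-enabled`, `cluster-config-file`, `cluster-node-timeout`, and `dir` are standard `redis.conf` settings.

```python
from pathlib import Path


def node_config(port, data_dir):
    """Return redis.conf text for one cluster-enabled node (illustrative settings)."""
    return "\n".join([
        f"port {port}",
        "cluster-enabled yes",
        f"cluster-config-file nodes-{port}.conf",  # written and maintained by Redis itself
        "cluster-node-timeout 5000",               # milliseconds
        f"dir {data_dir}",
        "",
    ])


def write_node_configs(root, ports):
    """Create root/<port>/redis.conf for each port and return the written paths."""
    paths = []
    for port in ports:
        node_dir = Path(root) / str(port)
        node_dir.mkdir(parents=True, exist_ok=True)
        conf_path = node_dir / "redis.conf"
        conf_path.write_text(node_config(port, node_dir.resolve()))
        paths.append(conf_path)
    return paths


if __name__ == "__main__":
    for path in write_node_configs("redis-cluster", [7000, 7001, 7002]):
        print(path)
```

Each node would then be started with `redis-server <path>/redis.conf`, and the running nodes joined into a cluster with `redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002`.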

Step 1: Get and build Redis

!rm -f 7.0.8.tar.gz && wget
!rm -rf redis-7.0.8 && tar -xf 7.0.8.tar.gz && ln -sf redis-7.0.8 redis
!cd redis && make
--2023-09-20 06:49:01--
Resolving (
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: [following]
--2023-09-20 06:49:01--
Resolving (
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘7.0.8.tar.gz’

7.0.8.tar.gz            [   <=>              ]   2.87M  5.50MB/s    in 0.5s    

2023-09-20 06:49:02 (5.50 MB/s) - ‘7.0.8.tar.gz’ saved [3011655]

cd src && make all
make[1]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/src'
./ line 2: echo: write error: Broken pipe
    CC Makefile.dep
./ line 2: echo: write error: Broken pipe
rm -rf redis-server redis-sentinel redis-cli redis-benchmark redis-check-rdb redis-check-aof *.o *.gcda *.gcno *.gcov lcov-html Makefile.dep
rm -f adlist.d quicklist.d ae.d anet.d dict.d server.d sds.d zmalloc.d lzf_c.d lzf_d.d pqsort.d zipmap.d sha1.d ziplist.d release.d networking.d util.d object.d db.d replication.d rdb.d t_string.d t_list.d t_set.d t_zset.d t_hash.d config.d aof.d pubsub.d multi.d debug.d sort.d intset.d syncio.d cluster.d crc16.d endianconv.d slowlog.d eval.d bio.d rio.d rand.d memtest.d syscheck.d crcspeed.d crc64.d bitops.d sentinel.d notify.d setproctitle.d blocked.d hyperloglog.d latency.d sparkline.d redis-check-rdb.d redis-check-aof.d geo.d lazyfree.d module.d evict.d expire.d geohash.d geohash_helper.d childinfo.d defrag.d siphash.d rax.d t_stream.d listpack.d localtime.d lolwut.d lolwut5.d lolwut6.d acl.d tracking.d connection.d tls.d sha256.d timeout.d setcpuaffinity.d monotonic.d mt19937-64.d resp_parser.d call_reply.d script_lua.d script.d functions.d function_lua.d commands.d anet.d adlist.d dict.d redis-cli.d zmalloc.d release.d ae.d redisassert.d crcspeed.d crc64.d siphash.d crc16.d monotonic.d cli_common.d mt19937-64.d ae.d anet.d redis-benchmark.d adlist.d dict.d zmalloc.d redisassert.d release.d crcspeed.d crc64.d siphash.d crc16.d monotonic.d cli_common.d mt19937-64.d
(cd ../deps && make distclean)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
(cd hiredis && make clean) > /dev/null || true
(cd linenoise && make clean) > /dev/null || true
(cd lua && make clean) > /dev/null || true
(cd jemalloc && [ -f Makefile ] && make distclean) > /dev/null || true
(cd hdr_histogram && make clean) > /dev/null || true
(rm -f .make-*)
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
(cd modules && make clean)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/src/modules'
rm -rf *.xo *.so
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/src/modules'
(cd ../tests/modules && make clean)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/tests/modules'
rm -f commandfilter.xo basics.xo testrdb.xo fork.xo infotest.xo propagate.xo misc.xo hooks.xo blockonkeys.xo blockonbackground.xo scan.xo datatype.xo datatype2.xo auth.xo keyspace_events.xo blockedclient.xo getkeys.xo getchannels.xo test_lazyfree.xo timer.xo defragtest.xo keyspecs.xo hash.xo zset.xo stream.xo mallocsize.xo aclcheck.xo list.xo subcommands.xo reply.xo cmdintrospection.xo eventloop.xo moduleconfigs.xo moduleconfigstwo.xo publish.xo usercall.xo
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/tests/modules'
(rm -f .make-*)
echo STD=-pedantic -DREDIS_STATIC='' -std=c11 >> .make-settings
echo WARN=-Wall -W -Wno-missing-field-initializers >> .make-settings
echo OPT=-O2 >> .make-settings
echo MALLOC=jemalloc >> .make-settings
echo BUILD_TLS= >> .make-settings
echo USE_SYSTEMD= >> .make-settings
echo CFLAGS= >> .make-settings
echo LDFLAGS= >> .make-settings
echo REDIS_CFLAGS= >> .make-settings
echo REDIS_LDFLAGS= >> .make-settings
echo PREV_FINAL_CFLAGS=-pedantic -DREDIS_STATIC='' -std=c11 -Wall -W -Wno-missing-field-initializers -O2 -g -ggdb   -I../deps/hiredis -I../deps/linenoise -I../deps/lua/src -I../deps/hdr_histogram -DUSE_JEMALLOC -I../deps/jemalloc/include >> .make-settings
echo PREV_FINAL_LDFLAGS=  -g -ggdb -rdynamic >> .make-settings
(cd ../deps && make hiredis linenoise lua hdr_histogram jemalloc)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
(cd hiredis && make clean) > /dev/null || true
(cd linenoise && make clean) > /dev/null || true
(cd lua && make clean) > /dev/null || true
(cd jemalloc && [ -f Makefile ] && make distclean) > /dev/null || true
(cd hdr_histogram && make clean) > /dev/null || true
(rm -f .make-*)
(echo "" > .make-cflags)
(echo "" > .make-ldflags)
MAKE hiredis
cd hiredis && make static 
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hiredis'
cc -std=c99 -c -O3 -fPIC   -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic alloc.c
cc -std=c99 -c -O3 -fPIC   -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic net.c
cc -std=c99 -c -O3 -fPIC   -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic hiredis.c
cc -std=c99 -c -O3 -fPIC   -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic sds.c
cc -std=c99 -c -O3 -fPIC   -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic async.c
cc -std=c99 -c -O3 -fPIC   -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic read.c
cc -std=c99 -c -O3 -fPIC   -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic sockcompat.c
ar rcs libhiredis.a alloc.o net.o hiredis.o sds.o async.o read.o sockcompat.o
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hiredis'
MAKE linenoise
cd linenoise && make
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/linenoise'
cc  -Wall -Os -g  -c linenoise.c
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/linenoise'
MAKE lua
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/lua/src'
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ldebug.o ldebug.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lobject.o lobject.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lopcodes.o lopcodes.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lparser.o lparser.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lstate.o lstate.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lstring.o lstring.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ltable.o ltable.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lundump.o lundump.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o strbuf.o strbuf.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o fpconv.o fpconv.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lauxlib.o lauxlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lbaselib.o lbaselib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ldblib.o ldblib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o liolib.o liolib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lmathlib.o lmathlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o loslib.o loslib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ltablib.o ltablib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lstrlib.o lstrlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o loadlib.o loadlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_cjson.o lua_cjson.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_struct.o lua_struct.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_cmsgpack.o lua_cmsgpack.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_bit.o lua_bit.c
ar rc liblua.a lapi.o lcode.o ldebug.o ldo.o ldump.o lfunc.o lgc.o llex.o lmem.o lobject.o lopcodes.o lparser.o lstate.o lstring.o ltable.o ltm.o lundump.o lvm.o lzio.o strbuf.o fpconv.o lauxlib.o lbaselib.o ldblib.o liolib.o lmathlib.o loslib.o ltablib.o lstrlib.o loadlib.o linit.o lua_cjson.o lua_struct.o lua_cmsgpack.o lua_bit.o	# DLL needs all object files
ranlib liblua.a
cc -o lua  lua.o liblua.a -lm 
cc -o luac  luac.o print.o liblua.a -lm 
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/lua/src'
MAKE hdr_histogram
cd hdr_histogram && make
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hdr_histogram'
cc -std=c99 -Wall -Os -g  -DHDR_MALLOC_INCLUDE=\"hdr_redis_malloc.h\" -c  hdr_histogram.c 
ar rcs libhdrhistogram.a hdr_histogram.o
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hdr_histogram'
MAKE jemalloc
cd jemalloc && ./configure --with-version=5.2.1-0-g0 --with-lg-quantum=3 --with-jemalloc-prefix=je_ CFLAGS="-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops " LDFLAGS="" 
checking for xsltproc... false
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether compiler is cray... no
checking whether compiler supports -std=gnu11... yes
checking whether compiler supports -Wall... yes
checking whether compiler supports -Wextra... yes
checking whether compiler supports -Wshorten-64-to-32... no
checking whether compiler supports -Wsign-compare... yes
checking whether compiler supports -Wundef... yes
checking whether compiler supports -Wno-format-zero-length... yes
checking whether compiler supports -pipe... yes
checking whether compiler supports -g3... yes
checking how to run the C preprocessor... gcc -E
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking whether g++ supports C++14 features by default... yes
checking whether compiler supports -Wall... yes
checking whether compiler supports -Wextra... yes
checking whether compiler supports -g3... yes
checking whether libstdc++ linkage is compilable... yes
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking whether byte ordering is bigendian... no
checking size of void *... 8
checking size of int... 4
checking size of long... 8
checking size of long long... 8
checking size of intmax_t... 8
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking whether pause instruction is compilable... yes
checking number of significant virtual address bits... 48
checking for ar... ar
checking for nm... nm
checking for gawk... no
checking for mawk... mawk
checking malloc.h usability... yes
checking malloc.h presence... yes
checking for malloc.h... yes
checking whether malloc_usable_size definition can use const argument... no
checking for library containing log... -lm
checking whether __attribute__ syntax is compilable... yes
checking whether compiler supports -fvisibility=hidden... yes
checking whether compiler supports -fvisibility=hidden... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether tls_model attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether alloc_size attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether format(gnu_printf, ...) attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether format(printf, ...) attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether format(printf, ...) attribute is compilable... yes
checking for a BSD-compatible install... /usr/bin/install -c
checking for ranlib... ranlib
checking for ld... /usr/bin/ld
checking for autoconf... /usr/bin/autoconf
checking for memalign... yes
checking for valloc... yes
checking whether compiler supports -O3... yes
checking whether compiler supports -O3... yes
checking whether compiler supports -funroll-loops... yes
checking configured backtracing method... N/A
checking for sbrk... yes
checking whether utrace(2) is compilable... no
checking whether a program using __builtin_unreachable is compilable... yes
checking whether a program using __builtin_ffsl is compilable... yes
checking whether a program using __builtin_popcountl is compilable... yes
checking LG_PAGE... 12
checking pthread.h usability... yes
checking pthread.h presence... yes
checking for pthread.h... yes
checking for pthread_create in -lpthread... yes
checking dlfcn.h usability... yes
checking dlfcn.h presence... yes
checking for dlfcn.h... yes
checking for dlsym... yes
checking whether pthread_atfork(3) is compilable... yes
checking whether pthread_setname_np(3) is compilable... yes
checking for library containing clock_gettime... none required
checking whether clock_gettime(CLOCK_MONOTONIC_COARSE, ...) is compilable... yes
checking whether clock_gettime(CLOCK_MONOTONIC, ...) is compilable... yes
checking whether mach_absolute_time() is compilable... no
checking whether compiler supports -Werror... yes
checking whether syscall(2) is compilable... yes
checking for secure_getenv... yes
checking for sched_getcpu... yes
checking for sched_setaffinity... yes
checking for issetugid... no
checking for _malloc_thread_cleanup... no
checking for _pthread_mutex_init_calloc_cb... no
checking for TLS... yes
checking whether C11 atomics is compilable... no
checking whether GCC __atomic atomics is compilable... yes
checking whether GCC 8-bit __atomic atomics is compilable... yes
checking whether GCC __sync atomics is compilable... yes
checking whether GCC 8-bit __sync atomics is compilable... yes
checking whether Darwin OSAtomic*() is compilable... no
checking whether madvise(2) is compilable... yes
checking whether madvise(..., MADV_FREE) is compilable... yes
checking whether madvise(..., MADV_DONTNEED) is compilable... yes
checking whether madvise(..., MADV_DO[NT]DUMP) is compilable... yes
checking whether madvise(..., MADV_[NO]HUGEPAGE) is compilable... yes
checking for __builtin_clz... yes
checking whether Darwin os_unfair_lock_*() is compilable... no
checking whether glibc malloc hook is compilable... no
checking whether glibc memalign hook is compilable... no
checking whether pthreads adaptive mutexes is compilable... yes
checking whether compiler supports -D_GNU_SOURCE... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether strerror_r returns char with gnu source is compilable... yes
checking for stdbool.h that conforms to C99... yes
checking for _Bool... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating jemalloc.pc
config.status: creating doc/html.xsl
config.status: creating doc/manpages.xsl
config.status: creating doc/jemalloc.xml
config.status: creating include/jemalloc/jemalloc_macros.h
config.status: creating include/jemalloc/jemalloc_protos.h
config.status: creating include/jemalloc/jemalloc_typedefs.h
config.status: creating include/jemalloc/internal/jemalloc_preamble.h
config.status: creating test/
config.status: creating test/include/test/jemalloc_test.h
config.status: creating config.stamp
config.status: creating bin/jemalloc-config
config.status: creating bin/
config.status: creating bin/jeprof
config.status: creating include/jemalloc/jemalloc_defs.h
config.status: creating include/jemalloc/internal/jemalloc_internal_defs.h
config.status: creating test/include/test/jemalloc_test_defs.h
config.status: executing include/jemalloc/internal/public_symbols.txt commands
config.status: executing include/jemalloc/internal/private_symbols.awk commands
config.status: executing include/jemalloc/internal/private_symbols_jet.awk commands
config.status: executing include/jemalloc/internal/public_namespace.h commands
config.status: executing include/jemalloc/internal/public_unnamespace.h commands
config.status: executing include/jemalloc/jemalloc_protos_jet.h commands
config.status: executing include/jemalloc/jemalloc_rename.h commands
config.status: executing include/jemalloc/jemalloc_mangle.h commands
config.status: executing include/jemalloc/jemalloc_mangle_jet.h commands
config.status: executing include/jemalloc/jemalloc.h commands
jemalloc version   : 5.2.1-0-g0
library revision   : 2

CONFIG             : --with-version=5.2.1-0-g0 --with-lg-quantum=3 --with-jemalloc-prefix=je_ 'CFLAGS=-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops ' LDFLAGS=
CC                 : gcc
CONFIGURE_CFLAGS   : -std=gnu11 -Wall -Wextra -Wsign-compare -Wundef -Wno-format-zero-length -pipe -g3 -fvisibility=hidden -O3 -funroll-loops
SPECIFIED_CFLAGS   : -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops 
CXX                : g++
CONFIGURE_CXXFLAGS : -Wall -Wextra -g3 -fvisibility=hidden -O3
LDFLAGS            : 
DSO_LDFLAGS        : -shared -Wl,-soname,$(@F)
LIBS               : -lm -lstdc++ -pthread
RPATH_EXTRA        : 

XSLTPROC           : false
XSLROOT            : 

PREFIX             : /usr/local
BINDIR             : /usr/local/bin
DATADIR            : /usr/local/share
INCLUDEDIR         : /usr/local/include
LIBDIR             : /usr/local/lib
MANDIR             : /usr/local/share/man

srcroot            : 
abs_srcroot        : /hugectr/notebooks/tmr/redis/deps/jemalloc/
objroot            : 
abs_objroot        : /hugectr/notebooks/tmr/redis/deps/jemalloc/

                   : je_
install_suffix     : 
malloc_conf        : 
documentation      : 1
shared libs        : 1
static libs        : 1
autogen            : 0
debug              : 0
stats              : 1
experimetal_smallocx : 0
prof               : 0
prof-libunwind     : 0
prof-libgcc        : 0
prof-gcc           : 0
fill               : 1
utrace             : 0
xmalloc            : 0
log                : 0
lazy_lock          : 0
cache-oblivious    : 1
cxx                : 1
cd jemalloc && make CFLAGS="-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops " LDFLAGS="" lib/libjemalloc.a
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/jemalloc'
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/jemalloc.sym.o src/jemalloc.c
nm -a src/jemalloc.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/jemalloc.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/arena.sym.o src/arena.c
nm -a src/arena.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/arena.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/background_thread.sym.o src/background_thread.c
nm -a src/background_thread.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/background_thread.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/base.sym.o src/base.c
nm -a src/base.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/base.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/bin.sym.o src/bin.c
nm -a src/bin.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/bin.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/bitmap.sym.o src/bitmap.c
nm -a src/bitmap.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/bitmap.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ckh.sym.o src/ckh.c
nm -a src/ckh.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ckh.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ctl.sym.o src/ctl.c
nm -a src/ctl.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ctl.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/div.sym.o src/div.c
nm -a src/div.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/div.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent.sym.o src/extent.c
nm -a src/extent.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent_dss.sym.o src/extent_dss.c
nm -a src/extent_dss.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent_dss.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent_mmap.sym.o src/extent_mmap.c
nm -a src/extent_mmap.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent_mmap.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/hash.sym.o src/hash.c
nm -a src/hash.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/hash.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/hook.sym.o src/hook.c
nm -a src/hook.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/hook.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/large.sym.o src/large.c
nm -a src/large.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/large.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/log.sym.o src/log.c
nm -a src/log.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/log.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/malloc_io.sym.o src/malloc_io.c
nm -a src/malloc_io.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/malloc_io.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/mutex.sym.o src/mutex.c
nm -a src/mutex.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/mutex.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/mutex_pool.sym.o src/mutex_pool.c
nm -a src/mutex_pool.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/mutex_pool.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/nstime.sym.o src/nstime.c
nm -a src/nstime.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/nstime.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/pages.sym.o src/pages.c
nm -a src/pages.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/pages.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/prng.sym.o src/prng.c
nm -a src/prng.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/prng.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/prof.sym.o src/prof.c
nm -a src/prof.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/prof.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/rtree.sym.o src/rtree.c
nm -a src/rtree.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/rtree.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/safety_check.sym.o src/safety_check.c
nm -a src/safety_check.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/safety_check.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/stats.sym.o src/stats.c
nm -a src/stats.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/stats.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/sc.sym.o src/sc.c
nm -a src/sc.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/sc.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/sz.sym.o src/sz.c
nm -a src/sz.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/sz.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/tcache.sym.o src/tcache.c
nm -a src/tcache.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/tcache.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/test_hooks.sym.o src/test_hooks.c
nm -a src/test_hooks.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/test_hooks.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ticker.sym.o src/ticker.c
nm -a src/ticker.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ticker.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/tsd.sym.o src/tsd.c
nm -a src/tsd.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/tsd.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/witness.sym.o src/witness.c
nm -a src/witness.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/witness.sym
/bin/sh include/jemalloc/internal/ src/jemalloc.sym src/arena.sym src/background_thread.sym src/base.sym src/bin.sym src/bitmap.sym src/ckh.sym src/ctl.sym src/div.sym src/extent.sym src/extent_dss.sym src/extent_mmap.sym src/hash.sym src/hook.sym src/large.sym src/log.sym src/malloc_io.sym src/mutex.sym src/mutex_pool.sym src/nstime.sym src/pages.sym src/prng.sym src/prof.sym src/rtree.sym src/safety_check.sym src/stats.sym src/sc.sym src/sz.sym src/tcache.sym src/test_hooks.sym src/ticker.sym src/tsd.sym src/witness.sym > include/jemalloc/internal/private_namespace.gen.h
cp include/jemalloc/internal/private_namespace.gen.h include/jemalloc/internal/private_namespace.gen.h
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/jemalloc.o src/jemalloc.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/arena.o src/arena.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/background_thread.o src/background_thread.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/base.o src/base.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/bin.o src/bin.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/bitmap.o src/bitmap.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ckh.o src/ckh.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ctl.o src/ctl.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/div.o src/div.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent.o src/extent.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent_dss.o src/extent_dss.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent_mmap.o src/extent_mmap.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/hash.o src/hash.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/hook.o src/hook.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/large.o src/large.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/log.o src/log.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/malloc_io.o src/malloc_io.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/mutex.o src/mutex.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/mutex_pool.o src/mutex_pool.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/nstime.o src/nstime.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/pages.o src/pages.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/prng.o src/prng.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/prof.o src/prof.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/rtree.o src/rtree.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/safety_check.o src/safety_check.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/stats.o src/stats.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/sc.o src/sc.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/sz.o src/sz.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/tcache.o src/tcache.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/test_hooks.o src/test_hooks.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ticker.o src/ticker.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/tsd.o src/tsd.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/witness.o src/witness.c
g++ -Wall -Wextra -g3 -fvisibility=hidden -O3 -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/jemalloc_cpp.o src/jemalloc_cpp.cpp
ar crus lib/libjemalloc.a src/jemalloc.o src/arena.o src/background_thread.o src/base.o src/bin.o src/bitmap.o src/ckh.o src/ctl.o src/div.o src/extent.o src/extent_dss.o src/extent_mmap.o src/hash.o src/hook.o src/large.o src/log.o src/malloc_io.o src/mutex.o src/mutex_pool.o src/nstime.o src/pages.o src/prng.o src/prof.o src/rtree.o src/safety_check.o src/stats.o src/sc.o src/sz.o src/tcache.o src/test_hooks.o src/ticker.o src/tsd.o src/witness.o src/jemalloc_cpp.o
ar: `u' modifier ignored since `D' is the default (see `U')
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/jemalloc'
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
    CC adlist.o
    CC quicklist.o
    CC ae.o
    CC anet.o
    CC dict.o
    CC server.o
    CC sds.o
    CC zmalloc.o
    CC lzf_c.o
    CC lzf_d.o
    CC pqsort.o
    CC zipmap.o
    CC sha1.o
    CC ziplist.o
    CC release.o
    CC networking.o
    CC util.o
    CC object.o
    CC db.o
    CC replication.o
    CC rdb.o
    CC t_string.o
    CC t_list.o
    CC t_set.o
    CC t_zset.o
    CC t_hash.o
    CC config.o
    CC aof.o
    CC pubsub.o
    CC multi.o
    CC debug.o
    CC sort.o
    CC intset.o
    CC syncio.o
    CC cluster.o
    CC crc16.o
    CC endianconv.o
    CC slowlog.o
    CC eval.o
    CC bio.o
    CC rio.o
    CC rand.o
    CC memtest.o
    CC syscheck.o
    CC crcspeed.o
    CC crc64.o
    CC bitops.o
    CC sentinel.o
    CC notify.o
    CC setproctitle.o
    CC blocked.o
    CC hyperloglog.o
    CC latency.o
    CC sparkline.o
    CC redis-check-rdb.o
    CC redis-check-aof.o
    CC geo.o
    CC lazyfree.o
    CC module.o
    CC evict.o
    CC expire.o
    CC geohash.o
    CC geohash_helper.o
    CC childinfo.o
    CC defrag.o
    CC siphash.o
    CC rax.o
    CC t_stream.o
    CC listpack.o
    CC localtime.o
    CC lolwut.o
    CC lolwut5.o
    CC lolwut6.o
    CC acl.o
    CC tracking.o
    CC connection.o
    CC tls.o
    CC sha256.o
    CC timeout.o
    CC setcpuaffinity.o
    CC monotonic.o
    CC mt19937-64.o
    CC resp_parser.o
    CC call_reply.o
    CC script_lua.o
    CC script.o
    CC functions.o
    CC function_lua.o
    CC commands.o
    LINK redis-server
    INSTALL redis-sentinel
    CC redis-cli.o
    CC redisassert.o
    CC cli_common.o
    LINK redis-cli
    CC redis-benchmark.o
    LINK redis-benchmark
    INSTALL redis-check-rdb
    INSTALL redis-check-aof

Hint: It's a good idea to run 'make test' ;)

make[1]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/src'

If you see the message "Hint: It's a good idea to run 'make test' ;)" followed by "make[1]: Leaving directory ...", the compilation should have completed successfully.
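You can automate the success check described above when the build output is captured to a string. The sketch below uses a hypothetical helper, `build_looks_successful`, which is not part of Redis or HugeCTR. It only looks for the hint line followed later by a `make[1]: Leaving directory` line, so it is a quick sanity check and does not prove that every target was built.

```python
def build_looks_successful(log_text: str) -> bool:
    """Return True if the captured Redis build log ends like a successful build.

    Assumption: success is signalled by the "make test" hint line, followed
    somewhere later by a "make[1]: Leaving directory" line.
    """
    hint = "Hint: It's a good idea to run 'make test' ;)"
    hint_pos = log_text.rfind(hint)
    if hint_pos == -1:
        return False
    # The "Leaving directory" marker must appear after the hint, not before it.
    return "make[1]: Leaving directory" in log_text[hint_pos + len(hint):]


sample_log = """    INSTALL redis-check-aof

Hint: It's a good idea to run 'make test' ;)

make[1]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/src'
"""
print(build_looks_successful(sample_log))                 # True
print(build_looks_successful("make: *** [all] Error 2"))  # False
```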

Step 2: Configure a mock Redis cluster

WARNING: The following commands will erase all contents of these directories: redis-server-1, redis-server-2 and redis-server-3.

!mkdir -p redis-server-1 redis-server-2 redis-server-3
!rm -f redis-server-1/* redis-server-2/* redis-server-3/*

!ln -sf $PWD/redis/src/redis-server redis-server-1/redis-server
!ln -sf $PWD/redis/src/redis-server redis-server-2/redis-server
!ln -sf $PWD/redis/src/redis-server redis-server-3/redis-server
%%writefile redis-server-1/redis.conf
daemonize yes
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
appendonly no
save ""
Writing redis-server-1/redis.conf
%%writefile redis-server-2/redis.conf
daemonize yes
port 7001
cluster-enabled yes
cluster-config-file nodes.conf
appendonly no
save ""
Writing redis-server-2/redis.conf
%%writefile redis-server-3/redis.conf
daemonize yes
port 7002
cluster-enabled yes
cluster-config-file nodes.conf
appendonly no
save ""
Writing redis-server-3/redis.conf
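The three configuration files are identical except for the port (7000, 7001 and 7002). The following sketch generates them from a single template instead of three separate cells. `render_redis_conf` and `write_redis_configs` are hypothetical helpers introduced only for this illustration. The demo writes into a temporary directory. Passing `Path(".")` instead would write the same `redis-server-<i>/redis.conf` files that the cells above produce.

```python
import tempfile
from pathlib import Path


def render_redis_conf(port: int) -> str:
    """Return the redis.conf text for one cluster node (only the port varies)."""
    return "\n".join([
        "daemonize yes",
        f"port {port}",
        "cluster-enabled yes",
        "cluster-config-file nodes.conf",
        "appendonly no",
        'save ""',
    ]) + "\n"


def write_redis_configs(base_dir: Path, first_port: int = 7000, num_nodes: int = 3) -> list:
    """Write <base_dir>/redis-server-<i>/redis.conf for i = 1..num_nodes."""
    paths = []
    for i in range(num_nodes):
        node_dir = base_dir / f"redis-server-{i + 1}"
        node_dir.mkdir(parents=True, exist_ok=True)
        conf_path = node_dir / "redis.conf"
        conf_path.write_text(render_redis_conf(first_port + i))
        paths.append(conf_path)
    return paths


# Demo in a throwaway directory so nothing next to the notebook is touched.
with tempfile.TemporaryDirectory() as tmp:
    for path in write_redis_configs(Path(tmp)):
        print(path.relative_to(tmp), "->", path.read_text().splitlines()[1])
```

Keeping the shared settings in one place means that changing, for example, the persistence options only needs to be done once for all nodes.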

Step 3: Form the Redis cluster

WARNING: The following command will shut down any process named redis-server on the current system!

# Shut down the existing cluster (if any).
!pkill redis-server

# Reset configuration and start 3 Redis servers.
!cd redis-server-1 && rm -f nodes.conf && ./redis-server redis.conf
!cd redis-server-2 && rm -f nodes.conf && ./redis-server redis.conf
!cd redis-server-3 && rm -f nodes.conf && ./redis-server redis.conf

# Form the cluster.
!redis/src/redis-cli \
    --cluster create \
>>> Performing hash slots allocation on 3 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
M: fa9bb82124685a6438a696cc1562693ccc815ff0
   slots:[0-5460] (5461 slots) master
M: c6d7ad6353bf568d17a147e65b8198ded9d65717
   slots:[5461-10922] (5462 slots) master
M: e26ae6cfbeea8a1e6367444445364d963ae17436
   slots:[10923-16383] (5461 slots) master
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
>>> Performing Cluster Check (using node
M: fa9bb82124685a6438a696cc1562693ccc815ff0
   slots:[0-5460] (5461 slots) master
M: e26ae6cfbeea8a1e6367444445364d963ae17436
   slots:[10923-16383] (5461 slots) master
M: c6d7ad6353bf568d17a147e65b8198ded9d65717
   slots:[5461-10922] (5462 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
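Redis Cluster always uses 16,384 hash slots. The slot ranges above come from splitting these slots evenly across the three masters, and each key is routed to a slot by CRC16(key) mod 16384. The sketch below reproduces both calculations in plain Python. `allocate_slots` is a hypothetical helper modelled on the even-split arithmetic that `redis-cli --cluster create` appears to use for masters without replicas. It is not a Redis API. The CRC is the CRC-16/XMODEM variant described in the Redis Cluster specification.

```python
import math

NUM_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def allocate_slots(num_masters: int, num_slots: int = NUM_SLOTS) -> list:
    """Split [0, num_slots) into contiguous (first, last) ranges, one per master."""
    per_node = num_slots / num_masters
    ranges, first, cursor = [], 0, 0.0
    for i in range(num_masters):
        # C's lround() rounds halves away from zero; floor(x + 0.5) matches for x >= 0.
        last = math.floor(cursor + per_node - 1 + 0.5)
        if last > num_slots - 1 or i == num_masters - 1:
            last = num_slots - 1  # the final master takes whatever is left
        if last < first:
            last = first
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Map a key to its slot; a non-empty {hash tag} restricts hashing to the tag."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % NUM_SLOTS


for master, (lo, hi) in enumerate(allocate_slots(3)):
    print(f"Master[{master}] -> Slots {lo} - {hi} ({hi - lo + 1} slots)")
print(hex(crc16_xmodem(b"123456789")))  # CRC-16/XMODEM check value: 0x31c3
```

Running it prints the same three ranges as the cluster-create output. The middle master receives 5462 slots because the per-node share (16384 / 3) is not an integer and has to be rounded.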

Step 4: Run HugeCTR
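The cell below shifts every raw categorical ID by `key_offset` so that each slot gets its own key range. Slots 0-1 feed `sparse_embedding1` and slots 2-3 feed `sparse_embedding2`. It then reshapes each lookup result to `(batch_size, 2, embedding_vec_size)`. The following NumPy-only sketch shows that bookkeeping on a hypothetical four-sample batch. The zero-filled array is a stand-in for the HPS lookup output and only assumes that the lookup yields `batch_size * 2 * 16` values, which is what the reshape in the cell requires.

```python
import numpy as np

# Values copied from the HPS cell: four one-hot slots of 10,000 categories each.
slot_size_array = [10000, 10000, 10000, 10000]
key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]  # [0, 10000, 20000, 30000]

# Hypothetical toy batch of per-slot categorical IDs (one row per sample, one column per slot).
batch_size = 4
raw_ids = np.array([[3, 7, 0, 9999],
                    [1, 1, 1, 1],
                    [9999, 0, 5, 2],
                    [42, 43, 44, 45]], dtype=np.int64)

# Slots 0-1 go to table 0 (sparse_embedding1), slots 2-3 to table 1 (sparse_embedding2).
cat_input1 = (raw_ids[:, 0:2] + key_offset[0:2]).reshape((batch_size, 2, 1))
cat_input2 = (raw_ids[:, 2:4] + key_offset[2:4]).reshape((batch_size, 2, 1))

# After offsetting, the two tables see disjoint key ranges.
assert cat_input1.min() >= 0 and cat_input1.max() < 20000
assert cat_input2.min() >= 20000 and cat_input2.max() < 40000

# Stand-in for hps.lookup(cat_input1.flatten(), 'hps_demo', 0): 16 floats per key.
fake_lookup1 = np.zeros(cat_input1.size * 16, dtype=np.float32)
embedding1 = fake_lookup1.reshape(batch_size, 2, 16)
print(key_offset, cat_input1[:, :, 0].tolist(), embedding1.shape)
```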

import os
import time
import multiprocessing as mp
import pandas as pd
import numpy as np
import onnxruntime as ort
from hugectr import DatabaseType_t
from hugectr.inference import HPS, ParameterServerConfig, InferenceParams, VolatileDatabaseParams

slot_size_array = [10000, 10000, 10000, 10000]
key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]
batch_size = 1024


# 1. Configure the HPS hyperparameters.
ps_config = ParameterServerConfig(
       emb_table_name = {'hps_demo': ['sparse_embedding1', 'sparse_embedding2']},
       embedding_vec_size = {'hps_demo': [16, 32]},
       max_feature_num_per_sample_per_emb_table = {'hps_demo': [2, 2]},
       inference_params_array = [
          InferenceParams(
            model_name = 'hps_demo',
            max_batchsize = batch_size,
            hit_rate_threshold = 1.0,
            # The dense part is served by ONNX Runtime, so no dense model file is needed.
            dense_model_file = '',
            sparse_model_files = ['hps_demo0_sparse_1000.model', 'hps_demo1_sparse_1000.model'],
            deployed_devices = [0],
            use_gpu_embedding_cache = True,
            cache_size_percentage = 0.5,
            i64_input_key = True)
       ],
       # Store the embedding tables in the Redis cluster formed in Step 3.
       volatile_db = VolatileDatabaseParams(
            DatabaseType_t.redis_cluster,
            address = '',
            num_partitions = 15,
            num_node_connections = 5,
       ),
)

# 2. Initialize the HPS object.
hps = HPS(ps_config)
print('HPS initialized')

# 3. Load query data.
df = pd.read_parquet('data_parquet/val/gen_0.parquet')
dense_input_columns = df.columns[1:11]
cat_input1_columns = df.columns[11:13]
cat_input2_columns = df.columns[13:15]
dense_input = df[dense_input_columns].loc[0:batch_size-1].to_numpy(dtype=np.float32)
cat_input1 = (df[cat_input1_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[0:2]).reshape((batch_size, 2, 1))
cat_input2 = (df[cat_input2_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[2:4]).reshape((batch_size, 2, 1))

# 4. Make inference from the HPS object and the ONNX inference session of `hps_demo_without_embedding.onnx`.
embedding1 = hps.lookup(cat_input1.flatten(), 'hps_demo', 0).reshape(batch_size, 2, 16)
embedding2 = hps.lookup(cat_input2.flatten(), 'hps_demo', 1).reshape(batch_size, 2, 32)
sess = ort.InferenceSession('hps_demo_without_embedding.onnx')
res = sess.run(output_names=[sess.get_outputs()[0].name],
               input_feed={sess.get_inputs()[0].name: dense_input,
                           sess.get_inputs()[1].name: embedding1,
                           sess.get_inputs()[2].name: embedding2})
pred = res[0].flatten()

# 5. Check the correctness by comparing with dumped evaluation results.
ground_truth = np.load("ground_truth.npy").flatten()
print('                         HPS demo without embedding                            ')
print(f'Ground truth: {ground_truth.shape} = {ground_truth}')
print(f'Prediction without embedding: {pred.shape} = {pred}')

diff = pred - ground_truth
mse = np.mean(diff * diff)
print(f'MSE between prediction and ground_truth: {mse}')

# 6. Make inference with the ONNX inference session of `hps_demo_with_embedding.onnx` (double check).
sess_ref = ort.InferenceSession('hps_demo_with_embedding.onnx')
res_ref = sess_ref.run(output_names=[sess_ref.get_outputs()[0].name],
                       input_feed={sess_ref.get_inputs()[0].name: dense_input,
                                   sess_ref.get_inputs()[1].name: cat_input1,
                                   sess_ref.get_inputs()[2].name: cat_input2})
pred_ref = res_ref[0].flatten()

print('                           HPS demo with embedding                             ')
print(f'Ground truth: {ground_truth.shape} = {ground_truth}')
print(f'Prediction with embedding: {pred_ref.shape} = {pred_ref}')

diff_ref = pred_ref.flatten() - ground_truth
mse_ref = np.mean(diff_ref * diff_ref)
print(f'MSE between prediction and ground_truth: {mse_ref}')
HPS initialized
[HCTR][06:54:27.572][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
====================================================HPS Create====================================================
[HCTR][06:54:27.572][INFO][RK0][main]: Creating RedisCluster backend...
[HCTR][06:54:27.577][INFO][RK0][main]: RedisCluster: Connecting via
[HCTR][06:54:27.577][INFO][RK0][main]: Volatile DB: initial cache rate = 1
[HCTR][06:54:27.577][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
[HCTR][06:54:27.577][DEBUG][RK0][main]: Created raw model loader in local memory!
[HCTR][06:54:27.753][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding1; cached 18488 / 18488 embeddings in volatile database (RedisCluster); load: 18488 / 18446744073709551615 (0.00%).
[HCTR][06:54:27.873][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding2; cached 18470 / 18470 embeddings in volatile database (RedisCluster); load: 18470 / 18446744073709551615 (0.00%).
[HCTR][06:54:30.134][DEBUG][RK0][main]: Real-time subscribers created!
[HCTR][06:54:30.134][INFO][RK0][main]: Creating embedding cache in device 0.
[HCTR][06:54:30.140][INFO][RK0][main]: Model name: hps_demo
[HCTR][06:54:30.140][INFO][RK0][main]: Max batch size: 1024
[HCTR][06:54:30.140][INFO][RK0][main]: Fuse embedding tables: False
[HCTR][06:54:30.140][INFO][RK0][main]: Number of embedding tables: 2
[HCTR][06:54:30.140][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
[HCTR][06:54:30.140][INFO][RK0][main]: Embedding cache type: dynamic
[HCTR][06:54:30.140][INFO][RK0][main]: Use I64 input key: True
[HCTR][06:54:30.140][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
[HCTR][06:54:30.140][INFO][RK0][main]: The size of thread pool: 80
[HCTR][06:54:30.140][INFO][RK0][main]: The size of worker memory pool: 2
[HCTR][06:54:30.140][INFO][RK0][main]: The size of refresh memory pool: 1
[HCTR][06:54:30.140][INFO][RK0][main]: The refresh percentage : 0.000000
[HCTR][06:54:30.156][INFO][RK0][main]: LookupSession i64_input_key: True
[HCTR][06:54:30.156][INFO][RK0][main]: Creating lookup session for hps_demo on device: 0
                         HPS demo without embedding                            
Ground truth: (1024,) = [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
Prediction without embedding: (1024,) = [0.48954916 0.50902206 0.38192907 ... 0.52649266 0.5065045  0.4792769 ]
MSE between prediction and ground_truth: 2.3887142264200634e-15
                           HPS demo with embedding                             
Ground truth: (1024,) = [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
Prediction with embedding: (1024,) = [0.48954916 0.50902206 0.38192907 ... 0.52649266 0.5065045  0.4792769 ]
MSE between prediction and ground_truth: 2.3887142264200634e-15
2023-09-20 06:54:30.230052244 [W:onnxruntime:, CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
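The MSE above summarizes agreement between the two pipelines in a single number. If you prefer an explicit pass/fail check, a small helper along these lines could report the MSE, the largest absolute deviation, and an `np.allclose` verdict. The function name `compare_predictions` and its default `rtol`/`atol` tolerances are illustrative assumptions, not part of HugeCTR or ONNX Runtime; in this notebook you would call it as `compare_predictions(pred, ground_truth)`.

```python
import numpy as np


def compare_predictions(pred, ground_truth, rtol=1e-5, atol=1e-6):
    """Return (mse, max_abs_diff, all_close) for two prediction arrays.

    Hypothetical helper; the tolerances are assumptions suited to float32 outputs.
    """
    # Compare in float64 so the metric itself adds no rounding noise.
    pred = np.asarray(pred, dtype=np.float64).ravel()
    ground_truth = np.asarray(ground_truth, dtype=np.float64).ravel()
    if pred.shape != ground_truth.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {ground_truth.shape}")
    diff = pred - ground_truth
    mse = float(np.mean(diff * diff))
    max_abs = float(np.max(np.abs(diff)))
    all_close = bool(np.allclose(pred, ground_truth, rtol=rtol, atol=atol))
    return mse, max_abs, all_close


# Sample values: the first three entries printed in the output above.
gt = np.array([0.4895492, 0.509022, 0.38192913], dtype=np.float32)
pr = np.array([0.48954916, 0.50902206, 0.38192907], dtype=np.float32)
mse, max_abs, ok = compare_predictions(pr, gt)
print(f"MSE={mse:.3e}, max|diff|={max_abs:.3e}, allclose={ok}")
```

Reporting the maximum absolute deviation alongside the MSE makes it easier to spot a single badly looked-up embedding, which an average over 1024 samples can hide.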

Step 5: Shut down the Redis cluster

!pkill redis-server

5. Redis Cluster deployment (with TLS/SSL)

When using Redis as backing storage, HugeCTR can make use of TLS/SSL to encrypt data transfers. In the following steps, we set up a small Redis cluster and enable TLS/SSL for it.
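As an illustration of what enabling TLS means on the server side, the sketch below assembles a `redis.conf` fragment for one TLS-only cluster node. It assumes certificates named `redis.crt`, `redis.key`, and `ca.crt` in a hypothetical `cert_dir`, and a port chosen by the caller; the configuration used later in this notebook may differ. The directives themselves (`tls-port`, `tls-cert-file`, `tls-key-file`, `tls-ca-cert-file`, `tls-cluster`, `tls-replication`, `cluster-enabled`) are standard Redis 6+ options.

```python
from pathlib import Path


def tls_cluster_node_conf(tls_port, cert_dir):
    """Build a redis.conf fragment for one TLS-only cluster node (sketch).

    The certificate file names below are hypothetical placeholders.
    """
    cert_dir = Path(cert_dir)
    lines = [
        "port 0",                                    # disable the plaintext TCP listener
        f"tls-port {tls_port}",                      # accept TLS connections on this port
        f"tls-cert-file {cert_dir / 'redis.crt'}",   # server certificate
        f"tls-key-file {cert_dir / 'redis.key'}",    # server private key
        f"tls-ca-cert-file {cert_dir / 'ca.crt'}",   # CA used to verify peers
        "tls-cluster yes",                           # encrypt the cluster bus too
        "tls-replication yes",                       # encrypt replica links
        "cluster-enabled yes",
        f"cluster-config-file nodes-{tls_port}.conf",
    ]
    return "\n".join(lines) + "\n"


# Example: the fragment for a node listening on port 7000.
print(tls_cluster_node_conf(7000, "certs"))
```

Setting `port 0` disables the plaintext listener, so every client has to connect over TLS; dropping that line would leave both listeners active, which is convenient for debugging but weakens the setup.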

Step 1: Build a TLS/SSL-capable distribution of Redis
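The `BUILD_TLS=yes` flag passed to `make` is what compiles OpenSSL support into Redis, and the build log shows it being recorded in `.make-settings`. A minimal sketch for confirming this once the build has finished, assuming the settings file ends up at `redis/src/.make-settings` (both helper names are hypothetical):

```python
from pathlib import Path


def parse_build_tls(settings_text):
    """Return True if the text of a .make-settings file records BUILD_TLS=yes."""
    for line in settings_text.splitlines():
        # Lines look like KEY=VALUE; split only on the first '='.
        key, sep, value = line.partition("=")
        if sep and key.strip() == "BUILD_TLS":
            return value.strip() == "yes"
    return False


def build_has_tls(settings_path="redis/src/.make-settings"):
    """Return False if the settings file is missing, else parse it."""
    path = Path(settings_path)
    return path.is_file() and parse_build_tls(path.read_text())
```

Calling `build_has_tls()` from the notebook directory after the build should then return `True`; a `False` result suggests the build ran without `BUILD_TLS=yes` or in a different location.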

!rm -f 7.0.8.tar.gz && wget
!rm -rf redis-7.0.8 && tar -xf 7.0.8.tar.gz && ln -sf redis-7.0.8 redis
!cd redis && make BUILD_TLS=yes
--2023-09-20 06:55:14--
Resolving (
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: [following]
--2023-09-20 06:55:14--
Resolving (
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘7.0.8.tar.gz’

7.0.8.tar.gz            [     <=>            ]   2.87M  3.24MB/s    in 0.9s    

2023-09-20 06:55:15 (3.24 MB/s) - ‘7.0.8.tar.gz’ saved [3011655]

cd src && make all
make[1]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/src'
./ line 2: echo: write error: Broken pipe
    CC Makefile.dep
./ line 2: echo: write error: Broken pipe
rm -rf redis-server redis-sentinel redis-cli redis-benchmark redis-check-rdb redis-check-aof *.o *.gcda *.gcno *.gcov lcov-html Makefile.dep
rm -f adlist.d quicklist.d ae.d anet.d dict.d server.d sds.d zmalloc.d lzf_c.d lzf_d.d pqsort.d zipmap.d sha1.d ziplist.d release.d networking.d util.d object.d db.d replication.d rdb.d t_string.d t_list.d t_set.d t_zset.d t_hash.d config.d aof.d pubsub.d multi.d debug.d sort.d intset.d syncio.d cluster.d crc16.d endianconv.d slowlog.d eval.d bio.d rio.d rand.d memtest.d syscheck.d crcspeed.d crc64.d bitops.d sentinel.d notify.d setproctitle.d blocked.d hyperloglog.d latency.d sparkline.d redis-check-rdb.d redis-check-aof.d geo.d lazyfree.d module.d evict.d expire.d geohash.d geohash_helper.d childinfo.d defrag.d siphash.d rax.d t_stream.d listpack.d localtime.d lolwut.d lolwut5.d lolwut6.d acl.d tracking.d connection.d tls.d sha256.d timeout.d setcpuaffinity.d monotonic.d mt19937-64.d resp_parser.d call_reply.d script_lua.d script.d functions.d function_lua.d commands.d anet.d adlist.d dict.d redis-cli.d zmalloc.d release.d ae.d redisassert.d crcspeed.d crc64.d siphash.d crc16.d monotonic.d cli_common.d mt19937-64.d ae.d anet.d redis-benchmark.d adlist.d dict.d zmalloc.d redisassert.d release.d crcspeed.d crc64.d siphash.d crc16.d monotonic.d cli_common.d mt19937-64.d
(cd ../deps && make distclean)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
(cd hiredis && make clean) > /dev/null || true
(cd linenoise && make clean) > /dev/null || true
(cd lua && make clean) > /dev/null || true
(cd jemalloc && [ -f Makefile ] && make distclean) > /dev/null || true
(cd hdr_histogram && make clean) > /dev/null || true
(rm -f .make-*)
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
(cd modules && make clean)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/src/modules'
rm -rf *.xo *.so
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/src/modules'
(cd ../tests/modules && make clean)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/tests/modules'
rm -f commandfilter.xo basics.xo testrdb.xo fork.xo infotest.xo propagate.xo misc.xo hooks.xo blockonkeys.xo blockonbackground.xo scan.xo datatype.xo datatype2.xo auth.xo keyspace_events.xo blockedclient.xo getkeys.xo getchannels.xo test_lazyfree.xo timer.xo defragtest.xo keyspecs.xo hash.xo zset.xo stream.xo mallocsize.xo aclcheck.xo list.xo subcommands.xo reply.xo cmdintrospection.xo eventloop.xo moduleconfigs.xo moduleconfigstwo.xo publish.xo usercall.xo
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/tests/modules'
(rm -f .make-*)
echo STD=-pedantic -DREDIS_STATIC='' -std=c11 >> .make-settings
echo WARN=-Wall -W -Wno-missing-field-initializers >> .make-settings
echo OPT=-O2 >> .make-settings
echo MALLOC=jemalloc >> .make-settings
echo BUILD_TLS=yes >> .make-settings
echo USE_SYSTEMD= >> .make-settings
echo CFLAGS= >> .make-settings
echo LDFLAGS= >> .make-settings
echo REDIS_CFLAGS= >> .make-settings
echo REDIS_LDFLAGS= >> .make-settings
echo PREV_FINAL_CFLAGS=-pedantic -DREDIS_STATIC='' -std=c11 -Wall -W -Wno-missing-field-initializers -O2 -g -ggdb   -I../deps/hiredis -I../deps/linenoise -I../deps/lua/src -I../deps/hdr_histogram -DUSE_JEMALLOC -I../deps/jemalloc/include -DUSE_OPENSSL  >> .make-settings
echo PREV_FINAL_LDFLAGS=  -g -ggdb -rdynamic  >> .make-settings
(cd ../deps && make hiredis linenoise lua hdr_histogram jemalloc)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
(cd hiredis && make clean) > /dev/null || true
(cd linenoise && make clean) > /dev/null || true
(cd lua && make clean) > /dev/null || true
(cd jemalloc && [ -f Makefile ] && make distclean) > /dev/null || true
(cd hdr_histogram && make clean) > /dev/null || true
(rm -f .make-*)
(echo "" > .make-cflags)
(echo "" > .make-ldflags)
MAKE hiredis
cd hiredis && make static USE_SSL=1
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hiredis'
cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic alloc.c
cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic net.c
cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic hiredis.c
cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic sds.c
cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic async.c
cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic read.c
cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic sockcompat.c
ar rcs libhiredis.a alloc.o net.o hiredis.o sds.o async.o read.o sockcompat.o
cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic ssl.c
ar rcs libhiredis_ssl.a ssl.o
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hiredis'
MAKE linenoise
cd linenoise && make
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/linenoise'
cc  -Wall -Os -g  -c linenoise.c
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/linenoise'
MAKE lua
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/lua/src'
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ldebug.o ldebug.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lobject.o lobject.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lopcodes.o lopcodes.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lparser.o lparser.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lstate.o lstate.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lstring.o lstring.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ltable.o ltable.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lundump.o lundump.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o strbuf.o strbuf.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o fpconv.o fpconv.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lauxlib.o lauxlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lbaselib.o lbaselib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ldblib.o ldblib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o liolib.o liolib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lmathlib.o lmathlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o loslib.o loslib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ltablib.o ltablib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lstrlib.o lstrlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o loadlib.o loadlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_cjson.o lua_cjson.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_struct.o lua_struct.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_cmsgpack.o lua_cmsgpack.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_bit.o lua_bit.c
ar rc liblua.a lapi.o lcode.o ldebug.o ldo.o ldump.o lfunc.o lgc.o llex.o lmem.o lobject.o lopcodes.o lparser.o lstate.o lstring.o ltable.o ltm.o lundump.o lvm.o lzio.o strbuf.o fpconv.o lauxlib.o lbaselib.o ldblib.o liolib.o lmathlib.o loslib.o ltablib.o lstrlib.o loadlib.o linit.o lua_cjson.o lua_struct.o lua_cmsgpack.o lua_bit.o	# DLL needs all object files
ranlib liblua.a
cc -o lua  lua.o liblua.a -lm 
cc -o luac  luac.o print.o liblua.a -lm 
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/lua/src'
MAKE hdr_histogram
cd hdr_histogram && make
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hdr_histogram'
cc -std=c99 -Wall -Os -g  -DHDR_MALLOC_INCLUDE=\"hdr_redis_malloc.h\" -c  hdr_histogram.c 
ar rcs libhdrhistogram.a hdr_histogram.o
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hdr_histogram'
MAKE jemalloc
cd jemalloc && ./configure --with-version=5.2.1-0-g0 --with-lg-quantum=3 --with-jemalloc-prefix=je_ CFLAGS="-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops " LDFLAGS="" 
checking for xsltproc... false
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether compiler is cray... no
checking whether compiler supports -std=gnu11... yes
checking whether compiler supports -Wall... yes
checking whether compiler supports -Wextra... yes
checking whether compiler supports -Wshorten-64-to-32... no
checking whether compiler supports -Wsign-compare... yes
checking whether compiler supports -Wundef... yes
checking whether compiler supports -Wno-format-zero-length... yes
checking whether compiler supports -pipe... yes
checking whether compiler supports -g3... yes
checking how to run the C preprocessor... gcc -E
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking whether g++ supports C++14 features by default... yes
checking whether compiler supports -Wall... yes
checking whether compiler supports -Wextra... yes
checking whether compiler supports -g3... yes
checking whether libstdc++ linkage is compilable... yes
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking whether byte ordering is bigendian... no
checking size of void *... 8
checking size of int... 4
checking size of long... 8
checking size of long long... 8
checking size of intmax_t... 8
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking whether pause instruction is compilable... yes
checking number of significant virtual address bits... 48
checking for ar... ar
checking for nm... nm
checking for gawk... no
checking for mawk... mawk
checking malloc.h usability... yes
checking malloc.h presence... yes
checking for malloc.h... yes
checking whether malloc_usable_size definition can use const argument... no
checking for library containing log... -lm
checking whether __attribute__ syntax is compilable... yes
checking whether compiler supports -fvisibility=hidden... yes
checking whether compiler supports -fvisibility=hidden... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether tls_model attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether alloc_size attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether format(gnu_printf, ...) attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether format(printf, ...) attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether format(printf, ...) attribute is compilable... yes
checking for a BSD-compatible install... /usr/bin/install -c
checking for ranlib... ranlib
checking for ld... /usr/bin/ld
checking for autoconf... /usr/bin/autoconf
checking for memalign... yes
checking for valloc... yes
checking whether compiler supports -O3... yes
checking whether compiler supports -O3... yes
checking whether compiler supports -funroll-loops... yes
checking configured backtracing method... N/A
checking for sbrk... yes
checking whether utrace(2) is compilable... no
checking whether a program using __builtin_unreachable is compilable... yes
checking whether a program using __builtin_ffsl is compilable... yes
checking whether a program using __builtin_popcountl is compilable... yes
checking LG_PAGE... 12
checking pthread.h usability... yes
checking pthread.h presence... yes
checking for pthread.h... yes
checking for pthread_create in -lpthread... yes
checking dlfcn.h usability... yes
checking dlfcn.h presence... yes
checking for dlfcn.h... yes
checking for dlsym... yes
checking whether pthread_atfork(3) is compilable... yes
checking whether pthread_setname_np(3) is compilable... yes
checking for library containing clock_gettime... none required
checking whether clock_gettime(CLOCK_MONOTONIC_COARSE, ...) is compilable... yes
checking whether clock_gettime(CLOCK_MONOTONIC, ...) is compilable... yes
checking whether mach_absolute_time() is compilable... no
checking whether compiler supports -Werror... yes
checking whether syscall(2) is compilable... yes
checking for secure_getenv... yes
checking for sched_getcpu... yes
checking for sched_setaffinity... yes
checking for issetugid... no
checking for _malloc_thread_cleanup... no
checking for _pthread_mutex_init_calloc_cb... no
checking for TLS... yes
checking whether C11 atomics is compilable... no
checking whether GCC __atomic atomics is compilable... yes
checking whether GCC 8-bit __atomic atomics is compilable... yes
checking whether GCC __sync atomics is compilable... yes
checking whether GCC 8-bit __sync atomics is compilable... yes
checking whether Darwin OSAtomic*() is compilable... no
checking whether madvise(2) is compilable... yes
checking whether madvise(..., MADV_FREE) is compilable... yes
checking whether madvise(..., MADV_DONTNEED) is compilable... yes
checking whether madvise(..., MADV_DO[NT]DUMP) is compilable... yes
checking whether madvise(..., MADV_[NO]HUGEPAGE) is compilable... yes
checking for __builtin_clz... yes
checking whether Darwin os_unfair_lock_*() is compilable... no
checking whether glibc malloc hook is compilable... no
checking whether glibc memalign hook is compilable... no
checking whether pthreads adaptive mutexes is compilable... yes
checking whether compiler supports -D_GNU_SOURCE... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether strerror_r returns char with gnu source is compilable... yes
checking for stdbool.h that conforms to C99... yes
checking for _Bool... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating jemalloc.pc
config.status: creating doc/html.xsl
config.status: creating doc/manpages.xsl
config.status: creating doc/jemalloc.xml
config.status: creating include/jemalloc/jemalloc_macros.h
config.status: creating include/jemalloc/jemalloc_protos.h
config.status: creating include/jemalloc/jemalloc_typedefs.h
config.status: creating include/jemalloc/internal/jemalloc_preamble.h
config.status: creating test/
config.status: creating test/include/test/jemalloc_test.h
config.status: creating config.stamp
config.status: creating bin/jemalloc-config
config.status: creating bin/
config.status: creating bin/jeprof
config.status: creating include/jemalloc/jemalloc_defs.h
config.status: creating include/jemalloc/internal/jemalloc_internal_defs.h
config.status: creating test/include/test/jemalloc_test_defs.h
config.status: executing include/jemalloc/internal/public_symbols.txt commands
config.status: executing include/jemalloc/internal/private_symbols.awk commands
config.status: executing include/jemalloc/internal/private_symbols_jet.awk commands
config.status: executing include/jemalloc/internal/public_namespace.h commands
config.status: executing include/jemalloc/internal/public_unnamespace.h commands
config.status: executing include/jemalloc/jemalloc_protos_jet.h commands
config.status: executing include/jemalloc/jemalloc_rename.h commands
config.status: executing include/jemalloc/jemalloc_mangle.h commands
config.status: executing include/jemalloc/jemalloc_mangle_jet.h commands
config.status: executing include/jemalloc/jemalloc.h commands
jemalloc version   : 5.2.1-0-g0
library revision   : 2

CONFIG             : --with-version=5.2.1-0-g0 --with-lg-quantum=3 --with-jemalloc-prefix=je_ 'CFLAGS=-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops ' LDFLAGS=
CC                 : gcc
CONFIGURE_CFLAGS   : -std=gnu11 -Wall -Wextra -Wsign-compare -Wundef -Wno-format-zero-length -pipe -g3 -fvisibility=hidden -O3 -funroll-loops
SPECIFIED_CFLAGS   : -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops 
CXX                : g++
CONFIGURE_CXXFLAGS : -Wall -Wextra -g3 -fvisibility=hidden -O3
LDFLAGS            : 
DSO_LDFLAGS        : -shared -Wl,-soname,$(@F)
LIBS               : -lm -lstdc++ -pthread
RPATH_EXTRA        : 

XSLTPROC           : false
XSLROOT            : 

PREFIX             : /usr/local
BINDIR             : /usr/local/bin
DATADIR            : /usr/local/share
INCLUDEDIR         : /usr/local/include
LIBDIR             : /usr/local/lib
MANDIR             : /usr/local/share/man

srcroot            : 
abs_srcroot        : /hugectr/notebooks/tmr/redis/deps/jemalloc/
objroot            : 
abs_objroot        : /hugectr/notebooks/tmr/redis/deps/jemalloc/

                   : je_
install_suffix     : 
malloc_conf        : 
documentation      : 1
shared libs        : 1
static libs        : 1
autogen            : 0
debug              : 0
stats              : 1
experimetal_smallocx : 0
prof               : 0
prof-libunwind     : 0
prof-libgcc        : 0
prof-gcc           : 0
fill               : 1
utrace             : 0
xmalloc            : 0
log                : 0
lazy_lock          : 0
cache-oblivious    : 1
cxx                : 1
cd jemalloc && make CFLAGS="-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops " LDFLAGS="" lib/libjemalloc.a
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/jemalloc'
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/jemalloc.sym.o src/jemalloc.c
nm -a src/jemalloc.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/jemalloc.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/arena.sym.o src/arena.c
nm -a src/arena.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/arena.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/background_thread.sym.o src/background_thread.c
nm -a src/background_thread.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/background_thread.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/base.sym.o src/base.c
nm -a src/base.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/base.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/bin.sym.o src/bin.c
nm -a src/bin.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/bin.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/bitmap.sym.o src/bitmap.c
nm -a src/bitmap.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/bitmap.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ckh.sym.o src/ckh.c
nm -a src/ckh.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ckh.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ctl.sym.o src/ctl.c
nm -a src/ctl.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ctl.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/div.sym.o src/div.c
nm -a src/div.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/div.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent.sym.o src/extent.c
nm -a src/extent.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent_dss.sym.o src/extent_dss.c
nm -a src/extent_dss.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent_dss.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent_mmap.sym.o src/extent_mmap.c
nm -a src/extent_mmap.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent_mmap.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/hash.sym.o src/hash.c
nm -a src/hash.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/hash.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/hook.sym.o src/hook.c
nm -a src/hook.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/hook.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/large.sym.o src/large.c
nm -a src/large.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/large.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/log.sym.o src/log.c
nm -a src/log.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/log.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/malloc_io.sym.o src/malloc_io.c
nm -a src/malloc_io.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/malloc_io.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/mutex.sym.o src/mutex.c
nm -a src/mutex.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/mutex.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/mutex_pool.sym.o src/mutex_pool.c
nm -a src/mutex_pool.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/mutex_pool.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/nstime.sym.o src/nstime.c
nm -a src/nstime.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/nstime.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/pages.sym.o src/pages.c
nm -a src/pages.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/pages.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/prng.sym.o src/prng.c
nm -a src/prng.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/prng.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/prof.sym.o src/prof.c
nm -a src/prof.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/prof.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/rtree.sym.o src/rtree.c
nm -a src/rtree.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/rtree.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/safety_check.sym.o src/safety_check.c
nm -a src/safety_check.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/safety_check.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/stats.sym.o src/stats.c
nm -a src/stats.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/stats.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/sc.sym.o src/sc.c
nm -a src/sc.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/sc.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/sz.sym.o src/sz.c
nm -a src/sz.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/sz.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/tcache.sym.o src/tcache.c
nm -a src/tcache.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/tcache.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/test_hooks.sym.o src/test_hooks.c
nm -a src/test_hooks.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/test_hooks.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ticker.sym.o src/ticker.c
nm -a src/ticker.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ticker.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/tsd.sym.o src/tsd.c
nm -a src/tsd.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/tsd.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/witness.sym.o src/witness.c
nm -a src/witness.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/witness.sym
/bin/sh include/jemalloc/internal/ src/jemalloc.sym src/arena.sym src/background_thread.sym src/base.sym src/bin.sym src/bitmap.sym src/ckh.sym src/ctl.sym src/div.sym src/extent.sym src/extent_dss.sym src/extent_mmap.sym src/hash.sym src/hook.sym src/large.sym src/log.sym src/malloc_io.sym src/mutex.sym src/mutex_pool.sym src/nstime.sym src/pages.sym src/prng.sym src/prof.sym src/rtree.sym src/safety_check.sym src/stats.sym src/sc.sym src/sz.sym src/tcache.sym src/test_hooks.sym src/ticker.sym src/tsd.sym src/witness.sym > include/jemalloc/internal/private_namespace.gen.h
cp include/jemalloc/internal/private_namespace.gen.h include/jemalloc/internal/private_namespace.gen.h
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/jemalloc.o src/jemalloc.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/arena.o src/arena.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/background_thread.o src/background_thread.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/base.o src/base.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/bin.o src/bin.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/bitmap.o src/bitmap.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ckh.o src/ckh.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ctl.o src/ctl.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/div.o src/div.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent.o src/extent.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent_dss.o src/extent_dss.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent_mmap.o src/extent_mmap.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/hash.o src/hash.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/hook.o src/hook.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/large.o src/large.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/log.o src/log.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/malloc_io.o src/malloc_io.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/mutex.o src/mutex.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/mutex_pool.o src/mutex_pool.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/nstime.o src/nstime.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/pages.o src/pages.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/prng.o src/prng.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/prof.o src/prof.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/rtree.o src/rtree.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/safety_check.o src/safety_check.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/stats.o src/stats.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/sc.o src/sc.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/sz.o src/sz.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/tcache.o src/tcache.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/test_hooks.o src/test_hooks.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ticker.o src/ticker.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/tsd.o src/tsd.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/witness.o src/witness.c
g++ -Wall -Wextra -g3 -fvisibility=hidden -O3 -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/jemalloc_cpp.o src/jemalloc_cpp.cpp
ar crus lib/libjemalloc.a src/jemalloc.o src/arena.o src/background_thread.o src/base.o src/bin.o src/bitmap.o src/ckh.o src/ctl.o src/div.o src/extent.o src/extent_dss.o src/extent_mmap.o src/hash.o src/hook.o src/large.o src/log.o src/malloc_io.o src/mutex.o src/mutex_pool.o src/nstime.o src/pages.o src/prng.o src/prof.o src/rtree.o src/safety_check.o src/stats.o src/sc.o src/sz.o src/tcache.o src/test_hooks.o src/ticker.o src/tsd.o src/witness.o src/jemalloc_cpp.o
ar: `u' modifier ignored since `D' is the default (see `U')
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/jemalloc'
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
    CC adlist.o
    CC quicklist.o
    CC ae.o
    CC anet.o
    CC dict.o
    CC server.o
    CC sds.o
    CC zmalloc.o
    CC lzf_c.o
    CC lzf_d.o
    CC pqsort.o
    CC zipmap.o
    CC sha1.o
    CC ziplist.o
    CC release.o
    CC networking.o
    CC util.o
    CC object.o
    CC db.o
    CC replication.o
    CC rdb.o
    CC t_string.o
    CC t_list.o
    CC t_set.o
    CC t_zset.o
    CC t_hash.o
    CC config.o
    CC aof.o
    CC pubsub.o
    CC multi.o
    CC debug.o
    CC sort.o
    CC intset.o
    CC syncio.o
    CC cluster.o
    CC crc16.o
    CC endianconv.o
    CC slowlog.o
    CC eval.o
    CC bio.o
    CC rio.o
    CC rand.o
    CC memtest.o
    CC syscheck.o
    CC crcspeed.o
    CC crc64.o
    CC bitops.o
    CC sentinel.o
    CC notify.o
    CC setproctitle.o
    CC blocked.o
    CC hyperloglog.o
    CC latency.o
    CC sparkline.o
    CC redis-check-rdb.o
    CC redis-check-aof.o
    CC geo.o
    CC lazyfree.o
    CC module.o
    CC evict.o
    CC expire.o
    CC geohash.o
    CC geohash_helper.o
    CC childinfo.o
    CC defrag.o
    CC siphash.o
    CC rax.o
    CC t_stream.o
    CC listpack.o
    CC localtime.o
    CC lolwut.o
    CC lolwut5.o
    CC lolwut6.o
    CC acl.o
    CC tracking.o
    CC connection.o
    CC tls.o
    CC sha256.o
    CC timeout.o
    CC setcpuaffinity.o
    CC monotonic.o
    CC mt19937-64.o
    CC resp_parser.o
    CC call_reply.o
    CC script_lua.o
    CC script.o
    CC functions.o
    CC function_lua.o
    CC commands.o
    LINK redis-server
    INSTALL redis-sentinel
    CC redis-cli.o
    CC redisassert.o
    CC cli_common.o
    LINK redis-cli
    CC redis-benchmark.o
    LINK redis-benchmark
    INSTALL redis-check-rdb
    INSTALL redis-check-aof

Hint: It's a good idea to run 'make test' ;)

make[1]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/src'

If you see the message "Hint: It's a good idea to run 'make test' ;)" followed by "make[1]: Leaving directory ...", the compilation has completed successfully.
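If you prefer a programmatic check over scanning the log, the following minimal sketch tests whether the two binaries used later in this notebook exist and are executable. The helper name missing_redis_binaries is hypothetical, and the default directory redis/src is an assumption taken from the paths used in the cells below (the build above ran inside a versioned redis-7.0.8 directory).

```python
import os
from pathlib import Path

# Binaries invoked directly by the later cells of this notebook.
EXPECTED_BINARIES = ("redis-server", "redis-cli")


def missing_redis_binaries(src_dir="redis/src"):
    """Return the expected binaries that are absent or not executable in src_dir.

    The default src_dir is an assumption matching the relative paths used below.
    """
    missing = []
    for name in EXPECTED_BINARIES:
        path = Path(src_dir) / name
        # A usable binary must be a regular file with the execute bit set.
        if not (path.is_file() and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


print(missing_redis_binaries() or "Redis build looks complete")
```

An empty list means both binaries are in place; otherwise the list names what still needs to be built.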

Step 2: Configure a mock Redis cluster

Set up TLS/SSL certificates. You can skip this step if encryption is not needed.

WARNING: The following commands will erase all contents of the following directories: test_certs, redis-server-1, redis-server-2 and redis-server-3.

!mkdir -p test_certs
!rm -f test_certs/*

with open("test_certs/openssl.conf", "w") as f:
    f.write("""[ redis_server ]
keyUsage = digitalSignature, keyEncipherment

[ hugectr_client ]
keyUsage = digitalSignature, keyEncipherment
nsCertType = client""")
# Create private keys for CA, Redis server and HugeCTR client.
!openssl genrsa -out test_certs/ca-private.pem 4096
!openssl genrsa -out test_certs/redis-private.pem 4096
!openssl genrsa -out test_certs/hugectr-private.pem 4096

# Create public keys for CA, Redis server and HugeCTR client.
#!openssl rsa -pubout -in test_certs/ca-private.pem -out test_certs/ca-public.pem
#!openssl rsa -pubout -in test_certs/redis-private.pem -out test_certs/redis-public.pem
#!openssl rsa -pubout -in test_certs/hugectr-private.pem -out test_certs/hugectr-public.pem

# Form dummy CA.
!openssl req -new -nodes -sha256 -x509 -subj '/O=NVIDIA Merlin/CN=Certificate Authority' -days 365 \
    -key test_certs/ca-private.pem \
    -out test_certs/ca.crt
# Generate certificate for Redis server.
!openssl req -new -sha256 -subj "/O=NVIDIA Merlin/CN=Redis Server" \
    -key test_certs/redis-private.pem | \
        openssl x509 -req -sha256 \
            -CA test_certs/ca.crt \
            -CAkey test_certs/ca-private.pem \
            -CAserial test_certs/redis.ser \
            -CAcreateserial \
            -days 365 \
            -extfile test_certs/openssl.conf -extensions redis_server \
            -out test_certs/redis.crt

# Generate certificate for HugeCTR client.
!openssl req -new -sha256 -subj "/O=NVIDIA Merlin/CN=HugeCTR Redis Client" \
        -key test_certs/hugectr-private.pem | \
        openssl x509 \
            -req -sha256 \
            -CA test_certs/ca.crt \
            -CAkey test_certs/ca-private.pem \
            -CAserial test_certs/hugectr.ser \
            -CAcreateserial \
            -days 365 \
            -extfile test_certs/openssl.conf -extensions hugectr_client \
            -out test_certs/hugectr.crt
Certificate request self-signature ok
subject=O = NVIDIA Merlin, CN = Redis Server
Certificate request self-signature ok
subject=O = NVIDIA Merlin, CN = HugeCTR Redis Client
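The "!" lines above are IPython shell escapes. Outside a notebook, the same signing step (pipe a certificate request from "openssl req" into "openssl x509 -req") could be driven from Python with subprocess. The sketch below only rebuilds the argument lists used above; the helper names sign_certificate_cmds and run_pipeline are hypothetical, and running the pipeline assumes openssl is on PATH and the keys and openssl.conf already exist.

```python
import subprocess


def sign_certificate_cmds(name, common_name, extension, certs_dir="test_certs"):
    """Build the two openssl argv lists that issue a CA-signed certificate.

    File names follow the scheme of the cells above:
    <certs_dir>/<name>-private.pem -> <certs_dir>/<name>.crt
    """
    req = ["openssl", "req", "-new", "-sha256",
           "-subj", f"/O=NVIDIA Merlin/CN={common_name}",
           "-key", f"{certs_dir}/{name}-private.pem"]
    x509 = ["openssl", "x509", "-req", "-sha256",
            "-CA", f"{certs_dir}/ca.crt",
            "-CAkey", f"{certs_dir}/ca-private.pem",
            "-CAserial", f"{certs_dir}/{name}.ser",
            "-CAcreateserial",
            "-days", "365",
            "-extfile", f"{certs_dir}/openssl.conf", "-extensions", extension,
            "-out", f"{certs_dir}/{name}.crt"]
    return req, x509


def run_pipeline(req, x509):
    """Feed the CSR printed by `openssl req` to `openssl x509` on stdin."""
    csr = subprocess.run(req, check=True, capture_output=True).stdout
    subprocess.run(x509, input=csr, check=True)
```

For example, run_pipeline(*sign_certificate_cmds("redis", "Redis Server", "redis_server")) corresponds to the Redis server certificate cell.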
!mkdir -p redis-server-1 redis-server-2 redis-server-3
!rm -f redis-server-1/* redis-server-2/* redis-server-3/*

!ln -sf $PWD/redis/src/redis-server redis-server-1/redis-server
!ln -sf $PWD/redis/src/redis-server redis-server-2/redis-server
!ln -sf $PWD/redis/src/redis-server redis-server-3/redis-server

!ln -sf $PWD/test_certs/ca.crt redis-server-1/ca.crt
!ln -sf $PWD/test_certs/ca.crt redis-server-2/ca.crt
!ln -sf $PWD/test_certs/ca.crt redis-server-3/ca.crt

!ln -sf $PWD/test_certs/redis-private.pem redis-server-1/private.pem
!ln -sf $PWD/test_certs/redis-private.pem redis-server-2/private.pem
!ln -sf $PWD/test_certs/redis-private.pem redis-server-3/private.pem

!ln -sf $PWD/test_certs/redis.crt redis-server-1/redis.crt
!ln -sf $PWD/test_certs/redis.crt redis-server-2/redis.crt
!ln -sf $PWD/test_certs/redis.crt redis-server-3/redis.crt
%%writefile redis-server-1/redis.conf
daemonize yes
port 0
cluster-enabled yes
cluster-config-file nodes.conf
tls-port 7000
tls-ca-cert-file ca.crt
tls-cert-file redis.crt
tls-key-file private.pem
tls-cluster yes
appendonly no
save ""
Writing redis-server-1/redis.conf
%%writefile redis-server-2/redis.conf
daemonize yes
port 0
cluster-enabled yes
cluster-config-file nodes.conf
tls-port 7001
tls-ca-cert-file ca.crt
tls-cert-file redis.crt
tls-key-file private.pem
tls-cluster yes
appendonly no
save ""
Writing redis-server-2/redis.conf
%%writefile redis-server-3/redis.conf
daemonize yes
port 0
cluster-enabled yes
cluster-config-file nodes.conf
tls-port 7002
tls-ca-cert-file ca.crt
tls-cert-file redis.crt
tls-key-file private.pem
tls-cluster yes
appendonly no
save ""
Writing redis-server-3/redis.conf
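The three configuration files differ only in tls-port. As a sketch, assuming you want to script the setup rather than write each file by hand, a template plus a loop produces the same files. The function names are hypothetical; the directive contents are copied from the cells above.

```python
from pathlib import Path

# Directives shared by every node, copied from the %%writefile cells:
#   port 0                  -> disable the plaintext TCP port; only TLS connections are accepted
#   tls-cluster yes         -> also use TLS on the cluster bus between nodes
#   appendonly no, save ""  -> disable AOF and RDB persistence for this mock cluster
CONF_TEMPLATE = """daemonize yes
port 0
cluster-enabled yes
cluster-config-file nodes.conf
tls-port {tls_port}
tls-ca-cert-file ca.crt
tls-cert-file redis.crt
tls-key-file private.pem
tls-cluster yes
appendonly no
save ""
"""


def render_redis_conf(tls_port):
    """Return the redis.conf text for a node listening on tls_port."""
    return CONF_TEMPLATE.format(tls_port=tls_port)


def write_redis_confs(base_dir=".", first_port=7000, num_nodes=3):
    """Write redis-server-<i>/redis.conf for i = 1..num_nodes with consecutive TLS ports."""
    paths = []
    for i in range(num_nodes):
        node_dir = Path(base_dir) / f"redis-server-{i + 1}"
        node_dir.mkdir(parents=True, exist_ok=True)
        conf = node_dir / "redis.conf"
        conf.write_text(render_redis_conf(first_port + i))
        paths.append(conf)
    return paths
```

Keeping the shared directives in one template means a change (for example, enabling persistence) is made once instead of three times.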

Step 3: Form Redis cluster

WARNING: The following command will shut down all processes named redis-server on the current system!

# Shutdown existing cluster (if any).
!pkill redis-server

# Reset configuration and start 3 Redis servers.
!cd redis-server-1 && rm -f nodes.conf && ./redis-server redis.conf
!cd redis-server-2 && rm -f nodes.conf && ./redis-server redis.conf
!cd redis-server-3 && rm -f nodes.conf && ./redis-server redis.conf

# Form the cluster.
!redis/src/redis-cli \
    --cluster create \
    --cluster-yes \
    --tls \
    --cacert test_certs/ca.crt \
    --cert test_certs/hugectr.crt \
    --key test_certs/hugectr-private.pem
>>> Performing hash slots allocation on 3 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
M: a441806db5506b7600ee8ae794fa01dc31ac83c9
   slots:[0-5460] (5461 slots) master
M: 6fa93392a396aa3c321736234b7eafc86bb1f979
   slots:[5461-10922] (5462 slots) master
M: 8e9cd68cc229fcb568a84d7358011201b4246046
   slots:[10923-16383] (5461 slots) master
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
>>> Performing Cluster Check (using node
M: a441806db5506b7600ee8ae794fa01dc31ac83c9
   slots:[0-5460] (5461 slots) master
M: 8e9cd68cc229fcb568a84d7358011201b4246046
   slots:[10923-16383] (5461 slots) master
M: 6fa93392a396aa3c321736234b7eafc86bb1f979
   slots:[5461-10922] (5462 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
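Redis Cluster assigns every key to one of 16384 hash slots, and the "Master[i] -> Slots" lines above show how redis-cli split those slots across the three nodes. The sketch below reproduces that split and computes a key's slot as CRC16(key) mod 16384, hashing only a non-empty {hash tag} when present. The allocation function is an approximation of redis-cli's rounding (it yields the three ranges printed above, but it is not a copy of redis-cli's code), and all function names are hypothetical. How HPS itself names the keys it stores in Redis is not covered here.

```python
import math

NUM_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def allocate_slots(num_masters):
    """Split [0, 16384) into contiguous ranges, one per master node."""
    per_node = NUM_SLOTS / num_masters
    ranges, first, cursor = [], 0, 0.0
    for i in range(num_masters):
        last = math.floor(cursor + per_node - 1 + 0.5)  # round half up
        if i == num_masters - 1 or last > NUM_SLOTS - 1:
            last = NUM_SLOTS - 1  # the last master takes whatever remains
        last = max(last, first)
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Slot of a key; only the first non-empty {...} section is hashed if present."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # an empty "{}" tag means the whole key is hashed
            key = key[start + 1:end]
    return crc16_xmodem(key) % NUM_SLOTS


def owner(key: bytes, ranges) -> int:
    """Index of the master whose slot range contains the key's slot."""
    slot = key_slot(key)
    return next(i for i, (lo, hi) in enumerate(ranges) if lo <= slot <= hi)


print(allocate_slots(3))
```

Hash tags are how related keys are forced onto the same node: "foo{hash_tag}" and "bar{hash_tag}" share a slot because only "hash_tag" is hashed.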

Step 4: Run HugeCTR

import os
import time
import multiprocessing as mp
import pandas as pd
import numpy as np
import onnxruntime as ort
from hugectr import DatabaseType_t
from hugectr.inference import HPS, ParameterServerConfig, InferenceParams, VolatileDatabaseParams

slot_size_array = [10000, 10000, 10000, 10000]
key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]
batch_size = 1024


# 1. Configure the HPS hyperparameters.
ps_config = ParameterServerConfig(
       emb_table_name = {'hps_demo': ['sparse_embedding1', 'sparse_embedding2']},
       embedding_vec_size = {'hps_demo': [16, 32]},
       max_feature_num_per_sample_per_emb_table = {'hps_demo': [2, 2]},
       inference_params_array = [
          InferenceParams(
            model_name = 'hps_demo',
            max_batchsize = batch_size,
            hit_rate_threshold = 1.0,
            dense_model_file = '',
            sparse_model_files = ['hps_demo0_sparse_1000.model', 'hps_demo1_sparse_1000.model'],
            deployed_devices = [0],
            use_gpu_embedding_cache = True,
            cache_size_percentage = 0.5,
            i64_input_key = True)
       ],
       # Back the volatile database with the TLS-enabled Redis cluster formed in Step 3.
       volatile_db = VolatileDatabaseParams(
            DatabaseType_t.redis_cluster,
            address = '',
            num_partitions = 15,
            num_node_connections = 5,
            enable_tls = True,
            tls_ca_certificate = 'test_certs/ca.crt',
            tls_client_certificate = 'test_certs/hugectr.crt',
            tls_client_key = 'test_certs/hugectr-private.pem',
            tls_server_name_identification = 'redis.localhost'))

# 2. Initialize the HPS object.
hps = HPS(ps_config)
print('HPS initialized')

# 3. Load query data.
df = pd.read_parquet('data_parquet/val/gen_0.parquet')
dense_input_columns = df.columns[1:11]
cat_input1_columns = df.columns[11:13]
cat_input2_columns = df.columns[13:15]
dense_input = df[dense_input_columns].loc[0:batch_size-1].to_numpy(dtype=np.float32)
cat_input1 = (df[cat_input1_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[0:2]).reshape((batch_size, 2, 1))
cat_input2 = (df[cat_input2_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[2:4]).reshape((batch_size, 2, 1))

# 4. Make inference from the HPS object and the ONNX inference session of `hps_demo_without_embedding.onnx`.
embedding1 = hps.lookup(cat_input1.flatten(), 'hps_demo', 0).reshape(batch_size, 2, 16)
embedding2 = hps.lookup(cat_input2.flatten(), 'hps_demo', 1).reshape(batch_size, 2, 32)
sess = ort.InferenceSession('hps_demo_without_embedding.onnx')
res = sess.run(output_names=[sess.get_outputs()[0].name],
               input_feed={sess.get_inputs()[0].name: dense_input,
                           sess.get_inputs()[1].name: embedding1,
                           sess.get_inputs()[2].name: embedding2})
pred = res[0].flatten()

# 5. Check the correctness by comparing with dumped evaluation results.
ground_truth = np.load("ground_truth.npy").flatten()
print('                         HPS demo without embedding                            ')
print(f'Ground truth: {ground_truth.shape} = {ground_truth}')
print(f'Prediction without embedding: {pred.shape} = {pred}')

diff = pred - ground_truth
mse = np.mean(diff * diff)
print(f'MSE between prediction and ground_truth: {mse}')

# 6. Make inference with the ONNX inference session of `hps_demo_with_embedding.onnx` (double check).
sess_ref = ort.InferenceSession('hps_demo_with_embedding.onnx')
res_ref = sess_ref.run(output_names=[sess_ref.get_outputs()[0].name],
                       input_feed={sess_ref.get_inputs()[0].name: dense_input,
                                   sess_ref.get_inputs()[1].name: cat_input1,
                                   sess_ref.get_inputs()[2].name: cat_input2})
pred_ref = res_ref[0].flatten()

print('                           HPS demo with embedding                             ')
print(f'Ground truth: {ground_truth.shape} = {ground_truth}')
print(f'Prediction with embedding: {pred_ref.shape} = {pred_ref}')

diff_ref = pred_ref.flatten() - ground_truth
mse_ref = np.mean(diff_ref * diff_ref)
print(f'MSE between prediction and ground_truth: {mse_ref}')
[HCTR][07:00:07.643][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
HPS initialized
====================================================HPS Create====================================================
[HCTR][07:00:07.643][INFO][RK0][main]: Creating RedisCluster backend...
[HCTR][07:00:07.644][INFO][RK0][main]: RedisCluster: Connecting via
[HCTR][07:00:07.667][INFO][RK0][main]: Volatile DB: initial cache rate = 1
[HCTR][07:00:07.667][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
[HCTR][07:00:07.667][DEBUG][RK0][main]: Created raw model loader in local memory!
[HCTR][07:00:07.894][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding1; cached 18488 / 18488 embeddings in volatile database (RedisCluster); load: 18488 / 18446744073709551615 (0.00%).
[HCTR][07:00:07.984][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding2; cached 18470 / 18470 embeddings in volatile database (RedisCluster); load: 18470 / 18446744073709551615 (0.00%).
[HCTR][07:00:07.984][DEBUG][RK0][main]: Real-time subscribers created!
[HCTR][07:00:07.984][INFO][RK0][main]: Creating embedding cache in device 0.
[HCTR][07:00:07.990][INFO][RK0][main]: Model name: hps_demo
[HCTR][07:00:07.990][INFO][RK0][main]: Max batch size: 1024
[HCTR][07:00:07.990][INFO][RK0][main]: Fuse embedding tables: False
[HCTR][07:00:07.990][INFO][RK0][main]: Number of embedding tables: 2
[HCTR][07:00:07.990][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
[HCTR][07:00:07.990][INFO][RK0][main]: Embedding cache type: dynamic
[HCTR][07:00:07.990][INFO][RK0][main]: Use I64 input key: True
[HCTR][07:00:07.990][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
[HCTR][07:00:07.990][INFO][RK0][main]: The size of thread pool: 80
[HCTR][07:00:07.990][INFO][RK0][main]: The size of worker memory pool: 2
[HCTR][07:00:07.990][INFO][RK0][main]: The size of refresh memory pool: 1
[HCTR][07:00:07.990][INFO][RK0][main]: The refresh percentage : 0.000000
[HCTR][07:00:07.995][INFO][RK0][main]: LookupSession i64_input_key: True
[HCTR][07:00:07.995][INFO][RK0][main]: Creating lookup session for hps_demo on device: 0
[HCTR][07:00:07.998][INFO][RK0][main]: RedisCluster: Awaiting background worker to conclude...
[HCTR][07:00:07.998][INFO][RK0][main]: RedisCluster: Disconnecting...
                         HPS demo without embedding                            
Ground truth: (1024,) = [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
Prediction without embedding: (1024,) = [0.48954916 0.50902206 0.38192907 ... 0.52649266 0.5065045  0.4792769 ]
MSE between prediction and ground_truth: 2.3887142264200634e-15
                           HPS demo with embedding                             
Ground truth: (1024,) = [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
Prediction with embedding: (1024,) = [0.48954916 0.50902206 0.38192907 ... 0.52649266 0.5065045  0.4792769 ]
MSE between prediction and ground_truth: 2.3887142264200634e-15
2023-09-20 07:00:08.022623188 [W:onnxruntime:, CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
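The query preparation above relies on two details: key_offset shifts each slot's local category IDs into a shared key space, and the lookup result is reshaped to (batch, features, vec_size) before it is fed to the ONNX model. The toy sketch below illustrates both with NumPy only. fake_lookup is a hypothetical stand-in for hps.lookup that indexes a random table; the real lookup goes through the GPU embedding cache and the Redis-backed volatile database configured above, and we assume (as the notebook's reshape implies) that it returns vectors in key order. The toy_ prefix avoids overwriting the notebook's variables.

```python
import numpy as np

toy_slot_sizes = [10000, 10000, 10000, 10000]
# Exclusive prefix sum: slot i's IDs are shifted by the total size of slots 0..i-1,
# so IDs from different slots never collide.
toy_key_offset = np.insert(np.cumsum(toy_slot_sizes), 0, 0)[:-1]

toy_batch, toy_vec = 4, 16
rng = np.random.default_rng(0)
# Local category IDs for slots 0 and 1 (the two features of the first embedding table).
toy_local_ids = rng.integers(0, 10000, size=(toy_batch, 2))
# Broadcasting adds offset 0 to column 0 and 10000 to column 1, as for cat_input1 above.
toy_keys = (toy_local_ids + toy_key_offset[0:2]).reshape((toy_batch, 2, 1))

# Hypothetical stand-in for hps.lookup: one float32 row per global key.
toy_table = rng.standard_normal((sum(toy_slot_sizes), toy_vec)).astype(np.float32)


def fake_lookup(keys):
    """Return one embedding row per key, in key order."""
    return toy_table[keys]


toy_vectors = fake_lookup(toy_keys.flatten())                 # (toy_batch * 2, toy_vec)
toy_embedding = toy_vectors.reshape(toy_batch, 2, toy_vec)    # (batch, features, vec_size)
print(toy_key_offset, toy_embedding.shape)
```

Because the flattened keys are in row-major order, toy_embedding[b, f] is the vector for sample b, feature f, which is the layout the reshape in step 4 expects.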

Step 5: Shut down Redis cluster

!pkill redis-server
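pkill stops every redis-server process on the machine. If other Redis instances share the host, a more targeted option is to send SHUTDOWN NOSAVE to each node of this cluster only. The sketch below builds one redis-cli command per node, reusing the client certificate from Step 2. It assumes the nodes listen on redis-cli's default host (127.0.0.1) at the TLS ports from the configuration files; the function names are hypothetical.

```python
import subprocess


def shutdown_cmds(ports=(7000, 7001, 7002), certs_dir="test_certs",
                  cli="redis/src/redis-cli"):
    """Build one `redis-cli ... shutdown nosave` command per cluster node."""
    return [[cli, "-p", str(port), "--tls",
             "--cacert", f"{certs_dir}/ca.crt",
             "--cert", f"{certs_dir}/hugectr.crt",
             "--key", f"{certs_dir}/hugectr-private.pem",
             "shutdown", "nosave"]
            for port in ports]


def shutdown_cluster(**kwargs):
    """Stop only the nodes of this cluster; NOSAVE skips writing an RDB snapshot."""
    for cmd in shutdown_cmds(**kwargs):
        # The server closes the connection while shutting down, so the return
        # code is not treated as a failure here.
        subprocess.run(cmd)
```

Calling shutdown_cluster() leaves unrelated redis-server processes running, at the cost of needing the correct ports and certificates.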