Hierarchical Parameter Server Demo
Overview
In HugeCTR version 3.5, we provide Python APIs for embedding table lookup with the HugeCTR Hierarchical Parameter Server (HPS). HPS supports different database backends and GPU embedding caches.
This notebook demonstrates how to use HPS with the HugeCTR Python APIs. Without loss of generality, the HPS APIs are used together with the ONNX Runtime APIs to create an ensemble inference model, where HPS is responsible for embedding table lookup while the ONNX model handles the forward pass of the dense neural network.
Setup
To set up the environment, refer to HugeCTR Example Notebooks and follow the instructions there before running the following cells.
Data Generation
HugeCTR provides a tool to generate synthetic datasets. The Data Generator can generate datasets in different file formats and with different distributions. We will generate one-hot Parquet datasets with a power-law distribution for this notebook:
import hugectr
from hugectr.tools import DataGeneratorParams, DataGenerator
data_generator_params = DataGeneratorParams(
format = hugectr.DataReaderType_t.Parquet,
label_dim = 1,
dense_dim = 10,
num_slot = 4,
i64_input_key = True,
nnz_array = [1, 1, 1, 1],
source = "./data_parquet/file_list.txt",
eval_source = "./data_parquet/file_list_test.txt",
slot_size_array = [10000, 10000, 10000, 10000],
check_type = hugectr.Check_t.Non,
dist_type = hugectr.Distribution_t.PowerLaw,
power_law_type = hugectr.PowerLaw_t.Short,
num_files = 16,
eval_num_files = 4,
num_samples_per_file = 40960)
data_generator = DataGenerator(data_generator_params)
data_generator.generate()
[HCTR][06:31:47.413][INFO][RK0][main]: Generate Parquet dataset
[HCTR][06:31:47.413][INFO][RK0][main]: train data folder: ./data_parquet, eval data folder: ./data_parquet, slot_size_array: 10000, 10000, 10000, 10000, nnz array: 1, 1, 1, 1, #files for train: 16, #files for eval: 4, #samples per file: 40960, Use power law distribution: 1, alpha of power law: 1.3
[HCTR][06:31:47.416][INFO][RK0][main]: ./data_parquet exist
[HCTR][06:31:47.423][INFO][RK0][main]: ./data_parquet/train/gen_0.parquet
[HCTR][06:31:50.739][INFO][RK0][main]: ./data_parquet/train/gen_1.parquet
[HCTR][06:31:50.846][INFO][RK0][main]: ./data_parquet/train/gen_2.parquet
[HCTR][06:31:50.929][INFO][RK0][main]: ./data_parquet/train/gen_3.parquet
[HCTR][06:31:51.011][INFO][RK0][main]: ./data_parquet/train/gen_4.parquet
[HCTR][06:31:51.092][INFO][RK0][main]: ./data_parquet/train/gen_5.parquet
[HCTR][06:31:51.171][INFO][RK0][main]: ./data_parquet/train/gen_6.parquet
[HCTR][06:31:51.250][INFO][RK0][main]: ./data_parquet/train/gen_7.parquet
[HCTR][06:31:51.329][INFO][RK0][main]: ./data_parquet/train/gen_8.parquet
[HCTR][06:31:51.407][INFO][RK0][main]: ./data_parquet/train/gen_9.parquet
[HCTR][06:31:51.485][INFO][RK0][main]: ./data_parquet/train/gen_10.parquet
[HCTR][06:31:51.562][INFO][RK0][main]: ./data_parquet/train/gen_11.parquet
[HCTR][06:31:51.638][INFO][RK0][main]: ./data_parquet/train/gen_12.parquet
[HCTR][06:31:51.715][INFO][RK0][main]: ./data_parquet/train/gen_13.parquet
[HCTR][06:31:51.792][INFO][RK0][main]: ./data_parquet/train/gen_14.parquet
[HCTR][06:31:51.868][INFO][RK0][main]: ./data_parquet/train/gen_15.parquet
[HCTR][06:31:51.962][INFO][RK0][main]: ./data_parquet/file_list.txt done!
[HCTR][06:31:51.986][INFO][RK0][main]: ./data_parquet/val/gen_0.parquet
[HCTR][06:31:52.064][INFO][RK0][main]: ./data_parquet/val/gen_1.parquet
[HCTR][06:31:52.142][INFO][RK0][main]: ./data_parquet/val/gen_2.parquet
[HCTR][06:31:52.218][INFO][RK0][main]: ./data_parquet/val/gen_3.parquet
[HCTR][06:31:52.296][INFO][RK0][main]: ./data_parquet/file_list_test.txt done!
Train from Scratch
We can train from scratch by performing the following steps with Python APIs:
Create the solver, reader and optimizer, then initialize the model.
Construct the model graph by adding input, sparse embedding and dense layers in order.
Compile the model and have an overview of the model graph.
Dump the model graph to the JSON file.
Fit the model, save the model weights and optimizer states implicitly.
Dump one batch of evaluation results to files.
%%writefile train.py
import os
import hugectr
from mpi4py import MPI
import numpy as np
solver = hugectr.CreateSolver(model_name = "hps_demo",
max_eval_batches = 1,
batchsize_eval = 1024,
batchsize = 1024,
lr = 0.001,
vvgpu = [[0]],
i64_input_key = True,
repeat_dataset = True,
use_cuda_graph = True)
reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Parquet,
source = ["./data_parquet/file_list.txt"],
eval_source = "./data_parquet/file_list_test.txt",
check_type = hugectr.Check_t.Non,
slot_size_array = [10000, 10000, 10000, 10000])
optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam)
model = hugectr.Model(solver, reader, optimizer)
model.add(hugectr.Input(label_dim = 1, label_name = "label",
dense_dim = 10, dense_name = "dense",
data_reader_sparse_param_array =
[hugectr.DataReaderSparseParam("data1", [1, 1], True, 2),
hugectr.DataReaderSparseParam("data2", [1, 1], True, 2)]))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
workspace_size_per_gpu_in_mb = 4,
embedding_vec_size = 16,
combiner = "sum",
sparse_embedding_name = "sparse_embedding1",
bottom_name = "data1",
optimizer = optimizer))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
workspace_size_per_gpu_in_mb = 8,
embedding_vec_size = 32,
combiner = "sum",
sparse_embedding_name = "sparse_embedding2",
bottom_name = "data2",
optimizer = optimizer))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
bottom_names = ["sparse_embedding1"],
top_names = ["reshape1"],
leading_dim=32))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
bottom_names = ["sparse_embedding2"],
top_names = ["reshape2"],
leading_dim=64))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
bottom_names = ["reshape1", "reshape2", "dense"], top_names = ["concat1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
bottom_names = ["concat1"],
top_names = ["fc1"],
num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
bottom_names = ["fc1"],
top_names = ["relu1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
bottom_names = ["relu1"],
top_names = ["fc2"],
num_output=1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
bottom_names = ["fc2", "label"],
top_names = ["loss"]))
model.compile()
model.summary()
model.graph_to_json("hps_demo.json")
model.fit(max_iter = 1100, display = 200, eval_interval = 1000, snapshot = 1000, snapshot_prefix = "hps_demo")
ground_truth = model.check_out_tensor("fc2", hugectr.Tensor_t.Evaluate)
np.save("ground_truth.npy", ground_truth)
Writing train.py
!python3 train.py
HugeCTR Version: 23.8
====================================================Model Init=====================================================
[HCTR][06:32:11.556][INFO][RK0][main]: Initialize model: hps_demo
[HCTR][06:32:11.556][INFO][RK0][main]: Global seed is 2598678435
[HCTR][06:32:11.561][INFO][RK0][main]: Device to NUMA mapping:
[HCTR][06:32:11.642][INFO][RK0][main]: GPU 0 -> node 0
[HCTR][06:32:15.564][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
[HCTR][06:32:15.564][DEBUG][RK0][main]: [device 0] allocating 0.0000 GB, available 30.0886
[HCTR][06:32:15.564][INFO][RK0][main]: Start all2all warmup
[HCTR][06:32:15.565][INFO][RK0][main]: End all2all warmup
[HCTR][06:32:15.566][INFO][RK0][main]: Using All-reduce algorithm: NCCL
[HCTR][06:32:15.567][INFO][RK0][main]: Device 0: Tesla V100-SXM2-32GB
[HCTR][06:32:15.636][INFO][RK0][main]: eval source ./data_parquet/file_list_test.txt max_row_group_size 40960
[HCTR][06:32:15.808][INFO][RK0][main]: train source ./data_parquet/file_list.txt max_row_group_size 40960
[HCTR][06:32:15.810][INFO][RK0][main]: num of DataReader workers for train: 1
[HCTR][06:32:15.810][INFO][RK0][main]: num of DataReader workers for eval: 1
[HCTR][06:32:15.937][INFO][RK0][main]: max_vocabulary_size_per_gpu_=21845
[HCTR][06:32:15.938][DEBUG][RK0][main]: [device 0] allocating 0.0047 GB, available 29.6921
[HCTR][06:32:15.939][INFO][RK0][main]: max_vocabulary_size_per_gpu_=21845
[HCTR][06:32:15.940][DEBUG][RK0][main]: [device 0] allocating 0.0092 GB, available 29.6824
[HCTR][06:32:15.940][INFO][RK0][main]: Graph analysis to resolve tensor dependency
[HCTR][06:32:15.940][WARNING][RK0][main]: You are using reshape layer with parameter leading_dim. This will be deprecated in the future. Please switch to parameter shape.
[HCTR][06:32:15.940][WARNING][RK0][main]: You are using reshape layer with parameter leading_dim. This will be deprecated in the future. Please switch to parameter shape.
[HCTR][06:32:15.946][WARNING][RK0][main]: You are using reshape layer with parameter leading_dim. This will be deprecated in the future. Please switch to parameter shape.
[HCTR][06:32:15.946][WARNING][RK0][main]: You are using reshape layer with parameter leading_dim. This will be deprecated in the future. Please switch to parameter shape.
===================================================Model Compile===================================================
[HCTR][06:32:17.205][INFO][RK0][main]: gpu0 start to init embedding
[HCTR][06:32:17.205][INFO][RK0][main]: gpu0 init embedding done
[HCTR][06:32:17.205][INFO][RK0][main]: gpu0 start to init embedding
[HCTR][06:32:17.206][INFO][RK0][main]: gpu0 init embedding done
[HCTR][06:32:17.207][INFO][RK0][main]: Starting AUC NCCL warm-up
[HCTR][06:32:17.208][INFO][RK0][main]: Warm-up done
===================================================Model Summary===================================================
[HCTR][06:32:17.208][INFO][RK0][main]: Model structure on each GPU
Label Dense Sparse
label dense data1,data2
(1024, 1) (1024, 10)
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
Layer Type Input Name Output Name Output Shape
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
DistributedSlotSparseEmbeddingHash data1 sparse_embedding1 (1024, 2, 16)
------------------------------------------------------------------------------------------------------------------
DistributedSlotSparseEmbeddingHash data2 sparse_embedding2 (1024, 2, 32)
------------------------------------------------------------------------------------------------------------------
Reshape sparse_embedding1 reshape1 (1024, 32)
------------------------------------------------------------------------------------------------------------------
Reshape sparse_embedding2 reshape2 (1024, 64)
------------------------------------------------------------------------------------------------------------------
Concat reshape1 concat1 (1024, 106)
reshape2
dense
------------------------------------------------------------------------------------------------------------------
InnerProduct concat1 fc1 (1024, 1024)
------------------------------------------------------------------------------------------------------------------
ReLU fc1 relu1 (1024, 1024)
------------------------------------------------------------------------------------------------------------------
InnerProduct relu1 fc2 (1024, 1)
------------------------------------------------------------------------------------------------------------------
BinaryCrossEntropyLoss fc2 loss
label
------------------------------------------------------------------------------------------------------------------
[HCTR][06:32:17.212][INFO][RK0][main]: Save the model graph to hps_demo.json successfully
=====================================================Model Fit=====================================================
[HCTR][06:32:17.213][INFO][RK0][main]: Use non-epoch mode with number of iterations: 1100
[HCTR][06:32:17.213][INFO][RK0][main]: Training batchsize: 1024, evaluation batchsize: 1024
[HCTR][06:32:17.213][INFO][RK0][main]: Evaluation interval: 1000, snapshot interval: 1000
[HCTR][06:32:17.213][INFO][RK0][main]: Dense network trainable: True
[HCTR][06:32:17.213][INFO][RK0][main]: Sparse embedding sparse_embedding1 trainable: True
[HCTR][06:32:17.213][INFO][RK0][main]: Sparse embedding sparse_embedding2 trainable: True
[HCTR][06:32:17.213][INFO][RK0][main]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
[HCTR][06:32:17.213][INFO][RK0][main]: lr: 0.001000, warmup_steps: 1, end_lr: 0.000000
[HCTR][06:32:17.213][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
[HCTR][06:32:17.213][INFO][RK0][main]: Training source file: ./data_parquet/file_list.txt
[HCTR][06:32:17.213][INFO][RK0][main]: Evaluation source file: ./data_parquet/file_list_test.txt
[HCTR][06:32:17.658][INFO][RK0][main]: Iter: 200 Time(200 iters): 0.444961s Loss: 0.693355 lr:0.001
[HCTR][06:32:18.167][INFO][RK0][main]: Iter: 400 Time(200 iters): 0.508793s Loss: 0.694358 lr:0.001
[HCTR][06:32:18.589][INFO][RK0][main]: Iter: 600 Time(200 iters): 0.422282s Loss: 0.695494 lr:0.001
[HCTR][06:32:18.764][INFO][RK0][main]: Iter: 800 Time(200 iters): 0.175263s Loss: 0.691037 lr:0.001
[HCTR][06:32:18.939][INFO][RK0][main]: Iter: 1000 Time(200 iters): 0.174492s Loss: 0.688767 lr:0.001
[HCTR][06:32:18.940][INFO][RK0][main]: Evaluation, AUC: 0.503806
[HCTR][06:32:18.940][INFO][RK0][main]: Eval Time for 1 iters: 0.000913s
[HCTR][06:32:18.941][INFO][RK0][main]: Rank0: Write hash table to file
[HCTR][06:32:19.024][INFO][RK0][main]: Rank0: Write hash table to file
[HCTR][06:32:19.092][INFO][RK0][main]: Dumping sparse weights to files, successful
[HCTR][06:32:19.093][INFO][RK0][main]: Rank0: Write optimzer state to file
[HCTR][06:32:19.123][INFO][RK0][main]: Done
[HCTR][06:32:19.123][INFO][RK0][main]: Rank0: Write optimzer state to file
[HCTR][06:32:19.148][INFO][RK0][main]: Done
[HCTR][06:32:19.150][INFO][RK0][main]: Rank0: Write optimzer state to file
[HCTR][06:32:19.203][INFO][RK0][main]: Done
[HCTR][06:32:19.203][INFO][RK0][main]: Rank0: Write optimzer state to file
[HCTR][06:32:19.252][INFO][RK0][main]: Done
[HCTR][06:32:19.252][INFO][RK0][main]: Dumping sparse optimzer states to files, successful
[HCTR][06:32:19.262][INFO][RK0][main]: Dumping dense weights to file, successful
[HCTR][06:32:19.279][INFO][RK0][main]: Dumping dense optimizer states to file, successful
[HCTR][06:32:19.368][INFO][RK0][main]: Finish 1100 iterations with batchsize: 1024 in 2.16s.
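The output shapes in the model summary above can be checked by hand. Each sparse input covers 2 slots, so each Reshape layer flattens 2 embedding vectors per sample, and the Concat layer appends the 10 dense features. The following is a minimal sketch of that arithmetic; the dictionary layout and variable names are illustrative only and are not part of the HugeCTR API.

```python
# Values taken from train.py above.
batch_size = 1024
dense_dim = 10

# (number of slots, embedding_vec_size) for each sparse embedding layer.
tables = {
    "sparse_embedding1": (2, 16),
    "sparse_embedding2": (2, 32),
}

# Reshape flattens (batch, slots, vec_size) into (batch, slots * vec_size),
# which is the leading_dim passed to each Reshape layer.
reshape_dims = {name: slots * vec for name, (slots, vec) in tables.items()}

# Concat joins both reshaped embeddings and the dense features.
concat_dim = sum(reshape_dims.values()) + dense_dim

print(reshape_dims)              # {'sparse_embedding1': 32, 'sparse_embedding2': 64}
print((batch_size, concat_dim))  # (1024, 106)
```

The same numbers reappear at inference time, where the looked-up embeddings are reshaped to (batch_size, 2, 16) and (batch_size, 2, 32) before being fed to the ONNX model.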
Convert HugeCTR to ONNX
We will convert the saved HugeCTR models to ONNX using the HugeCTR to ONNX Converter. For more information about the converter, refer to the README in the onnx_converter directory of the repository.
To double-check correctness, we convert the model twice: once with the sparse embedding models included in the ONNX graph and once without them.
import hugectr2onnx
hugectr2onnx.converter.convert(onnx_model_path = "hps_demo_with_embedding.onnx",
graph_config = "hps_demo.json",
dense_model = "hps_demo_dense_1000.model",
convert_embedding = True,
sparse_models = ["hps_demo0_sparse_1000.model", "hps_demo1_sparse_1000.model"])
hugectr2onnx.converter.convert(onnx_model_path = "hps_demo_without_embedding.onnx",
graph_config = "hps_demo.json",
dense_model = "hps_demo_dense_1000.model",
convert_embedding = False)
[HUGECTR2ONNX][INFO]: Converting Data layer to ONNX
[HUGECTR2ONNX][INFO]: Converting DistributedSlotSparseEmbeddingHash layer to ONNX
[HUGECTR2ONNX][INFO]: Converting DistributedSlotSparseEmbeddingHash layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Reshape layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Reshape layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Concat layer to ONNX
[HUGECTR2ONNX][INFO]: Converting InnerProduct layer to ONNX
[HUGECTR2ONNX][INFO]: Converting ReLU layer to ONNX
[HUGECTR2ONNX][INFO]: Converting InnerProduct layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Sigmoid layer to ONNX
[HUGECTR2ONNX][INFO]: The model is checked!
[HUGECTR2ONNX][INFO]: The model is saved at hps_demo_with_embedding.onnx
[HUGECTR2ONNX][INFO]: Converting Data layer to ONNX
Skip sparse embedding layers in converted ONNX model
[HUGECTR2ONNX][INFO]: Converting DistributedSlotSparseEmbeddingHash layer to ONNX
Skip sparse embedding layers in converted ONNX model
[HUGECTR2ONNX][INFO]: Converting DistributedSlotSparseEmbeddingHash layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Reshape layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Reshape layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Concat layer to ONNX
[HUGECTR2ONNX][INFO]: Converting InnerProduct layer to ONNX
[HUGECTR2ONNX][INFO]: Converting ReLU layer to ONNX
[HUGECTR2ONNX][INFO]: Converting InnerProduct layer to ONNX
[HUGECTR2ONNX][INFO]: Converting Sigmoid layer to ONNX
[HUGECTR2ONNX][INFO]: The model is checked!
[HUGECTR2ONNX][INFO]: The model is saved at hps_demo_without_embedding.onnx
1. Inference with HPS & ONNX
We will run inference by performing the following steps with the Python APIs:
Configure the HPS hyperparameters. Please refer to hps configuration for detailed configurations.
Initialize the HPS object, which is responsible for embedding table lookup.
Load the Parquet data.
Run inference with the HPS object and the ONNX inference session of hps_demo_without_embedding.onnx.
Check the correctness by comparing with the dumped evaluation results.
Run inference with the ONNX inference session of hps_demo_with_embedding.onnx (double check).
from hugectr.inference import HPS, ParameterServerConfig, InferenceParams
import pandas as pd
import numpy as np
import onnxruntime as ort
slot_size_array = [10000, 10000, 10000, 10000]
key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]
batch_size = 1024
# 1. Configure the HPS hyperparameters
ps_config = ParameterServerConfig(
emb_table_name = {"hps_demo": ["sparse_embedding1", "sparse_embedding2"]},
embedding_vec_size = {"hps_demo": [16, 32]},
max_feature_num_per_sample_per_emb_table = {"hps_demo": [2, 2]},
inference_params_array = [
InferenceParams(
model_name = "hps_demo",
max_batchsize = batch_size,
hit_rate_threshold = 1.0,
dense_model_file = "",
sparse_model_files = ["hps_demo0_sparse_1000.model", "hps_demo1_sparse_1000.model"],
deployed_devices = [0],
use_gpu_embedding_cache = True,
cache_size_percentage = 0.5,
i64_input_key = True)
])
# 2. Initialize the HPS object
hps = HPS(ps_config)
# 3. Loading the Parquet data.
df = pd.read_parquet("data_parquet/val/gen_0.parquet")
dense_input_columns = df.columns[1:11]
cat_input1_columns = df.columns[11:13]
cat_input2_columns = df.columns[13:15]
dense_input = df[dense_input_columns].loc[0:batch_size-1].to_numpy(dtype=np.float32)
cat_input1 = (df[cat_input1_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[0:2]).reshape((batch_size, 2, 1))
cat_input2 = (df[cat_input2_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[2:4]).reshape((batch_size, 2, 1))
# 4. Make inference from the HPS object and the ONNX inference session of `hps_demo_without_embedding.onnx`.
embedding1 = hps.lookup(cat_input1.flatten(), "hps_demo", 0).reshape(batch_size, 2, 16)
embedding2 = hps.lookup(cat_input2.flatten(), "hps_demo", 1).reshape(batch_size, 2, 32)
sess = ort.InferenceSession("hps_demo_without_embedding.onnx")
res = sess.run(output_names=[sess.get_outputs()[0].name],
input_feed={sess.get_inputs()[0].name: dense_input,
sess.get_inputs()[1].name: embedding1,
sess.get_inputs()[2].name: embedding2})
pred = res[0]
# 5. Check the correctness by comparing with dumped evaluation results.
ground_truth = np.load("ground_truth.npy").flatten()
print("ground_truth: ", ground_truth)
diff = pred.flatten()-ground_truth
mse = np.mean(diff*diff)
print("pred: ", pred)
print("mse between pred and ground_truth: ", mse)
# 6. Make inference with the ONNX inference session of `hps_demo_with_embedding.onnx` (double check).
sess_ref = ort.InferenceSession("hps_demo_with_embedding.onnx")
res_ref = sess_ref.run(output_names=[sess_ref.get_outputs()[0].name],
input_feed={sess_ref.get_inputs()[0].name: dense_input,
sess_ref.get_inputs()[1].name: cat_input1,
sess_ref.get_inputs()[2].name: cat_input2})
pred_ref = res_ref[0]
diff_ref = pred_ref.flatten()-ground_truth
mse_ref = np.mean(diff_ref*diff_ref)
print("pred_ref: ", pred_ref)
print("mse between pred_ref and ground_truth: ", mse_ref)
[HCTR][06:32:40.791][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
====================================================HPS Create====================================================
[HCTR][06:32:40.791][INFO][RK0][main]: Creating HashMap CPU database backend...
[HCTR][06:32:40.791][DEBUG][RK0][main]: Created blank database backend in local memory!
[HCTR][06:32:40.791][INFO][RK0][main]: Volatile DB: initial cache rate = 1
[HCTR][06:32:40.791][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
[HCTR][06:32:40.791][DEBUG][RK0][main]: Created raw model loader in local memory!
[HCTR][06:32:41.123][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding1; cached 18488 / 18488 embeddings in volatile database (HashMapBackend); load: 18488 / 18446744073709551615 (0.00%).
[HCTR][06:32:41.431][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding2; cached 18470 / 18470 embeddings in volatile database (HashMapBackend); load: 18470 / 18446744073709551615 (0.00%).
[HCTR][06:32:41.431][DEBUG][RK0][main]: Real-time subscribers created!
[HCTR][06:32:41.431][INFO][RK0][main]: Creating embedding cache in device 0.
[HCTR][06:32:41.437][INFO][RK0][main]: Model name: hps_demo
[HCTR][06:32:41.437][INFO][RK0][main]: Max batch size: 1024
[HCTR][06:32:41.437][INFO][RK0][main]: Fuse embedding tables: False
[HCTR][06:32:41.437][INFO][RK0][main]: Number of embedding tables: 2
[HCTR][06:32:41.437][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
[HCTR][06:32:41.437][INFO][RK0][main]: Embedding cache type: dynamic
[HCTR][06:32:41.437][INFO][RK0][main]: Use I64 input key: True
[HCTR][06:32:41.437][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
[HCTR][06:32:41.437][INFO][RK0][main]: The size of thread pool: 80
[HCTR][06:32:41.437][INFO][RK0][main]: The size of worker memory pool: 2
[HCTR][06:32:41.437][INFO][RK0][main]: The size of refresh memory pool: 1
[HCTR][06:32:41.437][INFO][RK0][main]: The refresh percentage : 0.000000
[HCTR][06:32:41.453][INFO][RK0][main]: LookupSession i64_input_key: True
[HCTR][06:32:41.453][INFO][RK0][main]: Creating lookup session for hps_demo on device: 0
ground_truth: [0.4895492 0.509022 0.38192913 ... 0.5264926 0.50650454 0.47927693]
pred: [[0.48954916]
[0.50902206]
[0.38192907]
...
[0.52649266]
[0.5065045 ]
[0.4792769 ]]
mse between pred and ground_truth: 2.3887142e-15
pred_ref: [[0.48954916]
[0.50902206]
[0.38192907]
...
[0.52649266]
[0.5065045 ]
[0.4792769 ]]
mse between pred_ref and ground_truth: 2.3887142e-15
2023-09-20 06:32:41.566238532 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
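The inference code above shifts the categorical keys by key_offset before the lookup. Every slot in the generated dataset numbers its keys from 0, so the offsets move each slot into its own range of the key space. The sketch below repeats the key_offset computation from the notebook and applies it to a few raw key values; those raw values are made up for illustration and are not read from the Parquet files.

```python
import numpy as np

slot_size_array = [10000, 10000, 10000, 10000]

# Exclusive prefix sum of the slot sizes: slot i starts at the sum of
# the sizes of slots 0..i-1.
key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]
print(key_offset.tolist())  # [0, 10000, 20000, 30000]

# Hypothetical raw keys for two samples; columns are slot 0 and slot 1.
raw_keys = np.array([[85, 28],
                     [0, 4]], dtype=np.int64)

# Same pattern as cat_input1 above: add the offsets of slots 0 and 1.
shifted = raw_keys + key_offset[0:2]
print(shifted.flatten().tolist())  # [85, 10028, 0, 10004]
```

The second sparse input uses key_offset[2:4] in the same way, which is why its keys fall in the 20000 to 39999 range.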
2. Lookup the Embedding Vector from DLPack
We also provide a lookup_fromdlpack interface that can query embedding keys on the CPU and return the embedding vectors on the GPU/CPU.
Suppose you have created a PyTorch/TensorFlow tensor that stores the embedding keys.
Convert the embedding key tensor to a DLPack capsule through the corresponding platform's to_dlpack function.
Create an empty tensor as a buffer to store the embedding vectors.
Convert the buffer tensor to a DLPack capsule.
Look up the embedding vectors of the corresponding embedding keys directly through the lookup_fromdlpack interface, which writes them to the embedding vector buffer tensor.
If the output capsule is allocated on the GPU, then a device_id needs to be specified in the lookup_fromdlpack interface for the corresponding embedding cache. If not specified, the default value is device 0.
Note: Please make sure that TensorFlow or PyTorch has been installed correctly in the merlin-hugectr container:
pip install tensorflow
pip install torch
embedding1 = hps.lookup(cat_input1.flatten(), "hps_demo", 0).reshape(batch_size, 2, 16)
embedding2 = hps.lookup(cat_input2.flatten(), "hps_demo", 1).reshape(batch_size, 2, 32)
# 1. Look up from dlpack for Pytorch tensor on CPU
print(" Look up from dlpack for Pytorch tensor")
import torch.utils.dlpack
import os
print("************Look up from pytorch dlpack on CPU")
device = torch.device("cpu")
key = torch.tensor(cat_input1.flatten(),dtype=torch.int64, device=device)
out = torch.empty((1,cat_input1.flatten().shape[0]*16), dtype=torch.float32, device=device)
key_capsule = torch.utils.dlpack.to_dlpack(key)
print("The device type of embedding keys that lookup dlpack from hps interface for embedding table 0 of hps_demo: {}, the keys: {}".format(key.device, key))
out_capsule = torch.utils.dlpack.to_dlpack(out)
# Lookup the embedding vectors from dlpack
hps.lookup_fromdlpack(key_capsule, out_capsule,"hps_demo", 0)
out_put = torch.utils.dlpack.from_dlpack(out_capsule)
print("[The device type of embedding vectors that lookup dlpack from hps interface for embedding table 0 of hps_demo: {}, the vectors: {}\n".format(out_put.device, out_put))
diff = out_put-embedding1.reshape(1,cat_input1.flatten().shape[0]*16)
# Mean of the squared differences, so positive and negative errors cannot cancel out
mse = (diff*diff).mean()
if mse > 1e-4:
    raise RuntimeError("Too large mse between pytorch dlpack on cpu and native HPS lookup api: {}".format(mse))
else:
    print("Pytorch dlpack on cpu results are consistent with native HPS lookup api, mse: {}".format(mse))
# 2. Look up from dlpack for Pytorch tensor on GPU
print("************Look up from pytorch dlpack on GPU")
cuda_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# The keys stay on the CPU; only the output buffer is allocated on the GPU
key = torch.tensor(cat_input1.flatten(),dtype=torch.int64, device=device)
key_capsule = torch.utils.dlpack.to_dlpack(key)
out = torch.empty((cat_input1.flatten().shape[0]*16), dtype=torch.float32, device=cuda_device)
out_capsule = torch.utils.dlpack.to_dlpack(out)
hps.lookup_fromdlpack(key_capsule, out_capsule,"hps_demo", 0)
out_put = torch.utils.dlpack.from_dlpack(out_capsule)
print("The device type of embedding vectors that lookup dlpack from hps interface for embedding table 0 of hps_demo: {}, the vectors: {}\n\n".format(out_put.device, out_put))
diff = out_put.cpu()-embedding1.reshape(1,cat_input1.flatten().shape[0]*16)
# Mean of the squared differences, so positive and negative errors cannot cancel out
mse = (diff*diff).mean()
if mse > 1e-3:
    raise RuntimeError("Too large mse between pytorch dlpack on GPU and native HPS lookup api: {}".format(mse))
else:
    print("Pytorch dlpack on GPU results are consistent with native HPS lookup api, mse: {}".format(mse))
Look up from dlpack for Pytorch tensor
************Look up from pytorch dlpack on CPU
The device type of embedding keys that lookup dlpack from hps interface for embedding table 0 of hps_demo: cpu, the keys: tensor([ 85, 10028, 0, ..., 10004, 10, 10000])
[The device type of embedding vectors that lookup dlpack from hps interface for embedding table 0 of hps_demo: cpu, the vectors: tensor([[-0.0307, 0.0264, -0.0294, ..., 0.0151, -0.0281, 0.0088]])
Pytorch dlpack on cpu results are consistent with native HPS lookup api, mse: 0.0
************Look up from pytorch dlpack on GPU
The device type of embedding vectors that lookup dlpack from hps interface for embedding table 0 of hps_demo: cuda:0, the vectors: tensor([-0.0307, 0.0264, -0.0294, ..., 0.0151, -0.0281, 0.0088],
device='cuda:0')
Pytorch dlpack on GPU results are consistent with native HPS lookup api, mse: 0.0
# 3. Look up from dlpack for tensorflow tensor on CPU
print("Look up from dlpack for Tensorflow tensor")
from tensorflow.python.dlpack import dlpack
import tensorflow as tf
from tensorflow.python.eager import context
from tensorflow.python.framework import dtypes
print("***************Look up from tensorflow dlpack on CPU**********")
with tf.device('/CPU:0'):
    key_tensor = tf.constant(cat_input2.flatten(),dtype=tf.int64)
    out_tensor = tf.zeros([1, cat_input2.flatten().shape[0]*32],dtype=tf.float32)
print("The device type of embedding keys that lookup dlpack from hps interface for embedding table 1 of hps_demo: {}, the keys: {}".format(key_tensor.device, key_tensor))
key_capsule = tf.experimental.dlpack.to_dlpack(key_tensor)
out_dlcapsule = tf.experimental.dlpack.to_dlpack(out_tensor)
hps.lookup_fromdlpack(key_capsule,out_dlcapsule, "hps_demo", 1)
out = tf.experimental.dlpack.from_dlpack(out_dlcapsule)
print("The device type of embedding vectors that lookup dlpack from hps interface for embedding table 1 of hps_demo: {}, the vectors: {}\n".format(out.device, out))
diff = out-embedding2.reshape(1,cat_input2.flatten().shape[0]*32)
# Mean of the squared differences, so positive and negative errors cannot cancel out
mse = tf.reduce_mean(tf.square(diff))
if mse> 1e-3:
    raise RuntimeError("Too large mse between tensorflow dlpack on cpu and native HPS lookup api: {}".format(mse))
else:
    print("tensorflow dlpack on CPU results are consistent with native HPS lookup api, mse: {}".format(mse))
# 4. Look up from dlpack for tensorflow tensor on GPU
print("***************Look up from tensorflow dlpack on GPU**********")
with tf.device('/GPU:0'):
    key_tensor = tf.constant(cat_input2.flatten(),dtype=tf.int64)
    out_tensor = tf.zeros([cat_input2.flatten().shape[0]*32],dtype=tf.float32)
key_capsule = tf.experimental.dlpack.to_dlpack(key_tensor)
out_dlcapsule = tf.experimental.dlpack.to_dlpack(out_tensor)
hps.lookup_fromdlpack(key_capsule,out_dlcapsule, "hps_demo", 1)
out= tf.experimental.dlpack.from_dlpack(out_dlcapsule)
print("[HUGECTR][INFO] The device type of embedding vectors that lookup dlpack from hps interface for embedding table 1 of wdl: {}, the vectors: {}\n".format(out.device, out))
diff = out-embedding2.reshape(1,cat_input2.flatten().shape[0]*32)
# Mean of the squared differences, so positive and negative errors cannot cancel out
mse = tf.reduce_mean(tf.square(diff))
if mse> 1e-3:
    raise RuntimeError("Too large mse between tensorflow dlpack on GPU and native HPS lookup api: {}".format(mse))
else:
    print("tensorflow dlpack on GPU results are consistent with native HPS lookup api, mse: {}".format(mse))
Look up from dlpack for Tensorflow tensor
2023-09-20 06:34:21.729218: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
***************Look up from tensorflow dlpack on CPU**********
2023-09-20 06:34:44.168630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30048 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
2023-09-20 06:34:44.170043: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 30184 MB memory: -> device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0
2023-09-20 06:34:44.171618: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 30184 MB memory: -> device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0
2023-09-20 06:34:44.173095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 30184 MB memory: -> device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0b:00.0, compute capability: 7.0
2023-09-20 06:34:44.174795: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:4 with 30184 MB memory: -> device: 4, name: Tesla V100-SXM2-32GB, pci bus id: 0000:85:00.0, compute capability: 7.0
2023-09-20 06:34:44.176299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:5 with 30184 MB memory: -> device: 5, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0
2023-09-20 06:34:44.177782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:6 with 30184 MB memory: -> device: 6, name: Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0
2023-09-20 06:34:44.179411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:7 with 30184 MB memory: -> device: 7, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8a:00.0, compute capability: 7.0
The device type of embedding keys that lookup dlpack from hps interface for embedding table 1 of hps_demo: /job:localhost/replica:0/task:0/device:CPU:0, the keys: [20005 30047 20004 ... 30001 20037 30001]
The device type of embedding vectors that lookup dlpack from hps interface for embedding table 1 of hps_demo: /job:localhost/replica:0/task:0/device:CPU:0, the vectors: [[ 0.02182689 0.01806355 0.01985828 ... 0.0136845 -0.01738386
-0.00323257]]
tensorflow dlpack on CPU results are consistent with native HPS lookup api, mse: 0.0
***************Look up from tensorflow dlpack on GPU**********
[HUGECTR][INFO] The device type of embedding vectors that lookup dlpack from hps interface for embedding table 1 of wdl: /job:localhost/replica:0/task:0/device:GPU:0, the vectors: [ 0.02182689 0.01806355 0.01985828 ... 0.0136845 -0.01738386
-0.00323257]
tensorflow dlpack on GPU results are consistent with native HPS lookup api, mse: 0.0
3. Multi-process inference
A hash map database can be shared between multiple processes. The following example launches three processes that share one through the operating system's shared memory, which is located at /dev/shm on most Unix systems. We split the processes into one primary and several secondary processes. Only the primary process initializes the shared memory database, and the secondary processes wait until it has been fully initialized. Note that inter-process database access is guaranteed to be thread-safe, so you can also implement more elaborate initialization or refresh mechanisms for your use case.
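The script in this section deletes /dev/shm/hctr_mp_hash_map_database before it launches its workers, so that every run starts without residual state. A minimal sketch of such a pre-flight step, using only the Python standard library, might look like the following. The helper name `reset_shared_memory` and its return value are assumptions made for illustration; they are not part of HugeCTR.

```python
import shutil
from pathlib import Path


def reset_shared_memory(name, shm_dir="/dev/shm"):
    """Delete a stale shared memory file and report the space left in shm_dir.

    Hypothetical helper for illustration only; not a HugeCTR API.
    Returns a tuple (existed, free_bytes).
    """
    path = Path(shm_dir) / name
    existed = path.exists()
    # missing_ok=True turns the call into a no-op when nothing is there.
    path.unlink(missing_ok=True)
    free_bytes = shutil.disk_usage(shm_dir).free
    return existed, free_bytes


# Example call (requires a system that provides /dev/shm):
# existed, free_bytes = reset_shared_memory("hctr_mp_hash_map_database")
```

Reporting the free space alongside the deletion makes it easier to notice when /dev/shm is too small for the database before the worker processes start.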
%%writefile multi_process_hps.py
import os
import time
import multiprocessing as mp
import pandas as pd
import numpy as np
import onnxruntime as ort
from hugectr import DatabaseType_t
from hugectr.inference import HPS, ParameterServerConfig, InferenceParams, VolatileDatabaseParams

slot_size_array = [10000, 10000, 10000, 10000]
key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]
batch_size = 1024

def create_hps(name, initialized, device_id, num_max_processes):
    print(f'subprocess:{name}({os.getpid()})launch...')

    # 1. Let secondary processes wait until shared memory is initialized.
    while name != 'primary' and initialized.value == 0:
        print(f'Subprocess {name} awaiting SHM initialization...')
        time.sleep(1)

    # 2. Configure the HPS hyperparameters.
    ps_config = ParameterServerConfig(
        emb_table_name = {"hps_demo": ["sparse_embedding1", "sparse_embedding2"]},
        embedding_vec_size = {"hps_demo": [16, 32]},
        max_feature_num_per_sample_per_emb_table = {"hps_demo": [2, 2]},
        inference_params_array = [
            InferenceParams(
                model_name = "hps_demo",
                max_batchsize = batch_size,
                hit_rate_threshold = 1.0,
                dense_model_file = "",
                sparse_model_files = ["hps_demo0_sparse_1000.model", "hps_demo1_sparse_1000.model"],
                device_id = device_id,
                deployed_devices = [device_id],
                use_gpu_embedding_cache = True,
                cache_size_percentage = 0.5,
                i64_input_key = True)
        ],
        volatile_db = VolatileDatabaseParams(
            DatabaseType_t.multi_process_hash_map,  # Use /dev/shm instead of normal memory for storage.
            # When False, model initialization is skipped. If we run HPS in multiple
            # processes, only one of them (here: the primary) needs to initialize.
            initialize_after_startup = name == 'primary',
        ))

    # 3. Initialize the HPS object.
    hps = HPS(ps_config)
    # `+=` on a shared Value is a read followed by a write, so hold its lock.
    with initialized.get_lock():
        initialized.value += 1
    print(f'Subprocess {name} initialized')

    # 4. In (1) the secondary processes wait until the primary process has completed initializing
    #    the shared memory. If the last process disconnects, the shared memory is erased.
    #    Therefore, if the processes that are currently attached to the shared memory manage to
    #    complete their program before another process has attached, the contents of the shared
    #    memory are lost and the new process will instead construct an empty shared memory. To
    #    avoid this situation, we have multiple options.
    #
    #    a) Setting `shared_memory_auto_remove = False` in the `VolatileDatabaseParams`
    #       configuration [default: True]. This will prevent the deletion of the shared memory when
    #       the last process disconnects. In other words, disabling this flag allows you to preserve
    #       and use the state of a shared memory across multiple program restarts. However, while
    #       desirable in some situations, this is not the behavior we need here, because this
    #       notebook cell should be allowed to be executed repeatedly without relying on residual
    #       state.
    #
    #    b) Another approach is to ensure that all other processes that should attach have
    #       attached. Here we achieve this by simply monitoring the `initialized` cross-process
    #       counter variable that we used in (1). Once it hits `num_max_processes` we can be sure
    #       that each subprocess has properly connected.
    while initialized.value != num_max_processes:
        print(f'Subprocess {name} await other processes...')
        time.sleep(1)

    # 5. Load query data.
    df = pd.read_parquet("data_parquet/val/gen_0.parquet")
    dense_input_columns = df.columns[1:11]
    cat_input1_columns = df.columns[11:13]
    cat_input2_columns = df.columns[13:15]
    dense_input = df[dense_input_columns].loc[0:batch_size-1].to_numpy(dtype=np.float32)
    cat_input1 = (df[cat_input1_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[0:2]).reshape((batch_size, 2, 1))
    cat_input2 = (df[cat_input2_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[2:4]).reshape((batch_size, 2, 1))

    # 6. Make inference from the HPS object and the ONNX inference session of `hps_demo_without_embedding.onnx`.
    embedding1 = hps.lookup(cat_input1.flatten(), "hps_demo", 0, device_id).reshape(batch_size, 2, 16)
    embedding2 = hps.lookup(cat_input2.flatten(), "hps_demo", 1, device_id).reshape(batch_size, 2, 32)
    sess = ort.InferenceSession("hps_demo_without_embedding.onnx")
    res = sess.run(output_names=[sess.get_outputs()[0].name],
                   input_feed={sess.get_inputs()[0].name: dense_input,
                               sess.get_inputs()[1].name: embedding1,
                               sess.get_inputs()[2].name: embedding2})
    pred = res[0]

    # 7. Check the correctness by comparing with dumped evaluation results.
    ground_truth = np.load("ground_truth.npy").flatten()
    print(f'Subprocess {name}; ground_truth: {ground_truth}')
    diff = pred.flatten() - ground_truth
    mse = np.mean(diff * diff)
    print(f'Subprocess {name}; pred: {pred}')
    print(f'Subprocess {name}; mse between pred and ground_truth: {mse}')

    # 8. Make inference with the ONNX inference session of `hps_demo_with_embedding.onnx` (double check).
    sess_ref = ort.InferenceSession("hps_demo_with_embedding.onnx")
    res_ref = sess_ref.run(output_names=[sess_ref.get_outputs()[0].name],
                           input_feed={sess_ref.get_inputs()[0].name: dense_input,
                                       sess_ref.get_inputs()[1].name: cat_input1,
                                       sess_ref.get_inputs()[2].name: cat_input2})
    pred_ref = res_ref[0]
    diff_ref = pred_ref.flatten() - ground_truth
    mse_ref = np.mean(diff_ref * diff_ref)
    print(f'Subprocess {name}; pred_ref: {pred_ref}')
    print(f'Subprocess {name}; mse between pred_ref and ground_truth: {mse_ref}')
    print(f'Subprocess {name} exiting...')

if __name__ == '__main__':
    # Destroy shared memory left over from a previous run, if any.
    try:
        os.remove('/dev/shm/hctr_mp_hash_map_database')
    except FileNotFoundError:
        pass

    initialized = mp.Value('i', 0)

    # Create subprocesses.
    processes = [
        mp.Process(target=create_hps, args=('primary', initialized, 0, 3)),
        mp.Process(target=create_hps, args=('secondary', initialized, 1, 3)),
        mp.Process(target=create_hps, args=('secondary', initialized, 2, 3)),
    ]
    for p in processes:
        p.start()

    # Go to sleep until subprocesses are initialized.
    while initialized.value < len(processes):
        print(f'Main process; awaiting subprocess initialization... So far {initialized.value} initialized...')
        time.sleep(1)

    # Wait for subprocesses to exit.
    for i, p in enumerate(processes):
        print(f'Main process; awaiting subprocess {i} to exit...')
        p.join()
    print('Main process; exiting...')
Writing multi_process_hps.py
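Step 5 of the script shifts the category IDs of each slot into one global key space before they are passed to `hps.lookup`. The sketch below reproduces that offsetting with the standard library only, whereas the script itself uses `np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]`. The helper names `slot_key_offsets` and `to_global_keys` and the sample IDs are assumptions chosen for illustration.

```python
from itertools import accumulate


def slot_key_offsets(slot_sizes):
    """Start offset of each slot in the global key space (exclusive prefix sum)."""
    return [0] + list(accumulate(slot_sizes))[:-1]


def to_global_keys(local_ids, offsets):
    """Add each slot's offset to the ID found in that slot."""
    return [local_id + offset for local_id, offset in zip(local_ids, offsets)]


slot_size_array = [10000, 10000, 10000, 10000]
key_offset = slot_key_offsets(slot_size_array)
print(key_offset)  # [0, 10000, 20000, 30000]

# Slots 2 and 3 feed the second embedding table, hence key_offset[2:4].
print(to_global_keys([5, 47], key_offset[2:4]))  # [20005, 30047]
```

Because every slot receives its own range, the same ID in two different slots maps to two different keys, which is why the script adds `key_offset[0:2]` to the first pair of categorical columns and `key_offset[2:4]` to the second.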
!python3 multi_process_hps.py
subprocess:primary(1394)launch...
[HCTR][06:48:37.272][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
====================================================HPS Create====================================================
[HCTR][06:48:37.272][INFO][RK0][main]: Creating Multi-Process HashMap CPU database backend...
[HCTR][06:48:37.272][INFO][RK0][main]: Connecting to shared memory 'hctr_mp_hash_map_database'...
subprocess:secondary(1396)launch...
Subprocess secondary awaiting SHM initialization...
Main process; awaiting subprocess initialization... So far 0 initialized...
subprocess:secondary(1397)launch...
Subprocess secondary awaiting SHM initialization...
[HCTR][06:48:37.772][INFO][RK0][main]: Connected to shared memory 'hctr_mp_hash_map_database'; OS total = 270453215232 bytes, OS available = 269706559488 bytes, HCTR allocated = 17179869184 bytes, HCTR free = 17179868672 bytes; other processes connected = 0
[HCTR][06:48:37.773][INFO][RK0][main]: Volatile DB: initial cache rate = 1
[HCTR][06:48:37.773][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
[HCTR][06:48:37.773][DEBUG][RK0][main]: Created raw model loader in local memory!
Subprocess secondary awaiting SHM initialization...
Main process; awaiting subprocess initialization... So far 0 initialized...
Subprocess secondary awaiting SHM initialization...
[HCTR][06:48:38.313][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding1; cached 18488 / 18488 embeddings in volatile database (MultiProcessHashMapBackend); load: 18488 / 18446744073709551615 (0.00%).
[HCTR][06:48:38.947][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding2; cached 18470 / 18470 embeddings in volatile database (MultiProcessHashMapBackend); load: 18470 / 18446744073709551615 (0.00%).
Subprocess secondary awaiting SHM initialization...
Main process; awaiting subprocess initialization... So far 0 initialized...
Subprocess secondary awaiting SHM initialization...
Subprocess secondary awaiting SHM initialization...
Main process; awaiting subprocess initialization... So far 0 initialized...
Subprocess secondary awaiting SHM initialization...
Subprocess secondary awaiting SHM initialization...
Main process; awaiting subprocess initialization... So far 0 initialized...
Subprocess secondary awaiting SHM initialization...
[HCTR][06:48:41.289][DEBUG][RK0][main]: Real-time subscribers created!
[HCTR][06:48:41.289][INFO][RK0][main]: Creating embedding cache in device 0.
[HCTR][06:48:41.295][INFO][RK0][main]: Model name: hps_demo
[HCTR][06:48:41.295][INFO][RK0][main]: Max batch size: 1024
[HCTR][06:48:41.295][INFO][RK0][main]: Fuse embedding tables: False
[HCTR][06:48:41.295][INFO][RK0][main]: Number of embedding tables: 2
[HCTR][06:48:41.295][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
[HCTR][06:48:41.295][INFO][RK0][main]: Embedding cache type: dynamic
[HCTR][06:48:41.295][INFO][RK0][main]: Use I64 input key: True
[HCTR][06:48:41.295][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
[HCTR][06:48:41.295][INFO][RK0][main]: The size of thread pool: 80
[HCTR][06:48:41.295][INFO][RK0][main]: The size of worker memory pool: 2
[HCTR][06:48:41.295][INFO][RK0][main]: The size of refresh memory pool: 1
[HCTR][06:48:41.295][INFO][RK0][main]: The refresh percentage : 0.000000
[HCTR][06:48:41.311][INFO][RK0][main]: LookupSession i64_input_key: True
[HCTR][06:48:41.311][INFO][RK0][main]: Creating lookup session for hps_demo on device: 0
Subprocess primary initialized
Subprocess primary await other processes...
Main process; awaiting subprocess initialization... So far 1 initialized...
[HCTR][06:48:42.279][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
====================================================HPS Create====================================================
[HCTR][06:48:42.280][INFO][RK0][main]: Creating Multi-Process HashMap CPU database backend...
[HCTR][06:48:42.281][INFO][RK0][main]: Connecting to shared memory 'hctr_mp_hash_map_database'...
[HCTR][06:48:42.281][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
====================================================HPS Create====================================================
[HCTR][06:48:42.282][INFO][RK0][main]: Creating Multi-Process HashMap CPU database backend...
[HCTR][06:48:42.282][INFO][RK0][main]: Connecting to shared memory 'hctr_mp_hash_map_database'...
Subprocess primary await other processes...
[HCTR][06:48:42.781][INFO][RK0][main]: Connected to shared memory 'hctr_mp_hash_map_database'; OS total = 270453215232 bytes, OS available = 260310085632 bytes, HCTR allocated = 17179869184 bytes, HCTR free = 7783505728 bytes; other processes connected = 1
[HCTR][06:48:42.781][INFO][RK0][main]: Volatile DB: initial cache rate = 1
[HCTR][06:48:42.781][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
[HCTR][06:48:42.781][DEBUG][RK0][main]: Created raw model loader in local memory!
Main process; awaiting subprocess initialization... So far 1 initialized...
[HCTR][06:48:43.281][INFO][RK0][main]: Connected to shared memory 'hctr_mp_hash_map_database'; OS total = 270453215232 bytes, OS available = 260310085632 bytes, HCTR allocated = 17179869184 bytes, HCTR free = 7783505728 bytes; other processes connected = 1
[HCTR][06:48:43.281][INFO][RK0][main]: Volatile DB: initial cache rate = 1
[HCTR][06:48:43.281][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
[HCTR][06:48:43.281][DEBUG][RK0][main]: Created raw model loader in local memory!
Subprocess primary await other processes...
Main process; awaiting subprocess initialization... So far 1 initialized...
Subprocess primary await other processes...
Main process; awaiting subprocess initialization... So far 1 initialized...
Subprocess primary await other processes...
[HCTR][06:48:45.440][DEBUG][RK0][main]: Real-time subscribers created!
[HCTR][06:48:45.441][INFO][RK0][main]: Creating embedding cache in device 1.
[HCTR][06:48:45.463][INFO][RK0][main]: Model name: hps_demo
[HCTR][06:48:45.463][INFO][RK0][main]: Max batch size: 1024
[HCTR][06:48:45.463][INFO][RK0][main]: Fuse embedding tables: False
[HCTR][06:48:45.463][INFO][RK0][main]: Number of embedding tables: 2
[HCTR][06:48:45.463][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
[HCTR][06:48:45.463][INFO][RK0][main]: Embedding cache type: dynamic
[HCTR][06:48:45.463][INFO][RK0][main]: Use I64 input key: True
[HCTR][06:48:45.463][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
[HCTR][06:48:45.463][INFO][RK0][main]: The size of thread pool: 80
[HCTR][06:48:45.463][INFO][RK0][main]: The size of worker memory pool: 2
[HCTR][06:48:45.463][INFO][RK0][main]: The size of refresh memory pool: 1
[HCTR][06:48:45.463][INFO][RK0][main]: The refresh percentage : 0.000000
[HCTR][06:48:45.706][DEBUG][RK0][main]: Real-time subscribers created!
[HCTR][06:48:45.706][INFO][RK0][main]: Creating embedding cache in device 2.
[HCTR][06:48:45.711][INFO][RK0][main]: Model name: hps_demo
[HCTR][06:48:45.711][INFO][RK0][main]: Max batch size: 1024
[HCTR][06:48:45.711][INFO][RK0][main]: Fuse embedding tables: False
[HCTR][06:48:45.711][INFO][RK0][main]: Number of embedding tables: 2
[HCTR][06:48:45.711][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
[HCTR][06:48:45.711][INFO][RK0][main]: Embedding cache type: dynamic
[HCTR][06:48:45.711][INFO][RK0][main]: Use I64 input key: True
[HCTR][06:48:45.711][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
[HCTR][06:48:45.711][INFO][RK0][main]: The size of thread pool: 80
[HCTR][06:48:45.711][INFO][RK0][main]: The size of worker memory pool: 2
[HCTR][06:48:45.711][INFO][RK0][main]: The size of refresh memory pool: 1
[HCTR][06:48:45.711][INFO][RK0][main]: The refresh percentage : 0.000000
Main process; awaiting subprocess initialization... So far 1 initialized...
Subprocess primary await other processes...
[HCTR][06:48:46.699][INFO][RK0][main]: LookupSession i64_input_key: True
[HCTR][06:48:46.699][INFO][RK0][main]: Creating lookup session for hps_demo on device: 1
Subprocess secondary initialized
Subprocess secondary await other processes...
[HCTR][06:48:46.764][INFO][RK0][main]: LookupSession i64_input_key: True
[HCTR][06:48:46.764][INFO][RK0][main]: Creating lookup session for hps_demo on device: 2
Subprocess secondary initialized
2023-09-20 06:48:46.842594773 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
Subprocess secondary; ground_truth: [0.4895492 0.509022 0.38192913 ... 0.5264926 0.50650454 0.47927693]
Subprocess secondary; pred: [[0.48954916]
[0.50902206]
[0.38192907]
...
[0.52649266]
[0.5065045 ]
[0.4792769 ]]
Subprocess secondary; mse between pred and ground_truth: 2.3887142264200634e-15
Subprocess secondary; pred_ref: [[0.48954916]
[0.50902206]
[0.38192907]
...
[0.52649266]
[0.5065045 ]
[0.4792769 ]]
Subprocess secondary; mse between pred_ref and ground_truth: 2.3887142264200634e-15
Subprocess secondary exiting...
[HCTR][06:48:46.900][INFO][RK0][main]: Disconnecting from shared memory 'hctr_mp_hash_map_database'.
Main process; awaiting subprocess 0 to exit...
2023-09-20 06:48:47.497305659 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
Subprocess primary; ground_truth: [0.4895492 0.509022 0.38192913 ... 0.5264926 0.50650454 0.47927693]
Subprocess primary; pred: [[0.48954916]
[0.50902206]
[0.38192907]
...
[0.52649266]
[0.5065045 ]
[0.4792769 ]]
Subprocess primary; mse between pred and ground_truth: 2.3887142264200634e-15
Subprocess primary; pred_ref: [[0.48954916]
[0.50902206]
[0.38192907]
...
[0.52649266]
[0.5065045 ]
[0.4792769 ]]
Subprocess primary; mse between pred_ref and ground_truth: 2.3887142264200634e-15
Subprocess primary exiting...
[HCTR][06:48:47.568][INFO][RK0][main]: Disconnecting from shared memory 'hctr_mp_hash_map_database'.
2023-09-20 06:48:48.101124718 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
Subprocess secondary; ground_truth: [0.4895492 0.509022 0.38192913 ... 0.5264926 0.50650454 0.47927693]
Subprocess secondary; pred: [[0.48954916]
[0.50902206]
[0.38192907]
...
[0.52649266]
[0.5065045 ]
[0.4792769 ]]
Subprocess secondary; mse between pred and ground_truth: 2.3887142264200634e-15
Subprocess secondary; pred_ref: [[0.48954916]
[0.50902206]
[0.38192907]
...
[0.52649266]
[0.5065045 ]
[0.4792769 ]]
Subprocess secondary; mse between pred_ref and ground_truth: 2.3887142264200634e-15
Subprocess secondary exiting...
[HCTR][06:48:48.176][INFO][RK0][main]: Disconnecting from shared memory 'hctr_mp_hash_map_database'.
Main process; awaiting subprocess 1 to exit...
[HCTR][06:48:48.687][INFO][RK0][main]: Detached last process from shared memory 'hctr_mp_hash_map_database'. Auto remove in progress...
Main process; awaiting subprocess 2 to exit...
Main process; exiting...
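The start-up handshake in multi_process_hps.py does not depend on HPS itself and can be tried out with the standard library alone. The following sketch keeps the two waiting loops and the shared counter, and replaces the HPS construction with a placeholder comment. It assumes a platform where the `fork` start method is available (such as Linux). The names `worker`, `run_handshake`, and `ranks` are illustrative and do not appear in the script.

```python
import multiprocessing as mp
import time


def worker(index, name, initialized, ranks, num_processes):
    # 1. Secondary workers poll until the primary has signalled readiness.
    while name != "primary" and initialized.value == 0:
        time.sleep(0.01)

    # 2. The real script constructs its HPS object at this point.

    # 3. `+=` on a shared Value is a read followed by a write, so hold its lock.
    with initialized.get_lock():
        initialized.value += 1
        ranks[index] = initialized.value  # order in which this worker signed in

    # 4. Stay attached until every worker has signed in.
    while initialized.value != num_processes:
        time.sleep(0.01)


def run_handshake(num_secondaries=2):
    ctx = mp.get_context("fork")  # assumption: a Unix-like operating system
    names = ["primary"] + ["secondary"] * num_secondaries
    initialized = ctx.Value("i", 0)
    ranks = ctx.Array("i", len(names))
    processes = [
        ctx.Process(target=worker, args=(i, name, initialized, ranks, len(names)))
        for i, name in enumerate(names)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return initialized.value, ranks[:]
```

Calling `run_handshake()` returns the final counter value and the sign-in rank of each worker. The primary always records rank 1, because no secondary can leave the first loop before the counter becomes non-zero.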
4. Redis Cluster deployment (without TLS/SSL)
HugeCTR can use Redis clusters as backing storage. In the following steps, we show how to set up a mock Redis / HugeCTR deployment on a single machine. We assume that you have started this notebook in a HugeCTR Docker container.
Step 1: Download and build Redis
!rm -f 7.0.8.tar.gz && wget https://github.com/redis/redis/archive/7.0.8.tar.gz
!rm -rf redis-7.0.8 && tar -xf 7.0.8.tar.gz && ln -sf redis-7.0.8 redis
!cd redis && make
--2023-09-20 06:49:01-- https://github.com/redis/redis/archive/7.0.8.tar.gz
Resolving github.com (github.com)... 192.30.255.112
Connecting to github.com (github.com)|192.30.255.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/redis/redis/tar.gz/refs/tags/7.0.8 [following]
--2023-09-20 06:49:01-- https://codeload.github.com/redis/redis/tar.gz/refs/tags/7.0.8
Resolving codeload.github.com (codeload.github.com)... 192.30.255.120
Connecting to codeload.github.com (codeload.github.com)|192.30.255.120|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘7.0.8.tar.gz’
7.0.8.tar.gz [ <=> ] 2.87M 5.50MB/s in 0.5s
2023-09-20 06:49:02 (5.50 MB/s) - ‘7.0.8.tar.gz’ saved [3011655]
cd src && make all
make[1]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/src'
./mkreleasehdr.sh: line 2: echo: write error: Broken pipe
CC Makefile.dep
./mkreleasehdr.sh: line 2: echo: write error: Broken pipe
rm -rf redis-server redis-sentinel redis-cli redis-benchmark redis-check-rdb redis-check-aof *.o *.gcda *.gcno *.gcov redis.info lcov-html Makefile.dep
rm -f adlist.d quicklist.d ae.d anet.d dict.d server.d sds.d zmalloc.d lzf_c.d lzf_d.d pqsort.d zipmap.d sha1.d ziplist.d release.d networking.d util.d object.d db.d replication.d rdb.d t_string.d t_list.d t_set.d t_zset.d t_hash.d config.d aof.d pubsub.d multi.d debug.d sort.d intset.d syncio.d cluster.d crc16.d endianconv.d slowlog.d eval.d bio.d rio.d rand.d memtest.d syscheck.d crcspeed.d crc64.d bitops.d sentinel.d notify.d setproctitle.d blocked.d hyperloglog.d latency.d sparkline.d redis-check-rdb.d redis-check-aof.d geo.d lazyfree.d module.d evict.d expire.d geohash.d geohash_helper.d childinfo.d defrag.d siphash.d rax.d t_stream.d listpack.d localtime.d lolwut.d lolwut5.d lolwut6.d acl.d tracking.d connection.d tls.d sha256.d timeout.d setcpuaffinity.d monotonic.d mt19937-64.d resp_parser.d call_reply.d script_lua.d script.d functions.d function_lua.d commands.d anet.d adlist.d dict.d redis-cli.d zmalloc.d release.d ae.d redisassert.d crcspeed.d crc64.d siphash.d crc16.d monotonic.d cli_common.d mt19937-64.d ae.d anet.d redis-benchmark.d adlist.d dict.d zmalloc.d redisassert.d release.d crcspeed.d crc64.d siphash.d crc16.d monotonic.d cli_common.d mt19937-64.d
(cd ../deps && make distclean)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
(cd hiredis && make clean) > /dev/null || true
(cd linenoise && make clean) > /dev/null || true
(cd lua && make clean) > /dev/null || true
(cd jemalloc && [ -f Makefile ] && make distclean) > /dev/null || true
(cd hdr_histogram && make clean) > /dev/null || true
(rm -f .make-*)
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
(cd modules && make clean)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/src/modules'
rm -rf *.xo *.so
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/src/modules'
(cd ../tests/modules && make clean)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/tests/modules'
rm -f commandfilter.so basics.so testrdb.so fork.so infotest.so propagate.so misc.so hooks.so blockonkeys.so blockonbackground.so scan.so datatype.so datatype2.so auth.so keyspace_events.so blockedclient.so getkeys.so getchannels.so test_lazyfree.so timer.so defragtest.so keyspecs.so hash.so zset.so stream.so mallocsize.so aclcheck.so list.so subcommands.so reply.so cmdintrospection.so eventloop.so moduleconfigs.so moduleconfigstwo.so publish.so usercall.so commandfilter.xo basics.xo testrdb.xo fork.xo infotest.xo propagate.xo misc.xo hooks.xo blockonkeys.xo blockonbackground.xo scan.xo datatype.xo datatype2.xo auth.xo keyspace_events.xo blockedclient.xo getkeys.xo getchannels.xo test_lazyfree.xo timer.xo defragtest.xo keyspecs.xo hash.xo zset.xo stream.xo mallocsize.xo aclcheck.xo list.xo subcommands.xo reply.xo cmdintrospection.xo eventloop.xo moduleconfigs.xo moduleconfigstwo.xo publish.xo usercall.xo
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/tests/modules'
(rm -f .make-*)
echo STD=-pedantic -DREDIS_STATIC='' -std=c11 >> .make-settings
echo WARN=-Wall -W -Wno-missing-field-initializers >> .make-settings
echo OPT=-O2 >> .make-settings
echo MALLOC=jemalloc >> .make-settings
echo BUILD_TLS= >> .make-settings
echo USE_SYSTEMD= >> .make-settings
echo CFLAGS= >> .make-settings
echo LDFLAGS= >> .make-settings
echo REDIS_CFLAGS= >> .make-settings
echo REDIS_LDFLAGS= >> .make-settings
echo PREV_FINAL_CFLAGS=-pedantic -DREDIS_STATIC='' -std=c11 -Wall -W -Wno-missing-field-initializers -O2 -g -ggdb -I../deps/hiredis -I../deps/linenoise -I../deps/lua/src -I../deps/hdr_histogram -DUSE_JEMALLOC -I../deps/jemalloc/include >> .make-settings
echo PREV_FINAL_LDFLAGS= -g -ggdb -rdynamic >> .make-settings
(cd ../deps && make hiredis linenoise lua hdr_histogram jemalloc)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
(cd hiredis && make clean) > /dev/null || true
(cd linenoise && make clean) > /dev/null || true
(cd lua && make clean) > /dev/null || true
(cd jemalloc && [ -f Makefile ] && make distclean) > /dev/null || true
(cd hdr_histogram && make clean) > /dev/null || true
(rm -f .make-*)
(echo "" > .make-cflags)
(echo "" > .make-ldflags)
MAKE hiredis
cd hiredis && make static
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hiredis'
cc -std=c99 -c -O3 -fPIC -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic alloc.c
cc -std=c99 -c -O3 -fPIC -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic net.c
cc -std=c99 -c -O3 -fPIC -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic hiredis.c
cc -std=c99 -c -O3 -fPIC -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic sds.c
cc -std=c99 -c -O3 -fPIC -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic async.c
cc -std=c99 -c -O3 -fPIC -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic read.c
cc -std=c99 -c -O3 -fPIC -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic sockcompat.c
ar rcs libhiredis.a alloc.o net.o hiredis.o sds.o async.o read.o sockcompat.o
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hiredis'
MAKE linenoise
cd linenoise && make
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/linenoise'
cc -Wall -Os -g -c linenoise.c
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/linenoise'
MAKE lua
cd lua/src && make all CFLAGS="-Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 " MYLDFLAGS="" AR="ar rc"
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/lua/src'
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lapi.o lapi.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lcode.o lcode.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o ldebug.o ldebug.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o ldo.o ldo.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o ldump.o ldump.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lfunc.o lfunc.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lgc.o lgc.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o llex.o llex.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lmem.o lmem.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lobject.o lobject.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lopcodes.o lopcodes.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lparser.o lparser.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lstate.o lstate.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lstring.o lstring.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o ltable.o ltable.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o ltm.o ltm.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lundump.o lundump.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lvm.o lvm.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lzio.o lzio.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o strbuf.o strbuf.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o fpconv.o fpconv.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lauxlib.o lauxlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lbaselib.o lbaselib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o ldblib.o ldblib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o liolib.o liolib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lmathlib.o lmathlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o loslib.o loslib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o ltablib.o ltablib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lstrlib.o lstrlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o loadlib.o loadlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o linit.o linit.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lua_cjson.o lua_cjson.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lua_struct.o lua_struct.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lua_cmsgpack.o lua_cmsgpack.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lua_bit.o lua_bit.c
ar rc liblua.a lapi.o lcode.o ldebug.o ldo.o ldump.o lfunc.o lgc.o llex.o lmem.o lobject.o lopcodes.o lparser.o lstate.o lstring.o ltable.o ltm.o lundump.o lvm.o lzio.o strbuf.o fpconv.o lauxlib.o lbaselib.o ldblib.o liolib.o lmathlib.o loslib.o ltablib.o lstrlib.o loadlib.o linit.o lua_cjson.o lua_struct.o lua_cmsgpack.o lua_bit.o # DLL needs all object files
ranlib liblua.a
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lua.o lua.c
cc -o lua lua.o liblua.a -lm
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o luac.o luac.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o print.o print.c
cc -o luac luac.o print.o liblua.a -lm
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/lua/src'
MAKE hdr_histogram
cd hdr_histogram && make
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hdr_histogram'
cc -std=c99 -Wall -Os -g -DHDR_MALLOC_INCLUDE=\"hdr_redis_malloc.h\" -c hdr_histogram.c
ar rcs libhdrhistogram.a hdr_histogram.o
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hdr_histogram'
MAKE jemalloc
cd jemalloc && ./configure --with-version=5.2.1-0-g0 --with-lg-quantum=3 --with-jemalloc-prefix=je_ CFLAGS="-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops " LDFLAGS=""
checking for xsltproc... false
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether compiler is cray... no
checking whether compiler supports -std=gnu11... yes
checking whether compiler supports -Wall... yes
checking whether compiler supports -Wextra... yes
checking whether compiler supports -Wshorten-64-to-32... no
checking whether compiler supports -Wsign-compare... yes
checking whether compiler supports -Wundef... yes
checking whether compiler supports -Wno-format-zero-length... yes
checking whether compiler supports -pipe... yes
checking whether compiler supports -g3... yes
checking how to run the C preprocessor... gcc -E
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking whether g++ supports C++14 features by default... yes
checking whether compiler supports -Wall... yes
checking whether compiler supports -Wextra... yes
checking whether compiler supports -g3... yes
checking whether libstdc++ linkage is compilable... yes
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking whether byte ordering is bigendian... no
checking size of void *... 8
checking size of int... 4
checking size of long... 8
checking size of long long... 8
checking size of intmax_t... 8
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking whether pause instruction is compilable... yes
checking number of significant virtual address bits... 48
checking for ar... ar
checking for nm... nm
checking for gawk... no
checking for mawk... mawk
checking malloc.h usability... yes
checking malloc.h presence... yes
checking for malloc.h... yes
checking whether malloc_usable_size definition can use const argument... no
checking for library containing log... -lm
checking whether __attribute__ syntax is compilable... yes
checking whether compiler supports -fvisibility=hidden... yes
checking whether compiler supports -fvisibility=hidden... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether tls_model attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether alloc_size attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether format(gnu_printf, ...) attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether format(printf, ...) attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether format(printf, ...) attribute is compilable... yes
checking for a BSD-compatible install... /usr/bin/install -c
checking for ranlib... ranlib
checking for ld... /usr/bin/ld
checking for autoconf... /usr/bin/autoconf
checking for memalign... yes
checking for valloc... yes
checking whether compiler supports -O3... yes
checking whether compiler supports -O3... yes
checking whether compiler supports -funroll-loops... yes
checking configured backtracing method... N/A
checking for sbrk... yes
checking whether utrace(2) is compilable... no
checking whether a program using __builtin_unreachable is compilable... yes
checking whether a program using __builtin_ffsl is compilable... yes
checking whether a program using __builtin_popcountl is compilable... yes
checking LG_PAGE... 12
checking pthread.h usability... yes
checking pthread.h presence... yes
checking for pthread.h... yes
checking for pthread_create in -lpthread... yes
checking dlfcn.h usability... yes
checking dlfcn.h presence... yes
checking for dlfcn.h... yes
checking for dlsym... yes
checking whether pthread_atfork(3) is compilable... yes
checking whether pthread_setname_np(3) is compilable... yes
checking for library containing clock_gettime... none required
checking whether clock_gettime(CLOCK_MONOTONIC_COARSE, ...) is compilable... yes
checking whether clock_gettime(CLOCK_MONOTONIC, ...) is compilable... yes
checking whether mach_absolute_time() is compilable... no
checking whether compiler supports -Werror... yes
checking whether syscall(2) is compilable... yes
checking for secure_getenv... yes
checking for sched_getcpu... yes
checking for sched_setaffinity... yes
checking for issetugid... no
checking for _malloc_thread_cleanup... no
checking for _pthread_mutex_init_calloc_cb... no
checking for TLS... yes
checking whether C11 atomics is compilable... no
checking whether GCC __atomic atomics is compilable... yes
checking whether GCC 8-bit __atomic atomics is compilable... yes
checking whether GCC __sync atomics is compilable... yes
checking whether GCC 8-bit __sync atomics is compilable... yes
checking whether Darwin OSAtomic*() is compilable... no
checking whether madvise(2) is compilable... yes
checking whether madvise(..., MADV_FREE) is compilable... yes
checking whether madvise(..., MADV_DONTNEED) is compilable... yes
checking whether madvise(..., MADV_DO[NT]DUMP) is compilable... yes
checking whether madvise(..., MADV_[NO]HUGEPAGE) is compilable... yes
checking for __builtin_clz... yes
checking whether Darwin os_unfair_lock_*() is compilable... no
checking whether glibc malloc hook is compilable... no
checking whether glibc memalign hook is compilable... no
checking whether pthreads adaptive mutexes is compilable... yes
checking whether compiler supports -D_GNU_SOURCE... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether strerror_r returns char with gnu source is compilable... yes
checking for stdbool.h that conforms to C99... yes
checking for _Bool... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating jemalloc.pc
config.status: creating doc/html.xsl
config.status: creating doc/manpages.xsl
config.status: creating doc/jemalloc.xml
config.status: creating include/jemalloc/jemalloc_macros.h
config.status: creating include/jemalloc/jemalloc_protos.h
config.status: creating include/jemalloc/jemalloc_typedefs.h
config.status: creating include/jemalloc/internal/jemalloc_preamble.h
config.status: creating test/test.sh
config.status: creating test/include/test/jemalloc_test.h
config.status: creating config.stamp
config.status: creating bin/jemalloc-config
config.status: creating bin/jemalloc.sh
config.status: creating bin/jeprof
config.status: creating include/jemalloc/jemalloc_defs.h
config.status: creating include/jemalloc/internal/jemalloc_internal_defs.h
config.status: creating test/include/test/jemalloc_test_defs.h
config.status: executing include/jemalloc/internal/public_symbols.txt commands
config.status: executing include/jemalloc/internal/private_symbols.awk commands
config.status: executing include/jemalloc/internal/private_symbols_jet.awk commands
config.status: executing include/jemalloc/internal/public_namespace.h commands
config.status: executing include/jemalloc/internal/public_unnamespace.h commands
config.status: executing include/jemalloc/jemalloc_protos_jet.h commands
config.status: executing include/jemalloc/jemalloc_rename.h commands
config.status: executing include/jemalloc/jemalloc_mangle.h commands
config.status: executing include/jemalloc/jemalloc_mangle_jet.h commands
config.status: executing include/jemalloc/jemalloc.h commands
===============================================================================
jemalloc version : 5.2.1-0-g0
library revision : 2
CONFIG : --with-version=5.2.1-0-g0 --with-lg-quantum=3 --with-jemalloc-prefix=je_ 'CFLAGS=-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops ' LDFLAGS=
CC : gcc
CONFIGURE_CFLAGS : -std=gnu11 -Wall -Wextra -Wsign-compare -Wundef -Wno-format-zero-length -pipe -g3 -fvisibility=hidden -O3 -funroll-loops
SPECIFIED_CFLAGS : -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops
EXTRA_CFLAGS :
CPPFLAGS : -D_GNU_SOURCE -D_REENTRANT
CXX : g++
CONFIGURE_CXXFLAGS : -Wall -Wextra -g3 -fvisibility=hidden -O3
SPECIFIED_CXXFLAGS :
EXTRA_CXXFLAGS :
LDFLAGS :
EXTRA_LDFLAGS :
DSO_LDFLAGS : -shared -Wl,-soname,$(@F)
LIBS : -lm -lstdc++ -pthread
RPATH_EXTRA :
XSLTPROC : false
XSLROOT :
PREFIX : /usr/local
BINDIR : /usr/local/bin
DATADIR : /usr/local/share
INCLUDEDIR : /usr/local/include
LIBDIR : /usr/local/lib
MANDIR : /usr/local/share/man
srcroot :
abs_srcroot : /hugectr/notebooks/tmr/redis/deps/jemalloc/
objroot :
abs_objroot : /hugectr/notebooks/tmr/redis/deps/jemalloc/
JEMALLOC_PREFIX : je_
JEMALLOC_PRIVATE_NAMESPACE
: je_
install_suffix :
malloc_conf :
documentation : 1
shared libs : 1
static libs : 1
autogen : 0
debug : 0
stats : 1
experimetal_smallocx : 0
prof : 0
prof-libunwind : 0
prof-libgcc : 0
prof-gcc : 0
fill : 1
utrace : 0
xmalloc : 0
log : 0
lazy_lock : 0
cache-oblivious : 1
cxx : 1
===============================================================================
cd jemalloc && make CFLAGS="-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops " LDFLAGS="" lib/libjemalloc.a
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/jemalloc'
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/jemalloc.sym.o src/jemalloc.c
nm -a src/jemalloc.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/jemalloc.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/arena.sym.o src/arena.c
nm -a src/arena.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/arena.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/background_thread.sym.o src/background_thread.c
nm -a src/background_thread.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/background_thread.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/base.sym.o src/base.c
nm -a src/base.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/base.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/bin.sym.o src/bin.c
nm -a src/bin.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/bin.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/bitmap.sym.o src/bitmap.c
nm -a src/bitmap.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/bitmap.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ckh.sym.o src/ckh.c
nm -a src/ckh.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ckh.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ctl.sym.o src/ctl.c
nm -a src/ctl.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ctl.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/div.sym.o src/div.c
nm -a src/div.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/div.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent.sym.o src/extent.c
nm -a src/extent.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent_dss.sym.o src/extent_dss.c
nm -a src/extent_dss.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent_dss.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent_mmap.sym.o src/extent_mmap.c
nm -a src/extent_mmap.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent_mmap.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/hash.sym.o src/hash.c
nm -a src/hash.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/hash.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/hook.sym.o src/hook.c
nm -a src/hook.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/hook.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/large.sym.o src/large.c
nm -a src/large.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/large.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/log.sym.o src/log.c
nm -a src/log.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/log.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/malloc_io.sym.o src/malloc_io.c
nm -a src/malloc_io.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/malloc_io.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/mutex.sym.o src/mutex.c
nm -a src/mutex.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/mutex.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/mutex_pool.sym.o src/mutex_pool.c
nm -a src/mutex_pool.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/mutex_pool.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/nstime.sym.o src/nstime.c
nm -a src/nstime.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/nstime.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/pages.sym.o src/pages.c
nm -a src/pages.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/pages.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/prng.sym.o src/prng.c
nm -a src/prng.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/prng.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/prof.sym.o src/prof.c
nm -a src/prof.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/prof.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/rtree.sym.o src/rtree.c
nm -a src/rtree.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/rtree.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/safety_check.sym.o src/safety_check.c
nm -a src/safety_check.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/safety_check.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/stats.sym.o src/stats.c
nm -a src/stats.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/stats.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/sc.sym.o src/sc.c
nm -a src/sc.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/sc.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/sz.sym.o src/sz.c
nm -a src/sz.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/sz.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/tcache.sym.o src/tcache.c
nm -a src/tcache.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/tcache.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/test_hooks.sym.o src/test_hooks.c
nm -a src/test_hooks.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/test_hooks.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ticker.sym.o src/ticker.c
nm -a src/ticker.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ticker.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/tsd.sym.o src/tsd.c
nm -a src/tsd.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/tsd.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/witness.sym.o src/witness.c
nm -a src/witness.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/witness.sym
/bin/sh include/jemalloc/internal/private_namespace.sh src/jemalloc.sym src/arena.sym src/background_thread.sym src/base.sym src/bin.sym src/bitmap.sym src/ckh.sym src/ctl.sym src/div.sym src/extent.sym src/extent_dss.sym src/extent_mmap.sym src/hash.sym src/hook.sym src/large.sym src/log.sym src/malloc_io.sym src/mutex.sym src/mutex_pool.sym src/nstime.sym src/pages.sym src/prng.sym src/prof.sym src/rtree.sym src/safety_check.sym src/stats.sym src/sc.sym src/sz.sym src/tcache.sym src/test_hooks.sym src/ticker.sym src/tsd.sym src/witness.sym > include/jemalloc/internal/private_namespace.gen.h
cp include/jemalloc/internal/private_namespace.gen.h include/jemalloc/internal/private_namespace.gen.h
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/jemalloc.o src/jemalloc.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/arena.o src/arena.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/background_thread.o src/background_thread.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/base.o src/base.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/bin.o src/bin.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/bitmap.o src/bitmap.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ckh.o src/ckh.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ctl.o src/ctl.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/div.o src/div.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent.o src/extent.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent_dss.o src/extent_dss.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent_mmap.o src/extent_mmap.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/hash.o src/hash.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/hook.o src/hook.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/large.o src/large.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/log.o src/log.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/malloc_io.o src/malloc_io.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/mutex.o src/mutex.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/mutex_pool.o src/mutex_pool.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/nstime.o src/nstime.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/pages.o src/pages.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/prng.o src/prng.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/prof.o src/prof.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/rtree.o src/rtree.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/safety_check.o src/safety_check.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/stats.o src/stats.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/sc.o src/sc.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/sz.o src/sz.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/tcache.o src/tcache.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/test_hooks.o src/test_hooks.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ticker.o src/ticker.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/tsd.o src/tsd.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/witness.o src/witness.c
g++ -Wall -Wextra -g3 -fvisibility=hidden -O3 -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/jemalloc_cpp.o src/jemalloc_cpp.cpp
ar crus lib/libjemalloc.a src/jemalloc.o src/arena.o src/background_thread.o src/base.o src/bin.o src/bitmap.o src/ckh.o src/ctl.o src/div.o src/extent.o src/extent_dss.o src/extent_mmap.o src/hash.o src/hook.o src/large.o src/log.o src/malloc_io.o src/mutex.o src/mutex_pool.o src/nstime.o src/pages.o src/prng.o src/prof.o src/rtree.o src/safety_check.o src/stats.o src/sc.o src/sz.o src/tcache.o src/test_hooks.o src/ticker.o src/tsd.o src/witness.o src/jemalloc_cpp.o
ar: `u' modifier ignored since `D' is the default (see `U')
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/jemalloc'
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
CC adlist.o
CC quicklist.o
CC ae.o
CC anet.o
CC dict.o
CC server.o
CC sds.o
CC zmalloc.o
CC lzf_c.o
CC lzf_d.o
CC pqsort.o
CC zipmap.o
CC sha1.o
CC ziplist.o
CC release.o
CC networking.o
CC util.o
CC object.o
CC db.o
CC replication.o
CC rdb.o
CC t_string.o
CC t_list.o
CC t_set.o
CC t_zset.o
CC t_hash.o
CC config.o
CC aof.o
CC pubsub.o
CC multi.o
CC debug.o
CC sort.o
CC intset.o
CC syncio.o
CC cluster.o
CC crc16.o
CC endianconv.o
CC slowlog.o
CC eval.o
CC bio.o
CC rio.o
CC rand.o
CC memtest.o
CC syscheck.o
CC crcspeed.o
CC crc64.o
CC bitops.o
CC sentinel.o
CC notify.o
CC setproctitle.o
CC blocked.o
CC hyperloglog.o
CC latency.o
CC sparkline.o
CC redis-check-rdb.o
CC redis-check-aof.o
CC geo.o
CC lazyfree.o
CC module.o
CC evict.o
CC expire.o
CC geohash.o
CC geohash_helper.o
CC childinfo.o
CC defrag.o
CC siphash.o
CC rax.o
CC t_stream.o
CC listpack.o
CC localtime.o
CC lolwut.o
CC lolwut5.o
CC lolwut6.o
CC acl.o
CC tracking.o
CC connection.o
CC tls.o
CC sha256.o
CC timeout.o
CC setcpuaffinity.o
CC monotonic.o
CC mt19937-64.o
CC resp_parser.o
CC call_reply.o
CC script_lua.o
CC script.o
CC functions.o
CC function_lua.o
CC commands.o
LINK redis-server
INSTALL redis-sentinel
CC redis-cli.o
CC redisassert.o
CC cli_common.o
LINK redis-cli
CC redis-benchmark.o
LINK redis-benchmark
INSTALL redis-check-rdb
INSTALL redis-check-aof
Hint: It's a good idea to run 'make test' ;)
make[1]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/src'
If you see the message Hint: It's a good idea to run 'make test' ;) followed by make[1]: Leaving directory ..., the compilation should have completed successfully.
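Besides reading the log, the build can be checked by confirming that the binaries used in the later steps exist and are executable. The following is a minimal sketch: the helper name `missing_binaries` is hypothetical, and the directory `redis/src` is assumed from the paths used by the commands in Steps 2 and 3.

```python
import os


def missing_binaries(src_dir, names=("redis-server", "redis-cli")):
    """Return the entries of `names` that are not executable files in `src_dir`."""
    missing = []
    for name in names:
        path = os.path.join(src_dir, name)
        # A usable binary must be a regular file with the execute bit set.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


# An empty list means both binaries were found (path assumed, see above).
print(missing_binaries("redis/src"))
```

If the list is not empty, re-running the build step and inspecting the log for the first error is usually the quickest way forward.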
Step 2: Configure a mock Redis cluster
WARNING: The following commands will erase all contents of the following directories: redis-server-1, redis-server-2 and redis-server-3.
!mkdir -p redis-server-1 redis-server-2 redis-server-3
!rm -f redis-server-1/* redis-server-2/* redis-server-3/*
!ln -sf $PWD/redis/src/redis-server redis-server-1/redis-server
!ln -sf $PWD/redis/src/redis-server redis-server-2/redis-server
!ln -sf $PWD/redis/src/redis-server redis-server-3/redis-server
%%writefile redis-server-1/redis.conf
daemonize yes
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
appendonly no
save ""
Writing redis-server-1/redis.conf
%%writefile redis-server-2/redis.conf
daemonize yes
port 7001
cluster-enabled yes
cluster-config-file nodes.conf
appendonly no
save ""
Writing redis-server-2/redis.conf
%%writefile redis-server-3/redis.conf
daemonize yes
port 7002
cluster-enabled yes
cluster-config-file nodes.conf
appendonly no
save ""
Writing redis-server-3/redis.conf
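The three configuration files above differ only in the port number. As a sketch of how the same files could be produced from one template, the snippet below renders the configuration text for each node. The helper name `render_redis_conf` is hypothetical, and the snippet only builds strings; it does not write or overwrite any file.

```python
def render_redis_conf(port):
    """Build the text of a minimal cluster-mode redis.conf for one node."""
    lines = [
        "daemonize yes",                   # run in the background
        f"port {port}",                    # the only per-node difference
        "cluster-enabled yes",             # start in cluster mode
        "cluster-config-file nodes.conf",  # cluster state file kept by Redis
        "appendonly no",                   # no append-only file
        'save ""',                         # no RDB snapshots
    ]
    return "\n".join(lines) + "\n"


# Ports 7000 to 7002 correspond to redis-server-1 to redis-server-3.
confs = {
    f"redis-server-{i + 1}/redis.conf": render_redis_conf(7000 + i)
    for i in range(3)
}
for path, text in confs.items():
    print(path)
    print(text)
```

Keeping persistence disabled (`appendonly no`, `save ""`) matches the mock setup here, where the cluster is recreated from scratch on each run.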
Step 3: Form Redis cluster
WARNING: The following command will shut down any processes called redis-server on the current system!
# Shut down existing cluster (if any).
!pkill redis-server
# Reset configuration and start 3 Redis servers.
!cd redis-server-1 && rm -f nodes.conf && ./redis-server redis.conf
!cd redis-server-2 && rm -f nodes.conf && ./redis-server redis.conf
!cd redis-server-3 && rm -f nodes.conf && ./redis-server redis.conf
# Form the cluster.
!redis/src/redis-cli \
--cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
--cluster-yes
>>> Performing hash slots allocation on 3 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
M: fa9bb82124685a6438a696cc1562693ccc815ff0 127.0.0.1:7000
slots:[0-5460] (5461 slots) master
M: c6d7ad6353bf568d17a147e65b8198ded9d65717 127.0.0.1:7001
slots:[5461-10922] (5462 slots) master
M: e26ae6cfbeea8a1e6367444445364d963ae17436 127.0.0.1:7002
slots:[10923-16383] (5461 slots) master
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
.
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: fa9bb82124685a6438a696cc1562693ccc815ff0 127.0.0.1:7000
slots:[0-5460] (5461 slots) master
M: e26ae6cfbeea8a1e6367444445364d963ae17436 127.0.0.1:7002
slots:[10923-16383] (5461 slots) master
M: c6d7ad6353bf568d17a147e65b8198ded9d65717 127.0.0.1:7001
slots:[5461-10922] (5462 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
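The output above shows the 16384 hash slots split across the three masters. Redis Cluster assigns a key to a slot by taking the CRC16 (XMODEM variant) of the key modulo 16384. The sketch below reproduces that mapping for the slot ranges printed above. It ignores hash tags (the `{...}` syntax), and the example key is arbitrary: the key names that HPS actually stores in Redis are not shown in this notebook.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# Slot ranges copied from the cluster creation output above.
SLOT_RANGES = {
    "127.0.0.1:7000": range(0, 5461),
    "127.0.0.1:7001": range(5461, 10923),
    "127.0.0.1:7002": range(10923, 16384),
}


def key_slot(key: bytes) -> int:
    """Hash slot of a key, without hash-tag handling."""
    return crc16_xmodem(key) % 16384


def owner(key: bytes) -> str:
    """Address of the master that serves the slot of `key`."""
    slot = key_slot(key)
    for node, slots in SLOT_RANGES.items():
        if slot in slots:
            return node
    raise ValueError(f"slot {slot} is not covered")


print(key_slot(b"foo"), owner(b"foo"))
```

On a running cluster, the result of `key_slot` can be cross-checked with the Redis command `CLUSTER KEYSLOT`.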
Step 4: Run HugeCTR
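The cell in this step adds a per-slot offset to the categorical keys before passing them to the lookup. The likely reason is that each slot's keys start from 0 in the generated data, so shifting slot i by the total size of the preceding slots keeps keys from different slots distinct. The small sketch below shows the computation on two made-up samples; `raw_keys` is illustrative data, not taken from the dataset.

```python
import numpy as np

slot_size_array = [10000, 10000, 10000, 10000]

# Exclusive prefix sum: the offset of slot i is the total size of slots 0..i-1.
key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]

# Two made-up samples, one key per slot.
raw_keys = np.array([[3, 3, 3, 3],
                     [9999, 0, 42, 7]], dtype=np.int64)

# Broadcasting adds the offset of each slot (column) to every sample (row).
global_keys = raw_keys + key_offset

# Slots 0-1 feed the first embedding table and slots 2-3 the second,
# using the same (batch, 2, 1) layout as the cell in this step.
table1_keys = global_keys[:, 0:2].reshape((-1, 2, 1))
table2_keys = global_keys[:, 2:4].reshape((-1, 2, 1))

print(key_offset)
print(global_keys)
```

The identical raw key 3 in all four slots of the first sample ends up as four different keys after the shift.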
import os
import time
import multiprocessing as mp
import pandas as pd
import numpy as np
import onnxruntime as ort
from hugectr import DatabaseType_t
from hugectr.inference import HPS, ParameterServerConfig, InferenceParams, VolatileDatabaseParams
slot_size_array = [10000, 10000, 10000, 10000]
key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]
batch_size = 1024
print('Launching...')
# 1. Configure the HPS hyperparameters.
ps_config = ParameterServerConfig(
emb_table_name = {'hps_demo': ['sparse_embedding1', 'sparse_embedding2']},
embedding_vec_size = {'hps_demo': [16, 32]},
max_feature_num_per_sample_per_emb_table = {'hps_demo': [2, 2]},
inference_params_array = [
InferenceParams(
model_name = 'hps_demo',
max_batchsize = batch_size,
hit_rate_threshold = 1.0,
dense_model_file = '',
sparse_model_files = ['hps_demo0_sparse_1000.model', 'hps_demo1_sparse_1000.model'],
deployed_devices = [0],
use_gpu_embedding_cache = True,
cache_size_percentage = 0.5,
i64_input_key = True)
],
volatile_db = VolatileDatabaseParams(
DatabaseType_t.redis_cluster,
address = '127.0.0.1:7000',
num_partitions = 15,
num_node_connections = 5,
))
# 2. Initialize the HPS object.
hps = HPS(ps_config)
print('HPS initialized')
# 3. Load query data.
df = pd.read_parquet('data_parquet/val/gen_0.parquet')
dense_input_columns = df.columns[1:11]
cat_input1_columns = df.columns[11:13]
cat_input2_columns = df.columns[13:15]
dense_input = df[dense_input_columns].loc[0:batch_size-1].to_numpy(dtype=np.float32)
cat_input1 = (df[cat_input1_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[0:2]).reshape((batch_size, 2, 1))
cat_input2 = (df[cat_input2_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[2:4]).reshape((batch_size, 2, 1))
# 4. Make inference from the HPS object and the ONNX inference session of `hps_demo_without_embedding.onnx`.
embedding1 = hps.lookup(cat_input1.flatten(), 'hps_demo', 0).reshape(batch_size, 2, 16)
embedding2 = hps.lookup(cat_input2.flatten(), 'hps_demo', 1).reshape(batch_size, 2, 32)
sess = ort.InferenceSession('hps_demo_without_embedding.onnx')
res = sess.run(output_names=[sess.get_outputs()[0].name],
input_feed={sess.get_inputs()[0].name: dense_input,
sess.get_inputs()[1].name: embedding1,
sess.get_inputs()[2].name: embedding2})
pred = res[0].flatten()
# 5. Check the correctness by comparing with dumped evaluation results.
ground_truth = np.load("ground_truth.npy").flatten()
print('-------------------------------------------------------------------------------')
print(' HPS demo without embedding ')
print('-------------------------------------------------------------------------------')
print(f'Ground truth: {ground_truth.shape} = {ground_truth}')
print('-------------------------------------------------------------------------------')
print(f'Prediction without embedding: {pred.shape} = {pred}')
diff = pred - ground_truth
mse = np.mean(diff * diff)
print(f'MSE between prediction and ground_truth: {mse}')
# 6. Make inference with the ONNX inference session of `hps_demo_with_embedding.onnx` (double check).
sess_ref = ort.InferenceSession('hps_demo_with_embedding.onnx')
res_ref = sess_ref.run(output_names=[sess_ref.get_outputs()[0].name],
input_feed={sess_ref.get_inputs()[0].name: dense_input,
sess_ref.get_inputs()[1].name: cat_input1,
sess_ref.get_inputs()[2].name: cat_input2})
pred_ref = res_ref[0].flatten()
print('-------------------------------------------------------------------------------')
print(' HPS demo with embedding ')
print('-------------------------------------------------------------------------------')
print(f'Ground truth: {ground_truth.shape} = {ground_truth}')
print('-------------------------------------------------------------------------------')
print(f'Prediction with embedding: {pred_ref.shape} = {pred_ref}')
diff_ref = pred_ref - ground_truth
mse_ref = np.mean(diff_ref * diff_ref)
print(f'MSE between prediction and ground_truth: {mse_ref}')
Launching...
HPS initialized
[HCTR][06:54:27.572][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
====================================================HPS Create====================================================
[HCTR][06:54:27.572][INFO][RK0][main]: Creating RedisCluster backend...
[HCTR][06:54:27.577][INFO][RK0][main]: RedisCluster: Connecting via 127.0.0.1:7000...
[HCTR][06:54:27.577][INFO][RK0][main]: Volatile DB: initial cache rate = 1
[HCTR][06:54:27.577][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
[HCTR][06:54:27.577][DEBUG][RK0][main]: Created raw model loader in local memory!
[HCTR][06:54:27.753][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding1; cached 18488 / 18488 embeddings in volatile database (RedisCluster); load: 18488 / 18446744073709551615 (0.00%).
[HCTR][06:54:27.873][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding2; cached 18470 / 18470 embeddings in volatile database (RedisCluster); load: 18470 / 18446744073709551615 (0.00%).
[HCTR][06:54:30.134][DEBUG][RK0][main]: Real-time subscribers created!
[HCTR][06:54:30.134][INFO][RK0][main]: Creating embedding cache in device 0.
[HCTR][06:54:30.140][INFO][RK0][main]: Model name: hps_demo
[HCTR][06:54:30.140][INFO][RK0][main]: Max batch size: 1024
[HCTR][06:54:30.140][INFO][RK0][main]: Fuse embedding tables: False
[HCTR][06:54:30.140][INFO][RK0][main]: Number of embedding tables: 2
[HCTR][06:54:30.140][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
[HCTR][06:54:30.140][INFO][RK0][main]: Embedding cache type: dynamic
[HCTR][06:54:30.140][INFO][RK0][main]: Use I64 input key: True
[HCTR][06:54:30.140][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
[HCTR][06:54:30.140][INFO][RK0][main]: The size of thread pool: 80
[HCTR][06:54:30.140][INFO][RK0][main]: The size of worker memory pool: 2
[HCTR][06:54:30.140][INFO][RK0][main]: The size of refresh memory pool: 1
[HCTR][06:54:30.140][INFO][RK0][main]: The refresh percentage : 0.000000
[HCTR][06:54:30.156][INFO][RK0][main]: LookupSession i64_input_key: True
[HCTR][06:54:30.156][INFO][RK0][main]: Creating lookup session for hps_demo on device: 0
-------------------------------------------------------------------------------
HPS demo without embedding
-------------------------------------------------------------------------------
Ground truth: (1024,) = [0.4895492 0.509022 0.38192913 ... 0.5264926 0.50650454 0.47927693]
-------------------------------------------------------------------------------
Prediction without embedding: (1024,) = [0.48954916 0.50902206 0.38192907 ... 0.52649266 0.5065045 0.4792769 ]
MSE between prediction and ground_truth: 2.3887142264200634e-15
-------------------------------------------------------------------------------
HPS demo with embedding
-------------------------------------------------------------------------------
Ground truth: (1024,) = [0.4895492 0.509022 0.38192913 ... 0.5264926 0.50650454 0.47927693]
-------------------------------------------------------------------------------
Prediction with embedding: (1024,) = [0.48954916 0.50902206 0.38192907 ... 0.52649266 0.5065045 0.4792769 ]
MSE between prediction and ground_truth: 2.3887142264200634e-15
2023-09-20 06:54:30.230052244 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
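The key preparation and reshaping done in the inference cell above can be sketched in isolation. This is a minimal illustration, not the notebook's actual code path: the `slot_size_array` values are taken from the data generation step, `batch_size = 3` is chosen only for readability, and `fake_lookup` is a hypothetical stand-in that mimics the flat output shape implied by the `reshape` calls after `hps.lookup`. It does not call HPS.

```python
import numpy as np

# Assumed from the data generation step: four slots with 10000 keys each.
slot_size_array = [10000, 10000, 10000, 10000]
# Each slot's offset is the total number of keys in all preceding slots.
key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]

batch_size = 3  # illustrative only; the notebook uses a larger batch
rng = np.random.default_rng(0)
# Hypothetical per-slot keys in [0, 10000) for the first two slots.
local_keys = rng.integers(0, 10000, size=(batch_size, 2), dtype=np.int64)

# Broadcasting adds each slot's offset to its own column, so keys from
# different slots no longer collide in the shared key space.
global_keys = (local_keys + key_offset[0:2]).reshape((batch_size, 2, 1))

def fake_lookup(keys: np.ndarray, emb_dim: int) -> np.ndarray:
    """Hypothetical stand-in for a lookup: one flat float32 vector per key."""
    return np.zeros(keys.size * emb_dim, dtype=np.float32)

# The flat result is reshaped to (batch, slots, embedding dimension),
# which is the layout fed to the dense ONNX model.
embedding = fake_lookup(global_keys.flatten(), 16).reshape(batch_size, 2, 16)
print(key_offset, global_keys.shape, embedding.shape)
```

Offsetting keys before the lookup matters because each embedding table here covers two slots; without it, key 5 in the first slot and key 5 in the second would map to the same embedding vector.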
Step 5: Shut down the Redis cluster
!pkill redis-server
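To confirm that the Redis nodes are really gone before starting the next deployment, one option is to probe their ports. The sketch below uses only the Python standard library. The log above shows a node at 127.0.0.1:7000; the range 7000 to 7002 used here is an assumption about the cluster layout, so adjust it to match the ports you actually started.

```python
import socket
from typing import Iterable, List

def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False

def open_ports(host: str, ports: Iterable[int]) -> List[int]:
    """Return the subset of ports on host that still accept connections."""
    return [p for p in ports if port_is_open(host, p)]

if __name__ == "__main__":
    # Assumed node ports; an empty list means no node is listening any more.
    print(open_ports("127.0.0.1", range(7000, 7003)))
```

An empty list suggests that the shutdown completed and the ports are free to be reused by the TLS-enabled cluster.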
5. Redis Cluster deployment (with TLS/SSL)
When using Redis as backing storage, HugeCTR can make use of TLS/SSL to encrypt data transfers. In the following steps, we set up a small Redis cluster and enable SSL for it.
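As background on what an encrypted client connection involves, the following sketch builds a client-side TLS context with Python's standard `ssl` module. It is illustrative only and is not how HPS configures TLS internally; the function name and its `ca_file`, `cert_file`, and `key_file` parameters are hypothetical and stand for whatever certificate files you generate for your cluster.

```python
import ssl
from typing import Optional

def make_client_tls_context(ca_file: Optional[str] = None,
                            cert_file: Optional[str] = None,
                            key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS context that verifies the server certificate.

    ca_file:   CA certificate used to verify the server (hypothetical path);
               when None, the system's default CA store is used.
    cert_file: optional client certificate, for servers that demand one.
    key_file:  private key belonging to cert_file.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file is not None:
        # Present a client certificate (mutual TLS).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx

# Without arguments the context falls back to the system CA store.
ctx = make_client_tls_context()
print(ctx.verify_mode, ctx.check_hostname)
```

The three inputs shown here (a CA certificate, a client certificate, and its private key) are the usual ingredients of a mutually authenticated TLS connection.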
Step 1: Build a TLS/SSL-capable distribution of Redis
!rm -f 7.0.8.tar.gz && wget https://github.com/redis/redis/archive/7.0.8.tar.gz
!rm -rf redis-7.0.8 && tar -xf 7.0.8.tar.gz && ln -sf redis-7.0.8 redis
!cd redis && make BUILD_TLS=yes
--2023-09-20 06:55:14-- https://github.com/redis/redis/archive/7.0.8.tar.gz
Resolving github.com (github.com)... 192.30.255.112
Connecting to github.com (github.com)|192.30.255.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/redis/redis/tar.gz/refs/tags/7.0.8 [following]
--2023-09-20 06:55:14-- https://codeload.github.com/redis/redis/tar.gz/refs/tags/7.0.8
Resolving codeload.github.com (codeload.github.com)... 192.30.255.121
Connecting to codeload.github.com (codeload.github.com)|192.30.255.121|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘7.0.8.tar.gz’
7.0.8.tar.gz [ <=> ] 2.87M 3.24MB/s in 0.9s
2023-09-20 06:55:15 (3.24 MB/s) - ‘7.0.8.tar.gz’ saved [3011655]
cd src && make all
make[1]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/src'
./mkreleasehdr.sh: line 2: echo: write error: Broken pipe
CC Makefile.dep
./mkreleasehdr.sh: line 2: echo: write error: Broken pipe
rm -rf redis-server redis-sentinel redis-cli redis-benchmark redis-check-rdb redis-check-aof *.o *.gcda *.gcno *.gcov redis.info lcov-html Makefile.dep
rm -f adlist.d quicklist.d ae.d anet.d dict.d server.d sds.d zmalloc.d lzf_c.d lzf_d.d pqsort.d zipmap.d sha1.d ziplist.d release.d networking.d util.d object.d db.d replication.d rdb.d t_string.d t_list.d t_set.d t_zset.d t_hash.d config.d aof.d pubsub.d multi.d debug.d sort.d intset.d syncio.d cluster.d crc16.d endianconv.d slowlog.d eval.d bio.d rio.d rand.d memtest.d syscheck.d crcspeed.d crc64.d bitops.d sentinel.d notify.d setproctitle.d blocked.d hyperloglog.d latency.d sparkline.d redis-check-rdb.d redis-check-aof.d geo.d lazyfree.d module.d evict.d expire.d geohash.d geohash_helper.d childinfo.d defrag.d siphash.d rax.d t_stream.d listpack.d localtime.d lolwut.d lolwut5.d lolwut6.d acl.d tracking.d connection.d tls.d sha256.d timeout.d setcpuaffinity.d monotonic.d mt19937-64.d resp_parser.d call_reply.d script_lua.d script.d functions.d function_lua.d commands.d anet.d adlist.d dict.d redis-cli.d zmalloc.d release.d ae.d redisassert.d crcspeed.d crc64.d siphash.d crc16.d monotonic.d cli_common.d mt19937-64.d ae.d anet.d redis-benchmark.d adlist.d dict.d zmalloc.d redisassert.d release.d crcspeed.d crc64.d siphash.d crc16.d monotonic.d cli_common.d mt19937-64.d
(cd ../deps && make distclean)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
(cd hiredis && make clean) > /dev/null || true
(cd linenoise && make clean) > /dev/null || true
(cd lua && make clean) > /dev/null || true
(cd jemalloc && [ -f Makefile ] && make distclean) > /dev/null || true
(cd hdr_histogram && make clean) > /dev/null || true
(rm -f .make-*)
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
(cd modules && make clean)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/src/modules'
rm -rf *.xo *.so
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/src/modules'
(cd ../tests/modules && make clean)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/tests/modules'
rm -f commandfilter.so basics.so testrdb.so fork.so infotest.so propagate.so misc.so hooks.so blockonkeys.so blockonbackground.so scan.so datatype.so datatype2.so auth.so keyspace_events.so blockedclient.so getkeys.so getchannels.so test_lazyfree.so timer.so defragtest.so keyspecs.so hash.so zset.so stream.so mallocsize.so aclcheck.so list.so subcommands.so reply.so cmdintrospection.so eventloop.so moduleconfigs.so moduleconfigstwo.so publish.so usercall.so commandfilter.xo basics.xo testrdb.xo fork.xo infotest.xo propagate.xo misc.xo hooks.xo blockonkeys.xo blockonbackground.xo scan.xo datatype.xo datatype2.xo auth.xo keyspace_events.xo blockedclient.xo getkeys.xo getchannels.xo test_lazyfree.xo timer.xo defragtest.xo keyspecs.xo hash.xo zset.xo stream.xo mallocsize.xo aclcheck.xo list.xo subcommands.xo reply.xo cmdintrospection.xo eventloop.xo moduleconfigs.xo moduleconfigstwo.xo publish.xo usercall.xo
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/tests/modules'
(rm -f .make-*)
echo STD=-pedantic -DREDIS_STATIC='' -std=c11 >> .make-settings
echo WARN=-Wall -W -Wno-missing-field-initializers >> .make-settings
echo OPT=-O2 >> .make-settings
echo MALLOC=jemalloc >> .make-settings
echo BUILD_TLS=yes >> .make-settings
echo USE_SYSTEMD= >> .make-settings
echo CFLAGS= >> .make-settings
echo LDFLAGS= >> .make-settings
echo REDIS_CFLAGS= >> .make-settings
echo REDIS_LDFLAGS= >> .make-settings
echo PREV_FINAL_CFLAGS=-pedantic -DREDIS_STATIC='' -std=c11 -Wall -W -Wno-missing-field-initializers -O2 -g -ggdb -I../deps/hiredis -I../deps/linenoise -I../deps/lua/src -I../deps/hdr_histogram -DUSE_JEMALLOC -I../deps/jemalloc/include -DUSE_OPENSSL >> .make-settings
echo PREV_FINAL_LDFLAGS= -g -ggdb -rdynamic >> .make-settings
(cd ../deps && make hiredis linenoise lua hdr_histogram jemalloc)
make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
(cd hiredis && make clean) > /dev/null || true
(cd linenoise && make clean) > /dev/null || true
(cd lua && make clean) > /dev/null || true
(cd jemalloc && [ -f Makefile ] && make distclean) > /dev/null || true
(cd hdr_histogram && make clean) > /dev/null || true
(rm -f .make-*)
(echo "" > .make-cflags)
(echo "" > .make-ldflags)
MAKE hiredis
cd hiredis && make static USE_SSL=1
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hiredis'
cc -std=c99 -c -O3 -fPIC -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic alloc.c
cc -std=c99 -c -O3 -fPIC -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic net.c
cc -std=c99 -c -O3 -fPIC -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic hiredis.c
cc -std=c99 -c -O3 -fPIC -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic sds.c
cc -std=c99 -c -O3 -fPIC -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic async.c
cc -std=c99 -c -O3 -fPIC -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic read.c
cc -std=c99 -c -O3 -fPIC -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic sockcompat.c
ar rcs libhiredis.a alloc.o net.o hiredis.o sds.o async.o read.o sockcompat.o
cc -std=c99 -c -O3 -fPIC -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic ssl.c
ar rcs libhiredis_ssl.a ssl.o
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hiredis'
MAKE linenoise
cd linenoise && make
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/linenoise'
cc -Wall -Os -g -c linenoise.c
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/linenoise'
MAKE lua
cd lua/src && make all CFLAGS="-Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 " MYLDFLAGS="" AR="ar rc"
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/lua/src'
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lapi.o lapi.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lcode.o lcode.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o ldebug.o ldebug.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o ldo.o ldo.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o ldump.o ldump.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lfunc.o lfunc.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lgc.o lgc.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o llex.o llex.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lmem.o lmem.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lobject.o lobject.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lopcodes.o lopcodes.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lparser.o lparser.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lstate.o lstate.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lstring.o lstring.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o ltable.o ltable.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o ltm.o ltm.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lundump.o lundump.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lvm.o lvm.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lzio.o lzio.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o strbuf.o strbuf.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o fpconv.o fpconv.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lauxlib.o lauxlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lbaselib.o lbaselib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o ldblib.o ldblib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o liolib.o liolib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lmathlib.o lmathlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o loslib.o loslib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o ltablib.o ltablib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lstrlib.o lstrlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o loadlib.o loadlib.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o linit.o linit.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lua_cjson.o lua_cjson.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lua_struct.o lua_struct.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lua_cmsgpack.o lua_cmsgpack.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lua_bit.o lua_bit.c
ar rc liblua.a lapi.o lcode.o ldebug.o ldo.o ldump.o lfunc.o lgc.o llex.o lmem.o lobject.o lopcodes.o lparser.o lstate.o lstring.o ltable.o ltm.o lundump.o lvm.o lzio.o strbuf.o fpconv.o lauxlib.o lbaselib.o ldblib.o liolib.o lmathlib.o loslib.o ltablib.o lstrlib.o loadlib.o linit.o lua_cjson.o lua_struct.o lua_cmsgpack.o lua_bit.o # DLL needs all object files
ranlib liblua.a
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o lua.o lua.c
cc -o lua lua.o liblua.a -lm
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o luac.o luac.c
cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP -O2 -c -o print.o print.c
cc -o luac luac.o print.o liblua.a -lm
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/lua/src'
MAKE hdr_histogram
cd hdr_histogram && make
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hdr_histogram'
cc -std=c99 -Wall -Os -g -DHDR_MALLOC_INCLUDE=\"hdr_redis_malloc.h\" -c hdr_histogram.c
ar rcs libhdrhistogram.a hdr_histogram.o
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hdr_histogram'
MAKE jemalloc
cd jemalloc && ./configure --with-version=5.2.1-0-g0 --with-lg-quantum=3 --with-jemalloc-prefix=je_ CFLAGS="-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops " LDFLAGS=""
checking for xsltproc... false
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether compiler is cray... no
checking whether compiler supports -std=gnu11... yes
checking whether compiler supports -Wall... yes
checking whether compiler supports -Wextra... yes
checking whether compiler supports -Wshorten-64-to-32... no
checking whether compiler supports -Wsign-compare... yes
checking whether compiler supports -Wundef... yes
checking whether compiler supports -Wno-format-zero-length... yes
checking whether compiler supports -pipe... yes
checking whether compiler supports -g3... yes
checking how to run the C preprocessor... gcc -E
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking whether g++ supports C++14 features by default... yes
checking whether compiler supports -Wall... yes
checking whether compiler supports -Wextra... yes
checking whether compiler supports -g3... yes
checking whether libstdc++ linkage is compilable... yes
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking whether byte ordering is bigendian... no
checking size of void *... 8
checking size of int... 4
checking size of long... 8
checking size of long long... 8
checking size of intmax_t... 8
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking whether pause instruction is compilable... yes
checking number of significant virtual address bits... 48
checking for ar... ar
checking for nm... nm
checking for gawk... no
checking for mawk... mawk
checking malloc.h usability... yes
checking malloc.h presence... yes
checking for malloc.h... yes
checking whether malloc_usable_size definition can use const argument... no
checking for library containing log... -lm
checking whether __attribute__ syntax is compilable... yes
checking whether compiler supports -fvisibility=hidden... yes
checking whether compiler supports -fvisibility=hidden... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether tls_model attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether alloc_size attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether format(gnu_printf, ...) attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether format(printf, ...) attribute is compilable... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether format(printf, ...) attribute is compilable... yes
checking for a BSD-compatible install... /usr/bin/install -c
checking for ranlib... ranlib
checking for ld... /usr/bin/ld
checking for autoconf... /usr/bin/autoconf
checking for memalign... yes
checking for valloc... yes
checking whether compiler supports -O3... yes
checking whether compiler supports -O3... yes
checking whether compiler supports -funroll-loops... yes
checking configured backtracing method... N/A
checking for sbrk... yes
checking whether utrace(2) is compilable... no
checking whether a program using __builtin_unreachable is compilable... yes
checking whether a program using __builtin_ffsl is compilable... yes
checking whether a program using __builtin_popcountl is compilable... yes
checking LG_PAGE... 12
checking pthread.h usability... yes
checking pthread.h presence... yes
checking for pthread.h... yes
checking for pthread_create in -lpthread... yes
checking dlfcn.h usability... yes
checking dlfcn.h presence... yes
checking for dlfcn.h... yes
checking for dlsym... yes
checking whether pthread_atfork(3) is compilable... yes
checking whether pthread_setname_np(3) is compilable... yes
checking for library containing clock_gettime... none required
checking whether clock_gettime(CLOCK_MONOTONIC_COARSE, ...) is compilable... yes
checking whether clock_gettime(CLOCK_MONOTONIC, ...) is compilable... yes
checking whether mach_absolute_time() is compilable... no
checking whether compiler supports -Werror... yes
checking whether syscall(2) is compilable... yes
checking for secure_getenv... yes
checking for sched_getcpu... yes
checking for sched_setaffinity... yes
checking for issetugid... no
checking for _malloc_thread_cleanup... no
checking for _pthread_mutex_init_calloc_cb... no
checking for TLS... yes
checking whether C11 atomics is compilable... no
checking whether GCC __atomic atomics is compilable... yes
checking whether GCC 8-bit __atomic atomics is compilable... yes
checking whether GCC __sync atomics is compilable... yes
checking whether GCC 8-bit __sync atomics is compilable... yes
checking whether Darwin OSAtomic*() is compilable... no
checking whether madvise(2) is compilable... yes
checking whether madvise(..., MADV_FREE) is compilable... yes
checking whether madvise(..., MADV_DONTNEED) is compilable... yes
checking whether madvise(..., MADV_DO[NT]DUMP) is compilable... yes
checking whether madvise(..., MADV_[NO]HUGEPAGE) is compilable... yes
checking for __builtin_clz... yes
checking whether Darwin os_unfair_lock_*() is compilable... no
checking whether glibc malloc hook is compilable... no
checking whether glibc memalign hook is compilable... no
checking whether pthreads adaptive mutexes is compilable... yes
checking whether compiler supports -D_GNU_SOURCE... yes
checking whether compiler supports -Werror... yes
checking whether compiler supports -herror_on_warning... yes
checking whether strerror_r returns char with gnu source is compilable... yes
checking for stdbool.h that conforms to C99... yes
checking for _Bool... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating jemalloc.pc
config.status: creating doc/html.xsl
config.status: creating doc/manpages.xsl
config.status: creating doc/jemalloc.xml
config.status: creating include/jemalloc/jemalloc_macros.h
config.status: creating include/jemalloc/jemalloc_protos.h
config.status: creating include/jemalloc/jemalloc_typedefs.h
config.status: creating include/jemalloc/internal/jemalloc_preamble.h
config.status: creating test/test.sh
config.status: creating test/include/test/jemalloc_test.h
config.status: creating config.stamp
config.status: creating bin/jemalloc-config
config.status: creating bin/jemalloc.sh
config.status: creating bin/jeprof
config.status: creating include/jemalloc/jemalloc_defs.h
config.status: creating include/jemalloc/internal/jemalloc_internal_defs.h
config.status: creating test/include/test/jemalloc_test_defs.h
config.status: executing include/jemalloc/internal/public_symbols.txt commands
config.status: executing include/jemalloc/internal/private_symbols.awk commands
config.status: executing include/jemalloc/internal/private_symbols_jet.awk commands
config.status: executing include/jemalloc/internal/public_namespace.h commands
config.status: executing include/jemalloc/internal/public_unnamespace.h commands
config.status: executing include/jemalloc/jemalloc_protos_jet.h commands
config.status: executing include/jemalloc/jemalloc_rename.h commands
config.status: executing include/jemalloc/jemalloc_mangle.h commands
config.status: executing include/jemalloc/jemalloc_mangle_jet.h commands
config.status: executing include/jemalloc/jemalloc.h commands
===============================================================================
jemalloc version : 5.2.1-0-g0
library revision : 2
CONFIG : --with-version=5.2.1-0-g0 --with-lg-quantum=3 --with-jemalloc-prefix=je_ 'CFLAGS=-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops ' LDFLAGS=
CC : gcc
CONFIGURE_CFLAGS : -std=gnu11 -Wall -Wextra -Wsign-compare -Wundef -Wno-format-zero-length -pipe -g3 -fvisibility=hidden -O3 -funroll-loops
SPECIFIED_CFLAGS : -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops
EXTRA_CFLAGS :
CPPFLAGS : -D_GNU_SOURCE -D_REENTRANT
CXX : g++
CONFIGURE_CXXFLAGS : -Wall -Wextra -g3 -fvisibility=hidden -O3
SPECIFIED_CXXFLAGS :
EXTRA_CXXFLAGS :
LDFLAGS :
EXTRA_LDFLAGS :
DSO_LDFLAGS : -shared -Wl,-soname,$(@F)
LIBS : -lm -lstdc++ -pthread
RPATH_EXTRA :
XSLTPROC : false
XSLROOT :
PREFIX : /usr/local
BINDIR : /usr/local/bin
DATADIR : /usr/local/share
INCLUDEDIR : /usr/local/include
LIBDIR : /usr/local/lib
MANDIR : /usr/local/share/man
srcroot :
abs_srcroot : /hugectr/notebooks/tmr/redis/deps/jemalloc/
objroot :
abs_objroot : /hugectr/notebooks/tmr/redis/deps/jemalloc/
JEMALLOC_PREFIX : je_
JEMALLOC_PRIVATE_NAMESPACE
: je_
install_suffix :
malloc_conf :
documentation : 1
shared libs : 1
static libs : 1
autogen : 0
debug : 0
stats : 1
experimetal_smallocx : 0
prof : 0
prof-libunwind : 0
prof-libgcc : 0
prof-gcc : 0
fill : 1
utrace : 0
xmalloc : 0
log : 0
lazy_lock : 0
cache-oblivious : 1
cxx : 1
===============================================================================
cd jemalloc && make CFLAGS="-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops " LDFLAGS="" lib/libjemalloc.a
make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/jemalloc'
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/jemalloc.sym.o src/jemalloc.c
nm -a src/jemalloc.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/jemalloc.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/arena.sym.o src/arena.c
nm -a src/arena.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/arena.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/background_thread.sym.o src/background_thread.c
nm -a src/background_thread.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/background_thread.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/base.sym.o src/base.c
nm -a src/base.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/base.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/bin.sym.o src/bin.c
nm -a src/bin.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/bin.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/bitmap.sym.o src/bitmap.c
nm -a src/bitmap.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/bitmap.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ckh.sym.o src/ckh.c
nm -a src/ckh.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ckh.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ctl.sym.o src/ctl.c
nm -a src/ctl.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ctl.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/div.sym.o src/div.c
nm -a src/div.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/div.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent.sym.o src/extent.c
nm -a src/extent.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent_dss.sym.o src/extent_dss.c
nm -a src/extent_dss.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent_dss.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent_mmap.sym.o src/extent_mmap.c
nm -a src/extent_mmap.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent_mmap.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/hash.sym.o src/hash.c
nm -a src/hash.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/hash.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/hook.sym.o src/hook.c
nm -a src/hook.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/hook.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/large.sym.o src/large.c
nm -a src/large.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/large.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/log.sym.o src/log.c
nm -a src/log.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/log.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/malloc_io.sym.o src/malloc_io.c
nm -a src/malloc_io.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/malloc_io.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/mutex.sym.o src/mutex.c
nm -a src/mutex.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/mutex.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/mutex_pool.sym.o src/mutex_pool.c
nm -a src/mutex_pool.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/mutex_pool.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/nstime.sym.o src/nstime.c
nm -a src/nstime.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/nstime.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/pages.sym.o src/pages.c
nm -a src/pages.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/pages.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/prng.sym.o src/prng.c
nm -a src/prng.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/prng.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/prof.sym.o src/prof.c
nm -a src/prof.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/prof.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/rtree.sym.o src/rtree.c
nm -a src/rtree.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/rtree.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/safety_check.sym.o src/safety_check.c
nm -a src/safety_check.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/safety_check.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/stats.sym.o src/stats.c
nm -a src/stats.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/stats.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/sc.sym.o src/sc.c
nm -a src/sc.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/sc.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/sz.sym.o src/sz.c
nm -a src/sz.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/sz.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/tcache.sym.o src/tcache.c
nm -a src/tcache.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/tcache.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/test_hooks.sym.o src/test_hooks.c
nm -a src/test_hooks.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/test_hooks.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ticker.sym.o src/ticker.c
nm -a src/ticker.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ticker.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/tsd.sym.o src/tsd.c
nm -a src/tsd.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/tsd.sym
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/witness.sym.o src/witness.c
nm -a src/witness.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/witness.sym
/bin/sh include/jemalloc/internal/private_namespace.sh src/jemalloc.sym src/arena.sym src/background_thread.sym src/base.sym src/bin.sym src/bitmap.sym src/ckh.sym src/ctl.sym src/div.sym src/extent.sym src/extent_dss.sym src/extent_mmap.sym src/hash.sym src/hook.sym src/large.sym src/log.sym src/malloc_io.sym src/mutex.sym src/mutex_pool.sym src/nstime.sym src/pages.sym src/prng.sym src/prof.sym src/rtree.sym src/safety_check.sym src/stats.sym src/sc.sym src/sz.sym src/tcache.sym src/test_hooks.sym src/ticker.sym src/tsd.sym src/witness.sym > include/jemalloc/internal/private_namespace.gen.h
cp include/jemalloc/internal/private_namespace.gen.h include/jemalloc/internal/private_namespace.gen.h
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/jemalloc.o src/jemalloc.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/arena.o src/arena.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/background_thread.o src/background_thread.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/base.o src/base.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/bin.o src/bin.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/bitmap.o src/bitmap.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ckh.o src/ckh.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ctl.o src/ctl.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/div.o src/div.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent.o src/extent.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent_dss.o src/extent_dss.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent_mmap.o src/extent_mmap.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/hash.o src/hash.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/hook.o src/hook.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/large.o src/large.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/log.o src/log.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/malloc_io.o src/malloc_io.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/mutex.o src/mutex.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/mutex_pool.o src/mutex_pool.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/nstime.o src/nstime.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/pages.o src/pages.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/prng.o src/prng.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/prof.o src/prof.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/rtree.o src/rtree.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/safety_check.o src/safety_check.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/stats.o src/stats.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/sc.o src/sc.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/sz.o src/sz.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/tcache.o src/tcache.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/test_hooks.o src/test_hooks.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ticker.o src/ticker.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/tsd.o src/tsd.c
gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/witness.o src/witness.c
g++ -Wall -Wextra -g3 -fvisibility=hidden -O3 -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/jemalloc_cpp.o src/jemalloc_cpp.cpp
ar crus lib/libjemalloc.a src/jemalloc.o src/arena.o src/background_thread.o src/base.o src/bin.o src/bitmap.o src/ckh.o src/ctl.o src/div.o src/extent.o src/extent_dss.o src/extent_mmap.o src/hash.o src/hook.o src/large.o src/log.o src/malloc_io.o src/mutex.o src/mutex_pool.o src/nstime.o src/pages.o src/prng.o src/prof.o src/rtree.o src/safety_check.o src/stats.o src/sc.o src/sz.o src/tcache.o src/test_hooks.o src/ticker.o src/tsd.o src/witness.o src/jemalloc_cpp.o
ar: `u' modifier ignored since `D' is the default (see `U')
make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/jemalloc'
make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
CC adlist.o
CC quicklist.o
CC ae.o
CC anet.o
CC dict.o
CC server.o
CC sds.o
CC zmalloc.o
CC lzf_c.o
CC lzf_d.o
CC pqsort.o
CC zipmap.o
CC sha1.o
CC ziplist.o
CC release.o
CC networking.o
CC util.o
CC object.o
CC db.o
CC replication.o
CC rdb.o
CC t_string.o
CC t_list.o
CC t_set.o
CC t_zset.o
CC t_hash.o
CC config.o
CC aof.o
CC pubsub.o
CC multi.o
CC debug.o
CC sort.o
CC intset.o
CC syncio.o
CC cluster.o
CC crc16.o
CC endianconv.o
CC slowlog.o
CC eval.o
CC bio.o
CC rio.o
CC rand.o
CC memtest.o
CC syscheck.o
CC crcspeed.o
CC crc64.o
CC bitops.o
CC sentinel.o
CC notify.o
CC setproctitle.o
CC blocked.o
CC hyperloglog.o
CC latency.o
CC sparkline.o
CC redis-check-rdb.o
CC redis-check-aof.o
CC geo.o
CC lazyfree.o
CC module.o
CC evict.o
CC expire.o
CC geohash.o
CC geohash_helper.o
CC childinfo.o
CC defrag.o
CC siphash.o
CC rax.o
CC t_stream.o
CC listpack.o
CC localtime.o
CC lolwut.o
CC lolwut5.o
CC lolwut6.o
CC acl.o
CC tracking.o
CC connection.o
CC tls.o
CC sha256.o
CC timeout.o
CC setcpuaffinity.o
CC monotonic.o
CC mt19937-64.o
CC resp_parser.o
CC call_reply.o
CC script_lua.o
CC script.o
CC functions.o
CC function_lua.o
CC commands.o
LINK redis-server
INSTALL redis-sentinel
CC redis-cli.o
CC redisassert.o
CC cli_common.o
LINK redis-cli
CC redis-benchmark.o
LINK redis-benchmark
INSTALL redis-check-rdb
INSTALL redis-check-aof
Hint: It's a good idea to run 'make test' ;)
make[1]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/src'
If you see the message "Hint: It's a good idea to run 'make test' ;)" followed by "make[1]: Leaving directory ...", the compilation should have completed successfully.
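Besides reading the log, we can check that the linked binaries actually exist. The following is a minimal sketch, not part of the original workflow: the helper name `missing_binaries` is hypothetical, the binary names are taken from the LINK lines of the build log, and the `redis/src` directory is assumed from the paths used by the symlink commands in Step 2.

```python
import os

# Binaries that the build log reports as linked (the "LINK ..." lines).
EXPECTED_BINARIES = ("redis-server", "redis-cli", "redis-benchmark")


def missing_binaries(src_dir, names=EXPECTED_BINARIES):
    """Return the names that are not executable files inside src_dir."""
    missing = []
    for name in names:
        path = os.path.join(src_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "redis/src" is an assumption based on the paths used in Step 2.
    print("Missing binaries:", missing_binaries("redis/src"))
```

An empty list suggests that the build produced all three executables.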
Step 2: Configure a mock Redis cluster
Set up TLS/SSL certificates. You can skip this step if encryption is not needed.
WARNING: The following commands will erase all contents of the following directories: test_certs, redis-server-1, redis-server-2 and redis-server-3.
!mkdir -p test_certs
!rm -f test_certs/*
with open("test_certs/openssl.conf", "w") as f:
    f.write("""[ redis_server ]
keyUsage = digitalSignature, keyEncipherment
[ hugectr_client ]
keyUsage = digitalSignature, keyEncipherment
nsCertType = client""")
# Create private keys for CA, Redis server and HugeCTR client.
!openssl genrsa -out test_certs/ca-private.pem 4096
!openssl genrsa -out test_certs/redis-private.pem 4096
!openssl genrsa -out test_certs/hugectr-private.pem 4096
# Optional: extract public keys for CA, Redis server and HugeCTR client (not needed for the steps below).
#!openssl rsa -pubout -in test_certs/ca-private.pem -out test_certs/ca-public.pem
#!openssl rsa -pubout -in test_certs/redis-private.pem -out test_certs/redis-public.pem
#!openssl rsa -pubout -in test_certs/hugectr-private.pem -out test_certs/hugectr-public.pem
# Form dummy CA.
!openssl req -new -nodes -sha256 -x509 -subj '/O=NVIDIA Merlin/CN=Certificate Authority' -days 365 \
-key test_certs/ca-private.pem \
-out test_certs/ca.crt
# Generate certificate for Redis server.
!openssl req -new -sha256 -subj "/O=NVIDIA Merlin/CN=Redis Server" \
-key test_certs/redis-private.pem | \
openssl x509 -req -sha256 \
-CA test_certs/ca.crt \
-CAkey test_certs/ca-private.pem \
-CAserial test_certs/redis.ser \
-CAcreateserial \
-days 365 \
-extfile test_certs/openssl.conf -extensions redis_server \
-out test_certs/redis.crt
# Generate certificate for HugeCTR client.
!openssl req -new -sha256 -subj "/O=NVIDIA Merlin/CN=HugeCTR Redis Client" \
-key test_certs/hugectr-private.pem | \
openssl x509 \
-req -sha256 \
-CA test_certs/ca.crt \
-CAkey test_certs/ca-private.pem \
-CAserial test_certs/hugectr.ser \
-CAcreateserial \
-days 365 \
-extfile test_certs/openssl.conf -extensions hugectr_client \
-out test_certs/hugectr.crt
Certificate request self-signature ok
subject=O = NVIDIA Merlin, CN = Redis Server
Certificate request self-signature ok
subject=O = NVIDIA Merlin, CN = HugeCTR Redis Client
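Before wiring the certificates into Redis, it can help to confirm that both of them chain to the dummy CA. The sketch below wraps the standard `openssl verify -CAfile` command. The helper names `verify_command` and `is_signed_by` are hypothetical, and the sketch assumes that the `openssl` binary used above is on the PATH.

```python
import subprocess


def verify_command(ca_cert, cert):
    """Build the `openssl verify` invocation that checks cert against ca_cert."""
    return ["openssl", "verify", "-CAfile", ca_cert, cert]


def is_signed_by(ca_cert, cert):
    """Return True if openssl reports that cert chains to ca_cert."""
    try:
        result = subprocess.run(verify_command(ca_cert, cert),
                                capture_output=True, text=True)
    except FileNotFoundError:
        # The openssl binary is not available on this system.
        return False
    return result.returncode == 0
```

For example, `is_signed_by("test_certs/ca.crt", "test_certs/redis.crt")` and the same call with `test_certs/hugectr.crt` should both return True if the commands above succeeded.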
!mkdir -p redis-server-1 redis-server-2 redis-server-3
!rm -f redis-server-1/* redis-server-2/* redis-server-3/*
!ln -sf $PWD/redis/src/redis-server redis-server-1/redis-server
!ln -sf $PWD/redis/src/redis-server redis-server-2/redis-server
!ln -sf $PWD/redis/src/redis-server redis-server-3/redis-server
!ln -sf $PWD/test_certs/ca.crt redis-server-1/ca.crt
!ln -sf $PWD/test_certs/ca.crt redis-server-2/ca.crt
!ln -sf $PWD/test_certs/ca.crt redis-server-3/ca.crt
!ln -sf $PWD/test_certs/redis-private.pem redis-server-1/private.pem
!ln -sf $PWD/test_certs/redis-private.pem redis-server-2/private.pem
!ln -sf $PWD/test_certs/redis-private.pem redis-server-3/private.pem
!ln -sf $PWD/test_certs/redis.crt redis-server-1/redis.crt
!ln -sf $PWD/test_certs/redis.crt redis-server-2/redis.crt
!ln -sf $PWD/test_certs/redis.crt redis-server-3/redis.crt
%%writefile redis-server-1/redis.conf
daemonize yes
port 0
cluster-enabled yes
cluster-config-file nodes.conf
tls-port 7000
tls-ca-cert-file ca.crt
tls-cert-file redis.crt
tls-key-file private.pem
tls-cluster yes
appendonly no
save ""
Writing redis-server-1/redis.conf
%%writefile redis-server-2/redis.conf
daemonize yes
port 0
cluster-enabled yes
cluster-config-file nodes.conf
tls-port 7001
tls-ca-cert-file ca.crt
tls-cert-file redis.crt
tls-key-file private.pem
tls-cluster yes
appendonly no
save ""
Writing redis-server-2/redis.conf
%%writefile redis-server-3/redis.conf
daemonize yes
port 0
cluster-enabled yes
cluster-config-file nodes.conf
tls-port 7002
tls-ca-cert-file ca.crt
tls-cert-file redis.crt
tls-key-file private.pem
tls-cluster yes
appendonly no
save ""
Writing redis-server-3/redis.conf
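The three configuration files above differ only in the tls-port value. As an alternative to three separate %%writefile cells, they could be rendered from one template. This is a sketch under that assumption; `render_conf` and `conf_files` are hypothetical helper names, and nothing is written to disk until you do so explicitly.

```python
# Template that mirrors the redis.conf cells above; only the TLS port varies.
CONF_TEMPLATE = """daemonize yes
port 0
cluster-enabled yes
cluster-config-file nodes.conf
tls-port {tls_port}
tls-ca-cert-file ca.crt
tls-cert-file redis.crt
tls-key-file private.pem
tls-cluster yes
appendonly no
save ""
"""


def render_conf(tls_port):
    """Return the redis.conf text for one server."""
    return CONF_TEMPLATE.format(tls_port=tls_port)


def conf_files(first_port=7000, count=3):
    """Map each config path (redis-server-N/redis.conf) to its contents."""
    return {
        f"redis-server-{i + 1}/redis.conf": render_conf(first_port + i)
        for i in range(count)
    }
```

Writing the files then reduces to a loop over `conf_files().items()` that opens each path in "w" mode and writes the text.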
Step 3: Form Redis cluster
WARNING: The following command will shut down any processes named redis-server on the current system!
# Shut down existing cluster (if any).
!pkill redis-server
# Reset configuration and start 3 Redis servers.
!cd redis-server-1 && rm -f nodes.conf && ./redis-server redis.conf
!cd redis-server-2 && rm -f nodes.conf && ./redis-server redis.conf
!cd redis-server-3 && rm -f nodes.conf && ./redis-server redis.conf
# Form the cluster.
!redis/src/redis-cli \
--cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
--cluster-yes \
--tls \
--cacert test_certs/ca.crt \
--cert test_certs/hugectr.crt \
--key test_certs/hugectr-private.pem
>>> Performing hash slots allocation on 3 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
M: a441806db5506b7600ee8ae794fa01dc31ac83c9 127.0.0.1:7000
slots:[0-5460] (5461 slots) master
M: 6fa93392a396aa3c321736234b7eafc86bb1f979 127.0.0.1:7001
slots:[5461-10922] (5462 slots) master
M: 8e9cd68cc229fcb568a84d7358011201b4246046 127.0.0.1:7002
slots:[10923-16383] (5461 slots) master
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
..
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: a441806db5506b7600ee8ae794fa01dc31ac83c9 127.0.0.1:7000
slots:[0-5460] (5461 slots) master
M: 8e9cd68cc229fcb568a84d7358011201b4246046 127.0.0.1:7002
slots:[10923-16383] (5461 slots) master
M: 6fa93392a396aa3c321736234b7eafc86bb1f979 127.0.0.1:7001
slots:[5461-10922] (5462 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
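The output above shows the 16384 hash slots being split across the three masters. Redis Cluster assigns a key to a slot by taking the CRC16 (XMODEM variant) of the key modulo 16384. The sketch below illustrates that mapping with the standard library. It ignores hash tags (`{...}`), and `slot_ranges` is only one simple way to reproduce the ranges printed above, not necessarily the exact code used by redis-cli. All function names are hypothetical.

```python
import binascii

NUM_SLOTS = 16384  # total number of hash slots, as reported by the cluster check


def key_slot(key: bytes) -> int:
    """CRC16 (XMODEM) of the key modulo the slot count; hash tags are ignored."""
    return binascii.crc_hqx(key, 0) % NUM_SLOTS


def slot_ranges(num_masters):
    """Split the slot space into contiguous (first, last) ranges, one per master."""
    per_node = NUM_SLOTS / num_masters
    ranges, first, cursor = [], 0, 0.0
    for i in range(num_masters):
        last = round(cursor + per_node - 1)
        if i == num_masters - 1 or last > NUM_SLOTS - 1:
            last = NUM_SLOTS - 1  # the final master takes the remainder
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


def owner(key: bytes, ranges):
    """Return the index of the master whose range contains the key's slot."""
    slot = key_slot(key)
    return next(i for i, (lo, hi) in enumerate(ranges) if lo <= slot <= hi)
```

With three masters, `slot_ranges(3)` yields the same ranges as the log above: 0 to 5460, 5461 to 10922, and 10923 to 16383.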
Step 4: Run HugeCTR
import os
import time
import multiprocessing as mp
import pandas as pd
import numpy as np
import onnxruntime as ort
from hugectr import DatabaseType_t
from hugectr.inference import HPS, ParameterServerConfig, InferenceParams, VolatileDatabaseParams
slot_size_array = [10000, 10000, 10000, 10000]
key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]
batch_size = 1024
print('Launching...')
# 1. Configure the HPS hyperparameters.
ps_config = ParameterServerConfig(
emb_table_name = {'hps_demo': ['sparse_embedding1', 'sparse_embedding2']},
embedding_vec_size = {'hps_demo': [16, 32]},
max_feature_num_per_sample_per_emb_table = {'hps_demo': [2, 2]},
inference_params_array = [
InferenceParams(
model_name = 'hps_demo',
max_batchsize = batch_size,
hit_rate_threshold = 1.0,
dense_model_file = '',
sparse_model_files = ['hps_demo0_sparse_1000.model', 'hps_demo1_sparse_1000.model'],
deployed_devices = [0],
use_gpu_embedding_cache = True,
cache_size_percentage = 0.5,
i64_input_key = True)
],
volatile_db = VolatileDatabaseParams(
DatabaseType_t.redis_cluster,
address = '127.0.0.1:7000',
num_partitions = 15,
num_node_connections = 5,
enable_tls = True,
tls_ca_certificate = 'test_certs/ca.crt',
tls_client_certificate = 'test_certs/hugectr.crt',
tls_client_key = 'test_certs/hugectr-private.pem',
tls_server_name_identification = 'redis.localhost',
))
# 2. Initialize the HPS object.
hps = HPS(ps_config)
print('HPS initialized')
# 3. Load query data.
df = pd.read_parquet('data_parquet/val/gen_0.parquet')
dense_input_columns = df.columns[1:11]
cat_input1_columns = df.columns[11:13]
cat_input2_columns = df.columns[13:15]
dense_input = df[dense_input_columns].loc[0:batch_size-1].to_numpy(dtype=np.float32)
cat_input1 = (df[cat_input1_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[0:2]).reshape((batch_size, 2, 1))
cat_input2 = (df[cat_input2_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[2:4]).reshape((batch_size, 2, 1))
# 4. Make inference from the HPS object and the ONNX inference session of `hps_demo_without_embedding.onnx`.
embedding1 = hps.lookup(cat_input1.flatten(), 'hps_demo', 0).reshape(batch_size, 2, 16)
embedding2 = hps.lookup(cat_input2.flatten(), 'hps_demo', 1).reshape(batch_size, 2, 32)
sess = ort.InferenceSession('hps_demo_without_embedding.onnx')
res = sess.run(output_names=[sess.get_outputs()[0].name],
input_feed={sess.get_inputs()[0].name: dense_input,
sess.get_inputs()[1].name: embedding1,
sess.get_inputs()[2].name: embedding2})
pred = res[0].flatten()
# 5. Check the correctness by comparing with dumped evaluation results.
ground_truth = np.load("ground_truth.npy").flatten()
print('-------------------------------------------------------------------------------')
print(' HPS demo without embedding ')
print('-------------------------------------------------------------------------------')
print(f'Ground truth: {ground_truth.shape} = {ground_truth}')
print('-------------------------------------------------------------------------------')
print(f'Prediction without embedding: {pred.shape} = {pred}')
diff = pred - ground_truth
mse = np.mean(diff * diff)
print(f'MSE between prediction and ground_truth: {mse}')
# 6. Make inference with the ONNX inference session of `hps_demo_with_embedding.onnx` (double check).
sess_ref = ort.InferenceSession('hps_demo_with_embedding.onnx')
res_ref = sess_ref.run(output_names=[sess_ref.get_outputs()[0].name],
input_feed={sess_ref.get_inputs()[0].name: dense_input,
sess_ref.get_inputs()[1].name: cat_input1,
sess_ref.get_inputs()[2].name: cat_input2})
pred_ref = res_ref[0].flatten()
print('-------------------------------------------------------------------------------')
print(' HPS demo with embedding ')
print('-------------------------------------------------------------------------------')
print(f'Ground truth: {ground_truth.shape} = {ground_truth}')
print('-------------------------------------------------------------------------------')
print(f'Prediction with embedding: {pred_ref.shape} = {pred_ref}')
diff_ref = pred_ref.flatten() - ground_truth
mse_ref = np.mean(diff_ref * diff_ref)
print(f'MSE between prediction and ground_truth: {mse_ref}')
Launching...
[HCTR][07:00:07.643][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
HPS initialized
====================================================HPS Create====================================================
[HCTR][07:00:07.643][INFO][RK0][main]: Creating RedisCluster backend...
[HCTR][07:00:07.644][INFO][RK0][main]: RedisCluster: Connecting via 127.0.0.1:7000...
[HCTR][07:00:07.667][INFO][RK0][main]: Volatile DB: initial cache rate = 1
[HCTR][07:00:07.667][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
[HCTR][07:00:07.667][DEBUG][RK0][main]: Created raw model loader in local memory!
[HCTR][07:00:07.894][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding1; cached 18488 / 18488 embeddings in volatile database (RedisCluster); load: 18488 / 18446744073709551615 (0.00%).
[HCTR][07:00:07.984][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding2; cached 18470 / 18470 embeddings in volatile database (RedisCluster); load: 18470 / 18446744073709551615 (0.00%).
[HCTR][07:00:07.984][DEBUG][RK0][main]: Real-time subscribers created!
[HCTR][07:00:07.984][INFO][RK0][main]: Creating embedding cache in device 0.
[HCTR][07:00:07.990][INFO][RK0][main]: Model name: hps_demo
[HCTR][07:00:07.990][INFO][RK0][main]: Max batch size: 1024
[HCTR][07:00:07.990][INFO][RK0][main]: Fuse embedding tables: False
[HCTR][07:00:07.990][INFO][RK0][main]: Number of embedding tables: 2
[HCTR][07:00:07.990][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
[HCTR][07:00:07.990][INFO][RK0][main]: Embedding cache type: dynamic
[HCTR][07:00:07.990][INFO][RK0][main]: Use I64 input key: True
[HCTR][07:00:07.990][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
[HCTR][07:00:07.990][INFO][RK0][main]: The size of thread pool: 80
[HCTR][07:00:07.990][INFO][RK0][main]: The size of worker memory pool: 2
[HCTR][07:00:07.990][INFO][RK0][main]: The size of refresh memory pool: 1
[HCTR][07:00:07.990][INFO][RK0][main]: The refresh percentage : 0.000000
[HCTR][07:00:07.995][INFO][RK0][main]: LookupSession i64_input_key: True
[HCTR][07:00:07.995][INFO][RK0][main]: Creating lookup session for hps_demo on device: 0
[HCTR][07:00:07.998][INFO][RK0][main]: RedisCluster: Awaiting background worker to conclude...
[HCTR][07:00:07.998][INFO][RK0][main]: RedisCluster: Disconnecting...
-------------------------------------------------------------------------------
HPS demo without embedding
-------------------------------------------------------------------------------
Ground truth: (1024,) = [0.4895492 0.509022 0.38192913 ... 0.5264926 0.50650454 0.47927693]
-------------------------------------------------------------------------------
Prediction without embedding: (1024,) = [0.48954916 0.50902206 0.38192907 ... 0.52649266 0.5065045 0.4792769 ]
MSE between prediction and ground_truth: 2.3887142264200634e-15
-------------------------------------------------------------------------------
HPS demo with embedding
-------------------------------------------------------------------------------
Ground truth: (1024,) = [0.4895492 0.509022 0.38192913 ... 0.5264926 0.50650454 0.47927693]
-------------------------------------------------------------------------------
Prediction with embedding: (1024,) = [0.48954916 0.50902206 0.38192907 ... 0.52649266 0.5065045 0.4792769 ]
MSE between prediction and ground_truth: 2.3887142264200634e-15
2023-09-20 07:00:08.022623188 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
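The key_offset array in the cell above is an exclusive prefix sum over slot_size_array: it gives the first global key of each slot, and adding it to the per-slot keys shifts every slot into its own key range. The following standard-library sketch shows the same arithmetic without NumPy; `key_offsets` and `to_global_keys` are hypothetical names used only for illustration.

```python
from itertools import accumulate


def key_offsets(slot_sizes):
    """Exclusive prefix sum: the first global key of each slot."""
    return [0] + list(accumulate(slot_sizes))[:-1]


def to_global_keys(local_keys, offsets):
    """Shift one sample's per-slot keys into the global key space."""
    return [key + offset for key, offset in zip(local_keys, offsets)]


if __name__ == "__main__":
    offsets = key_offsets([10000, 10000, 10000, 10000])
    print(offsets)                                 # [0, 10000, 20000, 30000]
    print(to_global_keys([5, 5, 5, 5], offsets))   # [5, 10005, 20005, 30005]
```

In the cell above, key_offset[0:2] is applied to the two slots read by the first embedding table and key_offset[2:4] to the two slots read by the second.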
Step 5: Shut down Redis cluster
!pkill redis-server
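To confirm that the servers are gone, we can check whether anything still accepts TCP connections on the three ports used in this notebook. This is a minimal sketch with hypothetical helper names. It tests only whether the ports are reachable and does not perform a TLS handshake.

```python
import socket


def port_is_open(host, port, timeout=0.5):
    """Return True if a TCP connection to (host, port) can be established."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def open_ports(host="127.0.0.1", ports=(7000, 7001, 7002)):
    """Return the subset of ports that still accept connections."""
    return [port for port in ports if port_is_open(host, port)]
```

After a successful shutdown, `open_ports()` should return an empty list. The same check can be used in Step 3 to wait until all three servers are listening before forming the cluster.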