Template Class HashTable

Inheritance Relationships

Base Type

  • public nv::merlin::HashTableBase< K, V, uint64_t >

Class Documentation

template<typename K, typename V, typename S = uint64_t, int Strategy = EvictStrategy::kLru, typename ArchTag = Sm80>
class nv::merlin::HashTable : public nv::merlin::HashTableBase<K, V, uint64_t>

A HierarchicalKV hash table is a concurrent and hierarchical hash table that is powered by GPUs and can use HBM and host memory as storage for key-value pairs. Support for SSD storage is a future consideration.

The score is introduced to define the importance of each key, the larger, the more important, the less likely they will be evicted. Eviction occurs automatically when a bucket is full. The keys with the minimum score value are evicted first. In a customized eviction strategy, we recommend using the timestamp or frequency of the key occurrence as the score value for each key. You can also assign a special value to the score to perform a customized eviction strategy.

Note

By default configuration, this class is thread-safe.

Template Parameters
  • K – The data type of the key.

  • V – The data type of the vector’s item type. The item data type should be a basic data type of C++/CUDA.

  • S – The data type for score. The currently supported data type is only uint64_t.

Public Types

using size_type = size_t
using key_type = K
using value_type = V
using score_type = S
using Pred = EraseIfPredict<key_type, score_type>
using allocator_type = BaseAllocator

Public Functions

inline HashTable()

Default constructor for the hash table class.

inline ~HashTable()

Frees the resources used by the hash table and destroys the hash table object.

inline void init(const HashTableOptions &options, allocator_type *allocator = nullptr)

Initialize a merlin::HashTable.

Parameters

options – The configuration options.

inline void insert_or_assign(const size_type n, const key_type *keys, const value_type *values, const score_type *scores = nullptr, cudaStream_t stream = 0, bool unique_key = true, bool ignore_evict_strategy = false)

Insert new key-value-score tuples into the hash table. If the key already exists, the values and scores are assigned new values.

If the target bucket is full, the keys with minimum score will be overwritten by new key unless the score of the new key is even less than minimum score of the target bucket.

Parameters
  • n – Number of key-value-score tuples to insert or assign.

  • keys – The keys to insert on GPU-accessible memory with shape (n).

  • values – The values to insert on GPU-accessible memory with shape (n, DIM).

  • scores – The scores to insert on GPU-accessible memory with shape (n).

    The scores should be a uint64_t value. You can specify a value that such as the timestamp of the key insertion, number of the key occurrences, or another value to perform a custom eviction strategy.

    The scores should be nullptr, when the LRU eviction strategy is applied.

  • stream – The CUDA stream that is used to execute the operation.

  • unique_key – If all keys in the same batch are unique.

  • ignore_evict_strategy – A boolean option indicating whether if the insert_or_assign ignores the evict strategy of table with current scores anyway. If true, it does not check whether the scores conforms to the evict strategy. If false, it requires the scores follow the evict strategy of table.

inline void insert_and_evict(const size_type n, const key_type *keys, const value_type *values, const score_type *scores, key_type *evicted_keys, value_type *evicted_values, score_type *evicted_scores, size_type *d_evicted_counter, cudaStream_t stream = 0, bool unique_key = true, bool ignore_evict_strategy = false)

Insert new key-value-score tuples into the hash table. If the key already exists, the values and scores are assigned new values.

If the target bucket is full, the keys with minimum score will be overwritten by new key unless the score of the new key is even less than minimum score of the target bucket. The overwritten key with minimum score will be evicted, with its values and score, to evicted_keys, evicted_values, evcted_scores seperately in compact format.

Parameters
  • n – Number of key-value-score tuples to insert or assign.

  • keys – The keys to insert on GPU-accessible memory with shape (n).

  • values – The values to insert on GPU-accessible memory with shape (n, DIM).

  • scores – The scores to insert on GPU-accessible memory with shape (n). @params evicted_keys The output of keys replaced with minimum score. @params evicted_values The output of values replaced with minimum score on keys. @params evicted_scores The output of scores replaced with minimum score on keys.

    The scores should be a uint64_t value. You can specify a value that such as the timestamp of the key insertion, number of the key occurrences, or another value to perform a custom eviction strategy.

    The scores should be nullptr, when the LRU eviction strategy is applied.

  • d_evicted_counter – The number of elements evicted on GPU-accessible memory. @notice The caller should guarantee it is set to 0 before calling.

  • stream – The CUDA stream that is used to execute the operation.

  • unique_key – If all keys in the same batch are unique.

  • ignore_evict_strategy – A boolean option indicating whether if the insert_or_assign ignores the evict strategy of table with current scores anyway. If true, it does not check whether the scores confroms to the evict strategy. If false, it requires the scores follow the evict strategy of table.

inline size_type insert_and_evict(const size_type n, const key_type *keys, const value_type *values, const score_type *scores, key_type *evicted_keys, value_type *evicted_values, score_type *evicted_scores, cudaStream_t stream = 0, bool unique_key = true, bool ignore_evict_strategy = false)

Insert new key-value-score tuples into the hash table. If the key already exists, the values and scores are assigned new values.

If the target bucket is full, the keys with minimum score will be overwritten by new key unless the score of the new key is even less than minimum score of the target bucket. The overwritten key with minimum score will be evicted, with its values and score, to evicted_keys, evicted_values, evcted_scores seperately in compact format.

Parameters
  • n – Number of key-value-score tuples to insert or assign.

  • keys – The keys to insert on GPU-accessible memory with shape (n).

  • values – The values to insert on GPU-accessible memory with shape (n, DIM).

  • scores – The scores to insert on GPU-accessible memory with shape (n). @params evicted_keys The output of keys replaced with minimum score. @params evicted_values The output of values replaced with minimum score on keys. @params evicted_scores The output of scores replaced with minimum score on keys.

    The scores should be a uint64_t value. You can specify a value that such as the timestamp of the key insertion, number of the key occurrences, or another value to perform a custom eviction strategy.

    The scores should be nullptr, when the LRU eviction strategy is applied.

  • stream – The CUDA stream that is used to execute the operation.

  • unique_key – If all keys in the same batch are unique.

  • ignore_evict_strategy – A boolean option indicating whether if the insert_or_assign ignores the evict strategy of table with current scores anyway. If true, it does not check whether the scores confroms to the evict strategy. If false, it requires the scores follow the evict strategy of table.

Returns

The number of elements evicted.

inline void accum_or_assign(const size_type n, const key_type *keys, const value_type *value_or_deltas, const bool *accum_or_assigns, const score_type *scores = nullptr, cudaStream_t stream = 0, bool ignore_evict_strategy = false)

Searches for each key in keys in the hash table. If the key is found and the corresponding value in accum_or_assigns is true, the vectors_or_deltas is treated as a delta to the old value, and the delta is added to the old value of the key.

If the key is not found and the corresponding value in accum_or_assigns is false, the vectors_or_deltas is treated as a new value and the key-value pair is updated in the table directly.

Note

When the key is found and the value of accum_or_assigns is false, or when the key is not found and the value of accum_or_assigns is true, nothing is changed and this operation is ignored. The algorithm assumes these situations occur while the key was modified or removed by other processes just now.

Parameters
  • n – The number of key-value-score tuples to process.

  • keys – The keys to insert on GPU-accessible memory with shape (n).

  • value_or_deltas – The values or deltas to insert on GPU-accessible memory with shape (n, DIM).

  • accum_or_assigns – The operation type with shape (n). A value of true indicates to accum and false indicates to assign.

  • scores – The scores to insert on GPU-accessible memory with shape (n).

    The scores should be a uint64_t value. You can specify a value that such as the timestamp of the key insertion, number of the key occurrences, or another value to perform a custom eviction strategy.

    The scores should be nullptr, when the LRU eviction strategy is applied.

  • stream – The CUDA stream that is used to execute the operation.

  • ignore_evict_strategy – A boolean option indicating whether if the accum_or_assign ignores the evict strategy of table with current scores anyway. If true, it does not check whether the scores confroms to the evict strategy. If false, it requires the scores follow the evict strategy of table.

inline void find_or_insert(const size_type n, const key_type *keys, value_type *values, score_type *scores = nullptr, cudaStream_t stream = 0, bool unique_key = true, bool ignore_evict_strategy = false)

Searches the hash table for the specified keys. When a key is missing, the value in values and scores will be inserted.

Parameters
  • n – The number of key-value-score tuples to search or insert.

  • keys – The keys to search on GPU-accessible memory with shape (n).

  • values – The values to search on GPU-accessible memory with shape (n, DIM).

  • scores – The scores to search on GPU-accessible memory with shape (n).

    If scores is nullptr, the score for each key will not be returned.

  • stream – The CUDA stream that is used to execute the operation.

  • unique_key – If all keys in the same batch are unique.

inline void find_or_insert(const size_type n, const key_type *keys, value_type **values, bool *founds, score_type *scores = nullptr, cudaStream_t stream = 0, bool unique_key = true, bool ignore_evict_strategy = false)

Searches the hash table for the specified keys and returns address of the values. When a key is missing, the value in values and scores will be inserted.

Warning

This API returns internal addresses for high-performance but thread-unsafe. The caller is responsible for guaranteeing data consistency.

Parameters
  • n – The number of key-value-score tuples to search or insert.

  • keys – The keys to search on GPU-accessible memory with shape (n).

  • values – The addresses of values to search on GPU-accessible memory with shape (n).

  • founds – The status that indicates if the keys are found on

  • scores – The scores to search on GPU-accessible memory with shape (n).

    If scores is nullptr, the score for each key will not be returned.

  • stream – The CUDA stream that is used to execute the operation.

  • unique_key – If all keys in the same batch are unique.

inline void assign(const size_type n, const key_type *keys, const value_type *values, const score_type *scores = nullptr, cudaStream_t stream = 0, bool unique_key = true)

Assign new key-value-score tuples into the hash table. If the key doesn’t exist, the operation on the key will be ignored.

Parameters
  • n – Number of key-value-score tuples to insert or assign.

  • keys – The keys to insert on GPU-accessible memory with shape (n).

  • values – The values to insert on GPU-accessible memory with shape (n, DIM).

  • scores – The scores to insert on GPU-accessible memory with shape (n).

    The scores should be a uint64_t value. You can specify a value that such as the timestamp of the key insertion, number of the key occurrences, or another value to perform a custom eviction strategy.

    The scores should be nullptr, when the LRU eviction strategy is applied.

  • stream – The CUDA stream that is used to execute the operation.

  • unique_key – If all keys in the same batch are unique.

inline void assign_scores(const size_type n, const key_type *keys, const score_type *scores = nullptr, cudaStream_t stream = 0, bool unique_key = true)

Assign new scores for keys. If the key doesn’t exist, the operation on the key will be ignored.

Parameters
  • n – Number of key-score pairs to assign.

  • keys – The keys to insert on GPU-accessible memory with shape (n).

    The scores should be a uint64_t value. You can specify a value that such as the timestamp of the key insertion, number of the key occurrences, or another value to perform a custom eviction strategy.

    The scores should be nullptr, when the LRU eviction strategy is applied.

  • stream – The CUDA stream that is used to execute the operation.

  • unique_key – If all keys in the same batch are unique.

inline void assign(const size_type n, const key_type *keys, const score_type *scores = nullptr, cudaStream_t stream = 0, bool unique_key = true)

Alias of assign_scores.

inline void assign_values(const size_type n, const key_type *keys, const value_type *values, cudaStream_t stream = 0, bool unique_key = true)

Assign new values for each keys . If the key doesn’t exist, the operation on the key will be ignored.

Parameters
  • n – Number of key-value pairs to assign.

  • keys – The keys need to be operated, which must be on GPU-accessible memory with shape (n).

  • values – The values need to be updated, which must be on GPU-accessible memory with shape (n, DIM).

  • stream – The CUDA stream that is used to execute the operation.

  • unique_key – If all keys in the same batch are unique.

inline void find(const size_type n, const key_type *keys, value_type *values, bool *founds, score_type *scores = nullptr, cudaStream_t stream = 0) const

Searches the hash table for the specified keys.

Note

When a key is missing, the value in values is not changed.

Parameters
  • n – The number of key-value-score tuples to search.

  • keys – The keys to search on GPU-accessible memory with shape (n).

  • values – The values to search on GPU-accessible memory with shape (n, DIM).

  • founds – The status that indicates if the keys are found on GPU-accessible memory with shape (n).

  • scores – The scores to search on GPU-accessible memory with shape (n).

    If scores is nullptr, the score for each key will not be returned.

  • stream – The CUDA stream that is used to execute the operation.

inline void find(const size_type n, const key_type *keys, value_type *values, key_type *missed_keys, int *missed_indices, int *missed_size, score_type *scores = nullptr, cudaStream_t stream = 0) const

Searches the hash table for the specified keys.

Note

When the searched keys are not hit, missed keys/indices/size can be obtained.

Parameters
  • n – The number of key-value-score tuples to search.

  • keys – The keys to search on GPU-accessible memory with shape (n).

  • values – The values to search on GPU-accessible memory with shape (n, DIM).

  • missed_keys – The missed keys to search on GPU-accessible memory with shape (n).

  • missed_indices – The missed indices to search on GPU-accessible memory with shape (n).

  • missed_size – The size of missed_keys and missed_indices.

  • scores – The scores to search on GPU-accessible memory with shape (n).

    If scores is nullptr, the score for each key will not be returned.

  • stream – The CUDA stream that is used to execute the operation.

inline void find(const size_type n, const key_type *keys, value_type **values, bool *founds, score_type *scores = nullptr, cudaStream_t stream = 0, bool unique_key = true) const

Searches the hash table for the specified keys and returns address of the values.

Note

When a key is missing, the data in values won’t change.

Warning

This API returns internal addresses for high-performance but thread-unsafe. The caller is responsible for guaranteeing data consistency.

Parameters
  • n – The number of key-value-score tuples to search.

  • keys – The keys to search on GPU-accessible memory with shape (n).

  • values – The addresses of values to search on GPU-accessible memory with shape (n).

  • founds – The status that indicates if the keys are found on GPU-accessible memory with shape (n).

  • scores – The scores to search on GPU-accessible memory with shape (n).

    If scores is nullptr, the score for each key will not be returned.

  • stream – The CUDA stream that is used to execute the operation.

  • unique_key – If all keys in the same batch are unique.

inline void contains(const size_type n, const key_type *keys, bool *founds, cudaStream_t stream = 0) const

Checks if there are elements with key equivalent to keys in the table.

Parameters
  • n – The number of keys to check.

  • keys – The keys to search on GPU-accessible memory with shape (n).

  • founds – The result that indicates if the keys are found, and should be allocated by caller on GPU-accessible memory with shape (n).

  • stream – The CUDA stream that is used to execute the operation.

inline void erase(const size_type n, const key_type *keys, cudaStream_t stream = 0)

Removes specified elements from the hash table.

Parameters
  • n – The number of keys to remove.

  • keys – The keys to remove on GPU-accessible memory.

  • stream – The CUDA stream that is used to execute the operation.

template<template<typename, typename> class PredFunctor>
inline size_type erase_if(const key_type &pattern, const score_type &threshold, cudaStream_t stream = 0)

Erases all elements that satisfy the predicate pred from the hash table.

template <class K, class S>
struct EraseIfPredFunctor {
  __forceinline__ __device__ bool operator()(const K& key,
                                             S& score,
                                             const K& pattern,
                                             const S& threshold) {
    return ((key & 0x1 == pattern) && (score < threshold));
  }
};

Template Parameters

PredFunctor – The predicate template <typename K, typename S> function with operator signature (bool*)(const K&, const S&, const K&, const threshold) that returns true if the element should be erased. The value for pred should be a function with type Pred defined like the following example:

Parameters
  • pattern – The third user-defined argument to pred with key_type type.

  • threshold – The fourth user-defined argument to pred with score_type type.

  • stream – The CUDA stream that is used to execute the operation.

Returns

The number of elements removed.

inline void clear(cudaStream_t stream = 0)

Removes all of the elements in the hash table with no release object.

inline void export_batch(size_type n, const size_type offset, size_type *d_counter, key_type *keys, value_type *values, score_type *scores = nullptr, cudaStream_t stream = 0) const

Exports a certain number of the key-value-score tuples from the hash table.

Parameters
  • n – The maximum number of exported pairs.

  • offset – The position of the key to search.

  • d_counter – Accumulates amount of successfully exported values.

  • keys – The keys to dump from GPU-accessible memory with shape (n).

  • values – The values to dump from GPU-accessible memory with shape (n, DIM).

  • scores – The scores to search on GPU-accessible memory with shape (n).

    If scores is nullptr, the score for each key will not be returned.

  • stream – The CUDA stream that is used to execute the operation.

Throws

CudaException – If the key-value size is too large for GPU shared memory. Reducing the value for n is currently required if this exception occurs.

Returns

The number of elements dumped.

inline size_type export_batch(const size_type n, const size_type offset, key_type *keys, value_type *values, score_type *scores = nullptr, cudaStream_t stream = 0) const
template<template<typename, typename> class PredFunctor>
inline void export_batch_if(const key_type &pattern, const score_type &threshold, size_type n, const size_type offset, size_type *d_counter, key_type *keys, value_type *values, score_type *scores = nullptr, cudaStream_t stream = 0) const

Exports a certain number of the key-value-score tuples which match.

template <class K, class S>
struct ExportIfPredFunctor {
  __forceinline__ __device__ bool operator()(const K& key,
                                             S& score,
                                             const K& pattern,
                                             const S& threshold) {
    return score >= threshold;
  }
};

Template Parameters

PredFunctor – A functor with template <K, S> defined an operator with signature: device (bool*)(const K&, S&, const K&, const S&). specified condition from the hash table.

Parameters
  • n – The maximum number of exported pairs. The value for pred should be a function with type Pred defined like the following example:

  • pattern – The third user-defined argument to pred with key_type type.

  • threshold – The fourth user-defined argument to pred with score_type type.

  • offset – The position of the key to remove.

  • keys – The keys to dump from GPU-accessible memory with shape (n).

  • values – The values to dump from GPU-accessible memory with shape (n, DIM).

  • scores – The scores to search on GPU-accessible memory with shape (n).

    If scores is nullptr, the score for each key will not be returned.

  • stream – The CUDA stream that is used to execute the operation.

Throws

CudaException – If the key-value size is too large for GPU shared memory. Reducing the value for n is currently required if this exception occurs.

Returns

The number of elements dumped.

inline bool empty(cudaStream_t stream = 0) const

Indicates if the hash table has no elements.

Parameters

stream – The CUDA stream that is used to execute the operation.

Returns

true if the table is empty and false otherwise.

inline size_type size(cudaStream_t stream = 0) const

Returns the hash table size.

Parameters

stream – The CUDA stream that is used to execute the operation.

Returns

The table size.

inline size_type capacity() const

Returns the hash table capacity.

Note

The value that is returned might be less than the actual capacity of the hash table because the hash table currently keeps the capacity to be a power of 2 for performance considerations.

Returns

The table capacity.

inline void reserve(const size_type new_capacity, cudaStream_t stream = 0)

Sets the number of buckets to the number that is needed to accommodate at least new_capacity elements without exceeding the maximum load factor. This method rehashes the hash table. Rehashing puts the elements into the appropriate buckets considering that total number of buckets has changed.

Note

If the value of new_capacity or double of new_capacity is greater or equal than options_.max_capacity, the reserve does not perform any change to the hash table.

Parameters
  • new_capacity – The requested capacity for the hash table.

  • stream – The CUDA stream that is used to execute the operation.

inline float load_factor(cudaStream_t stream = 0) const

Returns the average number of elements per slot, that is, size() divided by capacity().

Parameters

stream – The CUDA stream that is used to execute the operation.

Returns

The load factor

inline void set_max_capacity(size_type new_max_capacity)

Set max_capacity of the table.

Parameters

new_max_capacity – The new expecting max_capacity. It must be power of 2. Otherwise it will raise an error.

inline size_type dim() const noexcept

Returns the dimension of the vectors.

Returns

The dimension of the vectors.

inline size_type max_bucket_size() const noexcept

Returns The length of each bucket.

Returns

The length of each bucket.

inline size_type bucket_count() const noexcept

Returns the number of buckets in the table.

Returns

The number of buckets in the table.

inline size_type save(BaseKVFile<K, V, S> *file, const size_t max_workspace_size = 1L * 1024 * 1024, cudaStream_t stream = 0) const

Save keys, vectors, scores in table to file or files.

Parameters
  • file – A BaseKVFile object defined the file format on host filesystem.

  • max_workspace_size – Saving is conducted in chunks. This value denotes the maximum amount of temporary memory to use when dumping the table. Larger values can lead to higher performance.

  • stream – The CUDA stream used to execute the operation.

Returns

Number of KV pairs saved to file.

inline size_type load(BaseKVFile<K, V, S> *file, const size_t max_workspace_size = 1L * 1024 * 1024, cudaStream_t stream = 0)

Load keys, vectors, scores from file to table.

Parameters
  • file – An BaseKVFile defined the file format within filesystem.

  • max_workspace_size – Loading is conducted in chunks. This value denotes the maximum size of such chunks. Larger values can lead to higher performance.

  • stream – The CUDA stream used to execute the operation.

Returns

Number of keys loaded from file.

inline void set_global_epoch(const uint64_t epoch)

Public Static Attributes

static constexpr int evict_strategy = Strategy