Distributed Sparse Embedding

class sparse_operation_kit.embeddings.distributed_embedding.DistributedEmbedding(*args, **kwargs)

Bases: Layer

Abbreviated as sok.DistributedEmbedding(*args, **kwargs).

This is a wrapper class for the distributed sparse embedding layer. It creates a sparse embedding layer that distributes keys across GPUs according to gpu_id = key % gpu_num.
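The key-to-GPU rule above can be illustrated with a small pure-Python sketch. It only mimics the gpu_id = key % gpu_num assignment and is not how the library implements it; the helper name shard_keys is hypothetical.

```python
from collections import defaultdict


def shard_keys(keys, gpu_num):
    """Group keys by owning GPU, using gpu_id = key % gpu_num.

    Hypothetical helper for illustration only; not part of sok.
    """
    shards = defaultdict(list)
    for key in keys:
        shards[key % gpu_num].append(key)
    return dict(shards)


# With 4 GPUs, keys 0, 4 and 8 land on GPU 0; keys 1, 5 and 13 on GPU 1.
shards = shard_keys([0, 1, 2, 3, 4, 5, 8, 13], gpu_num=4)
for gpu_id in sorted(shards):
    print(gpu_id, shards[gpu_id])
```

Because the assignment depends only on the key value, every occurrence of a given key is served by the same GPU.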

  • combiner (string) – specifies how to combine the embedding vectors within each slot. Can be Mean or Sum.

  • max_vocabulary_size_per_gpu (integer) – the first dimension of embedding variable whose shape is [max_vocabulary_size_per_gpu, embedding_vec_size].

  • embedding_vec_size (integer) – the second dimension of embedding variable whose shape is [max_vocabulary_size_per_gpu, embedding_vec_size].

  • slot_num (integer) – the number of feature fields processed at the same time in each iteration, where all feature fields produce embedding vectors of the same dimension.

  • max_nnz (integer) – the maximum number of valid keys in each slot (feature field).

  • max_feature_num (integer = slot_num*max_nnz) – the maximum number of valid keys in each sample. It can be used to save GPU memory when this statistic is known. By default, max_feature_num = slot_num * max_nnz.

  • use_hashtable (boolean = True) – whether to use a Hashtable in EmbeddingVariable. If True, a Hashtable is created for dynamic insertion. Otherwise, the input keys are used directly as indices for embedding vector lookup, so they must be in the range [0, max_vocabulary_size_per_gpu * gpu_num).

  • key_dtype (tf.dtypes = tf.int64) – the data type of input keys. By default, it is tf.int64.

  • embedding_initializer (string or an instance of tf.keras.initializers.Initializer) – the initializer used to generate the initial values of the embedding variable. By default, it uses random_uniform with minval=-0.05 and maxval=0.05.
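Two of the constraints in the parameter list are plain arithmetic and can be checked before building the layer. The sketch below computes the default max_feature_num and tests whether keys fit the range required when use_hashtable is False. Both helper names are hypothetical, and the numbers are made-up example values.

```python
def default_max_feature_num(slot_num, max_nnz):
    """Default upper bound on valid keys per sample: slot_num * max_nnz."""
    return slot_num * max_nnz


def keys_in_range(keys, max_vocabulary_size_per_gpu, gpu_num):
    """True if every key lies in [0, max_vocabulary_size_per_gpu * gpu_num).

    This is the range required when use_hashtable=False, because keys are
    then used directly as lookup indices. Hypothetical helper, not part of sok.
    """
    upper = max_vocabulary_size_per_gpu * gpu_num
    return all(0 <= key < upper for key in keys)


# Example values chosen for illustration only.
max_feature_num = default_max_feature_num(slot_num=10, max_nnz=4)
valid = keys_in_range([0, 17, 4095], max_vocabulary_size_per_gpu=1024, gpu_num=4)
invalid = keys_in_range([4096], max_vocabulary_size_per_gpu=1024, gpu_num=4)
print(max_feature_num, valid, invalid)
```

If the real data never fills every slot to max_nnz, passing a smaller max_feature_num than this default is what saves GPU memory.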


initializer = tf.keras.initializers.RandomUniform() # or "random_uniform"

emb_layer = sok.DistributedEmbedding(combiner, max_vocabulary_size_per_gpu,
                                     embedding_vec_size, slot_num, max_nnz,
                                     embedding_initializer=initializer)

def _train_step(inputs, labels):
    emb_vectors = emb_layer(inputs)
    ...

for i, (inputs, labels) in enumerate(dataset):
    _train_step(inputs, labels)

call(inputs, training=True)

The forward logic of this wrapper class.

  • inputs (tf.sparse.SparseTensor) – keys are stored in SparseTensor.values. SparseTensor.dense_shape is 2-dimensional and denotes [batchsize * slot_num, max_nnz]. Each entry of SparseTensor.indices therefore has two components, [row-index, column-index], which locate a key in the corresponding dense tensor.

  • training (boolean) – whether training or not.
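The input layout described above can be sketched in pure Python by building the three components of a sparse tensor from nested key lists. The helper build_sparse_components is hypothetical, and it assumes rows are ordered sample-major (row = sample_id * slot_num + slot_id), which the source does not state explicitly.

```python
def build_sparse_components(samples, max_nnz):
    """Build (indices, values, dense_shape) for a batch of keys.

    samples: list of samples; each sample is a list of slots; each slot is
    a list of integer keys (at most max_nnz of them).
    Assumption: rows are laid out sample-major, one row per (sample, slot).
    """
    slot_num = len(samples[0])
    indices, values = [], []
    for sample_id, slots in enumerate(samples):
        if len(slots) != slot_num:
            raise ValueError("every sample must have the same number of slots")
        for slot_id, keys in enumerate(slots):
            if len(keys) > max_nnz:
                raise ValueError("a slot holds more than max_nnz keys")
            row = sample_id * slot_num + slot_id
            for col, key in enumerate(keys):
                indices.append([row, col])
                values.append(key)
    dense_shape = [len(samples) * slot_num, max_nnz]
    return indices, values, dense_shape


# Two samples, two slots each, at most three keys per slot.
samples = [[[1, 2], [3]], [[4], [5, 6, 7]]]
indices, values, dense_shape = build_sparse_components(samples, max_nnz=3)
print(indices, values, dense_shape)
```

The three results map onto the indices, values and dense_shape arguments of tf.sparse.SparseTensor, with values cast to the layer's key_dtype.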


emb_vector – the embedding vectors for the input keys. Its shape is [batchsize, slot_num, embedding_vec_size].
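To make the output shape concrete, here is a minimal pure-Python sketch of what a sum or mean combiner does: the vectors of all keys in one slot are reduced to a single vector, giving one vector per slot per sample. The function combine_slots, the dictionary-based table and the sample-major row order are illustrative assumptions, not the library's implementation.

```python
def combine_slots(rows, table, slot_num, combiner="sum"):
    """Reduce the key vectors of each slot to one vector.

    rows: one list of keys per (sample, slot) row, assumed sample-major.
    table: dict mapping key -> embedding vector (list of floats).
    Returns nested lists shaped [batchsize, slot_num, embedding_vec_size].
    """
    if combiner not in ("sum", "mean"):
        raise ValueError("combiner must be 'sum' or 'mean'")
    vec_size = len(next(iter(table.values())))
    output = []
    for start in range(0, len(rows), slot_num):
        sample = []
        for keys in rows[start:start + slot_num]:
            acc = [0.0] * vec_size
            for key in keys:
                acc = [a + b for a, b in zip(acc, table[key])]
            if combiner == "mean" and keys:
                acc = [a / len(keys) for a in acc]
            sample.append(acc)
        output.append(sample)
    return output


# Toy table with embedding_vec_size = 2; batchsize = 2, slot_num = 2.
table = {1: [1.0, 2.0], 2: [3.0, 4.0], 3: [5.0, 6.0]}
rows = [[1, 2], [3], [3], [1]]
summed = combine_slots(rows, table, slot_num=2, combiner="sum")
averaged = combine_slots(rows, table, slot_num=2, combiner="mean")
print(summed)
print(averaged)
```

Slots with a single key give the same result under both combiners; only the first slot of the first sample differs here.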

Return type