embedding

Embedding Layers

Embedding

class distributed_embeddings.python.layers.embedding.Embedding(*args, **kwargs)[source]

Turns indices into vectors of fixed size.

Parameters:
  • input_dim (int) – Size of the vocabulary, i.e. maximum index + 1.

  • output_dim (int) – Length of embedding vectors.

  • embeddings_initializer – Initializer for the embeddings matrix (see keras.initializers).

  • embeddings_regularizer – Regularizer function applied to the embeddings matrix (see keras.regularizers).

  • embeddings_constraint – Constraint function applied to the embeddings matrix (see keras.constraints).

  • combiner (str or None) – Reduction method: ‘sum’, ‘mean’, or None. Defaults to None.

When combiner is not None, the supported input types and their respective output shapes are:

  • N-D Tensor (N >= 2): (d1, …, dn), output shape: (d1, …, d(n-1), output_dim)

  • 2-D RaggedTensor: (batch_size, ragged_dim), output shape: (batch_size, output_dim)

  • 2-D SparseTensor: (batch_size, max_hotness), output shape: (batch_size, output_dim)

The embeddings selected along the last input dimension are reduced with the given combiner.
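The reduction over the last input dimension can be illustrated with a small pure-Python sketch. This is not the library's implementation: combine_lookup and the toy table are hypothetical names introduced only to show how the output shape changes with combiner, and the sketch assumes every row of ids is non-empty.

```python
# Pure-Python sketch of the combiner semantics; NOT the library's code.
# `combine_lookup` is a hypothetical helper used only for illustration.
def combine_lookup(table, ids, combiner=None):
    """Look up rows of `table` for a 2-D list of ids, optionally reducing per row.

    table:    list of embedding vectors (input_dim rows of length output_dim).
    ids:      list of rows, each a non-empty list of integer indices.
    combiner: None, 'sum', or 'mean'.
    """
    if combiner not in (None, "sum", "mean"):
        raise ValueError(f"unsupported combiner: {combiner!r}")
    output = []
    for row in ids:
        vectors = [table[i] for i in row]
        if combiner is None:
            # No reduction: one vector per id, shape (len(row), output_dim).
            output.append(vectors)
            continue
        # Element-wise sum over the vectors picked by this row.
        reduced = [sum(column) for column in zip(*vectors)]
        if combiner == "mean":
            reduced = [value / len(row) for value in reduced]
        output.append(reduced)  # shape (output_dim,)
    return output


table = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # input_dim=4, output_dim=2
ids = [[1, 2], [3, 3]]                                     # shape (2, 2)

no_reduce = combine_lookup(table, ids)         # shape (2, 2, 2)
summed = combine_lookup(table, ids, "sum")     # shape (2, 2)
averaged = combine_lookup(table, ids, "mean")  # shape (2, 2)
```

With a combiner, the last input dimension (the ids within a row) disappears from the output and is replaced by output_dim, which matches the shapes listed above.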

build(input_shape)

Creates the variables of the layer (optional, for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

compute_output_shape(input_shape)

Computes the output shape of the layer.

This method will cause the layer’s state to be built, if that has not happened before. This requires that the layer will later be used with inputs that match the input shape provided here.

Parameters:

input_shape – Shape tuple (tuple of integers) or tf.TensorShape, or structure of shape tuples / tf.TensorShape instances (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.

Returns:

A tf.TensorShape instance or structure of tf.TensorShape instances.

classmethod from_config(config)[source]

Creates a layer from its config. This method is overridden so that a fast embedding layer can be instantiated from a Keras Embedding config.

get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:

Python dictionary.
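Because get_config() may return the same dictionary on every call, it is safer to copy the result before editing it. The following is a minimal sketch of that pattern. StubLayer is a hypothetical stand-in, not the library's Embedding class, and its config keys are chosen only for illustration.

```python
import copy


class StubLayer:
    """Hypothetical stand-in for a layer whose get_config() returns a shared dict."""

    def __init__(self, input_dim, output_dim, combiner=None):
        self._config = {
            "input_dim": input_dim,
            "output_dim": output_dim,
            "combiner": combiner,
        }

    def get_config(self):
        # Returns the same dict object on every call (no fresh copy).
        return self._config

    @classmethod
    def from_config(cls, config):
        return cls(**config)


layer = StubLayer(input_dim=1000, output_dim=16)

# Copy first, then modify, so the original layer's config stays untouched.
config = copy.deepcopy(layer.get_config())
config["combiner"] = "sum"
clone = StubLayer.from_config(config)
```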

Embedding Ops

embedding_lookup

distributed_embeddings.python.ops.embedding_lookup_ops.embedding_lookup(param, ids, combiner=None)[source]

Looks up embeddings for the given ids from an embedding tensor.

Parameters:
  • param (Tensor) – A single tensor representing the complete embedding tensor.

  • ids (Tensor) – A 2D int32 or int64 Tensor containing the ids to be looked up in param. RaggedTensor and SparseTensor are also supported.

  • combiner (string or None) – Reduction method: ‘sum’, ‘mean’, or None. Defaults to None.

Returns:

Tensor – A Tensor with the same type as param.

Note

When combiner is None, the returned tensor has the shape of ids with shape(param)[1] appended as the last dimension.

Otherwise, the embeddings from the same row are reduced and the returned tensor has shape (shape(ids)[0], shape(param)[1]).

Note that when ids is a RaggedTensor, its values and row_splits are the col_index and row_index arrays of the hotness matrix in CSR format, so the RaggedTensor can be constructed directly from them.

Raises:
  • TypeError – If param is empty.

  • ValueError – If ids is not a 2-D tensor.
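The CSR note for RaggedTensor ids can be made concrete with a pure-Python sketch. It is not the library's kernel: to_csr and csr_lookup are hypothetical helpers that only mirror the described semantics, and the sketch assumes every row holds at least one id. In TensorFlow, the two arrays produced here are the arguments that tf.RaggedTensor.from_row_splits(values, row_splits) expects.

```python
# Pure-Python sketch of the CSR view of ragged ids; NOT the library's code.
# `to_csr` and `csr_lookup` are hypothetical helpers for illustration.
def to_csr(rows):
    """Flatten a list of id rows into (values, row_splits)."""
    values, row_splits = [], [0]
    for row in rows:
        values.extend(row)               # column indices of the hotness matrix
        row_splits.append(len(values))   # row pointer: end offset of this row
    return values, row_splits


def csr_lookup(param, values, row_splits, combiner="sum"):
    """Reduce the embeddings of each row; assumes every row is non-empty."""
    output = []
    for start, end in zip(row_splits[:-1], row_splits[1:]):
        picked = [param[i] for i in values[start:end]]
        reduced = [sum(column) for column in zip(*picked)]
        if combiner == "mean":
            reduced = [value / (end - start) for value in reduced]
        output.append(reduced)
    return output


rows = [[0, 2], [1], [2, 2, 0]]            # 3 rows with hotness 2, 1, 3
values, row_splits = to_csr(rows)

param = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]   # 3 embeddings of width 2
result = csr_lookup(param, values, row_splits, "sum")
```

The result has one reduced vector per row, that is, shape (shape(ids)[0], shape(param)[1]).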

IntegerLookup Layers

IntegerLookup

class distributed_embeddings.python.layers.embedding.IntegerLookup(*args, **kwargs)[source]

A preprocessing layer that maps integer features to contiguous ranges. The vocabulary is generated on the fly; a static vocabulary and adapt() will be supported. The layer supports only part of the feature set of tf.keras.layers.IntegerLookup. Key frequencies are counted when the GPU algorithm is used.
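The idea of building a vocabulary on the fly can be sketched in plain Python. This is an illustration only, not the layer's algorithm: OnTheFlyLookup is a hypothetical class, and assigning indices from 0 in order of first appearance, with no out-of-vocabulary handling, is an assumption that the real layer may not share.

```python
# Pure-Python sketch of an on-the-fly vocabulary; NOT the library's code.
# Index assignment starting at 0 is an assumption made for illustration.
class OnTheFlyLookup:
    """Maps arbitrary integer keys to contiguous indices and counts key frequency."""

    def __init__(self):
        self.index_of = {}  # key -> contiguous index
        self.counts = {}    # key -> number of times the key has been seen

    def __call__(self, keys):
        indices = []
        for key in keys:
            if key not in self.index_of:
                # First time this key is seen: give it the next free index.
                self.index_of[key] = len(self.index_of)
            self.counts[key] = self.counts.get(key, 0) + 1
            indices.append(self.index_of[key])
        return indices


lookup = OnTheFlyLookup()
first = lookup([1001, 57, 1001, 99999])
second = lookup([57, 42])  # the vocabulary keeps growing across calls
```

Sparse, large-valued keys end up in a dense index range, which is what makes them usable as row indices into an embedding table.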