merlin.models.tf.InBatchSampler
-
class merlin.models.tf.InBatchSampler(*args, **kwargs)
Bases: merlin.models.tf.blocks.sampling.base.ItemSampler
Provides in-batch sampling [1] for two-tower item retrieval models. The implementation is simple: it returns the item embeddings and metadata of the current batch. It exists so that in-batch sampling shares the interface of more advanced samplers (e.g. CachedCrossBatchSampler).
For a given (user, item) embedding pair, the embeddings of the other items in the batch are used as negatives, so no embeddings are computed exclusively for negative items. This sampling is popularity-biased, because popular items appear more often in training batches.
Note: ignoring false negatives (negative items equal to the positive one) is handled by ItemRetrievalScorer(…, sampling_downscore_false_negatives=True).
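The in-batch negative scheme described above can be illustrated with a minimal NumPy sketch. It is not the Merlin implementation: the function name `in_batch_logits`, the use of NumPy instead of TensorFlow, and the choice of the smallest float as the downscoring value are all assumptions made for illustration.

```python
import numpy as np


def in_batch_logits(user_emb, item_emb, item_ids, downscore_false_negatives=True):
    """Score every user in the batch against every item in the batch.

    Entry (i, i) is the positive pair for row i. Every other entry in
    row i reuses another example's item embedding as a negative.
    Hypothetical helper, not part of merlin.models.
    """
    logits = user_emb @ item_emb.T  # shape: (batch, batch)
    if downscore_false_negatives:
        # A false negative is an off-diagonal item with the same id as
        # the positive item of that row.
        same_id = item_ids[:, None] == item_ids[None, :]
        off_diagonal = ~np.eye(len(item_ids), dtype=bool)
        # Assumption: downscore with the smallest representable float.
        lowest = np.finfo(logits.dtype).min
        logits = np.where(same_id & off_diagonal, lowest, logits)
    return logits


rng = np.random.default_rng(0)
user_emb = rng.normal(size=(4, 8))
item_emb = rng.normal(size=(4, 8))
item_ids = np.array([10, 11, 10, 12])  # item 10 appears twice in the batch
item_emb[2] = item_emb[0]  # same item id, same embedding

logits = in_batch_logits(user_emb, item_emb, item_ids)
```

In this sketch, rows 0 and 2 share item 10, so positions (0, 2) and (2, 0) are downscored while the diagonal keeps the true positive scores. Because a popular item such as 10 shows up in more batches, it is also drawn as a negative more often, which is the popularity bias mentioned above.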
References
- 1
Yi, Xinyang, et al. “Sampling-bias-corrected neural modeling for large corpus item recommendations.” Proceedings of the 13th ACM Conference on Recommender Systems. 2019.
- Parameters
batch_size (int, optional) – The batch size. If not set, it is inferred when the layer is built (on the first call()).
Methods
__init__([batch_size])
add(inputs[, training])
add_loss(losses, **kwargs) – Add loss tensor(s), potentially dependent on layer inputs.
add_metric(value[, name]) – Adds metric tensor to the layer.
add_update(updates) – Add update op(s), potentially dependent on layer inputs.
add_variable(*args, **kwargs) – Deprecated, do NOT use! Alias for add_weight.
add_weight([name, shape, dtype, …]) – Adds a new variable to the layer.
build(input_shapes)
build_from_config(config)
call(inputs[, training]) – Returns the item embeddings and item metadata from the current batch.
compute_mask(inputs[, mask]) – Computes an output mask tensor.
compute_output_shape(input_shape) – Computes the output shape of the layer.
compute_output_signature(input_signature) – Compute the output tensor signature of the layer based on the inputs.
count_params() – Count the total number of scalars composing the weights.
finalize_state() – Finalizes the layer's state after updating layer weights.
from_config(config) – Creates a layer from its config.
get_build_config()
get_input_at(node_index) – Retrieves the input tensor(s) of a layer at a given node.
get_input_mask_at(node_index) – Retrieves the input mask tensor(s) of a layer at a given node.
get_input_shape_at(node_index) – Retrieves the input shape(s) of a layer at a given node.
get_output_at(node_index) – Retrieves the output tensor(s) of a layer at a given node.
get_output_mask_at(node_index) – Retrieves the output mask tensor(s) of a layer at a given node.
get_output_shape_at(node_index) – Retrieves the output shape(s) of a layer at a given node.
get_weights() – Returns the current weights of the layer, as NumPy arrays.
sample()
set_batch_size(value)
set_max_num_samples(value)
set_weights(weights) – Sets the weights of the layer, from NumPy arrays.
with_name_scope(method) – Decorator to automatically enter the module name scope.
Attributes
activity_regularizer – Optional regularizer function for the output of this layer.
compute_dtype – The dtype of the layer’s computations.
dtype – The dtype of the layer weights.
dtype_policy – The dtype policy associated with this layer.
dynamic – Whether the layer is dynamic (eager-only); set in the constructor.
inbound_nodes – Return Functional API nodes upstream of this layer.
input – Retrieves the input tensor(s) of a layer.
input_mask – Retrieves the input mask tensor(s) of a layer.
input_shape – Retrieves the input shape(s) of a layer.
input_spec – InputSpec instance(s) describing the input format for this layer.
losses – List of losses added using the add_loss() API.
max_num_samples
metrics – List of metrics added using the add_metric() API.
name – Name of the layer (string), set in the constructor.
name_scope – Returns a tf.name_scope instance for this class.
non_trainable_variables
non_trainable_weights – List of all non-trainable weights tracked by this layer.
outbound_nodes – Return Functional API nodes downstream of this layer.
output – Retrieves the output tensor(s) of a layer.
output_mask – Retrieves the output mask tensor(s) of a layer.
output_shape – Retrieves the output shape(s) of a layer.
required_features
stateful
submodules – Sequence of all sub-modules.
supports_masking – Whether this layer supports computing a mask using compute_mask.
trainable
trainable_variables
trainable_weights – List of all trainable weights tracked by this layer.
updates
variable_dtype – Alias of Layer.dtype, the dtype of the weights.
variables – Returns the list of all layer variables/weights.
weights – Returns the list of all layer variables/weights.
-
property batch_size
-
call(inputs: Dict[str, tensorflow.python.framework.ops.Tensor], training=True) → merlin.models.tf.core.base.EmbeddingWithMetadata
Returns the item embeddings and item metadata from the current batch. The implementation simply passes these values through; it exists so that InBatchSampler shares the interface of more advanced samplers (e.g. CachedCrossBatchSampler).
- Parameters
inputs (TabularData) – Dict with two keys:
- "items_embeddings": Item embeddings tensor.
- "items_metadata": Dict like {"<feature name>": "<feature tensor>"} containing features that might be relevant to the sampler. InBatchSampler itself does not use the metadata features, but "item_id" is required when the sampler is combined with ItemRetrievalScorer(…, sampling_downscore_false_negatives=True), so that false negatives can be identified and downscored.
training (bool, optional) – Whether the layer is called in training mode, by default True
- Returns
Value object with the sampled item embeddings and item metadata
- Return type
EmbeddingWithMetadata
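As a rough illustration of the input and output shapes described for call(), the sketch below mimics the pass-through behaviour with NumPy arrays standing in for tensors. `EmbeddingWithMetadataSketch` and `in_batch_sample` are hypothetical stand-ins, and the field names `embeddings` and `metadata` are assumptions; consult the library source for the real EmbeddingWithMetadata definition.

```python
from typing import Dict, NamedTuple

import numpy as np


class EmbeddingWithMetadataSketch(NamedTuple):
    """Hypothetical stand-in for EmbeddingWithMetadata (field names assumed)."""

    embeddings: np.ndarray
    metadata: Dict[str, np.ndarray]


def in_batch_sample(inputs: dict) -> EmbeddingWithMetadataSketch:
    """Return the batch's own item embeddings and metadata unchanged."""
    return EmbeddingWithMetadataSketch(
        embeddings=inputs["items_embeddings"],
        metadata=inputs["items_metadata"],
    )


inputs = {
    # One embedding row per item in the batch.
    "items_embeddings": np.ones((3, 4), dtype=np.float32),
    # "item_id" is the feature needed to detect false negatives downstream.
    "items_metadata": {"item_id": np.array([7, 8, 9])},
}
sampled = in_batch_sample(inputs)
```

Nothing is resampled or cached here: the "sampled" negatives are exactly the items already present in the batch, which is why the number of negatives per example is tied to the batch size.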