merlin.models.tf.YoutubeDNNRetrievalModel
merlin.models.tf.YoutubeDNNRetrievalModel(schema: merlin.schema.schema.Schema, aggregation: str = 'concat', top_block: merlin.models.tf.core.base.Block = MLPBlock( (layers): List( (0): _Dense( (dense): Dense(64, activation=relu, use_bias=True) ) ) ), l2_normalization: bool = True, extra_pre_call: Optional[merlin.models.tf.core.base.Block] = None, task_block: Optional[merlin.models.tf.core.base.Block] = None, logits_temperature: float = 1.0, sampled_softmax: bool = True, num_sampled: int = 100, min_sampled_id: int = 0, embedding_options: merlin.models.tf.inputs.embedding.EmbeddingOptions = EmbeddingOptions(embedding_dims=None, embedding_dim_default=64, infer_embedding_sizes=False, infer_embedding_sizes_multiplier=2.0, infer_embeddings_ensure_dim_multiple_of_8=False, embeddings_initializers=None, embeddings_l2_reg=0.0, combiner='mean')) → merlin.models.tf.models.base.Model

Build the Youtube-DNN retrieval model. More details of the architecture can be found in [1]. The sampled_softmax is enabled by default [2] [3] [4].
- Example Usage::

    model = YoutubeDNNRetrievalModel(schema, num_sampled=100)
    model.compile(optimizer="adam")
    model.fit(train_data, epochs=10)
References

- [1] Covington, Paul, Jay Adams, and Emre Sargin. "Deep neural networks for youtube recommendations." Proceedings of the 10th ACM Conference on Recommender Systems. 2016.
- [2] Yoshua Bengio and Jean-Sébastien Sénécal. 2003. "Quick Training of Probabilistic Neural Nets by Importance Sampling." In Proceedings of the Conference on Artificial Intelligence and Statistics (AISTATS).
- [3] Y. Bengio and J. S. Senecal. 2008. "Adaptive Importance Sampling to Accelerate Training of a Neural Probabilistic Language Model." IEEE Transactions on Neural Networks 19, 4 (April 2008), 713–722. https://doi.org/10.1109/TNN.2007.912312
- [4] Jean, Sébastien, et al. "On using very large target vocabulary for neural machine translation." arXiv preprint arXiv:1412.2007 (2014).
- Parameters
schema (Schema) – The Schema with the input features.
aggregation (str) – The aggregation method to use for the sequence of features. Defaults to 'concat'.
top_block (Block) – The Block that combines the top features.
l2_normalization (bool) – Whether to apply L2 normalization before computing dot interactions. Defaults to True.
extra_pre_call (Optional[Block]) – The optional Block to apply before the model.
task_block (Optional[Block]) – The optional Block to apply on the model.
logits_temperature (float) – Temperature used to reduce model overconfidence: the logits are divided by this value (logits / temperature). Defaults to 1.0.
sampled_softmax (bool) – Whether to compute the logits over a sampled subset of negative candidates rather than over all items of the catalog. Defaults to True.
num_sampled (int) – When sampled_softmax is enabled, the number of negative candidates to sample for each batch. Defaults to 100.
min_sampled_id (int) – The minimum id value to be sampled with sampled softmax. Useful to ignore the first categorical encoded ids, which are usually reserved for <nulls>, out-of-vocabulary or padding. Defaults to 0.
embedding_options (EmbeddingOptions, optional) – An EmbeddingOptions instance, which allows for a number of options for the embedding table, by default EmbeddingOptions()
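A minimal configuration sketch showing how these parameters fit together. It assumes train is a merlin.io.Dataset whose schema describes the sequential input features; the variable name and the hyperparameter values below are illustrative choices, not library defaults::

    import merlin.models.tf as mm
    from merlin.models.tf.inputs.embedding import EmbeddingOptions

    schema = train.schema

    model = mm.YoutubeDNNRetrievalModel(
        schema,
        top_block=mm.MLPBlock([64]),        # MLP applied on top of the aggregated input features
        l2_normalization=True,              # L2-normalize embeddings before the dot-product scoring
        logits_temperature=1.0,             # logits are divided by this temperature
        sampled_softmax=True,               # train against sampled negatives instead of the full catalog
        num_sampled=1000,                   # negative candidates sampled per batch
        min_sampled_id=1,                   # skip id 0, often reserved for padding / out-of-vocabulary
        embedding_options=EmbeddingOptions(
            embedding_dim_default=64,
            infer_embedding_sizes=True,     # derive embedding dims from feature cardinalities
        ),
    )

    model.compile(optimizer="adam")
    model.fit(train, epochs=3, batch_size=1024)

Since the returned object is a merlin.models.tf Model (a Keras model), the usual compile/fit/evaluate workflow applies; only the constructor arguments shown in the Parameters list above are specific to this retrieval architecture.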