YoutubeDNNRetrievalModel(schema: merlin.schema.schema.Schema, aggregation: str = 'concat', top_block: Block = MLPBlock([64]), l2_normalization: bool = True, extra_pre_call: Optional[Block] = None, task_block: Optional[Block] = None, logits_temperature: float = 1.0, sampled_softmax: bool = True, num_sampled: int = 100, min_sampled_id: int = 0, embedding_options: EmbeddingOptions = EmbeddingOptions(embedding_dims=None, embedding_dim_default=64, infer_embedding_sizes=False, infer_embedding_sizes_multiplier=2.0, infer_embeddings_ensure_dim_multiple_of_8=False, embeddings_initializers=None, embeddings_l2_reg=0.0, combiner='mean'))

Build the YouTube-DNN retrieval model. More details of the architecture can be found in [1]. Sampled softmax is enabled by default [2] [3] [4].

Example usage::

    model = YoutubeDNNRetrievalModel(schema, num_sampled=100)
    model.compile(optimizer="adam")
    model.fit(train_data, epochs=10)



References

[1] Covington, Paul, Jay Adams, and Emre Sargin. "Deep neural networks for YouTube recommendations." Proceedings of the 10th ACM Conference on Recommender Systems. 2016.

[2] Bengio, Yoshua, and Jean-Sébastien Sénécal. "Quick training of probabilistic neural nets by importance sampling." Proceedings of the Conference on Artificial Intelligence and Statistics (AISTATS). 2003.

[3] Bengio, Y., and J. S. Senecal. "Adaptive importance sampling to accelerate training of a neural probabilistic language model." IEEE Transactions on Neural Networks 19, 4 (April 2008), 713–722.

[4] Jean, Sébastien, et al. "On using very large target vocabulary for neural machine translation." arXiv preprint arXiv:1412.2007 (2014).

Parameters

  • schema (Schema) – The Schema with the input features.

  • aggregation (str) – The aggregation method to use for the sequence of features. Defaults to concat.

  • top_block (Block) – The Block that combines the top features

  • l2_normalization (bool) – Whether to apply L2 normalization before computing dot interactions. Defaults to True.

  • extra_pre_call (Optional[Block]) – The optional Block to apply before the model.

  • task_block (Optional[Block]) – The optional Block to apply on the model.

  • logits_temperature (float) – Temperature parameter used to reduce model overconfidence: the logits are divided by this value (logits / temperature). Defaults to 1.0.

  • sampled_softmax (bool) – Whether to compute the logit scores over a sampled subset of candidates (sampled softmax) instead of over all items of the catalog. Defaults to True.

  • num_sampled (int) – When sampled_softmax is enabled, the number of negative candidates to generate for each batch. Defaults to 100.

  • min_sampled_id (int) – The minimum id value to be sampled with sampled softmax. Useful to ignore the first categorical encoded ids, which are usually reserved for <nulls>, out-of-vocabulary or padding. Defaults to 0.

  • embedding_options (EmbeddingOptions, optional) – An EmbeddingOptions instance, which allows a number of options for the embedding table. Defaults to EmbeddingOptions().
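To make the roles of l2_normalization, logits_temperature, num_sampled, and min_sampled_id concrete, here is a minimal NumPy sketch of retrieval scoring and negative-candidate sampling. This is not the Merlin implementation: the function names are hypothetical, and uniform negative sampling is an illustrative simplification of whatever sampler the library actually uses.

```python
import numpy as np

def retrieval_logits(query_emb, item_embs, l2_normalization=True,
                     logits_temperature=1.0):
    """Score query embeddings against candidate item embeddings.

    Mirrors the documented options: optional L2 normalization before the
    dot-product interaction, then division of the logits by the temperature.
    """
    if l2_normalization:
        query_emb = query_emb / np.linalg.norm(query_emb, axis=-1, keepdims=True)
        item_embs = item_embs / np.linalg.norm(item_embs, axis=-1, keepdims=True)
    logits = query_emb @ item_embs.T   # dot-product interaction
    return logits / logits_temperature  # logits / T tempers overconfidence

def sample_negative_ids(catalog_size, num_sampled=100, min_sampled_id=0,
                        rng=None):
    """Sample negative candidate ids, skipping ids below min_sampled_id
    (often reserved for nulls, out-of-vocabulary, or padding).

    Uniform sampling is an assumption made here for simplicity.
    """
    rng = rng if rng is not None else np.random.default_rng()
    return rng.integers(min_sampled_id, catalog_size, size=num_sampled)

rng = np.random.default_rng(0)
queries = rng.normal(size=(2, 8))  # batch of 2 query (user) embeddings
items = rng.normal(size=(5, 8))    # 5 candidate item embeddings

scores = retrieval_logits(queries, items, logits_temperature=2.0)
negatives = sample_negative_ids(catalog_size=1000, num_sampled=100,
                                min_sampled_id=4, rng=rng)
print(scores.shape)  # (2, 5): one score per (query, item) pair
```

With L2 normalization the raw logits are cosine similarities in [-1, 1], so a temperature greater than 1 further flattens the softmax distribution over candidates.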