merlin.schema.schema.Schema, candidate_id_tag=Tags.ITEM_ID, top_block: typing.Optional[keras.engine.base_layer.Layer] = MLPBlock([64]), post: typing.Optional[keras.engine.base_layer.Layer] = None, inputs: typing.Optional[keras.engine.base_layer.Layer] = None, outputs: typing.Optional[typing.Union[keras.engine.base_layer.Layer, typing.List[keras.engine.base_layer.Layer]]] = None, logits_temperature: float = 1.0, num_sampled: int = 100, min_sampled_id: int = 0, **kwargs)

Builds the YouTube-DNN retrieval model. More details of the architecture can be found in [1]. Training with sampled softmax is enabled by default [2][3][4].
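Sampled softmax approximates the full softmax over all candidate items by contrasting the positive item with a small set of sampled negatives. The sketch below is a minimal, self-contained NumPy illustration of the idea only (names and the uniform sampler are illustrative, not the Merlin Models API; production implementations typically also apply a sampling-probability correction to the logits, which is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_softmax_loss(query, item_embeddings, positive_id,
                         num_sampled, min_sampled_id=0):
    """Score the positive item against `num_sampled` random negatives
    instead of the full catalog.

    query           : (d,) query-tower output
    item_embeddings : (num_items, d) candidate embedding table
    positive_id     : index of the true item
    num_sampled     : number of negatives drawn for this example
    min_sampled_id  : ids below this value (e.g. padding/OOV) are never sampled
    """
    num_items = item_embeddings.shape[0]
    # Draw negatives uniformly from [min_sampled_id, num_items), excluding the positive.
    pool = np.setdiff1d(np.arange(min_sampled_id, num_items), [positive_id])
    negatives = rng.choice(pool, size=num_sampled, replace=False)
    ids = np.concatenate([[positive_id], negatives])
    logits = item_embeddings[ids] @ query        # (1 + num_sampled,) scores
    # Cross-entropy with the positive item in slot 0 (numerically stabilized).
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]

loss = sampled_softmax_loss(rng.normal(size=8),
                            rng.normal(size=(1000, 8)),
                            positive_id=42, num_sampled=100, min_sampled_id=1)
```

Because only 1 + `num_sampled` dot products are computed per example, the cost is independent of the catalog size, which is what makes training over very large item vocabularies tractable.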

Example Usage::

    model = YoutubeDNNRetrievalModelV2(schema, num_sampled=100)
    model.compile(optimizer="adam")
    model.fit(dataset, epochs=10)



[1] Covington, Paul, Jay Adams, and Emre Sargin. "Deep neural networks for YouTube recommendations." Proceedings of the 10th ACM Conference on Recommender Systems. 2016.

[2] Bengio, Yoshua, and Jean-Sébastien Sénécal. "Quick training of probabilistic neural nets by importance sampling." Proceedings of the Conference on Artificial Intelligence and Statistics (AISTATS). 2003.

[3] Bengio, Y., and J. S. Sénécal. "Adaptive importance sampling to accelerate training of a neural probabilistic language model." IEEE Transactions on Neural Networks 19.4 (2008): 713–722.

[4] Jean, Sébastien, et al. "On using very large target vocabulary for neural machine translation." arXiv preprint arXiv:1412.2007 (2014).

  • schema (Schema) – The Schema with the input features

  • candidate_id_tag (Tag) – The tag used to select the candidate-id feature, by default Tags.ITEM_ID

  • top_block (tf.keras.layers.Layer) – The hidden layers to apply on top of the features representation vector.

  • inputs (tf.keras.layers.Layer, optional) – The input layer to encode input features (sparse and context features). If not specified, the input layer is inferred from the schema. By default None

  • post (Optional[tf.keras.layers.Layer], optional) – The optional layer to apply on top of the query encoder.

  • logits_temperature (float, optional) – Temperature used to reduce model overconfidence: the logits are divided by this value (logits / T). By default 1.0

  • num_sampled (int, optional) – When sampled softmax is enabled, the number of negative candidates sampled for each batch. By default 100

  • min_sampled_id (int, optional) – The minimum id value to be sampled with sampled softmax. Useful to ignore the first categorical encoded ids, which are usually reserved for <nulls>, out-of-vocabulary or padding. By default 0.
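To illustrate what the logits_temperature parameter does, the standalone NumPy example below (not the Merlin Models API, purely illustrative) shows that dividing the logits by a temperature above 1.0 softens the output distribution, reducing the model's confidence in its top candidate:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax.
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([4.0, 2.0, 1.0])
p_default = softmax(logits)        # T = 1.0, the default: sharper distribution
p_tempered = softmax(logits / 2.0) # T = 2.0: probabilities move toward uniform
```

With T = 1.0 the top item captures most of the probability mass; with T = 2.0 the same logits yield a flatter distribution, which can help when the model's raw scores are overconfident.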