merlin.models.tf.YoutubeDNNRetrievalModel

merlin.models.tf.YoutubeDNNRetrievalModel(
    schema: merlin.schema.schema.Schema,
    max_seq_length: int,
    aggregation: str = 'concat',
    top_block: merlin.models.tf.blocks.core.base.Block = MLPBlock(Dense(64, activation=relu, use_bias=True)),
    l2_normalization: bool = True,
    extra_pre_call: Optional[merlin.models.tf.blocks.core.base.Block] = None,
    task_block: Optional[merlin.models.tf.blocks.core.base.Block] = None,
    logits_temperature: float = 1.0,
    seq_aggregator: merlin.models.tf.blocks.core.base.Block = SequenceAggregator(),
    sampled_softmax: bool = True,
    num_sampled: int = 100,
    min_sampled_id: int = 0,
    embedding_options: merlin.models.tf.inputs.embedding.EmbeddingOptions = EmbeddingOptions(embedding_dims=None, embedding_dim_default=64, infer_embedding_sizes=False, infer_embedding_sizes_multiplier=2.0, infer_embeddings_ensure_dim_multiple_of_8=False, embeddings_initializers=None, embeddings_l2_reg=0.0, combiner='mean')
) → merlin.models.tf.models.base.Model

Builds the YouTube-DNN retrieval model. More details of the architecture can be found in [1]. Sampled softmax is enabled by default [2][3][4].
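The sampled-softmax idea referenced above (scoring the positive item against a sampled subset of negative candidates instead of the whole catalog) can be illustrated with a minimal NumPy sketch. The variable names and the uniform negative sampling below are simplifying assumptions for illustration, not the library's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

num_items = 1000       # catalog size
dim = 16               # embedding dimension
num_sampled = 100      # negatives per example (cf. the num_sampled arg)
min_sampled_id = 1     # skip id 0, often reserved for padding/OOV

item_embeddings = rng.normal(size=(num_items, dim))
user_vector = rng.normal(size=(dim,))
positive_id = 42

# Draw negative candidate ids uniformly from [min_sampled_id, num_items).
negative_ids = rng.integers(min_sampled_id, num_items, size=num_sampled)

# Softmax is computed only over the positive plus the sampled negatives,
# instead of over all num_items logits.
candidate_ids = np.concatenate(([positive_id], negative_ids))
logits = item_embeddings[candidate_ids] @ user_vector

probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Cross-entropy loss on the positive item (index 0 of the candidates).
loss = -np.log(probs[0])
```

This reduces the per-step cost from O(num_items) to O(num_sampled); references [2] and [3] discuss the importance-sampling corrections that make the resulting gradient estimate less biased.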

Example Usage::

    model = YoutubeDNNRetrievalModel(schema, num_sampled=100)
    model.compile(optimizer="adam")
    model.fit(train_data, epochs=10)

References

[1] Covington, Paul, Jay Adams, and Emre Sargin. "Deep neural networks for YouTube recommendations." Proceedings of the 10th ACM Conference on Recommender Systems. 2016.

[2] Yoshua Bengio and Jean-Sébastien Sénécal. 2003. Quick Training of Probabilistic Neural Nets by Importance Sampling. In Proceedings of the Conference on Artificial Intelligence and Statistics (AISTATS).

[3] Y. Bengio and J. S. Senecal. 2008. Adaptive Importance Sampling to Accelerate Training of a Neural Probabilistic Language Model. Trans. Neur. Netw. 19, 4 (April 2008), 713–722. https://doi.org/10.1109/TNN.2007.912312

[4] Jean, Sébastien, et al. "On using very large target vocabulary for neural machine translation." arXiv preprint arXiv:1412.2007 (2014).

Parameters
  • schema (Schema) – The Schema with the input features

  • max_seq_length (int) – The maximum length of the input sequences

  • aggregation (str) – The aggregation method to use for the sequence of features. Defaults to concat.

  • top_block (Block) – The Block that combines the top features

  • l2_normalization (bool) – Whether to apply L2 normalization before computing dot interactions. Defaults to True.

  • extra_pre_call (Optional[Block]) – The optional Block to apply before the model.

  • task_block (Optional[Block]) – The optional Block to apply on the model.

  • logits_temperature (float) – Temperature parameter used to reduce model overconfidence: the logits are divided by this value (logits / T). Defaults to 1.0.

  • seq_aggregator (Block) – The Block to aggregate the sequence of features.

  • sampled_softmax (bool) – Whether to compute the logits scores over a sampled subset of candidates (sampled softmax) instead of over all items of the catalog. Defaults to True.

  • num_sampled (int) – When sampled_softmax is enabled, the number of negative candidates to generate for each batch. Defaults to 100.

  • min_sampled_id (int) – The minimum id value to be sampled with sampled softmax. Useful to ignore the first categorical encoded ids, which are usually reserved for <nulls>, out-of-vocabulary or padding. Defaults to 0.

  • embedding_options (EmbeddingOptions, optional) – An EmbeddingOptions instance, which allows for a number of options for the embedding table, by default EmbeddingOptions()
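To see how the l2_normalization and logits_temperature parameters interact at scoring time, here is a minimal NumPy sketch. The function name score_items and the shapes are assumptions for illustration only: with L2 normalization the dot product becomes a cosine similarity in [-1, 1], which the temperature then rescales.

```python
import numpy as np

def score_items(user, items, temperature=1.0, l2_normalization=True):
    """Score candidate items against a user vector (illustrative sketch).

    With l2_normalization=True the dot product is a cosine similarity,
    so raw scores lie in [-1, 1]; dividing by a temperature > 1 flattens
    the softmax distribution over those scores, reducing overconfidence.
    """
    if l2_normalization:
        user = user / np.linalg.norm(user)
        items = items / np.linalg.norm(items, axis=1, keepdims=True)
    return (items @ user) / temperature

rng = np.random.default_rng(0)
user = rng.normal(size=8)
items = rng.normal(size=(5, 8))

scores = score_items(user, items, temperature=1.0)
cooler = score_items(user, items, temperature=2.0)  # same ranking, softer softmax
```

Dividing by the temperature leaves the item ranking unchanged; it only changes how peaked the softmax over the scores is.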