merlin.models.tf.MMOEBlock
- merlin.models.tf.MMOEBlock(outputs: Union[List[str], List[merlin.models.tf.prediction_tasks.base.PredictionTask], merlin.models.tf.prediction_tasks.base.ParallelPredictionBlock, merlin.models.tf.core.combinators.ParallelBlock], expert_block: merlin.models.tf.core.base.Block, num_experts: int, gate_block: Optional[merlin.models.tf.core.base.Block] = None, gate_softmax_temperature: float = 1.0, **gate_kwargs) → merlin.models.tf.core.combinators.SequentialBlock
Implements the Multi-gate Mixture-of-Experts (MMoE) introduced in [1].
References
[1] Ma, Jiaqi, et al. “Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts.” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018.
- Parameters
outputs (Union[List[str], List[PredictionTask], ParallelPredictionBlock, ParallelBlock]) – Task names, or PredictionTask/ParallelPredictionBlock objects from which the task names can be extracted. One gate is created per task.
expert_block (Block) – Expert block to be replicated, e.g. MLPBlock([64])
num_experts (int) – Number of experts to be replicated
gate_block (Block, optional) – Optional Block (e.g. MLPBlock([32])) applied to the gate inputs before the final projection layer (created automatically), which outputs a softmax distribution over the experts. This gives the gates more capacity to learn from the inputs how to best combine the experts.
gate_softmax_temperature (float, optional) – The softmax temperature used by the gates, by default 1.0. Temperatures above 1.0 smooth the distribution of weights over the expert outputs.
- Returns
The sequence of blocks that implements MMoE
- Return type
SequentialBlock
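A minimal usage sketch, not taken from the library documentation: the task names, layer sizes, and model wiring below are illustrative assumptions; only the MMOEBlock arguments follow the signature above.

```python
import merlin.models.tf as mm

# Hypothetical multi-task setup: two binary targets, "click" and "like".
prediction_tasks = mm.ParallelPredictionBlock(
    mm.BinaryClassificationTask("click"),
    mm.BinaryClassificationTask("like"),
)

# One gate per task, each mixing the outputs of 4 replicated experts.
mmoe = mm.MMOEBlock(
    outputs=prediction_tasks,          # task names are extracted from the tasks
    expert_block=mm.MLPBlock([64]),    # expert architecture to replicate
    num_experts=4,
    gate_block=mm.MLPBlock([32]),      # optional extra capacity inside each gate
    gate_softmax_temperature=1.0,      # raise above 1.0 to smooth expert weights
)

# Assumed wiring into a model; `schema` is assumed to come from the training
# dataset, e.g. merlin.io.Dataset(...).schema.
model = mm.Model(
    mm.InputBlock(schema),
    mmoe,
    prediction_tasks,
)
```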