(outputs: Union[List[str], List[PredictionTask], ParallelPredictionBlock, ParallelBlock], expert_block: Block, num_experts: int, gate_block: Optional[Block] = None, gate_softmax_temperature: float = 1.0, **gate_kwargs)

Implements the Multi-gate Mixture-of-Experts (MMoE) introduced in [1].


[1] Ma, Jiaqi, et al. “Modeling task relationships in multi-task learning with multi-gate mixture-of-experts.” Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 2018.

  • outputs (Union[List[str], List[PredictionTask], ParallelPredictionBlock, ParallelBlock]) – Task names, or PredictionTask/ParallelPredictionBlock/ParallelBlock objects from which the task names can be extracted. One gate is created per task.

  • expert_block (Block) – Expert block to be replicated, e.g. MLPBlock([64])

  • num_experts (int) – Number of experts, i.e. how many times expert_block is replicated

  • gate_block (Block, optional) – Optional Block (e.g. MLPBlock([32])) applied to the gate inputs before the final projection layer (created automatically), which outputs a softmax distribution over the experts. This can give each gate more capacity to learn from the inputs how best to combine the experts.

  • gate_softmax_temperature (float, optional) – The softmax temperature used by the gates, by default 1.0. It can be used to smooth the distribution of weights over the experts' outputs.
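
The interaction of these parameters can be illustrated with a minimal NumPy sketch of the MMoE computation from [1]. It is not the library's implementation: the helper names (softmax, mmoe_forward), the task names ("click", "like"), and the single ReLU layer standing in for expert_block are all hypothetical, and the sketch assumes the usual convention that the gate logits are divided by the temperature before the softmax.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    # Assumed convention: logits are divided by the temperature,
    # so temperature > 1 flattens the distribution.
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)


def mmoe_forward(x, expert_weights, gate_weights, temperature=1.0):
    """Hypothetical MMoE forward pass.

    x:              (batch, d_in) shared input
    expert_weights: list of (d_in, d_out) matrices, one per expert
    gate_weights:   dict mapping task name -> (d_in, num_experts) matrix
    """
    # Every expert sees the same input: (batch, num_experts, d_out).
    expert_out = np.stack(
        [np.maximum(x @ w, 0.0) for w in expert_weights], axis=1
    )
    outputs = {}
    for task, w_gate in gate_weights.items():
        # One gate per task: a softmax distribution over the experts.
        gate = softmax(x @ w_gate, temperature)  # (batch, num_experts)
        # Task-specific weighted sum of the expert outputs.
        outputs[task] = np.einsum("be,bed->bd", gate, expert_out)
    return outputs


rng = np.random.default_rng(0)
batch, d_in, d_out, num_experts = 4, 8, 16, 3
x = rng.normal(size=(batch, d_in))
expert_weights = [rng.normal(size=(d_in, d_out)) for _ in range(num_experts)]
gate_weights = {
    task: rng.normal(size=(d_in, num_experts)) for task in ("click", "like")
}
task_inputs = mmoe_forward(x, expert_weights, gate_weights, temperature=1.0)
```

Each entry of task_inputs is the mixture of expert outputs that would feed the corresponding task's head. Under the assumed convention, raising the temperature moves each gate toward a uniform average of the experts, while lowering it concentrates the weight on fewer experts.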


Returns the sequence of blocks that implement MMoE.

Return type