transformers4rec.config package
Submodules
transformers4rec.config.schema module
transformers4rec.config.trainer module
-
class transformers4rec.config.trainer.T4RecTrainingArguments
(output_dir: str, overwrite_output_dir: bool = False, do_train: bool = False, do_eval: bool = False, do_predict: bool = False, evaluation_strategy: Union[transformers.trainer_utils.IntervalStrategy, str] = 'no', prediction_loss_only: bool = False, per_device_train_batch_size: int = 8, per_device_eval_batch_size: int = 8, per_gpu_train_batch_size: Optional[int] = None, per_gpu_eval_batch_size: Optional[int] = None, gradient_accumulation_steps: int = 1, eval_accumulation_steps: Optional[int] = None, eval_delay: Optional[float] = 0, learning_rate: float = 5e-05, weight_decay: float = 0.0, adam_beta1: float = 0.9, adam_beta2: float = 0.999, adam_epsilon: float = 1e-08, max_grad_norm: float = 1.0, num_train_epochs: float = 3.0, max_steps: int = - 1, lr_scheduler_type: Union[transformers.trainer_utils.SchedulerType, str] = 'linear', warmup_ratio: float = 0.0, warmup_steps: int = 0, log_level: Optional[str] = 'passive', log_level_replica: Optional[str] = 'warning', log_on_each_node: bool = True, logging_dir: Optional[str] = None, logging_strategy: Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps', logging_first_step: bool = False, logging_steps: float = 500, logging_nan_inf_filter: bool = True, save_strategy: Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps', save_steps: float = 500, save_total_limit: Optional[int] = None, save_safetensors: Optional[bool] = False, save_on_each_node: bool = False, no_cuda: bool = False, use_mps_device: bool = False, seed: int = 42, data_seed: Optional[int] = None, jit_mode_eval: bool = False, use_ipex: bool = False, bf16: bool = False, fp16: bool = False, fp16_opt_level: str = 'O1', half_precision_backend: str = 'auto', bf16_full_eval: bool = False, fp16_full_eval: bool = False, tf32: Optional[bool] = None, local_rank: int = - 1, ddp_backend: Optional[str] = None, tpu_num_cores: Optional[int] = None, tpu_metrics_debug: bool = False, debug: str = '', dataloader_drop_last: bool = False, eval_steps: Optional[float] = None, dataloader_num_workers: int = 0, past_index: int = - 1, run_name: Optional[str] = None, disable_tqdm: Optional[bool] = None, remove_unused_columns: Optional[bool] = True, label_names: Optional[List[str]] = None, load_best_model_at_end: Optional[bool] = False, metric_for_best_model: Optional[str] = None, greater_is_better: Optional[bool] = None, ignore_data_skip: bool = False, sharded_ddp: str = '', fsdp: str = '', fsdp_min_num_params: int = 0, fsdp_config: Optional[str] = None, fsdp_transformer_layer_cls_to_wrap: Optional[str] = None, deepspeed: Optional[str] = None, label_smoothing_factor: float = 0.0, optim: Union[transformers.training_args.OptimizerNames, str] = 'adamw_hf', optim_args: Optional[str] = None, adafactor: bool = False, group_by_length: bool = False, length_column_name: Optional[str] = 'length', report_to: Optional[List[str]] = None, ddp_find_unused_parameters: Optional[bool] = None, ddp_bucket_cap_mb: Optional[int] = None, dataloader_pin_memory: bool = True, skip_memory_metrics: bool = True, use_legacy_prediction_loop: bool = False, push_to_hub: bool = False, resume_from_checkpoint: Optional[str] = None, hub_model_id: Optional[str] = None, hub_strategy: Union[transformers.trainer_utils.HubStrategy, str] = 'every_save', hub_token: Optional[str] = None, hub_private_repo: bool = False, gradient_checkpointing: bool = False, include_inputs_for_metrics: bool = False, fp16_backend: str = 'auto', push_to_hub_model_id: Optional[str] = None, push_to_hub_organization: Optional[str] = None, push_to_hub_token: 
Optional[str] = None, mp_parameters: str = '', auto_find_batch_size: bool = False, full_determinism: bool = False, torchdynamo: Optional[str] = None, ray_scope: Optional[str] = 'last', ddp_timeout: Optional[int] = 1800, torch_compile: bool = False, torch_compile_backend: Optional[str] = None, torch_compile_mode: Optional[str] = None, xpu_backend: Optional[str] = None, max_sequence_length: Optional[int] = None, shuffle_buffer_size: int = 0, data_loader_engine: str = 'merlin', eval_on_test_set: bool = False, eval_steps_on_train_set: int = 20, predict_top_k: int = 100, learning_rate_num_cosine_cycles_by_epoch: float = 1.25, log_predictions: bool = False, compute_metrics_each_n_steps: int = 1, experiments_group: str = 'default')[source] Bases:
transformers.training_args.TrainingArguments
Class that inherits HF TrainingArguments and adds, on top of it, the arguments needed for session-based and sequential recommendation.
- Parameters
shuffle_buffer_size (int) –
validate_every (Optional[int], int) – Run validation on the evaluation set every this many epochs. -1 means no validation is run. By default -1.
eval_on_test_set (bool) –
eval_steps_on_train_set (int) –
predict_top_k (Optional[int], int) – Truncate the recommendation list to the top-K highest-scored predicted items (this does not affect evaluation metrics computation). This parameter is specific to NextItemPredictionTask and only affects model.predict() and model.evaluate(), which both call Trainer.evaluation_loop. By default 100.
log_predictions (Optional[bool], bool) – Log predictions, labels and metadata features every --compute_metrics_each_n_steps (for the test set). By default False.
log_attention_weights (Optional[bool], bool) – Log the inputs and attention weights every --eval_steps (test set only). By default False.
learning_rate_num_cosine_cycles_by_epoch (Optional[float], float) – Number of cosine cycles per epoch when --lr_scheduler_type = cosine_with_warmup, i.e. the number of waves in the cosine schedule (e.g. 0.5 decreases from the max value to 0, following a half-cosine). By default 1.25.
experiments_group (Optional[str], str) – Name of the experiments group, used to organize job runs logged on W&B. By default 'default'.
-
property place_model_on_device
Overrides the parent property to allow running training on CPU.
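A minimal usage sketch (the output directory and hyperparameter values below are illustrative only):

from transformers4rec.config.trainer import T4RecTrainingArguments

# Illustrative values; any field accepted by HF TrainingArguments can also be set here.
training_args = T4RecTrainingArguments(
    output_dir="./t4rec_checkpoints",   # hypothetical output path
    max_sequence_length=20,
    data_loader_engine="merlin",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=32,
    num_train_epochs=3,
    learning_rate=5e-4,
    predict_top_k=100,
)

These arguments are typically passed, together with a model and a schema, to the Transformers4Rec PyTorch Trainer.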
-
class transformers4rec.config.trainer.T4RecTrainingArgumentsTF
(output_dir: str, overwrite_output_dir: bool = False, do_train: bool = False, do_eval: bool = False, do_predict: bool = False, evaluation_strategy: Union[transformers.trainer_utils.IntervalStrategy, str] = 'no', prediction_loss_only: bool = False, per_device_train_batch_size: int = 8, per_device_eval_batch_size: int = 8, per_gpu_train_batch_size: Optional[int] = None, per_gpu_eval_batch_size: Optional[int] = None, gradient_accumulation_steps: int = 1, eval_accumulation_steps: Optional[int] = None, eval_delay: Optional[float] = 0, learning_rate: float = 5e-05, weight_decay: float = 0.0, adam_beta1: float = 0.9, adam_beta2: float = 0.999, adam_epsilon: float = 1e-08, max_grad_norm: float = 1.0, num_train_epochs: float = 3.0, max_steps: int = - 1, lr_scheduler_type: Union[transformers.trainer_utils.SchedulerType, str] = 'linear', warmup_ratio: float = 0.0, warmup_steps: int = 0, log_level: Optional[str] = 'passive', log_level_replica: Optional[str] = 'warning', log_on_each_node: bool = True, logging_dir: Optional[str] = None, logging_strategy: Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps', logging_first_step: bool = False, logging_steps: float = 500, logging_nan_inf_filter: bool = True, save_strategy: Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps', save_steps: float = 500, save_total_limit: Optional[int] = None, save_safetensors: Optional[bool] = False, save_on_each_node: bool = False, no_cuda: bool = False, use_mps_device: bool = False, seed: int = 42, data_seed: Optional[int] = None, jit_mode_eval: bool = False, use_ipex: bool = False, bf16: bool = False, fp16: bool = False, fp16_opt_level: str = 'O1', half_precision_backend: str = 'auto', bf16_full_eval: bool = False, fp16_full_eval: bool = False, tf32: Optional[bool] = None, local_rank: int = - 1, ddp_backend: Optional[str] = None, tpu_num_cores: Optional[int] = None, tpu_metrics_debug: bool = False, debug: str = '', dataloader_drop_last: bool = False, eval_steps: Optional[float] = None, dataloader_num_workers: int = 0, past_index: int = - 1, run_name: Optional[str] = None, disable_tqdm: Optional[bool] = None, remove_unused_columns: Optional[bool] = True, label_names: Optional[List[str]] = None, load_best_model_at_end: Optional[bool] = False, metric_for_best_model: Optional[str] = None, greater_is_better: Optional[bool] = None, ignore_data_skip: bool = False, sharded_ddp: str = '', fsdp: str = '', fsdp_min_num_params: int = 0, fsdp_config: Optional[str] = None, fsdp_transformer_layer_cls_to_wrap: Optional[str] = None, deepspeed: Optional[str] = None, label_smoothing_factor: float = 0.0, optim: Union[transformers.training_args.OptimizerNames, str] = 'adamw_hf', optim_args: Optional[str] = None, adafactor: bool = False, group_by_length: bool = False, length_column_name: Optional[str] = 'length', report_to: Optional[List[str]] = None, ddp_find_unused_parameters: Optional[bool] = None, ddp_bucket_cap_mb: Optional[int] = None, dataloader_pin_memory: bool = True, skip_memory_metrics: bool = True, use_legacy_prediction_loop: bool = False, push_to_hub: bool = False, resume_from_checkpoint: Optional[str] = None, hub_model_id: Optional[str] = None, hub_strategy: Union[transformers.trainer_utils.HubStrategy, str] = 'every_save', hub_token: Optional[str] = None, hub_private_repo: bool = False, gradient_checkpointing: bool = False, include_inputs_for_metrics: bool = False, fp16_backend: str = 'auto', push_to_hub_model_id: Optional[str] = None, push_to_hub_organization: Optional[str] = None, push_to_hub_token: 
Optional[str] = None, mp_parameters: str = '', auto_find_batch_size: bool = False, full_determinism: bool = False, torchdynamo: Optional[str] = None, ray_scope: Optional[str] = 'last', ddp_timeout: Optional[int] = 1800, torch_compile: bool = False, torch_compile_backend: Optional[str] = None, torch_compile_mode: Optional[str] = None, xpu_backend: Optional[str] = None, max_sequence_length: Optional[int] = None, shuffle_buffer_size: int = 0, data_loader_engine: str = 'merlin', eval_on_test_set: bool = False, eval_steps_on_train_set: int = 20, predict_top_k: int = 100, learning_rate_num_cosine_cycles_by_epoch: float = 1.25, log_predictions: bool = False, compute_metrics_each_n_steps: int = 1, experiments_group: str = 'default')[source] Bases:
transformers4rec.config.trainer.T4RecTrainingArguments, transformers.training_args_tf.TFTrainingArguments
Prepares training arguments for TFTrainer; inherits arguments from both T4RecTrainingArguments and TFTrainingArguments.
transformers4rec.config.transformer module
-
class transformers4rec.config.transformer.T4RecConfig[source] Bases:
object
A class responsible for setting the configuration of the transformers class from Hugging Face and returning the corresponding T4Rec model.
-
to_huggingface_torch_model
()[source] Instantiate a Hugging Face transformer model based on the configuration parameters of the class.
- Returns
The Hugging Face transformer model.
- Return type
transformers.PreTrainedModel
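For example, a short sketch using one of the config subclasses documented below (the dimensions are illustrative):

from transformers4rec.config.transformer import GPT2Config

# Build a T4Rec GPT2 configuration, then export the underlying HF model.
config = GPT2Config.build(d_model=64, n_head=4, n_layer=2, total_seq_length=20)
hf_model = config.to_huggingface_torch_model()  # a transformers.PreTrainedModel subclass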
-
to_torch_model
(input_features, *prediction_task, task_blocks=None, task_weights=None, loss_reduction='mean', **kwargs)[source] Links the Hugging Face transformer model to the given input block and prediction tasks, and returns a T4Rec model.
- Parameters
input_features (torch4rec.TabularSequenceFeatures) – The sequential block that represents the input features and defines the masking strategy for training and evaluation.
prediction_task (torch4rec.PredictionTask) – One or multiple prediction tasks.
task_blocks (list, optional) – List of task-specific blocks that we apply on top of the HF transformer’s output.
task_weights (list, optional) – List of the weights to use for combining the tasks losses.
loss_reduction (str, optional) – The reduction to apply to the prediction losses. Possible values are: 'none' (no reduction is applied), 'mean' (the weighted mean of the output is taken) and 'sum' (the output is summed). By default 'mean'.
- Returns
The T4Rec torch model.
- Return type
torch4rec.Model
- Raises
ValueError – If input block or prediction task is of the wrong type.
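An end-to-end sketch, assuming a Merlin schema object describing the sequential features is already available (column names, masking choice and dimensions are illustrative):

import transformers4rec.torch as tr

# `schema` is assumed to be a pre-loaded Merlin schema.
input_module = tr.TabularSequenceFeatures.from_schema(
    schema,
    max_sequence_length=20,
    d_output=64,
    masking="causal",   # the input block defines the masking strategy
)

# One prediction task; several tasks can be passed for multi-task training.
prediction_task = tr.NextItemPredictionTask(weight_tying=True)

# Any T4RecConfig subclass can be used here; XLNetConfig is shown for illustration.
config = tr.XLNetConfig.build(d_model=64, n_head=4, n_layer=2, total_seq_length=20)
model = config.to_torch_model(input_module, prediction_task, loss_reduction="mean")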
-
property transformers_config_cls
-
-
class transformers4rec.config.transformer.ReformerConfig
(attention_head_size=64, attn_layers=['local', 'lsh', 'local', 'lsh', 'local', 'lsh'], axial_norm_std=1.0, axial_pos_embds=True, axial_pos_shape=[64, 64], axial_pos_embds_dim=[64, 192], chunk_size_lm_head=0, eos_token_id=2, feed_forward_size=512, hash_seed=None, hidden_act='relu', hidden_dropout_prob=0.05, hidden_size=256, initializer_range=0.02, is_decoder=False, layer_norm_eps=1e-12, local_num_chunks_before=1, local_num_chunks_after=0, local_attention_probs_dropout_prob=0.05, local_attn_chunk_length=64, lsh_attn_chunk_length=64, lsh_attention_probs_dropout_prob=0.0, lsh_num_chunks_before=1, lsh_num_chunks_after=0, max_position_embeddings=4096, num_attention_heads=12, num_buckets=None, num_hashes=1, pad_token_id=0, vocab_size=320, tie_word_embeddings=False, use_cache=True, classifier_dropout=None, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig, transformers.models.reformer.configuration_reformer.ReformerConfig
Subclass of T4RecConfig and transformers.ReformerConfig from Hugging Face. It handles configuration for Reformer layers in the context of T4Rec models.
-
classmethod build
(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, axial_pos_shape_first_dim=4, **kwargs)[source] Creates an instance of ReformerConfig with the given parameters.
- Parameters
transformer_cfg_parameters – The common transformer configuration parameters (d_model, n_head, n_layer, total_seq_length, etc.) listed in the build signature above.
axial_pos_shape_first_dim (int, optional) – The first dimension of the axial position encodings. During training, the product of the position dims has to be equal to the sequence length.
- Returns
An instance of ReformerConfig.
- Return type
ReformerConfig
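A small sketch of the axial-position constraint noted above: with total_seq_length=20 and axial_pos_shape_first_dim=4, the sequence length factors into position dims of 4 and 5 (values are illustrative):

from transformers4rec.config.transformer import ReformerConfig

# total_seq_length should be divisible by axial_pos_shape_first_dim so that the
# product of the axial position dims can equal the sequence length (4 * 5 == 20).
reformer_config = ReformerConfig.build(
    d_model=64,
    n_head=4,
    n_layer=2,
    total_seq_length=20,
    axial_pos_shape_first_dim=4,
)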
-
class transformers4rec.config.transformer.GPT2Config
(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12, n_head=12, n_inner=None, activation_function='gelu_new', resid_pdrop=0.1, embd_pdrop=0.1, attn_pdrop=0.1, layer_norm_epsilon=1e-05, initializer_range=0.02, summary_type='cls_index', summary_use_proj=True, summary_activation=None, summary_proj_to_labels=True, summary_first_dropout=0.1, scale_attn_weights=True, use_cache=True, bos_token_id=50256, eos_token_id=50256, scale_attn_by_inverse_layer_idx=False, reorder_and_upcast_attn=False, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig, transformers.models.gpt2.configuration_gpt2.GPT2Config
Subclass of T4RecConfig and transformers.GPT2Config from Hugging Face. It handles configuration for GPT2 layers in the context of T4Rec models.
-
classmethod build
(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source] Creates an instance of GPT2Config with the given parameters.
- Parameters
transformer_cfg_parameters – The common transformer configuration parameters (d_model, n_head, n_layer, total_seq_length, etc.) listed in the build signature above.
- Returns
An instance of GPT2Config.
- Return type
GPT2Config
-
class transformers4rec.config.transformer.LongformerConfig
(attention_window: Union[List[int], int] = 512, sep_token_id: int = 2, pad_token_id: int = 1, bos_token_id: int = 0, eos_token_id: int = 2, vocab_size: int = 30522, hidden_size: int = 768, num_hidden_layers: int = 12, num_attention_heads: int = 12, intermediate_size: int = 3072, hidden_act: str = 'gelu', hidden_dropout_prob: float = 0.1, attention_probs_dropout_prob: float = 0.1, max_position_embeddings: int = 512, type_vocab_size: int = 2, initializer_range: float = 0.02, layer_norm_eps: float = 1e-12, onnx_export: bool = False, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig, transformers.models.longformer.configuration_longformer.LongformerConfig
Subclass of T4RecConfig and transformers.LongformerConfig from Hugging Face. It handles configuration for Longformer layers in the context of T4Rec models.
-
classmethod build
(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source] Creates an instance of LongformerConfig with the given parameters.
- Parameters
transformer_cfg_parameters – The common transformer configuration parameters (d_model, n_head, n_layer, total_seq_length, etc.) listed in the build signature above.
- Returns
An instance of LongformerConfig.
- Return type
LongformerConfig
-
class transformers4rec.config.transformer.ElectraConfig
(vocab_size=30522, embedding_size=128, hidden_size=256, num_hidden_layers=12, num_attention_heads=4, intermediate_size=1024, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, summary_type='first', summary_use_proj=True, summary_activation='gelu', summary_last_dropout=0.1, pad_token_id=0, position_embedding_type='absolute', use_cache=True, classifier_dropout=None, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig, transformers.models.electra.configuration_electra.ElectraConfig
Subclass of T4RecConfig and transformers.ElectraConfig from Hugging Face. It handles configuration for ELECTRA layers in the context of T4Rec models.
-
classmethod build
(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source] Creates an instance of ElectraConfig with the given parameters.
- Parameters
transformer_cfg_parameters – The common transformer configuration parameters (d_model, n_head, n_layer, total_seq_length, etc.) listed in the build signature above.
- Returns
An instance of ElectraConfig.
- Return type
ElectraConfig
-
class transformers4rec.config.transformer.AlbertConfig
(vocab_size=30000, embedding_size=128, hidden_size=4096, num_hidden_layers=12, num_hidden_groups=1, num_attention_heads=64, intermediate_size=16384, inner_group_num=1, hidden_act='gelu_new', hidden_dropout_prob=0, attention_probs_dropout_prob=0, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, classifier_dropout_prob=0.1, position_embedding_type='absolute', pad_token_id=0, bos_token_id=2, eos_token_id=3, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig, transformers.models.albert.configuration_albert.AlbertConfig
Subclass of T4RecConfig and transformers.AlbertConfig from Hugging Face. It handles configuration for ALBERT layers in the context of T4Rec models.
-
classmethod build
(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source] Creates an instance of AlbertConfig with the given parameters.
- Parameters
transformer_cfg_parameters – The common transformer configuration parameters (d_model, n_head, n_layer, total_seq_length, etc.) listed in the build signature above.
- Returns
An instance of AlbertConfig.
- Return type
AlbertConfig
-
class transformers4rec.config.transformer.XLNetConfig
(vocab_size=32000, d_model=1024, n_layer=24, n_head=16, d_inner=4096, ff_activation='gelu', untie_r=True, attn_type='bi', initializer_range=0.02, layer_norm_eps=1e-12, dropout=0.1, mem_len=512, reuse_len=None, use_mems_eval=True, use_mems_train=False, bi_data=False, clamp_len=- 1, same_length=False, summary_type='last', summary_use_proj=True, summary_activation='tanh', summary_last_dropout=0.1, start_n_top=5, end_n_top=5, pad_token_id=5, bos_token_id=1, eos_token_id=2, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig, transformers.models.xlnet.configuration_xlnet.XLNetConfig
Subclass of T4RecConfig and transformers.XLNetConfig from Hugging Face. It handles configuration for XLNet layers in the context of T4Rec models.
-
classmethod build
(d_model, n_head, n_layer, total_seq_length=None, attn_type='bi', hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, mem_len=1, **kwargs)[source] Creates an instance of XLNetConfig with the given parameters.
- Parameters
transformer_cfg_parameters – The common transformer configuration parameters (d_model, n_head, n_layer, total_seq_length, etc.) listed in the build signature above.
mem_len (int, optional) – The number of tokens to be cached. Pre-computed key/value pairs from a previous forward pass are stored and are not re-computed. This parameter is especially useful for long-sequence modeling, where the full sequence may be truncated across different batches; tasks like user-aware recommendation can benefit from this feature. By default 1, which means no caching is used.
- Returns
An instance of XLNetConfig.
- Return type
XLNetConfig
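For instance, a sketch enabling the cached key/value mechanism described above (the mem_len value and dimensions are illustrative):

from transformers4rec.config.transformer import XLNetConfig

# mem_len=1 (the default) disables caching; a larger value caches that many tokens'
# key/value pairs from previous forward passes.
xlnet_config = XLNetConfig.build(
    d_model=64,
    n_head=4,
    n_layer=2,
    total_seq_length=20,
    mem_len=50,
)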
-
class transformers4rec.config.transformer.BertConfig
(vocab_size=30522, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, pad_token_id=0, position_embedding_type='absolute', use_cache=True, classifier_dropout=None, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig, transformers.models.bert.configuration_bert.BertConfig
Subclass of T4RecConfig and transformers.BertConfig from Hugging Face. It handles configuration for BERT layers in the context of T4Rec models.
-
classmethod build
(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source] Creates an instance of BertConfig with the given parameters.
- Parameters
transformer_cfg_parameters – The common transformer configuration parameters (d_model, n_head, n_layer, total_seq_length, etc.) listed in the build signature above.
- Returns
An instance of BertConfig.
- Return type
BertConfig
-
class transformers4rec.config.transformer.RobertaConfig
(vocab_size=50265, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, pad_token_id=1, bos_token_id=0, eos_token_id=2, position_embedding_type='absolute', use_cache=True, classifier_dropout=None, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig, transformers.models.roberta.configuration_roberta.RobertaConfig
Subclass of T4RecConfig and transformers.RobertaConfig from Hugging Face. It handles configuration for RoBERTa layers in the context of T4Rec models.
-
classmethod build
(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source] Creates an instance of RobertaConfig with the given parameters.
- Parameters
transformer_cfg_parameters – The common transformer configuration parameters (d_model, n_head, n_layer, total_seq_length, etc.) listed in the build signature above.
- Returns
An instance of RobertaConfig.
- Return type
RobertaConfig
-
class transformers4rec.config.transformer.TransfoXLConfig
(vocab_size=267735, cutoffs=[20000, 40000, 200000], d_model=1024, d_embed=1024, n_head=16, d_head=64, d_inner=4096, div_val=4, pre_lnorm=False, n_layer=18, mem_len=1600, clamp_len=1000, same_length=True, proj_share_all_but_first=True, attn_type=0, sample_softmax=- 1, adaptive=True, dropout=0.1, dropatt=0.0, untie_r=True, init='normal', init_range=0.01, proj_init_std=0.01, init_std=0.02, layer_norm_epsilon=1e-05, eos_token_id=0, **kwargs)[source] Bases:
transformers4rec.config.transformer.T4RecConfig, transformers.models.transfo_xl.configuration_transfo_xl.TransfoXLConfig
Subclass of T4RecConfig and transformers.TransfoXLConfig from Hugging Face. It handles configuration for Transformer-XL layers in the context of T4Rec models.
-
classmethod build
(d_model, n_head, n_layer, total_seq_length, hidden_act='gelu', initializer_range=0.01, layer_norm_eps=0.03, dropout=0.3, pad_token=0, log_attention_weights=False, **kwargs)[source] Creates an instance of TransfoXLConfig with the given parameters.
- Parameters
transformer_cfg_parameters – The common transformer configuration parameters (d_model, n_head, n_layer, total_seq_length, etc.) listed in the build signature above.
- Returns
An instance of TransfoXLConfig.
- Return type
TransfoXLConfig