Torch Dataloader

class nvtabular.loader.torch.IterDL(file_paths, batch_size=1, shuffle=False)[source]

Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]

class nvtabular.loader.torch.TorchAsyncItr(dataset, cats=None, conts=None, labels=None, batch_size=1, shuffle=False, seed_fn=None, parts_per_chunk=1, device=None, global_size=None, global_rank=None, drop_last=False, sparse_names=None, sparse_max=None, sparse_as_dense=False)[source]

Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]

This class creates batches of tensors. Each batch has the size specified by the user, and the input data must be an NVTabular Dataset. Spillover between partitions is handled so that every batch is the specified size, with the possible exception of the final batch.

Parameters
  • dataset (NVTabular dataset) – the NVTabular Dataset to draw batches from

  • cats ([str]) – the list of categorical columns in the dataset

  • conts ([str]) – the list of continuous columns in the dataset

  • labels ([str]) – the list of label columns in the dataset

  • batch_size (int) – the size of each batch to supply to the model

  • shuffle (bool) – enable/disable shuffling of dataset

  • parts_per_chunk (int) – number of partitions of the underlying NVTabular Dataset to concatenate into a single “chunk”

  • device (int) – device id of selected GPU

  • sparse_names ([str]) – list of column names for the columns that should be represented as sparse tensors

  • sparse_max ({str: int}) – dictionary mapping each sparse column name to the maximum sequence length for that column

  • sparse_as_dense (bool) – when True, sparse tensors are converted to dense tensors
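
A minimal usage sketch of TorchAsyncItr; the Parquet path, column names, and batch size below are hypothetical placeholders rather than part of the API:

    import nvtabular as nvt
    from nvtabular.loader.torch import TorchAsyncItr

    # hypothetical dataset and schema; substitute your own columns
    dataset = nvt.Dataset("/path/to/train.parquet")

    batches = TorchAsyncItr(
        dataset,
        cats=["item_id", "user_id"],   # categorical columns
        conts=["price"],               # continuous columns
        labels=["click"],              # label column
        batch_size=65536,
        shuffle=True,
        parts_per_chunk=1,
    )

    for batch in batches:
        # each batch holds the categorical, continuous, and label tensors
        # for one step, already on the selected GPU
        pass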

class nvtabular.loader.torch.DLDataLoader(dataset: torch.utils.data.dataset.Dataset[T_co], batch_size: Optional[int] = 1, shuffle: bool = False, sampler: Optional[Union[torch.utils.data.sampler.Sampler, Iterable]] = None, batch_sampler: Optional[Union[torch.utils.data.sampler.Sampler[Sequence], Iterable[Sequence]]] = None, num_workers: int = 0, collate_fn: Optional[Callable[[List[T]], Any]] = None, pin_memory: bool = False, drop_last: bool = False, timeout: float = 0, worker_init_fn: Optional[Callable[[int], None]] = None, multiprocessing_context=None, generator=None, *, prefetch_factor: int = 2, persistent_workers: bool = False)[source]

Bases: Generic[torch.utils.data.dataloader.T_co]

This class is an extension of the PyTorch DataLoader. It is required to support the FastAI framework.

property device
dataset: torch.utils.data.dataset.Dataset[T_co]
batch_size: Optional[int]
num_workers: int
pin_memory: bool
drop_last: bool
timeout: float
sampler: Union[torch.utils.data.sampler.Sampler, Iterable]
prefetch_factor: int
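
A sketch of wrapping TorchAsyncItr in DLDataLoader, following the pattern used in the NVTabular example notebooks; the path and column names are placeholders, and the pass-through collate_fn reflects that batching already happens inside TorchAsyncItr:

    import nvtabular as nvt
    from nvtabular.loader.torch import TorchAsyncItr, DLDataLoader

    batches = TorchAsyncItr(
        nvt.Dataset("/path/to/train.parquet"),
        cats=["item_id", "user_id"],
        conts=["price"],
        labels=["click"],
        batch_size=65536,
    )

    train_loader = DLDataLoader(
        batches,
        batch_size=None,         # TorchAsyncItr already produces full batches
        collate_fn=lambda x: x,  # pass each pre-built batch through unchanged
        pin_memory=False,
        num_workers=0,
    )
    # train_loader now exposes a standard DataLoader interface (including the
    # device property above) that frameworks such as FastAI can consume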

Torch Layers

class nvtabular.framework_utils.torch.layers.embeddings.ConcatenatedEmbeddings(embedding_table_shapes, dropout=0.0, sparse_columns=())[source]

Bases: torch.nn.modules.module.Module

Map multiple categorical variables to concatenated embeddings.

Parameters
  • embedding_table_shapes – A dictionary mapping column names to (cardinality, embedding_size) tuples.

  • dropout – A float giving the dropout probability.

  • sparse_columns – A list of sparse columns

Inputs:

x: An int64 Tensor with shape [batch_size, num_variables].

Outputs:

A Float Tensor with shape [batch_size, embedding_size_after_concat].

forward(x)[source]
training: bool
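
A minimal sketch of the layer in isolation; the table shapes and the random batch are illustrative:

    import torch
    from nvtabular.framework_utils.torch.layers.embeddings import ConcatenatedEmbeddings

    embedding_table_shapes = {
        "user_id": (1000, 16),  # (cardinality, embedding_size)
        "item_id": (5000, 32),
    }
    layer = ConcatenatedEmbeddings(embedding_table_shapes, dropout=0.1)

    # one int64 column per categorical variable, in the order of the dict keys
    x = torch.randint(0, 1000, (8, 2), dtype=torch.int64)
    out = layer(x)
    print(out.shape)  # torch.Size([8, 48]): the 16- and 32-dim embeddings concatenated
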
class nvtabular.framework_utils.torch.layers.embeddings.MultiHotEmbeddings(embedding_table_shapes, dropout=0.0, mode='sum')[source]

Bases: torch.nn.modules.module.Module

Map multiple multi-hot categorical variables to concatenated embeddings.

Parameters
  • embedding_table_shapes – A dictionary mapping column names to (cardinality, embedding_size) tuples.

  • dropout – A float giving the dropout probability.

Inputs:

x: A dictionary with multi-hot column names as keys and (values, offsets) tuples as values.

Outputs:

A Float Tensor with shape [batch_size, embedding_size_after_concat].

forward(x)[source]
training: bool
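
A sketch under the assumption that each (values, offsets) tuple follows torch.nn.EmbeddingBag's one-dimensional layout; in an actual pipeline these tuples are produced by the Torch dataloader for multi-hot columns rather than built by hand, and the column name and shapes here are illustrative:

    import torch
    from nvtabular.framework_utils.torch.layers.embeddings import MultiHotEmbeddings

    embedding_table_shapes = {"genres": (50, 8)}  # (cardinality, embedding_size)
    layer = MultiHotEmbeddings(embedding_table_shapes, dropout=0.0, mode="sum")

    # values: category ids for the whole batch, concatenated row after row
    # offsets: start position of each row's ids (three rows: [1, 4], [7], [2, 9])
    values = torch.tensor([1, 4, 7, 2, 9], dtype=torch.int64)
    offsets = torch.tensor([0, 2, 3], dtype=torch.int64)

    out = layer({"genres": (values, offsets)})
    print(out.shape)  # torch.Size([3, 8]) if the layout assumption holds
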
class nvtabular.framework_utils.torch.models.Model(embedding_table_shapes, num_continuous, emb_dropout, layer_hidden_dims, layer_dropout_rates, max_output=None, bag_mode='sum')[source]

Bases: torch.nn.modules.module.Module

Generic base PyTorch model with support for categorical and continuous values.

Parameters
  • embedding_table_shapes (dict) – A dictionary mapping each categorical column name to a (cardinality, embedding_size) tuple.

  • num_continuous (int) – Number of continuous columns in data.

  • emb_dropout (float, 0 - 1) – Sets the embedding dropout rate.

  • layer_hidden_dims (list) – Hidden layer dimensions.

  • layer_dropout_rates (list) – A list of the layer dropout rates expressed as floats, 0-1, for each layer

  • max_output (float) – The maximum value the model output can take.

  • bag_mode (str) – Aggregation mode used for the multi-hot embedding bags (default 'sum').

forward(x_cat, x_cont)[source]
training: bool
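
A minimal sketch of constructing and calling the model with single-hot categorical columns only; the table shapes, hyperparameters, and random inputs are illustrative, and in a real pipeline x_cat and x_cont come from the Torch dataloader above:

    import torch
    from nvtabular.framework_utils.torch.models import Model

    embedding_table_shapes = {
        "user_id": (1000, 16),  # (cardinality, embedding_size)
        "item_id": (5000, 32),
    }

    model = Model(
        embedding_table_shapes=embedding_table_shapes,
        num_continuous=3,
        emb_dropout=0.05,
        layer_hidden_dims=[128, 64],
        layer_dropout_rates=[0.1, 0.1],
    )

    x_cat = torch.randint(0, 1000, (8, 2), dtype=torch.int64)  # one column per categorical
    x_cont = torch.rand(8, 3)                                  # three continuous features
    out = model(x_cat, x_cont)  # one prediction per row in the batch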