Torch Dataloader

class nvtabular.loader.torch.IterDL(file_paths, batch_size=1, shuffle=False)[source]

Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]

class nvtabular.loader.torch.TorchAsyncItr(dataset, cats=None, conts=None, labels=None, batch_size=1, shuffle=False, seed_fn=None, parts_per_chunk=1, device=None, global_size=None, global_rank=None, drop_last=False, sparse_names=None, sparse_max=None, sparse_as_dense=False)[source]

Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]

This class creates batches of tensors. Each batch has the size specified by the user, and the input data must be an NVTabular Dataset. Spillover between partitions is handled so that every batch is the specified size, with the possible exception of the final batch.

Parameters
  • dataset (NVTabular dataset) – the NVTabular Dataset to draw batches from

  • cats ([str]) – the list of categorical columns in the dataset

  • conts ([str]) – the list of continuous columns in the dataset

  • labels ([str]) – the list of label columns in the dataset

  • batch_size (int) – the size of each batch to supply to the model

  • shuffle (bool) – enable/disable shuffling of dataset

  • parts_per_chunk (int) – number of partitions of the underlying NVTabular Dataset to concatenate into a single “chunk”

  • device (int) – device id of selected GPU

  • sparse_names ([str]) – list of column names for the columns that should be represented as sparse tensors

  • sparse_max ({str: int}) – dictionary mapping each sparse column name to the maximum sequence length for that column

  • sparse_as_dense (bool) – when True, sparse tensors are converted to dense tensors
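
A minimal usage sketch of TorchAsyncItr; the Parquet path, column names, and batch size below are hypothetical placeholders rather than part of the API:

    import nvtabular as nvt
    from nvtabular.loader.torch import TorchAsyncItr

    # hypothetical dataset and schema; substitute your own columns
    dataset = nvt.Dataset("/path/to/train.parquet")

    batches = TorchAsyncItr(
        dataset,
        cats=["item_id", "user_id"],   # categorical columns
        conts=["price"],               # continuous columns
        labels=["click"],              # label column
        batch_size=65536,
        shuffle=True,
        parts_per_chunk=1,
    )

    for batch in batches:
        # each batch holds the categorical, continuous, and label tensors
        # for one step, already on the selected GPU
        pass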

class nvtabular.loader.torch.DLDataLoader(dataset: torch.utils.data.dataset.Dataset[T_co], batch_size: Optional[int] = 1, shuffle: bool = False, sampler: Optional[Union[torch.utils.data.sampler.Sampler, Iterable]] = None, batch_sampler: Optional[Union[torch.utils.data.sampler.Sampler[Sequence], Iterable[Sequence]]] = None, num_workers: int = 0, collate_fn: Optional[Callable[[List[T]], Any]] = None, pin_memory: bool = False, drop_last: bool = False, timeout: float = 0, worker_init_fn: Optional[Callable[[int], None]] = None, multiprocessing_context=None, generator=None, *, prefetch_factor: int = 2, persistent_workers: bool = False)[source]

Bases: Generic[torch.utils.data.dataloader.T_co]

This class is an extension of the PyTorch DataLoader. It is required to support the FastAI framework.

property device
dataset: torch.utils.data.dataset.Dataset[T_co]
batch_size: Optional[int]
num_workers: int
pin_memory: bool
drop_last: bool
timeout: float
sampler: Union[torch.utils.data.sampler.Sampler, Iterable]
prefetch_factor: int
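
A sketch of wrapping TorchAsyncItr in DLDataLoader, following the pattern used in the NVTabular example notebooks; the path and column names are placeholders, and the pass-through collate_fn reflects that batching already happens inside TorchAsyncItr:

    import nvtabular as nvt
    from nvtabular.loader.torch import TorchAsyncItr, DLDataLoader

    batches = TorchAsyncItr(
        nvt.Dataset("/path/to/train.parquet"),
        cats=["item_id", "user_id"],
        conts=["price"],
        labels=["click"],
        batch_size=65536,
    )

    train_loader = DLDataLoader(
        batches,
        batch_size=None,         # TorchAsyncItr already produces full batches
        collate_fn=lambda x: x,  # pass each pre-built batch through unchanged
        pin_memory=False,
        num_workers=0,
    )
    # train_loader now exposes a standard DataLoader interface (including the
    # device property above) that frameworks such as FastAI can consume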

Torch Layers

class nvtabular.framework_utils.torch.layers.embeddings.ConcatenatedEmbeddings(embedding_table_shapes, dropout=0.0, sparse_columns=())[source]

Bases: torch.nn.modules.module.Module

Map multiple categorical variables to concatenated embeddings.

Parameters
  • embedding_table_shapes – A dictionary mapping column names to (cardinality, embedding_size) tuples.

  • dropout – A float giving the dropout probability.

  • sparse_columns – A list of sparse columns

Inputs:

x: An int64 Tensor with shape [batch_size, num_variables].

Outputs:

A Float Tensor with shape [batch_size, embedding_size_after_concat].

forward(x)[source]
training: bool
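
A minimal sketch of the layer in isolation; the table shapes and the random batch are illustrative:

    import torch
    from nvtabular.framework_utils.torch.layers.embeddings import ConcatenatedEmbeddings

    embedding_table_shapes = {
        "user_id": (1000, 16),  # (cardinality, embedding_size)
        "item_id": (5000, 32),
    }
    layer = ConcatenatedEmbeddings(embedding_table_shapes, dropout=0.1)

    # one int64 column per categorical variable, in the order of the dict keys
    x = torch.randint(0, 1000, (8, 2), dtype=torch.int64)
    out = layer(x)
    print(out.shape)  # torch.Size([8, 48]): the 16- and 32-dim embeddings concatenated
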
class nvtabular.framework_utils.torch.layers.embeddings.MultiHotEmbeddings(embedding_table_shapes, dropout=0.0, mode='sum')[source]

Bases: torch.nn.modules.module.Module

Map multiple multi-hot categorical variables to concatenated embeddings.

Parameters
  • embedding_table_shapes – A dictionary mapping column names to (cardinality, embedding_size) tuples.

  • dropout – A float giving the dropout probability.

Inputs:

x: A dictionary with multi-hot column names as keys and (values, offsets) tuples as values.

Outputs:

A Float Tensor with shape [batch_size, embedding_size_after_concat].

forward(x)[source]
training: bool
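
A sketch under the assumption that each (values, offsets) tuple follows torch.nn.EmbeddingBag's one-dimensional layout; in an actual pipeline these tuples are produced by the Torch dataloader for multi-hot columns rather than built by hand, and the column name and shapes here are illustrative:

    import torch
    from nvtabular.framework_utils.torch.layers.embeddings import MultiHotEmbeddings

    embedding_table_shapes = {"genres": (50, 8)}  # (cardinality, embedding_size)
    layer = MultiHotEmbeddings(embedding_table_shapes, dropout=0.0, mode="sum")

    # values: category ids for the whole batch, concatenated row after row
    # offsets: start position of each row's ids (three rows: [1, 4], [7], [2, 9])
    values = torch.tensor([1, 4, 7, 2, 9], dtype=torch.int64)
    offsets = torch.tensor([0, 2, 3], dtype=torch.int64)

    out = layer({"genres": (values, offsets)})
    print(out.shape)  # torch.Size([3, 8]) if the layout assumption holds
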
class nvtabular.framework_utils.torch.models.Model(embedding_table_shapes, num_continuous, emb_dropout, layer_hidden_dims, layer_dropout_rates, max_output=None, bag_mode='sum')[source]

Bases: torch.nn.modules.module.Module

Generic base PyTorch model with support for categorical and continuous values.

Parameters
  • embedding_table_shapes (dict) – A dictionary mapping each categorical column name to a (cardinality, embedding_size) tuple.

  • num_continuous (int) – Number of continuous columns in data.

  • emb_dropout (float, 0 - 1) – Sets the embedding dropout rate.

  • layer_hidden_dims (list) – Hidden layer dimensions.

  • layer_dropout_rates (list) – A list of the layer dropout rates expressed as floats, 0-1, for each layer

  • max_output (float) – The maximum value the model output can take.

  • bag_mode (str) – Aggregation mode used for the multi-hot embedding bags (default 'sum').

forward(x_cat, x_cont)[source]
training: bool
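
A minimal sketch of constructing and calling the model with single-hot categorical columns only; the table shapes, hyperparameters, and random inputs are illustrative, and in a real pipeline x_cat and x_cont come from the Torch dataloader above:

    import torch
    from nvtabular.framework_utils.torch.models import Model

    embedding_table_shapes = {
        "user_id": (1000, 16),  # (cardinality, embedding_size)
        "item_id": (5000, 32),
    }

    model = Model(
        embedding_table_shapes=embedding_table_shapes,
        num_continuous=3,
        emb_dropout=0.05,
        layer_hidden_dims=[128, 64],
        layer_dropout_rates=[0.1, 0.1],
    )

    x_cat = torch.randint(0, 1000, (8, 2), dtype=torch.int64)  # one column per categorical
    x_cont = torch.rand(8, 3)                                  # three continuous features
    out = model(x_cat, x_cont)  # one prediction per row in the batch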