Torch Dataloader
- class nvtabular.loader.torch.IterDL(file_paths, batch_size=1, shuffle=False)
Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]
- class nvtabular.loader.torch.TorchAsyncItr(dataset, cats=None, conts=None, labels=None, batch_size=1, shuffle=False, seed_fn=None, parts_per_chunk=1, device=None, global_size=None, global_rank=None, drop_last=False, sparse_names=None, sparse_max=None, sparse_as_dense=False)
Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]
This class creates batches of tensors from an NVTabular dataset. The batch size is specified by the user. It handles spillover so that every batch except possibly the final one has the specified size.
- Parameters
dataset (NVTabular dataset) – the NVTabular dataset to load batches from
cats ([str]) – the list of categorical columns in the dataset
conts ([str]) – the list of continuous columns in the dataset
labels ([str]) – the list of label columns in the dataset
batch_size (int) – the size of each batch to supply to the model
shuffle (bool) – whether to shuffle the dataset
parts_per_chunk (int) – number of partitions from the iterator, an NVTabular Dataset, to concatenate into a “chunk”
device (int) – device ID of the selected GPU
sparse_names ([str]) – names of the columns that should be represented as sparse tensors
sparse_max ({str: int}) – dictionary mapping each column name to an integer giving the maximum sequence length for that column
sparse_as_dense (bool) – whether to convert sparse tensors to dense tensors
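The spillover behaviour described for TorchAsyncItr can be illustrated without a GPU. The following is a minimal plain-Python sketch, not the library's implementation: the helper name `batches_with_spillover` is hypothetical, the chunks are ordinary lists standing in for dataset partitions, and the meaning given to `drop_last` (discard a short final batch) is an assumption based on the parameter name.

```python
from typing import Iterable, Iterator, List


def batches_with_spillover(
    chunks: Iterable[List[int]], batch_size: int, drop_last: bool = False
) -> Iterator[List[int]]:
    """Yield fixed-size batches from variable-size chunks (hypothetical helper).

    Rows left over at the end of one chunk are carried into the next,
    so only the final batch can be smaller than batch_size.
    """
    spill: List[int] = []
    for chunk in chunks:
        rows = spill + chunk
        # Number of rows that fit into whole batches.
        full = len(rows) - len(rows) % batch_size
        for start in range(0, full, batch_size):
            yield rows[start : start + batch_size]
        # Keep the remainder for the next chunk.
        spill = rows[full:]
    if spill and not drop_last:
        yield spill


# Three chunks of uneven size standing in for dataset partitions.
chunks = [[0, 1, 2, 3, 4], [5, 6], [7, 8, 9]]
batches = list(batches_with_spillover(chunks, batch_size=4))
print(batches)
```

Only the last batch is short here; with `drop_last=True` the sketch would discard it instead.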
- class nvtabular.loader.torch.DLDataLoader(dataset: torch.utils.data.dataset.Dataset[T_co], batch_size: Optional[int] = 1, shuffle: bool = False, sampler: Optional[Union[torch.utils.data.sampler.Sampler, Iterable]] = None, batch_sampler: Optional[Union[torch.utils.data.sampler.Sampler[Sequence], Iterable[Sequence]]] = None, num_workers: int = 0, collate_fn: Optional[Callable[[List[T]], Any]] = None, pin_memory: bool = False, drop_last: bool = False, timeout: float = 0, worker_init_fn: Optional[Callable[[int], None]] = None, multiprocessing_context=None, generator=None, *, prefetch_factor: int = 2, persistent_workers: bool = False)
Bases: Generic[torch.utils.data.dataloader.T_co]
This class is an extension of the torch dataloader. It is required to support the FastAI framework.
- property device
- dataset: torch.utils.data.dataset.Dataset[T_co]
- sampler: Union[torch.utils.data.sampler.Sampler, Iterable]
Torch Layers
- class nvtabular.framework_utils.torch.layers.embeddings.ConcatenatedEmbeddings(embedding_table_shapes, dropout=0.0, sparse_columns=())
Bases: torch.nn.modules.module.Module
Map multiple categorical variables to concatenated embeddings.
- Parameters
embedding_table_shapes – A dictionary mapping column names to (cardinality, embedding_size) tuples.
dropout – The dropout rate, as a float.
sparse_columns – A list of sparse columns
- Inputs:
x: An int64 Tensor with shape [batch_size, num_variables].
- Outputs:
A Float Tensor with shape [batch_size, embedding_size_after_concat].
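To make the input and output shapes concrete, here is a small plain-Python sketch of the lookup-and-concatenate idea. It is illustrative only and does not use torch: the column names, the stand-in tables, and the helper `concat_embeddings` are hypothetical, and it assumes that column j of the input is looked up in the j-th table in dictionary order.

```python
from typing import Dict, List, Tuple

# Hypothetical shapes: column name -> (cardinality, embedding_size).
embedding_table_shapes: Dict[str, Tuple[int, int]] = {
    "user_id": (4, 2),
    "item_id": (3, 3),
}

# Stand-in tables: row i of a table is the embedding for category index i.
# Each row is filled with the index value so lookups are easy to read.
tables: List[List[List[float]]] = [
    [[float(i)] * emb_size for i in range(cardinality)]
    for cardinality, emb_size in embedding_table_shapes.values()
]


def concat_embeddings(x: List[List[int]]) -> List[List[float]]:
    """Look up column j of each row in table j and join the results."""
    out = []
    for row in x:
        joined: List[float] = []
        for col, index in enumerate(row):
            joined.extend(tables[col][index])
        out.append(joined)
    return out


x = [[0, 2], [3, 1]]  # shape [batch_size=2, num_variables=2]
y = concat_embeddings(x)
print(len(y), len(y[0]))  # batch_size and the concatenated width
```

In this sketch the concatenated width is the sum of the embedding sizes, 2 + 3 = 5.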
- class nvtabular.framework_utils.torch.layers.embeddings.MultiHotEmbeddings(embedding_table_shapes, dropout=0.0, mode='sum')
Bases: torch.nn.modules.module.Module
Map multiple categorical variables to concatenated embeddings.
- Parameters
embedding_table_shapes – A dictionary mapping column names to (cardinality, embedding_size) tuples.
dropout – The dropout rate, as a float.
- Inputs:
x: A dictionary with multi-hot column names as keys and tuples containing the column values and offsets as values.
- Outputs:
A Float Tensor with shape [batch_size, embedding_size_after_concat].
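The values-and-offsets input can be pictured as a flattened ragged list. The sketch below is plain Python and only an illustration: it assumes the offsets mark where each row's values start (the convention used by torch.nn.EmbeddingBag) and that `mode='sum'` adds the embeddings within each row. The helper names and the tiny table are hypothetical.

```python
from typing import List, Tuple


def to_values_offsets(rows: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Flatten ragged rows into (values, offsets); offsets[i] is where row i starts."""
    values: List[int] = []
    offsets: List[int] = []
    for row in rows:
        offsets.append(len(values))
        values.extend(row)
    return values, offsets


def bag_sum(
    table: List[List[float]], values: List[int], offsets: List[int]
) -> List[List[float]]:
    """Sum the embedding rows of each bag; bag i spans offsets[i] to the next offset."""
    ends = offsets[1:] + [len(values)]
    emb_size = len(table[0])
    out = []
    for start, end in zip(offsets, ends):
        bag = [0.0] * emb_size
        for index in values[start:end]:
            bag = [a + b for a, b in zip(bag, table[index])]
        out.append(bag)
    return out


# Three samples with a varying number of category indices each.
rows = [[1, 2], [0], [2, 2, 1]]
values, offsets = to_values_offsets(rows)

# Hypothetical embedding table with cardinality 3 and embedding_size 2.
table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled = bag_sum(table, values, offsets)
print(values, offsets, pooled)
```

Each sample ends up with one fixed-width vector regardless of how many indices it contained, which is what allows the per-column results to be concatenated into a [batch_size, embedding_size_after_concat] output.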
- class nvtabular.framework_utils.torch.models.Model(embedding_table_shapes, num_continuous, emb_dropout, layer_hidden_dims, layer_dropout_rates, max_output=None, bag_mode='sum')
Bases: torch.nn.modules.module.Module
Generic base PyTorch model that supports categorical and continuous values.
- Parameters
embedding_table_shapes (dict) – A dictionary of <column>: <max cardinality of column> entries for all categorical columns.
num_continuous (int) – Number of continuous columns in data.
emb_dropout (float, 0 - 1) – Sets the embedding dropout rate.
layer_hidden_dims (list) – Hidden layer dimensions.
layer_dropout_rates (list) – A list of the layer dropout rates expressed as floats, 0-1, for each layer
max_output (float) – The maximum output value.
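As a rough guide to how these parameters could determine layer sizes, the sketch below computes (input, output) widths for each hidden layer. It rests on assumptions not stated in this reference: that embedding shapes are given as (cardinality, embedding_size) tuples as in the embedding layers, and that the concatenated embeddings are joined with the continuous columns before the first hidden layer. The helper `layer_shapes` and the column names are hypothetical.

```python
from typing import Dict, List, Tuple


def layer_shapes(
    embedding_table_shapes: Dict[str, Tuple[int, int]],
    num_continuous: int,
    layer_hidden_dims: List[int],
) -> List[Tuple[int, int]]:
    """Return (input_width, output_width) for each hidden layer (hypothetical helper)."""
    # Assumption: total embedding width is the sum of the embedding sizes.
    embedding_width = sum(emb_size for _, emb_size in embedding_table_shapes.values())
    # Assumption: continuous columns are appended to the embeddings.
    input_width = embedding_width + num_continuous
    in_sizes = [input_width] + list(layer_hidden_dims[:-1])
    return list(zip(in_sizes, layer_hidden_dims))


shapes = layer_shapes(
    {"user_id": (1000, 16), "item_id": (500, 8)},
    num_continuous=3,
    layer_hidden_dims=[128, 64],
)
print(shapes)
```

Under these assumptions the first hidden layer would take 16 + 8 + 3 = 27 inputs, and each later layer takes the previous layer's output width.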