Accelerated Training with PyTorch
When training pipelines with PyTorch, the native dataloader often cannot prepare
sequential batches of tabular data fast enough, so the GPU is not fully utilized. To
combat this issue, we’ve developed a highly customized tabular dataloader,
`TorchAsyncItr`, to accelerate existing pipelines in
PyTorch. The NVTabular dataloader is capable of:
- removing bottlenecks from dataloading by processing large chunks of data at a time instead of item by item
- processing datasets that don’t fit within the GPU or CPU memory by streaming from the disk
- reading data directly into the GPU memory and removing CPU-GPU communication
- preparing batches asynchronously on the GPU to avoid CPU-GPU communication
- supporting commonly used formats such as Parquet
- integrating easily into existing PyTorch training pipelines by using an API similar to that of the native PyTorch dataloader
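The first and fourth points describe a producer/consumer pattern. The following is a pure-Python, CPU-only sketch of that general idea, not NVTabular's implementation: the hypothetical `chunked_batches` slices large chunks into fixed-size batches (carrying leftover rows into the next chunk), and the hypothetical `prefetch` builds batches on a background thread so that the consumer rarely waits.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def chunked_batches(chunks, batch_size):
    """Slice large chunks into fixed-size batches; rows left over at the end of
    one chunk are carried into the next, so only the final batch may be short."""
    leftover = []
    for chunk in chunks:
        rows = leftover + list(chunk)
        n_full = len(rows) // batch_size * batch_size
        for start in range(0, n_full, batch_size):
            yield rows[start:start + batch_size]
        leftover = rows[n_full:]
    if leftover:
        yield leftover


def prefetch(batches, depth=2):
    """Build batches on a background thread; at most `depth` finished batches wait in the queue."""
    q = queue.Queue(maxsize=depth)

    def producer():
        try:
            for batch in batches:
                q.put(batch)
        except Exception as exc:  # hand errors to the consumer instead of leaving it blocked
            q.put(exc)
        finally:
            q.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item


# Three "chunks" of different sizes, as if streamed from disk one file at a time
chunks = [range(0, 5), range(5, 12), range(12, 13)]
for batch in prefetch(chunked_batches(chunks, batch_size=4)):
    print(batch)
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9, 10, 11]
# [12]
```

The bounded queue caps how far the producer can run ahead, which keeps memory use predictable. In pure Python, threads mainly help when the producer spends its time on I/O, such as reading files.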
When `TorchAsyncItr` accelerates training with PyTorch, the following happens:
The required libraries are imported.
```python
import glob

import nvtabular as nvt
import torch
from nvtabular.loader.torch import TorchAsyncItr, DLDataLoader
```
The `TorchAsyncItr` iterator is initialized. The input is an NVTabular dataset built from a list of file names. The NVTabular dataset is an abstraction layer that iterates over the full dataset in chunks. The dataset schema is defined by three parameters: `cats` are the column names for the categorical input features, `conts` are the column names for the continuous input features, and `labels` are the column names for the target. Each parameter should be formatted as a list of strings. The batch size is also specified.
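To make the roles of `cats`, `conts`, and `labels` concrete, here is a pure-Python sketch with hypothetical column names and hypothetical helpers (`check_schema`, `split_batch`). It checks that each parameter is a list of strings and splits a batch of rows into the `x_cat`, `x_cont`, and `y` parts that the training loop consumes. NVTabular itself works on GPU data frames and tensors, not Python dicts.

```python
# Hypothetical column names, for illustration only
CATEGORICAL_COLUMNS = ["user_id", "item_id"]
CONTINUOUS_COLUMNS = ["price"]
LABEL_COLUMNS = ["clicked"]


def check_schema(cats, conts, labels):
    """Raise TypeError unless every schema parameter is a list of strings."""
    for name, cols in (("cats", cats), ("conts", conts), ("labels", labels)):
        if not isinstance(cols, list) or not all(isinstance(c, str) for c in cols):
            raise TypeError(f"{name} must be a list of strings, got {cols!r}")


def split_batch(rows, cats, conts, labels):
    """Split a list of row dicts into (batch, n_columns) nested lists per role."""
    x_cat = [[row[c] for c in cats] for row in rows]           # categorical codes
    x_cont = [[float(row[c]) for c in conts] for row in rows]  # continuous values
    y = [[float(row[c]) for c in labels] for row in rows]      # targets
    return x_cat, x_cont, y


check_schema(CATEGORICAL_COLUMNS, CONTINUOUS_COLUMNS, LABEL_COLUMNS)
rows = [
    {"user_id": 3, "item_id": 7, "price": 9.5, "clicked": 1},
    {"user_id": 1, "item_id": 2, "price": 0.5, "clicked": 0},
]
x_cat, x_cont, y = split_batch(rows, CATEGORICAL_COLUMNS, CONTINUOUS_COLUMNS, LABEL_COLUMNS)
print(x_cat)   # [[3, 7], [1, 2]]
print(x_cont)  # [[9.5], [0.5]]
print(y)       # [[1.0], [0.0]]
```

Passing a bare string such as `cats="user_id"` fails the check, which is why each parameter is written as a list even when it names a single column.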
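```python
# CATEGORICAL_COLUMNS, CONTINUOUS_COLUMNS and LABEL_COLUMNS (lists of column names)
# and BATCH_SIZE (an int) are assumed to be defined beforehand.
TRAIN_PATHS = glob.glob("./train/*.parquet")
train_dataset = TorchAsyncItr(
    nvt.Dataset(TRAIN_PATHS),
    cats=CATEGORICAL_COLUMNS,
    conts=CONTINUOUS_COLUMNS,
    labels=LABEL_COLUMNS,
    batch_size=BATCH_SIZE,
)
```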
The `TorchAsyncItr` is wrapped as a `DLDataLoader`.
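```python
# Assumed identity collate function: TorchAsyncItr already yields complete batches.
def collate_fn(batch):
    return batch


train_loader = DLDataLoader(
    train_dataset, batch_size=None, collate_fn=collate_fn, pin_memory=False, num_workers=0
)
```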
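The `batch_size=None` argument matters because `TorchAsyncItr` already yields whole batches. Assuming `DLDataLoader` behaves like `torch.utils.data.DataLoader` (an assumption here), this sketch uses a hypothetical `PreBatchedSource` in place of `TorchAsyncItr` to show the difference. With `batch_size=None` and an identity `collate_fn`, each batch passes through unchanged; with the default `batch_size=1`, PyTorch wraps each batch in an extra leading dimension.

```python
import torch
from torch.utils.data import DataLoader, IterableDataset


class PreBatchedSource(IterableDataset):
    """Hypothetical stand-in for TorchAsyncItr: yields whole (cats, conts, labels) batches."""

    def __init__(self, n_rows, batch_size):
        self.n_rows = n_rows
        self.batch_size = batch_size

    def __iter__(self):
        for start in range(0, self.n_rows, self.batch_size):
            n = min(self.batch_size, self.n_rows - start)
            cats = torch.zeros(n, 2, dtype=torch.long)  # 2 categorical columns
            conts = torch.randn(n, 3)                   # 3 continuous columns
            labels = torch.rand(n, 1)                   # 1 target column
            yield cats, conts, labels


def collate_fn(batch):
    # Identity: the source already returns a complete batch
    return batch


source = PreBatchedSource(n_rows=10, batch_size=4)

# batch_size=None disables automatic batching, so each yielded batch passes through as-is
loader = DataLoader(source, batch_size=None, collate_fn=collate_fn, pin_memory=False, num_workers=0)
print([tuple(cats.shape) for cats, _, _ in loader])  # [(4, 2), (4, 2), (2, 2)]

# With the default batch_size=1, DataLoader stacks a list of one batch, adding a leading dimension
rebatched = next(iter(DataLoader(source)))
print(tuple(rebatched[0].shape))  # (1, 4, 2)
```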
If a `torch.nn.Module` model was created,
`train_loader` can be used in the same way as the PyTorch dataloader.
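```python
...
model = get_model()  # get_model() is assumed to return a torch.nn.Module
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for x_cat, x_cont, y in train_loader:
    y_pred = model(x_cat, x_cont)
    loss = loss_func(y_pred, y)  # loss_func is assumed to be defined, e.g. a torch.nn loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```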
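The loop leaves `get_model` and `loss_func` undefined. One possible model that fits the `model(x_cat, x_cont)` call embeds each categorical column, concatenates the embeddings with the continuous features, and passes the result through a small MLP. `TabularModel`, the column cardinalities, and the binary `BCEWithLogitsLoss` target are all hypothetical, and a list of synthetic batches stands in for `train_loader` so the snippet runs without NVTabular.

```python
import torch
from torch import nn


class TabularModel(nn.Module):
    """Hypothetical model: embeds categorical codes, concatenates continuous features."""

    def __init__(self, cardinalities, n_cont, emb_dim=8, hidden=32):
        super().__init__()
        # One embedding table per categorical column (cardinality = number of distinct codes)
        self.embs = nn.ModuleList([nn.Embedding(card, emb_dim) for card in cardinalities])
        in_dim = emb_dim * len(cardinalities) + n_cont
        self.mlp = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x_cat, x_cont):
        # x_cat: (batch, n_cat) int64 codes; x_cont: (batch, n_cont) floats
        embedded = [emb(x_cat[:, i]) for i, emb in enumerate(self.embs)]
        return self.mlp(torch.cat(embedded + [x_cont], dim=1))  # (batch, 1) logits


def get_model():
    # Hypothetical sizes: two categorical columns with 10 and 5 codes, three continuous columns
    return TabularModel(cardinalities=[10, 5], n_cont=3)


torch.manual_seed(0)
model = get_model()
loss_func = nn.BCEWithLogitsLoss()  # assumes a binary 0/1 target
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Synthetic stand-in for train_loader: the same (x_cat, x_cont, y) batch repeated
x_cat = torch.stack([torch.randint(0, 10, (64,)), torch.randint(0, 5, (64,))], dim=1)
x_cont = torch.randn(64, 3)
y = (x_cont[:, :1] > 0).float()  # (64, 1) label: sign of the first continuous feature
train_loader = [(x_cat, x_cont, y)] * 50

losses = []
for xb_cat, xb_cont, yb in train_loader:
    y_pred = model(xb_cat, xb_cont)
    loss = loss_func(y_pred, yb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Keeping one embedding table per categorical column lets each column have its own vocabulary size; the continuous features skip the embedding step and join the concatenated vector directly.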
The `TorchAsyncItr` dataloader can be initialized for the validation dataset using the same structure.
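A validation pass differs from the training loop mainly in switching the model to evaluation mode and disabling gradient tracking. The hypothetical `evaluate` helper is one way to write it for any loader that yields `(x_cat, x_cont, y)` batches; it weights each batch's mean loss by its row count so that a short final batch is not over-counted. The zero-initialized `ContinuousOnly` model and the synthetic batches are illustrative only.

```python
import torch
from torch import nn


def evaluate(model, loader, loss_func):
    """Row-weighted mean loss over a loader that yields (x_cat, x_cont, y) batches."""
    model.eval()  # e.g. disables dropout and uses running batch-norm statistics
    total_loss, total_rows = 0.0, 0
    with torch.no_grad():  # no autograd graph is needed for validation
        for x_cat, x_cont, y in loader:
            batch_loss = loss_func(model(x_cat, x_cont), y)  # mean over the batch
            total_loss += batch_loss.item() * y.shape[0]     # weight by batch size
            total_rows += y.shape[0]
    model.train()  # switch back before the next training epoch
    return total_loss / total_rows


class ContinuousOnly(nn.Module):
    """Hypothetical model that ignores x_cat; weights zeroed so every logit is 0."""

    def __init__(self, n_cont):
        super().__init__()
        self.linear = nn.Linear(n_cont, 1)
        nn.init.zeros_(self.linear.weight)
        nn.init.zeros_(self.linear.bias)

    def forward(self, x_cat, x_cont):
        return self.linear(x_cont)


# Synthetic stand-in for a validation loader: batches of 4 and 2 rows
valid_loader = [
    (torch.zeros(n, 2, dtype=torch.long), torch.randn(n, 3), torch.randint(0, 2, (n, 1)).float())
    for n in (4, 2)
]
val_model = ContinuousOnly(3)
val_loss = evaluate(val_model, valid_loader, nn.BCEWithLogitsLoss())
# A logit of 0 means p = 0.5, so the BCE loss is log(2) for every row
print(round(val_loss, 4))  # 0.6931
```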