merlin.dataloader.torch.Loader

class merlin.dataloader.torch.Loader(dataset, batch_size, shuffle=False, seed_fn=None, parts_per_chunk=1, global_size=None, global_rank=None, drop_last=False, transforms=None, device=None)

Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]

__init__(dataset, batch_size, shuffle=False, seed_fn=None, parts_per_chunk=1, global_size=None, global_rank=None, drop_last=False, transforms=None, device=None)

Methods

__init__(dataset, batch_size[, shuffle, …])

array_lib()

convert_batch(batch)

Converts a batch to the appropriate tensor column type, then returns it as a flat dictionary in which each list column is split into separate values and offsets entries.

epochs([epochs])

Create a dataloader that will efficiently run for more than one epoch.

make_tensors(gdf[, use_row_lengths])

Yields batches of tensors from a dataframe.

map(fn)

Applies a function to each batch.

peek()

Grab the next batch from the dataloader without removing it from the queue.

stop()

Halts and resets the initialization parameters of the dataloader.

Attributes

input_schema

Get input schema of data to be loaded.

output_schema

Get output schema of data being loaded.

schema

Get input schema of data to be loaded.

transforms

peek()

Grab the next batch from the dataloader without removing it from the queue.
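
To illustrate what "without removing it from the queue" means, here is a minimal sketch of peek semantics in plain Python. The class PeekableBatches and its internal buffer are hypothetical stand-ins for illustration only; they are not the Loader implementation, which manages its own batch queue.

```python
from collections import deque


class PeekableBatches:
    """Hypothetical stand-in for a batch queue; not the Merlin Loader."""

    def __init__(self, batches):
        self._source = iter(batches)
        self._buffer = deque()

    def peek(self):
        # Pull one batch into the buffer if it is empty, but leave it there
        # so that the next iteration step still returns the same batch.
        if not self._buffer:
            self._buffer.append(next(self._source))
        return self._buffer[0]

    def __iter__(self):
        return self

    def __next__(self):
        # Serve a previously peeked batch first, then fall back to the source.
        if self._buffer:
            return self._buffer.popleft()
        return next(self._source)


batches = PeekableBatches([{"x": [1, 2]}, {"x": [3, 4]}])
peeked = batches.peek()
consumed = next(batches)
# The peeked batch is the same one that iteration returns next.
assert peeked == consumed
```

A typical use for this pattern is inspecting the shapes or keys of one batch before starting a training loop, without losing that batch.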

convert_batch(batch)

Converts a batch to the appropriate tensor column type, then returns it as a flat dictionary in which each list column is split into separate values and offsets entries.

Parameters

batch (tuple) – Tuple of dictionary inputs and n-dimensional array of targets

Returns

A tuple of dictionary inputs, with lists split as values and offsets, and targets as an array

Return type

Tuple
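
The values-and-offsets layout described above can be sketched in plain Python. This is only an illustration of the idea: the helper flatten_inputs is hypothetical, the "__values" and "__offsets" key suffixes are an assumption about naming, and the offsets here are assumed to start at 0 and contain one more entry than there are rows. The real method also converts columns to tensors, which this sketch skips.

```python
def flatten_inputs(inputs):
    """Hypothetical helper: split list columns into values and offsets."""
    flat = {}
    for name, column in inputs.items():
        if column and isinstance(column[0], list):
            values, offsets = [], [0]
            for row in column:
                values.extend(row)
                # Each offset marks where the next row starts in `values`.
                offsets.append(len(values))
            flat[f"{name}__values"] = values    # assumed key suffix
            flat[f"{name}__offsets"] = offsets  # assumed key suffix
        else:
            # Scalar columns are passed through unchanged.
            flat[name] = column
    return flat


inputs = {"user_id": [10, 11, 12], "item_ids": [[1, 2], [3], [4, 5, 6]]}
targets = [0, 1, 0]

# The batch is a tuple of (dictionary inputs, targets), as in the docstring.
flat_inputs, flat_targets = flatten_inputs(inputs), targets
print(flat_inputs["item_ids__values"])   # [1, 2, 3, 4, 5, 6]
print(flat_inputs["item_ids__offsets"])  # [0, 2, 3, 6]
```

Row i of the original list column can be recovered as values[offsets[i]:offsets[i + 1]], which is why a single flat array plus offsets is enough to represent rows of different lengths.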

map(fn)

Applies a function to each batch.

For example, this can be used to attach a sample_weight to each batch before it reaches the model.
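
As a rough sketch of the sample_weight use case, the snippet below applies a function to every batch of a plain Python iterable. The generator map_batches, the function add_sample_weight, and the weighting rule are all hypothetical; the sketch also assumes the mapped function receives the inputs and targets of a batch as separate arguments and returns the new batch, which the documentation above does not spell out.

```python
def add_sample_weight(inputs, targets):
    # Hypothetical weighting rule: positive examples count twice as much.
    weights = [2.0 if target == 1 else 1.0 for target in targets]
    return inputs, targets, weights


def map_batches(batches, fn):
    """Hypothetical stand-in for mapping: apply fn lazily to each batch."""
    for batch in batches:
        # Assumption: a batch is an (inputs, targets) tuple unpacked into fn.
        yield fn(*batch)


batches = [
    ({"user_id": [10, 11]}, [0, 1]),
    ({"user_id": [12, 13]}, [1, 1]),
]

weighted = list(map_batches(batches, add_sample_weight))
for inputs, targets, weights in weighted:
    print(targets, weights)
```

Because the function is applied per batch as the data is consumed, the original dataset does not need to be rewritten to carry the extra column.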