merlin.dataloader.loader_base.LoaderBase

class merlin.dataloader.loader_base.LoaderBase(dataset, batch_size, shuffle=False, seed_fn=None, parts_per_chunk=1, global_size=None, global_rank=None, drop_last=False, transforms=None, device=None)

Bases: object

Base class containing functionality common to the PyTorch and TensorFlow dataloaders.

__init__(dataset, batch_size, shuffle=False, seed_fn=None, parts_per_chunk=1, global_size=None, global_rank=None, drop_last=False, transforms=None, device=None)

Methods

__init__(dataset, batch_size[, shuffle, …])

array_lib()

epochs([epochs])

Create a dataloader that will efficiently run for more than one epoch.

make_tensors(gdf[, use_row_lengths])

Yields batches of tensors from a dataframe.

peek()

Get the next batch without advancing the iterator.

stop()

Halts and resets the initialization parameters of the dataloader.

Attributes

input_schema

Get input schema of data to be loaded.

output_schema

Get output schema of data being loaded.

schema

Get input schema of data to be loaded.

transforms

property transforms

epochs(epochs=1)

Create a dataloader that will efficiently run for more than one epoch.

Parameters

epochs (int, optional) – Number of epochs for which the dataloader should process data. Defaults to 1.

Returns

A dataloader that will run for the specified number of epochs.

Return type

DataLoader
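
The multi-epoch contract described above can be pictured with a small stand-in. This sketch does not use Merlin: `ToyLoader` and its string batches are hypothetical, and whether the real loader returns a copy or reconfigures itself is an assumption made here for illustration only.

```python
import copy


class ToyLoader:
    """Hypothetical stand-in for a dataloader; not the Merlin implementation."""

    def __init__(self, batches, num_epochs=1):
        self.batches = list(batches)
        self.num_epochs = num_epochs

    def epochs(self, epochs=1):
        # Assumption: return a shallow copy configured for `epochs` passes,
        # leaving the original loader unchanged.
        other = copy.copy(self)
        other.num_epochs = epochs
        return other

    def __iter__(self):
        # Replay the full set of batches once per epoch.
        for _ in range(self.num_epochs):
            yield from self.batches


loader = ToyLoader(["batch_0", "batch_1"])
seen = list(loader.epochs(3))  # two batches, replayed three times
```

In this toy version, iterating over `loader.epochs(3)` visits every batch three times in order, while `loader` itself still runs for a single epoch.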

stop()

Halts and resets the initialization parameters of the dataloader.

peek()

Get the next batch without advancing the iterator.
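
The meaning of "without advancing the iterator" can be sketched with a plain Python iterator that caches one item. `PeekableBatches` is a hypothetical helper written for this illustration; it is not how the Merlin loader is implemented.

```python
class PeekableBatches:
    """Hypothetical wrapper illustrating peek semantics; not the Merlin code."""

    _EMPTY = object()  # sentinel meaning "nothing cached"

    def __init__(self, batches):
        self._it = iter(batches)
        self._cached = self._EMPTY

    def peek(self):
        # Fetch the next batch once and cache it, so the following call to
        # next() returns this same batch instead of skipping it.
        if self._cached is self._EMPTY:
            self._cached = next(self._it)
        return self._cached

    def __iter__(self):
        return self

    def __next__(self):
        # Hand out the cached batch first, if peek() stored one.
        if self._cached is not self._EMPTY:
            batch, self._cached = self._cached, self._EMPTY
            return batch
        return next(self._it)


batches = PeekableBatches([{"x": [1, 2]}, {"x": [3, 4]}])
first_peek = batches.peek()
first_next = next(batches)   # same batch that peek() returned
second_next = next(batches)  # iteration continues with the second batch
```

Under these assumptions, peeking any number of times before calling `next()` never causes a batch to be skipped.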

make_tensors(gdf, use_row_lengths=False)

Yields batches of tensors from a dataframe.

Parameters
  • gdf (DataFrame) – A dataframe type object.

  • use_row_lengths (bool, optional) – Whether to represent list columns with row lengths instead of offsets. Defaults to False.

Returns

A dictionary of the column tensor representations.

Return type

Dict[Tensors]
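
The difference between the two list-column encodings named by `use_row_lengths` can be shown on a small ragged column. This is a generic sketch in plain Python with made-up data; the exact tensor types and dictionary keys that `make_tensors` produces are not specified here and are not assumed.

```python
from itertools import accumulate

# A hypothetical list column with three rows of varying length.
rows = [[1, 2], [3], [4, 5, 6]]

# Both encodings share the same flattened values.
values = [v for row in rows for v in row]

# Row lengths: one entry per row.
row_lengths = [len(row) for row in rows]

# Offsets: cumulative start positions, with one more entry than there are rows.
offsets = [0] + list(accumulate(row_lengths))

# Either encoding is enough to rebuild the original rows from `values`.
rebuilt = [values[start:stop] for start, stop in zip(offsets, offsets[1:])]
```

Here `row_lengths` is `[2, 1, 3]` and `offsets` is `[0, 2, 3, 6]`; the two carry the same information, so the choice mainly depends on which form the downstream framework expects.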

array_lib()

property schema

Get input schema of data to be loaded.

Returns

Schema corresponding to the data

Return type

Schema

property output_schema

Get output schema of data being loaded.

When transforms are defined that change the features being loaded, the output schema accounts for those changes and should match the features returned by the loader. If there are no transforms, it is the same as the input schema.

Returns

Schema corresponding to the data that will be output by the loader

Return type

Schema
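
The relationship between the input and output schemas can be illustrated by reducing a schema to a list of column names. The column names, the `add_interaction_flag` transform, and the `output_columns` helper are all hypothetical; a real Schema object carries more than names, and Merlin transforms are not plain functions like these.

```python
# Hypothetical input schema, reduced to column names only.
input_columns = ["user_id", "item_id"]


def add_interaction_flag(columns):
    # Hypothetical transform that adds one feature to every batch.
    return columns + ["has_interaction"]


def output_columns(columns, transforms):
    # Apply each transform's effect on the column list, in order.
    for transform in transforms:
        columns = transform(columns)
    return columns


with_transform = output_columns(input_columns, [add_interaction_flag])
without_transform = output_columns(input_columns, [])
```

With the transform, the output columns include the extra `has_interaction` feature; with no transforms, the output equals the input, matching the behavior described above.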

property input_schema

Get input schema of data to be loaded.

If there are no transforms, this is the same as the output schema.

Returns

Schema corresponding to the data that will be loaded prior to any transforms.

Return type

Schema