merlin.dataloader.loader_base.LoaderBase

class merlin.dataloader.loader_base.LoaderBase(dataset, batch_size, shuffle=False, seed_fn=None, parts_per_chunk=1, global_size=None, global_rank=None, drop_last=False, transforms=None, device=None)

Bases: object

Base class containing common functionality between the PyTorch and TensorFlow dataloaders.

__init__(dataset, batch_size, shuffle=False, seed_fn=None, parts_per_chunk=1, global_size=None, global_rank=None, drop_last=False, transforms=None, device=None)
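The interaction between batch_size and drop_last determines how many batches one epoch produces. The sketch below assumes that drop_last follows the usual dataloader convention of discarding a final batch that holds fewer than batch_size rows; num_batches is a hypothetical helper for illustration, not part of LoaderBase.

```python
def num_batches(num_rows: int, batch_size: int, drop_last: bool = False) -> int:
    """Hypothetical helper: count the batches produced in one epoch.

    Assumes drop_last=True discards a trailing partial batch, as is
    conventional for dataloaders; this is not taken from LoaderBase itself.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full_batches, remainder = divmod(num_rows, batch_size)
    if drop_last or remainder == 0:
        return full_batches
    # Keep the trailing partial batch.
    return full_batches + 1


print(num_batches(10, 4))                  # 3 batches: 4 + 4 + 2 rows
print(num_batches(10, 4, drop_last=True))  # 2 batches: the 2-row remainder is dropped
```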

Methods

`__init__`(dataset, batch_size[, shuffle, …])
`array_lib`()
`epochs`([epochs])	Create a dataloader that will efficiently run for more than one epoch.
`make_tensors`(gdf[, use_row_lengths])	Yields batches of tensors from a dataframe.
`peek`()	Get the next batch without advancing the iterator.
`stop`()	Halts and resets the initialization parameters of the dataloader.

Attributes

`input_schema`	Get input schema of data to be loaded.
`output_schema`	Get output schema of data being loaded.
`schema`	Get input schema of data to be loaded.
`transforms`

property transforms

epochs(epochs=1)

Create a dataloader that will efficiently run for more than one epoch.

Parameters: epochs (int, optional) – Number of epochs (full passes over the data) the dataloader should run, by default 1
Returns: A dataloader that runs for the user-defined number of epochs.
Return type: DataLoader
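The documented behavior is that epochs(n) returns a dataloader whose iteration spans n passes over the data. The sketch below models only that observable behavior with a hypothetical ToyLoader class over a plain list; it does not reproduce whatever makes the real method efficient.

```python
from typing import Iterator, List


class ToyLoader:
    """Hypothetical stand-in for a dataloader; not the Merlin implementation."""

    def __init__(self, rows: List[int], batch_size: int, num_epochs: int = 1):
        self.rows = rows
        self.batch_size = batch_size
        self.num_epochs = num_epochs

    def epochs(self, epochs: int = 1) -> "ToyLoader":
        # Return a new loader configured for `epochs` passes,
        # mirroring the documented signature epochs(epochs=1) -> DataLoader.
        return ToyLoader(self.rows, self.batch_size, num_epochs=epochs)

    def __iter__(self) -> Iterator[List[int]]:
        # One iterator covers every epoch back to back.
        for _ in range(self.num_epochs):
            for start in range(0, len(self.rows), self.batch_size):
                yield self.rows[start:start + self.batch_size]


loader = ToyLoader(list(range(5)), batch_size=2)
print(list(loader.epochs(2)))
# [[0, 1], [2, 3], [4], [0, 1], [2, 3], [4]]
```

Returning a new object rather than mutating the original keeps the single-epoch loader reusable, which is one reasonable reading of "Returns: a dataloader".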

stop()[source]: Halts and resets the initialization parameters of the dataloader.

peek()[source]: Get the next batch without advancing the iterator.

make_tensors(gdf, use_row_lengths=False)

Yields batches of tensors from a dataframe.

Parameters

gdf (DataFrame) – A dataframe type object.
use_row_lengths (bool, optional) – Enable using row lengths instead of offsets for list columns, by default False

Returns

A dictionary of the column tensor representations.

Return type

Dict[Tensors]
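The use_row_lengths flag concerns how a list (ragged) column is encoded. The sketch below assumes the common flat-values encoding: all list elements are concatenated into one values array, and row boundaries are described either by offsets (row i spans values[offsets[i]:offsets[i + 1]]) or by per-row lengths. Plain Python lists stand in for tensors, and every helper name is hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple


def to_values_and_offsets(rows: List[List[int]]) -> Tuple[List[int], List[int]]:
    # Flatten the nested rows; offsets[i]:offsets[i + 1] bounds row i in `values`.
    values = [v for row in rows for v in row]
    offsets = [0] + list(accumulate(len(row) for row in rows))
    return values, offsets


def offsets_to_row_lengths(offsets: List[int]) -> List[int]:
    # Row lengths are the differences between consecutive offsets.
    return [end - start for start, end in zip(offsets, offsets[1:])]


def row_lengths_to_offsets(row_lengths: List[int]) -> List[int]:
    # Offsets are the running sum of row lengths, starting at 0.
    return [0] + list(accumulate(row_lengths))


def slice_rows(values: List[int], offsets: List[int], start: int, stop: int) -> Tuple[List[int], List[int]]:
    # Select rows [start, stop) for one batch and rebase offsets to start at 0.
    lo, hi = offsets[start], offsets[stop]
    return values[lo:hi], [o - lo for o in offsets[start:stop + 1]]


rows = [[1, 2], [3], [4, 5, 6]]
values, offsets = to_values_and_offsets(rows)
print(values)                            # [1, 2, 3, 4, 5, 6]
print(offsets)                           # [0, 2, 3, 6]
print(offsets_to_row_lengths(offsets))   # [2, 1, 3]
print(slice_rows(values, offsets, 1, 3)) # ([3, 4, 5, 6], [0, 1, 4])
```

Row lengths are independent of a row's position, so a batch slice needs no rebasing under that encoding, while offsets must be shifted as slice_rows does.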

array_lib()

property schema

Get input schema of data to be loaded.

Returns: Schema corresponding to the data
Return type: Schema

property output_schema

Get output schema of data being loaded.

When transforms are defined that change the features being loaded, this output schema accounts for those changes and should match the features returned by the loader. If there are no transforms, it is the same as the input schema.

Returns: Schema corresponding to the data that will be output by the loader
Return type: Schema
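The relationship between input_schema, transforms, and output_schema can be pictured as feeding the input schema through each transform in order. In the sketch below, a "schema" is simplified to a hypothetical dict of column name to dtype name, and compute_output_schema and drop_column are illustrative names only; Merlin's Schema class and transform objects are not modeled.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical simplification: a schema is a mapping of column name -> dtype name.
Schema = Dict[str, str]
Transform = Callable[[Schema], Schema]


def compute_output_schema(input_schema: Schema, transforms: Optional[List[Transform]] = None) -> Schema:
    # With no transforms, the output schema equals the input schema.
    schema = dict(input_schema)
    for transform in transforms or []:
        # Each transform maps the schema it receives to the schema it produces.
        schema = transform(schema)
    return schema


def drop_column(name: str) -> Transform:
    # Hypothetical transform that removes one feature from each batch.
    def apply(schema: Schema) -> Schema:
        return {col: dtype for col, dtype in schema.items() if col != name}
    return apply


input_schema = {"user_id": "int64", "item_id": "int64", "session_ts": "int64"}
print(compute_output_schema(input_schema) == input_schema)  # True
print(compute_output_schema(input_schema, [drop_column("session_ts")]))
# {'user_id': 'int64', 'item_id': 'int64'}
```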

property input_schema

Get input schema of data to be loaded.

If there are no transforms, this will be the same as the output schema.

Returns: Schema corresponding to the data that will be loaded prior to any transforms.
Return type: Schema