merlin.loader.loader_base.LoaderBase

class merlin.loader.loader_base.LoaderBase(dataset, batch_size, shuffle, seed_fn=None, parts_per_chunk=1, global_size=None, global_rank=None, drop_last=False)

Bases: object

Base class containing common functionality between the PyTorch and TensorFlow dataloaders.

__init__(dataset, batch_size, shuffle, seed_fn=None, parts_per_chunk=1, global_size=None, global_rank=None, drop_last=False)
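
The signature lists batch_size and drop_last but this page does not describe them. The sketch below assumes they follow the usual dataloader convention: when drop_last is True, a final batch smaller than batch_size is discarded. The helper batches_per_epoch is hypothetical and is not part of LoaderBase.

```python
import math


def batches_per_epoch(num_rows: int, batch_size: int, drop_last: bool = False) -> int:
    """Count the batches produced by one pass over num_rows rows.

    Assumption: drop_last discards a trailing partial batch, as in most
    dataloader APIs. This is an illustration, not Merlin's implementation.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_last:
        # Only full batches are kept.
        return num_rows // batch_size
    # A trailing partial batch counts as one more batch.
    return math.ceil(num_rows / batch_size)


print(batches_per_epoch(10, 4))                  # partial last batch kept
print(batches_per_epoch(10, 4, drop_last=True))  # partial last batch dropped
```

If the real loader follows this convention, drop_last=True keeps every batch the same size, at the cost of skipping a few rows per epoch.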

Methods

__init__(dataset, batch_size, shuffle[, …])

epochs([epochs]) – Create a dataloader that will efficiently run for more than one epoch.

make_tensors(gdf[, use_nnz]) – Converts a dataframe (gdf) into a tensor representation, column by column.

stop() – Halts and resets the initialization parameters of the dataloader.

epochs(epochs=1)

Create a dataloader that will efficiently run for more than one epoch.

Parameters

epochs (int, optional) – Number of epochs for which the dataloader should process data. Defaults to 1.

Returns

A dataloader that runs for the user-defined number of epochs.

Return type

DataLoader
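
The calling pattern described above can be sketched with a toy stand-in. ToyLoader is hypothetical and not the Merlin class. It assumes that epochs() returns a new loader and leaves the original unchanged, which this page does not confirm.

```python
from typing import Iterator, List, Sequence


class ToyLoader:
    """Hypothetical stand-in that only mimics the epochs() calling pattern."""

    def __init__(self, rows: Sequence[int], batch_size: int) -> None:
        self.rows = list(rows)
        self.batch_size = batch_size
        self._epochs = 1  # a plain loader makes a single pass

    def epochs(self, epochs: int = 1) -> "ToyLoader":
        """Return a loader that iterates over the data `epochs` times."""
        if epochs < 1:
            raise ValueError("epochs must be at least 1")
        # Assumption: a fresh loader is returned and self is left untouched.
        multi = ToyLoader(self.rows, self.batch_size)
        multi._epochs = epochs
        return multi

    def __iter__(self) -> Iterator[List[int]]:
        for _ in range(self._epochs):
            for start in range(0, len(self.rows), self.batch_size):
                yield self.rows[start:start + self.batch_size]


loader = ToyLoader(range(5), batch_size=2)
batches = list(loader.epochs(2))
print(len(batches))  # number of batches across both epochs
```

A training loop would then iterate over loader.epochs(n) once, rather than wrapping the loader in an outer loop over epochs.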

stop()

Halts and resets the initialization parameters of the dataloader.

make_tensors(gdf, use_nnz=False)

Converts a dataframe (gdf) into a tensor representation, column by column.

Parameters
  • gdf (DataFrame) – A dataframe-like object.

  • use_nnz (bool, optional) – If True, describe list columns by their nnzs; otherwise use offsets. Defaults to False.

Returns

A dictionary of the column tensor representations.

Return type

Dict[Tensors]
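
The difference between the two list-column encodings that use_nnz selects can be illustrated in plain Python. This sketch assumes that nnzs are per-row value counts and that offsets are cumulative start positions with a trailing end position. The function flatten_list_column and the dictionary keys are hypothetical and do not reflect the actual output of make_tensors.

```python
from itertools import accumulate
from typing import Dict, List


def flatten_list_column(rows: List[List[int]], use_nnz: bool = False) -> Dict[str, List[int]]:
    """Flatten a ragged list column into values plus row boundaries.

    Assumption: nnzs are per-row lengths and offsets are their running sum,
    starting at 0. Illustration only, not Merlin's implementation.
    """
    values = [v for row in rows for v in row]
    nnzs = [len(row) for row in rows]
    if use_nnz:
        return {"values": values, "nnzs": nnzs}
    # offsets[i] is where row i starts; the last entry is the total length.
    offsets = [0] + list(accumulate(nnzs))
    return {"values": values, "offsets": offsets}


column = [[1, 2], [3], [4, 5, 6]]
print(flatten_list_column(column))                # values with offsets
print(flatten_list_column(column, use_nnz=True))  # values with nnzs
```

Under these assumptions both encodings carry the same information: nnzs are the differences between consecutive offsets, and offsets are the running sum of nnzs.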