merlin_standard_lib.utils package

Submodules

merlin_standard_lib.utils.doc_utils module

merlin_standard_lib.utils.doc_utils.docstring_parameter(*args, extra_padding=None, **kwargs)

merlin_standard_lib.utils.embedding_utils module

merlin_standard_lib.utils.embedding_utils.get_embedding_sizes_from_schema(schema: merlin_standard_lib.schema.schema.Schema, multiplier: float = 2.0)
merlin_standard_lib.utils.embedding_utils.get_embedding_size_from_cardinality(cardinality: int, multiplier: float = 2.0)

merlin_standard_lib.utils.misc_utils module

merlin_standard_lib.utils.misc_utils.filter_kwargs(kwargs, thing_with_kwargs, filter_positional_or_keyword=True)
merlin_standard_lib.utils.misc_utils.safe_json(data)
merlin_standard_lib.utils.misc_utils.get_filenames(data_paths, files_filter_pattern='*')
merlin_standard_lib.utils.misc_utils.get_label_feature_name(feature_map: Dict[str, Any]) → str

Analyses the feature map config and returns the name of the label feature (e.g. item_id)

merlin_standard_lib.utils.misc_utils.get_timestamp_feature_name(feature_map: Dict[str, Any]) → str

Analyses the feature map config and returns the name of the timestamp feature

merlin_standard_lib.utils.misc_utils.get_parquet_files_names(data_args, time_indices, is_train, eval_on_test_set=False)
class merlin_standard_lib.utils.misc_utils.Timing(message, file=sys.stdout, logger=None, one_line=True)

Bases: object

A context manager that prints the execution time of the block it manages
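The idea behind a timing context manager can be illustrated with a minimal sketch. This is not the library's implementation: the class name TimingSketch and the elapsed attribute are hypothetical, and the logger and one_line parameters of Timing are left out.

```python
import sys
import time


class TimingSketch:
    """Hypothetical stand-in for Timing: prints how long the managed block took."""

    def __init__(self, message, file=sys.stdout):
        self.message = message
        self.file = file

    def __enter__(self):
        # Record the start time when the block is entered.
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Compute and print the elapsed time when the block exits.
        self.elapsed = time.perf_counter() - self.start
        print(f"{self.message}: {self.elapsed:.4f}s", file=self.file)
        return False  # do not swallow exceptions raised inside the block


with TimingSketch("summing one million integers"):
    total = sum(range(1_000_000))
```

Passing a different file object (for example an io.StringIO buffer) redirects the report, which is presumably the purpose of the file parameter in the real class.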

merlin_standard_lib.utils.misc_utils.get_object_size(obj, seen=None)

Recursively finds size of objects
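A common way to compute a recursive object size is to walk containers and attributes while tracking already-visited object ids. The sketch below shows that technique under the assumption that the seen argument plays this role; object_size_sketch is a hypothetical name and the actual get_object_size may handle more types or differ in detail.

```python
import sys


def object_size_sketch(obj, seen=None):
    """Approximate the size in bytes of obj plus everything it references."""
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0  # already counted; also guards against reference cycles
    seen.add(obj_id)

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        # Count both keys and values of a mapping.
        size += sum(
            object_size_sketch(k, seen) + object_size_sketch(v, seen)
            for k, v in obj.items()
        )
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(object_size_sketch(item, seen) for item in obj)
    elif hasattr(obj, "__dict__"):
        # Instances of user-defined classes keep their attributes in __dict__.
        size += object_size_sketch(vars(obj), seen)
    return size


nested = {"a": [1, 2, 3], "b": "text"}
print(object_size_sketch(nested))
```

The result is always at least sys.getsizeof(obj), since the shallow size is counted first and the sizes of referenced objects are added on top.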

merlin_standard_lib.utils.misc_utils.validate_dataset(paths_or_dataset, batch_size, buffer_size, engine, reader_kwargs)

Util function to load NVTabular Dataset from disk

Parameters
  • paths_or_dataset (Union[nvtabular.Dataset, str]) – Path to the dataset to load, or an nvtabular Dataset. If a Dataset is passed, the object is returned as is.

  • batch_size (int) – batch size for Dataloader.

  • buffer_size (float) – Fraction of batches to load at once.

  • engine (str) – File format of the data. Possible values are: [“parquet”, “csv”, “csv-no-header”].

  • reader_kwargs (dict) – Additional arguments of the specified reader.

merlin_standard_lib.utils.proto_utils module

merlin_standard_lib.utils.proto_utils.has_field(self, field_name)
merlin_standard_lib.utils.proto_utils.copy_better_proto_message(better_proto_message: ProtoMessageType, **kwargs) → ProtoMessageType
merlin_standard_lib.utils.proto_utils.better_proto_to_proto_text(better_proto_message: betterproto.Message, message: google.protobuf.message.Message) → str
merlin_standard_lib.utils.proto_utils.proto_text_to_better_proto(better_proto_message: ProtoMessageType, path_proto_text: str, message: google.protobuf.message.Message) → ProtoMessageType

Module contents