merlin_standard_lib package

Submodules

merlin_standard_lib.registry module

merlin_standard_lib.registry.camelcase_to_snakecase(name)[source]
merlin_standard_lib.registry.snakecase_to_camelcase(name)[source]
merlin_standard_lib.registry.default_name(class_or_fn)[source]

Default name for a class or function.

This is the default naming function for registries that expect classes or functions.

Parameters

class_or_fn – class or function to be named.

Returns

Default name for registration.
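The page does not spell out how the default name is derived. As a minimal sketch, assuming the name comes from the object's `__name__` converted to snake case, the behavior could look like the following. The `_sketch` functions and both regular expressions are hypothetical stand-ins, not the library's implementation.

```python
import re

# Hypothetical conversion rules; the library's actual rules may differ.
_FIRST_CAP = re.compile(r"(.)([A-Z][a-z0-9]+)")
_ALL_CAP = re.compile(r"([a-z0-9])([A-Z])")


def camelcase_to_snakecase_sketch(name):
    """Convert a CamelCase identifier such as 'MyBlock' to 'my_block'."""
    partial = _FIRST_CAP.sub(r"\1_\2", name)
    return _ALL_CAP.sub(r"\1_\2", partial).lower()


def default_name_sketch(class_or_fn):
    """Derive a registry key from the object's __name__ (assumed behavior)."""
    return camelcase_to_snakecase_sketch(class_or_fn.__name__)


class SequentialBlock:
    pass


def my_func():
    pass


print(default_name_sketch(SequentialBlock))  # sequential_block
print(default_name_sketch(my_func))          # my_func
```

Under this assumption a function that is already snake-cased keeps its own name as the key, which matches the Registry usage example where `my_func` is registered as "my_func".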

merlin_standard_lib.registry.default_object_name(obj)[source]
class merlin_standard_lib.registry.Registry(registry_name, default_key_fn=<function default_name>, validator=None, on_set=None, value_transformer=<function Registry.<lambda>>)[source]

Bases: object

Dict-like class for managing function registrations.

Example usage:

my_registry = Registry("custom_name")
@my_registry.register
def my_func():
  pass
@my_registry.register()
def another_func():
  pass
@my_registry.register("non_default_name")
def third_func(x, y, z):
  pass
def foo():
  pass
my_registry.register()(foo)
my_registry.register("baz")(lambda x, y: x + y)
my_registry.register("bar")  # returns a callback; nothing is registered yet
print(list(my_registry))
# ["my_func", "another_func", "non_default_name", "foo", "baz"]
# (order may vary)
print(my_registry["non_default_name"] is third_func)  # True
print("third_func" in my_registry)                    # False
print("bar" in my_registry)                           # False
my_registry["non-existent_key"]                       # raises KeyError

Optional validation, an on_set callback, and a value transformer are also supported.

Parameters
  • registry_name (str) – identifier for the given registry. Used in error messages.

  • default_key_fn (callable, optional) – function mapping value -> key for registration when a key is not provided.

  • validator (callable, optional) – if given, this is run before setting a given (key, value) pair. Accepts (key, value) and should raise if there is a problem. Overwriting existing keys is not allowed and is checked separately. Values are also checked to be callable separately.

  • on_set (callable, optional) – callback function accepting a (key, value) pair, which is run after an item is successfully set.

  • value_transformer (callable, optional) – if provided, __getitem__ will return value_transformer(key, registered_value).
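To make the order of the three hooks concrete, here is a small self-contained sketch. `MiniRegistry` is a hypothetical stand-in written only from the parameter descriptions above (duplicate-key check, callable check, validator before the set, on_set after it, value_transformer on lookup); it is not the library's class, and the real implementation may differ in details such as error messages.

```python
class MiniRegistry:
    """Hypothetical stand-in that mimics the hooks documented above."""

    def __init__(self, registry_name, validator=None, on_set=None,
                 value_transformer=lambda key, value: value):
        self._name = registry_name
        self._items = {}
        self._validator = validator
        self._on_set = on_set
        self._value_transformer = value_transformer

    def __setitem__(self, key, value):
        if key in self._items:
            raise KeyError(f"key {key!r} already registered in {self._name}")
        if not callable(value):
            raise ValueError("value must be callable")
        if self._validator is not None:
            self._validator(key, value)  # expected to raise on a problem
        self._items[key] = value
        if self._on_set is not None:
            self._on_set(key, value)  # runs only after a successful set

    def __getitem__(self, key):
        if key not in self._items:
            raise KeyError(f"{key!r} never registered with {self._name}")
        return self._value_transformer(key, self._items[key])


def require_lowercase(key, value):
    """Example validator: reject keys that contain uppercase letters."""
    if key != key.lower():
        raise ValueError(f"key {key!r} must be lowercase")


log = []
registry = MiniRegistry(
    "demo",
    validator=require_lowercase,
    on_set=lambda key, value: log.append(key),
    value_transformer=lambda key, value: (key, value()),
)

registry["answer"] = lambda: 42
print(registry["answer"])  # ('answer', 42)
print(log)                 # ['answer']
```

In this sketch a rejected key never reaches on_set, so the callback can be used for bookkeeping that should only reflect successful registrations.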

classmethod class_registry(registry_name, default_key_fn=<function default_name>, validator=None, on_set=None)[source]
default_key(value)[source]

Default key used when key not provided. Uses function from __init__.

property name
validate(key, value)[source]

Validation function run before setting. Uses function from __init__.

on_set(key, value)[source]

Callback called on successful set. Uses function from __init__.

register(key_or_value=None)[source]

Decorator that registers a function, or performs the registration directly when called with a callable. This is primarily intended for use as a decorator, with or without a key and with or without parentheses.

Example Usage:

@my_registry.register('key1')
def value_fn(x, y, z):
  pass
@my_registry.register()
def another_fn(x, y):
  pass
@my_registry.register
def third_func():
  pass

Note that if key_or_value is provided as a non-callable, registration only occurs once the returned callback is called with a callable as its only argument:

callback = my_registry.register('different_key')
'different_key' in my_registry  # False
callback(lambda x, y: x + y)
'different_key' in my_registry  # True

Parameters

key_or_value (optional) – key to access the registered value with, or the function itself. If None (default), self.default_key will be called on the value once the returned callback is called with the value as its only argument. If key_or_value is itself callable, it is assumed to be the value and the key is given by self.default_key(key_or_value).

Returns

The decorated function when key_or_value is callable; otherwise a callback that registers the callable passed to it.
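The three calling forms can be handled by a single dispatch on whether the argument is callable. The sketch below illustrates that idea over a plain dict; `make_register` and its `default_key` argument are hypothetical helpers for illustration, not part of the library.

```python
def make_register(store, default_key):
    """Build a register() function over a plain dict (hypothetical helper)."""

    def register(key_or_value=None):
        def do_register(value, key):
            if key is None:
                key = default_key(value)
            if key in store:
                raise KeyError(f"{key!r} already registered")
            if not callable(value):
                raise ValueError("value must be callable")
            store[key] = value
            return value

        if callable(key_or_value):
            # Used as @register without parentheses: the argument is the value.
            return do_register(key_or_value, key=None)
        # Used as @register() or @register("key"): return the real decorator.
        return lambda value: do_register(value, key=key_or_value)

    return register


store = {}
register = make_register(store, default_key=lambda fn: fn.__name__)


@register
def first():
    pass


@register()
def second():
    pass


@register("custom")
def third():
    pass


pending = register("later")
print("later" in store)  # False
pending(lambda x, y: x + y)
print(sorted(store))     # ['custom', 'first', 'later', 'second']
```

Returning the value unchanged from the inner function is what lets the decorated name keep pointing at the original function, as in the `my_registry["non_default_name"] is third_func` check in the class-level example.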

register_with_multiple_names(*names)[source]
keys()[source]
values()[source]
items()[source]
get(key, default=None)[source]
parse(class_or_str)[source]
class merlin_standard_lib.registry.RegistryMixin[source]

Bases: Generic[merlin_standard_lib.registry.RegistryClassT], abc.ABC

classmethod parse(class_or_str) -> RegistryClassT[source]
classmethod registry() -> merlin_standard_lib.registry.Registry[source]
merlin_standard_lib.registry.display_list_by_prefix(names_list, starting_spaces=0)[source]

Creates a help string for names_list grouped by prefix.
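The exact layout of the help string is not documented here. One plausible reading, sketched below, sorts the names and groups them under the text before the first underscore. The function name `display_list_by_prefix_sketch` and the "prefix:" and "* name" formatting are assumptions for illustration only.

```python
def display_list_by_prefix_sketch(names_list, starting_spaces=0):
    """Group sorted names under the text before their first underscore."""
    indent = " " * starting_spaces
    lines = []
    current_prefix = None
    for name in sorted(names_list):
        prefix = name.split("_", 1)[0]
        if prefix != current_prefix:
            # Start a new group header whenever the prefix changes.
            lines.append(f"{indent}{prefix}:")
            current_prefix = prefix
        lines.append(f"{indent}  * {name}")
    return "\n".join(lines)


print(display_list_by_prefix_sketch(
    ["transformer_base", "lstm_small", "transformer_big"], starting_spaces=2
))
```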

Module contents

class merlin_standard_lib.ColumnSchema(name: str = <betterproto._PLACEHOLDER object>, deprecated: bool = <betterproto._PLACEHOLDER object>, presence: merlin_standard_lib.proto.schema_bp.FeaturePresence = <betterproto._PLACEHOLDER object>, group_presence: merlin_standard_lib.proto.schema_bp.FeaturePresenceWithinGroup = <betterproto._PLACEHOLDER object>, shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>, value_count: merlin_standard_lib.proto.schema_bp.ValueCount = <betterproto._PLACEHOLDER object>, value_counts: merlin_standard_lib.proto.schema_bp.ValueCountList = <betterproto._PLACEHOLDER object>, type: merlin_standard_lib.proto.schema_bp.FeatureType = <betterproto._PLACEHOLDER object>, domain: str = <betterproto._PLACEHOLDER object>, int_domain: merlin_standard_lib.proto.schema_bp.IntDomain = <betterproto._PLACEHOLDER object>, float_domain: merlin_standard_lib.proto.schema_bp.FloatDomain = <betterproto._PLACEHOLDER object>, string_domain: merlin_standard_lib.proto.schema_bp.StringDomain = <betterproto._PLACEHOLDER object>, bool_domain: merlin_standard_lib.proto.schema_bp.BoolDomain = <betterproto._PLACEHOLDER object>, struct_domain: merlin_standard_lib.proto.schema_bp.StructDomain = <betterproto._PLACEHOLDER object>, natural_language_domain: merlin_standard_lib.proto.schema_bp.NaturalLanguageDomain = <betterproto._PLACEHOLDER object>, image_domain: merlin_standard_lib.proto.schema_bp.ImageDomain = <betterproto._PLACEHOLDER object>, mid_domain: merlin_standard_lib.proto.schema_bp.MIDDomain = <betterproto._PLACEHOLDER object>, url_domain: merlin_standard_lib.proto.schema_bp.URLDomain = <betterproto._PLACEHOLDER object>, time_domain: merlin_standard_lib.proto.schema_bp.TimeDomain = <betterproto._PLACEHOLDER object>, time_of_day_domain: merlin_standard_lib.proto.schema_bp.TimeOfDayDomain = <betterproto._PLACEHOLDER object>, distribution_constraints: merlin_standard_lib.proto.schema_bp.DistributionConstraints = <betterproto._PLACEHOLDER object>, annotation: merlin_standard_lib.proto.schema_bp.Annotation = <betterproto._PLACEHOLDER object>, skew_comparator: merlin_standard_lib.proto.schema_bp.FeatureComparator = <betterproto._PLACEHOLDER object>, drift_comparator: merlin_standard_lib.proto.schema_bp.FeatureComparator = <betterproto._PLACEHOLDER object>, in_environment: List[str] = <betterproto._PLACEHOLDER object>, not_in_environment: List[str] = <betterproto._PLACEHOLDER object>, lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>, unique_constraints: merlin_standard_lib.proto.schema_bp.UniqueConstraints = <betterproto._PLACEHOLDER object>)[source]

Bases: merlin_standard_lib.proto.schema_bp.Feature

classmethod create_categorical(name: str, num_items: int, shape: Optional[Union[Tuple[int, ], List[int]]] = None, value_count: Optional[Union[merlin_standard_lib.proto.schema_bp.ValueCount, merlin_standard_lib.proto.schema_bp.ValueCountList]] = None, min_index: int = 0, tags: Optional[Union[List[str], List[merlin_standard_lib.schema.tag.Tag], List[Union[merlin_standard_lib.schema.tag.Tag, str]]]] = None, **kwargs) -> merlin_standard_lib.schema.schema.ColumnSchema[source]
classmethod create_continuous(name: str, is_float: bool = True, min_value: Optional[Union[int, float]] = None, max_value: Optional[Union[int, float]] = None, disallow_nan: bool = False, disallow_inf: bool = False, is_embedding: bool = False, shape: Optional[Union[Tuple[int, ], List[int]]] = None, value_count: Optional[Union[merlin_standard_lib.proto.schema_bp.ValueCount, merlin_standard_lib.proto.schema_bp.ValueCountList]] = None, tags: Optional[Union[List[str], List[merlin_standard_lib.schema.tag.Tag], List[Union[merlin_standard_lib.schema.tag.Tag, str]]]] = None, **kwargs) -> merlin_standard_lib.schema.schema.ColumnSchema[source]
copy(**kwargs) -> merlin_standard_lib.schema.schema.ColumnSchema[source]
with_name(name: str)[source]
with_tags(tags: Union[List[str], List[merlin_standard_lib.schema.tag.Tag], List[Union[merlin_standard_lib.schema.tag.Tag, str]]]) -> merlin_standard_lib.schema.schema.ColumnSchema[source]
with_tags_based_on_properties(using_value_count=True, using_domain=True) -> merlin_standard_lib.schema.schema.ColumnSchema[source]
with_properties(properties: Dict[str, Union[str, int, float]]) -> merlin_standard_lib.schema.schema.ColumnSchema[source]
to_proto_text() -> str[source]
property tags
property properties
class merlin_standard_lib.Schema(feature: Sequence[merlin_standard_lib.proto.schema_bp.Feature] = <betterproto._PLACEHOLDER object>, sparse_feature: List[merlin_standard_lib.proto.schema_bp.SparseFeature] = <betterproto._PLACEHOLDER object>, weighted_feature: List[merlin_standard_lib.proto.schema_bp.WeightedFeature] = <betterproto._PLACEHOLDER object>, string_domain: List[merlin_standard_lib.proto.schema_bp.StringDomain] = <betterproto._PLACEHOLDER object>, float_domain: List[merlin_standard_lib.proto.schema_bp.FloatDomain] = <betterproto._PLACEHOLDER object>, int_domain: List[merlin_standard_lib.proto.schema_bp.IntDomain] = <betterproto._PLACEHOLDER object>, default_environment: List[str] = <betterproto._PLACEHOLDER object>, annotation: merlin_standard_lib.proto.schema_bp.Annotation = <betterproto._PLACEHOLDER object>, dataset_constraints: merlin_standard_lib.proto.schema_bp.DatasetConstraints = <betterproto._PLACEHOLDER object>, tensor_representation_group: Dict[str, merlin_standard_lib.proto.schema_bp.TensorRepresentationGroup] = <betterproto._PLACEHOLDER object>)[source]

Bases: merlin_standard_lib.proto.schema_bp._Schema

A collection of column schemas for a dataset.

feature: List[merlin_standard_lib.schema.schema.ColumnSchema] = Field(name=None,type=None,default=<betterproto._PLACEHOLDER object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({'betterproto': FieldMetadata(number=1, proto_type='message', map_types=None, group=None, wraps=None)}),_field_type=None)
classmethod create(column_schemas: Optional[Union[List[Union[merlin_standard_lib.schema.schema.ColumnSchema, str]], Dict[str, Union[merlin_standard_lib.schema.schema.ColumnSchema, str]]]] = None, **kwargs)[source]
with_tags_based_on_properties(using_value_count=True, using_domain=True) -> merlin_standard_lib.schema.schema.Schema[source]
apply(selector) -> merlin_standard_lib.schema.schema.Schema[source]
apply_inverse(selector) -> merlin_standard_lib.schema.schema.Schema[source]
filter_columns_from_dict(input_dict)[source]
select_by_type(to_select) -> merlin_standard_lib.schema.schema.Schema[source]
remove_by_type(to_remove) -> merlin_standard_lib.schema.schema.Schema[source]
select_by_tag(to_select) -> merlin_standard_lib.schema.schema.Schema[source]
remove_by_tag(to_remove) -> merlin_standard_lib.schema.schema.Schema[source]
select_by_name(to_select) -> merlin_standard_lib.schema.schema.Schema[source]
remove_by_name(to_remove) -> merlin_standard_lib.schema.schema.Schema[source]
map_column_schemas(map_fn: Callable[[merlin_standard_lib.schema.schema.ColumnSchema], merlin_standard_lib.schema.schema.ColumnSchema]) -> merlin_standard_lib.schema.schema.Schema[source]
filter_column_schemas(filter_fn: Callable[[merlin_standard_lib.schema.schema.ColumnSchema], bool], negate=False) -> merlin_standard_lib.schema.schema.Schema[source]
categorical_cardinalities() -> Dict[str, int][source]
property column_names
property column_schemas
property item_id_column_name
from_json(value: Union[str, bytes]) -> merlin_standard_lib.schema.schema.Schema[source]
to_proto_text() -> str[source]
from_proto_text(path_or_proto_text: str) -> merlin_standard_lib.schema.schema.Schema[source]
copy(**kwargs) -> merlin_standard_lib.schema.schema.Schema[source]
add(other, allow_overlap=True) -> merlin_standard_lib.schema.schema.Schema[source]
class merlin_standard_lib.Tag(value)[source]

Bases: enum.Enum

An enumeration.

CATEGORICAL = 'categorical'
CONTINUOUS = 'continuous'
LIST = 'list'
TEXT = 'text'
TEXT_TOKENIZED = 'text_tokenized'
TIME = 'time'
USER = 'user'
USER_ID = 'user_id'
ITEM = 'item'
ITEM_ID = 'item_id'
SESSION = 'session'
SESSION_ID = 'session_id'
CONTEXT = 'context'
TARGETS = 'target'
BINARY_CLASSIFICATION = 'binary_classification'
MULTI_CLASS_CLASSIFICATION = 'multi_class'
REGRESSION = 'regression'
class merlin_standard_lib.Registry(registry_name, default_key_fn=<function default_name>, validator=None, on_set=None, value_transformer=<function Registry.<lambda>>)[source]

Bases: object

Dict-like class for managing function registrations.

The usage example, parameters, and methods are identical to those of merlin_standard_lib.registry.Registry, documented above.
class merlin_standard_lib.RegistryMixin[source]

Bases: Generic[merlin_standard_lib.registry.RegistryClassT], abc.ABC

classmethod parse(class_or_str) -> RegistryClassT[source]
classmethod registry() -> merlin_standard_lib.registry.Registry[source]