merlin.schema package

class merlin.schema.Schema(column_schemas=None)

Bases: object

A collection of column schemas for a dataset.

apply(selector) → merlin.schema.schema.Schema

Select matching columns from this Schema object using a ColumnSelector

Parameters

selector (ColumnSelector) – Selector that describes which columns match

Returns

New object containing only the ColumnSchemas of selected columns

Return type

Schema

apply_inverse(selector) → merlin.schema.schema.Schema

Select non-matching columns from this Schema object using a ColumnSelector

Parameters

selector (ColumnSelector) – Selector that describes which columns match

Returns

New object containing only the ColumnSchemas of columns that the selector does not match

Return type

Schema
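The relationship between apply and apply_inverse can be sketched with a small stand-in. The snippet below does not import Merlin: MiniSchema is a hypothetical class, and a plain list of names stands in for a ColumnSelector. It only illustrates that the two methods split the columns into matching and non-matching halves, each returned as a new object.

```python
from typing import Dict, List


class MiniSchema:
    """Hypothetical stand-in for Schema: maps column names to metadata dicts."""

    def __init__(self, columns: Dict[str, dict]):
        self.columns = dict(columns)

    @property
    def column_names(self) -> List[str]:
        return list(self.columns)

    def apply(self, names: List[str]) -> "MiniSchema":
        # Keep only the columns that the (stand-in) selector matches.
        return MiniSchema({n: c for n, c in self.columns.items() if n in names})

    def apply_inverse(self, names: List[str]) -> "MiniSchema":
        # Keep only the columns that the (stand-in) selector does not match.
        return MiniSchema({n: c for n, c in self.columns.items() if n not in names})


schema = MiniSchema({"user_id": {}, "item_id": {}, "price": {}})
matched = schema.apply(["user_id", "item_id"])
rest = schema.apply_inverse(["user_id", "item_id"])

print(matched.column_names)  # ['user_id', 'item_id']
print(rest.column_names)     # ['price']
```

Together the two results cover every column of the original object exactly once, and the original is left unchanged.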

property column_names
property first: merlin.schema.schema.ColumnSchema

Returns the first ColumnSchema in the Schema. Useful when you have narrowed the Schema down to a single column via select_by_name or select_by_tag and just want that column's ColumnSchema.

Returns

The first column schema present in this Schema object

Return type

ColumnSchema

Raises

ValueError – If this Schema object contains no column schemas
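The contract described for first (return the first column schema, raise ValueError when there is none) can be sketched as follows. The helper first_column and the dict-based schema are hypothetical stand-ins, not Merlin code.

```python
from typing import Dict


def first_column(columns: Dict[str, dict]) -> dict:
    """Hypothetical helper mirroring the documented contract of Schema.first."""
    if not columns:
        raise ValueError("schema contains no column schemas")
    # Dicts preserve insertion order, so this is the first column that was added.
    return next(iter(columns.values()))


single = {"price": {"dtype": "float32"}}
print(first_column(single))  # {'dtype': 'float32'}

try:
    first_column({})
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # schema contains no column schemas
```

Callers that cannot guarantee a non-empty selection may want to handle the ValueError explicitly, as the try block does here.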

get(col_name: str, default: Optional[merlin.schema.schema.ColumnSchema] = None) → merlin.schema.schema.ColumnSchema

Get a ColumnSchema by name

Parameters
  • col_name (str) – Name of the column to get

  • default (ColumnSchema) –

    Default value to return if column is not found.

    (Default value = None)

Returns

Retrieved column schema (or default value, if not found)

Return type

ColumnSchema
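Because default is None unless given, a lookup for a missing column can return None rather than a ColumnSchema. A minimal sketch of that behaviour, using a plain dict and a hypothetical get_column helper in place of a real Schema:

```python
from typing import Dict, Optional

# Hypothetical stand-in: column name -> metadata dict, in place of a real Schema.
columns: Dict[str, dict] = {"price": {"dtype": "float32"}}


def get_column(col_name: str, default: Optional[dict] = None) -> Optional[dict]:
    # Return the stored entry, or the caller-supplied default when it is absent.
    return columns.get(col_name, default)


found = get_column("price")
missing = get_column("discount")
fallback = get_column("discount", default={"dtype": "float32"})

print(found)     # {'dtype': 'float32'}
print(missing)   # None
print(fallback)  # {'dtype': 'float32'}
```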

remove_by_tag(tags) → merlin.schema.schema.Schema
remove_col(col_name: str) → merlin.schema.schema.Schema

Remove a column from this Schema object by name

Parameters

col_name (str) – Name of the column to remove

Returns

This Schema object after the column is removed

Return type

Schema

select_by_name(names: List[str]) → merlin.schema.schema.Schema

Select matching columns from this Schema object using a list of column names

Parameters

names (List[str]) – List of column names that describes which columns match

Returns

New object containing only the ColumnSchemas of selected columns

Return type

Schema

select_by_tag(tags: List[Union[str, merlin.schema.tags.Tags]]) → merlin.schema.schema.Schema

Select matching columns from this Schema object using a list of tags

Parameters

tags (List[Union[str, Tags]]) – List of tags that describes which columns match

Returns

New object containing only the ColumnSchemas of selected columns

Return type

Schema

without(col_names: List[str]) → merlin.schema.schema.Schema

Remove columns from this Schema object by name

Parameters

col_names (List[str]) – Names of the columns to remove

Returns

New Schema object after the columns are removed

Return type

Schema
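The return descriptions suggest a difference between the two removal methods: remove_col returns "this Schema object", while without returns a "new Schema object". The sketch below illustrates that distinction with a hypothetical MiniSchema stand-in. Whether the real remove_col silently ignores an unknown column name is not stated above, so that detail is an assumption of the sketch.

```python
from typing import Dict, List


class MiniSchema:
    """Hypothetical stand-in for Schema: maps column names to metadata dicts."""

    def __init__(self, columns: Dict[str, dict]):
        self.columns = dict(columns)

    def remove_col(self, col_name: str) -> "MiniSchema":
        # Mutates this object and returns it.
        # Assumption: an unknown name is ignored rather than raising.
        self.columns.pop(col_name, None)
        return self

    def without(self, col_names: List[str]) -> "MiniSchema":
        # Builds a new object and leaves this one untouched.
        return MiniSchema({n: c for n, c in self.columns.items() if n not in col_names})


schema = MiniSchema({"a": {}, "b": {}, "c": {}})

trimmed = schema.without(["a"])
print(list(trimmed.columns))  # ['b', 'c']
print(list(schema.columns))   # ['a', 'b', 'c']

same = schema.remove_col("b")
print(same is schema)         # True
print(list(schema.columns))   # ['a', 'c']
```

When the original object is shared between several consumers, the copying form is the safer choice.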

class merlin.schema.ColumnSchema(name: str, tags: Optional[merlin.schema.tags.TagSet] = <factory>, properties: Optional[Dict[str, any]] = <factory>, dtype: Optional[object] = None, is_list: bool = False, is_ragged: bool = False)

Bases: object

A schema containing metadata of a dataframe column.

dtype: Optional[object] = None
property float_domain: Optional[merlin.schema.schema.Domain]
property int_domain: Optional[merlin.schema.schema.Domain]
is_list: bool = False
is_ragged: bool = False
name: str
properties: Optional[Dict[str, any]]
tags: Optional[merlin.schema.tags.TagSet]
property value_count: Optional[merlin.schema.schema.Domain]
with_dtype(dtype, is_list: Optional[bool] = None, is_ragged: Optional[bool] = None) → merlin.schema.schema.ColumnSchema

Create a copy of this ColumnSchema object with a different column dtype

Parameters
  • dtype (np.dtype) – New column dtype

  • is_list (bool) –

    Whether rows in this column contain lists.

    (Default value = None)

  • is_ragged (bool) –

    Whether lists in this column have varying lengths.

    (Default value = None)

Returns

Copied object with new column dtype

Return type

ColumnSchema

with_name(name: str) → merlin.schema.schema.ColumnSchema

Create a copy of this ColumnSchema object with a different column name

Parameters

name (str) – New column name

Returns

Copied object with new column name

Return type

ColumnSchema

with_properties(properties: dict) → merlin.schema.schema.ColumnSchema

Create a copy of this ColumnSchema object with different column properties

Parameters

properties (dict) – New column properties

Returns

Copied object with new column properties

Return type

ColumnSchema

Raises

TypeError – If properties is not a dict

with_tags(tags: Union[str, merlin.schema.tags.Tags]) → merlin.schema.schema.ColumnSchema

Create a copy of this ColumnSchema object with different column tags

Parameters

tags (Union[str, Tags]) – New column tags

Returns

Copied object with new column tags

Return type

ColumnSchema
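All of the with_* methods are described as returning a copy instead of modifying the object they are called on. One common way to get that behaviour is a frozen dataclass combined with dataclasses.replace. The sketch below uses a hypothetical MiniColumnSchema with only two fields, and it is not a claim about how Merlin implements ColumnSchema.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MiniColumnSchema:
    """Hypothetical stand-in for ColumnSchema with a reduced set of fields."""

    name: str
    dtype: Optional[object] = None

    def with_name(self, name: str) -> "MiniColumnSchema":
        # replace() builds a new instance; self is left unchanged.
        return replace(self, name=name)

    def with_dtype(self, dtype: object) -> "MiniColumnSchema":
        return replace(self, dtype=dtype)


price = MiniColumnSchema("price", dtype="float64")
renamed = price.with_name("price_usd").with_dtype("float32")

print(price)    # MiniColumnSchema(name='price', dtype='float64')
print(renamed)  # MiniColumnSchema(name='price_usd', dtype='float32')
```

Since each call returns a new object, the calls can be chained, and the original column schema stays usable afterwards.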

class merlin.schema.Tags(value)

Bases: enum.Enum

Standard tags used in the Merlin ecosystem

BINARY_CLASSIFICATION = 'binary_classification'
CATEGORICAL = 'categorical'
CONTEXT = 'context'
CONTINUOUS = 'continuous'
ITEM = 'item'
ITEM_ID = 'item_id'
LIST = 'list'
MULTI_CLASS_CLASSIFICATION = 'multi_class'
REGRESSION = 'regression'
SEQUENCE = 'sequence'
SESSION = 'session'
SESSION_ID = 'session_id'
TARGET = 'target'
TEXT = 'text'
TEXT_TOKENIZED = 'text_tokenized'
TIME = 'time'
USER = 'user'
USER_ID = 'user_id'
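Since Tags derives from enum.Enum and takes a value argument, members can be looked up by their string value using standard Enum behaviour. The sketch below redefines a three-member subset locally so that it runs without Merlin installed. The member names and values are copied from the list above; everything else is plain Python.

```python
from enum import Enum


class Tags(Enum):
    # Local subset of the members listed above, with their documented values.
    CATEGORICAL = "categorical"
    MULTI_CLASS_CLASSIFICATION = "multi_class"
    USER_ID = "user_id"


tag = Tags("multi_class")    # lookup by value
print(tag.name)              # MULTI_CLASS_CLASSIFICATION
print(Tags["USER_ID"].value)  # user_id (lookup by name)

# The value of MULTI_CLASS_CLASSIFICATION is not its lowercased name,
# so looking it up by the lowercased name fails.
try:
    Tags("multi_class_classification")
    lookup_failed = False
except ValueError:
    lookup_failed = True
print(lookup_failed)         # True
```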