merlin.schema.Schema

class merlin.schema.Schema(column_schemas=None)

Bases: object

A collection of column schemas for a dataset.

__init__(column_schemas=None)

Methods

__init__([column_schemas])

apply(selector)

apply_inverse(selector)

copy()

Return a copy of the schema

excluding(selector)

Select non-matching columns from this Schema object using a ColumnSelector

excluding_by_name(col_names)

Remove columns from this Schema object by name

excluding_by_tag(tags[, pred_fn])

Remove columns from the schema that match ANY of the supplied tags.

get(col_name[, default])

Get a ColumnSchema by name

remove_by_tag(tags[, pred_fn])

remove_col(col_name)

Remove a column from this Schema object by name

select(selector)

Select matching columns from this Schema object using a ColumnSelector

select_by_name(names)

Select matching columns from this Schema object using a list of column names

select_by_tag(tags[, pred_fn])

Select columns from this Schema that match ANY of the supplied tags.

to_pandas()

Convert this Schema object to a pandas DataFrame

without(col_names)

Attributes

column_names

first

Returns the first ColumnSchema in the Schema.

property column_names

select(selector) → merlin.schema.schema.Schema

Select matching columns from this Schema object using a ColumnSelector

Parameters

selector (ColumnSelector) – Selector that describes which columns match

Returns

New object containing only the ColumnSchemas of selected columns

Return type

Schema

apply(selector) → merlin.schema.schema.Schema

excluding(selector) → merlin.schema.schema.Schema

Select non-matching columns from this Schema object using a ColumnSelector

Parameters

selector (ColumnSelector) – Selector that describes which columns match

Returns

New object containing only the ColumnSchemas of the columns that the selector does not match

Return type

Schema

apply_inverse(selector) → merlin.schema.schema.Schema

select_by_tag(tags: Union[str, merlin.schema.tags.Tags, List[Union[str, merlin.schema.tags.Tags]]], pred_fn=None) → merlin.schema.schema.Schema

Select columns from this Schema that match ANY of the supplied tags.

Parameters
  • tags (Union[str, Tags, List[Union[str, Tags]]]) – Tag or list of tags that describes which columns match

  • pred_fn (any or all) – Predicate function that decides whether a column is selected. It receives an iterable of bool values indicating whether each of the provided tags is present on a column schema. Returning True selects the column; returning False leaves it out.

Returns

New object containing only the ColumnSchemas of selected columns

Return type

Schema
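
The tag-matching rule described above can be sketched in plain Python. This is a toy model, not merlin's implementation: the `columns` dictionary and the `select_names_by_tag` helper are hypothetical, and tags are plain strings rather than `Tags` members. It only illustrates how `pred_fn` (`any` or `all`) turns the per-tag presence flags into a keep-or-drop decision, assuming that `None` falls back to ANY.

```python
# Toy model: column name -> set of tags (hypothetical data, not a merlin Schema).
columns = {
    "user_id": {"categorical", "user"},
    "item_id": {"categorical", "item"},
    "price": {"continuous", "item"},
}


def select_names_by_tag(columns, tags, pred_fn=None):
    """Return the names of the columns whose tags satisfy pred_fn."""
    if isinstance(tags, str):
        tags = [tags]  # accept a single tag as well as a list
    pred_fn = pred_fn or any  # assumption: None means ANY
    selected = []
    for name, col_tags in columns.items():
        # One bool per requested tag: is that tag present on this column?
        flags = [tag in col_tags for tag in tags]
        if pred_fn(flags):
            selected.append(name)
    return selected


# ANY: a column is kept if it carries at least one of the tags.
any_match = select_names_by_tag(columns, ["categorical", "item"])
# ALL: a column is kept only if it carries every tag.
all_match = select_names_by_tag(columns, ["categorical", "item"], pred_fn=all)
```

With the default, every column here carries at least one of the two tags, so all three names come back; with `pred_fn=all`, only `item_id` carries both.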

excluding_by_tag(tags, pred_fn=None) → merlin.schema.schema.Schema

Remove columns from the schema that match ANY of the supplied tags.

Parameters
  • tags (Union[str, Tags, List[Union[str, Tags]]]) – Tag or list of tags that describes which columns to remove

  • pred_fn (any or all, optional, by default None (ANY)) – Predicate function that decides whether a column is removed. Pass all to remove only the columns that contain ALL of the provided tags.

Returns

New Schema containing only the columns that don’t contain the provided tags

Return type

Schema
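
Exclusion is the complement of tag selection: a column survives when the predicate over its tag flags is false. The sketch below is a toy model in plain Python, not merlin's code; `columns` and `exclude_names_by_tag` are hypothetical names, and tags are plain strings instead of `Tags` members.

```python
# Toy model: column name -> set of tags (hypothetical data, not a merlin Schema).
columns = {
    "user_id": {"categorical", "user"},
    "item_id": {"categorical", "item"},
    "price": {"continuous", "item"},
}


def exclude_names_by_tag(columns, tags, pred_fn=None):
    """Return the names of the columns that remain after removal."""
    if isinstance(tags, str):
        tags = [tags]  # accept a single tag as well as a list
    pred_fn = pred_fn or any  # documented default: None means ANY
    return [
        name
        for name, col_tags in columns.items()
        # Keep the column only when the predicate does NOT fire.
        if not pred_fn(tag in col_tags for tag in tags)
    ]


# ANY: drop every column that carries at least one of the tags.
left_after_any = exclude_names_by_tag(columns, ["categorical", "item"])
# ALL: drop only the columns that carry both tags.
left_after_all = exclude_names_by_tag(columns, ["categorical", "item"], pred_fn=all)
```

Here the default removes all three columns because each carries one of the tags, while `pred_fn=all` removes only `item_id`.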

remove_by_tag(tags, pred_fn=None) → merlin.schema.schema.Schema

select_by_name(names: List[str]) → merlin.schema.schema.Schema

Select matching columns from this Schema object using a list of column names

Parameters

names (List[str]) – List of column names that describes which columns match

Returns

New object containing only the ColumnSchemas of selected columns

Return type

Schema

excluding_by_name(col_names: List[str])

Remove columns from this Schema object by name

Parameters

col_names (List[str]) – Names of the columns to remove

Returns

New Schema object after the columns are removed

Return type

Schema

remove_col(col_name: str) → merlin.schema.schema.Schema

Remove a column from this Schema object by name

Parameters

col_name (str) – Name of the column to remove

Returns

This Schema object after the column is removed

Return type

Schema
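
The return descriptions suggest a difference worth noting: excluding_by_name returns a new Schema, while remove_col returns this Schema object, which implies in-place modification. The sketch below illustrates that contrast with a hypothetical `ToySchema` class that only tracks column names. It is a stand-in written for this example, not merlin's implementation.

```python
class ToySchema:
    """Hypothetical stand-in that only tracks column names."""

    def __init__(self, names):
        self.names = list(names)

    def excluding_by_name(self, col_names):
        # Builds and returns a new object; self is left untouched.
        return ToySchema(n for n in self.names if n not in col_names)

    def remove_col(self, col_name):
        # Mutates self and returns it, so the caller gets the same object back.
        if col_name in self.names:
            self.names.remove(col_name)
        return self


schema = ToySchema(["a", "b", "c"])

trimmed = schema.excluding_by_name(["a"])  # new object without "a"
names_after_excluding = list(schema.names)  # original still has all three

same = schema.remove_col("b")  # original itself loses "b"
```

If the original schema is still needed elsewhere, the copying style is the safer choice; the in-place style avoids building a second object.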

without(col_names: List[str]) → merlin.schema.schema.Schema

get(col_name: str, default: Optional[merlin.schema.schema.ColumnSchema] = None) → merlin.schema.schema.ColumnSchema

Get a ColumnSchema by name

Parameters
  • col_name (str) – Name of the column to get

  • default (ColumnSchema) – Default value to return if the column is not found. (Default value = None)

Returns

Retrieved column schema (or default value, if not found)

Return type

ColumnSchema

property first: merlin.schema.schema.ColumnSchema

Returns the first ColumnSchema in the Schema. Useful when you have narrowed the Schema down to a single column via select_by_name or select_by_tag and just want that column's ColumnSchema.

Returns

The first column schema present in this Schema object

Return type

ColumnSchema

Raises

ValueError – If this Schema object contains no column schemas

to_pandas() → pandas.core.frame.DataFrame

Convert this Schema object to a pandas DataFrame

Returns

DataFrame containing the column schemas in this Schema object

Return type

pd.DataFrame

copy() → merlin.schema.schema.Schema

Return a copy of the schema