merlin.schema.Schema#

class merlin.schema.Schema(column_schemas=None)[source]#

Bases: object

A collection of column schemas for a dataset.

__init__(column_schemas=None)[source]#

Methods

__init__([column_schemas])

apply(selector)

apply_inverse(selector)

copy()

Return a copy of the schema

excluding(selector)

Select non-matching columns from this Schema object using a ColumnSelector

excluding_by_name(col_names)

Remove columns from this Schema object by name

excluding_by_tag(tags[, pred_fn])

Remove columns from the schema that match ANY of the supplied tags.

get(col_name[, default])

Get a ColumnSchema by name

remove_by_tag(tags[, pred_fn])

remove_col(col_name)

Remove a column from this Schema object by name

select(selector)

Select matching columns from this Schema object using a ColumnSelector

select_by_name(names)

Select matching columns from this Schema object using a list of column names

select_by_tag(tags[, pred_fn])

Select columns from this Schema that match ANY of the supplied tags.

to_pandas()

Convert this Schema object to a pandas DataFrame

without(col_names)

Attributes

column_names

first

Returns the first ColumnSchema in the Schema.

property column_names#

select(selector) → Schema[source]#

Select matching columns from this Schema object using a ColumnSelector

Parameters:

selector (ColumnSelector) – Selector that describes which columns match

Returns:

New object containing only the ColumnSchemas of selected columns

Return type:

Schema

apply(selector) → Schema[source]#

excluding(selector) → Schema[source]#

Select non-matching columns from this Schema object using a ColumnSelector

Parameters:

selector (ColumnSelector) – Selector that describes which columns match

Returns:

New object containing only the ColumnSchemas of columns that do not match the selector

Return type:

Schema

apply_inverse(selector) → Schema[source]#

select_by_tag(tags: str | Tags | List[str | Tags], pred_fn=None) → Schema[source]#

Select columns from this Schema that match ANY of the supplied tags.

Parameters:
  • tags (Union[str, Tags, List[Union[str, Tags]]]) – Tag or list of tags that describes which columns match

  • pred_fn (any or all) – Predicate function that decides whether a column is selected. It receives an iterable of bool values indicating whether each of the supplied tags is present on a column schema. Returning True selects the column; returning False leaves it out.

Returns:

New object containing only the ColumnSchemas of selected columns

Return type:

Schema
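The role of pred_fn can be sketched in plain Python. The snippet below is a minimal stand-in, not the library's implementation: the helper name `select_by_tag_sketch` and the dict-of-tag-sets representation are assumptions made purely for illustration. It shows the predicate receiving one boolean per requested tag, so `any` keeps columns that carry at least one of the tags while `all` keeps only columns that carry every tag.

```python
from typing import Callable, Dict, Iterable, List, Set


def select_by_tag_sketch(
    columns: Dict[str, Set[str]],
    tags: List[str],
    pred_fn: Callable[[Iterable[bool]], bool] = any,
) -> Dict[str, Set[str]]:
    """Hypothetical stand-in: keep columns whose tags satisfy pred_fn."""
    selected = {}
    for name, col_tags in columns.items():
        # One boolean per requested tag: is that tag present on this column?
        if pred_fn(tag in col_tags for tag in tags):
            selected[name] = col_tags
    return selected


# Toy column -> tags mapping (illustrative data only).
columns = {
    "user_id": {"user", "categorical"},
    "item_id": {"item", "categorical"},
    "price": {"item", "continuous"},
}

any_match = select_by_tag_sketch(columns, ["item", "categorical"])
all_match = select_by_tag_sketch(columns, ["item", "categorical"], pred_fn=all)

print(sorted(any_match))  # ['item_id', 'price', 'user_id']
print(sorted(all_match))  # ['item_id']
```

With the real class, the equivalent call would pass `pred_fn=all` to `select_by_tag` on a Schema instance.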

excluding_by_tag(tags, pred_fn=None) → Schema[source]#

Remove columns from the schema that match ANY of the supplied tags.

Parameters:
  • tags (str, Tags, or a list of either) – Tag or list of tags that describes which columns to remove

  • pred_fn (any or all, optional) – Predicate function that decides whether a column is removed. Defaults to None, which behaves like any; pass all to remove only the columns that contain ALL of the supplied tags.

Returns:

New Schema containing only the columns that don’t contain the provided tags

Return type:

Schema
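The difference between the default (ANY) behaviour and `pred_fn=all` when excluding can be sketched as follows. This is a plain-Python approximation under stated assumptions, not the library's code: `excluding_by_tag_sketch` and the toy column mapping are hypothetical names and data introduced for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set


def excluding_by_tag_sketch(
    columns: Dict[str, Set[str]],
    tags: List[str],
    pred_fn: Optional[Callable[[Iterable[bool]], bool]] = None,
) -> Dict[str, Set[str]]:
    """Hypothetical stand-in: drop columns whose tags satisfy pred_fn."""
    # Assumption mirrored from the parameter description: None means ANY.
    pred_fn = pred_fn or any
    return {
        name: col_tags
        for name, col_tags in columns.items()
        if not pred_fn(tag in col_tags for tag in tags)
    }


# Toy column -> tags mapping (illustrative data only).
columns = {
    "user_id": {"user", "categorical"},
    "item_id": {"item", "categorical"},
    "price": {"item", "continuous"},
}

drop_any = excluding_by_tag_sketch(columns, ["item", "continuous"])
drop_all = excluding_by_tag_sketch(columns, ["item", "continuous"], pred_fn=all)

print(sorted(drop_any))  # ['user_id']
print(sorted(drop_all))  # ['item_id', 'user_id']
```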

remove_by_tag(tags, pred_fn=None) → Schema[source]#

select_by_name(names: List[str]) → Schema[source]#

Select matching columns from this Schema object using a list of column names

Parameters:

names (List[str]) – List of column names that describes which columns match

Returns:

New object containing only the ColumnSchemas of selected columns

Return type:

Schema

excluding_by_name(col_names: List[str])[source]#

Remove columns from this Schema object by name

Parameters:

col_names (List[str]) – Names of the columns to remove

Returns:

New Schema object after the columns are removed

Return type:

Schema

remove_col(col_name: str) → Schema[source]#

Remove a column from this Schema object by name

Parameters:

col_name (str) – Name of the column to remove

Returns:

This Schema object after the column is removed

Return type:

Schema
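The return descriptions suggest a distinction worth illustrating: excluding_by_name returns a new Schema object, whereas remove_col returns this Schema object after removal. The sketch below models that reading with a hypothetical `MiniSchema` class. It is an assumption-laden stand-in, not the library's implementation, and its handling of a missing column name is also an assumption.

```python
from typing import Dict, List


class MiniSchema:
    """Hypothetical stand-in mapping column name -> dtype string."""

    def __init__(self, columns: Dict[str, str]):
        self.columns = dict(columns)

    def excluding_by_name(self, col_names: List[str]) -> "MiniSchema":
        # Builds a new object; the original is left untouched.
        kept = {n: d for n, d in self.columns.items() if n not in col_names}
        return MiniSchema(kept)

    def remove_col(self, col_name: str) -> "MiniSchema":
        # Mutates this object and returns it.
        # Assumption: an unknown name is silently ignored.
        self.columns.pop(col_name, None)
        return self


schema = MiniSchema({"user_id": "int64", "price": "float32"})

trimmed = schema.excluding_by_name(["price"])
print(trimmed is schema, list(schema.columns))  # False ['user_id', 'price']

same = schema.remove_col("price")
print(same is schema, list(schema.columns))  # True ['user_id']
```

If the original schema must stay intact, this reading suggests preferring the non-mutating form or working on a copy().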

without(col_names: List[str]) → Schema[source]#

get(col_name: str, default: ColumnSchema | None = None) → ColumnSchema[source]#

Get a ColumnSchema by name

Parameters:
  • col_name (str) – Name of the column to get

  • default (ColumnSchema, optional) – Default value to return if the column is not found. (Default value = None)

Returns:

Retrieved column schema (or default value, if not found)

Return type:

ColumnSchema
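The lookup-with-default behaviour described here resembles dict.get. A minimal sketch, assuming a plain dict stands in for the schema's name-to-ColumnSchema mapping (the helper `get_sketch` and the dict values are hypothetical):

```python
from typing import Dict, Optional


def get_sketch(
    columns: Dict[str, dict], col_name: str, default: Optional[dict] = None
) -> Optional[dict]:
    """Hypothetical stand-in: return the column entry, or default if absent."""
    return columns.get(col_name, default)


# Toy mapping; each value stands in for a ColumnSchema.
columns = {"price": {"dtype": "float32"}}
fallback = {"dtype": "unknown"}

found = get_sketch(columns, "price")
missing = get_sketch(columns, "discount")
with_default = get_sketch(columns, "discount", fallback)

print(found)         # {'dtype': 'float32'}
print(missing)       # None
print(with_default)  # {'dtype': 'unknown'}
```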

property first: ColumnSchema#

Returns the first ColumnSchema in the Schema. Useful when you have selected down to a single column via select_by_name or select_by_tag and just want that column's schema.

Returns:

The first column schema present in this Schema object

Return type:

ColumnSchema

Raises:

ValueError – If this Schema object contains no column schemas
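The contract described above (return the first entry, raise ValueError when empty) can be sketched in plain Python. The helper `first_sketch` and the dict representation are assumptions for illustration only, not the library's code.

```python
from typing import Dict


def first_sketch(columns: Dict[str, dict]) -> dict:
    """Hypothetical stand-in: first column entry, or ValueError if empty."""
    if not columns:
        raise ValueError("Schema contains no column schemas")
    # Dicts preserve insertion order, so this is the first column added.
    return next(iter(columns.values()))


# Toy mapping; each value stands in for a ColumnSchema.
single = {"price": {"dtype": "float32"}}
print(first_sketch(single))  # {'dtype': 'float32'}

try:
    first_sketch({})
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```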

to_pandas() → DataFrame[source]#

Convert this Schema object to a pandas DataFrame

Returns:

DataFrame containing the column schemas in this Schema object

Return type:

pd.DataFrame

copy() → Schema[source]#

Return a copy of the schema