merlin.schema.Schema
- class merlin.schema.Schema(column_schemas=None)
Bases:
object
A collection of column schemas for a dataset.
Methods
__init__([column_schemas])
apply(selector)
apply_inverse(selector)
copy() – Return a copy of the schema
excluding(selector) – Select non-matching columns from this Schema object using a ColumnSelector
excluding_by_name(col_names) – Remove columns from this Schema object by name
excluding_by_tag(tags[, pred_fn]) – Remove columns from the schema that match ANY of the supplied tags.
get(col_name[, default]) – Get a ColumnSchema by name
remove_by_tag(tags[, pred_fn])
remove_col(col_name) – Remove a column from this Schema object by name
select(selector) – Select matching columns from this Schema object using a ColumnSelector
select_by_name(names) – Select matching columns from this Schema object using a list of column names
select_by_tag(tags[, pred_fn]) – Select columns from this Schema that match ANY of the supplied tags.
Convert this Schema object to a pandas DataFrame
without(col_names)
Attributes
column_names
first – Returns the first ColumnSchema in the Schema.
- property column_names
- select(selector) -> Schema
Select matching columns from this Schema object using a ColumnSelector
- Parameters:
selector (ColumnSelector) – Selector that describes which columns match
- Returns:
New object containing only the ColumnSchemas of selected columns
- Return type:
Schema
- excluding(selector) -> Schema
Select non-matching columns from this Schema object using a ColumnSelector
- Parameters:
selector (ColumnSelector) – Selector that describes which columns match
- Returns:
New object containing only the ColumnSchemas of the columns that do not match the selector
- Return type:
Schema
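select and excluding are complementary: for a given selector, every column ends up in exactly one of the two results. The sketch below illustrates that relationship without the library. It is a toy model, not merlin's implementation: a plain dict stands in for a Schema, a list of names stands in for a ColumnSelector, and the helper names, column names, and dtypes are all hypothetical.

```python
from typing import Dict, List


def select_sketch(columns: Dict[str, str], names: List[str]) -> Dict[str, str]:
    # Keep only the columns whose name is matched.
    return {name: dtype for name, dtype in columns.items() if name in names}


def excluding_sketch(columns: Dict[str, str], names: List[str]) -> Dict[str, str]:
    # Keep only the columns whose name is NOT matched.
    return {name: dtype for name, dtype in columns.items() if name not in names}


# Hypothetical columns, for illustration only.
columns = {"user_id": "int64", "item_id": "int64", "price": "float32"}
matched = ["user_id", "item_id"]

kept = select_sketch(columns, matched)
rest = excluding_sketch(columns, matched)

print(list(kept))  # ['user_id', 'item_id']
print(list(rest))  # ['price']

# Both helpers build a new dict, mirroring the "new object" return documented above.
print(len(columns))  # 3
```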
- select_by_tag(tags: str | Tags | List[str | Tags], pred_fn=None) -> Schema
Select columns from this Schema that match ANY of the supplied tags.
- Parameters:
tags (List[Union[str, Tags]]) – List of tags that describes which columns match
pred_fn (any or all) – Predicate function that decides whether a column is selected. It receives an iterable of bool values, one per supplied tag, indicating whether that tag is present on the column schema. The column is selected if the function returns True and skipped if it returns False.
- Returns:
New object containing only the ColumnSchemas of selected columns
- Return type:
Schema
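The effect of pred_fn is easiest to see side by side. The sketch below is a toy model, not merlin's implementation: columns are a plain dict mapping hypothetical column names to sets of tag strings, and select_by_tag_sketch is a hypothetical helper that follows the documented contract by passing one bool per requested tag to pred_fn.

```python
from typing import Callable, Dict, Iterable, List, Set


def select_by_tag_sketch(
    columns: Dict[str, Set[str]],
    tags: List[str],
    pred_fn: Callable[[Iterable[bool]], bool] = any,
) -> Dict[str, Set[str]]:
    """Toy stand-in for the documented select_by_tag behaviour."""
    selected = {}
    for name, col_tags in columns.items():
        # One bool per requested tag: is that tag present on this column?
        if pred_fn(tag in col_tags for tag in tags):
            selected[name] = col_tags
    return selected


# Hypothetical columns and tags, for illustration only.
columns = {
    "user_id": {"categorical", "user"},
    "item_id": {"categorical", "item"},
    "price": {"continuous", "item"},
}

any_match = select_by_tag_sketch(columns, ["categorical", "item"])
all_match = select_by_tag_sketch(columns, ["categorical", "item"], pred_fn=all)

print(sorted(any_match))  # ['item_id', 'price', 'user_id']
print(sorted(all_match))  # ['item_id']
```

With any, a column needs only one of the tags; with all, it needs every tag, so only item_id survives.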
- excluding_by_tag(tags, pred_fn=None) -> Schema
Remove columns from the schema that match ANY of the supplied tags.
- Parameters:
tags (List[Union[str, Tags]]) – List of tags that describes which columns to remove
pred_fn (any or all, optional, by default None (ANY)) – Predicate function that decides whether a column is removed. Pass all to remove only the columns that contain ALL of the provided tags.
- Returns:
New Schema containing only the columns that don’t contain the provided tags
- Return type:
Schema
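The default of None for pred_fn behaves as ANY, which makes exclusion fairly aggressive: a column is dropped as soon as it carries one of the tags. A minimal sketch of that behaviour follows, assuming a plain dict of hypothetical column names to tag sets in place of a Schema; excluding_by_tag_sketch is a hypothetical helper, not the library code.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set


def excluding_by_tag_sketch(
    columns: Dict[str, Set[str]],
    tags: List[str],
    pred_fn: Optional[Callable[[Iterable[bool]], bool]] = None,
) -> Dict[str, Set[str]]:
    """Toy stand-in for the documented excluding_by_tag behaviour."""
    # None falls back to ANY, as the parameter description states.
    pred_fn = pred_fn or any
    return {
        name: col_tags
        for name, col_tags in columns.items()
        # Keep a column only when the predicate does NOT flag it for removal.
        if not pred_fn(tag in col_tags for tag in tags)
    }


# Hypothetical columns and tags, for illustration only.
columns = {
    "user_id": {"categorical", "id"},
    "genre": {"categorical"},
    "price": {"continuous"},
}

dropped_any = excluding_by_tag_sketch(columns, ["categorical", "id"])
dropped_all = excluding_by_tag_sketch(columns, ["categorical", "id"], pred_fn=all)

print(sorted(dropped_any))  # ['price']
print(sorted(dropped_all))  # ['genre', 'price']
```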
- select_by_name(names: List[str]) -> Schema
Select matching columns from this Schema object using a list of column names
- get(col_name: str, default: ColumnSchema | None = None) -> ColumnSchema
Get a ColumnSchema by name
- Parameters:
col_name (str) – Name of the column to get
default (ColumnSchema, optional) – Default value to return if the column is not found. (Default value = None)
- Returns:
Retrieved column schema (or default value, if not found)
- Return type:
ColumnSchema
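The lookup-with-default contract described here matches dict.get. The sketch below shows the three cases (found, missing with no default, missing with a default) on a toy model. The dict of hypothetical column names to metadata dicts stands in for a Schema, and get_sketch is a hypothetical helper rather than the library's code.

```python
from typing import Dict, Optional


def get_sketch(
    columns: Dict[str, dict], col_name: str, default: Optional[dict] = None
) -> Optional[dict]:
    """Toy stand-in for the documented get behaviour."""
    # Return the stored column metadata, or the caller's default when absent.
    return columns.get(col_name, default)


# Hypothetical column metadata, for illustration only.
columns = {"price": {"dtype": "float32"}}
fallback = {"dtype": "unknown"}

found = get_sketch(columns, "price")
missing = get_sketch(columns, "discount")
with_default = get_sketch(columns, "discount", default=fallback)

print(found)         # {'dtype': 'float32'}
print(missing)       # None
print(with_default)  # {'dtype': 'unknown'}
```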
- property first: ColumnSchema
Returns the first ColumnSchema in the Schema. Useful when you have selected down to a single column via select_by_name or select_by_tag and just want that column's schema.
- Returns:
The first column schema present in this Schema object
- Return type:
ColumnSchema
- Raises:
ValueError – If this Schema object contains no column schemas
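Because first raises on an empty schema instead of returning None, callers that filter down to "the one matching column" may want to guard the access. A minimal sketch of the documented behaviour follows, assuming a plain dict of hypothetical column names to metadata in place of a Schema. first_sketch is a hypothetical helper, and the error message is invented for the example.

```python
from typing import Dict


def first_sketch(columns: Dict[str, dict]) -> dict:
    """Toy stand-in for the documented `first` property."""
    if not columns:
        # Mirrors the documented ValueError for an empty schema; the message is made up.
        raise ValueError("schema contains no column schemas")
    # Dicts preserve insertion order, so this is the first column that was added.
    return next(iter(columns.values()))


# Hypothetical single-column result, e.g. after filtering by name or tag.
single = {"price": {"dtype": "float32"}}
print(first_sketch(single))  # {'dtype': 'float32'}

error_message = None
try:
    first_sketch({})
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # schema contains no column schemas
```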