merlin.schema package
-
class merlin.schema.Schema(column_schemas=None)
Bases: object
A collection of column schemas for a dataset.
-
property column_names
-
select(selector) → merlin.schema.schema.Schema
Select matching columns from this Schema object using a ColumnSelector.
- Parameters
selector (ColumnSelector) – Selector that describes which columns match
- Returns
New object containing only the ColumnSchemas of selected columns
- Return type
Schema
-
excluding(selector) → merlin.schema.schema.Schema
Select non-matching columns from this Schema object using a ColumnSelector.
- Parameters
selector (ColumnSelector) – Selector that describes which columns to exclude
- Returns
New object containing only the ColumnSchemas of columns the selector does not match
- Return type
Schema
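select and excluding are complementary: for the same selector, each column ends up in exactly one of the two results. The sketch below models that relationship in plain Python without importing Merlin. `MiniSchema` and the plain list of names used as the selector are hypothetical stand-ins for Schema and ColumnSelector, not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MiniSchema:
    """Hypothetical stand-in for Schema: maps column name -> metadata dict."""

    columns: Dict[str, dict] = field(default_factory=dict)

    @property
    def column_names(self) -> List[str]:
        return list(self.columns)

    def select(self, names: List[str]) -> "MiniSchema":
        # Keep only the columns the selector names, preserving schema order.
        return MiniSchema({n: c for n, c in self.columns.items() if n in names})

    def excluding(self, names: List[str]) -> "MiniSchema":
        # Keep only the columns the selector does NOT name.
        return MiniSchema({n: c for n, c in self.columns.items() if n not in names})


schema = MiniSchema({"user_id": {}, "item_id": {}, "price": {}})
selector = ["user_id", "item_id"]  # stand-in for a ColumnSelector

kept = schema.select(selector)
dropped = schema.excluding(selector)
print(kept.column_names)
print(dropped.column_names)
```

Both calls return a new object in this sketch, which matches the "New object" wording in the Returns fields; the original `schema` is left unchanged.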
-
select_by_tag(tags: Union[str, merlin.schema.tags.Tags, List[Union[str, merlin.schema.tags.Tags]]]) → merlin.schema.schema.Schema
Select matching columns from this Schema object using a single tag or a list of tags.
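The signature accepts a single string, a single Tags member, or a list mixing both. A minimal sketch of how such an argument could be normalized before matching is shown below. `MiniTags`, `normalize_tags`, and this free-standing `select_by_tag` are hypothetical stand-ins, and the rule that a column matches when it carries at least one requested tag is an assumption about the matching behavior.

```python
from enum import Enum
from typing import Dict, List, Set, Union


class MiniTags(Enum):
    """Hypothetical subset of Tags, just enough for the example."""

    CATEGORICAL = "categorical"
    CONTINUOUS = "continuous"
    USER_ID = "user_id"


TagLike = Union[str, MiniTags]


def normalize_tags(tags: Union[TagLike, List[TagLike]]) -> Set[str]:
    """Accept one tag or a list of tags and return a set of tag strings."""
    if not isinstance(tags, (list, tuple)):
        tags = [tags]
    return {t.value if isinstance(t, MiniTags) else str(t) for t in tags}


def select_by_tag(columns: Dict[str, Set[str]], tags) -> List[str]:
    """Return names of columns carrying at least one requested tag (assumed rule)."""
    wanted = normalize_tags(tags)
    return [name for name, col_tags in columns.items() if col_tags & wanted]


columns = {
    "user_id": {"user_id", "categorical"},
    "genre": {"categorical"},
    "price": {"continuous"},
}

by_enum = select_by_tag(columns, MiniTags.CATEGORICAL)
by_str = select_by_tag(columns, "continuous")
by_list = select_by_tag(columns, [MiniTags.USER_ID, "continuous"])
```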
-
select_by_name(names: List[str]) → merlin.schema.schema.Schema
Select matching columns from this Schema object using a list of column names.
-
remove_col(col_name: str) → merlin.schema.schema.Schema
Remove a column from this Schema object by name.
-
get(col_name: str, default: Optional[merlin.schema.schema.ColumnSchema] = None) → merlin.schema.schema.ColumnSchema
Get a ColumnSchema by name.
- Parameters
col_name (str) – Name of the column to get
default (ColumnSchema) – Default value to return if the column is not found (Default value = None)
- Returns
Retrieved column schema (or default value, if not found)
- Return type
ColumnSchema
-
property first
Returns the first ColumnSchema in the Schema. Useful when you have selected down to a single column via select_by_name or select_by_tag and just want that column's schema.
- Returns
The first column schema present in this Schema object
- Return type
ColumnSchema
- Raises
ValueError – If this Schema object contains no column schemas
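The two lookups above fail differently: get falls back to a default for an unknown name, while first raises ValueError on an empty schema. The sketch below models both behaviors in plain Python. `MiniSchema` is a hypothetical stand-in that stores a dtype string per column instead of a ColumnSchema, and the error message text is invented for illustration.

```python
from typing import Dict, Optional


class MiniSchema:
    """Hypothetical stand-in for Schema holding name -> dtype string."""

    def __init__(self, columns: Dict[str, str]):
        self._columns = dict(columns)

    def get(self, col_name: str, default: Optional[str] = None) -> Optional[str]:
        # Missing names fall back to the default instead of raising.
        return self._columns.get(col_name, default)

    @property
    def first(self) -> str:
        # An empty schema has no first entry, so signal that with ValueError.
        if not self._columns:
            raise ValueError("schema contains no column schemas")
        return next(iter(self._columns.values()))


schema = MiniSchema({"price": "float32", "user_id": "int64"})
found = schema.get("price")
missing = schema.get("rating")
fallback = schema.get("rating", "unknown")
first_entry = schema.first

try:
    MiniSchema({}).first
    empty_error = None
except ValueError as exc:
    empty_error = str(exc)
```

Because get can return None, callers that need a guaranteed column may prefer to pass an explicit default or check the result before using it.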
-
property
-
class merlin.schema.ColumnSchema(name: str, tags: Optional[merlin.schema.tags.TagSet] = <factory>, properties: Optional[Dict] = <factory>, dtype: Optional[object] = None, is_list: bool = False, is_ragged: Optional[bool] = None)
Bases: object
A schema containing metadata of a dataframe column.
-
properties: Optional[Dict]
-
property quantity
Describes the number of elements in each row of this column.
- Returns
SCALAR when there is one element per row; FIXED_LIST when every row has the same number of elements; RAGGED_LIST when rows have different numbers of elements
- Return type
ColumnQuantity
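The three cases can be made concrete by classifying a handful of sample rows. The property itself presumably reads the column's schema metadata rather than inspecting data, so the sketch below only illustrates the definitions. `MiniQuantity` and `quantity_of` are hypothetical names, and the enum's string values are assumed.

```python
from enum import Enum
from typing import Sequence


class MiniQuantity(Enum):
    """Hypothetical stand-in for ColumnQuantity; string values are assumed."""

    SCALAR = "scalar"
    FIXED_LIST = "fixed_list"
    RAGGED_LIST = "ragged_list"


def quantity_of(rows: Sequence) -> MiniQuantity:
    """Classify sample rows, assuming they are all scalars or all lists."""
    if all(not isinstance(row, (list, tuple)) for row in rows):
        return MiniQuantity.SCALAR
    lengths = {len(row) for row in rows}
    # One distinct length means every row holds the same number of elements.
    return MiniQuantity.FIXED_LIST if len(lengths) == 1 else MiniQuantity.RAGGED_LIST


scalar = quantity_of([1, 2, 3])
fixed = quantity_of([[1, 2], [3, 4]])
ragged = quantity_of([[1], [2, 3]])
```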
-
with_name(name: str) → merlin.schema.schema.ColumnSchema
Create a copy of this ColumnSchema object with a different column name.
- Parameters
name (str) – New column name
- Returns
Copied object with new column name
- Return type
ColumnSchema
-
with_tags(tags) → merlin.schema.schema.ColumnSchema
Create a copy of this ColumnSchema object with different column tags.
- Parameters
tags – New column tags
- Returns
Copied object with new column tags
- Return type
ColumnSchema
-
with_properties(properties: dict) → merlin.schema.schema.ColumnSchema
Create a copy of this ColumnSchema object with different column properties.
-
with_dtype(dtype, is_list: Optional[bool] = None, is_ragged: Optional[bool] = None) → merlin.schema.schema.ColumnSchema
Create a copy of this ColumnSchema object with a different column dtype.
- Parameters
dtype (np.dtype) – New column dtype
is_list (bool) – Whether rows in this column contain lists (Default value = None)
is_ragged (bool) – Whether lists in this column have varying lengths (Default value = None)
- Returns
Copied object with new column dtype
- Return type
ColumnSchema
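The with_* methods all follow a copy-on-modify pattern: they return a changed copy and leave the original untouched. The `<factory>` defaults in the constructor signature suggest ColumnSchema is a dataclass, so a rough model of the pattern using `dataclasses.replace` is sketched below. `MiniColumn` is a hypothetical stand-in. Making it frozen, and treating a None flag in `with_dtype` as "keep the current value", are assumptions rather than documented behavior.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MiniColumn:
    """Hypothetical stand-in for ColumnSchema with a few of its fields."""

    name: str
    dtype: Optional[str] = None
    is_list: bool = False
    is_ragged: Optional[bool] = None

    def with_name(self, name: str) -> "MiniColumn":
        # replace() builds a new instance; the original is not mutated.
        return replace(self, name=name)

    def with_dtype(self, dtype, is_list=None, is_ragged=None) -> "MiniColumn":
        # None means "keep the current flag" in this sketch (an assumption).
        return replace(
            self,
            dtype=dtype,
            is_list=self.is_list if is_list is None else is_list,
            is_ragged=self.is_ragged if is_ragged is None else is_ragged,
        )


original = MiniColumn("price", dtype="float32")
renamed = original.with_name("unit_price")
retyped = original.with_dtype("int64", is_list=True, is_ragged=False)
```

Since each call returns a new object, calls can be chained, for example `original.with_name("qty").with_dtype("int32")`.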
-
property int_domain
-
property float_domain
-
property value_count
-
class merlin.schema.Tags(value)
Bases: enum.Enum
Standard tags used in the Merlin ecosystem
- CATEGORICAL = 'categorical'
- CONTINUOUS = 'continuous'
- LIST = 'list'
- SEQUENCE = 'sequence'
- TEXT = 'text'
- TOKENIZED = 'tokenized'
- TIME = 'time'
- ID = 'id'
- USER = 'user'
- ITEM = 'item'
- SESSION = 'session'
- CONTEXT = 'context'
- TARGET = 'target'
- REGRESSION = 'regression'
- CLASSIFICATION = 'classification'
- BINARY = 'binary'
- MULTI_CLASS = 'multi_class'
- USER_ID = 'user_id'
- ITEM_ID = 'item_id'
- SESSION_ID = 'session_id'
- TEXT_TOKENIZED = 'text_tokenized'
- BINARY_CLASSIFICATION = 'binary_classification'
- MULTI_CLASS_CLASSIFICATION = 'multi_class_classification'