merlin.dag package
-
class merlin.dag.BaseOperator
Bases: object
Base class for all operator classes.
-
compute_selector(input_schema: merlin.schema.schema.Schema, selector: merlin.dag.selector.ColumnSelector, parents_selector: merlin.dag.selector.ColumnSelector, dependencies_selector: merlin.dag.selector.ColumnSelector) → merlin.dag.selector.ColumnSelector
-
compute_input_schema(root_schema: merlin.schema.schema.Schema, parents_schema: merlin.schema.schema.Schema, deps_schema: merlin.schema.schema.Schema, selector: merlin.dag.selector.ColumnSelector) → merlin.schema.schema.Schema
Given the schemas coming from upstream sources and a column selector for the input columns, returns a set of schemas for the input columns this operator will use.
- Parameters
root_schema (Schema) – Base schema of the dataset before running any operators.
parents_schema (Schema) – The combined schemas of the upstream parents feeding into this operator.
deps_schema (Schema) – The combined schemas of the upstream dependencies feeding into this operator.
selector (ColumnSelector) – The column selector to apply to the input schema.
- Returns
The schemas of the columns used by this operator
- Return type
Schema
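As a rough illustration of the idea described for compute_input_schema, and not Merlin's actual implementation, the sketch below models a schema as a plain dict of column name to dtype string and a selector as a list of column names. The `ToySchema` alias, the `sketch_input_schema` helper, and the merge order are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical stand-in: a "schema" is a dict of column name -> dtype string.
ToySchema = Dict[str, str]


def sketch_input_schema(
    root_schema: ToySchema,
    parents_schema: ToySchema,
    deps_schema: ToySchema,
    selector: List[str],
) -> ToySchema:
    """Merge the upstream schemas, then keep only the selected columns."""
    # Assumption: on a name clash, later sources override earlier ones.
    upstream = {**root_schema, **parents_schema, **deps_schema}
    missing = [name for name in selector if name not in upstream]
    if missing:
        raise ValueError(f"Columns not found upstream: {missing}")
    return {name: upstream[name] for name in selector}


root = {"user_id": "int64", "price": "float32"}
parents = {"price_log": "float32"}
deps = {"item_id": "int64"}
input_schema = sketch_input_schema(root, parents, deps, ["price_log", "item_id"])
```

Here `input_schema` contains only the two selected columns, each with the dtype reported by whichever upstream source provided it.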
-
compute_output_schema(input_schema: merlin.schema.schema.Schema, col_selector: merlin.dag.selector.ColumnSelector, prev_output_schema: Optional[merlin.schema.schema.Schema] = None) → merlin.schema.schema.Schema
Given a set of schemas and a column selector for the input columns, returns a set of schemas for the transformed columns this operator will produce.
- Parameters
input_schema (Schema) – The schemas of the columns to apply this operator to.
col_selector (ColumnSelector) – The column selector to apply to the input schema.
- Returns
The schemas of the columns produced by this operator
- Return type
Schema
-
property dynamic_dtypes
-
output_column_names(col_selector: merlin.dag.selector.ColumnSelector) → merlin.dag.selector.ColumnSelector
Given a set of column names, returns the names of the transformed columns this operator will produce.
- Parameters
col_selector (ColumnSelector) – The columns to apply this operator to.
- Returns
The names of columns produced by this operator
- Return type
ColumnSelector
-
property dependencies
Defines an optional list of column dependencies for this operator. This lets you consume columns that aren’t part of the main transformation workflow.
- Returns
Extra dependencies of this operator. Defaults to None.
- Return type
str, list of str or ColumnSelector, optional
-
property output_dtype
-
property output_properties
-
property label
-
property supports
Returns what kind of data representation this operator supports.
-
-
class merlin.dag.Graph(output_node: merlin.dag.node.Node, subgraphs: Optional[Dict[str, merlin.dag.node.Node]] = None)
Bases: object
-
property input_dtypes
-
property output_dtypes
-
property column_mapping
-
construct_schema(root_schema: merlin.schema.schema.Schema, preserve_dtypes=False) → merlin.dag.graph.Graph
-
property input_schema
-
property leaf_nodes
-
property output_schema
-
property
-
class merlin.dag.Node(selector=None)
Bases: object
A Node is a group of columns that you want to apply the same transformations to. Nodes can be transformed by shifting operators (>>) onto them, which returns a new Node with the transformations applied. This lets you define a graph of operations that makes up your workflow.
- Parameters
selector (ColumnSelector) – Defines which columns to select from the input Dataset using column names and tags.
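The shifting behaviour described above can be pictured with a small stand-in class. This is a minimal sketch, not Merlin's Node: `ToyNode`, `ops_from_root`, and the two example operator functions are hypothetical names introduced only to show how `>>` can return a new node while leaving the original untouched.

```python
from typing import Callable, List, Optional


class ToyNode:
    """Hypothetical stand-in for a graph node; not Merlin's Node class."""

    def __init__(self, columns: List[str], op: Optional[Callable] = None):
        self.columns = columns
        self.op = op
        self.parents: List["ToyNode"] = []

    def __rshift__(self, op: Callable) -> "ToyNode":
        # Shifting an operator onto a node leaves that node unchanged and
        # returns a new child node that records the operator and its parent.
        child = ToyNode(self.columns, op)
        child.parents.append(self)
        return child


def ops_from_root(node: Optional[ToyNode]) -> List[str]:
    """Follow the first parent of each node and list operator names in order."""
    names = []
    while node is not None:
        if node.op is not None:
            names.append(node.op.__name__)
        node = node.parents[0] if node.parents else None
    return list(reversed(names))


def fill_missing(values):
    return [0.0 if v is None else v for v in values]


def clip_negative(values):
    return [max(v, 0.0) for v in values]


source = ToyNode(["price"])
workflow = source >> fill_missing >> clip_negative
```

Each `>>` adds one link to the chain, so `workflow` points back to `source` through two intermediate parent links, and `source` itself still has no operator attached.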
-
property selector
-
add_dependency(dep: Union[str, merlin.dag.selector.ColumnSelector, merlin.dag.node.Node, List[Union[str, merlin.dag.node.Node, merlin.dag.selector.ColumnSelector]]])
Adds a dependency node to this node.
- Parameters
dep (Union[str, ColumnSelector, Node, List[Union[str, Node, ColumnSelector]]]) – Dependency to be added
-
add_parent(parent: Union[str, merlin.dag.selector.ColumnSelector, merlin.dag.node.Node, List[Union[str, merlin.dag.node.Node, merlin.dag.selector.ColumnSelector]]])
Adds a parent node to this node.
- Parameters
parent (Union[str, ColumnSelector, Node, List[Union[str, Node, ColumnSelector]]]) – Parent to be added
-
add_child(child: Union[str, merlin.dag.selector.ColumnSelector, merlin.dag.node.Node, List[Union[str, merlin.dag.node.Node, merlin.dag.selector.ColumnSelector]]])
Adds a child node to this node.
- Parameters
child (Union[str, ColumnSelector, Node, List[Union[str, Node, ColumnSelector]]]) – Child to be added
-
remove_child(child: Union[str, merlin.dag.selector.ColumnSelector, merlin.dag.node.Node, List[Union[str, merlin.dag.node.Node, merlin.dag.selector.ColumnSelector]]])
Removes a child node from this node.
- Parameters
child (Union[str, ColumnSelector, Node, List[Union[str, Node, ColumnSelector]]]) – Child to be removed
-
compute_schemas(root_schema: merlin.schema.schema.Schema, preserve_dtypes: bool = False)
Defines the input and output schema.
-
validate_schemas(root_schema: merlin.schema.schema.Schema, strict_dtypes: bool = False)
Checks whether this Node’s input schema matches the output schemas of its parents and dependencies.
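- Parameters
root_schema (Schema) – Base schema of the dataset before running any operators.
strict_dtypes (bool, optional) – Whether a dtype mismatch should raise an error. Defaults to False.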
- Raises
ValueError – If parents and dependencies don’t provide an expected column based on the input schema
ValueError – If the dtype of a column from parents and dependencies doesn’t match the expected dtype based on the input schema
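The two failure modes listed above can be sketched with dict-based schemas. This is an illustrative sketch only: `sketch_validate` is a hypothetical helper, schemas are modelled as dicts of column name to dtype string, and the assumption that `strict_dtypes` gates the dtype comparison is inferred from the parameter name rather than confirmed by the source.

```python
from typing import Dict


def sketch_validate(
    expected: Dict[str, str],
    upstream: Dict[str, str],
    strict_dtypes: bool = False,
) -> None:
    """Raise ValueError when upstream columns don't satisfy the expected schema."""
    for name, dtype in expected.items():
        # First failure mode: an expected column is not provided upstream.
        if name not in upstream:
            raise ValueError(f"Missing column '{name}' in upstream output")
        # Second failure mode (assumed to apply only in strict mode):
        # the column exists but its dtype differs from the expected one.
        if strict_dtypes and upstream[name] != dtype:
            raise ValueError(
                f"Column '{name}' has dtype {upstream[name]}, expected {dtype}"
            )


expected_schema = {"user_id": "int64"}
upstream_schema = {"user_id": "int32", "item_id": "int64"}

# Passes in the lenient mode because only column presence is checked.
sketch_validate(expected_schema, upstream_schema)
```

Calling the helper with `strict_dtypes=True` on the same inputs raises `ValueError`, since `int32` does not match the expected `int64`.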
-
property exportable
-
property parents_with_dependencies
-
property grouped_parents_with_dependencies
-
property input_columns
-
property output_columns
-
property column_mapping
-
property dependency_columns
-
property label
-
property graph
-
class merlin.dag.ColumnSelector(names: Optional[List[str]] = None, subgroups: Optional[List[merlin.dag.selector.ColumnSelector]] = None, tags: Optional[List[Union[str, merlin.schema.tags.Tags]]] = None)
Bases: object
A ColumnSelector describes a group of columns to be transformed by Operators in a Graph. Operators can be applied to the selected columns by shifting (>>) operators on to the ColumnSelector, which returns a new Node with the transformations applied. This lets you define a graph of operations that makes up your Graph.
- Parameters
names (list of (str or tuple of str)) – The columns to select from the input Dataset. The elements of this list are strings indicating the column names in most cases, but can also be tuples of strings for feature crosses.
subgroups (list of ColumnSelector objects) – This provides an alternate syntax for grouping column names together (instead of nesting tuples inside the list of names)
tags (list of (str or Tags)) – Tags identifying the columns to select from the input Dataset.
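To make the two grouping syntaxes concrete, the sketch below uses a small stand-in rather than the real class. `ToySelector` and its `flat_names` method are hypothetical, and the de-duplicating, first-seen ordering is an assumption chosen for the example, not documented ColumnSelector behaviour.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Name = Union[str, Tuple[str, ...]]


@dataclass
class ToySelector:
    """Hypothetical stand-in for a column selector; not Merlin's ColumnSelector."""

    names: List[Name] = field(default_factory=list)
    subgroups: List["ToySelector"] = field(default_factory=list)

    def flat_names(self) -> List[str]:
        """Return every column name once, in first-seen order."""
        seen: List[str] = []
        for entry in self.names:
            # A tuple entry groups several columns, e.g. for a feature cross.
            parts = entry if isinstance(entry, tuple) else (entry,)
            for part in parts:
                if part not in seen:
                    seen.append(part)
        for group in self.subgroups:
            for part in group.flat_names():
                if part not in seen:
                    seen.append(part)
        return seen


# Grouping by nesting a tuple inside the list of names.
nested = ToySelector(names=["user_id", ("item_id", "category")])

# The alternate syntax: the same grouping expressed as a subgroup.
grouped = ToySelector(
    names=["user_id"],
    subgroups=[ToySelector(names=["item_id", "category"])],
)
```

Both selectors refer to the same three columns; only the way the `item_id` and `category` pair is grouped differs.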
-
property names
-
property grouped_names