merlin.dag.Node

class merlin.dag.Node(selector=None)

Bases: object

A Node is a group of columns to which you want to apply the same transformations. A Node can be transformed by shifting operators onto it with the >> operator, which returns a new Node with the transformations applied. This lets you define the graph of operations that makes up your workflow.

Parameters:

selector (ColumnSelector) – Defines which columns to select from the input Dataset using column names and tags.
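The shifting pattern described above can be sketched without the library. The following is a toy stand-in, not merlin.dag.Node: the class name ToyNode, its attributes, and the apply method are all hypothetical and exist only to show how overloading >> can return a new node while leaving the original untouched.

```python
# Toy stand-in for illustration only; this is NOT merlin.dag.Node.
class ToyNode:
    def __init__(self, columns, op=None, parents=None):
        self.columns = list(columns)   # columns this node selects
        self.op = op                   # transformation applied at this node, if any
        self.parents = parents or []   # upstream nodes

    def __rshift__(self, op):
        # `node >> op` does not modify `node`; it returns a new child node.
        return ToyNode(self.columns, op=op, parents=[self])

    def apply(self, row):
        # Run upstream nodes first, then this node's own transformation.
        for parent in self.parents:
            row = parent.apply(row)
        if self.op is None:
            return {c: row[c] for c in self.columns}
        return {c: self.op(row[c]) for c in self.columns}


base = ToyNode(["price", "qty"])
scaled = base >> (lambda x: x * 2) >> (lambda x: x + 1)
result = scaled.apply({"price": 10, "qty": 3, "other": 0})
```

Each >> adds one node to the chain, so scaled has base as its grandparent, and base itself still carries no operator.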

__init__(selector=None)

Methods

__init__([selector])

add_child(child)

Add a child node to this node.

add_dependency(dep)

Add a dependency node to this node.

add_parent(parent)

Add a parent node to this node.

compute_schemas(root_schema[, preserve_dtypes])

Defines the input and output schema

construct_from(nodable)

Convert Node-like objects to a Node or list of Nodes.

export(output_path[, node_id, version])

Export a directory for this node, containing the required artifacts to run in the target context.

exportable([backend])

remove_child(child)

Remove a child node from this node.

remove_inputs(input_cols)

Remove input columns and all output columns that depend on them.

validate_schemas(root_schema[, strict_dtypes])

Check if this Node's input schema matches the output schemas of parents and dependencies

Attributes

column_mapping

dependency_columns

export_name

Name for the exported node directory.

graph

grouped_parents_with_dependencies

input_columns

label

output_columns

parents_with_dependencies

selector

property selector

add_dependency(dep: Node | Operator | str | List[str] | ColumnSelector | List[Node | Operator | str | List[str] | ColumnSelector])

Add a dependency node to this node.

Parameters:

dep (Union[str, ColumnSelector, Node, List[Union[str, Node, ColumnSelector]]]) – Dependency to be added

add_parent(parent: Node | Operator | str | List[str] | ColumnSelector | List[Node | Operator | str | List[str] | ColumnSelector])

Add a parent node to this node.

Parameters:

parent (Union[str, ColumnSelector, Node, List[Union[str, Node, ColumnSelector]]]) – Parent to be added

add_child(child: Node | Operator | str | List[str] | ColumnSelector | List[Node | Operator | str | List[str] | ColumnSelector])

Add a child node to this node.

Parameters:

child (Union[str, ColumnSelector, Node, List[Union[str, Node, ColumnSelector]]]) – Child to be added

remove_child(child: Node | Operator | str | List[str] | ColumnSelector | List[Node | Operator | str | List[str] | ColumnSelector])

Remove a child node from this node.

Parameters:

child (Union[str, ColumnSelector, Node, List[Union[str, Node, ColumnSelector]]]) – Child to be removed

compute_schemas(root_schema: Schema, preserve_dtypes: bool = False)

Defines the input and output schema

Parameters:
  • root_schema (Schema) – Schema of the input dataset

  • preserve_dtypes (bool, optional) – If True, dtypes in the current schema are not overridden; by default False

validate_schemas(root_schema: Schema, strict_dtypes: bool = False)

Check if this Node’s input schema matches the output schemas of parents and dependencies

Parameters:
  • root_schema (Schema) – Schema of the input dataset

  • strict_dtypes (bool, optional) – Whether to raise an error when column dtypes don’t match; by default False

Raises:
  • ValueError – If parents and dependencies don’t provide an expected column based on the input schema

  • ValueError – If the dtype of a column from parents and dependencies doesn’t match the expected dtype based on the input schema
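The two error conditions listed above can be illustrated with a small stand-alone check. This is a minimal sketch under stated assumptions, not the library's implementation: schemas are modelled as plain dicts mapping column name to a dtype string, and the function name toy_validate_schema is hypothetical. What the real method does on a dtype mismatch when strict_dtypes is False is not modelled here.

```python
def toy_validate_schema(expected, upstream, strict_dtypes=False):
    """Check that `upstream` provides every column in `expected`.

    Both arguments are dicts of column name -> dtype string (an assumption
    made for this sketch; the real method works with Schema objects).
    """
    for name, dtype in expected.items():
        if name not in upstream:
            # Parents and dependencies do not provide an expected column.
            raise ValueError(f"Missing expected column '{name}'")
        if strict_dtypes and upstream[name] != dtype:
            # Dtype mismatch is only an error in strict mode.
            raise ValueError(
                f"Column '{name}' has dtype {upstream[name]}, expected {dtype}"
            )


expected = {"user_id": "int64"}
provided = {"user_id": "float64", "item_id": "int64"}

toy_validate_schema(expected, provided)  # passes: dtypes are not enforced

try:
    toy_validate_schema(expected, provided, strict_dtypes=True)
    strict_error = None
except ValueError as err:
    strict_error = str(err)
```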

remove_inputs(input_cols: List[str]) → List[str]

Remove input columns and all output columns that depend on them.

Parameters:

input_cols (List[str]) – The input columns to remove

Returns:

The output columns that were removed

Return type:

List[str]
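To make the "output columns that depend on them" rule concrete, here is a hedged sketch that works on a plain dict instead of a Node. The function name toy_remove_inputs and the shape of the mapping (output column name to the list of input columns it is derived from) are assumptions made for illustration only.

```python
def toy_remove_inputs(column_mapping, input_cols):
    """Drop every output column derived from any of `input_cols`.

    `column_mapping` maps output column -> list of input columns.
    It is modified in place; the removed output columns are returned.
    """
    to_remove = set(input_cols)
    removed = [
        out for out, ins in column_mapping.items() if to_remove & set(ins)
    ]
    for out in removed:
        del column_mapping[out]
    return removed


mapping = {
    "a_scaled": ["a"],
    "a_times_b": ["a", "b"],
    "c_scaled": ["c"],
}
removed = toy_remove_inputs(mapping, ["b"])
```

Removing input b drops only a_times_b, because it is the only output that depends on b; a_scaled and c_scaled remain in the mapping.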

exportable(backend: str | None = None)

export(output_path: str | PathLike, node_id: int | None = None, version: int = 1)

Export a directory for this node, containing the required artifacts to run in the target context.

Parameters:
  • output_path (Union[str, os.PathLike]) – The base path to write this node’s export directory.

  • node_id (int, optional) – The id of this node in a larger graph (for disambiguation), by default None.

  • version (int, optional) – The version of the node to use for this export, by default 1.

property export_name

Name for the exported node directory.

Returns:

Name supplied by this node’s operator.

Return type:

str

property parents_with_dependencies

property grouped_parents_with_dependencies

property input_columns

property output_columns

property column_mapping

property dependency_columns

property label

property graph

classmethod construct_from(nodable: Node | Operator | str | List[str] | ColumnSelector | List[Node | Operator | str | List[str] | ColumnSelector])

Convert Node-like objects to a Node or list of Nodes.

Parameters:

nodable (Nodable) – Node-like objects to convert to a Node or list of Nodes.

Returns:

New Node(s) corresponding to the Node-like input objects

Return type:

Union[“Node”, List[“Node”]]

Raises:

TypeError – If supplied input cannot be converted to a Node or list of Nodes
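The conversion contract above (a node-like value in, a Node or list of Nodes out, TypeError otherwise) can be sketched with a toy class. Nothing below is the library's code: ToyNode and to_node are hypothetical names, and the choice to turn a list made only of strings into a single node, while converting any other list element by element, is an assumption made for this sketch. Operator and ColumnSelector inputs are left out.

```python
# Toy stand-in for illustration only; this is NOT merlin.dag.Node.
class ToyNode:
    def __init__(self, columns):
        self.columns = list(columns)


def to_node(nodable):
    """Convert a node-like value into a ToyNode or a list of ToyNodes."""
    if isinstance(nodable, ToyNode):
        return nodable                      # already a node
    if isinstance(nodable, str):
        return ToyNode([nodable])           # single column name
    if isinstance(nodable, list):
        if all(isinstance(item, str) for item in nodable):
            return ToyNode(nodable)         # list of column names -> one node
        return [to_node(item) for item in nodable]  # mixed list -> list of nodes
    raise TypeError(f"Cannot convert {type(nodable).__name__} to a node")


single = to_node("user_id")
grouped = to_node(["user_id", "item_id"])
mixed = to_node([ToyNode(["a"]), "b"])
```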