merlin.dag.Graph

class merlin.dag.Graph(output_node: Node)

Bases: object

Represents a DAG composed of Nodes, each of which contains an operator that transforms dataframes or dataframe-like data.

__init__(output_node: Node)

Methods

__init__(output_node)

clear_stats()

Removes calculated statistics from each node in the graph

construct_schema(root_schema[, preserve_dtypes])

Given the schema of a dataset to transform, determine the output schema of the graph

get_nodes_by_op_type(nodes, op_type)

remove_inputs(to_remove)

Removes columns from a Graph

subgraph(name)

Attributes

subgraph(name: str) → Graph

property input_dtypes

property output_dtypes

property column_mapping

construct_schema(root_schema: Schema, preserve_dtypes=False) → Graph

Given the schema of a dataset to transform, determine the output schema of the graph

Parameters:
  • root_schema (Schema) – The schema of a dataset to be transformed with this DAG

  • preserve_dtypes (bool, optional) – Whether to keep any dtypes that may already be present in the schemas, by default False

Returns:

This DAG after the schemas have been filled in

Return type:

Graph
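
The idea of filling in schemas from the inputs toward the output can be illustrated with a small standalone sketch. It does not use merlin: `ToySchema`, `ToyNode`, and `compute_schema` are hypothetical names, and the behavior shown for `preserve_dtypes` (a previously recorded dtype wins over a recomputed one) is an assumption made for illustration, not a description of the library's implementation.

```python
from typing import Callable, Dict, Optional

# Hypothetical schema: column name -> dtype name. Not merlin's Schema class.
ToySchema = Dict[str, str]


class ToyNode:
    """Hypothetical node: an operator that maps an input schema to an output schema."""

    def __init__(self, op: Callable[[ToySchema], ToySchema],
                 parent: Optional["ToyNode"] = None):
        self.op = op
        self.parent = parent
        self.output_schema: Optional[ToySchema] = None

    def compute_schema(self, root_schema: ToySchema,
                       preserve_dtypes: bool = False) -> ToySchema:
        # Resolve the upstream schema first, so schemas fill in from inputs to output.
        if self.parent is not None:
            upstream = self.parent.compute_schema(root_schema, preserve_dtypes)
        else:
            upstream = root_schema
        previous = self.output_schema or {}
        computed = self.op(upstream)
        if preserve_dtypes:
            # Assumed semantics: keep a dtype that was already recorded for a column.
            computed = {col: previous.get(col, dtype) for col, dtype in computed.items()}
        self.output_schema = computed
        return computed


root = {"price": "int64", "qty": "int64"}
select = ToyNode(lambda s: {"price": s["price"]})               # keep one column
to_float = ToyNode(lambda s: {c: "float32" for c in s}, parent=select)

result = to_float.compute_schema(root)

# Pretend a dtype was recorded earlier, then recompute while preserving it.
to_float.output_schema = {"price": "float64"}
preserved = to_float.compute_schema(root, preserve_dtypes=True)
```

After the first call every node in the chain carries an output schema, which mirrors the documented return value of a graph "after the schemas have been filled in".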

property input_schema

property leaf_nodes

property output_schema

remove_inputs(to_remove)

Removes columns from a Graph

Starting at the leaf nodes, walk down the graph looking for the columns to remove. When one is found, remove it, then propagate the removal to any output columns derived from it.

Parameters:
  • graph (Graph) – The graph to remove columns from

  • to_remove (array_like) – A list of input column names to remove from the graph

Returns:

The same graph with columns removed

Return type:

Graph
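
The propagation described above can be sketched in a few lines of standalone Python. This is not merlin's implementation: `ToyNode` and `remove_columns` are hypothetical names, and recording each output column's source columns in a plain dictionary is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyNode:
    """Hypothetical stand-in for a DAG node; not merlin's Node class."""
    name: str
    # Maps each output column to the set of input columns it is derived from.
    derived_from: Dict[str, Set[str]]
    children: List["ToyNode"] = field(default_factory=list)  # downstream nodes


def remove_columns(node: ToyNode, to_remove: Set[str]) -> None:
    """Drop outputs that depend on removed columns, then propagate downstream."""
    dropped = {out for out, srcs in node.derived_from.items() if srcs & to_remove}
    for out in dropped:
        del node.derived_from[out]
    # Downstream nodes lose the original columns and anything derived from them.
    for child in node.children:
        remove_columns(child, to_remove | dropped)


# The leaf node selects raw columns; its child derives new columns from them.
scale = ToyNode("scale", {"price_scaled": {"price"}, "qty_scaled": {"qty"}})
select = ToyNode("select", {"price": {"price"}, "qty": {"qty"}}, children=[scale])

remove_columns(select, {"price"})
remaining_select = sorted(select.derived_from)
remaining_scale = sorted(scale.derived_from)
```

Removing `price` at the leaf also removes `price_scaled` downstream, while the `qty` columns are untouched.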

classmethod get_nodes_by_op_type(nodes, op_type)

clear_stats()

Removes calculated statistics from each node in the graph

See also

StatOperator.clear