merlin.dag.ColumnSelector#

class merlin.dag.ColumnSelector(names: str | List[str] | None = None, subgroups: List[ColumnSelector] | None = None, tags: List[str | Tags] | None = None)[source]#

Bases: object

A ColumnSelector describes a group of columns to be transformed by Operators in a Graph. Operators are applied to the selected columns by shifting (>>) them onto the ColumnSelector, which returns a new Node with the transformations applied. Chaining these expressions lets you define the operations that make up your Graph.
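The shift syntax described above can be pictured with a small standard-library sketch. The classes ToySelector and ToyNode are hypothetical stand-ins invented for illustration, and the operators are plain strings; this is not merlin's implementation, only an assumption about how a `>>` overload can return a new node each time.

```python
class ToyNode:
    """Hypothetical stand-in for a graph node; not merlin's Node class."""

    def __init__(self, columns, ops):
        self.columns = list(columns)
        self.ops = list(ops)

    def __rshift__(self, op):
        # Each >> returns a NEW node with one more operation recorded,
        # leaving the node on the left-hand side unchanged.
        return ToyNode(self.columns, self.ops + [op])


class ToySelector:
    """Hypothetical stand-in for a column selector."""

    def __init__(self, names):
        self.names = list(names)

    def __rshift__(self, op):
        # Shifting an operator onto a selector yields a node, not a selector.
        return ToyNode(self.names, [op])


# Operators are plain strings here purely for illustration.
node = ToySelector(["price", "quantity"]) >> "fill_missing" >> "normalize"
print(node.columns, node.ops)
```

Returning a fresh object from every `>>` keeps intermediate expressions reusable, which is one plausible reason for the "returns a new Node" wording above.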

Parameters:
  • names (list of (str or tuple of str)) – The columns to select from the input Dataset. The elements of this list are strings indicating the column names in most cases, but can also be tuples of strings for feature crosses.

  • subgroups (list of ColumnSelector objects, optional) – An alternate syntax for grouping column names together (instead of nesting tuples inside the list of names).

  • tags (list of Tags) – The columns to select from the input dataset based on Tags. Any column with at least one of the provided tags will be selected.
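The tag-matching rule for the tags parameter (a column qualifies if it carries any one of the requested tags) can be sketched without the library. The function select_by_tags, the column names, and the tag strings below are all hypothetical; real code would use merlin's Tags and Schema objects rather than plain sets.

```python
def select_by_tags(column_tags, wanted):
    """Return the columns that carry at least one of the wanted tags.

    column_tags maps a column name to the set of tags attached to it.
    This is an illustrative assumption, not merlin's internal data layout.
    """
    wanted = set(wanted)
    # A non-empty intersection means the column shares at least one tag.
    return [name for name, tags in column_tags.items() if wanted & set(tags)]


column_tags = {
    "user_id": {"categorical", "user"},
    "item_id": {"categorical", "item"},
    "price": {"continuous", "item"},
    "timestamp": set(),
}

selected = select_by_tags(column_tags, ["user", "continuous"])
print(selected)
```

Here "user_id" matches through the "user" tag and "price" through "continuous"; neither column needs to carry both.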

__init__(names: str | List[str] | None = None, subgroups: List[ColumnSelector] | None = None, tags: List[str | Tags] | None = None)[source]#

Methods

__init__([names, subgroups, tags])

filter_columns(other_selector)

Narrow the content of this selector to the columns that would be selected by another selector.

resolve(schema)

Takes a schema and produces a new selector containing the selected column names. How the selection occurs (by tags or by name) does not matter.

Attributes

all

grouped_names

names

tags

property all#
property tags#
property names#
property grouped_names#
resolve(schema)[source]#

Takes a schema and produces a new selector containing the selected column names. How the selection occurs (by tags or by name) does not matter.
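As a rough illustration of what "how the selection occurs does not matter" means, the sketch below resolves a name-based request and a tag-based request against the same toy schema and ends up with the same list of names. The helper toy_resolve and the dictionary schema are assumptions made for this example; they do not reproduce merlin's Schema class or the actual resolve logic.

```python
def toy_resolve(schema_tags, names=(), tags=()):
    """Turn a name-based or tag-based request into concrete column names.

    schema_tags maps each column name to its set of tags (a hypothetical
    stand-in for a real schema). The result lists names in schema order.
    """
    wanted_names = set(names)
    wanted_tags = set(tags)
    return [
        col
        for col, col_tags in schema_tags.items()
        if col in wanted_names or wanted_tags & set(col_tags)
    ]


schema_tags = {
    "user_id": {"categorical"},
    "item_id": {"categorical"},
    "price": {"continuous"},
}

by_tags = toy_resolve(schema_tags, tags=["categorical"])
by_names = toy_resolve(schema_tags, names=["user_id", "item_id"])
print(by_tags == by_names)
```

After resolution only the names remain, so downstream steps do not need to know whether tags or names produced them.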

filter_columns(other_selector: ColumnSelector)[source]#

Narrow the content of this selector to the columns that would be selected by another selector.

Parameters:

other_selector (ColumnSelector) – Other selector to apply as the filter

Returns:

This selector filtered by the other selector

Return type:

ColumnSelector
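One way to read the narrowing described above is as an order-preserving intersection of column names. The function toy_filter_columns is a hypothetical helper operating on plain lists; it ignores subgroups and tags and is only a sketch of the idea, not the library's code.

```python
def toy_filter_columns(own_names, other_names):
    """Keep only the names from own_names that other_names would also select.

    The order of own_names is preserved. Names that appear only in
    other_names are never added to the result.
    """
    allowed = set(other_names)
    return [name for name in own_names if name in allowed]


narrowed = toy_filter_columns(
    ["user_id", "item_id", "price"],
    ["price", "user_id", "timestamp"],
)
print(narrowed)
```

In this sketch "timestamp" is dropped because the first selector never contained it, and "item_id" is dropped because the filter does not select it.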