DropLowCardinality

class nvtabular.ops.DropLowCardinality(min_cardinality=2)

Bases: nvtabular.ops.operator.Operator

DropLowCardinality drops categorical columns whose cardinality is below min_cardinality. The cardinality of these columns must already be recorded in the schema, for instance by first encoding them with Categorify.
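The pairing with Categorify described above might look like the following minimal sketch. The column names and the build_workflow helper are hypothetical and exist only for illustration. The sketch assumes NVTabular is installed and uses the usual `>>` chaining of operators.

```python
def build_workflow(min_cardinality=2):
    """Sketch: encode categorical columns, then drop the low-cardinality ones."""
    # Imported lazily so this sketch can be loaded without NVTabular installed.
    import nvtabular as nvt

    # Hypothetical column names, used only for illustration.
    cat_columns = ["user_id", "device_type"]

    # Categorify encodes the columns and records their cardinality in the
    # schema, which DropLowCardinality then reads.
    encoded = cat_columns >> nvt.ops.Categorify()
    filtered = encoded >> nvt.ops.DropLowCardinality(min_cardinality=min_cardinality)

    return nvt.Workflow(filtered)
```

The returned workflow would then be fitted and applied to a dataset as usual, for example with workflow.fit_transform(nvt.Dataset(df)), where df is a dataframe containing the columns named above.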

transform(col_selector: merlin.dag.selector.ColumnSelector, df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Selects all non-categorical columns and any categorical columns of at least the minimum cardinality from the dataframe.

Parameters
  • col_selector (ColumnSelector) – The columns to select.

  • df (DataFrameType) – The dataframe to transform.

Returns

Dataframe with only the selected columns.

Return type

DataFrameType

compute_selector(input_schema: merlin.schema.schema.Schema, selector: merlin.dag.selector.ColumnSelector, parents_selector: merlin.dag.selector.ColumnSelector, dependencies_selector: merlin.dag.selector.ColumnSelector) → merlin.dag.selector.ColumnSelector

Checks the cardinality of the input columns and drops any categorical columns with cardinality less than the specified minimum.

Parameters
  • input_schema (Schema) – The current node’s input schema

  • selector (ColumnSelector) – The current node’s selector

  • parents_selector (ColumnSelector) – A selector for the output columns of the current node’s parents

  • dependencies_selector (ColumnSelector) – A selector for the output columns of the current node’s dependencies

Returns

Selector that contains all non-categorical columns and any categorical columns of at least the minimum cardinality.

Return type

ColumnSelector
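
The selection rule documented above can be illustrated with a small plain-Python sketch. It does not use the Merlin Schema or ColumnSelector classes and is not the library's implementation. ColumnInfo and select_columns are hypothetical stand-ins, and raising an error when a categorical column has no known cardinality is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnInfo:
    """Hypothetical stand-in for one schema entry (not a Merlin class)."""

    name: str
    is_categorical: bool
    cardinality: Optional[int] = None  # only meaningful for categorical columns


def select_columns(columns: List[ColumnInfo], min_cardinality: int = 2) -> List[str]:
    """Return the names of the columns kept by the documented rule."""
    kept = []
    for col in columns:
        if not col.is_categorical:
            # Non-categorical columns are always kept.
            kept.append(col.name)
        elif col.cardinality is None:
            # Assumption for this sketch: an unknown cardinality is an error.
            raise ValueError(f"cardinality of {col.name!r} is not known")
        elif col.cardinality >= min_cardinality:
            # Categorical columns are kept only at or above the minimum.
            kept.append(col.name)
    return kept


schema = [
    ColumnInfo("price", is_categorical=False),
    ColumnInfo("country", is_categorical=True, cardinality=40),
    ColumnInfo("constant_flag", is_categorical=True, cardinality=1),
]

kept = select_columns(schema, min_cardinality=2)
print(kept)  # ['price', 'country']
```

With the default min_cardinality of 2, a categorical column with a single distinct value is dropped, while non-categorical columns pass through regardless of their contents.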