ReduceDtypeSize

class nvtabular.ops.ReduceDtypeSize(float_dtype=<class 'numpy.float32'>)[source]

Bases: nvtabular.ops.stat_operator.StatOperator

ReduceDtypeSize changes the dtypes of numeric columns to reduce their size. For integer columns, it chooses a dtype that can hold both the minimum and maximum values in the column. For float columns, it casts to float_dtype (float32 by default).
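The integer rule above can be illustrated with a minimal sketch in plain Python. It assumes only signed integer dtypes are considered and that candidates are tried from narrowest to widest; the helper name `smallest_int_dtype` is hypothetical and is not part of NVTabular.

```python
# Inclusive value ranges of the signed integer dtypes, narrowest first.
SIGNED_INT_RANGES = [
    ("int8", -2**7, 2**7 - 1),
    ("int16", -2**15, 2**15 - 1),
    ("int32", -2**31, 2**31 - 1),
    ("int64", -2**63, 2**63 - 1),
]


def smallest_int_dtype(min_value, max_value):
    """Return the name of the narrowest signed dtype that holds both bounds.

    Hypothetical helper for illustration only; not an NVTabular function.
    """
    for name, low, high in SIGNED_INT_RANGES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("values do not fit in a 64-bit signed integer")


print(smallest_int_dtype(0, 100))      # int8
print(smallest_int_dtype(-5, 40_000))  # int32
```

Only the column minimum and maximum are needed to make this choice, which is why the operator gathers statistics before it transforms any data.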

fit(col_selector: merlin.dag.selector.ColumnSelector, ddf: dask.dataframe.core.DataFrame)[source]

Calculate statistics for this operator, and return a dask future to these statistics, which will be computed by the workflow.

fit_finalize(dask_stats)[source]

Finalize the statistics calculation. The workflow calls this function with the computed statistics from the ‘fit’ object.

clear()[source]

Zero and reinitialize all relevant statistical properties.

transform(col_selector: merlin.dag.selector.ColumnSelector, df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]

Transform the dataframe by applying this operator to the set of input columns

Parameters
  • col_selector (ColumnSelector) – The columns to apply this operator to

  • df (DataFrame) – A pandas or cuDF DataFrame that this operator will work on

Returns

Returns a transformed dataframe for this operator

Return type

DataFrame
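The order in which a workflow would call fit, fit_finalize, transform and clear can be sketched with a toy class. This is an illustration under stated assumptions, not the NVTabular implementation: `ToyReduceDtypeSize` is a hypothetical stand-in, the "dataframe" is a plain dict of lists, statistics are computed eagerly instead of lazily through dask, and transform only reports the planned dtypes rather than casting data.

```python
class ToyReduceDtypeSize:
    """Hypothetical stand-in that mimics the method order; not the NVTabular class."""

    # (dtype name, number of value bits) for signed integers, narrowest first.
    _RANGES = [("int8", 7), ("int16", 15), ("int32", 31), ("int64", 63)]

    def __init__(self, float_dtype="float32"):
        self.float_dtype = float_dtype
        self.dtypes = {}

    def fit(self, columns, table):
        # Gather (is_float, min, max) per column. The real operator returns a
        # lazy dask object here; this toy version computes eagerly.
        stats = {}
        for name in columns:
            values = table[name]
            is_float = any(isinstance(v, float) for v in values)
            stats[name] = (is_float, min(values), max(values))
        return stats

    def fit_finalize(self, stats):
        # Turn the computed statistics into one target dtype per column.
        for name, (is_float, low, high) in stats.items():
            if is_float:
                self.dtypes[name] = self.float_dtype
                continue
            # Fall back to int64 if no narrower candidate fits.
            self.dtypes[name] = next(
                (dtype for dtype, bits in self._RANGES
                 if -2**bits <= low and high <= 2**bits - 1),
                "int64",
            )

    def transform(self, columns, table):
        # A real operator would cast the data; the toy only reports the plan.
        return {name: self.dtypes[name] for name in columns}

    def clear(self):
        # Drop the fitted state so the operator can be fitted again.
        self.dtypes = {}


table = {"clicks": [0, 3, 120], "user_id": [1, 70_000, 5], "price": [1.5, 2.25, 9.0]}
op = ToyReduceDtypeSize()
op.fit_finalize(op.fit(list(table), table))
print(op.transform(["clicks", "user_id", "price"], table))
op.clear()
```

With this input the toy reports int8 for clicks, int32 for user_id and float32 for price. Keeping the fitted dtypes on the instance is what lets transform run on new data without recomputing the statistics.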

compute_output_schema(input_schema, selector, prev_output_schema=None)[source]

Given a set of schemas and a column selector for the input columns, returns a set of schemas for the transformed columns this operator will produce

Parameters
  • input_schema (Schema) – The schemas of the columns to apply this operator to

  • col_selector (ColumnSelector) – The column selector to apply to the input schema

Returns

The schemas of the columns produced by this operator

Return type

Schema