ReduceDtypeSize
-
class nvtabular.ops.ReduceDtypeSize(float_dtype=numpy.float32)
Bases: nvtabular.ops.stat_operator.StatOperator
ReduceDtypeSize changes the dtypes of numeric columns to smaller types. For integer columns, it chooses the smallest integer dtype whose range covers both the minimum and the maximum value in the column. For float columns, it casts to float_dtype, which defaults to numpy.float32.
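The integer rule can be sketched in plain Python. This is a minimal illustration, not NVTabular's implementation: dtypes are represented as name strings, and the helper reduced_dtype, the INT_CANDIDATES table, and the candidate order (narrowest first, signed before unsigned at each width) are all assumptions of this sketch.

```python
# Hypothetical helper, not NVTabular's API. Candidates are ordered narrowest
# first, signed before unsigned at each width (an assumed order).
INT_CANDIDATES = [
    ("int8", -(2**7), 2**7 - 1),
    ("uint8", 0, 2**8 - 1),
    ("int16", -(2**15), 2**15 - 1),
    ("uint16", 0, 2**16 - 1),
    ("int32", -(2**31), 2**31 - 1),
    ("uint32", 0, 2**32 - 1),
    ("int64", -(2**63), 2**63 - 1),
    ("uint64", 0, 2**64 - 1),
]


def reduced_dtype(kind, min_value, max_value, float_dtype="float32"):
    """Return the dtype name a column of the given kind would be cast to."""
    if kind == "float":
        # Float columns go to float_dtype regardless of their value range.
        return float_dtype
    if kind != "int":
        raise ValueError(f"unsupported column kind: {kind!r}")
    # First (narrowest) candidate whose range covers both observed extremes.
    for name, low, high in INT_CANDIDATES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in any 64-bit integer dtype")


print(reduced_dtype("int", -5, 100))   # int8
print(reduced_dtype("int", 0, 200))    # uint8
print(reduced_dtype("int", 0, 70000))  # int32
```

A column spanning 0 to 70000 lands on int32 rather than uint32 here because int32 is tried first and already covers that range.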
-
fit(col_selector: merlin.dag.selector.ColumnSelector, ddf: dask.dataframe.core.DataFrame)
Calculate statistics for this operator and return a Dask future for these statistics, which the workflow will compute.
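Since the dtype choice depends on each column's minimum and maximum, the statistics gathered here can be illustrated as a partition-wise reduction. This sketch assumes the statistics are per-column (min, max) pairs, replaces the lazy Dask computation with an eager reduction over plain dicts standing in for partitions, and uses hypothetical helper names (partition_stats, merge_stats, fit_ranges).

```python
from functools import reduce

# Each hypothetical partition maps column name -> list of values,
# standing in for one partition of a dask DataFrame.
partitions = [
    {"user_id": [3, 17, 250], "clicks": [0, 4, 1]},
    {"user_id": [9, 1200], "clicks": [7, 2]},
]


def partition_stats(part, columns):
    """Per-partition (min, max) for each selected column."""
    return {col: (min(part[col]), max(part[col])) for col in columns}


def merge_stats(a, b):
    """Combine two partial results; min and max are associative, so order is irrelevant."""
    return {col: (min(a[col][0], b[col][0]), max(a[col][1], b[col][1])) for col in a}


def fit_ranges(partitions, columns):
    """Eager stand-in for fit(): reduce partial stats across all partitions."""
    return reduce(merge_stats, (partition_stats(p, columns) for p in partitions))


ranges = fit_ranges(partitions, ["user_id", "clicks"])
print(ranges)  # {'user_id': (3, 1200), 'clicks': (0, 7)}
```

Because each partition only contributes two numbers per column, the partial results are tiny and can be merged cheaply, which is what makes this kind of statistic a good fit for a distributed reduction.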
-
fit_finalize(dask_stats)
Finalize the statistics calculation. The workflow calls this function with the statistics computed from the object returned by fit.
-
transform(col_selector: merlin.dag.selector.ColumnSelector, df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Transform the dataframe by applying this operator to the set of input columns.
- Parameters
col_selector (ColumnSelector) – The columns to apply this operator to
df (DataFrame) – A pandas or cuDF dataframe for this operator to work on
- Returns
The dataframe transformed by this operator
- Return type
DataFrame
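The cast step can be illustrated with the standard library's array module, whose typecodes 'q', 'b', and 'B' stand in here for int64, int8, and uint8. The column dict, the target typecodes, and the cast_columns helper are hypothetical; an actual workflow operates on pandas or cuDF dataframes rather than arrays.

```python
from array import array

# Hypothetical target types, as a prior fit step might have chosen them:
# 'b' = signed char (int8), 'B' = unsigned char (uint8).
target_typecodes = {"clicks": "b", "rating": "B"}

columns = {
    "clicks": array("q", [0, 4, 1, 7, 2]),      # 'q' = signed 64-bit
    "rating": array("q", [5, 200, 31, 255, 0]),
    "name": ["a", "b", "c", "d", "e"],          # non-numeric, left unchanged
}


def cast_columns(columns, target_typecodes):
    """Return a new column dict with the selected columns cast to narrower types."""
    out = dict(columns)  # shallow copy: unselected columns are shared, not copied
    for name, code in target_typecodes.items():
        # array() raises OverflowError if a value does not fit the target type,
        # so the ranges used to pick the types must cover the data being cast.
        out[name] = array(code, columns[name].tolist())
    return out


reduced = cast_columns(columns, target_typecodes)
for name in ("clicks", "rating"):
    before = columns[name].itemsize * len(columns[name])
    after = reduced[name].itemsize * len(reduced[name])
    print(name, before, "->", after, "bytes")
```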
-
compute_output_schema(input_schema, selector, prev_output_schema=None)
Given the schemas of the input columns and a column selector, return the schemas of the transformed columns this operator will produce.
- Parameters
input_schema (Schema) – The schemas of the columns to apply this operator to
selector (ColumnSelector) – The column selector to apply to the input schema
- Returns
The schemas of the columns produced by this operator
- Return type
Schema
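A sketch of deriving output dtypes at the schema level, before any data is transformed. Schemas are modeled as plain dicts mapping column names to dtype-name strings rather than merlin Schema objects. The helpers narrowest_int and output_schema, the candidate order, and the pass-through behavior for unfitted or non-numeric columns are assumptions of this sketch, not confirmed NVTabular behavior.

```python
def narrowest_int(min_value, max_value):
    """Hypothetical helper: narrowest width first, signed before unsigned (assumed order)."""
    for bits in (8, 16, 32, 64):
        if -(2 ** (bits - 1)) <= min_value and max_value < 2 ** (bits - 1):
            return f"int{bits}"
        if 0 <= min_value and max_value < 2**bits:
            return f"uint{bits}"
    raise ValueError("range exceeds 64 bits")


def output_schema(input_schema, ranges, float_dtype="float32"):
    """input_schema: column -> dtype name; ranges: column -> (min, max) from fitting."""
    out = dict(input_schema)  # columns without fitted ranges pass through unchanged
    for name, (low, high) in ranges.items():
        dtype = input_schema[name]
        if dtype.startswith(("int", "uint")):
            out[name] = narrowest_int(low, high)
        elif dtype.startswith("float"):
            out[name] = float_dtype
    return out


schema = {"user_id": "int64", "clicks": "int64", "price": "float64", "name": "object"}
ranges = {"user_id": (3, 1200), "clicks": (0, 7), "price": (0.5, 99.9)}
out = output_schema(schema, ranges)
print(out)
```

Computing the new dtypes from the schema and the fitted ranges alone means downstream operators can see the narrowed types without any data being read.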
-