Standardizing the features around 0 with a standard deviation of 1 is a common technique to compare measurements that have different units. This operation can be added to the workflow to standardize the features.
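It normalizes each column with the mean-std method: the column mean is subtracted from every value and the result is divided by the column standard deviation.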
    # Use Normalize to define a NVTabular workflow
    cont_features = CONTINUOUS_COLUMNS >> ops.Normalize()
    processor = nvtabular.Workflow(cont_features)
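The arithmetic behind the mean-std method can be illustrated without NVTabular. The following is a minimal pure-Python sketch, not the library's implementation: the helper `standardize` and the sample data are hypothetical, and it assumes the population standard deviation, which may differ from the convention the operator uses internally.

```python
import statistics


def standardize(values):
    """Return (x - mean) / std for each x (hypothetical helper).

    Assumes the population standard deviation; a constant column
    (std == 0) is not handled in this sketch.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]


# Two measurements in different units but with the same relative spread.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [50.0, 60.0, 70.0, 80.0, 90.0]

scaled_heights = standardize(heights_cm)
scaled_weights = standardize(weights_kg)
```

After scaling, both columns are centered on 0 with a standard deviation of 1, so they can be compared directly even though the raw units differ.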
out_dtype (str) – Specifies the data type for the output columns. If not set here, the default value is numpy.float64.
fit(col_selector: merlin.dag.selector.ColumnSelector, ddf: dask.dataframe.core.DataFrame)[source]
Calculate statistics for this operator, and return a dask future to these statistics, which will be computed by the workflow.
Finalize the statistics calculation. The workflow calls this function with the computed statistics from the ‘fit’ object.
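The two-phase statistics protocol described above (request statistics, let the workflow compute them, then hand the results back) can be sketched in pure Python. This is a toy illustration under stated assumptions, not NVTabular code: `ToyNormalize` is a hypothetical class, columns are plain lists in a dict rather than dataframes, zero-argument callables stand in for dask futures, and the method name `fit_finalize` is an illustrative label for the finalize step.

```python
import statistics


class ToyNormalize:
    """Hypothetical stand-in for a statistics-based operator."""

    def __init__(self):
        self.means = {}
        self.stds = {}

    def fit(self, columns, data):
        # Return deferred work instead of computing immediately,
        # loosely mirroring how a dask future defers computation.
        return {
            col: (lambda v=data[col]: (statistics.fmean(v), statistics.pstdev(v)))
            for col in columns
        }

    def fit_finalize(self, computed_stats):
        # Store the statistics that the caller has already computed.
        for col, (mean, std) in computed_stats.items():
            self.means[col] = mean
            self.stds[col] = std

    def transform(self, columns, data):
        # Apply the stored statistics to the selected columns only.
        out = dict(data)
        for col in columns:
            out[col] = [(x - self.means[col]) / self.stds[col] for x in data[col]]
        return out


data = {"price": [10.0, 20.0, 30.0], "label": [0, 1, 0]}
op = ToyNormalize()

pending = op.fit(["price"], data)                            # step 1: request statistics
computed = {col: thunk() for col, thunk in pending.items()}  # step 2: the "workflow" computes them
op.fit_finalize(computed)                                    # step 3: hand the results back
result = op.transform(["price"], data)
```

Separating the request from the computation lets a caller batch the statistics of many operators into a single pass over the data before any transform runs.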
transform(col_selector: merlin.dag.selector.ColumnSelector, df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]
Transform the dataframe by applying this operator to the set of input columns.
columns (list of str or list of list of str) – The columns to apply this operator to.
df (DataFrame) – A pandas or cuDF dataframe that this operator will work on.
Returns a transformed dataframe for this operator.
- Return type