FillMedian

class nvtabular.ops.FillMedian(add_binary_cols=False)
Bases: nvtabular.ops.stat_operator.StatOperator
This operation replaces missing values with the median value for the column.
Example usage:
    import nvtabular
    from nvtabular import ops

    # Use FillMedian in a workflow for continuous columns
    cont_features = ['cont1', 'cont2', 'cont3'] >> ops.FillMedian()
    processor = nvtabular.Workflow(cont_features)
- Parameters
add_binary_cols (boolean, default False) – When True, adds binary columns that indicate whether cells in each column were filled
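To illustrate what add_binary_cols changes, here is a minimal pure-Python sketch of the behavior described above. It is not NVTabular's implementation: plain lists stand in for dataframe columns, None marks a missing value, and both the helper name fill_median and the "_filled" suffix for the indicator columns are assumptions made for illustration.

```python
from statistics import median


def fill_median(columns, add_binary_cols=False):
    """Fill None entries in each column with that column's median.

    `columns` maps a column name to a list of numbers or None.
    Hypothetical helper for illustration; not part of NVTabular.
    Assumes every column has at least one non-missing value.
    """
    result = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        col_median = median(observed)
        result[name] = [col_median if v is None else v for v in values]
        if add_binary_cols:
            # Indicator column: 1 where the cell was filled, 0 otherwise.
            # The "_filled" suffix is an assumed naming choice.
            result[f"{name}_filled"] = [int(v is None) for v in values]
    return result


data = {"cont1": [1.0, None, 3.0, 10.0], "cont2": [None, 4.0, 6.0, None]}
filled = fill_median(data, add_binary_cols=True)
```

With add_binary_cols left at False, the result contains only the filled columns and no indicator columns.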

transform(col_selector: merlin.dag.selector.ColumnSelector, df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Transform the dataframe by applying this operator to the set of input columns.
- Parameters
col_selector (ColumnSelector) – The columns to apply this operator to
df (DataFrame) – A pandas or cuDF dataframe that this operator will work on
- Returns
Returns a transformed dataframe for this operator
- Return type
DataFrame

fit(col_selector: merlin.dag.selector.ColumnSelector, ddf: dask.dataframe.core.DataFrame)
Calculate statistics for this operator and return a dask future to these statistics, which will be computed by the workflow.
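
The split between fit and transform can be sketched in plain Python: statistics are computed once from the fitting data, stored, and then reused on any later data. The class below is a hypothetical stand-in (MedianFiller and its attribute names are not NVTabular's). It computes medians eagerly rather than returning a dask future, and it uses lists with None for missing values instead of dataframes.

```python
from statistics import median


class MedianFiller:
    """Hypothetical two-phase operator: fit stores medians, transform applies them."""

    def __init__(self):
        self.medians = {}

    def fit(self, columns):
        # Compute one median per column from the non-missing values.
        for name, values in columns.items():
            self.medians[name] = median(v for v in values if v is not None)
        return self

    def transform(self, columns):
        # Reuse the stored medians; no statistics are recomputed here.
        return {
            name: [self.medians[name] if v is None else v for v in values]
            for name, values in columns.items()
        }


train = {"cont1": [2.0, 4.0, None, 9.0]}
new_rows = {"cont1": [None, 100.0, None]}

filler = MedianFiller().fit(train)
out = filler.transform(new_rows)
```

In this sketch the fill value applied to new_rows is the median learned from train (4.0), not a median of new_rows itself.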