FillMedian

class nvtabular.ops.FillMedian(add_binary_cols=False)[source]

Bases: nvtabular.ops.stat_operator.StatOperator

This operation replaces missing values with the median value for the column.

Example usage:

import nvtabular
from nvtabular import ops

# Use FillMedian in a workflow for continuous columns
cont_features = ['cont1', 'cont2', 'cont3'] >> ops.FillMedian()
processor = nvtabular.Workflow(cont_features)

Parameters

add_binary_cols (boolean, default False) – When True, adds binary columns that indicate which cells in each column were filled.
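
The effect of median filling and of add_binary_cols can be illustrated without NVTabular. The following is a minimal, dependency-free sketch and not the library's implementation: columns are plain Python lists, None stands in for a missing value, and the helper fill_median and the "_filled" column suffix are assumptions made for illustration.

```python
from statistics import median


def fill_median(columns, add_binary_cols=False):
    """Return a copy of `columns` with None replaced by each column's median.

    `columns` maps a column name to a list of numbers, where None marks a
    missing value. Hypothetical helper, for illustration only.
    """
    out = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # statistics.median raises StatisticsError if every value is missing
        med = median(present)
        out[name] = [med if v is None else v for v in values]
        if add_binary_cols:
            # Assumed naming: one indicator column per input column
            out[f"{name}_filled"] = [v is None for v in values]
    return out


data = {
    "cont1": [1.0, None, 3.0, 10.0],
    "cont2": [None, 2.0, 4.0, None],
}
filled = fill_median(data, add_binary_cols=True)
print(filled["cont1"])         # missing cell replaced by the median of 1, 3, 10
print(filled["cont1_filled"])  # True only where a value was filled in
```

The indicator columns preserve the information that a value was originally missing, which a downstream model may be able to use.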

transform(col_selector: nvtabular.columns.selector.ColumnSelector, df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]

Transform the dataframe by applying this operator to the set of input columns

Parameters
  • col_selector (ColumnSelector) – The columns to apply this operator to

  • df (DataFrame) – A pandas or cuDF dataframe that this operator will work on

Returns

The transformed dataframe for this operator

Return type

DataFrame

fit(col_selector: nvtabular.columns.selector.ColumnSelector, ddf: dask.dataframe.core.DataFrame)[source]

Calculate statistics for this operator, and return a dask future to these statistics, which will be computed by the workflow.

fit_finalize(dask_stats)[source]

Finalize the statistics calculation. The workflow calls this function with the computed statistics from the ‘fit’ call.
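
The division of labour between fit, fit_finalize, and transform can be sketched with a toy class. This is a hedged illustration of the pattern, not NVTabular code: ToyFillMedian is a hypothetical name, zero-argument callables stand in for the dask futures that fit returns, plain lists with None stand in for dataframe columns, and the behaviour shown for clear() (resetting stored statistics) is an assumption.

```python
from statistics import median


class ToyFillMedian:
    """Hypothetical stand-in that mimics the fit / fit_finalize / transform flow."""

    def __init__(self):
        self.medians = {}

    def fit(self, col_names, columns):
        # Describe the statistics without computing them yet; each callable
        # plays the role of a deferred (future-like) result.
        return {
            name: (lambda vals=columns[name]: median(
                v for v in vals if v is not None))
            for name in col_names
        }

    def fit_finalize(self, computed_stats):
        # Store the statistics once the caller has computed them.
        self.medians = dict(computed_stats)

    def transform(self, col_names, columns):
        # Replace missing values using the stored medians.
        out = dict(columns)
        for name in col_names:
            fill = self.medians[name]
            out[name] = [fill if v is None else v for v in columns[name]]
        return out

    def clear(self):
        # Assumed behaviour: drop any fitted statistics.
        self.medians = {}


train = {"cont1": [2.0, None, 4.0, 6.0]}
op = ToyFillMedian()

pending = op.fit(["cont1"], train)
# In this sketch the caller takes the workflow's role and evaluates the stats.
computed = {name: thunk() for name, thunk in pending.items()}
op.fit_finalize(computed)

result = op.transform(["cont1"], {"cont1": [None, 5.0]})
print(result)
```

Separating the description of the statistics from their evaluation lets the caller compute the statistics for many operators together before any transform runs.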

clear()[source]

compute_output_schema(input_schema: nvtabular.columns.schema.Schema, col_selector: nvtabular.columns.selector.ColumnSelector) → nvtabular.columns.schema.Schema[source]

output_column_names(col_selector: nvtabular.columns.selector.ColumnSelector) → nvtabular.columns.selector.ColumnSelector[source]