nvtabular.workflow.workflow.Workflow
class nvtabular.workflow.workflow.Workflow(output_node: nvtabular.workflow.node.WorkflowNode, client: Optional[distributed.Client] = None)

Bases: object

The Workflow class applies a graph of operations to a dataset, letting you transform datasets for feature engineering and preprocessing. This class follows an API similar to Transformers in sklearn: we first fit the workflow by calculating statistics on the dataset, and then, once it is fit, we can transform datasets by applying these statistics.

Example usage:

```python
import merlin.io
import nvtabular

# CAT_COLUMNS, CONT_COLUMNS and the *_PATH names are placeholders for your own values

# define a graph of operations
cat_features = CAT_COLUMNS >> nvtabular.ops.Categorify()
cont_features = CONT_COLUMNS >> nvtabular.ops.FillMissing() >> nvtabular.ops.Normalize()
workflow = nvtabular.Workflow(cat_features + cont_features + "label")

# calculate statistics on the training dataset
workflow.fit(merlin.io.Dataset(TRAIN_PATH))

# transform the training and validation datasets and write out as parquet
workflow.transform(merlin.io.Dataset(TRAIN_PATH)).to_parquet(output_path=TRAIN_OUT_PATH)
workflow.transform(merlin.io.Dataset(VALID_PATH)).to_parquet(output_path=VALID_OUT_PATH)
```

Parameters
- output_node (WorkflowNode) – The last node in the graph of operators this workflow should apply.
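The fit-then-transform pattern described above can be illustrated without NVTabular. The sketch below is a hypothetical, pure-Python stand-in: `ToyCategorify` is not part of NVTabular, and the real `Categorify` op's encoding rules (reserved ids for nulls and out-of-vocabulary values, GPU-backed storage) are not reproduced. It only shows that statistics are learned once in `fit` and then reused unchanged by `transform`.

```python
from typing import Dict, List


class ToyCategorify:
    """Hypothetical stand-in for a stateful op: learns a category -> id map in fit()."""

    def __init__(self) -> None:
        self.mapping: Dict[str, int] = {}

    def fit(self, values: List[str]) -> "ToyCategorify":
        # Statistics step: assign ids in sorted order; id 0 is kept for unseen values
        self.mapping = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
        return self

    def transform(self, values: List[str]) -> List[int]:
        # Apply step: reuse the learned mapping, sending unseen categories to 0
        return [self.mapping.get(v, 0) for v in values]


train = ["red", "blue", "red", "green"]
valid = ["blue", "purple"]

op = ToyCategorify().fit(train)
print(op.transform(train))  # [3, 1, 3, 2]
print(op.transform(valid))  # [1, 0]
```

Because the mapping is frozen after `fit`, the validation value "purple" gets the fallback id instead of a new one, which mirrors why a workflow is fit once and then used to transform every split.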
__init__(output_node: nvtabular.workflow.node.WorkflowNode, client: Optional[distributed.Client] = None)

Methods
- __init__(output_node[, client])
- clear_stats() – Removes calculated statistics from each node in the workflow graph.
- fit(dataset) – Calculates statistics for this workflow on the input dataset.
- fit_schema(input_schema) – Fits the schema onto the workflow, computing the Schema for each node in the Workflow Graph.
- fit_transform(dataset) – Convenience method to both fit the workflow and transform the dataset in a single call.
- load(path[, client]) – Load a saved workflow object from disk.
- remove_inputs(input_cols) – Removes input columns from the workflow.
- save(path) – Save this workflow to disk.
- transform(dataset) – Transforms the dataset by applying the graph of operators to it.

Attributes
- input_dtypes
- input_schema
- output_dtypes
- output_node
- output_schema
transform(dataset: merlin.io.dataset.Dataset) → merlin.io.dataset.Dataset

Transforms the dataset by applying the graph of operators to it. Requires the fit method to have already been called, or calculated statistics to have been loaded from disk.

This method returns a Dataset object with the transformations applied lazily. None of the actual computation happens until the produced Dataset is consumed or written out to disk.

Parameters
- dataset (Dataset) – Input dataset to transform.
Returns
- Transformed Dataset with the workflow graph applied to it.
Return type
- Dataset
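The lazy behavior of `transform` described above can be sketched with Python generators. This is a hypothetical toy (`ToyLazyDataset` and `double_price` are illustrative names, not NVTabular or Merlin APIs) and does not reflect how `merlin.io.Dataset` schedules work internally; it only shows that building the transformed dataset records work, while consuming it performs the work.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, int]  # hypothetical row type for the toy example
RowFn = Callable[[Row], Row]


class ToyLazyDataset:
    """Hypothetical lazy dataset: keeps the source rows plus a list of pending functions."""

    def __init__(self, rows: Iterable[Row], ops: Optional[List[RowFn]] = None) -> None:
        self._rows = rows
        self._ops = list(ops or [])

    def map(self, fn: RowFn) -> "ToyLazyDataset":
        # Only records the function; no row is touched here
        return ToyLazyDataset(self._rows, self._ops + [fn])

    def __iter__(self) -> Iterator[Row]:
        # Computation happens only when the dataset is consumed
        for row in self._rows:
            for fn in self._ops:
                row = fn(row)
            yield row


calls: List[int] = []


def double_price(row: Row) -> Row:
    calls.append(row["id"])  # record each time the function actually runs
    return {**row, "price": row["price"] * 2}


source = [{"id": 1, "price": 10}, {"id": 2, "price": 5}]
transformed = ToyLazyDataset(source).map(double_price)
print(calls)   # [] : nothing has run yet
result = list(transformed)
print(calls)   # [1, 2]
print(result)  # [{'id': 1, 'price': 20}, {'id': 2, 'price': 10}]
```

A consequence worth keeping in mind: consuming the same lazy result twice repeats the computation, which is why writing the transformed data out once (for example with `to_parquet`) is a common pattern.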
fit_schema(input_schema: merlin.schema.schema.Schema)

Fits the schema onto the workflow, computing the Schema for each node in the Workflow Graph.

Parameters
- input_schema (Schema) – The input schema to use.
Returns
- This workflow, where each node in the graph has a fitted schema.
Return type
- Workflow
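Schema fitting, as described above, amounts to pushing the input schema through each node so that every node knows its output columns and types. The sketch below models that with plain dictionaries. It is a simplified assumption, not Merlin's `Schema` class: `toy_fit_schema`, `to_int_ids` and `to_float` are hypothetical names, and a schema here is just a column-name-to-dtype mapping.

```python
from typing import Callable, Dict, List, Tuple

Schema = Dict[str, str]  # hypothetical: column name -> dtype name
SchemaFn = Callable[[Schema], Schema]


def to_int_ids(schema: Schema) -> Schema:
    # Stand-in for a categorical encoder: every column becomes an integer id
    return {name: "int64" for name in schema}


def to_float(schema: Schema) -> Schema:
    # Stand-in for a normalizer: every column becomes a float
    return {name: "float32" for name in schema}


def toy_fit_schema(input_schema: Schema, branches: List[Tuple[List[str], List[SchemaFn]]]) -> Schema:
    """Propagate the input schema through each branch of nodes and merge the outputs."""
    output: Schema = {}
    for columns, ops in branches:
        # Select this branch's columns; a KeyError flags a column missing from the input
        schema = {name: input_schema[name] for name in columns}
        for op in ops:
            schema = op(schema)  # each node's output schema feeds the next node
        output.update(schema)
    return output


input_schema = {"city": "object", "price": "int32", "label": "int8"}
branches = [
    (["city"], [to_int_ids]),
    (["price"], [to_float]),
    (["label"], []),  # passthrough column
]
print(toy_fit_schema(input_schema, branches))
# {'city': 'int64', 'price': 'float32', 'label': 'int8'}
```

Computing schemas this way needs only column metadata, not data, which is why schema fitting can happen before any statistics are calculated.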
property input_dtypes

property input_schema

property output_schema

property output_dtypes

property output_node
remove_inputs(input_cols) → nvtabular.workflow.workflow.Workflow

Removes input columns from the workflow.

This is useful at inference time, where you might need to remove label columns from the processed set.

Parameters
- input_cols (list of str) – List of column names to remove.
Returns
- This workflow with the input columns removed from it.
Return type
- Workflow
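To make the inference use case above concrete, the sketch below models a workflow graph as a mapping from each output column to the input columns it is computed from, and drops any output that depends on a removed input. This is a simplified assumption for illustration only: `toy_remove_inputs` is a hypothetical helper, and NVTabular's exact pruning rules are not reproduced here.

```python
from typing import Dict, Iterable, List

# Hypothetical toy graph: output column -> input columns it is computed from
Graph = Dict[str, List[str]]


def toy_remove_inputs(graph: Graph, input_cols: Iterable[str]) -> Graph:
    removed = set(input_cols)
    # Keep only outputs whose inputs are all still available; the input graph is not mutated
    return {out: deps for out, deps in graph.items() if removed.isdisjoint(deps)}


graph = {
    "city_id": ["city"],
    "price_norm": ["price"],
    "city_x_price": ["city", "price"],
    "label": ["label"],
}
serving_graph = toy_remove_inputs(graph, ["label"])
print(sorted(serving_graph))  # ['city_id', 'city_x_price', 'price_norm']
```

At serving time the label column is usually absent from requests, so removing it up front keeps the graph from expecting an input that will never arrive.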
fit(dataset: merlin.io.dataset.Dataset) → nvtabular.workflow.workflow.Workflow

Calculates statistics for this workflow on the input dataset.

Parameters
- dataset (Dataset) – The input dataset to calculate statistics for. If there is a train/test split, this should be the training dataset only.
Returns
- This Workflow with statistics calculated on it.
Return type
- Workflow
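The advice above to fit on the training split only can be shown with a normalization statistic. In the hypothetical sketch below, `ToyNormalize` (not NVTabular's `Normalize` op) learns a mean and population standard deviation from the training values and reuses them on validation data, so no validation information leaks into the statistics.

```python
from statistics import mean, pstdev
from typing import List


class ToyNormalize:
    """Hypothetical stateful op: learns mean/std in fit(), reuses them in transform()."""

    def fit(self, values: List[float]) -> "ToyNormalize":
        self.mean = mean(values)
        self.std = pstdev(values) or 1.0  # avoid division by zero on a constant column
        return self

    def transform(self, values: List[float]) -> List[float]:
        return [(v - self.mean) / self.std for v in values]


train = [1.0, 2.0, 3.0, 4.0, 5.0]
valid = [3.0, 7.0]

op = ToyNormalize().fit(train)  # statistics come from the training split only
print(op.mean, round(op.std, 4))                   # 3.0 1.4142
print([round(x, 4) for x in op.transform(valid)])  # [0.0, 2.8284]
```

Fitting on the validation values instead would shift the mean to 5.0 and make the validation split look better normalized than unseen data would be.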
fit_transform(dataset: merlin.io.dataset.Dataset) → merlin.io.dataset.Dataset

Convenience method to both fit the workflow and transform the dataset in a single call. Equivalent to calling workflow.fit(dataset) followed by workflow.transform(dataset).

Parameters
- dataset (Dataset) – Input dataset to calculate statistics on and then transform.
Returns
- Transformed Dataset with the workflow graph applied to it.
Return type
- Dataset
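The equivalence stated above, fit followed by transform on the same data, can be written out directly. `ToyMinMax` below is a hypothetical op used only to mirror the three method names; it is not an NVTabular class.

```python
from typing import List


class ToyMinMax:
    """Hypothetical stateful op used to mirror fit(), transform() and fit_transform()."""

    def fit(self, values: List[float]) -> "ToyMinMax":
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values: List[float]) -> List[float]:
        span = (self.hi - self.lo) or 1.0  # avoid division by zero on a constant column
        return [(v - self.lo) / span for v in values]

    def fit_transform(self, values: List[float]) -> List[float]:
        # Same as calling fit(values) followed by transform(values)
        return self.fit(values).transform(values)


data = [2.0, 4.0, 6.0]
print(ToyMinMax().fit_transform(data))  # [0.0, 0.5, 1.0]
```

Since `fit_transform` fits on whatever it is given, it suits the training split; other splits should go through `transform` alone so they reuse the training statistics.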
save(path)

Save this workflow to disk.

Parameters
- path (str) – The path to save the workflow to.
classmethod load(path, client=None) → nvtabular.workflow.workflow.Workflow

Load a saved workflow object from disk.
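The point of `save` and `load` is that fitted statistics survive a process restart, for example when a workflow fit during training is reused at inference. The round trip below is a hypothetical sketch using Python's `pickle` module on a toy class; `ToyWorkflow` is not NVTabular's `Workflow`, and NVTabular's own on-disk format is not reproduced here.

```python
import os
import pickle
import tempfile
from typing import Dict, List


class ToyWorkflow:
    """Hypothetical fitted object whose learned statistics must survive a restart."""

    def __init__(self) -> None:
        self.stats: Dict[str, float] = {}

    def fit(self, values: List[float]) -> "ToyWorkflow":
        self.stats = {"mean": sum(values) / len(values)}
        return self

    def save(self, path: str) -> None:
        # Persist only the fitted state to a single file
        with open(path, "wb") as f:
            pickle.dump(self.stats, f)

    @classmethod
    def load(cls, path: str) -> "ToyWorkflow":
        # Rebuild an instance and restore its fitted state
        wf = cls()
        with open(path, "rb") as f:
            wf.stats = pickle.load(f)
        return wf


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_workflow.pkl")
    ToyWorkflow().fit([1.0, 2.0, 3.0]).save(path)
    restored = ToyWorkflow.load(path)
print(restored.stats)  # {'mean': 2.0}
```

Making `load` a classmethod, as in the signature above, lets callers restore a ready-to-use object without first constructing an empty one themselves.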