API Documentation
Workflow Constructors

The Workflow class applies a graph of operations to a dataset, letting you transform it for feature engineering and preprocessing.

WorkflowNode represents a node in an NVTabular workflow graph.
Categorical Operators

This operation transforms continuous features into categorical features with bins based on the provided bin boundaries. 
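
The binning idea can be sketched in plain Python. This is an illustration of the technique rather than NVTabular's implementation; the function name `bucketize` and the choice to place a value equal to a boundary in the upper bin are assumptions made for the example.

```python
from bisect import bisect_right

def bucketize(values, boundaries):
    """Map each continuous value to the index of the bin it falls into.

    `boundaries` must be sorted in ascending order. Bin 0 holds values below
    the first boundary; a value equal to a boundary goes to the upper bin
    (an assumption of this sketch).
    """
    return [bisect_right(boundaries, v) for v in values]

ages = [3, 17, 18, 42, 70]
bins = bucketize(ages, boundaries=[18, 65])
print(bins)  # [0, 0, 1, 1, 2]
```

With two boundaries there are three bins, so the output is a categorical feature with three possible values.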

Most datasets contain categorical features, and these variables are typically stored as text values.

DropLowCardinality drops low cardinality categorical columns. 

This op maps categorical columns to a contiguous integer range by first hashing the column, then reducing modulo the number of buckets. 
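
A minimal sketch of hash-then-modulo bucketing follows. It uses `zlib.crc32` from the standard library only because it is deterministic across runs; the hash function the library actually uses is not stated here, and `hash_bucket` is a hypothetical helper name.

```python
import zlib

def hash_bucket(values, num_buckets):
    """Hash each categorical value, then reduce modulo the number of buckets."""
    buckets = []
    for v in values:
        # Encode the value as bytes so it can be hashed deterministically.
        digest = zlib.crc32(str(v).encode("utf-8"))
        buckets.append(digest % num_buckets)
    return buckets

cities = ["paris", "tokyo", "paris", "lima"]
ids = hash_bucket(cities, num_buckets=10)
# Equal inputs always land in the same bucket; distinct inputs may collide.
```

Because no vocabulary is stored, unseen categories are handled automatically, at the cost of possible collisions between distinct values.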

This op creates hashed cross columns by first combining categorical features and hashing the combined feature, then reducing modulo the number of buckets.
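
The cross can be sketched as follows, assuming the features are combined by joining their string forms with a separator before hashing. The separator, the `zlib.crc32` hash, and the name `hashed_cross` are all assumptions of this example, not the library's implementation.

```python
import zlib

def hashed_cross(rows, num_buckets, sep="_"):
    """Combine the categorical values in each row, hash, and take the modulo."""
    out = []
    for row in rows:
        combined = sep.join(str(v) for v in row)  # e.g. "paris_mobile"
        out.append(zlib.crc32(combined.encode("utf-8")) % num_buckets)
    return out

rows = [("paris", "mobile"), ("paris", "desktop"), ("paris", "mobile")]
crossed = hashed_cross(rows, num_buckets=100)
# Rows with the same combination of values share a bucket.
```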

Target encoding is a common feature engineering technique for categorical columns in tabular datasets.
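
In its basic form, target encoding replaces each category with the mean of a target column over the rows in that category. The sketch below shows that basic form with optional smoothing toward the global mean; it deliberately omits the cross-validation folds that are normally used to limit target leakage, and the names `target_encode` and `smoothing` are chosen for this example only.

```python
from collections import defaultdict

def target_encode(categories, targets, smoothing=0.0):
    """Replace each category with the (smoothed) mean of its target values."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    # Smoothing pulls the means of rare categories toward the global mean.
    encoding = {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in sums
    }
    return [encoding[c] for c in categories]

encoded = target_encode(["a", "a", "b"], [1, 0, 1])
print(encoded)  # [0.5, 0.5, 1.0]
```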
Continuous Operators

This operation clips continuous values so that they are within a min/max bound. For instance, by setting the min value to 0, you can replace all negative values with 0. This is helpful in cases where you want to log-normalize values.
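
As an illustration of clipping before a log transform, here is a plain-Python sketch. The helper `clip` is hypothetical, and the use of log(1 + x) for the log step is an assumption chosen so that 0 maps to 0.

```python
import math

def clip(values, min_value=None, max_value=None):
    """Bound each value to [min_value, max_value]; None leaves a side open."""
    out = []
    for v in values:
        if min_value is not None and v < min_value:
            v = min_value
        if max_value is not None and v > max_value:
            v = max_value
        out.append(v)
    return out

raw = [-5.0, 0.0, 2.5, 100.0]
clipped = clip(raw, min_value=0)            # negative values become 0
logged = [math.log1p(v) for v in clipped]   # safe: no negative inputs remain
```

Without the clip, the log of the negative entry would raise a `ValueError`.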

This operator calculates the log of continuous columns. 

Standardizing the features around 0 with a standard deviation of 1 is a common technique to compare measurements that have different units. 
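
Standardization subtracts the column mean and divides by the column standard deviation. The sketch below uses the population standard deviation and returns zeros for a constant column; both choices, and the name `standardize`, are assumptions of this example.

```python
import statistics

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

heights_cm = [150.0, 160.0, 170.0]
z = standardize(heights_cm)
# z has mean 0 and standard deviation 1, regardless of the original unit.
```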

This operator rescales continuous features so that they lie between 0 and 1.
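
Min-max scaling can be sketched as (x - min) / (max - min). The helper name `min_max_scale` and the handling of a constant column are assumptions of this example.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no range; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([10, 20, 30])
print(scaled)  # [0.0, 0.5, 1.0]
```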
Missing Value Operators

This operation detects and filters out rows with missing values. 

This operation replaces missing values with a constant predefined value.

This operation replaces missing values with the median value for the column. 
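
The two fill strategies can be sketched in plain Python, with `None` standing in for a missing value. The function names and the default fill value of 0 are assumptions of this example.

```python
import statistics

def fill_missing(values, fill_value=0):
    """Replace missing entries with a constant."""
    return [fill_value if v is None else v for v in values]

def fill_median(values):
    """Replace missing entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

print(fill_missing([1, None, 3]))     # [1, 0, 3]
print(fill_median([1, None, 3, 10]))  # [1, 3, 3, 10]
```

The median is less sensitive to outliers than the mean, which is why the value 10 does not pull the filled entry upward.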
Row Manipulation Operators

Calculates the difference between two consecutive rows of the dataset. 
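
A minimal sketch of a lagged difference follows. The `shift` parameter name and the use of `None` where no earlier row exists are assumptions of this example, and any grouping of rows before differencing is left out.

```python
def difference_lag(values, shift=1):
    """Return values[i] - values[i - shift], or None where that row is absent."""
    out = []
    for i, v in enumerate(values):
        j = i - shift
        out.append(v - values[j] if 0 <= j < len(values) else None)
    return out

timestamps = [10, 13, 12, 20]
print(difference_lag(timestamps))  # [None, 3, -1, 8]
```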

Filters rows from the dataset. 

Groupby Transformation 

Join each dataset partition to an external table. 

One of the ways to create new features is to calculate the basic statistics of the data that is grouped by categorical features. 
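
As an illustration, the sketch below groups rows by a categorical key, computes the mean and count of a value column per group, and attaches those statistics back onto every row. The function name and the output column naming are assumptions of this example.

```python
from collections import defaultdict

def join_groupby_mean(rows, key, value):
    """Attach the per-group mean and count of `value` to each row."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[value]
        counts[row[key]] += 1
    return [
        {
            **row,
            f"{key}_{value}_mean": sums[row[key]] / counts[row[key]],
            f"{key}_count": counts[row[key]],
        }
        for row in rows
    ]

rows = [
    {"city": "a", "price": 10.0},
    {"city": "a", "price": 20.0},
    {"city": "b", "price": 5.0},
]
features = join_groupby_mean(rows, key="city", value="price")
# Each row now carries the mean price and row count of its own city.
```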
Schema Operators

This operator will add user-defined tags and properties to a Schema.

This operation renames columns by one of several methods.

ReduceDtypeSize changes the dtypes of numeric columns.
List Operators

Slices a list column.

The operator calculates the min and max lengths of multihot columns.
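
Both list operations can be sketched with nested Python lists standing in for a list column. The names `list_slice` and `value_count`, and the use of Python slice semantics for `start` and `end`, are assumptions of this example.

```python
def list_slice(column, start, end=None):
    """Slice the list stored in each row, using Python slice semantics."""
    return [row[start:end] for row in column]

def value_count(column):
    """Return the minimum and maximum list length found in the column."""
    lengths = [len(row) for row in column]
    return {"min": min(lengths), "max": max(lengths)}

history = [[1, 2, 3], [4], [5, 6]]
print(list_slice(history, 0, 2))  # [[1, 2], [4], [5, 6]]
print(value_count(history))       # {'min': 1, 'max': 3}
```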
Vector Operators

Calculates the similarity between two columns using TF-IDF, cosine, or inner product as the distance metric.
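
Two of the metrics named above can be sketched for dense vectors as follows. This is an illustration of the formulas only; the function names and the zero-vector convention are assumptions of this example, and TF-IDF weighting is left out.

```python
import math

def inner_product(a, b):
    """Sum of elementwise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Inner product divided by the product of the vector norms."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    # By convention in this sketch, similarity with a zero vector is 0.
    return inner_product(a, b) / norm if norm else 0.0

print(inner_product([1, 2], [2, 4]))      # 10
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```

Cosine similarity ignores vector length, so `[1, 2]` and `[2, 4]` score approximately 1 even though their inner product is 10.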
User-Defined Function Operators

LambdaOp allows you to apply row-level functions to an NVTabular workflow.