merlin.models.tf.HashedCrossAll

merlin.models.tf.HashedCrossAll(schema: merlin.schema.schema.Schema, num_bins: Optional[int] = None, infer_num_bins: bool = False, max_num_bins: int = 100000, max_level: int = 2, sparse: bool = False, output_mode: str = 'one_hot', ignore_combinations: Sequence[Sequence[str]] = []) → merlin.models.tf.core.combinators.ParallelBlock

A ParallelBlock consisting of HashedCross blocks for every combination of features in the schema, at every level from 2 through max_level.

schema: Schema

Schema of the input data.

max_level: int

Maximum number of features in a single cross. The function creates a hashed cross for every combination whose number of features lies in the range [2, max_level]. Defaults to 2, in which case it returns hashed cross blocks for all level 2 (pairwise) combinations of features within the schema.

For example, if schemas contains 3 features (feature_1, feature_2 and feature_3) and we call

level_3_cross = HashedCrossAll(schema = schemas, max_level = 3)

then level_3_cross is a ParallelBlock that contains 4 hashed crosses of

feature_1 and feature_2
feature_1 and feature_3
feature_2 and feature_3
feature_1, feature_2 and feature_3
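
The enumeration above can be reproduced with plain Python. This is a minimal sketch of the combinatorics only, not the library's implementation; the helper name enumerate_crosses is hypothetical.

```python
from itertools import combinations


def enumerate_crosses(feature_names, max_level=2):
    """Return every feature combination of size 2..max_level (hypothetical helper)."""
    crosses = []
    for level in range(2, max_level + 1):
        # combinations() yields each unordered group of `level` features once
        crosses.extend(combinations(feature_names, level))
    return crosses


crosses = enumerate_crosses(["feature_1", "feature_2", "feature_3"], max_level=3)
for cross in crosses:
    print(cross)
```

With 3 features and max_level = 3 this yields the three pairs followed by the single triple, matching the 4 crosses listed above.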

num_bins: int

Number of hash bins. Note that the same num_bins applies to every hashed cross transformation block, regardless of its level. To set a different num_bins for each hashed cross, use HashedCross to define each one separately.

output_mode: string

Specification for the output of the layer. Defaults to “one_hot”. Values can be “int” or “one_hot”, configuring the layer as follows:

- “int”: Return the integer bin indices directly.
- “one_hot”: Encodes each individual element in the input into an array the same size as num_bins, containing a 1 at the input’s bin index.
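
The difference between the two output modes can be illustrated without TensorFlow. The sketch below is an assumption-laden illustration: it uses an MD5 digest as a stand-in hash, which is not the hash function the actual layer uses, and the helper names hashed_cross_bin and one_hot are hypothetical.

```python
import hashlib


def hashed_cross_bin(values, num_bins):
    """Map a tuple of feature values to a bin index in [0, num_bins).

    MD5 is a stand-in hash for illustration only.
    """
    key = "_X_".join(str(v) for v in values).encode("utf-8")
    return int(hashlib.md5(key).hexdigest(), 16) % num_bins


def one_hot(index, num_bins):
    """Return a list of length num_bins with a 1 at position index."""
    encoded = [0] * num_bins
    encoded[index] = 1
    return encoded


num_bins = 8
bin_index = hashed_cross_bin(("user_42", "item_7"), num_bins)  # what "int" mode returns
encoded = one_hot(bin_index, num_bins)                          # what "one_hot" mode returns
print(bin_index, encoded)
```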

sparse: bool

Boolean. Only applicable to “one_hot” mode. If True, returns a SparseTensor instead of a dense Tensor. Defaults to False.

infer_num_bins: bool

If True, num_bins for each hashed cross is set to the product of the cardinalities of the features being crossed. If that product is larger than max_num_bins, it is clipped to max_num_bins.
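
As a worked example of this rule, the sketch below multiplies the feature cardinalities and clips the result. It assumes the inferred value is exactly the product of the cardinalities; the helper name infer_num_bins_for_cross is hypothetical and not part of the library.

```python
from math import prod


def infer_num_bins_for_cross(cardinalities, max_num_bins=100000):
    """Product of the feature cardinalities, clipped to max_num_bins (hypothetical helper)."""
    return min(prod(cardinalities), max_num_bins)


print(infer_num_bins_for_cross([50, 20]))     # 1000
print(infer_num_bins_for_cross([5000, 300]))  # 100000 (1,500,000 clipped)
```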

max_num_bins: int

Upper bound of num_bins for all hashed cross transformation blocks, by default 100000.

ignore_combinations: Sequence[Sequence[str]]

If provided, the feature combinations in this list are skipped. Useful to avoid crossing features whose combined value is always the same. For example, crossing the following features is not useful, because in each group one of the features depends on another: [[“item_id”, “item_category”], [“user_id”, “user_birth_city”, “user_age”]]
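
The effect of ignore_combinations can be sketched in plain Python. This illustration assumes a combination is skipped when its set of feature names equals an ignored entry, regardless of order; the library's exact matching rule may differ, and the helper name filter_crosses is hypothetical.

```python
from itertools import combinations


def filter_crosses(feature_names, max_level, ignore_combinations=()):
    """Enumerate crosses of size 2..max_level, dropping ignored ones (hypothetical helper)."""
    # Assumption: matching is order-insensitive, so compare as frozensets
    ignored = {frozenset(combo) for combo in ignore_combinations}
    kept = []
    for level in range(2, max_level + 1):
        for combo in combinations(feature_names, level):
            if frozenset(combo) not in ignored:
                kept.append(combo)
    return kept


features = ["item_id", "item_category", "user_id"]
kept = filter_crosses(features, 2, [["item_id", "item_category"]])
print(kept)
```

Here the pair of item_id and item_category is dropped, leaving only the two crosses that involve user_id.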

Example usage:

level_3_cross = HashedCrossAll(schema = schemas, max_level = 3, infer_num_bins = True)