merlin.models.tf.HashedCrossAll
- merlin.models.tf.HashedCrossAll(schema: merlin.schema.schema.Schema, num_bins: Optional[int] = None, infer_num_bins: bool = False, max_num_bins: int = 100000, max_level: int = 2, sparse: bool = False, output_mode: str = 'one_hot', ignore_combinations: Sequence[Sequence[str]] = []) → merlin.models.tf.core.combinators.ParallelBlock
- Returns a ParallelBlock of HashedCross blocks, one for each combination of the features in schema, at every level from 2 through max_level.
- schema: Schema
Schema of the input data.
- max_level: int
Maximum number of features in a cross. The function hash-crosses every combination of features whose size lies in the range [2, max_level]. Defaults to 2, which returns hashed cross blocks for all level-2 (pairwise) combinations of the features in schema.
- For example, if schema contains three features, feature_1, feature_2 and feature_3, calling
level_3_cross = HashedCrossAll(schema=schema, max_level=3)
- returns level_3_cross, a ParallelBlock containing four hashed crosses:
feature_1 and feature_2
feature_1 and feature_3
feature_2 and feature_3
feature_1, feature_2 and feature_3
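The set of combinations that gets crossed can be reproduced with the standard library. The following is a minimal pure-Python sketch, not the library's implementation: the helper name `feature_combinations` is hypothetical, and it only lists the feature-name tuples for levels 2 through max_level without building any blocks.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def feature_combinations(
    feature_names: Sequence[str], max_level: int = 2
) -> List[Tuple[str, ...]]:
    """Hypothetical helper: list every feature combination of size 2..max_level."""
    if max_level < 2:
        raise ValueError("max_level must be at least 2, since a cross needs two features")
    result: List[Tuple[str, ...]] = []
    for level in range(2, max_level + 1):
        # itertools.combinations keeps the input order and never repeats a feature
        result.extend(combinations(feature_names, level))
    return result


combos = feature_combinations(["feature_1", "feature_2", "feature_3"], max_level=3)
for combo in combos:
    print(combo)
# ('feature_1', 'feature_2')
# ('feature_1', 'feature_3')
# ('feature_2', 'feature_3')
# ('feature_1', 'feature_2', 'feature_3')
```

For n features the number of crosses is the sum of C(n, k) for k = 2..max_level; with three features and max_level=3 that is 3 + 1 = 4, the same four crosses listed in the example.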
- num_bins: int
Number of hash bins. num_bins applies to every hashed cross transformation block, whatever its level; to use a different num_bins for each cross, define each cross separately with HashedCross.
- output_mode: string
Specification for the output of the layer. Defaults to “one_hot”. Values can be “int” or “one_hot”, configuring the layer as follows:
  - “int”: Return the integer bin indices directly.
  - “one_hot”: Encode each individual element in the input into an array the same size as num_bins, containing a 1 at the input’s bin index.
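To make the two output modes concrete, here is a minimal sketch of hashing one crossed tuple of values into num_bins and then one-hot encoding the result. The hash is a stand-in chosen for illustration (CRC32 of the values joined with a separator); the library uses its own hashing, so real bin indices will differ. The helper names `cross_bin_index` and `one_hot` are hypothetical.

```python
import zlib
from typing import List, Sequence


def cross_bin_index(values: Sequence[object], num_bins: int) -> int:
    """Hash one crossed tuple of feature values into [0, num_bins).

    Stand-in hash: CRC32 over the joined values. This is NOT the hash
    used by Merlin Models; it only illustrates the idea.
    """
    key = "_X_".join(str(v) for v in values).encode("utf-8")
    return zlib.crc32(key) % num_bins


def one_hot(index: int, num_bins: int) -> List[int]:
    """Dense one-hot vector of length num_bins with a 1 at `index`."""
    vector = [0] * num_bins
    vector[index] = 1
    return vector


num_bins = 10
idx = cross_bin_index(["user_42", "item_7"], num_bins)  # what output_mode="int" returns
vec = one_hot(idx, num_bins)                            # what output_mode="one_hot" returns
print(idx, vec)
```

Because the hash is deterministic, the same combination of values always lands in the same bin; distinct combinations can collide when num_bins is small.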
- sparse: bool
Only applicable to “one_hot” mode. If True, returns a SparseTensor instead of a dense Tensor. Defaults to False.
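The difference between the dense and sparse one-hot outputs can be sketched in plain Python. The sparse form below is a coordinate (COO) list of (indices, values, dense_shape), the same three fields a tf.SparseTensor holds; the bin indices are hypothetical inputs and the helper names are illustrative only.

```python
from typing import List, Tuple


def dense_one_hot(bin_indices: List[int], num_bins: int) -> List[List[int]]:
    """One row per example, each row a length-num_bins one-hot vector."""
    rows = []
    for b in bin_indices:
        row = [0] * num_bins
        row[b] = 1
        rows.append(row)
    return rows


def sparse_one_hot(
    bin_indices: List[int], num_bins: int
) -> Tuple[List[Tuple[int, int]], List[int], Tuple[int, int]]:
    """COO form: store only the non-zero entries.

    Returns (indices, values, dense_shape), mirroring tf.SparseTensor's fields.
    """
    indices = [(row, b) for row, b in enumerate(bin_indices)]
    values = [1] * len(bin_indices)
    dense_shape = (len(bin_indices), num_bins)
    return indices, values, dense_shape


batch = [3, 0, 7]  # hypothetical bin indices for a batch of three examples
dense = dense_one_hot(batch, num_bins=8)
indices, values, shape = sparse_one_hot(batch, num_bins=8)
# dense stores 3 * 8 = 24 numbers; the sparse form stores only the 3 non-zeros
```

With large num_bins (for example, when bins are inferred from cardinalities), the sparse form avoids materializing mostly-zero rows.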
- infer_num_bins: bool
If True, each cross's num_bins is set to the product of the cardinalities of the features it combines; if that product exceeds max_num_bins, it is clipped to max_num_bins. Defaults to False.
- max_num_bins: int
Upper bound of num_bins for all hashed cross transformation blocks, by default 100000.
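The inferred bin count can be sketched as a product of cardinalities clipped at max_num_bins. This is an illustration of the rule described for infer_num_bins, not the library's code: how Merlin derives each feature's cardinality from the schema is not covered here, so the `cardinalities` mapping and the helper `inferred_num_bins` are hypothetical stand-ins.

```python
from math import prod
from typing import Mapping, Sequence


def inferred_num_bins(
    combination: Sequence[str],
    cardinalities: Mapping[str, int],
    max_num_bins: int = 100_000,
) -> int:
    """Product of the crossed features' cardinalities, clipped to max_num_bins."""
    product = prod(cardinalities[name] for name in combination)
    return min(product, max_num_bins)


# Hypothetical cardinalities; in practice these would come from the schema.
cardinalities = {"user_id": 1_000, "item_id": 500, "item_category": 20}
print(inferred_num_bins(("item_id", "item_category"), cardinalities))  # 10000
print(inferred_num_bins(("user_id", "item_id"), cardinalities))        # 100000 (500000 clipped)
```

The product is the number of distinct value combinations the cross can take, so it is the bin count that would avoid collisions; clipping keeps high-cardinality crosses from producing very large outputs.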
- ignore_combinations: Sequence[Sequence[str]]
If provided, the feature combinations in this list are not crossed. This is useful when one feature depends on another, so that their combined value adds nothing beyond one of them alone. For example, there is little value in crossing the features within each of these groups: [[“item_id”, “item_category”], [“user_id”, “user_birth_city”, “user_age”]]. Defaults to [].
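Filtering out ignored combinations can be sketched with sets. This sketch assumes a combination is skipped when it contains exactly the listed features, in any order; whether the library matches this way is an assumption, and the helper `filtered_combinations` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def filtered_combinations(
    feature_names: Sequence[str],
    max_level: int,
    ignore_combinations: Sequence[Sequence[str]] = (),
) -> List[Tuple[str, ...]]:
    """Combinations of size 2..max_level, minus those listed in ignore_combinations.

    Matching is order-insensitive (an assumption of this sketch):
    ["item_category", "item_id"] ignores the same cross as ["item_id", "item_category"].
    """
    ignored = {frozenset(c) for c in ignore_combinations}
    result: List[Tuple[str, ...]] = []
    for level in range(2, max_level + 1):
        for combo in combinations(feature_names, level):
            if frozenset(combo) not in ignored:
                result.append(combo)
    return result


features = ["user_id", "item_id", "item_category"]
kept = filtered_combinations(
    features, max_level=2, ignore_combinations=[["item_id", "item_category"]]
)
print(kept)
# [('user_id', 'item_id'), ('user_id', 'item_category')]
```

Under this sketch's exact-set matching, listing a three-feature group such as [“user_id”, “user_birth_city”, “user_age”] removes only that level-3 cross; the pairs within the group would still be crossed unless they are listed too.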
Example usage:
from merlin.models.tf import HashedCrossAll
level_3_cross = HashedCrossAll(schema=schema, max_level=3, infer_num_bins=True)