merlin_standard_lib.proto package
Submodules
merlin_standard_lib.proto.schema_bp module
-
class
merlin_standard_lib.proto.schema_bp.
LifecycleStage
(value)[source] Bases:
betterproto.Enum
LifecycleStage. Only UNKNOWN_STAGE, BETA, and PRODUCTION features are actually validated. PLANNED, ALPHA, DISABLED, and DEBUG_ONLY are treated as DEPRECATED.
-
UNKNOWN_STAGE
= 0
-
PLANNED
= 1
-
ALPHA
= 2
-
BETA
= 3
-
PRODUCTION
= 4
-
DEPRECATED
= 5
-
DEBUG_ONLY
= 6
-
DISABLED
= 7
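The validation rule stated in the docstring can be sketched with a plain `enum.IntEnum` stand-in that mirrors the values listed above. This is not the betterproto class itself, and `VALIDATED_STAGES` and `is_validated` are hypothetical names introduced only for illustration.

```python
from enum import IntEnum


class LifecycleStage(IntEnum):
    """Stand-in mirroring the documented values (not the betterproto class)."""

    UNKNOWN_STAGE = 0
    PLANNED = 1
    ALPHA = 2
    BETA = 3
    PRODUCTION = 4
    DEPRECATED = 5
    DEBUG_ONLY = 6
    DISABLED = 7


# Stages that, per the docstring, are actually validated.
VALIDATED_STAGES = frozenset(
    {
        LifecycleStage.UNKNOWN_STAGE,
        LifecycleStage.BETA,
        LifecycleStage.PRODUCTION,
    }
)


def is_validated(stage: LifecycleStage) -> bool:
    """Hypothetical helper: True if features at this stage are validated."""
    return stage in VALIDATED_STAGES
```

Under this reading, every stage outside `VALIDATED_STAGES` is handled the same way as DEPRECATED.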
-
-
class
merlin_standard_lib.proto.schema_bp.
FeatureType
(value)[source] Bases:
betterproto.Enum
Describes the physical representation of a feature. It may be different than the logical representation, which is represented as a Domain.
-
TYPE_UNKNOWN
= 0
-
BYTES
= 1
-
INT
= 2
-
FLOAT
= 3
-
STRUCT
= 4
-
-
class
merlin_standard_lib.proto.schema_bp.
TimeDomainIntegerTimeFormat
(value)[source] Bases:
betterproto.Enum
An enumeration.
-
FORMAT_UNKNOWN
= 0
-
UNIX_DAYS
= 5
-
UNIX_SECONDS
= 1
-
UNIX_MILLISECONDS
= 2
-
UNIX_MICROSECONDS
= 3
-
UNIX_NANOSECONDS
= 4
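As a rough illustration of what these formats imply, the sketch below converts an integer timestamp to a UTC `datetime`. It assumes each format counts the named unit since the Unix epoch, which the member names suggest but the page does not state. `IntegerTimeFormat` is a stdlib stand-in for the enum above, and `to_datetime` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from enum import IntEnum


class IntegerTimeFormat(IntEnum):
    """Stand-in mirroring the documented values (not the betterproto class)."""

    FORMAT_UNKNOWN = 0
    UNIX_SECONDS = 1
    UNIX_MILLISECONDS = 2
    UNIX_MICROSECONDS = 3
    UNIX_NANOSECONDS = 4
    UNIX_DAYS = 5


EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Number of units per second for the sub-day formats.
_UNITS_PER_SECOND = {
    IntegerTimeFormat.UNIX_SECONDS: 1,
    IntegerTimeFormat.UNIX_MILLISECONDS: 10**3,
    IntegerTimeFormat.UNIX_MICROSECONDS: 10**6,
    IntegerTimeFormat.UNIX_NANOSECONDS: 10**9,
}


def to_datetime(value: int, fmt: IntegerTimeFormat) -> datetime:
    """Hypothetical helper: interpret `value` as time since the Unix epoch."""
    if fmt == IntegerTimeFormat.UNIX_DAYS:
        return EPOCH + timedelta(days=value)
    if fmt not in _UNITS_PER_SECOND:
        raise ValueError(f"cannot convert values with format {fmt!r}")
    # Integer arithmetic avoids float rounding; precision finer than one
    # microsecond is dropped because datetime cannot represent it.
    micros = value * 10**6 // _UNITS_PER_SECOND[fmt]
    return EPOCH + timedelta(microseconds=micros)
```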
-
-
class
merlin_standard_lib.proto.schema_bp.
TimeOfDayDomainIntegerTimeOfDayFormat
(value)[source] Bases:
betterproto.Enum
An enumeration.
-
FORMAT_UNKNOWN
= 0
-
PACKED_64_NANOS
= 1
-
-
class
merlin_standard_lib.proto.schema_bp.
TensorRepresentationRowPartitionDType
(value)[source] Bases:
betterproto.Enum
An enumeration.
-
UNSPECIFIED
= 0
-
INT64
= 1
-
INT32
= 2
-
-
class
merlin_standard_lib.proto.schema_bp.
Path
(step: List[str] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A path is a more general substitute for the name of a field or feature that can be used for flat examples as well as structured data. For example, if we had data in a protocol buffer:
message Person {
  int age = 1;
  optional string gender = 2;
  repeated Person parent = 3;
}
Thus, here the path {step:["parent", "age"]} in statistics would refer to the age of a parent, and {step:["parent", "parent", "age"]} would refer to the age of a grandparent. This allows us to distinguish between the statistics of parents’ ages and grandparents’ ages. In general, repeated messages are to be preferred to linked lists of arbitrary length. For SequenceExample, if we have a feature list "foo", this is represented by {step:["##SEQUENCE##", "foo"]}.
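To make the parent/grandparent example concrete, here is a small sketch that follows a list of steps through nested records. It uses plain dicts and lists rather than protocol buffers or the `Path` class; `collect` and the sample `person` record are hypothetical.

```python
from typing import Any, Dict, List


def collect(record: Dict[str, Any], steps: List[str]) -> List[Any]:
    """Return every value reached by following `steps` from `record`.

    Repeated fields are modelled as lists, so one path can reach many values.
    """
    current: List[Any] = [record]
    for step in steps:
        reached: List[Any] = []
        for node in current:
            value = node.get(step)
            if value is None:
                continue  # field not set on this node
            if isinstance(value, list):
                reached.extend(value)
            else:
                reached.append(value)
        current = reached
    return current


# Hypothetical record shaped like the Person message in the docstring.
person = {
    "age": 10,
    "parent": [
        {"age": 40, "parent": [{"age": 70}, {"age": 68}]},
        {"age": 42, "parent": [{"age": 75}]},
    ],
}

parent_ages = collect(person, ["parent", "age"])
grandparent_ages = collect(person, ["parent", "parent", "age"])
```

The two step lists reach disjoint sets of values, which is why statistics keyed by path can keep parents' ages and grandparents' ages apart.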
-
class
merlin_standard_lib.proto.schema_bp.
ValueCountList
(value_count: List[merlin_standard_lib.proto.schema_bp.ValueCount] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
-
value_count
: List[merlin_standard_lib.proto.schema_bp.ValueCount] = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.
Feature
(name: str = <betterproto._PLACEHOLDER object>, deprecated: bool = <betterproto._PLACEHOLDER object>, presence: merlin_standard_lib.proto.schema_bp.FeaturePresence = <betterproto._PLACEHOLDER object>, group_presence: merlin_standard_lib.proto.schema_bp.FeaturePresenceWithinGroup = <betterproto._PLACEHOLDER object>, shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>, value_count: merlin_standard_lib.proto.schema_bp.ValueCount = <betterproto._PLACEHOLDER object>, value_counts: merlin_standard_lib.proto.schema_bp.ValueCountList = <betterproto._PLACEHOLDER object>, type: merlin_standard_lib.proto.schema_bp.FeatureType = <betterproto._PLACEHOLDER object>, domain: str = <betterproto._PLACEHOLDER object>, int_domain: merlin_standard_lib.proto.schema_bp.IntDomain = <betterproto._PLACEHOLDER object>, float_domain: merlin_standard_lib.proto.schema_bp.FloatDomain = <betterproto._PLACEHOLDER object>, string_domain: merlin_standard_lib.proto.schema_bp.StringDomain = <betterproto._PLACEHOLDER object>, bool_domain: merlin_standard_lib.proto.schema_bp.BoolDomain = <betterproto._PLACEHOLDER object>, struct_domain: merlin_standard_lib.proto.schema_bp.StructDomain = <betterproto._PLACEHOLDER object>, natural_language_domain: merlin_standard_lib.proto.schema_bp.NaturalLanguageDomain = <betterproto._PLACEHOLDER object>, image_domain: merlin_standard_lib.proto.schema_bp.ImageDomain = <betterproto._PLACEHOLDER object>, mid_domain: merlin_standard_lib.proto.schema_bp.MIDDomain = <betterproto._PLACEHOLDER object>, url_domain: merlin_standard_lib.proto.schema_bp.URLDomain = <betterproto._PLACEHOLDER object>, time_domain: merlin_standard_lib.proto.schema_bp.TimeDomain = <betterproto._PLACEHOLDER object>, time_of_day_domain: merlin_standard_lib.proto.schema_bp.TimeOfDayDomain = <betterproto._PLACEHOLDER object>, distribution_constraints: merlin_standard_lib.proto.schema_bp.DistributionConstraints = <betterproto._PLACEHOLDER object>, annotation: merlin_standard_lib.proto.schema_bp.Annotation = <betterproto._PLACEHOLDER object>, skew_comparator: merlin_standard_lib.proto.schema_bp.FeatureComparator = <betterproto._PLACEHOLDER object>, drift_comparator: merlin_standard_lib.proto.schema_bp.FeatureComparator = <betterproto._PLACEHOLDER object>, in_environment: List[str] = <betterproto._PLACEHOLDER object>, not_in_environment: List[str] = <betterproto._PLACEHOLDER object>, lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>, unique_constraints: merlin_standard_lib.proto.schema_bp.UniqueConstraints = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Describes schema-level information about a specific feature. NextID: 33
-
presence
: merlin_standard_lib.proto.schema_bp.FeaturePresence = <betterproto._PLACEHOLDER object>
-
group_presence
: merlin_standard_lib.proto.schema_bp.FeaturePresenceWithinGroup = <betterproto._PLACEHOLDER object>
-
shape
: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>
-
value_count
: merlin_standard_lib.proto.schema_bp.ValueCount = <betterproto._PLACEHOLDER object>
-
value_counts
: merlin_standard_lib.proto.schema_bp.ValueCountList = <betterproto._PLACEHOLDER object>
-
type
: merlin_standard_lib.proto.schema_bp.FeatureType = <betterproto._PLACEHOLDER object>
-
int_domain
: merlin_standard_lib.proto.schema_bp.IntDomain = <betterproto._PLACEHOLDER object>
-
float_domain
: merlin_standard_lib.proto.schema_bp.FloatDomain = <betterproto._PLACEHOLDER object>
-
string_domain
: merlin_standard_lib.proto.schema_bp.StringDomain = <betterproto._PLACEHOLDER object>
-
bool_domain
: merlin_standard_lib.proto.schema_bp.BoolDomain = <betterproto._PLACEHOLDER object>
-
struct_domain
: merlin_standard_lib.proto.schema_bp.StructDomain = <betterproto._PLACEHOLDER object>
-
natural_language_domain
: merlin_standard_lib.proto.schema_bp.NaturalLanguageDomain = <betterproto._PLACEHOLDER object>
-
image_domain
: merlin_standard_lib.proto.schema_bp.ImageDomain = <betterproto._PLACEHOLDER object>
-
mid_domain
: merlin_standard_lib.proto.schema_bp.MIDDomain = <betterproto._PLACEHOLDER object>
-
url_domain
: merlin_standard_lib.proto.schema_bp.URLDomain = <betterproto._PLACEHOLDER object>
-
time_domain
: merlin_standard_lib.proto.schema_bp.TimeDomain = <betterproto._PLACEHOLDER object>
-
time_of_day_domain
: merlin_standard_lib.proto.schema_bp.TimeOfDayDomain = <betterproto._PLACEHOLDER object>
-
distribution_constraints
: merlin_standard_lib.proto.schema_bp.DistributionConstraints = <betterproto._PLACEHOLDER object>
-
annotation
: merlin_standard_lib.proto.schema_bp.Annotation = <betterproto._PLACEHOLDER object>
-
skew_comparator
: merlin_standard_lib.proto.schema_bp.FeatureComparator = <betterproto._PLACEHOLDER object>
-
drift_comparator
: merlin_standard_lib.proto.schema_bp.FeatureComparator = <betterproto._PLACEHOLDER object>
-
lifecycle_stage
: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>
-
unique_constraints
: merlin_standard_lib.proto.schema_bp.UniqueConstraints = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.
Annotation
(tag: List[str] = <betterproto._PLACEHOLDER object>, comment: List[str] = <betterproto._PLACEHOLDER object>, extra_metadata: List[Any] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Additional information about the schema or about a feature.
-
extra_metadata
: List[Any] = <betterproto._PLACEHOLDER object>
-
property
metadata
-
-
class
merlin_standard_lib.proto.schema_bp.
NumericValueComparator
(min_fraction_threshold: float = <betterproto._PLACEHOLDER object>, max_fraction_threshold: float = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Checks that the ratio of the current value to the previous value is not below the min_fraction_threshold or above the max_fraction_threshold. That is, previous value * min_fraction_threshold <= current value <= previous value * max_fraction_threshold. To specify that the value cannot change, set both min_fraction_threshold and max_fraction_threshold to 1.0.
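The inequality in the docstring translates directly into a one-line check. The function below is an illustrative sketch: its name is hypothetical, and only the two threshold parameters come from the class signature.

```python
def within_thresholds(
    previous: float,
    current: float,
    min_fraction_threshold: float,
    max_fraction_threshold: float,
) -> bool:
    """True if previous * min <= current <= previous * max."""
    lower = previous * min_fraction_threshold
    upper = previous * max_fraction_threshold
    return lower <= current <= upper
```

For example, with thresholds of 0.8 and 1.2 a count may shrink or grow by at most 20% relative to the previous value; with both thresholds at 1.0 any change is rejected.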
-
class
merlin_standard_lib.proto.schema_bp.
DatasetConstraints
(num_examples_drift_comparator: merlin_standard_lib.proto.schema_bp.NumericValueComparator = <betterproto._PLACEHOLDER object>, num_examples_version_comparator: merlin_standard_lib.proto.schema_bp.NumericValueComparator = <betterproto._PLACEHOLDER object>, min_examples_count: int = <betterproto._PLACEHOLDER object>, max_examples_count: int = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Constraints on the entire dataset.
-
num_examples_drift_comparator
: merlin_standard_lib.proto.schema_bp.NumericValueComparator = <betterproto._PLACEHOLDER object>
-
num_examples_version_comparator
: merlin_standard_lib.proto.schema_bp.NumericValueComparator = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.
FixedShape
(dim: List[merlin_standard_lib.proto.schema_bp.FixedShapeDim] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Specifies a fixed shape for the feature’s values. The immediate implication is that each feature has a fixed number of values. Moreover, these values can be parsed in a multi-dimensional tensor using the specified axis sizes. The FixedShape defines a lexicographical ordering of the data. For instance, if there is a FixedShape { dim {size:3} dim {size:2} } then:
tensor[0][0]=field[0]
tensor[0][1]=field[1]
tensor[1][0]=field[2]
tensor[1][1]=field[3]
tensor[2][0]=field[4]
tensor[2][1]=field[5]
The FixedShape message is identical to the TensorFlow TensorShape proto message.
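The lexicographical (row-major) ordering described here can be checked with a short sketch that maps a tensor index to its position in the flat value list. `flat_index` is a hypothetical helper and takes the axis sizes as a plain list rather than a `FixedShape` message.

```python
from typing import Sequence


def flat_index(indices: Sequence[int], dims: Sequence[int]) -> int:
    """Map a multi-dimensional index to its row-major position."""
    if len(indices) != len(dims):
        raise ValueError("one index is required per dimension")
    offset = 0
    for index, size in zip(indices, dims):
        if not 0 <= index < size:
            raise IndexError(f"index {index} out of range for size {size}")
        # Each step scales the running offset by the current axis size.
        offset = offset * size + index
    return offset


# Axis sizes from the docstring example: dim {size:3} dim {size:2}.
dims = [3, 2]
mapping = {(i, j): flat_index((i, j), dims) for i in range(3) for j in range(2)}
```

Here `mapping[(1, 0)]` is 2 and `mapping[(2, 1)]` is 5, matching tensor[1][0]=field[2] and tensor[2][1]=field[5].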
-
dim
: List[merlin_standard_lib.proto.schema_bp.FixedShapeDim] = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.
FixedShapeDim
(size: int = <betterproto._PLACEHOLDER object>, name: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
An axis in a multi-dimensional feature representation.
-
class
merlin_standard_lib.proto.schema_bp.
ValueCount
(min: int = <betterproto._PLACEHOLDER object>, max: int = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Limits on maximum and minimum number of values in a single example (when the feature is present). Use this when the minimum value count can be different than the maximum value count. Otherwise prefer FixedShape.
-
class
merlin_standard_lib.proto.schema_bp.
WeightedFeature
(name: str = <betterproto._PLACEHOLDER object>, feature: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>, weight_feature: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>, lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Represents a weighted feature that is encoded as a combination of raw base features. The weight_feature should be a float feature with identical shape as the feature. This is useful for representing weights associated with categorical tokens (e.g. a TFIDF weight associated with each token). TODO(b/142122960): Handle WeightedCategorical end to end in TFX (validation, TFX Unit Testing, etc)
-
feature
: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>
-
weight_feature
: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>
-
lifecycle_stage
: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.
SparseFeature
(name: str = <betterproto._PLACEHOLDER object>, deprecated: bool = <betterproto._PLACEHOLDER object>, lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>, presence: merlin_standard_lib.proto.schema_bp.FeaturePresence = <betterproto._PLACEHOLDER object>, dense_shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>, index_feature: List[merlin_standard_lib.proto.schema_bp.SparseFeatureIndexFeature] = <betterproto._PLACEHOLDER object>, is_sorted: bool = <betterproto._PLACEHOLDER object>, value_feature: merlin_standard_lib.proto.schema_bp.SparseFeatureValueFeature = <betterproto._PLACEHOLDER object>, type: merlin_standard_lib.proto.schema_bp.FeatureType = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A sparse feature represents a sparse tensor that is encoded with a combination of raw features, namely index features and a value feature. Each index feature defines a list of indices in a different dimension.
-
lifecycle_stage
: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>
-
presence
: merlin_standard_lib.proto.schema_bp.FeaturePresence = <betterproto._PLACEHOLDER object>
-
dense_shape
: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>
-
index_feature
: List[merlin_standard_lib.proto.schema_bp.SparseFeatureIndexFeature] = <betterproto._PLACEHOLDER object>
-
value_feature
: merlin_standard_lib.proto.schema_bp.SparseFeatureValueFeature = <betterproto._PLACEHOLDER object>
-
type
: merlin_standard_lib.proto.schema_bp.FeatureType = <betterproto._PLACEHOLDER object>
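As a minimal sketch of how index features and a value feature combine, the function below rebuilds a dense 2-D grid from one index list per dimension plus a value list. It is restricted to two dimensions for brevity, works on plain Python lists rather than `SparseFeature` messages, and `to_dense` and its `default` fill value are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def to_dense(
    dense_shape: Tuple[int, int],
    index_features: Sequence[Sequence[int]],
    values: Sequence[float],
    default: float = 0,
) -> List[List[float]]:
    """Scatter `values` into a rows x cols grid using per-dimension indices."""
    rows, cols = dense_shape
    row_indices, col_indices = index_features
    if not len(row_indices) == len(col_indices) == len(values):
        raise ValueError("index features and values must have equal length")
    dense = [[default] * cols for _ in range(rows)]
    for r, c, v in zip(row_indices, col_indices, values):
        dense[r][c] = v
    return dense
```

Each index feature supplies the coordinates along one dimension, so the k-th value lands at the position given by the k-th entry of every index feature.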
-
-
class
merlin_standard_lib.proto.schema_bp.
SparseFeatureIndexFeature
(name: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
-
class
merlin_standard_lib.proto.schema_bp.
SparseFeatureValueFeature
(name: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
-
class
merlin_standard_lib.proto.schema_bp.
DistributionConstraints
(min_domain_mass: float = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Models constraints on the distribution of a feature’s values. TODO(martinz): replace min_domain_mass with max_off_domain (but slowly).
-
class
merlin_standard_lib.proto.schema_bp.
FeatureCoverageConstraints
(min_coverage: float = <betterproto._PLACEHOLDER object>, min_avg_token_length: float = <betterproto._PLACEHOLDER object>, excluded_string_tokens: List[str] = <betterproto._PLACEHOLDER object>, excluded_int_tokens: List[int] = <betterproto._PLACEHOLDER object>, oov_string_tokens: List[str] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Encodes vocabulary coverage constraints.
-
class
merlin_standard_lib.proto.schema_bp.
SequenceValueConstraints
(int_value: int = <betterproto._PLACEHOLDER object>, string_value: str = <betterproto._PLACEHOLDER object>, min_per_sequence: int = <betterproto._PLACEHOLDER object>, max_per_sequence: int = <betterproto._PLACEHOLDER object>, min_fraction_of_sequences: float = <betterproto._PLACEHOLDER object>, max_fraction_of_sequences: float = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Encodes constraints on specific values in sequences.
-
class
merlin_standard_lib.proto.schema_bp.
SequenceLengthConstraints
(excluded_int_value: List[int] = <betterproto._PLACEHOLDER object>, excluded_string_value: List[str] = <betterproto._PLACEHOLDER object>, min_sequence_length: int = <betterproto._PLACEHOLDER object>, max_sequence_length: int = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Encodes constraints on sequence lengths.
-
class
merlin_standard_lib.proto.schema_bp.
IntDomain
(name: str = <betterproto._PLACEHOLDER object>, min: int = <betterproto._PLACEHOLDER object>, max: int = <betterproto._PLACEHOLDER object>, is_categorical: bool = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Encodes information for domains of integer values. Note that FeatureType could be either INT or BYTES.
-
class
merlin_standard_lib.proto.schema_bp.
FloatDomain
(name: str = <betterproto._PLACEHOLDER object>, min: float = <betterproto._PLACEHOLDER object>, max: float = <betterproto._PLACEHOLDER object>, disallow_nan: bool = <betterproto._PLACEHOLDER object>, disallow_inf: bool = <betterproto._PLACEHOLDER object>, is_embedding: bool = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Encodes information for domains of float values. Note that FeatureType could be either FLOAT or BYTES.
-
class
merlin_standard_lib.proto.schema_bp.
StructDomain
(feature: List[merlin_standard_lib.proto.schema_bp.Feature] = <betterproto._PLACEHOLDER object>, sparse_feature: List[merlin_standard_lib.proto.schema_bp.SparseFeature] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Domain for a recursive struct. NOTE: If a feature with a StructDomain is deprecated, then all the child features (features and sparse_features of the StructDomain) are also considered to be deprecated. Similarly child features can only be in environments of the parent feature.
-
feature
: List[merlin_standard_lib.proto.schema_bp.Feature] = <betterproto._PLACEHOLDER object>
-
sparse_feature
: List[merlin_standard_lib.proto.schema_bp.SparseFeature] = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.
StringDomain
(name: str = <betterproto._PLACEHOLDER object>, value: List[str] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Encodes information for domains of string values.
-
class
merlin_standard_lib.proto.schema_bp.
BoolDomain
(name: str = <betterproto._PLACEHOLDER object>, true_value: str = <betterproto._PLACEHOLDER object>, false_value: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Encodes information about the domain of a boolean attribute that encodes its TRUE/FALSE values as strings, or 0=false, 1=true. Note that FeatureType could be either INT or BYTES.
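The two encodings mentioned here (custom strings, or 0/1 integers) can be sketched as a small decoder. `decode_bool` is a hypothetical helper; only the `true_value` and `false_value` parameter names come from the class signature.

```python
from typing import Union


def decode_bool(raw: Union[int, str], true_value: str, false_value: str) -> bool:
    """Decode a raw feature value as a boolean.

    Strings are compared against the configured true/false spellings;
    integers follow the 0=false, 1=true convention.
    """
    if isinstance(raw, str):
        if raw == true_value:
            return True
        if raw == false_value:
            return False
        raise ValueError(f"{raw!r} is neither {true_value!r} nor {false_value!r}")
    if raw in (0, 1):
        return bool(raw)
    raise ValueError(f"integer booleans must be 0 or 1, got {raw!r}")
```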
-
class
merlin_standard_lib.proto.schema_bp.
NaturalLanguageDomain
(vocabulary: str = <betterproto._PLACEHOLDER object>, coverage: merlin_standard_lib.proto.schema_bp.FeatureCoverageConstraints = <betterproto._PLACEHOLDER object>, token_constraints: List[merlin_standard_lib.proto.schema_bp.SequenceValueConstraints] = <betterproto._PLACEHOLDER object>, sequence_length_constraints: merlin_standard_lib.proto.schema_bp.SequenceLengthConstraints = <betterproto._PLACEHOLDER object>, location_constraint_regex: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Natural language text.
-
coverage
: merlin_standard_lib.proto.schema_bp.FeatureCoverageConstraints = <betterproto._PLACEHOLDER object>
-
token_constraints
: List[merlin_standard_lib.proto.schema_bp.SequenceValueConstraints] = <betterproto._PLACEHOLDER object>
-
sequence_length_constraints
: merlin_standard_lib.proto.schema_bp.SequenceLengthConstraints = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.
ImageDomain
(minimum_supported_image_fraction: float = <betterproto._PLACEHOLDER object>, max_image_byte_size: int = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Image data.
-
class
merlin_standard_lib.proto.schema_bp.
MIDDomain
[source] Bases:
betterproto.Message
Knowledge graph ID, see: https://www.wikidata.org/wiki/Property:P646
-
class
merlin_standard_lib.proto.schema_bp.
URLDomain
[source] Bases:
betterproto.Message
A URL, see: https://en.wikipedia.org/wiki/URL
-
class
merlin_standard_lib.proto.schema_bp.
TimeDomain
(string_format: str = <betterproto._PLACEHOLDER object>, integer_format: merlin_standard_lib.proto.schema_bp.TimeDomainIntegerTimeFormat = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Time or date representation.
-
integer_format
: merlin_standard_lib.proto.schema_bp.TimeDomainIntegerTimeFormat = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.
TimeOfDayDomain
(string_format: str = <betterproto._PLACEHOLDER object>, integer_format: merlin_standard_lib.proto.schema_bp.TimeOfDayDomainIntegerTimeOfDayFormat = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Time of day, without a particular date.
-
integer_format
: merlin_standard_lib.proto.schema_bp.TimeOfDayDomainIntegerTimeOfDayFormat = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.
FeaturePresence
(min_fraction: float = <betterproto._PLACEHOLDER object>, min_count: int = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Describes constraints on the presence of the feature in the data.
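A presence check might look like the sketch below. It assumes, based only on the field names, that `min_fraction` is the minimum fraction of examples containing the feature and `min_count` the minimum number of such examples; `presence_ok` itself is hypothetical.

```python
def presence_ok(
    num_present: int,
    num_examples: int,
    min_fraction: float = 0.0,
    min_count: int = 0,
) -> bool:
    """True if the feature appears often enough to satisfy both limits."""
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    fraction = num_present / num_examples
    return fraction >= min_fraction and num_present >= min_count
```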
-
class
merlin_standard_lib.proto.schema_bp.
FeaturePresenceWithinGroup
(required: bool = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Records constraints on the presence of a feature inside a “group” context (e.g., .presence inside a group of features that define a sequence).
-
class
merlin_standard_lib.proto.schema_bp.
InfinityNorm
(threshold: float = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Checks that the L-infinity norm is below a certain threshold between the two discrete distributions. Since this is applied to a FeatureNameStatistics, it only considers the top K.
\[l_{\infty}(p,q) = \max_{i} | p_{i} - q_{i} |\]
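The formula can be evaluated directly for two discrete distributions given as value-to-probability mappings. This is an illustrative sketch: `l_infinity` is a hypothetical helper, and treating a value missing from one distribution as probability 0 is an assumption.

```python
from typing import Dict, Hashable


def l_infinity(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Largest absolute difference in probability over all values."""
    keys = set(p) | set(q)
    # A value absent from one distribution is treated as probability 0.
    return max((abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys), default=0.0)


p = {"a": 0.5, "b": 0.5}
q = {"a": 0.75, "b": 0.25}
distance = l_infinity(p, q)  # 0.25
```

A comparator with `threshold` set to 0.2 would flag this pair, since 0.25 exceeds the threshold.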
-
class
merlin_standard_lib.proto.schema_bp.
JensenShannonDivergence
(threshold: float = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Checks that the approximate Jensen-Shannon Divergence is below a certain threshold between the two distributions.
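For reference, the exact Jensen-Shannon divergence between two discrete distributions can be computed as below. The docstring speaks of an approximate divergence, so this sketch is not the computation the comparator uses; the base-2 logarithm (which bounds the result by 1) and the helper names are assumptions.

```python
import math
from typing import Sequence


def _kl(p: Sequence[float], m: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || m) in bits; 0 * log(0) counts as 0."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
```

Identical distributions give 0, and distributions with disjoint support give 1 under the base-2 convention.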
-
class
merlin_standard_lib.proto.schema_bp.
FeatureComparator
(infinity_norm: merlin_standard_lib.proto.schema_bp.InfinityNorm = <betterproto._PLACEHOLDER object>, jensen_shannon_divergence: merlin_standard_lib.proto.schema_bp.JensenShannonDivergence = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
-
infinity_norm
: merlin_standard_lib.proto.schema_bp.InfinityNorm = <betterproto._PLACEHOLDER object>
-
jensen_shannon_divergence
: merlin_standard_lib.proto.schema_bp.JensenShannonDivergence = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.
UniqueConstraints
(min: int = <betterproto._PLACEHOLDER object>, max: int = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Checks that the number of unique values is greater than or equal to the min, and less than or equal to the max.
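In plain Python the check amounts to counting distinct values and comparing against both bounds. `unique_count_ok` is a hypothetical helper, and its parameters are named `min_unique` and `max_unique` only to avoid shadowing the built-in `min` and `max`.

```python
from typing import Hashable, Iterable


def unique_count_ok(
    values: Iterable[Hashable], min_unique: int, max_unique: int
) -> bool:
    """True if min_unique <= number of distinct values <= max_unique."""
    return min_unique <= len(set(values)) <= max_unique
```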
-
class
merlin_standard_lib.proto.schema_bp.
TensorRepresentation
(dense_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationDenseTensor = <betterproto._PLACEHOLDER object>, varlen_sparse_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationVarLenSparseTensor = <betterproto._PLACEHOLDER object>, sparse_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationSparseTensor = <betterproto._PLACEHOLDER object>, ragged_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensor = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A TensorRepresentation captures the intent for converting columns in a dataset to TensorFlow Tensors (or more generally, tf.CompositeTensors). Note that one tf.CompositeTensor may consist of data from multiple columns, for example, an N-dimensional tf.SparseTensor may need N + 1 columns to provide the sparse indices and values. Note that the “column name” that a TensorRepresentation needs is a string, not a Path – it means that the column name identifies a top-level Feature in the schema (i.e. you cannot specify a Feature nested in a STRUCT Feature).
-
dense_tensor
: merlin_standard_lib.proto.schema_bp.TensorRepresentationDenseTensor = <betterproto._PLACEHOLDER object>
-
varlen_sparse_tensor
: merlin_standard_lib.proto.schema_bp.TensorRepresentationVarLenSparseTensor = <betterproto._PLACEHOLDER object>
-
sparse_tensor
: merlin_standard_lib.proto.schema_bp.TensorRepresentationSparseTensor = <betterproto._PLACEHOLDER object>
-
ragged_tensor
: merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensor = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.
TensorRepresentationDefaultValue
(float_value: float = <betterproto._PLACEHOLDER object>, int_value: int = <betterproto._PLACEHOLDER object>, bytes_value: bytes = <betterproto._PLACEHOLDER object>, uint_value: int = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
-
class
merlin_standard_lib.proto.schema_bp.
TensorRepresentationDenseTensor
(column_name: str = <betterproto._PLACEHOLDER object>, shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>, default_value: merlin_standard_lib.proto.schema_bp.TensorRepresentationDefaultValue = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A tf.Tensor
-
shape
: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>
-
default_value
: merlin_standard_lib.proto.schema_bp.TensorRepresentationDefaultValue = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.
TensorRepresentationVarLenSparseTensor
(column_name: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A ragged tf.SparseTensor that models nested lists.
-
class
merlin_standard_lib.proto.schema_bp.
TensorRepresentationSparseTensor
(dense_shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>, index_column_names: List[str] = <betterproto._PLACEHOLDER object>, value_column_name: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A tf.SparseTensor whose indices and values come from separate data columns. This will replace Schema.sparse_feature eventually. The index columns must be of INT type, and all the columns must co-occur and have the same valency at the same row.
-
dense_shape
: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.
TensorRepresentationRaggedTensor
(feature_path: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>, partition: List[merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensorPartition] = <betterproto._PLACEHOLDER object>, row_partition_dtype: merlin_standard_lib.proto.schema_bp.TensorRepresentationRowPartitionDType = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A tf.RaggedTensor that models nested lists. Currently there is no way for the user to specify the shape of the leaf value (the innermost value tensor of the RaggedTensor). The leaf value will always be a 1-D tensor.
-
feature_path
: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>
-
partition
: List[merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensorPartition] = <betterproto._PLACEHOLDER object>
-
row_partition_dtype
: merlin_standard_lib.proto.schema_bp.TensorRepresentationRowPartitionDType = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.
TensorRepresentationRaggedTensorPartition
(uniform_row_length: int = <betterproto._PLACEHOLDER object>, row_length: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Further partition of the feature values at the leaf level.
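The two partition kinds in the signature can be illustrated on a flat list of leaf values. In the message, `row_length` is a string, presumably naming a column that holds the row lengths; the sketch below takes the lengths directly as a list instead. Both helper names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_uniform(values: Sequence[T], uniform_row_length: int) -> List[List[T]]:
    """Split `values` into consecutive rows of equal length."""
    if uniform_row_length <= 0 or len(values) % uniform_row_length:
        raise ValueError("values cannot be split into rows of that length")
    return [
        list(values[i : i + uniform_row_length])
        for i in range(0, len(values), uniform_row_length)
    ]


def partition_by_row_lengths(
    values: Sequence[T], row_lengths: Sequence[int]
) -> List[List[T]]:
    """Split `values` into consecutive rows with the given lengths."""
    if sum(row_lengths) != len(values):
        raise ValueError("row lengths must add up to the number of values")
    rows, start = [], 0
    for length in row_lengths:
        rows.append(list(values[start : start + length]))
        start += length
    return rows
```

Applying several partitions in sequence, from the innermost outward, yields the nested-list structure that a ragged tensor models.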
-
class
merlin_standard_lib.proto.schema_bp.
TensorRepresentationGroup
(tensor_representation: Dict[str, merlin_standard_lib.proto.schema_bp.TensorRepresentation] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A TensorRepresentationGroup is a collection of TensorRepresentations with names. These names may serve as identifiers when converting the dataset to a collection of Tensors or tf.CompositeTensors. For example, given the following group:
{
  key: "dense_tensor"
  tensor_representation {
    dense_tensor {
      column_name: "univalent_feature"
      shape { dim { size: 1 } }
      default_value { float_value: 0 }
    }
  }
}
{
  key: "varlen_sparse_tensor"
  tensor_representation {
    varlen_sparse_tensor { column_name: "multivalent_feature" }
  }
}
Then the schema is expected to have the features "univalent_feature" and "multivalent_feature", and when a batch of data is converted to Tensors using this TensorRepresentationGroup, the result may be the following dict:
{
  "dense_tensor": tf.Tensor(…),
  "varlen_sparse_tensor": tf.SparseTensor(…),
}
-
tensor_representation
: Dict[str, merlin_standard_lib.proto.schema_bp.TensorRepresentation] = <betterproto._PLACEHOLDER object>
-