merlin_standard_lib.proto package
Submodules
merlin_standard_lib.proto.schema_bp module
-
class
merlin_standard_lib.proto.schema_bp.LifecycleStage(value)[source] Bases:
betterproto.Enum
The lifecycle stage of a feature. Only UNKNOWN_STAGE, BETA, and PRODUCTION features are actually validated. PLANNED, ALPHA, DISABLED, and DEBUG_ONLY are treated as DEPRECATED.
-
UNKNOWN_STAGE= 0
-
PLANNED= 1
-
ALPHA= 2
-
BETA= 3
-
PRODUCTION= 4
-
DEPRECATED= 5
-
DEBUG_ONLY= 6
-
DISABLED= 7
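The validation rule described for LifecycleStage can be sketched in plain Python. This is a standalone illustration: it mirrors the member values listed above with the standard-library enum module instead of importing merlin_standard_lib, and the names VALIDATED_STAGES and is_validated are hypothetical helpers, not part of the library.

```python
from enum import IntEnum


class LifecycleStage(IntEnum):
    """Standalone mirror of the documented enum values (not the library class)."""

    UNKNOWN_STAGE = 0
    PLANNED = 1
    ALPHA = 2
    BETA = 3
    PRODUCTION = 4
    DEPRECATED = 5
    DEBUG_ONLY = 6
    DISABLED = 7


# Per the description above, only these stages are actually validated.
VALIDATED_STAGES = frozenset(
    {LifecycleStage.UNKNOWN_STAGE, LifecycleStage.BETA, LifecycleStage.PRODUCTION}
)


def is_validated(stage: int) -> bool:
    """Return True if a feature in this stage would be validated."""
    return LifecycleStage(stage) in VALIDATED_STAGES
```

Every stage outside VALIDATED_STAGES is handled like DEPRECATED, so is_validated returns False for it.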
-
-
class
merlin_standard_lib.proto.schema_bp.FeatureType(value)[source] Bases:
betterproto.Enum
Describes the physical representation of a feature. It may be different than the logical representation, which is represented as a Domain.
-
TYPE_UNKNOWN= 0
-
BYTES= 1
-
INT= 2
-
FLOAT= 3
-
STRUCT= 4
-
-
class
merlin_standard_lib.proto.schema_bp.TimeDomainIntegerTimeFormat(value)[source] Bases:
betterproto.Enum
An enumeration.
-
FORMAT_UNKNOWN= 0
-
UNIX_DAYS= 5
-
UNIX_SECONDS= 1
-
UNIX_MILLISECONDS= 2
-
UNIX_MICROSECONDS= 3
-
UNIX_NANOSECONDS= 4
-
-
class
merlin_standard_lib.proto.schema_bp.TimeOfDayDomainIntegerTimeOfDayFormat(value)[source] Bases:
betterproto.Enum
An enumeration.
-
FORMAT_UNKNOWN= 0
-
PACKED_64_NANOS= 1
-
-
class
merlin_standard_lib.proto.schema_bp.TensorRepresentationRowPartitionDType(value)[source] Bases:
betterproto.Enum
An enumeration.
-
UNSPECIFIED= 0
-
INT64= 1
-
INT32= 2
-
-
class
merlin_standard_lib.proto.schema_bp.Path(step: List[str] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A path is a more general substitute for the name of a field or feature that can be used for flat examples as well as structured data. For example, suppose we had data in a protocol buffer:
    message Person {
      int age = 1;
      optional string gender = 2;
      repeated Person parent = 3;
    }
Here the path {step:["parent", "age"]} in statistics would refer to the age of a parent, and {step:["parent", "parent", "age"]} would refer to the age of a grandparent. This allows us to distinguish between the statistics of parents’ ages and grandparents’ ages. In general, repeated messages are to be preferred to linked lists of arbitrary length.
For SequenceExample, if we have a feature list "foo", this is represented by {step:["##SEQUENCE##", "foo"]}.
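To make the parent and grandparent example concrete, here is a minimal sketch that follows a list of steps through nested Python dicts, fanning out over lists the way a repeated field would. The function iter_path_values and the dict layout are hypothetical and only stand in for real structured data; this is not how the library resolves a Path.

```python
from typing import Any, Iterator, List


def iter_path_values(record: Any, steps: List[str]) -> Iterator[Any]:
    """Yield every value reached by following `steps` through nested dicts.

    Lists are treated like repeated fields: each element is visited.
    """
    if not steps:
        yield record
        return
    child = record.get(steps[0])
    if child is None:
        return  # This branch of the data does not contain the step.
    children = child if isinstance(child, list) else [child]
    for item in children:
        yield from iter_path_values(item, steps[1:])


person = {
    "age": 30,
    "parent": [
        {"age": 58, "parent": [{"age": 81}, {"age": 84}]},
        {"age": 61},
    ],
}

parent_ages = list(iter_path_values(person, ["parent", "age"]))
grandparent_ages = list(iter_path_values(person, ["parent", "parent", "age"]))
```

The two paths collect different populations, which is why statistics keyed by path can keep parents' ages and grandparents' ages apart.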
-
class
merlin_standard_lib.proto.schema_bp.ValueCountList(value_count: List[merlin_standard_lib.proto.schema_bp.ValueCount] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
-
value_count: List[merlin_standard_lib.proto.schema_bp.ValueCount] = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.Feature(name: str = <betterproto._PLACEHOLDER object>, deprecated: bool = <betterproto._PLACEHOLDER object>, presence: merlin_standard_lib.proto.schema_bp.FeaturePresence = <betterproto._PLACEHOLDER object>, group_presence: merlin_standard_lib.proto.schema_bp.FeaturePresenceWithinGroup = <betterproto._PLACEHOLDER object>, shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>, value_count: merlin_standard_lib.proto.schema_bp.ValueCount = <betterproto._PLACEHOLDER object>, value_counts: merlin_standard_lib.proto.schema_bp.ValueCountList = <betterproto._PLACEHOLDER object>, type: merlin_standard_lib.proto.schema_bp.FeatureType = <betterproto._PLACEHOLDER object>, domain: str = <betterproto._PLACEHOLDER object>, int_domain: merlin_standard_lib.proto.schema_bp.IntDomain = <betterproto._PLACEHOLDER object>, float_domain: merlin_standard_lib.proto.schema_bp.FloatDomain = <betterproto._PLACEHOLDER object>, string_domain: merlin_standard_lib.proto.schema_bp.StringDomain = <betterproto._PLACEHOLDER object>, bool_domain: merlin_standard_lib.proto.schema_bp.BoolDomain = <betterproto._PLACEHOLDER object>, struct_domain: merlin_standard_lib.proto.schema_bp.StructDomain = <betterproto._PLACEHOLDER object>, natural_language_domain: merlin_standard_lib.proto.schema_bp.NaturalLanguageDomain = <betterproto._PLACEHOLDER object>, image_domain: merlin_standard_lib.proto.schema_bp.ImageDomain = <betterproto._PLACEHOLDER object>, mid_domain: merlin_standard_lib.proto.schema_bp.MIDDomain = <betterproto._PLACEHOLDER object>, url_domain: merlin_standard_lib.proto.schema_bp.URLDomain = <betterproto._PLACEHOLDER object>, time_domain: merlin_standard_lib.proto.schema_bp.TimeDomain = <betterproto._PLACEHOLDER object>, time_of_day_domain: merlin_standard_lib.proto.schema_bp.TimeOfDayDomain = <betterproto._PLACEHOLDER object>, distribution_constraints: merlin_standard_lib.proto.schema_bp.DistributionConstraints = 
<betterproto._PLACEHOLDER object>, annotation: merlin_standard_lib.proto.schema_bp.Annotation = <betterproto._PLACEHOLDER object>, skew_comparator: merlin_standard_lib.proto.schema_bp.FeatureComparator = <betterproto._PLACEHOLDER object>, drift_comparator: merlin_standard_lib.proto.schema_bp.FeatureComparator = <betterproto._PLACEHOLDER object>, in_environment: List[str] = <betterproto._PLACEHOLDER object>, not_in_environment: List[str] = <betterproto._PLACEHOLDER object>, lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>, unique_constraints: merlin_standard_lib.proto.schema_bp.UniqueConstraints = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Describes schema-level information about a specific feature. NextID: 33
-
presence: merlin_standard_lib.proto.schema_bp.FeaturePresence = <betterproto._PLACEHOLDER object>
-
group_presence: merlin_standard_lib.proto.schema_bp.FeaturePresenceWithinGroup = <betterproto._PLACEHOLDER object>
-
shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>
-
value_count: merlin_standard_lib.proto.schema_bp.ValueCount = <betterproto._PLACEHOLDER object>
-
value_counts: merlin_standard_lib.proto.schema_bp.ValueCountList = <betterproto._PLACEHOLDER object>
-
type: merlin_standard_lib.proto.schema_bp.FeatureType = <betterproto._PLACEHOLDER object>
-
int_domain: merlin_standard_lib.proto.schema_bp.IntDomain = <betterproto._PLACEHOLDER object>
-
float_domain: merlin_standard_lib.proto.schema_bp.FloatDomain = <betterproto._PLACEHOLDER object>
-
string_domain: merlin_standard_lib.proto.schema_bp.StringDomain = <betterproto._PLACEHOLDER object>
-
bool_domain: merlin_standard_lib.proto.schema_bp.BoolDomain = <betterproto._PLACEHOLDER object>
-
struct_domain: merlin_standard_lib.proto.schema_bp.StructDomain = <betterproto._PLACEHOLDER object>
-
natural_language_domain: merlin_standard_lib.proto.schema_bp.NaturalLanguageDomain = <betterproto._PLACEHOLDER object>
-
image_domain: merlin_standard_lib.proto.schema_bp.ImageDomain = <betterproto._PLACEHOLDER object>
-
mid_domain: merlin_standard_lib.proto.schema_bp.MIDDomain = <betterproto._PLACEHOLDER object>
-
url_domain: merlin_standard_lib.proto.schema_bp.URLDomain = <betterproto._PLACEHOLDER object>
-
time_domain: merlin_standard_lib.proto.schema_bp.TimeDomain = <betterproto._PLACEHOLDER object>
-
time_of_day_domain: merlin_standard_lib.proto.schema_bp.TimeOfDayDomain = <betterproto._PLACEHOLDER object>
-
distribution_constraints: merlin_standard_lib.proto.schema_bp.DistributionConstraints = <betterproto._PLACEHOLDER object>
-
annotation: merlin_standard_lib.proto.schema_bp.Annotation = <betterproto._PLACEHOLDER object>
-
skew_comparator: merlin_standard_lib.proto.schema_bp.FeatureComparator = <betterproto._PLACEHOLDER object>
-
drift_comparator: merlin_standard_lib.proto.schema_bp.FeatureComparator = <betterproto._PLACEHOLDER object>
-
lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>
-
unique_constraints: merlin_standard_lib.proto.schema_bp.UniqueConstraints = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.Annotation(tag: List[str] = <betterproto._PLACEHOLDER object>, comment: List[str] = <betterproto._PLACEHOLDER object>, extra_metadata: List[Any] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Additional information about the schema or about a feature.
-
extra_metadata: List[Any] = <betterproto._PLACEHOLDER object>
-
property
metadata
-
-
class
merlin_standard_lib.proto.schema_bp.NumericValueComparator(min_fraction_threshold: float = <betterproto._PLACEHOLDER object>, max_fraction_threshold: float = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Checks that the ratio of the current value to the previous value is not below the min_fraction_threshold or above the max_fraction_threshold. That is, previous value * min_fraction_threshold <= current value <= previous value * max_fraction_threshold. To specify that the value cannot change, set both min_fraction_threshold and max_fraction_threshold to 1.0.
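The inequality above translates directly into code. The following is a minimal sketch using plain floats; the function name within_fraction_thresholds is hypothetical and is not a method of NumericValueComparator.

```python
def within_fraction_thresholds(
    previous: float,
    current: float,
    min_fraction_threshold: float,
    max_fraction_threshold: float,
) -> bool:
    """Check previous * min <= current <= previous * max."""
    lower = previous * min_fraction_threshold
    upper = previous * max_fraction_threshold
    return lower <= current <= upper
```

For instance, with thresholds 0.9 and 1.1 a dataset may shrink or grow by up to ten percent between runs, while thresholds of 1.0 and 1.0 accept only an unchanged value.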
-
class
merlin_standard_lib.proto.schema_bp.DatasetConstraints(num_examples_drift_comparator: merlin_standard_lib.proto.schema_bp.NumericValueComparator = <betterproto._PLACEHOLDER object>, num_examples_version_comparator: merlin_standard_lib.proto.schema_bp.NumericValueComparator = <betterproto._PLACEHOLDER object>, min_examples_count: int = <betterproto._PLACEHOLDER object>, max_examples_count: int = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Constraints on the entire dataset.
-
num_examples_drift_comparator: merlin_standard_lib.proto.schema_bp.NumericValueComparator = <betterproto._PLACEHOLDER object>
-
num_examples_version_comparator: merlin_standard_lib.proto.schema_bp.NumericValueComparator = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.FixedShape(dim: List[merlin_standard_lib.proto.schema_bp.FixedShapeDim] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Specifies a fixed shape for the feature’s values. The immediate implication is that each feature has a fixed number of values. Moreover, these values can be parsed into a multi-dimensional tensor using the specified axis sizes. The FixedShape defines a lexicographical ordering of the data. For instance, given FixedShape { dim {size:3} dim {size:2} }:
    tensor[0][0] = field[0]
    tensor[0][1] = field[1]
    tensor[1][0] = field[2]
    tensor[1][1] = field[3]
    tensor[2][0] = field[4]
    tensor[2][1] = field[5]
The FixedShape message is identical to the TensorFlow TensorShape proto message.
-
dim: List[merlin_standard_lib.proto.schema_bp.FixedShapeDim] = <betterproto._PLACEHOLDER object>
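The lexicographical ordering described for FixedShape is ordinary row-major indexing. As a sketch, the hypothetical helper flat_index below maps a tuple of tensor indices to the position in the flat list of field values, given the dim sizes.

```python
from typing import Sequence


def flat_index(indices: Sequence[int], sizes: Sequence[int]) -> int:
    """Map multi-dimensional indices to a flat, row-major offset."""
    if len(indices) != len(sizes):
        raise ValueError("indices and sizes must have the same length")
    offset = 0
    for index, size in zip(indices, sizes):
        if not 0 <= index < size:
            raise IndexError(f"index {index} out of range for size {size}")
        # Each new axis scales what came before by its own size.
        offset = offset * size + index
    return offset


# Reproduce the 3 x 2 example: tensor[i][j] = field[flat_index((i, j), (3, 2))].
mapping = {(i, j): flat_index((i, j), (3, 2)) for i in range(3) for j in range(2)}
```

The last axis varies fastest, so consecutive field values fill one row before moving to the next.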
-
-
class
merlin_standard_lib.proto.schema_bp.FixedShapeDim(size: int = <betterproto._PLACEHOLDER object>, name: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
An axis in a multi-dimensional feature representation.
-
class
merlin_standard_lib.proto.schema_bp.ValueCount(min: int = <betterproto._PLACEHOLDER object>, max: int = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Limits on maximum and minimum number of values in a single example (when the feature is present). Use this when the minimum value count can be different than the maximum value count. Otherwise prefer FixedShape.
-
class
merlin_standard_lib.proto.schema_bp.WeightedFeature(name: str = <betterproto._PLACEHOLDER object>, feature: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>, weight_feature: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>, lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Represents a weighted feature that is encoded as a combination of raw base features. The weight_feature should be a float feature with identical shape as the feature. This is useful for representing weights associated with categorical tokens (e.g. a TFIDF weight associated with each token).
TODO(b/142122960): Handle WeightedCategorical end to end in TFX (validation, TFX Unit Testing, etc)
-
feature: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>
-
weight_feature: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>
-
lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>
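The requirement that weight_feature has the same shape as feature can be illustrated for a single example. This is a minimal sketch on plain Python lists; pair_tokens_with_weights is a hypothetical helper, not a library function.

```python
from typing import List, Sequence, Tuple


def pair_tokens_with_weights(
    tokens: Sequence[str], weights: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each categorical token with its weight, enforcing equal length."""
    if len(tokens) != len(weights):
        raise ValueError("feature and weight_feature must have identical shape")
    return list(zip(tokens, weights))


weighted = pair_tokens_with_weights(["cat", "dog"], [0.4, 1.2])
```

Each token ends up with exactly one weight, for example a TFIDF score.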
-
-
class
merlin_standard_lib.proto.schema_bp.SparseFeature(name: str = <betterproto._PLACEHOLDER object>, deprecated: bool = <betterproto._PLACEHOLDER object>, lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>, presence: merlin_standard_lib.proto.schema_bp.FeaturePresence = <betterproto._PLACEHOLDER object>, dense_shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>, index_feature: List[merlin_standard_lib.proto.schema_bp.SparseFeatureIndexFeature] = <betterproto._PLACEHOLDER object>, is_sorted: bool = <betterproto._PLACEHOLDER object>, value_feature: merlin_standard_lib.proto.schema_bp.SparseFeatureValueFeature = <betterproto._PLACEHOLDER object>, type: merlin_standard_lib.proto.schema_bp.FeatureType = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A sparse feature represents a sparse tensor that is encoded with a combination of raw features, namely index features and a value feature. Each index feature defines a list of indices in a different dimension.
-
lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>
-
presence: merlin_standard_lib.proto.schema_bp.FeaturePresence = <betterproto._PLACEHOLDER object>
-
dense_shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>
-
index_feature: List[merlin_standard_lib.proto.schema_bp.SparseFeatureIndexFeature] = <betterproto._PLACEHOLDER object>
-
value_feature: merlin_standard_lib.proto.schema_bp.SparseFeatureValueFeature = <betterproto._PLACEHOLDER object>
-
type: merlin_standard_lib.proto.schema_bp.FeatureType = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.SparseFeatureIndexFeature(name: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
-
class
merlin_standard_lib.proto.schema_bp.SparseFeatureValueFeature(name: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
-
class
merlin_standard_lib.proto.schema_bp.DistributionConstraints(min_domain_mass: float = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Models constraints on the distribution of a feature’s values.
TODO(martinz): replace min_domain_mass with max_off_domain (but slowly).
-
class
merlin_standard_lib.proto.schema_bp.FeatureCoverageConstraints(min_coverage: float = <betterproto._PLACEHOLDER object>, min_avg_token_length: float = <betterproto._PLACEHOLDER object>, excluded_string_tokens: List[str] = <betterproto._PLACEHOLDER object>, excluded_int_tokens: List[int] = <betterproto._PLACEHOLDER object>, oov_string_tokens: List[str] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Encodes vocabulary coverage constraints.
-
class
merlin_standard_lib.proto.schema_bp.SequenceValueConstraints(int_value: int = <betterproto._PLACEHOLDER object>, string_value: str = <betterproto._PLACEHOLDER object>, min_per_sequence: int = <betterproto._PLACEHOLDER object>, max_per_sequence: int = <betterproto._PLACEHOLDER object>, min_fraction_of_sequences: float = <betterproto._PLACEHOLDER object>, max_fraction_of_sequences: float = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Encodes constraints on specific values in sequences.
-
class
merlin_standard_lib.proto.schema_bp.SequenceLengthConstraints(excluded_int_value: List[int] = <betterproto._PLACEHOLDER object>, excluded_string_value: List[str] = <betterproto._PLACEHOLDER object>, min_sequence_length: int = <betterproto._PLACEHOLDER object>, max_sequence_length: int = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Encodes constraints on sequence lengths.
-
class
merlin_standard_lib.proto.schema_bp.IntDomain(name: str = <betterproto._PLACEHOLDER object>, min: int = <betterproto._PLACEHOLDER object>, max: int = <betterproto._PLACEHOLDER object>, is_categorical: bool = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Encodes information for domains of integer values. Note that FeatureType could be either INT or BYTES.
-
class
merlin_standard_lib.proto.schema_bp.FloatDomain(name: str = <betterproto._PLACEHOLDER object>, min: float = <betterproto._PLACEHOLDER object>, max: float = <betterproto._PLACEHOLDER object>, disallow_nan: bool = <betterproto._PLACEHOLDER object>, disallow_inf: bool = <betterproto._PLACEHOLDER object>, is_embedding: bool = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Encodes information for domains of float values. Note that FeatureType could be either FLOAT or BYTES.
-
class
merlin_standard_lib.proto.schema_bp.StructDomain(feature: List[merlin_standard_lib.proto.schema_bp.Feature] = <betterproto._PLACEHOLDER object>, sparse_feature: List[merlin_standard_lib.proto.schema_bp.SparseFeature] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Domain for a recursive struct.
NOTE: If a feature with a StructDomain is deprecated, then all the child features (features and sparse_features of the StructDomain) are also considered to be deprecated. Similarly, child features can only be in environments of the parent feature.
-
feature: List[merlin_standard_lib.proto.schema_bp.Feature] = <betterproto._PLACEHOLDER object>
-
sparse_feature: List[merlin_standard_lib.proto.schema_bp.SparseFeature] = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.StringDomain(name: str = <betterproto._PLACEHOLDER object>, value: List[str] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Encodes information for domains of string values.
-
class
merlin_standard_lib.proto.schema_bp.BoolDomain(name: str = <betterproto._PLACEHOLDER object>, true_value: str = <betterproto._PLACEHOLDER object>, false_value: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Encodes information about the domain of a boolean attribute that encodes its TRUE/FALSE values as strings, or 0=false, 1=true. Note that FeatureType could be either INT or BYTES.
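The two encodings mentioned for BoolDomain (string markers, or the integers 0 and 1) could be decoded along the following lines. This is a sketch under the assumption that string values are compared exactly against true_value and false_value; decode_bool is a hypothetical helper, not part of the library.

```python
from typing import Union


def decode_bool(raw: Union[int, str, bytes], true_value: str, false_value: str) -> bool:
    """Decode a raw INT or BYTES value into a Python bool."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    if isinstance(raw, str):
        # String encoding: only the two configured markers are accepted.
        if raw == true_value:
            return True
        if raw == false_value:
            return False
        raise ValueError(f"{raw!r} is neither true_value nor false_value")
    # Integer encoding: 0 = false, 1 = true.
    if raw in (0, 1):
        return bool(raw)
    raise ValueError(f"{raw!r} is not a valid integer boolean")
```

Values outside the domain raise an error instead of being coerced silently.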
-
class
merlin_standard_lib.proto.schema_bp.NaturalLanguageDomain(vocabulary: str = <betterproto._PLACEHOLDER object>, coverage: merlin_standard_lib.proto.schema_bp.FeatureCoverageConstraints = <betterproto._PLACEHOLDER object>, token_constraints: List[merlin_standard_lib.proto.schema_bp.SequenceValueConstraints] = <betterproto._PLACEHOLDER object>, sequence_length_constraints: merlin_standard_lib.proto.schema_bp.SequenceLengthConstraints = <betterproto._PLACEHOLDER object>, location_constraint_regex: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Natural language text.
-
coverage: merlin_standard_lib.proto.schema_bp.FeatureCoverageConstraints = <betterproto._PLACEHOLDER object>
-
token_constraints: List[merlin_standard_lib.proto.schema_bp.SequenceValueConstraints] = <betterproto._PLACEHOLDER object>
-
sequence_length_constraints: merlin_standard_lib.proto.schema_bp.SequenceLengthConstraints = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.ImageDomain(minimum_supported_image_fraction: float = <betterproto._PLACEHOLDER object>, max_image_byte_size: int = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Image data.
-
class
merlin_standard_lib.proto.schema_bp.MIDDomain[source] Bases:
betterproto.Message
Knowledge graph ID, see: https://www.wikidata.org/wiki/Property:P646
-
class
merlin_standard_lib.proto.schema_bp.URLDomain[source] Bases:
betterproto.Message
A URL, see: https://en.wikipedia.org/wiki/URL
-
class
merlin_standard_lib.proto.schema_bp.TimeDomain(string_format: str = <betterproto._PLACEHOLDER object>, integer_format: merlin_standard_lib.proto.schema_bp.TimeDomainIntegerTimeFormat = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Time or date representation.
-
integer_format: merlin_standard_lib.proto.schema_bp.TimeDomainIntegerTimeFormat = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.TimeOfDayDomain(string_format: str = <betterproto._PLACEHOLDER object>, integer_format: merlin_standard_lib.proto.schema_bp.TimeOfDayDomainIntegerTimeOfDayFormat = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Time of day, without a particular date.
-
integer_format: merlin_standard_lib.proto.schema_bp.TimeOfDayDomainIntegerTimeOfDayFormat = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.FeaturePresence(min_fraction: float = <betterproto._PLACEHOLDER object>, min_count: int = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Describes constraints on the presence of the feature in the data.
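A presence check might look like the sketch below. The interpretation is an assumption drawn from the parameter names only: min_fraction is read as the minimum fraction of examples that contain the feature, and min_count as the minimum number of examples that contain it. The helper presence_ok is hypothetical.

```python
def presence_ok(
    num_present: int,
    num_examples: int,
    min_fraction: float = 0.0,
    min_count: int = 0,
) -> bool:
    """Check both presence constraints (assumed semantics, see the note above)."""
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    fraction = num_present / num_examples
    return fraction >= min_fraction and num_present >= min_count
```

Both constraints must hold, so a feature present in every one of five examples still fails a min_count of 10.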
-
class
merlin_standard_lib.proto.schema_bp.FeaturePresenceWithinGroup(required: bool = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Records constraints on the presence of a feature inside a “group” context (e.g., .presence inside a group of features that define a sequence).
-
class
merlin_standard_lib.proto.schema_bp.InfinityNorm(threshold: float = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Checks that the L-infinity norm is below a certain threshold between the two discrete distributions. Since this is applied to a FeatureNameStatistics, it only considers the top K.
\[l_{\infty}(p,q) = \max_{i} | p_{i} - q_{i} |\]
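The formula can be evaluated directly for two discrete distributions. In this sketch the distributions are plain dicts from value to probability, and a value missing from one side is assumed to have probability 0; l_infinity_distance is a hypothetical helper and ignores the top-K restriction mentioned above.

```python
from typing import Dict


def l_infinity_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Return max_i |p_i - q_i| over the union of the two supports."""
    keys = set(p) | set(q)
    return max((abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys), default=0.0)


p = {"a": 0.5, "b": 0.5}
q = {"a": 0.7, "b": 0.2, "c": 0.1}
distance = l_infinity_distance(p, q)  # Largest gap is on "b": |0.5 - 0.2|.
```

A comparator with threshold t would then accept the pair when the distance is below t.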
-
class
merlin_standard_lib.proto.schema_bp.JensenShannonDivergence(threshold: float = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Checks that the approximate Jensen-Shannon Divergence is below a certain threshold between the two distributions.
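For reference, the Jensen-Shannon divergence of two fully known discrete distributions can be computed as below. This sketch computes the exact value, not the approximation the comparator refers to, and the base-2 logarithm (which bounds the result by 1) is an assumption; the function name is hypothetical.

```python
import math
from typing import Dict


def jensen_shannon_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Exact JSD in bits: 0.5 * KL(p || m) + 0.5 * KL(q || m), m = (p + q) / 2."""
    divergence = 0.0
    for key in set(p) | set(q):
        p_k = p.get(key, 0.0)
        q_k = q.get(key, 0.0)
        m_k = 0.5 * (p_k + q_k)
        # Terms with zero probability contribute nothing (0 * log 0 := 0).
        if p_k > 0.0:
            divergence += 0.5 * p_k * math.log2(p_k / m_k)
        if q_k > 0.0:
            divergence += 0.5 * q_k * math.log2(q_k / m_k)
    return divergence
```

Identical distributions give 0, and distributions with disjoint support give the maximum of 1 bit.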
-
class
merlin_standard_lib.proto.schema_bp.FeatureComparator(infinity_norm: merlin_standard_lib.proto.schema_bp.InfinityNorm = <betterproto._PLACEHOLDER object>, jensen_shannon_divergence: merlin_standard_lib.proto.schema_bp.JensenShannonDivergence = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
-
infinity_norm: merlin_standard_lib.proto.schema_bp.InfinityNorm = <betterproto._PLACEHOLDER object>
-
jensen_shannon_divergence: merlin_standard_lib.proto.schema_bp.JensenShannonDivergence = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.UniqueConstraints(min: int = <betterproto._PLACEHOLDER object>, max: int = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Checks that the number of unique values is greater than or equal to the min, and less than or equal to the max.
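The check amounts to counting distinct values and comparing against both bounds. A minimal sketch, with the hypothetical helper unique_count_ok operating on an in-memory list of hashable values:

```python
from typing import Hashable, Iterable


def unique_count_ok(values: Iterable[Hashable], min_unique: int, max_unique: int) -> bool:
    """Check min_unique <= number of distinct values <= max_unique."""
    return min_unique <= len(set(values)) <= max_unique
```

For example, a column holding only "US", "DE", and "US" has two unique values and satisfies bounds of 1 and 3.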
-
class
merlin_standard_lib.proto.schema_bp.TensorRepresentation(dense_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationDenseTensor = <betterproto._PLACEHOLDER object>, varlen_sparse_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationVarLenSparseTensor = <betterproto._PLACEHOLDER object>, sparse_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationSparseTensor = <betterproto._PLACEHOLDER object>, ragged_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensor = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A TensorRepresentation captures the intent for converting columns in a dataset to TensorFlow Tensors (or more generally, tf.CompositeTensors). Note that one tf.CompositeTensor may consist of data from multiple columns; for example, an N-dimensional tf.SparseTensor may need N + 1 columns to provide the sparse indices and values. Note also that the “column name” that a TensorRepresentation needs is a string, not a Path. This means that the column name identifies a top-level Feature in the schema (i.e. you cannot specify a Feature nested in a STRUCT Feature).
-
dense_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationDenseTensor = <betterproto._PLACEHOLDER object>
-
varlen_sparse_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationVarLenSparseTensor = <betterproto._PLACEHOLDER object>
-
sparse_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationSparseTensor = <betterproto._PLACEHOLDER object>
-
ragged_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensor = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.TensorRepresentationDefaultValue(float_value: float = <betterproto._PLACEHOLDER object>, int_value: int = <betterproto._PLACEHOLDER object>, bytes_value: bytes = <betterproto._PLACEHOLDER object>, uint_value: int = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
-
class
merlin_standard_lib.proto.schema_bp.TensorRepresentationDenseTensor(column_name: str = <betterproto._PLACEHOLDER object>, shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>, default_value: merlin_standard_lib.proto.schema_bp.TensorRepresentationDefaultValue = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A tf.Tensor
-
shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>
-
default_value: merlin_standard_lib.proto.schema_bp.TensorRepresentationDefaultValue = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.TensorRepresentationVarLenSparseTensor(column_name: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A ragged tf.SparseTensor that models nested lists.
-
class
merlin_standard_lib.proto.schema_bp.TensorRepresentationSparseTensor(dense_shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>, index_column_names: List[str] = <betterproto._PLACEHOLDER object>, value_column_name: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A tf.SparseTensor whose indices and values come from separate data columns. This will replace Schema.sparse_feature eventually. The index columns must be of INT type, and all the columns must co-occur and have the same valency at the same row.
-
dense_shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>
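The "same valency at the same row" requirement can be seen by assembling coordinate and value pairs for a single row. This is a plain-Python sketch that does not use TensorFlow; to_coordinate_entries is a hypothetical helper, and each inner list stands for the values of one column in that row.

```python
from typing import List, Sequence, Tuple


def to_coordinate_entries(
    index_columns: Sequence[Sequence[int]], value_column: Sequence[float]
) -> List[Tuple[Tuple[int, ...], float]]:
    """Combine one list per index column with the value column into COO entries."""
    for column in index_columns:
        if len(column) != len(value_column):
            raise ValueError("index and value columns must have the same valency")
    # zip(*index_columns) turns per-dimension lists into per-entry coordinates.
    return list(zip(zip(*index_columns), value_column))


entries = to_coordinate_entries([[0, 1], [2, 0]], [1.5, 2.5])
```

Here two index columns describe a 2-D sparse tensor, so three columns in total (N + 1 for N = 2) supply its contents.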
-
-
class
merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensor(feature_path: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>, partition: List[merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensorPartition] = <betterproto._PLACEHOLDER object>, row_partition_dtype: merlin_standard_lib.proto.schema_bp.TensorRepresentationRowPartitionDType = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A tf.RaggedTensor that models nested lists. Currently there is no way for the user to specify the shape of the leaf value (the innermost value tensor of the RaggedTensor). The leaf value will always be a 1-D tensor.
-
feature_path: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>
-
partition: List[merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensorPartition] = <betterproto._PLACEHOLDER object>
-
row_partition_dtype: merlin_standard_lib.proto.schema_bp.TensorRepresentationRowPartitionDType = <betterproto._PLACEHOLDER object>
-
-
class
merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensorPartition(uniform_row_length: int = <betterproto._PLACEHOLDER object>, row_length: str = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
Further partition of the feature values at the leaf level.
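Partitioning flat leaf values into rows can be sketched as follows. The semantics are assumed from the parameter names: uniform_row_length is taken to split the values into rows of equal size, and row_length is taken to name a column that supplies one length per row. Both helpers are hypothetical and work on plain lists.

```python
from typing import Any, List, Sequence


def partition_uniform(values: Sequence[Any], uniform_row_length: int) -> List[List[Any]]:
    """Split values into consecutive rows of the same length."""
    if uniform_row_length <= 0 or len(values) % uniform_row_length != 0:
        raise ValueError("values cannot be split into rows of that length")
    return [
        list(values[start:start + uniform_row_length])
        for start in range(0, len(values), uniform_row_length)
    ]


def partition_by_row_lengths(values: Sequence[Any], row_lengths: Sequence[int]) -> List[List[Any]]:
    """Split values into rows whose sizes are given by row_lengths."""
    if sum(row_lengths) != len(values):
        raise ValueError("row lengths must sum to the number of values")
    rows, start = [], 0
    for length in row_lengths:
        rows.append(list(values[start:start + length]))
        start += length
    return rows
```

Applying several partitions in sequence yields deeper nesting, with the flat 1-D leaf values always at the innermost level.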
-
class
merlin_standard_lib.proto.schema_bp.TensorRepresentationGroup(tensor_representation: Dict[str, merlin_standard_lib.proto.schema_bp.TensorRepresentation] = <betterproto._PLACEHOLDER object>)[source] Bases:
betterproto.Message
A TensorRepresentationGroup is a collection of TensorRepresentations with names. These names may serve as identifiers when converting the dataset to a collection of Tensors or tf.CompositeTensors. For example, given the following group:
    {
      key: "dense_tensor"
      tensor_representation {
        dense_tensor {
          column_name: "univalent_feature"
          shape { dim { size: 1 } }
          default_value { float_value: 0 }
        }
      }
    }
    {
      key: "varlen_sparse_tensor"
      tensor_representation {
        varlen_sparse_tensor { column_name: "multivalent_feature" }
      }
    }
the schema is expected to have the features "univalent_feature" and "multivalent_feature", and when a batch of data is converted to Tensors using this TensorRepresentationGroup, the result may be the following dict:
    {
      "dense_tensor": tf.Tensor(…),
      "varlen_sparse_tensor": tf.SparseTensor(…),
    }
-
tensor_representation: Dict[str, merlin_standard_lib.proto.schema_bp.TensorRepresentation] = <betterproto._PLACEHOLDER object>
-