merlin_standard_lib.proto package

Submodules

merlin_standard_lib.proto.schema_bp module

class merlin_standard_lib.proto.schema_bp.LifecycleStage(value)[source]

Bases: betterproto.Enum

Lifecycle stage of a feature. Only UNKNOWN_STAGE, BETA, and PRODUCTION features are actually validated. PLANNED, ALPHA, DISABLED, and DEBUG_ONLY are treated as DEPRECATED.

UNKNOWN_STAGE = 0

PLANNED = 1

ALPHA = 2

BETA = 3

PRODUCTION = 4

DEPRECATED = 5

DEBUG_ONLY = 6

DISABLED = 7
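The validation rule in the description can be sketched with a stand-in enum. This is a minimal illustration that uses the standard library `enum.IntEnum` instead of `betterproto.Enum`; the `VALIDATED_STAGES` set and the `is_validated` helper are hypothetical names and are not part of merlin_standard_lib.

```python
from enum import IntEnum


class LifecycleStage(IntEnum):
    """Stand-in that mirrors the values listed above (not the betterproto class)."""

    UNKNOWN_STAGE = 0
    PLANNED = 1
    ALPHA = 2
    BETA = 3
    PRODUCTION = 4
    DEPRECATED = 5
    DEBUG_ONLY = 6
    DISABLED = 7


# Stages that the description says are actually validated.
VALIDATED_STAGES = frozenset(
    {LifecycleStage.UNKNOWN_STAGE, LifecycleStage.BETA, LifecycleStage.PRODUCTION}
)


def is_validated(stage: LifecycleStage) -> bool:
    """Hypothetical helper: True if features at this stage are validated."""
    return stage in VALIDATED_STAGES
```

Every stage outside `VALIDATED_STAGES` is handled the same way as DEPRECATED in this sketch, that is, it is skipped.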

class merlin_standard_lib.proto.schema_bp.FeatureType(value)[source]

Bases: betterproto.Enum

Describes the physical representation of a feature. It may differ from the logical representation, which is represented as a Domain.

TYPE_UNKNOWN = 0

BYTES = 1

INT = 2

FLOAT = 3

STRUCT = 4

class merlin_standard_lib.proto.schema_bp.TimeDomainIntegerTimeFormat(value)[source]

Bases: betterproto.Enum

An enumeration.

FORMAT_UNKNOWN = 0

UNIX_DAYS = 5

UNIX_SECONDS = 1

UNIX_MILLISECONDS = 2

UNIX_MICROSECONDS = 3

UNIX_NANOSECONDS = 4
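As an illustration of how these formats might be interpreted, the sketch below converts an integer value to a UTC `datetime`. It assumes each format counts units elapsed since 1970-01-01 UTC, which is the usual reading of the UNIX_* names but is not stated on this page. The `to_datetime` helper is hypothetical, and the constants are plain integers that mirror the enum values above.

```python
from datetime import datetime, timedelta, timezone

# Integer values mirror TimeDomainIntegerTimeFormat as listed above.
UNIX_SECONDS = 1
UNIX_MILLISECONDS = 2
UNIX_MICROSECONDS = 3
UNIX_NANOSECONDS = 4
UNIX_DAYS = 5

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_datetime(value: int, integer_format: int) -> datetime:
    """Hypothetical helper: interpret `value` according to `integer_format`."""
    if integer_format == UNIX_DAYS:
        return _EPOCH + timedelta(days=value)
    if integer_format == UNIX_SECONDS:
        return _EPOCH + timedelta(seconds=value)
    if integer_format == UNIX_MILLISECONDS:
        return _EPOCH + timedelta(milliseconds=value)
    if integer_format == UNIX_MICROSECONDS:
        return _EPOCH + timedelta(microseconds=value)
    if integer_format == UNIX_NANOSECONDS:
        # datetime has microsecond resolution, so sub-microsecond digits are dropped.
        return _EPOCH + timedelta(microseconds=value // 1000)
    raise ValueError(f"unsupported integer time format: {integer_format}")
```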

class merlin_standard_lib.proto.schema_bp.TimeOfDayDomainIntegerTimeOfDayFormat(value)[source]

Bases: betterproto.Enum

An enumeration.

FORMAT_UNKNOWN = 0

PACKED_64_NANOS = 1

class merlin_standard_lib.proto.schema_bp.TensorRepresentationRowPartitionDType(value)[source]

Bases: betterproto.Enum

An enumeration.

UNSPECIFIED = 0

INT64 = 1

INT32 = 2

class merlin_standard_lib.proto.schema_bp.Path(step: List[str] = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

A path is a more general substitute for the name of a field or feature that can be used for flat examples as well as structured data. For example, suppose we had data in a protocol buffer:

    message Person {
      int age = 1;
      optional string gender = 2;
      repeated Person parent = 3;
    }

Here the path {step:["parent", "age"]} in statistics would refer to the age of a parent, and {step:["parent", "parent", "age"]} would refer to the age of a grandparent. This allows us to distinguish between the statistics of parents’ ages and grandparents’ ages. In general, repeated messages are to be preferred to linked lists of arbitrary length. For SequenceExample, if we have a feature list "foo", this is represented by {step:["##SEQUENCE##", "foo"]}.

step: List[str] = <betterproto._PLACEHOLDER object>
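The parent and grandparent example in the description can be sketched on plain Python data. This is only an illustration of how a list of steps selects nested values; the `collect` helper and the dict-based record layout are assumptions and do not reflect how merlin_standard_lib resolves a Path.

```python
from typing import Any, List


def collect(record: Any, steps: List[str]) -> List[Any]:
    """Hypothetical helper: gather every value reached by following `steps`."""
    if not steps:
        return [record]
    if not isinstance(record, dict):
        return []
    head, rest = steps[0], steps[1:]
    children = record.get(head, [])
    # A repeated field is modeled as a list; a singular field as a bare value.
    if not isinstance(children, list):
        children = [children]
    found: List[Any] = []
    for child in children:
        found.extend(collect(child, rest))
    return found


person = {
    "age": 30,
    "parent": [
        {"age": 60, "parent": [{"age": 85}]},
        {"age": 58},
    ],
}

parent_ages = collect(person, ["parent", "age"])
grandparent_ages = collect(person, ["parent", "parent", "age"])
```

Here `parent_ages` holds the ages of both parents, while `grandparent_ages` holds only the single grandparent age, which is the distinction the description draws.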

class merlin_standard_lib.proto.schema_bp.ValueCountList(value_count: List[merlin_standard_lib.proto.schema_bp.ValueCount] = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

value_count: List[merlin_standard_lib.proto.schema_bp.ValueCount] = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.Feature(name: str = <betterproto._PLACEHOLDER object>, deprecated: bool = <betterproto._PLACEHOLDER object>, presence: merlin_standard_lib.proto.schema_bp.FeaturePresence = <betterproto._PLACEHOLDER object>, group_presence: merlin_standard_lib.proto.schema_bp.FeaturePresenceWithinGroup = <betterproto._PLACEHOLDER object>, shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>, value_count: merlin_standard_lib.proto.schema_bp.ValueCount = <betterproto._PLACEHOLDER object>, value_counts: merlin_standard_lib.proto.schema_bp.ValueCountList = <betterproto._PLACEHOLDER object>, type: merlin_standard_lib.proto.schema_bp.FeatureType = <betterproto._PLACEHOLDER object>, domain: str = <betterproto._PLACEHOLDER object>, int_domain: merlin_standard_lib.proto.schema_bp.IntDomain = <betterproto._PLACEHOLDER object>, float_domain: merlin_standard_lib.proto.schema_bp.FloatDomain = <betterproto._PLACEHOLDER object>, string_domain: merlin_standard_lib.proto.schema_bp.StringDomain = <betterproto._PLACEHOLDER object>, bool_domain: merlin_standard_lib.proto.schema_bp.BoolDomain = <betterproto._PLACEHOLDER object>, struct_domain: merlin_standard_lib.proto.schema_bp.StructDomain = <betterproto._PLACEHOLDER object>, natural_language_domain: merlin_standard_lib.proto.schema_bp.NaturalLanguageDomain = <betterproto._PLACEHOLDER object>, image_domain: merlin_standard_lib.proto.schema_bp.ImageDomain = <betterproto._PLACEHOLDER object>, mid_domain: merlin_standard_lib.proto.schema_bp.MIDDomain = <betterproto._PLACEHOLDER object>, url_domain: merlin_standard_lib.proto.schema_bp.URLDomain = <betterproto._PLACEHOLDER object>, time_domain: merlin_standard_lib.proto.schema_bp.TimeDomain = <betterproto._PLACEHOLDER object>, time_of_day_domain: merlin_standard_lib.proto.schema_bp.TimeOfDayDomain = <betterproto._PLACEHOLDER object>, distribution_constraints: merlin_standard_lib.proto.schema_bp.DistributionConstraints = <betterproto._PLACEHOLDER object>, annotation: merlin_standard_lib.proto.schema_bp.Annotation = <betterproto._PLACEHOLDER object>, skew_comparator: merlin_standard_lib.proto.schema_bp.FeatureComparator = <betterproto._PLACEHOLDER object>, drift_comparator: merlin_standard_lib.proto.schema_bp.FeatureComparator = <betterproto._PLACEHOLDER object>, in_environment: List[str] = <betterproto._PLACEHOLDER object>, not_in_environment: List[str] = <betterproto._PLACEHOLDER object>, lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>, unique_constraints: merlin_standard_lib.proto.schema_bp.UniqueConstraints = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Describes schema-level information about a specific feature. NextID: 33

name: str = <betterproto._PLACEHOLDER object>

deprecated: bool = <betterproto._PLACEHOLDER object>

presence: merlin_standard_lib.proto.schema_bp.FeaturePresence = <betterproto._PLACEHOLDER object>

group_presence: merlin_standard_lib.proto.schema_bp.FeaturePresenceWithinGroup = <betterproto._PLACEHOLDER object>

shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>

value_count: merlin_standard_lib.proto.schema_bp.ValueCount = <betterproto._PLACEHOLDER object>

value_counts: merlin_standard_lib.proto.schema_bp.ValueCountList = <betterproto._PLACEHOLDER object>

type: merlin_standard_lib.proto.schema_bp.FeatureType = <betterproto._PLACEHOLDER object>

domain: str = <betterproto._PLACEHOLDER object>

int_domain: merlin_standard_lib.proto.schema_bp.IntDomain = <betterproto._PLACEHOLDER object>

float_domain: merlin_standard_lib.proto.schema_bp.FloatDomain = <betterproto._PLACEHOLDER object>

string_domain: merlin_standard_lib.proto.schema_bp.StringDomain = <betterproto._PLACEHOLDER object>

bool_domain: merlin_standard_lib.proto.schema_bp.BoolDomain = <betterproto._PLACEHOLDER object>

struct_domain: merlin_standard_lib.proto.schema_bp.StructDomain = <betterproto._PLACEHOLDER object>

natural_language_domain: merlin_standard_lib.proto.schema_bp.NaturalLanguageDomain = <betterproto._PLACEHOLDER object>

image_domain: merlin_standard_lib.proto.schema_bp.ImageDomain = <betterproto._PLACEHOLDER object>

mid_domain: merlin_standard_lib.proto.schema_bp.MIDDomain = <betterproto._PLACEHOLDER object>

url_domain: merlin_standard_lib.proto.schema_bp.URLDomain = <betterproto._PLACEHOLDER object>

time_domain: merlin_standard_lib.proto.schema_bp.TimeDomain = <betterproto._PLACEHOLDER object>

time_of_day_domain: merlin_standard_lib.proto.schema_bp.TimeOfDayDomain = <betterproto._PLACEHOLDER object>

distribution_constraints: merlin_standard_lib.proto.schema_bp.DistributionConstraints = <betterproto._PLACEHOLDER object>

annotation: merlin_standard_lib.proto.schema_bp.Annotation = <betterproto._PLACEHOLDER object>

skew_comparator: merlin_standard_lib.proto.schema_bp.FeatureComparator = <betterproto._PLACEHOLDER object>

drift_comparator: merlin_standard_lib.proto.schema_bp.FeatureComparator = <betterproto._PLACEHOLDER object>

in_environment: List[str] = <betterproto._PLACEHOLDER object>

not_in_environment: List[str] = <betterproto._PLACEHOLDER object>

lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>

unique_constraints: merlin_standard_lib.proto.schema_bp.UniqueConstraints = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.Annotation(tag: List[str] = <betterproto._PLACEHOLDER object>, comment: List[str] = <betterproto._PLACEHOLDER object>, extra_metadata: List[Any] = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Additional information about the schema or about a feature.

tag: List[str] = <betterproto._PLACEHOLDER object>

comment: List[str] = <betterproto._PLACEHOLDER object>

extra_metadata: List[Any] = <betterproto._PLACEHOLDER object>

property metadata

class merlin_standard_lib.proto.schema_bp.NumericValueComparator(min_fraction_threshold: float = <betterproto._PLACEHOLDER object>, max_fraction_threshold: float = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Checks that the ratio of the current value to the previous value is not below the min_fraction_threshold or above the max_fraction_threshold. That is, previous value * min_fraction_threshold <= current value <= previous value * max_fraction_threshold. To specify that the value cannot change, set both min_fraction_threshold and max_fraction_threshold to 1.0.

min_fraction_threshold: float = <betterproto._PLACEHOLDER object>

max_fraction_threshold: float = <betterproto._PLACEHOLDER object>
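The inequality in the description translates directly into a check. The sketch below is a plain-Python restatement of that inequality; the `within_thresholds` function is a hypothetical name, not a method of NumericValueComparator.

```python
def within_thresholds(
    previous: float,
    current: float,
    min_fraction_threshold: float,
    max_fraction_threshold: float,
) -> bool:
    """Hypothetical helper: True if `current` stays within the allowed ratio band."""
    lower = previous * min_fraction_threshold
    upper = previous * max_fraction_threshold
    return lower <= current <= upper


# A dataset that shrank from 100 to 90 examples, with a band of 0.8x to 1.2x.
ok = within_thresholds(100, 90, 0.8, 1.2)

# Setting both thresholds to 1.0 means any change is rejected.
unchanged_only = within_thresholds(100, 101, 1.0, 1.0)
```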

class merlin_standard_lib.proto.schema_bp.DatasetConstraints(num_examples_drift_comparator: merlin_standard_lib.proto.schema_bp.NumericValueComparator = <betterproto._PLACEHOLDER object>, num_examples_version_comparator: merlin_standard_lib.proto.schema_bp.NumericValueComparator = <betterproto._PLACEHOLDER object>, min_examples_count: int = <betterproto._PLACEHOLDER object>, max_examples_count: int = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Constraints on the entire dataset.

num_examples_drift_comparator: merlin_standard_lib.proto.schema_bp.NumericValueComparator = <betterproto._PLACEHOLDER object>

num_examples_version_comparator: merlin_standard_lib.proto.schema_bp.NumericValueComparator = <betterproto._PLACEHOLDER object>

min_examples_count: int = <betterproto._PLACEHOLDER object>

max_examples_count: int = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.FixedShape(dim: List[merlin_standard_lib.proto.schema_bp.FixedShapeDim] = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Specifies a fixed shape for the feature’s values. The immediate implication is that each feature has a fixed number of values. Moreover, these values can be parsed into a multi-dimensional tensor using the specified axis sizes. The FixedShape defines a lexicographical ordering of the data. For instance, given a FixedShape { dim {size:3} dim {size:2} }:

    tensor[0][0] = field[0]
    tensor[0][1] = field[1]
    tensor[1][0] = field[2]
    tensor[1][1] = field[3]
    tensor[2][0] = field[4]
    tensor[2][1] = field[5]

The FixedShape message is identical to the TensorFlow TensorShape proto message.

dim: List[merlin_standard_lib.proto.schema_bp.FixedShapeDim] = <betterproto._PLACEHOLDER object>
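The lexicographical ordering described for FixedShape is ordinary row-major indexing. As a minimal sketch, the hypothetical `flat_index` helper below maps a tensor position to the index of the flat field value, taking the axis sizes as a plain list rather than FixedShapeDim messages.

```python
from typing import Sequence


def flat_index(position: Sequence[int], dims: Sequence[int]) -> int:
    """Hypothetical helper: row-major offset of `position` in a tensor of shape `dims`."""
    if len(position) != len(dims):
        raise ValueError("position and dims must have the same rank")
    index = 0
    for pos, size in zip(position, dims):
        if not 0 <= pos < size:
            raise IndexError(f"position {pos} out of range for axis of size {size}")
        index = index * size + pos
    return index


# Shape with dim sizes 3 and 2, as in the description.
dims = [3, 2]
offsets = [flat_index((i, j), dims) for i in range(3) for j in range(2)]
```

Iterating over the positions in nested-loop order visits the flat field values in order, which is what the description means by a lexicographical ordering.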

class merlin_standard_lib.proto.schema_bp.FixedShapeDim(size: int = <betterproto._PLACEHOLDER object>, name: str = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

An axis in a multi-dimensional feature representation.

size: int = <betterproto._PLACEHOLDER object>

name: str = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.ValueCount(min: int = <betterproto._PLACEHOLDER object>, max: int = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Limits on maximum and minimum number of values in a single example (when the feature is present). Use this when the minimum value count can be different than the maximum value count. Otherwise prefer FixedShape.

min: int = <betterproto._PLACEHOLDER object>

max: int = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.WeightedFeature(name: str = <betterproto._PLACEHOLDER object>, feature: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>, weight_feature: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>, lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Represents a weighted feature that is encoded as a combination of raw base features. The weight_feature should be a float feature with identical shape as the feature. This is useful for representing weights associated with categorical tokens (e.g. a TFIDF weight associated with each token). TODO(b/142122960): Handle WeightedCategorical end to end in TFX (validation, TFX Unit Testing, etc)

name: str = <betterproto._PLACEHOLDER object>

feature: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>

weight_feature: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>

lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.SparseFeature(name: str = <betterproto._PLACEHOLDER object>, deprecated: bool = <betterproto._PLACEHOLDER object>, lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>, presence: merlin_standard_lib.proto.schema_bp.FeaturePresence = <betterproto._PLACEHOLDER object>, dense_shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>, index_feature: List[merlin_standard_lib.proto.schema_bp.SparseFeatureIndexFeature] = <betterproto._PLACEHOLDER object>, is_sorted: bool = <betterproto._PLACEHOLDER object>, value_feature: merlin_standard_lib.proto.schema_bp.SparseFeatureValueFeature = <betterproto._PLACEHOLDER object>, type: merlin_standard_lib.proto.schema_bp.FeatureType = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

A sparse feature represents a sparse tensor that is encoded with a combination of raw features, namely index features and a value feature. Each index feature defines a list of indices in a different dimension.

name: str = <betterproto._PLACEHOLDER object>

deprecated: bool = <betterproto._PLACEHOLDER object>

lifecycle_stage: merlin_standard_lib.proto.schema_bp.LifecycleStage = <betterproto._PLACEHOLDER object>

presence: merlin_standard_lib.proto.schema_bp.FeaturePresence = <betterproto._PLACEHOLDER object>

dense_shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>

index_feature: List[merlin_standard_lib.proto.schema_bp.SparseFeatureIndexFeature] = <betterproto._PLACEHOLDER object>

is_sorted: bool = <betterproto._PLACEHOLDER object>

value_feature: merlin_standard_lib.proto.schema_bp.SparseFeatureValueFeature = <betterproto._PLACEHOLDER object>

type: merlin_standard_lib.proto.schema_bp.FeatureType = <betterproto._PLACEHOLDER object>
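To make the index-feature and value-feature encoding concrete, the sketch below pairs one index list per dimension with a list of values and produces a coordinate-to-value mapping. The `assemble_sparse` helper and the example feature names are hypothetical; this only illustrates the encoding and is not how merlin_standard_lib materializes sparse tensors.

```python
from typing import Dict, Sequence, Tuple


def assemble_sparse(
    index_columns: Sequence[Sequence[int]],
    values: Sequence[float],
) -> Dict[Tuple[int, ...], float]:
    """Hypothetical helper: combine per-dimension indices with their values."""
    if any(len(column) != len(values) for column in index_columns):
        raise ValueError("every index feature needs exactly one entry per value")
    # zip(*index_columns) yields one coordinate tuple per stored value.
    return dict(zip(zip(*index_columns), values))


# Two index features (rows and columns) and one value feature.
row_index = [0, 1]
col_index = [2, 0]
value = [1.5, 2.5]

sparse = assemble_sparse([row_index, col_index], value)
```

Each index feature contributes one coordinate per stored value, so a two-dimensional sparse tensor needs two index features plus the value feature.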

class merlin_standard_lib.proto.schema_bp.SparseFeatureIndexFeature(name: str = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

name: str = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.SparseFeatureValueFeature(name: str = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

name: str = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.DistributionConstraints(min_domain_mass: float = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Models constraints on the distribution of a feature’s values. TODO(martinz): replace min_domain_mass with max_off_domain (but slowly).

min_domain_mass: float = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.FeatureCoverageConstraints(min_coverage: float = <betterproto._PLACEHOLDER object>, min_avg_token_length: float = <betterproto._PLACEHOLDER object>, excluded_string_tokens: List[str] = <betterproto._PLACEHOLDER object>, excluded_int_tokens: List[int] = <betterproto._PLACEHOLDER object>, oov_string_tokens: List[str] = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Encodes vocabulary coverage constraints.

min_coverage: float = <betterproto._PLACEHOLDER object>

min_avg_token_length: float = <betterproto._PLACEHOLDER object>

excluded_string_tokens: List[str] = <betterproto._PLACEHOLDER object>

excluded_int_tokens: List[int] = <betterproto._PLACEHOLDER object>

oov_string_tokens: List[str] = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.SequenceValueConstraints(int_value: int = <betterproto._PLACEHOLDER object>, string_value: str = <betterproto._PLACEHOLDER object>, min_per_sequence: int = <betterproto._PLACEHOLDER object>, max_per_sequence: int = <betterproto._PLACEHOLDER object>, min_fraction_of_sequences: float = <betterproto._PLACEHOLDER object>, max_fraction_of_sequences: float = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Encodes constraints on specific values in sequences.

int_value: int = <betterproto._PLACEHOLDER object>

string_value: str = <betterproto._PLACEHOLDER object>

min_per_sequence: int = <betterproto._PLACEHOLDER object>

max_per_sequence: int = <betterproto._PLACEHOLDER object>

min_fraction_of_sequences: float = <betterproto._PLACEHOLDER object>

max_fraction_of_sequences: float = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.SequenceLengthConstraints(excluded_int_value: List[int] = <betterproto._PLACEHOLDER object>, excluded_string_value: List[str] = <betterproto._PLACEHOLDER object>, min_sequence_length: int = <betterproto._PLACEHOLDER object>, max_sequence_length: int = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Encodes constraints on sequence lengths.

excluded_int_value: List[int] = <betterproto._PLACEHOLDER object>

excluded_string_value: List[str] = <betterproto._PLACEHOLDER object>

min_sequence_length: int = <betterproto._PLACEHOLDER object>

max_sequence_length: int = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.IntDomain(name: str = <betterproto._PLACEHOLDER object>, min: int = <betterproto._PLACEHOLDER object>, max: int = <betterproto._PLACEHOLDER object>, is_categorical: bool = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Encodes information for domains of integer values. Note that FeatureType could be either INT or BYTES.

name: str = <betterproto._PLACEHOLDER object>

min: int = <betterproto._PLACEHOLDER object>

max: int = <betterproto._PLACEHOLDER object>

is_categorical: bool = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.FloatDomain(name: str = <betterproto._PLACEHOLDER object>, min: float = <betterproto._PLACEHOLDER object>, max: float = <betterproto._PLACEHOLDER object>, disallow_nan: bool = <betterproto._PLACEHOLDER object>, disallow_inf: bool = <betterproto._PLACEHOLDER object>, is_embedding: bool = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Encodes information for domains of float values. Note that FeatureType could be either FLOAT or BYTES.

name: str = <betterproto._PLACEHOLDER object>

min: float = <betterproto._PLACEHOLDER object>

max: float = <betterproto._PLACEHOLDER object>

disallow_nan: bool = <betterproto._PLACEHOLDER object>

disallow_inf: bool = <betterproto._PLACEHOLDER object>

is_embedding: bool = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.StructDomain(feature: List[merlin_standard_lib.proto.schema_bp.Feature] = <betterproto._PLACEHOLDER object>, sparse_feature: List[merlin_standard_lib.proto.schema_bp.SparseFeature] = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Domain for a recursive struct. NOTE: If a feature with a StructDomain is deprecated, then all the child features (features and sparse_features of the StructDomain) are also considered to be deprecated. Similarly child features can only be in environments of the parent feature.

feature: List[merlin_standard_lib.proto.schema_bp.Feature] = <betterproto._PLACEHOLDER object>

sparse_feature: List[merlin_standard_lib.proto.schema_bp.SparseFeature] = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.StringDomain(name: str = <betterproto._PLACEHOLDER object>, value: List[str] = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Encodes information for domains of string values.

name: str = <betterproto._PLACEHOLDER object>

value: List[str] = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.BoolDomain(name: str = <betterproto._PLACEHOLDER object>, true_value: str = <betterproto._PLACEHOLDER object>, false_value: str = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Encodes information about the domain of a boolean attribute that encodes its TRUE/FALSE values as strings, or 0=false, 1=true. Note that FeatureType could be either INT or BYTES.

name: str = <betterproto._PLACEHOLDER object>

true_value: str = <betterproto._PLACEHOLDER object>

false_value: str = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.NaturalLanguageDomain(vocabulary: str = <betterproto._PLACEHOLDER object>, coverage: merlin_standard_lib.proto.schema_bp.FeatureCoverageConstraints = <betterproto._PLACEHOLDER object>, token_constraints: List[merlin_standard_lib.proto.schema_bp.SequenceValueConstraints] = <betterproto._PLACEHOLDER object>, sequence_length_constraints: merlin_standard_lib.proto.schema_bp.SequenceLengthConstraints = <betterproto._PLACEHOLDER object>, location_constraint_regex: str = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Natural language text.

vocabulary: str = <betterproto._PLACEHOLDER object>

coverage: merlin_standard_lib.proto.schema_bp.FeatureCoverageConstraints = <betterproto._PLACEHOLDER object>

token_constraints: List[merlin_standard_lib.proto.schema_bp.SequenceValueConstraints] = <betterproto._PLACEHOLDER object>

sequence_length_constraints: merlin_standard_lib.proto.schema_bp.SequenceLengthConstraints = <betterproto._PLACEHOLDER object>

location_constraint_regex: str = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.ImageDomain(minimum_supported_image_fraction: float = <betterproto._PLACEHOLDER object>, max_image_byte_size: int = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Image data.

minimum_supported_image_fraction: float = <betterproto._PLACEHOLDER object>

max_image_byte_size: int = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.MIDDomain[source]

Bases: betterproto.Message

Knowledge graph ID, see: https://www.wikidata.org/wiki/Property:P646

class merlin_standard_lib.proto.schema_bp.URLDomain[source]

Bases: betterproto.Message

A URL, see: https://en.wikipedia.org/wiki/URL

class merlin_standard_lib.proto.schema_bp.TimeDomain(string_format: str = <betterproto._PLACEHOLDER object>, integer_format: merlin_standard_lib.proto.schema_bp.TimeDomainIntegerTimeFormat = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Time or date representation.

string_format: str = <betterproto._PLACEHOLDER object>

integer_format: merlin_standard_lib.proto.schema_bp.TimeDomainIntegerTimeFormat = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.TimeOfDayDomain(string_format: str = <betterproto._PLACEHOLDER object>, integer_format: merlin_standard_lib.proto.schema_bp.TimeOfDayDomainIntegerTimeOfDayFormat = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Time of day, without a particular date.

string_format: str = <betterproto._PLACEHOLDER object>

integer_format: merlin_standard_lib.proto.schema_bp.TimeOfDayDomainIntegerTimeOfDayFormat = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.FeaturePresence(min_fraction: float = <betterproto._PLACEHOLDER object>, min_count: int = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Describes constraints on the presence of the feature in the data.

min_fraction: float = <betterproto._PLACEHOLDER object>

min_count: int = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.FeaturePresenceWithinGroup(required: bool = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Records constraints on the presence of a feature inside a “group” context (e.g., .presence inside a group of features that define a sequence).

required: bool = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.InfinityNorm(threshold: float = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Checks that the L-infinity norm is below a certain threshold between the two discrete distributions. Since this is applied to a FeatureNameStatistics, it only considers the top k. L_infty(p,q) = max_i |p_i-q_i|

threshold: float = <betterproto._PLACEHOLDER object>
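The formula L_infty(p,q) = max_i |p_i-q_i| can be sketched for two discrete distributions stored as dictionaries. The `l_infinity` helper is hypothetical, and treating a value missing from one distribution as probability 0 is an assumption of this sketch.

```python
from typing import Mapping


def l_infinity(p: Mapping[str, float], q: Mapping[str, float]) -> float:
    """Hypothetical helper: largest absolute difference in probability."""
    keys = set(p) | set(q)
    # A value absent from one distribution is treated as having probability 0.
    return max((abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys), default=0.0)


training = {"a": 0.5, "b": 0.5}
serving = {"a": 0.25, "b": 0.5, "c": 0.25}

distance = l_infinity(training, serving)
exceeds = distance > 0.1  # compare against a chosen threshold
```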

class merlin_standard_lib.proto.schema_bp.JensenShannonDivergence(threshold: float = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Checks that the approximate Jensen-Shannon Divergence is below a certain threshold between the two distributions.

threshold: float = <betterproto._PLACEHOLDER object>
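For reference, the exact Jensen-Shannon divergence between two discrete distributions can be computed as below. This page does not say how the approximation is computed, so this sketch shows only the textbook definition, with base-2 logarithms chosen as an assumption; the function name is hypothetical.

```python
import math
from typing import Sequence


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Hypothetical helper: exact JSD of two aligned discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    # m is the pointwise average of the two distributions.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with x_i == 0 contribute nothing to the divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


same = jensen_shannon_divergence([0.5, 0.5], [0.5, 0.5])
disjoint = jensen_shannon_divergence([1.0, 0.0], [0.0, 1.0])
```

With base-2 logarithms the result lies between 0 (identical distributions) and 1 (distributions with no overlap), which makes a threshold easy to interpret.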

class merlin_standard_lib.proto.schema_bp.FeatureComparator(infinity_norm: merlin_standard_lib.proto.schema_bp.InfinityNorm = <betterproto._PLACEHOLDER object>, jensen_shannon_divergence: merlin_standard_lib.proto.schema_bp.JensenShannonDivergence = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

infinity_norm: merlin_standard_lib.proto.schema_bp.InfinityNorm = <betterproto._PLACEHOLDER object>

jensen_shannon_divergence: merlin_standard_lib.proto.schema_bp.JensenShannonDivergence = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.UniqueConstraints(min: int = <betterproto._PLACEHOLDER object>, max: int = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Checks that the number of unique values is greater than or equal to the min, and less than or equal to the max.

min: int = <betterproto._PLACEHOLDER object>

max: int = <betterproto._PLACEHOLDER object>
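The check described above amounts to counting distinct values and comparing the count with both bounds. A minimal sketch, in which `satisfies_unique_constraints` is a hypothetical helper rather than a method of UniqueConstraints:

```python
from typing import Hashable, Iterable


def satisfies_unique_constraints(
    values: Iterable[Hashable], min_unique: int, max_unique: int
) -> bool:
    """Hypothetical helper: True if the distinct-value count is within [min, max]."""
    unique_count = len(set(values))
    return min_unique <= unique_count <= max_unique


# Three distinct values, with between 2 and 5 allowed.
colors = ["red", "green", "red", "blue"]
in_range = satisfies_unique_constraints(colors, 2, 5)
```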

class merlin_standard_lib.proto.schema_bp.TensorRepresentation(dense_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationDenseTensor = <betterproto._PLACEHOLDER object>, varlen_sparse_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationVarLenSparseTensor = <betterproto._PLACEHOLDER object>, sparse_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationSparseTensor = <betterproto._PLACEHOLDER object>, ragged_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensor = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

A TensorRepresentation captures the intent for converting columns in a dataset to TensorFlow Tensors (or more generally, tf.CompositeTensors). Note that one tf.CompositeTensor may consist of data from multiple columns; for example, an N-dimensional tf.SparseTensor may need N + 1 columns to provide the sparse indices and values. Note that the “column name” that a TensorRepresentation needs is a string, not a Path: the column name identifies a top-level Feature in the schema (i.e. you cannot specify a Feature nested in a STRUCT Feature).

dense_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationDenseTensor = <betterproto._PLACEHOLDER object>

varlen_sparse_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationVarLenSparseTensor = <betterproto._PLACEHOLDER object>

sparse_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationSparseTensor = <betterproto._PLACEHOLDER object>

ragged_tensor: merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensor = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.TensorRepresentationDefaultValue(float_value: float = <betterproto._PLACEHOLDER object>, int_value: int = <betterproto._PLACEHOLDER object>, bytes_value: bytes = <betterproto._PLACEHOLDER object>, uint_value: int = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

float_value: float = <betterproto._PLACEHOLDER object>

int_value: int = <betterproto._PLACEHOLDER object>

bytes_value: bytes = <betterproto._PLACEHOLDER object>

uint_value: int = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.TensorRepresentationDenseTensor(column_name: str = <betterproto._PLACEHOLDER object>, shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>, default_value: merlin_standard_lib.proto.schema_bp.TensorRepresentationDefaultValue = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

A tf.Tensor

column_name: str = <betterproto._PLACEHOLDER object>

shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>

default_value: merlin_standard_lib.proto.schema_bp.TensorRepresentationDefaultValue = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.TensorRepresentationVarLenSparseTensor(column_name: str = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

A ragged tf.SparseTensor that models nested lists.

column_name: str = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.TensorRepresentationSparseTensor(dense_shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>, index_column_names: List[str] = <betterproto._PLACEHOLDER object>, value_column_name: str = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

A tf.SparseTensor whose indices and values come from separate data columns. This will replace Schema.sparse_feature eventually. The index columns must be of INT type, and all the columns must co-occur and have the same valency at the same row.

dense_shape: merlin_standard_lib.proto.schema_bp.FixedShape = <betterproto._PLACEHOLDER object>

index_column_names: List[str] = <betterproto._PLACEHOLDER object>

value_column_name: str = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensor(feature_path: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>, partition: List[merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensorPartition] = <betterproto._PLACEHOLDER object>, row_partition_dtype: merlin_standard_lib.proto.schema_bp.TensorRepresentationRowPartitionDType = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

A tf.RaggedTensor that models nested lists. Currently there is no way for the user to specify the shape of the leaf value (the innermost value tensor of the RaggedTensor). The leaf value will always be a 1-D tensor.

feature_path: merlin_standard_lib.proto.schema_bp.Path = <betterproto._PLACEHOLDER object>

partition: List[merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensorPartition] = <betterproto._PLACEHOLDER object>

row_partition_dtype: merlin_standard_lib.proto.schema_bp.TensorRepresentationRowPartitionDType = <betterproto._PLACEHOLDER object>

class merlin_standard_lib.proto.schema_bp.TensorRepresentationRaggedTensorPartition(uniform_row_length: int = <betterproto._PLACEHOLDER object>, row_length: str = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

Further partition of the feature values at the leaf level.

uniform_row_length: int = <betterproto._PLACEHOLDER object>

row_length: str = <betterproto._PLACEHOLDER object>
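The two partition kinds can be illustrated on a flat list of leaf values. This sketch assumes that uniform_row_length splits the values into equal-sized rows and that row_length names a feature holding one length per row; both readings follow the field names and are not spelled out on this page. The helper functions are hypothetical.

```python
from typing import List, Sequence


def partition_uniform(values: Sequence[int], uniform_row_length: int) -> List[List[int]]:
    """Hypothetical helper: split `values` into rows of equal length."""
    if uniform_row_length <= 0 or len(values) % uniform_row_length != 0:
        raise ValueError("values cannot be split evenly into rows of that length")
    return [
        list(values[start:start + uniform_row_length])
        for start in range(0, len(values), uniform_row_length)
    ]


def partition_by_row_lengths(
    values: Sequence[int], row_lengths: Sequence[int]
) -> List[List[int]]:
    """Hypothetical helper: split `values` using one length per row."""
    if sum(row_lengths) != len(values):
        raise ValueError("row lengths must add up to the number of values")
    rows, start = [], 0
    for length in row_lengths:
        rows.append(list(values[start:start + length]))
        start += length
    return rows


leaf_values = [1, 2, 3, 4, 5, 6]
uniform_rows = partition_uniform(leaf_values, 2)
ragged_rows = partition_by_row_lengths(leaf_values, [3, 0, 1, 2])
```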

class merlin_standard_lib.proto.schema_bp.TensorRepresentationGroup(tensor_representation: Dict[str, merlin_standard_lib.proto.schema_bp.TensorRepresentation] = <betterproto._PLACEHOLDER object>)[source]

Bases: betterproto.Message

A TensorRepresentationGroup is a collection of TensorRepresentations with names. These names may serve as identifiers when converting the dataset to a collection of Tensors or tf.CompositeTensors. For example, given the following group:

    {
      key: "dense_tensor"
      tensor_representation {
        dense_tensor {
          column_name: "univalent_feature"
          shape { dim { size: 1 } }
          default_value { float_value: 0 }
        }
      }
    }
    {
      key: "varlen_sparse_tensor"
      tensor_representation {
        varlen_sparse_tensor { column_name: "multivalent_feature" }
      }
    }

Then the schema is expected to have the features "univalent_feature" and "multivalent_feature", and when a batch of data is converted to Tensors using this TensorRepresentationGroup, the result may be the following dict:

    {
      "dense_tensor": tf.Tensor(...),
      "varlen_sparse_tensor": tf.SparseTensor(...),
    }

tensor_representation: Dict[str, merlin_standard_lib.proto.schema_bp.TensorRepresentation] = <betterproto._PLACEHOLDER object>

Module contents