merlin.models.tf.TopKEncoder

class merlin.models.tf.TopKEncoder(*args, **kwargs)

Bases: merlin.models.tf.core.encoder.Encoder, merlin.models.tf.models.base.BaseModel

Block that can be used for top-k prediction and evaluation, initialized from a trained retrieval model.

Parameters

query_encoder (Union[Encoder, tf.keras.layers.Layer]) – The layer to use for encoding the query features.
topk_layer (Union[str, tf.keras.layers.Layer, TopKOutput]) – The layer to use for computing the top-k predictions. You can also pass the name of a registered top-k layer. The currently supported strategies are [brute-force-topk]. By default “brute-force-topk”.
candidates (Union[tf.Tensor, Dataset]) – The candidate embeddings to use for the top-k index. You can pass a tensor of pre-trained embeddings or a merlin.io.Dataset of pre-trained embeddings, indexed by the candidate ids. This is required when topk_layer is a string. By default None.
candidate_encoder (Union[Encoder, tf.keras.layers.Layer], optional) – The layer to use for encoding the item features. By default None.
k (int, optional) – Number of candidates to return. By default 10.
pre (Optional[tf.keras.layers.Layer]) – A block to use before encoding the input query. By default None.
post (Optional[tf.keras.layers.Layer]) – A block to use after getting the top-k prediction scores. By default None.
target (str, optional) – The name of the target. This is required when multiple targets are provided. By default None.
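
To make the “brute-force-topk” strategy more concrete, here is a minimal pure-Python sketch of exhaustive top-k scoring. It is not Merlin's implementation. The dot-product score, the helper name `brute_force_topk`, and the toy embeddings are all assumptions made for illustration; the real layer works on TensorFlow tensors.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def brute_force_topk(
    query: Sequence[float],
    candidates: Dict[int, Sequence[float]],
    k: int = 10,
) -> List[Tuple[int, float]]:
    """Score every candidate against the query and keep the k best.

    Hypothetical helper: it assumes a dot-product score, which the
    reference page does not specify.
    """
    scored = (
        (sum(q * c for q, c in zip(query, embedding)), candidate_id)
        for candidate_id, embedding in candidates.items()
    )
    # nlargest sorts by score first; the candidate id only breaks ties.
    top = heapq.nlargest(k, scored)
    return [(candidate_id, score) for score, candidate_id in top]


# Toy candidate embeddings keyed by candidate id (made-up values).
candidate_embeddings = {
    101: [1.0, 0.0],
    102: [0.0, 1.0],
    103: [0.5, 0.5],
}

result = brute_force_topk([1.0, 0.25], candidate_embeddings, k=2)
print(result)
```

Because every candidate is scored for every query, the cost grows linearly with the number of candidates, which is the usual trade-off of a brute-force index.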

__init__(query_encoder: Union[merlin.models.tf.core.encoder.Encoder, keras.engine.base_layer.Layer], topk_layer: Union[str, keras.engine.base_layer.Layer, merlin.models.tf.outputs.topk.TopKOutput] = 'brute-force-topk', candidates: Optional[Union[tensorflow.python.framework.ops.Tensor, merlin.io.dataset.Dataset]] = None, candidate_encoder: Optional[Union[merlin.models.tf.core.encoder.Encoder, keras.engine.base_layer.Layer]] = None, k: int = 10, pre: Optional[keras.engine.base_layer.Layer] = None, post: Optional[keras.engine.base_layer.Layer] = None, target: Optional[str] = None, **kwargs)

Methods

`__init__`(query_encoder[, topk_layer, ...])
`add_loss`(losses, **kwargs)	Add loss tensor(s), potentially dependent on layer inputs.
`add_metric`(value[, name])	Adds metric tensor to the layer.
`add_update`(updates)	Add update op(s), potentially dependent on layer inputs.
`add_variable`(*args, **kwargs)	Deprecated, do NOT use! Alias for add_weight.
`add_weight`([name, shape, dtype, ...])	Adds a new variable to the layer.
`adjust_predictions_and_targets`(predictions, ...)	Adjusts the predictions and targets to ensure compatibility with most Keras losses and metrics.
`batch_predict`(dataset, batch_size[, ...])	Batched top-k prediction using Dask.
`build`(input_shape)	Creates the variables of the layer.
`build_from_config`(config)
`call`(inputs, *[, targets, training, testing])	Calls the model on new inputs and returns the outputs as tensors.
`call_train_test`(x[, y, sample_weight, ...])	Apply the model's call method during Train or Test modes and prepare Prediction (v2) or PredictionOutput (v1, deprecated) objects.
`compile`([optimizer, loss, metrics, ...])	Extend the compile method of BaseModel to set the threshold k of the top-k encoder.
`compile_from_config`(config)
`compute_loss`([x, y, y_pred, sample_weight])	Compute the total loss, validate it, and return it.
`compute_mask`(inputs[, mask])	Computes an output mask tensor.
`compute_metrics`(prediction_outputs[, ...])	Overrides Model.compute_metrics() for some custom behaviour
`compute_output_shape`(input_shape)	Computes the output shape of the layer.
`compute_output_signature`(input_signature)	Compute the output tensor signature of the layer based on the inputs.
`count_params`()	Count the total number of scalars composing the weights.
`encode`(dataset, index, batch_size, **kwargs)	Encodes the given dataset and index.
`encode_candidates`(dataset[, index_column, ...])	Method to generate candidates embeddings
`evaluate`([x, y, batch_size, verbose, ...])
`evaluate_generator`(generator[, steps, ...])	Evaluates the model on a data generator.
`export`(filepath)	Create a SavedModel artifact for inference (e.g.
`finalize_state`()	Finalizes the layer's state after updating layer weights.
`fit`(*args, **kwargs)	Fit model
`fit_generator`(generator[, steps_per_epoch, ...])	Fits the model on data yielded batch-by-batch by a Python generator.
`from_candidate_dataset`(query_encoder, ...[, ...])	Class method to initialize a TopKEncoder from a dataset of raw candidates features.
`from_config`(config[, custom_objects])	Creates a new instance of the class by deserializing.
`get_build_config`()
`get_compile_config`()
`get_config`()	Returns the configuration of the model as a dictionary.
`get_input_at`(node_index)	Retrieves the input tensor(s) of a layer at a given node.
`get_input_mask_at`(node_index)	Retrieves the input mask tensor(s) of a layer at a given node.
`get_input_shape_at`(node_index)	Retrieves the input shape(s) of a layer at a given node.
`get_layer`([name, index])	Retrieves a layer based on either its name (unique) or index.
`get_metrics_result`()	Returns the model's metrics values as a dict.
`get_output_at`(node_index)	Retrieves the output tensor(s) of a layer at a given node.
`get_output_mask_at`(node_index)	Retrieves the output mask tensor(s) of a layer at a given node.
`get_output_shape_at`(node_index)	Retrieves the output shape(s) of a layer at a given node.
`get_weight_paths`()	Retrieve all the variables and their paths for the model.
`get_weights`()	Retrieves the weights of the model.
`index_candidates`(candidates[, identifiers])
`load_weights`(filepath[, skip_mismatch, ...])	Loads all layer weights from a saved file.
`make_predict_function`([force])	Creates a function that executes one step of inference.
`make_test_function`([force])	Creates a function that executes one step of evaluation.
`make_train_function`([force])	Creates a function that executes one step of training.
`metrics_results`()	Logic to consolidate metrics results extracted from standard Keras Model.compute_metrics()
`outputs_by_name`()	Returns the task names from the model outputs
`outputs_by_target`()	Method to index the model's prediction blocks by target names.
`predict`(x[, batch_size, verbose, steps, ...])
`predict_generator`(generator[, steps, ...])	Generates predictions for the input samples from a data generator.
`predict_on_batch`(x)	Returns predictions for a single batch of samples.
`predict_step`(data)	Custom predict step to obtain the outputs
`prediction_tasks_by_name`()
`prediction_tasks_by_target`()	Method to index the model's prediction tasks by target names.
`reset_metrics`()	Resets the state of all the metrics in the model.
`reset_states`()
`save`(export_path[, include_optimizer, ...])	Saves the model to export_path as a Tensorflow Saved Model.
`save_spec`([dynamic_batch])	Returns the tf.TensorSpec of call args as a tuple (args, kwargs).
`save_weights`(filepath[, overwrite, ...])	Saves all layer weights.
`set_weights`(weights)	Sets the weights of the layer, from NumPy arrays.
`summary`([line_length, positions, print_fn, ...])	Prints a string summary of the network.
`test_on_batch`(x[, y, sample_weight, ...])	Test the model on a single batch of samples.
`test_step`(data)	Custom test step using the compute_loss method.
`to_json`(**kwargs)	Returns a JSON string containing the network configuration.
`to_yaml`(**kwargs)	Returns a yaml string containing the network configuration.
`train_compute_metrics`(outputs, compiled_metrics)	Returns metrics for the outputs of this step.
`train_on_batch`(x[, y, sample_weight, ...])	Runs a single gradient update on a single batch of data.
`train_step`(data)	Performs a training step.
`with_name_scope`(method)	Decorator to automatically enter the module name scope.

Attributes

`activity_regularizer`	Optional regularizer function for the output of this layer.
`compute_dtype`	The dtype of the layer's computations.
`distribute_reduction_method`	The method employed to reduce per-replica values during training.
`distribute_strategy`	The tf.distribute.Strategy this model was created under.
`dtype`	The dtype of the layer weights.
`dtype_policy`	The dtype policy associated with this layer.
`dynamic`	Whether the layer is dynamic (eager-only); set in the constructor.
`first`	Returns the first block of the model.
`has_schema`	Returns True as this class does contain a schema.
`inbound_nodes`	Return Functional API nodes upstream of this layer.
`input`	Retrieves the input tensor(s) of a layer.
`input_mask`	Retrieves the input mask tensor(s) of a layer.
`input_schema`	Get the input schema if it's defined.
`input_shape`	Retrieves the input shape(s) of a layer.
`input_spec`	InputSpec instance(s) describing the input format for this layer.
`jit_compile`	Specify whether to compile the model with XLA.
`last`	Returns the last block of the model.
`layers`
`losses`	List of losses added using the add_loss() API.
`metrics`	Return metrics added using compile() or add_metric().
`metrics_names`	Returns the model's display labels for all outputs.
`model_outputs`	Returns a list with the ModelOutput in the model
`name`	Name of the layer (string), set in the constructor.
`name_scope`	Returns a tf.name_scope instance for this class.
`non_trainable_variables`
`non_trainable_weights`
`outbound_nodes`	Return Functional API nodes downstream of this layer.
`output`	Retrieves the output tensor(s) of a layer.
`output_mask`	Retrieves the output mask tensor(s) of a layer.
`output_shape`	Retrieves the output shape(s) of a layer.
`prediction_tasks`	Returns the Prediction tasks in the model.
`run_eagerly`	Settable attribute indicating whether the model should run eagerly.
`schema`	Returns the schema of the model.
`state_updates`	Deprecated, do NOT use!
`stateful`
`submodules`	Sequence of all sub-modules.
`supports_masking`	Whether this layer supports computing a mask using compute_mask.
`to_call`	Provides the list of blocks to be called during the execution of the model.
`topk_layer`
`trainable`
`trainable_variables`
`trainable_weights`
`updates`
`variable_dtype`	Alias of Layer.dtype, the dtype of the weights.
`variables`	Returns the list of all layer variables/weights.
`weights`	Returns the list of all layer variables/weights.

classmethod from_candidate_dataset(query_encoder: Union[merlin.models.tf.core.encoder.Encoder, keras.engine.base_layer.Layer], candidate_encoder: Union[merlin.models.tf.core.encoder.Encoder, keras.engine.base_layer.Layer], dataset: merlin.io.dataset.Dataset, top_k: int = 10, index_column: Optional[Union[str, merlin.schema.schema.ColumnSchema, merlin.schema.schema.Schema, merlin.schema.tags.Tags]] = None, **kwargs)

Class method to initialize a TopKEncoder from a dataset of raw candidate features.

Parameters

query_encoder (Union[Encoder, tf.keras.layers.Layer]) – The encoder layer to use for computing the query embeddings.
candidate_encoder (Union[Encoder, tf.keras.layers.Layer]) – The encoder layer to use for computing the candidate embeddings.
dataset (merlin.io.Dataset) – Raw candidate features dataset.
index_column (Union[str, ColumnSchema, Schema, Tags], optional) – The column to use as candidate identifiers. It is used to return the top-k ids of the candidates with the highest scores. If not specified, the candidate indices are used instead. By default None.
top_k (int, optional) – Number of candidates to return. By default 10.

Returns

A TopKEncoder indexed by the pre-trained embeddings of the candidates in the specified dataset.

Return type

TopKEncoder

compile(optimizer='rmsprop', loss=None, metrics=None, loss_weights=None, weighted_metrics=None, run_eagerly=None, steps_per_execution=None, jit_compile=None, k: Optional[int] = None, **kwargs)

Extend the compile method of BaseModel to set the threshold k of the top-k encoder.

property topk_layer

index_candidates(candidates, identifiers=None)

encode_candidates(dataset: merlin.io.dataset.Dataset, index_column: Optional[Union[str, merlin.schema.schema.ColumnSchema, merlin.schema.schema.Schema, merlin.schema.tags.Tags]] = None, candidate_encoder: Optional[Union[merlin.models.tf.core.encoder.Encoder, keras.engine.base_layer.Layer]] = None, **kwargs) → merlin.io.dataset.Dataset

Method to generate candidate embeddings.

Parameters

dataset (merlin.io.Dataset) – Raw candidate features dataset.
index_column (Union[str, ColumnSchema, Schema, Tags], optional) – The column to use as candidate identifiers. It is used to return the top-k ids of the candidates with the highest scores. If not specified, the candidate indices are used instead. By default None.
candidate_encoder (Union[Encoder, tf.keras.layers.Layer], optional) – The encoder layer to use for computing the candidate embeddings. If not specified, the candidate_encoder set in the constructor is used instead. By default None.

Returns

A merlin dataset of candidates embeddings, indexed by index_column.

Return type

merlin.io.Dataset
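
The role of index_column can be illustrated with a small pure-Python sketch. It only mimics the idea of keying each embedding by an identifier column and falling back to row positions when none is given. The names `encode_rows` and `toy_encoder`, the column names, and the values are hypothetical; the actual method operates on a merlin.io.Dataset and a trained encoder.

```python
from typing import Callable, Dict, Hashable, List, Mapping, Optional, Sequence

Row = Mapping[str, float]


def encode_rows(
    rows: Sequence[Row],
    encoder: Callable[[Row], List[float]],
    index_column: Optional[str] = None,
) -> Dict[Hashable, List[float]]:
    """Encode each raw candidate row and key the embedding by its identifier.

    Hypothetical helper: when index_column is None, the row position is
    used as the identifier, mirroring the documented fallback.
    """
    embeddings: Dict[Hashable, List[float]] = {}
    for position, row in enumerate(rows):
        key = row[index_column] if index_column is not None else position
        embeddings[key] = encoder(row)
    return embeddings


def toy_encoder(row: Row) -> List[float]:
    # Stand-in for a trained candidate encoder (made-up transformation).
    return [row["price"] / 10.0, row["rating"]]


raw_candidates = [
    {"item_id": 7, "price": 20.0, "rating": 4.0},
    {"item_id": 9, "price": 5.0, "rating": 3.0},
]

by_id = encode_rows(raw_candidates, toy_encoder, index_column="item_id")
by_position = encode_rows(raw_candidates, toy_encoder)
print(by_id)
print(by_position)
```

Keying by an identifier column means the top-k results can be reported as item ids rather than as row offsets into the candidate table.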

batch_predict(dataset: Union[merlin.io.dataset.Dataset, merlin.models.tf.loader.Loader], batch_size: int, output_schema: Optional[merlin.schema.schema.Schema] = None, **kwargs) → merlin.io.dataset.Dataset

Batched top-k prediction using Dask.

Parameters

dataset (Union[merlin.io.Dataset, merlin.models.tf.loader.Loader]) – Raw queries features dataset or Loader
batch_size (int) – The number of queries to process at each prediction step
output_schema (Schema, optional) – The columns to output from the input dataset

Returns

A merlin dataset with the top-k predictions, the candidates identifiers and related scores.

Return type

merlin.io.Dataset
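
As a rough illustration of what batch_size controls, the sketch below processes queries in fixed-size chunks and collects, for each query, the top candidate identifiers and their scores. It is plain Python rather than Dask, and the names `predict_in_batches` and `toy_topk`, the scalar queries, and the output keys are assumptions made for this example only.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Toy one-dimensional candidate embeddings keyed by identifier (made-up values).
CANDIDATES = {"a": 1.0, "b": -1.0, "c": 0.5}


def toy_topk(batch: Sequence[float], k: int = 2) -> List[Dict[str, list]]:
    """Hypothetical per-batch step: rank candidates for each query in the batch."""
    rows = []
    for query in batch:
        ranked = sorted(
            CANDIDATES.items(),
            key=lambda item: query * item[1],
            reverse=True,
        )[:k]
        rows.append(
            {
                "top_ids": [candidate_id for candidate_id, _ in ranked],
                "top_scores": [query * value for _, value in ranked],
            }
        )
    return rows


def predict_in_batches(
    queries: Sequence[float],
    batch_size: int,
    predict_fn: Callable[[Sequence[float]], List[Dict[str, list]]],
) -> Tuple[List[Dict[str, list]], List[int]]:
    """Run predict_fn on consecutive slices of at most batch_size queries."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    outputs: List[Dict[str, list]] = []
    batch_sizes: List[int] = []
    for start in range(0, len(queries), batch_size):
        batch = queries[start:start + batch_size]
        batch_sizes.append(len(batch))
        outputs.extend(predict_fn(batch))
    return outputs, batch_sizes


predictions, batch_sizes = predict_in_batches([2.0, -1.0, 1.0], 2, toy_topk)
print(batch_sizes)
print(predictions)
```

The last batch may be smaller than batch_size, and the output keeps one row per input query regardless of how the queries were chunked.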

fit(*args, **kwargs)

Fit model