# Get Started With SparseOperationKit #

This document walks you through simple demos to get you familiar with SparseOperationKit (SOK).

See also

For advanced usage and more examples, refer to the Examples section.

Refer to the [*Installation* section](https://nvidia-merlin.github.io/HugeCTR/sparse_operation_kit/master/intro_link.html#installation) to install SparseOperationKit on your system.

## Import SparseOperationKit ##

```python
import sparse_operation_kit as sok
```

SOK supports TensorFlow 1.15 and TensorFlow 2.6 or later. It automatically detects the TensorFlow version in use, and the SOK interface is identical regardless of which TensorFlow version is used.

## Use SOK with TensorFlow ##

SOK currently uses Horovod for communication, so you first need to import Horovod and bind one GPU to each process:

```python
import numpy as np
import tensorflow as tf
import horovod.tensorflow as hvd
import sparse_operation_kit as sok

hvd.init()
gpus = tf.config.experimental.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], "GPU")  # nopep8
sok.init()
```

Next, to use the distributed embedding op, you need to create a variable on each process that holds a portion of the rows of the full embedding table. SOK provides a TensorFlow variable wrapper, `sok.Variable`, to simplify this process.

```python
# The default mode of sok.Variable is Distributed mode.
# If there are 2 GPUs in total, the shape of v1 on GPU0 will be [9, 3] and the shape
# on GPU1 will be [8, 3]
v1 = sok.Variable(np.arange(17 * 3).reshape(17, 3), dtype=tf.float32)
v2 = sok.Variable(np.arange(7 * 5).reshape(7, 5), dtype=tf.float32)
print("v1:\n", v1)
print("v2:\n", v2)
```

Then, create the indices for the embedding lookup. This step is no different from standard TensorFlow.

```python
indices1 = tf.SparseTensor(
    indices=[[0, 0], [0, 1], [1, 0], [1, 1], [1, 2]],
    values=[1, 1, 3, 4, 5],
    dense_shape=[2, 3]
)
print("indices1:\n", indices1)
# indices1: batch_size=2, max_hotness=3
# [[1, 1]
#  [3, 4, 5]]

indices2 = tf.SparseTensor(
    indices=[[0, 0], [1, 0], [1, 1]],
    values=[1, 2, 3],
    dense_shape=[2, 2]
)
print("indices2:\n", indices2)
# indices2: batch_size=2, max_hotness=2
# [[1]
#  [2, 3]]
```

Then, use SOK's embedding op to perform the lookup. Note that the two embedding variables and the two index tensors are passed into the lookup at the same time as lists; this fused operation improves performance.

```python
with tf.GradientTape() as tape:
    embeddings = sok.lookup_sparse(
        [v1, v2], [indices1, indices2], combiners=["sum", "sum"]
    )
    loss = 0.0
    for i, embedding in enumerate(embeddings):
        loss += tf.reduce_sum(embedding)
        print("embedding%d:\n" % (i + 1), embedding)
# embedding1: [[6, 8, 10]
#              [36, 39, 42]]
# embedding2: [[5, 6, 7, 8, 9]
#              [25, 27, 29, 31, 33]]
```
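If you run the demo with a single process, you can cross-check these numbers against plain TensorFlow: with a `sum` combiner, `sok.lookup_sparse` produces the same rows as `tf.nn.embedding_lookup_sparse` applied to an ordinary `tf.Variable` that holds the full table. The snippet below is only a sanity check and is not required in SOK code; `ref_v1` and `ref_emb1` are names introduced here for illustration.

```python
# Single-process sanity check against standard TensorFlow (not part of the demo).
ref_v1 = tf.Variable(np.arange(17 * 3).reshape(17, 3), dtype=tf.float32)
ref_emb1 = tf.nn.embedding_lookup_sparse(ref_v1, indices1, sp_weights=None, combiner="sum")
print(ref_emb1)
# [[ 6.  8. 10.]
#  [36. 39. 42.]]  <- matches embedding1 above
```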
Finally, update the variables as you would in standard TensorFlow.

```python
# If there are 2 GPUs in total:
# GPU0:
#   In Distributed mode: the gradient of v1 has shape [1, 3] and the gradient of v2 has shape [1, 5]
#   In Localized mode: the gradient of v1 has shape [4, 3] and the gradient of v2 is None
# GPU1:
#   In Distributed mode: the gradient of v1 has shape [3, 3] and the gradient of v2 has shape [2, 5]
#   In Localized mode: the gradient of v1 is None and the gradient of v2 has shape [3, 5]
grads = tape.gradient(loss, [v1, v2])
for i, grad in enumerate(grads):
    print("grad%d:\n" % (i + 1), grad)

# Use a tf.keras optimizer to update the sok.Variable
optimizer = tf.keras.optimizers.SGD(learning_rate=1.0)
optimizer.apply_gradients(zip(grads, [v1, v2]))
print("v1:\n", v1)
print("v2:\n", v2)
```
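To run the demo on multiple GPUs, collect the snippets above into a script and launch one process per GPU with Horovod, for example `horovodrun -np 2 python demo.py` (the script name here is only a placeholder). As a quick sanity check on the update above, the following plain-NumPy sketch, which is not part of the SOK API and reasons about the logical (unsharded) table, reproduces the expected values of the looked-up rows of `v1`: with a `sum` combiner and SGD at learning rate 1.0, every element of a looked-up row decreases by 1.0 for each time that row appears in the batch.

```python
# Plain-NumPy check of the SGD update (illustration only, not SOK code).
v1_initial = np.arange(17 * 3).reshape(17, 3).astype(np.float32)
lr = 1.0
# Row 1 appears twice in indices1 (values [1, 1]); rows 3, 4, and 5 appear once each.
expected_v1_row1 = v1_initial[1] - lr * 2.0  # [1. 2. 3.]
expected_v1_row3 = v1_initial[3] - lr * 1.0  # [8. 9. 10.]
print(expected_v1_row1, expected_v1_row3)
```

For more examples and API descriptions, see the Examples section and the API section.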