SparseOperationKit Release Notes
The release notes for SparseOperationKit.
What’s new in Version 1.1.4
Added the sok.experiment module to integrate HugeCTR's 3G embedding:
sok.experiment.lookup_sparse, which supports distributed and fused embedding lookup.
sok.experiment.DynamicVariable, whose size can grow dynamically during lookup.
See API Docs -> Experiment for the other functions of the sok.experiment module.
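The following is a minimal sketch of how these two APIs fit together, assuming a Horovod launch with one process per GPU; the argument names (dimension, combiners) and the ragged-tensor inputs are assumptions that may differ across SOK versions.

```python
# A minimal sketch of the sok.experiment APIs named above; argument names
# such as `dimension` and `combiners` are assumptions and may differ
# across SOK versions.
import tensorflow as tf
import horovod.tensorflow as hvd
from sparse_operation_kit import experiment as sok

hvd.init()
sok.init()  # initialize SOK after Horovod

# A table whose capacity grows as new keys are looked up.
table = sok.DynamicVariable(dimension=16)

# Fused, distributed lookup over a list of tables; ragged rows hold a
# variable number of keys per sample.
ids = tf.ragged.constant([[0, 1, 2], [3, 4]], dtype=tf.int64)
pooled, = sok.lookup_sparse([table], [ids], combiners=["sum"])
print(pooled.shape)  # one pooled 16-dim vector per sample
```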
What’s new in Version 1.1.3
Updated the pip installation instructions and fixed some bugs.
What’s new in Version 1.1.2
Add TensorFlow Functional API support
What’s new in Version 1.1.1
Add Auto-Mixed-Precision training support (see the sketch after this list)
Add uint32 key dtype support
Add TensorFlow initializers support
Add DLRM benchmark results
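As a rough sketch, enabling these features from the TensorFlow side might look as follows; how a given SOK version wires the policy, key dtype, and initializer into its embedding layers may differ.

```python
# Illustrative setup for the features above using stock TensorFlow APIs;
# how SOK embedding layers consume these is version-dependent.
import tensorflow as tf

# Auto-Mixed-Precision: compute in float16 while variables stay float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# uint32 is now an accepted embedding key dtype.
keys = tf.constant([1, 2, 3], dtype=tf.uint32)

# Any standard TF initializer can be used for the embedding table.
initializer = tf.keras.initializers.RandomUniform(minval=-0.05, maxval=0.05)
```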
What’s new in Version 1.1.0
Supports TensorFlow 1.15.
Supports configuring visible devices via CUDA_VISIBLE_DEVICES (see the sketch after this list).
Added a dedicated CUDA stream for SOK’s Ops.
Supports pip installation.
Fixed hanging issue in tf.distribute.MirroredStrategy when the TensorFlow version is greater than 2.4.
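For example, a common way to pin one GPU per process is a sketch like the following; the LOCAL_RANK variable is an assumption about what your launcher exports.

```python
# Hypothetical per-process GPU pinning; LOCAL_RANK is assumed to be set
# by the launcher (e.g., 0..N-1 for N processes on one node). The variable
# must be set before TensorFlow initializes CUDA.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = os.environ.get("LOCAL_RANK", "0")

import tensorflow as tf  # import after CUDA_VISIBLE_DEVICES is set

print(tf.config.list_physical_devices("GPU"))  # each process sees one GPU
```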
What’s new in Version 1.0.1
Supports Horovod as the synchronized training communication tool.
Supports dynamic input in All2AllDenseEmbedding, which means the unique->lookup->gather pattern can be used (see the sketch after this list).
Supports IdentityHashtable, which means no hash-mapping is performed when inserting new keys.
Added a TF Distributed Embedding built entirely with TF's ops.
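The sketch below illustrates the unique->lookup->gather pattern, with a plain tf.Variable standing in for the All2AllDenseEmbedding layer.

```python
# The unique->lookup->gather pattern; a dense tf.Variable stands in here
# for the SOK All2AllDenseEmbedding layer (hypothetical table sizes).
import tensorflow as tf

table = tf.Variable(tf.random.uniform([100, 8]))  # 100 keys, 8-dim vectors

keys = tf.constant([5, 3, 5, 9, 3], dtype=tf.int32)
unique_keys, restore_idx = tf.unique(keys)               # deduplicate keys
unique_emb = tf.nn.embedding_lookup(table, unique_keys)  # look up unique keys only
emb = tf.gather(unique_emb, restore_idx)                 # restore original order
```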
What’s new in Version 1.0.0
Implemented a new framework that can be used to easily integrate different embedding algorithms into common DL frameworks.
Supports single-node & multi-node synchronized training with TensorFlow.
Integrated HugeCTR’s DistributedSparseEmbedding algorithm.
Integrated All2AllDenseEmbedding algorithm.
Added custom Adam optimizer for SOK when TF version <= 2.4.
Known Issues
There are several known issues in SparseOperationKit that we plan to fix in the near future.
In SparseOperationKit’s embedding layers, NCCL is used to transfer data among GPUs. When there are multiple embedding layers with no data dependencies among them, the layers must be launched in a deterministic order on every GPU; otherwise, the program might hang. For example, the following launch order across two GPUs can deadlock:
device-0: embedding-0 -> embedding-1
device-1: embedding-1 -> embedding-0
The solution to this problem is to make the program launch those layers in the same order on all GPUs. You can add tf.control_dependencies() between different SOK embedding layers to force a deterministic launching order, as sketched below.
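A minimal sketch, with tf.keras.layers.Embedding standing in for the SOK embedding layers:

```python
# emb0 and emb1 stand in for two SOK embedding layers with no data
# dependency between them (hypothetical vocabulary and vector sizes).
import tensorflow as tf

emb0 = tf.keras.layers.Embedding(1000, 16)
emb1 = tf.keras.layers.Embedding(1000, 16)

@tf.function
def forward(ids0, ids1):
    out0 = emb0(ids0)
    # Launch emb1 only after emb0 has been issued, so that every GPU
    # executes the two embedding layers in the same order.
    with tf.control_dependencies([out0]):
        out1 = emb1(ids1)
    return out0, out1

out0, out1 = forward(tf.constant([[1, 2]]), tf.constant([[3, 4]]))
```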