SparseOperationKit Release Notes
The release notes for SparseOperationKit.
What’s new in Version 1.1.4
Added the sok.experiment module to integrate HugeCTR's 3G embedding:
sok.experiment.lookup_sparse, which supports distributed and fused embedding lookup.
sok.experiment.DynamicVariable, whose size can grow dynamically during lookup.
See API Docs -> Experiment for the other functions of the sok.experiment module.
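The following is a minimal sketch of how these two APIs fit together, assuming a Horovod launch with one process per GPU; the argument names (dimension, combiners) and the ragged-tensor inputs are assumptions that may differ across SOK versions.

```python
# A minimal sketch of the sok.experiment APIs named above; argument names
# such as `dimension` and `combiners` are assumptions and may differ
# across SOK versions.
import tensorflow as tf
import horovod.tensorflow as hvd
from sparse_operation_kit import experiment as sok

hvd.init()
sok.init()  # initialize SOK after Horovod

# A table whose capacity grows as new keys are looked up.
table = sok.DynamicVariable(dimension=16)

# Fused, distributed lookup over a list of tables; ragged rows hold a
# variable number of keys per sample.
ids = tf.ragged.constant([[0, 1, 2], [3, 4]], dtype=tf.int64)
pooled, = sok.lookup_sparse([table], [ids], combiners=["sum"])
print(pooled.shape)  # one pooled 16-dim vector per sample
```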
What’s new in Version 1.1.3
Updated the pip installation instructions and fixed some bugs.
What’s new in Version 1.1.2
Add TensorFlow Functional API support
What’s new in Version 1.1.1
Add Auto-Mixed-Precision training support (see the sketch after this list)
Add uint32 key dtype support
Add TensorFlow initializers support
Add DLRM benchmark results
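As a rough sketch, enabling these features from the TensorFlow side might look as follows; how a given SOK version wires the policy, key dtype, and initializer into its embedding layers may differ.

```python
# Illustrative setup for the features above using stock TensorFlow APIs;
# how SOK embedding layers consume these is version-dependent.
import tensorflow as tf

# Auto-Mixed-Precision: compute in float16 while variables stay float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# uint32 is now an accepted embedding key dtype.
keys = tf.constant([1, 2, 3], dtype=tf.uint32)

# Any standard TF initializer can be used for the embedding table.
initializer = tf.keras.initializers.RandomUniform(minval=-0.05, maxval=0.05)
```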
What’s new in Version 1.1.0
Supports TensorFlow 1.15.
Supports configuring visible devices via CUDA_VISIBLE_DEVICES (see the sketch after this list).
Added a dedicated CUDA stream for SOK’s Ops.
Supports pip installation.
Fixed hanging issue in tf.distribute.MirroredStrategy when the TensorFlow version is greater than 2.4.
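For example, a common way to pin one GPU per process is a sketch like the following; the LOCAL_RANK variable is an assumption about what your launcher exports.

```python
# Hypothetical per-process GPU pinning; LOCAL_RANK is assumed to be set
# by the launcher (e.g., 0..N-1 for N processes on one node). The variable
# must be set before TensorFlow initializes CUDA.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = os.environ.get("LOCAL_RANK", "0")

import tensorflow as tf  # import after CUDA_VISIBLE_DEVICES is set

print(tf.config.list_physical_devices("GPU"))  # each process sees one GPU
```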
What’s new in Version 1.0.1
Supports Horovod as the synchronized training communication tool.
Supports dynamic input in All2AllDenseEmbedding, which means the unique->lookup->gather pattern can be used (see the sketch after this list).
Supports IdentityHashtable, which means no hash-mapping is performed when inserting new keys.
Added a TF Distributed Embedding built entirely with TF's ops.
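The sketch below illustrates the unique->lookup->gather pattern, with a plain tf.Variable standing in for the All2AllDenseEmbedding layer.

```python
# The unique->lookup->gather pattern; a dense tf.Variable stands in here
# for the SOK All2AllDenseEmbedding layer (hypothetical table sizes).
import tensorflow as tf

table = tf.Variable(tf.random.uniform([100, 8]))  # 100 keys, 8-dim vectors

keys = tf.constant([5, 3, 5, 9, 3], dtype=tf.int32)
unique_keys, restore_idx = tf.unique(keys)               # deduplicate keys
unique_emb = tf.nn.embedding_lookup(table, unique_keys)  # look up unique keys only
emb = tf.gather(unique_emb, restore_idx)                 # restore original order
```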
What’s new in Version 1.0.0
Implemented a new framework that can be used to easily integrate different embedding algorithms into common DL frameworks.
Supports single-node & multi-node synchronized training with TensorFlow.
Integrated HugeCTR’s DistributedSparseEmbedding algorithm.
Integrated All2AllDenseEmbedding algorithm.
Added custom Adam optimizer for SOK when TF version <= 2.4.
Known Issues
There are several known issues in SparseOperationKit that we plan to fix in the near future.
In SparseOperationKit’s embedding layers, NCCL is used to transfer data among GPUs. When there are multiple embedding layers with no data dependencies among them, the layers must be launched in a deterministic order on every GPU; otherwise, the program might hang. For example, the following launch order across two GPUs can deadlock:
device-0: embedding-0 -> embedding-1
device-1: embedding-1 -> embedding-0
The solution to this problem is to make the program launch those layers in the same order on all GPUs. You can add tf.control_dependencies() between different SOK embedding layers to force a deterministic launching order, as sketched below.
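A minimal sketch, with tf.keras.layers.Embedding standing in for the SOK embedding layers:

```python
# emb0 and emb1 stand in for two SOK embedding layers with no data
# dependency between them (hypothetical vocabulary and vector sizes).
import tensorflow as tf

emb0 = tf.keras.layers.Embedding(1000, 16)
emb1 = tf.keras.layers.Embedding(1000, 16)

@tf.function
def forward(ids0, ids1):
    out0 = emb0(ids0)
    # Launch emb1 only after emb0 has been issued, so that every GPU
    # executes the two embedding layers in the same order.
    with tf.control_dependencies([out0]):
        out1 = emb1(ids1)
    return out0, out1

out0, out1 = forward(tf.constant([[1, 2]]), tf.constant([[3, 4]]))
```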