An illustrated guide to automatic sparse differentiation
In numerous applications of machine learning, Hessians and Jacobians exhibit sparsity, a property that can be leveraged to vastly accelerate their computation. While the use of automatic differentiation is ubiquitous in machine learning, automatic sparse differentiation (ASD) remains largely unknown. This post introduces ASD, explaining its key components and their roles in the computation of both sparse Jacobians and Hessians. We conclude with a practical demonstration showcasing the performance benefits of ASD.
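To make the key components concrete before diving in, here is a minimal sketch of how a sparse Jacobian can be computed in Julia (the language used for the demonstration later in the post) with the DifferentiationInterface.jl, SparseConnectivityTracer.jl, and SparseMatrixColorings.jl packages. The toy function `f`, the input size, and the package choices are our own illustrative assumptions, not code from the post, and exact exports may vary between package versions.

```julia
# Minimal ASD sketch (illustrative; not the post's own demo code).
using DifferentiationInterface          # unified AD frontend
using SparseConnectivityTracer          # sparsity pattern detection
using SparseMatrixColorings             # coloring of the detected pattern
import ForwardDiff                      # underlying AD backend

# Toy function whose Jacobian is banded: y[i] = (x[i+1] - x[i])^2,
# so only J[i, i] and J[i, i+1] are nonzero.
f(x) = abs2.(diff(x))

# A sparse backend combines an AD backend, a sparsity detector,
# and a coloring algorithm.
backend = AutoSparse(
    AutoForwardDiff();
    sparsity_detector=TracerSparsityDetector(),
    coloring_algorithm=GreedyColoringAlgorithm(),
)

x = rand(100)
J = jacobian(f, backend, x)             # materializes J as a sparse matrix
```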
Since neural networks are usually trained with scalar loss functions, reverse-mode AD only requires a single VJP evaluation to materialize a gradient, which is rather cheap (see Baur and Strassen). Additionally, if we need to compute Jacobians multiple times (for different inputs) and can reuse the sparsity pattern and the coloring result, the cost of this prelude (sparsity detection and coloring) can be amortized over several subsequent evaluations.

We use Julia for our demonstration since we are not aware of a similar ecosystem in Python or R: at the time of writing, PyTorch, TensorFlow, and JAX lack comparable sparsity detection and coloring capabilities.
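As a hedged illustration of this amortization, the sketch below prepares the Jacobian computation once and reuses that preparation across many inputs, reusing the toy function and backend from the previous snippet. The preparation calls (`prepare_jacobian`, `jacobian` with a preparation argument) follow recent DifferentiationInterface.jl versions and may differ in detail from the post's demonstration.

```julia
# Amortizing the prelude (sparsity detection + coloring) over many evaluations.
# Illustrative sketch; preparation signatures may vary across
# DifferentiationInterface.jl versions.
using DifferentiationInterface, SparseConnectivityTracer, SparseMatrixColorings
import ForwardDiff

f(x) = abs2.(diff(x))                   # same banded toy function as above
backend = AutoSparse(
    AutoForwardDiff();
    sparsity_detector=TracerSparsityDetector(),
    coloring_algorithm=GreedyColoringAlgorithm(),
)

x = rand(100)
prep = prepare_jacobian(f, backend, x)  # detect the pattern and color it once
for _ in 1:1_000
    xk = rand(100)                      # new input, same sparsity pattern
    J = jacobian(f, prep, backend, xk)  # reuses the stored pattern and coloring
end
```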