An unexpected detour into partially symbolic, sparsity-exploiting autodiff
Exploiting linearity and sparsity to speed up JAX Hessians and slowly ruin my life.
In practice, people have found that Laplace approximations do a reasonable job of quantifying uncertainty even in complex neural network models, and they sit at the heart of any number of classical estimators in statistics. The question, then, is whether we can generalise the observation that a diagonal Hessian can be recovered with a single Hessian-vector product to more general sparsity structures. Because JAX doesn’t do a symbolic transformation of the program (it only traces through the paths associated with specific values), there is no guarantee that the sparsity pattern of \(H\) stays the same at each step.
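To make the diagonal case concrete, here is a minimal sketch (the function `f` and the helper `hvp` are illustrative, not from the post): when the Hessian is diagonal, multiplying it against a vector of ones returns every nonzero entry in one pass, so a single forward-over-reverse Hessian-vector product suffices.

```python
import jax
import jax.numpy as jnp

# Hessian-vector product via forward-over-reverse autodiff.
def hvp(f, x, v):
    return jax.jvp(jax.grad(f), (x,), (v,))[0]

# A function whose Hessian is exactly diagonal: a sum of
# element-wise terms, so all cross-derivatives vanish.
def f(x):
    return jnp.sum(x**4 + jnp.sin(x))

x = jnp.arange(1.0, 5.0)

# Because H is diagonal, H @ ones recovers the whole diagonal at once.
diag_estimate = hvp(f, x, jnp.ones_like(x))

# Check against the dense Hessian (only feasible at this tiny size).
dense_diag = jnp.diag(jax.hessian(f)(x))
print(jnp.allclose(diag_estimate, dense_diag))  # True
```

The general-sparsity version of this trick needs to know which columns of \(H\) can share a probe vector, which is exactly where the lack of a fixed, symbolic sparsity pattern in JAX becomes a problem.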