Fun times with energy-based models
Personal blog of Michal Pándy. Come for the AI, stay for the jokes.
As we can see in the visualization, the samples (left) evolve from scattered points into the characteristic two-moon shape, while the score field (right) progressively aligns with regions of high data density.

This approach is viable in NCE because the structure of the objective prevents trivial solutions where $c$ could grow arbitrarily large, unlike in maximum likelihood estimation, where treating $c$ as a free parameter would let the likelihood grow without bound. A minimal sketch of this setup follows below.

Non-smooth energy functions: when sampling with gradient-based methods like Langevin dynamics on deep neural network energies, it can help to apply gradient clipping or some form of spectral regularization.
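To make the NCE point concrete, here is a minimal sketch of that setup, assuming a small PyTorch energy network; the architecture, the noise distribution, and the names `NCEModel` and `nce_loss` are illustrative choices, not a canonical implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NCEModel(nn.Module):
    """Unnormalized model: log p(x) = -E(x) + c, where c is a learnable
    scalar standing in for the (intractable) log-partition function."""
    def __init__(self, dim=2):
        super().__init__()
        self.energy = nn.Sequential(
            nn.Linear(dim, 64), nn.SiLU(),
            nn.Linear(64, 1),
        )
        self.c = nn.Parameter(torch.zeros(()))  # learnable log-normalizer

    def log_prob(self, x):
        return -self.energy(x).squeeze(-1) + self.c

def nce_loss(model, x_data, noise_dist):
    """NCE with one noise sample per data point: logistic regression with
    logit log p_model(x) - log p_noise(x). Inflating c pushes every noise
    sample towards the "data" label and gets penalized, which is why c
    cannot grow without bound here."""
    x_noise = noise_dist.sample((x_data.shape[0],))
    logit_data = model.log_prob(x_data) - noise_dist.log_prob(x_data)
    logit_noise = model.log_prob(x_noise) - noise_dist.log_prob(x_noise)
    return (F.binary_cross_entropy_with_logits(
                logit_data, torch.ones_like(logit_data))
            + F.binary_cross_entropy_with_logits(
                logit_noise, torch.zeros_like(logit_noise)))
```

For two-moons data, a broad Gaussian such as `torch.distributions.MultivariateNormal(torch.zeros(2), 4 * torch.eye(2))` is a reasonable choice of noise distribution.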
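And for the Langevin tip, here is a sketch of unadjusted Langevin dynamics with per-sample gradient clipping; the step count, step size, and clip threshold are illustrative assumptions, not tuned values.

```python
import torch

def langevin_sample(energy_fn, x_init, n_steps=200, step_size=1e-2, clip=1.0):
    """Unadjusted Langevin dynamics on an energy E:
        x <- x - step_size * grad E(x) + sqrt(2 * step_size) * noise
    Clipping each sample's gradient norm guards against the occasional
    huge gradients that non-smooth deep energies can produce."""
    x = x_init.detach().clone()
    for _ in range(n_steps):
        x.requires_grad_(True)
        grad, = torch.autograd.grad(energy_fn(x).sum(), x)
        with torch.no_grad():
            # rescale so no sample's gradient norm exceeds `clip`
            norm = grad.norm(dim=-1, keepdim=True).clamp(min=1e-12)
            grad = grad * (clip / norm).clamp(max=1.0)
            x = x - step_size * grad + (2.0 * step_size) ** 0.5 * torch.randn_like(x)
    return x.detach()
```

The spectral-regularization alternative constrains the energy net's Lipschitz constant instead, e.g. by wrapping its linear layers with `torch.nn.utils.spectral_norm`.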