Compiling a neural net to C for a speedup
This research caught my attention (I mean, who doesn't love a pretty picture?), and as I read it, I realized the whole idea wouldn't be too hard to replicate.

Here's the big idea: we want the gradients to reach the input of the network, but if we initialize the gate weights uniformly, or even randomly, we get a flat activation function (one whose output barely depends on its inputs), so the gradient dies long before it reaches the input. The first sketch below shows the flattening numerically.

We're not doing anything fancy here, like testing for truisms; we just do the simple thing and take the transitive closure of all dependencies, starting from the root output and going backwards. The second sketch below shows that traversal.
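To see why uniform weights flatten things out, here's a minimal numeric sketch. It assumes the differentiable-logic-gate setup, where each soft gate is a softmax-weighted mixture of the 16 two-input boolean ops under their probabilistic relaxation; the op table, its ordering, and the names (`OPS`, `soft_gate`) are illustrative stand-ins, not the post's own code:

```python
import numpy as np

# Probabilistic relaxation of the 16 two-input boolean gates:
# treat a and b as probabilities and let each op compute
# E[f(A, B)] for A ~ Bernoulli(a), B ~ Bernoulli(b).
OPS = [
    lambda a, b: 0.0,                      # FALSE
    lambda a, b: a * b,                    # a AND b
    lambda a, b: a - a * b,                # a AND NOT b
    lambda a, b: a,                        # a (pass-through)
    lambda a, b: b - a * b,                # NOT a AND b
    lambda a, b: b,                        # b
    lambda a, b: a + b - 2 * a * b,        # a XOR b
    lambda a, b: a + b - a * b,            # a OR b
    lambda a, b: 1 - (a + b - a * b),      # NOR
    lambda a, b: 1 - (a + b - 2 * a * b),  # XNOR
    lambda a, b: 1 - b,                    # NOT b
    lambda a, b: 1 - b + a * b,            # a OR NOT b
    lambda a, b: 1 - a,                    # NOT a
    lambda a, b: 1 - a + a * b,            # NOT a OR b
    lambda a, b: 1 - a * b,                # NAND
    lambda a, b: 1.0,                      # TRUE
]

def softmax(w):
    e = np.exp(w - np.max(w))
    return e / e.sum()

def soft_gate(w, a, b):
    """A soft gate: the softmax(w)-weighted mixture of all 16 ops."""
    p = softmax(w)
    return sum(pi * op(a, b) for pi, op in zip(p, OPS))

a, b = 0.9, 0.2

# Uniform weights: for any fixed (a, b), exactly 8 of the 16 boolean
# functions are true, so the mixture is 0.5 no matter the input --
# a flat function whose gradient w.r.t. a and b is identically zero.
print(soft_gate(np.zeros(16), a, b))  # 0.5

# Bias the weights toward the pass-through gate and the output tracks
# `a` again, so gradients can flow back toward the network input.
w = np.zeros(16)
w[3] = 10.0
print(soft_gate(w, a, b))  # ~0.9
```

The constant-0.5 result falls out of a counting argument: for any fixed input pair, exactly half of the 16 boolean functions evaluate to true, so the uniform average cannot depend on the inputs at all.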
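And a sketch of that backwards walk; `live_gates` and the id-to-inputs mapping are hypothetical names standing in for however the real network is stored:

```python
def live_gates(outputs, inputs_of):
    """Transitive closure of dependencies: walk backwards from the
    output gates, marking every gate they transitively read from.
    Gates never reached are dead and don't need to be emitted as C.

    `outputs` is a list of gate ids; `inputs_of` maps a gate id to
    the ids of its input gates (absent for network inputs).
    """
    live = set()
    stack = list(outputs)
    while stack:
        g = stack.pop()
        if g in live:
            continue
        live.add(g)
        stack.extend(inputs_of.get(g, ()))
    return live

# Toy wiring: gate 5 is the root output; gate 4 feeds nothing reachable.
inputs_of = {2: (0, 1), 3: (1, 2), 4: (0, 2), 5: (2, 3)}
print(sorted(live_gates([5], inputs_of)))  # [0, 1, 2, 3, 5]: gate 4 is dead
```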