Computational complexity of neural networks (2022)
To motivate why we separate the training and inference phases of neural networks, it is useful to analyse their computational complexity. This essay assumes familiarity with asymptotic complexity analysis of algorithms, in particular big-O notation.
By assuming that gradient descent runs for $n$ iterations, and that there are $n$ layers each with $n$ neurons as we did with forward propagation, we find the total run-time of backpropagation to be:

$$O(n \cdot n^4) = O(n^5)$$

GPUs are specifically designed to run many matrix operations in parallel, since 3D geometry and animation can be expressed as a series of linear transformations.

We have discussed some theoretical aspects of learning representations of functions, including the role of neural networks, but we were unable to reach a definitive conclusion.
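The operation count behind this run-time estimate can be checked with a small experiment: count the scalar multiplications performed by one forward pass through $n$ layers of $n$ neurons. The specific layer shapes, the naive $O(n^3)$ per-layer matrix multiplication, and the ReLU non-linearity here are illustrative assumptions, not details taken from the analysis above.

```python
# A minimal sketch (illustrative assumptions, not the essay's exact setup):
# count scalar multiplications in one forward pass through n layers,
# where each layer multiplies an n x n weight matrix by an n x n
# activation matrix using naive O(n^3) matrix multiplication.
import numpy as np

def forward_pass_mults(n: int) -> int:
    """Run a toy forward pass and return the number of scalar multiplications."""
    rng = np.random.default_rng(0)
    acts = rng.standard_normal((n, n))  # activations for a batch of n inputs
    mults = 0
    for _ in range(n):  # n layers
        weights = rng.standard_normal((n, n))
        out = np.zeros((n, n))
        for i in range(n):           # naive matrix multiplication:
            for j in range(n):       # n * n * n scalar multiplications
                for k in range(n):
                    out[i, j] += weights[i, k] * acts[k, j]
                    mults += 1
        acts = np.maximum(out, 0)    # ReLU non-linearity
    return mults

# One forward pass costs n * n^3 = n^4 multiplications, so running
# n gradient-descent iterations is on the order of n^5 operations.
print(forward_pass_mults(4))  # 4^4 = 256
```

Doubling $n$ multiplies the count by $2^4 = 16$ for a single pass, which is what makes training, with its extra factor of $n$ iterations, so much more expensive than a single inference pass.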