Neural networks are built from artificial neuron units that sum their inputs and produce an output that is a function of that sum. Deep learning involves arranging units into layers, anywhere from 2 to 1,000 layers, but commonly 20 to 200. A training phase determines the weights used in the neural network: it searches for the minimum of the loss by repeatedly computing the sensitivity of the loss to changes in the weights and adjusting the weights accordingly. The loss sensitivity is computed by propagating terms that describe it backwards through the layers (backpropagation).
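A single unit of this kind can be sketched in a few lines. The sigmoid activation below is just one common choice of output function, used here purely for illustration:

```python
import math

def neuron(inputs, weights, bias):
    """Sum weighted inputs plus a bias, then apply a sigmoid activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid squashes to (0, 1)

# A unit whose weighted input sum is strongly positive outputs close to 1.
print(neuron([1.0, 0.5], [2.0, -1.0], 0.0))
```

Deep learning stacks many such units: the outputs of one layer become the inputs of the next, and training adjusts the `weights` and `bias` values throughout the stack.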
Training of neural networks is very expensive. For instance, training a convolutional neural network to recognize 28x28 pixel digits with 98% accuracy requires processing around 500,000 image examples, some of which may be repeated, and 99% accuracy requires 3,000,000 image examples. Running on a late 2014 era GPU (one half of an Nvidia Tesla K80 in an Amazon EC2 p2.xlarge instance) this takes 100 and 610 seconds respectively. By contrast, once a neural network has been trained, forward execution (prediction, or inference) is very fast: recognizing a digit takes 26 microseconds on the same hardware. Thus a training run takes millions of times longer than a single inference, roughly 4 million to 23 million times for the figures above.
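The training-to-inference ratio follows directly from the figures quoted:

```python
inference_s = 26e-6           # seconds to recognize one digit

ratio_98 = 100 / inference_s  # 100 s training run -> ~3.8 million
ratio_99 = 610 / inference_s  # 610 s training run -> ~23 million

print(f"{ratio_98:,.0f}  {ratio_99:,.0f}")
```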
Deep neural networks have been around since the 1970s. However, it is only since about 2010 that neural networks have started outperforming other approaches. This is the result of a number of engineering issues being resolved: the hardware power needed for neural networks is readily available, we have better architectural models (convolutional neural networks and rectified linear units), and better training methods (cross entropy, early stopping, and dropout).
Deep neural networks have achieved a lot of success at classification, translation, and audio/visual processing tasks. These all involve determining the best mapping from an input to an output value. Since 2014, however, there has been a series of apparent breakthroughs broadening the scope of what deep neural networks are able to successfully achieve:
- 2014 Neural Turing machines (Graves et al., 2014, Neural Turing Machines). Neural networks have historically lacked the equivalent of working memory. Building on Long Short-Term Memory (LSTM), Neural Turing machines provide a way for neural networks to read and write memory cells, optionally using content-based addressing.
- 2015 Reinforcement learning (Mnih et al., 2015, Human-level control through deep reinforcement learning). Reinforcement learning involves learning which action to take to maximize the total eventual reward, given that you are in a particular current state and have a set of available actions. It has broad applicability, but one simple example is playing a video game.
- 2016 Strategic game playing (Silver et al., 2016, Mastering the game of Go with deep neural networks and tree search). Go was the last of the classical board games to succumb to AI, through the use of deep neural networks for move policy and position-value evaluation.
- 2016 Neural programmer-interpreters (Reed & de Freitas, 2016, Neural Programmer-Interpreters). Humans solve problems hierarchically: getting ready for work involves getting out of bed, getting dressed, eating breakfast, and brushing your teeth, and each of these tasks is itself composed of smaller and smaller tasks. Neural Programmer-Interpreters (NPI) allow a neural network to learn tasks involving sub-tasks. Example problems solved so far include adding multi-digit numbers given example sums, and sorting lists of numbers via bubble sort.
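The content-based addressing used by Neural Turing machines can be illustrated with a minimal sketch: compare a key vector against each memory cell by cosine similarity, then normalize the similarities with a softmax. The memory contents and the sharpening strength `beta` below are illustrative assumptions, not values from any of the papers above:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def content_address(memory, key, beta=1.0):
    """Softmax over key/cell cosine similarities, sharpened by beta."""
    scores = [beta * cosine(cell, key) for cell in memory]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

memory = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
weights = content_address(memory, [1.0, 0.0], beta=5.0)
print(weights)  # highest weight falls on the cell most similar to the key
```

The resulting weights sum to 1 and concentrate on cells resembling the key, so a read is a weighted average over memory and remains differentiable, which is what lets the whole mechanism be trained by backpropagation.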
Concurrent with this broadening of scope, several qualitative neural network architectural improvements have also occurred:
- 2016 Residual learning (He et al., 2016, Deep Residual Learning for Image Recognition). Learning residuals rather than more complicated functions makes it possible to significantly increase the depth of neural networks and improve their accuracy.
- 2017 Pure Monte Carlo tree search based reinforcement learning (Silver et al., 2017, Mastering the Game of Go without Human Knowledge). A simpler architecture for policy-iteration-based reinforcement learning, combined with residual learning, significantly outperforms what was previously achieved.
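The idea behind residual learning is simple: instead of asking a layer to learn a full mapping H(x), ask it to learn only the residual F(x) = H(x) - x, and add the input back through a skip connection. A minimal sketch follows; the one-layer linear+ReLU F below is an illustrative stand-in for the convolutional blocks used in the paper:

```python
def relu(xs):
    return [max(0.0, v) for v in xs]

def residual_block(x, weights, bias):
    """Compute F(x) with a toy linear+ReLU layer, then add the skip connection x."""
    f = relu([sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, bias)])
    return [xi + fi for xi, fi in zip(x, f)]  # y = x + F(x)

# With all-zero weights F(x) is zero, so the block is exactly the identity.
x = [1.0, -2.0, 3.0]
print(residual_block(x, [[0.0] * 3] * 3, [0.0] * 3))  # -> [1.0, -2.0, 3.0]
```

Because an untrained block defaults to (approximately) the identity, stacking many of them does not degrade the signal, which is why residual networks can be made far deeper than earlier architectures.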
Power consumption at human brain scale
When contemplating human-brain-level simulation, electrical power consumption can be a dominating cost.
Human brain: 86x10^9 neurons x 1,000 synapses/neuron x perhaps 10 Hz spiking rate = 8.6x10^14 synaptic operations/second
It is difficult to translate neural network performance into equivalent human brain performance: the architectures differ, human brain neurons have far more synapses than the typical deep learning neuron has inputs, and human brain neurons only fire intermittently. Nonetheless, if we assume activation levels are stored in DRAM and each DRAM access costs 320 pJ (JASON, 2017, Perspective on AI and AGI relevant to DoD, pg. 45):
- Human brain level performance (all memory accesses off-chip): $28/hr (assuming $0.10/kWh)
On the other hand, if we assume the use of a deep learning hardware accelerator such as the Efficient Inference Engine (JASON, 2017, Perspective on AI and AGI relevant to DoD, pg. 48), which is estimated to achieve 3.5x10^11 operations per Joule:
- Human brain level performance (all memory accesses on-chip): $0.25/hr (assuming $0.10/kWh)
The difficulty is that it is unlikely that sufficient hardware can fit on a single chip, forcing some accesses to occur off-chip, with the associated power consumption. The true cost today therefore probably lies somewhere between these two estimates.
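Both estimates follow from straightforward arithmetic on the figures quoted above:

```python
ops_per_second = 86e9 * 1_000 * 10        # 8.6e14 synaptic operations/second
price_per_kwh = 0.10                      # dollars per kWh

# Off-chip case: 320 pJ per DRAM access, one access per synaptic operation.
watts_dram = ops_per_second * 320e-12     # ~275 kW
cost_dram = watts_dram / 1_000 * price_per_kwh    # dollars/hour, ~$28/hr

# On-chip accelerator case: 3.5e11 operations per Joule.
watts_accel = ops_per_second / 3.5e11     # ~2.5 kW
cost_accel = watts_accel / 1_000 * price_per_kwh  # dollars/hour, ~$0.25/hr

print(round(cost_dram, 2), round(cost_accel, 2))
```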