Neural Networks

Neural networks are built from artificial neuron units, each of which sums its weighted inputs and produces an output that is a function of that sum. Deep learning arranges these units into layers: anywhere from 2 to 1,000 layers, but commonly 20 to 200. A training phase determines the weights used in the network. Training searches for the minimum of an error loss by repeatedly computing the sensitivity of the loss to changes in the weights and adjusting the weights accordingly. The loss sensitivity is computed by backpropagating terms that describe it from the output back through the layers.
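The training procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method of any particular framework: the two-layer shape, the tanh hidden units, the linear output unit, the squared-error loss, and the learning rate are all assumptions chosen to keep the example small.

```python
import math

def forward(w1, w2, x):
    """Two-layer network: each unit sums its weighted inputs, then applies a function."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    output = sum(w * h for w, h in zip(w2, hidden))  # linear output unit (assumed)
    return hidden, output

def loss(w1, w2, x, target):
    """Squared-error loss for a single example (assumed loss function)."""
    _, output = forward(w1, w2, x)
    return 0.5 * (output - target) ** 2

def backprop(w1, w2, x, target):
    """Return d(loss)/d(w1) and d(loss)/d(w2) by passing sensitivities backward."""
    hidden, output = forward(w1, w2, x)
    d_output = output - target                 # sensitivity of the loss at the output
    grad_w2 = [d_output * h for h in hidden]
    # Propagate the sensitivity back through each hidden unit (tanh' = 1 - tanh^2).
    d_hidden = [d_output * w * (1.0 - h * h) for w, h in zip(w2, hidden)]
    grad_w1 = [[dh * xi for xi in x] for dh in d_hidden]
    return grad_w1, grad_w2

def train_step(w1, w2, x, target, rate=0.1):
    """Adjust every weight a small step against its loss sensitivity."""
    grad_w1, grad_w2 = backprop(w1, w2, x, target)
    new_w1 = [[w - rate * g for w, g in zip(row, g_row)]
              for row, g_row in zip(w1, grad_w1)]
    new_w2 = [w - rate * g for w, g in zip(w2, grad_w2)]
    return new_w1, new_w2
```

Calling train_step repeatedly over many examples is the search for minimum loss; real networks do the same thing with far more units, layers, and examples.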

Training neural networks is very expensive. For instance, training a convolutional neural network to recognize 28x28 pixel digits with 98% accuracy requires processing around 500,000 image examples, some of which may be repeats, and 99% accuracy requires 3,000,000 image examples. Running on a late-2014 era GPU (one half of an Nvidia Tesla K80 in an Amazon EC2 p2.xlarge instance), this takes 100 and 610 seconds respectively. By contrast, once a neural network has been trained, forward execution (also called prediction or inference) is very fast: recognizing a digit takes 26 microseconds on the same hardware. Thus training takes around 10 million times longer than inference.
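The arithmetic behind that comparison can be checked directly. The sketch below uses only the figures quoted above; the function and variable names are illustrative.

```python
# Figures quoted in the text above.
TRAINING_RUNS = {
    "98%": {"examples": 500_000, "seconds": 100.0},
    "99%": {"examples": 3_000_000, "seconds": 610.0},
}
INFERENCE_SECONDS = 26e-6  # time to recognize one digit

def training_to_inference_ratio(train_seconds, inference_seconds=INFERENCE_SECONDS):
    """How many single-digit inferences fit into one full training run."""
    return train_seconds / inference_seconds

def seconds_per_training_example(examples, seconds):
    """Average training time spent on each image example."""
    return seconds / examples

if __name__ == "__main__":
    for accuracy, run in TRAINING_RUNS.items():
        ratio = training_to_inference_ratio(run["seconds"])
        per_example = seconds_per_training_example(run["examples"], run["seconds"])
        print(f"{accuracy}: training/inference = {ratio:.3g}, "
              f"{per_example * 1e6:.0f} microseconds per training example")
```

The two ratios come out at roughly 3.8 million and 23 million, which bracket the "around 10 million" figure. Per example, training costs about 200 microseconds, roughly eight times the cost of one inference; most of the gap comes from the number of examples that must be processed.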

Deep neural networks have been around since the 1970s. However, it is only since about 2010 that neural networks have started outperforming other approaches. This is the result of a number of engineering issues being resolved: the hardware power needed for neural networks is now readily available, and we have better architectural models (convolutional neural networks and rectified linear units) and better training methods (cross entropy, early stopping, and dropout).

Deep neural networks have achieved a lot of success at classification, translation, and audio/visual processing tasks. These all involve determining the best mapping from an input to an output value. Since 2014, however, there have been a series of apparent breakthroughs broadening the scope of what deep neural networks are able to achieve:

Concurrent with this broadening of scope, several qualitative neural network architectural improvements have also occurred:

Power consumption at human brain scale

When contemplating simulation at the level of the human brain, electrical power consumption can be a dominating cost.

It is difficult to translate neural network performance into equivalent human brain performance. The architectures differ: human brain neurons have far more synapses than the typical deep learning neuron has inputs, and human brain neurons only fire intermittently. Nonetheless, if we assume activation levels are stored in DRAM memory and 320 pJ per DRAM access (JASON, 2017, Perspective on AI and AGI relevant to DoD, pg. 45):

On the other hand, if we assume the use of a deep learning hardware accelerator such as the Efficient Inference Engine (JASON, 2017, Perspective on AI and AGI relevant to DoD, pg. 48), which is estimated to achieve 3.5x10^11 operations per Joule:

The difficulty then becomes that it is unlikely to be possible to fit sufficient hardware on a single chip, forcing some accesses to occur off chip, with the associated power consumption. The true cost today therefore probably lies somewhere between these two estimates.
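How the two per-operation energy figures turn into power bounds can be sketched as follows. The 320 pJ per DRAM access and 3.5x10^11 operations per Joule come from the text; the brain-scale workload of 10^15 operations per second, and the choice of one DRAM access per operation, are purely hypothetical placeholders and are not taken from this page or from the JASON report.

```python
DRAM_JOULES_PER_ACCESS = 320e-12     # 320 pJ per DRAM access (from the text)
ACCELERATOR_OPS_PER_JOULE = 3.5e11   # accelerator efficiency (from the text)

# HYPOTHETICAL workload, for illustration only: substitute your own estimate
# of brain-scale operations per second.
HYPOTHETICAL_OPS_PER_SECOND = 1e15

def dram_bound_watts(ops_per_second, accesses_per_op=1.0):
    """Power if every operation pays for DRAM access (one per op is an assumption)."""
    return ops_per_second * accesses_per_op * DRAM_JOULES_PER_ACCESS

def accelerator_bound_watts(ops_per_second):
    """Power if every operation runs at the accelerator's quoted efficiency."""
    return ops_per_second / ACCELERATOR_OPS_PER_JOULE

if __name__ == "__main__":
    upper = dram_bound_watts(HYPOTHETICAL_OPS_PER_SECOND)
    lower = accelerator_bound_watts(HYPOTHETICAL_OPS_PER_SECOND)
    print(f"DRAM-bound estimate:        {upper:,.0f} W")
    print(f"Accelerator-bound estimate: {lower:,.0f} W")
    print(f"Ratio between the two:      {upper / lower:.0f}x")
```

Whatever workload is assumed, the ratio between the two estimates is fixed by the quoted figures at 320 pJ x 3.5x10^11 per Joule = 112, so the two bounds sit about two orders of magnitude apart.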

AI Policies Wiki: NeuralNetworks (last edited 2018-04-13 04:48:18 by GordonIrlam)