A White Paper On The Future of Artificial Intelligence
Abstract
In this white paper, we discuss the current state of Artificial Intelligence (AI)
research and its future opportunities. We argue that solving the problem of invariant
representations is the key to overcoming the limitations inherent in today's neural
networks and to making progress towards Strong AI. Based on this premise, we
describe a research strategy towards the next generation of machine learning
algorithms beyond the currently dominant deep learning paradigm. Following the
example of biological brains, we propose an unsupervised learning approach to
solve the problem of invariant representations. A focused interdisciplinary research
effort is required to establish an abstract mathematical theory of invariant
representations and to apply it in the development of functional software algorithms,
while both applying and enhancing our conceptual understanding of the (human)
brain.
In 2012, Krizhevsky, Sutskever, and Hinton published their results on the ImageNet LSVRC-2010 contest, a computer vision challenge to automatically classify 1.2 million high-resolution images into 1,000 different classes. Their use of deep neural networks yielded a substantial improvement in error rate and marked the beginning of the recent wave of interest in machine learning and artificial intelligence. In the subsequent years, deep learning has been applied to a considerable number of other problems and used productively in applications such as voice recognition for digital assistants, translation software, and self-driving vehicles.
But despite all these impressive success stories, deep learning still suffers from severe limitations. For one thing, enormous amounts of labeled data are required to train the networks. Where a human child might learn to recognize an animal species or a class of objects from only a few examples, a deep neural network typically needs tens of thousands of images to achieve similar accuracy. For another thing, today's algorithms are clearly far from grasping the essence of an entity or a class in the way humans do. Many examples show how even the most modern neural networks fail spectacularly in cases that seem trivial to humans [22].
While neural networks are quite fashionable nowadays, their conceptual foundations are actually rather old; they were already being intensely studied in the 1950s and 1960s, inspired by the brain's anatomy as it was understood at the time. Today's deep neural networks are essentially the same as those classical networks except for their higher number of layers. They owe their success in recent years largely to an increase in computing power and to the availability of huge amounts of training data.
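The point that depth is the main architectural difference can be made concrete with a minimal sketch: a "deep" network is the same layer operation stacked more times. The layer sizes, initialisation, and activation below are illustrative, not taken from any particular network.

```python
import math
import random

random.seed(0)

def layer(n_in, n_out):
    """Randomly initialised fully connected layer: weight rows plus biases."""
    weights = [[random.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """Classic multilayer perceptron forward pass with ReLU activations."""
    for weights, biases in layers:
        x = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

# A shallow net (one hidden layer) and a deep net (six hidden layers)
# differ only in how many times the layer operation is repeated.
shallow = [layer(4, 8), layer(8, 2)]
deep = [layer(4, 8)] + [layer(8, 8) for _ in range(5)] + [layer(8, 2)]

x = [0.5, -1.0, 2.0, 0.1]
print(len(forward(x, shallow)), len(forward(x, deep)))  # both: 2 outputs
```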
Our central hypothesis, which drives our research strategy, is that the current limitations in AI can only be overcome by a new generation of algorithms. These algorithms will be inspired by today's neurosciences and, to some extent, by advances in our understanding of the brain which are yet to come. Our envisioned path forward is an interdisciplinary research effort.
∗Learn more at https://www.merckgroup.com/en/research/ai-research.html
One view is that conceptual knowledge is organized using the circuitry in the
medial temporal lobe (MTL) that supports spatial processing and navigation. In
contrast, we find that a domain-general learning algorithm explains key findings in
both spatial and conceptual domains. When the clustering model is applied to
spatial navigation tasks, so-called place and grid cell-like representations emerge
because of the relatively uniform distribution of possible inputs in these tasks. The
same mechanism, applied to conceptual tasks where the overall space can be
higher-dimensional and sampling sparser, leads to representations more
aligned with human conceptual knowledge. Although the types of memory
supported by the MTL are superficially dissimilar, the information processing steps
appear shared. Our account suggests that the MTL uses a general-purpose
algorithm to learn and organize context-relevant information in a useful format,
rather than relying on navigation-specific neural circuitry. Spatial maps in the
medial temporal lobe (MTL) have been proposed to map abstract conceptual
knowledge. Rather than grounding abstract knowledge in a spatial map, the
authors propose a general-purpose clustering algorithm that explains how both
spatial (including place and grid cells) and higher-dimensional conceptual
representations arise during learning.
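The clustering account sketched above can be illustrated with a toy competitive-learning model. This is a generic sketch, not the authors' implementation; the learning rate and number of clusters are arbitrary. The nearest cluster centre is nudged toward each input, and under uniformly distributed 2-D "spatial" inputs the centres spread out to tile the square, qualitatively like place-field centres.

```python
import random

random.seed(1)

def online_cluster(points, k, lr=0.1):
    """Competitive learning: move the nearest centre toward each input."""
    centres = [list(p) for p in random.sample(points, k)]
    for p in points:
        # find the nearest centre (squared Euclidean distance)
        j = min(range(k),
                key=lambda i: sum((c - x) ** 2 for c, x in zip(centres[i], p)))
        centres[j] = [c + lr * (x - c) for c, x in zip(centres[j], p)]
    return centres

# Uniformly sampled 2-D "spatial" inputs: the centres end up spread
# across the unit square rather than collapsing onto one region.
pts = [(random.random(), random.random()) for _ in range(5000)]
centres = online_cluster(pts, k=9)
```

With sparser, higher-dimensional inputs the same update rule would instead place centres only where samples actually occur, which is the paper's point about conceptual domains.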
The human brain has some capabilities that the brains of other animals lack. It is
to these distinctive capabilities that our species owes its dominant position. Other
animals have stronger muscles or sharper claws, but we have cleverer brains. If
machine brains one day come to surpass human brains in general intelligence,
then this new superintelligence could become very powerful. As the fate of the
gorillas now depends more on us humans than on the gorillas themselves, so the
fate of our species then would come to depend on the actions of the machine
superintelligence. But we have one advantage: we get to make the first move. Will
it be possible to construct a seed AI or otherwise to engineer initial conditions so
as to make an intelligence explosion survivable? How could one achieve a
controlled detonation? To get closer to an answer to this question, we must make
our way through a fascinating landscape of topics and considerations. Read the
book and learn about oracles, genies, singletons; about boxing methods,
tripwires, and mind crime; about humanity's cosmic endowment and differential
technological development; indirect normativity, instrumental convergence, whole
brain emulation and technology couplings; Malthusian economics and dystopian
evolution; artificial intelligence, and biological cognitive enhancement, and
collective intelligence.
We trained a large, deep convolutional neural network to classify the 1.2 million
high-resolution images in the ImageNet LSVRC-2010 contest into the 1000
different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5%
and 17.0% which is considerably better than the previous state-of-the-art. The
neural network, which has 60 million parameters and 650,000 neurons, consists
of five convolutional layers, some of which are followed by max-pooling layers,
and three fully-connected layers with a final 1000-way softmax. To make training
faster, we used non-saturating neurons and a very efficient GPU implementation
of the convolution operation. To reduce overfitting in the fully-connected layers we
employed a recently-developed regularization method called dropout that proved
to be very effective. We also entered a variant of this model in the ILSVRC-2012
competition and achieved a winning top-5 test error rate of 15.3%, compared to
26.2% achieved by the second-best entry.
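As a back-of-envelope check of the 60-million-parameter figure, the layer sizes quoted above can be tallied directly (assuming the grouped convolutions used to split the model across two GPUs apply to the second, fourth, and fifth convolutional layers; the helper functions are illustrative):

```python
def conv_params(k, c_in, c_out, groups=1):
    """Weights plus biases for a k x k convolution, optionally grouped."""
    return (k * k * (c_in // groups)) * c_out + c_out

def fc_params(n_in, n_out):
    """Weights plus biases for a fully connected layer."""
    return n_in * n_out + n_out

conv = (conv_params(11, 3, 96)
        + conv_params(5, 96, 256, groups=2)
        + conv_params(3, 256, 384)
        + conv_params(3, 384, 384, groups=2)
        + conv_params(3, 384, 256, groups=2))
fc = (fc_params(256 * 6 * 6, 4096)   # pooled conv output flattened
      + fc_params(4096, 4096)
      + fc_params(4096, 1000))       # final 1000-way softmax layer

total = conv + fc
print(f"{total:,}")  # roughly 61 million, dominated by the FC layers
```

The tally makes clear that almost all of the parameters sit in the three fully-connected layers, which is why dropout is applied there.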
There is a popular belief in neuroscience that we are primarily data limited, and
that producing large, multimodal, and complex datasets will, with the help of
advanced data analysis algorithms, lead to fundamental insights into the way the
brain processes information. These datasets do not yet exist, and if they did we
would have no way of evaluating whether or not the algorithmically-generated
insights were sufficient or even correct. To address this, here we take a classical
microprocessor as a model organism, and use our ability to perform arbitrary
experiments on it to see if popular data analysis methods from neuroscience can
elucidate the way it processes information. Microprocessors are among those
artificial information processing systems that are both complex and that we
understand at all levels, from the overall logical flow, via logical gates, to the
dynamics of transistors. We show that the approaches reveal interesting structure
in the data but do not meaningfully describe the hierarchy of information
processing in the microprocessor. This suggests current analytic approaches in
neuroscience may fall short of producing meaningful understanding of neural
systems, regardless of the amount of data. Additionally, we argue that scientists
should use complex non-linear dynamical systems with known ground truth, such as
the microprocessor, as a validation platform for time-series and structure-discovery
methods.
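The kind of ground-truth validation argued for here can be miniaturised. The toy "organism" below, an invented example rather than anything from the paper, is a one-bit full adder built from nine NAND gates: because its logic is known exactly, lesion-style experiments (forcing one gate's output to 0) can be run exhaustively and checked against ground truth.

```python
from itertools import product

GATES = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9"]

def full_adder(a, b, cin, lesioned=None):
    """NAND-only full adder; a lesioned gate's output is stuck at 0."""
    def nand(name, x, y):
        return 0 if name == lesioned else 1 - (x & y)
    g1 = nand("g1", a, b)
    g2 = nand("g2", a, g1)
    g3 = nand("g3", b, g1)
    g4 = nand("g4", g2, g3)   # g4 = a XOR b
    g5 = nand("g5", g4, cin)
    g6 = nand("g6", g4, g5)
    g7 = nand("g7", cin, g5)
    g8 = nand("g8", g6, g7)   # sum bit
    g9 = nand("g9", g5, g1)   # carry-out bit
    return g8, g9

# Lesion each gate in turn and count how many of the 8 input cases break.
errors = {}
for g in GATES:
    errors[g] = sum(full_adder(a, b, c) != full_adder(a, b, c, lesioned=g)
                    for a, b, c in product([0, 1], repeat=3))
```

Because every gate and wire is known, any proposed analysis method can be scored against the true causal structure, which is exactly what a biological preparation cannot offer.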
The game of Go has long been viewed as the most challenging of classic games
for artificial intelligence owing to its enormous search space and the difficulty of
evaluating board positions and moves. Here we introduce a new approach to
computer Go that uses ‘value networks’ to evaluate board positions and ‘policy
networks’ to select moves. These deep neural networks are trained by a novel
combination of supervised learning from human expert games, and reinforcement
learning from games of self-play. Without any lookahead search, the neural
networks play Go at the level of state-of-the-art Monte Carlo tree search programs
that simulate thousands of random games of self-play. We also introduce a new
search algorithm that combines Monte Carlo simulation with value and policy
networks. Using this search algorithm, our program AlphaGo achieved a 99.8%
winning rate against other Go programs, and defeated the human European Go
champion by 5 games to 0. This is the first time that a computer program has
defeated a human professional player in the full-sized game of Go, a feat
previously thought to be at least a decade away.
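How the search weighs the two networks can be sketched with a PUCT-style selection rule of the kind used in this family of programs: each candidate move is scored by its current value estimate plus an exploration bonus proportional to the policy network's prior and shrinking with visits. The constant and the toy numbers below are illustrative, not the paper's settings.

```python
import math

def puct_score(q, prior, n_parent, n_child, c_puct=1.0):
    """Mean action value plus an exploration bonus scaled by the policy prior."""
    return q + c_puct * prior * math.sqrt(n_parent) / (1 + n_child)

def select_move(children, c_puct=1.0):
    """Pick the move maximising the PUCT score during tree-search selection."""
    n_parent = sum(n for _, _, n in children.values())
    return max(children, key=lambda m: puct_score(
        children[m][0], children[m][1], n_parent, children[m][2], c_puct))

# move -> (mean value Q from value network / rollouts,
#          prior P from the policy network, visit count N); toy numbers.
children = {
    "a": (0.50, 0.10, 30),  # strong value estimate, already well explored
    "b": (0.45, 0.60, 2),   # high policy prior, barely explored
    "c": (0.20, 0.30, 10),
}
print(select_move(children))  # the prior-driven bonus steers search to "b"
```

As visit counts grow, the bonus term shrinks and the value estimates dominate, so simulation effort concentrates on moves that both networks endorse.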