Chapter 1
Cognitive Computing:
Concepts, Architectures,
Systems, and Applications
V.N. Gudivada
East Carolina University, Greenville, NC, United States
Corresponding author: e-mail: [email protected]
ABSTRACT
Cognitive computing is an emerging field ushered in by the synergistic confluence of
cognitive science, data science, and an array of computing technologies. Cognitive sci-
ence theories provide frameworks to describe various models of human cognition
including how information is represented and processed by the brain. Data science pro-
vides processes and systems to extract knowledge from both structured and unstructured
data. Cognitive computing employs the computing discipline’s theories, methods, and
tools to model human cognition. The recent advances in data science and computing
disciplines—neuromorphic processors, big data, predictive modeling, machine learning,
natural language understanding, and cloud computing—are accelerating advances in
cognitive science and cognitive computing.
The overarching goal of this chapter is to provide an interdisciplinary introduction
to cognitive computing. The focus is on breadth to provide a unified view of the disci-
pline. The chapter begins with an overview of cognitive science, data science, and cog-
nitive computing. The principal technology enablers of cognitive computing are
presented next. An overview of three major categories of cognitive architectures is pre-
sented, which is followed by a description of cognitive computing systems and their
applications. Trends and future research directions in cognitive computing are dis-
cussed. The chapter concludes by listing various cognitive computing resources.
Keywords: Cognitive computing, Cognitive architectures, Cognitive models, Cogni-
tive systems, Cognitive applications, Cognitive computing systems, Data science
1 INTRODUCTION
An autonomous system is a self-contained and self-regulated entity. The sys-
tem continually reconstitutes itself in real time in response to changes in its
environment (Vernon, 2014). The self-reorganization aspect embodies
learning, development, and evolution. Cognition is the process by which an
autonomous system acquires its knowledge and improves its behavior through
senses, thoughts, and experiences (Anderson, 1983). Cognitive processes are
critical to autonomous systems for their realization and existence (Franklin
et al., 2014; Newell, 1994).
Human cognition refers to the cognitive processes which enable humans to
perform various tasks, both mundane and highly specialized (Chipman, 2015).
Machine cognition, in turn, refers to the collection of processes that enable computers to perform tasks at levels of performance comparable to human cognition.
Human cognition employs biological and natural means—brain and mind—
for its realization. On the other hand, machine cognition views cognition as a
type of computation and uses cognitive computing techniques for its realization.
Cognitive science is an interdisciplinary approach to the study of human
and animal cognition (Frankish and Ramsey, 2012; Friedenberg and
Silverman, 2015). Abrahamsen and Bechtel (2012) provide an exposition
and core themes of cognitive science. Cognitive computing is an emerging
field ushered in by the synergistic confluence of cognitive science, data sci-
ence, and an array of computing technologies (Hurwitz et al., 2015). The
recent advances in the computing discipline—high-performance computers,
neuromorphic chips, neurocams, big data, machine learning, predictive mod-
eling, natural language processing (NLP), and cloud computing—are acceler-
ating advances in cognitive science and cognitive computing disciplines.
Given the interdisciplinary origin of cognitive science and data science,
there are multiple perspectives on cognitive computing. These perspectives
are shaped by diverse domain-specific applications and fast evolution of
enabling technologies. There is no consensus on what exactly comprises the
field of cognitive computing. Our exposition of cognitive computing in this chapter is driven by big data, information retrieval, machine learning, natural language understanding, and their applications.
1.1 Chapter Organization
The overarching goal for this chapter is to provide a unified introduction to
cognitive computing by drawing on multiple perspectives. Section 2 provides
an overview of cognitive science as an interdisciplinary domain. The primary
characteristics of cognitive computing systems and a preview of cognitive
applications are provided in Section 3. Concepts of knowledge representation are presented in Section 4, and the principal technology enablers of cognitive computing are discussed in Section 5.
Cognitive architectures model human performance on multiple cognitive
tasks. They are computational frameworks which specify structure and
functions of cognitive systems as well as how structure and functions interact.
Section 6 discusses cognitive architectures and approaches to cognitive tasks.
Cognitive computing systems and their applications are presented in
Section 7. Trends and future research directions in cognitive computing are
discussed in Section 8. Finally, Section 9 ends the chapter by listing various
cognitive computing resources.
2 INTERDISCIPLINARY NATURE OF COGNITIVE SCIENCE
Cognitive science theories provide frameworks to describe various models of
human cognition including how information is represented and processed by
the brain. The human brain is perhaps the most complex system in terms of
its structure and function. The mental processes of the brain span a broad
spectrum, ranging from visual and auditory perception, attention, memory, and imagery to problem solving and natural language understanding. We
use the terms mental processes and cognitive tasks synonymously.
Cognitive science encompasses academic disciplines including philoso-
phy, psychology, neuroscience, linguistics, artificial intelligence (AI), and
robotics. Philosophers pose broad questions about the nature of the mind
and the relationship between the mind and thought processes (Thagard,
2009). They also offer hypotheses to explain the mind and its mental pro-
cesses. Philosophers’ primary method of inquiry is through deductive and
inductive reasoning.
Cognitive psychologists design experiments and execute them under con-
trolled conditions to validate hypotheses and develop cognitive theories
(Neisser, 2014). Cognitive psychology studies aim to discover how thinking
works. Such studies encompass, for example, how experts solve problems compared with novices, how short short-term memory really is, and why the people who are the most incompetent are the least aware of their own incompetence.
The discipline of evolutionary psychology explains human mental processes using the theory of natural selection. More specifically, it uses evolutionary principles
to explain psychological adaptations such as changes in our thinking to
improve our survival.
Neuroscientists employ engineering instruments and scientific methods to
measure brain activity in response to external stimuli (McClelland and Ralph,
2015). For example, functional magnetic resonance imaging (fMRI), positron
emission tomography (PET), and computerized axial tomography (CAT)
techniques are used to identify specific brain regions associated with various
cognitive tasks. A neurocam is a head-mounted camera which monitors brain
waves (Neurowear, n.d). When the person wearing this device looks at some-
thing that causes the brain activity to spike, the activity is automatically
recorded. This camera is not yet available in the market.
Linguists study various aspects of natural languages including language
acquisition and understanding (Evans, 2012; Isac and Reiss, 2013). Cognitive
linguists investigate the interaction between language and cognition. How can
we explain the fact that a 5-year-old in one culture can perform with ease a simple task, such as pointing in the direction of north, that eminent scientists in other cultures struggle with? The notion that different languages may impart different cognitive skills dates back centuries, and there is empirical evidence for this causal relation (Boroditsky, 2011). This notion is formally stated as the Sapir–Whorf hypothesis, which holds that the structure of a language affects its speakers' cognition or world view (Kay and Kempton, 1984). However, it appears
that language is only one factor that influences cognition and behavior.
AI (Russell and Norvig, 2009) and robotics (Samani, 2015) researchers
investigate how robots can be endowed with human-like intelligent behavior
to perform various cognitive tasks. In recent years, developing intelligent systems by implicitly or explicitly embedding knowledge through advanced programming techniques has ceased to be the dominant AI practice. Though some AI systems incorporate learning into their design, the primary effort has been on codifying domain knowledge, specifying integrity constraints, and designing inference rules. Also, such systems are strongly coupled with their domain, and the effort required for domain adaptation can be as great as developing the system from scratch for the new domain.
Another AI approach to developing intelligent systems is data driven and
eases the domain knowledge encoding and rule specification effort (Abu-
Mostafa et al., 2012). Though this approach has existed for quite some time, the recent emergence of big data has created renewed interest (Hastie et al., 2003). It primarily emphasizes semi-supervised and unsupervised machine learning
algorithms. This approach entails relatively less effort for domain adaptation.
The terms brain and mind are often used interchangeably. Cognitive scien-
tists from philosophy, psychology, and linguistics backgrounds typically use
the term mind. These domains investigate cognition at a more abstract and
logical level and are less concerned about the underlying apparatus that
enables cognition. On the other hand, cognitive scientists from the neurosci-
ence and computing disciplines use the term brain. The apparatus that enables
cognition is central to their investigations.
3 COGNITIVE COMPUTING SYSTEMS
Cognitive computing employs the computing discipline’s theories, methods,
and tools to model cognitive tasks. It views the mind as a highly parallel
information processor, uses various models for representing information,
and employs algorithms for transforming and reasoning with the information.
The means to represent and store information in a computer bears little or no
resemblance to its counterparts in the human brain.
Technologies that enable cognitive computing systems include AI,
machine learning, computer vision, robotics, written and spoken language rec-
ognition and understanding, information retrieval, big data, Internet of Things
(IoT), and cloud computing. Some of these are enabling technologies and
others are technologies in their own right.
Cognitive computing systems are fundamentally different from traditional
computing systems. Cognitive systems are adaptive, learn and evolve over
time, and incorporate context into the computation. They sense their environ-
ment, think and act autonomously, and deal with uncertain, ambiguous, and
incomplete information.
Cognitive computing systems do not use brute force approaches. For
example, the IBM’s Deep Blue system which defeated the world Chess cham-
pion Garry Kasparov in 1997 is not considered a cognitive computing system.
Deep Blue used exhaustive search in planning its moves. In contrast, the IBM
Watson of 2011 is a cognitive computing system. It uses deep natural lan-
guage understanding, incorporates contextual information into its decision
making, and reasons with incomplete and uncertain data. It performs spatial
and temporal reasoning and can recognize statistical paraphrasing of natural
language text.
Cognitive computing systems span a broad spectrum in terms of their cap-
abilities. With rapid advances in cognitive science and data science, current
computing applications embody varying degrees of cognitive capabilities.
For example, an assortment of cognitive capabilities is essential for self-
driving cars. Cognitive capabilities enable self-driving cars to learn from past
experiences and use contextual information in making decisions in real time.
Cognitive assistants such as Google Now predict and suggest the next
context-dependent actions. Emerging information extraction and search tech-
nologies provide evidence-based answers and explain their answers. Cognitive
technologies for translating webpage content to different languages have
achieved unprecedented levels of accuracy. Transformative advances in
speech recognition and language understanding are used for real-time speech
understanding and translation tasks. Cognitive IoT (Wu et al., 2014) and big
data technologies are used in smart cities for improving public safety and effi-
ciency of infrastructure operations. Finally, humanoid robots are learning
difficult tasks such as archery (Kormushev et al., 2010).
4 REPRESENTATIONS FOR INFORMATION
AND KNOWLEDGE
Cognitive computing views the brain as an information processor. Therefore,
suitable representations are needed to represent and transform information
(Davis et al., 1993). In fact, how to represent information/knowledge is one
of the challenges in developing autonomous systems. According to
Friedenberg and Silverman (2015), there are four categories of representa-
tions: concept, proposition, rule, and analogy. Concepts denote objects of
interest in the domain such as people, places, and events. Words in a natural
language are good examples of concepts. Propositions are statements about
the domain. They are always associated with a truth value (true or false).
For example, the sentence “Cognitive computing is an emerging area of
computer science,” is a proposition. Concepts are the building blocks of pro-
positions. Propositions can be combined using logical connectives to generate
compound propositions.
Rules specify relationships between propositions. Rules enable inferring
new information from existing information. Rules help to lessen the need to
exhaustively and explicitly store factual information about the domain. This type of rule is called an inference or reasoning rule. Another type of rule, referred to as an integrity constraint, serves to verify the consistency of the information and to identify incompatibilities. A third type, procedural knowledge, comprises more complex and abstract rules that describe the sequences of steps involved in performing cognitive tasks.
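To make the distinction concrete, the following minimal Python sketch represents propositions as tuples and applies a single inference rule through one round of forward chaining. The facts, relation names, and the rule itself are hypothetical examples, not drawn from the chapter.

# A minimal, illustrative sketch: propositions are stored as tuples, and a single
# inference rule derives new propositions from existing ones (forward chaining).
facts = {
    ("is_a", "cognitive_computing", "emerging_field"),
    ("part_of", "machine_learning", "cognitive_computing"),
}

def infer(facts):
    """If X is part of Y and Y is an emerging field, infer that X is an active research area."""
    derived = set()
    for (relation, x, y) in facts:
        if relation == "part_of" and ("is_a", y, "emerging_field") in facts:
            derived.add(("is_a", x, "active_research_area"))
    return facts | derived

print(infer(facts))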
An analogy is a comparison between two things, typically based on their
structure. Analogical representations store information about analogies. Such
representations are used to solve problems through analogical reasoning. If two problem situations are similar and a solution is known for the first, analogical reasoning proposes that solution for the second. Analogical reasoning solutions are typically
specified using a certainty factor. The knowledge represented using analogies
is termed heuristic knowledge.
Ontologies are another knowledge representation scheme (Stephan et al., 2007). They are explicit, formal specifications of the terms and concepts in
the domain and relations among them (Gruber, 1993; Guarino et al., 2009).
They provide a consistent and formal vocabulary to describe the domain
and facilitate reasoning. They promote domain knowledge reuse as well as
enable analyzing the domain knowledge. Ontologies help in making domain
assumptions explicit. Ontologies are not suitable for representing certain types
of knowledge such as diagrammatic and procedural knowledge (Brewster and
O’Hara, 2004). WordNet (Miller, 1995) is a widely used lexical ontology in
cognitive computing. DBpedia is a knowledge base of information extracted
from Wikipedia through crowd-sourcing methods (DBpedia, n.d). In addition to enabling sophisticated queries against Wikipedia content, DBpedia is an excellent knowledge base for developing certain types of cognitive computing
applications.
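As an illustration of how a lexical ontology such as WordNet can be queried programmatically, the following sketch uses the NLTK interface to WordNet. It assumes NLTK and its WordNet corpus are installed, and the exact output depends on the WordNet version.

from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

# List the senses (synsets) of the word "brain" with their glosses.
for synset in wn.synsets("brain"):
    print(synset.name(), "-", synset.definition())

# Hypernyms of the first sense, i.e., more general concepts in the ontology.
print(wn.synsets("brain")[0].hypernyms())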
Using declarative methods to represent facts and relationships about entities in the domain has its limitations. Not everything in the domain is amenable to representation as facts and relationships. Relationships are often too numerous to represent explicitly, and exceptions are common. The above approaches
to knowledge representation are called symbolic representations.
There is another class of representations, termed distributed representations, which are used in neural network-based cognitive computing architectures. A neural network is a weighted, directed graph composed of nodes
and edges in a predefined configuration. Typically, a neural network consists
of a layer of input nodes and another layer of output nodes. Input layer nodes
may be directly connected to the nodes in the output layer or there can be a
number of hidden layers between them. A neural network represents knowl-
edge as the weights associated with the edges. A network needs to be trained
on inputs to learn edge weights. In a multilayer network, the weights of all layers are learned jointly, and each layer applies a nonlinear transformation. In essence, knowledge representation
emerges as a result of training the network. For example, distributed word
representations are used in Bowman et al. (2015) to support the rich, diverse
logical reasoning captured by natural logic.
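The following toy sketch, using only NumPy and randomly sampled weights, illustrates the point that a network's "knowledge" is nothing more than its edge weights; in practice these weights would be learned from training data rather than sampled randomly.

import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network: 4 input features -> 8 hidden units -> 3 output classes.
# The weight matrices constitute the network's entire "knowledge".
W_hidden = rng.normal(size=(4, 8))
W_output = rng.normal(size=(8, 3))

def forward(x):
    h = np.tanh(x @ W_hidden)                      # nonlinear hidden activation
    scores = h @ W_output
    return np.exp(scores) / np.exp(scores).sum()   # softmax over output nodes

print(forward(np.array([0.5, -1.2, 0.3, 0.9])))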
An advantage of distributed representations is that they are more resilient
to noisy input and performance degradation is more graceful. However, it is
difficult to explain the behavior of the system using the internal structure of
the network. In applications such as personalized medicine, an explanation
about how a decision has been made is critical. Deep learning, which is a type
of machine learning, heavily relies on multiscale distributed representations.
Input data is characterized using multiple features, and each feature is repre-
sented at multiple levels of scale. In passing, it should be noted that there is a
strong coupling between the cognitive computing architectures and the knowl-
edge representations used.
5 PRINCIPAL TECHNOLOGY ENABLERS OF COGNITIVE
COMPUTING
Cognitive science has long existed as an interdisciplinary field whose research focus has been understanding cognition and the functioning of the human brain. In contrast, computing is a young discipline. However, during the last few years there have been transformational advances in the computing discipline. These advances, in turn, are providing unprecedented and
unique opportunities for advancing research in cognitive science and data sci-
ence. This section provides an overview of computing technologies which are
central to realizing cognitive computing systems.
5.1 Big Data and Data Science
Recent advances in storage technologies, high performance computing, giga-
bit networks, and pervasive sensing are driving the production of unprece-
dented volumes of data (Gudivada et al., 2015a). Some of this is streaming
data which is produced at high velocities. Furthermore, most of this data is
unstructured and heterogeneous in the form of written and spoken documents,
images, and videos (Berman, 2013). This data is referred to as big data and
numerous systems have been developed for its storage and retrieval
(Gudivada et al., 2016a,b). Big data has enabled several new and innovative
applications (McCreary and Kelly, 2013).
Data Science refers to big data enabled approaches to research and applica-
tions development (Grus, 2015). Data Science provides innovative algorithms
and workflows for analysis, visualization, and interpretation of big data to
enable scientific breakthroughs (Hey et al., 2009). Dhar (2013) defines data
science as the systematic study of the extraction of generalizable knowledge
from data.
Big data provides new ways to solve problems using data-driven
approaches (Gudivada et al., 2015b). In Halevy et al. (2009), it is argued that the careful selection of a mathematical model becomes less important when it is compensated for by sufficiently large data. This insight is particularly significant for
solving various problems in AI, machine learning, and autonomous systems.
These problems are typically ill-posed for mathematically precise algorithmic
solutions. For example, in NLP, such problems include parsing, part-of-
speech (POS) tagging, named entity recognition, information extraction, topic
modeling, machine translation, and language modeling.
To illustrate how big data and data science are changing the course of
research in NLP, consider the POS tagging problem. This involves assigning
a correct POS tag for each word in a document. For example, given the input sentence—Big Data is changing the course of natural language processing (NLP) research and enabling new applications—a POS tagger may produce the following: Big/NNP Data/NNP is/VBZ changing/VBG the/DT course/NN of/IN natural/JJ language/NN processing/NN (/( NLP/IN )/) research/NN and/CC enabling/VBG new/JJ applications/NNS ./. The notation Big/NNP means that the POS tag of Big is NNP (proper noun, singular). The meaning of the other POS tags is: CC = conjunction, coordinating; DT = determiner/pronoun, singular; IN = preposition; JJ = adjective; NN = noun, singular, common; NNS = noun, plural, common; VBG = verb, present participle or gerund; VBZ = verb, present tense, third person singular. There is no single standard for POS tag sets.
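For readers who want to reproduce a tagging of this kind, the sketch below uses NLTK's off-the-shelf perceptron tagger, which also emits Penn Treebank-style tags; the exact tags it assigns may differ slightly from the example above.

import nltk

# One-time resource downloads.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

sentence = ("Big Data is changing the course of natural language "
            "processing (NLP) research and enabling new applications.")
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# e.g., [('Big', 'NNP'), ('Data', 'NNP'), ('is', 'VBZ'), ('changing', 'VBG'), ...]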
The POS tagging problem is difficult because the same word can be
assigned different tags depending on the context. Therefore, assigning a tag
to a word must consider the definition of the word as well as the context in
which the word is used. Furthermore, many nouns can also be used as verbs.
Also, POS tagging rules vary from one language to another.
There are two broad categories of algorithms for POS tagging: rule based
and stochastic. Algorithms in the first category employ rules. The Brill tagger
uses a form of supervised learning which aims to minimize error (Brill, 1995).
Initial POS tags assigned to words are iteratively changed using a set of pre-
defined rules that take context into consideration. This approach requires
hundreds of rules which are developed by linguists or synthesized using
machine learning algorithms and training data. This is an error-prone and
labor-intensive process. Furthermore, the rules are bound to the language
and domain adaptation is difficult.
Stochastic POS algorithms, on the other hand, are data driven. They are
based on supervised learning models such as Hidden Markov Model, Log-
linear Model, and Conditional Random Field (CRF). More recent stochastic
approaches strive to transition from supervised to semi-supervised and unsu-
pervised algorithms. For example, the approach to POS tagging in Ling et al. (2015) obviates the need for manually engineering lexical features for words. The work presented in Andor et al. (2016) is another data-driven,
globally normalized transition-based neural network model that achieves
state-of-the-art performance on POS tagging.
TensorFlow is an open source software library for developing machine
learning-centric applications (TensorFlow, n.d). SyntaxNet (n.d) is an open-
source neural network framework for TensorFlow. SyntaxNet provides a
library of neural models for developing Natural Language Understanding
(NLU) systems. Parsey McParseface is a component of SyntaxNet, which is
a pretrained parser for analyzing English text including POS tagging.
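As a flavor of what building a neural tagger on top of TensorFlow looks like, the following sketch uses the Keras API to assemble a small sequence-labeling model. It is not SyntaxNet or Parsey McParseface, and the vocabulary size, tag-set size, and layer sizes are placeholder assumptions.

import tensorflow as tf

VOCAB_SIZE = 10000   # placeholder vocabulary size
NUM_TAGS = 45        # placeholder number of POS tags

# A small bidirectional-LSTM sequence tagger: one tag distribution per token.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=64, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Dense(NUM_TAGS, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()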
5.2 Performance, Scalability, and Elasticity
Cognitive computing applications typically deal with large, unstructured data.
Moreover, the data may be fraught with quality problems. The data may be
incomplete, inconsistent, conflicting, uncertain, and ambiguous. Furthermore, the data may contain duplicates, and detecting them is not trivial. Processing unstructured data to extract information is computationally expensive.
Therefore, substantial computing resources are needed to clean the data and to
extract meaning.
Performance, scalability, and elasticity are three attributes that are used to
characterize the computing needs of cognitive applications. Performance
refers to stringent requirements placed on the time to process the data. For
example, consider IBM Watson and its capabilities as of 2011. Watson is a
question-answering cognitive application whose capabilities include natural
language understanding. To play the Jeopardy! game, Watson had to answer each question in less than 3 seconds. The scalability parameter
refers to a computing system's ability to perform under increased workload without requiring any changes to the software. For example, how is a system's performance affected when the size of the input is doubled?
Finally, the term elasticity refers to how a computing system dynamically
and automatically provisions and deprovisions resources such as processor
cycles, primary memory, and secondary storage to meet fluctuations in the
system workload. The dynamic aspect is key to provisioning just enough
resources to operate a cognitive system at a specified performance level
despite unpredictable fluctuations in the system workload. Elasticity is critical
to minimize the operating costs of a cognitive system.
5.3 Distributed Computing Architectures
One way cognitive computing systems meet performance and scalability
requirements is through distributed computing architectures. Such architec-
tures consist of several processing nodes, where a node is a self-contained
computer comprised of compute cores, primary memory, and secondary
FIG. 1 A compute cluster whose nodes span across geographically separated data centers.
storage as shown in Fig. 1. The nodes communicate and coordinate their
actions to achieve a common goal through mechanisms such as shared mem-
ory and message passing. The nodes are interconnected through a high-speed
computer network. A logical collection of nodes is called a cluster. Several
nodes are physically mounted on a rack. Some cognitive computing systems
run on clusters whose nodes reside in geographically separated data centers.
Client-server architecture is a widely used computing model for cognitive
computing applications. A server provides a service which is made available
to the clients through an Application Programming Interface (API) or an architectural style such as REST (Fielding, 2000). Typically, the server and the clients reside
on physically different computers and communicate over a network. How-
ever, the server and the clients may also reside on the same physical com-
puter. The workload is divided between the server and the clients.
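A typical client interaction with such a server is a simple HTTP request. The sketch below uses the Python requests library against a purely hypothetical endpoint and API key; real cognitive services differ in their URLs, payloads, and authentication schemes.

import requests

# Hypothetical endpoint and credentials, for illustration only.
response = requests.post(
    "https://api.example.com/v1/analyze",
    json={"text": "Cognitive computing is an emerging field."},
    headers={"Authorization": "Bearer <API_KEY>"},
    timeout=10,
)
response.raise_for_status()
print(response.json())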
In production environments, a cognitive computing application server typ-
ically runs on a cluster. The responsibility for processing client requests as
well as distributing and coordinating workload among various nodes can be
centralized or distributed. Fig. 2 shows both these models. Shown on the left
is the master-worker architecture. A specific node is designated as the master
and is responsible for intercepting client requests and delegating them to
worker nodes. In this sense, the master node acts as a load balancer. It is
also responsible for coordinating the activities of the entire cluster.
FIG. 2 A shared-nothing architecture. (A) Master–worker shared-nothing architecture. (B) Master–
master shared-nothing architecture.
This architecture simplifies cluster management, but the master node can become a single point of failure. To mitigate this, if the master node fails, a standby master takes over its responsibilities.
Shown in Fig. 2B is an alternative to the master-worker architecture. This
is called master-master or peer-to-peer architecture. All nodes in the cluster
are treated as equals. At any given point of time, a specific node is accorded
the role of a master. If the master node fails, one of the remaining nodes is
elected as the new master. Another architecture called multi-master employs
a hierarchy of masters and master–worker style is used at the lowest level.
Master–worker and master–master configurations are called shared-
nothing architectures since the nodes are self-contained and do not share
resources. Both architectures distribute data and processing across the nodes
in the cluster to achieve performance at scale. Data is also replicated to a sub-
set of the nodes to ensure high availability. Some systems allow adding new
nodes or removing existing ones (intentionally or due to a node failure) with-
out service interruption. Computing systems based on shared-nothing archi-
tecture accommodate increased workloads by adding new nodes. Testing of
cognitive computing systems that use the master–master architecture is easier than testing of systems that use the master–worker architecture.
5.4 Massive Parallel Processing Through MapReduce
Given the massive volumes of unstructured data that cognitive systems process
in near real time, a high degree of parallel processing is required. MapReduce
is a distributed programming model designed for processing massive amounts
of data using cluster computers (Ryza et al., 2015; White, 2015). This model
is inspired by the map and reduce functions commonly used in functional
programming languages.
Fig. 3 shows processing steps involved in a MapReduce computation.
They are best illustrated through an example. Consider the problem of
FIG. 3 MapReduce architecture.
computing an inverted index for a very large textual document collection.
Each document in the collection is identified with a unique identifier.
For each significant word in the collection, the inverted index lists all the
documents in which the word occurs.
As a first step, the documents in the collection are partitioned into nonover-
lapping sets and each set is assigned to a map process (see Fig. 3). Optimal
partitioning of the input files into sets and assigning each set to a map process
depends on the problem characteristics. Map processes execute in parallel on
different nodes (step 1). They read documents assigned to them and extract
ordered pairs of the form (word, doc_id). In other words, for each instance
of a significant word that appears in any document, an ordered pair is
generated.
In the second step, the Barrier column in Fig. 3 acts as a synchronization point, ensuring that all the mapper processes have completed their work before moving to the third step. The second step also collects the generated key-value pairs from each mapper process, sorts them on the key value, and partitions the sorted key-value pairs. In the last step, each partition is assigned to a different reduce process. The function which performs this assignment is called the shard (or partition) function.
Each reduce process essentially receives all the key-value pairs
corresponding to one or more words (e.g., (authentic, doc_id_14), (authentic,
doc_id_3), (eclectic, doc_id_111), (eclectic, doc_id_21), (eclectic,
doc_id_45)). The output of each reduce process is a subset of the inverted index (e.g., (authentic, (3, 14)), (eclectic, (21, 45, 111))). No synchronization
is required for the reduce processes.
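The following single-process Python sketch mimics the three steps above for the inverted-index example: a map function emits (word, doc_id) pairs, a grouping step stands in for the barrier and shuffle, and a reduce function builds the posting lists. A real MapReduce run would distribute these steps across cluster nodes; the documents here are toy placeholders.

from collections import defaultdict

def map_phase(doc_id, text):
    """Map: emit a (word, doc_id) pair for every word in a document."""
    return [(word.lower(), doc_id) for word in text.split()]

def reduce_phase(word, doc_ids):
    """Reduce: collapse all pairs for one word into a sorted posting list."""
    return word, sorted(set(doc_ids))

documents = {3: "authentic data", 14: "authentic eclectic", 21: "eclectic data"}

# Stand-in for the barrier/shuffle: group mapper output by key (the word).
grouped = defaultdict(list)
for doc_id, text in documents.items():
    for word, d in map_phase(doc_id, text):
        grouped[word].append(d)

inverted_index = dict(reduce_phase(w, ids) for w, ids in grouped.items())
print(inverted_index)   # e.g., {'authentic': [3, 14], 'eclectic': [14, 21], 'data': [3, 21]}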
5.5 Cloud Computing
For individuals and even for many organizations, building and maintaining
cluster-based distributed computing infrastructure is not desirable for many
reasons. First, a significant upfront investment is needed to build the cluster.
Second, highly specialized, technical personnel are required to build and
operate the cluster. Third, it is difficult to achieve scalability without service
interruption. Fourth, given the unpredictable fluctuations in workload, it is
extremely difficult to achieve scalability in a cost-effective manner. Often,
scalability requires overprovisioning of resources to compensate for unpre-
dictable fluctuations in the workload. Cognitive computing applications need
elastic scalability—provisioning resources optimally to guarantee perfor-
mance requirements at all times regardless of fluctuations in the workload.
This is where cloud computing comes into play.
The primary goal of cloud computing is to provide computing infrastructure
as well as turnkey software applications as utilities (Armbrust et al., 2010).
Terms such as infrastructure as a service and software as a service are used
to describe these utility services. Cloud computing is a model for providing
on-demand access to a shared pool of computing resources. It leverages the fact
that a fixed pool of resources can be dynamically provisioned and deprovi-
sioned across multiple applications to meet their fluctuating workloads. In other
words, an economy of scale is achieved through dynamically sharing resources.
Each application is provided a virtual machine (VM) whose local resources are
configured and dynamically adjusted using the global shared pool.
Organizations pay for only the resources that their applications use, irre-
spective of the size of the resource pool. Several vendors including Microsoft,
IBM, Nvidia, and Amazon provide cloud services for hosting applications.
Features of these services vary along the facets such as service level agree-
ments, authorization and authentication services, data encryption, data transfer
bottlenecks, application monitoring, and appearance of infinite computing
resources on demand. Cloud computing is especially important for cognitive
computing applications as vendors such as IBM and Numenta make their cog-
nitive APIs available via cloud hosting.
5.6 AI and Machine Learning
There is an analogy between the state of AI and machine learning today and that of Information Retrieval (IR) (Gudivada et al., 1997) research between 1990 and 1995. Before 1990, IR research was mostly confined to academia. There were a few commercial products, such as those offered by LexisNexis, which provided search functionality over a collection of legal docu-
ments and journal publications. These documents were stored together on a
disk. The advent of the World Wide Web and the need for searching docu-
ments distributed across the Web provided the necessary impetus for advanc-
ing IR research and distributed search. The Text REtrieval Conference
(TREC) is an annual conference and competition, whose goal is to promote IR
research by providing test data sets. TREC-1 started in 1992 (Harman, 1993)
and continues to attract researchers globally even today. The dramatic
advances in the Web search are in part credited to the TREC competitions.
A similar situation exists today with cognitive computing. For example,
consider the very first autonomous ground vehicle Grand Challenge sponsored
by the Defense Advanced Research Projects Agency (DARPA) in 2004. The
goal was to promote research and development of autonomous, self-driving
ground vehicles capable of reaching a destination within specified time con-
straints using off-road terrain in the Mojave Desert region of the United States.
Twenty teams participated in this event and none of the vehicles reached the
destination. The farthest distance traveled by any vehicle was 7.32 miles. Con-
trast this with the DARPA Urban Challenge competition held in 2007. The
course involved a 60-mile stretch of an urban area and the journey needed to
be completed in less than 6 h. Six vehicles successfully completed the entire
course. This is a remarkable improvement in just 3 years. DARPA continues to conduct such challenges, the most recent being the 2013 Fast Adaptable Next-Generation Ground Vehicle (FANG) Challenge.
This unprecedented interest in autonomous and intelligent systems
resulted in the creation of several frameworks and libraries for developing
such systems. Notable is the availability of various data-driven machine learning libraries. They include TensorFlow, a library for numerical computation and neural networks; DeepLearning4J, a framework for deep learning; NuPIC, a library implementing a theory of the neocortex called Hierarchical Temporal Memory (HTM); Caffe, a machine learning library for computer vision applications; and scikit-learn, a general-purpose library for data mining and data analysis.
Intended use cases and functional features of machine learning algorithms
vary greatly. The facets that differentiate them include types of preprocessing
required on input data to achieve data quality; amount of data required for
training, testing, and model refinement; approaches to cross-validating models
to prevent overfitting; types of learning algorithms—supervised vs unsuper-
vised, and ensemble learning; quality and pervasiveness of learning; and ease
of domain adaptation. Based on these facets, machine learning algorithms are
categorized into the following classes: decision trees, association rule learning, genetic algorithms, reinforcement learning, random forests, support vector machines, Bayesian networks, and deep learning.
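As a small, concrete example of one of these algorithm families, the sketch below trains a random forest with scikit-learn and uses cross-validation to guard against overfitting; the Iris dataset serves purely as a stand-in for real application data.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Random forest classifier; 5-fold cross-validation estimates generalization accuracy.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print("mean cross-validated accuracy:", scores.mean())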
5.7 Neuromorphic Computing
Cognitive computing applications typically require sophisticated processing
of noisy and unstructured real-world data under stringent time constraints.
Neuromorphic computing and neural network accelerators are solutions to
meet these processing challenges. The goal of neuromorphic computing is
to use very large scale integration (VLSI) systems that are driven by
electronic analog circuits to simulate neurobiological architectures present in
the nervous system (Williams, 2016). These VLSI chips are characterized by
ultra-low power consumption and high performance. They are referred to as
neuromorphic chips or brain chips.
IBM’s funding through the DARPA Synapse program resulted in the cre-
ation of a neuromorphic chip called True North. It is a 4096-core chip consist-
ing of 256 programmable neurons which function much like synapses in the
brain. True North uses digital spikes to perform neuromorphic computing.
Kim et al. (2015) describe a reconfigurable digital neuromorphic processor
(DNP) architecture for large-scale spiking neural networks. In another study,
Du et al. (2015) investigate relative merits of a hardware implementation of
two neural network accelerators. The first accelerator’s design is inspired by
the machine learning domain, whereas that of the second by the neuroscience
discipline. They analyze these two classes of accelerators in terms of energy
consumption, speed gains, area cost, accuracy, and functionality. One more
study reported in Chen et al. (2015) discusses implementing machine learning
algorithms on a chip. Another study (Liu et al., 2013) describes how an ultra-
high power efficiency beyond One-TeraFlops-Per-Watt was achieved in a
bioinspired neuromorphic embedded computing engine named Centaur.
In April 2016, Nvidia released a state-of-the-art chip, Tesla P100 GPU, which
specifically targets machine learning algorithms that employ deep learning. The
GPU package features some 150 billion transistors, including its stacked high-bandwidth memory. DGX-1 is Nvidia's newest supercomputer, which is powered by eight Tesla P100 GPUs. The DGX-1 comes with deep-learning software preinstalled and costs less than $130,000.
Neuromorphic computing opens up new possibilities for advancing cogni-
tive computing. The field is new and the methods lack maturity. For example,
the low quantization resolution of the synaptic weights and spikes significantly limits inference accuracy in the True North chip. To alleviate this problem, Wen et al. (2016) propose a new learning method which constrains the random variance in each computation copy. This results in a 68.8% reduction in the number of neurosynaptic cores or, equivalently, a 6.5× speedup.
The recent European Union-funded Human Brain Project aims to develop scientific research infrastructure to accelerate research in neuroscience, computing, and brain-related medicine. A counterpart in the US is the White House Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative, whose focus is a dynamic understanding of brain functions.
6 COGNITIVE COMPUTING ARCHITECTURES
AND APPROACHES
A cognitive architecture is a blueprint for developing cognitive systems. It
specifies fixed structures and interactions among them with the goal of
achieving functions of the mind. The knowledge embodied in the architecture
drives the interactions among the structures to achieve intelligent behavior.
A cognitive model, in contrast with a cognitive architecture, focuses on a
single cognitive process such as language acquisition. Cognitive models are
also used to study the interaction between cognitive processes such as lan-
guage understanding and problem-solving. Furthermore, they are used for
behavioral predictions for tasks. For example, how does an increase in train-
ing time affect air traffic controllers’ performance?
Cognitive architectures tend to focus on structural aspects of cognitive
systems. They constrain the types of cognitive models that can be developed.
Likewise, cognitive models help to reveal the limitations of cognitive archi-
tectures. Thus, there is a strong interplay between cognitive architectures
and models. In the literature, the terms cognitive architecture and cognitive
model are not used consistently, and are often used synonymously. The con-
text should help to reveal the intended meaning.
Cognitive architectures are an area of intensive research. In Langley et al.
(2009), motivations for research on cognitive architectures are discussed. The study enumerates capabilities that a cognitive architecture should provide related to representation, organization, performance, and learning. It also specifies criteria for evaluating cognitive architectures at the systems level and points out open research problems. A critical survey of the state of the art in cognitive architectures is presented in Duch et al. (2008). This study provides insights into the suitability of existing architectures for creating artificial general intelligence. The focus of cognitive
architectures research is moving away from the functional capabilities of arch-
itectures to their ability to model details of human behavior and brain activity
(Taatgen and Anderson, 2010).
There are three major classes of cognitive architectures: cognitivist,
connectionist, and hybrid. They are discussed in the following sections. At the
core of any cognitive system lies a cognitive architecture. A cognitive system is
realized by creating a cognitive model using a cognitive computing architecture.
6.1 Cognitivist Architectures and Approaches
Cognitivist architectures represent information using explicit symbolic repre-
sentations (Anderson, 1983). These representations use an absolute ontology
to symbolize external objects. Representations are synthesized by human
designers and are directly placed into artificial cognitive systems. Cognitivist
architectures are also called symbolic architectures and Artificial Intelligence
(AI) approaches. Cognitive systems based on this architecture are quite
successful in solving specific problems. However, they lack generality to be
useful across domains.
6.1.1 ACT-R
Adaptive Control of Thought-Rational (ACT-R) (Anderson, 1996; Anderson
et al., 2004) is a theory about the mind and also a basis for several cognitive
architectures of the cognitivist type. According to this theory, complex cogni-
tion is enabled by interaction among a set of integrated knowledge modules in
the mind. Each module is associated with a distinct cortical region in the
brain. Some modules represent procedural knowledge whereas others repre-
sent declarative knowledge.
Procedural knowledge is represented using production rules, and units called chunks represent declarative knowledge. Production rules capture trans-
formations that occur in the cognitive system’s environment. Declarative
knowledge about objects in the environment is captured by chunks.
A large collection of modules in the brain is the underlying infrastructure
for functions of the mind. Depending on the context and cognitive task, appro-
priate modules are selected and activated. Anderson et al. (2004) illustrates
how these modules function individually and in unison in realizing simple
and complex cognitive tasks.
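The following Python fragment is only an illustrative caricature of this division of labor, not the ACT-R software: declarative chunks are dictionaries, and a procedural production rule fires when its condition matches the goal against a retrieved chunk.

# Declarative knowledge: chunks describing facts about the environment.
chunks = [
    {"type": "addition-fact", "addend1": 2, "addend2": 3, "sum": 5},
    {"type": "addition-fact", "addend1": 3, "addend2": 4, "sum": 7},
]

# Procedural knowledge: a production rule whose condition matches the goal
# against a retrieved chunk and whose action updates the goal.
def add_production(goal):
    for chunk in chunks:
        if (chunk["type"] == "addition-fact"
                and chunk["addend1"] == goal["a"]
                and chunk["addend2"] == goal["b"]):
            goal["answer"] = chunk["sum"]
            return goal
    return goal

print(add_production({"a": 2, "b": 3}))   # {'a': 2, 'b': 3, 'answer': 5}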
6.1.2 Soar
Soar is another general architecture for developing cognitive systems
(Lehman et al., 1996, 2006). It has been in use since 1983 and has evolved
over the years. Soar uses a single architecture for all its tasks and subtasks.
In the initial versions, Soar used a single representation for production rules
and temporary knowledge, another single mechanism for generating goals
and subgoals, and chunking for learning. In the recent releases, Soar uses mul-
tiple learning mechanisms—chunking, reinforcement learning, episodic
learning, and semantic learning (Laird, 2008). Also, multiple representations
of long-term knowledge are used—productions for procedural knowledge,
semantic memory, and episodic memory.
Soar’s goal is to support cognitive functions required to implement a gen-
eral intelligent agent. Soar makes its decisions dynamically based on relevant
knowledge. Decisions are based on current interpretation of sensory data,
contents of working memory accumulated through previous experiences,
and relevant knowledge retrieved from the long-term memory.
6.1.3 Mala
Mala is a multientity cognitive architecture for developing robots that must
work in dynamic, unstructured environments (Haber and Sammut, 2013).
Mala bridges the gap between a robot’s sensorimotor and cognitive compo-
nents. Mala supports modular and asynchronous processing, specialized repre-
sentations, translations between representations, relational reasoning, and
multiple types of integrated learning.
6.1.4 GOMS
GOMS is a family of predictive models of human performance (Card et al.,
1983). GOMS stands for Goals (which can be accomplished with the system),
Operators (basic actions that can be performed on the system), Methods
(sequences of operators that can be used to accomplish a goal), and Selection
rules (selecting right methods to accomplish a goal). The models in the
GOMS family are Keystroke-Level Model (KLM), Critical-Path Method
GOMS (CPM-GOMS), Natural GOMS Language (NGOMSL)/Cognitive
Complexity Theory, and Executable GOMS Language (GOMSL)/GLEAN.
Each model in the family provides different sets of operators.
GOMS models are used to evaluate and improve human–computer interac-
tion (HCI) (Sears and Jacko, 2009). They are used to describe the knowledge
of procedures that a user must possess to operate a system such as a software
application. GOMS models help to identify and eliminate unnecessary user
actions. In other words, GOMS is a technique for task analysis. GOMS models
are limited to describing procedural knowledge only.
6.1.5 Limitations
Symbolic representations reflect the designers’ understanding of the domain
and may bias the system. Also, it is difficult for the designers to come up with
all relevant representations which are adequate to realize the desired cognitive
behaviors of the system. Another issue with representations is the symbol
grounding problem (Harnad, 2003), which is related to the problem of how
meanings are assigned to words and what these meanings really are. The thing
that a word refers to (its referent) is not its meaning.
6.2 Connectionist Architectures and Approaches
Connectionist architectures are inspired by the information processing that
occurs in biological neural systems (Flusberg and McClelland, 2014). In the
latter, information is processed by simple, networked computational units
called neurons, which communicate in parallel with each other using electro-
chemical signals (Laughlin and Sejnowski, 2003). A synapse is a junction
between two neurons with a minute gap across which signals pass by diffusion
of a neurotransmitter.
The brain is made up of neurons, which are estimated to be in the 10–100
billion range. Each neuron is estimated to have over 10,000 connections to other neurons. A neuron receives stimuli from other neurons through its incoming
connections. Neurons perform nonlinear computations using the received
stimuli. The effect of this computation activates other neurons through its outgo-
ing connections. Strengths of connection activations are quantified on a numeric
scale and adjusted to reflect the state of network learning. Architectures based
on this approach are called connectionist or emergent architectures.
6.2.1 ANN
Artificial neural networks (ANN) are a family of computational models based
on connectionist architectures. In recent years, there has been a renaissance of neural networks as powerful machine learning models (Goldberg, 2015). Though
neural models have been used for tasks such as speech processing and image
recognition for many decades, their widespread and intense use in NLP is
relatively new. Goldberg (2015) provides a tutorial survey of neural network
models for NLP including input encoding; feed-forward, convolutional, recur-
rent, and recursive networks; and computation graph abstraction for automatic
gradient computation.
6.2.2 Lexical and Compositional Semantics
The mathematical representation of semantics is an important open issue in
NLP. Semantics are of two types: lexical and compositional. Lexical seman-
tics focuses on the meaning of individual words, whereas compositional
semantics represent the meaning at the level of larger units such as phrases,
sentences, and paragraphs.
Vector space models are one way to represent lexical semantics. Lexical semantics are useful, but compositional semantics are critical for many natural language
understanding (NLU) tasks such as text summarization, statistical paraphras-
ing, and textual entailment. Representations that draw upon cooccurrence
statistics of large corpora are called distributed representations. Hermann
(2014) describes several approaches to distributed representations and
learning compositional semantics. He also discusses neural models that use
distributed representations for various NLP tasks. A related study (Bowman
et al., 2015) discusses how distributed representations-driven neural models
learn the basic algebra of natural logic relations.
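The toy sketch below illustrates the two levels with made-up three-dimensional vectors; in practice such distributed representations are learned from the cooccurrence statistics of large corpora (e.g., word2vec or GloVe embeddings), and averaging word vectors is only the crudest form of composition.

import numpy as np

# Toy word vectors; real embeddings are learned from large corpora.
vectors = {
    "king":  np.array([0.8, 0.1, 0.7]),
    "queen": np.array([0.7, 0.2, 0.8]),
    "apple": np.array([0.1, 0.9, 0.2]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Lexical semantics: similarity between individual word vectors.
print(cosine(vectors["king"], vectors["queen"]))   # relatively high
print(cosine(vectors["king"], vectors["apple"]))   # relatively low

# Crude compositional semantics: represent a phrase by averaging its word vectors.
phrase_vector = (vectors["king"] + vectors["queen"]) / 2
print(phrase_vector)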
6.2.3 NEF
Neural Engineering Framework (NEF) is a general methodology for develop-
ing large-scale, biologically plausible, neural models of cognition (Stewart,
2012). NEF functions like a neural compiler. Given suitable inputs such as
neuron properties and functions to be computed, NEF solves for the connec-
tion weights between neurons that will perform the desired functions. Nengo
is a Python software library for developing and simulating large-scale brain
models based on the NEF (Bekolay et al., 2014).
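A minimal Nengo sketch of the NEF idea is shown below: two populations of spiking neurons represent a scalar signal, and NEF solves for the connection weights so that the connection between them computes a chosen function (here, squaring). This assumes the nengo package is installed; the ensemble sizes and the function are arbitrary choices for illustration.

import numpy as np
import nengo

with nengo.Network() as model:
    stimulus = nengo.Node(lambda t: np.sin(2 * np.pi * t))   # time-varying input
    a = nengo.Ensemble(n_neurons=100, dimensions=1)          # represents the input
    b = nengo.Ensemble(n_neurons=100, dimensions=1)          # represents f(input)
    nengo.Connection(stimulus, a)
    # NEF solves for the decoders/weights that approximate the squaring function.
    nengo.Connection(a, b, function=lambda x: x ** 2)
    probe = nengo.Probe(b, synapse=0.01)

with nengo.Simulator(model) as sim:
    sim.run(1.0)
print(sim.data[probe][-5:])   # last few decoded samples of b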
Development of a 2.5-million-neuron model of the brain, called Spaun, is
described in Eliasmith et al. (2012). Spaun is built using the NEF and models
many aspects of neuroanatomy, neurophysiology, and psychological behavior.
These behaviors are illustrated via eight tasks—copying a drawing; image rec-
ognition; performing a three-armed bandit task; reproducing a given arbitrary
length list; summing two values; question-answering on a list of numbers;
given a number of syntactic input/output patterns, determining output pattern
for a novel input pattern; and performing a syntactic or semantic reasoning
task that is similar to the induction problems from the Raven’s Progressive
Matrices test for fluid intelligence.
An overview and comparison of several recent large-scale brain models
are described in Eliasmith and Trujillo (2014). In his recent book, Eliasmith
(2015) provides a guided exploration of a new cognitive architecture termed
Semantic Pointer Architecture. The book also provides tools for constructing
a wide range of perceptual, cognitive, and motor models.
6.2.4 Deep Learning
The recent advent of computing processors specially designed for neural net-
work computations (see Section 5.7) ushered in deep learning networks.
These networks employ multiple layers of neural processing units. The core
of almost all deep learning algorithms is backpropagation, which is used for
training the network. Backpropagation is a gradient descent computation
distributed over the neural network.
Fundamental concepts such as representations are not formally defined for
deep learning (Balduzzi, 2015). Also, there is no common language for
describing and analyzing deep learning algorithms. Balduzzi (2015) proposes
an abstract framework which enables formalizing current deep learning algo-
rithms and approaches. Deep learning networks have been quite successful in
solving diverse classification problems.
It is argued in Hawkins et al. (2016) that neural networks deviate from
known brain principles and thus do not truly reflect the way the brain func-
tions. Therefore, from a neuroscience perspective, it can be stated that systems
based on connectionist architectures also lack generality in solving problems
the way the human brain does.
6.3 Hybrid Architectures and Approaches
The last class of cognitive computing architectures encompasses those that
employ a combination of symbolic and connectionist architectures. This class
also includes neocortex inspired approaches (Mountcastle, 1998). The neocor-
tex is a part of the cerebral cortex (brain) concerned with sight and hearing in
mammals. The neocortex employs an extremely high degree of parallelism, and its neurons perform relatively simple functions. These characteristics bode well for highly efficient hardware implementations of the neocortex (Rice et al., 2009). Also, every region of the neocortex has both sensory and motor functions. This observation suggests that cognitive computing systems should be built by integrating sensory and motor modalities. Furthermore, the neocortex uses common principles and algorithms for diverse cognitive tasks such as vision, hearing, language, touch, and behavior, which bodes well for developing neocortex-inspired cognitive systems that are not tied to specific domains. We
use the umbrella term hybrid architectures to refer to these approaches.
6.3.1 LIDA
Learning Intelligent Distribution Agent (LIDA) is a hybrid cognitive architec-
ture as well as a model of cognition. LIDA is grounded in cognitive science
and cognitive neuroscience and is intended to model a significant portion of
human cognition (Franklin et al., 2014). Human cognition functions through
cascading cycles of recurring brain events. Each cognitive cycle assesses the
current state, interprets the state with reference to current goals of a cognitive
task, and selects an internal or external response.
LIDA implements a three-phase cycle: understanding phase, attention
(consciousness) phase, and action selection and learning phase. To achieve
a human-level performance, a cognitive system must be capable of a theory
of mind. How LIDA accomplished its version of a theory of mind is discussed
in Friedlander and Franklin (2008). The suitability of LIDA’s characteristics
for developing cognitive architectures for artificial general intelligence
(AGI) is discussed in Snaider et al. (2011).
6.3.2 Sigma
Sigma (Σ) is a recent cognitive system based on a novel cognitive architecture
(Rosenbloom, 2013). It is comprised of three layers: a cognitive architecture,
which is a fixed structure; knowledge and skills component which is posi-
tioned on top of the architecture; and an equivalent of firmware architecture.
Predicates and conditionals provided by the cognitive architecture integrate
the functionality of rule-based systems and probabilistic networks. The firm-
ware architecture connects its implementation language (Lisp) and the cogni-
tive architecture using a language of factor graphs and piecewise continuous
functions. Sigma's capability has been demonstrated for the following cognitive functions: perception, mental imagery, decision making and problem solving, and NLP tasks such as word sense disambiguation and part-of-speech tagging.
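Since Sigma's knowledge is ultimately compiled down to factor graphs, a tiny worked example helps fix the idea. The sketch below computes variable marginals on a toy two-variable factor graph by direct enumeration; it illustrates factor-graph semantics in general, not Sigma's graphical architecture or its Lisp-based implementation, and all factor values are made up.

    # Toy factor graph over binary variables X and Y:
    # unary factors fx, fy and a pairwise factor g coupling them.
    # On a graph this small, sum-product reduces to brute-force enumeration.
    fx = {0: 0.7, 1: 0.3}
    fy = {0: 0.4, 1: 0.6}
    g = {(0, 0): 1.0, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 1.0}  # prefers X == Y

    joint = {(x, y): fx[x] * fy[y] * g[(x, y)] for x in (0, 1) for y in (0, 1)}
    z = sum(joint.values())                      # normalization constant

    p_x = {x: sum(joint[(x, y)] for y in (0, 1)) / z for x in (0, 1)}
    print(p_x)   # marginal distribution over X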
6.3.3 HTM
Hierarchical Temporal Memory (HTM) is a theoretical framework for both
biological and machine intelligence (Hawkins et al., 2016). The HTM models
structural and algorithmic properties of the neocortex. It combines existing
ideas to simulate the neocortex with a simple design, and yet provides a large
range of cognitive capabilities. HTM integrates and extends approaches used in sparse distributed memory, Bayesian networks, and spatial and temporal clustering algorithms. The HTM structure resembles a tree-shaped hierarchy of nodes, as in some neural networks.
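One ingredient HTM inherits from sparse distributed memory is the sparse distributed representation (SDR): patterns are long, mostly zero bit vectors, and similarity is the overlap of active bits. The following sketch illustrates only that idea; it is not NuPIC code, and the vector size and sparsity values are arbitrary.

    import random

    # Sparse distributed representations: long binary vectors with few active bits.
    N, ACTIVE = 2048, 40                        # vector size and number of on-bits

    def random_sdr():
        return set(random.sample(range(N), ACTIVE))

    def overlap(a, b):
        # Similarity between two SDRs is the count of shared active bits.
        return len(a & b)

    a = random_sdr()
    b = set(a)                                  # a noisy copy of a
    for bit in random.sample(sorted(b), 5):     # perturb a few active bits
        b.discard(bit)
        b.add(random.randrange(N))

    print(overlap(a, b), overlap(a, random_sdr()))   # high overlap vs. chance-level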
6.3.4 IBM Watson
IBM Watson is perhaps the best known cognitive computing system in terms
of its broader impact on society at large (IBM, n.d.; Ferrucci et al., 2010). It is
also the first cognitive computing system that leveraged the synergy between
cognitive science and an array of computing technologies. In 2011, Watson performed at a level sufficient to win the Jeopardy! quiz game against two all-time human champions.
Watson's cognitive computing capabilities are available to organizations and businesses through the BlueMix APIs. There are more than two dozen Watson
APIs which encapsulate over 50 cognitive technologies. IBM is also market-
ing Watson technologies under themes such as cognitive analytics, cognitive
businesses, cognitive homes, and cognitive cars. Underlying these themes is
a transparent cognitive computing infrastructure that provides services to
build cognitive agents.
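Programmatic access to such hosted cognitive services typically follows a simple REST pattern: authenticate, post content as JSON, and receive analyses as JSON. The sketch below shows only that generic pattern; the endpoint URL, credentials, and payload fields are placeholders, not the actual Watson or BlueMix API.

    import requests

    # Generic REST call pattern for a hosted cognitive service.
    # The URL, credentials, and payload fields below are placeholders,
    # not actual Watson API endpoints or parameters.
    ENDPOINT = "https://example-cognitive-service/api/v1/analyze"   # placeholder
    API_KEY = "YOUR_API_KEY"                                         # placeholder

    response = requests.post(
        ENDPOINT,
        headers={"Authorization": "Bearer " + API_KEY},
        json={"text": "Customers report that checkout is slow on mobile."},
        timeout=30,
    )
    response.raise_for_status()
    print(response.json())   # e.g., sentiment, entities, or other analyses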
6.3.5 Hierarchical Bayesian Model
George and Hawkins (2005) describe a hierarchical Bayesian model for
invariant pattern recognition in the visual cortex. Rice et al. (2009) investigate
parallel software- and hardware-accelerated implementations of George and
Hawkins model and also perform scaling analysis on a Cray XD1 supercomputer. They report that hardware acceleration provides an average throughput gain of 75× over software-only implementations of the networks they studied.
6.3.6 Textual Narrative to 3D Geometry Models
Constructing 3D geometric representations from textual narratives has appli-
cations in arts, robotics, and education. A hybrid cognitive architecture is used
to accomplish this task in Chang et al. (2015). A dataset of 3D scenes anno-
tated with natural language descriptions is used as training data for a neural network classifier, which extracts significant features that ground lexical terms to 3D models. These features are then integrated into a rule-based scene generation system.
7 COGNITIVE COMPUTING SYSTEMS AND APPLICATIONS
Cognitive computing systems have been used to solve a range of problems in
diverse disciplines.
7.1 Intelligent Tutoring Systems
The earliest cognitive systems were incarnated as intelligent tutoring systems
(Anderson et al., 1995). Using the ACT cognitive architecture (Anderson,
1996) and advanced computer tutoring theory (Anderson, 1983), three ACT-
based production models were developed (Anderson et al., 1995). The models
reflect how students solve problems in algebra, geometry, and Lisp. In the best cases, evaluations showed that students using these cognitive tutors achieved the same level of performance as with conventional instruction in one-third of the time.
PAT (Koedinger et al., 1997) is a cognitive tutor for high school algebra.
It is also based on the ACT cognitive architecture. PAT’s evaluation was
performed on 470 students in experimental classes. Students in these classes
outperformed students in comparison classes (who did not use PAT) by
15% on standardized tests. Supporting guided learning using a cognitive tutor
is discussed in Aleven and Koedinger (2002). The study reports that students
who explained their steps during problem-solving practice learned with
greater understanding compared to those who did not explain their steps.
7.2 Problem Solving Systems
GEOS is a cognitive system for solving unaltered SAT geometry questions
(Seo et al., 2015). It employs both text understanding and diagram interpreta-
tion in solving geometry problems. The problem is modeled as one of submodular optimization: GEOS identifies a formal problem description that is most likely to be compatible with both the question text and the diagram. GEOS scored 49% on official SAT questions and 61% on practice questions.
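The submodular formulation can be illustrated with the standard greedy heuristic for monotone submodular maximization, which repeatedly adds the candidate with the largest marginal gain. The sketch below uses simple set coverage as the objective; it is a generic illustration of the optimization pattern, not GEOS's actual scoring model, and the candidates and constraints are invented.

    # Greedy maximization of a monotone submodular function under a
    # cardinality constraint, using set coverage as the objective.
    # Each candidate "interpretation" covers some set of constraints.
    candidates = {
        "a": {1, 2, 3},
        "b": {3, 4},
        "c": {4, 5, 6},
        "d": {1, 6},
    }

    def coverage(selected):
        return len(set().union(*(candidates[s] for s in selected)))

    def greedy(k):
        selected = []
        for _ in range(k):
            best = max(
                (c for c in candidates if c not in selected),
                key=lambda c: coverage(selected + [c]) - coverage(selected),
            )
            selected.append(best)
        return selected

    print(greedy(2))    # ['a', 'c'], covering all six constraints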
SimStudent is a teachable agent which learns skills such as solving linear
equations (Matsuda et al., 2013). It learns from examples as well as from
feedback on its performance. SimStudent is used to investigate how and when students do or do not learn when they teach. Analysis of the results indicates that
several cognitive and social factors are correlated with SimStudent learning
(equivalently, student learning). Such factors include accuracy of students’
feedback and hints, quality of students’ explanations during tutoring, and
appropriateness of problem selection.
7.3 Question Answering
The IBM DeepQA project is IBM Watson’s genesis (Ferrucci et al., 2010;
Zadrozny et al., 2015). Knowledge representation and reasoning, machine
learning, natural language understanding, and information retrieval are Watson’s
foundational technologies. DeepQA technology is at the core of Watson,
whose functions include hypothesis generation, evidence gathering across heterogeneous data sources, evidence analysis, and assigning a score to each hypothesis. Highly parallel underlying hardware and networking provided the compute power.
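The hypothesis-and-evidence workflow can be summarized in a small sketch: generate candidate answers, score each against evidence from several sources, and rank by aggregate confidence. The candidates, evidence scores, and weights below are illustrative placeholders, not DeepQA's models.

    # Schematic hypothesis-and-evidence workflow: candidate answers are scored
    # against evidence from several sources and ranked by aggregate confidence.
    # All names and numbers below are illustrative placeholders.
    candidates = ["Toronto", "Chicago", "Baltimore"]

    evidence = {                       # placeholder per-source scores in [0, 1]
        "Toronto":   {"passages": 0.3, "knowledge_base": 0.2},
        "Chicago":   {"passages": 0.8, "knowledge_base": 0.7},
        "Baltimore": {"passages": 0.4, "knowledge_base": 0.3},
    }
    weights = {"passages": 0.6, "knowledge_base": 0.4}

    def confidence(hypothesis):
        return sum(weights[s] * v for s, v in evidence[hypothesis].items())

    ranked = sorted(candidates, key=confidence, reverse=True)
    for h in ranked:
        print(h, round(confidence(h), 2))    # best-supported hypothesis first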
7.4 Health Care
IBM Watson Health is a version of Watson for the health care and wellness
domains. In 2011 Watson made forays into personalized cancer treatments.
Medtronic is a medical technology, services, and solutions company. Using
IBM Watson Health, Medtronic developed a glucose monitor and an insulin
pump. The glucose monitor tracks sugar levels in people with diabetes and
the insulin pump automatically supplies insulin. To date, Medtronic has collected 125 million patient-days of anonymized data from insulin pumps and glucose monitors. Watson Health will analyze this data to discover
insights which can lead to new treatments for diabetes.
The Massachusetts General Hospital plans to use Nvidia GPUs to spot
anomalies in CT scans and other medical images (Don, n.d.). This project will
draw upon 10 billion existing images to train a cognitive computing system
that will use deep learning algorithms. The system is intended to help doctors
more accurately detect diseases such as cancer and Alzheimer's when the
diseases are in their early stages.
7.5 Cognitive Businesses
Cognitive businesses are characterized by learning systems explicitly
designed to collaborate with people in conversational style and understand
natural language. Cognitive businesses also benefit from cognitive analytics.
Instead of using predefined rules and structured queries to identify data to
enable decision making, cognitive analytics utilizes cognitive computing tech-
nologies to generate multiple hypotheses, gather and weigh evidence from
multiple data sources in support of each hypothesis, and rank hypotheses.
Hypotheses that score above a certain threshold are presented as recommenda-
tions along with a numeric value to indicate the system’s confidence in each
hypothesis. The quality of insights generated by cognitive analytics increases
with more data, which is used to train machine learning algorithms. Versions
of IBM Watson are being marketed as cognitive business solutions.
7.6 Human–Robot Interaction
An important aspect of a human–robot interaction is perspective-taking,
which encompasses understanding a concept or assessing a situation from a different point of view. Trafton et al. (2005) illustrate how
perspective-taking enables astronauts to work in a collaborative project envi-
ronment. They describe a cognitive architecture termed Polyscheme for per-
spective taking. They also develop a cognitive system based on Polyscheme
and integrate it with a working robot system. The cognitive system is success-
ful in solving a series of perspective-taking problems.
7.7 Cognitive Robots
An IBM Watson-powered robot, called Pepper, is under development at SoftBank Robotics, a provider of robotics platforms. Watson will help Pepper extract meaning from unstructured data including text, images, video, and social media.
7.8 Deep Learning and Image Search
Machine learning algorithms play a prominent role in cognitive computing.
However, many of them require hand-crafted features, which take substantial effort to engineer. A new class of neural algorithms, called deep learning algorithms, is becoming popular because they learn feature representations automatically. This automation obviates much of the manual feature engineering and, with unsupervised pretraining, reduces the amount of labeled data needed to train the networks.
Deep learning algorithms require massive neural networks loosely modeled on biological neural networks. The Google Brain project, Google DeepMind,
and GPU-accelerated Torch scientific computing framework (with extensive
support for machine learning algorithms) attest to the enormous interest in
deep learning research and applications. Deep learning algorithms have been
successfully used for solving a range of problems.
The MSR-Bing grand challenge (Hua et al., 2014) is a competition to
advance content-based image retrieval research (Gudivada and Raghavan,
1995). MSR-Bing grand challenge goals are similar to those of TREC confer-
ences (Harman, 1993). The training image dataset is generated by sampling one year of click logs from the Microsoft Bing image search engine. The dataset consists of 23.1 million triads of the form (I, Q, C); a triad specifies that for the image query Q, image I was clicked (as the relevant image) C times.
A deep neural network (DNN) approach presented in Bai et al. (2014) for
computing similarity between images maps raw image pixels to a bag-of-
words vector space. Images are represented as vectors in the bag-of-words
space and relevance between two images is computed using a cosine similar-
ity measure. The DNN model also extracts high-level visual features, which are likewise used for computing image similarity. The DNN model is trained
using the MSR-Bing grand challenge dataset. In another study, Wan et al.
(2014) developed a framework for deep learning and used it to investigate
approaches for learning feature representations and similarity measures in
the context of content-based image retrieval applications.
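The relevance computation in these approaches reduces to a cosine similarity between vectors in a common space. The sketch below shows just that step, with made-up bag-of-words vectors standing in for what a trained DNN would produce from raw pixels.

    import math

    # Cosine similarity between two images represented as vectors in a
    # common bag-of-words space; the vectors here are made-up placeholders
    # for the output of a trained DNN.
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    image_a = [0.0, 2.0, 1.0, 0.0, 3.0]      # e.g., weights over visual "words"
    image_b = [0.0, 1.0, 1.0, 0.0, 2.0]
    image_c = [4.0, 0.0, 0.0, 1.0, 0.0]

    print(cosine(image_a, image_b))          # high similarity
    print(cosine(image_a, image_c))          # low similarity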
7.9 Cross-media Retrieval
Cross-media retrieval is a challenging problem. It allows users to search for
documents across various media by specifying a query in any media. The term
document is used in a generic sense and refers to any media content such as
text, image, audio, or video. Searching for images using a query expressed
in text is an example of a cross-media query. Supporting such queries requires, for example, generating text that best describes the content of an image. Dong et al.
(2016) propose a deep neural network-based approach, called Word2Visual-
Vec, for cross-media retrieval. Evaluation is performed using three classes
of queries: text-to-image, image-to-text, and text-to-text.
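Approaches of this kind work by projecting a textual query into the same feature space as images so that ordinary nearest-neighbor search applies. The sketch below mimics that retrieval step; the embed_text function is a crude stand-in for a learned text-to-visual-feature model, not the published Word2VisualVec network, and the image features are invented.

    # Cross-media retrieval sketch: project a text query into a toy visual
    # feature space and rank images by distance to the query vector.
    image_features = {
        "beach.jpg":    [0.9, 0.1, 0.0],
        "mountain.jpg": [0.1, 0.8, 0.1],
        "city.jpg":     [0.0, 0.2, 0.9],
    }

    def embed_text(query):
        # Placeholder projection keyed on a few words; a learned model would
        # map arbitrary sentences into the visual feature space.
        lexicon = {"sea": [0.9, 0.1, 0.0], "hill": [0.1, 0.8, 0.1], "street": [0.0, 0.2, 0.9]}
        words = [lexicon[w] for w in query.split() if w in lexicon]
        dims = len(next(iter(image_features.values())))
        if not words:
            return [0.0] * dims
        return [sum(v[i] for v in words) / len(words) for i in range(dims)]

    def distance(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    q = embed_text("waves on the sea")
    ranked = sorted(image_features, key=lambda img: distance(image_features[img], q))
    print(ranked[0])    # beach.jpg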
7.10 Brain–Computer Interfaces
A brain–computer interface (BCI) enables interaction between a brain and an
external device. For example, brain signals can be used to control external
devices such as cars and prosthetic limbs. Several analytic platforms are
available for studying BCIs. These platforms use
electroencephalogram (EEG), magnetoencephalography (MEG), and func-
tional near-infrared spectroscopy (fNIRS) to record brain signals. The signals
are used to estimate a person’s cognitive state, response, or intent for various
purposes such as opening a door. Martinez et al. (2007) propose a new multi-
stage procedure for a real-time BCI. They develop a system based on the pro-
cedure, which allows a BCI user to navigate a small car on a computer screen
in real time. In a related study, Negueruela et al. (2011) investigate the use of
BCI in space environments, where an astronaut can use mental commands
to control semi-automatic manipulators.
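A common pattern in such systems is to extract simple features, for example band power in selected frequency bands, from short EEG epochs and train a classifier that maps them to commands. The sketch below runs this pipeline on synthetic signals using NumPy and scikit-learn; it is a schematic of the approach, not a reimplementation of the cited systems, and all parameters are arbitrary.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Schematic BCI pipeline on synthetic data: compute band-power features
    # from short EEG-like epochs and train a classifier to map them to
    # commands (e.g., "left" vs. "right"). Real systems use recorded signals.
    rng = np.random.default_rng(0)
    fs, n_samples = 128, 256                       # sampling rate (Hz), epoch length

    def synthetic_epoch(freq):
        t = np.arange(n_samples) / fs
        return np.sin(2 * np.pi * freq * t) + 0.5 * rng.standard_normal(n_samples)

    def band_power(epoch, low, high):
        spectrum = np.abs(np.fft.rfft(epoch)) ** 2
        freqs = np.fft.rfftfreq(len(epoch), 1 / fs)
        return spectrum[(freqs >= low) & (freqs < high)].sum()

    # Class 0 epochs dominated by ~10 Hz activity, class 1 by ~20 Hz activity.
    epochs = [synthetic_epoch(10) for _ in range(50)] + [synthetic_epoch(20) for _ in range(50)]
    X = np.array([[band_power(e, 8, 13), band_power(e, 18, 25)] for e in epochs])
    y = np.array([0] * 50 + [1] * 50)

    clf = LogisticRegression().fit(X, y)
    command = {0: "left", 1: "right"}[int(clf.predict(X[:1])[0])]
    print(command)      # decoded command for the first epoch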
7.11 Autonomous Vehicle Navigation
Cognitive systems are central to autonomous vehicle navigation. They use
several technologies such as radar, lidar, GPS, computer vision, and odometry
to detect their surroundings. Advanced control systems analyze sensory data
to distinguish different objects on the road. Google's self-driving car, Tesla Motors' Autopilot, and Mercedes-Benz's Drive Pilot are active projects toward realizing the dream of self-driving cars.
8 TRENDS AND FUTURE RESEARCH DIRECTIONS
The synergistic confluence of cognitive science, data science, and computing
will accelerate cognitive computing research and applications. Cognitive sci-
ence will continue to provide theoretical underpinnings, data science will
provide cognitive analytics, and the computing discipline will bring advances
in hardware technologies, big data, and machine learning to enable develop-
ing extremely large scale cognitive systems. The latter is critical to advancing
the field by validating cognitive theories through empirical studies.
Biological cognitive systems retain an advantage in that their structure and function are not yet understood well enough to replicate them artificially. On
the other hand, cognitive computing systems do not have the limitations of
biological cognitive systems in terms of memory size and performance degra-
dation due to aging and overwork.
The distinction between the symbolic AI approaches and neural network
approaches will continue to blur. They complement each other rather than
compete. Symbolic approaches have an edge over neural approaches in
explaining the reasoning, whereas the neural network approaches contribute
to building better models by leveraging big data and machine learning algo-
rithms. Symbolic inference rules can be used to extract reasoning steps from neural models; conversely, neural approaches can be used to synthesize symbolic inference rules through semi-supervised and unsupervised learning.
Schatsky et al. (2015) conducted a study to assess how cognitive technolo-
gies are used in over 100 organizations that span 17 industry sectors. Their study
revealed that the use of cognitive technologies falls into three categories. The first case
involves embedding the technology into a product or service such as self-driving
cars and other autonomous vehicles. Second, process applications embed cogni-
tive technologies into workflows to improve operational efficiency. The third
category encompasses using cognitive analytics to gain insights that will drive
operational and strategic decisions across an organization. The current use of
cognitive technologies by businesses is the proverbial tip of the iceberg.
According to EMC Digital Universe, based on research & analysis per-
formed by International Data Corporation (IDC) (Turner, n.d.), data is growing
at an annual rate of 40% into the next decade. For example, the planned
Airbus A380-1000 airliner will be capable of carrying about 1000 passengers.
It will be equipped with 10,000 sensors in each wing, which will generate
about 8 TB of data daily. There will be many more sensors in the engines
and other parts of the aircraft. Cognitive analytics will play a prominent role
in utilizing this data for a range of tasks including preventive maintenance,
improving passenger comfort and safety, and enhancing the operational effi-
ciency of airports. More and more companies, ranging from Netflix, Amazon, and IBM to Google, are branding themselves as machine learning companies.
Machine learning algorithms are at the core of their products and services.
Garrett (2014) discusses how big data analytics and cognitive computing
will propel research and applications in astronomy. He argues that factors such
as performance at scale, machine learning, and cognitive science will contribute
to the rapid progress of the field. Along the same lines, Noor (2014) states that
cognitive computing will be the game changer for engineering systems. Patel
et al. (2001) explain how cognitive science can help gain insights into the
nature of human–computer interaction processes in medical informatics, which
can be used to improve the design of medical information systems.
Though the state-of-the-art cognitive systems are not anywhere close to
performing at the level of humans, the gap is rapidly diminishing. For exam-
ple, consider the game of Go, which has long been viewed as one of the most challenging problems for AI and machine learning researchers. Compared to chess, Go has an enormously larger search space, which makes brute-force evaluation of board positions and moves infeasible. Silver et al. (2016) discuss how they have
applied DeepMind cognitive technologies to develop AlphaGo, which
defeated the human European Go champion by 5 games to 0. Furthermore,
AlphaGo achieved a 99.8% winning rate against other Go computer programs.
This event is extremely significant because it is the first time that a computer program has defeated a human professional player in a full-sized game of Go.
In closing, cognitive computing is the confluence of cognitive science,
data science, and computing. Dramatic advances in natural language understanding and in image and video analysis hold the key to propelling cognitive computing to the next level. Cognitive computing is a double-edged sword with
the power to transform human lives or become an instrument for misuse
and destruction.
9 COGNITIVE COMPUTING RESOURCES
9.1 Open Source Frameworks, Tools, and Digital Libraries
1. 1-billion-word-language-modeling-benchmark, a standard training and
test setup for natural language modeling experiments. Google
Corporation. https://github.com/ciprian-chelba/1-billion-word-language-
modeling-benchmark.
2. Apache Lucene Core, a full-featured text search engine Java library.
http://lucene.apache.org/core/.
3. Apache UIMA project. Open source frameworks, tools, and annotators
for facilitating the analysis of unstructured content such as text, audio
and video. http://uima.apache.org/.
4. ACL Anthology: A Digital Archive of Research Papers in Computational
Linguistics. Association for Computational Linguistics, http://www.
aclweb.org/anthology/.
5. Biologically Inspired Cognitive Architectures (BICA) Society. Promotes
and facilitates the transdisciplinary study of biologically inspired cogni-
tive architectures. http://bicasociety.org/.
6. British National Corpus. A 100 million word collection of samples of
written and spoken language designed to represent a wide cross-section
of British English. http://www.natcorp.ox.ac.uk/.
7. The Cognitive Computing Consortium, a forum for researchers, develo-
pers and practitioners of cognitive computing and its allied technologies.
https://cognitivecomputingconsortium.com/.
8. Cognitive Linguistics Journal, a publication of the International Cognitive
Linguistics Association. Publishes linguistic research which addresses
the interaction between language and cognition. http://www.
cognitivelinguistics.org/en/journal.
9. Cognitive Science Society. A professional organization for researchers
whose goal is to understand the nature of the human mind. http://
www.cognitivesciencesociety.org/.
10. Database Management Systems (DBMS). This Web application provides
a ranking of over 300 software systems for data management. http://
db-engines.com/en/.
11. DBpedia, Towards a Public Data Infrastructure for a Large, Multilingual,
Semantic Knowledge Graph. http://wiki.dbpedia.org/.
12. Europarl, a Parallel Corpus for Statistical Machine Translation. Includes
versions for 21 European languages. http://www.statmt.org/europarl/.
13. eWAVE, an interactive database on morphosyntactic variations in 50 vari-
eties of spontaneous spoken English. Kortmann, Bernd & Lunkenheimer,
Kerstin (eds.), The Electronic World Atlas of Varieties of English, Leipzig:
Max Planck Institute for Evolutionary Anthropology. http://ewave-atlas.org/.
14. GATE, an open source software for solving natural language problems,
The University of Sheffield. http://gate.ac.uk/.
15. IBM Watson Academy. Cognitive computing resources including a public
access Cognitive Computing MOOC. http://www-304.ibm.com/services/
weblectures/watsonacademy/#intro.
16. IBM Research: Cognitive computing. http://www.research.ibm.com/
cognitive-computing/.
17. International Cognitive Linguistics Association. Promotes approaches to
linguistics research that are based on the perspective that language is an
integral part of cognition. http://www.cognitivelinguistics.org/en.
18. ImageJ, an open source platform for scientific image analysis. http://
imagej.net/Welcome.
19. ImmuNet, data science tools for predicting the role of genes in immunol-
ogy. http://immunet.princeton.edu/.
20. MITCogNet, a research tool for scholars in the Brain & Cognitive
Sciences. http://cognet.mit.edu/.
21. NoSQL. A list of systems for nonrelational data management. http://
nosql-database.org/.
22. NuPIC, a platform and community for machine intelligence based on
HTM theory. http://numenta.org/.
23. OpenCV, an open source computer vision and machine learning software
library. http://opencv.org/.
24. OpenNLP, a machine learning-based toolkit for NLP. The Apache Software
Foundation. http://opennlp.apache.org/.
25. openSMILE, a tool for extracting audio feature spaces in real time. http://
audeering.com/research/opensmile/.
26. Open Source Speech Software, Carnegie Mellon University. http://www.
speech.cs.cmu.edu/.
27. Open Text Summarizer, a library and a command line tool for multilin-
gual text summarization. http://libots.sourceforge.net/.
28. Praat, a tool for speech manipulation, analysis, and synthesis. http://www.
fon.hum.uva.nl/praat/.
29. Parsey McParseface, a pretrained SyntaxNet model for parsing the stan-
dard English language. https://github.com/tensorflow/models/tree/
master/syntaxnet.
30. Project Gutenberg, offers over 50,000 free ebooks. https://www.
gutenberg.org/.
31. The R Project for Statistical Computing. Provides open source tools
for statistical computing and data visualization. https://www.r-project.
org/.
32. The Stanford Encyclopedia of Philosophy, Edward N. Zalta (ed.), http://
plato.stanford.edu/.
33. The Stanford Natural Language Processing Group’s statistical NLP,
deep learning NLP, and rule-based NLP software tools for solving major
computational linguistics problems. http://nlp.stanford.edu/software/
index.shtml.