
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

Amit Sheth, Sujan Perera, Sanjaya Wijeratne, and Krishnaprasad Thirunarayan

Kno.e.sis Center, Wright State University
Dayton, Ohio, USA
{amit,sujan,sanjaya,tkprasad}@knoesis.org
http://www.knoesis.org
arXiv:1707.05308v1 [cs.AI] 14 Jul 2017

Abstract. Machine Learning has been a big success story during the AI resurgence. One particular standout success relates to learning from a massive amount of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition of the value of utilizing knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability to deeply understand and exploit multimodal data, and the continued incorporation of knowledge in learning techniques.

Keywords: Machine Intelligence, Multimodal Exploitation, Understanding Complex Text, Knowledge-enhanced Machine Learning, Knowledge-enhanced NLP, Knowledge-driven Deep Content Understanding, Personalized Digital Health, Semantic-Cognitive-Perceptual Computing, Implicit Entity Recognition, Emoji Sense Disambiguation

1 Introduction
Recent success in the area of Machine Learning (ML) for Natural Language Processing (NLP) has been largely credited to the availability of enormous training datasets and computing power to train complex computational models [12]. Complex NLP tasks such as statistical machine translation and speech recognition have greatly benefited from the Web-scale unlabeled data that is freely available for consumption by learning systems such as deep neural nets. However, many traditional research problems related to NLP, such as part-of-speech tagging and named entity recognition (NER), require labeled or human-annotated data, and the creation of such datasets is expensive in terms of the human effort required.

In spite of early assertions of the unreasonable effectiveness of data (i.e., that data alone is sufficient), there is increasing recognition of the value of knowledge in solving complex AI problems. Even though knowledge base creation and curation are non-trivial, they can significantly improve result quality, reliability, and coverage. A number of AI experts, including Yoav Shoham [37], Oren Etzioni, and Pedro Domingos [8,9], have made this point in recent years. In fact, codification and exploitation of declarative knowledge can be both feasible and beneficial in situations where there is not enough data or adequate methodology to learn the nuances associated with the concepts and their relationships.
The value of domain/world knowledge in solving complex problems was recognized much earlier [43]. These efforts were centered around language understanding; hence, the major focus was on representing linguistic knowledge. The most popular artifacts of these efforts are FrameNet [29] and WordNet [22], which were developed by realizing the ideas of frame semantics [11] and lexical-semantic relations [6], respectively. Both resources have been used extensively by the NLP research community to understand the semantics of natural language documents.
The building and utilization of knowledge bases took a major leap with the advent of the Semantic Web in the early 2000s. For example, knowledge bases were the key to the first patent on the Semantic Web and to a commercial semantic search/browsing and personalization engine over 15 years ago [33], where knowledge in multiple domains complemented ML techniques for information extraction (NER, semantic annotation) and for building intelligent applications¹. Major efforts in the Semantic Web community have produced large cross-domain (e.g., DBpedia, Yago, Freebase, Google Knowledge Graph) and domain-specific (e.g., Gene Ontology, MusicBrainz, UMLS) knowledge bases in recent years, which have served as the foundation for the intelligent applications discussed next.
The value of these knowledge bases has been demonstrated for determining semantic similarity [20,42], question answering [30], ontology alignment [14], and word sense disambiguation (WSD) [21], as well as in major practical AI services, including Apple’s Siri, Google’s Semantic Search, and IBM’s Watson. For example, Siri relies on knowledge extracted from reputable online resources to answer queries on restaurant searches, movie suggestions, nearby events, etc. In fact, “question answering”, which is the core competency of Siri, was built by partnering with Semantic Web and Semantic Search service providers who extensively utilize knowledge bases in their applications². The Jeopardy version of IBM Watson uses semi-structured and structured knowledge bases such as DBpedia, Yago, and WordNet to strengthen the evidence and answer sources that fuel its DeepQA architecture [10]. A recent study [19] has shown that Google search results can be negatively affected when Google does not have access to Wikipedia. Google Semantic Search is fueled by Google Knowledge Graph³, which is also used to enrich search results, similar to what the Taalee/Semagix semantic search engine did 15 years ago⁴ [33,34].

¹ http://j.mp/15yrsSS
² https://en.wikipedia.org/wiki/Siri
³ http://bit.ly/22xUjZ6
While knowledge bases are used in an auxiliary manner in the above scenarios, we argue that they have a major role to play in understanding real-world data. Real-world data has a complexity that has yet to be fully appreciated and supported by automated systems, and this complexity emerges along various dimensions. Human communication has added many constructs to language that help people organize knowledge and communicate effectively and concisely. However, current information extraction solutions fall short in processing several implicit constructs and pieces of information that are readily accessible to humans. One source of such complexity is our ability to express ideas, facts, and opinions in an implicit manner. For example, the sentence “The patient showed accumulation of fluid in his extremities, but respirations were unlabored and there were no use of accessory muscles” refers to the clinical conditions of “shortness of breath” and “edema”, as a clinician would understand. However, the sentence does not contain the names of these clinical conditions; rather, it contains descriptions that imply the two conditions. The current literature on entity extraction has not paid much attention to implicit entities [28].
Another complexity in real-world scenarios and use cases is data heterogeneity due to their multimodal nature. There is increasing availability of physical (including sensor/IoT), cyber, and social data that are related to events and experiences of human interest [31]. For example, in our personalized digital health application for managing asthma in children⁵, we use numeric data from sensors measuring a patient’s physiology (e.g., exhaled nitric oxide) and immediate surroundings (e.g., volatile organic compounds, particulate matter, temperature, humidity), collect data from the Web for the local area (e.g., air quality, pollen, weather), and extract textual data from social media (i.e., tweets and web forum data relevant to asthma) [1]. Each of these modalities provides complementary information that is helpful in evaluating a hypothesis provided by a clinician and also helps in disease management. We can also relate anomalies in sensor readings (such as those from a spirometer) to asthma symptoms and potential treatments (such as taking rescue medication). Thus, understanding a patient’s health and well-being requires integrating and interpreting multimodal data and gleaning insights to provide reliable situational awareness and decisions. Knowledge bases play a critical role in establishing relationships among multiple data streams of diverse modalities, disease characteristics, and treatments, and in transcending multiple abstraction levels [32]. For instance, we can relate a patient’s asthma severity level, measured exhaled nitric oxide, relevant environmental triggers, and prescribed asthma medications to one another to arrive at personalized actionable insights and decisions.
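To make knowledge-mediated integration concrete, the following minimal Python sketch relates readings from different modalities (a physiological sensor, an indoor sensor, and Web data) through one shared rule base. It is an illustration under stated assumptions only: the `Rule` structure, field names such as `exhaled_nitric_oxide_ppb`, the function `actionable_insights`, and all threshold values are hypothetical placeholders (not clinical guidance), and the knowledge bases used in the application described above are far richer.

```python
from dataclasses import dataclass

# Hypothetical knowledge base: each rule links an observation type, which may come
# from a body sensor, an indoor sensor, or Web data, to an asthma-related concept.
# The threshold values are illustrative placeholders, NOT clinical guidance.
@dataclass(frozen=True)
class Rule:
    observation: str  # field name in the merged observation record
    threshold: float  # the rule fires when the reading is at or above this value
    concept: str      # knowledge-base concept the reading points to
    action: str       # follow-up associated with that concept

KNOWLEDGE_BASE = [
    Rule("exhaled_nitric_oxide_ppb", 35.0, "airway inflammation", "review controller medication"),
    Rule("pm2_5_ugm3", 35.0, "particulate trigger", "limit outdoor activity"),
    Rule("pollen_index", 9.0, "pollen trigger", "limit outdoor activity"),
]

def actionable_insights(observations: dict[str, float]) -> list[str]:
    """Interpret readings from several modalities through one shared knowledge base."""
    insights = []
    for rule in KNOWLEDGE_BASE:
        value = observations.get(rule.observation)
        if value is not None and value >= rule.threshold:
            insights.append(f"{rule.concept}: {rule.action}")
    return insights

if __name__ == "__main__":
    readings = {
        "exhaled_nitric_oxide_ppb": 42.0,  # patient physiology sensor
        "pm2_5_ugm3": 12.0,                # indoor air-quality sensor
        "pollen_index": 10.5,              # Web data for the local area
    }
    print(actionable_insights(readings))
    # ['airway inflammation: review controller medication', 'pollen trigger: limit outdoor activity']
```

Keeping the rules as data rather than code reflects the point of the paragraph: the knowledge base, not each individual sensor pipeline, is what ties the modalities together.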
Knowledge bases can come in handy when there is not enough hand-labeled data for supervised learning. For example, emoji sense disambiguation, the ability to computationally identify the meaning of an emoji in the context of a message [40,41], is a problem that can be approached with both supervised and knowledge-based methods. However, no hand-labeled emoji sense dataset exists that could be used to solve this problem with supervised learning algorithms. One reason for this could be that emoji have only recently become popular, despite having been first introduced in the late 1990s [40].

⁴ https://goo.gl/A54hno
⁵ http://bit.ly/kAsthma
We have developed a comprehensive emoji sense knowledge base called EmojiNet [40,41] by automatically extracting emoji senses from open web resources and integrating them with BabelNet. Using EmojiNet as a sense inventory, we have demonstrated that the emoji sense disambiguation problem can be solved with carefully designed knowledge bases, obtaining promising results [41].
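How a sense inventory can stand in for labeled training data can be illustrated with a simplified Lesk-style overlap: pick the sense whose associated words overlap most with the message. We substitute this classic technique for illustration; it is not necessarily the method used in [41]. The inventory `SENSE_INVENTORY`, its sense labels, and the key `face_with_tears_of_joy` are hypothetical toy data, not EmojiNet content; the test messages come from Figure 1.

```python
import re
from typing import Optional

# Hypothetical toy sense inventory in the spirit of EmojiNet:
# emoji -> sense label -> words associated with that sense.
# These entries are illustrative only, not actual EmojiNet data.
SENSE_INVENTORY = {
    "face_with_tears_of_joy": {
        "laugh (noun)": {"laugh", "laughing", "funny", "hilarious", "joke", "lol"},
        "cry (verb)": {"cry", "crying", "tears", "hurts", "sad", "pain"},
    },
}

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; apostrophes are kept so "can't" stays one token."""
    return set(re.findall(r"[a-z']+", text.lower()))

def disambiguate(emoji: str, message: str) -> Optional[str]:
    """Return the sense whose associated words overlap most with the message."""
    context = tokenize(message)
    best_sense, best_overlap = None, 0
    for sense, sense_words in SENSE_INVENTORY.get(emoji, {}).items():
        overlap = len(context & sense_words)
        if overlap > best_overlap:  # strict '>' keeps the first sense on ties
            best_sense, best_overlap = sense, overlap
    return best_sense  # None when no sense shares a word with the message

if __name__ == "__main__":
    print(disambiguate("face_with_tears_of_joy", "Can't stop laughing"))             # laugh (noun)
    print(disambiguate("face_with_tears_of_joy", "My knee hurts, already in tears"))  # cry (verb)
```

No labeled examples are needed here; all of the discriminating power comes from the knowledge attached to each sense, which is why the quality of the inventory matters so much.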
In this paper, we argue that careful exploitation of knowledge can greatly enhance our current ability to process (big) data. At Kno.e.sis, we have dealt with several complex situations where:

1. The large quantities of hand-labeled data required for supervised techniques to work well are not available, or the annotation effort is significant.
2. The text to be recognized is complex (i.e., beyond simple entities such as person, location, and organization), requiring novel techniques for dealing with complex/compound entities [27], implicit entities [25,26], and subjectivity (emotions, intention) [13,38].
3. Multimodal data (numeric, textual, and image; qualitative and quantitative; certain and uncertain) are naturally available [1,2,4,39].

Our recent efforts have centered around exploiting different kinds of knowledge bases and using semantic techniques to complement and enhance ML, statistical techniques, and NLP. Our ideas are inspired by the human brain’s ability to learn and generalize knowledge from a small amount of data (i.e., humans do not need to examine tens of thousands of cat faces to recognize the next “unseen” cat shown to them), analyze situations by simultaneously and synergistically exploiting multimodal data streams, and understand more complex and nuanced aspects of content, especially by knowing (through common-sense knowledge) semantics/identity-preserving transformations.

2 Challenges in creating and using knowledge bases


The last decade saw an increasing use of background knowledge for solving diverse problems. While applications such as searching, browsing, and question answering can use large, publicly available knowledge bases in their current form, others, such as movie recommendation, biomedical knowledge discovery, and clinical data interpretation, are challenged by the limitations discussed below.

Lack of organization of knowledge bases: Proper organization of knowledge bases has not kept pace with their rapid growth in both variety and size. Users find it increasingly difficult to find relevant knowledge bases, or relevant portions of a large knowledge base, for use in domain-specific applications (e.g., movie, clinical, biomedical). This highlights the need to identify and select relevant knowledge bases from collections such as the Linked Open Data cloud, and to extract the relevant portion of the knowledge from broad-coverage sources such as Wikipedia and DBpedia. We are working on automatically indexing the domains of knowledge bases [17] and exploiting the semantics of entities and their relationships to select relevant portions of a knowledge base [18].
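One simple way to pull a domain-relevant portion out of a large knowledge base is to start from seed entities and follow only the predicates judged relevant to the domain. The breadth-first sketch below assumes the knowledge base is a list of (subject, predicate, object) string triples and that the relevant predicates are chosen by hand; `select_subgraph` and the DBpedia-style toy triples are hypothetical, and the semantics-driven selection in [18] is more sophisticated.

```python
from collections import deque

Triple = tuple[str, str, str]

def select_subgraph(triples: list[Triple], seeds: set[str],
                    relevant_predicates: set[str], max_hops: int = 2) -> set[Triple]:
    """Keep triples reachable from seed entities through domain-relevant predicates."""
    # Index only the triples whose predicate is considered relevant to the domain.
    adjacency: dict[str, list[Triple]] = {}
    for triple in triples:
        subject, predicate, obj = triple
        if predicate in relevant_predicates:
            adjacency.setdefault(subject, []).append(triple)
            adjacency.setdefault(obj, []).append(triple)

    selected: set[Triple] = set()
    visited = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:  # breadth-first expansion, bounded by max_hops
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for triple in adjacency.get(node, []):
            selected.add(triple)
            subject, _, obj = triple
            neighbor = obj if subject == node else subject
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return selected

if __name__ == "__main__":
    # Hypothetical DBpedia-style toy triples.
    kb = [
        ("Gravity", "director", "Alfonso_Cuaron"),
        ("Gravity", "starring", "Sandra_Bullock"),
        ("Sandra_Bullock", "birthPlace", "Arlington_Virginia"),
        ("The_Martian", "starring", "Matt_Damon"),
    ]
    print(sorted(select_subgraph(kb, {"Gravity"}, {"director", "starring"})))
    # [('Gravity', 'director', 'Alfonso_Cuaron'), ('Gravity', 'starring', 'Sandra_Bullock')]
```

Filtering by predicate before traversal is what keeps the extracted portion on-domain: the `birthPlace` edge is reachable but ignored because it is not relevant to a movie application.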

Gaps in represented knowledge: Existing knowledge bases can be incomplete with respect to the task at hand. For example, applications such as computer-assisted coding (CAC) and clinical documentation improvement (CDI) require comprehensive knowledge about a particular domain (e.g., cardiology, oncology)⁶. We observe that although existing medical knowledge bases (e.g., the Unified Medical Language System (UMLS)) are rich in taxonomical relationships, they lack non-taxonomical relationships among clinical entities. We have developed data-driven algorithms that use real-world clinical data (such as EMRs) to discover missing relationships between clinical entities in existing knowledge bases, and then have these validated by a domain expert in the loop [24]. Yet another challenge is creating personalized knowledge bases for specific tasks. For example, in [35], personal knowledge graphs are created based on the content consumed by a user, taking into account the dynamically changing vocabulary, and are applied to improve subsequent filtering of relevant content.
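The general pattern of proposing missing relationships from clinical data for expert review can be sketched with co-occurrence statistics. The sketch below ranks concept pairs by pointwise mutual information (PMI) over the concept sets of patient records, skipping pairs the knowledge base already relates. PMI is a standard association measure we substitute for illustration; it is not necessarily the algorithm of [24], and the function name, thresholds, and toy records are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def candidate_relations(records: list[list[str]], known_pairs: set[frozenset],
                        min_count: int = 2, min_pmi: float = 0.0) -> list[tuple[str, str, float]]:
    """Rank concept pairs that co-occur in patient records but are unlinked in the KB."""
    n = len(records)
    concept_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for concepts in records:
        unique = sorted(set(concepts))  # count each concept once per record
        concept_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    candidates = []
    for (a, b), joint in pair_counts.items():
        if joint < min_count or frozenset((a, b)) in known_pairs:
            continue  # too rare, or the knowledge base already relates a and b
        # Pointwise mutual information: log P(a, b) / (P(a) P(b)).
        pmi = math.log(joint * n / (concept_counts[a] * concept_counts[b]))
        if pmi > min_pmi:
            candidates.append((a, b, round(pmi, 3)))
    # Highest-PMI pairs first; these would go to a domain expert for validation.
    return sorted(candidates, key=lambda c: c[2], reverse=True)

if __name__ == "__main__":
    records = [  # hypothetical concept sets extracted from five EMRs
        ["chest pain", "dyspnea", "nitroglycerin"],
        ["chest pain", "nitroglycerin"],
        ["dyspnea", "albuterol"],
        ["dyspnea", "albuterol"],
        ["fever", "acetaminophen"],
    ]
    known = {frozenset(("dyspnea", "albuterol"))}
    print(candidate_relations(records, known))  # [('chest pain', 'nitroglycerin', 0.916)]
```

The output is a ranked list of candidates rather than new facts, which matches the expert-in-the-loop step described above: statistics propose, a clinician disposes.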

Inefficient metadata representation and reasoning techniques: The scope of what is captured in knowledge bases is rapidly expanding and involves more subtle aspects such as subjectivity (intention, emotions, sentiments), spatial and temporal information, and provenance. Traditional triple-based representation languages developed by the Semantic Web community (e.g., RDF, OWL) are unsuitable for capturing such metadata due to their limited expressivity. For example, the representation of spatio-temporal context or of the uncertainty associated with a triple is ad hoc and inefficient, and it lacks semantic integration for formal reasoning. These limitations and requirements are well recognized by the Semantic Web community, and some recent research addressing them is promising. For example, the singleton-property-based representation [23] adds the ability to make statements about a triple (i.e., to express the context of a triple), and probabilistic soft logic [15] adds the ability to associate a probability value with a triple and to reason over such values. It will be exciting to see applications that exploit such enhanced hybrid knowledge representation models to perform ‘human-like’ reasoning.
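The singleton-property idea can be shown with plain tuples: instead of reifying a triple, each statement gets its own property instance, which is linked to the generic property and can carry metadata such as provenance or confidence. This is a minimal sketch assuming string triples; the `#n` naming scheme, the unprefixed `singletonPropertyOf` name, the function names, and the example metadata are illustrative assumptions rather than the exact encoding of [23].

```python
Triple = tuple[str, str, str]

# Linking predicate from the singleton-property proposal; written without a
# namespace prefix here for simplicity.
SINGLETON_PROPERTY_OF = "singletonPropertyOf"

def with_context(subject: str, predicate: str, obj: str,
                 statement_id: int, **context: str) -> list[Triple]:
    """Encode one statement plus its metadata using a singleton property."""
    singleton = f"{predicate}#{statement_id}"  # a property instance unique to this statement
    triples = [
        (subject, singleton, obj),
        (singleton, SINGLETON_PROPERTY_OF, predicate),
    ]
    # Context (provenance, time, confidence, ...) is attached to the singleton
    # property, so it describes exactly one statement.
    triples.extend((singleton, key, value) for key, value in context.items())
    return triples

def base_statements(triples: list[Triple]) -> set[Triple]:
    """Recover plain (subject, predicate, object) statements from singleton form."""
    generic = {s: o for s, p, o in triples if p == SINGLETON_PROPERTY_OF}
    return {(s, generic[p], o) for s, p, o in triples if p in generic}

if __name__ == "__main__":
    kb = with_context("Buprenorphine", "hasSideEffect", "Headache", 1,
                      source="web forum post", confidence="0.7")
    for triple in kb:
        print(triple)
    print(base_statements(kb))  # {('Buprenorphine', 'hasSideEffect', 'Headache')}
```

Because the metadata hangs off a property rather than off a reified statement node, every fact stays a single triple and the original statement remains recoverable, which is the efficiency argument made for this representation.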
Next, we discuss several applications that utilize knowledge bases and multimodal data to circumvent or overcome some of the aforementioned challenges arising from insufficient manually created knowledge.

Application 1: Emoji sense disambiguation


With the rise of social media, “emoji” have become extremely popular in online communication. People are using emoji as a new language on social media to add color and whimsy to their messages. Without rigid semantics attached to them, emoji symbols take on different meanings based on the context of a message. This has resulted in ambiguity in emoji use (see Figure 1). Only recently have there been efforts to carry NLP techniques used for machine translation, word sense disambiguation, and search over into the realm of emoji [41]. The ability to automatically process, derive meaning from, and interpret text fused with emoji will be essential as society embraces emoji as a standard form of online communication. Knowledge bases specifically designed to capture emoji meaning can play a vital role in representing, contextually disambiguating, and converting pictorial forms of emoji into text, thereby leveraging and generalizing NLP techniques for processing a richer medium of communication.

⁶ https://goo.gl/nXDY8x

Laugh (noun): “Can’t stop laughing”
Crying (verb): “My knee hurts, already in tears”
Hilarious (adjective): “Central Intelligence was damn hilarious!”

Pray (verb): “Pray for my family, god gained an angel today”
Highfive (noun): “We did it man! High-fives all around”
Thanks (noun): “Thank you so much for taking care of the baby”

Monkey (noun): “Got a pet monkey”
Hiding (verb): “The dog was hiding behind the door”
Blind (verb): “I’m blind with no lights on. Can’t see anything”

Fig. 1. Emoji usage in social media with multiple senses. (Emoji images not reproduced; senses and examples are listed by figure column.)
As a step towards building machines that can understand emoji, we have developed EmojiNet [40,41], the first machine-readable sense inventory for emoji. It links Unicode emoji representations to their English meanings extracted from the Web, enabling systems to link emoji with their context-specific meanings. EmojiNet is constructed by integrating multiple emoji resources with BabelNet, the most comprehensive multilingual sense inventory available to date. For example, for the emoji ‘face with tears of joy’ 😂, EmojiNet lists 14 different senses, ranging from happy to sad. An application designed to disambiguate emoji senses can use the senses provided by EmojiNet to automatically learn the message contexts in which a particular emoji sense could appear. Emoji sense disambiguation could improve research on sentiment and emotion analysis. For example, consider the emoji 😂, which can take the meanings happy and sad depending on the context in which it is used. Current sentiment analysis applications do not differentiate between these two meanings when they process 😂. However, finding the meaning of 😂 with emoji sense disambiguation techniques [41] can improve sentiment prediction. Emoji similarity calculation is another task that could benefit from knowledge bases and multimodal data analysis. Similar to computing similarity between words, we can calculate the similarity between emoji characters. We have demonstrated how EmojiNet can be utilized to solve the problem of emoji similarity [42]. Specifically, we have shown that emoji similarity measures based on the rich emoji meanings available in EmojiNet can outperform conventional emoji similarity measures based on distributional semantic models, and can also help improve applications such as sentiment analysis [42].
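To illustrate why a sense inventory helps with emoji similarity, the sketch below compares emoji by the overlap of their sense sets using Jaccard similarity. This is a deliberately simple stand-in chosen for illustration; the measures evaluated in [42] are more elaborate, and the sense sets and emoji keys here are hypothetical toy data.

```python
# Hypothetical toy sense sets keyed by Unicode emoji names; not actual EmojiNet entries.
EMOJI_SENSES = {
    "face_with_tears_of_joy": {"laugh", "joy", "funny", "cry", "sad"},
    "loudly_crying_face": {"cry", "sad", "tears", "laugh"},
    "folded_hands": {"pray", "thanks", "highfive", "please"},
}

def jaccard(a: set[str], b: set[str]) -> float:
    """Share of senses two emoji have in common, relative to all their senses."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(emoji: str) -> tuple[str, float]:
    """Return the other emoji whose sense set overlaps most with the given one."""
    scores = {
        other: jaccard(EMOJI_SENSES[emoji], senses)
        for other, senses in EMOJI_SENSES.items()
        if other != emoji
    }
    best = max(scores, key=scores.get)
    return best, scores[best]

if __name__ == "__main__":
    print(most_similar("face_with_tears_of_joy"))  # ('loudly_crying_face', 0.5)
```

Unlike distributional measures, which depend on how often two emoji appear in similar contexts, this comparison works directly on the meanings recorded in the knowledge base.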

Application 2: Implicit entity linking


As discussed, one of the complexities in data is the ability to express facts, ideas, and opinions in an implicit manner. As humans, we seamlessly express and infer implicit information in our daily conversations. Consider the two tweets “Aren’t we gonna talk about how ridiculous the new space movie with Sandra Bullock is?” and “I’m striving to be +ve in what I say, so I’ll refrain from making a comment abt the latest Michael Bay movie”. The first tweet contains an implicit mention of the movie ‘Gravity’, and the second contains an element of sarcasm and negative sentiment towards the movie ‘Transformers: Age of Extinction’; both the sentiment and the movie are implicit in the tweet. Although facts, ideas, and opinions can all be expressed implicitly, for brevity we focus on how knowledge aids in the automatic identification of implicitly mentioned entities in text.
We define implicit entities as “entities mentioned in text where neither its name nor its synonym/alias/abbreviation or co-reference is explicitly mentioned in the same text”. Implicit entities are a common occurrence. For example, our studies found that 21% of movie mentions and 40% of book mentions in tweets are implicit, and that about 35% of ‘edema’ mentions and 40% of ‘shortness of breath’ mentions in clinical narratives are implicit. There are genuine reasons why people tend to use implicit mentions in daily conversations. Here are a few reasons that we have observed:

1. To express sentiment and sarcasm: See the examples above.
2. To provide descriptive information: For example, it is common practice in clinical narratives to describe the features of an entity rather than simply state its name. Consider the sentence ‘small fluid adjacent to the gallbladder with gallstones which may represent inflammation.’ This sentence contains an implicit mention of the condition cholecystitis (‘inflammation in gallbladder’ is recognized as cholecystitis) along with its possible cause. The extra information (i.e., the possible cause) in the description can be critical in understanding the patient’s health status and treating the patient. Although this extra information could be provided alongside the corresponding explicit entity names, we observe that clinical professionals prefer this style.
3. To emphasize the features of an entity: Sometimes we replace the name of an entity with its special characteristics in order to give importance to those characteristics. For example, the text snippet “Mason Evans 12 year long shoot won big in golden globe” has an implicit mention of the movie ‘Boyhood.’ This snippet differs from its alternative form “Boyhood won big in golden globe”: the speaker is interested in emphasizing the distinct feature of the movie, which would have been lost had the speaker used the movie’s name as in the second phrase.
4. To communicate shared understanding: We do not bother spelling out everything when we know that the other person has enough background knowledge to understand the message conveyed. A good example is the fact that clinical narratives rarely mention the relationships between entities explicitly (e.g., relationships between symptoms and disorders, or between medications and disorders); rather, it is understood that the other professionals reading the document have the expertise to understand such implicit relationships.

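[Figure 2 graphic not reproduced: entity models for the movies ‘Gravity’, ‘Interstellar’, and ‘The Martian’, linking them to nodes such as Sandra Bullock, Alfonso Cuarón, Christopher Nolan, Matt Damon, Woman in Space, Mars Orbiter Mission, and Astronaut; legend: factual knowledge, contextual knowledge, entity.]

Fig. 2. Entity model extracted for three movies.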

Whenever we communicate, we assume common understanding or shared knowledge with the audience. A reader who does not know that Sandra Bullock starred in the movie ‘Gravity’ and that it is a space exploration movie would not be able to decode the reference to ‘Gravity’ in the first example; a reader who does not know about Michael Bay’s movie release would have no clue about the movie mentioned in the second tweet; a reader who does not know the characteristics of the clinical condition ‘cholecystitis’ would not be able to decode its mention in the clinical text snippet shown above; and a reader who is not a medical expert would not be able to connect the diseases and symptoms mentioned in a clinical narrative. These examples demonstrate the indispensable value of domain knowledge in text understanding. Unfortunately, state-of-the-art named entity recognition applications do not capture implicit entities [28]. Nor have we seen big data-centric or other approaches that can glean implicit entities without the use of background knowledge, whether that knowledge is already available (e.g., in UMLS) or can be created (e.g., from tweets and Wikipedia).
The task of recognizing implicit entities in text demands comprehensive and up-to-date world knowledge. Individuals resort to a diverse set of entity characteristics to make implicit references. For example, references to the movie ‘Boyhood’ can use phrases like “Richard Linklater movie”, “Ellar Coltrane on his 12-year movie role”, “12-year long movie shoot”, “latest movie shot in my city Houston”, and “Mason Evan’s childhood movie”. Hence, it is important to have comprehensive knowledge about entities to decode their implicit mentions. Another complexity is the temporal relevance of the knowledge. The same phrase can be used to refer to different entities at different points in time. For instance, the phrase “space movie” referred to the movie ‘Gravity’ in Fall 2013, while the same phrase in Fall 2015 referred to the movie ‘The Martian’. Conversely, the most salient characteristics of a movie may change over time, and so will the phrases used to refer to it. In November 2014, the movie ‘Furious 7’ was frequently referred to with the phrase “Paul Walker’s last movie”, reflecting the actor’s death in November 2013. However, after the movie’s release in April 2015, it was often mentioned through the phrase “fastest film to reach the $1 billion”.
We have developed knowledge-driven solutions that decode implicit entity mentions in clinical narratives [25] and tweets [26]. We exploit publicly available knowledge bases (only the portions that match the domain of interest) to access the domain knowledge required to decode implicitly mentioned entities. Our solution models individual entities of interest by collecting knowledge about them from these publicly available knowledge bases; the models consist of definitions of the entities, other associated concepts, and the temporal relevance of the associated concepts. Figure 2 shows a snippet of a generated entity model, namely the models generated for the movies ‘Gravity’, ‘Interstellar’, and ‘The Martian’. The colored (shaded) nodes represent factual knowledge related to these movies extracted from the DBpedia knowledge base, and the uncolored nodes represent contextual (time-sensitive) knowledge related to the entities extracted from daily communications on Twitter. The implicit entity linking algorithms are designed to carefully use the knowledge encoded in these models to identify implicit entities in text.
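A toy version of entity-model-based linking may help make the roles of factual and time-sensitive contextual knowledge concrete. In the sketch below, each entity model holds factual phrases and contextual phrases with validity periods, and a text is linked to the entity with the highest weighted phrase overlap at the time the text was written. The weights, the threshold, the validity periods (taken loosely from “Fall 2013” and “Fall 2015” above), and the names `EntityModel` and `link_implicit` are hypothetical assumptions, and the matching is far simpler than the algorithms in [25,26].

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class EntityModel:
    name: str
    factual: set[str]  # time-independent facts, e.g. cast or director from DBpedia
    # Contextual phrases mapped to the period in which they were in use (e.g. from tweets).
    contextual: dict[str, tuple[date, date]] = field(default_factory=dict)

def score(model: EntityModel, text: str, when: date) -> float:
    """Naive substring matching; factual hits count 1, time-valid contextual hits count 2."""
    text = text.lower()
    total = sum(1.0 for term in model.factual if term in text)
    total += sum(2.0 for term, (start, end) in model.contextual.items()
                 if term in text and start <= when <= end)
    return total

def link_implicit(models: list[EntityModel], text: str, when: date,
                  min_score: float = 2.0) -> Optional[str]:
    """Return the best-scoring entity, or None if the evidence is too weak."""
    best = max(models, key=lambda m: score(m, text, when))
    return best.name if score(best, text, when) >= min_score else None

MODELS = [
    EntityModel("Gravity", {"sandra bullock", "alfonso cuaron", "astronaut"},
                {"space movie": (date(2013, 9, 1), date(2013, 12, 31))}),
    EntityModel("The Martian", {"matt damon", "mars", "astronaut"},
                {"space movie": (date(2015, 9, 1), date(2015, 12, 31))}),
    EntityModel("Interstellar", {"christopher nolan", "astronaut"}),
]

if __name__ == "__main__":
    tweet = "Aren't we gonna talk about how ridiculous the new space movie with Sandra Bullock is?"
    print(link_implicit(MODELS, tweet, date(2013, 10, 15)))                        # Gravity
    print(link_implicit(MODELS, "That space movie was great", date(2015, 10, 15)))  # The Martian
```

The second call shows the temporal point made earlier: the same phrase “space movie” resolves to a different entity because only contextual knowledge valid at the tweet’s date counts. Substring matching is deliberately naive (it would, for instance, match “mars” inside longer words).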

Application 3: Understanding and analyzing drug abuse related discussions on web forums
The use of knowledge bases to improve keyword-based search has received much attention from commercial search engines lately. However, knowledge bases alone cannot satisfy complex, domain-specific information needs. For example, answering a complex search query such as “How are drug users overdosing on semi synthetic opioid Buprenorphine?” may require a search engine to be aware of several facts, including that Buprenorphine is a drug, that users refer to Buprenorphine with synonyms such as ‘bupe’, ‘bupey’, ‘suboxone’, and ‘subbies’, and the prescribed daily dosage range for Buprenorphine. To answer such complex search needs, the search engine should also have access to ontological knowledge as well as other “intelligible constructs” that are not typically modeled in ontologies, such as equivalent references to the frequency of drug use, the interval of use, and the typical dosage. At Kno.e.sis, we have developed an information retrieval system that integrates ontology-driven query interpretation with synonym-based query expansion and domain-specific rules to facilitate analysis of online web forums for drug abuse-related information extraction. Our system is based on a context-free grammar (CFG) that defines the interpretation of the query language constructs used to express drug abuse-related information needs, and on a domain-specific knowledge base that can be used to understand information in drug-related web forum posts. Our tool utilizes lexical, lexico-ontological, ontological, and rule-based knowledge to understand the information needs behind complex search queries, and uses that information to expand the queries for significantly higher recall and precision (see Figure 3) [5]. This research [7] resulted in an unexpected finding of the abuse of an over-the-counter drug, which led to an FDA warning⁷.

[Figure 3 graphic not reproduced. It shows the knowledge types used (rule-based grammar, ontology, lexicon, lexico-ontology); the types of information extracted (entities, emotion, drug form, dosage, intensity, frequency, interval, route of administration, side effects, pronouns, sentiment, triples); a Drug Abuse Ontology fragment in which Subutex and Suboxone are subClassOf Buprenorphine and Buprenorphine has_slang ‘bupey’ and ‘bupe’; and a web forum post about injecting Suboxone annotated with positive (“feel pretty damn good”) and negative (“bad headache”) sentiment expressions and triples such as “Suboxone used by injection”.]

Fig. 3. (a) Use of background knowledge to enhance information extraction of diverse types of information. (b) Example use of diverse knowledge and information extraction for deeper and more comprehensive understanding of text in the health and drug abuse domain. See [5] for more information.
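The ontology- and synonym-based part of the query expansion described above can be sketched as follows. The sketch assumes the ontology is reduced to two hypothetical dictionaries (`SUBCLASS_OF`, `HAS_SLANG`) populated with the terms given in the text and in Figure 3; it leaves out the CFG-based query interpretation, the dosage, frequency, and interval constructs, and the domain rules of the actual system [5].

```python
# Toy fragment of a drug abuse ontology, following Figure 3 and the synonyms
# listed in the text; the dictionary layout is a hypothetical simplification.
SUBCLASS_OF = {"subutex": "buprenorphine", "suboxone": "buprenorphine"}
HAS_SLANG = {"buprenorphine": {"bupe", "bupey", "subbies"}}

def expand(term: str) -> set[str]:
    """Expand a query term with its ontology subclasses and slang terms."""
    term = term.lower()
    expanded = {term}
    # Narrower drugs: anything declared a subclass of the term.
    expanded |= {sub for sub, parent in SUBCLASS_OF.items() if parent == term}
    # Slang for the term itself and for each of its subclasses.
    for t in list(expanded):
        expanded |= HAS_SLANG.get(t, set())
    return expanded

def to_boolean_query(terms: list[str]) -> str:
    """OR together each term's expansions and AND the groups."""
    return " AND ".join("(" + " OR ".join(sorted(expand(t))) + ")" for t in terms)

if __name__ == "__main__":
    print(to_boolean_query(["Buprenorphine", "overdose"]))
    # (bupe OR bupey OR buprenorphine OR subbies OR suboxone OR subutex) AND (overdose)
```

Expansion raises recall by matching posts that never use the formal drug name; in the full system, the grammar and rule-based knowledge are what keep precision up as the query grows.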

Application 4: Understanding city traffic using sensor and textual observations
With increased urbanization, understanding and controlling city traffic flow has become an important problem. Currently, there are over 1 billion cars on the road network, and there was a 236% increase in vehicular traffic from 1981 to 2001 [2]. Given that road traffic is predicted to double by 2020, achieving zero traffic fatalities and reducing traffic delays are becoming pressing challenges that require a deeper understanding of traffic events, their consequences, and their interaction with traffic flow. Sensors deployed on road networks continuously relay important information about travel speeds on the monitored roads, while citizen sensors (i.e., humans) share real-time information about traffic/road conditions on public social media streams such as Twitter. As humans, we know how to integrate information from these multimodal data sources, using qualitative traffic event information to account for quantitatively measured traffic flow (e.g., an accident reported in tweets can explain slow-moving traffic nearby). However, current research on understanding city traffic dynamics focuses either on sensor data or on social media data, but not both. In contrast, we use historical data to understand traffic patterns and exploit the complementary and corroborative nature of these multimodal data sources to provide comprehensive information about traffic.

⁷ http://bit.ly/k-FDA
One research direction is to synthesize statistical domain knowledge about traffic and materialize it in a machine-readable format. In other words, we want to define and establish associations between different variables (concepts) in the traffic domain (e.g., the association between ‘bad weather’ and a ‘traffic jam’). However, correlations mined from data alone are neither complete nor reliable. We have developed statistical techniques based on probabilistic graphical models (PGMs) [16] that learn the structure (variable dependencies), leverage declarative domain knowledge to enrich and/or correct the gleaned structure where a purely data-driven approach falls short, and finally learn parameters for the updated structural model. Specifically, we use the sensor data collected by 511.org to develop an initial PGM that explains the conditional dependencies between variables in the traffic domain. Then we use declarative knowledge in ConceptNet to add or modify variables (nodes) and the type and nature of conditional dependencies (directed edges) before learning parameters, thereby obtaining the complete PGM. Figure 4(a)(i) shows a snippet of ConceptNet, and Figure 4(a)(ii) demonstrates the enrichment step of the developed model using the domain knowledge in ConceptNet [3].
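The three-step procedure (learn structure from data, enrich it with declarative knowledge, then learn parameters) can be sketched in Python. This is a toy illustration under stated assumptions, not the authors' implementation: variables are binary, a pairwise mutual-information threshold stands in for a real structure-learning algorithm, the observations are made up, and the causal triples are hypothetical ConceptNet-style facts rather than entries retrieved from ConceptNet.

```python
import math
from collections import Counter
from itertools import combinations, product

VARIABLES = ["rain", "concert", "congestion"]
# Hypothetical binary observations (1 = present), columns in VARIABLES order.
ROWS = [(1, 0, 1)] * 4 + [(0, 0, 0)] * 4 + [(1, 0, 0), (0, 0, 1), (0, 1, 1)]
DATA = [dict(zip(VARIABLES, row)) for row in ROWS]

# Hypothetical ConceptNet-style triples (head, relation, tail).
KNOWLEDGE = [("rain", "Causes", "congestion"), ("concert", "Causes", "congestion")]


def mutual_information(data, a, b):
    """Empirical mutual information (in nats) between variables a and b."""
    n = len(data)
    joint = Counter((r[a], r[b]) for r in data)
    pa = Counter(r[a] for r in data)
    pb = Counter(r[b] for r in data)
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in joint.items())


def learn_structure(data, variables, threshold=0.1):
    """Step 1: keep dependent pairs. Pairwise dependence cannot orient an
    edge, so edges are oriented alphabetically as a placeholder."""
    return {tuple(sorted((a, b))) for a, b in combinations(variables, 2)
            if mutual_information(data, a, b) > threshold}


def enrich_with_knowledge(edges, triples):
    """Step 2: add missing causal edges and fix their direction."""
    enriched = set(edges)
    for head, relation, tail in triples:
        if relation == "Causes":
            enriched.discard((tail, head))
            enriched.add((head, tail))
    return enriched


def learn_cpt(data, child, parents, alpha=1.0):
    """Step 3: P(child = 1 | parent values), Laplace-smoothed counts."""
    cpt = {}
    for config in product([0, 1], repeat=len(parents)):
        rows = [r for r in data if tuple(r[p] for p in parents) == config]
        ones = sum(r[child] for r in rows)
        cpt[config] = (ones + alpha) / (len(rows) + 2 * alpha)
    return cpt


learned = learn_structure(DATA, VARIABLES)
enriched = enrich_with_knowledge(learned, KNOWLEDGE)
parents = sorted(a for a, b in enriched if b == "congestion")
cpt = learn_cpt(DATA, "congestion", parents)
print(learned, enriched, parents, cpt)
```

On this toy data the data-only step finds the rain/congestion dependence but orients it as the implausible congestion → rain, and it misses the concert → congestion link because concerts are rare in the sample. The knowledge step reverses the first edge and adds the second before the conditional probability table for congestion is estimated.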
[Figure 4 graphic: ConceptNet concepts such as "bad weather", "scheduled event", and "overturned truck" linked by "causes" edges; domain knowledge of traffic flow synthesized as a PGM from sensor data (Steps 1 to 3); frequency of occurrence of extracted events.]

Fig. 4. (a)(i) Domain knowledge of traffic in the form of concepts and relationships (mostly causal) from ConceptNet. (a)(ii) A probabilistic graphical model (PGM) that explains the conditional dependencies between variables in the traffic domain (only a portion is shown) is enriched by adding the missing random variables, links, and link directions extracted from ConceptNet. (b) The enriched PGM is used to correlate contextually related data of different modalities [3].

Another research direction is to characterize normal traffic patterns derived from sensor observations and then detect and explain any anomalies using social media data. We used a Restricted Switching Linear Dynamical System (RSLDS) to model normal speed and travel time dynamics and detect anomalies. Using speed and travel time data from each link, together with our common sense knowledge about the nature of expected traffic variations, we learn the parameters of the RSLDS model for each link. We then use a box plot of the log likelihood scores of the various average speed traces with respect to the RSLDS model to learn and characterize anomalies for each link in the San Francisco Bay Area traffic data [2]. Later, given a new traffic speed trace over a link, we can obtain its log likelihood score with respect to the RSLDS model for the particular day of the week and hour of the day to determine whether it is normal or anomalous. This anomalous traffic speed information is further correlated with traffic events extracted from Twitter data (using crawlers seeded with OSM, 511.org, and Scribe vocabularies) using their spatio-temporal context to explain the anomalies. Figure 4(b) demonstrates this process. This example again demonstrates the vital role of multimodal data for better interpretation of traffic dynamics, synthesizing probabilistic/statistical knowledge, and the application of both statistical models such as RSLDS and complementary semantic analysis of Twitter data. Further exploration of different approaches to represent and exploit semantics appears in [36]. Table 1 summarizes the role of knowledge bases in the four applications discussed above.
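The normalcy-and-anomaly pipeline can be illustrated with a simplified Python sketch. It deliberately substitutes a single-regime AR(1) Gaussian model for the RSLDS (a switching model is beyond a short example), uses synthetic speed traces, and treats the event records, coordinates, search radius, and time window as hypothetical. Only the overall flow follows the text: fit a per-link model, score traces by log likelihood, set a box-plot fence, then match anomalies to nearby tweet events.

```python
import math
import random
import statistics


def fit_ar1(traces):
    """Least-squares fit of speed[t] = a * speed[t-1] + b + Gaussian noise.
    A simplified stand-in for the RSLDS, fit per link/day/hour slot."""
    xs = [x for tr in traces for x in tr[:-1]]
    ys = [y for tr in traces for y in tr[1:]]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    var = max(statistics.fmean((y - a * x - b) ** 2 for x, y in zip(xs, ys)), 1e-6)
    return a, b, var


def log_likelihood(trace, model):
    """Sum of Gaussian log densities of each one-step transition."""
    a, b, var = model
    return sum(-0.5 * math.log(2 * math.pi * var) - (y - a * x - b) ** 2 / (2 * var)
               for x, y in zip(trace[:-1], trace[1:]))


def lower_fence(scores):
    """Box-plot rule: scores below Q1 - 1.5 * IQR are treated as anomalous."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return q1 - 1.5 * (q3 - q1)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def explain(anomaly, events, radius_km=2.0, window_min=60):
    """Return tweet-derived events close to the anomaly in space and time."""
    return [e["text"] for e in events
            if abs(e["minute"] - anomaly["minute"]) <= window_min
            and haversine_km(anomaly["lat"], anomaly["lon"], e["lat"], e["lon"]) <= radius_km]


# Synthetic "normal" traces for one link: about 60 mph with small noise.
rng = random.Random(7)
train = [[60 + rng.gauss(0, 2) for _ in range(20)] for _ in range(30)]
model = fit_ar1(train)
fence = lower_fence([log_likelihood(tr, model) for tr in train])

normal_trace = [60.0 + (0.5 if t % 2 else -0.5) for t in range(20)]
slowdown = [60.0] * 8 + [10.0] * 12  # sudden drop to 10 mph
is_anomalous = {name: log_likelihood(tr, model) < fence
                for name, tr in [("normal", normal_trace), ("slowdown", slowdown)]}

# Hypothetical events extracted from tweets (lat, lon, minute of day).
events = [
    {"text": "Overturned truck blocking left lane", "lat": 37.7790, "lon": -122.4150, "minute": 485},
    {"text": "Concert traffic near arena", "lat": 37.8044, "lon": -122.2712, "minute": 480},
    {"text": "Road work tonight", "lat": 37.7790, "lon": -122.4150, "minute": 1200},
]
anomaly = {"lat": 37.7749, "lon": -122.4194, "minute": 480}
print(is_anomalous)               # {'normal': False, 'slowdown': True}
print(explain(anomaly, events))   # ['Overturned truck blocking left lane']
```

The second event is rejected because it is about 13 km away, and the third because it occurs hours later; only the spatio-temporally close event is offered as an explanation.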

3 Looking forward
We discussed the importance of domain/world knowledge in understanding complex data in the real world, particularly when large amounts of training data are not readily available or are expensive to generate. We demonstrated several applications where knowledge plays an indispensable role in understanding complex language constructs and multimodal data. Specifically, we have demonstrated how knowledge can be created to incorporate a new medium of communication (such as emoji), how curated knowledge can be adapted to process implicit references (such as in implicit entity and relation linking), how statistical knowledge can be synthesized in terms of normalcy and anomaly and integrated with textual information (such as in the traffic context), and how linguistic knowledge can be used for more expressive querying of informal text with improved recall (such as in drug-related posts). We are also seeing early efforts to make knowledge bases dynamic so that they evolve to account for changes in the real world8.
8 http://bit.ly/2cVGbov

Table 1. Summary of knowledge-based approaches and the resulting improvements for each problem domain.

Problem Domain | Use of Knowledge Bases | Nature of Improvement
Emoji Similarity and Sense Disambiguation | Generation and application of EmojiNet | Leveraging linguistic knowledge for emoji interpretation
Implicit Entity Linking | Adapted UMLS definitions for identifying medical entities, and Wikipedia and Twitter data for identifying Twitter entities | Recall and coverage
Understanding Drug Abuse-related Discussions | Application of the Drug Abuse Ontology along with slang term dictionaries and grammar | Recall and coverage
Traffic Data Analysis | Statistical knowledge extraction, and ontologies for Twitter event extraction | Anomaly detection and explanation; multimodal data stream correlation

Knowledge seems to play a central role in human learning and intelligence, such as in learning from a small amount of data, and in cognition, especially perception. Our ability to create or deploy just the right knowledge in our computing processes will improve machine intelligence, perhaps much as knowledge has played a central role in human intelligence. As a corollary, we expect two specific advances: a deeper and more nuanced understanding of content (including but not limited to text), and the ability to process and learn from multimodal data at a semantic level (given that concepts manifest very differently at the data level in different media or modalities). The human brain is extremely adept at processing multimodal data: our senses are capable of receiving 11 million bits per second, and our brain is able to distill that into abstractions that need only a few tens of bits to represent (for further explorations, see [32]). Knowledge plays a central role in this abstraction and reasoning process, known as the perception cycle.
Knowledge-driven processing can be viewed in terms of three increasingly sophisticated computational approaches: (1) Semantic Computing, (2) Cognitive Computing, and (3) Perceptual Computing. Semantic Computing refers to determining the type of a data value and relating it to other domain concepts. In the healthcare context, this can involve relating symptoms to diseases and treatments. Ontologies and Semantic Web technologies provide the foundation for semantic computing. Cognitive computing refers to representing and reasoning with data using background knowledge that reflects how humans interpret and process data. In the healthcare context, this requires capturing the experience and domain expertise of doctors through knowledge bases and heuristic rules for abstracting multimodal data into medically relevant abstractions, insights, and actions, taking into account triggers, personal data, patient health history, demographic data, health objectives, and medical domain knowledge. For instance, “normal” blood pressure varies with factors such as age, gender, emotional state, activity, and illness; similarly, the “target” blood pressure, HbA1c, and cholesterol values a patient is advised to maintain depend on whether the patient is diabetic. In the traffic context, this can be used to interpret and label a time series of traffic sensor data using a traffic event ontology. Perceptual computing, which builds on the background knowledge created for semantic and cognitive computing, uses deductive reasoning to predict effects and treatments from causes, and abductive reasoning to explain the effects using causes, resolving any data incompleteness or ambiguity by seeking additional data. The knowledge itself can be a hybrid of deterministic and probabilistic rules, modeling both normalcy and anomalies and transcending abstraction levels. This directly contributes to making decisions and taking actions.
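The interplay of abduction, deduction, and seeking additional data can be sketched in a few lines of Python. The cause-effect knowledge base below is hypothetical and purely deterministic, and "best explanation" is simplified to "explains the most observed effects while predicting no effect known to be absent"; a real perceptual computing system would use hybrid deterministic and probabilistic knowledge, as the text notes.

```python
from collections import Counter

# Hypothetical causal knowledge: cause -> effects it produces.
CAUSES = {
    "accident": {"low_speed", "accident_tweets", "lane_closure"},
    "bad_weather": {"low_speed", "rain_reported"},
    "scheduled_event": {"low_speed", "event_tweets", "high_volume"},
}


def abduce(observed, absent, kb):
    """Abduction: causes that explain the most observed effects and
    predict none of the effects known to be absent."""
    scores = {c: len(fx & observed) for c, fx in kb.items() if not (fx & absent)}
    best = max(scores.values(), default=0)
    return sorted(c for c, s in scores.items() if s == best and s > 0)


def discriminating_effect(candidates, kb, asked):
    """Pick an unasked effect predicted by some, but not all, candidates."""
    counts = Counter(e for c in candidates for e in kb[c] if e not in asked)
    options = sorted(e for e, n in counts.items() if n < len(candidates))
    return options[0] if options else None


def perception_cycle(observed, kb, ask):
    """Alternate abduction with requests for more data until one explanation
    remains (or no question separates the candidates), then deduce the
    not-yet-observed effects of each remaining cause."""
    observed, absent = set(observed), set()
    while True:
        candidates = abduce(observed, absent, kb)
        effect = (discriminating_effect(candidates, kb, observed | absent)
                  if len(candidates) > 1 else None)
        if effect is None:
            return candidates, {c: sorted(kb[c] - observed) for c in candidates}
        # Seek additional data: record the answer as present or absent.
        (observed if ask(effect) else absent).add(effect)


# Simulated world: `ask` checks whether an effect is actually present.
world = {"low_speed", "accident_tweets", "lane_closure"}
print(perception_cycle({"low_speed"}, CAUSES, lambda e: e in world))
# (['accident'], {'accident': ['lane_closure']})
```

Each loop iteration asks about an effect not asked before, so the cycle always terminates; the deduced predictions (here, a lane closure) are what a system would act on or verify next.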

[Figure 5 graphic: raw data are processed through SC, CC, and PC in two examples. Example 1 (health): raw health data yield physiological and environmental observations (# of steps, # of coughs, AQI, FEV1), abstracted into health signals (low activity, high pollen, disturbed sleep, high cough) and then into health condition insights and treatments (asthma is moderately controlled; take inhaled corticosteroid regularly). Example 2 (traffic): 511.org and Twitter data feed RSLDS normalcy models and traffic events (SCRIBE, 511.org, OSM), knowledge-guided piecewise linear approximation of normalcy and anomaly detection, explanation of sensor time-series anomalies using Twitter events, and prediction of city traffic flow (e.g., 10 mph speed on right lane due to overturned truck in left lane). Supporting inputs: ontologies, annotated data, historical data (e.g., personal, traffic data), learning from historical data, knowledge bases and declarative knowledge, personalization and contextualization using experiences, demographics data, unstructured data, and human expert abstractions for decision making, predictions, and actions.]

Fig. 5. Interplay between Semantic, Cognitive, and Perceptual Computing (SC, CC, and PC) with Examples.

We expect more progress in hybrid knowledge representation and reasoning techniques that better fit domain characteristics and applications. Even though deep learning techniques have made incredible progress in machine learning and prediction tasks, they remain uninterpretable and vulnerable to adversarial attacks. There are anecdotal examples of adversarial attacks causing misinterpretations of audio and video data, resulting in egregious errors with serious negative consequences. In such scenarios, we expect hybrid knowledge bases to provide a complementary foundation for reliable reasoning. In the medical domain, the use of interleaved abductive and deductive reasoning (i.e., the perception cycle) can provide actionable insights ranging from determining confirmatory laboratory tests and disease diagnosis to treatment decisions. Declarative medical knowledge bases can be used to verify the consistency of an EMR, and data-driven techniques can be applied to a collection of EMRs to detect and fix potential gaps in the knowledge bases. Thus, there is a symbiotic relationship between knowledge and data: each can be applied to improve the reliability of the other. The traffic scenario shows how to hybridize complementary statistical knowledge and declarative knowledge to obtain an enriched representation (see also [36]). It also shows how multimodal data streams can be integrated to provide more comprehensive situational awareness.
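The symbiosis between declarative knowledge and EMR data can be made concrete with a small Python sketch. Everything in it is a hypothetical simplification: the knowledge base is a toy medication-to-disorder "treats" map, a record is treated as consistent when every medication treats some recorded disorder, and frequently co-occurring unexplained medication-disorder pairs are only proposed as candidate additions for expert review, not added automatically.

```python
from collections import Counter

# Hypothetical declarative knowledge: medication -> disorders it treats.
TREATS = {
    "albuterol": {"asthma"},
    "metformin": {"type_2_diabetes"},
}


def unexplained_medications(emr, kb):
    """Consistency check: medications that treat no disorder in the record."""
    return sorted(m for m in emr["medications"]
                  if not (kb.get(m, set()) & emr["disorders"]))


def propose_gaps(emrs, kb, min_support=2):
    """Data-driven step: count (unexplained medication, co-occurring disorder)
    pairs across records; frequent pairs are candidate knowledge-base gaps."""
    counts = Counter()
    for emr in emrs:
        for med in unexplained_medications(emr, kb):
            for disorder in emr["disorders"]:
                counts[(med, disorder)] += 1
    return sorted(pair for pair, n in counts.items() if n >= min_support)


EMRS = [  # hypothetical toy records
    {"disorders": {"allergic_rhinitis"}, "medications": {"montelukast"}},
    {"disorders": {"allergic_rhinitis", "asthma"}, "medications": {"montelukast", "albuterol"}},
    {"disorders": {"type_2_diabetes"}, "medications": {"metformin"}},
    {"disorders": {"hypertension"}, "medications": {"metformin"}},
]

print([unexplained_medications(e, TREATS) for e in EMRS])
# [['montelukast'], ['montelukast'], [], ['metformin']]
print(propose_gaps(EMRS, TREATS))
# [('montelukast', 'allergic_rhinitis')]
```

A one-off co-occurrence such as metformin with hypertension stays below the support threshold, which is one simple guard against proposing spurious knowledge from noisy records.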
Machine intelligence has been the holy grail of much recent AI research. The statistical pattern matching approach and learning from big data, typically of a single modality, have seen tremendous success. Those of us who have pursued brain-inspired computing approaches think the time has come for rapid progress using a model-building approach. The ability to build broad models (in terms of both coverage and variety: not only entities and relationships, but also emotions, intentions, and subjectivity features, such as linguistic, cultural, and other aspects of human interest and functions) will be critical. Further, domain-specific, purpose-specific, personalized declarative knowledge combined with richer representations, especially probabilistic graphical models, will see rapid progress. These will complement neural network approaches. We may also see knowledge playing a significant role in enhancing deep learning. Rather than data-centric approaches dominating, we will see an interleaving and interplay of the data and knowledge tracks, each with its own strengths and weaknesses, with their combination performing better than either in isolation.

Acknowledgments
We acknowledge partial support from the National Institutes of Health (NIH)
award: 1R01HD087132-01: “kHealth: Semantic Multisensory Mobile Approach to
Personalized Asthma Care” and the National Science Foundation (NSF) award:
EAR 1520870: “Hazards SEES: Social and Physical Sensing Enabled Decision
Support for Disaster Management and Response”. Points of view or opinions
in this document are those of the authors and do not necessarily represent the
official position or policies of the NIH or NSF.

References
1. Anantharam, P., Banerjee, T., Sheth, A., Thirunarayan, K., Marupudi, S., Srid-
haran, V., Forbis, S.G.: Knowledge-driven personalized contextual mhealth service
for asthma management in children. In: 2015 IEEE Intl. Conf. on Mobile Services
(IEEE MS) (2015)
2. Anantharam, P., Thirunarayan, K., Marupudi, S., Sheth, A., Banerjee, T.: Under-
standing city traffic dynamics utilizing sensor and textual observations. In: Proc. of
The 13th AAAI Conf. on Artificial Intelligence (AAAI), February 12–17, Phoenix,
Arizona, USA (2016)
3. Anantharam, P., Thirunarayan, K., Sheth, A.P.: Traffic analytics using probabilis-
tic graphical models enhanced with knowledge bases. Analytics for Cyber Physical
Systems workshop at the SIAM Conf. on Data Mining (ACS) (2013)
4. Balasuriya, L., Wijeratne, S., Doran, D., Sheth, A.: Finding street gang members
on Twitter. In: The 2016 IEEE/ACM Intl. Conf. on Advances in Social Networks
Analysis and Mining (ASONAM). vol. 8, pp. 685–692 (August 2016)
5. Cameron, D., Sheth, A., Jaykumar, N., Thirunarayan, K., Anand, G., Smith, G.A.:
A hybrid approach to finding relevant social media content for complex domain
specific information needs. Web Semantics: Science, Services and Agents on the
World Wide Web 29 (2014)
6. Cruse, D.A.: Lexical semantics. Cambridge University Press (1986)
7. Daniulaityte, R., Carlson, R., Falck, R., Cameron, D., Perera, S., Chen, L., Sheth,
A.: “I just wanted to tell you that loperamide will work”: A web-based study of
extra-medical use of loperamide. Drug and Alcohol Dependence 130(1), 241–244
(2013)
8. Domingos, P.: A few useful things to know about machine learning. Communica-
tions of the ACM 55 (2012)
9. Domingos, P.: The master algorithm: How the quest for the ultimate learning
machine will remake our world. Basic Books (2015)
10. Ferrucci, D., Brown, E., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A.A.,
Lally, A., Murdock, J.W., Nyberg, E., Prager, J., et al.: Building Watson: An
overview of the DeepQA project. AI Magazine 31(3) (2010)
11. Fillmore, C.J.: Frame semantics and the nature of language. Annals of the New
York Academy of Sciences 280(1) (1976)
12. Halevy, A., Norvig, P., Pereira, F.: The unreasonable effectiveness of data. IEEE
Intelligent Systems 24 (2009)
13. Jadhav, A.: Knowledge Driven Search Intent Mining. Ph.D. thesis, Wright State
University (2016)
14. Jain, P., Hitzler, P., Sheth, A.P., Verma, K., Yeh, P.Z.: Ontology alignment for
linked open data. In: International Semantic Web Conference (ISWC). pp. 402–
417 (2010)
15. Kimmig, A., Bach, S., Broecheler, M., Huang, B., Getoor, L.: A short introduc-
tion to probabilistic soft logic. In: Proc. of the NIPS Workshop on Probabilistic
Programming: Foundations and Applications (2012)
16. Koller, D., Friedman, N.: Probabilistic graphical models: principles and techniques.
MIT Press (2009)
17. Lalithsena, S., Hitzler, P., Sheth, A., Jain, P.: Automatic domain identification for
linked open data. In: IEEE/WIC/ACM Intl. Joint Conf. on Web Intelligence and
Intelligent Agent Technologies (WI). vol. 1 (2013)
18. Lalithsena, S., Kapanipathi, P., Sheth, A.: Harnessing relationships for domain-
specific subgraph extraction: A recommendation use case. In: 2016 IEEE Intl. Conf.
on Big Data (Big Data). pp. 706–715 (2016)
19. McMahon, C., Johnson, I., Hecht, B.: The substantial interdependence of Wikipedia
and Google: A case study on the relationship between peer production communities
and information technologies. In: 11th Intl. AAAI Conf. on Web and Social Media
(ICWSM). pp. 142–151. Montreal, Canada (May 2017)
20. Meng, L., Huang, R., Gu, J.: A review of semantic similarity measures in WordNet.
International Journal of Hybrid Information Technology 6(1) (2013)
21. Mihalcea, R.: Knowledge-based methods for WSD. Word Sense Disambiguation:
Algorithms and Applications (2006)
22. Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.J.: Introduction to
WordNet: An on-line lexical database. International Journal of Lexicography 3(4)
(1990)
23. Nguyen, V., Bodenreider, O., Sheth, A.: Don’t like RDF reification?: Making statements
about statements using singleton property. In: Proc. of the 23rd Intl. Conf.
on World Wide Web (WWW). pp. 759–770. Seoul, Korea (2014)
24. Perera, S., Henson, C., Thirunarayan, K., Sheth, A., Nair, S.: Semantics driven
approach for knowledge acquisition from EMRs. IEEE Journal of BHI 18(2) (2014)
25. Perera, S., Mendes, P., Sheth, A., Thirunarayan, K., Alex, A., Heid, C., Mott, G.:
Implicit entity recognition in clinical documents. In: Proc. of the 4th Joint Conf.
on Lexical and Computational Semantics (*SEM). pp. 228–238 (2015)
26. Perera, S., Mendes, P.N., Alex, A., Sheth, A., Thirunarayan, K.: Implicit entity
linking in tweets. In: Extended Semantic Web Conference (ESWC). pp. 118–132.
Greece (2016)
27. Ramakrishnan, C., Mendes, P.N., da Gama, R.A., Ferreira, G.C., Sheth, A.: Joint
extraction of compound entities and relationships from biomedical literature. In:
Proc. of the 2008 IEEE/WIC/ACM Intl. Conf. on Web Intelligence and Intelligent
Agent Technology (WI). pp. 398–401. Sydney, Australia (2008)
28. Rizzo, G., Basave, A.E.C., Pereira, B., Varga, A., Rowe, M., Stankovic, M., Dadzie,
A.: Making sense of microposts (#microposts2015) named entity recognition and
linking challenge. In: #MSM (2015)
29. Ruppenhofer, J., Ellsworth, M., Petruck, M.R., Johnson, C.R., Scheffczyk, J.:
FrameNet II: Extended theory and practice (2006)
30. Shekarpour, S., Ngonga Ngomo, A.C., Auer, S.: Question answering on interlinked
data. In: Proc. of the 22nd Intl. Conf. on World Wide Web (WWW). pp. 1145–
1156. Rio de Janeiro, Brazil (2013)
31. Sheth, A., Anantharam, P., Henson, C.: Physical-cyber-social computing: An early
21st century approach. IEEE Intelligent Systems 28(1) (2013)
32. Sheth, A., Anantharam, P., Henson, C.: Semantic, cognitive, and perceptual com-
puting: Paradigms that shape human experience. Computer 49(3) (2016)
33. Sheth, A., Avant, D., Bertram, C.: System and method for creating a semantic web
and its applications in browsing, searching, profiling, personalization and advertising
(Oct 30 2001), US Patent 6,311,194
34. Sheth, A., Bertram, C., Avant, D., Hammond, B., Kochut, K., Warke, Y.: Managing
semantic content for the web. IEEE Internet Computing 6(4), 80–87 (2002)
35. Sheth, A., Kapanipathi, P.: Semantic filtering for social data. IEEE Internet Com-
puting 20(4) (2016)
36. Sheth, A., Ramakrishnan, C., Thomas, C.: Semantics for the semantic web: The
implicit, the formal and the powerful. International Journal on Semantic Web and
Information Systems (IJSWIS) 1(1), 1–18 (2005)
37. Shoham, Y.: Why knowledge representation matters. Communications of the ACM
59(1) (2015)
38. Wang, W.: Automatic Emotion Identification from Text. Ph.D. thesis, Wright State
University (2015)
39. Wijeratne, S., Balasuriya, L., Doran, D., Sheth, A.: Word embeddings to enhance
Twitter gang member profile identification. In: IJCAI Workshop on Semantic Machine
Learning (SML). pp. 18–24. New York City (July 2016)
40. Wijeratne, S., Balasuriya, L., Sheth, A., Doran, D.: EmojiNet: Building a machine
readable sense inventory for emoji. In: 8th Intl. Conf. on Social Informatics
(SocInfo). pp. 527–541. Bellevue, WA, USA (November 2016)
41. Wijeratne, S., Balasuriya, L., Sheth, A., Doran, D.: EmojiNet: An open service and
api for emoji sense discovery. In: 11th Intl. AAAI Conf. on Web and Social Media
(ICWSM). pp. 437–446. Montreal, Canada (May 2017)
42. Wijeratne, S., Balasuriya, L., Sheth, A., Doran, D.: A semantics-based measure of
emoji similarity. In: 2017 IEEE/WIC/ACM Intl. Conf. on Web Intelligence (WI).
Leipzig, Germany (August 2017)
43. Winograd, T.: Understanding natural language. Cognitive psychology 3(1) (1972)
