Knowledge Will Propel Machine Understanding of Content: Extrapolating From Current Examples
Abstract. Machine Learning has been a big success story of the AI resurgence. One standout success relates to learning from massive amounts of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition of the value of utilizing knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit that knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability to deeply understand and exploit multimodal data, and the continued incorporation of knowledge into learning techniques.
1 Introduction
Recent success in the area of Machine Learning (ML) for Natural Language Processing (NLP) has been largely credited to the availability of enormous training datasets and the computing power to train complex computational models [12]. Complex NLP tasks such as statistical machine translation and speech recognition have greatly benefited from the Web-scale unlabeled data that is freely available for consumption by learning systems such as deep neural nets. However, many traditional research problems related to NLP, such as part-of-speech tagging and named entity recognition (NER), require labeled or human-annotated data, and the creation of such datasets is expensive in terms of the human effort required.
2 Sheth et al.
1 http://j.mp/15yrsSS
2 https://en.wikipedia.org/wiki/Siri
3 http://bit.ly/22xUjZ6
used to enrich search results similar to what the Taalee/Semagix semantic search
engine did 15 years ago4 [33,34].
While knowledge bases are used in an auxiliary manner in the above scenarios, we argue that they have a major role to play in understanding real-world data. Real-world data has a greater complexity that has yet to be fully appreciated and supported by automated systems. This complexity emerges along various dimensions. Human communication has added many constructs to language that help people better organize knowledge and communicate effectively and concisely. However, current information extraction solutions fall short in processing several implicit constructs and information that is readily accessible to humans. One source of such complexity is our ability to express ideas, facts, and opinions in an implicit manner. For example, the sentence “The patient showed accumulation of fluid in his extremities, but respirations were unlabored and there were no use of accessory muscles” refers to the clinical conditions of “shortness of breath” and “edema”, which would be understood by a clinician. However, the sentence does not contain the names of these clinical conditions; rather, it contains descriptions that imply the two conditions. The current literature on entity extraction has paid little attention to implicit entities [28].
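To make the idea concrete, the lookup from descriptive cues to the conditions they imply can be sketched as a small dictionary-driven matcher. This is a minimal illustration, not our actual method; the cue phrases below and the absence of negation handling are simplifying assumptions.

```python
# Minimal sketch: detecting implicit clinical entities via descriptive cues.
# The cue phrases below are illustrative assumptions, not a real clinical lexicon.
IMPLICIT_CUES = {
    "edema": ["accumulation of fluid", "swelling of the extremities"],
    "shortness of breath": ["labored breathing", "use of accessory muscles"],
}

def implicit_entities(sentence):
    """Return conditions whose descriptive cues appear in the sentence."""
    text = sentence.lower()
    return {condition
            for condition, cues in IMPLICIT_CUES.items()
            if any(cue in text for cue in cues)}
```

A real system must also handle assertion status (the example sentence above actually negates the breathing-related cue), which is why knowledge-driven approaches pair such lexicons with deeper linguistic analysis.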
Another complexity in real-world scenarios and use cases is data heterogeneity due to its multimodal nature. There is an increasing availability of physical (including sensor/IoT), cyber, and social data related to events and experiences of human interest [31]. For example, in our personalized digital health application for managing asthma in children5, we use numeric data from sensors measuring a patient’s physiology (e.g., exhaled nitric oxide) and immediate surroundings (e.g., volatile organic compounds, particulate matter, temperature, humidity), collect data from the Web for the local area (e.g., air quality, pollen, weather), and extract textual data from social media (i.e., tweets and web forum data relevant to asthma) [1]. Each of these modalities provides complementary information that is helpful in evaluating a hypothesis provided by a clinician and also helps in disease management. We can also relate anomalies in the sensor readings (such as from a spirometer) to asthma symptoms and potential treatments (such as taking rescue medication). Thus, understanding a patient’s health and well-being requires integrating and interpreting multimodal data and gleaning insights to provide reliable situational awareness and decisions. Knowledge bases play a critical role in establishing relationships between multiple data streams of diverse modalities, disease characteristics, and treatments, and in transcending multiple abstraction levels [32]. For instance, we can relate a patient’s asthma severity level, measured exhaled nitric oxide, relevant environmental triggers, and prescribed asthma medications to one another to come up with personalized actionable insights and decisions.
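As a toy illustration of such knowledge-guided integration, the sketch below relates numeric readings from different modalities to threshold knowledge; the signal names and thresholds are hypothetical assumptions for illustration only, not clinical guidance.

```python
# Hypothetical knowledge-guided fusion of multimodal asthma signals.
# Signal names and thresholds are illustrative assumptions only.
GUIDELINE_LIMITS = {
    "exhaled_no_ppb": 35,   # exhaled nitric oxide (sensor)
    "pollen_index": 7,      # local pollen level (Web data)
    "pm25_ugm3": 35,        # particulate matter (sensor)
}

def exceeded_triggers(readings):
    """Return the signals whose readings exceed their guideline limits."""
    return [signal for signal, limit in GUIDELINE_LIMITS.items()
            if readings.get(signal, 0) > limit]

def recommend(readings, symptom_reported):
    """Combine trigger knowledge with a reported symptom into an insight."""
    triggers = exceeded_triggers(readings)
    if triggers and symptom_reported:
        return "possible exacerbation: review rescue medication plan", triggers
    return "routine monitoring", triggers
```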
Knowledge bases can also come in handy when there is not enough hand-labeled data for supervised learning. For example, emoji sense disambiguation, which is the ability to identify the meaning of an emoji in the context of a message in a
4 https://goo.gl/A54hno
5 http://bit.ly/kAsthma
Our recent efforts have centered around exploiting different kinds of knowledge bases and using semantic techniques to complement and enhance ML, statistical techniques, and NLP. Our ideas are inspired by the human brain’s ability to learn and generalize knowledge from a small amount of data (i.e., humans do not need to examine tens of thousands of cat faces to recognize the next “unseen” cat shown to them), analyze situations by simultaneously and synergistically exploiting multimodal data streams, and understand more complex and nuanced aspects of content, especially by knowing (through common-sense knowledge) semantics/identity-preserving transformations.
relevant knowledge bases such as the Linked Open Data cloud, and extract the relevant portion of the knowledge from broad-coverage sources such as Wikipedia and DBpedia. We are working on automatically identifying the domains of knowledge bases [17] and exploiting the semantics of entities and their relationships to select relevant portions of a knowledge base [18].
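A crude stand-in for such domain-specific subgraph selection is a bounded expansion from seed entities over a triple store. The toy triples below are illustrative, and real selection (as in [18]) also weighs the semantics of relationships rather than hop count alone.

```python
from collections import deque

# Toy triple store; in practice this would be DBpedia or another LOD source.
TRIPLES = [
    ("Gravity", "director", "Alfonso Cuaron"),
    ("Gravity", "starring", "Sandra Bullock"),
    ("Interstellar", "director", "Christopher Nolan"),
    ("Christopher Nolan", "birthPlace", "London"),
    ("London", "country", "United Kingdom"),
]

def extract_subgraph(seeds, max_hops=1):
    """Collect triples reachable from the seed entities within max_hops."""
    frontier = deque((seed, 0) for seed in seeds)
    seen, subgraph = set(seeds), []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop budget
        for s, p, o in TRIPLES:
            if s == node and (s, p, o) not in subgraph:
                subgraph.append((s, p, o))
                if o not in seen:
                    seen.add(o)
                    frontier.append((o, depth + 1))
    return subgraph
```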
Fig. 1. Examples of ambiguous emoji use: Crying (verb): “My knee hurts, already in tears”; Highfive (noun): “We did it man! High-fives all around”; Hiding (verb): “The dog was hiding behind the door”; Hilarious (adjective): “Central Intelligence was damn hilarious!”; Thanks (noun): “Thank you so much for taking care of the baby”; Blind (verb): “I’m blind with no lights on. Can’t see anything”.
add color and whimsy to their messages. Without rigid semantics attached to them, emoji symbols take on different meanings based on the context of a message. This has resulted in ambiguity in emoji use (see Figure 1). Only recently have there been efforts to extend NLP techniques used for machine translation, word sense disambiguation, and search to the realm of emoji [41]. The ability to automatically process, derive meaning from, and interpret text fused with emoji will be essential as society embraces emoji as a standard form of online communication. Having access to knowledge bases that are specifically designed to capture emoji meaning can play a vital role in representing, contextually disambiguating, and converting pictorial forms of emoji into text, thereby leveraging and generalizing NLP techniques for processing this richer medium of communication.
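A minimal, Lesk-style sketch of such contextual disambiguation scores each candidate sense by gloss/context overlap. The emoji key and sense glosses below are invented for illustration; a real system would draw them from a sense inventory such as EmojiNet.

```python
# Lesk-style emoji sense disambiguation over an invented mini-inventory.
SENSES = {
    "raised_hand": {
        "highfive": {"congrats", "did", "it", "team", "celebrate"},
        "stop": {"halt", "wait", "stop", "enough"},
    },
}

def disambiguate(emoji, message):
    """Pick the sense whose gloss overlaps most with the message words."""
    context = set(message.lower().split())
    scores = {sense: len(context & gloss)
              for sense, gloss in SENSES[emoji].items()}
    return max(scores, key=scores.get)
```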
As a step towards building machines that can understand emoji, we have developed EmojiNet [40,41], the first machine-readable sense inventory for emoji. It links Unicode emoji representations to their English meanings extracted from the Web, enabling systems to link emoji with their context-specific meanings. EmojiNet is constructed by integrating multiple emoji resources with BabelNet, which is the most comprehensive multilingual sense inventory available to date. For example, for the emoji ‘face with tears of joy’, EmojiNet lists 14 different senses, ranging from happy to sad. An application designed to disambiguate emoji senses can use the senses provided by EmojiNet to automatically learn the message contexts in which a particular emoji sense could appear. Emoji sense disambiguation could improve research on sentiment and emotion analysis. For example, consider an emoji that can take the meanings happy and sad based on the context in which it is used. Current sentiment analysis applications do not differentiate between these two meanings when they process such an emoji. However, finding its intended meaning through emoji sense disambiguation techniques [41] can improve sentiment prediction. Emoji similarity calculation is another task that could benefit from knowledge bases and multimodal data analysis. Similar to computing similarity between words, we can calculate the similarity between emoji characters. We have demonstrated how EmojiNet can be utilized to solve the problem of emoji similarity [42]. Specifically, we have shown that emoji similarity measures based on the rich emoji meanings available
Fig. 2. A snippet of the entity models generated for the movies ‘Gravity’, ‘Interstellar’, and ‘The Martian’, with factual nodes (e.g., Sandra Bullock, Christopher Nolan, Alfonso Cuarón, Matt Damon, Astronaut, Mars Orbiter Mission) and contextual nodes (e.g., “Woman in Space”).
‘Boyhood’ can use phrases like “Richard Linklater movie”, “Ellar Coltrane on his 12-year movie role”, “12-year long movie shoot”, “latest movie shot in my city Houston”, and “Mason Evan’s childhood movie”. Hence, it is important to have comprehensive knowledge about the entities in order to decode their implicit mentions. Another complexity is the temporal relevance of the knowledge. The same phrase can be used to refer to different entities at different points in time. For instance, the phrase “space movie” referred to the movie ‘Gravity’ in Fall 2013, while the same phrase in Fall 2015 referred to the movie ‘The Martian’. On the flip side, the most salient characteristics of a movie may change over time, and so will the phrases used to refer to it. In November 2014, the movie ‘Furious 7’ was frequently referred to with the phrase “Paul Walker’s last movie”, owing to the actor’s death around that time. However, after the movie’s release in April 2015, the same movie was often mentioned through the phrase “fastest film to reach the $1 billion”.
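The temporal aspect can be pictured as a time-scoped phrase-to-entity mapping. The validity windows below are rough approximations of the Fall 2013/Fall 2015 usage described above, chosen purely for illustration.

```python
from datetime import date

# Time-scoped phrase knowledge: (valid_from, valid_to, entity).
# Windows are illustrative approximations of the usage periods in the text.
PHRASE_KB = {
    "space movie": [
        (date(2013, 9, 1), date(2014, 2, 28), "Gravity"),
        (date(2015, 9, 1), date(2016, 2, 29), "The Martian"),
    ],
}

def resolve(phrase, when):
    """Return the entity the phrase referred to at the given time, if known."""
    for start, end, entity in PHRASE_KB.get(phrase, []):
        if start <= when <= end:
            return entity
    return None
```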
We have developed knowledge-driven solutions that decode implicit entity mentions in clinical narratives [25] and tweets [26]. We exploit publicly available knowledge bases (only the portions that match the domain of interest) in order to access the domain knowledge required to decode implicitly mentioned entities. Our solution models individual entities of interest by collecting knowledge about them from these publicly available knowledge bases, consisting of definitions of the entities, other associated concepts, and the temporal relevance of the associated concepts. Figure 2 shows a snippet of the generated entity models for the movies ‘Gravity’, ‘Interstellar’, and ‘The Martian’. The colored (shaded) nodes represent factual knowledge related to these movies extracted from the DBpedia knowledge base, and the uncolored nodes represent contextual (time-sensitive) knowledge related to the entities extracted from daily communications on Twitter. The implicit entity linking algorithms are designed to carefully use the knowledge encoded in these models to identify implicit entities in text.
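In the spirit of those algorithms, a much-simplified linker can score each candidate by how many of its model terms appear in the input text. The terms below are illustrative stand-ins for the factual and contextual knowledge in Figure 2, not the actual models of [26].

```python
# Simplified implicit entity linking: score candidates by overlap between
# the input text and each entity model. Terms are illustrative only.
ENTITY_MODELS = {
    "Gravity": {"sandra bullock", "astronaut", "space", "woman in space"},
    "The Martian": {"matt damon", "astronaut", "space", "mars", "stranded"},
}

def link_implicit(text):
    """Return the best-matching entity, or None when nothing matches."""
    lowered = text.lower()
    scores = {entity: sum(term in lowered for term in model)
              for entity, model in ENTITY_MODELS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```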
Fig. 3. Annotated web forum posts on extra-medical drug use (e.g., “I was sent home with 5 x 2 mg Suboxones...”), showing the diverse data types extracted: entities (e.g., Suboxone, Kratom, Heroin), emotions (e.g., disgusted, amazed, irritated), drug forms (e.g., ointment, tablet, pill), routes of administration (e.g., smoke, inject, snort, sniff), dosage, frequency, interval, pronouns (e.g., of, I, me, mine, my), side effects (e.g., itching, blisters, shaking hands, cephalalgia), sentiment, and triples such as <Suboxone used by injection, injection-dosage amount, amount-2mg> and <Suboxone used by injection, injection-has_side_effect, Euphoria>, together with their frequencies of occurrence.
Fig. 4. (a)(i) Domain knowledge of traffic in the form of concepts and relationships (mostly causal) from ConceptNet. (a)(ii) A probabilistic graphical model (PGM) that captures the conditional dependencies between variables in the traffic domain (only a portion is shown) is enriched by adding the missing random variables, links, and link directions extracted from ConceptNet. (b) How this enriched PGM is used to correlate contextually related data of different modalities [3].
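The enrichment step in Fig. 4 can be sketched as a set operation on edge lists: causal links found in the knowledge resource but absent from the PGM skeleton are added, along with any new variables they introduce. The edges below are invented for illustration; ConceptNet's actual relations differ.

```python
# Sketch of PGM structure enrichment with KB causal links (edges invented).
pgm_edges = {("accident", "slow_traffic")}       # known PGM structure
kb_edges = {                                     # ConceptNet-like causal links
    ("bad_weather", "accident"),
    ("accident", "slow_traffic"),
    ("road_work", "lane_closure"),
    ("lane_closure", "slow_traffic"),
}

enriched_edges = pgm_edges | kb_edges            # add the missing links
variables = {v for edge in enriched_edges for v in edge}  # incl. new variables
```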
and exploit semantics appear in [36]. Table 1 summarizes the role of knowledge
bases in the four applications discussed above.
3 Looking forward
We discussed the importance of domain/world knowledge in understanding complex real-world data, particularly when large amounts of training data are not readily available or are expensive to generate. We demonstrated several applications where knowledge plays an indispensable role in understanding complex language constructs and multimodal data. Specifically, we have demonstrated how knowledge can be created to incorporate a new medium of communication (such as emoji), how curated knowledge can be adapted to process implicit references (such as in implicit entity and relation linking), how statistical knowledge can be synthesized in terms of normalcy and anomaly and integrated with textual information (such as in the traffic context), and how linguistic knowledge can be used for more expressive querying of informal text with improved recall (such as in drug-related posts). We are also seeing early efforts in making knowledge bases dynamic and evolving to account for changes in the real world8.
Knowledge seems to play a central role in human learning and intelligence, such as in learning from a small amount of data, and in cognition, especially perception. Our ability to create or deploy just the right knowledge in our computing processes will improve machine intelligence, perhaps in a similar way as
8 http://bit.ly/2cVGbov
deductive reasoning to predict effects and treatments from causes, and abductive
reasoning to explain the effects using causes, resolving any data incompleteness
or ambiguity by seeking additional data. The knowledge itself can be a hybrid
of deterministic and probabilistic rules, modeling both normalcy and anomalies,
transcending abstraction levels. This directly contributes to making decisions
and taking actions.
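A toy rendering of this hybrid use of causal knowledge: the same rule base serves deduction (cause to effects) and abduction (effect to candidate causes). The rules are invented for illustration and carry no probabilities, which a fuller treatment would add.

```python
# Toy causal rule base (invented): supports deductive and abductive queries.
RULES = [  # (cause, effect)
    ("high_pollen", "coughing"),
    ("high_pollen", "wheezing"),
    ("viral_infection", "coughing"),
]

def deduce(cause):
    """Deduction: predict the effects of a given cause."""
    return {effect for c, effect in RULES if c == cause}

def abduce(effect):
    """Abduction: propose causes that would explain an observed effect."""
    return {cause for cause, e in RULES if e == effect}
```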
[Figure: Two examples of knowledge-enabled processing spanning perceptual computing (PC: abstractions for decision making, predictions, and actions), cognitive computing (CC: personalization and contextualization using knowledge bases and declarative knowledge), and semantic computing (SC: learning from historical and annotated data). Example 1 (health): physiological and environmental observations (number of steps, number of coughs, AQI, FEV1) and health signal abstractions (low activity, high pollen, disturbed sleep, high cough) yield condition insights and treatments such as “Asthma is moderately controlled; take inhaled corticosteroid regularly”. Example 2 (traffic): sensor time series and anomalies are explained using Twitter events and sources such as 511.org and OSM, with normalcy models and knowledge-guided piecewise linear approximation supporting anomaly detection and predictions such as “10 mph speed on right lane due to overturned truck in left lane”.]
Acknowledgments
We acknowledge partial support from the National Institutes of Health (NIH)
award: 1R01HD087132-01: “kHealth: Semantic Multisensory Mobile Approach to
Personalized Asthma Care” and the National Science Foundation (NSF) award:
EAR 1520870: “Hazards SEES: Social and Physical Sensing Enabled Decision
Support for Disaster Management and Response”. Points of view or opinions
in this document are those of the authors and do not necessarily represent the
official position or policies of the NIH or NSF.
References
1. Anantharam, P., Banerjee, T., Sheth, A., Thirunarayan, K., Marupudi, S., Sridharan, V., Forbis, S.G.: Knowledge-driven personalized contextual mHealth service for asthma management in children. In: 2015 IEEE Intl. Conf. on Mobile Services (IEEE MS) (2015)
2. Anantharam, P., Thirunarayan, K., Marupudi, S., Sheth, A., Banerjee, T.: Understanding city traffic dynamics utilizing sensor and textual observations. In: Proc. of the 13th AAAI Conf. on Artificial Intelligence (AAAI), February 12–17, Phoenix, Arizona, USA (2016)
3. Anantharam, P., Thirunarayan, K., Sheth, A.P.: Traffic analytics using probabilistic graphical models enhanced with knowledge bases. In: Analytics for Cyber Physical Systems Workshop at the SIAM Conf. on Data Mining (ACS) (2013)
4. Balasuriya, L., Wijeratne, S., Doran, D., Sheth, A.: Finding street gang members on Twitter. In: The 2016 IEEE/ACM Intl. Conf. on Advances in Social Networks Analysis and Mining (ASONAM). vol. 8, pp. 685–692 (August 2016)
5. Cameron, D., Sheth, A., Jaykumar, N., Thirunarayan, K., Anand, G., Smith, G.A.:
A hybrid approach to finding relevant social media content for complex domain
specific information needs. Web Semantics: Science, Services and Agents on the
World Wide Web 29 (2014)
6. Cruse, D.A.: Lexical semantics. Cambridge University Press (1986)
7. Daniulaityte, R., Carlson, R., Falck, R., Cameron, D., Perera, S., Chen, L., Sheth, A.: “I just wanted to tell you that loperamide will work”: A web-based study of extra-medical use of loperamide. Drug and Alcohol Dependence 130(1), 241–244 (2013)
8. Domingos, P.: A few useful things to know about machine learning. Communica-
tions of the ACM 55 (2012)
9. Domingos, P.: The master algorithm: How the quest for the ultimate learning
machine will remake our world. Basic Books (2015)
10. Ferrucci, D., Brown, E., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A.A., Lally, A., Murdock, J.W., Nyberg, E., Prager, J., et al.: Building Watson: An overview of the DeepQA project. AI Magazine 31(3) (2010)
11. Fillmore, C.J.: Frame semantics and the nature of language. Annals of the New
York Academy of Sciences 280(1) (1976)
12. Halevy, A., Norvig, P., Pereira, F.: The unreasonable effectiveness of data. IEEE
Intelligent Systems 24 (2009)
13. Jadhav, A.: Knowledge Driven Search Intent Mining. Ph.D. thesis, Wright State
University (2016)
14. Jain, P., Hitzler, P., Sheth, A.P., Verma, K., Yeh, P.Z.: Ontology alignment for
linked open data. In: International Semantic Web Conference (ISWC). pp. 402–
417 (2010)
15. Kimmig, A., Bach, S., Broecheler, M., Huang, B., Getoor, L.: A short introduction to probabilistic soft logic. In: Proc. of the NIPS Workshop on Probabilistic Programming: Foundations and Applications (2012)
16. Koller, D., Friedman, N.: Probabilistic graphical models: principles and techniques.
MIT press (2009)
17. Lalithsena, S., Hitzler, P., Sheth, A., Jain, P.: Automatic domain identification for
linked open data. In: IEEE/WIC/ACM Intl. Joint Conf. on Web Intelligence and
Intelligent Agent Technologies (WI). vol. 1 (2013)
18. Lalithsena, S., Kapanipathi, P., Sheth, A.: Harnessing relationships for domain-
specific subgraph extraction: A recommendation use case. In: 2016 IEEE Intl. Conf.
on Big Data (Big Data). pp. 706–715 (2016)
19. McMahon, C., Johnson, I., Hecht, B.: The substantial interdependence of Wikipedia and Google: A case study on the relationship between peer production communities and information technologies. In: 11th Intl. AAAI Conf. on Web and Social Media (ICWSM). pp. 142–151. Montreal, Canada (May 2017)
20. Meng, L., Huang, R., Gu, J.: A review of semantic similarity measures in WordNet. International Journal of Hybrid Information Technology 6(1) (2013)
21. Mihalcea, R.: Knowledge-based methods for WSD. Word Sense Disambiguation: Algorithms and Applications (2006)
22. Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.J.: Introduction to WordNet: An on-line lexical database. International Journal of Lexicography 3(4) (1990)
23. Nguyen, V., Bodenreider, O., Sheth, A.: Don’t like RDF reification? Making statements about statements using singleton property. In: Proc. of the 23rd Intl. Conf. on World Wide Web (WWW). pp. 759–770. Seoul, Korea (2014)
24. Perera, S., Henson, C., Thirunarayan, K., Sheth, A., Nair, S.: Semantics-driven approach for knowledge acquisition from EMRs. IEEE Journal of Biomedical and Health Informatics 18(2) (2014)
25. Perera, S., Mendes, P., Sheth, A., Thirunarayan, K., Alex, A., Heid, C., Mott, G.:
Implicit entity recognition in clinical documents. In: Proc. of the 4th Joint Conf.
on Lexical and Computational Semantics (*SEM). pp. 228–238 (2015)
26. Perera, S., Mendes, P.N., Alex, A., Sheth, A., Thirunarayan, K.: Implicit entity
linking in tweets. In: Extended Semantic Web Conference (ESWC). pp. 118–132.
Greece (2016)
27. Ramakrishnan, C., Mendes, P.N., da Gama, R.A., Ferreira, G.C., Sheth, A.: Joint
extraction of compound entities and relationships from biomedical literature. In:
Proc. of the 2008 IEEE/WIC/ACM Intl. Conf. on Web Intelligence and Intelligent
Agent Technology (WI). pp. 398–401. Sydney, Australia (2008)
28. Rizzo, G., Basave, A.E.C., Pereira, B., Varga, A., Rowe, M., Stankovic, M., Dadzie,
A.: Making sense of microposts (#microposts2015) named entity recognition and
linking challenge. In: #MSM (2015)
29. Ruppenhofer, J., Ellsworth, M., Petruck, M.R., Johnson, C.R., Scheffczyk, J.: FrameNet II: Extended theory and practice (2006)
30. Shekarpour, S., Ngonga Ngomo, A.C., Auer, S.: Question answering on interlinked
data. In: Proc. of the 22nd Intl. Conf. on World Wide Web (WWW). pp. 1145–
1156. Rio de Janeiro, Brazil (2013)
31. Sheth, A., Anantharam, P., Henson, C.: Physical-cyber-social computing: An early
21st century approach. IEEE Intelligent Systems 28(1) (2013)
32. Sheth, A., Anantharam, P., Henson, C.: Semantic, cognitive, and perceptual com-
puting: Paradigms that shape human experience. Computer 49(3) (2016)
33. Sheth, A., Avant, D., Bertram, C.: System and method for creating a semantic web and its applications in browsing, searching, profiling, personalization and advertising. US Patent 6,311,194 (Oct 30 2001)
34. Sheth, A., Bertram, C., Avant, D., Hammond, B., Kochut, K., Warke, Y.: Managing
semantic content for the web. IEEE Internet Computing 6(4), 80–87 (2002)
35. Sheth, A., Kapanipathi, P.: Semantic filtering for social data. IEEE Internet Com-
puting 20(4) (2016)
36. Sheth, A., Ramakrishnan, C., Thomas, C.: Semantics for the semantic web: The
implicit, the formal and the powerful. International Journal on Semantic Web and
Information Systems (IJSWIS) 1(1), 1–18 (2005)
37. Shoham, Y.: Why knowledge representation matters. Communications of the ACM
59(1) (2015)
38. Wang, W.: Automatic Emotion Identification from Text. Ph.D. thesis, Wright State
University (2015)
39. Wijeratne, S., Balasuriya, L., Doran, D., Sheth, A.: Word embeddings to enhance Twitter gang member profile identification. In: IJCAI Workshop on Semantic Machine Learning (SML). pp. 18–24. New York City (July 2016)
40. Wijeratne, S., Balasuriya, L., Sheth, A., Doran, D.: EmojiNet: Building a machine-readable sense inventory for emoji. In: 8th Intl. Conf. on Social Informatics (SocInfo). pp. 527–541. Bellevue, WA, USA (November 2016)
41. Wijeratne, S., Balasuriya, L., Sheth, A., Doran, D.: EmojiNet: An open service and API for emoji sense discovery. In: 11th Intl. AAAI Conf. on Web and Social Media (ICWSM). pp. 437–446. Montreal, Canada (May 2017)
42. Wijeratne, S., Balasuriya, L., Sheth, A., Doran, D.: A semantics-based measure of
emoji similarity. In: 2017 IEEE/WIC/ACM Intl. Conf. on Web Intelligence (WI).
Leipzig, Germany (August 2017)
43. Winograd, T.: Understanding natural language. Cognitive psychology 3(1) (1972)