Theme-Related Keyword Extraction From Free Text Descriptions of Image Contents For Tagging
Abstract— This paper discusses a method for automatic theme-related
keyword extraction from users’ natural language comments on their
photographs and videos. ‘Theme’ indicates the concepts circumscribing
and describing the content of the photos and videos, such as pets,
natural sites, palaces and places. The method employs a deep learning
algorithm, RNN (Recurrent Neural Network), which is good at
recognizing implicit patterns of sequential data. The method has been
applied to the construction of a place-related image content DB, and
delivers reasonably good performance even when the measure (i.e. the
theme of the image contents) is abstract and vague.

Keywords— Image Content DB, Keyword Tagging, Content Search, Tag
Extraction, Recurrent Neural Network (RNN)

I. INTRODUCTION
Let’s imagine an automatic construction of a theme-related image
content DB using photographs and videos collected from various social
media such as SNS and blogs. Here, ‘theme’ indicates concepts
circumscribing and describing the content of the photographs and
videos, such as pets, palaces, natural sites and places. Such a
theme-related image content DB can be utilized in various
applications. For example, imagine the task of choosing places to
shoot scenes of a movie or TV drama. The person in charge of the task
in a film-making project can refer to the photos or videos retrieved
from a place-related image content DB in order to reduce the number
of candidate places which he/she has to visit to check the real
conditions or atmosphere. By checking the real conditions of only a
small number of candidate places rather than many possible places,
he/she can save a lot of time and money. We can imagine many other
tasks or applications that could benefit from such theme-related
image content DBs, such as choosing a travel destination, deciding on
a pet dog breed and so on.

Those image content DBs should provide an easy and quick way of
finding the image contents that are closely related to the user’s
task. The most familiar way of searching is to use search keywords.
In order for the DB systems to provide keyword search, each photo and
video has to be tagged with keywords when it is stored in the DB.
This paper discusses automatic theme-related keyword extraction from
users’ natural language comments on their photos and videos; in most
social media, user comment texts are posted together with the image
contents. The theme-related keyword extraction system employs a deep
learning algorithm, RNN (Recurrent Neural Network), which is good at
recognizing implicit patterns of sequential data such as the sequence
of words in user comments. In existing methods that rely on pre-set
candidate keywords for a certain target domain (i.e. theme), many
legitimate keywords closely associated with the theme can be omitted,
or many unnecessary keywords only weakly associated with the theme
can be extracted, so that the quality of the extracted keywords tends
to suffer. This is because the concept of the theme of image contents
is vague, so that it cannot be defined with explicit rules, and
because even the same keyword can have different meanings or nuances
depending on the context of a user comment.

II. BACKGROUND TECHNOLOGY AND RELATED WORK
A. Social Media
Huge amounts of content are posted and distributed through various
social media such as blogs and SNS (Social Network Service). In
particular, owing to the increase in Internet bandwidth and the
progress of telecommunication technology, the amount of image content
such as photographs and videos is growing explosively. Most social
media platforms provide open APIs for users and computer systems to
use the image contents freely and easily [1], [2].

Looking at typical social media such as flickr [3] and facebook [4],
we can see that photos and videos are posted together with user
comments written in natural language. Thus, if we analyse the user
comments we can find the sort (i.e. theme) of the image contents and,
furthermore, we can extract proper keywords related to the image
content. Figure 1 shows a snapshot of a flickr web page with a photo
and user comments on it.

There are a number of works, such as Rae [2] and Kim [5], that
extract certain information from unstructured data like natural
language text. Kim developed a system that extracts from email text
the information related to a meeting, such as the name, place, date
and time of the meeting. The system employs CRF (Conditional Random
Field) machine learning algorithms to recognize such information
words in the texts. Rae et al. also utilize CRF algorithms to extract
from user comment text the information related to the place where
the flickr image content was taken.

[...] and desired patterns to be recognized [6], [7]. As the
attributes of words in the user comment text, besides the order of
the words, we can consider various language features such as POS
(Part of Speech), dependency relations and semantic categories [1],
[2], [5]. Words assigned to the same POS
generally display similar behaviour in terms of syntax, and
words in a sentence connect with each other directly or
indirectly. Moreover, a word represents some things or
concepts so that it can be classified into categories of
meanings according to its representations. Humans, in fact,
unconsciously refer to such features when understanding
sentences.
Meanwhile, machines do not know which patterns are
significant and thus which patterns they should recognize. So,
we have to inform the machine of the desired patterns by
providing labels. While the language features as the attributes
of data are usually provided and tagged automatically by
natural language processing toolkits, the labels are provided
and tagged manually by human users.
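
As an illustration of how such attributes and labels might be represented for a learning algorithm, the following minimal sketch encodes one word as a numeric vector. The class `TaggedWord`, the helper functions and the three tiny feature inventories are assumptions made for this example only; the actual feature inventory of the system is far larger, and the encoding used by the authors is not specified in the text.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, deliberately tiny feature inventories (illustration only).
POS_TAGS = ["NNG", "JKG", "XSA"]
DEPENDENCY_DISTANCES = [1, 2, 3]
SEMANTIC_CATEGORIES = ["action", "movement", "female"]


@dataclass
class TaggedWord:
    """One word of a user comment with its attributes and its label."""
    surface: str
    pos: str                      # part of speech, tagged automatically
    dependency_distance: int      # distance to the word it depends on
    semantic_categories: List[str] = field(default_factory=list)
    label: str = "O"              # assigned manually by a human user


def one_hot(value, inventory):
    """Return a vector with 1.0 at the position of `value`, 0.0 elsewhere."""
    return [1.0 if value == item else 0.0 for item in inventory]


def multi_hot(values, inventory):
    """Return a vector with 1.0 for every inventory item found in `values`."""
    return [1.0 if item in values else 0.0 for item in inventory]


def encode(word):
    """Concatenate the POS, dependency and semantic-category encodings."""
    return (one_hot(word.pos, POS_TAGS)
            + one_hot(word.dependency_distance, DEPENDENCY_DISTANCES)
            + multi_hot(word.semantic_categories, SEMANTIC_CATEGORIES))


example = TaggedWord("palace", "NNG", 1, ["action", "female"], label="B")
vector = encode(example)
```

The semantic categories are encoded as a multi-hot vector here because a single word may carry several candidate categories at once, whereas POS and dependency distance take exactly one value per word.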
A. Tagging the Language Features
The language feature tagging system decomposes an input user comment
text into syntactic units (i.e. words), preserving the original order
of the words. It then tags each word with language features such as
POS, dependencies and semantic categories. The system has been
implemented using UWordMap [8] and UTagger [9], developed by Ulsan
University, Korea. UWordMap is a Korean word map constructed based on
the Standard Korean Dictionary of the National Institute of Korean
Language. UTagger is a toolkit that tags the POS, dependency
relations and semantic categories of words using UWordMap and various
natural language processing algorithms.

Figure 3. An example of machine learning data composed of language
feature tags and labels

Figure 3 shows an example of machine learning data for the
theme-related keyword pattern recognition. The data preserves the
order of words, and each word is tagged with language features and
labels. Tags such as NNG, JKG and XSA denote POS features of the
word, meaning a common noun, a genitive case marker and an
adjective-derived suffix, respectively. Tags such as 1, 2 and 3
denote word dependencies, meaning the distance to the other word that
the word depends on. The semantic categories are marked with tags
such as 작용__01, 운동__02 and 여자__02, meaning action, movement and
female, respectively. Such language feature tags are given
automatically by UWordMap and UTagger.

Note that the tagged language attributes might include a lot of
noise, i.e. wrong tags. In other words, because the natural language
processing algorithms themselves are also based on statistical models
and/or machine learning techniques, their analysis might be
inaccurate. Moreover, a word can have many different meanings (i.e.
semantic categories) in different texts (i.e. sentences) according to
the context of the sentences. The system, however, does not tag the
right semantic category of the word in a sentence but tags all
possible semantic categories of the word. In other words, the
semantic category in this paper does not mean the exact meaning of
the word in a user comment text. This is because judging the accurate
meaning of a word in a sentence is very difficult even for
state-of-the-art techniques, or at least the accuracy is so poor that
much noise would inevitably be included.

The last tag of each word is the label, which is given by human users
as the answer for the pattern recognition. The labels are tagged
according to the IOB2 tagging model [10]. In this model, the B tag
denotes the beginning of a target pattern (i.e. a theme-related
keyword), the I tag denotes the inside of the target pattern, and the
O tag denotes the outside of the target pattern. So, with these label
tags, we can separate the target theme-related keywords from the
other words.

B. Learning Patterns of the Theme-Related Keywords
The theme-related keyword pattern learning system is trained to
generate a pattern recognition model using a bulk of data (i.e.
machine learning data) which incorporates the language feature tags
and labels. This paper employs RNN algorithms so that the system can
learn a model for sequential data patterns. As a toolkit for RNN
algorithms, DeepLearning4J (DL4J) [11] is used. DL4J is developed in
Java, so it is easy to integrate DL4J with application systems.

The RNN used in this paper consists of 4 neural network layers. The
first layer is the input layer and has 962 sigmoid neurons (i.e.
nodes), according to the number of input attributes of the data. The
second and third layers are hidden layers having 100 LSTM (Long
Short-Term Memory) neurons each. Lastly, the fourth layer is the
output layer of 4 softmax neurons for the three label tags and an
exceptional output tag; the exceptional case is thought to occur due
to users’ false label tagging. The SGD (Stochastic Gradient Descent)
algorithm is used for the training of the neural network. As the cost
function of the training algorithm, LossMCXENT, a kind of negative
log-likelihood cost function, is used. This paper applies Dropout
regularization and L2 regularization. The RNN explained above can be
regarded as a quite typical one having no special features. Figure 4
shows the network configuration and some hyper-parameters of the
network.
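
To make the role of the four-way softmax output and the IOB2 labels concrete, the following minimal sketch shows how per-word output probabilities could be turned into keywords. It is an illustration under stated assumptions, not the authors' implementation: the tag name "X" for the exceptional output, the function names, the example sentence and the probability values are all hypothetical, and an I tag that does not follow a B or I tag is simply treated as outside a keyword.

```python
# Output tags in a fixed order; "X" is a hypothetical name for the
# exceptional output that is neither B, I nor O.
TAGS = ["B", "I", "O", "X"]


def argmax_tags(probabilities):
    """Pick the most probable tag for each word's softmax output."""
    tags = []
    for word_probs in probabilities:
        best_index = max(range(len(word_probs)), key=word_probs.__getitem__)
        tags.append(TAGS[best_index])
    return tags


def extract_keywords(words, tags):
    """Collect IOB2 spans: a keyword starts at B and continues over I."""
    keywords = []
    current = []
    for word, tag in zip(words, tags):
        if tag == "B":
            if current:
                keywords.append(" ".join(current))
            current = [word]
        elif tag == "I" and current:
            current.append(word)
        else:
            # O, X, or an I with no open keyword closes any open span.
            if current:
                keywords.append(" ".join(current))
            current = []
    if current:
        keywords.append(" ".join(current))
    return keywords


# Hypothetical example sentence and softmax outputs (one row per word).
words = ["visited", "Gyeongbok", "Palace", "with", "friends"]
probabilities = [
    [0.05, 0.05, 0.85, 0.05],
    [0.80, 0.10, 0.05, 0.05],
    [0.10, 0.75, 0.10, 0.05],
    [0.05, 0.05, 0.85, 0.05],
    [0.10, 0.10, 0.70, 0.10],
]
tags = argmax_tags(probabilities)
keywords = extract_keywords(words, tags)
```

In this example the second and third words receive the tags B and I, so they are joined into a single two-word keyword, while all other words are discarded.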
[...] voting scheme increases the recall rate of search by expanding
potential keywords that would be extracted. The method has been
implemented in the construction of a place-related image content DB.
The place-related keyword extraction system renders an average
precision of about 76% and an average recall of about 94% according
to human user estimation for the keywords extracted.

This paper shows that it is possible to deliver reasonable
performance in the automatic extraction of theme-related keywords
when using machine learning techniques, especially RNN, even when the
measure (in this paper, the theme of image contents) is abstract and
vague. Moreover, even when the attributes used for the training data
have some noise (in this paper, inaccurate semantic categories of a
word with respect to the word’s exact meaning in text), the accuracy
of the extracted keywords is reliable. The method in this paper can
also be used to determine the sort of photos and videos for
classification by analysing user comments on the contents instead of
analysing the image contents themselves.

ACKNOWLEDGMENT
This research is supported by Ministry of Culture, Sports and Tourism
(MCST) and Korea Creative Content Agency (KOCCA) in the Culture
Technology (CT) Research & Development Program 2017.

REFERENCES
[1] S. Kumar, F. Morstatter, and H. Liu, “Twitter Data Analytics,”
Database Management & Information Retrieval, 2013.
[2] A. Rae, A. Popescu, V. Murdock, and H. Bouchard, “Mining the Web
for Points of Interest,” in Proc. of International SIGIR Conference
on Research and Development in Information Retrieval, 2012.
[3] (2017) flickr [Online]. Available: https://www.flickr.com/
[4] (2017) facebook [Online]. Available: https://www.facebook.com/
[5] K. R. Kim, “Location Extraction from Meeting Announcements,”
KAIST, Master’s Thesis, 2012.
[6] B. T. Jang, “Next-Generation Machine Learning Technologies,”
Communications of the Korean Institute of Information Scientists and
Engineers, 2007.
[7] A. Graves, “Supervised Sequence Labelling with Recurrent Neural
Networks,” Studies in Computational Intelligence, Springer, 2012.
[8] Y. Bae, and C. Ock, “Introduction to the Korean Word Map
(UWordMap) and API,” in Proc. of the 26th Annual Conference on Human
and Cognitive Language Technology, 2014.
[9] J. Shin, and C. Ock, “Optional features for speeding up UTagger,”
in Proc. of the 24th Annual Conference on Human and Cognitive
Language Technology, 2012.
[10] T. Ek, C. Kirkegaard, H. Jonsson, and P. Nugues, “Named Entity
Recognition for Short Text Messages,” in Proc. of International
Conference of the Pacific Association for Computational Linguistics,
2011.
[11] (2017) Deeplearning4J [Online]. Available:
https://deeplearning4j.org/

Joonmyun Cho received his B.S., M.S. and Ph.D. degrees in mechanical
engineering from KAIST (Korea Advanced Institute of Science and
Technology), South Korea, in 1993, 1995 and 2006, respectively. He
joined ETRI (Electronics and Telecommunications Research Institute),
South Korea, in 2007 and was involved with the URC (Ubiquitous
Robotic Companion) project until 2011 and the Beyond Smart TV project
until 2015. Dr. Cho is currently working in the Intelligent IoT SW
Platform project as a senior researcher. His research interests
include knowledge based systems, intelligent agent systems and
machine learning.

Yoon-Seop Chang received his B.S., M.S., and Ph.D. degrees in
geographic information system from Seoul National University, South
Korea, in 1999, 2001 and 2005, respectively. He joined ETRI
(Electronics and Telecommunications Research Institute), South Korea,
in 2005 and is currently working as a principal researcher. Since
2008, Dr. Chang has also been a faculty member of University of
Science and Technology, South Korea, as an associate professor. His
research interests include geographic information system, web mashup,
augmented reality and virtual reality.

Seong-Ho Lee received his B.S. and M.S. degrees in computer science
from Chungbuk National University, South Korea, in 1997 and 2000,
respectively. Since 2000, he has been a senior member of research
staff with ETRI, South Korea, and he is also working toward the Ph.D.
degree in computer science at Chungbuk National University. Mr. Lee
is currently working in the Location-based Smart Content Platform
project as a senior researcher. His research interests are
spatio-temporal database systems, geographic information systems, and
location-based services.