Study on Ambiguity and NLP Application Problems
Anuj Yadav
ABSTRACT
Natural language processing (NLP) has been considered one of the important areas in Artificial Intelligence. However, progress in natural language processing has been quite slow compared to other areas. The aim of this study is to conduct a systematic literature review to identify the most prominent applications, techniques, and challenging issues in NLP applications. To conduct this review, I screened 587 papers retrieved from major databases such as SCOPUS and IEEE Xplore, as well as from the Google search engine. The search keywords used were "natural language processing", "NLP applications", and "complexity of NLP applications". To keep the study within its scope, 503 papers were excluded, leaving 84. Only the most prominent NLP applications, namely information extraction, question answering systems, and automated text summarization, were chosen for review. It is evident that the main challenge in NLP is the complexity of natural language itself, that is, the ambiguity problems that occur at various levels of the language. This paper also addresses the ambiguity problems that occur at the lexical and structural levels, together with significant techniques and approaches for solving them. Finally, the paper briefly discusses the future of NLP.
Keywords: Natural language processing, NLP applications, Ambiguity in NLP
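Lexical ambiguity can be made concrete with a single word. "Bank" can denote a financial institution or the land beside a river, so a system has to choose a sense from context. One well-known knowledge-based family of approaches compares the context with a dictionary gloss for each sense and picks the gloss with the largest word overlap. This is the Lesk idea, which the extended gloss overlaps of [76] build on. The Python snippet below is a minimal sketch of a simplified Lesk-style disambiguator, written only for illustration. It is not a method proposed in this paper. The sense labels, glosses, and stopword list are hypothetical, hand-written assumptions rather than entries from WordNet or any other lexical resource. The function names content_words and simplified_lesk are likewise illustrative.

```python
import re

# Hypothetical sense inventory for the ambiguous word "bank". The glosses are
# hand-written for this illustration; they are not taken from WordNet or any
# other lexical resource.
SENSES = {
    "bank/finance": "a financial institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a body of water such as a river or lake",
}

# A deliberately tiny stopword list (an assumption): function words would
# otherwise create overlaps that say nothing about the intended sense.
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "at", "that",
    "such", "as", "by", "for", "with", "is", "was", "i", "we", "my",
}


def content_words(text: str) -> set[str]:
    """Lower-case the text, split it into alphabetic tokens, drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context: str, senses: dict[str, str]) -> str:
    """Pick the sense whose gloss shares the most content words with the context.

    `senses` must be non-empty. On a tie, the sense listed first wins,
    because only a strictly larger overlap replaces the current best.
    """
    context_words = content_words(context)
    best_sense, best_overlap = "", -1
    for sense, gloss in senses.items():
        overlap = len(context_words & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    print(simplified_lesk("I deposited money at the bank", SENSES))
    print(simplified_lesk("We sat on the bank of the river and watched the water", SENSES))
```

Run as a script, the example prints "bank/finance" and then "bank/river". The first context shares "money" with the financial gloss, and the second shares "river" and "water" with the river gloss. Exact string matching misses pairs such as "deposited" and "deposits". Practical systems therefore add lemmatization and richer glosses, or they also compare the glosses of related senses, as the extended gloss overlaps of [76] do.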
The work presented in [81] demonstrated an effort to resolve ambiguous terms using sense-tagged

There is no doubt that NLP is an enabler for deploying natural, intelligent, and intuitive applications for everyday use. It is transforming the way humans interact with computers. Resolving the complexity issues of human language is therefore critical and urgent.

Applications such as chatbots, smart search, recommender systems, customer service, personal assistants, multilingual machine translation, question answering, and caption generation are expected to capitalize on NLP techniques for human-like understanding of speech and text. Deeper applications are seen as critical NLP technologies of the future. These include extracting insights and analysis from vast amounts of unindexed and unstructured data, mining text, images, audio, and video, and reading, filtering, analyzing, extracting, and visualizing pieces of knowledge from text documents such as emails, short messages, and reviews.

When machines are intelligent enough to understand and communicate in a human language, human users can be more effective and efficient in accessing, analyzing, and leveraging huge amounts of data. The NLP market is growing. According to a 2017 Tractica report [85], the NLP market is estimated to reach around 22.3 billion USD by 2025. This estimate includes NLP software, hardware, and services. Furthermore, NLP solutions that leverage AI are projected to grow from 136 million USD in 2016 to 5.4 billion USD by 2025.

5. CONCLUSION

This paper has presented the most prominent applications of NLP: information extraction, question answering systems, and automated text summarization. Papers were selected for review by their number of citations, so some current literature on the selected topics may have been left out unintentionally. Overall, this paper has given an in-depth overview of the main applications of NLP. NLP is also considered one of the AI-hard problems. The complexity of natural language processing stems from the ambiguity problems that are inherent in human language. Although ambiguity may occur at all levels of a natural language, it most commonly occurs at the lexical and structural levels. The paper also addresses, discusses, and distinguishes between approaches to resolving these ambiguity problems. Future technologies based on NLP are also briefly highlighted.

REFERENCES

[1] McCallum, A. (2005). Information extraction: distilling structured data from unstructured text. Queue, 3, 48–57.
[2] Sekimizu, T., Park, H., & Tsuji, J. (1998). Identifying the interactions between genes and gene products based on frequently seen verbs in Medline abstracts. Tokyo, Japan: Universal Academy Press.
[3] Chu, C.-T., Sung, Y.-H., Yuan, Z., & Jurafsky, D. (2006). Detection of word fragments in Mandarin telephone conversation. In International Conference on Spoken Language Processing. URL pubs/fragment-icslp-06.pdf
[4] Ramage, D., Rosen, E., Chuang, J., Manning, C. D., & McFarland, D. A. (2009). Topic modeling for the social sciences. In Workshop on Applications for Topic Models: Text and Beyond (NIPS 2009). Whistler, Canada.
[5] Jurafsky, D., Ranganath, R., & McFarland, D. (2009). Extracting social meaning: identifying interactional style in spoken conversation. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL '09), (pp. 638–646). Morristown, NJ, USA: Association for Computational Linguistics.
[6] Grenager, T., Klein, D., & Manning, C. D. (2005). Unsupervised learning of field segmentation models for information extraction. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, (pp. 371–378).
[7] Jurafsky, D., & Martin, J. H. (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. United States of America: Prentice Hall.
[8] Allen, J. (1988). Natural Language Understanding. United States of America: The Benjamin/Cummings Publishing Company.
[9] Karat, C., Vergo, J., & Nahamoo, D. (2003). Conversational interface technologies. In J. A. Jacko, & A. Sears (Eds.), The Human-Computer Interaction Handbook, (pp. 169–186). Lawrence Erlbaum Associates.
[10] Feldman, R., & Sanger, J. (2007). The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. United States of America: Cambridge University Press.
[11] Lee, S., & Lee, G. (2005). Heuristic methods for reducing errors of geographic named entities learned by bootstrapping. In Proceedings of the International Joint Conference on Natural Language Processing.
[12] Fleischman, M., & Hovy, E. (2002). Fine-grained classification of named entities. In Proceedings of the 19th International Conference on Computational Linguistics (COLING).
[13] Bodenreider, O., & Zweigenbaum, P. (2000). Identifying proper names in parallel medical terminologies. Stud Health Technol Inform, 77, 443–447.
[14] McCallum, A., & Li, W. (2003). Early results for named entity recognition with conditional random fields, feature induction and Web-enhanced lexicons. In Proceedings of the Conference on Computational Natural Language Learning.
[15] Alfonseca, E., & Manandhar, S. (2002). An unsupervised method for general named entity recognition and automated concept discovery. In Proceedings of the 1st International Conference on General WordNet, (pp. 466–471).
[16] Chang, C. H., & Kup, S.-C. (2004). A semi-supervised approach of web data extraction with visual support. Intelligent System, 19(6), 56–64.
[17] Nadeau, D. (2007). Semi-Supervised Named Entity Recognition: Learning to Recognize 100 Entity Types with Little Supervision. Ph.D. thesis, University of Ottawa.
[18] Cucerzan, S., & Yarowsky, D. (1999). Language independent named entity recognition combining morphological and contextual evidence. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora.
[19] Bick, E. (2004). A named entity recognizer for Danish. In Proceedings of the Conference on Language Resources and Evaluation.
[20] May, J., Brunstein, A., Natarajan, P., & Weischedel, R. (2003). Surprise! What's in a Cebuano or Hindi name? ACM Transactions on Asian Language Information Processing (TALIP), 2(3), 169–180.
[21] Piskorski, J. (2004). Named-entity recognition for Polish with SProUT. In L. Bolc, Z. Michalewicz, & T. Nishida (Eds.), Lecture Notes in Computer Science, vol. 3490, (pp. 122–133).
[22] Huang, F. (2005). Multilingual Named Entity Extraction and Translation from Text and Speech. Ph.D. thesis, Carnegie Mellon University.
[23] Abuleil, S. (2006). Hybrid system for extracting and classifying Arabic proper names. In Proceedings of the 5th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases, (pp. 205–210). Stevens Point, Wisconsin, USA: World Scientific and Engineering Academy and Society (WSEAS).
[24] AlHajjar, A., Hajjar, M., & Khaldoun, Z. (2010). A system for evaluation of Arabic root extraction methods. In Proceedings of the 2010 Fifth International Conference on Internet and Web Applications and Services, ICIW '10, (pp. 506–512). Washington, DC, USA: IEEE Computer Society.
[25] Zaghouani, W. (2012). Renar: A rule-based Arabic named entity recognition system. 11(1), 2:1–2:13.
[26] Hirschman, L., & Gaizauskas, R. (2001). Natural language question answering: the view from here. Natural Language Engineering, 7, 275–300.
[27] Al-Harbi, O., Jusoh, S., & Norwawi, N. M. (2011). Lexical disambiguation in natural language questions (NLQs). International Journal of Computer Science Issues, 8, 143–150.
[28] Green, B., Wolf, A., Chomsky, C., & Laughery, K. (1961). Baseball: An automatic question answerer. In Proceedings Western Joint Computer Conference, vol. 19, (pp. 219–224).
[29] Katz, B., Borchardt, G., & Felshin, S. (2006). Natural language annotations for question answering. In Proceedings of the 19th International FLAIRS Conference (FLAIRS 2006).
[30] Mohammed, F. A., Nasser, K., & Harb, H. M. (1993). A knowledge based Arabic question answering system (AQAS). SIGART Bulletin, 4(4), 21–30.
[31] Hammo, B., Abu-Salem, H., & Lytinen, S. (2002). Qarab: a question answering system to support the Arabic language. In Proceedings of the ACL-02 Workshop on Computational Approaches to Semitic Languages, SEMITIC '02, (pp. 1–11). Stroudsburg, PA, USA: Association for Computational Linguistics.
[32] Kanaan, G., Hammouri, A., Al-Shalabi, R., & Swalha, M. (2009). A new question answering system for the Arabic language. American Journal of Applied Sciences, 6, 797–805.
[33] Mani, I., & Benjamin, J. (2002). Review of automatic summarization. Journal of Computational Linguistics, 28, 221–223.
[34] Mani, I. (1999). Advances in Automatic Text Summarization. Cambridge, MA: MIT Press.
[35] Loo, P., & Tan, C. (2002). Word and sentence extraction using irregular pyramid. In Proceedings of the 5th International Workshop on Document Analysis Systems V (DAS '02), (pp. 307–318). Heidelberg: Springer-Verlag, London, UK.
[36] Jusoh, S., Masoud, A. M., & Alfawareh, H. M. (2011). Automated text summarization: Sentence refinement approach. In V. Snasel, J. Platos, & E. El-Qawasmeh (Eds.), Digital Information Processing and Communications, vol. 189 of Communications in Computer and Information Science, (pp. 207–218). Springer Berlin Heidelberg.
[37] Chan, S. (2006). Beyond keyword and cue-phrase matching: a sentence-based abstraction technique for information extraction. Decision Support Systems, 42, 759–77.
[38] Ježek, K., & Steinberger, J. (2008). Automatic text summarization: The state of the art and new challenges. In Proceedings of Znalosti 2008, (pp. 1–12).
[39] Devasena, C. L., & Hemalatha, M. (2012, March). Automatic text categorization and summarization using rule reduction. In 2012 International Conference on Advances in Engineering, Science and Management (ICAESM), (pp. 594–598). IEEE.
[40] Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1), 1–47.
[41] Yang, Y. (1999). An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1), 69–90.
[42] El-Haj, M., & Hammo, B. (2008). Evaluation of query-based Arabic text summarization system. In Proceedings of the IEEE International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE08, (p. 17). IEEE Computer Society.
[43] El-Haj, M., Kruschwitz, U., & Fox, C. (2009, November). Experimenting with automatic text summarisation for Arabic. In LTC, (pp. 490–499).
[44] Alfawareh, H. M., & Jusoh, S. (2011). Resolving ambiguous entity through context knowledge and fuzzy approach. International Journal on Computer Science and Engineering (IJCSE), 3(1), 410–422.
[45] Burnard, L. (2000). Reference Guide for the British National Corpus. Oxford, UK: Oxford University Computing Services.
[46] Chodorow, M., Tetreault, J., & Han, N. (2007). Detection of grammatical errors involving prepositions. In Proceedings of the 4th ACL-SIGSEM Workshop on Prepositions, (pp. 25–30).
[47] Lindstromberg, S. (2001). Preposition entries in UK monolingual learners' dictionaries: Problems and possible solutions. Applied Linguistics, 22(1), 79–103.
[48] Baldwin, T., Kordoni, V., & Villavicencio, A. (2009). Prepositions in applications: A survey and introduction to the special issue. Computational Linguistics, 35(2), 119–149.
[49] Hindle, D., & Rooth, M. (1993). Structural ambiguity and lexical relations. Computational Linguistics, 19(1), 103–120.
[50] Collins, M., & Brooks, J. (1995). Prepositional phrase attachment through a backed-off model. In Proceedings of the 3rd Annual Workshop on Very Large Corpora, (pp. 27–38).
[51] Zavrel, J., Daelemans, W., & Veenstra, J. (1997). Resolving PP attachment ambiguities with memory-based learning. In Proceedings of the Conference on Computational Natural Language Learning (CoNLL-97), (pp. 136–144).
[52] Ratnaparkhi, A., Reynar, J., & Roukos, S. (1994). A maximum entropy model for prepositional phrase attachment. In Proceedings of the Workshop on Human Language Technology, (pp. 250–255).
[53] Merlo, P., Crocker, M. W., & Berthouzoz, C. (1997). Attaching multiple prepositional phrases: Generalized backed-off estimation. In Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing (EMNLP-97), (pp. 149–155).
[54] Alam, Y. S. (2004). Decision trees for sense disambiguation of prepositions: Case of over. In Proceedings of the Workshop on Computational Lexical Semantics, (pp. 52–59).
[55] Sopena, J. M., Lloberas, A., & Moliner, J. L. (1998). A connectionist approach to prepositional phrase attachment for real world texts. In Proceedings of the 36th Annual Meeting of the ACL and 17th International Conference on Computational Linguistics (COLING/ACL-98), (pp. 1233–1237).
[56] Alegre, M. A., Sopena, J. M., & Lloberas, A. (1999). PP-attachment: A committee machine approach. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-99), (pp. 231–238).
[57] Abney, S., Schapire, R. E., & Singer, Y. (1999). Boosting applied to tagging and PP attachment. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-99), (pp. 38–45).
[58] Nadh, K., & Christian, H. (2009). Prepositional phrase attachment ambiguity resolution using semantic hierarchies. In Proceedings of the Ninth IASTED International Conference on Artificial Intelligence and Applications, (pp. 73–80).
[59] Srinivas, M., & Bhattacharyya, P. (2006). Prepositional phrase attachment through semantic association using connectionist approach. In Proceedings of the Third International WordNet Conference (GWC2006), (pp. 273–277).
[60] Wu, H., & Furugori, T. (1996). Prepositional phrase attachment through a hybrid disambiguation model. In Proceedings of the 16th Conference on Computational Linguistics, (pp. 1070–1073). Morristown, NJ, USA: Association for Computational Linguistics.
[61] Hartrumpf, S. (1999). Hybrid disambiguation of prepositional phrase attachment and interpretation. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-99), (pp. 111–120).
[62] Chiara, C. M., Fernando, F., & Patrizia, G. (2008). Ambiguity detection in multimodal systems. In Proceedings of the Working Conference on Advanced Visual Interfaces, (pp. 331–334). New York, NY, USA: ACM.
[63] Benamara, F. (2005). Reasoning with prepositions within a cooperative question-answering framework. In Proceedings of the Second ACL-SIGSEM Workshop on the Linguistic Dimensions of Prepositions and their Use in Computational Linguistics Formalisms and Applications, (pp. 145–152).
[64] Boonthum, C., Toida, S., & Levinstein, I. (2006). Preposition senses: Generalized disambiguation model. In Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2006), (pp. 196–207).
[65] Navigli, R., & Velardi, P. (2005). Structural semantic interconnections: a knowledge-based approach to word sense disambiguation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[66] Agirre, E., & Edmonds, P. (2007). Introduction. In E. Agirre, & P. Edmonds (Eds.), Word Sense Disambiguation: Algorithms and Applications, (pp. 1–28). New York: Springer Verlag.
[67] Navigli, R. (2009). Word sense disambiguation: a survey. ACM Computing Surveys, 41(2), 1–69.
[68] Agirre, E., & Stevenson, M. (2007). Knowledge sources for WSD, (pp. 217–251). New York: Springer Verlag.
[69] McCarthy, D., Carroll, J., & Preiss, J. (2001). Disambiguating noun and verb senses using automatically acquired selectional preferences. In Proceedings of the SENSEVAL-2 Workshop at the European Chapter ACL, (pp. 119–122). Toulouse, France.
[70] Schank, R., & Abelson, R. (1977). Scripts, Plans, Goals, and Understanding. Hillsdale, NJ: Lawrence Erlbaum.
[71] Wilks, Y. (1978). A preferential pattern-seeking semantics for natural language inference. Artificial Intelligence, 6, 53–74.
[72] Resnik, P. (1995). Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), (pp. 448–453).
[73] Mihalcea, R., & Moldovan, D. (2001). A highly accurate bootstrapping algorithm for word sense disambiguation. International Journal of Artificial Intelligence Tools, 10(1-2), 5–21.
[74] Jiang, J. J., & Conrath, D. W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the 10th International Conference on Research in Computational Linguistics.
[75] Agirre, E., & Martinez, D. (2000). Exploring automatic word sense disambiguation with decision lists and the web. In Proceedings of the Semantic Annotation and Intelligent Annotation Workshop organized by COLING, Luxembourg 2000, (pp. 11–19).
[76] Banerjee, S., & Pedersen, T. (2003). Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, (pp. 805–810).
[77] Brown, P., Della Pietra, S., Della Pietra, V. J., & Mercer, R. L. (1991). Word sense disambiguation using statistical methods. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, (pp. 264–270).
[78] Nadas, A., Nahamoo, D., Picheny, M., & Powell, J. (1991). An iterative approximation of the most informative split in the construction of decision trees. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, (pp. 565–568). Toronto.
[79] Yarowsky, D. (1996). Homograph disambiguation in text-to-speech synthesis. In J. Hirschberg, R. Sproat, & J. van Santen (Eds.), Progress in Speech Synthesis, (pp. 159–175). New York: Springer Verlag.
[80] Chung, Y., & Lee, J.-H. (2005). Practical word-sense disambiguation using co-occurring concept codes. Machine Translation, 19(1), 59–82.
[81] Liu, H., Hu, Z., Torii, M., Wu, C., & Friedman, C. (2006). Quantitative assessment of dictionary-based protein named entity tagging. Journal of the American Medical Informatics Association (JAMIA), 13, 497–507.
[82] Liu, H., Johnson, S. B., & Friedman, C. (2002). Automatic resolution of ambiguous terms based on machine learning and conceptual relations in the UMLS. Journal of the American Medical Informatics Association (JAMIA), 9, 621–636.
[83] Liu, H., Hu, Z., Torii, M., Wu, C., & Friedman, C. (2006). Quantitative assessment of dictionary-based protein named entity tagging. Journal of the American Medical Informatics Association (JAMIA), 13, 497–507.
[84] Allan, J., & Raghavan, H. (2002). Using part-of-speech patterns to reduce query ambiguity. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, (pp. 307–314). New York, NY, USA: ACM.
[85] Tractica. (2017, August 21). Natural Language Processing Market to Reach $22.3 Billion by 2025. Retrieved from: https://www.tractica.com/newsroom/press-releases/natural-language-processing-market-to-reach-22-3-billion-by-2025/