Intro To NLP and Text Mining

This document provides an introduction to natural language processing and text mining. It discusses key references in the field, defines text mining as analyzing large amounts of natural language text to extract useful information, and outlines some of the main tasks in NLP like part-of-speech tagging, named entity recognition, and syntactic parsing. It also discusses different approaches to solving NLP problems like rule-based and statistical methods, and describes some common NLP applications like machine translation, question answering, and sentiment analysis.

Introduction to NLP and Text Mining

Instructor: Rahmad Mahendra

Natural Language Processing & Text Mining

Pusat Ilmu Komputer UI & Sinar Mas


15th January 2018
References
• Jurafsky and Martin, Speech and Language Processing, 2nd ed., Prentice-Hall, 2008.
• Manning and Schütze, Foundations of Statistical Natural Language Processing, 1999.
• Natural Language Processing course materials: Stanford University, Edinburgh University, Illinois University, University of California at Berkeley, University of Texas at Austin, ETH Zurich, National University of Singapore, Universitas Indonesia
• Feldman and Sanger, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, Cambridge University Press, 2007.
• Indurkhya and Damerau (eds.), Handbook of Natural Language Processing, 2nd ed., CRC Press, 2010.
Text Mining
A system that analyzes large quantities of natural language text and detects lexical or linguistic patterns in an attempt to extract probably useful information. (Sebastiani, 2002)

Mining useful information from unstructured text...
Unstructured…
Free text, Grammatical Errors, Ambiguity, Complexity, Slang Words, …

Semi-Structured…
XML, JSON
Example: ECG Reports
(Angelino, 2012)

Structured…
Database
(Dzerovski, 1996)
Data Mining vs Text Mining
• “Data Mining is essentially concerned with information extraction from structured databases.”
• In reality, a large portion of the available information appears in textual and unstructured form. Text mining operates on textual data to extract information from collections of texts.

(Rajman & Besancon, 1997)


Text Mining
INPUT: raw and unstructured text
“This past Saturday, I bought a Nokia phone and my friend bought a Motorola phone with Bluetooth. We called each other when we got home. Basically I like the screen. But the voice on my phone was not so clear, worse than my previous Samsung phone. The battery life was short too. My friend was quite happy with her phone. I wanted a phone with good sound quality just like his phone. So my purchase was a real disappointment. I returned the phone yesterday.”

OUTPUT:
Nokia
– Screen: good
– Battery life: bad
– Sound quality: bad
Motorola
– Sound quality: good
Samsung
– Sound quality: better-than Nokia
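The mapping from the review to per-aspect opinions can be sketched with a toy lexicon lookup. This is only an illustration under strong assumptions: the `ASPECTS` and `OPINIONS` dictionaries are hypothetical, hand-written for this one review, and the function name `extract_opinions` is made up here. It does not handle the product names or the comparison with Samsung, which need entity and coreference handling.

```python
import re

# Hypothetical hand-made lexicons for this single review (assumption, not from the slides).
ASPECTS = {"screen": "Screen", "voice": "Sound quality", "battery life": "Battery life"}
OPINIONS = {"like": "good", "not so clear": "bad", "short": "bad"}

def extract_opinions(text):
    """Return {aspect: polarity} for sentences that mention both an aspect term and an opinion phrase."""
    found = {}
    # Naive sentence split on end-of-sentence punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        for aspect_term, aspect in ASPECTS.items():
            if aspect_term not in lowered:
                continue
            for phrase, polarity in OPINIONS.items():
                if phrase in lowered:  # plain substring match, so "like" would also match "likely"
                    found[aspect] = polarity
    return found

review = ("Basically I like the screen. But the voice on my phone was not so clear, "
          "worse than my previous Samsung phone. The battery life was short too.")
print(extract_opinions(review))
```

Substring matching keeps the sketch short; a practical system would tokenize the text and learn the aspect and opinion vocabulary from data.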
Natural Language Processing
• NLP is the branch of computer science focused on developing systems that allow computers to communicate with people using everyday language.
• Also called Computational Linguistics
– Also concerns how computational methods can aid the understanding of human language
Why Study NLP
• An enormous amount of knowledge is now available in machine-readable form as natural language text.
• Conversational agents are becoming an important form of human-computer communication.
• Much of human-human communication is now mediated by computers.
• Lots of exciting stuff going on ...
NLP-Related Areas
• Artificial Intelligence
• Formal Language (Automata) Theory
• Machine Learning
• Linguistics
• Psycholinguistics
• Cognitive Science
• Philosophy of Language
Linguistic Levels of Analysis
• Word
• Syntax
– concerns the proper ordering of words and its effect on meaning.
• Semantics
– concerns the (literal) meaning of words, phrases, and sentences.
• Pragmatics
– concerns the overall communicative and social context and its effect on interpretation.
Word
Morphology
Part of Speech
Syntax
Semantics
Discourse
(One example slide per level; the examples are taken from Edinburgh’s lecture notes.)

Why NLP is Hard
• Ambiguity
– Lexical Ambiguity
– Structural Ambiguity
– Referential Ambiguity
• Sparsity
• Scale
• Unmodeled Variables
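Sparsity can be made concrete by counting how many distinct words in a text occur only once. The sketch below is a minimal illustration; the function name `hapax_ratio` and the sample sentence are assumptions introduced here, and the tokenizer is a deliberately crude regular expression.

```python
from collections import Counter
import re

def hapax_ratio(text):
    """Fraction of distinct word types that occur exactly once in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)

# Made-up sample: 8 distinct words, 5 of which appear only once.
sample = "the cat sat on the mat and the dog sat on the log"
print(hapax_ratio(sample))
```

Even in this tiny sample most word types are seen once, which is why models estimated from counts have little evidence for most of the vocabulary.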
NLP Core Tasks
• Morphological Analysis
• Part-of-Speech Tagging
• Named-Entity Recognition
• Syntactic Parsing
• Semantic Parsing
• Word Sense Disambiguation
• Textual Entailment
• Coreference Resolution
NLP Applications
• Spelling and Grammar Correction
• Information Retrieval
• Text Summarization
http://autosummarizer.com/
• Text Classification
• Machine Translation
http://translate.google.com
• Question Answering
http://start.csail.mit.edu
• Sentiment Analysis
Approaches to Solving NLP Problems
• Rule Based (Symbolic)
– Develop hand-coded rules
• Statistics Based (Empirical)
– Annotate data based on standard tagsets, then train a model with machine learning
• Hybrid systems
– Often blend rule-based pre- and post-processing with an ML core
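The three approaches can be contrasted on a toy part-of-speech tagging task. This is a sketch under assumptions: the suffix rules, the tag names, and the six-word annotated sample are all made up for illustration, and the "hybrid" shown is a simple back-off from the learned model to the rules, which is a simpler combination than the pre- and post-processing pipeline described above.

```python
from collections import Counter, defaultdict

def rule_based_tag(word):
    """Symbolic approach: hand-coded suffix rules (illustrative, far from complete)."""
    if word.endswith("ing") or word.endswith("ed"):
        return "VERB"
    if word.endswith("ly"):
        return "ADV"
    return "NOUN"

def train_most_frequent_tag(tagged_words):
    """Empirical approach: learn each word's most frequent tag from annotated data."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def hybrid_tag(word, model):
    """Back-off hybrid: use the learned tag if the word was seen, else fall back to the rules."""
    return model.get(word, rule_based_tag(word))

# Tiny made-up annotated sample; real corpora and tagsets are far larger.
training = [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
            ("the", "DET"), ("runs", "VERB"), ("runs", "NOUN")]
model = train_most_frequent_tag(training)
print([hybrid_tag(w, model) for w in ["the", "dog", "runs", "quickly"]])
```

The learned model resolves "runs" from the counts in the data, while the unseen word "quickly" is only tagged because a hand-written rule covers it.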
(Effective) NLP Cycle
• Pick a problem (usually some disambiguation).
• Get a lot of data (hopefully labeled, but often unlabeled).
• Build the simplest thing that could possibly work.
• Repeat:
– Examine what the most common errors are.
– Figure out what information a human might use to avoid them.
– Modify the system to exploit that information:
• Feature engineering
• Representation redesign
• Different machine learning methods
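The "examine the most common errors" step of the cycle can be sketched as a count of (gold, predicted) label pairs where the system was wrong. The function name `most_common_errors` and the label sequences are hypothetical, chosen only to show the bookkeeping.

```python
from collections import Counter

def most_common_errors(gold, predicted, n=3):
    """Count (gold, predicted) pairs where the prediction was wrong, most frequent first."""
    errors = Counter((g, p) for g, p in zip(gold, predicted) if g != p)
    return errors.most_common(n)

# Made-up labels: a system that predicts NOUN far too often.
gold      = ["NOUN", "VERB", "NOUN", "ADJ", "VERB", "NOUN"]
predicted = ["NOUN", "NOUN", "NOUN", "NOUN", "NOUN", "VERB"]
print(most_common_errors(gold, predicted))
```

Here the most frequent confusion is a verb tagged as a noun, which suggests looking for information that separates verbs from nouns before changing anything else.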
THANK YOU
