Parvathy V J, Engineer Special Programs, Livewire, Trivandrum
Outline
• Basics of NLP
• Text pre-processing & vectorisation
• NLP process implementation in Python
• Spam email classifier use-case
What is an Intelligent Machine?
• Generalized Adaptability
• Automated Reasoning
• Knowledge Representation
• Natural Language Processing [image: Alan Turing]
• Problem Solving Ability
• Machine Learning
• Natural Language Processing (NLP) is a field of Artificial Intelligence that
gives machines the ability to read, understand and derive meaning from
human languages
Human-Machine Interaction using NLP
• Language detection
• Next word prediction
• Automated query answering
• Audio to text conversion
• Data to audio conversion
• Processing of the text data
What is NLP used for?
• Recognition and prediction of diseases - Amazon Comprehend Medical
• Customer review analysis on social media - sentiment analysis
• Personal assistants - cognitive assistants - IBM
• Spam email filtering - Yahoo and Google
• Fake news identification - NLP Group at MIT
• Voice assistants - Siri, Alexa, Cortana
• NLP in health care - Woebot chatbot therapist
• Language translation applications - Google Translate
• Checking the grammatical accuracy of texts - Microsoft Word and Grammarly
Why is NLP difficult?
• The ambiguity and imprecision of natural languages make NLP
difficult for machines to implement
How does NLP work?
• Syntactic analysis and semantic analysis are the main techniques used
• Syntax is the arrangement of words in a sentence so that it makes grammatical sense
• Syntactic analysis shows how natural language aligns with grammatical rules
• Semantics refers to the meaning that is conveyed by a text
At a glance…
• Tokenisation
• Removal of stop-words, punctuation, numbers and special characters
• Stemming
• Lemmatisation
• Bag of Words
• TF-IDF
• Word2Vec
• POS tags
• Named Entity Recognition
• Chunking
• Parsing
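The first two steps in the list above, tokenisation and the removal of punctuation, numbers and special characters, can be sketched in a few lines of plain Python. This is only a minimal illustration: the function name `tokenise` and the sample sentence are assumptions made for this example, and a real pipeline would normally use a library tokeniser instead of a single regular expression.

```python
import re

def tokenise(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    # The pattern [a-z]+ keeps runs of letters only, so punctuation,
    # digits and special characters are dropped in the same pass.
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical sample sentence, chosen to resemble a spam email.
tokens = tokenise("Win $1000 NOW!!! Click here, it's FREE.")
print(tokens)  # ['win', 'now', 'click', 'here', 'it', 's', 'free']
```

Note that this simple pattern splits "it's" into "it" and "s"; handling contractions properly is one reason dedicated tokenisers exist.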
Stop-words Removal
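Stop-words are very common words (such as "the", "is", "to") that carry little meaning on their own, so they are usually filtered out before vectorisation. A minimal sketch, assuming a tiny hand-written stop-word list; `STOP_WORDS` and `remove_stop_words` are illustrative names, and real stop-word lists are much longer.

```python
# A tiny illustrative stop-word list (an assumption for this example).
STOP_WORDS = {"a", "an", "the", "is", "in", "to", "of", "and", "this"}

def remove_stop_words(tokens):
    """Return the tokens that are not in the stop-word list, in order."""
    return [token for token in tokens if token not in STOP_WORDS]

filtered = remove_stop_words(["this", "is", "an", "offer", "to", "win", "a", "prize"])
print(filtered)  # ['offer', 'win', 'prize']
```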
Vectorisation in NLP
• Bag of Words
• TF-IDF: Term Frequency-Inverse Document Frequency
• Word2Vec
Converting words into vectors - Bag of Words
Converts text into vectors containing the count of word occurrences in the document
Histogram Creation
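The Bag of Words idea can be sketched as follows: build a word-count histogram for each document, collect the vocabulary across all documents, and then turn each histogram into a vector of counts. This is a simplified illustration that splits on whitespace only; the function name `bag_of_words` and the two sample documents are assumptions made for this example.

```python
from collections import Counter

def bag_of_words(documents):
    """Return (vocabulary, vectors) holding word counts per document."""
    # Histogram creation: one word-count table per document.
    histograms = [Counter(doc.lower().split()) for doc in documents]
    # The vocabulary is every distinct word seen, in sorted order.
    vocabulary = sorted(set().union(*histograms))
    # Each vector holds the count of every vocabulary word in one document;
    # a Counter returns 0 for words the document does not contain.
    vectors = [[hist[word] for word in vocabulary] for hist in histograms]
    return vocabulary, vectors

# Hypothetical sample documents.
docs = ["free prize win free", "meeting at noon"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)  # ['at', 'free', 'meeting', 'noon', 'prize', 'win']
print(vectors)     # [[0, 2, 0, 0, 1, 1], [1, 0, 1, 1, 0, 0]]
```

Every vector has the same length as the vocabulary, which is what lets a classifier compare documents of different lengths.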
TF-IDF
• TERM FREQUENCY