Foundation For NLP

Natural Language Processing (NLP) is a field of computer science and artificial intelligence that enables machines to understand and interpret human language. It encompasses various techniques and applications, including text processing, sentiment analysis, machine translation, and chatbots, which help improve efficiency and communication between humans and computers. Despite its advantages, NLP has limitations such as context understanding and adaptability to new domains.

Uploaded by

jaismeensaini348

What is NLP?

NLP stands for Natural Language Processing, a field that draws on
computer science, human language, and artificial intelligence.
It is the technology machines use to understand, analyse,
manipulate, and interpret human languages. It helps developers
organize knowledge for tasks such as translation, automatic
summarization, Named Entity Recognition (NER), speech
recognition, relationship extraction, and topic segmentation.

Advantages of NLP
o NLP lets users ask questions about any subject and get a direct
response within seconds.
o NLP can give direct answers to a question, without unnecessary or
unwanted information.
o NLP helps computers communicate with humans in their own languages.
o It is very time efficient.
o Many companies use NLP to improve the efficiency and accuracy of
documentation processes and to identify information in large
databases.
Disadvantages of NLP
The disadvantages of NLP are listed below:

o NLP may fail to capture context.
o NLP can be unpredictable.
o NLP may require more keystrokes.
o NLP systems often cannot adapt to a new domain and have limited
functionality, which is why they are usually built for a single,
specific task.

Components of NLP
NLP has the following two components:

1. Natural Language Understanding (NLU)

Natural Language Understanding (NLU) helps the machine understand
and analyse human language by extracting metadata from content,
such as concepts, entities, keywords, emotions, relations, and semantic
roles.

NLU is mainly used in business applications to understand the customer's
problem in both spoken and written language.

NLU involves the following tasks:

o Mapping the given input into a useful representation.
o Analyzing different aspects of the language.

2. Natural Language Generation (NLG)

Natural Language Generation (NLG) acts as a translator that converts
computerized data into a natural language representation. It mainly
involves text planning, sentence planning, and text realization.

Note: NLU is generally considered more difficult than NLG.

Difference between NLU and NLG

NLU | NLG
NLU is the process of reading and interpreting language. | NLG is the process of writing or generating language.
It produces non-linguistic outputs from natural language inputs. | It produces natural language outputs from non-linguistic inputs.
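
The contrast between NLU and NLG can be sketched with a toy example. This is only an illustration under stated assumptions: the command pattern, the `set_power` intent name, and the record fields are all hypothetical, and real systems use far richer models than a single regular expression.

```python
import re

def understand(utterance):
    """Toy NLU: map a natural-language command to a structured record."""
    match = re.match(r"(turn on|turn off) the (\w+)", utterance.lower())
    if match is None:
        return None  # the toy pattern does not cover this input
    state = "on" if match.group(1) == "turn on" else "off"
    # Hypothetical record layout, invented for this illustration.
    return {"intent": "set_power", "device": match.group(2), "state": state}

def generate(record):
    """Toy NLG: turn a structured record into a natural-language sentence."""
    return f"The {record['device']} is now {record['state']}."

frame = understand("Turn on the lamp")
print(frame)            # non-linguistic output from a linguistic input
print(generate(frame))  # linguistic output from a non-linguistic input
```

Here `understand` goes from language to data (NLU) and `generate` goes from data back to language (NLG).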

Applications of NLP
NLP has the following applications:

1. Question Answering

Question Answering focuses on building systems that automatically
answer questions asked by humans in a natural language.

2. Spam Detection

Spam detection is used to detect unwanted e-mails before they reach a
user's inbox.

3. Sentiment Analysis

Sentiment Analysis is also known as opinion mining. It is used on the
web to analyse the attitude, behaviour, and emotional state of the sender.
This application is implemented through a combination of NLP (Natural
Language Processing) and statistics: values are assigned to the text
(positive, negative, or neutral) and the mood of the context is
identified (happy, sad, angry, etc.).
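
Assigning a positive, negative, or neutral value to a text can be sketched with a very small lexicon-based scorer. The two word lists below are assumptions made up for this example, not a real sentiment lexicon, and the approach ignores negation and context.

```python
# Tiny illustrative word lists (assumed for this sketch, not a real lexicon).
POSITIVE = {"good", "great", "happy", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "sad", "hate", "poor"}

def sentiment(text):
    """Label text by counting positive words minus negative words."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great phone!"))
print(sentiment("The service was terrible."))
print(sentiment("The package arrived on Monday."))
```

Statistical and machine-learning approaches replace the fixed word lists with weights learned from labeled examples.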

4. Machine Translation

Machine translation is used to translate text or speech from one natural
language to another.

Example: Google Translate

5. Spelling correction

Word processors and presentation software such as Microsoft Word and
PowerPoint provide spelling correction.

6. Speech Recognition

Speech recognition is used for converting spoken words into text. It is
used in applications such as mobile devices, home automation, video recovery,
dictating to Microsoft Word, voice biometrics, voice user interfaces, and so
on.

7. Chatbot

Implementing chatbots is one of the important applications of NLP. Many
companies use them to provide customer chat services.

8. Information extraction

Information extraction is one of the most important applications of NLP. It
is used for extracting structured information from unstructured or
semi-structured machine-readable documents.

9. Natural Language Understanding (NLU)

NLU converts large sets of text into more formal representations, such as
first-order logic structures, that are easier for computer programs to
manipulate.

Phases of NLP
NLP has the following five phases:

1. Lexical and Morphological Analysis

The first phase of NLP is lexical analysis. This phase scans the source
text as a stream of characters and converts it into meaningful lexemes. It
divides the whole text into paragraphs, sentences, and words.
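
The division into paragraphs, sentences, and words can be sketched with regular expressions. This is a minimal illustration that assumes paragraphs are separated by blank lines and sentences end in ".", "!" or "?"; it will mis-split abbreviations such as "Dr." and is not a production tokenizer.

```python
import re

def lexical_analysis(text):
    """Split text into paragraphs, each a list of sentences, each a list of words."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    result = []
    for paragraph in paragraphs:
        # Break after sentence-ending punctuation followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        # Keep runs of letters, digits, and apostrophes as words.
        result.append([re.findall(r"[A-Za-z0-9']+", s) for s in sentences])
    return result

text = "NLP is fun. It has five phases.\n\nLexical analysis comes first."
structure = lexical_analysis(text)
for paragraph in structure:
    print(paragraph)
```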

2. Syntactic Analysis (Parsing)

Syntactic Analysis is used to check grammar and word arrangement, and to
show the relationships among the words.

Example: Agra goes to the Poonam

In the real world, the sentence "Agra goes to the Poonam" does not make
sense, so the syntactic analyzer rejects it.

3. Semantic Analysis

Semantic analysis is concerned with meaning representation. It mainly
focuses on the literal meaning of words, phrases, and sentences.

4. Discourse Integration

In discourse integration, the meaning of a sentence depends on the
sentences that precede it and can also influence the meaning of the
sentences that follow it.

5. Pragmatic Analysis
Pragmatic analysis is the fifth and last phase of NLP. It helps you discover
the intended effect by applying a set of rules that characterize cooperative
dialogues.

For example, "Open the door" is interpreted as a request rather than an
order.

What is Natural Language Processing?
Natural language processing (NLP) is a field of computer science
and a subfield of artificial intelligence that aims to make
computers understand human language. NLP uses computational
linguistics, which is the study of how language works, and
various models based on statistics, machine learning, and deep
learning. These technologies allow computers to analyze and
process text or voice data, and to grasp their full meaning,
including the speaker’s or writer’s intentions and emotions.
NLP powers many applications that use language, such as text
translation, voice recognition, text summarization, and chatbots.
You may have used some of these applications yourself, such as
voice-operated GPS systems, digital assistants, speech-to-text
software, and customer service bots. NLP also helps businesses
improve their efficiency, productivity, and performance by
simplifying complex tasks that involve language.
NLP Techniques
NLP encompasses a wide array of techniques aimed at
enabling computers to process and understand human language.
These techniques can be grouped into several broad areas, each
addressing a different aspect of language processing. Here are
some of the key NLP techniques:

1. Text Processing and Preprocessing in NLP
o Tokenization: Dividing text into smaller units, such as
words or sentences.
o Stemming and Lemmatization: Reducing words to
their base or root forms.
o Stopword Removal: Removing common words (like
“and”, “the”, “is”) that may not carry significant
meaning.
o Text Normalization: Standardizing text, including case
normalization, removing punctuation, and correcting
spelling errors.

2. Syntax and Parsing in NLP
o Part-of-Speech (POS) Tagging: Assigning parts of
speech to each word in a sentence (e.g., noun, verb,
adjective).
o Dependency Parsing: Analyzing the grammatical
structure of a sentence to identify relationships between
words.
o Constituency Parsing: Breaking down a sentence into
its constituent parts or phrases (e.g., noun phrases, verb
phrases).

3. Semantic Analysis
o Named Entity Recognition (NER): Identifying and
classifying entities in text, such as names of people,
organizations, locations, dates, etc.
o Word Sense Disambiguation (WSD): Determining
which meaning of a word is used in a given context.
o Coreference Resolution: Identifying when different
words refer to the same entity in a text (e.g., “he” refers
to “John”).

4. Information Extraction
o Entity Extraction: Identifying specific entities and their
relationships within the text.
o Relation Extraction: Identifying and categorizing the
relationships between entities in a text.

5. Text Classification in NLP
o Sentiment Analysis: Determining the sentiment or
emotional tone expressed in a text (e.g., positive,
negative, neutral).
o Topic Modeling: Identifying topics or themes within a
large collection of documents.
o Spam Detection: Classifying text as spam or not spam.

6. Language Generation
o Machine Translation: Translating text from one
language to another.
o Text Summarization: Producing a concise summary of
a larger text.
o Text Generation: Automatically generating coherent
and contextually relevant text.

7. Speech Processing
o Speech Recognition: Converting spoken language into
text.
o Text-to-Speech (TTS) Synthesis: Converting written
text into spoken language.

8. Question Answering
o Retrieval-Based QA: Finding and returning the most
relevant text passage in response to a query.
o Generative QA: Generating an answer based on the
information available in a text corpus.
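
Retrieval-based QA can be sketched as a ranking problem. The version below scores each passage by how many words it shares with the query, which is an assumed, deliberately simple relevance measure; the passages are made up for the example, and practical systems use TF-IDF, BM25, or learned embeddings instead.

```python
def tokenize(text):
    """Lowercase the text and return its set of words, stripped of punctuation."""
    return {w.strip(".,!?;:").lower() for w in text.split()}

def retrieve(query, passages):
    """Return the passage that shares the most words with the query."""
    query_words = tokenize(query)
    return max(passages, key=lambda p: len(query_words & tokenize(p)))

# Illustrative passages, invented for this sketch.
passages = [
    "Tokenization splits text into words or sentences.",
    "Speech recognition converts spoken language into text.",
    "Machine translation converts text from one language to another.",
]
best = retrieve("What does speech recognition do?", passages)
print(best)
```

A generative QA system would go one step further and compose a new answer from the retrieved text rather than returning the passage itself.
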
9. Dialogue Systems
o Chatbots and Virtual Assistants: Enabling systems to
engage in conversations with users, providing responses
and performing tasks based on user input.

10. Sentiment and Emotion Analysis in NLP
o Emotion Detection: Identifying and categorizing
emotions expressed in text.
o Opinion Mining: Analyzing opinions or reviews to
understand public sentiment toward products, services,
or topics.

Working of Natural Language Processing (NLP)
NLP typically works by applying computational techniques to analyze
and understand human language. This can include tasks such as language
understanding, language generation, and language interaction.

1. Text Input and Data Collection
o Data Collection: Gathering text data from various
sources such as websites, books, social media, or
proprietary databases.
o Data Storage: Storing the collected text data in a
structured format, such as a database or a collection of
documents.

2. Text Preprocessing
Preprocessing is crucial to clean and prepare the raw text data
for analysis. Common preprocessing steps include:
o Tokenization: Splitting text into smaller units like words
or sentences.
o Lowercasing: Converting all text to lowercase to ensure
uniformity.
o Stopword Removal: Removing common words that do
not contribute significant meaning, such as “and,” “the,”
“is.”
o Punctuation Removal: Removing punctuation marks.
o Stemming and Lemmatization: Reducing words to
their base or root forms. Stemming cuts off suffixes,
while lemmatization considers the context and converts
words to their meaningful base form.
o Text Normalization: Standardizing text format,
including correcting spelling errors, expanding
contractions, and handling special characters.
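
The preprocessing steps above can be sketched as a small pipeline. The stopword list and suffix rules here are assumptions chosen for illustration; `crude_stem` is a hypothetical helper, not a real stemming algorithm such as Porter's, and no lemmatization is attempted.

```python
import string

STOPWORDS = {"and", "the", "is", "a", "an", "of", "to", "in"}  # tiny illustrative list
SUFFIXES = ("ing", "ed", "es", "s")  # crude rule set, not a real stemmer

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    text = text.lower()                                               # lowercasing
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation removal
    tokens = text.split()                                             # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]                # stopword removal
    return [crude_stem(t) for t in tokens]                            # stemming

print(preprocess("The cats are chasing the mice, and the dog barked."))
```

The output shows the difference the text describes: suffix stripping turns "chasing" into the non-word "chas" and leaves "mice" unchanged, whereas a lemmatizer would be expected to return "chase" and "mouse".
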
3. Text Representation
o Bag of Words (BoW): Representing text as a collection
of words, ignoring grammar and word order but keeping
track of word frequency.
o Term Frequency-Inverse Document Frequency (TF-IDF):
A statistic that reflects the importance of a word in
a document relative to a collection of documents.
o Word Embeddings: Using dense vector representations
of words where semantically similar words are closer
together in the vector space (e.g., Word2Vec, GloVe).
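
Bag of Words and TF-IDF can be computed by hand on a toy corpus. The sketch below uses one common TF-IDF variant, (term count / document length) × ln(N / number of documents containing the term); libraries differ in smoothing and normalization, so their numbers will not match exactly. The three documents are made up for the example.

```python
import math
from collections import Counter

# Toy corpus, invented for this illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
tokenized = [d.split() for d in docs]

# Bag of Words: per-document word counts, with word order discarded.
bow = [Counter(tokens) for tokens in tokenized]

def tf_idf(term, doc_index):
    """TF-IDF of a term in one document (unsmoothed variant described above)."""
    tf = bow[doc_index][term] / len(tokenized[doc_index])
    df = sum(1 for counts in bow if term in counts)  # documents containing the term
    if df == 0:
        return 0.0
    return tf * math.log(len(docs) / df)

print(bow[0])
print(tf_idf("cat", 0), tf_idf("the", 0))
```

In the first document "the" occurs twice and "cat" once, yet "cat" receives the higher TF-IDF score because it appears in only one document of the collection.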

4. Feature Extraction
Extracting meaningful features from the text data that can be
used for various NLP tasks.
o N-grams: Capturing sequences of N words to preserve
some context and word order.
o Syntactic Features: Using part-of-speech tags,
syntactic dependencies, and parse trees.
o Semantic Features: Leveraging word embeddings and
other representations to capture word meaning and
context.
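
N-gram extraction is simple enough to sketch directly. The function name `ngrams` and the example sentence are choices made for this illustration.

```python
def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "natural language processing is fun".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)
print(trigrams)
```

Unlike a plain bag of words, the bigram ("natural", "language") keeps the two words together, which preserves a little of the original word order.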

5. Model Selection and Training
Selecting and training a machine learning or deep learning
model to perform specific NLP tasks.
o Supervised Learning: Using labeled data to train
models like Support Vector Machines (SVM), Random
Forests, or deep learning models like Convolutional
Neural Networks (CNNs) and Recurrent Neural Networks
(RNNs).
o Unsupervised Learning: Applying techniques like
clustering or topic modeling (e.g., Latent Dirichlet
Allocation) on unlabeled data.
o Pre-trained Models: Utilizing pre-trained language
models such as BERT, GPT, or other transformer-based models
that have been trained on large corpora.
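
Supervised learning on labeled text can be sketched without any library. Note that the example below swaps in a multinomial Naive Bayes classifier, which is not one of the models named above, because it fits in a few lines; the four training messages and the "spam"/"ham" labels are made up for illustration.

```python
import math
from collections import Counter

# Tiny labeled training set, invented for this sketch.
train = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting schedule for monday", "ham"),
    ("please review the project report", "ham"),
]

# Training: count words per label and how often each label occurs.
word_counts = {"spam": Counter(), "ham": Counter()}
label_counts = Counter()
for text, label in train:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocabulary = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Return the label with the highest log-probability under Naive Bayes."""
    scores = {}
    for label in word_counts:
        total = sum(word_counts[label].values())
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(label_counts[label] / len(train))
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("claim your free prize"))
print(predict("project meeting on monday"))
```

With so little data the model only reflects word overlap with the training messages; the same train-then-predict pattern applies to SVMs, random forests, and neural models trained on realistic corpora.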

6. Model Deployment and Inference
Deploying the trained model and using it to make predictions or
extract insights from new text data.
o Text Classification: Categorizing text into predefined
classes (e.g., spam detection, sentiment analysis).
o Named Entity Recognition (NER): Identifying and
classifying entities in the text.
o Machine Translation: Translating text from one
language to another.
o Question Answering: Providing answers to questions
based on the context provided by text data.

7. Evaluation and Optimization
Evaluating the performance of the NLP model using metrics
such as accuracy, precision, recall, F1-score, and others.
o Hyperparameter Tuning: Adjusting model hyperparameters
(settings chosen before training) to improve performance.
o Error Analysis: Analyzing errors to understand model
weaknesses and improve robustness.
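
The metrics named above follow directly from counts of true positives, false positives, and false negatives. The sketch below assumes a binary task with "spam" as the positive class; the label lists are made-up illustrative data, not results from any real model.

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Illustrative gold labels and predictions (invented for this sketch).
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Here accuracy is 6/8 while precision and recall are both 2/3, which shows why accuracy alone can look better than the model's performance on the positive class.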

8. Iteration and Improvement
Continuously improving the model by incorporating new data,
refining preprocessing techniques, experimenting with different
models, and optimizing features.

Technologies related to Natural Language Processing
There are a variety of technologies related to natural language
processing (NLP) that are used to analyze and understand
human language. Some of the most common include:
1. Machine learning: NLP relies heavily on machine
learning techniques such as supervised and
unsupervised learning, deep learning, and reinforcement
learning to train models to understand and generate
human language.
2. Natural Language Toolkit (NLTK) and other
libraries: NLTK is a popular open-source library in Python
that provides tools for NLP tasks such as tokenization,
stemming, and part-of-speech tagging. Other popular
libraries include spaCy, OpenNLP, and CoreNLP.
3. Parsers: Parsers are used to analyze the syntactic
structure of sentences, for example through dependency
parsing and constituency parsing.
4. Text-to-Speech (TTS) and Speech-to-Text (STT)
systems: TTS systems convert written text into spoken
words, while STT systems convert spoken words into
written text.
5. Named Entity Recognition (NER) systems: NER
systems identify and extract named entities such as
people, places, and organizations from the text.
6. Sentiment Analysis: A technique to understand the
emotions or opinions expressed in a piece of text, using
lexicon-based, machine learning-based, or deep
learning-based methods.
7. Machine Translation: NLP is used to translate text
automatically from one language to another.
8. Chatbots: NLP is used for chatbots that communicate
with other chatbots or humans through auditory or
textual methods.
9. AI Software: NLP is used in question-answering
software for knowledge representation, analytical
reasoning, and information retrieval.
