
Disruptive Technologies AI Lecture 3

Uploaded by

Sagnik Bachhar

Disruptive Technologies

LECTURE 3

Artificial Intelligence and Machine Learning

Dr. Tamal Ghosh, Department of Computer Science & Engineering, Adamas University
Natural Language Processing (NLP)
Natural Language Processing (NLP) is a branch of artificial intelligence (AI) that focuses on the interaction between computers and humans through natural language.

The goal of NLP is to enable machines to understand, interpret, and generate human language in a way that is both meaningful and contextually relevant.

NLP involves a range of tasks and challenges, aiming to bridge the gap between human communication and computer understanding.
Challenges of NLP

Ambiguity: Natural language is often ambiguous, and words or phrases can have multiple meanings.

Context Understanding: The meaning of a word or phrase often depends on the surrounding text and situation, so systems must take context into account.

Negation and Irony: NLP systems need to identify and understand negation and expressions of irony or sarcasm in text.

Multilingualism: NLP faces challenges in handling multiple languages and language variations.
Key Components and Aspects of NLP

1. Text Understanding:
NLP systems are designed to comprehend and extract meaning
from textual data. This includes tasks such as part-of-speech
tagging, named entity recognition, and syntactic parsing.

2. Machine Translation:
NLP plays a crucial role in machine translation systems, allowing
computers to automatically translate text or speech from one
language to another. A prominent example is Google Translate.

3. Sentiment Analysis:
Sentiment analysis, or opinion mining, involves determining the
sentiment expressed in a piece of text. This is often used to analyze
social media content, reviews, and customer feedback.

4. Speech Recognition:
NLP enables machines to convert spoken language into written
text. Speech recognition technology is used in applications like
virtual assistants and voice-controlled systems.

5. Question Answering:
NLP systems aim to understand user queries and provide relevant
answers. This is seen in applications like search engines and virtual
assistants.

6. Text Generation:
NLP can be applied to generate human-like text. This is utilized in
chatbots, content creation, and even in the generation of news articles.

7. Named Entity Recognition (NER):
NER involves identifying entities, such as names of people, organizations, locations, dates, and other specific information, in a given text.

8. Coreference Resolution:
Resolving coreferences is the task of determining which words or
phrases in a text refer to the same entities. This is important for
understanding the context of the text.

9. Language Models:
Language models, often based on deep learning techniques, are
central to NLP. These models learn to predict the probability of the
next word in a sequence, capturing syntactic and semantic patterns.
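To make the idea of predicting the next word concrete, here is a minimal sketch of a count-based bigram model. It is far simpler than the deep learning models mentioned above, and the toy corpus and the function names (train_bigram_model, next_word_probabilities) are illustrative assumptions rather than part of the lecture.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            counts[current_word][next_word] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn the raw follow-up counts for `word` into probabilities."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the word was never seen with a following word
    return {nxt: n / total for nxt, n in followers.items()}


# Toy corpus, made up purely for illustration.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigram_model(corpus)
probabilities = next_word_probabilities(model, "cat")
print(probabilities)  # {'sat': 0.5, 'ate': 0.5}
```

In this toy corpus "cat" is followed once by "sat" and once by "ate", so each gets probability 0.5. Neural language models estimate the same kind of distribution, but condition on much longer contexts.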
Text Preprocessing and Tokenization
Text preprocessing is a crucial step in natural language processing
(NLP) that involves cleaning and transforming raw text data into a
format suitable for analysis. Tokenization is a specific preprocessing
technique that breaks down text into individual units called tokens.
1. Text Preprocessing:
Text preprocessing includes several tasks to clean and prepare raw
text data for analysis. Common preprocessing steps include:
Lowercasing:
Convert all text to lowercase to ensure consistency in the analysis.
This helps in treating words in a case-insensitive manner.
Removing Punctuation:
Eliminate punctuation marks from the text. This step simplifies the
analysis and ensures that punctuation does not interfere with the
identification of words.
Removing Numbers:
Exclude numerical digits from the text. In many cases, numerical
values may not contribute significantly to certain types of analyses.
Removing Stopwords:
Stopwords are common words (e.g., "and," "the," "is") that often do not
carry significant meaning in certain contexts. Removing them can
reduce noise in the analysis.
Stemming and Lemmatization:
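Stemming strips suffixes with simple rules to reach a root form, which may not be a real word (e.g., "studies" becomes "studi"), while lemmatization uses vocabulary and morphology to return the dictionary form (e.g., "studies" becomes "study"). Both techniques help in grouping related word forms.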
Handling Special Characters:
Address special characters or symbols in the text. This may involve
removing or replacing specific characters based on the context.
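The preprocessing steps listed above can be chained into a small pipeline. The following is a minimal sketch using only the Python standard library. The stopword list and the simple_stem helper are deliberately tiny, hypothetical stand-ins for what a library such as NLTK would provide.

```python
import re
import string

# A tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "in", "of", "to"}


def simple_stem(word):
    """Very crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Apply the preprocessing steps in order and return a list of stems."""
    text = text.lower()  # lowercasing
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation
    text = re.sub(r"\d+", "", text)  # numbers
    text = re.sub(r"[^a-z\s]", "", text)  # remaining special characters
    words = [w for w in text.split() if w not in STOPWORDS]  # stopwords
    return [simple_stem(w) for w in words]  # stemming


cleaned = preprocess("The 3 cats are playing in the garden!")
print(cleaned)  # ['cat', 'play', 'garden']
```

The order of the steps matters: lowercasing comes first so that the stopword lookup and the character filter only need to handle lowercase letters. Note that the final character filter also drops non-English letters, which is an assumption suited to English-only text.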
2. Tokenization:
Tokenization is the process of breaking down text into smaller units
called tokens, which are usually words or subwords. Tokens serve as
the basic building blocks for further analysis. Common tokenization
techniques include:
Word Tokenization:
Dividing text into individual words. Each word becomes a separate
token.
Example:
Input: "Text preprocessing is important."
Output: ["Text", "preprocessing", "is", "important", "."]
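One way to obtain this output is a single regular expression, sketched below. The function name simple_word_tokenize is an assumption made for illustration. Library tokenizers handle many more cases, such as contractions and abbreviations.

```python
import re


def simple_word_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches one character that is neither of those nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)


tokens = simple_word_tokenize("Text preprocessing is important.")
print(tokens)  # ['Text', 'preprocessing', 'is', 'important', '.']
```

A limitation of this sketch is that it splits contractions at the apostrophe, so "It's" becomes three tokens.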
Sentence Tokenization:
Dividing text into individual sentences. Each sentence becomes a
separate token.
Example:
Input: "NLP involves various tasks. Tokenization is one of them."
Output: ["NLP involves various tasks.", "Tokenization is one of them."]
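A rough sketch of sentence splitting is shown below. It assumes that a sentence ends with ".", "!" or "?" followed by whitespace. The function name simple_sentence_tokenize is hypothetical.

```python
import re


def simple_sentence_tokenize(text):
    """Split text after '.', '!' or '?' when whitespace follows."""
    # The lookbehind keeps the end-of-sentence mark attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


sentences = simple_sentence_tokenize(
    "NLP involves various tasks. Tokenization is one of them."
)
print(sentences)
# ['NLP involves various tasks.', 'Tokenization is one of them.']
```

This rule breaks on abbreviations such as "Dr. Ghosh", which is one reason practical sentence tokenizers rely on trained models or abbreviation lists.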
Subword Tokenization:
Breaking down words into smaller units, such as subword pieces. This
technique is useful for handling unknown words and improving the
flexibility of models.
Example:
Input: "Tokenization"
Output: ["To", "ken", "iza", "tion"]
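The sketch below shows one possible way to produce such a split: greedy longest-match segmentation against a fixed vocabulary. The vocabulary here is hypothetical and hand-picked to reproduce the example. Practical subword tokenizers learn their vocabulary from data and often mark word-internal pieces with a special prefix.

```python
def subword_tokenize(word, vocabulary, unknown_token="[UNK]"):
    """Greedy longest-match-first segmentation of a single word."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if candidate in vocabulary:
                match = candidate
                break
            end -= 1
        if match is None:
            # No vocabulary piece fits here: mark the whole word as unknown.
            return [unknown_token]
        tokens.append(match)
        start = end
    return tokens


# Hypothetical vocabulary chosen so the output matches the example above.
vocabulary = {"To", "ken", "iza", "tion"}
pieces = subword_tokenize("Tokenization", vocabulary)
print(pieces)  # ['To', 'ken', 'iza', 'tion']
```

Because every word is built from known pieces, a model with such a vocabulary can represent words it has never seen as a whole, which is the flexibility the text refers to.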
Basic Sentiment Analysis
Sentiment analysis is a natural language processing (NLP) task that
involves determining the sentiment expressed in a piece of text,
typically as positive, negative, or neutral.

Basic sentiment analysis often employs rule-based or machine-learning approaches to categorize text based on its emotional tone.

Below is a simple example of sentiment analysis using Python and the Natural Language Toolkit (NLTK) library.

The polarity_scores method returns a dictionary of sentiment scores, including a compound score that represents the overall sentiment. The function analyze_sentiment classifies the sentiment as positive (compound score of 0.05 or higher), negative (-0.05 or lower), or neutral (anything in between).
Python sentiment analyzer

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# Download NLTK resources (run this once)
nltk.download('vader_lexicon')

def analyze_sentiment(text):
    # Initialize the SentimentIntensityAnalyzer
    sia = SentimentIntensityAnalyzer()
    # Get sentiment scores
    sentiment_scores = sia.polarity_scores(text)
    # Classify sentiment based on the compound score
    if sentiment_scores['compound'] >= 0.05:
        return 'Positive'
    elif sentiment_scores['compound'] <= -0.05:
        return 'Negative'
    else:
        return 'Neutral'

text_to_analyze = "I love this product! It's amazing."
# Analyze sentiment
result = analyze_sentiment(text_to_analyze)
# The f prefix is required so that {result} is replaced by its value
print(f"Sentiment: {result}")
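For contrast with the NLTK example, the following is a minimal sketch of a purely rule-based approach that needs no external library. The word lists and the function name rule_based_sentiment are illustrative assumptions. This is not how NLTK computes its compound score.

```python
import re

# Tiny hand-made lexicons (illustrative only).
POSITIVE_WORDS = {"love", "amazing", "great", "good", "excellent"}
NEGATIVE_WORDS = {"hate", "terrible", "bad", "awful", "poor"}
NEGATIONS = {"not", "no", "never"}


def rule_based_sentiment(text):
    """Count positive minus negative words, flipping a word after a negation."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for index, word in enumerate(words):
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (word in POSITIVE_WORDS) - (word in NEGATIVE_WORDS)
        # Flip the polarity when the previous word is a negation ("not good").
        if index > 0 and words[index - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


label = rule_based_sentiment("I love this product! It's amazing.")
print(f"Sentiment: {label}")  # Sentiment: Positive
```

The negation rule addresses, in a very limited way, the negation challenge listed earlier: "This is not good" is labelled Negative even though "good" is a positive word. Irony and sarcasm remain out of reach for rules this simple.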
Applications of NLP in Real-World Scenarios

1. Virtual Assistants and Chatbots:

NLP powers virtual assistants like Siri, Alexa, and Google Assistant,
as well as chatbots on websites and messaging platforms. These
applications understand user queries, provide information, and
perform tasks through natural language interaction.

2. Search Engines:

NLP algorithms enhance the accuracy and relevance of search engine results. Understanding user queries and providing contextually relevant information improves the overall search experience.

3. Sentiment Analysis in Social Media:

NLP is used to analyze sentiment in social media posts, reviews, and comments. This application helps businesses understand customer opinions, assess brand sentiment, and respond to feedback.

4. Email Filtering and Categorization:

NLP is employed in email systems to filter spam, categorize emails, and prioritize messages. Understanding the context and intent of emails contributes to effective inbox management.

5. Language Translation:

NLP plays a crucial role in machine translation systems, facilitating the automatic translation of text from one language to another. Google Translate is a notable example.

6. Text Summarization:

NLP is used to automatically generate concise and coherent summaries of long pieces of text. This is beneficial for quickly understanding the main points of articles, documents, or news stories.

7. Named Entity Recognition (NER) in Finance:

In the finance industry, NLP is applied to extract and categorize named entities (e.g., companies, people, locations) from news articles, reports, and financial documents for analysis and decision-making.

8. Medical Record Analysis:

NLP aids in extracting information from medical records, enabling healthcare professionals to analyze patient histories, identify patterns, and make informed decisions.

9. Legal Document Analysis:

NLP is used to process and analyze legal documents, contracts, and court rulings. This helps legal professionals search for relevant information, perform due diligence, and manage large volumes of legal texts.

10. Customer Support and Ticketing Systems:

NLP is employed in customer support systems to understand and respond to customer queries. It is also used in ticketing systems to categorize and prioritize support tickets based on their content.

11. News and Content Recommendation:

NLP algorithms analyze user preferences and behaviors to recommend personalized news articles, videos, or other content. This enhances user engagement on platforms like news websites or streaming services.

12. Fraud Detection in Finance:

In the finance sector, NLP assists in fraud detection by analyzing text data, including transaction descriptions and communication logs, to identify potentially fraudulent activities.
