Disruptive Technologies AI Lecture 3
LECTURE 3
1. Text Understanding:
NLP systems are designed to comprehend and extract meaning
from textual data. This includes tasks such as part-of-speech
tagging, named entity recognition, and syntactic parsing.
2. Machine Translation:
NLP plays a crucial role in machine translation systems, allowing
computers to automatically translate text or speech from one
language to another. A prominent example is Google Translate.
3. Sentiment Analysis:
Sentiment analysis, or opinion mining, involves determining the
sentiment expressed in a piece of text. This is often used to analyze
social media content, reviews, and customer feedback.
4. Speech Recognition:
NLP enables machines to convert spoken language into written
text. Speech recognition technology is used in applications like
virtual assistants and voice-controlled systems.
5. Question Answering:
NLP systems aim to understand user queries and provide relevant
answers. This is seen in applications like search engines and virtual
assistants.
6. Text Generation:
NLP can be applied to generate human-like text. This is utilized in
chatbots, content creation, and even in the generation of news articles.
9. Language Models:
Language models, often based on deep learning techniques, are
central to NLP. These models learn to predict the probability of the
next word in a sequence, capturing syntactic and semantic patterns.
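The next-word idea can be illustrated with a much simpler model than a deep network. The sketch below is a count-based bigram model, not a deep learning model: it estimates the probability of the next word from how often it followed the previous word in a toy corpus. The corpus and the function names are assumptions made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probabilities(counts, prev):
    """Return P(next | prev) as relative frequencies."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the word was never seen with a follower
    return {word: n / total for word, n in followers.items()}

# Toy corpus (hypothetical), far too small for real use.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigram_model(corpus)
probs = next_word_probabilities(model, "the")
print(probs)
```

In this corpus "the" is followed by "cat" in two of six cases, so the model assigns "cat" the highest probability. Neural language models replace the counting step with learned parameters and condition on much longer contexts.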
Text Preprocessing and Tokenization
Text preprocessing is a crucial step in natural language processing
(NLP) that involves cleaning and transforming raw text data into a
format suitable for analysis. Tokenization is a specific preprocessing
technique that breaks down text into individual units called tokens.
1. Text Preprocessing:
Text preprocessing includes several tasks to clean and prepare raw
text data for analysis. Common preprocessing steps include:
Lowercasing:
Convert all text to lowercase so that variants such as "Text" and
"text" are treated as the same word.
Removing Punctuation:
Eliminate punctuation marks from the text. This step simplifies the
analysis and ensures that punctuation does not interfere with the
identification of words.
Removing Numbers:
Exclude numerical digits from the text. Numerical values often
contribute little to certain types of analysis.
Removing Stopwords:
Stopwords are common words (e.g., "and," "the," "is") that often do not
carry significant meaning in certain contexts. Removing them can
reduce noise in the analysis.
Stemming and Lemmatization:
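Stemming strips suffixes with heuristic rules to reach a root form that
may not be a real word (a rule-based stemmer may turn "studies" into
"studi"), while lemmatization maps a word to its dictionary form
("studies" becomes "study"). Both techniques group inflected variants
of the same word.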
Handling Special Characters:
Address special characters or symbols in the text. This may involve
removing or replacing specific characters based on the context.
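The preprocessing steps above can be sketched as one small pipeline using only the Python standard library. This is a minimal illustration under stated assumptions: the stopword list is a tiny hypothetical sample, `string.punctuation` covers ASCII punctuation only, and `naive_stem` is a crude suffix stripper written for this example, not a real stemming algorithm such as Porter's.

```python
import re
import string

# Tiny illustrative stopword list (an assumption); real lists are much longer.
STOPWORDS = {"and", "the", "is", "a", "an", "of", "to", "in"}

def preprocess(text, stopwords=STOPWORDS):
    """Lowercase, strip ASCII punctuation and digits, drop stopwords."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", "", text)
    return [word for word in text.split() if word not in stopwords]

def naive_stem(word):
    """Crude suffix stripping, for illustration only."""
    for suffix in ("ing", "ly", "es", "s"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = preprocess("The 3 cats and the dog are running, quickly!")
stems = [naive_stem(word) for word in words]
print(words)  # ['cats', 'dog', 'are', 'running', 'quickly']
print(stems)  # ['cat', 'dog', 'are', 'runn', 'quick']
```

The stem "runn" shows why stemming output is not always a dictionary word. A lemmatizer would need a vocabulary to return "run" instead.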
2. Tokenization:
Tokenization is the process of breaking down text into smaller units
called tokens, which are usually words or subwords. Tokens serve as
the basic building blocks for further analysis. Common tokenization
techniques include:
Word Tokenization:
Dividing text into individual words. Each word becomes a separate
token, and punctuation marks are typically kept as tokens of their own.
Example:
Input: "Text preprocessing is important."
Output: ["Text", "preprocessing", "is", "important", "."]
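One way to get this output is a single regular expression, sketched below. The function name `word_tokenize` is chosen for this example; the pattern is an assumption about what counts as a word and does not handle cases such as contractions or hyphenated words.

```python
import re

def word_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches one character that is neither of those nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = word_tokenize("Text preprocessing is important.")
print(tokens)  # ['Text', 'preprocessing', 'is', 'important', '.']
```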
Sentence Tokenization:
Dividing text into individual sentences. Each sentence becomes a
separate token.
Example:
Input: "NLP involves various tasks. Tokenization is one of them."
Output: ["NLP involves various tasks.", "Tokenization is one of them."]
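A simple rule reproduces this split: break at whitespace that follows a period, exclamation mark, or question mark. The sketch below assumes that rule is good enough. It will split wrongly after abbreviations such as "Dr.", which is why practical sentence tokenizers use more elaborate rules or trained models.

```python
import re

def sentence_tokenize(text):
    """Split text at whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

sentences = sentence_tokenize(
    "NLP involves various tasks. Tokenization is one of them."
)
print(sentences)
# ['NLP involves various tasks.', 'Tokenization is one of them.']
```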
Subword Tokenization:
Breaking down words into smaller units, such as subword pieces. This
technique is useful for handling unknown words and improving the
flexibility of models.
Example (the actual split depends on the tokenizer's vocabulary):
Input: "Tokenization"
Output: ["To", "ken", "iza", "tion"]
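A split like this can be produced by greedy longest-match segmentation against a fixed subword vocabulary, sketched below. The vocabulary here is hand-written to reproduce the example and is purely hypothetical. Real subword tokenizers learn their vocabulary from data, and this sketch omits details such as continuation markers.

```python
def subword_tokenize(word, vocab, unknown="[UNK]"):
    """Greedily take the longest vocabulary entry at each position."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No piece matches at this position: give up on the word.
            return [unknown]
        tokens.append(word[start:end])
        start = end
    return tokens

# Hypothetical vocabulary chosen to match the example above.
VOCAB = {"To", "ken", "iza", "tion", "s"}

print(subword_tokenize("Tokenization", VOCAB))  # ['To', 'ken', 'iza', 'tion']
print(subword_tokenize("Tokens", VOCAB))        # ['To', 'ken', 's']
print(subword_tokenize("xyz", VOCAB))           # ['[UNK]']
```

The second call shows the benefit for unknown words: "Tokens" is not in the vocabulary as a whole, yet it is still covered by known pieces.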
Basic Sentiment Analysis
Sentiment analysis is a natural language processing (NLP) task that
involves determining the sentiment expressed in a piece of text,
typically as positive, negative, or neutral.
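A basic approach is lexicon-based: count words from a positive list and a negative list and compare the totals. The sketch below assumes two tiny hand-picked word lists, which are hypothetical and far smaller than a real sentiment lexicon. It ignores negation, so "not good" would still count as positive.

```python
import re

# Tiny hand-picked lexicons (assumptions for illustration only).
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "poor", "hate", "sad"}

def classify_sentiment(text):
    """Label text as positive, negative, or neutral by word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_sentiment("I love this phone, the camera is great!"))  # positive
print(classify_sentiment("The battery is terrible."))                 # negative
print(classify_sentiment("The package arrived on Tuesday."))          # neutral
```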
NLP powers virtual assistants like Siri, Alexa, and Google Assistant,
as well as chatbots on websites and messaging platforms. These
applications understand user queries, provide information, and
perform tasks through natural language interaction.