NLP PDF
1. Tokenization
Tokenization is the process of splitting text into smaller units, such as words or sentences.
```python
from nltk.tokenize import word_tokenize  # requires the 'punkt' tokenizer data

tokens = word_tokenize(sentence)
```
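To make the idea concrete, here is a minimal regex-based tokenizer, a simplified stand-in for NLTK's `word_tokenize` (which additionally handles contractions and many punctuation conventions):

```python
import re

def simple_word_tokenize(text):
    # Match runs of word characters, or single punctuation marks.
    # A rough approximation of what a real tokenizer does.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_word_tokenize("NLP is fun, isn't it?")
```

Note that punctuation becomes its own token, which is usually what downstream steps expect.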
2. Stop Words
Stop words are common words (e.g., "the", "is", "and") that are usually removed from text because they carry little meaning on their own.
```python
from nltk.corpus import stopwords  # requires the 'stopwords' corpus data

stop_words = set(stopwords.words('english'))
```
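Filtering then becomes a simple membership test over the tokens. A sketch using a small hand-picked stop-word set (NLTK's English list is far larger):

```python
# Tiny illustrative stop-word set; in practice use stopwords.words('english').
stop_words = {"the", "is", "a", "an", "of", "and", "in"}

tokens = ["the", "cat", "sat", "in", "the", "hat"]
filtered = [t for t in tokens if t not in stop_words]
# filtered == ["cat", "sat", "hat"]
```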
3. Stemming and Lemmatization
- Stemming cuts words down to their root form by stripping suffixes, e.g., "running" becomes "run".
- Lemmatization reduces words to their base form using language rules, e.g., "better" to "good".
Example using NLTK:
```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
```
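The difference between the two approaches can be sketched without NLTK. The toy stemmer below strips common suffixes (a crude caricature of the Porter algorithm), while the toy lemmatizer looks words up in a dictionary of known forms; both the suffix list and the lemma table are illustrative assumptions, not NLTK's actual rules:

```python
def toy_stem(word):
    # Crude suffix stripping in the spirit of (but much simpler than) Porter.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# A lemmatizer consults a dictionary of word forms rather than suffix rules.
toy_lemmas = {"better": "good", "ran": "run", "mice": "mouse"}

def toy_lemmatize(word):
    return toy_lemmas.get(word, word)

print(toy_stem("running"))      # "runn" -- stems need not be valid words
print(toy_lemmatize("better"))  # "good" -- lemmas always are
```

This also shows the key trade-off: stemming is fast but can produce non-words, while lemmatization returns real dictionary forms but needs linguistic resources.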
4. Word Embeddings
Word embeddings are dense vector representations of words capturing semantic relationships.
```python
from gensim.models import Word2Vec
# sentences: a list of tokenized sentences (lists of token strings)
model = Word2Vec(sentences, vector_size=100, min_count=1)
vector = model.wv['hello']
```
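"Capturing semantic relationships" means that related words end up close together in the vector space, usually measured by cosine similarity. A sketch with made-up 3-dimensional vectors (real embeddings have hundreds of dimensions and are learned from data):

```python
import math

# Toy hand-written "embeddings" for illustration only.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(vectors["king"], vectors["queen"]))  # close to 1
print(cosine(vectors["king"], vectors["apple"]))  # much lower
```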
5. Bag of Words and TF-IDF
Bag of Words represents text as raw word counts, while TF-IDF weights each word by its term frequency and inverse document frequency.
```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# text must be a list of document strings, not a single string
bow = CountVectorizer().fit_transform(text)
tfidf = TfidfVectorizer().fit_transform(text)
```
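The weighting can also be computed by hand, which makes the intuition visible: a word appearing in every document gets an idf of zero and therefore no weight. A minimal sketch using the classic tf-idf formula tf(w, d) * log(N / df(w)) (scikit-learn uses a smoothed variant, so its numbers differ slightly):

```python
import math
from collections import Counter

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "dog", "barked"]]

# Bag of Words: each document becomes a word-count vector.
bow = [Counter(doc) for doc in docs]

# Document frequency: in how many documents does each word appear?
n_docs = len(docs)
df = Counter(word for doc in docs for word in set(doc))

def tfidf(word, doc):
    tf = doc.count(word) / len(doc)
    idf = math.log(n_docs / df[word])
    return tf * idf

print(tfidf("the", docs[0]))  # 0.0 -- "the" appears in every document
print(tfidf("cat", docs[0]))  # > 0 -- "cat" is distinctive to this document
```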
6. Vector Databases
Vector databases store word embeddings or document vectors for efficient similarity search.
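At its core, a vector database answers nearest-neighbor queries: given a query vector, return the stored vectors most similar to it. The sketch below does this by brute force over a toy in-memory store; production systems (e.g., FAISS, Milvus, Pinecone) add approximate indexes such as HNSW so they can avoid scanning every vector:

```python
import math

# Toy in-memory "vector store": document id -> embedding.
store = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.9, 0.1, 0.0],
    "doc3": [0.0, 0.0, 1.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query, k=2):
    # Brute-force nearest-neighbor search: rank every stored vector
    # by cosine similarity to the query and return the top k ids.
    ranked = sorted(store, key=lambda doc_id: cosine(store[doc_id], query),
                    reverse=True)
    return ranked[:k]

print(search([1.0, 0.05, 0.0]))  # doc1 and doc2 rank above doc3
```

Brute force is exact but O(N) per query; the approximate indexes trade a little recall for large speedups at scale.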