Python
import nltk
import string
import re
Text Lowercase
We lowercase the text to reduce the size of the vocabulary of our
text data.
Python
def text_lowercase(text):
    """Return *text* with every cased character converted to lowercase.

    Lowercasing reduces vocabulary size by mapping e.g. "Hey" and "hey"
    to the same token.
    """
    return text.lower()

input_str = "Hey, did you know that the summer break is coming? Amazing right !! It's only 5 more days !!"
text_lowercase(input_str)
Example:
Input: “Hey, did you know that the summer break is coming?
Amazing right!! It’s only 5 more days!!”
Output: “hey, did you know that the summer break is coming?
amazing right!! it’s only 5 more days!!”
Remove numbers
We can remove digits from the text when they are not relevant to the
analysis, using a regular expression.
Python
# Remove numbers
def remove_numbers(text):
    """Delete every run of ASCII digits from *text*.

    Digits are removed, not replaced, so surrounding whitespace is kept
    (e.g. "are 3 balls" -> "are  balls" with a double space).
    """
    return re.sub(r'\d+', '', text)

input_str = "There are 3 balls in this bag, and 12 in the other one."
remove_numbers(input_str)
Example:
Input: “There are 3 balls in this bag, and 12 in the other one.”
Output: ‘There are balls in this bag, and in the other one.’
We can also convert the numbers into words. This can be done by
using the inflect library.
Python
# import the inflect library (third-party; converts numbers to English words)
import inflect

p = inflect.engine()

# convert number into words
def convert_number(text):
    """Replace every purely-numeric whitespace-delimited token in *text*
    with its English spelling (e.g. "12" -> "twelve").

    Tokens with attached punctuation (e.g. "12.") are NOT all-digit and
    are left unchanged.
    """
    converted = []
    for token in text.split():
        # spell out all-digit tokens; keep everything else as-is
        if token.isdigit():
            converted.append(p.number_to_words(token))
        else:
            converted.append(token)
    # rejoin with single spaces (original spacing is not preserved)
    return ' '.join(converted)

input_str = 'There are 3 balls in this bag, and 12 in the other one.'
convert_number(input_str)
Example:
Input: “There are 3 balls in this bag, and 12 in the other one.”
Output: “There are three balls in this bag, and twelve in the other
one.”
Python
# remove punctuation
def remove_punctuation(text):
    """Delete every ASCII punctuation character from *text*.

    Uses a translation table built from ``string.punctuation``; one
    C-level pass instead of chained ``.replace()`` calls. Note this only
    covers ASCII punctuation, not Unicode quotes/dashes.
    """
    translator = str.maketrans('', '', string.punctuation)
    return text.translate(translator)

input_str = "Hey, did you know that the summer break is coming? Amazing right !! It's only 5 more days !!"
remove_punctuation(input_str)
Example:
Input: “Hey, did you know that the summer break is coming?
Amazing right!! It’s only 5 more days!!”
Output: “Hey did you know that the summer break is coming
Amazing right Its only 5 more days”
Remove whitespace
We can use the join and split function to remove all the white spaces
in a string.
Python
# remove whitespace from text
def remove_whitespace(text):
    """Collapse all runs of whitespace in *text* to single spaces and
    strip leading/trailing whitespace.

    ``str.split()`` with no argument splits on any whitespace run and
    discards empty strings, so joining with one space normalizes spacing.
    """
    return " ".join(text.split())

input_str = "we don't need the given questions"
remove_whitespace(input_str)
Example:
Input: " we don't need the given questions"
Output: "we don't need the given questions"
Remove default stopwords
Example:
Python
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# remove stopwords function
def remove_stopwords(text):
    """Tokenize *text* and return the tokens that are not English stopwords.

    The membership test is case-sensitive: NLTK's stopword list is
    lowercase, so capitalized words like "This" are kept (as the example
    output below shows). Returns a list of tokens, not a string.
    """
    # build the set once per call for O(1) membership tests
    stop_words = set(stopwords.words("english"))
    word_tokens = word_tokenize(text)
    return [word for word in word_tokens if word not in stop_words]

example_text = "This is a sample sentence and we are going to remove the stopwords from this."
remove_stopwords(example_text)
Example:
Input: “This is a sample sentence and we are going to remove the
stopwords from this”
Output: [‘This’, ‘sample’, ‘sentence’, ‘going’, ‘remove’, ‘stopwords’]
Stemming
Stemming strips affixes to reduce a word to its root form (stem). The
stem is produced by heuristic suffix rules and may not be a valid word,
as the examples below show:
books ---> book
looked ---> look
denied ---> deni
flies ---> fli
Python
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import word_tokenize

stemmer = PorterStemmer()

# stem words in the list of tokenized words
def stem_words(text):
    """Tokenize *text* and return the Porter stem of each token.

    Porter stemming strips suffixes heuristically, so stems need not be
    valid words (e.g. "science" -> "scienc"). Returns a list of stems.
    """
    word_tokens = word_tokenize(text)
    return [stemmer.stem(word) for word in word_tokens]

text = 'data science uses scientific methods algorithms and many types of processes'
stem_words(text)
Example:
Input: ‘data science uses scientific methods algorithms and many
types of processes’
Output: [‘data’, ‘scienc’, ‘use’, ‘scientif’, ‘method’, ‘algorithm’, ‘and’,
‘mani’, ‘type’, ‘of’, ‘process’]
Lemmatization
Lemmatization reduces a word to its dictionary base form (lemma) using
a vocabulary and morphological analysis, so unlike stemming the output
is always a valid word.
Python
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

lemmatizer = WordNetLemmatizer()

def lemma_words(text):
    """Tokenize *text* and return the WordNet lemma of each token.

    No part-of-speech tag is passed, so NLTK's default (noun) is used;
    verbs like "uses" are treated as noun plurals here. Returns a list
    of lemmas.
    """
    word_tokens = word_tokenize(text)
    return [lemmatizer.lemmatize(word) for word in word_tokens]

input_str = "data science uses scientific methods algorithms and many types of processes"
lemma_words(input_str)
Example:
Input: ‘data science uses scientific methods algorithms and many
types of processes’
Output: [‘data’, ‘science’, ‘use’, ‘scientific’, ‘methods’, ‘algorithms’,
‘and’, ‘many’, ‘type’, ‘of’, ‘process’]