NLP-pyth

Uploaded by

oaboalwafa75

NLP in Python

Dr. Loai Alnemer

1. Regular Expressions

a. Example 1: Match an Email Address

import re

pattern = r'\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b'

text = "Contact us at support@example.com"  # placeholder address; the original was obscured in the source

emails = re.findall(pattern, text)

print(emails)
# Output: ['support@example.com']

- `\b`: Word boundary, so the match starts and ends at the edge of a word rather than inside one.

- `[a-zA-Z0-9._%+-]+`: Matches the username part: one or more letters, digits, or the
characters `.`, `_`, `%`, `+`, and `-`.

- `@`: The "@" symbol, required in every email.

- `[a-zA-Z0-9.-]+`: Matches the domain name, which can include letters, digits, dots, and
hyphens.

- `\.[a-zA-Z]{2,}`: Matches the top-level domain (e.g., `.com`, `.org`), requiring at least two
letters.
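
As a small extension of the breakdown above, the same character classes can be wrapped in named groups so that the username and the domain come back separately. This is only a sketch: the group names `user` and `domain` and the sample address are assumptions made for illustration and do not appear in the handout.

```python
import re

# Same character classes as the pattern above, split into two named groups.
# The group names and the sample address are illustrative assumptions.
email_re = re.compile(
    r'\b(?P<user>[a-zA-Z0-9._%+-]+)@(?P<domain>[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})\b'
)

sample = "Contact us at support@example.com"
m = email_re.search(sample)
if m:
    user = m.group("user")      # part before the "@"
    domain = m.group("domain")  # part after the "@", including the top-level domain
    print(user, domain)
```

Named groups keep the pattern readable when only one part of the address is needed later, for example to count addresses per domain.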

b. Example 2: Match a Phone Number

pattern = r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b'

text = "Call me at 123-456-7890 or 987.654.3210"

phones = re.findall(pattern, text)

print(phones)
# Output: ['123-456-7890', '987.654.3210']

- `\b`: Word boundary.

- `\d{3}`: Matches three digits (area code).

- `[-.\s]?`: Matches an optional separator (dash, dot, or space).

- `\d{3}`: Matches the next three digits (exchange).

- `[-.\s]?`: Matches another optional separator.

- `\d{4}`: Matches the last four digits (subscriber number).
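
Because both separators are optional, the same pattern also accepts numbers written with spaces or with no separator at all. The sketch below adds capturing groups around the three digit runs so the matches can be rewritten in one format; the extra sample numbers and the dash-separated output format are assumptions for illustration.

```python
import re

# Same pattern as above, with a capturing group around each run of digits.
phone_re = re.compile(r'\b(\d{3})[-.\s]?(\d{3})[-.\s]?(\d{4})\b')

# Sample numbers are illustrative assumptions.
samples = "123-456-7890, 987.654.3210, 555 123 4567, 5551234567"

# With groups in the pattern, findall returns (area, exchange, subscriber) tuples.
normalized = ["-".join(parts) for parts in phone_re.findall(samples)]
print(normalized)
# Output: ['123-456-7890', '987-654-3210', '555-123-4567', '555-123-4567']
```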

c. Example 3: Match a Date (MM/DD/YYYY)

pattern = r'\b\d{2}/\d{2}/\d{4}\b'

text = "Today's date is 05/12/2024"

dates = re.findall(pattern, text)

print(dates)
# Output: ['05/12/2024']

- `\b`: Word boundary.

- `\d{2}`: Matches exactly two digits (month).

- `/`: The separator.

- `\d{2}`: Matches two digits (day).

- `/`: The separator.

- `\d{4}`: Matches four digits (year).
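
The pattern checks only the shape of the string, so something like `13/45/2024` would also match. One possible way to filter out impossible dates is to pass each candidate through `datetime.strptime`; the helper name `valid_dates` and the sample sentence are assumptions for this sketch.

```python
import re
from datetime import datetime

date_re = re.compile(r'\b\d{2}/\d{2}/\d{4}\b')

def valid_dates(text):
    """Return the MM/DD/YYYY strings in text that are real calendar dates."""
    found = []
    for candidate in date_re.findall(text):
        try:
            datetime.strptime(candidate, "%m/%d/%Y")
        except ValueError:
            continue  # right shape, but not a real date (e.g. month 13)
        found.append(candidate)
    return found

checked = valid_dates("Valid: 05/12/2024. Invalid: 13/45/2024.")
print(checked)
# Output: ['05/12/2024']
```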

d. Search for the First Email Match

import re

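text = "My email is first@example.com and my alternative email is second@example.com"  # placeholder addresses; the originals were obscured in the source

pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

match = re.search(pattern, text)

if match:
    print(f"Found email: {match.group()}")
# Output: Found email: first@example.com

`re.search` scans the whole string and returns only the first match; the second address is ignored.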
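
To contrast this with `re.findall` from the earlier examples, the sketch below runs both on the same string. The sample text and addresses are placeholders assumed for illustration.

```python
import re

# Placeholder text and addresses, assumed for illustration.
text = "Primary: first@example.com, backup: second@example.com"
pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

first = re.search(pattern, text)      # a match object for the first hit, or None
all_found = re.findall(pattern, text) # a list of every non-overlapping hit

print(first.group())   # first@example.com
print(all_found)       # ['first@example.com', 'second@example.com']

# When nothing matches, search returns None, which is why the "if match:" guard is needed.
missing = re.search(pattern, "no address here")
print(missing)         # None
```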

e. match function

import re

# re.match succeeds only if the pattern matches at the start of the string.
text = "Hello World!"
pattern = r'Hello'
match = re.match(pattern, text)
if match:
    print("Match found at the beginning!")
else:
    print("No match at the beginning.")
# Output: Match found at the beginning!

text = "123 apples"
pattern = r'\d+'
match = re.match(pattern, text)
if match:
    print("Match found:", match.group())
else:
    print("No match.")
# Output: Match found: 123

# ^ and $ anchor the pattern to the start and end of the string.
text = "abc123"
pattern = r'^[a-zA-Z]+\d+$'
match = re.match(pattern, text)
if match:
    print("String starts with letters and ends with digits!")
else:
    print("No match.")
# Output: String starts with letters and ends with digits!
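
The difference between `re.match`, `re.search`, and `re.fullmatch` is easiest to see on a string where the digits are not at the start. The sample strings below are assumptions chosen for this sketch.

```python
import re

text = "apples 123"  # illustrative sample: the digits are not at the start

at_start = re.match(r'\d+', text)    # match looks only at the beginning
anywhere = re.search(r'\d+', text)   # search scans the whole string

print(at_start)          # None
print(anywhere.group())  # 123

# fullmatch requires the whole string to match, so no ^ or $ is needed.
whole_ok = re.fullmatch(r'[a-zA-Z]+\d+', "abc123") is not None
whole_bad = re.fullmatch(r'[a-zA-Z]+\d+', "abc123!") is not None
print(whole_ok, whole_bad)  # True False
```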

f. sub function

text = "Please contact us at support@example.com."  # placeholder address; the original was obscured in the source

pattern = r'\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b'
replacement = "REDACTED"
new_text = re.sub(pattern, replacement, text)
print(new_text)
# Output: Please contact us at REDACTED.
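
The replacement string can also refer back to groups captured by the pattern. As a sketch, the version below hides only the username and keeps the domain, then uses `re.subn` to report how many replacements were made. The sample addresses and the `***` mask are assumptions for illustration.

```python
import re

# Same email pattern, with the username and domain in capturing groups.
pattern = r'\b([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})\b'
text = "Write to alice@example.com or bob@example.org."  # placeholder addresses

# \2 refers to the second group (the domain), so only the username is hidden.
masked = re.sub(pattern, r'***@\2', text)
print(masked)
# Output: Write to ***@example.com or ***@example.org.

# re.subn returns the new string together with the number of replacements.
redacted, count = re.subn(pattern, "REDACTED", text)
print(redacted, count)
# Output: Write to REDACTED or REDACTED. 2
```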

g. split function

# Illustrative sample text (assumed): split on commas, semicolons, or runs of whitespace
text = "apples, oranges;bananas  grapes"

pattern = r'[,;\s]+'
parts = re.split(pattern, text)
print(parts)
# Output: ['apples', 'oranges', 'bananas', 'grapes']
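
Two options of `re.split` are worth a short sketch: `maxsplit` limits the number of splits, and a capturing group in the pattern keeps the separators in the result. The log-style line and the token string below are assumptions for illustration.

```python
import re

# Illustrative log-style line (assumed).
line = "2024-05-12 10:30:00 INFO  server started"

# Split on runs of whitespace, at most 3 times, so the message stays in one piece.
fields = re.split(r'\s+', line, maxsplit=3)
print(fields)
# Output: ['2024-05-12', '10:30:00', 'INFO', 'server started']

# With a capturing group, the separators are kept in the result list.
tokens = re.split(r'([,;])', "a,b;c")
print(tokens)
# Output: ['a', ',', 'b', ';', 'c']
```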

h. finditer function

text = "The rain in Spain stays mainly in the plain."

# 'ain' at the end of a word; a leading \b would prevent any match inside "rain"
pattern = r'ain\b'
matches = re.finditer(pattern, text)
for match in matches:
    print(f"Match found at position {match.start()}: '{match.group()}'")
# Output:
# Match found at position 5: 'ain'
# Match found at position 14: 'ain'
# Match found at position 40: 'ain'
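
Each item produced by `re.finditer` is a match object, so the start and end positions are available along with the matched text. As a sketch, the variant below uses the pattern `\w*ain\b` (an assumption, not the handout's pattern) to capture the whole word that ends in "ain".

```python
import re

text = "The rain in Spain stays mainly in the plain."

# \w*ain\b matches whole words ending in "ain"; "mainly" is skipped
# because its "ain" is followed by another word character.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(r'\w*ain\b', text)]
for start, end, word in spans:
    print(f"{word!r} spans positions {start} to {end}")
# Output:
# 'rain' spans positions 4 to 8
# 'Spain' spans positions 12 to 17
# 'plain' spans positions 38 to 43
```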

2. Stemming
a. Porter Stemmer

from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
words = ["running", "ran", "easily", "fairly"]
stems = [stemmer.stem(word) for word in words]
print(stems)
# Output: ['run', 'ran', 'easili', 'fairli']

b. Lancaster Stemmer
from nltk.stem import LancasterStemmer
# Initialize the Lancaster Stemmer
lancaster_stemmer = LancasterStemmer()
# Example words to stem
words = ["running", "ran", "easily", "fairly", "happiness"]
# Apply stemming
lancaster_stems = [lancaster_stemmer.stem(word) for word in words]
print("Lancaster Stemmer:", lancaster_stems)
# Output: Lancaster Stemmer: ['run', 'ran', 'easy', 'fair', 'happy']

c. Snowball stemmer
from nltk.stem import SnowballStemmer
# Initialize the Snowball Stemmer for English
snowball_stemmer = SnowballStemmer("english")
# Example words to stem
words = ["running", "ran", "easily", "fairly", "happiness"]
# Apply stemming
snowball_stems = [snowball_stemmer.stem(word) for word in words]
print("Snowball Stemmer:", snowball_stems)
# Output: Snowball Stemmer: ['run', 'ran', 'easili', 'fair', 'happi']

d. Compare all

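# Combine results for comparison

# Recompute the Porter stems on the same five-word list used by the other two stemmers
porter_stems = [stemmer.stem(word) for word in words]

results = {
    "Word": words,
    "Porter": porter_stems,
    "Lancaster": lancaster_stems,
    "Snowball": snowball_stems,
}
# Print the results as a table
print(f"{'Word':<10} | {'Porter':<10} | {'Lancaster':<10} | {'Snowball':<10}")
for i in range(len(words)):
    print(f"{results['Word'][i]:<10} | {results['Porter'][i]:<10} | "
          f"{results['Lancaster'][i]:<10} | {results['Snowball'][i]:<10}")

# Output (header row):
# Word       | Porter     | Lancaster  | Snowball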
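
To make the idea behind these stemmers concrete, here is a toy suffix-stripping function. It is not the Porter, Lancaster, or Snowball algorithm: the suffix list and the minimum stem length of 3 are assumptions chosen only to show why a stem such as "happi" need not be a dictionary word.

```python
# Toy suffix stripper: NOT the Porter, Lancaster, or Snowball algorithm.
# The suffix list and the minimum stem length of 3 are illustrative assumptions.
SUFFIXES = ("ness", "ing", "ly", "s")

def toy_stem(word):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

sample_words = ["running", "ran", "easily", "fairly", "happiness"]
toy_stems = [toy_stem(w) for w in sample_words]
print(toy_stems)
# Output: ['runn', 'ran', 'easi', 'fair', 'happi']
```

The NLTK stemmers apply many more rules than this, which is how they reduce "running" to "run" rather than "runn".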
