
(Assignment 1 & 2) Regular Expression

This document discusses regular expressions (regex) and gives regex patterns for the following tasks:
1. Extract domain names from URLs
2. Extract and standardize dates written in different formats
3. Extract prices from product descriptions, accounting for different currency formats
4. Extract hyperlinks from HTML code
5. Correct common spelling mistakes in text
6. Extract street addresses, accounting for variations in format
7. Identify and extract hexadecimal color codes from CSS

It also covers an algorithm for resolving context-related ambiguities when extracting patterns, and sentiment analysis of text using non-negative matrix factorization.

Uploaded by

TECHNICAL MALIK


2023MSCS229

Assignment 1: Regular Expressions

1. Given a list of URLs, write a regular expression to extract the domain name from
each URL. Provide examples and explain the logic behind your regex.

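(?:https?://)?(?:www\.)?((?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,})

Logic: `(?:https?://)?` skips an optional http:// or https:// scheme, and `(?:www\.)?` skips an optional "www." prefix. Group 1 then captures one or more labels (letters, digits, or hyphens, each followed by a dot) and a final top-level domain of at least two letters. Matching stops at the first character that cannot appear in a hostname, such as "/", ":" or "?", so paths, ports, and query strings are left out. For example, group 1 is "example.com" for https://www.example.com/about and "blog.example.co.uk" for http://blog.example.co.uk/post?id=3.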
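A minimal Python sketch of applying a domain pattern of this kind to a list of URLs. The helper `extract_domain` and the sample URLs are illustrative, and the sketch assumes the domain is the first hostname-like text in each string (so a scheme other than http/https, such as ftp://, is simply skipped over by `re.search`).

```python
import re

# Optional http(s) scheme, optional "www.", then the hostname captured in group 1
DOMAIN_RE = re.compile(r"(?:https?://)?(?:www\.)?((?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,})")


def extract_domain(url):
    """Return the domain name found in url, or None if nothing domain-like is present."""
    match = DOMAIN_RE.search(url)
    return match.group(1) if match else None


urls = [
    "https://www.example.com/about",
    "http://blog.example.co.uk/post?id=3",
    "example.org",
    "ftp://files.example.net:21/pub",
]
for url in urls:
    print(url, "->", extract_domain(url))
```

Subdomains other than "www" are kept on purpose, because deciding which part is the "registered" domain (e.g. example.co.uk) needs a public-suffix list rather than a regex.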

2. You are given a text document containing various dates in different formats (e.g.,
MM/DD/YYYY, DD-MM-YYYY, YYYY/MM/DD). Write a Python script that uses
regular expressions to extract and format all the dates in a consistent manner.

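import re
from dateutil.parser import parse as dateutil_parser  # third-party package: python-dateutil

dates = "Paid 12/25/2023, shipped 2023/12/28, returned 05-01-2024."  # sample input text
# YYYY-MM-DD | DD-MM-YYYY | DD-MM-YY | YYYY/MM/DD | MM/DD/YYYY | MM/DD/YY, as whole tokens
datePattern = (r'\b(?:\d{4}-\d{2}-\d{2}|\d{2}-\d{2}-\d{4}|\d{2}-\d{2}-\d{2}'
               r'|\d{4}/\d{2}/\d{2}|\d{2}/\d{2}/\d{4}|\d{2}/\d{2}/\d{2})\b')
dateMatches = re.findall(datePattern, dates)
consistentDateFormat = "%Y-%m-%d"
for date in dateMatches:
    # Dash-separated dates that start with a 2-digit day (DD-MM-...) are parsed day-first
    dayfirst = '-' in date and not re.match(r'\d{4}', date)
    print(dateutil_parser(date, dayfirst=dayfirst).strftime(consistentDateFormat))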
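python-dateutil is a third-party package and has to guess the field order of each date. As an alternative, here is a standard-library-only sketch (the helper `normalize_dates` is hypothetical) that gives each format its own pattern with named groups, so the day/month order comes from the format instead of a guess, and validates each result with `datetime.date`. It assumes, as in the question, that slash-separated dates are MM/DD/YYYY and dash-separated ones are DD-MM-YYYY.

```python
import re
from datetime import date

# One pattern per format; named groups fix which field is the day and which the month
DATE_PATTERNS = [
    # YYYY-MM-DD or YYYY/MM/DD (same separator twice, enforced by the backreference)
    re.compile(r"\b(?P<y>\d{4})(?P<sep>[-/])(?P<m>\d{2})(?P=sep)(?P<d>\d{2})\b"),
    re.compile(r"\b(?P<m>\d{2})/(?P<d>\d{2})/(?P<y>\d{4})\b"),  # MM/DD/YYYY
    re.compile(r"\b(?P<d>\d{2})-(?P<m>\d{2})-(?P<y>\d{4})\b"),  # DD-MM-YYYY
]


def normalize_dates(text):
    """Return all valid dates in text as YYYY-MM-DD strings, in order of appearance."""
    found = []
    for pattern in DATE_PATTERNS:
        for m in pattern.finditer(text):
            try:
                parsed = date(int(m["y"]), int(m["m"]), int(m["d"]))
            except ValueError:  # e.g. month 13 or 30 February
                continue
            found.append((m.start(), parsed.isoformat()))
    return [iso for _, iso in sorted(found)]


print(normalize_dates("Paid 12/25/2023, shipped 2023/12/28, returned 05-01-2024."))
# ['2023-12-25', '2023-12-28', '2024-01-05']
```

Dropping impossible dates through `ValueError` is the design choice here: a regex alone cannot know that February has no 30th.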

3. Suppose you have a large dataset of product descriptions. Write a regular expression
to find and extract all the prices mentioned in the descriptions, considering different
currency formats (e.g., $100, €50, ¥5000). Explain how your regex works.

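[$€£¥]\d+(?:[.,]\d+)*

How it works: `[$€£¥]` matches one currency symbol and `\d+` matches the integer part. `(?:[.,]\d+)*` then accepts any number of groups made of a "." or "," followed by digits, so thousands separators and decimals in either convention ("$1,299.99", "€1.299,99") stay part of the price. Because those groups are optional, plain amounts such as "$100", "€50" and "¥5000" also match. To accept any Unicode currency symbol instead of a fixed list, the third-party `regex` module supports the property class `\p{Sc}` in place of the character class; Python's built-in `re` module does not support `\p{...}` classes.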
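A short usage sketch of a price pattern built from an explicit set of currency symbols, run on a made-up description string. It returns each price together with its symbol; converting the amounts to numbers (which requires knowing whether "," is a thousands or decimal separator) is left to a later step.

```python
import re

# One currency symbol, the integer part, then any ".digits" or ",digits" groups
PRICE_RE = re.compile(r"[$€£¥]\d+(?:[.,]\d+)*")

description = "Sneakers $100 (bundle $1,299.99), tote bag €50, camera ¥5000."
prices = PRICE_RE.findall(description)
print(prices)  # ['$100', '$1,299.99', '€50', '¥5000']
```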

4. Given a block of HTML code, write a regular expression to extract all the hyperlinks
(URLs) contained within the HTML <a> tags. Explain the steps and groups in your
regex pattern.

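<a\s[^>]*?href\s*=\s*["']([^"']*)["']

Steps and groups: `<a\s` finds the start of an anchor tag (the required whitespace keeps it from matching tags such as <abbr>). `[^>]*?` lazily skips any attributes that come before href without leaving the tag. `href\s*=\s*` matches the attribute name and the equals sign, allowing spaces around it, and `["']` matches the opening quote. Group 1, `([^"']*)`, captures the URL itself, and the final `["']` matches the closing quote. Use the pattern with `re.IGNORECASE` so that <A HREF=...> is found as well. Regular expressions cannot parse arbitrary HTML (unquoted attribute values, ">" inside attribute values, or commented-out links will trip them up), so for untrusted markup an HTML parser such as Python's `html.parser` is more reliable.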
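A minimal Python sketch of extracting link targets with a pattern that tolerates other attributes before href, single or double quotes, and upper-case tags. The HTML snippet is made up for illustration.

```python
import re

# Group 1 captures the value of the href attribute of each <a> tag
LINK_RE = re.compile(r"""<a\s[^>]*?href\s*=\s*["']([^"']*)["']""", re.IGNORECASE)

html = """<p><a href="https://example.com">Home</a>
<A class="nav" HREF='/docs/intro.html'>Docs</A>
<a id="top">No link</a></p>"""

print(LINK_RE.findall(html))  # ['https://example.com', '/docs/intro.html']
```

Because the pattern has exactly one capturing group, `findall` returns only the URLs rather than whole tags.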

5. Implement a regular expression that can detect and correct common spelling
mistakes in a text document. Provide examples and explain the substitution logic
used in your regex.

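Each key in corrections is a regex for a misspelled word and each value is its replacement; re.sub replaces every match of the key with the value. The \b anchors restrict matches to whole words, and re.IGNORECASE lets "Thier" match the lowercase pattern. The replacement is inserted exactly as written, so the corrected word at the start of the sentence comes out lowercase, and real-word errors such as "two" for "too" are not caught, because "two" is itself spelled correctly.

import re

text = "Thier are two many peple here. I definately want to go."

corrections = {
    r'\bthier\b': 'there',
    r'\bpeple\b': 'people',
    r'\bdefinately\b': 'definitely',
}

# Apply each correction in turn; IGNORECASE matches any capitalization of the misspelling
for pattern, replacement in corrections.items():
    text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)

print(text)  # there are two many people here. I definitely want to go.
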
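Applying one pattern after another works for a handful of words. A sketch of a variant (the word list and the helper names `correct_spelling` and `_replace` are hypothetical) compiles all misspellings into a single alternation and passes a function to `re.sub`, so each replacement can copy the capitalization of the word it replaces.

```python
import re

# Hypothetical dictionary of misspellings (lowercase) and their corrections
CORRECTIONS = {
    "peple": "people",
    "definately": "definitely",
    "recieve": "receive",
    "seperate": "separate",
}

# A single alternation of all misspellings; \b limits matches to whole words
PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, CORRECTIONS)) + r")\b", re.IGNORECASE
)


def _replace(match):
    """Return the correction for the matched word, copying its capitalization."""
    word = match.group(0)
    fixed = CORRECTIONS[word.lower()]
    if word.isupper():
        return fixed.upper()
    if word[0].isupper():
        return fixed[0].upper() + fixed[1:]
    return fixed


def correct_spelling(text):
    return PATTERN.sub(_replace, text)


print(correct_spelling("Peple will RECIEVE it, definately."))
# People will RECEIVE it, definitely.
```

Scanning the text once with one compiled pattern also avoids one full pass per dictionary entry as the list grows.
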
6. Design a regular expression to extract street addresses from a text document,
considering variations in address formats (e.g., 123 Main St, Apt 4B vs. 456 Elm
Avenue). Discuss the challenges and strategies for handling different address
structures.

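\b\d+\s+(?:[A-Z0-9]\w*\s+)+?(?:St|Street|Ave|Avenue|Rd|Road|Blvd|Boulevard|Ln|Lane|Dr|Drive)\b\.?(?:,?\s+(?:Apt|Suite|Unit)\.?\s*\w+)?

`\b\d+\s+` matches the house number. `(?:[A-Z0-9]\w*\s+)+?` matches one or more street-name words that start with a capital letter or a digit ("Main", "Elm", "5th"); it is lazy, so the match ends at the first street-type word. The alternation lists common street types in full and abbreviated form, with an optional trailing period, and the last optional group picks up a unit such as ", Apt 4B" or " Suite 200". Requiring a street type keeps a number followed by ordinary words ("42 people came") from being reported as an address.

Challenges: the same street type is written several ways (St, St., Street); unit numbers, directions (N, SW), and multi-word or numbered street names may or may not be present; and address order differs between countries (in many European countries the house number follows the street name). Strategies: whitelist the street types you expect, make every part that can be missing an optional group, test the pattern against real samples, and for messy or international data prefer a dedicated address parser or a named-entity model over a single regular expression.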
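A quick check, in Python, of a stricter address pattern that requires a street type; `ADDRESS_RE` is just a name for this sketch and the sample sentences are made up. Since the pattern contains only non-capturing groups, `findall` returns whole matches.

```python
import re

# House number, capitalised or numeric street-name words, a street type,
# and an optional unit such as ", Apt 4B"
ADDRESS_RE = re.compile(
    r"\b\d+\s+(?:[A-Z0-9]\w*\s+)+?"
    r"(?:St|Street|Ave|Avenue|Rd|Road|Blvd|Boulevard|Ln|Lane|Dr|Drive)\b\.?"
    r"(?:,?\s+(?:Apt|Suite|Unit)\.?\s*\w+)?"
)

text = "Send it to 123 Main St, Apt 4B or 456 Elm Avenue today. We had 42 people."
print(ADDRESS_RE.findall(text))  # ['123 Main St, Apt 4B', '456 Elm Avenue']
```

Note that the optional `\.?` also absorbs a sentence-ending period after an abbreviation ("77 5th Ave."), which is usually acceptable for abbreviated street types.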

7. Create a regular expression that identifies and extracts hexadecimal color codes
(e.g., #FFAABB) from a CSS stylesheet. Explain the pattern you use to capture these
codes.

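#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})\b

The pattern matches a literal "#" followed by group 1, which tries six hexadecimal digits (#FFAABB) first and falls back to three (the shorthand #FAB). The `\b` word boundary requires the code to end there, so a longer run of hex digits such as the 8-digit #FFAABBCC is skipped rather than truncated to #FFAABB. Group 1 holds the digits without the "#". CSS also accepts 4- and 8-digit codes that include an alpha channel; add `[A-Fa-f0-9]{8}` and `[A-Fa-f0-9]{4}` as further alternatives to capture those. Because ID selectors also start with "#", a selector whose name happens to be valid hex, such as #bad, will match too; searching only property values avoids this.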
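A minimal Python sketch on a made-up stylesheet. Here the group is non-capturing, so `findall` returns each code with its "#"; with a capturing group, as in the pattern above, it would return only the hex digits.

```python
import re

# "#" + 6 or 3 hex digits; \b rejects longer hex runs such as 8-digit colors
HEX_RE = re.compile(r"#(?:[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})\b")

css = """
body { color: #FFAABB; background: #fff; }
.card { border: 1px solid #1a2B3c; box-shadow: 0 0 2px #11223344; }
"""
print(HEX_RE.findall(css))  # ['#FFAABB', '#fff', '#1a2B3c']
```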

Assignment 2: Regular Expressions

1. Develop an algorithm that can resolve context-related ambiguities when extracting patterns, e.g., disambiguating between Apple the fruit and Apple the company.
2. Sentiment analysis and sentiment visualization using non-negative matrix factorization.
3. Evaluate the performance of the algorithm above in terms of accuracy, precision, and recall.
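For task 1, one simple approach (a simplified, Lesk-style context-overlap heuristic, not necessarily the method the assignment intends) scores each sense of an ambiguous word by how many of the surrounding words appear in a hand-written signature for that sense. The sense signatures below are hypothetical examples; a real system would derive them from a dictionary, labelled data, or word embeddings.

```python
import re

# Hypothetical sense signatures: words expected near each sense of an ambiguous word
SENSES = {
    "apple": {
        "fruit": {"eat", "ate", "delicious", "juice", "pie", "tree", "ripe", "fresh"},
        "company": {"products", "iphone", "mac", "stock", "ceo", "store", "device"},
    },
    "bank": {
        "finance": {"money", "deposit", "loan", "account", "cash", "withdraw"},
        "river": {"river", "water", "shore", "fishing", "stream"},
    },
}


def disambiguate(sentence, word):
    """Return the sense of `word` whose signature shares the most words with `sentence`.

    Returns None when no signature word occurs or when two senses tie.
    """
    context = set(re.findall(r"[a-z]+", sentence.lower())) - {word}
    scores = {sense: len(context & signature) for sense, signature in SENSES[word].items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


for text, target in [("I want to eat apple. They are delicious", "apple"),
                     ("I prefer to use apple products. They are great", "apple"),
                     ("The axe fell down near to the bank of the river", "bank")]:
    print(target, "->", disambiguate(text, target))
```

Returning None instead of guessing on a tie makes the uncertain cases visible, which matters when the results are scored for precision.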

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Sample text data (replace with your data)
text_data = [
    "I want to eat apple. They are delicious",
    "I want to deposit money in the bank.",
    "I prefer to use apple products. They are great",
    "The axe fell down near to the bank of the river",
]

# Create a TF-IDF document-term matrix; English stop words ("to", "the", ...) are dropped
vectorizer = TfidfVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(text_data)

# Apply NMF for topic modeling (assuming 3 topics); random_state makes the topics repeatable
num_topics = 3
nmf = NMF(n_components=num_topics, random_state=0)
W = nmf.fit_transform(dtm)  # document-topic weights, shape (n_documents, num_topics)

# NMF does not order its topics, so print each topic's top terms
# and use them to decide which label belongs to which topic
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(nmf.components_):
    top_terms = [terms[j] for j in weights.argsort()[::-1][:3]]
    print(f"Topic {k}: {', '.join(top_terms)}")

# Labels for the topics, in topic order (replace after inspecting the terms above).
# These are context labels (word senses), not sentiment polarities.
topic_sentiments = ["fruit", "company", "place"]

# Label each text sample with its dominant topic
for i, sample in enumerate(text_data):
    topic_weights = W[i]  # topic weights for the current sample
    sample_sentiment = int(np.argmax(topic_weights))  # index of the dominant topic
    sentiment_label = topic_sentiments[sample_sentiment]  # map the topic to its label
    print(f"Text: {sample}")
    print(f"Predicted label: {sentiment_label} (Topic {sample_sentiment})")
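Task 3 asks for accuracy, precision, and recall, which the code above does not compute. A sketch using scikit-learn's metrics, assuming you have hand-labelled gold labels for each sample; the `evaluate` helper and the gold and predicted lists here are hypothetical. Macro averaging gives each label equal weight, and `zero_division=0` avoids warnings for labels that are never predicted.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision and recall (each label weighted equally)."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
    }


# Hypothetical gold labels for the four sample sentences and hypothetical predictions
gold = ["fruit", "company", "company", "place"]
predicted = ["fruit", "company", "place", "place"]
print(evaluate(gold, predicted))
```

With these lists, accuracy is 3/4, and macro precision and recall are both 5/6: "place" is predicted twice but right once, and one of the two "company" samples is missed.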

You might also like

Spark For Dummies Ibm (77 pages)
Relatório Final Do Período Experimental - Paulo Manuel Correia Da Silva (56 pages)
HW - Regex: 1 Instructions HW - Regular Expression - 10 Points (9 pages)
EXPERT (12 pages)
CN3032 Databases Tutorial (3 pages)
The Essential R Reference, Mark Gardener (Everand)
22MCA1061 Regx (18 pages)
03 - Regex (64 pages)
цждццдцто2цйцйгнтб (5 pages)
Module II (17 pages)
Exercises Regular Expressions (6 pages)
Motivation (2 pages)
Tsa Lab Record - Cse (53 pages)
Ue 228120 (8 pages)
NLP Mod-2-1 (25 pages)
Unit 5 (4 pages)
DSBA+Master+Codebook+ +Text+Mining+&+TSF (11 pages)
Regular Expressions: Regular Expression Syntax in Python (11 pages)
Regular Expressions: Python For Everybody (34 pages)
5A - Regex (32 pages)
Lab Manual (10 pages)
9python Simple Character Matches (19 pages)
Experiment 7-1 (6 pages)
CC Lab Task 1 (3 pages)
NLP Soc (15 pages)
Experiment No.8 1 (7 pages)
Lecture 4 (13 pages)
06 - Regular Expressions and Network Programming (55 pages)
Python Notes - Unit 4 (13 pages)
Network Security - 4.1 Filtering Complex Data With Cyber Chef (2 pages)
Lec 07 II Dsfa23 (44 pages)
Python Ultimate Guide (10 pages)
Module5 RegularExpressions (10 pages)
Shaista (22 SE 90) - LAB05 (4 pages)
Unit - 4 Regex (28 pages)
TP3 Text Preprocessing Using Regex: Objective (8 pages)
Lab Report 2 (28 pages)
Untitled (53 pages)
03.1 - Regular Expressions (34 pages)
Regular Expressions (104 pages)
Experiment No 3 (7 pages)
Lec 07-II-DSFa23 (44 pages)
Python (4 pages)
RegEx in Python (6 pages)
Manipulating Text With Regular Expression in Python (4 pages)
Module3 RegularExpressions (8 pages)
1111 23 Regex (17 pages)
CS173 Class Activity 2 Regex PDF (3 pages)
Python Re (101 pages)
Full Python Regex Questions Detailed (4 pages)
COMP 4650 6490 Assignment 3 2023-v1.1 (6 pages)
UNIT4 (67 pages)
Unit 3 2 (3 pages)
Python - Slide 5 (42 pages)
2 - Python Strings (23 pages)
Ilovepdf Merged (61 pages)
Spacy Regex (5 pages)
W10A Full (40 pages)
Unit-3 - Regular Expression (15 pages)
NLP 2 (5 pages)
DBMS Lab Manual, Jitendra Patel (Everand)
Introduction to PHP, Part 2, Second Edition, Adam Majczak (Everand)
Profound Python Data Science, Onder Teker (Everand)
George Winslow Plummer - The Master's Word (125 pages)
Distributed File System (20 pages)
Sample Multiple Choice Questions. Class: Ty BSC (It) Semester-Vi Subject: Business Intelligence (8 pages)
Sequence Alignment Presentation (27 pages)
Implementing Data Guard (77 pages)
Info125 Notes1 (54 pages)
MWANSA'S Computer Studies Scheme of Work - 2022 (20 pages)
Digital Forensics Tools (11 pages)
ISDS402FinalExamReviewwAudio PDF (2 pages)
Comprehensive Systematic Review On Virtual Reality For Cultural Heritage Practices Coherent Taxonomy and Motivations (17 pages)
17 Bibliography (18 pages)
Computer Engineering: Database Systems (DS) Assignment (15 pages)
IT Grade 9 (4 pages)
Class 6 (29 pages)
Rule Based Extraction From PDF (4 pages)
Exam Questions Introduction Multimedia (3 pages)
Future of Social Media Presentation (12 pages)
Data Collection Scripts and Process Flow For Periodic Average Costing (PAC IPAC) - Cost Management (13 pages)
Data Analytics Class - Unit-Ii (40 pages)
Kesiapan Umkm (Usaha Mikro Kecil Menengah) Menuju Ekonomi Digital Di Desa Mekargalih Kecamatan Jatinangor Kabupaten Sumedang PDF (11 pages)
Presentation IT Infrastructure (18 pages)
It Audit Jeopardy (53 pages)
BI Lab Manual (21 pages)
Laboratory Exercise Hci Guidelines I (3 pages)
L6a Web Optimisation and Web Design (4 pages)
Postgis Case Studies: What Is It, Who Is Using It, and Why? (50 pages)