Lab 5

Uploaded by

mdtahmid

IT 166 Lab 5

Regular expressions and file parsing

You may work in groups of 2, but you MUST make individual submissions and list your name and your
group mate’s name in a comment on the first line.

Objectives
• Be able to craft and apply regular expressions to find matching text and solve problems.
• Be able to read from text files on the hard drive.
Preparation
• Launch the Jupyter notebook.
• Rename the notebook page as “lab5”.
• Each problem’s solution should occupy a single cell.
Please provide solutions to the problems below.

Problem 1
From your Lab module, download the file emails.txt and place it in the same folder as the current
notebook. Write a Python function called extract_emails_from_file that takes the path to a text
file as its input (in your case, emails.txt). A given text file can contain a mixture of text and email
addresses scattered across different lines. Your function’s task is to:
i) read the text file,
ii) use a regular expression to find all the email addresses, and
iii) return them as a list.
Ensure that your function can handle a variety of email formats, including those with dots, dashes,
and underscores in the local part and domain name.

Hints:
• Make sure the text file is in the same folder as the notebook.
• You are allowed to test out your regular expression on regexr.com
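
As a rough illustration of steps i) to iii), the sketch below reads a file and collects email-like strings with re.findall. The pattern is a simplified assumption rather than a complete email validator, and the demo file name and its contents are made up for the example (they are not taken from emails.txt).

```python
import os
import re
import tempfile

# Simplified pattern (an assumption, not a full email validator):
# the local part may contain letters, digits, dots, underscores, %, + and dashes;
# the domain may contain letters, digits, dots and dashes, ending in a 2+ letter suffix.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails_from_file(path):
    """Read the whole file and return every email-like substring as a list."""
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    return EMAIL_PATTERN.findall(text)

# Demo on a throwaway file with hypothetical content.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo_emails.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("Write to first.last@example.com\n")
        handle.write("or to sales_team@my-shop.example.org for help.\n")
    demo_emails = extract_emails_from_file(demo_path)

print(demo_emails)
```

Reading the whole file at once keeps the sketch short; for very large files, applying the pattern line by line would work the same way.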

Use the following snippet of code to test your function:

emails = extract_emails_from_file('emails.txt')
for email in emails:
    print(email)

Sample Outputs:
[email protected]
[email protected]
[email protected]
Problem 2
From your Lab module, download the file regex_sum_42.txt and place it in the same folder as the
current notebook. Create a Python function called compute_sum_from_text that takes the path to
a text file as its input (in your case, regex_sum_42.txt) and returns the sum. Your function’s task
is to:
i) read through and parse a file with text and numbers,
ii) extract all the numbers in the file, and
iii) compute the sum of the numbers.
Hints:
• Make sure the text file is in the same folder as the notebook.
• You are allowed to test out your regular expression on regexr.com
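
One possible shape for this function is sketched below. It assumes the numbers are non-negative integers written as plain runs of digits (no signs, decimals, or thousands separators); the demo file and its contents are hypothetical and unrelated to regex_sum_42.txt.

```python
import os
import re
import tempfile

def compute_sum_from_text(path):
    """Sum every run of digits found in the file, reading it line by line."""
    total = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # findall returns the matched digit runs as strings.
            for digits in re.findall(r"[0-9]+", line):
                total += int(digits)
    return total

# Demo on a throwaway file with hypothetical content.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo_numbers.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("3 apples and 12 pears\n")
        handle.write("no numbers here\n")
        handle.write("then 100 plums\n")
    demo_total = compute_sum_from_text(demo_path)

print(demo_total)  # 115
```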
Use the following snippet of code to test your function:
mySum = compute_sum_from_text('regex_sum_42.txt')
print(mySum)

Sample Outputs:
445833

Problem 3
From your Lab module, download the file server_errors.log and place it in the same folder as the
current notebook. Write a Python function named parse_log that takes the path to a log file as its
input. The function should:
i) read the file,
ii) use regular expressions to extract error messages and their corresponding timestamps, and
iii) return a list of tuples.
Each tuple should contain the timestamp of the error and the error message.
Hints:
• Make sure the log file is in the same folder as the notebook.
• Assume the log format is "YYYY-MM-DD HH:MM:SS: Error: ErrorMessage".
• The function should return a list of tuples of the form (timestamp, error_message).
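
A minimal sketch of such a parser follows, assuming every error entry sits on its own line in exactly the format given in the hints. The demo file name and the "Info" line are made up for illustration; the real server_errors.log may look different.

```python
import os
import re
import tempfile

# Group 1 captures the timestamp, group 2 the message after "Error: ".
# re.MULTILINE lets ^ and $ match at the start and end of every line.
LOG_PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): Error: (.*)$",
    re.MULTILINE,
)

def parse_log(path):
    """Return a list of (timestamp, error_message) tuples found in the file."""
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    # With two capture groups, findall returns a list of 2-tuples.
    return LOG_PATTERN.findall(text)

# Demo on a throwaway file with hypothetical content.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.log")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("2023-04-01 12:00:00: Error: Failed to connect to the database.\n")
        handle.write("2023-04-01 12:05:00: Info: Retrying connection.\n")
    demo_errors = parse_log(demo_path)

print(demo_errors)
```

Lines that do not contain "Error:" in the expected position are skipped rather than reported, which may or may not be what the real log requires.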

Use the following snippet of code to test your function:

errors = parse_log('server_errors.log')
for error in errors:
    print(error)

Sample Outputs:
[('2023-04-01 12:00:00', 'Failed to connect to the database.')]
Problem 4
From your Lab module, download the file insurance_claims.txt and place it in the same folder as
the current notebook. In the field of actuarial science, accurate data analysis is crucial. You are
provided with a dataset containing raw insurance claim entries that include various pieces of
information such as the claim date, claim amount, policy number, and claimant comments.
However, the data is in a highly inconsistent format with lots of noise, including unnecessary
spaces, special characters, and varying data entry conventions. Your task is to write a Python
function named parse_insurance_claims to process this dataset and extract structured information.
Your Function Should:
• Accept the path to a dataset file as its input (in your case insurance_claims.txt).
• Use regular expressions to robustly parse each claim entry.
• Extract and return a list of dictionaries. Each dictionary represents a claim entry with the
following keys: claim_date, claim_amount, policy_number, and claimant_comments.
Hints:
• The claim_date always follows the YYYY-MM-DD format, but it might be surrounded by
different characters.
• The claim_amount is prefixed with a dollar sign and may include commas for thousands, but it
could be surrounded by various symbols.
• The policy_number starts with a hash symbol followed by alphanumeric characters, but it might
be encased in different symbols or formats.
• The claimant_comments describe the claim reason and may contain any characters, usually
following the policy_number.
• Your regex must be versatile enough to handle different formats and noise within the claim
entries, accurately extracting the required information despite the inconsistencies.
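
The hints above suggest searching for each field separately instead of matching a whole entry with one rigid pattern. The sketch below takes that approach under several assumptions that the source does not confirm: one claim per line, amounts that are whole dollars, and comments that run from the policy number to the end of the line. The helper name parse_claim_line and the two sample lines are hypothetical, and the real insurance_claims.txt may need different handling.

```python
import re

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")
AMOUNT_PATTERN = re.compile(r"\$\s*(\d[\d,]*)")    # e.g. "$2,500" or "$ 1500"
POLICY_PATTERN = re.compile(r"#\s*([A-Za-z0-9]+)")  # e.g. "#AB123"

def parse_claim_line(line):
    """Return a claim dictionary for one line, or None if a field is missing."""
    date = DATE_PATTERN.search(line)
    amount = AMOUNT_PATTERN.search(line)
    policy = POLICY_PATTERN.search(line)
    if not (date and amount and policy):
        return None
    # Treat everything after the policy number as the comment,
    # then trim surrounding symbols and whitespace.
    comment = re.sub(r"^\W+|\W+$", "", line[policy.end():])
    return {
        "claim_date": date.group(0),
        "claim_amount": int(amount.group(1).replace(",", "")),
        "policy_number": policy.group(1),
        "claimant_comments": comment,
    }

def parse_insurance_claims(path):
    """Parse every line of the file and keep the lines that yield a claim."""
    with open(path, encoding="utf-8") as handle:
        parsed = (parse_claim_line(line) for line in handle)
        return [claim for claim in parsed if claim is not None]

# Hypothetical noisy lines, invented for this example.
demo_lines = [
    "**2024-03-01** | $2,500 | [#AB123] - Water damage in basement",
    "(2024-03-02);; $ 1500 ;; <#CD456> :: Stolen bicycle!!",
]
demo_claims = [parse_claim_line(line) for line in demo_lines]
for claim in demo_claims:
    print(claim)
```

Searching field by field tolerates changes in field order and separators, at the cost of accepting lines where a stray "$" or "#" appears before the intended value.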
Sample outputs (for the given input file):
[{'claim_date': '2024-03-01', 'claim_amount': 2500, 'policy_number': 'AB123',
'claimant_comments': 'Water damage in basement'},
{'claim_date': '2024-03-02', 'claim_amount': 1500, 'policy_number': 'CD456',
'claimant_comments': 'Stolen bicycle'},
{'claim_date': '2024-03-03', 'claim_amount': 5000, 'policy_number': 'EF789',
'claimant_comments': 'Car accident'},
{'claim_date': '2024-03-04', 'claim_amount': 300, 'policy_number': 'GH012',
'claimant_comments': 'Broken window'}
]
