Regular Expressions: Luísa Coheur
REGULAR EXPRESSIONS
Luísa Coheur
Overview
Learning objectives
Topics
Regular Expressions
Key takeaways
Suggested readings
LEARNING OBJECTIVES
Text Cleaning
Removing HTML Tags from Web Scraped Data:
A regex pattern like /<[^>]+>/g can be used to find and
remove all HTML tags (the g flag means “global”: every match
is processed, not just the first one)
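The tag-removal idea above can be sketched in Python as follows. This is a minimal illustration, not a full HTML cleaner: the helper name strip_html_tags and the sample string are assumptions made for the example. Python has no g flag; re.sub already replaces every match by default, which gives the same effect.

```python
import re

# Same pattern as on the slide: "<", one or more characters that are not ">", then ">".
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_html_tags(text: str) -> str:
    """Remove everything that looks like an HTML tag (hypothetical helper)."""
    # sub() replaces all matches by default, like the g flag in /.../g notation.
    return TAG_PATTERN.sub("", text)


scraped = "<p>Hello, <b>world</b>!</p>"
print(strip_html_tags(scraped))  # Hello, world!
```

A pattern this simple does not handle comments, script bodies, or a ">" inside an attribute value, so a real HTML parser is the safer choice for messy pages.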
Pattern Recognition / Data Extraction
Identifying Email Addresses:
The regex pattern
\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
matches most email address formats
Extracting Dates:
A regex can help identify and extract dates written in
various formats, useful for timeline analysis or event tracking
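The two extraction tasks above might look like this in Python. It is a sketch under stated assumptions: the function names and sample text are invented for the example, and the date pattern covers only two illustrative formats (YYYY-MM-DD and D/M/YYYY), not every format found in real data. The final character class of the email pattern is written as [A-Za-z], because a | inside square brackets would match a literal pipe character rather than mean "or".

```python
import re

# Email pattern from the slide, with the last class written as [A-Za-z].
EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

# Assumption: only two date formats are covered here, e.g. 2024-03-15 and 1/4/2024.
DATE_PATTERN = re.compile(r"\b(?:\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{4})\b")


def extract_emails(text: str) -> list[str]:
    """Return every substring that looks like an email address."""
    return EMAIL_PATTERN.findall(text)


def extract_dates(text: str) -> list[str]:
    """Return every substring that matches one of the assumed date formats."""
    return DATE_PATTERN.findall(text)


sample = "Contact ana.silva@example.org before 2024-03-15 or bob@test.co.uk after 1/4/2024."
print(extract_emails(sample))  # ['ana.silva@example.org', 'bob@test.co.uk']
print(extract_dates(sample))   # ['2024-03-15', '1/4/2024']
```

Neither pattern validates its match: the date pattern accepts 2024-99-99, and the email pattern only checks the overall shape of an address.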
REGULAR EXPRESSIONS: APPLICATIONS
Spam Detection:
Regular expressions can be instrumental in identifying
common characteristics of spam messages, such as
excessive use of capital letters, the presence of certain
phrases (e.g., “BUY NOW”, “FREE”, “CLICK HERE”), or
suspicious URLs
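The spam characteristics listed above could be checked with a few patterns, as in the Python sketch below. Everything specific in it is an assumption made for illustration: the function name spam_signals, the threshold of four consecutive capital letters, and the choice to treat a URL with a raw IP address as host as "suspicious". A real filter would combine many more signals.

```python
import re

# Phrases from the slide, matched case-insensitively as whole words.
SPAM_PHRASES = re.compile(r"\b(?:BUY NOW|FREE|CLICK HERE)\b", re.IGNORECASE)

# Assumption: a word of 4 or more capital letters counts as "shouting".
SHOUTING = re.compile(r"\b[A-Z]{4,}\b")

# Assumption: a URL whose host is a raw IPv4 address is treated as suspicious.
IP_URL = re.compile(r"https?://\d{1,3}(?:\.\d{1,3}){3}\S*")


def spam_signals(message: str) -> dict[str, list[str]]:
    """Collect the matches of each heuristic (hypothetical helper)."""
    return {
        "phrases": SPAM_PHRASES.findall(message),
        "shouting_words": SHOUTING.findall(message),
        "ip_urls": IP_URL.findall(message),
    }


msg = "CLICK HERE to claim your FREE prize: http://192.168.0.1/win"
print(spam_signals(msg))
# {'phrases': ['CLICK HERE', 'FREE'],
#  'shouting_words': ['CLICK', 'HERE', 'FREE'],
#  'ip_urls': ['http://192.168.0.1/win']}
```

Because the phrase pattern ignores case, an innocent "feel free to call" also triggers it, which is why such matches are better used as features for a classifier than as a verdict on their own.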
ACTIVE LEARNING MOMENT
EXERCISE