Lecture 4

Cybersecurity Programming

Crash Course In Python Programming Language—Lesson 4

Dr. Mustafa Ababneh


Regular Expressions (RegEx): A Very Short Introduction

A regex is a pattern that describes a set of strings and is used to match text

Metacharacters (single characters or character pairs) have special meanings in RegEx: \, ^, $, ., |, ?, *, +, (), [], {}
If they need to be used as literals in a regex, they must be escaped with \
My favorite resources:
https://www.regular-expressions.info/quickstart.html
Regular Expressions Cookbook by Jan Goyvaerts & Steven Levithan
Tools & utilities for working with, building & testing RegExs
grep
https://regexr.com
A side note: Google Search supports some simple regex-like syntax, such as alternation with |
E.g., "(gray|red) (wolf|fox)" returns results for "gray wolf" OR "red fox" OR "red wolf" OR "gray fox"
https://www.google.com/search?hl=en&q=%22(red%
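The escaping rule above can be tried out in Python. This is only a minimal sketch: the sample strings are made up for illustration, and the non-capturing group `(?:...)` (not listed among the metacharacters above) is used so that `re.findall` returns whole matches rather than group contents.

```python
import re

# Hypothetical sample text, chosen only to illustrate escaping.
text = "Total: 3.50 USD (approx.) vs 3x50"

# Unescaped '.' is a metacharacter: it matches any character except newline.
loose = re.findall(r'3.50', text)

# Escaped '\.' matches only a literal dot.
strict = re.findall(r'3\.50', text)

# re.escape() escapes every metacharacter in a literal string for us.
literal = "(approx.)"
escaped_hits = re.findall(re.escape(literal), text)

# Alternation, mirroring the "(gray|red) (wolf|fox)" search example.
animals = re.findall(r'(?:gray|red) (?:wolf|fox)', "a red fox met a gray wolf")

print(loose)         # ['3.50', '3x50']
print(strict)        # ['3.50']
print(escaped_hits)  # ['(approx.)']
print(animals)       # ['red fox', 'gray wolf']
```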
RegEx Cheatsheet
Expression          Will Match
.                   any character except newline
\w \d \s            word character, digit, whitespace
[abc]               any of a, b, or c
[^abc]              any character except a, b, or c
[a-g]               any character between a and g
^abc                abc at the start of the string
abc$                abc at the end of the string
\. \* \\            escaped special characters
\t \n \r            tab, linefeed, carriage return
\xHH \x0A \xA9      a byte in hexadecimal notation; e.g., \x0A is \n and \xA9 is ©
(abc)               capture group
a* a+ a?            0 or more, 1 or more, 0 or 1
a{5} a{2,} a{3,7}   exactly five, two or more, from 3 to 7
.+? .{2,}?          match as few as possible (lazy, i.e., non-greedy)
ab|cd               match ab or cd (| has the lowest precedence)
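A few cheatsheet entries are easier to grasp when run. The following is a small sketch with made-up input strings, contrasting greedy and lazy quantifiers, counted repetition with capture groups, and the two anchors.

```python
import re

# Hypothetical input used only for illustration.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, so the match runs to the last '>'.
greedy = re.findall(r'<.+>', html)

# Lazy: .+? grabs as little as possible, so each tag is matched separately.
lazy = re.findall(r'<.+?>', html)

# Capture groups with counted quantifiers: findall returns one tuple per match.
dates = re.findall(r'(\d{4})-(\d{2})-(\d{2})', "logs from 2021-03-15 to 2021-04-01")

# Anchors: ^ ties the pattern to the start, $ to the end of the string.
starts_with_abc = bool(re.search(r'^abc', "abcdef"))
ends_with_abc = bool(re.search(r'abc$', "abcdef"))

print(greedy)           # ['<b>bold</b> and <i>italic</i>']
print(lazy)             # ['<b>', '</b>', '<i>', '</i>']
print(dates)            # [('2021', '03', '15'), ('2021', '04', '01')]
print(starts_with_abc)  # True
print(ends_with_abc)    # False
```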
https://regexr.com

RegExs In Python

Based on https://docs.python.org/3.6/howto/regex.html
Used to specify rules to match patterns in strings
Like Unix’s grep
Also, can be used to substitute strings
Like Unix’s sed

import re

l = re.findall(r'\d', 'we23f345rfsfewt5r')
print(l) # outputs ['2', '3', '3', '4', '5', '5']
l = re.findall(r'[0-9]', 'we23f345rfsfewt5r')
print(l) # same as above

l = re.findall(r'\d+', 'we23f345rfsfewt5r') # >= 1 times
print(l) # outputs ['23', '345', '5']

l = re.findall(r'ca*t', 'ewewrctcatcaatcaaaaaaatcg') # >= 0 times
print(l) # outputs ['ct', 'cat', 'caat', 'caaaaaaat']
l = re.findall(r'ca?t', 'ewewrctcatcaatcaaaaaaatcg') # 0 or 1 times
print(l) # outputs ['ct', 'cat']
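The slides note that regexes can also substitute strings, much like sed. A minimal sketch of that idea with `re.sub` follows; the log line and the key=value string are hypothetical inputs invented for this example.

```python
import re

# Hypothetical log line used only for illustration.
log_line = "Failed login from 192.168.1.25 and 10.0.0.7"

# Replace every dotted-quad-looking token with a placeholder,
# roughly what sed 's/PATTERN/<ip>/g' would do.
masked = re.sub(r'\d{1,3}(?:\.\d{1,3}){3}', '<ip>', log_line)

# Back-references (\1, \2) in the replacement reuse captured groups:
# here each "key=value" pair is rewritten as "value=key".
swapped = re.sub(r'(\w+)=(\w+)', r'\2=\1', "user=admin mode=debug")

print(masked)   # Failed login from <ip> and <ip>
print(swapped)  # admin=user debug=mode
```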
RegExs In Python

import re

File = """
[email protected] wrer@efe@fe @r4etre.com
dfraeqw@
wrere.org [email protected]
"""

# The domain must be a single label followed by a letters-only suffix
l = re.findall(r'[a-zA-Z0-9_\-.]+@[a-zA-Z0-9_\-]+\.[a-zA-Z]+', File)
print(l) # outputs ['[email protected]', '[email protected]']

# The suffix may also contain digits, hyphens, and further dots
l = re.findall(r'[a-zA-Z0-9_\-.]+@[a-zA-Z0-9_\-]+\.[a-zA-Z0-9\-.]+', File)
print(l) # outputs ['[email protected]', ...
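Because the addresses in the slide are obscured, the difference between the two patterns is hard to see. The sketch below reuses both patterns on hypothetical addresses (alice@example.com and bob@mail.example.org are invented placeholders, not the slide's data) to show how the second pattern handles multi-label domains.

```python
import re

# Hypothetical sample text; the addresses are made up for illustration.
sample_text = "alice@example.com wrer@efe@fe @r4etre.com bob@mail.example.org"

# Pattern 1: one domain label, then a dot, then letters only.
simple = r'[a-zA-Z0-9_\-.]+@[a-zA-Z0-9_\-]+\.[a-zA-Z]+'

# Pattern 2: the part after the first dot may contain digits, hyphens, and more dots.
extended = r'[a-zA-Z0-9_\-.]+@[a-zA-Z0-9_\-]+\.[a-zA-Z0-9\-.]+'

simple_hits = re.findall(simple, sample_text)
extended_hits = re.findall(extended, sample_text)

print(simple_hits)    # ['alice@example.com', 'bob@mail.example']
print(extended_hits)  # ['alice@example.com', 'bob@mail.example.org']
```

Note that the first pattern truncates the multi-label domain, while the second would also swallow a trailing period if the address ended a sentence; neither is a complete email validator.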
Network Programming (High Level): Interacting with the Web

We will use the Requests package


Resources
https://requests.readthedocs.io/en/master/
https://requests.readthedocs.io/en/latest/user/quickstart/

A third-party package recommended by the Python docs (https://docs.python.org/3.6/library/urllib.request.html)
If not already installed, install it via: pip3 install requests

Network Programming (High Level): Interacting with the Web

#!/usr/bin/python3.6

import requests

request_url = "https://www.google.com/search"

# Query string parameters are obtained from the Developer Tools
# of a vanilla Web browser
payload = {'client': 'safari', 'source': 'hp',
           'q': 'Regular Expression', 'sclient': 'psy-ab'}

# Requests URL-encodes the payload and appends it to the URL as the query string
response = requests.get(request_url, params=payload)

print("Status Code: {}".format(response.status_code))
print("Complete URL: {}".format(response.url))
print("Encoding: {}".format(response.encoding))
print("Response Headers:\n{}".format(response.headers))

# Save the response body; the with-block closes the file automatically
with open('google.html', 'w', encoding='utf-8') as f:
    f.write(response.text)
Safari Web Developer Tools: Web Inspector

Network Programming (High Level): Interacting with the Web

ahmad@ubuntu:~/NES470$ ./Requests.py
Status Code: 200
Complete URL: https://www.google.com/search?client=safari&source=hp&q=Regular+Expression&sclient=psy-ab
Encoding: ISO-8859-1
Response Headers:
{'Content-Type': 'text/html; charset=ISO-8859-1', 'Date': 'Fri, 25
...
ma=2592000; v="46,43"', 'Transfer-Encoding': 'chunked'}
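The "Complete URL" line shows the payload dictionary turned into a query string. As a rough offline illustration (this uses the standard library's `urllib.parse.urlencode`, not Requests itself, and is not a claim about how Requests is implemented), the same encoding can be reproduced without sending any request:

```python
from urllib.parse import urlencode

request_url = "https://www.google.com/search"
payload = {'client': 'safari', 'source': 'hp',
           'q': 'Regular Expression', 'sclient': 'psy-ab'}

# urlencode joins key=value pairs with '&' and encodes the space as '+'.
query_string = urlencode(payload)
complete_url = request_url + "?" + query_string

print(query_string)  # client=safari&source=hp&q=Regular+Expression&sclient=psy-ab
print(complete_url)
```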

Network Programming (High Level): Interacting with the Web

#!/usr/bin/python3.6

import requests
import re

request_url = "https://www.google.com/search"

# Query string parameters are obtained from the Developer Tools
# of a vanilla Web browser
payload = {'client': 'safari', 'source': 'hp',
           'q': 'Regular Expression', 'sclient': 'psy-ab'}

response = requests.get(request_url, params=payload)

# The RE is crafted by investigating the HTML source file with a Web
# browser's Developer Tools & a text editor
urls = re.findall(r'<div class="kCrYT"><a href="/url\?q=(.+?)&amp;sa=U.+?">',
                  response.text)

# enumerate() numbers the results from 1, even if a URL appears twice
for i, u in enumerate(urls, 1):
    print("{:2d}: {}".format(i, u))
Safari Web Developer Tools: Web Inspector

Network Programming (High Level): Interacting with the Web

ahmad@ubuntu:~/NES470$ ./Requests-RE.py
 1: https://en.wikipedia.org/wiki/Regular_expression
 2: https://developer.mozilla.org/ar/docs/Web/JavaScript/Guide/Regular_Expressions
 3: https://regexr.com/
 4: https://docs.python.org/3/library/re.html
 5: https://docs.python.org/3/howto/regex.html
 6: https://www.w3schools.com/jsref/jsref_obj_regexp.asp
 7: https://www.w3schools.com/js/js_regexp.asp
 8: https://www.regular-expressions.info/
 9: https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html
10: https://support.google.com/analytics/answer/1034324%3Fhl%3Den
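The extraction step can be tried offline. The sketch below applies the same pattern to a hypothetical HTML fragment (the markup and the `ved` values are invented for illustration; Google's real markup changes over time, so the class name may no longer match) and then decodes percent-escapes such as `%3F` with `urllib.parse.unquote`.

```python
import re
from urllib.parse import unquote

# Hypothetical fragment imitating the structure the pattern was crafted for.
sample_html = (
    '<div class="kCrYT"><a href="/url?q=https://regexr.com/&amp;sa=U&amp;ved=abc">'
    '<div class="kCrYT"><a href="/url?q=https://support.google.com/analytics/'
    'answer/1034324%3Fhl%3Den&amp;sa=U&amp;ved=def">'
)

pattern = r'<div class="kCrYT"><a href="/url\?q=(.+?)&amp;sa=U.+?">'

# The lazy group (.+?) stops at the first "&amp;sa=U", isolating the target URL.
raw_urls = re.findall(pattern, sample_html)

# Percent-escapes in the captured URL (%3F is '?', %3D is '=') can be decoded.
decoded_urls = [unquote(u) for u in raw_urls]

for i, u in enumerate(decoded_urls, 1):
    print("{:2d}: {}".format(i, u))
```

For anything beyond a quick experiment, an HTML parser is usually more robust than a regex tied to one class name.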
