Networking in Python
Objectives
In this session, you will learn to:
Extract Data Using Regular Expression.
Combine and Search Data.
Implement Networking in Python.
Explore HTTP.
Retrieve Images using HTTP.
Retrieve Web Pages with URL Library.
Extracting Data Using Regular Expression
Are regular expressions used to extract particular data from a given string?
• A regular expression (regex for short) is a special text string that describes a search pattern.
• The pattern describes the text to look for, so it can be used to find and extract matching data.
Extracting Data Using Regular Expression (Contd.)
findall()
Extracts data from a string.
Returns all non-overlapping matches of the pattern in the string, as a list.
Scans the string from left to right and returns the matches in the order found.
If the pattern contains one or more groups, returns a list of the groups instead of the whole matches.
re.findall(pattern, string[, flags])
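The effect of groups on the return value can be illustrated with a short sketch. The sample text and the variable names below are made up for illustration only.

```python
import re

# Hypothetical sample text, invented for this illustration
text = "From: alice Score: 42, From: bob Score: 7"

# No groups: each list item is the whole matched substring
whole_matches = re.findall(r"Score: \d+", text)

# One group: each list item is only the text captured by the group
one_group = re.findall(r"Score: (\d+)", text)

# Two groups: each list item is a tuple with one entry per group
two_groups = re.findall(r"From: (\w+) Score: (\d+)", text)

print(whole_matches)  # ['Score: 42', 'Score: 7']
print(one_group)      # ['42', '7']
print(two_groups)     # [('alice', '42'), ('bob', '7')]
```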
Extracting Data Using Regular Expression (Contd.)
Let us look at an example of extracting e-mail addresses from a sentence.
Output:
['[email protected]', '[email protected]']
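The program for this example is not shown here. A minimal sketch of one way to do it follows, assuming a made-up sentence and example.com-style addresses; the pattern is a simplified one chosen for illustration and does not cover every valid e-mail address.

```python
import re

# Hypothetical sentence; the addresses are placeholders, not from the slides
sentence = "Please contact alice@example.com or bob.smith@example.org for details."

# Simplified pattern: name characters, "@", then a dotted domain name.
# (?: ... ) is a non-capturing group, so findall returns whole matches.
pattern = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

addresses = re.findall(pattern, sentence)
print(addresses)  # ['alice@example.com', 'bob.smith@example.org']
```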
Combining, Searching and Extracting
findall() extracts every substring that matches the specified pattern.
findall() can also pull out just a portion of each match when that portion is wrapped in parentheses.
Example of searching with a regular expression:
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    # Lines that start with "X": capture the number that follows ": "
    x = re.findall(r'^X\S*: ([0-9.]+)', line)
    if len(x) > 0:
        print(x)
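To see what this pattern picks out without needing the data file, the same search can be run over a few sample lines. The lines below are assumed to resemble the header lines in mbox-short.txt and are used here only for illustration.

```python
import re

# Sample lines, assumed to be similar to those in mbox-short.txt
sample_lines = [
    "X-DSPAM-Confidence: 0.8475",
    "X-DSPAM-Probability: 0.0000",
    "X-Sieve: CMU Sieve 2.3",
    "From: someone@example.com",
]

extracted = []
for line in sample_lines:
    # The parentheses select only the number, not the whole header line
    found = re.findall(r'^X\S*: ([0-9.]+)', line.rstrip())
    if len(found) > 0:
        extracted.append(found)

print(extracted)  # [['0.8475'], ['0.0000']]
```

The third line starts with "X" but has no number right after the colon, so it is skipped along with the "From:" line.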
Escape Character
An escape character changes how the character that follows it is interpreted.
In a regular expression, a backslash placed before a special character, such as the dollar sign or caret, makes the pattern match that actual character instead of applying its special meaning.
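A short sketch of escaping the dollar sign and the caret follows. The sample strings are invented for illustration.

```python
import re

# Hypothetical sample strings
receipt = "We paid $10.00 for cookies and $3.50 for tea"
formula = "x^2 + y^10"

# \$ matches a literal dollar sign
prices = re.findall(r"\$[0-9.]+", receipt)

# Without the backslash, $ means "end of the string", so nothing matches
unescaped = re.findall(r"$[0-9.]+", receipt)

# \^ matches a literal caret rather than "start of the string"
exponents = re.findall(r"\^\w+", formula)

print(prices)     # ['$10.00', '$3.50']
print(unescaped)  # []
print(exponents)  # ['^2', '^10']
```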
Just a Minute
Predict the output of the following code
import re
string = "Once you have accomplished small things, you may attempt great ones safely."
print(re.findall(r"\ba[\w]*", string))
Answer: ['accomplished', 'attempt']
Activity: Understanding and Implementing Regular Expressions
Problem Statement
Write a program to simulate the operation of the grep command in Unix. Ask the user to
enter a regular expression and count the number of lines that matched the regular
expression.
Here is a sample execution of the program:
Enter a regular expression: ^Author
mbox.txt had 1798 lines that matched ^Author
Enter a regular expression: ^X
mbox.txt had 14368 lines that matched ^X
Enter a regular expression: java$
mbox.txt had 4218 lines that matched java$
Prerequisite: For this activity, please refer to “mbox.txt”, available inside the “Data_File_For_Students” folder.
Network Programming
Networking is the concept of two programs communicating across a network.
The communication can be client to client, client to server, or even a client talking to itself.
Client: An end device interfacing with a human.
Server: A device providing a service for clients.
HyperText Transfer Protocol – HTTP
HTTP is an application protocol for distributed, collaborative information systems.
HTTP is the foundation of data communication for the WWW.
The client sends an HTTP request to the server, and the server sends back an HTTP response.
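Both the request and the response are plain text messages. The sketch below shows roughly what they look like as bytes; the helper names build_get_request and parse_status_line, and the sample response, are hypothetical and exist only for this illustration.

```python
def build_get_request(host, path):
    """Return the bytes of a minimal HTTP/1.0 GET request (illustrative helper)."""
    lines = [
        "GET {} HTTP/1.0".format(path),
        "Host: {}".format(host),
        "",  # an empty line marks the end of the request headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response):
    """Split the first line of a response into (version, status code, reason)."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = first_line.split(" ", 2)
    return version, int(code), reason


request = build_get_request("www.py4inf.com", "/code/romeo.txt")

# Made-up response bytes standing in for what a server might send back
sample_response = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
status = parse_status_line(sample_response)

print(request)
print(status)  # ('HTTP/1.1', 200, 'OK')
```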
Understanding Sockets
Sockets are the endpoints of a bidirectional communication channel.
A socket is much like a file, except that a single socket provides a two-way connection between two programs.
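The two-way nature of a socket can be tried without a network. The sketch below assumes Python 3 and uses socket.socketpair(), which returns two sockets that are already connected to each other inside one process; the variable names are illustrative.

```python
import socket

# Two connected endpoints; data sent on one arrives at the other
left, right = socket.socketpair()

# left writes, right reads
left.sendall(b"ping")
received_by_right = right.recv(1024)

# the same pair of sockets also carries data in the other direction
right.sendall(b"pong")
received_by_left = left.recv(1024)

left.close()
right.close()

print(received_by_right, received_by_left)
```

Unlike a file, which is either read or written at a given position, each socket here is used for both sending and receiving.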
World’s Simplest Web Browser
A Python program can use HTTP by making a connection to a web server and then following the rules of the HTTP protocol.
World’s Simplest Web Browser
(Contd.)
The following program makes a connection to port 80 on the server and prints the data that the server sends back.
import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))
# Send the request as bytes; a blank line ends the request
mysock.send(b'GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\r\n\r\n')
while True:
    # Read up to 512 bytes at a time until the server closes the connection
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode())
mysock.close()
Output:
HTTP/1.1 200 OK
Date: Sun, 14 Mar 2010 23:52:41 GMT
Server: Apache
Last-Modified: Tue, 29 Dec 2009 01:31:22 GMT
ETag: "143c1b33-a7-4b395bea"
Accept-Ranges: bytes
Content-Length: 167
Connection: close
Content-Type: text/plain
Retrieving Image over HTTP
Copying the data: accumulate the received data in a string, trim off the headers, and save the image data to a file.
Each output line shows the number of bytes received in one read, followed by the running total.
Output:
$ python urljpeg.py
2920 2920
1460 4380
1460 5840
1460 7300
...
1460 62780
1460 64240
2920 67160
Header length 240
HTTP/1.1 200 OK
Date: Sat, 02 Nov 2013 02:15:07 GMT
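The urljpeg.py program itself is not shown. The trimming step can be sketched as follows, assuming Python 3; the function names, the made-up chunks, and the output filename passed to save_body are hypothetical. The headers end at the first blank line, which appears in the raw data as two consecutive CRLF pairs.

```python
def split_http_response(raw):
    """Split raw response bytes into (header text, body bytes)."""
    header_end = raw.find(b"\r\n\r\n")
    if header_end == -1:
        raise ValueError("no blank line separating headers from body")
    headers = raw[:header_end].decode("iso-8859-1")
    body = raw[header_end + 4:]  # skip the 4 bytes of the blank-line marker
    return headers, body


def save_body(raw, filename):
    """Write only the body (for example, image data) to a file in binary mode."""
    headers, body = split_http_response(raw)
    with open(filename, "wb") as out:
        out.write(body)
    return len(body)


# Made-up chunks standing in for the results of successive recv() calls
chunks = [
    b"HTTP/1.1 200 OK\r\nContent-Type: image/jpeg\r\n",
    b"\r\n\xff\xd8\xff",
    b"\xe0fake-jpeg-bytes",
]
picture = b"".join(chunks)  # accumulate everything that was received

headers, body = split_http_response(picture)
print("Header length", len(headers))
print(headers)
```

Calling save_body(picture, "stuff.jpg") would then write only the image bytes to disk; the filename is just an example.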
Retrieving Web Pages
with “urllib”
We can manually send and receive data over HTTP using the socket library, but “urllib” offers a simpler way.
“urllib” retrieves the web page you specify and handles all of the HTTP protocol and header details.
Output:
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
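The program that produced the output above is not shown. A minimal sketch follows, assuming Python 3, where the function lives in urllib.request; the helper name read_lines is hypothetical.

```python
import urllib.request


def read_lines(url):
    """Open a URL much like a file and return its lines without line endings."""
    with urllib.request.urlopen(url) as handle:
        # The response object can be looped over line by line, like a file
        return [line.decode().rstrip() for line in handle]
```

For example, read_lines('http://www.py4inf.com/code/romeo.txt') would return the lines of the page, provided the server is reachable. Only the body is returned; the HTTP headers are handled by urllib.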
Just a Minute
Using _______, we can treat Web
Page much like a file.
Answer: urllib
Just a Minute
A ______ is much like a file,
except that it provides a two-way
connection between two
programs with a single socket.
Answer: Socket
Activity: Implementing Network Programming
Problem Statement:
Write a socket program so that it counts the number of characters it has received and stops
displaying any text after it has shown 3000 characters. The program should retrieve the entire
document and count the total number of characters and display the count of the number of
characters at the end of the document.
Prerequisite: Internet connection must be available to run this activity.
Summary
In this session, you learned to:
Extract Data Using Regular Expression.
Combine and Search Data.
Implement Networking in Python.
Explore HTTP.
Retrieve Images using HTTP.
Retrieve Web Pages with URL Library.