
Friday, May 4, 2012

Analyzing your Gmail with Matplotlib

Recently, I read this post about using Mathematica to analyze a Gmail account. I found it very interesting, so I worked with imaplib and matplotlib to recreate two of the graphs it showed:
  • A diurnal plot, which shows the date and time each email was sent (or received), with years running along the x axis and times of day on the y axis.
  • And a daily distribution histogram, which represents the distribution of emails sent by time of day.
In order to plot those graphs I created three functions. The first one retrieves the headers of the emails we want to analyze:
from imaplib import IMAP4_SSL
from datetime import date,timedelta,datetime
from time import mktime
from email.utils import parsedate
from pylab import plot_date,show,xticks,date2num
from pylab import figure,hist,num2date
from matplotlib.dates import DateFormatter

def getHeaders(address,password,folder,d):
 """ retrieve the headers of the emails 
     from d days ago until now """
 # imap connection
 mail = IMAP4_SSL('imap.gmail.com')
 mail.login(address,password)
 mail.select(folder) 
 # retrieving the uids
 interval = (date.today() - timedelta(d)).strftime("%d-%b-%Y")
 result, data = mail.uid('search', None, 
                      '(SENTSINCE {date})'.format(date=interval))
 # retrieving the headers
 result, data = mail.uid('fetch', data[0].replace(' ',','), 
                         '(BODY[HEADER.FIELDS (DATE)])')
 mail.close()
 mail.logout()
 return data
The second one draws the diurnal plot:
def diurnalPlot(headers):
 """ diurnal plot of the emails, 
     with years running along the x axis 
     and times of day on the y axis.
 """
 xday = []
 ytime = []
 for h in headers: 
  if len(h) > 1: # skip the ')' separators in the fetch result
   timestamp = mktime(parsedate(h[1][5:].replace('.',':')))
   mailstamp = datetime.fromtimestamp(timestamp)
   xday.append(mailstamp)
   # time the email arrived
   # note that year, month and day are not important here
   y = datetime(2010,10,14, 
     mailstamp.hour, mailstamp.minute, mailstamp.second)
   ytime.append(y)

 plot_date(xday,ytime,'.',alpha=.7)
 xticks(rotation=30)
 return xday,ytime
And this is the function for the daily distribution histogram:
def dailyDistributionPlot(ytime):
 """ draw the histogram of the daily distribution """
 # converting dates to numbers
 numtime = [date2num(t) for t in ytime] 
 # plotting the histogram
 ax = figure().gca()
 _, _, patches = hist(numtime, bins=24,alpha=.5)
 # adding the labels for the x axis
 tks = [num2date(p.get_x()) for p in patches] 
 xticks(tks,rotation=75)
 # formatting the dates on the x axis
 ax.xaxis.set_major_formatter(DateFormatter('%H:%M'))
Now we have everything we need to make the graphs. Let's analyze the incoming mail of the last 5 years:
print 'Fetching emails...'
headers = getHeaders('[email protected]',
                      'ofcourseiamsupersexy','inbox',365*5)

print 'Plotting some statistics...'
xday,ytime = diurnalPlot(headers)
dailyDistributionPlot(ytime)
print len(xday),'Emails analysed.'
show()
The result appears as follows:

[diurnal plot and daily distribution histogram of the incoming mail]
We can analyze the outgoing mail just by selecting the folder '[Gmail]/Sent Mail':
print 'Fetching emails...'
headers = getHeaders('[email protected]',
                     'ofcourseiamsupersexy','[Gmail]/Sent Mail',365*5)

print 'Plotting some statistics...'
xday,ytime = diurnalPlot(headers)
dailyDistributionPlot(ytime)
print len(xday),'Emails analysed.'
show()
And this is the result:

[diurnal plot and daily distribution histogram of the sent mail]
Thursday, June 9, 2011

Crawling the web with SGMLParser

In this example we will use SGMLParser in order to build a simple web crawler.
import urllib
from random import choice
from sgmllib import SGMLParser

class LinkExplorer(SGMLParser): 
 def reset(self):                              
  SGMLParser.reset(self) 
  self.links = [] # list with the urls

 def start_a(self, attrs):
  """ fill the links with the links in the page """
  for k in attrs:
   if k[0] == 'href' and k[1].startswith('http'): 
    self.links.append(k[1])

def explore(parser,s_url,maxvisit=10,iter=0):
 """ pick a random link in the page s_url
     and follow its links recursively """
 if iter < maxvisit: # it will stop after maxvisit iterations
  print '(',iter,') I am in',s_url
  usock = urllib.urlopen(s_url) # download the page
  parser.reset() # reset the list
  parser.feed(usock.read()) # parse the current page
  if len(parser.links) > 0:
   explore(parser,choice(parser.links),maxvisit,iter+1)
  else: # if the page has no links to follow
   print 'the page has no links'

# test the crawler starting from the Python website
parser = LinkExplorer()
explore(parser,"http://www.python.org/")
Let's go!
( 0 ) I am in http://www.python.org/
( 1 ) I am in http://wiki.python.org/moin/NumericAndScientific
( 2 ) I am in http://numpy.scipy.org/
( 3 ) I am in http://sphinx.pocoo.org/
( 4 ) I am in http://www.bitbucket.org/birkenfeld/sphinx/issues/
( 5 ) I am in http://blog.bitbucket.org
( 6 ) I am in http://haproxy.1wt.eu/
( 7 ) I am in http://www.olivepeak.com/blog/posts/read/free-your-port-80-with-haproxy
( 8 ) I am in http://www.olivepeak.com/peaknotes/
( 9 ) I am in http://notes.olivepeak.com/account/create
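In practice, some of the pages reached this way fail to download or are not valid HTML, which crashes the crawler. A slightly more defensive variant (just a sketch, wrapping the fetch and parse in a try/except) could look like this:
def exploreSafe(parser,s_url,maxvisit=10,iter=0):
 """ like explore, but survives unreachable
     or unparsable pages """
 if iter < maxvisit:
  print '(',iter,') I am in',s_url
  try:
   usock = urllib.urlopen(s_url) # download the page
   parser.reset() # reset the list
   parser.feed(usock.read()) # parse the current page
  except Exception: # network errors, malformed html, ...
   print 'could not fetch or parse the page'
   return
  if len(parser.links) > 0:
   exploreSafe(parser,choice(parser.links),maxvisit,iter+1)
  else: # if the page has no links to follow
   print 'the page has no links'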

Monday, May 16, 2011

How to create an IRC echo bot

The example shows how to connect to an IRC server and how to read data from and send data to it.
import socket

def reply(privmsg, sock):
 """ decode the string with the message,
     something like ':nickname!~hostname PRIVMSG my_nickname :hi'
     and echo the received message back to nickname """
 nick = privmsg[1:privmsg.find('!')]
 msg = privmsg[privmsg.find(':',1)+1:] # the text after the second ':'
 sock.send('PRIVMSG '+nick+' :'+msg) # sending to the socket


print 'Connecting...'
s = socket.socket()
s.connect(('irc.freenode.net',6667)) # connect to the IRC server
s.send('NICK GlowPy\n')
s.send('USER PythonBot my.host.name humm : My Real Name\n')

while True:
 data = s.recv(1024) # reading from the socket
 print data
 if data.find('PRIVMSG') > 0: # if the string is a message
  reply(data,s)
A conversation with the bot:
<JustGlowing> hi there!
<GlowPy> hi there!
<JustGlowing> how are you?
<GlowPy> how are you?
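One caveat: like most IRC servers, Freenode periodically sends a PING and disconnects clients that never answer it. A long-running bot should add a PONG reply to the read loop, along these lines:
while True:
 data = s.recv(1024) # reading from the socket
 print data
 if data.startswith('PING'): # answer the server's keepalive
  s.send('PONG '+data.split()[1]+'\n')
 if data.find('PRIVMSG') > 0: # if the string is a message
  reply(data,s)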

Wednesday, May 11, 2011

How to download the profile picture of a Facebook user

The following function uses the Facebook Graph API to retrieve the URL of the profile picture given the user's id:
import urllib
import simplejson

def getProfilePicUrl(user_id):
 api_query = urllib.urlopen('https://graph.facebook.com/'+user_id)
 user_data = simplejson.loads(api_query.read()) # decode the json response
 return user_data['picture']

When we visit a Facebook profile, the user id is displayed in the address of the page. This is the address of the Coca-Cola page, where the user id is the number after id=:

http://www.facebook.com/profile.php?id=40796308305
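The id can also be extracted from the address programmatically; a small sketch using only the standard library:
from urlparse import urlparse, parse_qs

url = 'http://www.facebook.com/profile.php?id=40796308305'
user_id = parse_qs(urlparse(url).query)['id'][0] # '40796308305'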

Now we can use the id to save the profile picture of Coca-Cola.

pic_url = getProfilePicUrl('40796308305')
pic = urllib.urlopen(pic_url) # retrieve the picture
f = open("cocacola.jpg","wb")
f.write(pic.read()) # save the pic
f.close()
The script will save the picture on the disk.

Warning: Coca-Cola has a public profile; non-public profiles need authentication.
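For profiles that require authentication, the Graph API expects an OAuth access token to be passed along with the request. A hypothetical sketch, assuming you have already obtained a valid token (the access_token parameter below):
def getProfilePicUrlAuth(user_id, access_token):
 """ hypothetical authenticated variant of getProfilePicUrl """
 api_query = urllib.urlopen('https://graph.facebook.com/'+user_id
                            +'?access_token='+access_token)
 return simplejson.loads(api_query.read())['picture']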

Monday, May 2, 2011

How to create a chart with Google Chart API

The example shows how to create a scatter plot using the Google Chart API.
import random
import urllib

def list2String(x):
 """ from a list like [1,2,5]
     return a string like '1,2,5' """
 data = ""
 for i in x:
  data += str(i)+","
 return data[0:len(data)-1]
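The same conversion can be written more compactly with str.join:
def list2String(x):
 """ equivalent version using join """
 return ",".join(str(i) for i in x)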

def makeChart(x,y,filename):
 query_url = "http://chart.apis.google.com/chart?chxt=x,y&chs=300x200&cht=s&chd=t:"
 query_url += list2String(x)+"|"+list2String(y)
 chart = urllib.urlopen(query_url) # retrieve the chart
 print "saving",query_url
 f = open(filename,"wb")
 f.write(chart.read()) # save the pic
 f.close()

x = random.sample(range(0,100),10) # lists with random
y = random.sample(range(0,100),10) # values in [0, 100)
makeChart(x,y,"chart.png")
You can embed the picture in a web page:
<img alt="Google chart example" src="http://chart.apis.google.com/chart?chxt=x,y&amp;chs=300x200&amp;cht=s&amp;chd=t:64,10,18,42,49,83,73,27,44,51|77,89,13,87,27,34,38,44,22,42" />
Or use it from the disk.

[Google chart example]

Monday, April 25, 2011

How to use the Twitter search API

The example shows how to search Twitter without using third party libraries. We will use the JSON data-interchange format provided by Twitter.
import urllib
import simplejson

def searchTweets(query):
 search = urllib.urlopen("http://search.twitter.com/search.json?q="+query)
 results = simplejson.loads(search.read()) # decode the json response
 for result in results["results"]: # a list of dictionaries, one per tweet
  print "*",result["text"],"\n"

# we will search tweets about "fc liverpool" football team
searchTweets("fc+liverpool")
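Here the space in the query is encoded by hand as a plus sign; urllib.quote_plus performs the same encoding for arbitrary queries:
from urllib import quote_plus
searchTweets(quote_plus("fc liverpool")) # same as searchTweets("fc+liverpool")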
The program will print the most popular tweets about FC Liverpool:
* Poulsen set for return home? http://tinyurl.com/3vnyc9r 

* Now: Watch live press conf - Liverpool FC http://lfc.tv/GYb 

* Who wants 20 percent off a Liverpool fc stuff 

* Liverpool FC manager Gerard Houllier in Birmingham hospital after health s... http://bit.ly/glcgxJ #LFC 

* RT @darsh710: Liverpool Echo: Kenny: Alberto Aquilani welcome back at LFC after Juve loan ends http://bit.ly/emp5QZ  #LFC @ShilChandi @KaushP @BolaAnt 

* RT @empireofthekop: Liverpool Echo : News: Kenny Dalglish: Alberto Aquilani welcome back at Liverpool FC after Juventus loan ends http://bit.ly/emp5QZ #LFC #fb 

Friday, April 22, 2011

How to implement a multithreaded echo server

The example implements a multithreaded echo server. Every incoming connection is handed off to a worker thread that processes its requests.
import socket
import thread

def handle(client_socket, address):
 while True:
  data = client_socket.recv(512)
  if data.startswith("exit"): # stop when the client sends "exit"
   client_socket.close() # close the connection with the client
   break
  client_socket.send(data) # echo the received string

# open a listening socket on port 1075
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((socket.gethostname(),1075))
server.listen(2)

while True: # listen for incoming connections
 client_socket, address = server.accept()
 print "request from the ip",address[0]
 # spawn a new thread that runs the function handle()
 thread.start_new_thread(handle, (client_socket, address)) 
And now we can use telnet to communicate with the server application:
$ telnet localhost 1075
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hi
hi
echo this!
echo this!
exit
Connection closed by foreign host.
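The same exchange can also be scripted with a few lines of Python instead of telnet; a minimal sketch:
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((socket.gethostname(),1075))
client.send('echo this!')
print client.recv(512) # prints 'echo this!'
client.send('exit') # ask the server to close the connection
client.close()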

Thursday, April 21, 2011

How to retrieve tweets from Twitter

How to retrieve the most recent tweets of a Twitter user without third party libraries. The example uses the XML format provided by Twitter to describe a user timeline.
import urllib
import xml.dom.minidom as minidom

def printTweets(username):
 timeline_xml = urllib.urlopen("http://twitter.com/statuses/user_timeline.xml?screen_name="+username)
 doc = minidom.parse(timeline_xml) # we're using the twitter xml format
 tweets = doc.getElementsByTagName("text") # tweet text is in <text> tags
 
 for tweet in tweets:
  print "tweet:",tweet.childNodes[0].data,"\n"

# call our function
printTweets("JustGlowing")
The function will print the 20 most recent tweets from JustGlowing:
tweet: Security researchers find iPhones, 3G iPads track user location http://t.co/Fg9TIQy via @arstechnica 

tweet: White Blood Cells Solve Traveling-Salesman Problem http://zite.to/fMTv9J - RT @semanticvoid 

tweet: #IWouldTrade traffic in the city for a wonderful beach 

tweet: The time you enjoy wasting is not wasted time ~ Bertrand Russel 

tweet: numpy is a great tool, it make you feel like using matlab but you're using a free #python library #in

...