CTP-MD5 ch3

The document provides an overview of databases and SQL, explaining what a database is and how to use SQLite for data manipulation. It covers basic SQL commands such as CREATE, INSERT, SELECT, UPDATE, and DELETE, along with examples of how to implement them in Python. Additionally, it describes a Twitter spidering application that retrieves and stores Twitter account data in a database using SQL commands.

Uploaded by

sindhud

MODULE - 05

Chapter 03

Using Databases and SQL

What is a database?
• A database is a file that is organized for storing data.
• Database software maintains its performance by building indexes as data is added to the
database to allow the computer to jump quickly to a particular entry.
• SQLite is well suited to some of the data manipulation problems that we see in
Informatics, such as the Twitter spidering application that we describe in this
chapter.

Database concepts
• When you first look at a database it looks like a spreadsheet with multiple sheets. The
primary data structures in a database are tables, rows, and columns.
• In technical descriptions of relational databases the concepts of table, row, and column are
more formally referred to as relation, tuple, and attribute, respectively.

Figure 1: Relational Databases (a table is a relation, a row is a tuple, and a column is an attribute)

Database Browser for SQLite

• Although this chapter uses Python to work with data in SQLite database files, many
operations can be done more conveniently using software called the Database Browser
for SQLite, which is freely available from:
http://sqlitebrowser.org/
• Using the browser, you can easily create tables, insert data, edit data, or run simple SQL
queries on the data in the database.

Creating a database table
• The code to create a database file and a table named Tracks with two columns in the
database is as follows:

import sqlite3

conn = sqlite3.connect('music.sqlite')
cur = conn.cursor()

cur.execute('DROP TABLE IF EXISTS Tracks')
cur.execute('CREATE TABLE Tracks (title TEXT, plays INTEGER)')

conn.close()

• A cursor is like a file handle that we can use to perform operations on the data stored in the
database. Calling cursor() is very similar conceptually to calling open() when dealing with
text files.

Figure 2: A Database Cursor

• Database commands are expressed in a special language that has been standardized across
many different database vendors to allow us to learn a single database language. The
database language is called Structured Query Language, or SQL for short.
• The first SQL command removes the Tracks table from the database if it exists. This
pattern simply allows us to run the same program to create the Tracks table over
and over again without causing an error.

cur.execute('DROP TABLE IF EXISTS Tracks')

• The second command creates a table named Tracks with a text column named
title and an integer column named plays.

cur.execute('CREATE TABLE Tracks (title TEXT, plays INTEGER)')

• Now that we have created a table named Tracks, we can put some data into that table using
the SQL INSERT operation. Again, we begin by making a connection to the database
and obtaining the cursor. We can then execute SQL commands using the cursor.
• The SQL INSERT command indicates which table we are using and then defines a new
row by listing the fields we want to include (title, plays) followed by the VALUES we
want placed in the new row.
• We specify the values as question marks (?, ?) to indicate that the actual values are
passed in as a tuple ('My Way', 15) as the second parameter to the execute() call.

import sqlite3

conn = sqlite3.connect('music.sqlite')
cur = conn.cursor()

cur.execute('INSERT INTO Tracks (title, plays) VALUES (?, ?)',
    ('Thunderstruck', 20))
cur.execute('INSERT INTO Tracks (title, plays) VALUES (?, ?)',
    ('My Way', 15))
conn.commit()

print('Tracks:')
cur.execute('SELECT title, plays FROM Tracks')
for row in cur:
    print(row)

cur.execute('DELETE FROM Tracks WHERE plays < 100')
conn.commit()

cur.close()

Tracks

title          plays
Thunderstruck  20
My Way         15

Figure 3: Rows in a Table

• First we INSERT two rows into our table and use commit() to force the data to be
written to the database file.
• Then we use the SELECT command to retrieve the rows we just inserted from the table.
On the SELECT command, we indicate which columns we would like (title, plays)
and indicate which table we want to retrieve the data from.
• After we execute the SELECT statement, the cursor is something we can loop
through in a for statement. For efficiency, the cursor does not read all of the data
from the database when we execute the SELECT statement. Instead, the data is read on
demand as we loop through the rows in the for statement.
The output of the program is as follows:

Tracks:
('Thunderstruck', 20)
('My Way', 15)

• The DELETE command shows the use of a WHERE clause that allows us to express a
selection criterion so that we can ask the database to apply the command to only the rows
that match the criterion.
• In this example the criterion (plays < 100) happens to match all the rows, so we empty
the table and can run the program repeatedly. After the DELETE is performed, we
also call commit() to force the data to be removed from the database.
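The point about the cursor reading rows on demand can be sketched as follows. This is only an illustration: it assumes an in-memory database (':memory:') instead of music.sqlite so that no file is created, and it uses the fetchone() and fetchall() methods of Python's sqlite3 cursor alongside the for loop.

```python
import sqlite3

# Assumption: ':memory:' keeps the sketch self-contained; the chapter uses music.sqlite.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE Tracks (title TEXT, plays INTEGER)')
cur.executemany('INSERT INTO Tracks (title, plays) VALUES (?, ?)',
                [('Thunderstruck', 20), ('My Way', 15)])
conn.commit()

# fetchone() pulls a single row from the pending result set.
cur.execute('SELECT title, plays FROM Tracks ORDER BY title')
first = cur.fetchone()

# Looping over the cursor continues with whatever rows are left.
remaining = [row for row in cur]

# fetchall() reads every row of a fresh query into a list at once.
cur.execute('SELECT title, plays FROM Tracks ORDER BY title')
everything = cur.fetchall()

print(first)
print(remaining)
print(everything)
conn.close()
```

Looping over the cursor is usually the better choice for large tables, since fetchall() holds every row in memory at the same time.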

Structured Query Language summary
• So far, we have been using the Structured Query Language in our Python examples and have
covered many of the basics of the SQL commands. In this section, we look at the SQL
language in particular and give an overview of SQL syntax.
• Since there are so many different database vendors, the Structured Query Language (SQL)
was standardized so we could communicate in a portable manner to database systems from
multiple vendors.
• A relational database is made up of tables, rows, and columns. The columns
generally have a type such as text, numeric, or date data. When we create a table,
we indicate the names and types of the columns:

CREATE TABLE Tracks (title TEXT, plays INTEGER)

• To insert a row into a table, we use the SQL INSERT command:

INSERT INTO Tracks (title, plays) VALUES ('My Way', 15)

• The INSERT statement specifies the table name, then a list of the
fields/columns that you would like to set in the new row, and then the keyword
VALUES and a list of corresponding values for each of the fields.
• The SQL SELECT command is used to retrieve rows and columns from a
database. The SELECT statement lets you specify which columns you would
like to retrieve as well as a WHERE clause to select which rows you would like
to see. It also allows an optional ORDER BY clause to control the sorting of the
returned rows.

SELECT * FROM Tracks WHERE title = 'My Way'

• Using * indicates that you want the database to return all of the columns for each
row that matches the WHERE clause.
You can request that the returned rows be sorted by one of the fields as follows:

SELECT title, plays FROM Tracks ORDER BY title
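As a rough sketch of how these SELECT forms look from Python, the snippet below runs a WHERE query and an ORDER BY query against a throwaway in-memory database. The third row ('Example Track') is made-up sample data, and the DESC keyword, which reverses the sort order, is standard SQL that the text above does not mention.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # assumption: throwaway in-memory database
cur = conn.cursor()
cur.execute('CREATE TABLE Tracks (title TEXT, plays INTEGER)')
cur.executemany('INSERT INTO Tracks (title, plays) VALUES (?, ?)',
                [('Thunderstruck', 20), ('My Way', 15),
                 ('Example Track', 18)])  # 'Example Track' is made-up data

# WHERE selects rows; * returns all columns of each matching row.
cur.execute('SELECT * FROM Tracks WHERE title = ?', ('My Way',))
matches = cur.fetchall()

# ORDER BY sorts the returned rows; DESC puts the largest value first.
cur.execute('SELECT title, plays FROM Tracks ORDER BY plays DESC')
by_plays = cur.fetchall()

print(matches)
print(by_plays)
conn.close()
```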

• To remove a row, you need a WHERE clause on an SQL DELETE statement. The
WHERE clause determines which rows are to be deleted:

DELETE FROM Tracks WHERE title = 'My Way'

• It is possible to UPDATE a column or columns within one or more rows in a table
using the SQL UPDATE statement as follows:

UPDATE Tracks SET plays = 16 WHERE title = 'My Way'

• The UPDATE statement specifies a table and then a list of fields and values to
change after the SET keyword and then an optional WHERE clause to select the
rows that are to be updated. A single UPDATE statement will change all of the
rows that match the WHERE clause. If a WHERE clause is not specified, it
performs the UPDATE on all of the rows in the table.
• These four basic SQL commands (INSERT, SELECT, UPDATE, and
DELETE) allow the four basic operations needed to create and maintain
data.
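The difference between an UPDATE with and without a WHERE clause can be seen in a small sketch. It assumes an in-memory database and reads cur.rowcount, the sqlite3 cursor attribute that reports how many rows the last statement changed.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # assumption: throwaway in-memory database
cur = conn.cursor()
cur.execute('CREATE TABLE Tracks (title TEXT, plays INTEGER)')
cur.executemany('INSERT INTO Tracks (title, plays) VALUES (?, ?)',
                [('Thunderstruck', 20), ('My Way', 15)])

# With WHERE: only the matching row changes.
cur.execute('UPDATE Tracks SET plays = 16 WHERE title = ?', ('My Way',))
changed_with_where = cur.rowcount

# Without WHERE: every row in the table changes.
cur.execute('UPDATE Tracks SET plays = 0')
changed_without_where = cur.rowcount
conn.commit()

cur.execute('SELECT title, plays FROM Tracks ORDER BY title')
rows = cur.fetchall()
print(changed_with_where, changed_without_where, rows)
conn.close()
```

Because a missing WHERE clause silently touches every row, it is worth checking rowcount after an UPDATE or DELETE whose scope matters.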
Spidering Twitter using a database
• In this section, we will create a simple spidering program that will go through Twitter
accounts and build a database of them.
Note: Be very careful when running this program. You do not want to pull
too much data or run the program for too long and end up having your
Twitter access shut off.
Here is the source code for our Twitter spidering application:

from urllib.request import urlopen
import urllib.error
import twurl
import json
import sqlite3
import ssl

TWITTER_URL = 'https://api.twitter.com/1.1/friends/list.json'

conn = sqlite3.connect('spider.sqlite')
cur = conn.cursor()

cur.execute('''
    CREATE TABLE IF NOT EXISTS Twitter
    (name TEXT, retrieved INTEGER, friends INTEGER)''')

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

while True:
    acct = input('Enter a Twitter account, or quit: ')
    if (acct == 'quit'): break
    if (len(acct) < 1):
        cur.execute('SELECT name FROM Twitter WHERE retrieved = 0 LIMIT 1')
        try:
            acct = cur.fetchone()[0]
        except:
            print('No unretrieved Twitter accounts found')
            continue

    url = twurl.augment(TWITTER_URL, {'screen_name': acct, 'count': '5'})
    print('Retrieving', url)
    connection = urlopen(url, context=ctx)
    data = connection.read().decode()
    headers = dict(connection.getheaders())

    print('Remaining', headers['x-rate-limit-remaining'])
    js = json.loads(data)
    # Debugging
    # print(json.dumps(js, indent=4))

    cur.execute('UPDATE Twitter SET retrieved=1 WHERE name = ?', (acct, ))

    countnew = 0
    countold = 0
    for u in js['users']:
        friend = u['screen_name']
        print(friend)
        cur.execute('SELECT friends FROM Twitter WHERE name = ? LIMIT 1',
            (friend, ))
        try:
            count = cur.fetchone()[0]
            cur.execute('UPDATE Twitter SET friends = ? WHERE name = ?',
                (count+1, friend))
            countold = countold + 1
        except:
            cur.execute('''INSERT INTO Twitter (name, retrieved, friends)
                VALUES (?, 0, 1)''', (friend, ))
            countnew = countnew + 1
    print('New accounts=', countnew, ' revisited=', countold)
    conn.commit()

cur.close()

• Once we retrieve the list of friends and statuses, we loop through all of the user items in
the returned JSON and retrieve the screen_name for each user. Then we use the SELECT
statement to see if we already have stored this particular screen_name in the database and
retrieve the friend count (friends) if the record exists.

countnew = 0
countold = 0
for u in js['users']:
    friend = u['screen_name']
    print(friend)
    cur.execute('SELECT friends FROM Twitter WHERE name = ? LIMIT 1',
        (friend, ))
    try:
        count = cur.fetchone()[0]
        cur.execute('UPDATE Twitter SET friends = ? WHERE name = ?',
            (count+1, friend))
        countold = countold + 1
    except:
        cur.execute('''INSERT INTO Twitter (name, retrieved, friends)
            VALUES (?, 0, 1)''', (friend, ))
        countnew = countnew + 1
print('New accounts=', countnew, ' revisited=', countold)
conn.commit()
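The same "update the count if the row exists, otherwise insert it" pattern can be tried without any Twitter access. In this sketch the list screen_names is hypothetical data standing in for js['users'], the database is in memory, and the bare except of the original is narrowed to TypeError, which is what fetchone()[0] raises when no row is found.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # assumption: throwaway in-memory database
cur = conn.cursor()
cur.execute('CREATE TABLE Twitter (name TEXT, retrieved INTEGER, friends INTEGER)')

# Hypothetical screen names standing in for js['users']; 'lhawthorn' repeats.
screen_names = ['opencontent', 'lhawthorn', 'lhawthorn']

countnew = 0
countold = 0
for friend in screen_names:
    cur.execute('SELECT friends FROM Twitter WHERE name = ? LIMIT 1', (friend,))
    try:
        count = cur.fetchone()[0]          # raises TypeError if no row exists
        cur.execute('UPDATE Twitter SET friends = ? WHERE name = ?',
                    (count + 1, friend))
        countold = countold + 1
    except TypeError:
        cur.execute('INSERT INTO Twitter (name, retrieved, friends) VALUES (?, 0, 1)',
                    (friend,))
        countnew = countnew + 1
conn.commit()

cur.execute('SELECT name, friends FROM Twitter ORDER BY name')
rows = cur.fetchall()
print('New accounts=', countnew, ' revisited=', countold)
print(rows)
conn.close()
```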

So the first time the program runs and we enter a Twitter account, the program runs as
follows:

Enter a Twitter account, or quit: drchuck
Retrieving http://api.twitter.com/1.1/friends ...
New accounts= 20 revisited= 0
Enter a Twitter account, or quit: quit

A simple database dumper lets us look at what is in our spider.sqlite file:

import sqlite3

conn = sqlite3.connect('spider.sqlite')
cur = conn.cursor()
cur.execute('SELECT * FROM Twitter')
count = 0
for row in cur:
    print(row)
    count = count + 1
print(count, 'rows.')
cur.close()

This program simply opens the database and selects all of the columns of all of the
rows in the table Twitter, then loops through the rows and prints out each row.

If we run the spider program again and press enter at each prompt, the session looks like this:

Enter a Twitter account, or quit:
Retrieving http://api.twitter.com/1.1/friends ...
New accounts= 18 revisited= 2
Enter a Twitter account, or quit:
Retrieving http://api.twitter.com/1.1/friends ...
New accounts= 17 revisited= 3
Enter a Twitter account, or quit: quit

Since we pressed enter (i.e., we did not specify a Twitter account), the following code is
executed:

if (len(acct) < 1):
    cur.execute('SELECT name FROM Twitter WHERE retrieved = 0 LIMIT 1')
    try:
        acct = cur.fetchone()[0]
    except:
        print('No unretrieved twitter accounts found')
        continue

• We use the SQL SELECT statement to retrieve the name of the first (LIMIT 1) user who
still has their “have we retrieved this user” value set to zero. We also use the fetchone()[0]
pattern within a try/except block to either extract a screen_name from the retrieved data or
put out an error message and loop back up.
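The try/except works because fetchone() returns None when the SELECT matches no rows, so indexing the result with [0] fails. An alternative is to test for None directly, as in the sketch below. The helper next_unretrieved() is hypothetical (it is not part of the original program) and the database is in memory.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # assumption: throwaway in-memory database
cur = conn.cursor()
cur.execute('CREATE TABLE Twitter (name TEXT, retrieved INTEGER, friends INTEGER)')

def next_unretrieved(cur):
    """Return the name of one unretrieved account, or None if there is none.
    Hypothetical helper, not part of the original program."""
    cur.execute('SELECT name FROM Twitter WHERE retrieved = 0 LIMIT 1')
    row = cur.fetchone()      # None when the SELECT matched no rows
    if row is None:
        return None
    return row[0]

before = next_unretrieved(cur)   # the table is still empty
cur.execute('INSERT INTO Twitter (name, retrieved, friends) VALUES (?, 0, 1)',
            ('drchuck',))
after = next_unretrieved(cur)

print(before, after)
conn.close()
```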
If we successfully retrieved an unprocessed screen_name, we retrieve their data as
follows:

url = twurl.augment(TWITTER_URL, {'screen_name': acct, 'count': '20'})
print('Retrieving', url)
connection = urlopen(url, context=ctx)
data = connection.read().decode()
js = json.loads(data)

cur.execute('UPDATE Twitter SET retrieved=1 WHERE name = ?', (acct, ))

If we run the friend program and press enter twice to retrieve the next unvisited
friend’s friends, then run the dumping program, it will give us the following output:

Basic data modeling
• The real power of a relational database is when we create multiple tables and
make links between those tables.
• The act of deciding how to break up your application data into multiple tables
and establishing the relationships between the tables is called data modeling.
• The design document that shows the tables and their relationships is called a
data model.
• For example, we can create a new table that keeps track of pairs of friends. The
following is a simple way of making such a table:

CREATE TABLE Pals (from_friend TEXT, to_friend TEXT)

Each time we encounter a person who drchuck is following, we would insert a row of
the form:

INSERT INTO Pals (from_friend, to_friend) VALUES ('drchuck', 'lhawthorn')

As we are processing the 20 friends from the drchuck Twitter feed, we will insert
20 records with “drchuck” as the first parameter so we will end up duplicating the
string many times in the database.
• To avoid this duplication, we store each account name only once, in a table named
People. The People table has an additional column to store the numeric key associated
with the row for this Twitter user. SQLite has a feature that automatically adds the key
value for any row we insert into a table using a special type of data column (INTEGER
PRIMARY KEY).
We can create the People table with this additional id column as follows:

CREATE TABLE People
    (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)
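A quick way to see the automatic key assignment is to insert rows without mentioning the id column and then read the ids back. The sketch assumes an in-memory database; the three names are accounts that appear elsewhere in this chapter.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # assumption: throwaway in-memory database
cur = conn.cursor()
cur.execute('''CREATE TABLE People
            (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)''')

# The INSERT never mentions id; SQLite fills it in for each new row.
for name in ['drchuck', 'opencontent', 'lhawthorn']:
    cur.execute('INSERT INTO People (name, retrieved) VALUES (?, 0)', (name,))
conn.commit()

cur.execute('SELECT id, name FROM People ORDER BY id')
rows = cur.fetchall()
print(rows)
conn.close()
```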

• Now instead of creating the table Pals above, we create a table called Follows with
two integer columns from_id and to_id and a constraint on the table that the combination of
from_id and to_id must be unique in this table (i.e., we cannot insert duplicate rows)
in our database.

CREATE TABLE Follows
    (from_id INTEGER, to_id INTEGER, UNIQUE(from_id, to_id))

• When we add UNIQUE clauses to our tables, we are communicating a set of rules that we
are asking the database to enforce when we attempt to insert records.
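What "enforce" means in practice can be sketched with a plain INSERT (no OR IGNORE). In Python's sqlite3 module a violated UNIQUE constraint surfaces as sqlite3.IntegrityError. The database here is in memory and the id values are arbitrary sample numbers.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # assumption: throwaway in-memory database
cur = conn.cursor()
cur.execute('''CREATE TABLE Follows
            (from_id INTEGER, to_id INTEGER, UNIQUE(from_id, to_id))''')

cur.execute('INSERT INTO Follows (from_id, to_id) VALUES (?, ?)', (1, 2))
try:
    # Same pair again: the database refuses the row.
    cur.execute('INSERT INTO Follows (from_id, to_id) VALUES (?, ?)', (1, 2))
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print('Rejected by the database:', err)

# The reversed pair is a different combination, so it is accepted.
cur.execute('INSERT INTO Follows (from_id, to_id) VALUES (?, ?)', (2, 1))
cur.execute('SELECT COUNT(*) FROM Follows')
total = cur.fetchone()[0]
print(total, 'rows.')
conn.close()
```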
Programming with multiple tables

Figure 4: Relationships Between Tables

import urllib.request, urllib.parse, urllib.error
import twurl
import json
import sqlite3
import ssl

TWITTER_URL = 'https://api.twitter.com/1.1/friends/list.json'

conn = sqlite3.connect('friends.sqlite')
cur = conn.cursor()

cur.execute('''CREATE TABLE IF NOT EXISTS People
    (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)''')
cur.execute('''CREATE TABLE IF NOT EXISTS Follows
    (from_id INTEGER, to_id INTEGER, UNIQUE(from_id, to_id))''')

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

while True:
    acct = input('Enter a Twitter account, or quit: ')
    if (acct == 'quit'): break
    if (len(acct) < 1):
        cur.execute('SELECT id, name FROM People WHERE retrieved = 0 LIMIT 1')
        try:
            (id, acct) = cur.fetchone()
        except:
            print('No unretrieved Twitter accounts found')
            continue
    else:
        cur.execute('SELECT id FROM People WHERE name = ? LIMIT 1',
            (acct, ))
        try:
            id = cur.fetchone()[0]
        except:
            cur.execute('''INSERT OR IGNORE INTO People
                (name, retrieved) VALUES (?, 0)''', (acct, ))
            conn.commit()
            if cur.rowcount != 1:
                print('Error inserting account:', acct)
                continue
            id = cur.lastrowid

    url = twurl.augment(TWITTER_URL, {'screen_name': acct, 'count': '100'})
    print('Retrieving account', acct)
    try:
        connection = urllib.request.urlopen(url, context=ctx)
    except Exception as err:
        print('Failed to Retrieve', err)
        break

    data = connection.read().decode()
    headers = dict(connection.getheaders())

    print('Remaining', headers['x-rate-limit-remaining'])

    try:
        js = json.loads(data)
    except:
        print('Unable to parse json')
        print(data)
        break

    # Debugging
    # print(json.dumps(js, indent=4))

    if 'users' not in js:
        print('Incorrect JSON received')
        print(json.dumps(js, indent=4))
        continue

    cur.execute('UPDATE People SET retrieved=1 WHERE name = ?', (acct, ))

    countnew = 0
    countold = 0
    for u in js['users']:
        friend = u['screen_name']
        print(friend)
        cur.execute('SELECT id FROM People WHERE name = ? LIMIT 1',
            (friend, ))
        try:
            friend_id = cur.fetchone()[0]
            countold = countold + 1
        except:
            cur.execute('''INSERT OR IGNORE INTO People (name, retrieved)
                VALUES (?, 0)''', (friend, ))
            conn.commit()
            if cur.rowcount != 1:
                print('Error inserting account:', friend)
                continue
            friend_id = cur.lastrowid
            countnew = countnew + 1
        cur.execute('''INSERT OR IGNORE INTO Follows (from_id, to_id)
            VALUES (?, ?)''', (id, friend_id))
    print('New accounts=', countnew, ' revisited=', countold)
    print('Remaining', headers['x-rate-limit-remaining'])
    conn.commit()
cur.close()

# Code: http://www.py4e.com/code3/twfriends.py

This program is starting to get a bit complicated, but it illustrates the patterns
that we need to use when we are using integer keys to link tables. The basic patterns
are:

1. Create tables with primary keys and constraints.

2. When we have a logical key for a person (i.e., account name) and we need the
id value for the person, depending on whether or not the person is
already in the People table we either need to: (1) look up the person in
the People table and retrieve the id value for the person or (2) add the
person to the People table and get the id value for the newly
added row.

3. Insert the row that captures the “follows” relationship.
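The three patterns above can be sketched in a few lines once the Twitter calls are taken out. The function person_id() below is a hypothetical helper (the program above writes the same logic inline with try/except), and the database is in memory rather than friends.sqlite.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # assumption: throwaway in-memory database
cur = conn.cursor()

# Pattern 1: tables with a primary key and constraints.
cur.execute('''CREATE TABLE People
            (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)''')
cur.execute('''CREATE TABLE Follows
            (from_id INTEGER, to_id INTEGER, UNIQUE(from_id, to_id))''')

def person_id(cur, name):
    """Pattern 2: look up the id for name, adding the person if needed.
    Hypothetical helper, not part of the original program."""
    cur.execute('SELECT id FROM People WHERE name = ? LIMIT 1', (name,))
    row = cur.fetchone()
    if row is not None:
        return row[0]
    cur.execute('INSERT INTO People (name, retrieved) VALUES (?, 0)', (name,))
    return cur.lastrowid

# Pattern 3: record that drchuck follows lhawthorn using the two integer keys.
from_id = person_id(cur, 'drchuck')
to_id = person_id(cur, 'lhawthorn')
cur.execute('INSERT OR IGNORE INTO Follows (from_id, to_id) VALUES (?, ?)',
            (from_id, to_id))
conn.commit()

again = person_id(cur, 'drchuck')   # the second lookup finds the existing row
print(from_id, to_id, again)
conn.close()
```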

Constraints in database tables

• As we design our table structures, we can tell the database system that we would
like it to enforce a few rules on us. These rules keep us from making mistakes and
introducing incorrect data into our tables. When we create our tables:

cur.execute('''CREATE TABLE IF NOT EXISTS People
    (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)''')
cur.execute('''CREATE TABLE IF NOT EXISTS Follows
    (from_id INTEGER, to_id INTEGER, UNIQUE(from_id, to_id))''')

We indicate that the name column in the People table must be UNIQUE. We also
indicate that the combination of the two numbers in each row of the Follows table
must be unique. These constraints keep us from making mistakes such as adding the
same relationship more than once.
We can take advantage of these constraints in the following code:

cur.execute('''INSERT OR IGNORE INTO People (name, retrieved)
    VALUES (?, 0)''', (friend, ))

We add the OR IGNORE clause to our INSERT statement to indicate that if this
particular INSERT would cause a violation of the “name must be unique” rule, the
database system is allowed to ignore the INSERT. We are using the database
constraint as a safety net to make sure we don’t inadvertently do something incorrect.
Similarly, the following code ensures that we don’t add the exact same Follows
relationship twice.

cur.execute('''INSERT OR IGNORE INTO Follows
    (from_id, to_id) VALUES (?, ?)''', (id, friend_id))

Again, we simply tell the database to ignore our attempted INSERT if it would
violate the uniqueness constraint that we specified for the Follows rows.
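A small sketch of the OR IGNORE behavior, assuming an in-memory database: the duplicate INSERT raises no exception, and cur.rowcount reports that no row was added.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # assumption: throwaway in-memory database
cur = conn.cursor()
cur.execute('''CREATE TABLE People
            (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)''')

cur.execute('INSERT OR IGNORE INTO People (name, retrieved) VALUES (?, 0)',
            ('drchuck',))
first_rowcount = cur.rowcount     # the row was inserted

cur.execute('INSERT OR IGNORE INTO People (name, retrieved) VALUES (?, 0)',
            ('drchuck',))
second_rowcount = cur.rowcount    # the duplicate was ignored, no exception

cur.execute('SELECT COUNT(*) FROM People')
total = cur.fetchone()[0]
print(first_rowcount, second_rowcount, total)
conn.close()
```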

Retrieve and/or insert a record

When we prompt the user for a Twitter account, if the account exists, we must
look up its id value. If the account does not yet exist in the People table, we must
insert the record and get the id value from the inserted row.
This is a very common pattern and is done twice in the program above. This code
shows how we look up the id for a friend’s account when we have extracted a
screen_name from a user node in the retrieved Twitter JSON.
Since over time it will be increasingly likely that the account will already be in
the database, we first check to see if the People record exists using a SELECT
statement.
If all goes well inside the try section, we retrieve the record using fetchone()
and then retrieve the first (and only) element of the returned tuple and store it in
friend_id. (In general, when a sentence starts with “if all goes well” you will find
that the code needs to use try/except.)
If the SELECT finds no matching row, fetchone() returns None, so the fetchone()[0]
code will fail and control will transfer into the except section.

friend = u['screen_name']
cur.execute('SELECT id FROM People WHERE name = ? LIMIT 1',
    (friend, ))
try:
    friend_id = cur.fetchone()[0]
    countold = countold + 1
except:
    cur.execute('''INSERT OR IGNORE INTO People (name, retrieved)
        VALUES (?, 0)''', (friend, ))
    conn.commit()
    if cur.rowcount != 1:
        print('Error inserting account:', friend)
        continue
    friend_id = cur.lastrowid
    countnew = countnew + 1

If we end up in the except code, it simply means that the row was not found, so we
must insert the row. We use INSERT OR IGNORE just to avoid errors and then
call commit() to force the database to really be updated. After the write is done, we
can check the cur.rowcount to see how many rows were affected. Since we are
attempting to insert a single row, if the number of affected rows is something other
than 1, it is an error.
If the INSERT is successful, we can look at cur.lastrowid to find out what
value the database assigned to the id column in our newly created row.

Storing the friend relationship

Once we know the key value for both the Twitter user and the friend in the JSON, it
is a simple matter to insert the two numbers into the Follows table with the
following code:

cur.execute('INSERT OR IGNORE INTO Follows (from_id, to_id) VALUES (?, ?)',
    (id, friend_id))

Notice that we let the database take care of keeping us from “double-inserting” a
relationship by creating the table with a uniqueness constraint and then adding OR
IGNORE to our INSERT statement.
Here is a sample execution of this program:

Enter a Twitter account, or quit:
No unretrieved Twitter accounts found
Enter a Twitter account, or quit: drchuck
Retrieving http://api.twitter.com/1.1/friends ...
New accounts= 20 revisited= 0
Enter a Twitter account, or quit:
Retrieving http://api.twitter.com/1.1/friends ...
New accounts= 17 revisited= 3
Enter a Twitter account, or quit:
Retrieving http://api.twitter.com/1.1/friends ...
New accounts= 17 revisited= 3
Enter a Twitter account, or quit: quit

We started with the drchuck account and then let the program automatically pick the
next two accounts to retrieve and add to our database.
The following are the first few rows in the People and Follows tables after this run
is completed:

People:
(1, 'drchuck', 1)
(2, 'opencontent', 1)
(3, 'lhawthorn', 1)
(4, 'steve_coppin', 0)
(5, 'davidkocher', 0)
55 rows.
Follows:
(1, 2)
(1, 3)
(1, 4)
(1, 5)
(1, 6)
60 rows.

You can see the id, name, and retrieved fields in the People table and you see
the numbers of both ends of the relationship in the Follows table. In the People
table, we can see that the first three people have been visited and their data has been
retrieved. The data in the Follows table indicates that drchuck (user 1) is a friend
to all of the people shown in the first five rows. This makes sense because the first
data we retrieved and stored was the Twitter friends of drchuck. If you were to
print more rows from the Follows table, you would see the friends of users 2 and 3
as well.

Three kinds of keys

Now that we have started building a data model putting our data into multiple linked
tables and linking the rows in those tables using keys, we need to look at some
terminology around keys. There are generally three kinds of keys used in a database
model.
• A logical key is a key that the “real world” might use to look up a row. In
our example data model, the name field is a logical key. It is the screen
name for the user and we indeed look up a user’s row several times in the
program using the name field. You will often find that it makes sense
to add a UNIQUE constraint to a logical key. Since the logical key is how
we look up a row from the outside world, it makes little sense to allow
multiple rows with the same value in the table.
• A primary key is usually a number that is assigned automatically by the
database. It generally has no meaning outside the program and is only
used to link rows from different tables together. When we want to
look up a row in a table, usually searching for the row using the primary
key is the fastest way to find the row. Since primary keys are integer
numbers, they take up very little storage and can be compared or sorted
very quickly. In our data model, the id field is an example of a primary key.
• A foreign key is usually a number that points to the primary key of an
associated row in a different table. An example of a foreign key in
our data model is the from_id.

We are using a naming convention of always calling the primary key field name id
and appending the suffix _id to any field name that is a foreign key.

Using JOIN to retrieve data
Now that we have followed the rules of database normalization and have data
separated into two tables, linked together using primary and foreign keys, we need to
be able to build a SELECT that reassembles the data across the tables.
SQL uses the JOIN clause to reconnect these tables. In the JOIN clause you specify
the fields that are used to reconnect the rows between the tables.
The following is an example of a SELECT with a JOIN clause:

SELECT * FROM Follows JOIN People
    ON Follows.from_id = People.id WHERE People.id = 1

The JOIN clause indicates that the fields we are selecting cross both the Follows
and People tables. The ON clause indicates how the two tables are to be joined:
Take the rows from Follows and append the row from People where the field
from_id in Follows is the same as the id value in the People table.

People

id  name          retrieved
1   drchuck       1
2   opencontent   1
3   lhawthorn     1
4   steve_coppin  0
...

Result of joining People and Follows

name     id  from_id  to_id  name
drchuck  1   1        2      opencontent
drchuck  1   1        3      lhawthorn
drchuck  1   1        4      steve_coppin

Figure 5: Connecting Tables Using JOIN

The result of the JOIN is to create extra-long “metarows” which have both the
fields from People and the matching fields from Follows. Where there is more
than one match between the id field from People and the from_id from Follows,
then JOIN creates a metarow for each of the matching pairs of rows, duplicating
data as needed.
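To see the metarows without running the spider, the sketch below loads the four People rows shown in the figure and three matching Follows rows into an in-memory database and runs the same JOIN. The ORDER BY clause is an addition made only so that the row order is predictable.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # assumption: throwaway in-memory database
cur = conn.cursor()
cur.execute('''CREATE TABLE People
            (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)''')
cur.execute('''CREATE TABLE Follows
            (from_id INTEGER, to_id INTEGER, UNIQUE(from_id, to_id))''')
cur.executemany('INSERT INTO People (id, name, retrieved) VALUES (?, ?, ?)',
                [(1, 'drchuck', 1), (2, 'opencontent', 1),
                 (3, 'lhawthorn', 1), (4, 'steve_coppin', 0)])
cur.executemany('INSERT INTO Follows (from_id, to_id) VALUES (?, ?)',
                [(1, 2), (1, 3), (1, 4)])

# Each result row is (from_id, to_id, id, name, retrieved):
# the Follows columns first, then the matching People columns.
cur.execute('''SELECT * FROM Follows JOIN People
            ON Follows.from_id = People.id WHERE People.id = 1
            ORDER BY Follows.to_id''')
metarows = cur.fetchall()
for row in metarows:
    print(row)
conn.close()
```

The drchuck row from People is repeated once for every matching Follows row, which is the duplication the paragraph above describes.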
The following code demonstrates the data that we will have in the database after the
multi-table Twitter spider program (above) has been run several times.

import sqlite3

conn = sqlite3.connect('friends.sqlite')
cur = conn.cursor()

cur.execute('SELECT * FROM People')
count = 0
print('People:')
for row in cur:
    if count < 5: print(row)
    count = count + 1
print(count, 'rows.')

cur.execute('SELECT * FROM Follows')
count = 0
print('Follows:')
for row in cur:
    if count < 5: print(row)
    count = count + 1
print(count, 'rows.')

cur.execute('''SELECT * FROM Follows JOIN People
    ON Follows.to_id = People.id
    WHERE Follows.from_id = 2''')
count = 0
print('Connections for id=2:')
for row in cur:
    if count < 5: print(row)
    count = count + 1
print(count, 'rows.')

cur.close()

# Code: http://www.py4e.com/code3/twjoin.py

In this program, we first dump out the People and Follows and then dump out
a subset of the data in the tables joined together.
Here is the output of the program:

python twjoin.py
People:
(1, 'drchuck', 1)
(2, 'opencontent', 1)
(3, 'lhawthorn', 1)
(4, 'steve_coppin', 0)
(5, 'davidkocher', 0)
55 rows.
Follows:
(1, 2)
(1, 3)
(1, 4)
(1, 5)
(1, 6)
60 rows.
Connections for id=2:
(2, 1, 1, 'drchuck', 1)
(2, 28, 28, 'cnxorg', 0)
(2, 30, 30, 'kthanos', 0)
(2, 102, 102, 'SomethingGirl', 0)
(2, 103, 103, 'ja_Pac', 0)
20 rows.

You see the columns from the People and Follows tables and the last set of rows
is the result of the SELECT with the JOIN clause.
In the last select, we are looking for accounts that are friends of “opencontent”
(i.e., People.id=2).
In each of the “metarows” in the last select, the first two columns are from the
Follows table followed by columns three through five from the People table. You
can also see that the second column (Follows.to_id) matches the third column
(People.id) in each of the joined-up “metarows”.
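When only the friends' names are needed, the JOIN can select a single named column instead of *. The following is a sketch under a few assumptions: the helper friend_names() is hypothetical, the database is in memory, and the rows are a small subset of the sample data shown above.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # assumption: throwaway in-memory database
cur = conn.cursor()
cur.execute('''CREATE TABLE People
            (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)''')
cur.execute('''CREATE TABLE Follows
            (from_id INTEGER, to_id INTEGER, UNIQUE(from_id, to_id))''')
cur.executemany('INSERT INTO People (id, name, retrieved) VALUES (?, ?, ?)',
                [(1, 'drchuck', 1), (2, 'opencontent', 1), (3, 'lhawthorn', 1)])
cur.executemany('INSERT INTO Follows (from_id, to_id) VALUES (?, ?)',
                [(1, 2), (1, 3), (2, 1)])

def friend_names(cur, person_id):
    """Hypothetical helper: names of the accounts that person_id follows."""
    cur.execute('''SELECT People.name FROM Follows JOIN People
                ON Follows.to_id = People.id
                WHERE Follows.from_id = ?
                ORDER BY People.name''', (person_id,))
    return [row[0] for row in cur]

of_drchuck = friend_names(cur, 1)
of_opencontent = friend_names(cur, 2)
print(of_drchuck)
print(of_opencontent)
conn.close()
```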

