Lesson 10 Working with External Data Types
Introduction to Python Programming
Lesson 10: Working With External
Data Files
Lesson Overview
➢
Python File Basics
➢
More File Operations
➢
Pickle and Shelve
➢
Parsing CSV Files With Python’s
Built-in CSV Library
➢
Parsing CSV Files With the pandas
Library
Introduction
➢
Imagine if you
decided to write a
program to keep
track of orders for
your company.
➢
You'd probably
need some kind of
permanent record
of those
transactions, right?
➢
In this lesson, you'll
learn how to
open data files.
➢
Then you'll take a
look at file techniques
that are more
advanced,
➢
like creating random-access
files with Python's pickle and
shelve modules.
Python File Basics: Opening a Data File
➢
To open a file, you use the
open( ) function, provide the
name of the file, and then
specify whether you'll be
reading from or writing to
the file.
➢
Note that the open( )
function returns a file object
that you'll store in a
variable to be used in your
output statements.
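A minimal sketch of this step. The file name mydata.txt is the one the next slide refers to; the temporary folder is an assumption, used here only so the example does not touch your real files.

```python
import os
import tempfile

# Work in a throwaway folder so nothing important is overwritten.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "mydata.txt")

# 'w' opens the file for writing and creates it if it does not exist.
out_file = open(path, "w")
print(out_file.mode)  # the mode the file object was opened with
out_file.close()
```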
Python File Basics: Where is the Data
File Saved?
➢
The save location for the data file depends on how
you run your code, because a filename without a path is
resolved relative to the current working directory.
➢
When the above line of code is executed directly
at the IDLE prompt, the new file (mydata.txt) is
typically saved in the same location as your
Python executable program (python.exe).
➢
However, if you save all your Python statements into their own source code file
and run it with Run Module, the new file is saved in the same directory as
your source code file.
➢
So if you want to be sure of your file's location,
you can simply list the full path instead of just a
filename in your open statement.
➢
For example, if you wanted this file saved on
your Desktop, you could do it like this:
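One possible way to build a full path, as a sketch. The Desktop location is an assumption about your system, so the code only prints that path and does the actual writing in a temporary folder.

```python
import os
import tempfile

# A full path names the folder as well as the file.
# expanduser("~") gives the current user's home folder.
desktop_style_path = os.path.join(os.path.expanduser("~"), "Desktop", "mydata.txt")
print(desktop_style_path)

# To stay runnable anywhere, this sketch writes to a temporary folder instead.
folder = tempfile.mkdtemp()
full_path = os.path.join(folder, "mydata.txt")
out_file = open(full_path, "w")
out_file.close()
print(os.path.isabs(full_path))  # a full path is an absolute path
```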
Python File Basics: Writing to a Data
File
➢
If you issue a command to write data to a file,
the data might not immediately appear there.
➢
This is because file access is a time-consuming operation.
Therefore, Python might hold the data in a buffer and wait for more data before
writing it to the file.
➢
If you want to force the data to be written
immediately, you can use the flush( ) function.
Here's an example of the code:
out_file.flush( )
➢
The last important part in this cycle is to close
the file. For that, you'll use the close( ) function.
For example, the following code closes my
out_file object:
out_file.close()
➢
There is no output because instead of calling
the print() function we wrote the data into the
file.
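The whole write cycle can be sketched as follows. The text being written and the temporary folder are assumptions for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mydata.txt")

out_file = open(path, "w")
out_file.write("First order: 3 widgets")  # may sit in a buffer at first
out_file.flush()                          # push buffered data to the file now
out_file.close()                          # close() also writes anything left

# Reopen the file to confirm the text really is on disk.
in_file = open(path, "r")
contents = in_file.read()
in_file.close()
print(contents)
```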
Python File Basics: Adding Line Breaks
➢
You include a newline
character, \n, every time
you want the text that
follows to start on the
next line in a file.
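A short sketch of the idea, with made-up fruit names as the data:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mydata.txt")

out_file = open(path, "w")
out_file.write("apples\n")    # \n ends this line
out_file.write("bananas\n")
out_file.write("cherries")    # no \n, so the file ends on this line
out_file.close()

in_file = open(path, "r")
text = in_file.read()
in_file.close()
print(text)
```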
Python File Basics: The writelines()
Function
➢
This function will also
enable you to write
content to a file, but
instead of passing a
single value, you'll need
to pass it some type of
collection of strings, like
a list of the kind you created in the
previous lesson.
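A minimal sketch, assuming a small list of color names. Note that writelines( ) writes each string exactly as given, so the \n characters have to be in the strings already.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mydata.txt")

lines = ["red\n", "green\n", "blue\n"]

out_file = open(path, "w")
out_file.writelines(lines)  # no newlines are added for you
out_file.close()

in_file = open(path, "r")
written = in_file.read()
in_file.close()
print(written)
```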
Python File Basics: Reading from Files
with read()
➢
To read a file, you open
it with an 'r' instead
of a 'w'.
➢
The variable names
out_file and in_file are
different only because of a
convention adopted by
many Python
programmers.
➢
Once the file is opened, there are three functions
you can use to read from a file: read( ),
readline( ), and readlines( ).
➢
Although each function will read from the file,
each works a little differently so you’ll learn about
each one separately.
Python File Basics: The read( ) Function
➢
When using the read( )
function, you can provide
the number of characters
(or bytes, for a file opened
in binary mode) to be
read in.
➢
However, Python also
allows you to leave the
parentheses empty.
When you do, the rest of
the data from the file will
be read.
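Both forms of read( ) can be sketched like this, assuming a file that holds the text "Hello, world!":

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mydata.txt")
out_file = open(path, "w")
out_file.write("Hello, world!")
out_file.close()

in_file = open(path, "r")
first_five = in_file.read(5)  # read only the next 5 characters
the_rest = in_file.read()     # empty parentheses: read everything that is left
in_file.close()

print(first_five)
print(the_rest)
```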
Python File Basics: The readline( )
Function
➢
The readline( ) function
reads an entire line of data
from a file if the
parentheses are empty, and
optionally accepts a
maximum number of characters
in the parentheses.
➢
You might want to use
readline( ) for entire lines and
read( ) for a certain number of
characters in your code; this
can help make your code a
little easier to understand.
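A small sketch of readline( ) with and without a limit. The three-line sample file is an assumption.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mydata.txt")
out_file = open(path, "w")
out_file.write("line one\nline two\nline three\n")
out_file.close()

in_file = open(path, "r")
first = in_file.readline()   # the whole first line, including its \n
part = in_file.readline(4)   # at most 4 characters of the next line
in_file.close()

print(repr(first))
print(repr(part))
```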
Python File Basics: The readlines()
Function
➢
Like the other two,
readlines( ) gives you the
ability to provide an
optional size limit (it is
treated as a hint).
➢
However, if you don't
provide one, it'll read to the end
of the file, not just to the
end of the line, and return
the lines as a list of strings.
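For illustration, here is readlines( ) on an assumed three-line file. Each item in the returned list still ends with its \n.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mydata.txt")
out_file = open(path, "w")
out_file.write("line one\nline two\nline three\n")
out_file.close()

in_file = open(path, "r")
all_lines = in_file.readlines()  # a list with one string per line
in_file.close()

for line in all_lines:
    print(line.rstrip("\n"))     # strip the newline before printing
```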
More File Operations: Appending to an
Existing File
➢
If you want a file
that logs information about
the user each time they work
with your program, without
erasing the previous
data each time, you'll need
to open the file in append
mode with the 'a' argument.
➢
This opens that same data
file you were using before,
but this time it'll keep all the
existing data and add in the
new data at the end of the
file.
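A sketch of append mode, assuming a hypothetical log file named log.txt that is written on two separate "runs":

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")

# First run: create the file.
out_file = open(path, "w")
out_file.write("first run\n")
out_file.close()

# Later run: 'a' keeps the existing data and adds to the end.
out_file = open(path, "a")
out_file.write("second run\n")
out_file.close()

in_file = open(path, "r")
log_text = in_file.read()
in_file.close()
print(log_text)
```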
More File Operations: Other Options
when Opening Files
➢
As you have seen, opening a file for output
means you can only write to it, and opening a file
for input means you can only read from it.
➢
Although you can always just close a file and then reopen it in the
other mode, that extra set of steps can be a hassle.
➢
For this reason, Python provides two other ways
of opening our files: 'r+' and 'w+'.
➢
While both ways give you the ability to both
read and write your files, there is a difference.
➢
If you attempt to open your file with 'r+' and
that file doesn't exist, then Python will raise
a FileNotFoundError exception (a kind of IOError), and your program will stop
unless you handle it.
➢
That's because you can't read from a file that doesn't exist.
➢
On the other hand, if you open a nonexistent file
with 'w+', then Python will simply create one for
you.
➢
Note, however, that if that file did exist, then its data would be erased,
just as if you had opened the file with 'w'.
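The difference between 'r+' and 'w+' can be sketched as below. The file name is hypothetical, and the seek(0) call (which moves back to the start of the file) is covered later in this lesson.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "no_such_file.txt")

# 'r+' needs an existing file, so this attempt fails.
try:
    open(path, "r+")
    opened = True
except FileNotFoundError:
    opened = False

# 'w+' creates the file, and allows reading as well as writing.
f = open(path, "w+")
f.write("abc")
f.seek(0)            # go back to the start before reading
back = f.read()
f.close()

# Opening an existing file with 'w+' erases its contents.
f = open(path, "w+")
after = f.read()
f.close()

print(opened, repr(back), repr(after))
```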
More File Operations: The tell( ) Function
➢
To read and write to the same
file at the same time you can
still use the read( ) and
write( ) functions that you
learned earlier.
➢
The only difference is that now you have to
keep track of your current position in the
file as you do your reading and writing.
➢
You can always find out where
you are in your file with the
tell( ) function. This will give
you your current file position
as the number of bytes from
the start of the file.
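A minimal sketch of tell( ), assuming a new file and five plain ASCII characters (one byte each):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mydata.txt")

f = open(path, "w+")
start = f.tell()        # position in a brand-new file
f.write("hello")
after_write = f.tell()  # position after writing five characters
f.close()

print(start, after_write)
```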
More File Operations: The seek( )
Function
➢
Of course, if there's a way
to determine where you're
located in a file, there's
also a way to change that
location. You can do this
with the seek( ) function.
➢
When you use seek( ), you
must provide where in the
file you want to move by
specifying a number that
represents the number of
bytes from the beginning
of the file.
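As a sketch, seek( ) can jump to any position before a read. The ten-digit sample text is an assumption chosen so positions are easy to see.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mydata.txt")
out_file = open(path, "w")
out_file.write("0123456789")
out_file.close()

in_file = open(path, "r")
in_file.seek(7)          # move 7 bytes from the beginning
tail = in_file.read()    # read from there to the end
in_file.seek(0)          # move back to the beginning
head = in_file.read(3)
in_file.close()

print(tail, head)
```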
More File Operations: Reading and
Writing at the Same Time
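One way reading and writing on the same file object might look, as a sketch. The file is opened with 'w+', and seek( ) is used to reposition before each change of direction.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mydata.txt")

f = open(path, "w+")
f.write("0123456789")
f.seek(3)          # move to position 3
f.write("abc")     # overwrite the characters at positions 3, 4, and 5
f.seek(0)          # return to the start before reading
whole = f.read()
f.close()

print(whole)
```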
More File Operations:
➢
Now that you understand how to work with
basic files, let's next explore how to work with
a database-like file.
➢
For that you'll need to learn about a couple
of Python modules: pickle and shelve.
Pickle and Shelve: Introduction to Pickle
➢
Pickling converts a Python object into a stream
of bytes.
➢
This stream can then be reconverted to the original object
later.
➢
This is a useful thing to do for a couple of reasons.
➢
First, by converting to a string of bytes, we may save a little bit of space on the disk.
➢
But possibly more important than this is that we can store an entire object using just a
single line of code.
➢
That is, without the ability to use pickle, we would need to store every field of an object
one at a time in the file, and then when we restored the data, we would need to read in
each piece of data and create a new object. Even for something like an object from our
little Time class, we're talking about replacing three lines of code, one for each of the
hour, minute, and second, with a single line.
➢
There are two different ways to pickle an object,
depending on where you want the result to be
stored.
➢
Note, however, that in order to use any of the
pickling functions, you need to have an import
statement to import pickle.
➢
You use the first function, dumps( ), if you want the result returned as a
string of bytes that you can store in a variable.
➢
The other function, dump( ), stores the result in a file.
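A minimal sketch of dumps( ), assuming a list of vowels as the object to pickle:

```python
import pickle

vowels = ["a", "e", "i", "o", "u"]

# dumps() returns the pickled object as a bytes value.
pickled = pickle.dumps(vowels)

print(type(pickled))
print(pickled)  # not meant to be read by people
```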
➢
One of the reasons pickling is so important is
because we’re actually storing the values of all
instance variables in the object in a single line of
code.
➢
If we didn’t have the ability to use pickle, then we would need to access
each data member, one at a time.
➢
Pickling converts your list to a stream of bytes
that you store as a bytes string.
➢
It shouldn't be a surprise that the stream is hard to read. However,
putting that stream in a data file for later use can be quite helpful in
certain situations, as you see next.
Pickle and Shelve: Pickling to a File
➢
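Sending the pickled result to a file is very similar
to sending it to a string, with two differences.
First, the function name is different: remember,
it's dump( ) rather than dumps( ). Second, you pass
the open file object as an additional argument, and the
file must be opened for writing bytes ('wb').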
➢
For example, if you want
to send the list above to
a data file named
data.txt, you'd need to
do this:
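A sketch of pickling to a file. The vowel list is an assumed stand-in for "the list above", and data.txt is created in a temporary folder.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.txt")
vowels = ["a", "e", "i", "o", "u"]

# 'wb' opens the file for writing bytes, which pickle requires.
out_file = open(path, "wb")
pickle.dump(vowels, out_file)  # object first, then the open file
out_file.close()

print(os.path.getsize(path))   # size of the pickled data in bytes
```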
Pickle and Shelve: Pickling from a File
➢
To get the data back to its
original form, you'll need
either the loads( ) or the
load( ) function.
➢
Notice how loads( ) works
just like dumps( ).
➢
That is, you place the variable that's
holding the pickled data inside the
parentheses.
➢
The result is returned, and
in this case, we're storing
it in another variable.
➢
The same idea works
with load( ) and data files,
except that you need to
remember to first open the
data file in a mode that
reads bytes ('rb').
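Both directions can be sketched together, again assuming a list of vowels and a file named data.txt in a temporary folder:

```python
import os
import pickle
import tempfile

vowels = ["a", "e", "i", "o", "u"]

# loads() rebuilds the object from a bytes value held in a variable.
pickled = pickle.dumps(vowels)
restored = pickle.loads(pickled)

# load() rebuilds the object from a file opened for reading bytes.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
out_file = open(path, "wb")
pickle.dump(vowels, out_file)
out_file.close()

in_file = open(path, "rb")
from_file = pickle.load(in_file)
in_file.close()

print(restored)
print(from_file)
```

Only unpickle data that comes from a source you trust, since loading a pickle can run code.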
What if there's more than
Pickle and Shelve: Python Shelves
➢
As you can see, pickling is a handy way of
converting your data into bytes and storing
them in an external data file. It becomes even more
useful when used in conjunction with shelves.
➢
A shelf is a database-like object that can
efficiently store pickled values.
In actuality, a shelf is an external data file that is
➢
To use a shelf in your program, you first add the
import shelve line.
➢
Next, you can open the shelf file by using the
shelve.open( ) function.
➢
This works much like the open( ) function for regular files, with the name
of the file as the first argument and a flag to tell the computer how the
file should be opened as the second argument.
➢
However, the flags for shelves are a little
different.
➢
You can use the 'r' flag to open an existing
shelf for reading only, and the 'w' flag to open
an existing shelf for reading and writing.
➢
Alternatively you can use the 'c' flag (the default), which
also enables you to open the shelf for both reading
and writing.
➢
Using it creates a new file if it doesn't already
exist.
➢
The last flag is 'n', which creates a new, empty
file no matter what.
➢
The following code will open
the file letters.txt and write
two different records.
➢
The first one will have the vowels a, e, i, o,
and u.
➢
The second will have the key 'end' that
contains the letters x, y, and z.
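The two records described above might be written like this. The key name 'vowels' for the first record is an assumption (only the 'end' key is named in the lesson), and letters.txt is created in a temporary folder.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "letters.txt")

# 'c' opens the shelf for reading and writing, creating it if needed.
shelf = shelve.open(path, "c")
shelf["vowels"] = ["a", "e", "i", "o", "u"]  # hypothetical key name
shelf["end"] = ["x", "y", "z"]
shelf.close()

# Reopen read-only and look the records up by key, like a dictionary.
shelf = shelve.open(path, "r")
vowels = shelf["vowels"]
keys = sorted(shelf.keys())
shelf.close()

print(keys)
print(vowels)
```

Depending on your platform, shelve may add an extension to the file name or create more than one file on disk.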
Note that when you run this
Pickle and Shelve: Interacting with the
Shelf
➢
Notice how the syntax for adding an item is the
same as adding an item to a regular dictionary.
➢
The difference, of course, is that this data is
stored in an external file rather than only in memory.
Pickle and Shelve: The sync( ) Function
➢
Because file operations are time consuming,
shelves don't always write the data to the file
immediately.
➢
If you want to immediately write the data, you
can use the sync( ) function. It is similar to the
flush() function you used with the simple files
earlier.
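A short sketch of sync( ), assuming a hypothetical shelf file named orders and a made-up list of order numbers:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "orders")

shelf = shelve.open(path, "c")
shelf["orders"] = [101, 102]
shelf.sync()   # ask the shelf to write any pending data to disk now
shelf.close()  # closing the shelf also saves its contents

shelf = shelve.open(path, "r")
reopened = shelf["orders"]
shelf.close()

print(reopened)
```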
➢
We'll now see some examples to better acquaint
you with all we've covered so far.
Reading and Writing CSV Files in Python
➢
Exchanging information through text files is a
common way to share info between programs.
One of the most popular formats for exchanging
data is the CSV format.
➢
In this section we will see:
➢
How to read, process, and parse CSV from text files using Python.
➢
How CSV files work, and how to use the all-important csv library built
into Python.
➢
How CSV parsing works using the pandas library.
➢
But first let's get acquainted with CSV.
What Is a CSV File?
➢
A CSV file (Comma Separated Values file) is a
type of plain text file that uses specific
structuring to arrange tabular data.
➢
Because it’s a plain text file, it can contain only actual text data—in
other words, printable ASCII or Unicode characters.
➢
Normally, CSV files use a comma to separate
each specific data value. Here’s what that
structure looks like:
➢
Notice how each piece of data is separated by a
comma.
➢
Normally, the first line identifies each piece of data—in other words, the name
of a data column.
➢
Every subsequent line after that is actual data and is limited only by file size
constraints.
➢
In general, the separator character is called a
delimiter.
➢
programmatically.
Any language that supports text file input and string
➢
➢
The csv library provides functionality to both
read from and write to CSV files.
Designed to work out of the box with Excel-
Reading CSV Files With csv
➢
The CSV file is opened as a text file with Python’s
built-in open() function, which returns a file object.
➢
This is then passed to the reader, which does the heavy
lifting.
➢
For example, here's a file called employee_birthday.txt.
➢
Here's code to read the CSV file:
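A minimal sketch of reading with csv.reader. The contents of employee_birthday.txt are not shown in this lesson, so the three sample lines below are an assumption, written to a temporary folder first.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "employee_birthday.txt")

# Assumed sample contents; the lesson's actual file may differ.
sample = (
    "name,department,birthday month\n"
    "John Smith,Accounting,November\n"
    "Erica Meyers,IT,March\n"
)
with open(path, "w", newline="") as out_file:
    out_file.write(sample)

# The file object is passed to the reader, which splits each line into fields.
rows = []
with open(path, newline="") as csv_file:
    reader = csv.reader(csv_file, delimiter=",")
    for row in reader:
        rows.append(row)

header = rows[0]
data = rows[1:]
print(header)
for row in data:
    print(row[0], "works in", row[1])
```

The with statement closes the file automatically when its block ends.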
Reading CSV Files Into a Dictionary With
csv
employee_birthday.txt.
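Reading into dictionaries can be sketched with csv.DictReader, which uses the first line as the keys. The sample file contents are an assumption, as the lesson's file is not shown.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "employee_birthday.txt")

# Assumed sample contents; the lesson's actual file may differ.
sample = (
    "name,department,birthday month\n"
    "John Smith,Accounting,November\n"
    "Erica Meyers,IT,March\n"
)
with open(path, "w", newline="") as out_file:
    out_file.write(sample)

with open(path, newline="") as csv_file:
    reader = csv.DictReader(csv_file)
    records = list(reader)  # one dictionary per data line

for record in records:
    print(record["name"], "was born in", record["birthday month"])
```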
Optional Python CSV reader Parameters
➢
The reader object can handle different styles of
CSV files by specifying additional parameters,
some of which are shown below:
➢
delimiter specifies the character used to separate each field. The
default is the comma (',').
➢
quotechar specifies the character used to surround fields that contain
the delimiter character. The default is a double quote (' " ').
➢
escapechar specifies the character used to escape the delimiter
character, in case quotes aren’t used. The default is no escape character.
➢
For example, suppose we have employee_addresses.txt
and it looked like this:
➢
The problem is that the data for the address field
also contains commas, which the reader would
mistake for separators between the fields.
➢
There are three different ways to handle this situation:
➢
Use a different delimiter.
That way, the comma can safely be used in the data itself. You use the delimiter optional
parameter to specify the new delimiter.
➢
Wrap the data in quotes.
The special nature of your chosen delimiter is ignored in quoted strings. Therefore, you can
specify the character used for quoting with the quotechar optional parameter. As long as that
character also doesn't appear in the data, you're fine.
➢
Escape the delimiter characters in the data.
Escape characters work just as they do in format strings, nullifying the interpretation of the
character being escaped (in this case, the delimiter). If an escape character is used, it must be
specified using the escapechar optional parameter.
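The three approaches can be sketched side by side. The name and address are made-up sample data, and io.StringIO stands in for a real file so the example is self-contained.

```python
import csv
import io

address = "123 Main Street, Springfield"  # made-up address containing a comma

# 1. A different delimiter: commas in the data are now harmless.
piped = "name|address\nJohn Smith|123 Main Street, Springfield\n"
rows_delim = list(csv.reader(io.StringIO(piped), delimiter="|"))

# 2. Quoting: the delimiter is ignored inside quoted fields.
quoted = 'name,address\nJohn Smith,"123 Main Street, Springfield"\n'
rows_quote = list(csv.reader(io.StringIO(quoted), quotechar='"'))

# 3. Escaping: a backslash before the comma cancels its special meaning.
escaped = "name,address\nJohn Smith,123 Main Street\\, Springfield\n"
rows_escape = list(csv.reader(io.StringIO(escaped), escapechar="\\"))

for rows in (rows_delim, rows_quote, rows_escape):
    print(rows[1])
```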
Writing CSV Files With csv
➢
You can also write to a CSV file using a writer
object and the .writerow() method:
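A minimal sketch of writing rows. The file name employee_file.csv and the two rows are assumptions for illustration.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "employee_file.csv")  # hypothetical name

# newline="" lets the csv module control the line endings itself.
with open(path, "w", newline="") as csv_file:
    writer = csv.writer(csv_file, delimiter=",", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["John Smith", "Accounting", "November"])
    writer.writerow(["Erica Meyers", "IT", "March"])

# Read the file back as plain text to see what was written.
with open(path, newline="") as csv_file:
    text = csv_file.read()
print(text)
```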
➢
If quoting is set to csv.QUOTE_NONNUMERIC, then .writerow() will quote all
fields containing text data. When the file is read back with the same setting,
the reader converts all unquoted fields to the float data type.
➢
If quoting is set to csv.QUOTE_NONE, then .writerow() will escape delimiters
instead of quoting them. In this case, you also must provide a value for the
escapechar optional parameter.
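The two quoting settings above can be compared in a short sketch. The row is made-up data, and io.StringIO is used in place of a file.

```python
import csv
import io

row = ["Widget, large", 3, 4.5]  # made-up row: one text field, two numbers

# QUOTE_NONNUMERIC: text fields are quoted, numbers are left bare.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC).writerow(row)
nonnumeric_line = buffer.getvalue().strip()

# QUOTE_NONE: nothing is quoted, so the comma in the text must be escaped.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_NONE, escapechar="\\").writerow(row)
none_line = buffer.getvalue().strip()

# Reading with QUOTE_NONNUMERIC turns the unquoted fields into floats.
read_back = next(csv.reader(io.StringIO(nonnumeric_line),
                            quoting=csv.QUOTE_NONNUMERIC))

print(nonnumeric_line)
print(none_line)
print(read_back)
```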
➢
Reading the file back in plain text shows that the
file is created as follows:
Parsing CSV Files With the pandas
Library
➢
pandas is not part of Python's standard library, so it has to be
installed first, for example with pip or conda.
Reading CSV Files With pandas
➢
Once you have pandas installed, you can use it to
read CSV files and much more.
➢
For example, suppose we have a data file like this:
➢
You can easily read the hrdata.csv file with this
code, which loads it into a pandas DataFrame:
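A sketch of the read step, assuming pandas is installed. The contents of hrdata.csv are not shown in this lesson, so the rows below (and the Name column) are made-up sample data, and io.StringIO stands in for the file.

```python
import io

import pandas as pd

# Assumed sample data; with a real file you would pass "hrdata.csv" instead.
csv_text = (
    "Name,Hire Date,Salary,Sick Days remaining\n"
    "Ana Lopez,2014-03-15,50000.00,10\n"
    "Ben Carter,2015-06-01,65000.00,8\n"
)

df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # the data type pandas chose for each column
```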
➢
Some notes
➢
First, pandas recognized that the first line of the CSV contained column
names, and used them automatically.
➢
However, pandas is also using zero-based integer indices in the
DataFrame. That's because we didn't tell it what our index should be.
➢
Further, if you look at the data types of our columns, you'll see pandas
has properly converted the Salary and Sick Days remaining columns to
numbers, but the Hire Date column is still a string. This is easily
confirmed in interactive mode:
print(type(df['Hire Date'][0]))
➢
To use a different column as the DataFrame
index, add the index_col optional parameter:
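For illustration, assuming the same made-up sample data and a Name column to use as the index:

```python
import io

import pandas as pd

# Assumed sample data standing in for hrdata.csv.
csv_text = (
    "Name,Hire Date,Salary,Sick Days remaining\n"
    "Ana Lopez,2014-03-15,50000.00,10\n"
    "Ben Carter,2015-06-01,65000.00,8\n"
)

# index_col names the column whose values become the row labels.
df = pd.read_csv(io.StringIO(csv_text), index_col="Name")
print(df)
```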
➢
You can force pandas to read data as a date with
the parse_dates optional parameter, which is
defined as a list of column names to treat as
dates:
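A sketch of date parsing on the same assumed sample data. The dates are written in year-month-day form here to keep the parsing unambiguous.

```python
import io

import pandas as pd

# Assumed sample data standing in for hrdata.csv.
csv_text = (
    "Name,Hire Date,Salary,Sick Days remaining\n"
    "Ana Lopez,2014-03-15,50000.00,10\n"
    "Ben Carter,2015-06-01,65000.00,8\n"
)

df = pd.read_csv(io.StringIO(csv_text), index_col="Name",
                 parse_dates=["Hire Date"])

first_hire = df["Hire Date"].iloc[0]  # first value, by position
print(type(first_hire))
```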
➢
You can check that the date is parsed
appropriately by typing at the prompt:
print(type(df['Hire Date'].iloc[0]))
➢
You will get a result like:
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
➢
If your CSV file doesn't have column names in
the first line, you can use the names optional
parameter to provide a list of column names.
➢
You can also use this if you want to override the
column names provided in the first line. In this
case, you must also tell pandas.read_csv() to
ignore existing column names using the
header=0 optional parameter:
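Overriding the column names might look like this. The new names (Employee, Hired, Salary, Sick Days) and the sample data are assumptions for illustration.

```python
import io

import pandas as pd

# Assumed sample data standing in for hrdata.csv.
csv_text = (
    "Name,Hire Date,Salary,Sick Days remaining\n"
    "Ana Lopez,2014-03-15,50000.00,10\n"
    "Ben Carter,2015-06-01,65000.00,8\n"
)

# header=0 says line 0 holds the old names, which names= then replaces.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    names=["Employee", "Hired", "Salary", "Sick Days"],
    index_col="Employee",
    parse_dates=["Hired"],
)
print(df)
```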
➢
Notice that, since the column names changed,
the columns specified in the index_col and
parse_dates optional parameters must also be
changed.
Writing CSV Files With pandas
➢
Writing a DataFrame to a CSV file is just as
easy as reading one in. Let’s write the data
with the new column names to a new CSV file:
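A sketch of the write step. The output file name hrdata_modified.csv, the new column names, and the sample data are all assumptions; the file is written to a temporary folder.

```python
import io
import os
import tempfile

import pandas as pd

# Assumed sample data standing in for hrdata.csv.
csv_text = (
    "Name,Hire Date,Salary,Sick Days remaining\n"
    "Ana Lopez,2014-03-15,50000.00,10\n"
    "Ben Carter,2015-06-01,65000.00,8\n"
)
df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    names=["Employee", "Hired", "Salary", "Sick Days"],
    index_col="Employee",
    parse_dates=["Hired"],
)

out_path = os.path.join(tempfile.mkdtemp(), "hrdata_modified.csv")  # hypothetical name
df.to_csv(out_path)  # writes the index and all columns

with open(out_path) as in_file:
    first_line = in_file.readline().strip()
print(first_line)
```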
➢
The only difference between this code and the
reading code above is that the print(df) call
was replaced with df.to_csv(), providing the file
name. The new CSV file looks like this:
Lesson 10 Review
➢
This lesson started by discussing external data
files.
➢
You learned how to create, open, write, and close
a data file. Then you went on to learn how to open
the same file and read the data out of it.
➢
Although working with simple, sequential files one line at a time can be
easy, you also learned that Python enables you to move around in
the file wherever you want with some additional functions.
➢
You also saw how to work with CSV files with the
built-in csv library and the pandas library.
Some exercises:
➢
Write a simple library management system. It
should have one class called Book, which
contains title, author name, publisher, ISBN,
edition, etc.
➢
The main program should allow the user to create new books or view the
books already there.
➢
You should be able to see the list of books even when you restart the
application.