
Lesson 10 Working with External Data Types

Lesson 10 covers working with external data files in Python, including file basics, reading and writing operations, and advanced techniques like using pickle and shelve for object serialization. It explains how to open, read, write, and manage file operations, as well as how to parse CSV files using Python's built-in libraries and pandas. The lesson emphasizes the importance of proper file handling and the benefits of using pickling for efficient data storage.

Uploaded by

Qombuter Agafari
Copyright © All Rights Reserved

Lesson 10: Working With External Data Files
Introduction to Python Programming
Lesson Overview

Python File Basics

More File Operations

Pickle and Shelve

Parsing CSV Files With Python’s Built-in CSV Library

Parsing CSV Files With the pandas Library
Introduction


Imagine you decided to write a program to keep track of orders for your company. You'd probably need some kind of permanent record of those transactions, right?

In this lesson, you'll learn how to open data files, read from them, write to them, and close them.

Then you'll take a look at more advanced file techniques, like storing whole objects with Python's pickle module and creating random access, dictionary-like files with the shelve module.
Python File Basics: Opening a Data File


To open a file, you use the open() function: provide the name of the file, and then specify whether you'll be reading from ('r') or writing to ('w') the file.

Note that the open() function returns a file object, which you store in a variable and then use in your output statements.
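A minimal sketch of opening a file for writing, assuming the file name mydata.txt (the name used later in this lesson). It shows that open() hands back a file object you keep in a variable:

```python
# Open (or create) mydata.txt for writing; open() returns a file object
out_file = open("mydata.txt", "w")

print(type(out_file))   # <class '_io.TextIOWrapper'>
print(out_file.name)    # mydata.txt
print(out_file.mode)    # w

out_file.close()        # release the file when you're done with it
```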

Python File Basics: Where is the Data File Saved?


The save location for the data file depends on how you run the code that creates it.

When the line of code that opens mydata.txt is executed directly at the IDLE prompt, the new file is typically saved in the same location as your Python executable program (python.exe).

However, when you use IDLE's Run Module command, it's saved in the same location as your source code file.

More generally, a filename without a folder is resolved relative to the current working directory, the folder Python is running in. If you run your source code file from its own folder, the new file is saved in the same directory as your source code file; if you run it from somewhere else, the file ends up there instead.
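To check where a bare filename will end up on your own machine, you can ask Python for its current working directory. This sketch uses os.getcwd() and os.path.abspath() from the standard library:

```python
import os

# The folder that relative filenames are resolved against
print(os.getcwd())

# The full path that open("mydata.txt", "w") would create right now
print(os.path.abspath("mydata.txt"))
```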
Python File Basics: Where is the Data File Saved?


So if you want to be sure of your file's location, you can list the full path instead of just a filename in your open() statement.

For example, if you wanted this file saved on your Desktop, you could use something like this:
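One way to build that full path is with the standard pathlib module. The Desktop location below is an assumption about your system (home folder plus Desktop, the usual layout on Windows and macOS), so the sketch checks that the folder exists before writing:

```python
from pathlib import Path

# Assumption: the Desktop folder lives directly inside your home folder
desktop = Path.home() / "Desktop"
full_path = desktop / "mydata.txt"
print(full_path)   # a full path, e.g. C:\Users\<you>\Desktop\mydata.txt on Windows

# Only write if that folder actually exists on this machine
if desktop.is_dir():
    out_file = open(full_path, "w")
    out_file.write("Saved using a full path")
    out_file.close()
```

On Windows you can also type the path directly as a raw string, such as r"C:\Users\yourname\Desktop\mydata.txt", where yourname is a placeholder.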

Python File Basics: Writing to a Data File

There are two functions that you can use to write the data: write() and writelines().

The write() function writes a single string to a file; writelines() writes a list (or other collection) of strings.

But be aware that neither function adds line breaks for you: the strings are written exactly as given, one right after another.
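A short sketch of both functions, with illustrative strings. Because neither function adds line breaks, everything runs together in the file:

```python
out_file = open("mydata.txt", "w")
out_file.write("Saturday")                              # one string
out_file.writelines(["Sunday", "Saturday", "Sunday"])   # a list of strings
out_file.close()

in_file = open("mydata.txt", "r")
contents = in_file.read()
in_file.close()
print(contents)   # SaturdaySundaySaturdaySunday
```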

Python File Basics: Writing to a Data File


If you issue a command to write data to a file, the data might not immediately appear there.

This is because file access is a time-consuming operation, so Python buffers the output: it may collect more data in memory before actually writing it to the file.

If you want to force the data to be written immediately, you can use the flush() function. Here's an example of the code:
out_file.flush()
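A sketch of flush() in context: after the call, a second file object opened on the same file can already see the text, even though out_file is still open:

```python
out_file = open("mydata.txt", "w")
out_file.write("Saturday")
out_file.flush()              # push the buffered text out to the file now

check = open("mydata.txt", "r")
seen = check.read()           # the text is already in the file
check.close()

out_file.close()
print(seen)                   # Saturday
```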

Python File Basics: Writing to a Data File


The last important part in this cycle is to close the file. For that, you'll use the close() function, which also writes out any data still waiting in the buffer. For example, the following code closes my out_file object:
out_file.close()

There is no output because instead of calling the print() function, we wrote the data into the file.
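A small sketch showing what closing does: the file object reports that it is closed, and any further write raises a ValueError:

```python
out_file = open("mydata.txt", "w")
out_file.write("Saturday")
out_file.close()          # writes any buffered data and releases the file

print(out_file.closed)    # True

write_failed = False
try:
    out_file.write("Sunday")   # not allowed once the file is closed
except ValueError as err:
    write_failed = True
    print(err)
```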

Python File Basics: Adding Line Breaks


You include a newline character, \n, every time you want to move a string to the next line in a file.
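A sketch with illustrative strings: each \n ends a line, so the words land on separate lines when the file is read back:

```python
out_file = open("mydata.txt", "w")
out_file.write("Saturday\n")   # \n moves what follows to the next line
out_file.write("Sunday\n")
out_file.close()

in_file = open("mydata.txt", "r")
contents = in_file.read()
in_file.close()
print(contents)                # Saturday and Sunday on separate lines
```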

Python File Basics: The writelines() Function


This function also enables you to write content to a file, but instead of passing a single value, you pass it a collection of strings, like a list you created in the previous lesson.
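A sketch using an illustrative list. writelines() writes each string in order; the line breaks come from the \n characters inside the list, not from the function:

```python
days = ["Saturday\n", "Sunday\n", "Monday\n"]   # illustrative list of strings

out_file = open("mydata.txt", "w")
out_file.writelines(days)
out_file.close()

in_file = open("mydata.txt", "r")
contents = in_file.read()
in_file.close()
```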

Python File Basics: Reading from Files with read()


To read a file, instead of opening it with 'w', you open it with 'r'.

The variable names out_file and in_file are different only because of a convention adopted by many Python programmers; Python itself doesn't require different names.

Python File Basics: Reading from Files with read()


Once the file is opened, there are three functions you can use to read from a file: read(), readline(), and readlines().

Although each function reads from the file, each works a little differently, so you'll learn about each one separately.

Python File Basics: The read( ) Function


When using the read() function, you can provide the maximum amount of data to be read in. (For files opened in text mode, as here, this number counts characters rather than bytes.)

However, Python also allows you to leave the parentheses empty. When you do, the rest of the data from the file will be read.
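A sketch contrasting the two forms, with illustrative file contents: a number limits how much is read, and empty parentheses read everything that is left:

```python
out_file = open("mydata.txt", "w")
out_file.write("Saturday\nSunday\n")
out_file.close()

in_file = open("mydata.txt", "r")
part = in_file.read(5)   # at most 5 characters
rest = in_file.read()    # empty parentheses: everything that is left
in_file.close()

print(repr(part))   # 'Satur'
print(repr(rest))   # 'day\nSunday\n'
```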

Python File Basics: The read( ) Function

In the example:

The first line of code reads in (at most) 1 character of data from the file and stores the result in a variable named first.

The second line of code reads in the rest of the line and stores it in a variable named second.

A pointer keeps track of what has been read in, and this pointer advances after each read.

The results of my two lines of read code would result in "S" being printed first and "aturdaySundaySaturdaySunday" being printed next.
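The example can be reconstructed roughly as follows, assuming mydata.txt contains SaturdaySundaySaturdaySunday with no line breaks. Whether the original's second line used readline() or read() isn't shown; readline() is assumed here, and read() would give the same result for this file:

```python
out_file = open("mydata.txt", "w")
out_file.write("SaturdaySundaySaturdaySunday")
out_file.close()

in_file = open("mydata.txt", "r")
first = in_file.read(1)        # at most one character
second = in_file.readline()    # picks up where the pointer stopped
in_file.close()

print(first)    # S
print(second)   # aturdaySundaySaturdaySunday
```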

Python File Basics: The readline() Function


The readline() function reads an entire line of data from a file if the parentheses are empty, and optionally accepts a maximum number of characters in the parentheses.

You might want to use readline() for entire lines and read() for a certain number of characters in your code; it can help make your code a little easier to understand.
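A sketch of readline() with and without a limit, using illustrative contents. Note that the newline stays at the end of each line, and an empty string signals the end of the file:

```python
out_file = open("mydata.txt", "w")
out_file.write("Saturday\nSunday\n")
out_file.close()

in_file = open("mydata.txt", "r")
line1 = in_file.readline()     # whole first line, newline included
part = in_file.readline(3)     # at most 3 characters of the next line
line2 = in_file.readline()     # the rest of that line
done = in_file.readline()      # '' means end of file
in_file.close()
```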
Python File Basics: The readlines() Function


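Like the other two, readlines() accepts an optional size argument, but here it works as a hint: readlines() always reads whole lines and stops once the total amount read exceeds that hint.

If you don't provide a size, it reads to the end of the file, not just to the end of the line, and returns all the lines as a list of strings.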
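A sketch of readlines() with illustrative contents: without an argument it returns every remaining line as a list, and a small hint still returns whole lines:

```python
out_file = open("mydata.txt", "w")
out_file.write("Saturday\nSunday\nMonday\n")
out_file.close()

in_file = open("mydata.txt", "r")
all_lines = in_file.readlines()    # every line, each ending in '\n'
in_file.close()

in_file = open("mydata.txt", "r")
some_lines = in_file.readlines(1)  # hint of 1: stops after the first whole line
in_file.close()

print(all_lines)    # ['Saturday\n', 'Sunday\n', 'Monday\n']
print(some_lines)   # ['Saturday\n']
```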

More File Operations: Appending to an Existing File


If you want a file that logs information about the user each time they work with your program, and that doesn't erase the previous data each time, you'll need to open the file in append mode with the 'a' argument.

This opens the same data file you were using before, but this time it keeps all the existing data and adds the new data at the end of the file. (If the file doesn't exist yet, 'a' creates it.)
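A sketch of append mode with illustrative log lines: the second session's text is added after the first instead of replacing it:

```python
out_file = open("mydata.txt", "w")    # 'w' starts the file fresh
out_file.write("first session\n")
out_file.close()

log_file = open("mydata.txt", "a")    # 'a' keeps the existing data
log_file.write("second session\n")    # added at the end of the file
log_file.close()

in_file = open("mydata.txt", "r")
contents = in_file.read()
in_file.close()
print(contents)
```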

More File Operations: Other Options when Opening Files


As you have seen, opening a file for output ('w') means you can only write to it, and opening a file for input ('r') means you can only read from it.

Although you can always close a file and then reopen it in the other mode, that extra set of steps can be a hassle.

For this reason, Python provides two other ways of opening your files: 'r+' and 'w+'.

Both give you the ability to read and write your files, but there is a difference.

More File Operations: Other Options when Opening Files


If you attempt to open your file with 'r+' and that file doesn't exist, Python raises a FileNotFoundError exception (in Python 3 this is a subclass of OSError, and IOError is now just another name for OSError), and unless you handle it, your program stops. That's because you can't read from a file that doesn't exist.

On the other hand, if you open a nonexistent file with 'w+', Python simply creates one for you.

Note, however, that if the file did exist, its data would be erased, just as if you had opened the file with 'w'.
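A sketch of both modes. The name no_such_file.txt is hypothetical and assumed not to exist; scratch.txt is an illustrative name that 'w+' creates (or empties). The seek(0) call, covered later in this lesson, moves back to the start before reading:

```python
try:
    f = open("no_such_file.txt", "r+")   # hypothetical file that doesn't exist
    f.close()
except FileNotFoundError as err:
    print("r+ needs an existing file:", err)

f = open("scratch.txt", "w+")   # created if missing, emptied if present
f.write("hello")
f.seek(0)                       # back to the start before reading
result = f.read()
f.close()
print(result)                   # hello
```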

More File Operations: The tell() Function


To read and write to the same file at the same time, you can still use the read() and write() functions that you learned earlier.

The only difference is that now you have to keep track of your current position in the file as you do your reading and writing.

You can always find out where you are in your file with the tell() function. This gives you your current file position as the number of bytes from the start of the file. (For files opened in text mode, it's safest to treat this number as a bookmark you can later pass back to seek().)


One thing to realize is that when you read in an entire line of text, the newline character is also read in.

And while we call this one character, on Windows it's actually stored as two characters (\r\n) inside the file, so the position reported by tell() counts it as two bytes.
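A sketch of tell() with illustrative contents. After reading the 9-character first line (8 letters plus the newline), the reported position depends on how the newline is stored on disk:

```python
out_file = open("mydata.txt", "w")
out_file.write("Saturday\nSunday\n")
out_file.close()

in_file = open("mydata.txt", "r")
start = in_file.tell()      # 0: nothing read yet
line = in_file.readline()   # 'Saturday\n'
after = in_file.tell()      # 9 on Linux/macOS; 10 on Windows (\r\n on disk)
in_file.close()
print(start, repr(line), after)
```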

More File Operations: The seek() Function


Of course, if there's a way to determine where you're located in a file, there's also a way to change that location. You can do this with the seek() function.

When you use seek(), you must provide where in the file you want to move by specifying a number that represents the number of bytes from the beginning of the file. For files opened in text mode, only use 0 or a position that tell() returned earlier; other values can give unpredictable results.
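A sketch combining tell() and seek() on illustrative contents: save a position, read past it, then jump back to it and to the very beginning:

```python
out_file = open("mydata.txt", "w")
out_file.write("Saturday\nSunday\n")
out_file.close()

in_file = open("mydata.txt", "r")
in_file.readline()             # skip the first line
bookmark = in_file.tell()      # remember where the second line starts
second = in_file.readline()    # 'Sunday\n'
in_file.seek(bookmark)         # jump back to the saved position
again = in_file.readline()     # 'Sunday\n' again
in_file.seek(0)                # back to the very beginning
first_char = in_file.read(1)   # 'S'
in_file.close()
```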
More File Operations: Reading and Writing at the Same Time

Now that you know how to move around inside a file, you're ready to start reading and writing to a file at the same time.

Be aware that when you're writing to a file that already has data in it, you're going to be overwriting its characters; existing characters don't automatically move over to accommodate new text the way they do in a word processor.
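A sketch of that overwriting behavior with illustrative text: writing "Mon" at the start replaces "Sat" in place, and the remaining characters stay where they were:

```python
f = open("mydata.txt", "w+")
f.write("Saturday")
f.seek(0)          # back to the start
f.write("Mon")     # overwrites 'Sat'; nothing shifts over
f.seek(0)
result = f.read()
f.close()
print(result)      # Monurday
```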

More File Operations:


Now that you understand how to work with basic files, let's explore how to work with a database-like file. For that, you'll need to learn about a couple more Python modules.

Pickle and Shelve: Introduction to Pickle

The pickling process simply converts an object to a stream of bytes. This stream can then be converted back to the original object later.

This is a useful thing to do for a couple of reasons.

First, depending on the data, the byte stream can take up a little less space on disk than writing every value out as text, although this isn't guaranteed.

But possibly more important, we can store an entire object using just a single line of code.

That is, without the ability to use pickle, we would need to store every field of an object one at a time in the file, and then when we restored the data, we would need to read in each piece of data and create a new object. Even for an object from our little Time class, we're talking about replacing three lines of code (one each for the hour, minute, and second) with a single line.
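A sketch of that comparison, using a hypothetical Time class with hour, minute, and second attributes (the lesson's own class isn't shown here). The manual approach handles each field separately; pickle saves and restores the whole object in one call each:

```python
import pickle

class Time:
    """Hypothetical stand-in for the lesson's Time class."""
    def __init__(self, hour, minute, second):
        self.hour = hour
        self.minute = minute
        self.second = second

t = Time(9, 30, 15)

# Without pickle: save each field separately, then rebuild the object by hand
saved = [str(t.hour), str(t.minute), str(t.second)]
rebuilt = Time(int(saved[0]), int(saved[1]), int(saved[2]))

# With pickle: one call saves the whole object, one call restores it
data = pickle.dumps(t)
restored = pickle.loads(data)
print(restored.hour, restored.minute, restored.second)   # 9 30 15
```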

Pickle and Shelve: Introduction to Pickle


There are two different ways to pickle an object, depending on where you want the result to be stored.

Note, however, that in order to use any of the pickling functions, you need an import pickle statement.

You use the first function, dumps(), if you want to keep the result in memory as a bytes object. The other function, dump(), writes the result to a file.

Pickle and Shelve: Introduction to Pickle


Example
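A minimal sketch of dumps() on an illustrative list; the result is a bytes object, not a regular string:

```python
import pickle

days = ["Saturday", "Sunday"]    # illustrative list
pickled = pickle.dumps(days)     # pickle into a bytes object in memory

print(type(pickled))             # <class 'bytes'>
print(pickled)                   # a hard-to-read stream of bytes
```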

Pickle and Shelve: Introduction to Pickle


One of the reasons pickling is so important is that we're storing the values of all the instance variables in an object with a single line of code.

If we didn't have the ability to use pickle, we would need to access each data member one at a time.

Pickling converts your list to a stream of bytes, which dumps() returns as a bytes object rather than a regular str.

It shouldn't be a surprise that the stream is hard to read. However, putting that stream in a data file for later use can be quite helpful in certain situations, as you'll see next.

Pickle and Shelve: Pickling to a File


Sending the pickled result to a file is very similar to sending it to a string, with two differences.

First, the function name is different: remember, we use dump() for files.

Second, you need to provide the open file object (not just the file's name) as the second argument to the function call.

The file needs to be opened so you can write bytes to it, because the pickled object is a sequence of bytes, not characters.

This is easy in Python: just use 'wb' instead of the 'w' you were using before.

Pickle and Shelve: Pickling to a File


For example, if you want to send the list above to a data file named data.txt, you'd need to do this:
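A sketch of that step, assuming the list holds the illustrative values below. The file is opened with 'wb', and dump() takes the object first and the open file second:

```python
import pickle

days = ["Saturday", "Sunday"]      # illustrative list
out_file = open("data.txt", "wb")  # 'wb': open for writing bytes
pickle.dump(days, out_file)        # object first, open file object second
out_file.close()
```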

Pickle and Shelve: Pickling from a File


To get the data back to its original form, you'll need either the loads() or the load() function.

Notice how loads() works just like dumps().

That is, you place the variable that's holding the pickled data inside the parentheses.

The result is returned, and in this case, we're storing it in another variable.
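A sketch of loads() on an illustrative list. The restored list has the same contents as the original but is a brand-new object:

```python
import pickle

days = ["Saturday", "Sunday"]
pickled = pickle.dumps(days)

restored = pickle.loads(pickled)   # pass in the pickled data, get an object back
print(restored)                    # ['Saturday', 'Sunday']
print(restored == days)            # True: equal contents
print(restored is days)            # False: a new list object
```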
Pickle and Shelve: Pickling from a File


The same idea works with load() and data files, except that you need to remember to first open the data file so that it can read bytes, using 'rb'.

What if there's more than one pickled object in the data file?

The load() function reads these objects one at a time. The first call to load() gets the first object, the next call gets the second object, and so on. Once no objects are left, load() raises an EOFError.
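A sketch with two illustrative objects pickled into data.txt. Each load() call returns the next object, and an EOFError marks the end:

```python
import pickle

out_file = open("data.txt", "wb")
pickle.dump(["Saturday", "Sunday"], out_file)   # first object
pickle.dump({"hour": 9}, out_file)              # second object
out_file.close()

in_file = open("data.txt", "rb")   # 'rb': open for reading bytes
first = pickle.load(in_file)       # ['Saturday', 'Sunday']
second = pickle.load(in_file)      # {'hour': 9}
reached_end = False
try:
    pickle.load(in_file)           # nothing left to read
except EOFError:
    reached_end = True
    print("No more pickled objects")
in_file.close()
```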

Pickle and Shelve: Python Shelves


As you can see, pickling is a handy way of converting your data into bytes and storing them in an external data file. It is especially useful when used in conjunction with shelves.

A shelf is a database-like object that can efficiently store pickled values.

In actuality, a shelf is an external data file that is used the same way as a Python dictionary.

The only difference is that in a shelf, the keys must be strings and the values must be objects that can be pickled.

Pickle and Shelve: Python Shelves


To use a shelf in your program, you first add the import shelve line.

Next, you can open the shelf file by using the shelve.open() function.

This works just like the open() function for regular files, with the name of the file as the first argument and a flag to tell the computer how the file should be opened as the second argument.

However, the flags for shelves are a little different.

Pickle and Shelve: Python Shelves


You can use the 'r' flag to open an existing shelf for reading only, and the 'w' flag to open an existing shelf for both reading and writing.

Alternatively, you can use the 'c' flag (the default), which also opens the shelf for both reading and writing but creates a new file if it doesn't already exist.

The last flag is 'n', which creates a new, empty file no matter what.
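A sketch of three of the flags on a shelf named letters.txt (the file name used in this lesson; the record is illustrative). It starts empty with 'n', adds a record with 'c', and reads it back with 'r':

```python
import shelve

db = shelve.open("letters.txt", "n")   # 'n': a new, empty shelf every time
db.close()

db = shelve.open("letters.txt", "c")   # 'c': read/write, created if missing
db["end"] = ["x", "y", "z"]
db.close()

db = shelve.open("letters.txt", "r")   # 'r': existing shelf, read-only
keys_seen = list(db.keys())
db.close()
print(keys_seen)                       # ['end']
```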

Pickle and Shelve: Python Shelves


The following code will open the file letters.txt and write two different records.

The first one will have the vowels a, e, i, o, and u.

The second will have the key 'end' that contains the letters x, y, and z.

Note that when you run this code, Python may produce additional files, for example with the extensions .bak, .dat, and .dir, depending on which database module your platform uses.

Just know that these files together make up the shelf and are not intended to be human-readable.
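A sketch of that code. The key for the vowels record isn't given, so 'vowels' is a hypothetical name here; the shelf is then reopened read-only to show the records were saved:

```python
import shelve

letters = shelve.open("letters.txt", "c")
letters["vowels"] = ["a", "e", "i", "o", "u"]   # hypothetical key name
letters["end"] = ["x", "y", "z"]
letters.close()

# Reopen the shelf to show the records were saved
letters = shelve.open("letters.txt", "r")
vowels = letters["vowels"]
end = letters["end"]
letters.close()
```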

Pickle and Shelve: Interacting with the Shelf


Notice how the syntax for adding an item is the same as adding an item to a regular dictionary. The difference, of course, is that this data is being sent to a file.

Other operations that are possible on your shelf are accessing a value by providing its key, using the in operator, and using the keys() function.

If you decide that you want to remove a record from the file, you can use del, just like you did with dictionaries.
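A sketch of those dictionary-style operations on a shelf. 'vowels' is a hypothetical key name, and the 'n' flag starts from an empty shelf so the output is predictable:

```python
import shelve

letters = shelve.open("letters.txt", "n")       # empty shelf for this demo
letters["vowels"] = ["a", "e", "i", "o", "u"]   # hypothetical key name
letters["end"] = ["x", "y", "z"]

end = letters["end"]                  # look up a value by its key
has_vowels = "vowels" in letters      # the in operator
all_keys = sorted(letters.keys())     # ['end', 'vowels']

del letters["end"]                    # remove a record, just like a dict
end_gone = "end" not in letters
letters.close()
```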
Pickle and Shelve: The sync( ) Function


Because file operations are time consuming, shelves don't always write the data to the file immediately.

If you want to write the data immediately, you can use the sync() function. It is similar to the flush() function you used with simple files earlier.

We'll now see some examples to better acquaint you with everything we've covered so far.
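A sketch of sync() on the letters.txt shelf with an illustrative record. sync() writes pending changes while the shelf stays open (and, for shelves opened with writeback=True, also writes out cached entries); close() does the same before closing:

```python
import shelve

letters = shelve.open("letters.txt", "c")
letters["end"] = ["x", "y", "z"]
letters.sync()      # force the pending record out to the file now
# ... the program could keep working with the open shelf here ...
letters.close()

letters = shelve.open("letters.txt", "r")
end = letters["end"]
letters.close()
```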

Reading and Writing CSV Files in Python


Exchanging information through text files is a common way to share information between programs. One of the most popular formats for exchanging data is the CSV format. But how do you use it?

Let's get one thing clear: you don't have to (and you won't) build your own CSV parser from scratch.

There are several perfectly acceptable libraries you can use. The Python csv library will work for most cases.

If your work requires lots of data or numerical analysis, the pandas library has CSV parsing capabilities as well, which should handle the rest.

Reading and Writing CSV Files in Python


In this section we will see:
How to read, process, and parse CSV from text files using Python.

You’ll see how CSV files work, learn the all-important csv library built into Python, and see how CSV parsing works using the pandas library.

We will also see how to install libraries using pip.

But first, let’s get acquainted with CSV.

What Is a CSV File?


A CSV file (Comma Separated Values file) is a type of plain text file that uses specific structuring to arrange tabular data.

Because it’s a plain text file, it can contain only actual text data, in other words, printable ASCII or Unicode characters.

Normally, CSV files use a comma to separate each specific data value. Here’s what that structure looks like:

What Is a CSV File?


Notice how each piece of data is separated by a comma.

Normally, the first line identifies each piece of data, in other words, the name of a data column.

Every subsequent line after that is actual data and is limited only by file size constraints.

In general, the separator character is called a delimiter, and the comma is not the only one used.

Other popular delimiters include the tab (\t), colon (:), and semicolon (;) characters.

Properly parsing a CSV file requires us to know which delimiter is being used.
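A sketch of why the delimiter matters, using hypothetical data and io.StringIO as an in-memory stand-in for a file. The same rows parse correctly only when the reader is told which delimiter is in use:

```python
import csv
import io

# Hypothetical data, once with commas and once with semicolons
comma_text = "name,department\nAlice Jones,Accounting\n"
semicolon_text = "name;department\nAlice Jones;Accounting\n"

comma_rows = list(csv.reader(io.StringIO(comma_text)))
semicolon_rows = list(csv.reader(io.StringIO(semicolon_text), delimiter=";"))
wrong_rows = list(csv.reader(io.StringIO(semicolon_text)))   # delimiter not given

print(comma_rows)       # [['name', 'department'], ['Alice Jones', 'Accounting']]
print(semicolon_rows)   # same as above
print(wrong_rows)       # [['name;department'], ['Alice Jones;Accounting']]
```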
What Is a CSV File?

CSV files are normally created by programs that handle large amounts of data. They are a convenient way to export data from spreadsheets and databases as well as to import or use it in other programs.

For example, you might export the results of a data mining program to a CSV file and then import that into a spreadsheet to analyze the data, generate graphs for a presentation, or prepare a report for publication.

CSV files are very easy to work with programmatically. Any language that supports text file input and string manipulation (like Python) can work with CSV files directly.
Parsing CSV Files With Python’s Built-in CSV Library

The csv library provides functionality to both read from and write to CSV files. Designed to work out of the box with Excel-generated CSV files, it is easily adapted to work with a variety of CSV formats.

The csv library contains objects and other code to read, write, and process data from and to CSV files.
Reading CSV Files With csv

Reading from a CSV file is done using the reader object.

The CSV file is opened as a text file with Python’s built-in open() function, which returns a file object.

This is then passed to the reader, which does the heavy lifting.

For example, here’s a file called employee_birthday.txt:

Reading CSV Files With csv


Here’s code to read the CSV file:
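The slide's code isn't reproduced here, so this is a sketch under assumptions: employee_birthday.txt gets hypothetical contents with name, department, and birthday month columns. It uses with blocks so the files close automatically, and newline="" as the csv documentation recommends:

```python
import csv

# Hypothetical sample contents for employee_birthday.txt
with open("employee_birthday.txt", "w", newline="") as f:
    f.write("name,department,birthday month\n")
    f.write("Alice Jones,Accounting,November\n")
    f.write("Bob Lee,IT,March\n")

with open("employee_birthday.txt", newline="") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    line_count = 0
    for row in csv_reader:              # each row is a list of strings
        if line_count == 0:
            print(f"Column names are {', '.join(row)}")
        else:
            print(f"{row[0]} works in {row[1]} and was born in {row[2]}.")
        line_count += 1

print(f"Processed {line_count} lines.")
```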

Reading CSV Files Into a Dictionary With csv

Rather than deal with a list of individual string elements, you can read CSV data directly into a dictionary using a DictReader.

We will be working with the same file, employee_birthday.txt.
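A sketch of csv.DictReader, again with hypothetical contents for employee_birthday.txt. The header line supplies the dictionary keys, so fields are accessed by column name instead of position:

```python
import csv

# Hypothetical sample contents for employee_birthday.txt
with open("employee_birthday.txt", "w", newline="") as f:
    f.write("name,department,birthday month\n")
    f.write("Alice Jones,Accounting,November\n")
    f.write("Bob Lee,IT,March\n")

with open("employee_birthday.txt", newline="") as csv_file:
    rows = list(csv.DictReader(csv_file))   # one dict per data line

for row in rows:
    print(f"{row['name']} works in {row['department']}, born in {row['birthday month']}.")
```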

Optional Python CSV reader Parameters


The reader object can handle different styles of CSV files by specifying additional parameters, some of which are shown below:

delimiter specifies the character used to separate each field. The default is the comma (',').

quotechar specifies the character used to surround fields that contain the delimiter character. The default is a double quote ('"').

escapechar specifies the character used to escape the delimiter character, in case quotes aren’t used. The default is no escape character.

Optional Python CSV reader Parameters


For example, suppose we have employee_addresses.txt and it looks like this:


The problem is that the data for the address field also contains a comma, which the reader would treat as a separator between fields.

Optional Python CSV reader Parameters


There are three different ways to handle this situation:

Use a different delimiter
That way, the comma can safely be used in the data itself. You use the delimiter optional parameter to specify the new delimiter.

Wrap the data in quotes
The special nature of your chosen delimiter is ignored in quoted strings. You can specify the character used for quoting with the quotechar optional parameter. As long as that character also doesn’t appear in the data, you’re fine.

Escape the delimiter characters in the data
Escape characters work just as they do in ordinary Python strings, nullifying the interpretation of the character being escaped (in this case, the delimiter). If an escape character is used, it must be specified using the escapechar optional parameter.
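A sketch of all three approaches on a hypothetical address row that contains a comma, with io.StringIO standing in for the file. Each reader recovers the same three fields:

```python
import csv
import io

# Hypothetical rows whose address field contains a comma
piped = "name|address|date joined\nAlice Jones|12 Main St, Springfield|March\n"
quoted = 'name,address,date joined\nAlice Jones,"12 Main St, Springfield",March\n'
escaped = "name,address,date joined\nAlice Jones,12 Main St\\, Springfield,March\n"

# 1. Use a different delimiter
by_delimiter = list(csv.reader(io.StringIO(piped), delimiter="|"))[1]
# 2. Wrap the data in quotes
by_quotes = list(csv.reader(io.StringIO(quoted), quotechar='"'))[1]
# 3. Escape the delimiter in the data
by_escape = list(csv.reader(io.StringIO(escaped), escapechar="\\"))[1]

print(by_delimiter)   # ['Alice Jones', '12 Main St, Springfield', 'March']
print(by_quotes)      # same as above
print(by_escape)      # same as above
```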

Writing CSV Files With csv


You can also write to a CSV file using a writer object and the .writerow() method:

The quotechar optional parameter tells the writer which character to use to quote fields when writing.


If quoting is set to csv.QUOTE_MINIMAL, then .writerow() will quote fields only if they contain the delimiter, the quotechar, or a line break. This is the default case.

If quoting is set to csv.QUOTE_ALL, then .writerow() will quote all fields.

If quoting is set to csv.QUOTE_NONNUMERIC, then .writerow() will quote all fields containing text data and leave numbers unquoted. (When reading, the same setting makes the reader convert all unquoted fields to the float data type.)

If quoting is set to csv.QUOTE_NONE, then .writerow() will escape delimiters instead of quoting them. In this case, you also must provide a value for the escapechar optional parameter.
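A sketch of the writer with hypothetical rows: the file uses the default QUOTE_MINIMAL, and in-memory buffers compare the other quoting settings on a single row. The writer ends each row with \r\n by default:

```python
import csv
import io

with open("employee_file.csv", "w", newline="") as employee_file:
    employee_writer = csv.writer(employee_file, delimiter=",", quotechar='"',
                                 quoting=csv.QUOTE_MINIMAL)
    employee_writer.writerow(["Alice Jones", "Accounting", "November"])
    employee_writer.writerow(["Bob Lee", "IT, Support", "March"])   # contains a comma

with open("employee_file.csv") as f:
    written = f.read()
print(written)   # only "IT, Support" is quoted

# Compare other quoting settings on one row, using in-memory buffers
results = {}
for name, setting in [("ALL", csv.QUOTE_ALL), ("NONNUMERIC", csv.QUOTE_NONNUMERIC)]:
    buffer = io.StringIO()
    csv.writer(buffer, quoting=setting).writerow(["Bob Lee", 42])
    results[name] = buffer.getvalue()

buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_NONE, escapechar="\\").writerow(["IT, Support", "March"])
results["NONE"] = buffer.getvalue()   # the comma is escaped, not quoted

for name, text in results.items():
    print(name, repr(text))
```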

Writing CSV Files With csv


Reading the file back in plain text shows that the
file is created as follows:

Parsing CSV Files With the pandas Library

Reading CSV files is possible in pandas as well. It is highly recommended if you have a lot of data to analyze.

pandas is an open-source Python library that provides high-performance data analysis tools and easy-to-use data structures.

pandas is available for all Python installations, but it is a key part of the Anaconda distribution and works extremely well in Jupyter notebooks to share data, code, analysis results, visualizations, and narrative text.
Parsing CSV Files With the pandas Library

You can install pandas either through conda or pip.

You can read about how to use conda if you work with Anaconda or Jupyter; for now, we will see how to use pip.

For example, enter this command in cmd (the Windows command prompt):
pip install pandas
Reading CSV Files With pandas


Once you have pandas installed, you can use it to read CSV files and much more.

For example, suppose we have a data file like this:

Reading CSV Files With pandas


You can easily read the hrdata.csv file into a pandas DataFrame with this code:
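The lesson's data isn't reproduced here, so this sketch writes hypothetical contents for hrdata.csv (Name, Hire Date, Salary, and Sick Days remaining columns) and then loads it with pd.read_csv(), which returns a DataFrame:

```python
import pandas as pd

# Hypothetical sample contents for hrdata.csv
with open("hrdata.csv", "w", newline="") as f:
    f.write("Name,Hire Date,Salary,Sick Days remaining\n")
    f.write("Alice Jones,2014-03-15,50000.00,10\n")
    f.write("Bob Lee,2015-06-01,65000.00,8\n")

df = pd.read_csv("hrdata.csv")   # header line becomes the column names
print(df)
print(df.dtypes)                 # Salary and Sick Days remaining are numeric
```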

Reading CSV Files With pandas


Some notes:

First, pandas recognized that the first line of the CSV contained column names, and used them automatically.

However, pandas is also using zero-based integer indices in the DataFrame. That’s because we didn’t tell it what our index should be.

Further, if you look at the data types of our columns, you’ll see pandas has properly converted the Salary and Sick Days remaining columns to numbers, but the Hire Date column is still a string. This is easily confirmed in interactive mode:
print(type(df['Hire Date'][0]))

Reading CSV Files With pandas


To use a different column as the DataFrame
index, add the index_col optional parameter:
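A sketch of index_col, again writing hypothetical contents for hrdata.csv first. With Name as the index, rows can be looked up by employee name:

```python
import pandas as pd

# Hypothetical sample contents for hrdata.csv
with open("hrdata.csv", "w", newline="") as f:
    f.write("Name,Hire Date,Salary,Sick Days remaining\n")
    f.write("Alice Jones,2014-03-15,50000.00,10\n")
    f.write("Bob Lee,2015-06-01,65000.00,8\n")

df = pd.read_csv("hrdata.csv", index_col="Name")
print(df)
print(df.loc["Bob Lee", "Salary"])   # 65000.0
```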

Reading CSV Files With pandas


You can force pandas to read data as a date with the parse_dates optional parameter, which is defined as a list of column names to treat as dates:
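A sketch of parse_dates on hypothetical hrdata.csv contents. Because Name is the index here, the sketch uses .iloc[0] to select the first row by position:

```python
import pandas as pd

# Hypothetical sample contents for hrdata.csv
with open("hrdata.csv", "w", newline="") as f:
    f.write("Name,Hire Date,Salary,Sick Days remaining\n")
    f.write("Alice Jones,2014-03-15,50000.00,10\n")
    f.write("Bob Lee,2015-06-01,65000.00,8\n")

df = pd.read_csv("hrdata.csv", index_col="Name", parse_dates=["Hire Date"])
first_hire = df["Hire Date"].iloc[0]   # first row by position
print(type(first_hire))                # a pandas Timestamp, no longer a str
```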

Reading CSV Files With pandas


You can check that the date is parsed appropriately by typing this at the prompt:
print(type(df['Hire Date'][0]))

You will get a result like:
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

If your CSV file doesn’t have column names in the first line, you can use the names optional parameter to provide a list of column names.
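A sketch of names on a file with no header line. The data is hypothetical and io.StringIO stands in for the file; when names is given, pandas treats the first line as data rather than as column names:

```python
import io

import pandas as pd

# Hypothetical contents with no header line
raw = io.StringIO("Alice Jones,2014-03-15,50000.00,10\n"
                  "Bob Lee,2015-06-01,65000.00,8\n")

df = pd.read_csv(raw, names=["Name", "Hire Date", "Salary", "Sick Days remaining"])
print(df)
```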

Reading CSV Files With pandas


You can also use this if you want to override the
column names provided in the first line. In this
case, you must also tell pandas.read_csv() to
ignore existing column names using the
header=0 optional parameter:
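A sketch of overriding existing column names, using hypothetical hrdata.csv contents. header=0 tells pandas the first line is a header to replace, and index_col and parse_dates refer to the new names:

```python
import pandas as pd

# Hypothetical sample contents for hrdata.csv
with open("hrdata.csv", "w", newline="") as f:
    f.write("Name,Hire Date,Salary,Sick Days remaining\n")
    f.write("Alice Jones,2014-03-15,50000.00,10\n")
    f.write("Bob Lee,2015-06-01,65000.00,8\n")

df = pd.read_csv("hrdata.csv",
                 index_col="Employee",
                 parse_dates=["Hired"],
                 header=0,
                 names=["Employee", "Hired", "Salary", "Sick Days"])
print(df)
```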

Reading CSV Files With pandas


Notice that, since the column names changed,
the columns specified in the index_col and
parse_dates optional parameters must also be
changed.

Writing CSV Files With pandas


Writing a DataFrame to a CSV file is just as
easy as reading one in. Let’s write the data
with the new column names to a new CSV file:
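A sketch of writing the renamed DataFrame back out with to_csv(). hrdata.csv gets hypothetical contents, and hrdata_modified.csv is an illustrative output name. The index column (Employee) is written first:

```python
import pandas as pd

# Hypothetical sample contents for hrdata.csv
with open("hrdata.csv", "w", newline="") as f:
    f.write("Name,Hire Date,Salary,Sick Days remaining\n")
    f.write("Alice Jones,2014-03-15,50000.00,10\n")
    f.write("Bob Lee,2015-06-01,65000.00,8\n")

df = pd.read_csv("hrdata.csv",
                 index_col="Employee",
                 parse_dates=["Hired"],
                 header=0,
                 names=["Employee", "Hired", "Salary", "Sick Days"])
df.to_csv("hrdata_modified.csv")   # illustrative output file name

with open("hrdata_modified.csv") as f:
    first_line = f.readline().strip()
print(first_line)   # Employee,Hired,Salary,Sick Days
```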

Writing CSV Files With pandas


The only difference between this code and the
reading code above is that the print(df) call
was replaced with df.to_csv(), providing the file
name. The new CSV file looks like this:

Lesson 10 Review


This lesson started by discussing external data files.

You learned how to create, open, write, and close a data file. Then you went on to learn how to open the same file and read the data out of it.

Although working with simple, sequential files one line at a time can be easy, you also learned that Python enables you to move around in the file wherever you want with some additional functions.

You also saw how to work with CSV files with the built-in csv library as well as with pandas.

Some exercises:


Write a simple library management system. It should have one class called Book, which contains the title, author name, publisher, ISBN, edition, etc.

The main program should allow the user to create new books or view the books already there.

You should be able to see the list of books even when you restart the application.
