0% found this document useful (0 votes)

25 views8 pages

Week 5

Chapter 8 discusses methods for importing and exporting data using pandas and other techniques. It covers importing data from various formats such as CSV, Excel, and STATA files, as well as methods for handling data without pandas, including using loadtxt and genfromtxt. Additionally, it addresses reading complex files and provides examples of how to manage different data types effectively.

Uploaded by

Minh Tài

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

25 views8 pages

Week 5

Uploaded by

Minh Tài

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 8

Chapter 8

Importing and Exporting Data

8.1 Importing Data using pandas

pandas is an increasingly important component of the Python scientific stack, and a complete discussion of its
main features is included in Chapter 16. All of the data readers in pandas load data into a pandas DataFrame
(see Section 16.1.2), and so these examples all make use of the values property to extract a NumPy array. In
practice, the DataFrame is much more useful since it includes useful information such as column names read
from the data source. In addition to the three formats presented here, pandas can also read json, SQL, html
tables or from the clipboard, which is particularly useful for interactive work since virtually any source that can
be copied to the clipboard can be imported.

8.1.1 CSV and other formatted text files

Comma-separated value (CSV) files can be read using read_csv. When the CSV file contains mixed data, the
default behavior will read the file into an array with an object data type, and so further processing is usually
required to extract the individual series.
>>> from pandas import read_csv
>>> csv_data = read_csv('FTSE_1984_2012.csv')
>>> csv_data = csv_data.to_numpy() # As a NumPy array
>>> csv_data[:4]
array([['2012-02-15', 5899.9, 5923.8, 5880.6, 5892.2, 801550000, 5892.2],
['2012-02-14', 5905.7, 5920.6, 5877.2, 5899.9, 832567200, 5899.9],
['2012-02-13', 5852.4, 5920.1, 5852.4, 5905.7, 643543000, 5905.7],
['2012-02-10', 5895.5, 5895.5, 5839.9, 5852.4, 948790200, 5852.4]],
dtype=object)

>>> open_price = csv_data[:,1]

When the entire file is numeric, the data will be stored as a homogeneous array using one of the numeric
data types, typically float64. In this example, the first column contains Excel dates as numbers, which are the
number of days past January 1, 1900.
>>> csv_data = read_csv('FTSE_1984_2012_numeric.csv')
>>> csv_data = csv_data.to_numpy()
>>> csv_data[:4,:2]
array([[ 40954. , 5899.9],
[ 40953. , 5905.7],
[ 40952. , 5852.4],
[ 40949. , 5895.5]])
76 Importing and Exporting Data

8.1.2 Excel files

Excel files, both 97/2003 (xls) and 2007/10/13 (xlsx), can be imported using read_excel. Two inputs are
required to use read_excel, the filename and the sheet name containing the data. In this example, pandas
makes use of the information in the Excel workbook that the first column contains dates and converts these to
datetimes. Like the mixed CSV data, the array returned has object data type.
>>> from pandas import read_excel
>>> excel_data = read_excel('FTSE_1984_2012.xls','FTSE_1984_2012')
>>> excel_data = excel_data.values
>>> excel_data[:4,:2]
array([[datetime.datetime(2012, 2, 15, 0, 0), 5899.9],
[datetime.datetime(2012, 2, 14, 0, 0), 5905.7],
[datetime.datetime(2012, 2, 13, 0, 0), 5852.4],
[datetime.datetime(2012, 2, 10, 0, 0), 5895.5]], dtype=object)

>>> open_price = excel_data[:,1]

8.1.3 STATA files

pandas also contains a method to read STATA files.
>>> from pandas import read_stata
>>> stata_data = read_stata('FTSE_1984_2012.dta')
>>> stata_data = stata_data.values
>>> stata_data[:4,:2]
array([[ 0.00000000e+00, 4.09540000e+04],
[ 1.00000000e+00, 4.09530000e+04],
[ 2.00000000e+00, 4.09520000e+04],
[ 3.00000000e+00, 4.09490000e+04]])

8.2 Importing Data without pandas

Importing data without pandas ranges from easy to difficult depending on whether the files contain only num-
bers, the data size and the regularity of the format of the data. A few principles can simplify this task:
• The file imported should contain numbers only, with the exception of the first row which may contain the
variable names.

• Use another program, such as Microsoft Excel, to manipulate data before importing.

• Each column of the spreadsheet should contain a single variable.

• Dates should be converted to YYYYMMDD, a numeric format, before importing. This can be done in
Excel using the formula:
=10000⁎YEAR(A1)+100⁎MONTH(A1)+DAY(A1)+(A1-FLOOR(A1,1))

• Store times separately from dates using a numeric format such as seconds past midnight or HHmmSS.sss.

8.2.1 CSV and other formatted text files

A number of importers are available for regular (e.g. all rows have the same number of columns) comma-
separated value (CSV) data. The choice of which importer to use depends on the complexity and size of the
file. Purely numeric files are the simplest to import, although most files which have a repeated structure can be
directly imported (unless they are very large).
8.2 Importing Data without pandas 77

loadtxt
loadtxt is a simple, fast text importer. The basic use is loadtxt(filename), which will attempt to load the
data in file name as floats. Other useful named arguments include delim, which allow the file delimiter to be
specified, and skiprows which allows one or more rows to be skipped.
loadtxt requires the data to be numeric and so is only useful for the simplest files.
>>> data = loadtxt('FTSE_1984_2012.csv',delimiter=',') # Error
ValueError: could not convert string to float: Date

# Fails since CSV has a header

>>> data = loadtxt('FTSE_1984_2012_numeric.csv',delimiter=',') # Error
ValueError: could not convert string to float: Date

>>> data = loadtxt('FTSE_1984_2012_numeric.csv',delimiter=',',skiprows=1)

>>> data[0]
array([ 4.09540000e+04, 5.89990000e+03, 5.92380000e+03, 5.88060000e+03,
5.89220000e+03, 8.01550000e+08, 5.89220000e+03])

genfromtxt
genfromtxt is a slightly slower, more robust importer. genfromtxt is called using the same syntax as loadtxt,
but will not fail if a non-numeric type is encountered. Instead, genfromtxt will return a NaN (not-a-number)
for fields in the file it cannot read.
>>> data = genfromtxt('FTSE_1984_2012.csv',delimiter=',')
>>> data[0]
array([ nan, nan, nan, nan, nan, nan, nan])
>>> data[1]
array([ nan, 5.89990000e+03, 5.92380000e+03, 5.88060000e+03, 5.89220000e+03, 8.01550000e+08,
5.89220000e+03])

Tab delimited data can be read in a similar manner using delimiter='\t'.

>>> data = genfromtxt('FTSE_1984_2012_numeric_tab.txt',delimiter='\t')

csv2rec
csv2rec has been removed from matplotlib. pandas is the preferred method to import csv data.

8.2.2 Excel Files

xlrd
Reading Excel files in Python is more involved, and it is simpler to convert the xls to CSV. Excel files can be
read using xlrd (which is part of xlutils).
import xlrd

wb = xlrd.open_workbook('FTSE_1984_2012.xls')
# To read xlsx change the filename
# wb = xlrd.open_workbook('FTSE_1984_2012.xlsx')
sheetNames = wb.sheet_names()
# Assumes 1 sheet name
sheet = wb.sheet_by_name(sheetNames[0])
excelData = [] # List to hold data
for i in range(sheet.nrows):
78 Importing and Exporting Data

excelData.append(sheet.row_values(i))

# Subtract 1 since excelData has the header row

open_price = empty(len(excelData) - 1)
for i in range(len(excelData) - 1):
open_price[i] = excelData[i+1][1]

The listing does a few things. First, it opens the workbook for reading (open_workbook), then it gets the
sheet names (wb.sheet_names()) and opens a sheet (wb.sheet_by_name using the first sheet name in the file,
sheetNames[0]). From the sheet, it gets the number of rows (sheet.nrows), and fills a list with the values,
row-by-row. Once the data has been read-in, the final block fills an array with the opening prices. This is
substantially more complicated than importing from a CSV file, although reading Excel files is useful for
automated work (e.g. you have no choice but to import from an Excel file since it is produced by some other
software).

openpyxl
openpyxl reads and writes the modern Excel file format (.xlsx) that is the default in Office 2007 or later.
openpyxl also supports a reader and writer which is optimized for large files, a feature not available in xlrd.
Unfortunately, openpyxl uses a different syntax from xlrd, and so some modifications are required when using
openpyxl.
import openpyxl

wb = openpyxl.load_workbook('FTSE_1984_2012.xlsx')
sheetNames = wb.sheetnames
# Assumes 1 sheet name
sheet = wb[sheetNames[0]]
rows = sheet.rows

# rows is a generator, so it is directly iterable

open_price = [row[1].value for row in rows]

The strategy with 2007/10/13 xlsx files is essentially the same as with 97/2003 files. The main difference is
that the command sheet.rows() returns a tuple containing the all of the rows in the selected sheet. Each row
is itself a tuple which contains Cells (which are a type created by openpyxl), and each cell has a value (Cells
also have other useful attributes such as data_type and methods such as is_date()) .
Using the optimized reader is similar. The primary differences are:

• The rows are sequentially accessible using iter_rows().

• value is not available, and so internal_value must be used.

• The number of rows is not known, and so it isn’t possible to pre-allocate the storage variable with the
correct number of rows.

import openpyxl

wb = openpyxl.load_workbook('FTSE_1984_2012.xlsx')
sheetNames = wb.sheetnames
# Assumes 1 sheet name
sheet = wb[sheetNames[0]]

# Use list to store data

open_price = []
8.2 Importing Data without pandas 79

# Changes since access is via memory efficient iterator

# Note () on iter_rows
for row in sheet.iter_rows():
# Must use internal_value
open_price.append(row[1].internal_value)

# Burn first row and convert to array

open_price = array(open_price[1:])

8.2.3 MATLAB Data Files (.mat)

Modern mat files

MATLAB stores data in a standard format known as Hierarchical Data Format, of HDF. HDF is a generic stor-
age technology that provides fast access to stored data as well as on-the-fly compression. Starting in MATLAB
7.3, mat files are stored using HDF version 5, and so can be read in using the PyTables package.
>>> import tables
>>> matfile = tables.open_file('FTSE_1984_2012_v73.mat')
>>> matfile.root
/ (RootGroup) ''
children := ['volume' (CArray), 'high' (CArray), 'adjclose' (CArray), 'low' (CArray), '
close' (CArray), 'open' (CArray)]

>>> matfile.root.open
/open (CArray(1, 7042), zlib(3)) ''
atom := Float64Atom(shape=(), dflt=0.0)
maindim := 0
flavor := 'numpy'
byteorder := 'little'
chunkshape := (1, 7042)

>>> open_price = matfile.root.open.read()

>>> open_price
array([[5899.9, 5905.7, 5852.4, ..., 1095.4, 1095.4, 1108.1]])

>>> matfile.close() # Close the file

Legacy mat files

SciPy enables legacy MATLAB data files (mat files) to be read. The most recent file format, V7.3, is not
supported but can be read using PyTables or h5py. Data from compatible mat files can be loaded using loadmat.
The data is loaded into a dictionary, and individual variables are accessed using the keys of the dictionary.
>>> import scipy.io as sio
>>> mat_data = sio.loadmat('FTSE_1984_2012.mat')
>>> type(mat_data)
dict

>>> mat_data.keys()
['volume',
'__header__',
'__globals__',
'high',
'adjclose',
'low',
'close',
80 Importing and Exporting Data

'__version__',
'open']

>>> open_price = mat_data['open']

8.2.4 Reading Complex Files

Python can be programmed to read any text file format since it contains functions for directly accessing files
and parsing strings. Reading poorly formatted data files is an advanced technique and should be avoided if
possible. However, some data is only available in formats where reading in data line-by-line is the only option.
For example, the standard import methods fail if the raw data is very large (too large for Excel) and is poorly
formatted. In this case, the only possibility may be to write a program to read the file line-by-line (or in blocks)
and to directly process the raw text.
The file IBM_TAQ.txt contains a simple example of data that is difficult to import. This file was downloaded
from Wharton Research Data Services and contains all prices for IBM from the TAQ database between January
1, 2001, and January 31, 2001. It is too large to use in Excel and has both numbers, dates, and text on each line.
The following code block shows one method for importing this data set.
from numpy import array

f = open('IBM_TAQ.txt', 'r')
line = f.readline()
# Burn the first list as a header
line = f.readline()

date = []
time = []
price = []
volume = []
while line:
data = line.split(',')
date.append(int(data[1]))
price.append(float(data[3]))
volume.append(int(data[4]))
t = data[2]
time.append(int(t.replace(':','')))
line = f.readline()

# Convert to arrays, which are more useful than lists

# for numeric data
date = array(date)
price = array(price)
volume = array(volume)
time = array(time)

all_data = array([date, price, volume, time])

f.close()

This block of code does a few things:

• Open the file directly using open

• Reads the file line by line using readline

• Initializes lists for all of the data

8.3 Saving or Exporting Data using pandas 81

• Rereads the file parsing each line by the location of the commas using split(',') to split the line at
each comma into a list
• Uses replace(':','') to remove colons from the times
• Uses int() and float() to convert strings to numbers
• Closes the file directly using close()

8.3 Saving or Exporting Data using pandas

pandas supports writing to CSV, other delimited text formats, Excel files, json, html tables, HDF5 and STATA.
An understanding of the pandas’ DataFrame is required prior to using pandas file writing facilities, and Chapter
16 provides further information.

8.4 Saving or Exporting Data without pandas

Native NumPy Format
A number of options are available for saving data. These include using native npz data files, MATLAB data
files, CSV or plain text. Multiple NumPy arrays can be saved using NumPy’s savez_compressed.
x = arange(10)
y = zeros((100,100))
savez_compressed('test',x,y)
data = load('test.npz')
# If no name is given, arrays are generic names arr_1, arr_2, etc
x = data['arr_1']

savez_compressed('test',x=x,otherData=y)
data = load('test.npz')
# x=x provides the name x for the data in x
x = data['x']
# otherDate = y saves the data in y as otherData
y = data['otherData']

A version which does not compress data but is otherwise identical is savez. Compression is usually a good
idea and is very helpful for storing arrays which have repeated values and are large.

8.4.1 Writing MATLAB Data Files (.mat)

SciPy enables MATLAB data files to be written. Data can be written using savemat, which takes two inputs, a
file name and a dictionary containing data, in its simplest form.
import scipy.io as sio

x = array([1.0,2.0,3.0])
y = zeros((10,10))
# Set up the dictionary
saveData = {'x':x, 'y':y}
sio.savemat('test',saveData,do_compression=True)
# Read the data back in
mat_data = sio.loadmat('test.mat')

savemat uses the optional argument do_compression = True, which compresses the data, and is generally a
good idea on modern computers and/or for large datasets.
82 Importing and Exporting Data

8.4.2 Exporting Data to Text Files

Data can be exported to a tab-delimited text files using savetxt. By default, savetxt produces tab delimited
files, although then can be changed using the names argument delimiter.
x = randn(10,10)
# Save using tabs
savetxt('tabs.txt',x)
# Save to CSV
savetxt('commas.csv',x,delimiter=',')
# Reread the data
x_data = loadtxt('commas.csv',delimiter=',')

8.5 Exercises
Note: There are no exercises using pandas in this chapter. For exercises using pandas to read or write data,
see Chapter 16.

1. The file exercise3.xls contains three columns of data, the date, the return on the S&P 500, and the return
on XOM (ExxonMobil). Using Excel, convert the date to YYYYMMDD format and save the file.

2. Save the file as both CSV and tab delimited. Use the three text readers to read the file, and compare the
arrays returned.

3. Parse loaded data into three variables, dates, SP500 and XOM.

4. Save NumPy, compressed NumPy and MATLAB data files with all three variables. Which files is the
smallest?

5. Construct a new variable, sumreturns as the sum of SP500 and XOM. Create another new variable,
outputdata as a horizontal concatenation of dates and sumreturns.

6. Export the variable outputdata to a new CSV file using savetxt.

7. (Difficult) Read in exercise3.xls directly using xlrd.

8. (Difficult) Save exercise3.xls as exercise3.xlsx and read in directly using openpyxl.

Data Mining Using Python Manual
No ratings yet
Data Mining Using Python Manual
69 pages
Automation With Python Using Excel
No ratings yet
Automation With Python Using Excel
14 pages
Comprehensive Guide Data Exploration Sas Using Python Numpy Scipy Matplotlib Pandas
100% (1)
Comprehensive Guide Data Exploration Sas Using Python Numpy Scipy Matplotlib Pandas
12 pages
Ingenuity Series: V 3 - 5 - X / V 3 - 6 - X T o V 4 - 0 - 1
100% (1)
Ingenuity Series: V 3 - 5 - X / V 3 - 6 - X T o V 4 - 0 - 1
353 pages
ISACA Glossary English Arabic Mis Ara 0615 1
No ratings yet
ISACA Glossary English Arabic Mis Ara 0615 1
76 pages
Tutorial Using Excel With Python and Pandas
100% (2)
Tutorial Using Excel With Python and Pandas
28 pages
Chapter 1
No ratings yet
Chapter 1
24 pages
1.2. Data Analysis With Python - Importing Datasets 2
No ratings yet
1.2. Data Analysis With Python - Importing Datasets 2
14 pages
Experiment No 3 Importing and Exporting Data in Python Using Pandas Student
No ratings yet
Experiment No 3 Importing and Exporting Data in Python Using Pandas Student
6 pages
Software Engineering Solved Mcqs PDF
100% (1)
Software Engineering Solved Mcqs PDF
16 pages
Python in Excel Boost Your Data Analysis and Automation With Powerful Python Scripts Hayden Van Der Post Download
No ratings yet
Python in Excel Boost Your Data Analysis and Automation With Powerful Python Scripts Hayden Van Der Post Download
91 pages
Cheat Sheet: The Pandas Dataframe Object: Preliminaries Get Your Data Into A Dataframe
100% (1)
Cheat Sheet: The Pandas Dataframe Object: Preliminaries Get Your Data Into A Dataframe
10 pages
Python Series Day20
No ratings yet
Python Series Day20
7 pages
Pandas DataFrame Notes
67% (3)
Pandas DataFrame Notes
13 pages
Pandas 1
No ratings yet
Pandas 1
64 pages
Dav 2 Unit
No ratings yet
Dav 2 Unit
55 pages
Learning Pandas PDF
No ratings yet
Learning Pandas PDF
171 pages
تفسیر حکمت القران لسم جلد
No ratings yet
تفسیر حکمت القران لسم جلد
664 pages
CEP For Real TIme Applications
No ratings yet
CEP For Real TIme Applications
81 pages
0.1 Stock Data
100% (1)
0.1 Stock Data
4 pages
Pandas Basics
No ratings yet
Pandas Basics
84 pages
Chapter 2
No ratings yet
Chapter 2
34 pages
RM - Pandas - Importing Data
No ratings yet
RM - Pandas - Importing Data
15 pages
Assembly Language: 1.machine Language: This Is The Machine
No ratings yet
Assembly Language: 1.machine Language: This Is The Machine
59 pages
Data Analysis With Pandas
No ratings yet
Data Analysis With Pandas
122 pages
Introduction To Pandas Programming 1
No ratings yet
Introduction To Pandas Programming 1
2 pages
Intro To Pandas For Data Analytics
No ratings yet
Intro To Pandas For Data Analytics
20 pages
1 Import and Handling Data - Jupyter Notebook
No ratings yet
1 Import and Handling Data - Jupyter Notebook
9 pages
Using Excel With Pandas
No ratings yet
Using Excel With Pandas
16 pages
Python and Excel
No ratings yet
Python and Excel
11 pages
Importing Data Cheat Sheet Python For Data Science: Pickled Files Exploring Your Data
No ratings yet
Importing Data Cheat Sheet Python For Data Science: Pickled Files Exploring Your Data
1 page
Prac 1
No ratings yet
Prac 1
5 pages
Pandas PDF
No ratings yet
Pandas PDF
171 pages
Bingo Da Porcentagem
No ratings yet
Bingo Da Porcentagem
96 pages
Data Analytics With PowerBI
No ratings yet
Data Analytics With PowerBI
27 pages
Lab Manual 5
No ratings yet
Lab Manual 5
5 pages
York University Itec2600 by Amanda Tian
No ratings yet
York University Itec2600 by Amanda Tian
9 pages
Emerging Technologies For Business Processes
No ratings yet
Emerging Technologies For Business Processes
19 pages
Domain and Range Homework
100% (1)
Domain and Range Homework
5 pages
Rest of The Ip Project
No ratings yet
Rest of The Ip Project
26 pages
Learn Python Pandas For Data Science Quick TutorialExamples For All Primary Operations of DataFrames
No ratings yet
Learn Python Pandas For Data Science Quick TutorialExamples For All Primary Operations of DataFrames
37 pages
Data Representation
No ratings yet
Data Representation
13 pages
Unit 4.2
No ratings yet
Unit 4.2
24 pages
ANL252 SU4 Jul2022
No ratings yet
ANL252 SU4 Jul2022
55 pages
MP Vs MC
No ratings yet
MP Vs MC
6 pages
14oct Pandas 2024
No ratings yet
14oct Pandas 2024
13 pages
A Guide To Excel Spreadsheets in Python With Openpyxl
No ratings yet
A Guide To Excel Spreadsheets in Python With Openpyxl
40 pages
Cover Letter Ebook Design
No ratings yet
Cover Letter Ebook Design
22 pages
Cheat Sheet: The Pandas Dataframe Object: Preliminaries Get Your Data Into A Dataframe
100% (1)
Cheat Sheet: The Pandas Dataframe Object: Preliminaries Get Your Data Into A Dataframe
12 pages
Cheat Sheet - Pandas
No ratings yet
Cheat Sheet - Pandas
12 pages
Advanced Databases Second Semester: Dr. Jihan A. Rasool
No ratings yet
Advanced Databases Second Semester: Dr. Jihan A. Rasool
18 pages
2A - Python+Data Analysis For Pyhton2 v2
No ratings yet
2A - Python+Data Analysis For Pyhton2 v2
38 pages
Prac 1
No ratings yet
Prac 1
5 pages
Zimbra OS Admin Guide 8.6.0
No ratings yet
Zimbra OS Admin Guide 8.6.0
208 pages
Importing Data Python Cheat Sheet PDF
No ratings yet
Importing Data Python Cheat Sheet PDF
1 page
Reading and Plotting Stock Data Notes
No ratings yet
Reading and Plotting Stock Data Notes
2 pages
Xilinx BMM File
No ratings yet
Xilinx BMM File
5 pages
Data Exploration in Python PDF
No ratings yet
Data Exploration in Python PDF
1 page
Pandas 1
No ratings yet
Pandas 1
2 pages
Python Programming Mock Exam
No ratings yet
Python Programming Mock Exam
20 pages
Signal Fire OTDR ZS1000-A)
No ratings yet
Signal Fire OTDR ZS1000-A)
9 pages
MT6737T Android Scatter
No ratings yet
MT6737T Android Scatter
9 pages
Cheat Sheet: The Pandas Dataframe Object I: Preliminaries Get Your Data Into A Dataframe
No ratings yet
Cheat Sheet: The Pandas Dataframe Object I: Preliminaries Get Your Data Into A Dataframe
12 pages
Juniper - SRX5400, SRX5600, SRX5800 Services Gateways Firewalls
No ratings yet
Juniper - SRX5400, SRX5600, SRX5800 Services Gateways Firewalls
14 pages
Guide To Digital Identity Verification Report
No ratings yet
Guide To Digital Identity Verification Report
21 pages
Pandas DataFrameObject
No ratings yet
Pandas DataFrameObject
4 pages
Excel 1
No ratings yet
Excel 1
177 pages
Pierian Data - Python For Finance & Algorithmic Trading Course Notes
No ratings yet
Pierian Data - Python For Finance & Algorithmic Trading Course Notes
11 pages
Computer Science - Data Structures
No ratings yet
Computer Science - Data Structures
3 pages
PM Debug Info
No ratings yet
PM Debug Info
15 pages
3.1. Evaluarea Satisfactiei Beneficiarilor Educationali - 1. Managemantul Clasei
No ratings yet
3.1. Evaluarea Satisfactiei Beneficiarilor Educationali - 1. Managemantul Clasei
20 pages
T12 Se
No ratings yet
T12 Se
11 pages
EHEv1 Module 04 Password Cracking Techniques and Countermeasures
No ratings yet
EHEv1 Module 04 Password Cracking Techniques and Countermeasures
25 pages
Exp-6 IOTSAD
No ratings yet
Exp-6 IOTSAD
5 pages
BLUM Inward-Opening Door
No ratings yet
BLUM Inward-Opening Door
4 pages
Sanofi Coursera Pathways
No ratings yet
Sanofi Coursera Pathways
6 pages
Packing Slip
No ratings yet
Packing Slip
2 pages
Jenisha INTERNSHIP REPORT-2
No ratings yet
Jenisha INTERNSHIP REPORT-2
19 pages
Lokesh S Resume
No ratings yet
Lokesh S Resume
2 pages
Stats Unit1
No ratings yet
Stats Unit1
27 pages
Cambridge IGCSE ™: Computer Science 0478/13
No ratings yet
Cambridge IGCSE ™: Computer Science 0478/13
8 pages
How To Surf The Web Anonymously
No ratings yet
How To Surf The Web Anonymously
6 pages
Learn C++
From Everand
Learn C++
Durgesh
4.5/5 (9)
Learn Excel in 24 Hours
From Everand
Learn Excel in 24 Hours
Alex Nordeen
4/5 (3)
More on C# in Front Office
From Everand
More on C# in Front Office
Xing Zhou
No ratings yet
Tableau 8.2 Training Manual: From Clutter to Clarity
From Everand
Tableau 8.2 Training Manual: From Clutter to Clarity
Larry Keller
No ratings yet
Excel At Excel - The Ultimate Guide
From Everand
Excel At Excel - The Ultimate Guide
PA BOOKS
No ratings yet
The Essential R Reference
From Everand
The Essential R Reference
Mark Gardener
No ratings yet
Profound Python Data Science
From Everand
Profound Python Data Science
Onder Teker
No ratings yet
Learning Open Office: Calc & Base
From Everand
Learning Open Office: Calc & Base
Durgesh
No ratings yet

Week 5

Uploaded by

Week 5

Uploaded by

Chapter 8

Importing and Exporting Data

8.1 Importing Data using pandas

8.1.1 CSV and other formatted text files

>>> open_price = csv_data[:,1]

8.1.2 Excel files

>>> open_price = excel_data[:,1]

8.1.3 STATA files

8.2 Importing Data without pandas

• Each column of the spreadsheet should contain a single variable.

8.2.1 CSV and other formatted text files

# Fails since CSV has a header

>>> data = loadtxt('FTSE_1984_2012_numeric.csv',delimiter=',',skiprows=1)

Tab delimited data can be read in a similar manner using delimiter='\t'.

8.2.2 Excel Files

# Subtract 1 since excelData has the header row

# rows is a generator, so it is directly iterable

• The rows are sequentially accessible using iter_rows().

• value is not available, and so internal_value must be used.

# Use list to store data

# Changes since access is via memory efficient iterator

# Burn first row and convert to array

8.2.3 MATLAB Data Files (.mat)

>>> open_price = matfile.root.open.read()

>>> matfile.close() # Close the file

Legacy mat files

>>> open_price = mat_data['open']

8.2.4 Reading Complex Files

# Convert to arrays, which are more useful than lists

all_data = array([date, price, volume, time])

This block of code does a few things:

• Open the file directly using open

• Reads the file line by line using readline

• Initializes lists for all of the data

8.3 Saving or Exporting Data using pandas

8.4 Saving or Exporting Data without pandas

8.4.1 Writing MATLAB Data Files (.mat)

8.4.2 Exporting Data to Text Files

6. Export the variable outputdata to a new CSV file using savetxt.

7. (Difficult) Read in exercise3.xls directly using xlrd.

8. (Difficult) Save exercise3.xls as exercise3.xlsx and read in directly using openpyxl.

You might also like