Python - Working With Data - Text Formats
text formats
ASCII or text file formats
Advantages of working with text formats:
• They are usually human-readable.
• They tend to be simple structures.
• It is relatively easy to write code to interpret them.
Disadvantages include:
• Inefficient storage for big data volumes.
• Most people invent their own format, so there is a lack of standardisation.
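To get a feel for the storage point, here is a rough sketch that compares the size of some full-precision floating-point values written as text with the same values packed as 8-byte binary doubles. The values are invented for illustration, and the gap would be smaller for values stored to only one decimal place.

```python
import struct

# Invented sample: 1000 floating-point values at full precision
values = [i + 0.123456789 for i in range(1000)]

# Text form: one value per line, as Python would print it
text = "\n".join(repr(v) for v in values)

# Binary form: each value packed as an 8-byte double
binary = struct.pack("%dd" % len(values), *values)

# Compare the number of characters with the number of bytes
print(len(text), len(binary))
```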
Using Python to read text formats
As we have seen, Python has a great toolkit for reading files and working with strings.
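As a reminder of that toolkit, here is a minimal sketch that writes a small text file and reads it back, using strip() and split() on the lines. The file name demo.txt and its contents are made up for illustration.

```python
import os
import tempfile

# Made-up sample content: a title line followed by two data lines
sample = "Station A\n1910 12.5 30.1\n1911 8.0 ---\n"

# Write the sample to a temporary file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write(sample)

# Read the file back: the first line is the title, the rest are data rows
with open(path) as f:
    title = f.readline().strip()
    rows = [line.split() for line in f]

print(title)
print(rows)
```

Each row comes back as a list of strings, so numeric columns still need converting with float() or int().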
[Figure: annotated listing of the example data file, showing line numbers (for reference only), the data in its first 9 and last 8 columns, and a missing value.]
Let's write some code to read it
We'll need:
example_code/test_read_rainfall.py
example_data/uk_rainfall.txt
Reading the header
UK Rainfall (mm)
Areal series, starting from 1910
Allowances have been made for topographic, coastal
and urban effects where relationships are
found to exist.
Seasons: Winter=Dec-Feb, Spring=Mar-May,
Summer=June-Aug, Autumn=Sept-Nov. (Winter:
Year refers to Jan/Feb).
Values are ranked and displayed to 1 dp. Where
values are equal, rankings are based in
order of year descending.
Data are provisional from December 2014 & Winter
2015. Last updated 07/04/2015
Line 1 is important information; the other lines are useful information.
Let's capture the metadata from line 1:
- location: UK
- variable: Rainfall
- units: mm
Reading the header
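def readHeader(fname):
    # Open the file and read the relevant lines (the first six)
    with open(fname) as f:
        head = f.readlines()[:6]
    # The original code stops here. What follows is one possible completion,
    # assuming line 1 looks like "UK Rainfall (mm)".
    location, variable, units = head[0].split()
    return {"location": location, "variable": variable,
            "units": units.strip("()"), "notes": head[1:]}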
import numpy.ma as MA

def checkValue(value):
    # Check if value should be a float
    # or flagged as missing
    if value == "---":
        value = MA.masked
    else:
        value = float(value)
    return value
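To see what the missing-value flag buys us, the sketch below converts a made-up row of strings with a function like checkValue and stores the results in a masked array. The masked entry is then left out of count() and mean(). The sample values are invented, and check_value is a stand-alone copy written for this illustration.

```python
import numpy.ma as MA

def check_value(value):
    # Return the masked constant for the missing-value marker, else a float
    if value == "---":
        return MA.masked
    return float(value)

# Invented sample row with one missing value
raw = ["10.0", "---", "30.0"]

# Pre-allocate a masked array and fill it item by item
values = MA.zeros(len(raw), float)
for i, item in enumerate(raw):
    values[i] = check_value(item)

print(values)          # the missing entry is displayed as masked
print(values.count())  # number of non-masked entries
print(values.mean())   # mean of the non-masked entries only
```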
Reading the data (part 1)
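import numpy.ma as MA

def readData(fname):
    # Open file and read column names and data block
    with open(fname) as f:
        # Ignore header
        for i in range(7):
            f.readline()
        col_names = f.readline().split()
        data_block = f.readlines()
    # The step that builds "data" is not shown in the original; this is one
    # possible completion: a masked array per column, keyed by column name.
    data = {}
    for name in col_names:
        data[name] = MA.zeros(len(data_block), float)
    for i, line in enumerate(data_block):
        for name, value in zip(col_names, line.split()):
            data[name][i] = checkValue(value)
    return data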
Testing the code
>>> data = readData("example_data/uk_rainfall.txt")
>>> print(data["Year"])
[ 1910. 1911. 1912. ...
For comma-separated files, split each line on the comma:
line.split(",")
Or try the Python "csv" module
The Python "csv" module can read text files with various delimiters.
See: https://docs.python.org/2/library/csv.html
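A minimal sketch of the "csv" module in use, assuming Python 3. The sample table is invented, and io.StringIO stands in for an open file so the example is self-contained.

```python
import csv
import io

# Invented sample: a small comma-separated table with one missing value
sample = "Year,JAN,FEB\n1910,111.4,126.1\n1911,59.2,---\n"

# csv.reader accepts any iterable of lines; io.StringIO stands in for a file
reader = csv.reader(io.StringIO(sample), delimiter=",")

# The first row holds the column names, the rest are data rows
col_names = next(reader)
rows = list(reader)

print(col_names)
print(rows)
```

Changing the delimiter argument (for example to "\t") lets the same code read other separated formats. The values still come back as strings.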