Python - Working With Data - Text Formats

This document discusses reading text file formats into Python for data analysis. It provides an example of reading a .txt file of monthly UK rainfall data from 1910-present. Functions are defined to read the header metadata, check for and handle missing values, and read the data into a dictionary with columns as keys and masked arrays as values. The techniques demonstrated work similarly for comma-separated CSV files using Python's csv module.

Uploaded by

sunil jha

Python – working with data – text formats
ASCII or text file formats
Advantages of working with text formats:
• They are usually human-readable.
• They tend to be simple structures.
• It is relatively easy to write code to interpret them.
Disadvantages include:
• Inefficient storage for big data volumes.
• Most people invent their own format, so there is a lack of standardisation.
Using Python to read text formats
As we have seen, Python has a great toolkit for reading files and working with strings.

In this example we use a file that we found on the web, and then adapt some code to read it into a useful, re-usable form.
Our example file
We found a suitable data set on the web:
[Link]

Met Office monthly weather statistics for the UK since 1910.
[Figure: the example file, annotated to show the header, the line numbers (for reference only), the first 9 data columns, the last 8 data columns, and a missing value.]
Let's write some code to read it
We'll need:
• To read the header and data separately
• To think about the data structure (so it is easy to retrieve the data in a useful manner).

Let's put into practice what we have learnt:
• Use NumPy to store the arrays
• But we'll need to test for missing values and use Masked Array ([Link])
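As a minimal sketch of why a masked array suits this job, the snippet below masks one value and then takes a mean. The numbers and the -999.0 marker are made up for illustration; they are not taken from the rainfall file.

```python
import numpy.ma as MA

# Illustrative values only; -999.0 stands in for a missing reading
values = MA.masked_values([10.0, -999.0, 30.0], -999.0)

mean_value = values.mean()   # masked entries are left out of the mean
n_valid = values.count()     # number of unmasked entries
```

Because the missing entry is masked rather than stored as a number, it cannot silently distort later statistics.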
Example code (and data)
Please refer to the example code:

example_code/test_read_rainfall.py

And data file:

example_data/uk_rainfall.txt
Reading the header
UK Rainfall (mm)
Areal series, starting from 1910
Allowances have been made for topographic, coastal and urban effects where relationships are found to exist.
Seasons: Winter=Dec-Feb, Spring=Mar-May, Summer=June-Aug, Autumn=Sept-Nov. (Winter: Year refers to Jan/Feb).
Values are ranked and displayed to 1 dp. Where values are equal, rankings are based in order of year descending.
Data are provisional from December 2014 & Winter 2015. Last updated 07/04/2015
Reading the header
Line 1 of the header ("UK Rainfall (mm)") is important information.
Other lines are useful information.

Let's capture the metadata in:
- location: UK
- variable: Rainfall
- units: mm
Reading the header
def readHeader(fname):
    # Open the file and read the relevant lines
    f = open(fname)
    head = f.readlines()[:6]
    f.close()

    # Get the important items from line 1
    location, variable, units = head[0].split()
    units = units.replace("(", "").replace(")", "")

    # Put the other lines in comments
    comments = head[1:6]
    return (location, variable, units, comments)
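The same idea can be sketched with a `with` block, which closes the file even if an error occurs. This is a variation under stated assumptions, not the code from example_code: the name `read_header`, the `n_header_lines` parameter and the temporary stand-in file are hypothetical, and only the first header line is taken from the slides.

```python
import os
import tempfile

def read_header(fname, n_header_lines=6):
    """Return (location, variable, units, comments) from the file header."""
    with open(fname) as f:  # the file is closed automatically
        head = [f.readline().rstrip("\n") for _ in range(n_header_lines)]
    location, variable, units = head[0].split()
    units = units.strip("()")
    return location, variable, units, head[1:]

# Hypothetical stand-in for the real file: placeholder comment lines
sample = "UK Rainfall (mm)\n" + "".join(
    "comment line %d\n" % i for i in range(1, 6)
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

location, variable, units, comments = read_header(tmp.name)
os.remove(tmp.name)
```

Reading only the first few lines with `readline` also avoids loading the whole file just to get the header.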
Test the reader
>>> (location, variable, units, comments) = \
...     readHeader("example_data/uk_rainfall.txt")

>>> print(location, variable, units)
UK Rainfall mm

>>> print(comments[1])
Allowances have been made for topographic, coastal and urban effects where relationships are found to exist.
Write a function to handle missing data properly
import numpy.ma as MA

def checkValue(value):
    # Check if value should be a float
    # or flagged as missing
    if value == "---":
        value = MA.masked
    else:
        value = float(value)
    return value
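A quick way to see this check in action is to apply it to every item of one split line. The sketch below uses a hypothetical `check_value` helper with the same logic and a made-up line in the same whitespace-separated style as the data file.

```python
import numpy.ma as MA

def check_value(value, missing="---"):
    """Return MA.masked for the missing-data marker, else a float."""
    return MA.masked if value == missing else float(value)

# Made-up line in the same style as the data file
parsed = [check_value(item) for item in "1910 111.4 ---".split()]
```

The last item comes back as the `MA.masked` constant, which a masked array stores as a masked element when it is assigned.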
Reading the data (part 1)
import numpy.ma as MA

def readData(fname):
    # Open file and read column names and data block
    f = open(fname)

    # Ignore header
    for i in range(7):
        f.readline()

    col_names = f.readline().split()
    data_block = f.readlines()
    f.close()

    # Create a data dictionary, containing
    # an array of values for each variable
    data = {}
Reading the data (part 2)
    # Add an entry to the dictionary for each column
    for col_name in col_names:
        data[col_name] = MA.zeros(len(data_block), 'f',
                                  fill_value=-999.999)
Reading the data (part 3)
    # Loop through each value: store it in its column
    for (line_count, line) in enumerate(data_block):
        items = line.split()

        for (col_count, col_name) in enumerate(col_names):
            value = items[col_count]
            data[col_name][line_count] = checkValue(value)

    return data
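The three parts above can be tried without the real file. The sketch below is a variation rather than the original function: it reads a made-up in-memory excerpt and builds each column's mask up front instead of assigning values one at a time. The column names follow the slides, but the WIN values for 1911 and 1912 are invented.

```python
import io
import numpy.ma as MA

# Made-up excerpt; WIN values for 1911-1912 are invented for illustration
sample = io.StringIO(
    "Year JAN WIN\n"
    "1910 111.4 ---\n"
    "1911 59.2 300.0\n"
    "1912 111.7 350.0\n"
)

col_names = sample.readline().split()
rows = [line.split() for line in sample if line.strip()]

data = {}
for col_count, col_name in enumerate(col_names):
    column = [row[col_count] for row in rows]
    mask = [item == "---" for item in column]
    numbers = [0.0 if item == "---" else float(item) for item in column]
    data[col_name] = MA.array(numbers, mask=mask, fill_value=-999.999)
```

Keying the dictionary by column name means a whole series can be retrieved with a single lookup, such as `data["WIN"]`.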
Testing the code
>>> data = readData("example_data/uk_rainfall.txt")
>>> print(data["Year"])
[ 1910. 1911. 1912. ...

>>> print(data["JAN"])
[ 111.40000153 59.20000076 111.69999695 ...

>>> winter = data["WIN"]
>>> print(MA.is_masked(winter[0]))
True
>>> print(MA.is_masked(winter[1]))
False

Look! A missing value!
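Once a column is a masked array, the missing entries can either be dropped or written back out as the fill value. This is a small sketch with invented numbers, assuming the same fill value of -999.999 as above.

```python
import numpy.ma as MA

# Invented values; the first entry plays the role of the missing winter
winter = MA.array([0.0, 300.0, 350.0],
                  mask=[True, False, False],
                  fill_value=-999.999)

valid_only = winter.compressed()   # plain array of the unmasked values
for_export = winter.filled()       # masked entries replaced by fill_value
```

`compressed()` is handy for plotting or statistics, while `filled()` suits writing the data back to a text file.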
What about CSV or tab-delimited?
The above example will work exactly the same with a tab-delimited file, because the string split method, when called with no arguments, splits on any white space (including tabs).

If the file used commas (CSV) to separate columns then you could use:

line.split(",")
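As a small sketch of the comma case, the line below is split and converted by hand. The variable names are hypothetical, and the trailing newline is stripped first so it does not end up attached to the last item.

```python
# One comma-separated line, as it would be read from a file
line = "2014-01-01,00:00,2.34,4.45\n"

items = line.strip().split(",")
date, time = items[:2]
temp, rainfall = (float(item) for item in items[2:])
```

Unlike `split()` with no arguments, `split(",")` does not discard surrounding white space, hence the `strip()`.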
Or try the Python "csv" module
There is a Python "csv" module that is able to read text files with various delimiters. E.g.:

>>> import csv
>>> r = csv.reader(open("example_data/[Link]"))
>>> for row in r:
...     print(row)
['Date', 'Time', 'Temp', 'Rainfall']
['2014-01-01', '00:00', '2.34', '4.45']
['2014-01-01', '12:00', '6.70', '8.34']
['2014-01-02', '00:00', '-1.34', '10.25']

See: [Link]
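The csv module can also return each row as a dictionary keyed by the column names, and it accepts other delimiters. The sketch below holds the rows shown above in memory rather than opening a file; the tab-delimited text is a made-up two-line example.

```python
import csv
import io

# Same rows as shown above, held in memory instead of a file
text = (
    "Date,Time,Temp,Rainfall\n"
    "2014-01-01,00:00,2.34,4.45\n"
    "2014-01-01,12:00,6.70,8.34\n"
    "2014-01-02,00:00,-1.34,10.25\n"
)

reader = csv.DictReader(io.StringIO(text))  # first row supplies the keys
temps = [float(row["Temp"]) for row in reader]

# A tab-delimited variant only needs a different delimiter
tab_rows = list(csv.reader(io.StringIO("a\tb\n1\t2\n"), delimiter="\t"))
```

Every value arrives as a string, so numeric columns still need converting with `float`, just as in the hand-written reader.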
