
Welcome To The Course!: Hugo Bowne-Anderson

The document introduces importing data in Python. It discusses importing data from flat files like CSVs and text files. It demonstrates how to read, print, and write to text files in Python. Flat files contain records organized in rows and columns, with each row representing a record and each column a feature or attribute. The header row specifies the column names. Importing flat files is an important part of data science work in Python.

Welcome to the course!
INTRODUCTION TO IMPORTING DATA IN PYTHON

Hugo Bowne-Anderson
Data Scientist at DataCamp
Import data
Flat files, e.g. .txts, .csvs

Files from other software

Relational databases


Plain text files



Table data
titanic.csv

Name                          Sex     Cabin  Survived
Braund, Mr. Owen Harris       male    NaN    0          <-- row
Cumings, Mrs. John Bradley    female  C85    1
Heikkinen, Miss. Laina        female  NaN    1
Futrelle, Mrs. Jacques Heath  female  C123   1
Allen, Mr. William Henry      male    NaN    0
                              ^column

1 Source: Kaggle

Flat file


Reading a text file
filename = 'huck_finn.txt'
file = open(filename, mode='r') # 'r' is to read
text = file.read()
file.close()



Printing a text file
print(text)

YOU don't know about me without you have read a book by
the name of The Adventures of Tom Sawyer; but that
ain't no matter. That book was made by Mr. Mark Twain,
and he told the truth, mainly. There was things which
he stretched, but mainly he told the truth. That is
nothing. I never seen anybody but lied one time or
another, without it was Aunt Polly, or the widow, or
maybe Mary. Aunt Polly--Tom's Aunt Polly, she is--and
Mary, and the Widow Douglas is all told about in that
book, which is mostly a true book, with some
stretchers, as I said before.


Writing to a file
filename = 'huck_finn.txt'
file = open(filename, mode='w') # 'w' is to write
file.close()
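
A minimal sketch of what mode='w' does, assuming a scratch file in a temporary directory (the name scratch.txt is hypothetical, chosen so that no real file such as huck_finn.txt is overwritten). Opening with 'w' creates the file or empties an existing one, so the second write below replaces the first:

```python
import os
import tempfile

# Hypothetical scratch file so that no real text file is overwritten
path = os.path.join(tempfile.mkdtemp(), 'scratch.txt')

file = open(path, mode='w')   # creates the file, or empties it if it already exists
file.write('first draft\n')
file.close()

file = open(path, mode='w')   # reopening with 'w' discards the earlier contents
file.write('second draft\n')
file.close()

file = open(path, mode='r')   # read the file back to see what survived
text = file.read()
file.close()
print(text)
```

Only the second line of text remains in the file, which is why 'w' should be used with care on files you want to keep.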



Context manager with
with open('huck_finn.txt', 'r') as file:
    print(file.read())

YOU don't know about me without you have read a book by
the name of The Adventures of Tom Sawyer; but that
ain't no matter. That book was made by Mr. Mark Twain,
and he told the truth, mainly. There was things which
he stretched, but mainly he told the truth. That is
nothing. I never seen anybody but lied one time or
another, without it was Aunt Polly, or the widow, or
maybe Mary. Aunt Polly--Tom's Aunt Polly, she is--and
Mary, and the Widow Douglas is all told about in that
book, which is mostly a true book, with some
stretchers, as I said before.
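
The point of the context manager can be checked directly: the file is closed as soon as the with block ends, without a call to close(). A small sketch, assuming a temporary file (sample.txt is a hypothetical name) that stands in for huck_finn.txt:

```python
import os
import tempfile

# Hypothetical sample file standing in for 'huck_finn.txt'
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as file:
    file.write("YOU don't know about me\n")

with open(path, 'r') as file:
    text = file.read()
    open_inside = not file.closed   # still open inside the block

# After the block, the context manager has closed the file for us
print(open_inside, file.closed)
```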


In the exercises, you’ll:
Print files to the console

Print specific lines

Discuss flat files
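
Printing specific lines can be sketched with readline(), which returns one line per call. This is one possible approach, not necessarily the one used in the exercises, and the three-line file below is a hypothetical stand-in:

```python
import os
import tempfile

# Hypothetical three-line file for illustration
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')
with open(path, 'w') as file:
    file.write('line one\nline two\nline three\n')

with open(path, 'r') as file:
    first = file.readline()    # each call reads up to and including the next newline
    second = file.readline()

print(first, end='')
print(second, end='')
```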


Let's practice!
The importance of flat files in data science
Flat files
titanic.csv

PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S    <-- row
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S
(column: e.g. Name)

Name                          Sex     Cabin  Survived
Braund, Mr. Owen Harris       male    NaN    0
Cumings, Mrs. John Bradley    female  C85    1
Heikkinen, Miss. Laina        female  NaN    1
Futrelle, Mrs. Jacques Heath  female  C123   1
Allen, Mr. William Henry      male    NaN    0


Flat files
Text files containing records

That is, table data

Record: row of fields or attributes

Column: feature or attribute

titanic.csv

PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S    <-- row
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S
(column: e.g. Name)
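
To make the record and column vocabulary concrete, here is a small sketch using the standard library's csv module. The course itself goes on to use NumPy and pandas, so this is only an illustration. The text is the first lines of titanic.csv as shown on the slide, held in memory rather than read from disk:

```python
import csv
import io

# First lines of titanic.csv as shown on the slide, held in memory
raw = io.StringIO(
    'PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n'
    '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
    '2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C\n'
)

reader = csv.reader(raw)
header = next(reader)      # first line: the column names
records = list(reader)     # remaining lines: one record per row

# A column is the same field taken from every record
name_index = header.index('Name')
names = [record[name_index] for record in records]

print(header[:4])
print(names)
```

The quoted Name field contains a comma, which is why a CSV-aware reader is used here instead of splitting each line on ','.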


Header
titanic.csv

________________________________________________________________________
PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
________________________________________________________________________
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S



File extension
.csv - Comma separated values

.txt - Text file

commas, tabs - Delimiters
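
The delimiter is the only thing that differs between a comma-separated and a tab-delimited file. A short sketch with the standard library's csv module, using two hypothetical in-memory snippets that hold the same values:

```python
import csv
import io

# The same two rows, once comma-separated and once tab-delimited (hypothetical snippets)
comma_text = 'pixel149,pixel150\n86,250\n'
tab_text = 'pixel149\tpixel150\n86\t250\n'

comma_rows = list(csv.reader(io.StringIO(comma_text)))              # default delimiter is ','
tab_rows = list(csv.reader(io.StringIO(tab_text), delimiter='\t'))  # tell the reader to split on tabs

print(comma_rows == tab_rows)
```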


Tab-delimited file
MNIST.txt

pixel149  pixel150  pixel151  pixel152  pixel153
0         0         0         0         0
86        250       254       254       254
0         0         0         9         254
0         0         0         0         0
103       253       253       253       253
0         0         0         0         0
0         0         0         0         0
0         0         0         0         41
253       253       253       253       253

[Figure: MNIST image]


How do you import flat files?
Two main packages: NumPy, pandas

Here, you’ll learn to import:


Flat files with numerical data (MNIST)

Flat files with numerical data and strings (titanic.csv)


Let's practice!
Importing flat files using NumPy
Why NumPy?
NumPy arrays: standard for storing numerical data

Essential for other packages: e.g. scikit-learn

loadtxt()

genfromtxt()


Importing flat files using NumPy
import numpy as np
filename = 'MNIST.txt'
data = np.loadtxt(filename, delimiter=',')
data

[[ 0. 0. 0. 0. 0.]
[ 86. 250. 254. 254. 254.]
[ 0. 0. 0. 9. 254.]
...,
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]



Customizing your NumPy import
import numpy as np
filename = 'MNIST_header.txt'
data = np.loadtxt(filename, delimiter=',', skiprows=1)
print(data)

[[ 0. 0. 0. 0. 0.]
[ 86. 250. 254. 254. 254.]
[ 0. 0. 0. 9. 254.]
...,
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]



Customizing your NumPy import
import numpy as np
filename = 'MNIST_header.txt'
data = np.loadtxt(filename, delimiter=',', skiprows=1, usecols=[0, 2])
print(data)

[[ 0. 0.]
[ 86. 254.]
[ 0. 0.]
...,
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]]



Customizing your NumPy import
data = np.loadtxt(filename, delimiter=',', dtype=str)
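
The three customizations above can be tried without any file on disk, since loadtxt() also accepts a file-like object. The sketch below assumes a tiny comma-separated sample with one header line. The pixel names and values are copied from the MNIST slide, but the in-memory text itself is a hypothetical stand-in for MNIST_header.txt:

```python
import io
import numpy as np

# Hypothetical stand-in for 'MNIST_header.txt': one header line, then numeric rows
sample = io.StringIO(
    'pixel149,pixel150,pixel151,pixel152,pixel153\n'
    '0,0,0,0,0\n'
    '86,250,254,254,254\n'
    '0,0,0,9,254\n'
)

# skiprows=1 skips the header so that every remaining value parses as a float
data = np.loadtxt(sample, delimiter=',', skiprows=1)

# usecols keeps only the first and third columns
sample.seek(0)
subset = np.loadtxt(sample, delimiter=',', skiprows=1, usecols=[0, 2])

# dtype=str imports everything as strings, so the header row can be kept
sample.seek(0)
as_text = np.loadtxt(sample, delimiter=',', dtype=str)

print(data.shape, subset.shape, as_text.shape)
```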



Mixed datatypes
titanic.csv

Name                          Sex     Cabin  Fare
Braund, Mr. Owen Harris       male    NaN    7.3
Cumings, Mrs. John Bradley    female  C85    71.3
Heikkinen, Miss. Laina        female  NaN    8.0
Futrelle, Mrs. Jacques Heath  female  C123   53.1
Allen, Mr. William Henry      male    NaN    8.05
                              ^              ^
                              strings        floats

1 Source: Kaggle
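
For columns of different types, genfromtxt() (listed on the "Why NumPy?" slide) can infer a type per column when dtype=None is passed. A minimal sketch, assuming a simplified three-column extract with no commas inside fields. The sample text is hypothetical, and the real titanic.csv has more columns and quoted names, which genfromtxt() does not parse:

```python
import io
import numpy as np

# Hypothetical simplified extract (Sex, Cabin, Fare); not the full titanic.csv
sample = io.StringIO(
    'Sex,Cabin,Fare\n'
    'male,NaN,7.25\n'
    'female,C85,71.2833\n'
    'female,NaN,7.925\n'
)

# names=True takes field names from the first line; dtype=None infers a type per column
data = np.genfromtxt(sample, delimiter=',', names=True, dtype=None, encoding='utf-8')

print(data.dtype.names)
print(data['Fare'])
```

The result is a one-dimensional structured array in which each element is a row, and columns are accessed by name.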


Let's practice!
Importing flat files using pandas
What a data scientist needs
Two-dimensional labeled data structure(s)

Columns of potentially different types

Manipulate, slice, reshape, groupby, join, merge

Perform statistics

Work with time series data



Pandas and the DataFrame

DataFrame = pythonic analog of R’s data frame


Manipulating pandas DataFrames
Exploratory data analysis

Data wrangling

Data preprocessing

Building models

Visualization

Standard and best practice to use pandas



Importing using pandas
import pandas as pd
filename = 'winequality-red.csv'
data = pd.read_csv(filename)
data.head()

   volatile acidity  citric acid  residual sugar
0              0.70         0.00             1.9
1              0.88         0.00             2.6
2              0.76         0.04             2.3
3              0.28         0.56             1.9
4              0.70         0.00             1.9

data_array = data.values
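
The same steps can be tried on in-memory text. The sketch below assumes a comma-separated sample built from the values on the slide, as a hypothetical stand-in for winequality-red.csv. It uses DataFrame.to_numpy(), which the pandas documentation recommends over the older .values attribute for converting to a NumPy array:

```python
import io
import pandas as pd

# Hypothetical stand-in for 'winequality-red.csv', using values from the slide
csv_text = io.StringIO(
    'volatile acidity,citric acid,residual sugar\n'
    '0.70,0.00,1.9\n'
    '0.88,0.00,2.6\n'
    '0.76,0.04,2.3\n'
)

data = pd.read_csv(csv_text)    # the first line becomes the column labels
print(data.head())

data_array = data.to_numpy()    # same idea as data.values on the slide
print(data_array.shape)
```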


You’ll experience:
Importing flat files in a straightforward manner

Importing flat files with issues such as comments and missing values
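
As a rough sketch of the second case, read_csv() has parameters for the delimiter (sep), comment markers (comment) and strings to treat as missing (na_values). The file contents below are hypothetical: a tab-delimited extract with one comment line and the word 'Nothing' standing in for a missing fare:

```python
import io
import pandas as pd

# Hypothetical contents: a comment line, a header, and 'Nothing' as a missing-value marker
messy = io.StringIO(
    '# passenger extract, for illustration only\n'
    'Name\tSex\tFare\n'
    'Braund\tmale\t7.25\n'
    'Cumings\tfemale\tNothing\n'
)

# comment='#' ignores the commented line; na_values turns 'Nothing' into NaN
data = pd.read_csv(messy, sep='\t', comment='#', na_values=['Nothing'])

print(data)
print(data['Fare'].isna().sum())
```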


Let's practice!
Final thoughts on data import
Next chapters:
Import other file types:
Excel, SAS, Stata

Feather

Interact with relational databases


Next course:
Scrape data from the web

Interact with APIs



Let's practice!