Welcome To The Course!: Hugo Bowne-Anderson
INTRODUCTION TO IMPORTING DATA IN PYTHON
Hugo Bowne-Anderson
Data Scientist at DataCamp
Import data
Flat files, e.g. .txts, .csvs
Relational databases
1 Source: Kaggle
^column
Flat file
Discuss flat files
Flat files
titanic.csv
PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S
PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
________________________________________________________________________
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S <-- row
________________________________________________________________________
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S
column
PassengerId,Survived,Pclass, | Name | ,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embar
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S
________________________________________________________________________
PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
________________________________________________________________________
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S
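The header, row, and column views above can be reproduced with Python's standard `csv` module. This is a minimal sketch, not the course's own code: it assumes the three sample rows shown above and reads them from an in-memory string (the variable names `sample`, `header`, `rows`, and `names` are illustrative) rather than from the real titanic.csv.

```python
import csv
import io

# In-memory stand-in for titanic.csv, built from the rows shown above.
sample = io.StringIO(
    'PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n'
    '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
    '2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C\n'
    '3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S\n'
)

reader = csv.reader(sample)

# The first line of the flat file is the header: one name per column.
header = next(reader)

# Every remaining line is a row: one record, with one value per column.
rows = list(reader)

# A column is the same field taken from every row.
name_idx = header.index("Name")
names = [row[name_idx] for row in rows]

print(header)
print(rows[0])
print(names)
```

Note that the quoted Name field contains a comma; `csv.reader` treats the quoted text as a single value, which a plain `line.split(",")` would not.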
.txt - Text file
MNIST image:
Why NumPy?
NumPy arrays: standard for storing numerical data
loadtxt()
genfromtxt()
[[ 0. 0. 0. 0. 0.]
[ 86. 250. 254. 254. 254.]
[ 0. 0. 0. 9. 254.]
...,
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
[[ 0. 0.]
[ 86. 254.]
[ 0. 0.]
...,
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]]
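Arrays like the ones printed above are what `np.loadtxt()` returns for a purely numeric flat file. The sketch below is illustrative only: it assumes a tiny made-up comma-separated sample held in memory (the name `text` and its header `a,b,c` are hypothetical, not the course's MNIST file), and shows the `delimiter`, `skiprows`, and `usecols` arguments.

```python
import io
import numpy as np

# Small synthetic stand-in for a numeric flat file with a header line.
text = (
    "a,b,c\n"
    "0,0,0\n"
    "86,250,254\n"
    "0,9,254\n"
)

# delimiter="," splits each line on commas; skiprows=1 skips the header,
# which loadtxt could not convert to a number.
data = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)

# usecols keeps only the listed columns, here the first and the third.
subset = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1, usecols=[0, 2])

print(data)
print(subset)
```

With a real file, the `io.StringIO(text)` argument would be replaced by the file's path.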
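`loadtxt()` works best when every column is numeric. For files that mix numbers and strings, `np.genfromtxt()` is one option. The following is a sketch under stated assumptions: the sample is a simplified, hypothetical subset of the Titanic columns with the quoted Name field left out, because `genfromtxt()` does not interpret quoted fields that contain the delimiter.

```python
import io
import numpy as np

# Simplified, hypothetical subset of titanic.csv (no quoted Name column).
text = (
    "PassengerId,Survived,Gender,Age,Fare\n"
    "1,0,male,22,7.25\n"
    "2,1,female,38,71.2833\n"
    "3,1,female,26,7.925\n"
)

# names=True takes field names from the header line;
# dtype=None asks NumPy to infer a type for each column.
data = np.genfromtxt(
    io.StringIO(text),
    delimiter=",",
    names=True,
    dtype=None,
    encoding="utf-8",
)

# The result is a one-dimensional structured array: one element per row,
# with columns accessed by name.
print(data.dtype.names)
print(data["Gender"])
print(data["Fare"])
```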
What a data scientist needs
Two-dimensional labeled data structure(s)
Perform statistics
Data wrangling
Data preprocessing
Building models
Visualization
data_array = data.values
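To show where `data` in the line above might come from, here is a minimal sketch that assumes it is a pandas DataFrame created with `pd.read_csv()`. The in-memory `sample` stands in for titanic.csv and uses only the rows shown earlier; with a real file, the path would be passed instead.

```python
import io
import pandas as pd

# In-memory stand-in for titanic.csv, built from the rows shown earlier.
sample = io.StringIO(
    'PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n'
    '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
    '2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C\n'
    '3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S\n'
)

# read_csv returns a DataFrame: a two-dimensional labeled data structure
# whose column labels come from the header line.
data = pd.read_csv(sample)
print(data.head())

# Convert the DataFrame to a NumPy array, dropping the labels.
data_array = data.values
print(type(data_array), data_array.shape)
```

The pandas documentation recommends `data.to_numpy()` over `.values` in newer code; both return a NumPy array here.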
Next chapters:
Import other file types:
Excel, SAS, Stata
Feather