5-1 Dataframes Intro Load Inspect - Instruction

Dataframes: Introduction

You have a file, iris-h.data, containing the iris species data as a CSV (see footnote 1). In this file
the delimiter is an “&” rather than a comma, strings are enclosed in “$” rather than quotation marks,
and the first line is a header line with the column names. The first few rows of the data are:
sample id&sepal length in cm&sepal width in cm&petal length in cm&petal width in cm&class
0&5.1&3.5&1.4&0.2&$Iris-setosa$
1&4.9&3.0&1.4&_0.2&$Iris-setosa$
2&4.7&3.2&&0.2&$Iris-setosa$
# there is a comment here
3&4.6&_3.1&1.5&0.2&$Iris-setosa$

1. Load iris-h.data into a pandas DataFrame called irisdata, specifying the separator, quote
character, and header parameters so that the file loads correctly. Use the names parameter to
rename the columns to: "id", "slen", "swid", "plen", "pwid", "cls". Use the ‘variable explorer’ and
the ‘Dataframe’ view to check your result. You can also inspect the first 5 rows with the head()
method and the last 3 rows with the tail() method, and print the results as shown below. You
should have a DataFrame of shape (151, 6).
Head:
                          id  slen  swid  plen  pwid          cls
0                          0   5.1   3.5   1.4   0.2  Iris-setosa
1                          1   4.9   3.0   1.4  _0.2  Iris-setosa
2                          2   4.7   3.2   1.6   0.2  Iris-setosa
3  # there is a comment here   NaN   NaN   NaN   NaN          NaN
4                          3   4.6  _3.1   1.5   0.2  Iris-setosa

Tail:
      id  slen  swid  plen  pwid             cls
148  147   6.5   3.0   5.2   2.0  Iris-virginica
149  148   6.2   3.4   5.4   2.3  Iris-virginica
150  149   5.9   3.0   5.1   1.8  Iris-virginica

Test yourself:

# All statements below should be True


print(irisdata.shape == (151, 6))
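
One possible way to approach step 1 is sketched below. Since iris-h.data itself is not reproduced here, the sketch reads only the sample rows shown earlier from an in-memory string (the variable name sample is an assumption made for illustration), so the resulting shape is (5, 6) rather than (151, 6). With the real file, the file name would be passed in place of io.StringIO(sample).

```python
import io

import pandas as pd

# Assumption: a stand-in for iris-h.data built from the sample rows above.
sample = (
    "sample id&sepal length in cm&sepal width in cm"
    "&petal length in cm&petal width in cm&class\n"
    "0&5.1&3.5&1.4&0.2&$Iris-setosa$\n"
    "1&4.9&3.0&1.4&_0.2&$Iris-setosa$\n"
    "2&4.7&3.2&&0.2&$Iris-setosa$\n"
    "# there is a comment here\n"
    "3&4.6&_3.1&1.5&0.2&$Iris-setosa$\n"
)

names = ["id", "slen", "swid", "plen", "pwid", "cls"]

irisdata = pd.read_csv(
    io.StringIO(sample),  # use "iris-h.data" here for the real file
    sep="&",              # fields are separated by "&", not ","
    quotechar="$",        # strings are wrapped in "$", not in quotes
    header=0,             # the first line is a header; names= replaces it
    names=names,
)

print(irisdata.head())   # first 5 rows
print(irisdata.tail(3))  # last 3 rows
print(irisdata.shape)
```

At this stage the comment line is still read as an ordinary row: its text lands in the id column and the remaining fields are filled with NaN, as in the expected head output above.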

2. The data we read is still messy and needs cleaning. First, a row that begins with ‘#’ is a
comment and should be ignored altogether. Second, the first column, id, contains the sample
indexes, which are the row labels of the DataFrame, so set this column as the index column.
Finally, some values in the DataFrame contain the nonsense character ‘_’; remove all occurrences
of it. You can also inspect the first 5 rows with the head() method and print the results as
shown below.
Head:
    slen  swid  plen  pwid          cls
id
0    5.1   3.5   1.4   0.2  Iris-setosa
1    4.9   3.0   1.4   0.2  Iris-setosa
2    4.7   3.2   1.6   0.2  Iris-setosa
3    4.6   3.1   1.5   0.2  Iris-setosa
4    5.0   3.6   1.4   0.2  Iris-setosa

Footnote 1: This is a modified version of the iris.data csv from: https://archive.ics.uci.edu/ml/datasets/iris
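
The three cleaning steps in task 2 could be sketched as follows. This is one possible approach, not necessarily the intended solution: the comment row and the index are handled through read_csv parameters, while the ‘_’ characters are stripped after loading. The sample string and the num_cols list are assumptions introduced for illustration, and only the sample rows shown earlier are used in place of the real file.

```python
import io

import pandas as pd

# Assumption: a stand-in for iris-h.data built from the sample rows.
sample = (
    "sample id&sepal length in cm&sepal width in cm"
    "&petal length in cm&petal width in cm&class\n"
    "0&5.1&3.5&1.4&0.2&$Iris-setosa$\n"
    "1&4.9&3.0&1.4&_0.2&$Iris-setosa$\n"
    "2&4.7&3.2&&0.2&$Iris-setosa$\n"
    "# there is a comment here\n"
    "3&4.6&_3.1&1.5&0.2&$Iris-setosa$\n"
)

irisdata = pd.read_csv(
    io.StringIO(sample),  # use "iris-h.data" here for the real file
    sep="&",
    quotechar="$",
    header=0,
    names=["id", "slen", "swid", "plen", "pwid", "cls"],
    comment="#",     # ignore everything from "#" to the end of the line
    index_col="id",  # use the sample id as the row label
)

# Columns that contained "_" were read as text; strip the character
# and convert them back to numbers.
num_cols = ["slen", "swid", "plen", "pwid"]
for col in num_cols:
    if not pd.api.types.is_numeric_dtype(irisdata[col]):
        cleaned = irisdata[col].str.replace("_", "", regex=False)
        irisdata[col] = pd.to_numeric(cleaned)

print(irisdata.head())
```

Stripping the character after loading keeps the read_csv call simple; a converters argument would be an alternative if the cleanup has to happen during parsing.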

3. Treat the Iris-versicolor class as the value True and all other classes (i.e. Iris-setosa and
Iris-virginica) as False, using the respective parameters. You can also inspect the first 5 rows
with the head() method and print the results as shown below.
Head:
    slen  swid  plen  pwid    cls
id
0    5.1   3.5   1.4   0.2  False
1    4.9   3.0   1.4   0.2  False
2    4.7   3.2   1.6   0.2  False
3    4.6   3.1   1.5   0.2  False
4    5.0   3.6   1.4   0.2  False
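
A minimal sketch of task 3, assuming that "the respective parameters" refers to the true_values and false_values arguments of read_csv. The rows with id 0 and 149 are taken from the head and tail output above; the Iris-versicolor row is invented for illustration, since the handout does not show one.

```python
import io

import pandas as pd

# Assumption: a three-row stand-in for iris-h.data, one row per class.
sample = (
    "sample id&sepal length in cm&sepal width in cm"
    "&petal length in cm&petal width in cm&class\n"
    "0&5.1&3.5&1.4&0.2&$Iris-setosa$\n"
    "50&7.0&3.2&4.7&1.4&$Iris-versicolor$\n"  # illustrative row
    "149&5.9&3.0&5.1&1.8&$Iris-virginica$\n"
)

irisdata = pd.read_csv(
    io.StringIO(sample),  # use "iris-h.data" here for the real file
    sep="&",
    quotechar="$",
    header=0,
    names=["id", "slen", "swid", "plen", "pwid", "cls"],
    comment="#",
    index_col="id",
    true_values=["Iris-versicolor"],                 # read as True
    false_values=["Iris-setosa", "Iris-virginica"],  # read as False
)

print(irisdata.head())
print(irisdata["cls"].dtype)
```

The column is converted to bool only when every value in it appears in one of the two lists, so all three class names need to be covered.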

4. Use the columns property of the DataFrame and print the result. The result should be an Index
object listing the column names.

Index(['slen', 'swid', 'plen', 'pwid', 'cls'], dtype='object')

Then inspect the dataframe information using info() and print the results.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   slen    150 non-null    float64
 1   swid    150 non-null    float64
 2   plen    150 non-null    float64
 3   pwid    150 non-null    float64
 4   cls     150 non-null    bool
dtypes: bool(1), float64(4)
memory usage: 6.0 KB
None
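
The two inspection calls in task 4 can be tried on any small DataFrame. The sketch below builds a two-row frame by hand (its values are placeholders chosen for illustration) with the same column names and index name as the cleaned irisdata.

```python
import pandas as pd

# Assumption: a tiny hand-built frame standing in for the cleaned irisdata.
irisdata = pd.DataFrame(
    {
        "slen": [5.1, 7.0],
        "swid": [3.5, 3.2],
        "plen": [1.4, 4.7],
        "pwid": [0.2, 1.4],
        "cls": [False, True],
    },
    index=pd.Index([0, 50], name="id"),
)

# columns is a property, not a method; it returns an Index object.
print(irisdata.columns)

# Convert it explicitly when a plain Python list is needed.
col_list = list(irisdata.columns)
print(col_list)

# info() prints its summary itself and returns None, so wrapping it
# in print() adds a final "None" line after the summary.
result = irisdata.info()
print(result)
```

This also explains the trailing None in the expected output above: it is the return value of info() being printed, not part of the summary.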