
Lec3 PandasDataframes 2

Uploaded by

mrg160999
Pandas Dataframes

Part II
Pandas Dataframes - Recap
In the previous lecture, we covered
 Introduction to pandas
 Importing data into Spyder
 Creating copy of original data
 Attributes of data
 Indexing and selecting data

Python for Data Science 2


In this lecture
 Data types
◦ Numeric
◦ Character
 Checking data types of each column
 Count of unique data types
 Selecting data based on data types
 Concise summary of dataframe
 Checking format of each column
 Getting unique elements of each column



Data types
 The way information is stored in a dataframe or a Python object affects the analysis we can do and the outputs of calculations
 There are two main types of data
◦ numeric and character types
 Numeric data types include integers and floats
◦ For example: integer – 10, float – 10.53
 Strings are known as objects in pandas; they can store values that contain numbers and / or characters
◦ For example: ‘category1’
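
As a minimal sketch of the points above, the example below builds a small dataframe with one integer, one float and one string column and shows that the same operation gives different results depending on how the values are stored. The column names and values are made up for illustration and are not the lecture's dataset; the string column is given the object dtype explicitly, because recent pandas versions may otherwise label pure-string columns with a dedicated string dtype.

```python
import pandas as pd

# Hypothetical toy data (not the lecture's dataset).
df = pd.DataFrame({
    "count": [10, 20, 30],            # integers
    "price": [10.53, 11.0, 9.75],     # floats
    # Strings, stored explicitly as object so the dtype is the same
    # across pandas versions.
    "label": pd.Series(["category1", "category2", "10"], dtype=object),
})

print(df.dtypes)

# The same operation behaves differently for numbers and for strings:
doubled_numbers = df["count"] * 2   # arithmetic doubling
doubled_strings = df["label"] * 2   # string repetition

print(doubled_numbers.tolist())     # [20, 40, 60]
print(doubled_strings.tolist())     # ['category1category1', 'category2category2', '1010']
```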



Numeric types
 Pandas and base Python use different names for data types

Python data type   Pandas data type   Description
int                int64              Whole numbers
float              float64            Numbers with a decimal part

◦ ‘64’ refers to the number of bits allocated to store each value, which determines the range (for integers) and the precision (for floats) that each “cell” can hold
◦ 64 bits is equivalent to 8 bytes
◦ Allocating space ahead of time allows computers to optimize storage and processing efficiency
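
The size of a 64-bit value can be checked directly. The sketch below uses made-up values and NumPy's `iinfo` helper, which is not part of the lecture, to show the 8-byte item size and the largest integer an int64 cell can hold.

```python
import numpy as np
import pandas as pd

# Hypothetical values, for illustration only.
s_int = pd.Series([10, 20, 30])
s_float = pd.Series([10.53, 20.0])

# itemsize is the number of bytes used per value: 64 bits = 8 bytes.
print(s_int.dtype, s_int.dtype.itemsize)      # int64 8
print(s_float.dtype, s_float.dtype.itemsize)  # float64 8

# Largest whole number that fits in an int64 cell (2**63 - 1).
int64_max = np.iinfo(np.int64).max
print(int64_max)                              # 9223372036854775807
```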



Character types
 Difference between category & object
category
◦ A string variable consisting of only a few different values. Converting such a string variable to a categorical variable will save some memory
◦ A categorical variable takes on a limited, fixed number of possible values
object
◦ A column is assigned the object data type when it holds strings or mixed types (numbers and strings). Blank cells (‘nan’) in a string column leave it as object; blank cells in an otherwise numeric column make it float64, not object
◦ For strings, the length is not fixed
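
To make the memory saving concrete, here is a small sketch that stores the same repeated strings once as object and once as category and compares the memory used. The fuel-type values are hypothetical, and the exact byte counts depend on the pandas version and platform, so only the comparison is shown.

```python
import pandas as pd

# Hypothetical column with only three distinct values, repeated many times.
fuel = pd.Series(["Petrol", "Diesel", "CNG"] * 1000, dtype=object)

# Convert the string column to a categorical column.
fuel_cat = fuel.astype("category")

# deep=True counts the memory of the Python strings themselves.
obj_bytes = fuel.memory_usage(deep=True)
cat_bytes = fuel_cat.memory_usage(deep=True)

print(fuel_cat.cat.categories.tolist())   # ['CNG', 'Diesel', 'Petrol']
print(cat_bytes < obj_bytes)              # True
```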
Checking data types of each column
dtypes returns a series with the data type of
each column
Syntax: DataFrame.dtypes
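
A minimal sketch of `dtypes` in use. The dataframe is a made-up stand-in that borrows column names mentioned in this lecture; the values are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's dataframe.
cars = pd.DataFrame({
    "MetColor": [1.0, 0.0, 1.0],
    "Automatic": [0, 0, 1],
    "Doors": pd.Series(["three", "3", "five"], dtype=object),
})

# dtypes is an attribute (no parentheses) and is itself a Series,
# indexed by column name.
col_types = cars.dtypes
print(col_types)
```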



Count of unique data types
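get_dtype_counts() returns counts of unique data types in the dataframe. It was deprecated in pandas 0.25 and removed in pandas 1.0; in later versions, count the entries of dtypes instead

Syntax: DataFrame.get_dtype_counts()   (pandas versions before 1.0)
Syntax: DataFrame.dtypes.value_counts()   (pandas 1.0 and later)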
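
Since `get_dtype_counts()` is not available in pandas 1.0 and later, the sketch below gets the same counts by calling `value_counts()` on `dtypes`. The dataframe is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's dataframe.
cars = pd.DataFrame({
    "Price": [13500, 13750, 13950],
    "Automatic": [0, 0, 1],
    "MetColor": [1.0, 0.0, 1.0],
    "Doors": pd.Series(["three", "3", "five"], dtype=object),
})

# Count how many columns have each data type.
dtype_counts = cars.dtypes.value_counts()
print(dtype_counts)

# The same counts as a plain dictionary keyed by dtype name.
counts_by_name = {str(dtype): int(n) for dtype, n in dtype_counts.items()}
print(counts_by_name)
```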



Selecting data based on data types
pandas.DataFrame.select_dtypes() returns a subset of the dataframe's columns based on the column dtypes

Syntax: DataFrame.select_dtypes(include=None, exclude=None)
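
A short sketch of both parameters, using a hypothetical dataframe. `include` keeps only the listed dtypes and `exclude` drops them; the `'??'` entry is a made-up placeholder for a non-numeric value.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's dataframe.
cars = pd.DataFrame({
    "KM": pd.Series(["46986", "??", "72937"], dtype=object),
    "MetColor": [1.0, 0.0, 1.0],
    "Automatic": [0, 0, 1],
})

# Keep only the object columns.
only_object = cars.select_dtypes(include=[object])

# Drop the object columns, keeping everything else.
without_object = cars.select_dtypes(exclude=[object])

print(only_object.columns.tolist())     # ['KM']
print(without_object.columns.tolist())  # ['MetColor', 'Automatic']
```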



Concise summary of dataframe
info() prints a concise summary of a dataframe
 data type of index
 data type of columns
 count of non-null values
 memory usage
Syntax: DataFrame.info()
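
The sketch below calls `info()` on a hypothetical dataframe with one missing value. Note that `info()` writes the summary to the screen and hands back `None`; the optional `buf` parameter is used here only to capture the text so it can be inspected.

```python
import io
import pandas as pd

# Hypothetical stand-in for the lecture's dataframe; one MetColor cell is missing.
cars = pd.DataFrame({
    "MetColor": [1.0, None, 0.0],
    "Automatic": [0, 0, 1],
})

# Prints the summary; the call itself returns None.
result = cars.info()

# Capture the same summary as text instead of printing it.
buffer = io.StringIO()
cars.info(buf=buffer)
summary = buffer.getvalue()
```

In the printed summary, 'MetColor' shows 2 non-null values out of 3 entries, which is how missing values show up in `info()`.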



Checking format of each column
By using info(), we can see
 ‘KM’ has been read as object instead of integer
 ‘HP’ has been read as object instead of integer
 ‘MetColor’ and ‘Automatic’ have been read as float64 and int64 respectively, since they hold the values 0/1
 Ideally, ‘Doors’ should have been read as int64 since it has values 2, 3, 4, 5. But it has been read as object
 Missing values are present in a few variables

Let’s find out the reason!



Unique elements of columns
unique() is used to find the unique elements of a column
Syntax: numpy.unique(array)

 ‘KM’ contains a special character among its values
 Hence, it has been read as object instead of int64
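
A minimal sketch of how one non-numeric entry changes the type of a whole column. The CSV text is made up, and `'??'` is only a hypothetical placeholder for the special character, which is not reproduced here.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical CSV content; '??' stands in for the special character.
csv_text = "KM,Automatic\n46986,0\n??,1\n72937,0\n"
cars = pd.read_csv(io.StringIO(csv_text))

# Because of the '??' entry, 'KM' is not read as an integer column.
print(cars.dtypes)

# numpy.unique returns the sorted unique values of the column.
km_values = np.unique(cars["KM"])
print(km_values.tolist())   # ['46986', '72937', '??']
```

Every value in 'KM' is now a string, including the ones that look like numbers.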



Unique elements of columns
Variable ‘HP’ :
 ‘HP’ contains a special character among its values
 Hence, it has been read as object instead of int64
Variable ‘MetColor’ :
 ‘MetColor’ has been read as float64 since it has values 0. & 1.


Unique elements of columns
Variable ‘Automatic’ :
 ‘Automatic’ has been read as int64 since it has values 0 & 1
Variable ‘Doors’ :
 ‘Doors’ has been read as object instead of int64 because of the values ‘five’, ‘four’ and ‘three’, which are strings
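
The three cases above can be reproduced on made-up data. This sketch assumes that 'MetColor' has a blank cell, which is one common reason a 0/1 column is read as float64, and that 'Doors' mixes digits with spelled-out numbers; neither the rows nor the blank cell come from the lecture's file.

```python
import io
import pandas as pd

# Hypothetical CSV content: one blank MetColor cell, mixed Doors values.
csv_text = (
    "MetColor,Automatic,Doors\n"
    "1,0,three\n"
    "0,0,3\n"
    ",1,five\n"
    "1,0,4\n"
)
cars = pd.read_csv(io.StringIO(csv_text))

print(cars.dtypes)

# Series.unique() lists values in order of first appearance.
doors_values = cars["Doors"].unique().tolist()
metcolor_values = sorted(cars["MetColor"].dropna().unique().tolist())

print(doors_values)      # ['three', '3', 'five', '4']
print(metcolor_values)   # [0.0, 1.0]
```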



Summary
 Data types
◦ Numeric
◦ Character
 Checked data types of each column
 Count of unique data types
 Selected data based on data types
 Concise summary of dataframe
 Checked format of each column
 Got unique elements of each column



THANK YOU
