
Lec2 PandasDataframes 1

This document provides an introduction and overview of pandas and dataframes. It discusses importing data into Spyder, creating copies of data, attributes of dataframes like index, columns, size and shape. It also covers indexing and selecting data using square brackets, head and tail functions, at and iat methods, and using loc to access data by label. The goal of pandas is to provide easy-to-use and high performance data structures and analysis tools for Python.

Pandas Dataframes

Part I
In this lecture
• Introduction to pandas
• Importing data into Spyder
• Creating copy of original data
• Attributes of data
• Indexing and selecting data

Python for Data Science 2


Introduction to Pandas
• Provides high-performance, easy-to-use data structures and analysis tools for the Python programming language
• Open-source Python library providing high-performance data manipulation and analysis tools built on its powerful data structures
• The name pandas is derived from "panel data", an econometrics term for multidimensional data
Pandas
• Pandas deals with dataframes

Name        Dimension   Description
Dataframe   2           • Two-dimensional, size-mutable
                        • Potentially heterogeneous tabular data structure
                          with labeled axes (rows and columns)



Importing data into Spyder
• Importing necessary libraries
o ‘os’ library to change the working directory
o ‘pandas’ library to work with dataframes
o ‘numpy’ library to perform numeric operations

• Changing the working directory
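The code for this slide did not survive extraction, so the following is only a minimal sketch of the two steps. The aliases `pd` and `np` are the common convention, and the temporary folder is a hypothetical stand-in for a real project directory.

```python
import os
import tempfile

import numpy as np   # numeric operations (imported here, used in later work)
import pandas as pd  # dataframes

# Hypothetical working directory: a temporary folder stands in for a real path.
work_dir = tempfile.mkdtemp()

# Change the working directory so relative file names resolve against it.
os.chdir(work_dir)
print(os.getcwd())
```

In practice `work_dir` would be the folder that holds the data files.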



Importing data into Spyder
• Importing data

o By passing , first column becomes the index column
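The argument named on this slide was lost in extraction. One way to get the described behaviour in pandas is `index_col=0` in `pd.read_csv`; that choice is an assumption here, not something the slide confirms. The CSV text and the column names `Price`, `Age` and `FuelType` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; the first (unnamed) column holds the row labels.
csv_text = """,Price,Age,FuelType
0,13500,23,Diesel
1,13750,23,Diesel
2,13950,24,Petrol
"""

# index_col=0 tells read_csv to use the first column as the index
# instead of treating it as a regular data column.
cars_data = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(cars_data)
```

With a file on disk, the file name would replace the `io.StringIO` object.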



Creating copy of original data
• In Python, there are two ways to create copies
o Shallow copy
o Deep copy

Shallow copy
o It only creates a new variable that shares the reference of the original object
o Any changes made to a copy of the object will be reflected in the original object as well

Deep copy
o In a deep copy, a copy of the object is stored in another object with no reference to the original
o Any changes made to a copy of the object will not be reflected in the original object
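The difference can be sketched as follows. This example assumes plain assignment for the shared-reference case and `DataFrame.copy(deep=True)` for the deep copy; the variable and column names are hypothetical. `DataFrame.copy(deep=False)` also exists, but whether edits made through it reach the original depends on the pandas version and its copy-on-write setting, so it is left out.

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame({"Price": [13500, 13750], "Age": [23, 24]})

shared = data                    # new variable, same underlying object
independent = data.copy(deep=True)  # separate object with its own data

shared.loc[0, "Price"] = 0       # change is visible through `data`
independent.loc[1, "Age"] = 99   # change is NOT visible through `data`

print(data)
```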



Attributes of data

DataFrame.index
➢ To get the index (row labels) of the dataframe
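A small sketch of reading the row labels; the dataframe and its labels are invented for illustration.

```python
import pandas as pd

# Hypothetical dataframe with explicit row labels.
df = pd.DataFrame(
    {"Price": [13500, 13750, 13950]},
    index=["car_a", "car_b", "car_c"],
)

row_labels = df.index        # an Index object holding the row labels
print(list(row_labels))
```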



Attributes of data
DataFrame.columns
➢ To get the column labels of the dataframe
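Likewise for column labels, again with hypothetical column names:

```python
import pandas as pd

# Hypothetical dataframe; the dictionary keys become the column labels.
df = pd.DataFrame({"Price": [13500, 13750], "Age": [23, 24], "FuelType": ["Diesel", "Petrol"]})

column_labels = df.columns   # an Index object holding the column labels
print(list(column_labels))
```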



Attributes of data
DataFrame.size
➢ To get the total number of elements in the dataframe

DataFrame.shape
➢ To get the dimensionality of the dataframe as (rows, columns)

For the lecture dataset: 1436 rows & 10 columns
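To see how the two attributes relate, here is a sketch on a tiny hypothetical dataframe (not the 1436-row lecture dataset): `size` is the product of the two numbers in `shape`.

```python
import pandas as pd

# Hypothetical dataframe with 3 rows and 2 columns.
df = pd.DataFrame({"Price": [13500, 13750, 13950], "Age": [23, 23, 24]})

n_rows, n_cols = df.shape    # (number of rows, number of columns)
n_elements = df.size         # total number of cells

print(df.shape, df.size)
```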


Attributes of data
DataFrame.memory_usage([index, deep])
➢ The memory usage of each column in bytes

DataFrame.ndim
➢ The number of axes / array dimensions

A two-dimensional array stores data in a format consisting of rows and columns
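A brief sketch of both attributes on hypothetical data. The exact byte counts depend on the column dtypes and the platform, so none are shown.

```python
import pandas as pd

# Hypothetical dataframe for illustration.
df = pd.DataFrame({"Price": [13500, 13750, 13950], "Age": [23, 23, 24]})

# Per-column memory in bytes; by default the index is reported as well.
mem = df.memory_usage()
mem_no_index = df.memory_usage(index=False)

print(mem)
print(df.ndim)            # 2 for a dataframe
print(df["Price"].ndim)   # 1 for a single column (a Series)
```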
Indexing and selecting data

• Python slicing operator ‘[ ]’ and attribute/dot operator ‘.’ are used for indexing

• Provides quick and easy access to pandas data structures
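The two operators can be sketched like this, with hypothetical column names. Dot access only works when the column name is a valid Python identifier that does not clash with an existing dataframe attribute.

```python
import pandas as pd

# Hypothetical dataframe for illustration.
df = pd.DataFrame({"Price": [13500, 13750, 13950], "Age": [23, 23, 24]})

by_brackets = df["Price"]    # column selection with [ ]
by_dot = df.Price            # the same column via the dot operator
first_two_rows = df[0:2]     # a slice inside [ ] selects rows by position

print(by_brackets.equals(by_dot))
print(first_two_rows)
```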



Indexing and selecting data
DataFrame.head([n])
➢ The function head returns the first n rows from the dataframe

By default, head() returns the first 5 rows
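As a quick illustration on a hypothetical eight-row dataframe:

```python
import pandas as pd

# Hypothetical dataframe with 8 rows.
df = pd.DataFrame({"Price": range(10000, 10008)})

first_five = df.head()     # default n=5
first_three = df.head(3)   # explicit n

print(first_three)
```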



Indexing and selecting data
DataFrame.tail([n])
➢ The function tail returns the last n rows of the object based on position

✓ It is useful for quickly verifying data
✓ Ex: after sorting or appending rows
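A sketch of the sorting use case, with made-up prices: after sorting in ascending order, `tail` shows the largest values.

```python
import pandas as pd

# Hypothetical dataframe with 7 rows.
df = pd.DataFrame({"Price": [13500, 9900, 17250, 12000, 15500, 11000, 14250]})

sorted_df = df.sort_values("Price")   # ascending by default

last_five = sorted_df.tail()          # default n=5
last_two = sorted_df.tail(2)          # the two highest prices

print(last_two)
```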
Indexing and selecting data
• To access a scalar value, the fastest way
is to use the at and iat methods
○ at provides label-based scalar lookups

○ iat provides integer-based lookups
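Both lookups can be sketched on a small hypothetical dataframe; the row labels and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical dataframe with string row labels.
df = pd.DataFrame(
    {"Price": [13500, 13750, 13950], "Age": [23, 23, 24]},
    index=["car_a", "car_b", "car_c"],
)

price_b = df.at["car_b", "Price"]   # label-based: row label, column label
age_c = df.iat[2, 1]                # integer-based: row position, column position

print(price_b, age_c)
```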



Indexing and selecting data
• To access a group of rows and columns by label(s), .loc[ ] can be used
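A minimal sketch of label-based selection with hypothetical labels. Unlike positional slices, a label slice in `.loc` includes both endpoints.

```python
import pandas as pd

# Hypothetical dataframe with string row labels.
df = pd.DataFrame(
    {
        "Price": [13500, 13750, 13950],
        "Age": [23, 23, 24],
        "FuelType": ["Diesel", "Diesel", "Petrol"],
    },
    index=["car_a", "car_b", "car_c"],
)

# Lists of labels select specific rows and columns.
subset = df.loc[["car_a", "car_c"], ["Price", "FuelType"]]

# A label slice selects a range of rows, end label included.
prices = df.loc["car_a":"car_b", "Price"]

print(subset)
print(prices)
```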




THANK YOU
