
CSL 410 L13

Uploaded by

rpschauhan2003
Copyright © All Rights Reserved
Program: B.Tech (CSE), IV Semester, II Year

CSL-410: Data Science using Python


Unit No. 2
Introduction of Pandas Library

Lecture No. 13

Dr. Sanjay Jain


Associate Professor, CSA/SOET
Outline
• Pandas-Introduction
• Pandas-Key Features
• Pandas-Environment Setup
• Pandas-Data Structures
– Series
– Data Frames
– Panel
• References
Student Effective Learning Outcomes (SELO)
01: Ability to understand subject-related concepts clearly, along with
contemporary issues.
02: Ability to use up-to-date tools, techniques and skills for effective
domain-specific practice.
03: Understanding of available tools and products, and the ability to use them
effectively.
Pandas: Introduction
• Pandas is an open-source Python library that provides high-performance data
manipulation and analysis tools built on its powerful data structures.
• The name Pandas is derived from "panel data", an econometrics term for
multidimensional data sets.
• Wes McKinney started developing Pandas in 2008, when he needed a
high-performance, flexible tool for data analysis.
• Before Pandas, Python was used mainly for data munging and preparation and
contributed very little to data analysis itself.
• Pandas closed this gap. With Pandas, we can carry out five typical steps in
the processing and analysis of data, regardless of its origin: load, prepare,
manipulate, model, and analyze.
• Python with Pandas is used in a wide range of academic and commercial
fields, including finance, economics, statistics, and analytics.

<SELO: 1> <Reference No.: R1,R4>


Pandas: Key Features
• Fast and efficient DataFrame object with default and customized indexing.
• Tools for loading data into in-memory data objects from different file
formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based slicing, indexing and subsetting of large data sets.
• Columns from a data structure can be deleted or inserted.
• Group by data for aggregation and transformations.
• High performance merging and joining of data.
• Time Series functionality.
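
A few of the features listed above can be sketched in a handful of lines. This is a minimal illustration only: the table contents and the names `sales`, `targets` and `report` are made up for the example and do not come from the lecture.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; one amount is missing on purpose.
sales = pd.DataFrame({
    "rep": ["Asha", "Ben", "Asha", "Ben"],
    "region": ["North", "South", "North", "South"],
    "amount": [100.0, np.nan, 150.0, 80.0],
})

# Integrated handling of missing data: replace the gap with 0.
sales["amount"] = sales["amount"].fillna(0.0)

# Group by a key column and aggregate.
totals = sales.groupby("rep")["amount"].sum()

# Merge (join) with a second, equally hypothetical table on a shared column.
targets = pd.DataFrame({"rep": ["Asha", "Ben"], "target": [200.0, 100.0]})
report = totals.reset_index().merge(targets, on="rep")

# Label-based selection: reps whose total meets the target.
met = report.loc[report["amount"] >= report["target"], "rep"].tolist()

print(report)
print(met)
```

Each step returns a new pandas object, so the operations can be chained or inspected one at a time.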

<SELO: 1> <Reference No.: R1,R4>


Pandas: Environment Setup
• The standard Python distribution does not come bundled with the Pandas
module. A lightweight option is to install Pandas with the popular Python
package installer, pip:
pip install pandas
• If you install the Anaconda Python distribution, Pandas is installed by
default.
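
Whichever installation route is used, a quick check is to import the library and print its version. The alias `pd` is a common convention rather than a requirement, and the variable name `version` is only illustrative.

```python
import pandas as pd

# If the import succeeds, pandas is installed in the active environment.
version = pd.__version__
print("pandas version:", version)
```

If the import fails with `ModuleNotFoundError`, pandas is probably not installed in the interpreter that is running the script.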

<SELO: 1> <Reference No.: R1,R4>


Pandas: Environment Setup
• Windows
– Anaconda (from https://www.continuum.io) is a free Python distribution for the SciPy
stack. It is also available for Linux and Mac.
– Canopy (https://www.enthought.com/products/canopy/) is available as a free as well
as a commercial distribution with the full SciPy stack for Windows, Linux and Mac.
– Python(x,y) is a free Python distribution with the SciPy stack and Spyder IDE for
Windows OS. (Downloadable from http://python-xy.github.io/)
• Linux
– Package managers of respective Linux distributions are used to install one or more
packages in SciPy stack.
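• For Ubuntu Users
– sudo apt-get install python-numpy python-scipy python-matplotlib ipython
ipython-notebook python-pandas python-sympy python-nose
• For Fedora Users
– sudo yum install numpy scipy python-matplotlib ipython python-pandas sympy
python-nose atlas-devel
– These package names date from Python 2. On current releases the Python 3
packages are typically prefixed python3- (for example, python3-pandas).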
<SELO: 1> <Reference No.: R1,R4>
Pandas: Introduction to Data Structures
• Pandas deals with the following three data structures:
– Series
– DataFrame
– Panel
• These data structures are built on top of NumPy arrays, which means they
are fast.
• The best way to think of these data structures is that the higher dimensional
data structure is a container of its lower dimensional data structure. For
example, DataFrame is a container of Series, Panel is a container of
DataFrame.
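
The container relationship can be checked directly. The sketch below uses a small made-up DataFrame (the column names `x` and `y` are placeholders) to show that each column is a Series and that the values sit in a NumPy array.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# Selecting one column returns a Series that shares the DataFrame's index.
col = df["x"]
print(type(col).__name__)

# The values of a Series are exposed as a NumPy array.
arr = col.to_numpy()
print(type(arr).__name__)

# A DataFrame can be assembled again from its named Series.
rebuilt = pd.DataFrame({"x": df["x"], "y": df["y"]})
print(rebuilt.equals(df))
```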

<SELO: 1> <Reference No.: R1,R4>


Pandas: Introduction to Data Structures
• Building and handling arrays of two or more dimensions is tedious: the
burden is on the user to consider the orientation of the data set when
writing functions.
• Pandas data structures reduce this mental effort.
• For example, with tabular data (DataFrame) it is more semantically helpful
to think of the index (the rows) and the columns rather than axis 0 and
axis 1.
• Mutability
All Pandas data structures are value mutable (their values can be changed).
All except Series are also size mutable; Series is size immutable.
• DataFrame is widely used and one of the most important data
structures. Panel is rarely used.
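
As a rough illustration of the points above, the sketch below uses the named axes "index" and "columns" in place of axis 0 and axis 1, then changes a value and the set of columns. The marks table and its labels are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"math": [70, 85], "physics": [60, 90]},
    index=["student_a", "student_b"],
)

# axis="index" is the same as axis=0: reduce down the rows,
# giving one result per column.
per_subject = df.sum(axis="index")

# axis="columns" is the same as axis=1: reduce across the columns,
# giving one result per row.
per_student = df.sum(axis="columns")

# Value mutability: overwrite a single cell by its row and column labels.
df.loc["student_a", "math"] = 75

# Size mutability of a DataFrame: insert one column and delete another.
df["chemistry"] = [88, 92]
del df["physics"]

print(per_subject)
print(per_student)
print(df)
```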

<SELO: 1> <Reference No.: R1,R4>


Pandas: Series
• Series is a one-dimensional, array-like structure with homogeneous data.
For example, the following series is a collection of integers 10, 23, 56, …

• Key Points
– Homogeneous data
– Size Immutable
– Values of Data Mutable

<SELO: 1> <Reference No.: R1,R4>


Pandas: DataFrame
• DataFrame is a two-dimensional array with heterogeneous data. For
example,

• The table represents the data of a sales team of an organization with their
overall performance rating. The data is represented in rows and columns.
Each column represents an attribute and each row represents a person.
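
A table of this kind could be built as follows. The slide's original table is not reproduced in this text, so the column names and every value below are placeholders chosen for illustration, not the lecture's data.

```python
import pandas as pd

# Hypothetical sales-team table: one row per person, one column per attribute.
team = pd.DataFrame({
    "name": ["Member A", "Member B", "Member C"],
    "age": [32, 28, 45],
    "gender": ["F", "M", "M"],
    "rating": [3.45, 4.60, 3.90],
})

# (rows, columns)
print(team.shape)

# One row (a person) by position, and one column (an attribute) by label.
first_person = team.iloc[0]
ratings = team["rating"]

print(first_person["name"])
print(ratings.tolist())
```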

<SELO: 1> <Reference No.: R1,R4>


Pandas: DataFrame
• The data types of the four columns are as follows:

• Key Points
– Heterogeneous data
– Size Mutable
– Data Mutable
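
The key points can be checked on a small example. The table below is again hypothetical; its columns are assumptions and do not reproduce the slide's table. Note that `pd.concat` returns a new DataFrame rather than growing the old one in place.

```python
import pandas as pd

team = pd.DataFrame({
    "name": ["Member A", "Member B"],
    "age": [32, 28],
    "rating": [3.45, 4.60],
})

# Heterogeneous data: each column carries its own dtype.
print(team.dtypes)

# Size mutable (columns): add a derived column.
team["senior"] = team["age"] > 30

# Size mutable (rows): append a row by concatenating a one-row DataFrame.
extra = pd.DataFrame(
    {"name": ["Member C"], "age": [45], "rating": [3.9], "senior": [True]}
)
team = pd.concat([team, extra], ignore_index=True)

# Data mutable: overwrite one value by row label and column name.
team.loc[1, "rating"] = 4.8

print(team)
```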

<SELO: 1> <Reference No.: R1,R4>


Pandas: Panel
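• Panel is a three-dimensional data structure with heterogeneous data. A
Panel is hard to depict graphically, but it can be pictured as a container
of DataFrames.
• Note: Panel was deprecated in pandas 0.20 and removed in pandas 0.25.
Current versions represent three-dimensional data with a DataFrame that has
a MultiIndex, or with the separate xarray package.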
• Key Points
– Heterogeneous data
– Size Mutable
– Data Mutable
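
Because the Panel class is no longer available in current pandas releases (it was removed in 0.25), the "container of DataFrames" idea is sketched here with a MultiIndex instead. This is a swapped-in technique rather than the Panel API itself, and the quarterly tables are made up for the example.

```python
import pandas as pd

# Two hypothetical 2-D tables with the same layout.
q1 = pd.DataFrame(
    {"units": [10, 12], "revenue": [100.0, 130.0]}, index=["north", "south"]
)
q2 = pd.DataFrame(
    {"units": [11, 15], "revenue": [115.0, 160.0]}, index=["north", "south"]
)

# Stack them; the dictionary keys become the outer level of a MultiIndex,
# which plays the role of the third dimension.
stacked = pd.concat({"Q1": q1, "Q2": q2}, names=["quarter", "region"])

# Selecting an outer key gives back one 2-D table.
q2_again = stacked.loc["Q2"]

# Cross-section along the inner level: one region across all quarters.
north_units = stacked.xs("north", level="region")["units"].tolist()

print(stacked)
print(q2_again)
print(north_units)
```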

<SELO: 1> <Reference No.: R1,R4>


Learning Outcomes

Students have learned and understood the following:

• Pandas-Introduction
• Pandas-Key Features
• Pandas-Environment Setup
• Pandas-Data Structures
– Series
– DataFrames
– Panel
References

1. Anaconda for Python software (Jupyter Notebook and Spyder IDE)
https://www.anaconda.com/products/individual
2. Python software for Windows
https://www.python.org/downloads/
3. Online Google Python notebook
https://colab.research.google.com/notebooks
Thank you
