Unit 3 Data Analysis using pandas - Copy

The document provides an overview of the Pandas library, highlighting its importance for data analysis in Python. It covers installation, key data structures like Series and DataFrame, indexing, sorting, and statistical functions. Additionally, it includes practical labs and quizzes to reinforce learning about data manipulation using Pandas.

Uploaded by

rohini.d.patel

Data Analysis using Pandas

Unit 3
Data Analysis using Pandas

Disclaimer
The content is curated from online/offline resources and used for educational purposes only.

Pandas can import many kinds of data files.

Pandas solves your problem: give your data, missing values included, to Pandas with the correct code.


Learning Objectives
• Introduction to Pandas
• Why Pandas?
• Applications of Pandas
• Installation of Pandas
• Pandas Objects
• Pandas Sort
• Working with Text Data
• Statistical Functions
• Indexing and Selecting Data

Introduction to Pandas
• Pandas is an open-source Python library that uses powerful data structures to provide high-performance data manipulation and analysis.
• It provides a variety of data structures and operations for manipulating numerical data and time series.
• This library is based on the NumPy library.

Why Pandas?

• Pandas helps you become familiar with your data by cleaning, transforming, and analysing it.
• Pandas has so many applications that it might be more useful to list what it can't do than what it can.
• This tool is essentially the home of your data.

Installation of Pandas

• The first step in using Pandas is to check whether it is installed in your Python environment.
• If it is not, install it on your system using the pip command:
pip install pandas

• After installing pandas on your system, you'll need to import the library.
• This module is typically imported as follows:

import pandas as pd
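
The installation check described above can be sketched in Python itself. This is a minimal sketch: the helper name is_installed is hypothetical (it is not part of Pandas), and it relies only on the standard library's importlib.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("pandas"):
    import pandas as pd

    # Every pandas release exposes its version string as pd.__version__.
    print("pandas version:", pd.__version__)
else:
    print("pandas is not installed; run: pip install pandas")
```

If the second branch runs, the pip command shown earlier installs the library.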

Introducing Pandas Objects

• Pandas objects can be thought of as enhanced versions of NumPy structured arrays in which the rows and columns are identified with labels rather than simple integer indices.
• There are three fundamental Pandas data structures:
• Series
• DataFrame
• Index

What is a Series?
• A Pandas Series is a labelled one-dimensional array that can hold any type of data (integer, string, float, Python objects, and so on).
• A Pandas Series is comparable to a single column in an Excel spreadsheet.
• Using the Series() constructor, we can easily convert a list, tuple, or dictionary into a Series.
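
The conversions described above might look like the following minimal sketch. The values and labels are made up for illustration.

```python
import pandas as pd

# From a list: pandas assigns the default integer labels 0, 1, 2.
from_list = pd.Series([10, 20, 30])

# From a tuple, with explicit labels supplied through the index argument.
from_tuple = pd.Series((1.5, 2.5, 3.5), index=["a", "b", "c"])

# From a dictionary: the keys become the labels.
from_dict = pd.Series({"x": 1, "y": 2, "z": 3})

print(from_list)
print(from_tuple)
print(from_dict)
```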


Pandas Index
• Pandas Index is an efficient tool for extracting
particular rows and columns of data from a
DataFrame.
• Its job is to organize data and make it easily
accessible.
• We can also define an index, similar to an address,
through which we can access any data in the Series
or DataFrame.

Creating an Index
First, we need a CSV file containing some data to use for indexing.
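
A minimal sketch of creating an index from CSV data follows. The CSV content, the column names, and the values are hypothetical, and the text is held in memory with io.StringIO so that the example does not depend on a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; a real lab would read a file from disk instead.
csv_text = """Name,Team,Age
Asha,Red,25
Ben,Blue,31
Chen,Red,28
"""

# Use the "Name" column as the row index while reading the data.
df = pd.read_csv(io.StringIO(csv_text), index_col="Name")
print(df.index)

# An Index can also be built directly and attached to a Series.
labels = pd.Index(["a", "b", "c"])
s = pd.Series([10, 20, 30], index=labels)
print(s["b"])
```

Once a column is set as the index, its values act like addresses for looking up rows.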

Pandas DataFrame
• A DataFrame is a two-dimensional data structure with labelled rows and columns.
• DataFrames are similar to spreadsheets in Excel or Calc, or to SQL tables.
• A Pandas DataFrame consists of three main components: the data, the index, and the columns.

Creating a Pandas DataFrame
• Creating a DataFrame from a list: a DataFrame can be created from a single list or a list of lists.
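
Both cases can be sketched as follows. The names, ages, and column labels are made up for illustration.

```python
import pandas as pd

# A single list becomes one column.
names = ["Asha", "Ben", "Chen"]
df_single = pd.DataFrame(names, columns=["Name"])

# A list of lists becomes one row per inner list.
rows = [["Asha", 25], ["Ben", 31], ["Chen", 28]]
df_rows = pd.DataFrame(rows, columns=["Name", "Age"])

print(df_single)
print(df_rows)
```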
• Creating a DataFrame from a dict of ndarrays/lists: to generate a DataFrame from a dict of ndarrays/lists, every ndarray or list must be the same length.
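
A minimal sketch of the dict-based construction, using made-up data, is shown here. It also shows what happens when the lengths differ: Pandas raises a ValueError.

```python
import numpy as np
import pandas as pd

data = {
    "Name": ["Asha", "Ben", "Chen"],   # a plain list
    "Age": np.array([25, 31, 28]),     # an ndarray of the same length
}
df = pd.DataFrame(data)
print(df)

# Columns of different lengths cannot be combined into one DataFrame.
length_error = None
try:
    pd.DataFrame({"Name": ["Asha", "Ben"], "Age": [25, 31, 28]})
except ValueError as err:
    length_error = str(err)
print("Error:", length_error)
```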

Lab 1: Creating Series, Index and DataFrame using the Pandas Library

Reindexing
• Reindexing changes the row and column labels of a DataFrame.
• It conforms the data to a given set of labels along a particular axis. Reindexing lets us carry out a variety of operations, including:
• Inserting missing value (NaN) markers at label locations where there was previously no data for the label.
• Reordering existing data to correspond to a new set of labels.
• To reindex a DataFrame, use the reindex() method.
• Labels in the new index that have no matching records in the DataFrame are given the value NaN by default.
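
The default behaviour can be sketched with a small, made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"score": [90, 85, 70]}, index=["a", "b", "c"])

# Rows are reordered to match the new labels. "d" has no matching row,
# so its value becomes NaN; "a" is not requested, so it is left out.
reindexed = df.reindex(["c", "b", "d"])
print(reindexed)
```

Note that reindex() returns a new DataFrame and leaves the original unchanged.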

• Notice that the new index labels are populated with NaN values.
• We can fill in the missing values using the fill_value parameter.
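
A minimal sketch of fill_value, again with made-up data: the value 0 is an arbitrary choice for this example.

```python
import pandas as pd

df = pd.DataFrame({"score": [90, 85, 70]}, index=["a", "b", "c"])

# New row labels receive 0 instead of NaN.
filled_rows = df.reindex(["a", "b", "c", "d"], fill_value=0)

# The same parameter works when reindexing the columns.
filled_cols = df.reindex(columns=["score", "bonus"], fill_value=0)

print(filled_rows)
print(filled_cols)
```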

Pandas Sort
There are two kinds of sorting available in Pandas:

• By label
• By actual value

By Label
With the sort_index() method, a DataFrame can be sorted by label by passing the axis argument and the sorting order. By default, row labels are sorted in ascending order.
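
Sorting by label might look like the following sketch. The DataFrame and its deliberately unsorted row labels are made up for illustration.

```python
import pandas as pd

# Hypothetical data with unsorted row labels.
df = pd.DataFrame(
    {"col2": [1, 2, 3, 4], "col1": [10, 20, 30, 40]},
    index=[3, 1, 0, 2],
)

# Sort by row label; ascending order is the default.
sorted_by_label = df.sort_index()
print(sorted_by_label)
```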

Order of Sorting
The order of sorting can be controlled by passing a Boolean value to the ascending parameter.
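
A minimal sketch of the ascending parameter, using the same kind of made-up data:

```python
import pandas as pd

df = pd.DataFrame(
    {"col2": [1, 2, 3, 4], "col1": [10, 20, 30, 40]},
    index=[3, 1, 0, 2],
)

# ascending=False sorts the row labels from largest to smallest.
descending = df.sort_index(ascending=False)
print(descending)
```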

Sort the Columns
Column labels can be sorted by passing axis=1 to sort_index(). By default, axis=0, which sorts by row labels.
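
Sorting the column labels can be sketched as follows, with made-up data whose columns start out in the order col2, col1:

```python
import pandas as pd

df = pd.DataFrame(
    {"col2": [1, 2, 3, 4], "col1": [10, 20, 30, 40]},
    index=[3, 1, 0, 2],
)

# axis=1 sorts the column labels; the row order is left untouched.
sorted_columns = df.sort_index(axis=1)
print(sorted_columns)
```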

By Value
Like sort_index(), sort_values() is a DataFrame method, but it sorts by values rather than by labels. It accepts a 'by' argument, which names the column (or columns) of the DataFrame to sort by.
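
A minimal sketch of sorting by value, with made-up data: passing a list to 'by' sorts on the first column and uses the second to break ties.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 1, 1, 1], "col2": [1, 3, 2, 4]})

# Sort by a single column.
by_col1 = df.sort_values(by="col1")

# Sort by col1 first, then by col2 within equal col1 values.
by_both = df.sort_values(by=["col1", "col2"])

print(by_col1)
print(by_both)
```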

Lab 2: Perform Selection, Reindexing and Sorting in Pandas

Working with Text Data


• Pandas includes a set of string functions that make working with string data simple. They are accessed through the .str accessor of a Series or Index.
• Most importantly, these functions skip missing/NaN values, which stay missing in the result.
• The most common operations are listed here.

lower(): Converts strings in the Series/Index to lower case.
upper(): Converts strings in the Series/Index to upper case.
len(): Computes the length of each string.
isupper(): Checks whether all characters in each string in the Series/Index are in upper case. Returns Boolean.
isnumeric(): Checks whether all characters in each string in the Series/Index are numeric. Returns Boolean.
islower(): Checks whether all characters in each string in the Series/Index are in lower case. Returns Boolean.
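
The operations above might be used as in this minimal sketch. The sample strings are made up, and one entry is deliberately missing to show that it stays missing after lower() and upper().

```python
import numpy as np
import pandas as pd

s = pd.Series(["Tom", "WILLIAM", "rick", np.nan, "1234"])

lowered = s.str.lower()          # every string in lower case
uppered = s.str.upper()          # every string in upper case
lengths = s.str.len()            # number of characters in each string
is_upper = s.str.isupper()       # True where all cased characters are upper case
is_numeric = s.str.isnumeric()   # True where all characters are numeric

print(lowered)
print(uppered)
print(lengths)
print(is_upper)
print(is_numeric)
```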

Statistical Functions
• With Pandas, many complex statistical operations in Python reduce to a single line of code.
• Some of the most popular and practical statistical operations are covered here.

sum(): Return the sum of the values.
count(): Return the count of non-null values.
max(): Return the maximum of the values.
min(): Return the minimum of the values.
mean(): Return the mean of the values.
median(): Return the median of the values.
std(): Return the standard deviation of the values.
describe(): Return the summary statistics for each column.
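
A minimal sketch of these functions on a small, made-up DataFrame follows. The Rating column contains one missing value to show that count() skips it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Age": [25, 26, 25, 23, 30],
        "Rating": [4.0, 3.0, np.nan, 2.5, 3.5],
    }
)

total_age = df["Age"].sum()          # sum of the values
rating_count = df["Rating"].count()  # the NaN entry is not counted
oldest = df["Age"].max()
youngest = df["Age"].min()
mean_age = df["Age"].mean()
median_age = df["Age"].median()
std_age = df["Age"].std()            # sample standard deviation by default

print(total_age, rating_count, oldest, youngest, mean_age, median_age, std_age)
```

Each of these methods can also be called on the whole DataFrame, in which case it returns one result per column.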

Summary Statistics
Just one command and you get all insights from the data. Hip hip hurray!
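
The single command referred to here is presumably describe(), which the list of statistical functions names as the summary-statistics method. A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Age": [25, 26, 25, 23, 30],
        "Rating": [4.0, 3.0, 3.5, 2.5, 4.5],
    }
)

# One call returns count, mean, std, min, quartiles and max per numeric column.
summary = df.describe()
print(summary)
```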

Lab 3: Working with Text Data and Statistical Functions in Pandas

Indexing and Selecting Data


• In Pandas, indexing means selecting specific rows and columns of data from a DataFrame.
• Indexing can select all the rows and some of the columns, some of the rows and all the columns, or some of the rows and some of the columns.
• Another term for indexing is subset selection.
• Pandas supports two main types of multi-axes indexing:

.loc[ ]: Label based

.iloc[ ]: Integer based

• An older indexer, .ix[ ] (both label and integer based), was deprecated in pandas 0.20 and removed in pandas 1.0.
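
A minimal sketch of label-based and integer-based selection follows. It uses a small, made-up DataFrame rather than a CSV file, so the names and values are assumptions for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {"Team": ["Red", "Blue", "Red"], "Age": [25, 31, 28]},
    index=["Asha", "Ben", "Chen"],
)

# Label based: select one row by its index label.
row_by_label = df.loc["Ben"]

# Integer based: select the first row by its position.
row_by_position = df.iloc[0]

# Some of the rows and some of the columns, by label.
subset = df.loc[["Asha", "Chen"], ["Age"]]

# The same idea by position: first two rows, first column.
block = df.iloc[0:2, 0:1]

print(row_by_label)
print(row_by_position)
print(subset)
print(block)
```

Label slices with .loc include the end label, while position slices with .iloc exclude the end position, as ordinary Python slices do.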



Indexing a DataFrame using the indexing operator [ ]:

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")

# retrieving a column by indexing operator
first = data["Age"]

print(first)

Indexing a DataFrame using .loc[ ]:

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")

# retrieving rows by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]

print(first, "\n\n\n", second)

Indexing a DataFrame using .iloc[ ]:

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")

# retrieving the fourth row (integer position 3) by iloc method
fourth_row = data.iloc[3]

print(fourth_row)

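Indexing using DataFrame.ix[ ]:

.ix[ ] was deprecated in pandas 0.20 and removed in pandas 1.0, so data.ix["Avery Bradley"] raises an AttributeError on current versions. Use .loc[ ] for labels or .iloc[ ] for integer positions instead.

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")

# retrieving row by label with loc (replaces the removed ix method)
first = data.loc["Avery Bradley"]
print(first)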

Lab 4: Perform Selection and Indexing operations using loc and iloc

Summary
• We have completed this section and now understand:
• What Pandas is
• Applications of Pandas
• Structures of Pandas: Series, Index and DataFrame
• How to import the Pandas library
• How to import files using Pandas
• Indexing in Pandas
• Sort methods in Pandas
• We have performed different types of data analysis.
• We will use this knowledge in machine learning, data analysis, visualization and mathematical operations.

Quiz
1. Pandas stands for _________

a) Panel Data Analysis
b) Panel Data Analyst
c) Panel Data
d) Panel Dashboard

Answer: c) Panel Data

2. _________ is an important library used for analyzing data.

a) Math
b) Random
c) Pandas
d) None of the above

Answer: c) Pandas

3. _________ is used when data is in tabular format.

a) NumPy
b) Pandas
c) Matplotlib
d) All of the above

Answer: b) Pandas

4. Which of the following commands is used to install Pandas?

a) pip install pandas
b) install pandas
c) pip pandas
d) None of the above

Answer: a) pip install pandas

5. A _________ is a one-dimensional array.

a) DataFrame
b) Series
c) Both of the above
d) None of the above

Answer: b) Series


Thank you...!
