Pandas 1

Pandas is a Python library for data manipulation and analysis, created by Wes McKinney in 2008. It provides tools for cleaning, exploring, and analyzing data sets, including functionalities for handling missing values, duplicates, and data correlation. Key components of Pandas include Series and DataFrames, which facilitate the organization and analysis of data in a tabular format.

Uploaded by

jaiswalarunima8

CAREERERA

PANDAS
WHAT IS PANDAS?
• Pandas is a Python library used for working with data sets.
• It has functions for analyzing, cleaning, exploring, and
manipulating data.
• The name "Pandas" refers to both "Panel Data" and
"Python Data Analysis". The library was created by Wes
McKinney in 2008.
WHY USE PANDAS?
• Pandas allows us to analyze big data and draw conclusions
based on statistical theories.
• Pandas can clean messy data sets and make them readable
and relevant.
• Relevant data is very important in data science.
WHAT CAN PANDAS DO?
• Pandas gives you answers about the data, such as:
• Is there a correlation between two or more columns?
• What is the average value?
• What are the maximum and minimum values?
• Pandas can also delete rows that are not relevant or that contain
wrong values, such as empty or NULL values. This is called cleaning
the data.
HOW TO IMPORT PANDAS?

• There are two ways to import pandas:

import pandas: imports the pandas module. Its contents are then
accessed through the module name (or an alias such as pd).
from pandas import *: imports all public classes, functions, and
variables from the pandas package into the current namespace.
Here * means "all".
HOW TO USE import pandas?
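A minimal sketch of the first form. The alias pd is a common convention rather than something the slides prescribe, and the sample list is made up for illustration.

```python
import pandas as pd

# With "import pandas", every name is reached through the module (aliased here as pd).
values = pd.Series([1, 7, 2])  # illustrative data
print(values)
```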
HOW TO USE from pandas import*
WHAT IS SERIES?
• A Pandas Series is like a column in a table.
• It is a one-dimensional array holding data of any type.
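As a small illustration, a Series can be built from a plain Python list. The values below are arbitrary sample data.

```python
import pandas as pd

a = [1, 7, 2]              # arbitrary sample values
my_series = pd.Series(a)   # one-dimensional, like a single table column
print(my_series)
```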
WHAT ARE LABELS?
• If nothing else is specified, the values are labeled with their index
number. The first value has index 0, the second value has index 1, and so on.
• This label can be used to access a specified value.
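A short sketch of the default labels, assuming the same arbitrary sample values as before.

```python
import pandas as pd

my_series = pd.Series([1, 7, 2])   # no index given, so labels default to 0, 1, 2
print(my_series.index.tolist())
first = my_series[0]               # access the value labeled 0
print(first)
```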
HOW TO CREATE LABELS?
• With the index argument, you can name your own labels.
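A minimal sketch of custom labels. The label names "x", "y", and "z" are chosen only for illustration.

```python
import pandas as pd

# The index argument supplies one label per value.
my_series = pd.Series([1, 7, 2], index=["x", "y", "z"])
print(my_series)
```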
HOW TO ACCESS VALUES BY LABEL?

• When you have created labels, you can access an item by referring
to the label.
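For example, with the illustrative labels "x", "y", and "z", one item can be read back by its label:

```python
import pandas as pd

my_series = pd.Series([1, 7, 2], index=["x", "y", "z"])
item = my_series["y"]   # look the value up by its label
print(item)
```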
WHAT ARE KEY/VALUE OBJECTS AS SERIES?

• We can also use a key/value object, like a dictionary, when creating a
Series.
• Note: The keys of the dictionary become the labels.
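A small sketch with a made-up dictionary of calorie counts per day. The key names and numbers are assumptions for illustration.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}  # illustrative data
my_series = pd.Series(calories)   # dictionary keys become the labels
print(my_series)
```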
HOW TO INCLUDE SPECIFIC ITEMS IN
SERIES?

• To select only some of the items in the dictionary, use the index
argument and specify only the items you want to include in the
Series.
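Continuing with the same made-up dictionary, the index argument can restrict the Series to selected keys:

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}  # illustrative data
# Only the keys listed in index are included in the Series.
my_series = pd.Series(calories, index=["day1", "day2"])
print(my_series)
```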
WHAT IS A DATAFRAME?
• Data sets in Pandas are usually two-dimensional tables, called
DataFrames.
• A Series is like a column; a DataFrame is the whole table.
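A minimal sketch of a DataFrame built from two made-up columns. The column names and values are assumptions for illustration.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],   # illustrative values
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)   # each dictionary key becomes a column
print(df)
```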
HOW TO LOCATE A ROW?
• A DataFrame is like a table with rows and columns.
• Pandas uses the loc attribute to return one or more specified row(s).
• NOTE: loc with a single label returns a Pandas Series. When a list of
labels is passed in [], the result is a Pandas DataFrame.
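A short sketch of both forms of loc on a made-up DataFrame. A single label gives back a Series, while a list of labels gives back a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

one_row = df.loc[0]        # single label: the row comes back as a Series
two_rows = df.loc[[0, 1]]  # list of labels: the rows come back as a DataFrame
print(one_row)
print(two_rows)
```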
HOW TO NAME INDEXES?
• With the index argument, you can name your own indexes.
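For example, with illustrative row names "day1" to "day3":

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
# The index argument names the rows instead of numbering them.
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df)
```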
HOW TO LOCATE NAMED INDEXES?

• Use the named index in the loc attribute to return the specified
row(s).
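A minimal sketch of locating a row by its named index, using the same made-up data:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

row = df.loc["day2"]   # the row whose index is named "day2"
print(row)
```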
HOW TO USE READ CSV IN PANDAS?

• A simple way to store big data sets is to use CSV files
(comma-separated values files).
• CSV files contain plain text and are a well-known format that can be
read by almost any tool, including Pandas.
• Tip: use to_string() to print the entire DataFrame. By default, when
you print a DataFrame with more rows than the display.max_rows
option (60 by default), you will only get the first 5 rows and the
last 5 rows.
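A small sketch of read_csv() and to_string(). To keep it self-contained, the CSV text is held in memory with io.StringIO. In practice you would pass a file name instead. The column names and values are made up.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (illustrative content).
csv_text = "Duration,Pulse,Calories\n60,110,409.1\n60,117,479.0\n45,109,282.4\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.to_string())   # prints every row, regardless of the DataFrame's size
```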
HOW TO LOAD FILES INTO A
DataFrame?
• If your data sets are stored in a file, Pandas can load them
into a DataFrame.
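As a sketch of loading from an actual file, the example below first writes a tiny CSV file to a temporary folder and then loads it. The file name data.csv and its contents are assumptions for illustration.

```python
import os
import tempfile
import pandas as pd

# Create a small sample file so the example can run anywhere.
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w") as f:
    f.write("Duration,Pulse\n60,110\n45,109\n")

df = pd.read_csv(path)   # load the file into a DataFrame
print(df)
```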
HOW TO ANALYZE A DataFrame?
• By default, when you print a large DataFrame, you will only get the
first 5 rows and the last 5 rows.
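A minimal sketch of this truncation, assuming the default display options and a made-up DataFrame of 100 rows:

```python
import pandas as pd

df = pd.DataFrame({"value": range(100)})   # illustrative data, 100 rows

short_view = repr(df)        # what print(df) shows: a shortened view
full_view = df.to_string()   # every row
print(short_view)
print(pd.options.display.max_rows)   # row limit above which printing is shortened
```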
HOW TO VIEW & ANALYZE THE DATA?

• One of the most used methods for getting a quick overview of the
DataFrame is the head() method.
• The head() method returns the headers and a specified number of
rows, starting from the top.
• Note: if the number of rows is not specified, the head() method will
return the top 5 rows.
HOW TO VIEW & ANALYZE THE HEAD OF
DATA?
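A short sketch of head() on a made-up DataFrame with 20 rows:

```python
import pandas as pd

df = pd.DataFrame({"Duration": range(10, 30)})   # illustrative data, 20 rows

print(df.head(10))   # headers plus the first 10 rows
print(df.head())     # no number given: the first 5 rows
```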
HOW TO VIEW & ANALYZE THE TAIL OF DATA?

• There is also a tail() method for viewing the last rows of the
DataFrame.
• The tail() method returns the headers and a specified number of
rows, starting from the bottom.
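The same made-up DataFrame can illustrate tail():

```python
import pandas as pd

df = pd.DataFrame({"Duration": range(10, 30)})   # illustrative data, 20 rows

print(df.tail(3))   # headers plus the last 3 rows
print(df.tail())    # no number given: the last 5 rows
```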
HOW TO CHECK INFO ABOUT THE
DATA?

• The DataFrame object has a method called info() that gives you
more information about the data set.
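A minimal sketch of info() on a made-up DataFrame with one empty cell. The printed summary lists each column, its count of non-null values, and its data type.

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Calories": [409.1, None, 195.1],   # one empty cell, for illustration
})

df.info()   # prints the summary; it does not return the text
non_null_calories = df["Calories"].count()   # the same non-null count, as a number
```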
HOW TO PERFORM DATA CLEANING?

• Data cleaning means fixing bad data in your data set.

• Bad data could be:
• Empty cells
• Data in wrong format
• Wrong data
• Duplicates
WHAT IS PANDAS - CLEANING EMPTY
CELLS?
• Empty cells can potentially give you a wrong result when you analyze
data.
• Remove Rows
• One way to deal with empty cells is to remove rows that contain
empty cells.
• This is usually OK, since data sets can be very big, and removing a few
rows will not have a big impact on the result.
• Note: By default, the dropna() method returns a new DataFrame, and
will not change the original.
• If you want to change the original DataFrame, use the inplace=True
argument.
HOW TO PERFORM PANDAS - CLEANING EMPTY
CELLS?
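A small sketch of both behaviors of dropna(), using a made-up DataFrame with one empty cell:

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Calories": [409.1, None, 195.1],   # illustrative data with one empty cell
})

new_df = df.dropna()       # returns a new DataFrame; df is unchanged
rows_before = len(df)

df.dropna(inplace=True)    # removes the rows from df itself
print(new_df)
print(df)
```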
HOW TO REPLACE EMPTY VALUES?

• Another way of dealing with empty cells is to insert a new value
instead.
• This way you do not have to delete entire rows just because of some
empty cells.
• The fillna() method allows us to replace empty cells with a value.
WHAT ARE THE STEPS TO REPLACE EMPTY
VALUES?
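A minimal sketch of fillna(). The replacement value 130 and the sample data are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Calories": [409.1, None, 195.1],   # one empty cell
})

new_df = df.fillna(130)   # every empty cell in the DataFrame becomes 130
print(new_df)
```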
HOW TO REPLACE EMPTY VALUES ONLY FOR SPECIFIED COLUMNS?

• To only replace empty values for one column, specify the column
name for the DataFrame.
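One way to sketch this is to pass fillna() a dictionary that maps a column name to its replacement value. The data and the value 130 are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Pulse": [110, None, 109],
    "Calories": [409.1, None, 195.1],   # illustrative data
})

# Only the "Calories" column is filled; "Pulse" keeps its empty cell.
df = df.fillna({"Calories": 130})
print(df)
```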
HOW TO FILL EMPTY VALUES USING MEAN,
MEDIAN, or MODE?

• A common way to replace empty cells is to calculate the mean,
median, or mode value of the column.
• Pandas uses the mean(), median(), and mode() methods to calculate
the respective values for a specified column:
• Mean = the average value (the sum of all values divided by number of
values).
• Median = the value in the middle, after you have sorted all values
ascending.
• Mode = the value that appears most frequently.
HOW TO FILL EMPTY VALUES USING MEAN ?
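A minimal sketch with a made-up "Calories" column. The mean is computed from the non-empty cells and then used as the fill value.

```python
import pandas as pd

df = pd.DataFrame({"Calories": [400.0, None, 200.0]})   # illustrative data

x = df["Calories"].mean()                  # empty cells are skipped: (400 + 200) / 2
df["Calories"] = df["Calories"].fillna(x)
print(df)
```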
HOW TO FILL EMPTY VALUES USING MEDIAN ?
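The same idea with the median, again on made-up values:

```python
import pandas as pd

df = pd.DataFrame({"Calories": [100.0, None, 300.0, 200.0]})   # illustrative data

x = df["Calories"].median()                # middle of the sorted values 100, 200, 300
df["Calories"] = df["Calories"].fillna(x)
print(df)
```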
HOW TO FILL EMPTY VALUES USING MODE ?
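A sketch with the mode. Because mode() returns a Series (there can be several most-frequent values), the first entry is taken here. The data is made up.

```python
import pandas as pd

df = pd.DataFrame({"Calories": [300.0, None, 300.0, 200.0]})   # illustrative data

x = df["Calories"].mode()[0]               # most frequent value: 300.0
df["Calories"] = df["Calories"].fillna(x)
print(df)
```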
HOW TO PERFORM PANDAS-CLEANING DATA OF WRONG
FORMAT?

• Cells with data of wrong format can make it difficult, or even
impossible, to analyze data.
• To fix it, you have two options: remove the rows, or convert all
cells in the columns into the same format.
WHAT ARE THE STEPS TO CONVERT INTO A
CORRECT FORMAT?

# Convert To DATE
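A minimal sketch of converting a text column to dates with to_datetime(). The column name "Date" and the sample rows are assumptions. The empty cell becomes NaT after conversion.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2020/12/01", "2020/12/02", None],   # illustrative data, one empty cell
    "Duration": [60, 45, 30],
})

df["Date"] = pd.to_datetime(df["Date"])   # text becomes datetime values
print(df)
```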
HOW TO REMOVE UNWANTED ROWS?
• Converting a column to dates can leave an empty cell as a NaT
(Not a Time) value, which can be handled as a NULL value, and we can
remove the row by using the dropna() method.
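A short sketch of removing rows whose date is missing, using the subset argument of dropna() so that only the "Date" column is checked. The data is made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2020/12/01", "2020/12/02", None]),   # last value is NaT
    "Duration": [60, 45, 30],
})

df = df.dropna(subset=["Date"])   # drop only rows where "Date" is empty
print(df)
```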
HOW TO FIX WRONG DATA?

• "Wrong data" does not have to be "empty cells" or "wrong format", it
can just be wrong, like if someone registered "199" instead of "1.99".
• Sometimes you can spot wrong data by looking at the data set,
because you have an expectation of what it should be.
• If you take a look at our data set, you can see that in row 7, the
duration is 450, but for all the other rows the duration is between 30
and 60.
• It doesn't have to be wrong, but taking into consideration that this is
the data set of someone's workout sessions, we can conclude that
this person did not work out for 450 minutes.
WHAT IS REPLACING VALUES?

• One way to fix wrong values is to replace them with something else.

• For small data sets you might be able to replace the wrong data
one by one, but not for big data sets.
• To replace wrong data for larger data sets you can create some
rules, e.g. set some boundaries for legal values, and replace any
values that are outside of the boundaries.
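For a single known bad cell, a minimal sketch is a direct assignment with loc. The data below is made up to mirror the case described earlier (row 7 holds 450), and the replacement value 45 is an assumption.

```python
import pandas as pd

# Illustrative data: row 7 has an implausible duration.
df = pd.DataFrame({"Duration": [60, 60, 60, 45, 45, 60, 60, 450]})

df.loc[7, "Duration"] = 45   # replace one wrong value by row label and column name
print(df)
```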
HOW TO PERFORM REPLACE VALUES?

• Loop through all values in the "Duration" column.

• If the value is higher than 120, set it to 120:
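The two steps above can be sketched as a loop over the row labels. The sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({"Duration": [60, 450, 45]})   # illustrative data

for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.loc[x, "Duration"] = 120   # cap values above the boundary
print(df)
```

For large data sets, df["Duration"].clip(upper=120) applies the same cap without an explicit loop.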
HOW TO PERFORM REMOVING ROWS?
• Another way of handling wrong data is to remove the rows that contain
wrong data.
• This way you do not have to find out what to replace them with, and
there is a good chance you do not need them to do your analyses.
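One way to sketch this is to keep only the rows that satisfy the rule. The boundary of 120 and the sample data are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"Duration": [60, 450, 45]})   # illustrative data

# Keep the rows within the boundary; rows above 120 are removed.
df = df[df["Duration"] <= 120]
print(df)
```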
WHAT ARE DUPLICATES?

• Duplicate rows are rows that have been registered more than one
time.
• By taking a look at our test data set, we can assume that row 11 and
12 are duplicates.
• To discover duplicates, we can use the duplicated() method.
• The duplicated() method returns a Boolean value for each row: True if
the row duplicates an earlier row, otherwise False.
HOW TO DISCOVER DUPLICATES?
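A minimal sketch of duplicated() on made-up data in which the last row repeats the second one:

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 45, 45],
    "Pulse": [110, 109, 109],   # illustrative data; last row repeats the second
})

flags = df.duplicated()   # True for rows that repeat an earlier row
print(flags)
```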
HOW TO REMOVE DUPLICATES?
• To remove duplicates, use the drop_duplicates() method.
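A short sketch of drop_duplicates() on the same kind of made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 45, 45],
    "Pulse": [110, 109, 109],   # illustrative data; last row repeats the second
})

df = df.drop_duplicates()   # keeps the first occurrence of each row
print(df)
```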
HOW TO CHECK PANDAS - DATA
CORRELATION?
• A great aspect of the Pandas module is the corr() method.
• The corr() method calculates the relationship between each pair of
columns in your data set.
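A minimal sketch of corr() on made-up numeric data. This is not the workout data set the slides refer to. Here "Calories" is an exact multiple of "Duration", so that pair correlates perfectly.

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [30, 45, 60, 90],
    "Pulse": [110, 100, 120, 105],
    "Calories": [200, 300, 400, 600],   # illustrative: grows in step with Duration
})

corr_table = df.corr()   # one row and one column per numeric column
print(corr_table)
```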
WHAT IS PANDAS - DATA
CORRELATION?
• Older versions of Pandas ignore "not numeric" columns in corr(). Since
Pandas 2.0 you need to pass numeric_only=True to get that behavior,
otherwise non-numeric columns raise an error.
• Result explained: the result of the corr() method is a table of
numbers that represent how strong the relationship is between two
columns.
• The number varies from -1 to 1.
• 1 means that there is a 1 to 1 relationship (a perfect correlation), and
for this data set, each time a value went up in the first column, the
other one went up as well.
• 0.9 is also a good relationship, and if you increase one value, the
other will probably increase as well.
WHAT ARE THE TYPES OF PANDAS - DATA CORRELATION?

• -0.9 would be just as good relationship as 0.9, but if you increase one
value, the other will probably go down.
• 0.2 means NOT a good relationship: if one value goes up, it does
not mean that the other will.
• What is a good correlation? It depends on the use, but I think it is safe
to say you have to have at least 0.6 (or -0.6) to call it a good
correlation.
• Perfect Correlation:
We can see that "Duration" and "Duration" got the number 1.000000,
which makes sense, each column always has a perfect relationship
with itself.
WHAT ARE THE TYPES OF PANDAS - DATA CORRELATION?

• Good Correlation:
"Duration" and "Calories" got a 0.922721 correlation, which is a very
good correlation, and we can predict that the longer you work out,
the more calories you burn, and the other way around: if you burned
a lot of calories, you probably had a long workout.
• Bad Correlation:
"Duration" and "Maxpulse" got a 0.009403 correlation, which is a
very bad correlation, meaning that we cannot predict the max pulse
by just looking at the duration of the workout, and vice versa.
THANK YOU !!!
