Pandas

Pandas is a Python library designed for data manipulation and analysis, providing tools for cleaning, exploring, and analyzing datasets. It includes data structures like Series and DataFrames, which facilitate handling one-dimensional and two-dimensional data, respectively. Key functionalities include data cleaning, statistical analysis, and methods for sorting, ranking, and selecting data.

Uploaded by

iamjasper2024

PANDAS

Pandas: Exploring Data using Series, Exploring Data using DataFrames, Index objects,
Reindex, Drop Entry, Selecting Entries, Data Alignment, Rank and Sort

21CSS303T/DS
PANDAS

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" has a reference to both "Panel Data" and "Python Data Analysis".

Pandas allows us to analyze big data and make conclusions based on statistical theories.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

What Can Pandas Do?

Pandas gives you answers about the data. Like:

Is there a correlation between two or more columns?

What is average value?
Max value?
Min value?

Pandas can also delete rows that are not relevant or that contain wrong
values, such as empty or NULL values. This is called cleaning the data.
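
The questions above can be sketched in a few lines. This is only an illustration: the column names "duration" and "calories" and their values are made up for the example and do not come from the slides.

```python
import pandas as pd

# Hypothetical data set, invented for this illustration
df = pd.DataFrame({"duration": [30, 45, 60, 90],
                   "calories": [200, 310, 400, 610]})

correlation = df["duration"].corr(df["calories"])  # correlation between two columns
average = df["calories"].mean()                    # average value
maximum = df["calories"].max()                     # max value
minimum = df["calories"].min()                     # min value

print(correlation, average, maximum, minimum)
```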

Pandas Codebase?

import pandas

# Each dictionary key becomes a column name
mydataset = {'cars': ["BMW", "Volvo", "Ford"], 'passings': [3, 7, 2]}

myvar = pandas.DataFrame(mydataset)
print(myvar)

Pandas as pd

Pandas is usually imported under the pd alias.

alias: In Python, an alias is an alternate name for referring to the same thing.
Create an alias with the "as" keyword while importing:

Syntax : import pandas as pd

Now the Pandas package can be referred to as pd instead of pandas.

For Checking Pandas Version

The version string is stored under the __version__ attribute.
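
A minimal sketch of the version check; the variable name `version` is only for illustration, and the printed value depends on the installed release.

```python
import pandas as pd

# __version__ has two underscores on each side
version = pd.__version__
print(version)
```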

Pandas Series

What is a Series?
A Pandas Series is like a column in a table.
It is a one-dimensional array holding data of any type.

Example : Create a simple Pandas Series from a list - int, float, string
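
The original slide's code is not preserved, so the following is a sketch with made-up values showing one Series per element type.

```python
import pandas as pd

# Illustrative values only
int_series = pd.Series([1, 7, 2])
float_series = pd.Series([1.5, 7.0, 2.25])
str_series = pd.Series(["a", "b", "c"])

print(int_series)
print(float_series)
print(str_series)
```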

Based on the values present in the series, the datatype of the series is decided.
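
As a rough illustration with made-up values, the dtype attribute shows which type pandas inferred. Note that a single float among integers makes the whole Series float.

```python
import pandas as pd

int_series = pd.Series([1, 7, 2])
mixed_series = pd.Series([1, 7.5, 2])     # one float promotes the whole Series
text_series = pd.Series(["a", "b", "c"])

print(int_series.dtype)    # int64
print(mixed_series.dtype)  # float64
print(text_series.dtype)   # object on older pandas; newer releases may report a string dtype
```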

Labels

If nothing else is specified, the values are labeled with their index number.
First value has index 0, second value has index 1 etc.
This label can be used to access a specified value.

Example : Return the second value of the Series:
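
A short sketch, assuming a Series with the default integer labels (the values are illustrative):

```python
import pandas as pd

myvar = pd.Series([1, 7, 2])

# Label 1 refers to the second value because default labels start at 0
second = myvar[1]
print(second)  # 7
```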

Create your own labels

Example : Return the value of “y”:
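
A sketch of custom labels, assuming the labels "x", "y", "z" are passed through the index argument (the values are illustrative):

```python
import pandas as pd

# The index argument supplies our own labels instead of 0, 1, 2
myvar = pd.Series([1, 7, 2], index=["x", "y", "z"])

y_value = myvar["y"]
print(y_value)  # 7
```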

Pandas DataFrames

A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array, or
a table with rows and columns.
In the Python Pandas module, DataFrame is a very basic and important type.

To create a DataFrame from different sources of data or other Python
datatypes, we can use the DataFrame() constructor.

Syntax of DataFrame() class :

DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

Example : Create an Empty DataFrame

To create an empty DataFrame, pass no arguments to pandas.DataFrame() class.

In this example, we create an empty DataFrame and print it to the console
output.
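
A minimal sketch of the empty case:

```python
import pandas as pd

# No arguments: the result has no rows and no columns
df = pd.DataFrame()
print(df)
```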

Example : Create a simple Pandas DataFrame
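
The slide's own code is not preserved; a sketch with assumed column names and values could look like this:

```python
import pandas as pd

# Hypothetical data: each key becomes a column
data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}

df = pd.DataFrame(data)
print(df)
```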

Example : Create a simple Pandas DataFrame with Labels (Index)
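
A sketch with assumed data, where the row labels "day1" to "day3" are invented for the illustration:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}

# index supplies one label per row
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df)

# A row can then be looked up by its label
day2 = df.loc["day2"]
print(day2)
```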

Create Pandas DataFrame from List of Lists

To create a Pandas DataFrame from a list of lists, pass the list of lists as the data
argument to "pandas.DataFrame()".
Each inner list inside the outer list is transformed into a row in the resulting DataFrame.

Example : Create DataFrame from List of Lists
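
A sketch with placeholder values; without further arguments, rows and columns get the default labels 0, 1, 2.

```python
import pandas as pd

# Each inner list becomes one row
data = [["a1", "b1", "c1"],
        ["a2", "b2", "c2"],
        ["a3", "b3", "c3"]]

df = pd.DataFrame(data)
print(df)
```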

Example : Create DataFrame from List of Lists with Column Names & Index
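
The same placeholder data with explicit labels; the column names and row labels are assumptions for this sketch.

```python
import pandas as pd

data = [["a1", "b1", "c1"],
        ["a2", "b2", "c2"],
        ["a3", "b3", "c3"]]

# columns names the columns, index names the rows
df = pd.DataFrame(data,
                  columns=["A", "B", "C"],
                  index=["row1", "row2", "row3"])
print(df)
```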

Example : Create DataFrame from List of Lists with Different List Lengths
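
A sketch with made-up values: when the inner lists differ in length, the shorter rows are padded with missing values.

```python
import pandas as pd

# Rows of length 3, 2 and 1
data = [[1, 2, 3],
        [4, 5],
        [6]]

df = pd.DataFrame(data)
print(df)

# Count the cells that were filled in as missing
missing_cells = int(df.isna().sum().sum())
print(missing_cells)
```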

Create Pandas DataFrame from Python Dictionary

You can create a DataFrame from a dictionary by passing the dictionary as the data
argument to DataFrame().

Example : Create DataFrame from Dictionary
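
A sketch with an assumed dictionary; the keys become column names and the lists become column values.

```python
import pandas as pd

# Hypothetical dictionary for illustration
mydictionary = {"names": ["Somu", "Kiku", "Amol"],
                "physics": [68, 74, 77],
                "chemistry": [84, 56, 73]}

df = pd.DataFrame(mydictionary)
print(df)
```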

Pandas Read CSV

A simple way to store big data sets is to use CSV files (comma-separated values files).
CSV files contain plain text and are a well-known format that can be read by
almost any program.
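
A sketch of reading CSV data with pd.read_csv. To keep it self-contained, the CSV text is held in a string and wrapped in io.StringIO; with a real file you would pass its path instead. The column names and values are invented.

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file
csv_text = "Duration,Pulse,Calories\n60,110,409.1\n45,117,282.4\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```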

to_string() Method :

to_string() is used to print the entire DataFrame.
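
A sketch on a made-up 100-row DataFrame: printing it directly may abbreviate the middle rows, while to_string() renders every row.

```python
import pandas as pd

df = pd.DataFrame({"n": range(100)})

# One header line plus one line per row
text = df.to_string()
print(text)
```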

Null Values :

shape Attribute :
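
The slide gives no text here, so this is only a sketch: shape is an attribute (no parentheses) that holds the number of rows and columns as a tuple. The data is invented.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

rows, cols = df.shape
print(df.shape)  # (3, 2)
```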

Viewing Data :

To see how the data looks, we can use the head() method, which shows just
the first five rows by default. If we pass a number as an argument to this method,
that many rows are listed, counted from the top.

df.head() Method :
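
A sketch on a made-up 20-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"n": range(20)})

first_five = df.head()    # default: first 5 rows
first_ten = df.head(10)   # explicit count

print(first_five)
```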

tail() Method :

The tail() method returns the last five rows by default.
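
A sketch on the same kind of made-up 20-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"n": range(20)})

last_five = df.tail()     # default: last 5 rows
last_three = df.tail(3)   # explicit count

print(last_five)
```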

Names of the columns or the names of the indexes :

If we want to know the names of the columns or the names of the indexes,
we can use the DataFrame attributes columns and index respectively.
The names of the columns or indexes can be changed by assigning a new list
of the same length to these attributes.
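
A sketch with invented names, showing both reading and reassigning the two attributes:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

print(df.columns)  # column names
print(df.index)    # row labels

# Assign new lists of the same length to rename
df.columns = ["x", "y"]
df.index = ["r1", "r2"]
print(df)
```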

The values of any DataFrame can be retrieved as a NumPy array through its
values attribute.
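
A sketch with made-up data; values is an attribute, so it is read without parentheses.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

arr = df.values  # 2-D NumPy array, one row per DataFrame row
print(arr)
```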

Info About the Data:

The DataFrame object has a method called info() that gives you more
information about the data set.
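
A sketch with invented data. Calling df.info() on its own prints the summary; here the output is also captured in a buffer (the buf argument) so it can be inspected as text.

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", "y", "z"]})

# Prints index range, column names, non-null counts and dtypes
df.info()

# Same summary, captured as a string
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()
```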

describe() Method :

If we just want quick statistical information on all the numeric columns in a
DataFrame, we can use the describe() method.
The result shows the count, the mean, the standard deviation, the minimum and
maximum, and the percentiles (by default the 25th, 50th, and 75th) for all the values
in each column or Series.
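
A sketch on invented numeric data:

```python
import pandas as pd

df = pd.DataFrame({"duration": [30, 45, 60, 90],
                   "calories": [200, 310, 400, 610]})

stats = df.describe()
print(stats)

# Individual statistics can be picked out by row label
mean_calories = stats.loc["mean", "calories"]
```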

Selecting Data

Reindexing

An important method on pandas objects is reindex, which means to create a
new object with the data conformed to a new index.

Calling reindex on this Series rearranges the data according to the new index,
introducing missing values if any index values were not already present:
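
The Series the slide refers to is not preserved, so this sketch assumes one with labels "d", "b", "a", "c". The label "e" is not in the original index and therefore becomes NaN.

```python
import pandas as pd

# Assumed starting Series
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])

# Rearranged to the new index; "e" has no data
obj2 = obj.reindex(["a", "b", "c", "d", "e"])
print(obj2)
```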

For ordered data like time series, it may be desirable to do some interpolation or
filling of values when reindexing. The method option allows us to do this, using a
method such as ffill, which forward-fills the values:
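
A sketch with assumed values: the Series has data only at labels 0, 2 and 4, and reindexing over 0 to 5 with method="ffill" carries each value forward into the gaps.

```python
import pandas as pd

obj3 = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])

# Each new label takes the last known value before it
filled = obj3.reindex(range(6), method="ffill")
print(filled)
```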

Dropping Entries from an Axis

Dropping one or more entries from an axis is easy if you already have an index
array or list without those entries.

The drop method returns a new object with the indicated value or values deleted
from an axis:
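
A sketch with made-up data, covering a Series and a DataFrame column:

```python
import pandas as pd

obj = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d", "e"])

without_c = obj.drop("c")          # drop one label
without_dc = obj.drop(["d", "c"])  # drop several labels
print(without_c)

# For a DataFrame, axis=1 (or axis="columns") drops columns
frame = pd.DataFrame({"one": [1, 2], "two": [3, 4]})
frame_without_two = frame.drop("two", axis=1)
print(frame_without_two)
```

The original obj and frame are left unchanged, since drop returns new objects.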

Sorting and Ranking

Sorting a dataset by some criterion is another important built-in operation. To
sort lexicographically by row or column index, use the sort_index method,
which returns a new, sorted object:
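
A sketch with assumed data, sorting a Series by its labels and a DataFrame by its column labels:

```python
import pandas as pd

obj = pd.Series([0, 1, 2, 3], index=["d", "a", "b", "c"])
by_label = obj.sort_index()
print(by_label)

frame = pd.DataFrame([[0, 1, 2], [3, 4, 5]],
                     index=["three", "one"],
                     columns=["d", "a", "b"])
by_column_label = frame.sort_index(axis=1)  # axis=1 sorts the column labels
print(by_column_label)
```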

To sort a Series by its values, use its sort_values method:
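
A sketch with made-up values; each value keeps its original label after sorting.

```python
import pandas as pd

obj = pd.Series([4, 7, -3, 2])

ascending = obj.sort_values()
descending = obj.sort_values(ascending=False)
print(ascending)
```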

Ranking assigns ranks from one through the number of valid data points in an array.
The rank methods for Series and DataFrame are the place to look; by default rank
breaks ties by assigning each group the mean rank:
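
The slide's data is not preserved; this sketch assumes a Series with two tied pairs. The two 7s would occupy ranks 6 and 7, so each receives the mean rank 6.5.

```python
import pandas as pd

# Assumed example values
obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

ranks = obj.rank()  # default method="average"
print(ranks)
```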

Ranks can also be assigned according to the order in which they’re observed in
the data:

Here, instead of using the average rank 6.5 for the entries 0 and 2, they
have been set to 6 and 7 because label 0 precedes label 2 in the data.
You can rank in descending order, too:
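
A sketch of both variants, assuming a Series whose values are consistent with the ranks described above (the slide's own data is not preserved):

```python
import pandas as pd

# Assumed example values
obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# method="first" breaks ties by order of appearance
first_ranks = obj.rank(method="first")
print(first_ranks)

# ascending=False gives rank 1 to the largest value
descending_ranks = obj.rank(ascending=False)
print(descending_ranks)
```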

DataFrame can compute ranks over the rows or the columns:
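
A sketch with assumed data. By default rank works down each column; axis="columns" ranks the values within each row instead.

```python
import pandas as pd

frame = pd.DataFrame({"b": [4.3, 7.0, -3.0, 2.0],
                      "a": [0, 1, 0, 1],
                      "c": [-2.0, 5.0, 8.0, -2.5]})

within_columns = frame.rank()             # rank down each column
within_rows = frame.rank(axis="columns")  # rank across each row
print(within_rows)
```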

Slice Operator :

If we want to select a subset of rows from a DataFrame, we can do so by indicating
a range of rows separated by : inside the square brackets.
This is commonly known as a slice of rows.
Next instruction returns the slice of rows from the 9th to the 13th position.

Note : the slice does not use the index labels as references, but the positions.
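
The slide's instruction is not preserved. As a sketch, the made-up DataFrame below has text labels "r0" to "r19" so that labels and positions are easy to tell apart. The slice 9:14 is an assumption: it selects zero-based positions 9 through 13, and the end position is excluded.

```python
import pandas as pd

df = pd.DataFrame({"value": range(20)},
                  index=[f"r{i}" for i in range(20)])

# Integer slices inside [] select rows by position, not by label
subset = df[9:14]
print(subset)
```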

If we want to select a subset of columns and rows using the labels as our references
instead of the positions, we can use loc indexing.
The next instruction will return all the rows between the indexes specified in the slice
before the comma, and the columns specified as a list after the comma.
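
A sketch with invented labels. Unlike a positional slice, a label slice with loc includes both end labels.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                   "y": [10, 20, 30, 40, 50],
                   "z": [100, 200, 300, 400, 500]},
                  index=["a", "b", "c", "d", "e"])

# Rows "b" through "d" (inclusive), columns "x" and "z"
subset = df.loc["b":"d", ["x", "z"]]
print(subset)
```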

Pandas - Cleaning Data

Data Cleaning :
Data cleaning means fixing bad data in your data set. Bad data could be:
Empty cells
Data in wrong format
Wrong data
Duplicates

isnull() method :

How many null values
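
A sketch with made-up data: isnull() marks each cell as True or False, and summing the result counts the null values per column.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0],
                   "b": [None, None, "x"]})

null_mask = df.isnull()            # True where a cell is empty
nulls_per_column = null_mask.sum() # True counts as 1
total_nulls = int(nulls_per_column.sum())

print(nulls_per_column)
print(total_nulls)
```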

Remove Rows :

One way to deal with empty cells is to remove rows that contain empty cells.

dropna() method :

By default, the dropna() method returns a new DataFrame and does not change the original.
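
A sketch with invented data: only the first row has no empty cell, so it is the only one that survives.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0],
                   "b": [4.0, 5.0, None]})

# Rows containing any empty cell are removed in the returned copy
cleaned = df.dropna()
print(cleaned)
print(len(df))  # the original still has 3 rows
```

Passing inplace=True would modify df itself instead of returning a new DataFrame.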

Replace Empty Values :

Another way of dealing with empty cells is to insert a new value instead.

This way you do not have to delete entire rows just because of some empty cells.

fillna() method :

The fillna() method allows us to replace empty cells with a value:
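
A sketch with made-up data, where the replacement value 130 is chosen arbitrarily for the illustration:

```python
import pandas as pd

df = pd.DataFrame({"Pulse": [110.0, None, 103.0],
                   "Calories": [409.1, 282.4, None]})

# Every empty cell in every column gets the same value
filled = df.fillna(130)
print(filled)
```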

Replace Only For Specified Columns

The example above replaces all empty cells in the whole DataFrame.
To only replace empty values for one column, specify the column name for
the DataFrame:
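
One way to sketch this, assuming invented column names, is to pass fillna a dictionary that maps a column name to its replacement value; columns not named in the dictionary are left alone.

```python
import pandas as pd

df = pd.DataFrame({"Pulse": [110.0, None, 103.0],
                   "Calories": [409.1, 282.4, None]})

# Only the "Calories" column is filled; "Pulse" keeps its empty cell
filled = df.fillna({"Calories": 130})
print(filled)
```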

Discovering Duplicates :

Duplicate rows are rows that have been registered more than one time.
By taking a look at our test data set, we can assume that rows 11 and 12 are duplicates.
To discover duplicates, we can use the duplicated() method.
The duplicated() method returns a Boolean value for each row:
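
The test data set from the slides is not preserved, so this sketch uses a small invented one in which the third row repeats the first.

```python
import pandas as pd

df = pd.DataFrame({"Duration": [60, 45, 60, 60],
                   "Pulse": [110, 117, 110, 103]})

# True marks a row that repeats an earlier row in every column
flags = df.duplicated()
print(flags)

# drop_duplicates() returns a copy without the flagged rows
deduplicated = df.drop_duplicates()
print(deduplicated)
```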
