Pandas

Uploaded by Allinagaram Ajay

1. Introduction
The pandas library for Python is very useful for manipulating numerical and tabular data, and it is
widely used in machine learning for data analysis.

Why Pandas?
• Intrinsic data alignment
• Data operation functions
• Functions for handling missing data
• Data standardization functions
• Data structures that handle the major use cases
Pandas Features
• Powerful data structures
• Fast and efficient data wrangling
• Easy data aggregation and transformation
• Tools for reading and writing data
• Intelligent and automated data alignment
• High-performance merging and joining of data sets
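Automatic data alignment is easiest to see with two Series whose index labels only partly overlap. A minimal sketch with made-up values: arithmetic matches values by label rather than by position, and labels present in only one operand produce NaN.

```python
import pandas as pd

# Two Series whose index labels only partly overlap (made-up values)
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Addition aligns on index labels, not on position.
# 'a' exists only in s1 and 'd' only in s2, so both become NaN.
total = s1 + s2
print(total)
```

The result is indexed by the union of both label sets, which is why no manual matching of rows is needed.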

2. Technical Setup

I). Install Anaconda (https://www.anaconda.com/)

II). Install the pandas package and import it

!pip install pandas    (inside a Jupyter/Colab cell; from a terminal, run pip install pandas)

import pandas as pd

III). Use Jupyter Notebook or Google Colab.

IV). Download a dataset (e.g. Kaggle, ...).

3. Series and DataFrames

Series:
A Series is a one-dimensional array of indexed data: a sequence of values, each with an index label.
Unlike a DataFrame, a Series does not have column names; it has only one overall name. Create one
with the pd.Series() function.
• One-dimensional labeled array.
• Supports multiple data types.
Syntax:
s = pd.Series(data, index=index)

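Accessing a single Series (column) of a DataFrame:

DataFrame['SeriesName']
DataFrame["SeriesName"]
DataFrame.SeriesName - does not work if SeriesName contains spaces, is not a valid Python identifier, or clashes with an existing DataFrame attribute

Accessing multiple Series (returns a DataFrame):

DataFrame[['SeriesName1', 'SeriesName2']]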

#pd.Series([1, 2, 3, 4, 5])

#pd.Series([3000, 3500, 4000], index=['2021 price', '2020 price', '2019 price'], name='Index A')
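The two Series examples above can be run as-is; the sketch below adds label-based access and the overall name so the difference between the default integer index and a custom index is visible.

```python
import pandas as pd

# Series with the default integer index 0..4
s = pd.Series([1, 2, 3, 4, 5])

# Series with custom index labels and one overall name
prices = pd.Series([3000, 3500, 4000],
                   index=['2021 price', '2020 price', '2019 price'],
                   name='Index A')

print(prices['2020 price'])   # access a value by its label -> 3500
print(prices.name)            # the Series' overall name -> Index A
```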

DataFrames:
A two-dimensional data structure, like a two-dimensional array or a table with rows and columns.
Create one with the pd.DataFrame() function.
• Two-dimensional labeled array.
• Supports multiple data types (each column can have its own type).
• Input can be a Series.
• Input can be another DataFrame.
type(DataFrame) returns pandas.core.frame.DataFrame (useful to check that an object is a DataFrame).

#df = pd.DataFrame({'quantity': [10, 12], 'price': [1200, 1400]})

Note: DataFrame entries are not limited to integers; values can also be strings, for example:
#df = pd.DataFrame({'Nepal': ['nepal is', 'beautiful', 'country.'], 'Kathmandu': ['Kathmandu is', 'capital', 'of nepal']})

Index in DataFrame: the list of row labels used in a DataFrame is known as its Index. Pass it with the index parameter:
#df = pd.DataFrame({'Nepal': ['nepal is', 'beautiful', 'country.'], 'Kathmandu': ['Kathmandu is', 'capital', 'of nepal']},
                   index=['A', 'B', 'C'])
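A short sketch tying together DataFrame creation, the single and multiple column access patterns from the Series section, and custom row labels. The data reuses the quantity/price example; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'quantity': [10, 12], 'price': [1200, 1400]})
print(type(df))           # <class 'pandas.core.frame.DataFrame'>

# Single column -> Series
q = df['quantity']
q_attr = df.quantity      # attribute access; only for names that are valid identifiers

# Multiple columns (note the double brackets) -> DataFrame
subset = df[['quantity', 'price']]

# Custom row labels through the index parameter
labeled = pd.DataFrame({'quantity': [10, 12], 'price': [1200, 1400]},
                       index=['A', 'B'])
print(labeled.index.tolist())   # ['A', 'B']
```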

Reading a CSV file into a DataFrame:
df = pd.read_csv("file location")

Common read_csv() parameters:

Parameter | Description | Data type
filepath_or_buffer | File location (path or URL) | str
skiprows | Rows to skip: a number of lines at the start of the file, a list of line numbers, or a callable | int, list, callable
usecols | Columns to read, as a list of column numbers or names; if callable, keeps the columns for which the callable returns True | list, callable
index_col | Column(s) to use as the row index | int, str
skip_blank_lines | If True, skip blank lines rather than reading them as NaN values (default True) | bool
sep | Delimiter used in the file, i.e. how the data items are separated in the CSV file (default ',') | str
delimiter | Alias for sep | str
names | List of column names to use | array-like
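Several of the parameters in the table can be combined in one call. A minimal sketch, assuming a small hypothetical semicolon-separated file that is held in memory with io.StringIO instead of being read from disk:

```python
import io
import pandas as pd

# Hypothetical file contents: a comment line, a header, a blank line, two data rows
raw = """# exported report
id;name;score

1;Asha;88
2;Bikash;92
"""

df = pd.read_csv(
    io.StringIO(raw),         # a file-like object works in place of a file path
    sep=';',                  # fields are separated by ';'
    skiprows=1,               # skip the first (comment) line
    usecols=['id', 'score'],  # read only these columns
    index_col='id',           # use the 'id' column as the row index
    skip_blank_lines=True,    # default: the blank line is ignored, not read as NaN
)
print(df)
```

With a real file, replace io.StringIO(raw) with the file path.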

4. Data Input and Validation

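Data Input
Function | Description
read_csv() | Read a CSV file
read_json() | Read a JSON file
read_html() | Read tables from an HTML page (returns a list of DataFrames)
read_xml() | Read an XML file
read_sql() | Read the result of a SQL query or a database table (requires a database connection)
read_excel() | Read an Excel file
DataFrame.to_csv("file name") | Save a DataFrame in CSV file format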
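A round trip through to_csv() and read_csv() shows that the two functions are inverses for simple data. This sketch writes to a string instead of a file (to_csv() returns the CSV text when no path is given), so it runs without touching the disk.

```python
import io
import pandas as pd

df = pd.DataFrame({'quantity': [10, 12], 'price': [1200, 1400]})

# No path given: to_csv() returns the CSV text; index=False omits the row labels
csv_text = df.to_csv(index=False)
print(csv_text)

# Read it back; io.StringIO makes the string behave like a file
df2 = pd.read_csv(io.StringIO(csv_text))
print(df2.equals(df))
```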

Shape:
The shape attribute returns a tuple (rows, columns) representing the dimensionality of the
DataFrame.
#DataFrame.shape
E.g. df.shape
Out: (rows, columns)
#df.shape[0]
Out: number of rows
#df.shape[1]
Out: number of columns
head() and tail():
#DataFrame.head(n)
Returns the first n rows of the DataFrame.
Note: if you do not pass n, the first five rows are returned.
#DataFrame.tail(n)
Returns the last n rows of the DataFrame.
Note: if you do not pass n, the last five rows are returned.
info():
info() prints a concise summary of the DataFrame, including the number of entries and, for each
column, its data type and number of non-null values.
#DataFrame.info()
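The validation helpers above are usually the first things run on a freshly loaded dataset. A sketch on a small made-up DataFrame of ten rows:

```python
import pandas as pd

# Made-up data: 10 rows, 2 columns
df = pd.DataFrame({'value': range(10), 'label': list('abcdefghij')})

print(df.shape)      # (10, 2)
print(df.shape[0])   # number of rows -> 10
print(df.shape[1])   # number of columns -> 2

first = df.head()    # first 5 rows (default n=5)
last3 = df.tail(3)   # last 3 rows

df.info()            # prints dtypes and non-null counts per column; returns None
```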

5. Basic Analysis
value_counts():
value_counts() returns a Series containing the counts of unique values. By default the result is
sorted in descending order, so the first element is the most frequently occurring value; missing
values (NaN) are excluded unless dropna=False.
#Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
➔ Adjust these parameters as needed, e.g. normalize=True returns relative frequencies instead of counts.
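A sketch of value_counts() with its default settings and with two of the parameters changed, on a made-up Series that contains one missing value:

```python
import pandas as pd

colors = pd.Series(['red', 'blue', 'red', 'green', 'red', 'blue', None])

counts = colors.value_counts()                 # most frequent first; the missing value is dropped
shares = colors.value_counts(normalize=True)   # relative frequencies instead of counts
with_na = colors.value_counts(dropna=False)    # also counts the missing value

print(counts)
```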
sort_values():
#Series.sort_values(axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
➔ Sorts by the values along either axis.
#DataFrame.sort_values(by, axis=0, ascending=True, inplace=False,
kind='quicksort', na_position='last')
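A sketch of both forms of sort_values() on made-up data. With the default inplace=False, a new object is returned and the original DataFrame keeps its order.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Asha', 'Bikash', 'Chandra'],
                   'score': [88, 95, 72]})

# DataFrame: sort rows by a column, highest score first
by_score = df.sort_values(by='score', ascending=False)

# Series: sort the values of a single column (ascending by default)
sorted_scores = df['score'].sort_values()

print(by_score)
```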
Boolean Indexing:
➔ Boolean vectors can be used to filter data.
Operator Symbol
AND &
OR |
NOT ~
EQUAL-TO ==

➔ Multiple Boolean conditions must each be wrapped in parentheses, because & and | bind more tightly than comparison operators such as ==.

E.g. df[(df['column1'] == 'value1') & (df['column2'] == 'value2')]
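The operator table can be exercised on a small made-up DataFrame. Each comparison produces a Boolean Series, and the combined mask keeps only the rows where it is True.

```python
import pandas as pd

df = pd.DataFrame({'city': ['Kathmandu', 'Pokhara', 'Kathmandu', 'Lalitpur'],
                   'sales': [120, 80, 45, 200]})

# AND: both conditions must hold (each condition in its own parentheses)
ktm_big = df[(df['city'] == 'Kathmandu') & (df['sales'] > 100)]

# OR: at least one condition holds
either = df[(df['city'] == 'Pokhara') | (df['sales'] > 150)]

# NOT: invert a condition
not_ktm = df[~(df['city'] == 'Kathmandu')]

print(ktm_big)
```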
String Handling:
➔ Available on Series of strings through the str accessor.

➔ Series.str: access the values of the Series as strings and apply string methods to them

#Series.str.contains('string value')
#Series.str.startswith('prefix')
#Series.str.isnumeric()
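The three str methods listed above each return a Boolean Series, which makes them usable directly as filters. A sketch on made-up strings; note that contains() is case-sensitive by default, so lower() is chained first when case should be ignored.

```python
import pandas as pd

s = pd.Series(['apple pie', 'banana', 'Apple juice', '123'])

has_apple = s.str.contains('apple')                    # case-sensitive match
starts_a = s.str.startswith('a')
numeric = s.str.isnumeric()
any_case_apple = s.str.lower().str.contains('apple')   # lowercase first, then match

print(s[any_case_apple])   # Boolean Series used as a filter
```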
Indexing:
➔ An Index object is an immutable array of labels.

➔ Indexing allows you to access a row or column using a label.

#type(DataFrame.index)
#DataFrame.index[20]    (the label at position 20)
set_index():
#DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
➔ Sets the DataFrame index using one or more existing columns.
#DataFrame.set_index(keys, inplace=True)
reset_index():
#DataFrame.reset_index(level=None, drop=False, inplace=False, ...)
➔ Returns a DataFrame with the default integer index (0 to n-1); the old index becomes a column unless drop=True.
#DataFrame.reset_index(inplace=True)
sort_index():
#DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, …)
➔ Sorts the object by its index labels along the given axis.
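The index operations above are often chained: promote a column to the index, sort by it, then move it back. A sketch on made-up data, which also shows that an Index cannot be modified in place.

```python
import pandas as pd

df = pd.DataFrame({'code': ['NP', 'IN', 'BD'],
                   'capital': ['Kathmandu', 'New Delhi', 'Dhaka']})

indexed = df.set_index('code')       # the 'code' column becomes the row index
by_label = indexed.sort_index()      # rows ordered by index label: BD, IN, NP
restored = by_label.reset_index()    # 'code' moves back into a column; index is 0..n-1

try:
    indexed.index[0] = 'XX'          # Index objects are immutable
except TypeError as err:
    print('cannot modify an Index:', err)
```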

loc[]:
➔ DataFrame.loc[] / Series.loc[]

➔ A label-based indexer: selects rows and columns by label (label slices include the end label)

➔ loc[] will raise a KeyError when the items are not found

iloc[]:
➔ DataFrame.iloc[] / Series.iloc[]

➔ iloc[] is primarily integer position based (from 0 to length-1 of the axis)

➔ Allows traditional Pythonic slicing.
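The practical difference between loc[] and iloc[] shows up most clearly in slicing: a label slice includes its end label, while a position slice excludes its end position, as in ordinary Python. A sketch on a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'price': [1200, 1400, 900]}, index=['A', 'B', 'C'])

label_val = df.loc['B', 'price']   # by label -> 1400
label_slice = df.loc['A':'B']      # label slice INCLUDES 'B' -> rows A and B
first_row = df.iloc[0]             # first row by position
pos_slice = df.iloc[0:1]           # position slice EXCLUDES position 1 -> row A only

try:
    df.loc['Z']                    # a missing label raises KeyError
except KeyError:
    print('label not found')
```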

6. GroupBy

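groupby() is one of the most important features available in pandas. A groupby operation does
three things:

• Split the DataFrame into groups based on some criteria.
• Apply a function to each group independently.
• Combine the results into a DataFrame (or Series).

Calling DataFrame.groupby() itself returns a GroupBy object; the apply and combine steps run when
you call a method such as mean() or agg() on it.
#pandas.DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True,
group_keys=True, squeeze=False, **kwargs)
(squeeze was removed in pandas 2.0, and axis is deprecated in recent versions.)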
Groupby object (split, apply, combine):
DataFrame -> group by key:
    Key1 -> df1
    Key2 -> df2
    ...
    Keyn -> dfn
-> Merge -> New DataFrame / New Series

Iterate through the groups:
for key, group in DataFrame.groupby(by):
    print(key)
    print(group)
    print(type(group))   # each group is a DataFrame
Groupby Computations:
# GroupBy.size()
# GroupBy.count()
# GroupBy.first(),GroupBy.last()
# GroupBy.head(), GroupBy.tail()
# GroupBy.mean()
# GroupBy.max(), GroupBy.min()
agg(): computes multiple statistics per group in one call
# DataFrame.groupby(by).agg([…])
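The split, apply and combine steps can be traced on a small made-up table of team points: iterating shows the split, and each computation applies a function per group and combines the results into a Series or DataFrame indexed by the group key.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'B', 'A', 'B', 'A'],
                   'points': [10, 7, 14, 9, 6]})

grouped = df.groupby('team')          # split only: nothing is computed yet

for key, group in grouped:            # each item is a (key, DataFrame) pair
    print(key, group['points'].tolist())

sizes = grouped.size()                # rows per group
means = grouped['points'].mean()      # apply mean, combine into a Series
stats = grouped['points'].agg(['min', 'max', 'mean'])   # several statistics at once
print(stats)
```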

7. Reshaping

stack():

#DataFrame.stack(level=-1, dropna=True)

➔ Returns a DataFrame or Series.

➔ Pivots a level of the column labels into a new innermost level of row labels.

unstack():

#DataFrame.unstack(level=-1, fill_value=None)
➔ Pivots a level of the index labels, returning a DataFrame with a new level of column labels.
➔ If the index is not a MultiIndex, the output is a Series.
➔ The level involved is automatically sorted.
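stack() and unstack() undo each other for a simple table. A sketch on made-up exam scores: stacking moves the subject columns into an inner row level, unstacking moves them back, and unstacking a DataFrame whose index is not a MultiIndex yields a Series. Recent pandas versions may emit a FutureWarning about a new stack() implementation; it does not change these results.

```python
import pandas as pd

df = pd.DataFrame({'math': [80, 90], 'science': [70, 85]},
                  index=['Asha', 'Bikash'])

stacked = df.stack()       # Series with a (student, subject) MultiIndex
wide = stacked.unstack()   # inner index level pivots back into columns

flat = df.unstack()        # index is not a MultiIndex -> Series keyed by (subject, student)

print(stacked)
```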
