Pandas - Cheat - Sheet

The document provides an overview of key Pandas concepts for working with Series and DataFrames. It describes how Series are 1-dimensional labeled arrays that can have names, indexes, and different data types. DataFrames are 2-dimensional tables that contain rows and columns, with each column being a Series. The document outlines many common Series and DataFrame methods for data manipulation, such as filtering, sorting, descriptive statistics, and transformations.

Uploaded by

Soumabrata Sarkar


Pandas: Python Package | Fast, flexible and powerful data manipulation tool

Series: 1 Dimensional labelled array of any data type

>Bros = pd.Series(data = ['Soum', 'Ayush'], index = [1, 2], dtype = 'string') # Using a list to create a Series
Note: When no index is specified, Pandas assigns a sequence of integers starting from zero. This is a built-in Pandas
object called 'RangeIndex'
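A minimal sketch of the note above, comparing an explicit index with the default one. The variable name bros_default is only illustrative.

```python
import pandas as pd

# Explicit index labels, as in the line above
bros = pd.Series(data=["Soum", "Ayush"], index=[1, 2], dtype="string")

# No index given: pandas falls back to a RangeIndex starting at 0
bros_default = pd.Series(data=["Soum", "Ayush"], dtype="string")

print(list(bros.index))          # labels supplied by the caller
print(type(bros_default.index))  # the built-in RangeIndex class
```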

Method & Attributes:

Method – A function bound to the object (Example: Series1.equals(Series2))
Attribute – A variable bound to the object (Example: Bros.dtype, Bros.name = 'Cool Dudes!')
Note: A Series can have a name, and that name can be used as a column name in a DataFrame. Similarly, the Series index
can also have its own name

add_prefix() & add_suffix() Methods: (Return a new Series; the original Series labels are not modified)
>Alphabet.add_prefix('_prefix_label')
>Alphabet.add_suffix('_suffix_label')

Boolean mask and the .loc Indexer:

Boolean Mask - Used to select many items at once | works with [] and .loc | needs to be the same length as the Series
>Alphabet.loc[[i % 2 == 0 for i in range(26)]] # Assumes a 26-element Alphabet Series

.loc – The prototypical way of doing label based extraction

.iloc – Way of doing integer / location based extraction
Example:
>Alphabet = pd.Series(data = ['A', 'B', 'C', 'D'], index = [1, 2, 3, 4], dtype = 'string')
>Alphabet.loc[1] ---> Output: A
>Alphabet.iloc[1] ---> Output: B
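
The difference between label-based and position-based access, together with a boolean mask, can be sketched as follows. The mask here is sized from the Series itself, which is an assumption made so the example runs on the four-element Series.

```python
import pandas as pd

alphabet = pd.Series(["A", "B", "C", "D"], index=[1, 2, 3, 4], dtype="string")

by_label = alphabet.loc[1]      # label 1 is the first element
by_position = alphabet.iloc[1]  # position 1 is the second element

# Boolean mask: one True/False per element, same length as the Series
mask = [i % 2 == 0 for i in range(len(alphabet))]
every_other = alphabet.loc[mask]  # keeps positions 0 and 2
```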

Selecting with .get():

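-'get()' looks up a value by index label and returns a default instead of raising an error when the label is missing
>Alphabet.get('label_A') ---> Output: A
>Alphabet.get(0) ---> Output: A (positional fallback on a non-integer index; deprecated in recent pandas versions, prefer .iloc[0])
>Alphabet.get('Anything', default = "Couldn't find anything!") ---> Output: Couldn't find anything!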
Note: default of .get() is set to ‘None’
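
A small sketch of label lookup with .get(), assuming a Series whose labels are 'label_A' and 'label_B' (the second label is made up for illustration).

```python
import pandas as pd

alphabet = pd.Series(["A", "B"], index=["label_A", "label_B"], dtype="string")

found = alphabet.get("label_A")     # value stored under the label
missing = alphabet.get("Anything")  # no KeyError; falls back to None
fallback = alphabet.get("Anything", default="Couldn't find anything!")
```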

Series Methods and Handling:

pd.read_csv – Read a comma-separated values (csv) file into a DataFrame
>Alcohol = pd.read_csv('https://andybek.com/drinks', usecols = ['country', 'wine_servings'], index_col = ['country'])
>type(Alcohol) ---> pandas.core.frame.DataFrame

If the parsed data contains only one column, setting the squeeze parameter to True returns a Series
>Alcohol = pd.read_csv('https://andybek.com/drinks', usecols = ['country', 'wine_servings'], index_col = ['country'],
squeeze = True)
>type(Alcohol) ---> pandas.core.series.Series
Note: The squeeze parameter was deprecated in pandas 1.4 and removed in pandas 2.0. On newer versions, call
.squeeze('columns') on the result of pd.read_csv instead
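
One way to get a Series from a one-column file without the squeeze parameter is sketched below. The inline CSV text and its two rows are hypothetical stand-ins for the drinks file, used so the example runs offline.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for the real drinks file
csv_text = "country,wine_servings\nAlbania,54\nAlgeria,14\n"

frame = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["country", "wine_servings"],
    index_col="country",
)
alcohol = frame.squeeze("columns")  # one-column DataFrame becomes a Series

print(type(frame).__name__, type(alcohol).__name__)
```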

Size & Shape:
.size - Number of elements in the Series
.shape - Tuple of the dimensions of the Series

Unique values and Series Monotonicity:

.is_unique - Checks whether the Series contains only unique values (True/False)
.nunique() - Gives the exact count of unique values in a Series (excluding NA)
.nunique(dropna = False) - Gives the count of unique values in a Series (including NA)
.unique() - Gives an array of the unique values in the Series
.is_monotonic_increasing - Checks if the Series never decreases (repeated / stagnant values are allowed)
.is_monotonic_decreasing - Checks if the Series never increases (repeated values are allowed)
Note: The older alias .is_monotonic was deprecated in pandas 1.5 and removed in pandas 2.0
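
The uniqueness and monotonicity checks can be tried on a tiny Series. The values below are made up purely for illustration.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, 2.0, 2.0, np.nan])

all_unique = ser.is_unique                # 2.0 appears twice
n_without_na = ser.nunique()              # NaN is not counted
n_with_na = ser.nunique(dropna=False)     # NaN counts as one more value
distinct = ser.unique()                   # array of the distinct values

# Ties are allowed: "increasing" here means "never decreasing"
rising = pd.Series([1, 2, 2, 3]).is_monotonic_increasing
falling = pd.Series([3, 2, 2, 1]).is_monotonic_decreasing
```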

The count() Method:

Series.count() - Return number of non-NA / Null observations in the Series
.isna().sum() - Return number of Null elements in the Series

Dropping and Filling NAs:

>Alcohol.fillna(100, inplace = False)
>Alcohol.dropna(inplace = False) - Exclude NAs from the Series
Note: Both methods return a copy of the Series unless the ‘inplace’ parameter is set to True
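
A short sketch of counting, filling, and dropping missing values. The three countries and their values are hypothetical.

```python
import numpy as np
import pandas as pd

alcohol = pd.Series([54.0, np.nan, 14.0], index=["Albania", "Andorra", "Algeria"])

non_null = alcohol.count()    # number of non-NA observations
nulls = alcohol.isna().sum()  # number of NA observations
filled = alcohol.fillna(100)  # new Series; the original is untouched
dropped = alcohol.dropna()    # new Series without the NA row
```

Because inplace defaults to False, alcohol still contains its NaN after both calls.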

Descriptive Statistics:
>Alcohol.sum() - Excludes NA’s
>Alcohol.mean()
>Alcohol.median()
>Alcohol.quantile(q=0.5)
IQR (Interquartile Range) -> Alcohol.quantile(0.75) - Alcohol.quantile(0.25)
>Alcohol.min()
>Alcohol.max()
>Alcohol.std()
>Alcohol.var()
Note: Alcohol.std()**2 = Alcohol.var() | Mode - Item with highest frequency
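
The relationships listed above can be checked on a small, made-up Series. The five values are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import pandas as pd

alcohol = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

median = alcohol.median()
middle_quantile = alcohol.quantile(q=0.5)  # same value as the median

# Interquartile range: spread of the middle 50% of the values
iqr = alcohol.quantile(0.75) - alcohol.quantile(0.25)

# Squaring the standard deviation gives back the variance
variance_from_std = alcohol.std() ** 2
```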

Describe() Method:
Gives an overall statistical description of the dataset
>Alcohol.describe(percentiles = [0.79, 0.19], include = float, exclude = object)
Note: include / exclude are ignored for a Series; they only take effect on a DataFrame

Value_counts():
A sorted series containing unique values and their counts
>Alcohol.value_counts(sort = True, ascending = False, dropna = True, normalize = False)
Note: Normalize provides relative frequency
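
A brief sketch of value_counts() with its main switches. The servings values are hypothetical, and None stands in for a missing observation.

```python
import pandas as pd

servings = pd.Series([5, 5, 7, None])

counts = servings.value_counts()                # most frequent value first, NaN dropped
shares = servings.value_counts(normalize=True)  # relative frequencies instead of counts
with_na = servings.value_counts(dropna=False)   # adds a row for NaN
```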

idxmax() & idxmin():

idxmax() - Returns the label of the row with maximum value
idxmin() - Returns the label of the row with minimum value
Note: If multiple min/max values are present then only the first label is returned

nlargest() & nsmallest():
>Alcohol.nlargest(n = 10) - Shows largest 10 alcohol consuming countries / values
>Alcohol.nsmallest(n = 10) - Shows smallest 10 alcohol consuming countries / values
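
The sketch below illustrates idxmax() / idxmin() with a tie, followed by nlargest() / nsmallest(). The four values are invented so that two labels share the maximum.

```python
import pandas as pd

alcohol = pd.Series(
    [54, 312, 14, 312],
    index=["Albania", "Andorra", "Algeria", "France"],
)

top_label = alcohol.idxmax()     # first label among the tied maxima
bottom_label = alcohol.idxmin()  # label of the smallest value

top_two = alcohol.nlargest(n=2)      # two largest values with their labels
bottom_two = alcohol.nsmallest(n=2)  # two smallest values with their labels
```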

Sorting with sort_values():

Returns a new Series, sorted by values
>Alcohol.sort_values(ascending = False, na_position = 'last', kind = 'quicksort', inplace = False)
Note: Other sorting algorithms such as 'mergesort', 'heapsort' or 'stable' are available; the default is 'quicksort'

Sorting with sort_index():

Returns a new Series, sorted by index labels
>Alcohol.sort_index(ascending = True, na_position = 'last', inplace = False)
Note: na_position only matters when the index labels themselves contain NaN

Series Arithmetic and fill_value():

Use to add / subtract / divide / multiply two Series
>Alcohol.add(More_Drinks, fill_value = 0)
>Alcohol.subtract(More_Drinks, fill_value = 0)
>Alcohol.divide(More_Drinks, fill_value = 1)
>Alcohol.multiply(More_Drinks, fill_value = 1)
Note: ‘fill_value’ is set to 0 / 1 so that we don’t end up losing data due to Series misalignment
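
The effect of fill_value on misaligned labels can be sketched with two small, hypothetical Series that share only one label.

```python
import pandas as pd

alcohol = pd.Series([54, 14], index=["Albania", "Algeria"])
more_drinks = pd.Series([10, 5], index=["Albania", "Andorra"])

# Plain operator: any label missing on one side produces NaN
plain = alcohol + more_drinks

# fill_value=0: the missing side is treated as 0, so no data is lost
filled = alcohol.add(more_drinks, fill_value=0)
```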

Calculating Variance and Standard Deviation:

Variance - The average of squared differences from the mean (the sample variance divides by n - 1 rather than n)

>(Alcohol.subtract(Alcohol.mean())**2).sum()/(Alcohol.count() - 1)
Note: Standard Deviation is the square root of Variance
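
As a check, the manual formula can be compared with the built-in methods. The five values are made up for the example.

```python
import math
import pandas as pd

alcohol = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var = (alcohol.subtract(alcohol.mean()) ** 2).sum() / (alcohol.count() - 1)

# Standard deviation is the square root of the variance
manual_std = math.sqrt(manual_var)

print(manual_var, alcohol.var())
print(manual_std, alcohol.std())
```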

Cumulative Operations:
>Alcohol.cumsum(skipna =True) - Calculate a progressive / cumulative sum (of the values preceding in the Series)
>Alcohol.cumprod()
>Alcohol.cummin()
>Alcohol.cummax()
Note: 'NaN' values are skipped (i.e., the 'skipna' parameter is set to 'True' by default), as the sum of any number with 'NaN' is 'NaN'

Pairwise Difference with diff():

Used to calculate the discrete difference for pair of elements in a Series
>Ser.diff(periods = 1)
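
A small sketch of a cumulative sum with a missing value, followed by pairwise differences. Both Series are hypothetical.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, np.nan, 3.0, 6.0])

# skipna=True (the default): the NaN slot stays NaN, but the running
# total carries on past it
running = ser.cumsum()

# diff(periods=1): each element minus the one before it; the first
# element has no predecessor, so its difference is NaN
steps = pd.Series([1, 4, 9]).diff(periods=1)
```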

Series Iteration:
>for i in Alcohol:
    print(i)
-Prints values / alcohol consumption without labels

>for i in Alcohol.index:
    print(i)
-Prints labels without values

>for i in Alcohol.index:
    print(i, Alcohol[i])
-Prints both label and value

>for i in Alcohol.items():
    print(i)
-Returns a lazily evaluated iterable of (label, value) tuples | Behind the scenes it uses 'zip'
Note: Alcohol.iteritems() was the older spelling; it was deprecated in pandas 1.5 and removed in pandas 2.0

Filtering: filter(), where() and mask()

>Alcohol.filter(regex = '^V') - Keeps countries starting with the letter 'V'
>Alcohol.filter(like = 'stan') - Keeps countries having 'stan' in their name
Note: filter() works on the index labels, not on the values. To filter on values, use square
brackets (Example: Alcohol[Alcohol > 200])

pandas.Series.where() - Replace values with ‘NaN / set value’ where the condition is False
>Alcohol.where(lambda x: x > 200, np.nan).dropna()

pandas.Series.mask() - Replace values with ‘NaN / set value’ where the condition is True
>Alcohol.mask(lambda x: x > 200, np.nan).dropna()

Note: Default replacing value is ‘NaN’
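
The three filtering tools can be compared side by side. The countries and values below are hypothetical, picked so that each call keeps something different.

```python
import pandas as pd

alcohol = pd.Series(
    [0, 312, 8, 3],
    index=["Afghanistan", "Andorra", "Pakistan", "Vietnam"],
)

# filter() looks at index labels only
starts_with_v = alcohol.filter(regex="^V")
has_stan = alcohol.filter(like="stan")

# Square brackets filter on the values
above_200 = alcohol[alcohol > 200]

# where() keeps values where the condition is True, NaN elsewhere
kept = alcohol.where(alcohol > 200)

# mask() is the inverse: NaN where the condition is True
hidden = alcohol.mask(alcohol > 200)
```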

Transforming with update(), apply() & map():

update() - Modifies the Series in place using the non-NA values of another Series, matched on index labels
>Alcohol.update(pd.Series(data = [200, 20], index = ['Albania', 'Algeria']))

apply() - Applies a transformation to each and every element in a series

>Alcohol.apply(lambda x : x**2) | Alcohol.apply(np.square)

map() - Map values of a Series according to an input correspondence | Used for substituting each value in a Series with
another value, which may be derived from a function, a dictionary or a Series
>Ser.map({'old value' : 'new value'})
>Alcohol.map(lambda x : x**2)
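
A minimal sketch of the three transformations on made-up data. Note that update() changes the Series it is called on and returns None, while apply() and map() return new Series.

```python
import pandas as pd

alcohol = pd.Series([54, 14, 312], index=["Albania", "Algeria", "Andorra"])

# update(): overwrite matching labels in place
result = alcohol.update(pd.Series([200, 20], index=["Albania", "Algeria"]))

# apply(): run a function on every element
squared = alcohol.apply(lambda x: x ** 2)

# map(): substitute values through a dictionary
levels = pd.Series(["low", "high", "low"]).map({"low": 0, "high": 1})
```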

DataFrame:
A table of data that contains a collection of rows and columns.
Key Aspects -
1. DataFrames have two dimensions: a labeled index (rows) and labeled columns
2. Each column in a DataFrame is a Series and each column must be of same size
3. Unlike a Series, a DataFrame can be heterogeneous (i.e., its columns can have different data types)

>names = ['Olga', 'Andre', 'Brian']
>ages = [29, 21, 45]
>married = [False, True, True]
>df = pd.DataFrame({'name' : names, 'age' : ages, 'married' : married})

Need to specify index position as well as column position / name to fetch specific value from DataFrame
> df.iloc[2,0] --> Output: ‘Brian’

.dtypes: Shows columns name and their datatypes

Note: ‘.dtype’ will throw an error as DataFrame can have multiple data types

.ndim & .shape:

.ndim - Number of dimensions of the dataset (1 for a Series, 2 for a DataFrame)
.shape - Gives the shape of the dataset as a tuple (rows, columns)

More ways to create a DataFrame:

1. Dict of Tuples (Column-wise)
>tuple_names = tuple(names)
>tuple_ages = tuple(ages)
>tuple_married = tuple(married)
>pd.DataFrame({'name' : tuple_names, 'age' : tuple_ages, 'married' : tuple_married})

2. Dict of Dicts (Column-wise)

>dict_names = {k:v for k,v in enumerate(names)}
>dict_ages = {k:v for k,v in enumerate(ages)}
>dict_married = {k:v for k,v in enumerate(married)}
>pd.DataFrame({'name' : dict_names, 'age' : dict_ages, 'married' : dict_married})

3. Dict of Series (Column-wise)

>series_names = pd.Series(names)
>series_ages = pd.Series(ages)
>series_married = pd.Series(married)
>pd.DataFrame({'name' : series_names, 'age' : series_ages, 'married' : series_married})

4. List of Dicts (Row-wise)

>rowwise = [{'name' : n, 'age' : a, 'married' : m} for n, a, m in zip(names, ages, married)]
>pd.DataFrame(rowwise)
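
The column-wise and row-wise constructions should produce the same table. The sketch below builds the DataFrame three ways and leaves the comparison to DataFrame.equals(); the variable names from_lists, from_series and from_rows are illustrative.

```python
import pandas as pd

names = ["Olga", "Andre", "Brian"]
ages = [29, 21, 45]
married = [False, True, True]

# Column-wise: dict of lists
from_lists = pd.DataFrame({"name": names, "age": ages, "married": married})

# Column-wise: dict of Series
from_series = pd.DataFrame(
    {"name": pd.Series(names), "age": pd.Series(ages), "married": pd.Series(married)}
)

# Row-wise: one dict per row
rows = [{"name": n, "age": a, "married": m} for n, a, m in zip(names, ages, married)]
from_rows = pd.DataFrame(rows)

print(from_lists.equals(from_series), from_lists.equals(from_rows))
```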

The info() Method:

Gives an overview of the dataset. Available on DataFrames (Series.info() was added in pandas 1.4)
>df.info(verbose = True, max_cols = 4, memory_usage = 'deep')
Note: Column-specific information is shown by default for small DataFrames. Setting verbose to False removes it

Loading a DataFrame:
>Nutrition = pd.read_csv('http://andybek.com/pandas-nutrition')
Note: Passing index_col = [0] to read_csv uses the first column as the index, so the 'Unnamed: 0' column handled below never appears

.drop():
Removes specific rows / columns from the DataFrame
>Nutrition.drop('Unnamed: 0', axis = 1)

.set_index():
Sets the specified column as the index of the DataFrame
>Nutrition.set_index('Unnamed: 0')
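
A small offline sketch of both methods. The inline CSV, its column names, and its two rows are hypothetical; the only property assumed from the real file is a header with an empty first field, which pandas names 'Unnamed: 0'.

```python
import io
import pandas as pd

# Hypothetical sample with an unnamed first column
csv_text = ",calories,protein\nApple,52,0.3\nEgg,155,13.0\n"

nutrition = pd.read_csv(io.StringIO(csv_text))

# drop(): returns a new DataFrame without the column
without_col = nutrition.drop("Unnamed: 0", axis=1)

# set_index(): returns a new DataFrame indexed by the column
indexed = nutrition.set_index("Unnamed: 0")

print(list(nutrition.columns))
print(list(indexed.index))
```

Both calls leave nutrition itself unchanged, since neither is asked to work in place.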
