Introduction To Pandas

Lecture Slide in Programming for Data Analysis - Lesson 10

Uploaded by

Nam Trần

Introduction to Pandas
• Pandas is an open-source, BSD-licensed library providing
high-performance, easy-to-use data structures and data
analysis tools for the Python programming language.
Why
• Suppose we have been given an input file with employee
details such as emp_name, emp_salary, emp_department, and so on.
• We need to compute the total salary for each department
(a department-wise sum of salaries).
• Solution 1: Only Python code
• Solution 2: Using Pandas
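A minimal sketch comparing the two solutions, assuming a small hypothetical employee file. The file contents (names, salaries, departments) are invented and inlined with io.StringIO so the example runs without a file on disk; the helper function names are also hypothetical.

```python
import io
from collections import defaultdict

import pandas as pd

# Hypothetical input file contents (stands in for the employee file on disk)
CSV_TEXT = """emp_name,emp_salary,emp_department
An,5000,Sales
Binh,7000,IT
Chi,6000,Sales
Dung,8000,IT
Em,4000,HR
"""

# Solution 1: only Python. Parse each line and accumulate totals in a dict.
def salary_by_department_python(text):
    totals = defaultdict(int)
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        _name, salary, department = line.split(",")
        totals[department] += int(salary)
    return dict(totals)

# Solution 2: pandas. Load into a DataFrame and group by department.
def salary_by_department_pandas(text):
    df = pd.read_csv(io.StringIO(text))
    return df.groupby("emp_department")["emp_salary"].sum()

python_totals = salary_by_department_python(CSV_TEXT)
pandas_totals = salary_by_department_pandas(CSV_TEXT)
print(python_totals)  # {'Sales': 11000, 'IT': 15000, 'HR': 4000}
print(pandas_totals)
```

The pandas version replaces the manual parsing and bookkeeping with a single groupby call, which is the main motivation for using the library.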
Pandas data structures
• Series:
• A pandas Series is a one-dimensional array with index labels;
technically, these labels are referred to as the axis index. In
other words, a Series is a one-dimensional collection of objects
with an axis index.
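A short sketch of building a Series with explicit index labels; the salary values and labels are hypothetical.

```python
import pandas as pd

# A Series: one-dimensional values plus an axis index of labels
salaries = pd.Series([5000, 7000, 6000], index=["An", "Binh", "Chi"], name="emp_salary")

print(salaries)
print(salaries["Binh"])          # access by index label -> 7000
print(salaries.index.tolist())   # ['An', 'Binh', 'Chi']
```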
Pandas data structures
• DataFrame
• A pandas DataFrame is a two-dimensional labeled data structure: one
axis is the index (row labels) and the other is the columns (column
labels). We can think of it as a table in an Excel spreadsheet, where
data is organized in rows and columns.
• The pandas library provides a constructor to create a DataFrame:
pd.DataFrame(data, index=index, columns=[column(s)])
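A sketch of the constructor with all three arguments, assuming row data given as a list of lists. The names, ages, countries and row labels are hypothetical.

```python
import pandas as pd

# Row data as a list of lists; index gives row labels, columns gives column labels
data = [["An", 25, "UK"], ["Binh", 32, "USA"], ["Chi", 28, "UK"]]
df = pd.DataFrame(data, index=[101, 102, 103], columns=["name", "age", "country"])

print(df)
print(df.shape)  # (3, 3)
```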
Loading data from files
• Loading data from CSV into a DataFrame:
• CSV is the abbreviation of Comma-Separated Values, so a CSV
file typically contains values separated by commas.
• Loading data from a CSV file with a header into a DataFrame:
• The read_csv() function automatically picks up the first line as
the header and assigns the column names in the DataFrame
accordingly. The slide then shows a snapshot of a CSV file with a
header.
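A minimal sketch of the default header behavior, assuming hypothetical CSV text wrapped in io.StringIO in place of a real file path.

```python
import io

import pandas as pd

# Hypothetical CSV text whose first line is a header row
csv_with_header = io.StringIO("name,age,country\nAn,25,UK\nBinh,32,USA\n")

# read_csv uses the first line as column names by default
df = pd.read_csv(csv_with_header)
print(df.columns.tolist())  # ['name', 'age', 'country']
print(len(df))              # 2
```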
• Loading data from a CSV file without a header into a DataFrame:
• If the CSV file doesn't have a header, we have to pass the column
names as an argument when calling read_csv() to load the data into
a DataFrame. The call looks like
pd.read_csv(<Input_CSV_file_Path>, names=[col1, col2…coln]). If the
file has no header and no values are passed to the names keyword,
then by default pandas uses the first row's values as the column
names of the DataFrame (pass header=None to get integer column
labels instead).
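A sketch contrasting the three cases on the same headerless data, assuming hypothetical CSV text in io.StringIO. Note how the default call silently turns the first data row into column names.

```python
import io

import pandas as pd

# Hypothetical CSV text with no header row
raw = "An,25,UK\nBinh,32,USA\n"

# names= makes every line a data row and uses the given column labels
named = pd.read_csv(io.StringIO(raw), names=["name", "age", "country"])

# header=None keeps every line as data and uses integer column labels
numbered = pd.read_csv(io.StringIO(raw), header=None)

# With neither option, the first data row becomes the header
default = pd.read_csv(io.StringIO(raw))

print(named.shape, named.columns.tolist())        # (2, 3) ['name', 'age', 'country']
print(numbered.shape, numbered.columns.tolist())  # (2, 3) [0, 1, 2]
print(default.shape, default.columns.tolist())    # (1, 3) ['An', '25', 'UK']
```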
Loading data from files
• Loading data from an Excel file into a DataFrame
• To load data from a Microsoft Excel file into a DataFrame, pandas
provides the read_excel() function:
pd.read_excel(<excel_file_path>, sheet_name=<excel_sheet_name>)
• Loading data from a JSON file
into a DataFrame:
• JSON is a file format commonly
used across systems and
platforms. It organizes data
as key-value pairs and
ordered lists. pandas provides
the read_json() function to load it.
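A minimal sketch of reading JSON records, assuming a hypothetical list of objects in io.StringIO. Wrapping the text in StringIO is a choice made here because recent pandas versions deprecate passing a literal JSON string to read_json.

```python
import io

import pandas as pd

# Hypothetical JSON: a list of records, each made of key-value pairs
json_text = '[{"name": "An", "age": 25}, {"name": "Binh", "age": 32}]'

# orient="records" tells pandas each object in the list is one row
df = pd.read_json(io.StringIO(json_text), orient="records")
print(df)
```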
DataFrame operations
DataFrame information
• Using the info() method.
• This method prints a concise summary of the DataFrame: the index
range, the number of columns, each column's data type and non-null
count, and the memory usage (it prints the summary rather than
returning it)
• df.shape
• returns a tuple representing the number
of rows and columns
• df.columns
• returns an Index object containing the
column labels
• df.index
• returns an Index object containing the
index (row) labels
• df.describe()
• returns summary statistics for each numerical
column: count, mean, standard deviation, minimum,
the 25th, 50th and 75th percentiles, and maximum
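A short sketch exercising each inspection tool on a small hypothetical DataFrame (names, ages and salaries are invented).

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["An", "Binh", "Chi", "Dung"],
    "age": [25, 32, 28, 41],
    "salary": [5000, 7000, 6000, 8000],
})

df.info()                   # prints dtypes, non-null counts, memory usage; returns None
print(df.shape)             # (4, 3)
print(df.columns.tolist())  # ['name', 'age', 'salary']
print(df.index)             # RangeIndex(start=0, stop=4, step=1)

# describe() skips the text column and summarizes the numeric ones
summary = df.describe()
print(summary)
```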
Indexing and selection
• .loc: This attribute is used to access a group of rows and
columns by label. It is primarily label-based, but may also be
used with a boolean array. Label slices include both endpoints.
• # Selecting rows by label
• df.loc[1:3, ['name', 'age']]
• .iloc: This attribute is used to access a group of rows and
columns by integer position. It is primarily position-based, but
may also be used with a boolean array. Position slices exclude
the stop position, as in ordinary Python slicing.
• # Selecting rows by position
• df.iloc[1:3, [0, 1]]
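A sketch showing that the two slide examples select a different number of rows, assuming a hypothetical DataFrame with the default integer index.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["An", "Binh", "Chi", "Dung"],
    "age": [25, 32, 28, 41],
    "country": ["UK", "USA", "UK", "VN"],
})

# .loc slices by label and includes BOTH endpoints: rows 1, 2, 3
by_label = df.loc[1:3, ["name", "age"]]

# .iloc slices by integer position and EXCLUDES the stop: rows 1, 2
by_position = df.iloc[1:3, [0, 1]]

print(by_label)
print(by_position)
```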
Indexing and selection
• .at: This accessor is used to get or set a single value in the
DataFrame by its row and column labels. It is faster than .loc for
accessing a single value.
• # Selecting a single value by label
• df.at[1, 'name']
• .iat: This accessor is used to get or set a single value in the
DataFrame by its integer position. It is faster than .iloc for
accessing a single value.
• # Selecting a single value by position
• df.iat[1, 0]
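A sketch using string row labels (a hypothetical choice) so that the label used by .at and the position used by .iat are clearly different things.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["An", "Binh", "Chi"], "age": [25, 32, 28]},
    index=["e1", "e2", "e3"],  # non-integer labels make label vs position explicit
)

label_value = df.at["e2", "name"]  # row label 'e2', column label 'name'
position_value = df.iat[1, 0]      # row position 1, column position 0

df.at["e3", "age"] = 29            # .at can also set a single value
```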
Indexing and selection
• .ix: This attribute was used to access a group of rows and
columns by either label or position. It was deprecated in pandas
0.20 and removed in pandas 1.0; use .loc and .iloc instead.
• Boolean indexing: This technique is used to filter a DataFrame
based on a boolean condition. It returns a new DataFrame
containing only the rows that meet the specified condition.
• # Filtering DataFrame based on condition
• df[df['age'] > 25]
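A sketch of boolean indexing with one and with two conditions, assuming hypothetical data. Combined conditions use & and | with each comparison in parentheses, since Python's and/or do not work element-wise on a Series.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["An", "Binh", "Chi", "Dung"],
    "age": [25, 32, 28, 41],
    "country": ["UK", "USA", "UK", "VN"],
})

mask = df["age"] > 25  # a boolean Series aligned with df's index
older = df[mask]       # keeps only rows where mask is True

# Combine conditions with & and wrap each comparison in parentheses
older_uk = df[(df["age"] > 25) & (df["country"] == "UK")]
```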
Indexing and selection
• .query(): This method is used to filter a DataFrame with a query
expression written as a string. It is similar to boolean indexing,
but lets more complex conditions be written concisely.
• # Filtering DataFrame based on query
• df.query('age > 25 and country == "UK"')
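A sketch showing that the slide's query gives the same rows as the equivalent boolean mask, on hypothetical data. It also uses @ to reference a local Python variable inside the expression.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["An", "Binh", "Chi", "Dung"],
    "age": [25, 32, 28, 41],
    "country": ["UK", "USA", "UK", "VN"],
})

result = df.query('age > 25 and country == "UK"')

# Same rows via boolean indexing, for comparison
same = df[(df["age"] > 25) & (df["country"] == "UK")]

# Local variables are referenced with @ inside the query string
min_age = 30
over_min = df.query("age > @min_age")
```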
Data cleaning and transformation
• .drop(): This method is used to remove rows or columns from a
DataFrame. You specify the axis (0 for rows, 1 for columns) and
the labels of the rows or columns to be removed. It returns a new
DataFrame unless inplace=True is passed.
• # Dropping a column
• df.drop('age', axis=1)
• .fillna(): This method is used to fill missing values in a DataFrame
with a specified value. To fill missing values with the previous or
next valid value, use df.ffill() or df.bfill(), respectively (older
code passes method='ffill' or 'bfill' to fillna(), which recent
pandas versions deprecate).
• # Filling missing values with 0
• df.fillna(0)
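A sketch of drop() and the three filling strategies on a small hypothetical DataFrame containing NaN values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["An", "Binh", "Chi"],
    "age": [25, np.nan, 28],
    "score": [np.nan, 7.5, 8.0],
})

no_age = df.drop("age", axis=1)    # returns a new DataFrame; df is unchanged
no_first_row = df.drop(0, axis=0)  # drop a row by its index label

zeros = df.fillna(0)     # every NaN becomes 0
forward = df.ffill()     # each NaN takes the previous row's value
backward = df.bfill()    # each NaN takes the next row's value
```

A NaN in the first row has no previous value, so ffill() leaves it missing; the same applies to bfill() in the last row.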
Data cleaning and transformation
• .replace(): This method is used to replace specific values in a
DataFrame with different values. You can specify the values to
be replaced and their replacements, for example as a dictionary.
• # Replacing specific values
• df.replace({'USA': 'United States', 'UK': 'United Kingdom'})
• .rename(): This method is used to rename columns or index labels
in a DataFrame. You can pass a dictionary mapping old names to new
names, or a function that computes the new names.
• # Renaming columns
• df.rename(columns={'name': 'full_name'})
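A sketch of both methods on hypothetical data, including the function form of rename() mentioned above (here the built-in str.upper).

```python
import pandas as pd

df = pd.DataFrame({"name": ["An", "Binh"], "country": ["UK", "USA"]})

# Dict form: each key is replaced by its value wherever it appears as a whole value
replaced = df.replace({"USA": "United States", "UK": "United Kingdom"})

# Map old column labels to new ones; unmentioned columns keep their names
renamed = df.rename(columns={"name": "full_name"})

# A function can also compute the new labels
upper = df.rename(columns=str.upper)
```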
Data cleaning and transformation
• .map(): This method is used to apply a function to each element
of a Series (for example, a single column). The function takes one
input and returns one output.
• # Applying function to column
• df['age'] = df['age'].map(lambda x: x * 2)
• df.head()
• .apply(): This method is used to apply a function along an axis of
a DataFrame: to each column (axis=0, the default) or to each row
(axis=1). The function receives each column or row as a Series and
can return a single value or a Series.
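A sketch contrasting element-wise map() with column-wise and row-wise apply(), using hypothetical data. The new column names age_doubled and label are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"name": ["An", "Binh"], "age": [25, 32], "salary": [5000, 7000]})

# Series.map: element-wise, one input -> one output
df["age_doubled"] = df["age"].map(lambda x: x * 2)

# DataFrame.apply with axis=0 (default): the function receives each column as a Series
col_max = df[["age", "salary"]].apply(lambda col: col.max())

# axis=1: the function receives each row as a Series
df["label"] = df.apply(lambda row: f"{row['name']} ({row['age']})", axis=1)
```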
Data explorations
• value_counts(): Called on a column (a Series), this method returns
the frequency count of each unique value, sorted in descending order.
• # Viewing the frequency counts for a column
• df['column_name'].value_counts()
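A sketch on a hypothetical country column; normalize=True returns proportions instead of raw counts.

```python
import pandas as pd

df = pd.DataFrame({"country": ["UK", "USA", "UK", "VN", "UK"]})

counts = df["country"].value_counts()                # most frequent value first
shares = df["country"].value_counts(normalize=True)  # proportions instead of counts
```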
Data explorations
• df.plot(): This method is used to create a variety of plots for the
DataFrame, including line, bar, and histogram plots, using Matplotlib
by default. You can specify the type of plot, the x and y columns,
and various other plot options.
Data explorations
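• df.corr(): This method is used to compute the pairwise correlation
of columns in a DataFrame (Pearson by default). In pandas 2.0 and
later, pass numeric_only=True if the DataFrame has non-numeric columns.
• # Viewing correlation between columns
• df.corr(numeric_only=True)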
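A sketch on hypothetical data that includes a text column, so numeric_only=True is needed. The years column is deliberately age minus 24, so its correlation with age is 1.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["An", "Binh", "Chi", "Dung"],
    "age": [25, 32, 28, 41],
    "salary": [5000, 7000, 6000, 8000],
    "years": [1, 8, 4, 17],  # exactly age - 24, a perfect linear relationship
})

# numeric_only=True skips the text column; the result is a symmetric matrix
corr = df.corr(numeric_only=True)
print(corr)
```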
Merging and joining data
