Introduction to Pandas Library in Python
1. Objective
• Getting started with pandas
• Introduction to pandas data structures
• NumPy vs pandas
• Series vs DataFrames
• Creating, reading, and writing data
• Indexing, selecting, and assigning data in DataFrames
• Grouping and sorting
• Handling missing values
• Using pandas DataFrames for fundamental data-science tasks
2. Getting Started with pandas:
In the previous lab, we covered NumPy in detail; it provides efficient storage and manipulation of
dense, typed arrays in Python. Here we will build on this knowledge by looking in detail at the data
structures provided by the pandas library. Pandas is a package built on top of NumPy that provides
an efficient implementation of a Series and DataFrame. In this lab, we will focus on the mechanics
of using Series, DataFrame, and related structures effectively.
Just as we generally import NumPy under the alias np, we will import Pandas under the alias pd:
import pandas as pd
This important convention will be used throughout the lab.
2.1 Introduction to pandas Data Structures
At the very basic level, Pandas objects can be thought of as enhanced versions of NumPy
structured arrays in which the rows and columns are identified with labels rather than simple
integer indices. To get started with pandas, you will need to get comfortable with its two workhorse
data structures: Series and DataFrame. While they are not a universal solution for every problem,
they provide a solid, easy-to-use basis for most applications.
Installing the pandas library
pip install pandas    # for regular Python environments
conda install pandas  # for Anaconda environments
!pip install pandas   # for Jupyter notebooks (applicable in our case)
2.2 Series
❖ A Pandas Series is a one-dimensional array of indexed data. It can be created from a
list or array as follows:
Example:
data = pd.Series([0.25, 0.5, 0.75, 1.0])
print(data)
Series wraps both a sequence of values and a sequence of indices, which we can access with the
values and index attributes.
print(data.values)  # output: [0.25 0.5 0.75 1. ]
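Similarly, the index attribute returns the index object; for the Series above it is a simple RangeIndex:
print(data.index)  # output: RangeIndex(start=0, stop=4, step=1)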
Like with a NumPy array, data can be accessed by the associated index via the familiar
Python square-bracket notation:
print(data[1])  # output: 0.5
For example, if we wish, we can use strings as an index:
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
print(data)
output:
a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64
and item access works as expected: data['b'] = ?
Important: We can even use noncontiguous or nonsequential indices:
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])
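With this index, item access uses the labels we supplied rather than positions:
print(data[5])  # output: 0.5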
2.3 DataFrame
A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered
collection of columns, each of which can be a different value type (numeric, string,
Boolean, etc.). The DataFrame has both a row and column index; it can be thought of as a
dict of Series.
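As a quick illustration (the column names and values below are made up purely for this sketch), a DataFrame can be built from a dict of Series:
names = pd.Series(['Ali', 'Sara', 'Bilal'])
marks = pd.Series([85, 92, 78])
df = pd.DataFrame({'Name': names, 'Marks': marks})
print(df)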
2.4 Reading Data files
Being able to create a DataFrame or Series by hand is handy. But, most of the time, we
won't actually be creating our own data by hand. Instead, we'll be working with data that
already exists.
Data can be stored in any of a number of different forms and formats. By far the most
basic of these is the humble CSV file. A CSV file is a table of values separated by commas,
hence the name: "Comma-Separated Values", or CSV.
Loading a file from the current working folder (e.g. C:\Users\<username>):
data = pd.read_csv("covid19.csv")
Loading a file from Google Drive if using Colab:
from google.colab import drive
drive.mount('/content/drive')
import pandas as pd
dataset = pd.read_csv('/content/drive/MyDrive/covid19.csv')
To load a file from any other specific location on your computer, use double backslashes in the path:
data = pd.read_csv("C:\\Users\\ABPKSUP\\Desktop\\AI Enabled Data Analytics -LUMS\\Class 4, Pandas Library\\covid19.csv")
We can use the shape attribute to check how large the resulting DataFrame is:
print(data.shape)
We can examine the contents of the resultant DataFrame using the head() command,
which grabs the first five rows:
print(data.head())
3. Accessing Data in a DataFrame
In Python, we can access a property of an object as an attribute. A book object, for
example, might have a title property, which we can access as book.title. Columns in a pandas
DataFrame work in much the same way:
print(data.Country)
Moreover, we can also access a specific value in the DataFrame:
print(data['Country'][2])
4. Indexing in pandas
The indexing operator and attribute selection are nice because they work just like they do in the rest
of the Python ecosystem. As a novice, this makes them easy to pick up and use. However, pandas
has its own accessor operators, loc and iloc. For more advanced operations, these are the ones
you're supposed to be using:
4.1 Index-based selection
Pandas indexing works in one of two paradigms. The first is index-based selection:
selecting data based on its numerical position in the data. iloc follows this paradigm.
To select the first row of data in a DataFrame, we may use the following:
print(data.iloc[0])
To get a column with iloc, we can do the following:
print(data.iloc[:,0])
To select the first column from just the first, second, and third rows, we would do:
print(data.iloc[:3,0])
It's also possible to pass a list:
print(data.iloc[[0, 2, 5], 0])
Finally, it's worth knowing that negative numbers can be used in selection. This will start counting
from the end of the values. So, for example, here are the last five rows of the dataset:
print(data.iloc[-5:])
4.2 Label-based selection
The second paradigm for attribute selection is the one followed by the loc operator: label-based
selection. In this paradigm, it's the data index value, not its position, that matters.
loc is label-based, meaning that you specify rows and columns by their row and column labels.
iloc is integer-position based, so you specify rows and columns by their integer positions
(0-based). Since your dataset usually has meaningful indices, it's usually easier to do things
using loc instead. For example, here's one operation that's much easier using loc:
print(data.loc[:, ['ObservationDate', 'Province/State', 'Country']])
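A small point worth remembering when using both accessors on the same covid19.csv DataFrame: loc slices are inclusive of the end label, while iloc slices exclude the end position:
print(data.iloc[0:3])  # rows at positions 0, 1 and 2 (three rows)
print(data.loc[0:3])   # rows with labels 0, 1, 2 and 3 (four rows, because loc is inclusive)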
5. Conditional Selection
So far we've been indexing various strides of data, using structural properties of the DataFrame
itself. To do interesting things with the data, however, we often need to ask questions based on
conditions.
For example, suppose that we are interested specifically in Anhui province cases:
print(data.loc[data['Province/State'] == 'Anhui'])
This returns a total of 494 records.
We can use OR (|) to combine two conditions (for AND you can use &, as shown in the sketch after this example):
print(data.loc[(data['Province/State'] == 'Anhui') | (data['Deaths'] == 0)])
This returns a total of 32,893 records.
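A minimal sketch of the corresponding AND condition using & on the same columns (the exact record count will of course depend on the dataset):
print(data.loc[(data['Province/State'] == 'Anhui') & (data['Deaths'] == 0)])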
Pandas comes with a few built-in conditional selectors, two of which we will highlight here.
The first is isin, which lets you select data whose value "is in" a list of values. For example,
cases in Peru or Spain:
print(data.loc[(data.Country.isin(['Peru','Spain']))])
The second is isnull or isna (and its companion notnull). These methods let you highlight
values which are (or are not) empty (NaN).
print(data.loc[(data.Country.notnull())])
print(data.loc[(data['Province/State'].notnull())])
To find records with null values:
print(data.loc[(data['Province/State'].isnull())])
print(data.loc[(data['Province/State'].isna())])
Assigning data:
Going the other way, assigning data to a DataFrame is easy.
Adding a new column to the DataFrame, here filled with a constant value:
data['New Column'] = 'dummy_values'
print(data['New Column'])
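Going further, we can also assign a list of values, as long as its length matches the number of rows; a small sketch (the column name here is made up):
data['Row Number'] = list(range(len(data)))
print(data['Row Number'])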
6. Functions and maps
To see a list of unique values we can use the unique() function:
print(data.Country.unique())
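Since this section is about functions and maps, here is a minimal sketch of map, which applies a function to every value in a Series (the uppercase transformation is chosen purely for illustration):
print(data.Country.map(lambda c: str(c).upper()))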
More functions are available with pandas; the most commonly used are listed in the table below
(there, pd means the pandas library, while df is the name of our dataset, like the name 'data' we
used above; it can be any name that we assign to our data).
Important: Writing to files
Just as read_csv loads a CSV file into a DataFrame, the to_csv method writes a DataFrame back out to a CSV file.
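A minimal sketch (the output file name here is just an example):
data.to_csv('covid19_copy.csv', index=False)  # index=False skips writing the row index as an extra column
Returning to the general-purpose functions mentioned above: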
Function | Description
df.describe() | Summary statistics for numerical columns
df.mean() | Returns the mean of all columns
df.corr() | Returns the correlation between columns in a DataFrame
df.count() | Returns the number of non-null values in each DataFrame column
df.max() | Returns the highest value in each column
df.min() | Returns the lowest value in each column
df.median() | Returns the median of each column
df.std() | Returns the standard deviation of each column
pd.to_datetime() | Converts a column to datetime format
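For instance, a quick sketch applying two of these functions to our covid19.csv DataFrame (this assumes the ObservationDate column holds date strings):
print(data.describe())
data['ObservationDate'] = pd.to_datetime(data['ObservationDate'])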
Combining:
Concatenate:
The concat function is used to concatenate two or more DataFrames along a particular axis (either rows or
columns).
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
# Concatenate along rows (axis=0)
result = pd.concat([df1, df2], axis=0)
print(result)
Output
A B
0 1 3
1 2 4
0 5 7
1 6 8
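Notice that the original row labels (0, 1, 0, 1) are kept. To get a fresh 0..3 index instead, pass ignore_index=True:
result = pd.concat([df1, df2], axis=0, ignore_index=True)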
# Concatenate along columns
result2 = pd.concat([df1, df2], axis=1)
print(result2)
Output
A B A B
0 1 3 5 7
1 2 4 6 8
Merge:
The merge function is used to merge two DataFrames based on a common column or index.
df1 = pd.DataFrame({'key': ['A', 'B'], 'value': [1, 2]})
df2 = pd.DataFrame({'key': ['A', 'B'], 'value': [3, 4]})
# Merge based on a common column 'key'
result = pd.merge(df1, df2, on='key')
print(result)
Output
key value_x value_y
0 A 1 3
1 B 2 4
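By default, merge performs an inner join, keeping only keys present in both DataFrames; the how parameter ('left', 'right', 'outer', 'inner') controls this. A small sketch with made-up data:
df3 = pd.DataFrame({'key': ['B', 'C'], 'value': [5, 6]})
result2 = pd.merge(df1, df3, on='key', how='outer')
print(result2)  # rows for keys A, B and C; missing values appear as NaN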
7. Practice Tasks
Practice Task 1
Create a DataFrame fruits that looks like this:
   Bananas  Apples
0       20      17
Practice Task 2
Create a DataFrame sales that looks like below:
           Bananas  Apples
September      200     120
October        165      90
Practice Task 3
Create a series marks that looks like:
Practice Task 4
Read the CSV dataset of Groceries given to you into a DataFrame called grocer.
Practice Task 5
Select the item description column from grocer and assign the result to the variable item_desc.
Practice Task 6
Select the first value from the member_number column of grocer and assign it to variable
first_member.
Practice Task 7
Select the first 20 rows of data and save them to the variable first_rows.
Practice Task 8
Create a variable selected_grocer containing the values of all columns, but only for purchases made in 2015.
Practice Task 9
Create a variable containing the member_number and date columns of the first 500 records.
Practice Task 10
Create a DataFrame containing purchases made in both 2014 and 2015.