Python With Pandas

The document is a Jupyter Notebook that provides an introduction to using the Pandas library in Python for data manipulation, focusing on Series and DataFrames. It covers the creation and manipulation of Series, including custom indexing, filtering, and mathematical operations, as well as how to create and modify DataFrames with various data types. The document includes examples of data processing techniques such as value assignment and handling missing data.

Uploaded by

ambeydeveloper

6/25/23, 10:02 PM Copy of Copy of IIT R DIXIT DATAFRAMES.ipynb - Colaboratory

#python pandas
#used to work with heterogeneous (tabular) data
#facilitate faster and easier data processing

import pandas as pd

# frequently used libraries

from pandas import Series, DataFrame

## Data structures: Series & DataFrames

## Series
# A sequence of values (similar to a 1-D array) with an explicit, customisable index for accessing the values
# like a fixed-length dict: a kind of mapping from index labels to data values
# Created using the pd.Series constructor
series1 = pd.Series([22,33,44,55]) # default index: 0 to n-1

series1

0 22
1 33
2 44
3 55
dtype: int64

# Series properties (attributes)

# 1. 'values'
#it is an array object
series1.values

array([22, 33, 44, 55])

# 2. index
# it is like range
series1.index

RangeIndex(start=0, stop=4, step=1)

#custom index: as a list of labels

series2 = pd.Series([22,33,44,55], index=['a','b','c','d'])

series2

a 22
b 33
c 44
d 55
dtype: int64

# check presence or absence of labels in a series

'c' in series2

True

'e' in series2

False

series2.values

array([22, 33, 44, 55])

series2.index # to run code press shift + enter

Index(['a', 'b', 'c', 'd'], dtype='object')

# accessing values using labeled indices

# series2(['b', 'a']) # error: parentheses are the wrong brackets; use square brackets

https://colab.research.google.com/drive/1KY8qW02fPGwZI8S3tRjhoj8Z1alZWqCf#scrollTo=iDltQkrkPOd6&uniqifier=1&printMode=true 1/17
series2[['b','a']]

b 33
a 22
dtype: int64

# the index-value link is preserved when operations are applied to a Series

# index labels stay aligned with their values in the result, much like named vectors in R
# filtering
series2[series2>0]

a 22
b 33
c 44
d 55
dtype: int64

series2[series2>34]

c 44
d 55
dtype: int64
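
The filter above works in two steps: the comparison builds a boolean Series on the same labels, and indexing with it keeps only the rows marked True. A minimal sketch that rebuilds series2 so it runs on its own (the names `mask` and `filtered` are introduced here only for illustration):

```python
import pandas as pd

series2 = pd.Series([22, 33, 44, 55], index=['a', 'b', 'c', 'd'])

# Step 1: the comparison returns a boolean Series with the same index
mask = series2 > 34

# Step 2: indexing with the mask keeps only the rows where it is True
filtered = series2[mask]

print(mask)
print(filtered)
```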

# scalar multiplication
series2*9 # Broadcasting is done here.

a 198
b 297
c 396
d 495
dtype: int64

#math operation
import numpy as np
np.exp(series2) # exponential of series values

a 3.584913e+09
b 2.146436e+14
c 1.285160e+19
d 7.694785e+23
dtype: float64
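
Both operations above return a new Series that carries the original labels. The sketch below checks this, assuming the same series2 values; `np.log` stands in here for any NumPy element-wise function, and the names `scaled` and `logged` are illustrative.

```python
import numpy as np
import pandas as pd

series2 = pd.Series([22, 33, 44, 55], index=['a', 'b', 'c', 'd'])

scaled = series2 * 9        # the scalar is broadcast to every element
logged = np.log(series2)    # NumPy ufuncs return a Series, not a bare array

print(scaled.index.equals(series2.index))
print(type(logged).__name__)
```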

# A Python dict can be converted into a Series: keys become the index, values become the data

# PIN_code = {'Roorkee'= 247667,'Dehradun'= 248001, 'Haridwar' = 249401}
# the line above is invalid syntax: '=' must be replaced with ':'
PIN_code = {'Roorkee': 247667, 'Dehradun': 248001, 'Haridwar': 249401}

series3 = pd.Series(PIN_code)

series3

Roorkee 247667
Dehradun 248001
Haridwar 249401
dtype: int64

series3.index

Index(['Roorkee', 'Dehradun', 'Haridwar'], dtype='object')

# METHOD 1 of changing INDEX

# changing index using index argument
# e.g., order of indices
cities = ['Dehradun', 'Roorkee', 'Haridwar', 'Tehri garhwal']

series4 = pd.Series(PIN_code, index=cities)

series4

Dehradun 248001.0
Roorkee 247667.0
Haridwar 249401.0
Tehri garhwal NaN
dtype: float64

# detect missing data

# using the isnull and notnull functions
pd.isnull(series4)

Dehradun False
Roorkee False
Haridwar False
Tehri garhwal True
dtype: bool

pd.notnull(series4)

Dehradun True
Roorkee True
Haridwar True
Tehri garhwal False
dtype: bool

# using isnull and notnull methods

series4.isnull()

Dehradun False
Roorkee False
Haridwar False
Tehri garhwal True
dtype: bool

series4.notnull()

Dehradun True
Roorkee True
Haridwar True
Tehri garhwal False
dtype: bool
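
Because the boolean results above are themselves Series, they can be summed or used as filters. A small sketch under the assumption that series4 holds the values shown earlier (it is rebuilt here so the cell is self-contained; `n_missing` and `present` are illustrative names):

```python
import numpy as np
import pandas as pd

series4 = pd.Series({'Dehradun': 248001.0, 'Roorkee': 247667.0,
                     'Haridwar': 249401.0, 'Tehri garhwal': np.nan})

n_missing = series4.isnull().sum()      # True counts as 1, so this counts the NaN entries
present = series4[series4.notnull()]    # boolean filter keeps only the non-missing rows

print(n_missing)
print(present)
```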

# alignment feature
series3 + series4 # values are added label by label; a label missing on either side gives NaN

Dehradun 496002.0
Haridwar 498802.0
Roorkee 495334.0
Tehri garhwal NaN
dtype: float64
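
Alignment matches rows by label, not by position, and the result index is the union of both indexes. The sketch below uses two small made-up Series (`a` and `b`, not from the notebook) and also shows `Series.add` with `fill_value`, which treats a label missing on one side as a given value instead of producing NaN.

```python
import pandas as pd

# Illustrative data, chosen so that only 'Dehradun' appears on both sides
a = pd.Series({'Roorkee': 1, 'Dehradun': 2})
b = pd.Series({'Dehradun': 10, 'Haridwar': 20})

plain = a + b                     # labels present on only one side become NaN
filled = a.add(b, fill_value=0)   # a missing side is treated as 0 instead

print(plain)
print(filled)
```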

# name attributes of series object and its index

series4.name = 'PIN_code' #see last line of below execution result

series4.index.name = 'city' # to specify name of index

series4

city
Dehradun 248001.0
Roorkee 247667.0
Haridwar 249401.0
Tehri garhwal NaN
Name: PIN_code, dtype: float64

# METHOD 2 of changing INDEX

# changing index using index attribute
# e.g index values
series1

0 22
1 33
2 44
3 55
dtype: int64

series1.index = [1,2,3,4]

series1

1 22
2 33
3 44
4 55
dtype: int64
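
Assigning to the index attribute replaces all labels at once, so the new list has to have exactly as many entries as the Series has rows. A minimal sketch of both cases (the variable `mismatch_ok` is only there to record what happened):

```python
import pandas as pd

s = pd.Series([22, 33, 44, 55])
s.index = [1, 2, 3, 4]          # same length: labels are replaced in place

try:
    s.index = [1, 2, 3]         # wrong length: pandas refuses
    mismatch_ok = True
except ValueError as err:
    mismatch_ok = False
    print("ValueError:", err)
```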

# DataFrame: represents 2-D data in tabular form (rows and columns)

# It is like a dict of Series that share a common (row) index

# it has both a row index and a column index

## the row index is similar to a Series index (default 0 to n-1)

# the column index holds the variable names (recent pandas keeps dict insertion order; older releases sorted them alphabetically)

# columns can hold different value types (numeric, string, boolean, etc.)

# created using the pd.DataFrame constructor

# let's take year-wise metro population data

# a DataFrame is like a dict in which each key (column name) maps to multiple elements
dict1 = {'metro':['Delhi','Delhi','Delhi','Mumbai', 'Mumbai', 'Mumbai'],
'year': [2011,2012,2013,2011,2012,2013],
'popcr':[1.67,1.72,1.77,2.07,2.11,2.15]}

# common mistakes: the class name is case-sensitive
# df = pd.dataframe(dict1)   # AttributeError
# df = pd.Dataframe(dict1)   # AttributeError

df = pd.DataFrame(dict1)
df

metro year popcr

0 Delhi 2011 1.67

1 Delhi 2012 1.72

2 Delhi 2013 1.77

3 Mumbai 2011 2.07

4 Mumbai 2012 2.11

5 Mumbai 2013 2.15
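
Each column of a DataFrame keeps its own data type, which is what lets text, integers and floats sit side by side. A short sketch on a two-row subset of the data above (the subset is an assumption made to keep the example small):

```python
import pandas as pd

dict1 = {'metro': ['Delhi', 'Mumbai'],
         'year': [2011, 2011],
         'popcr': [1.67, 2.07]}
df = pd.DataFrame(dict1)

print(df.dtypes)          # one dtype per column
print(list(df.columns))   # column order follows the dict's key order
```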

# first five rows

df.head()

metro year popcr

0 Delhi 2011 1.67

1 Delhi 2012 1.72

2 Delhi 2013 1.77

3 Mumbai 2011 2.07

4 Mumbai 2012 2.11

# changing the column index using columns argument

pd.DataFrame(dict1, columns =['year','metro','popcr'])

year metro popcr

0 2011 Delhi 1.67

1 2012 Delhi 1.72

2 2013 Delhi 1.77

3 2011 Mumbai 2.07

4 2012 Mumbai 2.11

5 2013 Mumbai 2.15

# changing the row index using the index argument

df2 = pd.DataFrame(dict1, columns =['year','metro','popcr','area'],index=list(range(1,7)))
df2

# row and column index both changed.

year metro popcr area

1 2011 Delhi 1.67 NaN

2 2012 Delhi 1.72 NaN

3 2013 Delhi 1.77 NaN

4 2011 Mumbai 2.07 NaN

5 2012 Mumbai 2.11 NaN

6 2013 Mumbai 2.15 NaN

# column index
df2.columns

Index(['year', 'metro', 'popcr', 'area'], dtype='object')

# accessing column using dict key like notation

df2['popcr']

1 1.67
2 1.72
3 1.77
4 2.07
5 2.11
6 2.15
Name: popcr, dtype: float64

# accessing column using attribute like notation

df2.year

1 2011
2 2012
3 2013
4 2011
5 2012
6 2013
Name: year, dtype: int64
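
The two notations return the same column, but attribute access only works when the column name does not clash with an existing DataFrame attribute or method. The sketch below uses a hypothetical column called 'count' to show the clash; bracket notation is the safe fallback.

```python
import pandas as pd

# 'count' is a made-up column name chosen because DataFrame.count is a method
df = pd.DataFrame({'year': [2011, 2012], 'count': [5, 7]})

same = df['year'].equals(df.year)   # both notations return the same column
is_method = callable(df.count)      # 'count' resolves to the DataFrame method
col = df['count']                   # brackets always reach the column

print(same, is_method)
print(col)
```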

# selecting particular row indices along a column

df2['popcr'][1:4] # positional slice: positions 1 to 3, which carry the labels 2 to 4

2 1.72
3 1.77
4 2.07
Name: popcr, dtype: float64

df2['popcr'][:5] # only stopping point

1 1.67
2 1.72
3 1.77
4 2.07
5 2.11
Name: popcr, dtype: float64

df2['popcr'][:-3]

1 1.67
2 1.72
3 1.77
Name: popcr, dtype: float64

df2['popcr'][:] # no start stop, all observations shown

1 1.67
2 1.72
3 1.77
4 2.07
5 2.11
6 2.15
Name: popcr, dtype: float64

# accessing row using 'loc' attribute (locate)

df2.loc[:]

year metro popcr area

1 2011 Delhi 1.67 NaN

2 2012 Delhi 1.72 NaN

3 2013 Delhi 1.77 NaN

4 2011 Mumbai 2.07 NaN

5 2012 Mumbai 2.11 NaN

6 2013 Mumbai 2.15 NaN

df2.loc[1]

year 2011
metro Delhi
popcr 1.67
area NaN
Name: 1, dtype: object

## Data processing & transformation

# value assignment
# e.g area column
df2['area']=6000

df2

year metro popcr area

1 2011 Delhi 1.67 6000

2 2012 Delhi 1.72 6000

3 2013 Delhi 1.77 6000

4 2011 Mumbai 2.07 6000

5 2012 Mumbai 2.11 6000

6 2013 Mumbai 2.15 6000

# populating values
#e.g area column
df2['area'] = np.arange(6000,6600,100) # (start, stop, step)
df2

year metro popcr area

1 2011 Delhi 1.67 6000

2 2012 Delhi 1.72 6100

3 2013 Delhi 1.77 6200

4 2011 Mumbai 2.07 6300

5 2012 Mumbai 2.11 6400

6 2013 Mumbai 2.15 6500

# using a Series to populate values

# unmatched indices are treated as missing values
val = pd.Series([1.7,1.9,2.1], index=[2,4,6])
df2['popcr'] = val
df2

year metro popcr area

1 2011 Delhi NaN 6000

2 2012 Delhi 1.7 6100

3 2013 Delhi NaN 6200

4 2011 Mumbai 1.9 6300

5 2012 Mumbai NaN 6400

6 2013 Mumbai 2.1 6500
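
Assigning a Series to a column aligns on the row labels: rows without a matching label get NaN, and Series labels that do not exist in the frame are silently ignored. A minimal sketch on a small made-up frame (`frame` and the value 9.9 are illustrative, not from the notebook):

```python
import pandas as pd

frame = pd.DataFrame({'year': [2011, 2012, 2013]}, index=[1, 2, 3])
val = pd.Series([1.7, 9.9], index=[2, 4])   # label 4 does not exist in frame

frame['popcr'] = val    # aligned on labels: rows 1 and 3 get NaN, label 4 is ignored

print(frame)
```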

# assigning values to a non existing column will also create that column
df2['northern'] = (df2.metro == 'Delhi') # kind of boolean column

df2

year metro popcr area northern

1 2011 Delhi NaN 6000 True

2 2012 Delhi 1.7 6100 True

3 2013 Delhi NaN 6200 True

4 2011 Mumbai 1.9 6300 False

5 2012 Mumbai NaN 6400 False

6 2013 Mumbai 2.1 6500 False

# removing a column
del df2['northern']

df2.columns

Index(['year', 'metro', 'popcr', 'area'], dtype='object')

#filling values when reindexing

# using ffill method
series10 = pd.Series(['blue','purple','yellow'], index=[0,2,4])

## more on 'index objects' in series and DataFrame

# immutable array-like objects
# Also store metadata other than axis labels (index values)

# sequences of labels used while creating a Series or DataFrame are internally converted into Index objects
series5 = pd.Series(range(3), index=['a','b','c']) # index labels go in square brackets

series5.index

Index(['a', 'b', 'c'], dtype='object')

#subsetting
ind1 = series5.index
ind1[1:] # mention only the starting point

Index(['b', 'c'], dtype='object')

ind1[0:2]

Index(['a', 'b'], dtype='object')

#immutability
ind1[1] = 'd'
# output is error : type error
# index does not support mutable operation

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-126-65fbb217ef82> in <cell line: 2>()
1 #immutability
----> 2 ind1[1] = 'd'
3 # output is error : type error
4 # index does not support mutable operation

/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in
__setitem__(self, key, value)
5300 @final
5301 def __setitem__(self, key, value):
-> 5302 raise TypeError("Index does not support mutable operations")
5303
5304 def __getitem__(self, key):

TypeError: Index does not support mutable operations
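
Since a single label cannot be edited in place, labels are changed by building a new index. One way to do that, sketched below, is `Series.rename` with a mapping from old to new labels; it returns a new Series and leaves the original untouched.

```python
import pandas as pd

series5 = pd.Series(range(3), index=['a', 'b', 'c'])

try:
    series5.index[1] = 'd'              # in-place edit of an Index is rejected
except TypeError as err:
    print("TypeError:", err)

renamed = series5.rename({'b': 'd'})    # returns a new Series with a new Index

print(renamed.index)
```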

# creating index using Index function

ind2 = pd.Index(np.arange(3))

ind2

Int64Index([0, 1, 2], dtype='int64')

series6 = pd.Series([1.5,-2.5,0], index=ind2)

series6

0 1.5
1 -2.5
2 0.0
dtype: float64

series6.index is ind2

True

# Index objects can have duplicate elements

ind3 = pd.Index(['beta','alpha','beta','sigma'])

ind3

Index(['beta', 'alpha', 'beta', 'sigma'], dtype='object')

series7 = pd.Series([1.1,1.2,1.3,1.4], index=ind3)

series7

beta 1.1
alpha 1.2
beta 1.3
sigma 1.4
dtype: float64

# two rows share the index label 'beta'; let's see what selecting it returns

series7['beta'] # shows all rows with label beta

beta 1.1
beta 1.3
dtype: float64
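
With duplicate labels the return type of a lookup depends on the label: a unique label gives a scalar, a repeated one gives a Series. The sketch below rebuilds series7 and uses `Index.is_unique` to check for duplicates up front (`one` and `many` are illustrative names).

```python
import pandas as pd

series7 = pd.Series([1.1, 1.2, 1.3, 1.4],
                    index=['beta', 'alpha', 'beta', 'sigma'])

print(series7.index.is_unique)   # tells whether every label occurs only once

one = series7['alpha']           # unique label: a single value
many = series7['beta']           # duplicated label: a Series with both rows

print(one)
print(many)
```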

series10

0 blue
2 purple
4 yellow
dtype: object

series11 = series10.reindex(range(6), method = 'ffill')

series11

0 blue
1 blue
2 purple
3 purple
4 yellow
5 yellow
dtype: object
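
The `method` argument decides where the value for a new label comes from. As a sketch, the example below repeats the forward fill from above and contrasts it with `'bfill'`, which copies from the next existing label instead; label 5 has no later neighbour, so it stays missing.

```python
import pandas as pd

series10 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

forward = series10.reindex(range(6), method='ffill')    # copy the previous label's value
backward = series10.reindex(range(6), method='bfill')   # copy the next label's value

print(forward)
print(backward)
```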

# reindexing with dataframe

df4 = pd.DataFrame((np.arange(9).reshape(3,3)),index=['a','c','d'], columns=['Delhi','Mumbai','Bangalore'])

df4

Delhi Mumbai Bangalore

a 0 1 2

c 3 4 5

d 6 7 8

#reindexing rows index

df5 = df4.reindex(['a','b','c','d'])

df5

Delhi Mumbai Bangalore

a 0.0 1.0 2.0

b NaN NaN NaN

c 3.0 4.0 5.0

d 6.0 7.0 8.0

#reindexing columns index
#using columns argument
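# note: 'Banglore' is misspelled and matches no existing column, so the result has an all-NaN 'Banglore' column and no 'Bangalore' column
df5.reindex(columns=['Banglore','Delhi','Mumbai'])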

Banglore Delhi Mumbai

a NaN 0.0 1.0

b NaN NaN NaN

c NaN 3.0 4.0

d NaN 6.0 7.0

# dropping cases or rows

df5.drop('b') # pass a list to drop several rows, e.g. df5.drop(['a','c'])

Delhi Mumbai Bangalore

a 0.0 1.0 2.0

c 3.0 4.0 5.0

d 6.0 7.0 8.0

#Dropping columns or variables

df5.drop('Bangalore', axis=1) # axis=1 must be specified so that pandas looks for the label among the columns

Delhi Mumbai

a 0.0 1.0

b NaN NaN

c 3.0 4.0

d 6.0 7.0

# df5.drop(['Delhi','Mumbai'], axis=1) # dropping multiple columns

# using in-place version of drop method

# using the inplace argument
df5

Delhi Mumbai Bangalore

a 0.0 1.0 2.0

b NaN NaN NaN

c 3.0 4.0 5.0

d 6.0 7.0 8.0

df5.drop('b', inplace = True) # modifies df5 itself and returns None

df5

Delhi Mumbai Bangalore

a 0.0 1.0 2.0

c 3.0 4.0 5.0

d 6.0 7.0 8.0

# Ex. access, selection, and filtering

# Working with a series
series11

0 blue
1 blue
2 purple
3 purple
4 yellow
5 yellow
dtype: object

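# note: despite its name this object is built with pd.DataFrame, not pd.Series, so [] on it looks up column labels and the row-label lookups on series7_1 raise KeyError
series7_1 = pd.DataFrame([1.1,1.3,1.3,1.3],index= ['beta','alpha','beta','sigma'])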

series7_1

beta 1.1

alpha 1.3

beta 1.3

sigma 1.3

# accessing one row

series11[3] # square brackets

'purple'

series7_1['sigma'] # square brackets with a quoted label

# 1.3

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-153-8bdd86d373a9> in <cell line: 1>()
----> 1 series7_1['sigma'] # square brackets with a quoted label
2 # 1.3

1 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/range.py in
get_loc(self, key, method, tolerance)
393 raise KeyError(key) from err
394 self._check_indexing_error(key)
--> 395 raise KeyError(key)
396 return super().get_loc(key, method=method, tolerance=tolerance)
397

KeyError: 'sigma'

# accessing a range of rows

series11[3:5]

3 purple
4 yellow
dtype: object

# selecting few rows

series7_1[['sigma','alpha']] # always use square brackets for indexing

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-156-b23e739c9d06> in <cell line: 2>()
1 # selecting few rows
----> 2 series7_1[['sigma','alpha']] # always use square brackets for indexing

2 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in
_raise_if_missing(self, key, indexer, axis_name)
6128 if use_interval_msg:
6129 key = list(key)
-> 6130 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
6131
6132 not_found = list(ensure_index(key)[missing_mask.nonzero()
[0]].unique())

KeyError: "None of [Index(['sigma', 'alpha'], dtype='object')] are in the [columns]"

# Using Earlier created series 7

# selecting few rows
series7[['sigma','alpha']]

sigma 1.4
alpha 1.2
dtype: float64

# filtering rows based on some logic
series7[series7<1.3]

beta 1.1
alpha 1.2
dtype: float64

# slicing

# Note: with label-based slices the end point is inclusive in Series and DataFrame

# series7['alpha','sigma'] raises an error; use a colon for a slice
series7['alpha':'sigma']

alpha 1.2
beta 1.3
sigma 1.4
dtype: float64
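
The inclusive end point applies to label slices only; positional slices follow the usual Python rule and stop before the end. A small sketch with made-up unique labels ('w' to 'z'), using `loc` and `iloc` to make the two kinds of slice explicit:

```python
import pandas as pd

s = pd.Series([1.1, 1.2, 1.3, 1.4], index=['w', 'x', 'y', 'z'])

by_label = s.loc['x':'z']      # label slice: 'z' is included
by_position = s.iloc[1:3]      # position slice: position 3 is excluded

print(by_label)
print(by_position)
```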

series7['alpha':'sigma'] = 1.3 # assigning 1.3 to every row in the slice

series7 # values assigned

beta 1.1
alpha 1.3
beta 1.3
sigma 1.3
dtype: float64

series7['beta']

beta 1.1
beta 1.3
dtype: float64

## interaction mechanism with the data in a series or dataframe

# Ex. reindexing
# using the reindex method
series8 = pd.Series([4.5, 7.2, -5.3, 3.6], index = ['d', 'b', 'a', 'c'])

#pd.Series([],index= [])

series8

d 4.5
b 7.2
a -5.3
c 3.6
dtype: float64

series9 = series8.reindex(['a','b','c','d','e'])

series9

a -5.3
b 7.2
c 3.6
d 4.5
e NaN
dtype: float64

# filling values when reindexing

# using ffill method

series10 = pd.Series(['blue','purple','yellow'],index = [0,2,4])

series10

0 blue
2 purple
4 yellow
dtype: object

series11 = series10.reindex(range(6), method = 'ffill')

series11

0 blue
1 blue
2 purple
3 purple
4 yellow
5 yellow
dtype: object

# reindexing with dataframe

df4 = pd.DataFrame(np.arange(9).reshape((3,3)), index=['a','c','d'], columns = ['Delhi','Mumbai','Banglore'])

# Common mistakes
# 1. unbalanced quotes, e.g. index=['a','b',c','d']
# 2. mismatched brackets

df4

Delhi Mumbai Banglore

a 0 1 2

c 3 4 5

d 6 7 8

# reindexing rows index

df5 = df4.reindex(['a','b','c','d'])

df5

Delhi Mumbai Banglore

a 0.0 1.0 2.0

b NaN NaN NaN

c 3.0 4.0 5.0

d 6.0 7.0 8.0

# reindexing column index, using column argument

df5.reindex(columns = ['Banglore','Delhi','Mumbai'])

Banglore Delhi Mumbai

a 2.0 0.0 1.0

b NaN NaN NaN

c 5.0 3.0 4.0

d 8.0 6.0 7.0

# reindexing using loc attribute

df4.loc[['a','b','c','d'],['Banglore','Delhi','Mumbai']]

# this worked in older pandas releases; recent versions raise KeyError because label 'b' is not in the index
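
A sketch of the difference, assuming a recent pandas release: `loc` with a list that contains an absent label raises KeyError, while `reindex` accepts new labels and fills them with NaN. The flag `loc_raised` is only there to record the outcome.

```python
import numpy as np
import pandas as pd

df4 = pd.DataFrame(np.arange(9).reshape(3, 3),
                   index=['a', 'c', 'd'],
                   columns=['Delhi', 'Mumbai', 'Banglore'])

try:
    df4.loc[['a', 'b', 'c', 'd'], ['Banglore', 'Delhi', 'Mumbai']]
    loc_raised = False
except KeyError:
    loc_raised = True           # label 'b' does not exist in df4

# reindex accepts labels that do not exist yet and fills them with NaN
result = df4.reindex(index=['a', 'b', 'c', 'd'],
                     columns=['Banglore', 'Delhi', 'Mumbai'])

print(loc_raised)
print(result)
```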

A function that takes the passenger name as input, checks which booking class the passenger belongs to, and returns the average delay time.

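# Accept passenger name input from user
# (df here must be a table with 'Name', 'Booking Class' and 'Delay Minutes' columns, not the metro DataFrame built earlier)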

passenger_name = input("Enter passenger name: ")

booking_class = df.loc[df['Name'] == passenger_name , 'Booking Class'].iloc[0]

print("Booking class for passenger", passenger_name, ":", booking_class)
delay_min = df.loc[df['Name'] == passenger_name , 'Delay Minutes'].iloc[0]
print("Delay minutes for passenger", passenger_name, ":", delay_min)

Enter passenger name: Amber Mcclure
Booking class for passenger Amber Mcclure : First
Delay minutes for passenger Amber Mcclure : 8
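
The cell above looks up one passenger's own delay; the task statement asks for a function that returns the average delay of the passenger's booking class. One possible sketch follows. The table `flights`, its rows other than the name shown above, and the function name `average_delay_for` are all hypothetical, since the notebook's real flight data is not shown.

```python
import pandas as pd

# Hypothetical sample data; the notebook's real flight table is not shown
flights = pd.DataFrame({
    'Name': ['Amber Mcclure', 'Passenger B', 'Passenger C'],
    'Booking Class': ['First', 'Economy', 'First'],
    'Delay Minutes': [8, 20, 12],
})

def average_delay_for(name, data):
    """Return (booking class, mean delay of that class), or None if name is absent."""
    match = data.loc[data['Name'] == name, 'Booking Class']
    if match.empty:
        return None
    booking_class = match.iloc[0]
    same_class = data.loc[data['Booking Class'] == booking_class, 'Delay Minutes']
    return booking_class, same_class.mean()

print(average_delay_for('Amber Mcclure', flights))
```

Checking `match.empty` first avoids the IndexError that `.iloc[0]` would raise for a name that is not in the table.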

# Ex. dropping cases or rows

## using drop method

df5

Delhi Mumbai Banglore

a 0.0 1.0 2.0

b NaN NaN NaN

c 3.0 4.0 5.0

d 6.0 7.0 8.0

df5.drop('b')

Delhi Mumbai Banglore

a 0.0 1.0 2.0

c 3.0 4.0 5.0

d 6.0 7.0 8.0

# df5.drop('a','c') raises an error
# several labels have to be passed as one list
df5.drop(['a','c'])

Delhi Mumbai Banglore

b NaN NaN NaN

d 6.0 7.0 8.0

df5.drop('Banglore', axis=1)

Delhi Mumbai

a 0.0 1.0

b NaN NaN

c 3.0 4.0

d 6.0 7.0

# dropping multiple columns

# pass the column labels as a list in square brackets, followed by axis=1

df5.drop(['Delhi','Mumbai'], axis=1)

Banglore

a 2.0

b NaN

c 5.0

d 8.0
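
`drop` also accepts the keywords `index=` and `columns=`, which say directly which axis is meant and give the same result as `axis=`. A short sketch on a frame like df4 (rebuilt here; `by_axis`, `by_keyword` and `rows` are illustrative names). None of these calls changes the original frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3),
                  index=['a', 'c', 'd'],
                  columns=['Delhi', 'Mumbai', 'Bangalore'])

by_axis = df.drop(['Delhi', 'Mumbai'], axis=1)      # axis=1 means columns
by_keyword = df.drop(columns=['Delhi', 'Mumbai'])   # same result, spelled out
rows = df.drop(index=['a'])                         # keyword form for rows

print(by_keyword)
print(rows)
```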

# using in-place version of drop method

#using inplace argument

df5

Delhi Mumbai Banglore

a 0.0 1.0 2.0

b NaN NaN NaN

c 3.0 4.0 5.0

d 6.0 7.0 8.0

df5.drop('b', inplace = True)

df5

Delhi Mumbai Banglore

a 0.0 1.0 2.0

c 3.0 4.0 5.0

d 6.0 7.0 8.0

# access, selection, filtering

# working with a series

series11

0 blue
1 blue
2 purple
3 purple
4 yellow
5 yellow
dtype: object

series7

beta 1.1
alpha 1.3
beta 1.3
sigma 1.3
dtype: float64

# accessing one row

series11[3] #integer index 3

'purple'

series7['sigma'] # string index

1.3

# accessing a range of rows

series11[3:5]

3 purple
4 yellow
dtype: object

# selecting a few rows

series7[['sigma','alpha']]

sigma 1.3
alpha 1.3
dtype: float64

series11[[1,5]] # selecting row index 1 & 5

1 blue
5 yellow
dtype: object

# filtering rows based on some logic

series7[series7<1.3]

beta 1.1
dtype: float64

# slicing
# Note: the end point is inclusive for label-based slices on Series and DataFrame

series7['alpha':'sigma']

# series7['alpha','sigma'] does not work

alpha 1.3
beta 1.3
sigma 1.3
dtype: float64

series7['alpha':'sigma'] = 1.3

series7

beta 1.1
alpha 1.3
beta 1.3
sigma 1.3
dtype: float64

#### New Topic Starts

# working with DataFrame
df5

Delhi Mumbai Bangalore

a 0.0 1.0 2.0

c 3.0 4.0 5.0

d 6.0 7.0 8.0

# accessing columns
df5['Delhi']

a 0.0
c 3.0
d 6.0
Name: Delhi, dtype: float64

df5[['Delhi','Bangalore']]

Delhi Bangalore

a 0.0 2.0

c 3.0 5.0

d 6.0 8.0

# slicing rows
df5[:2] # starting point not specified

Delhi Mumbai Bangalore

a 0.0 1.0 2.0

c 3.0 4.0 5.0

df5[1:2]

Delhi Mumbai Bangalore

c 3.0 4.0 5.0

# filtering rows
df5[df5['Bangalore']>5]

Delhi Mumbai Bangalore

d 6.0 7.0 8.0

# working with a boolean dataframe

df5 < 5

Delhi Mumbai Bangalore

a True True True

c True True False

d False False False

df5[df5<5] = 0
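
The assignment above changes df5 in place. When the original values are still needed, `DataFrame.mask` offers a non-mutating alternative: it returns a new frame in which the cells where the condition is True are replaced. A minimal sketch on a frame with the same starting values (the names `cond` and `zeroed` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9.0).reshape(3, 3),
                  index=['a', 'c', 'd'],
                  columns=['Delhi', 'Mumbai', 'Bangalore'])

cond = df < 5                 # boolean DataFrame, same shape and labels as df
zeroed = df.mask(cond, 0)     # new frame: cells where cond is True become 0

print(zeroed)
print(df)                     # the original is unchanged
```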

df5

Delhi Mumbai Bangalore

a 0.0 0.0 0.0

c 0.0 0.0 5.0

d 6.0 7.0 8.0

## indexing operators loc and iloc: these give precise indexing
# loc is used with axis labels
# iloc is used with integer values corresponding to axis labels
df5

Delhi Mumbai Bangalore

a 0.0 0.0 0.0

c 0.0 0.0 5.0

d 6.0 7.0 8.0

# syntax: dataframe.loc[row, column] with square brackets

# accessing a single element
df5.loc['a','Delhi']

0.0

# df5.loc['a','Delhi','Bangalore'] raises an indexing error (too many indexers); pass the columns as a list

df5.loc['a',['Delhi','Banglore']]

Delhi 0.0
Banglore 2.0
Name: a, dtype: float64

# df5.loc['a','b'],['Delhi','Bangalore']
# not allowed now; earlier versions returned a result

df5.iloc[0,0] #[row, column]

0.0

df5.iloc[0,[0,1]] # selecting two columns

Delhi 0.0
Mumbai 0.0
Name: a, dtype: float64

df5.iloc[[0,1],[0,1]] # 2 row , 2 columns

Delhi Mumbai

a 0.0 0.0

c 0.0 0.0

df5.iloc[1] # second row

Delhi 0.0
Mumbai 0.0
Bangalore 5.0
Name: c, dtype: float64

df5.iloc[[1,2],[2,0,1]] # 2 row , 3 columns

Bangalore Delhi Mumbai

c 5.0 0.0 0.0

d 8.0 6.0 7.0

# slicing with loc and iloc

df5.loc[:'c', 'Delhi']

a 0.0
c 0.0
Name: Delhi, dtype: float64

df5.loc['c':'d', 'Delhi']

c 0.0
d 6.0
Name: Delhi, dtype: float64

df5.loc['c':'d', 'Delhi':'Bangalore']

Delhi Mumbai Bangalore

c 0.0 0.0 5.0

d 6.0 7.0 8.0

df5.iloc[:, 1:2]
# ':' on the row side selects all rows
# on the column side 1:2 selects only position 1 (Mumbai), because the end of a positional slice is exclusive

Mumbai

a 0.0

c 0.0

d 7.0

df5.iloc[:,0:2][df5.Mumbai != 0]
# filtering combined with slicing
# all rows and the columns at positions 0 and 1 are selected first
# then only the rows whose Mumbai value is not 0 are kept

Delhi Mumbai

d 6.0 7.0
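
The same selection can be written as a single `loc` call, with the boolean condition on the row side and the column labels on the column side. The sketch below rebuilds df5 in its current state and checks that both forms agree (`chained` and `single` are illustrative names).

```python
import pandas as pd

df5 = pd.DataFrame({'Delhi': [0.0, 0.0, 6.0],
                    'Mumbai': [0.0, 0.0, 7.0],
                    'Bangalore': [0.0, 5.0, 8.0]},
                   index=['a', 'c', 'd'])

chained = df5.iloc[:, 0:2][df5.Mumbai != 0]                 # two steps, as in the notebook
single = df5.loc[df5['Mumbai'] != 0, ['Delhi', 'Mumbai']]   # one step with loc

print(single)
```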

# # more on integer indexing

series11

0 blue
1 blue
2 purple
3 purple
4 yellow
5 yellow
dtype: object

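# fails: series11 has an integer index, so -3 is looked up as a label, not as a position

series11[-3]

series7[-1] # does not fail: the index holds strings, so -1 falls back to a position (deprecated in recent pandas; prefer series7.iloc[-1])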

1.3
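
Using `iloc` for positions and `loc` for labels removes the ambiguity whatever the index type is. A minimal sketch with series11 rebuilt on its default integer index (the flag `raised` only records that the plain bracket lookup fails):

```python
import pandas as pd

series11 = pd.Series(['blue', 'blue', 'purple', 'purple', 'yellow', 'yellow'])

try:
    series11[-3]                     # integer index: -3 is looked up as a label
    raised = False
except KeyError:
    raised = True

third_from_end = series11.iloc[-3]   # position-based, never ambiguous
by_label = series11.loc[3]           # label-based, never ambiguous

print(raised, third_from_end, by_label)
```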
