Python With Pandas

The document is a Jupyter Notebook that provides an introduction to using the Pandas library in Python for data manipulation, focusing on Series and DataFrames. It covers the creation and manipulation of Series, including custom indexing, filtering, and mathematical operations, as well as how to create and modify DataFrames with various data types. The document includes examples of data processing techniques such as value assignment and handling missing data.

Uploaded by

ambeydeveloper
Copyright
© All Rights Reserved
6/25/23, 10:02 PM Copy of Copy of IIT R DIXIT DATAFRAMES.ipynb - Colaboratory

# Python pandas
# Used to work with heterogeneous (tabular) data
# Facilitates faster and easier data processing

import pandas as pd

# frequently used libraries


from pandas import Series, DataFrame

## Data structures: Series & Dataframes

## Series
# A sequence of values (similar to a 1-D array) with an explicit, customizable index for accessing the values
# Like a fixed-length dict: a kind of mapping from index values to data values
# Created with the pd.Series constructor
series1 = pd.Series([22,33,44,55]) # default index: 0 to n-1

series1

0 22
1 33
2 44
3 55
dtype: int64

# Series properties (attributes)


# 1. 'values'
#it is an array object
series1.values

array([22, 33, 44, 55])

# 2. index
# it is like range
series1.index

RangeIndex(start=0, stop=4, step=1)

#custom index: as a list of labels


series2 = pd.Series([22,33,44,55], index=['a','b','c','d'])

series2

a 22
b 33
c 44
d 55
dtype: int64

# check presence or absence of labels in a series


'c' in series2

True

'e' in series2

False

series2.values

array([22, 33, 44, 55])

series2.index # to run code press shift + enter

Index(['a', 'b', 'c', 'd'], dtype='object')

# accessing values using labeled indices


# series2(['b', 'a']) # error: parentheses try to call the Series; use square brackets to index

series2[['b','a']]

b 33
a 22
dtype: int64

# The index-value link is preserved through operations such as filtering and arithmetic.


# Index labels stay aligned with their values in the results, much like named vectors in R
# filtering
series2[series2>0]

a 22
b 33
c 44
d 55
dtype: int64

series2[series2>34]

c 44
d 55
dtype: int64
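The filtering cells above rely on a boolean mask. As a minimal sketch (reusing the notebook's `series2`; the names `mask`, `filtered` and `middle` are introduced here only for illustration), the comparison itself produces a boolean Series with the same labels, and indexing with it keeps the rows marked True:

```python
import pandas as pd

series2 = pd.Series([22, 33, 44, 55], index=["a", "b", "c", "d"])

# The comparison yields a boolean Series that shares the original index.
mask = series2 > 34
print(mask)

# Indexing with the mask keeps only the rows where it is True;
# every surviving value keeps its own label.
filtered = series2[mask]
print(filtered)

# Masks can be combined with & (and), | (or) and ~ (not).
# Each comparison needs its own parentheses.
middle = series2[(series2 > 22) & (series2 < 55)]
print(middle)
```

Because the mask carries labels, the filtered result never loses track of which value belonged to which label.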

# scalar multiplication
series2*9 # Broadcasting is done here.

a 198
b 297
c 396
d 495
dtype: int64

#math operation
import numpy as np
np.exp(series2) # exponential of series values

a 3.584913e+09
b 2.146436e+14
c 1.285160e+19
d 7.694785e+23
dtype: float64

# A Python dict can be converted to a Series because it is already a key-to-value mapping


# PIN_code = {'Roorkee'= 247667,'Dehradun'= 248001, 'Haridwar' = 249401}
# The line above is invalid syntax: '=' must be replaced with ':'
PIN_code = {'Roorkee': 247667, 'Dehradun': 248001, 'Haridwar': 249401}

series3 = pd.Series(PIN_code)

series3

Roorkee 247667
Dehradun 248001
Haridwar 249401
dtype: int64

series3.index

Index(['Roorkee', 'Dehradun', 'Haridwar'], dtype='object')

# METHOD 1 of changing INDEX


# changing index using index argument
# e.g., order of indices
cities = ['Dehradun', 'Roorkee', 'Haridwar', 'Tehri garhwal']

series4 = pd.Series(PIN_code, index=cities)


series4

Dehradun 248001.0
Roorkee 247667.0
Haridwar 249401.0
Tehri garhwal NaN
dtype: float64
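The output above switches from int64 to float64 because NaN is a floating-point value, so an integer column that gains a missing entry is upcast. A small sketch of this, reusing the notebook's data (the name `restored` is introduced here for illustration):

```python
import pandas as pd

PIN_code = {"Roorkee": 247667, "Dehradun": 248001, "Haridwar": 249401}
cities = ["Dehradun", "Roorkee", "Haridwar", "Tehri garhwal"]

series3 = pd.Series(PIN_code)                # every label has a value
series4 = pd.Series(PIN_code, index=cities)  # 'Tehri garhwal' has no value

print(series3.dtype)  # an integer dtype
print(series4.dtype)  # float64, because NaN is a float

# Dropping the missing entry makes it possible to convert back to integers.
restored = series4.dropna().astype("int64")
print(restored)
```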

# detect missing data


# using the pd.isnull and pd.notnull functions
pd.isnull(series4)


Dehradun False
Roorkee False
Haridwar False
Tehri garhwal True
dtype: bool

pd.notnull(series4)

Dehradun True
Roorkee True
Haridwar True
Tehri garhwal False
dtype: bool

# using isnull and notnull methods


series4.isnull()

Dehradun False
Roorkee False
Haridwar False
Tehri garhwal True
dtype: bool

series4.notnull()

Dehradun True
Roorkee True
Haridwar True
Tehri garhwal False
dtype: bool

# alignment feature
series3 + series4 # values with matching labels are added; a label missing from either side gives NaN

Dehradun 496002.0
Haridwar 498802.0
Roorkee 495334.0
Tehri garhwal NaN
dtype: float64
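When NaN for unmatched labels is not wanted, the `add` method with `fill_value` is one alternative to the `+` operator. The sketch below uses a small made-up second Series (`s4`, with invented values) rather than the notebook's `series4`, so that each side is missing a label the other one has:

```python
import pandas as pd

s3 = pd.Series({"Roorkee": 247667, "Dehradun": 248001, "Haridwar": 249401})
s4 = pd.Series({"Dehradun": 1, "Tehri garhwal": 2})  # illustrative values only

# Plain addition aligns on the union of labels; unmatched labels give NaN.
plain = s3 + s4
print(plain)

# add(..., fill_value=0) substitutes 0 for a label missing on one side.
filled = s3.add(s4, fill_value=0)
print(filled)
```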

# name attributes of series object and its index


series4.name = 'PIN_code' #see last line of below execution result

series4.index.name = 'city' # to specify name of index

series4

city
Dehradun 248001.0
Roorkee 247667.0
Haridwar 249401.0
Tehri garhwal NaN
Name: PIN_code, dtype: float64

# METHOD 2 of changing INDEX


# changing index using index attribute
# e.g index values
series1

0 22
1 33
2 44
3 55
dtype: int64

series1.index = [1,2,3,4]

series1

1 22
2 33
3 44
4 55
dtype: int64

# A DataFrame represents 2-D data in tabular form (rows and columns)


# It is like a dict of Series that share a common (row) index

# It has both a row index and a column index


## The row index is similar to a Series index (default 0 to n-1)

# The column index holds the variable names (dict insertion order in current pandas; older versions sorted them alphabetically)


# Columns can hold different value types (numeric, string, boolean, etc.)

# Created with the pd.DataFrame constructor

# Let's take year-wise metro population data


# A DataFrame is like a dict in which each key (column name) maps to multiple elements
dict1 = {'metro':['Delhi','Delhi','Delhi','Mumbai', 'Mumbai', 'Mumbai'],
'year': [2011,2012,2013,2011,2012,2013],
'popcr':[1.67,1.72,1.77,2.07,2.11,2.15]}

# Common mistakes: the class name is case-sensitive
# df = pd.dataframe(dict1)   # AttributeError
# df = pd.Dataframe(dict1)   # AttributeError

df = pd.DataFrame(dict1)

df

metro year popcr

0 Delhi 2011 1.67

1 Delhi 2012 1.72

2 Delhi 2013 1.77

3 Mumbai 2011 2.07

4 Mumbai 2012 2.11

5 Mumbai 2013 2.15
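The description of a DataFrame as a dict of Series sharing one row index can be checked directly. A minimal sketch with made-up values (the names `a`, `b` and `frame` are illustrative): building a frame from two Series with different indices aligns them on the union of their labels.

```python
import pandas as pd

a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10, 20, 30], index=["x", "y", "z"])

# Each dict key becomes a column; rows are aligned on the union of labels.
frame = pd.DataFrame({"a": a, "b": b})
print(frame)

# Selecting a column gives back a Series that carries the shared row index.
print(frame["a"])
```

Column `a` has no value for label `z`, so that cell is filled with NaN.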

# first five rows


df.head()

metro year popcr

0 Delhi 2011 1.67

1 Delhi 2012 1.72

2 Delhi 2013 1.77

3 Mumbai 2011 2.07

4 Mumbai 2012 2.11

# changing the column index using columns argument


pd.DataFrame(dict1, columns =['year','metro','popcr'])

year metro popcr

0 2011 Delhi 1.67

1 2012 Delhi 1.72

2 2013 Delhi 1.77

3 2011 Mumbai 2.07

4 2012 Mumbai 2.11

5 2013 Mumbai 2.15

# changing the row index using the index argument


df2 = pd.DataFrame(dict1, columns =['year','metro','popcr','area'],index=list(range(1,7)))
df2

# row and column index both changed.

year metro popcr area

1 2011 Delhi 1.67 NaN

2 2012 Delhi 1.72 NaN

3 2013 Delhi 1.77 NaN

4 2011 Mumbai 2.07 NaN

5 2012 Mumbai 2.11 NaN

6 2013 Mumbai 2.15 NaN

# column index
df2.columns

Index(['year', 'metro', 'popcr', 'area'], dtype='object')

# accessing column using dict key like notation


df2['popcr']

1 1.67
2 1.72
3 1.77
4 2.07
5 2.11
6 2.15
Name: popcr, dtype: float64

# accessing column using attribute like notation


df2.year

1 2011
2 2012
3 2013
4 2011
5 2012
6 2013
Name: year, dtype: int64

# selecting particular row indices along a column


df2['popcr'][1:4]

2 1.72
3 1.77
4 2.07
Name: popcr, dtype: float64

df2['popcr'][:5] # only stopping point

1 1.67
2 1.72
3 1.77
4 2.07
5 2.11
Name: popcr, dtype: float64

df2['popcr'][:-3]

1 1.67
2 1.72
3 1.77
Name: popcr, dtype: float64

df2['popcr'][:] # no start stop, all observations shown

1 1.67
2 1.72
3 1.77
4 2.07
5 2.11
6 2.15
Name: popcr, dtype: float64
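In the cells above, `df2['popcr'][1:4]` returns labels 2 to 4 because a plain integer slice is applied by position, even though the index labels here are also integers. A sketch of how `.iloc` and `.loc` make the intent explicit (the frame is a cut-down copy of `df2`; `by_position` and `by_label` are illustrative names):

```python
import pandas as pd

df2 = pd.DataFrame(
    {"popcr": [1.67, 1.72, 1.77, 2.07, 2.11, 2.15]},
    index=range(1, 7),
)

# Positional slice: positions 1, 2, 3 (end point excluded), i.e. labels 2, 3, 4.
by_position = df2["popcr"].iloc[1:4]
print(by_position)

# Label slice: labels 1 through 4, end point included.
by_label = df2.loc[1:4, "popcr"]
print(by_label)
```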

# accessing row using 'loc' attribute (locate)


df2.loc[:]

year metro popcr area

1 2011 Delhi 1.67 NaN

2 2012 Delhi 1.72 NaN

3 2013 Delhi 1.77 NaN

4 2011 Mumbai 2.07 NaN

5 2012 Mumbai 2.11 NaN

6 2013 Mumbai 2.15 NaN

df2.loc[1]

year 2011
metro Delhi
popcr 1.67
area NaN
Name: 1, dtype: object

## Data processing & transformation


# value assignment
# e.g area column
df2['area']=6000

df2

year metro popcr area

1 2011 Delhi 1.67 6000

2 2012 Delhi 1.72 6000

3 2013 Delhi 1.77 6000

4 2011 Mumbai 2.07 6000

5 2012 Mumbai 2.11 6000

6 2013 Mumbai 2.15 6000

# populating values
#e.g area column
df2['area'] = np.arange(6000,6600,100) # (start, stop, step)
df2

year metro popcr area

1 2011 Delhi 1.67 6000

2 2012 Delhi 1.72 6100

3 2013 Delhi 1.77 6200

4 2011 Mumbai 2.07 6300

5 2012 Mumbai 2.11 6400

6 2013 Mumbai 2.15 6500

# using a Series to populate values


# unmatched indices are treated as missing values
val = pd.Series([1.7,1.9,2.1], index=[2,4,6])
df2['popcr'] = val
df2

year metro popcr area

1 2011 Delhi NaN 6000

2 2012 Delhi 1.7 6100

3 2013 Delhi NaN 6200

4 2011 Mumbai 1.9 6300

5 2012 Mumbai NaN 6400

6 2013 Mumbai 2.1 6500
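The NaN rows above come from index alignment: assigning a Series to a column matches on labels, not on position. A minimal sketch contrasting that with a positional assignment through `to_numpy()` (the frame and the column names `aligned` and `positional` are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"year": [2011, 2012, 2013]}, index=[1, 2, 3])
val = pd.Series([1.7, 1.9, 2.1], index=[2, 4, 6])

# Assigning a Series aligns on labels: only label 2 exists in both,
# so rows 1 and 3 become NaN and labels 4 and 6 are ignored.
df["aligned"] = val

# Converting to a plain array drops the labels, so values are placed
# in order; the lengths must match exactly for this to work.
df["positional"] = val.to_numpy()

print(df)
```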

# assigning values to a non existing column will also create that column
df2['northern'] = (df2.metro == 'Delhi') # kind of boolean column

df2

year metro popcr area northern

1 2011 Delhi NaN 6000 True

2 2012 Delhi 1.7 6100 True

3 2013 Delhi NaN 6200 True

4 2011 Mumbai 1.9 6300 False

5 2012 Mumbai NaN 6400 False

6 2013 Mumbai 2.1 6500 False

# removing a column
del df2['northern']

df2.columns

Index(['year', 'metro', 'popcr', 'area'], dtype='object')

#filling values when reindexing


# using ffill method
series10 = pd.Series(['blue','purple','yellow'], index=[0,2,4])

## more on Index objects in Series and DataFrame


# immutable array-like objects
# Also store metadata other than axis labels (index values)

# sequences of labels used while creating a Series or DataFrame are internally converted into Index objects
series5 = pd.Series(range(3), index=['a','b','c']) # the index is passed as a list in square brackets

series5.index

Index(['a', 'b', 'c'], dtype='object')

#subsetting
ind1 = series5.index
ind1[1:] # only the starting point is given

Index(['b', 'c'], dtype='object')

ind1[0:2]

Index(['a', 'b'], dtype='object')

#immutability
ind1[1] = 'd'
# output is error : type error
# index does not support mutable operation

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-126-65fbb217ef82> in <cell line: 2>()
1 #immutability
----> 2 ind1[1] = 'd'
3 # output is error : type error
4 # index does not support mutable operation

/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in
__setitem__(self, key, value)
5300 @final
5301 def __setitem__(self, key, value):
-> 5302 raise TypeError("Index does not support mutable operations")
5303
5304 def __getitem__(self, key):

TypeError: Index does not support mutable operations
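Since individual Index elements cannot be reassigned, labels are usually changed by building a new index. The sketch below shows two common ways, under the assumption that a relabeled copy or a full replacement is acceptable (the name `renamed` is illustrative):

```python
import pandas as pd

series5 = pd.Series(range(3), index=["a", "b", "c"])

# Item assignment on an Index raises TypeError, as in the traceback above.
try:
    series5.index[1] = "d"
except TypeError as err:
    print("TypeError:", err)

# Option 1: rename returns a new Series with selected labels replaced.
renamed = series5.rename({"b": "d"})
print(renamed)

# Option 2: replace the whole index at once (same length required).
series5.index = ["x", "y", "z"]
print(series5)
```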


# creating index using Index function


ind2 = pd.Index(np.arange(3))

ind2

Int64Index([0, 1, 2], dtype='int64')

series6 = pd.Series([1.5,-2.5,0], index=ind2)

series6

0 1.5
1 -2.5
2 0.0
dtype: float64

series6.index is ind2

True

# Index objects can have duplicate elements


ind3 = pd.Index(['beta','alpha','beta','sigma'])

ind3

Index(['beta', 'alpha', 'beta', 'sigma'], dtype='object')

series7 = pd.Series([1.1,1.2,1.3,1.4], index=ind3)

series7

beta 1.1
alpha 1.2
beta 1.3
sigma 1.4
dtype: float64

# two rows share the label 'beta'; let's see what selecting it returns


series7['beta'] # shows all rows with label beta

beta 1.1
beta 1.3
dtype: float64
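With duplicate labels, the return type of a selection depends on the label: a repeated label gives a Series, a unique one gives a scalar. A short sketch of checks that can help here (`dup`, `single` and `totals` are illustrative names; combining duplicates by summing is only one possible choice):

```python
import pandas as pd

series7 = pd.Series([1.1, 1.2, 1.3, 1.4],
                    index=["beta", "alpha", "beta", "sigma"])

# is_unique reports whether any label is repeated.
print(series7.index.is_unique)

# duplicated() marks the second and later occurrences of each label.
print(series7.index.duplicated())

dup = series7["beta"]      # repeated label: a Series with two rows
single = series7["alpha"]  # unique label: a scalar
print(type(dup).__name__, type(single).__name__)

# One way to collapse duplicates: group by the index and aggregate.
totals = series7.groupby(level=0).sum()
print(totals)
```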

series10

0 blue
2 purple
4 yellow
dtype: object

series11 = series10.reindex(range(6), method = 'ffill')


series11

0 blue
1 blue
2 purple
3 purple
4 yellow
5 yellow
dtype: object

# reindexing with dataframe


df4 = pd.DataFrame((np.arange(9).reshape(3,3)),index=['a','c','d'], columns=['Delhi','Mumbai','Bangalore'])

df4

Delhi Mumbai Bangalore

a 0 1 2

c 3 4 5

d 6 7 8

#reindexing rows index


df5 = df4.reindex(['a','b','c','d'])

df5

Delhi Mumbai Bangalore

a 0.0 1.0 2.0

b NaN NaN NaN

c 3.0 4.0 5.0

d 6.0 7.0 8.0

# reindexing the column index
# using the columns argument
# note: 'Banglore' does not match the existing 'Bangalore' label, so the result gets a new all-NaN column
df5.reindex(columns=['Banglore','Delhi','Mumbai'])

Banglore Delhi Mumbai

a NaN 0.0 1.0

b NaN NaN NaN

c NaN 3.0 4.0

d NaN 6.0 7.0

# dropping cases or rows


df5.drop('b') # pass a list to drop several rows, e.g. df5.drop(['a','c'])

Delhi Mumbai Bangalore

a 0.0 1.0 2.0

c 3.0 4.0 5.0

d 6.0 7.0 8.0

#Dropping columns or variables


df5.drop('Bangalore', axis=1) # axis=1 must be specified so that a column, not a row, is dropped

Delhi Mumbai

a 0.0 1.0

b NaN NaN

c 3.0 4.0

d 6.0 7.0

# df5.drop(['Delhi','Mumbai'], axis=1) # dropping multiple columns

# using in-place version of drop method


# using the inplace argument
df5

Delhi Mumbai Bangalore

a 0.0 1.0 2.0

b NaN NaN NaN

c 3.0 4.0 5.0

d 6.0 7.0 8.0

df5.drop('b', inplace = True) # modifies df5 itself and returns None


df5

Delhi Mumbai Bangalore

a 0.0 1.0 2.0

c 3.0 4.0 5.0

d 6.0 7.0 8.0

# Ex. access, selection, and filtering


# Working with a series
series11

0 blue
1 blue
2 purple
3 purple
4 yellow

5 yellow
dtype: object

series7_1 = pd.DataFrame([1.1,1.3,1.3,1.3],index= ['beta','alpha','beta','sigma'])

series7_1

beta 1.1

alpha 1.3

beta 1.3

sigma 1.3

# accessing one row


series11[3] # square brackets

'purple'

series7_1['sigma'] # square braket with single quotes


# 1.3

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-153-8bdd86d373a9> in <cell line: 1>()
----> 1 series7_1['sigma'] # square braket with single quotes
2 # 1.3

1 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/range.py in
get_loc(self, key, method, tolerance)
393 raise KeyError(key) from err
394 self._check_indexing_error(key)
--> 395 raise KeyError(key)
396 return super().get_loc(key, method=method, tolerance=tolerance)
397

KeyError: 'sigma'


# accessing a range of rows


series11[3:5]

3 purple
4 yellow
dtype: object

# selecting few rows


series7_1[['sigma','alpha']] #in index use square brakets always

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-156-b23e739c9d06> in <cell line: 2>()
1 # selecting few rows
----> 2 series7_1[['sigma','alpha']] #in index use square brakets always

2 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in
_raise_if_missing(self, key, indexer, axis_name)
6128 if use_interval_msg:
6129 key = list(key)
-> 6130 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
6131
6132 not_found = list(ensure_index(key)[missing_mask.nonzero()
[0]].unique())

KeyError: "None of [Index(['sigma', 'alpha'], dtype='object')] are in the [columns]"

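# The two KeyErrors above occur because series7_1 was created with pd.DataFrame, so [] looks up column labels, not row labels. Using the earlier created series7 instead: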


# selecting few rows
series7[['sigma','alpha']]

sigma 1.4
alpha 1.2
dtype: float64

# filtering rows based on some logic
series7[series7<1.3]

beta 1.1
alpha 1.2
dtype: float64

# slicing

# Note: with label-based slicing the end point is inclusive in Series and DataFrame


# series7['alpha','sigma'] raises an error; use a slice with ':' instead
series7['alpha':'sigma']

alpha 1.2
beta 1.3
sigma 1.4
dtype: float64

series7['alpha':'sigma'] = 1.3 # assigning 1.3

series7 # values assigned

beta 1.1
alpha 1.3
beta 1.3
sigma 1.3
dtype: float64

series7['beta']

beta 1.1
beta 1.3
dtype: float64

## interaction mechanisms with the data in a Series or DataFrame

# Ex. reindexing
# using the reindex method
series8 = pd.Series([4.5, 7.2, -5.3, 3.6], index = ['d', 'b', 'a', 'c'])

#pd.Series([],index= [])

series8

d 4.5
b 7.2
a -5.3
c 3.6
dtype: float64

series9 = series8.reindex(['a','b','c','d','e'])

series9

a -5.3
b 7.2
c 3.6
d 4.5
e NaN
dtype: float64

# filling values when reindexing


# using ffill method

series10 = pd.Series(['blue','purple','yellow'],index = [0,2,4])

series10

0 blue
2 purple
4 yellow
dtype: object

series11 = series10.reindex(range(6), method = 'ffill')

series11


0 blue
1 blue
2 purple
3 purple
4 yellow
5 yellow
dtype: object
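Forward fill is one of several ways to fill the gaps that reindexing creates. The sketch below compares it with backward fill and a constant fill value, reusing `series10` (the names `ffilled`, `bfilled` and `const` and the placeholder string are illustrative):

```python
import pandas as pd

series10 = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])

# ffill: each new label takes the last known value before it.
ffilled = series10.reindex(range(6), method="ffill")

# bfill: each new label takes the next known value after it;
# label 5 has nothing after it, so it stays missing.
bfilled = series10.reindex(range(6), method="bfill")

# fill_value: every new label gets the same constant.
const = series10.reindex(range(6), fill_value="missing")

print(ffilled, bfilled, const, sep="\n\n")
```

The `method` argument needs the original index to be sorted (monotonic), which holds for `[0, 2, 4]`.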

# reindexing with dataframe


df4 = pd.DataFrame(np.arange(9).reshape((3,3)), index=['a','c','d'], columns = ['Delhi','Mumbai','Banglore'])

# Likely mistakes
# 1. index=['a','b',c','d']   (missing opening quote before c)
# 2. mismatched brackets

df4

Delhi Mumbai Banglore

a 0 1 2

c 3 4 5

d 6 7 8

# reindexing rows index

df5 = df4.reindex(['a','b','c','d'])

df5

Delhi Mumbai Banglore

a 0.0 1.0 2.0

b NaN NaN NaN

c 3.0 4.0 5.0

d 6.0 7.0 8.0

# reindexing column index, using column argument

df5.reindex(columns = ['Banglore','Delhi','Mumbai'])

Banglore Delhi Mumbai

a 2.0 0.0 1.0

b NaN NaN NaN

c 5.0 3.0 4.0

d 8.0 6.0 7.0

# reindexing using loc attribute


df4.loc[['a','b','c','d'],['Banglore','Delhi','Mumbai']]

# this worked in older pandas versions, but recent versions raise KeyError because label 'b' is not in df4's index; use reindex instead
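A sketch of the `.loc` call failing on the missing label and of `reindex` producing the intended result, using the notebook's `df4` (the names `rows`, `cols` and `result` are introduced here for illustration):

```python
import numpy as np
import pandas as pd

df4 = pd.DataFrame(np.arange(9).reshape((3, 3)),
                   index=["a", "c", "d"],
                   columns=["Delhi", "Mumbai", "Banglore"])

rows = ["a", "b", "c", "d"]
cols = ["Banglore", "Delhi", "Mumbai"]

# .loc refuses a list that contains a label missing from the index.
try:
    df4.loc[rows, cols]
except KeyError as err:
    print("KeyError:", err)

# reindex accepts missing labels and fills the new row with NaN.
result = df4.reindex(index=rows, columns=cols)
print(result)
```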

A function that takes the passenger name as input, checks which booking class the passenger belongs to, and returns the average delay time.

# Accept passenger name input from user


passenger_name = input("Enter passenger name: ")

booking_class = df.loc[df['Name'] == passenger_name , 'Booking Class'].iloc[0]


print("Booking class for passenger", passenger_name, ":", booking_class)
delay_min = df.loc[df['Name'] == passenger_name , 'Delay Minutes'].iloc[0]
print("Delay minutes for passenger", passenger_name, ":", delay_min)

Enter passenger name: Amber Mcclure
Booking class for passenger Amber Mcclure : First
Delay minutes for passenger Amber Mcclure : 8
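The cell above prints one passenger's booking class and delay but does not yet wrap the lookup in a function or compute an average. One possible reading of the task is sketched below: look up the passenger's class, then average the delay over everyone in that class. The function name, the sample rows and that interpretation of "average delay" are assumptions; only the column names come from the notebook.

```python
import pandas as pd


def average_delay_for_passenger(df, passenger_name):
    """Return (booking_class, mean delay of that class), or None if the name is absent."""
    match = df.loc[df["Name"] == passenger_name, "Booking Class"]
    if match.empty:
        return None
    booking_class = match.iloc[0]
    same_class = df["Booking Class"] == booking_class
    return booking_class, df.loc[same_class, "Delay Minutes"].mean()


# Made-up rows standing in for the real dataset, which is not shown in the notebook.
flights = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Booking Class": ["First", "Economy", "First"],
    "Delay Minutes": [8, 20, 12],
})

result = average_delay_for_passenger(flights, "Passenger A")
print(result)
```

Returning None for an unknown name avoids the IndexError that `.iloc[0]` would raise on an empty selection.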

# Ex. dropping cases or rows


## using drop method


df5

Delhi Mumbai Banglore

a 0.0 1.0 2.0

b NaN NaN NaN

c 3.0 4.0 5.0

d 6.0 7.0 8.0

df5.drop('b')

Delhi Mumbai Banglore

a 0.0 1.0 2.0

c 3.0 4.0 5.0

d 6.0 7.0 8.0

# df5.drop('a','c') raises an error
# several labels have to be passed as a list
df5.drop(['a','c'])

Delhi Mumbai Banglore

b NaN NaN NaN

d 6.0 7.0 8.0

df5.drop('Banglore', axis=1)

Delhi Mumbai

a 0.0 1.0

b NaN NaN

c 3.0 4.0

d 6.0 7.0

# dropping multiple columns


# pass the column labels as a list in square brackets, followed by axis=1

df5.drop(['Delhi','Mumbai'], axis=1)

Banglore

a 2.0

b NaN

c 5.0

d 8.0

# using in-place version of drop method


#using inplace argument

df5

Delhi Mumbai Banglore

a 0.0 1.0 2.0

b NaN NaN NaN

c 3.0 4.0 5.0

d 6.0 7.0 8.0

df5.drop('b', inplace = True)

df5


Delhi Mumbai Banglore

a 0.0 1.0 2.0

c 3.0 4.0 5.0

d 6.0 7.0 8.0
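The difference between the two forms of `drop` used above is easy to miss: without `inplace=True` the original frame is left untouched and a new one is returned. A minimal sketch (frame values copied from the notebook's `df5`; `without_b`, `returned` and `trimmed` are illustrative names):

```python
import pandas as pd

nan = float("nan")
df5 = pd.DataFrame(
    [[0.0, 1.0, 2.0], [nan, nan, nan], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
    index=["a", "b", "c", "d"],
    columns=["Delhi", "Mumbai", "Banglore"],
)

# Default: drop returns a new frame; df5 still contains row 'b'.
without_b = df5.drop("b")
print("b" in df5.index)        # still present in the original

# inplace=True changes df5 itself and returns None.
returned = df5.drop("b", inplace=True)
print(returned, "b" in df5.index)

# The index= and columns= keywords avoid having to remember axis numbers.
trimmed = df5.drop(index=["a"], columns=["Delhi"])
print(trimmed)
```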


# access, selection, filtering

# working with a series

series11

0 blue
1 blue
2 purple
3 purple
4 yellow
5 yellow
dtype: object

series7

beta 1.1
alpha 1.3
beta 1.3
sigma 1.3
dtype: float64

# accessing one row


series11[3] #integer index 3

'purple'

series7['sigma'] # string index

1.3

# accessing a range of rows


series11[3:5]

3 purple
4 yellow
dtype: object

# selecting a few rows


series7[['sigma','alpha']]

sigma 1.3
alpha 1.3
dtype: float64

series11[[1,5]] # selecting row index 1 & 5

1 blue
5 yellow
dtype: object

# filtering rows based on some logic


series7[series7<1.3]

beta 1.1
dtype: float64

# slicing
# Note: end point inclusive on series and dataframe

series7['alpha':'sigma']

# series7['alpha','sigma'] does not work

alpha 1.3
beta 1.3
sigma 1.3
dtype: float64

series7['alpha':'sigma'] = 1.3


series7

beta 1.1
alpha 1.3
beta 1.3
sigma 1.3
dtype: float64

#### New Topic Starts


# working with DataFrame
df5

Delhi Mumbai Bangalore

a 0.0 1.0 2.0

c 3.0 4.0 5.0

d 6.0 7.0 8.0

# accessing columns
df5['Delhi']

a 0.0
c 3.0
d 6.0
Name: Delhi, dtype: float64

df5[['Delhi','Bangalore']]

Delhi Bangalore

a 0.0 2.0

c 3.0 5.0

d 6.0 8.0

# slicing rows
df5[:2] # starting point not specified

Delhi Mumbai Bangalore

a 0.0 1.0 2.0

c 3.0 4.0 5.0

df5[1:2]

Delhi Mumbai Bangalore

c 3.0 4.0 5.0

# filtering rows
df5[df5['Bangalore']>5]

Delhi Mumbai Bangalore

d 6.0 7.0 8.0

# working with a boolean dataframe


df5 < 5

Delhi Mumbai Bangalore

a True True True

c True True False

d False False False
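A boolean frame like the one above can also be used without modifying the original data. The sketch below uses `mask` and `where`, which return new frames, as a non-destructive alternative to assigning through the mask (frame values follow the notebook's `df5`; `cond`, `zeroed` and `kept` are illustrative names):

```python
import numpy as np
import pandas as pd

df5 = pd.DataFrame(np.arange(9.0).reshape(3, 3),
                   index=["a", "c", "d"],
                   columns=["Delhi", "Mumbai", "Bangalore"])

cond = df5 < 5

# mask replaces the cells where the condition is True; df5 is not changed.
zeroed = df5.mask(cond, 0)
print(zeroed)

# where keeps the cells where its condition is True and puts NaN elsewhere.
kept = df5.where(~cond)
print(kept)
```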

df5[df5<5] = 0

df5

Delhi Mumbai Bangalore

a 0.0 0.0 0.0

c 0.0 0.0 5.0

d 6.0 7.0 8.0

## indexing operators loc and iloc, which give precise indexing
# loc is used with axis labels
# iloc is used with integer values corresponding to axis labels
df5

Delhi Mumbai Bangalore

a 0.0 0.0 0.0

c 0.0 0.0 5.0

d 6.0 7.0 8.0

# syntax: dataframe.loc[row, column] with square brackets


# accessing a single element
df5.loc['a','Delhi']

0.0

# df5.loc['a','Delhi','Bangalore'] raises an IndexingError (too many indexers)


df5.loc['a',['Delhi','Banglore']]

Delhi 0.0
Banglore 2.0
Name: a, dtype: float64

# df5.loc[['a','b'],['Delhi','Bangalore']]
# no longer allowed: label 'b' is missing from df5, which raises KeyError in recent pandas; older versions returned a NaN row

df5.iloc[0,0] #[row, column]

0.0

df5.iloc[0,[0,1]] # selecting two columns

Delhi 0.0
Mumbai 0.0
Name: a, dtype: float64

df5.iloc[[0,1],[0,1]] # 2 row , 2 columns

Delhi Mumbai

a 0.0 0.0

c 0.0 0.0

df5.iloc[1] # second row

Delhi 0.0
Mumbai 0.0
Bangalore 5.0
Name: c, dtype: float64

df5.iloc[[1,2],[2,0,1]] # 2 row , 3 columns

Bangalore Delhi Mumbai

c 5.0 0.0 0.0

d 8.0 6.0 7.0

# slicing with loc and iloc


df5.loc[:'c', 'Delhi']

a 0.0
c 0.0
Name: Delhi, dtype: float64

df5.loc['c':'d', 'Delhi']


c 0.0
d 6.0
Name: Delhi, dtype: float64

df5.loc['c':'d', 'Delhi':'Bangalore']

Delhi Mumbai Bangalore

c 0.0 0.0 5.0

d 6.0 7.0 8.0

df5.iloc[:, 1:2]
# ':' selects all rows
# on the column side, 1:2 selects only position 1, because the end point of an integer slice is exclusive

Mumbai

a 0.0

c 0.0

d 7.0

df5.iloc[:,0:2][df5.Mumbai != 0]
# filtering in addition to slicing
# all rows and column positions 0 and 1 are selected first
# then only the rows whose Mumbai value is not equal to 0 are kept

Delhi Mumbai

d 6.0 7.0
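The slice-then-filter expression above can also be written as a single `.loc` call that takes the row condition and the column labels together. A sketch showing that both forms give the same frame (values copied from the notebook's `df5`; `chained` and `single` are illustrative names):

```python
import pandas as pd

df5 = pd.DataFrame(
    [[0.0, 0.0, 0.0], [0.0, 0.0, 5.0], [6.0, 7.0, 8.0]],
    index=["a", "c", "d"],
    columns=["Delhi", "Mumbai", "Bangalore"],
)

# Two steps: slice the columns by position, then filter the rows.
chained = df5.iloc[:, 0:2][df5.Mumbai != 0]

# One step: boolean row condition and column labels in the same .loc call.
single = df5.loc[df5["Mumbai"] != 0, ["Delhi", "Mumbai"]]

print(single)
print(chained.equals(single))
```

The single-call form is generally the safer choice when the selection is later used for assignment, because it avoids chained indexing.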

# # more on integer indexing


series11

0 blue
1 blue
2 purple
3 purple
4 yellow
5 yellow
dtype: object

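# fails: series11 has an integer index, so -3 is looked up as a label, not a position


series11[-3]

series7[-1] # does not fail: the index holds strings, so the integer falls back to a position (series7.iloc[-1] is the explicit form)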

1.3
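The ambiguity goes away when the intent is stated explicitly: `.iloc` always means position and `.loc` always means label. A minimal sketch with the two Series from the notebook (the flag `raised` is introduced only to record the outcome):

```python
import pandas as pd

series11 = pd.Series(["blue", "blue", "purple", "purple", "yellow", "yellow"])
series7 = pd.Series([1.1, 1.3, 1.3, 1.3],
                    index=["beta", "alpha", "beta", "sigma"])

# With an integer index, plain [] treats -3 as a label, which does not exist.
raised = False
try:
    series11[-3]
except KeyError:
    raised = True
print("KeyError raised:", raised)

# .iloc is always positional, whatever the index holds.
print(series11.iloc[-3])   # third element from the end
print(series7.iloc[-1])    # last element

# .loc is always label-based.
print(series11.loc[3])
```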
