Pandas Practice

The document outlines a practice lab for using the Pandas library in Python, focusing on creating DataFrames and Series, as well as selecting and slicing data. It includes exercises on using the loc() and iloc() functions for data selection, along with practical coding examples. The lab aims to enhance understanding of data manipulation using Pandas within a 30-minute timeframe.

Pandas_Practice

March 6, 2025

1 Practice Lab: Selecting data in a DataFrame


Estimated time needed: 30 minutes

1.1 Objectives
After completing this lab you will be able to:
• Use the Pandas library to create DataFrame and Series objects
• Locate data in a DataFrame using the loc and iloc indexers
• Use slicing to select rows and columns

1.1.1 Exercise 1: Pandas: DataFrame and Series


Pandas is a popular library for data analysis built on top of the Python programming language.
Pandas provides two main data structures for manipulating data:
• DataFrame
• Series
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows
and columns.
• A Pandas DataFrame is often created by loading a dataset from existing storage.
• Storage can be a SQL database, a CSV file, an Excel file, etc.
• It can also be created from lists, dictionaries, or a list of dictionaries.
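
As a minimal sketch of the last bullet, the block below builds the same small table twice: once from a list of dictionaries and once from a list of lists. The sample records and the variable names (records, rows, df_from_dicts, df_from_lists) are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical sample records, used only for illustration
records = [
    {"Name": "Rose", "ID": 1},
    {"Name": "John", "ID": 2},
]
# Each dictionary becomes one row; the keys become column labels
df_from_dicts = pd.DataFrame(records)

# Each inner list becomes one row; column labels are passed separately
rows = [["Rose", 1], ["John", 2]]
df_from_lists = pd.DataFrame(rows, columns=["Name", "ID"])

# Both constructions should describe the same table
print(df_from_dicts.equals(df_from_lists))
```

A dictionary of lists, as used later in this lab, is organized the other way around: each key holds one whole column rather than one row.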
A Series represents a one-dimensional array of indexed data. It has two main components:
1. An array of actual data.
2. An associated array of indexes or data labels.
The index is used to access individual data values. You can also get a column of a DataFrame as a
Series. You can think of a Pandas Series as a single labeled column of a DataFrame.
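
The two components of a Series can be inspected directly. The following sketch assumes a hypothetical Series of salaries labeled by name; the data and variable names are illustrative and not part of the lab.

```python
import pandas as pd

# Hypothetical salary data with names as index labels
salaries = pd.Series([100000, 80000, 50000], index=["Rose", "John", "Jane"])

print(salaries.values)   # the array of actual data
print(salaries.index)    # the associated array of labels
print(salaries["John"])  # use a label to access one value

# A single column taken from a DataFrame is also a Series
df_example = pd.DataFrame({"Salary": salaries})
column = df_example["Salary"]
print(type(column).__name__)
```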

[1]: !pip install pandas

Collecting pandas
Downloading pandas-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)
Collecting numpy>=1.26.0 (from pandas)
Downloading numpy-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (62 kB)
Requirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/lib/python3.12/site-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.12/site-packages (from pandas) (2024.2)
Collecting tzdata>=2022.7 (from pandas)
Downloading tzdata-2025.1-py2.py3-none-any.whl.metadata (1.4 kB)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)
Downloading pandas-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.7 MB)
12.7/12.7 MB 118.7 MB/s eta 0:00:00
Downloading numpy-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.1 MB)
16.1/16.1 MB 150.7 MB/s eta 0:00:00
Downloading tzdata-2025.1-py2.py3-none-any.whl (346 kB)
Installing collected packages: tzdata, numpy, pandas
Successfully installed numpy-2.2.3 pandas-2.2.3 tzdata-2025.1

[2]: # let us import the Pandas Library


import pandas as pd

Once you’ve imported pandas, you can use its built-in functions to create and analyze data.
In this practice lab, we will learn how to create a DataFrame from a dictionary.
Let us consider a dictionary ‘x’ with keys and values as shown below.
We then create a DataFrame from the dictionary by calling pd.DataFrame(x).

[3]: # Define a dictionary 'x'

x = {'Name': ['Rose', 'John', 'Jane', 'Mary'],
     'ID': [1, 2, 3, 4],
     'Department': ['Architect Group', 'Software Group', 'Design Team', 'Infrastructure'],
     'Salary': [100000, 80000, 50000, 60000]}

# Cast the dictionary to a DataFrame
df = pd.DataFrame(x)

# Display the resulting DataFrame
df

[3]:
   Name  ID       Department  Salary
0  Rose   1  Architect Group  100000
1  John   2   Software Group   80000
2  Jane   3      Design Team   50000
3  Mary   4   Infrastructure   60000

We can see the direct correspondence between the dictionary and the table. The keys correspond to the column labels,
and each list of values supplies the entries of that column, one per row.

Column Selection: To select a column in a Pandas DataFrame, refer to it by its column name
inside square brackets. Double brackets, as in df[['ID']], return the result as a DataFrame.
Let’s retrieve the data present in the ID column.

[4]: #Retrieving the "ID" column and assigning it to a variable x


x = df[['ID']]
x

[4]: ID
0 1
1 2
2 3
3 4

Let’s use the type() function and check the type of the variable.

[5]: #check the type of x


type(x)

[5]: pandas.core.frame.DataFrame

The output shows us that the type of the variable is a DataFrame object.

Access to multiple columns: Let us retrieve the data for the Department, Salary and ID columns.

[6]: # Retrieving the Department, Salary and ID columns and assigning them to a variable z

z = df[['Department','Salary','ID']]
z

[6]: Department Salary ID


0 Architect Group 100000 1
1 Software Group 80000 2
2 Design Team 50000 3
3 Infrastructure 60000 4

1.1.2 Try it yourself


Problem 1: Create a dataframe to display the result as below:
[7]: #write your code here

Click here for the solution

a = {'Student':['David', 'Samuel', 'Terry', 'Evan'],
'Age':['27', '24', '22', '32'],
'Country':['UK', 'Canada', 'China', 'USA'],
'Course':['Python','Data Structures','Machine Learning','Web Development'],
'Marks':['85','72','89','76']}
df1 = pd.DataFrame(a)
df1

Problem 2: Retrieve the Marks column and assign it to a variable b


[8]: #write your code here

Click here for the solution


b = df1[['Marks']]
b

Problem 3: Retrieve the Country and Course columns and assign them to a variable c
[9]: #write your code here

Click here for the solution


c = df1[['Country','Course']]
c

To view a column as a Series, use a single pair of brackets:


[10]: # Get the Student column as a Series object

x = df1['Student']
x

Note: df1 is defined in the solution to Problem 1, so that code must be run first.
Otherwise this cell raises NameError: name 'df1' is not defined.

[ ]: #check the type of x


type(x)

The output shows us that the type of the variable is a Series object.
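
To make the difference between single and double brackets concrete, here is a small self-contained sketch. It assumes a hypothetical two-row DataFrame named df1_demo rather than the lab's df1, so it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for df1, reduced to two rows
df1_demo = pd.DataFrame({"Student": ["David", "Samuel"], "Marks": ["85", "72"]})

as_series = df1_demo["Student"]    # single brackets: one-dimensional Series
as_frame = df1_demo[["Student"]]   # double brackets: one-column DataFrame

print(type(as_series).__name__)          # Series
print(type(as_frame).__name__)           # DataFrame
print(as_series.shape, as_frame.shape)   # (2,) (2, 1)
```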

1.1.3 Exercise 2: loc and iloc
loc is a label-based selection method, which means that we pass the name (label) of the row
or column that we want to select. It is used with square brackets rather than parentheses.
When a range of labels is passed, loc includes the last element of the range.
Simple syntax for your understanding:
• loc[row_label, column_label]
iloc is an integer position-based selection method, which means that we pass an integer position
to select a specific row/column. When a range is passed, iloc does not include the last element of the
range.
Simple syntax for your understanding:
• iloc[row_index, column_index]
Let us see some examples.

[ ]: # Access the value on the first row and the first column

df.iloc[0, 0]

[ ]: # Access the value on the first row and the third column

df.iloc[0,2]

[ ]: # Access the value in the first row of the Salary column using labels

df.loc[0, 'Salary']
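
Since the cells above are shown without output, the following sketch rebuilds the lab's DataFrame and prints what each lookup should return. The variable names first_name, department and salary are introduced here only for illustration.

```python
import pandas as pd

# Rebuild the lab's DataFrame so this block runs on its own
df = pd.DataFrame({
    "Name": ["Rose", "John", "Jane", "Mary"],
    "ID": [1, 2, 3, 4],
    "Department": ["Architect Group", "Software Group", "Design Team", "Infrastructure"],
    "Salary": [100000, 80000, 50000, 60000],
})

first_name = df.iloc[0, 0]      # row position 0, column position 0
department = df.iloc[0, 2]      # row position 0, column position 2
salary = df.loc[0, "Salary"]    # row label 0, column label 'Salary'

print(first_name)   # Rose
print(department)   # Architect Group
print(salary)       # 100000
```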

Let us create a new dataframe called ‘df2’ and assign ‘df’ to it. Now, let us set the “Name” column
as an index column using the method set_index().

[ ]: df2=df
df2=df2.set_index("Name")
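
One detail worth checking here is that set_index returns a new DataFrame and leaves the original untouched, which is why df keeps its integer index while df2 is indexed by name. The sketch below rebuilds the lab's data so it is self-contained and condenses the two-step assignment into one line.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rose", "John", "Jane", "Mary"],
    "ID": [1, 2, 3, 4],
    "Department": ["Architect Group", "Software Group", "Design Team", "Infrastructure"],
    "Salary": [100000, 80000, 50000, 60000],
})

# set_index returns a new object; df itself is not modified
df2 = df.set_index("Name")

print(df.columns.tolist())    # ['Name', 'ID', 'Department', 'Salary']
print(df2.columns.tolist())   # ['ID', 'Department', 'Salary']
print(df2.index.tolist())     # ['Rose', 'John', 'Jane', 'Mary']
```

Because Name has moved into the index, the column positions in df2 shift by one: Salary is at position 2 in df2 but at position 3 in df.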

[ ]: #To display the first 5 rows of new dataframe


df2.head()

[ ]: # Now, let us access a value using the row label and the column name


df2.loc['Jane', 'Salary']

1.1.4 Try it yourself


Use loc to get the Department of Jane in the newly created DataFrame df2.

[ ]: #write your code here

Click here for the solution


df2.loc['Jane', 'Department']

Use iloc to get the Salary of Mary in the newly created DataFrame df2.

[ ]: #write your code here

Click here for the solution


df2.iloc[3,2]

1.1.5 Exercise 3: Slicing


Slicing uses the [] operator to select a set of rows and/or columns from a DataFrame.
To slice out a set of rows, you use this syntax: data[start:stop].
Here start is the index of the first row to select, and stop is the index one step
BEYOND the last row you want to select. You can slice using either integer positions or
labels.
NOTE: When slicing in pandas, the start bound is included in the output.
So if you want to select rows 0, 1, and 2, your code would look like this: df.iloc[0:3].
This tells Python to start at index 0 and select rows 0, 1, and 2, up to but not including 3.
NOTE: Labels must be found in the DataFrame or you will get a KeyError.
Indexing by labels (i.e. using loc) differs from indexing by integers (i.e. using iloc). With loc,
both the start bound and the stop bound are inclusive. When using loc, integers can be used,
but the integers refer to the index label and not the position.
For example, using loc to select 1:4 will get a different result than using iloc to select rows
1:4.
We can also select a specific data value using a row and column location within the DataFrame
and iloc indexing.
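
The difference between selecting 1:4 with loc and with iloc is easiest to see when the index labels do not start at 0. The sketch below assumes a hypothetical DataFrame s_df whose labels run from 1 to 5; it is not part of the lab's data.

```python
import pandas as pd

# Hypothetical frame whose integer labels start at 1 instead of 0
s_df = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=[1, 2, 3, 4, 5])

by_position = s_df.iloc[1:4]   # positions 1, 2, 3 (stop excluded)
by_label = s_df.loc[1:4]       # labels 1, 2, 3, 4 (stop included)

print(by_position["value"].tolist())   # [20, 30, 40]
print(by_label["value"].tolist())      # [10, 20, 30, 40]

# Looking up a single label that is not in the index raises KeyError
try:
    s_df.loc[9]
except KeyError:
    print("label 9 is not in the index")
```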

[ ]: # let us do the slicing using old dataframe df

df.iloc[0:2, 0:3]

[ ]: # Slice the old DataFrame df with loc; the stop label is inclusive, so index labels 0, 1 and 2 are selected

df.loc[0:2,'ID':'Department']

[ ]: # Slice the new DataFrame df2 with loc, where the index column is Name; labels Rose, John and Jane are selected

df2.loc['Rose':'Jane', 'ID':'Department']
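
As a check on the three slices above, this self-contained sketch rebuilds df and df2 and prints the shape of each result. The names pos_slice, label_slice and name_slice are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rose", "John", "Jane", "Mary"],
    "ID": [1, 2, 3, 4],
    "Department": ["Architect Group", "Software Group", "Design Team", "Infrastructure"],
    "Salary": [100000, 80000, 50000, 60000],
})
df2 = df.set_index("Name")

# iloc excludes the stop: rows 0-1, columns Name, ID, Department
pos_slice = df.iloc[0:2, 0:3]
# loc includes the stop: rows 0-2, columns ID through Department
label_slice = df.loc[0:2, "ID":"Department"]
# loc on the name index: rows Rose through Jane, columns ID through Department
name_slice = df2.loc["Rose":"Jane", "ID":"Department"]

print(pos_slice.shape)     # (2, 3)
print(label_slice.shape)   # (3, 2)
print(name_slice.shape)    # (3, 2)
```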

Try it yourself
Using loc, slice the old DataFrame df to retrieve the Name, ID and Department columns for the
rows with index labels 2 and 3.

[ ]: # Write your code below and press Shift+Enter to execute

Click here for the solution


df.loc[2:3,'Name':'Department']

Congratulations, you have completed this lesson and the practice lab on Pandas.

1.2 Author(s):
Appalabhaktula Hema
© IBM Corporation 2022. All rights reserved.
Change Log

Date (YYYY-MM-DD)   Version   Changed By            Change Description
2022-03-31          0.1       Appalabhaktula Hema   Created initial version
