Data Science Lab 3

Uploaded by

Tayyaba Faisal
Bahria University, Islamabad Campus

Department of Computer Science

Department of Computing

CSL487: Introduction to Data Science Lab

Class: BSCS-6A

Lab 3: Working with Data in Python

Date: 20-2-2020

Time: 8:30 AM-11:00 AM

Instructor: Tayyaba Faisal



Introduction

The purpose of this lab is to get familiar with data science in Python. In this lab we explore
data in Python using examples. I encourage you to type all Python commands on your own
machine.

Tools/Software Requirement
Python, Jupyter Notebook

Note: Comment your program.

Let’s get started


Start by importing the Pandas module and loading the data set into the Python environment
as a Pandas DataFrame:
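
The loading step might look like the sketch below. The handout does not name the file, so the CSV text here is an invented stand-in that is read through io.StringIO; with a real file you would pass its path to pd.read_csv instead. The column names are borrowed from the later examples in this lab, and the rows are made up.

```python
import io

import pandas as pd

# Invented sample rows standing in for the real CSV file (assumption).
csv_text = """Gender,Education,ApplicantIncome,CoapplicantIncome,Loan_Status
Male,Graduate,5849,0,Y
Female,Not Graduate,3000,1500,Y
Female,Graduate,4200,0,N
"""

# With a file on disk this would be pd.read_csv("<path to your file>").
data = pd.read_csv(io.StringIO(csv_text))

# Show the first rows to confirm the data loaded as expected.
print(data.head())
```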

Boolean Indexing in Pandas


Boolean indexing lets you filter the values of a column based on conditions on another set
of columns of a Pandas DataFrame.

For instance, we want a list of all females who are not graduates and got a
loan. Boolean indexing can help here. You can use the following code:

data.loc[(data["Gender"] == "Female") & (data["Education"] == "Not Graduate") & (data["Loan_Status"] == "Y"),
         ["Gender", "Education", "Loan_Status"]]
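
To try this without the full data set, here is a minimal sketch on a tiny invented DataFrame that uses the same three column names. The rows are assumptions made for illustration only. Each comparison is wrapped in parentheses because & binds more tightly than == in Python.

```python
import pandas as pd

# Invented rows; only the column names come from the lab text.
data = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Female"],
    "Education": ["Not Graduate", "Not Graduate", "Graduate", "Not Graduate"],
    "Loan_Status": ["Y", "Y", "Y", "N"],
})

# Build the Boolean mask one condition at a time, combined with &.
mask = (
    (data["Gender"] == "Female")
    & (data["Education"] == "Not Graduate")
    & (data["Loan_Status"] == "Y")
)

# Keep the rows where the mask is True and only the listed columns.
result = data.loc[mask, ["Gender", "Education", "Loan_Status"]]
print(result)
```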
Data types of a DataFrame

# Check the current type of each column:

data.dtypes
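
As a small illustration, the sketch below builds an invented DataFrame and inspects its column types. The column LoanAmount and all values are assumptions, not part of the lab data.

```python
import pandas as pd

# Invented data with one text, one integer and one float column.
data = pd.DataFrame({
    "Gender": ["Female", "Male"],
    "ApplicantIncome": [3000, 5849],
    "LoanAmount": [66.0, 128.0],  # hypothetical column
})

# dtypes returns a Series that maps each column name to its type.
print(data.dtypes)

# The helpers in pd.api.types test a column's type programmatically.
income_is_int = pd.api.types.is_integer_dtype(data["ApplicantIncome"])
amount_is_float = pd.api.types.is_float_dtype(data["LoanAmount"])
```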

Filter data in a Pandas DataFrame


We can also apply conditions to the data we are inspecting, for example to filter it.

dataframe.Height > 1300

Would return:

0 True
1 True
2 False
3 False
4 False
Name: Height, dtype: bool

This returns a new Series of True/False values though. To actually filter the data, we
need to use this Series to mask our original DataFrame:

dataframe[dataframe.Height > 1300]
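
The two steps can be tried on a small invented DataFrame, as in the sketch below. The Name column and the height values are assumptions, chosen so that the mask matches the True/False output shown above.

```python
import pandas as pd

# Invented heights; the first two exceed 1300, the rest do not.
dataframe = pd.DataFrame({
    "Name": ["Peak A", "Peak B", "Peak C", "Peak D", "Peak E"],
    "Height": [1345, 1309, 1296, 1291, 1258],
})

# Step 1: the comparison yields a Boolean Series, one value per row.
mask = dataframe.Height > 1300

# Step 2: indexing with that Series keeps only the rows marked True.
tall = dataframe[mask]
print(tall)
```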

Append data to an existing DataFrame


We can also add a new column to the DataFrame by assigning a list with one value per row:
dataframe['Region'] = ['Grampian', 'Cairngorm', 'Cairngorm', 'Cairngorm', 'Cairngorm']
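
The sketch below runs the same assignment on an invented five-row DataFrame (the height values are assumptions). It also shows that a list whose length differs from the number of rows is rejected with a ValueError.

```python
import pandas as pd

# Invented five-row DataFrame to receive the new column.
dataframe = pd.DataFrame({"Height": [1345, 1309, 1296, 1291, 1258]})

# Assigning a list of matching length creates the Region column.
dataframe["Region"] = ["Grampian", "Cairngorm", "Cairngorm", "Cairngorm", "Cairngorm"]

# A list of the wrong length is rejected and no column is added.
length_error = None
try:
    dataframe["Too_Short"] = ["only", "two"]
except ValueError as error:
    length_error = str(error)

print(dataframe)
print("Rejected:", length_error)
```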

Sorting Pandas DataFrames


Pandas allows easy sorting based on multiple columns. This can be done as follows:

data_sorted = data.sort_values(['ApplicantIncome','CoapplicantIncome'], ascending=False)
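
A minimal sketch of multi-column sorting on invented income values follows. The second column only matters where the first column has ties. Passing a list to ascending sets the direction per column; the name data_mixed is made up for this example.

```python
import pandas as pd

# Invented incomes with a tie in ApplicantIncome (rows 1 and 2).
data = pd.DataFrame({
    "ApplicantIncome": [3000, 5000, 5000, 2000],
    "CoapplicantIncome": [0, 1000, 2500, 500],
})

# Both columns descending, as in the lab text.
data_sorted = data.sort_values(["ApplicantIncome", "CoapplicantIncome"], ascending=False)

# First column descending, second ascending (one flag per column).
data_mixed = data.sort_values(
    ["ApplicantIncome", "CoapplicantIncome"], ascending=[False, True]
)

print(data_sorted)
print(data_mixed)
```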

Missing Values in DataSet


1. Drop Rows

Sometimes a CSV file has null values, which are later displayed as NaN in the
DataFrame. The Pandas dropna() method allows the user to drop rows or columns
with null values in different ways:

DataFrameName.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
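
The effect of the main parameters can be seen in the sketch below, which uses an invented DataFrame (column names and values are assumptions). Because inplace is left at False, each call returns a new DataFrame and the original is unchanged.

```python
import numpy as np
import pandas as pd

# Invented data with missing values in two columns.
df = pd.DataFrame({
    "Item": ["A", "B", "C", "D"],
    "Weight": [9.3, np.nan, 17.5, np.nan],
    "Size": ["Small", "Medium", None, None],
})

# how="any": drop a row if any of its values is missing.
any_dropped = df.dropna(axis=0, how="any")

# subset: only look at the Weight column when deciding what to drop.
weight_required = df.dropna(subset=["Weight"])

# how="all" with subset: drop a row only if both listed columns are missing.
both_missing_dropped = df.dropna(how="all", subset=["Weight", "Size"])

print(len(df), len(any_dropped), len(weight_required), len(both_missing_dropped))
```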

2. Replace Missing Values


A CSV file may have null values, which are later displayed as NaN in the DataFrame. Just as
the Pandas dropna() method removes null values from a DataFrame, fillna() lets
the user replace NaN values with a value of their own.
DataFrame.fillna(value=None, method=None, axis=None,
inplace=False, limit=None, downcast=None)
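
As a hedged illustration, the sketch below fills missing values in an invented DataFrame. The column names, the fill values 0.0 and "Unknown", and the choice of the column mean as a replacement are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Invented data with one missing value per column.
df = pd.DataFrame({
    "Weight": [9.3, np.nan, 17.5],
    "Size": ["Small", None, "High"],
})

# A dict passed as value sets a separate replacement for each column.
filled = df.fillna(value={"Weight": 0.0, "Size": "Unknown"})

# A single column can also be filled with a computed value such as its mean.
mean_filled = df["Weight"].fillna(df["Weight"].mean())

print(filled)
print(mean_filled)
```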

LAB TASKS

Task 1
Create a DataFrame named after yourself with at least 6 attributes, insert data, and filter
it using the chain method. Display the DataFrame and the filter results.
(For example, you can create a DataFrame of a car with its specifications.)

Task 2

Bigmart-sales dataset
Retail is another industry that uses analytics extensively to optimize business processes.
Tasks like product placement, inventory management, customized offers, and product bundling
are being handled smartly using data science techniques. As the name suggests, this
data comprises transaction records of a sales store. The data has 8523 rows of 12
variables.

Import the bigmart-sales dataset (download it from Piazza), then:


• Display the data.
• Display the data types of the attributes.
• Filter by item_type = frozen foods.
• Append 5 records to the existing dataset, sort in ascending order, and display the result.
• Plot a boxplot and a histogram by outlet_type; state the number of bins used in the histogram.
• Replace missing item_weight values with “1.00”.
• Replace missing outlet_size values with “Small”.
• Drop the rows whose outlet_type is missing and display the number of rows afterwards.
• Plot a boxplot and a histogram by outlet_type again, state the number of bins in the histogram, and compare the results with the earlier plots.

Deliverables: Submit your Python files as a zip archive, along with the lab journal, before the next lab.
