AI Lab5

This document outlines a practical lab exercise for students at Sukkur IBA University focused on Exploratory Data Analysis (EDA) using Python. It includes objectives, a description of EDA, and a step-by-step guide on analyzing the Titanic dataset, along with tasks for students to perform EDA on additional datasets. The lab emphasizes the use of visualization techniques to uncover insights and patterns in data.

Uploaded by

ayesha mangrio
Sukkur IBA University

Department of Computer Science


6th Semester, 3rd Year

Artificial Intelligence - Lab

Practical No. 5
To perform Exploratory Data Analysis using Python
Student’s Roll no: _______________ Points Scored: __________________________

Date of Conduct: ________________ Teacher’s Signature: ___________________

LAB PERFORMANCE INDICATOR | SUBJECT KNOWLEDGE | DATA ANALYSIS AND INTERPRETATION | ABILITY TO CONDUCT EXPERIMENT | PRESENTATION | CALCULATION AND CODING | OBSERVATION/RESULTS | SCORE

OBJECTIVES: Upon successful completion of this practical, the students will be able to:
● Understand Exploratory Data Analysis.
● Perform Exploratory Data Analysis on a given dataset.

Exploratory Data Analysis

Exploratory Data Analysis (EDA) is used to draw insights from data. Data scientists and
analysts look for patterns, relationships, and anomalies in the data using statistical
graphs and other visualization techniques. The following are part of EDA:
● Get maximum insights from a data set.
● Uncover underlying structure.
● Extract important variables from the dataset.
● Detect outliers and anomalies (if any).
● Test underlying assumptions.
● Determine the optimal factor settings.

How to perform Exploratory Data Analysis using Python


● Titanic Dataset. Download link: https://www.kaggle.com/c/titanic/data
It is one of the most popular datasets for learning machine learning basics. It contains
information about passengers aboard the RMS Titanic, which sank in 1912.
The dataset can be used to predict whether a given passenger survived.

Steps:

To check null values

The columns with null values are Age, Cabin, and Embarked. They need to be filled with
appropriate values later on.
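
The null check can be sketched as below. This is a minimal example, not the lab's original code: the small DataFrame is a made-up stand-in so the snippet runs on its own, and the file name `train.csv` in the comment is an assumption about where the Kaggle download was saved.

```python
import numpy as np
import pandas as pd

# Made-up stand-in rows. In the lab, load the Kaggle file instead, e.g.
# df = pd.read_csv("train.csv")  # file name/location is an assumption
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
})

# Count missing values per column
null_counts = df.isnull().sum()

# Keep only the columns that actually contain missing values
cols_with_nulls = null_counts[null_counts > 0].index.tolist()

print(null_counts)
print(cols_with_nulls)
```

On the real training file, the same two lines should report Age, Cabin, and Embarked as the columns with missing values.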
● Using Seaborn
Seaborn:
Seaborn is a Python library for statistical data visualization. It is built on top of Matplotlib and
provides a higher-level, easier-to-use interface. It can be installed with pip: pip install seaborn

● Visualizing data using graphical analysis


From a count plot of Sex against Survived, the survival rate of men can be estimated at around 20% and
that of women at around 75%. A passenger's sex is therefore an important factor in
determining survival.
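
A plot of this kind, together with the rates behind it, might be produced as follows. This is a sketch under assumptions: the rows are a made-up toy sample (not the real passenger records), and the choice of `sns.countplot` is a guess at the plot type used in the lab.

```python
import pandas as pd
import seaborn as sns

# Made-up toy rows; replace with the real DataFrame loaded from the Kaggle file
df = pd.DataFrame({
    "Sex": ["male"] * 5 + ["female"] * 4,
    "Survived": [0, 0, 0, 0, 1, 1, 1, 1, 0],
})

# Survival rate per sex: the mean of a 0/1 column is the fraction of 1s
rate = df.groupby("Sex")["Survived"].mean()
print(rate)

# Bars of survivors vs. non-survivors, split by sex
ax = sns.countplot(data=df, x="Sex", hue="Survived")
ax.set_title("Survived vs. not survived, by sex")
# When running interactively, call matplotlib.pyplot.show() to display the figure.
```

Computing the group means alongside the plot avoids having to estimate percentages by eye.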
● Pclass (Ordinal Feature) vs Survived

This comparison shows whether higher-class passengers had a higher survival rate than lower-class
ones, or vice versa. Class 1 passengers had a higher survival rate than those in classes 2 and 3,
which suggests that Pclass is strongly related to a passenger's chance of survival.
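
The class comparison can be tabulated directly. The sketch below uses a made-up toy sample so it runs standalone; `pd.crosstab` and a grouped mean are one reasonable way to get the counts and rates, not necessarily what the original lab used.

```python
import pandas as pd

# Made-up toy rows; replace with the real DataFrame loaded from the Kaggle file
df = pd.DataFrame({
    "Pclass": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1, 0],
})

# Counts of survivors / non-survivors per class, with row and column totals
counts = pd.crosstab(df["Pclass"], df["Survived"], margins=True)

# Survival rate per class
rate_by_class = df.groupby("Pclass")["Survived"].mean()

print(counts)
print(rate_by_class)
```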
● Age (Continuous Feature) vs Survived
This graph summarizes the age ranges of the men, women, and children who were saved. The
survival rate is:
● High for children.
● High for women in the age range 20-50.
● Lower for men as age increases.
Since the Age column is important, its missing values need to be filled, either by using the Name
column (estimating age from the salutation: Mr, Mrs, etc.) or by using a regressor.
After this step, another column, Age_Range (based on the Age column), can be created and the data
analyzed again.
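
One way to fill Age from the salutation and then derive Age_Range is sketched below. The names and ages are invented for illustration, the helper column `Title` is hypothetical, and the bin edges are an assumption; the lab does not specify them.

```python
import numpy as np
import pandas as pd

# Invented rows in the same "Surname, Title. Given name" format as the Name column
df = pd.DataFrame({
    "Name": ["Doe, Mr. John", "Roe, Mrs. Jane", "Poe, Miss. Ann", "Low, Mr. Sam",
             "Kay, Master. Tim", "Moe, Mr. Dan", "Lee, Master. Ben"],
    "Age": [22.0, 38.0, 26.0, 35.0, 2.0, np.nan, np.nan],
})

# Salutation = text between the comma and the first period (hypothetical helper column)
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# Fill each missing age with the mean age of passengers sharing the same salutation
df["Age"] = df["Age"].fillna(df.groupby("Title")["Age"].transform("mean"))

# Bin ages into ranges; these edges are an assumption, adjust as needed
bins = [0, 16, 32, 48, 64, 80]
labels = ["0-16", "16-32", "32-48", "48-64", "64-80"]
df["Age_Range"] = pd.cut(df["Age"], bins=bins, labels=labels)

print(df[["Name", "Title", "Age", "Age_Range"]])
```

Using the group mean is the simplest choice; a group median or a regressor trained on the other columns are alternatives.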
● Bar Plot for Fare (Continuous Feature)

Fare denotes the fare paid by a passenger. As the values in this column are continuous, they need to
be grouped into bins (as done for the Age feature) to get a clear picture. The plot suggests that
passengers who paid a higher fare had a higher survival rate.
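
Binning Fare might look like the sketch below. The rows are made up, the column name `Fare_Range` is hypothetical, and splitting into four equal-sized quantile bins with `pd.qcut` is an assumption; the lab does not say how many bins to use.

```python
import pandas as pd

# Made-up toy rows; replace with the real DataFrame loaded from the Kaggle file
df = pd.DataFrame({
    "Fare": [7.25, 7.9, 8.05, 13.0, 26.0, 30.0, 71.3, 263.0],
    "Survived": [0, 0, 0, 1, 0, 1, 1, 1],
})

# Split fares into four quantile-based bins (Q1 = cheapest, Q4 = most expensive)
df["Fare_Range"] = pd.qcut(df["Fare"], 4, labels=["Q1", "Q2", "Q3", "Q4"])

# Survival rate within each fare bin
rate_by_fare = df.groupby("Fare_Range", observed=True)["Survived"].mean()
print(rate_by_fare)
```

Quantile bins keep the group sizes comparable, which matters because fares are heavily skewed toward low values.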
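● Categorical Count Plots for Embarked Feature
Embarked records the port of boarding: S (Southampton), C (Cherbourg), or Q (Queenstown). Some notable observations are:
● The majority of passengers boarded at S, so the missing values can be filled with S.
● The majority of passengers who boarded at Q were in class 3.
● Among passengers who boarded at S, those in classes 1 and 2 fared better than those in class 3.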
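
Filling the missing Embarked values with the most frequent port, and tabulating port against class, can be sketched as follows. The rows are a made-up toy sample, so the counts will differ from the real dataset.

```python
import pandas as pd

# Made-up toy rows; replace with the real DataFrame loaded from the Kaggle file
df = pd.DataFrame({
    "Embarked": ["S", "S", "S", "C", "Q", "S", None, "C"],
    "Pclass": [3, 1, 2, 1, 3, 3, 1, 2],
})

# Most frequent port of embarkation (mode() ignores missing values)
most_common = df["Embarked"].mode()[0]

# Replace missing ports with the most frequent one
df["Embarked"] = df["Embarked"].fillna(most_common)

# Passenger counts per port and class
port_by_class = pd.crosstab(df["Embarked"], df["Pclass"])

print(most_common)
print(port_by_class)
```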

Lab Tasks

1. Perform the lab work above; include your code and output screenshots.

2. Perform EDA on the iris dataset. Download link: https://datahub.io/machine-learning/iris

3. Perform EDA on Marketing_Data_Analysis: https://github.com/Kaushik-Varma/Marketing_Data_Analysis

The End
