REPORT - Assignment 1

This document summarizes a student project analyzing a dataset of 12,000 car entries. The students used Python and pandas to load and explore the dataset. They investigated central tendencies and dispersion, identified and removed duplicate and null values, and detected and removed outliers. Visualizations were created to better understand patterns in the data. The overall goal was to clean the data and gain insights through exploratory analysis using various Python libraries.

Uploaded by

hardik solanki

Data Visualization and Exploratory Data Analysis

Katleen Ezekeil Orata (c0848019), Hardik Solanki (0852302), Mohammad Imran Uddin (c0800487)
Artificial Intelligence & Machine Learning Program
Lambton College
Toronto, Canada

Abstract—This electronic document is a “live” template and already defines the components of your paper [title, text, heads, etc.] in its style sheet. *CRITICAL: Do Not Use Symbols, Special Characters, Footnotes, or Math in Paper Title or Abstract. (Abstract)

Keywords—data preprocessing, exploratory data analysis, central tendency, dispersion, outlier, visualization

I. INTRODUCTION
The students are tasked with investigating a dataset (data.csv) that contains 12,000 observations with 16 different attributes and performing an exploratory data analysis. The purpose of the assignment is for the students to explore various Python libraries that may be applied to the analysis, handle typical data errors, and illustrate patterns and insights from the data.

II. DATASET
The dataset contains 12,000 entries with a total of 16 columns. Each column describes a feature of the cars, such as make, model, engine fuel type, fuel type, popularity, etc.

III. DATA LOADING AND OVERVIEW

For reading and exploring the dataset, the students imported the Python pandas library and used its methods to load the dataset and perform an initial inspection.

The students used the .head() and .tail() functions to inspect the first and last rows of the dataset. This gives a quick look at the information, helping them develop hypotheses and indicating the type of analysis they can conduct. Using .head() and .tail(), the students also spotted some duplicated rows, as seen in Figure 2.

Figure 3: Result using the .info() method

The .info() and .describe() functions were also used during the initial stages of the analysis. They helped the students gain insight not just into the number of observations and the total number of features but also into the data types and the counts of non-null and null values.
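
The loading and inspection steps described in this section can be sketched roughly as follows. This is a minimal illustration, not the students' actual notebook: the report does not include their code, so the helper name load_cars, the column names, and the four sample rows are assumptions made up so that the example runs without data.csv.

```python
import io

import pandas as pd

# Stand-in for data.csv: a few invented rows so the sketch runs on its own.
# Column names and values are illustrative assumptions, not the real dataset.
CSV_TEXT = """Make,Model,Engine Fuel Type,Popularity
Audi,A4,premium unleaded,3105
Audi,A4,premium unleaded,3105
Ford,Focus,regular unleaded,5657
Kia,Rio,,1720
"""


def load_cars(source):
    """Read a CSV file (path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


df = load_cars(io.StringIO(CSV_TEXT))  # in the assignment: load_cars("data.csv")

print(df.head())      # first rows (five by default)
print(df.tail())      # last rows (five by default)
df.info()             # dtypes and non-null counts per column
print(df.describe())  # summary statistics for numeric columns

null_counts = df.isna().sum()  # number of missing values per column
print(null_counts)
```

With the real file, the same calls would report the 12,000 rows and 16 columns mentioned in Section II; isna().sum() complements .info() by giving the null count per column directly.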

Figure 1: The head or the first five rows of the dataset
Figure 2: The tail or the last five rows of the dataset

IV. HANDLING DUPLICATE AND NULL VALUES

Many real-world datasets contain incomplete and erroneous information, which lowers their quality. One form of inaccurate data is duplication: a row is considered a duplicate when all of its values match all of the values in another row. To manage this, the team used the pandas duplicated() function to count the duplicate rows in the dataset and then used the drop_duplicates() function to remove 801 duplicate rows.
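
As a rough sketch of the duplicate handling described here, the snippet below counts and drops exact duplicate rows with pandas. The small DataFrame is invented for illustration (the real dataset and its 801 duplicates are not reproduced), and the column names are assumptions.

```python
import pandas as pd

# Invented rows for illustration; the second row repeats the first exactly.
df = pd.DataFrame(
    {
        "Make": ["Audi", "Audi", "Ford", "Ford"],
        "Model": ["A4", "A4", "Focus", "Fiesta"],
        "Popularity": [3105, 3105, 5657, 5657],
    }
)

# duplicated() marks a row True when every value matches an earlier row.
n_duplicates = int(df.duplicated().sum())

# drop_duplicates() keeps the first occurrence and returns a new DataFrame.
deduped = df.drop_duplicates().reset_index(drop=True)

print(f"duplicate rows: {n_duplicates}")
print(f"rows before: {len(df)}, rows after: {len(deduped)}")
```

Note that the two Ford rows share a popularity value but differ in model, so they are not treated as duplicates; only rows that match in every column are removed.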
Figure 3: The head or the first five rows of the dataset

Another form of poor-quality data is the existence of null or missing values. Before handling the NaN values, the team decided to review the distribution and dispersion of the dataset, which is discussed in Section V of the paper. The insight from this analysis will be used to decide which imputation technique is best suited to the dataset.

V. MEASURE OF CENTRAL TENDENCY
Descriptive statistics are one of the foundations of advanced analytics and data science. They are measurements that summarize a set of data and may be further subdivided into measures of central tendency and measures of dispersion.

The students used pandas' built-in functions to measure central tendency and variability. The measures include mean, median, mode, standard deviation, variance, and skewness.

VI. OUTLIER DETECTION AND REMOVAL

The term "outlier" refers to a data point or observation that deviates significantly from the norm or average of the data set. Outliers can distort statistical results by having a large impact on statistics like the mean and other measures of central tendency. In addition, they can mislead machine learning model training, leading to longer training times, less accurate models, and ultimately subpar outcomes.

To detect and handle the outliers, the students used the interquartile range (IQR).
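
The descriptive measures of Section V and the IQR rule of Section VI might look like the sketch below in pandas. Several things here are assumptions rather than details from the report: the helper names summarize and remove_iqr_outliers, the Price column, the sample values, and the 1.5 multiplier on the IQR (a common convention; the report does not say which fence the students used).

```python
import pandas as pd


def summarize(series):
    """Return the measures listed in Section V for one numeric column."""
    return {
        "mean": series.mean(),
        "median": series.median(),
        "mode": series.mode().tolist(),  # a column can have several modes
        "std": series.std(),             # sample standard deviation (ddof=1)
        "variance": series.var(),        # sample variance (ddof=1)
        "skewness": series.skew(),
    }


def remove_iqr_outliers(df, column, k=1.5):
    """Drop rows whose value in `column` lies outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the common convention and an assumption here; the report does
    not state which multiplier the students used.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[df[column].between(lower, upper)]


# Invented prices for illustration; 500000 is the obvious outlier.
cars = pd.DataFrame({"Price": [20000, 22000, 22000, 25000, 27000, 30000, 500000]})

stats = summarize(cars["Price"])
cleaned = remove_iqr_outliers(cars, "Price")

print(stats)
print(f"rows before: {len(cars)}, rows after: {len(cleaned)}")
```

In this toy column the single extreme value pulls the mean far above the median and produces positive skewness, which is the kind of distortion Section VI describes. Because between() evaluates to False for missing values, this filter also drops rows where the column is NaN, so the order of imputation and outlier removal matters.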


VII. DATA VISUALIZATION
