Uploaded by

Krishna Ugale


TITLE: Data Visualization III

PROBLEM STATEMENT/DEFINITION: Download the Iris flower dataset or any other dataset into a DataFrame (e.g., https://archive.ics.uci.edu/ml/datasets/Iris). Scan the dataset and give the inference as:
1. List down the features and their types (e.g., numeric, nominal) available in the dataset.
2. Create a histogram for each feature in the dataset to illustrate the feature distributions.
3. Create a box plot for each feature in the dataset.
4. Compare distributions and identify outliers.

OBJECTIVE: To implement data visualization techniques.

S/W PACKAGES AND HARDWARE APPARATUS USED:
1. Operating System: 64-bit open-source Linux or its derivative
2. Programming Languages: Python/R

STEPS: Refer to the student activity flow chart, if the subject teacher finds it necessary and it is relevant to the subject manual. Describe steps only.

INSTRUCTIONS FOR WRITING JOURNAL:
1. Title
2. Problem statement
3. Learning objective
4. Learning outcome
5. Theory (includes methods, libraries and functions)
6. Analysis (as per assignment)
7. Conclusion

Head of Department: Dr. M.S. Takalikar
Subject Co-ordinator: Dr. S.S. Sonawane
P:F:-LTL-UG / 03 / R1

Assignment No. 10

Aim:

Summary statistics, data visualization, histograms, and box plots for the features of the Iris dataset or any other dataset.

Problem Statement / Definition:

o Download the Iris flower dataset or any other dataset into a DataFrame (e.g., https://archive.ics.uci.edu/ml/datasets/Iris). Scan the dataset and give the inference as:
  - List down the features and their types (e.g., numeric, nominal) available in the dataset.
  - Create a histogram for each feature in the dataset to illustrate the feature distributions.
  - Create a box plot for each feature in the dataset.

Prerequisites:

o Database management systems, Python/R programming

Learning Objectives:

o Learn to use datasets, DataFrames, and dataset features in an application.

o Learn to compute summary statistics for the features.

o Learn to use visualization techniques.

Learning Outcome:

o Students will be able to compute summary statistics on the features of a dataset and to visualize those features with histograms and box plots.
Theory:

Data analysis is a process of inspecting, cleansing, transforming, and modelling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains.

A data set (or dataset) is a collection of data. Most commonly, a data set corresponds to the contents of a single database table or a single statistical data matrix, where every column of the table represents a particular variable and each row corresponds to a given member of the data set in question.

Iris flower dataset:

The Iris dataset contains four features (the length and width of the sepals and petals) measured on 50 samples from each of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor). These measurements were used to create a linear discriminant model to classify the species. The dataset is often used in data mining, classification, and clustering examples and to test algorithms.

Attribute Information:
-> sepal length in cm
-> sepal width in cm
-> petal length in cm
-> petal width in cm
-> class:
   - Iris Setosa
   - Iris Versicolour
   - Iris Virginica

Number of Instances: 150
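Item 1 of the problem statement (listing the features and their types) can be checked programmatically. The sketch below is a stdlib-only illustration: it parses a few rows written in the comma-separated format of the UCI `iris.data` file and labels a column numeric if every value parses as a number, nominal otherwise. The column names and the short excerpt are assumptions for illustration; in practice, read the full file. With pandas, `pd.read_csv(path, header=None, names=[...])` followed by `df.dtypes` gives the same answer (the file has no header row).

```python
import csv
import io

# A few rows in the format of the UCI "iris.data" file (illustrative excerpt):
# sepal length, sepal width, petal length, petal width (all in cm), class.
IRIS_EXCERPT = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""

# Assumed column names; the raw file itself has no header row.
COLUMNS = ["sepal length", "sepal width", "petal length", "petal width", "class"]


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name, skipping blank lines."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(COLUMNS, row)) for row in reader if row]


def is_number(value):
    """True if the string can be read as a floating-point number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def feature_types(rows):
    """Label each column 'numeric' if every value is a number, else 'nominal'."""
    return {
        col: "numeric" if all(is_number(r[col]) for r in rows) else "nominal"
        for col in COLUMNS
    }


rows = load_rows(IRIS_EXCERPT)
for name, kind in feature_types(rows).items():
    print(f"{name}: {kind}")
```

This prints `numeric` for the four measurements and `nominal` for `class`.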

Summary statistics:
Mean, standard deviation, regression, sample size determination, and hypothesis testing are fundamental data analysis methods. The summary statistics used in this assignment are defined below.

Mean: The sum of all the data entries divided by the number of entries.
Range: The difference between the maximum and minimum data entries in the
set.
Range = (Max. data entry) – (Min. data entry)
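The two definitions above map one-to-one onto code. A minimal sketch on a handful of illustrative sepal-length values (not the full dataset); `value_range` is a hypothetical helper name chosen to avoid shadowing Python's built-in `range`.

```python
# Illustrative sepal lengths in cm (a small sample, not the full dataset).
sepal_length = [5.1, 4.9, 7.0, 6.4, 6.3, 5.8]


def mean(values):
    """Sum of all the data entries divided by the number of entries."""
    return sum(values) / len(values)


def value_range(values):
    """(Max. data entry) - (Min. data entry)."""
    return max(values) - min(values)


print(f"mean  = {mean(sepal_length):.3f}")
print(f"range = {value_range(sepal_length):.1f}")
```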

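Standard deviation:
The standard deviation measures the variability, or consistency, of a sample or population. It is the square root of the variance, so a small standard deviation means the values lie close to the mean. In many real-world applications consistency is an advantage, so in statistical data analysis less variation is often better.

Variance: The average of the squared deviations from the mean. (For a sample, the sum of squared deviations is usually divided by n − 1 instead of n.)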
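A short sketch of the variance and standard deviation definitions on illustrative values. It computes both the population form (divide by n) and the sample form (divide by n − 1) and prints the standard library's `statistics.pstdev` and `statistics.stdev` alongside for comparison. Tools differ in which form they report by default (for example, pandas `Series.std` uses the sample form), so it is worth checking before comparing numbers.

```python
import math
import statistics

# Illustrative sepal lengths in cm.
sepal_length = [5.1, 4.9, 7.0, 6.4, 6.3, 5.8]


def variance(values, sample=False):
    """Average squared deviation from the mean.

    With sample=True the sum of squared deviations is divided by n - 1
    (sample variance) instead of n (population variance).
    """
    n = len(values)
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


def standard_deviation(values, sample=False):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample))


print(f"population SD = {standard_deviation(sepal_length):.4f}")
print(f"sample SD     = {standard_deviation(sepal_length, sample=True):.4f}")
# Standard-library equivalents, for comparison:
print(statistics.pstdev(sepal_length), statistics.stdev(sepal_length))
```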

Percentile: Let p be any integer between 0 and 100. The pth percentile of a data set is the data value at which p percent of the values in the data set are less than or equal to this value.
How to calculate percentiles: use the following steps for small data sets.
• Step 1: Sort the data in ascending order (from smallest to largest).
• Step 2: Calculate the index $i = \frac{p}{100} \times n$, where p is the percentile and n is the sample size.
• Step 3: If i is an integer, the pth percentile is the mean of the data values in positions i and i + 1. If i is not an integer, round it up to the next integer and use the value in that position.
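The three steps above translate directly into code. A minimal sketch, assuming p is an integer with 0 < p < 100 as in the definition (p = 100 would need a special case); integer arithmetic is used for the index so the "is i an integer?" test is exact. Other tools (e.g. `numpy.percentile`, `statistics.quantiles`) interpolate differently and can return slightly different values.

```python
def percentile(data, p):
    """pth percentile using the sort / index / round-up steps described above.

    Assumes a non-empty list of numbers and an integer p with 0 < p < 100.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)                 # Step 1: sort in ascending order
    n = len(values)
    if (p * n) % 100 == 0:                # Step 2: i = (p / 100) * n is an integer
        i = p * n // 100
        # Step 3a: mean of the values in (1-based) positions i and i + 1
        return (values[i - 1] + values[i]) / 2
    i = -(-p * n // 100)                  # Step 3b: round i up, i.e. ceil(p * n / 100)
    return values[i - 1]                  # value in (1-based) position i


# Illustrative sepal widths in cm.
sepal_width = [3.5, 3.0, 3.2, 3.1, 3.6, 3.2, 3.2, 3.3, 2.7, 4.4, 2.0]
for p in (25, 50, 75):
    print(p, percentile(sepal_width, p))
```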
Summary statistics on the Iris dataset (measurements in cm):

                 Min   Max   Mean   SD     Class Correlation
sepal length:    4.3   7.9   5.84   0.83    0.7826
sepal width:     2.0   4.4   3.05   0.43   -0.4194
petal length:    1.0   6.9   3.76   1.76    0.9490 (high!)
petal width:     0.1   2.5   1.20   0.76    0.9565 (high!)

Class Distribution: 33.3% for each of 3 classes.
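The table does not say how the "Class Correlation" column was computed. One plausible reading, assumed here rather than taken from the source, is the Pearson correlation between a feature and the class label coded as 1, 2, 3 (setosa, versicolor, virginica). The sketch computes that on six illustrative rows, so its value will not match the table, and also shows how a class distribution like the 33.3% line is obtained with `collections.Counter`. The Pearson formula is written out by hand; on Python 3.10+ `statistics.correlation` computes the same coefficient.

```python
import math
from collections import Counter

# Illustrative (petal length in cm, class label) pairs.
rows = [
    (1.4, "Iris-setosa"), (1.4, "Iris-setosa"),
    (4.7, "Iris-versicolor"), (4.5, "Iris-versicolor"),
    (6.0, "Iris-virginica"), (5.1, "Iris-virginica"),
]

# Assumption: classes are coded 1, 2, 3 so a correlation can be computed.
CLASS_CODE = {"Iris-setosa": 1, "Iris-versicolor": 2, "Iris-virginica": 3}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


lengths = [length for length, _ in rows]
codes = [CLASS_CODE[label] for _, label in rows]
print(f"class correlation (petal length): {pearson(lengths, codes):.4f}")

# Class distribution: percentage of rows belonging to each class.
counts = Counter(label for _, label in rows)
for label, count in counts.items():
    print(f"{label}: {100 * count / len(rows):.1f}%")
```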

Box Plot:

A box plot summarizes the distribution of the data in more detail than a single statistic. It shows the minimum, first quartile (Q1), median, third quartile (Q3), and maximum, and it marks outliers clearly. The box spans the interquartile range (IQR = Q3 − Q1), which contains the middle 50% of the data.
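The quantities a box plot draws can be computed directly. A sketch, assuming the common convention that points more than 1.5 × IQR beyond the quartiles are outliers (this is also the default whisker length, `whis=1.5`, of `matplotlib.pyplot.boxplot`). Quartiles come from `statistics.quantiles(..., method="inclusive")`; other quartile conventions give slightly different boxes. The helper name `box_plot_summary` and the sepal-width values are illustrative.

```python
import statistics

# Illustrative sepal widths in cm.
sepal_width = [3.5, 3.0, 3.2, 3.1, 3.6, 3.2, 3.2, 3.3, 2.7, 4.4, 2.0]


def box_plot_summary(values, whisker=1.5):
    """Return the numbers a box plot draws, plus points beyond the whisker fences."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                          # the box spans the middle 50% of the data
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    outliers = [x for x in sorted(values) if x < low_fence or x > high_fence]
    inside = [x for x in values if low_fence <= x <= high_fence]
    return {
        "min": min(values), "q1": q1, "median": median, "q3": q3,
        "max": max(values), "iqr": iqr,
        # whiskers reach the most extreme points that are not outliers
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }


for key, value in box_plot_summary(sepal_width).items():
    print(key, value)
```

On these values Q1 = 3.05 and Q3 = 3.4, so 2.0 and 4.4 fall outside the fences and would be drawn as individual outlier points.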

Histogram:

A histogram is a plot of the value distribution of a numerical column. It divides the range of values into bins (intervals) and draws a bar for each bin whose height is the number of values falling in it, so we can see whether most values lie toward one side of the distribution or around the center (mean).

Histograms and box plots both help to explore, visualize, and describe numeric data in an easy and understandable manner. Histograms are better for determining the underlying probability distribution of the data. Box plots, on the other hand, are more useful for comparing several data sets, because they are less detailed than histograms and take up less space. It is recommended that you plot your data graphically before proceeding with further statistical analysis.
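The binning step a histogram performs can be sketched without a plotting library. A minimal version, assuming equal-width bins over [min, max]; the helper `histogram_counts`, the bin count, and the sample values are choices for illustration. For the actual plots, `matplotlib.pyplot.hist` or pandas `DataFrame.hist` do the binning and drawing in one call.

```python
def histogram_counts(values, bins=5):
    """Split [min, max] into equal-width bins and count the values in each.

    Bins are half-open [left, right) except the last, which is closed so the
    maximum value is counted (numpy.histogram treats its last bin the same way).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        index = int((x - lo) / width) if width else 0
        counts[min(index, bins - 1)] += 1  # the maximum lands in the last bin
    edges = [lo + k * width for k in range(bins + 1)]
    return edges, counts


# Illustrative sepal lengths in cm, drawn as a text histogram.
sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0, 7.0, 6.4, 6.9, 5.5, 6.3, 5.8, 7.1]
edges, counts = histogram_counts(sepal_length, bins=5)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:4.2f}-{right:4.2f} | {'#' * count}")
```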

Histogram for Sepal Length

Histogram for Petal Length
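To compare distributions, as the problem statement asks, it often helps to look at one feature per class rather than over the whole dataset. A sketch on illustrative petal-length values that groups by class and reports min, median, and max; the helper `summary_by_class` is hypothetical. With pandas, assuming the column names used earlier, `df.groupby("class")["petal length"].describe()` or `df.boxplot(column="petal length", by="class")` give the tabular and graphical versions.

```python
import statistics
from collections import defaultdict

# Illustrative (petal length in cm, class label) pairs.
rows = [
    (1.4, "Iris-setosa"), (1.3, "Iris-setosa"), (1.5, "Iris-setosa"),
    (4.7, "Iris-versicolor"), (4.5, "Iris-versicolor"), (4.0, "Iris-versicolor"),
    (6.0, "Iris-virginica"), (5.1, "Iris-virginica"), (5.9, "Iris-virginica"),
]


def summary_by_class(pairs):
    """Group values by class label and return (min, median, max) per class."""
    groups = defaultdict(list)
    for value, label in pairs:
        groups[label].append(value)
    return {
        label: (min(vals), statistics.median(vals), max(vals))
        for label, vals in groups.items()
    }


for label, (lo, med, hi) in summary_by_class(rows).items():
    print(f"{label:16s} min={lo:.1f} median={med:.1f} max={hi:.1f}")
```

On these values the setosa range lies entirely below the versicolor range, which is consistent with the high class correlation of petal length in the summary table.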
