Business Analytics Assignment NAME: Divyansh Bisht

The document is an assignment by Divyansh Bisht analyzing the Iris dataset, which contains measurements of iris flowers categorized into three species. It includes descriptive statistics, correlation analysis, outlier detection, and visual representations of the data, concluding that the species are well-separated based on their measurements. Suggestions for further research include applying machine learning algorithms and exploring dimensionality reduction techniques.

Divyansh Bisht
Hindu College
BUSINESS ANALYTICS
ASSIGNMENT
NAME: Divyansh Bisht
ROLL NO. : 22-4-02-003293
SEMESTER: VI
SUBJECT: BUSINESS ANALYTICS
SUBMITTED TO: PROF. JASPREET KAUR

Teacher’s Signature Student Signature


In-Depth Analysis of the Iris Dataset

The Iris dataset is a famous dataset in machine learning and statistics.
It contains 150 observations of iris flowers, categorized into three
species: Setosa, Versicolor, and Virginica. Each observation includes
four features:

• Sepal Length (in cm)

• Sepal Width (in cm)

• Petal Length (in cm)

• Petal Width (in cm)

The objective is to analyze and interpret the dataset using statistical
formulas.

1. Descriptive Statistics (Mean, Median, Standard Deviation)

Descriptive statistics help us understand the central tendency (mean,
median) and variability (standard deviation) of the data.

Setosa

Feature Mean (µ) Median Standard Deviation (σ)

Sepal Length 5.006 5.0 0.35

Sepal Width 3.428 3.4 0.37

Petal Length 1.462 1.5 0.17

Petal Width 0.246 0.2 0.11

Versicolor

Feature Mean (µ) Median Standard Deviation (σ)

Sepal Length 5.936 5.9 0.52

Sepal Width 2.770 2.8 0.31

Petal Length 4.260 4.35 0.47

Petal Width 1.326 1.3 0.20

Virginica
Feature Mean (µ) Median Standard Deviation (σ)

Sepal Length 6.588 6.5 0.64

Sepal Width 2.974 3.0 0.32

Petal Length 5.552 5.55 0.55

Petal Width 2.026 2.0 0.27
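
The per-species figures in these tables can be reproduced with Python's standard `statistics` module. The sketch below is only an illustration: the five sepal-length values are a small assumed sample, not the full 50 observations per species, so its output will not match the tables exactly. It also assumes the sample standard deviation (n - 1 denominator), since the text does not say which form was used.

```python
import statistics

# Illustrative sample of Setosa sepal lengths in cm (assumed values,
# not the full 50 observations behind the table).
setosa_sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0]


def describe(values):
    """Return the mean, median, and sample standard deviation of a list."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample SD (n - 1 denominator)
    }


summary = describe(setosa_sepal_length)
print(summary)
```

Calling `describe` once per feature and per species would fill in one row of each table.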

2. Range & Variance

The range (difference between the largest and smallest values) and
variance (spread of data points) tell us about data distribution.

Range of Features in cm: Min - Max (Range = Max - Min)

Feature Setosa Versicolor Virginica

Sepal Length 4.3 - 5.8 (1.5) 4.9 - 7.0 (2.1) 4.9 - 7.9 (3.0)

Sepal Width 2.3 - 4.4 (2.1) 2.0 - 3.4 (1.4) 2.2 - 3.8 (1.6)

Petal Length 1.0 - 1.9 (0.9) 3.0 - 5.1 (2.1) 4.5 - 6.9 (2.4)

Petal Width 0.1 - 0.6 (0.5) 1.0 - 1.8 (0.8) 1.4 - 2.5 (1.1)
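
As a rough sketch of how the range and variance could be computed, the snippet below uses a small assumed sample of sepal lengths (illustrative values, not the full dataset). The helper name `value_range` is hypothetical, and the sample variance (n - 1 denominator) is an assumption.

```python
import statistics


def value_range(values):
    """Return (minimum, maximum, range) for a list of measurements."""
    lo, hi = min(values), max(values)
    return lo, hi, hi - lo


# Illustrative sample of Setosa sepal lengths in cm (assumed values).
sample = [5.1, 4.9, 4.7, 4.6, 5.0]

lo, hi, spread = value_range(sample)
variance = statistics.variance(sample)  # sample variance (n - 1 denominator)

print(f"{lo} - {hi} ({spread:.1f}), variance = {variance:.3f}")
```

Variance is the square of the standard deviation, so a standard deviation of 0.35 cm corresponds to a variance of about 0.12 cm².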

3. Correlation Analysis

Correlation measures how strongly features are related.

Correlation (r) Value Interpretation

Sepal Length & Petal Length 0.87 Strong positive correlation

Sepal Width & Petal Length -0.43 Negative correlation

Petal Length & Petal Width 0.96 Very strong positive correlation
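
The correlation coefficient r in this table is Pearson's r. A minimal sketch of the calculation follows, assuming a handful of illustrative petal measurements (two per species, not the full 150 rows); the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative petal measurements in cm, two per species (assumed values).
petal_length = [1.4, 1.4, 4.7, 4.5, 6.0, 5.1]
petal_width = [0.2, 0.2, 1.4, 1.5, 2.5, 1.9]

r = pearson_r(petal_length, petal_width)
print(f"r = {r:.2f}")
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.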

4. Z-Score & Outlier Detection


The Z-score helps us find outliers (unusual data points). The Z-score
formula is:

\[ Z = \frac{X - \mu}{\sigma} \]

where:
• X = Data point

• μ = Mean

• σ = Standard deviation

Observations:

• No extreme outliers in petal and sepal sizes.

• However, some sepal widths in Setosa (4.4 cm) and some petal
widths in Virginica (2.5 cm) are slightly unusual.
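
The two values named in these observations can be checked against the means and standard deviations reported in Section 1. The sketch below assumes a cutoff of |Z| > 3 for an extreme outlier, which is a common rule of thumb; the text does not state which cutoff it applied.

```python
def z_score(x, mean, std):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / std


# Means and standard deviations taken from the descriptive statistics tables.
z_setosa_sepal_width = z_score(4.4, mean=3.428, std=0.37)
z_virginica_petal_width = z_score(2.5, mean=2.026, std=0.27)

# Assumed cutoff: |Z| > 3 marks an extreme outlier.
THRESHOLD = 3.0

for label, z in [
    ("Setosa sepal width 4.4 cm", z_setosa_sepal_width),
    ("Virginica petal width 2.5 cm", z_virginica_petal_width),
]:
    flag = "extreme" if abs(z) > THRESHOLD else "not extreme"
    print(f"{label}: Z = {z:.2f} ({flag})")
```

Under these assumptions both values stay below the cutoff (roughly 2.6 and 1.8 standard deviations above their species means), which agrees with describing them as unusual rather than extreme.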

5. Visual Analysis Using Graphs

Histogram of Sepal Length

A histogram shows the distribution of sepal length for each species. The
distribution helps identify patterns and variations.
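
A histogram is built by counting how many values fall into each bin. The sketch below shows that counting step in plain Python and prints a text version of the bars. The sample values, the 0.5 cm bin width, and the helper name `histogram_counts` are all assumptions for illustration; a plotting library would normally draw the chart.

```python
from collections import Counter


def histogram_counts(values, bin_width, start):
    """Count values per bin, keyed by each bin's left edge.

    A bin covers the half-open interval [left, left + bin_width).
    """
    counts = Counter()
    for v in values:
        index = int((v - start) // bin_width)
        counts[index] += 1
    return {round(start + i * bin_width, 2): counts[i] for i in sorted(counts)}


# Illustrative Setosa sepal lengths in cm (assumed sample).
sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9]

BIN_WIDTH = 0.5
bins = histogram_counts(sepal_length, bin_width=BIN_WIDTH, start=4.0)
for left_edge, count in bins.items():
    print(f"{left_edge:.1f}-{left_edge + BIN_WIDTH:.1f} cm | {'#' * count}")
```

Repeating the count for each species and comparing the bars side by side shows how the sepal-length distributions shift between species.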

Pie Chart of Species Distribution

A pie chart represents the proportion of each species in the dataset,
showing an equal split: 50 of the 150 observations (one third) per species.

Scatter Plot of Petal Length vs. Petal Width

A scatter plot helps visualize the strong correlation between petal length
and petal width. Setosa has distinct clustering, while Versicolor and Virginica
overlap slightly.

Box Plot for Sepal Width

A box plot highlights outliers and the spread of sepal width across species.
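
The numbers behind a box plot are the quartiles and the outlier fences. The sketch below assumes the common 1.5 × IQR rule for flagging outliers; the text does not say which rule its plot used. The sample values and the helper name `box_plot_summary` are illustrative assumptions.

```python
import statistics


def box_plot_summary(values):
    """Quartiles plus the 1.5 * IQR fences a box plot uses to flag outliers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr, "outliers": outliers}


# Illustrative Setosa sepal widths in cm (assumed sample that includes the
# 2.3 cm minimum and 4.4 cm maximum reported in the range table).
sepal_width = [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 4.4, 2.3]

print(box_plot_summary(sepal_width))
```

Points outside the fences are the ones a box plot draws as individual markers beyond the whiskers.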

6. Conclusion & Insights

Key Takeaways:

• Setosa has the smallest petals & widest sepals → Easily identifiable.

• Virginica has the longest petals & sepals → Distinct from others.

• Versicolor is in-between → Moderate petal & sepal size.

• Petal length & width are highly correlated → If you know one, you
can predict the other.

• No extreme outliers, but some high sepal widths in Setosa and
large petal widths in Virginica are unusual.
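
The takeaway that one petal measurement can predict the other can be illustrated with a simple least-squares line. This is a sketch under stated assumptions: the petal values are a small illustrative sample rather than the full dataset, and `fit_line` is a hypothetical helper, not a method used in the analysis above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative petal measurements in cm, two per species (assumed values).
petal_length = [1.4, 1.4, 4.7, 4.5, 6.0, 5.1]
petal_width = [0.2, 0.2, 1.4, 1.5, 2.5, 1.9]

slope, intercept = fit_line(petal_length, petal_width)
predicted_width = slope * 5.0 + intercept  # predicted width for a 5.0 cm petal
print(f"width ≈ {slope:.2f} * length + {intercept:.2f}")
print(f"predicted width at 5.0 cm: {predicted_width:.2f} cm")
```

The stronger the correlation, the closer the observed points lie to this line and the more reliable the prediction.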

With this extensive statistical and visual analysis, we conclude that
Setosa, Versicolor, and Virginica are well-separated species
based on their petal and sepal measurements. These insights can
be utilized in classification models for accurate species identification.

Further Research Suggestions:

• Applying machine learning algorithms like KNN or SVM on the
dataset.

• Exploring dimensionality reduction techniques (e.g., PCA) to
analyze feature importance.

• Investigating real-world applications of iris species classification in
botany and agriculture.
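
As a starting point for the KNN suggestion, a minimal k-nearest-neighbours classifier can be sketched in plain Python. The training pairs are a small illustrative sample of (petal length, petal width) values, not the full dataset; `knn_predict` and the choice of k = 3 are assumptions for demonstration only.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Illustrative (petal length, petal width) pairs in cm; assumed values.
train = [
    ((1.4, 0.2), "Setosa"), ((1.3, 0.2), "Setosa"), ((1.5, 0.2), "Setosa"),
    ((4.7, 1.4), "Versicolor"), ((4.5, 1.5), "Versicolor"),
    ((4.9, 1.5), "Versicolor"),
    ((6.0, 2.5), "Virginica"), ((5.1, 1.9), "Virginica"),
    ((5.9, 2.1), "Virginica"),
]

print(knn_predict(train, (1.6, 0.3)))
print(knn_predict(train, (5.8, 2.2)))
```

A real study would train on the full dataset, hold out a test set, and compare several values of k before drawing conclusions.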
