Experiment No.4
Somaiya University
KJSCE/IT/SY /SEMIII/FDS-Honour-AI/2023-24
Successful data preprocessing requires an overall picture of the data. Basic statistical
descriptions alone are not enough to identify the properties of the data and to highlight
which values should be treated as noise or outliers. Plots such as the box plot, Q-Q plot,
histogram, and scatter plot each give the data analyst a different view of the data. Data
visualization is needed because a visual summary makes patterns and trends easier to identify
than scanning thousands of rows. Before applying a plot, check that the attribute is suitable
for it.
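One way to approach the suitability check is sketched below, assuming the data set is held as a list of records. Histograms, box plots, and Q-Q plots summarize numeric attributes, and a scatter plot needs a pair of numeric attributes, so the sketch simply lists the attributes whose values are all numbers. The function name `numeric_attributes` and the sample records are hypothetical and used only for illustration.

```python
def numeric_attributes(rows):
    """Return the names of attributes whose values are all int or float."""
    names = list(rows[0].keys())
    result = []
    for name in names:
        values = [row[name] for row in rows]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if all(isinstance(v, (int, float)) and not isinstance(v, bool)
               for v in values):
            result.append(name)
    return result


# Hypothetical records, not taken from any real data set.
rows = [
    {"age": 21, "height_cm": 170.5, "city": "Mumbai"},
    {"age": 25, "height_cm": 165.0, "city": "Pune"},
    {"age": 30, "height_cm": 180.2, "city": "Nashik"},
]

candidates = numeric_attributes(rows)
print(candidates)
```

A categorical attribute such as `city` is left out here; it would call for a different summary, such as a frequency count per category.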
Histogram
A histogram gives an approximate representation of the distribution of numeric data. It is a
chart that shows frequencies for intervals of values of a continuous variable, and it
summarizes a univariate data set. In the histogram of a continuous frequency table, the x-axis
marks the class intervals on a suitable scale and the y-axis marks the frequency of each class
interval. Each interval of values is known as a bin, and the bins usually all have the same
width. When inclusive classes are converted to exclusive-type classes, the upper and lower
limits of the new classes are known as class boundaries. A histogram also shows more than a
single summary statistic can: the shape, spread, and skewness of the data, and any gaps or
unusual values.
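The binning described above can be sketched as follows. This is a minimal illustration that assumes equal-width bins spanning the minimum to the maximum of the data; the function name `histogram_counts`, the choice of four bins, and the sample values are all hypothetical.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        raise ValueError("all values are identical; cannot form bins")
    counts = [0] * num_bins
    for v in values:
        # Each bin is closed on the left and open on the right, except the
        # last bin, which also includes the maximum value.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical ages, used only to illustrate the binning.
ages = [21, 22, 22, 25, 27, 30, 31, 35, 38, 40, 41, 45]
edges, counts = histogram_counts(ages, 4)

# Text rendering: one row per bin, one star per observation.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:g} to {right:g}: {'*' * count}")
```

The bin edges play the role of the class boundaries, and the counts are the frequencies that the y-axis of the histogram would show.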
Box plot
A box plot, also known as a box-and-whisker plot, shows the distribution of values based on
the five-number summary: minimum, first quartile, median, third quartile, and maximum. The
minimum and the maximum are the smallest and largest values in the data set. The median is
the value that separates the higher half of the data from the lower half. The first quartile
is the median of the data values to the left of the median in the ordered values. The third
quartile is the median of the data values to the right of the median in the ordered values.
A box plot can also show outliers and the IQR (interquartile range), which is the difference
between the third and first quartiles.
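The five-number summary can be computed directly from the definitions above. The sketch below uses the median-of-halves method for the quartiles, as described in the text; other quartile conventions exist and can give slightly different values. Flagging values more than 1.5 times the IQR beyond the quartiles as outliers is a common convention that is assumed here, not something this handout specifies. The function name and the sample marks are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using median-of-halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values to the left of the median
    upper = data[half + (n % 2):]    # values to the right of the median
    return data[0], median(lower), median(data), median(upper), data[-1]


# Hypothetical marks, used only to illustrate the calculation.
marks = [52, 55, 58, 60, 61, 63, 65, 68, 70, 95]
minimum, q1, med, q3, maximum = five_number_summary(marks)

iqr = q3 - q1
# Assumed convention: points beyond 1.5 * IQR from the quartiles are outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [m for m in marks if m < low_fence or m > high_fence]

print(minimum, q1, med, q3, maximum, iqr, outliers)
```

For an odd number of values, this sketch leaves the median itself out of both halves.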
Scatterplots
Quantile-Quantile Plot
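A minimal sketch of the points behind a Q-Q plot is given below, assuming the common normal Q-Q variant in which the sorted sample values are paired with the corresponding quantiles of a standard normal distribution. Points that lie close to a straight line suggest the attribute is approximately normally distributed. The plotting position (i - 0.5) / n is one of several conventions and is an assumption here, as are the function name and the sample data.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair each sorted sample value with a standard normal quantile."""
    data = sorted(values)
    n = len(data)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting position: (i - 0.5) / n for i = 1..n.
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n)
                   for i in range(1, n + 1)]
    return list(zip(theoretical, data))


# Hypothetical sample, used only to illustrate the pairing.
sample = [61, 52, 70, 58, 65]
points = normal_qq_points(sample)

for theoretical_q, sample_q in points:
    print(f"{theoretical_q:+.3f}  {sample_q}")
```

Each pair is one point on the plot, with the theoretical quantile on the x-axis and the sample quantile on the y-axis.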
___________________________________________________________________________
1. Identify the attributes to which it is sensible to apply each of the plots listed below.
a. Box Plot
b. Q-Q Plot
c. Histogram
d. Scatter Plot
2. Apply the above-mentioned plots to the identified attributes. Discuss the inferences from
these plots in detail.
___________________________________________________________________________
Results: (Program printout with output / Document printout as per the format)
Questions:
1. Why is it important to measure the dispersion in the dataset?
2. Discuss the other purposes/advantages of the plots used in this experiment.
Outcomes:
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
Grade: AA / AB / BB / BC / CC / CD / DD
1. Han, Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann, 3rd Edition