Exp 10
Group A
Assignment No: 10
Theory:
How to Find the Mean, Median, Mode, Range, and Standard Deviation
Simplify comparisons of sets of numbers, especially large sets, by calculating their central
values using the mean, median and mode. Use the range and standard deviation of each set to
examine the variability of the data.
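The theory names the median and mode as measures of center, but the worked examples only calculate the mean. The following is a minimal sketch, assuming Python 3 and its standard `statistics` module, that computes the median and mode for the example data set used throughout this assignment.

```python
import statistics

# Example data set used throughout this assignment
data = [20, 24, 25, 36, 25, 22, 23]

# Median: middle value of the sorted data (sorted: 20, 22, 23, 24, 25, 25, 36)
median = statistics.median(data)

# Mode: the value that occurs most often (25 appears twice)
mode = statistics.mode(data)

print("Median:", median)  # Median: 24
print("Mode:", mode)      # Mode: 25
```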
Calculating Mean
The mean identifies the average value of the set of numbers. For example, consider the data
set containing the values 20, 24, 25, 36, 25, 22, 23.
Formula
To find the mean, use the formula: Mean equals the sum of the numbers in the data set
divided by the number of values in the data set. In mathematical terms: Mean=(sum of all
terms)÷(how many terms or values in the set).
Finding Divisor
Divide by the number of data points in the set. This set has seven values so divide by 7.
Finding Mean
Insert the values into the formula to calculate the mean. The mean equals the sum of the values
(175) divided by the number of data points (7). Since 175÷7=25, the mean of this data set equals
25. Not all mean values will equal a whole number.
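The mean formula above translates directly into code. This is a short sketch, assuming Python 3, that applies the formula step by step and cross-checks the result against the standard library's `statistics.mean`.

```python
import statistics

data = [20, 24, 25, 36, 25, 22, 23]

# Mean = (sum of all terms) / (number of terms)
total = sum(data)       # 175
count = len(data)       # 7 data points, so the divisor is 7
mean = total / count    # 25.0

# Cross-check against the standard library implementation
assert mean == statistics.mean(data)
print(f"Sum = {total}, N = {count}, Mean = {mean}")
```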
Guru Gobind Singh College of Engineering & Research Centre, Nasahik Page 1
Department of Computer Engineering Subject : DSBDAL
Calculating Range
Range shows the mathematical distance between the lowest and highest values in the data set.
Range measures the variability of the data set. A wide range indicates greater variability in the
data, or perhaps a single outlier far from the rest of the data. Outliers may skew, or shift, the
mean value enough to impact data analysis.
In the sample group, the lowest value is 20 and the highest value is 36.
Finding Range
To calculate range, subtract the lowest value from the highest value. Since 36-20=16, the
range equals 16.
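The range calculation can be sketched in Python 3 as follows. The variable is named `data_range` (an illustrative name) so that it does not shadow Python's built-in `range`.

```python
data = [20, 24, 25, 36, 25, 22, 23]

lowest = min(data)              # 20
highest = max(data)             # 36
data_range = highest - lowest   # 36 - 20 = 16

print(f"Lowest = {lowest}, Highest = {highest}, Range = {data_range}")
```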
Calculating Standard Deviation
Standard deviation measures the variability of the data set. Like range, a smaller standard
deviation indicates less variability.
Formula
Finding the standard deviation requires subtracting the mean (µ) from each data point and
squaring the difference, adding all the squares, dividing that sum by one less than the number of
values (N-1), and finally calculating the square root of the quotient. In mathematical terms, for
data points x_1 to x_N:
s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N - 1}}
Calculate the mean by adding all the data point values, then dividing by the number of data
points. In the sample data set, 20+24+25+36+25+22+23=175. Divide the sum, 175, by the
number of data points, 7, or 175÷7=25. The mean equals 25.
Standard Deviation
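Subtract the mean from each data point and square the result: (20-25)²=25, (24-25)²=1,
(25-25)²=0, (36-25)²=121, (25-25)²=0, (22-25)²=9 and (23-25)²=4. These squares add up to 160.
Divide the sum by N-1: 160÷6≈26.6667. Finally, calculate the standard deviation by taking the
square root of this quotient. In the example, the square root of 26.6667 equals approximately
5.164. Therefore, the standard deviation equals approximately 5.164.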
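The standard deviation steps above can be followed one at a time in code. This is a minimal sketch, assuming Python 3; it computes the sample standard deviation (dividing by N-1) by hand and confirms the result with `statistics.stdev`, which also divides by N-1.

```python
import math
import statistics

data = [20, 24, 25, 36, 25, 22, 23]
n = len(data)
mu = sum(data) / n                              # mean = 25.0

squared_diffs = [(x - mu) ** 2 for x in data]   # 25, 1, 0, 121, 0, 9, 4
sum_sq = sum(squared_diffs)                     # 160.0
variance = sum_sq / (n - 1)                     # 160 / 6 = 26.666...
std_dev = math.sqrt(variance)                   # about 5.164

# statistics.stdev computes the sample standard deviation (divisor N - 1)
assert math.isclose(std_dev, statistics.stdev(data))
print(f"Variance = {variance:.4f}, Standard deviation = {std_dev:.3f}")
```

Using `statistics.pstdev` instead would divide by N (the population standard deviation) and give a slightly smaller value.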
Standard deviation helps evaluate data. Values that fall within one standard deviation of the
mean are typical of the data set. Values that fall more than two standard deviations from the
mean are extreme values and are commonly treated as outliers. In the example set, the value 36
lies 11 units from the mean, which is more than two standard deviations (2×5.164≈10.33), so 36
is an outlier. Outliers may represent erroneous data or may suggest unforeseen circumstances and
should be carefully considered when interpreting data.
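The two-standard-deviation rule of thumb described above can be applied programmatically. This is a sketch, assuming Python 3 and the sample standard deviation from `statistics.stdev`; the threshold of two standard deviations is the rule of thumb from the text, not a universal cutoff.

```python
import statistics

data = [20, 24, 25, 36, 25, 22, 23]
mean = statistics.mean(data)   # 25
sd = statistics.stdev(data)    # about 5.164 (sample standard deviation)

# Values within one standard deviation of the mean are typical of the set
within_one_sd = [x for x in data if abs(x - mean) <= sd]

# Values more than two standard deviations from the mean are flagged as outliers
outliers = [x for x in data if abs(x - mean) > 2 * sd]

print("Within one SD:", within_one_sd)  # [20, 24, 25, 25, 22, 23]
print("Outliers:", outliers)            # [36]
```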
Application:
1. The histogram is suitable for visualizing the distribution of numerical data over a continuous
interval or a certain time period. It organizes large amounts of data into bins and produces a
visualization quickly, using a single dimension.
2. The box plot allows quick graphical examination of one or more data sets. Box plots may
seem more primitive than histograms, but they have some advantages. They take up less space
and are therefore particularly useful for comparing distributions between several groups or sets
of data. They also avoid a binning or smoothing choice: the number and width of bins can heavily
influence the appearance of a histogram, and the choice of bandwidth can heavily influence the
appearance of a kernel density estimate.
3. Data visualization applications let you create insightful data visualizations quickly, often in
minutes.
Data visualization tools allow anyone to organize and present information intuitively. They
enable users to share data visualizations with others.
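Items 1 and 2 above describe what a histogram and a box plot summarize. The following is a minimal, plotting-free sketch, assuming Python 3 and only the standard library, that computes the numbers behind both charts for the example data set: per-bin counts for a histogram (the bin width of 5 is an arbitrary illustrative choice) and the five-number summary with Tukey's 1.5×IQR fences that box plots commonly use to mark outliers. A plotting library such as matplotlib would draw these with `plt.hist` and `plt.boxplot`.

```python
import statistics
from collections import Counter

data = [20, 24, 25, 36, 25, 22, 23]

# --- Histogram: count values per bin ---
# Assumed bin choice: width 5, starting at the minimum value ([20, 25), [25, 30), ...)
bin_width = 5
start = min(data)
counts = Counter(start + (x - start) // bin_width * bin_width for x in data)

print("Histogram (bin: count)")
for left in range(start, max(data) + 1, bin_width):
    print(f"[{left}, {left + bin_width}) {'#' * counts[left]}")

# --- Box plot: five-number summary and Tukey fences ---
# statistics.quantiles uses the "exclusive" method by default; other tools
# may interpolate quartiles slightly differently
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"\nMin={min(data)}, Q1={q1}, Median={median}, Q3={q3}, Max={max(data)}")
print(f"IQR={iqr}, fences=({lower_fence}, {upper_fence}), outliers={outliers}")
```

Changing `bin_width` changes the shape of the histogram, which illustrates the sensitivity to binning mentioned in item 2, while the box plot summary does not depend on it.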
Input:
Output:
Conclusion:
Hence, we have studied how to load a dataset into a dataframe, compare distributions and identify
outliers.
Assignment Questions
1. For the iris dataset, list the features and their types.
2. Write a code to create a histogram for each feature. (iris dataset)
3. Write a code to create a boxplot for each feature. (iris dataset)
4. Identify the outliers from the boxplot drawn for iris dataset.