Data Mining and Predictive Modelling Assignment
5th Semester
Department of Computer Science and
Engineering
GIET University, Gunupur
ASSIGNMENT 1
MEASURES OF CENTRAL TENDENCY
A measure of central tendency describes a distribution of data by its
central location, the point around which the other values cluster. It
summarises a set of data with a single value that identifies the central
position within that set.
The measures of central tendency are:
1. Mean
2. Median
3. Mode
MEAN
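The mean is the sum of all the values in the data set divided by the
number of values. Because every value enters the calculation, the mean
is sensitive to outliers: in the salary data for ten staff, the two
high salaries pull the mean well above what most staff earn.
Staff    1    2    3    4    5    6    7    8    9    10
Salary   15k  18k  16k  14k  15k  15k  12k  17k  90k  95k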
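As a minimal sketch, the mean of the salary table can be computed by hand, without a library function. The names `salaries_k` and `mean` are illustrative choices, not part of the original assignment, and the salaries are written in thousands.

```python
# Salaries from the table above, in thousands (k).
salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]


def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    total = 0
    for v in values:
        total += v
    return total / len(values)


print(mean(salaries_k))      # 30.7  (all ten staff)
print(mean(salaries_k[:8]))  # 15.25 (staff 1 to 8 only, without the two high salaries)
```

Dropping the two largest salaries halves the result, which shows how strongly outliers influence the mean.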
MEDIAN
The median is the middle score for a set of data that has been
arranged in order of magnitude.
The median is less affected by outliers and skewed data than the mean.
To calculate the median, suppose we have the marks below:
Ex-1) 65 55 89 56 35 14 56 55 87 45 92
We first rearrange the data in order of magnitude:
14 35 45 55 55 56 56 65 87 89 92
The median is the middle mark, the 6th of the 11 values, which is 56.
Ex-2) 65 55 89 56 35 14 56 55 87 45
With an even number of marks there is no single middle value. We again
rearrange the data in order of magnitude:
14 35 45 55 55 56 56 65 87 89
We then take the 5th and 6th marks (55 and 56) and average them to get
a median of 55.5.
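The two worked examples can be checked with a short sketch that sorts the data and picks the middle value, or the average of the two middle values. The function name `median` and the variable names `ex1` and `ex2` are assumptions made for illustration.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values


ex1 = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]  # 11 marks
ex2 = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]      # 10 marks

print(median(ex1))  # 56
print(median(ex2))  # 55.5
```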
MODE
The mode is the most frequent score in our data set. On a bar chart or
histogram it corresponds to the highest bar, as shown in fig-1.
Normally, the mode is used for categorical data where we wish to know
which is the most common category, as illustrated in fig-2.
However, one of the problems with the mode is that it is not always
unique: when two or more values share the highest frequency, as in
fig-3, the data set has more than one mode.
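A small sketch of the mode that returns every value sharing the highest frequency, so the non-unique case is visible. The function name `modes` is an assumption; `marks` reuses the data from Ex-2 above, and the `transport` list is a made-up categorical example.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency, in sorted order."""
    if not values:
        raise ValueError("mode of an empty data set is undefined")
    counts = Counter(values)               # value -> number of occurrences
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


marks = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]  # data from Ex-2
transport = ["bus", "car", "bus", "walk"]          # hypothetical categorical data

print(modes(marks))      # [55, 56]  two values tie, so the mode is not unique
print(modes(transport))  # ['bus']
```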
SKEWED DISTRIBUTIONS
An example of a normally distributed set of data is presented
below.
• In a symmetrical, single-peaked distribution such as the normal
distribution, the mean, median and mode are equal.
• For such data the mean is widely preferred as the measure of central
tendency because it is the only one of the three that uses every value
in the data set in its calculation.
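However, when our data is skewed, as with a right-skewed data set, the
mean is pulled in the direction of the skew (towards the long tail) and
no longer reflects a typical value. In that case the median is generally
the better measure of central tendency.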
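The effect of skew can be illustrated with the salary data from the MEAN section, which has a long right tail. The sketch below uses Python's standard `statistics` module; the helper `skew_hint` is a hypothetical name, and comparing mean with median is only a rough rule of thumb, not a formal test of skewness.

```python
import statistics

# Salary data from the MEAN section, in thousands (k).
salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]


def skew_hint(values):
    """Rule of thumb (an assumption, not a formal test): compare mean and median."""
    avg = statistics.mean(values)
    mid = statistics.median(values)
    if avg > mid:
        return "right-skewed"
    if avg < mid:
        return "left-skewed"
    return "roughly symmetric"


print(statistics.mean(salaries_k))    # 30.7
print(statistics.median(salaries_k))  # 15.5
print(skew_hint(salaries_k))          # right-skewed
```

The median of 15.5k is close to what eight of the ten staff earn, while the mean of 30.7k is not close to any individual salary.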
The following summary table shows which measure of central tendency is
best for each type of variable.
✔ Median
✔ Mode
✔ Variance
MEAN WITHOUT LIBRARY FUNCTION