Data Analysis
MEASURES OF CENTRAL TENDENCY (DESCRIPTIVE STATISTICS): These are averages and include:
- Mean
- Median
- Mode
Mean:
The mean is also referred to as the arithmetic mean.
How to calculate: add all the values in a data set and divide by the total number of values.
When it is most suitable or appropriate to use: when data is numerical and measured on a standard scale with equal units (years, kilograms, metres, etc.). It takes into account every value in the data set, which is why you would use it over the other averages. The drawback is that a single extreme value can distort it.
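A minimal Python sketch of the mean calculation. The reaction times are hypothetical values used only to show the arithmetic; Python's standard library also offers `statistics.mean`, which gives the same result.

```python
def mean(values):
    """Arithmetic mean: add all the values, then divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical reaction times in seconds (illustrative data only)
times = [2.0, 3.0, 4.0, 5.0, 6.0]
print(mean(times))  # 4.0  (20.0 / 5)

# Every value enters the sum, so one extreme score moves the mean a lot
print(mean([2.0, 3.0, 4.0, 5.0, 60.0]))  # 14.8  (74.0 / 5)
```

Because the mean uses every value, changing only the last time from 6.0 to 60.0 raises it from 4.0 to 14.8.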
Median:
This is the middle value when the data is arranged in order.
How to calculate: arrange the data in ascending order; eliminate the highest and lowest values; keep doing so until one value is left in the middle. If two values are left in the middle, take their mean.
When it is most suitable or appropriate to use: when data is ordinal (can be placed in rank order) or interval, and when the scale is devised by the researcher specifically for their study (e.g. "rate how happy you are from 1 to 10").
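The elimination method can be sketched in Python exactly as described: sort, then strip the smallest and largest values until one or two remain. This is a teaching sketch; `statistics.median` in the standard library returns the same answer more efficiently. The first example reuses the data set from the Range section of these notes; the second is a hypothetical even-length set.

```python
def median_by_elimination(values):
    """Median found by sorting, then repeatedly removing the highest
    and lowest values until one or two values remain in the middle."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    remaining = sorted(values)            # step 1: ascending order
    while len(remaining) > 2:
        remaining = remaining[1:-1]       # step 2: drop smallest and largest
    # step 3: one value left -> it is the median;
    #         two values left -> take their mean
    return sum(remaining) / len(remaining)


print(median_by_elimination([5, 6, 3, 2, 3, 5, 7]))  # 5.0  (2,3,3,5,5,6,7 -> 5)
print(median_by_elimination([4, 1, 3, 2]))           # 2.5  (2 and 3 left)
```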
Mode:
This is the most frequently occurring value.
How to calculate: see which value/score/category has the highest frequency.
When it is most suitable or appropriate to use: when data is nominal (in categories), such as favourite colour or favourite subject. For example, if people are asked to select their favourite colour from red, yellow, blue and green, and 10 people pick red, 7 pick yellow, 12 pick blue and 8 pick green, the mode is blue because it has the highest frequency (12). If two categories share the highest frequency, both are modes and the data is described as bi-modal.
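A small Python sketch that returns every category sharing the highest frequency, so a bi-modal result shows up as two items. It uses `collections.Counter` for the tally; the standard library's `statistics.multimode` does the same job. The colour counts are the example from the notes; the bi-modal list is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency.
    One item -> a single mode; two items -> bi-modal."""
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)              # category -> frequency
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


# Favourite-colour example: 10 red, 7 yellow, 12 blue, 8 green
choices = ["red"] * 10 + ["yellow"] * 7 + ["blue"] * 12 + ["green"] * 8
print(modes(choices))                                   # ['blue']

# Hypothetical tie between two categories
print(modes(["maths", "maths", "art", "art", "music"]))  # ['maths', 'art']
```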
MEASURES OF SPREAD/DISPERSION (DESCRIPTIVE STATISTICS): These describe how spread out the data is around the average value and include:
- Range
- Standard deviation
Range:
Range is used with the median when data is ordinal/interval.
It is calculated by arranging the data in ascending order, then subtracting the smallest value from the largest and adding 1 (some textbooks leave out the +1).
e.g. 5, 6, 3, 2, 3, 5, 7
In order: 2, 3, 3, 5, 5, 6, 7
Range = 7 - 2 + 1 = 6
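A short Python sketch of the range calculation using the +1 convention these notes follow. That convention is worth checking against your own syllabus, since many sources define the range as simply largest minus smallest. The data are the example values from the notes.

```python
def range_plus_one(values):
    """Range as taught in these notes: (largest - smallest) + 1.
    Drop the + 1 if your syllabus uses largest - smallest."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    ordered = sorted(values)             # arrange in ascending order
    return ordered[-1] - ordered[0] + 1  # largest - smallest + 1


print(range_plus_one([5, 6, 3, 2, 3, 5, 7]))  # 6  (7 - 2 + 1)
```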
Standard Deviation:
Standard deviation is used with the mean when data is numerical. It is often better than the range because it takes into account every value in the data set, rather than only the highest and lowest.
If the mean age of students in a class is 17 years and the standard deviation is 1, you would do the following two steps:
17 + 1 = 18
17 - 1 = 16
This shows that most students are aged between 16 and 18, with the average age being 17 (if ages are normally distributed, about 68% of scores fall within one standard deviation of the mean). It is not the full range: some students may be younger than 16 or older than 18. A small standard deviation like this means the ages are clustered closely around the mean.
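The notes do not say which formula to use, so this sketch assumes the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes; `statistics.pstdev` divides by n instead. The ages are hypothetical, chosen so the mean is 17 and the standard deviation is exactly 1, as in the class example, while the actual ages still run from 15 to 19.

```python
import math


def sample_sd(values):
    """Sample standard deviation: square root of the sum of squared
    deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical class ages (illustrative only)
ages = [15, 17, 17, 17, 17, 17, 17, 17, 19]
m = sum(ages) / len(ages)        # 17.0
sd = sample_sd(ages)             # 1.0
low, high = m - sd, m + sd       # one SD either side: 16.0 to 18.0
within = [a for a in ages if low <= a <= high]

print(f"mean = {m}, SD = {sd}, one SD either side: {low} to {high}")
print(f"actual ages: {min(ages)} to {max(ages)}; "
      f"{len(within)} of {len(ages)} lie within one SD")
```

The output shows that mean plus or minus one SD (16 to 18) describes where most ages lie, not the full spread (15 to 19).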
GRAPHS:
Bar Chart/Graph:
This has spaces between the bars and is used for categories of data when data is nominal. It helps to identify the mode (the tallest bar).
Frequency is plotted on the y-axis and the categories on the x-axis.
Histogram:
This has no spaces between the bars and is used to plot numerical (continuous) data, usually grouped into class intervals. It is used alongside the mean, as it shows how the scores are distributed.
Frequency is plotted on the y-axis and the numerical data on the x-axis.
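Before drawing a histogram, numerical scores are usually counted into equal-width class intervals. This Python sketch does that counting and prints a rough text histogram. The function name `histogram_counts`, its parameters and the test scores are hypothetical, chosen for illustration; values outside the chosen intervals are simply ignored.

```python
def histogram_counts(values, start, width, bins):
    """Count values in each class interval [start + k*width, start + (k+1)*width).
    Adjacent intervals share a boundary with no gap between them,
    which is why histogram bars are drawn touching."""
    counts = [0] * bins
    for v in values:
        k = int((v - start) // width)   # which interval v falls into
        if 0 <= k < bins:               # ignore values outside the intervals
            counts[k] += 1
    return counts


# Hypothetical test scores out of 30 (illustrative only)
scores = [3, 7, 12, 14, 15, 18, 22, 25, 27, 29]
counts = histogram_counts(scores, start=0, width=10, bins=3)
print(counts)  # [2, 4, 4]

# Text histogram: labels assume whole-number scores
for k, count in enumerate(counts):
    low = k * 10
    print(f"{low}-{low + 9}: {'#' * count}")
```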
Scatterplot/gram/graph:
This is used to plot correlations between two variables. A pattern of points sloping downwards shows a negative correlation, whereas a pattern sloping upwards shows a positive correlation. The more closely the points cluster around a straight line, the stronger the correlation; the steepness of the line does not show the strength.
Either variable can be plotted on either axis.
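Correlation strength is usually measured with a correlation coefficient. This Python sketch assumes Pearson's r (one common choice; the notes do not name a specific coefficient), and the data points are hypothetical. It shows that a steep line and a shallow line can both give a perfect r of 1, while scattered points give a weaker r. Python 3.10+ also provides `statistics.correlation`.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient: +1 perfect positive,
    -1 perfect negative, 0 no linear relationship."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
steep = [2 * v for v in x]       # points on a steep line (slope 2)
shallow = [0.5 * v for v in x]   # points on a shallow line (slope 0.5)
scattered = [2, 5, 1, 8, 6]      # rises overall, but points are spread out
falling = [10, 8, 6, 4, 2]       # points on a downward line

print(f"{pearson_r(x, steep):.2f}")      # 1.00
print(f"{pearson_r(x, shallow):.2f}")    # 1.00
print(f"{pearson_r(x, scattered):.2f}")  # 0.60
print(f"{pearson_r(x, falling):.2f}")    # -1.00
```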
Normal Distribution:
A normal distribution curve is also known as a bell-shaped curve and has two
features:
- It is perfectly symmetrical
- The mean, median and mode all have the same value
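A true normal distribution is a continuous theoretical curve, so a small data set can only illustrate its two features, not prove that data are normal. This Python sketch checks symmetry and compares the three averages using the standard library's `statistics.mean`, `median` and `mode`. Both data sets are hypothetical; the helper `is_symmetrical` is an illustrative name, and it uses exact comparison, so it suits small whole-number examples.

```python
from statistics import mean, median, mode


def is_symmetrical(values):
    """True if the deviations from the mean mirror each other exactly."""
    m = mean(values)
    return sorted(v - m for v in values) == sorted(m - v for v in values)


# Hypothetical scores, chosen to be perfectly symmetrical around 3
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
# Hypothetical lopsided scores with a tail of high values, for contrast
lopsided = [1, 1, 1, 2, 2, 3, 4, 5, 8]

for name, data in [("symmetric", symmetric), ("lopsided", lopsided)]:
    print(name, mean(data), median(data), mode(data), is_symmetrical(data))
# symmetric: mean, median and mode are all 3, and the check is True
# lopsided: mean 3, median 2, mode 1, and the check is False
```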