
Data Analysis

The document outlines key concepts in data analysis, focusing on measures of central tendency (mean, median, mode) and measures of spread (range, standard deviation). It explains how to calculate each measure and when to use them based on the type of data. Additionally, it describes various graphical representations of data, including bar charts, histograms, and scatterplots, as well as the characteristics of a normal distribution.

DATA ANALYSIS

MEASURES OF CENTRAL TENDENCY (AVERAGES): These are average scores and include:

- Mean
- Median
- Mode

Mean:
The mean is also referred to as the arithmetic mean.
How to calculate: add all the values in a data set and divide by the total number
of values.
When it is most suitable or appropriate to use: when data is numerical and on a
universal scale (years, kilograms, meters etc.). It takes into account every value
in the data set, which is why you would use it over the other averages; for the
same reason, it can be distorted by extreme values.
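
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration only; the function name `mean` and the list of ages are made up for the example and do not come from the notes.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Hypothetical ages (in years) of five students.
ages = [16, 17, 17, 18, 17]
print(mean(ages))  # 85 / 5 = 17.0
```

Because every value enters the sum, changing any single age changes the result, which is the property the notes point to.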

Median:
This is the middle score in a data set once the values are arranged in order.
How to calculate: arrange the data in ascending order; eliminate the highest and
lowest values; continue doing so until one value is left in the middle; if two
values are left in the middle, take their mean.
When it is most suitable or appropriate to use: when data is ordinal (values can be
ranked in order) or interval, and when the scale is devised by the researcher
specifically for their study (e.g. rate how happy you are from 1 to 10).
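
The "cross off the ends" procedure can be written out directly. The sketch below is one possible rendering, assuming a hypothetical helper named `median_by_trimming`; it is checked against Python's standard `statistics.median`, which gives the same answer by a different route.

```python
import statistics


def median_by_trimming(values):
    """Median found by repeatedly removing the lowest and highest values."""
    ordered = sorted(values)  # arrange in ascending order
    if not ordered:
        raise ValueError("median requires at least one value")
    while len(ordered) > 2:
        ordered = ordered[1:-1]  # drop the current lowest and highest
    # One value left: that is the median. Two left: take their mean.
    return sum(ordered) / len(ordered)


scores = [5, 6, 3, 2, 3, 5, 7]  # hypothetical scores
print(median_by_trimming(scores))          # 5.0
print(median_by_trimming([1, 2, 3, 4]))    # 2.5 (mean of 2 and 3)
print(statistics.median(scores))           # 5
```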

Mode:
This is the most frequently occurring value.
How to calculate: see which value/score/category has the highest frequency.
When it is most suitable or appropriate to use: when data is nominal (in
categories) such as favourite color, favourite subject etc. For example, people are
asked to select their favourite color from red, yellow, blue and green. If 10
people pick red, 7 pick yellow, 12 pick blue, and 8 pick green, the mode is blue as
it has the highest frequency (12). If two categories shared the highest frequency,
both would be the mode, and the data would be described as bimodal.
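
The favourite-color example can be reproduced with a frequency count. This is a small sketch, assuming a hypothetical helper `modes` that returns every category sharing the highest frequency, so a bimodal result comes back as two categories.

```python
from collections import Counter


def modes(counts):
    """Return all categories that share the highest frequency."""
    highest = max(counts.values())
    return [category for category, n in counts.items() if n == highest]


# Frequencies taken from the example in the notes.
favourite_color = Counter({"red": 10, "yellow": 7, "blue": 12, "green": 8})
print(modes(favourite_color))  # ['blue']

# Hypothetical variation: red and blue tie, so the data is bimodal.
tied = Counter({"red": 12, "yellow": 7, "blue": 12, "green": 8})
print(modes(tied))  # ['red', 'blue']
```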

MEASURES OF SPREAD/DISPERSION (DESCRIPTIVE STATISTICS): These refer to
how much the data is spread from the average value and include:

- Range
- Standard deviation

Range:
Range is used with the median when data is ordinal/interval.
It is calculated by arranging the data in ascending order, then subtracting the
smallest number from the largest, and adding 1. (This is the inclusive range; many
texts define the range simply as the largest value minus the smallest, without
adding 1.)

e.g. 5, 6, 3, 2, 3, 5, 7

Step 1: Arrange in ascending order: 2, 3, 3, 5, 5, 6, 7
Step 2: Largest value - smallest value + 1 = 7 - 2 + 1 = 6

Range = 6
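
The worked example can be checked in code. The sketch below follows the notes' convention of adding 1; the function names `inclusive_range` and `simple_range` are illustrative, and the second is included only to show the version without the added 1.

```python
def inclusive_range(values):
    """Range as defined in these notes: largest - smallest + 1."""
    return max(values) - min(values) + 1


def simple_range(values):
    """Range as defined in many other texts: largest - smallest."""
    return max(values) - min(values)


scores = [5, 6, 3, 2, 3, 5, 7]  # data from the worked example
print(sorted(scores))           # [2, 3, 3, 5, 5, 6, 7]
print(inclusive_range(scores))  # 7 - 2 + 1 = 6
print(simple_range(scores))     # 7 - 2 = 5
```

Sorting is not strictly needed to find the largest and smallest values, but it mirrors Step 1 and makes them easy to read off.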

Standard Deviation:
Standard deviation is used with the mean when data is numerical. It can be a better
measure than the range because it takes into account all the values in a data set,
not only the two extremes.

To interpret a standard deviation, add it to and subtract it from the mean:

Step 1: Mean + Standard Deviation
Step 2: Mean - Standard Deviation

If the mean age of students in a class is 17 years and the standard deviation is 1,
you would do the following two steps:
17 + 1 = 18
17 - 1 = 16

This shows that the students' ages typically fall between 16 and 18, with the
average age being 17. In other words, ages differ from the mean by about one year
on average; individual students can still be younger than 16 or older than 18.
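
The notes do not show how the standard deviation itself is obtained, so the following is a sketch under one stated assumption: it uses the population formula (divide by the number of values), whereas a sample standard deviation would divide by one less. The function name `population_sd` and the ages are hypothetical, chosen so the result matches the mean of 17 and standard deviation of 1 used above.

```python
import math


def population_sd(values):
    """Square root of the average squared distance from the mean."""
    m = sum(values) / len(values)
    variance = sum((x - m) ** 2 for x in values) / len(values)
    return math.sqrt(variance)


# Hypothetical ages: every student is exactly one year from the mean.
ages = [16, 18, 16, 18]
m = sum(ages) / len(ages)
sd = population_sd(ages)
low, high = m - sd, m + sd

print(m, sd)      # 17.0 1.0
print(low, high)  # 16.0 18.0
```

Unlike the range, every value contributes to the result, which is why the standard deviation pairs naturally with the mean.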

GRAPHS:

Bar Chart/Graph:
This has spaces between the bars and is used for categories of data when data is
nominal. It helps to identify the mode, which is the category with the tallest bar.

Frequency is plotted on the y-axis and the categories are plotted on the x-axis.

Histogram:
This has no spaces between the bars and is used to plot numerical data. It shows
the shape of the distribution, and the mean can be estimated from the frequencies.

Frequency is plotted on the y-axis and the numerical data, grouped into intervals,
on the x-axis.

Scatterplot/Scattergram/Scatter graph:
This is used to plot correlations between two variables. A downward-sloping pattern
of points is a negative correlation, whereas an upward-sloping pattern is a positive
correlation. The more closely the points cluster around a straight line, the
stronger the correlation; the steepness of the line does not indicate strength.

Either of the two variables can be plotted on the x-axis or the y-axis.
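
Direction and strength of a correlation are usually summarised with a correlation coefficient. The sketch below computes Pearson's r by hand (a standard measure that the notes do not name, so treat its use here as an assumption) on two made-up data sets. It illustrates that strength depends on how tightly the points follow a straight line, while the slope is a separate matter.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient: +1 perfect positive, -1 perfect negative."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)


hours = [1, 2, 3, 4, 5]                    # hypothetical x values
shallow = [10.0, 10.5, 11.0, 11.5, 12.0]   # gentle slope, points exactly on a line
steep_noisy = [2, 9, 4, 14, 8]             # steeper overall trend, widely scattered

r_shallow = pearson_r(hours, shallow)
r_steep = pearson_r(hours, steep_noisy)
r_negative = pearson_r(hours, shallow[::-1])

print(round(r_shallow, 3))   # 1.0, a very strong positive correlation
print(round(r_steep, 3))     # roughly 0.58, weaker despite the steeper trend
print(round(r_negative, 3))  # -1.0, a very strong negative correlation
```

Swapping the two arguments gives the same coefficient, which matches the point that either variable can go on either axis.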

Normal Distribution:

A normal distribution curve is also known as a bell-shaped curve and has two
features:

- It is perfectly symmetrical
- The mean, median and mode are all the same value
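
Both features can be checked with `NormalDist` from Python's standard `statistics` module. This is a sketch only; the mean of 17 and standard deviation of 1 are borrowed from the age example purely for illustration.

```python
from statistics import NormalDist

# Hypothetical normal distribution of ages: mean 17, standard deviation 1.
ages = NormalDist(mu=17, sigma=1)

# The three averages coincide.
print(ages.mean, ages.median, ages.mode)  # 17.0 17.0 17.0

# Symmetry: the curve is equally high one year below and one year above the mean.
print(ages.pdf(16) == ages.pdf(18))  # True

# Proportion of values within one standard deviation of the mean.
share_within_one_sd = ages.cdf(18) - ages.cdf(16)
print(round(share_within_one_sd, 3))  # 0.683
```

The last figure is why "mean plus or minus one standard deviation" is read as the span covering most values: for normally distributed data it holds about 68% of them.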
