Notes_Data handling

The document outlines the process of data handling, including developing questions, collecting data through various methods, and organizing data using tallies and frequency tables. It explains the difference between surveys and questionnaires, as well as the concepts of population and sample. Additionally, it covers measures of central tendency (mean, median, mode) and provides activities for practical application of these concepts.

Uploaded by

shirmique
Copyright © All Rights Reserved
Data handling

March 2025

Introduction
1. Developing Questions

The first step in every statistical process is to develop or pose a question.


Main question: What are the trends or patterns in, for example, learner absenteeism after
exams are written?
Sub-questions (on learner absenteeism after an exam):
• On which day(s) of the week following the exams are the most learners absent?
• What percentage of the absentees are boys, and what percentage are girls?
• In which grade is absenteeism generally highest and lowest during this period?

Example 1:

For the following question, write down three questions that will enable you to collect meaningful
data.

1. Do extra lessons improve learners’ academic results?

Answers:
• Did you find that your marks increased noticeably after taking extra lessons?
• If yes, give your approximate mark before taking the lessons.
• By approximately what percentage did your marks increase/decrease after taking the
lessons?

2. Collecting data:

Ways of collecting data:

By observation e.g.
Counting the number of people entering a shop over a period of an hour

By interview e.g.
Asking people their opinion on the strength of a brand such as Vodacom or Nike

By survey, e.g.
Finding out a learner’s favorite subject and teacher by means of a questionnaire.
Advantages and disadvantages of each method:

Observation
Advantages:
• Easy to record
• Participants don’t need to fill in forms
Disadvantages:
• Time consuming for observer
• Reliant on accuracy of observer

Interview
Advantages:
• Discussion between the interviewee and interviewer
• Researcher can clarify responses
• Interviewees tend to be more honest
Disadvantages:
• Time consuming
• Can be expensive
• Difficult to arrange
• Difficult to target a large audience in a whole geographical area

Survey using a questionnaire
Advantages:
• Can be completed by many people at the same time
• Can be distributed in different areas
• Can be fast and easy to complete
• Can be completed at a time convenient to the person completing the questionnaire
Disadvantages:
• Researcher is not always present to clarify wording
• Researcher cannot ask person to clarify responses
• People tend not to complete questionnaire
• Questions can be vague and ambiguous
• People can easily be dishonest

3.1 The difference between a survey and a questionnaire

• A questionnaire is a tool used to conduct a survey


• A survey is the process of using questionnaires to gather information.

3.2 Different ways of completing questionnaires

Researchers can ask people to fill in questionnaires in many different ways:


• Telephone
• Mail
• Online
• In-home visit
• Shopping mall

.1 Population

• The population in statistics refers to the entire group of interest

• Example

• Which cell phone brand is most popular amongst the learners at Gateway High
School? (Here the population is all the learners at the school.)

3.3 Sample

• It is often impossible to investigate the entire population, e.g. every 16-year-old in
Cape Town. Instead, we restrict ourselves to studying a representative sample of the
population.
• A part of the population is called a sample.

• Example


If your population is the learners of Gateway High School, then a sample could be,
for example, 5 learners taken from every grade.
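
Choosing 5 learners from every grade can be sketched in Python as shown here. This is only an illustration: the grade range (8 to 12), the 30 learners per grade and the learner labels are all made-up assumptions, not data from Gateway High School.

```python
import random

# Hypothetical population: 30 numbered learners in each of Grades 8 to 12
population = {
    grade: [f"Grade {grade} learner {n}" for n in range(1, 31)]
    for grade in range(8, 13)
}

random.seed(2025)  # fixed seed only so that the example is repeatable

# Choose 5 learners at random from every grade, without repeats
sample = {
    grade: random.sample(learners, 5)
    for grade, learners in population.items()
}

sample_size = sum(len(chosen) for chosen in sample.values())
print(f"Sample size: {sample_size} learners")
```

Because the learners are picked at random within each grade, every learner in a grade has the same chance of being chosen, which helps to keep the sample representative.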
5. Classifying and Organising data:

Classifying data:

• Categorical data is generally descriptive in nature.
Data can be observed but not measured.
Examples include: textures, smells, tastes, appearance, gender (male or female), eye
colour, shoe sizes and country of birth.
Categorical data can consist of ‘yes’ or ‘no’ answers.
• Numerical data refers to data consisting of quantities or numerical values.
Examples include: measurements e.g. length, height, area, volume, mass, speed, time,
temperature, rainfall, humidity, sound levels, cost, members, ages, etc.
Numerical data is either discrete or continuous. Discrete data is counted and takes only
separate values (e.g. the number of learners in a class); continuous data is measured and
can take any value within a range (e.g. height).

6. Organising data

To organize collected data, we use tallies and frequency tables.

Assume you have the following set of data:

1st questionnaire: yes yes yes yes no no no yes no


2nd questionnaire: yes yes no no yes yes yes yes yes

Tally tables

As we go through each questionnaire, we put a vertical line (a tally) next to the appropriate
answer (Yes/No).

1st questionnaire          2nd questionnaire

Answer   Tally             Answer   Tally
YES      -llll-            YES      -llll- ll
NO       llll              NO       ll

• The tallies are grouped into fives. Each count is represented by a vertical line: llll
represents 4, and the fifth line is drawn horizontally through the previous 4 (shown here
as -llll-) to represent 5. This makes the responses easier to count.

Frequency table

• Another column is added to the tally table, in which the frequency of the tallies is written in
numerical form.
• The responses of the two questionnaires combined would be organized in a frequency table as
follows:

Answer   Tally               Frequency
YES      -llll- -llll- ll    12
NO       -llll- l            6
Total                        18
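
The same tallying can be sketched in a few lines of Python. This is only an illustration, assuming the responses have been typed in as lists of words; the variable names are made up for this example.

```python
from collections import Counter

# Responses copied from the two questionnaires above
first_questionnaire = ["yes", "yes", "yes", "yes", "no", "no", "no", "yes", "no"]
second_questionnaire = ["yes", "yes", "no", "no", "yes", "yes", "yes", "yes", "yes"]

# Counter does the tallying: it counts how often each answer occurs
frequency = Counter(first_questionnaire + second_questionnaire)

# Print a simple frequency table
for answer in ("yes", "no"):
    print(f"{answer.upper():<6}{frequency[answer]:>3}")
print(f"{'Total':<6}{sum(frequency.values()):>3}")
```

The counts agree with the frequency table: 12 for YES, 6 for NO and 18 responses in total.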

Grouping Data

Data is grouped into intervals (called class intervals) when:

• there is a large number of data items, or
• one is working with continuous data (i.e. measurements).

Example:

Draw a frequency table that you could use to organize the following data gathered about the
height of female learners in a class.

1.43 m 1.11 m 1.4 m 1.44 m 1.32 m


1.57 m 1.31 m 1.05 m 1.52 m 1.14 m

1.23 m 1.57 m 1.49 m 1.44 m 1.22 m

1.38 m 1.49 m 1.37 m 1.45 m 1.48 m

Answer

Height (in metres)   Tally         Frequency
1.00 – 1.09          l             1
1.10 – 1.19          ll            2
1.20 – 1.29          ll            2
1.30 – 1.39          llll          4
1.40 – 1.49          -llll- lll    8
1.50 – 1.59          lll           3
Total                              20
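
Grouping into class intervals can also be sketched in Python. This is one possible approach, not the only one: it assumes the heights are first converted to whole centimetres, and the variable names are made up for this example.

```python
from collections import Counter

# Heights (in metres) of the 20 female learners from the example
heights_m = [
    1.43, 1.11, 1.40, 1.44, 1.32,
    1.57, 1.31, 1.05, 1.52, 1.14,
    1.23, 1.57, 1.49, 1.44, 1.22,
    1.38, 1.49, 1.37, 1.45, 1.48,
]

# Work in whole centimetres so that small decimal rounding errors
# cannot push a height into the wrong interval
heights_cm = [round(h * 100) for h in heights_m]

# Lower limit of each class interval: 143 cm -> 140, 105 cm -> 100, ...
class_counts = Counter((cm // 10) * 10 for cm in heights_cm)

# Print the grouped frequency table, smallest interval first
for lower in sorted(class_counts):
    upper = lower + 9
    print(f"{lower / 100:.2f} - {upper / 100:.2f}   {class_counts[lower]}")
print(f"Total         {sum(class_counts.values())}")
```

Each height falls into exactly one interval, so the frequencies must add up to the number of learners (20).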

7. Summarising Data

Measures of central tendency

The 3 measures of central tendency:

• Mean (also called the average)


• Median
• Mode
Each of these three measures gives a single value that is ‘representative’ of the whole
data set.

Mean

To calculate the mean, you add all the values of the data set and divide this sum by the total
number of values in the data set.

\[
\text{Mean} = \frac{\text{sum of all values in the data set}}{\text{total number of values in the data set}}
\]

Note: the mean can only be calculated if the data is numerical

Example:

The soccer team kept a record of the number of goals scored, as shown below, in all the matches
they played in the recent season:

1 7 9 4 3 5 8 3 2 8

1. How many matches did the soccer team play?


2. Calculate the mean score.
3. How many matches produced a result above the mean score?

Solutions:

1. 10 matches as there are 10 scores


2.
\[
\text{Mean} = \frac{\text{sum of all values in the data set}}{\text{total number of values in the data set}}
= \frac{1+7+9+4+3+5+8+3+2+8}{10} = \frac{50}{10} = 5
\]
3. 4 matches as there were 4 scores greater than 5 (i.e. 7; 9; 8 and 8)
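
The three answers can be checked with a short Python sketch. The variable names are made up for this illustration.

```python
# Goals scored in each match of the season
goals = [1, 7, 9, 4, 3, 5, 8, 3, 2, 8]

# Question 1: one score per match, so count the scores
matches_played = len(goals)

# Question 2: mean = sum of all values / number of values
mean_goals = sum(goals) / len(goals)

# Question 3: keep only the scores that are greater than the mean
above_mean = [g for g in goals if g > mean_goals]

print(matches_played)   # 10
print(mean_goals)       # 5.0
print(len(above_mean))  # 4
```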
Median
• The median is the middle value of a data set that is arranged in ascending (smallest to
biggest) order
• If there is an odd number of values in the data set, the middle value will be the median
• If there is an even number of the values in the data set, you will have two values in the
middle. In this case, you need to find the mean of these two middle values, i.e. add them
together and divide by 2

Example

1. Odd number of data values

The list below shows the first round scores obtained by golfers in a school tournament:

83 89 88 90 89 84 82 86 89 87 86

Determine the median:


• Firstly, arrange the data set in ascending order

82 83 84 86 86 87 88 89 89 89 90

The middle value is the median


Median = 87

2. Even number of data values

The soccer team kept a record of their match scores, as shown below:

1 7 9 4 3 5 8 3 2 8

Calculate the median score.


• Firstly, arrange the data set in ascending order

1 2 3 3 4 5 7 8 8 9

There is no single middle value, therefore add the two middle values (4 and 5) and divide by 2:

\[
\text{Median} = \frac{4+5}{2} = \frac{9}{2} = 4{,}5
\]

Median match score is 4,5 goals
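
Both cases (odd and even number of values) can be sketched as one small Python function. The function name `median` and the variable names are chosen for this illustration; Python's standard library also offers a ready-made `statistics.median`.

```python
def median(values):
    """Return the middle value of the data set (mean of the two middle values if even)."""
    ordered = sorted(values)  # arrange in ascending order first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: one value sits exactly in the middle
        return ordered[middle]
    # Even number of values: find the mean of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


golf_scores = [83, 89, 88, 90, 89, 84, 82, 86, 89, 87, 86]
match_scores = [1, 7, 9, 4, 3, 5, 8, 3, 2, 8]

print(median(golf_scores))   # 87
print(median(match_scores))  # 4.5
```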

Mode
• The mode is the value (or values) in the data set that occur(s) most frequently.

Example

The soccer team kept a record of their match scores, as shown below:

1 7 9 4 3 5 8 3 2 8

Determine the modal score:

3 and 8 have the highest frequency, as both appear twice.

The data set is said to be bi-modal, i.e. the modes are 3 and 8
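
Finding the mode amounts to counting how often each value occurs and keeping the value(s) with the highest count. A minimal Python sketch follows; the helper name `modes` is made up for this example.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, in ascending order."""
    counts = Counter(values)          # frequency of each value
    highest = max(counts.values())    # the highest frequency
    return sorted(v for v, c in counts.items() if c == highest)


match_scores = [1, 7, 9, 4, 3, 5, 8, 3, 2, 8]
print(modes(match_scores))  # [3, 8]
```

Returning a list rather than a single number lets the same function handle data sets with one mode and bi-modal data sets like this one.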


Measure of spread

Range

Range = highest value – lowest value
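
As an illustration, the range of the soccer match scores used in the earlier examples can be worked out as follows. The variable names are made up for this sketch.

```python
match_scores = [1, 7, 9, 4, 3, 5, 8, 3, 2, 8]

# Range = highest value - lowest value
data_range = max(match_scores) - min(match_scores)

print(data_range)  # 9 - 1 = 8
```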

Activities
Activity 1.

1. Decide on the best method of gathering information for each of the scenarios below:

1.1 How many vehicles use the one-way road south of the school between 06:45 and 07:45
on a school day?
1.2 The majority of the girls in the high school would like to wear long pants to school in
winter.
1.3 Mrs. Mali would like to find out from her learners which teaching methods they prefer
and the reasons why.
1.4 What is the class’s favorite subject?
1.5 Information about recycling in your home.

2. The learners whose names start with the letters F and L are chosen from a Grade 11 class.
2.1 Identify the population
2.2 Identify the sample

3. Decide whether the following are appropriate samples from their population. Give reasons
for your answer.
3.1 Claim: All girls have good balance. Sample: 20 girls chosen from Mrs. Day’s elite gymnastics class.
3.2 Claim: All Grade 10s at Gateway High School have tried smoking. Sample: An alphabetical list of the
Grade 10s is printed, and every third learner is selected.

Activity 2:

1. Classify each of the following as categorical or numerical, and then as discrete or
continuous data (if the data is numerical in nature).

1.1 The number of trees in a garden.


1.2 The heights of the trees in the garden
1.3 The colour of the flowers in the garden
1.4 The of cars in the parking lot.
1.5 The speed at which cars are travelling past West High School.

2. The following data shows the percentage results for a Grade 10 Mathematical Literacy test:
58 55 44 23 78 85 53 65 88 23 64
24 43 82 76 69 50 73 67 58 16 0

2.1 How many learners wrote the test?


2.2 Draw up a frequency table and group the results in class intervals of 10%
(i.e. 0% – 9%; 10% – 19%; 20% – 29%, etc.).
2.3 What percentage of learners achieved 50% or more?

Activity 3

1. In a Mathematical Literacy examination, 12 learners scored the following marks (%):

58 62 91 64 78 53 28 40 66 13 86 60

Calculate the:

1.1 mean
1.2 median
1.3 mode
1.4 range

2. The following amounts were spent at the school tuck shop during a week:

Day          Amount spent
Monday       R456,85
Tuesday      R236,90
Wednesday    R236,90
Thursday     R429,25
Friday       R1 123,45

2.1 What was the mean amount of money spent at the tuck shop?

2.2 What was the median amount of money spent at the tuck shop?

2.3 What is the mode?

2.4 What is the range of the money spent during the week?
