Data Handling
REPRESENTATION
(STATISTICS)
TEACHER: JOSE ARTURO GONZALEZ
THINKING PROBLEM
Manuel and Santiago play for the same basketball team. Unfortunately, during practice Manuel
suffered an injury and could only play half the season. The points scored by both boys in each
match were:
Manuel: 17, 21, 15, 23, 18, 12, 27, 15, 22, 31, 28, 25
Santiago: 19, 19, 13, 10, 15, 15, 24, 18, 26, 27, 23, 13, 20, 24, 18, 26, 19, 25, 8, 26, 21, 23, 26, 19
Sample: A representative group chosen from the population to take part in the survey, be measured,
or be tested.
Random Sample: A sample selected so that every person or object has the same chance of being
selected as any other.
Inference: A conclusion we can make based on the information that was collected and interpreted.
Consider the following example: suppose we want to determine how many students at CCB like vanilla
ice cream. What could the population be? How could we select a sample? What might an inference be?
SAMPLES AND POPULATIONS
Special Cases:
When a government carries out a CENSUS, it gathers information from everyone in the
population. This process is very expensive and takes a lot of time.
Because of this, many governments decide to gather the required information from a sample of the
population instead. For any inference to be valid, it is critical that the results be as typical of the whole
population as possible. To ensure this, it is important to select the sample randomly and to make the
sample as large as is practical.
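The idea of selecting a sample randomly can be sketched in a few lines of Python. This is only an illustration: the population of 200 student ID numbers and the sample size of 30 are made-up assumptions, not figures from the text.

```python
import random

# Hypothetical population: 200 student ID numbers (made up for illustration).
population = list(range(1, 201))

# random.sample chooses without repeats, and every student is equally
# likely to be chosen, which is what makes the sample "random".
random.seed(1)  # fixed seed only so the example can be repeated
sample = random.sample(population, 30)

print(sorted(sample))
```

Choosing a larger sample size makes the sample more likely to be typical of the whole population, at the cost of more work to collect the data.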
5 Scientists in the jungle want to find the best estimate for the lion population. They
tagged and released 20 lions as part of a research project. Later, they found 160
lions, 8 of which were tagged. To the nearest whole number, what is the best estimate
for the lion population?
6 Juanita works in an Ornithology Department. Students asked her to find the
best estimate of the local bird population, so she tied a band around the legs of 40
birds. A few days later, she observed 520 birds, 34 of which had bands. To the
nearest whole number, what is the best estimate for the bird population?
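Problems like these can be approached by assuming that the fraction of tagged animals in the later observation is about the same as the fraction of tagged animals in the whole population. The sketch below uses that assumption with made-up numbers (not the ones in the exercises); the function name estimate_population is only illustrative.

```python
def estimate_population(tagged, observed_later, tagged_in_observed):
    """Estimate the population size, assuming
    tagged / population is about tagged_in_observed / observed_later."""
    return round(tagged * observed_later / tagged_in_observed)

# Hypothetical numbers: 50 animals tagged; later 200 observed, 10 of them tagged.
estimate = estimate_population(50, 200, 10)
print(estimate)  # 1000
```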
CATEGORICAL DATA
When we talk about categorical data we refer to data which can be placed in categories.
For example, we could stand at a street intersection and record the color of each car driving
past. In this case we could use the following code for the colors: R for red, B for blue, G
for green, W for white and O for all other colors.
We could then obtain the following results after observing a 50-car sample:
BGWWR OGWRW OOBBG OGRWR WWWGB
BBGGW WWWOG WOBWW RWWRB OOBWR
Once we have our categorical data, we first organize it in groups. To do this we can use either:
a. a dot plot or
b. a tally and frequency table.
At this point we can identify key features of the data, for example the mode. The mode is the most
frequently occurring category.
A dot plot is a graph used to display data in which each dot represents one data value. Dot plots can
be horizontal or vertical.
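As a rough illustration, the 50-car sample above can be turned into a horizontal dot plot made of text, with an asterisk standing in for each dot. This is a minimal sketch, not the only way to draw one.

```python
from collections import Counter

# The 50 recorded car colors from the text, with the spaces removed.
observations = (
    "BGWWR OGWRW OOBBG OGRWR WWWGB "
    "BBGGW WWWOG WOBWW RWWRB OOBWR"
).replace(" ", "")

counts = Counter(observations)

# Horizontal dot plot: one asterisk per car in each category.
for color in "RBGWO":
    print(f"{color} | " + "* " * counts[color])

# The mode is the category with the longest row of dots.
mode_color, mode_count = counts.most_common(1)[0]
print("Mode:", mode_color)
```

The longest row belongs to W, so white is the mode of this sample.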
CATEGORICAL DATA
Example:
CATEGORICAL DATA
(DOT PLOT)
Exercises:
CATEGORICAL DATA
(TALLY & FREQUENCY TABLES)
If the problem we are studying has lots of data, it might be easier to use a tally and frequency table. This
tool will help us in the data collection process.
The tally part is used to keep a count of data in each category. The frequency simply summarizes the
tally, meaning it lets us know the total number of each category.
This type of table is sometimes called a frequency distribution table or simply a frequency table.
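The tally and frequency steps can be sketched with the car-color sample from earlier. The tally marks are written here in bundles of five vertical strokes, which is an assumption about layout made only so the table can be printed as text.

```python
# The 50 recorded car colors from the text, one letter per car.
observations = (
    "BGWWR OGWRW OOBBG OGRWR WWWGB "
    "BBGGW WWWOG WOBWW RWWRB OOBWR"
).replace(" ", "")

# Tally step: add one to a category's count each time it is observed.
frequency = {"R": 0, "B": 0, "G": 0, "W": 0, "O": 0}
for color in observations:
    frequency[color] += 1

def tally_marks(count):
    """Write a count as strokes in bundles of five, e.g. 7 -> '||||| ||'."""
    fives, rest = divmod(count, 5)
    groups = ["|||||"] * fives + (["|" * rest] if rest else [])
    return " ".join(groups)

# Frequency step: the number at the end of each row summarizes the tally.
print(f"{'Color':<6}{'Tally':<26}{'Frequency':>9}")
for color, count in frequency.items():
    print(f"{color:<6}{tally_marks(count):<26}{count:>9}")
print(f"{'Total':<32}{sum(frequency.values()):>9}")
```

The frequencies should add up to the sample size (50 here), which is a quick check that no observation was missed.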
Example:
CATEGORICAL DATA
(TALLY & FREQUENCY TABLES)
Exercises:
GRAPHS OF CATEGORICAL DATA
Bar Graphs
Bar Graphs consist of rectangular columns of equal width. The height of each column represents
the number of observations (frequency) in each category.
Example:
GRAPHS OF CATEGORICAL DATA
Bar Graphs
Exercises:
GRAPHS OF CATEGORICAL DATA
Pie Chart
Pie Charts are a useful way of showing how a quantity is divided up. A full pie/circle represents the whole
quantity. We can then divide the pie into wedges or slices to show the frequency of each category.
The table opposite shows the results when 8th grade students were asked
“What is your favorite fruit?”
There are 60 kids in the sample, so each person is entitled to 1/60 of the
pie chart. Since 1/60 of 360° is 6°, we can determine the angles of the
different wedges in the pie chart.
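The angle calculation can be sketched as follows. The fruit counts below are hypothetical (the original table is not reproduced here); they were chosen only so that they add up to 60 students.

```python
# Hypothetical survey results for 60 students (made up for illustration).
favorite_fruit = {"Apple": 20, "Banana": 15, "Orange": 12, "Grapes": 8, "Other": 5}

total = sum(favorite_fruit.values())   # 60 students
degrees_per_student = 360 / total      # 6 degrees for each student

# Each wedge angle is the category frequency times 6 degrees.
angles = {fruit: count * degrees_per_student
          for fruit, count in favorite_fruit.items()}

for fruit, angle in angles.items():
    print(f"{fruit:<8}{angle:6.1f} degrees")
```

The wedge angles should add up to 360°, which is a useful check before drawing the chart.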
Numerical data can be arranged using either a stem-and-leaf plot or a tally and frequency table. As in
the case of categorical data, numerical data can also be presented by a bar/column graph.
STEM-AND-LEAF PLOTS
101, 91, 83, 84, 72, 93, 67, 85, 79, 87, 78, 89, 68, 80, 107, 70, 85, 64, 95, 76, 87, 74, 68, 59, 82, 77
For each data value, the units digit is the leaf, and the digits before it determine the stem on which
the leaf is placed.
For this example the stem labels are 5, 6, 7, 8, 9, and 10. These are written under one another in
ascending order.
NUMERICAL DATA
Once the stems have been recorded, we look at each data value in turn. The first value is 101; here 10
is the stem and 1 is the leaf, so we record a 1 to the right of the stem label 10. The next value is
91. Its stem label is 9 and its leaf is 1, so again we record a 1 to the right of the stem label 9.
We proceed in this way to record all the data in an unordered stem-and-leaf plot.
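The procedure described above can be sketched in Python using the 26 data values from the text. Sorting the leaves on each stem afterwards, to give an ordered plot, is an extra step added here as an assumption about what would normally follow.

```python
data = [101, 91, 83, 84, 72, 93, 67, 85, 79, 87, 78, 89, 68,
        80, 107, 70, 85, 64, 95, 76, 87, 74, 68, 59, 82, 77]

# Stems 5 to 10, listed in ascending order, each starting with no leaves.
stems = {stem: [] for stem in range(5, 11)}

# The units digit is the leaf; the digits before it are the stem.
for value in data:
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

print("Unordered plot")
for stem, leaves in stems.items():
    print(f"{stem:>2} | " + " ".join(str(leaf) for leaf in leaves))

print("Ordered plot")
for stem, leaves in stems.items():
    print(f"{stem:>2} | " + " ".join(str(leaf) for leaf in sorted(leaves)))
```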
NUMERICAL DATA
Example:
NUMERICAL DATA
Exercises:
WORKING WITH NUMERICAL DATA
Example:
WORKING WITH NUMERICAL DATA
Exercises:
MEASURES OF CENTRAL TENDENCY
The mean or average of a set of numbers is an important measure of their middle (central tendency). To
find it, we add all the values and divide by the number of values. We talk about averages all the time.
For example:
MEAN OR AVERAGE
Example:
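As one possible illustration, the mean can be computed for the points scored by Manuel and Santiago in the thinking problem. This is a minimal sketch; the helper name mean is chosen here for convenience.

```python
manuel = [17, 21, 15, 23, 18, 12, 27, 15, 22, 31, 28, 25]
santiago = [19, 19, 13, 10, 15, 15, 24, 18, 26, 27, 23, 13,
            20, 24, 18, 26, 19, 25, 8, 26, 21, 23, 26, 19]

def mean(values):
    """Add up all the values, then divide by how many there are."""
    return sum(values) / len(values)

print(f"Manuel:   {sum(manuel)} points in {len(manuel)} matches, "
      f"mean {mean(manuel):.2f}")
print(f"Santiago: {sum(santiago)} points in {len(santiago)} matches, "
      f"mean {mean(santiago):.2f}")
```

Santiago scored more points in total because he played twice as many matches, but the mean lets us compare the two players match for match.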
MEASURES OF CENTRAL TENDENCY
Exercises:
MEAN OR AVERAGE
MEASURES OF CENTRAL TENDENCY
The way we find the Median of a data set depends on whether the number of observations in the data set
is odd or even. To determine the median, first reorder the data set from smallest to largest. If the
number of observations is odd, then the median is the observation in the middle of the data set. If the
number of observations is even, then the median is the average of the two middle observations.
MEDIAN & MODE
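The odd and even cases can be sketched as a short function. Manuel's 12 scores from the thinking problem give the even case; taking only his first 11 scores is an arbitrary choice made here to show the odd case.

```python
def median(values):
    ordered = sorted(values)      # smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of observations: take the single middle one.
        return ordered[middle]
    # Even number of observations: average the two middle ones.
    return (ordered[middle - 1] + ordered[middle]) / 2

manuel = [17, 21, 15, 23, 18, 12, 27, 15, 22, 31, 28, 25]

print(median(manuel))        # 12 values, so the average of 21 and 22
print(median(manuel[:11]))   # first 11 values, so the 6th value once sorted
```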
MEASURES OF CENTRAL TENDENCY
The Mode for a data set is the observation that occurs the most often. It is not uncommon for a data set
to have more than one mode. This happens when two or more observations occur with equal frequency in
the data set. A data set with two modes is called bimodal. A data set with three modes is called
trimodal.
MEDIAN & MODE
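A function that returns every mode, rather than just one, can be sketched as below, again using the scores from the thinking problem. The function name modes is only illustrative.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

manuel = [17, 21, 15, 23, 18, 12, 27, 15, 22, 31, 28, 25]
santiago = [19, 19, 13, 10, 15, 15, 24, 18, 26, 27, 23, 13,
            20, 24, 18, 26, 19, 25, 8, 26, 21, 23, 26, 19]

print(modes(manuel))     # one mode
print(modes(santiago))   # two modes, so this data set is bimodal
```

Note that if every value occurs exactly once, this sketch returns all of the values, so the result needs to be interpreted with care in that case.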
MEASURE OF VARIABILITY
The Range for a data set is the difference between the largest value and the smallest value in the
data set. First reorder the data set from smallest to largest, then subtract the first observation from the
last observation.
RANGE
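The two steps for finding the range can be sketched as follows, using the scores from the thinking problem. The name data_range is chosen here only to avoid clashing with Python's built-in range.

```python
def data_range(values):
    ordered = sorted(values)           # smallest to largest
    return ordered[-1] - ordered[0]    # last minus first

manuel = [17, 21, 15, 23, 18, 12, 27, 15, 22, 31, 28, 25]
santiago = [19, 19, 13, 10, 15, 15, 24, 18, 26, 27, 23, 13,
            20, 24, 18, 26, 19, 25, 8, 26, 21, 23, 26, 19]

print(data_range(manuel))     # 31 - 12
print(data_range(santiago))   # 27 - 8
```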