NSTA 51516 Slides
• Statistics: the science of data. In simple terms, statistics can be defined as
the processing of raw data into summary measures that represent the important
information and aid decision-making.
• Data: unprocessed raw values that, on their own, carry little useful and
usable information.
Consider the following example:
• Data on students who study at SPU consist of their age, address, and
gender.
• Statistics on students at SPU are as follows: 40% are male and 60% are female;
those aged between 19 and 25 constitute 80%, and the remaining 20% are older than
25. More than 70% are from NC and the remaining 30% are from the neighbouring
towns.
• Statistics supports evidence-based decisions by assisting managers,
policy makers, or business executives in addressing questions or
problems with confidence. For instance:
Management decision-making based on statistical analysis proceeds as follows:
1. We have data on our subject of interest.
2. We perform statistical analysis on the data.
3. We obtain information from the analysis.
4. A decision is made based on that information.
Consider the following scenario:
• Suppose you are a data analyst at a certain company. The company
wishes to expand its business. What steps can you follow to assist
the company in making good decisions?
A random variable is a real-valued function defined on a sample space. It can
also be described as the variable of "interest" on which we collect our data.
Random variables are usually denoted by capital letters.
Population: the set of all possible outcomes of our random variable.
For instance, the population could be all students at SPU.
Sample: a subset of the population that is used to represent the
whole population. It is often costly and time-consuming to work
with the whole population, hence the sample. An example of a sample
is first-year SPU students staying off campus.
Sampling unit: the unit on which we collect data. For instance, it can be a
single student from the sample above.
There are 2 ways in which a sample can be selected from the population.
a) Probability Sampling: In probability sampling, each element of the population has a known,
non-zero probability of being selected. This sampling approach makes it likely that our sample
is representative of the entire population.
• For instance, suppose there are 500 students at SPU. Probability sampling is a way in which all
500 students have an equal chance of being participants in your study. To put it another way,
probability sampling employs random selection procedures to choose a sample.
• In a population of 100 people, for example, each person has a one-in-a-hundred chance of being
chosen. This ensures that a representative sample of the entire population is obtained.
This is the method of selecting a sample from the population in such a
way that all elements of the population stand an equal chance of being
selected.
The probability of selecting the first unit of the sample is n/N, the probability of
selecting the second sample unit is (n-1)/(N-1), and the probability of selecting the
last unit is 1/(N-n+1).
Hence, the probability of selecting a particular set of n sample units from N population units is
$$\frac{n}{N}\cdot\frac{n-1}{N-1}\cdots\frac{1}{N-n+1}=\frac{n!\,(N-n)!}{N!}=\frac{1}{\binom{N}{n}}$$
The above method is called sampling without replacement.
The other way we can sample is with replacement and with this method,
the probability is the same throughout the sampling process. This is
because after picking one unit, it is taken back into the population before
the next pick is made.
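As a minimal illustration (my own sketch, not from the slides), Python's standard random module can mimic both schemes; the population below is a hypothetical list of SPU student numbers.

```python
import random

# Hypothetical population: 500 SPU student numbers (illustration only)
population = list(range(1, 501))   # N = 500
n = 10                             # sample size

# Sampling WITHOUT replacement: each unit can appear at most once
sample_without = random.sample(population, n)

# Sampling WITH replacement: a unit is "returned" before the next pick,
# so the selection probability stays the same on every draw
sample_with = [random.choice(population) for _ in range(n)]

print("Without replacement:", sorted(sample_without))
print("With replacement:   ", sorted(sample_with))
```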
The selection can be done using computer-generated random numbers or a random
number table. When we use the random number table, we follow the process below.
To choose a random sample of 15 people from a total of 85, each subject must
be numbered from 01 to 85. Then, by closing your eyes and placing your finger
on a number on the table, choose a starting number. (Although this may appear
weird, it allows us to generate a random starting number.) Assume your finger
landed on the number 12 in the second column in this scenario. (It's the sixth
number from the top on the list.) Then go down the list until you've chosen 15
distinct numbers between 01 and 85. Go to the top of the following column
when you reach the bottom of the previous column. If you choose a number
that is bigger than 85 or 00 or a duplicate number, just omit it. We'll utilize the
subjects 12, 27, 75, 62, 57, 13, 31, 06, 16, 49, 46, 71, 53, 41, and 02 in our
example.
Researchers obtain systematic samples by numbering each data object in
the population and then picking every kth individual. Let's say there were
2000 people in the population and a sample of 50 people was required.
Because 2000/50 = 40, we take k = 40: the first subject (numbered between
1 and 40) is chosen at random, and then every 40th subject thereafter is
selected. Assume subject 12 was chosen first; the sample would then consist
of subjects numbered 12, 52, 92, and so on until 50 subjects were acquired.
When utilizing systematic sampling,
it's important to pay attention to how the population's subjects are
counted.
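A short sketch (mine, assuming the population is simply numbered 1 to 2000) of how the systematic selection above could be coded:

```python
import random

N = 2000          # population size
n = 50            # required sample size
k = N // n        # sampling interval, k = 40

start = random.randint(1, k)           # random start between 1 and k, e.g. 12
sample = list(range(start, N + 1, k))  # 12, 52, 92, ... until 50 subjects

print(f"k = {k}, start = {start}, sample size = {len(sample)}")
```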
Researchers produce stratified samples by dividing the population into
groups (referred to as strata) based on a study-relevant trait and then
sampling from each group. Within each stratum, samples should be
chosen at random. Let's say the VC of SPU wants to know how students
feel about a particular topic. In addition, the VC wants to examine if first-
year students' attitudes differ from those of second-year students.
Students from each group will be chosen at random by the VC to be
included in the sample.
Cluster samples are also used by researchers. The population is
separated into clusters by a variety of factors, including geographic area,
schools in a large school district, and so on. The researcher then chooses
some of these clusters at random and employs all members of those
clusters as sample subjects. Let's say a researcher wants to conduct a
survey of apartment inhabitants in a big city. If there are ten apartment
buildings in the city, the researcher can choose two at random from the
ten and interview all the tenants. When there is a huge population or
people who live in a large geographic area, cluster sampling is
performed.
Non-probability sampling, as a contrast to probability sampling, draws
the sample using non-randomized methods. The majority of non-
probability sampling methods entail judgment. Instead of randomization,
individuals are chosen based on their accessibility. Your classmates and
friends, for example, have a larger probability of being included in your
sample. Non-probability sampling is a helpful and handy method of
selecting a sample in some circumstances when it is the only method
available.
Qualitative Random Variable: this generates categorical (non-numeric) response
data. Examples of categorical data are gender, education level, the province
one comes from, etc.
Salary    Frequency
5 000     4
6 000     1
7 000     3
8 000     5
9 000     8
10 000    10
11 000    2
13 000    5
14 000    6
15 000    1
• Observation: this involves observing the respondent or the process in action. Examples include
observing traffic, the behaviour of students in class, how employees work, and pedestrian flow.
• The advantage is that the respondents are unaware that they are being observed, so they act
naturally, which reduces bias in the data.
• The disadvantage is that this is a passive form of data collection, so the respondents cannot be
questioned about why they are doing what they are doing.
• Surveys: this is the most commonly used method of collecting data, and it is done through
questionnaires. The questionnaire is administered to respondents by asking them questions directly.
• There are different types of surveys: personal interviews, telephone interviews, and e-surveys.
• Experimentation: this is when data are collected by conducting experiments under controlled
conditions while certain variables are manipulated. An example is changing advertising platforms
to push product sales.
• The advantage of this method is that it produces high-quality data that is likely to be accurate,
which leads to reliable and valid statistical results.
• The disadvantage is that it is costly and time-consuming, with some complications when certain
variables must be controlled.
[Bar chart: Starting Salaries of Graduates, with Salaries on the horizontal axis and Frequencies on the vertical axis.]
• Suppose we have the following information. From the data on the salaries of the people when
they started working, we can categorise them by gender:
Males: 15
Females: 30
[Pie chart: Gender, showing Males (15) and Females (30).]
• A stem-and-leaf display is a shorthand notation for expressing the values in
ascending order, from lowest to highest. We may gain a sense of the typical value, or
center, of the set of data, as well as how the values are spread, from the
presentation.
• Each value is broken into two parts—a "stem" and a "leaf"—to create a stem-and-
leaf display. The first portion of the number is the stem, and the latter part of the
number is the leaf. Although the values can be split in a variety of ways depending on
the types of values you're working with, we normally use the leaf to represent the
value's last digit and the stem to represent the value's preceding digits. The value 46,
for example, has a stem of 4 and a leaf of 6. The stem is 19 and the leaf is 2 for the
value 192.
• When constructing a stem-and-leaf display, we start by locating the lowest and
greatest values in the data set before generating the actual display. This provides
us with the first and last stems. Write down each possible stem in a vertical
column, beginning with the lowest and ending with the highest. Then, on the row
holding its stem, write each value's leaf. After you've done this for each value,
sort the leaves in each row from lowest to highest.
• Consider the following test scores:
83 86 65 94 88 51 76 75 86 64 91
47 71 48 68 45 83 76 92 82 96 82
71 56 79 90 92 76 74 98 75 69
Construct the stem-and-leaf display for the above scores.
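As an optional check on the construction above, here is a short Python sketch (mine, not part of the slides) that builds the stem-and-leaf display for these test scores:

```python
from collections import defaultdict

# Test scores from the example above
scores = [83, 86, 65, 94, 88, 51, 76, 75, 86, 64, 91,
          47, 71, 48, 68, 45, 83, 76, 92, 82, 96, 82,
          71, 56, 79, 90, 92, 76, 74, 98, 75, 69]

# Split each value into a stem (leading digits) and a leaf (last digit)
display = defaultdict(list)
for value in sorted(scores):
    stem, leaf = divmod(value, 10)
    display[stem].append(leaf)

for stem in sorted(display):
    print(f"{stem} | {' '.join(str(leaf) for leaf in display[stem])}")
```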
• The following are the ages of people who stay at an old age home around Bloemfontein.
Construct a stem and leaf display for them.
43 68 46 43 34 59 55 42 73 71 73
75 71 84 43 36 68 54 77 76 62 59
62 69 31 60 82 88 68 59 81 51 75
79 51 62 41 50 74 82 61 56 66 75
52 59 44 58 51 58 72 62 72 50
• A frequency distribution is another graphical tool we may use to evaluate data. We
divide the data into classes and count the number of times each one is represented.
• Consider the following frequency distribution of 32 test scores from one module.
Scores Frequency
40-50 3
50-60 2
60-70 4
70-80 9
80-90 7
90-100 7
The first class, 40–50, consists of values from 40 up to, but not including, 50. A value of
50 would be counted in the second class, 50–60. In the first class, 40 is referred to as
the lower-class limit, whereas 50 is the upper-class limit. The upper-class limit is not
actually included in the class but is the boundary point for beginning the next class.
The class size for a given class is the distance from its lower limit to its upper limit.
𝑐𝑙𝑎𝑠𝑠 𝑠𝑖𝑧𝑒 = 𝑢𝑝𝑝𝑒𝑟 𝑐𝑙𝑎𝑠𝑠 𝑙𝑖𝑚𝑖𝑡 − 𝑙𝑜𝑤𝑒𝑟 𝑐𝑙𝑎𝑠𝑠 𝑙𝑖𝑚𝑖𝑡
The class size for each class in the previous example is 10 (50 – 40 = 10, 60 – 50 = 10,
etc.).
The class mark for a given class is the midpoint of the class.
𝑐𝑙𝑎𝑠𝑠 𝑚𝑎𝑟𝑘 = (𝑙𝑜𝑤𝑒𝑟 𝑐𝑙𝑎𝑠𝑠 𝑙𝑖𝑚𝑖𝑡 + 𝑢𝑝𝑝𝑒𝑟 𝑐𝑙𝑎𝑠𝑠 𝑙𝑖𝑚𝑖𝑡) / 2
In the above example, the class mark for the first class is (40 + 50)/2 = 45.
There are 2 rules to follow when creating a frequency distribution
• The classes must, first and foremost, be exhaustive. This means that every value in the data
set must fall into one of the classes. Consider the frequency distribution in Table 1 below: some
values will not appear in the distribution table because they do not belong in any of the classes.
Suppose we had 51, in which class would it belong?
• The second rule is that the classes must be mutually exclusive. This means that two classes may
not overlap. Looking at Table 2, the classes overlap. Suppose we had a value of 63, in which class
would it be, since it could fall in either the second or the third class?
Table 1     Table 2
40-48       40-55
52-65       50-65
65-73       60-75
76-89       70-85
89-100      85-100
• The ideal way to satisfy both conditions is to make sure that the lowest value belongs
in the first class, the highest value belongs in the last class, and the top limit of one
class corresponds to the lower limit of the next. This eliminates both overlaps and
missing values.
• Do not use too many or too few classes. Too many classes break the data up
too much, while too few classes do not break it up enough; hence 5 to 20
classes are recommended.
• Use equal class sizes, as this enables us to compare frequencies easily. We
cannot compare the frequency of a class of size 5 to that of a class of size 10.
• Another thing to think about is avoiding open-ended classes. An open-ended class is
one that lacks one of its limits. If we were looking at household incomes,
R100,000 and up would be considered an open-ended class. Below R20,000, on the
other hand, is not an open-ended class because a household's income cannot be
less than R0. As a result, R0 is the true lower limit for that class, and the
class should run from R0 to R20,000.
Use the data in Example 1 to construct a frequency distribution.
• The graph constructed using cumulative frequencies is called an ogive.
• We will use the test-score example to construct this graph.
• This is the graph that is used to present a frequency distribution.
• The horizontal axis shows the class limits, while the vertical axis shows the frequencies.
• As an example, we will consider the test score data above and construct the histogram.
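A minimal matplotlib sketch (my own, assuming the 32 test scores and the class limits 40, 50, ..., 100 from the frequency distribution above):

```python
import matplotlib.pyplot as plt

scores = [83, 86, 65, 94, 88, 51, 76, 75, 86, 64, 91,
          47, 71, 48, 68, 45, 83, 76, 92, 82, 96, 82,
          71, 56, 79, 90, 92, 76, 74, 98, 75, 69]

# Class limits 40-50, 50-60, ..., 90-100 (the upper limit starts the next class)
class_limits = [40, 50, 60, 70, 80, 90, 100]

plt.hist(scores, bins=class_limits, edgecolor="black")
plt.xlabel("Scores")
plt.ylabel("Frequency")
plt.title("Histogram of test scores")
plt.show()
```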
This relationship can be depicted
using the following graphs:
• Scatter plot: displays data on two numerical random variables on an x-y graph.
For example, we can construct a scatter plot of the number of hours spent
studying for an exam against the marks obtained.
• Trendline graph: data are plotted over time. For example, consider the amount
of electricity used over 12 months.
• This is a single value that gives us an idea of the center of the data, i.e. where most of our data
points fall.
1. Mean for ungrouped data
• The sample mean is calculated as
$$\bar{x} = \frac{\sum x}{n}$$
• The population mean is calculated as
$$\mu = \frac{\sum x}{N}$$
Note that when calculating the population mean we divide by capital N because we use all the population units, while for a sample we divide by the (smaller) sample size n.
Example 1
• The manager of a local restaurant is interested in the number of people who eat there on Fridays. Here are the totals for nine
randomly selected Fridays.
712 626 600 596 655 682 642 532 526
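A quick check of the sample mean for Example 1 (my own sketch, not from the slides):

```python
# Numbers of Friday customers from Example 1
counts = [712, 626, 600, 596, 655, 682, 642, 532, 526]

# Sample mean: x-bar = sum(x) / n
x_bar = sum(counts) / len(counts)
print(f"Sample mean = {x_bar:.2f}")
```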
Example
• Use example 1 data to find mean of grouped data.
• After the values have been ordered from lowest to highest, the median of a set of
data is a value that divides the set of data into two equal groups. What is the
location of a road's median? In the middle of the road.
• When looking for a median, there are two possibilities. There will be one value in
the center of the data if there are an odd number of items in the collection, and this
value is the median.
• If the number of values in the collection of data is even, however, there is no one
value in the middle. We take the mean of the two numbers in the center in this
situation.
Example
• Using the above data on the number of people and the units of electricity, find the
median
The formula used to calculate the median for grouped data is as follows:
$$M_c = O_{me} + \frac{c\left[\frac{n}{2} - f(<)\right]}{f_{me}}$$
Where:
$M_c$ is the median of the grouped data
$O_{me}$ is the lower limit of the median interval
$c$ is the class width
$n$ is the sample size
$f_{me}$ is the frequency count of the median interval
$f(<)$ is the cumulative frequency count of all intervals before the median interval
Example
Use the data on the test score to calculate the mean and the median.
For the median, the median interval is 70-80, so $O_{me} = 70$, $c = 10$, $n = 32$, $f_{me} = 9$, and $f(<) = 9$.
$$M_c = 70 + \frac{10\left[\frac{32}{2} - 9\right]}{9} = 77.78$$
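A small sketch (mine) of the grouped-median formula applied to the test-score frequency table (classes 40-50 up to 90-100 with frequencies 3, 2, 4, 9, 7, 7):

```python
# Grouped frequency table for the 32 test scores
classes = [(40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
freqs   = [3, 2, 4, 9, 7, 7]

n = sum(freqs)     # 32
half = n / 2       # 16

# Locate the median interval: the first class whose cumulative frequency reaches n/2
cum = 0
for (lower, upper), f in zip(classes, freqs):
    if cum + f >= half:
        o_me, c, f_me, f_less = lower, upper - lower, f, cum
        break
    cum += f

median = o_me + c * (half - f_less) / f_me
print(f"Grouped median = {median:.2f}")   # 77.78
```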
Weights   Frequency
10-20     0
20-30     7
30-40     12
40-50     23
50-60     16
60-70     2
• The mode is the value in the data set that appears most frequently. It is also referred to as the most
typical case.
• Unimodal data is a set of data having only one value that occurs with the greatest frequency.
• When two values occur with the same maximum frequency in a data set, both values are regarded
as modes, and the data set is called bimodal.
• A data set is considered to be multimodal if it contains more than two values that occur with the same
maximum frequency. Each such value is taken as a mode.
• A data set is said to have no mode if no data value appears more than once. So a data set can have
more than one mode, or none at all.
Properties and uses of the Mode
1. When the most common scenario is wanted, the mode is employed.
2. The mode is the most straightforward average to calculate.
3. When the data is nominal or categorical, such as religious preference, gender, or political affiliation, the
mode can be employed.
4. The mode isn't necessarily unique. A data set may contain more than one mode, or it may not have any at
all.
• Find the mode for the number of branches that six banks have.
• The data below show the number of licensed cars in Sol Plaatje for a recent 15-year period. Find the mode.
104 104 104 104 104 107 109 109 109 110 109 111 112 111 109
The mode for grouped data is calculated as
$$M_o = o_{mo} + \frac{c\,(f_m - f_{m-1})}{2f_m - f_{m-1} - f_{m+1}}$$
Where:
$o_{mo}$ is the lower limit of the modal interval
$c$ is the class width
$f_m$ is the frequency of the modal interval
$f_{m-1}$ is the frequency of the interval before the modal interval
$f_{m+1}$ is the frequency of the interval after the modal interval
For the test-score data:
• $o_{mo} = 70$
• $c = 10$
• $f_m = 9$
• $f_{m-1} = 4$
• $f_{m+1} = 7$
$$M_o = 70 + \frac{10\,(9 - 4)}{(2 \times 9) - 4 - 7} = 77.14$$
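The same calculation as a short Python check (my own sketch):

```python
# Modal interval for the test scores is 70-80, which has the highest frequency
o_mo, c = 70, 10                 # lower limit and width of the modal interval
f_m, f_prev, f_next = 9, 4, 7    # frequencies of the modal, previous, and next intervals

mode = o_mo + c * (f_m - f_prev) / (2 * f_m - f_prev - f_next)
print(f"Grouped mode = {mode:.2f}")   # 77.14
```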
• The median divides a set of data in half. Quartiles are the ideal
metrics to use if we want to divide a set of data into quarters.
• The value that divides the first quarter of a data set from the
remainder is known as the first quartile, or 𝑄1 .
• The value that divides the last quarter of a data set from the rest is
the third quartile, abbreviated 𝑄3 .
• A set of data is divided into two equal groups by the median. The "median"
of the first group is the first quartile, while the "median" of the second
group is the third quartile.
Here are the maths scores for 19 randomly selected students. Find the median, as well as the first quartile and third quartile.
480 370 540 660 650 710 470 490 630 390 430 320 470 400 430 570 450 470 530
Solution
• The first step is to put the values in ascending order and find the median as before.
• Since there are 19 values, we will have two groups of nine values with one value left in the middle. That middle value is the
median.
• First Group: 320 370 390 400 430 430 450 470 470 Median: 470
• Second Group: 480 490 530 540 570 630 650 660 710
• The median is 470. Now to find the first quartile, we need to find the “median” of the first group of nine values. This first group
will be broken into two groups of four values with one value left in the middle. That middle value is the first quartile, 𝑄1 .
320 370 390 400 430 430 450 470 470
• The first quartile is 430. This is the score that separates the first quarter of the values from the rest.
• To find the third quartile, 𝑄3 , we repeat the same procedure with the second group of nine values.
480 490 530 540 570 630 650 660 710
• The third quartile is 570. This is the score that separates the last quarter of the data from the rest.
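A sketch (mine, not from the slides) of the median-of-halves method used above, applied to the 19 maths scores:

```python
scores = sorted([480, 370, 540, 660, 650, 710, 470, 490, 630, 390,
                 430, 320, 470, 400, 430, 570, 450, 470, 530])

def middle(values):
    """Median of an already-sorted list."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2

median = middle(scores)
lower_half = scores[:len(scores) // 2]        # values below the median position
upper_half = scores[(len(scores) + 1) // 2:]  # values above the median position

q1, q3 = middle(lower_half), middle(upper_half)
print(median, q1, q3)   # 470 430 570
```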
For grouped data, the quartiles are calculated as
$$Q_1 = O_{Q_1} + \frac{c\left(\frac{n}{4} - f(<)\right)}{f_{Q_1}}$$
$$Q_3 = O_{Q_3} + \frac{c\left(\frac{3n}{4} - f(<)\right)}{f_{Q_3}}$$
The range indicates the distance between the lowest and highest values. The range has the drawback of being sensitive to outliers. If there is
an outlier in a collection of data, the range uses it in its calculations. Another issue with range is that two sets of data can be distributed in
completely different ways while still having the same range.
• Here are the scores of two golfers from last month’s matches.
Jack 71 72 73 71 73 82 70 72 68
Greg 75 73 77 78 78 81 74 71 85
Both golfers have a 14-stroke range, yet their scores are spread out very
differently. Jack's scores are mostly in the range of 68 to 73, with an 82 as an
anomaly, and Greg's scores range from 71 to 85.
The range should only be used as a starting point when looking at the dispersion of
a set of data. Although it can give us an indication of how dispersed the values are,
it cannot provide us with a complete picture.
The interquartile range, or the distance between the first and third quartiles, is another measure
of dispersion.
Rather than showing us how far away the two extreme values are, it shows us the range in which
the middle 50% of the values can be found. It is not affected by outliers.
𝐼𝑄𝑅 = 𝑄3 − 𝑄1
Example:
Let us consider the example from the quartiles section with the 19 maths scores.
We found the following:
𝑚𝑒𝑑𝑖𝑎𝑛 = 470
𝑓𝑖𝑟𝑠𝑡 𝑞𝑢𝑎𝑟𝑡𝑖𝑙𝑒 = 430
𝑡ℎ𝑖𝑟𝑑 𝑞𝑢𝑎𝑟𝑡𝑖𝑙𝑒 = 570
Therefore;
𝐼𝑄𝑅 = 570 − 430 = 140
• To calculate the mean deviation for a set of data, first determine the distance between each value
and the mean. The mean of these distances is then calculated. This is the mean deviation of the
data.
• In a nutshell, it informs us how distant the values are, on average, from the data's center. Because
distance is a non-negative quantity, we use the absolute value of the difference between each value
and the mean to calculate each distance. The formula is as follows:
$$MD = \frac{\sum |x_i - \bar{x}|}{n}$$
• Calculating the Mean Deviation
Although this calculation uses sample notation, the process for calculating the mean deviation of a
population is the same. Simply change x̄ to 𝜇 and n to N.
The steps for this computation are as follows.
1. Determine the mean.
2. Find the difference between each value and the mean.
3. Take the absolute value of each difference.
4. Add up the distances.
5. Divide the total of the distances by the number of values in the set.
1. An Uber owner is interested in the number of fares for his drivers on
Fridays. He randomly selects seven drivers, and then randomly selects
one Friday for each of the drivers. Here are the numbers of fares for
each. Find the mean deviation for these totals.
32 27 30 41 29 38 34
2. Here is a list of the JSE’s daily volumes for one week of trading, in
millions of shares. Find the mean deviation for these values.
669 754 752 771 835
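A short sketch (mine) that applies the five steps above to the first exercise:

```python
# Numbers of Friday fares for the seven Uber drivers (exercise 1)
fares = [32, 27, 30, 41, 29, 38, 34]

mean = sum(fares) / len(fares)               # step 1: the mean
distances = [abs(x - mean) for x in fares]   # steps 2-3: absolute differences
md = sum(distances) / len(fares)             # steps 4-5: average distance

print(f"Mean = {mean:.2f}, Mean deviation = {md:.2f}")
```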
• Variance is another measure of dispersion that measures from the inside out. With two exceptions, variance is like mean deviation.
• The first difference between these two measures is that instead of calculating the absolute value, we square the difference between each value and
the mean.
• Another distinction is that there are two formulas to use depending on whether we’re looking for a sample or population variance. The two
formulas are listed below,
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \quad \text{(sample variance)}$$
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \quad \text{(population variance)}$$
• Calculating Variance
• It's worth noting that the two formulas differ slightly. The sample variance formula requires us to subtract 1 from the sample size in the
denominator, whereas the population variance formula uses the population size as the denominator without subtracting 1. Why the
distinction? Subtracting 1 from the sample size makes the sample variance an unbiased estimator of the population variance.
• We must be able to distinguish whether a set of data is a sample or a population. Using the sample formula incorrectly will result in a
variance that is too large; using the population formula incorrectly will result in a variance that is too small.
• The standard deviation is the most common measure of dispersion that we will use in this
course. The square root of the variance is the standard deviation of a set of data.
$$s = \sqrt{s^2} \quad \text{(sample standard deviation)}$$
$$\sigma = \sqrt{\sigma^2} \quad \text{(population standard deviation)}$$
Examples:
1. Using the Uber owner example, find the standard deviation.
2. A student is interested in how old women are when they first get married. To estimate the
mean age, she goes into 11 randomly selected chat rooms, and asks randomly selected
women how old they were at their first marriage until she gets a response in each room. Here
are the 11 ages.
Find the standard deviation of these ages.
21 20 19 16 22 21 21 19 18 24 18
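A brief sketch (mine) computing the sample variance and standard deviation for the ages in exercise 2:

```python
import math

# Ages at first marriage (exercise 2)
ages = [21, 20, 19, 16, 22, 21, 21, 19, 18, 24, 18]

n = len(ages)
mean = sum(ages) / n
sample_var = sum((x - mean) ** 2 for x in ages) / (n - 1)  # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_var)

print(f"Sample variance = {sample_var:.2f}, sample standard deviation = {sample_sd:.2f}")
```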
This is a measure of relative variability, and its formula is as follows:
$$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\% = \frac{s}{\bar{x}} \times 100\%$$
It is expressed as a percentage of the mean, which expresses the variability of the random variable
and assists in comparing variability across different samples.
The smaller the CV, the more concentrated the data values are around their mean; a larger CV means
the data values are widely dispersed about their mean.
Examples:
From the above 2 examples, calculate the CV.
Consider the following data on the number of children born at Free State
hospitals on Christmas Day over the last 15 years.
26 26 24 23 23 23 25 23
22 22 23 22 23 25 24
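A small sketch (mine) of the CV calculation, treating the births data above as a sample:

```python
import statistics

births = [26, 26, 24, 23, 23, 23, 25, 23, 22, 22, 23, 22, 23, 25, 24]

mean = statistics.mean(births)
sd = statistics.stdev(births)    # sample standard deviation (n - 1 denominator)
cv = sd / mean * 100             # CV as a percentage of the mean

print(f"Mean = {mean:.2f}, s = {sd:.2f}, CV = {cv:.1f}%")
```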
This is called the standardization of data. It is mostly done when we have data points from
different samples or scales that we wish to compare.
• The z-score is used to measure how far a data point deviates from the mean of
the data.
• When a data value (𝑥) has a z-score that is either below –3 or above
+3, it is considered an outlier. This rule of thumb is based on the
characteristic that the values (𝑥) of a normally distributed random variable
lie within 3 standard deviations of its mean (i.e. −3 ≤ z-score ≤ +3). As a
result, x-values with z-scores beyond ±3 are considered outliers.
• Here is a list of the Johannesburg Stock Exchange’s daily volumes for
one week of trading, in millions of shares. Convert each volume to its
z-score.
669 754 752 771 835
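The slides describe the z-score but do not print its formula; the standard definition z = (x − x̄)/s is assumed in this sketch (mine), treating the five volumes as a sample:

```python
import statistics

# JSE daily volumes (millions of shares)
volumes = [669, 754, 752, 771, 835]

mean = statistics.mean(volumes)
sd = statistics.stdev(volumes)      # sample standard deviation

# z = (x - mean) / s for each volume
z_scores = [(x - mean) / sd for x in volumes]
print([round(z, 2) for z in z_scores])
```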
• A boxplot is another technique to graphically depict a set of data. The
lowest value, the first quartile, the median, the third quartile, and
the highest value are all required to produce a boxplot. These five values
are referred to as the five-number summary for a set of data. We draw a
box from the first to the third quartile above a horizontal axis. At the
median, we draw a dashed line inside the box. Finally, from the box, we
extend line segments to the lowest and highest values.
• Find the interquartile range for the following set of values.
1. 45 58 50 47 55 60 40 43 50 55 40 43 48 46 56 46
2. 260 56 65 19 63 74 63 55 105 23 30 49 13 68 31 86 101 91 70 76 15 55 50 98 35 104
57 57 17 107 98 47 49 84 98 74 33
• Construct a boxplot for a set of data that has the following five-number summary. Be
sure to label both the range and the interquartile range.
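A minimal matplotlib sketch (mine) for exercise 1's data; note that plt.boxplot computes the five-number summary internally, and its quartile method may differ slightly from the median-of-halves method used earlier:

```python
import matplotlib.pyplot as plt

values = [45, 58, 50, 47, 55, 60, 40, 43, 50, 55, 40, 43, 48, 46, 56, 46]

plt.boxplot(values, vert=False)   # horizontal boxplot above a horizontal axis
plt.xlabel("Values")
plt.title("Boxplot of exercise 1 data")
plt.show()
```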
Rule 2: For any 2 events, A and B, the probability that one or both occur is
$$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
From Table 1, we have:
Elementary event (or “sample point”) is a teacher.
Event is any set of teachers. (e.g., region, level, or combination).
Simple experiment: select 1 teacher at random, so
$$P(\text{Primary}) = \frac{1106}{1991} = 0.555$$
$$P(\text{Not Primary}) = P(\text{Secondary}) = \frac{885}{1991} = 0.445$$
The events are a Primary teacher from the South and a Primary teacher from the Far Central, so
$$P(\text{Primary in South or Far Central}) = P(\text{Primary, South}) + P(\text{Primary, Far Central})$$
$$= \frac{240}{1991} + \frac{279}{1991} = 0.121 + 0.140 = 0.261$$
Conditional Probability equals the probability of an event A given that
we know that event B has occurred.
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A, B)}{P(B)}$$
A = {E1, E3, E4}
B = {E2, E3}