Note 1
CHAPTER 1
INTRODUCTION TO STATISTICS
Every day we need to make decisions. Usually these decisions are made under conditions of
uncertainty. Many times, the situations or problems we face in the real world have no precise
or definite solution. Statistical methods help us make scientific and intelligent decisions, or
educated guesses, in such situations.
Sir John Sinclair, a Scottish landowner and president of the Board of Agriculture, introduced
the word statistics into the English language in the 1798 publication of his book on a statistical
account of Scotland. The word statistics is derived from the Latin word status, which is loosely
defined as a statesman.
MAA161 S2, 2024/25 1 | INTRODUCTION TO STATISTICS
Statistics refers to
▪ The Statistics Department released a report stating that about 390,000 out of 560,000 SPM
candidates, or 72.1%, preferred to join the workforce after the examination. [Source]
▪ The downward trend among bankrupt youths was also seen in those aged 25 to 34 where
in 2020, a total of 1,741 youths became bankrupt. This number decreased to 1,060 in 2021
and 425 this year until June. [Source]
Statistics refers to
▪ Applied statistics: Applications of theorems, formulas, rules, and laws to solve real-world
problems.
Descriptive Statistics
▪ Consists of methods for organizing, displaying, and describing data by using tables,
graphs, and summary measures.
▪ Deals with the description and analysis of a given group of data.
▪ Present information in a convenient, usable, and comprehensible form.
The chart shows the lobbying spending by five selected companies during 2014. Many
companies spend millions of dollars to win favors in Washington. According to Fortune
Magazine, “Comcast has remained one of the biggest corporate lobbyists in the country.” In
2014, Comcast spent $17 million, Google spent $16.8 million, AT&T spent $14.2 million, Verizon
spent $13.3 million, and Time Warner Cable spent $7.8 million on lobbying.
These numbers simply describe the total amounts spent by these companies on lobbying. We
are not drawing any inferences, decisions, or predictions from these data.
Hence, this data set and its presentation are an example of descriptive statistics.
A poll of 176,903 American adults, aged 18 and older, was conducted in 2014 to get their
outlook on life. According to this poll, 54.1% rated their lives “highly enough to be
considered thriving,” 42.1% said they were struggling, and 3.8% mentioned that they were
suffering. As mentioned in the chart, the margin of sampling error was ±1%.
Later notes discuss the concept of margin of error, how it is combined with these percentages
when making inferences, and how these results are applied to the entire population of
adults.
Such decision making about the population based on sample results is called inferential
statistics.
Exercise 1.1
Determine whether each statement involves descriptive or inferential statistics:
1. The average price of a 30-second advertisement during the Academy Awards show in a
recent year was 1.90 million dollars.
2. The Department of Economic and Social Affairs predicts that the population of Mexico
City, Mexico in 2030 will be 238 647 000 people.
3. A medical report stated that taking statins (a kind of drug that reduces the cholesterol
level in blood) is proven to lower heart attacks, but some people are at a slightly higher
risk of developing diabetes when taking statins.
Population/Target Population
Consists of all elements (individuals, items, or objects) whose characteristics are being studied.
Sample
A portion of the population selected for study.
Exercise 1.2
Suppose a researcher wants to study the body mass index (BMI) of the students from USM.
The population is
The sample is
Population parameter
A numerical measure (mean, median, mode, range, variance, standard deviation) calculated
for a population data set.
Sample statistic
A summary measure calculated for a sample data set.
1) Internal sources
▪ Data taken from the records of the organization itself, such as a company’s personnel
files, accounting records, etc.
▪ Accurate and reliable since records are kept by the organization itself.
▪ Example: The School of Mathematical Sciences might use data that exist in its own
records to analyse the changes in the number of MAA161 students over 10 semesters.
2) External sources
▪ Data taken from sources outside the organization.
▪ Examples: Government reports, private reporting organizations, etc.
Primary data
▪ Data that are published or released by the same organization that collected them.
▪ Primary data are preferred because they are more relevant, more elaborate, more
reliable, and more detailed.
Secondary data
▪ Data that are published by one organization but were collected by another
organization.
Census
A survey that includes every member of the population.
Sample survey
A survey that includes only a portion of the population.
▪ Sometimes conducting a survey means destroying the items included in the survey.
Example: Conducting a survey to estimate the mean life of all light bulbs.
This would burn out all the bulbs included in the survey!
Purpose of Statistics
Drawing conclusions about the population by studying the sample data set.
Example:
Marketers want to know the proportion of FB users who view the advertisements. A million
users may be watching the ads at a given time. The marketers can observe a sample of 1,000 FB
users and use the sample proportion as an estimate of the proportion of all FB users who view
their ads.
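As a rough illustration of this example, the Python sketch below simulates the idea. The population of one million users and the sample of 1,000 come from the example; the 35% viewing rate, the way users are numbered, and the random seed are assumptions made up for the simulation. In a real study the population proportion is unknown.

```python
import random

rng = random.Random(2024)     # fixed seed only so the run can be repeated

N_USERS = 1_000_000           # population size from the example
ASSUMED_PROPORTION = 0.35     # made-up value; unknown in a real study

# Assumption: users are numbered 0 to 999,999 and those numbered below the
# cutoff are the ones who view the ads.
cutoff = int(N_USERS * ASSUMED_PROPORTION)

# Observe a simple random sample of 1,000 distinct users.
sample_ids = rng.sample(range(N_USERS), 1000)

# Sample proportion = number of viewers in the sample / sample size.
viewers_in_sample = sum(user_id < cutoff for user_id in sample_ids)
sample_proportion = viewers_in_sample / len(sample_ids)

print(f"Sample proportion: {sample_proportion:.3f}")
print(f"Population proportion (assumed): {ASSUMED_PROPORTION:.3f}")
```

Running the sketch with a different seed gives a slightly different estimate, which is why a sample proportion is only an estimate of the population proportion.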
HOW TO DO IT?
2) Questionnaire
▪ Face-to-face interview
▪ By mail or e-mail
▪ By phone
Types of Samples
Representative sample
A sample that represents the characteristics of the population as closely as possible.
SAMPLES
▪ Nonprobability/Nonrandom samples: Judgment, Convenience
▪ Probability/Random samples: Simple random, Stratified, Systematic, Cluster
Non-random sample
A sample in which some members of the population may not have any chance of being
selected.
Random sample
▪ A sample drawn in such a way that each element of the population has a chance
of being selected.
▪ If all samples of the same size selected from a population have the same chance
of being selected, we call it simple random sampling.
Exercise 1.3
Suppose there are 60 students in the MAA161 course. I want to select 10 students to answer tutorial
questions at the front. How do I do it?
Exercise 1.4
Random Sampling
1) Simple random sampling
▪ Every individual or item from the frame has an equal chance of being selected.
▪ Selection may be with replacement (selected individual is returned to frame for
possible reselection), or without replacement (selected individual is not returned to
the frame).
▪ Example: Samples obtained from table of random numbers or computer random
number generators.
Random number generator:
https://www.calculatorsoup.com/calculators/statistics/random-number-generator.php
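A computer random number generator can also be sketched with Python's standard random module. The frame of 40 numbered employees, the sample size of 4, and the seed are assumptions chosen for illustration.

```python
import random

rng = random.Random(7)        # fixed seed only so the run can be repeated

frame = list(range(1, 41))    # hypothetical frame: employees numbered 1 to 40

# Without replacement: a selected employee is not returned to the frame,
# so nobody can be chosen twice.
without_replacement = rng.sample(frame, k=4)

# With replacement: a selected employee is returned to the frame
# and may be chosen again.
with_replacement = rng.choices(frame, k=4)

print("Without replacement:", without_replacement)
print("With replacement:   ", with_replacement)
```

Sampling without replacement is the usual choice for surveys, since interviewing the same person twice adds no new information.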
2) Systematic sampling
Example: An employer wants to select four individuals from a list of 40 employees of the company.
All employees are listed in alphabetical order. From the first 10 numbers, randomly select a
starting point: number 7. From number 7 onwards, every 10th person on the list is selected.
Features:
▪ Easy to carry out.
▪ Mainly used in factories: Every 100th item produced by a machine is tested for quality
control purposes.
▪ Not every possible sample has the same chance of being selected: with an interval of 10,
for example, persons 7 and 8 on the list can never appear in the same sample.
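The employee example above can be sketched in Python as follows. The function name systematic_sample is hypothetical, and the employees are represented only by their positions 1 to 40 in the alphabetical list.

```python
import random

def systematic_sample(frame, interval, start):
    """Return every interval-th element of frame, beginning at the
    1-based position start."""
    return frame[start - 1::interval]

employees = list(range(1, 41))   # positions 1 to 40 in the alphabetical list
interval = 10                    # 40 employees / 4 individuals needed

# In practice the starting point is drawn at random from the first 10 numbers.
random_start = random.randint(1, interval)

# The example in the notes uses starting point 7.
selected = systematic_sample(employees, interval, start=7)
print(selected)   # [7, 17, 27, 37]
```

Once the starting point is fixed, the rest of the sample is fully determined, which is what makes the method easy to carry out.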
3) Stratified sampling
Features:
▪ A population that differs widely in the possession of a characteristic is divided into
different strata.
▪ The elements in each stratum have similar characteristics.
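A minimal sketch of stratified sampling in Python is given below. The population of 60 students, the use of year of study as the stratifying characteristic, and the choice to take 10% of each stratum (proportional allocation) are all assumptions made for illustration.

```python
import random

rng = random.Random(3)   # fixed seed only so the run can be repeated

def year_of(student_number):
    """Hypothetical rule assigning each student to a year of study."""
    if student_number <= 30:
        return "Year 1"
    if student_number <= 50:
        return "Year 2"
    return "Year 3"

# Hypothetical population: students numbered 1 to 60.
population = list(range(1, 61))

# Step 1: divide the population into strata.
strata = {}
for student in population:
    strata.setdefault(year_of(student), []).append(student)

# Step 2: draw a simple random sample from every stratum
# (assumed allocation: 10% of each stratum).
sample = {year: rng.sample(members, k=len(members) // 10)
          for year, members in strata.items()}

for year, chosen in sample.items():
    print(year, chosen)
```

Every stratum contributes to the sample, so no year of study can be left out by chance.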
4) Cluster sampling
Example:
Election exit polls, where certain election districts are selected and sampled.
Feature:
Each cluster is representative of the population.
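The idea of selecting whole clusters can be sketched in Python as below. The eight districts with five voters each are made-up data, and the sketch assumes one-stage cluster sampling, in which every element of a selected cluster is surveyed; a real exit poll might instead sample again within each selected district.

```python
import random

rng = random.Random(5)   # fixed seed only so the run can be repeated

# Hypothetical population: 8 election districts (clusters), 5 voters each.
clusters = {
    f"District {d}": [f"D{d}-V{v}" for v in range(1, 6)]
    for d in range(1, 9)
}

# Step 1: randomly select whole clusters rather than individual voters.
chosen = rng.sample(sorted(clusters), k=2)

# Step 2: survey every voter in the chosen clusters.
sample = [voter for district in chosen for voter in clusters[district]]

print("Chosen clusters:", chosen)
print("Sample:", sample)
```

Unlike stratified sampling, most clusters contribute nothing to the sample, which is why each cluster should resemble the population as a whole.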
Exercise 1.5
Which sampling method was used?
Exercise 1.6
State which sampling method was used.
1. Out of 10 clinics in a city, a researcher randomly selects one and collects records for a 24-
hour period on the types of emergencies that were treated there.
2. A researcher divides a group of students according to gender, minor program, and CGPA. Six
students from each group are selected randomly to answer questions in a survey.
3. The subscribers to a magazine are numbered. Then a sample of them is selected using
random numbers.
4. Every 10th bottle of soda is selected and the amount of liquid in the bottle is measured.
The purpose is to see if the machines that fill the bottles are working properly.
5. To determine how long people exercise, a researcher interviews 5 people selected from a
dance class, 5 people selected from a weight-lifting class, 5 people selected from an
aerobics class, and 5 people from swimming classes.
7. Every seventh customer entering a shopping mall is asked to select her or his favourite
store.
8. In a large school district, a researcher numbers all the full-time teachers and then
randomly selects 30 teachers to be interviewed.
1) Sampling error
▪ The difference between the result obtained from a sample survey and the result that
would have been obtained if the whole population had been included in the survey.
▪ Can occur only in a sample survey.
2) Non-sampling errors
▪ The errors that occur in the collection, recording and tabulation of the data.
▪ Can occur in a sample survey and in a census.
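The definition of sampling error can be illustrated with a short Python sketch. The population of 200 scores is made-up data, the mean is used as the survey result, and the sketch assumes there are no non-sampling errors.

```python
import random
import statistics

rng = random.Random(11)   # fixed seed only so the run can be repeated

# Hypothetical population: scores of 200 students.
population = [rng.randint(40, 100) for _ in range(200)]
population_mean = statistics.mean(population)   # result a census would give

# Sample survey: only 20 of the 200 students are included.
sample = rng.sample(population, 20)
sample_mean = statistics.mean(sample)

# Sampling error = sample result - population result.
sampling_error = sample_mean - population_mean

# If the whole population is included (a census), the error vanishes.
census = rng.sample(population, len(population))
census_error = statistics.mean(census) - population_mean

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
print(f"Sampling error:  {sampling_error:.2f}")
print(f"Census error:    {census_error:.2f}")
```

The census error is zero because every element is included, which matches the statement that sampling error can occur only in a sample survey.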
NON-SAMPLING ERRORS
Sampling frame: The list of the members of the target population that is used to select the
sample.
Element/Member
An element or member of a sample or population is a specific subject or object about which
the information is collected.
Variable
A variable is a characteristic under study that assumes different values for different elements.
Observation/Measurement
The value of a variable for an element.
Data Set
A collection of observations on one or more variables.
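The four terms above can be mapped onto a small data structure. The Python sketch below uses a made-up data set of three students; all names and values are hypothetical.

```python
# Hypothetical data set: all names and values are made up.
data_set = {
    "Aina":  {"height_cm": 158.0, "state": "Penang"},
    "Ben":   {"height_cm": 171.5, "state": "Kedah"},
    "Chong": {"height_cm": 165.2, "state": "Perak"},
}

# Elements (members): the subjects about which information is collected.
elements = list(data_set)

# Variables: the characteristics under study.
variables = sorted({name for row in data_set.values() for name in row})

# Observation (measurement): the value of one variable for one element.
observation = data_set["Ben"]["height_cm"]

# The data set is the collection of all observations.
total_observations = sum(len(row) for row in data_set.values())

print("Elements:", elements)
print("Variables:", variables)
print("Ben's height (one observation):", observation)
print("Number of observations:", total_observations)
```

With three elements and two variables, the data set holds six observations.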
Exercise 1.7
State the terms used in Table 1.1.
VARIABLES
Categorical Numerical
For rating:
1) Numerical/Quantitative variable
A variable whose values can be measured numerically.
The data collected on a quantitative variable are called quantitative data.
Can be classified as discrete variables or continuous variables.
Example:
a) Discrete variable
▪ A variable whose values are countable and usually integer-valued.
▪ Can assume only certain values with no intermediate values.
▪ Example:
b) Continuous variable
▪ A variable whose values cannot be recorded exactly.
▪ The precision depends on the instruments.
▪ Can assume any numerical value over a certain interval or intervals.
▪ Example:
2) Categorical/Qualitative variable
A variable that cannot assume a numerical value but can be classified or ranked into two or
more nonnumeric categories. The data collected on such a variable are called qualitative data.
Example:
Types of Measurement
Four types of measurement scales are used: nominal, ordinal, interval, and ratio.
1) Nominal scale
Classifies data into distinct categories in which no ranking is implied.
2) Ordinal scale
Classifies data into distinct categories in which ranking is implied.
3) Interval scale
An ordered scale in which the difference between measurements is a meaningful quantity but
the measurements do not have a true zero point.
Example:
4) Ratio scale
An ordered scale in which the difference between measurements is a meaningful quantity and
the measurements have a true zero point.
Example:
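One way to see why a true zero point matters is to compare temperature in degrees Celsius (an interval scale) with temperature in kelvin (a ratio scale). The Python sketch below uses two made-up readings of 10 °C and 20 °C.

```python
# Two made-up temperature readings in degrees Celsius (interval scale).
c_low, c_high = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference_c = c_high - c_low

# Ratios are not: 0 degrees Celsius does not mean "no temperature",
# so 20 degrees is not "twice as hot" as 10 degrees.
ratio_c = c_high / c_low

# Kelvin has a true zero point (ratio scale), so its ratio is meaningful.
k_low, k_high = c_low + 273.15, c_high + 273.15
difference_k = k_high - k_low
ratio_k = k_high / k_low

print(f"Difference: {difference_c:.1f} C = {difference_k:.1f} K")
print(f"Ratio in Celsius: {ratio_c:.3f} (not meaningful)")
print(f"Ratio in kelvin:  {ratio_k:.3f}")
```

The difference is the same on both scales, but the ratio changes with the choice of zero, so only the kelvin ratio describes a real physical relationship.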
Exercise 1.8
Examine the table below and answer all questions.
Table 1.2: Number of fatal accidents of the transportation industry in a specific year
Transportation industry     Number of fatalities
Highway                     968
Railway                     44
Water vehicle               52
Aircraft                    151
Source: Bureau of Labor Statistics
a) What are the variables under study? Categorise each variable as quantitative or qualitative.
Then, categorise each quantitative variable as discrete or continuous.
c) The railroad had the fewest fatalities for the specific year. Does that mean railroads have
fewer accidents than the other industries?
Based on the time over which the data are collected, data can be classified as either cross-section
data or time-series data.
Cross-section data
Data collected on different elements at the same point in time or for the same period of time.
Example:
Time-series data
Data collected on the same element for the same variable at different points in time or for
different periods of time.
Example:
Pilot Study
It is carried out before the real survey to test the questionnaire.
A question that we think is simple and self-explanatory might still be misunderstood by most
people, in which case we need to alter the question.
The aim is to find and overcome any difficulties before using the real questionnaire.
Questionnaire
A designed form used to collect information.
Good questionnaire?
▪ Questions should be short and simple.
▪ Where possible, questions should call for a precise answer such as ‘yes’ or ‘no’, a number, a
measurement, a place, etc.
A statistical investigation is carried out to gain and process information to answer specific
questions about the subject being investigated, using specific steps to obtain valid results.
4. Decide what sampling method you will use to collect the data.
▪ Determine the sample size and sampling method.
▪ The choice depends on factors such as cost, time, degree of accuracy, etc.
▪ An adequate sampling method enables us to obtain the information efficiently and at
minimum expense.
9. Write a report
▪ Explain the results of the investigation.
▪ Make some recommendations.
Exercise 1.9
The use of e-cigarettes or vapes has increased dramatically. A study to investigate this
alternative smoking device was conducted. The researchers considered the following factors:
carbon monoxide concentration, heart rate, and plasma nicotine concentration. The study
consisted of 32 subjects from four separate groups: cigarettes, 18-mg nicotine cartridge vape,
16-mg nicotine cartridge vape, and a device containing no vapours. After five minutes of
smoking, the cigarette smokers saw increased levels of carbon monoxide, heart rate, and
plasma nicotine concentration. The vape smokers did not see a significant increase in heart rate
or subjective effects to the user.
d) List some possible confounding variables (variables that influence the dependent
variable).
Last but not least, we know that statistical techniques can be used to describe data, compare
two or more data sets, determine if a relationship exists between variables, test hypotheses, and
make estimates about population characteristics.
However,
In summary, statistics, when used properly, can be beneficial in obtaining much information,
but when used improperly, can lead to misinformation.