
Note 1

This document introduces the fundamentals of statistics, covering definitions, types of statistics (descriptive and inferential), and key concepts such as population vs. sample, data sources, and sampling techniques. It emphasizes the importance of statistical methods in making informed decisions under uncertainty and outlines various data collection methods. The document also includes exercises to reinforce understanding of the material presented.

Uploaded by

Nur Atikah
Copyright © All Rights Reserved
MAA161 S2, 2024/25 1 | INTRODUCTION TO STATISTICS

CHAPTER 1
INTRODUCTION TO STATISTICS

1.1 Definition of Statistics
1.2 Types of Statistics
1.3 Population vs Sample
1.4 Sources of Data
1.5 Data Collection and Sampling Techniques
1.6 Basic Terms
1.7 Types of Variables and Data
1.8 Cross-section Data vs Time-series Data
1.9 Pilot Study & Questionnaires
1.10 Steps in Statistical Investigation

Every day we need to make decisions, usually under conditions of uncertainty. Many of the
situations or problems we face in the real world have no precise or definite solution.
Statistical methods help us make scientific and informed decisions (educated guesses) in such
situations.

A Scottish landowner and president of the Board of Agriculture, Sir John Sinclair introduced
the word statistics into the English language in the 1798 publication of his book on a statistical
account of Scotland. The word statistics is derived from the Latin word status, which is loosely
defined as a statesman.

You will learn:

▪ the basic terms of statistics, and
▪ the basic concepts of statistics.

These terms and concepts lay the foundation for the concepts and techniques presented in
later classes.


1.1 Definition of Statistics

The word ‘statistics’ has two meanings.

Statistics refers to numerical facts, such as:

▪ The Statistics Department released a report stating that about 390,000 out of 560,000 SPM
candidates, or 72.1%, preferred to join the workforce after the examination. [Source]

▪ The downward trend among bankrupt youths was also seen in those aged 25 to 34 where
in 2020, a total of 1,741 youths became bankrupt. This number decreased to 1,060 in 2021
and 425 this year until June. [Source]

Statistics also refers to the field or discipline of study:

Statistics is a science of collecting, analyzing, presenting, and interpreting data, as well as of
making decisions based on such analyses.

As a field of study, statistics has two areas:

▪ Theoretical/mathematical statistics: the development, derivation, and proof of statistical
theorems, formulas, rules, and laws.

▪ Applied statistics: the application of those theorems, formulas, rules, and laws to solve
real-world problems.


1.2 Types of Statistics

Descriptive Statistics
▪ Consists of methods for organizing, displaying, and describing data by using tables,
graphs, and summary measures.
▪ Deals with the description and analysis of a given group of data.
▪ Presents information in a convenient, usable, and comprehensible form.

The chart shows the lobbying spending by five selected companies during 2014. Many
companies spend millions of dollars to win favors in Washington. According to Fortune
Magazine, “Comcast has remained one of the biggest corporate lobbyists in the country.” In
2014, Comcast spent $17 million, Google spent $16.8 million, AT&T spent $14.2 million, Verizon
spent $13.3 million, and Time Warner Cable spent $7.8 million on lobbying.

These numbers simply describe the total amounts spent by these companies on lobbying. We
are not drawing any inferences, decisions, or predictions from these data.

Hence, this data set and its presentation are an example of descriptive statistics.
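To make the idea concrete, here is a minimal Python sketch that describes the five lobbying figures quoted above. It only organizes and summarizes the data (a total, a mean, the largest spender, and a rough text bar chart standing in for the chart); nothing is inferred beyond these five companies. The variable names are our own.

```python
# Lobbying spending in 2014, in millions of US dollars (figures quoted above).
spending = {
    "Comcast": 17.0,
    "Google": 16.8,
    "AT&T": 14.2,
    "Verizon": 13.3,
    "Time Warner Cable": 7.8,
}

total = sum(spending.values())             # total spent by the five companies
mean = total / len(spending)               # average spending per company
largest = max(spending, key=spending.get)  # company with the highest spending

print(f"Total: ${total:.1f} million")
print(f"Mean:  ${mean:.2f} million")
print(f"Largest spender: {largest}")

# A simple text bar chart: one '#' per $1 million (rounded), largest first.
for company, amount in sorted(spending.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{company:<18} {'#' * round(amount)} {amount}")
```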


Inferential Statistics (Inductive Statistics)


▪ Consists of methods that use sample results to make decisions or predictions about a
population.
▪ Deals with the problems of making inferences or drawing conclusions about a population
based on information obtained from samples taken from that population.

A poll of 176,903 American adults, aged 18 and older, was conducted in 2014 to get their
outlook on life. According to this poll, 54.1% rated their lives as “highly enough to be
considered thriving,” 42.1% said they were struggling, and 3.8% mentioned that they were
suffering. As mentioned in the chart, the margin of sampling error was ±1%.

Later notes will discuss the margin of error, how it is combined with these percentages when
making inferences, and how these results can be applied to the entire population of adults.

Such decision making about the population based on sample results is called inferential
statistics.
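As a preview, and assuming the reported ±1% margin applies to each of the three percentages, the sketch below turns each sample percentage into a range that would be claimed for all American adults. How margins of error are computed, and the confidence level behind them, is left to later notes; the helper `interval` is a hypothetical name used only here.

```python
# Poll results quoted above (percent of the sample) and the reported
# margin of sampling error of +/-1 percentage point.
results = {"thriving": 54.1, "struggling": 42.1, "suffering": 3.8}
margin = 1.0


def interval(estimate, moe):
    """Return the (lower, upper) range estimate +/- moe, rounded to 0.1 percent."""
    return round(estimate - moe, 1), round(estimate + moe, 1)


for category, pct in results.items():
    low, high = interval(pct, margin)
    print(f"{category:<10} {pct:>5.1f}%  ->  between {low}% and {high}% of all adults")
```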


Exercise 1.1
Determine whether each statement involves descriptive or inferential statistics:

1. The average price of a 30-second advertisement during the Academy Awards show in a
recent year was 1.90 million dollars.

2. The Department of Economic and Social Affairs predicts that the population of Mexico
City, Mexico in 2030 will be 238 647 000 people.

3. A medical report stated that taking statins (a kind of drug that reduces the cholesterol
level in blood) is proven to lower the risk of heart attacks, but some people are at a slightly
higher risk of developing diabetes when taking statins.

4. In a survey of higher-learning institutions, 60% of the institutions offered online education
classes in 2021.


1.3 Population vs Sample

Population/Target Population
Consists of all elements (individuals, items, or objects) whose characteristics are being studied.

Sample
A portion of the population selected for study.

Population and sample

Exercise 1.2
Suppose a researcher wants to study the body mass index (BMI) of the students at USM.
The population is
The sample is

Population parameter
A numerical measure (mean, median, mode, range, variance, standard deviation) calculated
for a population data set.

Sample statistic
A summary measure calculated for a sample data set.
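The distinction between a parameter and a statistic can be sketched in Python with simulated data. The BMI values below are randomly generated for illustration (they are not real USM data), and the population size of 5,000 and sample size of 50 are arbitrary assumptions. The population mean is a parameter because it uses every element; the sample mean is a statistic because it uses only the sampled elements.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population: BMI values for 5,000 students (simulated, not real data).
population = [round(random.gauss(22.5, 3.0), 1) for _ in range(5000)]

# Population parameter: computed from every element of the population.
mu = statistics.mean(population)

# Sample statistic: computed from a simple random sample of 50 students.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

In practice only the sample mean would be known; the population mean is computed here only because the population is simulated.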


1.4 Sources of Data

1) Internal sources
▪ Data taken from the records of the organization itself, such as a company’s personnel
files, accounting records, etc.
▪ Accurate and reliable since records are kept by the organization itself.
▪ Example: School of Mathematical Sciences might use data that exists in its own records
to analyse the changes in the number of MAA161 students over 10 semesters.

2) External sources
▪ Data taken from the sources outside the organization.
▪ Examples: government reports, private reporting organizations, etc.

Primary data
▪ Data that are published or released by the same organization that collected them.
▪ Primary data are preferred because they are usually more relevant, more detailed, more
reliable, and clearer.

Secondary data
▪ Data that are published by one organization but were collected by another
organization.

NO DATA??? WHAT WILL YOU DO?

3) Surveys and experiments


Sometimes the data we need may not be available from internal or external sources. Therefore,
we need to collect data by conducting our own survey or experiment.

Census
A survey that includes every element of the target population.

Sample survey
A survey that includes only a portion of the target population.

WHICH ONE WILL YOU CHOOSE?


Sample Survey - Save time & cost!


▪ In most cases, the size of the population is quite large. Therefore, conducting a census
will take a long time. A sample survey can be conducted very quickly.
▪ A sample survey needs less manpower than a census!

Census - Mission impossible?


▪ Sometimes it is impossible to identify all members of the population.
Example: Conducting a survey of TV viewers' opinions about a program.
But we don't know exactly who watched that particular TV program!

▪ Sometimes conducting a survey means destroying the items included in the survey.
Example: Conducting a survey to estimate the mean life of all light bulbs.
This would burn out all the bulbs included in the survey!

Purpose of Statistics
Drawing conclusions about the population by studying the sample data set.

Collect information from a comparatively SMALL sample → Draw conclusions about a LARGE population

Example:
Marketers want to know the proportion of FB users who view their advertisements. A million
users may be watching the ads at a given time. The marketers can observe a sample of 1,000 FB
users and calculate the sample proportion as an estimate of the proportion of all FB users who
view their ads.
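The Facebook example can be simulated as follows. The population size, the assumed 30% viewing rate, and the seed are illustrative assumptions; in a real study the population proportion is unknown and only the sample proportion is observed.

```python
import random

random.seed(42)  # reproducible illustration

# Hypothetical population of 1,000,000 users. We ASSUME 300,000 of them (30%)
# view the ads; in practice this true proportion is unknown, which is why we sample.
N = 1_000_000
viewers = 300_000
population = [1] * viewers + [0] * (N - viewers)  # 1 = viewed the ad, 0 = did not

# Observe a simple random sample of 1,000 users.
sample = random.sample(population, 1000)
p_hat = sum(sample) / len(sample)  # sample proportion, used as the estimate

print(f"sample proportion = {p_hat:.3f}; population proportion = {viewers / N}")
```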

HOW TO DO IT?

1.5 Data Collection and Sampling Techniques


1) Observation
Decisions concerning local traffic flow are based on observations of flow made by video
cameras or teams of observers.
“Would it help to replace the crossroads by a roundabout?”

2) Questionnaire
▪ Face-to-face interview
▪ By mail or e-mail
▪ By phone


Types of Samples
Representative sample
A sample that represents the characteristics of the population as closely as possible.

SAMPLES
▪ Nonprobability/Nonrandom samples: Judgment, Convenience
▪ Probability/Random samples: Simple random, Systematic, Stratified, Cluster

Non-random sample
Some members of the population may not have any chance of being selected in the
sample.

Random sample
▪ A sample drawn in such a way that each element of the population has a chance
of being selected.
▪ If all samples of the same size selected from a population have the same chance
of being selected, we call it simple random sampling, and the sample is called a simple
random sample.

Exercise 1.3
Suppose there are 60 students in the MAA161 course. I want to select 10 students to answer
tutorial questions at the front. How do I do it?


Exercise 1.4

▪ Obtained by dividing the population into subgroups (strata) according to some characteristic
relevant to the study. Then subjects are selected at random from each subgroup.
▪ Obtained from the population based on the judgment or prior knowledge of an expert.
▪ Obtained by dividing the population into sections (clusters) and then selecting one or more
clusters at random and using all members in the cluster(s) as the members of the sample.
▪ The most accessible members of the population are selected.
▪ All members of the population have an equal chance of being selected.
▪ Obtained by selecting every kth member of the population where k is a counting number.


Random Sampling
1) Simple random sampling
▪ Every individual or item from the frame has an equal chance of being selected.
▪ Selection may be with replacement (selected individual is returned to frame for
possible reselection), or without replacement (selected individual is not returned to
the frame).
▪ Example: Samples obtained from table of random numbers or computer random
number generators.
Random number generator:
https://www.calculatorsoup.com/calculators/statistics/random-number-generator.php
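Simple random sampling is also easy to do with Python's standard `random` module: `random.sample` draws without replacement and `random.choices` draws with replacement. The frame of 60 numbered students mirrors the class size in Exercise 1.3; the seed is arbitrary.

```python
import random

random.seed(2024)  # reproducible illustration

# Sampling frame: students numbered 1 to 60 (e.g., a class list).
frame = list(range(1, 61))

# Without replacement: each student can be chosen at most once.
without_repl = random.sample(frame, 10)

# With replacement: a selected student is "returned" and may be chosen again.
with_repl = random.choices(frame, k=10)

print("without replacement:", sorted(without_repl))
print("with replacement:   ", sorted(with_repl))
```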

2) Systematic random sampling

Step 1: From the total population, N, decide on a sample size, n.
Step 2: Divide the frame of N individuals into groups of k individuals, where k = N/n.
Step 3: Randomly select one individual from the first group.
Step 4: Select every kth individual thereafter.

Example: An employer wants to select 4 individuals from a list of 40 employees of the company,
so k = 40/4 = 10. All employees are listed in alphabetical order. From the first 10 numbers, a
starting point is selected at random: number 7. From number 7 onwards, every 10th person on the
list is selected, giving employees 7, 17, 27 and 37.

Features:
▪ Easy to carry out.
▪ Mainly used in factories: every 100th item produced by a machine is tested for quality
control purposes.
▪ Not every possible sample of size n has the same chance of being selected (only k different
samples are possible), so a systematic sample is not a simple random sample.
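The steps of systematic sampling can be sketched as a small Python function. It assumes that N is a multiple of n, so that k = N // n is exact; the function name `systematic_sample` is hypothetical. Passing `start=7` reproduces the employer example above.

```python
import random


def systematic_sample(frame, n, start=None):
    """Select every k-th element of `frame`, where k = N // n (a sketch).

    `start` is a 1-based position within the first group of k elements;
    if omitted, it is chosen at random.
    """
    N = len(frame)
    k = N // n                        # sampling interval
    if start is None:
        start = random.randint(1, k)  # random start within the first group
    # Positions start, start + k, start + 2k, ... converted to 0-based indices.
    return [frame[i] for i in range(start - 1, N, k)][:n]


# Employer example: 40 employees numbered 1-40 (alphabetical order), n = 4, start = 7.
employees = list(range(1, 41))
print(systematic_sample(employees, 4, start=7))  # [7, 17, 27, 37]
```

With a random start there are only k = 10 possible samples here, one for each starting point, which is why systematic sampling is not simple random sampling.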


3) Stratified random sampling

Step 1: Divide the population into two or more subgroups (strata) according to some common
characteristics.
Step 2: Select a simple random sample from each subgroup, with sample sizes proportional to
the strata sizes.
Step 3: Combine the samples from the subgroups into one sample.


Example:
Sampling population of voters, stratifying across racial, gender or socio-economic lines.

Features:
▪ A population that differs widely in a characteristic is divided into different strata.
▪ The elements within each stratum share similar characteristics.
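A minimal sketch of stratified random sampling with proportional allocation follows. The voter strata and their sizes are invented for illustration, and `stratified_sample` is a hypothetical helper, not a library function.

```python
import random

random.seed(7)  # reproducible illustration


def stratified_sample(strata, n):
    """Proportional-allocation stratified sample (a sketch).

    `strata` maps each stratum name to the list of its members. Each stratum
    gets round(n * stratum_size / N) places, filled by a simple random sample
    drawn within that stratum. Rounding can make the total differ slightly
    from n when the proportions are not whole numbers.
    """
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_h = round(n * len(members) / N)           # size proportional to the stratum
        sample[name] = random.sample(members, n_h)  # SRS within the stratum
    return sample


# Hypothetical population of 1,000 voters split into three age strata.
voters = {
    "18-34": [f"A{i}" for i in range(500)],
    "35-54": [f"B{i}" for i in range(300)],
    "55+": [f"C{i}" for i in range(200)],
}
result = stratified_sample(voters, 50)

# Combine the subgroup samples into one sample.
combined = [member for chosen in result.values() for member in chosen]

print({name: len(chosen) for name, chosen in result.items()})  # {'18-34': 25, '35-54': 15, '55+': 10}
print(len(combined))  # 50
```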

4) Cluster random sampling

Step 1: Divide the population into several clusters, each representative of the population.
Step 2: Select a simple random sample of clusters.
Step 3: Use all items in the selected cluster(s), OR choose items from a cluster using another
probability sampling technique.

Example:
Election exit polls, where certain election districts are selected and sampled.

Feature:
Each cluster should be representative of the population.
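One-stage cluster sampling, in which randomly chosen clusters are kept in full, can be sketched as below. The ten districts of 100 voters each are hypothetical, as is the helper name `cluster_sample`.

```python
import random

random.seed(3)  # reproducible illustration


def cluster_sample(clusters, m):
    """One-stage cluster sample: pick m clusters at random and keep all their members."""
    chosen = random.sample(sorted(clusters), m)  # simple random sample of clusters
    members = [member for c in chosen for member in clusters[c]]
    return chosen, members


# Hypothetical: 10 election districts, each with 100 voters.
districts = {f"D{d}": [f"D{d}-V{v}" for v in range(100)] for d in range(1, 11)}
picked, sample = cluster_sample(districts, 3)
print(picked, len(sample))  # prints the 3 chosen districts and 300
```

The contrast with stratified sampling is in the random step: here the clusters themselves are sampled, whereas in stratified sampling every stratum contributes members.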


Exercise 1.5
Which sampling method was used?


Exercise 1.6
State which sampling method was used.

1. Out of 10 clinics in a city, a researcher randomly selects one and collects records for a
24-hour period on the types of emergencies that were treated there.

2. A researcher divides a group of students according to gender, minor program and CGPA. Six
students from each group are selected randomly to answer questions in a survey.

3. The subscribers to a magazine are numbered. Then a sample of them is selected using
random numbers.

4. Every 10th bottle of soda is selected and the amount of liquid in the bottle is measured.
The purpose is to see if the machines that fill the bottles are working properly.

5. To determine how long people exercise, a researcher interviews 5 people selected from a
dance class, 5 people selected from a weight-lifting class, 5 people selected from an
aerobics class, and 5 people from swimming classes.

6. In an educational research study, a researcher selects a library randomly and interviews
all visitors that day.

7. Every seventh customer entering a shopping mall is asked to select her or his favourite
store.

8. In a large school district, a researcher numbers all the full-time teachers and then
randomly selects 30 teachers to be interviewed.


Sampling and Non-sampling Errors

1) Sampling error
▪ The difference between the result obtained from a sample survey and the result that
would have been obtained if the whole population had been included in the survey.
▪ Can occur only in a sample survey.

2) Non-sampling errors
▪ The errors that occur in the collection, recording and tabulation of the data.
▪ Can occur in a sample survey and in a census.

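NON-SAMPLING ERRORS
▪ SELECTION ERROR: When the sampling frame is not representative of the population.
▪ RESPONSE ERROR: When the people included in the survey do not provide correct answers.
▪ NON-RESPONSE ERROR: When many of the people included in the sample do not respond to
the survey.
▪ VOLUNTARY RESPONSE ERROR: When a survey is not conducted on a randomly selected sample,
but people are invited to respond on their own (e.g., to a questionnaire published in a
magazine or online).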

Sampling frame: The list of the members of the target population that is used to select the
sample.
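Sampling error can be illustrated by simulation: draw several samples from a known (simulated) population and compare each sample mean with the population mean. The exam-mark population, sample size and seed are assumptions for illustration only. Each sample gives a different error, while a census, which uses every element, has none. Non-sampling errors do not appear in such a simulation, since every value is recorded correctly.

```python
import random
import statistics

random.seed(11)  # reproducible illustration

# Hypothetical population of 2,000 exam marks (simulated, not real data).
population = [random.gauss(65, 12) for _ in range(2000)]
mu = statistics.mean(population)  # the result a census would give

for trial in range(1, 4):
    sample = random.sample(population, 40)
    x_bar = statistics.mean(sample)
    # Sampling error = sample result - population result
    print(f"sample {trial}: mean = {x_bar:.1f}, sampling error = {x_bar - mu:+.1f}")

# A census includes every element, so there is no sampling error.
census_error = statistics.mean(population) - mu
print("census sampling error:", census_error)  # 0.0
```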


1.6 Basic Terms

Element/Member
An element or member of a sample or population is a specific subject or object about which
the information is collected.

Variable
A variable is a characteristic under study that assumes different values for different elements.

Observation/Measurement
The value of a variable for an element.

Data Set
A collection of observations on one or more variables.
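The four terms can be mapped onto a small table held in Python. The students, ages and programmes below are invented for illustration; each dictionary is one element, each key other than `student` is a variable, and each individual value is an observation.

```python
# A tiny hypothetical data set (invented values, for illustration only).
# Each row is one element; each key after "student" is a variable.
data_set = [
    {"student": "S1", "age": 19, "programme": "Mathematics"},
    {"student": "S2", "age": 21, "programme": "Physics"},
    {"student": "S3", "age": 20, "programme": "Mathematics"},
]

elements = [row["student"] for row in data_set]                # the elements/members
variables = [key for key in data_set[0] if key != "student"]  # the variables
observation = data_set[1]["age"]  # one observation: the value of "age" for element S2

print("elements:", elements)
print("variables:", variables)
print("observation (age of S2):", observation)
print("number of observations:", len(data_set) * len(variables))
```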

Exercise 1.7
State the terms used in Table 1.1.

Table 1.1: Total revenues of six companies in 2010


1.7 Types of Variables and Data

VARIABLES
▪ Categorical: Nominal, Ordinal
▪ Numerical: Discrete, Continuous

For rating:

1) Numerical/Quantitative variable
A variable whose values can be measured numerically.
The data collected on a quantitative variable are called quantitative data.
Can be classified as discrete variables or continuous variables.
Example:

a) Discrete variable
▪ A variable whose values are countable and usually integer-valued.
▪ Can assume only certain values with no intermediate values.
▪ Example:
b) Continuous variable
▪ A variable whose values cannot be counted; they are obtained by measuring, so no
recorded value is exact.
▪ The precision of a measured value depends on the instrument.
▪ Can assume any numerical value over a certain interval or intervals.
▪ Example:

2) Categorical/Qualitative variable
A variable that cannot assume a numerical value but can be classified or ranked into two or
more nonnumeric categories. The data collected on such a variable are called qualitative data.
Example:


Types of Measurement
Four types of measurement scales are used: nominal, ordinal, interval and ratio.

1) Nominal scale
Classifies data into distinct categories in which no ranking is implied.

Categorical Variables Categories


Do you have a Tiktok account?
Type of investment
Mobile provider

2) Ordinal scale
Classifies data into distinct categories in which ranking is implied.

Categorical Variables Ordered Categories


Student class designation
Faculty rank
Student grades
Product satisfaction

3) Interval scale
An ordered scale in which the difference between measurements is a meaningful quantity but
the measurements do not have a true zero point.
Example:

4) Ratio scale
An ordered scale in which the difference between measurements is a meaningful quantity and
the measurements have a true zero point.
Example:
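Why the true zero point matters can be checked numerically. The sketch below uses temperature in degrees Celsius and mass in kilograms as illustrative examples (our own choices): a ratio of two temperatures changes when the unit changes, whereas a ratio of two masses does not. It also shows that ordinal categories can be ranked; the grade ranking is an assumed example.

```python
def c_to_f(c):
    """Celsius -> Fahrenheit (interval scale: 0 degrees C is not 'no temperature')."""
    return c * 9 / 5 + 32


def kg_to_lb(kg):
    """Kilograms -> pounds (ratio scale: 0 kg means no mass)."""
    return kg * 2.20462


# Interval scale: the ratio changes with the unit, so "twice as hot" is meaningless.
print(20 / 10, c_to_f(20) / c_to_f(10))  # 2.0 1.36

# Ratio scale: the ratio survives a change of unit, so "twice as heavy" is meaningful.
print(20 / 10, kg_to_lb(20) / kg_to_lb(10))  # 2.0 2.0

# Ordinal scale: categories can be ranked, but the gaps between them are not defined.
grades = ["C", "A", "B", "A"]
rank = {"A": 1, "B": 2, "C": 3}  # hypothetical ranking: A is best
print(sorted(grades, key=rank.get))  # ['A', 'A', 'B', 'C']
```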


Exercise 1.8
Examine the table below and answer all questions.
Table 1.2: Number of fatal accidents of the transportation industry in a specific year
Transportation industry Number of fatalities
Highway 968
Railway 44
Water vehicle 52
Aircraft 151
Source: Bureau of Labor Statistics

a) What are the variables under study? Categorise each variable as quantitative or qualitative.
Then, categorise each quantitative variable as discrete or continuous.

b) What is the type of measurement for each variable?

c) The railroad had the fewest fatalities for the specific year. Does that mean railroads have
fewer accidents than the other industries?

d) Other than safety, what factors influence a person’s choice of transportation?

e) Comment on the relationship between the variables.


1.8 Cross-Section Data vs Time-Series Data

Based on the time over which the data are collected, data can be classified as either cross-section
data or time-series data.

Cross-section data
Data collected on different elements at the same point in time or for the same period of time.
Example:

Time-series data
Data collected on the same element for the same variable at different points in time or for
different periods of time.
Example:
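Two data sets quoted earlier in these notes illustrate the difference: the 2014 lobbying figures are cross-section data (several companies, one year), and the bankruptcy counts for youths aged 25 to 34 are time-series data (one group, several years). The partial-year figure of 425 is left out because it covers only part of a year. The dictionary names are our own.

```python
# Cross-section data: several elements (companies), one period (2014), $ million.
lobbying_2014 = {"Comcast": 17.0, "Google": 16.8, "AT&T": 14.2,
                 "Verizon": 13.3, "Time Warner Cable": 7.8}

# Time-series data: one element (bankrupt youths aged 25-34), several periods.
bankrupt_youths = {2020: 1741, 2021: 1060}

# Cross-section questions compare elements at the same point in time ...
print("highest spender in 2014:", max(lobbying_2014, key=lobbying_2014.get))

# ... time-series questions follow one element over time.
years = sorted(bankrupt_youths)
for prev, curr in zip(years, years[1:]):
    change = bankrupt_youths[curr] - bankrupt_youths[prev]
    pct = 100 * change / bankrupt_youths[prev]
    print(f"{prev}->{curr}: {change:+d} ({pct:+.1f}%)")
```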


1.9 Pilot Study and Questionnaires

Pilot Study
It is carried out before the real survey to test the questionnaire.

A question that we think is simple and self-explanatory might be misunderstood by many
respondents. If so, we need to reword the question.

The aim is to find and overcome any difficulties before using the real questionnaire.

Questionnaire
A designed form used to collect information.

A questionnaire is usually designed in two parts:
▪ Part 1: Details of the respondent (age, sex, marital status, occupation, etc.)
▪ Part 2: Questions related to the investigation.

What makes a good questionnaire?
▪ Questions should be short and simple.

▪ Questions should be free from unfamiliar words.

▪ Leading questions (questions that steer the respondent toward a particular answer) should
be avoided, for example:
‘Many people agree with this point, do you agree?’

▪ Questions should not require any calculations to be made.

▪ Where possible, questions should require a precise answer, such as ‘yes’ or ‘no’, a number,
a measurement, a place, etc.

▪ An objective question is preferable to a subjective question.


1.10 Steps in Statistical Investigation

A statistical investigation gathers and processes information to answer specific questions
about the subject being investigated, following specific steps to obtain valid results.

1. Formulate the purpose of the study


▪ State the problem of study clearly!

2. Identify the variables for the study


▪ Independent variables
▪ Dependent variables

3. Define the population


▪ Who is the subject of the research?

4. Decide what sampling method you will use to collect the data
▪ Determine the sample size and sampling method.
▪ Depends on factors such as cost, time, degree of accuracy, etc.
▪ An adequate sampling plan enables us to obtain the information efficiently and at
minimum expense.

5. Collect the data


▪ Face-to-face interview, mailed/emailed questionnaire, telephone interview, direct
observation, report, results of experiments, etc.

6. Summarize the data


▪ Items are counted or values are summed in different categories.
▪ Present the data in a clear and suitable form so that useful information can be
obtained easily.
▪ Examples: tables, charts, histograms, graphs, diagrams, etc.

7. Perform any statistical calculations needed


▪ Analyse the collected data using the chosen statistical methods

8. Interpret the results

9. Write a report
▪ Explain the results of the investigation.
▪ Make some recommendations.
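Steps 4, 6 and 7 can be sketched end to end in Python. Everything here is hypothetical: the population of 600 students, their transport modes and weekly travel times are simulated, and a simple random sample of 60 is just one possible choice of method and size.

```python
import random
import statistics
from collections import Counter

random.seed(5)  # reproducible illustration

# Hypothetical population: 600 students, each with a transport mode and a
# weekly travel time in minutes. All values are simulated.
modes = ["bus", "car", "walk"]
population = [(random.choice(modes), random.randint(20, 300)) for _ in range(600)]

# Step 4: choose the sampling method and size (simple random sample, n = 60).
sample = random.sample(population, 60)

# Step 6: summarize by counting items in each category (a frequency table).
freq = Counter(mode for mode, _ in sample)
for mode in modes:
    print(f"{mode:<5} {freq[mode]}")

# Step 7: perform the needed calculations, e.g., mean travel time per mode.
for mode in modes:
    times = [t for m, t in sample if m == mode]
    if times:
        print(f"mean travel time ({mode}): {statistics.mean(times):.1f} min")
```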


Exercise 1.9
The use of e-cigarettes, or vapes, has increased dramatically. A study was conducted to
investigate this alternative smoking device. The researchers considered the following factors:
carbon monoxide concentration, heart rate, and plasma nicotine concentration. The study
consisted of 32 subjects from four separate groups: cigarettes, 18-mg nicotine cartridge vape,
16-mg nicotine cartridge vape, and a device containing no vapours. After five minutes of
smoking, the cigarette smokers saw increased levels of carbon monoxide, heart rate, and
plasma nicotine concentration. The vape smokers did not see a significant increase in heart rate
or in subjective effects.

Answer the following questions:


a) What type of study was this?

b) What are the independent and dependent variables?

c) Which one was the control group?

d) List some possible confounding variables (variables that influence the dependent
variable).

e) Do you think this is a good way to study the effect of vapes?


Last but not least, we know that statistical techniques can be used to describe data, compare
two or more data sets, determine if a relationship exists between variables, test hypotheses, and
make estimates about population characteristics.

However,

“There are three types of lies—lies, damn lies, and statistics.”

Statistics can be misused or misrepresented.


How?

In summary, statistics, when used properly, can be beneficial in obtaining much information,
but when used improperly, can lead to misinformation.
