CH01 - Introduction To Statistics 2
CHAPTER 1
Introduction to Statistics
Further reading:
https://medium.com/@john_marsh7/10-awesome-reasons-why-statistics-are-important-96b87e283640
• Even though you may not have realized it, you
probably have made some statistical statements
in your everyday conversation or thinking.
• We encounter data and conclusions based on data
every day.
• Statistics is the scientific discipline that provides
methods to help us make sense of data.
• Statistical methods are used in business, medicine,
agriculture, social sciences, natural sciences, and
applied sciences, such as engineering.
• The field of statistics teaches us how to make
intelligent judgments and informed decisions in the
presence of uncertainty and variation.
• Statistics is the scientific application of mathematical
principles to the collection, analysis, and presentation
of numerical data.
• Statistics is a discipline which is concerned with:
  - designing experiments and other data collection,
  - summarizing information to aid understanding,
  - drawing conclusions from data, and
  - estimating the present or predicting the future.
• There are 2 main branches of statistics:
  - Descriptive
  - Inferential
Descriptive Statistics
• Descriptive statistics are used to describe the basic
features of the data gathered from an experimental study
in various ways.
• The techniques are commonly classified as:
  - Graphical description, in which we use graphs to summarize data.
  - Tabular description, in which we use tables to summarize data.
  - Parametric description, in which we estimate the values of certain
    parameters (such as the mean) that we assume complete the
    description of the data set.
Graphical description:
Example:
Line chart
Bar chart
Pie chart
Tabular description:
Example: Frequency Table
Score   Frequency
0       2
1       5
2       8
3       6
4       4
5       3
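A frequency table like the one above can be built by counting how often each score occurs. The sketch below is illustrative only: the raw `scores` list is hypothetical, made up so that its counts match the table, and `collections.Counter` is just one convenient way to do the tally.

```python
from collections import Counter

# Hypothetical raw scores, chosen so the counts match the table above.
scores = [0] * 2 + [1] * 5 + [2] * 8 + [3] * 6 + [4] * 4 + [5] * 3

# Tally how many times each score occurs.
frequency = Counter(scores)

# Print the table in ascending score order.
print("Score  Frequency")
for score in sorted(frequency):
    print(f"{score:<6} {frequency[score]}")

total = sum(frequency.values())  # number of observations
```

The frequencies always add up to the number of observations, which is a quick check that no score was missed.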
Parametric description:
Mean
Median
Mode
Range: infinite in both directions
Standard deviation
Skewness: 0
Kurtosis: 3
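As a rough illustration, the summary measures listed above can be computed with Python's standard `statistics` module. The data reuse the scores from the frequency table example. The skewness and kurtosis formulas are the simple moment-based (population) versions, which is an assumption, since other definitions exist. Under this definition a normal distribution has skewness 0 and kurtosis 3, which appears to be what the values on the slide refer to.

```python
import statistics

# Scores from the frequency table example (each score repeated by its frequency).
scores = [0] * 2 + [1] * 5 + [2] * 8 + [3] * 6 + [4] * 4 + [5] * 3

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
data_range = max(scores) - min(scores)
std_dev = statistics.pstdev(scores)  # population standard deviation


def central_moment(data, k):
    """Average of (x - mean)**k over the data."""
    m = statistics.mean(data)
    return sum((x - m) ** k for x in data) / len(data)


# Moment-based skewness and (non-excess) kurtosis.
skewness = central_moment(scores, 3) / std_dev ** 3
kurtosis = central_moment(scores, 4) / std_dev ** 4

print(mean, median, mode, data_range)
print(round(std_dev, 3), round(skewness, 3), round(kurtosis, 3))
```

A positive skewness indicates a longer tail on the right, and a kurtosis below 3 indicates lighter tails than a normal distribution.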
Inferential Statistics
Population & Sample
• The entire collection of individuals or objects about
which information is desired is called the population of
interest.
• A sample is a subset of the population, selected for
study in some prescribed manner.
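The relationship between a population and a sample can be sketched in a few lines. Simple random sampling is used here as one possible "prescribed manner" of selection; the population of 1,000 student IDs, the sample size of 50, and the seed are all hypothetical choices made for illustration.

```python
import random

# Hypothetical population: ID numbers of 1,000 students.
population = list(range(1, 1001))

# Fix the seed so the illustration is reproducible.
rng = random.Random(42)

# Simple random sample of 50 students, drawn without replacement.
sample = rng.sample(population, k=50)

print(len(population), len(sample))
print(set(sample) <= set(population))  # every sampled ID comes from the population
```

Because the draw is without replacement, no student appears in the sample twice.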
Data Analysis Process
• Statistics involves the collection and analysis of data.
• Both tasks are critical.
• Raw data without analysis are of little value, and even a
sophisticated analysis cannot extract meaningful
information from data that were not collected in a
sensible way.
• The data analysis process can be viewed as a sequence
of steps that lead from planning to data collection to
informed conclusions based on the resulting data.
Steps in Data Analysis Process
6 steps:
i. Understanding the nature of the problem
ii. Deciding what to measure and how to measure it
iii. Data collection
iv. Data summarization and preliminary analysis
v. Formal data analysis
vi. Interpretation of results
i) Understanding the nature of the problem.
An understanding of the research problem.
Know the goal of the research and what questions
we hope to answer.
Have a clear direction before gathering data to lessen
the chance of being unable to answer the questions
of interest using the data collected.
ii) Deciding what to measure and how to measure it.
In some cases, the choice is obvious, e.g. in a study
of the relationship between the weight of a football
player and position played, you would need to
collect data on player weight and position.
But in other cases, the choice of information
is not as straightforward.
Example: In a study of the relationship between
preferred learning style and intelligence, how
would you define learning style and measure it
and what measure of intelligence would you
use?
It is important to carefully define the
variables to be studied and to develop
appropriate methods for determining their
values.
iii) Data collection
Decide whether an existing data source is adequate or
whether new data must be collected.
If a decision is made to use existing data (secondary
data), it is important to understand how the data were
collected and for what purpose.
If new data are to be collected (primary data), a
careful plan must be developed.
The type of analysis that is appropriate and
subsequent conclusions that can be drawn
depend on how the data are collected.
iv) Data summarization and preliminary analysis
Summarizing the data graphically and numerically.
This initial analysis provides insight into important
characteristics of the data and can provide guidance
in selecting appropriate methods for further analysis.
v) Formal data analysis
Select and apply the appropriate inferential
statistical methods.
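As one possible illustration of an inferential method, the sketch below estimates a population mean from a sample and attaches an approximate 95% confidence interval. The player weights are hypothetical, not real data, and the interval uses a normal approximation as a simplifying assumption; for a sample this small, a critical value from the t distribution would usually be preferred.

```python
import math
import statistics

# Hypothetical sample: weights (kg) of 12 players; not real data.
sample = [82.1, 90.4, 77.8, 95.0, 88.3, 84.6, 91.2, 79.5, 86.7, 93.1, 80.9, 87.4]

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
standard_error = sample_sd / math.sqrt(n)

# Critical value for 95% confidence from the standard normal distribution.
z = statistics.NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(f"95% CI for the mean: ({lower:.1f}, {upper:.1f})")
```

The interval expresses the uncertainty that comes from observing a sample rather than the whole population.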
vi) Interpretation of results
What conclusions can be drawn from the analysis?
How do the results of the analysis inform us about the
stated research problem or question?
How can our results guide future research?
Data Sources:
Primary & Secondary Data
Primary Data
• Experiment
• Survey (questionnaire/interview)
Secondary Data
Data Types:
Qualitative & Quantitative Data
Example
• Oil Painting
Example
• Latte
Quantitative Data:
Discrete or Continuous
Examples
Example 1: Number of pages in a book is a discrete variable.
Example 2: Length of a film is a continuous variable.
Example 3: Shoe size is a discrete variable, e.g. 5, 5½, 6, 6½; values
in between do not occur.
Example 4: Temperature is a continuous variable.
Example 5: Number of people in a race is a discrete variable.
Example 6: Time taken to run a race is a continuous variable.
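The distinction can be made concrete with a small sketch. The helper `is_valid_shoe_size` is a hypothetical function written for this illustration; it assumes shoe sizes move in half-size steps, as in the shoe size example. The film lengths are made-up values.

```python
def is_valid_shoe_size(size: float) -> bool:
    """Shoe sizes come in half-size steps, so twice the size must be a whole number."""
    return float(size * 2).is_integer()


# Discrete: only isolated values are possible.
print(is_valid_shoe_size(5.5))   # a half size
print(is_valid_shoe_size(5.25))  # falls between two sizes

# Continuous: any value between two possible values is also possible.
film_a = 92.0  # length in minutes (hypothetical)
film_b = 93.0
midpoint = (film_a + film_b) / 2
print(midpoint)
```

Note that a discrete variable does not have to be whole-numbered: what matters is that there are gaps between its possible values.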
Exercise #1
State the type of data for the following cases as either Discrete or
Continuous.
Levels of Data Measurement
Each level of measurement is characterized by its properties:
Nominal Scales
Properties: Classification.
Observations reflect: Differences in kind.
Examples: gender, ethnic background.
Ordinal Scales
Properties: Classification, Order.
Observations reflect: Differences in degree.
Examples: Likert scale categories, rankings,
academic letter grade, stages in development.
Interval Scales
Properties: Classification, Order, Equal Intervals.
Observations reflect: Measurable differences in amount.
Examples: IQ scores, degrees of temperature.
However, ratios are meaningless on an interval
scale because an interval scale has no true zero.
Example:
• Temperature scales: Zero degrees Fahrenheit does
not mean the total absence of heat, so 80°F is not
twice as hot as 40°F.
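A short calculation shows why ratios break down on an interval scale. The two temperatures below are arbitrary illustrative values. The same pair gives a different ratio in Fahrenheit and in Celsius, because each scale places its zero at an arbitrary point. Kelvin, which does have a true zero, is included for comparison.

```python
def fahrenheit_to_celsius(f: float) -> float:
    return (f - 32) * 5 / 9


def fahrenheit_to_kelvin(f: float) -> float:
    return fahrenheit_to_celsius(f) + 273.15


low_f, high_f = 40.0, 80.0  # arbitrary illustrative temperatures

# The "ratio" depends entirely on which scale is used.
ratio_f = high_f / low_f
ratio_c = fahrenheit_to_celsius(high_f) / fahrenheit_to_celsius(low_f)
ratio_k = fahrenheit_to_kelvin(high_f) / fahrenheit_to_kelvin(low_f)

print(ratio_f)            # ratio on the Fahrenheit scale
print(round(ratio_c, 3))  # ratio on the Celsius scale
print(round(ratio_k, 3))  # ratio on the Kelvin scale (true zero)
```

Differences, by contrast, stay consistent: a 40-degree gap in Fahrenheit always corresponds to the same gap in Celsius, whichever two temperatures are compared.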
Ratio Scales
Properties: Classification, Order, Equal Intervals, True Zero.
Observations reflect: Measurable differences in total amount.
Examples: weight, income, family size.
Ratio data are the highest level of data measurement and the form
we are most familiar with.
Ratios are interpretable because the scale has a natural zero.
Ratio data look a lot like interval data. However, the zero point has
a special meaning in ratio-scaled data: it indicates the absence of
whatever property is being measured.
Ratio data typically have the flavor of counting.
Example:
When you measure the amount of money that you
have, you are counting up coins and bills.
When you measure your height, you are
counting the number of inches off the ground to the
top of your head.
Both ratio and interval data can be analyzed with a wide
range of statistical tools.
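The four levels build on one another: each level keeps the properties of the one below it and adds one more. The sketch below encodes the properties listed for each scale; the names `LEVEL_PROPERTIES`, `has_property`, and `ratios_are_meaningful` are hypothetical, introduced only for this illustration.

```python
# Properties of each level, as listed in the scale definitions above.
LEVEL_PROPERTIES = {
    "nominal": {"classification"},
    "ordinal": {"classification", "order"},
    "interval": {"classification", "order", "equal intervals"},
    "ratio": {"classification", "order", "equal intervals", "true zero"},
}


def has_property(level: str, prop: str) -> bool:
    """Return True if the given measurement level has the given property."""
    return prop in LEVEL_PROPERTIES[level]


def ratios_are_meaningful(level: str) -> bool:
    """Ratios need a true zero, so only ratio-level data qualify."""
    return has_property(level, "true zero")


for level, props in LEVEL_PROPERTIES.items():
    print(f"{level:<9} {sorted(props)}")
```

Checking which properties a variable has is a practical way to decide its level: ask in turn whether its values can be classified, ordered, spaced at equal intervals, and measured from a true zero.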
Summary
Source: https://www.spss-tutorials.com/measurement-levels/
"You can have data without information, but you cannot have
information without data." — Daniel K. Moran
Exercise #2
Determine which of the four levels of measurement (nominal,
ordinal, interval and ratio) is most appropriate.
a) Heights of women basketball players in a tournament.
b) Ratings of superior, above average, average, below average or
poor for blind dates.
c) Today's temperatures (in degrees Celsius) in Kuala Lumpur.
d) A movie critic’s classification of “drama, comedy, adventure”.
e) The number of bugs introduced when a programmer develops code
for a project.