CH01 - Introduction To Statistics 2

Uploaded by

mk.foo123
Copyright © All Rights Reserved

#Probability & Statistical Data Analysis/ Slides

CHAPTER 1

Introduction to Statistics

[email protected] : 2020/2021 Sem. 2


Why Study Statistics?

Further reading:
https://medium.com/@john_marsh7/10-awesome-reasons-why-statistics-are-important-96b87e283640

2
• Even though you may not have realized it, you
probably have made some statistical statements
in your everyday conversation or thinking.

• Statements like "I sleep for about eight hours
per night on average" and "You are more likely
to pass the exam if you start preparing earlier"
are actually statistical in nature.

3
• We encounter data and conclusions based on data
every day.
• Statistics is the scientific discipline that provides
methods to help us make sense of data.
• Statistical methods are used in business, medicine,
agriculture, social sciences, natural sciences, and
applied sciences, such as engineering.
• The field of statistics teaches us how to make
intelligent judgments and informed decisions in the
presence of uncertainty and variation.

4
• Statistics is the scientific application of mathematical
principles to the collection, analysis, and presentation
of numerical data.
• Statistics is a discipline that is concerned with:
  - designing experiments and other data collection,
  - summarizing information to aid understanding,
  - drawing conclusions from data, and
  - estimating the present or predicting the future.
• There are two main branches of statistics:
  - Descriptive
  - Inferential

5
Descriptive Statistics
• Descriptive statistics are used to describe the basic
features of the data gathered from an experimental study
in various ways.
• The techniques are commonly classified as:
  - Graphical description, in which we use graphs to summarize data.
  - Tabular description, in which we use tables to summarize data.
  - Parametric description, in which we estimate the values of certain parameters that we assume complete the description of the data set.
6
• Graphical description:

Example:
  - Line chart
  - Bar chart
  - Pie chart
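A graph needs no plotting library to make the idea concrete. The sketch below draws a horizontal text bar chart of the score frequencies from the frequency-table example in this chapter, one bar per score. The function name `text_bar_chart` is hypothetical, and a real analysis would normally use a plotting library instead.

```python
def text_bar_chart(freq: dict[int, int], symbol: str = "#") -> list[str]:
    """Return one line per category: the label, a bar whose length equals
    the category's frequency, and the frequency itself in parentheses."""
    lines = []
    for label in sorted(freq):
        lines.append(f"{label} | {symbol * freq[label]} ({freq[label]})")
    return lines

# Score frequencies taken from the frequency-table example.
score_freq = {0: 2, 1: 5, 2: 8, 3: 6, 4: 4, 5: 3}

for line in text_bar_chart(score_freq):
    print(line)
```

The longest bar (score 2) shows the most common value at a glance, which is exactly what a bar chart is meant to convey.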

7
• Tabular description:

Example: Frequency Table

Score   Frequency
0       2
1       5
2       8
3       6
4       4
5       3
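A frequency table is built by counting how often each distinct value occurs in the raw data. The sketch below assumes a hypothetical list of 28 raw scores, constructed only so that its counts reproduce the frequency table above, and uses `collections.Counter` to do the counting; `frequency_table` is a hypothetical helper name.

```python
import random
from collections import Counter

# Hypothetical raw scores, built only so that their counts match the
# frequency table above (28 observations), then shuffled so they look
# like data in the order it was recorded.
raw_scores = [0] * 2 + [1] * 5 + [2] * 8 + [3] * 6 + [4] * 4 + [5] * 3
random.Random(0).shuffle(raw_scores)

def frequency_table(values):
    """Count each distinct value; return (value, count) pairs sorted by value."""
    return sorted(Counter(values).items())

print("Score  Frequency")
for score, count in frequency_table(raw_scores):
    print(f"{score:>5}  {count:>9}")
```

Shuffling does not change the result: a frequency table discards the order of observations and keeps only how many times each value appears.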

8
• Parametric description:

Example (summary parameters of a normal distribution with mean μ and standard deviation σ):

Mean                 μ
Median               μ
Mode                 μ
Range                Infinite in both directions
Standard Deviation   σ
Skewness             0
Kurtosis             3
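The parameters listed above (mean, median, mode, range, standard deviation, skewness and kurtosis) can also be computed for an observed data set. The sketch below rebuilds the 28 scores of the frequency-table example and computes each measure. It assumes the population (divide-by-n) conventions: `statistics.pstdev` for the standard deviation, moment-based skewness, and non-excess kurtosis (the version that equals 3 for a normal distribution). Other software may use sample-adjusted formulas and report slightly different values; `moment` and `summary` are hypothetical names.

```python
import statistics

# Expand the frequency table from the tabular-description example
# back into individual observations (28 scores in total).
freq = {0: 2, 1: 5, 2: 8, 3: 6, 4: 4, 5: 3}
data = [score for score, count in freq.items() for _ in range(count)]

def moment(values, k):
    """k-th central moment, using the population (divide-by-n) convention."""
    m = statistics.fmean(values)
    return sum((x - m) ** k for x in values) / len(values)

summary = {
    "mean": statistics.fmean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    "range": max(data) - min(data),
    "std_dev": statistics.pstdev(data),  # population standard deviation
    # Moment-based skewness: 0 for a symmetric distribution such as the normal.
    "skewness": moment(data, 3) / moment(data, 2) ** 1.5,
    # Non-excess kurtosis: 3 for a normal distribution.
    "kurtosis": moment(data, 4) / moment(data, 2) ** 2,
}

for name, value in summary.items():
    print(f"{name:>9}: {value:.3f}")
```

For these scores the mean is 2.5, the median and mode are both 2, and the skewness is slightly positive (a short tail toward the higher scores), so unlike the normal distribution the three centre measures do not coincide.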

9
Inferential Statistics

• Inferential statistics are used to draw inferences about a population from a sample.
• It includes:
  - point estimation
  - interval estimation
  - hypothesis testing (or significance testing)
  - prediction
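Point and interval estimation can be illustrated with the sleep example from the start of the chapter. The sketch uses a hypothetical sample (the numbers are invented for illustration), takes the sample mean as the point estimate of the population mean, and builds an approximate 95% interval from the standard error and the normal (z) critical value from `statistics.NormalDist`. For a sample this small, a t critical value would normally be preferred; using z here is a simplifying assumption, and `mean_with_ci` is a hypothetical name.

```python
import math
import statistics

# Hypothetical sample: hours of sleep per night reported by 8 people.
sleep_hours = [7.5, 8.0, 6.5, 8.5, 7.0, 9.0, 8.0, 7.5]

def mean_with_ci(sample, confidence=0.95):
    """Point estimate of the population mean plus an approximate
    confidence interval based on the normal (z) distribution."""
    n = len(sample)
    point = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # sample SD / sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return point, (point - z * std_err, point + z * std_err)

estimate, (low, high) = mean_with_ci(sleep_hours)
print(f"point estimate: {estimate:.2f} h")
print(f"approx. 95% interval: ({low:.2f}, {high:.2f}) h")
```

The point estimate is a single best guess; the interval expresses how much that guess could plausibly vary from sample to sample.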

10
Population

Sample

11
Population & Sample
• The entire collection of individuals or objects about
which information is desired is called the population of
interest.
• A sample is a subset of the population, selected for
study in some prescribed manner.
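The slide does not say which "prescribed manner" is used. One common choice is simple random sampling, in which every subset of the chosen size is equally likely to be selected. A minimal sketch, assuming a hypothetical population of 1,000 numbered individuals and a sample size of 50:

```python
import random

# Hypothetical population: ID numbers of 1,000 individuals.
population = list(range(1, 1001))

# A simple random sample of 50 individuals, drawn without replacement.
# Fixing the seed makes the draw reproducible.
rng = random.Random(2021)
sample = rng.sample(population, k=50)

print(len(population), "individuals in the population")
print(len(sample), "individuals in the sample, e.g.", sorted(sample)[:5])
```

The sample is a subset of the population, and no individual can be selected twice.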

12
13
14
Data Analysis Process

15
• Statistics involves the collection and analysis of data.
• Both tasks are critical.
• Raw data without analysis are of little value, and even a
sophisticated analysis cannot extract meaningful
information from data that were not collected in a
sensible way.
• The data analysis process can be viewed as a sequence
of steps that lead from planning to data collection to
informed conclusions based on the resulting data.

16
Steps in Data Analysis Process

6 steps:
i. Understanding the nature of the problem
ii. Deciding what to measure and how to measure it
iii. Data collection
iv. Data summarization and preliminary analysis
v. Formal data analysis
vi. Interpretation of results

17
i) Understanding the nature of the problem
  - Develop an understanding of the research problem.
  - Know the goal of the research and what questions we hope to answer.
  - Have a clear direction before gathering data, to lessen the chance of being unable to answer the questions of interest with the data collected.

18
ii) Deciding what to measure and how to measure it
  - In some cases, the choice is obvious; e.g., in a study of the relationship between the weight of a football player and the position played, you would need to collect data on player weight and position.

19
  - But in other cases, the choice of information is not as straightforward.
    Example: In a study of the relationship between preferred learning style and intelligence, how would you define learning style and measure it, and what measure of intelligence would you use?
  - It is important to carefully define the variables to be studied and to develop appropriate methods for determining their values.

20
iii) Data collection
  - Decide whether an existing data source is adequate or whether new data must be collected.
  - If a decision is made to use existing data (secondary data), it is important to understand how the data were collected and for what purpose.

21
  - If new data are to be collected (primary data), a careful plan must be developed.
  - The type of analysis that is appropriate, and the conclusions that can subsequently be drawn, depend on how the data are collected.

22
iv) Data summarization and preliminary analysis
  - Summarize the data graphically and numerically.
  - This initial analysis provides insight into important characteristics of the data and can guide the selection of appropriate methods for further analysis.

23
v) Formal data analysis
  - Select and apply the appropriate inferential statistical methods.

24
Example

25
vi) Interpretation of results
  - What conclusions can be drawn from the analysis?
  - How do the results of the analysis inform us about the stated research problem or question?
  - How can our results guide future research?

26
Data Sources:
Primary & Secondary Data

27
Primary Data

• EXPERIMENT
• SURVEY (Questionnaire/Interview)

28
Secondary Data

• EXISTING DATABASE
• RECORD REVIEW

29
Data Types:
Qualitative & Quantitative Data

Qualitative Data
  - Deals with descriptions.
  - Data can be observed but not measured.
  - Example: colors, textures, smells, tastes, appearance, beauty, etc.
  - Qualitative → Quality

Quantitative Data
  - Deals with numbers.
  - Data that can be measured.
  - Example: length, height, area, volume, weight, speed, time, temperature, humidity, sound levels, cost, members, ages, etc.
  - Quantitative → Quantity

30
Example
• Oil Painting

Qualitative data:
  - blue/green color, gold frame
  - smells old and musty
  - texture shows brush strokes of oil paint
  - peaceful scene of the country
  - masterful brush strokes

Quantitative data:
  - picture is 10" by 14"
  - with frame 14" by 18"
  - weighs 8.5 pounds
  - surface area of painting is 140 sq. in.
  - cost $300

31
Example
• Latte

Qualitative data:
  - robust aroma
  - frothy appearance
  - strong taste
  - burgundy cup

Quantitative data:
  - 12 ounces of latte
  - serving temperature 150 °F
  - serving cup 7 inches in height
  - cost $4.95

32
Quantitative Data:
Discrete or Continuous

Discrete data can only take on certain individual values.

Continuous data can take on any value in a certain range.

33
Example

Example 1
Number of pages in a book
is a discrete variable.

Example 2
Length of a film is a
continuous variable.
Example 3
Shoe size is a discrete
variable, e.g., 5, 5½, 6, 6½,
etc.; there are no sizes in between.

34
Example

Example 4
Temperature is a
continuous variable.
Example 5
Number of people in a
race is a discrete
variable.

Example 6
Time taken to run a race
is a continuous variable.

35
Exercise #1
State the type of data for the following cases as either Discrete or
Continuous.

Case                            Discrete or Continuous?
Number of matches in a box
Speed of a car
Population of a town
Length of a crocodile
Temperature of an oven
T-shirt size

36
Levels of Data Measurement

• There are four levels of data measurement.
• Ranked from top to bottom in decreasing order of complexity and information content, they are:
  - Ratio scale
  - Interval scale
  - Ordinal scale
  - Nominal/categorical scale

37
Each level of measurement is characterized by its properties:

• Nominal/categorical measurement has just one property: CLASSIFICATION.
• Ordinal measurement has two properties: CLASSIFICATION and ORDER.
• Interval measurement has three properties: CLASSIFICATION, ORDER and EQUAL INTERVALS.
• Ratio measurement has four properties: CLASSIFICATION, ORDER, EQUAL INTERVALS and TRUE ZERO.
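Because each level keeps every property of the level below it and adds exactly one more, the four levels can be encoded as an ordered type. The sketch below is illustrative only: the names `Level`, `PROPERTIES`, `ENABLES` and `properties` are hypothetical, and the "enables" descriptions summarize what the slides on each scale say the new property makes meaningful.

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, numbered by how many properties they have."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Properties in the order in which the levels acquire them.
PROPERTIES = ["CLASSIFICATION", "ORDER", "EQUAL INTERVALS", "TRUE ZERO"]

# What each property makes meaningful (summarized from the slides on each scale).
ENABLES = {
    "CLASSIFICATION": "same or different category",
    "ORDER": "greater than / less than",
    "EQUAL INTERVALS": "adding and subtracting values",
    "TRUE ZERO": "ratios such as 'twice as much'",
}

def properties(level: Level) -> list[str]:
    """Return the properties of a level: the first `level` entries of PROPERTIES."""
    return PROPERTIES[: int(level)]

for level in Level:
    print(f"{level.name}: {', '.join(properties(level))}")
    print(f"  newly meaningful: {ENABLES[properties(level)[-1]]}")
```

Using an ordered enum makes "higher level" a simple comparison, e.g. `Level.RATIO > Level.INTERVAL`.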

38
Nominal Scales
Properties: Classification.
Observations reflect: Differences in kind.
Examples: gender, ethnic background.

• Nominal scales are used to label variables that have no quantitative value.
• Numbers assigned to categories (as identification codes) have no numeric value (we cannot add, subtract, divide or multiply nominal data), and any ordering of categories is arbitrary.
39
• Since the only property of nominal data is classification, it does not tell us about differences in degree or amount.
• This is the most primitive form of measurement. The presence vs. absence of something is a form of nominal measurement (e.g., “Do you smoke?” YES, NO).
• Collection of nominal data is easy.

40
Example

41
Ordinal Scales
Properties: Classification, Order.
Observations reflect: differences in degree.
Examples: Likert scale categories, rankings,
academic letter grade, stages in development.

• The distinctive property of ordinal measurement is order.
• On a typical Likert scale, “strongly agree” represents more agreement than “agree”. However, we do not know how much more.

42
Example

43
Example

An example of a questionnaire that uses a Likert scale:

Scale: 1 = Strongly Agree, 2 = Agree, 3 = Neither, 4 = Disagree, 5 = Strongly Disagree

If the price of raw materials fell firms would reduce the price of their food products.    1  2  3  4  5
Without government regulation the firms would exploit the consumer.    1  2  3  4  5
Most food companies are so concerned about making profits they do not care about quality.    1  2  3  4  5
The food industry spends a great deal of money making sure that its manufacturing is hygienic.    1  2  3  4  5
Food companies should charge the same price for their products throughout the country.    1  2  3  4  5
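Because ordinal data have order but not equal gaps between categories, a common way to summarize Likert responses is with the median or mode of the codes rather than their mean (a mean assumes the step from 1 to 2 equals the step from 4 to 5). The slide does not prescribe an analysis; this is a sketch assuming hypothetical responses from nine people to a single statement. `statistics.median_low` is used so the result is always an actual category code.

```python
import statistics

# Labels for the 5-point scale used in the questionnaire above.
LABELS = {1: "Strongly Agree", 2: "Agree", 3: "Neither",
          4: "Disagree", 5: "Strongly Disagree"}

# Hypothetical responses from 9 people to a single statement.
responses = [1, 2, 2, 3, 2, 4, 1, 2, 5]

# For ordinal data, the median and mode respect order without assuming
# equal gaps between categories.
median_code = statistics.median_low(responses)
mode_code = statistics.mode(responses)

print("median response:", LABELS[median_code])
print("most common response:", LABELS[mode_code])
```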

44
Interval Scales
Properties: Classification, Order, Equal Intervals.
Observations reflect: measurable differences in amount.
Examples: IQ scores, degrees of temperature.

• Essentially, interval data are ordinal, but they have an extra property: the ability to meaningfully add and subtract measurements.
• In interval-scaled data, the gaps between the numbers are comparable, unlike with ordinal data.
• Any interval has the same meaning regardless of its location on the scale. Example: “x is five inches longer than y” has meaning regardless of the values of x and y.

45
• However, ratios are meaningless on an interval scale because an interval scale has no true zero.
• Examples:
  - Temperature scales: zero degrees Fahrenheit does not mean the total absence of heat.
  - Decibel scales: zero decibels does not mean there is no sound.
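The claim that ratios are meaningless on an interval scale can be checked numerically with the standard conversion °F = °C × 9/5 + 32. The sketch compares three temperatures in both units: a ratio of two temperatures changes with the unit, while a comparison of two intervals does not, because the conversion only rescales and shifts the scale. The temperatures chosen are arbitrary examples.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

temps_c = [10.0, 20.0, 30.0]
temps_f = [c_to_f(t) for t in temps_c]   # [50.0, 68.0, 86.0]

# Ratios are NOT preserved: 20 °C is "twice" 10 °C, but 68 °F is not twice 50 °F.
ratio_c = temps_c[1] / temps_c[0]
ratio_f = temps_f[1] / temps_f[0]

# Comparisons of intervals ARE preserved: the 10->20 gap equals the 20->30 gap
# in both units.
gap_ratio_c = (temps_c[1] - temps_c[0]) / (temps_c[2] - temps_c[1])
gap_ratio_f = (temps_f[1] - temps_f[0]) / (temps_f[2] - temps_f[1])

print(f"ratio in °C: {ratio_c:.2f}, ratio in °F: {ratio_f:.2f}")
print(f"gap ratio in °C: {gap_ratio_c:.2f}, gap ratio in °F: {gap_ratio_f:.2f}")
```

The first line prints 2.00 and 1.36, so a statement like "twice as hot" depends on the arbitrary zero of the chosen unit.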

46
Ratio Scales
Properties: Classification, Order, Equal Intervals, True Zero
Observations reflect: measurable differences in total amount
Examples: weight, income, family size.

• Ratio data are the highest level of data measurement and the form we are most familiar with.
• Ratios are interpretable, and the scale has a natural zero.
• Ratio data look a lot like interval data. However, the zero point has a special meaning in ratio-scaled data: it indicates the absence of whatever property is being measured.

47
• Ratio data always have the flavor of counting.
Example:
  - When you measure the amount of money that you have, you are counting up coins and bills.
  - When you measure your height, you are counting the number of inches from the ground to the top of your head.
• Both ratio and interval data can be analyzed with a wide range of statistical tools.
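By contrast, converting a ratio-scale measurement such as weight between units only multiplies by a constant, because both units share the same true zero, so ratios survive the conversion. The conversion factor below (about 2.20462 pounds per kilogram) is rounded, and the two weights are hypothetical.

```python
KG_TO_LB = 2.20462  # approximate pounds per kilogram

def kg_to_lb(kg: float) -> float:
    """Convert kilograms to pounds: a pure rescaling, so 0 kg maps to 0 lb."""
    return kg * KG_TO_LB

light_kg, heavy_kg = 40.0, 80.0
light_lb, heavy_lb = kg_to_lb(light_kg), kg_to_lb(heavy_kg)

# Because both scales share a true zero, "twice as heavy" holds in either unit.
print(heavy_kg / light_kg)             # 2.0
print(round(heavy_lb / light_lb, 10))  # 2.0
print(kg_to_lb(0.0))                   # 0.0
```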

48
Summary

Source: https://www.spss-tutorials.com/measurement-levels/
49
Source: https://www.spss-tutorials.com/measurement-levels/

50
"You can have data without information, but you cannot have
information without data." — Daniel K. Moran

51
Exercise #2
Determine which of the four levels of measurement (nominal,
ordinal, interval and ratio) is most appropriate for each of the following.
a) Heights of women basketball players in a tournament.
b) Ratings of superior, above average, average, below average or
poor for blind dates.
c) Today's temperatures (in degrees Celsius) in Kuala Lumpur.
d) A movie critic’s classification of “drama, comedy, adventure”.
e) The number of bugs a programmer makes when developing code
for a project.

52