
Lecture 1 Inferential Statistics


Uploaded by

Juliet

Overview: Statistical Inference, Samples, Populations, and Experimental Design


Use of Scientific Data

Use of statistical methods - involves the gathering of information or scientific data.

Inferential statistics – tool used by statistical practitioners
- method designed to contribute to the process of making scientific judgements.
DATA
- Any set of characters that is gathered and translated for some purpose, usually analysis.

- It can be any character, including text and numbers, pictures, sound, or video.

- If data is not put into context, it doesn't do anything to a human or computer.

Source: https://www.computerhope.com/jargon/d/data.htm
INFORMATION

- Data that has been processed in such a way as to be meaningful to the person who receives it.

- It is anything that is communicated.

- Data that has been converted into a more useful or intelligible form.

- It is the set of data that has been organized for direct utilization of mankind.
Source: http://ecomputernotes.com/fundamental/information-technology/what-do-you-mean-by-data-and-information
Statistical methods - used to analyze data from a process in order to gain more sense of where in the process changes may be made to improve the quality of the process.

Example 1. Manufacturing Process

Example 2. Biomedical study


Variability in Scientific Data

Inferential statistics - allow us to go beyond merely reporting data and to draw conclusions (inferences) about the scientific system.

information → samples / collections of observations → populations / scientific system

Example. Manufacturing Improvement

Important: statistical thinking by managers;
statistical inference by scientific personnel (through scientific data).

Data – provides an understanding of scientific phenomena

Engineers gain valuable insight by gathering data.
Descriptive statistics - summary of a set of data

- gives a sense of the center or location of the data
- variability of data
- general nature of the distribution of observations in a sample
- accompanied by graphics
- computation of mean, median, mode, and standard deviation that shows the nature of the sample
The Role of Probability

- Allows us to have a better understanding of statistical inference

- Elements of probability allow us to quantify the strength of our conclusions

- Provides the transition between descriptive statistics and inferential methods
EXAMPLE. Suppose that an engineer encounters data from a manufacturing
process in which 100 items are sampled and 10 are found to be defective. It is
expected and anticipated that occasionally there will be defective items.
Obviously these 100 items represent the sample. However, it has been
determined that in the long run, the company can only tolerate 5% defective
in the process. Now, the elements of probability allow the engineer to
determine how conclusive the sample information is regarding the nature of
the process. In this case, the population conceptually represents all possible
items from the process. Suppose we learn that if the process is acceptable,
that is, if it does produce items no more than 5% of which are defective, there
is a probability of 0.0282 of obtaining 10 or more defective items in a random
sample of 100 items from the process. This small probability suggests that the
process does, indeed, have a long-run rate of defective items that exceeds 5%.
In other words, under the condition of an acceptable process, the sample
information obtained would rarely occur. However, it did occur! Clearly,
though, it would occur with a much higher probability if the process defective
rate exceeded 5% by a significant amount. From this example it becomes clear
that the elements of probability aid in the translation of sample information
into something conclusive or inconclusive about the scientific system. In fact,
what was learned likely is alarming information to the engineer or manager.
Statistical methods, which we will actually detail in Chapter 10, produced a P-
value of 0.0282. The result suggests that the process very likely is not
acceptable.
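
The probability quoted in this example can be checked directly. The sketch below assumes the number of defective items in the sample follows a binomial distribution with n = 100 and p = 0.05 (the tolerated rate); the function name `binomial_upper_tail` is an illustrative choice, not something from the lecture.

```python
from math import comb


def binomial_upper_tail(n: int, p: float, k: int) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    # Add up P(X = 0), ..., P(X = k - 1), then take the complement.
    lower = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))
    return 1.0 - lower


# Assumed model: 100 sampled items, each defective with probability 0.05.
p_value = binomial_upper_tail(n=100, p=0.05, k=10)
print(f"P(10 or more defectives) = {p_value:.4f}")
```

Under these assumptions the result should agree with the 0.0282 stated in the example to four decimal places.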
How Do Probability and Statistical Inference Work Together?
Statistical inference makes use of concepts in probability.

Problems in probability
- allow us to draw conclusions about characteristics of hypothetical data
Sampling Procedures; Collection of Data
Simple Random Sampling

- implies that any particular sample of a specified sample size has the same chance of being selected as any other sample of the same size.

- aids in the elimination of the problem of having the sample reflect a different (possibly more confined) population than the one about which inferences are to be made

Example. Survey (political preference)
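
As a rough illustration of simple random sampling, the sketch below draws a sample without replacement from a made-up list of survey respondents. The population, the names `voters` and `simple_random_sample`, and the seed are all assumptions for the example.

```python
import random


def simple_random_sample(population, size, seed=None):
    """Draw `size` distinct items so that every subset of that size is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(list(population), size)


# Hypothetical population of 200 survey respondents.
voters = [f"voter_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(voters, size=10, seed=1)
print(sample)
```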


Stratified Random Sampling

- involves random selection of a sample within each stratum (group).

- purpose is to be sure that each of the strata (groups) is neither over- nor underrepresented.

- Example: Political survey by ethnic groups
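
Stratified random sampling can be sketched as a simple random sample drawn inside each group. The group names, group sizes, and the choice of sampling the same fraction from every stratum are assumptions made for illustration; other allocation rules are possible.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the given fraction within each stratum."""
    rng = random.Random(seed)
    picked = {}
    for name, members in strata.items():
        # At least one item per stratum, so no group is left out entirely.
        k = max(1, round(fraction * len(members)))
        picked[name] = rng.sample(list(members), k)
    return picked


# Hypothetical strata of unequal size.
strata = {
    "group_A": [f"A{i}" for i in range(60)],
    "group_B": [f"B{i}" for i in range(30)],
    "group_C": [f"C{i}" for i in range(10)],
}
picked = stratified_sample(strata, fraction=0.1, seed=1)
for name, members in picked.items():
    print(name, len(members), members)
```

Sampling the same fraction from each stratum keeps every group's share of the sample equal to its share of the population.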
Measures of Location: The Sample Mean and Median

Sample Mean - numerical average

Suppose that the observations in a sample are $x_1, x_2, x_3, \ldots, x_n$. The sample mean, denoted by $\bar{x}$, is

$$\bar{x} = \sum_{i=1}^{n} \frac{x_i}{n}$$
Sample Median - reflects the central tendency of the sample

Given that the observations in a sample are $x_1, x_2, x_3, \ldots, x_n$, arranged in increasing order of magnitude, the sample median is

$$\tilde{x} = \begin{cases} x_{(n+1)/2}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right), & \text{if } n \text{ is even} \end{cases}$$
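
The two definitions above translate almost line by line into code. This is a minimal sketch; the function names and the small data sets are made up for illustration.

```python
def sample_mean(xs):
    """Sum of the observations divided by the sample size n."""
    return sum(xs) / len(xs)


def sample_median(xs):
    """Middle value of the ordered sample (average of the two middle values if n is even)."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # x_{(n+1)/2} in 1-based indexing
    return 0.5 * (ordered[mid - 1] + ordered[mid])  # average of x_{n/2} and x_{n/2+1}


odd_sample = [2, 9, 4, 7, 3]   # hypothetical data, n = 5
even_sample = [2, 9, 4, 7]     # hypothetical data, n = 4
print(sample_mean(odd_sample), sample_median(odd_sample))
print(sample_mean(even_sample), sample_median(even_sample))
```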
Other Measures of Location

Trimmed mean - computed by "trimming away" a certain percentage of both the largest and the smallest values.
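
A trimmed mean can be sketched as follows. The data are hypothetical, and the rule used when the trimming percentage does not give a whole number of observations (rounding down) is an assumption; textbooks differ on that convention.

```python
def trimmed_mean(xs, percent):
    """Mean after dropping `percent` % of the values from each end of the ordered sample."""
    if not 0 <= percent < 50:
        raise ValueError("percent must be in [0, 50)")
    ordered = sorted(xs)
    n = len(ordered)
    k = int(n * percent / 100)  # values trimmed from EACH end (rounded down, an assumption)
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(trimmed_mean(data, 0))    # ordinary mean, pulled up by the value 100
print(trimmed_mean(data, 10))   # drops 1 and 100
```

Comparing the two printed values shows why the trimmed mean is less sensitive to outliers than the ordinary mean.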

Calculate the 10% trimmed mean of the following data set:
Seatwork: The following measurements were recorded for the drying time, in hours, of a certain brand of latex paint. Assume that the measurements are a simple random sample.

(a) What is the sample size for the above sample?
(b) Calculate the sample mean for these data.
(c) Calculate the sample median.
(d) Compute the 20% trimmed mean for the above data set.
Measures of Variability

Sample Range = $x_{\max} - x_{\min}$

Sample Variance - measures how far a set of numbers is spread out from its average value

$$s^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n - 1}$$

Standard Deviation - a measure used to quantify the amount of variation or dispersion of a set of data values

$$s = \sqrt{s^2}$$
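
The three measures above can be written out directly from their formulas. The function names and the data set below are illustrative assumptions; note the divisor n - 1 in the variance, as in the formula.

```python
from math import sqrt


def sample_range(xs):
    return max(xs) - min(xs)


def sample_variance(xs):
    n = len(xs)
    mean = sum(xs) / n
    # Sum of squared deviations from the mean, divided by n - 1.
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_std(xs):
    return sqrt(sample_variance(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5
print(sample_range(data), sample_variance(data), sample_std(data))
```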
What is 'Mode'

The mode is a statistical term that refers to the most frequently occurring number found in a set of numbers.

The mode is found by collecting and organizing data in order to count the frequency of each result.

The result with the highest number of occurrences is the mode of the set.
BREAKING DOWN 'Mode'
For example, in the following list of numbers, 16 is the mode since it appears more times than any other number in the set:

3, 3, 6, 9, 16, 16, 16, 27, 27, 37, 48

A set of numbers can have more than one mode if multiple numbers occur with equal frequency, and more times than the others in the set (a set with exactly two modes is called bimodal).

3, 3, 3, 9, 16, 16, 16, 27, 37, 48

In this example, both the number 3 and the number 16 are modes.

If no number in a set of numbers occurs more than once, that set has no mode:

3, 6, 9, 16, 27, 37, 48
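
The three cases above (one mode, two modes, no mode) can be reproduced with a frequency count. This is a small sketch; the function name `modes` and the choice to return an empty list when there is no mode are assumptions for the example.

```python
from collections import Counter


def modes(xs):
    """Return the most frequent value(s), or [] if no value occurs more than once."""
    if not xs:
        return []
    counts = Counter(xs)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value appears once, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([3, 3, 6, 9, 16, 16, 16, 27, 27, 37, 48]))  # one mode
print(modes([3, 3, 3, 9, 16, 16, 16, 27, 37, 48]))      # two modes (bimodal)
print(modes([3, 6, 9, 16, 27, 37, 48]))                 # no mode
```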


Example/Boardwork

An engineer is interested in testing the "bias" in a pH meter. Data are collected on the meter by measuring the pH of a neutral substance (pH = 7.0).

A sample of size 10 is taken, with results given by

7.07 6.97 7.01 7.08 7.00
7.00 7.01 7.10 7.03 6.98

Determine the sample mean, sample variance, and sample standard deviation.
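
One way to check a hand computation for this exercise is Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor. The variable names are illustrative; the ten readings are the ones listed above.

```python
import statistics

# The ten pH readings from the exercise.
ph_readings = [7.07, 6.97, 7.01, 7.08, 7.00, 7.00, 7.01, 7.10, 7.03, 6.98]

mean = statistics.mean(ph_readings)
variance = statistics.variance(ph_readings)  # sample variance, divisor n - 1
std_dev = statistics.stdev(ph_readings)      # square root of the sample variance

print(f"mean = {mean:.3f}, variance = {variance:.6f}, std dev = {std_dev:.4f}")
```

The gap between the sample mean and the true value 7.0 gives a first, informal look at the meter's bias; whether that gap is statistically meaningful is a question for the inferential methods introduced later.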
Graphical Methods and Data Description

A summary of a collection of data via a graphical display can provide insight regarding the system from which the data were taken.

Stem-and-leaf plot

- useful for studying the behaviour of the distribution
- combined tabular and graphic display
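
A stem-and-leaf plot is easy to build in code when each observation has one decimal place: the digit(s) left of the decimal point form the stem and the tenths digit forms the leaf. The sketch below assumes non-negative values recorded to the nearest tenth; the data are made up.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot for values with one decimal place."""
    groups = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)            # e.g. 2.5 -> 25
        stem, leaf = divmod(tenths, 10)   # 25 -> stem 2, leaf 5
        groups[stem].append(leaf)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(groups.items())
    ]


# Hypothetical measurements, to the nearest tenth.
values = [1.6, 2.2, 0.4, 2.5, 1.1, 2.2, 3.0, 1.9]
for line in stem_and_leaf(values):
    print(line)
```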
Double-stem-and-leaf plot
Relative Frequency Histogram
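
A relative frequency histogram is built from a relative frequency distribution: the share of observations falling in each class interval. The sketch below uses hypothetical data and class boundaries, and assumes intervals that include their lower boundary (with the last interval also including its upper boundary); it prints a crude text histogram rather than a plotted one.

```python
def relative_frequencies(values, edges):
    """Share of observations in each interval [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    n = len(values)
    return [c / n for c in counts]


values = [1.6, 2.2, 0.4, 2.5, 1.1, 2.2, 3.0, 1.9]  # hypothetical data
edges = [0, 1, 2, 3, 4]                            # hypothetical class boundaries
rel = relative_frequencies(values, edges)

for lo, hi, r in zip(edges[:-1], edges[1:], rel):
    # One '#' per 2.5 percentage points, so the bar length tracks the relative frequency.
    print(f"{lo}-{hi}: {r:.3f} {'#' * round(r * 40)}")
```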
A distribution is said to be symmetric if it can be folded along a vertical axis so that the two sides coincide.
A distribution that lacks symmetry with respect to a vertical axis is said to be skewed.

Skewness of data
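
The slides judge skewness from the shape of the histogram and do not give a numerical measure. As an assumption going beyond the lecture, the sketch below uses the sign of a moment-based sample skewness coefficient as a rough check: positive suggests a longer right tail, negative a longer left tail. The function names, data, and tolerance are illustrative.

```python
def skewness_coefficient(xs):
    """Third central moment divided by the 1.5 power of the second central moment."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def describe_skew(xs, tol=1e-9):
    g = skewness_coefficient(xs)
    if g > tol:
        return "skewed to the right"
    if g < -tol:
        return "skewed to the left"
    return "symmetric"


print(describe_skew([1, 2, 2, 3, 10]))   # long right tail
print(describe_skew([1, 8, 9, 9, 10]))   # long left tail
print(describe_skew([1, 2, 3, 4, 5]))    # balanced around the mean
```

For small samples the coefficient is noisy, so it is best read alongside the stem-and-leaf plot or histogram rather than in place of them.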
Group Quiz (Cellphones Not Allowed):

The following data represent the length of life in years, measured to the nearest tenth, of 30 similar fuel pumps:
(a) Construct a stem-and-leaf plot for the life in years of the fuel pumps, using the digit to the left of the decimal point as the stem for each observation.
(b) Set up a relative frequency distribution.
(c) Set up a relative frequency histogram.
(d) Compute the sample mean, sample median, sample range, sample variance, and sample standard deviation.
(e) Identify whether the distribution is skewed to the left, skewed to the right, or symmetric.
