
Engineering Data and Analysis-Lecture-1

1. Inferential 2. Inferential 3. Descriptive 4. Inferential 5. Inferential 6. a) Descriptive b) Inferential 1.20

Uploaded by

Chou Xi Min

What is Statistics?

1.1
What is Statistics?

“Statistics” is a way to get information from data.

Statistics is the branch of science that deals with the collection, organization, analysis, interpretation, and presentation of data.

1.2
What is Statistics?

“Statistics is a way to get information from data.”

Data → Statistics → Information

Statistics is a tool for creating new understanding from a set of numbers.

Definitions: Oxford English Dictionary

1.3
Key Statistical Concepts
Population
— a population is the group of all items of interest to
the researcher.
— frequently very large; sometimes infinite.
e.g. all registered voters in the Philippines

Sample
— a sample is a set of data drawn from the
population.
— potentially large, but less than the population.
e.g. a sample of 765 voters exit polled on election day.
1.4
Key Statistical Concepts
The diagram depicts the relationship between the population
and the sample. The big circle is the population, while the
small circle within it is the sample. This emphasizes the
requirement that all elements of the sample must belong to the
population.
Population
• Problem: What is the average expenditure of households in Metro Manila?
• Population: set of all households in Metro Manila

• Problem: What is the average expenditure of households in Quezon City?
• Population: set of all households in Quezon City

• Example of a population with people as elements:
  • set of farmers in Central Luzon

• Examples of populations with animals/objects as elements:
  • collection of milkfish cultured in Luzon
  • set of fluorescent bulbs manufactured in a month

1.6
Population and Sample
A doctor claims that taking three tablespoons of pure virgin coconut
oil daily can reduce weight. To test the doctor’s claim, a
researcher studied two groups of 25 women aged 35 to 40
years with weights between 130 and 140 pounds. He
administered three tablespoons of pure virgin coconut oil
daily for three months to only one of the two groups.
After three months, he weighed both groups of women.
• Identify the two populations of interest.
• What are the two samples?

1.7
Key Statistical Concepts
Parameter
— A descriptive measure of a population.

Statistic
— A descriptive measure of a sample.

1.8
Key Statistical Concepts
[Diagram: the sample is a subset of the population. A parameter describes the population; a statistic describes the sample.]

Populations have parameters.
Samples have statistics.

1.9
Parameter vs Statistic
Consider the case where our population consists of 40 students in a Statistics class.

The parameter of interest is p = proportion of students in this class who own an iPad.

$$p = \frac{\text{number of elements possessing a certain characteristic}}{\text{total number of elements in the collection}} = \frac{\text{number of students with an iPad in the population}}{\text{total number of students in the population}}$$

Suppose that among the 40 students, only 12 own an iPad. Thus,

$$p = \frac{\text{number of students with an iPad in the population}}{\text{total number of students in the population}} = \frac{12}{40} = 0.30 \text{ or } 30\%.$$

Suppose we were not able to collect data from all 40 students. Instead, we only
took a sample of 10 students from this class. Among the 10 students in the sample,
4 own an iPad. Can you compute the parameter p?

1.10
Parameter vs Statistic
We cannot compute the parameter p, the proportion of students in the
population who own an iPad, but we can compute $\hat{p}$ (read as “p hat”),
the proportion of students in the sample who own an iPad, as follows:

$$\hat{p} = \frac{\text{number of students with an iPad in the sample}}{\text{total number of students in the sample}} = \frac{4}{10} = 0.40 \text{ or } 40\%.$$

The proportion $\hat{p}$ of students in our sample who own an iPad is an example of a
statistic because it is a summary measure describing a characteristic of
the sample.

Suppose we redefine the population as the collection of all students
enrolled in all sections of Statistics classes, so that the class of
40 students from earlier is now just a sample taken from this new population.
Is the earlier computed proportion of 0.30 a parameter or a statistic?
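
The two proportions on these slides can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `proportion` and the printed "difference" line are illustrative additions, not part of the lecture. The counts (12 of 40, 4 of 10) are the ones given above.

```python
from fractions import Fraction


def proportion(count_with_trait: int, total: int) -> Fraction:
    """Share of elements that possess the characteristic of interest."""
    if total <= 0:
        raise ValueError("total must be positive")
    return Fraction(count_with_trait, total)


# Population: all 40 students in the class, 12 of whom own an iPad.
p = proportion(12, 40)       # parameter (describes the population)

# Sample: 10 students drawn from the class, 4 of whom own an iPad.
p_hat = proportion(4, 10)    # statistic (describes the sample)

print(f"p     = {float(p):.2f}")
print(f"p_hat = {float(p_hat):.2f}")
# The statistic need not equal the parameter it estimates.
print(f"difference (p_hat - p) = {float(p_hat - p):+.2f}")
```

The same function computes both quantities. Whether the result is called a parameter or a statistic depends only on whether the counts cover the whole population or just a sample.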

1.11
Descriptive Statistics
Descriptive statistics deals with methods of organizing,
summarizing, and presenting data.
• One form uses graphical techniques, which allow statistics
practitioners to present data in ways that make it easy for
the reader to extract useful information.
• Another form uses numerical techniques to summarize data,
such as measures of center (e.g., mean, median, mode) or
position (e.g., percentile, quartile). The range, variance,
and standard deviation measure the variability of the data.
• Note that at this stage no generalizations about the population
and no predictions based on patterns in the sample data are made.
1.12
Examples of Descriptive Statistics
• Describing or summarizing data by tables and graphs
  • example: constructing a histogram, bar graph, pie chart,
    frequency distribution table, scatterplot, trends, etc.

• Numerical descriptions of center, variability, position
  • example: computing the mean, median, mode, range,
    variance, standard deviation, percentile, quartile, strength
    of association or correlation coefficient, etc.
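
As a rough illustration of the numerical descriptions listed above, the sketch below uses Python's standard `statistics` module. The six exam scores are hypothetical values made up for this example. The variance and standard deviation are the sample versions (dividing by n − 1), which is an assumption about the convention intended.

```python
import statistics as st

# Hypothetical data: six exam scores (not taken from the lecture).
scores = [78, 85, 85, 90, 72, 94]

# Measures of center
mean = st.mean(scores)
median = st.median(scores)
mode = st.mode(scores)

# Measures of variability
data_range = max(scores) - min(scores)
variance = st.variance(scores)   # sample variance, divides by n - 1
std_dev = st.stdev(scores)       # square root of the sample variance

# Measures of position: the three quartiles (default "exclusive" method)
q1, q2, q3 = st.quantiles(scores, n=4)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, variance={variance:.2f}, std dev={std_dev:.2f}")
print(f"quartiles: Q1={q1}, Q2={q2}, Q3={q3}")
```

Everything computed here only describes the six scores themselves. No claim is made about any larger set of scores.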

1.13
Inferential statistics
Inferential statistics is a body of methods used to draw
conclusions or inferences about characteristics of populations
based on sample data.

Descriptive statistics describes the data set being analyzed,
but it does not allow us to draw conclusions or make inferences
beyond that data. Hence, we need another branch of statistics:
inferential statistics.

Inferential statistics is also a set of methods, but it is used to
draw conclusions or inferences about characteristics of
populations based on data from a sample.
1.14
Statistical Inference
Statistical inference is the process of making an estimate,
prediction, or decision about a population based on a sample.
[Diagram: a sample is drawn from the population; inference runs from the sample's statistic to the population's parameter.]

What can we infer about a population’s parameters
based on a sample’s statistics?
1.15
Statistical Inference
We use statistics to make inferences about parameters.

Therefore, we can make an estimate, prediction, or decision
about a population based on sample data.

Thus, we can apply what we know about a sample to the
larger population from which it was drawn!

1.16
Example of Inferential Statistics
During elections, candidates hire survey companies to
predict their chances of winning, i.e., to estimate the proportion
of voters who will likely vote for them.

Because they cannot ask every one of the 65 million actual
voters, they cannot predict the outcome with 100% certainty.

Instead, they take a sample that is only a small fraction of the
population, so their inferences will be correct only a
certain percentage of the time.
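
A small simulation can make this concrete. The sketch below is illustrative only: the "true" share of 52% is a hypothetical value (in a real election it is unknown), the sample size of 765 is borrowed from the exit-poll example on an earlier slide, and each voter is drawn independently, which is a simplification of real survey sampling.

```python
import random


def poll(true_share: float, sample_size: int, rng: random.Random) -> float:
    """Simulate asking `sample_size` randomly chosen voters.

    Returns the sample proportion supporting the candidate.
    """
    votes = sum(rng.random() < true_share for _ in range(sample_size))
    return votes / sample_size


rng = random.Random(2024)   # fixed seed so reruns give the same numbers
TRUE_SHARE = 0.52           # hypothetical parameter, unknown in practice
SAMPLE_SIZE = 765           # sample size from the exit-poll example

# Five independent polls of the same population give five different estimates.
estimates = [poll(TRUE_SHARE, SAMPLE_SIZE, rng) for _ in range(5)]
for i, p_hat in enumerate(estimates, start=1):
    print(f"poll {i}: p_hat = {p_hat:.3f}")
```

Each run of `poll` produces a statistic that lands near, but usually not exactly on, the parameter. This is why a poll's prediction cannot be made with 100% certainty.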

1.17
Example of Inferential Statistics
• Based on studies, there is a significant difference between
the new vaccine and the old vaccine in terms of preventing
ICU visits.

• Recent research indicates that hybrid learning
(a combination of online and face-to-face instruction) improves
students’ content mastery more than the purely traditional
face-to-face setup.

1.18
Descriptive or Inferential?
1. As a result of the ongoing war between Ukraine and Russia, gas
supplies are rationed and so we can expect the price of gasoline to
increase by 25% next year.

2. At least 5% of all fires reported last year in a certain city were
deliberately set by arsonists.

3. Of all patients who received this vaccine, 20% later developed
significant side effects.

4. According to a recent poll, an overwhelming majority of voting
citizens approve of reviving the construction of nuclear power plants.

1.19
Descriptive or Inferential?
5. A car manufacturer wishes to estimate the average lifetime of
batteries by testing a sample of 50 batteries.

6. a) A market research group wishes to determine the number of
families not eating three times a day in the sample used for their
survey.
b) A market research group wishes to determine the number of
families in the Philippines not eating three times a day based on the
sample used for their survey.

7. Janine wants to determine the variability of her six exam scores in
Algebra.

1.20
The Big Picture

image from https://online.stat.psu.edu/stat100/


1.21
Statistical Inference
Rationale:
• Large populations make investigating each member impractical
and expensive.
• Easier and cheaper to take a sample and make estimates about the
population from the sample.

However:
Such conclusions and estimates are not always going to be correct.
For this reason, we build into the statistical inference “measures of
reliability”, namely confidence level and significance level.

1.22
Confidence & Significance Levels
The confidence level is the proportion of times that an
estimating procedure will be correct.
e.g. a confidence level of 95% means that estimates based on this
form of statistical inference will be correct 95% of the time.

When the purpose of the statistical inference is to draw a
conclusion about a population, the significance level
measures how frequently the conclusion will be wrong in the
long run.
e.g. a 5% significance level means that, in the long run, this type of
conclusion will be wrong 5% of the time.
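
The "correct 95% of the time" reading of a confidence level can be checked by simulation. The sketch below is a rough illustration under stated assumptions: it uses the usual normal-approximation interval for a proportion, $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$, which these slides do not introduce, and the true proportion (0.30), sample size (200), and number of trials are hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def interval_covers(true_p: float, n: int, confidence: float,
                    rng: random.Random) -> bool:
    """Draw one sample of size n and report whether the interval built
    from it contains the true proportion."""
    successes = sum(rng.random() < true_p for _ in range(n))
    p_hat = successes / n
    # z-value leaving (1 - confidence) / 2 in each tail of the normal curve
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width <= true_p <= p_hat + half_width


rng = random.Random(1)   # fixed seed for reproducibility
TRIALS = 2000            # number of repeated samples (hypothetical)

hits = sum(interval_covers(0.30, 200, 0.95, rng) for _ in range(TRIALS))
coverage = hits / TRIALS
print(f"intervals containing the true proportion: {coverage:.1%}")
```

Across many repeated samples, the share of intervals that capture the true proportion should come out close to the stated confidence level. The remaining share corresponds to the estimates that miss.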

1.23
Confidence & Significance Levels
If we use α (Greek letter “alpha”) to represent the significance level,
then our confidence level is 1 − α.

This relationship can also be stated as:

Confidence Level + Significance Level = 1

1.24
Confidence & Significance Levels
Consider a statement from polling data you may hear about
in the news:

“This poll is considered accurate within 3.4
percentage points, 19 times out of 20.”

In this case, our confidence level is 95% (19/20 = 0.95),
while our significance level is 5%.
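
The numbers in such a statement can be related to a sample size. The sketch below is a sketch under assumptions, not the pollster's actual method: it assumes simple random sampling, the normal approximation for a proportion, and the worst-case value p = 0.5. The function names are illustrative. The quoted statement itself does not give the poll's sample size.

```python
import math
from statistics import NormalDist


def margin_of_error(n: int, confidence: float = 0.95, p: float = 0.5) -> float:
    """Approximate margin of error for a sample proportion."""
    alpha = 1 - confidence                    # significance level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    return z * math.sqrt(p * (1 - p) / n)


def sample_size_for(margin: float, confidence: float = 0.95,
                    p: float = 0.5) -> int:
    """Smallest n whose approximate margin of error is at most `margin`."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(p * (1 - p) * (z / margin) ** 2)


confidence = 19 / 20      # "19 times out of 20"
alpha = 1 - confidence    # confidence level + significance level = 1

n_needed = sample_size_for(0.034, confidence)
print(f"confidence = {confidence:.0%}, significance = {alpha:.0%}")
print(f"sample size for a 3.4-point margin (assumed model): {n_needed}")
print(f"margin of error with n = 765: {margin_of_error(765):.1%}")
```

Under these assumptions, a margin of about 3.4 percentage points at 95% confidence corresponds to a sample of several hundred respondents. This is the same order of magnitude as the 765-voter exit poll mentioned earlier.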

1.25
Statistical Applications
Statistical analysis plays an important role in virtually
every discipline.

Throughout this course, we will see applications of statistics
in business, science, and technology.

1.26
Statistical Inquiry

A statistical inquiry is a designed
research study that provides the information
needed to solve a research problem.

1.27
Assignment # 1
Look for 3 different statistical studies in the field assigned to your group.
• a) What is the title of the study?
• b) State at least 2 specific objectives of the study that the researchers aim to achieve by using
statistics.
• c) Explain how achieving the objectives stated in (b) will be useful in decision-making.
In other words, discuss the importance of achieving the stated objectives.
Fields:
1. Public administration and governance
2. Economics
3. Marketing
4. Banking and Finance
5. Medicine and Epidemiology
6. Manufacturing and Production
7. Education
8. Food science and nutrition
9. Tourism
10. Sports
1.28
