Engineering Data and Analysis-Lecture-1
1.1
What is Statistics?
1.2
What is Statistics?
Data Information
1.3
Key Statistical Concepts
Population
— a population is the group of all items of interest to
the researcher.
— frequently very large; sometimes infinite.
e.g. all registered voters in the Philippines
Sample
— a sample is a set of data drawn from the
population.
— potentially large, but less than the population.
e.g. a sample of 765 voters exit polled on election day.
1.4
Key Statistical Concepts
The diagram depicts the relationship between the population
and the sample. The big circle is the population, and the
small circle within it is the sample. This emphasizes the
requirement that every element of the sample must belong to the
population.
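The subset relationship described above can be illustrated with a short Python sketch. It assumes a hypothetical population of 10,000 voter IDs and a simple random sample of 765 of them; the population size, the seed, and the variable names are illustrative and do not come from the lecture.

```python
import random

# Hypothetical population: 10,000 voter IDs (illustrative, not from the lecture).
population = list(range(1, 10_001))

# Draw a simple random sample of 765 voters without replacement.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
sample = rng.sample(population, k=765)

# Every element of the sample must belong to the population.
is_subset = set(sample) <= set(population)
print(len(sample), is_subset)
```

Because `random.sample` draws without replacement from the population list, the subset check holds by construction.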
Population
• Problem: What is the average expenditure of households in Metro Manila?
• Population: set of all households in Metro Manila
1.6
Population and Sample
A doctor claims that three tablespoons of pure virgin coconut
oil daily can reduce weight. To test the doctor’s claim, a
researcher studied two groups of 25 women aged 35 to 40
years with weights between 130 and 140 pounds. He
administered the three tablespoons of pure virgin coconut oil
daily for a period of three months to one group of women
only. After three months, he weighed the two groups of
women.
• Identify the two populations of interest.
• What are the two samples?
1.7
Key Statistical Concepts
Parameter
— A descriptive measure of a population.
Statistic
— A descriptive measure of a sample.
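As a rough illustration of the two definitions, the sketch below computes the same measure (the mean) once on a whole population and once on a sample. The scores are randomly generated, hypothetical values, and the sizes of 40 and 10 are assumptions chosen for illustration.

```python
import random
from statistics import mean

# Hypothetical population: exam scores of 40 students (made-up values).
rng = random.Random(0)
population_scores = [rng.randint(50, 100) for _ in range(40)]

# Parameter: a descriptive measure computed from the whole population.
population_mean = mean(population_scores)

# Statistic: the same measure computed from a sample of 10 students.
sample_scores = rng.sample(population_scores, k=10)
sample_mean = mean(sample_scores)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two values will generally differ, since the statistic depends on which students happen to be sampled.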
1.8
Key Statistical Concepts
Diagram: the sample is a subset of the population.
Populations have parameters; samples have statistics.
1.9
Parameter vs Statistic
Consider the case where our population consists of the 40 students in a Statistics class.
Suppose we were not able to collect data from all 40 students. Instead, we only
took a sample of 10 students from this class. Among the 10 students in the sample,
4 own an iPad. Can you compute the parameter p?
1.10
Parameter vs Statistic
We cannot compute the parameter p, the proportion of students in the
population who own an iPad, but we can compute \hat{p} (read as “p hat”),
the proportion of students in the sample who own an iPad, as follows:
\[
\hat{p} = \frac{\text{number of students with an iPad in the sample}}{\text{total number of students in the sample}} = \frac{4}{10} = 0.40 \text{ or } 40\%.
\]
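The sample proportion from this example can be checked with a few lines of Python. The counts come from the example; the variable names are illustrative.

```python
# Counts taken from the example: 10 students sampled, 4 of whom own an iPad.
sample_size = 10
ipad_owners_in_sample = 4

# Sample proportion (the statistic "p hat").
p_hat = ipad_owners_in_sample / sample_size
print(f"p_hat = {p_hat:.2f} ({p_hat:.0%})")
```

The parameter p stays unknown because 30 of the 40 students were not observed. The data alone only tell us that p lies somewhere between 4/40 = 0.10 and 34/40 = 0.85.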
1.11
Descriptive Statistics
Descriptive statistics deals with methods of organizing,
summarizing, and presenting data.
• One form uses graphical techniques, which allow statistics
practitioners to present data in ways that make it easy for
the reader to extract useful information.
• Another form uses numerical techniques to summarize data,
such as computing measures of center (e.g., mean, median, mode)
or position (e.g., percentile, quartile). The range,
variance, and standard deviation measure the variability of
the data.
• Note that no generalizations about the population and no
predictions based on patterns in the sample data are made
yet.
1.12
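The numerical techniques listed above can be sketched with Python's standard `statistics` module. The data set is a made-up sample of 10 values, used only for illustration.

```python
import statistics as st

# Hypothetical sample of 10 measurements (made-up values for illustration).
data = [12, 15, 15, 18, 20, 21, 21, 21, 25, 32]

# Measures of center.
print("mean:  ", st.mean(data))
print("median:", st.median(data))
print("mode:  ", st.mode(data))

# Measures of position: the three quartile cut points.
print("quartiles:", st.quantiles(data, n=4))

# Measures of variability (sample variance and sample standard deviation).
print("range:   ", max(data) - min(data))
print("variance:", st.variance(data))
print("std dev: ", st.stdev(data))
```

Note that `st.variance` and `st.stdev` use the sample (n - 1) formulas. The population versions are `st.pvariance` and `st.pstdev`.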
Examples of Descriptive Statistics
• Describing or summarizing data by tables and graphs
• Examples: constructing histograms, bar graphs, pie charts,
frequency distribution tables, scatterplots, trends, etc.
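As one example of a tabular summary, a frequency distribution table could be built as in the sketch below. The responses are hypothetical categorical data, and the table layout is just one reasonable choice.

```python
from collections import Counter

# Hypothetical categorical data: preferred mode of transport of 12 respondents.
responses = ["bus", "car", "bus", "train", "car", "bus",
             "bus", "train", "car", "bus", "walk", "car"]

# Frequency distribution: count of each category.
frequencies = Counter(responses)
total = len(responses)

# Print the table, most frequent category first.
print(f"{'Category':<10}{'Freq':>6}{'Rel. freq':>11}")
for category, count in frequencies.most_common():
    print(f"{category:<10}{count:>6}{count / total:>11.2f}")
```

The table only describes the 12 observed responses; it makes no claim about any larger population.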
1.13
Inferential Statistics
Inferential statistics is a body of methods used to draw
conclusions or inferences about characteristics of populations
based on sample data.
Diagram: a statistic computed from the sample is used to make an
inference about the population parameter.
1.16
Example of Inferential Statistics
During elections, candidates hire survey companies to
predict their chances of winning, i.e. estimate the proportion
of voters who will likely vote for them.
1.17
Example of Inferential Statistics
• Based on studies, there is a significant difference between
the new vaccine and old vaccine in terms of preventing
ICU visits.
1.18
Descriptive or Inferential?
1. As a result of the ongoing war between Ukraine and Russia, gas
supplies are rationed and so we can expect the price of gasoline to
increase by 25% next year.
1.19
Descriptive or Inferential?
5. A car manufacturer wishes to estimate the average lifetime of
batteries by testing a sample of 50 batteries.
1.20
The Big Picture
However:
Such conclusions and estimates are not always going to be correct.
For this reason, we build into the statistical inference “measures of
reliability”, namely confidence level and significance level.
1.22
Confidence & Significance Levels
The confidence level is the proportion of times that an
estimating procedure will be correct.
e.g. a confidence level of 95% means that estimates based on this
form of statistical inference will be correct 95% of the time.
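The phrase "correct 95% of the time" can be illustrated by simulation. The sketch below repeatedly polls a hypothetical population and counts how often a 95% interval contains the true proportion. The true proportion of 0.40, the sample size of 765, the number of trials, and the normal-approximation (Wald) interval with z = 1.96 are all assumptions made for illustration; the lecture does not specify an interval method.

```python
import math
import random

def wald_interval(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

rng = random.Random(2024)   # fixed seed for reproducibility
true_p = 0.40               # hypothetical population proportion (assumed)
n = 765                     # sample size per simulated poll (assumed)
trials = 2000               # number of simulated polls

covered = 0
for _ in range(trials):
    # Simulate one poll: each respondent says "yes" with probability true_p.
    successes = sum(rng.random() < true_p for _ in range(n))
    low, high = wald_interval(successes, n)
    if low <= true_p <= high:
        covered += 1

coverage = covered / trials
print(f"intervals containing the true proportion: {coverage:.1%}")
```

The printed coverage should be close to, though not exactly, 95%. Any single interval either contains the true proportion or it does not; the confidence level describes the procedure over many repetitions.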
1.23
Confidence & Significance Levels
If we use α (Greek letter “alpha”) to represent significance,
then our confidence level is 1 - α.
Confidence Level + Significance Level = 1
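As a worked instance of this relationship, assume a significance level of α = 0.05 (a common choice, though not one stated in the lecture):

```latex
\[
\text{confidence level} = 1 - \alpha = 1 - 0.05 = 0.95 \quad (95\%)
\]
```

Likewise, a significance level of α = 0.01 would correspond to a confidence level of 99%.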
1.24
Confidence & Significance Levels
Consider a statement from polling data you may hear about
in the news:
1.25
Statistical Applications
Statistical analysis plays an important role in virtually all
aspects of many disciplines.
1.26
Statistical Inquiry
1.27
Assignment # 1
Look for 3 different statistical studies in the field assigned to your group.
• a) What is the title of the study?
• b) State at least 2 specific objectives of this study that the researchers will achieve by using
statistics.
• c) Explain how the achievement of the stated objectives in (b) will be useful in
decision-making. In other words, discuss the importance of the achievement of the stated objectives.
Fields:
1. Public administration and governance
2. Economics
3. Marketing
4. Banking and Finance
5. Medicine and Epidemiology
6. Manufacturing and Production
7. Education
8. Food science and nutrition
9. Tourism
10. Sports
1.28