Math236 Lecture 2

Uploaded by

tahsinefeedu

MATH 236

Engineering Statistics
2022 – 2023 Fall

Course Objective: This course aims to give students skills in collecting, analyzing, and interpreting statistical data.

Instructor: Dr. Necla Koçhan


Email: [email protected]

Textbook: Probability and Statistics for Engineers and Scientists, Walpole R., Myers R., Myers S., Ye K., 9th edition
Lectures
• Every week 3 hours.
• Attendance is an essential requirement.
• PowerPoint delivery in some lectures.
• Applications with Excel.
Help!
Seek help in the following order:
• Discussion with classmates.
• Lecturer, by e-mail.
• Lecturer, by appointment (Zoom).
• Other lecturers, by appointment (Zoom).
You Shall Not
• Course Book

What is Statistics?
Statistics is the science of data…

Statistics is a collection of tools for

✔ collecting data from a population,
✔ organizing data,
✔ analyzing data,
✔ estimating parameters of the population, and
✔ drawing inferences about the population.

Section 1.3
Measures of Location: The Sample Mean and Median

Copyright © 2017 Pearson Education, Ltd. All rights reserved.

Describing Data Numerically: Statistics

• Measures of Location
  • Sample Mean
  • Median
  • Mode
• Measures of Variation
  • Range
  • Interquartile Range
  • Variance
  • Standard Deviation
Mean
• The arithmetic mean (mean) is the most common
measure of central tendency
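• For a population of N values:

$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$
(numerator: population values; denominator: population size)

• For a sample of size n:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
(numerator: observed values; denominator: sample size)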
Arithmetic Mean
(continued)

• The most common measure of central tendency


• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)

[Figure: two dot plots on a 0 to 10 axis, one with Mean = 3 and one with Mean = 4]
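
To make the effect of an extreme value concrete, here is a minimal Python sketch. The two samples are hypothetical (the points plotted on the slide are not recoverable); they were chosen only so that the means come out as 3 and 4.

```python
from statistics import mean

# Hypothetical samples, not taken from the slide.
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # same values, but the largest one is extreme

mean_without = mean(without_outlier)
mean_with = mean(with_outlier)

print("Mean without the extreme value:", mean_without)
print("Mean with the extreme value:   ", mean_with)
```

Changing a single value from 5 to 10 moves the mean by a full unit, which is the sense in which the mean is "affected by extreme values".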

Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Remark: Outliers
• Outliers are points that are much larger or smaller than the rest of
the sample points.
• Outliers may be data entry errors or they may be points that really are
different from the rest.
• Outliers should not be deleted without considerable thought. Sometimes calculations and analyses are done both with and without the outliers and then compared.
Median
• In an ordered list, the median is the “middle” number
(50% above, 50% below)

[Figure: two dot plots on a 0 to 10 axis, both with Median = 3]

• Not affected by extreme values


Finding the Median

• The location of the median: the $\frac{n+1}{2}$ position in the ordered data
• If the number of values is odd, the median is the middle number
• If the number of values is even, the median is the average of the two middle numbers
• Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data
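
The position rule can be sketched in Python as follows. The function name `median_by_position` is hypothetical, and the sketch assumes the usual position formula (n + 1) / 2 applied to the sorted data.

```python
def median_by_position(values):
    """Median found via the (n + 1) / 2 position in the ranked data."""
    if not values:
        raise ValueError("median of an empty sample is undefined")
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position, not the median itself
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that item.
        return ordered[int(position) - 1]
    # Even n: the position falls halfway between two items; average them.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

odd_sample = [1, 2, 3, 4, 10]                    # hypothetical values
even_sample = [10, 12, 14, 15, 17, 18, 18, 24]   # hypothetical values

print(median_by_position(odd_sample))
print(median_by_position(even_sample))
```

For the odd sample the position is 3, so the median is the third ranked value. For the even sample the position is 4.5, so the median is the average of the fourth and fifth ranked values.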


Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes

[Figure: two dot plots, one with Mode = 9 and one with no mode]
Mean, Median, Mode, and Trimmed Mean for Example 1.2
Review Example
• Five houses on a hill by the beach

House Prices:

$2,000,000
500,000
300,000
100,000
100,000
Review Example:
Summary Statistics
House Prices: $2,000,000; 500,000; 300,000; 100,000; 100,000 (Sum: 3,000,000)

• Mean: $3,000,000 / 5 = $600,000
• Median: middle value of ranked data = $300,000
• Mode: most frequent value = $100,000
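
The three summary statistics for the house prices can be checked with Python's standard `statistics` module. This is only a quick verification sketch; the variable names are illustrative.

```python
from statistics import mean, median, multimode

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

price_mean = mean(house_prices)        # sum of values / number of values
price_median = median(house_prices)    # middle value of the ranked data
price_modes = multimode(house_prices)  # list of the most frequent value(s)

print("Mean:  ", price_mean)
print("Median:", price_median)
print("Modes: ", price_modes)
```

`multimode` returns a list, which matches the earlier remark that a data set may have several modes.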
Which measure of location
is the “best”?
• Mean is generally used, unless extreme values
(outliers) exist . . .
• Then median is often used, since the median is not
sensitive to extreme values.
• Example: Median home prices may be reported for a
region – less sensitive to outliers
Shape of a Distribution
• Describes how data are distributed
• Measures of shape
• Symmetric or skewed

Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
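
As a rough illustration of the mean and median comparison, the sketch below labels a sample by which of the two is larger. The helper `skew_hint` is hypothetical, and the comparison is only a rule of thumb, not a formal measure of skewness.

```python
from statistics import mean, median

def skew_hint(values):
    """Rough shape hint from comparing the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "right-skewed"
    if m < med:
        return "left-skewed"
    return "symmetric"

# House prices from the review example: one very expensive house.
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
print(skew_hint(house_prices))
```

Here the mean ($600,000) exceeds the median ($300,000), so the hint is right-skewed, which agrees with the long tail created by the $2,000,000 house.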
Section 1.4
Measures of Variability

Population Variance
• Average of squared deviations of values from the mean

• Population variance:

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Where $\mu$ = population mean
N = population size
$x_i$ = ith value of the variable x
Sample Variance
• Average (approximately) of squared deviations of values from the mean

• Sample variance:

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Where $\bar{x}$ = arithmetic mean
n = sample size
$x_i$ = ith value of the variable x
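
The only difference between the two variances is the divisor: N for a population, n - 1 for a sample. A minimal Python sketch, with hypothetical function names and hypothetical data:

```python
def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    size = len(values)
    mu = sum(values) / size
    return sum((x - mu) ** 2 for x in values) / size

def sample_variance(values):
    """Squared deviations from the sample mean, dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with mean 5

print(population_variance(data))  # squared deviations sum to 32; 32 / 8
print(sample_variance(data))      # same sum, divided by 7 instead
```

Because the divisor is smaller, the sample variance is always at least as large as the population formula applied to the same numbers.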
Population Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data

• Population standard deviation:

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Sample Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data

• Sample standard deviation:

$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

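Calculation Example:
Sample Standard Deviation
Sample Data ($x_i$): 10 12 14 15 17 18 18 24

n = 8, Mean = $\bar{x}$ = 16

$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + \cdots + (24-16)^2}{8-1}} = \sqrt{\frac{130}{7}} \approx 4.3095$

A measure of the “average” scatter around the mean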
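
The sample standard deviation for this data set can be checked step by step in Python. The sketch uses the standard `statistics.stdev` function, which divides by n - 1; the intermediate variable names are illustrative.

```python
from statistics import mean, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

x_bar = mean(data)
squared_deviations = [(x - x_bar) ** 2 for x in data]
total = sum(squared_deviations)  # numerator of the sample variance
s = stdev(data)                  # square root of total / (n - 1)

print("Mean:", x_bar)
print("Sum of squared deviations:", total)
print("Sample standard deviation:", round(s, 4))
```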
Measuring variation

[Figure: two distributions, one labeled "Small standard deviation" and one labeled "Large standard deviation"]

Comparing Standard Deviations

[Figure: three dot plots on an 11 to 21 axis]

Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.926
Data C: Mean = 15.5, s = 4.570
Advantages of Variance and
Standard Deviation

• Each value in the data set is used in the calculation
• Values far from the mean are given extra weight (because deviations from the mean are squared)
Section 1.5
Discrete and Continuous Data

All types of research need to be able to extract information from data
Variables (columns) and Individuals (rows)

              Make    Model    Year    …
Individuals   Ford    Fiesta   2014    …
              Opel    Astra    2013    …
              …       …        …       …
What is a variable?
• A measurable quantity or describable attribute that varies between
individuals
• e.g. for cars
• Colour
• Age
• Engine size
What are data?
• The recorded values of one or more variables from certain individuals
• e.g. for a particular car
• White
• 15 years old
• 2000 cc

• There are two types of data: quantitative and qualitative.

Types of Data
• Categorical
  • Nominal (defined categories or groups)
    Examples: Marital Status; Are you registered to vote?; Eye Color
  • Ordinal (defined categories with natural order)
    Examples: Year of Study; T-shirt size
• Numerical
  • Discrete (counted items)
    Examples: Number of Children; Defects per hour
  • Continuous (measured characteristics)
    Examples: Weight; Voltage
Nominal data
• Named categories
• e.g.
• Type – hatchback, saloon, coupe
• Can you think of another?
Ordinal data
• Categories with a natural order
• e.g.
• Size – small, medium, large
• Can you think of another?
Discrete data
• Distinct numbers with gaps between
• e.g.
• Doors – 2, 3, 4, 5 …
• … but not 2.39 or 3.86
• Can you think of another?
Continuous data
• Any number in a given range
• e.g.
• Mileage – anything between 0 (brand new) and 200,000 (ready for the scrap heap)
• Can you think of another?
Section 1.6
Statistical Modeling, Scientific Inspection, and Graphical Diagnostics

• Scatterplot
• Stem-and-leaf plot
• Histogram
• Boxplot

Scatterplot
• Data in which each item consists of a pair of values are called bivariate.
• The graphical summary for bivariate data is a
scatterplot.
• Display of a scatterplot:

McGraw-Hill ©2014 by The McGraw-Hill Companies, Inc. All rights reserved.


Looking at Scatterplots
• If the dots on the scatterplot are spread out in “random scatter,” then
the two variables are not well related to each other.
• If the dots on the scatterplot are spread around a straight line, then
one variable may be used to help predict the value of the other
variable.
Stem-and-Leaf Plot

• A simple way to summarize a data set.


• Each item in the sample is divided into two parts: a stem, consisting
of the leftmost one or two digits, and the leaf, which consists of the
next digit.
• It is a compact way to represent the data.
• It also gives us some indication of the shape of our data.
Example 5
• Example: Duration of dormant periods of the geyser Old Faithful, in minutes
• Stem-and-leaf plot:

4 | 259
5 | 0111133556678
6 | 067789
7 | 01233455556666699
8 | 000012223344456668
9 | 013

• Let’s look at the first line of the stem-and-leaf plot. This represents measurements of 42, 45, and 49 minutes.
• A good feature of these plots is that they display all the sample values. One can reconstruct the data in its entirety from a stem-and-leaf plot.
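
The two-way relationship between the sample and its stem-and-leaf plot can be sketched in Python. The function name `stem_and_leaf` is hypothetical; the sketch assumes non-negative whole numbers with the last digit as the leaf, as in the Old Faithful example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: leaves} with the last digit of each value as its leaf."""
    rows = defaultdict(str)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem] += str(leaf)
    return dict(rows)

# The plot from the Old Faithful example, keyed by stem.
plot = {
    4: "259",
    5: "0111133556678",
    6: "067789",
    7: "01233455556666699",
    8: "000012223344456668",
    9: "013",
}

# Reconstruct every measurement from the plot, then rebuild the plot.
data = [10 * stem + int(leaf) for stem, leaves in plot.items() for leaf in leaves]
rebuilt = stem_and_leaf(data)

for stem, leaves in rebuilt.items():
    print(stem, "|", leaves)
```

Rebuilding the plot from the reconstructed values gives back the same rows, which shows that no information about the individual measurements is lost.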
Histogram
• Graphical display that gives an idea of the “shape” of the sample.
• We want a reasonable number of observations in each interval.
• The bars of the histogram touch each other. A space indicates that
there are no observations in that interval.

Creating a Histogram
• Choose boundary points for the class intervals.
Usually these intervals are the same width.
• Compute the frequencies: this is the number of
observations that occur in each interval.
• Compute the relative frequencies for each class: this
is the number of observations in each interval divided
by the total number of observations.
• If the class intervals are the same width, then draw a
rectangle for each class, whose height is equal to the
frequencies or relative frequencies.
• If the class intervals are of unequal widths, the heights of
the rectangles must be set equal to the densities, where
density is the relative frequency divided by the class width.
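
The steps above can be sketched in Python as a small table builder. The function name, the data, and the class boundaries are all hypothetical, and treating each interval as closed on the left and open on the right is an assumption, since the slides do not fix a boundary convention.

```python
def histogram_table(values, boundaries):
    """One row per class: (lower, upper, frequency, relative frequency, density).

    Each class is treated as lower <= x < upper (an assumed convention).
    """
    n = len(values)
    rows = []
    for lower, upper in zip(boundaries, boundaries[1:]):
        frequency = sum(1 for v in values if lower <= v < upper)
        relative = frequency / n
        density = relative / (upper - lower)  # needed when widths are unequal
        rows.append((lower, upper, frequency, relative, density))
    return rows

# Hypothetical measurements and unequal-width class boundaries.
data = [1.2, 1.9, 2.4, 2.6, 2.8, 3.1, 3.3, 3.7, 4.5, 6.0]
boundaries = [1, 2, 3, 4, 7]

table = histogram_table(data, boundaries)
for lower, upper, freq, rel, dens in table:
    print(f"[{lower}, {upper}): frequency={freq}, relative={rel:.2f}, density={dens:.3f}")
```

With densities as bar heights, the bar areas (density times class width) add up to 1, so the wide last class is not visually exaggerated.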
Example of Histogram
Table 1.4 Car Battery Life

* Find the stem-and-leaf plot of the given data
* Find the relative frequency distribution table of car battery life
* Draw the corresponding histogram

Table 1.5 Stem-and-Leaf Plot of Battery Life

Table 1.7 Relative Frequency Distribution of Battery Life

Figure 1.6 Relative frequency histogram

Figure 1.7 Estimating frequency distribution
Box-and-Whisker plot or Box plot
• A boxplot is a graphic that presents the median, the first and third
quartiles, and any outliers present in the sample.

• The interquartile range (IQR) is the difference between the third quartile and the first quartile. This is the distance needed to span the middle half of the data.
Creating a Boxplot
⮚ Compute the median and the first and third quartiles of the sample. Indicate
these with horizontal lines. Draw vertical lines to complete the box.
⮚ Find the largest sample value that is no more than 1.5 IQR above the third
quartile, and the smallest sample value that is no more than 1.5 IQR below
the first quartile. Extend vertical lines (whiskers) from the quartile lines to
these points.
⮚ Points more than 1.5 IQR above the third quartile, or more than 1.5 IQR
below the first quartile are designated as outliers. Plot each outlier
individually.
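
The numbers needed to draw a boxplot can be computed as in the sketch below. The function name and the sample are hypothetical. Quartile conventions differ between textbooks and software, so the "inclusive" method of `statistics.quantiles` used here is an assumption and may give slightly different quartiles than the course textbook.

```python
from statistics import quantiles

def boxplot_summary(values):
    """Quartiles, IQR, whisker ends, and outliers using the 1.5 IQR rule."""
    ordered = sorted(values)
    # Assumed quartile convention: linear interpolation, endpoints included.
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    outliers = [v for v in ordered if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # smallest value within 1.5 IQR of Q1
        "whisker_high": inside[-1],  # largest value within 1.5 IQR of Q3
        "outliers": outliers,        # plotted individually
    }

sample = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # hypothetical data
summary = boxplot_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this sample the value 30 lies more than 1.5 IQR above the third quartile, so it is reported as an outlier and the upper whisker stops at 10.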
Anatomy of a Boxplot
Example 5 cont.
❖ Notice there are no outliers in these data.
❖ Looking at the four pieces of the boxplot, we can tell that the sample values are comparatively densely packed between the median and the third quartile.
❖ The lower whisker is a bit longer than the upper one, indicating that the data have a slightly longer lower tail than upper tail.
❖ The distance between the first quartile and the median is greater than the distance between the median and the third quartile.
❖ This boxplot suggests that the data are skewed to the left.
Comparative Boxplots
• Sometimes we want to compare more than one sample.
• We can place the boxplots of the two (or more) samples side-by-side.

• This will allow us to compare how the medians differ between samples, as well as the first and third quartiles.
• It also tells us about the difference in spread between the two samples.
Symmetry and Skewness
• A histogram is perfectly symmetric if its right half is a mirror image of its left half.
• For example, heights of randomly selected men are roughly symmetric.
• Histograms that are not symmetric are referred to as skewed.
• A histogram with a long right-hand tail is said to be skewed to the right, or positively skewed.
• For example, incomes are right skewed.
• A histogram with a long left-hand tail is said to be skewed to the left, or negatively skewed.
• For example, grades on an easy test are left skewed.
Symmetrical shape

[Figure: symmetric histogram on a 0 to 90 axis]

mean = median = mode

• e.g. assignment marks
• Another example?
Right (positive skew) shape

[Figure: right-skewed histogram on a 0 to 90 axis]

mode < median < mean

• e.g. salaries of University workers
• Another example?
Left (negative skew) shape

[Figure: left-skewed histogram on a 0 to 90 axis]

mean < median < mode

• e.g. life expectancy of people
• Another example?
Unimodal and Bimodal
• A histogram with only one peak is what we call unimodal.
• If a histogram has two peaks then we say that it is bimodal.
• If there are more than two peaks in a histogram, then it is said to be multimodal.
Section 1.7
General Types of Statistical Studies: Designed Experiment, Observational Study, and Retrospective Study