
Week 6.

Lecture 1&2
Estimation 2: Least Squares

The second approach to estimating the population parameters for a distribution is to use the technique of least squares, and this requires using the actual or empirical distribution. The basic idea can be illustrated using the exponential distribution, whose cdf is given by

F(x) = 1 − e^(−λx)

This can be linearised as follows:

ln[1 − F(x)] = −λx

Thus a plot of ln[1 − F(x)] against a sample of x should produce a straight line through the origin whose slope equals −λ. The slope of a best fit line on such a plot therefore provides a sample estimate for λ. This is the least squares technique. However, there is a duality problem here in that to get values for F(x) to form such a plot a value for λ is needed, but the least squares technique is trying to estimate λ. The trick is to replace F(x) with the empirical or actual cdf, F̂(x), that does not require a value for λ and is found solely from the data itself. Then

ln[1 − F̂(x)] = −λx

A. The Empirical cdf.

Consider a data set that consists of observations from some unknown cumulative distribution function F(x). The empirical cumulative distribution function, often also referred to as the estimated, sample or actual cumulative distribution function, is the best guess, made from a sample of data, of the true (but unknown) population cumulative distribution function. This actual cumulative distribution function is given the symbol F̂(x). To construct this empirical distribution, the data must first be arranged from smallest to largest. The following notation is taken from lecture 1. Let x(1) ≤ x(2) ≤ … ≤ x(n) represent this ascendingly ordered data set (where n is the sample size). The bracketed (i) in the ordered series is called the rank index of the particular data value and so i = 1, …, n. Consider the following hypothetical values for x(i).

i                         1         2         3         4         5         6
x(i)                      2         6         8         10        12        17
F̂(x(i)) = i/n            0.1667    0.3333    0.5       0.6667    0.8333    1
F̂(x(i)) = (i-1)/n        0         0.1667    0.3333    0.5       0.6667    0.8333
F̂(x(i)) = (i-0.5)/n      0.0833    0.25      0.4167    0.5833    0.75      0.9167

Two possible methods can be used to quantify this empirical distribution. First, F̂(x(i)) = i/n. In which case the empirical probability of observing a value for x less than or equal to x(1) = 2 is 1/6, the empirical probability of observing a value for x less than or equal to x(2) = 6 is 2/6, and so on. The problem with this method is that the empirical probability of observing a value for x less than or equal to x(n) = 17 is 6/6 or 1. This would suggest that it is impossible for x to exceed 17 in value. It is, however, not impossible to record an x value of more than 17 if the sample size were to be increased at a later date. But the formula i/n would not allow for this possibility.

To avoid this problem the empirical distribution could be defined as F̂(x(i)) = (i − 1)/n. In which case the empirical probability of observing a value for x less than or equal to x(n) = 17 is (6 − 1)/6 = 5/6, the empirical probability of observing a value for x less than or equal to x(2) = 6 is (2 − 1)/6 = 1/6, and so on. The problem with this method is that the empirical probability of observing a value for x less than or equal to x(1) = 2 is 0/6 or 0. This suggests it is impossible for x to be less than 2 in value. It is, however, not impossible to record an x value of less than 2 if the sample size were to be increased at a later date. But the formula (i − 1)/n would not allow for this possibility. As the graph of the previous table (shown below) illustrates, these two methods create problems at either end of the empirical distribution.

An obvious solution is to average these two estimators:

F̂(x(i)) = (i − 0.5)/n

as this formula allows a small chance of observing values of x less than 2 and x more than 17
in future testing. This estimator is called the mean estimator of the population cdf. This
averaging is seen in the grey empirical cdf in the figure below.

[Figure: Step plots of the three empirical cdf estimators, F̂(x) = i/n, F̂(x) = (i−1)/n and F̂(x) = (i−0.5)/n (grey), against x for the six-point sample above.]

In this illustration the smallest value x(1) characterises the lowest (1/6)*100 = 16.66% of
the unknown cumulative distribution. The next smallest value x(2) characterises the next
16.66% of the distribution and so on. Since the smallest value represents the proportion of the
distribution between 0% and 16.66% the best guess is to assign it a cumulative probability
exactly in-between these two numbers, i.e. 8.33%. That is, F̂(x(1)) = 8.33%. Similarly, the next
smallest value represents the proportion of the distribution between 16.66% and 33.33% and
so the best guess as to its cumulative probability is that it lies exactly in-between these two
numbers, i.e. 25%.

An alternative to this mean formula that is sometimes used is the median estimate of the population cdf

F̂(x(i)) = (i − 0.3)/(n + 0.4)

In all these calculations there is no need to use distribution parameters, such as λ in the exponential distribution. As such the empirical distribution is parameter free and so is sometimes also called a non-parametric estimator of the cdf.
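As an aside, the rank formulas above are easy to compute directly. The short Python sketch below is not part of the module's Excel workings; the function name and structure are illustrative only.

```python
import numpy as np

def empirical_cdf(data, method="mean"):
    """Empirical cdf F_hat(x(i)) of a sample, using one of the rank formulas above."""
    x = np.sort(np.asarray(data, dtype=float))   # ascending order x(1) <= ... <= x(n)
    n = len(x)
    i = np.arange(1, n + 1)                      # rank index i = 1, ..., n
    if method == "i/n":
        F = i / n
    elif method == "(i-1)/n":
        F = (i - 1) / n
    elif method == "mean":                       # mean rank estimator (i - 0.5)/n
        F = (i - 0.5) / n
    elif method == "median":                     # median rank estimator (i - 0.3)/(n + 0.4)
        F = (i - 0.3) / (n + 0.4)
    else:
        raise ValueError("unknown method")
    return x, F

# The hypothetical six-point sample from the table above
x, F = empirical_cdf([2, 6, 8, 10, 12, 17], method="mean")
print(np.round(F, 4))   # 0.0833, 0.25, 0.4167, 0.5833, 0.75, 0.9167
```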

B. Distributions as Linear Models

In what follows let the random variable W be some (possible) transformation of the
empirical cdf and V some (possible) transformation of the random variable X.

The simplest possible relationship that can exist is when w is related to the random
variable v in a proportional manner

w = b1v

A natural extension is to allow for non-proportionality by allowing the line defined by the equation to be offset from the origin through the inclusion of an additional unknown constant b0

w = b0 + b1v

b0 is called the intercept and b1 the slope of the line and these parameters can both be either
positive or negative in value. Many of the distributions looked at so far in this module can be
written out in one of these two ways.

i. The Exponential Distribution

The exponential distribution is an example of a proportional linear model. Normally the cdf is written as a function of the random variable X

F(x) = 1 − e^(−λx)

But defining w as

w = ln[1 − F(x)]

then gives

w = b1v with v = x and b1 = −λ.

ii. The Uniform Distribution

The cdf for a uniform random variable is written as

F(x) = (x − a)/(b − a)

But defining w as

w = F(x)

then

w = b0 + b1v with v = x, b0 = −a/(b − a) and b1 = 1/(b − a).

iii. The Weibull Distribution

For the Weibull distribution, its cdf is normally written as

F(x) = 1 − e^(−(ηx)^β)

But defining w as

w = ln{−ln[1 − F(x)]}

then gives

w = b0 + b1v

with

v = ln(x), b0 = β·ln(η) and b1 = β.

iv. The Normal Distribution

For the Normal distribution, w is the standardised or Z value and is found by reading off from the Z table the Z value associated with F(x). In Excel w is found using

w = z = NORMSINV(F(x))

But

z = (x − μ)/σ = −μ/σ + (1/σ)x

where μ is the population mean of x and σ the population standard deviation for x. Thus

w = b0 + b1v

with v = x, b0 = −μ/σ and b1 = 1/σ.
v. The Log Normal Distribution

If y = ln(x), then for the Log Normal distribution, w is the standardised or Z value and is found by reading off from the Z table the Z value associated with F(y). In Excel w is found using

w = z = NORMSINV(F(y))

But

z = (y − μy)/σy = −μy/σy + (1/σy)y

where μy is the population mean of y (the log mean) and σy the population standard deviation for y (the log standard deviation). Thus

w = b0 + b1v

with v = y = ln(x), b0 = −μy/σy and b1 = 1/σy.
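For reference, the transforms in this section can be collected into one helper. The Python sketch below is an illustration only (it is not taken from the module's Excel files); it uses scipy.stats.norm.ppf, which plays the same role as Excel's NORMSINV.

```python
import numpy as np
from scipy.stats import norm

def linearise(x, F_hat, dist):
    """Return (v, w) so that the chosen distribution implies w = b0 + b1*v."""
    x = np.asarray(x, dtype=float)
    F_hat = np.asarray(F_hat, dtype=float)
    if dist == "exponential":        # w = -lambda*x                  (b0 = 0, b1 = -lambda)
        return x, np.log(1.0 - F_hat)
    if dist == "uniform":            # w = -a/(b - a) + x/(b - a)
        return x, F_hat
    if dist == "weibull":            # w = beta*ln(eta) + beta*ln(x)
        return np.log(x), np.log(-np.log(1.0 - F_hat))
    if dist == "normal":             # w = -mu/sigma + x/sigma; ppf plays the role of NORMSINV
        return x, norm.ppf(F_hat)
    if dist == "lognormal":          # as normal, but with v = y = ln(x)
        return np.log(x), norm.ppf(F_hat)
    raise ValueError("unknown distribution")
```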

C. Linear Least Squares Method

When using the above linear models to represent the distributions, some complications emerge. First, F(x) is not known. An obvious solution to this is to replace it with the empirical cdf, F̂(x(i)). For example, the proportional linear representation of the exponential distribution then becomes

w(i) = ln[1 − F̂(x(i))]

with

w(i) = b1v(i) and v(i) = x(i) and b1 = −λ.

With w and v now quantifiable, the second problem is that if their values were plotted
out, they would not all fall on a straight line. The data points will be scattered about a line. This
scatter is accounted for by adding a random error or residual e(i) term to the above linear
models. For example,

w(i) = b1v(i) + e(i) or w(i) =b0 + b1v(i) + e(i)

Each of the e(i) residuals then measures the vertical distance between the drawn line
and the actual data point and as such some residuals will be positive and some negative.

Example 1: Fatigue of ceramic ball bearings

Consider again the ceramic ball bearings data and assume this data has an exponential distribution. To estimate λ, the calculations shown in the next table need to be carried out. Look first at the 2nd column of the table below. It contains the actual fatigue lives; these times are sorted from lowest to highest. Assuming the data has an exponential distribution, x(i) also equals v(i).

Index, i   Sorted Fatigue Life, v(i) = x(i)   Empirical cdf, F̂(x(i))    w(i) = ln[1 − F̂(x(i))]
1          1.67                               =(1-0.5)/10 = 0.05         =ln(1-0.05) = -0.05129
2          2.2                                =(2-0.5)/10 = 0.15         =ln(1-0.15) = -0.16252
3          2.51                               0.25                       -0.287682072
4          3                                  0.35                       -0.430782916
5          3.9                                0.45                       -0.597837001
6          4.7                                0.55                       -0.798507696
7          7.53                               0.65                       -1.049822124
8          14.7                               0.75                       -1.386294361
9          27.8                               0.85                       -1.897119985
10         37.4                               0.95                       -2.995732274

Column 1 is the rank index used to calculate the empirical cdf given in column 3. The empirical probability that x is 1.67 or less is (i − 0.5)/10 = (1 − 0.5)/10 = 0.05 or 5%. Column 3 illustrates some additional calculations for the rest of the data. Assuming that X has an exponential distribution, w(i) is defined as w(i) = ln[1 − F̂(x(i))]. So for the smallest value of x, w(1) = ln[1 − 0.05] = −0.05129. The last column in the table above shows additional calculations for the larger values of X. The graph below then plots column 2 against column 4. No matter how you draw a line out from the origin it is impossible to have all the data points falling on the line. The line shown in the graph is the best fit to the data given it must start at the origin. The seventh row of the above table has the biggest vertical distance between the data point and the best fit line, so e(7) is the largest error; it is negative in value because the data point lies below the line.

[Figure: Plot of the transformed empirical cdf, w(i), against v(i), with the best fit line through the origin, w(i) = −0.0805v(i). The largest residual, e(7) = −1.0498 − (−0.6061) = −0.4437, is marked below the line.]
In what sense is the line shown in the above graph a best fit line? Why is b1 = −0.0805? Ideally, the line should be drawn so as to best fit the data. Best fit in turn can be defined in terms of all the residuals, with the best fit line being the one which minimises the sum of all the residuals once they have been squared, hence the name least squares estimator. The squares of the residuals are taken to stop positive and negative residuals offsetting each other in the summation. Consider first the proportional linear model. b1 is chosen to minimise the sum of the squares of the residuals (SSres)


SSres = Σ e(i)² = Σ [w(i) − b1v(i)]² = Σ w(i)² − 2b1 Σ w(i)v(i) + b1² Σ v(i)²   (all sums over i = 1, …, n)

where n equals the number of experimental data points available on v and w. From calculus,
b1 minimises SSres if it satisfies the equation

∂SSres/∂b1 = −2 Σ w(i)v(i) + 2b1 Σ v(i)² = 0

Solving for b1 gives

b1 = Σ w(i)v(i) / Σ v(i)²

The predicted value for w, often labelled ŵ(i), is given by

ŵ(i) = b1v(i)

This is often referred to as the regression line or best fit line. The residuals can then be calculated from

e(i) = w(i) − ŵ(i) = w(i) − b1v(i)

By definition, a best fit line should go through the middle of all the data points so that
the negative and positive residuals should cancel to zero when summed, meaning the average
value for the residuals should be zero. The variance of the residuals, often called the mean
squared residuals (MSres), can therefore be calculated using the standard formula for a variance

MSres = [1/(n − 1)] Σ (e(i) − ē)² = [1/(n − 1)] Σ e(i)² = SSres/(n − 1)

Given that the MSres is a measure of variation or scatter in the data about the regression
line, and so not picked up by this best fit line, this statistic is a measure of lack of fit. The
variance in w, called the mean square total (MStotal) in regression analysis, is by definition
given by

s² = MStotal = Σ [w(i) − w̄]² / (n − 1)

It is useful to know what proportion of the variation in W is made up of the residual variation. One minus this ratio is called the adjusted coefficient of determination, or R²adj for short

R²adj = 1 − MSres/MStotal

As such R²adj measures the percentage of the variation in w that can be explained by the regression line or line of best fit. R²adj = 1 only when MSres = 0, i.e. when all the data points fall on the best fit line. Then the regression line explains all the variation in W.
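A minimal Python sketch of this proportional (through-origin) fit, assuming the formulas above; the function and variable names are illustrative and it is not part of the module's Excel workings.

```python
import numpy as np

def fit_proportional(v, w):
    """Least squares fit of w = b1*v through the origin, with lack-of-fit statistics."""
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(v)
    b1 = np.sum(w * v) / np.sum(v ** 2)     # b1 = sum[w(i)v(i)] / sum[v(i)^2]
    e = w - b1 * v                          # residuals about the fitted line
    ms_res = np.sum(e ** 2) / (n - 1)       # MSres = SSres/(n - 1), one parameter estimated
    ms_total = np.var(w, ddof=1)            # MStotal = sample variance of w
    r2_adj = 1.0 - ms_res / ms_total        # adjusted coefficient of determination
    return b1, ms_res, ms_total, r2_adj
```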

Example 2: Fatigue of ceramic ball bearings

The application of the above formulas to the fatigue life data is shown in sheet Exponential
of the Excel file least squares Excel Workings. The screen shot below is taken from this file
and shows the above formulas executed in Excel.

Range G4:G13 squares the v(i) values and range H4:H13 multiplies v(i) by w(i). In cell N6, each of these columns is summed using the SUM function and the ratio of the two totals is calculated to yield the value for b1. The negative of b1 is the least squares estimate for λ of the exponential distribution, as shown in cell N9. This is the value shown in the last graph and so defines the best fit line in the graph. With b1 known it is possible to work out the predicted values or points on the best fit line and this is done in column range J4:J13. For example, the first predicted value is given by b1v(1) = −0.08049(1.67) = −0.1344. Thus the first residual is the difference between this prediction and the actual value for w(i), or −0.05129 − (−0.1344) = 0.083. The other residuals are shown in range K4:K13, including the one shown in the above graph.

In cell N12 these residuals are all squared and these squared residuals added up using the function SUMSQ. Dividing this by the sample size less one gives the mean squared residuals. In Excel MStotal can be obtained using the VAR function because MStotal is nothing more than the sample variance of w. Then in cell N13 these formulas are combined to give an R²adj value of 0.9125. That is, the best fit line explains 91.25% of the variation present in w(i). Study this Excel sheet to become familiar with these calculations.
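The same calculation can be checked outside Excel. The hedged Python sketch below applies the proportional-model formulas to the fatigue data in the table above and should reproduce approximately the values reported (λ ≈ 0.0805, R²adj ≈ 0.9125); it is an illustration, not the Excel sheet itself.

```python
import numpy as np

# Ceramic ball bearing fatigue lives, as in the table of Example 1
life = np.array([1.67, 2.2, 2.51, 3.0, 3.9, 4.7, 7.53, 14.7, 27.8, 37.4])
n = len(life)
i = np.arange(1, n + 1)

F_hat = (i - 0.5) / n                 # mean rank empirical cdf
v = life                              # exponential model: v = x
w = np.log(1.0 - F_hat)               # w = ln[1 - F_hat(x(i))]

b1 = np.sum(w * v) / np.sum(v ** 2)   # through-origin least squares slope
lam = -b1                             # lambda estimate is the negative of b1

e = w - b1 * v                        # residuals
ms_res = np.sum(e ** 2) / (n - 1)
ms_total = np.var(w, ddof=1)
r2_adj = 1.0 - ms_res / ms_total

print(round(lam, 4), round(r2_adj, 4))   # roughly 0.0805 and 0.9125
```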
For the non-proportional model:

SSres = Σ e(i)² = Σ [w(i) − (b0 + b1v(i))]²

where n equals the number of experimental data points available on v and w. From calculus, the values for b0 and b1 that minimise SSres are found by solving both

∂SSres/∂b0 = ∂/∂b0 Σ [w(i) − (b0 + b1v(i))]² = 0

and

∂SSres/∂b1 = ∂/∂b1 Σ [w(i) − (b0 + b1v(i))]² = 0

This involves solving two equations simultaneously and it can be shown that the solution to this is given by

b1 = SSvw / SSvv

where

SSvv = Σ [v(i) − v̄]²   and   SSvw = Σ [w(i) − w̄][v(i) − v̄]

and

b0 = w̄ − b1v̄

where w̄ and v̄ are the sample mean values of w and v respectively. The variance of the residuals, often called the mean squared residuals (MSres), can therefore be calculated using the formula for a variance

MSres = [1/(n − k)] Σ (e(i) − ē)² = [1/(n − k)] Σ e(i)² = SSres/(n − k)

where k is the number of unknown parameters in the model that require estimation. Notice the denominator is n − k and not the more usual n − 1 for a normal sample variance calculation. This is because normally only 1 degree of freedom is lost in calculating the average in the sample variance formula. But in calculating SSres, k parameters have to be estimated, so k degrees of freedom are lost.
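A corresponding Python sketch of the two-parameter fit, following the SSvv, SSvw and n − k formulas above; again the code is illustrative rather than the module's Excel implementation.

```python
import numpy as np

def fit_linear(v, w, k=2):
    """Least squares fit of w = b0 + b1*v, with lack-of-fit statistics."""
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(v)
    ss_vv = np.sum((v - v.mean()) ** 2)               # SSvv
    ss_vw = np.sum((w - w.mean()) * (v - v.mean()))   # SSvw
    b1 = ss_vw / ss_vv
    b0 = w.mean() - b1 * v.mean()
    e = w - (b0 + b1 * v)                             # residuals
    ms_res = np.sum(e ** 2) / (n - k)                 # k parameters estimated
    ms_total = np.var(w, ddof=1)
    r2_adj = 1.0 - ms_res / ms_total
    return b0, b1, ms_res, ms_total, r2_adj
```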

Example 3: Fatigue of ceramic ball bearings

Consider again the ceramic ball bearings data and assume this time that the data has a Weibull distribution. To find β and η, the calculations shown in the next table need to be carried out. Look first at the 2nd column of the table below. It contains the actual fatigue lives and these times are sorted from lowest to highest. Assuming the data has a Weibull distribution, ln(x(i)) equals v(i) and so in column 3 these log failure times are calculated.
Index, i   Sorted Fatigue Life, x(i)   Log Fatigue Life, v(i) = ln[x(i)]   Empirical cdf, F̂(x(i))   w(i) = ln{−ln[1 − F̂(x(i))]}
1          1.67                        =LN(1.67) = 0.5128                  =(1-0.5)/10 = 0.05        =ln{-ln(1-0.05)} = -2.9702
2          2.2                         =LN(2.2) = 0.7885                   =(2-0.5)/10 = 0.15        =ln{-ln(1-0.15)} = -1.81696
3          2.51                        0.920282753                         0.25                      -1.245899324
4          3                           1.098612289                         0.35                      -0.842150991
5          3.9                         1.360976553                         0.45                      -0.514437136
6          4.7                         1.547562509                         0.55                      -0.225010673
7          7.53                        2.018895042                         0.65                      0.048620745
8          14.7                        2.687847494                         0.75                      0.32663426
9          27.8                        3.325036021                         0.85                      0.640336939
10         37.4                        3.621670704                         0.95                      1.0971887

Assuming that X has a Weibull distribution, w(i) is defined as w(i) = ln{−ln[1 − F̂(x(i))]}. So for the smallest value of x, w(1) = ln{−ln[1 − 0.05]} = −2.9702. The last column in the table above shows additional calculations for the larger values of X. The graph below then plots column 3 against column 5. No matter how you draw a line with an intercept it is impossible to have all the data points falling on the line. The line shown in the graph is the best fit to the data given a non-zero intercept. The first row of the above table has the biggest vertical distance between the data point and the best fit line, so e(1) is the largest error; it is negative in value because the data point lies below the line.

[Figure: Plot of the transformed empirical cdf, w(i), against v(i) = ln[x(i)], with the best fit line w(i) = 1.015v(i) − 2.3652. The largest residual, e(1) = −2.970 − (−1.845) = −1.13, is marked below the line.]

The application of the above non-proportional model formulas to the fatigue life data is shown in sheet Weibull of the Excel file least squares Excel Workings. The screen shot below is taken from this file and shows the above formulas executed in Excel, and reveals how the best fit line shown in the figure above is obtained.

In range E4:E13, the deviations of w(i) from its mean value are computed and in range F4:F13 the deviations of v(i) from its mean value are computed. Range H4:H13 then multiplies these two deviations together, whilst range G4:G13 squares the v(i) deviations. The above formulas show that SSvv and SSvw are the sums of the numbers in columns G and H respectively, and so the SUM function is used in cells N3 and N4 to compute these values. The ratio of these two values yields the value for b1 and this is computed in cell N6. This is also the least squares estimate for β and this is calculated in cell N10. The formulas above show that b0 is then given by the mean value for w(i) minus the product of b1 and the mean of v(i). This calculation is done in cell N7. ln(η) is then given by the ratio of b0 to b1 and so is calculated this way in cell N9. These values for b0 and b1 are the same as those shown in the last graph and so define the best fit line in that graph.

With b0 and b1 known it is possible to work out the predicted values or points on the best fit line and this is done in range J4:J13. For example, the first predicted value is given by b0 + b1v(1) = −2.3652 + 1.0149(0.5128) = −1.8447. Thus the first residual is the difference between this prediction and the actual value for w(i), or −2.9702 − (−1.8447) = −1.125. The other residuals are shown in range K4:K13, including the one shown in the above graph.

In cell N12 these residuals are all squared and these squared residuals added up using the function SUMSQ. Dividing this by the sample size less k (in this case, with two parameters, k = 2) gives the mean squared residuals. In Excel MStotal can be obtained using the VAR function because MStotal is nothing more than the variance of w. Thus in cell N13 these formulas are combined to give an R²adj value of 0.7991. That is, the best fit line explains 79.91% of the variation present in w(i). R², or the coefficient of determination, does not adjust for the degrees of freedom and so

R² = 1 − [MSres (n − k)] / [MStotal (n − 1)]

Note that when b0 = 0, R² = R²adj because k (the number of estimated parameters) = 1.


Study this Excel sheet to become familiar with these calculations.
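As a hedged cross-check on Example 3 (illustrative Python, not the Excel sheet itself), the Weibull fit can be reproduced from the same fatigue data, with β taken as the slope b1 and ln(η) = b0/b1 as described above.

```python
import numpy as np

life = np.array([1.67, 2.2, 2.51, 3.0, 3.9, 4.7, 7.53, 14.7, 27.8, 37.4])
n = len(life)
i = np.arange(1, n + 1)

F_hat = (i - 0.5) / n                       # mean rank empirical cdf
v = np.log(life)                            # Weibull model: v = ln(x)
w = np.log(-np.log(1.0 - F_hat))            # w = ln{-ln[1 - F_hat(x(i))]}

ss_vv = np.sum((v - v.mean()) ** 2)
ss_vw = np.sum((w - w.mean()) * (v - v.mean()))
b1 = ss_vw / ss_vv                          # slope = beta estimate
b0 = w.mean() - b1 * v.mean()               # intercept = beta*ln(eta)
beta = b1
eta = np.exp(b0 / b1)                       # ln(eta) = b0/b1

e = w - (b0 + b1 * v)
ms_res = np.sum(e ** 2) / (n - 2)           # k = 2 parameters estimated
r2_adj = 1.0 - ms_res / np.var(w, ddof=1)

print(round(beta, 3), round(b0, 4), round(r2_adj, 4))   # roughly 1.015, -2.365, 0.799
```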

Sheets Normal and Log Normal of the Excel file least squares Excel Workings show how to obtain the least squares estimates of the parameters of the normal and log normal distributions. Study them in detail and relate them to the formulas above.

D. Probability Plots

By plotting various types of cumulative distributions (such as the normal, log normal or Weibull) against the actual or empirical distribution, it is possible to see which one best describes the sample of data being analysed. Such a cross plot is called a probability plot.

Example 4: Fatigue of ceramic ball bearings

Consider a probability plot for the exponential distribution. The cdf for this distribution is F(x) = 1 − e^(−λx). Substituting in the above least squares estimate for λ gives F(x) = 1 − e^(−0.0805x). In sheet Exponential of the Excel file least squares Excel Workings this formula is used to calculate the numbers shown in cells E22:E31. This is often referred to as the modelled cdf. This is then plotted against the empirical cdf, calculated as above, to give the following probability plot.

[Figure: Probability plot of the empirical cdf against the modelled cdf based on λ = 0.0805, with a 45 degree reference line.]

If the exponential distribution perfectly described this fatigue data, all the points on this plot should fall on the 45 degree line (the blue line above) because then the empirical and modelled cdfs are identical. So large deviations of the data points from the 45 degree line can be taken as evidence to suggest the selected distribution (the exponential in this illustration) is not a good description of the data (in this case fatigue life) and so another distribution may be better. To this effect, in cells F22:F31 the absolute deviations of the data points in the probability plot from the 45 degree line are calculated. The largest of these deviations (shown in red and called the maximum absolute deviation, or MAD for short) can then be used to compare distributions. The best distribution to use for a given set of data would then be the one with the smallest maximum absolute deviation. Looking through all the other sheets in the Excel file reveals that the Weibull distribution has a smaller maximum absolute deviation than the exponential, but the log normal distribution has the smallest maximum deviation at 0.133. Thus fatigue life is best described using the log normal distribution; that is, fatigue life appears to have a log normal distribution.
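A short, illustrative Python sketch of this probability-plot comparison for the exponential distribution; the λ value is the least squares estimate found earlier, and the MAD calculation mirrors the description above rather than the exact Excel layout.

```python
import numpy as np

life = np.array([1.67, 2.2, 2.51, 3.0, 3.9, 4.7, 7.53, 14.7, 27.8, 37.4])
n = len(life)
F_hat = (np.arange(1, n + 1) - 0.5) / n          # empirical cdf (mean rank)

# Modelled cdf under the fitted exponential distribution, lambda = 0.0805
F_model = 1.0 - np.exp(-0.0805 * np.sort(life))

# In the probability plot the empirical cdf is plotted against the modelled cdf;
# perfect agreement puts every point on the 45 degree line, so the maximum
# absolute deviation from that line is one way to compare candidate distributions.
mad = np.max(np.abs(F_hat - F_model))
print(round(mad, 3))
```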
E. Bias and Consistency

Play around with the Excel file Sim2 to see whether the least squares estimators of distribution parameters are biased in small and large samples, and compare this to the findings from the Excel file Sim1 to see which estimator (least squares or method of moments) is best for which distribution. Your findings will be required to tackle some of the questions in your 3rd assignment.
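Sim2 itself is not reproduced here, but a hedged Monte Carlo sketch along the same lines (illustrative Python, with an arbitrarily chosen true λ and sample size) would repeatedly draw samples from a known exponential distribution, apply the least squares estimator, and compare the average estimate with the true value.

```python
import numpy as np

rng = np.random.default_rng(0)
true_lam = 0.1                              # arbitrarily chosen true parameter
n, reps = 10, 5000                          # small sample; increase n to study consistency

estimates = np.empty(reps)
for r in range(reps):
    x = np.sort(rng.exponential(1.0 / true_lam, size=n))
    F_hat = (np.arange(1, n + 1) - 0.5) / n
    w = np.log(1.0 - F_hat)
    estimates[r] = -np.sum(w * x) / np.sum(x ** 2)   # least squares estimate of lambda

print(estimates.mean())                     # compare with true_lam to judge bias
```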
