Data Analysis Assignment Help

The document provides information about generating correlated random variables for stochastic simulation applications. It gives the mean and variance of the variable y, which is generated from y = ax + bε, where x and ε are independent random variables. It then selects values for a and b so that y has a variance of 2 and a correlation between x and y of 0.5. The document also provides examples of estimating quantiles from data and constructing confidence intervals, as well as performing a hypothesis test to compare the mean salaries of engineers with and without professional certifications.

Problem 1

Sometimes we may want to generate two correlated random variables
for stochastic simulation applications. For example, the duration and
intensity of a rainstorm may be highly uncertain but positively
correlated. One option for generating correlated variables x and y is to
obtain y from:

$y = ax + b\varepsilon$

where x and $\varepsilon$ are independent random variables with means equal to 0.0
and variances equal to 1.0, and a and b are specified constants.

a. What are the mean and variance of y (expressed as functions of a and b)?

Solution:

$E[y] = E[ax + b\varepsilon] = aE[x] + bE[\varepsilon] = 0$

$\mathrm{Var}[y] = \mathrm{Var}[ax + b\varepsilon] = a^2\,\mathrm{Var}[x] + b^2\,\mathrm{Var}[\varepsilon] = a^2 + b^2$

b. Select a and b so that y has a variance of 2 and the correlation between
x and y is 0.5. Show all relevant calculations.

Solution:
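$a^2 + b^2 = 2$
$\mathrm{Correl}(x,y) = 0.5 = \mathrm{Cov}(x,y) / (\mathrm{Std}[x]\,\mathrm{Std}[y])$
$\mathrm{Cov}(x,y) = E[(x - \bar{x})(y - \bar{y})]$
$y - \bar{y} = a(x - \bar{x}) + b(\varepsilon - \bar{\varepsilon})$
so: $\mathrm{Cov}(x,y) = E[a(x - \bar{x})^2 + b(x - \bar{x})(\varepsilon - \bar{\varepsilon})] = a\,\mathrm{Var}[x] = a$
With $\mathrm{Std}[x] = 1$ and $\mathrm{Std}[y] = \sqrt{2}$:
→ $a = 0.5\sqrt{2} = 0.707$ ; $b = \sqrt{2 - a^2} = \sqrt{1.5} = 1.22$ (taking the positive root)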
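
The result can be checked with a quick simulation. The sketch below is Python (standard library only) and rests on an assumption not in the problem: x and ε are drawn as standard normal, although the problem only fixes their means and variances. The helper names `simulate_xy` and `sample_corr` are illustrative.

```python
import math
import random
import statistics

def simulate_xy(a, b, n, seed=0):
    """Draw n pairs (x, y) with y = a*x + b*eps; x and eps independent N(0, 1)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [a * x + b * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

def sample_corr(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

a = 0.5 * math.sqrt(2.0)      # from Correl(x, y) = a / sqrt(2) = 0.5
b = math.sqrt(2.0 - a ** 2)   # from a^2 + b^2 = 2, positive root

xs, ys = simulate_xy(a, b, n=100_000)
var_y = statistics.variance(ys)
corr_xy = sample_corr(xs, ys)
print(round(a, 3), round(b, 3))
print(round(var_y, 2), round(corr_xy, 2))
```

With 100,000 pairs the sample variance of y and the sample correlation should land close to 2 and 0.5, up to sampling noise.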

Problem 2

Suppose that the time between failures of a structural component is
modeled as an exponentially distributed random variable. You want to
use the 10% quantile $x_{10}$ [defined by $F_x(x_{10}) = 0.10$] as an indication of
how often the component should be tested. You have the following 10
recorded times between failures (in hrs):

512  1464  4995  7216  1150  2717  7842  39898  1967  8103

a. Propose a technique for estimating $x_{10}$ from the observed times
between failures.

Solution:

One technique is to invert the exponential CDF, substituting the sample
mean $m_x$ for the unknown mean a:
$F(x) = 1 - \exp[-x/a]$; setting $F(x_{10}) = 0.1$ gives $x_{10} = -a\ln(0.9) = 0.1054\,a$
→ $\hat{x}_{10} = 0.1054\,m_x$

This estimator is both unbiased and consistent. Estimators that were not
unbiased and consistent were also given credit (since the problem does
not specify that they should be), but only estimators that actually
estimate $x_{10}$ legitimately were given full credit.

b. Compute an $x_{10}$ estimate from the above data.

Solution:

$\hat{x}_{10} = 0.1054 \times 7586 = 799.6$
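
As a check on the arithmetic, the estimate can be recomputed from the raw data. This is a minimal Python sketch (standard library only); the function name `exp_quantile_estimate` is an illustrative choice. Without intermediate rounding it gives a sample mean of 7586.4 hrs and an estimate near 799.3 hrs, which agrees with the value above up to the rounding of 0.1054 and 7586.

```python
import math
import statistics

# Recorded times between failures (hours), from the problem statement.
times = [512, 1464, 4995, 7216, 1150, 2717, 7842, 39898, 1967, 8103]

def exp_quantile_estimate(sample, p):
    """Estimate the p-quantile of an exponential distribution.

    Inverts F(x) = 1 - exp(-x/a) at F = p, with the sample mean in place of a.
    """
    m_x = statistics.fmean(sample)
    return -math.log(1.0 - p) * m_x

m_x = statistics.fmean(times)
x10_hat = exp_quantile_estimate(times, 0.10)
print(m_x, round(x10_hat, 1))
```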

c. Use the above data to derive a 99% large sample double-sided
confidence interval for the true $x_{10}$ value. You may wish to use the unit
normal CDF plot provided at the end of this quiz.

Solution:

Because the estimator in this case is $\hat{x}_{10}$ and not $m_x$, the approximation
$\mathrm{SD}(\hat{a}) \approx \mathrm{SD}(x)/\sqrt{N}$
CANNOT be used as is!

For the above estimator, $\mathrm{SD}(\hat{x}_{10}) \approx 0.1054\,\mathrm{SD}(x)/\sqrt{N} = 390.4$
The 99% confidence interval is:
$-2.575 < (799.6 - x_{10})/390.4 < 2.575$
$-205.7 < x_{10} < 1804.9$
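
The interval can be reproduced from the raw data. The Python sketch below (standard library only) uses the sample standard deviation for SD(x) and takes the 0.995 normal quantile from `statistics.NormalDist` instead of reading it off the CDF plot. Without intermediate rounding the endpoints come out near -206 and 1805. The lower endpoint is negative, so in practice it would be truncated at zero, since a time between failures cannot be negative.

```python
import math
import statistics

# Recorded times between failures (hours), from the problem statement.
times = [512, 1464, 4995, 7216, 1150, 2717, 7842, 39898, 1967, 8103]

N = len(times)
coef = -math.log(0.9)                          # about 0.1054
x10_hat = coef * statistics.fmean(times)       # point estimate of x10
sd_x10 = coef * statistics.stdev(times) / math.sqrt(N)

# Two-sided 99% interval: use the 0.995 quantile of the unit normal.
z = statistics.NormalDist().inv_cdf(0.995)
lower = x10_hat - z * sd_x10
upper = x10_hat + z * sd_x10
print(round(sd_x10, 1), round(lower, 1), round(upper, 1))
```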

Problem 3

Suppose that you have reason to believe that fluctuations x around
the long-term average tidal velocity normal to a shoreline are
uniformly distributed between -a and +a, with a mean of 0. The
distributional parameter a, the upper limit on the velocity, is
unknown. You have N velocity measurements [x1, x2, …, xN], which
you assume are a random sample drawn from the postulated uniform
distribution. In this problem you will use the method of moments to
estimate a from the random sample.

a. Derive an expression that relates the variance of x to the
parameter a.

Solution:

For this uniform distribution, $\mathrm{Var}[x] = (2a)^2/12$ (this can be derived by
integration). So $\mathrm{Var}[x] = a^2/3$
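
For reference, the integration can be written out as follows, using the uniform density $f(x) = 1/(2a)$ on $[-a, a]$ and the zero mean:

```latex
\mathrm{Var}[x] = E[x^2] - (E[x])^2
               = \int_{-a}^{a} \frac{x^2}{2a}\,dx - 0
               = \frac{1}{2a}\left[\frac{x^3}{3}\right]_{-a}^{a}
               = \frac{1}{2a}\cdot\frac{2a^3}{3}
               = \frac{a^2}{3}
```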

b. Use this expression to suggest an estimator $\hat{a}(x_1, x_2, \dots, x_N)$ for a. Do
you think your estimator is unbiased and consistent (you do not need to
prove that these properties apply; just state your opinion, with
justification)?

Solution:

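Using this expression (as the problem states), a good estimator is:
$\hat{a} = \sqrt{3}\,s_x$

This is consistent and, for large N, approximately unbiased: $s_x^2$ is an
unbiased estimator of Var[x] (taking the square root introduces a small
bias that vanishes as N grows), and the variance of $\hat{a}$
approaches zero as N approaches infinity.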
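
These properties can be probed with a small virtual experiment. The Python sketch below (standard library only) assumes a hypothetical true value a = 2, chosen only for illustration, and the helper names are not from the original. It draws repeated uniform samples of increasing size and reports the mean and standard deviation of the resulting estimates.

```python
import math
import random
import statistics

def a_hat(sample):
    """Method-of-moments estimate of a: sqrt(3) times the sample std. dev."""
    return math.sqrt(3.0) * statistics.stdev(sample)

def estimator_mean_and_sd(a_true, n, nrep, seed=0):
    """Mean and standard deviation of a_hat over nrep simulated samples of size n."""
    rng = random.Random(seed)
    estimates = [
        a_hat([rng.uniform(-a_true, a_true) for _ in range(n)])
        for _ in range(nrep)
    ]
    return statistics.fmean(estimates), statistics.stdev(estimates)

a_true = 2.0  # hypothetical true value, for illustration only
results = {n: estimator_mean_and_sd(a_true, n, nrep=1000) for n in (10, 100, 1000)}
for n, (mean_est, sd_est) in results.items():
    print(n, round(mean_est, 3), round(sd_est, 3))
```

The spread of the estimates should shrink roughly like 1/sqrt(N), and their average should approach the assumed true value as N grows.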

c. Describe how you would derive a two-sided large sample
confidence interval for a from i) a specified confidence level $1 - \alpha$,
ii) an actual estimate $\hat{a}$ of a computed from the N velocity
measurements with your suggested estimator, and iii) the standard
deviation of $\hat{a}$. Describe your procedure step-by-step so it could
be carried out with a real data set.

Solution:

$\sqrt{3}\,s_x - \mathrm{SD}(\hat{a})\,F^{-1}(1 - \alpha/2) < a < \sqrt{3}\,s_x - \mathrm{SD}(\hat{a})\,F^{-1}(\alpha/2)$

where $F^{-1}$ is the inverse of the unit normal CDF (large sample
assumption)

d. Identify how you would obtain approximate values for any
unknown quantities appearing in the confidence interval
expression. In particular, provide a MATLAB (or pseudocode)
program for any virtual experiment/Monte Carlo calculations that you
would perform. If you cannot remember the exact name or syntax for
a particular internal MATLAB function (such as exprnd or normcdf),
just specify your own syntax and identify what the function does in
words. Then include it in the appropriate place in your program.
Alternatively, ask us.

Solution:

function [lowerbound, upperbound] = test(actual_data, alpha)
% This program simulates replicates to determine SD[ahat],
% which is the unknown quantity in Part c.
%
% actual_data : vector of observed velocities
% alpha       : 1 - confidence level (e.g. 0.05 for 95%); it was not
%               defined in the original listing, so it is passed in here
N = length(actual_data);            % number of data points per sample
nrep = 1000;                        % number of simulated replicates
ahat = sqrt(3)*std(actual_data);    % estimate of a from the sx of the data
% Assume the unknown true value of the uniform distribution limit a
% is equal to ahat, and generate nrep replicates with unifrnd
% (one replicate per column)
sim_data = unifrnd(-ahat, ahat, N, nrep);
% compute the estimate for each replicate (std works column-wise)
simahats = sqrt(3)*std(sim_data);
% find the standard deviation of the estimate over the replicates
sdahat = std(simahats);

% That's all we need, so just plug in:
lowerbound = ahat - sdahat*norminv(1-alpha/2, 0, 1);
upperbound = ahat - sdahat*norminv(alpha/2, 0, 1);
end

Problem 4

Consider two groups of engineers 7 years out of MIT: one group of 10
with professional engineer (PE) registration and one group of 8
without. The salaries of each group (in tens of thousand dollars) are as
follows:

With PEs: 66 41 77 80 52 98 99 74 81 78

Without PEs: 65 88 55 124 66 72 96 71

Using a large sample assumption and this data set, perform a two-sided
test of the hypothesis that the mean salaries of engineers with and
without PEs are the same. Summarize your results by reporting the p
value for the test. When picking the two groups of engineers how could
you minimize the impact of factors other than PE registration on your
conclusions? You may wish to use the unit normal CDF plot provided at
the end of this quiz.

Solution:

With PE (x): $m_x = 74.6$, $s_x = 18.0874$

Without PE (y): $m_y = 79.625$, $s_y = 22.1871$

$z = (74.6 - 79.625)\,(18.0874^2/10 + 22.1871^2/8)^{-0.5} = -0.518$

From the unit normal CDF plot, $F_z(-0.518) = 0.3 = p/2$

$p = 0.6$
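
The calculation can be reproduced numerically. Below is a minimal Python sketch (standard library only; the helper name `two_sample_z` is hypothetical) that uses the sample variances and evaluates the standard normal CDF with `statistics.NormalDist` in place of the chart.

```python
import math
import statistics

# Salaries from the problem statement.
with_pe = [66, 41, 77, 80, 52, 98, 99, 74, 81, 78]
without_pe = [65, 88, 55, 124, 66, 72, 96, 71]

def two_sample_z(x, y):
    """Large-sample z statistic and two-sided p-value for equal means."""
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    z = (statistics.fmean(x) - statistics.fmean(y)) / se
    p = 2.0 * statistics.NormalDist().cdf(-abs(z))
    return z, p

z, p_value = two_sample_z(with_pe, without_pe)
print(round(z, 3), round(p_value, 2))
```

A p-value this large means the data give no evidence against equal mean salaries. With only 10 and 8 observations the large-sample normal approximation is rough, so the value should be read as approximate.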
