DIPS ACADEMY — Regenerating Mathematics
An ISO 9001:2008 Certified Institute
High Quality Study Material for Higher Level Exams, for U.G. & P.G. Students

Dips Academy Statistics Notes

Contents

Chapter 1: Probability
  1.1 Outcomes and Events
  1.2 Probability Functions/Measures
    1.2.1 Properties of P[.]
  1.3 Conditional Probability and Independence

Chapter 2: Random Variables
  2.1 Random Variables and Cumulative Distribution Functions
  2.2 Density Functions
  2.3 Expectations and Moments
  2.4 Expectation of a Function of a Random Variable
  2.5 Two Important Inequality Results
  2.6 Moments and Moment Generating Functions
  2.7 Other Distribution Summaries

Chapter 3: Special Univariate Distributions
  3.1 Discrete Distributions
    3.1.1 Degenerate Distribution
    3.1.2 Two Point Distribution
    3.1.3 Uniform Distribution on n Points
    3.1.4 Binomial Distribution
    3.1.5 Negative Binomial Distribution (Pascal or Waiting-time Distribution)
    3.1.6 Hyper-Geometric Distribution
    3.1.7 Poisson Distribution
    3.1.8 Multinomial Distribution
  3.2 Continuous Distributions
    3.2.1 Uniform Distribution
    3.2.2 Gamma Distribution
    3.2.3 Beta Distribution
    3.2.4 Normal Distribution (Gaussian Law)
    3.2.5 Cauchy Distribution

Chapter 4: Joint and Conditional Distributions
  4.1 Joint Distributions
  4.2 Special Multivariate Distributions
    4.2.1 Multinomial Distribution
  4.3 Conditional Distributions
  4.4 Conditional Expectation
  4.5 Conditional Expectations of Functions of Random Variables
  4.6 Independence of Random Variables
  4.7 Covariance and Correlation
  4.8 Transformation of Random Variables: Y = g(X)
  4.9 Moment-Generating-Function Technique

Chapter 5: Inference
  5.1 Sample Statistics
  5.2 Sampling Distributions
  5.3 Point Estimation
  5.4 Interval Estimation

Chapter 6: Maximum Likelihood Estimation
  6.1 Likelihood Function and ML Estimator
  6.4 Properties of MLEs

Chapter 7: Hypothesis Testing
  7.1 Introduction
  7.2 Simple Hypothesis Tests
  7.3 Simple Null, Composite Alternative
  7.4 Composite Hypothesis Tests

Chapter 8: Elements of Bayesian Inference
  8.1 Introduction
  8.2 Parameter Estimation
  8.3 Conjugate Prior Distributions

Chapter 9: Markov Chains
  9.1 Discrete-Time Markov Chains

Chapter 10: Miscellaneous
  10.1 Reliability Analysis
  10.2 Quadratic Forms

Assignment Sheet 1
Assignment Sheet 2
Assignment Sheet 3
Assignment Sheet 4

CHAPTER 1
PROBABILITY

1.1 Outcomes and Events

We consider experiments which comprise: a collection of distinguishable outcomes, which are termed elementary events and typically denoted by $\omega$; and a collection of sets of possible outcomes, the events, to which we might wish to assign probabilities, denoted by $\mathcal{A}$. In order to obtain a sensible theory of probability, we require that our collection of events $\mathcal{A}$ is an algebra over $\Omega$, i.e. it must possess the following properties:
(i) $\Omega \in \mathcal{A}$;
(ii) if $A \in \mathcal{A}$, then $A^c \in \mathcal{A}$;
(iii) if $A_1 \in \mathcal{A}$ and $A_2 \in \mathcal{A}$, then $A_1 \cup A_2 \in \mathcal{A}$.

In the case of finite $\Omega$, we might note that the collection of all subsets of $\Omega$ necessarily satisfies the above properties, and by using this default choice of algebra we can assign probabilities to any possible combination of elementary events.

Proposition 1.1: If $\mathcal{A}$ is an algebra, then $\emptyset \in \mathcal{A}$.

Proposition 1.2: If $A_1, A_2 \in \mathcal{A}$, then $A_1 \cap A_2 \in \mathcal{A}$, for any algebra $\mathcal{A}$.

Proposition 1.3: If $\mathcal{A}$ is an algebra and $A_1, A_2, \dots, A_n \in \mathcal{A}$, then $\bigcup_{i=1}^{n} A_i \in \mathcal{A}$.
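For a finite $\Omega$ the power set is exactly the default choice of algebra mentioned above. The small Python sketch below makes the three properties concrete; the three-element $\Omega$ used here is an arbitrary illustrative choice.

```python
from itertools import chain, combinations

omega = frozenset({1, 2, 3})

def power_set(s):
    """All subsets of s, each as a frozenset."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

algebra = set(power_set(omega))

# (i) Omega itself is an event.
assert omega in algebra
# (ii) Closure under complementation.
assert all(omega - a in algebra for a in algebra)
# (iii) Closure under (pairwise, hence finite) unions.
assert all(a | b in algebra for a in algebra for b in algebra)

print(f"The power set of {set(omega)} has {len(algebra)} events "
      "and satisfies properties (i)-(iii).")
```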
1.2 Probability Functions/Measures

Let $\Omega$ denote the sample space and $\mathcal{A}$ denote a collection of events assumed to be a $\sigma$-algebra.

Definition 1.1 (Probability Function): A probability function $P[\cdot]$ is a set function with domain $\mathcal{A}$ (a $\sigma$-algebra of events) and range $[0,1]$, i.e. $P: \mathcal{A} \to [0,1]$, which satisfies the following axioms:
(i) $P[A] \ge 0$ for every $A \in \mathcal{A}$;
(ii) $P[\Omega] = 1$;
(iii) if $A_1, A_2, \dots$ is a sequence of mutually exclusive events (i.e. $A_i \cap A_j = \emptyset$ for any $i \ne j$) in $\mathcal{A}$, and if $\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}$, then
$$P\left[\bigcup_{i=1}^{\infty} A_i\right] = \sum_{i=1}^{\infty} P[A_i].$$

1.2.1 Properties of P[.]

A remarkably rich theory emerges from these three axioms (together, of course, with those of set theory). Indeed, all formal probability follows as a logical consequence of these axioms. Some of the most important simple results are summarised here. Throughout this section, assume that $\mathcal{A}$, our collection of possible events, is a $\sigma$-algebra over $\Omega$ and that $P[\cdot]$ is an associated probability function. Many of these results simply demonstrate that things which we would intuitively want to be true of probabilities do, indeed, arise as logical consequences of this simple axiomatic framework.

Proposition 1.4: $P[\emptyset] = 0$.

Proposition 1.5: If $A_1, \dots, A_n$ are pairwise disjoint elements of $\mathcal{A}$, corresponding to mutually exclusive outcomes in our experiment, then
$$P[A_1 \cup A_2 \cup \dots \cup A_n] = \sum_{i=1}^{n} P[A_i].$$

Proposition 1.6: If $A \in \mathcal{A}$, then $P[A^c] = 1 - P[A]$.

Proposition 1.7: For any two events $A, B \in \mathcal{A}$,
$$P[A \cup B] = P[A] + P[B] - P[A \cap B].$$

Proposition 1.8: If $A, B \in \mathcal{A}$ and $A \subseteq B$, then $P[A] \le P[B]$.

Proposition 1.9 (Boole's Inequality): If $A_1, \dots, A_n \in \mathcal{A}$, then
$$P[A_1 \cup A_2 \cup \dots \cup A_n] \le P[A_1] + P[A_2] + \dots + P[A_n].$$

Definition 1.2 (Probability Space): A probability space is the triple $(\Omega, \mathcal{A}, P[\cdot])$, where $\Omega$ is a sample space, $\mathcal{A}$ is a $\sigma$-algebra over $\Omega$, and $P[\cdot]$ is a probability function with domain $\mathcal{A}$.

1.3 Conditional Probability and Independence

Sometimes it is possible to observe that one event has occurred. In this situation, we wish to have a model for the behaviour of the probabilities of other events compatible with this knowledge. Conditional probability is the appropriate language.

Definition 1.3 (Conditional Probability): Let $A$ and $B$ be events in $\mathcal{A}$ of the given probability space $(\Omega, \mathcal{A}, P[\cdot])$. The conditional probability of event $A$ given event $B$, denoted by $P[A \mid B]$, is defined as
$$P[A \mid B] = \frac{P[A \cap B]}{P[B]}$$
if $P[B] > 0$, and is left undefined when $P[B] = 0$.

Exercise 1.3.1: Consider the experiment of tossing two coins, $\Omega = \{(H,H), (H,T), (T,H), (T,T)\}$, and assume that each point is equally likely. Find
(i) the probability of two heads given a head on the first coin;
(ii) the probability of two heads given at least one head.
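Definition 1.3 can be checked mechanically on the two-coin experiment of Exercise 1.3.1. The short Python sketch below enumerates the four equally likely outcomes and evaluates both conditional probabilities directly from $P[A \mid B] = P[A \cap B]/P[B]$.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin tosses, each outcome equally likely.
omega = list(product("HT", repeat=2))
prob = Fraction(1, len(omega))

def p(event):
    """Probability of an event given as a set of outcomes."""
    return prob * len(event)

def p_cond(a, b):
    """Conditional probability P(A | B), assuming P(B) > 0."""
    return p(a & b) / p(b)

two_heads = {("H", "H")}
head_first = {w for w in omega if w[0] == "H"}
at_least_one_head = {w for w in omega if "H" in w}

print(p_cond(two_heads, head_first))         # 1/2
print(p_cond(two_heads, at_least_one_head))  # 1/3
```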
Theorem 1.1 (Law of Total Probability): For a given probability space $(\Omega, \mathcal{A}, P[\cdot])$, if $B_1, \dots, B_n$ is a collection of mutually disjoint events in $\mathcal{A}$ satisfying $\Omega = \bigcup_{j=1}^{n} B_j$ (i.e. $B_1, \dots, B_n$ partition $\Omega$) and $P[B_j] > 0$ for $j = 1, \dots, n$, then for every $A \in \mathcal{A}$,
$$P[A] = \sum_{j=1}^{n} P[A \cap B_j].$$

Conditional probability has a number of useful properties. The following elementary result is surprisingly important and has some far-reaching consequences.

Theorem 1.2 (Bayes' Formula): For a given probability space $(\Omega, \mathcal{A}, P[\cdot])$, if $A, B \in \mathcal{A}$ are such that $P[A] > 0$ and $P[B] > 0$, then
$$P[A \mid B] = \frac{P[B \mid A]\, P[A]}{P[B]}.$$

Theorem 1.3 (Partition Formula): If $B_1, \dots, B_n \in \mathcal{A}$ partition $\Omega$, then for any $A \in \mathcal{A}$,
$$P[A] = \sum_{j=1}^{n} P[A \mid B_j]\, P[B_j].$$

Theorem 1.4 (Multiplication Rule): For a given probability space $(\Omega, \mathcal{A}, P[\cdot])$, let $A_1, \dots, A_n$ be events belonging to $\mathcal{A}$ for which $P[A_1 \cap \dots \cap A_{n-1}] > 0$. Then
$$P[A_1 \cap A_2 \cap \dots \cap A_n] = P[A_1]\, P[A_2 \mid A_1]\, P[A_3 \mid A_1 \cap A_2] \cdots P[A_n \mid A_1 \cap \dots \cap A_{n-1}].$$

Definition 1.4 (Independent Events): For a given probability space $(\Omega, \mathcal{A}, P[\cdot])$, let $A$ and $B$ be two events in $\mathcal{A}$. Events $A$ and $B$ are defined to be independent iff one of the following conditions is satisfied:
(i) $P[A \cap B] = P[A]\, P[B]$;
(ii) $P[A \mid B] = P[A]$ if $P[B] > 0$;
(iii) $P[B \mid A] = P[B]$ if $P[A] > 0$.

Exercise 1.3.2: Consider the experiment of rolling two dice. Let $A$ = {total is odd}, $B$ = {6 on the first die}, $C$ = {total is seven}.
(i) Are $A$ and $B$ independent?
(ii) Are $A$ and $C$ independent?
(iii) Are $B$ and $C$ independent?

Definition 1.5 (Independence of Several Events): For a given probability space $(\Omega, \mathcal{A}, P[\cdot])$, let $A_1, \dots, A_n$ be events in $\mathcal{A}$. Events $A_1, \dots, A_n$ are defined to be independent iff
(i) $P[A_i \cap A_j] = P[A_i]\, P[A_j]$ for all $i \ne j$;
(ii) $P[A_i \cap A_j \cap A_k] = P[A_i]\, P[A_j]\, P[A_k]$ for all distinct $i, j, k$;
and so on, up to
$$P\left[\bigcap_{i=1}^{n} A_i\right] = \prod_{i=1}^{n} P[A_i].$$

CHAPTER 2
RANDOM VARIABLES

2.1 Random Variables and Cumulative Distribution Functions

We considered random events in the previous chapter: experimental outcomes which either do or do not occur. In general we cannot predict whether or not a random event will occur before we observe the outcome of the associated experiment, although if we know enough about the experiment we may be able to make good probabilistic predictions. The natural generalisation of a random event is a random variable: an object which can take values in the set of real numbers (rather than simply happening or not happening), and for which the precise value taken is not known before the experiment is observed.

The following definition may seem a little surprising if you have previously seen probability only outside of measure-theoretic settings. In particular, random variables are deterministic functions: neither random nor variable in themselves. This definition is rather convenient; all randomness stems from the underlying probability space, and it is clear that random variables and random events are closely related. The definition also makes it straightforward to define multiple random variables related to a single experiment and to investigate and model the relationships between them.

Definition 2.1 (Random Variable): Given a probability space $(\Omega, \mathcal{A}, P[\cdot])$, a random variable, $X$, is a function with domain $\Omega$ and co-domain $\mathbb{R}$ (the real line), i.e. $X: \Omega \to \mathbb{R}$.

Example 2.1: Roll two dice, so $\Omega = \{(i, j) : i, j = 1, \dots, 6\}$. Several random variables can be defined, for example $X((i,j)) = i + j$ and $Y((i,j)) = |i - j|$. Both $X$ and $Y$ are random variables: $X$ can take the values $2, 3, \dots, 12$ and $Y$ can take the values $0, 1, \dots, 5$.

Definition 2.2 (Distribution Function): The distribution function of a random variable $X$, denoted by $F_X(\cdot)$, is defined to be the function $F_X: \mathbb{R} \to [0, 1]$ which assigns $F_X(x) = P[X \le x]$ for every $x \in \mathbb{R}$.

2.2 Density Functions

Definition (Probability Density Function): A function $f: \mathbb{R} \to [0, \infty)$ is a probability density function iff
(i) $f(x) \ge 0$ for all $x$;
(ii) $\int_{-\infty}^{\infty} f(x)\, dx = 1$.

2.3 Expectations and Moments

Definition 2.9 (Expectation, Mean): Let $X$ be a random variable. The mean of $X$, denoted by $\mu_X$ or $E[X]$, is defined by
(i) $E[X] = \sum_j x_j f_X(x_j)$ if $X$ is discrete with mass points $x_1, x_2, \dots$;
(ii) $E[X] = \int_{-\infty}^{\infty} x f_X(x)\, dx$ if $X$ is continuous with density $f_X(x)$.
Intuitively, $E[X]$ is the centre of gravity of the unit mass that is specified by the density function.

Exercise 2.3.1: Consider the experiment of rolling two dice. Let $X$ denote the total of the two dice and $Y$ their absolute difference. Compute $E[X]$ and $E[Y]$.
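The expectations asked for in Exercise 2.3.1 can be verified by direct enumeration, since the sample space of Example 2.1 has only 36 equally likely points. The Python sketch below applies Definition 2.9(i).

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs
p = Fraction(1, len(outcomes))

def expectation(rv):
    """E[rv(i, j)] over a finite, equally likely sample space."""
    return sum(p * rv(i, j) for i, j in outcomes)

e_total = expectation(lambda i, j: i + j)          # X = i + j
e_absdiff = expectation(lambda i, j: abs(i - j))   # Y = |i - j|

print(e_total)     # 7
print(e_absdiff)   # 35/18, roughly 1.944
```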
Exercise 2.3.2: Let $X$ be a continuous random variable with a given density $f_X(x)$ on $0 \le x \le b$ (and zero elsewhere). Compute $E[X]$.

Corollary 2.1: If $X$ is a random variable with finite variance $\sigma_X^2$, then for every $r > 0$,
$$P[|X - \mu_X| \ge r\sigma_X] \le \frac{1}{r^2}.$$
Note that the last statement can also be written as $P[|X - \mu_X| < r\sigma_X] \ge 1 - \frac{1}{r^2}$. Thus the probability that $X$ falls within $r\sigma_X$ units of $\mu_X$ is greater than or equal to $1 - 1/r^2$. For $r = 2$ one gets $P[\mu_X - 2\sigma_X < X < \mu_X + 2\sigma_X] \ge \frac{3}{4}$.

CHAPTER 3
SPECIAL UNIVARIATE DISTRIBUTIONS

3.1 Discrete Distributions

3.1.4 Binomial Distribution

$X$ is said to have the binomial distribution with parameters $n$ and $p$ if its PMF is
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n, \quad 0 \le p \le 1.$$
We write $X \sim b(n, p)$. Then $E(X) = np$, $\mathrm{Var}(X) = npq$ (where $q = 1 - p$) and $M(t) = (q + pe^t)^n$.

Note: The binomial distribution can also be considered as the distribution of a sum of $n$ independent and identical Bernoulli r.v.s, i.e. $b(1, p)$ r.v.s.

Results:
(i) Let $X_i$ ($i = 1, 2, \dots, k$) be independent R.V.s with $X_i \sim b(n_i, p)$. Then $X_1 + X_2 + \dots + X_k \sim b(n_1 + \dots + n_k, p)$. In particular, take $n_i = 1$ for all $i$.
(ii) Let $X, Y$ be two independent, non-negative, finite integer-valued R.V.s and let $Z = X + Y$. Then $Z$ is a binomial R.V. with parameter $p$ iff $X$ and $Y$ are binomial R.V.s with the same parameter $p$.
(iii) If $X \sim b(n, p)$ then $n - X \sim b(n, q)$.

3.1.5 Negative Binomial Distribution (Pascal or Waiting-time Distribution)

Let $X$ denote the number of failures that precede the $r$-th success; then $X + r$ is the total number of trials needed to obtain $r$ successes. Its PMF is
$$P(X = k) = \binom{r + k - 1}{k} p^r q^k, \quad k = 0, 1, 2, \dots$$
We write $X \sim NB(r, p)$. Then $E(X) = \frac{rq}{p}$, $\mathrm{Var}(X) = \frac{rq}{p^2}$ and $M(t) = p^r (1 - qe^t)^{-r}$, provided $qe^t < 1$.

Note: If we are interested in the number of trials needed, then $Y = X + r$ and
$$P(Y = y) = \binom{y - 1}{r - 1} p^r q^{y - r}, \quad y = r, r + 1, \dots$$

Special case: If $r = 1$ then we say $X$ is a geometric R.V.

Results:
(i) Let $X_1, \dots, X_k$ be independent $NB(r_i; p)$ R.V.s. Then $\sum_i X_i \sim NB(\sum_i r_i; p)$. Corollary: take $r_i = 1$ for all $i$; then a sum of independent geometric R.V.s is negative binomial.
(ii) Let $X$ and $Y$ be independent $NB(r_1; p)$ and $NB(r_2; p)$. Then the conditional PMF of $X$ given $X + Y = t$ is
$$P(X = x \mid X + Y = t) = \frac{\binom{r_1 + x - 1}{x}\binom{r_2 + t - x - 1}{t - x}}{\binom{r_1 + r_2 + t - 1}{t}}, \quad x = 0, 1, \dots, t.$$
In particular, if $r_1 = r_2 = 1$, this conditional distribution is uniform on the $t + 1$ points $0, 1, \dots, t$.
(iii) If $X$ has a geometric distribution then for any $m, n \in \mathbb{N} \cup \{0\}$ (memoryless property), $P[X \ge m + n \mid X \ge m] = P[X \ge n]$. The converse is also true.
(iv) Let $X$ be a non-negative integer-valued R.V. satisfying $P[X > m + 1 \mid X > m] = P[X > 1]$ for all $m$. Then $X$ must have a geometric distribution.
(v) Let $X_i$ ($i = 1, \dots, n$) be independent geometric R.V.s with parameters $p_i$, $q_i = 1 - p_i$. Then $X_{(1)} = \min_i\{X_i\}$ is also geometric, with parameter $p = 1 - \prod_i q_i$.

3.1.6 Hyper-Geometric Distribution

A box contains $N$ marbles, of which $M$ are marked. Now $n$ marbles are drawn. Let $X$ denote the number of marked marbles drawn. Then
$$P(X = x) = \frac{\binom{M}{x}\binom{N - M}{n - x}}{\binom{N}{n}}, \quad \max(0, M + n - N) \le x \le \min(M, n).$$

Result:
(i) Let $X$ and $Y$ be independent R.V.s with distributions $b(m, p)$ and $b(n, p)$. Then the conditional distribution of $X$ given $X + Y = t$ is hypergeometric:
$$P(X = x \mid X + Y = t) = \frac{\binom{m}{x}\binom{n}{t - x}}{\binom{m + n}{t}}.$$

3.1.7 Poisson Distribution

A R.V. $X$ is said to be a Poisson R.V. with parameter $\lambda > 0$ if its PMF is given by
$$P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, \dots$$
Then $E(X) = \lambda$, $\mathrm{Var}(X) = \lambda$ and $M(t) = \exp\{\lambda(e^t - 1)\}$.

Results:
(i) Let $X_i$ be independent Poisson R.V.s, $X_i \sim P(\lambda_i)$, $i = 1, 2, \dots, n$. Then $S_n = X_1 + \dots + X_n \sim P(\lambda_1 + \dots + \lambda_n)$. The converse is also true.
(ii) Let $X, Y$ be independent R.V.s with $P(\lambda_1)$ and $P(\lambda_2)$ distributions respectively; then the conditional distribution of $X$ given $X + Y$ is binomial. (The converse is true.)

Uses of the Poisson Distribution

For large $n$ and small $p$, $X \sim \mathrm{Bin}(n, p)$ is approximately distributed as $\mathrm{Poi}(np)$. This is sometimes termed the "law of small numbers".
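The "law of small numbers" is easy to see numerically. The Python sketch below compares $\mathrm{Bin}(n, p)$ with $\mathrm{Poi}(np)$ probabilities using only the standard library; the particular values $n = 1000$ and $p = 0.003$ are arbitrary illustrative choices.

```python
import math

n, p = 1000, 0.003           # large n, small p (illustrative values)
lam = n * p                  # Poisson parameter np

def binom_pmf(k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    return math.exp(-lam) * lam**k / math.factorial(k)

for k in range(8):
    print(f"k={k}  binomial={binom_pmf(k):.6f}  poisson={poisson_pmf(k):.6f}")
```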
A Poisson process with rate $\lambda$ per unit time is such that
(i) $X$, the number of occurrences of an event in any given time interval of length $s$, is $\mathrm{Poi}(\lambda s)$;
(ii) the numbers of events in non-overlapping time intervals are independent random variables (see later).

3.1.8 Multinomial Distribution (Generalised Binomial Distribution)

Let $x_1, \dots, x_{k-1}$ be non-negative integers such that $x_1 + \dots + x_{k-1} \le n$. Then the probability that, in $n$ trials, exactly $x_i$ trials terminate in outcome $A_i$ ($i = 1, \dots, k-1$), and hence that $x_k = n - (x_1 + \dots + x_{k-1})$ trials terminate in $A_k$, is
$$P(X_1 = x_1, \dots, X_{k-1} = x_{k-1}) = \frac{n!}{x_1!\cdots x_{k-1}!\, x_k!}\, p_1^{x_1}\cdots p_{k-1}^{x_{k-1}} p_k^{x_k}.$$
The MGF is $M(t_1, \dots, t_{k-1}) = (p_1 e^{t_1} + \dots + p_{k-1} e^{t_{k-1}} + p_k)^n$, and
$$E(X_i) = n p_i, \quad \mathrm{Var}(X_i) = n p_i(1 - p_i), \quad \mathrm{Cov}(X_i, X_j) = -n p_i p_j.$$

Summary of the discrete distributions:
1. Poisson $X \sim P(\lambda)$: $P(X = k) = e^{-\lambda}\lambda^k/k!$, $k = 0, 1, 2, \dots$; $E(X) = \lambda$; $\mathrm{Var}(X) = \lambda$; $M(t) = e^{\lambda(e^t - 1)}$.
2. Binomial $X \sim b(n, p)$: $P(X = k) = \binom{n}{k}p^k q^{n-k}$, $k = 0, 1, \dots, n$; $E(X) = np$; $\mathrm{Var}(X) = npq$; $M(t) = (q + pe^t)^n$.
3. Uniform on the $n$ points $\{1, \dots, n\}$: $P(X = k) = 1/n$; $E(X) = (n+1)/2$; $\mathrm{Var}(X) = (n^2 - 1)/12$; $M(t) = \frac{1}{n}\sum_{k=1}^{n} e^{tk}$.
4. Two-point distribution on $\{a, b\}$ with $P(X = a) = p$: $E(X) = pa + qb$; $\mathrm{Var}(X) = pq(a - b)^2$; $M(t) = pe^{at} + qe^{bt}$.
5. Negative binomial $X \sim NB(r, p)$: $P(X = k) = \binom{r+k-1}{k}p^r q^k$; $E(X) = rq/p$; $\mathrm{Var}(X) = rq/p^2$; $M(t) = p^r(1 - qe^t)^{-r}$.
6. Geometric ($r = 1$): $P(X = k) = pq^k$; $E(X) = q/p$; $\mathrm{Var}(X) = q/p^2$; $M(t) = p(1 - qe^t)^{-1}$.
7. Hypergeometric: $P(X = x) = \binom{M}{x}\binom{N-M}{n-x}/\binom{N}{n}$; $E(X) = n\frac{M}{N}$; $\mathrm{Var}(X) = n\frac{M}{N}\left(1 - \frac{M}{N}\right)\frac{N-n}{N-1}$.

3.2 Continuous Distributions

3.2.1 Uniform Distribution

$X$ is said to have the uniform distribution on $[a, b]$ if its PDF is given by
$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise.} \end{cases}$$
Then $E(X) = \frac{a + b}{2}$, $\mathrm{Var}(X) = \frac{(b - a)^2}{12}$ and $M(t) = \frac{e^{tb} - e^{ta}}{t(b - a)}$ for $t \ne 0$.

Result: Let $X$ be an R.V. with a continuous DF $F$. Then $F(X)$ has the uniform distribution on $(0, 1)$.

3.2.2 Gamma Distribution

An R.V. $X$ is said to have the gamma distribution with parameters $\alpha$ and $\beta$ ($\alpha > 0$, $\beta > 0$) if its PDF is
$$f(x) = \begin{cases} \dfrac{x^{\alpha - 1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}, & x > 0 \\ 0, & \text{otherwise.} \end{cases}$$
We write $X \sim G(\alpha, \beta)$. Then $E(X) = \alpha\beta$, $\mathrm{Var}(X) = \alpha\beta^2$ and $M(t) = (1 - \beta t)^{-\alpha}$ for $t < 1/\beta$.

Special cases:
(a) When $\alpha = 1$, we say $X$ follows the exponential distribution with parameter $\beta$:
$$f(x) = \begin{cases} \frac{1}{\beta} e^{-x/\beta}, & x > 0 \\ 0, & \text{otherwise,} \end{cases}$$
with $E(X) = \beta$ and $\mathrm{Var}(X) = \beta^2$.
(b) When $\alpha = n/2$ ($n$ an integer) and $\beta = 2$,
$$f(x) = \begin{cases} \dfrac{x^{n/2 - 1} e^{-x/2}}{\Gamma(n/2)\, 2^{n/2}}, & x > 0 \\ 0, & \text{otherwise,} \end{cases}$$
and $X$ is said to have the chi-square $\chi^2(n)$ distribution, with $E(X) = n$ and $\mathrm{Var}(X) = 2n$.

Results:
(i) Let $X_i$ ($i = 1, \dots, n$) be independent R.V.s such that $X_i \sim G(\alpha_i, \beta)$. Then $S_n = \sum_i X_i \sim G(\sum_i \alpha_i, \beta)$. Corollary: take $\alpha_i = 1$ for all $i$; then $S_n \sim G(n, \beta)$, i.e. a sum of independent exponentials is gamma.
(ii) Let $X \sim G(\alpha_1, \beta)$ and $Y \sim G(\alpha_2, \beta)$ be independent R.V.s. Then $X + Y$ and $\frac{X}{Y}$ are independent, and $X + Y$ and $\frac{X}{X + Y}$ are independent. The converse is also true.
(iii) Memoryless property of the exponential: $P(X > r + s \mid X > r) = P(X > s)$, where $X \sim \exp(\lambda)$.
(iv) If $X$ and $Y$ are independent exponential R.V.s with the same parameter $\beta$, then $Z = \frac{X}{X + Y}$ has a $U(0, 1)$ distribution.

3.2.3 Beta Distribution

An R.V. $X$ is said to have the beta distribution with parameters $\alpha$ and $\beta$ ($\alpha > 0$, $\beta > 0$) if its PDF is
$$f(x) = \begin{cases} \dfrac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}, & 0 < x < 1 \\ 0, & \text{otherwise.} \end{cases}$$
We write $X \sim B(\alpha, \beta)$. Then $E(X) = \frac{\alpha}{\alpha + \beta}$ and $\mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$.

Note: If $\alpha = \beta = 1$, we have $U(0, 1)$.

Results:
(i) If $X \sim B(\alpha, \beta)$ then $1 - X \sim B(\beta, \alpha)$.
(ii) Let $X \sim G(\alpha_1, \beta)$ and $Y \sim G(\alpha_2, \beta)$ be independent. Then $Z = \frac{X}{X + Y} \sim B(\alpha_1, \alpha_2)$.
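Result (iv) for the exponential (equivalently, result (ii) for the beta with $\alpha_1 = \alpha_2 = 1$) can be checked by a quick Monte Carlo sketch in Python; the scale $\beta = 2$ and the number of replications below are arbitrary choices. The empirical mean and variance of $X/(X+Y)$ should be close to the $U(0,1)$ values $1/2$ and $1/12$.

```python
import random
import statistics

random.seed(0)
beta = 2.0            # common scale parameter (arbitrary choice)
n = 100_000

ratios = []
for _ in range(n):
    x = random.expovariate(1 / beta)   # expovariate takes the rate, i.e. 1/beta
    y = random.expovariate(1 / beta)
    ratios.append(x / (x + y))

print(statistics.mean(ratios))       # close to 1/2
print(statistics.variance(ratios))   # close to 1/12 ~ 0.0833
```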
3.2.4 Normal Distribution (Gaussian Law)

(a) An R.V. $X$ is said to have a standard normal distribution if its PDF is
$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \quad -\infty < x < \infty.$$
We write $X \sim N(0, 1)$.

(b) An R.V. $X$ is said to have a normal distribution with parameters $\mu$ and $\sigma$ ($> 0$) if its PDF is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}, \quad -\infty < x < \infty.$$
We write $X \sim N(\mu, \sigma^2)$, and $M(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$.

Central moments: $E[(X - \mu)^n] = 0$ if $n$ is odd, and $E[(X - \mu)^n] = [(n-1)(n-3)\cdots 3 \cdot 1]\,\sigma^n$ if $n$ is even.

Results:
(i) Let $X_1, \dots, X_n$ be independent R.V.s such that $X_k \sim N(\mu_k, \sigma_k^2)$, $k = 1, \dots, n$. Then $\sum_k X_k \sim N\left(\sum_k \mu_k, \sum_k \sigma_k^2\right)$.
Corollary: If $X_i \sim N(0, 1)$ ($i = 1, \dots, n$) are independent, then $\frac{1}{\sqrt{n}}\sum_i X_i \sim N(0, 1)$.
(ii) Let $X$ and $Y$ be independent R.V.s. Then $X + Y$ is normal iff $X$ and $Y$ are normal.
(iii) Let $X$ and $Y$ be independent $N(0, 1)$ R.V.s; then $X + Y$ and $X - Y$ are independent.
(iv) Let $X_1, X_2$ be independent $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$ R.V.s; then $X_1 - X_2$ and $X_1 + X_2$ are independent.
(v) $X \sim N(0, 1) \Rightarrow X^2 \sim \chi^2(1)$.
(vi) $X \sim N(\mu, \sigma^2) \Rightarrow \frac{X - \mu}{\sigma} \sim N(0, 1)$, and $aX + b \sim N(a\mu + b, a^2\sigma^2)$.
(vii) If $X$ and $Y$ are i.i.d. $N(0, \sigma^2)$ R.V.s, then $\frac{X}{Y}$ has a Cauchy distribution.

3.2.5 Cauchy Distribution

An R.V. $X$ is said to have a Cauchy distribution with parameters $\mu$ and $\theta$ ($\theta > 0$) if its PDF is
$$f(x) = \frac{1}{\pi}\cdot\frac{\theta}{\theta^2 + (x - \mu)^2}, \quad -\infty < x < \infty.$$
For $\mu = 0$, $\theta = 1$ we obtain the standard Cauchy distribution, $f(x) = \frac{1}{\pi(1 + x^2)}$.
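Result (vii) can be checked by simulation. The Python sketch below draws ratios of independent standard normals and compares their empirical distribution function at a few points with the standard Cauchy DF $F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan x$; the sample size is an arbitrary choice.

```python
import math
import random

random.seed(6)
n = 100_000
ratios = [random.gauss(0, 1) / random.gauss(0, 1) for _ in range(n)]

def cauchy_cdf(x):
    """Standard Cauchy distribution function."""
    return 0.5 + math.atan(x) / math.pi

for x in (-1.0, 0.0, 1.0, 3.0):
    empirical = sum(r <= x for r in ratios) / n
    print(f"x={x:5.1f}  empirical={empirical:.4f}  Cauchy DF={cauchy_cdf(x):.4f}")
```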

CHAPTER 4
JOINT AND CONDITIONAL DISTRIBUTIONS

4.3 Conditional Distributions

For jointly continuous random variables $X$ and $Y$, the conditional distribution function is
$$F_{Y \mid X}(y \mid x) = \int_{-\infty}^{y} f_{Y \mid X}(t \mid x)\, dt \quad \text{for all } x \text{ such that } f_X(x) > 0.$$

4.4 Conditional Expectation

We can also ask what the expected behaviour of one random variable is, given knowledge of the value of a second random variable, and this gives rise to the idea of conditional expectation.

Definition 4.10 (Conditional Expectation): The conditional expectation in the discrete and continuous cases corresponds to an expectation with respect to the appropriate conditional probability distribution:
Discrete: $E[Y \mid X = x] = \sum_{y} y\, P(Y = y \mid X = x)$;
Continuous: $E[Y \mid X = x] = \int_{-\infty}^{\infty} y\, f_{Y \mid X}(y \mid x)\, dy$.
Note that before $X$ is known to take the value $x$, $E[Y \mid X]$ is itself a random variable, being a function of the random variable $X$. We might be interested in the distribution of the random variable $E[Y \mid X]$.

Theorem 4.1 (Tower Property of Conditional Expectation): For any two random variables $X_1$ and $X_2$,
$$E[E[X_1 \mid X_2]] = E[X_1].$$

Exercise 4.4.1: Suppose that $\Theta \sim U[0, 1]$ and $(X \mid \Theta) \sim \mathrm{Bin}(2, \Theta)$. Find $E[X \mid \Theta]$ and hence, or otherwise, show that $E[X] = 1$.

4.5 Conditional Expectations of Functions of Random Variables

By extending the theorem on marginal expectations we can relate the conditional and marginal expectations of functions of random variables (in particular, their variances).

Theorem 4.2 (Marginal Expectation of a Transformed Random Variable): For any random variables $X_1$ and $X_2$, and for any function $h(\cdot)$,
$$E[E[h(X_1) \mid X_2]] = E[h(X_1)].$$

Theorem 4.3 (Marginal Variance): For any random variables $X_1$ and $X_2$,
$$\mathrm{Var}(X_1) = E[\mathrm{Var}(X_1 \mid X_2)] + \mathrm{Var}(E[X_1 \mid X_2]).$$

4.6 Independence of Random Variables

Whilst the previous sections have been concerned with the information that one random variable carries about another, it would seem that there must be pairs of random variables which each provide no information whatsoever about the other. It is, for example, difficult to imagine that the value obtained when a die is rolled in Coventry will tell us much about the outcome of a coin toss taking place at the same time in Lancaster. There are two equivalent statements of a property termed stochastic independence which capture precisely this idea. The following two definitions are equivalent for both discrete and continuous random variables.

Definition 4.11 (Stochastic Independence):
Definition 1: Random variables $X_1, X_2, \dots, X_n$ are stochastically independent iff
$$F_{X_1, \dots, X_n}(x_1, \dots, x_n) = \prod_{i=1}^{n} F_{X_i}(x_i).$$
Definition 2: Random variables $X_1, X_2, \dots, X_n$ are stochastically independent iff
$$f_{X_1, \dots, X_n}(x_1, \dots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i).$$
If $X_1$ and $X_2$ are independent then their conditional densities are equal to their marginal densities.

4.7 Covariance and Correlation

Having established that sometimes one random variable does convey information about another, and that in other cases knowing the value of a random variable tells us nothing useful about another random variable, it is useful to have mechanisms for characterising the relationship between pairs (or larger groups) of random variables.

Definition 4.12 (Covariance and Correlation):
Covariance: For random variables $X$ and $Y$ defined on the same probability space,
$$\mathrm{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X\mu_Y.$$
Correlation: For random variables $X$ and $Y$ defined on the same probability space,
$$\rho[X, Y] = \frac{\mathrm{Cov}[X, Y]}{\sigma_X\sigma_Y} = \frac{\mathrm{Cov}[X, Y]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}},$$
provided that $\sigma_X > 0$ and $\sigma_Y > 0$.
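Definition 4.12 applies directly to any finite joint distribution. The Python sketch below computes $\mathrm{Cov}[X, Y] = E[XY] - \mu_X\mu_Y$ and $\rho[X, Y]$ for a small joint PMF; the particular table used is only an illustrative choice.

```python
import math
from fractions import Fraction as F

# A small illustrative joint pmf: P(X=x, Y=y).
joint = {(0, 0): F(1, 4), (0, 1): F(1, 8), (1, 0): F(1, 8), (1, 1): F(1, 2)}

def e(g):
    """Expectation of g(x, y) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

mu_x, mu_y = e(lambda x, y: x), e(lambda x, y: y)
cov = e(lambda x, y: x * y) - mu_x * mu_y
var_x = e(lambda x, y: x * x) - mu_x ** 2
var_y = e(lambda x, y: y * y) - mu_y ** 2
rho = cov / math.sqrt(var_x * var_y)

print(cov, rho)   # positive here: large X tends to occur with large Y
```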
Theorem 4.4 (Cauchy-Schwarz Inequality): Let $X$ and $Y$ have finite second moments. Then
$$(E[XY])^2 = |E[XY]|^2 \le E[X^2]\,E[Y^2],$$
with equality if and only if $P[Y = cX] = 1$ for some constant $c$.

4.8 Transformation of Random Variables: Y = g(X)

Theorem 4.5 (Distribution of a Function of a Random Variable): Let $X$ be a random variable and $Y = g(X)$, where $g$ is injective (i.e. it maps at most one $x$ to any value $y$). Then
$$f_Y(y) = f_X\left(g^{-1}(y)\right)\left|\frac{d}{dy}g^{-1}(y)\right|,$$
given that $\frac{d}{dy}g^{-1}(y)$ exists and either $\frac{d}{dy}g^{-1}(y) > 0$ for all $y$ or $\frac{d}{dy}g^{-1}(y) < 0$ for all $y$. If $g$ is not bijective (one-to-one), there may be values of $y$ for which there exists no $x$ such that $y = g(x)$; such points clearly have density zero.

When the conditions of this theorem are not satisfied it is necessary to be a little more careful. The most general approach for finding the density of a transformed random variable is to explicitly construct the distribution function of the transformed random variable and then to use the standard approach to turn the distribution function into a density (this approach is discussed in Larry Wasserman's "All of Statistics").

Exercise 4.8.1: Let $X$ be distributed exponentially with parameter $\theta$, that is,
$$f_X(x) = \begin{cases} \theta e^{-\theta x}, & x \ge 0 \\ 0, & x < 0. \end{cases}$$
Find the density function of
(i) $Y = X^{1/2}$;
(ii) $Y = g(X)$ with $g(x) = 0$ for $x < 0$, $g(x) = x$ for $0 \le x \le 1$, and $g(x) = 1$ for $x > 1$.

Theorem 4.6 (Probability Integral Transformation): If $X$ is a random variable with continuous $F_X(x)$, then $U = F_X(X)$ is uniformly distributed over the interval $(0, 1)$. Conversely, if $U$ is uniform over $(0, 1)$, then $X = F_X^{-1}(U)$ has distribution function $F_X$.

4.9 Moment-Generating-Function Technique

The following technique is but one example of a situation in which the moment generating function proves invaluable.

Function of a variable: For $Y = g(X)$ compute
$$m_Y(t) = E\left[e^{tY}\right] = E\left[e^{t g(X)}\right].$$
If the result is the MGF of a known distribution, then it will follow that $Y$ has that distribution.

Sums of independent random variables: For $Y = \sum_i X_i$, where the $X_i$ are independent random variables for which the MGF exists for $-h < t < h$,
$$m_Y(t) = E\left[e^{t\sum_i X_i}\right] = \prod_i m_{X_i}(t).$$
Thus $\prod_i m_{X_i}(t)$ may be used to identify the distribution of $Y$ as above.

CHAPTER 5
INFERENCE

5.1 Sample Statistics

Suppose we select a sample of size $n$ from a population of size $N$. For each $i$ in $\{1, \dots, n\}$, let $X_i$ be a random variable denoting the outcome of the $i$-th observation of a variable of interest. For example, $X_i$ might be the height of the $i$-th person sampled. Under the assumptions of simple random sampling, the $X_i$ are independent and identically distributed (iid). Therefore, if the distribution of a single unit sampled from the population can be characterised by a distribution with density function $f$, the marginal density function of each $X_i$ is also $f$, and their joint density function $g$ is a simple product of their marginal densities:
$$g(x_1, x_2, \dots, x_n) = f(x_1)f(x_2)\cdots f(x_n).$$
In order to make inferences about a population parameter, we use sample data to form an estimate of the population parameter. We calculate our estimate using an estimator or sample statistic, which is a function of the $X_i$. We have already seen examples of sample statistics; for example, the sample mean
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,$$
where $n$ is the size of the sample, is an estimator of the population mean, e.g. for discrete $X$, $\mu = \sum_j x_j P[X = x_j]$, where the sum runs over the distinct values which it is possible for an $X_i$ to take.
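As a concrete illustration of a sample statistic as a function of the sample, the Python sketch below draws an iid sample from an arbitrarily chosen $N(5, 2^2)$ population and evaluates the sample mean and the sample variance with the $n - 1$ divisor.

```python
import random
import statistics

random.seed(1)
mu, sigma, n = 5.0, 2.0, 500          # illustrative population parameters and sample size
sample = [random.gauss(mu, sigma) for _ in range(n)]   # simple random sample

x_bar = sum(sample) / n                                 # sample mean, estimates mu
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)    # sample variance (n-1 divisor)

print(x_bar, statistics.mean(sample))     # both estimate mu = 5
print(s2, statistics.variance(sample))    # both estimate sigma^2 = 4
```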
5.2 Sampling Distributions

Since an estimator $\hat{\theta}$ is a function of random variables, it follows that $\hat{\theta}$ is itself a random variable and possesses its own distribution. The probability distribution of an estimator is called a sampling distribution.

Proposition 5.1 (Distribution of the Sample Mean): Let $\bar{X}$ denote the sample mean of a random sample of size $n$ from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$

Theorem 5.1 (Central Limit Theorem): Let $f$ be a density function with mean $\mu$ and finite variance $\sigma^2$. Let $\bar{X}$ be the sample mean of a random sample of size $n$ from $f$ and let
$$Z_n = \frac{\bar{X} - E[\bar{X}]}{\sqrt{\mathrm{Var}[\bar{X}]}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$
Then the distribution of $Z_n$ approaches the standard normal distribution as $n \to \infty$. This is often written as $Z_n \xrightarrow{d} N(0, 1)$, with $\xrightarrow{d}$ denoting convergence in distribution.

Thus, if the sample size is "large enough", the sample mean can be assumed to follow a normal distribution regardless of the population distribution. In practice, this assumption is often taken to be valid for a sample size $n > 30$.

The Chi-Squared Distribution

The chi-squared distribution is a special case of the gamma distribution. It arises as the sampling distribution associated with the sample variance
$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
of a random sample from a normal distribution, which, suitably scaled, is $\chi^2$ with $n - 1$ degrees of freedom.

Definition 5.1: If $X$ is a random variable with density
$$f(x) = \begin{cases} \dfrac{1}{\Gamma(k/2)\, 2^{k/2}}\, x^{k/2 - 1} e^{-x/2}, & x > 0 \\ 0, & \text{otherwise,} \end{cases}$$
then $X$ is defined to have a $\chi^2$ distribution with $k$ degrees of freedom ($\chi^2_k$), where $k$ is a positive integer. Thus the $\chi^2_k$ density is a gamma density with $\alpha = k/2$ and $\beta = 2$.

Result: If the R.V.s $X_i$, $i = 1, \dots, n$, are independently normally distributed with means $\mu_i$ and variances $\sigma_i^2$, then $\sum_{i=1}^{n}\left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$ has a $\chi^2_n$ distribution.

Theorem 5.2: If $X_1, \dots, X_n$ is a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, then
(i) $\bar{X}$ and $\sum_{i=1}^{n}(X_i - \bar{X})^2$ are independent;
(ii) $\frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \bar{X})^2 \sim \chi^2_{n-1}$.

Corollary 5.1: If $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is the sample variance of a random sample of size $n$ from a normal distribution with mean $\mu$ and variance $\sigma^2$, then
$$\frac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$$

The t Distribution

The $t$ distribution is closely related to the normal distribution and is needed for making inferences about the mean of a normal distribution when the variance is also unknown.

Definition 5.2: If $Z \sim N(0, 1)$, $U \sim \chi^2_k$, and $Z$ and $U$ are independent of one another, then
$$X = \frac{Z}{\sqrt{U/k}} \sim t_k,$$
where $t_k$ denotes a $t$ distribution with $k$ degrees of freedom. The density of the $t$ distribution is
$$f_X(x) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\,\Gamma\left(\frac{k}{2}\right)}\left(1 + \frac{x^2}{k}\right)^{-(k+1)/2}.$$
The MGF for the $t$ distribution does not exist, but $E[X] = 0$ as the distribution is symmetric (actually the expectation only exists when $k > 1$, although an extension known as the Cauchy principal value can be defined more generally), and it can be shown that $\mathrm{Var}[X] = \frac{k}{k-2}$ for $k > 2$.

Theorem 5.3: If $X \sim t_k$ then $f_X(x) \to \phi(x)$ as $k \to \infty$; that is, as $k \to \infty$ the density approaches the density of a standard normal.

The F Distribution

The $F$ distribution is useful for making inferences about the ratio of two unknown variances.

Definition 5.3: Suppose $U$ and $V$ are independently distributed, with $U \sim \chi^2_m$ and $V \sim \chi^2_n$. Then the random variable
$$X = \frac{U/m}{V/n}$$
is distributed according to an $F$ distribution with $m$ and $n$ degrees of freedom. The density of $X$ is given by
$$f(x) = F_{m,n}(x) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\left(\frac{m}{n}\right)^{m/2} x^{m/2 - 1}\left(1 + \frac{m}{n}x\right)^{-(m+n)/2}, \quad x > 0.$$

Returning to the case of two independent random samples from normal populations of common variance but differing means, $X_1, \dots, X_{n_1} \sim N(\mu_1, \sigma^2)$ and $Y_1, \dots, Y_{n_2} \sim N(\mu_2, \sigma^2)$, it can be shown that
$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2},$$
where $S_p^2$ is the pooled sample variance (see Section 5.4).
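Theorem 5.1 can be illustrated by simulation. In the Python sketch below the population is Exp(1), which is far from normal, yet the standardised sample means already behave very much like $N(0, 1)$ draws; the choices $n = 40$ and 20,000 replications are arbitrary.

```python
import math
import random
import statistics

random.seed(2)
n, reps = 40, 20_000
mu, sigma = 1.0, 1.0            # mean and standard deviation of the Exp(1) population

z_values = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    x_bar = sum(sample) / n
    z_values.append((x_bar - mu) / (sigma / math.sqrt(n)))

# Compare a few empirical quantities with their standard normal counterparts.
print(statistics.mean(z_values))                   # close to 0
print(statistics.stdev(z_values))                  # close to 1
print(sum(z <= 1.96 for z in z_values) / reps)     # close to Phi(1.96) = 0.975
```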
5.3 Point Estimation

The sample mean and variance are examples of point estimators, because the estimates they produce are single point values, rather than a range of values. For a given parameter there are an infinite number of possible estimators, hence the question arises: what makes a "good" estimator?

Definition 5.4 (Unbiasedness): Let $X$ be a random variable with pdf $f(x; \theta)$, where $\theta \in \Omega \subseteq \mathbb{R}^p$ is some unknown parameter, $p \ge 1$. Let $X_1, \dots, X_n$ be a random sample from the distribution of $X$ and let $\hat{\theta}$ denote a statistic. $\hat{\theta}$ is an unbiased estimator of $\theta$ if
$$E[\hat{\theta}] = \theta \quad \text{for all } \theta \in \Omega,$$
where the expectation is with respect to $f(x; \theta)$. If $\hat{\theta}$ is not unbiased, we say that $\hat{\theta}$ is a biased estimator of $\theta$, with
$$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta.$$
If $\mathrm{Bias}(\hat{\theta}) \to 0$ as $n \to \infty$, then we say that $\hat{\theta}$ is asymptotically unbiased.

Example 5.1: Consider the following estimator of the population variance:
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2.$$
We find that $E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2$, so this estimator is biased, with bias equal to $-\sigma^2/n$. As this decays to zero as $n \to \infty$, it is an asymptotically unbiased estimator. However, we can see that the sample variance
$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
is unbiased.

Consistency: In order to define a consistent estimator, we first define convergence in probability.

Definition 5.5 (Convergence in Probability): Let $\{X_n\}$ be a sequence of random variables and let $X$ be a random variable. We say that $X_n$ converges in probability to $X$ if for all $\varepsilon > 0$,
$$\lim_{n \to \infty} P[|X_n - X| \ge \varepsilon] = 0, \quad \text{or equivalently} \quad \lim_{n \to \infty} P[|X_n - X| < \varepsilon] = 1.$$

Definition 5.6 (Consistent Estimator): An estimator $\hat{\theta}$ of $\theta$ is consistent if $\hat{\theta}$ converges in probability to $\theta$.

Consistency according to the definition above may be hard to prove, but it turns out that a sufficient (though not necessary) condition for consistency is that
$$\mathrm{Bias}(\hat{\theta}) \to 0 \quad \text{and} \quad \mathrm{Var}(\hat{\theta}) \to 0 \quad \text{as } n \to \infty.$$

Definition 5.7 (Consistency in Mean-Squared Error): If $\hat{\theta}$ is an estimator of $\theta$, then the mean squared error of $\hat{\theta}$ is defined as
$$\mathrm{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right],$$
and $\hat{\theta}$ is said to be consistent in MSE if $\mathrm{MSE}(\hat{\theta}) \to 0$ as the size of the sample on which $\hat{\theta}$ is based increases to infinity. Note that
$$\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + \left[\mathrm{Bias}(\hat{\theta})\right]^2.$$

5.4 Interval Estimation

This section describes confidence intervals, which are intervals constructed such that they contain $\theta$ with some level of confidence.

Definition 5.8 (Confidence Interval): Let $X_1, \dots, X_n$ be a random sample from a distribution with pdf $f(x; \theta)$, where $\theta$ is an unknown parameter in the parameter space $\Omega$. If $L$ and $U$ are statistics such that $P(L \le \theta \le U) = 1 - \alpha$, then the interval $(L, U)$ is called a $100(1 - \alpha)\%$ confidence interval for $\theta$.

CI for the Mean of a Normal Population: If $X_1, \dots, X_n$ is a random sample from $N(\mu, \sigma^2)$ with $\sigma^2$ known, then $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$. Hence $P(-1.96 \le Z \le 1.96) = 0.95$, which rearranges to give the exact 95% confidence interval $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$. When $\sigma^2$ is unknown it is replaced by the sample variance $S^2$; since $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$, the exact $100(1-\alpha)\%$ CI becomes $\bar{X} \pm t_{\alpha/2,\, n-1}\, S/\sqrt{n}$. In practice, if $n \ge 120$, the quantiles of the standard normal can be used instead (they are essentially indistinguishable from those of the $t$ distribution in this regime).

Exercise 5.4.3: Suppose we observe the following data on lactic acid concentrations in cheese:
0.86, 1.53, 1.57, 1.81, 0.99, 1.09, 1.29, 1.78, 1.29, 1.58.
Assuming that these data are a random sample from a normal distribution, calculate a 90% CI for the mean of this distribution.

CI for Differences in Means of Normal Populations with Equal Variance: Suppose we have two independent random samples
$$X_1, \dots, X_{n_1} \sim N(\mu_1, \sigma^2), \qquad Y_1, \dots, Y_{n_2} \sim N(\mu_2, \sigma^2).$$
In Section 5.2 we saw that
$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2},$$
where $S_p^2$ is the pooled variance
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2},$$
which can be shown to be an unbiased estimator of $\sigma^2$. Therefore the exact $100(1-\alpha)\%$ CI for $\mu_1 - \mu_2$ in this case is
$$(\bar{X} - \bar{Y}) \pm t_{\alpha/2,\, n_1 + n_2 - 2}\; S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
In the case of unequal variances, an exact CI cannot be derived.
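A worked numerical version of Exercise 5.4.3 is sketched below in Python; it applies the interval $\bar{X} \pm t_{0.05, 9}\, S/\sqrt{n}$ to the ten lactic acid measurements, and assumes SciPy is available for the $t$ quantile.

```python
import math
import statistics
from scipy.stats import t   # assumed available; used only for the t quantile

data = [0.86, 1.53, 1.57, 1.81, 0.99, 1.09, 1.29, 1.78, 1.29, 1.58]
n = len(data)
x_bar = statistics.mean(data)
s = statistics.stdev(data)               # n-1 divisor

alpha = 0.10                             # 90% confidence
t_crit = t.ppf(1 - alpha / 2, df=n - 1)  # t_{0.05, 9}, roughly 1.833

half_width = t_crit * s / math.sqrt(n)
print((x_bar - half_width, x_bar + half_width))
```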
Exercise 5.4.4: Consider a study to compare the nicotine contents of two brands of cigarettes. 10 cigarettes of Brand A had an average nicotine content of 3.1 milligrams with a standard error of 0.5 mg, while 8 cigarettes of Brand B had an average nicotine content of 2.7 mg with a standard error of 0.7 mg. Assuming that the two data sets are independent random samples from normal populations with equal variances, construct a 95% confidence interval for the difference between the mean nicotine contents of the two brands.

Confidence Interval for the Variance of a Normal Population: Suppose $X_1, \dots, X_n$ are iid normal with unknown $\mu$ and $\sigma^2$. From Corollary 5.1 we know that
$$\frac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$$
Hence
$$P\left(\chi^2_{\alpha/2,\, n-1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{1-\alpha/2,\, n-1}\right) = 1 - \alpha,$$
where $P(\chi^2_{n-1} \le \chi^2_{\gamma,\, n-1}) = \gamma$. Hence an exact $100(1-\alpha)\%$ CI for $\sigma^2$ is given by
$$\left(\frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\, n-1}},\; \frac{(n-1)S^2}{\chi^2_{\alpha/2,\, n-1}}\right).$$
Note that the $\chi^2$ distribution is asymmetric, hence this CI is asymmetric, unlike the previous CIs we have defined. This illustrates an important point: any interval which has the appropriate coverage probability can be used as a confidence interval, and there is no unique $(1-\alpha)$ confidence interval for a statistic (imagine shifting the left endpoint of the interval slightly towards the centre; the right endpoint could then be moved a small amount away from the centre to compensate). Symmetry is one of the properties which is often considered desirable when dealing with confidence intervals, but it is certainly not essential.

CI for the Ratio of Variances of Normal Populations: Suppose we have two independent random samples
$$X_1, \dots, X_{n_1} \sim N(\mu_1, \sigma_1^2), \qquad Y_1, \dots, Y_{n_2} \sim N(\mu_2, \sigma_2^2),$$
where $\mu_1, \mu_2, \sigma_1$ and $\sigma_2$ are unknown. In Section 5.2 we found that
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1 - 1,\, n_2 - 1}.$$
Thus we have
$$P\left(F_{\alpha/2,\, n_1-1,\, n_2-1} \le \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \le F_{1-\alpha/2,\, n_1-1,\, n_2-1}\right) = 1 - \alpha,$$
from which we derive the following exact CI for $\sigma_1^2/\sigma_2^2$:
$$\left(\frac{S_1^2/S_2^2}{F_{1-\alpha/2,\, n_1-1,\, n_2-1}},\; \frac{S_1^2/S_2^2}{F_{\alpha/2,\, n_1-1,\, n_2-1}}\right).$$
Note this CI is also asymmetric.

Example 5.2: Consider the study described in Exercise 5.4.4. A 95% CI for the ratio of the two population variances, computed in this way from the two sample variances, includes 1, so the assumption of equal variances made there seems reasonable.

CHAPTER 6
MAXIMUM LIKELIHOOD ESTIMATION

6.1 Likelihood Function and ML Estimator

Suppose we have a simple random sample $X_1, \dots, X_n$ from a density $f(x; \theta)$ parameterised by some possibly unknown parameter $\theta$. The joint pdf of the entire data set is $f(\mathbf{x}; \theta)$ with $\mathbf{x} = (x_1, \dots, x_n)^{\mathsf{T}}$. The likelihood function $L(\theta; \mathbf{x})$ is this joint pdf viewed as a function of $\theta$ with $\mathbf{x}$ fixed at the observed data. It is often more convenient to work with the log-likelihood
$$l(\theta; \mathbf{x}) = \log[L(\theta; \mathbf{x})],$$
which in the simple random sampling case above yields
$$l(\theta; \mathbf{x}) = \sum_{i=1}^{n} \log f(x_i; \theta).$$
The maximum likelihood estimator (MLE) is the value $\hat{\theta}$ which maximises $L(\theta; \mathbf{x})$. The MLE also maximises $l(\theta; \mathbf{x})$, because $\log(\cdot)$ is a monotonic function. Usually it is easier to maximise $l(\theta; \mathbf{x})$, so we work with this.

Example 6.1: Suppose $X_1, \dots, X_n$ is a random sample from the exponential distribution with pdf
$$f(x; \theta) = \begin{cases} \theta e^{-\theta x}, & x > 0 \\ 0, & \text{otherwise.} \end{cases}$$
Then $l(\theta; \mathbf{x}) = n\log\theta - \theta\sum_i x_i$, so setting $l'(\theta; \mathbf{x}) = n/\theta - \sum_i x_i = 0$ gives $\hat{\theta} = n/\sum_i x_i = 1/\bar{x}$. Since $l''(\theta; \mathbf{x}) = -n/\theta^2 < 0$, $\hat{\theta}$ does correspond to a maximum.

Exercise 6.2: Suppose $X_1, \dots, X_n$ form a random sample from the Poisson distribution $\mathrm{Poisson}(\theta)$. Derive the ML estimator for $\theta$.
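The closed form $\hat{\theta} = 1/\bar{x}$ from Example 6.1 can be checked numerically. The Python sketch below simulates exponential data (the true rate 2.5 and the sample size are arbitrary choices), evaluates the log-likelihood on a grid, and confirms that it peaks at essentially the analytic MLE.

```python
import math
import random

random.seed(3)
true_theta = 2.5
data = [random.expovariate(true_theta) for _ in range(200)]

def log_likelihood(theta):
    """l(theta; x) = n*log(theta) - theta*sum(x) for the exponential model."""
    return len(data) * math.log(theta) - theta * sum(data)

analytic_mle = len(data) / sum(data)        # 1 / x-bar

# Crude grid search over candidate theta values.
grid = [0.01 * k for k in range(1, 1001)]
numerical_mle = max(grid, key=log_likelihood)

print(analytic_mle, numerical_mle)  # the two agree to within the grid spacing
```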
6.4 Properties of MLEs

The maximum likelihood estimator has a number of useful properties.

Theorem 6.1: Suppose $\theta$ and $\phi$ represent two alternative parameterisations, and that $\phi$ is a one-to-one function of $\theta$ (more formally, there exists a bijective mapping $g$ from $\theta$ to $\phi$), so we can write $\phi = g(\theta)$, $\theta = h(\phi)$ for appropriate $g$ and $h = g^{-1}$. Then if $\hat{\theta}$ is the MLE of $\theta$, the MLE of $\phi$ is $g(\hat{\theta})$.

Corollary 6.1 (Invariance of MLE): Let $\hat{\theta}_1, \dots, \hat{\theta}_k$ be a MLE for $\theta_1, \dots, \theta_k$. If $\tau(\theta) = (\tau_1(\theta), \dots, \tau_r(\theta))$ is a sufficiently regular transformation of the parameter space $\Omega$, then $\tau(\hat{\theta}) = (\tau_1(\hat{\theta}), \dots, \tau_r(\hat{\theta}))$ is a MLE of $\tau(\theta)$.

Lemma 6.1: Suppose there exists an unbiased estimator $\tilde{\theta}$ which attains the Cramér-Rao bound, and suppose that $\hat{\theta}$, the MLE, is a solution of the likelihood equation $\partial l(\theta; \mathbf{x})/\partial\theta = 0$. Then $\hat{\theta} = \tilde{\theta}$.

CHAPTER 7
HYPOTHESIS TESTING

7.1 Introduction

A hypothesis is a falsifiable claim about the real world; a statement which, if true, will explain some observable quantity. Statisticians will be interested in hypotheses like:
(i) "The probabilities of a male panda or a female panda being born are equal";
(ii) "The number of flying bombs falling on a given area of London during World War II follows a Poisson distribution";
(iii) "The mean systolic blood pressure of 35-year-old men is no higher than that of 40-year-old women";
(iv) "The mean value of $Y = \log(\text{systolic blood pressure})$ is independent of $X = \text{age}$" (i.e. $E[Y \mid X = x] = \text{constant}$).

These hypotheses can be translated into statements about parameters within a probability model:
(i) "$p_1 = p_2$";
(ii) "$p_n = P(N = n) = \lambda^n \exp(-\lambda)/n!$ for $n = 0, 1, 2, \dots$, for some $\lambda > 0$" (within the general probability model $p_n \ge 0$, $\sum_n p_n = 1$);
(iii) "$\mu_1 \le \mu_2$"; and
(iv) "$\beta_1 = 0$" (assuming the linear model $E[Y \mid X = x] = \beta_0 + \beta_1 x$).

Definition 7.1 (Hypothesis Test): A hypothesis test is a procedure for deciding whether to accept a particular hypothesis as a reasonable simplifying assumption, or to reject it as unreasonable in the light of the data.

Definition 7.2 (Null Hypothesis): The null hypothesis $H_0$ is the default assumption we are considering making.

Definition 7.3 (Alternative Hypothesis): The alternative hypothesis $H_1$ is the alternative explanation(s) we are considering for the data. Ordinarily, the null hypothesis is what we would assume to be true in the absence of data which suggests that it is not. Ordinarily, $H_0$ will explain the data in at least as simple a way as $H_1$.

Definition 7.4 (Type I Error): A type I error is made if $H_0$ is rejected when $H_0$ is true. In some situations this type of error is known as a false positive.

Definition 7.5 (Type II Error): A type II error is made if $H_0$ is accepted when $H_0$ is false. This may also be termed a false negative.

Example 7.1: In the first example above (pandas) the null hypothesis is $H_0: p_1 = p_2$. The alternative hypothesis in the first example would usually be $H_1: p_1 \ne p_2$, though it could also be (for example)
(i) $H_1: p_1 > p_2$, or
(ii) $H_1: p_1 - p_2 = \delta$ for some specified $\delta \ne 0$.
Each of these alternative hypotheses makes a slightly different statement about the collection of situations which we believe are possible. A statistician needs to decide which type of hypothesis test is appropriate in any real situation.
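Type I and type II error probabilities can be computed exactly in simple settings. In the Python sketch below, $H_0: \mu = 0$ is tested against $H_1: \mu = 1$ for a normal mean with known $\sigma = 1$ and $n = 25$, rejecting when $\bar{X}$ exceeds a cut-off $c$; all of these values are arbitrary illustrative choices.

```python
import math

def std_normal_cdf(z):
    """Phi(z), the standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, sigma = 25, 1.0
mu0, mu1 = 0.0, 1.0          # null and alternative means
c = 0.33                     # reject H0 if the sample mean exceeds c
se = sigma / math.sqrt(n)

alpha = 1.0 - std_normal_cdf((c - mu0) / se)   # P(type I error)  = P(X-bar > c | mu = mu0)
beta = std_normal_cdf((c - mu1) / se)          # P(type II error) = P(X-bar <= c | mu = mu1)

print(f"size alpha = {alpha:.4f}, type II error beta = {beta:.6f}")
```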
7.2 Simple Hypothesis Tests

The simplest type of hypothesis testing occurs when the probability distribution giving rise to the data is specified completely under the null and alternative hypotheses.

Definition 7.6 (Simple Hypothesis): A simple hypothesis is of the form $H: \theta = \theta_0$, where $\theta$ is the parameter vector which parameterises the probabilistic model for the data. A simple hypothesis specifies the precise value of the parameter vector (i.e. the probability distribution of the data is specified completely).

Definition 7.7 (Composite Hypothesis): A composite hypothesis is of the form $H: \theta \in \Omega_H$, i.e. the parameter $\theta$ lies in a specified subset $\Omega_H$ of the parameter space. This type of hypothesis specifies an entire collection of values for the parameter vector and so specifies a class of probabilistic models from which the data may have arisen.

Definition 7.8 (Simple Hypothesis Test): A simple hypothesis test tests a simple null hypothesis $H_0: \theta = \theta_0$ against a simple alternative $H_1: \theta = \theta_1$, where $\theta$ parameterises the distribution of our experimental random variables $\mathbf{X} = (X_1, X_2, \dots, X_n)$.

Although simple hypothesis tests seem appealing, there are many situations in which a statistician cannot reduce the problem at hand to a clear dichotomy between two fully-specified models for the data-generating process.

There may be many seemingly sensible approaches to testing a given hypothesis. A reasonable criterion for choosing between them is to attempt to minimise the chance of making a mistake: incorrectly rejecting a true null hypothesis, or incorrectly accepting a false null hypothesis.

Definition 7.9 (Size): A test of size $\alpha$ is one which rejects the null hypothesis $H_0: \theta = \theta_0$ in favour of the alternative $H_1: \theta = \theta_1$ if and only if $\mathbf{X} \in C_\alpha$, where $P(\mathbf{X} \in C_\alpha \mid \theta = \theta_0) = \alpha$ for some subset $C_\alpha$ of the sample space $S$ of $\mathbf{X}$. The size $\alpha$ of a test is the probability of rejecting $H_0$ when $H_0$ is in fact true; i.e. size is the probability of a type I error if $H_0$ is true. We want $\alpha$ to be small ($\alpha = 0.05$, say).

Definition 7.10 (Critical Region): The set $C_\alpha$ in Definition 7.9 is called the critical region or rejection region of the test.

Definition 7.11 (Power and Power Function): The power function of a test with critical region $C_\alpha$ is the function $\beta(\theta) = P(\mathbf{X} \in C_\alpha \mid \theta)$, and the power of a simple test is $\beta = \beta(\theta_1)$, i.e. the probability that we reject $H_0$ in favour of $H_1$ when $H_1$ is true. Thus a simple test of power $\beta$ has probability $1 - \beta$ of a type II error occurring when $H_1$ is true. Clearly, for a fixed size $\alpha$ of test, the larger the power of a test the better. However, there is an inevitable trade-off between small size and high power (as in a jury trial: the more careful one is not to convict an innocent defendant, the more likely one is to free a guilty one by mistake).

A hypothesis test typically uses a test statistic $T(\mathbf{X})$, whose distribution is known under $H_0$, and such that extreme values of $T(\mathbf{X})$ are more compatible with $H_1$ than $H_0$. Many useful hypothesis tests have the following form.

Definition 7.12 (Simple Likelihood Ratio Test): A simple likelihood ratio test (SLRT) of $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$ rejects $H_0$ iff
$$\lambda(\mathbf{x}) = \frac{L(\theta_0; \mathbf{x})}{L(\theta_1; \mathbf{x})} \le k_\alpha,$$
where $L(\theta; \mathbf{x})$ is the likelihood of $\theta$ given the data $\mathbf{x}$, and the number $k_\alpha$ is chosen so that the size of the test is $\alpha$.

Exercise 7.2.1: Suppose that $X_1, X_2$ are iid $N(\theta, 1)$. Show that the likelihood ratio for testing $H_0: \theta = 0$ against $H_1: \theta = 1$ can be written
$$\lambda(\mathbf{x}) = \exp\left[1 - \sum_i x_i\right].$$
Hence show that the corresponding SLRT of size $\alpha$ rejects $H_0$ when the test statistic $T(\mathbf{X}) = \bar{X}$ satisfies $T > \Phi^{-1}(1 - \alpha)/\sqrt{2}$.
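The size and power of the test in Exercise 7.2.1 can be verified by simulation; the Python sketch below uses $\alpha = 0.05$ (an arbitrary choice) and records how often $T = \bar{X}$ exceeds $\Phi^{-1}(1 - \alpha)/\sqrt{2}$ under each hypothesis.

```python
import math
import random
import statistics

random.seed(4)
alpha = 0.05
# Inverse Phi via statistics.NormalDist (Python 3.8+).
c = statistics.NormalDist().inv_cdf(1 - alpha) / math.sqrt(2)

def rejection_rate(theta, reps=100_000):
    rejections = 0
    for _ in range(reps):
        x1, x2 = random.gauss(theta, 1), random.gauss(theta, 1)
        if (x1 + x2) / 2 > c:
            rejections += 1
    return rejections / reps

print(rejection_rate(0.0))   # empirical size, close to 0.05
print(rejection_rate(1.0))   # empirical power under H1: theta = 1
```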
A number of points should be borne in mind:
- For a simple hypothesis test, both $H_0$ and $H_1$ are "point hypotheses", each specifying a particular value for the parameter $\theta$ rather than a region of the parameter space.
- In practice, no hypothesis will be precisely true, so the whole foundation of classical hypothesis testing seems suspect! Actually, there is a tendency to overuse hypothesis testing: it is appropriate only when one really does wish to compare competing, well-defined hypotheses. In many problems the use of point estimation with an appropriate confidence interval is much easier to justify.
- Regarding likelihood as a measure of compatibility between data and model, an SLRT compares the compatibility of $\theta_0$ and $\theta_1$ with the observed data $\mathbf{x}$, and accepts $H_0$ iff the ratio is sufficiently large.
- One reason for the importance of likelihood ratio tests is the following theorem, which shows that out of all tests of a given size, an SLRT (if one exists) is "best" in a certain sense.

Theorem 7.1 (The Neyman-Pearson Lemma): Given random variables $X_1, X_2, \dots, X_n$ with joint density $f(\mathbf{x} \mid \theta)$, the simple likelihood ratio test of a fixed size $\alpha$ for testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$ is at least as powerful as any other test of the same size.

Proof: Let $k$ be a positive constant and $C_0$ a subset of the sample space satisfying
(a) $P(\mathbf{X} \in C_0 \mid \theta = \theta_0) = \alpha$,
(b) $\mathbf{X} \in C_0 \iff \dfrac{f(\mathbf{x} \mid \theta_0)}{f(\mathbf{x} \mid \theta_1)} \le k$,
i.e. $C_0$ is the critical region of an SLRT whose size is fixed at $\alpha$. Suppose that there exists another test of size $\alpha$, defined by the critical region $C_1$: reject $H_0$ iff $\mathbf{x} \in C_1$, where $P(\mathbf{X} \in C_1 \mid \theta = \theta_0) = \alpha$.

Let $B_0 = C_0 \cap C_1^c$, $B_1 = C_0 \cap C_1$, $B_2 = C_0^c \cap C_1$. Note that $B_0 \cup B_1 = C_0$, $B_1 \cup B_2 = C_1$, and $B_0$, $B_1$ and $B_2$ are disjoint. Let the power of the likelihood ratio test be $\beta_0 = P(\mathbf{X} \in C_0 \mid \theta = \theta_1)$, and the power of the other test be $\beta_1 = P(\mathbf{X} \in C_1 \mid \theta = \theta_1)$. We want to show that $\beta_0 - \beta_1 \ge 0$. But
$$\beta_0 - \beta_1 = \int_{C_0} f(\mathbf{x} \mid \theta_1)\, d\mathbf{x} - \int_{C_1} f(\mathbf{x} \mid \theta_1)\, d\mathbf{x} = \int_{B_0} f(\mathbf{x} \mid \theta_1)\, d\mathbf{x} - \int_{B_2} f(\mathbf{x} \mid \theta_1)\, d\mathbf{x}.$$
Also, on $B_0 \subseteq C_0$ we have $f(\mathbf{x} \mid \theta_1) \ge \frac{1}{k}f(\mathbf{x} \mid \theta_0)$, while on $B_2 \subseteq C_0^c$ we have $f(\mathbf{x} \mid \theta_1) < \frac{1}{k}f(\mathbf{x} \mid \theta_0)$. Hence
$$\beta_0 - \beta_1 \ge \frac{1}{k}\left(\int_{B_0} f(\mathbf{x} \mid \theta_0)\, d\mathbf{x} - \int_{B_2} f(\mathbf{x} \mid \theta_0)\, d\mathbf{x}\right) = \frac{1}{k}\big(P(B_0 \mid \theta_0) - P(B_2 \mid \theta_0)\big).$$
Since $P(B_0 \mid \theta_0) = P(C_0 \mid \theta_0) - P(B_1 \mid \theta_0)$ and $P(B_2 \mid \theta_0) = P(C_1 \mid \theta_0) - P(B_1 \mid \theta_0)$, with both $P(C_0 \mid \theta_0)$ and $P(C_1 \mid \theta_0)$ equal to $\alpha$, the right-hand side is zero. Thus $\beta_0 \ge \beta_1$, as required.
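The lemma can be illustrated in the setting of Exercise 7.2.1: the Python sketch below compares the power of the SLRT based on $\bar{X}$ with that of another size-$\alpha$ test which ignores $X_2$ and rejects when $X_1$ alone is large. The SLRT should come out at least as powerful; $\alpha = 0.05$ and the number of replications are arbitrary choices.

```python
import math
import random
import statistics

random.seed(5)
alpha, reps = 0.05, 100_000
z = statistics.NormalDist().inv_cdf(1 - alpha)

c_slrt = z / math.sqrt(2)    # SLRT: reject when (X1 + X2)/2 > z_{1-alpha}/sqrt(2)
c_naive = z                  # alternative size-alpha test: reject when X1 > z_{1-alpha}

def power(test, theta):
    return sum(test(random.gauss(theta, 1), random.gauss(theta, 1))
               for _ in range(reps)) / reps

slrt = lambda x1, x2: (x1 + x2) / 2 > c_slrt
naive = lambda x1, x2: x1 > c_naive

print(power(slrt, 0.0), power(naive, 0.0))   # both sizes close to 0.05
print(power(slrt, 1.0), power(naive, 1.0))   # SLRT power >= naive power
```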
