CS1 2021 Upgrade
Upgrade for the 2021 exam
Cumulant generating functions

For many random variables the cumulant generating function (CGF) is easier to use than the MGF in evaluating the mean and variance.

Definition

The cumulant generating function, C_X(t), of a random variable X is given by:

C_X(t) = \ln M_X(t)

We can treat this as the definition of the CGF.

Question

The MGF of the Bin(n,p) distribution is given by:

M_X(t) = (q + pe^t)^n

State the CGF of the Bin(n,p) distribution.

Solution

C_X(t) = \ln M_X(t) = \ln (q + pe^t)^n = n \ln (q + pe^t)

As a result, if C_X(t) is known, it is easy to determine M_X(t). We have:

M_X(t) = e^{C_X(t)}

Calculating moments

The first three derivatives of C_X(t) evaluated at t = 0 give the mean, variance and skewness of X directly. These results can be proved as follows:

C_X'(t) = \frac{M_X'(t)}{M_X(t)}

C_X''(t) = \frac{M_X''(t)M_X(t) - (M_X'(t))^2}{(M_X(t))^2}

C_X'''(t) = \frac{M_X'''(t)(M_X(t))^2 - 3M_X(t)M_X'(t)M_X''(t) + 2(M_X'(t))^3}{(M_X(t))^3}

Now M_X(0) = 1, so:

C_X'(0) = M_X'(0) = E[X]

C_X''(0) = M_X''(0) - (M_X'(0))^2 = E[X^2] - (E[X])^2 = var(X)

C_X'''(0) = M_X'''(0) - 3M_X'(0)M_X''(0) + 2(M_X'(0))^3 = E[X^3] - 3E[X^2]E[X] + 2(E[X])^3 = skew(X)

Question

State the CGF of X where X ~ Gamma(\alpha, \lambda). Hence prove that E(X) = \frac{\alpha}{\lambda} and skew(X) = \frac{2\alpha}{\lambda^3}.

Solution

C_X(t) = \ln M_X(t) = \ln\left(1 - \frac{t}{\lambda}\right)^{-\alpha} = -\alpha \ln\left(1 - \frac{t}{\lambda}\right)

Differentiating with respect to t:

C_X'(t) = \frac{\alpha}{\lambda}\left(1 - \frac{t}{\lambda}\right)^{-1}  =>  E(X) = C_X'(0) = \frac{\alpha}{\lambda}

C_X''(t) = \frac{\alpha}{\lambda^2}\left(1 - \frac{t}{\lambda}\right)^{-2}  =>  var(X) = C_X''(0) = \frac{\alpha}{\lambda^2}

C_X'''(t) = \frac{2\alpha}{\lambda^3}\left(1 - \frac{t}{\lambda}\right)^{-3}  =>  skew(X) = C_X'''(0) = \frac{2\alpha}{\lambda^3}

The coefficient of t^r/r! in the Maclaurin series of C_X(t) = \ln M_X(t) is called the r-th cumulant and is denoted by \kappa_r.
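As a quick numerical sanity check (not part of the Core Reading), the moment results can be verified in R by differencing the Bin(n,p) CGF; the values n = 10 and p = 0.3 are illustrative only:

n <- 10; p <- 0.3; q <- 1 - p
C <- function(t) n * log(q + p * exp(t))   # CGF of Bin(n, p)
h <- 1e-4
(C(h) - C(-h)) / (2 * h)                   # C'(0): approximately np = 3
(C(h) - 2 * C(0) + C(-h)) / h^2            # C''(0): approximately npq = 2.1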
Similarly:

P(N = n | M = m) = \frac{P(N = n, M = m)}{P(M = m)},  n = 1, 2, 3

is the conditional probability function of N given M = m. These are identical to the marginal distributions obtained in the chapter text.

(i) Marginal density

f_X(x) = \int \frac{1}{5}(3x + ay)\,dy  for 0 < x < …

… = P(X_1 + X_2 + X_3 + X_4 > 800)

6.6
Let X be the number of individuals with blood group A. Then:

X ~ Bin(300, 0.45) ≈ N(135, 74.25)  [1]

Using a continuity correction, P(X > 115) becomes P(X > 115.5):  [1]

P\left(Z > \frac{115.5 - 135}{\sqrt{74.25}}\right) = P(Z > -2.263) = P(Z < 2.263) = 0.988  [1]

6.7
If our population is normal, we do not need the Central Limit Theorem. The distribution of \bar{X} is exactly normal:

\bar{X} ~ N(\mu, \sigma^2/n)  [1]

Hence:

P(\bar{X} > 26) = P\left(Z > \frac{26 - 25}{2/\sqrt{16}}\right) = P(Z > 2) = 1 - 0.97725 = 0.02275  [2]

6.8
Let X_i be the sum assured under the i-th policy. We require:

P\left(\sum_{i=1}^{100} X_i > 845{,}000\right)  [1]

Now, according to the Central Limit Theorem:

\sum_{i=1}^{100} X_i ~ N(100 \times 8{,}000, 100 \times 3{,}000^2)  (approximately)  [1]

Therefore:

P\left(\sum_{i=1}^{100} X_i > 845{,}000\right) = P\left(Z > \frac{845{,}000 - 800{,}000}{30{,}000}\right) = P(Z > 1.5) = 1 - 0.93319 = 0.06681  [2]

6.9
We have the sum of 100 discrete uniform random variables, X_i (i = 1, 2, …, 100). Using the formulae from page 10 of the Tables, with a = 1, b = 5 and h = 1, we get:

E(X_i) = \frac{1 + 5}{2} = 3

var(X_i) = \frac{1}{12}(5 - 1)(5 - 1 + 2) = 2  [1]

Using the Central Limit Theorem:

S = \sum_{i=1}^{100} X_i ≈ N(300, 200)  [1]

Using a continuity correction, the probability is:

P(280 < S < 320) = P(279.5 < S < 320.5)  [1]

Standardising this:

P(279.5 < S < 320.5) = P(S < 320.5) - P(S < 279.5) = \Phi\left(\frac{320.5 - 300}{\sqrt{200}}\right) - \Phi\left(\frac{279.5 - 300}{\sqrt{200}}\right) = \Phi(1.45) - \Phi(-1.45) = 2 \times 0.92647 - 1 = 0.853
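As an aside (not in the Core Reading), the blood-group calculation in Solution 6.6 can be checked against the exact binomial distribution in R:

1 - pbinom(115, size = 300, prob = 0.45)        # exact P(X > 115)
1 - pnorm(115.5, mean = 135, sd = sqrt(74.25))  # normal approximation with continuity correction: about 0.988

The two agree closely, which is what the continuity correction is designed to achieve.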
… resulting in (0, \hat{\theta}_U) being a 95% confidence interval for \theta.

Fortunately, in most practical situations such quantities g(X, \theta) do exist, although an approximation to the method is needed for the binomial and Poisson cases.

With prediction intervals, we are predicting a single future value from the distribution. Since we already have a sample of values (X_1, …, X_n) from this distribution, we'll call this new predicted value X_{n+1}.

A similar approach can be used for prediction intervals. In the example above, of sampling from a normal distribution with known variance, \bar{X} - X_{n+1} has a distribution that does not depend on \mu, and in fact:

\frac{\bar{X} - X_{n+1}}{\sigma\sqrt{1 + 1/n}} ~ N(0, 1)

The predicted value comes from a normal distribution, X_{n+1} ~ N(\mu, \sigma^2). The Central Limit Theorem tells us that for samples from a normal distribution, \bar{X} ~ N(\mu, \sigma^2/n). Hence, using the linear combination of normal distributions result from Chapter 4:

\bar{X} - X_{n+1} ~ N(0, \sigma^2/n + \sigma^2) = N(0, \sigma^2(1/n + 1))

Standardising this gives the result above.

The previous derivations therefore give prediction intervals for X_{n+1} if we replace \sigma/\sqrt{n} with \sigma\sqrt{1 + 1/n}: a 95% prediction interval for the random sample of size 20 above is:

\bar{X} \pm 1.96\,\sigma\sqrt{1 + \tfrac{1}{20}} = 62.75 \pm 20.08

A less formal way to consider this is as follows. The predicted value comes from a N(\mu, \sigma^2) distribution. Since P(-1.96 < \frac{X_{n+1} - \mu}{\sigma} < 1.96) = 0.95, the interval \mu \pm 1.96\sigma contains the new value with 95% probability; replacing \mu with its estimate \bar{X}, and allowing for the extra variability of \bar{X}, leads to the interval above.
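A short R check of the prediction interval arithmetic (my check, not the notes'); the sample mean 62.75 and n = 20 come from the example above, and sigma = 10 is the value implied by the quoted half-width of 20.08:

xbar <- 62.75; sigma <- 10; n <- 20
xbar + c(-1, 1) * qnorm(0.975) * sigma * sqrt(1 + 1/n)   # about 42.67 and 82.83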
A submodel can never fit better than the model that contains it, so its scaled deviance is at least as large, and so the statistic will be positive.

The code for comparing these two (non-normally distributed) models, model1 and model2, in R is:

anova(model1, model2, test = "Chi")

A very important point is that this method of comparison can only be used for nested models. In other words, Model 1 must be a submodel of Model 2. Thus, we can compare two models for which the distribution of the data and the link function are the same, but the linear predictor has one extra parameter in Model 2, for example \beta_0 + \beta_1 x and \beta_0 + \beta_1 x + \beta_2 x^2. But we could not compare in this way if the distribution of the data or the link function are different, or, for example, when the linear predictors are \beta_0 + \beta_1 x + \beta_2 x^2 and \beta_0 + \beta_3 \log x. It should be clear that we can gauge the importance of factors by examining the scaled deviances, but we cannot use the testing procedure outlined above.

In the first case, the difference between the models is \beta_2 x^2, and so a significant difference between the models tells us that the quadratic term should be included. In the second case, the difference between the models is \beta_3 \log x - \beta_1 x - \beta_2 x^2, and so a significant difference doesn't tell us which parameter is significant.

An alternative method of comparing models is to use Akaike's Information Criterion (AIC). Since the deviance will always decrease as more covariates are added to the model, there will always be a tendency to add more covariates. However, this will increase the complexity of the model, which is generally considered to be undesirable. To take account of the undesirability of increased complexity, computer packages will often quote the AIC, which is a penalised log-likelihood:

AIC = -2 \times \log L_M + 2 \times number of parameters

where \log L_M is the log-likelihood of the model under consideration.

When comparing two models, the smaller the AIC, the better the fit. So if the change in deviance is more than twice the change in the number of parameters, then the larger model gives a smaller AIC. This is approximately equivalent to checking whether the difference in deviance is greater than the 5% value of the \chi^2 distribution for degrees of freedom between 5 and 15. However, it has the added advantage of being a simple way to compare GLMs without formal testing. This is similar to comparing the adjusted R^2 for multiple linear regression models in the previous chapter, and hence it is displayed as part of the output of a computer-fitted GLM.

In R the AIC is displayed as part of the results from summary(model).

An example of this is given in the R box at the end of Section 5.4.
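To make the penalty explicit, the AIC identity above can be checked by hand for any fitted GLM; this is my sketch using base R's logLik() and AIC(), where model stands for any glm fit:

ll <- logLik(model)                        # log-likelihood, with the parameter count as an attribute
-2 * as.numeric(ll) + 2 * attr(ll, "df")   # matches AIC(model)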
5.6  The process of selecting explanatory variables

As for multiple linear regression, the process of selecting the optimal set of covariates for a GLM is not always easy. Again, we could use one of the two following approaches:

(1) Forward selection. Add the covariate that reduces the AIC the most or causes a significant decrease in the deviance. Continue in this way until adding any more covariates causes the AIC to rise or does not lead to a significant improvement in the deviance. Note we should start with main effects before interaction terms, and linear terms before polynomial terms.

Suppose we are modelling the number of claims on a motor insurance portfolio and we have data on the driver's age, sex and vehicle group. We would start with the null model (ie a single constant equal to the sample mean).

Then we would try each of the single covariate models (linear function of age, or the factors sex or vehicle group) to see which produces the most significant improvement in a \chi^2 test or reduces the AIC the most. Suppose this was sex. Then we would try adding a second covariate (linear function of age or the factor vehicle group). Suppose this was age. Then we would try adding the third covariate (vehicle group). We might then try a quadratic function of the variable age (and maybe higher powers) or each of the 2-term interactions (eg sex*age or sex*group or age*group). Finally we would try the 3-term interaction (ie sex*age*group).

(2) Backward selection. Start by adding all available covariates and interactions. Then remove covariates one by one, starting with the least significant, until the AIC reaches a minimum or there is no significant improvement in the deviance, and all the remaining covariates have a statistically significant impact on the response.

So with the last example we would start with the 3-term interaction sex*age*group and look at which parameter has the largest p-value (in a test of it being zero) and remove that. We should see a significant improvement in a \chi^2 test and the AIC should fall. Then we remove the next parameter with the largest p-value, and so on.

The Core Reading uses R to demonstrate this procedure. Whilst this will be covered in the CS1 PBOR, it's important to understand the process here.

Example

We demonstrate both of these methods in R using a binomial model on the mtcars dataset (supplied with base R) to determine whether a car has a V engine or an S engine (vs), using weight in 1,000 lbs (wt) and engine displacement in cubic inches (disp) as covariates.

Forward selection

Starting with the null model:

model0 <- glm(vs ~ 1, data = mtcars, family = binomial)

The AIC of this model (which would be displayed using summary(model0)) is 45.86. We have to choose whether we add disp or wt first. We try each and see which has the greatest improvement in the deviance.
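Incidentally, base R's add1() automates this "try each candidate covariate" step; a sketch (not part of the Core Reading example) under the same setup:

add1(model0, scope = ~ disp + wt, test = "Chisq")   # deviance and AIC for each single addition

Its output tabulates, for each candidate, the same deviances and AICs that we now compute one model at a time.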
So using these would have given the same results (as Model 1 produces a smaller AIC than Model 2, and then Model 3 increases the AIC and so we would not have selected it). kward selection Starting with all the possibilities: modelA <- glm(vs ~ wt * disp, data=mtcars, family-binomial) (© 18: 2022 examinations avon Company(31-13: Generalised near models age? The output is: Estimate Std. Error z value Pr(>Iz!) 2.308003 4.169950 0.554 373 4.460010 1.629645 0.864 disp 0.081218 0.035930 -1.147 werdise 0.001733 9.002023 0.216 None of these covariates are significant. ‘The parameter of the interaction term has the highest p-value (0.829), and so is most likely to be We first remove the interaction term wt :
Backward selection

Starting with all the possibilities:

modelA <- glm(vs ~ wt * disp, data = mtcars, family = binomial)

The output is:

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.308003   4.169950   0.554    0.580
wt           1.460010   1.629645   0.896    0.371
disp        -0.041218   0.035930  -1.147    0.251
wt:disp      0.001733   0.008023   0.216    0.829

None of these covariates are significant.

The parameter of the interaction term has the highest p-value (0.829), and so is most likely to be zero. We first remove the interaction term wt:disp:

modelB <- update(modelA, ~ . - wt:disp)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.60859    2.43903    0.660    0.509
wt           1.62635    1.49068    1.091    0.275
disp        -0.03443    0.01536   -2.241    0.025 *

The AIC has fallen from 29.361 to 27.4. Alternatively, carrying out a \chi^2 test using anova(modelA, modelB, test="Chi") would show that there is no significant difference between the models (p-value of 0.8417), and therefore we are correct to remove the interaction term between wt and disp.

The wt term is not significant, so removing that:

modelC <- update(modelB, ~ . - wt)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.137827   1.389954   2.978  0.00290 **
disp        -0.021600   0.007131  -3.029  0.00245 **

Both of these coefficients are significant, and the AIC has fallen from 27.4 to 26.696.

Alternatively, carrying out a \chi^2 test using anova(modelB, modelC, test="Chi") would show that there is no significant difference between the models (p-value of 0.255), and therefore we are correct to remove the wt covariate.

We would stop at this model. If we remove the disp term (to give the null model), the AIC increases to 45.86. Alternatively, carrying out a \chi^2 test between these two models would show a very significant difference (p-value of less than 0.001), and therefore we should not remove the disp covariate.

We can see that both forward and backward selection lead to the same model being chosen in this case.

5.7  Estimating the response variable

Once we have obtained our model and its estimates, we are then able to calculate the value of the linear predictor, \hat{\eta}, and, by using the inverse of the link function, we can calculate our estimate of the response variable: \hat{\mu} = g^{-1}(\hat{\eta}).

Substituting the estimated parameters into the linear predictor gives the estimated value of the linear predictor for different individuals. The link function links the linear predictor to the mean of the distribution. Hence we can obtain an estimate for the mean of the distribution of Y for that individual.

Let's now return to the Core Reading example on page 45.

Suppose we wish to estimate the probability of having a V engine for a car with weight 2,100 lbs and displacement 180 cubic inches.

Using our linear predictor \beta_0 + \beta_1 \times disp (ie vs ~ disp), we obtained estimates of \hat{\beta}_0 = 4.137827 and \hat{\beta}_1 = -0.021600. These coefficients are displayed as part of the summary output of modelC in the example above.

Hence, for displacement 180 we have:

\hat{\eta} = 4.137827 - 0.021600 \times 180 = 0.24983

We did not specify the link function, so we shall use the canonical binomial link function, which is the logit function:

\log\frac{\hat{\mu}}{1 - \hat{\mu}} = 0.24983  =>  \hat{\mu} = \frac{e^{0.24983}}{1 + e^{0.24983}} = 0.562

Recall that the mean for a binomial model is the probability. So the probability of having a V engine for a car with weight 2,100 lbs and displacement 180 cubic inches is 56.2%. The figure 2,100 does not enter the calculation because we removed the weight covariate.

In R we can obtain this as follows:

predict(modelC, newdata = data.frame(disp = 180), type = "response")
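The 0.562 can also be reproduced from first principles (my check, not the Core Reading's); plogis() is base R's inverse logit:

eta <- 4.137827 - 0.021600 * 180   # fitted linear predictor: 0.24983
plogis(eta)                        # inverse logit: about 0.562
exp(eta) / (1 + exp(eta))          # the same, written out explicitly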
The R code for obtaining the fitted values of a GLM model is:

fitted(model)

For example, in the actuarial pass rates model detailed on page 6, we could calculate from the model what the pass rate ought to be for students who have attended tutorials, submitted three assignments and scored 60% on the mock exam. The difference between this theoretical pass rate and the actual pass rate observed for students who match the criteria exactly will give us the residuals.

Question

Draw up a table showing the differences between the actual and expected values of the truancy rates in the example on page 9.

Solution

Recall that the expected number of unexplained absences in a year was modelled by:

\eta = \alpha_i + \beta_j + \gamma x

where x = age, and the \alpha's, \beta's and \gamma are as follows:

\alpha_{WC} = -2.64,  \alpha_{OC} = -1.14,  \beta_M = -3.26,  \beta_F = -3.54,  \gamma = 0.64

where WC = within catchment, OC = outside catchment, M = male, F = female.

This gives expected values of:

                                 Age last birthday
                             8      10      12      14
Within catchment    Male   0.46    1.65    5.93   21.33
area              Female   0.35    1.25    4.48   16.12
Outside catchment   Male   2.05    7.39   26.58   95.58
area              Female   1.55    5.58   20.08   72.24

So the differences between the actual values (given on page 9) and expected values are:

                                 Age last birthday
                             8      10      12      14
Within catchment    Male   1.34    0.35    0.37    7.23
area              Female   0.5     0.35    0.52    0.08
Outside catchment   Male   0.05    0.41    1.08   23.58
area              Female   1.25    0.62    0.49    4.04

The procedure here is a natural extension of the way we calculated residuals for linear regression models covered in the previous chapter. However, because of the different distributions used, we need to transform these 'raw' residuals so we are able to interpret them meaningfully. There are two kinds of residuals: Pearson and deviance.

6.1  Pearson residuals

The Pearson residuals are defined as:

\frac{y - \hat{\mu}}{\sqrt{var(\hat{\mu})}}

The var(\hat{\mu}) in the denominator refers to the variance of the response distribution, var(Y), using the fitted values, \hat{\mu}, in the formula. For example, since the variance of the exponential distribution is \mu^2, we have var(\hat{\mu}) = \hat{\mu}^2 in that case.

The Pearson residual, which is often used for normally distributed data, has the disadvantage that its distribution is often skewed for non-normal data. This makes the interpretation of residual plots difficult.

The R code for obtaining the Pearson residuals is:

residuals(model, type = "pearson")

The Pearson residuals returned by R are calculated slightly differently from the definition given in this section. Therefore, this output won't necessarily match the Pearson residuals calculated from first principles using (y - \hat{\mu})/\sqrt{var(\hat{\mu})}.

If the data come from a normal distribution, then the Pearson residuals will follow the standard normal distribution. By comparing these residuals to a standard normal distribution (eg by using a Q-Q plot), we can determine whether the model is a good fit.

However, for non-normal data the Pearson residuals will not follow the standard normal distribution and won't even be symmetrical. This makes it difficult to determine whether the model is a good fit. Hence we will need to use a different type of residual.

6.2  Deviance residuals

Deviance residuals are defined as the product of the sign of y - \hat{\mu} and the square root of the contribution of y to the scaled deviance. Thus, the deviance residual is:

sign(y - \hat{\mu})\,d_i

where the scaled deviance is \sum d_i^2.

Recall that:

sign(x) = -1 if x < 0,  0 if x = 0,  +1 if x > 0
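A sketch of both residual types in R (my illustration, not the Core Reading's; model stands for any fitted glm object). For deviance residuals, the squares sum back to the model's deviance, matching the definition above:

rp <- residuals(model, type = "pearson")    # Pearson residuals
rd <- residuals(model, type = "deviance")   # deviance residuals: sign(y - mu) * d_i
sum(rd^2)                                   # the sum of the d_i^2 ...
deviance(model)                             # ... equals the model's deviance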
Credible intervals

Having derived the posterior distribution of a parameter \theta, there are several ways in which we can summarise inferences about \theta. For single parameters, a plot of the posterior density is very informative and shows clearly the range of values consistent with our posterior beliefs.

In Section 5.1 below, the Core Reading considers a numerical example where the posterior distribution is Gamma(15, 5.3). A plot of the PDF of this distribution is given below:

[Figure: PDF of the Gamma(15, 5.3) distribution]

As described earlier, we can also quote quantities such as the posterior mean of a parameter or the posterior variance. For the Gamma(15, 5.3) distribution pictured above, the mean is 2.83, the variance is 0.534 and the standard deviation is 0.731.

For expressing and quantifying uncertainty about the values of \theta, a natural analogue of the classical confidence interval is the Bayesian credible interval.

In Chapter 8, we saw how to estimate parameters using the method of moments and the method of maximum likelihood. In Chapter 9, we used confidence intervals to express the uncertainty in these estimates. Earlier in this chapter, we estimated a parameter using the mean, mode or median of its posterior distribution. We will now explain how to express the uncertainty in these estimates.

Suppose that, given data x, we derive the posterior density of \theta as f(\theta | x). Then, for 0 < \alpha < 1, a 100(1 - \alpha)% Bayesian credible interval is any region with posterior probability 1 - \alpha of containing \theta. For the Gamma(15, 5.3) posterior above, an equal-tailed 95% credible interval is (1.58, 4.43), since P(\theta < 1.58 | x) = 0.025 = P(\theta > 4.43 | x).
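This interval can be checked in R (my check, not the Core Reading's):

qgamma(c(0.025, 0.975), shape = 15, rate = 5.3)   # approximately 1.58 and 4.43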
Question

A random sample of size 15 from a normal distribution with mean \mu and standard deviation 3 yields the following data values:

10.75  -0.29  5.37  6.68  8.77  1.69  7.12  4.89  6.45  4.27  9.37  5.68  3.87  7.70  6.98

The prior distribution of \mu is N(5, 2^2).

Calculate an equal-tailed 95% Bayesian credible interval for \mu based on these data values.

You are given that the posterior distribution of \mu is N(5.83, 0.722^2).

Solution

From the Tables, we have P(-1.96 < Z < 1.96) = 0.95, so an equal-tailed 95% credible interval for \mu is:

5.83 \pm 1.96 \times 0.722 = (4.41, 7.25)
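Again this is quick to verify in R (my check):

qnorm(c(0.025, 0.975), mean = 5.83, sd = 0.722)   # approximately 4.41 and 7.25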
The equal-tailed interval is not the only possibility. For the Gamma(15, 5.3) posterior, a 95% highest posterior density interval is (1.48, 4.29). For this interval, P(\theta < 1.48 | x) \neq 0.025 \neq P(\theta > 4.29 | x), although the probabilities do sum to 5%.

For unimodal distributions (such as the gamma distribution), the two endpoints of a highest posterior density interval have the same height (ie density). In the example above:

f(1.48) = f(4.29) ≈ 0.08

The densities of all the values in a highest posterior density interval are larger than the densities of those outside the interval (ie the graph is higher in the interval). So, a highest posterior density interval contains a collection of the most likely values of the parameter \theta, which is a desirable property. By definition, a highest posterior density interval must contain the mode, ie the posterior estimate for \theta under 0-1 loss.

For a unimodal distribution, the highest posterior density interval is the shortest interval amongst all Bayesian credible intervals. For symmetrical distributions, such as a normal posterior distribution, the equal-tailed credible interval and highest posterior density interval are identical when based on the same data set. For skewed distributions, such as the gamma and most beta posterior distributions, the highest posterior density interval is not the same as the equal-tailed interval (as we have seen in the example above involving the Gamma(15, 5.3) distribution).
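Because the HPD interval is the shortest 95% credible interval for a unimodal posterior, it can be found numerically in R by minimising the interval width over the lower tail probability (a sketch of my own, not the course's code):

width <- function(p) qgamma(p + 0.95, 15, 5.3) - qgamma(p, 15, 5.3)
p <- optimize(width, interval = c(0, 0.05))$minimum
qgamma(c(p, p + 0.95), 15, 5.3)   # approximately 1.48 and 4.29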
The chapter summary starts on the next page so that you can keep all the chapter summaries together for revision purposes.

Chapter 14 Summary

Bayesian estimation v classical estimation

A common problem in statistics is to estimate the value of some unknown parameter \theta.

The classical approach to this problem is to treat \theta as a fixed, but unknown, constant and use sample data to estimate its value. For example, if \theta represents some population mean, then its value may be estimated by a sample mean.

The Bayesian approach is to treat \theta as a random variable.

Prior distribution

The prior distribution of \theta represents the knowledge available about the possible values of \theta before the collection of any sample data.

Likelihood function

A likelihood function, L, is then determined, based on a random sample X = (X_1, X_2, …, X_n). The likelihood function is the joint PDF (or, in the discrete case, the joint probability) of X_1, X_2, …, X_n | \theta.

Posterior distribution

The prior distribution and the likelihood function are combined to obtain the posterior distribution of \theta. When \theta is a continuous random variable:

f_{post}(\theta | x) \propto f_{prior}(\theta) \times L(\theta; x)

When \theta is a discrete random variable, the posterior distribution is a set of conditional probabilities.

Conjugate distributions

For a given likelihood, if the prior distribution leads to a posterior distribution belonging to the same family as the prior, then this prior is called the conjugate prior for this likelihood.

Uninformative prior distributions

If we have no prior knowledge about \theta, a uniform prior distribution should be used. This is sometimes referred to as an uninformative prior distribution. When the prior distribution is uniform, the posterior PDF is proportional to the likelihood function.

Loss functions

A loss function, such as quadratic (or squared) error loss, absolute error loss or all-or-nothing (0/1) loss, gives a measure of the loss incurred when \hat{\theta} is used as an estimator of the true value of \theta. In other words, it measures the seriousness of an incorrect estimator.

Under squared error loss, the mean of the posterior distribution minimises the expected loss function. Under absolute error loss, the median of the posterior distribution minimises the expected loss function. Under all-or-nothing loss, the mode of the posterior distribution minimises the expected loss function.

Credible intervals

A Bayesian credible interval quantifies uncertainty about the values of a parameter \theta. A 100(1 - \alpha)% credible interval is an interval whose posterior probability of containing \theta is 1 - \alpha. These can be equal-tailed intervals or highest posterior density intervals.

The endpoints of an equal-tailed 95% credible interval for \theta are the lower and upper 2.5% points of the posterior distribution of \theta. If the posterior distribution is a standard distribution with tabulated values, we can calculate equal-tailed credible intervals algebraically.

The densities of all points within a highest posterior density interval are greater than or equal to the densities of all points that lie outside the interval. We can use R to calculate highest posterior density intervals.

So the posterior distribution of \theta given x is normal with mean:

E(\theta | x) = \frac{\sum x_i/\sigma_1^2 + \mu/\sigma_2^2}{n/\sigma_1^2 + 1/\sigma_2^2}

or:

E(\theta | x) = Z\bar{x} + (1 - Z)\mu    (15.3.4)

where:

Z = \frac{n}{n + \sigma_1^2/\sigma_2^2}    (15.3.5)

Equation (15.3.4) is a credibility estimate of E(\theta | x), since it is a weighted average of two estimates: the first, \bar{x}, is a maximum likelihood estimate based solely on data from the risk itself, and the second, \mu, is the best available estimate if no data were available from the risk itself.

Notice that, as for the Poisson/gamma model, the estimate based solely on data from the risk itself is a linear function of the observed data values.

There are some further points to be made about the credibility factor, Z, given by (15.3.5):

•  It is always between zero and one.
•  It is an increasing function of n, the amount of data available.
•  It is an increasing function of \sigma_2, the standard deviation of the prior distribution.

These features are all exactly what would be expected for a credibility factor.

Notice also that as \sigma_1^2 increases, the denominator increases, and so Z decreases. \sigma_1^2 denotes the variance of the distribution of the sample values. If this is large, then the sample values are likely to be spread over a wide range, and they will therefore be less reliable for estimation.

The R code to obtain the Monte Carlo credibility premiums for the above, based on M simulations, is:

Z <- n / (n + sigma1^2 / sigma2^2)
cp <- rep(0, M)
for (i in 1:M) {
  theta <- rnorm(1, mu, sigma2)
  x <- rnorm(n, theta, sigma1)
  cp[i] <- Z * mean(x) + (1 - Z) * mu
}

The average of these credibility estimates is given by:

mean(cp)
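To run the loop above it needs values for M, n, mu, sigma1 and sigma2; the following driver, with made-up parameters of my own choosing, illustrates it end to end:

M <- 10000; n <- 5
mu <- 100; sigma1 <- 20; sigma2 <- 10   # illustrative values only
Z <- n / (n + sigma1^2 / sigma2^2)      # here Z = 5/9
cp <- rep(0, M)
for (i in 1:M) {
  theta <- rnorm(1, mu, sigma2)
  x <- rnorm(n, theta, sigma1)
  cp[i] <- Z * mean(x) + (1 - Z) * mu
}
mean(cp)   # close to mu = 100 on average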
Further remarks on the normal/normal model

In Section 3.4 the normal/normal model for the estimation of a pure premium was discussed within the framework of Bayesian statistics. In this section the same model will be considered, without making any different assumptions, but in a slightly different way. The reason for doing this is that some of the observations will be helpful when empirical Bayes credibility theory is considered in the next chapter.

In this section, as in Section 3.4, the problem is to estimate the expected aggregate claims produced each year by a risk. Let:

X_1, X_2, …, X_n, X_{n+1}, …

be random variables representing the aggregate claims in successive years. The following assumptions are made:

•  The distribution of each X_j depends on the value of a fixed, but unknown, parameter, \theta.
•  The conditional distribution of X_j given \theta is N(\theta, \sigma_1^2).
•  Given \theta, the random variables {X_j} are independent.
•  The prior distribution of \theta is N(\mu, \sigma_2^2).

The values of X_1, X_2, …, X_n have already been observed and the expected aggregate claims in the coming, ie (n+1)th, year need to be estimated.

So, depending on the context of the problem, X_j represents either:

•  the aggregate claim amount in Year j per unit of risk volume, or
•  the total number of claims in Year j per unit of risk volume.

In Model 1, we assume that P_j is always equal to 1, ie the volume of business is the same for each risk group.

Assumptions for EBCT Model 2

The assumptions that specify EBCT Model 2 are as follows:

Assumption 7: The distribution of each X_j depends on the value of a parameter, \theta, whose value is the same for each j but is unknown.

Assumption 8: Given \theta, the X_j's are independent (but not necessarily identically distributed).

Assumption 9: E(X_j | \theta) does not depend on j.

Assumption 10: P_j var(X_j | \theta) does not depend on j.

As in previous sections, \theta is known as the risk parameter for the risk and, as for EBCT Model 1, it could be just a single real-valued number or a more general quantity, such as a vector of real-valued numbers.

Assumption 7 is the standard assumption for all credibility models considered here. Assumption 8 corresponds to Assumption 2 in EBCT Model 1, but notice that Assumption 8 is slightly weaker than Assumption 2. Assumption 8 does not require the X_j's to be conditionally (given \theta) identically distributed, but only to be conditionally independent. There is no assumption in EBCT Model 2 that the X_j's are unconditionally, or conditionally given \theta, identically distributed.

If all the P_j's are equal to 1, then Assumptions 7-10, taken together, become the same as Assumptions 4, 5 and 6 (taken together) in EBCT Model 1. Thus, if all the P_j's are equal to 1, EBCT Model 2 is exactly the same as EBCT Model 1.

Having made Assumptions 9 and 10, m(\theta) and s^2(\theta) can be defined as follows:

m(\theta) = E(X_j | \theta)  and  s^2(\theta) = P_j var(X_j | \theta)

The definition of m(\theta) corresponds exactly to the definition for EBCT Model 1 in Section 1, but the definition of s^2(\theta) is slightly different. In Model 2, there is a factor of P_j in the definition of s^2(\theta). In Model 1, s^2(\theta) = var(X_j | \theta).

To gain a little more insight into Assumptions 9 and 10, consider the following example.

Suppose the risk being considered is made up of a different number of independent policies each year and that the number of policies in Year j is P_j. It is important to realise that P_j is a known quantity, not a random variable.

Suppose also that the aggregate claims in a single year from a single policy have mean m(\theta) and variance s^2(\theta), where m() and s^2() are functions of \theta, and \theta is the fixed, but unknown, risk parameter for all these policies.

Now let Y_j denote the aggregate claims from all the policies in force in Year j. Then E(Y_j | \theta) is the expected aggregate claim amount from all policies in Year j, and:

E(Y_j | \theta) = \sum_{k=1}^{P_j} (expected aggregate claim amount for policy k) = \sum_{k=1}^{P_j} m(\theta)

So E(Y_j | \theta) = P_j m(\theta).

Also, since the policies are assumed to be independent:

var(Y_j | \theta) = \sum_{k=1}^{P_j} (variance of aggregate claim amount for policy k) = \sum_{k=1}^{P_j} s^2(\theta)

So var(Y_j | \theta) = P_j s^2(\theta).

Then, since X_j = Y_j / P_j:

E(X_j | \theta) = \frac{1}{P_j} E(Y_j | \theta) = m(\theta)  and  var(X_j | \theta) = \frac{1}{P_j^2} var(Y_j | \theta) = \frac{s^2(\theta)}{P_j}

So:

E(X_j | \theta) = m(\theta)  and  P_j var(X_j | \theta) = s^2(\theta)
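A quick simulation (my own illustration, with made-up per-policy moments; the normal distribution is used purely for convenience, since only the mean and variance matter here) confirms the scaling property that X_j = Y_j/P_j has mean m(\theta) and variance s^2(\theta)/P_j:

set.seed(1)
m_theta <- 50; s2_theta <- 400   # per-policy mean and variance, given theta
Pj <- 250                        # policies in force in year j
Xj <- replicate(20000, sum(rnorm(Pj, m_theta, sqrt(s2_theta))) / Pj)
mean(Xj)       # close to m(theta) = 50
Pj * var(Xj)   # close to s^2(theta) = 400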