Assignment Booklet - Jan.-Dec. 2024 (PGDAST)
MST-001 to MSTL-003
(Valid from 1st January 2024 to 31st December, 2024)
School of Sciences
Indira Gandhi National Open University
Maidan Garhi, New Delhi-110068
(2024)
Dear Student,
Please read the information on assignments in the Programme Guide that we have sent you after
your enrolment. A weightage of 30%, as you are aware, has been earmarked for continuous
evaluation, which would consist of one tutor-marked assignment for this course. The
assignments for MST-001 to MSTE-004 have been given in this booklet.
Instructions for Formatting Your Assignments
Before attempting the assignment, please read the following instructions carefully:
1) On top of the first page of your answer sheet, please write the details exactly in the following
format:
ENROLLMENT NO :……………………………………………
NAME :……………………………………………
ADDRESS :……………………………………………
……………………………………………
……………………………………………
PROGRAMME CODE: ………………………..
COURSE CODE: ……………………………….
COURSE TITLE: ………………………………
STUDY CENTRE: ………………………..……. DATE: ……………….………………...
2) Use only foolscap size writing paper (but not of very thin variety) for writing your answers.
3) Leave 4 cm margin on the left, top and bottom of your answer sheet.
We strongly suggest that you should retain a copy of your answer sheets.
6) This assignment is valid from January 1, 2024 up to December 31, 2024.
7) The latest assignments should be submitted by the candidate.
8) You cannot fill the Exam Form for this course till you have submitted this assignment. So
solve it and submit it to your study centre at the earliest. If you wish to appear in the
TEE, June 2024, you should submit your TMAs by March 31, 2024. Similarly, if you wish
to appear in the TEE, December 2024, you should submit your TMAs by September 30,
2024.
We wish you good luck.
TUTOR MARKED ASSIGNMENT
MST-001: Foundation in Mathematics and Statistics
Course Code: MST-001
Assignment Code: MST-001/TMA/2024
Maximum Marks: 100
Note: All questions are compulsory. Answer in your own words.
1. State whether the following statements are True or False and also give the reason in support
of your answer. (2×5=10)
a) Collection of rich persons in India forms a set.
b) The following rule is a function from A to B.
[Arrow diagram: f maps A = {x1, x2, x3} to B = {y1, y2}]
c) $\frac{d}{dx}(9 - 7x)^{5} = 45\,(9 - 7x)^{4}$
d) In exclusive method, upper limit of a class is included in the same class.
e) The order of the matrix $\begin{pmatrix} 2 & 5 & 6 \\ 4 & 3 & 1 \end{pmatrix}$ is 3 × 2.
2. If four cards are chosen from a pack of 52 playing cards then find the number of ways that
all four cards are:
a) of same suit
b) red
c) face cards
d) king
e) of different suit (2 + 2 + 2 + 2 +2)
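The counts asked for in this question are all combinations, so hand calculations can be checked with a few lines of Python. This is only a minimal sketch: it assumes a standard pack with 13 cards per suit, 26 red cards, and 12 face cards (jack, queen and king of each suit), and the variable names are illustrative.

```python
from math import comb

# Counts of 4-card selections from a standard 52-card pack (order ignored).
same_suit = 4 * comb(13, 4)   # choose the suit, then 4 of its 13 cards
all_red = comb(26, 4)         # 26 red cards (hearts and diamonds)
all_face = comb(12, 4)        # assumes 12 face cards: J, Q, K in 4 suits
all_kings = comb(4, 4)        # the four kings can be taken in one way only
different_suits = 13 ** 4     # one card from each of the four suits

print(same_suit, all_red, all_face, all_kings, different_suits)
```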
3. Arrange the numbers 49, 36, 42, 19, 22, 27, 14, 13, 24, 48, 23, 28, 17, 42, 39, 45, 22, 24, 17,
41, 18, 42, 38, 43, 11, 27, 36, 13, 40, 30, 24, 10, 18, 47, 18, 19, 23, 12, 27 in stretched stem-
and-leaf display that has single-digit starting parts and leaves, but has stem width of 5. (10)
3. a) The frequency distribution of the marks obtained by 25 students in each of two
sections is given as follows:
Marks: 10-20 20-30 30-40 40-50 50-60
Section A: 2 5 10 5 3
Section B: 3 7 8 5 2
Find which section is more consistent.
b) Mean and standard deviation of 18 observations are found to be 7 and 4, respectively. On
comparing with the original data, it was found that an observation 12 was miscopied as 21 in
the calculations. Calculate the correct mean and standard deviation. (7+3)
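For part (b), the usual approach is to recover the sum and the sum of squares implied by the reported summary, swap the miscopied value, and recompute. The sketch below follows that route; it assumes the standard deviation was computed with divisor n (not n − 1), and the variable names are illustrative.

```python
from math import sqrt

n, mean_wrong, sd_wrong = 18, 7.0, 4.0
wrong_value, right_value = 21, 12

# Totals implied by the reported (incorrect) summary.
# Assumption: the standard deviation uses divisor n, not n - 1.
total = n * mean_wrong
total_sq = n * (sd_wrong ** 2 + mean_wrong ** 2)

# Replace the miscopied observation by the correct one.
total += right_value - wrong_value
total_sq += right_value ** 2 - wrong_value ** 2

mean_correct = total / n
sd_correct = sqrt(total_sq / n - mean_correct ** 2)
print(mean_correct, sd_correct)
```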
4. The equations of two regression lines are given as follows:
4x – 5y + 30 = 0
20x – 9y – 107 = 0
Calculate (i) regression coefficients, byx and bxy; (ii) correlation coefficient r(x, y);
(iii) Mean of X and Y; and (iv) the value of σy if σx = 3. (10)
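The quantities in this question follow from the slopes of the two lines and their point of intersection. The sketch below is one way to check a hand solution; it assumes the first equation is the regression of y on x and the second the regression of x on y (the opposite assignment would give a product of regression coefficients greater than 1).

```python
from math import sqrt

# Lines written as a*x + b*y + c = 0.
line1 = (4.0, -5.0, 30.0)      # 4x - 5y + 30 = 0
line2 = (20.0, -9.0, -107.0)   # 20x - 9y - 107 = 0

# Assumption: line1 is y on x and line2 is x on y.
byx = -line1[0] / line1[1]     # slope of y = byx * x + constant
bxy = -line2[1] / line2[0]     # slope of x = bxy * y + constant
r = sqrt(byx * bxy)            # positive, since both slopes are positive

# The regression lines intersect at (mean of X, mean of Y); Cramer's rule.
a1, b1, c1 = line1
a2, b2, c2 = line2
det = a1 * b2 - a2 * b1
mean_x = (b1 * c2 - b2 * c1) / det
mean_y = (a2 * c1 - a1 * c2) / det

sigma_x = 3.0
sigma_y = byx * sigma_x / r    # from byx = r * sigma_y / sigma_x
print(byx, bxy, r, mean_x, mean_y, sigma_y)
```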
5. A researcher collects the following information for two variables x and y:
n = 20, r = 0.5, mean (x) = 15, mean (y) = 20, σx = 4 and σy = 5
Later it was found that one pair of values (x, y) has been wrongly taken as (16, 30)
whereas the correct values were (26, 35). Find the correct value of r(x, y). (10)
6. a) If a, b, c, d are constants, then show that the coefficient of correlation between ax+b and
cy+d is numerically equal to that between x and y.
b) A statistician wanted to compare two methods A and B of teaching. He selected a random
sample of 22 students. He grouped them into 11 pairs so that the students in pair have
approximately equal scores on an intelligence test. In each pair one student was taught by
method A and the other by method B and examined after the course. The marks obtained
by both methods are given as:
Methods 1 2 3 4 5 6 7 8 9 10 11
Method A 24 29 19 14 30 19 27 30 20 28 11
Method B 37 35 16 26 23 27 19 20 16 11 21
Find the rank correlation coefficient. (3+7)
7. a) Fit an exponential curve of the form $Y = ab^{X}$ to the following data:
X: 1 2 3 4 5 6 7 8
Y: 1.0 1.2 1.8 2.5 3.6 4.7 6.6 9.1
b) Calculate the first, second and third quartile for the following data:
Class: Below 30 30-40 40-50 50-60 60-70 70-80 80 and above
Frequency: 69 167 207 65 58 27 10
Also find the quartile deviation and coefficient of quartile deviation. (10+10)
8. a) Board of Directors of Labour Union wishes to sample the opinion of its members before
submitting a change in its contribution at a forthcoming annual meeting. Questionnaires
are sent to a random sample of 200 members in three union locals. The results of the
survey are as follows:
Union Locals
Reaction A B C Total
Favour Change 35 45 20 100
Against Change 15 25 16 56
No Response 10 10 24 44
Total 60 80 60 200
Determine the amount of association between the Union locals and their reactions using
coefficient of contingency and interpret the result.
b) 600 candidates appeared in an examination. The boys outnumbered girls by 15% of
all candidates. Number of passed exceeded the number of failed candidates by 300. Boys
failing in the examination numbered 80. Determine the coefficient of association. (12+8)
TUTOR MARKED ASSIGNMENT
MST-003: Probability Theory
Course Code: MST-003
Assignment Code: MST-003/TMA/2024
Maximum Marks: 100
Note: All questions are compulsory. Answer in your own words.
1. Which of the following statements are true or false? Give reason in support of your answer.
(2×5 = 10)
a) When two dice are thrown simultaneously then total number of sample points in the
sample space will be 12.
b) Expected value of a continuous random variable X is defined as $E(X) = \int_{-\infty}^{x} x\, f(x)\, dx$.
2. An insurance company selected 6000 drivers from a city at random in order to find a
relationship between age and accidents. The following table shows the results to these 6000
drivers.
Age of drivers (in years) Accidents in one year
Class Interval 0 1 2 3 4 or more
18 – 25 700 310 225 110 85
25 – 40 1100 290 200 105 80
40 – 50 1200 235 175 80 55
50 and above 600 205 140 70 35
If a driver from the city is selected at random, find the probability of the following events:
a) Age lying between 18 – 25 and meet 3 accidents
b) Age lying between 18 – 40 and meet 1 accident
c) Age more than 25 years and meet at most one accident
d) Having no accident in the year
e) Age lying between 18 – 40 and meet at least 3 accidents
(2 + 2 + 2 + 2 +2)
3. Determine the constant k such that the function $f(x) = kx^{2}(1 - x)^{5}$, $0 < x < 1$, is a beta
distribution of first kind. Also, find its mean and variance. (10)
4. An insurance company insured 2000 scooter drivers, 3000 car drivers and 5000 truck
drivers. The probabilities that scooter, car and truck drivers meet an accident are 0.02, 0.04,
and 0.25 respectively. One of the insured persons meets with an accident. What is the
probability that he is a
a) Scooter driver
b) Car driver (5 + 5)
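This question is a direct application of Bayes' theorem: the prior for each driver type is its share of the 10,000 insured persons, and the posterior is the joint probability divided by the total probability of an accident. A minimal sketch, with illustrative dictionary names:

```python
insured = {"scooter": 2000, "car": 3000, "truck": 5000}
p_accident = {"scooter": 0.02, "car": 0.04, "truck": 0.25}

total_insured = sum(insured.values())

# Prior probability of each driver type among the insured persons.
prior = {k: v / total_insured for k, v in insured.items()}

# Joint probability of (driver type, accident), then total probability.
joint = {k: prior[k] * p_accident[k] for k in insured}
p_any_accident = sum(joint.values())

# Bayes' theorem: P(type | accident) = P(type and accident) / P(accident).
posterior = {k: joint[k] / p_any_accident for k in insured}
print(posterior["scooter"], posterior["car"])
```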
5. The following table represents the joint probability distribution of the discrete random
variable (X, Y):
        Y
X       1     2     3
1      0.2   0.2   0.1
2      0.1   0.3   0.1
Find
a) The marginal distributions.
b) The conditional distribution of Y given X = 2 (5 + 5)
6. a) A rain coat dealer can earn Rs 800 per day during a rainy day. If it is a dry day, he can
lose Rs 150 per day. What is his expectation, if the probability of rain is 0.6?
b) A player tosses two unbiased coins. He wins Rs. 10 if 2 heads appear, Rs. 5 if one head
appears and Rs 1 if no head appears. Find the expected value of the amount won by
him. (5 + 5)
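Both parts use the definition of expectation for a discrete random variable, the sum of each value times its probability. The sketch below assumes a loss is entered as a negative amount and that the two coins give 2, 1 and 0 heads with probabilities 1/4, 1/2 and 1/4; the helper name `expected_value` is illustrative.

```python
def expected_value(outcomes):
    """Sum of value * probability over (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)

# Part (a): gain of Rs 800 on a rainy day, loss of Rs 150 on a dry day.
dealer = [(800, 0.6), (-150, 0.4)]

# Part (b): two unbiased coins; payoff by number of heads (2, 1, 0).
coins = [(10, 0.25), (5, 0.5), (1, 0.25)]

dealer_expectation = expected_value(dealer)
coins_expectation = expected_value(coins)
print(dealer_expectation, coins_expectation)
```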
7. a) (i) Let X and Y be two independent random variables such that X ~ B(5, 0.06) and
Y ~ B(4, 0.6) . Find P X + Y 1
(ii) Comment on the statement: “The mean of a binomial distribution is 4 and variance
5”.
b) If the probability that an individual suffers a bad reaction from an injection of a given
serum is 0.002, determine the probability that out of 400 individuals
(i) exactly 2
(ii) more than 3
(iii) at least one
individuals suffer from bad reaction. (10 + 10)
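With n = 400 and p = 0.002, part (b) is commonly handled with the Poisson approximation to the binomial, using λ = np. The sketch below takes that route as an assumption; an exact binomial calculation would give slightly different figures.

```python
from math import exp, factorial

n, p = 400, 0.002
lam = n * p  # Poisson parameter under the approximation (assumed here)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

p_exactly_2 = poisson_pmf(2, lam)
p_more_than_3 = 1 - sum(poisson_pmf(k, lam) for k in range(4))
p_at_least_1 = 1 - poisson_pmf(0, lam)
print(p_exactly_2, p_more_than_3, p_at_least_1)
```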
8. a) A die is rolled. If the outcome is a number greater than 2, what is the probability that it
is an odd prime number?
b) A person is known to hit the target in 3 out of 4 shots whereas another person is known
to hit 2 out of 5 shots. Find the probability that the target is hit when they both try.
c) Events A, B, C are mutually exclusive and exhaustive. If the odds against A are 4:1 and
against B are 3:2, find the odds against event C. (7 + 7 + 6)
TUTOR MARKED ASSIGNMENT
MST-004: Statistical Inference
Course Code: MST-004
Assignment Code: MST-004/TMA/2024
Maximum Marks: 100
2. a) A random sample of nine college students yielded the following data concerning the
number of hours per day each student spent in using mobile phone:
5, 2, 7, 5.5, 3.5, 4, 5, 4.5, 4
Estimate the average number of hours per day spent in using mobile phone by the
college students.
b) If the sample values are 3, 5, 2, 7, and 0 then obtain the ML estimate for parameter θ
for the following distribution:
$f(x, \theta) = \theta e^{-\theta x};\ x > 0,\ \theta > 0$ (5+5)
3. A sample of 100 tyres is taken from a lot. The mean life of the tyres selected in the sample
is found to be 40,000 kms with a standard deviation of 3200 kms. Is it reasonable to
suppose the mean life of tyres in the lot as 41,000 kms at 5% level of significance? Also
establish 95% confidence limits within which the mean life of tyres in the lot is expected to
lie. (10)
4. The blood cholesterol levels of a population of workers have mean 202 mg/dl and standard
deviation 14 mg/dl. If a sample of 36 workers is selected from the population and sample
mean is calculated then find
i) mean and standard error of the sampling distribution of the mean.
ii) approximate the probability that the sample mean of their blood cholesterol levels will
lie between 198 mg/dl and 206 mg/dl. (5+5)
5. The following data relate to the number of items produced per shift by two workers,
Rahul and Ramesh, for a number of days; the data are assumed to follow a normal distribution:
Rahul 19 22 24 27 24 18 20 19 25
Ramesh 26 37 40 35 30 40 26 30 35 45
Can it be inferred that Rahul is a more stable worker than Ramesh by testing the
variation in the items produced by them at 5% level of significance? (10)
6 a) In a city, 36 out of a random sample of 500 men were found to be drinkers at a certain date.
After the heavy increase in tax on intoxicants, another sample of 100 men in the same
city included 6 drinkers. Do you feel that the observed proportion of drinkers has decreased
significantly at 1% level?
b) In a locality, 100 persons were randomly selected and asked about their educational
achievements. The results were as follows:
(20)
8. A company is trying to improve the work efficiency of its employees. It has organized a
special training programme for all employees. In order to assess the effectiveness of the
training programme, the company has selected 10 employees randomly and administered a
well-structured questionnaire. The scores (out of 100) obtained by the employees are given
in the following table:
S. No Before Training After Training
1 60 68
2 62 70
3 67 80
4 64 74
5 66 66
6 63 72
7 69 84
8 63 60
9 60 65
10 62 90
To examine whether the training programme has improved efficiency of the employees,
answer the following:
i) Are the two samples paired or independent?
ii) Formulate the null and alternative hypotheses.
iii) Which parametric test is used for testing the null hypothesis if it is known that the
scores of the employees before and after the training programme follow the normal
distribution? Conduct the test at 1% level of significance and conclude the result.
iv) Which non-parametric test is used for testing the null hypothesis if it is known that the
scores of the employees before and after the training programme do not follow the
normal distribution but the distribution of the differences of scores before and after the
training is symmetrical about its median? Conduct the test at 1% level of significance
and conclude the result.
(2+2+8+8)
TUTOR MARKED ASSIGNMENT
MST-005: Statistical Techniques
Course Code: MST-005
Assignment Code: MST-005/TMA/2024
Maximum Marks: 100
Note: All questions are compulsory. Answer in your own words.
1. State whether the following statements are true or false and also give the reason in support
of your answer. (2×5=10)
c) In SRSWOR, the number of possible samples of size n from a population of size N, if
sampling is done with replacement, is $N^{n}$.
d) One-way analysis of variance is a generalization of the two sample t-test.
e) If experimental error is reduced considerably, then the efficiency of the design is
decreased.
f) If strata are heterogeneous then a stratified sampling scheme provides estimates with
greater precision.
g) If one wants to convert random numbers selected from two digit numbers 00-99 to
uniformly distributed U (0, 1) variables then one has to divide them by 99.
2. Assume that you have to perform a sample survey for Family expenditure of the faculty of
Indira Gandhi National Open University. Then explain the main steps involved in the planning
and execution of that sample survey. (10)
3 a) In a class of Statistics, total number of students is 30. Select the linear and circular
systematic random samples of 10 students. The age of 30 students is given below:
Age: 22 25 22 21 22 25 24 23 22 21 20 21
22 23 25 23 24 22 24 24 21 20 23 21 22
20 20 21 22 25 (5)
b) To determine the yield rate of wheat in a district of Punjab, 6 groups were constructed of 6
plots each. The data is given in the following table:
Plot No. Group 1 Group 2 Group 3 Group 4 Group 5 Group 6
1 8 6 18 13 17 12
2 13 5 8 7 15 15
3 11 16 6 13 10 11
4 26 5 10 6 21 17
5 13 16 16 7 20 8
6 31 5 20 2 25 10
Select a cluster sample of 3 clusters from the given data and find sample mean. (5)
4. Three varieties A, B and C of wheat are sown in five plots each, and the following yields per
acre are obtained:
Plots A B C
1 8 7 12
2 10 5 9
3 7 10 13
4 14 9 12
5 11 9 14
Set up a table of analysis of variance and find out whether there is significant difference
between the yields of these varieties. (10)
5. An experiment was planned to study the effect of Sulphate of Potash and Super Phosphate on
the yield of potatoes. All the combinations of 2 levels of Super Phosphate [0 cent (p0) and 5
cent (p1)/acre] and two levels of Sulphate of Potash [0 cent (k0) and 5 cent (k1)/acre]
were studied in a randomised block design with 4 replications each. The yields [lb
per plot, where one plot = (1/70) acre] obtained are given in the table below:
Blocks Yields (lbs per plot)
I (1) k p kp
23 25 22 38
II p (1) k kp
40 26 36 38
III (1) k pk p
29 20 30 20
IV kp k p (1)
34 31 24 28
Analyse the data and give your conclusions. (10)
6. By generating 10 uniform random variates U(0, 1), estimate the integral
$$\frac{1}{\sqrt{2\pi}} \int_{-1}^{2} e^{-x^{2}/2}\, dx$$
Recognizing the integrand as the probability density function of N(0, 1), compare the
estimated value of the integral with its exact value. (10)
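A sample-mean Monte Carlo estimate of this kind maps each U(0, 1) variate onto the range of integration, averages the integrand, and multiplies by the length of the range. The sketch below assumes the integral is the standard normal density integrated from −1 to 2; the seed and the function names are illustrative choices, and the exact value is obtained from the normal distribution function via `math.erf`.

```python
import random
from math import erf, exp, pi, sqrt

def integrand(x):
    """Standard normal probability density function."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def monte_carlo_estimate(n_draws, rng, a=-1.0, b=2.0):
    """Sample-mean Monte Carlo estimate of the integral over (a, b)."""
    total = 0.0
    for _ in range(n_draws):
        u = rng.random()         # U(0, 1) variate
        x = a + (b - a) * u      # map it onto the interval (a, b)
        total += integrand(x)
    return (b - a) * total / n_draws

def normal_cdf(z):
    """Distribution function of N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

exact = normal_cdf(2.0) - normal_cdf(-1.0)
estimate = monte_carlo_estimate(10, random.Random(0))  # seed is arbitrary
print(estimate, exact)
```

With only 10 variates the estimate can be noticeably off; increasing `n_draws` shows how it settles towards the exact value.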
7. A sample of 100 villagers is to be drawn from a population of villages A and B. The
population means and population mean squares of their monthly wages are given below:
Village      N_i    X̄_i    S_i^2
Village A    400     60      20
Village B    200    120      80
Find the sample sizes using the Proportional and Neyman allocation techniques and
compare. Obtain the sample mean and variances for the Proportional Allocation and
SRSWOR for the given information. Then find the percentage gain in precision of
variances of sample mean under the proportional allocation over that of SRSWOR.
(20)
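The two allocations in this question differ only in the weights used to split the total sample: proportional allocation uses N_i, while Neyman allocation uses N_i S_i. The sketch below computes both; it assumes the third column of the table holds the stratum mean squares S_i², and the dictionary names are illustrative.

```python
from math import sqrt

n = 100                           # total sample size
N = {"A": 400, "B": 200}          # stratum (village) sizes
S2 = {"A": 20.0, "B": 80.0}       # assumed to be the mean squares S_i^2

N_total = sum(N.values())

# Proportional allocation: n_i = n * N_i / N.
proportional = {k: n * N[k] / N_total for k in N}

# Neyman allocation: n_i = n * N_i * S_i / sum of N_j * S_j.
weights = {k: N[k] * sqrt(S2[k]) for k in N}
neyman = {k: n * weights[k] / sum(weights.values()) for k in N}

print(proportional, neyman)
```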
8. A manufacturer wishes to determine the effectiveness of four types of machines (A, B, C
and D) in the production of bolts. To accomplish this, the numbers of defective bolts
produced in each of two shifts were recorded; the results are shown in the following table:
Machine   First Shift             Second Shift
          M   T   W   Th  F      M   T   W   Th  F
A         6   4   5   5   4      5   7   4   6   8
B         10  8   7   7   9      7   9   12  8   8
C         7   5   6   5   9      9   7   5   4   6
D         8   4   6   5   5      5   7   9   7   10
1. State whether the following statements are True or False. Give reason in support of your
answer. (2×5=10)
a) If the average number of defects in an item is 4, the upper control limit of the c-chart will be
12.
b) The specification limits and natural tolerance limits are same in statistical quality control.
c) If the probability of making a decision about acceptance or rejection of a lot on the first
sample is 0.80 and the sizes of the first and second samples are 10 and 15, respectively, then
the average sample number for the double sampling plan will be 25.
d) Two independent components of a system are connected in series configuration. If the
reliabilities of these components are 0.1 and 0.30, respectively then the reliability of the
system will be 0.65.
e) A point in the pictorial representation of a decision tree having states of nature as immediate
sub-branches is known as decision point.
3. A shirt manufacturing company supplies shirts in lots of size 250 to the buyer. A single
sampling plan with n = 20 and c = 1 is being used for the lot inspection. The company and
the buyer decide that AQL = 0.04 and LTPD = 0.10. If there are 15 defective in each lot,
compute the
i) probability of accepting the lot. (2)
ii) producer’s risk and consumer’s risk. (4)
iii) average outgoing quality (AOQ), if the rejected lots are screened and all defective shirts
are replaced by non-defectives. (2)
iv) average total inspection (ATI). (2)
Calculate, the
5. Solve the two-person zero-sum game having the following payoff matrix for player A: (10)
                 Player B
              B1   B2   B3   B4   B5
Player A  A1   3    4    5   –2    3
          A2   1    6   –3    3    7
6. The system shown below is made up of ten components. Components 3, 4 and 5 are not
identical and at least one component of this group must be available for system success.
Components 8, 9 and 10 are identical and for this particular group it is necessary that two out
of the three components function.
What is the system reliability if R1 = R3 = R5 = R7 = R9 = 0.85 and
R2 = R4 = R6 = R8 = R10 = 0.95? (10)
7. A small electronic device is designed to emit a timing signal of 200 milliseconds (ms)
duration. In the production of this device, 10 subgroups of four units are taken at periodic
intervals and tested. The results are shown in the following table:
Subgroup    Duration of Automatic Signal (in ms)
Number      a     b     c     d
1 195 201 194 201
2 204 190 199 195
3 195 197 205 201
4 211 198 193 180
5 204 193 197 200
6 200 202 195 200
7 196 198 197 196
8 201 197 206 207
9 200 202 204 192
10 203 201 209 192
Estimate, the
i) reliability. (5)
ii) cumulative failure distribution. (5)
iii) failure density. (5)
iv) failure rate functions. (5)
TUTOR MARKED ASSIGNMENT
How should the tasks be allocated, one to each subordinate, so as to minimise the total
man-hours? (10)
5. a) Use graphical method to minimise the time needed to process the following jobs on the
machines shown:
Job 1: Sequence A B C D E
Time 3 4 2 6 2
Job 2: Sequence B C A D E
Time 5 4 3 2 6
Calculate the total time elapsed to complete both the jobs. (4)
b) The following data comprise the number of customers (in hundred) and monthly sales
(in thousand Rupees):
Number of Customers (in hundred): 4 6 6 8 10 14 18 20 22 26 28 30
Monthly Sales (in thousand Rs): 1.8 3.5 5.8 7.8 8.7 9.8 10.7 11.5 12.9 13.6 14.2 15
Calculate the residuals and determine the standardised residuals for the model
Y = 2.6185 + 0.4369 X (6)
6. a) A statistician collected data of 78 values with two independent variables X1 and X2,
and considered the four models:
(i) Y = B0 + e; (ii) Y = B0 + B1 X1 + e; (iii) Y = B0 + B2 X2 + e and
(iv) Y = B0 + B1 X1 + B2 X2 + e.
The results obtained are: $\hat{\sigma}^{2} = 0.91$, SS(B0) = 652.42, SS(B0, B1) = 679.34,
SS(B0, B2) = 654.00, and SS(B0, B1, B2) = 687.79. Find the additional contribution
of (i) X2 over X1 and (ii) X1 over X2. Test whether their inclusion in the model is
justified. (5)
b) Fifteen successive observations on a stationary time series are as follows:
34 24 23 31 38 34 35 31 29 28 25 27
32 33 30
Calculate r6, r7, r8 and r9 and plot the correlogram. (5)
7. Calculate seasonal indices by the ratio to moving average method from the following data:
Year 2001 2002 2003 2004
Quarter
Q1 750 860 900 1000
Q2 600 650 720 780
Q3 540 630 660 720
Q4 590 800 850 930
(20)
8. Consider the following Transportation problem:
Factory Godowns Stock
1 2 3 4 5 6 Available
A 7 5 7 7 5 3 60
B 9 11 6 11 - 5 20
C 11 10 6 2 2 8 90
D 9 10 9 6 9 12 50
Demand 60 20 40 20 40 40
3. State whether the following statements are True or False. Give reason in support of your
answer: (2×5=10)
(a) Suppose A is the exposure and B is a confounding factor for outcome C, then there will
be a path from A to C via B.
(b) Doing exercise may also be a regimen.
(c) In clinical trials, a control only may be: treatment or no treatment.
(d) In Greville’s method, the central death rate is more in the life table than the population.
(e) In a slope ratio assay, both regression lines have common slope.
(b) The data on population and number of deaths for different age groups of Districts A and
B in the year 2001 were collected in the following table:
District A District B
Age Group
(Years) Population No. of Deaths Population No. of Deaths
Breed    10 μL    15 μL    20 μL    5 μL    10 μL    15 μL
1 25 42 55 20 43 64
2 23 47 52 23 42 66
3 22 38 58 24 44 67
5.(a) If D+ and D− denote presence and absence, respectively, of a disease and T+ and T−
denote test result as positive and negative, respectively, then on the basis of the following
information:
         D+      D−       Total
T+       145     2000     2145
T−       15      48000    48015
Total    160     50000    50160
Find: (i) sensitivity (ii) specificity (iii) positive and negative predictive values.
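The four measures asked for are simple ratios of the cells of the 2 × 2 table. The sketch below reads the table as rows T+ and T− against columns D+ and D−; the variable names (`tp`, `fp`, `fn`, `tn`) are illustrative labels for true/false positives and negatives.

```python
# Cells of the 2x2 table (rows: test result, columns: disease status).
tp, fp = 145, 2000     # T+ row: diseased, non-diseased
fn, tn = 15, 48000     # T- row: diseased, non-diseased

sensitivity = tp / (tp + fn)   # P(T+ | D+)
specificity = tn / (tn + fp)   # P(T- | D-)
ppv = tp / (tp + fp)           # positive predictive value, P(D+ | T+)
npv = tn / (tn + fn)           # negative predictive value, P(D- | T-)

print(sensitivity, specificity, ppv, npv)
```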
6.(a) Explain design and analysis of data of case control study in detail.
(b) Creatinine excretion is a parameter of kidney function. Generally speaking, lower values
indicate better health. This depends on body weight. A researcher conducted a study on
creatinine excretion in test group and control group to find the efficacy of a new drug.
The subjects were randomly divided. He included 100 subjects in each group but for this
exercise consider only 10 subjects in each group. The data obtained on creatinine level in
these 10 subjects are as follows:
Test group: 16.6 19.8 17.1 17.0 15.6 20.3 24.7 18.5 17.6 22.0
Control group: 23.2 22.0 21.9 16.7 14.2 23.2 24.8 25.5 28.1 21.8
Do you think that creatinine excretion was really lower in the test group on average?
(15+5)
7. Suppose you try a regimen A on 1000 subjects and regimen B on 1600 subjects. Results of
the trial show that efficacies of regimen A and B are 76% and 82% respectively. Suppose
the doctor determines 4% as the superiority margin. Can you consider regimen B as superior to
regimen A?
(10)
TUTOR MARKED ASSIGNMENT
MSTE-004: Biostatistics-II
Course Code: MSTE-004
Assignment Code: MSTE-004/TMA/2024
Maximum Marks: 100
Note: All questions are compulsory. Answer in your own words.
1. State whether the following statements are True or False. Give reason in support of your
answer: (2×5=10)
(a) The value of sensitivity of the following results of a diagnostic test is 0.85.
Disease    Test result        Total
           +        –
Present    170      30        200
Absent     20       280       300
(b) For the following cohort study, the relative risk for the lung cancer among smokers is 3.5.
               Lung Cancer   No Lung Cancer   Total
Smokers        100           1220             1320
Non-smokers    50            2260             2310
(d) We define three indicator/dummy variables for a regressor variable with three
categories.
(e) Left censoring occurs whenever the exact time of occurrence of an event is not
known.
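Statements (a) and (b) of this question quote numerical values that can be checked directly from the two tables. The sketch below assumes the first table has disease status in rows and test result in columns, and that relative risk means the risk among smokers divided by the risk among non-smokers.

```python
# Statement (a): diagnostic table, row "Present" = (test +, test -).
present_pos, present_neg = 170, 30
sensitivity = present_pos / (present_pos + present_neg)

# Statement (b): cohort table, lung cancer cases and row totals.
smokers_cancer, smokers_total = 100, 1320
nonsmokers_cancer, nonsmokers_total = 50, 2310

risk_smokers = smokers_cancer / smokers_total
risk_nonsmokers = nonsmokers_cancer / nonsmokers_total
relative_risk = risk_smokers / risk_nonsmokers

print(round(sensitivity, 2), round(relative_risk, 2))
```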
3. A random sample of 250 patients was selected and their workout timing and diabetes status
were recorded. The following table shows the workout timing and severity of diabetes:
0 to 15 06 27 19
15 to 30 08 36 17
30 to 45 21 45 33
≥ 45 14 18 06
Test at 5% level of significance whether workout habit and diabetes are associated with
each other or not.
(10)
4.(a) Explain the assumptions underlying multiple linear regression model.
(b) Suppose a researcher wants to evaluate the effect of cholesterol on the blood pressure.
The following data on serum cholesterol (in mg/dL) and systolic blood pressure (in
mmHg) were obtained for 15 patients to explore the relationship between cholesterol and
blood pressure:
1 300 150
2 410 270
3 380 210
4 530 310
5 570 350
6 490 310
7 340 210
8 320 150
9 280 110
10 550 320
11 340 220
12 350 170
13 410 260
14 390 230
15 450 270
(i) Fit a linear regression model using the method of least squares.
(ii) Construct the normal probability plot for the data on serum cholesterol and systolic
blood pressure.
(iii) Test the significance of the fitted regression model.
(5+15)
4. Write a short note on the following:
(i) Polytomous logistic models
(ii) Poisson regression
(iii) Kaplan and Meier method
(12)
6. The following data on diagnosis of coronary heart disease (where 0 indicates absence
and 1 indicates presence), serum cholesterol (in mg/dl), resting blood pressure (in
mmHg) and weight (in kg) were obtained for 80 patients to explore the relationship of
coronary heart disease with cholesterol and weight.
S. No.   Serum Cholesterol (mg/dl)   Weight (kg)   Number of Patients having CHD   Total Number of Patients
1        420                         60            10                              20
2        450                         68            15                              30
3        400                         54            4                               15
4        510                         74            2                               10
5        480                         62            1                               5
(i) Fit a multiple logistic model for the dependence of coronary heart disease on the average
serum cholesterol and weight considering $\hat{\beta}_{0}^{0} = 4.279$, $\hat{\beta}_{1}^{0} = -0.035$ and $\hat{\beta}_{2}^{0} = 0.172$ as
the initial values of the parameters (solve only for one Iteration).
(ii) Test the significance of the fitted model using Hosmer-Lemeshow test at 5% level of
significance.
(12+8)
7.(a) Describe censoring and differentiate between different types of censoring with the help of
examples which are not considered in Block 4 of MSTE-004.
(b) A study was conducted on 185 patients aged more than 45 years who were followed until
the time of death or up to 10 years, whichever comes first. The patients have different
covariates: age, gender (male/female), systolic blood pressure, smoking (yes/no), total
serum cholesterol and diabetes (yes/no). The objective of this study is to determine which
covariate influences the survival time. An analysis is conducted to investigate differences
in all-cause mortality between men and women participating in the study. Suppose we
obtain the following results after applying the Cox regression hazard model analyses:
Risk Factor Parameter Estimate SE