QT 1 - Group 5 - R Assignment 1
Group 5
Task 1
Result:
Group size (n)    Expected number of tests, E(X)
 1                10000.000
 2                 5199.000
 3                 3630.980
 4                 2894.040
 5                 2490.100
 6                 2254.964
 7                 2111.075
 8                 2022.553
 9                 1976.741
10                 1956.179
11                 1956.513
12                 1972.697
13                 1996.422
14                 2030.017
15                 2074.017
16                 2110.422
17                 2161.940
18                 2218.208
19                 2269.271
20                 2320.931
The above data can be better understood by looking at the graph attached below:
Task 2
The function “optimal_pool_size_row_pool” iterates over all possible
pool sizes and returns the one that minimizes the expected number of
tests.
It uses a while loop over pool sizes from 2 to 10000. Whenever the
expected test count for a pool size is lower than the stored minimum, we
update the stored minimum and record that pool size as the optimal
value.
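The search described above can be sketched as follows. This is an illustrative Python sketch, not the group's R code. The helper name expected_tests_row_pool is hypothetical, and two details are assumptions chosen because they reproduce the Task 1 table: samples left over after forming full pools are tested individually, and a pool size of 1 is treated as plain individual testing. The stored minimum is initialised with the individual-testing count, which is also an assumption.

```python
def expected_tests_row_pool(n, N=10000, p=0.01):
    """Expected total number of tests when N samples are pooled in groups of n."""
    if n == 1:
        # Assumption: a pool of one sample is plain individual testing.
        return float(N)
    groups = N // n
    leftover = N - groups * n  # assumption: leftover samples are tested one by one
    p_all_negative = (1 - p) ** n
    # 1 test if the pool is negative, n + 1 tests if it is positive.
    per_group = 1 * p_all_negative + (n + 1) * (1 - p_all_negative)
    return groups * per_group + leftover


def optimal_pool_size_row_pool(N=10000, p=0.01):
    """Return (best pool size, expected tests), searching pool sizes 2..N."""
    best_n = 1
    best_tests = expected_tests_row_pool(1, N, p)  # stored minimum to beat
    n = 2
    while n <= N:
        tests = expected_tests_row_pool(n, N, p)
        if tests < best_tests:
            best_n, best_tests = n, tests
        n += 1
    return best_n, best_tests


if __name__ == "__main__":
    print(optimal_pool_size_row_pool())
```

With the defaults N = 10000 and p = 0.01, the values of expected_tests_row_pool(n) can be compared row by row with the Task 1 table, whose minimum lies at a group size of 10.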
Results:
Task 3
CALCULATING THE CUT-OFF VALUE OF ‘p’ FOR ANY GIVEN VALUE OF ‘n’
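Number of tests when every sample in the group is negative = 1
Number of tests when the group has at least one positive = n+1
Expected number of tests for a group, E = 1*(1-p)^n + (n+1)*(1-(1-p)^n) = n+1-n*(1-p)^n
At the cut-off value of p, the expected number of tests for a group is
greater than or equal to the number of individuals in the group, so
pooling no longer saves any tests.
According to the previous statement, E>=n
=> n+1-n*(1-p)^n>=n
=> n+1-n*(1-p)^n-n>=n-n
=> 1>=n*(1-p)^n
=> (1/n)^(1/n)>=1-p
=> p>=1-(1/n)^(1/n)
We use the above formula to find the cut-off value of p for a given value
of n.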
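As a quick numerical check of the cut-off formula, the sketch below evaluates p = 1 - (1/n)^(1/n) and confirms that, at this p, the expected number of tests per group equals n. It is illustrative Python rather than the group's R code, and the function names cutoff_p and expected_tests_per_group are hypothetical.

```python
def cutoff_p(n):
    """Cut-off prevalence for pool size n: p = 1 - (1/n)^(1/n)."""
    return 1 - (1 / n) ** (1 / n)


def expected_tests_per_group(n, p):
    """E = n + 1 - n * (1 - p)^n for one pool of n samples."""
    return (n + 1) - n * (1 - p) ** n


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        p_cut = cutoff_p(n)
        # At the cut-off, pooling costs as many tests as testing individually.
        print(n, p_cut, expected_tests_per_group(n, p_cut))
```

For any prevalence above cutoff_p(n), the expected number of tests per group exceeds n, so testing each sample individually is at least as cheap.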
Task 4
We take n*n samples from the N samples at a time. Each such set of n*n samples is
referred to as a group and is arranged as a square with n rows and n columns.
Here N=10000 and p=0.01, the prevalence rate of the disease.
Number of groups (NG) = floor(N/n^2)
Number of individual tests (N_IND) = N - NG*n^2, the samples which are not a part of
any group
All samples of a row are tested together. Similarly, all samples of a column are
tested together. If both a row and a column test positive, the sample at their
intersection is tested individually.
FOR A GROUP:
P(row negative) = (1-p)^n
P(column negative) = (1-p)^n
P(both row and column negative) = (1-p)^(2*n-1)
P(row or column negative) = 2*(1-p)^n - (1-p)^(2*n-1)
P(row and column positive) = 1 - P(row or column negative)
Number(row and column tests) = 2*n
E(number of intersection tests) = n*n*P(row and column positive)
E(number of tests per group) = 2*n + n*n*P(row and column positive)
FOR THE WHOLE SAMPLE:
E(number of tests) = E(number of tests per group)*NG + N_IND
We loop over n from 1 to 30 and compute the expected number of tests for each
value of n. The value of n which gives the minimum expected number of tests
is taken as our best pool size.
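The per-group formulas and the loop over n can be sketched as below. This is illustrative Python, not the group's R code, and the function name expected_tests_cross_pool is hypothetical. Treating n = 1 as plain individual testing is an assumption, made because it matches the first row of the results table.

```python
def expected_tests_cross_pool(n, N=10000, p=0.01):
    """Expected total tests when samples are arranged in n x n squares."""
    if n == 1:
        # Assumption: a 1 x 1 square is plain individual testing.
        return float(N)
    ng = N // (n * n)            # number of full n x n groups
    n_ind = N - ng * n * n       # samples outside any group, tested individually
    p_row_neg = (1 - p) ** n
    p_both_neg = (1 - p) ** (2 * n - 1)
    p_row_or_col_neg = 2 * p_row_neg - p_both_neg
    p_row_and_col_pos = 1 - p_row_or_col_neg
    # 2n row and column tests plus the expected intersection retests.
    per_group = 2 * n + n * n * p_row_and_col_pos
    return per_group * ng + n_ind


# Expected tests for n = 1..30 and the square size with the smallest value.
results = {n: expected_tests_cross_pool(n) for n in range(1, 31)}
best_n = min(results, key=results.get)

if __name__ == "__main__":
    print(best_n, results[best_n])
```

The dictionary results can be compared entry by entry with the results table for this task; best_n is the key with the smallest expected number of tests.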
Results
Group size (n)    Expected number of tests, E(X)
 1                10000
 2                10100.99
 3                 6770.91
 4                 5108.733
 5                 4115.371
 6                 3475.433
 7                 2993.85
 8                 2657.457
 9                 2409.498
10                 2174.045
11                 2071.028
12                 1927.111
13                 1790.133
14                 1680.411
15                 1687.848
16                 1557.408
17                 1642.901
18                 1694.564
19                 1640.729
20                 1399.152
21                 1637.501
22                 1643.745
23                 1772.168
24                 1534.84
25                 1354.745
26                 1821.143
27                 1815.904
28                 1884.139
29                 2030.509
30                 1485.499
Also attaching the graph for better understanding:
Task 5
● The goal is to find the optimal square size for a given prevalence rate p.
● The function “optimal_pool_size_cross_pool” iterates over possible
square sizes and returns the one that minimizes the expected number of
tests.
● It builds on the previous task. The only change is that the optimum
is searched for in the range 1 to 100 (the floor of the square root
of N) instead of the range 1 to 30.
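A minimal sketch of this search is given below, assuming the same per-group formulas as in Task 4. It is illustrative Python rather than the group's R code. The helper expected_tests_cross_pool is a hypothetical name, and treating a square size of 1 as plain individual testing is an assumption.

```python
import math


def expected_tests_cross_pool(n, N, p):
    """Expected total tests for n x n squares (formulas from Task 4)."""
    if n == 1:
        # Assumption: a 1 x 1 square is plain individual testing.
        return float(N)
    ng = N // (n * n)
    n_ind = N - ng * n * n
    p_row_or_col_neg = 2 * (1 - p) ** n - (1 - p) ** (2 * n - 1)
    per_group = 2 * n + n * n * (1 - p_row_or_col_neg)
    return per_group * ng + n_ind


def optimal_pool_size_cross_pool(p, N=10000):
    """Return (best square size, expected tests) over 1..floor(sqrt(N))."""
    best_n = 1
    best_tests = expected_tests_cross_pool(1, N, p)
    for n in range(2, math.isqrt(N) + 1):
        tests = expected_tests_cross_pool(n, N, p)
        if tests < best_tests:
            best_n, best_tests = n, tests
    return best_n, best_tests


if __name__ == "__main__":
    print(optimal_pool_size_cross_pool(0.01))
```

The upper limit math.isqrt(N) is 100 for N = 10000, since a square larger than 100 x 100 would need more samples than are available.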
Results:
Task 6
From this table, we observe that the Acbott Test is conducted on 2000 individuals
and the truth table looks as follows.
                        Actually positive    Actually negative
Individuals tested      1000                 1000
Correctly identified    990                  950
PROBABILITY_TRUE_POSITIVE     p1=0.99    PROBABILITY_TRUE_NEGATIVE     p3=0.95
PROBABILITY_FALSE_NEGATIVE    p2=0.01    PROBABILITY_FALSE_POSITIVE    p4=0.05
We have assumed that whenever a row tests positive, we re-test all the
samples of the row.
With this information, we approach the problem in a way similar to the first
task. Every time a pool of samples tests positive, all the individuals of the pool
are retested individually.
E(number of tests per group) = 1*(1-p_pos)^n + (n+1)*(1-(1-p_pos)^n)
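The per-group formula can be evaluated as in the sketch below. It is illustrative Python rather than the group's R code, and the function name is hypothetical. How p_pos is obtained from p1 to p4 is not spelled out in this section, so the sketch simply takes p_pos as an input.

```python
def expected_tests_per_group_imperfect(n, p_pos):
    """E(number of tests per group) for pool size n and a given p_pos."""
    p_pool_negative = (1 - p_pos) ** n
    # 1 test if the pool tests negative, n + 1 tests if it tests positive.
    return 1 * p_pool_negative + (n + 1) * (1 - p_pool_negative)


if __name__ == "__main__":
    # p_pos = 0.05 is a placeholder value for illustration only.
    print(expected_tests_per_group_imperfect(10, 0.05))
```

When p_pos is 0 the function returns 1 (only the pooled test is run), and when p_pos is 1 it returns n + 1 (the pooled test plus n retests).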