Assignment

The document provides statistical calculations including mean, median, mode, midrange, quartiles, variance, and standard deviation based on a dataset. It also discusses probabilities related to a classification problem and analyzes the results for a test sample. Additionally, it outlines the candidate generation and pruning steps of the Apriori algorithm for itemset mining.

Uploaded by

dohoangtruonghuy
Copyright
© All Rights Reserved

Nguyễn Hoàng Đức _ITITIU20190

Q1:

(a) Mean: The mean is calculated by summing all the values and dividing by the total number
of values.
Mean = (53 + 55 + 70 + 58 + 64 + 57 + 53 + 69 + 57 + 68 + 53) / 11 = 59.73

Median: To find the median, we first sort the data:
53, 53, 53, 55, 57, 57, 58, 64, 68, 69, 70. The middle value is the 6th value:
Median = 57
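Both statistics can be checked with Python's standard `statistics` module. This is a minimal sketch using the 11 values listed above:

```python
import statistics

# The 11 values from Q1
data = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]

# Mean: sum of the values divided by how many there are
mean = sum(data) / len(data)

# Median: statistics.median sorts internally and, for odd n, returns the middle value
median = statistics.median(data)

print(f"Mean = {mean:.2f}")  # Mean = 59.73
print(f"Median = {median}")  # Median = 57
```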

(b) Mode: The mode is the value that occurs most frequently. From the sorted data, 53
appears 3 times, which is more than any other value.
Mode = 53

(c) Midrange: The midrange is the average of the smallest and largest values.
Midrange = (53 + 70) / 2 = 61.5
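The mode and midrange from (b) and (c) can likewise be checked in Python. As a sketch, `statistics.multimode` (Python 3.8+) returns every value tied for the highest count, so it would also reveal a dataset with more than one mode:

```python
import statistics

data = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]

# multimode lists every value tied for the highest frequency
modes = statistics.multimode(data)

# Midrange: mean of the smallest and largest values
midrange = (min(data) + max(data)) / 2

print("Mode =", modes)         # Mode = [53]
print("Midrange =", midrange)  # Midrange = 61.5
```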

(d) First Quartile (Q1): Q1 is the 25th percentile. Interpolating linearly in the sorted data,
it lies at position 1 + 0.25 × (11 − 1) = 3.5, halfway between the 3rd value (53) and the 4th value (55):
Q1 = (53 + 55) / 2 = 54

Third Quartile (Q3): Q3 is the 75th percentile, at position 1 + 0.75 × (11 − 1) = 8.5, halfway
between the 8th value (64) and the 9th value (68):
Q3 = (64 + 68) / 2 = 66

(e) Five-Number Summary:
Min = 53, Q1 = 54, Median = 57, Q3 = 66, Max = 70
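Quartile conventions differ between textbooks and tools. As a sketch, Python's `statistics.quantiles` with `method="inclusive"` interpolates at position 1 + p × (n − 1), which is the convention that gives Q1 = 54 and Q3 = 66:

```python
import statistics

data = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]

# method="inclusive" interpolates at (1-based) position 1 + p * (n - 1) of the sorted data
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

summary = (min(data), q1, median, q3, max(data))
print("Min, Q1, Median, Q3, Max =", summary)  # Min, Q1, Median, Q3, Max = (53, 54.0, 57.0, 66.0, 70)
```

With the default `method="exclusive"` (position p × (n + 1)), the same call returns Q1 = 53 and Q3 = 68, which matches taking the median of each five-value half of the sorted data.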
(g) Variance and Standard Deviation:
Deviations from the mean (x − 59.73): [−6.73, −4.73, 10.27, −1.73, 4.27, −2.73, −6.73, 9.27, −2.73, 8.27, −6.73]
Variance = [1/(n − 1)] × Σ(deviation²) = (1/10) × 454.18 = 45.42
Standard Deviation = sqrt(45.42) = 6.74
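The arithmetic can be verified in Python. This sketch computes the sample variance with the n − 1 denominator and cross-checks it against `statistics.variance` and `statistics.stdev`, which use the same denominator:

```python
import math
import statistics

data = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]
n = len(data)
mean = sum(data) / n

# Sample variance: sum of squared deviations divided by n - 1
sum_sq_dev = sum((x - mean) ** 2 for x in data)
variance = sum_sq_dev / (n - 1)
std_dev = math.sqrt(variance)

print(f"Sum of squared deviations = {sum_sq_dev:.2f}")  # 454.18
print(f"Variance = {variance:.2f}")                      # 45.42
print(f"Standard deviation = {std_dev:.2f}")             # 6.74

# The standard library's sample statistics agree
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(std_dev, statistics.stdev(data))
```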

Q2:
Question 5:

(a) From the table:

● Total +: 5 (records 1, 5, 6, 9, 10)
● Total -: 5 (records 2, 3, 4, 7, 8)

P(A=1|+) = (count of A=1 with +) / total + = 3/5 = 0.6 => P(A=0|+) = 1 - 0.6 = 0.4

P(B=1|+) = (count of B=1 with +) / total + = 1/5 = 0.2 => P(B=0|+) = 1 - 0.2 = 0.8

P(C=1|+) = (count of C=1 with +) / total + = 4/5 = 0.8 => P(C=0|+) = 1 - 0.8 = 0.2

P(A=1|-) = (count of A=1 with -) / total - = 2/5 = 0.4 => P(A=0|-) = 1 - 0.4 = 0.6

P(B=1|-) = (count of B=1 with -) / total - = 2/5 = 0.4 => P(B=0|-) = 1 - 0.4 = 0.6

P(C=1|-) = (count of C=1 with -) / total - = 5/5 = 1 => P(C=0|-) = 1 - 1 = 0

From the table:

● There are 5 records with class "+" (records 1, 5, 6, 9, 10) and 5 records with class "-"
(records 2, 3, 4, 7, 8).

For Class "+" (Positive):

● The probability of A being 1 given class "+" (P(A=1|+)) is 3 out of 5, which is 0.6, so
P(A=0|+) = 1 - 0.6 = 0.4.
● The probability of B being 1 given class "+" (P(B=1|+)) is 1 out of 5, which is 0.2, so
P(B=0|+) = 1 - 0.2 = 0.8.
● The probability of C being 1 given class "+" (P(C=1|+)) is 4 out of 5, which is 0.8, so
P(C=0|+) = 1 - 0.8 = 0.2.

For Class "-" (Negative):

● The probability of A being 1 given class "-" (P(A=1|-)) is 2 out of 5, which is 0.4, so
P(A=0|-) = 1 - 0.4 = 0.6.
● The probability of B being 1 given class "-" (P(B=1|-)) is 2 out of 5, which is 0.4, so
P(B=0|-) = 1 - 0.4 = 0.6.
● The probability of C being 1 given class "-" (P(C=1|-)) is 5 out of 5, which is 1, so
P(C=0|-) = 1 - 1 = 0.
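These estimates can be reproduced from the counts stated above. A minimal sketch, assuming only those counts (the `class_totals`, `ones` and `cond` names are hypothetical); `Fraction` keeps each complement exact:

```python
from fractions import Fraction

# Counts stated above: records per class, and records per class with attribute = 1
class_totals = {"+": 5, "-": 5}
ones = {
    "+": {"A": 3, "B": 1, "C": 4},
    "-": {"A": 2, "B": 2, "C": 5},
}

# P(attr=1 | class) = count / class total; P(attr=0 | class) is its complement
cond = {}
for cls, counts in ones.items():
    for attr, k in counts.items():
        p_one = Fraction(k, class_totals[cls])
        cond[(attr, 1, cls)] = p_one
        cond[(attr, 0, cls)] = 1 - p_one

for (attr, value, cls), p in sorted(cond.items()):
    print(f"P({attr}={value}|{cls}) = {float(p):g}")
```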
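(b) Analysis of the probabilities for the test sample X = (A=0, B=1, C=0):

● For A=0, P(A=0|+) = 0.4, which is less than P(A=0|-) = 0.6.
● For B=1, P(B=1|+) = 0.2, which is less than P(B=1|-) = 0.4.
● For C=0, P(C=0|+) = 0.2, which is greater than P(C=0|-) = 0.

The naive Bayes classifier does not vote attribute by attribute; it multiplies these probabilities, assuming the attributes are conditionally independent given the class:

P(X|+) = P(A=0|+) × P(B=1|+) × P(C=0|+) = 0.4 × 0.2 × 0.2 = 0.016
P(X|-) = P(A=0|-) × P(B=1|-) × P(C=0|-) = 0.6 × 0.4 × 0 = 0

With equal class priors P(+) = P(-) = 5/10 = 0.5:

P(+|X) ∝ P(X|+) × P(+) = 0.016 × 0.5 = 0.008
P(-|X) ∝ P(X|-) × P(-) = 0 × 0.5 = 0

Since P(+|X) > P(-|X), the class label for the test sample (A=0, B=1, C=0) is +.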
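The naive Bayes decision can be sketched as a small scorer in Python. This is a minimal sketch assuming the conditional probabilities estimated in part (a) and equal class priors of 5/10; the `naive_bayes_scores` helper is hypothetical:

```python
# Conditional probabilities P(attr = value | class) estimated in part (a)
p_given = {
    "+": {"A": {0: 0.4, 1: 0.6}, "B": {0: 0.8, 1: 0.2}, "C": {0: 0.2, 1: 0.8}},
    "-": {"A": {0: 0.6, 1: 0.4}, "B": {0: 0.6, 1: 0.4}, "C": {0: 0.0, 1: 1.0}},
}
priors = {"+": 5 / 10, "-": 5 / 10}  # 5 records of each class out of 10


def naive_bayes_scores(sample):
    """Return P(class) * product of P(attr = value | class) for each class."""
    scores = {}
    for cls, prior in priors.items():
        likelihood = 1.0
        for attr, value in sample.items():
            likelihood *= p_given[cls][attr][value]
        scores[cls] = prior * likelihood
    return scores


test_sample = {"A": 0, "B": 1, "C": 0}
scores = naive_bayes_scores(test_sample)
prediction = max(scores, key=scores.get)
print(scores)
print("Predicted class:", prediction)  # Predicted class: +
```

Because P(C=0|-) = 0, a single zero factor forces the "-" score to zero regardless of the values of A and B.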

Q7:
Support for itemset {b, d, e}

Count the number of customers who have b, d, and e in their basket:

● Customers with {b, d, e}: 2, 5
● Number of customers with {b, d, e}: 2
● Total customers: 5

Support for {b, d, e} = 2 / 5 = 0.4
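Support is the fraction of baskets that contain every item of the itemset. A minimal Python sketch, assuming each customer's basket is represented as a set of item labels; the `support` helper and the baskets below are hypothetical and are not the assignment's data table:

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item of `itemset`."""
    target = set(itemset)
    count = sum(1 for basket in baskets if target <= set(basket))
    return count / len(baskets)


# Hypothetical baskets for illustration only
baskets = [
    {"a", "b", "d"},
    {"b", "d", "e"},
    {"a", "c", "e"},
    {"b", "c", "d", "e"},
    {"a", "e"},
]
print(support({"b", "d", "e"}, baskets))  # 0.4
```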


Q8:

(a) List all candidate 4-itemsets obtained by the candidate generation procedure in Apriori.

The candidate generation procedure in the Apriori algorithm creates candidate itemsets of
size k by merging frequent itemsets of size k − 1. To generate the candidate 4-itemsets from
the given frequent 3-itemsets, we merge pairs of frequent 3-itemsets whose first two items
(in sorted order) are identical.

Frequent 3-itemsets:

● {1, 2, 3}
● {1, 2, 4}
● {1, 2, 5}
● {1, 3, 4}
● {1, 3, 5}
● {2, 3, 4}
● {2, 3, 5}
● {3, 4, 5}

Candidate 4-itemsets:

To generate 4-itemsets, we take each pair of frequent 3-itemsets that share the same first two
items and combine them to form a 4-itemset. Here are the combinations:

1. Combine {1, 2, 3} and {1, 2, 4}: Resulting 4-itemset = {1, 2, 3, 4}
2. Combine {1, 2, 3} and {1, 2, 5}: Resulting 4-itemset = {1, 2, 3, 5}
3. Combine {1, 2, 4} and {1, 2, 5}: Resulting 4-itemset = {1, 2, 4, 5}
4. Combine {1, 3, 4} and {1, 3, 5}: Resulting 4-itemset = {1, 3, 4, 5}
5. Combine {2, 3, 4} and {2, 3, 5}: Resulting 4-itemset = {2, 3, 4, 5}

{3, 4, 5} has no partner with the prefix {3, 4}, so it is not merged. Because each candidate
comes from exactly one pair with a shared prefix, no duplicates arise.

Candidate 4-itemsets:

● {1, 2, 3, 4}
● {1, 2, 3, 5}
● {1, 2, 4, 5}
● {1, 3, 4, 5}
● {2, 3, 4, 5}

(b) List all candidate 4-itemsets that survive the candidate pruning step of the Apriori algorithm.

In the candidate pruning step, any candidate itemset that contains an infrequent subset
is discarded. To determine which candidate 4-itemsets survive, we check whether every
subset of size 3 of each 4-itemset appears among the given frequent 3-itemsets.

Frequent 3-itemsets:

● {1, 2, 3}
● {1, 2, 4}
● {1, 2, 5}
● {1, 3, 4}
● {1, 3, 5}
● {2, 3, 4}
● {2, 3, 5}
● {3, 4, 5}

Check each candidate 4-itemset:

1. {1, 2, 3, 4}:
○ Subsets: {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}
○ All subsets are frequent.
○ This itemset survives.
2. {1, 2, 3, 5}:
○ Subsets: {1, 2, 3}, {1, 2, 5}, {1, 3, 5}, {2, 3, 5}
○ All subsets are frequent.
○ This itemset survives.
3. {1, 2, 4, 5}:
○ Subsets: {1, 2, 4}, {1, 2, 5}, {1, 4, 5}, {2, 4, 5}
○ {1, 4, 5} and {2, 4, 5} are not frequent.
○ This itemset is pruned.
4. {1, 3, 4, 5}:
○ Subsets: {1, 3, 4}, {1, 3, 5}, {1, 4, 5}, {3, 4, 5}
○ {1, 4, 5} is not frequent.
○ This itemset is pruned.
5. {2, 3, 4, 5}:
○ Subsets: {2, 3, 4}, {2, 3, 5}, {2, 4, 5}, {3, 4, 5}
○ {2, 4, 5} is not frequent.
○ This itemset is pruned.

Surviving 4-itemsets:

● {1, 2, 3, 4}
● {1, 2, 3, 5}

Final Answer:

● Candidate 4-itemsets (from part a):
○ {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}, {1, 3, 4, 5}, {2, 3, 4, 5}
● 4-itemsets after pruning (from part b):
○ {1, 2, 3, 4}, {1, 2, 3, 5}
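The join and prune steps can be sketched in Python as a minimal illustration, assuming each itemset is stored as a sorted tuple of integers. The helper names `generate_candidates` and `prune` are hypothetical, not part of any library: the join merges two (k − 1)-itemsets only when their first k − 2 items match, and the prune keeps a candidate only if all of its (k − 1)-subsets are frequent.

```python
from itertools import combinations

# Frequent 3-itemsets from Q8, each stored as a sorted tuple
frequent_3 = [
    (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4),
    (1, 3, 5), (2, 3, 4), (2, 3, 5), (3, 4, 5),
]


def generate_candidates(frequent_prev):
    """Join step: merge two (k-1)-itemsets whose first k-2 items are identical."""
    candidates = []
    for a, b in combinations(sorted(frequent_prev), 2):
        if a[:-1] == b[:-1]:
            # a < b and the prefixes match, so a[-1] < b[-1] and the result stays sorted
            candidates.append(a + (b[-1],))
    return candidates


def prune(candidates, frequent_prev):
    """Prune step: keep a candidate only if every (k-1)-subset is frequent."""
    frequent_set = set(frequent_prev)
    return [
        c for c in candidates
        if all(sub in frequent_set for sub in combinations(c, len(c) - 1))
    ]


candidates_4 = generate_candidates(frequent_3)
survivors_4 = prune(candidates_4, frequent_3)
print("Candidates:", candidates_4)
# Candidates: [(1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 4, 5), (1, 3, 4, 5), (2, 3, 4, 5)]
print("After pruning:", survivors_4)
# After pruning: [(1, 2, 3, 4), (1, 2, 3, 5)]
```

Requiring a shared prefix, rather than any two shared items, means each candidate is produced by exactly one pair, so no duplicate removal is needed.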
