
Homework 7

The document contains 5 homework questions about analyzing traffic accident data, mining association rules from datasets, determining subsequences based on timing constraints, and joining pairs of graphs. The questions involve tasks like binarizing data, computing rule support and confidence, finding rules that satisfy minimum thresholds, determining if sequences are subsequences, and drawing candidate subgraphs obtained by joining graph pairs.

Uploaded by

Ragini P


Answer the following questions (10 points each):

1. Consider the traffic accident data set shown in the table below.


Traffic accident data set.

Weather    Driver’s          Traffic                 Seat  Crash
Condition  Condition         Violation               Belt  Severity
---------  ----------------  ----------------------  ----  --------
Good       Alcohol-impaired  Exceed speed limit      No    Major
Bad        Sober             None                    Yes   Minor
Good       Sober             Disobey stop sign       No    Minor
Bad        Alcohol-impaired  Exceed speed limit      Yes   Major
Bad        Alcohol-impaired  Disobey traffic signal  No    Major
Bad        Alcohol-impaired  Disobey stop sign       Yes   Minor
Bad        Alcohol-impaired  None                    Yes   Major
Good       Sober             Disobey traffic signal  Yes   Minor
Good       Alcohol-impaired  None                    No    Minor
Bad        Sober             None                    Yes   Major
Good       Alcohol-impaired  Exceed speed limit      Yes   Major
Bad        Sober             Disobey stop sign       Yes   Minor

a. Show a binarized version of the data set.


Answer:

b. What is the maximum width of each transaction in the binarized data?


Answer:

c. Assuming that the support threshold is 30%, how many candidate and frequent itemsets
will be generated?
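
As a starting point for parts (a) and (b), here is a minimal Python sketch of one possible binarization. It assumes one binary item per attribute-value pair (for example "Weather=Good" and "Weather=Bad" as separate items); other encodings, such as a single asymmetric binary column per two-valued attribute, are equally valid and would give different widths. The names COLUMNS, ROWS, binarize, and support are illustrative and do not come from the assignment.

```python
# Shortened column names are an assumption made for readability.
COLUMNS = ("Weather", "Driver", "Violation", "SeatBelt", "Severity")
ROWS = [
    ("Good", "Alcohol-impaired", "Exceed speed limit", "No", "Major"),
    ("Bad", "Sober", "None", "Yes", "Minor"),
    ("Good", "Sober", "Disobey stop sign", "No", "Minor"),
    ("Bad", "Alcohol-impaired", "Exceed speed limit", "Yes", "Major"),
    ("Bad", "Alcohol-impaired", "Disobey traffic signal", "No", "Major"),
    ("Bad", "Alcohol-impaired", "Disobey stop sign", "Yes", "Minor"),
    ("Bad", "Alcohol-impaired", "None", "Yes", "Major"),
    ("Good", "Sober", "Disobey traffic signal", "Yes", "Minor"),
    ("Good", "Alcohol-impaired", "None", "No", "Minor"),
    ("Bad", "Sober", "None", "Yes", "Major"),
    ("Good", "Alcohol-impaired", "Exceed speed limit", "Yes", "Major"),
    ("Bad", "Sober", "Disobey stop sign", "Yes", "Minor"),
]


def binarize(rows, columns):
    """Return (items, matrix) with one 0/1 column per attribute=value pair."""
    transactions = [{f"{c}={v}" for c, v in zip(columns, row)} for row in rows]
    items = sorted(set().union(*transactions))
    matrix = [[int(item in t) for item in items] for t in transactions]
    return items, matrix


def support(matrix, items, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    idx = [items.index(i) for i in itemset]
    hits = sum(all(row[j] for j in idx) for row in matrix)
    return hits / len(matrix)


items, matrix = binarize(ROWS, COLUMNS)
print(len(items), "binary items")
print("transaction widths:", [sum(row) for row in matrix])
```

The same support helper can be applied to candidate itemsets when working through part (c).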

2. Consider the data set shown in the table below. The first attribute is continuous, while the
remaining two attributes are asymmetric binary. A rule is considered to be strong if its
support exceeds 15% and its confidence exceeds 60%. The data in the table below
support the following two strong rules:
(i) {(1 ≤ A ≤ 2), B = 1} → {C = 1}
(ii) {(5 ≤ A ≤ 8), B = 1} → {C = 1}

A B C
1 1 1
2 1 1
3 1 0
4 1 0
5 1 1
6 0 1
7 0 0
8 1 1
9 0 0
10 0 0
11 0 0
12 0 1

a. Compute the support and confidence for both rules.


Answer:
S ({(1 ≤ A ≤ 2), B = 1} → {C = 1}) =
C ({(1 ≤ A ≤ 2), B = 1} → {C = 1}) =
S ({(5 ≤ A ≤ 8), B = 1} → {C = 1}) =
C ({(5 ≤ A ≤ 8), B = 1} → {C = 1}) =
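
The support and confidence of a rule of this shape can be checked with a short Python sketch. It uses the usual definitions: support is the fraction of all rows that satisfy both sides of the rule, and confidence is the fraction of rows satisfying the left-hand side that also satisfy the right-hand side. The names ROWS and rule_stats are illustrative only.

```python
from fractions import Fraction

# (A, B, C) rows copied from the table in question 2.
ROWS = [
    (1, 1, 1), (2, 1, 1), (3, 1, 0), (4, 1, 0),
    (5, 1, 1), (6, 0, 1), (7, 0, 0), (8, 1, 1),
    (9, 0, 0), (10, 0, 0), (11, 0, 0), (12, 0, 1),
]


def rule_stats(rows, lo, hi):
    """Support and confidence of {lo <= A <= hi, B = 1} -> {C = 1}."""
    antecedent = [r for r in rows if lo <= r[0] <= hi and r[1] == 1]
    both = [r for r in antecedent if r[2] == 1]
    support = Fraction(len(both), len(rows))
    # Confidence is undefined when no row matches the antecedent.
    confidence = Fraction(len(both), len(antecedent)) if antecedent else None
    return support, confidence


for lo, hi in [(1, 2), (5, 8)]:
    s, c = rule_stats(ROWS, lo, hi)
    print(f"{lo} <= A <= {hi}: support = {s}, confidence = {c}")
```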

3. Consider the data set shown in the table below. Suppose we are interested in extracting the
following association rule:

{α1 ≤ Age ≤ α2, Play Piano = Yes} → {Enjoy Classical Music = Yes}

Age  Play Piano  Enjoy Classical Music
---  ----------  ---------------------
9    Yes         Yes
11   Yes         Yes
14   Yes         No
17   Yes         No
19   Yes         Yes
21   No          No
25   No          No
29   Yes         No
33   Yes         No
39   Yes         Yes
41   No          Yes
47   No          Yes

To handle the continuous attribute, we apply the equal-frequency approach with 3, 4, and 6
intervals. Categorical attributes are handled by introducing as many new asymmetric binary
attributes as the number of categorical values. Assume that the support threshold is 10% and
the confidence threshold is 70%.
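
The equal-frequency approach can be sketched in Python as follows. The sketch assumes the number of records is divisible by the number of intervals (true here for 3, 4, and 6 with 12 records) and that each interval is reported as its smallest and largest Age value. The names DATA, equal_frequency_bins, and rule_stats are illustrative only. Adjacent intervals can also be merged by passing the lower bound of one and the upper bound of another to rule_stats.

```python
from fractions import Fraction

# (age, plays_piano, enjoys_classical_music) rows copied from the table.
DATA = [
    (9, True, True), (11, True, True), (14, True, False), (17, True, False),
    (19, True, True), (21, False, False), (25, False, False), (29, True, False),
    (33, True, False), (39, True, True), (41, False, True), (47, False, True),
]


def equal_frequency_bins(ages, k):
    """Split sorted ages into k bins of equal size; return (min, max) per bin."""
    ages = sorted(ages)
    size = len(ages) // k  # assumes len(ages) is divisible by k
    return [(ages[i * size], ages[(i + 1) * size - 1]) for i in range(k)]


def rule_stats(data, lo, hi):
    """Support and confidence of {lo <= Age <= hi, Piano = Yes} -> {Enjoy = Yes}."""
    antecedent = [r for r in data if lo <= r[0] <= hi and r[1]]
    both = [r for r in antecedent if r[2]]
    support = Fraction(len(both), len(data))
    confidence = Fraction(len(both), len(antecedent)) if antecedent else None
    return support, confidence


for k in (3, 4, 6):
    for lo, hi in equal_frequency_bins([r[0] for r in DATA], k):
        s, c = rule_stats(DATA, lo, hi)
        print(f"k={k}  {lo} <= Age <= {hi}: support = {s}, confidence = {c}")
```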

(a) Suppose we discretize the Age attribute into 3 equal-frequency intervals. Find a pair of
values for α1 and α2 that satisfy the minimum support and minimum confidence
requirements.
Answer:

(b) Repeat part (a) by discretizing the Age attribute into 4 equal-frequency intervals.
Compare the extracted rules against the ones you had obtained in part (a).
Answer:

(c) Repeat part (a) by discretizing the Age attribute into 6 equal-frequency intervals.
Compare the extracted rules against the ones you had obtained in part (a).
Answer:

4. For each of the sequences w = <e1, . . . , elast> below, determine whether it is a
subsequence of the following data sequence:
<{A, B}{C, D}{A, B}{C, D}{A, B}{C, D}>
subject to the following timing constraints:
mingap = 0 (interval between last event in ei and first event in ei+1 is > 0)
maxgap = 2 (interval between first event in ei and last event in ei+1 is ≤ 2)
maxspan = 6 (interval between first event in e1 and last event in elast is ≤ 6)
ws = 1 (time between first and last events in ei is ≤ 1)

a. w = < {A}{B}{C}{D}> Answer:


b. w = < {A} {B, C, D} {A}> Answer:
c. w = < {A} {B, C, D} {A}> Answer:
d. w = < {B, C} {A, D} {B, C}> Answer:
e. w = < {A, B, C, D} {A, B, C, D}> Answer:
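
One way to check an answer is a brute-force search, sketched below in Python. It rests on assumptions not stated in the question: the six elements of the data sequence occur at timestamps 1 through 6, and each element ei of w may be matched by the union of events in a window of consecutive timestamps whose first and last times differ by at most ws. The names DATA, windows, and is_subsequence are illustrative only.

```python
from itertools import product

# Data sequence from the question; position i is assumed to be timestamp i + 1.
DATA = [{"A", "B"}, {"C", "D"}, {"A", "B"}, {"C", "D"}, {"A", "B"}, {"C", "D"}]


def windows(data, element, ws):
    """Yield (first, last) timestamps of windows whose events cover `element`."""
    n = len(data)
    for lo in range(n):
        for hi in range(lo, min(n, lo + ws + 1)):
            covered = set().union(*data[lo:hi + 1])
            if element <= covered:
                yield lo + 1, hi + 1  # convert to 1-based timestamps


def is_subsequence(data, w, mingap, maxgap, maxspan, ws):
    """Try every combination of windows and test the timing constraints."""
    choices = [list(windows(data, e, ws)) for e in w]
    for combo in product(*choices):
        # maxspan: first event of e1 to last event of elast.
        ok = combo[-1][1] - combo[0][0] <= maxspan
        for (l1, u1), (l2, u2) in zip(combo, combo[1:]):
            # mingap: last event of ei to first event of ei+1 must exceed mingap.
            # maxgap: first event of ei to last event of ei+1 must not exceed maxgap.
            ok = ok and (l2 - u1 > mingap) and (u2 - l1 <= maxgap)
        if ok:
            return True
    return False


w = [{"A"}, {"B"}, {"C"}, {"D"}]
print(is_subsequence(DATA, w, mingap=0, maxgap=2, maxspan=6, ws=1))
```

Because the search enumerates every admissible window, it is only practical for short sequences such as these.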

5. Draw all candidate subgraphs obtained from joining the pair of graphs shown in the figure below.
Assume the edge-growing method is used to expand the subgraphs.

Answer:
