Advanced Database
ANALYSIS ON
AUTOMOBILE DATASET
TEAM MEMBERS:
Anusha Vadlamudi Narasimha Rao
50134597
Abstract
INTRODUCTION
This Auto dataset records, for each car model,
its mpg (miles per gallon), cylinders,
displacement, horsepower, weight,
acceleration, model year, and origin.
DATA OF AUTO :
ATTRIBUTES DESCRIPTION
MPG
CYLINDERS
DISPLACEMENT
HORSEPOWER
WEIGHT
ACCELERATION
YEAR
ORIGIN
NAME
APRIORI ALGORITHM
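The Apriori algorithm finds frequent itemsets level by level:
candidates of size k are built from the frequent itemsets of
size k-1 and are kept only if their support reaches the
minimum support.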
PSEUDO CODE
L1 = {large 1-itemsets};
for (k = 2; Lk-1 ≠ ∅; k++) do
begin
    Ck = apriori-gen(Lk-1); // new candidates
    forall transactions t ∈ D do
    begin
        Ct = subset(Ck, t); // candidates contained in t
        forall candidates c ∈ Ct do
            c.count++;
    end
    Lk = {c ∈ Ck | c.count ≥ minsup}
end
Answer = ∪k Lk;
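The pseudo code above can be sketched in Python as follows. This is a minimal illustration, not the project's implementation: the function name apriori_sketch, the use of an absolute count for minsup, and the toy transactions are all assumptions made for the example.

```python
from itertools import combinations


def apriori_sketch(transactions, minsup):
    """Return {itemset: count} for every itemset seen in >= minsup transactions."""
    transactions = [frozenset(t) for t in transactions]

    # L1: count single items and keep the large ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {c: n for c, n in counts.items() if n >= minsup}
    answer = dict(current)

    k = 2
    while current:
        # apriori-gen: join L(k-1) with itself, then drop any candidate
        # that has a (k-1)-subset which is not large.
        prev = set(current)
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(s) in prev for s in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Count every candidate over all transactions.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        current = {c: n for c, n in counts.items() if n >= minsup}
        answer.update(current)
        k += 1
    return answer


# Toy transactions (made up for illustration, not taken from the Auto data).
toy = [
    ["cyl=8", "origin=1"],
    ["cyl=8", "origin=1"],
    ["cyl=4", "origin=3"],
    ["cyl=8", "origin=1", "mpg=14"],
]
result = apriori_sketch(toy, minsup=2)
```

With minsup=2 the toy input keeps the two single items cyl=8 and origin=1 and their pair; every other item occurs only once and is discarded at the first level.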
DATA CLEANING
Unclean data refers to data that contains
erroneous information.
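One simple way to clean such data is to drop every row that has a missing field. The sketch below assumes that missing values appear as an empty string, "?" or "NA"; those markers, the function name clean_rows, and the sample rows are assumptions for illustration, not a description of the cleaning actually applied to the Auto file.

```python
import csv
import io

MISSING = {"", "?", "NA"}  # assumed markers for a missing value


def clean_rows(rows):
    """Strip whitespace and keep only rows in which every field is present."""
    cleaned = []
    for row in rows:
        fields = [field.strip() for field in row]
        # Skip blank lines and rows containing a missing-value marker.
        if fields and not any(f in MISSING for f in fields):
            cleaned.append(fields)
    return cleaned


# Made-up sample: one complete row, one row with "?", one blank line,
# and one row with stray spaces.
raw = io.StringIO("18,8,130,1\n25,4,?,3\n\n14, 8 ,150,1\n")
rows = clean_rows(csv.reader(raw))
```

Only the first and last sample rows survive, and the stray spaces around "8" are removed.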
UNCLEAN DATA
        # Print each frequent itemset together with its support.
        print("{" + ", ".join(str(i) for i in item) + "}"
              + ": supp = " + str(round(supporting_data[item], 3)))
    return F, supporting_data
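def create_candidate_keys(data, verbose=False):
    # Collect every distinct item as a one-element candidate itemset.
    can_keys = []
    for transac in data:
        for item in transac:
            if [item] not in can_keys:
                can_keys.append([item])
    can_keys.sort()
    # list(...) so the candidates can be iterated more than once in Python 3.
    return list(map(frozenset, can_keys))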
def back_prune(data, candidates, min_support, verbose=False):
    # Count how many transactions contain each candidate itemset.
    sscount = {}
    for tid in data:
        for candidate in candidates:
            if candidate.issubset(tid):
                sscount.setdefault(candidate, 0)
                sscount[candidate] += 1
    num_items = float(len(data))
    ret_list = []
    supporting_data = {}
            a.sort()
            b.sort()
            # Join two itemsets only when their first key-2 sorted items match.
            F1 = a[:key-2]
            F2 = b[:key-2]
            if F1 == F2:
                returnList.append(frequency_sets[i] | frequency_sets[j])
    return returnList
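The join shown in this fragment merges two itemsets of size key-1 only when their first key-2 sorted items agree, so each candidate of size key is produced once. A self-contained sketch of that idea follows; join_step and the sample sets are hypothetical names and data chosen for the example, not the project's code.

```python
def join_step(frequent_sets, k):
    """Merge pairs of (k-1)-itemsets whose first k-2 sorted items are equal."""
    candidates = []
    n = len(frequent_sets)
    for i in range(n):
        for j in range(i + 1, n):
            prefix_a = sorted(frequent_sets[i])[:k - 2]
            prefix_b = sorted(frequent_sets[j])[:k - 2]
            if prefix_a == prefix_b:
                candidates.append(frequent_sets[i] | frequent_sets[j])
    return candidates


# Three frequent pairs over the made-up items a, b, c.
pairs = [frozenset("ab"), frozenset("ac"), frozenset("bc")]
triples = join_step(pairs, 3)
```

Here {a, b} and {a, c} share the prefix a and are merged into {a, b, c}; {b, c} starts with b, so it is not joined with either, and the triple appears only once.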
def rules_from_conseq(frequency_set, H, supporting_data, rules,
                      min_confidence=0.9, verbose=False):
    # H holds candidate consequents, all of the same size m.
    m = len(H[0])
    if m == 1:
        Hmp1 = cal_conf(frequency_set, H, supporting_data, rules,
                        min_confidence, verbose)
    if len(frequency_set) > (m + 1):
        # Grow the consequents by one item and keep those that pass min_confidence.
        Hmp1 = apriori_generation(H, m + 1)
        Hmp1 = cal_conf(frequency_set, Hmp1, supporting_data, rules,
                        min_confidence, verbose)
        if len(Hmp1) > 1:
            rules_from_conseq(frequency_set, Hmp1, supporting_data, rules,
                              min_confidence, verbose)
def cal_conf(frequency_set, H, supporting_data, rules, min_confidence=0.9, verbose=False):
    pruned_H = []
    for consequence in H:
        # confidence(A -> C) = supp(A and C together) / supp(A), where A = frequency_set - consequence
        confidence = supporting_data[frequency_set] / supporting_data[frequency_set - consequence]
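The confidence formula can be checked on a small worked example. The helper name confidence and the support values below are hypothetical, picked only so the arithmetic is easy to follow; they are not results from the Auto data.

```python
def confidence(support, itemset, consequent):
    """conf(antecedent -> consequent) = supp(itemset) / supp(antecedent)."""
    antecedent = itemset - consequent
    return support[itemset] / support[antecedent]


# Hypothetical supports for the made-up items x and y.
support = {
    frozenset(["x"]): 0.5,
    frozenset(["y"]): 0.25,
    frozenset(["x", "y"]): 0.25,
}
xy = frozenset(["x", "y"])
conf_x_to_y = confidence(support, xy, frozenset(["y"]))  # 0.25 / 0.5
conf_y_to_x = confidence(support, xy, frozenset(["x"]))  # 0.25 / 0.25
```

The same itemset gives different confidences depending on which side is the consequent: y always comes with x (confidence 1.0), while x comes with y only half of the time (confidence 0.5).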
import csv

def import_data():
    # newline='' is the file mode the csv module expects in Python 3.
    with open('C:/Users/Anusha/Desktop/Auto_clean_data.csv', newline='') as fin:
        data = [row for row in csv.reader(fin)]
    return data

data = import_data()
# Materialise the sets as a list: back_prune calls len() on its first argument.
D_map = list(map(set, data))
can_keys = create_candidate_keys(data, verbose=True)
F1, supporting_data = back_prune(D_map, can_keys, 0.3, verbose=True)
F, supporting_data = apriori_generation_algo(data, min_support=0.05,
                                             verbose=True)
H = gen_rules(F, supporting_data, min_confidence=0.9, verbose=True)
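Because each row is turned into a plain set of raw values, a value such as 8 looks the same whichever column it came from. One possible refinement, sketched here as an assumption rather than as part of the original script, is to prefix every value with its attribute name before mining. The column order is assumed to follow the attribute list given earlier, and the sample row is made up.

```python
# Assumed column order, taken from the attribute list in this document.
COLUMNS = ["mpg", "cylinders", "displacement", "horsepower",
           "weight", "acceleration", "year", "origin", "name"]


def label_row(row, columns=COLUMNS):
    """Turn a row of raw values into a set of 'attribute=value' items."""
    return frozenset(f"{col}={val}" for col, val in zip(columns, row))


# Made-up row for illustration only.
sample = ["14", "8", "350", "150", "4000", "12", "73", "1", "example car"]
items = label_row(sample)
```

With labelled items, a rule reads unambiguously as, for example, cylinders=8 and year=73 together with origin=1.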
OBSERVATION
If mpg equals 14, cylinders equals 8, and origin equals 1,
then confidence = 1.0 and support = 0.063.
If mpg equals 13, cylinders equals 8, and origin equals 1,
then confidence = 0.929 and support = 0.066.
If cylinders equals 8, year equals 73, and origin equals 1,
then confidence = 1.0 and support = 0.051.
If horsepower equals 150, cylinders equals 8, and origin equals 1,
then confidence = 1.0 and support = 0.056.
CONCLUSION