
ADVANCED DATABASES

AND DATA MINING


CSCI-527
PROJECT REPORT PRESENTATION

ANALYSIS ON
AUTOMOBILE DATASET
TEAM MEMBERS:
Anusha Vadlamudi Narasimha Rao 50134597
Deepthi Chidura 50129270
Namitha Yellokonda 50126906
Shravya Beerakayala 50124534

Abstract

The goal of a data mining process is to extract information from a dataset and transform it into a format that can be used for any purpose in the field concerned.

We have examined the Auto dataset, in which the performance of various cars is analyzed based on attributes such as mpg, cylinders, displacement, horsepower, weight, acceleration, year, and origin.

We have analyzed the data using the Apriori algorithm, a data mining technique.

INTRODUCTION
This Auto dataset contains the car model, mpg (miles per gallon), cylinders, displacement, horsepower, weight, acceleration, and origin.

Using the Apriori algorithm, support and confidence allow us to generate important decisions: how these attributes work as combined factors, and how that combination points to ways of increasing performance.

The minimum support and confidence values are set according to the application; the itemsets that satisfy this criterion are retained, to finally find out which attributes together satisfy the confidence and support thresholds.

DATA OF AUTO:

ATTRIBUTES AND DESCRIPTION

MPG: miles per gallon (fuel efficiency)
CYLINDERS: number of engine cylinders
DISPLACEMENT: engine displacement
HORSEPOWER: engine horsepower
WEIGHT: vehicle weight
ACCELERATION: time to accelerate (0 to 60 mph)
YEAR: model year
ORIGIN: region of origin (1 = American, 2 = European, 3 = Japanese)
NAME: car model name
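
Apriori operates on transactions of items rather than on numeric columns, so each car record is treated as one transaction whose items are that car's attribute values. A minimal sketch of this reading step (the file name here is illustrative, not taken from the project):

import csv

# Minimal sketch: read the CSV so that each car record becomes one
# "transaction" whose items are that car's attribute values.
# 'Auto.csv' is an assumed, illustrative file name.
with open('Auto.csv') as fin:
    transactions = [row for row in csv.reader(fin)]
print(transactions[0])  # one car as a list of attribute-value items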

APRIORI ALGORITHM
The Apriori algorithm is an influential algorithm for mining frequent itemsets for Boolean association rules that have support and confidence greater than a minimum support (min-sup) and a minimum confidence (min-conf), respectively.

The problem of discovering all association rules can be broken down into two parts:

Find all sets of items that have support greater than the minimum support. These itemsets are called large itemsets.

Use the large itemsets to generate the desired rules.

Two factors affect the significance of association rules:

Support: The rule X ⇒ Y has support s in the transaction set D if s% of the transactions in D contain X ∪ Y.

Confidence: The rule X ⇒ Y holds in the transaction set D with confidence c if c% of the transactions in D that contain X also contain Y.
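
As a minimal sketch in Python (toy transactions, not the Auto dataset), both measures can be computed directly from these definitions:

# Toy example: support and confidence of the rule X -> Y.
transactions = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c'}, {'a', 'b', 'c'}]
X, Y = {'a'}, {'b'}

n = len(transactions)
count_xy = sum(1 for t in transactions if (X | Y) <= t)  # transactions containing X ∪ Y
count_x = sum(1 for t in transactions if X <= t)         # transactions containing X

support = count_xy / n           # 3/5 = 0.6
confidence = count_xy / count_x  # 3/4 = 0.75
print(support, confidence)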

PSEUDO CODE
L1 = {large 1-itemsets};
for (k = 2; Lk-1 ≠ ∅; k++) do begin
    Ck = apriori-gen(Lk-1);  // new candidates
    forall transactions t ∈ D do begin
        Ct = subset(Ck, t);  // candidates contained in t
        forall candidates c ∈ Ct do
            c.count++;
    end
    Lk = {c ∈ Ck | c.count ≥ minsup};
end
answer = ∪k Lk;
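
For example, given the three transactions {a, b}, {a, c}, {a, b, c} and minsup = 2, the first pass gives L1 = {{a}, {b}, {c}} with counts 3, 2, 2. apriori-gen joins L1 into C2 = {{a, b}, {a, c}, {b, c}}; counting over D gives L2 = {{a, b}, {a, c}} (counts 2 and 2), while {b, c} (count 1) is pruned. C3 = {{a, b, c}} has count 1, so L3 is empty, the loop terminates, and the answer is L1 ∪ L2.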

DATA CLEANING
Unclean data refers to data that contains erroneous information.

The term may also be used for data that is in memory and not yet loaded into a database. There are some missing values in the data fields of this dataset.

UNCLEAN DATA

CODE FOR DATA CLEANING


autoData <- read.csv(file = "~/Documents//data//Auto.csv", header = TRUE)
# coerce horsepower to character so the "?" placeholders can be matched
horsepwr <- as.character(autoData$horsepower)
# flag each missing value ("?") with 0
horsepwr <- ifelse(horsepwr == "?", 0, horsepwr)

After data cleaning, the records with these flagged fields are removed.

PYTHON CODE FOR APRIORI ALGORITHM


import csv

def create_candidate_keys(data, verbose=False):
    """Collect every distinct item as a candidate 1-itemset (C1)."""
    can_keys = []
    for transac in data:
        for item in transac:
            if [item] not in can_keys:
                can_keys.append([item])
    can_keys.sort()
    # frozensets can serve as dictionary keys for the support counts
    return list(map(frozenset, can_keys))

def back_prune(data, candidates, min_support, verbose=False):
    """Count candidate occurrences and keep those meeting min_support."""
    sscount = {}
    for tid in data:
        for candidate in candidates:
            if candidate.issubset(tid):
                sscount.setdefault(candidate, 0)
                sscount[candidate] += 1
    num_items = float(len(data))
    ret_list = []
    supporting_data = {}
    for key in sscount:
        support = sscount[key] / num_items
        if support >= min_support:
            ret_list.insert(0, key)
            supporting_data[key] = support
    if verbose:
        for key in ret_list:
            print("{" + ", ".join(str(i) for i in key) + "}"
                  + ": supp = " + str(round(supporting_data[key], 3)))
    return ret_list, supporting_data

def apriori_generation(frequency_sets, key):
    """Join step: merge (key-1)-itemsets sharing their first key-2 items."""
    returnList = []
    lenLk = len(frequency_sets)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            a = sorted(frequency_sets[i])
            b = sorted(frequency_sets[j])
            if a[:key - 2] == b[:key - 2]:
                returnList.append(frequency_sets[i] | frequency_sets[j])
    return returnList

def apriori_generation_algo(data, min_support=0.3, verbose=False):
    """Level-wise search for all frequent itemsets."""
    can_keys = create_candidate_keys(data)
    D_map = list(map(set, data))  # a list, so it can be scanned repeatedly
    F1, supporting_data = back_prune(D_map, can_keys, min_support)
    F = [F1]
    key = 2
    while len(F[key - 2]) > 0:
        candidate_keys = apriori_generation(F[key - 2], key)
        F_key, support_K = back_prune(D_map, candidate_keys, min_support)
        supporting_data.update(support_K)
        F.append(F_key)
        key += 1
    if verbose:
        for kset in F:
            for item in kset:
                print("{" + ", ".join(str(i) for i in item) + "}"
                      + ": supp = " + str(round(supporting_data[item], 3)))
    return F, supporting_data

def cal_conf(frequency_set, H, supporting_data, rules, min_confidence=0.9,
             verbose=False):
    """Keep the consequents in H whose rules meet min_confidence."""
    pruned_H = []
    for consequence in H:
        confidence = (supporting_data[frequency_set]
                      / supporting_data[frequency_set - consequence])
        if confidence >= min_confidence:
            rules.append((frequency_set - consequence, consequence, confidence))
            pruned_H.append(consequence)
            if verbose:
                print("{" + ", ".join(str(i) for i in frequency_set - consequence) + "}"
                      + " --> "
                      + "{" + ", ".join(str(i) for i in consequence) + "}"
                      + ": conf = " + str(round(confidence, 3))
                      + ", supp = " + str(round(supporting_data[frequency_set], 3)))
    return pruned_H

def rules_from_conseq(frequency_set, H, supporting_data, rules,
                      min_confidence=0.9, verbose=False):
    """Recursively grow rule consequents for one frequent itemset."""
    m = len(H[0])
    if m == 1:
        cal_conf(frequency_set, H, supporting_data, rules, min_confidence, verbose)
    if len(frequency_set) > (m + 1):
        Hmp1 = apriori_generation(H, m + 1)
        Hmp1 = cal_conf(frequency_set, Hmp1, supporting_data, rules,
                        min_confidence, verbose)
        if len(Hmp1) > 1:
            rules_from_conseq(frequency_set, Hmp1, supporting_data, rules,
                              min_confidence, verbose)

def gen_rules(F, supporting_data, min_confidence=0.9, verbose=True):
    """Generate association rules from itemsets of size 2 and up.
    (The loop body was truncated in the source; this reconstruction
    follows the pattern implied by rules_from_conseq above.)"""
    rules = []
    for i in range(1, len(F)):
        for frequency_set in F[i]:
            H = [frozenset([item]) for item in frequency_set]
            rules_from_conseq(frequency_set, H, supporting_data, rules,
                              min_confidence, verbose)
    return rules

def import_data():
    with open('C:/Users/Anusha/Desktop/Auto_clean_data.csv', "r") as fin:
        data = [row for row in csv.reader(fin.read().splitlines())]
    return data

data = import_data()
D_map = list(map(set, data))
can_keys = create_candidate_keys(data, verbose=True)
# first pass shown separately, then the full level-wise run
F1, supporting_data = back_prune(D_map, can_keys, 0.3, verbose=True)
F, supporting_data = apriori_generation_algo(data, min_support=0.05,
                                             verbose=True)
H = gen_rules(F, supporting_data, min_confidence=0.9, verbose=True)
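
Assuming the functions above, the pipeline can be sanity-checked on a small hand-made transaction list before pointing it at the cleaned CSV (the items below are illustrative, not drawn from the dataset):

toy = [['mpg=14', 'cyl=8', 'origin=1'],
       ['mpg=14', 'cyl=8', 'origin=1'],
       ['mpg=30', 'cyl=4', 'origin=3']]
F_toy, supp_toy = apriori_generation_algo(toy, min_support=0.5, verbose=True)
rules_toy = gen_rules(F_toy, supp_toy, min_confidence=0.9, verbose=True)
# with this toy data every frequent pair yields a rule with conf = 1.0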

OBSERVATION
If mpg equals 14, cylinders equals 8, and origin equals 1, then confidence = 1.0 and support = 0.063.
If mpg equals 13, cylinders equals 8, and origin equals 1, then confidence = 0.929 and support = 0.066.
If cylinders equals 8, year equals 73, and origin equals 1, then confidence = 1.0 and support = 0.051.
If horsepower equals 150, cylinders equals 8, and origin equals 1, then confidence = 1.0 and support = 0.056.

CONCLUSION

In our project we have studied the Apriori algorithm and generated rules by considering minimum support and confidence. The dataset was cleaned using R programming and the algorithm was implemented in Python. The Python code was run in the Java environment and the results were obtained.
