Data Mining Lab Report
SUBMITTED TO:
UE163048
Practical - 1
(Implementation of functions, Procedures, Triggers and Cursors)
1. Creating a Table and inserting values
Output:
FUNCTIONS:
1. To calculate total no. of customers in the table
PROCEDURES:
a. To create a greetings procedure
b. To drop a procedure
CURSORS:
1. Implicit Cursor: To update salary of customers
TRIGGERS:
1. Create trigger before insert/ update/ delete
(i) On update
(ii) On insert
(iii) On delete
Practical - 2
(Normalization of Relations)
Original Table:
Practical - 3
(Learning and exploring Weka)
What is WEKA?
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be
applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-
processing, classification, regression, clustering, association rules, and visualization. It is also well-
suited for developing new machine learning schemes.
Weka is open source software issued under the GNU General Public License.
3 (a): Exploring preprocessing in Weka
Unsupervised filters
1. Add ID - An instance filter that adds an ID attribute to the dataset. The new attribute contains a unique ID for each instance.
Note: The ID is not reset for the second batch of instances when batch mode is used from the command line, or with the FilteredClassifier.
2. Numeric to Nominal - A filter for turning numeric attributes into nominal ones.
3. Numeric to Binary - Converts all numeric attributes into binary attributes (apart from the class attribute, if set): if the value of the numeric attribute is exactly zero, the value of the new attribute will be zero; if it is missing, the new value will be missing; otherwise, the new value will be one.
4. Discretize - An instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes. Discretization is by simple binning. Skips the class attribute if set.
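As an illustration of the two numeric filters above, here is a minimal Python sketch of their core logic (the function names and data are ours, for illustration only; this is not Weka's API or implementation):

```python
# Sketch of two unsupervised Weka-style filters in plain Python.

def numeric_to_binary(values):
    """NumericToBinary idea: a value maps to 0 only if it is exactly zero,
    otherwise to 1 (missing values are ignored here for simplicity)."""
    return [0 if v == 0 else 1 for v in values]

def discretize_equal_width(values, bins):
    """Discretize by simple equal-width binning: split [min, max] into
    `bins` intervals and replace each value by its bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    out = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        out.append(min(idx, bins - 1))  # clamp the maximum into the last bin
    return out

print(numeric_to_binary([0, 3.5, -2, 0]))   # zero stays 0, everything else 1
print(discretize_equal_width([1, 2, 3, 10], 3))
```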
Supervised Filters:
1. Nominal to Binary - Converts all nominal attributes into binary numeric attributes. An attribute with k values is transformed into k binary attributes if the class is nominal (using the one-attribute-per-value approach). Binary attributes are left binary if option '-A' is not given. If the class is numeric, k - 1 new binary attributes are generated in the manner described in "Classification and Regression Trees" by Breiman et al. (i.e., by taking the average class value associated with each attribute value into account).
2. Merge Nominal Values - Merges values of all nominal attributes among the specified attributes,
excluding the class attribute, using the CHAID method, but without considering re-splitting of
merged subsets.
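The one-attribute-per-value expansion performed by Nominal to Binary (the nominal-class case above) can be sketched in plain Python; this is an illustrative toy, not Weka's implementation, and it does not cover the k - 1 encoding used for a numeric class:

```python
# One-hot (one-attribute-per-value) expansion of a nominal attribute.

def nominal_to_binary(values):
    """Expand a nominal attribute with k distinct values into k 0/1 columns,
    one per category (categories ordered alphabetically here)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

rows = nominal_to_binary(["red", "green", "red", "blue"])
# columns correspond to the sorted categories: blue, green, red
for row in rows:
    print(row)
```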
CLASSIFICATION IN WEKA
1. ZERO-R: The Zero Rule classifier (0R or ZeroR for short) is the simplest rule you can use on a classification problem: it simply predicts the majority class in your dataset (i.e., the mode).
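The ZeroR idea can be sketched in a few lines of Python (illustrative only; Weka's ZeroR additionally handles a numeric class by predicting the mean):

```python
# Minimal sketch of the ZeroR idea: always predict the majority (modal) class.
from collections import Counter

def zero_r(train_labels):
    """Return a classifier that predicts the most frequent training label."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    return lambda instance: majority  # the instance is ignored entirely

predict = zero_r(["yes", "no", "yes", "yes", "no"])
print(predict({"price": "high"}))  # -> yes, regardless of the input
```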
2. DECISION TABLE: Class for building and using a simple decision table majority classifier.
Decision Table with 25 cross-validation folds
CLUSTERING IN WEKA
SIMPLEK-MEANS
The WEKA SimpleKMeans algorithm uses Euclidean distance measure to compute distances between
instances and clusters. To perform clustering, select the "Cluster" tab in the Explorer and click on the
"Choose" button. This results in a drop down list of available clustering algorithms.
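The clustering loop that SimpleKMeans runs can be sketched as follows; this is a toy re-implementation with fixed initial centroids and made-up points, not Weka's code:

```python
# Sketch of the k-means loop (assignment + centroid update) with Euclidean distance.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points, centroids, iterations=10):
    for _ in range(iterations):
        # assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            clusters[i].append(p)
        # update step: move each centroid to the mean of its cluster
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

print(k_means([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)]))
```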
Practical - 4
(Association rule mining in Weka using the Apriori algorithm)
Unsupervised filters
1. Add ID
2. Numeric to Nominal
3. Numeric to Binary
4. Discretize
Supervised Filters:
1. Nominal to Binary
The Apriori algorithm is a classical algorithm in data mining. It is used for mining frequent itemsets and the relevant association rules. It is devised to operate on a database containing a large number of transactions, for instance, items bought by customers in a store.
Practical - 5
INTRODUCTION:
Nowadays, data mining plays a vital role in the automobile industry and is one of the most important areas of research, with the objective of finding meaningful information from the data stored in huge datasets. Automotive data mining (ADM) is a very important research area which helps to predict useful information from automobile databases in order to improve automobile performance, gain a better understanding of it, and support better sales and marketing operations. Data mining, or knowledge discovery, has become an area of growing significance because it helps in analyzing data from different perspectives and summarizing it into useful information.
BACKGROUND:
Required Software (WEKA) -
We have used a data mining software package named WEKA for this project. For the purposes of this study, we selected WEKA (Waikato Environment for Knowledge Analysis), software that was developed at the University of Waikato in New Zealand. The WEKA tool supports a wide range of algorithms and very large data sets. The WEKA workbench (pronounced to rhyme with "Mecca") contains a collection of visualization tools and algorithms. WEKA is open source software issued under the GNU General Public License. It contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. The original non-Java version of WEKA was written in TCL/TK, but the more recent Java-based version, WEKA 3 (1997), is now used in many different application areas, in particular for education and research. WEKA's main user interface is the Explorer. There is also the Experimenter, with which we can compare the performance of WEKA's machine learning algorithms. The Explorer interface has several panels through which we can access the main components of the workbench. The Visualization tab, which allows visualizing a 2-D plot of the current working relation, is very useful. In this study, WEKA toolkit 3.8.1 is used for generating the association rules and predicting the result.
WEKA supports several standard data mining tasks, more specifically, data preprocessing,
clustering, classification, regression, visualization, and feature selection. All of WEKA's techniques are
predicated on the assumption that the data is available as a single flat file or relation, where each data
point is described by a fixed number of attributes (normally, numeric or nominal attributes, but some
other attribute types are also supported). WEKA provides access to SQL databases using Java Database
Connectivity and can process the result returned by a database query. It is not capable of multi-relational
data mining, but there is separate software for converting a collection of linked database tables into a
single table that is suitable for processing using WEKA.
PROBLEM STATEMENT:
Data mining is widely used in the automobile industry to find the problems that arise in this field. An automobile's performance is of great concern, and several factors may affect it. For prediction, three components are required: the parameters which affect the car's performance, the data mining methods, and the data mining tool. By applying data mining techniques to automobile data we can obtain knowledge that describes the car's performance. This knowledge will help to find the best car in each segment depending upon various factors like price, mileage, engine type, etc.
METHODOLOGY:
The methodology consists of 5 different phases (as shown in the workflow diagram below): Data Set Generation, Data Cleaning, Attribute Selection, Data Mining, and Analysis of Results.
WORKFLOW DIAGRAM:
● Dataset and attribute selection - A dummy automobile dataset is collected which contains data on various cars from the major car-manufacturing companies in the market. The dataset contains 207 instances and 26 attributes, and it also has some missing values. The data file has to be in either CSV format or ARFF format.
● Preprocessing - Data preprocessing is the first step of the evaluation. For this experiment the WEKA Explorer interface is used. Here the source data file is selected from the local machine. After loading the data in the Explorer, the data is refined by selecting different options, which is known as 'data cleaning'; attributes can also be selected or removed as per our need. The screenshot shows our dataset after preprocessing. The left-hand side of the screen shows the relation name, the number of attributes and the number of records. The right-hand side gives details of the attribute values, type, and number of distinct values. A summary of the selected attribute is displayed at the bottom right of the screen.
● Filters - The preprocess section allows filters to be defined that transform the data in various ways. The Filter box is used to set up the filters that are required. There are mainly two categories of filters: supervised and unsupervised. Here we choose the unsupervised category of filters. If the dataset contains any numeric values, we have to convert them to nominal values (as the association rule learner in WEKA supports only nominal values) by using the 'NumericToNominal' filter under the attribute section of the unsupervised filters.
● Precision: the proportion of instances that are truly of a class divided by the total number of instances classified as that class, i.e. TP / (TP + FP).
● Recall: the proportion of instances of a class that are correctly classified as that class, i.e. TP / (TP + FN) (equivalent to the TP rate).
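These two measures can be computed directly from the per-class counts of a confusion matrix; the counts below are hypothetical, for illustration only:

```python
# Precision and recall for one class, from true-positive / false-positive /
# false-negative counts.

def precision(tp, fp):
    """Fraction of instances predicted as the class that truly belong to it."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of instances of the class that were correctly predicted
    (equivalent to the TP rate)."""
    return tp / (tp + fn)

# Hypothetical counts for one class:
tp, fp, fn = 40, 10, 20
print(precision(tp, fp))  # 40 / 50 = 0.8
print(recall(tp, fn))     # 40 / 60 ≈ 0.667
```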
1. First, minimum support is applied to find all frequent item sets in a database.
2. Second, these frequent item sets and the minimum confidence constraint are used to form rules.
While the second step is straightforward, the first step needs more attention: finding all frequent item sets in a database is difficult because it involves searching all possible item sets.
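The two steps above can be sketched in Python. This is a naive toy version for illustration: real Apriori prunes candidate itemsets level by level using the downward-closure property instead of enumerating all of them, and the transactions below are made up:

```python
# Toy sketch of Apriori's two steps: (1) frequent itemsets above a minimum
# support, (2) rules above a minimum confidence.
from itertools import combinations

def apriori(transactions, min_support, min_conf):
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Step 1: frequent itemsets (naive enumeration of every candidate).
    frequent = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            s = support(set(combo))
            if s >= min_support:
                frequent[frozenset(combo)] = s

    # Step 2: rules X => Y with confidence = supp(X ∪ Y) / supp(X).
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                conf = s / frequent[frozenset(lhs)]
                if conf >= min_conf:
                    rules.append((set(lhs), set(itemset) - set(lhs), conf))
    return frequent, rules

tx = [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"milk"}, {"bread", "milk"}]
frequent, rules = apriori(tx, min_support=0.5, min_conf=0.9)
print(rules)
```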
Support - The support for a rule X => Y is obtained by dividing the number of transactions which satisfy the rule, N(X=>Y), by the total number of transactions, N:
Support (X=>Y) = N (X=>Y) / N
Confidence - The confidence of the rule X => Y is obtained by dividing the number of transactions which satisfy the rule, N(X=>Y), by the number of transactions which contain the body of the rule, X:
Confidence (X=>Y) = N (X=>Y) / N (X)
The confidence is the conditional probability of the RHS holding true given that the LHS holds true. A high confidence suggests a statistical dependence of the RHS event on the LHS event, though it does not by itself imply causation.
Lift - The lift of the rule X => Y is the deviation of the support of the whole rule from the support expected under independence, given the supports of the LHS (X) and the RHS (Y). Lift is an indication of the effect that knowing that the LHS holds true has on the probability of the RHS holding true. Hence lift gives us information about the increase in probability of the "then" part (consequent, RHS) given the "if" part (antecedent, LHS):
Lift exactly 1: no effect (LHS and RHS independent); no relationship between the events.
Lift greater than 1: positive effect (given that the LHS holds true, it is more likely that the RHS holds true); positive dependence between the events.
Lift smaller than 1: negative effect (when the LHS holds true, it is less likely that the RHS holds true); negative dependence between the events.
Leverage - The proportion of additional examples covered by both the antecedent and the consequent above those expected if the antecedent and consequent were independent of each other.
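A small worked example of these four measures, computed over a hypothetical transaction list (the transactions and attribute values are made up for illustration):

```python
# Worked example of support, confidence, lift, and leverage for a rule X => Y.

transactions = [
    {"diesel", "suv"}, {"diesel", "suv"}, {"diesel"},
    {"petrol", "hatchback"}, {"petrol"},
]  # hypothetical data
N = len(transactions)

def supp(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / N

X, Y = {"diesel"}, {"suv"}
support    = supp(X | Y)                        # N(X=>Y) / N
confidence = supp(X | Y) / supp(X)              # N(X=>Y) / N(X)
lift       = supp(X | Y) / (supp(X) * supp(Y))  # deviation from independence
leverage   = supp(X | Y) - supp(X) * supp(Y)

print(support, confidence, round(lift, 3), round(leverage, 3))
```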
Since the data mining software used to generate the association rules accepts data only in ARFF format, the data in the MS Excel file was first converted into comma-separated text format and then to ARFF format. The data, in nominal ARFF form, is then given to the Weka Associator, where Apriori is selected for association rule mining.
For our test we shall consider data on 206 automobiles with respect to different types of nominal attributes.
The ARFF file presented below contains information regarding each automobile's performance.
Using the Apriori algorithm we want to find the association rules that have minimum support = 0.1 (10%) and minimum confidence = 0.9 (90%). We will do this using the WEKA GUI. After we launch the WEKA application and open the Automobile.csv file, we move to the Associate tab and set up the following configuration:
Here we set minimum support = 0.1, because this generates more frequent item sets. If we set minimum support to 0.2 or more, many item sets are removed, and the number of attributes left is not sufficient to support a proper decision. Minimum confidence = 0.9, on the other hand, can be set high, because this threshold yields a smaller number of (stronger) rules.
USEFUL CONCEPTS:
For the dataset, association rules of the form X -> Y are generated, where the frequent item-sets are found using the Apriori technique. The item-sets X and Y are called the antecedent and the consequent of the rule, respectively. Generation of association rules (AR) is generally controlled by two measures or metrics called support and confidence; the most important ones are given below.
For this automobile dataset, we can calculate the interestingness measures from the Weka results for every generated association rule. Here, however, we calculate them only for one rule generated by Weka.
To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence.
Support:
The support for a rule X => Y is obtained by dividing the number of transactions which satisfy the rule, N(X=>Y), by the total number of transactions, N.
The support supp(X) or supp(Y) of an itemset X or Y is defined as the proportion of transactions in the
data set which contain the itemset.
supp(X)= no. of transactions which contain the itemset X / total no. of transactions
supp(Y)= no. of transactions which contain the itemset Y / total no. of transactions
Confidence:
Lift:
Lift(X→Y) = supp(X ∪ Y) / (supp(X) * supp(Y))
Leverage:
Leverage is the proportion of additional examples covered by both the premise and the consequent above those expected if the two were independent:
Leverage(X→Y) = supp(X ∪ Y) − supp(X) * supp(Y)
RESULT ANALYSIS:
The KDD (Knowledge Discovery in Databases) paradigm is a step-by-step process for finding interesting patterns in large amounts of data; data mining is one step in that process. The algorithms' potential as good analytical tools for performance evaluation is shown by looking at the results from the automobile performance dataset. It is much easier to store data than it is to make sense of it. Being able to find relationships in large amounts of stored data can lead to enhanced analysis strategies in fields such as education, marketing, computer performance analysis, and data analysis in general. The problem addressed by KDD is to find patterns in these massive datasets. Traditionally, data has been analyzed manually, but there are human limits: large databases offer too much data to analyze in the traditional manner. The focus of this practical is first to summarize exactly what the KDD process is.
1. After completing all the individual tests (preprocessing, classification, filtering, association and visualization), we show the final accumulated structure of automobile performance by using the KDD process, built in the WEKA KnowledgeFlow interface.
2. If you use an ARFF file in your experiment, select the ARFF loader; if you use a CSV file, select the CSV loader. We take a CSV file.
3. Click on the CSV loader and place it on the canvas, then pass the dataset to the next component.
4. To transform numeric values in the CSV file into nominal values, use the intermediate filter "NumericToNominal", then pass the dataset to the next component.
5. To classify the file, some intermediate evaluation components are needed:
1. Class assigner (to assign the class attribute)
2. Cross-validation fold maker
6. After that, choose a standard classifier to classify and predict the results for the given dataset using the test and training sets. We take a classifier such as J48.
7. Then connect the classifier to some intermediate evaluation components:
I. Classifier performance evaluator (used to obtain the important parameter results)
II. Prediction appender (used to obtain the predicted results)
Both are connected to the J48 classifier through the batchClassifier signal.
To show the results, use a text viewer connected through the text signal. This gives the parameter results (confusion matrix, accuracy, TP rate, FP rate, precision, recall, and so on) as well as the predicted results alongside the actual results.
8. (i) To show the graph generated by the classifier, a graph viewer is needed, connected through the graph signal.
(ii) A visualization tool, the model performance chart, is also needed: by passing the threshold data it produces charts of the classifier parameters, and by using the visualizable error signal it produces charts of the error points between attributes (e.g., actual result vs. predicted result).
9. After completing the classification stage, we go to the association stage to generate association rules using the Apriori algorithm. Passing the dataset from the loader, a text viewer connected through the text signal is needed to show the resulting rules.
10. (i) At last, for visualization of the dataset, choose the Scatter Plot Matrix tool, passing the dataset from the loader.
(ii) After completing the diagram, load the data in the CSV loader, click Run at the top left, and check the status area at the bottom for success or error messages.
WEKA LIMITATIONS:
1. In WEKA, when we declare the set of values for a nominal attribute, only those values may be used in the data section of the file. If an undeclared value appears, WEKA shows an error pop-up message, because WEKA does not accept any numeric or string value that has not been declared.
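A minimal, hypothetical ARFF fragment illustrating this limitation: the value CNG in the data section is not listed in the @attribute declaration, so Weka rejects the file.

```
@relation cars
@attribute fuel {petrol, diesel}
@data
petrol
CNG
```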
This practical presents data mining in an automobile environment that identifies car-performance patterns using the association rule mining technique. The identified patterns are analyzed to offer helpful and constructive recommendations to new car buyers to enhance their selection process. Association rule mining has been applied to cars/automobiles for analysis of their performance. In this work, the association rule mining technique is used to find hidden patterns and to evaluate automobiles' performance and trends. The Apriori algorithm is used for finding associations among attributes.
The automobile performance was evaluated based on data collected from the market, including attributes like price, mileage, etc. After that, the Zero-R classification algorithm was used. The data mining tool used in the experiment was WEKA 3.8.2. Based on the accuracy and the classification errors, one may conclude that the Zero-R classification method was the best-suited algorithm for this dataset. The Apriori algorithm was applied to the dataset using WEKA to analyze overall automobile performance through some of the best rules. The data may be extended to collect additional technical characteristics of the automobiles and mined with different classification algorithms to predict automobile performance.
In future work, the authors are also interested in working on data for each and every automobile model present in the market. This may reveal what kinds of construction mechanisms are adopted for automobile models that share the same characteristics, and may also provide various multidimensional summary reports.