Lab Manual DWN
Roll No:____
Name:__________________
Sem:_______Section______
Data Warehouse & Mining Lab Manual 2018
CERTIFICATE
Certified that this file is submitted by
Shri/Ku. ___________________________________________________________,
Roll No. ________, a student of VII Semester (final year) of the course Computer
Science & Engineering of Rashtrasant Tukadoji Maharaj Nagpur University, for the
subject Data Warehouse & Mining in _________________________, and that I have
instructed him/her for the said work, and that I have assessed the said work and
am satisfied that the same is up to the standard.
Mission
To create a conducive academic culture for learning and identifying career goals.
To provide quality technical education and research opportunities, and to imbibe
entrepreneurship skills contributing to the socio-economic growth of the Nation.
To inculcate values and skills that will empower our students towards development
through technology.
Mission:
To create an outcome-based education environment for learning and identifying
career goals.
To provide the latest tools in a learning ambience to enhance innovation, problem-solving
skills, leadership qualities, team spirit and ethical responsibilities.
To inculcate awareness through innovative activities in the emerging areas of
technology.
COURSE PRE-REQUISITES:
C.CODE   COURSE NAME                     DESCRIPTION   SEM
         DATABASE MANAGEMENT SYSTEMS                   V
COURSE OUTCOMES:
CO.1 Create a dataset for any application in the .arff format. (LEVEL 6)
CO.2 Describe various preprocessing and statistical techniques and apply those
techniques on the given data set. (LEVEL 1,3)
CO.3 Apply various association rule mining algorithms on the given data set. (LEVEL 3)
CO.4 Apply various classification algorithms on the given data set. (LEVEL 3)
CO.5 Apply various clustering algorithms on the given data set. (LEVEL 3)
Make entry in the Log Book as soon as you enter the Laboratory.
All the students should sit according to their Roll Numbers.
All the students are supposed to enter the terminal number in the Log Book.
Do not change the terminal on which you are working.
Strictly observe the instructions given by the Faculty / Lab. Instructor.
Take permission before entering the lab and keep your belongings in the
racks.
NO FOOD OR DRINK, IN ANY FORM, is allowed in the lab.
TURN OFF CELL PHONES! If you need to bring one, keep it in your bag.
Avoid all horseplay in the laboratory. Do not misbehave in the computer
laboratory. Work quietly.
Save often and keep your files organized.
Do not change settings, and surf safely.
Do not reboot, turn off, or move any workstation or PC.
Do not load any software on any lab computer (without prior permission of
Faculty and Technical Support Personnel). Only Lab Operators and Technical
Support Personnel are authorized to carry out these tasks.
Do not reconfigure the cabling/equipment without prior permission.
Do not play games on systems.
Turn off the machine once you are done using it.
Violation of the above rules and etiquette guidelines will result in disciplinary
action.
CONTENTS
Exp. No.   NAME OF EXPERIMENT
EXPERIMENT NO – 1
Aim: To create a dataset for an application in the .arff format.
1) If you have an XLSX file, you first need to convert it into a CSV (Comma Separated
Values) file.
2) Then open the CSV file with a text editor, e.g. Notepad++.
3) Prepend the relation header, e.g. @relation compile-weka.filters.unsupervised.attribute
4) After that, add one @attribute declaration for each column in your XLSX file, e.g.
@attribute max numeric, @attribute min numeric, @attribute mean numeric, @attribute
median numeric. This means the file has four columns excluding the class label.
5) Add the class attribute, e.g. @attribute CLASS {0,1}, which has two classes, namely
0 and 1. After that, append the header with @data, add the data rows, and then save the
file as .arff.
A complete example of the ARFF header can be as follows.
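Following the steps above, such a header (shown here with two illustrative data rows; the
attribute names are the ones from step 4, while the numeric values are made up for the
example) could look like this:

@relation compile-weka.filters.unsupervised.attribute
@attribute max numeric
@attribute min numeric
@attribute mean numeric
@attribute median numeric
@attribute CLASS {0,1}
@data
98.6,12.4,55.0,54.2,0
77.1,10.9,40.3,39.8,1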
EXPERIMENT NO – 2
Aim: To apply preprocessing techniques (attribute removal and discretization) on a given
data set using Weka.
Step 1: Loading the data. We can load the dataset into Weka by clicking on the Open file...
button in the Preprocess tab and selecting the appropriate file.
Step 2: Once the data is loaded, Weka recognizes the attributes and, during its scan of
the data, computes some basic statistics on each attribute. The left panel shows the list
of recognized attributes, while the top panel indicates the name of the base relation
(table) and the current working relation (which are the same initially).
Step 3: Clicking on an attribute in the left panel will show the basic statistics of that
attribute. For categorical attributes the frequency of each attribute value is shown,
while for continuous attributes we can obtain the min, max, mean, standard deviation, etc.
Step 4: The visualization panel on the right shows the data in the form of a
cross-tabulation across two attributes.
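The same loading and inspection can also be done through the Weka Java API. A minimal
sketch, assuming weka.jar is on the classpath and the dataset is named student.arff (the
file name used in the filtering steps below):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LoadData {
    public static void main(String[] args) throws Exception {
        // Read the ARFF file into memory, as the Preprocess tab does
        Instances data = DataSource.read("student.arff");
        // Print per-attribute statistics similar to those shown in the panels
        System.out.println(data.toSummaryString());
    }
}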
To remove an attribute, choose the Remove filter
(weka.filters.unsupervised.attribute.Remove) and then:
a) Next click the text box immediately to the right of the Choose button. In the resulting
dialog box, enter the index of the attribute to be filtered out.
b) Make sure that the invertSelection option is set to False, then click OK. In the filter
box you will now see "Remove -R 7".
c) Click the Apply button to apply the filter to this data. This will remove the attribute
and create a new working relation.
d) Save the new working relation as an ARFF file by clicking the Save button on the top
panel (student.arff).
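Steps a) to d) can also be carried out with the Weka Java API. A minimal sketch, assuming
(as above) that attribute 7 is the one to remove; the output file name is illustrative:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSink;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class RemoveAttribute {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("student.arff");
        Remove remove = new Remove();
        remove.setAttributeIndices("7");  // 1-based index, as in "Remove -R 7"
        remove.setInvertSelection(false); // remove (rather than keep) the listed index
        remove.setInputFormat(data);
        // Apply the filter to obtain the new working relation
        Instances newData = Filter.useFilter(data, remove);
        // Save it, mirroring the Save button in step d)
        DataSink.write("student-filtered.arff", newData);
    }
}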
Let us divide the values of the age attribute into three bins (intervals). First load the
dataset and choose the Discretize filter (weka.filters.unsupervised.attribute.Discretize).
To change the defaults for the filter, click on the box immediately to the right of the
Choose button.
We enter the index of the attribute to be discretized. In this case the attribute is age,
so we must enter '1', corresponding to the age attribute.
Enter '3' as the number of bins. Leave the remaining field values as they are.
Click the OK button.
Click Apply in the filter panel. This will result in a new working relation with the
selected attribute partitioned into 3 bins.
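The equivalent discretization with the Weka API; a sketch assuming attribute 1 is age:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class DiscretizeAge {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("student.arff");
        Discretize disc = new Discretize();
        disc.setAttributeIndices("1"); // 1-based index of the age attribute
        disc.setBins(3);               // partition into 3 (equal-width) bins
        disc.setInputFormat(data);
        Instances binned = Filter.useFilter(data, disc);
        System.out.println(binned);    // inspect the new working relation
    }
}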
EXPERIMENT NO – 3
Aim: To apply the Apriori association rule mining algorithm on a given data set using Weka.
NAME
weka.associations.Apriori
SYNOPSIS
Class implementing an Apriori-type algorithm. Iteratively reduces the minimum support until it finds
the required number of rules with the given minimum confidence.
The algorithm has an option to mine class association rules. It is adapted as explained in the second
reference.
R. Agrawal, R. Srikant: Fast Algorithms for Mining Association Rules in Large Databases. In: 20th
International Conference on Very Large Data Bases, 478-499, 1994.
Bing Liu, Wynne Hsu, Yiming Ma: Integrating Classification and Association Rule Mining. In:
Fourth International Conference on Knowledge Discovery and Data Mining, 80-86, 1998.
OPTIONS
minMetric -- Minimum metric score. Consider only rules with scores higher than this value.
classIndex -- Index of the class attribute. If set to -1, the last attribute is taken as class attribute.
car -- If enabled class association rules are mined instead of (general) association rules.
doNotCheckCapabilities -- If set, associator capabilities are not checked before associator is built
(Use with caution to reduce runtime).
treatZeroAsMissing -- If enabled, zero (that is, the first value of a nominal) is treated in the same
way as a missing value.
metricType -- Set the type of metric by which to rank rules. Confidence is the proportion of the
examples covered by the premise that are also covered by the consequence (Class association rules
can only be mined using confidence). Lift is confidence divided by the proportion of all examples
that are covered by the consequence. This is a measure of the importance of the association that is
independent of support. Leverage is the proportion of additional examples covered by both the
premise and consequence above those expected if the premise and consequence were independent of
each other. The total number of examples that this represents is presented in brackets following the
leverage. Conviction is another measure of departure from independence. Conviction is given by
P(premise)P(!consequence) / P(premise, !consequence) (these metrics are written out as
formulas after this OPTIONS list).
upperBoundMinSupport -- Upper bound for minimum support. Start iteratively decreasing
minimum support from this value.
This experiment illustrates some of the basic elements of association rule mining using
WEKA. The sample dataset used for this example is contactlenses.arff.
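Written out, the rule-ranking metrics described under metricType are as follows, where A
denotes the premise and B the consequence of a rule A => B:

\begin{aligned}
\mathrm{confidence}(A \Rightarrow B) &= P(B \mid A) \\
\mathrm{lift}(A \Rightarrow B) &= \frac{P(B \mid A)}{P(B)} \\
\mathrm{leverage}(A \Rightarrow B) &= P(A, B) - P(A)\,P(B) \\
\mathrm{conviction}(A \Rightarrow B) &= \frac{P(A)\,P(\lnot B)}{P(A, \lnot B)}
\end{aligned}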
Step 1: Open the data file in Weka Explorer. It is presumed that the required data fields
have been discretized; in this example it is the age attribute.
Step 2: Clicking on the Associate tab will bring up the interface for the association rule
algorithms.
Step 3: In order to change the parameters for the run (e.g. support, confidence, etc.), we
click on the text box immediately to the right of the Choose button.
Clicking Start then produces the association rules generated when the Apriori algorithm is
applied to the given dataset.
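The same run can be reproduced in Java. A minimal sketch with illustrative parameter
values (10 rules, minimum confidence 0.9):

import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AprioriDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("contactlenses.arff");
        Apriori apriori = new Apriori();
        apriori.setNumRules(10);     // stop once 10 rules have been found
        apriori.setMinMetric(0.9);   // minimum confidence (the minMetric option)
        apriori.buildAssociations(data);
        System.out.println(apriori); // print the discovered rules
    }
}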
EXPERIMENT NO – 4
Aim: To apply the FP-Growth association rule mining algorithm on a given data set using
Weka.
weka.associations.FPGrowth
SYNOPSIS
Class implementing the FP-growth algorithm for finding large item sets without
candidate generation. Iteratively reduces the minimum support until it finds the
required number of rules with the given minimum metric. For more information see:
J. Han, J. Pei, Y. Yin: Mining frequent patterns without candidate generation. In:
Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 1-12,
2000.
OPTIONS
findAllRulesForSupportLevel -- Find all rules that meet the lower bound on minimum
support and the minimum metric constraint. Turning this mode on will disable the
iterative support reduction procedure to find the specified number of rules.
minMetric -- Minimum metric score. Consider only rules with scores higher than this
value.
rulesMustContain -- Only print rules that contain these items. Provide a comma
separated list of attribute names.
positiveIndex -- Set the index of binary valued attributes that is to be considered the
positive index. Has no effect for sparse data (in this case the first index, i.e. non-zero
values, is always treated as positive). Also has no effect for unary valued attributes
(i.e. when using the Weka Apriori-style format for market basket data, which uses the
missing value "?" to indicate the absence of an item).
delta -- Iteratively decrease support by this factor. Reduces support until min support
is reached or required number of rules has been generated.
metricType -- Set the type of metric by which to rank rules. Confidence is the
proportion of the examples covered by the premise that are also covered by the
consequence (Class association rules can only be mined using confidence). Lift is
confidence divided by the proportion of all examples that are covered by the
consequence. This is a measure of the importance of the association that is
independent of support. Leverage is the proportion of additional examples covered by
both the premise and consequence above those expected if the premise and
consequence were independent of each other. The total number of examples that this
represents is presented in brackets following the leverage. Conviction is another
measure of departure from independence.
Step 1: Open the data file in Weka Explorer. It is presumed that the required data fields
have been discretized.
Step 2: Clicking on the Associate tab will bring up the interface for the association rule
algorithms.
Step 3: In order to change the parameters for the run (e.g. support, confidence, etc.), we
click on the text box immediately to the right of the Choose button.
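A minimal sketch of the corresponding API call. FP-Growth in Weka expects binary/nominal
market-basket style data; the file name basket.arff and the parameter values here are
illustrative assumptions:

import weka.associations.FPGrowth;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class FPGrowthDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("basket.arff"); // hypothetical market-basket file
        FPGrowth fp = new FPGrowth();
        fp.setNumRulesToFind(10); // required number of rules
        fp.setMinMetric(0.9);     // minimum confidence (the minMetric option)
        fp.buildAssociations(data);
        System.out.println(fp);   // print the discovered rules
    }
}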
EXPERIMENT NO – 5
Aim: To apply the J48 (C4.5) decision tree classification algorithm on a given data set
using Weka.
weka.classifiers.trees.J48
SYNOPSIS
Class for generating a pruned or unpruned C4.5 decision tree. For more information, see
Ross Quinlan (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA.
OPTIONS
seed -- The seed used for randomizing the data when reduced-error pruning is used.
debug -- If set to true, classifier may output additional info to the console.
subtreeRaising -- Whether to consider the subtree raising operation when pruning.
saveInstanceData -- Whether to save the training data for visualization.
binarySplits -- Whether to use binary splits on nominal attributes when building the trees.
doNotCheckCapabilities -- If set, classifier capabilities are not checked before classifier is built (Use with
caution to reduce runtime).
collapseTree -- Whether parts are removed that do not reduce training error.
Step 1: We begin the experiment by loading the data (employee.arff) into Weka.
Step 2: Next we select the Classify tab and click the Choose button to select the J48
classifier.
Step 3: Now we specify the various parameters. These can be specified by clicking in the
text box to the right of the Choose button. In this example we accept the default values;
this default version does perform some pruning but does not perform reduced-error pruning.
Step 4: We now click Start to generate the model. The ASCII version of the tree as well as
the evaluation statistics will appear in the right panel when the model construction is
complete.
Step 5: Note that the classification accuracy of the model is about 69%. This indicates
that more work may be needed (either in preprocessing or in selecting different parameters
for the classification).
Step 6: Weka also lets us view a graphical version of the classification tree. This can be
done by right-clicking the last result set and selecting Visualize tree from the pop-up
menu.
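These steps can be scripted as well. A minimal sketch that builds the tree and
cross-validates it, assuming the last attribute of employee.arff is the class:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Demo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("employee.arff");
        data.setClassIndex(data.numAttributes() - 1); // class is the last attribute
        J48 tree = new J48();     // defaults: pruned C4.5 tree, no reduced-error pruning
        tree.buildClassifier(data);
        System.out.println(tree); // ASCII version of the tree
        // 10-fold cross-validation, like the evaluation shown in the right panel
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}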
EXPERIMENT NO – 6
EXPERIMENT NO – 7
EXPERIMENT NO – 8
EXPERIMENT NO – 9
EXPERIMENT NO – 10