DWDM Lab File

Lab file for 7th semester AKTU DWDM

EXPERIMENT - 1

Aim: Implementation of OLAP operations


Objective:
• To learn fundamentals of data warehousing
• To learn concepts of dimensional modelling
• To learn OLAP operations

Theory:
Description -
OLAP is an acronym for On-Line Analytical Processing. An OLAP system manages
large amounts of historical data and provides facilities for summarization and
aggregation of that data.
OLAP Operations
Since OLAP servers are based on a multidimensional view of data, we will discuss
OLAP operations on multidimensional data.
Here is the list of OLAP operations:
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)

Roll-up
Roll-up performs aggregation on a data cube in any of the following ways:
• By climbing up a concept hierarchy for a dimension
• By dimension reduction
The following diagram illustrates how roll-up works:
o Roll-up is performed by climbing up a concept hierarchy for the
dimension location.
o Initially the concept hierarchy was "street < city < province < country".
o On rolling up, the data is aggregated by ascending the location hierarchy
from the level of city to the level of country.
o The data is grouped into countries rather than cities.
o When roll-up is performed, one or more dimensions from the data cube
are removed.
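
In SQL, this climb up the location hierarchy can be sketched with the GROUP BY ROLLUP extension. The fact table and column names below (sales with street, city, country, amount) are hypothetical, so this is only an illustration of the idea:

-- Hypothetical fact table: sales(street, city, country, amount)
-- Roll-up from the city level to the country level and the grand total
SELECT country, city, SUM(amount) AS total_sales
FROM sales
GROUP BY ROLLUP (country, city);
-- Rows are produced per (country, city), per country, and one grand-total row.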

Drill-down
Drill-down is the reverse operation of roll-up. It is performed in either of the
following ways:
• By stepping down a concept hierarchy for a dimension
• By introducing a new dimension.
The following diagram illustrates how drill-down works:

o Drill-down is performed by stepping down a concept hierarchy for the dimension time.
o Initially the concept hierarchy was "day < month < quarter < year."
o On drilling down, the time dimension is descended from the level of quarter to the level of month.
o When drill-down is performed, one or more dimensions are added to the data cube.
o It navigates the data from less detailed data to highly detailed data.

Slice, Dice and Pivot
The slice operation selects one particular dimension from a given cube and
provides a new sub-cube. The dice operation selects two or more dimensions and
likewise provides a sub-cube, while pivot (rotate) rotates the data axes to give
an alternative presentation of the data.
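
As an illustration, a slice and a dice can be sketched as plain SQL filters over a hypothetical cube stored in a fact table sales(location, item, quarter, amount):

-- Slice: fix the time dimension (quarter = 'Q1') to obtain a sub-cube
SELECT location, item, SUM(amount) AS total_sales
FROM sales
WHERE quarter = 'Q1'
GROUP BY location, item;

-- Dice: select on two or more dimensions at once
SELECT location, item, quarter, SUM(amount) AS total_sales
FROM sales
WHERE location IN ('Delhi', 'Mumbai')
  AND quarter IN ('Q1', 'Q2')
GROUP BY location, item, quarter;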

Assessment Questions:
1. Star schema vs. snowflake schema
2. Dimensional table vs. relational table
3. Advantages of snowflake schema
Conclusion:
Through OLAP operations, the data can be extracted and viewed in different ways.
This helps in analysing the data further.
EXPERIMENT - 2

Aim: Implementation of Varying Arrays

Objective: To learn fundamentals of varrays (varying arrays)

Theory:

The PL/SQL programming language provides a data structure called the VARRAY,
which can store a fixed-size sequential collection of elements of the same type.

All varrays consist of contiguous memory locations. The lowest address
corresponds to the first element and the highest address to the last element.

Creating a Varray Type

A varray type is created with the CREATE TYPE statement. You must specify the
maximum size and the type of its elements. The basic syntax for creating a VARRAY
type at the schema level is:
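In outline, where varray_type_name is the new type name, n is the maximum number of elements and element_type is the data type of the elements:

CREATE OR REPLACE TYPE varray_type_name IS VARRAY(n) OF element_type;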
Example 1
The following program illustrates using varrays:
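
The original listing is not reproduced in this extract. A minimal sketch of such a program, assuming a hypothetical varray of five student names and SET SERVEROUTPUT ON, could look like this:

DECLARE
   TYPE namesarray IS VARRAY(5) OF VARCHAR2(20);
   names namesarray;
BEGIN
   -- initialize the varray with its constructor (same name as the type)
   names := namesarray('Arun', 'Bina', 'Chetan', 'Divya', 'Esha');
   -- varray subscripts start at 1
   FOR i IN 1 .. names.COUNT LOOP
      dbms_output.put_line('Name(' || i || '): ' || names(i));
   END LOOP;
END;
/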

When the above code is executed at the SQL prompt, it prints each name in the
varray on a separate line.

Note:

o In the Oracle environment, the starting index for varrays is always 1.

o You can initialize the varray elements using the constructor method of the
varray type, which has the same name as the varray type.

o Varrays are one-dimensional arrays.


o A varray is automatically NULL when it is declared and must be
initialized before its elements can be referenced.

Post lab assignment:

1. Advantages of varrays

Conclusion:
I have understood the process of creating and handling the varying arrays.
EXPERIMENT - 3

Aim: Implementation of Nested Tables


Objective: To learn fundamentals of nested tables
Theory:
A collection is an ordered group of elements having the same data type. Each
element is identified by a unique subscript that represents its position in the
collection.
PL/SQL provides three collection types:
o Index-by tables or Associative array
o Nested table
o Variable-size array or Varray

Oracle documentation describes the characteristics of each type of collection.

We have already discussed varrays in the previous experiment. In this
experiment, we will discuss PL/SQL tables.

Both types of PL/SQL tables, i.e., index-by tables and nested tables, have the
same structure and their rows are accessed using the subscript notation.
However, they differ in one aspect: nested tables can be stored in a database
column, whereas index-by tables cannot.

Index-By Table
An index-by table (also called an associative array) is a set of key-value pairs. Each
key is unique and is used to locate the corresponding value.
An index-by table is created using the following syntax. Here, we are creating an
index-by table named table_name, whose keys will be of subscript_type and whose
values will be of element_type:
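The declaration takes the following general form (subscript_type is typically PLS_INTEGER, BINARY_INTEGER or VARCHAR2):

TYPE type_name IS TABLE OF element_type [NOT NULL] INDEX BY subscript_type;
table_name type_name;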

Example:
The following example shows how to create a table to store integer values along
with names, and later prints the same list of names.
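
The original listing is not reproduced here. A minimal sketch that stores salaries indexed by (hypothetical) names and then prints them could be:

DECLARE
   TYPE salary_tab IS TABLE OF NUMBER INDEX BY VARCHAR2(20);
   salary_list salary_tab;
   name        VARCHAR2(20);
BEGIN
   -- adding elements to the table
   salary_list('Rajnish')  := 62000;
   salary_list('Minakshi') := 75000;
   salary_list('Martin')   := 100000;
   salary_list('James')    := 78000;
   -- printing the table by iterating over its keys
   name := salary_list.FIRST;
   WHILE name IS NOT NULL LOOP
      dbms_output.put_line('Salary of ' || name || ' is ' || TO_CHAR(salary_list(name)));
      name := salary_list.NEXT(name);
   END LOOP;
END;
/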

When the above code is executed at the SQL prompt, it prints each name along
with the stored value.

Example:
Elements of an index-by table could also be a %ROWTYPE of any database table or
%TYPE of any database table field; in that case the table is filled inside a
cursor loop by selecting values from the database table, and its contents are
printed in the same way as above.

Nested Tables
A nested table is like a one-dimensional array with an arbitrary number of
elements. However, a nested table differs from an array in the following aspects:
o An array has a declared number of elements, but a nested table does not. The size of a nested table can increase dynamically.
o An array is always dense, i.e., it always has consecutive subscripts. A nested array is dense initially, but it can become sparse when elements are deleted from it.
A nested table is created using the following syntax:
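In outline:

TYPE type_name IS TABLE OF element_type [NOT NULL];
table_name type_name;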
This declaration is similar to the declaration of an index-by table, but there is
no INDEX BY clause.
A nested table can be stored in a database column, so it can be used to simplify
SQL operations where you join a single-column table with a larger table.

Example:
The following example illustrates the use of a nested table:
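
The original listing is not reproduced here. A minimal sketch with hypothetical student names and marks could be:

DECLARE
   TYPE names_table IS TABLE OF VARCHAR2(10);
   TYPE grades      IS TABLE OF INTEGER;
   names names_table;
   marks grades;
   total INTEGER;
BEGIN
   -- nested tables are initialized with their type constructors
   names := names_table('Kavita', 'Pritam', 'Ayan', 'Rishav', 'Aziz');
   marks := grades(98, 97, 78, 87, 92);
   total := names.COUNT;
   dbms_output.put_line('Total ' || total || ' Students');
   FOR i IN 1 .. total LOOP
      dbms_output.put_line('Student: ' || names(i) || ', Marks: ' || marks(i));
   END LOOP;
END;
/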

When the above code is executed at the SQL prompt, it prints each student name
along with the corresponding marks.

Example:
Elements of a nested table could also be a %ROWTYPE of any database table or
%TYPE of any database table field. The following example illustrates the concept.
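
Assuming a hypothetical CUSTOMERS table whose NAME column holds the six names shown below, a sketch along the following lines would produce that output:

DECLARE
   CURSOR c_customers IS SELECT name FROM customers;
   TYPE c_list IS TABLE OF customers.name%TYPE;
   name_list c_list := c_list();
   counter INTEGER := 0;
BEGIN
   FOR n IN c_customers LOOP
      counter := counter + 1;
      name_list.EXTEND;                 -- grow the nested table by one element
      name_list(counter) := n.name;
      dbms_output.put_line('Customer(' || counter || '): ' || name_list(counter));
   END LOOP;
END;
/

When the above code is executed at the SQL prompt, it produces the following result: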
Customer(1): Ramesh
Customer(2): Khilan
Customer(3): kaushik
Customer(4): Chaitali
Customer(5): Hardik
Customer(6): Komal

PL/SQL procedure successfully completed.


Conclusion:
We have understood the process of creating and handling nested tables. They are
different from the tables we have handled so far.
EXPERIMENT - 4

Aim: Demonstration of an ETL tool.

ETL Process:
Extract The Extract step covers the data extraction from the source system and
makes it accessible for further processing. The main objective of the extract step
is to retrieve all the required data from the source system with as little resources
as possible. The extract step should be designed in a way that it does not
negatively affect the source system in terms or performance, response time or
any kind of locking.
There are several ways to perform the extract:
• Update notification - if the source system is able to provide a notification that a record has been changed and describe the change, this is the easiest way to get the data.
• Incremental extract - some systems may not be able to provide notification that an update has occurred, but they are able to identify which records have been modified and provide an extract of such records (see the sketch after this list). During further ETL steps, the system needs to identify these changes and propagate them down. Note that, with a daily incremental extract, we may not be able to handle deleted records properly.
• Full extract - some systems are not able to identify which data has been changed at all, so a full extract is the only way to get the data out of the system. The full extract requires keeping a copy of the last extract in the same format in order to be able to identify changes. A full extract handles deletions as well.
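
As a sketch of the incremental-extract approach, assume a hypothetical source table orders with a last_modified timestamp column and a bind variable :last_extract_time holding the timestamp recorded at the end of the previous run:

SELECT order_id, customer_id, amount, last_modified
FROM orders
WHERE last_modified > :last_extract_time;
-- the current run's timestamp is saved and becomes :last_extract_time next time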

Transform
The transform step applies a set of rules to transform the data from the source
to the target. This includes converting any measured data to the same
dimension (i.e. conformed dimension) using the same units so that they can
later be joined. The transformation step also requires joining data from several
sources, generating aggregates, generating surrogate keys, sorting, deriving new
calculated values, and applying advanced validation rules.
Load
During the load step, it is necessary to ensure that the load is performed
correctly and with as few resources as possible. The target of the load process
is often a database. In order to make the load process efficient, it is helpful to
disable any constraints and indexes before the load and re-enable them only
after the load completes. Referential integrity needs to be maintained by the ETL
tool to ensure consistency.
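
An Oracle-style load following this pattern might look like the sketch below; the table, staging table, constraint and index names (sales_fact, stg_sales, fk_sales_date, idx_sales_date) are hypothetical:

-- disable constraint and index maintenance before the bulk load
ALTER TABLE sales_fact DISABLE CONSTRAINT fk_sales_date;
ALTER INDEX idx_sales_date UNUSABLE;

-- direct-path load from the staging table
INSERT /*+ APPEND */ INTO sales_fact (sale_id, date_key, amount)
SELECT sale_id, date_key, amount FROM stg_sales;
COMMIT;

-- rebuild the index and re-enable the constraint after the load completes
ALTER INDEX idx_sales_date REBUILD;
ALTER TABLE sales_fact ENABLE CONSTRAINT fk_sales_date;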

OUTPUT:
EXPERIMENT - 5

Aim: Write a program for the Apriori algorithm using any programming language.


Below are the steps of the Apriori algorithm:
Step 1: Determine the support of the itemsets in the transactional database, and
select the minimum support and confidence.

Step 2: Take all itemsets with a support value higher than the minimum
(selected) support value.

Step 3: Find all the rules over these itemsets that have a confidence value
higher than the threshold (minimum) confidence.

Step 4: Sort the rules in decreasing order of lift.
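
As a sketch of Steps 1 and 2, the support counts can be computed with SQL, assuming a hypothetical transactions(tid, item) table (one row per item per transaction) and a bind variable :min_sup for the minimum support count; larger itemsets and the rule-confidence step follow the same join pattern:

-- frequent 1-itemsets
SELECT item, COUNT(*) AS support
FROM transactions
GROUP BY item
HAVING COUNT(*) >= :min_sup;

-- frequent 2-itemsets (each pair is counted once via item1 < item2)
SELECT a.item AS item1, b.item AS item2, COUNT(*) AS support
FROM transactions a
JOIN transactions b ON a.tid = b.tid AND a.item < b.item
GROUP BY a.item, b.item
HAVING COUNT(*) >= :min_sup;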

OUTPUT:
EXPERIMENT - 6

Aim: Demonstration of preprocessing on a .arff file using student data.


The procedure for creating an ARFF file in Weka is quite simple.
Note: This is for an XLSX file/dataset containing alphanumeric values.
1) If you have an XLSX file, you first need to convert it into a CSV (Comma
Separated Values) file.
2) Then open the CSV file with a text editor, e.g. Notepad++.
3) Append the relation header, e.g. @relation compile-
weka.filters.unsupervised.attribute
4) After that, append the file with attribute headers equal to the number of
columns in your XLSX file, e.g. @attribute max numeric, @attribute min numeric,
@attribute mean numeric, @attribute median numeric. This means the file has
four columns excluding the class label.
5) Add the class label declaration, e.g. @attribute CLASS {0,1}. This has two
classes, namely 0 and 1. After that, append the header with @data and then save
the file as .arff. A complete example of the ARFF header is as follows.
Dataset student.arff
@relation student
@attribute age {30, 30-40, 40}
@attribute income {low, medium, high}
@attribute student {yes, no}
@attribute credit-rating {fair, excellent}
@attribute buyspc {yes, no}
@data
30, high, no, fair, no
30, high, no, excellent, no
30-40, high, no, fair, yes
40, medium, no, fair, yes
40, low, yes, fair, yes
40, low, yes, excellent, no
30-40, low, yes, excellent, yes
30, medium, no, fair, no
30, low, yes, fair, no
40, medium, yes, fair, yes
And then execute.

OUTPUT:
EXPERIMENT - 7

Aim: Demonstration of association rule mining using the Apriori algorithm on
supermarket data.
NAME
weka.associations.Apriori
SYNOPSIS
Class implementing an Apriori-type algorithm. Iteratively reduces the minimum
support until it finds the required number of rules with the given minimum
confidence.
The algorithm has an option to mine class association rules. It is adapted as
explained in the second reference.
For more information see:
R. Agrawal, R. Srikant: Fast Algorithms for Mining Association Rules in Large
Databases. In: 20th International Conference on Very Large Data Bases, 478-
499, 1994.
Bing Liu, Wynne Hsu, Yiming Ma: Integrating Classification and Association Rule
Mining. In: Fourth International Conference on Knowledge Discovery and Data
Mining, 80-86, 1998.
OPTIONS:
• minMetric -- Minimum metric score. Consider only rules with scores higher than this value.
• verbose -- If enabled the algorithm will be run in verbose mode.
• numRules -- Number of rules to find.
• lowerBoundMinSupport -- Lower bound for minimum support.
• classIndex -- Index of the class attribute. If set to -1, the last attribute is taken as class attribute.
• outputItemSets -- If enabled the itemsets are output as well.
• car -- If enabled class association rules are mined instead of (general) association rules.
• doNotCheckCapabilities -- If set, associator capabilities are not checked before associator is built (Use with caution to reduce runtime).
• removeAllMissingCols -- Remove columns with all missing values.
• significanceLevel -- Significance level. Significance test (confidence metric only).
• treatZeroAsMissing -- If enabled, zero (that is, the first value of a nominal) is treated in the same way as a missing value.
• delta -- Iteratively decrease support by this factor. Reduces support until min support is reached or required number of rules has been generated.
• metricType -- Set the type of metric by which to rank rules. Confidence is the proportion of the examples covered by the premise that are also covered by the consequence (Class association rules can only be mined using confidence). Lift is confidence divided by the proportion of all examples that are covered by the consequence. This is a measure of the importance of the association that is independent of support. Leverage is the proportion of additional examples covered by both the premise and consequence above those expected if the premise and consequence were independent of each other. The total number of examples that this represents is presented in brackets following the leverage. Conviction is another measure of departure from independence. Conviction is given by P(premise)P(!consequence) / P(premise, !consequence).
• upperBoundMinSupport -- Upper bound for minimum support. Start iteratively decreasing minimum support from this value.

This experiment illustrates some of the basic elements of association rule mining
using WEKA. The sample dataset used for this example is contactlenses.arff.
Step 1: Open the data file in Weka Explorer. It is presumed that the required
data fields have been discretized. In this example, it is the age attribute.
Step 2: Clicking on the Associate tab will bring up the interface for the
association rule algorithms.
Step 3: We will use the Apriori algorithm. This is the default algorithm.
Step 4: In order to change the parameters for the run (e.g. support, confidence,
etc.), we click on the text box immediately to the right of the Choose button.
OUTPUT:
EXPERIMENT - 8

Aim: Demonstration of the classification rule process on dataset student.arff
using the J48 algorithm.
This experiment illustrates the use of the J48 classifier in Weka. The sample data
set used in this experiment is the "student" data available in ARFF format. This
document assumes that appropriate data preprocessing has been performed.
Steps involved in this experiment:
Step 1: We begin the experiment by loading the data (student.arff) into Weka.
Step 2: Next we select the "Classify" tab and click the "Choose" button to select
the "J48" classifier.
Step 3: Now we specify the various parameters. These can be specified by
clicking in the text box to the right of the Choose button. In this example,
we accept the default values. The default version does perform some
pruning but does not perform error pruning.
Step 4: Under the "Test options" in the main panel, we select 10-fold cross-
validation as our evaluation approach. Since we do not have a separate
evaluation data set, this is necessary to get a reasonable idea of the accuracy
of the generated model.
Step 5: We now click "Start" to generate the model. The ASCII version of the tree
as well as the evaluation statistics will appear in the right panel when the
model construction is complete.
Step 6: Note that the classification accuracy of the model is about 69%. This
indicates that more work may be needed (either in preprocessing or in
selecting the parameters for classification).
Step 7: Weka also lets us view a graphical version of the classification tree.
This can be done by right-clicking the last result set and selecting
"Visualize tree" from the pop-up menu.
Step 8: We will use our model to classify new instances.
Step 9: In the main panel, under "Test options", click the "Supplied test set" radio
button and then click the "Set" button. This will pop up a window which
will allow you to open the file containing the test instances.
OUTPUT:
EXPERIMENT - 9

Aim: Demonstration of the classification rule process on dataset employee.arff
using the Naïve Bayes algorithm.
This experiment illustrates the use of the Naïve Bayes classifier in Weka. The sample
data set used in this experiment is the "employee" data available in ARFF format.
This document assumes that appropriate data preprocessing has been performed.
Steps involved in this experiment:
Step 1: We begin the experiment by loading the data (employee.arff) into Weka.
Step 2: Next we select the "Classify" tab and click the "Choose" button to select the
"NaiveBayes" classifier.
Step 3: Now we specify the various parameters. These can be specified by clicking in
the text box to the right of the Choose button. In this example, we accept the
default values.
Step 4: Under the "Test options" in the main panel, we select 10-fold cross-
validation as our evaluation approach. Since we do not have a separate evaluation
data set, this is necessary to get a reasonable idea of the accuracy of the
generated model.
Step 5: We now click "Start" to generate the model. The model summary as well
as the evaluation statistics will appear in the right panel when the model
construction is complete.
Step 6: Note that the classification accuracy of the model is about 69%. This
indicates that more work may be needed (either in preprocessing or in selecting
the parameters for classification).
Step 7: Weka also lets us visualize the results. This can be done by right-clicking
the last result set and selecting the appropriate option from the pop-up menu.
Step 8: We will use our model to classify new instances.
Step 9: In the main panel, under "Test options", click the "Supplied test set" radio
button and then click the "Set" button. This will show a pop-up window which will
allow you to open the file containing the test instances.
OUTPUT:
EXPERIMENT - 10

Aim: Demonstration of the clustering process on dataset iris.arff using simple
k-means.
This experiment illustrates the use of simple k-means clustering with the Weka
explorer. The sample data set used for this example is based on the iris data
available in ARFF format. This document assumes that appropriate
preprocessing has been performed. The iris dataset includes 150 instances.
Steps involved in this Experiment:
Step 1: Run the Weka explorer and load the data file iris.arff in the preprocessing
interface.
Step 2: In order to perform clustering, select the ‘cluster’ tab in the explorer and
click on the choose button. This step results in a dropdown list of available
clustering algorithms.
Step 3: In this case we select ‘simple k-means’.
Step 4: Next, click on the text box to the right of the Choose button to get the
popup window shown in the screenshots. In this window we enter six as the
number of clusters and we leave the value of the seed as it is. The seed
value is used in generating a random number, which in turn is used for
making the internal assignments of instances to clusters.
Step 5: Once the options have been specified, we run the clustering algorithm.
In the 'Cluster mode' panel we make sure that the 'Use training set' option
is selected, and then we click the 'Start' button. This process and the
resulting window are shown in the following screenshots.
Step 6: The result window shows the centroid of each cluster as well as statistics
on the number and the percentage of instances assigned to the different
clusters. Here the cluster centroids are mean vectors for each cluster, and
they can be used to characterise the clusters. For example, the centroid of
cluster 1 corresponds to the class Iris-versicolor, with a mean sepal length
of 5.4706, sepal width of 2.4765, petal width of 1.1294 and petal length of
3.7941.
Step 7: Another way of understanding the characteristics of each cluster is
through visualization. We can do this by right-clicking the result set in the
result list panel and selecting 'Visualize cluster assignments'.

OUTPUT:
EXPERIMENT - 11

Aim: To perform the classification by decision tree induction using weka tools.
Steps involved in this experiment:
NAME
weka.classifiers.trees.J48
SYNOPSIS
Class for generating a pruned or unpruned C4.5 decision tree.
For more information, see Ross Quinlan (1993). C4.5: Programs for Machine
Learning. Morgan Kaufmann Publishers, San Mateo, CA.
OPTIONS:
• seed -- The seed used for randomizing the data when reduced-error pruning is used.
• unpruned -- Whether pruning is performed.
• confidenceFactor -- The confidence factor used for pruning (smaller values incur more pruning).
• numFolds -- Determines the amount of data used for reduced-error pruning. One fold is used for pruning, the rest for growing the tree.
• numDecimalPlaces -- The number of decimal places to be used for the output of numbers in the model.
• batchSize -- The preferred number of instances to process if batch prediction is being performed. More or fewer instances may be provided, but this gives implementations a chance to specify a preferred batch size.
• reducedErrorPruning -- Whether reduced-error pruning is used instead of C4.5 pruning.
• useLaplace -- Whether counts at leaves are smoothed based on Laplace.
• doNotMakeSplitPointActualValue -- If true, the split point is not relocated to an actual data value. This can yield substantial speed-ups for large datasets with numeric attributes.
• debug -- If set to true, classifier may output additional info to the console.
• subtreeRaising -- Whether to consider the subtree raising operation when pruning.
• saveInstanceData -- Whether to save the training data for visualization.
• binarySplits -- Whether to use binary splits on nominal attributes when building the trees.
• doNotCheckCapabilities -- If set, classifier capabilities are not checked before classifier is built (Use with caution to reduce runtime).
• minNumObj -- The minimum number of instances per leaf.
• useMDLcorrection -- Whether MDL correction is used when finding splits on numeric attributes.
• collapseTree -- Whether parts are removed that do not reduce training error.
Step 1: We begin the experiment by loading the data (employee.arff) into Weka.
Step 2: Next we select the "Classify" tab and click the "Choose" button to select
the "J48" classifier.
Step 3: Now we specify the various parameters. These can be specified by
clicking in the text box to the right of the Choose button. In this example,
we accept the default values. The default version does perform some
pruning but does not perform error pruning.
Step 4: Under the "Test options" in the main panel, we select 10-fold cross-
validation as our evaluation approach. Since we do not have a separate
evaluation data set, this is necessary to get a reasonable idea of the
accuracy of the generated model.
Step 5: We now click "Start" to generate the model. The ASCII version of the tree
as well as the evaluation statistics will appear in the right panel when the
model construction is complete.
Step 6: Note that the classification accuracy of the model is about 69%. This
indicates that more work may be needed (either in preprocessing or in
selecting the parameters for classification).
Step 7: Weka also lets us view a graphical version of the classification tree. This
can be done by right-clicking the last result set and selecting "Visualize
tree" from the pop-up menu.
Step 8: We will use our model to classify new instances.
Step 9: In the main panel, under "Test options", click the "Supplied test set" radio
button and then click the "Set" button. This will show a pop-up window
which will allow you to open the file containing the test instances.
OUTPUT:
