DWDM Lab File
EXPERIMENT - 1
Theory:
Description -
OLAP is an acronym for On-Line Analytical Processing. An OLAP system manages
large amounts of historical data and provides facilities for summarizing and
aggregating it at multiple levels of granularity.
OLAP Operations
Since OLAP servers are based on a multidimensional view of data, we will discuss
OLAP operations on multidimensional data.
Here is the list of OLAP operations:
Roll-up
Drill-down
Slice and dice
Pivot (rotate)
Roll-up
Roll-up performs aggregation on a data cube in any of the following ways:
By climbing up a concept hierarchy for a dimension
By dimension reduction
The following points illustrate how roll-up works:
o Roll-up is performed by climbing up a concept hierarchy for the
dimension location.
o Initially the concept hierarchy was "street < city < province < country".
o On rolling up, the data is aggregated by ascending the location hierarchy
from the level of city to the level of country.
o The data is grouped into countries rather than cities.
o When roll-up is performed, one or more dimensions from the data cube
are removed.
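In SQL terms, roll-up along the location hierarchy corresponds to aggregating
at a coarser GROUP BY level. The following is a minimal sketch; the sales table
and its columns (city, country, amount) are illustrative, not from the original
experiment.

-- Measure aggregated at the city level (before roll-up)
SELECT city, SUM(amount) AS total_sales
FROM sales
GROUP BY city;

-- After rolling up from city to country, the same measure is
-- aggregated at the coarser country level
SELECT country, SUM(amount) AS total_sales
FROM sales
GROUP BY country;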
Drill-down
Drill-down is the reverse operation of roll-up. It is performed in either of the
following ways:
By stepping down a concept hierarchy for a dimension
By introducing a new dimension.
For example, drill-down can descend a time hierarchy "day < month < quarter <
year" from the level of quarter to the level of month.
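A minimal SQL sketch of this, assuming an illustrative sales table with quarter
and month columns:

-- Measure at the quarter level
SELECT quarter, SUM(amount) AS total_sales
FROM sales
GROUP BY quarter;

-- Drill-down from quarter to month: the same measure at a finer level
SELECT quarter, month, SUM(amount) AS total_sales
FROM sales
GROUP BY quarter, month;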
Slice
The slice operation performs a selection on one particular dimension of a given
cube and provides a new sub-cube.
Dice
The dice operation performs a selection on two or more dimensions of a given
cube and provides a new sub-cube.
Pivot (rotate)
The pivot operation rotates the data axes in view in order to provide an
alternative presentation of the data.
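A minimal SQL sketch of the slice and dice operations above, assuming
illustrative quarter, city, and item columns on the same sales table:

-- Slice: fix a single dimension value (quarter = 'Q1') to get a sub-cube
SELECT city, item, SUM(amount) AS total_sales
FROM sales
WHERE quarter = 'Q1'
GROUP BY city, item;

-- Dice: select on two or more dimensions at once
SELECT city, item, SUM(amount) AS total_sales
FROM sales
WHERE quarter IN ('Q1', 'Q2')
  AND city IN ('Toronto', 'Vancouver')
GROUP BY city, item;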
Assessment Questions:
1. Star schema vs. snowflake schema
2. Dimension table vs. relational table
3. Advantages of the snowflake schema
Conclusion:
Through OLAP operations, data can be extracted and viewed in different ways,
which helps in further analysis of the data.
EXPERIMENT - 2
Theory:
A varray (variable-size array) is a PL/SQL collection type with a fixed maximum
number of elements. Elements are indexed consecutively starting at 1, so a
varray always remains dense.
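The original listing for this experiment is not reproduced above; the following
is a minimal sketch of creating and handling a varray, with illustrative type
and variable names and sample data.

-- Requires SET SERVEROUTPUT ON to see the printed lines
DECLARE
   TYPE namesarray IS VARRAY(5) OF VARCHAR2(20);  -- at most 5 strings
   names namesarray;
BEGIN
   names := namesarray('Ramesh', 'Khilan', 'Kaushik');  -- initialize 3 elements
   FOR i IN 1 .. names.COUNT LOOP
      DBMS_OUTPUT.PUT_LINE('Name(' || i || '): ' || names(i));
   END LOOP;
END;
/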
Note:
1. Advantages of varrays
Conclusion:
I have understood the process of creating and handling varying arrays (varrays).
EXPERIMENT - 3
We have already discussed varrays in the previous experiment. In this
experiment, we will discuss PL/SQL tables.
Both types of PL/SQL tables, i.e., index-by tables and nested tables, have the
same structure and their rows are accessed using subscript notation. However,
they differ in one aspect: nested tables can be stored in a database column,
whereas index-by tables cannot.
Index-By Table
An index-by table (also called an associative array) is a set of key-value pairs. Each
key is unique and is used to locate the corresponding value.
An index-by table is created using the following syntax. Here, we are creating an
index-by table named table_name whose keys are of subscript_type and whose
values are of element_type.
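In standard PL/SQL the declaration takes this form (type_name, table_name,
element_type, and subscript_type are placeholders):

TYPE type_name IS TABLE OF element_type [NOT NULL] INDEX BY subscript_type;
table_name type_name;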
Example:
The following example shows how to create a table to store integer values along
with names, and later it prints the same list of names:
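The original listing is not reproduced here; the following sketch is consistent
with that description (the names and salary figures are illustrative):

DECLARE
   TYPE salary IS TABLE OF NUMBER INDEX BY VARCHAR2(20);  -- keyed by name
   salary_list salary;
   name VARCHAR2(20);
BEGIN
   -- adding elements to the table
   salary_list('Rajnish')  := 62000;
   salary_list('Minakshi') := 75000;
   salary_list('Martin')   := 100000;
   salary_list('James')    := 78000;
   -- printing the table: iterate from the first key to the last
   name := salary_list.FIRST;
   WHILE name IS NOT NULL LOOP
      DBMS_OUTPUT.PUT_LINE('Salary of ' || name || ' is '
                           || TO_CHAR(salary_list(name)));
      name := salary_list.NEXT(name);
   END LOOP;
END;
/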
When the above code is executed at the SQL prompt, it prints each name together
with the associated value.
Example:
Elements of an index-by table could also be a %ROWTYPE of any database table
or a %TYPE of any database table field. The following example illustrates the
concept:
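The original listing is not reproduced here; this sketch assumes a customers
table with a name column (consistent with the output shown later in this
experiment) and anchors the element type with %TYPE:

DECLARE
   CURSOR c_customers IS
      SELECT name FROM customers;
   -- element type is anchored to the customers.name column via %TYPE
   TYPE c_list IS TABLE OF customers.name%TYPE INDEX BY BINARY_INTEGER;
   name_list c_list;
   counter INTEGER := 0;
BEGIN
   FOR n IN c_customers LOOP
      counter := counter + 1;
      name_list(counter) := n.name;
      DBMS_OUTPUT.PUT_LINE('Customer(' || counter || '): '
                           || name_list(counter));
   END LOOP;
END;
/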
When the above code is executed at the SQL prompt, it prints each customer name
fetched from the customers table.
Nested Tables
A nested table is like a one-dimensional array with an arbitrary number of
elements. However, a nested table differs from an array in the following
aspects:
o An array has a declared number of elements, but a nested table does not.
The size of a nested table can increase dynamically.
o An array is always dense, i.e., it always has consecutive subscripts. A
nested table is dense initially, but it can become sparse when elements
are deleted from it.
A nested table is created using the following syntax:
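In standard PL/SQL this is (identifiers are placeholders):

TYPE type_name IS TABLE OF element_type [NOT NULL];
table_name type_name;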
This declaration is similar to the declaration of an index-by table, but there
is no INDEX BY clause.
A nested table can be stored in a database column, so it can be used to simplify
SQL operations where you join a single-column table with a larger table.
Example:
The following example illustrates the use of a nested table:
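The original listing is not reproduced here; a minimal sketch with illustrative
names and marks:

DECLARE
   TYPE names_table IS TABLE OF VARCHAR2(10);
   TYPE grades IS TABLE OF INTEGER;
   names names_table;
   marks grades;
   total INTEGER;
BEGIN
   -- a nested table must be initialized with a constructor before use
   names := names_table('Kavita', 'Pritam', 'Ayan', 'Rishav', 'Aziz');
   marks := grades(98, 97, 78, 87, 92);
   total := names.COUNT;
   DBMS_OUTPUT.PUT_LINE('Total ' || total || ' Students');
   FOR i IN 1 .. total LOOP
      DBMS_OUTPUT.PUT_LINE('Student: ' || names(i) || ', Marks: ' || marks(i));
   END LOOP;
END;
/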
When the above code is executed at the SQL prompt, it prints the total number of
students followed by each student's name and marks.
Example:
Elements of a nested table could also be a %ROWTYPE of any database table or
a %TYPE of any database table field. The following example illustrates the
concept:
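The original listing is not reproduced here; this sketch again assumes the
customers table used earlier and is consistent with the result shown below:

DECLARE
   CURSOR c_customers IS
      SELECT name FROM customers;
   TYPE c_list IS TABLE OF customers.name%TYPE;
   name_list c_list := c_list();   -- nested table, initialized empty
   counter INTEGER := 0;
BEGIN
   FOR n IN c_customers LOOP
      counter := counter + 1;
      name_list.EXTEND;            -- grow the nested table by one slot
      name_list(counter) := n.name;
      DBMS_OUTPUT.PUT_LINE('Customer(' || counter || '): '
                           || name_list(counter));
   END LOOP;
END;
/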
When the above code is executed at the SQL prompt, it produces the following
result:
Customer(1): Ramesh
Customer(2): Khilan
Customer(3): kaushik
Customer(4): Chaitali
Customer(5): Hardik
Customer(6): Komal
EXPERIMENT - 4
ETL Process:
Extract
The extract step covers data extraction from the source system and makes it
accessible for further processing. The main objective of the extract step is to
retrieve all the required data from the source system using as few resources as
possible. The extract step should be designed so that it does not negatively
affect the source system in terms of performance, response time, or any kind of
locking.
There are several ways to perform the extract:
Update notification - if the source system is able to provide a notification that
a record has been changed and describe the change, this is the easiest way to
get the data.
Incremental extract - some systems may not be able to provide notification
that an update has occurred, but they are able to identify which records have
been modified and provide an extract of such records. During further ETL steps,
the system needs to identify the changes and propagate them down. Note that
with a daily extract we may not be able to handle deleted records properly.
Full extract - some systems are not able to identify which data has been
changed at all, so a full extract is the only way one can get the data out of the
system. The full extract requires keeping a copy of the last extract in the same
format in order to be able to identify changes. Full extract handles deletions as
well.
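As a minimal sketch of an incremental extract, assuming the source table
carries a last_modified timestamp column (an illustrative assumption, as are
the table name and bind variable):

-- Pull only the rows changed since the previous ETL run
SELECT *
FROM source_orders
WHERE last_modified > :last_extract_time;
-- Rows deleted in the source never match this predicate, which is why
-- an incremental extract may miss deletions, as noted above.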
Transform
The transform step applies a set of rules to transform the data from the source
to the target. This includes converting any measured data to the same
dimension (i.e. conformed dimension) using the same units so that they can
later be joined. The transformation step also requires joining data from several
sources, generating aggregates, generating surrogate keys, sorting, deriving new
calculated values, and applying advanced validation rules.
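A minimal sketch of typical transform work (conforming units and deriving a
calculated value); the staging table, its columns, and the conversion rate are
illustrative:

-- Conform units and derive a calculated value during transform
SELECT s.order_id,
       s.amount_usd * 0.92   AS amount_eur,  -- illustrative fixed rate
       s.qty * s.unit_price  AS line_total   -- derived calculated value
FROM staging_orders s;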
Load
During the load step, it is necessary to ensure that the load is performed
correctly and with as few resources as possible. The target of the load process
is often a database. In order to make the load process efficient, it is helpful
to disable any constraints and indexes before the load and enable them again
only after the load completes. Referential integrity needs to be maintained by
the ETL tool to ensure consistency.
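A minimal sketch of the disable/enable pattern described above, with
illustrative table and constraint names:

-- Disable a foreign key before the bulk load...
ALTER TABLE sales_fact DISABLE CONSTRAINT fk_sales_customer;
-- ...perform the bulk load here (e.g., INSERT ... SELECT from staging)...
-- ...then re-enable it once the load completes
ALTER TABLE sales_fact ENABLE CONSTRAINT fk_sales_customer;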
OUTPUT:
EXPERIMENT - 5
Step 1: Set a minimum support and confidence threshold.
Step 2: Take all the subsets present in the transactions whose support value is
higher than the minimum (selected) support value.
Step 3: Take all the rules of these subsets that have a higher confidence value
than the threshold (minimum confidence).
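These steps correspond to the support and confidence filtering of the Apriori
algorithm. As a minimal SQL sketch of the support counting in Step 2, assuming
an illustrative transactions(tid, item) layout:

-- Support count of each single item = number of transactions containing it
SELECT item,
       COUNT(DISTINCT tid) AS support_count
FROM transactions
GROUP BY item
HAVING COUNT(DISTINCT tid) >= :min_support_count;  -- keep frequent items only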
OUTPUT:
EXPERIMENT - 6
OUTPUT:
EXPERIMENT - 7
OUTPUT:
EXPERIMENT - 11
Aim: To perform classification by decision tree induction using the Weka tool.
Steps involved in this experiment:
NAME
weka.classifiers.trees.J48
SYNOPSIS
Class for generating a pruned or unpruned C4.5 decision tree.
For more information, see Ross Quinlan (1993). C4.5: Programs for Machine
Learning. Morgan Kaufmann Publishers, San Mateo, CA.
OPTIONS:
seed -- The seed used for randomizing the data when reduced-error
pruning is used.
unpruned -- Whether pruning is performed.
confidenceFactor -- The confidence factor used for pruning (smaller values
incur more pruning).
numFolds -- Determines the amount of data used for reduced-error pruning.
One fold is used for pruning, the rest for growing the tree.
numDecimalPlaces -- The number of decimal places to be used for the
output of numbers in the model.
batchSize -- The preferred number of instances to process if batch
prediction is being performed. More or fewer instances may be provided,
but this gives implementations a chance to specify a preferred batch size.
reducedErrorPruning -- Whether reduced-error pruning is used instead of
C4.5 pruning.
useLaplace -- Whether counts at leaves are smoothed based on Laplace.
doNotMakeSplitPointActualValue -- If true, the split point is not relocated
to an actual data value. This can yield substantial speed-ups for large
datasets with numeric attributes.
debug -- If set to true, classifier may output additional info to the console.
subtreeRaising -- Whether to consider the subtree raising operation when
pruning.
saveInstanceData -- Whether to save the training data for visualization.
binarySplits -- Whether to use binary splits on nominal attributes when
building the trees.
doNotCheckCapabilities -- If set, classifier capabilities are not checked
before classifier is built (Use with caution to reduce runtime).
minNumObj -- The minimum number of instances per leaf.
useMDLcorrection -- Whether MDL correction is used when finding splits
on numeric attributes.
collapseTree -- Whether parts are removed that do not reduce training
error.
Step 1: We begin the experiment by loading the data (employee.arff) into
Weka.
Step 2: Next we select the "Classify" tab and click the "Choose" button to
select the "J48" classifier.
Step 3: Now we specify the various parameters. These can be specified by
clicking in the text box to the right of the "Choose" button. In this example,
we accept the default values. This default version does perform some pruning
but does not perform reduced-error pruning.
Step 4: Under the "Test options" in the main panel, we select 10-fold
cross-validation as our evaluation approach. Since we do not have a separate
evaluation data set, this is necessary to get a reasonable idea of the accuracy
of the generated model.
Step 5: We now click "Start" to generate the model. The ASCII version of the
tree as well as the evaluation statistics will appear in the right panel when
the model construction is complete.
Step 6: Note that the classification accuracy of the model is about 69%. This
indicates that more work may be needed (either in preprocessing or in selecting
better parameters for the classification).
Step 7: Weka also lets us view a graphical version of the classification tree.
This can be done by right-clicking the last result set and selecting "Visualize
tree" from the pop-up menu.
Step 8: We will use our model to classify new instances.
Step 9: In the main panel, under "Test options", click the "Supplied test set"
radio button and then click the "Set" button. This will show a pop-up window
which allows you to open the file containing the test instances.
OUTPUT: