Data Warehousing Lab Exp 1-3

Uploaded by

rajar

DEPARTMENT OF COMPUTER SCIENCE ENGINEERING

22CSE009- DATA WAREHOUSING LABORATORY

Exp 1: Introduction to WEKA:
WEKA is open-source software that provides tools for data preprocessing, implementations of
several machine learning algorithms, and visualization tools, so that you can develop
machine learning techniques and apply them to real-world data mining problems. What
WEKA offers is summarized in the following diagram:

If you follow the flow of the diagram from the beginning, you will see that there are
many stages in dealing with Big Data to make it suitable for machine learning:

First, you will start with the raw data collected from the field. This data may contain
several null values and irrelevant fields. You use the data preprocessing tools provided in
WEKA to cleanse the data.

Then, you would save the preprocessed data in your local storage for applying ML
algorithms.

Next, depending on the kind of ML model that you are trying to develop, you would select
one of the options such as Classify, Cluster, or Associate. The Select Attributes option
allows the automatic selection of features to create a reduced dataset.

Note that under each category, WEKA provides the implementation of several algorithms.
You would select an algorithm of your choice, set the desired parameters and run it on
the dataset.

Then, WEKA gives you the statistical output of the model processing. It also provides
a visualization tool to inspect the data.

The various models can be applied on the same dataset. You can then compare the
outputs of different models and select the best that meets your purpose.

Thus, the use of WEKA results in a quicker development of machine learning models on
the whole.

Now that we have seen what WEKA is and what it does, let us learn
how to install WEKA on your local computer.

Weka Installation:
To install WEKA on your machine, visit WEKA’s official website and download the
installation file. WEKA supports installation on Windows, Mac OS X and Linux. You just
need to follow the instructions on this page to install WEKA for your OS.

The steps for installing on Mac are as follows:

• Download the Mac installation file.

• Double click on the downloaded weka-3-8-3-corretto-jvm.dmg file.

You will see the following screen on successful installation.

• Click on the weka-3-8-3-corretto-jvm icon to start Weka.

• Optionally you may start it from the command line:

java -jar weka.jar

The WEKA GUI Chooser application will start and you would see the following screen:

The GUI Chooser application allows you to run five different types of applications as
listed here:
• Explorer
• Experimenter
• KnowledgeFlow
• Workbench
• Simple CLI

We will be using Explorer in this tutorial.

Weka – Launching Explorer

In this chapter, let us look into various functionalities that the explorer provides for
working with big data.

When you click on the Explorer button in the Applications selector, it opens the following screen:

On the top, you will see several tabs as listed here:

• Preprocess
• Classify
• Cluster
• Associate
• Select Attributes
• Visualize
Under these tabs, there are several pre-implemented machine learning algorithms. Let us
look into each of them in detail now.

Preprocess Tab
Initially as you open the explorer, only the Preprocess tab is enabled. The first step in
machine learning is to preprocess the data. Thus, in the Preprocess option, you will
select the data file, process it and make it fit for applying the various machine learning
algorithms.

Classify Tab
The Classify tab provides you several machine learning algorithms for the classification of
your data. To list a few, you may apply algorithms such as Linear Regression, Logistic
Regression, Support Vector Machines, Decision Trees, RandomTree, RandomForest,
NaiveBayes, and so on. The list is extensive and covers supervised learning for both
classification and regression.

Cluster Tab
Under the Cluster tab, there are several clustering algorithms provided - such as
SimpleKMeans, FilteredClusterer, HierarchicalClusterer, and so on.

Associate Tab
Under the Associate tab, you would find Apriori, FilteredAssociator and FPGrowth.

Select Attributes Tab

Select Attributes lets you perform feature selection based on several algorithms such as
ClassifierSubsetEval, PrincipalComponents, etc.

Visualize Tab
Lastly, the Visualize option allows you to visualize your processed data for analysis.

As you noticed, WEKA provides several ready-to-use algorithms for testing and building your
machine learning applications. To use WEKA effectively, you must have a sound
knowledge of these algorithms, how they work, which one to choose under what
circumstances, what to look for in their processed output, and so on. In short, you must
have a solid foundation in machine learning to use WEKA effectively in building your apps.

In the upcoming chapters, you will study each tab in the explorer in depth.
Exp 2: Data exploration and integration with WEKA
2.1) Weka : Loading data

In this chapter, we start with the first tab that you use to preprocess the data. This is
common to all algorithms that you would apply to your data for building the model and is
a common step for all subsequent operations in WEKA.

For a machine learning algorithm to give acceptable accuracy, it is important that you
cleanse your data first. This is because the raw data collected from the field may
contain null values, irrelevant columns and so on.

In this chapter, you will learn how to preprocess the raw data and create a clean,
meaningful dataset for further use.

First, you will learn to load the data file into the WEKA explorer. The data can be loaded
from the following sources:
• Local file system
• Web
• Database
In this chapter, we will see all the three options of loading data in detail.

Loading Data from Local File System
Just under the Machine Learning tabs that you studied in the previous lesson, you would
find the following three buttons:
• Open file …
• Open URL …
• Open DB …
Click on the Open file ... button. A directory navigator window opens as shown in
the following screen:
Now, navigate to the folder where your data files are stored. The WEKA installation
comes with many sample databases for you to experiment with. These are available
in the data folder of the WEKA installation.
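
The sample files use WEKA's ARFF format: a plain-text header (@relation, @attribute) followed by comma-separated rows after @data. The sketch below reads such text with a small hand-written parser. The helper name parse_arff is hypothetical and this is not WEKA's own loader. The data is typed in by hand from the classic 14-row weather dataset, so check it against the file in your installation.

```python
# A simplified ARFF reader for nominal data (illustration only, not WEKA's parser).
ARFF_TEXT = """\
@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""


def parse_arff(text):
    """Return (relation, attribute_names, rows) from simple ARFF text."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):      # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            attributes.append(line.split(None, 2)[1])   # keep only the name
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation, "-", len(rows), "instances,", len(attributes), "attributes")
```

The row and attribute counts printed here are the same two numbers that the Explorer reports for a loaded file.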

For learning purposes, select any data file from this folder. The contents of the file
would be loaded in the WEKA environment. We will very soon learn how to inspect
and process this loaded data. Before that, let us look at how to load the data file
from the Web.

Loading Data from Web
Once you click on the Open URL … button, you can see a window as follows:
We will open the file from a public URL. Type the following URL in the popup box:

https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/weather.nominal.arff

You may specify any other URL where your data is stored. The Explorer will load the
data from the remote site into its environment.

Loading Data from DB
Once you click on the Open DB … button, a window opens in which you can connect to a database.

Set the connection string to your database, set up the query for data selection, process the query,
and load the selected records in WEKA.
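
The same three steps (connect, query, load records) can be sketched outside WEKA. WEKA itself connects through JDBC. The example below is only a stand-in that uses Python's built-in sqlite3 module with an in-memory database, and the table name, columns and rows are hypothetical.

```python
import sqlite3

# Step 1: connect. ":memory:" stands in for a real connection string.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, created here only so the query has data to return.
conn.execute("CREATE TABLE weather (outlook TEXT, windy TEXT, play TEXT)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [
        ("sunny", "FALSE", "no"),
        ("overcast", "FALSE", "yes"),
        ("rainy", "TRUE", "no"),
    ],
)

# Step 2: set up the query that selects the data of interest.
query = "SELECT outlook, play FROM weather WHERE windy = 'FALSE'"

# Step 3: process the query and load the selected records.
cursor = conn.execute(query)
columns = [description[0] for description in cursor.description]
records = cursor.fetchall()
conn.close()

print(columns)
print(records)
```

Only the rows that match the query are loaded, which is how a database source lets you restrict the data before any preprocessing.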
Exp 3: Apply weka tool for data validation
3.1) Weka – Preprocessing the data

The data that is collected from the field contains many unwanted things that lead to
wrong analysis. For example, the data may contain null fields, it may contain columns
that are irrelevant to the current analysis, and so on. Thus, the data must be
preprocessed to meet the requirements of the type of analysis you are seeking. This is
done in the preprocessing module.

To demonstrate the available features in preprocessing, we will use the Weather
database that is provided in the installation.

Using the Open file ... option under the Preprocess tab, select the weather.nominal.arff file.

When you open the file, your screen looks as shown here:
This screen tells us several things about the loaded data, which are discussed further in this
chapter.

3.2) Understanding Data
Let us first look at the highlighted Current relation sub window. It shows the name of
the database that is currently loaded. You can infer two points from this sub window:
• There are 14 instances - the number of rows in the table.

• The table contains 5 attributes - the fields, which are discussed in the upcoming sections.

On the left side, notice the Attributes sub window that displays the various fields in the
database.
The weather database contains five fields - outlook, temperature, humidity, windy and
play. When you select an attribute from this list by clicking on it, further details on the
attribute itself are displayed on the right hand side.
Let us select the temperature attribute first. When you click on it, you would see the
following screen
In the Selected Attribute subwindow, you can observe the following:

• The name and the type of the attribute are displayed.

• The type for the temperature attribute is Nominal.

• The number of Missing values is zero.

• There are three distinct values with no unique value.

• The table underneath this information shows the nominal values for this field as hot, mild and cool.

• It also shows the count and the weight (the summed instance weights) for each nominal value.

At the bottom of the window, you see the visual representation of the class values.
If you click on the Visualize All button, you will be able to see all features in one single
window as shown here:
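
The figures in the Selected Attribute panel can be reproduced by hand. The sketch below is a minimal illustration, not WEKA's code: the helper summarize is hypothetical, the temperature column is typed in from the weather data, and "?" is assumed as the missing-value marker, as in ARFF files.

```python
from collections import Counter

# Temperature column of the 14-row weather data, typed in by hand.
temperature = [
    "hot", "hot", "hot", "mild", "cool", "cool", "cool",
    "mild", "cool", "mild", "mild", "mild", "hot", "mild",
]


def summarize(values, missing_marker="?"):
    """Count missing, distinct and unique values, plus per-value counts."""
    present = [v for v in values if v != missing_marker]
    counts = Counter(present)
    return {
        "missing": len(values) - len(present),
        "distinct": len(counts),                                 # different values seen
        "unique": sum(1 for c in counts.values() if c == 1),     # values seen exactly once
        "counts": dict(counts),
    }


summary = summarize(temperature)
print(summary)
```

A value is "unique" only if it occurs in exactly one instance, which is why three distinct values can still give a unique count of zero.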

Removing Attributes
Many a time, the data that you want to use for model building comes with many
irrelevant fields. For example, the customer database may contain the customer's mobile number,
which is irrelevant in analysing their credit rating.
To remove one or more attributes, select them and click on the Remove button at the bottom.
The selected attributes would be removed from the database. After you fully preprocess
the data, you can save it for model building.
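
Removing an attribute amounts to dropping one column from every row. A minimal sketch follows, with a hypothetical helper remove_attributes and made-up customer records that are not taken from any WEKA sample file.

```python
def remove_attributes(names, rows, to_remove):
    """Return copies of names and rows without the columns listed in to_remove."""
    keep = [i for i, name in enumerate(names) if name not in to_remove]
    kept_names = [names[i] for i in keep]
    kept_rows = [[row[i] for i in keep] for row in rows]
    return kept_names, kept_rows


# Hypothetical customer table: the mobile number does not help predict the rating.
names = ["name", "mobile", "income", "rating"]
rows = [
    ["A", "555-0101", 42000, "good"],
    ["B", "555-0102", 18000, "poor"],
]

new_names, new_rows = remove_attributes(names, rows, {"mobile"})
print(new_names)
print(new_rows)
```

The original lists are left unchanged, so the full data is still available if you remove the wrong column.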

Next, you will learn to preprocess the data by applying filters on this data.

3.2) Applying Filters
Some machine learning techniques, such as association rule mining, require categorical data. To illustrate
the use of filters, we will use the weather.numeric.arff database that contains two numeric attributes -
temperature and humidity.

We will convert these to nominal by applying a filter on our raw data. Click on the Choose
button in the Filter subwindow and select the following filter:

weka->filters->supervised->attribute->Discretize
Click on the Apply button and examine the temperature and/or humidity attribute.
You will notice that these have changed from numeric to nominal types.
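
To see what turning a numeric attribute into a nominal one means, here is a minimal sketch. Note the substitution: the supervised Discretize filter chooses its cut points using the class values, whereas this example uses plain equal-width binning, which is simpler and does not look at the class. The function name, the bin labels and the sample values are illustrative assumptions.

```python
def equal_width_bins(values, bins=3):
    """Map each number to a label 'bin1'..'binN' using equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    labels = []
    for value in values:
        if width == 0:                      # all values identical: one bin only
            index = 0
        else:
            # The maximum value would land in bin N+1, so clamp it to the last bin.
            index = min(int((value - low) / width), bins - 1)
        labels.append(f"bin{index + 1}")
    return labels


# Illustrative temperature readings (not the full WEKA column).
temperature = [64, 70, 72, 75, 85]
nominal = equal_width_bins(temperature, bins=3)
print(list(zip(temperature, nominal)))
```

After this step the attribute holds a small set of labels instead of numbers, which is the form that categorical-only techniques expect.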
Let us look into another filter now. Suppose you want to select the best attributes for
deciding the play. Select and apply the following filter:

weka->filters->supervised->attribute->AttributeSelection
You will notice that it removes the temperature and humidity attributes from the database.

After you are satisfied with the preprocessing of your data, save the data by clicking the
Save … button. You will use this saved file for model building.

In the next chapter, we will explore the model building using several predefined ML
algorithms.
Weka – Classifiers
Many machine learning applications are classification related. For example, you may like
to classify a tumor as malignant or benign. You may like to decide whether to play an
outside game depending on the weather conditions. Generally, this decision is dependent
on several features/conditions of the weather. So you may prefer to use a tree classifier to
make your decision of whether to play or not.

In this chapter, we will learn how to build such a tree classifier on weather data to decide
on the playing conditions.
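
A tree classifier decides which attribute to test first by measuring how well each attribute separates the classes. The sketch below computes plain information gain for the outlook attribute. This is an illustration under assumptions: J48 is WEKA's implementation of C4.5, which refines this measure into a gain ratio, and the two columns are typed in by hand from the 14-row weather data.

```python
from collections import Counter
from math import log2

outlook = [
    "sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
    "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy",
]
play = [
    "no", "no", "yes", "yes", "yes", "no", "yes",
    "no", "yes", "yes", "yes", "yes", "yes", "no",
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [label for f, label in zip(feature, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


gain = information_gain(outlook, play)
print(f"entropy(play) = {entropy(play):.3f}, gain(outlook) = {gain:.3f}")
```

An attribute with a higher gain leaves purer groups after the split, so it is a better candidate for the root of the tree.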

3.3) Setting Test Data
We will use the preprocessed weather data file from the previous lesson. Open the saved
file by using the Open file ... option under the Preprocess tab, click on the Classify
tab, and you would see the following screen:

Before you learn about the available classifiers, let us examine the Test options. You will
notice four testing options as listed below:
• Training set
• Supplied test set
• Cross-validation
• Percentage split
Unless you have your own training set or a client-supplied test set, you would use the
cross-validation or percentage split options. Under cross-validation, you set the number of
folds into which the entire data is split; in each iteration, one fold is held out for testing
and the remaining folds are used for training. In the percentage split, you split the data
between training and testing using the set split percentage.
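
The two options can be sketched as index bookkeeping. This is a simplified illustration: both helper names are hypothetical, the rows are taken in their original order without shuffling, and the values 10 folds and 66% are assumptions chosen for the example.

```python
def k_fold_indices(n, k):
    """Return a list of (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    folds = []
    for fold in range(k):
        test = indices[fold::k]             # every k-th row, starting at `fold`
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        folds.append((train, test))
    return folds


def percentage_split(n, train_percent):
    """Return (train, test) index lists; the first train_percent% of rows train."""
    cut = round(n * train_percent / 100)
    return list(range(cut)), list(range(cut, n))


folds = k_fold_indices(14, 10)
train, test = percentage_split(14, 66)
print(len(folds), "folds; split sizes:", len(train), "train,", len(test), "test")
```

With cross-validation every instance is used for testing exactly once, while a percentage split tests only on the held-out portion.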

Now, keep the default play option for the output class:

Next, you will select the classifier.

Selecting Classifier
Click on the Choose button and select the following classifier:

weka->classifiers->trees->J48
This is shown in the screenshot below:
Click on the Start button to start the classification process. After a while, the classification
results would be presented on your screen as shown here:
Let us examine the output shown on the right hand side of the screen.

It says the size of the tree is 6. You will very shortly see the visual representation of the
tree. In the Summary, it reports 2 correctly classified instances and
3 incorrectly classified instances. It also says that the Relative absolute error is 110%.
It also shows the Confusion Matrix. Going into the analysis of these results is beyond the
scope of this tutorial. However, you can easily make out from these results that the
classification is not acceptable and you will need more data for analysis, to refine your
feature selection, rebuild the model and so on until you are satisfied with the model’s
accuracy. Anyway, that’s what WEKA is all about. It allows you to test your ideas quickly.
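
The summary counts and the confusion matrix are related in a simple way, which the sketch below illustrates. The actual and predicted labels are hypothetical: they were chosen only so that the totals match the 2 correct and 3 incorrect instances quoted above, and they are not the real WEKA output.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Hypothetical test instances: 2 predictions match, 3 do not.
actual = ["yes", "yes", "yes", "no", "no"]
predicted = ["yes", "no", "no", "yes", "no"]
labels = ["yes", "no"]

matrix = confusion_matrix(actual, predicted, labels)
correct = sum(matrix[i][i] for i in range(len(labels)))   # diagonal entries
accuracy = correct / len(actual)

print(matrix)
print(f"correct = {correct}, incorrect = {len(actual) - correct}, accuracy = {accuracy:.0%}")
```

The diagonal of the matrix holds the correctly classified instances, so 2 correct out of 5 gives an accuracy of 40%, well below what a usable model would need.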

Visualize Results
To see the visual representation of the results, right click on the result in the Result list
box. Several options would pop up on the screen as shown here:
Select Visualize tree to get a visual representation of the decision tree as seen in the
screenshot below:
Selecting Visualize classifier errors would plot the results of classification as shown here:

A cross represents a correctly classified instance, while a square represents an incorrectly classified
instance. At the lower left corner of the plot you see a cross that indicates that if outlook is sunny,
then play the game. So this is a correctly classified instance. To locate instances, you can
introduce some jitter by sliding the jitter slide bar.
The current plot is outlook versus play. These are indicated by the two drop down list
boxes at the top of the screen.
