Data Warehousing Lab Manual
CCS341
DATA WAREHOUSE LABORATORY
EX NO:1 DATA EXPLORATION AND INTEGRATION WITH WEKA - WEATHER DATASET
Aim:
The goal of this lab is to install Weka and become familiar with it. To demonstrate the available preprocessing features, we will use the Weather dataset.
Procedure:
Step 1: Download and install Weka on your machine.
Step 2: Open Weka and have a look at the interface. It is an open-source project written in Java from the University of Waikato.
Step 3: Click on the Explorer button on the right side.
Step 4: Weka comes with a number of small datasets. Those files are located at C:\Program Files\Weka-3-8 (if it is installed at this location; otherwise, search for Weka-3-8 to find the installation location). In this folder, there is a subfolder named 'data'. Open that folder to see all the files that come with Weka.
Using the Open file... option under the Preprocess tab, select the weather.nominal.arff file.
DATASET
@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
Understanding Data
Let us first look at the highlighted Current relation sub window. It shows the name of the dataset that is
currently loaded. You can infer two points from this sub window −
• There are 14 instances - the number of rows in the table.
• The table contains 5 attributes - the fields, which are discussed in the upcoming sections.
On the left side, notice the Attributes sub window that displays the various fields in the database.
The weather dataset contains five fields - outlook, temperature, humidity, windy and play. When you select
an attribute from this list by clicking on it, further details on the attribute itself are displayed on the right
hand side.
Let us select the temperature attribute first. When clicking on it, we would see the following screen −
In the Selected Attribute subwindow, you can observe the following −
• The name and the type of the attribute are displayed.
• The type for the temperature attribute is Nominal.
• The number of Missing values is zero.
• There are three distinct values with no unique value.
• The table underneath this information shows the nominal values for this field as hot, mild and cool.
• It also shows the count and weight in terms of a percentage for each nominal value.
At the bottom of the window, you see the visual representation of the class values.
If you click on the Visualize All button, you will be able to see all features in one single window as shown
here −
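The same attribute statistics can be cross-checked outside Weka. The following is a minimal Python sketch, assuming the weather.nominal.arff file has been copied from Weka's data folder into the working directory (scipy and pandas are used here only for illustration and are not part of the Weka workflow).
# Reproduce the Selected Attribute statistics for 'temperature' in Python
from scipy.io import arff
import pandas as pd

data, meta = arff.loadarff("weather.nominal.arff")    # assumed local copy
df = pd.DataFrame(data).apply(lambda col: col.str.decode("utf-8"))

print(meta)                                           # attribute names and nominal values
print(df["temperature"].value_counts())               # counts of hot, mild, cool
print("missing :", (df["temperature"] == "?").sum())  # missing values are encoded as '?'
print("distinct:", df["temperature"].nunique())       # three distinct values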
Removing Attributes:
Many a time, the data that you want to use for model building comes with many irrelevant fields. For example, a customer database may contain the customer's mobile number, which is not relevant in analysing the customer's credit rating.
To remove attribute(s), select them and click on the Remove button at the bottom.
The selected attributes would be removed from the database. After we fully pre-process the data, we can
save it for model building.
Applying Filters:
Some machine learning techniques, such as association rule mining, require categorical data. To illustrate the use of filters, we will use the weather.numeric.arff dataset, which contains two numeric attributes - temperature and humidity.
We will convert these to nominal by applying a filter on our raw data. Click on the Choose button in
the Filter subwindow and select the following filter −
weka→filters→supervised→attribute→Discretize
Click on the Apply button and examine the temperature and/or humidity attribute. You will notice that
these have changed from numeric to nominal types.
After we fully pre-process the data, we can save it for model building.
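Weka's supervised Discretize filter uses entropy-based (MDL) binning. As a rough, unsupervised stand-in that shows the effect of discretization, the following pandas sketch (assuming a local copy of weather.numeric.arff) bins the two numeric attributes into three labelled intervals.
# Convert the numeric attributes to nominal by simple equal-width binning
from scipy.io import arff
import pandas as pd

data, _ = arff.loadarff("weather.numeric.arff")      # assumed local copy
df = pd.DataFrame(data)

for col in ["temperature", "humidity"]:
    df[col] = pd.cut(df[col], bins=3, labels=["low", "medium", "high"])

print(df.dtypes)   # temperature and humidity are now categorical (nominal)
print(df.head())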
Result:
Thus the data exploration and integration with WEKA is successfully executed.
EX NO:2 APPLY WEKA TOOL FOR DATA VALIDATION
Aim:
To apply the Weka tool to a dataset for data validation. Weka supports several standard data mining tasks such as data pre-processing, clustering, classification, regression, visualization and feature selection.
Preprocess :
Initially as you open the explorer, only the Pre-process tab is enabled. The first step is to pre-process
the data. Thus, in the Pre-process option, we will select the data file, process it and make it fit for applying
the various algorithms.
Loading Data:
The first four buttons at the top of the preprocess section enable you to load data into WEKA:
1. Open file.... Brings up a dialog box allowing you to browse for the data file on the local file system.
2. Open URL.... Asks for a Uniform Resource Locator address for where the data is stored.
3. Open DB.... Reads data from a database. (Note that to make this work you might have to edit the file in
weka/experiment/DatabaseUtils.props.)
4. Generate.... Enables you to generate artificial data from a variety of Data Generators.
Using the Open file... button you can read files in a variety of formats: WEKA's ARFF format, CSV format, C4.5 format, or serialized Instances format. ARFF files typically have a .arff extension, CSV files a .csv extension, C4.5 files a .data and .names extension, and serialized Instances objects a .bsi extension.
DATASET
@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
Select the Classify tab in the WEKA Explorer.
The Classify tab provides several machine learning algorithms for the classification of your data. To list a few, we may apply algorithms such as Linear Regression, Logistic Regression, Support Vector Machines, Decision Trees, RandomTree, RandomForest, NaiveBayes, and so on. The list is very exhaustive, and many more classifiers are available to experiment with.
Cross-Validation
Procedure for cross-validation:
1. Load the dataset into the Weka tool.
2. Go to the Classify option; in the left-hand navigation bar we can see different classification algorithms under the functions section.
3. Select the Linear Regression algorithm and click the Start option with the cross-validation option set to 10 folds.
4. Then we get the regression model and its result as shown below.
Here, we enabled the cross-validation test option with 10 folds and clicked the Start button, as represented below.
Using the cross-validation strategy with 20 folds: here, we enabled the cross-validation test option with 20 folds and clicked the Start button, as represented below.
Comparing the above results of cross-validation with 10 folds and 20 folds, we observe that the error rate is lower with 20 folds (97.3% correctness) than with 10 folds (94.6% correctness).
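The same comparison can be reproduced outside Weka. The following is a minimal scikit-learn sketch of k-fold cross-validation with 10 and 20 folds; it uses the built-in Iris dataset as a stand-in, since the 14-row weather data cannot be split into 20 folds, and the choice of classifier is an illustrative assumption.
# Compare 10-fold and 20-fold cross-validation accuracy
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

for folds in (10, 20):
    scores = cross_val_score(model, X, y, cv=folds)
    # Mean accuracy across folds, comparable to Weka's "correctly classified instances"
    print(f"{folds}-fold accuracy: {scores.mean() * 100:.1f}%")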
Result:
Thus data validation using the Weka tool with cross-validation was performed successfully.
Ex:No:3 Real-Time Anomaly Detection with Apache Kafka and Python
Aim:
To make real-time predictions on incoming stream data from Apache Kafka, and to implement notification messages for credit card transactions, GPS logs, and system consumption metrics.
Project ideas:
• Train an anomaly detection algorithm using unsupervised machine learning.
• Create a new data producer that sends the transactions to a Kafka topic.
• Read the data from the Kafka topic to make the prediction using the trained ml model.
• If the model detects that the transaction is not an inlier, send it to another Kafka topic.
• Create the last consumer that reads the anomalies and sends an alert to a Slack channel.
Architecture:
Procedure:
Step 1: Project structure:
i) First, check settings.py; it has some variables to set, like the Kafka broker host and port. Leave the defaults (listening on localhost and the default ports of Kafka and ZooKeeper).
ii) The streaming/utils.py file contains the configurations to create Kafka consumers and producers.
iii) Install the requirements.
Program:
Step1:
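The following is a minimal sketch of the program, assuming the kafka-python and scikit-learn packages, a broker on localhost:9092 (the settings.py default mentioned above), and illustrative topic names and transaction fields; it is not the original project's code.
# Train an anomaly detector, stream transactions through Kafka, and forward outliers
import json
import numpy as np
from kafka import KafkaProducer, KafkaConsumer
from sklearn.ensemble import IsolationForest

BROKER = "localhost:9092"                               # assumed broker address

# 1. Train an unsupervised model on historical transaction amounts (stand-in data)
history = np.random.normal(loc=50, scale=10, size=(1000, 1))
model = IsolationForest(contamination=0.01).fit(history)

# 2. Producer: send incoming transactions to the 'transactions' topic
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("transactions", {"id": 1, "amount": 250.0})
producer.flush()

# 3. Consumer: read transactions, predict, and forward outliers to 'anomalies'
consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers=BROKER,
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
for message in consumer:
    tx = message.value
    if model.predict([[tx["amount"]]])[0] == -1:        # -1 means outlier
        producer.send("anomalies", tx)                  # a Slack-alerting consumer
                                                        # subscribes to this topic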
Result:
Thus real-time anomaly detection with Apache Kafka and Python was executed, and the streaming/bot_alerts notification was produced successfully.
Ex:No:4 Implement The Query For Schema Definition
(Star, snowflake and Fact constellation schemas)
Aim:
To design a data warehouse database and queries for schema definition, namely the Star, Snowflake and Fact constellation schemas, through the MySQL database connection of the Weka tool.
Procedure:
Step 1: Click Start - All Programs - XAMPP, start the Apache and MySQL servers, then open the Weka tool and click Explorer.
Step 2: Click the Open DB tab for database connectivity.
Step 3: Enter the database connection parameters (URL, username, password), then check the database connection.
Step 4: Double-click localhost:3306 to open the database connection.
Step 5: Double-click dwtp, click Schemas(1), right-click, select New Schema and type the schema name.
Step 6: Double-click dw - Tables(0), click the SQL icon, type the query and click the Run icon.
Step 7: Then close the SQL query dialog box.
Step 8: Up to this point, the database has been created along with the primary key. Next, import a CSV file and store the data in the database: go to Tables, choose the table name, right-click, choose the Import option, then select the .csv file from its location, select the format as CSV and click the Import button. Now the data are stored in the database.
Step 9: Similarly, create the following tables: Snowflake, Star, Fact constellation.
Step 10: Click the SQL icon, type the queries and click the Run button.
Implementation:
STAR SCHEMA:
• Each dimension in a star schema is represented with only one-dimension table.
• The fact table also contains the attributes, namely dollars sold and units sold.
STAR SCHEMA DEFINITION:
The star schema is defined using Data Mining Query Language (DMQL) as follows:
define cube sales star [time, item, branch, location]:
dollars sold = sum(sales in dollars), units sold = count(*)
define dimension time as (time key, day, day of week, month, quarter, year)
define dimension item as (item key, item name, brand, type, supplier type)
define dimension branch as (branch key, branch name, branch type)
define dimension location as (location key, street, city, province or state, country)
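For Step 6 and Step 10 of the procedure (typing queries in the SQL viewer), the equivalent tables can also be created programmatically. The following is only a minimal sketch: the database name dwtp follows Step 5, while the login, the PyMySQL driver and the column types are assumptions.
# Create the star schema tables over the same MySQL connection
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://root:@localhost:3306/dwtp")  # assumed login

ddl = [
    """CREATE TABLE time_dim (time_key INT PRIMARY KEY, day INT,
           day_of_week VARCHAR(10), month INT, quarter INT, year INT)""",
    """CREATE TABLE item_dim (item_key INT PRIMARY KEY, item_name VARCHAR(50),
           brand VARCHAR(30), type VARCHAR(30), supplier_type VARCHAR(30))""",
    """CREATE TABLE branch_dim (branch_key INT PRIMARY KEY, branch_name VARCHAR(50),
           branch_type VARCHAR(30))""",
    """CREATE TABLE location_dim (location_key INT PRIMARY KEY, street VARCHAR(50),
           city VARCHAR(30), province_or_state VARCHAR(30), country VARCHAR(30))""",
    """CREATE TABLE sales_fact (time_key INT, item_key INT, branch_key INT,
           location_key INT, dollars_sold DECIMAL(10,2), units_sold INT,
           FOREIGN KEY (time_key) REFERENCES time_dim(time_key),
           FOREIGN KEY (item_key) REFERENCES item_dim(item_key),
           FOREIGN KEY (branch_key) REFERENCES branch_dim(branch_key),
           FOREIGN KEY (location_key) REFERENCES location_dim(location_key))""",
]

# Each dimension table holds one row per dimension member; the fact table holds the measures
with engine.begin() as con:
    for stmt in ddl:
        con.execute(text(stmt))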
Snowflake schema
Result:
Thus the implementation of the star, snowflake and fact constellation schemas using Weka and MySQL was executed successfully.
Ex:No:5 Design Data Warehouse for Real-Time Applications
Aim:
To build a Data Warehouse/Data Mart of source tables and populate sample data using the MySQL Administrator and SQLyog Enterprise tools.
Implement:
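The following is a minimal sketch of the implementation, assuming a local MySQL server reached through SQLAlchemy with the PyMySQL driver; the database name, login and the column layouts for the user_details and hockey tables mentioned in the result are illustrative assumptions.
# Create and populate two sample source tables in the data mart
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://root:@localhost:3306/realtime_dw")  # assumed

with engine.begin() as con:
    con.execute(text("""CREATE TABLE IF NOT EXISTS user_details (
        user_id INT PRIMARY KEY, user_name VARCHAR(50), city VARCHAR(30))"""))
    con.execute(text("""CREATE TABLE IF NOT EXISTS hockey (
        match_id INT PRIMARY KEY, team VARCHAR(30), goals INT)"""))
    con.execute(text("INSERT INTO user_details VALUES (1, 'arun', 'Chennai')"))
    con.execute(text("INSERT INTO hockey VALUES (1, 'India', 3)"))

print(pd.read_sql_table("user_details", engine))   # verify the populated data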
Result:
Thus a data warehouse for a real-time application, with sample user-details and hockey data tables, was designed successfully.
Ex:No:6 Implement and Analyse the Dimensional Modeling of Data Model
Aim:
To implement and analyse the dimensional modelling of a data model using the Weka tool with MySQL queries.
Procedure:
Step 1: Identify the Business Process
Step 2: Identify the Grain
Step 3: Identify the Dimensions
Step 4: Identify the Facts
Step 5: Build the Schema
Implementation:
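The following is a minimal pandas sketch of the five steps applied to an assumed retail-sales process; the sample rows and column names are illustrative only and are not part of the lab data.
# Steps 1-2: business process = retail sales; grain = one row per sale line item
import pandas as pd

sales = pd.DataFrame({
    "date":    ["2024-01-01", "2024-01-01", "2024-01-02"],
    "product": ["pen", "book", "pen"],
    "store":   ["S1", "S2", "S1"],
    "units":   [10, 2, 5],
    "amount":  [50.0, 300.0, 25.0],
})

# Step 3: dimensions = date, product, store (each gets a surrogate key)
date_dim = sales[["date"]].drop_duplicates().reset_index(drop=True)
date_dim["date_key"] = date_dim.index + 1
product_dim = sales[["product"]].drop_duplicates().reset_index(drop=True)
product_dim["product_key"] = product_dim.index + 1
store_dim = sales[["store"]].drop_duplicates().reset_index(drop=True)
store_dim["store_key"] = store_dim.index + 1

# Steps 4-5: facts = units and amount; the fact table keeps only keys and measures
fact = (sales.merge(date_dim).merge(product_dim).merge(store_dim)
             [["date_key", "product_key", "store_key", "units", "amount"]])
print(fact)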
Result:
Thus the implementation and analysis of the dimensional data model using the Weka tool with MySQL queries was completed successfully.
Ex:No:7 Perform various OLAP operations such as slice, dice, roll up, drill down and pivot
Aim:
To perform various OLAP operations such as slice, dice, roll up, drill down and pivot using Microsoft Excel.
Procedure:
Step 1: Open Microsoft Excel, go to the Data tab at the top and click on "Existing Connections".
Step 2: The Existing Connections window will open; there the "Browse for more" option should be clicked to import a .cub extension file for performing OLAP operations.
Step 3: Select "PivotTable Report" and click "OK".
Step 4: Analyse the different OLAP operations. Firstly, the drill-down operation is performed.
Step 5: Perform the roll-up (drill-up) operation.
Step 6: The next OLAP operation, slicing, is performed by inserting a slicer.
Step 7: The dicing operation is similar to the slicing operation.
Step 8: Finally, the pivot (rotate) OLAP operation is performed by swapping rows and columns.
Step 9: After visualization, save and exit the process.
Implement:
Open data
Import data
Drill down
Drill Up
Slice
Dice
Pivot table
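The same OLAP operations can be illustrated outside Excel with a short pandas sketch; the sales table below is an assumed stand-in for the cube file, and pivot_table plays the role of the cube.
# Build a small cube and apply slice, dice, roll-up, drill-down and pivot
import pandas as pd

sales = pd.DataFrame({
    "year":    [2023, 2023, 2023, 2024, 2024, 2024],
    "quarter": ["Q1", "Q2", "Q2", "Q1", "Q1", "Q2"],
    "region":  ["North", "North", "South", "South", "North", "South"],
    "product": ["pen", "book", "pen", "book", "pen", "book"],
    "amount":  [100, 200, 150, 300, 120, 250],
})

cube = pd.pivot_table(sales, values="amount", index=["year", "quarter"],
                      columns="region", aggfunc="sum")

print(cube)                                   # base cube: amount by (year, quarter) x region
print(cube.xs(2023, level="year"))            # slice: fix one dimension (year = 2023)
print(sales[(sales.year == 2023) & (sales.region == "North")])   # dice: select a sub-cube
print(cube.groupby(level="year").sum())       # roll-up: aggregate quarter up to year
print(pd.pivot_table(sales, values="amount",  # drill-down: add the product level
                     index=["year", "quarter", "product"],
                     columns="region", aggfunc="sum"))
print(cube.T)                                 # pivot (rotate): swap rows and columns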
Result:
Thus the implementation of OLAP operations using Microsoft Excel was executed successfully.
Ex:No:8 Write ETL Scripts of OLTP Operations and Implement Using Data Warehouse Tools
Aim:
To implement OLTP operations using an ETL tool for the extraction of data from several sources and its cleansing, customization, reformatting, integration, and insertion into a data warehouse.
Procedure:
Step 1: Open the tool.
Step 2: Set up: download or create the data, save it in one folder, then select and copy the file path.
Step 3: Extract the features of the data, or retrieve the data.
Step 4: Transform the data from one format to another.
Step 5: Load the data from the sources into the target.
Step 6: Save and exit the process.
Implement:
Extract
# Import statements
import sqlalchemy
from sqlalchemy import create_engine, text
from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey
from sqlalchemy import inspect
# Connect the engine to the database file we'll be using
engine = create_engine('sqlite:///chinook.db')
DB Query
# SQL Expression Language creates metadata that contains objects describing the database tables
metadata = MetaData()
# Reflect the tables that already exist in the database
# the engine is connected to.
metadata.reflect(bind=engine)
# Checking this out, we can see the table structure and variable types for the employees table
inspector = inspect(engine)
# Checked out the columns in the employees table
inspector.get_columns('employees')
# Does their length of tenure map to how many customers they helped?
with engine.connect() as con:
    rs = con.execute(text("""SELECT MIN(HireDate), EmployeeId
                             FROM employees;"""))
    for row in rs:
        print(row)
with engine.connect() as con:
    # Grab the variables you want, then inner join the tables on their primary/foreign keys
    rs = con.execute(text(
        """SELECT
               invoices.InvoiceId AS invid,
               invoices.CustomerId AS invcustid,
               customers.CustomerId AS custcustid,
               COUNT(customers.CustomerId) AS numcustomers,
               customers.Country AS country,
               invoice_items.InvoiceId AS invitemid,
               invoice_items.TrackId AS invtrackid,
               tracks.TrackId AS tracktrackid,
               tracks.GenreId AS trackgenreid,
               tracks.Bytes AS trackbytes,
               SUM(tracks.Milliseconds) / 1000 / 60 AS minutes
           FROM
               invoices
               INNER JOIN customers ON customers.CustomerId = invoices.CustomerId
               INNER JOIN invoice_items ON invoice_items.InvoiceId = invoices.InvoiceId
               INNER JOIN tracks ON tracks.TrackId = invoice_items.TrackId
           GROUP BY country
           ORDER BY minutes DESC
        """
    ))
    for row in rs:
        print(row)
Load
# Connecting the query to pd.read_sql_query. To simplify, you could modify the query to create
# a table and then just use pd.read_sql_table to load it into the dataframe.
import pandas as pd

df = pd.read_sql_query("""SELECT
        invoices.InvoiceId AS invid,
        invoices.CustomerId AS invcustid,
        customers.CustomerId AS custcustid,
        COUNT(customers.CustomerId) AS numcustomers,
        customers.Country AS country,
        invoice_items.InvoiceId AS invitemid,
        invoice_items.TrackId AS invtrackid,
        tracks.TrackId AS tracktrackid,
        tracks.GenreId AS trackgenreid,
        tracks.Bytes AS trackbytes,
        SUM(tracks.Milliseconds) / 1000 / 60 AS minutes
    FROM
        invoices
        INNER JOIN customers ON customers.CustomerId = invoices.CustomerId
        INNER JOIN invoice_items ON invoice_items.InvoiceId = invoices.InvoiceId
        INNER JOIN tracks ON tracks.TrackId = invoice_items.TrackId
    GROUP BY country
    """, con=engine.connect())
Output:
Employees table
Invoice customer
Result:
Thus the ETL scripts for OLTP operations were written and implemented successfully using data warehouse tools.