Data Warehousing Record

The document outlines a laboratory record for a Data Warehousing course, detailing various experiments conducted using the WEKA tool. It includes aims, algorithms, and results for tasks such as data exploration, data validation, architecture planning for real-time applications, schema definition, and dimensional modeling. Additionally, it presents case studies on OLAP and OTLP implementations in a retail and technology consulting context, highlighting challenges and solutions in data management and analysis.

Uploaded by

suntharan4321

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CCS341 DATA WAREHOUSING

REGULATION 2021
LABORATORY RECORD

This is to certify that this is a bonafide record of the work done by Ms/Mr

……………………………………………. Register number ………………………………..

in the ……………………………………………laboratory in the ……………….Semester.

Staff-in-charge Head of the Department

Submitted for the University Examination held on ………………………………..

Internal Examiner External Examiner

INDEX

EX. NO. | DATE | NAME OF THE EXPERIMENT | PAGE NO. | SIGNATURE
EXP 1. Data exploration and integration with WEKA

AIM:

To explore data and integrate it with WEKA.

ALGORITHM AND EXPLORATION:

1. Download and install Weka. You can find it here:

http://www.cs.waikato.ac.nz/ml/weka/downloading.html

2. Open the WEKA tool and select the Explorer option.

3. A new window opens with several tabs (Preprocess, Associate, etc.).

4. In the Preprocess tab, click the "Open file" option.

5. Go to C:\Program Files\Weka-3-6\data to find the bundled .arff datasets.

Click on any dataset to load it; the data is then displayed in the Preprocess tab.

Load each dataset and observe the following:

Here we have taken the IRIS.arff dataset as a sample for the observations below.

i. List the attribute names and their types

The loaded dataset (IRIS.arff) has 5 attributes with the following data types:
sepallength – Numeric
sepalwidth – Numeric
petallength – Numeric
petalwidth – Numeric
class – Nominal

ii. Number of records in each dataset

There are 150 records (instances) in the dataset (IRIS.arff).

iii. Identify the class attribute (if any)

There is one class attribute, which has 3 labels:
1. Iris-setosa 2. Iris-versicolor 3. Iris-virginica

iv. Plot Histogram

v. Determine the number of records for each class.

The class attribute covers all 150 records, split evenly across its 3 labels:
1. Iris-setosa – 50 records 2. Iris-versicolor – 50 records 3. Iris-virginica – 50 records

vi. Visualize the data in various dimensions
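The observations in items i, ii and v can also be checked outside the WEKA GUI. The following is a minimal Python sketch, assuming a simplified ARFF file without quoted names or sparse rows. The four data rows in `ARFF_SAMPLE` are a hypothetical excerpt (the real file has 150 rows), and `parse_arff` is an illustrative helper, not part of WEKA.

```python
from collections import Counter

# Hypothetical four-row excerpt in ARFF format; the real iris.arff has 150 data rows.
ARFF_SAMPLE = """\
@relation iris
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def parse_arff(text):
    """Return (attributes, rows) from a simple ARFF string.

    attributes is a list of (name, type) pairs; a {a,b,c} value list is
    reported as 'nominal'. Quoted names and sparse rows are not handled.
    """
    attributes, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blanks and ARFF comments
            continue
        lower = line.lower()
        if in_data:
            rows.append(line.split(","))
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            kind = "nominal" if spec.startswith("{") else spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


attributes, rows = parse_arff(ARFF_SAMPLE)
class_counts = Counter(row[-1] for row in rows)  # class is the last attribute

for name, kind in attributes:
    print(f"{name}: {kind}")
print("instances:", len(rows))
print("per class:", dict(class_counts))
```

Pointing the same helper at the full file (read with `open(...).read()`) should report the 150 instances and the per-class counts listed above, provided the file follows the simple layout assumed here.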

RESULT:
EXP 2. Apply WEKA tool for Data Validation

AIM:

To apply the WEKA tool for data validation.

Steps:

1. Load the dataset (Iris-2D.arff) into the WEKA tool.

2. Go to the Classify tab; the left-hand panel lists the different classification
algorithms, including those under the rules section.

3. Select the JRip (if-then rule) algorithm and click Start with the "Use
training set" test option enabled.

4. WEKA then reports the detailed accuracy by class (TP rate, FP rate, precision,
recall and F-measure) together with the confusion matrix.

Using Cross-Validation Strategy with 10 folds:

Here, we enabled the Cross-validation test option with 10 folds and clicked the Start button.

Using Cross-Validation Strategy with 20 folds:

Here, we enabled the Cross-validation test option with 20 folds and clicked the Start button.

Comparing the two cross-validation runs, the error rate was lower with 20 folds, which
gave 97.3% correctly classified instances, than with 10 folds, which gave 94.6%.
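To make the k-fold procedure concrete, here is a minimal pure-Python sketch. It is not WEKA's implementation: a simple nearest-centroid classifier stands in for JRip, the folds are not stratified, and the two-class data set is synthetic. All function names are illustrative assumptions.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most 1."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def nearest_centroid_fit(X, y):
    """Return {label: centroid}; a stand-in for a real learner such as JRip."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def nearest_centroid_predict(model, features):
    """Predict the label whose centroid is closest in squared distance."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda lab: dist2(model[lab]))


def cross_validate(X, y, k, seed=0):
    """Return the fraction of instances classified correctly over k folds."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Synthetic, well-separated two-class data (hypothetical, not the iris file).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)] + \
    [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(30)]
y = ["a"] * 30 + ["b"] * 30

for k in (10, 20):
    print(k, "folds:", cross_validate(X, y, k))
```

Every instance is held out exactly once, so the returned fraction corresponds to the "correctly classified instances" percentage that WEKA reports. Differences between 10 and 20 folds on a small data set can come from how the folds happen to be drawn, so a single comparison is only indicative.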
RESULT:
EXP 3. Plan the architecture for a real-time application

AIM:

To plan the architecture for a real-time application using WEKA.

WEKA is a popular machine learning toolkit that provides various algorithms for data
mining and predictive modelling. Planning the architecture around it means considering
several factors.

Here are the steps to plan the architecture:

1. Define the problem: Clearly understand the problem you are trying to solve with your
real-time application. Identify the specific tasks and goals you want to achieve using Weka.

2. Data collection and preprocessing: Determine the data sources and collect the required
data for your application. Preprocess the data to clean, transform, and prepare it for analysis
using Weka. This may involve tasks like data cleaning, feature selection, normalization, and
handling missing values.

3. Choose the appropriate WEKA algorithms: WEKA offers a wide range of machine
learning algorithms. Select the algorithms that are suitable for your problem and data.
Consider factors like the type of task (classification, regression, clustering), the size of the
dataset, and the computational requirements.

4. Real-time data streaming: If your application requires real-time data processing, you
need to set up a mechanism to stream the data continuously. This can be done using
technologies like Apache Kafka, Apache Flink, or Apache Storm. Ensure that the data
streaming infrastructure is integrated with Weka for seamless processing.

5. Model training and evaluation: Train the selected Weka algorithms on your training
dataset. Evaluate the performance of the models using appropriate evaluation metrics like
accuracy, precision, recall, or F1-score. Fine-tune the models if necessary.

6. Integration and deployment: Integrate the trained models into your real-time
application. This may involve developing APIs or microservices to expose the models'
functionality. Ensure that the application can handle real-time requests and provide
predictions or insights in a timely manner.

7. Monitoring and maintenance: Set up monitoring mechanisms to track the performance
of your real-time application. Monitor the accuracy and performance of the models over time.
Update the models periodically to adapt to changing data patterns or to improve performance.

Remember to document your architecture design and implementation decisions for future
reference. Regularly review and update your architecture as your application evolves and new
requirements arise.
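The evaluation metrics named in step 5 (accuracy, precision, recall, F1-score) can be computed directly from true and predicted labels. The sketch below is a minimal illustration for a binary task; the function name `classification_metrics` and the sample labels are assumptions made for this example.

```python
def classification_metrics(y_true, y_pred, positive):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = event of interest, 0 = everything else.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred, positive=1)
print(metrics)  # all four values are 0.75 for these labels
```

In a monitoring setup (step 7), the same function could be applied to a recent window of predictions whose true labels have since become known, so that a drop in any metric signals that the model may need retraining.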
RESULT:
EXP 4. Write the query for schema definition

AIM:

To write the queries for schema definition.

ALGORITHM:

1. Create a new database

2. Switch to the newly created database

3. Define the schema for each table

4. Define relationships between tables (if needed)

5. Execute the schema definition queries

PROGRAM:

-- Create a new database named "library"
CREATE DATABASE library;

-- Switch to the "library" database
USE library;

-- Define the schema for the "books" table
CREATE TABLE books (
    book_id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    author VARCHAR(100) NOT NULL,
    publication_year INT,
    isbn VARCHAR(20),
    available BOOLEAN DEFAULT TRUE
);

-- Define the schema for the "members" table
CREATE TABLE members (
    member_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE,
    phone_number VARCHAR(20),
    address VARCHAR(255)
);

-- Define the schema for the "checkouts" table
CREATE TABLE checkouts (
    checkout_id INT AUTO_INCREMENT PRIMARY KEY,
    book_id INT NOT NULL,
    member_id INT NOT NULL,
    checkout_date DATE NOT NULL,
    return_date DATE,
    FOREIGN KEY (book_id) REFERENCES books(book_id),
    FOREIGN KEY (member_id) REFERENCES members(member_id)
);

OUTPUT:

Database 'library' created.

Database changed to 'library'.

Table 'books' created successfully.

Table 'members' created successfully.

Table 'checkouts' created successfully
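The schema can be tried without a database server. The sketch below uses Python's built-in sqlite3 module with an in-memory database. This is an adaptation, not the MySQL program above: SQLite has no CREATE DATABASE or USE, spells the keyword AUTOINCREMENT, and enforces foreign keys only after `PRAGMA foreign_keys = ON`. The inserted sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# SQLite adaptation of the library schema (types and AUTOINCREMENT spelling differ from MySQL).
conn.executescript("""
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    author TEXT NOT NULL,
    publication_year INTEGER,
    isbn TEXT,
    available INTEGER DEFAULT 1
);
CREATE TABLE members (
    member_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    email TEXT UNIQUE,
    phone_number TEXT,
    address TEXT
);
CREATE TABLE checkouts (
    checkout_id INTEGER PRIMARY KEY AUTOINCREMENT,
    book_id INTEGER NOT NULL,
    member_id INTEGER NOT NULL,
    checkout_date TEXT NOT NULL,
    return_date TEXT,
    FOREIGN KEY (book_id) REFERENCES books(book_id),
    FOREIGN KEY (member_id) REFERENCES members(member_id)
);
""")

# Hypothetical sample rows to exercise the relationships.
conn.execute("INSERT INTO books (title, author) VALUES (?, ?)",
             ("Sample Title", "Sample Author"))
conn.execute("INSERT INTO members (name, email) VALUES (?, ?)",
             ("Sample Member", "member@example.com"))
conn.execute("INSERT INTO checkouts (book_id, member_id, checkout_date) "
             "VALUES (1, 1, '2024-01-01')")

# A checkout that points at a non-existent book should violate the foreign key.
try:
    conn.execute("INSERT INTO checkouts (book_id, member_id, checkout_date) "
                 "VALUES (999, 1, '2024-01-02')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
print(tables)
print("orphan checkout rejected:", fk_rejected)
```

The rejected insert shows what step 4 of the algorithm (defining relationships) buys in practice: a checkout cannot reference a book or member that does not exist.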

RESULT:
EXP 5. Design a data warehouse for real-time applications

AIM:

To design a data warehouse for real-time applications.

ALGORITHM AND PROGRAM:

1. Data Sources and Integration:

sql

-- Example: Creating a Snowpipe for real-time data ingestion from an external stage
-- (assumes each JSON record carries timestamp, sensor_id and temperature keys)
CREATE PIPE snowpipe_real_time
    AUTO_INGEST = TRUE
AS
    COPY INTO temperature_data
    FROM (SELECT $1:timestamp::timestamp, $1:sensor_id::int, $1:temperature::float FROM @real_time_stage)
    FILE_FORMAT = (TYPE = 'JSON');

2. Data Storage and Modeling:

sql

-- Example: Creating tables for storing real-time data
CREATE TABLE temperature_data (
    timestamp TIMESTAMP,
    sensor_id INT,
    temperature FLOAT
);

3. Data Governance and Security:

sql

-- Example: Creating roles and granting privileges
CREATE ROLE analyst_role;
GRANT USAGE ON DATABASE my_database TO analyst_role;
GRANT SELECT ON temperature_data TO analyst_role;

4. Monitoring and Performance Optimization:

sql

-- Example: Monitoring query performance using Snowflake's query history
SELECT *
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_USER(USER_NAME => 'ANALYST_USER'));

5. Deployment and Testing:

Deployment would involve setting up Snowflake accounts, databases, and resources,
which is typically done through the Snowflake web interface or via Snowflake's
APIs. Testing would involve validating the data ingestion process, querying data, and
ensuring proper access controls.

6. Training and Documentation:

Training sessions and documentation would cover topics such as Snowflake SQL
syntax, data modeling best practices, and security principles.

7. Iterative Improvement and Maintenance:

This would involve ongoing monitoring of system performance, optimizing queries
and data models as needed, and iterating on the data warehouse design based on user
feedback and evolving business requirements.

OUTPUT:

+---------------------+-----------+-------------+
| timestamp           | sensor_id | temperature |
|---------------------+-----------+-------------|
| 2024-02-06 10:00:00 | 1         | 25.5        |
| 2024-02-06 10:01:00 | 2         | 26.3        |
| 2024-02-06 10:02:00 | 1         | 24.8        |
| 2024-02-06 10:02:30 | 3         | 27.1        |
| 2024-02-06 10:03:00 | 2         | 26.7        |
| 2024-02-06 10:04:00 | 1         | 25.2        |
+---------------------+-----------+-------------+
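For readers without a Snowflake account, the ingest-then-query flow can be approximated locally. The sketch below is an assumption-laden stand-in, not Snowflake: it uses Python's sqlite3 module, feeds the six readings from the output table in micro-batches of two to mimic continuous arrival, and then runs a per-sensor summary. The helper name `ingest` and the batch size are illustrative.

```python
import sqlite3

# The six readings from the output table, in arrival order.
READINGS = [
    ("2024-02-06 10:00:00", 1, 25.5),
    ("2024-02-06 10:01:00", 2, 26.3),
    ("2024-02-06 10:02:00", 1, 24.8),
    ("2024-02-06 10:02:30", 3, 27.1),
    ("2024-02-06 10:03:00", 2, 26.7),
    ("2024-02-06 10:04:00", 1, 25.2),
]


def ingest(conn, batch):
    """Append one micro-batch and commit so that queries see it immediately."""
    conn.executemany("INSERT INTO temperature_data VALUES (?, ?, ?)", batch)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temperature_data (timestamp TEXT, sensor_id INTEGER, temperature REAL)"
)

# Simulate continuous arrival in micro-batches of two rows (batch size is arbitrary).
for start in range(0, len(READINGS), 2):
    ingest(conn, READINGS[start:start + 2])

# Per-sensor summary: reading count, average temperature, most recent timestamp.
summary = conn.execute("""
    SELECT sensor_id, COUNT(*), ROUND(AVG(temperature), 2), MAX(timestamp)
    FROM temperature_data
    GROUP BY sensor_id
    ORDER BY sensor_id
""").fetchall()

for row in summary:
    print(row)
```

Committing after each batch is the design choice that matters here: an analyst query issued between batches sees everything ingested so far, which is the behaviour a real-time warehouse is meant to provide.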
RESULT:
EXP 6. Analyse dimensional modeling

AIM:

To analyse dimensional modeling.

ALGORITHM:

1. Identify the business process

2. Identify dimensions and facts

3. Design the dimensional model

4. Define relationships

5. Optimize for query performance

PROGRAM:

1. Sales Fact Table:

sql

CREATE TABLE SalesFact (
    SaleID INT PRIMARY KEY,
    DateID INT,
    ProductID INT,
    QuantitySold INT,
    AmountSold DECIMAL(10, 2)
);

2. Date Dimension:

sql

CREATE TABLE DateDim (
    DateID INT PRIMARY KEY,
    CalendarDate DATE,
    Day INT,
    Month INT,
    Year INT
);

-- Populate Date Dimension (sample data; add more dates as needed)
INSERT INTO DateDim (DateID, CalendarDate, Day, Month, Year)
VALUES
    (1, '2024-01-01', 1, 1, 2024),
    (2, '2024-01-02', 2, 1, 2024);

3. Product Dimension:

sql

CREATE TABLE ProductDim (
    ProductID INT PRIMARY KEY,
    ProductName VARCHAR(255),
    Category VARCHAR(50)
    -- Additional attributes as needed
);

-- Populate Product Dimension (sample data; add more products as needed)
INSERT INTO ProductDim (ProductID, ProductName, Category)
VALUES
    (101, 'Product A', 'Electronics'),
    (102, 'Product B', 'Clothing');

4. Query to retrieve sales with date and product details:

sql

SELECT
    s.SaleID,
    d.CalendarDate,
    p.ProductName,
    s.QuantitySold,
    s.AmountSold
FROM
    SalesFact s
    JOIN DateDim d ON s.DateID = d.DateID
    JOIN ProductDim p ON s.ProductID = p.ProductID;

This query retrieves sales information along with corresponding date and product details,
leveraging the dimensional model.

OUTPUT:

| SaleID | CalendarDate | ProductName | QuantitySold | AmountSold |
|--------|--------------|-------------|--------------|------------|
| 1      | 2024-01-01   | Product A   | 10           | 100.00     |
| 2      | 2024-01-02   | Product B   | 5            | 50.00      |
| 3      | 2024-01-02   | Product A   | 8            | 80.00      |
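The star schema and its join can be run end to end with Python's sqlite3 module. This is a sketch under assumptions: the column types are adapted to SQLite, the fact table declares foreign keys to the dimensions (the listing above leaves them implicit), and the three SalesFact rows are inferred from the output table because the program does not show them being inserted. The second query, a roll-up by category, is an added illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DateDim (
    DateID INTEGER PRIMARY KEY, CalendarDate TEXT,
    Day INTEGER, Month INTEGER, Year INTEGER
);
CREATE TABLE ProductDim (
    ProductID INTEGER PRIMARY KEY, ProductName TEXT, Category TEXT
);
-- Fact table: one row per sale, pointing at both dimensions.
CREATE TABLE SalesFact (
    SaleID INTEGER PRIMARY KEY,
    DateID INTEGER REFERENCES DateDim(DateID),
    ProductID INTEGER REFERENCES ProductDim(ProductID),
    QuantitySold INTEGER,
    AmountSold REAL
);
INSERT INTO DateDim VALUES (1, '2024-01-01', 1, 1, 2024), (2, '2024-01-02', 2, 1, 2024);
INSERT INTO ProductDim VALUES (101, 'Product A', 'Electronics'), (102, 'Product B', 'Clothing');
-- Fact rows inferred from the output table above.
INSERT INTO SalesFact VALUES (1, 1, 101, 10, 100.00), (2, 2, 102, 5, 50.00), (3, 2, 101, 8, 80.00);
""")

# Detail query: the same join as in the program section.
detail = conn.execute("""
    SELECT s.SaleID, d.CalendarDate, p.ProductName, s.QuantitySold, s.AmountSold
    FROM SalesFact s
    JOIN DateDim d ON s.DateID = d.DateID
    JOIN ProductDim p ON s.ProductID = p.ProductID
    ORDER BY s.SaleID
""").fetchall()

# Roll-up: totals per product category, a typical dimensional query.
by_category = conn.execute("""
    SELECT p.Category, SUM(s.QuantitySold), SUM(s.AmountSold)
    FROM SalesFact s
    JOIN ProductDim p ON s.ProductID = p.ProductID
    GROUP BY p.Category
    ORDER BY p.Category
""").fetchall()

for row in detail:
    print(row)
print(by_category)
```

The roll-up shows why the descriptive attributes live in the dimension tables: grouping by `Category` needs no change to the fact table, only a join to `ProductDim`.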

RESULT:
EXP 7. Case study using OLAP

AIM:

To study a case using OLAP.

Introduction:

In this case study, we will explore how Online Analytical Processing (OLAP) technology was
implemented in a retail data warehousing environment to improve data analysis capabilities
and support decision-making processes. The case study will focus on a fictional retail
company, XYZ Retail, and the challenges they faced in managing and analyzing their vast
amounts of transactional data.

Background:

XYZ Retail is a large chain of stores with locations across the country. The company has
been experiencing rapid growth in recent years, leading to an increase in the volume of data
generated from sales transactions, inventory management, customer interactions, and other
operational activities. The existing data management system was struggling to keep up with
the demand for timely and accurate data analysis, hindering the company's ability to make
informed business decisions.

Challenges:

1. Lack of real-time data analysis: The existing data warehouse system was unable to provide
real-time insights into sales trends, inventory levels, and customer preferences.

2. Limited scalability: The data warehouse infrastructure was reaching its limits in terms of
storage capacity and processing power, making it difficult to handle the growing volume of
data.

3. Complex data relationships: The data stored in the warehouse was highly normalized,
making it challenging to perform complex queries and analyze data across multiple
dimensions.

Solution:

To address these challenges, XYZ Retail decided to implement an OLAP solution as part of
their data warehousing strategy. OLAP technology allows for multidimensional analysis of
data, enabling users to easily slice and dice information across various dimensions such as
time, product categories, geographic regions, and customer segments.

Implementation:
1. Data modeling: The data warehouse was redesigned using a star schema model, which
simplifies data relationships and facilitates OLAP cube creation.

2. OLAP cube creation: OLAP cubes were created to store pre-aggregated data for faster
query performance. The cubes were designed to support various dimensions and measures
relevant to the retail business.

3. Reporting and analysis: Business users were trained on how to use OLAP tools to create ad
hoc reports, perform trend analysis, and drill down into detailed data.
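The cube operations mentioned in this case study (pre-aggregation, roll-up, slice and dice) can be sketched in a few lines of Python. This is a toy in-memory model, not the OLAP product XYZ Retail would have used. The sales rows, the three dimensions and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, category, region, amount).
SALES = [
    (2023, "Electronics", "North", 120.0),
    (2023, "Electronics", "South", 80.0),
    (2023, "Clothing", "North", 50.0),
    (2024, "Electronics", "North", 200.0),
    (2024, "Clothing", "South", 70.0),
    (2024, "Clothing", "North", 30.0),
]
DIMENSIONS = ("year", "category", "region")


def build_cube(rows):
    """Pre-aggregate the measure for every combination of dimension values."""
    cube = defaultdict(float)
    for *coords, amount in rows:
        cube[tuple(coords)] += amount
    return dict(cube)


def roll_up(cube, keep):
    """Aggregate away every dimension that is not listed in keep."""
    idx = [DIMENSIONS.index(d) for d in keep]
    out = defaultdict(float)
    for coords, amount in cube.items():
        out[tuple(coords[i] for i in idx)] += amount
    return dict(out)


def dice(cube, **filters):
    """Keep cells whose coordinates fall in the allowed set for each named dimension.

    Filtering on a single dimension value is the classic 'slice'.
    """
    def matches(coords):
        return all(coords[DIMENSIONS.index(d)] in allowed
                   for d, allowed in filters.items())
    return {c: v for c, v in cube.items() if matches(c)}


cube = build_cube(SALES)
by_year = roll_up(cube, ["year"])                       # totals per year
slice_2024 = dice(cube, year={2024})                    # slice: one year only
by_category_2024 = roll_up(slice_2024, ["category"])    # drill into the slice
north_electronics = dice(cube, category={"Electronics"}, region={"North"})

print(by_year)
print(by_category_2024)
print(sum(north_electronics.values()))
```

Because the cube already holds one total per cell, each roll-up touches only the aggregated cells rather than the raw transactions, which is the source of the faster query performance the case study attributes to OLAP cubes.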

Results:

1. Improved data analysis: With OLAP technology in place, XYZ Retail was able to perform
complex analyses on sales data, identify trends, and make informed decisions based on real-
time insights.

2. Faster query performance: OLAP cubes enabled faster query performance compared to
traditional relational databases, allowing users to retrieve data more efficiently.

3. Enhanced decision-making: The ability to analyze data across multiple dimensions helped
XYZ Retail gain a deeper understanding of their business operations and customer behavior,
leading to more strategic decision-making.

Conclusion:

By leveraging OLAP technology in their data warehousing environment, XYZ Retail was
able to overcome the challenges of managing and analyzing vast amounts of data. The
implementation of OLAP not only improved data analysis capabilities but also empowered
business users to make informed decisions based on real-time insights. This case study
demonstrates the value of OLAP in enhancing data analysis and decision-making processes in
a retail environment.

RESULT:
EXP 8. Case study using OTLP

AIM:

To study a case using OTLP.

Introduction:

This case study explores the implementation of the Operational Data Layer Pattern (OTLP) in
a data warehousing environment to improve data integration, processing, and analytics
capabilities. The case study focuses on a fictional company, Tech Solutions Inc., and how
they leveraged OTLP to enhance their data warehousing operations.

Background:

Tech Solutions Inc. is a technology consulting firm that provides IT solutions to various
clients. The company collects a vast amount of data from different sources, including
customer interactions, sales transactions, and operational activities. The existing data
warehouse infrastructure was struggling to handle the growing volume of data and provide
real-time insights for decision-making.

Challenges:

1. Data silos: Data from different sources were stored in separate silos, making it difficult to
integrate and analyze data effectively.

2. Real-time data processing: The existing data warehouse was not capable of processing
real-time data streams, leading to delays in data analysis and decision-making.

3. Scalability: The data warehouse infrastructure was reaching its limits in terms of storage
capacity and processing power, hindering the company's ability to scale with the growing
data volume.

Solution:

To address these challenges, Tech Solutions Inc. decided to implement the OTLP pattern in
their data warehousing environment. OTLP combines elements of both Operational Data
Store (ODS) and Traditional Data Warehouse (TDW) architectures to enable real-time data
processing, data integration, and analytical capabilities.

Implementation:

1. Data integration: Tech Solutions Inc. integrated data from various sources into the
operational data layer, where data transformations and cleansing processes were applied.

2. Real-time processing: The OTLP architecture allowed for real-time data processing,
enabling the company to analyze streaming data and generate insights in near real-time.
3. Analytics and reporting: Business users were provided with self-service analytics tools to
create ad-hoc reports, perform trend analysis, and gain actionable insights from the integrated
data.
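The integration step (cleansing records from separate silos and merging them into one unified view) can be illustrated with a small Python sketch. Everything in it is hypothetical: the two source layouts, the field names, and the cleansing rules are assumptions chosen only to show the idea, not a description of Tech Solutions Inc.'s system.

```python
# Hypothetical records from two silos with differing field names and formats.
CRM_ROWS = [
    {"CustomerID": "C001", "Email": " Alice@Example.com ", "Name": "Alice"},
    {"CustomerID": "C002", "Email": "bob@example.com", "Name": "Bob"},
]
SALES_ROWS = [
    {"cust": "c001", "amount": "120.50"},
    {"cust": "C002", "amount": "80"},
    {"cust": "C001", "amount": "not-a-number"},  # fails cleansing
    {"cust": "C003", "amount": "15.00"},         # no matching customer
]


def clean_customer(row):
    """Normalise key case and strip whitespace so that the silos can be joined."""
    return {
        "customer_id": row["CustomerID"].strip().upper(),
        "email": row["Email"].strip().lower(),
        "name": row["Name"].strip(),
    }


def clean_sale(row):
    """Return a cleansed sale, or None when the amount is not numeric."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None
    return {"customer_id": row["cust"].strip().upper(), "amount": amount}


def integrate(crm_rows, sales_rows):
    """Build one unified record per customer and collect rejected sales rows."""
    customers = {}
    for raw in crm_rows:
        customer = clean_customer(raw)
        customers[customer["customer_id"]] = dict(customer, total_sales=0.0, orders=0)
    rejected = []
    for raw in sales_rows:
        sale = clean_sale(raw)
        if sale is None or sale["customer_id"] not in customers:
            rejected.append(raw)  # kept aside for review instead of silently dropped
            continue
        record = customers[sale["customer_id"]]
        record["total_sales"] += sale["amount"]
        record["orders"] += 1
    return customers, rejected


unified, rejected = integrate(CRM_ROWS, SALES_ROWS)
for record in unified.values():
    print(record)
print("rejected rows:", len(rejected))
```

Keeping the rejected rows, rather than discarding them, gives the data team something to reconcile against the source systems when the unified totals are questioned.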

Results:

1. Improved data integration: The OTLP architecture facilitated seamless integration of data
from multiple sources, breaking down data silos and enabling a unified view of the
company's operations.

2. Real-time analytics: With OTLP in place, Tech Solutions Inc. was able to analyze
streaming data in real-time, allowing for faster decision-making and response to market
trends.

3. Scalability: The OTLP architecture provided scalability to handle the growing volume of
data, ensuring that the company's data warehousing operations could support future growth.

Conclusion:

By implementing the Operational Data Layer Pattern (OTLP) in their data warehousing
environment, Tech Solutions Inc. was able to overcome the challenges of data silos, real-time
data processing, and scalability. The adoption of OTLP not only improved data integration
and analytics capabilities but also empowered business users to make informed decisions
based on real-time insights. This case study highlights the benefits of leveraging OTLP in
enhancing data warehousing operations for improved business outcomes.

RESULT:
EXP 9. Implementation of warehouse testing

AIM:

To implement warehouse testing.

Steps with program:

1. Install necessary libraries: pip install pytest pandas

2. Create a Python script for data transformation and loading:

# data_transformation.py
import pandas as pd


def transform_data(input_data):
    # Perform data transformation logic here (this example doubles every value)
    transformed_data = input_data.apply(lambda x: x * 2)
    return transformed_data


def load_data(transformed_data):
    # Load transformed data into the operational data layer
    transformed_data.to_csv('transformed_data.csv', index=False)

3. Create test cases using pytest:

# test_data_integration.py
import pandas as pd
import data_transformation


def test_transform_data():
    input_data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
    expected_output = pd.DataFrame({'A': [2, 4, 6], 'B': [8, 10, 12]})
    transformed_data = data_transformation.transform_data(input_data)
    assert transformed_data.equals(expected_output)


def test_load_data():
    input_data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
    data_transformation.load_data(input_data)
    # Read the file back and check that it round-trips unchanged
    loaded_data = pd.read_csv('transformed_data.csv')
    assert input_data.equals(loaded_data)

4. Run the tests using pytest: pytest test_data_integration.py

5. Analyze the test results to ensure that the data transformation and loading processes are
functioning correctly in the operational data layer.

By implementing automated tests for data integration processes in the data warehousing
environment, you can ensure the accuracy and reliability of the data transformation and
loading operations. This approach helps in identifying any issues or discrepancies early on in
the development cycle, leading to a more robust and efficient data warehousing system.
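Beyond unit tests on the transformation code, warehouse testing usually includes reconciliation checks between the source and the loaded target. The sketch below is one possible set of such checks, written with Python's sqlite3 module so it runs without extra libraries. The table names, the column names and the `run_checks` helper are hypothetical.

```python
import sqlite3


def run_checks(conn, source, target, key, measure):
    """Return {check_name: passed} for basic source-to-target reconciliation.

    Table and column names are interpolated into SQL, so they must come from
    trusted configuration, never from user input.
    """
    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    return {
        "row_count_matches":
            scalar(f"SELECT COUNT(*) FROM {source}")
            == scalar(f"SELECT COUNT(*) FROM {target}"),
        "no_null_keys":
            scalar(f"SELECT COUNT(*) FROM {target} WHERE {key} IS NULL") == 0,
        "no_duplicate_keys":
            scalar(f"SELECT COUNT({key}) - COUNT(DISTINCT {key}) FROM {target}") == 0,
        "measure_total_matches":
            abs(scalar(f"SELECT TOTAL({measure}) FROM {source}")
                - scalar(f"SELECT TOTAL({measure}) FROM {target}")) < 1e-9,
    }


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_sales (sale_id INTEGER, amount REAL);
CREATE TABLE warehouse_sales (sale_id INTEGER, amount REAL);
CREATE TABLE warehouse_sales_faulty (sale_id INTEGER, amount REAL);
INSERT INTO staging_sales VALUES (1, 100.0), (2, 50.0), (3, 80.0);
INSERT INTO warehouse_sales VALUES (1, 100.0), (2, 50.0), (3, 80.0);
-- Faulty load: sale 2 was loaded twice and sale 3 was lost.
INSERT INTO warehouse_sales_faulty VALUES (1, 100.0), (2, 50.0), (2, 50.0);
""")

good = run_checks(conn, "staging_sales", "warehouse_sales", "sale_id", "amount")
bad = run_checks(conn, "staging_sales", "warehouse_sales_faulty", "sale_id", "amount")
print(good)
print(bad)
```

The faulty load passes the row-count check even though it is wrong, which is why the duplicate-key and measure-total checks are run alongside it. Each check could be wrapped in a pytest test function in the same style as step 3.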

OUTPUT:

RESULT:
