Ex 1 - DATA EXPLORATION AND INTEGRATION WITH WEKA
AIM:
To Explore Data and Integrate with WEKA
ALGORITHM:
1. Download and install Weka. You can find it here: https://sourceforge.net/projects/weka/
2. Open the Weka tool and select the Explorer option.
3. A new window opens, consisting of different tabs (Preprocess, Classify, Associate, etc.).
4. In the Preprocess tab, click the "Open file" option.
5. Go to C:\Program Files\Weka-3-8-6\data to find the existing .arff datasets. Click on any dataset to load it; the data will then be displayed. (An optional programmatic cross-check is sketched below.)
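The same exploration can optionally be cross-checked outside the Weka GUI. The following is a minimal Python sketch, assuming scipy and pandas are installed and that the bundled weather.nominal.arff file is at the path shown (adjust it to your install); it is not part of the Weka procedure itself.

# explore_arff.py - optional programmatic cross-check of a Weka dataset
# Assumes: scipy and pandas are installed; the path below is a typical install location.
from scipy.io import arff
import pandas as pd

ARFF_PATH = r"C:\Program Files\Weka-3-8-6\data\weather.nominal.arff"  # adjust as needed

# Load the ARFF file into a structured array plus metadata
data, meta = arff.loadarff(ARFF_PATH)
df = pd.DataFrame(data)

# Nominal attributes are read as bytes; decode them to strings for readability
for col in df.columns:
    if df[col].dtype == object:
        df[col] = df[col].str.decode("utf-8")

print(meta)                         # attribute names and types, as the Preprocess tab shows
print(df.head())                    # first few instances
print(df.describe(include="all"))   # simple summary statistics per attribute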
OUTPUT:
RESULT:
Thus data exploration and integration with WEKA was executed successfully.
Ex 2 - APPLY WEKA TOOL FOR DATA VALIDATION
AIM:
To Apply WEKA tool for Data Validation
ALGORITHM:
1. Click the "Open file ..." option under the Preprocess tab and select the weather.nominal.arff file.
2. Go to the Classify tab; in the left-hand navigation pane we can see the different classification algorithms grouped into sections (rules, trees, etc.).
3. Click on the Choose button in the Filter sub-window and select the following filters:
weka->filters->supervised->attribute->Discretize
weka->filters->supervised->attribute->AttributeSelection
Click on the Apply button and examine the temperature and/or humidity attribute.
4. Selecting Weka classifiers.
5. Setting the test options, as listed below:
Training set
Supplied test set
Cross-validation
Percentage split
6. Selecting the classifier: click on the Choose button and select the following classifier:
weka->classifiers->trees->J48 (see the programmatic sketch after these steps).
7. Visualize Results
Select Visualize tree to get a visual representation of the decision tree.
Selecting Visualize classifier errors plots the results of the classification.
8. The current plot is outlook versus play.
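As an optional cross-check outside Weka, the same validation workflow (choose a classifier, then evaluate it with cross-validation or a percentage split) can be sketched in Python with scikit-learn. This is only a hedged analogue: DecisionTreeClassifier is not Weka's J48, and the tiny weather-style dataset below is made up for illustration.

# validate_sketch.py - scikit-learn analogue of the Weka test options
# Assumes: pandas and scikit-learn are installed; the data below is illustrative only.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Tiny nominal dataset in the spirit of weather.nominal (made-up values)
df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rainy", "rainy", "overcast", "sunny", "rainy"],
    "windy":   [False, True, False, False, True, True, False, True],
    "play":    ["no", "no", "yes", "yes", "no", "yes", "yes", "no"],
})

X = pd.get_dummies(df[["outlook", "windy"]])   # one-hot encode the nominal attributes
y = df["play"]

clf = DecisionTreeClassifier(random_state=0)   # rough analogue of choosing trees->J48

# "Cross-validation" test option (small fold count because the toy data is tiny)
scores = cross_val_score(clf, X, y, cv=2)
print("cross-validation accuracy per fold:", scores)

# "Percentage split" test option (e.g., 66% train / 34% test)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.66, random_state=0)
clf.fit(X_tr, y_tr)
print("percentage-split accuracy:", clf.score(X_te, y_te))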
OUTPUT:
RESULT:
Thus data validation using the WEKA tool was done successfully.
Ex 3 - PLAN THE ARCHITECTURE FOR REAL TIME APPLICATION
AIM:
To plan the architecture for a real-time application, considering factors such as scalability, performance, reliability, and maintainability.
ALGORITHM:
Here's a high-level guide to help you plan the architecture for your real-time application:
Define Requirements:
Clearly define the functional and non-functional requirements of your real-time application.
Identify the specific use cases and scenarios that require real-time processing.
System Components:
Identify the major components of your system. This could include servers, databases, user
interfaces, external APIs, and more.
Divide the system into smaller, manageable modules that can be developed, tested, and deployed
independently.
Scalability:
Plan for scalability from the beginning. Consider how the system will handle an increase in load
and user activity.
Use scalable infrastructure, such as cloud services, to easily adapt to changing demands.
Data Storage:
Choose an appropriate database solution for real-time data storage and retrieval.
Consider using in-memory databases or caching mechanisms to improve data access speed.
Real-time Processing:
Decide on the technologies and frameworks for real-time processing. This may include stream processing and messaging systems such as Apache Kafka, Apache Flink, or RabbitMQ.
Implement mechanisms for an event-driven architecture to handle real-time events efficiently.
Communication:
Establish efficient communication channels between the different components of the system. APIs, message queues, and WebSocket protocols are common choices for real-time communication.
Ensure low latency and high throughput for communication between components.
Fault Tolerance:
Design the system to be fault-tolerant. Use redundant components, implement backup and recovery strategies, and handle errors gracefully.
Consider a microservices architecture to isolate failures and improve overall system resilience.
Security:
Prioritize security measures to protect real-time data and communication.
Implement secure communication protocols, access controls, and encryption mechanisms.
Monitoring and Analytics:
Incorporate monitoring tools to track the performance of your real-time application.
Use analytics to gain insights into user behavior, system performance, and potential issues.
Testing:
Develop a comprehensive testing strategy, including unit testing, integration testing, and
performance testing.
Implement continuous integration and continuous deployment (CI/CD) pipelines to automate
testing and deployment processes.
Documentation:
Document the architecture, APIs, and data flow to facilitate easier maintenance and future
development. Include clear documentation on how to troubleshoot and resolve common issues.
Compliance:
Ensure that your real-time application complies with relevant regulations and standards, especially if it involves sensitive data.
ARCHITECTURE:
Explanation:
The above reference architecture is generally applicable: data streams in from a variety of producers, typically delivered via Apache Kafka, Amazon Kinesis, or Azure Event Hubs, to tools that ingest it and deliver it to a range of data stores and analytics engines. Between source and destination, the data is prepared for consumption in a variety of ways, including normalization, obfuscation of PII, flattening of nested data, filtering, and joining of data from multiple sources.
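To make the data flow concrete, here is a minimal, self-contained Python sketch of the ingest, prepare, and store stages described above. It only simulates the pipeline with an in-process queue; in a real deployment the queue would be a broker such as Kafka, Kinesis, or Event Hubs, and the store would be a database, as the reference architecture assumes.

# realtime_pipeline_sketch.py - toy simulation of the reference architecture
# Producers push events onto a queue (standing in for Kafka/Kinesis/Event Hubs),
# a consumer prepares each event (normalization, PII obfuscation, filtering),
# and prepared events land in an in-memory "store" (standing in for a database).
import queue
import threading
import time

event_queue = queue.Queue()   # stand-in for the streaming platform
data_store = []               # stand-in for the destination data store

def producer(name, readings):
    """Simulated data producer (e.g., a sensor or application emitting events)."""
    for value in readings:
        event_queue.put({"source": name, "user_email": "a@b.com", "value": value})
        time.sleep(0.01)

def prepare(event):
    """Prepare data for consumption: normalize, obfuscate PII, filter."""
    if event["value"] is None:                       # filtering
        return None
    return {
        "source": event["source"].lower(),           # normalization
        "user": hash(event["user_email"]) % 10_000,  # PII obfuscation (toy)
        "value": float(event["value"]),
    }

def consumer(stop_after):
    """Ingest events, prepare them, and load them into the store."""
    for _ in range(stop_after):
        event = event_queue.get()
        prepared = prepare(event)
        if prepared is not None:
            data_store.append(prepared)

p1 = threading.Thread(target=producer, args=("SensorA", [1, 2, None, 3]))
c1 = threading.Thread(target=consumer, args=(4,))
p1.start(); c1.start()
p1.join(); c1.join()
print(data_store)   # prepared events ready for analytics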
RESULT:
Thus the architecture for a real-time application was planned successfully.
EX 4 - WRITE THE QUERY FOR SCHEMA DEFINITION
AIM:
To write the query for schema definition.
ALGORITHM:
1. Create a new database
2. Switch to the newly created database
3. Define the schema for each table
4. Define relationships between tables (if needed)
5. Execute the schema definition queries
QUERY:
Create a new database named "library"
CREATE DATABASE library;
Switch to the "library" database
USE library;
Define the schema for the "books" table
CREATE TABLE books (book_id INT AUTO_INCREMENT PRIMARY KEY, title VARCHAR(255) NOT NULL, author VARCHAR(100) NOT NULL, publication_year INT, isbn VARCHAR(20), available BOOLEAN DEFAULT TRUE);
Define the schema for the "members" table
CREATE TABLE members ( member_id INT AUTO_INCREMENT PRIMARY KEY, name
VARCHAR(100) NOT NULL, email VARCHAR(255) UNIQUE, phone_number VARCHAR(20),
address VARCHAR(255) );
Define the schema for the "checkouts" table
CREATE TABLE checkouts ( checkout_id INT AUTO_INCREMENT PRIMARY KEY, book_id INT
NOT NULL, member_id INT NOT NULL, checkout_date DATE NOT NULL, return_date DATE,
FOREIGN KEY (book_id) REFERENCES books(book_id), FOREIGN KEY (member_id)
REFERENCES members(member_id) );
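As an optional way to exercise the schema end-to-end, the following Python sketch creates an equivalent schema in SQLite and runs a sample insert and join. Note the DDL is adapted: SQLite uses INTEGER PRIMARY KEY AUTOINCREMENT instead of MySQL's AUTO_INCREMENT, and foreign keys must be enabled with a PRAGMA; the MySQL statements above remain the reference, and the sample rows are illustrative only.

# schema_check.py - exercise an SQLite adaptation of the library schema
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite needs this for FK enforcement

conn.executescript("""
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY AUTOINCREMENT,
    title VARCHAR(255) NOT NULL,
    author VARCHAR(100) NOT NULL,
    publication_year INT,
    isbn VARCHAR(20),
    available BOOLEAN DEFAULT TRUE
);
CREATE TABLE members (
    member_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE,
    phone_number VARCHAR(20),
    address VARCHAR(255)
);
CREATE TABLE checkouts (
    checkout_id INTEGER PRIMARY KEY AUTOINCREMENT,
    book_id INT NOT NULL,
    member_id INT NOT NULL,
    checkout_date DATE NOT NULL,
    return_date DATE,
    FOREIGN KEY (book_id) REFERENCES books(book_id),
    FOREIGN KEY (member_id) REFERENCES members(member_id)
);
""")

# Sample rows (illustrative data only)
conn.execute("INSERT INTO books (title, author) VALUES ('Sample Book', 'Sample Author')")
conn.execute("INSERT INTO members (name, email) VALUES ('Sample Member', 'member@example.com')")
conn.execute("INSERT INTO checkouts (book_id, member_id, checkout_date) VALUES (1, 1, '2024-01-01')")

# Join across the three tables to confirm the relationships work
for row in conn.execute("""
    SELECT m.name, b.title, c.checkout_date
    FROM checkouts c
    JOIN books b ON b.book_id = c.book_id
    JOIN members m ON m.member_id = c.member_id
"""):
    print(row)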
OUTPUT:
RESULT:
Thus the schema definition queries were written and executed successfully.
EX 5 - DESIGN DATA WAREHOUSE FOR REAL TIME APPLICATION
AIM:
To design a data warehouse for a real-time application that stores and analyzes large volumes of real-time data efficiently.
ALGORITHM:
1. Define Dimensional Modeling:
Identify key business processes and metrics relevant to the real-time application. Define facts (measurable metrics) and dimensions (contextual attributes) to create a dimensional model.
2. Select ETL Processes:
Choose Extract, Transform, Load (ETL) processes suitable for real-time data ingestion. Implement mechanisms to continuously extract data from various sources, transform it to fit the data warehouse schema, and load it efficiently.
3. Implement Star or Snowflake Schema:
Choose a schema design that optimizes query performance for analytical processing. For simplicity and performance, consider a star schema, where a central fact table is surrounded by dimension tables. Each dimension table represents specific attributes related to the facts.
4. Define Fact Table:
Create a central fact table, e.g., "FactRealTimeData," to store real-time metrics. Include a timestamp for time-based analysis and foreign keys referencing dimension tables for additional context.
5. Define Dimension Tables:
Create dimension tables (e.g., "Dimension1" and "Dimension2") to store contextual attributes.
Each dimension table has a primary key and attributes related to specific dimensions.
6. Implement Foreign Key Relationships:
Establish foreign key relationships between the fact table and dimension tables. Ensure referential
integrity to maintain consistency in the data warehouse.
QUERY:
Define dimension tables
CREATE TABLE Dimension1 (dimension1_id INT PRIMARY KEY, attribute1 VARCHAR(255), attribute2 DATE);
CREATE TABLE Dimension2 (dimension2_id INT PRIMARY KEY, attribute3 VARCHAR(255), attribute4 BOOLEAN);
Define the fact table (created after the dimension tables so that its foreign keys can be resolved)
CREATE TABLE FactRealTimeData (timestamp TIMESTAMP, metric1 INT, metric2 FLOAT, dimension1_id INT, dimension2_id INT, PRIMARY KEY (timestamp), FOREIGN KEY (dimension1_id) REFERENCES Dimension1(dimension1_id), FOREIGN KEY (dimension2_id) REFERENCES Dimension2(dimension2_id));
OUTPUT:
OUTPUT EXPLANATION:
The SQL script provided above creates a simple data warehouse structure. The "FactRealTimeData" table stores real-time metrics with a timestamp and references two dimension tables ("Dimension1" and "Dimension2") to provide additional context to the metrics. This structure facilitates efficient querying and analysis of real-time data within the context of various dimensions.
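To illustrate the kind of analysis this star schema supports, here is a short, hedged Python/SQLite sketch that loads a few made-up rows and aggregates the fact table by a dimension attribute. SQLite is used only because it needs no server; the attribute values and metrics are purely illustrative.

# star_schema_query.py - sample aggregation over the star schema (illustrative data)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dimension1 (dimension1_id INT PRIMARY KEY, attribute1 VARCHAR(255), attribute2 DATE);
CREATE TABLE Dimension2 (dimension2_id INT PRIMARY KEY, attribute3 VARCHAR(255), attribute4 BOOLEAN);
CREATE TABLE FactRealTimeData (
    timestamp TIMESTAMP PRIMARY KEY,
    metric1 INT, metric2 FLOAT,
    dimension1_id INT, dimension2_id INT,
    FOREIGN KEY (dimension1_id) REFERENCES Dimension1(dimension1_id),
    FOREIGN KEY (dimension2_id) REFERENCES Dimension2(dimension2_id)
);
INSERT INTO Dimension1 VALUES (1, 'RegionA', '2024-01-01'), (2, 'RegionB', '2024-01-01');
INSERT INTO Dimension2 VALUES (1, 'DeviceX', 1);
INSERT INTO FactRealTimeData VALUES
    ('2024-01-01 10:00:00', 10, 1.5, 1, 1),
    ('2024-01-01 10:01:00', 20, 2.5, 1, 1),
    ('2024-01-01 10:02:00', 30, 3.5, 2, 1);
""")

# Aggregate metrics per dimension attribute - a typical analytical query on a star schema
for row in conn.execute("""
    SELECT d1.attribute1, SUM(f.metric1) AS total_metric1, AVG(f.metric2) AS avg_metric2
    FROM FactRealTimeData f
    JOIN Dimension1 d1 ON d1.dimension1_id = f.dimension1_id
    GROUP BY d1.attribute1
"""):
    print(row)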
RESULT:
Implemented a real-time data warehouse using a star schema with "FactRealTimeData," "Dimension1," and "Dimension2" tables. The fact table stores metrics with timestamps, and the dimension tables provide additional context.
EX 6 - ANALYSE THE DIMENSIONAL MODELING
AIM:
To analyse the dimensional modeling.
ALGORITHM:
1. Identify the business process
2. Identify dimensions and facts
3. Design the dimensional model
4. Define relationships
5. Optimize for query performance
QUERY:
1. Sales Fact Table:
CREATE TABLE SalesFact ( SaleID INT PRIMARY KEY, DateID INT, ProductID INT, QuantitySold
INT, AmountSold DECIMAL(10,2));
2. Date Dimension:
CREATE TABLE DateDim ( DateID INT PRIMARY KEY, CalendarDate DATE, Day INT, Month INT,
Year INT );
Populate Date Dimension (sample data)
INSERT INTO DateDim (DateID, CalendarDate, Day, Month, Year) VALUES (1, '2024-01-01', 1, 1,
2024), (2, '2024-01-02', 2, 1, 2024);
-- Add more dates as needed
3. Product Dimension:
CREATE TABLE ProductDim ( ProductID INT PRIMARY KEY, ProductName VARCHAR(255),
Category VARCHAR(50));
-- Additional attributes as needed
Populate Product Dimension (sample data)
INSERT INTO ProductDim (ProductID, ProductName, Category) VALUES (101, 'Product A',
'Electronics'), (102, 'Product B', 'Clothing');
-- Add more products as needed
4. Query to retrieve sales with date and product details:
SELECT s.SaleID, d.CalendarDate, p.ProductName, s.QuantitySold, s.AmountSold FROM SalesFact s
JOIN DateDim d ON s.DateID = d.DateID JOIN ProductDim p ON s.ProductID = p.ProductID;
This query retrieves sales information along with corresponding date and product details, leveraging the
dimensional model.
OUTPUT:
RESULT:
Thus the dimensional modeling was analysed successfully.
EX 7 - CASE STUDY USING OLAP
AIM:
To study a case using OLAP.
Introduction:
OLAP:-
OLAP stands for "Online Analytical Processing." OLAP allows users to analyze database
information from multiple database systems at one time. While relational databases are considered to be
two-dimensional, OLAP data is multidimensional, meaning the information can be compared in many
different ways. For example, a company might compare their computer sales in June with sales in July,
and then compare those results with the sales from another location, which might be stored in a different
database. In order to process database information using OLAP, an OLAP server is required to organize
and compare the information. Clients can analyze different sets of data using functions built into the
OLAP server. Some popular OLAP server software programs include Oracle Express Server and Hyperion
Solutions Essbase.
Purpose of OLAP:-
An effective OLAP solution solves problems for both business users and IT departments. For
business users, it enables fast and intuitive access to centralized data and related calculations for the
purposes of analysis and reporting. For IT, an OLAP solution enhances a data warehouse or other
relational database with aggregate data and business calculations. In addition, by enabling business users
to do their own analyses and reporting, OLAP systems reduce demands on IT resources.
OLAP offers five key benefits:
Business-focused multidimensional data
Business-focused calculations
Trustworthy data and calculations
Speed-of-thought analysis
Flexible, self-service reporting
OLAP operations
These are used to analyze data in an OLAP cube. There are five basic operations:
Drill down
This makes the data more detailed by moving down the concept hierarchy or adding a new dimension. For
example, in a cube showing sales data by Quarter, drilling down would show sales data by Month.
Roll up
This makes the data less detailed by climbing up the concept hierarchy or reducing dimensions. For
example, in a cube showing sales data by City, rolling up would show sales data by Country.
Dice
This selects a sub-cube by choosing two or more dimensions and criteria. For example, in a cube showing
sales data by Location, Time, and Item, dicing could select sales data for Delhi or Kolkata, in Q1 or Q2,
for Cars or Buses.
Slice
This selects a single dimension and creates a new sub-cube. For example, in a cube showing sales data by
Location, Time, and Item, slicing by Time would create a new sub-cube showing sales data for Q1.
Pivot
This rotates the current view to get a new representation. For example, after slicing by Time, pivoting
could show the same data but with Location and Item as rows instead of columns.
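The five operations above can be sketched with pandas on a tiny, made-up sales cube. This is only an illustration of the concepts, not an OLAP server, and the values below are invented for the example.

# olap_operations_sketch.py - illustrating drill down, roll up, slice, dice, and pivot
# with pandas on a small, made-up sales dataset (not a real OLAP server).
import pandas as pd

sales = pd.DataFrame({
    "Location": ["Delhi", "Delhi", "Kolkata", "Kolkata"],
    "Quarter":  ["Q1", "Q2", "Q1", "Q2"],
    "Month":    ["Jan", "Apr", "Feb", "May"],
    "Item":     ["Cars", "Buses", "Cars", "Cars"],
    "Sales":    [100, 150, 80, 120],
})

# Roll up: less detail - total sales per Quarter (climbing the Time hierarchy)
print(sales.groupby("Quarter")["Sales"].sum())

# Drill down: more detail - sales per Quarter and Month
print(sales.groupby(["Quarter", "Month"])["Sales"].sum())

# Slice: fix one dimension - only Q1 data
print(sales[sales["Quarter"] == "Q1"])

# Dice: criteria on two or more dimensions - Delhi or Kolkata, Q1 or Q2, Cars only
print(sales[sales["Location"].isin(["Delhi", "Kolkata"])
            & sales["Quarter"].isin(["Q1", "Q2"])
            & (sales["Item"] == "Cars")])

# Pivot: rotate the view - Locations as rows, Quarters as columns
print(sales.pivot_table(index="Location", columns="Quarter", values="Sales", aggfunc="sum"))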
RESULT:
Thus the case study using OLAP was done successfully.
EX 8 - CASE STUDY USING OLTP
AIM:
To study a case using OLTP.
Introduction:
OLTP, or online transactional processing, is a software program or operating system that supports transaction-oriented applications in a three-tier architecture. It facilitates and supports the execution of a large number of real-time transactions in a database.
OLTP monitors daily transactions and is typically done over an internet-based multi-access
environment. It handles query processing and, at the same time, ensures and protects data integrity. The
efficacy of OLTP is determined by the number of transactions per second that it can process. OLTP
systems are optimized for transactional superiority and are hence suitable for most monetary transactions.
The defining characteristic of OLTP transactions is atomicity and concurrency. Concurrency
prevents multiple users from changing the same data simultaneously. Atomicity (or indivisibility) ensures
that all transactional steps are completed for the transaction to be successful. If one step fails or is
incomplete, the entire transaction fails.
Atomic statefulness is a computing condition in which database changes are permanent, requiring
transactions to be completed successfully. OLTP systems enable inserting, deleting, changing, and
querying data in a database.
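Atomicity can be illustrated with a small, hedged Python/SQLite sketch of a money transfer: either both the debit and the credit are committed, or the whole transaction is rolled back. This is an illustration of the concept only; real OLTP systems run on full DBMS platforms with far more sophisticated transaction and concurrency management.

# atomic_transfer_sketch.py - illustrating atomicity with an SQLite transaction
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 500), (2, 100)")
conn.commit()

def transfer(amount, src, dst):
    """Debit src and credit dst as one atomic unit; roll back if any step fails."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # forces the rollback
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
    except ValueError as exc:
        print("transaction rolled back:", exc)

transfer(200, 1, 2)    # succeeds: both updates are committed together
transfer(1000, 1, 2)   # fails: the debit is rolled back, balances are unchanged
print(conn.execute("SELECT * FROM accounts").fetchall())  # [(1, 300), (2, 300)]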
OLTP systems activities consist of gathering input data, processing the data, and updating it using
the data collected. OLTP is usually supported by a database management system (DBMS) and operates in
a client-server system. It also relies on advanced transaction management systems to facilitate multiple
concurrent updates.
OLTP Transaction Examples
OLTP systems facilitate many types of financial and non-financial transactions such as:
Automated teller machines (ATMs)
Online banking applications
Online bookings for airline ticketing, hotel reservations, etc.
Online and in-store credit card payment processing
Order entry
E-commerce and in-store purchases
Password changes and sending text messages
OLTP systems are found in a broad spectrum of industries with a concentration in client-facing
environments.
OLTP Characteristics
1. Short response time
OLTP systems maintain very short response times to be effective for users. For example, responses from
an ATM operation need to be quick to make the process effective, worthwhile, and convenient.
2. Process small transactions
OLTP systems support numerous small transactions with a small amount of data executed simultaneously
over the network. It can be a mixture of queries and Data Manipulation Language (DML) overload. The
queries normally include insertions, deletions, updates, and related actions. Response time measures the
effectiveness of OLTP transactions, and millisecond responses are becoming common.
3. Data maintenance operations
Data maintenance operations are data-intensive computational reporting and data update programs that run
alongside OLTP systems without interfering with user queries.
4. High-level transaction volume and multi-user access
OLTP systems are synonymous with a large number of users accessing the same data at the same time.
Online purchases of a popular or trending gadget such as an iPhone may involve an enormous number of
users all vying for the same product. The system is built to handle such situations expertly.
5. Very high concurrency
An OLTP environment experiences very high concurrency due to the large user population, small
transactions, and very short response times. However, data integrity is maintained by a concurrency
algorithm, which prevents two or more users from altering the same data at the same time. It prevents
double bookings or allocations in online ticketing and sales, respectively.
A mobile money transfer application is a good example where concurrency is very high as thousands of
users can be making transfers simultaneously on the platform at every time of the day.
6. Round-the-clock availability
OLTP systems often need to be available round the clock, 24/7, without interruption. A small period of unavailability or offline operation can significantly impact a large number of people and an equally huge transaction volume. Downtime can also pose potential losses to organizations, e.g., an online banking system outage has adverse consequences for the bank’s bottom line. Therefore, an OLTP system requires frequent, regular, and incremental backups.
7. Data usage patterns
OLTP systems experience periods of both high data usage and low data usage. Finance-related OLTP
systems typically see high data usage during month ends when financial obligations are settled.
8. Indexed data sets
Index data sets are used to facilitate rapid query, search, and retrieval.
9. Normalized schema
OLTP systems utilize a fully normalized schema for database consistency.
10. Storage
OLTP stores data records for the past few days or about a week. It supports sophisticated data models and
tables.
OLTP System Architecture
1. Business Strategy
The business strategy influences the design of the OLTP system. The strategy is formulated at the senior management and board of directors level.
2. Business Process
These are the processes carried out by the OLTP system to accomplish the goals set by the business strategy. The processes comprise a set of activities, tasks, and actions.
3. Product, Customer/Supplier, Transactions, Employees
The OLTP database contains information on products, transactions, employees, customers, and suppliers.
4. Extract, Transform, Load (ETL) Process
The ETL process extracts data from the OLTP database into a staging area, where it is transformed; this includes data cleansing and optimizing the data for analysis. The transformed data is then loaded into the online analytical processing (OLAP) database, which is synonymous with the data warehouse environment.
5. Data Warehouse and Data Mart
Data warehouses are central repositories of integrated data from one or more disparate sources. A data mart is an access layer of the data warehouse that is used to access specific or summarized information for a unit or department.
6. Data Mining, Analytics, and Decision Making
The data stored in the data warehouse and data mart is used for analysis, data mining, and decision
making.
RESULT:
Thus the case study using OLTP was done successfully.
EX 9 - IMPLEMENTATION OF WAREHOUSE TESTING
AIM:
To implement warehouse testing.
ALGORITHM:
1. Install the necessary libraries:
pip install pytest pandas
2. Create a Python script for data transformation and loading
3. Create test cases using pytest
4. Run the tests using pytest:
pytest test_data_integration.py
5. Analyze the test results to ensure that the data transformation and loading processes are functioning
correctly in the operational data layer.
By implementing automated tests for data integration processes in the data warehousing environment, you can ensure the accuracy and reliability of the data transformation and loading operations. This approach helps in identifying any issues or discrepancies early in the development cycle, leading to a more robust and efficient data warehousing system.
PROGRAM:
# data_transformation.py
import pandas as pd

def transform_data(input_data):
    # Perform data transformation logic here
    transformed_data = input_data.apply(lambda x: x * 2)
    return transformed_data

def load_data(transformed_data):
    # Load transformed data into the operational data layer
    transformed_data.to_csv('transformed_data.csv', index=False)

# test_data_integration.py
import pandas as pd
import data_transformation

def test_transform_data():
    input_data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
    expected_output = pd.DataFrame({'A': [2, 4, 6], 'B': [8, 10, 12]})
    transformed_data = data_transformation.transform_data(input_data)
    assert transformed_data.equals(expected_output)

def test_load_data():
    input_data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
    data_transformation.load_data(input_data)
    loaded_data = pd.read_csv('transformed_data.csv')
    assert input_data.equals(loaded_data)
OUTPUT:
RESULT:
Thus the implementation of warehouse testing was done successfully.