INFO408 Database

The document describes the architecture of a data warehouse including its key components and data flows. It outlines a three-tier architecture with an operational data source at the bottom tier, an OLAP server in the middle tier, and front-end client tools in the top tier. Data warehouses store raw, summary, and metadata from various sources. ETL tools extract, transform, and load this data. Reporting, analysis, and data mining tools allow users to access and analyze the stored data. Data marts segment the warehouse data for specific user groups. The document also defines data mining and describes the Apriori and FP-Tree algorithms.

Uploaded by

Tanaka Matend


QUESTION ONE

With the aid of a diagram, describe the architecture of a data warehouse (identify the components in the
architecture and the flows in the architecture) [15 marks]
Data warehouse architecture is based on a relational database management system (RDBMS) server that functions as
the central repository for informational data. In this architecture, operational data and processing are kept
separate from data warehouse processing. The central repository is surrounded by several key components designed
to make the entire environment functional, manageable, and accessible, both to the operational systems that feed
data into the warehouse and to the end-user query and analysis tools.
A data warehouse usually adopts a three-tier architecture, organized as follows.
Bottom Tier: The bottom tier is the data warehouse database server, which is usually a relational database
system. Back-end tools and utilities feed data into this tier by performing the extract, clean, load, and
refresh functions.
Middle Tier: The middle tier holds the OLAP server, which can be implemented in one of two ways. Relational OLAP
(ROLAP) is an extended relational database management system that maps operations on multidimensional data to
standard relational operations. Multidimensional OLAP (MOLAP) is a model that directly implements the
multidimensional data and operations.
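The difference between the two OLAP styles can be illustrated with a minimal sketch. The fact table, its column names, and both function names below are assumptions made for illustration only; real OLAP servers are far more elaborate. The ROLAP function expresses a roll-up as a relational GROUP BY, while the MOLAP function stores the same data in an array indexed by dimension and sums along one axis.

```python
import sqlite3

# Hypothetical fact table rows: (region, quarter, sales amount).
rows = [
    ("North", "Q1", 100),
    ("North", "Q2", 150),
    ("South", "Q1", 80),
    ("South", "Q2", 120),
]

def rolap_rollup(rows):
    """ROLAP style: the roll-up over 'quarter' becomes a relational GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = dict(conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall())
    conn.close()
    return result

def molap_rollup(rows):
    """MOLAP style: data is held in a multidimensional array addressed by dimension index."""
    regions = sorted({region for region, _, _ in rows})
    quarters = sorted({quarter for _, quarter, _ in rows})
    cube = [[0] * len(quarters) for _ in regions]
    for region, quarter, amount in rows:
        cube[regions.index(region)][quarters.index(quarter)] += amount
    # Rolling up 'quarter' means summing along that axis of the array.
    return {region: sum(cube[i]) for i, region in enumerate(regions)}

print(rolap_rollup(rows))
print(molap_rollup(rows))
```

Both functions return the same totals per region; only the storage model and the way the operation is expressed differ.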
Top-Tier: This tier represents the front-end client layer. This layer holds the query tools and reporting tools,
analysis tools and data mining tools.
The following diagram depicts the three-tier architecture of a data warehouse.
[Diagram: Sources (operational system, external data, flat files) -> ETL tools and staging area -> Data
warehouse (metadata, raw data, summary data) -> Data marts -> Presentation (reporting tools, analysis tools,
data mining tools)]
Data Warehouse Components
From the architectures outlined above, some components overlap, while others are unique to the
number of tiers.
ETL Tools
ETL stands for Extract, Transform, and Load. The staging layer uses ETL tools to extract the
needed data from sources in various formats and to check its quality before loading it into the
data warehouse.
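The extract, transform, and load steps can be sketched as follows. This is a minimal illustration, not a real ETL tool: the CSV content, the table name `orders`, and the quality rule (drop rows with a missing amount) are all assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; a real source would be an operational system or file export.
SOURCE_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,carol,7.25
"""

def extract(text):
    """Extract: read rows from the flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows that fail the quality check and normalize types."""
    clean = []
    for row in rows:
        if not row["amount"]:  # assumed quality rule: amount must be present
            continue
        clean.append((int(row["order_id"]), row["customer"].title(), float(row["amount"])))
    return clean

def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT customer, amount FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

The row for order 2 has no amount, so it is rejected during the transform step and never reaches the warehouse table.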
The Database
The most crucial component and the heart of each architecture is the database. The warehouse is
where the data is stored and accessed.
Data
Once the system cleans and organizes the data, it stores it in the data warehouse. The data
warehouse represents the central repository that stores metadata, summary data, and raw data
coming from each source.
Metadata is the information that defines the data. Its primary role is to simplify working with
data instances. It allows data analysts to classify, locate, and direct queries to the required data.
Summary data is generated by the warehouse manager. It updates as new data loads into the
warehouse. This component can include lightly or highly summarized data. Its main role is to
speed up query performance.
Raw data is the actual data loaded into the repository before any processing. Keeping the
data in its raw form makes it available for further processing and analysis.
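The relationship between the three kinds of data can be shown with a small sketch. The records, field names, and the `summarize` function are hypothetical; the point is only that summary data is derived from raw data and is refreshed when new data loads, while metadata describes the data itself.

```python
from collections import defaultdict

# Raw data: unprocessed records as loaded (hypothetical values).
raw_data = [
    {"date": "2024-01-01", "product": "bread", "amount": 2.0},
    {"date": "2024-01-01", "product": "milk", "amount": 1.5},
    {"date": "2024-01-02", "product": "bread", "amount": 2.0},
]

# Metadata: information that defines the data, used to locate and interpret it.
metadata = {
    "table": "sales",
    "columns": {"date": "ISO date string", "product": "product name", "amount": "sale value"},
}

def summarize(rows):
    """Summary data: pre-computed totals per product, used to speed up queries."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["product"]] += row["amount"]
    return dict(totals)

summary_data = summarize(raw_data)

# When new raw data loads, the summary is refreshed.
raw_data.append({"date": "2024-01-02", "product": "milk", "amount": 1.5})
summary_data = summarize(raw_data)
print(summary_data)
```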
Access Tools
Users interact with the gathered information through different tools and technologies. They can
analyze the data, gather insight, and create reports.
Some of the tools used include:
Reporting tools. They play a crucial role in understanding how your business is doing and what
should be done next. Reporting tools include visualizations such as graphs and charts showing
how data changes over time.
OLAP tools. Online analytical processing tools allow users to analyze multidimensional
data from multiple perspectives. These tools provide fast processing and valuable analysis. They
extract data from numerous relational data sets and reorganize it into a multidimensional format.
Data mining tools. These examine data sets to find patterns within the warehouse and the
correlations between them. Data mining also helps establish relationships when analyzing
multidimensional data.
Data Marts
Data marts support multiple user groups within the system by segmenting the data in the
warehouse into categories. Each data mart is a partition of the data prepared for a particular
user group. For instance, you can use data marts to categorize information by department
within the company.
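Segmenting by department can be sketched as a simple partition of warehouse rows. The rows, the `department` field, and the function name `build_data_marts` are assumptions used only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical warehouse rows covering several departments.
warehouse = [
    {"department": "sales", "metric": "orders", "value": 120},
    {"department": "finance", "metric": "invoices", "value": 45},
    {"department": "sales", "metric": "returns", "value": 8},
]

def build_data_marts(rows, key="department"):
    """Partition warehouse rows into one data mart per value of `key`."""
    marts = defaultdict(list)
    for row in rows:
        marts[row[key]].append(row)
    return dict(marts)

marts = build_data_marts(warehouse)
print(sorted(marts))        # the departments that received a mart
print(len(marts["sales"]))  # number of rows in the sales mart
```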
b) Explain the reasons for creating a data mart from the data warehouse and describe the
architecture of the data mart. [5 marks]
A data mart is the access layer of a data warehouse that is used to provide users with data. Data
marts are often seen as small slices of the data warehouse. A data warehouse typically houses
enterprise-wide data, whereas the information stored in a data mart usually belongs to a specific
department or team.
The key objective of a data mart is to provide business users with the data that is most relevant
to them in the shortest possible time. This allows users to develop and follow a train of
thought without waiting long periods for queries to complete. Data marts are designed to
meet the demands of a specific group and have a comparatively narrow subject area. However,
narrow in focus does not necessarily mean small in size. Data marts may contain millions of
records and require gigabytes of storage.
QUESTION TWO
a) Define what data mining is and highlight the different styles of data mining that are
available. [5 marks]

Data Mining
Data mining is the process of extracting useful information or knowledge from very large
amounts of data (big data). The different styles of data mining that are available are:
Association, Classification, Clustering Analysis, Prediction, Sequential Patterns or Pattern
Tracking, Decision Trees, Outlier Analysis or Anomaly Analysis, and Neural Networks.
b) With the aid of appropriate examples explain the following data mining algorithms:
i) Apriori Algorithm [8 marks]
The Apriori algorithm is a classical algorithm in data mining. It is used for mining frequent
itemsets and the association rules derived from them. It is designed to operate on a database
containing many transactions, for instance, items bought by customers in a store.
It is important for effective market basket analysis: it helps customers find and purchase
their items more easily, which increases the store's sales. It has also been used in
healthcare for the detection of adverse drug reactions (ADRs), where it produces association
rules that indicate which combinations of medications and patient characteristics lead to ADRs.
Another basic example is grocery shopping. The shop owner can use the Apriori algorithm to
analyse which items customers frequently purchase together, and then place those items on
the same shelf so that they are easy for customers to buy together.
Basic principles on which the Apriori algorithm works:
• If an itemset occurs frequently, then all subsets of the itemset also occur frequently.
• If an itemset occurs infrequently, then all supersets of the itemset also occur
infrequently.
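The two principles above can be sketched in a short level-wise implementation. This is a minimal illustration rather than an optimized version: the function name `apriori`, the use of an absolute support count for `min_support`, and the sample baskets are all assumptions. The prune step applies the second principle: a candidate is discarded as soon as one of its subsets is known to be infrequent.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset seen in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical shopping baskets.
baskets = [
    {"sugar", "flour", "eggs"},
    {"sugar", "flour"},
    {"sugar", "eggs"},
    {"flour", "eggs", "milk"},
]
frequent = apriori(baskets, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With these baskets and a minimum support count of 2, milk is infrequent on its own, so no itemset containing milk is ever generated as a candidate.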
Applications of Apriori Algorithm
Detecting Adverse Drug Reactions
The Apriori algorithm is used for association analysis on healthcare data such as the drugs
taken by patients, the characteristics of each patient, the adverse effects patients experience,
and the initial diagnosis. This analysis produces association rules that help identify the
combinations of patient characteristics and medications that lead to adverse side effects.
Market Basket Analysis
Large e-commerce companies such as Amazon are often cited as using Apriori-style analysis to
find which products are likely to be purchased together and which are most responsive to
promotion. For example, a retailer might use Apriori to predict that people who buy sugar and
flour are also likely to buy eggs to bake a cake.
Auto-Complete Applications
Auto-complete, as in Google search, is often cited as another application of Apriori: when the
user types a word, the search engine looks for other associated words that people usually type
after that word.
ii) Frequent Pattern Tree Algorithm [7 marks]
FP-tree (Frequent Pattern tree) is the data structure used by the FP-growth algorithm to mine
frequent itemsets from a database, from which association rules can then be derived. FP-growth
is an alternative to Apriori-like algorithms. The FP-tree is a compact tree data structure for
storing frequent patterns.
The algorithm is designed to operate on databases containing transactions, such as customers'
purchase history on the Amazon website. An item is considered 'frequent' when it occurs in at
least a minimum number of transactions. Transactions that begin with the same frequent items
share the same branch of the tree, and the branch splits at the point where they differ. Each
node represents a single item and stores the number of transactions that pass through it, and
nodes that carry the same item are connected by links called node-links.
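The construction of the tree can be sketched as follows. This covers only building the FP-tree, not the full FP-growth mining step. The class name `FPNode`, the function names, the tie-breaking rule (alphabetical order among equally frequent items), and the sample transactions are assumptions for illustration.

```python
from collections import Counter

class FPNode:
    """One node of the FP-tree: an item, a count, and links to related nodes."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}      # item -> child FPNode
        self.node_link = None   # next node in the tree that carries the same item

def build_fp_tree(transactions, min_support):
    """Build an FP-tree; return (root, header) where header maps item -> start of its node-link chain."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: n for item, n in counts.items() if n >= min_support}
    root = FPNode(None)
    header = {}
    for t in transactions:
        # Keep only frequent items, most frequent first (ties broken alphabetically).
        ordered = sorted((i for i in set(t) if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                # Thread the new node onto the node-link chain for this item.
                child.node_link = header.get(item)
                header[item] = child
            child.count += 1
            node = child
    return root, header

def support_via_links(header, item):
    """Total support of an item, found by walking its node-link chain."""
    total, node = 0, header.get(item)
    while node is not None:
        total += node.count
        node = node.node_link
    return total

# Hypothetical transactions.
transactions = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["bread", "eggs"],
    ["milk", "jam"],
]
root, header = build_fp_tree(transactions, min_support=2)
print(root.children["bread"].count)
print(support_via_links(header, "milk"))
```

The first three transactions all start with bread, so they share one branch from the root. The item jam occurs only once, so it is dropped before insertion and the fourth transaction contributes only a separate milk node under the root.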
For example, a supermarket sees 200 customers on a Friday evening. Of those 200 customers,
100 bought chicken, and of the 100 customers who bought chicken, 50 also bought onions.
The association rule is therefore: if customers buy chicken, then they also buy onions,
with a support of 50/200 = 25% and a confidence of 50/100 = 50%.
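The arithmetic in the supermarket example can be checked with a few lines of code. The function and variable names are illustrative assumptions; the numbers are the ones given in the example.

```python
def support(n_both, n_total):
    """Support of a rule A -> B: fraction of all transactions that contain both A and B."""
    return n_both / n_total

def confidence(n_both, n_antecedent):
    """Confidence of a rule A -> B: fraction of transactions containing A that also contain B."""
    return n_both / n_antecedent

total_customers = 200
bought_chicken = 100
bought_chicken_and_onions = 50

rule_support = support(bought_chicken_and_onions, total_customers)
rule_confidence = confidence(bought_chicken_and_onions, bought_chicken)
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```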
As another example, in market basket analysis, suppose the minimum support threshold is 30%,
which in a database of 10 transactions corresponds to 3 transactions. If bread appears together
with eggs and milk in at least three transactions, then {bread, eggs, milk} is a frequent itemset.
Frequent pattern mining can be used in a variety of real-world applications. In supermarkets
it can guide selling, product placement on shelves, and promotion rules, and it can also be
used in text searching. It can be used in wireless sensor networks, especially in smart homes
with sensors attached to the human body or to household objects, and in other applications
that require careful monitoring of a user environment subject to critical conditions or hazards
such as gas leaks, fire, and explosion. These frequent patterns can also be used to monitor the
activities of dementia patients: they offer a way to monitor activities of daily life in a
smart environment and so track functional decline among dementia patients.
