0% found this document useful (0 votes)
2 views

Module 7 Lecture

The document covers essential concepts related to data warehouses, business intelligence systems, and big data, including the definitions and functionalities of data marts, OLAP, and cloud computing. It also discusses the CAP theorem, which outlines the trade-offs in distributed databases, and introduces various NoSQL database categories. Key learning objectives include understanding data transformation, the role of BI systems, and the importance of virtualization and distributed database processing.

Uploaded by

samerelking3
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views

Module 7 Lecture

The document covers essential concepts related to data warehouses, business intelligence systems, and big data, including the definitions and functionalities of data marts, OLAP, and cloud computing. It also discusses the CAP theorem, which outlines the trade-offs in distributed databases, and introduces various NoSQL database categories. Key learning objectives include understanding data transformation, the role of BI systems, and the importance of virtualization and distributed database processing.

Uploaded by

samerelking3
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 48

Database Concepts

Ninth Edition

Chapter 7
Data Warehouses, Business
Intelligence Systems, and Big Data

Slide in this Presentation Contain Hyperlinks. JAWS users should be


able to get a list of links by using INSERT+F7

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Learning Objectives
2

• Learn the basic concepts of data warehouses and data marts


• Learn the basic concepts of dimensional databases
• Learn the basic concepts of business intelligence (BI) systems
• Learn the basic concepts of online analytical processing (OLAP)
• Learn the basic concepts of virtualization and virtual machines
• Learn the basic concepts of cloud computing
• Learn the basic concepts of Big Data, structured storage, and the
MapReduce process
• Understand the limitations and tradeoffs of replicated, partitioned
stores as indicated by the CAP (Consistency, Availability, Partition
Tolerance) theorem
• Learn the basic concepts of JavaScript Object Notation (JSON) as a
way of structuring nonrelational data
Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Big Data
3

Learn the basic concepts of data warehouses


and data marts
• Big Data is the current term for the enormous datasets
generated by applications such as search tools like Google and
Bing
• Web 2.0 social networks include:
– Facebook
– LinkedIn
– Twitter

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.1 Storage Capacity Terms
4

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 5.27 Heather Sweeney
5

Designs: Database Design

Note: From Chapter 5 showing the final database design for Heather Sweeney
Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.2 The HSD Database
6

Diagram

Oracle MySQL Community Server 8.0, Oracle Corporation


Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Business Intelligence Systems
7

Learn the basic concepts of business


intelligence (BI) systems
• Business intelligence (BI) systems are information systems that:
– assist managers and other professionals in the analysis of current and
past activities and in the prediction of future events
– do not support operational activities, such as the recording and
processing of orders
§ these are supported by transaction processing systems
– support management assessment, analysis, planning and control
• BI systems fall into two broad categories:
– reporting systems that sort, filter, group, and make elementary
calculations on operational data
– data mining applications that perform sophisticated analyses on data;
analyses that usually involve complex statistical and mathematical
processing

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.4 Characteristics of
8

Business Intelligence Applications

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
BI Terms
9

Learn the basic concepts of business


intelligence (BI) systems
• Online analytical processing (OLAP) is the processing of non-
operational data used in data mining and other applications
• Data lake is a repository that includes all data relevant to a business
(re: photos, documents, other files of any type)
• A data warehouse is a database system that has data, programs,
and personnel that specialize in the preparation of data for BI
processing.

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.3 The Relationship Between
10

Operational and BI Applications

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.5 Components of a Data
11

Warehouse

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Problems with Operational Data
12

Learn the basic concepts of business


intelligence (BI) systems
• “Dirty data,” examples include:
– “G” for gender
– “213” for age
• Missing values
• Inconsistent data
– data that has changed (ex: customer’s phone number)
• Nonintegrated data (data from multiple sources)
• Incorrect format (ex: too many or not enough digits
• Too much data (ex: an excess number of columns)

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
ETL Data Transformation
13

Learn the basic concepts of business


intelligence (BI) systems
• Data may need to be transformed for use in a data warehouse.
For example:
– {CountryCode → CountryName}
– “US” → “United States”
– Email address to Email domain
§ [email protected] → “somewhere.com”

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.6 Data Warehouses and
14

Data Marts

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Data Warehouses Versus Data Marts
15

Learn the basic concepts of data warehouses


and data marts
• A data mart is a collection of data that is smaller than the data
warehouse that addresses a specific component
• An enterprise data warehouse (EDW) architecture is when a
data mart is combined with the data warehouse architecture.

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.7 Characteristics of
16

Operational and Dimensional


Databases
• A dimensional database is a design used for efficient analysis
and efficient queries for data warehouses.
• Uses a star-schema with a fact table at the center (fully
normalized) and a dimensional tables radiating out from the
center (may be non-normalized), In a snowflake schema, each
dimension table is normalized.
• May use slowly changing dimensions to track data such as a
date or time dimension.
Operational Database Dimensional Database
Used for structured transaction data processing Used for unstructured analytical data
Current data are used Current and historical data are used
Data are inserted, updated, and deleted by Data are loaded and updated
users systematically, not by users

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.8 A Star Schema
17

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.9 The HSD–DW Star Schema
18

Oracle MySQL Community Server 8.0, Oracle Corporation


Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
19

Figure 7.10 The HSD–DW SQL Create Table


Statements

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.11 The HSD–DW Table Data
20

Oracle MySQL Community Server 8.0, Oracle Corporation


Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
A Query to Summarize Products Sold
21

by Customer and Product


Learn the basic concepts of business
intelligence (BI) systems
• The following SQL code is used to summarize products sold by
Customer and Product

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
22

Figure 7.12 The HSD–DW Query Results: Summarize Product Units


Sold by Customer and Product

Oracle MySQL Community Server 8.0, Oracle Corporation

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.13 The Two-Dimensional
23

ProductNumber-CustomerID Matrix

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.14 The Three-Dimensional
24

Time-ProductNumber-CustomerID
Cube

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.15 The HSD–DW Star
25

Schema Extended for RFM Analysis

Oracle MySQL Community Server 8.0, Oracle Corporation

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Online Analytical Processing (OLAP)
26

Learn the basic concepts of online analytical


processing (OLAP)
• Online Analytical Processing (OLAP) is a technique for
dynamically examining database data:
– OLAP uses arithmetic functions such as Sum and Average
– OLAP uses the dimensional database model
• OLAP systems produce OLAP reports (also called OLAP cube)
– An OLAP report uses inputs call dimensions
– An OLAP report calculates outputs called measures
– Excel PivotTables can be used to create OLAP reports

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
SQL Query for OLAP Data
27

Learn the basic concepts of online analytical


processing (OLAP)

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
SQL View for O LAP Data
28

Learn the basic concepts of online analytical


processing (OLAP)

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
29

Figure 7.16 The HSD-DW Query for OLAP


Results: Time-Product-Customer Cube

Oracle MySQL Community Server 8.0, Oracle Corporation

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.17 The HSD-DW OLAP
30

ProductNumber by City Report

Excel 2019, Windows 10, Microsoft Corporation


Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.18 The HSD–DW OLAP
31

ProductNumber by City Report:


CustomerName and Year Dimensions
Added

Excel 2019, Windows 10, Microsoft Corporation

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.19 The HSD–DW OLAP City
32

by ProductNumber, CustomerName,
and Year Report

Excel 2019, Windows 10, Microsoft Corporation


Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Distributed Database Processing
33

Learn the basic concepts of virtualization and


virtual machines
• A distributed database is a database that is stored and
processed on more than one computer.
• A service cluster is a group of associated servers where the
database shared between them is called a distributed
database.
• A database is distributed when it is:
– partitioned
– replicated
– both partitioned and replicated
• Databases are distributed for two major reasons – performance
and control.
Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.20 Types of Distributed
34

Databases

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Object-Relational Databases
35

Learn the basic concepts of virtualization and


virtual machines
• Object-oriented programming (OOP) is a methodology for
designing and writing computer programs.
• Some object-oriented programming languages are Java,
Python, C++, C#, and Visual Basic.NET
• Objects are data structures that have both methods
(programs that perform some task with the object) and
properties (data items particular to an object).
• Object persistence is the permanent storing of values of
properties of an object in secondary memory (usually disk)

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
OODBMS
36

Learn the basic concepts of virtualization and


virtual machines
• Object-oriented DBMSs (OODBMSs) are special-purpose
DBMS products for storing data developed by several vendors
in the early 1990s. These never achieved commercial success.
– too expensive to transfer existing data from traditional
legacy databases
– not cost justifiable

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Virtualization
37

Learn the basic concepts of virtualization and


virtual machines
• Virtualization is using hardware and software to simulate
another hardware resource.
• Virtualization is done by having one physical computer host
one or more virtual computers, also known as virtual
machines.
• The host machine runs a special program known as a virtual
machine manager or hypervisor.
• There are two ways to implement hypervisors:
– “bare metal” or type 1 (used in large data centers)
– “hosted” or type 2 (used by students and others)

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.21 Type 1 and Type 2
38

Hypervisors versus Containers

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Cloud Computing
39

Learn the basic concepts of cloud computing


• Cloud computing services are ultimately provided by large data
centers.
• Storage area networks (SANs) have dedicated network paths from
servers to disk arrays, where several physical disks are combined
together to act as a single, larger disk.
• Redundant arrays of independent disks (RAID) can be configured
for maximum access speed or for reliability.
• Three basic ways to lease cloud services:
– Software as a service (SaaS), Ex: Salesforce.com
– Platform as a service (PaaS), Ex: operating systems, software
development tools & system programs provided
– Infrastructure as a service (IaaS), Ex: Only hardware provided
and users manage their own software
Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Not Only SQL
40

Learn the basic concepts of Big Data,


structured storage, and the MapReduce
process
• The Not only SQL (previously NoSQL) movement is a movement to
use non-relational databases.
• These databases are often a distributed, replicated database to
support large datasets.

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
The CAP Theorem
41

Understand the limitations and tradeoffs of


replicated, partitioned stores as indicated by
the CAP (Consistency, Availability, Partition
Tolerance) theorem
• The CAP theorem defines three properties of distributed database
systems:
– Consistency means that all database replicas see the same data
at any given point in time
– Availability means that every request received by a server will
result in a response, as long as the network is available
– Partition tolerance means that the distributed database can
continue to operate even when the cluster is partitioned by
network failures into two or more disconnected sections
(partitions)

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.22 The CAP Theorem—You
42

Can’t Have All Three at the Same


Time!

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Categories of NoSQL Database
43

Management Systems
Learn the basic concepts of Big Data,
structured storage, and the MapReduce
process
• Four categories of NoSQL databases include:
– Key-Value database
§ Dynamo, MemcacheDB, Redis
– Document database
§ Couchbase, ArangoDB, Mongo DB
– Column family database
§ Vertica, Apache Cassandra, Hbase
– Graph database
§ Neo4j, AllegroGraph, Titan
Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.23 An ArangoDB AQL Query
44

and JSON Results

Courtesy of ArangoDB
Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
MapReduce and Hadoop
45

Learn the basic concepts of Big Data,


structured storage, and the MapReduce
process
• MapReduce is a technique to analyze extremely large datasets
by breaking down tasks and assigning them to a cluster of
computers and then combine them into the final product of the
original task.
• Hadoop Distributed File System (HDFS) provides standard
file services to clustered servers so their file systems can
function as one distributed, replicated file system that can
support large-scale MapReduce processing.

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.24 MapReduce
46

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Figure 7.25 Example of Apache Tez
47

Directed Acyclic Graph

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved
Copyright
48

This work is protected by United States copyright laws and is


provided solely for the use of instructors in teaching their
courses and assessing student learning. Dissemination or sale of
any part of this work (including on the World Wide Web) will
destroy the integrity of the work and is not permitted. The work
and materials from it should never be made available to students
except by instructors using the accompanying text in their
classes. All recipients of this work are expected to abide by these
restrictions and to honor the intended pedagogical purposes and
the needs of other instructors who rely on these materials.

Copyright © 2020, 2017, 2015 Pearson Education, Inc. All Rights Reserved

You might also like