
Tutorial
Using the AMBER Data Repository to Analyze, Share and Cross-exploit Dependability Data

Marco Vieira
mvieira@dei.uc.pt
University of Coimbra, Portugal

The Second International Conference on Dependability (DEPEND 2009)
Athens/Glyfada, Greece, June 18, 2009

The AMBER Project
• Assessing, Measuring and Benchmarking Resilience in computer systems and components (AMBER)
• Coordination Action supported by the European Commission in the 7th FP
• Coordinating and advancing research in resilience measurement and benchmarking in computer systems and infrastructures

Current challenges
• Quality of measurements
• Integration of the human and technical components of the analysis
• Dynamic and adaptive systems and networks
• Integration with the development processes

AMBER objectives
• State-of-the-art survey
• Research agenda
• Data repository
• Others:
  – Dissemination events (workshops, panels, etc.)
  – Benchmarking tools
  – Training material

This Tutorial…
Learn how to use the AMBER Data Repository to analyze and share data from dependability evaluation experiments.

Problems
• How to analyze the usually large amount of raw data produced in dependability evaluation experiments?
• How to compare results from different experiments, or results of similar experiments across different systems?
  – Different and incompatible tools, data formats, and setup details…
• How to share raw experimental results among research teams?


Current situation
• The situation today is not good!
• Spreadsheets and other specific tools are used to analyze results
  – Not standard and difficult to build
• Difficult to compare data and generalize conclusions
• Researchers share final results and conclusions
  – Papers, mainly
  – Raw data is not shared

ADR vision and objectives
• Vision
  – Become a worldwide repository for dependability-related data
• Key objectives:
  – Provide state-of-the-art data analysis
  – Allow data comparison and cross-exploitation
  – Facilitate worldwide data sharing and dissemination
• Potential tool to increase the impact of research

Data analysis approach
• Repository to analyze, compare, and share results
• Use a business intelligence approach:
  – Data warehouse to store data
  – On-Line Analytical Processing (OLAP) to analyze data
  – Data mining algorithms to identify (unknown) phenomena in the data
  – Information retrieval for data in textual formats
• Adopt the same life cycle as BI data

Outline
1. Business Intelligence
2. Data Warehousing & OLAP
3. Using DW to analyze dependability-related data
4. The AMBER Data Repository

1. Business Intelligence

What is Business Intelligence?
• Business Intelligence (BI):
  – Getting the right information, to the right decision makers, at the right time
• BI is an enterprise-wide platform that supports data gathering, reporting, analysis and decision making
• BI is meant to enable:
  – Fact-based decision making
  – A “single version of the truth”
• BI includes reporting and analytics

Five classic BI questions
• Past: What happened?
• Present: What is happening? Why did it happen?
• Future: What will happen? What do I want to happen?

Typical BI technologies
• ETL tools (Extract, Transform, and Load)
• Repositories
  – Data Warehouse
• Analytical tools
  – Reporting and querying
  – OLAP
  – Data mining
• Information retrieval

Many proprietary products
ACE*COMM, Ab Initio, Actuate, ComArch, CyberQuery, Dimensional Insight, IBM (Applix, Cognos), InetSoft, Informatica, Information Builders, LogiXML, LucidEra, MicroStrategy, Microsoft (Analysis Services, PerformancePoint Server 2007, ProClarity), Oracle Corporation (Hyperion Solutions), Panorama Software, Pervasive, Pilot Software, Inc., PRELYTIS, Prospero Business Suite, QlikTech, SAP (Business Information Warehouse, Business Objects, OutlookSoft), SAS Institute, Siebel Systems, Spotfire (now Tibco), SPSS, StatSoft, Telerik Reporting, Teradata, Thomson Data Analyzer

Some open source/free products
• Eclipse BIRT Project
• Freereporting.com
• JasperSoft
• OpenI
• Palo (OLAP database)
• Pentaho
• RapidMiner
• SpagoBI
• Weka
• Some products from big companies can be used freely

2. Data Warehousing & OLAP

What is a Data Warehouse?
• A big database that stores data for decision support
• Built from the operational data collected from transactional DBs and other operational systems
[Diagram: Operational DB & other systems → Data Warehouse → Users]


Basic DW components
[Diagram: data sources (operational DBs, legacy systems, spreadsheets/files, external sources) feed a data staging area, which loads the data warehouse (presentation servers); users access it through ad hoc queries, reports, specific apps, and models and other tools]

Data volume
• Less than 20 GBytes
  – Small dimension; runs on a PC
• From 20 to 100 GBytes
  – Medium dimension; needs a powerful workstation
• From 100 GBytes to 1 TByte
  – Large dimension; needs a powerful server, normally with parallel processing
• More than 1 TByte
  – Very large dimension; massive parallel processing

Some characteristics
• Temporal dependency
• Non volatile
• Target oriented
• Data integration and consistency
• Designed for queries

Temporal dependency
• The data is collected over time
  – It does not represent a specific moment
  – It represents the history
• A temporal reference must be associated with all data in the database

Non volatile
• The data in the DW is never updated
• The DW stores historic data (historic memory) collected from the operational databases
• After being loaded (from the operational databases) there is only one operation:
  – Queries

Target oriented
• The data warehouse must only store data relevant for decision support
• Much of the operational data (needed for everyday management) is not relevant for the DW


Data integration and consistency
• In an operational environment the information may be stored in different locations using different representations
• That data must be integrated and made consistent before being loaded into the DW

Designed for queries
• After being loaded the data never changes:
  – Only queries are allowed
• The DW stores a large amount of data
• The data must be stored in such a way that it improves performance
  – Multidimensional view
  – Partial denormalization

Dimensional model
• The typical model in operational databases is E/R
• The dimensional model follows a different approach
  – Stores the same data
  – Data organization is user oriented
• Easy to understand
• Very good performance for queries
• Data warehouses built over complex E/R models never succeed

The multidimensional model
• Facts are stored in a multidimensional array
• The dimensions are used to index the array
• Usually built using data from operational databases
[Diagram: a sales cube indexed by Store (Lisbon, Coimbra), Product (Milk, Oil, Sugar, Coffee) and Date (Jan–Apr)]
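The cube above can be sketched as a plain dictionary keyed by dimension coordinates. This is only a toy illustration of "facts indexed by dimensions"; the store, product and month values echo the slide's figure and are not real data.

```python
# Toy multidimensional "cube": fact values indexed by dimension coordinates.
# Coordinates mirror the slide's example (Store x Product x Date).
sales = {
    ("Lisbon",  "Milk",  "Jan"): 2,
    ("Coimbra", "Oil",   "Feb"): 5,
    ("Lisbon",  "Sugar", "Mar"): 3,
}

def slice_by_store(cube, store):
    """Fix the Store dimension and return the remaining sub-cube."""
    return {(p, m): v for (s, p, m), v in cube.items() if s == store}

lisbon = slice_by_store(sales, "Lisbon")
total_lisbon = sum(lisbon.values())  # 2 + 3 = 5
```

A real OLAP engine does the same indexing, only over persistent storage and with aggregation pushed into the query engine.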

Star model
• The typical dimensional model is a star structure with:
  – A central table with facts
  – Several dimension tables describing the facts
[Diagram: a facts table (ID_dim 1..4, Fact 1..n) surrounded by Dimension 1–4 tables, each with its key and attributes]

Facts
• Represent the business measures
• The most useful facts are:
  – Numbers
  – Additive


Facts table
• Comprises several numeric attributes (facts) and foreign keys to the dimensions
• Normalized table
• M:1 relationships with the business dimensions
• Normally contains a large number of records
• Typically represents 95% of the space used by the DW

Dimensions
• Each dimension represents a business parameter
  – Time, clients, products, etc.
• Represent an entry point for the analysis of the facts
• Represent different points of view for the analysis of the facts

Dimension tables
• Strongly denormalized
  – For performance
• Dimensions have hierarchies
  – Day → Month → Year → …
• Contain a large set of attributes
• Typically comprise a small number of records (when compared to the facts table)

Star schema example
• Time (ID_time, Day, Day_of_week, Week_of_year, Month, Trimester, Year)
• Store (ID_store, Name, Local, District, Area, Num_tellers)
• Product (ID_product, Name, Type, Brand, Category, Pack, Description)
• Sale facts (ID_time, ID_product, ID_store, Units_sold, Purchase_cost, Sale_value, Num_Clients)
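A minimal sketch of the star schema above, using SQLite through Python. Column lists are abridged and the sample rows are made up; the point is the shape — one fact table holding measures and foreign keys, surrounded by small dimension tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE time_dim  (id_time    INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE store_dim (id_store   INTEGER PRIMARY KEY, name TEXT, district TEXT);
CREATE TABLE product   (id_product INTEGER PRIMARY KEY, name TEXT, brand TEXT);
CREATE TABLE sale (                       -- fact table: keys + measures only
    id_time    INTEGER REFERENCES time_dim,
    id_store   INTEGER REFERENCES store_dim,
    id_product INTEGER REFERENCES product,
    units_sold INTEGER, sale_value REAL, num_clients INTEGER
);
""")
con.execute("INSERT INTO time_dim  VALUES (1, '2009-06-18', 'Jun', 2009)")
con.execute("INSERT INTO store_dim VALUES (1, 'Coimbra', 'Centro')")
con.execute("INSERT INTO product   VALUES (1, 'Milk', 'BrandX')")
con.execute("INSERT INTO sale VALUES (1, 1, 1, 10, 25.0, 7)")

# A typical star query: join the fact table out to its dimensions.
row = con.execute("""
    SELECT s.name, p.name, f.units_sold
    FROM sale f
    JOIN store_dim s ON f.id_store   = s.id_store
    JOIN product   p ON f.id_product = p.id_product
""").fetchone()
# row == ('Coimbra', 'Milk', 10)
```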

Low level queries
• Example over the star schema above: average revenue per sale, by brand and month

  select brand, month, avg(sale_value * units_sold)
  from sale, time, product
  where JOIN_TABLES
  group by brand, month

User interfaces
• Explore data in data warehouses
  – Typical OLAP tools
    • Access the relational engine using SQL
    • Data presentation using tables, graphics, reports, etc.
    • Targeted for ad-hoc queries
  – Other tools
    • Data mining
    • Modeling
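The slide's aggregation can be run end to end on a tiny ad-hoc dataset; table and column names follow the slide, but the rows below are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product  (id_product INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE time_dim (id_time    INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE sale     (id_time INTEGER, id_product INTEGER,
                       units_sold INTEGER, sale_value REAL);
""")
con.executemany("INSERT INTO product VALUES (?, ?)",  [(1, "BrandA"), (2, "BrandB")])
con.executemany("INSERT INTO time_dim VALUES (?, ?)", [(1, "Jan"), (2, "Feb")])
con.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)",
                [(1, 1, 2, 10.0),   # BrandA, Jan: revenue 20
                 (1, 1, 4, 10.0),   # BrandA, Jan: revenue 40
                 (2, 2, 1, 5.0)])   # BrandB, Feb: revenue 5

rows = con.execute("""
    SELECT p.brand, t.month, AVG(f.sale_value * f.units_sold)
    FROM sale f
    JOIN time_dim t ON f.id_time    = t.id_time
    JOIN product  p ON f.id_product = p.id_product
    GROUP BY p.brand, t.month
""").fetchall()
# BrandA/Jan averages (20, 40) -> 30.0; BrandB/Feb -> 5.0
```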


Queries – Slice and Dice
[Diagram: slicing the cube gives sales by time and product, or sales by store and brand]

Drill-Down & Roll-Up
• Drill-down: move from the most generic category, through intermediate categories, down to full detail
• Roll-up: move from the most detailed category back up to the most generic one
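Roll-up along a hierarchy is just aggregation that discards the finer level. A small sketch for the Day → Month step (dates and values are illustrative):

```python
from collections import defaultdict

# Daily sales at the finest grain of the time hierarchy.
daily_sales = {
    ("2009-06-17", "Milk"): 4,
    ("2009-06-18", "Milk"): 6,
    ("2009-06-18", "Oil"):  2,
}

def roll_up_to_month(facts):
    """Aggregate the Day level away, keeping (month, product) totals."""
    monthly = defaultdict(int)
    for (day, product), units in facts.items():
        monthly[(day[:7], product)] += units   # '2009-06-17' -> '2009-06'
    return dict(monthly)

monthly_sales = roll_up_to_month(daily_sales)
# {('2009-06', 'Milk'): 10, ('2009-06', 'Oil'): 2}
```

Drill-down is the inverse navigation: it requires the finer-grained facts to still be stored, which is why granularity decisions (below) matter.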

Time: Drill-Down & Roll-Up
[Diagram: drill-down/roll-up along the time hierarchy: ALL → Year → Trimester → Month → Week/Day]

Steps for the design of the star model
1. Identify the business process/activity
2. Identify the facts
3. Identify the dimensions
   • Product, Category, …
   • Store, City, …
4. Define the data granularity
   • Day, Week, Month, …
• Do not forget that the model depends on the available data (operational databases, files, etc.)

Example – Retail sales
• Set of stores belonging to the same enterprise
• Goal: analysis of sales
• Each store has several departments (food, hygiene and cleaning, etc.)
• Sells thousands of products
• Products are identified using a unique number

Retail sales – Business data
• Where to collect the data?
  – POS (point of sale)
  – Operational database
• What to measure?
  – Sales
• Goals?
  – Maximize the profit
  – Maximum sales price possible
  – Lower costs
  – More clients

Retail sales – Facts
• Examples of relevant decision support facts:
  – Number of units sold
  – Acquisition costs
  – Sale value
  – Number of clients that bought the product
• Question: is it possible to obtain base data (from the operational system) for these facts?

Retail sales – Dimensions
• Main dimensions:
  – Product × Store × Time
• Are there other relevant dimensions?
  – Supplier? Promotions? Client?
  – Employee responsible for the store on that day?
• It is normally possible to add extra dimensions
• All the dimensions have a 1:M relationship with the facts

Retail sales
• Star schema:
  – Sales facts (ID_product, ID_time, ID_store, ID_promotion, units_sold, purchase_cost, sale_value, num_clients)
  – Product dimension (description, full_description, SKU_number, package_size, brand, subcategory, category, department, package_type, diet_type, weight, units_per_retail_case, units_per_shipping_case, cases_per_pallet, shelf dimensions, …)
  – Time dimension (date, day_of_week, day/week numbers, month, quarter, fiscal_period, year, holiday_flag, …)
  – Store dimension (name, store_number, street address, city, county, state, zip, sales district/region, manager, phone, fax, floor_plan_type, opening/remodel dates, areas in sqft, …)
  – Promotion dimension (number, name, type of price reduction, type of advertisement, poster, coupons, promotion_cost, start_date, end_date, …)

Granularity
• Example: record the daily sales of all products
  – Analyze in detail (price, quantity, etc.) the products sold every day, in each store, …
• Retail sales granularity:
  – Product × Store × Promotion × Day
• The granularity defines the detail of the DW and has a strong impact on its size
• The granularity must be adjusted to the analysis requirements

Retail sales – Details
• The Product dimension:
  – Must characterize the products as seen by the business managers
  – Must contain the attributes that are relevant for posterior queries
  – Is typically generated from the operational databases (as is any other dimension)
  – Is strongly denormalized
• The Time dimension:
  – Mandatory dimension that represents the temporal dependency of the DW
  – Must describe time as seen by the business management
  – Must contain the time attributes that are relevant for queries
  – Is typically generated in a synthetic manner (rather than from the operational databases)
  – Includes all the records representing the time period considered in the DW

Retail sales – Details
• The Store dimension:
  – Must characterize the stores as seen by the business management
  – Must contain the attributes that are relevant for posterior queries
  – Includes geographical attributes (localization)
  – Includes time attributes (opening date, …)
• The Promotion dimension:
  – Characterizes the existing promotions
  – Represents a very important dimension: managers want to know the impact of promotions on sales, in order to target new promotions to specific products, stores and times
  – In this example there is only one dimension related to promotions

More than one star
• Two or more stars can be connected using one or more dimensions
• Shared dimensions must be conformed
  – They must contain consistent data when considering each star
• Drill across: a query that crosses more than one star
[Diagram: Sales facts (ID_time, ID_product, ID_store, units_sold, purchase_cost, sale_value, num_clients) and Stock facts (ID_time, ID_product, ID_warehouse, quant_available, quant_out, purchase_cost, last_sell_price) sharing the Time and Product dimensions]

Several stars
• Sales: dimensions Time, Component, Client, Contract
• Orders: dimensions Time, Component, Supplier, Contract
• Stocks: dimensions Time, Component, Warehouse
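A drill-across sketch: two fact tables (sales and stock) sharing a conformed time dimension, answered by one query. The data is illustrative, with one fact row per month so the join stays simple.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE time_dim (id_time INTEGER PRIMARY KEY, month TEXT);  -- conformed
CREATE TABLE sales (id_time INTEGER, units_sold INTEGER);         -- star 1
CREATE TABLE stock (id_time INTEGER, quant_available INTEGER);    -- star 2
INSERT INTO time_dim VALUES (1, 'Jan'), (2, 'Feb');
INSERT INTO sales VALUES (1, 10), (2, 20);
INSERT INTO stock VALUES (1, 100), (2, 80);
""")
rows = sorted(con.execute("""
    SELECT t.month, SUM(sa.units_sold), SUM(st.quant_available)
    FROM time_dim t
    JOIN sales sa ON sa.id_time = t.id_time
    JOIN stock st ON st.id_time = t.id_time
    GROUP BY t.month
""").fetchall())
# [('Feb', 20, 80), ('Jan', 10, 100)]
```

Note this naive form only works because each star has exactly one row per time key; with multiple rows per key the join fans out and inflates the sums, which is why drill-across in practice aggregates each star separately before combining them.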

Questions?

3. Using DW to analyze dependability data


Basic elements of a DW
[Diagram: operational DBs, legacy systems, spreadsheets/files and external sources feed a data warehouse on a multidimensional server; an OLAP application supports ad hoc queries, statistical analysis and reporting]

A DW for experimental data
[Diagram: experiments — fault injection tools, robustness testing tools, dependability benchmarking, statistical experiments, any other experimental environment, field data — running on experimental systems A…N feed, over the LAN/Internet, a data warehouse on a multidimensional server; an OLAP application supports result analysis via ad hoc queries, statistical tools and reporting]

Key points of the proposed approach
[Diagram: experimental setups A…N feed, over the network, a multidimensional database (data warehouse); an OLAP tool supports ad hoc queries, statistical analysis and reporting]
• General approach to store results from dependability evaluation experiments
• Data from different experiments can be compared/cross-exploited (only if it makes sense to compare them)
• Raw data is available (not only the final results)
• Results can be analyzed and shared worldwide by using web-enabled versions of OLAP tools

Two types of data in experimental dependability evaluation
[Diagram: an experiment management system defines faults for the target system and collects readouts (impact of faults) over the network]
• Measures collected from the target system (FACTS)
  – For example, raw data representing error detection efficiency, recovery time, failure modes, etc.
• Features of the target system and experimental setup that have impact on the measures (DIMENSIONS)
  – For example, attributes describing the target systems, the different configurations, the workload, the faultload, etc.
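The facts/dimensions split can be made concrete with two small record types; the field names here are illustrative placeholders, not the actual AMBER schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentDimensions:
    """Setup features that influence the measures (DIMENSIONS)."""
    target_system: str
    workload: str
    faultload: str

@dataclass
class SlotFacts:
    """Measures collected from the target system (FACTS)."""
    recovery_time_s: float
    lost_transactions: int
    failure_mode: str

# One injection slot = one fact record tagged with its dimension coordinates.
slot = (ExperimentDimensions("DBMS-X", "TPC-C-like", "operator-faults"),
        SlotFacts(recovery_time_s=42.5, lost_transactions=3,
                  failure_mode="detected"))
```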

The multidimensional model
• Facts are stored in a multidimensional array
• Dimensions are used to access the array according to any possible criteria
[Diagram: a cube indexed by Target system (System A, System B), Faultload and Workload]

The star schema
[Diagram: star schema for experimental data — a facts table surrounded by dimension tables]


Basic elements of the proposed approach
[Diagram: experimental setups A…N feed, via loading applications, a multidimensional database (data warehouse); analysis via ad hoc queries, statistical tools and reporting]
• Experiments: the experimental setups are used as they are. You can use your favorite dependability evaluation tool and do the experiments in the usual way. It is only necessary…
  – To know the format of the raw results
  – To have access to the results
• Loading applications:
  – General-purpose loading applications
  – Some transformations of the data are normally necessary for consistency
• Data warehouse:
  – Raw data is available in a standard star schema (facts + dimensions)
  – If results from different experiments are compatible and can be compared/analyzed together, then they are stored in the same star schema (or in schemas that share at least one dimension)
  – If results are from different, unrelated experiments, then they are stored in a separate schema
• Analysis:
  – Commercial OLAP tools are used to analyze the raw data and compute the measures. These tools are designed to be used by managers: very easy to use :-)
  – Just need an internet browser to analyze the data

Steps needed to put our approach into practice
1. Define an adequate star schema to store the data; create the tables in the data warehouse
2. Use a general-purpose loading application to define the loading plans for each table in the star schema
3. Run the loading plans to load the star tables with the raw data collected from the experiments
4. Every time a new experiment is done, the corresponding loading plans are run again to add the new data to the data warehouse
5. Analyze the data: calculate measures, find unexpected results, analyze trends, etc.

Example: Recovery and Performance Evaluation in DBMS
• Tuning a large DBMS is very complex
• Administrators tend to focus on performance tuning and disregard the recovery features
• Administrators seldom have feedback on how good a given configuration is
• A technique to characterize the performance and the recoverability of DBMS is needed
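Steps 2–3 above (define and run a loading plan) can be sketched as parsing raw CSV rows from an injection-slot log and inserting them into a fact table. The column layout below is hypothetical, not the actual DBench-OLTP format.

```python
import csv
import io
import sqlite3

# Stand-in for one raw result file produced by an experiment run.
raw = io.StringIO(
    "slot_id,fault_type,recovery_time_s,lost_txn\n"
    "1,operator,30.0,0\n"
    "2,operator,55.0,4\n"
)

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE slot_facts
               (slot_id INTEGER, fault_type TEXT,
                recovery_time_s REAL, lost_txn INTEGER)""")

# The "loading plan": one typed insert per raw row.
for row in csv.DictReader(raw):
    con.execute("INSERT INTO slot_facts VALUES (?, ?, ?, ?)",
                (int(row["slot_id"]), row["fault_type"],
                 float(row["recovery_time_s"]), int(row["lost_txn"])))

n_slots = con.execute("SELECT COUNT(*) FROM slot_facts").fetchone()[0]  # 2
```

Rerunning step 4 for a new experiment is then just pointing the same plan at the new files.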

The Approach
• Extend existing performance benchmarks to evaluate recoverability features in DBMS
• Include a faultload and new measures

Operator fault injection and recovery
• Faultload based on operator faults
• Measures related to recovery:
  – Recovery time
  – Data integrity violations
  – Lost transactions
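Once the raw per-slot readouts are in the warehouse, the recovery measures above are simple aggregations. A sketch with illustrative field names and numbers:

```python
from statistics import mean

# Per-injection-slot readouts (hypothetical values).
slots = [
    {"recovery_time_s": 30.0, "lost_txn": 0, "integrity_violations": 0},
    {"recovery_time_s": 55.0, "lost_txn": 4, "integrity_violations": 1},
]

avg_recovery_time = mean(s["recovery_time_s"] for s in slots)        # 42.5
total_lost_txn = sum(s["lost_txn"] for s in slots)                   # 4
slots_with_violations = sum(
    1 for s in slots if s["integrity_violations"] > 0)               # 1
```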

Experimental setup
[Diagram: test setup for the DBMS experiments]

The data storage model
[Diagram: the data storage model used in the experiments]

Steps towards data analysis
1. Definition of the adequate star schema
   a. Identify the process/activity
   b. Identify the facts
   c. Identify the dimensions
   d. Define the data granularity
2. Load the data
3. Analyze the data

Definition of the adequate star schema: identify the process/activity
• Experiments to characterize the performance and the recoverability of DBMS
• Includes a faultload and new measures
• Faultload based on operator faults
• Measures related to recovery


Definition of the adequate star schema: identify the facts
[Diagram: facts identified for the DBMS experiments]

Definition of the adequate star schema: identify the dimensions
[Diagram: dimensions identified for the DBMS experiments]

Definition of the adequate star schema: define the data granularity
• Performance and recovery results
  – Per experiment
  – Per SUT
  – Per workload
  – Per fault type

The star schema
[Diagram: resulting star schema for the DBMS experiments]

Load the data
[Diagram: ETL process loading the raw results]

Analyze the data: example of query construction
[Diagram: building a query in the OLAP tool]


Analyze the data: example of query answer
[Diagram: the query answer shown in the OLAP tool]

Questions?

4. The AMBER Data Repository

AMBER Repository vision and objectives
• Vision
  – Become a worldwide repository for dependability-related data
• Key objectives:
  – Provide state-of-the-art data analysis
  – Allow data comparison and cross-exploitation
  – Facilitate worldwide data sharing and dissemination
• Potential tool to increase the impact of research

Potential use
• Research team level
  – Perform the analysis of data in an efficient way
  – Efficient dissemination of the team’s results
• Project level
  – Sharing and cross-exploitation of results from different project teams
• Worldwide
  – Common repository to store and share data
  – Many teams are performing dependability evaluation, but there are no results available on the web

Data analysis approach
• Repository to analyze, compare, and share results
• Use a business intelligence approach:
  – Data warehouse to store data
  – On-Line Analytical Processing (OLAP) to analyze data
  – Data mining algorithms to identify (unknown) phenomena in the data
  – Information retrieval to access data in textual formats
• Adopt the same life cycle as BI data
• Use technology already available for DW, DM & IR

Steps
1. User registration
2. Multidimensional analysis
3. Definition of the loading plans
4. Load the data
5. Definition of data ownership policies
6. Analysis of the data
• Running example: analyze DBench-OLTP results using OLAP

User registration
• ADR users must undergo a registration procedure
• They provide identification information that is verified by the ADR support team
  – To filter out malicious users
• Contact information is used to get in touch with the potential repository user
• To access the repository, users must authenticate

Multidimensional analysis
• Design an adequate multidimensional data model
• If the user has the required expertise to design the data model:
  – Send the ADR support team the SQL scripts needed to create the database tables
• Otherwise, the ADR team helps the user define the model
  – The user only needs to explain to us the experimental setup and the format of the data collected

The DBench-OLTP benchmark
[Diagram: overview of the DBench-OLTP benchmark setup]

Format of the raw data
• Raw data collected by DBench-OLTP is composed of tens of CSV files (one from each run)
• Each row contains data from an injection slot
  – Identification, duration, number of transactions executed, data integrity errors discovered, type of fault injected, moment of fault injection, workload used, etc.
• A text file describes the experiment and the characteristics of the SUB

Data model (1)
• Key steps:
  – Identification of the facts that characterize the problem under analysis
  – Identification of the dimensions that may influence the facts
  – Definition of the granularity of the data stored in the star schema


Data model (2)

Definition of the loading plans

• Data extraction
   − SQL scripts to extract data from the CSV files into a temporary database schema (data staging area)
• Data transformation
   − SQL scripts to transform the data into an adequate format
• Data load
   − SQL scripts to load the transformed data into the data warehouse
• Loading plans are documented and stored in the ADR
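The three steps above can be sketched as follows. The actual plans are SQL scripts stored in the ADR; here the CSV layout (slot id, duration, transactions executed) and table names are assumptions made for illustration.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (slot_id TEXT, duration TEXT, tx TEXT)")
conn.execute("CREATE TABLE warehouse (slot_id INTEGER, duration_s REAL, tx INTEGER)")

# 1. Extract: copy the raw CSV rows into the staging area unchanged.
raw = io.StringIO("1,30.5,1200\n2,30.1,1185\n")
for row in csv.reader(raw):
    conn.execute("INSERT INTO staging VALUES (?, ?, ?)", row)

# 2 + 3. Transform and load: cast the text fields to proper types
#        while moving them from the staging area into the warehouse.
conn.execute("""
    INSERT INTO warehouse
    SELECT CAST(slot_id AS INTEGER),
           CAST(duration AS REAL),
           CAST(tx AS INTEGER)
    FROM staging
""")
print(conn.execute("SELECT COUNT(*) FROM warehouse").fetchone()[0])  # → 2
```

Keeping extraction and transformation separate means a malformed run can be inspected in the staging area without touching the warehouse.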

Load the data

• Execute the loading plans created before
• If new data becomes available, we just need to rerun the plans
   − e.g., if the benchmark is executed on other systems
• The documentation of DBench-OLTP includes papers and technical reports
   − This is considered part of the DBench-OLTP data
   − It is loaded into the repository and made available to potential readers of the data

Data ownership policy

• Data ownership policies of the ADR are divided into three main groups
   − Private data
   − Proprietary data
   − Collaborative data
• For the DBench-OLTP data we decided to use a collaborative approach
   − Allows other potential users of the benchmark to compare their results with the ones available in the ADR
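One way to make a loading plan safe to rerun when new data arrives is to key the load on the run or slot identifier, so re-executing the plan over an extended data set updates rather than duplicates rows. This is a sketch of that idea, not the ADR's actual mechanism; the schema is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse (slot_id INTEGER PRIMARY KEY, tx INTEGER)")

def run_loading_plan(rows):
    # Keyed on slot_id: rerunning the plan leaves existing
    # rows in place and only adds (or refreshes) new ones.
    conn.executemany("INSERT OR REPLACE INTO warehouse VALUES (?, ?)", rows)

run_loading_plan([(1, 1200), (2, 1185)])             # first execution
run_loading_plan([(1, 1200), (2, 1185), (3, 1190)])  # rerun with new data
print(conn.execute("SELECT COUNT(*) FROM warehouse").fetchone()[0])  # → 3
```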

Analysis of the data

• On-Line Analytical Processing (OLAP) tools
   − Support the analysis in a very flexible way
   − Provide high query performance and easy, intuitive data navigation
• Oracle Business Intelligence Discoverer Plus (ODP)
   − Commercial tool included in the Oracle Business Intelligence package
   − Widely used in industry
   − Free for research purposes under an Oracle Academy Agreement

OLAP Wizard

• Selection of query type (crosstab or table) and characteristics (title, graph, text area, etc.)
• Selection of measures and dimensional attributes
• Setting the query layout
• Selection of the fields used to sort the results
• Creation of parameters used to filter data
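A crosstab of the kind the wizard produces boils down to a GROUP BY over a measure and two dimensional attributes. The sketch below builds one by hand (table, column names, and the sample values are illustrative, not ADR data): fault type on the rows, workload on the columns, summed integrity errors as the measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slots (fault_type TEXT, workload TEXT, errors INTEGER)")
conn.executemany("INSERT INTO slots VALUES (?, ?, ?)", [
    ("operator", "wl-A", 2), ("operator", "wl-B", 0),
    ("software", "wl-A", 1), ("software", "wl-A", 3),
])

# Crosstab: rows = fault type, columns = workload, cells = SUM(errors).
crosstab = {}
for fault, wl, total in conn.execute(
        "SELECT fault_type, workload, SUM(errors) "
        "FROM slots GROUP BY fault_type, workload"):
    crosstab.setdefault(fault, {})[wl] = total

# crosstab["software"]["wl-A"] == 4 (the two software-fault slots combined)
```

An OLAP tool adds interactive drill-down and pivoting on top of exactly this kind of aggregation.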


Some results

Quick demo…

• Murphy's law…

http://www.amber-project.eu

Questions?

Do you have data?
Share them!

Generic bibliography

• Ralph Kimball, Margy Ross, "The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling" (Second Edition), John Wiley & Sons, Inc., 2002.
• Ralph Kimball, "The Data Warehouse Lifecycle Toolkit", John Wiley & Sons, Inc., 2001.

ADR bibliography

• Madeira, H., Costa, J., Vieira, M., "The OLAP and Data Warehousing Approaches for Analysis and Sharing of Results from Dependability Evaluation Experiments", International Conference on Dependable Systems and Networks (DSN-DCC 2003), San Francisco, CA, USA, June 2003.
• Pintér, G., Madeira, H., Vieira, M., Pataricza, A., Majzik, I., "A Data Mining Approach to Identify Key Factors in Dependability Experiments", Fifth European Dependable Computing Conference (EDCC-5), Budapest, Hungary, April 2005.


ADR bibliography

• Pintér, G., Madeira, H., Vieira, M., Majzik, I., Pataricza, A., "Integration of OLAP and Data Mining for Analysis of Results from Dependability Evaluation Experiments", International Journal of Knowledge Management Studies (IJKMS), Volume 2, Issue 4, Inderscience Publishers, July 2008.
• Vieira, M., Mendes, N., Durães, J., Madeira, H., "The AMBER Data Repository", DSN 2008 Workshop on Resilience Assessment and Dependability Benchmarking (DSN-RADB08), Anchorage, Alaska, June 2008.
• Vieira, M., Mendes, N., Durães, J., "A Case Study on Using the AMBER Data Repository for Experimental Data Analysis", SRDS 2008 Workshop on Sharing Field Data and Experiment Measurements on Resilience of Distributed Computing Systems, Naples, Italy, October 2008.

