1c-OLAP-BI-Using The AMBER Data Repository To Analy
mvieira@dei.uc.pt
• Coordinating and advancing research in
University of Coimbra, Portugal
Marco Vieira, University of Coimbra, Portugal
research
DEPEND 2009, Athens/Glyfada, Greece, June 18,2009
• What happened?
• What is happening?
• Why did it happen?
• What will happen?
• What do I want to happen?
(Timeline: Past, Present, Future)

• Repositories
– Data Warehouse
• Analytical tools
– Reporting and querying
– OLAP
– Data mining
• Information retrieval
Users
• Target oriented
• Data integration and consistency
• Designed for queries
• A temporal reference must be associated to all data in the database
• The data in the DW is never updated
• The DW stores historic data (historic memory) collected from the operational databases
• After being loaded (from the operational databases) there is only one operation:
– Queries

• The data warehouse must only store data relevant for decision support
• Much operational data (needed for everyday management) is not relevant for the DW
Multidimensional view
Partial denormalization
• Easy to understand
• Very good performance for queries
• Data Warehouses built over complex E/R models never succeed

[Figure: data cube with dimensions Store (Lisbon, Coimbra), Product (Milk, Oil, Sugar, Coffee) and Date (Jan, Feb, Mar, Apr)]
[Figure: generic star schema with a central fact table (Fact 1, Fact 2, ..., Fact n) surrounded by dimension tables (each with an ID_dim key and attributes)]
• Relationships M:1 with the business dimensions

• Represent an entry point for the analysis of the facts
Drill-Down and Roll-up
[Figure: category levels ranging from the most generic category, through an intermediate category, to full detail. Example views: sales by time and product; sales by store and brand]
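The two operations can be illustrated with a minimal sketch. It assumes a tiny, invented set of sales facts; the attribute names, the brands and the numbers are hypothetical and do not come from the slides. Roll-up aggregates to a more generic level, drill-down adds a level of detail.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, month, category, brand, units_sold).
SALES = [
    (2009, "Jan", "Food", "BrandA", 2),
    (2009, "Jan", "Food", "BrandB", 5),
    (2009, "Feb", "Food", "BrandA", 3),
    (2009, "Feb", "Cleaning", "BrandC", 4),
]

# Position of each attribute inside a fact tuple.
COLUMNS = {"year": 0, "month": 1, "category": 2, "brand": 3}


def aggregate(facts, levels):
    """Sum units_sold grouped by the given attribute names."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[COLUMNS[name]] for name in levels)
        totals[key] += fact[-1]
    return dict(totals)


# Roll-up: the most generic view, one total per category.
by_category = aggregate(SALES, ["category"])
# Drill-down: add the brand level (intermediate detail).
by_category_brand = aggregate(SALES, ["category", "brand"])
# Drill-down again: full detail, including the month.
full_detail = aggregate(SALES, ["category", "brand", "month"])
```

Each drill-down step only adds one attribute to the grouping key, so the totals of a finer view always add up to the totals of the coarser one.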
• Set of stores belonging to the same enterprise
• Goal: Analysis of sales
• Each store has several departments (food, hygiene and cleaning, etc)

• Where to collect the data?
– POS - point of sales
– Operational database
• What to measure?
– Sales

– [...]mum sales price possible
– Lower costs
– More clients
[Figure: star schema for the sales example]
Sales fact table: ID_product, ID_time, ID_store, ID_promotion, units_sold, purchase_cost, sale_value, num_Clients
Product dimension: ID_product, description, full_description, SKU_number, package_size, brand, subcategory, category, department, package_type, diet_type, weight, weight_unit_of_measure, units_per_retail_case, units_per_shipping_case, cases_per_pallet, shelf_width_cm, shelf_height_cm, shelf_depth_cm, ...
Time dimension: ID_time, date, day_of_week, day_number_in_month, day_number_overall, week_number_in_year, week_number_overall, Month, quarter, fiscal_period, year, holiday_flag, ...
Store dimension: ID_store, name, store_number, store_street_address, city, store_county, store_state, store_zip, sales_district, sales_region, store_manager, store_phone, store_FAX, floor_plan_type, photo_processing_type, finance_services_type, first_opened_date, last_remodel_date, store_sqft, grocery_sqft, frozen_sqft, meat_sqft, ...
Promotion dimension: ID_promotion, name, type_price_red, type_advertisement, type_poster, Type_coupons, promotion_cost, start_date, end_date, ...

• Mandatory dimension that represents the DW temporal dependency
• Must describe time as seen by the business management
• Must contain the attributes that are relevant for posterior queries
• Is typically generated in a synthetic manner
• Includes all the records representing the time period considered in the DW

• Must characterize the products as seen by the business management
• It is strongly denormalized
• It is typically generated from the operational database (which is …)
[Figure: sales star schema (fact table with the product, time, store and promotion dimensions)]

• Must characterize the stores as seen by the business management
• Must contain the attributes that are relevant for posterior queries
• Includes geographical attributes (localization)
• Includes time attributes (opening date, …)

• Characterizes the existing promotions
• In this example there is only one dimension related to promotions
• Represents a very important dimension
• Managers want to know the impact of promotions in the sales in order to target new promotions to specific products, stores and time
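As a rough illustration of how such a star schema is queried, the sketch below builds a cut-down version of the sales schema in an in-memory SQLite database. The column names follow the schema in the slides, but the table names (`fact_sales`, `dim_*`), the reduced set of columns and all the rows are assumptions made only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (only a few of the attributes from the slides).
CREATE TABLE dim_product (ID_product INTEGER PRIMARY KEY, description TEXT,
                          brand TEXT, category TEXT);
CREATE TABLE dim_time (ID_time INTEGER PRIMARY KEY, date TEXT,
                       Month TEXT, year INTEGER);
CREATE TABLE dim_store (ID_store INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE dim_promotion (ID_promotion INTEGER PRIMARY KEY, name TEXT,
                            promotion_cost REAL);
-- Fact table: one M:1 relationship to each dimension plus the measures.
CREATE TABLE fact_sales (
    ID_product INTEGER REFERENCES dim_product(ID_product),
    ID_time INTEGER REFERENCES dim_time(ID_time),
    ID_store INTEGER REFERENCES dim_store(ID_store),
    ID_promotion INTEGER REFERENCES dim_promotion(ID_promotion),
    units_sold INTEGER, purchase_cost REAL, sale_value REAL,
    num_Clients INTEGER
);
""")

# Hypothetical rows, invented for the example.
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)",
                 [(1, "Milk", "BrandA", "Food"), (2, "Coffee", "BrandB", "Food")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?)",
                 [(1, "2009-01-15", "Jan", 2009), (2, "2009-02-15", "Feb", 2009)])
conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?)",
                 [(1, "Store 1", "Lisbon"), (2, "Store 2", "Coimbra")])
conn.executemany("INSERT INTO dim_promotion VALUES (?, ?, ?)",
                 [(1, "No promotion", 0.0), (2, "Coupon", 100.0)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 10, 5.0, 8.0, 7),
    (1, 2, 1, 2, 30, 15.0, 21.0, 20),
    (2, 1, 2, 1, 4, 8.0, 12.0, 4),
    (2, 2, 2, 2, 6, 12.0, 16.0, 5),
])

# Impact of promotions on sales: join the facts with one dimension and aggregate.
promotion_impact = conn.execute("""
    SELECT p.name, SUM(f.units_sold), SUM(f.sale_value)
    FROM fact_sales AS f
    JOIN dim_promotion AS p ON p.ID_promotion = f.ID_promotion
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
```

The same pattern (join the fact table with the dimensions of interest, then group by their attributes) covers views such as sales by store and brand or sales by time and product.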
Questions?

3. Using DW to analyze dependability data
[Figure: data from experimental setups (Exp. Setup 1 to N) and from the field feeding a Data Warehouse used for statistical reporting. Labels: Management System, Exp. control data, Faults definition, Target System, Net, Readouts (impact of faults), Exp. data, Field data]
• General approach to store results from dependability evaluation experiments
• Data from different experiments can be compared/cross-exploited (only if it makes sense to compare)
• Raw data is available (not only the final results)
• Results can be analyzed and shared world wide by using web-enabled versions of OLAP tools

Two types of data:
• Measures collected from the target system (FACTS)
– For example, raw data representing error detection efficiency, recovery time, failure modes, etc
• Features of the target system and experimental setup that have impact on the measures (DIMENSIONS)
– For example, attributes describing the target systems, the different configurations, the workload, the faultload, etc
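The split between facts and dimensions might look like the following sketch for dependability data. Everything in it is an assumption made for illustration: the table and column names, the choice of measures (recovery time and an error-detected flag) and the sample rows are hypothetical, not the schema actually used by the author.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
-- DIMENSIONS: features of the target system and of the experimental setup.
CREATE TABLE dim_target (id_target INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_workload (id_workload INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_faultload (id_faultload INTEGER PRIMARY KEY, name TEXT);
-- FACTS: raw measures collected from the target system, one row per experiment run.
CREATE TABLE fact_experiment (
    id_target INTEGER REFERENCES dim_target(id_target),
    id_workload INTEGER REFERENCES dim_workload(id_workload),
    id_faultload INTEGER REFERENCES dim_faultload(id_faultload),
    recovery_time_s REAL,
    error_detected INTEGER  -- 1 if the error was detected, 0 otherwise
);
""")

# Hypothetical sample data.
dw.executemany("INSERT INTO dim_target VALUES (?, ?)",
               [(1, "System A"), (2, "System B")])
dw.executemany("INSERT INTO dim_workload VALUES (?, ?)", [(1, "Workload 1")])
dw.executemany("INSERT INTO dim_faultload VALUES (?, ?)",
               [(1, "Faultload 1"), (2, "Faultload 2")])
dw.executemany("INSERT INTO fact_experiment VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 10.0, 1),
    (1, 1, 2, 20.0, 0),
    (2, 1, 1, 4.0, 1),
    (2, 1, 2, 6.0, 1),
])

# Measures are computed from the raw facts at query time,
# here the average recovery time and the detection rate per target system.
per_system = dw.execute("""
    SELECT t.name, AVG(f.recovery_time_s), AVG(f.error_detected)
    FROM fact_experiment AS f
    JOIN dim_target AS t ON t.id_target = f.id_target
    GROUP BY t.name
    ORDER BY t.name
""").fetchall()
```

Because the raw facts are stored rather than the final results, the same rows can later be grouped by workload or faultload without rerunning the experiments.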
[Figure: data cube with dimensions Target (System A, System B), Faultload and Workload]
The experimental setups are used as they are. You can use your favorite dependability evaluation tool and do the experiments in the usual way. It's necessary…
• To know the format of the raw results
• To have access to the results

Loading applications
• General purpose loading applications
• Some transformations in the data are normally necessary for consistency
Data warehouse
• Raw data is available in a standard star schema (facts + dimensions)
• If results from different experiments are compatible and can be compared/analyzed together, then they are stored in the same star schema (or in schemas that share at least one dimension)
• If results are from different unrelated experiments then they are stored in a separate schema

Analysis
• Commercial OLAP tools are used to analyze the raw data and compute the measures. These tools are designed to be used by managers: very easy to use :-)
• Just need an internet browser to analyze the data
4. Every time a new experiment is done
Test
ETL
AMBER Repository vision and objectives
• Vision
− Become a worldwide repository for dependability related data
− Allow data comparison and cross-exploitation
− Facilitate worldwide data sharing and dissemination
Repository analysis
impact of research
• Data extraction
− SQL scripts to extract data from the CSV files to a temporary database schema (data staging area)
• Data transformation
− SQL scripts transform the data into an adequate format
• Data load
− SQL scripts to load the transformed data into the data warehouse
• Loading plans documented and stored in the ADR
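The three steps can be sketched as follows. This is only an illustration under stated assumptions: the slides describe SQL scripts, while here the SQL is driven from Python with an in-memory SQLite database, and the CSV layout, the table names and the transformation rules are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw results file (inline here so the sketch is self-contained).
RAW_CSV = """system,fault,recovery_time
System A,F1,10.5
system a,F2,12.5
System B,F1,
System B,F2,4.0
"""

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE staging (system TEXT, fault TEXT, recovery_time TEXT);
CREATE TABLE dim_system (id_system INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE fact_result (
    id_system INTEGER REFERENCES dim_system(id_system),
    fault TEXT,
    recovery_time_s REAL
);
""")

# 1. Extraction: copy the CSV rows unchanged into the data staging area.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
db.executemany(
    "INSERT INTO staging VALUES (:system, :fault, :recovery_time)", rows)

# 2. Transformation: make names consistent and drop rows without a measure.
db.execute("UPDATE staging SET system = upper(trim(system))")
db.execute("DELETE FROM staging WHERE recovery_time = ''")

# 3. Load: fill the dimension first, then the facts that reference it.
db.execute("""
    INSERT INTO dim_system (name)
    SELECT DISTINCT system FROM staging ORDER BY system
""")
db.execute("""
    INSERT INTO fact_result
    SELECT d.id_system, s.fault, CAST(s.recovery_time AS REAL)
    FROM staging AS s
    JOIN dim_system AS d ON d.name = s.system
""")

loaded = db.execute("""
    SELECT d.name, COUNT(*), AVG(f.recovery_time_s)
    FROM fact_result AS f
    JOIN dim_system AS d ON d.id_system = f.id_system
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()
```

Keeping the three steps as separate statements mirrors the idea of a documented loading plan: when a new results file arrives, the same steps can be run again on it.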
• Executing the loading plans created before
• If new data becomes available we just need to rerun the plans
− e.g., if the benchmark is executed in other systems
• The documentation of the DBench-OLTP includes papers and technical reports
− This is considered as part of the DBench-OLTP data
− It is loaded to the repository and made available to the potential readers of the data

• Data ownership policies of ADR are divided in two main groups
− Private data
− Proprietary data
− Collaborative data
• For the DBench-OLTP data we have decided to use a collaborative approach
− Allows other potential users of the benchmark to compare their results with the ones available in the ADR
• Murphy's law…
Do you have data? Share them!
http://www.amber-project.eu

Questions?
• Ralph Kimball, Margy Ross, “The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling” (Second Edition), Ed. J. Wiley & Sons, Inc, 2002.
• Ralph Kimball, “The Data Warehouse Lifecycle Toolkit”, Ed. J. Wiley & Sons, Inc, 2001.
• Madeira, H., Costa, J., Vieira, M., "The OLAP and Data Warehousing Approaches for Analysis and Sharing of Results from Dependability Evaluation Experiments", International Conference on Dependable Systems and Networks, DSN-DCC 2003, San Francisco, CA, USA, June 2003
• Pintér, G., Madeira, H., Vieira, M., Pataricza, A., Majzik, I., "A Data Mining Approach to Identify Key Factors in Dependability Experiments", Fifth European Dependable Computing Conference (EDCC-5), Budapest, Hungary, April 2005
ADR bibliography