Data Warehouse Design Decisions
August 2015
Colleen Barnitz
Director, IT Development
MVT Services
Colleen Barnitz
Over 20 years in IT
Has worked with SQL Server since version 6.5
Developer and architect on data warehouse projects during that time
Leads a team of developers providing applications and business intelligence solutions for the enterprise
Co-founded and is active in the Las Cruces & El Paso SQL Server User Group
Twitter: @ColleenBarnitz
Email: [email protected]
It’s An Exciting Time to be in the Data Business…
Agenda
Big Data
• Hadoop
Streaming Data
• Event Processing
Data Services
• Be prepared to be Agile
What is a Data Warehouse?
Schema-driven
Built for the Business
Still valid after all these years…
• sub-terabyte data
The Data Warehouse Project
Data Modeling
Choices:
Kimball – Dimensional Modeling – Star Schemas
Inmon – model 3rd Normal Form first – then build data warehouse
Kimball Methodology
Enterprise Data Warehouse Bus Architecture
Dimensional Modeling
Star Schema
Dimension Tables
Different flavors:
Slowly Changing Dimensions – Types 0 thru 7
Junk Dimensions
Bridge Tables
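Of the flavors listed above, the Type 2 slowly changing dimension is a common one: it keeps history by expiring the current row and adding a new version. The following is a minimal Python sketch of that idea, not a production pattern. The customer dimension and its column names (`customer_id`, `city`, `valid_from`, `valid_to`, `is_current`, `surrogate_key`) are assumptions made for illustration and do not come from the slides.

```python
from datetime import date

def apply_scd2(dimension, business_key, new_attrs, as_of):
    """Apply a Type 2 change: expire the current row, then append a new version."""
    current = next(
        (row for row in dimension
         if row["customer_id"] == business_key and row["is_current"]),
        None,
    )
    if current is not None:
        # If no tracked attribute changed, leave the dimension untouched.
        if all(current.get(k) == v for k, v in new_attrs.items()):
            return dimension
        current["valid_to"] = as_of
        current["is_current"] = False
    dimension.append({
        "surrogate_key": len(dimension) + 1,  # hypothetical surrogate key scheme
        "customer_id": business_key,
        **new_attrs,
        "valid_from": as_of,
        "valid_to": None,
        "is_current": True,
    })
    return dimension

dim_customer = []
apply_scd2(dim_customer, 42, {"city": "Las Cruces"}, date(2015, 1, 1))
apply_scd2(dim_customer, 42, {"city": "El Paso"}, date(2015, 8, 1))
```

After the second call the dimension holds two rows for customer 42: the expired Las Cruces version and the current El Paso version, so facts can still join to the city that was valid at the time.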
Conformed Dimensions
Fact Tables
Kinds:
Periodic
Accumulating
Temporal
Data Vault Modeling
Unified Decomposition
Strengths:
Agility
Auditability
History Tracking
Easy to Automate
Data Vault Cont’d
Anchor Modeling
Concepts:
Objects
Attributes
Relationships
Populating the Warehouse
ETL
Extract – the process by which data is extracted from the data source
Transform – transformations and integrity checking
Load – loading the data into the warehouse
Only the data required for the output is included in the extraction process
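The three ETL steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source rows, the `fact_orders` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source rows; a real job would read these from a source system.
SOURCE_ORDERS = [
    {"order_id": 1, "customer": " alice ", "amount": "120.50", "notes": "rush"},
    {"order_id": 2, "customer": "BOB", "amount": "n/a", "notes": ""},
    {"order_id": 3, "customer": "Carol", "amount": "75", "notes": "gift"},
]

def extract(rows):
    # Extract: pull only the columns the output needs (notes is left behind).
    return [(r["order_id"], r["customer"], r["amount"]) for r in rows]

def transform(rows):
    # Transform: tidy names and drop rows that fail a simple integrity check.
    clean = []
    for order_id, customer, amount in rows:
        try:
            value = float(amount)
        except ValueError:
            continue  # amount is not numeric, so the row is rejected
        clean.append((order_id, customer.strip().title(), value))
    return clean

def load(conn, rows):
    # Load: write the cleaned rows into the warehouse table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()

etl_conn = sqlite3.connect(":memory:")
load(etl_conn, transform(extract(SOURCE_ORDERS)))
loaded = etl_conn.execute(
    "SELECT order_id, customer, amount FROM fact_orders ORDER BY order_id"
).fetchall()
```

Because the transform runs before the load, the rejected row and the unused `notes` column never reach the warehouse.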
ELT – Extract, Load, Transform
“We don’t need no stinking transforms” interpretation
All the data is brought over and is available for future needs
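For contrast, a minimal ELT sketch in Python: every source column is landed untouched in a staging table first, and the transform runs later as SQL inside the database. The table names (`stg_orders`, `elt_fact_orders`) and the sample rows are assumptions for illustration, with in-memory SQLite standing in for the warehouse.

```python
import sqlite3

# Hypothetical raw rows, landed exactly as they arrive from the source.
RAW_ORDERS = [
    (1, " alice ", "120.50", "rush"),
    (2, "BOB", "n/a", ""),
    (3, "Carol", "75", "gift"),
]

elt_conn = sqlite3.connect(":memory:")

# Extract + Load: keep every column, as text, in a staging table.
elt_conn.execute(
    "CREATE TABLE stg_orders "
    "(order_id INTEGER, customer TEXT, amount TEXT, notes TEXT)"
)
elt_conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?, ?)", RAW_ORDERS)

# Transform: done afterwards by the database engine itself.
elt_conn.execute("""
    CREATE TABLE elt_fact_orders AS
    SELECT order_id,
           TRIM(customer) AS customer,
           CAST(amount AS REAL) AS amount
    FROM stg_orders
    WHERE amount GLOB '[0-9]*'
""")

staged = elt_conn.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0]
facts = elt_conn.execute(
    "SELECT order_id, customer, amount FROM elt_fact_orders ORDER BY order_id"
).fetchall()
```

All three raw rows, including the `notes` column and the row with a bad amount, stay in staging, so a later requirement can be met without going back to the source.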
Automation of ETL/ELT is KEY
Brittle designs hold back changes and new development
SSIS editing can be very frustrating
Code Generation
Biml / BIDS Helper
Write your own
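As a rough idea of what "write your own" code generation can look like, the sketch below builds staging DDL and load statements from a small metadata dictionary. It is plain Python string templating, not Biml, and the table names, column types, and the `stg_`/`src_` prefixes are all hypothetical.

```python
# Hypothetical metadata describing source tables; a real generator might read
# this from the source system's catalog instead.
TABLES = {
    "customer": [("customer_id", "INT"), ("name", "VARCHAR(100)")],
    "orders": [
        ("order_id", "INT"),
        ("customer_id", "INT"),
        ("amount", "DECIMAL(18,2)"),
    ],
}

def staging_ddl(table, columns):
    # One CREATE TABLE statement per source table.
    cols = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE stg_{table} (\n{cols}\n);"

def staging_load(table, columns):
    # One straight-copy load statement per source table.
    col_list = ", ".join(name for name, _ in columns)
    return (
        f"INSERT INTO stg_{table} ({col_list}) "
        f"SELECT {col_list} FROM src_{table};"
    )

scripts = []
for table, columns in TABLES.items():
    scripts.append(staging_ddl(table, columns))
    scripts.append(staging_load(table, columns))
```

Adding a table or column then means editing the metadata and regenerating, rather than hand-editing each package.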
ELT – Automation Friendly
OLAP - Online Analytical Processing
Analysis Services (SSAS) Options
Multidimensional mode
Tabular Mode
Adding Data Scientists / Uber Analysts to the Mix
They need to spend their time understanding what data means and not creating clean, integrated data sets
Providing for the Analysts – A Sandbox
Environment for:
Experiments
New ideas
Test hypotheses
*Minimally Governed
Analytics Sandbox
CAVEATS:
Traditional Data Warehouse Flow
[Flow diagram; recoverable labels: ERP, Applications, Audits, Staging, Reporting, Power BI, Analytics Sandbox]
Adapting to Changing Times
Old School DW
Very structured
Heavily designed
Fairly rigid; takes time to react to new data and analytic requests
4 V’s of Big Data
Volume
Log Files
Social Media
Click Stream
Device generated
Remote monitoring sensor
RFID
Spatial and GPS coordinates
Velocity
Hadoop
Map Reduce
Map
Split the data into pieces
Processed in parallel on individual nodes
Stores results locally
Reduce
Aggregate the data
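The map and reduce steps described above can be imitated on a single machine. The sketch below is a simplified illustration in plain Python, not the Hadoop API: threads stand in for cluster nodes, and the click-stream events and function names are made up for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def split(records, n_pieces):
    # Split the data into pieces, one per (simulated) node.
    return [records[i::n_pieces] for i in range(n_pieces)]

def map_piece(piece):
    # Map: each node counts the events in its own piece and keeps the
    # partial result locally.
    local = defaultdict(int)
    for event in piece:
        local[event] += 1
    return dict(local)

def reduce_counts(partials):
    # Reduce: aggregate the per-node results into one total per key.
    total = defaultdict(int)
    for partial in partials:
        for key, count in partial.items():
            total[key] += count
    return dict(total)

clicks = ["home", "cart", "home", "search", "cart", "home"]
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(map_piece, split(clicks, 3)))
totals = reduce_counts(partial_counts)
```

Each piece is processed independently, so adding nodes lets the map step scale out, while the reduce step only has to combine small partial results.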
HDFS
Azure HDInsight
Integrates with Excel, SQL Server Analysis Services, and SQL Server Reporting Services
Spark for Azure HDInsight
Supports
Batch and Interactive queries
Real-time streaming
Machine learning
Graph processing
Why incorporate Streaming Data?
• Manufacturing Process
• Financial trading
• Web analytics
• Operational Analytics
Hybrid Data Warehouse Environment
What’s in the Data Lake?
Archives
Operational data
Logs
Analytics Sandbox
Relational Databases Aren’t Standing Still…
SQL Server 2016 BI Features
Updatable nonclustered columnstore index support, with the columnstore index on an in-memory or on-disk rowstore table
Columnstore
VertiPaq Technology
Columnstore Index
Data is:
Compressed
Stored in column segments
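To see why storing data by column helps compression, the toy Python sketch below pivots rows into one list per column and run-length encodes each one. Run-length encoding is used here only as a simple stand-in; it is not a description of SQL Server's actual storage format, and the sample rows and column names are invented for illustration.

```python
from itertools import groupby

# Hypothetical sales rows: (sale_date, state, qty).
ROWS = [
    ("2015-08-01", "NM", 10),
    ("2015-08-01", "NM", 12),
    ("2015-08-01", "TX", 7),
    ("2015-08-02", "TX", 7),
]

def to_columns(rows, names):
    # Pivot row tuples into one list of values per column.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def run_length_encode(values):
    # Collapse each run of repeated values into a (value, run_length) pair.
    return [(value, len(list(run))) for value, run in groupby(values)]

columns = to_columns(ROWS, ["sale_date", "state", "qty"])
segments = {name: run_length_encode(vals) for name, vals in columns.items()}
```

Columns with few distinct values, such as `state`, collapse to a handful of pairs, and a query that touches only one column needs to read only that column's segment.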
Nonclustered Columnstore Index
Clustered Columnstore Index
Columnstore Index Use Case
SQL Server 2016 SSIS (ETL)
SQL Server 2016 – SSRS (Reporting)
FINALLY - improvements and additional features
Data Governance
Understand the New
What Does your DW look like?
References
The Microsoft Modern Data Warehouse
http://download.microsoft.com/download/C/2/D/C2D2D5FA-768A-49AD-8957-1A434C6C8126/The_Microsoft_Modern_Data_Warehouse_White_Paper.pdf
HDInsight
https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-introduction/
Biml
http://bimlscript.com/
References cont’d
Anchor Modeling
http://www.anchormodeling.com/
Las Cruces & El Paso SQL Server User Group
Meets Second Thursday of the Month at Noon