Data Warehouse Introduction
- OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). OLTP systems emphasize very fast query processing and data integrity in multi-access environments, and their effectiveness is measured in transactions per second. An OLTP database holds detailed, current data, and the schema used to store transactional data is the entity model (usually 3NF).
- OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the measure of effectiveness. OLAP applications are widely used with data mining techniques. An OLAP database holds aggregated, historical data, stored in multi-dimensional schemas (usually a star schema).
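The contrast between the two workloads can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under stated assumptions: the table names (dim_product, fact_sales), the columns and the sample rows are hypothetical, and both workloads run against one tiny in-memory database for brevity, whereas real OLTP and OLAP systems are normally separate.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a tiny, hypothetical star schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            category   TEXT NOT NULL
        );
        CREATE TABLE fact_sales (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            sale_year  INTEGER NOT NULL,
            amount     REAL NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?)",
        [(1, "books"), (2, "music")],
    )
    conn.executemany(
        "INSERT INTO fact_sales (product_id, sale_year, amount) VALUES (?, ?, ?)",
        [(1, 2008, 10.0), (1, 2009, 15.0), (2, 2009, 7.5)],
    )
    conn.commit()
    return conn


def record_sale(conn, product_id, sale_year, amount):
    """OLTP-style work: one short write transaction that touches a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO fact_sales (product_id, sale_year, amount) VALUES (?, ?, ?)",
            (product_id, sale_year, amount),
        )


def sales_by_category(conn):
    """OLAP-style work: a read-only query that joins and aggregates many rows."""
    return conn.execute("""
        SELECT p.category, s.sale_year, SUM(s.amount)
        FROM fact_sales AS s
        JOIN dim_product AS p ON p.product_id = s.product_id
        GROUP BY p.category, s.sale_year
        ORDER BY p.category, s.sale_year
    """).fetchall()
```

Calling record_sale many times per second corresponds to the OLTP pattern, while sales_by_category is the kind of aggregating query whose response time matters in OLAP.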
The following table summarizes the major differences between OLTP and OLAP system design.
OLTP: Can be relatively small if historical data is archived. Highly normalized, with many tables.
OLTP: Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability.
OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
Common kinds of Business Intelligence systems:
- EIS - Executive Information Systems
- DSS - Decision Support Systems
- MIS - Management Information Systems
- GIS - Geographic Information Systems
- OLAP - Online Analytical Processing and multidimensional analysis
- CRM - Customer Relationship Management
Business Intelligence systems are based on Data Warehouse technology. A Data Warehouse (DW) gathers information from a wide range of a company's operational systems, and Business Intelligence systems are built on top of it. Data loaded into the DW is usually well integrated and cleaned, which makes it possible to produce credible information that reflects the so-called 'one version of the truth'.
Business Intelligence tools
- Oracle - Siebel Business Analytics Applications
- SAS - Business Intelligence
- SAP - BusinessObjects XI
- IBM - Cognos 8 BI
- Oracle - Hyperion System 9 BI+
- Microsoft - Analysis Services
- MicroStrategy - Dynamic Enterprise Dashboards
- Pentaho - Open BI Suite
- Information Builders - WebFOCUS Business Intelligence
- QlikTech - QlikView
- TIBCO Spotfire - Enterprise Analytics
- Sybase - InfoMaker
- KXEN - IOLAP
- SPSS ShowCase
ETL tools
List of the most popular ETL tools:
- Informatica - PowerCenter
- IBM - WebSphere DataStage (formerly known as Ascential DataStage)
- SAP - BusinessObjects Data Integrator
- IBM - Cognos Data Manager (formerly known as Cognos DecisionStream)
- Microsoft - SQL Server Integration Services
- Oracle - Data Integrator (formerly known as Sunopsis Data Conductor)
- SAS - Data Integration Studio
- Oracle - Warehouse Builder
- Ab Initio
- Information Builders - Data Migrator
- Pentaho - Pentaho Data Integration
- Embarcadero Technologies - DT/Studio
- IKAN - ETL4ALL
- IBM - DB2 Warehouse Edition
- Pervasive - Data Integrator
- ETL Solutions Ltd. - Transformation Manager
- Group 1 Software (Sagent) - DataFlow
- Sybase - Data Integrated Suite ETL
- Talend - Talend Open Studio
- Expressor Software - Expressor Semantic Data Integration System
- Elixir - Elixir Repertoire
- OpenSys - CloverETL
ETL process
ETL (Extract, Transform and Load) is the process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse. ETL involves the following tasks:
- extracting the data from source systems (SAP, ERP, other operational systems); data from the different source systems is converted into one consolidated data warehouse format that is ready for transformation processing.
- transforming the data, which may involve the following tasks: applying business rules (so-called derivations, e.g., calculating new measures and dimensions); cleaning (e.g., mapping NULL to 0, or "Male" to "M" and "Female" to "F"); filtering (e.g., selecting only certain columns to load); splitting a column into multiple columns and vice versa; joining together data from multiple sources (e.g., lookup, merge); transposing rows and columns; applying any kind of simple or complex data validation (e.g., if the first 3 columns in a row are empty, then reject the row from processing).
- loading the data into a data warehouse, a data repository, or other reporting applications.
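The three tasks above can be sketched as a minimal Python pipeline. It is an illustration only, not how any of the listed ETL tools works: the column names (customer, gender, quantity, unit_price), the derived revenue measure and the use of plain lists as source and target are all assumptions made for the example.

```python
# Hypothetical cleaning rule and column selection, chosen for this sketch only.
GENDER_CODES = {"Male": "M", "Female": "F"}
KEPT_COLUMNS = ("customer", "gender", "quantity", "unit_price")


def extract(source_rows):
    """Extract: copy raw rows out of a source system (here an in-memory list)."""
    return [dict(row) for row in source_rows]


def is_valid(row):
    """Validation: reject a row whose first three columns are all empty."""
    first_three = list(row.values())[:3]
    return any(value not in (None, "") for value in first_three)


def transform(rows):
    """Transform: validate, filter columns, clean values, derive a new measure."""
    result = []
    for row in rows:
        if not is_valid(row):
            continue  # the rejected row is dropped from processing
        clean = {col: row.get(col) for col in KEPT_COLUMNS}  # filtering
        clean["gender"] = GENDER_CODES.get(clean["gender"], clean["gender"])  # cleaning
        clean["quantity"] = clean["quantity"] or 0      # map NULL to 0
        clean["unit_price"] = clean["unit_price"] or 0  # map NULL to 0
        clean["revenue"] = clean["quantity"] * clean["unit_price"]  # derivation
        result.append(clean)
    return result


def load(rows, warehouse_table):
    """Load: append the transformed rows to the target and report the row count."""
    warehouse_table.extend(rows)
    return len(rows)
```

A run would chain the steps, for example load(transform(extract(source_rows)), warehouse), where source_rows is a list of dictionaries and warehouse is the target list. A real pipeline would read from and write to databases or files rather than lists.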
source: http://www.sybase.com
Products included in the Sybase Business Intelligence platform:
- PowerDesigner
- WorkSpace
- Industry Warehouse Studio
- ASE Database
- Sybase IQ
- Data Integration Suite (Replication, Data Federation, Real-time Events, Sybase ETL)
- InfoMaker