History of Data Warehouse
A data warehouse is a blend of technologies and components which allows the strategic use of
data. It is a technique for collecting and managing data from varied sources to provide meaningful
business insights.
Advantages of Data Warehouse:
•A data warehouse allows business users to quickly access critical data from a number of sources, all in one place.
•A data warehouse helps to integrate many sources of data, reducing stress on the production system.
•A data warehouse helps to reduce the total turnaround time for analysis and reporting.
•Restructuring and integration make the data easier to use for reporting and analysis.
•Because critical data from a number of sources is available in a single place, users save the time of retrieving data from multiple sources.
•A data warehouse stores a large amount of historical data. This helps users analyze different time periods and trends to make future predictions.
Disadvantages of Data Warehouse:
•It is difficult to make changes in data types and ranges, or in the data source schema.
Data Warehouse Tools
There are many data warehousing tools available in the market. Here are some of the most prominent ones:
1. MarkLogic: MarkLogic is a useful data warehousing solution that makes data integration easier and faster using an array of enterprise features. This tool can perform very complex search operations and can query different types of data, such as documents, relationships, and metadata. http://developer.marklogic.com/products
2. Oracle: Oracle is the industry-leading database. It offers a wide range of data warehouse solutions, both on-premises and in the cloud. It helps to optimize customer experiences by increasing operational efficiency. https://www.oracle.com/index.html
3. Amazon Redshift: Amazon Redshift is a data warehouse tool. It is a simple and cost-effective tool for analyzing all types of data using standard SQL and existing BI tools. It also allows running complex queries against petabytes of structured data using query optimization. https://aws.amazon.com/redshift/?nc2=h_m1 A more complete list of data warehouse tools is available at https://ru99.-com/top-etl-etl-database-housing-tools.html
Operational Systems vs. Data Warehousing Systems
•Operational systems are designed to support high-volume transaction processing, whereas data warehousing systems are typically designed to support high-volume analytical processing (i.e., OLAP).
•Operational systems are usually concerned with current data, whereas data warehousing systems are usually concerned with historical data.
•Data within operational systems is updated regularly according to need, whereas a data warehouse is non-volatile: new data may be added regularly, but once added it is rarely changed.
•Operational systems are designed for real-time business dealings and processes, whereas data warehousing systems are designed for analysis of business measures by subject area, categories, and attributes.
•Operational systems are optimized for a simple set of transactions, generally adding or retrieving a single row at a time per table, whereas data warehousing systems are optimized for extensive loads and large, complex, unpredictable queries that access many rows per table (see the sketch after this list).
•Operational systems are widely process-oriented, whereas data warehousing systems are widely subject-oriented.
•Operational systems are usually optimized to perform fast inserts and updates of relatively small volumes of data, whereas data warehousing systems are usually optimized to perform fast retrievals of relatively high volumes of data.
•Relational databases are created for On-Line Transaction Processing (OLTP), whereas a data warehouse is designed for On-Line Analytical Processing (OLAP).
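To make that contrast concrete, here is a minimal sketch of the two query styles, assuming a hypothetical sales table with sale_id, product_id, region, sale_date, and amount columns (the table and column names are illustrative, not from any specific system): a single-row OLTP insert versus an OLAP aggregation that scans many rows.

```java
// Minimal sketch: the "sales" table and both statements below are illustrative
// assumptions, not taken from any particular product or schema.
public class QueryStyles {

    // OLTP style: touch a single row, keep the transaction short.
    static final String OLTP_INSERT =
        "INSERT INTO sales (sale_id, product_id, region, sale_date, amount) " +
        "VALUES (?, ?, ?, ?, ?)";

    // OLAP style: scan and aggregate many rows, grouped by subject-area attributes.
    static final String OLAP_ROLLUP =
        "SELECT region, EXTRACT(YEAR FROM sale_date) AS yr, SUM(amount) AS total " +
        "FROM sales " +
        "GROUP BY region, EXTRACT(YEAR FROM sale_date) " +
        "ORDER BY region, yr";

    public static void main(String[] args) {
        System.out.println("OLTP statement:\n" + OLTP_INSERT);
        System.out.println("OLAP statement:\n" + OLAP_ROLLUP);
    }
}
```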
What is a Data Warehouse?
A data warehouse is an information system that contains historical and cumulative data from single or multiple sources. It simplifies the reporting and analysis process of the organization. It is also a single version of truth for the company, used for decision making and forecasting.
Integrated
In a data warehouse, integration means the establishment of a common unit of measure for all similar data from dissimilar databases. The data also needs to be stored in the data warehouse in a common and universally acceptable manner. A data warehouse is developed by integrating data from varied sources such as mainframes, relational databases, flat files, etc. Moreover, it must keep consistent naming conventions, formats, and coding. This integration helps in effective analysis of data. Consistency in naming conventions, attribute measures, encoding structures, etc. has to be ensured.
Consider the following example. There are three different applications, labeled A, B, and C. The information stored in these applications includes Gender, Date, and Balance. However, each application stores its data in a different way.
•In Application A, the gender field stores logical values like M or F.
•In Application B, the gender field is a numerical value.
•In Application C, the gender field is stored in the form of a character value.
•The same is the case with Date and Balance.
However, after the transformation and cleaning process, all this data is stored in a common format in the data warehouse.
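As an illustrative sketch of that cleaning step (the exact source encodings and the normalizeGender helper are assumptions for this example, not part of any specific tool), each source's gender encoding can be mapped to one common code before loading:

```java
import java.util.Map;

// Minimal sketch of the integration/cleaning step described above.
// The source encodings and the target codes are illustrative assumptions.
public class GenderNormalizer {

    // Target convention chosen for the warehouse: "M" or "F".
    static String normalizeGender(String sourceSystem, String rawValue) {
        switch (sourceSystem) {
            case "A": // Application A already stores M / F
                return rawValue.trim().toUpperCase();
            case "B": // Application B stores numeric codes, e.g. 1 = male, 0 = female
                return rawValue.trim().equals("1") ? "M" : "F";
            case "C": // Application C stores character values, e.g. "male" / "female"
                return rawValue.trim().toLowerCase().startsWith("m") ? "M" : "F";
            default:
                throw new IllegalArgumentException("Unknown source: " + sourceSystem);
        }
    }

    public static void main(String[] args) {
        Map<String, String> samples = Map.of("A", "f", "B", "1", "C", "female");
        samples.forEach((system, value) ->
            System.out.printf("Source %s: %s -> %s%n",
                system, value, normalizeGender(system, value)));
    }
}
```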
Time-Variant
The time horizon for a data warehouse is quite extensive compared with operational systems. The data collected in a data warehouse is recognized with a particular period and offers information from the historical point of view. It contains an element of time, explicitly or implicitly. One place where data warehouse data displays time variance is in the structure of the record key. Every primary key contained within the data warehouse should have, either implicitly or explicitly, an element of time, such as the day, week, or month. Another aspect of time variance is that once data is inserted in the warehouse, it can't be updated or changed.
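As a small sketch of that idea (the fact-record layout below is a hypothetical example, not drawn from any specific schema), the time element can appear explicitly as part of the record key:

```java
// Hypothetical fact record whose key carries an explicit time element (dateKey).
// A row for a new day is a new record; existing rows are not updated in place.
public record SalesFact(int dateKey,      // e.g. 20240115 for 2024-01-15
                        int productKey,
                        int storeKey,
                        double amount) {

    public static void main(String[] args) {
        SalesFact jan = new SalesFact(20240115, 42, 7, 199.50);
        SalesFact feb = new SalesFact(20240215, 42, 7, 220.00);
        System.out.println(jan);
        System.out.println(feb); // same product and store, different time key
    }
}
```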
Non-volatile
A data warehouse is also non-volatile, meaning the previous data is not erased when new data is entered into it. Data is read-only and periodically refreshed. This also helps to analyze historical data and understand what happened and when. It does not require transaction processing, recovery, or concurrency control mechanisms. Activities like delete, update, and insert, which are performed in an operational application environment, are omitted in the data warehouse environment. Only two types of data operations are performed in the data warehouse: loading the data and accessing the data.
Here are some major differences between an operational application and a data warehouse:
•In an operational application, complex programs must be coded to make sure that data-upgrade processes maintain high integrity of the final product. In a data warehouse, this kind of issue does not happen because data updates are not performed.
Data from operational databases and external sources (such as user profile data provided by external consultants) are extracted using application program interfaces called gateways. A gateway is provided by the underlying DBMS and allows client programs to generate SQL code to be executed at a server.
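ODBC and JDBC are commonly cited examples of such gateways. Here is a minimal sketch of a client program using a JDBC gateway to send SQL to a server; the connection URL, credentials, and the staging_orders table are placeholder assumptions for this example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch of a client program that uses a JDBC gateway to have SQL
// executed at the server. URL, credentials, and table name are placeholders.
public class GatewayExtract {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://source-host:5432/operational_db"; // placeholder
        try (Connection conn = DriverManager.getConnection(url, "etl_user", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT customer_id, order_total FROM staging_orders")) {
            while (rs.next()) {
                System.out.printf("%d -> %.2f%n",
                    rs.getLong("customer_id"), rs.getDouble("order_total"));
            }
        }
    }
}
```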
1. A Relational OLAP (ROLAP) model, i.e., an extended relational DBMS that maps functions on multidimensional data to standard relational operations.
2. Operational metadata, which usually describes the currency level of the stored data (i.e., active, archived, or purged) and warehouse monitoring information (i.e., usage statistics, error reports, audit trails, etc.).
3. System performance data, which includes indices used to improve data access and retrieval performance.
5. Summarization algorithms, predefined queries, and reports; business data, which includes business terms and definitions, ownership information, etc.
•Simply load data and run: no need to define indexes, create partitions, etc.
•Fully managed: no tuning required.
•Fully elastic.
•Works with third-party BI tools.
•Integrates with Oracle cloud services: Analytics Cloud Service, Golden Gate Cloud Service, Integration Cloud Service, and others.
Snowflake:
Snowflake is a cloud data warehouse that can store and analyze all your data records in one place. It
can automatically scale up/down its compute resources to load, integrate, and analyze data.
Snowflake’s decoupled storage, compute, and services architecture enables the platform to automatically
deliver the optimal set of IO, memory, and CPU resources for each workload and usage scenario.
Snowflake uses a new multi-cluster, shared data architecture that decouples storage, compute resources, and
system services. Snowflake’s architecture has the following three components:
1. Storage: Snowflake uses a scalable cloud storage service to ensure a high degree of data replication,
scalability, and availability without much manual user intervention. It allows users to organize
information in databases, as per their needs.
2. Compute: Snowflake uses massively parallel processing (MPP) clusters to allocate compute resources
for tasks like loading, transforming, and querying data. It allows users to isolate workloads within
particular virtual warehouses. Users can also specify which databases in the storage layer a particular
virtual warehouse has access to.
3. Cloud services: Snowflake uses a set of services such as metadata management, security, access control, and infrastructure management. It allows users to communicate with client applications such as the Snowflake web user interface, JDBC, or ODBC, as sketched below.
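As an example of that client-access path, here is a minimal sketch of connecting through Snowflake's JDBC driver and running a query. The account URL, credentials, warehouse, database, schema, and table names are placeholder assumptions; consult the Snowflake JDBC documentation for the exact connection options.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

// Minimal sketch: connect to Snowflake over JDBC and run an analytical query.
// Account URL, credentials, warehouse, database, schema, and table are placeholders.
public class SnowflakeQuery {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("user", "ANALYST");            // placeholder
        props.put("password", "secret");         // placeholder
        props.put("warehouse", "ANALYTICS_WH");  // virtual warehouse (compute layer)
        props.put("db", "SALES_DB");             // database in the storage layer
        props.put("schema", "PUBLIC");

        String url = "jdbc:snowflake://myaccount.snowflakecomputing.com"; // placeholder
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT region, SUM(amount) FROM sales GROUP BY region")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getDouble(2));
            }
        }
    }
}
```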