Data Warehouses
management and retrieval. A data warehouse is a central repository for all or significant
parts of the data that an enterprise's various business systems collect, stored for archival,
analysis and security purposes. The term was coined by W. H. Inmon. IBM sometimes uses
the term "information warehouse."
A data warehouse usually runs on either a single computer or many computers (servers) tied
together to act as one large computer system. Traditionally, a data warehouse is housed on an
enterprise mainframe server. Data from various online transaction processing (OLTP)
applications and other sources is selectively extracted and organized in the data warehouse
database for use by analytical applications and user queries. Data warehousing emphasizes
the capture of data from diverse sources for useful analysis and access. Applications of data
warehouses include data mining, web mining, and decision support systems (DSS).
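The flow described above, in which data is selectively extracted from OLTP sources and
organized in the warehouse for analytical queries, can be sketched as follows. This is only a
minimal illustration under stated assumptions: the table names (orders, sales_fact), the
columns and the sample rows are all hypothetical, and SQLite in-memory databases stand in
for real operational and warehouse systems.

```python
import sqlite3

# Hypothetical OLTP source: one row per order, as an operational system might store it.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders (order_id INTEGER, region TEXT, amount_cents INTEGER, status TEXT)"
)
oltp.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "north", 1250, "paid"),
        (2, "North ", 800, "paid"),       # inconsistent spelling in the source
        (3, "south", 4000, "cancelled"),  # not wanted for analysis
        (4, "south", 600, "paid"),
    ],
)

# Hypothetical warehouse table, organized for analysis rather than transactions.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_fact (order_id INTEGER, region TEXT, amount REAL)")


def extract(conn):
    # Selective extraction: only paid orders are pulled from the source.
    return conn.execute(
        "SELECT order_id, region, amount_cents FROM orders WHERE status = 'paid'"
    ).fetchall()


def transform(rows):
    # Normalize region names and convert cents to currency units.
    return [(oid, region.strip().lower(), cents / 100) for oid, region, cents in rows]


def load(conn, rows):
    # Write the prepared rows into the warehouse table.
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()


load(warehouse, transform(extract(oltp)))

# An analytical query runs against the warehouse, not against the OLTP system.
totals = warehouse.execute(
    "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Keeping the analytical query on the warehouse side means reporting does not compete with
the transaction systems for resources, which is one common reason for separating the two.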
Data marts are smaller and less integrated data stores. A data mart might hold only
human resources records, or sales data for just one division.
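a) Characteristics: A data warehouse has four defining characteristics.
i) Subject Oriented: Subject oriented means that data is organized around the major subjects
of the business, such as customers, products or sales, rather than around individual
applications, so that related data is linked together.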
ii) Time Variant: Time variant means that data in the warehouse is tied to a point or period in
time, so any change can be tracked. Usually every change is stamped with a date and time and
with before and after values, so that you can show the changes throughout a period of time.
iii) Non Volatile: Non volatile means that once data is loaded it is not overwritten or erased
during normal operation. This protects your most crucial data. Because the data is retained,
you can continue to use it in later analysis.
iv) Integrated: The data is integrated, which means that a data warehouse uses data that is
organization-wide, brought into a consistent form, instead of from just one department.
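The time variant and non volatile properties above can be illustrated with an append-only
history table. This is a hedged sketch, not a prescribed design: the table customer_history,
its columns, and the helper functions record_change and value_as_of are hypothetical names
chosen for the example, and SQLite stands in for a real warehouse.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_history (
        customer_id INTEGER,
        attribute   TEXT,
        old_value   TEXT,   -- "before" value
        new_value   TEXT,   -- "after" value
        changed_at  TEXT    -- ISO 8601 timestamp in UTC
    )
    """
)


def record_change(customer_id, attribute, old_value, new_value, changed_at=None):
    # Non volatile: every change is appended; existing rows are never updated or deleted.
    changed_at = changed_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO customer_history VALUES (?, ?, ?, ?, ?)",
        (customer_id, attribute, old_value, new_value, changed_at),
    )


def value_as_of(customer_id, attribute, as_of):
    # Time variant: return the latest value recorded at or before the given timestamp.
    # Comparing timestamps as text works here because all use the same ISO format and offset.
    row = conn.execute(
        "SELECT new_value FROM customer_history "
        "WHERE customer_id = ? AND attribute = ? AND changed_at <= ? "
        "ORDER BY changed_at DESC LIMIT 1",
        (customer_id, attribute, as_of),
    ).fetchone()
    return row[0] if row else None


record_change(42, "city", None, "Pune", "2023-01-10T09:00:00+00:00")
record_change(42, "city", "Pune", "Mumbai", "2024-03-05T14:30:00+00:00")

print(value_as_of(42, "city", "2023-06-01T00:00:00+00:00"))
print(value_as_of(42, "city", "2025-01-01T00:00:00+00:00"))
```

Because nothing is overwritten, the same table answers both "what is the value now" and
"what was the value on a given date".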
b) Advantages: The data warehouse helps employees and other end users to access and use the
data for reports, analysis and decision making. Using the data in a warehouse, one can locate
trends, focus on relationships and understand more about the environment in which the
business operates.
Data warehouses also increase the consistency of the data and allow it to be checked over and
over to determine how relevant it is. Because most data warehouses are integrated, one can
pull data from many different areas of the business, for instance human resources, finance,
IT, accounting, etc.
c) Disadvantages: A data warehouse is time-consuming to create and to keep operating. Often
the current systems become incompatible with the data, so the hardware and software
need to be upgraded continuously. Finally, security can be a major concern, especially when
the data is accessible over an open network such as the internet. In such cases the data could be
viewed by a competitor or, worse, hacked and destroyed.
1. Which of the following processes includes data cleaning, data integration, data selection,
data transformation, data mining, pattern evaluation and knowledge presentation?
KDD process
ETL process
KTL process
MDX process
None of the above.
ANSWER : KDD process
Explanation : The KDD (knowledge discovery in databases) process includes data cleaning,
data integration, data selection, data transformation, data mining, pattern evaluation, and
knowledge presentation.
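The stages named in this explanation can be sketched as a small pipeline. This is a toy
illustration under stated assumptions, not a standard implementation: the purchase records,
the function names and the choice of counting co-occurring item pairs as the "mining" step
are all hypothetical. The sixth stage is implemented as pattern evaluation, filtering the
mined patterns by a minimum support threshold.

```python
from collections import Counter
from itertools import combinations

# Two hypothetical sources of purchase records: (customer, item).
store_sales = [("c1", "bread"), ("c1", "milk"), ("c2", "bread"), ("c2", None)]
web_sales = [("c2", "milk"), ("c3", "bread"), ("c3", "milk"), ("c3", "eggs")]


def clean(records):
    # Data cleaning: drop records with missing values.
    return [r for r in records if all(v is not None for v in r)]


def integrate(*sources):
    # Data integration: combine several sources into one collection.
    return [r for source in sources for r in source]


def select(records, items_of_interest):
    # Data selection: keep only records relevant to the analysis.
    return [r for r in records if r[1] in items_of_interest]


def transform(records):
    # Data transformation: reshape into one basket (set of items) per customer.
    baskets = {}
    for customer, item in records:
        baskets.setdefault(customer, set()).add(item)
    return baskets


def mine(baskets):
    # Data mining: count how many baskets contain each pair of items.
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    return counts


def evaluate(counts, n_baskets, min_support):
    # Pattern evaluation: keep pairs whose support meets the threshold.
    return {
        pair: count / n_baskets
        for pair, count in counts.items()
        if count / n_baskets >= min_support
    }


def present(patterns):
    # Knowledge presentation: format the surviving patterns for a reader.
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]


records = integrate(clean(store_sales), clean(web_sales))
baskets = transform(select(records, {"bread", "milk", "eggs"}))
report = present(evaluate(mine(baskets), len(baskets), min_support=0.5))
print(report)
```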
2. A warehouse architect is trying to determine what data must be included in the
warehouse. A meeting has been arranged with a business analyst to understand the data
requirements. Which of the following should be included in the agenda?
Number of users
Corporate objectives
Database design
Routine reporting
Budget.
ANSWER : Routine reporting
Explanation : The data modeling technique used for data marts is dimensional modeling.
4. Which of the following employs data mining techniques to analyze the intent of a
user query and provide additional generalized or associated information relevant to the
query?
Iceberg query method
Data analyzer
Intelligent query answering
DBA
Query parser.
ANSWER : Intelligent query answering
Explanation : A cluster is a collection of data objects that are similar to one another
within the same group.
8. Which of the following is not related to dimension table attributes?
Verbose
Descriptive
Equally unavailable
Complete
Indexed.
ANSWER : Equally unavailable
Explanation : The process of removing the deficiencies and loopholes in the data is
called cleaning up of data.
13. Which of the following is not a managing issue in the modeling process?
Content of primary units column
Document each candidate data source
Do regions report to zones
Walk through business scenarios
Ensure that the transaction edit flat is used for analysis.
ANSWER : Ensure that the transaction edit flat is used for analysis.
Explanation : "Ensure that the transaction edit flat is used for analysis" is not a
managing issue in the modeling process.
14. Which one manages both current and historic transactions?
OLTP
OLAP
Spreadsheet
XML
All of the above.
ANSWER : OLAP