Top 50 Datawarehousing Questions Answers
A datawarehouse is a repository of data used for management decision support systems. A datawarehouse consists of a wide variety of data that gives a high-level view of business conditions at a single point in time.
In a single sentence: it is a repository of integrated information that is available for queries and analysis.
Business Intelligence is also known as DSS (Decision Support System). It refers to the technologies, applications and practices for the collection, integration and analysis of business-related information or data. It also helps users see the information contained in the data itself.
A dimension table contains the attributes of the measurements stored in fact tables. It consists of hierarchies, categories and logic that can be used to traverse nodes.
A fact table contains the measurements of business processes, along with foreign keys to the dimension tables.
For example, the average number of bricks produced by one person/machine is a measure of the business process.
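As a rough illustration of the fact/dimension split, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table names (`dim_machine`, `fact_production`) and the rows are hypothetical. The fact table holds the measure and a foreign key; the dimension table supplies the descriptive label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes (hypothetical schema)
    CREATE TABLE dim_machine (
        machine_id   INTEGER PRIMARY KEY,
        machine_name TEXT,
        plant        TEXT
    );
    -- Fact table: the measure plus a foreign key to the dimension
    CREATE TABLE fact_production (
        machine_id      INTEGER REFERENCES dim_machine(machine_id),
        production_date TEXT,
        bricks_produced INTEGER
    );
    INSERT INTO dim_machine VALUES (1, 'Press A', 'North'), (2, 'Press B', 'South');
    INSERT INTO fact_production VALUES
        (1, '2024-01-01', 900), (1, '2024-01-02', 1100),
        (2, '2024-01-01', 700), (2, '2024-01-02', 800);
""")

# Average bricks per machine: the measure comes from the fact table,
# the machine name comes from the dimension table via the foreign key.
rows = conn.execute("""
    SELECT d.machine_name, AVG(f.bricks_produced)
    FROM fact_production AS f
    JOIN dim_machine AS d ON d.machine_id = f.machine_id
    GROUP BY d.machine_name
    ORDER BY d.machine_name
""").fetchall()
print(rows)
```

Running it prints `[('Press A', 1000.0), ('Press B', 750.0)]`.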
https://career.guru99.com/
Data mining is said to be the process of analyzing data from different dimensions or perspectives and summarizing it into useful information. The data can be queried and retrieved from the database in its own format.
7. What is OLTP?
8. What is OLAP?
OLTP                                 | OLAP
Data is from original data source    | Data is from various data sources
Simple queries by users              | Complex queries by system
Normalized small database            | De-normalized large database
Fundamental business tasks           | Multi-dimensional business tasks
ODS stands for Operational Data Store. It is a repository of real-time operational data rather than long-term trend data.
A view is a virtual table defined by the output of a query, and it can be used in place of a table.
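A minimal sketch of a view, assuming SQLite through Python's `sqlite3` module and a hypothetical `sales` table. Because the view stores only its query, a new row in the base table shows up the next time the view is read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('North', 100), ('North', 50), ('South', 70);
    -- A view stores the query text, not the rows it returns
    CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

def totals():
    """Query the view exactly as if it were a table."""
    return conn.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall()

before = totals()
conn.execute("INSERT INTO sales VALUES ('South', 30)")
after = totals()  # the view re-runs its query, so the new row is reflected
```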
A materialized view provides indirect access to table data by storing the results of a query in a separate schema object.
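SQLite has no materialized views, so the sketch below only emulates one: it stores a query result in an ordinary table (`mv_sales_by_region`, a hypothetical name) and rebuilds it on demand. This is an assumption-laden stand-in for databases that support materialized views natively. It shows the key difference from a plain view: the stored result is stale until it is refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('North', 100), ('South', 70);
""")

def refresh_mv(conn):
    """Full refresh: drop and rebuild the stored copy of the query result."""
    conn.executescript("""
        DROP TABLE IF EXISTS mv_sales_by_region;
        CREATE TABLE mv_sales_by_region AS
            SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
    """)

def read_mv(conn):
    return conn.execute("SELECT * FROM mv_sales_by_region ORDER BY region").fetchall()

refresh_mv(conn)
conn.execute("INSERT INTO sales VALUES ('South', 30)")
stale = read_mv(conn)   # stored result does not include the new row yet
refresh_mv(conn)
fresh = read_mv(conn)   # after the refresh it does
```

Reads are cheap because they hit precomputed rows, but the result can lag behind the base table between refreshes.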
ETL stands for Extract, Transform and Load. ETL software reads data from the specified data source and extracts a desired subset of it. Next, it transforms the data using rules and lookup tables to convert it to the desired state.
Then, the load function writes the resulting data to the target database.
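The three ETL steps can be sketched in plain Python. This is a minimal sketch, not a production pipeline. The CSV source, the `COUNTRY_LOOKUP` table, the reject rule and the `orders` target table are all hypothetical, and the target is an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; the note column is not wanted in the warehouse.
SOURCE = """order_id,country,amount_cents,note
1,US,1250,gift
2,DE,990,
3,XX,500,test order
"""
# Hypothetical lookup table used during the transform step.
COUNTRY_LOOKUP = {"US": "United States", "DE": "Germany"}

def extract(text):
    """Read the source and keep only the desired subset of columns."""
    wanted = ("order_id", "country", "amount_cents")
    return [{k: row[k] for k in wanted} for row in csv.DictReader(io.StringIO(text))]

def transform(rows):
    """Map country codes via the lookup table, convert cents to currency units,
    and drop rows whose code is not in the lookup table."""
    out = []
    for r in rows:
        country = COUNTRY_LOOKUP.get(r["country"])
        if country is None:
            continue  # rule: reject unknown country codes
        out.append((int(r["order_id"]), country, int(r["amount_cents"]) / 100))
    return out

def load(rows, conn):
    """Write the transformed rows to the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

target = sqlite3.connect(":memory:")
load(transform(extract(SOURCE)), target)
loaded = target.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
```

Keeping each step a separate function makes it easy to test transformation rules without touching the source or target systems.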
VLDB stands for Very Large Database, and its size is said to be more than one terabyte. These are decision support systems used to serve a large number of users.
Real-time datawarehousing captures business data as it occurs. As soon as a business activity is completed, its data is available in the flow and becomes available for use instantly.
Aggregate tables contain existing warehouse data that has been grouped to a certain level of dimensions. It is easier to retrieve data from an aggregated table than from the original table, which has many more records.
These tables reduce the load on the database server and improve query performance.
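A minimal sketch of an aggregate table, assuming SQLite via Python's `sqlite3` and a hypothetical `fact_sales` detail table. The aggregate groups the detail rows to the month and product level, so a monthly report reads fewer rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Detail-level fact rows (hypothetical data): one row per sale
    CREATE TABLE fact_sales (sale_date TEXT, product TEXT, amount REAL);
    INSERT INTO fact_sales VALUES
        ('2024-01-05', 'Bricks', 10), ('2024-01-20', 'Bricks', 15),
        ('2024-02-02', 'Bricks', 20), ('2024-01-07', 'Tiles', 5);
    -- Aggregate table: the same data grouped to month x product
    CREATE TABLE agg_sales_month AS
        SELECT substr(sale_date, 1, 7) AS month, product,
               SUM(amount) AS total, COUNT(*) AS n_sales
        FROM fact_sales
        GROUP BY month, product;
""")

detail_rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
agg_rows = conn.execute("SELECT COUNT(*) FROM agg_sales_month").fetchone()[0]
# A monthly report now reads the small table instead of scanning the detail rows
jan_bricks = conn.execute(
    "SELECT total FROM agg_sales_month WHERE month = '2024-01' AND product = 'Bricks'"
).fetchone()[0]
```

With four detail rows the saving is trivial. The same pattern pays off when millions of detail rows collapse into a few thousand aggregate rows.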
A factless fact table is a fact table that does not contain any numeric fact column.
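A common textbook case is an attendance table. The sketch below is hypothetical (SQLite via Python's `sqlite3`, made-up keys). Each row records only that an event happened, so questions are answered by counting rows instead of summing a measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Factless fact table: foreign keys only, no numeric measure column.
    -- Each row records that a student attended a class on a date.
    CREATE TABLE fact_attendance (
        student_id INTEGER,
        class_id   INTEGER,
        date_key   TEXT
    );
    INSERT INTO fact_attendance VALUES
        (1, 10, '2024-03-01'), (2, 10, '2024-03-01'),
        (1, 11, '2024-03-01'), (1, 10, '2024-03-02');
""")

# Count the event rows per class rather than summing a measure.
attendance_per_class = conn.execute("""
    SELECT class_id, COUNT(*) FROM fact_attendance
    GROUP BY class_id ORDER BY class_id
""").fetchall()
```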
Time dimensions are usually loaded with all possible dates in a year, which can be done through a program. For example, 100 years can be represented with one row per day, roughly 36,500 rows.
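Such a program might look like the following minimal sketch. The column set (`date_key`, `quarter`, `iso_weekday` and so on) is an assumption and not a standard. It uses only Python's `datetime` module and produces one row per calendar day.

```python
from datetime import date, timedelta

def build_date_dimension(start_year, n_years):
    """Return one row per calendar day from Jan 1 of start_year up to
    (but not including) Jan 1 of start_year + n_years."""
    rows = []
    day = date(start_year, 1, 1)
    end = date(start_year + n_years, 1, 1)
    while day < end:
        rows.append({
            "date_key": day.year * 10000 + day.month * 100 + day.day,  # e.g. 20240131
            "date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "iso_weekday": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
    return rows

dim_date = build_date_dimension(2000, 100)
```

For 2000 through 2099 this yields 36,525 rows (100 × 365 plus 25 leap days). The integer `date_key` keeps joins from fact tables cheap.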
Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table. The same facts can still be useful if there are changes in the dimensions.
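A ratio such as a profit-margin percentage is a typical non-additive fact. The sketch below uses hypothetical store figures in plain Python. Summing the percentage across stores is meaningless. The usual approach, assumed here, is to aggregate the additive components and then recompute the ratio.

```python
# Hypothetical store-level facts: a ratio (non-additive) and its components (additive).
stores = [
    {"store": "A", "sales": 200.0, "profit": 50.0},   # margin 25%
    {"store": "B", "sales": 800.0, "profit": 80.0},   # margin 10%
]
for s in stores:
    s["margin_pct"] = 100 * s["profit"] / s["sales"]

# Wrong: summing the ratio across the store dimension gives a meaningless number.
summed_margin = sum(s["margin_pct"] for s in stores)

# Right: sum the additive components first, then recompute the ratio.
total_margin = 100 * sum(s["profit"] for s in stores) / sum(s["sales"] for s in stores)
```

Here `summed_margin` is 35.0, while the true combined margin `total_margin` is 13.0. Even a simple average of the two percentages (17.5) would be wrong.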
A conformed fact is a fact (measure) that can be used across multiple data marts in combination with multiple fact tables.
A datawarehouse is where all the data is stored for analysis, whereas OLAP is used to analyze that data, manage aggregations and partition information into lower-level detail.
24. What are the key columns in Fact and dimension tables?
Foreign keys of dimension tables are primary keys of entity tables. Foreign keys of fact tables
are the primary keys of the dimension tables.
SCD stands for Slowly Changing Dimension, and it applies to cases where a record's attributes change over time.
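One widely used way to handle such changes is SCD Type 2, which keeps history by versioning rows. The text above does not name a specific type, so this choice is an assumption. The sketch below is plain Python over an in-memory list, with hypothetical column names. A change closes the current row and appends a new one with a new surrogate key, while the natural key stays the same.

```python
from datetime import date

# Hypothetical customer dimension (one current row per natural key).
dim_customer = [
    {"sk": 1, "customer_id": "C001", "city": "Boston",
     "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True},
]

def apply_scd2(dim, customer_id, new_city, change_date):
    """Record a city change as a new version (SCD Type 2)."""
    current = next(r for r in dim if r["customer_id"] == customer_id and r["is_current"])
    if current["city"] == new_city:
        return  # nothing changed; keep the current row
    current["valid_to"] = change_date      # close the old version
    current["is_current"] = False
    dim.append({
        "sk": max(r["sk"] for r in dim) + 1,   # next surrogate key
        "customer_id": customer_id,             # natural key is unchanged
        "city": new_city,
        "valid_from": change_date, "valid_to": None, "is_current": True,
    })

apply_scd2(dim_customer, "C001", "Chicago", date(2023, 6, 1))
```

Facts loaded before the change keep pointing at surrogate key 1 (Boston), and later facts point at key 2 (Chicago). Both versions therefore remain reportable.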
A BUS schema consists of a suite of conformed dimensions and standardized definitions for the fact tables.
A star schema is a way of organizing tables so that results can be retrieved from the database quickly in the data warehouse environment.
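A minimal star-schema sketch, assuming SQLite via Python's `sqlite3` and hypothetical tables. One central fact table references two dimensions, each of which is a single join away. This shape keeps typical queries short and fast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star schema: one central fact table, each dimension one join away.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE fact_sales (product_key INTEGER, date_key INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Red brick', 'Bricks'), (2, 'Roof tile', 'Tiles');
    INSERT INTO dim_date VALUES (20240105, 2024, 1), (20240210, 2024, 2);
    INSERT INTO fact_sales VALUES (1, 20240105, 10), (1, 20240210, 20), (2, 20240105, 5);
""")

# Typical star query: join the fact to its dimensions, filter and group on attributes.
result = conn.execute("""
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    WHERE d.year = 2024
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
```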
A snowflake schema has a primary dimension table to which one or more dimensions can be joined. The primary dimension table is the only table that can be joined with the fact table.
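For contrast with a star layout, this hypothetical snowflake sketch (SQLite via Python's `sqlite3`) normalizes the category out of the product dimension. The fact table joins only to `dim_product`, the primary dimension, and `dim_category` is reached through it at the cost of one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflake: the product dimension is normalized; category has its own table.
    CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT,
                              category_key INTEGER REFERENCES dim_category(category_key));
    CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product(product_key),
                             amount REAL);
    INSERT INTO dim_category VALUES (1, 'Bricks'), (2, 'Tiles');
    INSERT INTO dim_product VALUES (1, 'Red brick', 1), (2, 'Fire brick', 1), (3, 'Roof tile', 2);
    INSERT INTO fact_sales VALUES (1, 10), (2, 15), (3, 5);
""")

# fact_sales -> dim_product (primary dimension) -> dim_category
result = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```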
A core dimension is a dimension table that is dedicated to a single fact table or datamart.
As the name implies, data cleaning is a self-explanatory term: it means cleaning up orphan records, data that breaches business rules, inconsistent data and missing information in a database.
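Before records can be cleaned, they have to be found. This minimal sketch (SQLite via Python's `sqlite3`, hypothetical tables and data) detects three of the problem types listed above. The non-negative-amount rule is an assumed business rule for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE fact_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'Ann', 'ann@example.com'), (2, 'Bob', NULL);
    INSERT INTO fact_orders VALUES (100, 1, 20), (101, 2, -5), (102, 99, 30);
""")

# Orphan records: fact rows whose foreign key has no matching dimension row.
orphans = conn.execute("""
    SELECT f.order_id FROM fact_orders f
    LEFT JOIN dim_customer c ON c.customer_id = f.customer_id
    WHERE c.customer_id IS NULL
""").fetchall()

# Business-rule breach (hypothetical rule: amounts must be non-negative).
rule_breaches = conn.execute(
    "SELECT order_id FROM fact_orders WHERE amount < 0").fetchall()

# Missing information: a required attribute left empty.
missing_email = conn.execute(
    "SELECT customer_id FROM dim_customer WHERE email IS NULL OR email = ''").fetchall()
```

The flagged rows can then be corrected, defaulted, or rejected, depending on the warehouse's own rules.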
Metadata is defined as data about data. It contains information such as the number of columns used, fixed-width versus delimited layout, the ordering of fields and the data types of the fields.
In datawarehousing, loops can exist between tables. If there is a loop between tables, query generation takes more time and becomes ambiguous, so it is advised to avoid loops between tables.
Yes, a dimension table can have numeric values, because they are descriptive elements of the business.
Cubes are logical representations of multidimensional data. The edges of the cube hold the dimension members, and the body of the cube contains the data values.
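To make the edge/body picture concrete, this plain-Python sketch builds a tiny cube from hypothetical fact rows. Each key is a tuple of dimension members (the edges), and each value is an aggregated cell (the body). Using the label `"ALL"` for a rolled-up dimension is an assumption borrowed from the idea behind SQL's `CUBE` grouping.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (region, product, quarter, sales value).
facts = [
    ("North", "Bricks", "Q1", 10), ("North", "Tiles", "Q1", 4),
    ("South", "Bricks", "Q1", 7),  ("South", "Bricks", "Q2", 9),
]

def build_cube(rows):
    """Pre-aggregate every combination of dimension members, using 'ALL'
    where a dimension is rolled up."""
    cube = defaultdict(int)
    for region, prod, quarter, value in rows:
        # Each fact contributes to 2 x 2 x 2 = 8 cells (member or ALL per dimension).
        for key in product((region, "ALL"), (prod, "ALL"), (quarter, "ALL")):
            cube[key] += value
    return cube

cube = build_cube(facts)
```

For example, `cube[("South", "Bricks", "ALL")]` is 16 (South bricks across all quarters), and `cube[("ALL", "ALL", "ALL")]` is the grand total of 30.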
Dimensional modeling is a concept that datawarehouse designers can use to build their own datawarehouse. The model is stored in two types of tables: fact tables and dimension tables.
The fact table holds the facts and measurements of the business, and the dimension table holds the context of those measurements.
There are three types of Dimensional Modeling and they are as follows:
Conceptual Modeling
Logical Modeling
Physical Modeling
A surrogate key is a substitute for the natural primary key. It is a unique identifier for each row that can be used as the primary key of a table.
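A minimal sketch, assuming SQLite via Python's `sqlite3`. The warehouse generates `customer_sk` as the surrogate key, while `customer_code` is the natural key carried over from a hypothetical source system. Both column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# customer_sk: surrogate key generated by the warehouse, meaningless outside it.
# customer_code: natural key from the (hypothetical) source system.
conn.execute("""
    CREATE TABLE dim_customer (
        customer_sk   INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_code TEXT NOT NULL,
        name          TEXT
    )
""")
for code, name in [("CUST-ABC", "Ann"), ("CUST-XYZ", "Bob")]:
    conn.execute("INSERT INTO dim_customer (customer_code, name) VALUES (?, ?)", (code, name))

keys = conn.execute(
    "SELECT customer_sk, customer_code FROM dim_customer ORDER BY customer_sk").fetchall()
```

Fact tables then reference the compact integer key. Changes to the source system's codes do not ripple through the warehouse.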
ER modeling has both a logical and a physical model, whereas dimensional modeling has only a physical model.
ER modeling is used to normalize the OLTP database design, whereas dimensional modeling is used to de-normalize the ROLAP and MOLAP design.
Enterprise Datawarehousing
Operational Data Store
Data Mart
1. Start an Instance
2. Mount the database
3. Open the database
A partial backup in an operating system is any backup short of a full backup, and it can be done while the database is open or shut down.
The goal of the optimizer is to find the most efficient way to execute SQL statements.
An execution plan is the combination of steps that the optimizer selects to execute a statement.
48. What are the approaches used by Optimizer during execution plan?
1. Rule Based
2. Cost Based
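The answers above are phrased in Oracle terms. As a stand-in that runs anywhere, this sketch uses SQLite, whose query planner makes cost-based choices informed by `ANALYZE` statistics when they are present. It asks for the execution plan of the same query before and after an index exists. The table and index names are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 100, i * 1.5) for i in range(1000)])

QUERY = "SELECT * FROM orders WHERE customer_id = 42"

def plan(sql):
    """Join the human-readable detail column (last field) of each plan row."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_without_index = plan(QUERY)   # only option: scan the whole table
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
conn.execute("ANALYZE")            # gather statistics for the planner
plan_with_index = plan(QUERY)      # planner can now choose an index lookup
```

Printing the two strings typically shows a `SCAN` step for the first plan and a `SEARCH ... USING INDEX` step for the second.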
Informatica
Data Stage
Oracle Warehouse Builder
Ab Initio
Data Junction