Data Dictionary

Data warehousing Basics

1. What is the definition of a data warehouse?

A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile
collection of data in support of management's decision-making process.

Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more
about your company's sales data, you can build a warehouse that concentrates on
sales. Using this warehouse, you can answer questions like "Who was our best
customer for this item last year?" This ability to define a data warehouse by
subject matter (sales, in this case) makes the data warehouse subject oriented.
Integrated
Integration is closely related to subject orientation. Data warehouses must put
data from disparate sources into a consistent format. They must resolve such
problems as naming conflicts and inconsistencies among units of measure. When they
achieve this, they are said to be integrated.
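As a rough illustration of integration, the sketch below maps two hypothetical source systems onto one format. The source names (CRM_ROWS, ERP_ROWS), field names, and the target layout are assumptions made for this example only; they resolve one naming conflict (cust_id vs. customer_no) and one unit conflict (kilograms vs. pounds).

```python
# Hypothetical records from two source systems. They name the customer
# key differently and report weight in different units.
CRM_ROWS = [{"cust_id": 1, "weight_kg": 2.0}]
ERP_ROWS = [{"customer_no": 2, "weight_lb": 10.0}]

LB_TO_KG = 0.45359237  # one international pound expressed in kilograms


def integrate(crm_rows, erp_rows):
    """Map both sources onto one layout: customer_id, weight_kg."""
    unified = []
    for row in crm_rows:
        # Naming conflict: cust_id becomes customer_id.
        unified.append({"customer_id": row["cust_id"],
                        "weight_kg": row["weight_kg"]})
    for row in erp_rows:
        # Naming conflict and unit conflict: pounds are converted to kilograms.
        unified.append({"customer_id": row["customer_no"],
                        "weight_kg": round(row["weight_lb"] * LB_TO_KG, 3)})
    return unified


integrated_rows = integrate(CRM_ROWS, ERP_ROWS)
```

After this step every row uses the same key name and the same unit, which is what "consistent format" means in practice.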
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change.
This is logical because the purpose of a warehouse is to enable you to analyze what
has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of data. This
is very much in contrast to online transaction processing (OLTP) systems, where
performance requirements demand that historical data be moved to an archive. A data
warehouse's focus on change over time is what is meant by the term time variant.
2. How many stages are there in data warehousing?
A data warehouse generally includes two stages:
• ETL
• Report generation
ETL
ETL is short for extract, transform, load: three database functions that are
combined into one tool.
• Extract -- the process of reading data from a source database.
• Transform -- the process of converting the extracted data from its previous
form into the required form.
• Load -- the process of writing the data into the target database.

ETL is used to migrate data from one database to another, to form data marts
and data warehouses, and to convert databases from one format to another.
It retrieves data from various operational databases, transforms it into useful
information, and finally loads it into the data warehouse.
Common ETL tools:
1. Informatica
2. Ab Initio
3. DataStage
4. BODI (Business Objects Data Integrator)
5. Oracle Warehouse Builder
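The three ETL steps can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration, not how any of the tools listed above work internally; the table names (orders, fact_orders), the columns, and the transformation rules are assumptions made for the example.

```python
import sqlite3


def run_etl(source, target):
    """Copy orders from a source database into a target table."""
    # Extract: read the rows from the source database.
    rows = source.execute(
        "SELECT id, name, amount_cents FROM orders").fetchall()
    # Transform: tidy the names and convert cents to currency units.
    transformed = [(order_id, name.strip().upper(), cents / 100.0)
                   for order_id, name, cents in rows]
    # Load: write the converted rows into the target database.
    target.executemany(
        "INSERT INTO fact_orders (order_id, customer_name, amount) "
        "VALUES (?, ?, ?)", transformed)
    target.commit()
    return len(transformed)


# Hypothetical operational (source) database.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER, name TEXT, amount_cents INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, " alice ", 1250), (2, "bob", 300)])

# Hypothetical warehouse (target) database.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE fact_orders "
               "(order_id INTEGER, customer_name TEXT, amount REAL)")

loaded = run_etl(source, target)
```

Keeping extract, transform, and load as separate steps inside one function mirrors the definition above: each step can be changed without touching the other two.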
Report generation
Report generation uses OLAP (online analytical processing).
OLAP is a set of specifications that allows client applications to retrieve
data for analytical processing.
An OLAP tool is a specialized tool that sits between a database and the user in
order to provide various analyses of the data stored in the database.
It is a reporting tool that generates reports useful for decision support by
top-level management.
Common OLAP tools:
1. Business Objects
2. Cognos
3. MicroStrategy
4. Hyperion
5. Oracle Express
6. Microsoft Analysis Services

• Difference between OLTP and OLAP

No. | OLTP | OLAP
1 | Application oriented (e.g., a purchase order is a function of an application) | Subject oriented (subjects such as customer, product, item, time)
2 | Used to run the business | Used to analyze the business
3 | Detailed data | Summarized data
4 | Repetitive access | Ad hoc access
5 | Few records accessed at a time (tens), simple queries | Large volumes accessed at a time (millions), complex queries
6 | Small database | Large database
7 | Current data | Historical data
8 | Clerical user | Knowledge user
9 | Row-by-row loading | Bulk loading
10 | Time invariant | Time variant
11 | Normalized data | Denormalized data
12 | E-R schema | Star schema
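The difference in access pattern (rows 3 to 5) can be seen by running both kinds of query against the same data. The sketch below uses Python's sqlite3 module with a hypothetical sales table; the table, columns, and values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, "
             "customer TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales (customer, year, amount) VALUES (?, ?, ?)",
    [("acme", 2022, 100.0), ("acme", 2023, 150.0), ("zenith", 2023, 80.0)])

# OLTP style: a simple query that fetches one detailed record by its key.
oltp_row = conn.execute(
    "SELECT customer, amount FROM sales WHERE sale_id = ?", (2,)).fetchone()

# OLAP style: a query that scans many records and returns summarized data.
olap_rows = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()
```

The first query returns a single detailed row; the second returns one summarized row per year. With millions of rows the second kind of query dominates warehouse workloads.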

3. What are the types of data warehousing?


EDW (Enterprise data warehouse)
• It provides a central database for decision support throughout the enterprise.
• It is a collection of data marts.
Data mart
• It is a subset of a data warehouse.
• It is a subject-oriented database that supports the needs of individual
departments in an organization.
• It is called a high-performance query structure.
• It supports a particular line of business, such as sales or marketing.
ODS (Operational data store)
• It is defined as an integrated view of operational databases designed to
support operational monitoring.
• It is a collection of operational data sources designed to support
transaction processing.
• Data is refreshed in near real time and used for business activity.
• It is an intermediate layer between OLTP and OLAP that helps to create
instant reports.

5. What are the types of approach in DWH?


Bottom-up approach: first develop the data marts, then integrate these data
marts into the EDW.
Top-down approach: first develop the EDW, then develop the data marts from that
EDW.
Bottom up
OLTP -> ETL -> Data mart -> DWH -> OLAP
Top down
OLTP -> ETL -> DWH -> Data mart -> OLAP
Top down
• The cost of initial planning and design is high.
• It takes longer, typically more than a year.
Bottom up
• Data marts can be planned and designed without waiting for the global
warehouse design.
• The data marts give immediate results.
• It tends to take less time to implement.
• Errors in critical modules are detected earlier.
• Benefits are realized in the early phases.
• It is often considered the best approach.
Data Modeling Types:
• Conceptual data modeling
• Logical data modeling
• Physical data modeling
• Dimensional data modeling
1. Conceptual Data Modeling
• A conceptual data model includes all major entities and relationships. It does
not contain much detail about attributes and is often used in the initial
planning phase.
• A conceptual data model is created by gathering business requirements from
various sources, such as business documents and discussions with functional
teams, business analysts, subject matter experts, and the end users who do the
reporting on the database. Data modelers create the conceptual data model and
forward it to the functional team for review.
• Conceptual data modeling gives the functional and technical teams an idea of
how business requirements will be projected in the logical data model.

2. Logical Data Modeling
• This is the actual implementation and extension of a conceptual data model.
A logical data model includes all required entities, attributes, key groups, and
relationships that represent business information and define business rules.
3. Physical Data Modeling
• A physical data model includes all the tables, columns, relationships, and
database properties required for the physical implementation of databases.
Database performance, indexing strategy, physical storage, and denormalization
are important parameters of a physical model.
Logical vs. Physical Data Modeling
Logical Data Model | Physical Data Model
Represents business information and defines business rules | Represents the physical implementation of the model in a database
Entity | Table
Attribute | Column
Primary Key | Primary Key Constraint
Alternate Key | Unique Constraint or Unique Index
Inversion Key Entry | Non Unique Index
Rule | Check Constraint, Default Value
Relationship | Foreign Key
Definition | Comment
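The mapping in the table can be made concrete with a small physical model. The sketch below uses Python's sqlite3 module; the customer and purchase tables and all column names are hypothetical, and SQLite has no COMMENT statement, so the definition is carried as an SQL comment instead.

```python
import sqlite3

DDL = """
-- Definition -> comment: a customer is a party that places purchases.
CREATE TABLE customer (                       -- entity -> table
    customer_id INTEGER PRIMARY KEY,          -- primary key -> PK constraint
    email       TEXT NOT NULL UNIQUE,         -- alternate key -> unique constraint
    last_name   TEXT NOT NULL,                -- attribute -> column
    status      TEXT NOT NULL DEFAULT 'active'
                CHECK (status IN ('active', 'closed'))  -- rule -> check, default
);
-- Inversion key entry -> non-unique index.
CREATE INDEX ix_customer_last_name ON customer (last_name);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    -- Relationship -> foreign key.
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(DDL)
conn.execute("INSERT INTO customer (customer_id, email, last_name) "
             "VALUES (1, 'a@example.com', 'Lee')")


def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False
```

Calling violates() with a duplicate email, an unknown status, or a purchase for a missing customer returns True, which shows the business rules from the logical model being enforced by the physical one.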
Dimensional Data Modeling
• A dimensional model consists of fact and dimension tables.
• It is an approach to developing database schema designs.
Types of Dimensional Modeling
• Star schema
• Snowflake schema
• Starflake schema (or hybrid schema)
• Multi-star schema
What is Star Schema?
• A star schema is a logical database design that contains a centrally located
fact table surrounded by one or more dimension tables.
• It is called a star schema because the database design looks like a star.
• A dimension table contains primary keys and textual descriptions.
• It contains denormalized business information.
• A fact table contains a composite key and measures.
• The measures are key performance indicators used to evaluate enterprise
performance in terms of success and failure.
• Examples: total revenue, product sales, discount given, number of customers.
• To be meaningful, a report should draw on at least one dimension table and one
fact table.
Advantages of star schema
• Fewer joins
• Improved query performance
• Slicing down
• Data is easy to understand.
Disadvantage:
• Requires more storage space

Example of Star Schema:
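As a stand-in illustration, here is a minimal star schema sketched with Python's sqlite3 module. The four dimensions (product, organization, location, time) follow the names used in these notes; the sales_fact table, every column name, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimension tables: each hierarchy level is just a column.
CREATE TABLE product_dim      (product_id  INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE organization_dim (org_id      INTEGER PRIMARY KEY, org_name TEXT, branch TEXT);
CREATE TABLE location_dim     (location_id INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE time_dim         (time_id     INTEGER PRIMARY KEY, day TEXT, month TEXT);

-- Central fact table: a composite key of dimension keys plus the measures.
CREATE TABLE sales_fact (
    product_id  INTEGER REFERENCES product_dim (product_id),
    org_id      INTEGER REFERENCES organization_dim (org_id),
    location_id INTEGER REFERENCES location_dim (location_id),
    time_id     INTEGER REFERENCES time_dim (time_id),
    revenue     REAL,
    discount    REAL,
    PRIMARY KEY (product_id, org_id, location_id, time_id)
);

INSERT INTO product_dim VALUES (1, 'Pen', 'Stationery'), (2, 'Desk', 'Furniture');
INSERT INTO organization_dim VALUES (1, 'Retail', 'Downtown');
INSERT INTO location_dim VALUES (1, 'Austin', 'TX');
INSERT INTO time_dim VALUES (1, '2023-01-15', '2023-01'), (2, '2023-02-10', '2023-02');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 10.0, 0.0),
    (2, 1, 1, 1, 200.0, 20.0),
    (1, 1, 1, 2, 15.0, 0.0);
""")

# One join from the fact table reaches the category, because the
# hierarchy is stored inside the product dimension itself.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

The report query needs only one join per dimension it uses, which is the "fewer joins" advantage; the cost is that category text is repeated on every product row.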


Snowflake Schema
• A snowflake schema is a star schema in which the dimension tables are split
into one or more further dimension tables.
• The denormalized dimension tables are split into normalized dimension
tables.
Example of Snowflake Schema:

• In the snowflake schema example, the diagram has 4 dimension tables, 4
lookup tables, and 1 fact table. The reason is that the hierarchies (category,
branch, state, and month) are broken out of the dimension tables (PRODUCT,
ORGANIZATION, LOCATION, and TIME), respectively, into separate tables.
• It increases the number of joins, which leads to poorer performance when
retrieving data.
• A few organizations normalize the dimension tables to save space.
• Because dimension tables take up relatively little space, the snowflake
schema approach may be avoided.
• Bitmap indexes cannot be used effectively.
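The extra join can be shown by breaking one hierarchy out of one dimension. In the sketch below, written with Python's sqlite3 module, the category is moved from the product dimension into a lookup table. The table and column names and the sample rows are assumptions for illustration, and the fact table is reduced to a single dimension key to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Lookup table: the category hierarchy broken out of the product dimension.
CREATE TABLE category_lookup (category_id INTEGER PRIMARY KEY, category_name TEXT);

-- Normalized dimension: stores a key to the lookup instead of the text.
CREATE TABLE product_dim (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id  INTEGER REFERENCES category_lookup (category_id)
);

CREATE TABLE sales_fact (
    product_id INTEGER REFERENCES product_dim (product_id),
    revenue    REAL
);

INSERT INTO category_lookup VALUES (1, 'Stationery'), (2, 'Furniture');
INSERT INTO product_dim VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Desk', 2);
INSERT INTO sales_fact VALUES (1, 10.0), (2, 5.0), (3, 200.0);
""")

# Reaching the category now takes two joins: fact -> dimension -> lookup.
revenue_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_id = f.product_id
    JOIN category_lookup AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```

Each category name is now stored once, which saves a little space, but every report that groups by category pays for the additional join.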
