Data Warehousing - Metadata Concepts

The document discusses metadata concepts in data warehousing. Metadata is defined as data about data and acts as a roadmap and directory for a data warehouse. Metadata is categorized into business, technical, and operational metadata. Metadata plays an important role by helping locate data warehouse contents and enabling mapping, summarization, and query, extraction, and reporting tools. Challenges for metadata management include metadata being scattered across sources and lack of industry standards.

Uploaded by

Kalaivani D

What is Metadata?

Metadata is simply defined as data about data. The data that is used to represent other data is known as metadata. For example, the index of a book serves as metadata for the contents of the book. In other words, metadata is the summarized data that leads us to detailed data. In terms of a data warehouse, we can define metadata as follows.

• Metadata is the roadmap to a data warehouse.
• Metadata in a data warehouse defines the warehouse objects.
• Metadata acts as a directory. This directory helps the decision support system to locate the contents of a data warehouse.

Note − In a data warehouse, we create metadata for the data names and definitions of a given data warehouse. Along with this metadata, additional metadata is created to time-stamp any extracted data and to record the source of the extracted data.

Categories of Metadata

Metadata can be broadly divided into three categories −

• Business Metadata − It holds the data ownership information, business definition, and changing policies.
• Technical Metadata − It includes database system names, table and column names and sizes, data types, and allowed values. Technical metadata also includes structural information such as primary and foreign key attributes and indices.
• Operational Metadata − It includes currency of data and data lineage. Currency of data means whether the data is active, archived, or purged. Lineage of data means the history of the data's migration and the transformations applied to it.
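
The three categories can be pictured as separate record types attached to one warehouse table. The following is a minimal Python sketch, not a standard schema: the class names, field names, and the sample `sales_fact` table are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Allowed currency states, taken from the operational metadata description.
CURRENCY_STATES = {"active", "archived", "purged"}


@dataclass
class BusinessMetadata:
    owner: str            # data ownership information
    definition: str       # business definition
    change_policy: str    # changing policy


@dataclass
class TechnicalMetadata:
    database: str                 # database system name
    table: str                    # table name
    columns: Dict[str, str]       # column name -> data type
    primary_key: List[str] = field(default_factory=list)


@dataclass
class OperationalMetadata:
    currency: str                                     # one of CURRENCY_STATES
    lineage: List[str] = field(default_factory=list)  # migration and transformation history

    def __post_init__(self) -> None:
        # Reject states other than active, archived, or purged.
        if self.currency not in CURRENCY_STATES:
            raise ValueError(f"unknown currency state: {self.currency!r}")


# Hypothetical example: metadata for an imaginary sales fact table.
business = BusinessMetadata("Sales department", "One row per sale line item", "Reviewed yearly")
technical = TechnicalMetadata(
    database="warehouse_db",
    table="sales_fact",
    columns={"sale_id": "INTEGER", "amount": "DECIMAL(10,2)"},
    primary_key=["sale_id"],
)
operational = OperationalMetadata(
    currency="active",
    lineage=["extracted from orders system", "currency converted to USD"],
)
```

Keeping the categories in separate types mirrors the split described above: business users read the first, database tools read the second, and load or archive jobs update the third.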

Role of Metadata

Metadata plays a very important role in a data warehouse, although that role is different from the role of the warehouse data itself. The various roles of metadata are explained below.

• Metadata acts as a directory. This directory helps the decision support system to locate the contents of the data warehouse.
• Metadata helps the decision support system map data when it is transformed from the operational environment to the data warehouse environment.
• Metadata helps in summarization between current detailed data and highly summarized data.
• Metadata also helps in summarization between lightly summarized data and highly summarized data.
• Metadata is used for query tools.
• Metadata is used in extraction and cleansing tools.
• Metadata is used in reporting tools.
• Metadata is used in transformation tools.
• Metadata plays an important role in loading functions.

The following diagram shows the roles of metadata.

Metadata Repository

A metadata repository is an integral part of a data warehouse system. It contains the following metadata −

• Definition of data warehouse − It includes the description of the structure of the data warehouse. The description is defined by schema, views, hierarchies, derived data definitions, and data mart locations and contents.
• Business metadata − It contains the data ownership information, business definition, and changing policies.
• Operational Metadata − It includes currency of data and data lineage. Currency of data means whether the data is active, archived, or purged. Lineage of data means the history of the data's migration and the transformations applied to it.
• Data for mapping from operational environment to data warehouse − It includes the source databases and their contents, data extraction, data partition, cleaning, transformation rules, data refresh and purging rules.
• Algorithms for summarization − It includes dimension algorithms, data on granularity, aggregation, summarizing, etc.
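
As a rough illustration of the directory and mapping roles, a repository can be thought of as a lookup table keyed by warehouse object name. The sketch below is a toy in-memory version in Python. The class `MetadataRepository`, its methods, and the object names are hypothetical and do not correspond to any particular product.

```python
class MetadataRepository:
    """Toy in-memory directory of warehouse objects (illustrative only)."""

    def __init__(self):
        self._entries = {}

    def register(self, name, location, source, rules):
        """Record where a warehouse object lives and how it is populated."""
        self._entries[name] = {
            "location": location,  # where the object is stored in the warehouse
            "source": source,      # operational source it is mapped from
            "rules": list(rules),  # transformation rules applied during mapping
        }

    def locate(self, name):
        """Directory lookup: return the location of a warehouse object."""
        return self._entries[name]["location"]

    def mapping(self, name):
        """Return the (source, rules) pair describing how the object is populated."""
        entry = self._entries[name]
        return entry["source"], entry["rules"]


# Hypothetical entry for an imaginary summary table.
repo = MetadataRepository()
repo.register(
    name="monthly_sales_summary",
    location="warehouse_db.summaries.monthly_sales",
    source="orders_db.orders",
    rules=["drop cancelled orders", "aggregate by month"],
)
print(repo.locate("monthly_sales_summary"))
```

A query or reporting tool would call something like `locate` to find the data, while extraction and transformation tools would read the mapping rules.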

Challenges for Metadata Management

The importance of metadata cannot be overstated. Metadata helps drive the accuracy of reports, validates data transformations, and ensures the accuracy of calculations. Metadata also enforces the definition of business terms for business end-users. For all these uses, metadata also has its challenges. Some of the challenges are discussed below.

• Metadata in a big organization is scattered across the organization. It is spread across spreadsheets, databases, and applications.
• Metadata could be present in text files or multimedia files. To use this data for information management solutions, it has to be correctly defined.
• There are no industry-wide accepted standards. Data management solution vendors have a narrow focus.
• There are no easy and accepted methods of passing metadata.

Why Do We Need a Data Mart?

Listed below are the reasons to create a data mart −

• To partition data in order to impose access control strategies.
• To speed up the queries by reducing the volume of data to be scanned.
• To segment data across different hardware platforms.
• To structure data in a form suitable for a user access tool.

Note − Do not create a data mart for any other reason, since the operational cost of data marting could be very high. Before data marting, make sure that the data marting strategy is appropriate for your particular solution.

Cost-effective Data Marting

Follow the steps given below to make data marting cost-effective −

• Identify the Functional Splits
• Identify User Access Tool Requirements
• Identify Access Control Issues

Identify the Functional Splits

In this step, we determine if the organization has natural functional splits. We look for departmental splits, and we determine whether the way in which departments use information tends to be in isolation from the rest of the organization. Let's take an example.

Consider a retail organization, where each merchant is accountable for maximizing the sales of a group of products. For this, the following information is valuable −

• sales transactions on a daily basis
• sales forecast on a weekly basis
• stock position on a daily basis
• stock movements on a daily basis

As the merchant is not interested in the products they are not dealing with, the data mart is a subset of the data dealing with the product group of interest. The following diagram shows data marting for different users.
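
The idea of a data mart as a subset can be sketched in a few lines of Python. This is only an illustration under assumed names: the `warehouse_sales` rows, the `product_group` field, and the function `build_data_mart` are hypothetical, and a real data mart would be loaded by the warehouse's own extraction tools rather than filtered in memory.

```python
def build_data_mart(rows, product_group):
    """Return only the warehouse rows that belong to the given product group."""
    return [row for row in rows if row["product_group"] == product_group]


# Hypothetical daily sales rows held in the warehouse.
warehouse_sales = [
    {"product_group": "dairy", "product": "milk", "units_sold": 120},
    {"product_group": "dairy", "product": "cheese", "units_sold": 45},
    {"product_group": "bakery", "product": "bread", "units_sold": 200},
]

# The dairy merchant's mart contains only the dairy rows.
dairy_mart = build_data_mart(warehouse_sales, "dairy")
print([row["product"] for row in dairy_mart])  # prints ['milk', 'cheese']
```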


Given below are the issues to be taken into account while determining the functional split −

• The structure of the department may change.
• The products might switch from one department to another.
• The merchant could query the sales trend of other products to analyze what is happening to the sales.

Note − We need to determine the business benefits and technical feasibility of using a data mart.

Identify User Access Tool Requirements

We need data marts to support user access tools that require internal data structures. The data in such structures is outside the control of the data warehouse but needs to be populated and updated on a regular basis.

Some tools can populate directly from the source system, but some cannot. Therefore, additional requirements outside the scope of the tool need to be identified for the future.

Note − In order to ensure consistency of data across all access tools, the data should not be directly populated from the data warehouse; rather, each tool must have its own data mart.

Identify Access Control Issues

There should be privacy rules to ensure the data is accessed by authorized users only. For example, a data warehouse for a retail banking institution ensures that all the accounts belong to the same legal entity. Privacy laws can force you to totally prevent access to information that is not owned by the specific bank.

Data marts allow us to build a complete wall by physically separating data segments within the data warehouse. To avoid possible privacy problems, the detailed data can be removed from the data warehouse. We can create a data mart for each legal entity and load it via the data warehouse, with detailed account data.
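
A minimal Python sketch of this separation follows. It assumes each account row carries a `legal_entity` field. The field names, the sample accounts, and the function `split_by_legal_entity` are hypothetical, and the physical separation described above is represented here only by separate lists.

```python
from collections import defaultdict


def split_by_legal_entity(accounts):
    """Group detailed account rows into one data mart per legal entity."""
    marts = defaultdict(list)
    for account in accounts:
        marts[account["legal_entity"]].append(account)
    return dict(marts)


# Hypothetical detailed account data passing through the warehouse.
accounts = [
    {"account_id": 1, "legal_entity": "Bank A", "balance": 2500.0},
    {"account_id": 2, "legal_entity": "Bank B", "balance": 910.0},
    {"account_id": 3, "legal_entity": "Bank A", "balance": 40.0},
]

marts = split_by_legal_entity(accounts)
# Users of one entity's mart never see another entity's accounts.
print([a["account_id"] for a in marts["Bank A"]])  # prints [1, 3]
```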

Designing Data Marts

Data marts should be designed as a smaller version of the starflake schema within the data warehouse and should match the database design of the data warehouse. This helps in maintaining control over database instances.

The summaries are data marted in the same way as they would have been designed within the data warehouse. Summary tables help to utilize all dimension data in the starflake schema.

Cost of Data Marting

The cost measures for data marting are as follows −

• Hardware and Software Cost
• Network Access
• Time Window Constraints

Hardware and Software Cost

Although data marts are created on the same hardware, they require some additional hardware and software. Handling user queries requires additional processing power and disk storage. If the detailed data and the data mart both exist within the data warehouse, we face the additional cost of storing and managing replicated data.

Note − Data marting is more expensive than aggregations, so it should be used as an additional strategy and not as an alternative strategy.

Network Access

A data mart could be at a different location from the data warehouse, so we should ensure that the LAN or WAN has the capacity to handle the data volumes being transferred within the data mart load process.

Time Window Constraints

The extent to which a data mart loading process will eat into the available time window depends on the complexity of the transformations and the data volumes being shipped. How many data marts are feasible depends on −

• Network capacity
• Time window available
• Volume of data being transferred
• Mechanisms being used to insert data into a data mart
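
These factors can be combined into a back-of-the-envelope estimate. The Python sketch below is only a rough model under stated assumptions: data marts are loaded one after another over a single link, 1 GB is treated as 8,000 megabits, and transformation complexity and the insert mechanism are folded into a single multiplier (`overhead`). The function names and the sample figures are hypothetical.

```python
def load_hours(volume_gb, throughput_mbps, overhead=1.0):
    """Estimate the hours needed to load one data mart.

    volume_gb       -- volume of data being transferred, in gigabytes
    throughput_mbps -- usable network capacity, in megabits per second
    overhead        -- assumed multiplier for transformation and insert cost
    """
    transfer_seconds = volume_gb * 8 * 1000 / throughput_mbps
    return transfer_seconds * overhead / 3600


def marts_that_fit(window_hours, volume_gb, throughput_mbps, overhead=1.0):
    """Number of equally sized data marts loadable back to back within the window."""
    per_mart = load_hours(volume_gb, throughput_mbps, overhead)
    return int(window_hours // per_mart)


# Hypothetical figures: 50 GB per mart, 100 Mbps link, 4-hour load window.
print(round(load_hours(50, 100), 2))   # prints 1.11
print(marts_that_fit(4, 50, 100))      # prints 3
```

Doubling the overhead multiplier in this model halves the number of marts that fit, which reflects the dependence on transformation complexity noted above.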
