Components of Data Mgmt
Timeliness
• When data will be available
• Timestamp on data created
• Retention period
Currency
• Degree to which data are recent enough to be useful
Conformance
• Data are stored, exchanged, or presented in a valid format
Referential Integrity
• Data stating that all its references are valid
Characteristics of Quality Data
• Uniqueness: each entity exists no more than once within the database, and there is a key that can be used to
uniquely access each entity. This characteristic requires identity matching (finding data about the same entity) and
resolution to locate and remove duplicate entities.
• Accuracy: the degree to which any data correctly represent the real-life object they model. Often accuracy is
measured by agreement with some recognized authority data source (e.g., one source system or even some external data
provider). Data must be both accurate and precise enough for their intended use. For example, knowing sales accurately is
important, but for many decisions, knowing sales only to the nearest $1000 per month for each product is sufficient. Data
can be valid (i.e., satisfy a specified domain or range of values) and not be accurate.
• Consistency: values for data in one data set (database) are in agreement with the values for related data in
another data set (database). Consistency can be within a table row (e.g., the weight of a product should have some
relationship to its size and material type), between table rows (e.g., two products with similar characteristics should have
about the same prices, or data that are meant to be redundant should have the same values), between the same
attributes over time (e.g., the product price should be the same from one month to the next unless there was a price
change event), or within some tolerance (e.g., total sales computed from orders filled and orders billed should be roughly
the same values). Consistency also relates to attribute inheritance from super- to subtypes. For example, a subtype
instance cannot exist without a corresponding supertype, and overlap or disjoint subtype rules are enforced.
• Completeness: data have assigned values if they need to have values. This characteristic encompasses the NOT
NULL and foreign key constraints of SQL, but more complex rules might exist (e.g., male employees do not need a maiden
name but female employees may have a maiden name). Completeness also means that all data needed are present (e.g., if
we want to know total dollar sales, we may need to know both total quantity sold and unit price, or if an employee record
indicates that an employee has retired, we need to have a retirement date recorded). Sometimes completeness has an
aspect of precedence. For example, an employee in an employee table who does not exist in an applicant table may
indicate a data quality issue.
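The four characteristics above can each be expressed as a small check. The following is a minimal sketch, not a definitive implementation: the record layout, the field names (emp_id, name, status, retired_on), and the 1% tolerance are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical employee records; all field names are illustrative assumptions.
employees = [
    {"emp_id": 1, "name": "Ana", "status": "active", "retired_on": None},
    {"emp_id": 2, "name": "Ben", "status": "retired", "retired_on": None},
    {"emp_id": 2, "name": "Ben", "status": "retired", "retired_on": None},
    {"emp_id": 3, "name": None, "status": "active", "retired_on": None},
]


def duplicate_keys(rows, key):
    """Uniqueness: return key values that occur more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def disagrees_with_authority(rows, authority, key, field):
    """Accuracy: rows whose value differs from a trusted reference source."""
    return [r for r in rows if r[key] in authority and r[field] != authority[r[key]]]


def within_tolerance(a, b, rel_tol=0.01):
    """Consistency: two totals agree within a relative tolerance (assumed 1%)."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))


def missing_required(rows, field):
    """Completeness (NOT NULL style): rows where a required field has no value."""
    return [row for row in rows if row[field] is None]


def retired_without_date(rows):
    """Completeness (conditional rule): retired employees need a retirement date."""
    return [r for r in rows if r["status"] == "retired" and r["retired_on"] is None]
```

For example, duplicate_keys(employees, "emp_id") flags employee 2, and within_tolerance could compare total sales computed from orders filled against orders billed.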
• Timeliness: meeting the expectation for the time between when data are expected and when they
are readily available for use. As organizations attempt to decrease the latency between when a business
activity occurs and when the organization is able to take action on that activity, timeliness is becoming a
more important data quality characteristic (i.e., if we don’t know in time to take action, we don’t have
quality data). A related aspect of timeliness is retention, which is the span of time for which data represent
the real world. Some data need to be time-stamped to indicate from when to when they apply, and missing
from or to dates may indicate a data quality issue.
• Currency: the degree to which data are recent enough to be useful. For example, we may require that
customers’ phone numbers be up-to-date so we can call them at any time, but the number of employees
may not need to be refreshed in real time. Varying degrees of currency across data may indicate a quality
issue (e.g., if the salaries of different employees have drastically different update dates).
• Conformance: whether data are stored, exchanged, or presented in a format that is as specified by
their metadata. The metadata include both domain integrity rules (e.g., attribute values come from a valid
set or range of values) and actual format (e.g., specific location of special characters, precise mixture of text,
numbers, and special symbols).
• Referential integrity: data that refer to other data need to be unique and satisfy requirements to exist (i.e.,
satisfy any mandatory-one or optional-one cardinalities).
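These four characteristics can also be sketched as checks. This is a hedged illustration only: the phone format, the set of valid status values, and all field and function names are assumptions, not rules taken from any particular system.

```python
import re
from datetime import datetime, timedelta


def is_timely(expected_at, available_at, allowed_delay):
    """Timeliness: data became available within the allowed delay after it was expected."""
    return available_at - expected_at <= allowed_delay


def is_current(last_updated, now, max_age):
    """Currency: the value was refreshed recently enough to be useful."""
    return now - last_updated <= max_age


# Assumed metadata: phone numbers look like 555-0100, status comes from a fixed set.
PHONE_FORMAT = re.compile(r"\d{3}-\d{4}")
VALID_STATUSES = {"active", "retired"}


def conforms(record):
    """Conformance: value is in the valid domain and matches the declared format."""
    return (
        record["status"] in VALID_STATUSES
        and PHONE_FORMAT.fullmatch(record["phone"]) is not None
    )


def orphaned_rows(children, parents, fk, pk):
    """Referential integrity: child rows whose foreign key matches no parent row."""
    parent_keys = {p[pk] for p in parents}
    return [c for c in children if c[fk] not in parent_keys]
```

Different data can use different thresholds: a customer phone number might get a max_age of a few weeks, while an employee head count could tolerate a much longer one.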
Data Cleaning
• Data cleaning is the process of fixing or removing incorrect,
corrupted, incorrectly formatted, duplicate, or incomplete data within
a dataset.
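A minimal sketch of such a cleaning pass, assuming records are dictionaries and that a single field (here a hypothetical email field) is both required and identifies duplicates:

```python
def clean(rows, key="email"):
    """Return cleaned copies of rows: formatting fixed, incomplete and duplicate rows dropped."""
    seen = set()
    cleaned = []
    for row in rows:
        # Fix formatting: trim surrounding whitespace on every string field.
        fixed = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        value = fixed.get(key)
        # Remove incomplete rows: the key field is missing or empty.
        if not value:
            continue
        # Normalize the key so that case differences do not hide duplicates.
        if isinstance(value, str):
            value = value.lower()
            fixed[key] = value
        # Remove duplicates: keep only the first row seen for each key value.
        if value in seen:
            continue
        seen.add(value)
        cleaned.append(fixed)
    return cleaned
```

Keeping the first occurrence is only one possible rule; which duplicate survives is a design choice that depends on the data.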
Metadata Management
• Metadata management involves capturing, organizing, and maintaining
metadata, which provides essential context and information about the data
itself.
• How it works
• Data source identification
• Data extraction
• Data mapping
• Data validation and QA
• Data transformation
• Data loading
• Data synchronization
• Data governance and security
• Metadata management
• Data access and analysis
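The extraction, mapping, validation, transformation, and loading steps in the list above can be sketched as a small pipeline. Everything here is hypothetical: the source rows, the field names, and the in-memory list used as the load target stand in for real systems.

```python
# Hypothetical source: rows as they might arrive from an export file.
SOURCE_ROWS = [
    {"CustName": "Ana", "Amt": "120.50"},
    {"CustName": "Ben", "Amt": "not a number"},
]

# Data mapping: source field name -> target field name (assumed names).
FIELD_MAP = {"CustName": "customer", "Amt": "amount"}


def extract():
    """Data extraction: read rows from the identified source."""
    return list(SOURCE_ROWS)


def map_fields(row):
    """Data mapping: rename source fields to the target schema."""
    return {FIELD_MAP[k]: v for k, v in row.items() if k in FIELD_MAP}


def is_valid(row):
    """Data validation: customer is present and amount parses as a number."""
    try:
        float(row["amount"])
    except (KeyError, TypeError, ValueError):
        return False
    return bool(row.get("customer"))


def transform(row):
    """Data transformation: convert values to their target types and form."""
    return {"customer": row["customer"].upper(), "amount": float(row["amount"])}


def load(rows, target):
    """Data loading: append rows to the target store (a list in this sketch)."""
    target.extend(rows)
    return len(rows)


def run_pipeline(target):
    """Run the steps in order; return (rows loaded, rows rejected)."""
    mapped = [map_fields(r) for r in extract()]
    valid = [r for r in mapped if is_valid(r)]
    loaded = load([transform(r) for r in valid], target)
    return loaded, len(mapped) - len(valid)
```

Counting rejected rows instead of silently dropping them gives the QA step something to report.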
Master Data Management
• An approach to managing an organization's critical data across the
enterprise.
• Uses technology, tools, and processes to create a unified
master data service that consolidates key enterprise data
assets.
• Involves establishing workflows to streamline these processes
and guarantee consistent data handling across the organization.
• Supported by a well-defined data model and solid data
stewardship.
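As an illustration of consolidation, the sketch below merges records about the same customer from several source systems into one master record. The survivorship rule used here (the most recently updated non-empty value wins) is an assumption chosen for the example, as are the field names; real MDM tools offer many other rules.

```python
def build_master(records, key="customer_id"):
    """Consolidate source records into one master record per key value.

    Assumed survivorship rule: for each field, keep the most recently
    updated non-empty value. The "updated" field is an ISO date string.
    """
    master = {}
    # Process oldest first so that newer values overwrite older ones.
    for rec in sorted(records, key=lambda r: r["updated"]):
        golden = master.setdefault(rec[key], {})
        for field, value in rec.items():
            if field in ("source", "updated"):
                continue  # provenance fields are left out of the master record
            if value not in (None, ""):
                golden[field] = value
    return master
```

With one record from a CRM system and a newer one from a billing system, the master record takes the newer email but keeps the older phone number when the newer record leaves it blank.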
Data Management