M3A1

A - Data Quality Dimensions and Problems

Data Quality Dimensions


Introduction:

In the realm of data management, the concept of data quality (DQ) is pivotal,
serving as the cornerstone for ensuring that data is apt for its intended uses in
operations, decision-making, and planning. Given the diversity in the types of data
and the variety of contexts in which they are used, DQ is inherently a
multidimensional concept. A widely recognized framework by Wang et al.
delineates DQ into four primary categories: intrinsic, contextual, representation,
and access.

Data Quality Dimensions

Intrinsic Data Quality:


The intrinsic category of DQ relates to the veracity and credibility of data.
Accuracy, objectivity, and reputation are the pillars that support this category.
 Accuracy: It is the most fundamental aspect of DQ, ensuring that data
correctly reflects reality. For instance, the accuracy of birth dates in a
database is paramount; any deviation from the actual date constitutes a direct
compromise of data quality.
 Objectivity: Data should be collected and presented without bias, ensuring
that decisions made based on the data are fair and impartial.
 Reputation: The source of data significantly impacts its trustworthiness.
Data from reputable sources is more likely to be accurate and objective.
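An accuracy check of the kind described above is typically implemented by comparing stored values against a trusted reference source. The following is a minimal sketch; both datasets and the names in them are hypothetical illustration data, not a prescribed method:

```python
from datetime import date

# Sketch: accuracy check comparing stored birth dates against a trusted
# reference source. Both datasets are hypothetical illustration data.
stored = {"alice": date(1990, 5, 1), "bob": date(1985, 3, 12)}
reference = {"alice": date(1990, 5, 1), "bob": date(1985, 3, 21)}  # digits transposed

# Any deviation from the reference date is a direct accuracy violation.
inaccurate = [name for name, d in stored.items() if reference.get(name) != d]
print(inaccurate)  # ['bob']
```

In practice the "reference" is an authoritative system of record (e.g., a civil registry extract), and mismatches are routed to the responsible data owner for correction.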

Contextual Data Quality:


Contextual DQ focuses on the practical application of data, assessing its
relevance and utility for specific tasks.
 Completeness: This dimension assesses whether all necessary data is
present. For example, missing values in an 'Email' column can hinder
communication efforts.
 Relevance: Data should be pertinent to the context in which it is used.
Irrelevant data can lead to misinformed decisions and strategies.
 Timeliness: The utility of data is often time-sensitive. Outdated data can
lead to missed opportunities or continued use of inefficient processes.
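A completeness check like the one mentioned for the 'Email' column can be sketched in a few lines of Python. The customer records below are hypothetical illustration data:

```python
# Minimal sketch of a completeness check for an 'Email' field.
# The customer records are hypothetical illustration data.
customers = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "email": None},   # missing email
    {"name": "Carol", "email": ""},   # empty string also counts as missing
]

def completeness(records, field):
    """Fraction of records whose `field` is present and non-empty."""
    filled = sum(1 for r in records if r.get(field))
    return filled / len(records)

print(f"Email completeness: {completeness(customers, 'email'):.0%}")  # prints "Email completeness: 33%"
```

Measured this way, completeness becomes a number that a data steward can track over time rather than an impression formed by scanning the table.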

Representation Data Quality:
This category assures that data is presented in a manner that is
understandable and interpretable by users.
 Consistency: Uniformity in data representation across different platforms
and datasets is crucial. Consistency ensures that data interpreted in one
context will be understood in the same way in another.
 Interpretability: The data must be in a language and symbols that are
familiar to the user, ensuring clear communication.

Access Data Quality:


Data should be easily and securely accessible to those who need it.
 Accessibility: If data is not readily available to users, it loses its value. Data
locked behind overly stringent security measures may be as inaccessible as
data that is lost.
 Security: While data must be accessible, it must also be protected from
unauthorized access. Balancing these two aspects is a continuous challenge
in DQ management.

Data Quality Problems

The quality of data has become a paramount concern for organizations in our
digitally-driven era. Data quality (DQ) is not a singular, monolithic attribute but a
multidimensional concept where problems can manifest in various forms. From the
genesis of data to its final application, several factors can undermine its integrity.
This paper explores common DQ problems identified within the framework of
multiple data sources, subjective judgment in data production, limited computing
resources, the overwhelming volume of data, and the evolving nature of data
needs.

Multiple data sources: multiple sources holding the same data may produce
duplicates – a problem of consistency.

Subjective judgment in data production: data produced using human judgment
(e.g., opinions) can yield biased information – a problem of objectivity.

Limited computing resources: a lack of sufficient computing resources and/or
digitalization may limit the accessibility of relevant data – a problem of
accessibility.

Volume of data: large volumes of stored data make it difficult to access
needed information in a reasonable time – a problem of accessibility.

Changing data needs: data requirements change on an ongoing basis due to
new company strategies or the introduction of new technologies – a problem
of relevance.

Different processes using and updating the same data: concurrent use and
updates by different processes may leave copies out of sync – a problem of
consistency.
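The first problem above, duplicates arising from multiple sources, is commonly detected by normalizing records to a comparable key and grouping on it. A minimal sketch follows; the two source lists and the normalization rule (trim whitespace, lowercase) are illustrative assumptions, not a general-purpose matching algorithm:

```python
# Sketch: detecting duplicate customer records merged from two sources.
# The records and the normalization rule are illustrative assumptions.
source_a = [{"name": "Ann Lee", "city": "Boston"}]
source_b = [{"name": "ann lee ", "city": "Boston"}]

def key(record):
    """Normalize a record into a comparable deduplication key."""
    return (record["name"].strip().lower(), record["city"].strip().lower())

merged, seen, duplicates = [], set(), []
for rec in source_a + source_b:
    k = key(rec)
    if k in seen:
        duplicates.append(rec)   # same entity already ingested
    else:
        seen.add(k)
        merged.append(rec)

print(len(merged), len(duplicates))  # prints "1 1": one unique record, one duplicate
```

Real deduplication pipelines use fuzzier matching (phonetic codes, edit distance), but the shape is the same: reduce each record to a key, then treat key collisions as candidate duplicates.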
B - Roles in Data Management

Introduction:
In an era where data acts as the lifeblood of organizations, managing this
vital resource demands a variety of specialized roles, each contributing to the
overall quality and value of the data. These roles range from the architects who
design the data structures to the stewards and scientists who extract meaning and
ensure its integrity. This paper outlines the various job profiles within the context
of data management, emphasizing their importance in maintaining high data
quality (DQ) and driving business value.

Information Architect:
At the forefront of data structuring stands the Information Architect, the
visionary who lays down the foundational blueprint of an organization’s data
landscape. Charged with the design of the conceptual data model, the Information
Architect must possess a keen understanding of business processes and be adept at
translating these into IT solutions. By working in tandem with the business users
and database designers, the Information Architect ensures that the data model is
robust, scalable, and flexible, effectively bridging the gap between business needs
and technological capabilities.

Database Designer:
The Database Designer plays the critical role of converting the conceptual
blueprint into a practical framework. This involves the translation of the
conceptual data model into logical and internal data models that underpin the
functioning of database applications. The Database Designer’s domain extends to
aiding application developers with external data model views and establishing
company-wide naming conventions. This uniformity is crucial for future database
maintenance and consistency across the enterprise's data ecosystem.

Data Owner:
Ownership imparts accountability, and in the data realm, the Data Owner is
entrusted with the ultimate authority over data fields within the organization's
databases. This responsibility includes decisions on data access and usage. A Data
Owner must not only understand the data’s meaning but also ensure its currency
and accuracy. In cases where DQ issues arise, the Data Owner is the go-to person
for data stewards to initiate corrective actions, thus playing a pivotal role in the DQ
lifecycle.
Data Steward:
As guardians of DQ, Data Stewards are tasked with the ongoing assessment
and assurance of both business data and metadata quality. Through meticulous and
regular DQ checks, they apply and analyze various DQ indicators and metrics,
taking initiative based on their findings. Although Data Stewards do not correct
data themselves, they are instrumental in identifying and understanding the root
causes of DQ issues and designing preventive measures to eliminate these
problems at the source. Their role is preventative, ensuring that data integrity is
baked into the systems from the very beginning, thus saving costs and resources in
the long run.

Database Administrator:
The Database Administrator (DBA) is the technical custodian of the
database environment. Their role encompasses a broad spectrum of
responsibilities, including the installation, maintenance, and performance
optimization of the DBMS. DBAs are pivotal in disaster recovery planning,
securing data, and ensuring that the data infrastructure runs seamlessly. By
collaborating with network and system managers, and interfacing with database
designers, DBAs help minimize operational costs while maintaining service levels,
thereby directly influencing the performance and reliability of the data
management systems.

Data Scientist:
The Data Scientist is the alchemist of the data management world, turning
raw data into golden insights. With a diverse skill set that spans ICT, quantitative
modeling, business acumen, and creativity, the Data Scientist digs deep into data to
unearth patterns and predictions that inform strategic decisions. Their role is
crucial in interpreting and leveraging data, which, when done correctly, can lead to
breakthroughs in understanding customer behavior and market trends.

Conclusion:
The complexity and scale of modern data management necessitate a
multi-faceted team of professionals, each specializing in different aspects of the data
lifecycle. From the strategic foresight of the Information Architect to the analytical
prowess of the Data Scientist, these roles collectively ensure that data is not only of
high quality but also a driving force for innovation and growth. As organizations
continue to navigate the data-driven landscape, the synergy among these roles will
be crucial in transforming data into actionable business value.
C - Legacy Databases
Introduction:

In the realm of data management, legacy databases serve as a testament to
the evolution of technology and the enduring nature of data as a resource. Despite
their age and perceived obsolescence, these databases still play a significant role in
the current data architecture of many organizations. This paper explores the logical
data models of legacy database technologies, their expressive power, limitations,
and the reasons why they remain relevant in today’s fast-paced technological
landscape.

The Relevance of Legacy Database Technologies:


Legacy databases often remain entrenched in organizations due to historical
implementations and limited IT budgets. Their basic characteristics are essential
knowledge for the maintenance of existing database applications and potential
migration to modern Database Management Systems (DBMSs). Moreover, the
principles underpinning these older systems offer invaluable insights into the
semantic richness of newer technologies. Notably, the procedural Data
Manipulation Language (DML) and navigational access, hallmarks of these legacy
systems, have found their way into more recent databases, such as Object-Oriented
Database Management Systems (OODBMSs).

The Hierarchical Model:


One of the earliest data models, the hierarchical model, emerged during the
Apollo moon program, when IBM developed it to manage the missions' vast
quantities of data. Its best-known implementation, the Information Management
System (IMS), lacks a formal model description and is characterized by structural
limitations, rendering it a legacy technology.

Building Blocks of the Hierarchical Model:


The hierarchical model is built on two main components: record types
and relationship types. Record types represent sets of records that describe
similar entities, such as products or suppliers, each consisting of various
fields or data items. Relationship types define the connections between these
record types, allowing only for hierarchical (1:N) relationships.
Consequently, a parent record may have multiple child records, but a child
record is limited to a single parent. This model inherently supports the
construction of hierarchical structures, with a single root record type at the
top and multiple leaf record types at the bottom.
Expressive Power and Limitations:
The hierarchical model's expressive power is notably restricted. It
supports only 1:N relationship types and does not accommodate N:M or 1:1
relationships without implementing workarounds that lead to a loss of
semantics and data redundancy. The retrieval of record data in this model is
also procedurally driven, requiring navigation from the root node down
through the hierarchy, which is inefficient by modern standards.

Illustrative Example: Department, Employee, and Project Structure


To exemplify the hierarchical model, consider a simple structure
involving departments, employees, and projects. The department record type
includes fields like department number, name, and location, and is linked to
employees and projects through parent-child relationships. Departments can
have multiple employees and projects, but each employee or project is tied
to exactly one department, underscoring the model's rigidity.
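The tree just described can be mimicked with nested records. The field names below follow the text; the sample values are hypothetical. Note how retrieval must navigate from the root (the department) downward, as the model's procedural DML requires:

```python
# Sketch of the hierarchical Department -> Employee/Project structure.
# Field names follow the text; the sample values are hypothetical.
department = {
    "dnumber": 10,
    "dname": "Research",
    "dlocation": "Brussels",
    "employees": [                       # 1:N parent-child relationship
        {"ssn": "123", "ename": "Smith"},
        {"ssn": "456", "ename": "Jones"},
    ],
    "projects": [                        # 1:N parent-child relationship
        {"pnumber": 1, "pname": "ProductX"},
    ],
}

def find_employee(root, ssn):
    """Navigational access: enter at the root and walk down the hierarchy."""
    for emp in root["employees"]:
        if emp["ssn"] == ssn:
            return emp["ename"]
    return None

print(find_employee(department, "456"))  # prints "Jones"
```

There is no way to reach an employee record except through its parent department, which is precisely the navigational, root-first access pattern that makes retrieval in this model inefficient by modern standards.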

Implementing N:M Relationships in the Hierarchical Model:


Implementing N:M relationships within the hierarchical model
necessitates mapping these to 1:N relationships, a suboptimal solution that
often results in redundancy and semantic loss. For instance, in an employee-
project relationship, making the project a parent and the employee a child
would distort the true network structure into an artificial tree structure, with
implications for the integrity and utility of the data.
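The redundancy this workaround introduces can be made concrete: if projects are parents and an employee works on two projects, that employee's child record must be stored once under each project. The records below are hypothetical:

```python
# Sketch: mapping an N:M employee-project relationship onto 1:N hierarchies.
# Each project owns its employee child records, so an employee on two
# projects is duplicated -- the redundancy and semantic loss the text notes.
projects = [
    {"pname": "ProductX", "employees": [{"ssn": "123", "ename": "Smith"}]},
    {"pname": "ProductY", "employees": [{"ssn": "123", "ename": "Smith"}]},
]

copies = sum(
    1
    for proj in projects
    for emp in proj["employees"]
    if emp["ssn"] == "123"
)
print(copies)  # prints 2: Smith is stored once under each parent project
```

Every update to Smith's record must now be applied in two places, and forgetting one silently creates an inconsistency, which is why such workarounds compromise both integrity and utility.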

Conclusion:
The hierarchical model, with its procedural DML and navigational access, is
emblematic of the legacy databases that many organizations still grapple with.
Understanding these models is crucial not only for maintaining and potentially
upgrading these systems but also for appreciating the advanced semantic
capabilities of modern databases. Legacy databases serve as a reminder of the
technological journey that data management has undergone and continue to inform
the development of more sophisticated, efficient, and semantically rich database
technologies.
