Data Quality and Database Design 1
Abstract: - Both products and services must satisfy customers’ requirements. Information systems and their
databases are the main means by which organizations collect, store, and retrieve these requirement data. If any of
these operations is badly executed or is not performed on the right data, it will not produce useful results, and our
aim will not be satisfied. That is why we are interested in data quality. This paper deals with what data quality is,
which are the most important dimensions of data quality, and how we can design quality databases.
Key-Words: - data quality, quality dimensions.
1 This research has been done within the framework of the CALIDAT Project, developed by Cronos Ibérica, S.A. in
collaboration with the Universidad de Castilla-La Mancha, supported by the Consejería de Educación y Cultura de la
Comunidad de Madrid (Ref: 09/0013/1999).
quality, data model (both conceptual and logical) quality and data quality. In this paper we will focus
on the most prominent features of the data; regarding data model quality, the interested reader can consult
[7].

2 Dimensions of information quality
As we know, quality is a relative concept, as far as it is in the eyes of the beholder; for this reason,
we can consider quality as a multidimensional concept, subject to restrictions and limitations ([4]).
In recent years, various authors have proposed different dimensions for data quality:
• [8] groups quality dimensions into three self-explanatory categories:
  - Quality Dimensions of a Conceptual View
    - Content: Relevance of the data, obtainability of values, clarity of definition.
    - Scope: Comprehensiveness and essentialness.
    - Level of Detail: Granularity of attributes and precision of domains.
    - Composition: Naturalness, identifiability, homogeneity, minimal unnecessary redundancy.
    - View Consistency: Structural and semantic consistencies.
    - Reaction to Change: Flexibility and robustness.
  - Quality Dimensions of Data Values
    - Accuracy
    - Completeness
    - Currency
    - Value Consistency
  - Quality Dimensions of Data Representation
    - Appropriateness
    - Interpretability
    - Portability
    - Format Precision
    - Format Flexibility
    - Ability to represent null values
    - Efficient usage of Recording Media
    - Representation Consistency
• [2] emphasises two topics related to data quality:
  - Inherent quality, that is, data accuracy, the degree to which the data reflect the objects from the
    real world they represent; this includes: conformity with the definition, completion of values,
    validity or conformity with the company rules, accuracy of sources, accuracy of reality, lack of
    duplication, accessibility.
  - Pragmatic quality, i.e. the degree to which the data allow the knowledge workers to satisfy the
    company objectives in an efficient and accurate way.
• [13] analyse some of the causes of poor data quality due to design deficiencies from an ontological
  perspective, identifying four quality dimensions:
  - Data Quality
  - Nature of deficiency
  - Completion
  - Improper representation

As these authors indicate, the aim is that each state in the real world will unequivocally correspond to one
system state. If the unequivocal relationship is not verified, or the expected results are not obtained
when operating with the data, a data deficiency is produced.
• [11] identify several dimensions grouped into four categories:
  - Intrinsic: precision, objectivity, credibility, reputation
  - Accessibility: accessibility, access security
  - Contextual: relevance, added value, opportunity, completion, data quantity
  - Representational: interpretability, comprehension facility, concise representation, consistent
    representation.

3 Data base design and data quality
[5] suggest three different strategies in order to improve the intrinsic quality of data bases:
• Building richer semantic models that reflect reality more accurately.
• Reinforcing databases by introducing a higher number of constraints, in order to identify and
  discriminate problematic data and link them to the appropriate applications.
• Restricting the use of data to predefined processes, preventing them from being modified by other
  processes so that they cannot be accidentally deleted.

Although these strategies allow a higher degree of data quality, they are not enough by themselves,
since an adequate base is needed in order to manage quality dimensions ([16]). Unfortunately, there are
very few proposals that consider data quality as a fundamental factor in the design process. In this
sense, [14, 15] are an exception to this rule. They propose a method that is intended to complement the
traditional methodologies of database design, see figure 1 at the end of the paper.
In the first step, see figure 1, apart from creating a conceptual schema, e.g. using an entity/relationship
model, quality requisites and candidate attributes should be identified, determining thereafter the
‘quality parameter view’, so that each element within the conceptual schema can be linked to a quality
parameter. E.g., in an ‘academic’ data base, the attribute ‘exam mark’ can be linked to precision and
timeliness. Later on, subjective parameters are made objective through the addition of labels to the
attributes in the conceptual schema (source, in order to know the degree of accuracy, and date, in order to
know the timeliness, of exam marks).

Moreover, we can also propose an extension of relational databases with indicators that allow the
assignment of these objective and subjective parameters to the quality of the values within the
data base ([15]).

4 Conclusions and future research
We can affirm that, if product and service quality has become a decisive factor of business success in
recent years, information quality will play a preferential role in the next decade.
If we actually consider that information is the most important business asset, one of the first aims of IT
professionals should be to ensure its quality.
We have presented some recent proposals regarding information quality, but further research is needed on
the degree of quality attached to other processes linked to information: modelling, data gathering and
loading, and data presentation.
On the one hand, companies will have to define a quality policy (see, for example, [8]) that defines
the obligations of each function in order to ensure data quality in all its dimensions; on the other hand,
they will have to implement a process in order to evaluate the quality of the information at their
disposal. There are several proposals regarding information quality evaluation; English’s TQdM
(Total Quality data Management) can be highlighted.
A decisive aspect regarding evaluation has to do with the definition of relevant metrics that will allow
an actual analysis and improvement of quality. In [3], three types of metrics are proposed: subjective
metrics (based on the data users’ judgement), objective metrics that are independent of the
application (such as correctness) and objective metrics belonging to the application (i.e., that are
specific to a given domain). Besides, the actual value of information (either produced by operational
systems or used to assist decision taking) should be measured.
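
As an illustration of what an objective, application-independent metric might look like, the following Python sketch computes two simple ratios over a set of records: completeness (share of non-null values) and currency (share of values recorded recently enough). It is only a minimal sketch under stated assumptions: the record layout, the field names and the functions `completeness` and `currency` are hypothetical and are not taken from [3] or [8].

```python
from datetime import date


def completeness(records, field):
    """Fraction of records whose `field` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def currency(records, date_field, today, max_age_days):
    """Fraction of records whose `date_field` is at most `max_age_days` old."""
    if not records:
        return 0.0
    fresh = sum(
        1
        for r in records
        if r.get(date_field) is not None
        and (today - r[date_field]).days <= max_age_days
    )
    return fresh / len(records)


# Hypothetical exam-mark records (illustrative data only).
exam_marks = [
    {"student": "A", "mark": 7.5, "recorded": date(2000, 6, 1)},
    {"student": "B", "mark": None, "recorded": date(2000, 6, 1)},
    {"student": "C", "mark": 9.0, "recorded": date(1999, 6, 1)},
    {"student": "D", "mark": 5.0, "recorded": None},
]

mark_completeness = completeness(exam_marks, "mark")
mark_currency = currency(exam_marks, "recorded", date(2000, 7, 1), 90)
```

Both functions return a value between 0 and 1, so results for different attributes can be compared directly; an application-specific metric would replace the generic null and age checks with domain rules.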
References
[1] Celko, J. Don’t Warehouse Dirty Data. Datamation, 15 October 1995, pp. 42-52.
[2] English, L. Improving Data Warehouse and Business Information Quality. John Wiley & Sons, Inc.,
1999.
[3] Huang, K-T., Lee, Y.W. and Wang, R.Y. Quality Information and Knowledge. Prentice Hall, Upper
Saddle River, 1999.
[4] Jones, C. Software Quality. Analysis and Guidelines for Success. London: International
Thomson Computer Press, 1997.
[5] Orman, L., Storey, V. and Wang, R. Systems Approaches to Improving Data Quality. TDQM-94-05,
August 1994. Available at http://web.mit.edu/tdqm/www/papers/94/94-05.html
[6] Orr, K. Data Quality and System Theory. Communications of the ACM, Vol. 41, No. 2, 1998, pp. 66-71.
[7] Piattini, M., Genero, M., Calero, C., Ruiz, F. and Polo, M. Database quality. In: Advanced Databases:
Technology and Design. Piattini, M. and Diaz, O. (eds.). London, Artech House, 2000.
[8] Redman, T. C. Data Quality for the Information Age. Artech House, Boston, 1996.
[9] Sneed, H.M. and Foshag, O. Measuring Legacy Database Structures. Proc. of The European Software
Measurement Conference FESMA’98, Coombes, Hooft and Peeters (eds.), 1998, pp. 199-210.
[10] Storey, V. C. and Wang, R. Modeling Quality Requirements in Conceptual Database Design. TDQM-
94-02, May 1994.
[11] Strong, D.M., Lee, Y.W. and Wang, R.Y. Data Quality in Context. Communications of the ACM,
Vol. 40, No. 5, 1997, pp. 103-110.
[12] Strong, D.M., Lee, Y.W. and Wang, R.Y. 10 Potholes in the Road to Information Quality. IEEE
Computer, 1997, pp. 38-46.
[13] Wand, Y. and Wang, R.Y. Anchoring Data Quality Dimensions in Ontological Foundations.
Communications of the ACM, Vol. 39, No. 11, 1996, pp. 86-95.
[14] Wang, R.Y., Kon, H.B. and Madnick, S.E. Data Quality Requirements Analysis and
Modeling. Proc. of the 9th International Conference on Data Engineering, IEEE Computer Society,
1993, pp. 670-677.
[15] Wang, R.Y., Reddy, M.P. and Kon, H.B. Toward quality data: An attribute-based approach. Decision
Support Systems, Vol. 13, 1995, pp. 349-372.
[Figure 1: flow diagram of the design method. Application requirements -> Step 1: Determine the
application view of data -> Application view (with application quality requirements and candidate
quality attributes) -> Step 2: Determine (subjective) quality parameters for the application ->
Parameter view -> Step 3 -> Quality views -> Step 4 -> Quality schema]
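
The labelling of attributes described in Section 3, where a source and a date are attached to an exam mark so that accuracy and timeliness can be judged, can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names `QualityIndicators` and `TaggedValue`, the helper `is_timely` and the 30-day threshold are hypothetical and are not taken from [14, 15].

```python
from dataclasses import dataclass
from datetime import date
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class QualityIndicators:
    """Objective labels attached to a stored value (assumed set of labels)."""
    source: str        # who supplied the value, used to judge accuracy
    recorded_on: date  # when it was recorded, used to judge timeliness


@dataclass(frozen=True)
class TaggedValue(Generic[T]):
    """A data value stored together with its quality indicators."""
    value: T
    quality: QualityIndicators


def is_timely(tagged, today, max_age_days=30):
    """True if the value was recorded within `max_age_days` of `today`."""
    return (today - tagged.quality.recorded_on).days <= max_age_days


# Hypothetical 'exam mark' attribute from an 'academic' data base.
exam_mark = TaggedValue(
    value=8.5,
    quality=QualityIndicators(source="lecturer", recorded_on=date(2000, 6, 20)),
)
```

Keeping the indicators next to the value, rather than in a separate log, lets each application apply its own threshold to the same stored labels.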