Data Quality: Setting Organizational Policies

Article history: Received 19 May 2010; Received in revised form 18 May 2012; Accepted 19 June 2012; Available online 27 June 2012

Keywords: Data quality; Organizational policies; Economic analysis; Incentives; Data ownership

Abstract: The collection, representation, and effective use of organizational data are important to a firm because these activities facilitate the increasingly important analysis needed for business operations and business analytics. Poor data quality can be a major cause of damage or loss in organizational processes. The many tasks that individuals perform within an organization are linked and normally require access to shared data. These linkages are often documented as process flow diagrams that connect the data inputs and outputs of individuals. However, in such a connected setting, differences among individuals in their preferences for data attributes, such as timeliness and accuracy, can cause data quality problems. For example, individuals at the head of a process flow could bear all of the costs of capturing high quality data but not receive all of the benefits, even though the rest of the organization benefits from their diligence. Consequently, these individuals, in the absence of any managerial intervention, might not invest enough in data quality. This research analyzes this problem and proposes a set of solutions to this, and similar, organizational data quality problems. The solutions focus on principles of employee empowerment, decentralization, and mechanisms to measure and reward individuals for their data quality efforts.

© 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.dss.2012.06.004
V.C. Storey et al. / Decision Support Systems 54 (2012) 434–442 435
reengineering [2,11,23]. Hence, analyzing data quality from a process perspective would be very useful.

This research is intended to analyze data quality by modeling it from an organizational perspective, in which improvements in the decisions made are considered to be a key measure of data quality value. The research explicitly recognizes that data quality management involves multiple stakeholders, possibly with conflicts of interest among them. The models developed in this research are intended to be applicable to data quality management policies that are important and commonly occur within organizations. Guidelines for improving data quality that stress human factors in addition to technological ones are derived.

Data quality research often treats the organization as a monolithic entity with no differences in incentives or preferences. In reality, however, data quality is not valued identically by all users, nor is the value or cost of data quality distributed evenly across the organization. These differences, combined with the interdependence of departments on common business data, create a number of management problems when setting organizational data quality policies.

The objectives of this research, therefore, are to: 1) identify and analyze sources of common data quality problems that arise in organizations; 2) suggest policies for organizations to follow when setting data quality standards; and 3) propose solutions for data quality management problems.

Setting reasonable organizational data quality goals and auditing their implementation is difficult. The major source of this difficulty is the fact that data quality depends on the whole business process, with all users of the data affected. Data quality management involves multiple stakeholders, possibly with conflicts of interest amongst them. For example, data quality decisions made by the department that is responsible for capturing and maintaining the data limit the quality of the data for other departments. The interdependence of departmental data quality decisions is best examined from a process perspective.

A business process consists of a set of activities that are performed in coordination in an organizational and technical environment [29]. A typical business process [8] begins with a customer that may be an entity inside or outside of the organization. It proceeds through a number of business departments that may use, update, or augment the data gathered from previous steps in the process. The various tasks that comprise a process are best illustrated by a work flow diagram, as shown in Fig. 1.

Fig. 1. Work flow diagram of the purchasing process. [Figure: activities (Inventory Control: Audit Levels; Materials Planning; Manufacturing Schedule; Purchasing: Prepare Order; Receive Order; Vendor: Fulfillment) arranged over time, connected by data flows that are direct or indirect via a database.]

The purchasing process begins with a report from the inventory control department on inventory levels of raw materials needed by the manufacturing department. Any inaccuracies in data capture at this stage, for example, data capture errors by the Inventory Control department, will affect decisions down the line. Although some errors at this stage can be detected and corrected through the use of semantic and referential integrity constraints in databases and applications, those that persist throughout the process will result in incorrect decisions and actions. An undercount may result in the ordering of unneeded parts and a delay in manufacturing. An overcount can result in unanticipated delays in manufacturing schedules. If the managers in the Inventory Control department take extra care in training their workers and build extra checks and balances into their data capturing process, the whole organization will benefit from these efforts while the inventory department bears all the cost. A manager making a decision on levels of diligence will probably under-invest in them from an organizational perspective. This situation is modeled in the next section.

An organization may solve this data quality problem in a number of ways. For example, information systems can be decentralized to the departmental level so that the department that values the data most is given ownership of it. A more conventional approach involves setting up a procedure to measure and reward data quality.

This paper proceeds as follows. The next section develops a model for predicting quality choices made by individuals and organizations. This model is applied to develop a set of organizational policies to solve data quality problems, and the resulting managerial implications of the solutions are identified. A discussion and conclusion follow.

2. A model of organizational data quality

This section presents a model to analyze organizational data quality problems. This analysis concentrates on defining a model that can be
adopted at the organizational level. It models the most important aspects of data quality considerations, while remaining flexible enough for organizations to add activities that are specific to their process flow.

Consider a process flow diagram, such as that shown in Fig. 1. The following notation describes the diagram and other parameters:

i: index of activities, 1, …, n
Pi: set of activities, excluding i, whose value is affected by the quality levels chosen for activity i. Nominally, it may include all the activities that follow i in the process flow. However, it may be empty or as large as the set of all activities other than i.
qi: quality level chosen by the worker performing activity i
ci(qi): cost of quality level qi to the worker
vi(qi, qj, j: i ∈ Pj): the value to the worker who performs activity i. It depends on i and on the activities that precede it.
VN: value to the organization in the case when there is no synergy between the activities other than interdependence on data.
VS: value to the organization in the case with synergy between the activities over and above interdependence on data.

The assumptions needed for the model are listed below and are based upon accepted notions of how organizations operate and basic economic principles. These variables can be measured, at some level, so that it will be possible for an organization to measure the impact, even when given a range of values for analysis.

1. Cost of data quality effort: The cost of quality level qi of activity i to the worker who performs it (and to the organization), ci(qi), is twice differentiable, convex, and increasing in qi. It is natural to expect that higher levels of quality cost more than lower ones. If this were not so, it would be in the worker's self-interest to choose higher levels of quality without any intervention from management, and the problem would be trivial. Furthermore, it is natural to assume that a unit increase in quality at higher levels of quality costs more than a similar change at lower levels. This could arise for many reasons. Suppose, for example, that a worker has a number of means available to improve data quality. The worker would "cherry-pick the low-hanging fruit" first; that is, implement the quality improvements that are easiest to implement first.

2. Value of data quality: The value of an activity to a worker is increasing and concave in the level of quality of all activities that affect it. The concavity assumption captures the fact that the value of decisions may increase with increases in the quality of the data, but at a decreasing rate. For example, in determining shipping rates, knowing the region, state, county, zip code, and street address would offer increasing precision and quality of decisions, but the successive impact of increases in precision on the accuracy of shipping costs decreases relatively.

3. Increase in data quality by a worker helps others: The cross-partial derivatives of the value of an activity to a worker are positive on activities that affect it, i.e.,

\[ \frac{\partial^2 v_i}{\partial q_j \partial q_k} > 0 \quad \text{for } j, k \in \{i\} \cup \{j : i \in P_j\},\ j \neq k, \]

and zero otherwise. This implies that the quality efforts of workers are complementary, i.e., increases in the quality of one activity make the quality of others more valuable. To illustrate, consider a simple organization in which Worker 1 observes and captures the data and Worker 2 processes it. Processing the data more carefully may be worthwhile only if it has been captured accurately in the first place ("garbage in, garbage out"). Conversely, it is worth more to capture data accurately if it is used with greater precision for decision making.

Next, the organization as a whole is considered, recognizing the interdependency of information technology, the organization, and the processes. With respect to the value of the organization, two cases need to be considered by making one of two alternative assumptions:

4. (Interdependence but No Synergy) The value of the tasks to the organization is the sum of the values to the individual divisions. This is a simple case where there is no synergy in the value of the tasks, yet there is still an interdependence that arises through common data usage. In this case \( V^N = \sum_i v_i \), and the marginal value of quality to the organization is the sum of the marginal values to the individuals, i.e.,

\[ \frac{\partial V^N}{\partial q_i} = \sum_j \frac{\partial v_j}{\partial q_i}. \]

5. (Interdependence and Synergy) The value to the organization is larger than the sum of the values to the individuals, i.e., \( V^S > \sum_i v_i \). In this case, it is also assumed that VS is concave and that the marginal value of quality to the organization is strictly greater than the marginal value of quality to any one worker, i.e.,

\[ \frac{\partial V^S}{\partial q_i} > \frac{\partial V^N}{\partial q_i} = \sum_j \frac{\partial v_j}{\partial q_i} \quad \text{for all } i. \]

This can occur in a variety of ways. One common scenario is that the quality of work done in a task in one process affects other processes within the company. Consequently, the marginal value for the organization, which cares about the value of all work flows, is greater than that for the worker performing the task at that one station. As is the case with the vi, assume that

\[ \frac{\partial^2 V^S}{\partial q_j \partial q_k} > 0 \]

if there is an activity i such that \( j, k \in \{i\} \cup \{j : i \in P_j\},\ j \neq k \), and zero otherwise.

With these assumptions in place, consider first a situation in which each worker selects the quality that is best for him or her. Since the value of quality to a worker depends on the quality levels chosen by others, the worker has to consider the others' decisions while making his or her own. Given the quality levels that the worker expects others to choose, he or she picks a level that is best for himself or herself. Every worker makes a decision in a similar fashion. The equilibrium that exists in this setting is referred to in economics (specifically, game theory) as a Nash equilibrium [24]. A situation is in Nash equilibrium if no one involved would be better off by changing his or her strategy while the other people involved (in this case, the workers) retain their strategies. This is represented below.

Each worker i picks a quality level qi* such that:

\[ q_i^{*} \in \arg\max_{q_i} \; v_i\!\left(q_i, q_j,\ j : i \in P_j\right) - c_i(q_i) \]

The first-order conditions for the above are:

\[ \frac{\partial v_i}{\partial q_i} = \frac{\partial c_i}{\partial q_i} \]

Second, consider a situation in which an organization with no synergy between tasks (Assumption 4) selects the quality level for each worker:

\[ \max_{q = (q_1, \ldots, q_n)} \; V^N(q) - \sum_{i = 1, \ldots, n} c_i(q_i) \]

where \( V^N = \sum_i v_i \).

The first-order condition for the organization's optimization problem is:

\[ \frac{\partial v_i}{\partial q_i} + \sum_{j : i \in P_j} \frac{\partial v_j}{\partial q_i} = \frac{\partial c_i}{\partial q_i} \quad \text{for all } i. \]

Let q1**, …, qN** be the organization's choice.
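The wedge between these two first-order conditions can be illustrated numerically before stating Theorem 1. The quadratic value and cost functions below are hypothetical stand-ins (concave values with a complementarity term, convex costs); the parameter values echo those later used for Figs. 2 through 4, but the functional form of v2 is an assumption of this sketch and is not taken from the paper. The workers' Nash profile is found by best-response iteration and the organization's profile by grid search.

```python
# Hypothetical two-activity instance of the model: Worker 1 captures data and
# Worker 2 uses it, so P_1 = {2}. Concave values with a complementarity term
# (Assumption 3) and convex costs (Assumption 1); parameters are illustrative.

def v1(q1):                         # value of activity 1 to Worker 1
    return q1 * (3.0 - q1)

def v2(q1, q2, eps=0.5):            # value of activity 2; rises with q1 because
    return q2 * (10.0 - 8.0 * q2) + eps * 8.0 * q1 * q2   # efforts are complementary

def cost(q, c):                     # convex quality cost c_i * q_i^2
    return c * q * q

grid = [i / 500.0 for i in range(501)]        # candidate quality levels in [0, 1]

# Nash equilibrium: each worker maximizes his or her own net value,
# taking the other worker's quality level as given.
q1 = q2 = 0.0
for _ in range(5):                            # best-response iteration
    q1 = max(grid, key=lambda x: v1(x) - cost(x, 8.0))
    q2 = max(grid, key=lambda x: v2(q1, x) - cost(x, 9.0))

# Organization with no synergy (Assumption 4): maximize the sum of net values.
o1, o2 = max(((x, y) for x in grid for y in grid),
             key=lambda p: v1(p[0]) + v2(p[0], p[1])
                           - cost(p[0], 8.0) - cost(p[1], 9.0))

print(f"Nash ({q1:.3f}, {q2:.3f})  vs  organization ({o1:.3f}, {o2:.3f})")
assert o1 > q1    # Worker 1 ignores dv2/dq1 and under-invests in quality
```

With these numbers, the workers settle at roughly (0.167, 0.314) while the organization prefers roughly (0.238, 0.322): Worker 1, whose effort benefits Worker 2, is exactly the worker who under-invests, which is the pattern Theorem 1 formalizes.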
Theorem 1. Interdependence but No Synergy

Considering an organization in which the value to the organization is the sum of the value to the individuals, the individuals under-invest in quality compared to the organization's optimal choice; i.e., individual workers voluntarily select a quality level lower than the level that is optimal for the organization.

The proof is provided in Appendix A.

This theorem highlights the quality management problem that exists in organizations. If managers do not intervene, then the workers, left to select the quality levels themselves, will not choose the ones that are best for the organization. This kind of situation is common whenever the actions of an individual create a benefit for others. Such actions are said to create a positive externality. The individual making his or her own decision ignores the value created for others. As a result, the individual compares the marginal cost of an additional unit of quality only to the marginal value to the individual alone, rather than to the marginal value for the individual and the rest of the organization. It is this self-interested behavior that leads to a less than optimal solution for the organization. Many similar situations occur, such as the "Tragedy of the Commons" [18], in which individuals, acting on their own self-interest, will eventually deplete a scarce resource even though it is in no one's long-run interest to do so.

The problem gets even worse when there is synergy, as the individual workers ignore this too in selecting their individually optimal quality levels.

Theorem 2. Interdependence and Synergy

For an organization in which the value to the organization is greater than the sum of the value to the individuals, the individuals under-invest in quality even more than in organizations with no synergy.

The proof is found in Appendix A.

3. Organizational policies to address data quality problems

Organizations do not operate in a vacuum. Rather, they are affected greatly by the environment within which they operate and the new and existing technologies they adopt. Changes in information technology, especially the development of technologies for collaboration among local and virtual teams, enterprise-wide integrated data warehouses and planning systems, and other management information systems, can help to provide a number of solutions to the problems of data quality in organizations. Employees are an organization's strongest asset. Therefore, solutions to organizational data quality problems should be approached from both employee and organizational perspectives.

Employee based:
1. Personal computing: the decision maker processes the information.
2. Employee empowerment: the worker at the head of the process, closest to the customer (customer-facing), makes decisions.
3. Teams: multi-skilled teams, with IT-facilitated, rich interaction, perform the process jointly.

Organization theory based:
4. Data quality program: a program for data quality measurement and incentives for key employees in the process.
5. Ownership: change data ownership.

These solutions depend as much on organizational theory as they do on technology. Problems with under-investment by individuals have been addressed by setting up incentive schemes [14] or changing ownership rights to projects [4,7]. Relevant socio-behavioral theories include enhanced participation in decision making, hierarchy of authority [9], top management involvement [10], establishing leadership in data quality [26], and others [2]. Each of the solutions is analyzed below. The employee-based perspectives rely on mathematical analysis; the organizational perspectives are discussion based.

3.1. Personal computing

The last decade has seen advances in technology become increasingly responsible for integration, communication, collaboration, and ease of computing. This helps to address some of the economic problems faced by organizations by allocating decision rights to the person to whom they matter. The ubiquitous proliferation of increasingly powerful personal computers, user-friendly software and browsers, and reliable telecommunications has resulted in a two-decade-long trend of growth in personal computing. Historically, business processes have wended their way through many departments, each of which collects, modifies, and updates data until it reaches a decision maker who uses the data. Long data flows describing such processes are common. Quality decisions, made earlier by workers and managers who themselves may not use some of the data collected, impact the decisions made at the end of the flow. This causes a number of problems related to data quality.

First, a long data flow that separates data capturing and processing from the final decision making also puts the workers at the early parts of the process at an information disadvantage. Since they do not make the decision, the workers do not know the value of data quality attributes such as accuracy and timeliness. This can, obviously, be a source of data quality problems.

Consider an employee who is entering data into an electronic form while talking to a customer. Eventually, a decision may be made that depends upon this data. The worker capturing the data may not know the importance of the quality of different data elements in different cases. For instance, the model year of a car may be critical for diagnosing a certain kind of failure but not as relevant for others. This kind of knowledge is useful in motivating additional effort for data quality, but hard to judge by anyone but the decision maker.

Second, worker motivation can be a problem when the quality investment must be made by one worker, and the benefits from better decision making accrue to someone else. In such situations, the organization must craft an incentive mechanism that measures and, accordingly, rewards workers.

One solution to these data quality problems is task consolidation, in which the initial tasks of data capturing and processing are consolidated with decision making. Task consolidation also offers the benefits of hand-off and cycle time reductions. The value of having the decision maker perform the data processing tasks is analyzed in the example below, based upon the model presented above.

Example 1. Consider the "Audit Inventory" and "Materials Planning" tasks at the head of the process flow diagram in Fig. 1. The worker in the Inventory Control department, Worker 1, counts and enters the inventory levels, which are used for materials planning by Worker 2 in the Manufacturing department. Let the value of data quality to Worker 1 be v1 = q1(a1 − b1q1), where q1 is the level of quality picked by Worker 1, 0 ≤ q1 ≤ 1, and a1 > 2b1 > 0. A value of q1 = 1 represents that the quality is the best. The variables a1 and b1 are used in the analysis to capture the desired direction. It is easily verified that v1 is increasing and concave over the range. The cost of achieving the quality level is c1q1², where c1 > a1/2 − b1.

Worker 2 decides on the quantity to order for each part using the inventory data collected by Worker 1. The value of purchase planning is affected by the quality of inventory data capture, q1, and the quality of materials planning, q2.
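Anticipating the analysis that follows, the Nash and organizational quality levels reported in the next section can be checked numerically with the parameter values later quoted for Figs. 2 through 4 (a1 = 3, b1 = 1, c1 = 8 and a2 = 10, b2 = 8, c2 = 9). The sketch below simply evaluates the paper's closed-form solutions; nothing in it is derived independently.

```python
# Closed-form Nash (workers choose) versus organizational quality levels for
# Example 1, using the parameter values quoted for Figs. 2-4.
a1, b1, c1 = 3.0, 1.0, 8.0
a2, b2, c2 = 10.0, 8.0, 9.0

def nash(eps):
    """Workers' voluntary choices (q1*, q2*)."""
    q1 = a1 / (2 * (b1 + c1))
    q2 = a2 / (2 * (b2 + c2)) + eps * a1 * b2 / (4 * (b1 + c1) * (b2 + c2))
    return q1, q2

def org(eps):
    """The organization's preferred levels (q1**, q2**)."""
    den = 4 * (b1 + c1) * (b2 + c2) + 4 * (b2 + c2) * eps - b2**2 * eps**2
    q1 = (2 * a1 * (b2 + c2) + a2 * b2 * eps) / den
    q2 = (2 * a2 * (b1 + c1 + eps) + a1 * b2 * eps) / den
    return q1, q2

for eps in (0.2, 0.4, 0.6, 0.8, 1.0):
    (n1, n2), (o1, o2) = nash(eps), org(eps)
    assert n1 <= o1 and n2 <= o2          # Corollary 1: under-investment
    print(f"eps={eps:.1f}: worker ({n1:.3f}, {n2:.3f})  organization ({o1:.3f}, {o2:.3f})")
```

At ε = 0 the two solutions coincide; for any ε > 0 the organization prefers strictly higher quality for both activities, which is the gap plotted in Figs. 2 and 3.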
Solving these simultaneously, we obtain the Nash solution:

\[ q_1^{*} = \frac{a_1}{2(b_1 + c_1)}, \qquad q_2^{*} = \frac{a_2}{2(b_2 + c_2)} + \varepsilon \, \frac{a_1 b_2}{4(b_1 + c_1)(b_2 + c_2)} \]

Obviously, by applying Theorem 1, the organization would prefer that individual workers not set the quality level.

Corollary 1. Using v1, v2, VN, and the costs defined above, the workers individually will under-invest in quality from an organizational perspective.

It is easily verified that Assumptions 1–4 hold when ε < 4/b2.

This solution is explored further by determining the quality levels the organization would select:

\[ q_1^{**} = \frac{2 a_1 (b_2 + c_2) + a_2 b_2 \varepsilon}{4(b_1 + c_1)(b_2 + c_2) + 4(b_2 + c_2)\varepsilon - b_2^2 \varepsilon^2} \]

\[ q_2^{**} = \frac{2 a_2 (b_1 + c_1 + \varepsilon) + a_1 b_2 \varepsilon}{4(b_1 + c_1)(b_2 + c_2) + 4(b_2 + c_2)\varepsilon - b_2^2 \varepsilon^2} \]

Consider an alternative process organization in which Worker 2 picks the quality levels and performs both of the activities. To obtain the solution, assume that Worker 2 receives a total value of v1 + v2. The worker needs to spend q1² and q2² hours for quality levels q1 and q2 on inventory audit and planning, respectively. The cost to the worker (and the organization) is c2(q1² + q2²).

In Figs. 2 and 3, the quality levels picked by Workers 1 and 2 are plotted against increasing interaction, ε, where the following values have been set: a1 = 3, b1 = 1, c1 = 8 and a2 = 10, b2 = 8, c2 = 9.

In Fig. 2, the quality of inventory audit picked by Worker 1 is lower than that preferred by the organization. In fact, while the organization prefers an increase in the quality of inventory audit as the interaction level increases, the worker continues to pick the same low level. If we follow the principle above, and have Worker 2 perform inventory audit and planning, then the quality level of inventory audit selected by this worker more closely tracks the organization's desired quality level and increases as the interaction level increases.

In Fig. 3, the quality level of planning picked by Worker 2 is lower than that preferred by the organization. Note that this is the case even though the value of inventory audit is not affected by the quality of planning. When Worker 2 performs both tasks, he or she picks a higher quality for planning (and for inventory audit, as shown earlier). This is because the higher level of quality of inventory audit makes an increased investment in the quality of planning more worthwhile. From a process flow perspective, Worker 2 decides to process the data more carefully when the data capturing is done with higher quality.

This analysis indicates the presence of quality problems for the organization. Having Worker 2 perform both of the activities greatly alleviates the problem. This is true, even when Worker 2 is more skilled and better paid than Worker 1, for larger interaction levels. This can be seen in Fig. 4, where the net value to the organization from different processes is plotted for c2 > c1, i.e., when Worker 2's time is more valuable than that of Worker 1. For large dependence, when ε is larger, the organization would prefer that Worker 2 do the job of the less skilled Worker 1 in addition to his or her own job. In fact, Worker 2 picks a higher quality level for both activities if the worker does the whole process. In the case of c1 > c2, this preference holds a fortiori.

Fig. 2. Quality of activity 1 (q1) versus interaction level (ε). [Plot: the level picked by the organization rises with ε while the worker's level stays flat, for ε from 0.2 to 1.]

Fig. 4. Organizational net value versus interaction level (ε).

3.2. Employee empowerment

Employee empowerment uses technology to support decision making and solves some of the externality problems. Employee empowerment is another form of task consolidation, in which the worker earlier in the process, who typically is less skilled, performs many of the tasks that use the data he or she captures. This approach was originally proposed by the Reengineering Principle: "Subsume information processing work into real work that produces the information." [11]. This principle suggests that employees who capture the information also do more of the processing associated with decision making.

Employee empowerment creates challenges for the information system to integrate information and provide decision support and
expert systems to aid hitherto lesser skilled workers to make a decision. This approach offers the advantages of reducing handoffs and increasing quality. These advantages may outweigh the cost of IT support and the relative inefficiency of lesser skilled workers. To explore this, consider once again the "Audit Inventory" and "Materials Planning" tasks at the head of the process diagrammed in Fig. 1. If Worker 1, who previously performed only the inventory audit, is asked to do the material planning as well, he or she might not perform the planning task as efficiently as Worker 2. Let η capture the increase in costs from the inefficiency and the additional technological support when Worker 1 performs the more complicated planning task. In particular, the cost of performing Worker 2's task by Worker 1 is (1 + η)c1q2².

In Fig. 5, η is set equal to 0.1 and the task dependence parameter is varied from 0 to 1. When the tasks are highly dependent, the net value to the organization is higher when Worker 1 performs all tasks than when each worker picks his or her own quality level. In this case, the value of task consolidation exceeds the cost of having a relatively inefficient worker perform all of the tasks.

In general, the choice of having one worker perform the whole task, that is, the choice between worker empowerment and end-user computing, will depend upon the relative inefficiency of the lesser skilled worker and the cost of IT support versus the cost of using a highly skilled worker to perform tasks that require lower skill levels.

Fig. 5. Employee empowerment: net value versus interaction level (ε). [Plot: "Worker 1 picks quality level and performs both tasks" versus "Each worker picks his/her own quality level", for ε from 0.2 to 1.]

3.3. Use of teams

Over the past two decades, the increased use of teams to replace parts of a hierarchical organization has been well documented and is believed to be one of the most powerful enablers of organizational structural change [8]. Teams facilitate interactions, often involved and numerous, among individuals. The interaction and awareness of each other's quality decisions can change the choices made by an individual to ones that more closely reflect the choice of the organization. Teams, including virtual teams, are the favored method for organizing large projects [20]. The value of teams for improving data quality is illustrated below.

Example 2. This example continues the scenario explored in Example 1, which shows that worker i's self-motivated choice is to pick a level of quality qi such that:

\[ a_i - 2 b_i q_i + \varepsilon b_i \sum_{j : i \in P_j} q_j = 2 c_i q_i \]

Assume that individuals are similar and have similar interactions. Let each individual interact, on average, with m other individuals. Invoking symmetry and simultaneously solving the conditions for each worker, we obtain:

\[ q^{*} = \frac{a}{2(b + c) - m \varepsilon b} \]

The case where the organization picks the quality level is:

\[ q^{**} = \frac{a}{2(b + c) - m \varepsilon b - m \varepsilon (b - 2)} \]

For b > 2, the denominator for q** is smaller by mε(b − 2), so, for this case, q** > q*. The same relationship would have been obtained by employing Theorem 1.

Corollary 2. Using the value and cost functions of Example 2, workers under-invest in quality, i.e., q** ≥ q*.

It is interesting to examine the performance of self-managed workers as the interaction level increases. This is explored in Figs. 6 and 7. The solid line is the benchmark case, in which the organization dictates the quality level for all workers and the value of each worker depends on six others. The dashed line is for a self-managed organization with levels of interaction ranging from 1 (a star organization) and 2 (a straight line, or a hierarchy of any span) to larger teams. We see that the individual quality levels and the organization's value increase as teams get larger. (Of course, there may be a point where team size becomes too unwieldy.) The difference at interaction level 6 is the difference between dictated levels and voluntary choice (q** − q*).

3.4. Incentive schemes

Setting up business procedures and incentive schemes to coordinate choices solves data quality problems within organizations. There are three parts to this process: 1) set clear individual and group goals for data quality; 2) set incentives that are tied to success in meeting goals; and 3) build in mechanisms for measuring and compensating workers based on performance.

3.4.1. Setting data quality goals

First, clear data quality goals must be set. From a process perspective, the choice of data quality goals is dictated by the goal of maximizing net value to the customers of the process. Applying this to the work flow in Fig. 1, the key "customers" are the manufacturing department, which schedules its activities based on parts availability, and the supplier, who must be paid for goods and services rendered. The manufacturing department would benefit greatly from an accurate inventory count and a reduced cycle time for the parts ordering process. The supplier wants a short cycle time from shipping to payment. Additionally, there could be organizational goals for control and accounting.

Information systems standards, which are part of any control mechanism for information systems, play a key role in formulating and communicating data quality goals. Programming standards (mandated structured walk-throughs, documentation requirements, etc.) are an example of information systems standards that help improve the

Fig. 6. Impact of team size on quality level. [Plot: self-managed teams versus the benchmark in which the organization dictates quality levels, for team sizes 2 to 6.]
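The team comparative statics of Example 2 can be sketched numerically. The parameter values below are hypothetical, chosen so that b > 2 (the case discussed above) and both denominators remain positive; the calibration behind Figs. 6 and 7 is not stated in the text.

```python
# Symmetric-team quality levels from Example 2: voluntary (Nash) choice q*
# versus the organization's dictated level q**, as the number of interactions
# m grows with team size. Parameters are hypothetical: b > 2, denominators > 0.
a, b, c, eps = 3.0, 3.0, 8.0, 0.2

def q_star(m):        # self-managed workers, each interacting with m others
    return a / (2 * (b + c) - m * eps * b)

def q_dictated(m):    # the organization picks the common quality level
    return a / (2 * (b + c) - m * eps * b - m * eps * (b - 2))

prev = 0.0
for m in range(1, 7):
    qs, qd = q_star(m), q_dictated(m)
    assert qd > qs    # Corollary 2 with b > 2: workers under-invest
    assert qs > prev  # quality rises with the interaction level
    prev = qs
    print(f"m={m}: q*={qs:.4f}  q**={qd:.4f}")
```

Both levels rise as m grows, matching the qualitative pattern of Figs. 6 and 7, and the gap between the dictated and voluntary levels persists at every team size.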
[Fig. 7: q2 plotted against team size (2–6); series: "Largest team size, quality dictated by the organization" and "Self-managed teams".]
Fig. 7. Net value versus team size.

quality of data by reducing errors from software bugs, improving relevance of data, and so forth. Standards play a role similar to that of budgets in financial management. They set the minimal acceptable level that all participants must meet to qualify for incentives.

3.4.2. Setting incentives

Differences in costs and benefits from data quality are a source of difficulty in managing data quality within an organization, as discussed earlier. Consequently, it is only natural to expect incentives, which alter a decision maker's preferences, to play a role in providing a solution. Group and individual incentives that are tied to clear organizational goals can be very effective. These incentives can improve not only on-time performance but also employee morale and cooperation.

3.4.3. Measuring performance

Goals and incentives work only if they are credible. Credibility requires a commitment to reward performance. This, in turn, requires that performance be measured in clear and direct terms in order to provide feedback to employees. Fortunately, information systems are uniquely suited to facilitate measurement. Numerous tools are available to measure human and machine performance. The key is to choose one whose measurements are directly connected to data quality goals. For example, if cycle time is an important goal, the information system can be enhanced to use the time stamps on customer orders to regularly report average cycle times. Another measurement example is the percentage-scanned report that is regularly generated at large retailers to measure the performance of point-of-sale personnel. A certain amount of skill and diligence is required to reliably scan items at the point-of-sale terminal. The system keeps track of the percentage of items that the employee scans instead of typing in the product code. Thus, the percentage correctly scanned is a quality measure that is captured directly by the information system used by the retailers. Employees who obtain the highest scanning percentages are recognized; others may require additional training. This analysis identifies a specific action to be taken by the organization.

3.5. Data ownership policies

Data ownership policies involve decentralization of information technology resources, departmental computing, etc. Coase [7] suggested that ownership, or the allocation of decision making rights, can achieve a similar purpose. The work flow in Fig. 1 can be used to illustrate Coase's approach. First, consider a situation in which the inventory control department does not report to the manufacturing department and maintains its data in its own or a corporate database system. Provided the manufacturing department does not have direct control and incentives are not used, the inventory department manager will underinvest in data quality at data capture from the perspective of the manufacturing manager. This situation was illustrated in Fig. 2. One solution to the problem is to make the inventory department report to the manager of the manufacturing department, who would then set the data quality standards. (A similar solution was proposed by Van Alstyne et al. [25].)

Decentralization of data ownership and management means that, instead of having a centralized database, the data is decentralized and managed by systems owned by the departments. In cases where there is a clear beneficiary from the quality of certain data, Coase's [7] approach requires that the department that derives the most benefit from the data quality also owns the data. It is then free to manage the data server and set usage policies that are aligned with its data quality requirements. This completely obviates the externality problem described in Sections 2 and 3. In cases where there is not a clear beneficiary of data quality, a system of data stewardship would be appropriate because the beneficiaries of data quality exist across the firm; data governance, which combines ownership policies with incentive schemes, is then needed to manage the data quality of the organization.

4. Discussion and implications

4.1. Organizational data interdependencies

Data quality characterizes the whole business process rather than just the data found in corporate databases. Each step in the process, from data capture to processing for decision support, has an impact on the final quality of the data. This creates interdependencies in the organization where the net value that an individual or department receives from data quality depends upon the choices of others. The result is a source of data quality management problems in an organization. The problems may manifest themselves as underinvestment in data quality enhancing activities by individuals because they do not appreciate the value they create for others within the organization. Even if managers are diligent in enhancing data quality, the multi-attribute nature of data quality poses additional problems. Managers make tradeoffs between data quality attributes that suit their own decision making, but not necessarily the organization as a whole.

4.2. Solution approaches

Solutions to these interdependent data quality problems require attention to both the human and machine portions of the information system. Data quality problems and solutions must be considered as early as the design stage of an information system. To the extent that data quality problems can be anticipated, automated checks such as traditional referential and integrity constraints can be built into database management systems. With these constraints in place, all applications that read or update data will transparently receive the benefit of these checks. It is also easier to build data quality performance measurements into the information system at the design stage. It would be useful to analyze the decisions made and determine the impact of data quality on them. Then, a measurement system can be built to measure these key data quality attributes.

Data quality considerations should play a large role in the design of information systems, with data ownership a key decision. The department that owns the data should also manage the data server and set the policies for data use and update. Hence, it would be beneficial if the department that derives the most benefit from the data is the one to own and manage it. This would reduce the interdependency problem. The remaining issues could be dealt with by an incentive system.

Finally, to improve data quality, the human part of the information system deserves as much conscious attention as the machine part. Identifying key data quality characteristics, setting clear data quality goals, and building incentive systems to reward individuals who perform well are essential parts of an organizational architecture.
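The design-stage constraint checks discussed in Section 4.2 can be sketched against any relational DBMS. The sketch below uses SQLite through Python's standard sqlite3 module; the customer/order tables are hypothetical, chosen only for illustration. Once the foreign-key and CHECK constraints are declared, every application writing through the database receives the checks transparently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Hypothetical schema: an order must reference an existing customer,
# and quantities must be positive (a simple integrity constraint).
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE cust_order (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        quantity INTEGER NOT NULL CHECK (quantity > 0)
    )
""")

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO cust_order (id, customer_id, quantity) VALUES (10, 1, 5)")

# Both bad writes are rejected by the DBMS itself, regardless of
# which application issues them.
for bad_sql in (
    "INSERT INTO cust_order (id, customer_id, quantity) VALUES (11, 99, 5)",  # no such customer
    "INSERT INTO cust_order (id, customer_id, quantity) VALUES (12, 1, -3)",  # invalid quantity
):
    try:
        conn.execute(bad_sql)
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```

The design choice this illustrates is exactly the one argued in the text: pushing the check into the database schema means the quality rule is enforced once, centrally, rather than re-implemented in every application.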
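The percentage-scanned report described in Section 3.4.3 reduces to a simple aggregation over point-of-sale events, and is the kind of measurement Section 4.2 suggests building in at design time. A minimal sketch follows; the event log and field names are hypothetical, chosen only to illustrate the metric.

```python
from collections import defaultdict

# Hypothetical point-of-sale event log: (employee, entry_method), where
# entry_method is "scan" or "typed" (manual product-code entry).
events = [
    ("alice", "scan"), ("alice", "scan"), ("alice", "typed"),
    ("bob", "scan"), ("bob", "typed"), ("bob", "typed"), ("bob", "scan"),
]

def percentage_scanned(events):
    """Return {employee: fraction of items captured by scanning}."""
    scanned = defaultdict(int)
    total = defaultdict(int)
    for employee, method in events:
        total[employee] += 1
        if method == "scan":
            scanned[employee] += 1
    return {e: scanned[e] / total[e] for e in total}

report = percentage_scanned(events)
for employee, pct in sorted(report.items()):
    print(f"{employee}: {pct:.0%} scanned")
```

Because the metric is computed from data the system already records, it provides the clear, direct performance feedback that makes the incentive scheme credible.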
Improvement efforts will require the development of the strategies and procedures that need to be put into place to attain new data quality levels. Since the model developed incorporates variables that take the decision makers into consideration, multiple conclusions can be reached about what actions to take for improvement. Thus, this general approach of collecting information to instantiate the variables, then measuring, analyzing, and attempting to improve the data quality, may require iterations that lead to changes in business rules and to improvements in data quality monitoring and analysis.

5. Conclusion

This paper has proposed an analytical model to represent data quality management scenarios that involve multiple stakeholders. Application of the model to different scenarios, with varying values of the model's attributes, illustrated its sensitivity to the changes. The analysis was used to generate a set of organizational policy considerations. Guidelines for improving data quality that stress human factors in addition to technological ones were derived.

This research makes several notions explicit. It recognizes that there are multiple aspects of, and needs for, modeling data quality, with different users of information having different data quality needs. It focuses on data quality and processes, with the notion that users can actually select the level of quality. There are different ways to view and address the data quality problems that any organization faces. Although the research provides one formalized model of data quality, it recognizes that there are others and that an overall increase in data quality by one employee helps another. Data quality depends on the whole business process and explicitly provides value to the firm. Data quality problems, then, should be approached from both employee and organizational perspectives.

Future research is needed to implement the model in a tool and test it on simulated values of variables and real world data. The policies might then be modified and expanded to reflect changing organizational forms. An assessment is needed of the challenges involved in identifying and estimating values for the variables involved in the models.

Acknowledgements

This research was supported by Georgia State University and the University of Rochester. The authors gratefully acknowledge the assistance of the editor-in-chief, the managing editor, and the reviewers.

there is no j with i ∈ Pj. Substituting this into the first order condition for the individual's choice, we get:

∂vi/∂qi − ∂ci/∂qi |q=q** < 0

Since activity i has been taken to be an 'initial' activity, vi depends only on qi. As a function of qi, vi − ci is strictly concave and so qi* < qi**.

If activity i is not initial, then we similarly have:

∂vi/∂qi − ∂ci/∂qi |q=q** < 0 if the activity is not terminal, and = 0 otherwise.

Note that ci depends only on qi while vi depends on qi and on all qk where k precedes i. We can make an inductive hypothesis that qk* < qk** for all k that precede i. For such k we have ∂²vi/∂qi∂qk > 0 and so

∂vi/∂qi |qi=qi**, qk=qk* if k≠i < ∂vi/∂qi |q=q**.

Hence

∂vi/∂qi − ∂ci/∂qi |qi=qi**, qk=qk* if k≠i < 0.

As before, this gives qi* < qi** for activities i that are not initial. QED

Proof of Theorem 2. For λ: 0 ≤ λ ≤ 1, define:

f(q; λ) = λ(VS(q) − Σ ci(qi)) + (1 − λ)(VN(q) − Σ ci(qi))

Let q(λ) solve ∇f(q; λ) = 0. Using the Implicit Function Theorem, we get:

dq(λ)/dλ = −H⁻¹(∇VS − ∇VN)

where H is the Hessian matrix of f(·). (Note that the ∂c/∂q terms are independent of λ and hence do not enter into the above equation.)

Using the convexity of c and the concavity of VS and vi, H is a negative definite matrix. By hypothesis, it has positive off-diagonal elements. By Takayama, p. 393, Thm. 4.D.3, H⁻¹ is non-positive. ∇VS − ∇VN is non-negative by hypothesis. Hence dq(λ)/dλ ≥ 0. This implies that q(0) ≤ q(1), i.e., the optimal quality choice by the organization with synergy is greater than that without synergy. QED
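The comparative static in the proof of Theorem 2 can be checked numerically on a small quadratic instance. The functional forms below are illustrative assumptions, not taken from the paper: linear value, a single pairwise synergy term s·q1·q2, and quadratic costs c·qi², which satisfy the theorem's hypotheses (positive off-diagonal Hessian elements for s > 0, and ∇VS − ∇VN ≥ 0 at positive quality levels).

```python
def optimal_quality(a1, a2, c, s):
    """Maximize a1*q1 + a2*q2 + s*q1*q2 - c*(q1**2 + q2**2).

    First-order conditions: 2c*q1 - s*q2 = a1 and 2c*q2 - s*q1 = a2,
    solved in closed form (requires 4c^2 > s^2 for strict concavity).
    """
    det = 4 * c * c - s * s
    if det <= 0:
        raise ValueError("objective is not strictly concave")
    q1 = (2 * c * a1 + s * a2) / det
    q2 = (2 * c * a2 + s * a1) / det
    return q1, q2

# No synergy (s = 0) versus synergy (s = 0.5): the quality pair rises,
# as Theorem 2 predicts.
q_n = optimal_quality(1.0, 1.0, c=1.0, s=0.0)
q_s = optimal_quality(1.0, 1.0, c=1.0, s=0.5)
print("without synergy:", q_n)  # (0.5, 0.5)
print("with synergy:   ", q_s)
```

On this instance the optimum with synergy dominates the optimum without it in every component, which is exactly the q(0) ≤ q(1) conclusion of the theorem.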
References

[1] D.P. Ballou, H.L. Pazer, Modeling data and process quality in multi-input, multi-output information systems, Management Science 31 (1985) 150–162.
[2] C. Batini, C. Cappiello, C. Francalanci, A. Maurino, Methodologies for data quality assessment and improvement, ACM Computing Surveys 41 (2009) 1–52.
[3] R. Blake, P. Mangiameli, The effects and interactions of data quality and problem complexity on classification, ACM Journal of Data and Information Quality (JDIQ) 1 (2011).
[4] J.A. Brickley, C.W. Smith Jr., J.L. Zimmerman, Managerial Economics and Organizational Architecture, Irwin, 2006.
[5] C. Cappiello, C. Francalanci, B. Pernici, Data quality assessment from the user's perspective, in: Proceedings of the International Workshop on Information Quality in Information Systems (IQIS'04), 2004.
[6] S. Chaudhuri, U. Dayal, V. Narasayya, An overview of business intelligence technology, Communications of the ACM 54 (8) (2011) 88–98.
[7] R. Coase, The problem of social cost, Journal of Law and Economics 3 (1960) 1–44.
[8] T.H. Davenport, Process Innovation: Reengineering Work Through Information Technology, Harvard Business School Press, 1993.
[9] R. Deshpande, The organizational context of marketing research use, Journal of Marketing 46 (1982) 91–101.
[10] D. Halloran, S. Manchester, J. Moriarty, R. Riley, J. Rohrman, T. Skramstad, Systems development quality control, MIS Quarterly 2 (1978) 1–12.
[11] M. Hammer, Reengineering work: don't automate, obliterate, Harvard Business Review (1990) 104–112.
[12] B. Heinrich, M. Kaiser, M. Klier, A procedure to develop metrics for currency and its application in CRM, ACM Journal of Data and Information Quality 1 (2009).
[13] K.M. Huner, M. Ofner, B. Otto, Towards a maturity model for corporate data quality management, in: Proceedings of the 2009 ACM Symposium on Applied Computing (SAC '09), 2009.
[14] M. Jensen, Organization theory and methodology, The Accounting Review 58 (1983) 319–339.
[15] Y.W. Lee, L.L. Pipino, J.S. Funk, R.Y. Wang, Journey to Data Quality, MIT Press, 2009.
[16] S.E. Madnick, Y. Lee, Search of novel ideas and solutions with a broader context of data quality in mind, ACM Journal of Data and Information Quality 4 (2012).
[17] S.E. Madnick, Y. Lee, R.Y. Wang, H. Zhu, Overview and framework for data and information quality research, ACM Journal of Data and Information Quality 1 (2009).
[18] P. Milgrom, J. Roberts, Economics, Organization and Management, Prentice Hall, 1992.
[19] K. Orr, Data quality and systems theory, Communications of the ACM 41 (1998) 66–71.
[20] A. Powell, G. Piccoli, B. Ives, Virtual teams: a review of current literature and directions for future research, ACM SIGMIS Database 35 (2004) 6–36.
[21] L. Rao, K.-M. Osei-Bryson, An approach for incorporating quality-based cost–benefit analysis in data warehouse design, Information Systems Frontiers 10 (2008) 361–373.
[22] T.C. Redman, Improve data quality for competitive advantage, Sloan Management Review, Winter 1995, pp. 99–107.
[23] M. Stoica, N. Chawat, N. Shin, An investigation of the methodologies of business process reengineering, in: Proceedings of the Information Systems Education Conference, 2003.
[24] A. Takayama, Mathematical Economics, Dryden Press, 1985.
[25] M. Van Alstyne, E. Brynjolfsson, S. Madnick, Why not one big database? Principles for data ownership, Decision Support Systems 15 (1995).
[26] R. Wang, H.B. Kon, Towards total data quality management (TDQM), in: Information Technology in Action: Trends and Perspectives, Prentice Hall, Englewood Cliffs, NJ, 1993, pp. 179–197.
[27] R. Wang, V.C. Storey, C. Firth, Data quality research: a framework, survey, and analysis, IEEE Transactions on Knowledge and Data Engineering 7 (1995) 835–842.
[28] K. Weber, B. Otto, H. Oesterle, One size does not fit all — a contingency approach to data governance, ACM Journal of Data and Information Quality 1 (2009).
[29] M. Weske, Business Process Management: Concepts, Languages, Architectures, Springer, 2010.

Rajiv Dewan is the Senior Associate Dean for Faculty and Research, Wm. E. Simon Graduate School of Business Administration, University of Rochester. He has research interests in information technology and strategy and in managing information systems in organizations.

Veda C. Storey is the Tull Professor of Computer Information Systems, J. Mack Robinson College of Business, Georgia State University. She has teaching and research interests in the Semantic Web, data management, conceptual modeling, and knowledge management.

Professor Freimer is professor of operations and computer information systems, Wm. E. Simon Graduate School of Business Administration, University of Rochester. He has teaching and research interests in applied probability and optimization. He applies his work to the analysis of problems in information systems and marketing. His work appears in management, engineering, economics, statistics and mathematics journals.