Big Data and Data Warehouse
Because databases typically process data in real time (or near real time), it is not practical
to allow users access to the databases. After all, the data will change while the user is looking
at them! As a result, data warehouses have been developed to allow users to access data for
decision making. You will learn about data warehouses in Section 3.4.
STEP 2: Activity
You are employed as the coordinator of multiple ongoing projects within a company. Your responsibilities include keeping track of the company’s commercial projects, its employees, and the employees’ participation in each project. Usually, a project will have multiple team members, but some projects have not been assigned to any team members. For each project, the company

STEP 3: Deliverable
Using the relational database design you created in Step 2, prepare a discussion of the advantages and disadvantages of this database. How will it benefit the company? What additional challenges might it create?

Submit your design and your discussion to your instructor.
As recently as the year 2000, only 25 percent of the stored information in the world was
digital. The other 75 percent was analog; that is, it was stored on paper, film, vinyl records, and
the like. By 2015, the amount of stored information in the world was over 98 percent digital and
less than 2 percent nondigital.
As we discussed at the beginning of the chapter, we refer to the superabundance of data
available today as Big Data. That is, Big Data is a collection of data so large and complex that it
is difficult to manage using traditional database management systems. (We capitalize Big Data
to distinguish the term from large amounts of traditional data.)
Essentially, Big Data is about predictions. Predictions do not come from “teaching” computers to “think” like humans. Instead, predictions come from applying mathematics to huge quantities of data to infer probabilities. Consider these examples:

• In 2015 Google was processing more than 27 petabytes of data every day.
• Facebook members upload more than 10 million new photos every hour. In addition, they click a “like” button or leave a comment nearly 3 billion times every day.
• The 800 million monthly users of Google’s YouTube service upload more than an hour of video every second.
• The number of messages on Twitter is growing at 200 percent every year. By mid-2015, the volume exceeded 550 million tweets per day.

Big Data systems perform well because they contain huge amounts of data on which to base their predictions. Moreover, these systems are configured to improve themselves over time by searching for the most valuable signals and patterns as more data are input.

In general, Big Data consist of datasets that:

• Exhibit variety;
• Include structured, unstructured, and semi-structured data;
• Are generated at high velocity with an uncertain pattern;
• Do not fit neatly into traditional, structured, relational databases; and
• Can be captured, processed, transformed, and analyzed in a reasonable amount of time only by sophisticated information systems.

68 CHAPTER 3 Data and Knowledge Management
Big Data generally exhibit three characteristics: volume, velocity, and variety.

1. Volume: We have noted the huge volume of Big Data. Consider machine-generated data, which are generated in much larger quantities than nontraditional data. For instance, sensors in a single jet engine can generate 10 terabytes of data in 30 minutes. (See our discussion of the Internet of Things in Chapter 10.) With more than 25,000 airline flights per day, the daily volume of data from just this single source is incredible. Smart electrical meters, sensors in heavy industrial equipment, and telemetry from automobiles compound the volume problem.
2. Velocity: The rate at which data flow into an organization is rapidly increasing. Velocity is critical because it increases the speed of the feedback loop between a company, its customers, its suppliers, and its business partners. For example, the Internet and mobile technology enable online retailers to compile histories not only on final sales, but on their customers’ every click and interaction. Companies that can quickly utilize that information—for example, by recommending additional purchases—gain competitive advantage.
3. Variety: Traditional data formats tend to be structured and relatively well described, and they change slowly. Traditional data include financial market data, point-of-sale transactions, and much more. In contrast, Big Data formats change rapidly. They include satellite imagery, broadcast audio streams, digital music files, Web page content, scans of government documents, and comments posted on social networks.
Irrespective of their source, structure, format, and frequency, Big Data are valuable. If certain types of data appear to have no value today, it is because we have not yet been able to analyze them effectively. For example, several years ago when Google began harnessing satellite imagery, capturing street views, and then sharing these geographical data for free, few people understood its value. Today, we recognize that such data are incredibly valuable because analyses of Big Data yield deep insights. We discuss analytics in detail in Chapter 5.
Big Data Can Come from Untrusted Sources. As we discussed above, one of the characteristics of Big Data is variety, meaning that Big Data can come from numerous, widely varied sources. These sources may be internal or external to the organization. For instance, a company might want to integrate data from unstructured sources such as e-mails, call center notes, and social media posts with structured data about its customers from its data warehouse. The question is, How trustworthy are those external sources of data? For example, how trustworthy is a tweet? The data may come from an unverified source. Further, the data themselves, reported by the source, can be false or misleading.
Big Data Is Dirty. Dirty data refers to inaccurate, incomplete, incorrect, duplicate, or erroneous data. Examples of such problems are misspelling of words and duplicate data such as retweets or company press releases that appear numerous times in social media.
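To make this concrete, here is a minimal sketch (with made-up posts and a deliberately simple retweet pattern) of the kind of cleanup a company might apply before counting social media mentions:

```python
import re

def normalize(text):
    """Lowercase, strip a leading retweet marker, and collapse whitespace."""
    text = re.sub(r"^rt\s+@\w+:\s*", "", text.strip().lower())
    return re.sub(r"\s+", " ", text)

def deduplicate(posts):
    """Keep the first occurrence of each normalized post, dropping retweets
    and repeated press releases that would otherwise inflate counts."""
    seen, unique = set(), []
    for post in posts:
        key = normalize(post)
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique

posts = [
    "Our new product launches today!",
    "RT @acme: Our new product launches today!",   # retweet duplicate
    "Our new  product launches today!",            # spacing variant
    "Loving the new product.",
]
print(deduplicate(posts))  # two distinct posts remain
```

Real cleansing tools handle far messier cases (misspellings, near-duplicates), but the principle is the same: normalize, then count distinct items.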
Suppose a company is interested in performing a competitive analysis using social media data. The company wants to see how often a competitor’s product appears in social media outlets as well as the sentiments associated with those posts. The company notices that the number of positive posts about the competitor is twice as large as the number of positive posts about itself. This finding could simply be a case where the competitor is pushing out its press releases to multiple sources, in essence “blowing its own horn.” Alternatively, the competitor could be getting many people to retweet an announcement.
Big Data Changes, Especially in Data Streams. Organizations must be aware that data quality in an analysis can change, or the data themselves can change, because the conditions under which the data are captured can change. For instance, imagine a utility company that analyzes weather data and smart-meter data to predict customer power usage. What happens when the utility is analyzing these data in real time and it discovers data missing from some of its smart meters?
Making Big Data Available. Making Big Data available for relevant stakeholders can help
organizations gain value. For example, consider open data in the public sector. Open data is
accessible public data that individuals and organizations can use to create new businesses
and solve complex problems. In particular, government agencies gather very large amounts
of data, some of which is Big Data. Making that data available can provide economic benefits.
The Open Data 500 study at the GovLab at New York University found some 500 examples of
U.S.-based companies whose business models depend on analyzing open government data.
Another example of making Big Data available occurred in the fight against the Ebola virus, as
you see in IT’s About Business 3.3.
POM Creating New Business Models. Companies are able to use Big Data to create new business models. For example, a commercial transportation company operated a large fleet of large, long-haul trucks. The company recently placed sensors on all its trucks. These sensors wirelessly communicate large amounts of information to the company, a process called telematics. The sensors collect data on vehicle usage (including acceleration, braking, cornering, etc.), driver performance, and vehicle maintenance.
By analyzing this Big Data, the transportation company was able to improve the condition of its trucks through near-real-time analysis that proactively suggested preventive maintenance. In addition, the company was able to improve the driving skills of its operators by analyzing their driving styles.
The transportation company then made its Big Data available to its insurance carrier. Using these data, the insurance carrier performed risk analysis on driver behavior and the condition of the trucks, resulting in a more precise assessment. The insurance carrier offered the transportation company a new pricing model that lowered the transportation company’s premiums by 10 percent.
Organizations Can Analyze More Data. In some cases, organizations can even process
all the data relating to a particular phenomenon, meaning that they do not have to rely as
much on sampling. Random sampling works well, but it is not as effective as analyzing an entire
dataset. In addition, random sampling has some basic weaknesses. To begin with, its accuracy
depends on ensuring randomness when collecting the sample data. However, achieving such
randomness is problematic. Systematic biases in the process of data collection can cause the
results to be highly inaccurate. For example, consider political polling using landline phones.
This sample tends to exclude people who use only cell phones. This bias can seriously skew the
results because cell phone users are typically younger and more liberal than people who rely
primarily on landline phones.
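A small simulation illustrates the point. The population, proportions, and sample size below are invented for illustration: support for a policy is higher among cell-only voters, so polling landline users alone understates it.

```python
import random

random.seed(42)

# Hypothetical population of 10,000 voters: support is 60% among
# cell-only voters and 40% among landline voters; each group is half
# the population, so true overall support is exactly 50%.
population = ([("cell", 1)] * 3000 + [("cell", 0)] * 2000
              + [("landline", 1)] * 2000 + [("landline", 0)] * 3000)

def support_rate(sample):
    """Fraction of the sample that supports the policy."""
    return sum(vote for _, vote in sample) / len(sample)

full = support_rate(population)  # analyzing the entire dataset: 0.50

# A landline-only poll systematically excludes cell-only voters.
landline_only = [p for p in population if p[0] == "landline"]
biased = support_rate(random.sample(landline_only, 500))  # about 0.40

print(round(full, 2), round(biased, 2))
```

No amount of extra landline polling fixes this: the bias comes from who can be reached, not from the sample size.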
HRM Human Resources. Employee benefits, particularly healthcare, represent a major business expense. Consequently, some companies have turned to Big Data to better manage these benefits. Caesars Entertainment (www.caesars.com), for example, analyzes health-insurance claim data for its 65,000 employees and their covered family members. Managers can track thousands of variables that indicate how employees use medical services, such as the number of emergency room visits and whether employees choose a generic or brand name drug.
Consider the following scenario: Data revealed that too many employees with medical
emergencies were being treated at hospital emergency rooms rather than at less-expensive
urgent-care facilities. The company launched a campaign to remind employees of the high cost
of emergency room visits, and they provided a list of alternative facilities. Subsequently, 10,000
emergencies shifted to less-expensive alternatives, for a total savings of $4.5 million.
Big Data is also having an impact on hiring. An example is Catalyst IT Services (www.catalystdevworks.com), a technology outsourcing company that hires teams for programming
jobs. Traditional recruiting is typically too slow, and hiring managers often subjectively choose
candidates who are not the best fit for the job. Catalyst addresses this problem by requiring
candidates to fill out an online assessment. It then uses the assessment to collect thousands
of data points about each candidate. In fact, the company collects more data based on how
candidates answer than on what they answer.
For example, the assessment might give a problem requiring calculus to an applicant who
is not expected to know the subject. How the candidate responds—laboring over an answer,
answering quickly and then returning later, or skipping the problem entirely—provides insight
into how that candidate might deal with challenges that he or she will encounter on the job.
That is, someone who labors over a difficult question might be effective in an assignment that
requires a methodical approach to problem solving, whereas an applicant who takes a more
aggressive approach might perform better in a different job setting.
The benefit of this Big Data approach is that it recognizes that people bring different skills
to the table and that there is no one-size-fits-all person for any job. Analyzing millions of data
points can reveal which attributes candidates bring to specific situations.
As one measure of success, employee turnover at Catalyst averages about 15 percent per
year, compared with more than 30 percent for its U.S. competitors and more than 20 percent
for similar companies overseas.
MKT Product Development. Big Data can help capture customer preferences and put that information to work in designing new products. For example, Ford Motor Company (www.ford.com) was considering a “three blink” turn indicator that had been available on its European cars for years. Unlike the turn signals on its U.S. vehicles, this indicator flashes three times at the driver’s touch and then automatically shuts off.
Ford decided that conducting a full-scale market research test on this blinker would be too
costly and time consuming. Instead, it examined auto-enthusiast Web sites and owner forums to
discover what drivers were saying about turn indicators. Using text-mining algorithms, researchers culled more than 10,000 mentions and then summarized the most relevant comments.
The results? Ford introduced the three-blink indicator on the new Ford Fiesta in 2010, and
by 2013 it was available on most Ford products. Although some Ford owners complained online
that they have had trouble getting used to the new turn indicator, many others defended it.
Ford managers note that the use of text-mining algorithms was critical in this effort because
they provided the company with a complete picture that would not have been available using
traditional market research.
POM Operations. For years, companies have been using information technology to make their operations more efficient. Consider United Parcel Service (UPS). The company has long relied on data to improve its operations. Specifically, it uses sensors in its delivery vehicles that can, among other things, capture the truck’s speed and location, the number of times it is placed in reverse, and whether the driver’s seat belt is buckled. These data are uploaded at the end of each day to a UPS data center, where they are analyzed overnight. By combining GPS information and data from sensors installed on more than 46,000 vehicles, UPS reduced fuel consumption by 8.4 million gallons, and it cut 85 million miles off its routes.
MKT Marketing. Marketing managers have long used data to better understand their customers
and to target their marketing efforts more directly. Today, Big Data enables marketers to craft
much more personalized messages.
The United Kingdom’s InterContinental Hotels Group (IHG; www.ihg.com) has gathered
details about the members of its Priority Club rewards program, such as income levels and
whether members prefer family-style or business-traveler accommodations. The company then
consolidated all this information with information obtained from social media into a single
data warehouse. Using its data warehouse and analytics software, the hotelier launched a new
marketing campaign. Where previous marketing campaigns generated, on average, between
7 and 15 customized marketing messages, the new campaign generated more than 1,500. IHG
rolled out these messages in stages to an initial core of 12 customer groups, each of which is
defined by 4,000 attributes. One group, for instance, tends to stay on weekends, redeem reward
points for gift cards, and register through IHG marketing partners. Utilizing this information,
IHG sent these customers a marketing message that alerted them to local weekend events.
The campaign proved to be highly successful. It generated a 35 percent higher rate of customer conversions, or acceptances, than previous, similar campaigns.
POM Government Operations. With 55 percent of the population of the Netherlands living under the threat of flooding, water management is critically important to the Dutch government. The government operates a sophisticated water management system, managing a network of dykes or levees, canals, locks, harbors, dams, rivers, storm-surge barriers, sluices, and pumping stations.
In its water management efforts, the government makes use of a vast number of sensors
embedded in every physical structure used for water control. The sensors generate at least 2
petabytes of data annually. As the sensors are becoming cheaper, the government is deploying
more of them, increasing the amount of data generated.
In just one example of the use of sensor data, sensors in dykes can provide information
on the structure of the dyke, how well it is able to handle the stress of the water it controls,
and whether it is likely to fail. Further, the sensor data are providing valuable insights for new designs for Dutch dykes. The result is that Dutch authorities have reduced the costs of managing water by 15 percent.
and vendors together in a very interactive and engaging way. It uses vast amounts of data (volume), in real time (velocity), from multiple sources (variety) to bring this solution to its customers. Visit YouTube, and search for two videos—“Deliver Personalized Retail Experiences Using Big Data” and “Harnessing Big Data and Social Media to Engage Customers”—both by user “TIBCOSoftware.”

STEP 3: Deliverable
Choose one of the videos mentioned in Step 2, and write a review. In your review, define Big Data, and discuss its basic characteristics relative to the video. Also in your review, note the functional areas of an organization referred to in each video.

Submit your review to your instructor.
transactional systems, where data are organized by business process, such as order entry,
inventory control, and accounts receivable.
• Use online analytical processing. Typically, organizational databases are oriented toward handling transactions. That is, databases use online transaction processing (OLTP), where business transactions are processed online as soon as they occur. The objectives are speed and efficiency, which are critical to a successful Internet-based business operation. Data warehouses and data marts, which are designed to support decision makers but not OLTP, use online analytical processing (OLAP), which involves the analysis of accumulated data by end users. We consider OLAP in greater detail in Chapter 5.
• Integrated. Data are collected from multiple systems and then integrated around subjects.
For example, customer data may be extracted from internal (and external) systems and
then integrated around a customer identifier, thereby creating a comprehensive view of
the customer.
• Time variant. Data warehouses and data marts maintain historical data (i.e., data that
include time as a variable). Unlike transactional systems, which maintain only recent
data (such as for the last day, week, or month), a warehouse or mart may store years of
data. Organizations utilize historical data to detect deviations, trends, and long-term
relationships.
• Nonvolatile. Data warehouses and data marts are nonvolatile—that is, users cannot change
or update the data. Therefore, the warehouse or mart reflects history, which, as we just
saw, is critical for identifying and analyzing trends. Warehouses and marts are updated,
but through IT-controlled load processes rather than by users.
• Multidimensional. Typically the data warehouse or mart uses a multidimensional data structure. Recall that relational databases store data in two-dimensional tables. In contrast, data warehouses and marts store data in more than two dimensions. For this reason, the data are said to be stored in a multidimensional structure. A common representation for this multidimensional structure is the data cube.
The data in data warehouses and marts are organized by business dimensions, which are subjects such as product, geographic area, and time period that represent the edges of the data cube. If you look ahead to Figure 3.6 for an example of a data cube, you see that the product dimension is comprised of nuts, screws, bolts, and washers; the geographic area dimension is comprised of East, West, and Central; and the time period dimension is comprised of 2013, 2014, and 2015. Users can view and analyze data from the perspective of these business dimensions. This analysis is intuitive because the dimensions are presented in business terms that users can easily understand.
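The OLTP/OLAP contrast described above can be sketched in a few lines (the table and sales figures are hypothetical): an OLTP system inserts each transaction as it occurs, whereas an OLAP-style query analyzes the accumulated rows along business dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, year INTEGER, amount REAL)")

# OLTP: each business transaction is written as soon as it occurs.
transactions = [
    ("nuts", "East", 2013, 50.0), ("nuts", "West", 2013, 60.0),
    ("bolts", "East", 2014, 70.0), ("bolts", "East", 2014, 30.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", transactions)

# OLAP: end users analyze the accumulated data along business dimensions.
rows = conn.execute(
    "SELECT product, year, SUM(amount) FROM sales GROUP BY product, year ORDER BY product"
).fetchall()
print(rows)  # [('bolts', 2014, 100.0), ('nuts', 2013, 110.0)]
```

The first workload values speed per transaction; the second values flexible aggregation over history, which is why warehouses are structured differently from operational databases.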
Figure 3.4 depicts a generic data warehouse/data mart environment. Let’s drill down into the
component parts.
Source Systems. There is typically some “organizational pain” (i.e., business need) that
motivates a firm to develop its BI capabilities. Working backward, this pain leads to information
requirements, BI applications, and source system data requirements. The data requirements
can range from a single source system, as in the case of a data mart, to hundreds of source
systems, as in the case of an enterprisewide data warehouse.
Modern organizations can select from a variety of source systems, including: operational/
transactional systems, enterprise resource planning (ERP) systems, Web site data, third-party
data (e.g., customer demographic data), and more. The trend is to include more types of data
(e.g., sensing data from RFID tags). These source systems often use different software packages
(e.g., IBM, Oracle) and store data in different formats (e.g., relational, hierarchical).
A common source for the data in data warehouses is the company’s operational databases, which can be relational databases. To differentiate between relational databases and multidimensional data warehouses and marts, imagine your company manufactures four products—nuts, screws, bolts, and washers—and has sold them in three territories—East, West, and Central—for the previous three years—2013, 2014, and 2015. In a relational database, these sales data would resemble Figure 3.5(a) through (c). In a multidimensional database, in contrast, these data would be represented by a three-dimensional matrix (or data cube), as depicted in Figure 3.6. This matrix represents sales dimensioned by products and regions and year. Notice that Figure 3.5(a) presents only sales for 2013. Sales for 2014 and 2015 are presented in Figure 3.5(b) and (c), respectively. Figure 3.7(a) through (c) illustrates the equivalence between these relational and multidimensional databases.
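That equivalence can also be sketched in code (sales figures invented for illustration): the same facts can live as relational rows keyed by product, region, and year, or as a cube indexed by those three dimensions.

```python
products = ["nuts", "screws", "bolts", "washers"]
regions = ["East", "West", "Central"]
years = [2013, 2014, 2015]

# Relational form: one row per (product, region, year) with a sales figure.
rows = [(p, r, y, 10 * (pi + 1) + ri + yi)
        for pi, p in enumerate(products)
        for ri, r in enumerate(regions)
        for yi, y in enumerate(years)]

# Multidimensional form: a data cube indexed by the three business dimensions.
cube = {(p, r, y): sales for p, r, y, sales in rows}

# Slicing the cube answers dimensional questions directly.
east_2013 = {p: cube[(p, "East", 2013)] for p in products}
print(east_2013)  # {'nuts': 10, 'screws': 20, 'bolts': 30, 'washers': 40}
```

Both forms hold identical data; the cube simply pre-arranges it along the business dimensions users ask questions about.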
Unfortunately, many source systems that have been in use for years contain “bad data”
(e.g., missing or incorrect data) and are poorly documented. As a result, data-profiling software
should be used at the beginning of a warehousing project to better understand the data. For
example, this software can provide statistics on missing data, identify possible primary and
foreign keys, and reveal how derived values (e.g., column 3 = column 1 + column 2) are calculated. Subject area database specialists (e.g., marketing, human resources) can also assist in
understanding and accessing the data in source systems.
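A toy version of such a data-profiling pass (hypothetical records and column names) might report missing-value counts and flag candidate keys like this:

```python
def profile(records):
    """Report missing-value counts per column and flag columns whose
    values are complete and unique (possible primary/foreign keys)."""
    report = {}
    for col in records[0].keys():
        values = [r[col] for r in records]
        present = [v for v in values if v is not None]
        report[col] = {
            "missing": len(values) - len(present),
            "candidate_key": len(present) == len(values) == len(set(present)),
        }
    return report

records = [
    {"cust_id": 1, "zip": "30332", "email": "a@x.com"},
    {"cust_id": 2, "zip": None,    "email": "b@x.com"},
    {"cust_id": 3, "zip": "30332", "email": None},
]
print(profile(records))
# cust_id: 0 missing and unique (candidate key); zip and email each have 1 missing
```

Commercial profiling tools compute much richer statistics, but this is the core idea: understand the source data before loading it.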
Organizations need to address other source systems issues as well. Often there are multiple systems that contain some of the same data, and the best system must be selected as the source system. Organizations must also decide how granular (i.e., detailed) the data should be. For example, does the organization need daily sales figures or data at the individual transaction level? The conventional wisdom is that it is best to store data at a highly granular level because someone will likely request the data at some point.
Data Integration. In addition to storing data in their source systems, organizations need
to extract the data, transform them, and then load them into a data mart or warehouse. This
process is often called ETL, although the term data integration is increasingly being used to
reflect the growing number of ways that source system data can be handled. For example, in
some cases, data are extracted, loaded into a mart or warehouse, and then transformed (i.e.,
ELT rather than ETL).
Data extraction can be performed either by handwritten code (e.g., SQL queries) or by
commercial data-integration software. Most companies employ commercial software. This
software makes it relatively easy to specify the tables and attributes in the source systems that
are to be used, map and schedule the movement of the data to the target (e.g., a data mart or
warehouse), make the required transformations, and ultimately load the data.
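As a rough sketch of extract-and-load with handwritten SQL (an in-memory SQLite database stands in for a real source system; the table and column names are invented):

```python
import sqlite3

# A hypothetical source system (in-memory database as a stand-in).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, cust TEXT, total REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, "ACME", 250.0), (2, "Globex", 125.5)])

# Extract: handwritten SQL selects the tables and attributes of interest.
extracted = source.execute("SELECT order_id, cust, total FROM orders").fetchall()

# Load: the rows are copied into the target data mart/warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (order_id INTEGER, cust TEXT, total REAL)")
warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", extracted)

count = warehouse.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
print(count)  # 2
```

Commercial data-integration software wraps these same steps in mapping, scheduling, and monitoring facilities, which is why most companies prefer it to handwritten code.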
After the data are extracted they are transformed to make them more useful. For example, data from different systems may be integrated around a common key, such as a customer identification number. Organizations adopt this approach to create a 360-degree view of all of their interactions with their customers. As an example of this process, consider a bank. Customers can engage in a variety of interactions: visiting a branch, banking online, using an ATM, obtaining a car loan, and more. The systems for these touch points—defined as the numerous ways that organizations interact with customers, such as e-mail, the Web, direct contact, and the telephone—are typically independent of one another. To obtain a holistic picture of how customers are using the bank, the bank must integrate the data from the various source systems into a data mart or warehouse.
Other kinds of transformations also take place. For example, format changes to the data
may be required, such as using male and female to denote gender, as opposed to 0 and 1 or M
and F. Aggregations may be performed, say on sales figures, so that queries can use the summa
ries rather than recalculating them each time. Data-cleansing software may be used to “clean
up” the data; for example, eliminating duplicate records for the same customer.
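A minimal sketch of these transformations (hypothetical records; the gender recoding, duplicate removal, and region totals mirror the examples just described):

```python
from collections import defaultdict

def transform(records):
    """Recode gender codes to a common format and keep one record per customer."""
    gender_map = {"0": "male", "1": "female", "M": "male", "F": "female"}
    seen, cleaned = set(), []
    for r in records:
        if r["cust_id"] in seen:
            continue  # drop duplicate records for the same customer
        seen.add(r["cust_id"])
        cleaned.append({**r, "gender": gender_map.get(r["gender"], r["gender"])})
    return cleaned

def aggregate_sales(sales):
    """Pre-summarize sales by region so queries can reuse the totals."""
    totals = defaultdict(float)
    for region, amount in sales:
        totals[region] += amount
    return dict(totals)

records = [
    {"cust_id": 7, "gender": "F"},
    {"cust_id": 7, "gender": "female"},   # duplicate of the same customer
    {"cust_id": 9, "gender": "0"},
]
print(transform(records))
print(aggregate_sales([("East", 100.0), ("East", 50.0), ("West", 25.0)]))
# {'East': 150.0, 'West': 25.0}
```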
Finally, data are loaded into the warehouse or mart during a specific period known as the
“load window.” This window is becoming smaller as companies seek to store ever-fresher data
in their warehouses. For this reason, many companies have moved to real-time data warehousing, where data are moved (using data-integration processes) from source systems to the data warehouse or mart almost instantly. For example, within 15 minutes of a purchase at Walmart,
the details of the sale have been loaded into a warehouse and are available for analysis.
Storing the Data. A variety of architectures can be used to store decision-support data. The
most common architecture is one central enterprise data warehouse, without data marts. Most
organizations use this approach, because the data stored in the warehouse are accessed by all
users and represent the single version of the truth.
Another architecture is independent data marts. This architecture stores data for a single
application or a few applications, such as marketing and finance. Limited thought is given to
how the data might be used for other applications or by other functional areas in the organization. This is a very application-centric approach to storing data.
The independent data mart architecture is not particularly effective. Although it may meet a specific organizational need, it does not reflect an enterprise-wide approach to data management. Instead, the various organizational units create independent data marts. Not only are these marts expensive to build and maintain, but they often contain inconsistent data. For example, they may have inconsistent data definitions such as: What is a customer? Is a particular individual a potential or current customer? They might also use different source systems (which may have different data for the same item, such as a customer address). Although independent data marts are an organizational reality, larger companies have increasingly moved to data warehouses.
Still another data warehouse architecture is the hub and spoke. This architecture contains a central data warehouse that stores the data plus multiple dependent data marts that source their data from the central repository. Because the marts obtain their data from the central repository, the data in these marts still comprise the single version of the truth for decision-support purposes.

The dependent data marts store the data in a format that is appropriate for how the data will be used and for providing faster response times to queries and applications. As you have learned, users can view and analyze data from the perspective of business dimensions and measures. This analysis is intuitive because the dimensions are in business terms, easily understood by users.
Metadata. It is important to maintain data about the data, known as metadata, in the data
warehouse. Both the IT personnel who operate and manage the data warehouse and the users
who access the data need metadata. IT personnel need information about data sources; database, table, and column names; refresh schedules; and data-usage measures. Users’ needs include data definitions, report/query tools, report distribution information, and contact information for the help desk.
Data Quality. The quality of the data in the warehouse must meet users’ needs. If it does not,
users will not trust the data and ultimately will not use it. Most organizations find that the quality of the data in source systems is poor and must be improved before the data can be used in
the data warehouse. Some of the data can be improved with data-cleansing software, but the
better, long-term solution is to improve the quality at the source system level. This approach
requires the business owners of the data to assume responsibility for making any necessary
changes to implement this solution.
To illustrate this point, consider the case of a large hotel chain that wanted to conduct targeted marketing promotions using zip code data it collected from its guests when they checked in. When the company analyzed the zip code data, it discovered that many of the zip codes were 99999. How did this error occur? The answer is that the clerks were not asking customers for their zip codes, but they needed to enter something to complete the registration process. A short-term solution to this problem was to conduct the marketing campaign using city and state data instead of zip codes. The long-term solution was to make certain the clerks entered the actual zip codes. The latter solution required the hotel managers to take the responsibility for making certain their clerks enter the correct data.
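The long-term solution amounts to validating data at the point of entry. A minimal sketch of such a check, assuming five-digit U.S. zip codes and treating 99999 and 00000 as placeholder entries:

```python
import re

def valid_zip(zip_code):
    """Reject anything that is not a five-digit zip code, plus common
    placeholder values clerks type just to complete a form."""
    if not re.fullmatch(r"\d{5}", zip_code):
        return False
    return zip_code not in {"99999", "00000"}

entries = ["30332", "99999", "abc12", "00000", "60614"]
print([z for z in entries if valid_zip(z)])  # ['30332', '60614']
```

A production system would also check the zip against a postal database and, ideally, refuse to complete registration until a plausible value is entered.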
Governance. To ensure that BI is meeting their needs, organizations must implement governance to plan and control their BI activities. Governance requires that people, committees, and processes be in place. Companies that are effective in BI governance often create a senior-level committee comprised of vice presidents and directors who (1) ensure that the business strategies and BI strategies are in alignment, (2) prioritize projects, and (3) allocate resources. These
companies also establish a middle management–level committee that oversees the various
projects in the BI portfolio to ensure that these projects are being completed in accordance
with the company’s objectives. Finally, lower level operational committees perform tasks such
as creating data definitions and identifying and solving data problems. All of these committees
rely on the collaboration and contributions of business users and IT personnel.
Users. Once the data are loaded in a data mart or warehouse, they can be accessed. At this
point the organization begins to obtain business value from BI; all of the prior stages are devoted to building the BI infrastructure.
There are many potential BI users, including IT developers; frontline workers; analysts;
information workers; managers and executives; and suppliers, customers, and regulators.
Some of these users are information producers whose primary role is to create information for
other users. IT developers and analysts typically fall into this category. Other users—including
managers and executives—are information consumers, because they utilize information created by others.
Companies have reported hundreds of successful data-warehousing applications. You
can read client success stories and case studies at the Web sites of vendors such as NCR
Corp. (www.ncr.com) and Oracle (www.oracle.com). For a more detailed discussion, visit the
Data Warehouse Institute (http://tdwi.org). The benefits of data warehousing include the
following:
• End users can access needed data quickly and easily via Web browsers because these data
are located in one place.
• End users can conduct extensive analysis with data in ways that were not previously possible.
• End users can obtain a consolidated view of organizational data.
These benefits can improve business knowledge, provide competitive advantage, enhance
customer service and satisfaction, facilitate decision making, and streamline business
processes.
Despite their many benefits, data warehouses have some limitations. First, they can be
very expensive to build and to maintain. Second, incorporating data from obsolete mainframe
systems can be difficult and expensive. Finally, people in one department might be reluctant to
share data with other departments.
STEP 1: Background

A set of general ingredients is required for organizations to effectively utilize the power of data marts and data warehouses. Figure 3.4 presents this information. Healthcare as an industry has not been centralized for many business, legal, and ethical reasons. However, the overall health implications of a centralized data warehouse are unimaginable.

STEP 2: Activity

Visit http://www.wiley.com/go/rainer/MIS4e/applytheconcept, and read the article in WIRED magazine from March 6, 2014, titled “Gadgets Like Fitbit Are Remaking How Doctors Treat You.” As you read this article, you will see that several key ingredients exist, though no one has built a medical data warehouse as described in the article.

STEP 3: Deliverable

To demonstrate that you recognize the environmental factors necessary to implement and maintain a data warehouse, imagine that the date is exactly five years in the future. Write a newspaper article titled “Data from Gadgets Like Fitbit Remade How Doctors Treated Us.” In your article imagine that all of the ingredients necessary in the environment have come together. Discuss what the environment was like five years ago (today) and how things have evolved to create the right mix of environmental factors.

Be aware that there is no right/wrong answer to this exercise. The objective is for you to recognize the necessary environment for a successful data warehouse implementation. The healthcare-related example simply provides a platform to accomplish this task.

Submit your article to your instructor.
Moreover, industry analysts estimate that most of a company’s knowledge assets are not housed in relational databases. Instead, they are dispersed in e-mail, word-processing documents, spreadsheets, presentations on individual computers, and in people’s heads. This arrangement makes it extremely difficult for companies to access and integrate this knowledge. The result frequently is less-effective decision making.
Knowledge. In the information technology context, knowledge is distinct from data and
information. As you learned in Chapter 1, data are a collection of facts, measurements, and
statistics; information is organized or processed data that are timely and accurate. Knowledge
is information that is contextual, relevant, and useful. Simply put, knowledge is information in
action. Intellectual capital (or intellectual assets) is another term for knowledge.
To illustrate, a bulletin listing all of the courses offered by your university during one
semester would be considered data. When you register, you process the data from the bulletin to create your schedule for the semester. Your schedule would be considered information.
Awareness of your work schedule, your major, your desired social schedule, and characteristics
of different faculty members could be construed as knowledge, because it can affect the way
you build your schedule. You see that this awareness is contextual and relevant (to developing
an optimal schedule of classes) as well as useful (it can lead to changes in your schedule). The
implication is that knowledge has strong experiential and reflective elements that distinguish
it from information in a given context. Unlike information, knowledge can be utilized to solve
a problem.
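The registration example can be restated in code: data are raw facts, information is processed data, and knowledge is context applied to that information. A small Python sketch (the courses, start times, and the "morning work shift" rule are hypothetical):

```python
# Data: raw facts — entries from the course bulletin (hypothetical).
bulletin = [
    {"course": "MIS 301", "starts_at": 8},
    {"course": "MIS 310", "starts_at": 14},
    {"course": "ACC 200", "starts_at": 9},
]

# Information: processed data — the courses the student registered for.
registered = {"MIS 301", "MIS 310"}
schedule = [c for c in bulletin if c["course"] in registered]

# Knowledge: contextual and actionable — awareness of a morning work
# shift leads to rescheduling any class that starts before 10 a.m.
to_reschedule = [c["course"] for c in schedule if c["starts_at"] < 10]
print(to_reschedule)  # ['MIS 301']
```

Only the last step is knowledge in the text's sense: it combines the schedule with context about the student's life to produce an action.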
Numerous theories and models classify different types of knowledge. In the next section,
we will focus on the distinction between explicit knowledge and tacit knowledge.
Explicit and Tacit Knowledge. Explicit knowledge is objective, rational, and technical. In an organization, explicit knowledge consists of the policies, procedural guides, reports, products, strategies, goals, core competencies, and IT infrastructure
of the enterprise. In other words, explicit knowledge is the knowledge that has been codified
(documented) in a form that can be distributed to others or transformed into a process or a
strategy. A description of how to process a job application that is documented in a firm’s human
resources policy manual is an example of explicit knowledge.
In contrast, tacit knowledge is the cumulative store of subjective or experiential learning. In an organization, tacit knowledge consists of an organization’s experiences, insights, expertise, know-how, trade secrets, skill sets, understanding, and learning. It also includes the organizational culture, which reflects the past and present experiences of the organization’s people and processes, as well as the organization’s prevailing values. Tacit knowledge is generally imprecise and costly to transfer. It is also highly personal. Finally, because it is unstructured, it is difficult to formalize or codify, in contrast to explicit knowledge. A salesperson who has worked with particular customers over time and has come to know their needs
quite well would possess extensive tacit knowledge. This knowledge is typically not recorded.
In fact, it might be difficult for the salesperson to put into writing, even if he or she were willing
to share it.
Knowledge Management Systems. Organizations now realize they need to integrate explicit and tacit knowledge into formal information systems.
Knowledge management systems (KMSs) refer to the use of modern information technologies—the Internet, intranets, extranets, databases—to systematize, enhance, and expedite intrafirm and interfirm knowledge management. KMSs are intended to help an organization
cope with turnover, rapid change, and downsizing by making the expertise of the organization’s
human capital widely accessible. IT’s About Business 3.4 illustrates how Performance Bicycle
implemented the Learning Center, a knowledge management system.
Organizations can realize many benefits with KMSs. Most importantly, they make best
practices—the most effective and efficient ways of doing things—readily available to a wide
range of employees. Enhanced access to best-practice knowledge improves overall organizational performance. For example, account managers can now make available their tacit
knowledge about how best to manage large accounts. The organization can then utilize this
knowledge when it trains new account managers. Other benefits include improved customer
service, more efficient product development, and improved employee morale and retention.
The KMS Cycle. A functioning KMS follows a cycle that consists of six steps:
1. Create knowledge. Knowledge is created as people determine new ways of doing things or
develop know-how. Sometimes external knowledge is brought in.
2. Capture knowledge. New knowledge must be identified as valuable and be represented in
a reasonable way.
3. Refine knowledge. New knowledge must be placed in context so that it is actionable. This
is where tacit qualities (human insights) must be captured along with explicit facts.
4. Store knowledge. Useful knowledge must then be stored in a reasonable format in a knowledge repository so that other people in the organization can access it.
5. Manage knowledge. Like a library, the knowledge must be kept current. It must be reviewed regularly to verify that it is relevant and accurate.
6. Disseminate knowledge. Knowledge must be made available in a useful format to anyone
in the organization who needs it, anywhere and anytime.
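The six steps above can be sketched as operations on a shared repository. A minimal Python illustration (the class, method names, and sample entry are invented for this sketch; a real KMS is far richer):

```python
# A toy model of the KMS cycle: store covers steps 1-4 (create,
# capture, refine, store), review covers step 5 (manage), and
# retrieve covers step 6 (disseminate).
class KnowledgeRepository:
    def __init__(self):
        self.entries = {}

    def store(self, topic, content, context):
        """Keep new knowledge together with the context that makes
        it actionable, then place it in the repository."""
        self.entries[topic] = {"content": content, "context": context,
                               "current": True}

    def review(self, topic, still_accurate):
        """Manage knowledge: flag stale entries during periodic review."""
        self.entries[topic]["current"] = still_accurate

    def retrieve(self, topic):
        """Disseminate knowledge: only current entries are shared."""
        entry = self.entries.get(topic)
        return entry if entry and entry["current"] else None

repo = KnowledgeRepository()
repo.store("large accounts", "quarterly on-site visits", "sales")
print(repo.retrieve("large accounts") is not None)  # True
repo.review("large accounts", still_accurate=False)
print(repo.retrieve("large accounts"))  # None
```

The point of the sketch is the cycle itself: knowledge that is stored but never reviewed stops being disseminated, which mirrors the "like a library" requirement in step 5.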