Big Data and Data Warehouse
Because databases typically process data in real time (or near real time), it is not practical
to allow users access to the databases. After all, the data will change while the user is looking
at them! As a result, data warehouses have been developed to allow users to access data for
decision making. You will learn about data warehouses in Section 3.4.
STEP 2: Activity
You are employed as the coordinator of multiple ongoing projects within a company. Your responsibilities include keeping track of the company’s commercial projects, its employees, and the employees’ participation in each project. Usually, a project will have multiple team members, but some projects have not been assigned to any team members. For each project, the company

STEP 3: Deliverable
Using the relational database design you created in Step 2, prepare a discussion of the advantages and disadvantages of this database. How will it benefit the company? What additional challenges might it create?

Submit your design and your discussion to your instructor.
As recently as the year 2000, only 25 percent of the stored information in the world was
digital. The other 75 percent was analog; that is, it was stored on paper, film, vinyl records, and
the like. By 2015, the amount of stored information in the world was over 98 percent digital and
less than 2 percent nondigital.
As we discussed at the beginning of the chapter, we refer to the superabundance of data
available today as Big Data. That is, Big Data is a collection of data so large and complex that it
is difficult to manage using traditional database management systems. (We capitalize Big Data
to distinguish the term from large amounts of traditional data.)
Essentially, Big Data is about predictions. Predictions do not come from “teaching” computers to “think” like humans. Instead, predictions come from applying mathematics to huge quantities of data to infer probabilities. Consider these examples:

• In 2015 Google was processing more than 27 petabytes of data every day.
• Facebook members upload more than 10 million new photos every hour. In addition, they click a “like” button or leave a comment nearly 3 billion times every day.
• The 800 million monthly users of Google’s YouTube service upload more than an hour of video every second.
• The number of messages on Twitter is growing at 200 percent every year. By mid-2015, the volume exceeded 550 million tweets per day.

Big Data systems perform well because they contain huge amounts of data on which to base their predictions. Moreover, these systems are configured to improve themselves over time by searching for the most valuable signals and patterns as more data are input.

In general, Big Data consist of datasets that:

• Exhibit variety;
• Include structured, unstructured, and semi-structured data;
• Are generated at high velocity with an uncertain pattern;
• Do not fit neatly into traditional, structured, relational databases; and
• Can be captured, processed, transformed, and analyzed in a reasonable amount of time only by sophisticated information systems.

68 CHAPTER 3 Data and Knowledge Management
Big Data generally exhibit three characteristics: volume, velocity, and variety.

1. Volume: We have noted the huge volume of Big Data. Consider machine-generated data, which are generated in much larger quantities than nontraditional data. For instance, sensors in a single jet engine can generate 10 terabytes of data in 30 minutes. (See our discussion of the Internet of Things in Chapter 10.) With more than 25,000 airline flights per day, the daily volume of data from just this single source is incredible. Smart electrical meters, sensors in heavy industrial equipment, and telemetry from automobiles compound the volume problem.
2. Velocity: The rate at which data flow into an organization is rapidly increasing. Velocity is critical because it increases the speed of the feedback loop between a company, its customers, its suppliers, and its business partners. For example, the Internet and mobile technology enable online retailers to compile histories not only on final sales, but on their customers’ every click and interaction. Companies that can quickly utilize that information—for example, by recommending additional purchases—gain competitive advantage.
3. Variety: Traditional data formats tend to be structured and relatively well described, and they change slowly. Traditional data include financial market data, point-of-sale transactions, and much more. In contrast, Big Data formats change rapidly. They include satellite imagery, broadcast audio streams, digital music files, Web page content, scans of government documents, and comments posted on social networks.
Irrespective of their source, structure, format, and frequency, Big Data are valuable. If certain types of data appear to have no value today, it is because we have not yet been able to analyze them effectively. For example, several years ago when Google began harnessing satellite imagery, capturing street views, and then sharing these geographical data for free, few people understood its value. Today, we recognize that such data are incredibly valuable because analyses of Big Data yield deep insights. We discuss analytics in detail in Chapter 5.
Big Data Can Come from Untrusted Sources. As we discussed above, one of the characteristics of Big Data is variety, meaning that Big Data can come from numerous, widely varied sources. These sources may be internal or external to the organization. For instance, a company might want to integrate data from unstructured sources such as e-mails, call center notes, and social media posts with structured data about its customers from its data warehouse. The question is, How trustworthy are those external sources of data? For example, how trustworthy is a tweet? The data may come from an unverified source. Further, the data themselves, reported by the source, can be false or misleading.
Big Data Is Dirty. Dirty data refers to inaccurate, incomplete, incorrect, duplicate, or erroneous data. Examples of such problems are misspelling of words and duplicate data such as retweets or company press releases that appear numerous times in social media.
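To make this concrete, here is a minimal sketch (with made-up posts and a deliberately simple retweet pattern) of the kind of cleanup a company might apply before counting social media mentions:

```python
import re

def normalize(text):
    """Lowercase, strip a leading retweet marker, and collapse whitespace."""
    text = re.sub(r"^rt\s+@\w+:\s*", "", text.strip().lower())
    return re.sub(r"\s+", " ", text)

def deduplicate(posts):
    """Keep the first occurrence of each normalized post, dropping retweets
    and repeated press releases that would otherwise inflate counts."""
    seen, unique = set(), []
    for post in posts:
        key = normalize(post)
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique

posts = [
    "Our new product launches today!",
    "RT @acme: Our new product launches today!",   # retweet duplicate
    "Our new  product launches today!",            # spacing variant
    "Loving the new product.",
]
print(deduplicate(posts))  # two distinct posts remain
```

Real cleansing tools handle far messier cases (misspellings, near-duplicates), but the principle is the same: normalize, then count distinct items.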
Suppose a company is interested in performing a competitive analysis using social media data. The company wants to see how often a competitor’s product appears in social media outlets as well as the sentiments associated with those posts. The company notices that the number of positive posts about the competitor is twice as large as the number of positive posts about itself. This finding could simply be a case where the competitor is pushing out its press releases to multiple sources, in essence “blowing its own horn.” Alternatively, the competitor could be getting many people to retweet an announcement.
Big Data Changes, Especially in Data Streams. Organizations must be aware that data quality in an analysis can change, or the data themselves can change, because the conditions under which the data are captured can change. For instance, imagine a utility company that analyzes weather data and smart-meter data to predict customer power usage. What happens when the utility is analyzing these data in real time and it discovers data missing from some of its smart meters?
Making Big Data Available. Making Big Data available for relevant stakeholders can help
organizations gain value. For example, consider open data in the public sector. Open data is
accessible public data that individuals and organizations can use to create new businesses
and solve complex problems. In particular, government agencies gather very large amounts
of data, some of which is Big Data. Making that data available can provide economic benefits.
The Open Data 500 study at the GovLab at New York University found some 500 examples of
U.S.-based companies whose business models depend on analyzing open government data.
Another example of making Big Data available occurred in the fight against the Ebola virus, as
you see in IT’s About Business 3.3.
POM Creating New Business Models. Companies are able to use Big Data to create new business models. For example, a commercial transportation company operated a large fleet of large, long-haul trucks. The company recently placed sensors on all its trucks. These sensors wirelessly communicate large amounts of information to the company, a process called telematics. The sensors collect data on vehicle usage (including acceleration, braking, cornering, etc.), driver performance, and vehicle maintenance.
By analyzing this Big Data, the transportation company was able to improve the condition of its trucks through near-real-time analysis that proactively suggested preventive maintenance. In addition, the company was able to improve the driving skills of its operators by analyzing their driving styles.
The transportation company then made its Big Data available to its insurance carrier. Using these data, the insurance carrier performed risk analysis on driver behavior and the condition of the trucks, resulting in a more precise assessment. The insurance carrier offered the transportation company a new pricing model that lowered the transportation company’s premiums by 10 percent.
Organizations Can Analyze More Data. In some cases, organizations can even process
all the data relating to a particular phenomenon, meaning that they do not have to rely as
much on sampling. Random sampling works well, but it is not as effective as analyzing an entire
dataset. In addition, random sampling has some basic weaknesses. To begin with, its accuracy
depends on ensuring randomness when collecting the sample data. However, achieving such
randomness is problematic. Systematic biases in the process of data collection can cause the
results to be highly inaccurate. For example, consider political polling using landline phones.
This sample tends to exclude people who use only cell phones. This bias can seriously skew the
results because cell phone users are typically younger and more liberal than people who rely
primarily on landline phones.
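A small simulation illustrates the point. The population, proportions, and sample size below are invented for illustration: support for a policy is higher among cell-only voters, so polling landline users alone understates it.

```python
import random

random.seed(42)

# Hypothetical population of 10,000 voters: support is 60% among
# cell-only voters and 40% among landline voters; each group is half
# the population, so true overall support is exactly 50%.
population = ([("cell", 1)] * 3000 + [("cell", 0)] * 2000
              + [("landline", 1)] * 2000 + [("landline", 0)] * 3000)

def support_rate(sample):
    """Fraction of the sample that supports the policy."""
    return sum(vote for _, vote in sample) / len(sample)

full = support_rate(population)  # analyzing the entire dataset: 0.50

# A landline-only poll systematically excludes cell-only voters.
landline_only = [p for p in population if p[0] == "landline"]
biased = support_rate(random.sample(landline_only, 500))  # about 0.40

print(round(full, 2), round(biased, 2))
```

No amount of extra landline polling fixes this: the bias comes from who can be reached, not from the sample size.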
HRM Human Resources. Employee benefits, particularly healthcare, represent a major business expense. Consequently, some companies have turned to Big Data to better manage these benefits. Caesars Entertainment (www.caesars.com), for example, analyzes health-insurance claim data for its 65,000 employees and their covered family members. Managers can track thousands of variables that indicate how employees use medical services, such as the number of emergency room visits and whether employees choose a generic or brand name drug.
Consider the following scenario: Data revealed that too many employees with medical
emergencies were being treated at hospital emergency rooms rather than at less-expensive
urgent-care facilities. The company launched a campaign to remind employees of the high cost
of emergency room visits, and they provided a list of alternative facilities. Subsequently, 10,000
emergencies shifted to less-expensive alternatives, for a total savings of $4.5 million.
Big Data is also having an impact on hiring. An example is Catalyst IT Services (www.catalystdevworks.com), a technology outsourcing company that hires teams for programming
jobs. Traditional recruiting is typically too slow, and hiring managers often subjectively choose
candidates who are not the best fit for the job. Catalyst addresses this problem by requiring
candidates to fill out an online assessment. It then uses the assessment to collect thousands
of data points about each candidate. In fact, the company collects more data based on how
candidates answer than on what they answer.
For example, the assessment might give a problem requiring calculus to an applicant who
is not expected to know the subject. How the candidate responds—laboring over an answer,
answering quickly and then returning later, or skipping the problem entirely—provides insight
into how that candidate might deal with challenges that he or she will encounter on the job.
That is, someone who labors over a difficult question might be effective in an assignment that
requires a methodical approach to problem solving, whereas an applicant who takes a more
aggressive approach might perform better in a different job setting.
The benefit of this Big Data approach is that it recognizes that people bring different skills
to the table and that there is no one-size-fits-all person for any job. Analyzing millions of data
points can reveal which attributes candidates bring to specific situations.
As one measure of success, employee turnover at Catalyst averages about 15 percent per
year, compared with more than 30 percent for its U.S. competitors and more than 20 percent
for similar companies overseas.
MKT Product Development. Big Data can help capture customer preferences and put that information to work in designing new products. For example, Ford Motor Company (www.ford.com) was considering a “three blink” turn indicator that had been available on its European cars for years. Unlike the turn signals on its U.S. vehicles, this indicator flashes three times at the driver’s touch and then automatically shuts off.
Ford decided that conducting a full-scale market research test on this blinker would be too
costly and time consuming. Instead, it examined auto-enthusiast Web sites and owner forums to
discover what drivers were saying about turn indicators. Using text-mining algorithms, researchers culled more than 10,000 mentions and then summarized the most relevant comments.
The results? Ford introduced the three-blink indicator on the new Ford Fiesta in 2010, and
by 2013 it was available on most Ford products. Although some Ford owners complained online
that they have had trouble getting used to the new turn indicator, many others defended it.
Ford managers note that the use of text-mining algorithms was critical in this effort because
they provided the company with a complete picture that would not have been available using
traditional market research.
POM Operations. For years, companies have been using information technology to make their operations more efficient. Consider United Parcel Service (UPS). The company has long relied on data to improve its operations. Specifically, it uses sensors in its delivery vehicles that can, among other things, capture the truck’s speed and location, the number of times it is placed in reverse, and whether the driver’s seat belt is buckled. These data are uploaded at the end of each day to a UPS data center, where they are analyzed overnight. By combining GPS information and data from sensors installed on more than 46,000 vehicles, UPS reduced fuel consumption by 8.4 million gallons, and it cut 85 million miles off its routes.
MKT Marketing. Marketing managers have long used data to better understand their customers
and to target their marketing efforts more directly. Today, Big Data enables marketers to craft
much more personalized messages.
The United Kingdom’s InterContinental Hotels Group (IHG; www.ihg.com) has gathered
details about the members of its Priority Club rewards program, such as income levels and
whether members prefer family-style or business-traveler accommodations. The company then
consolidated all this information with information obtained from social media into a single
data warehouse. Using its data warehouse and analytics software, the hotelier launched a new
marketing campaign. Where previous marketing campaigns generated, on average, between
7 and 15 customized marketing messages, the new campaign generated more than 1,500. IHG
rolled out these messages in stages to an initial core of 12 customer groups, each of which is
defined by 4,000 attributes. One group, for instance, tends to stay on weekends, redeem reward
points for gift cards, and register through IHG marketing partners. Utilizing this information,
IHG sent these customers a marketing message that alerted them to local weekend events.
The campaign proved to be highly successful. It generated a 35 percent higher rate of customer conversions, or acceptances, than previous, similar campaigns.
POM Government Operations. With 55 percent of the population of the Netherlands living under the threat of flooding, water management is critically important to the Dutch government. The government operates a sophisticated water management system, managing a network of dykes or levees, canals, locks, harbors, dams, rivers, storm-surge barriers, sluices, and pumping stations.
In its water management efforts, the government makes use of a vast number of sensors
embedded in every physical structure used for water control. The sensors generate at least 2
petabytes of data annually. As the sensors are becoming cheaper, the government is deploying
more of them, increasing the amount of data generated.
In just one example of the use of sensor data, sensors in dykes can provide information
on the structure of the dyke, how well it is able to handle the stress of the water it controls,
and whether it is likely to fail. Further, the sensor data are providing valuable insights for new designs for Dutch dykes. The result is that Dutch authorities have reduced the costs of managing water by 15 percent.
and vendors together in a very interactive and engaging way. It uses vast amounts of data (volume), in real time (velocity), from multiple sources (variety) to bring this solution to its customers. Visit YouTube, and search for two videos—“Deliver Personalized Retail Experiences Using Big Data” and “Harnessing Big Data and Social Media to Engage Customers”—both by user “TIBCOSoftware.”

STEP 3: Deliverable
Choose one of the videos mentioned in Step 2, and write a review. In your review, define Big Data, and discuss its basic characteristics relative to the video. Also in your review, note the functional areas of an organization referred to in each video.

Submit your review to your instructor.
transactional systems, where data are organized by business process, such as order entry,
inventory control, and accounts receivable.
• Use online analytical processing. Typically, organizational databases are oriented toward handling transactions. That is, databases use online transaction processing (OLTP), where business transactions are processed online as soon as they occur. The objectives are speed and efficiency, which are critical to a successful Internet-based business operation. Data warehouses and data marts, which are designed to support decision makers but not OLTP, use online analytical processing (OLAP), which involves the analysis of accumulated data by end users. We consider OLAP in greater detail in Chapter 5.
• Integrated. Data are collected from multiple systems and then integrated around subjects.
For example, customer data may be extracted from internal (and external) systems and
then integrated around a customer identifier, thereby creating a comprehensive view of
the customer.
• Time variant. Data warehouses and data marts maintain historical data (i.e., data that
include time as a variable). Unlike transactional systems, which maintain only recent
data (such as for the last day, week, or month), a warehouse or mart may store years of
data. Organizations utilize historical data to detect deviations, trends, and long-term
relationships.
• Nonvolatile. Data warehouses and data marts are nonvolatile—that is, users cannot change
or update the data. Therefore, the warehouse or mart reflects history, which, as we just
saw, is critical for identifying and analyzing trends. Warehouses and marts are updated,
but through IT-controlled load processes rather than by users.
• Multidimensional. Typically the data warehouse or mart uses a multidimensional data structure. Recall that relational databases store data in two-dimensional tables. In contrast, data warehouses and marts store data in more than two dimensions. For this reason, the data are said to be stored in a multidimensional structure. A common representation for this multidimensional structure is the data cube.
The data in data warehouses and marts are organized by business dimensions, which are subjects such as product, geographic area, and time period that represent the edges of the data cube. If you look ahead to Figure 3.6 for an example of a data cube, you see that the product dimension is comprised of nuts, screws, bolts, and washers; the geographic area dimension is comprised of East, West, and Central; and the time period dimension is comprised of 2013, 2014, and 2015. Users can view and analyze data from the perspective of these business dimensions. This analysis is intuitive because the dimensions are presented in business terms that users can easily understand.
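The OLTP/OLAP contrast described above can be sketched in a few lines (the table and sales figures are hypothetical): an OLTP system inserts each transaction as it occurs, whereas an OLAP-style query analyzes the accumulated rows along business dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, year INTEGER, amount REAL)")

# OLTP: each business transaction is written as soon as it occurs.
transactions = [
    ("nuts", "East", 2013, 50.0), ("nuts", "West", 2013, 60.0),
    ("bolts", "East", 2014, 70.0), ("bolts", "East", 2014, 30.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", transactions)

# OLAP: end users analyze the accumulated data along business dimensions.
rows = conn.execute(
    "SELECT product, year, SUM(amount) FROM sales GROUP BY product, year ORDER BY product"
).fetchall()
print(rows)  # [('bolts', 2014, 100.0), ('nuts', 2013, 110.0)]
```

The first workload values speed per transaction; the second values flexible aggregation over history, which is why warehouses are structured differently from operational databases.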
Figure 3.4 depicts a generic data warehouse/data mart environment. Let’s drill down into the
component parts.
Source Systems. There is typically some “organizational pain” (i.e., business need) that
motivates a firm to develop its BI capabilities. Working backward, this pain leads to information
requirements, BI applications, and source system data requirements. The data requirements
can range from a single source system, as in the case of a data mart, to hundreds of source
systems, as in the case of an enterprisewide data warehouse.
Modern organizations can select from a variety of source systems, including: operational/
transactional systems, enterprise resource planning (ERP) systems, Web site data, third-party
data (e.g., customer demographic data), and more. The trend is to include more types of data
(e.g., sensing data from RFID tags). These source systems often use different software packages
(e.g., IBM, Oracle) and store data in different formats (e.g., relational, hierarchical).
A common source for the data in data warehouses is the company’s operational databases, which can be relational databases. To differentiate between relational databases and multidimensional data warehouses and marts, imagine your company manufactures four products—nuts, screws, bolts, and washers—and has sold them in three territories—East, West, and Central—for the previous three years—2013, 2014, and 2015. In a relational database, these sales data would resemble Figure 3.5(a) through (c). In a multidimensional database, in contrast, these data would be represented by a three-dimensional matrix (or data cube), as depicted in Figure 3.6. This matrix represents sales dimensioned by products and regions and year. Notice that Figure 3.5(a) presents only sales for 2013. Sales for 2014 and 2015 are presented in Figure 3.5(b) and (c), respectively. Figure 3.7(a) through (c) illustrates the equivalence between these relational and multidimensional databases.
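That equivalence can also be sketched in code (sales figures invented for illustration): the same facts can live as relational rows keyed by product, region, and year, or as a cube indexed by those three dimensions.

```python
products = ["nuts", "screws", "bolts", "washers"]
regions = ["East", "West", "Central"]
years = [2013, 2014, 2015]

# Relational form: one row per (product, region, year) with a sales figure.
rows = [(p, r, y, 10 * (pi + 1) + ri + yi)
        for pi, p in enumerate(products)
        for ri, r in enumerate(regions)
        for yi, y in enumerate(years)]

# Multidimensional form: a data cube indexed by the three business dimensions.
cube = {(p, r, y): sales for p, r, y, sales in rows}

# Slicing the cube answers dimensional questions directly.
east_2013 = {p: cube[(p, "East", 2013)] for p in products}
print(east_2013)  # {'nuts': 10, 'screws': 20, 'bolts': 30, 'washers': 40}
```

Both forms hold identical data; the cube simply pre-arranges it along the business dimensions users ask questions about.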
Unfortunately, many source systems that have been in use for years contain “bad data”
(e.g., missing or incorrect data) and are poorly documented. As a result, data-profiling software
should be used at the beginning of a warehousing project to better understand the data. For
example, this software can provide statistics on missing data, identify possible primary and
foreign keys, and reveal how derived values (e.g., column 3 = column 1 + column 2) are calculated. Subject area database specialists (e.g., marketing, human resources) can also assist in
understanding and accessing the data in source systems.
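A toy version of such a data-profiling pass (hypothetical records and column names) might report missing-value counts and flag candidate keys like this:

```python
def profile(records):
    """Report missing-value counts per column and flag columns whose
    values are complete and unique (possible primary/foreign keys)."""
    report = {}
    for col in records[0].keys():
        values = [r[col] for r in records]
        present = [v for v in values if v is not None]
        report[col] = {
            "missing": len(values) - len(present),
            "candidate_key": len(present) == len(values) == len(set(present)),
        }
    return report

records = [
    {"cust_id": 1, "zip": "30332", "email": "a@x.com"},
    {"cust_id": 2, "zip": None,    "email": "b@x.com"},
    {"cust_id": 3, "zip": "30332", "email": None},
]
print(profile(records))
# cust_id: 0 missing and unique (candidate key); zip and email each have 1 missing
```

Commercial profiling tools compute much richer statistics, but this is the core idea: understand the source data before loading it.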
Organizations need to address other source systems issues as well. Often there are multiple systems that contain some of the same data, and the best system must be selected as the source system. Organizations must also decide how granular (i.e., detailed) the data should be. For example, does the organization need daily sales figures or data at the individual transaction level? The conventional wisdom is that it is best to store data at a highly granular level because someone will likely request the data at some point.
Data Integration. In addition to storing data in their source systems, organizations need
to extract the data, transform them, and then load them into a data mart or warehouse. This
process is often called ETL, although the term data integration is increasingly being used to
reflect the growing number of ways that source system data can be handled. For example, in
some cases, data are extracted, loaded into a mart or warehouse, and then transformed (i.e.,
ELT rather than ETL).
Data extraction can be performed either by handwritten code (e.g., SQL queries) or by
commercial data-integration software. Most companies employ commercial software. This
software makes it relatively easy to specify the tables and attributes in the source systems that
are to be used, map and schedule the movement of the data to the target (e.g., a data mart or
warehouse), make the required transformations, and ultimately load the data.
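As a rough sketch of extract-and-load with handwritten SQL (an in-memory SQLite database stands in for a real source system; the table and column names are invented):

```python
import sqlite3

# A hypothetical source system (in-memory database as a stand-in).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, cust TEXT, total REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, "ACME", 250.0), (2, "Globex", 125.5)])

# Extract: handwritten SQL selects the tables and attributes of interest.
extracted = source.execute("SELECT order_id, cust, total FROM orders").fetchall()

# Load: the rows are copied into the target data mart/warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (order_id INTEGER, cust TEXT, total REAL)")
warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", extracted)

count = warehouse.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
print(count)  # 2
```

Commercial data-integration software wraps these same steps in mapping, scheduling, and monitoring facilities, which is why most companies prefer it to handwritten code.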
After the data are extracted they are transformed to make them more useful. For example, data from different systems may be integrated around a common key, such as a customer identification number. Organizations adopt this approach to create a 360-degree view of all of their interactions with their customers. As an example of this process, consider a bank. Customers can engage in a variety of interactions: visiting a branch, banking online, using an ATM, obtaining a car loan, and more. The systems for these touch points—defined as the numerous ways that organizations interact with customers, such as e-mail, the Web, direct contact, and the telephone—are typically independent of one another. To obtain a holistic picture of how customers are using the bank, the bank must integrate the data from the various source systems into a data mart or warehouse.
Other kinds of transformations also take place. For example, format changes to the data
may be required, such as using male and female to denote gender, as opposed to 0 and 1 or M
and F. Aggregations may be performed, say on sales figures, so that queries can use the summa
ries rather than recalculating them each time. Data-cleansing software may be used to “clean
up” the data; for example, eliminating duplicate records for the same customer.
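A minimal sketch of these transformations (hypothetical records; the gender recoding, duplicate removal, and region totals mirror the examples just described):

```python
from collections import defaultdict

def transform(records):
    """Recode gender codes to a common format and keep one record per customer."""
    gender_map = {"0": "male", "1": "female", "M": "male", "F": "female"}
    seen, cleaned = set(), []
    for r in records:
        if r["cust_id"] in seen:
            continue  # drop duplicate records for the same customer
        seen.add(r["cust_id"])
        cleaned.append({**r, "gender": gender_map.get(r["gender"], r["gender"])})
    return cleaned

def aggregate_sales(sales):
    """Pre-summarize sales by region so queries can reuse the totals."""
    totals = defaultdict(float)
    for region, amount in sales:
        totals[region] += amount
    return dict(totals)

records = [
    {"cust_id": 7, "gender": "F"},
    {"cust_id": 7, "gender": "female"},   # duplicate of the same customer
    {"cust_id": 9, "gender": "0"},
]
print(transform(records))
print(aggregate_sales([("East", 100.0), ("East", 50.0), ("West", 25.0)]))
# {'East': 150.0, 'West': 25.0}
```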
Finally, data are loaded into the warehouse or mart during a specific period known as the
“load window.” This window is becoming smaller as companies seek to store ever-fresher data
in their warehouses. For this reason, many companies have moved to real-time data warehousing, where data are moved (using data-integration processes) from source systems to the data warehouse or mart almost instantly. For example, within 15 minutes of a purchase at Walmart,
the details of the sale have been loaded into a warehouse and are available for analysis.
Storing the Data. A variety of architectures can be used to store decision-support data. The
most common architecture is one central enterprise data warehouse, without data marts. Most
organizations use this approach, because the data stored in the warehouse are accessed by all
users and represent the single version of the truth.
Another architecture is independent data marts. This architecture stores data for a single
application or a few applications, such as marketing and finance. Limited thought is given to
how the data might be used for other applications or by other functional areas in the organization. This is a very application-centric approach to storing data.
The independent data mart architecture is not particularly effective. Although it may meet a specific organizational need, it does not reflect an enterprise-wide approach to data management. Instead, the various organizational units create independent data marts. Not only are these marts expensive to build and maintain, but they often contain inconsistent data. For example, they may have inconsistent data definitions such as: What is a customer? Is a particular individual a potential or current customer? They might also use different source systems (which may have different data for the same item, such as a customer address). Although independent data marts are an organizational reality, larger companies have increasingly moved to data warehouses.
Still another data warehouse architecture is the hub and spoke. This architecture contains a central data warehouse that stores the data plus multiple dependent data marts that source their data from the central repository. Because the marts obtain their data from the central repository, the data in these marts still comprise the single version of the truth for decision-support purposes.

The dependent data marts store the data in a format that is appropriate for how the data will be used and for providing faster response times to queries and applications. As you have learned, users can view and analyze data from the perspective of business dimensions and measures. This analysis is intuitive because the dimensions are in business terms, easily understood by users.
Metadata. It is important to maintain data about the data, known as metadata, in the data
warehouse. Both the IT personnel who operate and manage the data warehouse and the users
who access the data need metadata. IT personnel need information about data sources; database, table, and column names; refresh schedules; and data-usage measures. Users’ needs include data definitions, report/query tools, report distribution information, and contact information for the help desk.
Data Quality. The quality of the data in the warehouse must meet users’ needs. If it does not,
users will not trust the data and ultimately will not use it. Most organizations find that the quality of the data in source systems is poor and must be improved before the data can be used in
the data warehouse. Some of the data can be improved with data-cleansing software, but the
better, long-term solution is to improve the quality at the source system level. This approach
requires the business owners of the data to assume responsibility for making any necessary
changes to implement this solution.
To illustrate this point, consider the case of a large hotel chain that wanted to conduct targeted marketing promotions using zip code data it collected from its guests when they checked in. When the company analyzed the zip code data, it discovered that many of the zip codes were 99999. How did this error occur? The answer is that the clerks were not asking customers for their zip codes, but they needed to enter something to complete the registration process. A short-term solution to this problem was to conduct the marketing campaign using city and state data instead of zip codes. The long-term solution was to make certain the clerks entered the actual zip codes. The latter solution required the hotel managers to take the responsibility for making certain their clerks enter the correct data.
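The long-term solution amounts to validating data at the point of entry. A minimal sketch of such a check, assuming five-digit U.S. zip codes and treating 99999 and 00000 as placeholder entries:

```python
import re

def valid_zip(zip_code):
    """Reject anything that is not a five-digit zip code, plus common
    placeholder values clerks type just to complete a form."""
    if not re.fullmatch(r"\d{5}", zip_code):
        return False
    return zip_code not in {"99999", "00000"}

entries = ["30332", "99999", "abc12", "00000", "60614"]
print([z for z in entries if valid_zip(z)])  # ['30332', '60614']
```

A production system would also check the zip against a postal database and, ideally, refuse to complete registration until a plausible value is entered.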
Governance. To ensure that BI is meeting their needs, organizations must implement governance to plan and control their BI activities. Governance requires that people, committees, and processes be in place. Companies that are effective in BI governance often create a senior-level committee comprised of vice presidents and directors who (1) ensure that the business strategies and BI strategies are in alignment, (2) prioritize projects, and (3) allocate resources. These
companies also establish a middle management–level committee that oversees the various
projects in the BI portfolio to ensure that these projects are being completed in accordance
with the company’s objectives. Finally, lower level operational committees perform tasks such
as creating data definitions and identifying and solving data problems. All of these committees
rely on the collaboration and contributions of business users and IT personnel.
Users. Once the data are loaded in a data mart or warehouse, they can be accessed. At this
point the organization begins to obtain business value from BI; all of the prior stages are devoted to building the BI infrastructure.
There are many potential BI users, including IT developers; frontline workers; analysts;
information workers; managers and executives; and suppliers, customers, and regulators.
Some of these users are information producers whose primary role is to create information for
other users. IT developers and analysts typically fall into this category. Other users—including
managers and executives—are information consumers, because they utilize information created by others.
Companies have reported hundreds of successful data-warehousing applications. You
can read client success stories and case studies at the Web sites of vendors such as NCR
Corp. (www.ncr.com) and Oracle (www.oracle.com). For a more detailed discussion, visit the
Data Warehouse Institute (http://tdwi.org). The benefits of data warehousing include the
following:
• End users can access needed data quickly and easily via Web browsers because these data
are located in one place.
• End users can conduct extensive analysis with data in ways that were not previously possible.
• End users can obtain a consolidated view of organizational data.
These benefits can improve business knowledge, provide competitive advantage, enhance
customer service and satisfaction, facilitate decision making, and streamline business
processes.
Despite their many benefits, data warehouses have some limitations. First, they can be
very expensive to build and to maintain. Second, incorporating data from obsolete mainframe
systems can be difficult and expensive. Finally, people in one department might be reluctant to
share data with other departments.
STEP 1: Background

A set of general ingredients is required for organizations to effectively utilize the power of data marts and data warehouses. Figure 3.4 presents this information. Healthcare as an industry has not been centralized for many business, legal, and ethical reasons. However, the overall health implications of a centralized data warehouse are unimaginable.

STEP 2: Activity

Visit http://www.wiley.com/go/rainer/MIS4e/applytheconcept, and read the article in WIRED magazine from March 6, 2014, titled “Gadgets Like Fitbit Are Remaking How Doctors Treat You.” As you read this article, you will see that several key ingredients exist, though no one has built a medical data warehouse as described in the article.

STEP 3: Deliverable

To demonstrate that you recognize the environmental factors necessary to implement and maintain a data warehouse, imagine that the date is exactly five years in the future. Write a newspaper article titled “Data from Gadgets Like Fitbit Remade How Doctors Treated Us.” In your article imagine that all of the ingredients necessary in the environment have come together. Discuss what the environment was like five years ago (today) and how things have evolved to create the right mix of environmental factors.

Be aware that there is no right/wrong answer to this exercise. The objective is for you to recognize the necessary environment for a successful data warehouse implementation. The healthcare-related example simply provides a platform to accomplish this task.

Submit your article to your instructor.
Moreover, industry analysts estimate that most of a company’s knowledge assets are not housed in relational databases. Instead, they are dispersed in e-mail, word-processing documents, spreadsheets, presentations on individual computers, and in people’s heads. This arrangement makes it extremely difficult for companies to access and integrate this knowledge. The result frequently is less-effective decision making.
Knowledge. In the information technology context, knowledge is distinct from data and
information. As you learned in Chapter 1, data are a collection of facts, measurements, and
statistics; information is organized or processed data that are timely and accurate. Knowledge
is information that is contextual, relevant, and useful. Simply put, knowledge is information in
action. Intellectual capital (or intellectual assets) is another term for knowledge.
To illustrate, a bulletin listing all of the courses offered by your university during one
semester would be considered data. When you register, you process the data from the bulletin to create your schedule for the semester. Your schedule would be considered information.
Awareness of your work schedule, your major, your desired social schedule, and characteristics
of different faculty members could be construed as knowledge, because it can affect the way
you build your schedule. You see that this awareness is contextual and relevant (to developing
an optimal schedule of classes) as well as useful (it can lead to changes in your schedule). The
implication is that knowledge has strong experiential and reflective elements that distinguish
it from information in a given context. Unlike information, knowledge can be utilized to solve
a problem.
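The registration example can be restated in code: data are raw facts, information is processed data, and knowledge is context applied to that information. A small Python sketch (the courses, start times, and the "morning work shift" rule are hypothetical):

```python
# Data: raw facts — entries from the course bulletin (hypothetical).
bulletin = [
    {"course": "MIS 301", "starts_at": 8},
    {"course": "MIS 310", "starts_at": 14},
    {"course": "ACC 200", "starts_at": 9},
]

# Information: processed data — the courses the student registered for.
registered = {"MIS 301", "MIS 310"}
schedule = [c for c in bulletin if c["course"] in registered]

# Knowledge: contextual and actionable — awareness of a morning work
# shift leads to rescheduling any class that starts before 10 a.m.
to_reschedule = [c["course"] for c in schedule if c["starts_at"] < 10]
print(to_reschedule)  # ['MIS 301']
```

Only the last step is knowledge in the text's sense: it combines the schedule with context about the student's life to produce an action.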
Numerous theories and models classify different types of knowledge. In the next section,
we will focus on the distinction between explicit knowledge and tacit knowledge.
Explicit and Tacit Knowledge. Explicit knowledge is objective, rational, and technical. In an organization, explicit knowledge consists of the policies, procedural guides, reports, products, strategies, goals, core competencies, and IT infrastructure
of the enterprise. In other words, explicit knowledge is the knowledge that has been codified
(documented) in a form that can be distributed to others or transformed into a process or a
strategy. A description of how to process a job application that is documented in a firm’s human
resources policy manual is an example of explicit knowledge.
In contrast, tacit knowledge is the cumulative store of subjective or experiential learning. In an organization, tacit knowledge consists of an organization’s experiences, insights, expertise, know-how, trade secrets, skill sets, understanding, and learning. It also includes the organizational culture, which reflects the past and present experiences of the organization’s people and processes, as well as the organization’s prevailing values. Tacit knowledge is generally imprecise and costly to transfer. It is also highly personal. Finally, because it is unstructured, it is difficult to formalize or codify, in contrast to explicit knowledge. A salesperson who has worked with particular customers over time and has come to know their needs
quite well would possess extensive tacit knowledge. This knowledge is typically not recorded.
In fact, it might be difficult for the salesperson to put into writing, even if he or she were willing
to share it.
Knowledge Management Systems. Organizations now realize they need to integrate explicit and tacit knowledge into formal information systems.
Knowledge management systems (KMSs) refer to the use of modern information technologies—the Internet, intranets, extranets, databases—to systematize, enhance, and expedite intrafirm and interfirm knowledge management. KMSs are intended to help an organization
cope with turnover, rapid change, and downsizing by making the expertise of the organization’s
human capital widely accessible. IT’s About Business 3.4 illustrates how Performance Bicycle
implemented the Learning Center, a knowledge management system.
Organizations can realize many benefits with KMSs. Most importantly, they make best
practices—the most effective and efficient ways of doing things—readily available to a wide
range of employees. Enhanced access to best-practice knowledge improves overall organizational performance. For example, account managers can now make available their tacit
knowledge about how best to manage large accounts. The organization can then utilize this
knowledge when it trains new account managers. Other benefits include improved customer
service, more efficient product development, and improved employee morale and retention.
The KMS Cycle. A functioning KMS follows a cycle that consists of six steps:
1. Create knowledge. Knowledge is created as people determine new ways of doing things or
develop know-how. Sometimes external knowledge is brought in.
2. Capture knowledge. New knowledge must be identified as valuable and be represented in
a reasonable way.
3. Refine knowledge. New knowledge must be placed in context so that it is actionable. This
is where tacit qualities (human insights) must be captured along with explicit facts.
4. Store knowledge. Useful knowledge must then be stored in a reasonable format in a knowledge repository so that other people in the organization can access it.
5. Manage knowledge. Like a library, the knowledge must be kept current. It must be reviewed regularly to verify that it is relevant and accurate.
6. Disseminate knowledge. Knowledge must be made available in a useful format to anyone
in the organization who needs it, anywhere and anytime.
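The six steps above can be sketched as operations on a shared repository. A minimal Python illustration (the class, method names, and sample entry are invented for this sketch; a real KMS is far richer):

```python
# A toy model of the KMS cycle: store covers steps 1-4 (create,
# capture, refine, store), review covers step 5 (manage), and
# retrieve covers step 6 (disseminate).
class KnowledgeRepository:
    def __init__(self):
        self.entries = {}

    def store(self, topic, content, context):
        """Keep new knowledge together with the context that makes
        it actionable, then place it in the repository."""
        self.entries[topic] = {"content": content, "context": context,
                               "current": True}

    def review(self, topic, still_accurate):
        """Manage knowledge: flag stale entries during periodic review."""
        self.entries[topic]["current"] = still_accurate

    def retrieve(self, topic):
        """Disseminate knowledge: only current entries are shared."""
        entry = self.entries.get(topic)
        return entry if entry and entry["current"] else None

repo = KnowledgeRepository()
repo.store("large accounts", "quarterly on-site visits", "sales")
print(repo.retrieve("large accounts") is not None)  # True
repo.review("large accounts", still_accurate=False)
print(repo.retrieve("large accounts"))  # None
```

The point of the sketch is the cycle itself: knowledge that is stored but never reviewed stops being disseminated, which mirrors the "like a library" requirement in step 5.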