BUSINESS ANALYTICS, VOLUME II
A Data-Driven Decision-Making Approach for Business
Amar Sahay
Mark Ferguson, Editor
Being an expert on quality management and Six Sigma, Dr. Sahay also
incorporated quality tools into the analytics process, something that is
rare, but in my opinion extremely important and helpful. Moreover, his
treatment of the tools for predictive analytics not only explains the tools,
but goes a step further in clarifying when each should be used and how
the tools fit together. Such clarification is often presented in tabular form,
which makes it easy to refer back to whenever the information is needed.
Business Expert Press Big Data, Business Analytics, and Smart Technology
Collection
Keywords
analytics; business analytics; business intelligence; data analysis; decision
making; descriptive analytics; predictive analytics; prescriptive analytics;
statistical analysis; quantitative techniques; data mining; predictive mod-
eling; regression analysis; modeling; time series forecasting; optimization;
simulation; machine learning; neural networks; artificial intelligence
Contents
Preface...................................................................................................xi
Acknowledgments.................................................................................xvii
Chapter 1 Business Analytics at a Glance............................................1
Chapter 2 Business Analytics and Business Intelligence.....................23
Chapter 3 Analytics, Business Analytics, Data Analytics, and
How They Fit into the Broad Umbrella of Business
Intelligence......................................................................33
Chapter 4 Descriptive Analytics—Overview, Applications,
and a Case........................................................................57
Chapter 5 Descriptive versus Predictive Analytics.............................71
Chapter 6 Key Predictive Analytics Models (Predicting Future
Business Outcomes Using Analytic Models).....................83
Chapter 7 Regression Analysis and Modeling..................................103
Chapter 8 Time Series Analysis and Forecasting..............................195
Chapter 9 Data Mining: Tools and Applications in Predictive
Analytics........................................................................239
Chapter 10 Wrap-Up, Overview, Notes on Implementation, and
Current State of Business Analytics................................263
Appendices..........................................................................................281
Additional Readings............................................................................ 373
About the Author.................................................................................377
Index..................................................................................................379
Preface
This book deals with business analytics (BA)—an emerging area in mod-
ern business decision making.
BA tools are used to visualize and explore patterns and trends in the data and to predict future business outcomes with the help of forecasting and predictive modeling.
In this age of technology, companies collect massive amounts of data.
Successful companies view their data as an asset and use them to gain
a competitive advantage. These companies use BA tools as an organiza-
tional commitment to data-driven decision making. BA helps businesses
in making informed business decisions. It is also critical in automating
and optimizing business processes.
BA makes extensive use of data, statistical analysis, mathematical and
statistical modeling, and data mining to explore, investigate, and understand
the business performance. Through data, BA helps to gain insight and drive
business planning and decisions. The tools of BA focus on understanding
business performance based on the data. It uses a number of models derived
from statistics, management science, and operations research areas.
The BA area can be divided into different categories depending upon
the types of analytics and tools being used. The major categories of BA are:
• Descriptive analytics
• Predictive analytics
• Prescriptive analytics
Each of the above categories uses different tools, and the use of these analytics depends on the type of business and the operations a company is involved in. For example, an organization may use only descriptive analytics tools, whereas another company may use a combination of descriptive and predictive modeling and analytics to predict future business performance and drive business decisions.
The different types of analytics and the tools they use are described in the chapters that follow.
The analytics tools come under the broad area of Business Intelligence
(BI) that incorporates Business Analytics (BA), data analytics, and
advanced analytics. All these areas come under the umbrella of BI and
use a number of visual and mathematical models.
Modeling is one of the most important parts of BA. Models are of
different types. An understanding of different types of models is critical
in selecting and applying the right model or models to solve business
problems. The widely used models are: (a) graphical models, (b) quantita-
tive models, (c) algebraic models, (d) spreadsheet models, and (e) other
analytic tools.
Most of the tools in descriptive, predictive, and prescriptive analytics are described using one type of model or another, usually graphical, mathematical, or computer models. Besides these, simulation and a number of other mathematical models are used in analytics.
BA is a vast area. It is not possible to provide a complete and in-depth treatment of all the BA topics in one concise book; therefore, the book is divided into two volumes. Volume I deals with descriptive analytics; this second volume covers predictive analytics topics, which are the focus of this text. The specific topics covered in this second volume are outlined in the chapters that follow.
CHAPTER 1
Business Analytics at a Glance
Chapter Highlights
• Introduction to Business Analytics—What Is It?
• Analytics and Business Analytics
• Business Analytics and Its Importance in Modern Business Decisions
• Types of Business Analytics
◦ Tools of Business Analytics
• Predictive Analytics
◦ Most Widely Used Predictive Analytics Models
The major categories of BA are:

• Descriptive analytics
• Predictive analytics
• Prescriptive analytics
Each of the above-mentioned categories uses different tools, and the
use of these analytics depends on the type of business and the operations
a company is involved in. For example, one organization may use only
descriptive analytics tools or a combination of descriptive and predictive
modeling and analytics to predict future business performance to drive
business decisions. Other companies may use prescriptive analytics to op-
timize business processes.
The tools of descriptive analytics include the commonly used graphs and charts along with some newly developed graphical tools such as bullet graphs, tree maps, and data dashboards. Dashboards are now becoming very popular with big data; they are used to display multiple views of the business data graphically.
The other aspect of descriptive analytics is an understanding of numerical methods, including the measures of central tendency, measures of position, measures of variation, and measures of shape, and how different measures and statistics are used to draw conclusions and make decisions from the data. Some other topics of interest are the empirical rule and the relationship between two variables, measured by the covariance and the correlation coefficient. The tools of descriptive analytics are helpful in understanding the data, identifying trends or patterns in the data, and making sense of the data contained in the databases of companies. An understanding of databases, data warehouses, web search and query, and big data concepts is important in extracting and applying descriptive analytics tools. A number of statistical software packages are used for statistical analysis; widely used ones are SAS, MINITAB, and R, a programming language for statistical computing. Volume I of this book is about descriptive analytics and presents a number of applications and a detailed case to explain and implement them.
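As a small illustration of these numerical measures, the Python sketch below computes the common descriptive statistics for a hypothetical sales series and its relationship to a second series; all data values are invented for illustration only.

    import numpy as np
    import pandas as pd

    # Hypothetical daily sales figures; any numeric column from a
    # company database could be substituted here.
    sales = pd.Series([210, 225, 198, 240, 265, 232, 251, 219, 244, 228])

    print(sales.mean())                        # central tendency
    print(sales.median())
    print(sales.var(), sales.std())            # measures of variation
    print(sales.quantile([0.25, 0.5, 0.75]))   # measures of position

    # Empirical rule check: share of observations within one standard
    # deviation of the mean.
    within_1sd = sales.between(sales.mean() - sales.std(),
                               sales.mean() + sales.std()).mean()
    print(within_1sd)

    # Covariance and correlation between two variables
    # (e.g., sales versus advertising spend).
    ad_spend = pd.Series([20, 24, 18, 26, 30, 25, 27, 21, 26, 23])
    print(sales.cov(ad_spend), sales.corr(ad_spend))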
Tools of descriptive analytics: Figure 1.1 outlines the tools and
methods used in descriptive analytics. These tools are explained in subse-
quent chapters.
Predictive Analytics
Machine learning and data mining are similar in some ways and often
overlap in applications. Machine learning is used for prediction based on
known properties learned from the training data, whereas data mining
algorithms are used for discovery of (previously) unknown patterns. Data
mining is concerned with knowledge discovery in databases (or KDD).
Data mining uses many machine learning methods. On the other
hand, machine learning also employs data mining methods as “unsuper-
vised learning” or as a preprocessing step to improve learner accuracy.
The goals are somewhat different. The performance of machine learning is usually evaluated with respect to its ability to reproduce known knowledge, whereas data mining is evaluated on its ability to discover previously unknown knowledge.

Machine learning tasks are typically classified into the following three broad categories, depending on the nature of the learning "signal" or "feedback" available to a learning system [20]:

• Supervised learning: the algorithm is trained on example inputs paired with known outputs and learns a general rule that maps inputs to outputs.
• Unsupervised learning: no labels are given; the algorithm must find structure, such as clusters, in its input on its own.
• Reinforcement learning: the algorithm interacts with a dynamic environment and learns from feedback in the form of rewards or penalties.
Deep learning uses multilayered neural networks inspired, in part, by the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.
Note: Neural networks use machine learning algorithms extensively, whereas machine learn-
ing is an application of artificial intelligence that automates analytical model building by
using algorithms that iteratively learn from data without being explicitly programmed [1].
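As a minimal sketch of this idea (learning known properties from training data, then predicting for new cases), here is a short supervised-learning example using scikit-learn; the features, labels, and product-return task are assumptions made for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical training data: each row is (order total, number of
    # items); label 1 = customer returned the product, 0 = kept it.
    X_train = np.array([[200, 5], [40, 1], [270, 7], [180, 4],
                        [60, 2], [250, 6], [90, 2], [220, 5]])
    y_train = np.array([1, 0, 1, 0, 0, 1, 0, 1])

    # Learn known properties from the training data...
    model = LogisticRegression()
    model.fit(X_train, y_train)

    # ...then predict for new, unseen cases.
    X_new = np.array([[230, 6], [55, 1]])
    print(model.predict(X_new))         # predicted class labels
    print(model.predict_proba(X_new))   # predicted probabilities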
The tools of operations management can be divided into mainly three areas: (a) planning, (b) analysis, and (c) control tools. The analysis part is the prescriptive part that uses operations research, management science, and simulation. The control part is used to monitor and control product and service quality. The prescriptive analytics models are shown in Figure 1.5.
Figure 1.6 outlines the tools of descriptive, predictive, and prescriptive analytics together. This flow chart is helpful in outlining the differences and details of the tools for each type of analytics, and it shows the vast areas of business analytics (BA) that come under the umbrella of business intelligence (BI).
Types of Models
(i) Graphical models, (ii) quantitative models, (iii) algebraic models,
(iv) spreadsheet models, (v) simulation models, (vi) process optimization
models, and (vii) other—predictive and prescriptive models.
The first volume of this book provided the details of descriptive analytics and outlined the tools of predictive and prescriptive analytics. Predictive analytics is about predicting future business outcomes. This second volume is about predictive modeling; it provides the background and the models used in predictive modeling, with applications and cases. We have explained the distinction between descriptive, predictive, and prescriptive analytics. Prescriptive analytics is about optimizing certain business activities. A complete treatment of the topics used in predictive and prescriptive analytics is not possible in one brief volume; therefore, this Volume II focuses on predictive modeling.
Summary
Business analytics (BA) uses data, statistical analysis, mathematical and
statistical modeling, data mining, and advanced analytics tools, includ-
ing forecasting and simulation, to explore, investigate, and understand
the business performance. Through data, BA helps to gain insight and
drive business planning and decisions. The tools of BA focus on under-
standing business performance based on the data and a number of mod-
els derived from statistics, management science, and different types of
analytics tools.
BA helps companies to make informed business decisions and can
be used to automate and optimize business processes. Data-driven com-
panies treat their data as a corporate asset and leverage it for competi-
tive advantage. Successful business analytics depends on data quality and
skilled analysts who understand the technologies. BA is an organizational
commitment to data-driven decision making.
This chapter provided an overview of the field of BA. The tools
of BA, including the descriptive, predictive, and prescriptive analyt-
ics along with advanced analytics tools were discussed. This chapter
also introduced a number of terms related to and used in conjunction
with BA. Flow diagrams outlining the tools of each of the descriptive,
predictive, and prescriptive analytics were presented. This second volume of the business analytics book is a continuation of the first volume. A preview of this second volume, entitled Business Analytics: A Data-Driven Decision-Making Approach for Business, Volume II, was provided in this chapter.
Gartner was credited with the three “Vs” of big data. Gartner’s definition of big
data is as follows: high-volume, high-velocity, and high-variety information assets
that demand cost-effective, innovative forms of information processing that en-
able enhanced insight, decision making, and process automation.
Gartner is referring to the size of data (large volume), speed with which the
data is being generated (velocity), and the different types of data (variety), and
this seemed to align with the combined definition of Wikipedia and O’Reilly
media.
Mike Gualtieri of Forrester said that the three "Vs" mentioned by Gartner are just measures of data. He insisted that the following definition is more actionable:
Big data is the frontier of a firm's ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.
Algorithm A mathematical formula or statistical process used to analyze data.
Analytics Involves drawing insights from data, including big data. Analytics uses simple to advanced tools depending upon the objectives. It may involve visual display of data (charts and graphs), descriptive statistics, making predictions, forecasting future outcomes, or optimizing business processes. A more recent term is big data analytics, which involves making inferences using very large sets of data. Thus, analytics can take different forms depending on the objectives and the decisions to be made: descriptive, predictive, or prescriptive analytics. These are briefly described here.
Descriptive Analytics If you are using charts and graphs or time series plots to study the demand or sales patterns, or the trend of the stock market, you are using descriptive analytics. Also, calculating statistics from the data, such as the mean, variance, median, or percentiles, are all examples of descriptive analytics. Some recent software packages are designed to create dashboards that are useful in analyzing business outcomes; dashboards are examples of descriptive analytics. Of course, a lot more detail can be extracted from the data by plotting it and performing simple analyses.
Predictive Analytics As the name suggests, predictive analytics is about predicting future business outcomes. It also involves forecasting demand, sales, and
profits for a company. The commonly used techniques for predictive analytics are
different types of regression and forecasting models. Some advanced techniques
are data mining, machine learning, neural networks, and advanced statistical
models. We will discuss the regression and forecasting techniques as well as the
related terms later in this book.
Prescriptive Analytics Prescriptive analytics involves analyzing the results of predictive analytics and "prescribing" the best category to target in order to minimize or maximize the objective(s). It builds on predictive analytics and often suggests the best course of action, leading to the best possible solution. It is about optimizing (maximizing or minimizing) an objective function. The tools of prescriptive analytics are now used with big data to make data-driven decisions by selecting the best course of action involving multicriteria decision variables. Some examples of prescriptive analytics models are linear and nonlinear optimization models and different types of simulations.
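As one concrete illustration of a linear optimization model, the Python sketch below solves a small, hypothetical product-mix problem with SciPy's linprog; the profit coefficients and constraints are invented for illustration.

    from scipy.optimize import linprog

    # Hypothetical product-mix problem: maximize profit 40*x1 + 30*x2
    # subject to labor and material constraints. linprog minimizes, so
    # the objective is negated.
    c = [-40, -30]
    A_ub = [[1, 2],   # labor hours:    x1 + 2*x2 <= 40
            [3, 1]]   # material units: 3*x1 + x2 <= 45
    b_ub = [40, 45]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
    print(res.x)      # optimal production quantities (10, 15)
    print(-res.fun)   # maximum profit (850)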
Data Mining Data mining involves finding meaningful patterns and deriving
insights from large data sets. It is closely related to analytics. Data mining uses
statistics, machine learning, and artificial intelligence techniques to derive mean-
ingful patterns.
Analytical Models The most commonly used models that are parts of descrip-
tive, predictive, or prescriptive analytics are graphical models, quantitative mod-
els, algebraic models, spreadsheet models, simulation models, process models,
and other analytic models—predictive and prescriptive models.
IoT Stands for the Internet of Things: the interconnection of computing devices embedded in everyday objects (sensors, cars, fridges, etc.) via the Internet, with capabilities of sending and receiving data. The devices in the IoT generate huge amounts of data, providing opportunities for big data applications and data analytics.
Machine Learning Machine learning is a method of designing systems that can
learn, adjust, and improve based on the data fed to them. Machine learning works
based on predictive and statistical algorithms that are provided to these machines.
The algorithms are designed to learn and improve as more data flow through the
system. Fraud detection, e-mail spam, and GPS systems are some examples of
machine learning applications.
R “R” is a programming language for statistical computing. It is one of the popu-
lar languages in data science.
Structured vs. Unstructured Data These relate to the "volume" and "variety" Vs of big data. Structured data is data that can be stored in relational databases; this type of data can be analyzed and organized in such a way that it can be related to other data via tables. Unstructured data cannot be directly put into databases or analyzed or organized directly. Some examples are e-mail and text messages, social media posts, and recorded human speech.
CHAPTER 2
Business Analytics and Business Intelligence

Chapter Highlights
• Business Analytics and Business Intelligence—Overview
• Types of Business Analytics and Their Objectives
• Input to Business Analytics, Types of Business Analytics, and
Their Purpose
• Business Intelligence and Business Analytics: Differences
• Business Intelligence and Business Analytics: A Comparison
• Summary
BA uses a number of models derived from statistics, management science, and operations research. All these tools help businesses in making informed business decisions. The analytics tools are also critical in automating and optimizing business processes.
The types of analytics are divided into different categories. According to the Institute for Operations Research and the Management Sciences (INFORMS) (www.informs.org), the field of analytics is divided into
three broad categories: descriptive, predictive, and prescriptive. We dis-
cussed each of the three categories along with the tools used in each one.
The tools used in analytics may overlap and the use of one or the other
type of analytics depends on the applications. A firm may use only the
descriptive analytics tools or a combination of descriptive and predictive
analytics depending upon the types of applications, analyses, and deci-
sions they encounter.
Figure 2.4 Comparing business intelligence (BI) and business analytics (BA)
BI tools report what has happened and what is happening in the business. The information about what went wrong or what is happening in the business provides opportunities for improvement.
BI may be seen as the descriptive part of data analysis, but when combined with the other areas of analytics—predictive, advanced, and data analytics—it provides a powerful combination of tools. These tools enable analysts and data scientists to look into the business data and the current state of the business, and to use predictive, prescriptive, and data analytics tools, as well as the powerful tools of data mining, to guide an organization in business planning, predict future outcomes, and make effective data-driven decisions.
The flow chart in Figure 2.4 also outlines the purpose of a BA program and briefly mentions the tools and the objectives of BA. The different types of analytics and their tools were discussed earlier and are shown in Table 2.2.
The terms business analytics (BA) and business intelligence (BI) are used interchangeably, and often the tools are combined and referred to as a business analytics or business intelligence program. Figure 2.5 shows the tools of BI and BA. Note that the tools overlap in the two areas; some of these tools are common to both.

Figure 2.5 Business intelligence (BI) and business analytics (BA) tools
Summary
This chapter provided an overview of business analytics (BA) and business intelligence (BI) and outlined the similarities and differences between them. BA, the different types of analytics—descriptive, predictive, and prescriptive—and the overall analytics process were explained using a flow diagram. The input to the analytics process and the types of questions each type of analytics attempts to answer, along with their tools, were discussed in detail. The chapter also discussed BI and compared BA with BI. The different tools used in each type of analytics—descriptive, predictive, and prescriptive—and their relationships were described. The tools of analytics overlap in applications, and in many cases a combination of these tools is used. The interconnection between the different types of analytics tools was explained. Finally, a comparison between BI and BA was presented. BA, data analytics, and advanced analytics fall under the broad area of BI. The broad scope of BI and the distinction between the BI and BA tools were outlined.
CHAPTER 3
Analytics, Business
Analytics, Data Analytics,
and How They Fit into the
Broad Umbrella of Business
Intelligence
Chapter Highlights
• Introduction: Analytics, Business Analytics, and Data Analytics
◦ Analytics
◦ Business Analytics
• Business Intelligence—Defined
• Origin of Business Intelligence
• How Does Business Intelligence Fit into Overall Analytics?
• Business Intelligence and Support Systems
• Applications of Business Intelligence
• Tools of Business Intelligence
• BI Functions and Applications Explained
◦ Reporting
◦ Process Mining
◦ Web Analytics
◦ Financial Analytics
• Advanced Analytics
• BI Programs in Companies
• Specific Areas of BI Applications in an Enterprise
• Success Factors for BI Applications
• Comparing BI with BA
• Difference between BA and BI
• Glossary of Terms Related to Business Intelligence
• Summary
Analytics
Analytics, and business analytics (BA) in particular, goes beyond simply presenting data, creating visuals, crunching numbers, and computing statistics. The essence of analytics lies in the application: making sense of the data using prescribed statistical methods, tools, and logic to draw meaningful conclusions from the data. It uses logic, learning, intelligence, and mental models that enable us to reason, organize, analyze, solve problems, understand the data, learn, and make data-driven decisions.
Business Analytics
Business analytics (BA) covers a vast area. It is a complex field that en-
compasses visualization, statistics, statistical analysis, and modeling. It
uses descriptive, predictive, and prescriptive analytics, including text and
speech analytics, web analytics, decision processes, and much more.
Before the data can be used effectively for analysis, the following data
preparation steps are essential:
1. Data cleansing
2. Scripting
3. Data transformation
4. Data warehousing
The quality of the prepared data is judged on dimensions such as accuracy, completeness, update status, relevance, consistency across data sources, reliability, appropriate presentation, and accessibility.
Business Intelligence—Defined
According to David Loshin, business intelligence (BI) is “…the processes,
technologies and tools needed to turn data into information, information into
knowledge, and knowledge into plans that drive profitable business actions.”
According to Larissa Moss, BI is "… an architecture and a collection of
integrated operational as well as decision-support applications and databases
that provide the business community easy access to business data.”
BI is a technology-driven process for processing and analyzing data to
make sense from huge quantities of data that businesses collect and obtain
from various sources. In a broad sense, BI is both visualization and ana-
lytics. The purpose of visualization or graphic presentation of data is to
obtain meaningful and useful information to help management, business
managers, and other end-users make more-informed business decisions.
BI uses a wide variety of tools, applications, and methodologies that en-
able organizations to collect data from internal systems and processes as
well as external sources. The collected data may be both structured and
unstructured. The first challenge is to prepare the data to run queries,
perform analysis, and create reports.
One of the major tasks is to create dashboards and other forms of data visualization and make the analysis results available to corporate decision makers as well as the managers and others involved in the decision-making process (http://searchbusinessanalytics.techtarget.com/definition/business-intelligence-BI) [9, 10].
With today's technology and computing power, visuals and data dashboards are commonly used in business reporting.
BI tools, technologies, and technical architectures are used in the collection, analysis, presentation, and dissemination of business information. The analysis of business data provides historical as well as current and future views of the business performance. Specialized data analysis software is now available that is capable of processing and analyzing big data. It can create multiple views of the business performance in the form of dashboards, which are extremely helpful in displaying current business performance. Big data software is now being used to analyze vast amounts of data and is extremely helpful in the decision-making process. Besides data visualization, a number of the models described earlier are used to predict and optimize future business outcomes.
Data Mining
Data mining involves exploring new patterns and relationships in the collected data. Data mining is a part of predictive analytics. It involves processing and analyzing huge amounts of data to extract useful information and patterns hidden in the data. The overall goal of data mining is knowledge discovery from the huge amounts of data businesses collect. Data mining techniques are used in (i) extracting previously unknown and potentially useful knowledge or patterns from massive amounts of collected and stored data, and (ii) exploring and analyzing these large quantities of data to discover meaningful patterns and transform the data into an understandable structure for further use. The field of data mining is growing rapidly, and statistics plays a major role in it. Data mining is also known as knowledge discovery in databases (KDD), pattern analysis, information harvesting, BI, BA, and so on. Besides statistics, data mining uses artificial intelligence, machine learning, database systems, advanced statistical tools, and pattern recognition.
Text Mining
Text mining involves structuring the input text, deriving patterns within the structured data, and finally evaluating and interpreting the output. "High quality" in text mining usually refers to some combination of relevance (how well a retrieved document or set of documents meets the information need of the user).
Typical text mining tasks include text categorization, text clustering [1],
concept/entity extraction, production of granular taxonomies, sentiment
analysis, document summarization, and entity relation modeling (i.e.,
learning relations between named entities).
Text analysis involves information retrieval, lexical analysis to study
word frequency distributions, pattern recognition, information extraction,
data mining techniques including link and association analysis, visualiza-
tion, and predictive analytics. The overall goal is to transform text into data
for analysis using natural language processing (NLP) [2] and analytical
methods.
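As a tiny, hypothetical illustration of text categorization and sentiment analysis, the Python sketch below structures raw text into a numeric matrix and trains a classifier with scikit-learn; the documents and labels are invented.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Hypothetical labeled documents for a text-categorization task.
    docs = ["great product fast shipping",
            "terrible service never again",
            "love it works perfectly",
            "broken on arrival very poor"]
    labels = ["positive", "negative", "positive", "negative"]

    # Structure the raw text into a numeric term matrix...
    vec = TfidfVectorizer()
    X = vec.fit_transform(docs)

    # ...then learn a classifier and score a new document.
    clf = MultinomialNB().fit(X, labels)
    print(clf.predict(vec.transform(["poor product arrived broken"])))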
A typical application is to scan a set of documents written in a natural
language. It is also known as ordinary language—any language that has
evolved naturally in humans through use and repetition without con-
scious planning or premeditation. Natural languages can take different
forms, such as speech or signing (sign language). They are distinguished
from constructed and formal languages such as those used to program
computers or to study logic [17].
Text Analytics
The term text analytics describes a set of linguistic (linguistics is the scientific study of language [1]), statistical, and machine learning techniques that model and structure the information content of textual sources. The term is synonymous with text mining; Ronen Feldman modified a 2000 description of "text mining" [4] in 2004 to describe "text analytics" [5]. The latter term is now used more frequently in business settings.
The term text analytics also describes the application of text analytics to respond to business problems, whether independently or in conjunction with query and analysis of fielded, numerical data. In general, approximately 80 percent of business-relevant information originates in unstructured form, primarily text [7]. The techniques of text analytics can draw on external data as well as sources internal to the business, such as financial and operations data (internal data). When combined, external and internal data can provide a more complete picture, which, in effect, creates an "intelligence" that cannot be derived from any singular set of data [3].
BI, along with BA, empowers organizations to gain a better understanding of existing markets and customer behavior. The tools of BI are being used to study markets, analyze massive amounts of data to learn about customer behavior, conduct risk analysis, assess the demand and suitability of products and services for different market segments, and predict and optimize business processes, to name a few applications [10–12].
Summary
This chapter discussed analytics, business analytics (BA), data analytics (DA), and business intelligence (BI) as decision-making tools in modern business.
CHAPTER 4
Descriptive Analytics—Overview, Applications, and a Case
Chapter Highlights
• Overview: Descriptive Analytics
• Descriptive Analytics—Applications—A Business
Analytics Case
• Case Study: Buying Pattern of Online Customers in a Large
Department Store
• Summary
The case concerns a large department store with customers placing orders online. As the orders are placed, customer information is recorded in a database. Data on several categorical and numerical variables are recorded. The categorical variables shown in the data file are the day of the week, time (morning, midday, etc.), payment type (credit or debit card, etc.), region of the country the order was placed from, order volume, sale or promotion item, free shipping offer, gender, and customer survey rating. The quantitative variables include the order quantity and the dollar value of the order placed, or "Total Order." Table 4.1 shows part of the data.
The operations manager of the store wants to understand the buying pattern of the customers by summarizing and displaying the data visually and numerically. He believes that descriptive analytics tools, including data visualization tools, numerical methods, graphical displays, dashboards, and tables of the collected data, can be used to gain more insight into the online order process. They will also reveal opportunities for improving the process.
The manager hired an intern and gave her the responsibility to pre-
pare a descriptive analytics summary of the customer data using graphical
and numerical tools that can help understand the buying pattern of the
customers and help improve the online order process to attract more on-
line customers to the store.
The intern was familiar with one of the tools available in EXCEL, the Pivot Table/Pivot Chart, which she thought could be used to extract information from a large database. In this case, pivot tables can help break the data down by categories so that useful insight can be obtained. For example, this tool can create a table of orders received by geographical region or summarize the orders by day of the week or time of day.
She performed analyses on the data to answer the questions and concerns
the manager expressed in the meeting. As part of the analysis, the follow-
ing graphs, tables, and numerical analyses were performed.
1. A pivot table, a bar chart, and a pie chart of the pivot table providing
a summary of number of orders received on each day of the week were
created to visually see the orders received by the online department
on each day (Figures 4.2 and 4.3). The table and graphs show that the
maximum number of orders were received on Saturday and Sunday.
Table 4.1 Partial data: online orders (buying pattern of online customers in a large department store)

Day  Time  Payment Type  Region  Order Volume  Quantity  Sale/Promotion  Free Shipping  Total Order  Gender  Survey Rating
Mon Morning Visa North High 6 1 Yes 194.12 Male Good
Mon Morning Visa North Low 2 1 No 40.38 Male Good
Mon Morning Visa North High 7 1 Yes 270.87 Female Fair
Mon Morning Visa North Medium 4 0 No 186.88 Male Excellent
Tues Morning Visa North High 6 0 Yes 279.52 Female Good
Tues Morning Visa North High 2 1 Yes 220.30 Female Fair
Tues Morning Visa North High 7 1 No 279.57 Female Excellent
Tues Morning Visa North Medium 5 0 Yes 160.70 Male Poor
Tues Midday Visa North Medium 4 1 Yes 184.96 Male Good
Tues Midday Visa North High 8 1 Yes 205.39 Male Good
Wed Midday MasterCard North High 7 1 Yes 272.88 Male Excellent
Mon Midday Store Card North Medium 5 1 Yes 191.83 Male Excellent
Mon Midday MasterCard North High 7 1 Yes 288.94 Male Excellent
Tues Midday Store Card North High 3 0 Yes 270.75 Male Fair
Tues Midday MasterCard North High 8 0 Yes 275.27 Male Poor
Wed Afternoon Store Card North Medium 4 1 Yes 174.58 Male Good
Wed Afternoon MasterCard South Medium 4 0 Yes 152.30 Male Good
Thurs Afternoon Store Card South Medium 5 1 No 172.39 Male Fair
Wed Afternoon MasterCard South High 7 1 Yes 215.69 Male Excellent
Wed Afternoon Store Card South Low 3 0 No 80.89 Male Excellent
Thurs Afternoon MasterCard South High 8 0 Yes 184.19 Male Good
Fri Afternoon Store Card South Medium 4 1 Yes 181.28 Male Good
Fri Afternoon MasterCard South Medium 4 1 Yes 158.96 Male Poor
Fri Afternoon Store Card South Medium 4 1 Yes 198.28 Male Poor
2. Table 4.2 and Figure 4.4 show the count of orders by the time of day (morning, midday, etc.). A bar chart and a pie chart of the pivot table were created to visually see the orders received online by the time of day. The pie chart shows both the number and the percentage for each category. The table and the pie chart indicate that more orders are placed during night hours.
3. Orders by the region: The bar chart and the pie chart (Figures 4.5 and
4.6) summarize the number of orders by the region. These plots show
that the maximum orders were received from the North and South
regions. Marketing efforts are needed to target the other regions.
4. A pivot table (Table 4.3) and a bar graph (Figure 4.7) were created to summarize the customer ratings by gender, where the row labels show "Gender" and the column labels show the count of "Customer Survey Ratings" (excellent, good, fair, poor). A bar chart with the count of ratings on the y-axis and gender on the x-axis is shown below the table. This information captured customer opinion and was important for reviewing and improving the process.
5. The descriptive statistics of the “total orders ($)” was calculated and
displayed in Table 4.4 and the plot below. The statistics show the
measures of central tendency and the measures of variation along
with other useful statistics of the total orders.
Table 4.4 Descriptive statistics of Total Order ($)

Variable         N    N*  Mean    SE Mean  StDev  Minimum  Q1      Median  Q3      Maximum
Total order ($)  500  0   223.87  3.77     84.23  30.09    167.95  252.62  287.54  371.40
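The same summaries can be reproduced programmatically. A minimal pandas sketch follows; the file name online_orders.csv and the exact column names are assumptions chosen to mirror Table 4.1.

    import pandas as pd

    # Assumes the online-orders data of Table 4.1 is available as a CSV
    # file with the column names shown in that table.
    df = pd.read_csv("online_orders.csv")

    # Orders by day of the week (compare Figure 4.2).
    print(df.pivot_table(index="Day", values="Total Order",
                         aggfunc="count"))

    # Customer survey rating counts by gender (compare Table 4.3).
    print(pd.crosstab(df["Gender"], df["Survey Rating"]))

    # Descriptive statistics of the order value (compare Table 4.4).
    print(df["Total Order"].describe())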
The tools of descriptive analytics help analysts gain insight and learn from the data; they help us understand what has happened in the past. Questions about what will happen next are the province of predictive analytics tools, and the rest of the book explores predictive analytics tools and applications.
Summary
In this chapter, we provided a brief description of descriptive analytics and a case to illustrate the tools and applications of visual techniques used in descriptive analytics. Descriptive analytics is critical for studying the current state of the business and learning what has happened in the past using the company's data. The knowledge from descriptive analytics lays a foundation for further analysis and leads to predictive analytics. As mentioned, the knowledge obtained by descriptive analytics helps us learn what has happened in the past; this information is used to create predictive analytics models.
The subsequent chapters discuss predictive analytics and the background information needed for it, along with the analytical tools. Specific predictive analytics models and their applications are the topics of the chapters that follow. The rest of this book covers mostly predictive analytics.
CHAPTER 5
Descriptive versus Predictive Analytics

Chapter Highlights
• What Is Predictive Analytics and How Is It Different from
Descriptive Analytics?
• Exploring the Relationships between the Variables—Qualitative
Tools
• An Example of Logic-Driven Model—Cause-and-Effect
Diagram
• Data-Driven Predictive Models and Their Applications—
Quantitative Models
• Prerequisites and Background for Predictive Analytics
• Summary
Descriptive analytics involves visualizing, summarizing, and explaining data to reveal trends and patterns and to obtain information not apparent otherwise. The objective is to obtain useful information that can help organizations achieve their goals. Predictive analytics is about identifying future business trends and creating and describing predictive models to explore those trends and relationships. While descriptive analytics tools are useful in visualizing some of the trends and relationships among the variables, predictive analytics provides information on what types of predictive models can be used to predict future business outcomes.
Table 5.1 outlines the statistical tools, their brief description, and ap-
plication areas of predictive analytics models.
The next chapter discusses the details of the above data-driven predic-
tive models with applications.
Table 5.1 Statistical models and prerequisites for predictive modeling

Probability concepts
Brief description: One of the main reasons for applying statistics in analytics is that statistics allows us to draw conclusions using limited data; that is, we can draw conclusions about the population using sample data. The process of making inferences about the population using the sample involves uncertainty. Probability is used in situations where uncertainty exists; it is the study of random phenomena or events. A random event is an event in which the outcome cannot be predicted in advance. Probability can tell us the likelihood of the random event(s). In the decision-making process, uncertainty almost always exists. One question that is of usual concern when making decisions under uncertainty is the probability of success of the outcome.
Application areas: Probability is used to answer questions in situations such as the following: What is the probability that the Reserve Bank will start raising the interest rate soon? What is the probability that the Dow Jones stock index will go up by 15% by the end of this year? What is the probability of me winning the Powerball lottery? What is the probability that a customer will default on a loan and is a potential risk?
Probability distributions (discrete and continuous)
Statistical tools and models: Most processes produce random outcomes that can be described using a random variable. A random variable can be discrete or continuous. A random variable that can assume only a countable number of values is called a discrete random variable (e.g., the number of defective products or the number of voters). Random variables that are not countable but correspond to the points in an interval are known as continuous random variables (e.g., delivery time, length, diameter, volume); they are infinite and uncountable. These random variables are described using either a discrete or a continuous probability distribution, depending on the nature of the variable. Distributions have wide applications in data analysis, decision making, and computer simulation. The distributions are applied based on the trend or pattern in the data or when certain conditions are met. For example, a bell-shaped pattern in the data is usually described using a normal distribution, whereas customer arrivals or calls coming to a call center are random and can be modeled using a Poisson distribution. The normal distribution is a continuous distribution, whereas the Poisson distribution falls in the category of discrete distributions. There are a number of distributions used in data analytics.
Brief description: Although probabilities and the rules of probability provide us with a way of dealing with uncertainties, the concept and understanding of probability distributions is critical in modeling and decision making in analytics. A probability distribution assigns probabilities to each value of a random variable. A random variable is a variable that can take numerical values associated with the random outcomes of an experiment or a process under study; it can be modeled using a probability distribution, discrete or continuous depending on the variable(s) of interest. These distributions are used to determine the probabilities of outcomes and draw conclusions from the data.
Application areas: Computer simulation is often used to study the behavior of a call center or a drive-through of a fast-food restaurant. In such applications, the arrivals of calls in a call center and customer arrivals in a drive-through are random phenomena that can be modeled using a discrete distribution known as the Poisson distribution. In a fast-food drive-through, the customer waiting time and service time can be described using a continuous distribution known as the exponential distribution. A very widely used continuous distribution in the real world is the normal or Gaussian distribution. Some examples where this distribution can be applied are the length of time to assemble an electronic appliance, the life span of a satellite power source, the fuel consumption in miles per gallon of a new car model, the inside diameter of a manufactured cylinder, and the waiting time of patients at an outpatient clinic.
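To show how these distributions are used computationally, here is a small Python sketch with scipy.stats; the arrival rate, service-time mean, and the 24-minute threshold are assumptions chosen to echo the examples above.

    from scipy import stats

    # Poisson model for random arrivals: assume calls arrive at a call
    # center at an average rate of 4 per minute (an assumed rate).
    print(stats.poisson.pmf(6, mu=4))      # P(exactly 6 calls in a minute)
    print(1 - stats.poisson.cdf(8, mu=4))  # P(more than 8 calls)

    # Exponential model for service time with an assumed mean of 2 minutes.
    print(stats.expon.cdf(3, scale=2))     # P(service completed within 3 min)

    # Normal model: assembly time with mean 18 and std dev 4 minutes.
    print(1 - stats.norm.cdf(24, loc=18, scale=4))  # P(one assembly > 24 min)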
Sampling and sampling distribution
Statistical tools and models: Sampling is a systematic way of selecting a few items from the population. Samples are analyzed to draw conclusion(s) about the entire population. In most cases, the parameters of the population are unknown, and we can estimate a population parameter using a sample statistic. The sampling distribution is the probability distribution of a sample statistic. Different sampling techniques are used to collect sample data; sample statistics are then calculated to draw conclusions about the population. A sample statistic may be a sample mean x̄, sample variance s², sample standard deviation s, or a sample proportion p̄. The central limit theorem has major applications in sampling and other areas of statistics and data analysis. It tells us that if we take a large sample, that is, a sample of size 30 or more (n ≥ 30), we can use the normal distribution to calculate probabilities and draw conclusions about the population parameter. In data analysis, sample data are used to draw conclusions about the population.
Brief description: In data analysis, we almost always rely on sample data to draw conclusions about the population from which the data are collected. A sample is a part of a population. One of the obvious reasons for using samples in statistics and data analysis is that, in most cases, the population can be huge and it is not practical to study the entire population. Samples are used to make inferences about the population, and this can be done through a sampling distribution. In sampling theory, we need to consider several factors and answer questions such as: Why do we use samples? Why do we need a homogeneous sample? What are the different ways of taking samples? What is a sampling distribution, and what is its purpose? A population is described by its parameters (population parameters), and a sample is described by its statistics (sample statistics). It is important to note that a population parameter is always a constant, whereas a sample statistic is a random variable. Like other random variables, each sample statistic can be described using a probability distribution.
Application areas: A manufacturer of computers and laser printers has determined that the assembly time of one of its computers is normally distributed with mean μ = 18 minutes and standard deviation σ = 4 minutes. To improve this process, a random sample is used; the analysis of the sample data can answer the probability that the mean assembly time will take longer than 19 minutes. Another example involves predicting a presidential election using poll results. Polls are conducted to gather sample data, and a lot of planning goes into this. Refer to the Gallup polls at http://www.gallup.com, which conducts and reports numerous poll results. They use sample data to predict the polls (the percent or proportion of voters that favor different candidates). This is an example of using a sample of voters to draw conclusions about the population of voters who favor certain candidates. The first example studies a mean, whereas the poll example studies a proportion.
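The assembly-time example can be carried through numerically. Here is a minimal sketch, assuming a sample size of n = 25 (the table does not specify n), that computes the probability that the sample mean exceeds 19 minutes using the central limit theorem.

    from math import sqrt
    from scipy import stats

    # Assembly-time process: X ~ Normal(mu = 18, sigma = 4) minutes.
    mu, sigma = 18, 4

    # For a random sample of n assemblies, the sample mean has standard
    # error sigma / sqrt(n). The sample size n = 25 is an assumption.
    n = 25
    se = sigma / sqrt(n)

    # P(sample mean exceeds 19 minutes): about 0.106
    print(1 - stats.norm.cdf(19, loc=mu, scale=se))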
Inference procedure: estimation and confidence intervals
Statistical tools and models: Statistical inference: the objective of statistical inference is to draw conclusions or make decisions about a population based on the samples selected from the population. The idea of drawing conclusions about population parameters such as the population mean (μ), population variance (σ²), and population proportion (p) comes under estimation, where these population parameters are estimated using the sample statistics referred to as the sample mean (x̄), sample variance (s²), and sample proportion (p̄). The reason for estimating these parameters is that their values are unknown, and therefore we must use the sample data to estimate them. Estimation is the simplest form of inferential statistics, in which a sample statistic is used to draw conclusions regarding the unknown population parameter. Two types of estimates are used in parameter estimation: the point estimate and the interval estimate or confidence interval.
Brief description: The parameters of a process are generally unknown; they change over time and must be estimated. This is done using inference procedures. Statistical inference is an extremely important area of statistics and data analysis and is used to estimate the unknown population parameters—for example, estimating the mean, μ, or the standard deviation, σ, using the corresponding sample statistic (the sample mean, x̄, or the sample standard deviation, s). There are two major tools of inferential statistics: estimation and hypothesis testing. These techniques are the basis for many of the methods of data analysis and statistical quality control. Here we explain the concept of estimation.
Application areas: A recent poll report entitled "Link between exercising regularly and feeling good about appearance" came to the following conclusion: 56% of Americans who exercise two days per week feel good about their looks. This jumps to the 60% range among those who exercise three to six times per week. The results are based on telephone interviews conducted as part of the Gallup-Healthways Well-Being Index survey from January 1 to June 23, 2014, with a random sample of 85,143 adults aged 18 and older, living in all 50 U.S. states. For the results based on the total sample of national adults, the margin of sampling error is ±5.0 percentage points at the 95% confidence level. Parts of the claims made here may not make any sense, and perhaps you are wondering about some of the statements. For example, what do the margin of sampling error of ±5.0 percentage points and a 95% confidence level mean? Also, how can using a sample of only a few thousand allow a conclusion to be drawn about the entire population? Estimation and confidence intervals answer these questions. They enable us to draw conclusion(s) about the entire population using the sample from the population.
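A minimal sketch of interval estimation follows: it computes a 95% t-based confidence interval for a population mean from an invented sample of order values (the data are assumptions for illustration).

    from math import sqrt
    from scipy import stats
    import numpy as np

    # Hypothetical sample of order values; in practice these come from data.
    x = np.array([212.4, 199.8, 240.1, 225.5, 205.3, 231.9,
                  218.6, 244.2, 209.7, 228.8, 236.4, 215.1])

    n, xbar, s = len(x), x.mean(), x.std(ddof=1)

    # 95% confidence interval for the population mean:
    # xbar +/- t(0.975, n-1) * s / sqrt(n)
    t_crit = stats.t.ppf(0.975, df=n - 1)
    moe = t_crit * s / sqrt(n)     # margin of error
    print(xbar - moe, xbar + moe)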
Inference procedure: hypothesis testing
Statistical tools and models: Hypothesis testing is a major tool of inferential statistics that uses the information in the sample data to make a decision about a hypothesis. The hypothesis may be about a mean, proportion, variance, and so on. Suppose the Department of Labor claims that the average salary of graduates with a Data Analytics degree is $80,000; this can be written as a hypothesis. To verify this claim, a sample of recent graduates may be evaluated and a conclusion reached about the validity of the claim. A hypothesis may test a claim, a design specification, a belief, or a theory, and sample data are used to verify these.
Brief description: Here we extend the concept of inferential statistics to hypothesis testing. Hypothesis tests enable us to draw conclusions about a population by analyzing the information obtained from a sample. A hypothesis test involves a statement about a population parameter (such as a population mean or a population proportion). The test specifies a value for the parameter that lies in a region. Using the sample data, we must decide whether the hypothesis is consistent with or supported by the sample data.
Application areas: Let us look into a real-world example. In recent years, there has been a great deal of interest in hybrid cars. Consumers are attracted to hybrids because of the high miles per gallon (mpg) these cars claim to provide. If you are interested in purchasing a hybrid, there are many makes and models from different manufacturers to choose from; it seems that just about every manufacturer offers a hybrid to compete in this growing market. The following are claims made by some manufacturers of hybrid cars. The Toyota Prius claims to provide about 50 mpg in the city and 48 mpg on the highway; it tops the list of fuel-efficient hybrids, and the estimated annual fuel cost is less than $800. The Ford Fusion Hybrid claims to provide 41 mpg in the city and 36 mpg on the highway; the average annual fuel cost of less than $1,000 makes it attractive to customers. The Honda Civic Hybrid claims to provide 40 mpg in the city and 45 mpg on the highway; estimated annual fuel costs are less than the Ford Fusion's. These days we find several claims like the ones above in consumer magazines and television commercials. Should consumers believe these claims? Hypothesis testing may provide the answer to such questions. Hypothesis testing will enable us to make inferences about a population parameter by analyzing the difference between the stated population parameter value and the results obtained from the sample data. It is a very widely used technique in the real world.
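As a hedged illustration of hypothesis testing, the sketch below tests a 50-mpg claim against an invented sample of mileage readings with a one-sample t test; the data are assumptions, not actual test results for any vehicle.

    from scipy import stats
    import numpy as np

    # Hypothetical city-mpg readings for a hybrid claimed to deliver
    # 50 mpg (the sample values are invented for illustration).
    mpg = np.array([48.2, 49.5, 47.8, 50.1, 46.9,
                    48.8, 49.2, 47.5, 48.9, 49.7])

    # H0: mu = 50 (the claim) versus Ha: mu < 50 (cars deliver less).
    t_stat, p_two_sided = stats.ttest_1samp(mpg, popmean=50)
    p_one_sided = p_two_sided / 2 if t_stat < 0 else 1 - p_two_sided / 2
    print(t_stat, p_one_sided)  # a small p-value casts doubt on the claim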
Correlation analysis
Statistical tools and models: Correlation is a numerical measure of linear association between two variables. It provides the degree of association between two variables of interest and tells us how weak or strong the relationship is between the variables.
Brief description: The coefficient of correlation is a numerical measure of the linear association between the two variables x and y. The correlation coefficient is denoted by r_xy, and its value is between −1 and +1. The r_xy value tells us the degree of association between the two variables; it also tells us how strong or weak the correlation is between them.
Application areas: Suppose we are interested in investigating the strength of the relationship between sales and profit, or the relationship between sales and advertisement expenditures for a company. Similarly, we can study the relationship between home-heating cost and average temperature using correlation analysis. Usually, correlation analysis starts by constructing a scatter plot. These plots are very useful in visualizing whether the relationship between the variables is positive or negative and linear or nonlinear. The next step is calculating a numerical measure. For example, if the calculated coefficient of correlation is r = +0.902, it shows a very strong positive correlation between sales and advertisement. Scatter plots are very helpful in describing bivariate relationships, or the relationships between two quantitative variables, and can be easily created using computer packages. They are very helpful in data analysis and model building.
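Here is a small sketch of correlation analysis for hypothetical sales and advertising figures; the data are invented, and in practice one would first draw the scatter plot described above.

    import numpy as np

    # Hypothetical sales and advertising expenditures for 8 periods.
    sales = np.array([120, 135, 150, 160, 172, 181, 195, 210])
    ads   = np.array([10, 12, 15, 15, 18, 19, 22, 24])

    # Pearson coefficient of correlation r_xy, always between -1 and +1.
    r = np.corrcoef(sales, ads)[0, 1]
    print(r)  # a value near +1 indicates a strong positive linear association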
Summary
Predictive analytics is about predicting future business outcomes. This phase of analytics uses a number of models that can be divided into logic-driven models and data-driven models. We discussed both types of models and the difference between the two. The key aim of this chapter was to introduce readers to a number of tools and statistical models whose understanding is critical for understanding and applying predictive analytics models. This background information is a prerequisite to predictive analytics, and the chapter provided a brief description and the application areas of these prerequisite tools.
These are probability concepts, probability distributions, sampling and
sampling distributions, correlation analysis, estimation and confidence
intervals, and hypothesis testing. These topics are investigated in detail in
the Appendix that accompanies this book. The appendix is available as a
free download.
Appendices A–D
The appendices contain the topics that are prerequisites to the data-driven predictive analytics models. The concepts discussed there are essential for applying predictive models. Appendices A–D discuss the following statistical tools and models of business analytics: the concept of probability, the role of probability distributions in decision making, sampling and sampling distributions, inference procedures (estimation and confidence intervals), and inference procedures for one- and two-population parameters (hypothesis testing).
Note: The following chapters discuss predictive analytics models—regression analysis, modeling, time series forecasting, and data mining.
CHAPTER 6
Key Predictive Analytics Models (Predicting Future Business Outcomes Using Analytic Models)

Chapter Highlights
• Key Predictive Analytics Models and Their Brief Description and Applications
• Regression Models
• Forecasting Models
• Analysis of Variance (ANOVA)
• Data Mining
• Simple Regression, Multiple Regression, Nonlinear Regression
• Summary
Table 6.1 outlines the key predictive analytics tools, the types of questions they attempt to answer, and their applications. The descriptions and application areas of the statistical tools used in predictive analytics are outlined in Table 6.2.
Table 6.1 Predictive analytics, questions they attempt to answer, and their tools

Regression models
Attempts to Answer: How can the trends and patterns identified in the data be used to predict the future business outcome(s)? How can we identify appropriate prediction models? How can the models be used to make predictions about how things will turn out in the future—what will happen in the future? How can we predict the future trends of the key performance indicators using the past data and models?
Tools and Applications: Regression models: (a) simple regression models; (b) multiple regression models; (c) nonlinear regression models, including quadratic or second-order models and polynomial regression models; (d) regression models with indicator or qualitative independent variables; (e) regression models with interaction terms or interaction models; and (f) logistic regression models.

Forecasting models
Attempts to Answer: How can the future behavior of key business outcomes or variables be predicted using forecasting techniques suited to future business phenomena? How can different forecasting models, using both qualitative and quantitative forecasting techniques, be applied to predict a number of future business phenomena? How can some of the key variables, including sales, revenue, number of customers, demand, inventory, customer behavior, number of visits to the business website, and many others, be predicted using a number of proven techniques? How can the forecast be used in short-term and long-term business planning? The prediction and forecasting methods use a number of time series models as well as data mining techniques.
Tools and Applications: Forecasting techniques: widely used predictive models involve a class of time series analysis and forecasting models. The commonly used forecasting models fall into the following categories:
• Techniques using average: simple moving average, weighted moving average, exponential smoothing
• Techniques for trend: linear trend equation (similar to simple regression), double moving average or moving average with trend, exponential smoothing with trend or trend-adjusted exponential smoothing
• Techniques for seasonality: forecasting data with seasonal patterns
• Associative forecasting techniques: simple regression, multiple regression analysis, nonlinear regression, regression involving categorical or indicator variables, and other regression models
• Regression-based models that use regression analysis to forecast future trends. Other time series forecasting models are simple moving average, moving average with trend, exponential smoothing, exponential smoothing with trend, and forecasting of seasonal data.

ANOVA (analysis of variance)
Attempts to Answer: ANOVA in its simplest form is a way to study multiple means. Single-factor, two-factor, and multiple-factor ANOVA, along with design of experiments (DOE) techniques, are powerful tools used in data analysis to study and identify key variables and build prediction equations. These models are used in modeling and predictive analytics to predict future outcomes.
Tools and Applications: ANOVA and DOE techniques include single-factor ANOVA, two-factor ANOVA, and multiple-factor ANOVA. Factorial designs and DOE tools are used to create models involving multiple factors.

Data mining
Attempts to Answer: Determines meaningful patterns and derives insights from large data sets. It is closely related to analytics. Data mining uses statistics, machine learning, and artificial intelligence techniques to derive meaningful patterns and make predictions.
Tools and Applications: Data mining techniques are used to extract useful information from huge amounts of data using predictive analytics, computer algorithms, software, and mathematical and statistical tools.

Other tools of predictive analytics: machine learning, artificial intelligence, neural networks, and deep learning
Attempts to Answer: Machine learning is a method used to design systems that can learn, adjust, and improve based on the data fed to them. Machine learning works based on predictive and statistical algorithms that are provided to these machines. The algorithms are designed to learn and improve as more data flow through the system.
Tools and Applications: Machine learning, artificial intelligence, neural networks, and deep learning have been used successfully in fraud detection, e-mail spam filtering, GPS systems, medicine, medical diagnosis, and predicting and treating a number of medical conditions. There are other applications of machine learning.
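To make the averaging techniques in Table 6.1 concrete, the sketch below computes a simple moving-average forecast and a simple exponential-smoothing forecast in Python. The demand series, the window size, and the smoothing constant alpha are all hypothetical, illustrative choices.

```python
# A minimal sketch of two averaging forecasts; data and parameters are hypothetical.
import numpy as np

demand = np.array([42, 40, 43, 41, 39, 44, 42, 40], dtype=float)  # hypothetical

# Simple moving average: next-period forecast = mean of the last n observations
n = 3
sma_forecast = demand[-n:].mean()

# Simple exponential smoothing: F(t+1) = alpha * A(t) + (1 - alpha) * F(t)
alpha = 0.3
f = demand[0]                     # initialize the forecast with the first observation
for actual in demand[1:]:
    f = alpha * actual + (1 - alpha) * f

print(f"moving-average forecast: {sma_forecast:.2f}")
print(f"exponential-smoothing forecast: {f:.2f}")
```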
Table 6.2 Statistical tools and application areas

Simple regression model
Brief Description: Regression analysis is used to investigate the relationship between two or more variables. Often we are interested in predicting a variable using one or more independent variables. In general, we have one dependent or response variable y and one or more independent variables x1, x2, ..., xk. The independent variables are also called predictors. If there is only one independent variable x that we are trying to relate to the dependent variable y, then this is a case of simple regression. On the other hand, if we have two or more independent variables that are related to a single response or dependent variable, we have a case of multiple regression.
Application Areas: The purpose of simple regression analysis is to develop a statistical model that can be used to predict the value of a response or dependent variable using an independent variable. For example, we might be interested in predicting the profit using the number of customers, or in predicting the time required to produce a certain number of products in a production situation. In these cases, the variable profit or the variable time that we are trying to predict is known as the dependent or response variable, and the other variable, the number of customers or the number of products, is referred to as the independent variable or predictor. In a simple linear regression, we study the linear relationship between two variables: the dependent or response variable (y) and the independent variable or predictor (x). The following is an example relating the advertising expenditure and sales of a company. The relationship is linear, and the objective is to predict sales, the response variable (y), using advertising, the independent variable or predictor (x). A scatter plot as shown in Figure 6.2 is one of the first steps in studying the relationship (linear or nonlinear) between the variables.

Figure 6.2 Scatter plot of sales versus advertising
Multiple regression models
Brief Description: In regression analysis, we have one dependent or response variable y and one or more independent variables x1, x2, ..., xk. The independent variables are also called predictors. If there is only one independent variable x that we are trying to relate to the dependent variable y, then this is a case of simple regression. On the other hand, if we have two or more independent variables that are related to a single response or dependent variable, then we have a case of multiple regression. The relationship between the dependent and independent variables is described by a mathematical model known as a regression equation, which is obtained using the least squares method. In the case of a multiple linear regression, the equation is of the form

y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn

where b0, b1, b2, ..., bn are the regression coefficients and x1, x2, ..., xn are the independent variables.
Application Areas: A pharmaceutical company is concerned about declining sales of one of its drugs. The drug was introduced in the market approximately two-and-a-half years ago. In recent months the sales of this product have been in constant decline, and the company is concerned about losing its market share, as it is one of the major drugs the company markets. The head of the sales and marketing department wants to investigate the possible causes and evaluate some strategies to boost the sales. He would like to build a regression model of the sales volume and several independent variables believed to be strongly related to the sales. A multiple regression model will help the company to determine the important variables and also predict the future sales volume. The marketing director believes that the sales volume is directly related to three major factors: dollars spent on advertisement, commission paid to the salespersons, and the number of salespersons deployed for marketing this drug. A multiple regression model can be built to study this problem. In a multiple regression, the least squares method determines the best-fitting plane or hyperplane through the data points, ensuring that the sum of the squares of the vertical distances or deviations from the given points to the plane is a minimum. Figure 6.3 shows a multiple regression model with two independent variables: the response y together with the two independent variables x1 and x2 forms a regression plane.

Figure 6.3 A multiple regression model
Nonlinear regression (quadratic and polynomial) models
Brief Description: The above models, simple and multiple regression, are based on the assumption of linearity; that is, the relationship between the independent variable(s) and the response variable can be well approximated by a linear model. However, in certain situations the relationship between the variables is not linear but may be described by a quadratic or second-order model. Sometimes the relationships can be described using a polynomial model.
Application Areas: A nonlinear (second-order) regression model is described here. The life of an electronic component is believed to be related to the temperature in the operating environment. A scatter plot was created to study the relationship. The scatter plot in Figure 6.4 shows the life of the components (in hours) and the corresponding operating temperature (in °F). From the scatter plot, it is clear that the relationship between the variables is not linear. An appropriate model in this case would be a second-order or quadratic model that will predict the life of the component. Here, the life of the component is the dependent variable (y) and the operating temperature is the independent variable (x). Figure 6.5 shows a second-order model with the regression equation that can be used to predict the life of the components using temperature.
All subset and stepwise regression
Brief Description: Finding the best set of predictor variables to be included in the model.

Some other regression models:

Reciprocal transformation of the x variable: This transformation can produce a linear relationship and is of the form

y = β0 + β1(1/x) + ε

This model is appropriate when x and y have an inverse relationship. Note that the inverse relationship is not linear.

Log transformation of the x variable: The logarithmic transformation is of the form

y = β0 + β1 ln(x) + ε

This is a useful curvilinear form, where ln(x) is the natural logarithm of x and x > 0.

Log transformation of the x and y variables:

ln(y) = β0 + β1 ln(x) + ε

The purpose of this transformation is to achieve a linear relationship. The model is valid for positive values of x and y. This transformation is more involved, and it is difficult to compare this model with other models that have y as the dependent variable.
Logistic regression: This model is used when the response variable is categorical. In all the regression models developed in this book so far, the response variable was a quantitative variable. In cases where the response is categorical or qualitative, the simple and multiple least-squares regression models violate the normality assumption. The correct model in this case is logistic regression.
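As a rough illustration of fitting a model with a categorical (0/1) response, the sketch below uses scikit-learn's logistic regression. The advertising-spend and purchase data are hypothetical, and this is only one of several libraries that could be used.

```python
# A minimal sketch of a logistic regression for a 0/1 response; data are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([[10], [15], [20], [25], [30], [35], [40], [45]])  # hypothetical spend
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])                          # hypothetical: purchased?

model = LogisticRegression().fit(x, y)

# Predicted probability that y = 1 (purchase) at a spend of 28
print(model.predict_proba([[28]])[0, 1])
```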
Forecasting models
Brief Description: A forecast is a statement about the future value of a variable of interest, such as demand. Forecasting is used to make informed decisions and may be long-range or short-range. Forecasts affect decisions and activities throughout an organization. Produce-to-order companies depend on demand forecasts to plan their production, and inventory planning and decisions are affected by forecasts.
Forecasting methods are classified as qualitative or quantitative. Qualitative forecasting methods use expert judgment to develop forecasts; they are used when historical data on the variable being forecast are not available. These methods are also known as judgmental, as they use subjective inputs. The objective of forecasting is to predict the future outcome based on the past pattern or data. When historical data are not available, or when a new product is to be launched for which information is not available, qualitative methods are used; they forecast the future outcome based on opinion, judgment, or experience. The forecast may be based on consumer/customer surveys, executive opinions, sales force opinions, surveys of similar competitive products, the Delphi method, expert knowledge, and the opinions of managers, achieving a consensus on the forecast.
Quantitative forecasting is based on historical data. The most common methods are time series and associative forecasting methods, which are discussed in detail in the subsequent sections. The forecasting methods and models can be divided into the following categories:
• Techniques using average: simple moving average, weighted moving average, exponential smoothing
• Techniques for trend: linear trend equation (similar to simple regression), double moving average or moving average with trend, exponential smoothing with trend or trend-adjusted exponential smoothing
• Techniques for seasonality: forecasting data with seasonal patterns
• Associative forecasting techniques: simple regression, multiple regression analysis, nonlinear regression, regression involving categorical or indicator variables, and other regression models
Application Areas: Usually the first step in forecasting is to plot the historical data. This is critical in identifying the pattern in the time series and applying the correct forecasting method. If the data are plotted over time, such plots are known as time series plots. A time series plot shows the time on the horizontal axis and the variable of interest on the vertical axis; it is a graphical representation of data over time, where the data may be weekly, monthly, quarterly, or annual. Some of the common time series patterns are discussed here.
Figure 6.6 shows demand data fluctuating around an average. Averaging techniques such as the simple moving average or simple exponential smoothing can be used to forecast such patterns. The actual data and the forecast are shown in Figure 6.7.

Figure 6.6 Plot of demand over time
Figure 6.7 Demand and forecast

Figure 6.8 shows the sales data for a company over a period of 65 weeks. Clearly, the data are fluctuating around an average and showing an increasing trend. Forecasting techniques such as the double moving average or exponential smoothing with a trend can be used to forecast such patterns. The plot in Figure 6.9 shows the sales and the forecast for the data.

Figure 6.8 Sales over time
Figure 6.9 Sales and forecast for the data in Figure 6.8

The other forecasting techniques involve a number of regression models and the forecasting of seasonal patterns. Figure 6.10 shows a seasonal pattern and forecast.

Figure 6.10 A seasonal pattern
ANOVA (analysis of variance)
Brief Description: A single-factor completely randomized design is the simplest experimental design. This design involves one factor at different levels that can be dealt with as a single-factor factorial experiment. The analysis method for such problems is known as ANOVA. In ANOVA, the procedure uses variances to determine whether the means of multiple groups are different. The process works by comparing the variance between groups versus the variance within groups and determines whether the groups are all part of a single population or separate populations. Design of experiments (DOE) is a powerful tool. Many variations of design involve a two-factor factorial design using a two-factor ANOVA. More than two factors can be studied using specially designed experiments.
Application Areas: Consider an example in which the marketing manager of a franchise wants to know whether there is a difference in the average profit among four of their stores. He randomly selected four stores and recorded the profit for these stores. The data would look like Table 6.1. In this case, the single factor of interest is the store. Since there are four stores, store 1, store 2, store 3, and store 4, we have four levels of the same factor. Recall that the levels of a factor are also known as treatments or groups; therefore, we can say that there are four treatments or groups. The manager wants to study the profit for the selected stores; therefore, profit is the response variable. The response variable is the variable that is measured in an experiment. This is an example of a one-factor ANOVA, where the single factor, store, is at four levels and the response variable is profit.

Table 6.1

Store 1   Store 2   Store 3   Store 4
30        37        25        23
34        33        21        26
26        39        24        29
30        42        25        28
25        37        18        25
29        40        25        25

The null and alternate hypotheses for a one-factor ANOVA involving k treatments or groups test whether the k treatment means are equal.
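A sketch of how this one-factor ANOVA could be run in Python appears below; it uses scipy's one-way ANOVA on the store-profit data from the table above (the library choice is an assumption, since the book itself works in EXCEL and MINITAB).

```python
# A minimal sketch of the single-factor ANOVA on the store-profit data above.
from scipy import stats

store1 = [30, 34, 26, 30, 25, 29]
store2 = [37, 33, 39, 42, 37, 40]
store3 = [25, 21, 24, 25, 18, 25]
store4 = [23, 26, 29, 28, 25, 25]

# H0: the four store means are equal; reject H0 when the p-value is small
f_stat, p_value = stats.f_oneway(store1, store2, store3, store4)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```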
Data mining
Brief Description: Data mining involves exploring new patterns and relationships in the collected data. It is a part of predictive analytics that involves processing and analyzing huge amounts of data to extract useful information and patterns hidden in the data. The overall goal of data mining is knowledge discovery from the data. Data mining techniques are used to (i) extract previously unknown and potentially useful knowledge or patterns from massive amounts of collected and stored data, and (ii) explore and analyze these large quantities of data to discover meaningful patterns and transform the data into an understandable structure for further use. The field of data mining is growing rapidly, and statistics plays a major role in it. Data mining is also known as knowledge discovery in databases (KDD), pattern analysis, information harvesting, business intelligence, analytics, etc. Besides statistics, data mining uses artificial intelligence, machine learning, database systems, advanced statistical tools, and pattern recognition. In this age of technology, companies collect massive amounts of data automatically using different means. A large quantity of data is also collected using remote sensors and satellites. With the huge quantities of data collected today, usually referred to as big data, traditional techniques of data analysis are infeasible for processing the raw data. The data in their raw form have no meaning unless processed and analyzed. Among the several tools and techniques available and currently emerging with the advancement of technology and computers, it is now possible to analyze big data using data mining, machine learning, and artificial intelligence (AI) techniques.
Application Areas: Data mining is one of the major tools of predictive analytics. In business, data mining is used to analyze business data. Business transaction data, along with other customer- and product-related data, are continuously stored in databases. Data mining software is used to analyze the vast amount of customer data to reveal hidden patterns, trends, and other customer behavior. Businesses use data mining to perform market analysis to identify and develop new products, analyze their supply chain, find the root cause of manufacturing problems, study customer behavior for product promotion, improve sales by understanding the needs and requirements of their customers, prevent customer attrition, and acquire new customers. For example, Wal-Mart collects and processes over 20 million point-of-sale transactions every day. These data are stored in a centralized database and are analyzed using data mining software to understand and determine customer behavior, needs, and requirements. The data are analyzed to determine sales trends and forecasts, develop marketing strategies, and predict customer buying habits (http://www.laits.utexas.edu/~anorman/BUS.FOR/course.mat/Alex). The success with data mining and predictive modeling has encouraged many businesses to invest in data mining to achieve a competitive advantage. Data mining has been successfully applied in several areas of business and industry, including customer service, banking, credit card fraud detection, risk management, sales and advertising, sales forecasting, customer segmentation, and manufacturing. Data mining is "the process of uncovering hidden trends and patterns that lead to predictive modeling using a combination of explicit knowledge base, sophisticated analytical skills and academic domain knowledge" (Luan, Jing, 2002). Data mining has been used successfully in science, engineering, business, and finance to extract previously unknown patterns from databases containing massive amounts of data and to make predictions that are critical in decision making and in improving overall system performance. In recent years, data mining combined with machine learning/artificial intelligence has found larger and wider applications in analyzing business data, thereby predicting future business outcomes. The reason for this is the growing interest in knowledge management and in moving from data to information and finally to knowledge discovery.
Machine learning
Brief Description: Machine learning methods use complex models and algorithms to make predictions. These models allow analysts to make predictions by learning from the trends, patterns, and relationships in the historical data. The algorithms are designed to learn iteratively from data without being explicitly programmed. In a way, machine learning automates model building.
Application Areas: Machine-learning algorithms have extensive applications in data-driven predictions and are a major decision-making tool. Some applications where machine learning has been used are e-mail filtering, cyber security, signal processing, fraud detection, and others. Machine learning is employed in a range of computing tasks. Although machine-learning models are used in a number of applications, there are limitations in designing and programming explicit algorithms that are reproducible and have repeatability with good performance. With current research and the use of newer technology, the fields of machine learning and artificial intelligence are becoming more promising.
Summary
This chapter provided a brief description and applications of key predic-
tive analytics models. These models are the core of predictive analytics
and are used to predict future business outcomes.
CHAPTER 7
Regression Analysis and Modeling
Chapter Highlights
• Introduction to Regression and Correlation
• Linear Regression
◦ Regression Model
Linear Regression
Regression analysis is used to investigate the relationship between two or more variables. Often we are interested in predicting a variable using one or more independent variables x1, x2, ..., xk. For example, we might be interested in the relationship between two variables: sales and profit for a chain of stores, the number of hours required to produce a certain number of products, the number of accidents versus blood alcohol level, advertising expenditures and sales, or the height of parents compared to that of their children. In all these cases, regression analysis can be applied to investigate the relationship between the variables.

In general, we have one dependent or response variable y and one or more independent variables x1, x2, ..., xk. The independent variables are also called predictors. If there is only one independent variable x that we are trying to relate to the dependent variable y, then this is a case of simple regression. On the other hand, if we have two or more independent variables that are related to a single response or dependent variable, then we have a case of multiple regression. In this section, we will discuss simple regression, or to be more specific, simple linear regression. This means that the relationship we obtain between the dependent or response variable y and the independent variable x will be linear. In this case, there is only one predictor or independent variable (x) of interest that will be used to predict the dependent variable (y).
In regression analysis, the dependent or response variable y is a random variable, whereas the independent variable or variables x1, x2, ..., xk are measured with negligible error and are controlled by the analyst. The relationship between the dependent and independent variable or variables is described by a mathematical model known as a regression model.

y = β0 + β1x + ε   (7.1)

E(y) = β0 + β1x   (7.2)

ŷ = b0 + b1x   (7.3)

where
ŷ = point estimator of E(y), or the mean value of y for a given value of x
b0 = y-intercept of the regression line
b1 = slope of the regression line
Figure 7.3 shows a scatter plot of the data of Table 7.1. Scatter plots are often used to investigate the relationship between two variables. An investigation of the plot shows a positive relationship between sales and advertising expenditures; therefore, the manager would like to predict the sales from the advertising expenditure using a simple regression model.
yˆ = −150.9 + 18.33 x
The vertical distance of each point from the line is known as the error
or residual. Note that the residual or error of a point can be positive, nega-
tive, or zero depending upon whether the point is above, below, or on the
fitted line. If the point is above the line, the error is positive, whereas if
the point is below the fitted line, the error is negative.
Figure 7.4 shows graphically the errors for a few points. To demon-
strate how the error or residual for a point is calculated, refer to the data
in Table 7.1.
Figure 7.4 Fitting the regression line to the sales and advertising data of Table 7.1
Figure 7.4 shows this error value. This error is negative because the
point y = 498 lies below the fitted regression line.
Now, consider the advertising expenditure of x = 44. The observed sales for this value is 728, or y = 728 (from Table 7.1). The predicted sales for x = 44 is given by the fitted regression line:

ŷ = −150.9 + 18.33(44) = 655.62

This value is shown in Figure 7.4. The error for this point is the difference between the observed and the predicted, or estimated, value, which is

y − ŷ = 728 − 655.62 = 72.38

This value of the error is positive because the point y = 728 lies above the fitted line.
The errors for the other observed values can be calculated in a similar
way. The vertical deviation of a point from the fitted regression line rep-
resents the amount of error associated with that point. The least squares
method determines the values b0 and b1 in the fitted regression line
ŷ = b0 + b1 x that will minimize the sum of the squares of the errors.
Minimizing the sum of the squares of the errors provides a unique line through the data points such that the overall distance of the points from the fitted line is a minimum. Since the least squares criterion requires that the sum of the squares of the errors be minimized, we have the following relationship:
∑(y − ŷ)² = ∑(y − b0 − b1x)²   (7.4)

where y is the observed value and ŷ is the estimated value of the dependent variable given by ŷ = b0 + b1x.

Equation (7.4) involves two unknowns, b0 and b1. Using differential calculus, the following two equations can be obtained:

∑y = nb0 + b1∑x   (7.5)
∑xy = b0∑x + b1∑x²

These equations are known as the normal equations and can be solved algebraically to obtain the unknown values of the y-intercept and slope, b0 and b1. Solving these equations yields the results shown below.
b1 = [n∑xy − (∑x)(∑y)] / [n∑x² − (∑x)²]   (7.6)

and

b0 = ȳ − b1x̄   (7.7)

where ȳ = ∑y/n and x̄ = ∑x/n.
The values b0 and b1, when calculated using equations (7.6) and (7.7), minimize the sum of the squares of the vertical deviations or errors. These values can be calculated easily using the data points (xi, yi), which are the observed values of the independent and dependent variables (the collected data in Table 7.1).
x̄ = ∑x/n = 546/15 = 36.4   ȳ = ∑y/n = 7,742/15 = 516.133
Using the values in Table 7.2 and equations (7.6) and (7.7), we first calculate the value of b1:

b1 = [n∑xy − (∑x)(∑y)] / [n∑x² − (∑x)²] = [15(295,509) − (546)(7,742)] / [15(20,622) − (546)²] = 18.326

and then the value of b0:

b0 = ȳ − b1x̄ = 516.133 − 18.326(36.4) = −150.9

This gives us the following equation for the estimated regression line:

ŷ = −150.9 + 18.33x
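The sketch below reproduces this least squares computation in Python, using equations (7.6) and (7.7) and the summary values quoted above for the sales-advertising data.

```python
# A minimal sketch of the least squares formulas (7.6) and (7.7), using the
# summary values quoted in the text for the sales-advertising data.
n = 15
sum_x, sum_y = 546, 7742
sum_xy, sum_x2 = 295509, 20622

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope, eq. (7.6)
b0 = sum_y / n - b1 * (sum_x / n)                              # intercept, eq. (7.7)
print(f"y-hat = {b0:.1f} + {b1:.2f} x")                        # -150.9 + 18.33 x
```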
You can verify these computations and interpret the results; all the formulas are written in terms of the values calculated in Table 7.4.
The above plot clearly shows an increasing trend. It shows a linear relationship between x and y; therefore, the data can be approximated using a straight line with a positive slope.
ŷ = b0 + b1x

b1 = [n∑xy − (∑x)(∑y)] / [n∑x² − (∑x)²] = [30(357,055) − (24,132)(431.23)] / [30(20,467,220) − (24,132)²] = 0.00964

and

ŷ = b0 + b1x = 6.62 + 0.00964x
The regression equation, or the equation of the "best" fitting line, can also be written as:

Hours(y) = 6.62 + 0.00964 Units(x)
The error is also known as the residual. Figure 7.7 shows the least squares line and the residuals for each of the points as the vertical distance from the point to the estimated regression line. [Note: The estimated line is denoted by ŷ, and the residual for a point yi is given by (yi − ŷi).]

Recall that the error or residual for a point is given by (y − ŷ), which is the vertical distance of the point from the estimated line. Figure 7.8 shows the fitted regression line over the scatter plot.

ŷ = 6.62 + 0.00964x
In this equation of the fitted line, 6.62 is the y-intercept and 0.00964 is the slope. This line provides the relationship between the hours and the number of units produced. The equation means that for each unit increase in x (the number of units produced), y (the number of hours) will increase by 0.00964. The value 6.62 represents the portion of the hours that is not affected by the number of units.
s = √[∑(y − ŷ)² / (n − 2)]   (7.7A)
The equation can also be written in terms of b0, b1, and the values in Table 7.4, and the standard error of the estimate can be calculated as:

s = √[(∑y² − b0∑y − b1∑xy) / (n − 2)] = √[(6,302.3 − 6.62(431.23) − 0.00964(357,055)) / 28] = 0.4481
A small value of s indicates less scatter of the data points around the fitted regression line (see Figure 7.8). The value s = 0.4481 indicates that the average deviation is 0.4481 hours (measured in units of the dependent variable y).
The coefficient of determination, r², is used to judge the adequacy of the regression model. The value of r² lies between 0 and 1 (0 ≤ r² ≤ 1), or 0 to 100 percent. The closer the value of r² is to 1, or 100 percent, the better the model, because the r² value indicates the amount of variation in the data explained by the regression model.
Figure 7.9 shows the relationship between the explained, unexplained,
and the total variation.
In regression, the total sum of squares is partitioned into two components, the regression sum of squares and the error sum of squares, giving the following relationship:

SST = SSR + SSE   (7.8)

where

SST = ∑(y − ȳ)² = ∑y² − (∑y)²/n   (7.9)

and

SSE = ∑(y − ŷ)² = ∑y² − b0∑y − b1∑xy   (7.10)

Note that we can calculate SSR by calculating SST and SSE, since SSR = SST − SSE. The coefficient of determination is then

r² = SSR/SST   (7.11)
SST = ∑(y − ȳ)² = ∑y² − (∑y)²/n = 6,302.3 − (431.23)²/30 = 103.68
Since

SSE = ∑y² − b0∑y − b1∑xy = 5.623

therefore

SSR = SST − SSE = 103.680 − 5.623 = 98.057

and

r² = SSR/SST = 98.057/103.680 = 0.946

or r² = 94.6%.
This means that 94.6 percent of the variation in the dependent variable y is explained by the variation in x, and 5.4 percent of the variation is due to unexplained reasons or error.
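The sketch below reproduces this variation decomposition in Python, using the summary values and fitted coefficients quoted in the text for the hours-units data; small differences from the printed figures are rounding.

```python
# A minimal sketch of equations (7.9)-(7.11), using the summary values quoted
# in the text (n = 30, sum y = 431.23, sum y^2 = 6,302.3, sum xy = 357,055)
# and the fitted coefficients b0 = 6.6209, b1 = 0.0096388.
n = 30
sum_y, sum_y2, sum_xy = 431.23, 6302.3, 357055
b0, b1 = 6.6209, 0.0096388

sst = sum_y2 - sum_y ** 2 / n              # total sum of squares, eq. (7.9)
sse = sum_y2 - b0 * sum_y - b1 * sum_xy    # error sum of squares, eq. (7.10)
ssr = sst - sse                            # regression sum of squares
r2 = ssr / sst                             # coefficient of determination, eq. (7.11)
print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SSR = {ssr:.2f}, r^2 = {r2:.3f}")
```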
The coefficient of correlation is the square root of the coefficient of determination:

r = √r²   (7.13)

Therefore,

r = √0.946 = 0.973

The coefficient of correlation always lies between −1 and +1:

−1 ≤ r ≤ 1   (7.14)

It can also be calculated directly from the data:

r = [∑xy − (∑x)(∑y)/n] / √{[∑x² − (∑x)²/n][∑y² − (∑y)²/n]}   (7.15)
Using the values in Table 7.4, we can calculate r from equation (7.15).
The instructions in Table 7.5 will produce the regression output shown in
Table 7.6. If you checked the boxes under Residuals and the Line Fit Plots,
the residuals and fitted line plot will be displayed.
ŷ = 6.62 + 0.00964x

r² = SSR/SST
The values of SSR, SSE, and SST can be obtained from the ANOVA table, which is part of the regression analysis output of EXCEL. Table 7.7 shows the EXCEL regression output with the SSR and SST values. Using these values, the coefficient of determination is r² = SSR/SST = 0.9458. This value is reported under Regression Statistics in Table 7.7.
The t-test and F-test for the significance of regression can be easily
performed using the information in the EXCEL computer output under
the ANOVA table. Table 7.8 shows the EXCEL regression output with
the ANOVA table.
(1) Conducting the t-Test Using the Regression Output in Table 7.8

The test statistic is

tn−2 = b1/sb1

Table 7.7 EXCEL regression output
Table 7.8 EXCEL regression output
The values of b1, sb1, and the test statistic value tn−2 are labeled in Table 7.8.
Using the test-statistic value, the hypothesis test for the significance
of regression can be conducted. This test is explained here using the com-
puter results. The appropriate hypotheses for the test are:
H0: β1 = 0
H1: β1 ≠ 0
The null hypothesis states that the slope of the regression line is zero.
Thus, if the regression is significant, the null hypothesis must be rejected.
A convenient way of testing the above hypotheses is to use the p-value
approach. The test statistic value t n − 2 and the corresponding p values are
reported in the regression output Table 7.8. Note that the p value is very
close to zero (p = 2.92278E-19). If we test the hypothesis at a 5 percent
level of significance (α = 0.05) then p = 0.000 is less than α = 0.05 and
we reject the null hypothesis and conclude that the regression is signifi-
cant overall.
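A quick way to reproduce this slope t-test in Python is sketched below; scipy's linregress reports the two-sided p-value for H0: β1 = 0 directly. The x and y arrays are hypothetical stand-ins, not the book's data set.

```python
# A minimal sketch of the t-test for the slope; the data are hypothetical.
import numpy as np
from scipy import stats

x = np.array([600, 720, 810, 900, 650, 780, 880, 940], dtype=float)  # hypothetical
y = np.array([12.5, 13.6, 14.3, 15.2, 12.9, 14.1, 15.0, 15.5])       # hypothetical

result = stats.linregress(x, y)
print(f"b1 = {result.slope:.5f}, p-value = {result.pvalue:.4g}")
# reject H0 (regression is significant) when the p-value is below alpha = 0.05
```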
Refer to the Regression Analysis part. In this table, the regression equation is printed as Hours(y) = 6.62 + 0.00964 Units(x). This is the equation of the best-fitting line obtained using the least squares method. Just below the regression equation, a table is printed that describes the model in more detail. The values under the Coef column are the regression coefficients b0 and b1, where b0 is the y-intercept or constant and b1 is the slope of the regression line. Under Predictor, the value of Units (x) is 0.0096388, which is b1 (the slope of the fitted line). The Constant is 6.6209. These values form the regression equation.
1. The regression equation, or the equation of the "best" fitting line, is ŷ = 6.62 + 0.00964x. The line is fitted such that the sum of the squares of the vertical distances from each of the points to the line is a minimum. The error or residual is the vertical distance of each point from the estimated line. Figure 7.12 shows the least squares line and the residuals. The residual for a point is given by (y − ŷ), which is the vertical distance of the point from the estimated line.
R-Sq = 94.6%
The residual plots include the normal probability plot, the histogram of residuals, the residuals versus the fitted values, and the residuals vs. the order of data. The residuals can also be plotted against each of the independent variables.
Figures 7.13a and 7.13b are used to check the normality assumption. The regression model assumes that the errors are normally distributed with mean zero. Figure 7.13a shows the normal probability plot, which is used to check the normality assumption of the regression model. In this plot, if the plotted points lie on or close to a straight line, then the residuals or errors are normally distributed. The pattern of points appears to fall on a straight line, indicating no violation of the normality assumption.
Figure 7.13b shows the histogram of residuals. If the normality as-
sumption holds, the histogram of residuals should look symmetrical or
approximately symmetrical. Also, the histogram should be centered at
zero because the sum of the residuals is always zero. The histogram of
residuals is approximately symmetrical which indicates that the errors ap-
pear to be approximately normally distributed. Note that the histogram
may not be exactly symmetrical. We would like to see a pattern that is
symmetrical or approximately symmetrical.
In Figure 7.13c, the residuals are plotted against the fitted values. This plot is used to check the assumption of linearity. The points in this plot should be scattered randomly around the horizontal line drawn through the zero residual value for the linear model to be valid. As can be seen, the residuals are randomly scattered about the horizontal line, indicating that the relationship between x and y is linear.
The plot of residual vs. the order of the data shown in Figure 7.13d is
used to check the independence of errors.
The independence of errors can be checked by plotting the errors or
the residuals in the order or sequence in which the data were collected.
The plot of residuals vs. the order of data should show no pattern or ap-
parent relationship between the consecutive residuals. This plot shows
no apparent pattern indicating that the assumption of independence of
errors is not violated.
Note that checking the independence of errors is more important when the data were collected over time. Data collected over time may sometimes show an autocorrelation effect among successive data values.
Figure 7.13 Plots for residual analysis
The mathematical form of the multiple linear regression model relating the dependent variable y and two or more independent variables x1, x2, ..., xk with the associated error term is given by:

y = β0 + β1x1 + β2x2 + β3x3 + ... + βkxk + ε   (7.16)

E(y) = β0 + β1x1 + β2x2 + β3x3 + ... + βkxk   (7.17)

The above equation relating the mean value of y and the k independent variables is known as the multiple regression equation. It is important to note that β0, β1, β2, ..., βk are the unknown population parameters, or regression coefficients, and they must be estimated using the sample data to obtain the estimated equation of multiple regression. The estimated regression coefficients are denoted by b0, b1, b2, ..., bk. These are the point estimates of the parameters β0, β1, β2, ..., βk. The estimated multiple regression equation using the estimates of the unknown population regression coefficients can be written as:

ŷ = b0 + b1x1 + b2x2 + b3x3 + ... + bkxk   (7.18)
y = β0 + β1x1 + β2x2 + ε   (7.19)
Figure 7.14 Scatter plot and regression plane with two independent
variables
The observed data points are shown as dots. The stars on the regression plane indicate the corresponding points that have identical values for x1 and x2. The vertical distances from the observed points to the points on the plane are shown using vertical lines. These vertical lines are the errors. The error for a particular point yi is denoted by (yi − ŷi), where the estimated value ŷ is calculated using the regression equation ŷ = b0 + b1x1 + b2x2 for given values of x1 and x2.
The least squares criterion requires that the sum of the squares of the errors,

∑(y − ŷ)²

be minimized, where y is the observed value and ŷ is the estimated value of the dependent variable given by ŷ = b0 + b1x1 + b2x2.
[Note: The terms independent, or explanatory variables, and the predictors have the
same meaning and are used interchangeably in this chapter. The dependent variable
is often referred to as the response variable in multiple regression.]
Similar to the simple regression, the least squares method uses the
sample data to estimate the regression coefficients b0 , b1 , b2 ,.. bk and
hence the estimated equation of multiple regression. Figure 7.15 shows
the process of estimating the regression coefficients and the multiple re-
gression equation.
Figure 7.15 Process of estimating the multiple regression equation
y = b0 + b1x1 + b2x2   (7.20)

The graph of the first-order model is shown in Figure 7.16. This graph with two independent quantitative variables x1 and x2 plots a plane in a three-dimensional space. The plane plots the value of y for every combination (x1, x2); this corresponds to the points in the (x1, x2) plane.

The first-order model with two quantitative variables x1 and x2 is based on the assumption that there is no interaction between x1 and x2. This means that the effect on the response y of a change in x1 (for a fixed value of x2) is the same regardless of the value of x2, and the effect on y of a change in x2 (for a fixed value of x1) is the same regardless of the value of x1.
For simple regression analysis in the previous sections, we presented both the manual calculations and the computer analysis of the problem. Most of the concepts we discussed for simple regression also apply to multiple regression; however, the computations for multiple regression are more involved and require the use of matrix algebra and other mathematical concepts that are beyond the scope of this text. Therefore, in this chapter, we have provided computer analysis of the multiple linear regression models using EXCEL and MINITAB. This section provides examples with computer instructions and analysis of the computer results. The assumptions and the interpretation of the multiple linear regression models are similar to those of the simple linear regression. As we provide the analysis, we will point out the similarities and the differences between the simple and multiple regression models.
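For readers working outside EXCEL and MINITAB, a sketch of how a multiple regression could be run in Python with statsmodels appears below. The three predictors mirror the heating-cost example that follows, but all the numbers here are hypothetical.

```python
# A minimal sketch of a multiple linear regression; all data are hypothetical.
import numpy as np
import statsmodels.api as sm

# hypothetical predictors: average temperature, house size, furnace age
X = np.array([[35, 2.0, 6], [29, 2.4, 10], [36, 1.8, 3],
              [60, 2.2, 9], [65, 1.6, 6], [30, 2.6, 5],
              [10, 2.5, 8], [7, 1.9, 12], [21, 2.1, 7]], dtype=float)
y = np.array([250, 360, 165, 43, 92, 200, 355, 290, 230], dtype=float)  # heating cost

X = sm.add_constant(X)           # adds the intercept term b0
model = sm.OLS(y, X).fit()       # least squares estimates of b0, b1, b2, b3
print(model.summary())           # coefficients, t-tests, F-test, R-squared
```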
E(y) = β0 + β1x1 + β2x2 + β3x3 + ... + βkxk

y = b0 + b1x1 + b2x2 + b3x3 + b4x4 + b5x5   (7.21)
and the house size (x2). The scatterplots showing the relationship between
the pairs of independent variables are obtained from columns 2 and 3 of
the matrix plot. The matrix plot is helpful in visualizing the interaction
relationships. For fitting the first order model, a plot of y versus each x is
adequate.
The matrix plots in Figures 7.17 and 7.18 show a negative association
or relationship between the heating cost (y) and the average temperature
(x1) and a positive association or relationship between the heating cost (y)
and the other two explanatory variables: house size (x2) and the age of the
furnace (x3). All these relationships are linear indicating that all the three
explanatory variables can be used to build a multiple regression model.
Constructing the matrix plot and investigating the relationships between
the variables can be very helpful in building a correct regression model.
y = b0 + b1x1 + b2x2 + b3x3

where
Table 7.10 and the data file HEAT_COST.MTW show the data for this problem. We used MINITAB to run the regression model for this problem.
Table 7.11 shows the results of running the multiple regression problem using MINITAB. In this table, we have marked some of the calculations (e.g., b0, b1, sb0, sb1, etc.) for clarity and explanation; these are not part of the computer output. The regression computer output has two parts: Regression Analysis and Analysis of Variance.
y = b0 + b1x1 + b2x2 + b3x3

where y is the response variable (heating cost); x1, x2, and x3 are the independent variables as described above; and the regression coefficients b0, b1, b2, b3 are stored under the column Coef. In the regression equation these coefficients appear in rounded form. The regression equation, which can be stated in the form of equation (7.22) or (7.23), is the estimated regression equation relating the heating cost to the three independent variables.
• b1 = −1.65 means that for each unit increase in the average tem-
perature (x1), the heating cost y (in dollars) can be predicted to go
down by 1.65 (or, $1.65) when the house size (x2), and the age of
the furnace (x3) are held constant.
• b2 = +57.5 means that for each unit increase in the house size (x2
in thousands of square feet), the heating cost, y (in dollars) can be
predicted to go up by 57.5 when the average temperature (x1) and
the age of the furnace (x3) are held constant.
• b3 = + 7.91 means that for each unit increase in the age of the furnace
(x3 in years), the heating cost y can be predicted to go up by $7.91 when
the average temperature (x1) and the house size (x2) are held constant.
s = 37.32 dollars

The standard error of the estimate is used to check the utility of the model and to provide a measure of the reliability of predictions made from the model. One interpretation of s is that the interval ±2s provides an approximation to the accuracy with which the regression model will predict the future value of the response y for given values of the independent variables. Thus, for our example, we can expect the model to provide predictions of heating cost (y) to within approximately ±2(37.32), or ±$74.64.
r² = 88.0%

The value r² = 88.0% for our example implies that, using the three independent variables (average temperature, size of the house, and age of the furnace) in the model, 88.0 percent of the total variation in heating cost (y) can be explained. The statistic r² tells how well the model fits the data and thus provides a measure of the overall predictive usefulness of the model. The value of adjusted R² is also used in comparing two regression models that have the same response variable but different numbers of independent variables or predictors.
Recall that in simple regression analysis, we conducted the test for significance using a t-test and an F-test. Both of these tests in simple regression provide the same conclusion: if the null hypothesis is rejected, we conclude that the slope is not zero, or β1 ≠ 0. In multiple regression, the t-test and the F-test have somewhat different interpretations. These tests have the following objectives:

The F-test in multiple regression is used to test the overall significance of the regression. This test is conducted to determine whether a significant relationship exists between the response variable y and the set of independent variables, or predictors, x1, x2, ..., xk.
F-Test

For the multiple regression model y = b0 + b1x1 + b2x2 + ... + bkxk, the test statistic is

F = MSR/MSE   (7.25)

For the overall significance of regression, the null and alternate hypotheses are:

H0: β1 = β2 = ... = βk = 0 (no relationship)
H1: at least one of the βj is not zero   (7.26)

and the test statistic is

F = MSR/MSE   (7.27)
The degrees of freedom (DF) for regression and error are k and n − (k + 1), respectively, where k is the number of independent variables (k = 3 for our example) and n is the number of observations (n = 30). Also, the total sum of squares (SST) is partitioned into two parts, the sum of squares due to regression (SSR) and the sum of squares due to error (SSE), with the relationship SST = SSR + SSE.

We have labeled the SST, SSR, and SSE values in Table 7.12. The mean square due to regression (MSR) and the mean square due to error (MSE) are calculated using the following relationships:

MSR = SSR/k   MSE = SSE/[n − (k + 1)]

The test statistic value, or the F statistic, from the ANOVA table (see Table 7.12) is

F = 63.62

The calculated F statistic value is 63.62. Since F = 63.62 > Fcritical = 2.74, we reject the null hypothesis stated in equation (7.26) and conclude that
the regression is significant overall. This indicates that there exists a significant relationship between the dependent and independent variables. The hypothesis stated in equation (7.26) can also be tested using the p-value approach. The decision rule using the p-value approach is:

If p ≥ α, do not reject H0
If p < α, reject H0

From Table 7.12, the calculated p-value is 0.000 (see the P column). Since p = 0.000 < α = 0.05, we reject the null hypothesis H0 and conclude that the regression is significant overall.
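The p-value for this F-test can also be computed directly, as sketched below; the snippet uses the F statistic and degrees of freedom quoted above and scipy's F-distribution survival function.

```python
# A minimal sketch of the F-test decision, using the values quoted in the text
# (F = 63.62, k = 3 independent variables, n = 30 observations).
from scipy import stats

k, n = 3, 30
f_stat = 63.62
p_value = stats.f.sf(f_stat, k, n - (k + 1))  # upper-tail area of F(3, 26)
print(f"p = {p_value:.3e}")                   # far below alpha = 0.05: reject H0
```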
t-Tests

H0: βj = 0
H1: βj ≠ 0   (7.28)

This hypothesis test also helps to determine whether the model can be made more effective by deleting certain independent variables or by adding extra variables. The information needed to conduct the hypothesis test for each of the independent variables is contained in the Regression Analysis part of the computer output, which is reproduced in Table 7.13 below. The columns labeled T and P are used to test the hypotheses. Since there are three independent variables, we will test whether each of the three variables is significant; that is, whether each of the independent variables contributes to the prediction of y. The hypotheses to be tested and the test procedure are explained below. We will use a significance level of α = 0.05 for testing each of the independent variables.
The test statistic is

t = b1/sb1   (7.30)

where b1 is the estimate of the slope β1 and sb1 is the estimated standard deviation of b1.

Step 3: Determine the value of the test statistic

The values of b1, sb1, and t are all reported in the Regression Analysis part of Table 7.13. From this table, the values for the variable x1, the average temperature (Avg. Temp.), give

t = b1/sb1 = −1.6457/0.6967 = −2.36
Step 4: Determine the critical value. The critical value is

tα/2, [n−(k+1)]

which is the t-value from the t-table for [n − (k + 1)] degrees of freedom and α/2, where n is the number of observations (n = 30), k is the number of independent variables (k = 3), and α is the level of significance (0.05 in this case). Thus, the critical value is t0.025,26 = 2.056.
Step 5: Specify the decision rule. The decision rule for the test:

If p ≥ α, do not reject H0
If p < α, reject H0   (7.31)

From Table 7.14, the p-value for the variable average temperature (Avg. Temp., x1) is 0.026. Since p = 0.026 < α = 0.05, we reject H0 and conclude that the variable average temperature (x1) is a significant variable.
When a series of t-tests is conducted, there is a chance that the null hypothesis will be rejected incorrectly at least once, leading to the conclusion that a β differs from 0. Thus, in multiple regression models where a large number of independent variables are involved and a series of t-tests is conducted, there is a chance of including a large number of insignificant variables and excluding some useful ones from the model. In order to assess the utility of a multiple regression model, we need to conduct a test that includes all the β parameters simultaneously. Such a test would test the overall significance of the multiple regression model. Another useful measure of the utility of the model is a statistical quantity, such as R², that measures how well the model fits the data.
A Note on Checking the Utility of a Multiple Regression Model
(Checking the Model Adequacy)
H0: β1 = β2 = ... = βk = 0 (no relationship)
H1: at least one of the βj is not zero
Effects of Multicollinearity
A) Consider a regression model where the production cost (y) is related
to three independent variables: machine hours (x1), material cost (x2),
and labor hours (x3):
y = β0 + β1x1 + β2x2 + β3x3
Detecting Multicollinearity
Several methods are used to detect the presence of multicollinearity in
regression. We will discuss two of them.
One approach is to compute the variance inflation factor (VIF) for each predictor variable, which measures how much the variance of the estimated regression coefficient is inflated compared with the case in which the predictor variables are not linearly related. Use the guidelines in Table 7.16 to interpret the VIF.
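A sketch of computing VIFs in Python appears below; column j's VIF equals 1/(1 − Rj²), where Rj² comes from regressing predictor j on the remaining predictors. The predictor matrix is hypothetical.

```python
# A minimal sketch of the VIF computation; the predictor matrix is hypothetical.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = np.array([[12, 340, 55], [15, 410, 62], [11, 330, 50],
              [18, 480, 75], [14, 390, 60], [16, 450, 68]], dtype=float)
X = sm.add_constant(X)           # include the intercept column

for j in range(1, X.shape[1]):   # skip the constant column
    print(f"VIF for x{j}: {variance_inflation_factor(X, j):.2f}")
```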
y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ   (7.32)

In the above equation, n is an integer and b0, b1, ..., bn are unknown parameters that must be estimated.
A) First-Order Model
The first-order model is given by:

y = b0 + b1x

or, with multiple variables,

y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn   (7.33)

B) Second-Order (Quadratic) Model

y = b0 + b1x + b2x²   (7.34)

C) Third-Order Model
A third-order model can be written as:

y = b0 + b1x + b2x² + b3x³   (7.35)

where b0 is the y-intercept and b3 controls the rate of reversal of the curvature of the curve.
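A quadratic model like equation (7.34) can be fitted in one line with numpy, as sketched below; the temperature-life values are hypothetical stand-ins for the component-life data discussed next.

```python
# A minimal sketch of fitting the second-order model (7.34); data are hypothetical.
import numpy as np

temp = np.array([60, 70, 80, 90, 100, 110, 120], dtype=float)      # hypothetical
life = np.array([380, 290, 230, 195, 185, 200, 240], dtype=float)  # hypothetical

# np.polyfit returns coefficients highest power first: [b2, b1, b0]
b2, b1, b0 = np.polyfit(temp, life, deg=2)
print(f"y-hat = {b0:.2f} + {b1:.2f} x + {b2:.4f} x^2")
```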
Figure 7.21 Scatter plot of life (y) vs. operating temperature (x)
A second order model was fitted using MINITAB. The regression output
of the model is shown in Table 7.20.
A quadratic model in MINITAB can also be run using the fitted line
plot option. The results of the quadratic model using this option provide
a fitted line plot (shown in Figure 7.22).
While running the quadratic model, the data values and residuals can be stored, and plots of the residuals can be created.
Figure 7.23 shows the residual plots for this quadratic model. The residual plots are useful in checking the assumptions of the model and the model adequacy. The analysis of residual plots for this model is similar to that of the simple and multiple regression models. The investigation of the plots shows that the normality assumption is met. The plot of residuals versus the fitted values shows a random pattern, indicating that the quadratic model fitted to the data is adequate.

Figure 7.23 Residual plots for the quadratic model example
y = b0 + b1x + b2x²

In the EXCEL output, the prediction equation can be read from the Coefficients column.
The r² value is 95.9 percent, which indicates a strong model. It means that 95.9 percent of the variation in y can be explained by the variation in x, and 4.1 percent of the variation is unexplained or due to error. The equation can be used to predict the life of the components at a specified temperature.
We can also test a hypothesis to determine whether the second-order term in our model in fact contributes to the prediction of y. The null and alternate hypotheses to be tested can be expressed as

H0: β2 = 0
H1: β2 ≠ 0   (7.36)
Table 7.21 EXCEL computer output for the quadratic model
Summary Output

Regression Statistics
Multiple R           0.97947
R Square             0.95936
Adjusted R Square    0.95567
Standard Error       5.37620
Observations         25

ANOVA
             df    SS            MS           F          Significance F
Regression    2    15,011.7720   7,505.8860   259.6872   0.0000
Residual     22    635.8784      28.9036
Total        24    15,647.6504

            Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept   433.0063       61.8367          7.0024    0.0000    304.7648    561.2478
Temp. (x)   −8.8908        1.3743           −6.4691   0.0000    −11.7410    −6.0405
x**2        0.0598         0.0075           7.9251    0.0000    0.0442      0.0755
The test statistic is

t = b2/sb2 = 0.0598/0.0075 = 7.93
We reject the null hypothesis and conclude that the second-order term does in fact contribute to the prediction of the life of the components (y). Note that we could instead have tested the hypotheses

H0: β2 = 0
H1: β2 > 0

which would determine whether the value b2 = 0.0598 in the prediction equation is large enough to conclude that the life of the components increases at an increasing rate with temperature. This hypothesis has the same test statistic and can be tested at α = 0.05.
Figure 7.24 Fitted line plot showing the yield of a chemical process
A dummy or indicator variable takes only two values: x1 = 1 or x1 = 0.
y = b0 + b1x1

where
x1 = 1 if male; 0 if female

This coding scheme allows us to compare the mean salaries for male and female employees by substituting the appropriate code into the regression equation y = b0 + b1x1.
Thus, the mean salary for the female employees is b0. In a 0-1 coding
system, the mean response will always be b0 for the qualitative variable
that is assigned the value 0.This is also called the base level.
The difference in the mean salary for the male and female employees
can be calculated by taking the difference (µM − µF)
The above is the difference between the mean response for the level
that is assigned the value 1 and the level that is assigned the value 0 or the
base level. The mean salary for the male and female employees is shown
graphically in Figure 7.25. We can also see that
b0 = µF
b1 = µM − µF
y = b0 + b1x1 + b2x2

where
x1 = 1 if location B, 0 if not
x2 = 1 if location C, 0 if not
The variables x1 and x2 are known as dummy variables; they allow a single regression equation to represent all three locations.
µA = y = b0 + b1(0) + b2(0)
or, µA = b0

µB = y = b0 + b1x1 + b2x2 = b0 + b1(1) + b2(0)
or, µB = b0 + b1

Since µB = µA + b1, we have b1 = µB − µA

µC = y = b0 + b1x1 + b2x2 = b0 + b1(0) + b2(1)
or, µC = b0 + b2

Since µC = µA + b2, we have b2 = µC − µA
µA = b0
µB = b0 + b1, so that b1 = µB − µA
µC = b0 + b2, so that b2 = µC − µA
where µA, µB, µC are the mean profits for locations A, B, and C.
Note that the three levels of the qualitative variable can be described with only
two dummy variables. This is because the mean of the base level (in this case
location A) is accounted for by the intercept b0. In general, for m levels of a qualitative variable, we need (m − 1) dummy variables.
The bar graph in Figure 7.26 shows the values of mean profit (y) for
the three locations.
Figure 7.26 Bar chart showing the mean profit for three locations A,
B, C
In the above bar chart, the height of the bar corresponding to location
A is y = b0. Similarly, the heights of the bars corresponding to locations
B and C are y = b0 + b1 and y = b0 + b2 respectively. Note that either b1 or
b2, or both could be negative. In Figure 7.26, b1 and b2 are both positive.
y = b0 + b1x1 + b2x2 + b3x3 + b4x4 + b5x5

where
x4 = 1 if zone A, 0 otherwise
x5 = 1 if zone B, 0 otherwise
Table 7.23 shows the data file for this regression model with the dummy variables. The data can be analyzed using the MINITAB data file [DummyVar_File(2)] or the EXCEL data file [DummyVar_File (2).xlsx].
We used both MINITAB and EXCEL to run this model. The MINITAB and EXCEL regression outputs and results are shown in Tables 7.24 and 7.25. Refer to the computer results to answer the following questions.
Table 7.23 Data file for the model with dummy variables
Row  Volume (y)  Advertisement (x1)  Commission (x2)  No. of Salespersons (x3)  Zone A (x4)  Zone B (x5)
1 973.62 580.17 235.48 8 1 0
2 903.12 414.67 240.78 7 1 0
3 1,067.37 420.48 276.07 10 1 0
4 1,193.37 454.59 295.70 14 0 1
5 1,429.62 524.05 286.67 16 0 0
6 1,557.87 623.77 325.66 18 1 0
7 1,590.12 641.89 298.82 17 1 0
8 1,081.62 403.03 210.19 12 0 0
9 1,088.37 415.76 202.91 13 0 0
10 1,132.62 506.73 275.88 11 0 1
11 1,314.87 490.35 337.14 15 1 0
12 1,562.37 624.24 266.30 19 0 0
13 1,050.12 459.56 240.13 10 0 0
14 1,055.37 447.03 254.18 12 0 1
15 1,112.37 493.96 237.49 14 0 1
16 1,235.37 543.84 276.70 16 0 1
17 1,518.12 618.38 271.14 18 1 0
18 1,574.37 690.50 281.94 15 0 0
19 1,644.87 591.27 316.75 20 0 0
20 1,169.37 530.73 297.37 10 0 0
21 1,212.87 541.34 272.77 13 0 1
22 1,304.37 492.20 344.35 11 0 1
23 1,477.62 546.34 295.53 15 0 0
24 1,593.87 590.02 293.79 19 0 0
25 1,134.87 505.32 277.05 11 0 1
A) Using the EXCEL data file, run a regression model. Show your regres-
sion output.
B) Using the MINITAB or EXCEL regression output, write down the
regression equation.
C) Using a 5 percent level of significance and the column "p" in the MINITAB regression output or the "p-value" column in the EXCEL regression output, conduct appropriate hypothesis tests to determine whether the independent variables advertisement, commission, and number of salespersons are significantly related to the sales volume.
Solution:
A) The MINITAB regression output is shown in Table 7.24.
B) Table 7.25 shows the EXCEL regression output.
C) From the MINITAB or the EXCEL regression outputs in Tables 7.24
and 7.25, the regression equation is:
The regression equation from the EXCEL output in Table 7.25 can be
written using the coefficients column.
The above hypothesis can be tested using the “p” column in either
MINITAB or the p-value column in EXCEL computer results. The deci-
sion rule for the p-value approach is given by
If p ≥ α, do not reject H0
If p < α, reject H0
Table 7.26 shows the p-value for each of the predictor variables. From
MINITAB or EXCEL computer results in Table 7.24 or 7.25 (see the “p”
or the “p-value” columns in these tables).
From the above table it can be seen that all three independent variables are significant.
(E) As indicated, the overall regression equation is
Separate equations for each zone can be written from this equation.
Regression Analysis and Modeling 191
Zone A: x4 = 1.0, x5 = 0
Therefore, the equation for the sales volume of Zone A can be written as
Similarly, the regression equations for the other two zones are shown
below.
Zone B: x4 = 0, x5 = 1.0
Substituting these values in the overall regression equation of part (c)
Zone C: x4 = 0, x5 = 0
Substituting these values in the overall regression equation of part (c)
Note that in all of the above equations, the slopes are the same but the intercepts are different.
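For readers working outside MINITAB and EXCEL, here is a rough Python equivalent using pandas and statsmodels. Only the first eight rows of Table 7.23 are embedded to keep the sketch short; run it on the full data file to reproduce Tables 7.24 and 7.25.

import pandas as pd
import statsmodels.api as sm

# First eight rows of Table 7.23 for illustration; use the full
# DummyVar_File (2).xlsx (25 rows) to reproduce the text's results.
data = pd.DataFrame({
    "y":  [973.62, 903.12, 1067.37, 1193.37, 1429.62, 1557.87, 1590.12, 1081.62],
    "x1": [580.17, 414.67, 420.48, 454.59, 524.05, 623.77, 641.89, 403.03],
    "x2": [235.48, 240.78, 276.07, 295.70, 286.67, 325.66, 298.82, 210.19],
    "x3": [8, 7, 10, 14, 16, 18, 17, 12],
    "x4": [1, 1, 1, 0, 0, 1, 1, 0],   # dummy: 1 if zone A, 0 otherwise
    "x5": [0, 0, 0, 1, 0, 0, 0, 0],   # dummy: 1 if zone B, 0 otherwise
})

X = sm.add_constant(data[["x1", "x2", "x3", "x4", "x5"]])
model = sm.OLS(data["y"], X).fit()
print(model.params)     # b0 through b5 of the overall equation
print(model.pvalues)    # p-values for the significance tests in part (C)

# Zone equations: substitute the dummy codes into the fitted equation.
# Zone A: x4 = 1, x5 = 0;  Zone B: x4 = 0, x5 = 1;  Zone C: x4 = 0, x5 = 0.
# Only the intercept changes; the slopes stay the same.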
y = β0 + β1x1 + β2x2 + … + βkxk + ε
Models with Dummy Variables: General form of a model with one qualitative (dummy) independent variable at m levels:
y = b0 + b1x1 + b2x2 + … + b(m−1)x(m−1)

All Subset and Stepwise Regression: Finding the best set of predictor variables to be included in the model.
Note: the interaction models and all subset regression are not discussed in this chapter. There are other regression models that are not discussed here but can be developed using the concepts presented for the models above.
Chapter Highlights
• Introduction to Forecasting
• Forecasting Methods: An Overview
◦ Qualitative Forecasting
Introduction to Forecasting
Forecasting and time series analysis are major tools of predictive analytics.
Forecasting involves predicting future business outcomes using a number
of qualitative and quantitative methods. In this chapter we discuss prediction techniques using forecasting and time series data. Many business planning decisions (production, operations, sales, demand, and inventory) are based on forecasting. We discuss here a broad range of forecasting applications and a number of models. A forecast is a statement
about the future value of a variable of interest such as demand. Forecast-
ing is used to make informed decisions and may be divided into:
• Long range
• Short range
Associative Forecasting
Associative forecasting methods use explanatory variables to predict the
future. These methods use one or several independent variables or factors
to predict the response variable. Regression methods using simple, mul-
tiple, nonlinear regression models and also indicator variables are some of
the methods used in this category. In this chapter, we will mainly focus on
quantitative forecasting methods.
Features of Forecasts
• Forecasts are not exact and are rarely perfect because of randomness.
Also more than one forecasting method can often be used to forecast
the same data. They all produce different results. The forecast accuracy
differs based on the methods used. Applying the correct forecasting technique is critical to achieving good forecasts. Some forecasting techniques are more complex than others. Applying the correct forecasting method requires experience and knowledge of the process.
• Forecast accuracy depends on the randomness and noise present
in the data.
• Forecast accuracy decreases as the time horizon increases.
Trend
Seasonal
These are the time series where the variable of interest shows a combin-
ation of a trend and seasonal pattern. Forecasting this type of pattern
requires a technique that can deal with both trend and seasonality and can
be achieved through time series decomposition to separate or decompose
a time series into trend and seasonal components. The methods to forecast
trend and seasonal patterns are usually more involved computationally.
Cyclical
Random Fluctuations
Random fluctuations are the result of chance variation and may be a com-
bination of constant fluctuations followed by trends. An example would
be the demand for electricity in summer. These patterns require special
forecasting techniques and are often complex in nature.
Usually the first step in forecasting is to plot the historical data. This
is critical in identifying the pattern in the time series and applying the
correct forecasting method. If the data are plotted over time, such plots
are known as time series plots. This plot involves plotting the time on the
horizontal axis and the variable of interest on the vertical axis. The time
series plot is a graphical representation of data over time where the data
may be weekly, monthly, quarterly, or annually. Some of the common
time series patterns are shown in Figures 8.1 through 8.7.
Figure 8.1 shows that the demand data fluctuate around an average. Averaging techniques such as Simple Moving Average or Simple Exponential Smoothing can be used to forecast such patterns. Figure 8.2 shows the actual data and the forecast for Figure 8.1.
Figure 8.2 Forecast for the demand data in Figure 8.1 (forecasts are
dotted lines)
Figure 8.3 shows the sales data for a company over a period of 65 weeks. Clearly, the data are fluctuating around an average and showing an increasing trend. Forecasting techniques such as Double Moving Average or Exponential Smoothing with a trend can be used to forecast such patterns. Figure 8.4 shows the sales and forecast for the data in Figure 8.3. Figure 8.5 shows a seasonal pattern.
Figure 8.4 Forecast for the sales data in Figure 8.3 using double
moving average
The other class of models is based on regression. Figure 8.6 shows the
relationship between two variables—summer temperature and electricity
used. There is a clear indication that there exists a linear relationship be-
tween the two variables. Such a relationship between the variables enables
us to use regression models where one variable can be predicted using the
other variable. We have explained the regression models in the previous
chapter. Figure 8.7 shows a nonlinear relationship (quadratic model). A
nonlinear or quadratic model as explained in the previous chapter can be
used in such cases to predict the response variable (yield in this case) using
the independent variable (temperature).
The forecast accuracy is related to the forecast error, which is defined as: Forecast Error = Actual − Forecast.
Mean Error
Mean or the average forecast error is the simplest measure of forecast ac-
curacy. Since the error can be positive or negative, the positive and nega-
tive forecast errors tend to offset one another, resulting in a small value of
the mean error. Therefore, mean forecast error is not a very useful measure.
Mean Absolute Deviation (MAD)

The mean absolute error (MAE) is also known as the mean absolute deviation (MAD). It is the mean of the absolute values of the forecast errors. This avoids the problem of the positive and negative errors offsetting one another. The MAD can be calculated as:

MAD = Σ |Actual − Forecast| / n
MAD shows the average size of the error (or average deviation of
forecast from the actual data). Note that n is the number of forecasts
generated.
Mean Squared Error (MSE)

This is another measure of forecast error that avoids the problem of positive and negative errors offsetting one another. It is the average of the squared forecast errors (mean squared error, MSE) and is calculated using:

MSE = Σ (Actual − Forecast)² / n
The MAE (or MAD) and the MSE depend upon the scale of the data. This makes it difficult to compare the error for different time intervals. The mean absolute percentage error (MAPE) provides a relative or percent error measure that makes the comparison easier. The MAPE is the average of the absolute percentage forecast errors and is calculated using:

MAPE = [Σ (|Actual − Forecast| / Actual) × 100] / n
Tracking Signal

Tracking Signal = Σ (Actual − Forecast) / MAD
Bias is the persistent tendency for the forecasts to be greater or smaller than the actual values. It indicates whether the forecast is typically too low or too high, and by how much. Thus, the bias shows the average total error and its direction.
Tracking signal uses both bias and MAD and can also be calculated as:

Tracking Signal = Bias / MAD
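These accuracy measures are easy to compute directly. The following is a minimal Python sketch of MAD, MSE, MAPE, bias, and the tracking signal as defined above; following the worked examples in this chapter, MSE divides by the number of forecasts n.

def forecast_errors(actual, forecast):
    """MAD, MSE, MAPE (percent), bias, and tracking signal, following
    the definitions above; MSE divides by the number of forecasts n,
    as the worked examples in this chapter do."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = sum(abs(e) / a for e, a in zip(errors, actual)) * 100 / n
    bias = sum(errors)                          # running sum of forecast errors
    ts = bias / mad if mad else float("nan")    # tracking signal
    return mad, mse, mape, bias, ts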
Forecasting Methods
Naïve Forecasting Method
This method uses the most recent observation in the time series as the forecast for the next time period and generates short-term forecasts.
The weekly demand (for the past 21 weeks) for a particular brand
of cell phone is shown in Table 8.1. We will use the naïve forecasting
method to forecast one week ahead and calculate the forecast accuracy
by calculating the errors. The data and the forecast along with the fore-
cast errors, absolute errors, squared errors, and absolute percent errors are
shown in Table 8.1.
Note that this method uses the most recent observation in the time
series as the forecast for the next time period. Thus, the forecast for the
next period
X̂t+1 = Actual value in period t
Using the values from the Total row in Table 8.1, we can calculate the
forecast accuracies or errors as shown in Table 8.2.
MSE = 88,373 / 20 = 4,418.65

MAPE = 500.56 / 20 = 25.03%
The above measures are used in selecting the forecasting method for the data by comparing them with the measures calculated using other methods. Usually, a smaller MAD or MAPE is an indication of a better forecast.
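A short, self-contained sketch of the naive method in Python; the demand values are the first ten weeks used in the moving average calculations later in this chapter (Table 8.5).

# Weekly demand; the first ten values that appear in the moving average
# sample calculations (Table 8.5) serve as an illustration here.
demand = [158, 222, 248, 216, 226, 239, 206, 178, 169, 177]

actual, forecast = demand[1:], demand[:-1]   # naive: F(t+1) = X(t)
errors = [a - f for a, f in zip(actual, forecast)]
n = len(errors)
print("MAD  =", round(sum(abs(e) for e in errors) / n, 2))
print("MSE  =", round(sum(e * e for e in errors) / n, 2))
print("MAPE =", round(sum(abs(e) / a for e, a in zip(errors, actual)) * 100 / n, 2), "%")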
(a) Simple moving average; (b) weighted moving average; (c) exponential smoothing

The above methods are used for short-range forecasts and are also known as smoothing methods because their objective is to smooth out the random fluctuations in the time series. Computer software is almost always used to study the trend or the time series characteristics of the data. The examples below show the analysis of the class of forecasting techniques that are based on averages.
The weekly demand (for the past 65 weeks) for a particular brand of cell
phone is used to demonstrate the simple moving average method. The
partial data are shown in Table 8.3. The plot of the complete data is shown in Figure 8.8.
Figure 8.9 Plot of actual data and six-period moving average forecast
The three-period moving average forecasts (Figure 8.10) are responding better to the actual data compared to the six-period moving average. Usually, a smaller averaging period will produce a better forecast.
MT = (XT + XT−1 + XT−2 + … + XT−N+1) / N    (8.1)
General Equation:

MT = MT−1 + (XT − XT−N) / N    (8.2)

X̂T+τ(T) = MT    (8.3)
Sample Calculations
Refer to the first 15 values of demand from Table 8.5 for sample calculation.
MT = (XT + XT−1 + XT−2 + … + XT−N+1) / N

M6 = (X6 + X5 + X4 + X3 + X2 + X1) / 6 = (239 + 226 + 216 + 248 + 222 + 158) / 6 = 218.17
In Table 8.5: Week is the time period, Demand is the actual demand XT, MA is the moving average, Forecast is the one-period-ahead forecast, and Error is the difference between the actual and the forecast values. Using equation (8.2), calculate the moving averages. Note that you need to use equation (8.1) once.
MT = MT−1 + (XT − XT−N) / N

Set T = 7, N = 6:

M7 = M6 + (X7 − X1) / 6 = 218.17 + (206 − 158) / 6 = 226.17
In the computations shown below, note that each time the most re-
cent value is included in the average and the oldest one is discarded. To
calculate the next moving average,
Set T = 8, N = 6:

M8 = M7 + (X8 − X2) / 6 = 226.17 + (178 − 222) / 6 = 218.83

Set T = 9, N = 6:

M9 = M8 + (X9 − X3) / 6 = 218.83 + (169 − 248) / 6 = 205.67

Set T = 10, N = 6:

M10 = M9 + (X10 − X4) / 6 = 205.67 + (177 − 216) / 6 = 199.17
The rest of the moving averages and forecasts are shown in Table 8.5.
Since we calculated a six-period moving average, the forecast for the 7th
period is just the moving average for the 6 periods.
The forecasts for the complete data (with 65 periods) were generated using computer software. Figures 8.9 and 8.10 show the actual data and forecasts plotted on the same graph for the six-period and three-period moving averages for all 65 periods of data. The forecast errors for these two moving average periods are shown in Table 8.4.
MSE = 52,846.44 / 18 = 2,935.91

MAPE = 340.43 / 18 = 18.91%
In the weighted moving average method, unequal weights are assigned to the data values. The more recent observations are given more weight compared to the older observations. The sum of the weights for the data values included in the average is usually 1.0.
In Table 8.8, we used Excel to calculate a 4-period simple moving
average and 4-period weighted moving average forecasts for the 21 per-
iods of sales data in column B. Column C shows the 4-period simple
moving average forecasts and column D shows 4-period weighted mov-
ing average forecasts. The weights used for the four data points are 0.1,
0.2, 0.3, and 0.4 and are denoted using W(1) through W(4) shown in
columns A and B. Columns E to H show the forecast errors and absolute
errors for the simple and weighted 4-period forecasts.
Table 8.8 Four-period simple moving average and weighted moving average
forecasts and errors
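The simple and weighted moving average forecasts of Table 8.8 can be reproduced with a few lines of Python. The sales series below is illustrative; the weights 0.1 through 0.4 match those described above.

def simple_ma(series, n):
    """n-period simple moving averages; the entry for period t is the
    average of periods t-n+1 through t (the forecast for period t+1)."""
    return [sum(series[t - n + 1:t + 1]) / n for t in range(n - 1, len(series))]

def weighted_ma(series, weights):
    """Weighted moving averages; weights should sum to 1.0 with the
    weight for the most recent observation listed last."""
    n = len(weights)
    return [sum(w * x for w, x in zip(weights, series[t - n + 1:t + 1]))
            for t in range(n - 1, len(series))]

sales = [100, 110, 105, 120, 118, 125, 130]        # illustrative sales data
print(simple_ma(sales, 4))
print(weighted_ma(sales, [0.1, 0.2, 0.3, 0.4]))    # W(1)..W(4) as in Table 8.8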
Ft = αAt−1 + (1 − α)Ft−1

where
Ft = forecast for period t (the next period)
Ft−1 = forecast for period t − 1 (the prior period)
At−1 = actual data for period t − 1 (the prior period)
α = smoothing constant, 0 ≤ α ≤ 1
Sample Calculations

Forecast for periods 2 through 5 using the forecasting equation:

Ft = αAt−1 + (1 − α)Ft−1

Note: the smoothing constant is α = 0.1, and the initial forecast (the forecast for the first period) is F1 = 393. The forecasts for periods 2, 3, 4, … are shown below:

… and so on.
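A minimal Python sketch of the smoothing recursion, seeded with F1 = 393 as in the text; the actual values after the first are placeholders.

def exp_smooth(actual, alpha, f1):
    """Simple exponential smoothing: F(t) = alpha*A(t-1) + (1-alpha)*F(t-1)."""
    forecasts = [f1]                      # the forecast for period 1 is given
    for a in actual[:-1]:
        forecasts.append(alpha * a + (1 - alpha) * forecasts[-1])
    return forecasts

# F1 = 393 as in the text; the remaining actual values are placeholders.
sales = [393, 410, 402, 398, 420]
print([round(f, 2) for f in exp_smooth(sales, alpha=0.1, f1=393)])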
Another Example of Simple Exponential Smoothing for Inventory Demand. The operations manager at a company talks to an analyst at company headquarters about forecasting monthly demand for inventory from her warehouse. The analyst suggests that she consider using simple exponential smoothing with a smoothing constant of 0.3. The operations manager decides to use the most recent inventory demand (in thousands of dollars) shown below. From past experience, she decided to use 99.727 as the forecast for the first period. Use simple exponential smoothing with α = 0.3 and F1 = 99.727 to develop the forecasts for months 2 through 11 for the data in Table 8.11. What is the MAD?
The results are shown in Table 8.11. MINITAB statistical software
was used to generate the forecast.
The inventory demand data and the forecast are plotted and shown
in Figure 8.14.
To see the effect of the smoothing constant α on the forecasts, two
sets of forecasts were generated with α = 0.3 and α = 0.1 and accuracy
measures were calculated. These are shown in Table 8.12.
Changing α from 0.3 to 0.1 produced a better forecast with smaller error values: both the MAD and MAPE decreased for the smaller α. There is a way of obtaining an optimal value of the smoothing constant. The forecast using exponential smoothing depends on the value of α; therefore, an optimal value of α is recommended.
The previous forecasting methods were applied to the time series data that
did not show any trend. For the data showing a trend, the simple moving
average method will not provide correct forecasts.
A trend in the time series is identified by a gradual shift or move-
ments to relatively higher or lower values over a period of time. A trend
may be increasing or decreasing and may be linear or nonlinear. Some-
times an increasing or decreasing trend may depict a fluctuation around
an average. Some examples of trend are changes in populations, the sales and revenue of a company, and the demand for a particular technology or consumer item.
Figure 8.15 shows the actual sales and double moving average forecast
for a company for the past 65 weeks (the dotted line represents the
forecast). Table 8.13 shows partial data. The time series clearly shows
an increasing trend. The appropriate method to forecast this pattern
is double moving average or the moving average with a trend. Double
moving average is the average of simple moving average. The forecasting
equation in this method is designed to incorporate both the average and
trend component.
MT[2] = (MT + MT−1 + MT−2 + … + MT−N+1) / N    (8.4)

where
MT[2] = N-period double moving average
N = number of periods in the moving average
T = number of observations
MT = N-period simple moving average

General Equation:

MT[2] = MT−1[2] + (MT − MT−N) / N    (8.5)
X̂T+τ(T) = 2MT − MT[2] + τ [2 / (N − 1)] (MT − MT[2])    (8.6)
MT[2] = (MT + MT−1 + MT−2 + … + MT−N+1) / N

M9[2] = (M9 + M8 + M7 + M6 + M5) / 5 = (45.60 + 46.60 + 48.40 + 48.40 + 45.20) / 5 = 46.84
MT[2] = MT−1[2] + (MT − MT−N) / N
Using this equation, calculate the other double moving average values as
shown in column (5) of Table 8.14.
Set T = 10, N = 5
Set T = 11, N = 5
… and so on.
X̂T+τ(T) = 2MT − MT[2] + τ [2 / (N − 1)] (MT − MT[2])

Forecast for the 10th week using the first 9 periods of data (note that τ is always 1 because of the one-period-ahead forecast):

Set T = 9, τ = 1:

X̂10(9) = 2M9 − M9[2] + [2 / (5 − 1)] (M9 − M9[2])

X̂10(9) = 2(45.60) − 46.84 + (1/2)(45.60 − 46.84) = 43.74 (shown in Table 8.14, column 6)

Set T = 10, τ = 1:

X̂11(10) = 2M10 − M10[2] + [2 / (5 − 1)] (M10 − M10[2])

X̂11(10) = 2(44.60) − 46.72 + (1/2)(44.60 − 46.72) = 41.42

… and so on.
The rest of the forecasts and complete data are shown in Appendix A.
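A Python sketch of the double moving average forecast, implementing equations (8.1), (8.4), and (8.6) for one-period-ahead forecasts (τ = 1); the data series is illustrative.

def double_ma_forecast(x, n):
    """One-period-ahead forecasts (tau = 1) from an n-period double moving
    average: Xhat(T+1) = 2*M(T) - M2(T) + (2/(n-1)) * (M(T) - M2(T))."""
    m = [sum(x[t - n + 1:t + 1]) / n for t in range(n - 1, len(x))]    # eq. (8.1)
    m2 = [sum(m[t - n + 1:t + 1]) / n for t in range(n - 1, len(m))]   # eq. (8.4)
    forecasts = []
    for i, m2_t in enumerate(m2):
        m_t = m[i + n - 1]               # align M(T) with M2(T)
        forecasts.append(2 * m_t - m2_t + (2 / (n - 1)) * (m_t - m2_t))
    return forecasts

series = [41, 44, 47, 46, 48, 49, 47, 50, 51, 48, 52]   # illustrative trending data
print([round(f, 2) for f in double_ma_forecast(series, n=5)])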
An analyst wants to forecast the next-day closing price of XYZ Analytics Inc. common stock. The analyst has obtained the closing stock prices for the past 40 days (see Appendix).
A) Forecast the stock price for days 3 through 41 using a 3-period mov-
ing average and calculate the forecast errors: MAD, MAPE, and MSD.
Plot the actual data and the forecast on one plot. Use a 6-period mov-
ing average to forecast the stock price data.
B) Use the simple exponential smoothing method to forecast periods 1
through 41 of the stock price. Note that the forecast for period 1 is the
actual price of day 1 (which is 43.50). Use the smoothing constant α
of 0.4. Then increase the value of α to 0.804 and develop your fore-
cast with this α value. Calculate the MAD, bias, and tracking signal
for α = 0.4 and for α = 0.804. The forecast and the error values should
be rounded to four decimal places.
C) Compare the MAD values in parts (a) and (b) and decide which forecasting approach to use. What do the bias and tracking signal tell you? Make a table as shown below and show your values.
Figures 8.16 through 8.19 show the plots of actual data and the fore-
casts using moving average and exponential smoothing methods. The fore-
cast accuracies for comparison purposes are provided below the figures.
A close examination of the forecasts shows that all these methods provided good short-term forecasts of the stock values. However, the forecast using exponential smoothing with smoothing constant α = 0.804 has the smallest MAD and MAPE.
Step 1. Plot the data: Quarter vs. Sales. Figure 8.20 shows the plot of the
demand. This plot clearly shows a seasonal pattern.
2. Calculate the seasonal index for each quarter as shown in Table 8.16. The formula to calculate the seasonal index is explained below the table.
3. Deseasonalize the data by dividing each observation by the corresponding seasonal index.
4. Plot the deseasonalized data. Figure 8.21 shows the plot of the deseasonalized data.
5. Since the data show an increasing trend (see plot above), perform a
regression analysis on the deseasonalized data (x is quarter and y is
deseasonalized data). The computer result is shown in Figure 8.22
and the regression equation is shown below.
Y = 615.419 + 16.8652 x
S = 22.3799 R-Sq = 89.0%
6. Use the regression equation to forecast for quarters 13, 14, 15, and
16 of the following year or year 11. These are deseasonalized fore-
casts for the next four quarters of next year (note that quarter 13 is
the 1st quarter of the next year, quarter 14 is the 2nd quarter of the
next year, and so on).
y = 615.419 + 16.8652x

y13 = 615.419 + 16.8652(13) = 834.67
y14 = 615.419 + 16.8652(14) = 851.53
y15 = 615.419 + 16.8652(15) = 868.39
y16 = 615.419 + 16.8652(16) = 885.26
7. Multiply the deseasonalized forecast for each quarter by the seasonal index to get the seasonalized forecast. The forecasts are shown in Table 8.18.
8. The actual data (for the first 12 quarters) and seasonal forecast (next
4 quarters 13 to 16) are shown in Table 8.19.
9. Plot the actual data and the forecast. Figure 8.23 shows the plot of
actual data and the forecast for the next quarter. Note how the fore-
cast follows the seasonal trend.
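The decomposition steps above can be sketched in Python as follows. The quarterly demand values are made up, and the seasonal indexes are computed as each quarter position's average relative to the overall mean, which is one common variant of the index formula referenced below Table 8.16.

import numpy as np

# Illustrative quarterly demand over 3 years (12 quarters) with trend
# and seasonality; not the chapter's data set.
demand = np.array([520, 730, 820, 530, 590, 810, 900, 600,
                   650, 880, 970, 660], dtype=float)
quarters = np.arange(1, 13)

# Step 2: seasonal index per quarter position (one common variant:
# each quarter position's average divided by the overall mean).
index = demand.reshape(3, 4).mean(axis=0) / demand.mean()

# Step 3: deseasonalize by dividing each observation by its index.
deseason = demand / np.tile(index, 3)

# Step 5: linear trend on the deseasonalized data.
b1, b0 = np.polyfit(quarters, deseason, deg=1)

# Steps 6-7: deseasonalized forecasts for quarters 13-16, then multiply
# by the seasonal index to get the seasonalized forecasts.
future = np.arange(13, 17)
print(np.round((b0 + b1 * future) * index, 2))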
Figure 8.23 Actual demand data (first 12 quarters) and the forecasts
for the next four quarters (quarters 13 through 16)
• Simple Regression
• Multiple Regression Analysis
Summary
This chapter discussed forecasting techniques. Forecasting is a critical part
of predictive analytics and involves predicting future business activities
including the sales, revenue, workforce requirements, demand, and in-
ventory, to name a few. Forecasts affect decisions and activities through-
out an organization. Produce-to-order and produce-to-stock companies depend on forecasts for production and operations planning. Inventory planning and decisions are affected by forecasts. Companies with good forecasting in place are able to balance demand and supply.
Chapter Highlights
• Introduction to Data Mining
• Data Mining Defined
• Some Application Areas of Data Mining
• Machine Learning and Data Mining
• Data Mining and Its Origins and Areas It Interacts with
• Process of Data Mining and Knowledge Discovery in Databases
(KDD)
• Data Mining Methodologies and Data Mining Tasks
◦ Data Preparation or Data Preprocessing
▪ Data cleaning
▪ Data integration
▪ Data selection
▪ Data transformation
◦ Data Mining
▪ Pattern evaluation
▪ Knowledge representation
• Data Mining Tasks
◦ Descriptive Data Mining
◦ clustering,
◦ sequence, and
• Data Collection: The goal of this phase is to extract the data rel-
evant to data mining analysis. The data should be stored in a data-
base where data analysis will be applied.
Figure 9.2 The knowledge discovery in data mining (KDD) process
The above steps are necessary to prepare the data for further process-
ing. The steps provide clean or processed data so that data mining tasks
can be performed. The data mining tasks involve:
A) Data mining
B) Pattern evaluation
C) Knowledge representation
Data Cleaning
Data cleaning is the process of preparing and making data ready for further processing. The data collected are raw data and are usually unstructured, incomplete, noisy, and inconsistent, with missing values. The data may also be missing attributes; for example, a large number of customer records of a financial company may be missing attributes like age and gender. Such data are incomplete with missing values. Data may also have outliers or extreme values. There may be recording errors; for example, a person's age may be wrongly recorded as 350 years.
The data available in data sources might be lacking attribute values.
For example, we may have data that do not include attributes for the
gender or age of the customers. These data are, of course, incomplete.
Sometimes the data might contain errors or outliers. An example is
an age attribute with value 200. It is obvious that the age value is
wrong in this case. The data could also be inconsistent. For example,
the name of an employee might be stored differently in different data
tables or documents. Here, the data are inconsistent. If the data are
not clean and structured, the data mining results would be neither
reliable nor accurate.
Data cleaning involves a number of techniques, including filling in the missing values manually and combined computer and human inspection. The output of the data cleaning process is adequately cleaned data ready for further processing.
Data Integration
Data integration is the process where data from different data sources are integrated into one. Data exist in different formats in different locations and could be stored in databases, text files, spreadsheets, documents, data cubes, the Internet, and so on. Data integration is a complex and tricky task because data from different sources often do not match. For example, suppose table A contains an entity named customer-id, whereas table B contains an entity named "number" instead of customer-id. In such cases, it is difficult to ensure that both these entities refer to the same value. Metadata can be used effectively to reduce errors in the data integration process. Another issue is data redundancy, where the same data may be available in different tables in the same database or in different data sources. Data integration tries to reduce redundancy to the maximum possible level without affecting the reliability of the data.
Data Selection
The data mining process uses large volumes of historical data for analysis. Sometimes, the data repository with integrated data may contain much more data than is actually required. Before applying any data mining task or algorithm, the data of interest need to be selected and separated from the available stored data. Data selection is the process of retrieving the relevant data for analysis from the database.
Data Transformation
Data Mining
Data mining is the core process that uses a number of complex methods to extract patterns from data. The purpose of the data mining phase is to analyze the data using appropriate algorithms to discover meaningful patterns and rules and to produce predictive models. This is the most important phase of the KDD cycle.
Data mining process includes a number of tasks such as association,
classification, prediction, clustering, time series analysis, machine learning,
and deep learning. Table 9.1 outlines the data mining tasks.
Data mining tasks can be broadly classified into descriptive data mining
and predictive data mining.
There are a number of data mining tasks such as classification, pre-
diction, time series analysis, association, clustering, and summarization.
All these tasks are either predictive or descriptive data mining tasks.
Figure 9.4 shows a broad view of data mining tasks.
Descriptive data mining tasks make use of collected data and data min-
ing methodologies to look into the past behavior, relationships, and
patterns to understand and explain what exactly happened in the past.
Predictive analytics employs various predictive data mining and sta-
tistical models including regression, forecasting techniques, and other
predictive models including simulation, machine learning, and AI to
understand what could happen in the future and predict future busi-
ness outcomes.
Predictive data mining uses models built from the available data to predict future values of business outcomes. An operations manager using simulation and queuing models to predict the future behavior of a call center to improve its performance can be considered a predictive data mining task. Descriptive data mining tasks use graphical, visual, and numerical methods to find patterns in the data and to learn new information from the available data set that is not apparent otherwise. Businesses use a number of data visualization techniques, including dashboards, heat maps, and a number of other graphical tools, to study the current behavior of their businesses. These visual tools are simple but powerful tools for studying current business behavior and are used in building predictive analytics models.
Table 9.1 Key data mining tasks

Data Mining and Tasks

Brief Description: Data mining involves exploring new patterns and relationships from the collected data; it is a part of predictive analytics that involves processing and analyzing huge amounts of data to extract useful information and patterns hidden in the data. The overall goal of data mining is knowledge discovery from the data. Data mining techniques are used to (i) extract previously unknown and potentially useful knowledge or patterns from massive amounts of data collected and stored, and (ii) explore and analyze these large quantities of data to discover meaningful patterns and transform data into an understandable structure for further use. The field of data mining is rapidly growing, and statistics plays a major role in it. Data mining is also known as KDD, pattern analysis, information harvesting, business intelligence, analytics, etc. Besides statistics, data mining uses AI, machine learning, database systems, advanced statistical tools, and pattern recognition. In this age of technology, companies collect massive amounts of data automatically using different means. A large quantity of data is also collected using remote sensors and satellites. With the huge quantities of data collected today (usually referred to as big data), traditional techniques of data analysis are infeasible for processing the raw data. The data in their raw form have no meaning unless processed and analyzed. Among the several tools and techniques available and currently emerging with the advancement of technology and computers, it is now possible to analyze big data using data mining, machine learning, and AI techniques.

Application Areas: Data mining is one of the major tools of predictive analytics. In business, data mining is used to analyze business data. Business transaction data along with other customer and product-related data are continuously stored in databases. Data mining software is used to analyze the vast amount of customer data to reveal hidden patterns, trends, and other customer behavior. Businesses use data mining to perform market analysis to identify and develop new products, analyze their supply chain, find the root cause of manufacturing problems, study customer behavior for product promotion, improve sales by understanding the needs and requirements of their customers, prevent customer attrition, and acquire new customers. For example, Wal-Mart collects and processes over 20 million point-of-sale transactions every day. These data are stored in a centralized database and are analyzed using data mining software to understand and determine customer behavior, needs, and requirements. The data are analyzed to determine sales trends and forecasts, develop marketing strategies, and predict customer-buying habits [http://www.laits.utexas.edu/~anorman/BUS.FOR/course.mat/Alex/]. The success with data mining and predictive modeling has encouraged many businesses to invest in data mining to achieve a competitive advantage. Data mining has been successfully applied in several areas of business and industry including customer service, banking, credit card fraud detection, risk management, sales and advertising, sales forecasting, customer segmentation, and manufacturing. Data mining is "the process of uncovering hidden trends and patterns that lead to predictive modeling using a combination of explicit knowledge base, sophisticated analytical skills and academic domain knowledge" (Luan, Jing, 2002). Data mining has been used successfully in science, engineering, business, and finance to extract previously unknown patterns from databases containing massive amounts of data and to make predictions that are critical in decision making and improving the overall system performance. In recent years, data mining combined with machine learning and AI is finding larger and wider applications in analyzing business data and predicting future business outcomes. The reason for this is the growing interest in knowledge management and in moving from data to information and finally to knowledge discovery.
Classification
Clustering
Cluster Analysis
Prediction
Time series analysis involves data collected over time. A time series is a sequence of historical observations over time; time series analysis studies past performance to forecast future events, where the next event is determined by one or more of the preceding events.

A number of models are used to analyze time series data. The forecasting chapter in this book discussed a number of time series patterns and forecasting models.
Figure 9.6 Supervised and Unsupervised Learning Techniques
Summarization
Deep Learning
Deep learning algorithms are inspired by the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.
Summary
This chapter introduced and provided an overview of the field of data
mining. Today, vast amounts of data are collected by businesses. Data
mining is an essential tool for extracting knowledge from massive
amounts of data. The tools of data mining are used in extracting knowl-
edge from the data—the process is known as KDD. The extracted in-
formation and knowledge are used in different models to predict future
business outcomes. Besides the process of data mining and KDD, the
chapter explained a number of data mining methodologies and tasks. We
outlined and discussed several areas where data mining finds application.
The essential tasks of data mining including data preparation or data pre-
processing, knowledge representation, pattern evaluation, and descriptive
and predictive data mining were discussed. The two broad areas of data
mining are descriptive and predictive data mining. We discussed both of
these areas and outlined the tools in each case with their objectives.
Data mining techniques are also classified as supervised and unsu-
pervised learning. We discussed the tasks of data mining that fall under
supervised and unsupervised learning. The key methodologies of data
mining including anomalies (or outlier) detection, association learning,
classification, clustering, sequence, prediction, and time series and fore-
casting along with their objectives were discussed. We also introduced
the current and growing application areas of data mining. Data mining
has wide applications in machine learning. The chapter introduced the
relationship between data mining and machine learning. Different types
of machine learning problems and tasks—supervised and unsupervised
machine learning, applications of data mining in using artificial neural
networks, and deep learning—were introduced.
CHAPTER 10
Overview
This book provided an overview of the field of business analytics (BA). BA uses a set of methodologies to extract, explore, and analyze big data. It is about extracting information from big data and making decisions based on it. BA is a data-driven decision-making process.
The field of BA can be broken down into two broad areas: (1) business
intelligence (BI) and (2) statistical analysis. The flow diagram in
Figure 10.1 outlines the broad area of analytics.
Chapters 1, 2, and 3 provided an explanation on BI and BA. This
book mainly focuses on predictive analytics involving predictive analytics
models. Several chapters in the book are devoted to these models.
Business Intelligence
BA comes under the broad umbrella of BI discussed in Chapter 3. BI
has evolved from business data reporting that involves examining histori-
cal data to gain an insight into the performance of a company over time.
Figure 10.1 Broad area of analytics
Statistical Analysis
The field of analytics is about driving business decisions using data.
Therefore, statistical analysis is at the core of BA. A number of statistical
techniques and models—from descriptive and data visualization tools to
analytics models—are applied for drawing meaningful conclusions from
the data. Statistical analysis involves performing data analysis and creating
statistical models and can be broken down into the following categories:
Data Analytics
Data analytics is the process of exploring and investigating a company's data to find patterns and relationships in the data and applying specialized analytical tools and techniques to them.
Machine Learning
AI and ML are sometimes used synonymously, but there is a difference
between the two. ML is simply a way of achieving AI.
AI can be achieved without using ML, in which case the AI system would require a specific program with millions of lines of code and complex rules and decision trees. Alternatively, ML algorithms can be developed. These are a way of "training" an algorithm so that it can learn how to perform a task. The "training" requires feeding huge amounts of data to the algorithm and allowing it to adjust, learn, and improve. One of the most successful applications of ML is in the area of computer vision: the ability of a machine to recognize an object in an image or video.
Deep Learning
Deep learning is a class of ML algorithms and is one of many approaches
to ML. Most deep learning models are based on an artificial neural
network and are inspired by the structure and function of the brain
or neurons in the brain. The deep learning applications are commonly
referred to as artificial neural networks (ANNs). The term deep refers
to the number of layers through which the data are transformed. The
reported applications of deep learning include computer vision, speech
recognition, natural language processing, social network filtering, bioin-
formatics, drug design, medical image processing, material inspection,
and more. The research in this area is promising, and the results pro-
duced in different applications are comparable to and, in some cases,
superior to human experts.9
Figure 10.7 Prescriptive analytics models
https://www.forbes.com/sites/bernardmarr/2017/06/06/the-9-best-
free-online-big-data-and-data-science-courses/#6403190343cd
Foundations in Business Analytics — University of Maryland
Business Analytics Certificate — Cornell University
Master Certificate in Business Analytics — Michigan State University
The above are listed as the Best Online Business Analytics Certificates &
Courses [Updated 2018]
Summary
In this chapter, we provided an overview of the field of analytics. The
broad area of analytics can be divided into two broad categories: BI and
statistical analysis.
Probability Concepts:
Role of Probability in
Decision Making
0 ≤ P(A) ≤ 1
2. Permutations
The number of ways of selecting n distinct objects from a group of N
objects—where the order of selection is important—is known as the
number of permutations on N objects using n at a time and is written as
PnN = N! / (N − n)! = N(N − 1)…(N − n + 1)
3. Combinations
Combination is selecting n objects from a total of N objects. The
order of selection is not important in combination. This disregard
of arrangement makes the combination different from the permuta-
tion. In general, an experiment will have more permutations than
combinations.
CnN = N! / [n!(N − n)!]

Note: 0! = 1 by definition.
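In Python (3.8 and later), the standard library computes these counts directly; the sketch below assumes N = 10 objects taken n = 3 at a time.

import math

# Select n = 3 objects from N = 10
N, n = 10, 3
print(math.perm(N, n))   # permutations (order matters): 10!/(10-3)! = 720
print(math.comb(N, n))   # combinations (order ignored): 10!/(3!*7!) = 120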
Assigning Probabilities
0 ≤ P(A) ≤ 1.0

P(A1) + P(A2) + P(A3) + … + P(An) = 1
1. Classical Method
2. Relative Frequency Approach
3. Subjective Approach
1. Classical Method
The classical approach of probability is defined as the favorable number
of outcomes divided by the total number of possible outcomes. Suppose
an experiment has n number of possible outcomes and the event A occurs
in m of the n outcomes, then the probability that event A will occur is
P(A) = m / n
P(A) + P(Ā) = 1
which means that the probability that event A will occur plus the
probability that event A will not occur must be equal to 1.
2. Relative Frequency Approach
Probabilities are also calculated using relative frequency. In many
problems, we define probability by relative frequency.
3. Subjective Probability
Subjective probability is used when the events occur only once or
very few times and when little or no relevant data are available. In as-
signing subjective probability, we may use any information available,
such as our experience, intuition, or expert opinion. In this case the
experimental outcomes may not be clear and relative frequency of
occurrence may not be available. Subjective probability is a measure
of our belief that a particular event will occur. This belief is based on
any information that is available to determine the probability.
If we have two events A and B that are mutually exclusive, then the
probability that A or B will occur is given by
P(A ∪ B) = P(A) + P(B)
Note that the “union” sign is used for “or” probability, that is, P ( A ∪ B ) .
This is the same as P (A or B). This rule can be extended to three or more
mutually exclusive events. If three events A, B, and C are mutually exclu-
sive, then the probability that A or B or C will happen can be given by
P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
The occurrence of two events that are non-mutually exclusive means that
they can occur together. If the events A and B are non-mutually exclusive,
the probability that A or B will occur is given by
P(A ∪ B) = P(A) + P(B) − P(A and B)

or P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

For three non-mutually exclusive events:

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)
Equally Likely Events are those that have an equal chance of occurrence
or those where there is no reason to expect one in preference to the other.
In many experiments it is natural to assume that each outcome in the
sample space is equally likely. Suppose that the sample space S consists
of k outcomes, where each outcome is equally likely to occur. The k outcomes of the sample space can be denoted by S = {1, 2, 3, …, k}, and the probability of each outcome is 1/k.
When two or more events occur, the occurrence of one event has no ef-
fect on the probability of occurrence of any other event. In this case, the
events are considered independent. There are three types of probabil-
ities under statistical independence:
Statistical Independence
P(AB) = P(A)P(B), or P(A ∩ B) = P(A)P(B)

P(A | B) = P(A)
This means that if the events are independent, the probabilities are
not affected by the occurrence of each other. The probability of oc-
currence of B has no effect on the occurrence of A. That is, the condi-
tion has no meaning if the events are independent.
Statistical dependence
When two or more events occur, the occurrence of one event has an effect
on the probability of the occurrence of any other event. In this case, the
events are considered to be dependent.
There are three types of probabilities under statistical dependence.
Statistical Dependence
P(A | B) = P(A ∩ B) / P(B) = P(A and B) / P(B)

P(A ∩ B) = P(A | B)P(B), or P(A and B) = P(A | B)P(B)

Similarly,

P(B ∩ A) = P(B | A)P(A), or P(B and A) = P(B | A)P(A)
P(R) = P(R | D)P(D) + P(R | S)P(S)
Bayes’ Theorem
P(Ai | D) = P(Ai)P(D | Ai) / [P(A1)P(D | A1) + P(A2)P(D | A2) + … + P(An)P(D | An)]
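A small Python sketch of Bayes' theorem for mutually exclusive and exhaustive events; the supplier priors and defect rates in the usage example are hypothetical.

def bayes(priors, likelihoods):
    """Posterior probabilities P(Ai | D) for mutually exclusive and
    exhaustive events Ai, given priors P(Ai) and likelihoods P(D | Ai)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)              # P(D), by the total probability rule
    return [j / total for j in joint]

# Hypothetical example: two suppliers provide 65% and 35% of the parts,
# with defect rates of 2% and 5%. Given a defective part (D), the
# posterior probabilities of each supplier are:
print(bayes([0.65, 0.35], [0.02, 0.05]))   # about [0.426, 0.574]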
A random variable that can assume only integer values or whole numbers is known as discrete. An example would be the number of customers arriving at a bank. Another example of a discrete random variable would be
rolling two dice and observing the sum of the numbers on the top faces.
In this case, the results are 2 through 12. Also, note that each outcome
is a whole number or a discrete quantity. The random variable can be
described by a discrete probability distribution.
Table A.1 shows the discrete probability distribution (in a table form)
of rolling two dice and observing the sum of the numbers. The outcome is denoted by x, the random variable that represents the sum of the numbers on the top faces.
Table A.1
X 2 3 4 5 6 7 8 9 10 11 12
P(x) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
The outcome X (which is the sum of the numbers on the top faces)
takes on different values each time the pair of dice is rolled. On each trial,
the sum of the numbers is going to be a number between 2 and 12 but
we cannot predict the sum with certainty in advance. In other words,
the outcomes or the occurrence of these numbers is a chance factor. The
probability distribution is the outcomes Xi, and the probabilities for these
outcomes P(Xi). The probability of each outcome of this experiment
can be found by listing the sample space of all 36 outcomes. These can
be shown both in a tabular and in a graphical form. Figure A.1 shows the
µx = E(X) = Σ xi P(xi)

σ² = Σ (xi − µ)² P(xi)    (A)

σ² = Σ x² P(x) − µ²    (B)
Example A.1
Table A.2 shows the number of cars sold over the past 500 days for a par-
ticular car dealership in a certain city.
[b] Calculate the expected value or the mean number of cars sold.

The expected value is given by:

µx = E(x) = Σ xi P(xi) = 3.056

The variance is given by:

σ² = Σ (x − µ)² P(x)
σ² = (0 − 3.056)²(0.08) + (1 − 3.056)²(0.200) + (2 − 3.056)²(0.284)
+ (3 − 3.056)²(0.132) + (4 − 3.056)²(0.072) + (5 − 3.056)²(0.060)
+ (6 − 3.056)²(0.052) + (7 − 3.056)²(0.040) + (8 − 3.056)²(0.032)
+ (9 − 3.056)²(0.028) + (10 − 3.056)²(0.016) + (11 − 3.056)²(0.004)
= 6.071296
The variance can be more easily calculated using the shortcut formula (B). The standard deviation for this discrete distribution is

σ = √σ² = √6.071296 = 2.46
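The mean and variance of this distribution can be verified with a few lines of Python using the probabilities from Table A.2:

# Probabilities P(x) from Table A.2 for x = 0, 1, ..., 11 cars sold
p = [0.08, 0.200, 0.284, 0.132, 0.072, 0.060,
     0.052, 0.040, 0.032, 0.028, 0.016, 0.004]
x = range(12)

mean = sum(xi * pi for xi, pi in zip(x, p))                 # E(X) = 3.056
var = sum((xi - mean) ** 2 * pi for xi, pi in zip(x, p))    # 6.071296
print(mean, var, round(var ** 0.5, 2))                      # sd = 2.46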
P(x < 4) = P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3)
= 0.08 + 0.200 + 0.284 + 0.132 = 0.696

These probability values are obtained from Table A.2, column (3).
P(x ≤ 4) = P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3) + P(x = 4)
= 0.08 + 0.200 + 0.284 + 0.132 + 0.072 = 0.768
[f] What is the probability of selling at least four cars?

This probability can be calculated as

P(x ≥ 4) = 1 − P(x < 4) = 1 − [P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3)]
= 1 − [0.08 + 0.200 + 0.284 + 0.132] = 0.304
A random variable that can assume any value over a continuous range of possibilities is known as a continuous random variable. Some
examples of continuous variables are physical measurements of length,
volume, temperature, or time. These variables can be described using con-
tinuous distributions.
The continuous probability distribution is usually described using
a probability density function. The probability density function, f(x), de-
scribes the behavior of a random variable. It may be viewed as the shape
of the data. Figure A.2 shows the histogram of the diameter of machined parts with a fitted curve. It is clear that the diameter can be approximated by certain patterns that can be described by a probability distribution.
The shape of the curve in Figure A.2 can be described by a mathematical function, f(x), or a probability density function. The area below the probability density function to the left of a given value x is equal to the probability that the random variable (the diameter in this case) takes a value less than or equal to x. The probability density function represents the entire sample space; therefore, the area under the probability density function must equal one.
The probability density function, f(x), must be positive for all values of x
(as negative probabilities are impossible). Stating these two requirements
mathematically,
∫ f(x) dx = 1 (integrated over −∞ < x < ∞) and f(x) ≥ 0 for all x

For a discrete random variable, the corresponding requirements are Σ f(xi) = 1.0 and f(x) > 0.
f(x) = [1 / (σ√2π)] e^(−(x − µ)²/2σ²)
where f (x) is the probability density function, µ the mean, σ the standard
deviation, and e = 2.71828, which denotes the base of the natural loga-
rithm. The distribution has the following properties:
1. The normal curve is a bell-shaped curve. It is symmetrical about
the line x = µ. The mean, median, and mode of the distribution have the
same value.
2. The parameters of normal distribution are the mean µ and stan-
dard deviation σ. The interpretation of how the mean and standard devi-
ation are related in a normal curve is shown in Figure A.6.
Figure A.6 states the area property of the normal curve. For a normal
curve, approximately 68 percent of the observations lie between the mean
and ±1σ (one standard deviation), approximately 95 percent of all obser-
vations lie between the mean and ±2σ (two standard deviations), and ap-
proximately 99.73 percent of all observations fall between the mean and
±3σ (three standard deviations). This is also known as the empirical rule.
The shape of the curve depends on the mean (µ) and the standard deviation (σ). The mean µ determines the location of the distribution, whereas the standard deviation σ determines the spread of the distribution. Note that the larger the standard deviation σ, the more spread out the curve (see Figure A.7).
P(x1 ≤ x ≤ x2) = ∫ from x1 to x2 [1 / (σ√2π)] e^(−(x − µ)²/2σ²) dx    (A)
f(z) = [1 / √2π] e^(−z²/2),   −∞ < z < ∞

P(Z ≤ z) = ∫ from −∞ to z f(y) dy

z = (x − µ) / σ    (B)
Equation (B) above is a simple equation that can be used to evaluate the
probabilities involving normal distribution.
Example A.2

Suppose the diameter of a piston ring is normally distributed with mean µ = 5.07 cm and standard deviation σ = 0.07 cm. What is the probability that a ring's diameter will exceed 5.15 cm?

z = (x − µ) / σ = (5.15 − 5.07) / 0.07 = 1.14 → 0.3729
Note: 0.3729 is the area corresponding to z = 1.14, read from the table of the Normal Distribution provided in the Appendix. There are many variations of this table; the normal table used here provides the probabilities on the right side of the mean. Therefore,

P(x > 5.15) = 0.5 − 0.3729 = 0.1271

or, there is a 12.71 percent chance that the piston ring diameter will exceed 5.15 cm.
Example A.3

Suppose the diameter of pipes is normally distributed with mean 5.01 cm and standard deviation 0.03 cm, and pipes are acceptable if the diameter falls between 4.95 cm and 5.05 cm. The percentage of acceptable pipes is the shaded area shown in Figure A.9. The required area, or the percentage of acceptable pipes, is explained below.
The area 0.4772 is the area between the mean 5.01 and 4.95 (see Fig-
ure A.9). The area left of 4.95 is 0.5 – 0.4772 = 0.0228.
The area 0.4082 is the area between the mean 5.01 and 5.05. The area
right of 5.05 is 0.5 – 0.4082 = 0.0918.
Therefore, the percentage of pipes not acceptable is 0.0228 + 0.0918 = 0.1146, or 11.46 percent, and the percentage of acceptable pipes is 1 − 0.1146 = 0.8854, or 88.54 percent. These probabilities can also be calculated using statistical software.
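Both examples can be checked with the NormalDist class in Python's standard library; small differences from the text's answers come from rounding z to two decimals in the table lookups.

from statistics import NormalDist

# Example A.2: P(diameter > 5.15) with mean 5.07 and sd 0.07
ring = NormalDist(mu=5.07, sigma=0.07)
print(1 - ring.cdf(5.15))      # about 0.1265 (the table lookup gives 0.1271)

# Example A.3: acceptable diameters between 4.95 and 5.05, mean 5.01, sd 0.03
pipe = NormalDist(mu=5.01, sigma=0.03)
acceptable = pipe.cdf(5.05) - pipe.cdf(4.95)
print(1 - acceptable)          # about 0.114 not acceptable (text: 0.1146)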
Probability Plots
Probability plots are used to determine if a particular distribution fits
sample data. The plot allows us to determine whether a distribution is
appropriate and also to estimate the parameters of the fitted distribution. The probability plots are a good way of determining whether the given data follow a normal or any other assumed distribution. In regression analysis, this plot is of great value because of its usefulness in verifying one of the major assumptions of regression: the normality assumption.
MINITAB and other statistical software provide options for creating
individual probability plots for the selected distribution for one or more
variables. The steps to probability plotting procedure are:
PP = (i − 0.5)(100) / n
MINITAB provides the plot based on the above steps. To test the hypoth-
esis, an Anderson-Darling (AD) goodness-of-fit statistic and associated
p-value can be used. These values are calculated and displayed on the plot.
If the assumed distribution fits the data:
From the probability plot of the length data (Figure A.11), we can see
that the cumulative percentage points approximately form a straight line
and the points are close to the straight line. The calculated p-value is 0.543.
At a 5 percent level of significance (α = 0.05), the p-value is greater than α, so we cannot reject the null hypothesis that the data follow a normal distribution. We conclude that the data follow a normal distribution. The probability plot of the failure time data shows that the cumulative percentage points do not form a straight line. The plotted points show a curvilinear pattern. The calculated p-value is less than 0.005. At a 5 percent level of significance (α = 0.05), the p-value is less than α, so we reject the null hypothesis that the
data follow a normal distribution. The deviation of the plotted points from
a straight line is an indication that the failure time data do not follow a nor-
mal distribution. This is also evident from the histogram of the failure data.

Figure A.11 Histograms and Probability Plots of Length and Failure Time Data
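Although the book's plots are produced in MINITAB, a comparable probability plot and Anderson-Darling check can be sketched in Python. The data below are hypothetical stand-ins for the length measurements; note that SciPy's Anderson-Darling routine reports critical values rather than a p-value.

```python
# A sketch of a probability plot and Anderson-Darling check in Python
# (the book uses MINITAB); `lengths` is hypothetical data standing in for
# the length measurements of Figure A.11.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
lengths = rng.normal(loc=14.1, scale=1.0, size=50)   # placeholder sample

stats.probplot(lengths, dist="norm", plot=plt)       # points near the line suggest normality
plt.title("Normal probability plot of length data")

result = stats.anderson(lengths, dist="norm")        # AD goodness-of-fit statistic
print("AD statistic:", result.statistic)
print("5% critical value:", result.critical_values[2])  # reject normality if statistic exceeds this
plt.show()
```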
Statistics and data analysis cases involve making inferences about the pop-
ulation based on the sample data. Several of these inference procedures are
discussed in the chapters that follow. Many of these inference procedures
are based on the assumption of normality; that is, the population from
which the sample is taken follows a normal distribution. Before we draw
conclusions based on the assumption of normality, it is important to de-
termine whether the sample data come from a population that is nor-
mally distributed. Below we present several descriptive methods that can
be used to check whether the data follow a normal distribution. Methods
most commonly used to assess the normality are described in Table A.3.
Check #1
The histogram of the data in Figure A.12 indicates that the shape very
closely resembles a bell shape or normal distribution. The bell curve su-
perimposed over the histogram shows that the data have a symmetric or
normal distribution centered around the mean. Thus, we can conclude
that the data follow a normal distribution.
Check #2
The values of the mean and the median in Figure A.12 are 14.124 and
14.200, respectively. If the data are symmetrical or normal, the values of
the mean and median are very close. Since the mean and median for the
waiting time data are very close, it indicates that the distribution is sym-
metrical or normal.
Figure A.12: Graphical and Numerical Summary of Waiting Time
Check #3
x̄ ± 2s contains 95.3 percent of the observations
x̄ ± 3s contains 99.3 percent of the observations

These percentages for the example problem (Table A.4 data) agree closely with the empirical rule for the normal distribution (approximately 95 and 99.7 percent).
Check #4
The box plot of the data in Figure A.12 shows that the waiting time data
very closely follow a normal distribution.
Check #5
The ratio of the IQR to the standard deviation, calculated from the values in Figure A.12, is close to 1.3, indicating that the data are approximately normal. (For data from a normal distribution, IQR/s ≈ 1.35.)
Check #6
All of the above checks confirm that the waiting time data very closely
follow a normal distribution.
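As a rough illustration, the numerical checks above (mean versus median, empirical-rule percentages, and the IQR-to-standard-deviation ratio) can be scripted as follows; the `waiting` array is hypothetical data standing in for the waiting-time sample of Figure A.12.

```python
# A sketch of the numerical normality checks above; `waiting` is
# hypothetical data standing in for the waiting-time sample of Figure A.12.
import numpy as np

rng = np.random.default_rng(7)
waiting = rng.normal(loc=14.1, scale=2.5, size=200)   # placeholder sample

mean, median = waiting.mean(), np.median(waiting)
s = waiting.std(ddof=1)
print(f"mean = {mean:.3f}, median = {median:.3f}")    # close together => symmetric

for k in (1, 2, 3):                                    # empirical rule: ~68, 95, 99.7 percent
    frac = np.mean(np.abs(waiting - mean) <= k * s)
    print(f"within mean +/- {k}s: {100 * frac:.1f}%")

iqr = np.percentile(waiting, 75) - np.percentile(waiting, 25)
print(f"IQR / s = {iqr / s:.2f}")                      # roughly 1.3 for normal data
```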
Student t-Distribution
This is one of the useful sampling distributions related to the normal dis-
tribution. This distribution is used to check the adequacy of the regression
models. Suppose x is a normally distributed random variable with mean 0 and variance 1, and suppose χ²ₙ is an independent chi-square random variable with n degrees of freedom. Then the random variable tₙ given by

tₙ = x / √(χ²ₙ / n)

follows a t-distribution with n degrees of freedom. We can plot the normal and t-distributions on the same plot and compare the shapes of the t-distributions for different degrees of freedom to that of the normal distribution; a code sketch of this comparison follows the discussion below.
From Figure A.14, the innermost curve is the probability density for
t-distribution with one degree of freedom and the outermost curve is the
density function of a normal distribution. You can see that as we increase the
number of degrees of freedom for the t-distribution, the shape approaches
a normal distribution. Also, note that the t-distribution is less peaked at the
center and higher in the tails compared to the normal distribution.
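A minimal sketch of this comparison in Python (the book generates a similar plot, Figure A.14, in MINITAB):

```python
# A minimal sketch of the comparison in Figure A.14: t-densities for
# several degrees of freedom against the standard normal density.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(-4, 4, 400)
for df in (1, 5, 30):                        # the t-density approaches normal as df grows
    plt.plot(x, stats.t.pdf(x, df), label=f"t, df = {df}")
plt.plot(x, stats.norm.pdf(x), "k--", label="standard normal")
plt.legend()
plt.title("t-distributions versus the normal distribution")
plt.show()
```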
F-distribution

If χ²ᵤ and χ²ᵥ are independent chi-square random variables with u and v degrees of freedom, their ratio

F(u, v) = (χ²ᵤ/u) / (χ²ᵥ/v)

follows an F-distribution with u numerator and v denominator degrees of freedom. The F-distribution arises, for example, when comparing two sample variances: the ratio

(s₁²/σ₁²) / (s₂²/σ₂²)

follows an F-distribution with n₁ − 1 and n₂ − 1 degrees of freedom.
Summary
In this section, we provided an overview of statistical methods used in ana-
lytics. A number of statistical techniques both graphical and numerical were
presented. These descriptive statistical tools are used in modeling, studying,
and solving various problems. The graphical and numerical tools of descrip-
tive statistics are also used to describe variation in the process data. The
graphical tools of descriptive statistics include the frequency distribution, histogram, stem-and-leaf plot, and box plot. These are simple but effective tools, and a knowledge of them is essential in studying analytics. The numerical measures include measures of central tendency, such as the mean and the median, along with measures of variation, including the variance and standard deviation, which are a critical part of data analysis. The standard deviation is a measure of variation; when
combined with the mean it provides useful information. In the second part
of this section we introduced the concept of probability distribution and ran-
dom variable. A number of probability distributions, both discrete and continuous, were discussed with their properties and applications. We discussed
the normal, t-distribution, and F-distribution. They all find applications in
analytics. These distributions are used in assessing the validity of models and
checking the assumptions.
APPENDIX B
Sampling, Sampling
Distribution, and Inference
Procedure
When samples are chosen randomly, each sample has an equal probability of being selected, and the sample mean calculated from these samples is equally likely to fall above or below the true population mean.
Because the sample mean x is a random variable, it can be described
using a probability distribution. The probability distribution of a sample
statistic is called its sampling distribution and the probability distribu-
tion of the sample mean x is known as the sampling distribution of the
sample mean. The sampling distribution of the sample mean has certain properties that are used in making inferences about the population. The central
limit theorem plays an important role in the study of sampling distribu-
tion. We will also study the central limit theorem and see how the amazing
results produced by it are applied in analyzing and solving many problems.
The concepts of sampling distribution form the basis for the inference
procedures. It is important to note that a population parameter is always
a constant, whereas a sample statistic is a random variable. Similar to the
other random variables, each sample statistic can be described using a
probability distribution.
Besides sampling and sampling distribution, other key topics in this
section include point and confidence interval estimates of means and
proportions. We also discuss the concepts of hypothesis testing. These
concepts are important in the study of analytics.
Sampling Distribution
As indicated earlier, in most cases the true value of the population par-
ameters is not known. We must draw a sample or samples and calculate
the sample statistic to estimate the population parameter. The sampling
error of the sample mean is given by
Sampling error = x̄ − µ
Solution to (2): The last column shows the mean of each sample drawn.
Note that each row represents a sample of size 5.
Solution to (3): Figure B.2 shows the histogram of the sample means shown in the last column of Table B.1. The histogram shows that the distribution of the sample means is approximately normal.
Solution to (4): The mean and standard deviation of the sample means
shown in the last column of Table B.1 were calculated using a computer
package. These values are shown in Table B.2.
Comparing these with the population values, we conclude that the sample mean x̄ values have much less variation than the individual observations.
Solution to (5): Based on parts (3) and (4), we conclude that the sample
mean x follows a normal distribution, and this distribution is much
narrower than the population of individual observations. This is apparent
from the standard deviation of x value, which is 1.1035 (see Table B.2).
In general, the mean and standard deviation of the random variable x̄ are given as follows.

Mean of the sample mean:

µ_x̄ = µ or E(x̄) = µ    (i)

Standard deviation of the sample mean:

σ_x̄ = σ/√n    (ii)

For the example above (µ = 25, σ = 5, n = 5):

µ_x̄ = µ = 25 and σ_x̄ = σ/√n = 5/√5 = 2.236
From Table B.2, the mean and the standard deviation of the 50 sample means were 25.0942 and 1.1035, respectively. These values will get closer to the theoretical values given by equations (i) and (ii) as more and more samples of size 5 are taken.
The standard error tells us how much variation we can expect in the mean of one or more samples. The standard deviation of the sample mean σ_x̄ is often called the standard error of the mean. Using equation (ii), it can be shown that a sample of 16 observations (n = 16) is twice as precise as a sample of 4 (n = 4). It may be argued that the gain in precision in this case is small relative to the effort of taking an additional 12 observations. However, doubling the sample size in
other cases may be desirable.
Figure B.3 shows a comparison between the probability distribution
of individual observations and the probability distributions of means of
samples of various sizes drawn from the underlying population.
Note that as the sample size increases, the standard error becomes
smaller and hence the distribution becomes more peaked. It is obvious
from Figure B.3 that a sample of one does not tell us anything about the
precision of the estimated mean. As larger samples are taken, the standard error decreases, thus providing greater precision.
This means that if samples of large size (n ≥ 30) are selected from a
population, then the sampling distribution of the sample means is ap-
proximately normal. This approximation improves with larger samples.
The Central Limit Theorem has major applications in sampling and
other areas of statistics. It tells us that if we take a large sample (n ≥ 30) ,
we can use the normal distribution to calculate probabilities and draw conclusions about the population parameter.
The above are useful results in drawing conclusions from the data. For
a sample size of n = 30 or more (large sample), we can always use the
normal distribution to draw conclusions from the sample data.
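A small simulation illustrates both the central limit theorem and the standard-error formula σ/√n; the population values µ = 25 and σ = 5 are borrowed from the sampling example above, while the choice of an exponential (non-normal) population is an assumption made purely for illustration.

```python
# A simulation sketch of the central limit theorem and the standard error
# sigma/sqrt(n); mu = 25 and sigma = 5 are borrowed from the sampling
# example above, and the exponential population is an illustrative choice.
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n, reps = 25, 5, 30, 10_000

# shifted exponential: mean mu, standard deviation sigma, clearly skewed
samples = rng.exponential(scale=sigma, size=(reps, n)) + (mu - sigma)
means = samples.mean(axis=1)                 # one mean per sample of size n

print(f"mean of sample means:  {means.mean():.3f}   (theory: {mu})")
print(f"std. error (observed): {means.std(ddof=1):.3f}   (theory: {sigma / np.sqrt(n):.3f})")
# the histogram of `means` is approximately normal even though the
# population is skewed, which is the content of the central limit theorem
```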
z = (x̄ − µ) / (σ/√n)    (for an infinite population)    (iii)

or

z = (x̄ − µ) / [(σ/√n) √((N − n)/(N − 1))]    (for a finite population)    (iv)
¹ Ostle, Bernard, and Richard W. Mensing. 1979. Statistics in Research. 3rd ed. Ames, IA: The Iowa State University Press, p. 76.
Review of Estimation,
Confidence Intervals, and
Hypothesis Testing
There are two types of estimates: (a) point estimates, which are single-
value estimates of the population parameter, and (b) interval estimates
or the confidence intervals, which are a range of numbers that contain
the parameter with specified degree of confidence known as the confi-
dence level. Confidence level is a probability attached to a confidence
interval that provides the reliability of the estimate. In the discussion of
estimation, we will also consider the standard error of the estimates, the
margin of error, and the sample size requirement.
Point Estimate
A) The point estimate of the population mean (µ) is the sample mean (x̄):

x̄ = Σx / n

B) The point estimate of the population standard deviation (σ) is the sample standard deviation (s):

s = √[Σ(xᵢ − x̄)² / (n − 1)]  or  s = √[(Σxᵢ² − (Σxᵢ)²/n) / (n − 1)]
Interval Estimate
An interval estimate is reported, for example, as

16.8 ≤ µ ≤ 18.6, or (16.8 to 18.6), or (16.8–18.6)

In general, a two-sided confidence interval has the form

L ≤ µ ≤ U

where L and U are the lower and upper confidence limits.
P {L ≤ β ≤ U} = 1−α (v)
The confidence interval means that if many random samples are col-
lected and a 100 (1−α) percent confidence interval computed from each
sample for β, then 100 (1−α) percent of these intervals will contain the
true value β.
In practice, we usually take one sample and calculate the confidence
interval. This interval may or may not contain the true value, and it is
not reasonable to attach a probability level to this specific event. The ap-
propriate statement would be that β lies in the observed interval [L,U]
with confidence 100(1−α). That is, we don’t know if the statement is
true for this specific sample, but the method used to obtain the interval
[L,U] yields correct statements 100(1−α) percent of the time. The inter-
val L ≤ β ≤ U is known as a two-sided or two-tailed interval. We can also
build one-sided interval. The length of the observed confidence interval
is an important measure of the quality of information obtained from the
sample. The half interval (β – L) or (U – β) is called the accuracy of the
estimator. A two-sided interval can be interpreted in the following way:
The wider the confidence interval, the more confident we are that the inter-
val actually contains the unknown population parameter being estimated.
On the other hand, the wider the interval, the less information we have about
the true value of β. In an ideal situation, we would like to obtain a relatively
short interval with high confidence.
To derive the confidence interval for the mean, start with the statistic

z = (x̄ − µ) / (σ/√n)    (vi)

Since z follows a standard normal distribution,

P{−z_(α/2) ≤ z ≤ z_(α/2)} = 1 − α

or

P{−z_(α/2) ≤ (x̄ − µ)/(σ/√n) ≤ z_(α/2)} = 1 − α

Rearranging the inequality for µ gives

P{x̄ − z_(α/2) σ/√n ≤ µ ≤ x̄ + z_(α/2) σ/√n} = 1 − α

so the 100(1 − α) percent confidence interval for the mean is

x̄ − z_(α/2) σ/√n ≤ µ ≤ x̄ + z_(α/2) σ/√n    (vii)
x̄ − z_(α/2) σ/√n ≤ µ ≤ x̄ + z_(α/2) σ/√n    (viii)

The margin of error of this interval is

E = z_(α/2) σ/√n    (ix)

When σ is unknown, the sample standard deviation s and the t-distribution are used instead:

x̄ − t_(n−1, α/2) s/√n ≤ µ ≤ x̄ + t_(n−1, α/2) s/√n    (x)
If the population variance is unknown and the sample size is large, the
confidence interval for the mean can also be calculated using a normal
distribution using the following formula:
x̄ ± z_(α/2) s/√n    (xi)
Confidence interval for the mean when the sample size is small and the population standard deviation σ is unknown
When σ is unknown and the sample size is small, use t-distribution for
the confidence interval. The t-distribution is characterized by a single par-
ameter, the number of degrees of freedom (df ), and its density function
provides a bell-shaped curve similar to a normal distribution.
The confidence interval using t-distribution is given by
x̄ − t_(n−1, α/2) s/√n ≤ µ ≤ x̄ + t_(n−1, α/2) s/√n    (xii)

where t_(n−1, α/2) is the t-value from the t-table for (n − 1) degrees of freedom and a tail area of α/2, with α determined by the chosen confidence level (α = 1 − confidence level).
In this section, we will discuss the confidence interval estimate for the
proportions. A proportion is a ratio or fraction, or percentage that in-
dicates the part of the population or sample having a particular trait of
interest. The following are examples of proportions: (1) a software com-
pany claiming that its manufacturing simulation software has 12 percent
of the market share, (2) a public policy department of a large university
wants to study the difference in proportion between male and female un-
employment rate, and (3) a manufacturing company wants to determine
the proportion of defective items produced by its assembly line. In all
these cases, it may be desirable to construct the confidence intervals for
the proportions of interest. The population proportion is denoted by p, whereas the sample proportion is denoted by p̄.
In constructing the confidence interval for the proportion, we require:

A) A large sample, so that the sampling distribution of the sample proportion (p̄) follows a normal distribution; the sample is considered large when np ≥ 5 and n(1 − p) ≥ 5.
B) The value of the sample proportion p̄.
C) The level of confidence, which determines the value z_(α/2).
p̄ − z_(α/2) √(p̄(1 − p̄)/n) ≤ p ≤ p̄ + z_(α/2) √(p̄(1 − p̄)/n)    (xiii)
A) The margin of error E (also known as tolerable error level or the ac-
curacy requirement). For example, suppose we want to estimate the
population mean salary within $500 or within $200. In the first case,
the error E = 500; in the second case, E = 200. A smaller value of the
error E means more precision is required, which in turn will require a larger sample. In general, the smaller the error, the larger the required sample size.
B) The desired reliability or the confidence level.
C) A good guess for σ.
Both the margin of error E and reliability are arbitrary choices that have
an impact on the cost of sampling and the risks involved. The following
formula is used to determine the sample size:
n = (z_(α/2))² σ² / E²    (xiv)

For estimating a proportion, the corresponding formula is

n = (z_(α/2))² p(1 − p) / E²    (xv)
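Both sample-size formulas are easy to script. The sketch below obtains z_(α/2) from SciPy instead of a printed table and rounds up, since sample sizes must be whole numbers; the test values anticipate Examples C.3 and C.4 below.

```python
# A sketch of formulas (xiv) and (xv); z-values come from SciPy rather
# than a printed table, and results are rounded up to whole observations.
import math
from scipy import stats

def n_for_mean(sigma, E, conf=0.95):
    """Formula (xiv): n = (z_{a/2} * sigma / E)^2."""
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    return math.ceil((z * sigma / E) ** 2)

def n_for_proportion(p, E, conf=0.95):
    """Formula (xv): n = z_{a/2}^2 * p * (1 - p) / E^2."""
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / E ** 2)

print(n_for_mean(sigma=150, E=15))        # about 385 (matches Example C.4 below)
print(n_for_proportion(p=0.48, E=0.03))   # about 1066 (matches Example C.3 below)
```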
Example C.1
Solution:
First, calculate the mean and standard deviation of 25 values in the data.
You should use your calculator or a computer to do this. The values are
x̄ = 22.40
s = 2.723

The 95 percent confidence interval using the t-distribution is

x̄ ± t_(n−1, α/2) s/√n

22.40 ± (2.064)(2.723/√25)

21.28 ≤ µ ≤ 23.52
The value 2.064 is the t-value from the t-table for n−1 = 24 degrees of
freedom and α/2 = 0.025.
The confidence interval using a normal distribution can be calculated
using the formula below:
x̄ ± z_(α/2) s/√n

22.40 ± 1.96(2.723/√25)

This interval is

21.33 ≤ µ ≤ 23.47
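A sketch of Example C.1 in Python, computing both the t-based and z-based intervals from the summary statistics above:

```python
# A sketch of Example C.1, computing the t-based and z-based intervals
# from the summary statistics above.
import math
from scipy import stats

xbar, s, n, conf = 22.40, 2.723, 25, 0.95
se = s / math.sqrt(n)

t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)   # about 2.064
z_crit = stats.norm.ppf(1 - (1 - conf) / 2)          # about 1.96

print(f"t-interval: {xbar - t_crit * se:.2f} to {xbar + t_crit * se:.2f}")  # ~21.28 to 23.52
print(f"z-interval: {xbar - z_crit * se:.2f} to {xbar + z_crit * se:.2f}")  # ~21.33 to 23.47
```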
Example C.2
Since the sample size is large (n ≥ 30), and the population standard
deviation σ is known, the appropriate confidence interval formula is
x̄ ± z_(α/2) σ/√n
The confidence intervals using the above formula are shown below.

80 percent confidence interval (z = 1.28):
38,000 ± 1.28(3,600/√36)
37,232 ≤ µ ≤ 38,768

90 percent confidence interval (z = 1.645):
38,000 ± 1.645(3,600/√36)
37,013 ≤ µ ≤ 38,987

95 percent confidence interval (z = 1.96):
38,000 ± 1.96(3,600/√36)
36,824 ≤ µ ≤ 39,176

99 percent confidence interval (z = 2.58):
38,000 ± 2.58(3,600/√36)
36,452 ≤ µ ≤ 39,548
Note that the z-values in the above confidence interval calculations are
obtained from the normal table. Refer to the normal table for the values
of z. Figure C.3 shows the confidence intervals graphically.
Figure C.3 shows that the larger the confidence level, the wider the interval. With a higher confidence level there is a higher chance that the true value of the parameter being estimated is contained in the interval, but at the same time we lose precision.
Example C.3
Suppose a poll reports that 48 percent of voters favor a candidate with a margin of error of ±3 percent. What does this mean? From this information, determine the sample size that was used in this study.
Solution: The polls conducted by the news media use a 95 percent con-
fidence interval unless specified otherwise. Using a 95 percent confidence
interval, the confidence interval for the proportion is given by
p̄ ± 1.96 √(p̄(1 − p̄)/n)

0.48 ± 1.96 √(0.48(1 − 0.48)/n)

The margin of error is the quantity added and subtracted. Setting it equal to 0.03,

1.96 √(0.48(1 − 0.48)/n) = 0.03

and solving for n gives

n = (1.96)²(0.48)(1 − 0.48)/(0.03)² ≈ 1066
Example C.4
Suppose we want to determine the sample size so that the probability that the sample mean differs from the population mean by no more than 15 psi is 0.95. From past experience, it is known that the standard deviation for bursting pressures of this seal is 150 psi.
n = (z_(α/2) σ / E)²

n = ((1.96)(150)/15)² ≈ 385
Case 1
Assumptions: the two samples are independent, the sample sizes n₁ and n₂ are large (≥ 30), and the population variances σ₁² and σ₂² are known.
If the above assumptions hold, then the confidence interval for the
difference between two population means is given by
(x̄₁ − x̄₂) ± z_(α/2) √(σ₁²/n₁ + σ₂²/n₂)    (xvi)

or

(x̄₁ − x̄₂) − z_(α/2) √(σ₁²/n₁ + σ₂²/n₂) ≤ µ₁ − µ₂ ≤ (x̄₁ − x̄₂) + z_(α/2) √(σ₁²/n₁ + σ₂²/n₂)    (xvii)
Case 2
Assumptions: the two samples are independent, the sample sizes n₁ and n₂ are large (≥ 30), and the population variances σ₁² and σ₂² are unknown, so the sample variances s₁² and s₂² are used in their place.
If the above assumptions hold, then the confidence interval for the
difference between two population means is given by
(x̄₁ − x̄₂) ± z_(α/2) √(s₁²/n₁ + s₂²/n₂)    (xviii)

or

(x̄₁ − x̄₂) − z_(α/2) √(s₁²/n₁ + s₂²/n₂) ≤ µ₁ − µ₂ ≤ (x̄₁ − x̄₂) + z_(α/2) √(s₁²/n₁ + s₂²/n₂)    (xix)
Case 3
Assumptions: the two samples are independent, the sample sizes are small (< 30), the populations are approximately normal, and the population variances are unknown but assumed equal (estimated by the pooled variance s_p²).
If the above assumptions hold, then the confidence interval for the
difference between two population means is given by
(x̄₁ − x̄₂) ± t_(n₁+n₂−2, α/2) √(s_p²(1/n₁ + 1/n₂))    (xx)

or

(x̄₁ − x̄₂) − t_(n₁+n₂−2, α/2) √(s_p²(1/n₁ + 1/n₂)) ≤ µ₁ − µ₂ ≤ (x̄₁ − x̄₂) + t_(n₁+n₂−2, α/2) √(s_p²(1/n₁ + 1/n₂))    (xxi)
Example C.4
The pooled variance is

s_p² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2) = [(14)(2.24)² + (19)(1.99)²] / 33 = 4.41

The confidence interval is

(x̄₁ − x̄₂) ± t_(n₁+n₂−2, α/2) √(s_p²(1/n₁ + 1/n₂))

With t_(33, 0.025) = 2.035, this gives

(14.54 − 15.36) ± (2.035)(0.72)

−0.82 ± 1.47

−2.29 to 0.65
or −2.29 ≤ µ₁ − µ₂ ≤ 0.65
Note that this interval contains zero (−2.29 to 0.65). Because zero is a plausible value for the difference, we cannot conclude that there is a difference in the average wage of union and nonunion workers.
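The pooled-variance interval can be verified in Python. The summary statistics below are taken from the example; the sample sizes n₁ = 15 and n₂ = 20 are inferred from the degrees of freedom (14 and 19) shown in the pooled-variance calculation.

```python
# A sketch of the pooled-variance interval above; n1 = 15 and n2 = 20 are
# inferred from the degrees of freedom (14 and 19) in the calculation.
import math
from scipy import stats

x1, x2 = 14.54, 15.36
s1, s2 = 2.24, 1.99
n1, n2 = 15, 20

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # pooled variance, ~4.41
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))                       # ~0.72
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)                   # ~2.035

diff = x1 - x2
print(f"{diff - t_crit * se:.2f} to {diff + t_crit * se:.2f}")
# about -2.28 to 0.64; the hand calculation above rounds to -2.29 to 0.65
```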
The sample proportions are

p̄₁ = x₁/n₁ and p̄₂ = x₂/n₂

The point estimate for the difference between the population proportions is (p̄₁ − p̄₂), and the confidence interval for the difference is given by

(p̄₁ − p̄₂) ± z_(α/2) √(p̄(1 − p̄)(1/n₁ + 1/n₂))    (xxiii)

where the pooled proportion p̄ is

p̄ = (x₁ + x₂)/(n₁ + n₂) or p̄ = (n₁p̄₁ + n₂p̄₂)/(n₁ + n₂)
Example C.5
Proportion of defectives using the improved method: p̄₁ = x₁/n₁ = 80/400 = 0.20

Proportion of defectives using the old method: p̄₂ = x₂/n₂ = 108/450 = 0.24

Combined or "pooled" proportion: p̄ = (x₁ + x₂)/(n₁ + n₂) = (80 + 108)/(400 + 450) = 0.221

The 95 percent confidence interval is

(p̄₁ − p̄₂) ± z_(α/2) √(p̄(1 − p̄)(1/n₁ + 1/n₂))

(0.20 − 0.24) ± (1.96) √((0.221)(1 − 0.221)(1/400 + 1/450))

−0.04 ± 0.06

−0.10 ≤ p₁ − p₂ ≤ 0.02
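A short verification of Example C.5 in Python, using the counts from the example:

```python
# A short verification of Example C.5, using the counts from the example.
import math
from scipy import stats

x1, n1, x2, n2 = 80, 400, 108, 450
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                        # pooled proportion, ~0.221

se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_crit = stats.norm.ppf(0.975)                        # ~1.96

diff = p1 - p2
print(f"{diff - z_crit * se:.2f} to {diff + z_crit * se:.2f}")   # about -0.10 to 0.02
```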
Hypothesis Testing
The control charts used in statistical process control are closely related to hypothesis testing. Hypothesis tests are also used in several quality control problems and form the basis of many of the statistical techniques discussed in this book.
H₀: µ = 60 mpg
H₁: µ ≠ 60 mpg
• The consumer group would gather the sample data and calculate
the sample mean, x .
• Compare the difference between the hypothesized value (µ) and
the value of the sample mean ( x ).
• If the difference is small, there is a greater likelihood that the hy-
pothesized value of the population mean is correct. If the difference
is large, there is less likelihood that the claim about the population
mean is correct.
Note that in hypothesis testing, the decision to reject or not to reject the
hypothesis is based on a single sample and therefore, there is always a chance
of not rejecting a hypothesis that is false, or rejecting a hypothesis that is
true. In fact, we always encounter two types of errors in hypothesis test-
ing. These are:

• Type I error (α): rejecting the null hypothesis when it is, in fact, true.
• Type II error (β): failing to reject the null hypothesis when it is false.

We also use another term known as the power of the test, defined as 1 − β, the probability of correctly rejecting a false null hypothesis. Note that µ₀ is the hypothesized value of the mean. There are three possible cases for testing the population mean. The test statistic or the formulas used to test the hypothesis are given below.
Case (1): Testing a single mean with known variance or known popula-
tion standard deviation σ and large sample: in this case, the sample mean
x follows a normal distribution and the test statistic is given as follows:
z = (x̄ − µ)/(σ/√n)    (ii)
Case (2): Testing a single mean with unknown variance or unknown population standard deviation σ and large sample: in this case, the sample mean x̄ follows a normal distribution and the test statistic is given by

z = (x̄ − µ)/(s/√n)    (iii)
Case (3): Testing a single mean with unknown variance or unknown pop-
ulation standard deviation σ and small (n < 30) sample. In this case, the
sample mean x follows a t-distribution and the test statistic is given by
t_(n−1) = (x̄ − µ)/(s/√n)    (iv)
Note that s is the sample standard deviation and n is the sample size.
There are different ways of testing a hypothesis. These will be illus-
trated with examples.
H₀: µ ≥ 60,000
H₁: µ < 60,000

The statement µ ≥ 60,000 is written under the null hypothesis, and the statement µ < 60,000 is written under the alternate hypothesis.
Note that the alternate hypothesis is opposite of the null hypothesis.
This is an example of a left-sided test. The left-sided test will reject the
null hypothesis (H0) below a specified hypothesized value of µ.
The alternate hypothesis is also known as the research hypothesis. If you
are trying to establish a certain hypothesis, then it should be written as
the alternate hypothesis.
The statement about the null hypothesis contains the claim or the
theory. Therefore, rejecting a null hypothesis is a strong statement. This is
the reason that the conclusion of a hypothesis test is stated as “reject the
null hypothesis” or “do not reject the null hypothesis.”
H₀: µ ≤ 24
H₁: µ > 24

H₀: µ = 1.5
H₁: µ ≠ 1.5
Example D.3
H 0 : µ ≥ 600
H1 : µ < 600
B) State and explain the type I and type II errors in this situation.
Type I error: rejecting H₀: µ ≥ 600 and concluding that the average production cost is less than $600 (µ < $600) when, in fact, it is not. Type II error: concluding that the average production cost is at least $600 when, in fact, it is less.
Example D.4
n = 30, α = 0.05, σ = 0.8, x̄ = 16.32

H₀: µ = 16
H₁: µ ≠ 16
3. Determine the appropriate level of significance (α) or use the given value
of significance, α
4. Select the appropriate distribution and test statistic to perform the test
The sample size is large and the population standard deviation is
known; therefore, use normal distribution with the following test
statistic:
z = (x̄ − µ)/(σ/√n)
5. Based on step 3, find the critical value or values and the area or areas of
rejection. Show the critical value(s) and the area or areas of rejection and
non rejection using a sketch
7. Use the test data (sample data) and find the value of the test statistic
z = (x̄ − µ)/(σ/√n) = (16.32 − 16)/(0.8/√30) = 2.19
8. Find out if the value of the test statistic is in rejection or non rejection
region; make appropriate decision and state your conclusion in terms of
the problem
We will test the hypothesis using p-value for the following two-sided test:
H 0 : µ = 15
H1 : µ ≠ 15
If p ≥ α, do not reject H₀
If p < α, reject H₀
First, using the appropriate test statistic formula, calculate the test statistic
value.
z = (x̄ − µ)/(s/√n) = (14.2 − 15)/(5/√50) = −1.13
The area beyond z = −1.13 in each tail is 0.1292. For a two-sided test, the p-value is the sum of the two tail areas, that is, 0.1292 + 0.1292 = 0.2584. Since p = 0.2584 > α = 0.02, do not reject H₀.
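The same p-value test can be scripted; the sketch below uses the summary values from this test (x̄ = 14.2, s = 5, n = 50, µ₀ = 15, α = 0.02).

```python
# A sketch of the two-sided p-value test above (x-bar = 14.2, s = 5,
# n = 50, hypothesized mean 15, alpha = 0.02).
import math
from scipy import stats

xbar, s, n, mu0, alpha = 14.2, 5, 50, 15, 0.02
z = (xbar - mu0) / (s / math.sqrt(n))          # about -1.13
p_value = 2 * stats.norm.cdf(-abs(z))          # two-sided: double the tail area
print(f"z = {z:.2f}, p-value = {p_value:.4f}") # p is about 0.26 > 0.02
print("reject H0" if p_value < alpha else "do not reject H0")
```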
Basic Assumptions: the two samples are random and independent, and the populations are normal or the sample sizes are large.
The hypothesis for testing the two means can be a two-sided test or
a one-sided test. The hypothesis is written in one of the following ways:
A) Test whether the two population means are equal: a two-sided test

H₀: µ₁ = µ₂ or H₀: µ₁ − µ₂ = 0
H₁: µ₁ ≠ µ₂ or H₁: µ₁ − µ₂ ≠ 0    (v)

B) Test if one population mean is larger than the other: a right-sided test

H₀: µ₁ ≤ µ₂ or H₀: µ₁ − µ₂ ≤ 0
H₁: µ₁ > µ₂ or H₁: µ₁ − µ₂ > 0    (vi)

C) Test if one population mean is smaller than the other: a left-sided test

H₀: µ₁ ≥ µ₂ or H₀: µ₁ − µ₂ ≥ 0
H₁: µ₁ < µ₂ or H₁: µ₁ − µ₂ < 0    (vii)
To test the difference between two means, the test statistics are selected based on the following cases:
Case 1: Sample sizes n1 and n2 are large (≥ 30) and the population vari-
ances σ 12 and σ 22 are known
If the sample sizes n1 and n2 are large (≥ 30) and the population vari-
ances σ 12 and σ 22 are known, then the sampling distribution of the dif-
ference between the sample means follows a normal distribution and the
test statistic is given by
z = [(x̄₁ − x̄₂) − (µ₁ − µ₂)] / √(σ₁²/n₁ + σ₂²/n₂)    (viii)
Case 2: Sample sizes n1 and n2 are large (≥ 30) and the population
variances σ 12 and σ 22 are unknown
If the sample sizes n1 and n2 are large (≥ 30) and the population vari-
ances σ 12 and σ 22 are unknown, then the sampling distribution of the
difference between the sample means follows a normal distribution and
the test statistic is given by
z = [(x̄₁ − x̄₂) − (µ₁ − µ₂)] / √(s₁²/n₁ + s₂²/n₂)    (ix)
Case 3: Sample sizes n₁ and n₂ are small (< 30) and the population variances σ₁² and σ₂² are unknown

If the sample sizes n₁ and n₂ are small (< 30) and the population variances σ₁² and σ₂² are unknown, then the sampling distribution of the difference between the sample means follows a t-distribution and the test statistic is given by

t_(n₁+n₂−2) = [(x̄₁ − x̄₂) − (µ₁ − µ₂)] / √(s_p²(1/n₁ + 1/n₂))    (x)
Important Note:
In equations (viii), (ix), and (x), the difference (µ₁ − µ₂) is zero in most cases. Also, these equations are valid under the following assumptions: the samples are independent random samples, the populations are approximately normal (or the samples are large), and, for equation (x), the two population variances are equal.
The assumption that the two population variances are equal may not
be correct. In cases where the variances are not equal, the test statistic
formula for testing the difference between the two means is different.
Example D.6
Suppose that two independent random samples are taken from two pro-
cesses with equal variances and we would like to test the null hypothesis
that there is no difference between the means of two processes or the
means of the two processes are equal; that is,
H 0 : µ1 − µ2 = 0 or H 0 : µ1 = µ2
H1 : µ1 − µ2 ≠ 0 H1 : µ1 ≠ µ2
x̄₁ = 104, n₁ = 80, s₁ = 8.4; x̄₂ = 106, n₂ = 70, s₂ = 7.6; α = 0.05
Since n₁ and n₂ are large and σ₁ and σ₂ are unknown, we use the normal distribution. The test statistic for this problem is

z = [(x̄₁ − x̄₂) − (µ₁ − µ₂)] / √(s₁²/n₁ + s₂²/n₂)
Solution: The test can be done using several methods; two are explained below.
z = [(x̄₁ − x̄₂) − (µ₁ − µ₂)] / √(s₁²/n₁ + s₂²/n₂) = [(104 − 106) − 0] / √((8.4)²/80 + (7.6)²/70) = −1.53

Since the test statistic value z = −1.53 > z_critical = −1.96, do not reject H₀.
The second method, the p-value approach, converts the test statistic value z = −1.53 from the first method into a probability (see Figure D.5). From the standard normal table, z = 1.53 corresponds to an area of 0.4370 between the mean and z, so each tail area is 0.5 − 0.4370 = 0.0630. For this two-sided test, the p-value is 2(0.0630) = 0.1260. Since 0.1260 > α = 0.05, do not reject H₀.
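Both approaches to Example D.6 can be combined in a short script; the summary statistics are those used in the calculation above.

```python
# A sketch of Example D.6 combining the critical-value and p-value
# approaches; the summary statistics are those used in the calculation above.
import math
from scipy import stats

x1bar, x2bar = 104, 106
s1, s2, n1, n2, alpha = 8.4, 7.6, 80, 70, 0.05

z = (x1bar - x2bar) / math.sqrt(s1**2 / n1 + s2**2 / n2)   # about -1.53
z_crit = stats.norm.ppf(1 - alpha / 2)                     # about 1.96
p_value = 2 * stats.norm.cdf(-abs(z))                      # about 0.126

print(f"z = {z:.2f}, critical value = +/-{z_crit:.2f}, p-value = {p_value:.4f}")
print("reject H0" if abs(z) > z_crit else "do not reject H0")
```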
In some situations, the sample values from one population may not be independent of the sample values from the other population. The two populations may be considered dependent in such cases.
In cases where the populations are considered related, the observa-
tions are paired to prevent other factors from inflating the estimate of the
variance. This method is used to improve the precision of comparisons
between means. The method of testing the difference between the two
means when the populations are related is also known as matched sample
test or the paired t-test.
We are interested in testing a two-sided or a one-sided hypothesis for
the difference between the two population means. The hypotheses can be
written as
Two-sided test:    H₀: µ_d = 0, H₁: µ_d ≠ 0
Right-sided test:  H₀: µ_d ≤ 0, H₁: µ_d > 0
Left-sided test:   H₀: µ_d ≥ 0, H₁: µ_d < 0
Test Statistic: If the pairs of data values X1n and X2n are related and are not
independent, the average of the differences ( d ) follows a t-distribution
and the test statistic is given by
t_(n−1) = (d̄ − µ_d) / (s_d/√n)    (xii)

where d̄ is the mean of the paired differences, s_d is the standard deviation of the differences, and n is the number of pairs.
The confidence interval given below can also be used to test the hypothesis
d̄ ± t_(n−1, α/2) s_d/√n    (xiii)
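A minimal sketch of the paired t-test in Python; the `before` and `after` arrays are hypothetical paired measurements, not data from the book, and SciPy's `ttest_rel` reproduces the statistic of equation (xii).

```python
# A minimal sketch of the paired (matched-sample) t-test; `before` and
# `after` are hypothetical paired measurements, not data from the book.
import numpy as np
from scipy import stats

before = np.array([12.1, 11.8, 13.0, 12.5, 11.9, 12.7])
after = np.array([11.6, 11.9, 12.4, 12.1, 11.5, 12.3])

d = before - after                                       # paired differences
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))    # equation (xii) with mu_d = 0
print(f"t = {t_stat:.3f}")

# scipy computes the same statistic and a two-sided p-value directly
t_scipy, p = stats.ttest_rel(before, after)
print(f"scipy: t = {t_scipy:.3f}, p = {p:.4f}")
```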
Summary
This section discussed three important topics that are critical to analyt-
ics. In particular, we studied sampling and sampling distribution, estima-
tion and confidence intervals, and hypothesis testing. Samples are used
to make inferences about the population and this can be done through
sampling distribution. The probability distribution of a sample statistic is
called its sampling distribution. We explained the central limit theorem
and its role in sampling, sampling distribution, and sample size deter-
mination. Besides sampling and sampling distribution, other key topics
covered included point and confidence interval estimates of means and
proportions.
Two types of estimates used in inferential statistics were discussed.
These estimates include (a) point estimates, which are single-value es-
timates of the population parameter, and (b) interval estimates or the
confidence intervals, which are a range of numbers that contain the
parameter with specified degree of confidence known as the confidence
level. Confidence level is a probability attached to a confidence interval
that provides the reliability of the estimate. In the discussion of estima-
tion, we also discussed the standard error of the estimates, the margin of
error, and the sample size determination.
We also discussed the concepts of hypothesis testing, which is directly
related to the analysis methods used in analytics. Hypothesis testing is
one of the most useful aspects of statistical inference. We provided several
examples on formulating and testing hypotheses about the population
mean and population proportion. Hypothesis tests are used in assessing
the validity of regression methods. They form the basis of many of the
assumptions underlying the analytical methods to be discussed in this
book.
Additional Readings
Albright, S. C., and W. Winston. 2015. Business Analytics: Data Analysis
and Decision Making. 5th ed. Boston, MA: Cengage Learning.
Albright, S. C., W. Winston, and C. Zappe. 2011. Data Analysis and Deci-
sion Making. 4th ed. Boston, MA: South Western Cengage Learning.
Anderson, D. R., D. J. Sweeny, T. A. William, J. D. Camm, and J. J.
Cochran. 2003. An Introduction to Management Science – Quantitative
Approaches to Decision Making. 10th ed. Boston, MA: South Western
Cengage Learning.
Benisis, A. 2010. Business Process Management: A Data Cube to Analyze Business Process Simulation Data for Decision Making. Saarbrücken, Germany: VDM Verlag Dr. Müller.
Bowerman, B. L., R. T. O’Connell, and E. S. Murphree. 2017. Busi-
ness Statistics in Practice Using Data, Modeling, and Analytics. 8th ed.
New York, NY: McGraw-Hill Education.
Box, G. E. P., and G. M. Jenkins. 1976. Time Series Analysis: Forecasting
and Control. 2nd ed. San Francisco, CA: Holden-Day.
Camm, J. D., J. J. Cochran, M. J. Fry, J. W. Ohlmann, D. R. Anderson,
D. J. Sweeney, and T. A. Williams. 2015. Essentials of Business Analyt-
ics, 1st ed. Boston, MA: Cengage Learning.
Gould, F. J., C. P. Schmidt, J. H. Moore, and L. R. Weatherford. 1998.
Introductory Management Science – Decision Making with Spread
Sheets. 5th ed. Upper Saddle River, NJ: Prentice Hall.
Montgomery, D. C., and L. A. Johnson. 1976. Forecasting and Time Series
Analysis. New York, NY: McGraw Hill.
Russell, R. S., and Taylor, B. W. 2014. Operations and Supply Chain Man-
agement. In Operations Management, eds. W. Stevenson and J William.
Hoboken, NJ: McGraw Hill.
Sahay, A. 2016a. Applied Regression and Modeling – A Computer Integrated
Approach. New York, NY: Business Expert Press.
Online References
The list of online research and related references is as follows:
[1] Geisser, S. (1993). Predictive Inference: An Introduction. Chapman & Hall. ISBN 978-0-412-03471-8.
[2] Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Berlin, Germany: Springer. ISBN 978-0-387-31073-2.
Index

Standard error of estimate, 122, 155
Standard normal distribution, 302–306
Stata, 127
Statistical analysis, 265
  data analytics, 267–268
  descriptive statistics, 265, 267
  inferential statistics, 267
Statistical dependence, 289–290
Statistical independence, 288–289
Statistical inference, 321. See also Inferential statistics
Subjective probability, 286
Supervised learning, 12
t-distribution, 314–315
  versus normal distribution, 315–317
t-test, 129–132, 161–162
Text analytics, 45–46
Text data mining. See Text mining
Text mining, 44–45
Third order model, 172–173
Time series analysis, 255, 257
Time series forecasting, 198
Tracking signal, 207–208
Trend, 200, 202–203
  forecasting data with, 224–228
  and seasonal patterns, 201
Unsupervised learning, 12
Variables, exploring relationships, 72
Variance, 294
Variance inflation factor (VIF), detecting multicollinearity using, 168–169
VIF. See Variance inflation factor
Web analytics, 47–48
Weighted moving average method, 217–219
z-value approach, hypothesis testing, 367–368