Business Intelligence - Chapter 1
BUSINESS INTELLIGENCE AND DATA MINING CYCLE
Dr. Nasim AbdulWahab Matar
EXPLANATION
Consider a retail business chain that sells many kinds of goods and services around
the world, online and in physical stores. It generates data about sales, purchases, and
expenses from multiple locations and time frames. Analyzing this data could help
identify fast-selling items, regional selling items, seasonal items, fast-growing
customer segments, and so on. It might also help generate ideas about what products
sell together, which people tend to buy which products, and so on. These insights and
intelligence can help design better promotion plans, product bundles, and store
layouts, which in turn lead to a better-performing business.
BUSINESS INTELLIGENCE
Business intelligence is a broad set of information technology
(IT) solutions that includes tools for gathering, analyzing, and
reporting information to the users about the performance of the
organization and its environment. These IT solutions are
among the most highly prioritized solutions for investment.
PATTERN RECOGNITION
A pattern is a design or model that helps grasp something. Patterns
help connect things that may not appear to be connected. Patterns help
cut through complexity and reveal simpler understandable trends.
Patterns can be as definitive as hard scientific rules, like the rule that
the sun always rises in the east. They can also be simple
generalizations, such as the Pareto principle, which states that 80
percent of effects come from 20 percent of the causes.
CHARACTERISTICS OF A PERFECT PATTERN OR MODEL
(a) accurately describes a situation,
(b) is broadly applicable, and
(c) can be described in a simple manner.
PATTERN TYPES
1. Temporal
2. Spatial
3. Functional
Explanation
A temporal pattern would be that “some people are always late,” no matter what the occasion or time. Some
people may be aware of this pattern and some may not be. Understanding a pattern like this would help
dissipate a lot of unnecessary frustration and anger.
A spatial pattern, following the 80–20 rule, could be that the top 20 percent of customers lead to 80
percent of the business. Or 20 percent of products generate 80 percent of the business. Or 80 percent
of incoming customer service calls are related to just 20 percent of the products. This last pattern may
simply reveal a discrepancy between a product’s features and what the customers believe about the
product.
A functional pattern may involve test-taking skills. Some students perform well on essay-type questions.
Others do well in multiple-choice questions. Yet other students excel in doing hands-on projects, or in
oral presentations. An awareness of such a pattern in a class of students can help the teacher design a
balanced testing mechanism that is fair to all.
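The 80–20 pattern described above can be checked directly against data. The following is a minimal sketch, assuming a plain list of revenue figures per customer; the function name `top_share` and the sample numbers are hypothetical and used only for illustration.

```python
def top_share(revenues, top_fraction=0.2):
    """Return the share of total revenue contributed by the top fraction of customers."""
    if not revenues:
        raise ValueError("revenues must not be empty")
    ranked = sorted(revenues, reverse=True)
    # Always include at least one customer in the top group.
    top_count = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_count]) / sum(ranked)


# Hypothetical annual revenue per customer (made-up numbers).
revenue_by_customer = [5200, 4800, 300, 250, 220, 200, 180, 150, 120, 80]
share = top_share(revenue_by_customer)
print(f"Top 20% of customers generate {share:.0%} of revenue")
```

The same function could be applied to revenue per product, or to service calls per product, to test the other 80–20 statements.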
WHY PATTERNS
Data mining is the act of digging into large amounts of raw data to
discover unique, nontrivial, useful patterns. Data is cleaned up, and
then special tools and techniques can be applied to search for
patterns. Diving into clean and nicely organized data from the right
perspectives can increase the chances of making the right
discoveries.
DATA PROCESSING CHAIN
Data is the new natural resource. Implicit in this statement is the recognition of hidden
value in data. Data lies at the heart of business intelligence. There is a sequence of
steps to be followed to benefit from the data in a systematic way. Data can be modeled
and stored in a database. Relevant data can be extracted from the operational data
stores according to certain reporting and analyzing purposes, and stored in a data
warehouse. The data from the warehouse can be combined with other sources of data,
and mined using data mining techniques to generate new insights. The insights need to
be visualized and communicated to the right audience in real time for competitive
advantage.
DATA
Anything that is recorded is data. Observations and facts are data. Anecdotes
and opinions are also data, of a different kind. Data can be numbers, such as
the record of daily weather or daily sales. Data can be alphanumeric, such as
the names of employees and customers.
1. Data could come from any number of sources
2. Data can come in many ways
3. There is also data about data
DATA CAN BE OF DIFFERENT TYPES.
1. Data could be an unordered collection of values (nominal data). For example, a retailer sells shirts of
red, blue, and green colors.
2. Data could be ordered values like small, medium, and large (ordinal data).
3. Another type of data has discrete numeric values defined in a certain range, with the
assumption of equal distance between the values (interval data). A customer satisfaction score may be
rated on a 10-point scale with 1 being lowest and 10 being highest.
4. The highest level of numeric data is ratio data that can take on any numeric value. The
weights and heights of all employees would be exact numeric values.
5. There is another kind of data that does not lend itself to much mathematical analysis, at
least not directly. Such data needs to be first structured and then analyzed. This includes
data like audio, video, and graph files, often called BLOBs (Binary Large Objects). These
kinds of data lend themselves to different forms of analysis and mining. Songs can be
described as happy or sad, fast-paced or slow, and so on. They may contain sentiment and
intention, but these are not quantitatively precise.
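The first four types in the list above are commonly called nominal, ordinal, interval, and ratio data, and each supports a different set of meaningful operations. The sketch below illustrates this with Python's standard library; all sample values and variable names are hypothetical.

```python
from statistics import mean, mode

# Hypothetical sample values for each kind of data.
shirt_colors = ["red", "blue", "red", "green", "red"]            # nominal
shirt_sizes = ["small", "large", "medium", "small", "medium"]    # ordinal
satisfaction = [7, 9, 4, 8, 10]                                  # interval (10-point scale)
weights_kg = [61.5, 72.0, 80.3, 55.2]                            # ratio

# Nominal: values can only be counted, so the mode is meaningful.
most_common_color = mode(shirt_colors)

# Ordinal: values can be ranked once an order is defined, so a median is meaningful.
size_order = {"small": 0, "medium": 1, "large": 2}
ranked_sizes = sorted(shirt_sizes, key=size_order.get)
median_size = ranked_sizes[len(ranked_sizes) // 2]

# Interval: equal spacing is assumed, so averages and differences are meaningful.
average_satisfaction = mean(satisfaction)

# Ratio: a true zero exists, so ratios between values are meaningful.
heaviest_to_lightest = max(weights_kg) / min(weights_kg)

print(most_common_color, median_size, average_satisfaction, round(heaviest_to_lightest, 2))
```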
DATAFICATION
Datafication is a new term that means that almost every
phenomenon is now being observed and stored.
THE FIVE CS OF DATA
Before a BI/DW program can deliver actionable information to business people, it must whip the enterprise’s data into
shape. Data that has been whipped into shape will be clean, consistent, conformed, current, and comprehensive—the five
Cs of data.
• Clean—dirty data can really muddy up a company’s attempt at real-time disclosure and puts the CFO at high risk when
signing off on financial reports and even press releases based on incorrect information. Dirty data has missing items, invalid
entries, and other problems that wreak havoc with automated data integration and data analysis. Customer and prospect
data, for example, is notorious for being dirty. Most source data is dirty to some degree, which is why data profiling and
cleansing are critical steps in data warehousing.
• Consistent—there should be no arguments about whose version of the data is the correct one. Management meetings
should never have to break down into arguments about whose number is correct when they really need to focus on how to
improve customer satisfaction, increase sales, or improve profits. Business people using different hierarchies or calculations
for metrics will argue regardless of how clean the transactional data is.
• Conformed—the business needs to analyze the data across common, shareable dimensions if business people across the
enterprise are to use the same information for their decision-making.
• Current—the business needs to base decisions on whatever currency is necessary for that type of decision. In some
cases, such as detecting credit card fraud, the data needs to be up to the minute.
• Comprehensive—business people should have all the data they need to do their jobs—regardless of where the data
came from and its level of granularity.
DATABASE
A database is a modeled collection of data that is accessible in many
ways. A data model can be designed to integrate the operational data of
the organization. The data model abstracts the key entities involved in
an action and their relationships. Most databases today follow the
relational data model and its variants. Each data modeling technique
imposes rigorous rules and constraints to ensure the integrity and
consistency of data over time.
DATA WAREHOUSE
A data warehouse is an organized store of data from all over the organization, specially designed to help make management
decisions. Data can be extracted from the operational database to answer a particular set of queries. This data, combined with
other data, can be rolled up to a consistent granularity and uploaded to a separate data store called the data warehouse.
Therefore, the data warehouse is a simpler version of the operational database, with the purpose of addressing reporting and
decision-making needs only. The data in the warehouse grows cumulatively as more operational data becomes available and
is extracted and appended to the data warehouse. Unlike in the operational database, the data values in the warehouse are
not updated.
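The roll-up and append-only behaviour described above can be sketched in a few lines. This is a minimal illustration, assuming sales records arrive as (timestamp, store, amount) tuples and that daily totals per store are the chosen granularity; the record layout and the function names `roll_up_daily` and `load` are hypothetical.

```python
from collections import defaultdict

# Hypothetical operational records: one row per sale (timestamp, store, amount).
operational_sales = [
    ("2024-01-05T09:15", "store_a", 20.0),
    ("2024-01-05T17:40", "store_a", 35.0),
    ("2024-01-05T11:05", "store_b", 12.5),
    ("2024-01-06T10:30", "store_a", 50.0),
]


def roll_up_daily(sales):
    """Aggregate individual sales to one total per (day, store)."""
    totals = defaultdict(float)
    for timestamp, store, amount in sales:
        day = timestamp[:10]  # keep only the YYYY-MM-DD part
        totals[(day, store)] += amount
    return dict(totals)


def load(warehouse_rows, daily_totals):
    """Append new summary rows; existing rows are never updated in place."""
    for (day, store), total in sorted(daily_totals.items()):
        warehouse_rows.append({"day": day, "store": store, "total_sales": total})


# The warehouse is modeled here as an append-only list of rolled-up rows.
warehouse = []
load(warehouse, roll_up_daily(operational_sales))
for row in warehouse:
    print(row)
```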
DATA MINING
Data Mining is the art and science of discovering useful, innovative patterns from
data. There is a wide variety of patterns that can be found in the data. There are
many techniques, simple or complex, that help with finding patterns.
DATA MINING TECHNIQUES
The following are brief descriptions of some of the most important data mining techniques used to generate insights from
data.
Decision trees: They help classify populations into classes. It is said that 70 percent of all data mining work is about
classification solutions, and that 70 percent of all classification work uses decision trees. Thus, decision trees are the most
popular and important data mining technique. There are many popular algorithms for building decision trees. They differ in
their mechanisms, and each works well in different situations. It is possible to try multiple algorithms on a data set
and compare the predictive accuracy of each tree.
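To make the idea of a decision tree concrete, here is a minimal sketch of a one-level tree (a "stump") that picks the single split with the lowest Gini impurity. Gini impurity is one common splitting criterion and is an assumption here, since the slides do not name a specific algorithm; the customer data and all function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def split_labels(rows, index, threshold):
    """Partition labels by whether the chosen feature is <= threshold."""
    left = [label for features, label in rows if features[index] <= threshold]
    right = [label for features, label in rows if features[index] > threshold]
    return left, right


def train_stump(rows):
    """Train a one-level tree; assumes at least one split separates the rows."""
    best = None  # (weighted impurity, feature index, threshold)
    for index in range(len(rows[0][0])):
        for threshold in sorted({features[index] for features, _ in rows}):
            left, right = split_labels(rows, index, threshold)
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or impurity < best[0]:
                best = (impurity, index, threshold)
    _, index, threshold = best
    left, right = split_labels(rows, index, threshold)
    return {
        "feature": index,
        "threshold": threshold,
        "left": Counter(left).most_common(1)[0][0],
        "right": Counter(right).most_common(1)[0][0],
    }


def predict(stump, features):
    branch = "left" if features[stump["feature"]] <= stump["threshold"] else "right"
    return stump[branch]


# Hypothetical customers: (monthly visits, average basket value) -> segment label.
training_rows = [
    ((1, 20.0), "occasional"),
    ((2, 35.0), "occasional"),
    ((3, 15.0), "occasional"),
    ((8, 40.0), "regular"),
    ((10, 25.0), "regular"),
    ((12, 60.0), "regular"),
]
stump = train_stump(training_rows)
print(stump, predict(stump, (9, 30.0)))
```

A full decision tree applies the same split search recursively to each branch until the branches are pure enough or too small to split further.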
Regression: This is a well-understood technique from the field of statistics. The goal is to find a best
fitting curve through the many data points. The best fitting curve is that which minimizes the (error)
distance between the actual data points and the values predicted by the curve. Regression models
can be projected into the future for prediction and forecasting purposes.
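As a minimal sketch of the idea, the code below fits a straight line by ordinary least squares, which minimizes the sum of squared vertical distances between the data points and the line. A straight line is the simplest case of a best-fitting curve; the monthly sales figures and the name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales: month number and units sold.
months = [1, 2, 3, 4, 5]
units = [12, 14, 16, 18, 20]

slope, intercept = fit_line(months, units)
# Project the fitted line one month into the future.
forecast_month_6 = slope * 6 + intercept
print(slope, intercept, forecast_month_6)
```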
Artificial neural networks (ANNs): Originating in the field of artificial intelligence
and machine learning, ANNs are multilayer nonlinear information processing
models that learn from past data and predict future values. These models predict
well, leading to their popularity. The model’s parameters may not be very intuitive.
Thus, neural networks are opaque like a black box. These systems also require a
large amount of past data to adequately train the system.
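The "multilayer nonlinear" structure can be illustrated with a very small network. In the sketch below the weights are set by hand rather than learned, which is an assumption made to keep the example short; a real ANN would learn its weights from past data. The network computes XOR, a function that no single linear neuron can represent.

```python
def step(value):
    """Nonlinear activation: outputs 1 only when the input is positive."""
    return 1 if value > 0 else 0


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)


def tiny_network(x1, x2):
    """Two-layer network with hand-set (not learned) weights that computes XOR."""
    h_or = neuron((x1, x2), (1, 1), -0.5)    # hidden unit: at least one input is 1
    h_and = neuron((x1, x2), (1, 1), -1.5)   # hidden unit: both inputs are 1
    return neuron((h_or, h_and), (1, -1), -0.5)  # output: OR but not AND


outputs = {}
for a in (0, 1):
    for b in (0, 1):
        outputs[(a, b)] = tiny_network(a, b)
print(outputs)
```

Even in this tiny case the weights and biases do not explain themselves, which hints at why larger networks are described as black boxes.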
Cluster analysis: This is an important data mining technique for dividing and conquering large data
sets. The data set is divided into a certain number of clusters by discerning similarities and
dissimilarities within the data. Unlike decision trees and regression, cluster analysis has no one
right answer, including for the number of clusters in the data. The user needs to make a decision by
looking at how well the chosen number of clusters fits the data. This technique is most commonly
used for market segmentation.
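One widely used clustering algorithm is k-means; the slides do not name a specific algorithm, so its use here is an assumption. The sketch below runs k-means on one-dimensional data with caller-supplied starting centers. The spending figures and the name `k_means_1d` are hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Simple k-means on one-dimensional data with caller-supplied starting centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster.
        # A center with an empty cluster stays where it is.
        new_centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments have stabilized
        centers = new_centers
    return centers, clusters


# Hypothetical annual spend per customer.
annual_spend = [100, 120, 130, 900, 950, 1000]
centers, clusters = k_means_1d(annual_spend, centers=[0, 500])
print(centers, clusters)
```

Running the function with a different number of starting centers gives a different segmentation, which is why the user has to judge how well the chosen number of clusters fits the data.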
Association rule mining: Also called market basket analysis when used in the retail
industry, these techniques look for associations between data values. An analysis of
items frequently found together in a market basket can help cross-sell products and
create product bundles.
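Two standard measures for an association rule such as "bread → butter" are support (how often the items appear together) and confidence (how often the consequent appears given the antecedent). The sketch below computes both; the baskets are hypothetical.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)


# Hypothetical market baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "jam"},
]

rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Rules with high support and high confidence are the natural candidates for cross-selling and product bundles.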
COMMON EXAMPLES OF DATA MINING
Artificial Intelligence and Machine Learning
Both Artificial Intelligence and Machine Learning are gaining a lot of relevance in the world today,
and much of the credit goes to Data Mining. A system cannot be made “artificially intelligent” without
feeding it relevant data and patterns, and Data Mining is how those patterns are extracted.
One of the most common examples of AI and Machine Learning that you most likely come across
every day is the recommendation system. Has it ever happened that after buying a product
from Amazon, you were shown a list of recommended products and ended up buying one of them in
the blink of an eye? Amazon accomplishes this by studying and analyzing your past
data and behaviours. Using your behavioural trends, Amazon can rank products by
the probability that you will purchase them. While Amazon and other e-commerce websites use AI
to show product recommendations, video and music streaming platforms like Netflix and Spotify use
the same approach to curate your recommendations and playlists. The examples mentioned above use
Artificial Intelligence on top of the mined data. However, the reverse is also possible, i.e., a system
can develop a theory and then use data mining to test it. For example, if a self-driving car sees a red
Maruti travelling at twice the speed limit, it might develop a theory that all red Marutis overspeed.
The AI can then use Data Mining methods to strengthen or weaken the theory.
Service Providers
Service providers have been using Data Mining to retain customers for a very long
time. The techniques of Business Intelligence and Data Mining allow
these service providers to predict “churn”, a term used for when a customer
leaves them for another service provider.
Today, service providers hold terabytes of data on their customers. This data
includes billing information, customer service interactions, website
visits, and so on. By mining and analyzing this data, the service providers assign
a probability score to each customer. This score reflects how
likely the customer is to switch providers. The companies then target the customers at
higher risk with incentives and personalized attention in order to retain
them.
Supermarkets and Retail Stores
Data mining can allow supermarket owners to know your choices and preferences even better than
you know them yourself. A widely reported example involves Target a few years back.
By following the purchase history and behaviour of one of its female customers, Target reportedly
inferred that she was pregnant before her own family knew. Such is the power of data, patterns,
and analysis.
In general, these retail stores divide customers into what they call “recency, frequency, monetary”
(RFM) groups and target specific groups with different campaigns and strategies. So, a customer who
spends a lot but infrequently will be treated differently from a customer who spends little but often. The
latter kind may receive loyalty, upsell, or cross-sell offers, whereas the former might be offered a
win-back deal, for instance.
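The RFM grouping described above can be sketched as follows. This is a minimal illustration, assuming each customer's history is a list of (purchase date, amount) pairs; the thresholds, segment names, and customer data are hypothetical and would be chosen by the business in practice.

```python
from datetime import date


def rfm(purchases, today):
    """Compute recency (days), frequency (count), and monetary (total) for one customer."""
    last_purchase = max(day for day, _ in purchases)
    return {
        "recency_days": (today - last_purchase).days,
        "frequency": len(purchases),
        "monetary": sum(amount for _, amount in purchases),
    }


def segment(scores, frequent_at=5, big_spender_at=500.0):
    """Assign a coarse segment; the thresholds are illustrative assumptions."""
    frequent = scores["frequency"] >= frequent_at
    big = scores["monetary"] >= big_spender_at
    if frequent and big:
        return "high value"
    if big:
        return "big but infrequent spender"
    if frequent:
        return "small but frequent spender"
    return "low engagement"


# Hypothetical purchase histories: (purchase date, amount).
today = date(2024, 7, 1)
customers = {
    "alice": [(date(2024, 1, 10), 400.0), (date(2024, 2, 20), 350.0)],
    "bob": [
        (date(2024, 6, 1), 12.0), (date(2024, 6, 5), 12.0), (date(2024, 6, 9), 12.0),
        (date(2024, 6, 14), 12.0), (date(2024, 6, 20), 12.0), (date(2024, 6, 27), 12.0),
    ],
}

segments = {}
for name, history in customers.items():
    segments[name] = segment(rfm(history, today))
print(segments)
```

This sketch segments on frequency and monetary value only; recency is computed so that a campaign such as a win-back offer could also be keyed to how long a customer has been inactive.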
Science, Engineering, and Education
The areas of science and engineering have changed substantially since the application of data mining
techniques. Let’s look at some specific fields that make use of Data Mining techniques:
– Sequence mining finds extensive use in the study of human genetics. It helps in understanding the
relationship between variations in DNA sequence and variability in susceptibility to diseases. Simply
put, it aims to find out how changes in DNA correspond to the risk of developing common diseases,
which will aid significantly in improving methods of diagnosing, preventing, and treating these diseases.
– In educational research, data mining is used to understand the factors that lead students to
engage in behaviours which reduce their learning and efficiency.
– In electrical power engineering, data mining methods have been widely used for
condition monitoring of high-voltage electrical equipment. The aim is to obtain valuable information on
safety-related parameters, such as the status of insulation, to avoid mishaps.
Crime Prevention Agencies
The use of Data Mining and analytics is not restricted to corporate applications
or to education and technology, as the last example on this list shows.
Crime prevention agencies also use data analytics to spot trends across large
volumes of data, including details of all the major criminal activities that
have happened.
Mining this data and studying its patterns and trends allows these agencies
to predict future events with much better accuracy. With the help of Data Mining
and analytics, these agencies can decide where to deploy the most police manpower
(where is the next crime most likely to happen, and when?), whom to search at a
border crossing (based on the type or age of the vehicle, the number or age of
occupants, or border crossing history), and even which intelligence to take
seriously in counter-terrorism activities.
DATA VISUALIZATION
As data and insights grow in number, a new requirement is the ability of the
executives and decision makers to absorb this information in real time. There is a
limit to human comprehension and visualization capacity. That is a good reason to
prioritize and manage with fewer but key variables that relate directly to the key
result areas of a role.
EXAMPLE OF DATA VISUALIZATION
Sample data visualization
HOMEWORK
• Create a Decision Tree chart for one of your daily routines and
apply it with your colleagues to identify a pattern of use or behaviour.
THANK YOU
Dr. Nasim AbdulWahab Matar
Head of E-Business and MIS Department @ University of Petra
EXT: 9400