
TECHNIQUES FOR DESCRIPTIVE ANALYTICS

In any Business Analytics (BA) undertaking, referred to as a BA initiative or project, a set of
objectives is articulated. These objectives are a means to align the BA activities to support
strategic goals. The objectives might be to seek out new business opportunities, to solve
operational problems the firm is experiencing, or to grow the organization. It is from the
objectives that exploration via BA originates and is in part guided. The directives that come
down from the strategic planners in an organization to the BA department or analyst give focus
to the tactical efforts of the BA initiative or project. Perhaps the assignment will be one of
exploring internal marketing data for the marketing of a new product. Perhaps the BA assignment will
be focused on enhancing service quality by collecting engineering and customer service
information. Regardless of the type of BA assignment, the first step is one of exploring data and
revealing new, unique, and relevant information to help the organization advance its goals.
Doing this requires an exploration of data. This section focuses on how to undertake the first step
in the BA process: descriptive analytics.
Techniques for Descriptive Analytics are statistical techniques that are used to numerically or
graphically present a summary of an organisation's existing data in order to understand what has
happened in the past or what is happening currently. It is the simplest form of analytics and allows the
organisation to gain insight from historical data, using various exploration techniques such as
numerical summary statistics, data visualization, descriptive data mining, and so on. Some of the
common methods are explored in the sections below.
NUMERICAL DESCRIPTION
This includes basic statistical methods like mean, median, mode (central tendency), standard
deviation, variance (dispersion), correlation, etc.
Some of these descriptive statistics are discussed below as tools that are helpful in
understanding the informational value of data sets. For each statistic, the computation, application area, an example application, and application notes are given.
N or Count: Number of values in the data distribution. Application area: any. Example: the sample size of a company's transactions during a given month. Notes: useful in knowing how many items were used in the statistics computations.

Sum: Total of the values in the entire data set. Application area: any. Example: total sales for a company. Notes: useful in knowing the total value.

Mean: Average of all values. Application area: any. Example: average sales per month. Notes: useful in capturing the central tendency of the data set.

Median: Midpoint value in the data set arranged from high to low. Application area: finding the midpoint in the distribution of the data. Example: total income per citizen of a country. Notes: useful in finding the point where 50 percent of the data lies above and 50 percent below.

Mode: Most common value in the data set. Application area: knowing where values are highly repeated in the data set. Example: fixed annual salaries where a limited number of wage levels are used. Notes: useful in declaring a common value in highly repetitive data sets.

Maximum/Minimum: Largest and smallest values, respectively. Application area: to conceptualize the spread of the data's distribution. Example: largest and smallest sales in a day. Notes: useful in providing a scope or end points of the data.

Range: Difference between the maximum and minimum values. Application area: a crude estimate of the spread of the data's distribution. Example: spread of sales, in units, during a given month. Notes: useful as a simple estimate of dispersion.

Standard deviation: Square root of the average of the squared differences between the mean value and each data value in the data distribution. Application area: a precise estimate of the spread of the data distribution from a mean value, in terms of the units used in the computation. Example: a standard deviation in Naira from average sales. Notes: the smaller the value, the less the variation and the more predictable the data set.

Variance: Average of the squared differences between a mean value and each observation in the data set. Application area: an estimate of the spread of the data distribution from a mean value, NOT in terms of the units used in the computation. Example: best for comparing one variance with another variance. Notes: the smaller the value, the less the variation and the more predictable the data set.

Coefficient of skewness: Positive or negative; a positive coefficient means the data distribution is positively skewed, and vice versa, and the larger the coefficient, the greater the skewness. Application area: measure of the symmetric nature (the degree of asymmetry) of the data around the mean. Example: as the population of a given country ages (having more old people than young), the age distribution becomes negatively skewed. Notes: the closer the coefficient of skewness is to zero, the more symmetric the data. Positively skewed data has its largest allocation to the left, and vice versa.
Fortunately, we do not need to compute these statistics by hand to know how to use them. Computer
software provides these descriptive statistics wherever they are needed or requested. An illustration
will be given in class using appropriate software and an example sales data set.
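As a minimal sketch of how these statistics can be obtained in software (here in Python with pandas, on a small invented sales series; the figures are illustrative only):

import pandas as pd

# Hypothetical monthly sales figures (illustrative values only)
sales = pd.Series([12369, 14250, 15800, 14900, 16100,
                   17250, 18309, 20167, 15500, 14503])

print("Count:", sales.count())          # N or Count
print("Sum:", sales.sum())              # total value
print("Mean:", sales.mean())            # central tendency
print("Median:", sales.median())        # midpoint of the distribution
print("Mode:", sales.mode().tolist())   # most common value(s)
print("Max/Min:", sales.max(), sales.min())
print("Range:", sales.max() - sales.min())
print("Std deviation:", sales.std())    # spread, in the original units
print("Variance:", sales.var())         # spread, in squared units
print("Skewness:", sales.skew())        # degree of asymmetry around the mean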

GRAPHICAL DESCRIPTION (DATA VISUALIZATION)


In addition to numerical descriptive statistics, learning how to effectively visualize data can be
another first step toward using data analytics and data science to your advantage and adding value to
your organization. No matter your role or title within an organization, data visualization is a skill
that is important for all professionals. Being able to effectively present complex data through
easy-to-understand visual representations is invaluable when it comes to communicating
information to people both inside and outside your business. Corporate executives and other
decision makers may not have time to go through numbers and voluminous reports. Visual
presentation of information can save their time and, at the same time, effectively deliver the message.
WHAT IS DATA VISUALIZATION?
Data visualization is the process of creating graphical representations of information. This
process helps the presenter communicate data in a way that is easy for the viewer to interpret and
draw conclusions.
There are many different techniques and tools you can leverage to visualize data, so you want to
know which ones to use and when. Here are some of the most important data visualization
techniques all professionals should know:
Bar Chart:
Bar charts are useful for summarizing mainly categorical variables (or sometimes continuous variables)
side by side. For example, you can use a bar chart to show the number of men and the
number of women who participated in a survey, or you can use a bar chart to show the mean
salary for men and the mean salary for women.
- Syntax in STATA, e.g.: graph bar npl car, or: graph bar fsize_dummy npl_quart
- To plot a categorical variable on the x axis against a continuous variable on the y axis, the syntax in STATA is, e.g.: graph bar npl, over(fsize_dummy)
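A rough Python equivalent of the bar chart examples (using pandas and matplotlib on made-up data; the names npl and bank mirror the STATA examples, but the values are invented):

import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data: non-performing loans (npl) for a few banks
df = pd.DataFrame({"bank": ["A", "B", "C", "D"],
                   "npl": [120, 95, 150, 80]})

df.plot.bar(x="bank", y="npl", legend=False)  # one bar per bank
plt.ylabel("Non-performing loans")
plt.title("Bar chart of NPL by bank")
plt.show()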
Pie Chart:
A pie chart produces sliced portions of a circle for each variable represented in the data. By default,
one slice corresponds to each category defined by the variable, with one slice representing all
missing values. Pie charts are labeled with value labels, or with the value if no label is defined.
- Syntax in STATA: graph pie variable_name, over(optional_variable_name)
- Example: to generate a pie chart depicting non-performing loans for each bank, type: graph pie npl, over(bank)
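A rough Python equivalent (using pandas and matplotlib; the bank names and npl values are made up for illustration):

import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data: non-performing loans by bank
df = pd.DataFrame({"npl": [120, 95, 150, 80]},
                  index=["Bank A", "Bank B", "Bank C", "Bank D"])

df["npl"].plot.pie(autopct="%1.1f%%")  # one slice per bank, labelled with its share
plt.ylabel("")
plt.title("Pie chart of NPL by bank")
plt.show()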

Histograms:
Histograms are useful for showing the distribution of a single scale (continuous) variable. Data are binned and
summarized using a count or percentage statistic. A variation of a histogram is a frequency
polygon, which is like a typical histogram except that the area graphic element is used instead of
the bar graphic element.
- Syntax in STATA: histogram var_name, normal
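A comparable sketch in Python (using matplotlib and scipy on randomly generated data; overlaying a fitted normal curve plays the role of the ", normal" option above):

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Illustrative data: a continuous variable such as monthly sales
values = np.random.default_rng(42).normal(loc=100, scale=15, size=200)

plt.hist(values, bins=20, density=True, alpha=0.6)          # binned distribution
x = np.linspace(values.min(), values.max(), 200)
plt.plot(x, stats.norm.pdf(x, values.mean(), values.std())) # fitted normal curve
plt.title("Histogram with fitted normal curve")
plt.show()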
Numerous other descriptive visualization techniques, such as Gantt charts, line/trend charts, dot plots, and so on, also exist. We will look at more of these and demonstrate them using various statistical software applications such as SPSS, Stata, spreadsheets and Python, as time permits.
PREDICTIVE MODELING
Predictive modeling means developing models that can be used to forecast or predict future
events. In business analytics, models can be developed based on logic or data.
Logic-Driven Models
A logic-driven model is one based on experience, knowledge, and logical relationships of
variables and constants connected to the desired business performance outcome situation. The
question here is how to put variables and constants together to create a model that can predict the
future. Doing this requires business experience. Model building requires an understanding of
business systems and the relationships of variables and constants that seek to generate a desirable
business performance outcome. To help conceptualize the relationships inherent in a business
system, diagramming methods can be helpful.
For example, the cause-and-effect diagram is a visual aid diagram that permits a user to
hypothesize relationships between potential causes of an outcome (see figure below). This
diagram lists potential causes in terms of human, technology, policy, and process resources in an
effort to establish some basic relationships that impact business performance. The diagram is
used by tracing contributing and relational factors from the desired business performance goal
back to possible causes, thus allowing the user to better picture sources of potential causes that
could affect the performance. This diagram is sometimes referred to as a fishbone diagram
because of its appearance.
Another useful diagram to conceptualize potential relationships with business performance
variables is called the influence diagram. According to Evans (2013, pp. 228–229), influence
diagrams can be useful to conceptualize the relationships of variables in the development of
models. An example of an influence diagram is presented in the next Figure. It maps the
relationship of variables and a constant to the desired business performance outcome of profit.
From such a diagram, it is easy to convert the information into a quantitative model with
constants and variables that define profit in this situation:
Profit = Revenue − Cost, or
Profit = (Unit Price × Quantity Sold) − [(Fixed Cost) + (Variable Cost × Quantity Sold)], or
P = (UP × QS) − [FC + (VC × QS)]
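As a small illustration, the influence-diagram model above can be coded directly (a Python sketch with invented numbers):

def profit(unit_price, quantity_sold, fixed_cost, variable_cost):
    """P = (UP x QS) - [FC + (VC x QS)]"""
    revenue = unit_price * quantity_sold
    total_cost = fixed_cost + variable_cost * quantity_sold
    return revenue - total_cost

# Illustrative numbers only
print(profit(unit_price=50, quantity_sold=1000, fixed_cost=20000, variable_cost=30))
# 50*1000 - (20000 + 30*1000) = 0, i.e. break-even in this made-up case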

The relationships in this simple example are based on fundamental business knowledge.
Consider, however, how complex cost functions might become without some idea of how they
are mapped together. It is necessary to be knowledgeable about the business systems being
modeled in order to capture the relevant business behavior. Cause-and-effect diagrams and
influence diagrams provide tools to conceptualize relationships, variables, and constants, but it
often takes many other methodologies to explore and develop predictive models.
Data-Driven Models
Logic-driven modeling is often used as a first step to establish relationships through data-driven
models (using data collected from many sources to quantitatively establish model relationships).
Some of the popular techniques for the use and application of data-driven models include:
Regression modelling:
Regression analysis is a common tool used in business, finance and other fields to study variable
dependency. This means that it can help a professional in these areas understand the relationship
between key variables. Learning about regression and its various methods can help you gain the
analytic skills necessary to succeed in a data-driven position.
Regression analysis is a mathematically measured correlation of variables used as a predictive
modeling method. You use regression modeling to predict numerical values depending on
various inputs. For example, you can understand the relationship between an independent and
dependent variable, allowing you to predict how the dependent variable changes along with its
independent counterpart. In this case, the dependent variable is what you’re measuring and the
independent variable is the factor that causes change.
In business, regression analysis can help:
(1) Forecast trends,
(2) Predict strengths and areas of weakness or
(3) Establish cause-and-effect relationships to make informed business decisions and strategic
plans.
You often calculate regression analysis through machine learning or artificial intelligence,
though there are also mathematical equations you can use. There are different analysis types that
you can use based on the nature of the variables you are predicting and what information you
would like to gather from your analysis. It could be a simple regression (having one independent
and a dependent variable), or multiple linear regression (involving more than one independent
variable).
In addition, depending on how the dependent variable is measured (whether nominal, ordinal,
count or continuous), different regression techniques are suitable. For instance, for a
binary dependent variable (i.e. a dependent variable that is measured as 0 and 1), binary logit
regression or binary probit regression is the appropriate technique. If the dependent
variable is instead measured as a categorical variable with more than two categories (1, 2, 3, ...),
nominal (multinomial) logit or probit regression is the appropriate technique, provided the categories are
purely nominal, without any order of ranking or weights assigned. If, however, the categories are
ordered (ranked/weighted), then ordered logit or ordered probit regression is the appropriate
regression technique to be used.
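As a minimal sketch of regression modelling (in Python using the statsmodels library on synthetic data; the variable names and coefficients are invented for illustration, not taken from any real data set):

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
df = pd.DataFrame({
    "advertising": rng.uniform(10, 100, n),
    "price": rng.uniform(5, 20, n),
})
# Made-up "true" relationship plus noise, for illustration only
df["sales"] = 50 + 2.5 * df["advertising"] - 3 * df["price"] + rng.normal(0, 10, n)

X = sm.add_constant(df[["advertising", "price"]])   # multiple linear regression
model = sm.OLS(df["sales"], X).fit()
print(model.summary())

# For a binary dependent variable, a logit model could be fitted instead, e.g.:
# sm.Logit(y_binary, X).fit()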
Correlation analysis
Correlation analysis is a statistical method that is used to discover whether there is a relationship
(positive or negative) between two variables/datasets, and how strong that relationship may be.
Positive Correlation
Any score from 0.1 to +1 indicates a positive correlation, which means that both variables increase at
the same time. The line of best fit, or trend line, is placed to best represent the data on the
graph. In this case, it follows the data points upwards to indicate the positive correlation.

Negative Correlation
Any score between -0.1 and -1 indicates a negative correlation, which means that as one variable increases,
the other decreases proportionally. The line of best fit can be seen here to indicate the negative
correlation; in these cases it slopes downwards from the point of origin.
No Correlation
Very simply, a score of 0 indicates that there is no correlation, or relationship, between the two
variables. The larger the sample size, the more accurate the result. No matter which formula is
used, this fact will stand true for all.
As a rule of thumb, a correlation coefficient between ±0.7 and ±1 indicates a strong positive or
negative correlation.
Correlation ≠ Causation
While a significant relationship may be identified by correlation analysis techniques, correlation
does not imply causation. The cause cannot be determined by the analysis, nor should this
conclusion be attempted. The significant relationship implies that there is more to understand and
that there are extraneous or underlying factors that should be explored further in order to search
for a cause. While it is possible that a causal relationship exists, it would be remiss of any
researcher to use the correlation results as proof of this existence.
Generally, correlation:
i. Assesses relationships between variables.
ii. Is useful in model development, as it is used to sieve out predictor variables that
have a weak association and are therefore of little value to a forecasting model.
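As a small illustration of correlation analysis in practice (a Python sketch using pandas on made-up data; the variable names are assumptions):

import pandas as pd

# Illustrative data set with three variables
df = pd.DataFrame({
    "advertising": [10, 20, 30, 40, 50, 60],
    "sales":       [15, 25, 33, 41, 55, 62],
    "returns":     [9,   8,  7,  9,  6,  5],
})

# Pairwise Pearson correlation coefficients (each between -1 and +1)
print(df.corr())

# A coefficient near +/-0.7 or beyond would be read as a strong relationship;
# a value near 0 suggests little or no linear relationship.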
Simulation:
Simulation analysis is a method wherein a large number of calculations are made to obtain the
possible outcomes and probabilities for any choice of action.
In the context of business analytics, simulation analysis is a technique used to model business
processes, assess risks, and predict outcomes by creating and analyzing virtual scenarios. It
allows businesses to test different strategies, understand potential impacts, and make informed
decisions in a controlled, risk-free environment.
Key Aspects of Simulation Analysis in Business Analytics:
1. Creating Models of Business Processes:
Simulation analysis involves building models that replicate the key processes within a business,
such as sales, operations, or financial performance.
These models can incorporate variables like costs, demand, supply chain logistics, and other
operational factors.
2. Scenario Analysis and "What-If" Questions:
Businesses use simulation to explore various "what-if" scenarios by changing inputs or
conditions to see how they affect outcomes.
For example, a company might simulate the impact of a price change, a new marketing strategy,
or a disruption in the supply chain.
3. Monte Carlo Simulation:
A popular method in business analytics is Monte Carlo simulation, which uses random sampling
and statistical modeling to estimate the probability of different outcomes.
This approach is valuable for risk assessment, as it provides a range of possible results and their
likelihood, helping businesses understand the uncertainty and variability in their predictions.
4. Risk Analysis and Management:
Simulation analysis helps in identifying potential risks by showing how variations in key inputs
can impact the business.
By simulating various risk scenarios, businesses can develop strategies to mitigate potential
adverse effects.
5. Decision Support and Optimization:
Simulations support decision-making by providing insights into how different choices might
affect business performance.
For example, it can help optimize resource allocation, inventory levels, or production schedules
by simulating the outcomes of different strategies.
6. Sensitivity Analysis:
This involves examining how sensitive the results of a simulation are to changes in input
variables.
Sensitivity analysis helps identify the most critical factors that influence business outcomes,
guiding focus areas for improvement.
Applications in Business Analytics:
1. Financial Planning: Simulating different financial scenarios, such as changes in market
conditions, interest rates, or cash flow to forecast financial performance and guide
investment decisions.
2. Supply Chain Management: Modeling supply chain dynamics to optimize logistics,
inventory management, and reduce costs by predicting the impact of changes in demand
or supply disruptions.
3. Customer Behavior Analysis: Simulating customer interactions and purchase patterns to
predict the outcomes of marketing campaigns, pricing changes, or new product launches.
4. Operational Efficiency: Using simulations to model workflows and processes, identify
bottlenecks, and improve operational efficiency in manufacturing, service delivery, or
other business operations.
Example:
A retailer might use simulation analysis to predict the impact of a promotional discount on sales
volume. By modeling different discount levels, marketing spends, and consumer responses, the
retailer can identify the optimal discount that maximizes profit without excessively eroding
margins.
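A minimal Monte Carlo sketch of such a promotional-discount scenario (in Python with numpy; all figures and distributional assumptions here are invented for illustration):

import numpy as np

rng = np.random.default_rng(0)
n_trials = 10_000

# Hypothetical assumptions for a promotional discount scenario
base_demand = 1_000                     # expected units without the discount
price, unit_cost, discount = 500, 350, 0.10

# Uncertain demand uplift from the discount, modelled as a normal distribution
uplift = rng.normal(loc=0.25, scale=0.10, size=n_trials)
demand = base_demand * (1 + uplift)

profit = demand * (price * (1 - discount) - unit_cost)

print("Expected profit:", profit.mean().round(0))
print("5th-95th percentile:", np.percentile(profit, [5, 95]).round(0))
print("Probability of loss:", (profit < 0).mean())

Re-running the sketch with different discount levels would show how the distribution of profit shifts, which is the essence of the "what-if" analysis described above.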
Put simply, simulation projects the future behaviour of variables by simulating the past
behaviour found in their probability distributions.
Many other techniques and algorithms for predictive modelling exist.
Data Mining
Data mining is a discovery-driven software application process that provides insights into
business data by finding hidden patterns and relationships in big or small data and inferring rules
from them to predict future behavior. These observed patterns and rules guide decision-making.
This is not just numbers, but text and social media information from the Web. For example,
Abrahams et al. (2013) developed a set of text-mining rules that automobile manufacturers could
use to distill or mine specific vehicle component issues that emerge on the Web but take months
to show up in complaints or other damaging media. These rules cut through the mountainous
data that exists on the Web and are reported to provide marketing and competitive intelligence to
manufacturers, distributors, service centers, and suppliers. Identifying a product’s defects and
quickly recalling or correcting the problem before customers experience a failure reduces
customer dissatisfaction when problems occur.
Data mining could be descriptive, predictive or prescriptive. It is descriptive if the purpose is just
to picture out a given pattern in the data. Example: sorting sales data by gender just to identify
which gender group patronize which product. That is descriptive data mining. If however, the
sales record is used to identify seasonal demand pattern, so as to predict when the company may
likely experience high demand, then the data mining here can be categorized as predictive. Using
the predicted pattern to optimize inventory levels and minimize stockouts and overstocking
qualifies the mining as prescriptive. For this reason, some of the same tools used in the
descriptive analytics step may be used in the predictive step but are employed to establish a
model (either based on logical connections or quantitative formulas) that may be useful in
predicting the future.
Several methodologies for data mining exist. Depending on the type of information required,
an appropriate technique may be adopted. See the sample below:

Data Mining – Market Basket Analysis


In market basket analysis (also called association analysis or frequent item-set mining), you
analyze purchases that commonly happen together. For example, people who buy bread and
peanut butter also buy jelly. Or people who buy shampoo might also buy conditioner. What
relationships there are between items is the target of the analysis. Knowing what your customers
tend to buy together can help with marketing efforts and store/website layout.
Market basket analysis isn’t limited to shopping carts. Other areas where the technique is used
include analysis of fraudulent insurance claims or credit card purchases. Market basket analysis
can also be used to cross-sell products. Amazon uses an algorithm to suggest items that you might be
interested in, based on your browsing history or what other people have purchased.

A grocery store used market basket analysis and found that men were likely to buy beer and
diapers together. Sales increased by placing beer next to the diapers.
It sounds simple (and in many cases, it is). However, there are pitfalls to be aware of:
- For large inventories (i.e. over 10,000 items), the combinations of items may explode into the
billions, making the math almost impossible.
- Data is often mined from large transaction histories. A large amount of data is usually
handled by specialized statistical software (see below).
Basic Terminology in Market Basket analysis
An itemset is the set of items a customer buys at the same time. It is typically stated as a logic
rule like IF {bread, peanut butter} THEN {jelly}. An itemset can consist of anything from no items (a null
itemset, though, is usually ignored) to all items in the data set.
The support count is a count of how often the itemset appears in the transaction database.
The support is how often the itemset appears, stated as a probability. For example, if the support
count is 21 out of a possible 1,000 transactions, then the support is 21/1,000 or 0.021.
The confidence is the conditional probability that the items will be purchased together.
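A minimal sketch of these calculations (in Python, on a made-up transaction list; the items and the rule are only illustrative):

# Minimal illustration of support and confidence on made-up transactions
transactions = [
    {"bread", "peanut butter", "jelly"},
    {"bread", "peanut butter"},
    {"bread", "jelly"},
    {"milk", "bread"},
    {"peanut butter", "jelly"},
]

antecedent = {"bread", "peanut butter"}
consequent = {"jelly"}

n = len(transactions)
support_count = sum(antecedent <= t for t in transactions)              # how often the itemset appears
both_count = sum((antecedent | consequent) <= t for t in transactions)  # itemset plus consequent

support = support_count / n
confidence = both_count / support_count   # P(jelly | bread, peanut butter)

print(f"Support of {antecedent}: {support:.2f}")
print(f"Confidence of rule IF {antecedent} THEN {consequent}: {confidence:.2f}")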
Calculations
Calculations are rarely performed by hand, due to the large number of combinations possible from
even relatively small datasets. Software that can perform market basket analysis includes:
- SAS® Enterprise Miner (Association Analysis).
- SPSS Modeler (Association Analysis).
- R (Data Mining Association Rules).

Data Mining – Cluster Analysis


Cluster analysis is a statistical method for processing data. It works by organizing items into
groups – or clusters – based on how closely associated they are. The objective of cluster analysis is
to find similar groups of subjects, where the “similarity” between each pair of subjects represents a
unique characteristic of the group vs. the larger population/sample. Strong differentiation between
groups is indicated through separate clusters; a single cluster indicates extremely homogeneous data.

Cluster analysis is an unsupervised learning algorithm, meaning that you don’t know how many
clusters exist in the data before running the model. Unlike many other statistical methods, cluster
analysis is typically used when there is no assumption made about the likely relationships within
the data. It provides information about where associations and patterns in data exist, but not what
those might be or what they mean.

When should cluster analysis be used?


Cluster analysis is for when you’re looking to segment or categorize a dataset into groups based
on similarities, but aren’t sure what those groups should be.
While it’s tempting to use cluster analysis in many different research projects, it’s important to
know when it’s genuinely the right fit. Here are three of the most common scenarios where
cluster analysis proves its worth.
Exploratory data analysis
When you have a new dataset and are in the early stages of understanding it, cluster analysis can
provide a much-needed guide.
By forming clusters, you can get a read on potential patterns or trends that could warrant deeper
investigation.
Market segmentation
This is a golden application for cluster analysis, especially in the business world. Because when
you aim to target your products or services more effectively, understanding your customer base
becomes paramount.
Cluster analysis can carve out specific customer segments based on buying habits, preferences or
demographics, allowing for tailored marketing strategies that resonate more deeply.
Resource allocation
Be it in healthcare, manufacturing, logistics or many other sectors, resource allocation is often
one of the biggest challenges. Cluster analysis can be used to identify which groups or areas
require the most attention or resources, enabling more efficient and targeted deployment.
How is cluster analysis used?
The most common use of cluster analysis is classification. Subjects are separated into groups so
that each subject is more similar to other subjects in its group than to subjects outside the group.
In a market research context, cluster analysis might be used to identify categories like age
groups, earnings brackets, urban, rural or suburban location.
In marketing, cluster analysis can be used for audience segmentation, so that different customer
groups can be targeted with the most relevant messages.
Healthcare researchers might use cluster analysis to find out whether different geographical areas
are linked with high or low levels of certain illnesses, so they can investigate possible local
factors contributing to health problems.
Employers, on the other hand, could use cluster analysis to identify groups of employees who
have similar feelings about workplace culture, job satisfaction or career development. With this
data, HR departments can tailor their initiatives to better suit the needs of specific clusters, like
offering targeted training programs or improving office amenities.
Whatever the application, data cleaning is an essential preparatory step for successful cluster
analysis. Clustering works at a data-set level where every point is assessed relative to the others,
so the data must be as complete as possible.
Cluster analysis in action: A step-by-step example
Here is how an online bookstore used cluster analysis to transform its raw data into actionable
insights.
Step one: Creating the objective
The bookstore’s aim is to provide more personalized book recommendations to its customers.
The belief is that by curating book selections that will be more appealing to subgroups of its
customers, the bookstore will see an increase in sales.
Step two: Using the right data
The bookstore has its own historical sales data, including two key variables: ‘favorite genre’,
which includes categories like sci-fi, romance and mystery; and ‘average spend per visit’.
The bookstore opts to hone in on these two factors as they are likely to provide the most
actionable insights for personalized marketing strategies.
Step three: Choosing the best approach
After settling on the variables, the next decision is determining the right analytical approach.
The bookstore opts for K-means clustering for the ‘average spend per visit’ variable because it’s
numerical – and therefore scalar data. For ‘favorite genre’, which is categorical – and therefore
non-scalar data – they choose K-medoids.
Step four: Running the algorithm
With everything set, it is time to crunch the numbers. The bookstore runs the K-means and K-
medoids clustering algorithms to identify clusters within their customer base.
The aim is to create three distinct clusters, each encapsulating a specific customer profile based
on their genre preferences and spending habits.
Step five: Validating the clusters
Once the algorithms have done their work, it’s important to check the quality of the clusters. For
this, the bookstore looks at intracluster and intercluster distances.
A low intracluster distance means customers within the same group are similar, while a high
intercluster distance ensures the groups are distinct from each other. In other words, the
customers within each group are similar to one another and the group of customers are distinct
from one another.
Step six: Interpreting the results
Now that the clusters are validated, it’s time to dig into what they actually mean. Each cluster
should represent a specific customer profile based solely on ‘favorite genre’ and ‘average spend
per visit’.
For example, one cluster might consist of customers who are keen on sci-fi and tend to spend
less than N200, while another cluster could be those who prefer romance novels and are in the
N200-400 spending range.
Step seven: Applying the findings
The final step is all about action. Armed with this new understanding of their customer base, the
bookstore can now tailor its marketing strategies.
Knowing what specific subgroups like to read and how much they’re willing to spend, the store
can send out personalized book recommendations or offer special discounts to those specific
clusters – aiming to increase sales and customer satisfaction.
Cluster analysis algorithms
Your choice of cluster analysis algorithm is important, particularly when you have mixed data.
Some of the common ones are:
Hierarchical clustering is a methodology that establishes a hierarchy of clusters, so that clusters can be
grouped at different levels of the hierarchy. Two strategies are suggested for this methodology: agglomerative and
divisive. The agglomerative strategy is a bottom-up approach, where one starts with each item in
the data and begins to group them. The divisive strategy is a top-down approach, where one
starts with all the items in one group and divides the group into clusters. How the clustering takes
place can involve many different types of algorithms and differing software applications.
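As a minimal sketch of the agglomerative (bottom-up) strategy in Python (using scipy on made-up customer data; the variables and values are assumptions for illustration):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Illustrative customer data: [average spend, visits per month]
X = np.array([[20, 2], [22, 3], [25, 2],
              [80, 10], [85, 12], [90, 11]])

Z = linkage(X, method="ward")                     # build the agglomerative hierarchy
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into 2 clusters
print(labels)                                     # e.g. [1 1 1 2 2 2]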
K-means clustering is a classification methodology that permits a set of data to be classified
into K groups, where K can be set as the number of groups desired. The algorithmic process
identifies initial candidates for the K groups and then iteratively searches other candidates in
the data set to be averaged into a mean value that represents a particular K group. The K-means
clustering process provides a quick way to classify data into differentiated groups. To illustrate
this process, use the sales data in the Figure below and assume these are sales from individual
customers. Suppose a company wants to classify the sales customers into high and low sales
groups.
The SPSS K-Mean cluster software can be found in Analyze > Classify > K-Means Cluster
Analysis. Any integer value can designate the K number of clusters desired. In this problem set,
K=2. The SPSS printout of this classification process is shown in Table below. The solution is
referred to as a Quick Cluster because it initially selects the first two high and low values. The
Initial Cluster Centers table listed the initial high (20167) and a low (12369) value from the data
set as the clustering process begins. As it turns out, the software divided the customers into nine
high sales customers with a group mean sales of 18,309 and eleven low sales customers with a
group mean sales of 14,503.
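A comparable sketch in Python (using scikit-learn rather than SPSS; the individual sales values are invented but echo the high and low figures mentioned above):

import numpy as np
from sklearn.cluster import KMeans

# Illustrative individual customer sales values
sales = np.array([12369, 13200, 14100, 14503, 14800, 15100,
                  15900, 16800, 17500, 18309, 19000, 20167]).reshape(-1, 1)

# K = 2: split customers into a high sales group and a low sales group
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(sales)
print("Cluster labels:", kmeans.labels_)
print("Cluster centres:", kmeans.cluster_centers_.round(0).ravel())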
CASE STUDIES
1. Case Study Background
A firm has collected a random sample of monthly sales information
on a service product offered infrequently and only for a month at a time. The sale of this service
product occurs only during the month that the promotion efforts are allocated. Basically,
promotion funds are allocated at the beginning or during the month, and whatever sales occur are
recorded for that promotion effort. There is no spillover of promotion to another month, because
monthly offerings of the service product are independent and happen randomly during any
particular year. The nature of the product does not appear to be impacted by seasonal or cyclical
variations, which prevents forecasting and makes planning the budget difficult.
The firm promotes this service product by using radio commercials, newspaper ads, television
commercials, and point-of-sale (POS) ad cards. The firm has collected the sales information as
well as the promotion expenses. Because the promotion expenses are put into place before the sales
take place, and on the assumption that the promotion efforts impact product sales, the four promotion
expenses can be viewed as predictive data sets (or what will be the predictive variables in a
forecasting model). Actually, in terms of modeling this problem, product sales is going to be
considered the dependent variable, and the other four data sets represent independent or
predictive variables.
These five data sets, in thousands of dollars, are presented in the SPSS printout shown in the Figure
below. What the firm would like to know is: given a fixed budget of N350,000 for promoting this
service product when it is offered again, how should the budget best be allocated in the hope of
maximizing estimated product sales in future months?
{This is a typical question asked of any product manager and marketing manager’s promotion
efforts. Before allocating the budget, there is a need to understand how to estimate future
product sales. This requires understanding the behavior of product sales relative to sales
promotion. To begin to learn about the behavior of product sales to promotion efforts, we begin
with the first step in the BA process: descriptive analytics.}
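As a hedged sketch of how such a model might eventually be set up in Python (using statsmodels on made-up data in place of the firm's actual figures; the names radio, newspaper, tv and pos are placeholders for the four promotion expenses):

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Made-up monthly observations standing in for the firm's data (in thousands)
rng = np.random.default_rng(7)
n = 24
df = pd.DataFrame({
    "radio":     rng.uniform(20, 120, n),
    "newspaper": rng.uniform(10, 80, n),
    "tv":        rng.uniform(50, 200, n),
    "pos":       rng.uniform(5, 40, n),
})
df["sales"] = (3 * df["radio"] + 1.5 * df["newspaper"] +
               2 * df["tv"] + 4 * df["pos"] + rng.normal(0, 30, n))

X = sm.add_constant(df[["radio", "newspaper", "tv", "pos"]])
model = sm.OLS(df["sales"], X).fit()
print(model.params)   # estimated sales contribution per thousand spent on each medium

# The fitted coefficients could then guide how a fixed N350,000 budget is split
# across the four media, e.g. by favouring media with the largest coefficients.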

CASE STUDY 2.
Suppose a grocery store has collected a big data file on what customers put into their baskets at
the market (the collection of grocery items a customer purchases at one time). The grocery store
would like to know if there are any associated items in a typical market basket. (For example, if a
customer purchases product A, she will most often associate it or purchase it with product B.) If
the customer generally purchases product A and B together, the store might only need to
advertise product A to gain both product A’s and B’s sales. The value of knowing this
association of products can improve the performance of the store by reducing the need to spend
money on advertising both products. The benefit is real if the association holds true. Finding the
association and proving it to be valid requires some analysis. From the descriptive analytics
analysis, some possible associations may have been uncovered, such as product A’s and B’s
association. With any size data file, the normal procedure in data mining would be to divide the
file into two parts. One is referred to as a training data set, and the other as a validation data set.
The training data set develops the association rules, and the validation data set tests and proves
that the rules work.
