
Business Analytics - Part-II


Siva Sivani Degree College
BBA (Business Analytics)–I year/II Semester Introduction to Business Analytics

Q-6) Explain about Data for Business Analytics. (or)


What is data? Explain about different types of Data.

Data for Business Analytics: -


• Business analytics uses data from three sources for construction of the
business model.
• It uses business data such as annual reports, financial ratios, marketing
research, etc.
• It uses the database which contains various computer files and information
coming from data analysis.

Overview of Using Data: Definitions and Goals: -


• Data: - Data are the facts and figures collected, analysed, and summarized
for presentation and interpretation.
• Variable: - A characteristic or a quantity of interest that can take on
different values is known as a variable.
• Observation: - An observation is a set of values corresponding to a set of
variables.
• Variation: - Variation is the difference in a variable measured over
observations (time, customers, items etc.). The role of descriptive analytics
is to collect and analyse data to gain a better understanding of variation
and its impact on the business setting.
• Random Variable (or) Uncertain Variable: -
▪ The values of some variables are under the direct control of the decision
maker; these are often called decision variables.
▪ The values of other variables may fluctuate with uncertainty because of
factors outside the direct control of the decision maker.
▪ In general, a quantity whose values are not known with certainty is called
a ‘random variable’ (or) ‘uncertain variable’.

Types of Data: -
• Population and Sample Data: -
▪ Data can be categorized in several ways based on how they are collected,
and the type collected. In many cases, it is not feasible to collect data
from the population of all elements of interest.
▪ In such instances, we collect data from a subset of the population known
as a sample.
▪ For example, with thousands of publicly traded companies in India,
tracking and analyzing all of these stocks every day would be too
time consuming and expensive. The NSE's NIFTY 50 index (referred to in
these notes simply as the NSE) represents a sample of 50 stocks of large
public companies based in India, and it is often interpreted to represent
the larger population of all publicly traded companies.
▪ It is very important to collect sample data that are representative of the
population data so that generalizations can be made from them. In most
cases, a representative sample can be gathered by random sampling
from the population data.
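
The idea of random sampling can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the population of 1,000 company identifiers, the sample size of 50, and the helper name draw_sample are all assumptions made for the example.

```python
import random

# Hypothetical population: 1,000 company identifiers (illustrative only).
population = [f"COMPANY_{i:04d}" for i in range(1, 1001)]

def draw_sample(items, size, seed=None):
    """Return a simple random sample drawn without replacement."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(items, size)

sample = draw_sample(population, 50, seed=42)
print(len(sample), "companies sampled from", len(population))
```

Because every element has the same chance of being selected, such a sample tends to be representative of the population, which is what allows generalizations to be made from it.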

• Quantitative and Categorical Data: -


▪ Data are considered quantitative data if they are numeric and arithmetic
operations, such as addition, subtraction, multiplication, and division,
can be performed on them.
▪ For instance, we can sum the values for Volume in the NSE data to
calculate a total volume of all shares traded by companies included in
the NSE.
▪ If arithmetic operations cannot be performed on the data, they are
considered categorical data.
▪ We can summarize categorical data by counting the number of
observations or computing the proportions of observations in each
category.
▪ For instance, we can count the number of companies in the NSE that are
in the telecommunications industry.
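
The difference between the two kinds of summary can be sketched in Python. The four observations below are hypothetical (they are not NSE data); the sketch only shows that a quantitative variable is summed while a categorical variable is counted and converted to proportions.

```python
from collections import Counter

# Hypothetical observations: (company, industry, volume of shares traded).
observations = [
    ("A", "Telecommunications", 1200),
    ("B", "Banking", 3400),
    ("C", "Telecommunications", 800),
    ("D", "Energy", 2100),
]

# Quantitative data: arithmetic such as addition is meaningful.
total_volume = sum(volume for _, _, volume in observations)

# Categorical data: summarize by counts and proportions per category.
industry_counts = Counter(industry for _, industry, _ in observations)
industry_share = {
    industry: count / len(observations)
    for industry, count in industry_counts.items()
}

print(total_volume)
print(industry_counts["Telecommunications"], industry_share["Telecommunications"])
```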

• Cross-Sectional and Time Series Data: -


▪ For statistical analysis, it is important to distinguish between cross-
sectional data and time series data.
▪ ‘Cross-sectional data’ are collected from several entities at the same, or
approximately the same, point in time.
▪ The data in NSE are cross-sectional because they describe the 50
companies that comprise the NSE at the same point in time (Nov 2023).
▪ ‘Time series data’ are collected over several time periods. Graphs of time
series data are frequently found in business and economic publications.
▪ Such graphs help analysts understand what happened in the past,
identify trends over time, and project future levels for the time series.
▪ For example, a graph of the NSE value from January 2023 to November
2023 displays time series data.

Q-7) Explain about the Business Decision Modelling.

• Business Decision Modelling is a structured process that predicts the
outcome of different scenarios and provides insights to business users.
• It's a forecasting tool that helps break down complex decision-making
processes into manageable components.

• Decision modelling can also help companies model complex operational
decisions into more manageable subsets, which can facilitate scalability.
• Decision modelling is the process of creating a structured and, typically,
visual representation of how the decisions are made within an
organisation.
• Decision models created through this process serve as visual aids helping
all involved stakeholders, including analysts and key decision-makers, to
comprehend all the important factors, business rules, and considerations
that impact choices within an organisation.
• The Business Decision Models are as follows:
o Creative Decision-Making Model
o Intuitive Decision-Making Model
o Rational Decision-Making Model

• Creative Decision-Making Model:


o The creative decision-making model involves using new ideas to solve
problems. It differs from other models, which focus on past successes.
o Creative decision-making involves:
▪ Considering all perspectives
▪ Solving problems in new ways
▪ Establishing new or better alternatives
▪ Discovering new products or services
▪ Thinking outside the box
▪ Generating innovative solutions
▪ Taking risks
▪ Exploring unconventional ideas

• Intuitive Decision-Making Model:


o The intuitive decision-making model is a decision-making process that
relies on feelings and instinct, rather than logical reasoning.
o It's often used by managers and team leaders to make quick decisions
when they don't have much time for planning or research.

o Intuitive decision making is especially beneficial in entrepreneurship,
ideation, creativity, and selling.
o Here are some steps for making intuitive decisions:
▪ Identify the decision to be made.
▪ Gather relevant information.
▪ Identify alternative solutions.
▪ Evaluate the options.
▪ Choose a course of action.
▪ Implement the decision.
▪ Review the outcome.

• Rational Decision-Making Model: -


o The Rational Decision-Making Model is a framework for making logical
and objective choices.
o It involves taking emotion out of decision-making and applying logical
steps to work towards a solution.
o The model assumes that you have clear objectives, complete and
accurate information, and unlimited time and resources to evaluate all
the possible alternatives and their consequences.
Definition:
o In contrast to intuitive decision making, the rational model of
decision making is one in which individuals use facts and
information, analysis, and a step-by-step procedure to come to a
decision. The rational model of decision making is a more advanced
type of decision-making model.

o The rational decision-making model can be used for a variety of
reasons, including educational purposes, business decisions, career
choices, and other significant life events.
o It allows for an objective approach that's based on scientifically
obtained data to reach informed decisions.
o It reduces the chance of errors and assumptions.
o It helps to minimize the influence of the manager's emotions, which might
have resulted in poor judgments in the past.


UNIT – II
Descriptive Analytics
Overview of Descriptive Statistics: - (Central Tendency, Variability)
Q-1) Explain about Measures of Location (or) Central Tendency
through MS-Excel: -
• Central tendency is a statistical measure that represents a single value
that is representative of an entire data distribution.
• It aims to provide an accurate description of the entire data.

• Mean (Arithmetic Mean): -


▪ The most commonly used measure of location is the mean (arithmetic
mean), or average value for a variable.
▪ The mean provides a measure of central location for the data. If the
data are for a sample, the mean is denoted by x̄ (x-bar).
▪ The sample mean is a point estimate of the population mean for the
variable of interest.
▪ For a sample with ‘n’ observations, the formula for the sample mean
is as follows:
x̄ = (x1 + x2 + ... + xn) / n = (Σ xi) / n
▪ The mean can be found in Excel using the AVERAGE function.
▪ With the Home Sales data from the table entered in an Excel spreadsheet,
the value for the mean in cell E3 is calculated using the formula:
=AVERAGE(C3:C14)
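
As a cross-check outside Excel, the same calculation can be sketched in Python. The five selling prices are hypothetical (the Home Sales table is not reproduced in these notes), and sample_mean is an illustrative helper name.

```python
from statistics import mean

# Hypothetical selling prices; not the actual Home Sales data.
selling_prices = [138000, 254000, 186000, 257500, 108000]

def sample_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

x_bar = sample_mean(selling_prices)
print(x_bar)                  # 188700.0
print(mean(selling_prices))   # standard-library equivalent of AVERAGE
```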

Median: -

▪ The median, another measure of central location, is the value in the
middle when the data are arranged in ascending order (smallest to
largest value).
▪ With an odd number of observations, the median is the middle value.
▪ An even number of observations has no single middle value. In this
case, we follow convention and define the median as the average of the
middle two values.

▪ The median of a data set can be found in Excel using the function
= MEDIAN( )
▪ The value for the median in cell E5 is found using the formula
=MEDIAN(B4:B13)
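
The odd and even cases described above can be sketched in Python. The two small samples are hypothetical, and middle_value is an illustrative helper written to mirror the definition; the standard library's statistics.median gives the same answers.

```python
from statistics import median

def middle_value(values):
    """Median: middle value, or average of the two middle values."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd number of observations
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even number

odd_sample = [46, 54, 42, 46, 32]     # hypothetical, 5 observations
even_sample = [254, 108, 186, 138]    # hypothetical, 4 observations

print(middle_value(odd_sample))       # 46
print(middle_value(even_sample))      # 162.0
```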

Mode: -
▪ A third measure of location, the mode, is the value that occurs most
frequently in a dataset.
▪ Consider the sample of five class sizes, 32 42 46 46 54
The only value that occurs more than once is 46. Because this value,
occurring with a frequency of 2, has the greatest frequency, it is the
mode.
▪ To find the mode for a data set with only one most often occurring
value in Excel, we use the MODE.SNGL function.
▪ Occasionally the greatest frequency occurs at two or more different
values, in which case more than one mode exists.
▪ If data contain at least two modes, we say that they are multimodal. A
special case of multimodal data occurs when the data contain exactly
two modes; in such cases we say that the data are bimodal.

▪ In multimodal cases when there are more than two modes, the mode is
almost never reported because listing three or more modes is not
particularly helpful in describing a location for the data.
▪ Also, if no value in the data occurs more than once, we say the data have
no mode.
▪ The Excel MODE.SNGL function will return only a single most-often-
occurring value.
▪ For multimodal distributions, we must use the MODE.MULT function
in Excel to return more than one mode.
▪ To find both of the modes in Excel, we take these steps:
Step 1: Select cells G7 and G8
Step 2: Type the formula =MODE.MULT(E4:E16)
Step 3: Press CTRL+SHIFT+ENTER after typing the formula
Excel enters the values for both modes of this data set in cells G7 and G8
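
The single-mode, multimodal, and no-mode cases can be sketched in Python. The class-size sample is the one used above; the other two lists are hypothetical, and modes is an illustrative helper that follows the convention in these notes (no mode when no value repeats).

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values; an empty list means 'no mode'."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                  # no value occurs more than once
        return []
    return sorted(v for v, c in counts.items() if c == highest)

class_sizes = [32, 42, 46, 46, 54]    # sample from the notes
print(modes(class_sizes))             # [46]
print(modes([1, 2, 2, 3, 3]))         # [2, 3]  (bimodal, hypothetical data)
print(modes([1, 2, 3]))               # []      (no mode, hypothetical data)
```

Note that Python's own statistics.multimode would return every value for the last list, so the helper above is needed to match the "no mode" convention.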

Geometric Mean:-
▪ The geometric mean is a measure of location that is calculated by
finding the nth root of the product of n values.
▪ The general formula for the sample geometric mean, denoted x̄g, is as
follows:
x̄g = (x1 × x2 × ... × xn)^(1/n)

▪ The geometric mean is often used in analyzing growth rates in financial
data. In these types of situations, the arithmetic mean or average value
will provide misleading results.
▪ We can use Excel to calculate the geometric mean for the given data by
using the function
=GEOMEAN( )

In the figure, the value for the geometric mean in cell C4 is found by using
the formula
=GEOMEAN(B3:B12).
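
Why the arithmetic mean can mislead for growth rates is easiest to see with a small sketch. The two growth factors below are hypothetical (a 50% gain followed by a 50% loss); the sketch assumes Python 3.8 or later for math.prod and statistics.geometric_mean.

```python
import math
from statistics import geometric_mean

# Hypothetical growth factors: +50% in year 1, then -50% in year 2.
growth_factors = [1.5, 0.5]
n = len(growth_factors)

arithmetic = sum(growth_factors) / n            # 1.0, suggests "no change"
geometric = math.prod(growth_factors) ** (1 / n)  # nth root of the product

# An investment of 100 actually ends at 100 * 1.5 * 0.5 = 75.
final_value = 100 * math.prod(growth_factors)

print(arithmetic, geometric, final_value)
print(math.isclose(geometric, geometric_mean(growth_factors)))
```

The geometric mean is below 1, correctly signalling an average loss per period, whereas the arithmetic mean of 1.0 hides it.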

Harmonic Mean (HM): -


▪ The harmonic mean is the reciprocal of the arithmetic mean of the
reciprocals of the values.
▪ Equivalently, it is calculated by dividing the number of values in a list
by the sum of the reciprocals of the values:
HM = n / (1/x1 + 1/x2 + ... + 1/xn)
▪ Harmonic mean can be used to calculate a mean that reduces the impact of
large outliers in the data set.
▪ For positive values, the harmonic mean is never greater than the geometric
mean (GEOMEAN), which is never greater than the arithmetic mean (AVERAGE).
The three are equal only when all the values are equal.
▪ The Excel HARMEAN function returns the harmonic mean for a set of
numeric values.
Syntax
=HARMEAN(number1,[number2],...)
number1 - First value or reference.
number2 - [optional] Second value or reference.

In the figure, the value for the harmonic mean in cell C4 is found by using
the formula
=HARMEAN(A2:A11).
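
The definition and the ordering of the three means can be checked with a short Python sketch. The two values are hypothetical, and harmonic is an illustrative helper; statistics.harmonic_mean and statistics.geometric_mean are the standard-library counterparts of HARMEAN and GEOMEAN.

```python
from statistics import geometric_mean, harmonic_mean, mean

values = [40, 60]   # hypothetical positive values

def harmonic(data):
    """Number of values divided by the sum of their reciprocals."""
    return len(data) / sum(1 / x for x in data)

hm = harmonic(values)          # approximately 48
gm = geometric_mean(values)    # approximately 48.99
am = mean(values)              # 50

print(hm, gm, am)
print(hm <= gm <= am)          # ordering of the three means
```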

Relationship among Mean, Median and Mode:


▪ For a moderately skewed distribution, the approximate relationship between
mean, median, and mode is known as the empirical relationship. It can be
expressed as:
Mean - Mode = 3(Mean - Median)
▪ You can also express it as:
Mode = 3 Median - 2 Mean

▪ The mean, median, and mode are all measures of the centre of a set of
data. You can use this relationship to estimate the third unknown quantity
if you know any two of the mean, median, or mode.
▪ For any given data, the mean is the average of the data values, calculated
by dividing the sum of all data values by the number of data values. The
median is the middlemost value of the data set when the data values
are arranged in either ascending or descending order. The mode is the most
frequently occurring data value.
▪ For a frequency distribution with a symmetrical frequency curve, the
relation between mean, median, and mode is given by:
Mean = Median = Mode
▪ For a positively skewed frequency distribution, the relation between
mean, median, and mode is:
Mean > Median > Mode
▪ For a negatively skewed frequency distribution, the relation between
mean, median, and mode is:
Mean < Median < Mode

Example:
The median and mean values of a series that is moderately
asymmetrical are 30 and 20, respectively. Determine the mode value.

Solution:
In the above example to calculate the value of mode, the following
formula is to be used:

Mode = 3 Median – 2 Mean

here, Median = 30 and Mean = 20.

Thus, Mode = 3(30) – 2(20)
Mode = 90 – 40
Mode = 50
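
The worked example can be reproduced with a small Python sketch of the empirical relationship. The function names are illustrative; the second and third functions are simply the same formula rearranged for the other two quantities.

```python
def estimate_mode(mean, median):
    """Mode = 3 Median - 2 Mean (empirical relationship)."""
    return 3 * median - 2 * mean

def estimate_mean(median, mode):
    """Rearranged: Mean = (3 Median - Mode) / 2."""
    return (3 * median - mode) / 2

def estimate_median(mean, mode):
    """Rearranged: Median = (Mode + 2 Mean) / 3."""
    return (mode + 2 * mean) / 3

print(estimate_mode(mean=20, median=30))     # 50
print(estimate_mean(median=30, mode=50))     # 20.0
print(estimate_median(mean=20, mode=50))     # 30.0
```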

Q-2) Explain about Measures of Dispersion (or)
Variability through MS-Excel: -
Measures of Variability: -
▪ In addition to measures of location, it is often desirable to consider
‘Measures of Variability or Dispersion’.
▪ The following are the different measures of variability or dispersion:
▪ Range
▪ Quartile Deviation (QD)
▪ Mean Deviation (MD)
▪ Standard Deviation (SD)
▪ Variance
▪ Co-efficient of Variation
Range: -
▪ The simplest measure of variability is the range. The range can be found
by subtracting the smallest value from the largest value in a data set.
Range = Maximum Value–Minimum Value
▪ The range can be calculated in Excel using the MAX() and MIN()
functions.
▪ The range value in cell E7 is calculated using the formula:
=MAX(B2:B13) - MIN(B2:B13)
▪ This subtracts the smallest value in the range B2:B13 from the largest
value in the range B2:B13.
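
The same subtraction can be sketched in Python. The values are hypothetical and stand in for the contents of cells B2:B13.

```python
# Hypothetical data standing in for the worksheet values.
values = [108, 138, 186, 254, 257]

# Range = Maximum Value - Minimum Value
value_range = max(values) - min(values)
print(value_range)   # 149
```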

Quartile Deviation (QD): -


▪ The difference between the third quartile and the first quartile of a
distribution is called the interquartile range. When this difference is
divided by two, it is known as the quartile deviation or semi-interquartile
range.
▪ The Quartile Deviation can be defined mathematically as half of the
difference between the upper and lower quartile. Here, quartile deviation
can be represented as QD; Q3 denotes the upper quartile and Q1 indicates
the lower quartile.
QD = (Q3 – Q1)/2
▪ Use the Excel QUARTILE.INC function to find the first quartile (Q1).
Formula:
=QUARTILE.INC(data_range, 1)

▪ Use the Excel QUARTILE.INC function again to find the third quartile
(Q3). Formula:
=QUARTILE.INC(data_range, 3)
▪ Subtract Q1 from Q3.
▪ Divide the result by 2 (since it's the semi-interquartile range).
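
The steps above can be sketched in Python. The nine data values are hypothetical. The sketch assumes that statistics.quantiles with method="inclusive" interpolates in the same way as QUARTILE.INC (the minimum is treated as the 0th percentile and the maximum as the 100th); treat that correspondence as an assumption to verify against your own worksheet.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical observations

# Three cut points that split the data into four equal parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

interquartile_range = q3 - q1
quartile_deviation = interquartile_range / 2   # QD = (Q3 - Q1) / 2

print(q1, q3, quartile_deviation)
```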

Mean Deviation(MD) : -
▪ Mean deviation, also known as the average absolute deviation, is a
measure of statistical dispersion that quantifies the average absolute
difference between each data point and the mean of the dataset.
▪ It gives an indication of how spread out the values in a dataset are.
▪ In Excel, you can calculate the mean deviation using built-in functions.
▪ Organize your data: Place your data in a column in Excel.
▪ Calculate the Mean: Use the AVERAGE function to find the mean of your
dataset.
▪ Calculate Absolute Deviations: Subtract the mean from each data point
and take the absolute value of each difference.
▪ Calculate Mean Deviation: Find the average of the absolute deviations
calculated in the previous step.
▪ Suppose your data is in cells A2:A11.
▪ In cell A13, enter the formula to calculate the mean:
=AVERAGE (A2:A11)
▪ In cell B2, enter the formula to calculate the absolute deviations:
=ABS(A2-$A$13)
▪ Drag this formula down to cover all the data points. This will give you the
absolute deviation of each data point from the mean.

▪ In cell B14, calculate the mean deviation using the formula:
=AVERAGE(B2:B11)
This will give you the mean deviation.
▪ Alternatively, you can do it in one step by combining the formulas:
=AVERAGE(ABS(A2:A11-AVERAGE(A2:A11)))
This formula directly calculates the mean deviation without needing an
intermediate column for absolute deviations. In Excel versions without
dynamic arrays, it must be entered as an array formula with
CTRL+SHIFT+ENTER. Excel's built-in AVEDEV function, =AVEDEV(A2:A11),
returns the same result.
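
The same three steps (mean, absolute deviations, average of the deviations) can be sketched in Python. The four data values are hypothetical, and mean_deviation is an illustrative helper name.

```python
def mean_deviation(values):
    """Average absolute difference between each value and the mean."""
    centre = sum(values) / len(values)                    # step: mean
    absolute_deviations = [abs(v - centre) for v in values]  # step: |x - mean|
    return sum(absolute_deviations) / len(values)         # step: average

data = [2, 4, 6, 8]             # hypothetical observations, mean = 5
print(mean_deviation(data))     # 2.0
```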

Variance: -
▪ Variance is a statistical measure that quantifies the spread or dispersion
of a set of data points around their mean or average value.
▪ It provides insight into how much individual data points differ from the
mean.

▪ In Excel, you can calculate variance using the VAR.P function for
population variance and the VAR.S function for sample variance (the older
VAR function also returns the sample variance).
▪ These functions take the data range as input and return the variance of
the dataset.

In Excel, you can find the variance for sample data using the VAR.S function.
The variance in cell E8 is calculated using the formula
=VAR.S(B2:B13).
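
The difference between the sample and population calculations is only the divisor, as this Python sketch shows. The data values are hypothetical; the comments map each result to the Excel function that, to the best of my understanding, computes the same quantity.

```python
from statistics import pvariance, variance

data = [2, 4, 6, 8]   # hypothetical observations, mean = 5

def sum_squared_deviations(values):
    """Sum of squared differences between each value and the mean."""
    centre = sum(values) / len(values)
    return sum((v - centre) ** 2 for v in values)

n = len(data)
sample_variance = sum_squared_deviations(data) / (n - 1)   # like VAR.S
population_variance = sum_squared_deviations(data) / n     # like VAR.P

print(sample_variance, population_variance)
print(variance(data), pvariance(data))   # standard-library equivalents
```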

Standard Deviation:-
▪ The standard deviation is defined to be the positive square root of the
variance.
▪ We use ‘s’ to denote the sample standard deviation and ‘σ’ to denote the
population standard deviation.
▪ The sample standard deviation ‘s’ is a point estimate of the population
standard deviation ‘σ’, and is derived from the sample variance in the
following way: Sample Standard Deviation (s) = √(s²)

The sample standard deviation can be calculated using Excel’s STDEV.S
function.
The sample standard deviation in cell E9 is calculated using the formula
=STDEV.S(B2:B13).
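
The step from sample variance to sample standard deviation can be sketched in Python. The data values are hypothetical; statistics.stdev is the standard-library counterpart of the sample calculation.

```python
import math
from statistics import stdev

data = [2, 4, 6, 8]   # hypothetical observations

n = len(data)
centre = sum(data) / n
sample_variance = sum((v - centre) ** 2 for v in data) / (n - 1)

# s is the positive square root of the sample variance.
s = math.sqrt(sample_variance)

print(s)
print(math.isclose(s, stdev(data)))
```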

Coefficient of Variation:-
▪ The coefficient of variation (CV) is a relative measure of variability that
expresses the standard deviation as a percentage of the mean.
▪ It is used to compare the variability of different datasets with different units
or scales, allowing for a more meaningful comparison.
▪ Mathematically, the coefficient of variation is calculated as:
CV = (Standard Deviation / Mean) × 100%
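
A short Python sketch shows why the coefficient of variation is useful for comparing datasets measured in different units. Both datasets are hypothetical, and coefficient_of_variation is an illustrative helper that uses the sample standard deviation.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

heights_cm = [150, 160, 170, 180]   # hypothetical, measured in centimetres
weights_kg = [50, 60, 70, 80]       # hypothetical, measured in kilograms

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

# The standard deviations are equal, but relative variability differs.
print(round(cv_heights, 2), round(cv_weights, 2))
```

Both datasets have the same standard deviation, yet the weights vary far more relative to their mean, which only the CV reveals.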

Q-3) Explain about Data Visualization and Techniques
of Data Visualization: -
▪ The first step in trying to interpret data is often to visualize it in some way.
Data visualization can be as simple as creating a summary table, or it
could require generating charts to help interpret, analyze, and learn from
the data.
▪ Data visualization is very helpful for identifying data errors and for
reducing the size of your data set by highlighting important relationships
and trends.
▪ Data visualization is also important in conveying your analysis to others.
Although business analytics is about making better decisions, in many
cases, the ultimate decision maker is not the person who analyzes the
data. Therefore, the person analyzing the data has to make the analysis
simple for others to understand.
▪ Proper data-visualization techniques greatly improve the ability of the
decision maker to interpret the analysis easily.
Overview of Data Visualization: -
▪ Decades of research studies in psychology and other fields show that the
human mind can process visual images such as charts much faster than
it can interpret rows of numbers.
▪ However, these same studies also show that the human mind has certain
limitations in its ability to interpret visual images and that some images
are better at conveying information than others.
▪ The goal of Data Visualization is to introduce some of the most common
forms of visualizing data and demonstrate when each form is appropriate.
▪ Microsoft Excel is a suitable tool used in business for basic data
visualization.
▪ Software tools such as Excel make it easy for anyone to create many
standard examples of data visualization.
Visualization Techniques:-
▪ The first decision in displaying data is whether a table or a chart will be
more effective.
▪ In general, charts can often convey information to readers more quickly
and easily, but in some cases a table is more appropriate.
▪ Tables should be used when the
1. reader needs to refer to specific numerical values.
2. reader needs to make precise comparisons between different values and
not just relative comparisons.
3. values being displayed have different units or very different magnitudes.

Cross tabulation: -
▪ A useful type of table for describing data of two variables is a
crosstabulation, which provides a tabular summary of data for two
variables.
▪ For example, consider the following application based on data from Zagat’s
Restaurant Review. Data on the quality rating, meal price, and the usual
wait time for a table during peak hours were collected for a sample of 300
Los Angeles area restaurants. The data for the first 10 restaurants are as
follows:

▪ Quality ratings are an example of categorical data, and meal prices are an
example of quantitative data.
▪ For now, we will limit our consideration to the quality-rating and meal-
price variables.
▪ A cross tabulation of the data for quality rating and meal price is shown in
the following table:


PivotTables in Excel:-
A cross tabulation in Microsoft Excel is known as a PivotTable. To create a
PivotTable in Excel, we follow these steps:
Step 1. Click the Insert tab on the Ribbon
Step 2. Click PivotTable in the Tables group
Step 3. When the Create PivotTable dialog box appears:
Choose Select a Table or Range
Enter the data range in the Table/Range: box
Select New Worksheet as the location for the PivotTable Report
Click OK
Step 4. In the PivotTable Fields task pane, go to Drag fields between
areas below:
Drag the Quality Rating field to the ROWS area.
Drag the Meal Price ($) field to the COLUMNS area.
Drag the Restaurant field to the VALUES area.
Step 5. Click on Sum of Restaurant in the VALUES area.
Step 6. Select Value Field Settings from the list of options.
Step 7. When the Value Field Settings dialog box appears:
Under Summarize value field by, select Count.
Click OK
▪ The figure shows the completed PivotTable Field List and a portion of the
PivotTable worksheet as it now appears.
▪ To complete the PivotTable, we need to group the columns representing
meal prices and place the row labels for quality rating in the proper order:
Step 8. Right-click in cell B4 or any cell containing a meal price column label
Step 9. Select Group from the list of options
Step 10. When the Grouping dialog box appears:
Enter 10 in the Starting at: box
Enter 49 in the Ending at: box
Enter 10 in the By: box
Click OK
Step 11. Right-click on “Excellent” in cell A5
Step 12. Select Move and click Move “Excellent” to End
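
What the PivotTable computes (a count of restaurants for each combination of quality rating and grouped meal price) can be sketched in Python. The ten records are hypothetical, not the restaurant data referred to above, and price_group is an illustrative helper that mimics grouping from 10 to 49 in steps of 10.

```python
from collections import Counter

# Hypothetical records: (quality rating, meal price in $).
restaurants = [
    ("Good", 18), ("Very Good", 22), ("Good", 28), ("Excellent", 38),
    ("Very Good", 33), ("Good", 28), ("Very Good", 19), ("Very Good", 11),
    ("Very Good", 23), ("Good", 13),
]

def price_group(price, start=10, width=10):
    """Label the price bin, e.g. 18 -> '$10-19' (like Steps 8 to 10)."""
    lower = start + ((price - start) // width) * width
    return f"${lower}-{lower + width - 1}"

# Count restaurants per (rating, price group) pair (like Steps 4 to 7).
crosstab = Counter((rating, price_group(price)) for rating, price in restaurants)

row_order = ["Good", "Very Good", "Excellent"]   # "Excellent" moved to the end
col_order = ["$10-19", "$20-29", "$30-39", "$40-49"]
for rating in row_order:
    print(f"{rating:<10}", [crosstab[(rating, col)] for col in col_order])
```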


CHARTS: -
▪ Charts (or graphs) are visual methods for displaying data. Here, we
introduce some of the most commonly used charts to display and analyze
data including scatter charts, line charts, and bar charts.
▪ Excel is the most commonly used software package for creating simple
charts.
How to use Excel to create scatter charts, line charts, sparklines, bar
charts, bubble charts, and heat maps:-

Scatter Charts:-
▪ A scatter chart is a graphical presentation of the relationship between two
quantitative variables.
▪ For example, on 10 occasions during the past three months, a store
used weekend television commercials to promote sales. The
managers want to investigate whether a relationship exists between the
number of commercials shown and sales at the store the following week.
Sample data for the 10 weeks, with sales in hundreds of dollars, are shown
in the following table:
Week    No. of Commercials (x)    Sales in $100s (y)
1       2                         50
2       5                         57
3       1                         41
4       3                         54
5       4                         54
6       1                         38
7       5                         63
8       3                         48
9       4                         59
10      2                         46

We will use the data from the above table to create a scatter chart using
Excel’s chart tools:
Step 1. Select cells B2:C11
Step 2. Click the Insert tab in the Ribbon.
Step 3. Click the Insert Scatter (X,Y) or Bubble Chart button
in the Charts group
Step 4. When the list of scatter chart subtypes appears, click the
Scatter button
Step 5. Right-click on one of the horizontal grid lines in the body of
the chart, and click Delete

Step 6. Right-click on one of the vertical grid lines in the body of the
chart, and click Delete
Step 7. Click Add Chart Element in the Chart Layouts group, select
Axes, Axis Titles, and Chart Title, and replace the axis titles
and chart title with the respective names.
▪ We can also use Excel to add a trend line to the scatter chart.
▪ A trend line is a line that provides an approximation of the
relationship between the variables.
▪ To add a linear trend line using Excel, we use the following steps:
Step 1. Right-click on one of the data points in the scatter chart, and
select Add Trendline…
Step 2. When the Format Trendline task pane appears, select Linear
under Trendline Options
The following shows the scatter chart and linear trendline created with Excel
for the data in the above table. The number of commercials (x) is shown on
the horizontal axis, and sales (y) are shown on the vertical axis.
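
A linear trend line is a least-squares line, so its slope and intercept can be reproduced with a short Python sketch using the commercials and sales data from the table. The helper name fit_line is illustrative.

```python
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]        # x, from the table
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]    # y in $100s, from the table

def fit_line(x, y):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(commercials, sales)
# Approximately: sales = 36.15 + 4.95 * commercials
print(round(slope, 2), round(intercept, 2))
```

The positive slope indicates that, in this sample, weeks with more commercials tended to be followed by higher sales.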

Line Charts :-
▪ Line charts are similar to scatter charts, but a line connects the points in
the chart.
▪ Line charts are very useful for time series data collected over a period of
time (minutes, hours, days, years, etc.).
▪ To create the line chart in Excel, we follow these steps:
Step 1. Select cells of the entered data.
Step 2. Click the Insert tab on the Ribbon
Step 3. Click the Insert Line Chart button in the Charts group.
Step 4. When the list of line chart subtypes appears, click the Line
with Markers button under 2-D Line
▪ This creates a line chart for sales with a basic layout and minimum
formatting.
Step 5. Click the Chart Elements button. Select the check boxes for
Axes, Axis Titles, and Chart Title. Deselect the check box for Gridlines.

Sparkline: -
▪ A special type of line chart is a sparkline, which is a minimalist type of
line chart that can be placed directly into a cell in Excel.
▪ Sparklines contain no axes; they display only the line for the data.
▪ Sparklines take up very little space, and they can be effectively used to
provide information on overall trends for time series data.
▪ Sparklines can be used in Excel for the regional sales data. To create a
sparkline in Excel:
Step 1. Click the Insert tab on the Ribbon.
Step 2. Click Line in the Sparklines group.
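
The idea of a compact, axis-free trend display can be imitated in plain text with a Python sketch. This is only an analogy to Excel's feature: it draws small block characters rather than a line, and the monthly sales figures are hypothetical.

```python
BARS = "▁▂▃▄▅▆▇█"   # eight block heights, lowest to highest

def sparkline(values):
    """Map each value to a block character scaled between min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # flat series: all lowest blocks
        return BARS[0] * len(values)
    scale = (len(BARS) - 1) / (hi - lo)
    return "".join(BARS[round((v - lo) * scale)] for v in values)

monthly_sales = [12, 15, 14, 18, 21, 19, 25]   # hypothetical time series
print(sparkline(monthly_sales))
```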

Bar Charts and Column Charts:-


▪ Bar charts and column charts provide a graphical summary of categorical
data.
▪ Bar charts use horizontal bars to display the magnitude of the quantitative
variable.
▪ Column charts use vertical bars to display the magnitude of the quantitative
variable.
▪ Bar and column charts are very helpful in making comparisons between
categorical variables.
▪ Consider a regional supervisor who wants to examine the number of
accounts being handled by each manager. The following figure shows a bar
chart created in Excel displaying these data.
▪ To create this bar chart in Excel:
Step 1. Select cells A2:B9
Step 2. Click the Insert tab on the Ribbon
Step 3. Click the Insert Column or Bar Chart button in the Charts
group
Step 4. When the list of bar chart subtypes appears:
Click the Clustered Bar button in the 2-D Bar section
Step 5. Select the bar chart that was just created to reveal the Chart
Buttons
Step 6. Click the Chart Elements button. Select the check boxes for
Axes, Axis Titles, and Chart Title. Deselect the check box for Gridlines.
Click on the text box below the horizontal axis, and replace “Axis Title”
with Accounts Managed
Click on the text box next to the vertical axis, and replace “Axis Title” with
Manager
Click on the text box above the chart, and replace “Chart Title” with
Bar Chart of Accounts Managed

Pie Charts :-
➢ Pie charts are another common form of chart used to compare
categorical data.
➢ However, many experts argue that pie charts are inferior to bar charts
for comparing data.
➢ The pie chart in the figure displays the same data on the number of
accounts managed that the bar chart displays.
➢ Visually, it is still relatively easy to see that Gentry has the greatest
number of accounts and that Williams has the fewest.
➢ However, it is difficult to say whether Lopez or Francois has more
accounts.
➢ Research has shown that people find it very difficult to perceive
differences in area.
➢ Comparing the two figures, visual comparisons are much easier to make in
the bar chart than in the pie chart (particularly when using a limited
number of colours for differentiation).
➢ Therefore, we recommend against using pie charts in most situations
and suggest instead using bar charts for comparing categorical data.


Bubble Charts:-
▪ A bubble chart is a graphical means of visualizing three variables in a two-
dimensional graph and is therefore sometimes a preferred alternative to a
3-D graph.
▪ Suppose that we want to compare the number of billionaires in various
countries.
▪ The following table provides a sample of six countries, showing, for each
country, the number of billionaires per 10 million residents, the per capita
income, and the total number of billionaires.
▪ We can create a bubble chart using Excel to further examine these data:
Step 1. Select cells of entered data.
Step 2. Click the Insert tab on the Ribbon
Step 3. In the Charts group, click Insert Scatter (X,Y) or Bubble Chart
In the Bubble subgroup, click Bubble
Step 4. Select the chart that was just created to reveal the Chart Buttons
Step 5. Click the Chart Elements button. Select the check boxes for
Axes, Axis Titles, Chart Title and Data Labels. Deselect the
check box for Gridlines. Rename all the Titles.


Heat Maps: -
▪ A heat map is a two-dimensional graphical representation of data that
uses different shades of color to indicate magnitude.
▪ The following figure shows a heat map indicating the magnitude of
changes for a metric called same-store sales, which is commonly used
in the retail industry to measure trends in sales.
▪ The cells shaded red in the Figure indicate declining same-store sales for the
month, and cells shaded blue indicate increasing same-store sales for the
month. Column N in the Figure also contains sparklines for the same-store
sales data.
▪ The following Figure can be created in Excel by following these steps:
Step 1. Select cells B2:M17
Step 2. Click the Home tab on the Ribbon
Step 3. Click Conditional Formatting in the Styles group
Select Color Scales and click on Blue–White–Red Color Scale
▪ To add the sparklines in column N, we use the following steps:
Step 4. Select cell N2
Step 5. Click the Insert tab on the Ribbon
Step 6. Click Line in the Sparklines group
Step 7. When the Create Sparklines dialog box opens: Enter
B2:M2 in the Data Range: box, enter N2 in the Location
Range: box, and click OK
Step 8. Copy cell N2 to N3:N17


Explain about Data Dashboards through EXCEL with Principles. How to create an Excel Dashboard?
Data Dashboards: -
▪ A data dashboard is a data-visualization tool that illustrates multiple
metrics and automatically updates these metrics as new data become
available.
▪ It is like an automobile’s dashboard instrumentation that provides
information on the vehicle’s current speed, fuel level, and engine
temperature so that a driver can assess current operating conditions and
take effective action.
▪ Similarly, a data dashboard provides the important metrics that managers
need to quickly assess the performance of their organization and react
accordingly.
▪ Here we provide guidelines for creating effective data dashboards and an
example application.

Principles of Effective Data Dashboards: -


▪ In an automobile dashboard, values such as current speed, fuel level, and
oil pressure are displayed to give the driver a quick overview of current
operating characteristics.
▪ In a business, the equivalent values are often indicative of the business’s
current operating characteristics, such as its financial position, the
inventory on hand, customer service metrics, and the like.
▪ These values are typically known as key performance indicators (KPIs).
▪ A data dashboard should provide timely summary information on KPIs
that are important to the user, and it should do so in a manner that
informs rather than overwhelms its user.
▪ Ideally, a data dashboard should present all KPIs on a single screen that
a user can quickly scan to understand the business’s current state of
operations.
▪ Rather than requiring the user to scroll vertically and horizontally to see
the entire dashboard, it is better to create multiple dashboards so that
each dashboard can be viewed on a single screen.
▪ The KPIs displayed in the data dashboard should convey meaning to its
user and be related to the decisions the user makes.
▪ For example, the data dashboard for a marketing manager may have KPIs
related to current sales measures and sales by region, while the data
dashboard for a Chief Financial Officer should provide information on the
current financial standing of the company, including cash on hand,
current debt obligations, and so on.
▪ A data dashboard should call attention to unusual measures that may
require attention, but not in an overwhelming way.
▪ Colour should be used to call attention to specific values to differentiate
categorical variables, but the use of colour should be restrained.
▪ Too many different or too bright colours make the presentation
distracting and difficult to read.

How to create an Excel Dashboard?


▪ A dashboard is a visual representation of KPIs, key business metrics, and
other complex data in a way that’s easy to understand.
▪ A dashboard is a compact visual representation of data. Dashboards are
designed to be eye-catching, easy to understand, and concise so that users
can quickly extract insights.
▪ Dashboards can contain raw numbers, tables, and charts. However,
dashboards become powerful tools when they incorporate visual elements
to enrich the information they convey, such as through colours, charts,
and conditional formatting.
▪ The following are the different steps involved in creating a dashboard:
Step 1: Pull your raw data into Excel.
Step 2: Set up a structure for your workbook.
Step 3: Create a table.
Step 4: Visualize your data.
There are many ways to analyze your data in Excel, and you should go
for the option that suits the overall goal for the dashboard.
Here are the commonly used methods of visualizing data:
Pivot Table
Charts
Excel Formulas
Conditional formatting
Pivot tables are the most useful method for the purposes of a
dashboard, as they let you sort, group, count, and add up data in a table.
Step 5: Create a Pivot Table
In the new worksheet, go to Insert>PivotTable.
▪ Arrange the table fields depending on what you want to see. Now you can
insert a Pivot Chart based on the PivotTable.
▪ You could also go with a pie chart, bar graph, Gantt chart, waterfall chart,
stacked column chart, line graph, and more.
▪ Continue creating more tables on new worksheets, to display all the key
metrics you’re looking for. Make sure to name each worksheet for easy
identification.
Step 6: Assemble your Dashboard.
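
The grouping and adding up that a PivotTable performs in Step 5 can also be sketched in Python. This is only an illustration of the idea, not of how Excel works internally; the sales rows and the function names `pivot_sum` and `totals_by_region` are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw data: one row per sale (region, product, amount).
sales = [
    ("North", "Widgets", 120.0),
    ("North", "Gadgets", 80.0),
    ("South", "Widgets", 200.0),
    ("South", "Gadgets", 50.0),
    ("North", "Widgets", 30.0),
]


def pivot_sum(rows):
    """Group rows by (region, product) and add up the amounts."""
    table = defaultdict(float)
    for region, product, amount in rows:
        table[(region, product)] += amount
    return dict(table)


def totals_by_region(rows):
    """Add up the amounts for each region, ignoring the product."""
    totals = defaultdict(float)
    for region, _product, amount in rows:
        totals[region] += amount
    return dict(totals)


pivot = pivot_sum(sales)
region_totals = totals_by_region(sales)

# Print one summary line per region, as a dashboard tile might show it.
for region in sorted(region_totals):
    print(region, region_totals[region])
```

Each summary produced this way corresponds to one table or chart that would be placed on the dashboard.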


UNIT – III :: PREDICTIVE ANALYTICS


Explain about Predictive Modelling
Definition of Predictive analytics:-
▪ Predictive analytics is the use of data, statistical algorithms and machine
learning techniques to identify the likelihood of future outcomes based on
historical data.
▪ The goal is to go beyond knowing what has happened to providing a best
assessment of what will happen in the future.
▪ Predictive analytics is the use of statistics and modelling techniques to
determine future performance based on current and historical data.
▪ Predictive analytics looks at patterns in data to determine if those patterns
are likely to emerge again, which allows businesses and investors to adjust
where they use their resources to take advantage of possible future events.

Ex:- 1 to Ex:- 5. [Figures: five examples of predictive analytics]

❖ Explain about Trend Line:


What is a trend line?
▪ A trend line is a line (usually straight) drawn through a number of points on a graph.
▪ It is used to analyse the specific direction of a group of values set in a
presentation.
▪ There are two kinds of trend lines:
• an uptrend with values going higher and
• a downtrend where the direction of the line gradually drops to the
lower values.
Predicting the future:-
▪ Trend lines allow businesses to see the difference in various points over
a period of time.
▪ This helps foretell the possible path the values will take in the future.
▪ This can help reveal performance, value and competitiveness of specific
products and services, along with the relevant business departments,
such as sales.
▪ By knowing how to add a trend line to your presentation, you can create
a graphical representation of the values you have computed.
▪ This will enable the user to easily comprehend and analyze the message
you are trying to convey.
Add a trend line to your Excel chart
▪ If you use Office 2013, you can create a trend line to complement your
reports by right clicking the data series (i.e., the information that has been
graphed or charted) in the chart you created.
▪ This will show a drop down menu where you can find the option to Add
Trendline.
▪ This will open another window where trend line types are available. You
can choose the one which suits the chart you created.
• Exponential trend lines:
This creates an uneven arc that is more curved at one side than the other
on charts with values that fluctuate. It cannot be used when you have a
zero or a negative value in your chart.
• Linear trend lines:
Most common when the values in your chart create a straight line. This
shows a continuous rise or fall trend that indicates a path it will steadily
continue in the future.
• Logarithmic trend lines:
Where there is a sudden increase or decrease in the chart, which then
continues to become level.
• Polynomial trend lines:
Used for larger sets of data with fluctuating values. If the direction of your
values continuously changes, then this option could suit you best.
• Power trend lines:
Almost the same as exponential, only in this, the arc is more
symmetrical.
• Moving average trend lines:
Used when your points seem to have too many ups and downs. This
levels out the extreme fluctuations for easier trend analysis. Depending
on the number of periods set, this option gathers the values together and
computes their average, which is then used as the trend point.
Whatever your reports, it is easier to spot the direction of values
when you use graphical tools to show data. This ensures that reports are
easily understood, along with the trend at which your values are headed
as a result of the lines appearing in the chart.
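
The moving average trend line described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that each trend point is the plain average of the most recent `periods` values; the monthly sales figures and the function name `moving_average` are hypothetical.

```python
def moving_average(values, periods):
    """Return the average of each run of `periods` consecutive values."""
    if periods < 1 or periods > len(values):
        raise ValueError("periods must be between 1 and len(values)")
    return [
        sum(values[i - periods + 1 : i + 1]) / periods
        for i in range(periods - 1, len(values))
    ]


# Hypothetical monthly sales with many ups and downs.
monthly_sales = [10, 14, 9, 15, 11, 16, 12, 18]

# With 3 periods, the first trend point averages months 1 to 3.
trend = moving_average(monthly_sales, periods=3)
print(trend)
```

A larger number of periods gives a smoother line, at the cost of fewer trend points.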

Explain about Regression Analysis with types.


What is Regression Analysis?
▪ A predictive modelling technique that evaluates the relation between
dependent (i.e. the target variable) and independent variables is known as
regression analysis.
▪ Regression analysis can be used for forecasting, time series modelling, or
finding the relation between variables and predicting continuous values.
▪ For example, the relationship between household location and the power
bill of the household is best studied through regression.
▪ In simple words, regression analysis is used to model the relationship
between a dependent variable and one or more independent variables.
▪ Suppose your manager asked you to predict annual sales. There can be
hundreds of factors (drivers) that affect sales. In this case, sales is your
dependent variable. Factors affecting sales are independent variables.
Regression analysis would help you to solve this problem.

▪ Regression analysis is a form of predictive modelling technique which
investigates the relationship between a dependent (target) and
independent variable (s) (predictor).
▪ This technique is used for forecasting, time series modelling and finding
the causal effect relationship between the variables.
▪ For example, relationship between rash driving and number of road
accidents by a driver is best studied through regression.
▪ The term “regression” in this context was first coined by Sir Francis
Galton, a cousin of Charles Darwin. The earliest form of regression
was the method of least squares, developed by Adrien-Marie Legendre
and Carl Friedrich Gauss.
▪ We can analyze data and perform data modeling using regression analysis.
Here, we fit a curve or line to the data points such that the distances
of the data points from the curve or line are minimized.

▪ The terminology you will often hear in connection with regression analysis is:
• Dependent variable or target variable: Variable to predict.
• Independent variable or predictor variable: Variables used to estimate the
dependent variable.
• Outlier: Observation that differs significantly from other observations.
Outliers should be checked carefully, since they may distort the result.
• Multicollinearity: Situation in which two or more independent
variables are highly linearly related.
• Homoscedasticity or homogeneity of variance: Situation in which
the variance of the error term is the same across all values of the
independent variables.
Benefits (or) Advantages (or) Uses of Using Regression Analysis in Data
Analytics:-
▪ There are multiple benefits of using regression analysis. They are as
follows:
▪ It indicates the significant relationships between dependent variable and
independent variable.
▪ It indicates the strength of impact of multiple independent variables on a
dependent variable.
▪ Regression analysis also allows us to compare the effects of variables
measured on different scales, such as the effect of price changes and the
number of promotional activities.
▪ These benefits help market researchers / data analysts / data scientists
to evaluate candidate variables and select the best set to be used for building
predictive models.
▪ Therefore, this powerful statistical tool is used by Business Analysts and
other data professionals for removing the unwanted variables and choosing
only the important ones.
▪ From a business point of view, the regression method of forecasting can
be helpful for an individual working with data in the following ways:
• Predicting sales in the near and long term.
• Understanding demand and supply.
• Understanding inventory levels.
• Review and understand how variables impact all these factors.
▪ In addition, businesses can use regression methods to understand the
following:
• Why did the customer service calls drop in the past months?
• What will the sales look like in the next six months?
• Which ‘marketing promotion’ method to choose?
• Whether to expand the business or to create and market a new
product.
▪ The ultimate benefit of regression analysis is to determine which
independent variables have the most effect on a dependent variable.
▪ It also helps to determine which factors can be ignored and those that
should be emphasized.
Types of regression Analysis:
▪ There are various types of regressions that are used in business
analytics, data science and machine learning.
▪ Here we mention some important types of regression:
▪ Linear Regression
▪ Polynomial Regression
▪ Decision Tree Regression
▪ Random Forest Regression
▪ Ridge Regression
▪ Logistic Regression

What is Linear Regression?
▪ Linear regression is a basic and commonly used type of predictive analysis.
▪ It is the simplest form of regression. It is a technique in which
the dependent variable is continuous in nature.
▪ The relationship between the dependent variable and independent
variables is assumed to be linear in nature.
▪ The overall idea of regression is to examine two things:
• Does a set of predictor variables do a good job in predicting an outcome
(dependent) variable?
• Which variables are significant predictors of the outcome variable, and
in what way do they (as indicated by the magnitude and sign of the beta
estimates) impact the outcome variable?
▪ The simplest form of the linear regression equation with one dependent
and one independent variable is defined by the formula:
y = a + b*x + e
where, y = dependent variable score,
a = constant or intercept,
b = regression coefficient or slope of the line,
x = score on the independent variable, and
e = error term
▪ Naming the Variables:
There are many names for a regression’s dependent variable. It may be
called an outcome variable, criterion variable, endogenous variable,
or regressand. The independent variables can be called exogenous
variables, predictor variables, or regressors.
▪ Three major uses for regression analysis are
(1) determining the strength of predictors,
(2) forecasting an effect, and
(3) trend forecasting.
▪ When you have only 1 independent variable and 1 dependent variable, it
is called simple linear regression.
▪ When you have more than 1 independent variable and 1 dependent
variable, it is called Multiple linear regression.

Linear Regression Analysis


Assumptions of linear regression:
▪ There must be a linear relation between independent and dependent
variables.
▪ There should not be any outliers present.
▪ No heteroscedasticity
▪ Sample observations should be independent.
▪ Error terms should be normally distributed with mean 0 and constant
variance.
▪ Absence of multicollinearity and auto-correlation.

Applications of Linear Regression in Data Analysis:


▪ It is one of the most widely known modelling techniques.
▪ Linear regression is usually among the first few topics which people pick
while learning predictive modelling.
▪ In this technique, the dependent variable is continuous, independent
variable(s) can be continuous or discrete, and nature of regression line is
linear.
▪ Linear Regression establishes a relationship between dependent variable
(Y) and one or more independent variables (X) using a best fit straight line
(also known as regression line).
▪ It is represented by an equation Y=a+b*X + e,
where a is intercept,
b is slope of the line and
e is error term.
▪ This equation can be used to predict the value of target variable based
on given predictor variable(s).
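
As a worked illustration of the equation Y = a + b*X, the intercept a and slope b of the best fit line can be computed with the least squares formulas. The sketch below uses plain Python; the experience and salary figures and the function name `fit_line` are hypothetical and chosen only for the example.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope of the line
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical data: years of experience and salary (in thousands).
years_experience = [1, 2, 3, 4, 5]
salary = [30, 35, 41, 44, 50]

a, b = fit_line(years_experience, salary)

# Predict the target variable for a new value of the predictor.
predicted_salary = a + b * 6
print(round(a, 2), round(b, 2), round(predicted_salary, 2))
```

Here the slope b is the estimated change in salary for each additional year of experience, and the error term e is the part of each observed salary that the line does not reproduce.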

▪ Linear regression is used in everything from biological, behavioural,
environmental and social sciences to business.
▪ You can perform linear regression in Microsoft Excel or use statistical
software packages such as IBM SPSS® Statistics that greatly simplify
the process of using linear-regression equations, linear-regression models
and linear-regression formula.

▪ SPSS Statistics can be leveraged in techniques such as simple linear
regression and multiple linear regression.
▪ You can perform the linear regression method in a variety of programs
and environments, including:
• Excel linear regression
• Linear regression Python
• R linear regression
• MATLAB linear regression
• Sklearn linear regression
Examples:-
1. Marks scored by students based on number of hours studied
(ideally)-
Here marks scored in exams are dependent and the number of hours
studied is independent.
2. Predicting crop yields based on the amount of rainfall-
Yield is a dependent variable while the measure of precipitation is an
independent variable.
3. Predicting the Salary of a person based on years of experience-
Here, Experience is the independent variable while Salary is
the dependent variable.
Linear Regression through Excel (or) Creating a Linear Regression Model
in Excel:-
▪ The first step in running regression analysis in Excel is to double-check
that the free Excel plugin Data Analysis ToolPak is installed.
▪ This plugin makes calculating a range of statistics very easy.
▪ It is not required to chart a linear regression line, but it makes creating
statistics tables simpler.
▪ To verify if installed, select "Data" from the toolbar. If "Data Analysis" is
an option, the feature is installed and ready to use.
▪ If not installed, you can request this option by clicking on the Office
button and selecting "Excel options".
▪ Using the Data Analysis ToolPak, creating a regression output is just a
few clicks.

Interpret the Results
Using that data, we get the following table:

Charting a Regression Line in Excel:


❖ We can chart a regression line in Excel by highlighting the data and
charting it as a scatter plot.
❖ To add a regression line, choose "Layout" from the "Chart Tools" menu.
❖ In the dialog box, select "Trendline" and then "Linear Trendline".
❖ To add the R² value, select "More Trendline Options" from the
“Trendline” menu.
❖ Lastly, select "Display R-squared value on chart". The visual result
sums up the strength of the relationship.

Multiple Linear Regression:
▪ Multiple linear regression (MLR), also known simply as multiple regression,
is a statistical technique that uses several explanatory variables to predict
the outcome of a response variable.
▪ The goal of multiple linear regression is to model the linear relationship
between the explanatory (independent) variables and the response (dependent)
variable.
▪ In essence, multiple regression is the extension of ordinary least-squares
(OLS) regression because it involves more than one explanatory variable.
What is Goodness-of-Fit?
▪ Linear regression is a part of regression analysis. It
finds the equation that minimizes the distance between the fitted line
and all of the data points. Determining how well the model fits the data
is crucial in a linear model.
▪ A general idea is that if the deviations between the observed values and
the predicted values of the linear model are small and unbiased, the
model fits the data well.
▪ In technical terms, “Goodness-of-fit” is a measure that
describes the differences between the observed values and the expected
values, or how well the model fits a set of observations. This measure can
be used in statistical hypothesis testing.
What are Residuals?
▪ Residuals identify the deviation of observed values from the expected
values.
▪ They are also referred to as error or noise terms.
▪ A residual gives an insight into how good our model's prediction is against
the actual value. Residuals exist only relative to a fitted model, so there
are no real-life representations of residual values.

What is R-squared (or) Coefficient of Determination?
• R squared (R²) value in machine learning is referred to as the coefficient
of determination or the coefficient of multiple determination in case of
multiple regression.
• R squared in regression acts as an evaluation metric to evaluate the
scatter of the data points around the fitted regression line. It gives
the percentage of variation of the dependent variable that the model explains.

R-squared and the Goodness-of-fit:


• R-squared is the proportion of variance in the dependent variable that
can be explained by the independent variable(s).

• The value of R-squared stays between 0% and 100%


❖ 0% corresponds to a model that does not explain any of the variability of the
response data around its mean. In this case, the mean of the dependent variable
predicts the dependent variable as well as the regression model does.
❖ On the other hand, 100% corresponds to a model that explains all the
variability of the response variable around its mean.
• If your value of R² is large, you have a better chance of your regression
model fitting the observations.

How to Interpret R squared:


• The simplest R squared interpretation is how well the regression model
fits the observed data values.
• For example, consider a model where the R² value is 70%. Here R
squared means that the model explains 70% of the variation in the
dependent variable. Usually, when the R² value is high, it
suggests a better fit for the model.
• The quality of a model does not depend only
on R² but also on several other factors, like the nature of the
variables, the units in which the variables are measured, etc. So, a high
R-squared value is not always good for the regression model and can
indicate problems too.
• A low R-squared value is a negative indicator for a model in general.
However, if we consider the other factors, a model with a low R² value can
still be a good predictive model.
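
The link between residuals and R squared can be made concrete with a short calculation. The sketch below assumes the usual definition R² = 1 - (sum of squared residuals) / (total sum of squares around the mean); the observed values, the predicted values, and the function name `r_squared` are hypothetical.

```python
def r_squared(observed, predicted):
    """Return R squared for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and paired")
    mean_obs = sum(observed) / len(observed)
    # Unexplained variation: squared residuals (observed minus predicted).
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total variation of the observed values around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be equal")
    return 1 - ss_res / ss_tot


# Hypothetical observed values and the predictions of some fitted line.
observed = [30, 35, 41, 44, 50]
predicted = [30.2, 35.1, 40.0, 44.9, 49.8]

residuals = [o - p for o, p in zip(observed, predicted)]
r2 = r_squared(observed, predicted)

# A model that always predicts the mean explains none of the variation.
mean_only = [sum(observed) / len(observed)] * len(observed)
r2_mean_only = r_squared(observed, mean_only)

print(round(r2, 4), r2_mean_only)
```

The second calculation illustrates the 0% case: when the predictions are no better than the mean of the dependent variable, R squared is zero.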

Correlation and Coefficient of Correlation (r) :-
▪ Correlation is a statistical technique that shows how strongly two variables
are related to each other or the degree of association between the two.
▪ For example, if we have the weight and height data of taller and shorter
people, with the correlation between them, we can find out how these two
variables are related.
▪ We can also find the correlation between these two variables and say that
their weights are positively related to height.
▪ Correlation is measured by the correlation coefficient. It is denoted by ‘r’.
▪ It is very easy to calculate the correlation coefficient in SPSS.
▪ Before calculating the correlation in SPSS, we should have some basic
knowledge about correlation.
▪ The correlation coefficient always lies in the range of -1 to 1.
▪ There are three types of correlation:
1. Positive and negative correlation:
When two variables move in the same direction, then it is called
positive correlation. When one variable moves in a positive direction,
and a second variable moves in a negative direction, then it is said
to be negative correlation.
2. Linear and non linear or curvi-linear correlation:
When both variables change at the same ratio, they are known to be
in linear correlation.
When both variables do not change in the same ratio, then they are
said to be in curvi-linear correlation.
For example, if sale and expenditure move in the same ratio, then
they are in linear correlation and if they do not move in the same
ratio, then they are in curvi-linear correlation.
3. Simple, partial and multiple correlations:
When only two variables are taken into study, it is
called simple correlation.
When the correlation between two variables is studied while the effect
of one or more other variables is held constant, it is a
partial correlation.
When more than two variables are considered together, it is
called multiple correlation.
Degree of correlation:
Perfect correlation: When both the variables change in the same ratio, so that
the correlation coefficient is exactly +1 or -1, it is called perfect correlation.
High degree of correlation:
When the magnitude of the correlation coefficient is above 0.75, it is called high degree
of correlation.
Moderate correlation: When the magnitude of the correlation coefficient is between 0.50
and 0.75, it is called moderate degree of correlation.

Low degree of correlation: When the magnitude of the correlation coefficient is between
0.25 and 0.50, it is called low degree of correlation.
Absence of correlation: When the magnitude of the correlation coefficient is between 0 and 0.25,
it shows that there is little or no correlation.
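
The correlation coefficient r and the degrees of correlation listed above can be illustrated with a small calculation. This sketch computes the Pearson correlation coefficient in plain Python; the height and weight figures and the function names are hypothetical, and the treatment of values that fall exactly on a boundary (0.25, 0.50, 0.75) is an assumption, since the ranges above do not say which side they belong to.

```python
import math


def correlation(x, y):
    """Return the Pearson correlation coefficient r of two variables."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("neither variable may be constant")
    return sxy / math.sqrt(sxx * syy)


def degree_of_correlation(r):
    """Classify r by its magnitude (boundary handling is an assumption)."""
    size = abs(r)
    if math.isclose(size, 1.0):
        return "perfect"
    if size > 0.75:
        return "high"
    if size >= 0.50:
        return "moderate"
    if size >= 0.25:
        return "low"
    return "absent"


# Hypothetical heights (cm) and weights (kg) of five people.
height = [150, 160, 170, 180, 190]
weight = [50, 58, 63, 72, 80]

r = correlation(height, weight)
print(round(r, 4), degree_of_correlation(r))
```

The sign of r gives the direction (positive or negative correlation), while its magnitude gives the degree.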

Explain about DATA MINING


What Is Data Mining?
▪ Data mining is a process used by companies to turn raw data into useful
information.
▪ By using software to look for patterns in large batches of data, businesses
can learn more about their customers to develop more effective marketing
strategies, increase sales and decrease costs.
▪ Data mining depends on effective data collection, warehousing, and
computer processing.
▪ Data mining is the process of analyzing a large batch of information to
discern trends and patterns.
▪ Data mining can be used by corporations for everything from learning
about what customers are interested in or want to buy to fraud detection
and spam filtering.
▪ Data mining programs break down patterns and connections in data
based on what information users request or provide.
▪ Data mining is a process of extracting and discovering patterns in large
data sets involving methods at the intersection of machine learning,
statistics, and database systems.

▪ Data Mining is a process of finding potentially useful patterns from huge
data sets.
▪ It is a multi-disciplinary skill that uses machine learning, statistics, and
AI to extract information to evaluate the probability of future events.

▪ The insights derived from Data Mining are used for marketing, fraud
detection, scientific discovery, etc.
▪ Data Mining is all about discovering hidden, unsuspected, and previously
unknown yet valid relationships amongst the data.
▪ Data mining is also called Knowledge Discovery in Data (KDD), Knowledge
extraction, data/pattern analysis, information harvesting, etc.
Data Mining Applications: -
▪ Data Mining is primarily used by organizations with intense consumer
demands, such as:
• Retail
• Communication
• Financial
• Marketing companies
▪ These organizations use data mining to determine:
• price
• consumer preferences
• product positioning and impact on sales
• customer satisfaction and corporate profits.
▪ Data mining enables a retailer to use point-of-sale records of customer
purchases to develop products and promotions that help the organization
to attract the customer.

The following are the areas where data mining is widely used:
Data Mining in Healthcare:
▪ Data mining in healthcare has excellent potential to improve the health
system.
▪ It uses data and analytics for better insights and to identify best practices
that will enhance health care services and reduce costs.
▪ Analysts use data mining approaches such as Machine learning, Multi-
dimensional database, Data visualization, Soft computing, and statistics.
▪ Data Mining can be used to forecast the number of patients in each category.
▪ The procedures ensure that the patients get intensive care at the right
place and at the right time.
▪ Data mining also enables healthcare insurers to recognize fraud and
abuse.

Data Mining in Market Basket Analysis:


▪ Market basket analysis is a modelling method based on the hypothesis
that if you buy a specific group of products, then you are more likely to buy
another group of products.
▪ This technique may enable the retailer to understand the purchase
behaviour of a buyer.
▪ This data may assist the retailer in understanding the requirements of
the buyer and altering the store's layout accordingly.
▪ Analytical comparisons of results can also be made between various stores
and between customers in different demographic groups.
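
The hypothesis behind market basket analysis can be checked on transaction data with two simple measures, commonly called support and confidence. These names and the sketch below are an illustration added here and are not taken from the notes above; the baskets and the function names are hypothetical.

```python
# Hypothetical point-of-sale records: each set is one customer's basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def count_containing(items, all_baskets):
    """Count the baskets that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for basket in all_baskets if wanted <= basket)


def support(items, all_baskets):
    """Share of all baskets that contain every item in `items`."""
    return count_containing(items, all_baskets) / len(all_baskets)


def confidence(antecedent, consequent, all_baskets):
    """Of the baskets with `antecedent`, the share that also has `consequent`."""
    base = count_containing(antecedent, all_baskets)
    if base == 0:
        raise ValueError("no basket contains the antecedent")
    both = count_containing(set(antecedent) | set(consequent), all_baskets)
    return both / base


# How often are bread and butter bought together, and how likely is
# butter given that bread is already in the basket?
pair_support = support({"bread", "butter"}, baskets)
bread_to_butter = confidence({"bread"}, {"butter"}, baskets)
print(pair_support, bread_to_butter)
```

A retailer could use a high confidence value as one input when deciding which products to place near each other.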

Data mining in Education:


▪ Educational data mining (EDM) is a newly emerging field, concerned with
developing techniques that explore knowledge from the data generated
from educational environments.
▪ EDM objectives are recognized as predicting students' future learning
behaviour, studying the impact of educational support, and promoting
learning science.
▪ An organization can use data mining to make precise decisions and also
to predict the results of the student.
▪ With the results, the institution can concentrate on what to teach and
how to teach.

Data Mining in Manufacturing Engineering:


▪ Knowledge is the best asset possessed by a manufacturing company.
▪ Data mining tools can be beneficial to find patterns in a complex
manufacturing process.
▪ Data mining can be used in system-level designing to obtain the
relationships between product architecture, product portfolio, and data
needs of the customers.

▪ It can also be used to forecast the product development period, cost, and
expectations, among other tasks.

Data Mining in CRM (Customer Relationship Management):


▪ Customer Relationship Management (CRM) is all about obtaining and
retaining customers, enhancing customer loyalty, and implementing
customer-oriented strategies.
▪ To maintain a good relationship with the customer, a business organization
needs to collect data and analyze the data.
▪ With data mining technologies, the collected data can be used for
analytics.

Data Mining in Fraud detection:


▪ Billions of dollars are lost to fraud.
▪ Traditional methods of fraud detection are time-consuming and
complex.
▪ Data mining finds meaningful patterns and turns data into
information.
▪ An ideal fraud detection system should protect the data of all the users.
▪ Supervised methods use a collection of sample records that have already
been classified as fraudulent or non-fraudulent.
▪ A model is constructed from this data and is then used to
identify whether a new record is fraudulent or not.

Data Mining in Lie Detection:


▪ Apprehending a criminal is often easier than bringing out the truth from
him, which is a very challenging task.
▪ Law enforcement may use data mining techniques to investigate offenses,
monitor suspected terrorist communications, etc.
▪ This technique also includes text mining, which seeks meaningful
patterns in data that is usually unstructured text.
▪ The information collected from the previous investigations is compared,
and a model for lie detection is constructed.

Data Mining in Financial Banking:


▪ The digitalization of the banking system generates an
enormous amount of data with every new transaction.
▪ Data mining can help bankers solve business-related
problems in banking and finance by identifying trends, causalities, and
correlations in business information and market costs that are not
immediately evident to managers or executives because the data volume is
too large or the data is generated too rapidly for experts to screen.
▪ Managers can use this information for better targeting, acquiring,
retaining, segmenting, and maintaining profitable customers.

Types of Data for Data Mining:
Data mining can be performed on the following types of data:
• Relational databases
• Data warehouses
• Advanced Database and information repositories
• Object-oriented and object-relational databases
• Transactional and Spatial databases
• Heterogeneous and legacy databases
• Multimedia and streaming databases
• Text databases
• Text mining and Web mining

Explain about Data Mining Implementation Process (Data Mining Process)

Business understanding:
In this phase, business and data-mining goals are established.
• First, you need to understand the business and client objectives. You need
to define what your client wants (which, many times, even they do not
know themselves).
• Take stock of the current data mining scenario. Factor resources,
assumptions, constraints, and other significant factors into your
assessment.
• Using business objectives and current scenario, define your data
mining goals.
• A good data mining plan is very detailed and should be developed to
accomplish both business and data mining goals.
Data understanding:
In this phase, a sanity check is performed on the data to check whether it is
appropriate for the data mining goals.
• First, data is collected from multiple data sources available in the
organization.
• These data sources may include multiple databases, flat files, or data
cubes. Issues such as object matching and schema integration
can arise during the data integration process. It is a quite complex
and tricky process, as data from various sources is unlikely to match
easily. For example, table A contains an attribute named cust_no whereas
another table B contains an attribute named cust-id.
• It is therefore difficult to be sure whether two such differently named
objects refer to the same value. Metadata should be used to reduce
errors in the data integration process.
• The next step is to explore the properties of the acquired data. A good way
to explore the data is to answer the data mining questions (decided in
the business understanding phase) using query, reporting, and visualization tools.
• Based on the query results, the data quality should be ascertained.
Missing data, if any, should be acquired.
Data preparation:
In this phase, data is made production ready.
• The data preparation process consumes about 90% of the time of
the project.
• The data from different sources should be selected, cleaned,
transformed, formatted, anonymized, and constructed (if required).
• Data cleaning is a process to "clean" the data by smoothing noisy
data and filling in missing values.
• For example, for a customer demographics profile, age data is
missing. The data is incomplete and should be filled. In some cases,
there could be data outliers. For instance, age has a value 300. Data
could be inconsistent. For instance, name of the customer is
different in different tables.
• Data transformation operations change the data to make it useful in
data mining. The following transformations can be applied.
Data transformation:
▪ Data transformation operations contribute toward the success of
the mining process.
o Smoothing: It helps to remove noise from the data.
o Aggregation: Summary or aggregation operations are applied to the
data. For example, the weekly sales data is aggregated to calculate the
monthly and yearly totals.
o Generalization: In this step, low-level data is replaced by higher-
level concepts with the help of concept hierarchies. For example, the
city is replaced by the county.
o Normalization: Normalization is performed when the attribute data
are scaled up or scaled down. Example: Data should fall in the range
-2.0 to 2.0 post-normalization.
o Attribute construction: New attributes are constructed from the given
set of attributes and included when they are helpful for data mining.
The result of this process is a final data set that can be used in modelling.
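The transformation operations above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample sales figures, the ages, and the city-to-county mapping are all hypothetical, and min-max scaling is just one common way to bring values into a range such as -2.0 to 2.0.

```python
from collections import defaultdict

def aggregate_by_month(weekly_sales):
    """Aggregation: sum (month, amount) weekly records into monthly totals."""
    totals = defaultdict(float)
    for month, amount in weekly_sales:
        totals[month] += amount
    return dict(totals)

def min_max_normalize(values, new_min=-2.0, new_max=2.0):
    """Normalization: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the middle of the range.
        return [(new_min + new_max) / 2.0 for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

def generalize(city, hierarchy):
    """Generalization: replace a low-level value by its higher-level concept."""
    return hierarchy.get(city, "Unknown")

# Hypothetical sample data, invented for this sketch.
weekly_sales = [("Jan", 100.0), ("Jan", 150.0), ("Feb", 200.0), ("Feb", 50.0)]
monthly = aggregate_by_month(weekly_sales)

ages = [20, 30, 40, 60]
scaled = min_max_normalize(ages)          # smallest age -> -2.0, largest -> 2.0

city_to_county = {"City A": "County X", "City B": "County X"}
county = generalize("City A", city_to_county)
```

After normalization every value lies between -2.0 and 2.0, which matches the range used in the example above.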
Modelling:
In this phase, mathematical models are used to determine data patterns.
• Based on the business objectives, suitable modelling techniques should
be selected for the prepared dataset.

• Create a scenario to test the quality and validity of the model.
• Run the model on the prepared dataset.
• Results should be assessed by all stakeholders to make sure that the model
can meet the data mining objectives.
Evaluation:
In this phase, patterns identified are evaluated against the business
objectives.
• Results generated by the data mining model should be evaluated
against the business objectives.
• Gaining business understanding is an iterative process. In fact, while
evaluating the results, new business requirements may be raised because of
data mining.
• A go or no-go decision is taken on moving the model into the deployment
phase.
Deployment:
In the deployment phase, you ship your data mining discoveries to everyday
business operations.
• The knowledge or information discovered during data mining process
should be made easy to understand for non-technical stakeholders.
• A detailed deployment plan, for shipping, maintenance, and monitoring
of data mining discoveries is created.
• A final project report is created with lessons learned and key
experiences during the project. This helps to improve the organization's
business policy.
Explain about Techniques of Data Mining (or) Data Mining
Techniques (Methods)

1. Classification:
This analysis is used to retrieve important and relevant information
about data, and metadata.
This data mining method helps to classify data in different classes.
2. Clustering:
Clustering analysis is a data mining technique to identify data that are
like each other.

This process helps to understand the differences and similarities between
the data.
3. Regression:
Regression analysis is the data mining method of identifying and
analyzing the relationship between variables.
It is used to identify the likelihood of a specific variable, given the presence
of other variables.
4. Association Rules:
This data mining technique helps to find the association between two or
more Items.
It discovers a hidden pattern in the data set.
5. Outlier detection:
This type of data mining technique refers to the observation of data items in
the dataset which do not match an expected pattern or expected behavior.
This technique can be used in a variety of domains, such as intrusion
detection, fraud detection, or fault detection.
Outlier detection is also called Outlier Analysis or Outlier Mining.
6. Sequential Patterns:
This data mining technique helps to discover or identify similar patterns
or trends in transaction data over a certain period.
7. Prediction:
Prediction uses a combination of the other data mining techniques,
such as trends, sequential patterns, clustering, and classification.
It analyses past events or instances in the right sequence to predict a
future event.
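As a small illustration of regression (technique 3) being used for prediction (technique 7), the sketch below fits a straight line to past observations by ordinary least squares and uses it to estimate a future value. The advertising and sales figures are hypothetical, and a single-variable linear model is assumed purely for brevity.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Use the fitted relationship to estimate y for a new x."""
    return slope * x + intercept

# Hypothetical past data: advertising spend (x) and sales (y).
spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

slope, intercept = fit_line(spend, sales)
forecast = predict(5, slope, intercept)   # estimated sales for a spend of 5
```

Real data rarely lies exactly on a line, so in practice the quality of the fit would also need to be checked before relying on the forecast.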
Data mining Example:
A bank wants to find new ways to increase revenues from its credit card
operations. They want to check whether usage would double if fees were
halved. The bank has multiple years of records on average credit card balances,
payment amounts, credit limit usage, and other key parameters. They create
a model to check the impact of the proposed new business policy. The data
results show that cutting fees in half for a targeted customer base could
increase revenues by $10 million.
Benefits of Data Mining:
• Data mining technique helps companies to get knowledge-based
information.
• Data mining helps organizations to make the profitable adjustments in
operation and production.
• Data mining is a cost-effective and efficient solution compared to other
statistical data applications.
• Data mining helps with the decision-making process.
• Facilitates automated prediction of trends and behaviours as well as
automated discovery of hidden patterns.
• It can be implemented in new systems as well as existing platforms.
• It is a speedy process that makes it easy for users to analyze huge
amounts of data in less time.
Disadvantages of Data Mining
• There is a chance that companies may sell useful information about their
customers to other companies for money. For example, American Express
has sold credit card purchases of their customers to other companies.
• Many data mining analytics software packages are difficult to operate and
require advanced training to work on.
• Different data mining tools work in different manners due to different
algorithms employed in their design. Therefore, the selection of correct
data mining tool is a very difficult task.
• Data mining techniques are not perfectly accurate, and so they can cause
serious consequences in certain conditions.
Approaches in Data Mining:
▪ Data mining uses existing data to identify patterns among the
attributes in a data set and to build models.
▪ The models are mathematical representations, e.g. linear relationships,
non-linear relationships, explanatory patterns, predictive patterns.
▪ Data mining identifies four types of patterns:
o Associations find commonly co-occurring groups of things. In a market
basket analysis, cigarettes and chocolates being bought together is an example.
o Predictions tell the nature of future events based on what has happened
in the past. Forecasting the absolute temperature of a particular day is an example.
o Clustering is identifying natural groupings of things based on their
known characteristics. Segmentation of customers based on
demographics is an example.
o Sequential relationships are discovered among time-ordered events. A
customer who orders pizza and then orders a cold drink and ice cream is
an example.

Explain about Data Exploration & Reduction
Explain about Data Exploration and the Different Steps Involved in
Data Exploration:
▪ Data exploration is an approach similar to initial data analysis, whereby
a data analyst uses visual exploration to understand what is in a dataset
and the characteristics of the data, rather than through traditional data
management systems.
▪ These characteristics can include size or amount of data, completeness of
the data, correctness of the data, possible relationships amongst data
elements or files/tables in the data.

▪ Data exploration is the first step in data analysis and typically involves
summarizing the main characteristics of a data set, including its size,
accuracy, initial patterns in the data and other attributes.
▪ It is commonly conducted by data analysts using visual analytics tools,
but it can also be done in more advanced statistical software, such as ‘R’.

Importance of Data Exploration:


▪ Identifying Patterns and Trends:
Data exploration enables businesses to make better choices based on
historical data by revealing relevant patterns and trends within the data.
▪ Understanding Relationships:
Exploratory data analysis allows businesses to understand relationships
between variables. This insight is valuable for strategizing marketing
campaigns, optimizing processes, and improving overall efficiency.

▪ Detecting Outliers:
It aids in identifying outliers in the data, which might indicate errors or
unique occurrences. Detecting these irregularities is vital for maintaining
data quality and accuracy.
▪ Informing Decision-Making:
By exploring data, organizations gain insights into customer behavior,
market trends, and operational efficiencies. This information is
instrumental in making data-driven decisions that can enhance products,
services, and overall business strategies.
▪ Enhancing Predictive Modelling:
It provides a deep understanding of the data distribution, helping data
scientists select appropriate variables for predictive modeling.
Understanding the data thoroughly improves the accuracy and reliability
of machine learning algorithms.
▪ Improving Data Quality:
Data inconsistencies and missing values can be identified and corrected
through exploration. Clean and reliable data is essential for meaningful
analysis and reporting.
▪ Facilitating Communication:
Data visualization, a significant data exploration component, simplifies
complex data sets into understandable visual representations. These
visuals facilitate communication among stakeholders, making it easier to
convey insights and trends.
▪ Innovation and Competitive Advantage:
Businesses can gain a competitive edge by exploring data creatively.
Innovative solutions often arise from a deep understanding of customer
preferences and market dynamics, which can be explored through
comprehensive data analysis.
How Does Data Exploration Work? (or) Different Steps Involved in Data
Exploration:
▪ Define Your Objective:
Start by understanding the problem or question you want to answer
through data exploration. Having a clear goal will help focus your
exploration.
▪ Gather the Data:
Collect the relevant data for your analysis. This could involve data
acquisition from various sources, such as databases, APIs, spreadsheets,
or files.
▪ Understand the Data:
Examine the data’s structure and format. Key steps include:
o Data Loading: Import the data into your analysis environment (e.g.,
Python, R, Excel).
o Data Description: Check the dataset’s size, shape, and basic
statistics (e.g., mean, median, standard deviation).

o Data Types: Identify the types of variables (categorical, numerical,
date, etc.).
o Column Names: Review the column names for clarity and
consistency.
▪ Data Cleaning:
Before diving into exploration, address data quality issues:
o Missing Data: Handle missing values through imputation or
removal.
o Outliers: Detect and address outliers that could skew results.
o Data Transformation: Normalize, standardize, or scale data when
necessary.
▪ Data Visualization:
Create visual representations of the data to reveal patterns and
relationships. Common visualization techniques include:
o Bar Charts and Histograms: Display frequency distributions of
categorical and numerical data.
o Scatter Plots: Show relationships between two numerical variables.
o Box Plots: Visualize the spread and distribution of numerical data.
o Heat Maps: Display correlations between variables.
o Time Series Plots: Explore data over time.
▪ Summary Statistics:
Compute summary statistics to gain a deeper understanding of the data:
o Central Tendency: Calculate mean, median, and mode.
o Dispersion: Assess variance, standard deviation, and range.
o Skewness and Kurtosis: Understand the shape of the data
distribution.
o Correlation: Evaluate relationships between variables.
▪ Exploratory Data Analysis (EDA):
Perform in-depth analysis to uncover patterns, anomalies, and insights:
o Frequency Analysis: Examine the distribution of categorical data.
o Box Plots and Violin Plots: Visualize data distributions and
outliers.
o Correlation Matrix: Identify relationships between numerical
variables.
o Hypothesis Testing: Conduct statistical tests to confirm or reject
hypotheses.
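The summary-statistics and outlier steps above can be tried with Python's standard `statistics` module. The sketch below is illustrative only: the age column is hypothetical (it reuses the "age of 300" kind of error mentioned earlier), and the 1.5 × IQR rule is one common rule of thumb for flagging outliers, not the only one.

```python
import statistics

def summarize(values):
    """Central tendency and dispersion for one numerical column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),   # sample standard deviation
        "range": max(values) - min(values),
    }

def iqr_outliers(values):
    """Flag values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical column; 300 is an obvious data-entry error.
ages = [22, 25, 25, 31, 38, 41, 300]
summary = summarize(ages)
outliers = iqr_outliers(ages)
```

Note how the single bad value pulls the mean far above the median, which is one reason both are checked during exploration.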
Explain about Data Reduction and its Strategies
▪ Data reduction is the process of minimizing the amount of data that
needs to be stored in a data storage environment.
▪ Data reduction can increase storage efficiency and reduce costs.
▪ Data reduction can be achieved using several different types of
technologies.
▪ The best-known data reduction technique is data deduplication, which
eliminates redundant data on storage systems.

▪ The deduplication process typically occurs at the storage block level.
▪ The system analyses the storage to see if duplicate blocks exist, and gets
rid of any redundant blocks.
▪ The remaining block is shared by any file that requires a copy of the block.
▪ If an application attempts to modify this block, the block is copied prior
to modification so that other files that depend on the block can continue
to use the unmodified version, thereby avoiding file corruption.
▪ While data deduplication is probably the most common data reduction
technique, it is not the only viable one.
▪ Data archiving and data compression can also reduce the amount of data
that has to be stored on primary storage systems.
▪ Data compression reduces the size of a file by removing redundant
information from files so that less disk space is required.
▪ This is accomplished natively in storage systems using algorithms or
formulae designed to identify and remove redundant bits of data.
▪ Archiving data also reduces data on storage systems, but the approach is
quite different.
▪ Rather than reducing data within files or databases, archiving removes
older, infrequently accessed data from expensive storage and moves it to
low-cost, high-capacity storage.
▪ Archive storage can be disk, tape or cloud based.
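The deduplication and compression ideas described above can be sketched as follows. This is a toy model under stated assumptions, not how any particular storage product works: fixed-size 4-byte blocks, SHA-256 as the block fingerprint, and `zlib` as the compressor are all choices made only for this illustration.

```python
import hashlib
import zlib

def deduplicate(data: bytes, block_size: int = 4):
    """Split data into fixed-size blocks and keep each distinct block once."""
    store = {}    # block fingerprint -> block bytes (unique blocks only)
    recipe = []   # ordered fingerprints needed to rebuild the original data
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # a repeated block is not stored again
        recipe.append(digest)
    return store, recipe

def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the shared blocks."""
    return b"".join(store[digest] for digest in recipe)

# Hypothetical data with three identical blocks followed by one different block.
data = b"ABCDABCDABCDWXYZ"
store, recipe = deduplicate(data)

# Compression removes redundancy inside the data itself.
original = b"A" * 1000
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
```

Here four blocks are referenced but only two are actually stored, and the highly repetitive 1000-byte string compresses to far fewer bytes while remaining fully recoverable.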
Strategies for data reduction (or) Methods of data reduction
(or) Techniques of Data Reduction (or) Types of Data
Reduction:
1. Data Cube Aggregation
• Aggregation operations are applied to the data in the construction of a
data cube.
2. Dimensionality Reduction
• In dimensionality reduction, redundant attributes are detected and
removed, which reduces the data set size.
3. Data Compression
• Encoding mechanisms are used to reduce the data set size.
4. Numerosity Reduction
• In numerosity reduction, the data are replaced or estimated by
alternative, smaller data representations, such as parametric models or
non-parametric methods such as clustering, sampling, and the use of
histograms.
5. Discretisation and concept hierarchy generation
• Raw data values for attributes are replaced by ranges or higher
conceptual levels.
• Data discretization is a form of numerosity reduction that is very useful
for the automatic generation of concept hierarchies.

• Discretization and concept hierarchy generation are powerful tools for
data mining, in that they allow the mining of data at multiple levels of
abstraction.
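A minimal sketch of discretization and a concept hierarchy is shown below. The ages, the 20-year bin width, and the labels "young", "adult", and "senior" are hypothetical choices made for illustration; the histogram of bin counts is an example of the smaller representation used in numerosity reduction.

```python
from collections import Counter

def discretize(age, width=20):
    """Discretization: replace a raw age by the range (bin) it falls into."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Concept hierarchy: each range maps to a higher-level concept (assumed labels).
CONCEPTS = {
    "0-19": "young",
    "20-39": "adult",
    "40-59": "adult",
    "60-79": "senior",
}

# Hypothetical raw values.
ages = [15, 22, 25, 37, 41, 68, 70]

ranges = [discretize(a) for a in ages]       # first level of abstraction
histogram = Counter(ranges)                  # counts per bin replace raw values
concepts = [CONCEPTS[r] for r in ranges]     # second, higher level of abstraction
```

The same data can now be mined at three levels: raw ages, age ranges, or the broad concepts.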

Benefits of Data Reduction

▪ The main benefit of data reduction is simple: the more data you can fit into
a terabyte of disk space, the less capacity you will need to purchase. Other
benefits of data reduction include:
o Data reduction can save energy.
o Data reduction can reduce your physical storage costs.
o Data reduction can decrease your data center footprint.
▪ Data reduction greatly increases the efficiency of a storage system and
directly impacts your total spending on capacity.
▪ Improved efficiency: Data reduction can help to improve the efficiency of
machine learning algorithms by reducing the size of the dataset. This can
make it faster and more practical to work with large datasets.
▪ Improved performance: Data reduction can help to improve the
performance of machine learning algorithms by removing irrelevant or
redundant information from the dataset. This can help to make the model
more accurate and robust.
▪ Reduced storage costs: Data reduction can help to reduce the storage
costs associated with large datasets by reducing the size of the data.
▪ Improved interpretability: Data reduction can help to improve the
interpretability of the results by removing irrelevant or redundant
information from the dataset.

Disadvantages:
▪ Loss of information: Data reduction can result in a loss of information, if
important data is removed during the reduction process.
▪ Impact on accuracy: Data reduction can impact the accuracy of a model,
as reducing the size of the dataset can also remove important information
that is needed for accurate predictions.
▪ Impact on interpretability: Data reduction can make it harder to
interpret the results, as removing irrelevant or redundant information can
also remove context that is needed to understand the results.
▪ Additional computational costs: Data reduction can add additional
computational costs to the data mining process, as it requires additional
processing time to reduce the data.


Explain about Data Classification for Predictive Analytics
➢ Classification models predict categorical class labels. Following are the
examples of cases where the data analysis task is Classification:
• A bank loan officer wants to analyse the data in order to know which
customers (loan applicants) are risky and which are safe.
• A marketing manager at a company needs to predict whether a customer with
a given profile will buy a new computer.
➢ In both of the above examples, a model or classifier is constructed to
predict the categorical labels.
➢ These labels are risky or safe for loan application data and yes or no for
marketing data.
How Does Classification Work?
➢ With the help of the bank loan application that we have discussed above,
let us understand the working of classification. The Data Classification
process includes two steps −
1. Building the Classifier or Model
2. Using Classifier for Classification
1. Building the Classifier or Model
➢ This step is the learning step or the learning phase.
➢ In this step the classification algorithms build the classifier.
➢ The classifier is built from the training set made up of database tuples and
their associated class labels.
➢ Each tuple in the training set belongs to a predefined category or
class. These tuples can also be referred to as samples, objects, or data points.

2. Using Classifier for Classification


➢ In this step, the classifier is used for classification.
➢ Here the test data is used to estimate the accuracy of classification rules.

➢ The classification rules can be applied to the new data tuples if the
accuracy is considered acceptable.
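The two steps can be sketched with a very small classifier. The text does not name an algorithm, so a nearest-centroid classifier is assumed here only because it is short. The loan applicants, their features (income and existing debt, in thousands), and the 60% acceptance threshold are all hypothetical.

```python
def build_classifier(training_set):
    """Step 1 (learning): compute one centroid (average tuple) per class label."""
    sums, counts = {}, {}
    for features, label in training_set:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def classify(model, features):
    """Step 2 (classification): return the label of the nearest centroid."""
    def sq_dist(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

def accuracy(model, test_set):
    """Share of test tuples whose predicted label matches the known label."""
    correct = sum(1 for features, label in test_set
                  if classify(model, features) == label)
    return correct / len(test_set)

# Hypothetical training tuples: (income, debt) with a known class label.
training_set = [((50, 5), "safe"), ((60, 10), "safe"),
                ((20, 30), "risky"), ((25, 40), "risky")]
# Hypothetical test tuples, kept separate from the training set.
test_set = [((55, 8), "safe"), ((30, 35), "risky"), ((40, 25), "safe")]

model = build_classifier(training_set)
estimated_accuracy = accuracy(model, test_set)

# Apply the classifier to a new applicant only if the accuracy is acceptable.
if estimated_accuracy >= 0.6:
    new_label = classify(model, (58, 6))
```

In this made-up data one of the three test tuples is misclassified, which shows why accuracy is estimated on test data before the classifier is applied to new tuples.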

Classification Issues
The major issue is preparing the data for Classification. Preparing the data
involves the following activities −
➢ Data Cleaning − Data cleaning involves removing noise and treating
missing values. Noise is removed by applying smoothing techniques,
and the problem of missing values is solved by replacing a missing value
with the most commonly occurring value for that attribute.
➢ Relevance Analysis − The database may also contain irrelevant attributes.
Correlation analysis is used to determine whether any two given attributes are
related.
➢ Data Transformation and reduction − The data can be transformed by
any of the following methods.
▪ Normalization − The data is transformed using normalization.
Normalization involves scaling all values for a given attribute so that
they fall within a small specified range. Normalization is used
when neural networks or methods involving distance
measurements are used in the learning step.
▪ Generalization − The data can also be transformed by generalizing it to
a higher-level concept. For this purpose we can use concept
hierarchies.

Explain about Data Association for Predictive Analytics
➢ Association rules are "if-then" statements that help to show the
probability of relationships between data items within large data sets in
various types of databases.
➢ Association rule mining has a number of applications and is widely used
to help discover sales correlations in transactional data or in medical data
sets.
Use cases for association rules:
➢ In Business Analytics, association rules are used to find correlations and
co-occurrences between data sets.
➢ The act of using association rules is sometimes referred to as "association
rule mining" or "mining associations."
➢ Below are a few real-world use cases for association rules:
Medicine:
• Doctors can use association rules to help diagnose patients. There are
many variables to consider when making a diagnosis, as many diseases
share symptoms.
• By using association rules and machine learning-fuelled data analysis,
doctors can determine the conditional probability of a given illness by
comparing symptom relationships in the data from past cases.
• As new diagnoses get made, the machine learning model can adapt the
rules to reflect the updated data.
Retail:
• Retailers can collect data about purchasing patterns, recording purchase
data as item barcodes are scanned by point-of-sale systems.
• Machine learning models can look for co-occurrence in this data to
determine which products are most likely to be purchased together.
• The retailer can then adjust marketing and sales strategy to take advantage
of this information.
User experience (UX) design:
• Developers can collect data on how consumers use a website they create.
• They can then use associations in the data to optimize the website user
interface -- by analyzing where users tend to click and what maximizes the
chance that they engage with a call to action, for example.
Entertainment:
• Services like Netflix and Spotify can use association rules to fuel their
content recommendation engines.
• Machine learning models analyze past user behavior data for frequent
patterns, develop association rules and use those rules to recommend
content that a user is likely to engage with, or organize content in a way
that is likely to put the most interesting content for a given user first.

How do association rules work?


➢ Association rule mining, at a basic level, involves the use of machine
learning models to analyze data for patterns, or co-occurrences, in a
database.
➢ It identifies frequent if-then associations, which themselves are
the association rules.
➢ An association rule has two parts: an antecedent (if) and a consequent
(then).

➢ An antecedent is an item found within the data.
➢ A consequent is an item found in combination with the antecedent.
➢ Association rules are created by searching data for frequent if-then
patterns and using the criteria support and confidence to identify the
most important relationships.
➢ Support is an indication of how frequently the items appear in the data.
➢ Confidence indicates how often the if-then statement is found to be
true, i.e. the proportion of transactions containing the antecedent that
also contain the consequent.
➢ A third metric, called lift, can be used to compare confidence with
expected confidence, or how often an if-then statement is expected
to be found true.
➢ Association rules are calculated from itemsets, which are made up of two
or more items.
➢ If rules were built from analyzing all the possible itemsets, there could be so
many rules that the rules would hold little meaning.
➢ For that reason, association rules are typically created from rules that are
well represented in the data.
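The search for frequent if-then patterns can be sketched on a tiny set of transactions. This is a simplified illustration, not a full algorithm such as Apriori: it only considers rules with one item on each side, and the transactions and the thresholds (minimum support 0.5, minimum confidence 0.8) are hypothetical.

```python
from itertools import permutations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share that also
    contain the consequent. Assumes the antecedent occurs at least once."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

def mine_rules(transactions, min_support, min_confidence):
    """Keep only 'if a then b' rules that are well represented in the data."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue                      # itemset too rare to be meaningful
        c = confidence({a}, {b}, transactions)
        if c >= min_confidence:
            rules.append((a, b, s, c))    # (antecedent, consequent, support, confidence)
    return rules

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rules = mine_rules(transactions, min_support=0.5, min_confidence=0.8)
```

With these thresholds the only rule that survives is "if butter then bread": the pair appears in half of the baskets, and every basket with butter also contains bread.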

Uses of association rules in data mining


➢ In data mining, association rules are useful for analyzing and predicting
customer behavior.
➢ They play an important part in customer analytics, market basket
analysis, product clustering, catalog design and store layout.
➢ Programmers use association rules to build programs capable of machine
learning.
➢ Machine learning is a type of artificial intelligence (AI) that seeks to build
programs with the ability to become more efficient without being explicitly
programmed.

Example of association rules in data mining


A classic example of association rule mining refers to a relationship between
diapers and beer. The example, which seems to be fictional, claims that men
who go to a store to buy diapers are also likely to buy beer. Data that would
point to that might look like this:
A supermarket has 2,00,000 customer transactions. About 4,000
transactions, or about 2% of the total number of transactions, include the
purchase of diapers. About 5,500 transactions (2.75%) include the purchase
of beer. About 3,500 transactions (1.75% of the total) include both
diapers and beer. If the two purchases were unrelated, the percentages
suggest that this number should be much lower (about 2% × 2.75% ≈ 0.055%
of transactions, or roughly 110). However, the fact that about 87.5% of
diaper purchases include the purchase of beer indicates a link between
diapers and beer.
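The figures in this example can be checked directly. The sketch below only restates the numbers given above; the variable names are illustrative, and lift is computed in its usual form as confidence divided by the overall share of transactions containing the consequent (beer).

```python
# Figures taken from the example above.
total_transactions = 200_000
diaper_transactions = 4_000
beer_transactions = 5_500
both_transactions = 3_500

# Support: share of all transactions containing the item(s).
support_diapers = diaper_transactions / total_transactions   # 2%
support_beer = beer_transactions / total_transactions        # 2.75%
support_both = both_transactions / total_transactions        # 1.75%

# Confidence of "if diapers then beer": share of diaper purchases with beer.
confidence_diapers_to_beer = both_transactions / diaper_transactions   # 87.5%

# If diapers and beer were unrelated, this many joint purchases would be expected.
expected_both = support_diapers * support_beer * total_transactions

# Lift compares the observed confidence with the expected confidence.
lift = confidence_diapers_to_beer / support_beer
```

The expected number of joint purchases under independence is only about 110, against 3,500 observed, and the lift is well above 1 (roughly 32), which is what indicates a link between the two items.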


Explain about Cause & Effect Modelling


➢ Cause and Effect refers to a relationship between two phenomena in which
one phenomenon is the reason behind the other.
Ex:- Eating too much fast food without any physical activity leads to weight
gain.
Here, eating without any physical activity is the “cause” and weight gain
is the “effect”

➢ Cause and Effect Analysis gives us a useful way of finding the causes of a
problem. This diagram-based technique, which combines brainstorming with
a type of mind map, pushes us to consider all possible causes of a problem,
rather than just the ones that are most obvious.
➢ Cause and Effect Analysis was devised by Prof. Kaoru Ishikawa, a pioneer
of quality management, in the 1960s.
➢ The technique was then published in his 1990 book, "Introduction to
Quality Control."
➢ The diagrams that you create with this technique are known as ‘Ishikawa
Diagrams’ or ‘Fishbone Diagrams’ (because a completed diagram can look like
the skeleton of a fish).
➢ There are four steps to solve a problem with Cause and Effect Modeling:
• Identify the Problem
• Work out the major factors involved
• Identify possible causes
• Analyze your diagram


Step 1: Identify the Problem


➢ First, write down the exact problem you face. Where appropriate,
identify who is involved, what the problem is, and when and where it
occurs.
➢ Then, write the problem in a box on the left-hand side of a large sheet
of paper, and draw a line across the paper horizontally from the box.
➢ This arrangement, looking like the head and spine of a fish, gives you
space to develop ideas.
Example:
➢ In this simple example, a manager is having problems with an
uncooperative branch office.
Step 2: Work Out the Major Factors Involved
➢ Next, identify the factors that may be part of the problem.
➢ These may be systems, equipment, materials, external forces, people
involved with the problem, and so on.
➢ Try to draw out as many of these as possible. As a starting point, you
can use models such as the McKinsey 7S Framework, which offers
the following factors for you to consider:
Strategy, Structure, Systems, Shared values, Skills, Style and Staff
(or)
➢ The 4Ps of Marketing, which offers the following possible factors:
Product, Place, Price and Promotion
➢ Brainstorm any other factors that may affect the situation.
➢ Then draw a line off the "spine" of the diagram for each factor, and label
each line.
Example:
➢ The manager identifies the following factors, and adds these to his
diagram:
– Site  – Task  – People  – Equipment  – Control

Step 3: Identify Possible Causes
➢ Now, for each of the factors you considered in step 2, brainstorm
possible causes of the problem that may be related to the factor.
➢ Show these possible causes as shorter lines coming off the "bones" of
the diagram.
➢ Where a cause is large or complex, then it may be best to break it down
into sub-causes.
➢ Show these as lines coming off each cause line.
Example:
➢ For each of the factors he identified in step 2, the manager brainstorms
possible causes of the problem, and adds these to the diagram
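
The structure built up in Steps 1 to 3 can be sketched as a nested Python dictionary. The problem and the five factor names come from the example above; the causes and sub-causes are hypothetical placeholders, since the original diagram is not reproduced in these notes.

```python
# Problem and factor names come from the example above; every cause and
# sub-cause is a hypothetical placeholder, not taken from the source.
fishbone = {
    "problem": "Uncooperative branch office",
    "factors": {
        "Site": {"(hypothetical) Remote location": []},
        "Task": {"(hypothetical) Unclear priorities": []},
        "People": {
            "(hypothetical) New branch manager": [
                "(hypothetical) No induction training",
            ],
        },
        "Equipment": {},   # no causes brainstormed yet
        "Control": {},     # no causes brainstormed yet
    },
}


def render(diagram):
    """Return the diagram as indented text lines: problem, factors, causes, sub-causes."""
    lines = [f"Problem: {diagram['problem']}"]
    for factor, causes in diagram["factors"].items():
        lines.append(f"  Factor: {factor}")
        for cause, sub_causes in causes.items():
            lines.append(f"    Cause: {cause}")
            for sub in sub_causes:
                lines.append(f"      Sub-cause: {sub}")
    return lines


print("\n".join(render(fishbone)))
```

Each level of nesting corresponds to one level of "bones": factors branch off the spine, causes branch off factors, and sub-causes branch off causes.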

Step 4: Analyze Your Diagram


➢ By this stage you should have a diagram showing all of the possible
causes of the problem that you can think of.
➢ Depending on the complexity and importance of the problem, you can
now investigate the most likely causes further.
➢ This may involve setting up investigations, carrying out surveys, and so
on.
➢ These will be designed to test which of these possible causes is actually
contributing to the problem.


Unit – IV : PRESCRIPTIVE ANALYTICS


Introduction:
➢ Descriptive analytics describes the ‘past’ and predictive analytics provides
a ‘probability of what might happen’.
➢ In contrast, prescriptive analytics helps an organization evaluate different
scenarios and seeks to determine the best course of action to achieve
optimal outcomes, given the known variables and estimates of the unknown
ones.
➢ Prescriptive analytics provides decision options and shows the likely
impact of each decision option using probability theory.

➢ To design and implement an effective prescriptive analytics strategy, an
organization needs an information management strategy (including
both internal and external data as well as both structured and
unstructured data), a technology strategy and a data science strategy.
➢ The organization must invest in a team of data scientists to use
sophisticated simulation techniques, machine learning and statistical
algorithms for crunching relevant data and applying probability theory.
➢ The data science team works with leaders to design a
prescriptive strategy for evaluating scenarios and making optimal
decisions.
➢ In practice, prescriptive analytics can continually and automatically
process new data to improve the accuracy of predictions and provide
better decision options.
➢ The effectiveness of prescriptive analytics also depends on how well the
decision model captures the impact of the decisions being analyzed.
➢ Advancements in the speed of computing and the development of
complex mathematical algorithms applied to the data sets have made
prescriptive analysis possible.
➢ Specific techniques used in prescriptive analytics include optimization,
simulation, game theory and decision-analysis methods.
➢ The following are different methods used in prescriptive analytics:
• Linear Programming (Linear Optimization)
• Formulation of LPP
• Solving LPP by the Graphical Method
• Non-Linear Programming
• Integer Optimization
• Cutting Plane Algorithm Methods
• Decision Analysis – Risk and Uncertainty Methods
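
As an illustration of how decision options and their likely impact can be compared using probabilities, here is a minimal Python sketch of one common rule for decisions under risk, the expected-value criterion. The options, scenarios, probabilities and payoffs are all hypothetical, and choosing this particular criterion is an assumption, not something stated in the notes.

```python
# All names and numbers below are hypothetical, chosen only for illustration.
scenario_probability = {
    "high demand": 0.5,
    "medium demand": 0.3,
    "low demand": 0.2,
}

# Payoff (profit) of each decision option under each scenario.
payoff = {
    "expand plant": {"high demand": 100, "medium demand": 40, "low demand": -60},
    "outsource":    {"high demand": 60,  "medium demand": 50, "low demand": 10},
    "do nothing":   {"high demand": 20,  "medium demand": 20, "low demand": 20},
}


def expected_payoff(option):
    """Probability-weighted average payoff of one decision option."""
    return sum(
        probability * payoff[option][scenario]
        for scenario, probability in scenario_probability.items()
    )


expected = {option: expected_payoff(option) for option in payoff}
best_option = max(expected, key=expected.get)

for option, value in expected.items():
    print(f"{option}: expected payoff = {value:.1f}")
print("recommended option:", best_option)
```

With these made-up numbers, "expand plant" has the highest expected payoff (50 against 47 and 20), although it is also the only option that can lose money, which is why other criteria are sometimes preferred.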
Linear Programming (Linear Optimization)
➢ Linear programming (LP, also called linear optimization) is a method
to achieve the best outcome (such as maximum profit or lowest cost) in
a mathematical model whose requirements are represented by linear
relationships.
➢ Linear programming is a special case of mathematical programming
(also known as mathematical optimization).
➢ More formally, linear programming is a technique for
the optimization of a linear objective function, subject to linear
equality and linear inequality constraints.
➢ Linear programming (LP) is one of the simplest ways to perform
optimization.
➢ It helps you solve some very complex optimization problems by making
a few simplifying assumptions.
➢ Linear programming is a simple technique where we depict complex
relationships through linear functions and then find the optimum
points.
➢ The real relationships might be much more complex – but we can
simplify them to linear relationships.
Example of a linear programming problem
➢ Let’s say a FedEx delivery man has 6 packages to deliver in a day. The
warehouse is located at point A.
➢ The 6 delivery destinations are given by U, V, W, X, Y and Z.
➢ The numbers on the lines indicate the distance between the cities.
➢ To save on fuel and time the delivery person wants to take the shortest
route.
➢ So, the delivery person will calculate different routes for going to all
the 6 destinations and then come up with the shortest route.
➢ Choosing the best option (here, the shortest route) from many
alternatives under given restrictions is an optimization problem. Linear
programming is one technique for solving such problems when the
relationships involved are linear.
➢ Linear programming is used for obtaining the optimal solution for
a problem with given constraints.
➢ In linear programming, we formulate our real-life problem into a
mathematical model.
➢ The model consists of a linear objective function that is optimized subject
to constraints expressed as linear equalities or inequalities.

Formulation of Linear Programming Problem (LPP):-


Common Terminologies used in LPP:-

Decision Variables:-
• The Decision variables are the variables which will decide the output.
• They represent the ultimate solution.
• To solve any problem, first we need to identify the decision Variables.
Ex: The total number of units of two products, denoted by 'x' and 'y'
respectively, are the decision variables.

Objective Function:-
• The Objective function expresses the goal of the decision (the quantity to
be maximized or minimized) in terms of the decision variables. It is
denoted by 'Z'.
Ex: - The company wishes to increase the total profit. So, profit is the
objective function.
Constraints: -
• The constraints are the restrictions (or) limitations on the decision
variables.
• They usually limit the value of the decision variables.
Ex:- The limits on the availability of resources are the Constraints.

Non-Negativity Restrictions:-
• For all linear programs, the decision variables should always take non-
negative values, which means the values for decision Variables should be
greater than (or) equal to '0'.

Ex:- x≥0, y≥0
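
Putting the four terms together, a linear program in two decision variables can be written in the general form below. The coefficient symbols c, a and b are notation assumed here for illustration; they do not appear in the notes.

```latex
\begin{aligned}
\text{Maximize (or minimize)}\quad & Z = c_1 x + c_2 y
    && \text{(objective function)} \\
\text{subject to}\quad & a_{11} x + a_{12} y \le b_1
    && \text{(constraint 1)} \\
& a_{21} x + a_{22} y \le b_2
    && \text{(constraint 2)} \\
& x \ge 0,\; y \ge 0
    && \text{(non-negativity restrictions)}
\end{aligned}
```

Here x and y are the decision variables, c_1 and c_2 are the per-unit contributions to the objective, each a_{ij} is the amount of resource i used by one unit of the corresponding variable, and b_i is the amount of resource i available.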

Explain about the Process of formulating a Linear Programming
Problem (LPP):-

The following are the different steps involved in the formulation of LPP :
Step 1: Identify the decision Variables.
Step 2: Write the objective function.
Step 3: Mention the constraints.
Step 4: Explicitly state the non-negativity restriction.

For a problem to be an LPP, the objective function and the constraints must
all be linear functions of the decision variables.
If these conditions are satisfied, it is called a Linear
Programming Problem (LPP).

Problems:
1. Consider a chocolate manufacturing company which produces only two
types of chocolate, A and B. Both the chocolates require Milk and Choco only.
To manufacture each unit of A and B, the following quantities are required:
• Each unit of 'A' requires 1 unit of Milk and 3 units of Choco.
• Each unit of 'B' requires 1 unit of Milk and 2 units of Choco.
The company has a total of 5 units of Milk and 12 units of Choco. On each
sale, the company makes a profit of Rs. 6 per unit of 'A' and Rs. 5 per unit
of 'B'. Formulate the problem to maximize the company's profit.

Sol:- Let 'x' be the total number of units of chocolate 'A' produced and
'y' be the total number of units of chocolate 'B' produced.
Now, we can represent the given problem in a tabular form for better
understanding:

                   Milk    Choco    Profit per unit (Rs.)
A (x units)          1       3               6
B (y units)          1       2               5
Total available      5      12
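
Following the four formulation steps with the quantities given in the problem statement, the formulation would read: Maximize Z = 6x + 5y, subject to x + y ≤ 5 (Milk), 3x + 2y ≤ 12 (Choco), and x ≥ 0, y ≥ 0. The Python sketch below encodes this formulation so that candidate production plans can be tested; the function names and the candidate plans are illustrative choices, not part of the notes.

```python
# Step 1: the decision variables x (units of A) and y (units of B)
# are the arguments of the functions below.

def profit(x, y):
    """Step 2: objective function Z = 6x + 5y (in Rs.)."""
    return 6 * x + 5 * y


def is_feasible(x, y):
    """Steps 3 and 4: resource constraints and non-negativity restrictions."""
    milk_ok = 1 * x + 1 * y <= 5      # Milk: 1 unit per A, 1 unit per B, 5 available
    choco_ok = 3 * x + 2 * y <= 12    # Choco: 3 units per A, 2 units per B, 12 available
    non_negative = x >= 0 and y >= 0
    return milk_ok and choco_ok and non_negative


# Try a few candidate production plans (chosen arbitrarily for illustration).
for plan in [(1, 1), (2, 3), (4, 1), (0, 5)]:
    status = "feasible" if is_feasible(*plan) else "infeasible"
    print(plan, status, "profit =", profit(*plan))
```

The plan (4, 1) is rejected because it would need 14 units of Choco, while (2, 3) uses exactly the 5 units of Milk and 12 units of Choco available.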


Graphical method of solving Linear Programming Problem (LPP):-

The steps of graphical method can be summarized as follows:


Step 1: Formulate the linear programming problem.
Step 2: Write each inequality constraint as an equation.
Step 3: Plot the constraint lines, treating the constraints as equations.
Step 4: Identify the feasible solution region. Feasible solution is a solution for
which all the constraints are satisfied. Conversely an infeasible
solution is a solution for which at least one constraint is violated.
Step 5: Locate the corner points of the feasible region.
Step 6: Calculate the value of the objective function on the corner points.
Step 7: Choose the point where the objective function has optimal value. An
optimal solution is a feasible solution that has the most favourable value
of the objective function. The most favourable value is the largest value
if the objective function is to be maximized, whereas it is the smallest value
if the objective function is to be minimized.
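
The steps above can be sketched in Python as a corner-point search in place of a hand-drawn graph. The sketch assumes the chocolate problem is formulated as Maximize Z = 6x + 5y subject to x + y ≤ 5, 3x + 2y ≤ 12, x ≥ 0, y ≥ 0 (read off the problem statement). The helper names are illustrative.

```python
from itertools import combinations

# Each constraint is stored as (a, b, c), meaning a*x + b*y <= c.
# Non-negativity is written as -x <= 0 and -y <= 0 so that every
# boundary line is handled in the same way.
constraints = [
    (1, 1, 5),    # Milk:  x + y <= 5
    (3, 2, 12),   # Choco: 3x + 2y <= 12
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]


def objective(x, y):
    """Z = 6x + 5y."""
    return 6 * x + 5 * y


def intersection(line1, line2):
    """Steps 2-3: treat both constraints as equations and solve them together
    (Cramer's rule). Returns None when the lines are parallel."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return (x + 0.0, y + 0.0)   # adding 0.0 turns -0.0 into 0.0


def is_feasible(point, tol=1e-9):
    """Step 4: a point is feasible if it satisfies every constraint."""
    x, y = point
    return all(a * x + b * y <= c + tol for a, b, c in constraints)


# Step 5: corner points are the feasible intersections of boundary lines.
corner_points = set()
for line1, line2 in combinations(constraints, 2):
    point = intersection(line1, line2)
    if point is not None and is_feasible(point):
        corner_points.add(point)

# Steps 6-7: evaluate Z at every corner point and keep the largest value.
best_point = max(corner_points, key=lambda p: objective(*p))
best_value = objective(*best_point)

for point in sorted(corner_points):
    print(point, "Z =", objective(*point))
print("optimal:", best_point, "Z =", best_value)
```

Under this formulation the feasible region has four corner points, (0, 0), (4, 0), (0, 5) and (2, 3), and Z is largest at x = 2, y = 3, giving a profit of Rs. 27.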
