Business Analytics - Part-II
Types of Data: -
• Population and Sample Data: -
▪ Data can be categorized in several ways based on how they are collected,
and the type collected. In many cases, it is not feasible to collect data
from the population of all elements of interest.
▪ In such instances, we collect data from a subset of the population known
as a sample.
▪ For example, with the thousands of publicly traded companies in India,
tracking and analyzing all of these stocks every day would be too
time-consuming and expensive. The NIFTY 50 index of the NSE represents a
sample of 50 stocks of large public companies based in India, and it is
often interpreted to represent the larger population of all publicly
traded companies.
Siva Sivani Degree College
BBA (Business Analytics)–I year/II Semester Introduction to Business Analytics
▪ It is very important to collect sample data that are representative of the
population data so that generalizations can be made from them. In most
cases, a representative sample can be gathered by random sampling
from the population data.
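The idea of random sampling can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 ticker labels and the seed value are assumptions made up for the example, not real stock data.

```python
import random

# Hypothetical population: labels standing in for 1,000 listed companies.
population = [f"STOCK{i:04d}" for i in range(1, 1001)]

# A fixed seed is used only so that the sketch gives the same sample each run.
rng = random.Random(42)

# Simple random sample of 50 elements, drawn without replacement.
sample = rng.sample(population, k=50)

print(len(sample), sample[:5])
```

Because every element has the same chance of being selected, a sample drawn this way tends to be representative of the population.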
• Decision modelling can also help companies model complex operational
decisions into more manageable subsets, which can facilitate scalability.
• Decision modelling is the process of creating a structured and, typically,
visual representation of how the decisions are made within an
organisation.
• Decision models created through this process serve as visual aids helping
all involved stakeholders, including analysts and key decision-makers, to
comprehend all the important factors, business rules, and considerations
that impact choices within an organisation.
• The Business Decision Models are as follows:
o Creative Decision-Making Model
o Intuitive Decision-Making Model
o Rational Decision-Making Model
o Intuitive decision making is especially beneficial in entrepreneurship,
ideation, creativity, and selling.
o Here are some steps for making intuitive decisions:
▪ Identify the decision to be made.
▪ Gather relevant information.
▪ Identify alternative solutions.
▪ Evaluate the options.
▪ Choose a course of action.
▪ Implement the decision.
▪ Review the outcome.
o The rational decision-making model can be used for a variety of
reasons, including educational purposes, business decisions, career
choices, and other significant life events.
o It allows for an objective approach that's based on scientifically
obtained data to reach informed decisions.
o It reduces the chance of errors and assumptions.
o It helps to minimize the manager's emotions which might have resulted
in poor judgments in the past.
UNIT – II
Descriptive Analytics
Overview of Descriptive Statistics: - (Central Tendency, Variability)
Q-1) Explain about Measures of Location (or) Central Tendency
through MS-Excel: -
• Central tendency is a statistical measure that represents a single value
that is representative of an entire data distribution.
• It aims to provide an accurate description of the entire data.
▪ The mean can be found in Excel using the AVERAGE function.
▪ Suppose the Home Sales data from the table are entered in an Excel
spreadsheet. The value for the mean in cell E3 is calculated using the
formula:
=AVERAGE(C3:C14)
Median: -
▪ The median of a data set can be found in Excel using the function
= MEDIAN( )
▪ The value for the median in cell E5 is found using the formula
=MEDIAN(B4:B13)
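For readers who want to check the Excel results outside a spreadsheet, the same two measures can be sketched with Python's standard `statistics` module. The values below are made up for illustration and are not the Home Sales data.

```python
import statistics

# Hypothetical data with one unusually large value (90).
values = [12, 15, 11, 18, 14, 90]

mean_value = statistics.mean(values)      # plays the role of =AVERAGE(range)
median_value = statistics.median(values)  # plays the role of =MEDIAN(range)

print(f"mean = {mean_value:.2f}, median = {median_value}")
```

The single large value pulls the mean well above the median, which is why the median is often preferred when the data contain extreme values.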
Mode: -
▪ A third measure of location, the mode, is the value that occurs most
frequently in a dataset.
▪ Consider the sample of five class sizes, 32 42 46 46 54
The only value that occurs more than once is 46. Because this value,
occurring with a frequency of 2, has the greatest frequency, it is the
mode.
▪ To find the mode for a data set with only one most often occurring
value in Excel, we use the MODE.SNGL function.
▪ Occasionally the greatest frequency occurs at two or more different
values, in which case more than one mode exists.
▪ If data contain at least two modes, we say that they are multimodal. A
special case of multimodal data occurs when the data contain exactly
two modes; in such cases we say that the data are bimodal.
▪ In multimodal cases when there are more than two modes, the mode is
almost never reported because listing three or more modes is not
particularly helpful in describing a location for the data.
▪ Also, if no value in the data occurs more than once, we say the data have
no mode.
▪ The Excel MODE.SNGL function will return only a single most-often-
occurring value.
▪ For multimodal distributions, we must use the MODE.MULT command
in Excel to return more than one mode.
▪ To find both of the modes in Excel, we take these steps:
Step 1: Select cells G7 and G8
Step 2: Type the formula =MODE.MULT(E4:E16)
Step 3: Press CTRL+SHIFT+ENTER after typing the formula
Excel enters the values for both modes of this data set in cells G7 and G8
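The difference between a single mode and several modes can also be sketched in Python. The class sizes are the ones used above; the bimodal list is a made-up example.

```python
import statistics

# Class sizes from the notes: 46 is the only repeated value.
class_sizes = [32, 42, 46, 46, 54]
single_mode = statistics.mode(class_sizes)    # similar in spirit to MODE.SNGL

# Hypothetical bimodal data: 2 and 3 both occur twice.
bimodal = [1, 2, 2, 3, 3, 4]
all_modes = statistics.multimode(bimodal)     # similar in spirit to MODE.MULT

print(single_mode, all_modes)
```

`statistics.multimode` returns every value that shares the greatest frequency, in the order in which the values first appear.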
Geometric Mean:-
▪ The geometric mean is a measure of location that is calculated by
finding the nth root of the product of n values.
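▪ The general formula for the sample geometric mean, denoted $\bar{x}_g$,
is as follows:
$$\bar{x}_g = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = (x_1 \cdot x_2 \cdots x_n)^{1/n}$$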
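A minimal sketch of the geometric mean as the nth root of the product of n values. The three growth factors are assumed values chosen only for illustration.

```python
import math
import statistics

# Hypothetical yearly growth factors (1.05 means +5%, 0.98 means -2%).
growth_factors = [1.05, 1.10, 0.98]

n = len(growth_factors)

# Direct use of the definition: nth root of the product of the n values.
by_definition = math.prod(growth_factors) ** (1 / n)

# The standard library offers the same measure as a ready-made function.
by_library = statistics.geometric_mean(growth_factors)

print(f"{by_definition:.6f} {by_library:.6f}")
```

Both lines compute the same quantity; the library version is generally the safer choice for long lists because it avoids forming a very large or very small product.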
▪ The mean, median, and mode are all measures of the centre of a set of
data. You can use this relationship to estimate the third unknown quantity
if you know any two of the mean, median, or mode.
▪ For any given data, the mean is the average of the data values, calculated
by dividing the sum of all data values by the number of data values. The
median is the middlemost value of the data set when the data values are
arranged in either ascending or descending order. The mode is the most
frequently occurring data value.
▪ For a frequency distribution with symmetrical frequency curve, the
relation between mean median and mode is given by:
Mean = Median = Mode
▪ For a positively skewed frequency distribution, the relation between
mean median and mode is:
Mean > Median > Mode
▪ For a negatively skewed frequency distribution, the relation between
mean median and mode is:
Mean < Median < Mode
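The ordering for a positively skewed distribution can be checked on a small made-up data set. The values below are an assumption chosen so that a long right tail is present.

```python
import statistics

# Hypothetical positively skewed data: most values are small, one is large.
data = [2, 2, 2, 3, 4, 5, 10]

mean_value = statistics.mean(data)      # 28 / 7 = 4
median_value = statistics.median(data)  # middle value of the sorted data = 3
mode_value = statistics.mode(data)      # most frequent value = 2

print(mean_value, median_value, mode_value)
```

Here the mean exceeds the median, which exceeds the mode, matching the relation stated for positive skew.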
Example:
The median and mean values of a series that is moderately
asymmetrical are 30 and 20, respectively. Determine the mode value.
Solution:
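In the above example, the mode is calculated using the empirical
relationship between the mean, median, and mode of a moderately
asymmetrical distribution:
Mode = 3 × Median - 2 × Mean = 3(30) - 2(20) = 50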
▪ Use the Excel QUARTILE.INC function again to find the third quartile
(Q3). Formula:
=QUARTILE.INC(data_range, 3)
▪ Subtract Q1 from Q3.
▪ Divide the result by 2 (since it's the semi-interquartile range).
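The same calculation can be sketched in Python. The data are made up, and the `"inclusive"` method of `statistics.quantiles` is used on the assumption that it corresponds to the interpolation rule of Excel's QUARTILE.INC.

```python
import statistics

# Hypothetical data set of nine values.
data = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

interquartile_range = q3 - q1
quartile_deviation = interquartile_range / 2  # semi-interquartile range

print(q1, q3, quartile_deviation)
```

For this data set Q1 is 30 and Q3 is 70, so the quartile deviation is 20.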
Mean Deviation(MD) : -
▪ Mean deviation, also known as the average absolute deviation, is a
measure of statistical dispersion that quantifies the average absolute
difference between each data point and the mean of the dataset.
▪ It gives an indication of how spread out the values in a dataset are.
▪ In Excel, you can calculate the mean deviation using built-in functions.
▪ Organize your data: Place your data in a column in Excel.
▪ Calculate the Mean: Use the AVERAGE function to find the mean of your
dataset.
▪ Calculate Absolute Deviations: Subtract the mean from each data point
and take the absolute value of each difference.
▪ Calculate Mean Deviation: Find the average of the absolute deviations
calculated in the previous step.
▪ Suppose your data is in cells A2:A11.
▪ In cell A13, enter the formula to calculate the mean:
=AVERAGE(A2:A11)
▪ In cell B2, enter the formula to calculate the absolute deviations:
=ABS(A2-$A$13)
▪ Drag this formula down to cover all the data points. This will give you the
absolute deviation of each data point from the mean.
=AVERAGE(ABS(A2:A11-AVERAGE(A2:A11)))
This formula directly calculates the mean deviation without needing an
intermediate column for absolute deviations.
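The steps for the mean deviation translate directly into a short Python sketch. The five data values are assumed for illustration.

```python
# Hypothetical data values.
data = [2, 4, 6, 8, 10]

# Step 1: the mean of the data.
mean_value = sum(data) / len(data)

# Step 2: the absolute deviation of each value from the mean.
absolute_deviations = [abs(value - mean_value) for value in data]

# Step 3: the average of those absolute deviations.
mean_deviation = sum(absolute_deviations) / len(absolute_deviations)

print(mean_value, absolute_deviations, mean_deviation)
```

The absolute deviations here are 4, 2, 0, 2, and 4, so the mean deviation is 12 / 5 = 2.4.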
Variance: -
▪ Variance is a statistical measure that quantifies the spread or dispersion
of a set of data points around their mean or average value.
▪ It provides insight into how much individual data points differ from the
mean.
▪ In Excel, you can calculate variance using the VAR.P function for
population variance and the VAR.S function for sample variance.
▪ These functions take the data range as input and return the variance of
the dataset.
=VAR.S(B2:B13)
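The difference between sample variance and population variance is only the divisor (n - 1 versus n). A minimal Python sketch, using made-up data:

```python
import statistics

# Hypothetical data values; the mean is 6 and the squared deviations sum to 40.
data = [2, 4, 6, 8, 10]

sample_variance = statistics.variance(data)       # divides by n - 1, like VAR.S
population_variance = statistics.pvariance(data)  # divides by n, like VAR.P

print(sample_variance, population_variance)
```

With five values the sample variance is 40 / 4 = 10 and the population variance is 40 / 5 = 8.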
Standard Deviation:-
▪ The standard deviation is defined to be the positive square root of the
variance.
▪ We use ‘s’ to denote the sample standard deviation and ‘σ’ to denote the
population standard deviation.
▪ The sample standard deviation ‘s’ is a point estimate of the population
standard deviation ‘σ’ and is derived from the sample variance in the
following way: Sample Standard Deviation: $s = \sqrt{s^2}$
Coefficient of Variation:-
▪ The coefficient of variation (CV) is a relative measure of variability that
expresses the standard deviation as a percentage of the mean.
▪ It is used to compare the variability of different datasets with different units
or scales, allowing for a more meaningful comparison.
▪ Mathematically, the coefficient of variation is calculated as:
CV = (Standard Deviation / Mean) × 100%
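A short sketch of the coefficient of variation, taken here as the sample standard deviation divided by the mean and multiplied by 100. The data values are assumed for illustration.

```python
import statistics

# Hypothetical data values.
data = [2, 4, 6, 8, 10]

mean_value = statistics.mean(data)
sample_std = statistics.stdev(data)  # positive square root of the sample variance

# Standard deviation expressed as a percentage of the mean.
coefficient_of_variation = sample_std / mean_value * 100

print(f"s = {sample_std:.4f}, CV = {coefficient_of_variation:.2f}%")
```

Because the CV has no units, it can be compared across data sets measured on different scales.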
Cross tabulation: -
▪ A useful type of table for describing data of two variables is a
crosstabulation, which provides a tabular summary of data for two
variables.
▪ For example, consider the following application based on data from Zagat’s
Restaurant Review. Data on the quality rating, meal price, and the usual
wait time for a table during peak hours were collected for a sample of 300
Los Angeles area restaurants. The data for the first 10 restaurants are as
follows:
▪ Quality ratings are an example of categorical data, and meal prices are an
example of quantitative data.
▪ For now, we will limit our consideration to the quality-rating and meal-
price variables.
▪ A cross tabulation of the data for quality rating and meal price is shown in
the following table:
PivotTables in Excel:-
A cross tabulation in Microsoft Excel is known as a PivotTable. To create a
PivotTable in Excel, we follow these steps:
Step 1. Click the Insert tab on the Ribbon
Step 2. Click PivotTable in the Tables group
Step 3. When the Create PivotTable dialog box appears:
Choose Select a Table or Range
Enter the data range in the Table/Range: box
Select New Worksheet as the location for the PivotTable Report
Click OK
Step 4. In the PivotTable Fields task pane, go to Drag fields between
areas below:
Drag the Quality Rating field to the ROWS area
Drag the Meal Price ($) field to the COLUMNS area
Drag the Restaurant field to the VALUES area
Step 5. Click on Sum of Restaurant in the VALUES area.
Step 6. Select Value Field Settings from the list of options.
Step 7. When the Value Field Settings dialog box appears:
Under Summarize value field by, select Count.
Click OK
▪ The completed PivotTable Field List and a portion of the PivotTable
worksheet now appear.
▪ To complete the PivotTable, we need to group the columns representing
meal prices and place the row labels for quality rating in the proper order:
Step 8. Right-click in cell B4 or any cell containing a meal price column label
Step 9. Select Group from the list of options
Step 10. When the Grouping dialog box appears:
Enter 10 in the Starting at: box
Enter 49 in the Ending at: box
Enter 10 in the By: box
Click OK
Step 11. Right-click on “Excellent” in cell A5
Step 12. Select Move and click Move “Excellent” to End
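What the PivotTable does (count restaurants for each combination of quality rating and grouped meal price) can be sketched in Python. The ten records below are illustrative values only, not the 300-restaurant sample, and `price_bin` is a hypothetical helper written for this example.

```python
from collections import Counter

# Illustrative (quality rating, meal price in $) pairs.
restaurants = [
    ("Good", 18), ("Very Good", 22), ("Good", 28), ("Excellent", 38),
    ("Very Good", 33), ("Good", 28), ("Very Good", 19), ("Very Good", 11),
    ("Very Good", 23), ("Good", 13),
]

def price_bin(price):
    """Label the $10-wide class a price falls in, e.g. 22 -> '20-29'."""
    low = (price // 10) * 10
    return f"{low}-{low + 9}"

# Count how many restaurants fall in each (rating, price class) cell.
counts = Counter((quality, price_bin(price)) for quality, price in restaurants)

rows = ["Good", "Very Good", "Excellent"]
cols = ["10-19", "20-29", "30-39", "40-49"]
print(f"{'Quality':<10}" + "".join(f"{c:>7}" for c in cols) + f"{'Total':>7}")
for r in rows:
    cells = [counts[(r, c)] for c in cols]
    print(f"{r:<10}" + "".join(f"{n:>7}" for n in cells) + f"{sum(cells):>7}")
```

Grouping prices into classes of width 10 mirrors the Grouping dialog box settings (start at 10, end at 49, by 10), and listing "Excellent" last mirrors the final Move step.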
CHARTS: -
▪ Charts (or graphs) are visual methods for displaying data. Here, we
introduce some of the most commonly used charts to display and analyze
data including scatter charts, line charts, and bar charts.
▪ Excel is the most commonly used software package for creating simple
charts.
How to use Excel to create scatter charts, line charts, sparklines, bar
charts, bubble charts, and heat maps:-
Scatter Charts:-
▪ A scatter chart is a graphical presentation of the relationship between two
quantitative variables.
▪ For example, on 10 occasions during the past three months, a store
used weekend television commercials to promote sales. The
managers want to investigate whether a relationship exists between the
number of commercials shown and sales at the store the following week.
Sample data for the 10 weeks, with sales in hundreds of dollars, are shown
in the following table:
Week    No. of Commercials (x)    Sales in $100s (y)
1       2                         50
2       5                         57
3       1                         41
4       3                         54
5       4                         54
6       1                         38
7       5                         63
8       3                         48
9       4                         59
10      2                         46
We will use the data from the above table to create a scatter chart using
Excel’s chart tools:
Step 1. Select cells B2:C11
Step 2. Click the Insert tab in the Ribbon.
Step 3. Click the Insert Scatter (X,Y) or Bubble Chart button
in the Charts group
Step 4. When the list of scatter chart subtypes appears, click the
Scatter button
Step 5. Right-click on one of the horizontal grid lines in the body of
the chart, and click Delete
Step 6. Right-click on one of the vertical grid lines in the body of the
chart, and click Delete
Step 7. Click Add Chart Element in the Chart Layouts group, Select
Axis, Axis Title, Chart Title and replace the Axis Titles
and Chart Title with respective Names.
▪ We can also use Excel to add a trend line to the scatter chart.
▪ A trend line is a line that provides an approximation of the
relationship between the variables.
▪ To add a linear trend line using Excel, we use the following steps:
Step 1. Right-click on one of the data points in the scatter chart, and
select Add Trendline…
Step 2. When the Format Trendline task pane appears, select Linear
under Trendline Options
The following shows the scatter chart and linear trendline created with Excel
for the data in the above table. The number of commercials (x) is shown on
the horizontal axis, and sales (y) are shown on the vertical axis.
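A linear trend line of this kind is a least-squares line. As a rough sketch, its slope and intercept can be computed by hand in Python from the commercials and sales data in the table; the variable names are chosen for this example only.

```python
# Data from the table: commercials shown (x) and sales in $100s (y).
x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of cross-products and squared deviations about the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(f"sales = {intercept:.2f} + {slope:.2f} * commercials")
```

For these data the line works out to approximately sales = 36.15 + 4.95 × commercials, so each additional commercial is associated with roughly $495 more in weekly sales.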
Line Charts :-
▪ Line charts are similar to scatter charts, but a line connects the points in
the chart.
▪ Line charts are very useful for time series data collected over a period of
time (minutes, hours, days, years, etc.).
▪ To create the line chart in Excel, we follow these steps:
Step 1. Select cells of the entered data.
Step 2. Click the Insert tab on the Ribbon
Step 3. Click the Insert Line Chart button in the Charts group.
Step 4. When the list of line chart subtypes appears, click the Line
with Markers button under 2-D Line
▪ This creates a line chart for sales with a basic layout and minimum
formatting.
Step 5. Click the Chart Elements button. Select the check boxes for
Axes, Axis Titles, and Chart Title. Deselect the check box for Gridlines.
Sparkline: -
▪ A special type of line chart is a sparkline, which is a minimalist type of
line chart that can be placed directly into a cell in Excel.
▪ Sparklines contain no axes; they display only the line for the data.
▪ Sparklines take up very little space, and they can be effectively used to
provide information on overall trends for time series data.
▪ Sparklines can be used in Excel for data such as regional sales. To create
a sparkline in Excel:
Step 1. Click the Insert tab on the Ribbon.
Step 2. Click Line in the Sparklines group.
Pie Charts :-
➢ Pie charts are another common form of chart used to compare
categorical data.
➢ However, many experts argue that pie charts are inferior to bar charts
for comparing data.
➢ The pie chart in the figure displays the same data on the number of
accounts managed that appear in the bar chart figure.
➢ Visually, it is still relatively easy to see that Gentry has the greatest
number of accounts and that Williams has the fewest.
➢ However, it is difficult to say whether Lopez or Francois has more
accounts.
➢ Research has shown that people find it very difficult to perceive
differences in area.
➢ Comparing the two figures shows that visual comparisons are much easier
in the bar chart than in the pie chart (particularly when using a limited
number of colours for differentiation).
➢ Therefore, we recommend against using pie charts in most situations
and suggest instead using bar charts for comparing categorical data.
Bubble Charts:-
▪ A bubble chart is a graphical means of visualizing three variables in a two-
dimensional graph and is therefore sometimes a preferred alternative to a
3-D graph.
▪ Suppose that we want to compare the number of billionaires in various
countries.
▪ The following table provides a sample of six countries, showing, for each
country, the number of billionaires per 10 million residents, the per capita
income, and the total number of billionaires.
▪ We can create a bubble chart using Excel to further examine these data:
Step 1. Select cells of entered data.
Step 2. Click the Insert tab on the Ribbon
Step 3. In the Charts group, click Insert Scatter (X,Y) or Bubble Chart
In the Bubble subgroup, click Bubble
Step 4. Select the chart that was just created to reveal the Chart Buttons
Step 5. Click the Chart Elements button. Select the check boxes for
Axes, Axis Titles, Chart Title, and Data Labels. Deselect the
check box for Gridlines. Rename all the titles.
Heat Maps: -
▪ A heat map is a two-dimensional graphical representation of data that
uses different shades of color to indicate magnitude.
▪ The following figure shows a heat map indicating the magnitude of
changes for a metric called same-store sales, which are commonly used
in the retail industry to measure trends in sales.
▪ The cells shaded red in the figure indicate declining same-store sales for
the month, and cells shaded blue indicate increasing same-store sales for
the month. Column N in the figure also contains sparklines for the
same-store sales data.
▪ The following Figure can be created in Excel by following these steps:
Step 1. Select cells B2:M17
Step 2. Click the Home tab on the Ribbon
Step 3. Click Conditional Formatting in the Styles group
Select Color Scales and click on Blue–White–Red Color Scale
▪ To add the sparklines in column N, we use the following steps:
Step 4. Select cell N2
Step 5. Click the Insert tab on the Ribbon
Step 6. Click Line in the Sparklines group
Step 7. When the Create Sparklines dialog box opens: Enter
B2:M2 in the Data Range: box Enter N2 in the Location
Range: box and click OK
Step 8. Copy cell N2 to N3:N17
▪ Regression analysis is a form of predictive modelling technique which
investigates the relationship between a dependent (target) and
independent variable (s) (predictor).
▪ This technique is used for forecasting, time series modelling and finding
the causal effect relationship between the variables.
▪ For example, the relationship between rash driving and the number of road
accidents by a driver is best studied through regression.
▪ The term “regression” in this context was first coined by Sir Francis
Galton, a cousin of Charles Darwin. The earliest form of regression,
the method of least squares, was developed by Adrien-Marie Legendre
and Carl Friedrich Gauss.
▪ We can analyze data and perform data modeling using regression analysis.
Here, we fit a line or curve to the data points such that the distances
of the data points from the line or curve are minimized.
▪ The terms you will often encounter in regression analysis are:
• Dependent variable or target variable: Variable to predict.
• Independent variable or predictor variable: Variables to estimate the
dependent variable.
• Outlier: Observation that differs significantly from other observations.
Outliers should be handled with care since they may distort the result.
• Multicollinearity: Situation in which two or more independent
variables are highly linearly related.
• Homoscedasticity or homogeneity of variance: Situation in which
the variance of the error term is the same across all values of the
independent variables.
Benefits (or) Advantages (or) Uses of Using Regression Analysis in Data
Analytics:-
▪ There are multiple benefits of using regression analysis. They are as
follows:
▪ It indicates the significant relationships between dependent variable and
independent variable.
▪ It indicates the strength of impact of multiple independent variables on a
dependent variable.
▪ Regression analysis also allows us to compare the effects of variables
measured on different scales, such as the effect of price changes and the
number of promotional activities.
▪ These benefits help market researchers / data analysts / data scientists
to evaluate candidate variables and select the best set to be used for
building predictive models.
▪ Therefore, this powerful statistical tool is used by Business Analysts and
other data professionals for removing the unwanted variables and choosing
only the important ones.
▪ From a business point of view, the regression method of forecasting can
be helpful for an individual working with data in the following ways:
• Predicting sales in the near and long term.
• Understanding demand and supply.
• Understanding inventory levels.
• Review and understand how variables impact all these factors.
▪ Businesses can also use regression methods to understand the
following:
• Why did the customer service calls drop in the past months?
• What will sales look like in the next six months?
• Which ‘marketing promotion’ method to choose?
• Whether to expand the business or to create and market a new
product.
▪ The ultimate benefit of regression analysis is to determine which
independent variables have the most effect on a dependent variable.
▪ It also helps to determine which factors can be ignored and those that
should be emphasized.
Types of regression Analysis:
▪ There are various types of regressions that are used in business
analytics, data science and machine learning.
▪ Here we mention some important types of regression:
▪ Linear Regression
▪ Polynomial Regression
▪ Decision Tree Regression
▪ Random Forest Regression
▪ Ridge Regression
▪ Logistic Regression
What is Linear Regression?
▪ Linear regression is a basic and commonly used type of predictive analysis.
▪ It is the simplest form of regression. It is a technique in which
the dependent variable is continuous in nature.
▪ The relationship between the dependent variable and independent
variables is assumed to be linear in nature.
▪ The overall idea of regression is to examine two things:
• Does a set of predictor variables do a good job in predicting an outcome
(dependent) variable?
• Which variables are significant predictors of the outcome variable, and
in what way do they–indicated by the magnitude and sign of the beta
estimates–impact the outcome variable?
▪ The simplest form of the linear regression equation with one dependent
and one independent variable is defined by the formula:
y = a + b*x + e
where, y = dependent variable score,
a = constant or intercept,
b = regression coefficient or slope of the line,
x = score on the independent variable, and
e = error term
▪ Naming the Variables:
There are many names for a regression’s dependent variable. It may be
called an outcome variable, criterion variable, endogenous variable,
or regressand. The independent variables can be called exogenous
variables, predictor variables, or regressors.
▪ Three major uses for regression analysis are
(1) determining the strength of predictors,
(2) forecasting an effect, and
(3) trend forecasting.
▪ When you have only 1 independent variable and 1 dependent variable, it
is called simple linear regression.
▪ When you have more than 1 independent variable and 1 dependent
variable, it is called Multiple linear regression.
▪ SPSS Statistics can be leveraged in techniques such as simple linear
regression and multiple linear regression.
▪ You can perform the linear regression method in a variety of programs
and environments, including:
• Excel
• Python (for example, with scikit-learn)
• R
• MATLAB
Examples:-
1. Marks scored by students based on number of hours studied
(ideally)-
Here marks scored in exams are the dependent variable and the number of
hours studied is the independent variable.
2. Predicting crop yields based on the amount of rainfall-
Yield is a dependent variable while the measure of precipitation is an
independent variable.
3. Predicting the Salary of a person based on years of experience-
Here, experience is the independent variable while salary is the
dependent variable.
Linear Regression through Excel (or) Creating a Linear Regression Model
in Excel:-
▪ The first step in running regression analysis in Excel is to double-check
that the free Excel plugin Data Analysis ToolPak is installed.
▪ This plugin makes calculating a range of statistics very easy.
▪ It is not required to chart a linear regression line, but it makes creating
statistics tables simpler.
▪ To verify if installed, select "Data" from the toolbar. If "Data Analysis" is
an option, the feature is installed and ready to use.
▪ If not installed, you can enable it from Excel's options: click the Office
button (the File tab in newer versions), select "Excel Options"
("Options"), and activate the Analysis ToolPak under Add-ins.
▪ Using the Data Analysis ToolPak, creating a regression output is just a
few clicks.
Interpret the Results
Using that data, we get the following table:
Multiple Linear Regression:
▪ Multiple linear regression (MLR), also known simply as multiple regression,
is a statistical technique that uses several explanatory variables to predict
the outcome of a response variable.
▪ The goal of multiple linear regression is to model the linear relationship
between the explanatory (independent) variables and response (dependent)
variables.
▪ In essence, multiple regression is the extension of ordinary least-squares
(OLS) regression because it involves more than one explanatory variable.
What is Goodness-of-Fit?
▪ Linear regression finds the equation that minimizes the distance between
the fitted line and all of the data points. Determining how well the model
fits the data is crucial in a linear model.
▪ A general idea is that if the deviations between the observed values and
the predicted values of the linear model are small and unbiased, the
model fits the data well.
▪ In technical terms, “goodness-of-fit” is a measure that describes the
differences between the observed values and the expected values, that is,
how well the model fits a set of observations. This measure can
be used in statistical hypothesis testing.
What are Residuals?
▪ Residuals identify the deviation of observed values from the expected
values.
▪ They are also referred to as error or noise terms.
▪ A residual gives an insight into how good our model is against the actual
value but there are no real-life representations of residual values.
What is R-squared (or) Coefficient of Determination?
• The R-squared (R²) value in machine learning is referred to as the
coefficient of determination, or the coefficient of multiple determination
in the case of multiple regression.
• R-squared acts as an evaluation metric for the scatter of the data points
around the fitted regression line. It gives the percentage of the
variation in the dependent variable that is explained by the model.
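Residuals and R-squared can be illustrated together with a short Python sketch. It reuses the commercials and sales data from the scatter chart section and fits the line by least squares; the variable names are assumptions made for this example.

```python
# Commercials shown (x) and sales in $100s (y), from the scatter chart table.
x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares slope and intercept of the fitted line.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Residual = observed value minus the value predicted by the line.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

sse = sum(r ** 2 for r in residuals)      # variation left unexplained
sst = sum((yi - y_bar) ** 2 for yi in y)  # total variation in y
r_squared = 1 - sse / sst

print(f"R-squared = {r_squared:.4f}")
```

For these data R-squared is about 0.87, meaning that roughly 87% of the variation in weekly sales is explained by the number of commercials.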
Correlation and Coefficient of Correlation (r) :-
▪ Correlation is a statistical technique that shows how strongly two variables
are related to each other or the degree of association between the two.
▪ For example, if we have the weight and height data of a group of
people, the correlation between them tells us how these two
variables are related.
▪ If taller people tend to weigh more, we can say that weight is
positively related to height.
▪ Correlation is measured by the correlation coefficient. It is denoted by ‘r’.
▪ It is very easy to calculate the correlation coefficient in SPSS.
▪ Before calculating the correlation in SPSS, we should have some basic
knowledge about correlation.
▪ The correlation coefficient should always be in the range of -1 to 1.
▪ There are three types of correlation:
1. Positive and negative correlation:
When two variables move in the same direction, then it is called
positive correlation. When one variable moves in a positive direction,
and a second variable moves in a negative direction, then it is said
to be negative correlation.
2. Linear and non linear or curvi-linear correlation:
When both variables change at the same ratio, they are known to be
in linear correlation.
When both variables do not change in the same ratio, then they are
said to be in curvi-linear correlation.
For example, if sale and expenditure move in the same ratio, then
they are in linear correlation and if they do not move in the same
ratio, then they are in curvi-linear correlation.
3. Simple, partial and multiple correlations:
When only two variables are studied, it is called simple correlation.
When the correlation between two variables is studied while the effect
of one or more other variables is held constant, it is called partial
correlation.
When three or more variables are studied together, it is called
multiple correlation.
Degree of correlation:
Perfect correlation: When both variables change in exactly the same ratio,
so that r is +1 or -1, it is called perfect correlation.
High degree of correlation:
When the absolute value of the correlation coefficient is above 0.75, it is
called a high degree of correlation.
Moderate correlation: When the absolute value of the correlation coefficient
is between 0.50 and 0.75, it is called a moderate degree of correlation.
Low degree of correlation: When the absolute value of the correlation
coefficient is between 0.25 and 0.50, it is called a low degree of correlation.
Absence of correlation: When the absolute value of the correlation coefficient
is between 0 and 0.25, there is little or no linear correlation.
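The coefficient and the degree labels above can be illustrated with a short Python sketch. It assumes the usual Pearson formula for r; the function names and the height and weight figures are hypothetical, invented only for illustration, and the cut-offs are the ones listed in these notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

def degree_of_correlation(r):
    """Label |r| using the cut-offs given in these notes."""
    size = abs(r)
    if math.isclose(size, 1.0):
        return "perfect"
    if size > 0.75:
        return "high"
    if size >= 0.50:
        return "moderate"
    if size >= 0.25:
        return "low"
    return "absent or negligible"

# Hypothetical height (cm) and weight (kg) data, invented for illustration.
heights = [150, 160, 165, 172, 180]
weights = [52, 58, 63, 70, 79]
r = pearson_r(heights, weights)
print(round(r, 3), degree_of_correlation(r))
```

The sign of r gives the direction (positive or negative correlation) and its absolute value gives the degree, which is why the labelling function works on abs(r).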
▪ The insights derived from Data Mining are used for marketing, fraud
detection, scientific discovery, etc.
▪ Data Mining is all about discovering hidden, unsuspected, and previously
unknown yet valid relationships amongst the data.
▪ Data mining is also called Knowledge Discovery in Databases (KDD), knowledge
extraction, data/pattern analysis, information harvesting, etc.
Data Mining Applications: -
▪ Data mining is primarily used by organizations with intense consumer
demands, such as:
• Retail companies
• Communication companies
• Financial companies
• Marketing companies
▪ These organizations use data mining to determine:
• prices
• consumer preferences
• product positioning and its impact on sales
• customer satisfaction and corporate profits.
▪ Data mining enables a retailer to use point-of-sale records of customer
purchases to develop products and promotions that help the organization
to attract the customer.
Data mining is widely used in the following areas:
Data Mining in Healthcare:
▪ Data mining in healthcare has excellent potential to improve the health
system.
▪ It uses data and analytics for better insights and to identify best practices
that will enhance health care services and reduce costs.
▪ Analysts use data mining approaches such as machine learning,
multi-dimensional databases, data visualization, soft computing, and statistics.
▪ Data mining can be used to forecast the number of patients in each category.
▪ These procedures help ensure that patients get intensive care at the right
place and at the right time.
▪ Data mining also enables healthcare insurers to recognize fraud and
abuse.
▪ It can also be used to forecast the product development period, cost, and
expectations among the other tasks.
Types of Data for Data Mining:
Data mining can be performed on the following types of data:
• Relational databases
• Data warehouses
• Advanced Database and information repositories
• Object-oriented and object-relational databases
• Transactional and Spatial databases
• Heterogeneous and legacy databases
• Multimedia and streaming databases
• Text databases
• Text mining and Web mining
Business understanding:
In this phase, business and data-mining goals are established.
• First, you need to understand the business and client objectives. You need
to define what your client wants (which many times even they do not
know themselves).
• Take stock of the current data mining scenario. Factor resources,
assumptions, constraints, and other significant factors into your
assessment.
• Using business objectives and current scenario, define your data
mining goals.
• A good data mining plan is very detailed and should be developed to
accomplish both business and data mining goals.
Data understanding:
In this phase, a sanity check is performed on the data to check whether it is
appropriate for the data mining goals.
• First, data is collected from the multiple data sources available in the
organization.
• These data sources may include multiple databases, flat files, or data
cubes. Issues such as object matching and schema integration
can arise during the data integration process. It is a complex
and tricky process, as data from various sources are unlikely to match
easily. For example, table A contains an entity named cust_no whereas
another table B contains an entity named cust-id.
• Therefore, it is quite difficult to tell whether these two objects
refer to the same value. Here, metadata should be used to reduce
errors in the data integration process.
• The next step is to explore the properties of the acquired data. A good way
to explore the data is to answer the data mining questions (decided in
the business understanding phase) using query, reporting, and visualization tools.
• Based on the results of the queries, the data quality should be ascertained.
Missing data, if any, should be acquired.
Data preparation:
In this phase, data is made production ready.
• The data preparation process is the most time-consuming phase and can
take up to about 90% of the project time.
• The data from different sources should be selected, cleaned,
transformed, formatted, anonymized, and constructed (if required).
• Data cleaning is a process to "clean" the data by smoothing noisy
data and filling in missing values.
• For example, in a customer demographics profile, age data may be
missing. The data is then incomplete and should be filled in. In some cases,
there could be outliers, for instance an age with the value 300. Data
could also be inconsistent, for instance when the name of a customer is
different in different tables.
• Data transformation operations change the data to make it useful for
data mining. The following transformations can be applied.
Data transformation:
▪ Data transformation operations contribute toward the success of
the mining process.
o Smoothing: It helps to remove noise from the data.
o Aggregation: Summary or aggregation operations are applied to the
data. For example, weekly sales data is aggregated to calculate the
monthly and yearly totals.
o Generalization: In this step, low-level data is replaced by higher-level
concepts with the help of concept hierarchies. For example, the
city is replaced by the county.
o Normalization: Normalization is performed when the attribute data
are scaled up or scaled down. Example: data should fall in the range
-2.0 to 2.0 after normalization.
o Attribute construction: New attributes are constructed from the given
set of attributes and added to it to help the data mining process.
▪ The result of this process is a final data set that can be used in modelling.
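Three of the transformations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: smoothing is done here with a simple moving average and normalization with min-max scaling, which are common choices but are not prescribed by these notes; the function names and sales figures are hypothetical.

```python
def moving_average(values, window=3):
    """Smoothing: replace each run of `window` values by its mean."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def aggregate(values, group_size):
    """Aggregation: sum consecutive groups, e.g. 4 weekly totals per month."""
    return [sum(values[i:i + group_size])
            for i in range(0, len(values), group_size)]

def min_max_scale(values, new_min=-2.0, new_max=2.0):
    """Normalization: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot rescale a constant attribute")
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]

# Hypothetical weekly sales figures; the 50 is a deliberately noisy week.
weekly_sales = [10, 12, 50, 11, 13, 12, 14, 15]

print(moving_average(weekly_sales))   # the spike at 50 is dampened
print(aggregate(weekly_sales, 4))     # two 4-week "months"
print(min_max_scale(weekly_sales))    # every value now lies in -2.0 to 2.0
```

The default range of -2.0 to 2.0 mirrors the example in the normalization bullet; any other target range can be passed through new_min and new_max.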
Modelling:
In this phase, mathematical models are used to determine data patterns.
• Based on the business objectives, suitable modelling techniques should
be selected for the prepared dataset.
• Create a scenario to test the quality and validity of the model.
• Run the model on the prepared dataset.
• Results should be assessed by all stakeholders to make sure that the model
can meet the data mining objectives.
Evaluation:
In this phase, patterns identified are evaluated against the business
objectives.
• Results generated by the data mining model should be evaluated
against the business objectives.
• Gaining business understanding is an iterative process. In fact, new
business requirements may be raised as a result of what is learned
from data mining.
• A go or no-go decision is taken on moving the model to the deployment
phase.
Deployment:
In the deployment phase, you ship your data mining discoveries to everyday
business operations.
• The knowledge or information discovered during data mining process
should be made easy to understand for non-technical stakeholders.
• A detailed plan for deploying, maintaining, and monitoring
the data mining discoveries is created.
• A final project report is created with lessons learned and key
experiences during the project. This helps to improve the organization's
business policy.
Explain about Techniques of Data Mining (or) Data Mining
Techniques (Methods)
1. Classification:
This analysis is used to retrieve important and relevant information
about data and metadata.
This data mining method helps to classify data into different classes.
2. Clustering:
Clustering analysis is a data mining technique used to identify data that are
similar to each other.
This process helps to understand the differences and similarities between
the data.
3. Regression:
Regression analysis is the data mining method of identifying and
analyzing the relationship between variables.
It is used to identify the likelihood of a specific variable, given the presence
of other variables.
4. Association Rules:
This data mining technique helps to find the association between two or
more Items.
It discovers a hidden pattern in the data set.
5. Outlier detection:
This data mining technique refers to the observation of data items in
the dataset which do not match an expected pattern or expected behavior.
This technique can be used in a variety of domains, such as intrusion
detection, fraud detection, and fault detection.
Outlier detection is also called outlier analysis or outlier mining.
6. Sequential Patterns:
This data mining technique helps to discover or identify similar patterns
or trends in transaction data over a certain period.
7. Prediction:
Prediction uses a combination of other data mining techniques,
such as trends, sequential patterns, clustering, and classification.
It analyses past events or instances in the right sequence to predict a
future event.
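As a small illustration of one technique from the list above, the sketch below flags outliers in a single numeric attribute. It assumes the common 1.5 × IQR (interquartile range) rule, which is one of several possible rules and is not taken from these notes; the function name and the age values are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    spread = q3 - q1                               # interquartile range
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical customer ages; 300 is an obviously invalid entry.
ages = [23, 25, 31, 35, 38, 41, 44, 52, 300]
print(iqr_outliers(ages))
```

A flagged value is only a candidate: it may be a data entry error to be corrected, or a genuine but unusual case such as a fraudulent transaction.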
Data mining Example:
A bank wants to find new ways to increase revenues from its credit card
operations. It wants to check whether usage would double if fees were
halved. The bank has multiple years of records on average credit card balances,
payment amounts, credit limit usage, and other key parameters. It creates
a model to check the impact of the proposed new business policy. The
results show that cutting fees in half for a targeted customer base could
increase revenues by $10 million.
Benefits of Data Mining:
• Data mining helps companies to get knowledge-based
information.
• Data mining helps organizations to make profitable adjustments in
operation and production.
• Data mining is a cost-effective and efficient solution compared to other
statistical data applications.
• Data mining helps with the decision-making process.
• It facilitates automated prediction of trends and behaviours as well as
automated discovery of hidden patterns.
• It can be implemented in new systems as well as existing platforms.
• It is a speedy process that makes it easy for users to analyze huge
amounts of data in less time.
Disadvantages of Data Mining
• There is a chance that companies may sell useful information about their
customers to other companies for money. For example, American Express
has sold credit card purchases of their customers to other companies.
• Much data mining analytics software is difficult to operate and requires
advanced training to work with.
• Different data mining tools work in different ways because of the different
algorithms employed in their design. Therefore, selecting the correct
data mining tool is a very difficult task.
• Data mining techniques are not perfectly accurate, so they can cause serious
consequences in certain conditions.
Approaches in Data Mining:
▪ Data mining uses existing data to identify patterns among the
attributes in a data set and build models.
▪ The models are mathematical representations, e.g. linear relationships,
non-linear relationships, explanatory patterns, and predictive patterns.
▪ Data mining identifies four types of patterns:
o Association finds commonly co-occurring groups of things. In market
basket analysis, cigarettes and chocolates being bought together is an example.
o Prediction tells us about the future occurrence of events based on what
happened in the past. Forecasting the temperature of a particular day is an example.
o Clustering identifies natural groupings of things based on their
known characteristics. Segmentation of customers based on
demographics is an example.
o Sequential relationships discover time-ordered events. A
customer who orders pizza and then orders a cold drink and ice cream is
an example.
Explain about Data Exploration & Reduction
Explain about Data Exploration and Different steps involved in
Data Exploration:
▪ Data exploration is an approach similar to initial data analysis, whereby
a data analyst uses visual exploration to understand what is in a dataset
and the characteristics of the data, rather than through traditional data
management systems.
▪ These characteristics can include size or amount of data, completeness of
the data, correctness of the data, possible relationships amongst data
elements or files/tables in the data.
▪ Data exploration is the first step in data analysis and typically involves
summarizing the main characteristics of a data set, including its size,
accuracy, initial patterns in the data and other attributes.
▪ It is commonly conducted by data analysts using visual analytics tools,
but it can also be done in more advanced statistical software, such as ‘R’.
▪ Detecting Outliers:
It aids in identifying outliers in the data, which might indicate errors or
unique occurrences. Detecting these irregularities is vital for maintaining
data quality and accuracy.
▪ Informing Decision-Making:
By exploring data, organizations gain insights into customer behavior,
market trends, and operational efficiencies. This information is
instrumental in making data-driven decisions that can enhance products,
services, and overall business strategies.
▪ Enhancing Predictive Modelling:
It provides a deep understanding of the data distribution, helping data
scientists select appropriate variables for predictive modeling.
Understanding the data thoroughly improves the accuracy and reliability
of machine learning algorithms.
▪ Improving Data Quality:
Data inconsistencies and missing values can be identified and corrected
through exploration. Clean and reliable data is essential for meaningful
analysis and reporting.
▪ Facilitating Communication:
Data visualization, a significant data exploration component, simplifies
complex data sets into understandable visual representations. These
visuals facilitate communication among stakeholders, making it easier to
convey insights and trends.
▪ Innovation and Competitive Advantage:
Businesses can gain a competitive edge by exploring data creatively.
Innovative solutions often arise from a deep understanding of customer
preferences and market dynamics, which can be explored through
comprehensive data analysis.
How Does Data Exploration Work? (or) Different steps involved in Data
exploration:
▪ Define Your Objective:
Start by understanding the problem or question you want to answer
through data exploration. Having a clear goal will help focus your
exploration.
▪ Gather the Data:
Collect the relevant data for your analysis. This could involve data
acquisition from various sources, such as databases, APIs, spreadsheets,
or files.
▪ Understand the Data:
Examine the data’s structure and format. Key steps include:
o Data Loading: Import the data into your analysis environment (e.g.,
Python, R, Excel).
o Data Description: Check the dataset’s size, shape, and basic
statistics (e.g., mean, median, standard deviation).
o Data Types: Identify the types of variables (categorical, numerical,
date, etc.).
o Column Names: Review the column names for clarity and
consistency.
▪ Data Cleaning:
Before diving into exploration, address data quality issues:
o Missing Data: Handle missing values through imputation or
removal.
o Outliers: Detect and address outliers that could skew results.
o Data Transformation: Normalize, standardize, or scale data when
necessary.
▪ Data Visualization:
Create visual representations of the data to reveal patterns and
relationships. Common visualization techniques include:
o Bar Charts and Histograms: Display frequency distributions of
categorical and numerical data.
o Scatter Plots: Show relationships between two numerical variables.
o Box Plots: Visualize the spread and distribution of numerical data.
o Heat Maps: Display correlations between variables.
o Time Series Plots: Explore data over time.
▪ Summary Statistics:
Compute summary statistics to gain a deeper understanding of the data:
o Central Tendency: Calculate mean, median, and mode.
o Dispersion: Assess variance, standard deviation, and range.
o Skewness and Kurtosis: Understand the shape of the data
distribution.
o Correlation: Evaluate relationships between variables.
▪ Exploratory Data Analysis (EDA):
Perform in-depth analysis to uncover patterns, anomalies, and insights:
o Frequency Analysis: Examine the distribution of categorical data.
o Box Plots and Violin Plots: Visualize data distributions and
outliers.
o Correlation Matrix: Identify relationships between numerical
variables.
o Hypothesis Testing: Conduct statistical tests to confirm or reject
hypotheses.
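A few of the steps above (data description, summary statistics, and frequency analysis) can be sketched with Python's standard library. The records, column names, and function names below are hypothetical and exist only for illustration; in practice a tool such as Excel, R, or a Python data library would normally be used.

```python
import statistics
from collections import Counter

# Hypothetical customer records; None marks a missing value.
records = [
    {"age": 25, "city": "Hyderabad", "spend": 1200.0},
    {"age": None, "city": "Chennai", "spend": 800.0},
    {"age": 31, "city": "Hyderabad", "spend": 1500.0},
    {"age": 42, "city": "Mumbai", "spend": None},
    {"age": 31, "city": "Hyderabad", "spend": 950.0},
]

def describe(rows, column):
    """Summary statistics for a numeric column, ignoring missing values."""
    values = [row[column] for row in rows if row[column] is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),   # sample standard deviation
        "range": max(values) - min(values),
    }

def frequency(rows, column):
    """Frequency analysis for a categorical column."""
    return Counter(row[column] for row in rows if row[column] is not None)

print(describe(records, "age"))
print(frequency(records, "city").most_common())
```

The "missing" count feeds directly into the data cleaning step, where the analyst decides whether to impute or remove the incomplete records.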
Explain about Data Reduction and its Strategies
▪ Data reduction is the process of minimizing the amount of data that
needs to be stored in a data storage environment.
▪ Data reduction can increase storage efficiency and reduce costs.
▪ Data reduction can be achieved using several different types of
technologies.
▪ The best-known data reduction technique is data deduplication, which
eliminates redundant data on storage systems.
▪ The deduplication process typically occurs at the storage block level.
▪ The system analyses the storage to see if duplicate blocks exist, and gets
rid of any redundant blocks.
▪ The remaining block is shared by any file that requires a copy of the block.
▪ If an application attempts to modify this block, the block is copied prior
to modification so that other files that depend on the block can continue
to use the unmodified version, thereby avoiding file corruption.
▪ While data deduplication is probably the most common data reduction
technique, it is not the only viable one.
▪ Data archiving and data compression can also reduce the amount of data
that has to be stored on primary storage systems.
▪ Data compression reduces the size of a file by removing redundant
information from files so that less disk space is required.
▪ This is accomplished natively in storage systems using algorithms or
formulae designed to identify and remove redundant bits of data.
▪ Archiving data also reduces data on storage systems, but the approach is
quite different.
▪ Rather than reducing data within files or databases, archiving removes
older, infrequently accessed data from expensive storage and moves it to
low-cost, high-capacity storage.
▪ Archive storage can be disk, tape or cloud based.
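The ideas of deduplication and compression described above can be sketched as follows. This is a simplified illustration, not how any particular storage product works: blocks are identified by a SHA-256 hash, the block size is unrealistically small so that the result is easy to follow, the copy-on-write step is not modelled, and compression uses Python's zlib module. All names and data are hypothetical.

```python
import hashlib
import zlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow

def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Store each distinct block once; the data becomes a list of block IDs."""
    store = {}    # block hash -> block bytes (one copy per unique block)
    layout = []   # sequence of hashes needed to rebuild the data
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # keep only the first copy
        layout.append(digest)
    return store, layout

def rebuild(store, layout) -> bytes:
    """Reassemble the original data from the shared blocks."""
    return b"".join(store[digest] for digest in layout)

data = b"ABCDABCDABCDWXYZ"
store, layout = deduplicate(data)
print(len(layout), "blocks referenced,", len(store), "blocks stored")
assert rebuild(store, layout) == data

# Compression removes redundancy inside the data itself.
text = b"sales,sales,sales,sales,sales,sales,sales,sales"
packed = zlib.compress(text)
print(len(text), "bytes before,", len(packed), "bytes after compression")
assert zlib.decompress(packed) == text
```

Both techniques are lossless here: the rebuilt and decompressed data are identical to the originals, which is what distinguishes them from reduction methods that discard information.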
Strategies for data reduction (or) Methods of data reduction
(or) Techniques of Data Reduction (or) Types of Data
Reduction:
1. Data Cube Aggregation
• Aggregation operations are applied to the data in the construction of a
data cube.
2. Dimensionality Reduction
• In dimensionality reduction, redundant attributes are detected and
removed, which reduces the data set size.
3. Data Compression
• Encoding mechanisms are used to reduce the data set size.
4. Numerosity Reduction
• In numerosity reduction, the data are replaced or estimated by alternative,
smaller data representations, such as parametric models or non-parametric
methods such as clustering, sampling, and the use of histograms.
5. Discretisation and concept hierarchy generation
• Raw data values for attributes are replaced by ranges or higher
conceptual levels.
• Data discretization is a form of numerosity reduction that is very useful
for the automatic generation of concept hierarchies.
• Discretization and concept hierarchy generation are powerful tools for
data mining, in that they allow the mining of data at multiple levels of
abstraction.
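Discretization and concept hierarchies can be illustrated with a minimal Python sketch. The age ranges, the city-to-state mapping, and the customer data are all assumptions invented for this example; a real concept hierarchy would come from the business domain.

```python
from collections import Counter

# Hypothetical age ranges; the cut-offs are illustrative assumptions.
AGE_BINS = [(0, 17, "0-17"), (18, 35, "18-35"), (36, 59, "36-59")]

def discretize_age(age):
    """Discretization: replace a raw age by the label of its range."""
    if age < 0:
        raise ValueError("age cannot be negative")
    for low, high, label in AGE_BINS:
        if low <= age <= high:
            return label
    return "60+"

# Hypothetical concept hierarchy: city -> state (a higher conceptual level).
CITY_TO_STATE = {
    "Hyderabad": "Telangana",
    "Warangal": "Telangana",
    "Chennai": "Tamil Nadu",
}

def generalize_city(city):
    """Concept hierarchy: replace a city by its state."""
    return CITY_TO_STATE.get(city, "Other")

customers = [(22, "Hyderabad"), (34, "Warangal"), (41, "Chennai"),
             (67, "Hyderabad"), (29, "Chennai")]

# A histogram of the ranges is a smaller representation of the raw ages.
age_histogram = Counter(discretize_age(age) for age, _ in customers)
state_counts = Counter(generalize_city(city) for _, city in customers)
print(dict(age_histogram))
print(dict(state_counts))
```

After this step the data can be mined at the level of age groups and states instead of individual ages and cities, which is the "multiple levels of abstraction" idea mentioned above.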
▪ The main benefit of data reduction is simple: the more data you can fit into
a terabyte of disk space, the less capacity you will need to purchase. Other
benefits of data reduction include:
o Data reduction can save energy.
o Data reduction can reduce your physical storage costs.
o Data reduction can decrease your data center footprint.
▪ Data reduction greatly increases the efficiency of a storage system and
directly impacts your total spending on capacity.
Advantages:
▪ Improved efficiency: Data reduction can help to improve the efficiency of
machine learning algorithms by reducing the size of the dataset. This can
make it faster and more practical to work with large datasets.
▪ Improved performance: Data reduction can help to improve the
performance of machine learning algorithms by removing irrelevant or
redundant information from the dataset. This can help to make the model
more accurate and robust.
▪ Reduced storage costs: Data reduction can help to reduce the storage
costs associated with large datasets by reducing the size of the data.
▪ Improved interpretability: Data reduction can help to improve the
interpretability of the results by removing irrelevant or redundant
information from the dataset.
Disadvantages:
▪ Loss of information: Data reduction can result in a loss of information, if
important data is removed during the reduction process.
▪ Impact on accuracy: Data reduction can impact the accuracy of a model,
as reducing the size of the dataset can also remove important information
that is needed for accurate predictions.
▪ Impact on interpretability: Data reduction can make it harder to
interpret the results, as removing irrelevant or redundant information can
also remove context that is needed to understand the results.
▪ Additional computational costs: Data reduction can add
computational costs to the data mining process, as it requires extra
processing time to reduce the data.
➢ The classification rules can be applied to the new data tuples if the
accuracy is considered acceptable.
Classification Issues
The major issue is preparing the data for classification. Preparing the data
involves the following activities −
➢ Data Cleaning − Data cleaning involves removing noise and treating
missing values. Noise is removed by applying smoothing techniques,
and the problem of missing values is solved by replacing a missing value
with the most commonly occurring value for that attribute.
➢ Relevance Analysis − The database may also contain irrelevant attributes.
Correlation analysis is used to find out whether any two given attributes are
related.
➢ Data Transformation and reduction − The data can be transformed by
any of the following methods.
▪ Normalization − The data is transformed using normalization.
Normalization involves scaling all values of a given attribute so that
they fall within a small specified range. Normalization is used
when neural networks or methods involving distance
measurements are used in the learning step.
▪ Generalization − The data can also be transformed by generalizing it to
a higher-level concept. For this purpose we can use concept
hierarchies.
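The missing-value treatment described under Data Cleaning, replacing a missing value with the most commonly occurring value for that attribute, can be sketched as below. The attribute names and records are hypothetical, and None is assumed to mark a missing value.

```python
from collections import Counter

def fill_with_most_common(rows, attribute):
    """Return copies of rows with missing values of `attribute` replaced
    by the most commonly occurring known value."""
    known = [row[attribute] for row in rows if row[attribute] is not None]
    if not known:
        raise ValueError("no known values to impute from")
    most_common_value = Counter(known).most_common(1)[0][0]
    return [
        {**row, attribute: most_common_value} if row[attribute] is None
        else dict(row)
        for row in rows
    ]

# Hypothetical training tuples; None marks a missing value.
applicants = [
    {"income": "medium", "student": "no"},
    {"income": None, "student": "yes"},
    {"income": "medium", "student": None},
    {"income": "high", "student": "no"},
]

cleaned = fill_with_most_common(applicants, "income")
cleaned = fill_with_most_common(cleaned, "student")
for row in cleaned:
    print(row)
```

The function returns new dictionaries instead of changing the input, so the original tuples remain available for comparison.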
➢ An antecedent is an item found within the data.
➢ A consequent is an item found in combination with the antecedent.
➢ Association rules are created by searching data for frequent if-then
patterns and using the criteria support and confidence to identify the
most important relationships.
➢ Support is an indication of how frequently the items appear in the data.
➢ Confidence indicates how often the if-then statement is
found to be true.
➢ A third metric, called lift, can be used to compare the confidence with the
expected confidence, that is, how often the if-then statement would be
expected to be found true.
➢ Association rules are calculated from itemsets, which are made up of two
or more items.
➢ If rules were built by analyzing all possible itemsets, there could be so
many rules that they would hold little meaning.
➢ For that reason, association rules are typically created from itemsets that
are well represented in the data.
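Support, confidence, and lift can be computed directly from a list of transactions. The sketch below uses the usual definitions: support is the share of transactions containing the items, confidence is support(antecedent and consequent) divided by support(antecedent), and lift is confidence divided by support(consequent). The function names and the basket data are hypothetical.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions if items <= basket)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """How often the rule 'if antecedent then consequent' holds."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the data")
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base

def lift(transactions, antecedent, consequent):
    """Confidence relative to the confidence expected by chance."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))

# Hypothetical market baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

rule = ({"bread"}, {"milk"})   # "if bread then milk"
print("support   ", support(baskets, rule[0] | rule[1]))
print("confidence", confidence(baskets, *rule))
print("lift      ", lift(baskets, *rule))
```

A lift above 1 suggests the consequent appears more often with the antecedent than it would by chance, a lift near 1 suggests independence, and a lift below 1 suggests the items tend not to occur together.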
➢ Cause and Effect Analysis gives us a useful way of doing this. This
diagram-based technique, which combines brainstorming with a type of mind
map, pushes us to consider all possible causes of a problem, rather than
just the ones that are most obvious.
➢ Cause and Effect Analysis was devised by Prof. Kaoru Ishikawa, a pioneer
of quality management, in the 1960s.
➢ The technique was then published in his 1990 book, "Introduction to
Quality Control."
➢ The diagrams that you create with this technique are known as
‘Ishikawa Diagrams’ or ‘Fishbone Diagrams’ (because a completed diagram
can look like the skeleton of a fish).
➢ There are four steps to solve a problem with Cause and Effect Modeling:
• Identify the Problem
• Work out the major factors involved
• Identify possible causes
• Analyze your diagram
Step 3: Identify Possible Causes
➢ Now, for each of the factors you considered in step 2, brainstorm
possible causes of the problem that may be related to the factor.
➢ Show these possible causes as shorter lines coming off the "bones" of
the diagram.
➢ Where a cause is large or complex, then it may be best to break it down
into sub-causes.
➢ Show these as lines coming off each cause line.
Example:
➢ For each of the factors he identified in step 2, the manager brainstorms
possible causes of the problem and adds these to the diagram.
➢ So, the delivery person will calculate different routes for going to all
the 6 destinations and then come up with the shortest route.
➢ Choosing the best option in this way is an optimization problem, and
linear programming is one technique for solving such problems.
➢ Linear programming is used to obtain the optimal solution to
a problem with given constraints.
➢ In linear programming, we formulate our real-life problem as a
mathematical model.
➢ The model consists of a linear objective function and linear inequalities
that represent the constraints.
Decision Variables:-
• The Decision variables are the variables which will decide the output.
• They represent the ultimate solution.
• To solve any problem, first we need to identify the decision Variables.
Ex: The total number of units of two products, denoted by ‘x’ and ‘y’
respectively, are the decision variables.
Objective Function:-
• The Objective function is defined as the objective of making decisions. It
is denoted by 'Z'.
Ex: - The company wishes to increase the total profit. So, profit is the
objective function.
Constraints: -
• The constraints are the restrictions (or) limitations on the decision
variables.
• They usually limit the value of the decision variables.
Ex:- The limits on the availability of resources are the Constraints.
Non-Negativity Restrictions:-
• For all linear programs, the decision variables should always take non-
negative values, which means the values for decision Variables should be
greater than (or) equal to '0'.
Explain about the Process of formulating a Linear Programming
Problem (LPP):-
The following are the different steps involved in the formulation of LPP :
Step 1: Identify the decision Variables.
Step 2: Write the objective function.
Step 3: Mention the constraints.
Step 4: Explicitly state the non-negativity restriction.
Problems:
1. Consider a chocolate manufacturing company which produces only two
types of chocolate, A and B. Both chocolates require only Milk and Choco.
To manufacture each unit of A and B, the following quantities are required:
• Each unit of ‘A’ requires 1 unit of Milk and 3 units of Choco.
• Each unit of ‘B’ requires 1 unit of Milk and 2 units of Choco.
The company has a total of 5 units of Milk and 12 units of Choco. On each
sale, the company makes a profit of Rs. 6 per unit of ‘A’ and Rs. 5 per unit
of ‘B’. Formulate the problem to maximize the company's profit.
Sol:- Let ‘x’ be the total number of units of chocolate ‘A’ produced and
‘y’ be the total number of units of chocolate ‘B’ produced.
Now, we can represent the given problem in a tabular form for better
understanding:
                    Milk    Choco    Profit per unit (Rs.)
Chocolate A (x)       1       3        6
Chocolate B (y)       1       2        5
Total available       5      12
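Following the four formulation steps with the figures from the problem statement, one way to write the model is: maximize Z = 6x + 5y, subject to x + y ≤ 5 (Milk), 3x + 2y ≤ 12 (Choco), and x ≥ 0, y ≥ 0. The Python sketch below checks this formulation by evaluating the profit at the corner points of the feasible region, which is the idea behind the graphical method for two-variable problems. It is an illustrative sketch with hypothetical function names, not necessarily the solution method used in these notes.

```python
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y <= c.
constraints = [
    (1, 1, 5),     # Milk:  x + y <= 5
    (3, 2, 12),    # Choco: 3x + 2y <= 12
    (-1, 0, 0),    # non-negativity: x >= 0
    (0, -1, 0),    # non-negativity: y >= 0
]

def profit(x, y):
    """Objective function Z = 6x + 5y."""
    return 6 * x + 5 * y

def feasible(x, y, eps=1e-9):
    """True if the point satisfies every constraint."""
    return all(a * x + b * y <= c + eps for a, b, c in constraints)

def corner_points():
    """Intersections of pairs of constraint lines that are feasible."""
    points = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines never meet
        x = (c1 * b2 - c2 * b1) / det   # Cramer's rule
        y = (a1 * c2 - a2 * c1) / det
        if feasible(x, y):
            # adding 0.0 turns a possible -0.0 into 0.0 for tidy output
            points.add((round(x, 6) + 0.0, round(y, 6) + 0.0))
    return points

for point in sorted(corner_points()):
    print(point, "Z =", profit(*point))

best = max(corner_points(), key=lambda p: profit(*p))
print("best:", best, "Z =", profit(*best))
```

Under this formulation the feasible corner points are (0, 0), (4, 0), (0, 5) and (2, 3), and the profit is largest at x = 2, y = 3, where Z = 6(2) + 5(3) = 27.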