CCW331 BUSINESS ANALYTICS-notes
COURSE OBJECTIVES:
∙ To understand the Analytics Life Cycle.
∙ To comprehend the process of acquiring Business Intelligence
∙ To understand various types of analytics for Business Forecasting
∙ To model the supply chain management for Analytics.
∙ To apply analytics for different functions of a business
30 PERIODS
INTRODUCTION
ANALYTICS AND DATA SCIENCE
ANALYTICS:
The word analytics has come to the foreground in the last decade or so. The
growth of the internet and information technology has made analytics highly
relevant in the current age. Analytics is a field which combines data, information
technology, statistical analysis, quantitative methods and computer-based models
into one.
These are combined to provide decision makers with all the possible scenarios
needed to make a well-thought-out and researched decision. The computer-based
model ensures that decision makers are able to see the performance of a decision
under various scenarios.
Meaning
Definition
⮚ Business analytics (BA) refers to the skills, technologies, and practices for
continuous iterative exploration and investigation of past business
performance to gain insight and drive business planning. Business
analytics focuses on developing new insights and understanding of
business performance based on data and statistical methods.
Business analytics has a wide range of applications and usages. It can be used
for descriptive analysis, in which data is utilized to understand the past and present
situation. This kind of descriptive analysis is used to assess the current market position
of the company and the effectiveness of previous business decisions.
It is also used for predictive analysis, which typically builds on past business
performance to forecast future outcomes.
Before any data analysis takes place, BA starts with several foundational processes:
● Determine the business goal of the analysis.
● Select an analysis methodology.
● Get business data to support the analysis, often from various systems and sources.
● Cleanse and integrate data into a single repository, such as a data
warehouse or data mart.
For starters, business analytics is the tool your company needs to make
accurate decisions. These decisions are likely to impact your entire organization as
they help you to improve profitability, increase market share, and provide a
greater return to potential shareholders.
While some companies are unsure what to do with large amounts of data,
business analytics works to combine this data with actionable insights to improve
the decisions you make as a company.
Essentially, the four main ways business analytics is important, no matter the industry, are:
▪ Improves performance by giving your business a clear picture of what is
and isn’t working
▪ Provides faster and more accurate decisions
▪ Minimizes risks as it helps a business make the right choices regarding
consumer behaviour, trends, and performance
▪ Inspires change and innovation by answering questions about the consumer.
DATA SCIENCE
Organizations today generate a massive amount of data every second, and dealing with it is a very
big challenge. For handling and evaluating this data we require some very powerful, complex algorithms
and technologies.
The following are some primary motives for the use of Data science technology:
1. It helps to convert large quantities of raw and unstructured records into meaningful insights.
2. It can assist in making predictions in areas such as surveys, elections, etc.
3. It also helps in automating transportation, such as developing self-driving cars, which we can say is the
future of transportation.
4. Companies are shifting towards data science and opting for this technology. Amazon, Netflix, and
others, which cope with huge quantities of data, use data science algorithms to deliver a better
customer experience.
The data science life cycle consists of the following steps:
1. Business Understanding:
The complete cycle revolves around the business goal. What will you solve if you do not
have a specific problem? It is extremely important to understand the business objective clearly,
because that will be the ultimate aim of the analysis. Only after gaining proper insight can we set the
precise goal of the analysis in sync with the business objective. You need to understand whether the
customer wants to minimize savings loss, or prefers to predict the price of a commodity, etc.
2. Data Understanding:
After business understanding, the next step is data understanding. This involves collecting
all the available data. Here you need to work closely with the business team, as they are
well aware of what data is present, what data could be used for this business
problem, and other information. This step includes describing the data, its structure, its relevance,
and its data types. Explore the data using graphical plots. Basically, extract any information that you can
get about the data by simply exploring it.
3. Preparation of Data:
Next comes the data preparation stage. This consists of steps like selecting the relevant data,
integrating the data by merging the data sets, cleaning it, treating missing values by either
eliminating or imputing them, treating erroneous data by eliminating it, and also checking for
outliers using box plots and handling them. Construct new data and derive new features from existing
ones. Format the data into the desired structure and remove unwanted columns and features. Data
preparation is the most time-consuming yet arguably the most important step in the complete life cycle.
Your model will only be as accurate as your data.
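The preparation steps above (imputing missing values, then flagging outliers with the 1.5×IQR rule that underlies box plots) can be sketched in Python with pandas. The table and column names here are purely hypothetical sample data:

```python
import pandas as pd
import numpy as np

# Hypothetical sample data with one missing value and one extreme income.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29, 30],
    "income": [48_000, 52_000, 50_000, 55_000, 49_000, 900_000],
})

# Treat the missing value by imputing the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Check for outliers using the 1.5 * IQR (box-plot) rule.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)]
print(len(outliers))  # the 900,000 income is flagged
```

Whether to drop or cap the flagged rows is a judgment call that depends on the business problem.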
4. Exploratory Data Analysis:
This step involves getting some idea about the solution and the factors affecting it before
building the actual model. The distribution of data within the different variables is explored
graphically using bar graphs, and relations between different features are captured via graphical
representations like scatter plots and heat maps. Many data visualization techniques are used extensively
to explore each and every feature individually and in combination with other
features.
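As a small illustration of this step, exploration can begin with summary statistics and a correlation matrix, the numeric counterpart of a scatter plot or heat map. The dataset below is made up:

```python
import pandas as pd

# Hypothetical dataset: study hours vs. exam score.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 55, 61, 64, 70, 75],
})

# Distribution of each variable: count, mean, std, quartiles, min/max.
summary = df.describe()

# Pairwise correlations capture relations between features,
# the same information a scatter plot or heat map shows visually.
corr = df.corr()
print(round(corr.loc["hours", "score"], 3))  # close to 1: a strong linear relation
```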
5. Data Modeling:
Data modeling is the heart of data analysis. A model takes the prepared data as input and
gives the desired output. This step involves selecting the suitable kind of model, depending on whether
the problem is a classification problem, a regression problem or a clustering problem. After choosing the
model family, we need to carefully select the algorithms to implement from among the many in that
family, and implement them. We need to tune the hyperparameters of every model to
achieve the desired performance. We also need to make sure there is the right balance between
performance and generalizability: we do not want the model to memorize the data and perform
poorly on new data.
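A minimal sketch of hyperparameter tuning, assuming scikit-learn is available and using synthetic data. A ridge regression's regularization strength `alpha` is tuned on a held-out validation split, balancing fit against generalizability:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression data: y is roughly 3x plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + rng.normal(0, 1, size=100)

# Hold out a validation split so tuning is scored on data the model did not see.
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Tune the regularization hyperparameter alpha; keep the best model.
best_alpha, best_score = None, -np.inf
for alpha in [0.01, 0.1, 1.0, 10.0]:
    score = Ridge(alpha=alpha).fit(X_train, y_train).score(X_val, y_val)  # R^2
    if score > best_score:
        best_alpha, best_score = alpha, score
print(best_alpha, round(best_score, 3))
```

In practice a grid search with cross-validation would replace this hand-written loop, but the idea is the same.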
6. Model Evaluation:
Here the model is evaluated to check whether it is ready to be deployed. The model is tested
on unseen data and evaluated on a carefully thought-out set of assessment metrics. We also need to
make sure that the model conforms to reality. If we do not achieve a satisfactory result in the evaluation,
we have to iterate the complete modelling process until the desired level of metrics is achieved. Any
data science solution or machine learning model, just like a human, must evolve: it must be capable of
improving itself with new data and adapting to a new evaluation metric. We can build more than one model for a
certain phenomenon, but many of them may be imperfect. Model evaluation helps us
select and build the best model.
7. Model Deployment:
After rigorous assessment, the model is finally deployed in the desired format and channel.
This is the last step in the data science life cycle. Each step in the life cycle described above must
be worked on carefully; if any step is performed improperly, it affects the subsequent
steps and the complete effort goes to waste. For example, if data is not collected properly, you'll
lose records and you will not be able to build an ideal model. If the data is not cleaned properly,
the model will not work. If the model is not evaluated properly, it will fail in the real world. Right
from business understanding to model deployment, every step has to be given appropriate attention, time, and
effort.
ANALYTICS LIFE CYCLE
The data analytics lifecycle is designed for Big Data problems and data science projects. The
cycle is iterative to represent a real project. To address the distinct requirements of performing
analysis on Big Data, a step-by-step methodology is needed to organize the activities and tasks
involved with acquiring, processing, analyzing, and repurposing data.
Phase 2: Data Preparation –
● Steps to explore, preprocess, and condition data prior to modeling and analysis.
● It requires the presence of an analytic sandbox; the team executes extract, load, and transform (ELT)
processes to get data into the sandbox.
● Data preparation tasks are likely to be performed multiple times and not in a predefined order.
● Several tools commonly used for this phase are – Hadoop, Alpine Miner, OpenRefine, etc.
Phase 3: Model Planning –
● Team explores data to learn about the relationships between variables and subsequently selects the key
variables and the most suitable models.
● Several tools commonly used for this phase are – MATLAB, STATISTICA.
Phase 4: Model Building –
● Team develops datasets for testing, training, and production purposes.
● Team builds and executes models based on the work done in the model planning phase.
● Team also considers whether its existing tools will suffice for running the models or whether it needs a
more robust environment for executing models.
● Free or open-source tools – R and PL/R, Octave, WEKA.
● Commercial tools – MATLAB, STATISTICA.
Phase 5: Communicate Results –
● After executing the model, the team needs to compare the outcomes of modeling to the criteria
established for success and failure.
● Team considers how best to articulate findings and outcomes to various team members and stakeholders,
taking into account caveats and assumptions.
● Team should identify key findings, quantify the business value, and develop a narrative to summarize and
convey findings to stakeholders.
Phase 6: Operationalize –
● The team communicates the benefits of the project more broadly and sets up a pilot project to deploy
the work in a controlled way before broadening it to a full enterprise of users.
● This approach enables the team to learn about the performance and related constraints of the model in
a production environment on a small scale, and make adjustments before full deployment.
● The team delivers final reports, briefings, and code.
● Free or open-source tools – Octave, WEKA, SQL, MADlib.
TYPES OF ANALYTICS
1. Descriptive Analytics
This type of Analytics summarizes past data to describe what has happened. It
can help identify strengths and weaknesses and provides an insight into
customer behaviour too. This helps in forming strategies that can be developed in
the area of targeted marketing.
2. Diagnostic Analytics
This type of Analytics helps shift focus from past performance to the current
events and determine which factors are influencing trends. To uncover the root
cause of events, techniques such as data discovery, data mining and drill-down are
employed. Diagnostic analytics makes use of probabilities, and likelihoods to
understand why events may occur. Techniques such as sensitivity analysis and
training algorithms are employed for classification and regression.
3. Predictive Analytics
This type of Analytics is used to forecast the possibility of a future event with the
help of statistical models and ML techniques. It builds on the results of descriptive
analytics to devise models that estimate the likelihood of future outcomes. To run predictive
analysis, Machine Learning experts are employed. They can achieve a higher level
of accuracy than business intelligence alone.
One of the most common applications is sentiment analysis. Here, existing data
collected from social media is used to provide a comprehensive picture of a
user's opinion. This data is analysed to predict their sentiment (positive, neutral or
negative).
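Real sentiment analysis uses trained ML models as described above, but the idea can be illustrated with a toy lexicon-based scorer. The word lists here are hypothetical and far too small for real use:

```python
# Toy sentiment sketch: count positive vs. negative words.
# Real systems learn these weights from labeled training data.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "awful"}

def sentiment(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("i love this great phone"))  # positive
```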
4. Prescriptive Analytics
Going a step beyond predictive analytics, it provides recommendations for the
next best action to be taken. It suggests all favourable outcomes according to a
specific course of action and also recommends the specific actions needed to
deliver the most desired result. It mainly relies on two things, a strong feedback
system and a constant iterative analysis. It learns the relation between actions and
their outcomes. One common use of this type of analytics is to create
recommendation systems.
Business Analytics tools help analysts to perform the tasks at hand and generate
reports which may be easy for a layman to understand. These tools can be
obtained from open source platforms, and enable business analysts to manage their
insights in a comprehensive manner. They tend to be flexible and user-friendly.
Various business analytics tools and techniques include:
● SAS – The tool has a user-friendly GUI and can churn through terabytes
of data with ease. It comes with an extensive documentation and
tutorial base which can help early learners get started seamlessly.
● Tableau – The most popular and advanced data visualization tool in the
market. Storytelling and presenting data insights in a comprehensive
way has become one of the trademarks of a competent business analyst.
Tableau is a great platform to develop customized visualizations in no
time, thanks to its drag-and-drop features.
Python, R, SAS, Excel, and Tableau have all got their unique places when it
comes to usage.
BUSINESS PROBLEM DEFINITION
It defines the problem that a company is facing. Also, it involves an intricate analysis of the
problem, details relevant to the situation, and a solution that can solve the problem. This is a simple yet
effective way to present a problem and its solution concisely. In other words, it is a communication tool that
helps you visualize and minimize the gap between what's ideal vs. what's real, or to put it in business
lingo, the expected performance and the real performance.
A business problem statement is a compact communication tool that helps you convey what you
want to change.
Before writing a business problem statement, it is crucial to conduct a complete analysis of the
problem and everything related. You should have the knowledge to describe your problem and also suggest
a solution to it. To make things easy for you, we have explained the four key aspects to help you write your
business problem statement. They include:
Adding statistics and results from surveys, industry trends, customer demographics, staffing reports,
etc., helps the reader understand the problem distinctly. These references should describe your
problem and its effects on various attributes of your business.
Avoid adding too many numbers in your problem statement, and include only the absolute necessary
statistics. It’s best to include not more than three significant facts.
3. Propose a solution
Your business problem statement should conclude with a solution to the problem that was
previously described. The solution should describe how the current state can be improved.
● Avoid including elaborate actions and steps in a problem statement. These can be further explained
when you write a project plan.
A popular method that is used while writing a problem statement is the 5W2H (What, Why, Where,
Who, When, How, How much) method. These are the questions that need to be asked and answered while
writing a business problem statement.
● What: What is the problem that needs to be solved? Include the root cause of the problem. Mention
other micro problems that are connected with the macro ones.
● Why: Why is it a problem? Describe the reasons why it is a problem. Include supporting facts and
statistics to highlight the trouble.
● Where: Where is the problem observed? Mention the location and the specifics of it. Include the
products or services in which the problem is seen.
● Who: Who is impacted by this problem? Define and mention the customers, the staff, departments,
and businesses affected by the problem.
● When: When was the problem first observed? Talk about the timeline. Explain how the intensity of
the problem has changed from the time it was first observed.
● How: How is the problem observed? Mention the indications of the problem. Talk about the
observations you made while conducting problem analysis.
● How much: How often is the problem observed? If you have identified a trend during your research,
mention it. Comment on the error rate and the frequency and magnitude of the problem.
● The problem: The problem statement begins with mentioning and explaining the current state.
● Who it affects: Mention the people who are affected by the problem.
● How it impacts: Explain the impacts of the problem.
● The solution: Your problem statement ends with a proposed solution.
DATA COLLECTION
❖ Data Collection
Data collection is the process of acquiring, collecting, extracting, and storing the
voluminous amount of data which may be in the structured or unstructured form
like text, video, audio, XML files, records, or other image files used in later stages
of data analysis. In the process of big data analysis, “Data collection” is the initial
step before starting to analyze the patterns or useful information in data. The data
which is to be analyzed must be collected from different valid sources.
The actual data is then further divided mainly into two types known as:
1. Primary data
2. Secondary data
1.Primary Data
The data which is Raw, original, and extracted directly from the official sources is
known as primary data. This type of data is collected directly by performing
techniques such as questionnaires, interviews, and surveys. The data collected
must be according to the demand and requirements of the target audience on
which analysis is performed otherwise it would be a burden in the data processing.
Few methods of collecting primary data:
⮚ Interview method:
The data collected during this process is through interviewing the target audience
by a person called interviewer and the person who answers the interview is known
as the interviewee. Some basic business or product related questions are asked and
noted down in the form of notes, audio, or video and this data is stored for
processing. These can be both structured and unstructured like personal interviews
or formal interviews through telephone, face to face, email, etc.
⮚ Survey method:
The survey method is the process of research where a list of relevant questions are
asked and answers are noted down in the form of text, audio, or video. The survey
method can be obtained in both online and offline mode like through website
forms and email. The survey answers are then stored for analysis.
Examples are online surveys or surveys through social media polls.
⮚ Observation method:
The observation method is a method of data collection in which the researcher
keenly observes the behaviour and practices of the target audience using some
data collecting tool and stores the observed data in the form of text, audio, video,
or any raw formats. In this method, the data is collected directly by observing
the participants rather than posing questions to them. For example, observing a
group of customers and their behaviour towards the products. The data obtained
is then sent for processing.
⮚ Projective Technique
Projective data gathering is an indirect interview, used when potential respondents
know why they're being asked questions and hesitate to answer. For instance,
someone may be reluctant
to answer questions about their phone service if a cell phone carrier representative
poses the questions. With projective data gathering, the interviewees get an
incomplete question, and they must fill in the rest, using their opinions, feelings,
and attitudes.
⮚ Delphi Technique.
The Oracle at Delphi, according to Greek mythology, was the high priestess of
Apollo’s temple, who gave advice, prophecies, and counsel. In the realm of data
collection, researchers use the Delphi technique by gathering information from a
panel of experts. Each expert answers questions in their field of specialty, and the
replies are consolidated into a single opinion.
⮚ Focus Groups.
Focus groups, like interviews, are a commonly used technique. The group consists
of anywhere from a half-dozen to a dozen people, led by a moderator, brought
together to discuss the issue.
⮚ Questionnaires.
Questionnaires are a simple, straightforward data collection method. Respondents
get a series of questions, either open or close-ended, related to the matter at hand.
⮚ Experimental method:
The experimental method is the process of collecting data through performing
experiments, research, and investigation. The most frequently used experiment
methods are CRD, RBD, LSD, FD.
● CRD- Completely Randomized design is a simple experimental design
used in data analytics which is based on randomization and replication. It is
mostly used for comparing the experiments.
● RBD – Randomized block design is an experimental design in which the
experiment is divided into small units called blocks. Random experiments are
performed on each of the blocks and results are drawn using a technique
known as analysis of variance (ANOVA). RBD originated in the
agriculture sector.
● LSD – Latin square design is an experimental design that is similar to
CRD and RBD but contains rows and columns. It is an arrangement of
N×N squares with an equal number of rows and columns, containing symbols
that occur exactly once in each row and each column. Hence differences can be
found easily, with fewer errors in the experiment. A Sudoku puzzle is an
example of a Latin square design.
● FD – Factorial design is an experimental design in which each experiment
has two or more factors, each with several possible values, and trials are
performed over the combinations of these factor levels.
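The Latin square property described above (each symbol exactly once per row and per column) can be demonstrated with a short sketch that builds one by cyclic shifting:

```python
def latin_square(symbols):
    """Build an N x N Latin square by cyclically shifting the symbol list:
    row i starts at symbol i, so every symbol appears exactly once in
    each row and each column."""
    n = len(symbols)
    return [[symbols[(i + j) % n] for j in range(n)] for i in range(n)]

square = latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))
```

The cyclic-shift construction is just one of many valid Latin squares of a given order.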
2.Secondary data:
Secondary data is the data which has already been collected and reused again for
some valid purpose. This type of data is previously recorded from primary data
and it has two types of sources named internal source and external source.
i. Internal source:
These types of data can easily be found within the organization such as market
record, a sales record, transactions, customer data, accounting resources, etc. The
cost and time consumption is less in obtaining internal sources.
● Financial Statements
● Sales Reports
● Retailer/Distributor/Deal Feedback
● Customer Personal Information (e.g., name, address, age, contact info)
ii. External source:
These types of data are collected from outside the organization, for example:
● Business Journals
● Government Records (e.g., census, tax records, Social Security info)
● Trade/Business Magazines
● The internet
1. Word Association.
The researcher gives the respondent a set of words and asks them what comes to
mind when they hear each word.
2. Sentence Completion.
Researchers use sentence completion to understand what kind of ideas the
respondent has. This tool involves giving an incomplete sentence and seeing how
the interviewee finishes it.
3. Role-Playing.
Respondents are presented with an imaginary situation and asked how they would
act or react if it was real.
4. In-Person Surveys.
The researcher asks questions in person.
5. Online/Web Surveys.
These surveys are easy to accomplish, but some users may be unwilling to answer
truthfully, if at all.
6. Mobile Surveys.
These surveys take advantage of the increasing proliferation of mobile technology.
Mobile collection surveys rely on mobile devices like tablets or smart phones to
conduct surveys via SMS or mobile apps.
7. Phone Surveys.
No researcher can call thousands of people at once, so they need a third party to
handle the chore. However, many people have call screening and won’t answer.
8. Observation.
Sometimes, the simplest method is the best. Researchers who make direct
observations collect data quickly and easily, with little intrusion or third-party
bias. Naturally, it’s only effective in small-scale situations.
DATA PREPARATION
❖ Data Preparation
There are some alternatives for columns, rows and values.
● Columns, Fields, Attributes, Variables
● Rows, Records, Objects, Cases, Instances, Examples, Vectors
● Values, Data
Hypothesis
Hypothesis generation is a crucial step in any data science project. If you skip this or skim
through this, the likelihood of the project failing increases exponentially.
Types of Hypothesis
⮚ Simple Hypothesis
⮚ Complex Hypothesis
⮚ Null Hypothesis
⮚ Alternate Hypothesis
⮚ Statistical Hypothesis
Simple Hypothesis
A simple hypothesis predicts a relationship between a single independent
variable and a single dependent variable.
Complex Hypothesis
A complex hypothesis predicts a relationship involving two or more
independent and/or dependent variables.
Null Hypothesis
A null hypothesis states that there is no relationship between the variables
under study.
Alternate Hypothesis
An alternate hypothesis states that there is a relationship between the
variables. For example, beginning your day with tea instead of coffee can keep
you more alert.
Statistical Hypothesis
A statistical hypothesis is a statement about a population parameter that can
be tested using statistical methods on sample data.
Hypothesis testing
Hypothesis testing involves drawing inferences about two contrasting
propositions (each called a hypothesis) relating to the value of one or more population
parameters, such as the mean, proportion, standard deviation, or variance.
Null hypothesis
One of these propositions (called the null hypothesis) describes the existing
theory or a belief that is accepted as valid unless strong statistical evidence exists to
the contrary. The null hypothesis is denoted by H0
Alternative hypothesis
The second proposition (called the alternative hypothesis) is the complement
of the null hypothesis; it must be true if the null hypothesis is false.The alternative
hypothesis is denoted by H1.
Using sample data, we either reject the null hypothesis and conclude that the
sample data provide sufficient statistical evidence to support the alternative
hypothesis, or we fail to reject the null hypothesis and conclude that the sample data
do not support the alternative hypothesis.
If we fail to reject the null hypothesis, then we can only accept as valid the
existing theory or belief, but we can never prove it.
We apply this procedure to two different types of hypothesis tests; the first
involving a single population (called one-sample tests) and, later, tests involving more
than one population (multiple-sample tests).
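A one-sample test can be sketched in Python with SciPy. The sales figures below are hypothetical; the test asks whether the population mean differs from 100 (H0: mean = 100, H1: mean ≠ 100):

```python
import numpy as np
from scipy import stats

# Hypothetical sample: weekly sales, testing H0: mean = 100 vs H1: mean != 100.
sample = np.array([112, 105, 98, 120, 110, 107, 115, 101, 109, 113])

# One-sample t-test: compares the sample mean against the hypothesized mean.
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)

alpha = 0.05  # significance level
if p_value < alpha:
    print("Reject H0: the sample supports the alternative hypothesis")
else:
    print("Fail to reject H0")
```

Note that failing to reject H0 does not prove the null hypothesis, exactly as stated above.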
MODELING
Model
Many decision problems can be formalized using a model. A model is an
abstraction or representation of a real system, idea, or object. Models capture the most
important features of a problem and present them in a form that is easy to interpret. A
model can be as simple as a written or verbal description of some phenomenon, a
visual representation such as a graph or a flowchart, or a mathematical or spreadsheet
representation.
We can develop a more detailed model by noting that the variable cost depends on the
unit variable cost as well as the quantity produced. The expanded model is shown in the figure
below. In this figure, all the nodes that have no branches pointing into them are inputs to the
model. We can see that the unit variable cost and fixed costs are data inputs in the model. The
quantity produced, however, is a decision variable because it can be controlled by the
manager of the operation. The total cost is the output (note that it has no branches pointing
out of it) that we would be interested in calculating. The variable cost node links some of the
inputs with the output and can be considered as a "building block" of the model for total cost.
[Figure: influence diagram for total cost, showing how to build the mathematical model from it.]
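The influence diagram translates directly into a mathematical model, total cost = fixed cost + unit variable cost × quantity produced, which can be sketched as a function (the numbers below are illustrative, not from the text):

```python
def total_cost(fixed_cost: float, unit_variable_cost: float, quantity: int) -> float:
    """Total cost model from the influence diagram.

    fixed_cost, unit_variable_cost: data inputs.
    quantity: decision variable, controlled by the manager.
    """
    variable_cost = unit_variable_cost * quantity  # the "building block" node
    return fixed_cost + variable_cost

print(total_cost(50_000, 125, 1_000))  # 175000
```

Changing `quantity` and re-evaluating shows the model's output under different decisions, which is exactly what a spreadsheet version of the model would do.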
Decision Models
A decision model is a logical or mathematical representation of a problem or business
situation that can be used to understand, analyze, or facilitate making a decision.
Most decision models have three types of input:
1. Data, which are assumed to be constant for purposes of the model. Some examples
would be costs, machine capacities, and intercity distances.
2. Uncontrollable variables, which are quantities that can change but cannot be directly
controlled by the decision maker. Some examples would be customer demand, inflation rates,
and investment returns. Often, these variables are uncertain.
3. Decision variables, which are controllable and can be selected at the discretion of the
decision maker. Some examples would be production quantities ,staffing levels, and
investment allocations.
Decision models characterize the relationships among the data, uncontrollable
variables, and decision variables, and the outputs of interest to the decision maker.
Modeling
With the help of modelling techniques, we can create a complete description of existing and
proposed organizational structures, processes, and information used by the enterprise.
Business Model is a structured model, just like a blueprint for the final product to be
developed. It gives structure and dynamics for planning. It also provides the foundation for
the final product.
Model Evaluation
Model Evaluation is an integral part of the model development process. It
helps to find the best model that represents our data and how well the
chosen model will work in the future. Evaluating model performance with
the data used for training is not acceptable in data science because it can
easily generate overoptimistic and overfitted models.
There are two methods of evaluating models in data science,
⮚ Hold-Out
⮚ Cross-Validation.
To avoid overfitting, both methods use a test set (not seen by the model) to evaluate model
performance.
Hold-Out
In this method, the dataset (usually a large one) is randomly divided into three subsets:
1. Training set is a subset of the dataset used to build predictive models.
2. Validation set is a subset of the dataset used to assess the
performance of model built in the training phase. It provides a test
platform for fine tuning model's parameters and selecting the best-
performing model. Not all modeling algorithms need a validation
set.
3. Test set or unseen examples are a subset of the dataset to assess the
likely future performance of a model. If a model fits the training
set much better than it fits the test set, overfitting is probably the
cause.
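The three-way hold-out split above can be sketched with scikit-learn by splitting twice: first carve off the test set, then split the remainder into training and validation sets. The 60/20/20 proportions are just one common choice:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 examples.
X = np.arange(100).reshape(-1, 1)
y = np.arange(100)

# First split off the test set (20%), kept unseen until final assessment.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Then split the remainder into training (75% of 80) and validation (25% of 80).
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```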
Cross-Validation
When only a limited amount of data is available, to achieve an unbiased
estimate of the model performance we use k-fold cross-validation. In k-
fold cross-validation, we divide the data into k subsets of equal size.
We build models k times, each time leaving out one of the subsets from
training and use it as the test set. If k equals the sample size, this is called
"leave-one-out".
● Regression Evaluation
Regression refers to predictive modeling problems that involve predicting a
numeric value. It is different from classification, which involves predicting a
class label. Unlike classification, you cannot use classification accuracy to
evaluate the predictions made by a regression model.
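Instead of accuracy, regression models are scored with error metrics such as mean absolute error (MAE) and root mean squared error (RMSE), sketched below on hypothetical predictions:

```python
import numpy as np

# Hypothetical true values and model predictions.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Mean absolute error: average size of the errors.
mae = np.mean(np.abs(y_true - y_pred))

# Root mean squared error: penalizes large errors more heavily.
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

print(mae, round(rmse, 3))
```

Both are in the units of the target variable; lower is better, and RMSE ≥ MAE always holds.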
INTERPRETATION
Model Deployment
An example of using a data mining tool (Orange) to deploy a decision tree model.