BA Unit 1
LECTURE NOTES
Analytics and Data Science – Analytics Life Cycle – Types of Analytics – Business Problem
Definition – Data Collection – Data Preparation – Hypothesis Generation – Modeling –
Validation and Evaluation – Interpretation – Deployment and Iteration
Introduction:
Every organization across the world uses performance measures such as market share,
profitability, sales growth, return on investments (ROI), customer satisfaction, and so on for
quantifying, monitoring, and improving its performance.
Organisations should understand their KPIs (Key Performance Indicators) and the factors that have an impact on those KPIs.
1. Analytics:
Data Manipulation:
With the help of data manipulation techniques, you can find interesting insights from the
raw data with minimal effort. Data manipulation is the process of organizing information to make
it readable and understandable. Engineers perform data manipulation using data manipulation
language (DML) capable of adding, deleting, or altering data. Data comes from various sources.
While working with disparate data, you need to organize, clean, and transform it to use it
in your decision-making process. This is where data manipulation fits in. Data manipulation
allows you to manage and integrate data helping drive actionable insights.
Data manipulation, also known as data preparation, enables users to turn static data into
fuel for business intelligence and analytics. Many data scientists use data preparation software to
organize data and generate reports, so non-analysts and other stakeholders can derive valuable
information and make informed decisions.
Data manipulation makes it easier for organizations to organize and analyse data as
needed. It helps them perform vital business functions such as analyzing trends and buyer
behaviour and drawing insights from their financial data.
● Consistency: Data manipulation maintains consistency across data accumulated from different
sources, giving businesses a unified view that helps them make better, more informed decisions.
● Usability: Data manipulation allows users to cleanse and organize data and use it more
efficiently.
● Forecasting: Data manipulation enables businesses to understand historical data and helps
them prepare future forecasts, especially in financial data analysis.
● Cleansing: Data manipulation helps clear unwanted data and keep information that matters.
Enterprises can clean up records, isolate, and even reduce unnecessary variables, and focus on the
data they need.
Data visualization:
It is the practice of converting raw information (text, numbers, or symbols) into a graphic format.
The data is visualized with a clear purpose: to show logical correlations between units, and define
inclinations, tendencies, and patterns. Depending on the type of logical connection and the data
itself, visualization can be done in a suitable format. Put simply, almost any analytical report
contains data interpretations like pie charts, comparison bars, demographic maps, and much more.
As we’ve mentioned, a data representation tool is just the user interface of the whole
business intelligence system. Before it can be used for creating visuals, the data goes through a
long process. This is basically a description of how Business Analytics works, so we’ll break it
down into the stages below (a short code sketch follows the list):
1. First things first, you should define data sources and data types that will be used. Then
transformation methods and database qualities are determined.
2. Following that, the data is sourced from its initial storages, for example, Google
Analytics, ERP, CRM, or SCM system.
3. Using API channels, the data is moved to a staging area where it is transformed.
Transformation assumes data cleaning, mapping, and standardizing to a unified format.
4. Further, cleaned data can be moved into a storage: a usual database or data warehouse.
To make it possible for the tools to read data, the original base language of datasets can
also be rewritten.
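To make stages 2 to 4 concrete, here is a minimal Python sketch using pandas. The source file, column names, and the SQLite "warehouse" are all hypothetical; a real pipeline would typically pull from an API or database rather than a CSV export.

import pandas as pd
import sqlite3

# Stage 2: source the data from its initial storage (here, a CSV export from a CRM).
raw = pd.read_csv("crm_export.csv")                      # hypothetical source file

# Stage 3: transform in a staging area - clean, map, and standardise to a unified format.
staged = raw.rename(columns={"Cust_Name": "customer", "Amt": "amount"})
staged["amount"] = pd.to_numeric(staged["amount"], errors="coerce")
staged = staged.dropna(subset=["customer", "amount"])
staged["customer"] = staged["customer"].str.strip().str.title()

# Stage 4: move the cleaned data into a storage layer (a small SQLite "warehouse").
with sqlite3.connect("warehouse.db") as conn:
    staged.to_sql("sales_clean", conn, if_exists="replace", index=False)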
Bar chart
A bar chart is one of the basic ways to compare data units to each other. Because of
its simple graphic form, a bar chart is often used in Business Analytics as an interactive page
element.
Bar charts are versatile enough to be modified and show more complex data models. The bars
can be structured in clusters or be stacked, to depict distribution across market segments, or
subcategories of items. The same goes for horizontal bar charts, fitting more for long data
labels to be placed on the bars.
When to use: comparing objects, numeric information. Use horizontal charts to fit long data
labels. Place stacks in bars to break each object into segments for a more detailed comparison.
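The stacked-bar idea described above can be sketched with matplotlib in a few lines; the products and sales figures below are invented purely for illustration.

import matplotlib.pyplot as plt

products = ["Product A", "Product B", "Product C"]
online = [120, 90, 60]        # hypothetical units sold per channel
in_store = [80, 110, 40]

fig, ax = plt.subplots()
ax.bar(products, online, label="Online")
ax.bar(products, in_store, bottom=online, label="In-store")   # stacked segment on top
ax.set_ylabel("Units sold")
ax.set_title("Sales by product, broken into channel segments")
ax.legend()
plt.show()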
Pie chart
This type of chart is used in any marketing or sales department, because it makes it easy to
demonstrate the composition of objects or unit-to-unit comparison.
Line Graph
This type of visual utilizes a horizontal axis and a vertical axis to depict the value of a unit
over time.
Line graphs can also be combined with bar charts to represent data from multiple
dimensions.
When to use: object value on the timeline, depicting tendencies in behavior over time.
Figure: sales analysis by payment methods.
Box plot
At first glance, a box plot looks pretty complicated. But if we look closer at the example, it
becomes evident that it depicts quartiles along a horizontal axis. Our main elements here are the
minimum, the maximum, and the median placed in between the first and third quartiles. What a box
shows is the distribution of objects and their deviation from the median.
When to use: Distribution of the complex object, deviation from the median value.
Figure: a box plot showing the five-number summary (minimum, first quartile, median, third quartile, maximum), with outliers shown as points that fall outside the distribution area.
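A box plot of this kind can be drawn directly with matplotlib; the delivery times below are made up, with one deliberate outlier.

import matplotlib.pyplot as plt

delivery_days = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 12]   # hypothetical data; 12 is an outlier

fig, ax = plt.subplots()
# The box spans the first to third quartile, the line is the median,
# the whiskers show the spread, and isolated points are outliers.
ax.boxplot(delivery_days, vert=False)
ax.set_xlabel("Delivery time (days)")
plt.show()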
Scatter plot
This type of visualization is built on X and Y axes. Between them, there are dots placed
around, defining objects. The position of a dot on the graph denotes which qualities it has.
As in the case of line graphs, dots placed between the axes are noticed in a split second.
The only limitation of this type of visualization is the number of axes.
When to use: showing distribution of objects, defining the quality of each object on the graph.
Figure: a scatter plot showing the inability of young people to earn money.
Radar chart
Its purpose is the same as for a line chart. But because of the number of axes, you can
compare units from various angles and show the inclinations graphically.
When to use: describing data qualities, comparing multiple objects to each other through different
dimensions.
Maps
Superimposing a visualization over a map works for data with a geographical domain. Density maps
are built with the help of dots placed on the map, marking the location of each unit.
Figure: a simple representation of a dot map.
Funnel charts
These are perfect for showing narrowing correlations between different groups of items. In most
cases, funnels will utilize both geometric
form and colour coding to differentiate items.
Figure: a funnel chart showing conversion results, starting from the total traffic number and narrowing to the number of subscribers.
This type of chart is also handy when there are multiple stages in the process. On the example
above, we can see that after the “Contacted Support” stage, the number of subscribers has been
reduced.
When to use: depicting process stages with a narrowing percentage of values/objects.
In choosing the type of visualization, make sure you clearly understand the purpose of the analysis
and the nature of the data to be displayed.
Statistical analysis:
Statistical analysis is the process of collecting and analyzing samples of data to uncover
patterns and trends and predict what could happen next, in order to make better and more scientific
decisions.
Once the data is collected, statistical analysis can be used for many things in your business
(a short Python illustration follows the list below). Some include:
● Summarizing and presenting the data in a graph or chart to present key findings
● Discovering crucial measures within the data, like the mean
● Calculating whether the data is tightly clustered or spread out, which also reveals
similarities.
● Making future predictions based on past behavior
● Testing a hypothesis from an experiment
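As promised above, a few lines of Python with pandas are enough to produce these summary measures; the monthly sales figures are hypothetical.

import pandas as pd

sales = pd.Series([230, 245, 260, 198, 305, 280, 150, 410])   # hypothetical monthly sales

print(sales.mean())       # central tendency (the mean)
print(sales.median())
print(sales.std())        # how tightly clustered or spread out the data are
print(sales.describe())   # a full summary that can feed a chart or report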
There are several ways that businesses can use statistical analysis to their advantage. Some of
these ways include identifying who on your sales staff is performing poorly, finding trends in
customer data, narrowing down the top operating product lines, conducting financial audits, and
getting a better understanding of how sales performance can vary in different regions of the
country.
2. Analytics Life Cycle:
Just like any other thing in business, there is a process involved in business analytics as well.
Business analytics needs to be systematic, organized, and include step-by-step actions to have the
most optimized result at the end with the least amount of discrepancies.
● Business Problem Framing: In this step, we basically find out what business problem we are
trying to solve, e.g., when we are looking to find out why the supply chain isn’t as effective as it
should be or why we are losing sales. This discussion generally happens with stakeholders when
they realize inefficiency in any part of the business.
● Analytics Problem Framing: Once we have the problem statement, what we need to think of
next is how analytics can be done for that business analytics problem. Here, we look for metrics
and specific points that we need to analyze.
● Data: The moment we identify the problem in terms of what needs to be analyzed, the next
thing that we need is data, which needs to be analyzed. In this step, not only do we obtain data
from various data sources but we also clean the data; if the raw data is corrupted or has false
values, we remove those problems and convert the data into usable form.
● Methodology selection and model building: Once the data gets ready, the tricky part begins.
At this stage, we need to determine what methods have to be used and what metrics are the crucial
ones. If required, the team has to build custom models to find out the specific methods that are
suited to respective operations. Many times, the kind of data we possess also dictates the
methodology that can be used to do business analytics. Most organizations make multiple models
and compare them based on the decided-upon crucial metrics.
● Deployment: Post the selection of the model and the statistical ways of analyzing data for the
solution, the next thing we need to do is to test the solution in a real-time scenario. For that, we
deploy the models on the data and look for different kinds of insights. Based on the metrics and
data highlights, we need to decide the optimum strategy to solve our problem and implement a
solution effectively. Even in this phase of business analytics, we will compare the expected output
with the real-time output. Later, based on this, we will decide if there is a need to reiterate and
modify the solution or if we can go on with the implementation of the same.
The Business Analytics process involves asking questions, looking at data, and manipulating it to
find the required answers. Now, every organization has different ways to execute this process as
all of these organizations work in different sectors and value different metrics more than the others
based on their specific business model.
Since the approach to business is different for different organizations, their solutions and their
ways to reach the solutions are also different. Nonetheless, all of the actions that they do can be
classified and generalized to understand their approach. The steps in the Business Analytics
process of a firm are outlined below.
2.1 Six Steps in the Business Analytics Lifecycle
Step 1: Identifying the Business Problem
The first step of the process is identifying the business problem. The problem could be an actual
crisis; it could be something related to recognizing business needs or optimizing current processes.
This is a crucial stage in Business Analytics as it is important to clearly understand what the
expected outcome should be. When the desired outcome is determined, it is further broken down
into smaller goals. Then, business stakeholders decide the relevant data required to solve the
problem. Some important questions must be answered in this stage, such as: What kind of data is
available? Is there sufficient data? And so on.
Step 2: Data Collection and Cleaning
Once the problem statement is defined, the next step is to gather data (if required) and, more
importantly, cleanse the data—most organizations would have plenty of data, but not all data
points would be accurate or useful. Organizations collect huge amounts of data through different
methods, but at times, junk data or empty data points would be present in the dataset. These faulty
pieces of data can hamper the analysis. Hence, it is very important to clean the data that has to be
analyzed.
To do this, you must do computations for the missing data, remove outliers, and find new
variables as a combination of other variables. You may also need to plot time series graphs as they
generally indicate patterns and outliers. It is very important to remove outliers as they can have a
heavy impact on the accuracy of the model that you create. Moreover, cleaning the data helps you
get a better sense of the dataset.
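A minimal pandas sketch of this cleaning work, with a hypothetical file and hypothetical column names, might look like the following.

import pandas as pd

df = pd.read_csv("monthly_sales.csv")                       # hypothetical raw dataset

# Compute (impute) missing values, here with the column median.
df["revenue"] = df["revenue"].fillna(df["revenue"].median())

# Remove outliers lying more than three standard deviations from the mean.
z = (df["revenue"] - df["revenue"].mean()) / df["revenue"].std()
df = df[z.abs() <= 3]

# Create a new variable as a combination of other variables.
df["revenue_per_order"] = df["revenue"] / df["orders"]

# Plot the time series to reveal patterns and any remaining outliers.
df.plot(x="month", y="revenue")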
Step 3: Analysis
Once the data is ready, the next thing to do is analyze it. Now to execute the same, there are
various kinds of statistical methods (such as hypothesis testing, correlation, etc.) involved to find
out the insights that you are looking for. You can use all of the methods for which you have the
data.
The prime way of analyzing is pivoting around the target variable, so you need to take into
account whatever factors affect the target variable. In addition to that, a lot of assumptions are
also considered to find out what the outcomes can be. Generally, at this step, the data is sliced, and
the comparisons are made. Through these methods, you are looking to get actionable insights.
Step 4: Prediction
Gone are the days when analytics was used to react. In today’s era, Business Analytics is all about
being proactive. In this step, you will use prediction techniques, such as neural networks or
decision trees, to model the data. These prediction techniques will help you find out hidden
insights and relationships between variables, which will further help you uncover patterns on the
most important metrics. By principle, a lot of models are used simultaneously, and the models
with the most accuracy are chosen. In this stage, a lot of conditions are also checked as
parameters, and answers to a lot of ‘what if…?’ questions are provided.
Step 5: Optimization and Decision Making
From the insights that you receive from your model built on target variables, a viable plan of
action will be established in this step to meet the organization’s goals and expectations. The said
plan of action is then put to work, and the waiting period begins. You will have to wait to see the
actual outcomes of your predictions and find out how successful you were in your endeavors.
Once you get the outcomes, you will have to measure and evaluate them.
Step 6: Evaluation and Updating the System
Post the implementation of the solution, the outcomes are measured as mentioned above. If you
find some methods through which the plan of action can be optimized, then those can be
implemented. If that is not the case, then you can move on with registering the outcomes of the
entire process. This step is crucial for any analytics in the future because you will have an ever-
improving database. Through this database, you can get closer and closer to maximum
optimization. In this step, it is also important to evaluate the ROI (return on investment).
2.3 Types of Analytics
At the different stages of business analytics, huge amounts of data are processed at various steps.
Depending on the stage of the workflow and the requirement of data analysis, there are four main
kinds of analytics – descriptive, diagnostic, predictive and prescriptive. These four types together
answer everything a company needs to know, from what is going on in the company to what
solutions should be adopted for optimising its functions.
The four types of analytics are usually implemented in stages and no one type of analytics is said
to be better than the other.
Before diving deeper into each of these, let’s define the four types of analytics:
1) Descriptive Analytics: Describing or summarising the existing data using existing business
intelligence tools to better understand what is going on or what has happened.
2) Diagnostic Analytics: Focus on past performance to determine what happened and why.
The result of the analysis is often an analytic dashboard.
3) Predictive Analytics: Emphasizes on predicting the possible outcome using statistical models
and machine learning techniques.
4) Prescriptive Analytics: It is a type of predictive analytics that is used to recommend one or
more courses of action based on the analysis of the data. Let’s understand these in a bit more depth.
2.3.1 Descriptive Analytics
This can be termed the simplest form of analytics. The sheer size of big data is beyond human
comprehension, and the first stage hence involves crunching the data into understandable chunks.
The purpose of this type of analytics is simply to summarise the findings and understand what is going
on.
Among some frequently used terms, what people call advanced analytics or business
intelligence is basically the use of descriptive statistics (arithmetic operations, mean, median, max,
percentage, etc.) on existing data. It is said that 80% of business analytics mainly involves
descriptions based on aggregations of past performance. It is an important step in making raw data
understandable to investors, shareholders and managers. This way, it becomes easy to identify and
address areas of strength and weakness, which helps in strategizing. The two main
techniques involved are data aggregation and data mining; this method is purely used
for understanding the underlying behaviour and not for making any estimations. By mining historical
data, companies can analyze consumer behaviours and engagements with their businesses, which
can be helpful in targeted marketing, service improvement, etc. The tools used in this phase are
MS Excel, MATLAB, SPSS, STATA, etc.
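Descriptive analytics of this kind reduces to simple aggregations of past performance; a short pandas sketch with hypothetical column names illustrates the idea.

import pandas as pd

orders = pd.read_csv("orders.csv")            # hypothetical historical sales records

# Aggregate past performance: revenue summarised per region and product line.
summary = (orders
           .groupby(["region", "product_line"])["revenue"]
           .agg(["count", "sum", "mean", "median", "max"]))
print(summary)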
2.3.2 Diagnostic Analytics
Diagnostic analytics is used to determine why something happened in the past. It is characterized
by techniques such as drill-down, data discovery, data mining and correlations. Diagnostic
analytics takes a deeper look at data to understand the root causes of the events. It is helpful in
determining what factors and events contributed to the outcome. It mostly uses probabilities,
likelihoods, and the distribution of outcomes for the analysis.
In a time series of sales data, diagnostic analytics would help you understand why the sales
have decreased or increased for a specific year or period. However, this type of analytics has a limited
ability to give actionable insights. It just provides an understanding of causal relationships and
sequences while looking backward.
A few techniques that use diagnostic analytics include attribute importance, principal
components analysis, sensitivity analysis, and conjoint analysis. Training algorithms for
classification and regression also fall in this type of analytics.
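The drill-down and correlation techniques mentioned above can be sketched in a few lines of pandas; the dataset and its columns are hypothetical.

import pandas as pd

sales = pd.read_csv("weekly_sales.csv")       # hypothetical columns: sales, price, promo_spend, stockouts

# Correlations give a first hint at which factors moved together with the outcome.
print(sales.corr(numeric_only=True)["sales"].sort_values())

# Drill down: compare the weakest sales weeks against the rest to look for root causes.
weak_weeks = sales[sales["sales"] < sales["sales"].quantile(0.25)]
print(weak_weeks[["price", "promo_spend", "stockouts"]].mean())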
2.3.3 Predictive Analytics
As mentioned above, predictive analytics is used to predict future outcomes. However, it is
important to note that it cannot predict if an event will occur in the future; it merely forecasts what
are the probabilities of the occurrence of the event. A predictive model builds on the preliminary
descriptive analytics stage to derive the possibility of the outcomes.
This type of analytics is found in sentiment analysis, where all the opinions posted on social media are
collected and analyzed (existing text data) to predict a person’s sentiment on a particular subject
as positive, negative or neutral (future prediction).
Hence, predictive analytics includes building and validation of models that provide accurate
predictions. Predictive analytics relies on machine learning algorithms like random forests, SVM,
etc. and statistics for learning and testing the data. Usually, companies need trained data scientists
and machine learning experts for building these models. The most popular tools for predictive
analytics include Python, R, RapidMiner, etc.
The prediction of future data relies on the existing data as it cannot be obtained otherwise. If the
model is properly tuned, it can be used to support complex forecasts in sales and marketing. It
goes a step ahead of the standard BI in giving accurate predictions.
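A minimal scikit-learn sketch of building and validating such a predictive model is shown below; the customer file, feature columns, and churn target are all hypothetical.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

data = pd.read_csv("customers.csv")                      # hypothetical historical data
X = data[["age", "tenure_months", "monthly_spend"]]      # features
y = data["churned"]                                      # outcome to predict

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Validation: score the model on data it has never seen.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))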
2.3.4 Prescriptive Analytics
The basis of this analytics is predictive analytics, but it goes beyond the three types mentioned
above to suggest future solutions. It can suggest all favourable outcomes according to a
specified course of action and can also suggest various courses of action to get to a particular outcome.
Hence, it uses a strong feedback system that constantly learns and updates the relationship
between the action and the outcome.
The computations include optimizations of some functions that are related to the desired outcome.
For example, while calling for a cab online, the application uses GPS to connect you to the correct
driver from among a number of drivers found nearby. Hence, it optimizes the distance for faster
arrival time. Recommendation engines also use prescriptive analytics.
The other approach includes simulation, where all the key performance areas are combined to
design the correct solutions. It ensures that the key performance metrics are included in
the solution. The optimization model will further work on the impact of the previously made
forecasts. Because of its power to suggest favourable solutions, prescriptive analytics is the final
frontier of advanced analytics, or data science in today’s terms.
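Prescriptive analytics usually reduces to an optimisation over some objective function. The sketch below uses scipy.optimize.linprog with made-up profits and capacities to choose production quantities that maximise profit.

from scipy.optimize import linprog

# Maximise profit 40*x1 + 30*x2 (linprog minimises, so the coefficients are negated).
profit = [-40, -30]

# Constraints: 2*x1 + 1*x2 <= 100 machine hours, 1*x1 + 2*x2 <= 80 kg of material.
A = [[2, 1], [1, 2]]
b = [100, 80]

result = linprog(c=profit, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print("optimal quantities:", result.x)
print("maximum profit:", -result.fun)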
The four techniques in analytics may make it seem as if they need to be implemented sequentially.
However, in most scenarios, companies can jump directly to prescriptive analytics. Most
companies are aware of, or are already implementing, descriptive analytics, but if one has
identified the key area that needs to be optimized and worked upon, they must employ prescriptive
analytics to reach the desired outcome.
According to research, prescriptive analytics is still at the budding stage and not many firms have
completely used its power. However, the advancements in predictive analytics will surely pave the
way for its development.
3. Business Problem Definition:
Understanding the importance of problem-solving skills in the workplace will help you
develop as a leader. Problem-solving skills will help you resolve critical issues and conflicts that
you come across, which is why problem-solving is such a valued skill in the workplace.
There are many different problem-solving skills, but most can be broken into general steps.
Here is a four-step method for business problem solving:
1) Identify the Details of the Problem: Gather enough information to accurately define the
problem. This can include data on procedures being used, employee actions, relevant workplace
rules, and so on. Write down the specific outcome that is needed, but don’t assume what the
solution should be.
2) Creatively Brainstorm Solutions: Alone or with a team, state every solution you can think of.
You’ll often need to write them down. To get more solutions, brainstorm with the employees who
have the greatest knowledge of the issue.
3) Evaluate Solutions and Make a Decision: Compare and contrast alternative solutions based
on the feasibility of each one, including the resources needed to implement it and the return on
investment of each one. Finally, make a firm decision on one solution that clearly addresses the
root cause of the problem.
4) Take Action: Write up a detailed plan for implementing the solution, get the necessary
approvals, and put it into action.
4. Data Collection:
In the data life cycle, data collection is the second step. After data is generated, it must be
collected to be of use to your team. After that, it can be processed, stored, managed, analyzed, and
visualized to aid in your organization’s decision-making.
Before collecting data, there are several factors you need to define:
● The question you aim to answer
● The data subject(s) you need to collect data from
● The collection timeframe
● The data collection method(s) best suited to your needs
The data collection method you select should be based on the question you want to answer, the
type of data you need, your timeframe, and your company’s budget. Explore the options in the
next section to see which data collection method is the best fit.
4.1 SEVEN DATA COLLECTION METHODS USED IN BUSINESS
ANALYTICS
1. Surveys
Surveys are physical or digital questionnaires that gather both qualitative and quantitative
data from subjects. One situation in which you might conduct a survey is gathering attendee
feedback after an event. This can provide a sense of what attendees enjoyed, what they wish was
different, and areas you can improve or save money on during your next event for a similar
audience.
Because they can be sent out physically or digitally, surveys present the opportunity for
distribution at scale. They can also be inexpensive; running a survey can cost nothing if you use a
free tool. If you wish to target a specific group of people, partnering with a market research firm to
get the survey in the hands of that demographic may be worth the money.
Something to watch out for when crafting and running surveys is the effect of bias, including:
● Collection bias: It can be easy to accidentally write survey questions with a biased lean.
Watch out for this when creating questions to ensure your subjects answer honestly and aren’t
swayed by your wording.
● Subject bias: Because your subjects know their responses will be read by you, their
answers may be biased toward what seems socially acceptable. For this reason, consider pairing
survey data with behavioral data from other collection methods to get the full picture.
2. Transactional Tracking
Each time your customers make a purchase, tracking that data can allow you to make
decisions about targeted marketing efforts and understand your customer base better.
Often, e-commerce and point-of-sale platforms allow you to store data as soon as it’s generated,
making this a seamless data collection method that can pay off in the form of customer insights.
3. Interviews and Focus Groups
Interviews and focus groups consist of talking to subjects face-to-face about a specific
topic or issue. Interviews tend to be one-on-one, and focus groups are typically made up of several
people. You can use both to gather qualitative and quantitative data.
Through interviews and focus groups, you can gather feedback from people in your target
audience about new product features. Seeing them interact with your product in real-time and
recording their reactions and responses to questions can provide valuable data about which
product features to pursue.
As is the case with surveys, these collection methods allow you to ask subjects anything you want
about their opinions, motivations, and feelings regarding your product or brand. It also introduces
the potential for bias. Aim to craft questions that don’t lead them in one particular direction.
One downside of interviewing and conducting focus groups is they can be time-consuming
and expensive. If you plan to conduct them yourself, it can be a lengthy process. To avoid this,
you can hire a market research facilitator to organize and conduct interviews on your behalf.
4. Observation
Observing people interacting with your website or product can be useful for data collection
because of the candour it offers. If your user experience is confusing or difficult, you can witness
it in real-time.
Yet, setting up observation sessions can be difficult. You can use a third-party tool to
record users’ journeys through your site or observe a user’s interaction with a beta version of your
site or product.
While less accessible than other data collection methods, observations enable you to see
first hand how users interact with your product or site. You can leverage the qualitative and
quantitative data gleaned from this to make improvements and double down on points of success.
5. Online Tracking
To gather behavioural data, you can implement pixels and cookies. These are both tools
that track users’ online behaviour across websites and provide insight into what content they’re
interested in and typically engage with.
You can also track users’ behavior on your company’s website, including which parts are
of the highest interest, whether users are confused when using it, and how long they spend on
product pages. This can enable you to improve the website’s design and help users navigate to
their destination.
Inserting a pixel is often free and relatively easy to set up. Implementing cookies may
come with a fee but could be worth it for the quality of data you’ll receive. Once pixels and
cookies are set, they gather data on their own and don’t need much maintenance, if any.
It’s important to note: Tracking online behavior can have legal and ethical privacy
implications. Before tracking users’ online behavior, ensure you’re in compliance with local and
industry data privacy standards.
6. Forms
Online forms are beneficial for gathering qualitative data about users, specifically
demographic data or contact information. They’re relatively inexpensive and simple to set up, and
you can use them to gate content or registrations, such as webinars and email newsletters.
You can then use this data to contact people who may be interested in your product, build
out demographic profiles of existing customers, and in remarketing efforts, such as email
workflows and content recommendations.
5. Data Preparation:
Data preparation, also sometimes called “pre-processing,” is the act of cleaning and
consolidating raw data prior to using it for business analysis. It might not be the most celebrated of
tasks, but careful data preparation is a key component of successful data analysis.
Doing the work to properly validate, clean, and augment raw data is essential to draw
accurate, meaningful insights from it. The validity and power of any business analysis produced is
only as good as the data preparation done in the early stages.
The decisions that business leaders make are only as good as the data that supports them.
Careful and comprehensive data preparation ensures analysts trust, understand, and ask better
questions of their data, making their analyses more accurate and meaningful. From more
meaningful data analysis comes better insights and, of course, better outcomes.
To drive the deepest level of analysis and insight, successful teams and organizations must
implement a data preparation strategy that prioritizes:
● Accessibility: Anyone — regardless of skillset — should be able to access data securely
from a single source of truth
● Transparency: Anyone should be able to see, audit, and refine any step in the end-to-end
data preparation process that took place
● Repeatability: Data preparation is notorious for being time-consuming and repetitive,
which is why successful data preparation strategies invest in solutions built for repeatability.
The data preparation process typically includes the following steps:
● Acquiring data: Determining what data is needed, gathering it, and establishing consistent
access to build powerful, trusted analysis
● Exploring data: Determining the data’s quality, examining its distribution, and analyzing
the relationship between each variable to better understand how to compose an analysis
● Cleansing data: Improving data quality and overall productivity to craft error-proof
insights
● Transforming data: Formatting, orienting, aggregating, and enriching the datasets used in
an analysis to produce more meaningful insights
While data preparation processes build upon each other in a serialized fashion, it’s not always
linear. The order of these steps might shift depending on the data and questions being asked. It’s
common to revisit a previous step as new insights are uncovered or new data sources are
integrated into the process.
The entire data preparation process can be notoriously time-intensive, iterative, and repetitive.
That’s why it’s important to ensure the individual steps taken can be easily understood, repeated,
revisited, and revised so analysts can spend less time prepping and more time analyzing.
6. Hypothesis Generation:
A hypothesis is a tentative statement about the relationship between two or more variables.
In order to find the plausibility of a hypothesis, the researcher will have to test the
hypothesis using hypothesis testing methods. Unlike a hypothesis, which is ‘supposed’ to stand true
on the basis of little or no evidence, hypothesis testing is required to have plausible evidence in
order to establish that a statistical hypothesis is true.
6.1.1 Alternative Hypothesis
The Alternative Hypothesis (H1) implies that the two variables are related to each other and that
the relationship that exists between them is not due to chance or coincidence.
When the process of hypothesis testing is carried out, the alternative hypothesis is the main subject
of the testing process. The analyst intends to test the alternative hypothesis and verify its
plausibility.
6.1.2 Null Hypothesis
The Null Hypothesis (H0) aims to nullify the alternative hypothesis by implying that there
exists no relation between two variables in statistics. It states that the effect of one variable on the
other is solely due to chance and no empirical cause lies behind it.
The null hypothesis is established alongside the alternative hypothesis and is recognized as
important as the latter. In hypothesis testing, the null hypothesis has a major role to play as it
influences the testing against the alternative hypothesis.
6.1.3 Non-Directional Hypothesis
The Non-directional hypothesis states that the relation between two variables has no
direction. Simply put, it asserts that there exists a relation between two variables, but does not
recognize the direction of effect, whether variable A affects variable B or vice versa.
6.1.4 Directional Hypothesis
The Directional hypothesis, on the other hand, asserts the direction of effect of the
relationship that exists between two variables.
Herein, the hypothesis clearly states that variable A affects variable B, or vice versa.
6.1.5 Statistical Hypothesis
A statistical hypothesis is a hypothesis that can be verified to be plausible on the basis of
statistics. By using data sampling and statistical knowledge, one can determine the plausibility of a
statistical hypothesis and find out if it stands true or not.
6.2 Type 1 and Type 2 Error
A hypothesis test can result in two types of errors.
Type 1 Error: A Type-I error occurs when the sample results lead to rejecting the null hypothesis
even though it is true.
Type 2 Error: A Type-II error occurs when the null hypothesis is not rejected even though it is
false.
Example:
Suppose a teacher evaluates the examination paper to decide whether a student passes or fails.
Type I error will be the teacher failing the student [rejects H0] although the student scored the
passing marks [H0 was true].
Type II error will be the case where the teacher passes the student [do not reject H0] although the
student did not score the passing marks [H1 is true].
Hypothesis tests can be either one-tailed or two-tailed:
1. One-tailed (or one-sided) test: Tests for the significance of an effect in only one direction,
either positive or negative.
2. Two-tailed (or two-sided) test: Tests for the significance of an effect in both directions,
allowing for the possibility of a positive or negative effect.
Let us perform hypothesis testing through the following 7 steps of the procedure. As an example,
suppose a laboratory analyses a certified reference material whose certified sodium content is
250 mg/kg, making n = 7 replicate measurements.
Step 1: State the hypotheses
The null hypothesis H0 is the statement that we are interested in testing. In this case, the
null condition is that the mean value is 250 mg/kg of sodium.
The alternative hypothesis H1 is the statement that we accept if our sample outcome
leads us to reject the null hypothesis. In our case, the alternative hypothesis is that the mean value
is not equal to 250 mg/kg of sodium. In other words, it can be significantly larger or smaller than
the value of 250 mg/kg.
So, our formal statement of the hypotheses for this example is as follows:
H0 : μ = 250 mg/kg
H1 : μ ≠ 250 mg/kg (i.e., indicating that the laboratory result is biased).
Step 2: Choose the level of significance
Traditionally, we define the unlikely (given by the symbol α) as 0.05 (5%) or less. However,
there is nothing to stop you from using α = 0.1 (10%) or α = 0.01 (1%) with your own
justification or reasoning.
In fact, the significance level is sometimes referred to as the probability of a Type I error.
A Type I error occurs when you falsely reject the null hypothesis when it is in fact true, while a
Type II error occurs when you fail to reject the null hypothesis when it is false.
Step 3: Select the test statistic
The test statistic is the value calculated from the sample to determine whether to reject the null
hypothesis. In this case, we use Student’s t-test statistic in the following manner:
μ = x̄ ± t(α = 0.05, ν = n − 1) × s / √n
or
t(α = 0.05, ν = n − 1) = |x̄ − μ| × √n / s
By calculation, we get a t-value of 3.024 at the significance level of α = 0.05 and ν = (7 − 1) or 6
degrees of freedom for n = 7 replicates.
Step 4: State the decision rule
We reject the null hypothesis H0 if the test statistic is larger than the critical value
corresponding to the significance level chosen in step 2.
There is now a question in H1 of either a one-tailed (> or <) or a two-tailed (≠, not equal) test
to be addressed. If we are talking about either “greater than” or “smaller than”, we take the
significance level at α = 0.05, whilst for the unequal case (meaning the result can be either larger or
smaller than the certified value), a significance level of α = 0.025 on either side of the normal
curve is to be used.
As our H1 is for the mean value to be larger or smaller than the certified value, we use the
2-tailed t-test for α = 0.05 with 6 degrees of freedom. In this case, the t-critical value at α =
0.05 and 6 degrees of freedom is 2.447 from the Student’s t-table or from using the Excel function
“=T.INV.2T(0.05,6)” or “=TINV(0.05,6)” in older Excel versions.
Step 5: Compute the test statistic from the sample data
Upon calculation on the sample data, we get a t-value of 3.024 at the significance
level of α = 0.05 and ν = (7 − 1) or 6 degrees of freedom for n = 7 replicates.
Step 6: Make the decision
When we compare the result of step 5 to the decision rule in step 4, it is obvious that 3.024
is greater than the t-critical value of 2.447, and so we reject the null hypothesis. In other words,
the mean value of 274 mg/kg is significantly different from the certified value of 250 mg/kg.
Is it really so? We must go to step 7.
Step 7: Interpret the result with caution
Since hypothesis testing involves probability in the guise of a significance level, we must
interpret the final decision with caution. To say that a result is “statistically significant” sounds
remarkable, but all it really means is that the difference is greater than would be expected by
chance alone.
To do justice, it would be useful to look at the actual data to see if there are one or more
high outliers pulling up the mean value. Perhaps increasing the number of replicates might show
up any undesirable data. Furthermore, we might have to take a closer look at the test procedure
and the technical competence of the analyst to see if there were any lapses in the analytical
process. A repeated series of experiments should be able to confirm these findings.
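The worked example above can be reproduced in a few lines with scipy. Only summary statistics appear in these notes (mean 274 mg/kg, certified value 250 mg/kg, n = 7); the sample standard deviation of about 21 mg/kg used below is back-calculated from the reported t-value of 3.024, so treat it as an assumption.

from math import sqrt
from scipy import stats

x_bar, mu0, s, n = 274.0, 250.0, 21.0, 7        # s is assumed (back-calculated), not given in the notes

t_value = abs(x_bar - mu0) * sqrt(n) / s        # test statistic from step 3
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1)    # two-tailed critical value at alpha = 0.05

print(round(t_value, 3), round(t_crit, 3))      # roughly 3.02 versus 2.447
if t_value > t_crit:
    print("Reject H0: the mean differs significantly from 250 mg/kg")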
Hypothesis testing has some limitations that researchers should be aware of:
1. It cannot prove or establish the truth: Hypothesis testing provides evidence to support or
reject a hypothesis, but it cannot confirm the absolute truth of the research question.
2. Possible errors: During hypothesis testing, there is a chance of committing a Type I error
(rejecting a true null hypothesis) or a Type II error (failing to reject a false null hypothesis).
7. Modeling
7.1 Decision Models
A decision model typically has three types of input:
1. Data, which are assumed to be constant for purposes of the model. Some examples would
be costs, machine capacities, and intercity distances.
2. Uncontrollable variables, which are quantities that can change but cannot be directly
controlled by the decision maker. Some examples would be customer demand, inflation rates, and
investment returns. Often, these variables are uncertain.
3. Decision variables, which are controllable and can be selected at the discretion of the
decision maker. Some examples would be production quantities, staffing levels, and investment
allocations. Decision models characterize the relationships among the data, uncontrollable
variables, and decision variables, and the outputs of interest to the decision maker.
Decision models can be represented in various ways, most typically with mathematical functions
and spreadsheets. Spreadsheets are ideal vehicles for implementing decision models because of
their versatility in managing data, evaluating different scenarios, and presenting results in a
meaningful fashion. For example, suppose the total cost of production is the sum of a fixed cost and
a variable cost per unit multiplied by the quantity produced. Using this relationship, we may develop
a mathematical representation by defining symbols for each of these quantities:
TC = total cost
V = unit variable cost
F = fixed cost
Q = quantity produced
This results in the model TC = F + VQ
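The model TC = F + VQ translates directly into code (or a spreadsheet formula); the fixed cost, unit variable cost, and quantity below are hypothetical.

def total_cost(fixed_cost, unit_variable_cost, quantity):
    """Decision model TC = F + V * Q."""
    return fixed_cost + unit_variable_cost * quantity

# Evaluate one scenario: F = 50,000, V = 12 per unit, Q = 4,000 units produced.
print(total_cost(50_000, 12, 4_000))   # 98,000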
7.1.2 Model Assumptions:
All models are based on assumptions that reflect the modeler’s view of the “real world.”
Some assumptions are made to simplify the model and make it more tractable; that is, able to be
easily analyzed or solved. Other assumptions might be made to better characterize historical data
or past observations. The task of the modeler is to select or build an appropriate model that best
represents the behavior of the real situation. For example, economic theory tells us that demand
for a product is negatively related to its price. Thus, as prices increase, demand falls, and vice
versa (a phenomenon that you may recognize as price elasticity—the ratio of the percentage
change in demand to the percentage change in price). Different mathematical models can describe
this phenomenon.
7.2 Prescriptive Decision Models
A prescriptive decision model helps decision makers to identify the best solution to a
decision problem. Optimization is the process of finding a set of values for decision variables that
minimize or maximize some quantity of interest—profit, revenue, cost, time, and so on—called
the objective function. Any set of decision variables that optimizes the objective function is
called an optimal solution. In a highly competitive world where one percentage point can mean a
difference of hundreds of thousands of dollars or more, knowing the best solution can mean the
difference between success and failure.
Prescriptive decision models can be either deterministic or stochastic. A deterministic
model is one in which all model input information is either known or assumed to be known with
certainty. A stochastic model is one in which some of the model input information is uncertain.
For instance, suppose that customer demand is an important element of some model. We can make
the assumption that the demand is known with certainty; say, 5,000 units per month. In this case
we would be dealing with a deterministic model. On the other hand, suppose we have evidence to
indicate that demand is uncertain, with an average value of 5,000 units per month, but which
typically varies between 3,200 and 6,800 units. If we make this assumption, we would be dealing
with a stochastic model.
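The distinction can be shown with a short simulation sketch. The deterministic version plugs in the single known demand of 5,000 units; the stochastic version draws uncertain demand between 3,200 and 6,800 units (a simple uniform assumption) and inspects the spread of outcomes. The price and cost figures are hypothetical.

import numpy as np

price, unit_cost, fixed_cost = 25.0, 15.0, 30_000.0    # hypothetical model data

# Deterministic model: demand known with certainty.
demand = 5_000
print("deterministic profit:", price * demand - unit_cost * demand - fixed_cost)

# Stochastic model: demand uncertain, averaging 5,000 but varying between 3,200 and 6,800.
rng = np.random.default_rng(0)
demand_draws = rng.uniform(3_200, 6_800, size=10_000)
profits = price * demand_draws - unit_cost * demand_draws - fixed_cost
print("mean simulated profit:", profits.mean())
print("5th percentile of profit:", np.percentile(profits, 5))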
8. Model Validation
Model validation is defined within regulatory guidance as “the set of processes and
activities intended to verify that models are performing as expected, in line with their design
objectives, and business uses.” It also identifies “potential limitations and assumptions, and
assesses their possible impact.”
Generally, validation activities are performed by individuals independent of model
development or use; models, therefore, should not be validated by their owners. However, models
can be highly technical, and some institutions may find it difficult to assemble a model risk team
that has sufficient functional and technical expertise to carry out independent validation. When
faced with this obstacle, institutions often outsource the validation task to third parties.
In statistics, model validation is the task of confirming that the outputs of a statistical
model are acceptable with respect to the real data-generating process. In other words, model
validation is the task of confirming that the outputs of a statistical model have enough fidelity to
the outputs of the data-generating process that the objectives of the investigation can be achieved.
8.1 The Four Elements
Model validation consists of four crucial elements which should be considered:
1. Conceptual Design
The foundation of any model validation is its conceptual design, which needs documented
coverage assessment that supports the model’s ability to meet business and regulatory needs and
the unique risks facing a bank.
The design and capabilities of a model can have a profound effect on the overall
effectiveness of a bank’s ability to identify and respond to risks. For example, a poorly designed
risk assessment model may result in a bank establishing relationships with clients that present a
risk that is greater than its risk appetite, thus exposing the bank to regulatory scrutiny and
reputation damage.
A validation should independently challenge the underlying conceptual design and ensure that
documentation is appropriate to support the model’s logic and the model’s ability to achieve
desired regulatory and business outcomes for which it is designed.
2. System Validation
All technology and automated systems implemented to support models have limitations.
An effective validation includes: firstly, evaluating the processes used to integrate the model’s
conceptual design and functionality into the organisation’s business setting; and, secondly,
examining the processes implemented to execute the model’s overall design. Where gaps or
limitations are observed, controls should be evaluated to enable the model to function effectively.
3. Data Validation and Quality Assessment
Data errors or irregularities impair results and might lead to an organisation’s failure to
identify and respond to risks. Best practice indicates that institutions should apply a risk-based
data validation, which enables the reviewer to consider risks unique to the organisation and the
model.
To establish a robust framework for data validation, guidance indicates that the accuracy of source
data be assessed. This is a vital step because data can be derived from a variety of sources, some
of which might lack controls on data integrity, so the data might be incomplete or inaccurate.
4. Process Validation
To verify that a model is operating effectively, it is important to prove that the established
processes for the model’s ongoing administration, including governance policies and procedures,
support the model’s sustainability. A review of the processes also determines whether the models
are producing output that is accurate, managed effectively, and subject to the appropriate controls.
If done effectively, model validation will enable your bank to have every confidence in its various
models’ accuracy, as well as aligning them with the bank’s business and regulatory expectations.
By failing to validate models, banks increase the risk of regulatory criticism, fines, and penalties.
The complex and resource-intensive nature of validation makes it necessary to dedicate
sufficient resources to it. An independent validation team well versed in data management,
technology, and relevant financial products or services — for example, credit, capital
management, insurance, or financial crime compliance — is vital for success. Where shortfalls in
the validation process are identified, timely remedial actions should be taken to close the gaps.
Data Validation in Excel
The following example is an introduction to data validation in Excel. The data validation
button under the data tab provides the user with different types of data validation checks based on
the data type in the cell. It also allows the user to define custom validation checks using Excel
formulas. The data validation can be found in the Data Tools section of the Data tab in the ribbon
of Excel:
The example below illustrates a case of data entry, where the province must be entered for every
store location. Since stores are only located in certain provinces, any incorrect entry should be
caught. It is accomplished in Excel using a two-fold data validation. First, the relevant provinces
are incorporated into a drop-down menu that allows the user to select from a list of valid
provinces.
Fig. 2: First level of data validation
Second, if the user inputs a wrong province by mistake, such as “NY” instead of “NS,” the system
warns the user of the incorrect input.
Further, if the user ignores the warning, an analysis can be conducted using the data validation
feature in Excel that identifies incorrect inputs.
Fig. 4: Final level of data validation
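The same two-fold check can be scripted outside Excel. The pandas sketch below (with a hypothetical list of valid provinces and hypothetical store records) flags any entry that is not in the allowed list.

import pandas as pd

valid_provinces = {"NS", "NB", "ON", "QC"}                  # hypothetical list of valid codes

stores = pd.DataFrame({"store": ["S1", "S2", "S3"],
                       "province": ["NS", "NY", "ON"]})     # "NY" is an incorrect input

invalid = stores[~stores["province"].isin(valid_provinces)]
print(invalid)                                              # rows that fail validation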
9. Interpretation:
Data interpretation is the process of reviewing data and drawing meaningful conclusions
using a variety of analytical approaches. Data interpretation aids researchers in categorizing,
manipulating, and summarising data in order to make sound business decisions. The end goal of a
data interpretation project might be, for example, to develop a good marketing strategy or to expand
the client user base.
There are certain steps followed to conduct data interpretation:
● Putting together the data you’ll need (neglecting irrelevant data)
● Developing the initial research or identifying the most important inputs
● Sorting and filtering of data
● Forming conclusions on the data
● Developing recommendations or practical solutions
Qualitative data can be gathered and interpreted through several methods:
Groups of people: To develop a collaborative discussion about a study issue, group people and
ask them pertinent questions.
Research: Similar to how patterns of behavior may be noticed, different forms of documentation
resources can be classified and split into categories based on the type of information they include.
Interviews: Interviews are one of the most effective ways to get narrative data. Themes, topics, and
categories can be used to group inquiry replies. The interview method enables extremely targeted
data segmentation.
Content Analysis
This is a popular method for analyzing qualitative data. Other approaches to analysis may
fall under the general category of content analysis. An aspect of the content analysis is thematic
analysis. By classifying material into words, concepts, and themes, content analysis is used to
uncover patterns that arise from the text.
Narrative Analysis
The focus of narrative analysis is on people’s experiences and the language they use to
make sense of them. It’s especially effective for acquiring a thorough insight into customers’
viewpoints on a certain topic. We might be able to describe the results of a targeted case study
using narrative analysis.
Discourse Analysis
Discourse analysis is a technique for gaining a comprehensive knowledge of the political,
cultural, and power dynamics that exist in a given scenario. The emphasis here is on how people
express themselves in various social settings. Brand strategists frequently utilize discourse
analysis to figure out why a group of individuals reacts the way they do to a brand or product.
It’s critical to be very clear on the type and scope of the study topic in order to get the most
out of the analytical process. This will assist you in determining which research collection routes
are most likely to assist you in answering your query.
Your approach to qualitative data analysis will differ depending on whether you are a
corporation attempting to understand consumer sentiment or an academic surveying a school.
Quantitative data interpretation is used in several ways:
● For starters, it is used to compare and contrast groupings. For instance, consider the
popularity of certain car brands in different colors.
● It is also used to put scientifically sound theories to the test. Consider a hypothesis
concerning the effect of a certain vaccination.
Regression analysis
A collection of statistical procedures for estimating the connections between a dependent
variable and one or more independent variables is known as regression analysis. It may be
used to determine the strength of a relationship across variables and to predict how they will
interact in the future.
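A minimal regression sketch with scikit-learn; the advertising spend and sales figures are invented for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

ad_spend = np.array([[10], [20], [30], [40], [50]])    # independent variable (in thousands)
sales = np.array([25, 41, 62, 78, 101])                # dependent variable (in thousands)

model = LinearRegression().fit(ad_spend, sales)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("predicted sales at a spend of 60:", model.predict([[60]])[0])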
Cohort Analysis
Cohort analysis is a technique for determining how engaged users are over time. It is useful
for determining whether user engagement is genuinely improving over time or only appears to be
improving because of growth. Cohort analysis is valuable because it helps to distinguish between
growth and engagement measures; it involves watching how the behaviour of groups of people
develops over time.
Predictive Analysis
By examining historical and present data, the predictive analytic approach seeks to forecast
future trends. Predictive analytics approaches, which are powered by machine learning and deep
learning, allow firms to notice patterns or possible challenges ahead of time and prepare educated
initiatives. Predictive analytics is being used by businesses to address issues and identify new
possibilities.
Conjoint Analysis
Conjoint analysis is the best market research method for determining how much customers
appreciate a product’s or service’s qualities. This widely utilized method mixes real-life scenarios
and statistical tools with market decision models.
Cluster analysis
Any organization that wants to identify distinct groupings of consumers, sales transactions,
or other sorts of behaviors and items may use cluster analysis as a valuable data-mining technique.
The goal of cluster analysis is to uncover groupings of subjects that are similar, where “similarity”
between each pair of subjects refers to a global assessment of the entire collection of features.
Cluster analysis, similar to factor analysis, deals with data matrices in which the variables haven’t
been partitioned into criteria and predictor subsets previously.
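A short sketch of cluster analysis using scikit-learn’s KMeans on made-up customer data; the two features (annual spend and number of visits) are hypothetical.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers described by annual spend and number of visits.
customers = np.array([[200, 4], [220, 5], [800, 20], [750, 18], [90, 1], [110, 2]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print("cluster labels:", kmeans.labels_)            # which grouping each customer falls into
print("cluster centres:", kmeans.cluster_centers_)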
10. Deployment and Iteration:
The iterative process is the practice of building, refining, and improving a project, product,
or initiative. Teams that use the iterative development process create, test, and revise until they’re
satisfied with the end result. You can think of an iterative process as a trial-and-error methodology
that brings your project closer to its end goal.
Iterative processes are a fundamental part of lean methodologies and Agile project
management—but these processes can be implemented by any team, not just Agile ones. During
the iterative process, you will continually improve your design, product, or project until you and
your team are satisfied with the final project deliverable.
10.1 The benefits and challenges of the iterative process
The iterative model isn’t right for every team—or every project. Here are the main pros
and cons of the iterative process for your team.
Pros:
● Increased efficiency. Because the iterative process embraces trial and error, it can often
help you achieve your desired result faster than a non-iterative process.
● Increased collaboration. Instead of working from predetermined plans and specs (which
also take a lot of time to create), your team is actively working together.
● Increased adaptability. As you learn new things during the implementation and testing
phases, you can tweak your iteration to best hit your goals—even if that means doing
something you didn’t expect to be doing at the start of the iterative process.
● More cost effective. If you need to change the scope of the project, you’ll only have
invested the minimum time and effort into the process.
● Ability to work in parallel. Unlike other, non-iterative methodologies like the waterfall
method, iterations aren’t necessarily dependent on the work that comes before them. Team
members can work on several elements of the project in parallel, which can shorten your
overall timeline.
● Reduced project-level risk. In the iterative process, risks are identified and addressed
during each iteration. Instead of solving for large risks at the beginning and end of the
project, you’re consistently working to resolve low-level risks.
● More reliable user feedback. When you have an iteration that users can interact with or see,
they’re able to give you incremental feedback about what works or doesn’t work for them.
Cons:
● Increased risk of scope creep. Because of the trial-and-error nature of the iterative process,
your project could develop in ways you didn’t expect and exceed your original project
scope.
● Inflexible planning and requirements. The first step of the iterative process is to define
your project requirements. Changing these requirements during the iterative process can
break the flow of your work, and cause you to create iterations that don’t serve your
project’s purpose.
● Vague timelines. Because team members will create, test, and revise iterations until they
get to a satisfying solution, the iterative timeline isn’t clearly defined. Additionally,
testing for different increments can vary in length, which also impacts the overall
iterative process timeline.