BA Unit 1
LECTURE NOTES
Analytics and Data Science – Analytics Life Cycle – Types of Analytics – Business Problem
Definition – Data Collection – Data Preparation – Hypothesis Generation – Modeling –
Validation and Evaluation – Interpretation – Deployment and Iteration
Introduction:
Every organization across the world uses performance measures such as market share,
profitability, sales growth, return on investments (ROI), customer satisfaction, and so on for
quantifying, monitoring, and improving its performance.
Organisations should understand their KPIs (Key Performance Indicators) and the factors that have an impact on those KPIs.
1. Analytics:
Data Manipulation:
With the help of data manipulation techniques, you can find interesting insights from the
raw data with minimal effort. Data manipulation is the process of organizing information to make
it readable and understandable. Engineers perform data manipulation using data manipulation
language (DML) capable of adding, deleting, or altering data. Data comes from various sources.
While working with disparate data, you need to organize, clean, and transform it to use it
in your decision-making process. This is where data manipulation fits in. Data manipulation
allows you to manage and integrate data helping drive actionable insights.
Data manipulation, also known as data preparation, enables users to turn static data into
fuel for business intelligence and analytics. Many data scientists use data preparation software to
organize data and generate reports, so non-analysts and other stakeholders can derive valuable
information and make informed decisions.
Data manipulation makes it easier for organizations to organize and analyse data as
needed. It helps them perform vital business functions such as analyzing trends and buyer
behaviour and drawing insights from their financial data.
● Consistency: Data manipulation maintains consistency across data accumulated from different
sources, giving businesses a unified view that helps them make better, more informed decisions.
● Usability: Data manipulation allows users to cleanse and organize data and use it more
efficiently.
● Forecasting: Data manipulation enables businesses to understand historical data and helps
them prepare future forecasts, especially in financial data analysis.
● Cleansing: Data manipulation helps clear unwanted data and keep information that matters.
Enterprises can clean up records, isolate, and even reduce unnecessary variables, and focus on the
data they need.
Data visualization:
It is the practice of converting raw information (text, numbers, or symbols) into a graphic format.
The data is visualized with a clear purpose: to show logical correlations between units, and define
inclinations, tendencies, and patterns. Depending on the type of logical connection and the data
itself, visualization can be done in a suitable format. Put simply, almost any analytical report
contains data interpretations like pie charts, comparison bars, demographic maps, and much more.
As we’ve mentioned, a data representation tool is just the user interface of the whole
business intelligence system. Before it can be used for creating visuals, the data goes through a
long process. This is basically a description of how Business Analytics works, so we’ll break it
down into the stages below (a short code sketch follows the list):
1. First things first, you should define data sources and data types that will be used. Then
transformation methods and database qualities are determined.
2. Following that, the data is sourced from its initial storages, for example, Google
Analytics, ERP, CRM, or SCM system.
3. Using API channels, the data is moved to a staging area where it is transformed.
Transformation assumes data cleaning, mapping, and standardizing to a unified format.
4. Further, cleaned data can be moved into a storage: a usual database or data warehouse.
To make it possible for the tools to read data, the original base language of datasets can
also be rewritten.
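To make stages 2 to 4 concrete, here is a minimal Python sketch using pandas. The source file, column names, and the SQLite "warehouse" are all hypothetical; a real pipeline would typically pull from an API or database rather than a CSV export.

import pandas as pd
import sqlite3

# Stage 2: source the data from its initial storage (here, a CSV export from a CRM).
raw = pd.read_csv("crm_export.csv")                      # hypothetical source file

# Stage 3: transform in a staging area - clean, map, and standardise to a unified format.
staged = raw.rename(columns={"Cust_Name": "customer", "Amt": "amount"})
staged["amount"] = pd.to_numeric(staged["amount"], errors="coerce")
staged = staged.dropna(subset=["customer", "amount"])
staged["customer"] = staged["customer"].str.strip().str.title()

# Stage 4: move the cleaned data into a storage layer (a small SQLite "warehouse").
with sqlite3.connect("warehouse.db") as conn:
    staged.to_sql("sales_clean", conn, if_exists="replace", index=False)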
Bar chart
A bar chart is one of the basic ways to compare data units to each other. Because of
its simple graphic form, a bar chart is often used in Business Analytics as an interactive page
element.
Bar charts are versatile enough to be modified and show more complex data models. The bars
can be structured in clusters or be stacked, to depict distribution across market segments, or
subcategories of items. The same goes for horizontal bar charts, fitting more for long data
labels to be placed on the bars.
When to use: comparing objects, numeric information. Use horizontal charts to fit long data
labels. Place stacks in bars to break each object into segments for a more detailed comparison.
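The stacked-bar idea described above can be sketched with matplotlib in a few lines; the products and sales figures below are invented purely for illustration.

import matplotlib.pyplot as plt

products = ["Product A", "Product B", "Product C"]
online = [120, 90, 60]        # hypothetical units sold per channel
in_store = [80, 110, 40]

fig, ax = plt.subplots()
ax.bar(products, online, label="Online")
ax.bar(products, in_store, bottom=online, label="In-store")   # stacked segment on top
ax.set_ylabel("Units sold")
ax.set_title("Sales by product, broken into channel segments")
ax.legend()
plt.show()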
Pie chart
This type of chart is used in any marketing or sales department, because it makes it easy to
demonstrate the composition of objects or unit-to-unit comparison.
Line Graph
This type of visual utilizes a horizontal axis and a vertical axis to depict the value of a unit
over time.
Line graphs can also be combined with bar charts to represent data from multiple
dimensions.
When to use: object value on the timeline, depicting tendencies in behavior over time.
Figure: sales analysis by payment methods.
Box plot
At first glance, a box plot looks pretty complicated. But if we look closer at the example, it
becomes evident that it depicts quartiles along a horizontal axis. Our main elements here are the
minimum, the maximum, and the median placed in between the first and third quartiles. What a box
shows is the distribution of objects and their deviation from the median.
When to use: Distribution of the complex object, deviation from the median value.
Figure: a box plot showing the five-number summary (minimum, first quartile, median, third quartile, maximum), with outliers shown as points that fall outside the distribution area.
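A box plot of this kind can be drawn directly with matplotlib; the delivery times below are made up, with one deliberate outlier.

import matplotlib.pyplot as plt

delivery_days = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 12]   # hypothetical data; 12 is an outlier

fig, ax = plt.subplots()
# The box spans the first to third quartile, the line is the median,
# the whiskers show the spread, and isolated points are outliers.
ax.boxplot(delivery_days, vert=False)
ax.set_xlabel("Delivery time (days)")
plt.show()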
Scatter plot
This type of visualization is built on X and Y axes. Between them, there are dots placed
around, defining objects. The position of a dot on the graph denotes which qualities it has.
As in the case of line graphs, dots placed between the axes are noticed in a split second.
The only limitation of this type of visualization is the number of axes.
When to use: showing distribution of objects, defining the quality of each object on the graph.
Figure: a scatter plot showing the inability of young people to earn money.
Radar chart
Its purpose is the same as for a line chart. But because of the number of axes, you can
compare units from various angles and show the inclinations graphically.
When to use: describing data qualities, comparing multiple objects to each other through different
dimensions.
Maps
Superimposing a visualization over a map works for data with a geographical domain. Density maps
are built with the help of dots placed on the map, marking the location of each unit.
Figure: a simple representation of a dot map.
Funnel charts
These are perfect for showing narrowing correlations between different groups of items. In most
cases, funnels will utilize both geometric
form and colour coding to differentiate items.
Figure: a funnel chart showing conversion results, starting from the total traffic number and narrowing to the number of subscribers.
This type of chart is also handy when there are multiple stages in the process. On the example
above, we can see that after the “Contacted Support” stage, the number of subscribers has been
reduced.
When to use: depicting process stages with a narrowing percentage of values/objects.
In choosing the type of visualization, make sure you clearly understand the purpose of the analysis
and the nature of the data to be displayed.
Statistical analysis:
Statistical analysis is the process of collecting and analyzing samples of data to uncover
patterns and trends and predict what could happen next, in order to make better and more scientific
decisions.
Once the data is collected, statistical analysis can be used for many things in your business
(a short Python illustration follows the list below). Some include:
● Summarizing and presenting the data in a graph or chart to present key findings
● Discovering crucial measures within the data, like the mean
● Calculating whether the data is tightly clustered or spread out, which also reveals
similarities.
● Making future predictions based on past behavior
● Testing a hypothesis from an experiment
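As promised above, a few lines of Python with pandas are enough to produce these summary measures; the monthly sales figures are hypothetical.

import pandas as pd

sales = pd.Series([230, 245, 260, 198, 305, 280, 150, 410])   # hypothetical monthly sales

print(sales.mean())       # central tendency (the mean)
print(sales.median())
print(sales.std())        # how tightly clustered or spread out the data are
print(sales.describe())   # a full summary that can feed a chart or report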
There are several ways that businesses can use statistical analysis to their advantage. Some of
these ways include identifying who on your sales staff is performing poorly, finding trends in
customer data, narrowing down the top operating product lines, conducting financial audits, and
getting a better understanding of how sales performance can vary in different regions of the
country.
2. Analytics Life Cycle:
Just like any other thing in business, there is a process involved in business analytics as well.
Business analytics needs to be systematic, organized, and include step-by-step actions to have the
most optimized result at the end with the least amount of discrepancies.
● Business Problem Framing: In this step, we basically find out what business problem we are
trying to solve, e.g., when we are looking to find out why the supply chain isn’t as effective as it
should be or why we are losing sales. This discussion generally happens with stakeholders when
they realize inefficiency in any part of the business.
● Analytics Problem Framing: Once we have the problem statement, what we need to think of
next is how analytics can be done for that business analytics problem. Here, we look for metrics
and specific points that we need to analyze.
● Data: The moment we identify the problem in terms of what needs to be analyzed, the next
thing that we need is data, which needs to be analyzed. In this step, not only do we obtain data
from various data sources but we also clean the data; if the raw data is corrupted or has false
values, we remove those problems and convert the data into usable form.
● Methodology selection and model building: Once the data gets ready, the tricky part begins.
At this stage, we need to determine what methods have to be used and what metrics are the crucial
ones. If required, the team has to build custom models to find out the specific methods that are
suited to respective operations. Many times, the kind of data we possess also dictates the
methodology that can be used to do business analytics. Most organizations make multiple models
and compare them based on the decided-upon crucial metrics.
● Deployment: Post the selection of the model and the statistical ways of analyzing data for the
solution, the next thing we need to do is to test the solution in a real-time scenario. For that, we
deploy the models on the data and look for different kinds of insights. Based on the metrics and
data highlights, we need to decide the optimum strategy to solve our problem and implement a
solution effectively. Even in this phase of business analytics, we will compare the expected output
with the real-time output. Later, based on this, we will decide if there is a need to reiterate and
modify the solution or if we can go on with the implementation of the same.
The Business Analytics process involves asking questions, looking at data, and manipulating it to
find the required answers. Now, every organization has different ways to execute this process as
all of these organizations work in different sectors and value different metrics more than the others
based on their specific business model.
Since the approach to business is different for different organizations, their solutions and their
ways to reach the solutions are also different. Nonetheless, all of the actions that they do can be
classified and generalized to understand their approach. The steps in the Business Analytics
process of a firm are outlined below.
2.1 Six Steps in the Business Analytics Lifecycle
Step 1: Identifying the Business Problem
The first step of the process is identifying the business problem. The problem could be an actual
crisis; it could be something related to recognizing business needs or optimizing current processes.
This is a crucial stage in Business Analytics as it is important to clearly understand what the
expected outcome should be. When the desired outcome is determined, it is further broken down
into smaller goals. Then, business stakeholders decide the relevant data required to solve the
problem. Some important questions must be answered in this stage, such as: What kind of data is
available? Is there sufficient data? And so on.
Step 2: Data Collection and Cleaning
Once the problem statement is defined, the next step is to gather data (if required) and, more
importantly, cleanse the data—most organizations would have plenty of data, but not all data
points would be accurate or useful. Organizations collect huge amounts of data through different
methods, but at times, junk data or empty data points would be present in the dataset. These faulty
pieces of data can hamper the analysis. Hence, it is very important to clean the data that has to be
analyzed.
To do this, you must do computations for the missing data, remove outliers, and find new
variables as a combination of other variables. You may also need to plot time series graphs as they
generally indicate patterns and outliers. It is very important to remove outliers as they can have a
heavy impact on the accuracy of the model that you create. Moreover, cleaning the data helps you
get a better sense of the dataset.
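A minimal pandas sketch of this cleaning work, with a hypothetical file and hypothetical column names, might look like the following.

import pandas as pd

df = pd.read_csv("monthly_sales.csv")                       # hypothetical raw dataset

# Compute (impute) missing values, here with the column median.
df["revenue"] = df["revenue"].fillna(df["revenue"].median())

# Remove outliers lying more than three standard deviations from the mean.
z = (df["revenue"] - df["revenue"].mean()) / df["revenue"].std()
df = df[z.abs() <= 3]

# Create a new variable as a combination of other variables.
df["revenue_per_order"] = df["revenue"] / df["orders"]

# Plot the time series to reveal patterns and any remaining outliers.
df.plot(x="month", y="revenue")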
Step 3: Analysis
Once the data is ready, the next thing to do is analyze it. Now to execute the same, there are
various kinds of statistical methods (such as hypothesis testing, correlation, etc.) involved to find
out the insights that you are looking for. You can use all of the methods for which you have the
data.
The prime way of analyzing is pivoting around the target variable, so you need to take into
account whatever factors affect the target variable. In addition to that, a lot of assumptions are
also considered to find out what the outcomes can be. Generally, at this step, the data is sliced, and
the comparisons are made. Through these methods, you are looking to get actionable insights.
Step 4: Prediction
Gone are the days when analytics was used to react. In today’s era, Business Analytics is all about
being proactive. In this step, you will use prediction techniques, such as neural networks or
decision trees, to model the data. These prediction techniques will help you find out hidden
insights and relationships between variables, which will further help you uncover patterns on the
most important metrics. By principle, a lot of models are used simultaneously, and the models
with the most accuracy are chosen. In this stage, a lot of conditions are also checked as
parameters, and answers to a lot of ‘what if…?’ questions are provided.
Step 5: Optimization and Decision Making
From the insights that you receive from your model built on target variables, a viable plan of
action will be established in this step to meet the organization’s goals and expectations. The said
plan of action is then put to work, and the waiting period begins. You will have to wait to see the
actual outcomes of your predictions and find out how successful you were in your endeavors.
Once you get the outcomes, you will have to measure and evaluate them.
Step 6: Evaluation and Updating the System
Post the implementation of the solution, the outcomes are measured as mentioned above. If you
find some methods through which the plan of action can be optimized, then those can be
implemented. If that is not the case, then you can move on with registering the outcomes of the
entire process. This step is crucial for any analytics in the future because you will have an ever-
improving database. Through this database, you can get closer and closer to maximum
optimization. In this step, it is also important to evaluate the ROI (return on investment).
2.3 Types of Analytics
At the different stages of business analytics, huge amounts of data are processed at various steps.
Depending on the stage of the workflow and the requirement of data analysis, there are four main
kinds of analytics – descriptive, diagnostic, predictive and prescriptive. These four types together
answer everything a company needs to know, from what is going on in the company to what
solutions should be adopted for optimising its functions.
The four types of analytics are usually implemented in stages and no one type of analytics is said
to be better than the other.
Before diving deeper into each of these, let’s define the four types of analytics:
1) Descriptive Analytics: Describing or summarising the existing data using existing business
intelligence tools to better understand what is going on or what has happened.
2) Diagnostic Analytics: Focus on past performance to determine what happened and why.
The result of the analysis is often an analytic dashboard.
3) Predictive Analytics: Emphasizes on predicting the possible outcome using statistical models
and machine learning techniques.
4) Prescriptive Analytics: It is a type of predictive analytics that is used to recommend one or
more courses of action based on the analysis of the data. Let’s understand these in a bit more depth.
2.3.1 Descriptive Analytics
This can be termed the simplest form of analytics. The sheer size of big data is beyond human
comprehension, and the first stage hence involves crunching the data into understandable chunks.
The purpose of this type of analytics is simply to summarise the findings and understand what is going
on.
Among some frequently used terms, what people call advanced analytics or business
intelligence is basically the use of descriptive statistics (arithmetic operations, mean, median, max,
percentage, etc.) on existing data. It is said that 80% of business analytics mainly involves
descriptions based on aggregations of past performance. It is an important step in making raw data
understandable to investors, shareholders and managers. This way, it becomes easy to identify and
address areas of strength and weakness, which helps in strategizing. The two main
techniques involved are data aggregation and data mining; this method is purely used
for understanding the underlying behaviour and not for making any estimations. By mining historical
data, companies can analyze consumer behaviours and engagements with their businesses, which
can be helpful in targeted marketing, service improvement, etc. The tools used in this phase are
MS Excel, MATLAB, SPSS, STATA, etc.
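Descriptive analytics of this kind reduces to simple aggregations of past performance; a short pandas sketch with hypothetical column names illustrates the idea.

import pandas as pd

orders = pd.read_csv("orders.csv")            # hypothetical historical sales records

# Aggregate past performance: revenue summarised per region and product line.
summary = (orders
           .groupby(["region", "product_line"])["revenue"]
           .agg(["count", "sum", "mean", "median", "max"]))
print(summary)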
2.3.2 Diagnostic Analytics
Diagnostic analytics is used to determine why something happened in the past. It is characterized
by techniques such as drill-down, data discovery, data mining and correlations. Diagnostic
analytics takes a deeper look at data to understand the root causes of the events. It is helpful in
determining what factors and events contributed to the outcome. It mostly uses probabilities,
likelihoods, and the distribution of outcomes for the analysis.
In a time series of sales data, diagnostic analytics would help you understand why the sales
have decreased or increased for a specific year or period. However, this type of analytics has a limited
ability to give actionable insights. It just provides an understanding of causal relationships and
sequences while looking backward.
A few techniques that use diagnostic analytics include attribute importance, principal
components analysis, sensitivity analysis, and conjoint analysis. Training algorithms for
classification and regression also fall in this type of analytics.
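The drill-down and correlation techniques mentioned above can be sketched in a few lines of pandas; the dataset and its columns are hypothetical.

import pandas as pd

sales = pd.read_csv("weekly_sales.csv")       # hypothetical columns: sales, price, promo_spend, stockouts

# Correlations give a first hint at which factors moved together with the outcome.
print(sales.corr(numeric_only=True)["sales"].sort_values())

# Drill down: compare the weakest sales weeks against the rest to look for root causes.
weak_weeks = sales[sales["sales"] < sales["sales"].quantile(0.25)]
print(weak_weeks[["price", "promo_spend", "stockouts"]].mean())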
2.3.3 Predictive Analytics
As mentioned above, predictive analytics is used to predict future outcomes. However, it is
important to note that it cannot predict if an event will occur in the future; it merely forecasts what
are the probabilities of the occurrence of the event. A predictive model builds on the preliminary
descriptive analytics stage to derive the possibility of the outcomes.
This type of analytics is found in sentiment analysis, where all the opinions posted on social media are
collected and analyzed (existing text data) to predict a person’s sentiment on a particular subject
as positive, negative or neutral (future prediction).
Hence, predictive analytics includes building and validation of models that provide accurate
predictions. Predictive analytics relies on machine learning algorithms like random forests, SVM,
etc. and statistics for learning and testing the data. Usually, companies need trained data scientists
and machine learning experts for building these models. The most popular tools for predictive
analytics include Python, R, RapidMiner, etc.
The prediction of future data relies on the existing data as it cannot be obtained otherwise. If the
model is properly tuned, it can be used to support complex forecasts in sales and marketing. It
goes a step ahead of the standard BI in giving accurate predictions.
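A minimal scikit-learn sketch of building and validating such a predictive model is shown below; the customer file, feature columns, and churn target are all hypothetical.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

data = pd.read_csv("customers.csv")                      # hypothetical historical data
X = data[["age", "tenure_months", "monthly_spend"]]      # features
y = data["churned"]                                      # outcome to predict

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Validation: score the model on data it has never seen.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))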
2.3.4 Prescriptive Analytics
The basis of this analytics is predictive analytics, but it goes beyond the three types mentioned
above to suggest future solutions. It can suggest all favourable outcomes according to a
specified course of action and can also suggest various courses of action to get to a particular outcome.
Hence, it uses a strong feedback system that constantly learns and updates the relationship
between the action and the outcome.
The computations include optimizations of some functions that are related to the desired outcome.
For example, while calling for a cab online, the application uses GPS to connect you to the correct
driver from among a number of drivers found nearby. Hence, it optimizes the distance for faster
arrival time. Recommendation engines also use prescriptive analytics.
The other approach includes simulation, where all the key performance areas are combined to
design the correct solutions. It ensures that the key performance metrics are included in
the solution. The optimization model will further work on the impact of the previously made
forecasts. Because of its power to suggest favourable solutions, prescriptive analytics is the final
frontier of advanced analytics, or data science in today’s terms.
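Prescriptive analytics usually reduces to an optimisation over some objective function. The sketch below uses scipy.optimize.linprog with made-up profits and capacities to choose production quantities that maximise profit.

from scipy.optimize import linprog

# Maximise profit 40*x1 + 30*x2 (linprog minimises, so the coefficients are negated).
profit = [-40, -30]

# Constraints: 2*x1 + 1*x2 <= 100 machine hours, 1*x1 + 2*x2 <= 80 kg of material.
A = [[2, 1], [1, 2]]
b = [100, 80]

result = linprog(c=profit, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print("optimal quantities:", result.x)
print("maximum profit:", -result.fun)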
The four techniques in analytics may make it seem as if they need to be implemented sequentially.
However, in most scenarios, companies can jump directly to prescriptive analytics. Most
companies are aware of, or are already implementing, descriptive analytics, but if one has
identified the key area that needs to be optimized and worked upon, they must employ prescriptive
analytics to reach the desired outcome.
According to research, prescriptive analytics is still at the budding stage and not many firms have
completely used its power. However, the advancements in predictive analytics will surely pave the
way for its development.
3. Business Problem Definition:
Understanding the importance of problem-solving skills in the workplace will help you
develop as a leader. Problem-solving skills will help you resolve critical issues and conflicts that
you come across, which is why problem-solving is such a valued skill in the workplace.
There are many different problem-solving skills, but most can be broken into general steps.
Here is a four-step method for business problem solving:
1) Identify the Details of the Problem: Gather enough information to accurately define the
problem. This can include data on procedures being used, employee actions, relevant workplace
rules, and so on. Write down the specific outcome that is needed, but don’t assume what the
solution should be.
2) Creatively Brainstorm Solutions: Alone or with a team, state every solution you can think of.
You’ll often need to write them down. To get more solutions, brainstorm with the employees who
have the greatest knowledge of the issue.
3) Evaluate Solutions and Make a Decision: Compare and contrast alternative solutions based
on the feasibility of each one, including the resources needed to implement it and the return on
investment of each one. Finally, make a firm decision on one solution that clearly addresses the
root cause of the problem.
4) Take Action: Write up a detailed plan for implementing the solution, get the necessary
approvals, and put it into action.
4. Data Collection:
In the data life cycle, data collection is the second step. After data is generated, it must be
collected to be of use to your team. After that, it can be processed, stored, managed, analyzed, and
visualized to aid in your organization’s decision-making.
Before collecting data, there are several factors you need to define:
● The question you aim to answer
● The data subject(s) you need to collect data from
● The collection timeframe
● The data collection method(s) best suited to your needs
The data collection method you select should be based on the question you want to answer, the
type of data you need, your timeframe, and your company’s budget. Explore the options in the
next section to see which data collection method is the best fit.
4.1 SEVEN DATA COLLECTION METHODS USED IN BUSINESS
ANALYTICS
1. Surveys
Surveys are physical or digital questionnaires that gather both qualitative and quantitative
data from subjects. One situation in which you might conduct a survey is gathering attendee
feedback after an event. This can provide a sense of what attendees enjoyed, what they wish was
different, and areas you can improve or save money on during your next event for a similar
audience.
Because they can be sent out physically or digitally, surveys present the opportunity for
distribution at scale. They can also be inexpensive; running a survey can cost nothing if you use a
free tool. If you wish to target a specific group of people, partnering with a market research firm to
get the survey in the hands of that demographic may be worth the money.
Something to watch out for when crafting and running surveys is the effect of bias, including:
● Collection bias: It can be easy to accidentally write survey questions with a biased lean.
Watch out for this when creating questions to ensure your subjects answer honestly and aren’t
swayed by your wording.
● Subject bias: Because your subjects know their responses will be read by you, their
answers may be biased toward what seems socially acceptable. For this reason, consider pairing
survey data with behavioral data from other collection methods to get the full picture.
2. Transactional Tracking
Each time your customers make a purchase, tracking that data can allow you to make
decisions about targeted marketing efforts and understand your customer base better.
Often, e-commerce and point-of-sale platforms allow you to store data as soon as it’s generated,
making this a seamless data collection method that can pay off in the form of customer insights.
3. Interviews and Focus Groups
Interviews and focus groups consist of talking to subjects face-to-face about a specific
topic or issue. Interviews tend to be one-on-one, and focus groups are typically made up of several
people. You can use both to gather qualitative and quantitative data.
Through interviews and focus groups, you can gather feedback from people in your target
audience about new product features. Seeing them interact with your product in real-time and
recording their reactions and responses to questions can provide valuable data about which
product features to pursue.
As is the case with surveys, these collection methods allow you to ask subjects anything you want
about their opinions, motivations, and feelings regarding your product or brand. It also introduces
the potential for bias. Aim to craft questions that don’t lead them in one particular direction.
One downside of interviewing and conducting focus groups is they can be time-consuming
and expensive. If you plan to conduct them yourself, it can be a lengthy process. To avoid this,
you can hire a market research facilitator to organize and conduct interviews on your behalf.
4. Observation
Observing people interacting with your website or product can be useful for data collection
because of the candour it offers. If your user experience is confusing or difficult, you can witness
it in real-time.
Yet, setting up observation sessions can be difficult. You can use a third-party tool to
record users’ journeys through your site or observe a user’s interaction with a beta version of your
site or product.
While less accessible than other data collection methods, observations enable you to see
first hand how users interact with your product or site. You can leverage the qualitative and
quantitative data gleaned from this to make improvements and double down on points of success.
5. Online Tracking
To gather behavioural data, you can implement pixels and cookies. These are both tools
that track users’ online behaviour across websites and provide insight into what content they’re
interested in and typically engage with.
You can also track users’ behavior on your company’s website, including which parts are
of the highest interest, whether users are confused when using it, and how long they spend on
product pages. This can enable you to improve the website’s design and help users navigate to
their destination.
Inserting a pixel is often free and relatively easy to set up. Implementing cookies may
come with a fee but could be worth it for the quality of data you’ll receive. Once pixels and
cookies are set, they gather data on their own and don’t need much maintenance, if any.
It’s important to note: Tracking online behavior can have legal and ethical privacy
implications. Before tracking users’ online behavior, ensure you’re in compliance with local and
industry data privacy standards.
6. Forms
Online forms are beneficial for gathering qualitative data about users, specifically
demographic data or contact information. They’re relatively inexpensive and simple to set up, and
you can use them to gate content or registrations, such as webinars and email newsletters.
You can then use this data to contact people who may be interested in your product, build
out demographic profiles of existing customers, and in remarketing efforts, such as email
workflows and content recommendations.
5. Data Preparation:
Data preparation, also sometimes called “pre-processing,” is the act of cleaning and
consolidating raw data prior to using it for business analysis. It might not be the most celebrated of
tasks, but careful data preparation is a key component of successful data analysis.
Doing the work to properly validate, clean, and augment raw data is essential to draw
accurate, meaningful insights from it. The validity and power of any business analysis produced is
only as good as the data preparation done in the early stages.
The decisions that business leaders make are only as good as the data that supports them.
Careful and comprehensive data preparation ensures analysts trust, understand, and ask better
questions of their data, making their analyses more accurate and meaningful. From more
meaningful data analysis comes better insights and, of course, better outcomes.
To drive the deepest level of analysis and insight, successful teams and organizations must
implement a data preparation strategy that prioritizes:
● Accessibility: Anyone — regardless of skillset — should be able to access data securely
from a single source of truth
● Transparency: Anyone should be able to see, audit, and refine any step in the end-to-end
data preparation process that took place
● Repeatability: Data preparation is notorious for being time-consuming and repetitive,
which is why successful data preparation strategies invest in solutions built for repeatability.
The data preparation process typically includes the following steps:
● Acquiring data: Determining what data is needed, gathering it, and establishing consistent
access to build powerful, trusted analysis
● Exploring data: Determining the data’s quality, examining its distribution, and analyzing
the relationship between each variable to better understand how to compose an analysis
● Cleansing data: Improving data quality and overall productivity to craft error-proof
insights
● Transforming data: Formatting, orienting, aggregating, and enriching the datasets used in
an analysis to produce more meaningful insights
While data preparation processes build upon each other in a serialized fashion, it’s not always
linear. The order of these steps might shift depending on the data and questions being asked. It’s
common to revisit a previous step as new insights are uncovered or new data sources are
integrated into the process.
The entire data preparation process can be notoriously time-intensive, iterative, and repetitive.
That’s why it’s important to ensure the individual steps taken can be easily understood, repeated,
revisited, and revised so analysts can spend less time prepping and more time analyzing.
6. Hypothesis Generation:
A hypothesis is a tentative statement about the relationship between two or more variables.
In order to find the plausibility of a hypothesis, the researcher will have to test the
hypothesis using hypothesis testing methods. Unlike a hypothesis, which is ‘supposed’ to stand true
on the basis of little or no evidence, hypothesis testing is required to have plausible evidence in
order to establish that a statistical hypothesis is true.
6.1.1 Alternative Hypothesis
The Alternative Hypothesis (H1) implies that the two variables are related to each other and that
the relationship that exists between them is not due to chance or coincidence.
When the process of hypothesis testing is carried out, the alternative hypothesis is the main subject
of the testing process. The analyst intends to test the alternative hypothesis and verify its
plausibility.
6.1.2 Null Hypothesis
The Null Hypothesis (H0) aims to nullify the alternative hypothesis by implying that there
exists no relation between two variables in statistics. It states that the effect of one variable on the
other is solely due to chance and no empirical cause lies behind it.
The null hypothesis is established alongside the alternative hypothesis and is recognized as
important as the latter. In hypothesis testing, the null hypothesis has a major role to play as it
influences the testing against the alternative hypothesis.
6.1.3 Non-Directional Hypothesis
The Non-directional hypothesis states that the relation between two variables has no
direction. Simply put, it asserts that there exists a relation between two variables, but does not
recognize the direction of effect, whether variable A affects variable B or vice versa.
6.1.4 Directional Hypothesis
The Directional hypothesis, on the other hand, asserts the direction of effect of the
relationship that exists between two variables.
Herein, the hypothesis clearly states that variable A affects variable B, or vice versa.
6.1.5 Statistical Hypothesis
A statistical hypothesis is a hypothesis that can be verified to be plausible on the basis of
statistics. By using data sampling and statistical knowledge, one can determine the plausibility of a
statistical hypothesis and find out if it stands true or not.
6.2 Type 1 and Type 2 Error
A hypothesis test can result in two types of errors.
Type 1 Error: A Type-I error occurs when the sample results lead to rejecting the null hypothesis
even though it is true.
Type 2 Error: A Type-II error occurs when the null hypothesis is not rejected even though it is
false.
Example:
Suppose a teacher evaluates the examination paper to decide whether a student passes or fails.
Type I error will be the teacher failing the student [rejects H0] although the student scored the
passing marks [H0 was true].
Type II error will be the case where the teacher passes the student [do not reject H0] although the
student did not score the passing marks [H1 is true].
Hypothesis tests can be either one-tailed or two-tailed:
1. One-tailed (or one-sided) test: Tests for the significance of an effect in only one direction,
either positive or negative.
2. Two-tailed (or two-sided) test: Tests for the significance of an effect in both directions,
allowing for the possibility of a positive or negative effect.
Let us perform hypothesis testing through the following 7 steps of the procedure. As an example,
suppose a laboratory analyses a certified reference material whose certified sodium content is
250 mg/kg, making n = 7 replicate measurements.
Step 1: State the hypotheses
The null hypothesis H0 is the statement that we are interested in testing. In this case, the
null condition is that the mean value is 250 mg/kg of sodium.
The alternative hypothesis H1 is the statement that we accept if our sample outcome
leads us to reject the null hypothesis. In our case, the alternative hypothesis is that the mean value
is not equal to 250 mg/kg of sodium. In other words, it can be significantly larger or smaller than
the value of 250 mg/kg.
So, our formal statement of the hypotheses for this example is as follows:
H0 : μ = 250 mg/kg
H1 : μ ≠ 250 mg/kg (i.e., indicating that the laboratory result is biased).
Step 2: Choose the level of significance
Traditionally, we define the unlikely (given by the symbol α) as 0.05 (5%) or less. However,
there is nothing to stop you from using α = 0.1 (10%) or α = 0.01 (1%) with your own
justification or reasoning.
In fact, the significance level is sometimes referred to as the probability of a Type I error.
A Type I error occurs when you falsely reject the null hypothesis when it is in fact true, while a
Type II error occurs when you fail to reject the null hypothesis when it is false.
Step 3: Select the test statistic
The test statistic is the value calculated from the sample to determine whether to reject the null
hypothesis. In this case, we use Student’s t-test statistic in the following manner:
μ = x̄ ± t(α = 0.05, ν = n − 1) × s / √n
or
t(α = 0.05, ν = n − 1) = |x̄ − μ| × √n / s
By calculation, we get a t-value of 3.024 at the significance level of α = 0.05 and ν = (7 − 1) or 6
degrees of freedom for n = 7 replicates.
Step 4: State the decision rule
We reject the null hypothesis H0 if the test statistic is larger than the critical value
corresponding to the significance level chosen in step 2.
There is now a question in H1 of either a one-tailed (> or <) or a two-tailed (≠, not equal) test
to be addressed. If we are talking about either “greater than” or “smaller than”, we take the
significance level at α = 0.05, whilst for the unequal case (meaning the result can be either larger or
smaller than the certified value), a significance level of α = 0.025 on either side of the normal
curve is to be used.
As our H1 is for the mean value to be larger or smaller than the certified value, we use the
2-tailed t-test for α = 0.05 with 6 degrees of freedom. In this case, the t-critical value at α =
0.05 and 6 degrees of freedom is 2.447 from the Student’s t-table or from using the Excel function
“=T.INV.2T(0.05,6)” or “=TINV(0.05,6)” in older Excel versions.
Step 5: Compute the test statistic from the sample data
Upon calculation on the sample data, we get a t-value of 3.024 at the significance
level of α = 0.05 and ν = (7 − 1) or 6 degrees of freedom for n = 7 replicates.
Step 6: Make the decision
When we compare the result of step 5 to the decision rule in step 4, it is obvious that 3.024
is greater than the t-critical value of 2.447, and so we reject the null hypothesis. In other words,
the mean value of 274 mg/kg is significantly different from the certified value of 250 mg/kg.
Is it really so? We must go to step 7.
Step 7: Interpret the result with caution
Since hypothesis testing involves probability in the guise of a significance level, we must
interpret the final decision with caution. To say that a result is “statistically significant” sounds
remarkable, but all it really means is that the difference is greater than would be expected by
chance alone.
To do justice, it would be useful to look at the actual data to see if there are one or more
high outliers pulling up the mean value. Perhaps increasing the number of replicates might show
up any undesirable data. Furthermore, we might have to take a closer look at the test procedure
and the technical competence of the analyst to see if there were any lapses in the analytical
process. A repeated series of experiments should be able to confirm these findings.
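The worked example above can be reproduced in a few lines with scipy. Only summary statistics appear in these notes (mean 274 mg/kg, certified value 250 mg/kg, n = 7); the sample standard deviation of about 21 mg/kg used below is back-calculated from the reported t-value of 3.024, so treat it as an assumption.

from math import sqrt
from scipy import stats

x_bar, mu0, s, n = 274.0, 250.0, 21.0, 7        # s is assumed (back-calculated), not given in the notes

t_value = abs(x_bar - mu0) * sqrt(n) / s        # test statistic from step 3
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1)    # two-tailed critical value at alpha = 0.05

print(round(t_value, 3), round(t_crit, 3))      # roughly 3.02 versus 2.447
if t_value > t_crit:
    print("Reject H0: the mean differs significantly from 250 mg/kg")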
Hypothesis testing has some limitations that researchers should be aware of:
1. It cannot prove or establish the truth: Hypothesis testing provides evidence to support or
reject a hypothesis, but it cannot confirm the absolute truth of the research question.
2. Possible errors: During hypothesis testing, there is a chance of committing a Type I error
(rejecting a true null hypothesis) or a Type II error (failing to reject a false null hypothesis).
7. Modeling
7.1 Decision Models
A decision model typically has three types of input:
1. Data, which are assumed to be constant for purposes of the model. Some examples would
be costs, machine capacities, and intercity distances.
2. Uncontrollable variables, which are quantities that can change but cannot be directly
controlled by the decision maker. Some examples would be customer demand, inflation rates, and
investment returns. Often, these variables are uncertain.
3. Decision variables, which are controllable and can be selected at the discretion of the
decision maker. Some examples would be production quantities, staffing levels, and investment
allocations. Decision models characterize the relationships among the data, uncontrollable
variables, and decision variables, and the outputs of interest to the decision maker.
Decision models can be represented in various ways, most typically with mathematical functions
and spreadsheets. Spreadsheets are ideal vehicles for implementing decision models because of
their versatility in managing data, evaluating different scenarios, and presenting results in a
meaningful fashion. For example, suppose the total cost of production is the sum of a fixed cost and
a variable cost per unit multiplied by the quantity produced. Using this relationship, we may develop
a mathematical representation by defining symbols for each of these quantities:
TC = total cost
V = unit variable cost
F = fixed cost
Q = quantity produced
This results in the model TC = F + VQ
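The model TC = F + VQ translates directly into code (or a spreadsheet formula); the fixed cost, unit variable cost, and quantity below are hypothetical.

def total_cost(fixed_cost, unit_variable_cost, quantity):
    """Decision model TC = F + V * Q."""
    return fixed_cost + unit_variable_cost * quantity

# Evaluate one scenario: F = 50,000, V = 12 per unit, Q = 4,000 units produced.
print(total_cost(50_000, 12, 4_000))   # 98,000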
7.1.2 Model Assumptions:
All models are based on assumptions that reflect the modeler’s view of the “real world.”
Some assumptions are made to simplify the model and make it more tractable; that is, able to be
easily analyzed or solved. Other assumptions might be made to better characterize historical data
or past observations. The task of the modeler is to select or build an appropriate model that best
represents the behavior of the real situation. For example, economic theory tells us that demand
for a product is negatively related to its price. Thus, as prices increase, demand falls, and vice
versa (a phenomenon that you may recognize as price elasticity—the ratio of the percentage
change in demand to the percentage change in price). Different mathematical models can describe
this phenomenon.
7.2 Prescriptive Decision Models
A prescriptive decision model helps decision makers to identify the best solution to a
decision problem. Optimization is the process of finding a set of values for decision variables that
minimize or maximize some quantity of interest—profit, revenue, cost, time, and so on—called
the objective function. Any set of decision variables that optimizes the objective function is
called an optimal solution. In a highly competitive world where one percentage point can mean a
difference of hundreds of thousands of dollars or more, knowing the best solution can mean the
difference between success and failure.
Prescriptive decision models can be either deterministic or stochastic. A deterministic
model is one in which all model input information is either known or assumed to be known with
certainty. A stochastic model is one in which some of the model input information is uncertain.
For instance, suppose that customer demand is an important element of some model. We can make
the assumption that the demand is known with certainty; say, 5,000 units per month. In this case
we would be dealing with a deterministic model. On the other hand, suppose we have evidence to
indicate that demand is uncertain, with an average value of 5,000 units per month, but which
typically varies between 3,200 and 6,800 units. If we make this assumption, we would be dealing
with a stochastic model.
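The distinction can be shown with a short simulation sketch. The deterministic version plugs in the single known demand of 5,000 units; the stochastic version draws uncertain demand between 3,200 and 6,800 units (a simple uniform assumption) and inspects the spread of outcomes. The price and cost figures are hypothetical.

import numpy as np

price, unit_cost, fixed_cost = 25.0, 15.0, 30_000.0    # hypothetical model data

# Deterministic model: demand known with certainty.
demand = 5_000
print("deterministic profit:", price * demand - unit_cost * demand - fixed_cost)

# Stochastic model: demand uncertain, averaging 5,000 but varying between 3,200 and 6,800.
rng = np.random.default_rng(0)
demand_draws = rng.uniform(3_200, 6_800, size=10_000)
profits = price * demand_draws - unit_cost * demand_draws - fixed_cost
print("mean simulated profit:", profits.mean())
print("5th percentile of profit:", np.percentile(profits, 5))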
8. Model Validation
Model validation is defined within regulatory guidance as “the set of processes and
activities intended to verify that models are performing as expected, in line with their design
objectives, and business uses.” It also identifies “potential limitations and assumptions, and
assesses their possible impact.”
Generally, validation activities are performed by individuals independent of model
development or use; models, therefore, should not be validated by their owners. However, models
can be highly technical, and some institutions may find it difficult to assemble a model risk team
that has sufficient functional and technical expertise to carry out independent validation. When
faced with this obstacle, institutions often outsource the validation task to third parties.
In statistics, model validation is the task of confirming that the outputs of a statistical
model are acceptable with respect to the real data-generating process. In other words, model
validation is the task of confirming that the outputs of a statistical model have enough fidelity to
the outputs of the data-generating process that the objectives of the investigation can be achieved.
8.1 The Four Elements
Model validation consists of four crucial elements which should be considered:
1. Conceptual Design
The foundation of any model validation is its conceptual design, which needs documented
coverage assessment that supports the model’s ability to meet business and regulatory needs and
the unique risks facing a bank.
The design and capabilities of a model can have a profound effect on the overall
effectiveness of a bank’s ability to identify and respond to risks. For example, a poorly designed
risk assessment model may result in a bank establishing relationships with clients that present a
risk that is greater than its risk appetite, thus exposing the bank to regulatory scrutiny and
reputation damage.
A validation should independently challenge the underlying conceptual design and ensure that
documentation is appropriate to support the model’s logic and the model’s ability to achieve
desired regulatory and business outcomes for which it is designed.
2. System Validation
All technology and automated systems implemented to support models have limitations.
An effective validation includes: firstly, evaluating the processes used to integrate the model’s
conceptual design and functionality into the organisation’s business setting; and, secondly,
examining the processes implemented to execute the model’s overall design. Where gaps or
limitations are observed, controls should be evaluated to enable the model to function effectively.
3. Data Validation and Quality Assessment
Data errors or irregularities impair results and might lead to an organisation’s failure to
identify and respond to risks. Best practice indicates that institutions should apply a risk-based
data validation, which enables the reviewer to consider risks unique to the organisation and the
model.
To establish a robust framework for data validation, guidance indicates that the accuracy of source
data be assessed. This is a vital step because data can be derived from a variety of sources, some
of which might lack controls on data integrity, so the data might be incomplete or inaccurate.
4. Process Validation
To verify that a model is operating effectively, it is important to prove that the established
processes for the model’s ongoing administration, including governance policies and procedures,
support the model’s sustainability. A review of the processes also determines whether the models
are producing output that is accurate, managed effectively, and subject to the appropriate controls.
If done effectively, model validation will enable your bank to have every confidence in its various
models’ accuracy, as well as aligning them with the bank’s business and regulatory expectations.
By failing to validate models, banks increase the risk of regulatory criticism, fines, and penalties.
The complex and resource-intensive nature of validation makes it necessary to dedicate
sufficient resources to it. An independent validation team well versed in data management,
technology, and relevant financial products or services — for example, credit, capital
management, insurance, or financial crime compliance — is vital for success. Where shortfalls in
the validation process are identified, timely remedial actions should be taken to close the gaps.
Data Validation in Excel
The following example is an introduction to data validation in Excel. The data validation
button under the data tab provides the user with different types of data validation checks based on
the data type in the cell. It also allows the user to define custom validation checks using Excel
formulas. The data validation can be found in the Data Tools section of the Data tab in the ribbon
of Excel:
The example below illustrates a case of data entry, where the province must be entered for every
store location. Since stores are only located in certain provinces, any incorrect entry should be
caught. It is accomplished in Excel using a two-fold data validation. First, the relevant provinces
are incorporated into a drop-down menu that allows the user to select from a list of valid
provinces.
Fig. 2: First level of data validation
Second, if the user inputs a wrong province by mistake, such as “NY” instead of “NS,” the system
warns the user of the incorrect input.
Further, if the user ignores the warning, an analysis can be conducted using the data validation
feature in Excel that identifies incorrect inputs.
Fig. 4: Final level of data validation
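The same two-fold check can be scripted outside Excel. The pandas sketch below (with a hypothetical list of valid provinces and hypothetical store records) flags any entry that is not in the allowed list.

import pandas as pd

valid_provinces = {"NS", "NB", "ON", "QC"}                  # hypothetical list of valid codes

stores = pd.DataFrame({"store": ["S1", "S2", "S3"],
                       "province": ["NS", "NY", "ON"]})     # "NY" is an incorrect input

invalid = stores[~stores["province"].isin(valid_provinces)]
print(invalid)                                              # rows that fail validation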
9. Interpretation:
Data interpretation is the process of reviewing data and drawing meaningful conclusions
using a variety of analytical approaches. Data interpretation aids researchers in categorizing,
manipulating, and summarising data in order to make sound business decisions. The end goal of a
data interpretation project might be, for example, to develop a good marketing strategy or to expand
the client user base.
There are certain steps followed to conduct data interpretation:
● Putting together the data you’ll need (neglecting irrelevant data)
● Developing the initial research or identifying the most important inputs
● Sorting and filtering of data
● Forming conclusions on the data
● Developing recommendations or practical solutions
Qualitative data can be gathered and interpreted through several methods:
Groups of people: To develop a collaborative discussion about a study issue, group people and
ask them pertinent questions.
Research: Similar to how patterns of behavior may be noticed, different forms of documentation
resources can be classified and split into categories based on the type of information they include.
Interviews: Interviews are one of the most effective ways to get narrative data. Themes, topics, and
categories can be used to group inquiry replies. The interview method enables extremely targeted
data segmentation.
Content Analysis
This is a popular method for analyzing qualitative data. Other approaches to analysis may
fall under the general category of content analysis. An aspect of the content analysis is thematic
analysis. By classifying material into words, concepts, and themes, content analysis is used to
uncover patterns that arise from the text.
Narrative Analysis
The focus of narrative analysis is on people’s experiences and the language they use to
make sense of them. It’s especially effective for acquiring a thorough insight into customers’
viewpoints on a certain topic. We might be able to describe the results of a targeted case study
using narrative analysis.
Discourse Analysis
Discourse analysis is a technique for gaining a comprehensive knowledge of the political,
cultural, and power dynamics that exist in a given scenario. The emphasis here is on how people
express themselves in various social settings. Brand strategists frequently utilize discourse
analysis to figure out why a group of individuals reacts the way they do to a brand or product.
It’s critical to be very clear on the type and scope of the study topic in order to get the most
out of the analytical process. This will assist you in determining which research collection routes
are most likely to assist you in answering your query.
Your approach to qualitative data analysis will differ depending on whether you are a
corporation attempting to understand consumer sentiment or an academic surveying a school.
Quantitative data interpretation is used in several ways:
● For starters, it is used to compare and contrast groupings. For instance, consider the
popularity of certain car brands in different colors.
● It is also used to put scientifically sound theories to the test. Consider a hypothesis
concerning the effect of a certain vaccination.
Regression analysis
A collection of statistical procedures for estimating the connections between a dependent
variable and one or more independent variables is known as regression analysis. It may be
used to determine the strength of a relationship across variables and to predict how they will
interact in the future.
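A minimal regression sketch with scikit-learn; the advertising spend and sales figures are invented for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

ad_spend = np.array([[10], [20], [30], [40], [50]])    # independent variable (in thousands)
sales = np.array([25, 41, 62, 78, 101])                # dependent variable (in thousands)

model = LinearRegression().fit(ad_spend, sales)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("predicted sales at a spend of 60:", model.predict([[60]])[0])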
Cohort Analysis
Cohort analysis is a technique for determining how engaged users are over time. It is useful
for determining whether user engagement is genuinely improving over time or only appears to be
improving because of growth. Cohort analysis is valuable because it helps to distinguish between
growth and engagement measures; it involves watching how the behaviour of groups of people
develops over time.
Predictive Analysis
By examining historical and present data, the predictive analytic approach seeks to forecast
future trends. Predictive analytics approaches, which are powered by machine learning and deep
learning, allow firms to notice patterns or possible challenges ahead of time and prepare educated
initiatives. Predictive analytics is being used by businesses to address issues and identify new
possibilities.
Conjoint Analysis
Conjoint analysis is the best market research method for determining how much customers
appreciate a product’s or service’s qualities. This widely utilized method mixes real-life scenarios
and statistical tools with market decision models.
Cluster analysis
Any organization that wants to identify distinct groupings of consumers, sales transactions,
or other sorts of behaviors and items may use cluster analysis as a valuable data-mining technique.
The goal of cluster analysis is to uncover groupings of subjects that are similar, where “similarity”
between each pair of subjects refers to a global assessment of the entire collection of features.
Cluster analysis, similar to factor analysis, deals with data matrices in which the variables haven’t
been partitioned into criteria and predictor subsets previously.
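A short sketch of cluster analysis using scikit-learn’s KMeans on made-up customer data; the two features (annual spend and number of visits) are hypothetical.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers described by annual spend and number of visits.
customers = np.array([[200, 4], [220, 5], [800, 20], [750, 18], [90, 1], [110, 2]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print("cluster labels:", kmeans.labels_)            # which grouping each customer falls into
print("cluster centres:", kmeans.cluster_centers_)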
10. Deployment and Iteration:
The iterative process is the practice of building, refining, and improving a project, product,
or initiative. Teams that use the iterative development process create, test, and revise until they’re
satisfied with the end result. You can think of an iterative process as a trial-and-error methodology
that brings your project closer to its end goal.
Iterative processes are a fundamental part of lean methodologies and Agile project
management—but these processes can be implemented by any team, not just Agile ones. During
the iterative process, you will continually improve your design, product, or project until you and
your team are satisfied with the final project deliverable.
10.1 The benefits and challenges of the iterative process
The iterative model isn’t right for every team—or every project. Here are the main pros
and cons of the iterative process for your team.
Pros:
● Increased efficiency. Because the iterative process embraces trial and error, it can often
help you achieve your desired result faster than a non-iterative process.
● Increased collaboration. Instead of working from predetermined plans and specs (which
also take a lot of time to create), your team is actively working together.
● Increased adaptability. As you learn new things during the implementation and testing
phases, you can tweak your iteration to best hit your goals—even if that means doing
something you didn’t expect to be doing at the start of the iterative process.
● More cost effective. If you need to change the scope of the project, you’ll only have
invested the minimum time and effort into the process.
● Ability to work in parallel. Unlike other, non-iterative methodologies like the waterfall
method, iterations aren’t necessarily dependent on the work that comes before them. Team
members can work on several elements of the project in parallel, which can shorten your
overall timeline.
● Reduced project-level risk. In the iterative process, risks are identified and addressed
during each iteration. Instead of solving for large risks at the beginning and end of the
project, you’re consistently working to resolve low-level risks.
● More reliable user feedback. When you have an iteration that users can interact with or see,
they’re able to give you incremental feedback about what works or doesn’t work for them.
Cons:
● Increased risk of scope creep. Because of the trial-and-error nature of the iterative process,
your project could develop in ways you didn’t expect and exceed your original project
scope.
● Inflexible planning and requirements. The first step of the iterative process is to define
your project requirements. Changing these requirements during the iterative process can
break the flow of your work, and cause you to create iterations that don’t serve your
project’s purpose.
● Vague timelines. Because team members will create, test, and revise iterations until they
get to a satisfying solution, the iterative timeline isn’t clearly defined. Additionally,
testing for different increments can vary in length, which also impacts the overall
iterative process timeline.