Business Analytics Unit 1 Notes
Analytics and data science - Analytics life cycle - Types of analytics - Business problem definition - Data collection - Data preparation - Hypothesis generation - Modeling - Validation and evaluation - Interpretation - Deployment and iteration.
1.1 Analytics and data science
Analytics:
Analytics is a body of knowledge consisting of statistical, mathematical, and operations
research techniques; artificial intelligence techniques such as machine learning and deep
learning algorithms; data collection and storage; and data management processes such as data
extraction, transformation, and loading (ETL).
Business Analytics
• Definition: Practice of using data analysis and statistical methods for insights and informed decision
making.
• Key Focus: Collection, processing, and interpretation of large data volumes.
• Goal: Uncover patterns, trends, and correlations to drive strategic and operational improvements.
Techniques in Business Analytics
Common techniques include regression analysis, cohort analysis, predictive and prescriptive analytics, conjoint analysis, and cluster analysis (each is described later in this unit).
Applications
Marketing, sales, operations, finance, supply chain, customer service.
Benefits:
• Data-driven decisions
• Process optimization
• Performance improvement
• Market opportunity identification
• Risk mitigation
• Competitive advantage
•Data-Driven Decision Making
–Enables informed, evidence-based decisions, leading to accurate and reliable outcomes.
•Improved Operational Efficiency
–Optimizes processes, reduces costs, and enhances resource allocation.
•Enhanced Business Performance
–Provides insights into customer behavior and market trends to drive growth and satisfaction.
•Improved Risk Management
–Identifies and mitigates risks, detects fraud, and enhances compliance.
•Personalized Customer Experiences
–Tailors products, services, and marketing to individual customer needs, boosting loyalty.
•Competitive Advantage
–Offers a strategic edge through data-driven decisions and anticipation of market trends.
•Improved Marketing and Sales Effectiveness
–Optimizes campaigns, targets the right audience, and enhances customer engagement.
•Innovation and New Product Development
–Identifies market gaps and customer needs, driving innovation and product improvement.
•Continuous Improvement
–Fosters a culture of refinement through data insights and performance tracking.
•Efficient Resource Allocation
–Optimizes allocation of budgets and resources, improving overall utilization and reducing waste.
Challenges of Business Analytics
1. Data Quality and Availability
Issues: Poor data quality, incomplete or inconsistent data, limited availability.
Solution: Ensure data is accurate, reliable, and relevant; use data cleansing and integration processes.
2. Data Governance and Privacy
Issues: Compliance with data governance and privacy regulations.
Solution: Implement data governance frameworks, policies, and procedures to protect sensitive
information and manage access.
3. Data Integration and Complexity
Issues: Challenges in integrating data from diverse sources due to format and system variations.
Solution: Develop robust data integration processes and technologies for holistic insights.
4. Analytical Skills and Talent Gap
Issues: Shortage of professionals with domain knowledge, statistical expertise, and analytical tool
proficiency.
Solution: Address talent gaps by hiring skilled data analysts and data scientists.
5. Technology Infrastructure
Issues: Complexity and resource intensity of implementing technology infrastructure for analytics.
Solution: Invest in scalable and reliable systems for data storage, processing, and analytics.
6. Change Management and Organizational Culture
Issues: Resistance to adopting data-driven decision-making.
Solution: Employ change management strategies and leadership support to foster a data-driven
culture.
7. Interpretation and Actionability of Insights
Issues: Difficulty in extracting actionable and understandable insights for decision-makers.
Solution: Ensure insights are relevant and actionable; translate findings into tangible actions.
8. Cost and Return on Investment (ROI)
Issues: High costs in technology, talent, and infrastructure for analytics.
Solution: Carefully assess costs versus potential returns to ensure benefits justify the investment.
Data Science
A multidisciplinary field for extracting knowledge from structured and unstructured data.
Combines statistics, mathematics, computer science, and domain expertise.
Aims to solve complex problems and make data-driven decisions.
[Figure: Data science shown as the overlap of Statistics, Business Intelligence/Information Systems, and Modeling and Optimization.]
Data Science:
Definition: A broader, multidisciplinary field that uses advanced methods (like machine learning and
statistical modeling) to extract insights from both structured and unstructured data.
Key Characteristics: Solves complex problems, makes predictions, and drives innovation beyond just
business applications.
1.2 Analytics Life Cycle
1. Business Understanding:
Objective: Clearly define the business problem or opportunity that analytics will address. This stage sets the
direction for the entire project.
Activities: Identify key questions, define goals, and determine the scope of the analytics project. Align the
analytics objectives with business priorities to ensure relevance.
Output: A well-defined problem statement, clear objectives, and a project plan that outlines the analytics
approach.
2. Data Acquisition:
Objective: Collect data that is relevant to the business problem. This stage involves gathering data from
various sources to ensure a comprehensive dataset.
Activities: Data can be acquired from internal databases, external data providers, APIs, web scraping, and
other sources. ETL (Extraction, Transformation, and Loading) processes are used to bring data into a usable
state.
Output: A collected dataset that is ready for initial review and cleaning, with all relevant data sources
identified and accessed.
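As a concrete illustration, below is a minimal sketch of a small ETL pass in Python with pandas; the file name sales_export.csv and its columns are hypothetical stand-ins for whatever an internal system exports.

```python
# Minimal ETL sketch: extract raw records, transform them, and load the result.
# File and column names are hypothetical.
import pandas as pd

# Extract: read raw data exported from an internal system.
raw = pd.read_csv("sales_export.csv")

# Transform: standardise column names, parse dates, and filter to the period of interest.
raw.columns = raw.columns.str.strip().str.lower().str.replace(" ", "_")
raw["order_date"] = pd.to_datetime(raw["order_date"])
recent = raw[raw["order_date"] >= "2023-01-01"]

# Load: persist the cleaned extract for the analysis stage.
recent.to_csv("sales_prepared.csv", index=False)
print(f"Loaded {len(recent)} rows into sales_prepared.csv")
```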
3. Data Preparation:
Objective: Prepare the data for analysis by ensuring it is clean, consistent, and formatted correctly.
Activities: This involves data cleaning (removing duplicates, correcting errors, handling missing values),
data transformation (converting data types, scaling), and feature engineering (creating new variables that
may improve model performance).
Output: A high-quality, ready-to-analyze dataset that accurately reflects the information needed for
modeling.
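A short pandas sketch of the cleaning, transformation, and feature-engineering activities described above; the DataFrame and column names are illustrative, not from any real dataset.

```python
# Data preparation sketch: cleaning, transformation, and feature engineering.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 4],
    "age": [34, 34, None, 29, 41],
    "monthly_spend": [120.0, 120.0, 85.5, None, 240.0],
})

# Cleaning: drop duplicate rows and handle missing values.
df = df.drop_duplicates()
df["age"] = df["age"].fillna(df["age"].median())
df["monthly_spend"] = df["monthly_spend"].fillna(df["monthly_spend"].mean())

# Transformation: scale spend to a 0-1 range.
span = df["monthly_spend"].max() - df["monthly_spend"].min()
df["spend_scaled"] = (df["monthly_spend"] - df["monthly_spend"].min()) / span

# Feature engineering: derive a new variable that may help a model.
df["high_value"] = (df["monthly_spend"] > df["monthly_spend"].median()).astype(int)
print(df)
```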
4. Exploratory Data Analysis (EDA):
Objective: Explore the data to understand its underlying structure, detect patterns, spot anomalies, and form
hypotheses for further analysis.
Activities: Use statistical techniques, summary statistics, and data visualization tools (like histograms,
scatter plots, and box plots) to explore relationships and trends within the data.
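The following sketch shows one way to produce the summary statistics and two of the plots mentioned above, using pandas and matplotlib on simulated data.

```python
# EDA sketch: summary statistics plus a histogram and a scatter plot.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
df = pd.DataFrame({"ad_spend": rng.uniform(10, 100, 200)})
df["revenue"] = 3 * df["ad_spend"] + rng.normal(0, 20, 200)

# Summary statistics: count, mean, standard deviation, min/max, quartiles.
print(df.describe())

# Histogram of one variable; scatter plot of the relationship between two.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(df["revenue"], bins=20)
ax1.set_title("Revenue distribution")
ax2.scatter(df["ad_spend"], df["revenue"], s=10)
ax2.set_title("Ad spend vs revenue")
plt.tight_layout()
plt.show()
```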
1.5 Data Collection
Before collecting data, there are several factors you need to define:
The question you aim to answer
The data subject(s) you need to collect data from
The collection timeframe
The data collection method(s) best suited to your needs
Data Collection Methods
Surveys and Questionnaires: Collect customer feedback and quantitative data.
Interviews: Capture in-depth qualitative insights from stakeholders or customers.
Data Scraping: Extract relevant information from websites or external sources (a short sketch follows this list).
Data Integration: Combine data from multiple sources for a unified view.
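A minimal scraping sketch with requests and BeautifulSoup might look like the following; the URL and HTML table structure are hypothetical, and real scraping should respect a site's terms of service.

```python
# Data scraping sketch: fetch a page and extract table rows as text.
# URL and page structure are hypothetical; adapt the selectors to the real page.
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/prices", timeout=10)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
rows = []
for tr in soup.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        rows.append(cells)
print(rows[:5])
```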
Ensuring Data Quality
Data Governance Frameworks
o Establish processes to validate, clean, and manage data.
Quality Assurance
o Remove inconsistencies, errors, and ensure data accuracy.
Compliance
o Adhere to data privacy regulations (e.g., GDPR) to protect sensitive information.
Methods of Collecting Data
There are two different methods of collecting data: Primary Data Collection and Secondary Data
Collection.
Primary Data
Primary data refers to information collected directly from first-hand sources specifically for a
particular research purpose. This type of data is gathered through various methods, including surveys,
interviews, experiments, observations, and focus groups. One of the main advantages of primary data
is that it is collected for the specific question at hand, so it tends to be accurate, relevant, and current.
Secondary Data
Secondary data refers to information originally collected by someone else for another purpose, such as published reports, government statistics, or existing company records.
1.6 Data Preparation
The data preparation process can vary depending on industry or need, but typically consists of the
following steps:
Acquiring data: Determining what data is needed, gathering it, and establishing consistent access to
build powerful, trusted analysis.
Exploring data: Determining the data's quality, examining its distribution, and analyzing the
relationships between variables to better understand how to compose an analysis.
Cleansing data: Improving data quality and overall productivity to craft error-proof insights.
Transforming data: Formatting, orienting, aggregating, and enriching the datasets used in an analysis
to produce more meaningful insights.
1. Acquire Data
The first step in any data preparation process is acquiring the data that an analyst will use for their
analysis. Analysts often rely on others (such as IT) to obtain data, typically from an
enterprise software system or data management system. IT will usually deliver this data in an accessible
format like an Excel document or CSV.
Modern analytics software can remove the dependency on a data-wrangling middleman and tap directly into
trusted sources like SQL, Oracle, SPSS, AWS, Snowflake, Salesforce, and Marketo. This means analysts
can acquire the critical data for their regularly scheduled reports, as well as novel analytics projects, on their
own.
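A self-contained sketch of querying a SQL source directly, here using SQLite so the example runs on its own; the orders table and its contents are invented for illustration.

```python
# Sketch of pulling data straight from a SQL source with pandas.
import sqlite3
import pandas as pd

conn = sqlite3.connect("analytics.db")
conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "North", 120.0), (2, "South", 85.5), (3, "North", 240.0)])
conn.commit()

# The analyst queries the trusted source directly, no middleman required.
df = pd.read_sql("SELECT region, SUM(amount) AS total FROM orders GROUP BY region", conn)
print(df)
conn.close()
```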
2. Explore Data
Examining and profiling data helps analysts understand how their analysis will begin to take shape.
Analysts can utilize visual analytics and summary statistics like range, mean, and standard
deviation to get an initial picture of their data. If the data is too large to work with easily, segmenting
it can help.
During this phase, analysts should also evaluate the quality of their dataset. Is the data complete?
Are the patterns what was expected? If not, why? Analysts should discuss what they are seeing with the
owners of the data, dig into any surprises or anomalies, and consider whether it is even possible to improve
the data's quality.
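A small sketch of the completeness and anomaly checks described above; the data and the two-standard-deviation cutoff are illustrative choices, not a fixed standard.

```python
# Profiling sketch: completeness and anomaly checks on a dataset.
import pandas as pd

df = pd.DataFrame({
    "units": [50, 48, 52, 49, 51, 500],   # 500 looks like an entry error
    "region": ["N", "S", "N", None, "S", "N"],
})

# Is the data complete? Count missing values per column.
print(df.isna().sum())

# Are the patterns what was expected? Flag values far from the rest.
z = (df["units"] - df["units"].mean()) / df["units"].std()
print(df[z.abs() > 2])  # candidates to discuss with the data owners
```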
1.7 Hypothesis Generation
Hypothesis generation is the process of forming an educated guess, whereas hypothesis testing is the
process of concluding whether that educated guess is true or false, i.e. whether the relationship between
the variables is statistically significant or not.
The testing stage provides statistical evidence for further research: a hypothesis is accepted or
rejected based on the significance level and the test score of the test used for testing the hypothesis.
Null hypothesis (H₀): The null hypothesis is the starting assumption in statistics; it states there is no
effect or difference between groups. For example, if a company claims its average production is 50 units
per day, then:
H₀: the mean daily production (μ) = 50.
Alternative hypothesis (H₁): The alternative hypothesis is the opposite of the null hypothesis; it
suggests there is a difference between groups. If the company's production is not equal to 50 units
per day, the alternative hypothesis would be:
H₁: the mean daily production (μ) ≠ 50.
Key Terms of Hypothesis Testing
Level of significance: The degree of significance at which we accept or reject the null
hypothesis. Since 100% certainty is not possible when accepting a hypothesis, we select a level of
significance, normally denoted by α and generally set to 0.05 (5%), which means the result should be
reproducible with 95% confidence across samples.
P-value: When analyzing data, the p-value tells you the likelihood of seeing your result if the null
hypothesis is true. If the p-value is less than the chosen significance level, you reject the null
hypothesis; otherwise, you fail to reject it.
Test statistic: The test statistic is the number that helps you decide whether your result is significant.
It is calculated from the sample data you collect; for example, it could be used to test whether a
machine learning model performs better than a random guess.
Critical value: The critical value is the boundary or threshold that determines whether your test statistic
is extreme enough to reject the null hypothesis.
Degrees of freedom: Degrees of freedom indicate how many values in a calculation are free to vary;
they are important when conducting statistical tests.
Seven steps of hypothesis testing
Step 1: Specify the null hypothesis and the alternative hypothesis.
Step 2: Choose the level of significance.
Step 3: Select the test and the test statistic to be used.
Step 4: State the decision rule.
Step 5: Use the sample data to calculate the test statistic.
Step 6: Use the test statistic to make a decision.
Step 7: Interpret the decision in the context of the original question.
A worked example of these steps follows.
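The sketch below walks the production example above (H₀: μ = 50 units/day) through the seven steps using a one-sample t-test from scipy; the sample data are simulated.

```python
# Worked example of the seven steps: one-sample t-test of
# H0: mu = 50 units/day vs H1: mu != 50 (data are simulated).
import numpy as np
from scipy import stats

# Steps 1-2: hypotheses and significance level.
mu0, alpha = 50, 0.05

# Steps 3-4: one-sample t-test; reject H0 if p-value < alpha.
rng = np.random.default_rng(0)
sample = rng.normal(53, 6, 30)          # 30 days of observed production

# Step 5: calculate the test statistic from the sample.
t_stat, p_value = stats.ttest_1samp(sample, mu0)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, df = {len(sample) - 1}")

# Steps 6-7: decision and interpretation.
if p_value < alpha:
    print("Reject H0: mean daily production differs from 50 units.")
else:
    print("Fail to reject H0: no evidence the mean differs from 50 units.")
```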
1.8 Modeling
Definition: Modeling involves creating mathematical or statistical models to represent business processes,
relationships, or phenomena.
Purpose: To analyze data, make predictions, and derive actionable insights for decision-making.
Steps in the Modeling Process:
Problem Formulation:
• Clearly define the business problem or objective.
• Understand the context, identify key variables, and determine the scope and constraints.
Data Preparation:
• Clean, integrate, transform, and format data for modeling.
1.9 Validation and Evaluation
1. Conceptual Design
The foundation of any model validation is its conceptual design, which requires a documented coverage
assessment supporting the model's ability to meet business and regulatory needs and the unique risks
facing a bank.
2. System Validation
All technology and automated systems implemented to support models have limitations. An effective
validation includes: firstly, evaluating the processes used to integrate the model‘s conceptual design and
functionality into the organisation‘s business setting; and, secondly, examining the processes implemented
to execute the model‘s overall design.
3. Data Validation and Quality Assessment
Data errors or irregularities impair results and might lead to an organisation‘s failure to identify and respond
to risks. Best practice indicates that institutions should apply a risk-based data validation, which enables the
reviewer to consider risks unique to the organisation and the model.
1.10 Interpretation
Regression Analysis
Regression analysis is a collection of statistical procedures for estimating the relationships between a
dependent variable and one or more independent variables. It may be used to determine the strength of a
relationship between variables and to predict how they will interact in the future.
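A minimal regression sketch with scikit-learn; the ad-spend and sales figures are made up to illustrate fitting a line, measuring the strength of the relationship, and predicting a future value.

```python
# Regression sketch: one independent variable (ad spend),
# one dependent variable (sales).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[10], [20], [30], [40], [50]])   # ad spend (independent)
y = np.array([55, 102, 148, 205, 251])          # sales (dependent)

model = LinearRegression().fit(X, y)
print(f"slope = {model.coef_[0]:.2f}, intercept = {model.intercept_:.2f}")
print(f"R^2 = {model.score(X, y):.3f}")         # strength of the relationship
print(f"predicted sales at spend 60: {model.predict([[60]])[0]:.1f}")
```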
Cohort Analysis
Cohort analysis is a technique for determining how engaged users are over time. It is useful for
determining whether user engagement is genuinely improving over time or only appears to improve because
of growth; in this way it distinguishes growth metrics from engagement metrics. Cohort analysis
watches how behavior develops over time within groups of similar users.
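One common way to build a cohort table in pandas is sketched below, on a hand-made event log; a real analysis would use actual user-activity data.

```python
# Cohort sketch: group users by first-activity month and count how many
# are still active in later months.
import pandas as pd

events = pd.DataFrame({
    "user": [1, 1, 2, 2, 2, 3, 4, 4],
    "month": ["2023-01", "2023-02", "2023-01", "2023-02", "2023-03",
              "2023-02", "2023-02", "2023-03"],
})

# Each user's cohort is the month of their first activity.
events["cohort"] = events.groupby("user")["month"].transform("min")

# Active users per cohort per month: read across a row to see engagement over time.
retention = events.pivot_table(index="cohort", columns="month",
                               values="user", aggfunc="nunique")
print(retention)
```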
Predictive Analysis
By examining historical and present data, the predictive analytics approach seeks to forecast future trends.
Predictive analytics approaches, powered by machine learning and deep learning, allow firms to
notice patterns or potential challenges ahead of time and plan informed initiatives. Businesses use
predictive analytics both to address issues and to identify new opportunities.
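A compact predictive-analytics sketch: fit a model on historical (here, simulated) data and check how well it forecasts unseen cases; the feature and target names are illustrative.

```python
# Predictive sketch: learn from past data, evaluate on held-out data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (300, 3))             # e.g. usage, tenure, complaints
y = (X[:, 0] + X[:, 2] > 1).astype(int)     # e.g. churn yes/no

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"accuracy on held-out data: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```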
Prescriptive Analysis
The prescriptive analysis approach employs tools like as graph analysis. Prescriptive analytics is a sort of
data analytics in which technology is used to assist organisations in making better decisions by
analysing raw data. Prescriptive analytics, in particular, takes into account information about potential
situations or scenarios, available resources, previous performance, and present performance to recommend
a course of action or strategy. It may be used to make judgments throughout a wide range of time frames,
from the immediate to the long term.
Conjoint Analysis
Conjoint analysis is a leading market research method for determining how much customers value a
product's or service's attributes. This widely used method combines real-life scenarios and statistical tools
with market decision models.
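One simple way to approximate conjoint part-worths is a dummy-coded linear regression on profile ratings, sketched below with invented ratings; full conjoint studies use more elaborate experimental designs.

```python
# Conjoint sketch: estimate part-worth utilities from ratings of
# product profiles using dummy-coded linear regression.
import pandas as pd
from sklearn.linear_model import LinearRegression

profiles = pd.DataFrame({
    "brand":  ["A", "A", "B", "B", "A", "B"],
    "price":  ["low", "high", "low", "high", "high", "low"],
    "rating": [9, 6, 7, 3, 5, 8],       # hypothetical customer ratings
})

X = pd.get_dummies(profiles[["brand", "price"]], drop_first=True)
model = LinearRegression().fit(X, profiles["rating"])

# Coefficients approximate how much each attribute level shifts preference.
for name, coef in zip(X.columns, model.coef_):
    print(f"{name}: {coef:+.2f}")
```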
Cluster Analysis
Cluster analysis is a valuable data-mining technique for any organization that wants to identify distinct
groupings of customers, sales transactions, or other types of behaviors and items.
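A minimal k-means clustering sketch with scikit-learn on simulated customer data, illustrating how distinct groupings emerge from behavioral measures.

```python
# Cluster sketch: group customers by two behavioral measures with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
spend = np.concatenate([rng.normal(50, 5, 50), rng.normal(200, 20, 50)])
visits = np.concatenate([rng.normal(2, 0.5, 50), rng.normal(10, 2, 50)])
X = np.column_stack([spend, visits])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster centres (spend, visits):")
print(kmeans.cluster_centers_.round(1))
```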
1.11 Deployment and Iteration
Definition: Deployment refers to the process of implementing developed analytical models, solutions, or
insights into operational systems or business processes.
Objective: Ensures that insights and recommendations from analytics are used effectively to drive decision-
making and improve business outcomes.
Definition: Iteration involves the ongoing improvement and refinement of deployed analytics models or
solutions. It adapts analytics outputs to changing business conditions, data availability, and evolving
requirements.
Objective: To ensure that analytics solutions remain relevant, accurate, and impactful through continuous
feedback, learning, and enhancement.
• Key Aspects of the Iteration Process:
Feedback Collection:
– Gather feedback from users, stakeholders, or customers who interact with the deployed analytics
solutions.
– Feedback helps identify strengths, weaknesses, and areas for improvement in the analytics outputs.
Data Updates:
– Update the data used in analytics models as new data becomes available.
– Incorporate new data points or time periods to keep models up-to-date and reflective of the current
business environment.