
DSML

The document outlines an Introduction to Business Analytics course, detailing evaluation components, key topics, and various types of analytics such as descriptive, diagnostic, predictive, and prescriptive analytics. It emphasizes the importance of analytics in decision-making and provides examples of its applications across industries, including product recommendations, sentiment analysis, and fraud detection. Additionally, it includes a project outline for analyzing factors influencing vehicle prices, covering steps from problem definition to implementation.

Uploaded by

narayan972
Introduction to Business Analytics
02/24/2025 1
Evaluation:

Students will be evaluated based on case analysis, group projects, class participation, and quizzes. The weightage of each evaluation component is listed below:

• Class participation: 10%
• Quizzes: 20%
• Project: 30%
• End Term Exam: 40%

Topics:

• What is Analytics?
• Why Analytics?
• Types of Analytics

What is Analytics?

• Analytics is the scientific process of transforming data into insights for making better decisions.

OR

• Analytics is the process of discovering, interpreting, and communicating significant patterns in data, and of using tools to empower the entire organization.

• Business analytics adds even more opportunities to drive desired outcomes, such as optimization, cost savings, and customer engagement.

Why Analytics?
• Decision making is not easy, especially if the business is big
• Competitive advantage
• Improving efficiency

Example:

• City bank managers decide whether a customer is trustworthy
• Amazon infers which customers are expecting a baby and offers them baby care products
• A manager decides which customers should receive promotional offers

Give me some examples where industries are making money through the use of data analytics.
1. Product Recommendations

• Product recommendation is one of the most popular applications of analytics.

• It works as a filtering system that predicts and shows items a user might want to purchase.

• Product recommendation is a key feature of almost every e-commerce website.

• Analytics tracks user behavior based on previous purchases, search patterns, and cart history to make these recommendations.
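As a rough illustration of the filtering idea above, the sketch below recommends items that were most often bought together with a given product. The order data, product names, and function names are all hypothetical, and real recommendation systems use far richer signals (search patterns, cart history) than this simple co-purchase count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: one set of product names per order.
orders = [
    {"phone", "phone case", "charger"},
    {"phone", "phone case"},
    {"phone", "charger"},
    {"laptop", "mouse"},
    {"laptop", "mouse", "phone case"},
]

def build_co_purchase_counts(orders):
    """Count how often each pair of products appears in the same order."""
    counts = {}
    for basket in orders:
        for a, b in combinations(sorted(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts

def recommend(product, counts, top_n=2):
    """Return up to top_n products most often bought together with `product`."""
    return [item for item, _ in counts.get(product, Counter()).most_common(top_n)]

co_counts = build_co_purchase_counts(orders)
print(recommend("phone", co_counts))
print(recommend("laptop", co_counts, top_n=1))
```

Counting co-purchases is only one possible design; it is shown here because it needs nothing beyond the order history itself.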

2. Image Recognition
• Image recognition is one of the most significant machine learning applications, used to catalog and detect objects or features in digital images.

• It helps identify places, logos, people, objects, buildings, and many other elements in images.

• This technology is widely adopted for further analysis like pattern recognition, face detection, and face recognition.

The focus is on how image recognition enhances various industries by identifying and processing visual data for deeper insights.

3. Sentiment Analysis
• Sentiment analysis is a real-time machine learning application that identifies the emotion or opinion of the speaker or writer.

• For example, if someone writes a review, email, or document, the sentiment analyzer detects the thought and tone behind the text.

• It is commonly used in review-based websites and decision-making applications to analyze user feedback and other textual data.

This emphasizes how sentiment analysis helps in understanding public opinion and making informed decisions based on textual content.

4. Smart Health Records

• Maintaining health records is a time-consuming process, even with technological advancements.

• Machine learning in healthcare aims to streamline processes, saving time, effort, and money.

• MIT is at the forefront of developing smart health records using machine learning for diagnosis and clinical treatment suggestions.

This highlights the potential of machine learning to transform healthcare by improving efficiency and enhancing decision-making through intelligent health records.

5. Customer Segmentation
• Customer segmentation involves dividing customers into groups based on common characteristics to enable effective and targeted marketing.

• In business-to-business (B2B) marketing, companies may segment customers by factors such as industry and number of employees.

• In business-to-consumer (B2C) marketing, companies may segment customers by factors such as demographics, geographics, psychographics, behavioral data, and technographics.

• These segments help companies create personalized marketing campaigns that resonate with individual consumer needs and preferences.

6. Fraud Detection
Machine learning is widely applied to detect and prevent fraud across various industries.

In financial services, it identifies fraudulent transactions, combats tax evasion, and detects money laundering.

The insurance industry leverages machine learning to uncover fraudulent claims and activities.

In retail banking, machine learning enhances fraud prevention by analyzing transaction patterns and anomalies.

This highlights machine learning's critical role in proactively identifying and combating fraud in different sectors.
7. Dynamic Pricing
• Dynamic pricing (also known as surge pricing, demand pricing, or time-based pricing) is a flexible pricing strategy that adjusts prices based on current market demand.

• Machine learning algorithms analyze historical data and integrate it with real-time market and consumer insights to inform pricing decisions.

• This strategy enables businesses to optimize prices based on numerous variables, ultimately helping to maximize revenue.

• Examples of industries utilizing dynamic pricing include:
1. Airline industry
2. Indian railway and transportation industry

Websites To Learn More About Data Science And Machine Learning

https://www.kaggle.com/

https://archive.ics.uci.edu/ml/index.php

https://www.analyticsvidhya.com/

Types of Analytics

Descriptive Analytics: What happened?
Examples of Descriptive Analytics:

• Monthly Profit and Loss Statement: Serves as a classic example, summarizing financial performance over a specific period.

• Customer Demographics: Analyzing customer information, such as the fact that 30% of customers are self-employed, falls under descriptive analytics because it describes the customer base.

• Time-Based Analysis: Comparing data across different time periods, for example to observe changes in health outcomes.

Purpose of Descriptive Analytics: Aims to provide a clear view of past events and patterns, facilitating better decision-making and strategic planning.

Diagnostic Analytics: Why is it happening?
Diagnostic analytics examines data to answer the question, “Why did it happen?”

• In the assessment of descriptive data, diagnostic analytical tools empower analysts to drill down and isolate the root cause of a problem.

• For example, we can find the reason why the recovery and death rates increased for COVID-19 patients in April 2021 compared to April 2020.
Predictive Analytics: What is likely to happen?
Predictive analytics uses historical data and algorithms to forecast future events or behaviors.

• Analyzes customer behavior and market trends to create predictive models.
• Example: Retailers predict future buying patterns based on past purchase data.

• Used in healthcare, finance, marketing, and more for informed decision-making.
• Example: Hospitals forecast patient admission rates to optimize resource allocation.

Note: Predictive analytics involves using historical data and statistical algorithms to forecast future events or behaviors. It analyzes patterns and trends to predict what is likely to occur.
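To make the idea of forecasting from historical data concrete, here is a minimal sketch that fits a straight-line trend to past monthly admissions and extrapolates one month ahead. The admission figures and function name are hypothetical, and a linear trend is only one of many possible predictive models.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical history: patient admissions for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
admissions = [120, 126, 131, 139, 144, 150]

intercept, slope = fit_line(months, admissions)
forecast_month_7 = intercept + slope * 7
print(round(forecast_month_7, 1))  # about 156.2 for this sample
```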

Prescriptive Analytics: What do I need to do?

Prescriptive analytics uses data and algorithms to recommend actions for optimal outcomes.

• Data Utilization: Combines historical data, predictive models, and business rules to guide decision-making.

• Techniques: Employs methods like optimization, simulation, and decision analysis.

• Example: Airlines use it to optimize flight schedules based on demand forecasts.

• Example: Supply chain managers use optimization models to minimize costs while meeting demand.

• Example: Hospitals determine the best staffing levels based on patient inflow predictions to enhance service delivery.
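The hospital staffing example can be sketched as a tiny optimization problem: given a predicted patient inflow, try every staffing level and pick the cheapest. The cost model, parameter values, and function names below are assumptions made purely for illustration.

```python
def staffing_cost(staff, predicted_patients, patients_per_staff=8,
                  wage=200.0, shortfall_penalty=60.0):
    """Daily cost: wages plus a penalty for each patient beyond capacity."""
    capacity = staff * patients_per_staff
    unserved = max(0, predicted_patients - capacity)
    return staff * wage + unserved * shortfall_penalty

def best_staffing(predicted_patients, max_staff=30, **cost_params):
    """Try every staffing level from 0 to max_staff and return the cheapest."""
    return min(range(max_staff + 1),
               key=lambda s: staffing_cost(s, predicted_patients, **cost_params))

predicted_inflow = 100  # hypothetical output of a predictive model
staff = best_staffing(predicted_inflow)
print(staff, staffing_cost(staff, predicted_inflow))  # 13 2600.0
```

Exhaustive search is enough here because there is a single decision variable; larger problems would typically use a dedicated optimization method.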
Longitudinal Data
Cross Sectional Data

Cross-sectional data is collected by observing various subjects (such as firms, countries, regions, or individuals) at the same point in time or during the same period.

1. Sales of 30 companies
2. Productivity of each sales division
3. Gross annual income for each of 1000 randomly chosen households in New York City for the year 2000

Subscription  Age  Salary  Gender
0             25   132     1
0             64   84      1
0             31   100     0
0             66   72      1
0             39   78      1
1             53   54      1
1             49   102     0
1             63   145     0
0             66   130     1
0             72   87      0
0             35   89      1
0             42   97      1
1             65   90      1
1             48   90      1
0             28   67      1
0             46   98      1
0             31   55      1
0             41   103     1
1             57   128     1
1             70   72      1

Time Series Data
Data collected for a single variable, such as sales of smartphones, over several time intervals (weekly, monthly) is called time-series data.

Year  Quarter  Sales
2012  1        $1,65,000.00
2012  2        $2,53,000.00
2012  3        $3,16,000.00
2012  4        $2,87,000.00
2013  1        $2,57,000.00
2013  2        $3,08,000.00
2013  3        $3,16,000.00
2013  4        $3,51,000.00
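The quarterly sales figures above can be loaded into a simple time-indexed structure and summarized, as in the sketch below. It assumes the sales text uses Indian-style digit grouping (so `$1,65,000.00` means 165000); the function names are illustrative.

```python
# Quarterly sales copied from the table above as (year, quarter, sales text).
rows = [
    (2012, 1, "$1,65,000.00"), (2012, 2, "$2,53,000.00"),
    (2012, 3, "$3,16,000.00"), (2012, 4, "$2,87,000.00"),
    (2013, 1, "$2,57,000.00"), (2013, 2, "$3,08,000.00"),
    (2013, 3, "$3,16,000.00"), (2013, 4, "$3,51,000.00"),
]

def parse_amount(text):
    """Strip the currency sign and grouping commas: '$1,65,000.00' -> 165000.0."""
    return float(text.replace("$", "").replace(",", ""))

# A time series: one variable (sales) indexed by time (year, quarter).
series = {(year, quarter): parse_amount(text) for year, quarter, text in rows}

def year_over_year_growth(series, year, quarter):
    """Percentage change versus the same quarter one year earlier."""
    previous = series[(year - 1, quarter)]
    return (series[(year, quarter)] - previous) / previous * 100

annual_totals = {}
for (year, _quarter), sales in series.items():
    annual_totals[year] = annual_totals.get(year, 0.0) + sales

print(annual_totals)
print(round(year_over_year_growth(series, 2013, 1), 1))  # 55.8
```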

Panel Data
Data collected for several variables over several time intervals (weekly, monthly, yearly) is called panel data.

Person  Year  Income     Expenditure
1       2016  $1,300.00  $1,300.00
1       2017  $1,600.00  $1,300.00
1       2018  $2,000.00  $1,300.00
2       2016  $2,000.00  $1,300.00
2       2017  $2,300.00  $1,300.00
2       2018  $2,400.00  $1,300.00

Problem Statement
Project Title: Analyzing Factors Influencing Vehicle Prices

Step 1: Define the Problem


Objective of the Project: The core objective is to analyze vehicle advertisements and determine what
drives vehicle prices. This involves identifying key features or trends in the data that influence price
fluctuations and help predict vehicle valuation.

Research Questions:
• What vehicle attributes significantly affect the price?
• How do external factors (e.g., market trends, location) impact pricing?
• Can we predict vehicle prices accurately based on available data?
Step 2: Data Collection

Gather Historical Data: Collect advertisement data over the last few years, including:

• Vehicle make and model
• Year of manufacture
• Mileage
• Condition (new/used)
• Listing price
• Location of the advertisement
• Seller type (private seller, dealership)
• Date of the advertisement

Step 3: Data Preparation

Data Cleaning:
• Remove duplicates and irrelevant entries.
• Handle missing values appropriately (impute, drop, or fill).
• Standardize data formats (e.g., date formats, string casing).

Feature Engineering:
• Create new variables if necessary (e.g., age of the vehicle, price per mile).
• Convert categorical variables into numerical ones using techniques like one-hot encoding.
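
The cleaning and feature engineering steps above might look roughly like the following sketch. The records, field names, and the choice of median imputation for mileage are all assumptions for illustration; the real dataset's columns may differ.

```python
from statistics import median

# Hypothetical raw listings; field names are illustrative only.
raw_ads = [
    {"make": " Toyota ", "model_year": 2015, "mileage": 60000, "condition": "Used", "date_posted": "2019-03-01"},
    {"make": "toyota", "model_year": 2015, "mileage": 60000, "condition": "used", "date_posted": "2019-03-01"},
    {"make": "FORD", "model_year": 2016, "mileage": None, "condition": "used", "date_posted": "2019-02-10"},
    {"make": "Honda", "model_year": 2019, "mileage": 20, "condition": "New", "date_posted": "2019-01-20"},
    {"make": "ford", "model_year": 2012, "mileage": 110000, "condition": "used", "date_posted": "2018-11-15"},
]

def standardize(ad):
    """Normalize string casing and whitespace so equal values compare equal."""
    clean = dict(ad)
    clean["make"] = ad["make"].strip().lower()
    clean["condition"] = ad["condition"].strip().lower()
    return clean

def prepare(ads):
    # 1. Standardize formats, then drop exact duplicates (keep the first copy).
    seen, rows = set(), []
    for ad in map(standardize, ads):
        key = tuple(sorted(ad.items()))
        if key not in seen:
            seen.add(key)
            rows.append(ad)
    # 2. Impute missing mileage with the median of the observed values.
    fill = median(r["mileage"] for r in rows if r["mileage"] is not None)
    categories = sorted({r["condition"] for r in rows})
    for r in rows:
        if r["mileage"] is None:
            r["mileage"] = fill
        # 3. Feature engineering: vehicle age when the ad was posted.
        r["vehicle_age"] = int(r["date_posted"][:4]) - r["model_year"]
        # 4. One-hot encode the categorical `condition` column.
        for c in categories:
            r[f"condition_{c}"] = int(r["condition"] == c)
    return rows

prepared = prepare(raw_ads)
for row in prepared:
    print(row)
```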
Step 4: Exploratory Data Analysis (EDA)

Visualizations:
• Use histograms, scatter plots, and box plots to visualize price distributions and relationships.
• Examine correlation matrices to identify relationships between features.

Statistical Analysis:
• Calculate descriptive statistics (mean, median, mode) for key features.
• Perform hypothesis testing to determine significant factors influencing price.
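
As a small sketch of the statistical part of EDA, the code below computes the descriptive statistics named above and a Pearson correlation between vehicle age and price. The sample values are hypothetical, and hypothesis testing is not covered in this sketch.

```python
from math import sqrt
from statistics import mean, median, mode

# Hypothetical sample: vehicle age in years and listing price.
ages = [1, 2, 3, 5, 8, 10]
prices = [24000, 21000, 21000, 15000, 9000, 6000]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

summary = {"mean": mean(prices), "median": median(prices), "mode": mode(prices)}
r = pearson(ages, prices)

print(summary)
print(round(r, 2))  # strongly negative: older vehicles list for less in this sample
```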

Step 5: Modeling

Select Appropriate Models:


• Start with simple regression models (e.g., linear regression) to establish baseline performance.
• Explore more complex models (e.g., decision trees, random forests, or gradient boosting) for improved accuracy.

Model Training and Testing:


• Split the dataset into training and testing sets (e.g., 80/20 split).
• Train models on the training set and evaluate performance on the testing set using metrics like RMSE, MAE, and R².
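
A baseline version of this step can be sketched with a single-feature linear regression, an 80/20 split, and the three metrics mentioned above. The data is synthetic and generated only for illustration, and the helper names are hypothetical; in practice a library such as scikit-learn would usually be used instead.

```python
import random
from math import sqrt

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out the last `test_fraction` for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]

def fit_linear(rows):
    """Least-squares fit of price = intercept + slope * age."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    slope = (sum((x - mx) * (y - my) for x, y in rows)
             / sum((x - mx) ** 2 for x, _ in rows))
    return my - slope * mx, slope

def evaluate(rows, intercept, slope):
    """Return RMSE, MAE, and R² of the fitted line on the given rows."""
    errors = [y - (intercept + slope * x) for x, y in rows]
    mean_y = sum(y for _, y in rows) / len(rows)
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((y - mean_y) ** 2 for _, y in rows)
    return {
        "rmse": sqrt(ss_res / len(rows)),
        "mae": sum(abs(e) for e in errors) / len(rows),
        "r2": 1 - ss_res / ss_tot,
    }

# Synthetic data for illustration only: price falls with age, plus random noise.
rng = random.Random(0)
data = [(age, 25000 - 1800 * age + rng.gauss(0, 1000))
        for age in range(1, 11) for _ in range(10)]

train, test = train_test_split(data)
intercept, slope = fit_linear(train)
metrics = evaluate(test, intercept, slope)
print(metrics)
```

Evaluating on the held-out rows rather than the training rows is what makes the metrics an honest estimate of how the model would perform on new advertisements.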
Step 6: Results Interpretation

Analyze Model Outputs:
• Determine which features are the most important in predicting vehicle prices.
• Visualize the relationship between predicted and actual prices.

Generate Insights:
• Provide actionable insights based on analysis (e.g., optimal price ranges for specific vehicle types).

Step 7: Reporting

Create a Presentation:
• Summarize findings in a clear and concise manner.
• Use visual aids (charts, graphs) to illustrate key points.

Recommendations:
• Suggest strategies for sellers based on the analysis.
• Propose improvements for Crankshaft List based on insights gained.

Step 8: Implementation

Integrate Findings:
• Work with relevant teams to implement findings into the platform.
• Monitor outcomes and adjust strategies based on ongoing analysis.
Data Reading

Missing Value Imputation
• If data is numeric:
  • Test: if the data distribution is symmetric, replace missing values with the mean.
  • Test: if the data distribution is not symmetric, replace missing values with the median.
• If data is non-numeric: replace missing values with the mode.
Imputation Technique | Type of Data | How it Works | Best For
Mean Imputation | Numerical | Replace missing values with the mean of the column. | Normally distributed data.
Median Imputation | Numerical | Replace missing values with the median of the column. | Skewed numerical data (e.g., income, house prices).
Mode Imputation | Categorical | Replace missing values with the most frequent value. | Categorical features.
Forward/Backward Fill | Time Series | Propagate the previous (forward fill) or next (backward fill) value. | Sequential or time-series data.
KNN Imputation | Both Numerical & Categorical | Use nearest neighbors to impute missing values. | Complex datasets with relationships between features.
MICE (Multivariate Imputation by Chained Equations) | Both | Iteratively predicts missing values. | Interdependent missing data across multiple features.
Random Sampling | Both | Randomly select a value from existing data. | Preserving data distribution.
Predictive Models | Both | Use regression or other models to predict values. | Well-understood relationships between variables.
Interpolation | Time Series | Estimate missing values through linear/curve fitting. | Continuous time-series data.
Drop Missing Values | Both | Remove rows/columns with missing data. | When missing data is too extensive or irrelevant.
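
The mean, median, mode, and forward-fill rows of the table can be sketched as below. The symmetry check uses Pearson's second skewness coefficient with a cutoff of 0.5, which is an assumption of this sketch rather than a rule from the slides; the function names are also illustrative.

```python
from statistics import mean, median, mode, pstdev

def impute_column(values, skew_threshold=0.5):
    """Fill None entries with the mean, median, or mode depending on the data."""
    observed = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in observed):
        # Pearson's second skewness coefficient: 3 * (mean - median) / std.
        spread = pstdev(observed)
        skew = 0.0 if spread == 0 else 3 * (mean(observed) - median(observed)) / spread
        # Roughly symmetric data gets the mean; skewed data gets the median.
        fill = mean(observed) if abs(skew) < skew_threshold else median(observed)
    else:
        # Non-numeric data gets the most frequent value.
        fill = mode(observed)
    return [fill if v is None else v for v in values]

def forward_fill(values):
    """Propagate the last observed value forward (leading gaps stay None)."""
    filled, last = [], None
    for v in values:
        last = v if v is not None else last
        filled.append(last)
    return filled

print(impute_column([10, 12, None, 14, 16]))         # symmetric: mean
print(impute_column([20, 22, 25, None, 24, 300]))    # skewed: median
print(impute_column(["used", "new", None, "used"]))  # categorical: mode
print(forward_fill([None, 5, None, None, 7, None]))
```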
Numeric Dataset
Outlier detection
Why might advertisement dates such as week, month, and year be useful in predicting the price of an old vehicle?

How can we impute the average distance driven, which is an important factor in the valuation of used vehicles?
Histogram
Outlier Detection
Pie Chart
Question: Study how many days advertisements were displayed (days_listed). Plot a histogram, calculate
the mean and median, and describe the typical lifetime of an ad. Identify when ads were removed quickly
and when they were listed for an abnormally long time.
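
One way to approach this question is sketched below with made-up `days_listed` values. It treats ads above the upper IQR fence (Q3 + 1.5 × IQR) as abnormally long and ads at or below the 10th percentile as removed quickly; both cutoffs are assumptions of this sketch, and the text histogram stands in for a plotted one.

```python
from collections import Counter
from statistics import mean, median, quantiles

# Hypothetical days_listed values; the real column comes from the ads dataset.
days_listed = [0, 3, 9, 14, 19, 22, 25, 28, 33, 36,
               41, 47, 53, 60, 68, 79, 95, 180, 271]

def describe_lifetime(days, bin_width=30):
    """Summarize ad lifetimes and flag unusually short or long listings."""
    q1, _, q3 = quantiles(days, n=4)
    upper_fence = q3 + 1.5 * (q3 - q1)        # IQR rule for long listings
    quick_cutoff = quantiles(days, n=10)[0]   # 10th percentile
    return {
        "mean": mean(days),
        "median": median(days),
        "quick": [d for d in days if d <= quick_cutoff],
        "long": [d for d in days if d > upper_fence],
        "histogram": Counter(d // bin_width * bin_width for d in days),
    }

report = describe_lifetime(days_listed)
print("mean:", report["mean"], "median:", report["median"])
print("removed quickly:", report["quick"])
print("listed abnormally long:", report["long"])
for start in sorted(report["histogram"]):
    print(f"{start:>3}-{start + 29:<3} {'#' * report['histogram'][start]}")
```

In this sample the mean (57) sits well above the median (36), which is the usual sign of a right-skewed distribution and a reason to describe the typical lifetime with the median.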
Continue…..
ANOVA
ANOVA Continue…
