DSML
Analytics
02/24/2025 1
Evaluation:
Topics:
What is Analytics?
Why Analytics?
Types of Analytics
Analytics is the scientific process of transforming data
into insights for making better decisions.
OR
Why Analytics?
Decision making is not easy, especially when the business is large
Competitive Advantage
Improving Efficiency
Example:
2. Image Recognition
Image recognition is one of the most significant machine learning applications,
used to catalog and detect objects or features in digital images.
3. Sentiment Analysis
Sentiment analysis is a real-time machine learning application that identifies
the emotion or opinion of the speaker or writer.
4. Smart Health Records
5. Customer Segmentation
Customer segmentation involves dividing
customers into groups based on common
characteristics to enable effective and
targeted marketing.
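As a minimal sketch of the idea, the snippet below groups customers by one shared characteristic, annual spend, using fixed thresholds. The customer records, the `segment_customer` helper, and the threshold values are all hypothetical; real segmentation usually combines several characteristics and often relies on clustering rather than hand-set cut-offs.

```python
from collections import defaultdict

# Hypothetical customers: (customer_id, annual_spend)
customers = [("C1", 120.0), ("C2", 950.0), ("C3", 2400.0), ("C4", 80.0), ("C5", 1500.0)]

def segment_customer(annual_spend, low=500.0, high=2000.0):
    """Assign a segment label from illustrative spend thresholds."""
    if annual_spend < low:
        return "low"
    if annual_spend < high:
        return "medium"
    return "high"

# Group customer ids by their segment label
segments = defaultdict(list)
for customer_id, spend in customers:
    segments[segment_customer(spend)].append(customer_id)

for label in ("low", "medium", "high"):
    print(label, segments[label])
```

Each resulting group can then receive its own marketing message, which is the point of segmenting in the first place.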
6. Fraud Detection
Machine learning is widely applied to detect
and prevent fraud across various industries.
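A very small illustration of the underlying intuition, flagging transactions that look unlike the rest, is sketched below. It uses a plain z-score rule rather than a trained machine learning model, and the transaction amounts, the `flag_outliers` helper, and the threshold of 2.0 are assumptions made for the example.

```python
import statistics

# Hypothetical transaction amounts for a single account
amounts = [25.0, 30.0, 27.5, 22.0, 31.0, 26.0, 950.0, 28.0]

def flag_outliers(values, threshold=2.0):
    """Return values whose z-score (population std. dev.) exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

suspicious = flag_outliers(amounts)
print(suspicious)
```

A rule like this is only a baseline: production fraud systems typically learn from labeled historical cases and many features, not a single amount column.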
Websites To Learn More About Data Science And Machine Learning
https://www.kaggle.com/
https://archive.ics.uci.edu/ml/index.php
https://www.analyticsvidhya.com/
Types of Analytics
Descriptive Analytics: What happened?
Example of Descriptive Analytics:
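A minimal sketch of a descriptive summary, assuming hypothetical monthly sales figures: it reports what happened (total, average, best and worst month) without trying to explain why.

```python
import statistics

# Hypothetical monthly sales figures (units sold)
monthly_sales = {"Jan": 120, "Feb": 135, "Mar": 150, "Apr": 110, "May": 160, "Jun": 145}

total = sum(monthly_sales.values())
average = statistics.mean(monthly_sales.values())
best_month = max(monthly_sales, key=monthly_sales.get)
worst_month = min(monthly_sales, key=monthly_sales.get)

print(f"Total units sold: {total}")
print(f"Average per month: {average:.1f}")
print(f"Best month: {best_month}, worst month: {worst_month}")
```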
Diagnostic Analytics: Why is it happening?
Diagnostic analytics examines data to answer the question, “Why did it happen?”
Prescriptive Analytics: What do I need to do?
Time Series Data
Data collected for a single variable, such as sales of smartphones, over several time
intervals (weekly, monthly) is called time-series data.
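A small sketch of working with such a series: the weekly sales numbers below are hypothetical, and `moving_average` is an illustrative helper that smooths the series over a fixed window, a common first step when looking for a trend.

```python
# Hypothetical weekly smartphone sales, ordered by week
weekly_sales = [100, 120, 90, 110, 130, 150]

def moving_average(series, window):
    """Average each run of `window` consecutive observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

smoothed = moving_average(weekly_sales, 3)
print(smoothed)
```

The order of the observations matters here: shuffling the list would change the result, which is what distinguishes time-series data from an unordered sample.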
Panel Data
Data collected for several variables over several time intervals
(weekly, monthly, yearly) is called panel data.
Person  Year  Income     Expenditure
1       2016  $1,300.00  $1,300.00
1       2017  $1,600.00  $1,300.00
1       2018  $2,000.00  $1,300.00
2       2016  $2,000.00  $1,300.00
2       2017  $2,300.00  $1,300.00
2       2018  $2,400.00  $1,300.00
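One way to hold panel data in code is to key each observation by (person, year). The sketch below does that with the table's values, assuming the dollar column that varies by row is Income; it then averages along each dimension, which is only possible because the data tracks the same people across time.

```python
from collections import defaultdict
import statistics

# (person, year) -> income; column assignment is an assumption about the table
panel = {
    (1, 2016): 1300.00, (1, 2017): 1600.00, (1, 2018): 2000.00,
    (2, 2016): 2000.00, (2, 2017): 2300.00, (2, 2018): 2400.00,
}

by_person = defaultdict(list)
by_year = defaultdict(list)
for (person, year), income in panel.items():
    by_person[person].append(income)
    by_year[year].append(income)

# Average over time for each person, and over people for each year
mean_income_by_person = {p: statistics.mean(v) for p, v in by_person.items()}
mean_income_by_year = {y: statistics.mean(v) for y, v in by_year.items()}

print(mean_income_by_person)
print(mean_income_by_year)
```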
Problem Statement
Project Title: Analyzing Factors Influencing Vehicle Prices
Research Questions:
• What vehicle attributes significantly affect the price?
• How do external factors (e.g., market trends, location) impact pricing?
• Can we predict vehicle prices accurately based on available data?
Step 2: Data Collection
Gather Historical Data: Collect advertisement data over the last few years, including:
Data Cleaning:
• Remove duplicates and irrelevant entries.
• Handle missing values appropriately (impute, drop, or fill).
• Standardize data formats (e.g., date formats, string casing).
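The three cleaning steps can be sketched as follows. The advertisement rows, the field names (`model`, `price`, `posted`), and the two accepted date formats are assumptions for illustration; missing prices are simply dropped here, which is only one of the options listed above.

```python
from datetime import datetime

# Hypothetical raw advertisement rows; field names are illustrative
raw_ads = [
    {"model": " Ford F-150 ", "price": "15000", "posted": "2019-03-01"},
    {"model": "ford f-150", "price": "15000", "posted": "03/01/2019"},  # same ad, different formatting
    {"model": "BMW X5", "price": "", "posted": "2018-07-15"},           # missing price
    {"model": "Honda Civic", "price": "7200", "posted": "2018-11-30"},
]

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed input formats

def parse_date(text):
    """Standardize a date string to a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def clean(rows):
    seen, cleaned = set(), []
    for row in rows:
        if not row["price"]:
            continue  # handle missing values: drop the row
        model = row["model"].strip().lower()  # standardize string casing
        record = (model, float(row["price"]), parse_date(row["posted"]))
        if record in seen:
            continue  # remove duplicates
        seen.add(record)
        cleaned.append({"model": record[0], "price": record[1], "posted": record[2]})
    return cleaned

cleaned_ads = clean(raw_ads)
print(cleaned_ads)
```

Note that the duplicate is only detectable after the formats are standardized, so the order of the steps matters.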
Feature Engineering:
• Create new variables if necessary (e.g., age of the vehicle, price per mile).
• Convert categorical variables into numerical ones using techniques like one-hot encoding.
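A minimal sketch of both feature-engineering steps, assuming hypothetical listing fields (`model_year`, `odometer`, `fuel`, `posted_year`): it derives vehicle age and price per mile, then one-hot encodes the fuel type by hand.

```python
# Hypothetical cleaned listings; field names are illustrative
listings = [
    {"price": 15000.0, "model_year": 2015, "odometer": 60000, "fuel": "gas", "posted_year": 2019},
    {"price": 7200.0, "model_year": 2010, "odometer": 120000, "fuel": "diesel", "posted_year": 2018},
    {"price": 30000.0, "model_year": 2018, "odometer": 0, "fuel": "electric", "posted_year": 2019},
]

# Fixed, sorted category list so every row gets the same one-hot columns
fuel_categories = sorted({row["fuel"] for row in listings})

def add_features(row):
    features = dict(row)
    # New variable: age of the vehicle when the ad was posted
    features["vehicle_age"] = row["posted_year"] - row["model_year"]
    # New variable: price per mile (undefined for zero mileage)
    features["price_per_mile"] = (
        row["price"] / row["odometer"] if row["odometer"] > 0 else None
    )
    # One-hot encoding: one 0/1 column per fuel category
    for category in fuel_categories:
        features[f"fuel_{category}"] = 1 if row["fuel"] == category else 0
    del features["fuel"]
    return features

engineered = [add_features(row) for row in listings]
print(engineered[0])
```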
Step 4: Exploratory Data Analysis (EDA)
Visualizations:
• Use histograms, scatter plots, and box plots to visualize price distributions and relationships.
• Examine correlation matrices to identify relationships between features.
Statistical Analysis:
• Calculate descriptive statistics (mean, median, mode) for key features.
• Perform hypothesis testing to determine significant factors influencing price.
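The descriptive statistics and one entry of a correlation matrix can be sketched with the standard library alone. The prices and vehicle ages below are hypothetical, and `pearson` is an illustrative helper computing the Pearson correlation coefficient; hypothesis testing is not covered in this sketch.

```python
import statistics
from math import sqrt

# Hypothetical listings: price and vehicle age in years
prices = [15000.0, 7200.0, 30000.0, 9500.0, 7200.0]
ages = [4, 8, 1, 7, 9]

summary = {
    "mean": statistics.mean(prices),
    "median": statistics.median(prices),
    "mode": statistics.mode(prices),
}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

r = pearson(ages, prices)
print(summary)
print(f"correlation(age, price) = {r:.2f}")
```

A strongly negative coefficient in data like this would suggest that older vehicles tend to be listed at lower prices, which is the kind of relationship a scatter plot would also make visible.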
Step 5: Modeling
Step 7: Reporting
Create a Presentation:
• Summarize findings in a clear and concise manner.
• Use visual aids (charts, graphs) to illustrate key points.
Recommendations:
• Suggest strategies for sellers based on the analysis.
• Propose improvements for Crankshaft List based on insights gained.
Step 8: Implementation
Integrate Findings:
• Work with relevant teams to implement findings into the platform.
• Monitor outcomes and adjust strategies based on ongoing analysis.
Data Reading
If the data is numeric, test its distribution:
if the distribution is symmetric, replace missing values with the mean.
Median Imputation (Numerical): Replace missing values with the median of the column. Use for skewed numerical data (e.g., income, house prices).
Forward/Backward Fill (Time Series): Propagate the previous/next value. Use for sequential or time-series data.
KNN Imputation (Both Numerical & Categorical): Use nearest neighbors to impute missing values. Use for complex datasets with relationships between features.
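Two of these strategies are simple enough to sketch without any library. In the snippet below, missing values are represented as `None`, the sample data is hypothetical, and the caller states whether the distribution is symmetric (how to test for symmetry is left open here). KNN imputation is omitted because it needs a distance computation over several features.

```python
import statistics

def impute_numeric(values, symmetric):
    """Fill None with the mean (symmetric data) or the median (skewed data)."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed) if symmetric else statistics.median(observed)
    return [fill if v is None else v for v in values]

def forward_fill(values):
    """Propagate the previous observed value into each gap."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None until a first value is seen
    return filled

# Skewed column (one very large income), so the median is used
incomes = [30000, 32000, None, 31000, 250000]
imputed_incomes = impute_numeric(incomes, symmetric=False)

# Ordered readings, so gaps take the previous reading
readings = [None, 20.5, None, None, 21.0]
filled_readings = forward_fill(readings)

print(imputed_incomes)
print(filled_readings)
```

The leading gap in `readings` remains unfilled because there is no earlier value to propagate; a backward fill would be needed to cover it.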