Review-1 Presentation
Machine learning based dynamic pricing for
perishable products
Supervised By: Prof. Sankarsan Sahoo
Group No.: C9
Neejara Dikshita Choudhury :- 2141003023
Shubham Swain :- 2141019401
Joydeep Sutradhar :- 2141019400
Debashree Priyadarshini :- 2141016343
Department of Computer Sc. and Engineering
Faculty of Engineering & Technology (ITER)
Siksha ‘O’ Anusandhan (Deemed to be University)
Bhubaneswar, Odisha
Introduction
Perishable products like fresh produce, dairy, and
meat require efficient inventory and pricing
strategies to minimize waste and maximize
profits. Traditional pricing models, such as fixed
discounts, fail to adapt to real-time demand
fluctuations, leading to significant losses. Studies
[1] indicate that 40% of fresh produce is wasted
due to ineffective sales strategies.
Fig 1.0
Problem Statement
In traditional retail, pricing perishable products is a balance between making the most
profit and reducing unsold stock. Fixed prices or fixed discounts don’t work well
because they don’t consider real-time customer demand, stock levels, or how long a
product will stay fresh. The challenge is to create a smart, automated pricing system
that adjusts based on these factors. This way, businesses can increase profits while
keeping waste to a minimum.
Motivation
• Enhance sales efficiency and maximize profit using TD3 (Twin Delayed Deep
Deterministic Policy Gradient).
• Reduce food wastage through better inventory management [2].
• Implement data-driven pricing strategies for improved decision-making [1].
• Benefit retailers and grocery stores with optimized pricing models [1].
Objectives
This project aims to develop a Machine Learning-based dynamic pricing strategy for
perishable products to optimize sales and minimize wastage. By analyzing factors like
expiration dates, demand fluctuations, and market trends, the system will adjust
prices dynamically to maximize revenue.
Expected Impacts
• Increased Revenue: Optimized pricing ensures higher profitability using the TD3 algorithm.
• Reduced Waste: Minimizes perishable product wastage through dynamic pricing
[2].
• Consumer Benefits: Encourages fair pricing and affordability for customers [1].
• Data-Driven Decision Making: Supports businesses with AI-powered sales
strategies [1].
• Sustainability Contribution: Reduces environmental impact by lowering food waste.
Literature Review
Improvements Over Existing Solution
• Unlike prior studies that assume a two-period lifetime, our model is designed to
handle perishable products with longer and variable shelf lives, providing a more
practical and scalable solution [2].
Work-flow Diagram
Fig 1.1
Key Components/Features & Modules
• Data Collection & Preprocessing
• Data Source: Collected from a retail dataset containing product details, demand factors,
pricing history, and expiration dates.
• Feature Engineering:
• Days_To_Expire = EXPIRATION_DATE - CURRENT_DATE
• Discount_Applied = Original_Price - Discounted_Price
• Data Cleaning:
• Handling missing values using median imputation.
• Removing duplicate entries.
• Feature Transformation:
• Used StandardScaler to normalize the features (a minimal preprocessing sketch follows below).
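A minimal sketch of the preprocessing steps listed above, assuming the retail dataset is available as a CSV file; the file name and column names (EXPIRATION_DATE, Original_Price, Discounted_Price, Demand_Factor, Stock_Level) are illustrative assumptions and may differ from the actual dataset.

```python
# Illustrative preprocessing sketch; file name and column names are assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("grocery_inventory.csv", parse_dates=["EXPIRATION_DATE"])

# Feature engineering
today = pd.Timestamp.today().normalize()
df["Days_To_Expire"] = (df["EXPIRATION_DATE"] - today).dt.days
df["Discount_Applied"] = df["Original_Price"] - df["Discounted_Price"]

# Data cleaning: median imputation for missing numeric values, drop duplicates
numeric_cols = ["Original_Price", "Discounted_Price", "Demand_Factor",
                "Stock_Level", "Days_To_Expire", "Discount_Applied"]
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
df = df.drop_duplicates()

# Feature transformation: standardize numeric features
scaler = StandardScaler()
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])
```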
Key Components/Features & Modules (Cont.)
• Machine Learning Model Training & Optimization
• Algorithm Used:
• TD3 (Twin Delayed DDPG) Reinforcement Learning for dynamic discounting.
• Training Setup:
• State Space: (Product price, demand factor, stock level, expiration days).
• Action Space: (Discount % applied dynamically).
• Reward = Profit - Overstocking Penalty - Expiration Loss
• Optimization:
• Learning Rate: 0.0003
• Training Episodes: 5000+ for convergence.
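A minimal sketch of the training setup just listed, assuming stable-baselines3 and gymnasium are available; the environment dynamics (demand response, penalty weights, initial state) are simplified placeholders rather than the project's actual simulator.

```python
# Illustrative TD3 setup; the pricing environment below is a toy model.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import TD3


class PerishablePricingEnv(gym.Env):
    """State: (price, demand factor, stock level, days to expire); action: discount fraction."""

    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Box(low=0.0, high=np.inf, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Box(low=0.0, high=0.9, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = np.array([100.0, 1.0, 50.0, 10.0], dtype=np.float32)
        return self.state, {}

    def step(self, action):
        price, demand, stock, days_left = self.state
        discount = float(action[0])
        sale_price = price * (1.0 - discount)
        units_sold = min(stock, demand * (1.0 + 5.0 * discount))  # toy demand response
        stock -= units_sold
        days_left -= 1.0
        profit = units_sold * sale_price
        overstock_penalty = 0.5 * stock                     # holding cost on unsold units
        expiration_loss = price * stock if days_left <= 0 else 0.0
        reward = profit - overstock_penalty - expiration_loss
        self.state = np.array([price, demand, stock, days_left], dtype=np.float32)
        terminated = bool(days_left <= 0 or stock <= 0)
        return self.state, float(reward), terminated, False, {}


env = PerishablePricingEnv()
model = TD3("MlpPolicy", env, learning_rate=3e-4, verbose=0)  # learning rate 0.0003
model.learn(total_timesteps=50_000)  # roughly corresponds to several thousand episodes
```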
Visualizations and Insight
Fig 1.2 Fig 1.3
Algorithms and Methods Used
• TD3 (Twin Delayed Deep Deterministic Policy Gradient)
Best Fit for continuous action spaces like flexible discounts
Key Strengths: Handles noise, avoids overestimation
• Heuristic Methods
Reward method
Probability model to estimate the demand factor (see the sketch below)
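The probability model for the demand factor is only summarized above; the sketch below shows one assumed functional form (not the project's exact heuristic), where purchase probability rises with the discount and falls as the product nears expiry.

```python
# Illustrative demand-factor heuristic; the functional form and constants are assumptions.
import numpy as np

def demand_factor(discount: float, days_to_expire: int, base_demand: float = 1.0) -> float:
    """Expected demand multiplier for a given discount and remaining shelf life."""
    freshness = 1.0 / (1.0 + np.exp(-(days_to_expire - 3)))  # drops toward 0 near expiry
    price_sensitivity = 1.0 + 2.0 * discount                 # deeper discounts lift demand
    return base_demand * freshness * price_sensitivity

print(demand_factor(discount=0.3, days_to_expire=5))  # e.g. ~1.41
```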
Technologies, Frameworks & Tools Used
• Programming Language: Python
Results and Analysis
User Input when DQN is implemented (good fit for discrete pricing actions; handles only
discrete data)
Fig 1.4
Results and Analysis
Visualization using DQN:
● Not efficient due to improper model training
Fig 1.5
Results and Analysis
User Input when A2C is implemented:
Decent Fit for both continuous & discrete actions but less stable than TD3
Fig 1.6
Results and Analysis
Visualization using A2C:
● A2C has high gradient variance and limited exploration.
Fig 1.7
Results and Analysis
User Input when TD3 is implemented: (Best Model Performance)
Fig 1.8
Results and Analysis
Visualization using TD3:
● Reduces overestimation bias with twin critics.
Fig 1.9
Conclusion and Future Work
Key Findings:-
● TD3 handled continuous action spaces (prices) better, giving more stable and realistic
pricing compared to A2C (high variance) and DQN (discrete-only).
● A well-shaped reward (profit vs. penalties) was key to training efficiency and realistic
pricing decisions.
● Flask integration proved effective for testing real-time predictions and making the
solution user-interactive (see the sketch after these findings).
● TD3 required more training time than DQN but produced more accurate and profitable
pricing decisions.
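A hedged sketch of the Flask integration mentioned in the findings above: a single endpoint that loads a saved TD3 policy and returns a suggested discount for a given product state. The model path and JSON field names are illustrative assumptions, not the project's actual interface.

```python
# Illustrative Flask endpoint; model path and request fields are assumptions.
import numpy as np
from flask import Flask, request, jsonify
from stable_baselines3 import TD3

app = Flask(__name__)
model = TD3.load("td3_pricing")  # hypothetical path to the trained policy

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    state = np.array([data["price"], data["demand_factor"],
                      data["stock_level"], data["days_to_expire"]], dtype=np.float32)
    action, _ = model.predict(state, deterministic=True)
    return jsonify({"suggested_discount": float(action[0])})

if __name__ == "__main__":
    app.run(debug=True)
```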
Conclusion and Future Work
Future Work:-
References
[2] T. Yavuz and O. Kaya, "Deep reinforcement learning algorithms for dynamic pricing and
inventory management of perishable products," Applied Soft Computing, art. no. 111864, 2024.
https://doi.org/10.1016/j.asoc.2024.111864
[3] J. Shen, Y. Wang, and F. Xiao, "Dynamic Pricing Strategy for Data Product Through Deep
Reinforcement Learning," IEEE Access, vol. 12, pp. 194829-194838, 2024.
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10810405
● Web Resources
○ Kaggle. (n.d.). Grocery Inventory and Sales Dataset www.kaggle.com