SDP EV2 Updated

The document presents a senior design project focused on developing a machine learning-based dynamic pricing strategy for perishable products to optimize sales and minimize waste. It highlights the limitations of traditional pricing models and proposes using the TD3 algorithm to dynamically adjust prices based on factors like demand and expiration dates. The project aims to enhance profitability, reduce food wastage, and support retailers with data-driven decision-making.

Senior Design Project

Review-1 Presentation
Machine Learning-Based Dynamic Pricing for Perishable Products
Supervised By: Prof. Sankarsan Sahoo

Group No.: C9
Neejara Dikshita Choudhury : 2141003023
Shubham Swain : 2141019401
Joydeep Sutradhar : 2141019400
Debashree Priyadarshini : 2141016343

Department of Computer Sc. and Engineering
Faculty of Engineering & Technology (ITER)
Siksha ‘O’ Anusandhan (Deemed to be) University
Bhubaneswar, Odisha
Introduction
Perishable products like fresh produce, dairy, and
meat require efficient inventory and pricing
strategies to minimize waste and maximize
profits. Traditional pricing models, such as fixed
discounts, fail to adapt to real-time demand
fluctuations, leading to significant losses. Studies
[1] indicate that 40% of fresh produce is wasted
due to ineffective sales strategies.

Fig 1.0
Problem Statement
In traditional retail, pricing perishable products is a balancing act between maximizing
profit and minimizing unsold stock. Fixed prices or fixed discounts work poorly because
they ignore real-time customer demand, stock levels, and how long a product will stay
fresh. The challenge is to build a smart, automated pricing system that adjusts prices
based on these factors, so businesses can increase profits while keeping waste to a
minimum.

Motivation
• Enhance sales efficiency and maximize profit using TD3 (Twin Delayed Deep
Deterministic Policy Gradient).
• Reduce food wastage through better inventory management [2].
• Implement data-driven pricing strategies for improved decision-making [1].
• Benefit retailers and grocery stores with optimized pricing models [1].
Objectives
This project aims to develop a Machine Learning-based dynamic pricing strategy for
perishable products to optimize sales and minimize wastage. By analyzing factors like
expiration dates, demand fluctuations, and market trends, the system will adjust
prices dynamically to maximize revenue.

Expected Impacts
• Increased Revenue: Optimized pricing ensures higher profitability using the TD3
algorithm.
• Reduced Waste: Minimizes perishable product wastage through dynamic pricing [2].
• Consumer Benefits: Encourages fair pricing and affordability for customers [1].
• Data-Driven Decision Making: Supports businesses with AI-powered sales
strategies [1].
• Sustainability Contribution: Reduces environmental impact by lowering food waste.
Literature Review

[1] Wenchuan Qiao et al. (2024)
• Purpose of the Study: Dynamic pricing of multiple perishable products
• ML Techniques Used: Multi-Agent Reinforcement Learning (MARL), Q-learning, DQN
• Dataset Used (Source): Simulated market data
• Input to Model: Market state, product prices, inventory levels, demand interactions
• Target Variable: Optimal dynamic pricing
• Evaluation Criteria: Learning speed, revenue optimization

[2] Tuğçe Yavuz, Onur Kaya (2024)
• Purpose of the Study: Dynamic pricing and inventory management for perishable products
• ML Techniques Used: Deep Reinforcement Learning (DRL)
• Dataset Used (Source): Simulated dataset
• Input to Model: Pricing, inventory levels, demand data
• Target Variable: Optimal pricing and stock levels
• Evaluation Criteria: Optimization performance, stability in stochastic settings

Table 1.0
Limitations of Existing Research
• Short Product Lifetimes Considered
• Qiao et al. [1] utilized multi-agent reinforcement learning (MARL) to optimize pricing for
perishable products, and Yavuz et al. [2] proposed a deep reinforcement learning (DRL)
model to enhance pricing and inventory management, reducing waste and maximizing
revenue; however, both assume short (e.g., two-period) product lifetimes.

• Limited Use of Advanced RL Algorithms
• Most studies, including Qiao et al. [1] and Yavuz et al. [2], rely on basic RL models like
DQN, with limited exploration of advanced RL techniques such as PPO or SAC for dynamic
pricing.

• No Simultaneous Multi-Age Product Pricing
• Qiao et al. [1] focused on single-age product pricing, overlooking multi-age inventory
interactions. Yavuz et al. [2] optimized pricing but did not address simultaneous pricing for
different product ages.
Improvements Over Existing Solutions

• Our project employs Reinforcement Learning (RL) techniques, specifically TD3
(Twin Delayed Deep Deterministic Policy Gradient), to dynamically adjust pricing
based on demand fluctuations and product aging.

• Unlike prior studies that assume a two-period lifetime, our model is designed to
handle perishable products with longer and variable shelf lives, providing a more
practical and scalable solution [2].

• By leveraging deep reinforcement learning (DRL), our model outperforms
traditional heuristic or simulation-based approaches by continuously improving
pricing strategies based on learned patterns from historical data [2].
Work-flow Diagram

Fig 1.1
Key Components/Features & Modules
• Data Collection & Preprocessing
• Data Source: Collected from a retail dataset containing product details, demand factors,
pricing history, and expiration dates.
• Feature Engineering:
• Days_To_Expire = EXPIRATION_DATE - CURRENT_DATE
• Discount_Applied = Original_Price - Discounted_Price
• Data Cleaning:
• Handling missing values using median imputation.
• Removing duplicate entries.
• Feature Transformation:
• Used StandardScaler to normalize the features.

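The preprocessing steps above can be sketched as follows. This is a minimal illustration on a toy stand-in for the retail dataset; the column names follow the slide, but the sample values and the `CURRENT_DATE` constant are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the retail dataset (column names follow the slide).
df = pd.DataFrame({
    "EXPIRATION_DATE": pd.to_datetime(["2024-06-10", "2024-06-03"]),
    "Original_Price": [100.0, 50.0],
    "Discounted_Price": [80.0, 45.0],
    "Stock_Level": [120, 30],
})
CURRENT_DATE = pd.Timestamp("2024-06-01")  # assumed reference date

# Feature engineering, exactly as defined above.
df["Days_To_Expire"] = (df["EXPIRATION_DATE"] - CURRENT_DATE).dt.days
df["Discount_Applied"] = df["Original_Price"] - df["Discounted_Price"]

# Data cleaning: median imputation for missing values, then drop duplicates.
num_cols = ["Days_To_Expire", "Discount_Applied", "Stock_Level"]
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
df = df.drop_duplicates()

# Feature transformation: StandardScaler gives each column zero mean, unit variance.
scaled = StandardScaler().fit_transform(df[num_cols])
```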
Key Components/Features & Modules (Cont.)
• Machine Learning Model Training & Optimization
• Algorithm Used:
• TD3 (Twin Delayed DDPG) reinforcement learning for dynamic discounting.
• Training Setup:
• State Space: product price, demand factor, stock level, days to expiration.
• Action Space: discount % applied dynamically.
• Reward = Profit - Overstocking Penalty - Expiration Loss
• Optimization:
• Learning Rate: 0.0003
• Training Episodes: 5000+ for convergence.
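One way to realize the three-term reward above is sketched below. The slide only fixes the structure Profit - Overstocking Penalty - Expiration Loss; the penalty coefficients and functional forms here are illustrative assumptions, not the project's tuned values.

```python
def compute_reward(units_sold, price, unit_cost, stock_level,
                   days_to_expire, expired_units,
                   overstock_coef=0.05, expiry_coef=1.0):
    """Reward = Profit - Overstocking Penalty - Expiration Loss.

    overstock_coef and expiry_coef are hypothetical tuning knobs;
    only the three-term structure comes from the slide.
    """
    profit = units_sold * (price - unit_cost)
    # Penalize inventory still on the shelf, more heavily as expiry nears.
    overstock_penalty = overstock_coef * stock_level / max(days_to_expire, 1)
    # Each expired unit forfeits its purchase cost.
    expiration_loss = expiry_coef * expired_units * unit_cost
    return profit - overstock_penalty - expiration_loss
```

For example, selling 40 units at a 1.0 margin with 60 units left, 3 days to expiry, and 5 expired units yields 40 - 1 - 7.5 = 31.5, so the agent is rewarded for clearing stock before it spoils.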

• Visualization & Insights
• Graphs (e.g., histograms and box plots)
Visualizations and Insights

Fig 1.2, Fig 1.3
Algorithms and Methods Used
• TD3 (Twin Delayed Deep Deterministic Policy Gradient)
Best fit for continuous action spaces like flexible discounts.
Key strengths: handles noise, avoids overestimation.

• A2C (Advantage Actor-Critic)
Decent fit for both continuous & discrete actions, but less stable than TD3.
Key strengths: balanced value learning, faster updates.

• DQN (Deep Q-Network)
Good fit for discrete pricing actions.
Key strengths: simple, effective in low-dimensional action spaces.

• Heuristic Methods
Reward method.
Probability model to find the demand factor.
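The practical difference between TD3 and DQN here is the action space: TD3 emits a bounded continuous action, while DQN picks from a fixed discrete menu of discounts. A minimal sketch of turning a continuous policy output into a discount, assuming the common [-1, 1] action range and an illustrative 50% cap (both assumptions, not values fixed by the slides):

```python
def action_to_discount(action, max_discount=50.0):
    """Map a continuous policy output in [-1, 1] to a discount percentage.

    The [-1, 1] range and the 50% cap are assumptions for illustration:
    TD3 needs a bounded continuous action, whereas DQN would instead
    select from a fixed discrete set of discount levels.
    """
    action = max(-1.0, min(1.0, action))          # clip out-of-range noise
    return (action + 1.0) / 2.0 * max_discount    # rescale to [0, max_discount]
```

This is why "flexible discounts" favor TD3: any percentage in the interval is reachable, instead of only a handful of preset discount steps.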
Technologies, Frameworks & Tools Used
• Programming Language: Python (implemented in a Jupyter Notebook)

• Libraries for Data Handling: Pandas, NumPy

• Machine Learning Frameworks: Scikit-learn, Stable-Baselines3, Gym (TD3 via Stable-Baselines3)

• Visualization Tools: Matplotlib, Seaborn

• Backend Processing: Flask API

• Dataset: Grocery Inventory and Sales Dataset (www.kaggle.com)
Results and Analysis
User input when DQN is implemented (good fit for discrete pricing actions; handles
discrete data only):

Fig 1.4

Results and Analysis
Visualization using DQN:
● Not efficient due to improper model training.

Fig 1.5

Results and Analysis
User input when A2C is implemented:
● Decent fit for both continuous & discrete actions, but less stable than TD3.

Fig 1.6

Results and Analysis
Visualization using A2C:
● A2C shows high gradient variance and limited exploration.

Fig 1.7

Results and Analysis
User input when TD3 is implemented (best model performance):

Fig 1.8

Results and Analysis
Visualization using TD3:
● Reduces overestimation bias with twin critics.

Fig 1.9
Conclusion and Future Work
Key Findings:

● TD3 handled continuous action spaces (prices) better, giving more stable and realistic
pricing compared to A2C (high variance) and DQN (discrete-only).

● Normalizing features like Stock_Level and Days_To_Expire significantly improved model
convergence and stability.

● A well-shaped reward (profit vs. penalties) was key to training efficiency and realistic
pricing decisions.

● Flask integration proved effective for testing real-time predictions and making the
solution user-interactive.

● TD3 required more training time than DQN but provided more accurate and profitable
pricing decisions.
Conclusion and Future Work
Future Work:

● Batch Processing: Upload CSV/Excel datasets instead of single inputs.

● Feature Matching: Automatically map uploaded features to the model's expected features.

● Generalization: Retrain or fine-tune the model to handle varied input distributions
across different sellers.

● Scalability: Automate optimal pricing for hundreds or thousands of products in one go.

● Real-time Integration: Enable live inventory and demand monitoring to support
continuous and dynamic price updates.

● User Experience (UX) Enhancements: Upgrade the React-based frontend to ensure a
more intuitive and interactive interface for users.
Bibliography
● References
[1] W. Qiao, M. Huang, Z. Gao, X. Wang, "Distributed dynamic pricing of multiple perishable
products using multi-agent reinforcement learning," Expert Syst. Appl. 237 (2024) 121252.
https://doi.org/10.1016/j.eswa.2024.121252

[2] T. Yavuz, O. Kaya, "Deep reinforcement learning algorithms for dynamic pricing and
inventory management of perishable products," Appl. Soft Comput. (2024) 111864.
https://doi.org/10.1016/j.asoc.2024.111864

[3] J. Shen, Y. Wang, F. Xiao, "Dynamic Pricing Strategy for Data Product Through Deep
Reinforcement Learning," IEEE Access, vol. 12, pp. 194829-194838, 2024.
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10810405

[4] S. B. Gadipudi, R. K. Kalaimani, "Reinforcement Learning for Dynamic Pricing under
Competition for Perishable Products," 28th International Conference on System Theory,
Control and Computing (ICSTCC), Sinaia, Romania, 2024.
https://www.sciencedirect.com/science/article/abs/pii/S1568494624006380

● Web Resources
○ Kaggle. (n.d.). Grocery Inventory and Sales Dataset. www.kaggle.com
