Reinventing Grocery Shopping With Reinforcement Learning
All content following this page was uploaded by Prof Yamarthi Narasimha Rao on 18 January 2025.
explored various technologies and methodologies, one of which is the utilization of agent-based systems powered by reinforcement learning (RL) algorithms.

II. OBJECTIVES

This work introduces a dynamic and adaptable grocery shopping recommendation system built on Q-learning, a reinforcement learning algorithm, combined with sentiment analysis. Its main aim is to apply sentiment analysis to user-generated reviews and thereby understand how grocery products are perceived. One objective is to develop and optimize the recommendation model from user input over successive Q-learning iterations. Another is to adapt the system to changing user choices, habits, and sentiment patterns. The work also aims to create an easy-to-use interface that integrates with online grocery purchasing systems, improving the user experience by making sentiment-driven product suggestions easier to navigate. A further objective is to shorten the recommendation system's learning curve so that accurate and relevant grocery product recommendations are available from the start. Improving suggestion accuracy is intended to increase customer satisfaction and engagement on grocery shopping platforms: the system offers suggestions that match consumers' tastes, making grocery shopping more pleasant and personalized. Evaluating the influence of the Q-learning-driven, sentiment-based recommendation system on user satisfaction and engagement is a central goal. Additional objectives are to help businesses understand grocery customer attitudes and preferences, and to help retailers use sentiment-driven analytics for product selection, promotion, and inventory management. Finally, the work examines whether sentiment-driven suggestions can optimize the grocery supply chain by affecting inventory, waste, and sustainability.

III. RELATED WORKS

In [1], researchers developed an agent-based grocery shopping system that automates shopping while considering consumer preferences. They employed a network of agents to acquire grocery store data, compared it against consumer preferences, and adapted over time based on user input. Social grocery shopping, where people share prices and quantities to obtain the greatest savings and most convenient shopping schedules, was established in [2]. This technique empowers customers to make better buying choices and save money. Their strategy relies on agents representing individual clients, which simplifies deployment and builds confidence by revealing which agents can supply trustworthy information. The authors ensure realism and relevance by using shopping lists based on the U.S. Consumer Price Index. The proposed solution addressed a frequent problem in conventional grocery shopping: the absence of price comparison tools and the hassle of visiting multiple locations to find the best discounts. To democratize the shopping experience and foster fairer relationships between consumers and shops, imagine an online platform where users can exchange pricing information and receive recommendations for the stores with the lowest overall costs for their preferred goods. However, effective implementation and incentivization tactics are needed to get consumers to actively engage and provide correct pricing data. The research evaluates the multi-agent shopping system's resilience to faults and manipulation, such as price misreporting or store pricing adjustments. Researchers tested the system's capacity to withstand obstacles and optimize client cost reductions using simulations with random and systematic errors. By assessing the approach's practicality and dependability, this study informs discussions on practical implementation and the possible advantages for customers in real-world grocery shopping situations. The experiments also showed that the proposed multi-agent shopping system can save money and remain resilient to pricing inaccuracies, which had not previously been examined.

Researchers developed an agent-based grocery shopping system to automate the shopping process by matching store data to consumer preferences [3]. Their solution uses user, information management, and store server role agents that cooperate to fulfill user objectives. The proposed system buys the best-suited products depending on the user's desire to save time and effort. It meets the grocery shopping system's functional criteria and supports the five phases of the customer purchasing behavior model. However, this study does not specifically discuss RL for adjusting to user preferences over time, which distinguishes it from later work.

Researchers introduced an agent-based shopping system to help supermarket consumers at home [4]. The lightweight agent implementation TEEMA (TRLabs Execution Environment for Mobile Agents) provides basic agent communication, migration, and location services using a microkernel approach. Name, storage, security, and database services may be added to TEEMA to improve functionality. The proposed system lets users send agents with shopping lists to chosen supermarkets, where they retrieve restricted pricing lists. The system protects user data during agent travel via a residential gateway. The system displays the search results and sends an agent to a supermarket via the residential gateway to get comprehensive price lists if the user visits. User location, home gateway, mobile terminal, and participating supermarkets are the logical components of the distributed system architecture. The authors noted that their agent-based system automates inventory management, helps supermarket shoppers choose based on price and availability, and provides real-time special-offer updates. They also addressed integration with legacy database and server technologies, and they acknowledge the problems and limits, notably in privacy protection, user authentication, and system resilience to faults or manipulation. Comparing the agent-based approach to supermarket shopping with others reveals its benefits and flaws in maximizing the customer shopping experience.

E-commerce has grown dramatically in recent decades due to advances in information and telecommunications technologies and changing lifestyles. Horizontal collaboration among enterprises may reduce transportation costs, improve service quality, reduce environmental effects, mitigate risk, and increase market share, according to researchers [5]. Recently, e-grocery, especially fresh produce, has become the most cost-effective and time-efficient delivery method. Food safety, storage temperature, and perishability are logistical concerns for e-groceries. The authors examined how cooperation-based initiatives affected service quality in Pamplona supermarkets
[6]. The process began with a rigorous Pamplona survey to model e-grocery demand. Consumers demand longer shelf life, but merchants prefer to send shorter-lived commodities first to prevent food waste. Second, using the survey data, the authors created an agent-based simulation model for cooperative and non-cooperative scenarios. The simulation framework generates and solves a Vehicle Routing Problem using a biased randomization approach. In conclusion, horizontal collaboration in e-grocery delivery reduces lead times and improves consumer satisfaction. Table I lists the e-grocery contributions.

TABLE I. CONTRIBUTIONS TO E-GROCERY, HORIZONTAL COOPERATION, AND AGENT-BASED SIMULATION [6]

[1] Method: An agent-based grocery shopping system automates shopping by gathering information from multiple stores, comparing it with user preferences, and adapting over time through feedback.
    Demerits: As the number of users and transactions increases, managing a large network of agents and ensuring efficient communication between them may become challenging. This complexity could hinder the system's ability to scale effectively to accommodate a growing user base and handle increasing transaction volumes.

[2] Method: The proposed method entails customers exchanging information on item prices and quantities to optimize savings and convenience, facilitated by agents representing customers.
    Demerits: The system's reliance on gathering and analyzing user data to tailor recommendations raises significant privacy and security concerns. Collecting sensitive information about users' preferences, purchasing habits, and location data may pose risks if not adequately protected.

[3] Method: The proposed method involves an agent-based grocery shopping system that consists of three role agents: a user agent, an information management agent, and a store server agent, which cooperate to purchase groceries according to user preferences.
    Demerits: Addressing algorithmic bias requires ongoing monitoring, evaluation, and mitigation strategies to ensure equitable treatment for all users. Without careful design and oversight, the system may inadvertently favor certain demographics or product categories over others, leading to disparities in recommendation accuracy and fairness.

[4] Method: The method involves employing TEEMA, an agent platform, to facilitate agent-based shopping, enabling users to send agents with shopping lists to select supermarkets, retrieve price lists, and receive real-time updates on special offers, ensuring privacy and compatibility with legacy software.
    Demerits: While the system aims to automate and optimize the grocery shopping process, it may limit user control and transparency over decision-making. Users may feel disenfranchised if they perceive the system as making decisions on their behalf without sufficient input or explanation. Providing users with greater control over preferences, recommendations, and decision-making processes, as well as enhancing transparency in how the system operates, is essential to foster trust and user acceptance.

[5] Method: An agent-based simulation model to evaluate the effect of horizontal cooperation on lead times and customer satisfaction in e-grocery distribution.
    Demerits: The effectiveness of the system is contingent on various external factors, including the availability and accuracy of data from grocery stores, fluctuations in market conditions, and the reliability of communication networks. Any disruptions or inaccuracies in these external factors could compromise the system's performance and user experience. Mitigating dependencies may require redundancies, fallback mechanisms, and continuous monitoring to ensure robustness and resilience.

[6] Method: The proposed method is an agent-based micromodel for simulating spatial choice in grocery shopping behavior based on individual populations.

An agent-based supermarket shopping system tackles persistent issues in conventional grocery buying. With traditional techniques, consumers must search several retailers for desired products at competitive prices [7]. These strategies may also fail to meet customer preferences, resulting in inferior selections. Given these inefficiencies, the proposed solution uses modern technology and algorithms to transform grocery shopping. The e-commerce and consumer behavior literature emphasizes the value of individualized suggestions for customer satisfaction and engagement. Researchers such as [8] have studied customized recommendation systems, which adjust product recommendations to customer preferences. Traditional recommendation fails to account for price changes and item availability in real time. The proposed approach bridges this gap by incorporating RL algorithms that allow agents to learn and adjust their decision-making processes in changing settings, giving users more customized and relevant suggestions. RL algorithms can optimize decision-making in dynamic contexts, as shown by Sutton and Barto's pioneering work. Modeling the grocery shopping problem as a Markov Decision Process (MDP) [9] and using Q-learning allows agents to make intelligent store-selection and item-purchasing choices. This method optimizes grocery shopping by exploiting RL algorithms' capacity to learn from experience and modify decision-making rules [10]. Unlike the static recommendation systems used on standard e-commerce platforms, the proposed solution uses RL algorithms. The system uses RL to adapt to changing customer preferences, market circumstances, and store offers in real time, making shopping more customized and efficient [11]. The proposed solution also follows e-commerce automation and optimization trends, which aim to use cutting-edge technology to solve long-standing problems and improve customer satisfaction [12]. In conclusion, the agent-based grocery purchasing system combines e-commerce, customized recommendation systems, and reinforcement learning. The system uses powerful algorithms and real-time data to improve grocery shopping by increasing convenience, customization, and efficiency. Through iterative improvement and adaptation, it could make grocery shopping smoother, more personalized, and more delightful for customers globally [13].
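The MDP framing cited above can be written out explicitly. The notation below is our own illustrative sketch (neither [9] nor this paper states these formulas in this form): the state pairs the agent's current store with a binary vector of already-purchased items, and the standard Q-learning and value-iteration updates then apply.

```latex
% State: current store c and binary purchased-items vector b
s = (c, \mathbf{b}), \qquad c \in \{1, \dots, K\}, \qquad \mathbf{b} \in \{0, 1\}^{n}

% Q-learning update after taking action a in state s, observing reward r and next state s'
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]

% Bellman optimality update used by value iteration
V(s) \leftarrow \max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V(s') \right]
```

Here the actions a range over the stores the agent may travel to, and the reward r bundles the purchase bonuses, item prices, and travel distances described in Section IV.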
IV. PROPOSED METHODOLOGY

The complete process of the grocery shopping system is shown in Fig. 1.

A. Methodology

The code initializes the problem parameters in Data Setup. This includes MRPs, stores, goods, etc. A dictionary stores the MRPs, with prices for each item. The state space defined in the state-space definition phase includes all agent states. Each state has two parts: the store number and the item purchase status. Whether an item has been purchased is represented by a binary vector. Transition probabilities define the chance of changing states after an action, and rewards are the immediate benefits or costs of moving from one state to another. Transition probabilities and rewards for each action (choosing a store) from one state to another are assigned in the next phase, taking into account store availability, pricing, and distances.

B. Q-Learning Algorithm

Q-learning, a model-free reinforcement learning algorithm, learns optimal policies. For each state-action pair, the Q-value represents the predicted future benefit of taking that action in that state. The Q-learning function iterates across multi-step episodes. At each step, the agent chooses an action based on an epsilon-greedy strategy (balancing exploration and exploitation), updates Q-values based on the reward and transition, and accumulates rewards. After Q-learning, the code evaluates the resulting policy against a random policy: it simulates purchasing situations and compares the rewards earned by both. Collecting rewards from Q-learning episodes and evaluating the learned policy against the random policy are done in this module, which visualizes policy learning and performance. Each state's best action (store) indicates the learned optimal policy. Reward plots during learning and testing are also shown to assess policy performance and learning. Utility functions implement the Q-learning algorithm and the reward calculation; they compute penalties using item prices, store distances, item availability probabilities, and transition probabilities. The Q-learning algorithm runs for a set number of episodes and steps, and Q-values are updated based on rewards and transitions to improve the learned policy. This component gathers rewards and compares the learned policy against a random policy across many scenarios, measuring learned-policy performance against the baseline (random) policy. This research aims to optimize purchasing selections across several stores using Q-learning to evaluate product availability, pricing, and store distances. Q-learning is a model-free RL algorithm that learns the best action-selection policy for an environment without a dynamics model. For grocery shopping optimization, Q-learning helps the autonomous agent learn the expected cumulative benefit of taking a certain action in a given state. The Q-learning implementation is shown in Fig. 2.

Fig. 2. Implementation using Q-Learning

In Fig. 3, a model-based RL technique called value iteration computes the optimal value function for each environment state, indicating the expected cumulative reward from that state onward. Value iteration calculates the optimal value function for grocery shopping optimization by updating each state's value according to the Bellman equation, which describes the link between a state's value and the values of its neighboring states. The agent initializes the value function arbitrarily and iteratively updates it until convergence, bringing the values closer to their optimal values. After computing the optimal value function, the agent can choose the action with the greatest expected value in each state to maximize cumulative rewards. Value iteration is beneficial for grocery shopping optimization when the shopping environment is well defined, because it applies a systematic approach to finding the best policy under known dynamics.
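The paper does not include its code, but the tabular loop described above (epsilon-greedy store selection, with rewards combining per-item purchase bonuses, item prices, and travel distances) can be sketched on a toy instance. All store counts, prices, distances, and hyperparameters below are invented for illustration and are not the paper's actual values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: 3 stores, 2 items. prices[s][i] = price of item i at
# store s; dist[s][s2] = travel cost between stores (all numbers made up).
prices = np.array([[4.0, 7.0],
                   [5.0, 6.0],
                   [3.5, 8.0]])
dist = np.array([[0.0, 2.0, 4.0],
                 [2.0, 0.0, 3.0],
                 [4.0, 3.0, 0.0]])
n_stores, n_items = prices.shape

# State = (current store, bitmask of purchased items); action = next store.
n_states = n_stores * (1 << n_items)

def sid(store, mask):
    """Flatten (store, purchased-bitmask) into a single state index."""
    return store * (1 << n_items) + mask

def step(store, mask, action):
    """Travel to `action` store and buy every still-missing item there.
    Reward = fixed bonus per item bought, minus its price and travel cost."""
    reward = -dist[store][action]
    for item in range(n_items):
        if not mask & (1 << item):
            reward += 10.0 - prices[action][item]
            mask |= 1 << item
    return action, mask, reward

# Tabular Q-learning with an epsilon-greedy behaviour policy.
alpha, gamma, eps = 0.1, 0.9, 0.1
Q = np.zeros((n_states, n_stores))
for episode in range(2000):
    store, mask = 0, 0                      # start at store 0, nothing bought
    for _ in range(5):                      # step limit per episode
        s = sid(store, mask)
        if rng.random() < eps:              # explore
            a = int(rng.integers(n_stores))
        else:                               # exploit
            a = int(np.argmax(Q[s]))
        store, mask, r = step(store, mask, a)
        s2 = sid(store, mask)
        # Q-learning update: bootstrap on the next state's best Q-value.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        if mask == (1 << n_items) - 1:      # all items purchased: episode ends
            break

# Read off the learned policy's first store choice from the start state.
print(int(np.argmax(Q[sid(0, 0)])))
```

The update line is exactly the epsilon-greedy tabular scheme this section describes; a fuller implementation would add the stochastic item-availability and transition probabilities mentioned above rather than this deterministic toy transition.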
Fig. 3. Implementation of Value Iteration

C. Proposed Model

This work proposes a solution to grocery buying problems. An ecosystem resembling a set of food stores was constructed. Each store has its own inventory, availability, and pricing patterns. Distances between each pair of stores in the ecosystem are also defined; real-world or simplified distances may be used for simulation. The agent's state comprises its present store and its purchasing status, which records whether each item on the shopping list has been purchased. The agent may move between stores depending on the situation; a list of stores the agent may visit from the present store is provided. Given a state, the agent acts according to a policy, which reinforcement learning algorithms such as Q-learning or policy iteration can learn. Agents earn positive rewards for finding a needed item in a store; rewards may be fixed or depend on the item's price. If the item is not found at the store, the agent may be penalized to encourage searching elsewhere. The agent may be penalized for traveling between stores, to reduce distance, and for item costs, prompting it to locate cheaper options. The transition model calculates the probability of changing states depending on the agent's action and its result. Historical data or a distribution (e.g., a Bernoulli distribution) may model a store's item availability, and distance and traffic circumstances may affect the likelihood of switching stores. The transition probabilities may also account for merchandise pricing, preferring stores with cheaper costs.

A. Experimental Setup

Multi-core processors with clock speeds of 2.5 GHz or greater are suggested for reinforcement learning algorithms, especially during training. Reinforcement learning tasks are memory-intensive, particularly with big datasets, hence 16 GB of RAM is recommended. NVIDIA GPUs with CUDA support accelerate reinforcement learning model training; at least 4 GB of GPU VRAM is recommended. An SSD of at least 512 GB is suggested for quicker data access, model storage, and retrieval. Downloading datasets, model updates, and cloud-based training or deployment need a reliable, fast internet connection. Many machine learning tools and frameworks work well with Linux-based operating systems such as Ubuntu 18.04 or above. The preferred language for reinforcement learning algorithms is Python 3.x; NumPy, PyTorch, and OpenAI Gym are essential. The deep learning frameworks TensorFlow and PyTorch implement reinforcement learning neural network designs. Jupyter Notebook or Visual Studio Code may be used to code, test, and debug reinforcement learning systems. Unity ML-Agents or bespoke OpenAI Gym environments may simulate grocery shopping.

B. Results

Fig. 4 shows the connection between episode rewards and episode count, revealing the Q-learning algorithm's learning ability; the increasing trend shows its capacity to optimize rewards as it learns. In Fig. 4, the agent interacts with the grocery shopping environment, making choices (actions) based on its present state and getting feedback (rewards) depending on its actions. During each interaction, the agent updates its Q-values using the Q-learning update rule, which incorporates the observed reward and the maximum Q-value of the following state. The agent learns the ideal Q-values via repeated interactions and updates, allowing it to choose actions at various stages to optimize cumulative rewards such as cost savings, time efficiency, and customer satisfaction. Fig. 5 compares Q-learning rewards to random-policy rewards over 100 experiments. Each test starts from a random state, follows the respective policy, and earns rewards until the episode ends or a step limit is met. The x-axis shows the test number and the y-axis the rewards, enabling comparison of the two policies' performance. The graph in Fig. 6 runs value iteration 1000 times to obtain the best policy, then compares the rewards from a random policy with those of the optimal policy from value iteration, summing and displaying them for visual comparison.
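As a companion to the Fig. 6 comparison, a value-iteration sweep on the same kind of toy instance can be sketched as follows. The numbers are again invented, and for brevity the transitions are deterministic rather than the probabilistic model the paper describes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same illustrative setup as the Q-learning sketch: 3 stores, 2 items.
prices = np.array([[4.0, 7.0], [5.0, 6.0], [3.5, 8.0]])
dist = np.array([[0.0, 2.0, 4.0], [2.0, 0.0, 3.0], [4.0, 3.0, 0.0]])
n_stores, n_items = prices.shape
full = (1 << n_items) - 1
n_states = n_stores * (1 << n_items)

def sid(store, mask):
    return store * (1 << n_items) + mask

def step(store, mask, action):
    """Deterministic toy transition: travel, then buy all missing items."""
    reward = -dist[store][action]
    for item in range(n_items):
        if not mask & (1 << item):
            reward += 10.0 - prices[action][item]
            mask |= 1 << item
    return action, mask, reward

# Value iteration: sweep the Bellman optimality update until convergence.
gamma = 0.9
V = np.zeros(n_states)
for _ in range(1000):
    V_new = np.zeros(n_states)
    for store in range(n_stores):
        for mask in range(full + 1):
            if mask == full:
                continue                    # terminal: everything bought
            best = -np.inf
            for a in range(n_stores):
                s2_store, s2_mask, r = step(store, mask, a)
                best = max(best, r + gamma * V[sid(s2_store, s2_mask)])
            V_new[sid(store, mask)] = best
    if np.max(np.abs(V_new - V)) < 1e-9:    # converged
        V = V_new
        break
    V = V_new

def greedy(store, mask):
    """Act greedily with respect to the computed value function."""
    return max(range(n_stores),
               key=lambda a: step(store, mask, a)[2]
               + gamma * V[sid(*step(store, mask, a)[:2])])

def random_policy(store, mask):
    return int(rng.integers(n_stores))

def run(policy):
    """Roll out one episode from the start state and return its total reward."""
    store, mask, total = 0, 0, 0.0
    for _ in range(5):
        store, mask, r = step(store, mask, policy(store, mask))
        total += r
        if mask == full:
            break
    return total

opt = run(greedy)
rnd = np.mean([run(random_policy) for _ in range(100)])
print(opt >= rnd)
```

On this toy problem the greedy policy derived from the converged value function never does worse than the random baseline, which is the qualitative behavior the Fig. 6 comparison reports.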
Fig. 5. Comparison of Q-Learning with Random Algorithm for Rewards vs. Test No.

Fig. 6. Value Iteration vs. Random Algorithm

VI. CONCLUSION AND FUTURE SCOPE

AI-enabled grocery shopping optimization offers several exciting research and development opportunities. For a truly customized shopping experience, RL algorithms can be refined to provide more personalized recommendations based on diet, health goals, and culture. The present work focused on extending multi-agent system research to model complex customer, merchant, and stakeholder interactions, increasing joint optimization and system efficiency. The work aimed at developing adaptable RL algorithms that adjust shopping strategies to changes in product availability, pricing, and store layout. Sustainability principles in the RL framework promote food waste reduction, eco-friendly product selection, and carbon-efficient transportation. The present work focused on

References

[1] Joo, Kwang Hyoun, Tetsuo Kinoshita, and Norio Shiratori. "Agent-based grocery shopping system based on user's preference." Proceedings Seventh International Conference on Parallel and Distributed Systems: Workshops. IEEE, 2000.
[2] Du, Hongying, and Michael N. Huhns. "A multiagent system approach to grocery shopping." Advances on Practical Applications of Agents and Multiagent Systems: 9th International Conference on Practical Applications of Agents and Multiagent Systems. Berlin, Heidelberg: Springer, 2011.
[3] Joo, Kwang Hyoun, Tetsuo Kinoshita, and Norio Shiratori. "Design and implementation of an agent-based grocery shopping system." IEICE Transactions on Information and Systems 83.11 (2000): 1940-1951.
[4] Benedicenti, L., Xuguang Chen, Xiaoran Cao, and R. Paranjape. "An agent-based shopping system." Canadian Conference on Electrical and Computer Engineering 2004 (IEEE Cat. No. 04CH37513), Niagara Falls, ON, Canada, 2004, pp. 703-705 Vol. 2, doi: 10.1109/CCECE.2004.1345210.
[5] Serrano-Hernandez, Adrian, et al. "Agent-based simulation improves e-grocery deliveries using horizontal cooperation." 2020 Winter Simulation Conference (WSC). IEEE, 2020.
[6] Schenk, Tilman A., Günter Löffler, and Jürgen Rauh. "Agent-based simulation of consumer behavior in grocery shopping on a regional level." Journal of Business Research 60.8 (2007): 894-903.
[7] Chen, Yu-San, and Chang-Franw Lee. "Agent-based simulation of consumer purchasing behaviour in a virtual environment." Proceedings of the 2009 International Conference on Artificial Intelligence (ICAI'09).
[8] Mangioni, Giuseppe, Rosario Sinatra, Vincenzo Nicosia, and Vito Latora. "A multi-agent system for modelling consumer behaviour in supermarkets." Journal of Artificial Societies and Social Simulation.
[9] Kaniovski, Yuri, and Martin Summer. "Agent-based modeling of store choice dynamics." Journal of Artificial Societies and Social Simulation.
[10] Zsolt, Kozma, and Máté Csorba. "Agent-based simulation of consumer behaviour in the retail sector." 7th International Conference on Applied Informatics, 2015.
[11] Piccione, Michele. "Agent-based modeling in consumer economics." Annual Review of Economics, 2016.
[12] Boskovic, Pavle, Srdjan Boskovic, and Aleksandar Ivanovic. "Agent-based model of consumer decision making process in online grocery shopping." 2018 17th International Symposium INFOTEH-JAHORINA (INFOTEH).
[13] Park, Jungsoo, and Hyunju Park. "Agent-based simulation for understanding consumers' shopping behavior in virtual supermarket." 2014 International Conference on Control, Automation and Information Sciences (ICCAIS).