RL Report TEAM - 12
Submitted by:
PRATHAM V 20221CSG0082
CHETHSN S D 20221CSG0080
Section: 6CSG02
Year: 2025
Cover Page
Table of Contents
Abstract
Introduction
Problem Analysis
Methodology
Results
Conclusions
Bibliography
Abstract
Previous studies have used prescriptive process monitoring to find actionable
policies in business processes and conducted case studies in similar domains,
such as the loan application process and the traffic fine process. However, care
processes tend to be more dynamic and complex. For example, at any stage of a
care process, a multitude of actions is possible. In this paper, we follow the
reinforcement learning approach and train a Markov decision process using event data
from a care process. The goal was to find optimal policies for staff members when
clients are displaying any type of aggressive behavior. We used the reinforcement
learning algorithms Q-learning and SARSA to find optimal policies. Results
showed that the policies derived from these algorithms are similar to the most
frequent actions currently used but provide the staff members with a few more
options in certain situations.
This study investigates how reinforcement learning (RL) can optimize response
strategies in care processes involving aggression by clients with intellectual
disabilities. Using data from 21,384 reported incidents in Dutch care facilities, we
train Markov Decision Processes (MDPs) and apply Q-learning and SARSA.
Results show that RL-derived policies align with current strategies but offer
additional effective choices, especially in more complex scenarios. RL thus holds
promise for enhancing decision-making in dynamic healthcare settings.
Introduction
Prescriptive process monitoring focuses on analyzing process execution data to not only
predict the future behavior of a process but also provide actionable recommendations or
interventions to optimize the process [1, 2, 3]. It goes beyond descriptive or predictive
process monitoring by actively suggesting specific actions or decisions for improving process
performance, compliance, or efficiency. Considering the decision points in business
processes, the ability to offer specific guidance to users regarding optimal actions is crucial, as
it can lead to improved decision-making and efficiency.
One prominent approach is to use reinforcement learning, which learns online by interacting
with an environment to adapt and improve its recommendations over time. The
environments can be learned and built using the historical execution traces and the feedback
they received. While reinforcement learning methods have been applied in business processes,
healthcare processes exhibit distinct characteristics and present new challenges for these
techniques [4], such as dynamic workflows, diverse stakeholders, and patient safety
considerations. In particular, patients may exhibit very diverse statuses, and a wide range of
actions is possible at any stage. Moreover, each patient may react differently to these actions.
These challenges may cause RL methods not to converge or not be able to improve the current
policies. In such dynamic settings, it is worth investigating the validity and effectiveness of
the RL approaches.
In this paper, we focus on the healthcare domain, and in particular, the process of actions
and responses in the aggression incidents by clients with intellectual impairments in
community care facilities [5]. Being in aggressive situations can have a severe impact on staff
members since there is a mediation effect between experiencing aggressive behavior from
clients and burnout through fear of assault [6]. This means that experiencing aggressive
behavior leads to fear of assault, which in turn leads to burnout. It also has a negative impact
on the clients themselves because aggressive behavior can lead to more aggressive behavior
[7]. Therefore, learning the optimal way to act during aggression incidents helps de-escalate
the incidents and reduce the negative impact.
Previous studies have analyzed the aggression incidents of such clients within Dutch
residential care facilities using a process mining approach [8] or by proposing to mine potential
causal patterns [9, 10, 11]. This meant that insights into the use of actions and their effects
could be made visible to show which actions had a negative and which actions had a positive
outcome in each situation. However, this approach can only provide recommendations for a
single incident and does not take consecutive incidents and their consequences into account.
In this paper, we investigate the use of prescriptive process monitoring, inspired by [12],
particularly reinforcement learning techniques, for this healthcare process, in which the
optimal policies of the best possible action in a given situation (or state) can be determined.
First, we train a Markov Decision Process (MDP) from the aggression incident log [10].
Second, we apply reinforcement learning techniques, aiming to find optimal policies for staff
members to minimize aggressive incidents by clients with intellectual impairments. We use
the model-free, value-based control algorithms Q-learning and SARSA. The reason for
choosing these methods, rather than the Monte Carlo methods used in [12], stems from their
practical advantage of achieving earlier convergence on stochastic processes [13].
The structure of the paper is as follows. Section 2 discusses the related work. Then we
explain the methods in Section 3, including the description of the data set and the design of
the environment. Section 4 presents the results, and Section 5 discusses the results. Section 6
concludes the paper.
Reinforcement Learning (RL), a machine learning paradigm that learns optimal policies
through interactions with an environment, holds significant promise for supporting such
decision-making. RL agents iteratively improve their strategies by receiving feedback in the
form of rewards or penalties, making them ideal for environments where sequential decisions
have long-term consequences.
Applying RL to healthcare, however, presents unique challenges. Care processes are
highly dynamic, involving numerous stakeholders, fluctuating patient conditions, and a wide
range of possible interventions. Each patient may react differently to similar actions, and the
environment rarely remains stable over time. This variability can hinder the convergence of
RL algorithms and complicate the design of reward functions and state transitions.
In this context, the study investigates how RL can be used to support staff in managing
aggressive incidents involving clients with intellectual disabilities. The goal is to derive
optimal response policies that help minimize the escalation of such incidents and support the
well-being of both clients and staff members.
To achieve this, the study trains a Markov Decision Process (MDP) on real-world event
data and applies two model-free, value-based RL algorithms: Q-learning and SARSA. These
are chosen over Monte Carlo methods due to their superior performance in stochastic
environments, where outcomes are probabilistic and data is often noisy. Specifically,
Q-learning and SARSA are known for their ability to converge more quickly and reliably in such
settings, making them better suited for the unpredictable nature of healthcare processes.
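For reference, the one-step updates that distinguish the two algorithms are the standard temporal-difference rules from [13], where s′ and r are the observed next state and reward, and a′ is the greedy next action for Q-learning but the action actually selected next for SARSA:

Q(s, a) ← Q(s, a) + α [ r + γ max_a′ Q(s′, a′) − Q(s, a) ]    (Q-learning)
Q(s, a) ← Q(s, a) + α [ r + γ Q(s′, a′) − Q(s, a) ]           (SARSA)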
This paper thus contributes to the field of prescriptive process monitoring by evaluating the
feasibility and effectiveness of RL in a sensitive and complex domain, offering insights into
both its potential and its limitations.
Problem Analysis
Research in prescriptive process monitoring has been done in the past couple of years,
mainly with a focus on business processes. Fahrenkrog-Petersen et al. [1] used it to build a
framework that parameterized a cost model to assess the cost–benefit trade-off of generating
alarms. Bozorgi et al. [14] researched it in the context of reducing the cycle time of general
supply chain processes. Both use supervised learning methods instead of reinforcement
learning methods and predict a threshold value that, when exceeded, recommends an action.
The algorithms themselves do not make a recommendation; only predictions are made, and
based on the predictions, a user-defined action is recommended.
Weinzierl et al. [15] also made this remark and proposed an alternative approach to
prescriptive process monitoring consisting of a learning phase and a recommendation phase,
in which the recommendation gives the next best action to take. Branchi et al. [12] used
prescriptive process monitoring with Monte Carlo methods to determine the best actions to
lend out loans and ensure most traffic fines are paid.
The Monte Carlo methods are valid algorithms, although TD methods such as Q-learning
and SARSA tend to converge earlier on stochastic processes in practice [13]. In this paper,
we use Q-learning and SARSA to find optimal policies.
Aggressive behavior in care facilities affects both staff and clients, leading to burnout and
escalation if not managed effectively. Previous research used process mining but focused on
isolated incidents. The current approach aims to model aggression incidents as sequential
decision-making problems using RL, which can capture long-term effects of actions across
episodes. This is framed as a Markov Decision Process (MDP) where aggression types are
states, actions are staff responses, and transitions are learned from historical data.
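Concretely, this framing can be summarized as follows, with the state and action sets taken directly from the event log described in the Methodology section (the eighth reported countermeasure, 'preventive measures started', is removed during preprocessing):

S = {va, pp, po, sib, Tau}            the four aggression types plus the terminal state Tau
A = {talk to client, held with force, no measure taken, seclusion,
     send to another room, distract client, terminate contact}
P(s′ | s, a)                          estimated from the historical incident data
R(a, s′)                              severity-based rewards (see Table 3)
γ                                     a discount factor treated as a hyperparameter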
Methodology
This section describes the methods used in the research. First, we describe the data set. We
then explain the preprocessing steps and the way the environment is built. Finally, we discuss
the evaluation measures used.
Data set
The data set is from a Dutch residential care organization with several facilities. The event
data contains 21,384 reported aggression incidents from 1,115 clients with intellectual
impairments. The data has been anonymized for privacy reasons. The incidents were
reported by staff members between the 1st of January 2015 and the 31st of December
2017. The event data includes attributes such as the date of the incident, a pseudonymized
client ID, the type of aggression, the countermeasure that the staff took, and the type of
persons involved (such as family, staff members, and other clients).
Table 1. A snippet of the incident data where the last column describes the countermeasures taken by staff
members to stop the aggression.
In the event data, four types of aggression are reported: verbal aggression (va),
physical aggression against people (pp), physical aggression against objects (po), and
self-injurious behavior (sib). Eight distinct countermeasures are reported by the staff members:
talk to the client, held with force, no measure taken, seclusion, send to another room, distract
client, terminate contact, and starting preventive measures.
To use reinforcement learning with this dataset, we preprocess the data. We follow the same
steps as in [10]. First, we add the type of the next aggression incident as an attribute of the
current event, in order to create triples that contain the type of the current aggression, the
countermeasure taken by a staff member, and the type of the next aggression. The aim is to use
the aggression types as the states a client is in and use the countermeasures as actions. Such a
triple describes a transition from one state to the next state after taking an action.
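As an illustration, this first preprocessing step could be sketched in Python with pandas as follows; the file name and the column names (client_id, date, aggression_type, measure) are placeholders for the actual attributes of the anonymized log, which is not public:

import pandas as pd

# Load the (anonymized) incident log; file and column names are illustrative.
log = pd.read_csv("incidents.csv", parse_dates=["date"])

# Sort each client's incidents by time and attach the type of the next
# incident of the same client as an attribute of the current event.
log = log.sort_values(["client_id", "date"])
log["next_aggression_type"] = log.groupby("client_id")["aggression_type"].shift(-1)

# Each row now carries the triple (current aggression type, countermeasure
# taken, type of the next aggression incident).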
In the second step, we group incidents into episodes. According to a behavioral expert at
the care organization [10], an episode is a sequence of incidents by the same client that
occurred after each other, where the time between incidents is less than or equal to nine days.
Following this domain knowledge, we segment the sequences of incidents into episodes.
When two consecutive incidents eᵢ and eᵢ₊₁ of a client are more than nine days apart, we insert
a Tau after eᵢ as the final state of an episode. The incident eᵢ₊₁ is the start state of the next
episode. An overview of the approach is shown in Figure 1.
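Continuing the sketch above, the nine-day rule could be applied as follows; how a client's last recorded incident (which has no next incident) is handled is a simplification of this sketch:

# Group each client's incidents into episodes: a new episode starts whenever
# the gap to the client's previous incident exceeds nine days.
gap_to_previous = log.groupby("client_id")["date"].diff()
new_episode = gap_to_previous.isna() | (gap_to_previous > pd.Timedelta(days=9))
log["episode_id"] = new_episode.cumsum()

# The current incident transitions to the terminal state Tau when the next
# incident of the same client is more than nine days away. Last incidents
# without any next incident keep a missing next state and are treated as
# incomplete episodes in the filtering step described below.
gap_to_next = -log.groupby("client_id")["date"].diff(-1)
log.loc[gap_to_next > pd.Timedelta(days=9), "next_aggression_type"] = "tau"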
We assign each episode a unique ID. The episodes that do not end in a Tau state are
considered incomplete and are therefore filtered out. We obtained a total of 8,800 episodes after
this filter, consisting of 19,848 incidents. In addition, the episodes where the incidents miss
the values in the measures column are removed; these are incidents in which the staff
member did not report the measures they had taken. Applying this filter reduced the number
of episodes to 8,013, consisting of 15,464 incidents. Finally, we decided to remove the most
infrequent action, 'preventive measures started', due to its ambiguity and to reduce the search
space. Any episode that contains this action was removed, resulting in 14,676 incidents and
7,812 episodes for training the final MDP. In Table 2, a simplified example of the
preprocessed log is listed.
Now that the data is cleaned and preprocessed, we use it to build a finite MDP. For this, we
need the five-tuple consisting of the states, actions, transition probabilities, rewards, and
discount factor [13]. The discount factor is a hyperparameter that can be tuned; therefore, we
later perform hyperparameter tuning to determine the discount factor for the agent.
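A minimal sketch of how the transition probabilities could be estimated from the preprocessed triples is given below; it builds on the preprocessing sketches above and omits the episode filtering and the removal of 'preventive measures started' described earlier:

from collections import Counter, defaultdict

# Collect the (state, action, next state) triples and count how often each
# (state, action) pair was followed by each next state.
triples = log[["aggression_type", "measure", "next_aggression_type"]].dropna()

counts = defaultdict(Counter)
for state, action, next_state in triples.itertuples(index=False):
    counts[(state, action)][next_state] += 1

# Relative frequencies serve as the transition probabilities of the MDP.
transition_probs = {
    (state, action): {nxt: n / sum(ctr.values()) for nxt, n in ctr.items()}
    for (state, action), ctr in counts.items()
}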
Action or state Reward
Tau 1
Verbal Aggression (va) 0
Physical Aggression against objects (po) -1
Self-injurious behavior (sib) -3
Physical Aggression against people (pp) -4
Client distracted, Contact terminated, Send to other room -1
Hold with force, Seclusion -2
Other actions 0
Table 3. The rewards (penalties) assigned for each action or state, based on the severity of the action and the state
[16]. When an agent takes an action and ends in the follow-up state, the combination of the action and state is
used to compute the reward.
The reward function R : A × S → Z assigns a reward to each combination of an action and the resulting state. We
defined the reward function based on the existing literature on assessing the severity of aggression [16]; the
resulting rewards are listed in Table 3 above.
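A direct way to encode this is a lookup over the penalties of Table 3. How the action penalty and the next-state penalty are combined is not spelled out above; the summation in the sketch below is therefore an assumption, not a detail taken from the text:

# Penalties per state and per action, following Table 3. Summing the two
# components is an assumption of this sketch.
STATE_REWARD = {"tau": 1, "va": 0, "po": -1, "sib": -3, "pp": -4}
ACTION_REWARD = {
    "distract client": -1, "terminate contact": -1, "send to other room": -1,
    "held with force": -2, "seclusion": -2,
    "talk to client": 0, "no measure taken": 0,
}

def reward(action, next_state):
    """R : A x S -> Z, combining the action penalty and the next-state penalty."""
    return ACTION_REWARD.get(action, 0) + STATE_REWARD[next_state]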
Another design choice has been made regarding the calculation of the transition
probabilities. In the data set, multiple actions could be filled in at each incident. For this
paper, a decision was made to consider only the most frequent action as the transition from
one state to the next, in order to limit the number of possible actions and avoid having too
many infrequent actions. Also, the reward function was designed based on the severity of the
action and the state as indicated in the existing literature on aggression [16]. The simple
reward function was designed on purpose such that the results can be more easily
communicated to the experts. A subgraph of the environment can be seen in Figure 2.
Fig. 2. A subgraph of the MDP, depicting the current state of self-injurious behavior (SIB), a sample of actions
that can be chosen, and a sample of transitions. P is the probability of going to that state, and R is the reward
associated with that action and next state.
Training the agents
We used the following parameters in the tuning: the learning rate α ∈ [0, 1], the discount
factor γ ∈ [0, 1], and the amount of exploration ϵ ∈ [0, 1], all of which have an impact on the
training of the agents and therefore on the results. The best parameters are obtained
experimentally by hyperparameter tuning, using the best average reward over 100 runs as the
goal, each run consisting of 2000 episodes.
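To make the training step concrete, a minimal tabular Q-learning loop over the learned MDP could look as follows. It builds on the transition probabilities and reward function sketched above, starts each episode from a randomly chosen non-terminal state (a simplification), and restricts the agent to actions that were actually observed for a state:

import random
from collections import defaultdict

def available_actions(state):
    """Countermeasures observed for this state in the event data."""
    return [a for (s, a) in transition_probs if s == state]

def step(state, action):
    """Sample a next state from the learned MDP and return (next_state, reward)."""
    probs = transition_probs[(state, action)]
    next_state = random.choices(list(probs), weights=list(probs.values()))[0]
    return next_state, reward(action, next_state)

def train_q_learning(alpha, gamma, eps, episodes=2000):
    Q = defaultdict(float)                      # Q[(state, action)], zero-initialized
    start_states = [s for (s, _) in transition_probs if s != "tau"]
    for _ in range(episodes):
        state = random.choice(start_states)
        while state != "tau":
            actions = available_actions(state)
            if random.random() < eps:           # epsilon-greedy exploration
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, r = step(state, action)
            # Q-learning bootstraps on the greedy next action; SARSA would
            # instead bootstrap on the eps-greedy action taken in next_state.
            target = 0.0 if next_state == "tau" else max(
                (Q[(next_state, a)] for a in available_actions(next_state)), default=0.0)
            Q[(state, action)] += alpha * (r + gamma * target - Q[(state, action)])
            state = next_state
    return Q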
The search spaces are α ∈ {0.1, 0.2, 0.3, 0.4, 0.5}, γ ∈ {0.2, 0.4, 0.6, 0.8, 1.0}, and
ϵ ∈ {0.1, 0.2, 0.3, 0.4, 0.5}. First, γ was obtained while keeping α and ϵ at 0.1. After this, ϵ was
obtained using the optimal γ value and α = 0.1, and finally α was obtained using the optimal γ
and ϵ values. Each parameter setting was used for ten different runs to get a fair average. The
resulting hyperparameter values for both Q-learning and SARSA are α = 0.2, γ = 0.2, and ϵ = 0.1.
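The staged search described above could be reproduced along the following lines; greedy_rollout_reward is a helper introduced for this sketch that scores a trained Q-table by following its greedy policy in the learned MDP:

import statistics

def greedy_rollout_reward(Q, episodes=100):
    """Average total reward per episode when following the greedy policy of Q."""
    total = 0.0
    for _ in range(episodes):
        state = random.choice([s for (s, _) in transition_probs if s != "tau"])
        while state != "tau":
            action = max(available_actions(state), key=lambda a: Q[(state, a)])
            state, r = step(state, action)
            total += r
    return total / episodes

def average_reward(alpha, gamma, eps, runs=10):
    """Train `runs` agents with these hyperparameters and average their scores."""
    return statistics.mean(
        greedy_rollout_reward(train_q_learning(alpha, gamma, eps)) for _ in range(runs))

# Stage 1: tune gamma with alpha and eps fixed at 0.1.
best_gamma = max([0.2, 0.4, 0.6, 0.8, 1.0], key=lambda g: average_reward(0.1, g, 0.1))
# Stage 2: tune eps with the best gamma and alpha = 0.1.
best_eps = max([0.1, 0.2, 0.3, 0.4, 0.5], key=lambda e: average_reward(0.1, best_gamma, e))
# Stage 3: tune alpha with the best gamma and eps.
best_alpha = max([0.1, 0.2, 0.3, 0.4, 0.5], key=lambda a: average_reward(a, best_gamma, best_eps))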
Evaluation of policies
We evaluate the agents both quantitatively, by comparing the average rewards, and
qualitatively, by discussing the policies. For the quantitative evaluation, we compute the
average reward for the best-trained agent using Q-learning and the best-trained agent using
SARSA. These are then compared with the average reward of taking random actions and the
average reward of the current policy. The current policy has been derived as the most
frequent action taken in each state: "talking to the client" when clients display verbal aggression,
physical aggression against people, or physical aggression against objects, and "no measure"
when they display self-injurious behavior (see Table 5).
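A sketch of this quantitative evaluation, building on the MDP and training sketches above, is given below. It assumes a policy is a dictionary mapping each aggression state (using the abbreviations from the text) to one countermeasure observed for that state:

def evaluate_policy(policy=None, episodes=100):
    """Average total reward per episode when acting according to `policy`;
    if no policy is given, actions are chosen at random."""
    total = 0.0
    for _ in range(episodes):
        state = random.choice([s for (s, _) in transition_probs if s != "tau"])
        while state != "tau":
            action = policy[state] if policy is not None else random.choice(available_actions(state))
            state, r = step(state, action)
            total += r
    return total / episodes

# Current policy: the most frequent countermeasure per aggression state.
current_policy = (triples.groupby("aggression_type")["measure"]
                  .agg(lambda s: s.mode().iloc[0]).to_dict())

# Learned policy: the greedy action per state from a trained Q-table
# (here Q-learning with the tuned hyperparameters; SARSA is handled analogously).
Q_q = train_q_learning(alpha=0.2, gamma=0.2, eps=0.1)
q_policy = {s: max(available_actions(s), key=lambda a: Q_q[(s, a)])
            for s in ("va", "sib", "pp", "po")}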
Results
In this section, we first present the results regarding the rewards. Next, we discuss the results
qualitatively, presenting the optimal policy and the variants. We used two baselines to
compare the results: (1) using random actions and (2) taking the most frequent action at each
state as the policy for the agent. The data set is shared under an NDA and is thus unavailable.
The code and the MDPs used in this paper are available online¹ and can be used to
reproduce the results.
Quantitative results
In this section, the average reward per policy is described and evaluated. It is listed in Table 4.
Policy        Average reward
Random        -3.783
Q-learning    -1.127
SARSA         -1.168
Table 4. Average reward per policy based on 10,000 runs, each with 100 episodes.
We run each policy for 10,000 runs, each consisting of 100 episodes, resulting in
1,000,000 episodes in total. To test whether the differences between the policies are significant, we
performed a one-way ANOVA with the data from SARSA, Q-learning, and the current
policy. The one-way ANOVA was done using the SciPy library for Python 3, specifically
the stats.f_oneway function. The p-value was 7.719e-26, which is smaller than 0.05; therefore,
we can reject the null hypothesis that the groups have the same mean.
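The significance test can be reproduced along the following lines with SciPy; here the per-run average rewards are generated with the evaluation sketch above (with a reduced number of runs), and the random baseline stands in for the SARSA group, which was not sketched explicitly:

from scipy import stats

runs = 1000                       # the paper uses 10,000 runs of 100 episodes each
rewards_q = [evaluate_policy(q_policy) for _ in range(runs)]
rewards_current = [evaluate_policy(current_policy) for _ in range(runs)]
rewards_random = [evaluate_policy() for _ in range(runs)]

f_stat, p_value = stats.f_oneway(rewards_q, rewards_current, rewards_random)
print(f"F = {f_stat:.3f}, p = {p_value:.3e}")    # p < 0.05: the group means differ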
¹ https://git.science.uu.nl/6601723/ppm-aggressive-incidents
Qualitative results
This section describes the qualitative results, showing the derived policies and the most
common variants of episodes per agent. The derived policies are listed in Table 5, which
shows the action taken at each state under each policy.
Policy                 VA                SIB          PP                PO
Most frequent action   talk with client  no measure   talk with client  talk with client
Q-learning             talk with client  no measure   talk with client  no measure
SARSA                  no measure        no measure   talk with client  talk with client
Table 5. Derived policies for Q-learning and SARSA, together with the most frequent actions taken, over the
10,000 runs, each with 100 episodes.
The five most common variants with their frequencies for each of the agents can be found
in Tables 6, 7, and 8. In the tables, each variant is a distinct episode of tuples, where the first
element of the tuple is the current state, the second element is the action taken, and the last
element is the next state after the action. If the state is Tau, the episode is ended; otherwise,
another action is taken.
Path Frequency
(va, Talk with client, Tau) 14454
(sib, No measure, Tau) 13987
(po, Talk with client, Tau) 13100
(pp, Talk with client, Tau) 12769
(pp, Talk with client, pp)(pp, Talk with client, Tau) 4454
Table 6. Five most common variants when using the current policy of most frequent actions.
Path Frequency
(va, Talk with client, Tau) 14417
(sib, No measure, Tau) 14294
(po, No measure, Tau) 12974
(pp, Talk with client, Tau) 12866
(pp, Talk with client, pp)(pp, Talk with client, Tau) 4526
Table 7. Five most common variants when using the policy derived by Q-learning.
In the tables, it can be seen that the four most frequent variants of episodes consist of a single
action for all policies. For each state, taking that action leads immediately to Tau regardless
of the policy. Also, most of the episodes ended after only one action had been taken. When
we take a closer look at the current policy, the Q-learning policy, and the SARSA policy, we
see that most variants are the same.
Path Frequency
(va, No measure, Tau) 14926
(sib, No measure, Tau) 14025
(pp, Talk with client, Tau) 13079
(po, Talk with client, Tau) 12971
(sib, No measure, sib)(sib, No measure, Tau) 4313
Table 8. Five most common variants when using the policy derived by SARSA.
The only two differences are: (1) in the physical-aggression-against-objects state (po), the "no
measure" action is suggested by Q-learning; (2) in the verbal aggression state (va), the "no
measure" action is suggested by SARSA.
Conclusion
This paper presents the application of reinforcement learning (RL) to optimize response
policies in healthcare processes, specifically addressing aggressive incidents in care settings.
The research aims to investigate the validity of RL in healthcare and its ability to find optimal
response policies for staff members towards such incidents. The results have shown that RL
algorithms can find such an optimal policy, which consists of taking no measures or talking
with the client depending on the state. The policies are very similar to the current policy, i.e.,
the most frequent action taken by staff members.
Despite the simple MDP, the results do show that prescriptive process monitoring can be
used in the healthcare domain. Interestingly, it may be more beneficial to use these techniques
in more complex situations rather than in simple ones. However, further research is
necessary to validate this finding. For future work, one may refine the environment by
extending the MDP with more refined states and actions. Future research should be
multidisciplinary, so that such an environment can be built more elaborately with experts
in the field of aggressive behavior and staff members who work daily with clients.
The results can then also be validated by the experts or staff to help them make better decisions;
their input is therefore crucial.
Bibliography
1. Fahrenkrog-Petersen, S.A., Tax, N., Teinemaa, I., Dumas, M., Leoni, M.d., Maggi, F.M.,
Weidlich, M.: Fire now, fire later: Alarm-based systems for prescriptive process
monitoring. Knowledge and Information Systems 64(2) (2021) 559–587
2. Metzger, A., Kley, T., Palm, A.: Triggering proactive business process adaptations via
online reinforcement learning. In: BPM. Volume 12168 of Lecture Notes in Computer
Science, Springer (2020) 273–290
3. Bozorgi, Z.D., Dumas, M., Rosa, M.L., Polyvyanyy, A., Shoush, M., Teinemaa, I.:
Learning when to treat business processes: Prescriptive process monitoring with causal
inference and reinforcement learning. In: CAiSE. Volume 13901 of Lecture Notes in
Computer Science, Springer (2023) 364–380
4. Munoz-Gama, J., et al.: Process mining for healthcare: Characteristics and challenges. J.
Biomed. Informatics 127 (2022) 103994
5. Hensel, J.M., Lunsky, Y., Dewa, C.S.: Staff perception of aggressive behaviour in
community services for adults with intellectual disabilities. Community Mental Health
Journal 50(6) (2013) 743–751
6. Mills, S., Rose, J.: The relationship between challenging behaviour, burnout and cognitive
variables in staff working with people who have intellectual disabilities. Journal of
Intellectual Disability Research 55(9) (2011) 844–857
7. Bushman, B.J., Bonacci, A.M., Pedersen, W.C., Vasquez, E.A., Miller, N.: Chewing on it
can chew you up: Effects of rumination on triggered displaced aggression. Journal of
Personality and Social Psychology 88(6) (2005) 969–983
8. Koorn, J.J., Lu, X., Leopold, H., Reijers, H.A.: Towards understanding aggressive
behavior in residential care facilities using process mining. In Guizzardi, G., Gailly, F.,
Suzana Pitangueira Maciel, R., eds.: Advances in Conceptual Modeling, Cham, Springer
International Publishing (2019) 135–145
9. Koorn, J.J., Lu, X., Leopold, H., Reijers, H.A.: Looking for meaning: Discovering
action-response-effect patterns in business processes. In: BPM. Volume 12168 of Lecture Notes
in Computer Science, Springer (2020) 167–183
10. Koorn, J.J., Lu, X., Leopold, H., Reijers, H.A.: From action to response to effect: Mining
statistical relations in work processes. Information Systems 109 (2022) 102035
11. Koorn, J.J., Lu, X., Leopold, H., Martin, N., Verboven, S., Reijers, H.A.: Mining statistical
relations for better decision making in healthcare processes. In: ICPM, IEEE (2022) 32–39
12. Branchi, S., Di Francescomarino, C., Ghidini, C., Massimo, D., Ricci, F., Ronzani, M.:
Learning to act: a reinforcement learning approach to recommend the best next activities
(2022)
13. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. A Bradford Book,
Cambridge, MA, USA (2018)