
Knowledge-Based Reinforcement Learning for Data Mining

Daniel Kudenko and Marek Grzes


Department of Computer Science, University of York, York YO10 5DD, UK
{kudenko,grzes}@cs.york.ac.uk

Extended Abstract

Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent's observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent's actions and goals are often independent of the data mining task. The data collection is mainly considered a side effect of the agent's activities. Machine learning techniques applied in such situations fall into the class of supervised learning.

In contrast, the second scenario occurs where an agent is actively performing the data mining and is responsible for the data collection itself. For example, a mobile network agent may be acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent may be moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases.

Reinforcement Learning (RL) enables an agent to learn from experience (in the form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the task, the longer it takes a reinforcement learning algorithm to converge to a good solution.

For many real-world tasks, human expert knowledge is available. For example, human experts have developed heuristics that help them in planning and scheduling resources in their workplace. However, this domain knowledge is often rough and incomplete. When the domain knowledge is used directly by an automated expert system, the solutions are often sub-optimal, due to the incompleteness of the knowledge, the uncertainty of environments, and the possibility of encountering unexpected situations. RL, on the other hand, can overcome the weaknesses of heuristic domain knowledge and produce optimal solutions.
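
As a minimal, hypothetical illustration of the reinforcement learning loop described above (not taken from the paper), a tabular Q-learning agent could learn a sampling policy purely from the rewards it receives; the `env` interface used here is an assumed placeholder for whatever data-collection environment the agent operates in:

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch (illustrative only).
# `env` is a hypothetical environment with reset(), step(action) and
# actions(state) methods; the agent learns purely from observed rewards.
def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy exploration: occasionally try a random action.
            if random.random() < epsilon:
                action = random.choice(env.actions(state))
            else:
                action = max(env.actions(state), key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Temporal-difference update towards reward + discounted future value.
            target = reward
            if not done:
                target += gamma * max(Q[(next_state, a)] for a in env.actions(next_state))
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```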

In the talk we propose two techniques which represent first steps in the area of knowledge-based RL (KBRL). The first technique [1] uses high-level STRIPS operator knowledge in reward shaping to focus the search for the optimal policy. Empirical results show that the plan-based reward shaping approach outperforms other RL techniques, including alternative manual and MDP-based reward shaping when it is used in its basic form. We showed that MDP-based reward shaping may fail, and successful experiments with STRIPS-based shaping suggest modifications which can overcome the encountered problems. The STRIPS-based method we propose allows the same domain knowledge to be expressed in a different way, and the domain expert can choose whether to define an MDP or a STRIPS planning task. We also evaluated the robustness of the proposed STRIPS-based technique to errors in the plan knowledge.

In case STRIPS knowledge is not available, we propose a second technique [2] that shapes the reward with hierarchical tile coding. Where the Q-function is represented with low-level tile coding, a V-function with coarser tile coding can be learned in parallel and used to approximate the potential for ground states.

In the context of data mining, our KBRL approaches can also be used for any data collection task where the acquisition of data may incur considerable cost. In addition, observing the data collection agent in specific scenarios may lead to new insights into optimal data collection behaviour in the respective domains. In future work, we intend to demonstrate and evaluate our techniques on concrete real-world data mining applications.
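
A minimal sketch of the shaping idea, assuming a generic potential-based setup of the form F(s, s') = gamma * Phi(s') - Phi(s): the potential Phi can be derived from progress along a high-level STRIPS plan (as in technique [1]) or from a coarse value function learned in parallel (as in technique [2]). The helper `plan_step_of` below is a hypothetical placeholder, not the method's actual implementation:

```python
GAMMA = 0.95

def plan_potential(state, plan_step_of, step_value=10.0):
    # Phi(s) grows with progress along the high-level plan: states that have
    # achieved later plan steps receive a higher potential. `plan_step_of` is a
    # hypothetical mapping from a ground state to the index of the last
    # achieved plan step, or None if the state matches no step.
    step = plan_step_of(state)
    return step_value * step if step is not None else 0.0

def shaping_reward(state, next_state, potential):
    # Potential-based shaping: F(s, s') = gamma * Phi(s') - Phi(s).
    # Added to the environment reward, this leaves the optimal policy unchanged.
    return GAMMA * potential(next_state) - potential(state)

def shaped_reward(reward, state, next_state, potential):
    # The learner is trained on r + F(s, s') instead of the raw reward r.
    return reward + shaping_reward(state, next_state, potential)
```

For the second technique, the same `shaping_reward` form would apply, but `potential` would be a coarse, tile-coded V-function learned in parallel with the low-level Q-function rather than a plan-derived function.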

References
1. Grzes, M., Kudenko, D.: Plan-based reward shaping for reinforcement learning. In: Proceedings of the Fourth International IEEE Conference on Intelligent Systems, vol. 2, pp. 22-29 (2008)
2. Grzes, M., Kudenko, D.: Learning potential for reward shaping in reinforcement learning with tile coding. In: Proceedings of the AAMAS 2008 Workshop on Adaptive and Learning Agents and Multi-Agent Systems (ALAMAS-ALAg 2008), pp. 17-23 (2008)
