1. Soccer actions can also be parameterized (e.g., how hard to kick, turn direction), but for simplicity our initial evaluation only examines action classification.
While the full and zoomed models perform reasonably well, the best performance was achieved when the Combined model was used. This demonstrates that using multiple representations of the visual data is preferable since these models have varying strengths and weaknesses.
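To make the Combined model's selection rule concrete, the following minimal sketch chooses whichever CNN reports the higher confidence. It assumes each network emits a softmax confidence vector over a shared action set; the action labels and function interface are illustrative placeholders rather than details of our implementation.

```python
import numpy as np

# Illustrative action labels; the real action set comes from the soccer domain.
ACTIONS = ["kick", "dash", "turn", "catch"]

def combined_prediction(full_probs, zoomed_probs):
    """Return the action predicted by whichever CNN is more confident.

    full_probs and zoomed_probs are softmax outputs (one probability per
    action) from the full-image and zoomed-image CNNs, respectively.
    """
    if full_probs.max() >= zoomed_probs.max():
        return ACTIONS[int(full_probs.argmax())]
    return ACTIONS[int(zoomed_probs.argmax())]

# Example: the zoomed model is more confident, so its prediction is used.
full = np.array([0.40, 0.30, 0.20, 0.10])
zoomed = np.array([0.05, 0.85, 0.05, 0.05])
print(combined_prediction(full, zoomed))  # -> dash
```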
5. Conclusions and Future Work
We described a preliminary study of how well a learning by observation agent can learn without explicitly modeling the objects it observes. Our approach uses an expert’s raw visual inputs at two levels of granularity to train a pair of CNNs. In our study, the agent reproduced the expert’s action selection decisions reasonably well in tasks drawn from a simulated soccer domain. This indicates that even with limited training observations, noisy observations, and partial observability, it is possible to create an agent that can learn an expert’s behavior without being provided an explicit object model.
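As a rough illustration of this input pipeline, the sketch below derives a full view and a zoomed view from a single observed frame. The network input size and the centre-crop fraction are assumptions made for the example, not parameters of our system.

```python
from PIL import Image

INPUT_SIZE = (227, 227)  # assumed CNN input resolution; not specified here

def make_granularity_pair(frame):
    """Derive the two training views from one raw visual observation.

    The full view resizes the whole frame to the network input size; the
    zoomed view is a centre crop (fraction chosen arbitrarily for this
    sketch) resized to the same dimensions.
    """
    full_view = frame.resize(INPUT_SIZE)

    w, h = frame.size
    keep = 0.5  # keep the central 50% of the frame in the zoomed view
    box = (int(w * (1 - keep) / 2), int(h * (1 - keep) / 2),
           int(w * (1 + keep) / 2), int(h * (1 + keep) / 2))
    zoomed_view = frame.crop(box).resize(INPUT_SIZE)
    return full_view, zoomed_view
```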
Although our approach removes the need to model observable objects, it still requires modeling the possible actions. An area of future work will be to identify methods for learning the actions an expert performs based on observations. Additionally, we have only examined a single two-model architecture (i.e., selecting the most confident prediction from two CNNs). In future work we will examine if added benefit can be achieved by training additional models (e.g., other levels of granularity) or by modifying how the model outputs are combined (e.g., inducing a decision tree from their output).
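As a sketch of the decision-tree option, the example below induces a tree over the concatenated confidence vectors of the two models, so that the combiner can learn when to trust which CNN. It uses scikit-learn, and the synthetic data stands in for real CNN outputs paired with the expert’s action labels.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: per-example softmax vectors from the two CNNs on a
# held-out set, plus the expert's chosen actions. Real data would come
# from recorded observations of the expert.
rng = np.random.default_rng(0)
train_full = rng.dirichlet(np.ones(4), size=100)
train_zoomed = rng.dirichlet(np.ones(4), size=100)
train_actions = rng.integers(0, 4, size=100)

# Concatenate both confidence vectors into one feature vector per example
# and induce a decision tree over them.
features = np.hstack([train_full, train_zoomed])
combiner = DecisionTreeClassifier(max_depth=5).fit(features, train_actions)

# At run time the tree, rather than a fixed max-confidence rule, picks the action.
example = np.hstack([train_full[0], train_zoomed[0]]).reshape(1, -1)
print(combiner.predict(example))
```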
Our preliminary evaluation has only measured the performance from a single experiment with a single expert in a single domain. We plan to perform a more thorough evaluation of the learning performance involving numerous experimental trials. This will not only allow us to show the benefit of our approach, but it will also allow for a thorough comparison with other LbO agents that learn in RoboCup (Floyd, Esfandiari, and Lam 2008; Young and Hawes 2015). To determine whether our approach is truly domain-independent, we plan to conduct additional studies with different experts in different environments. Finally, we plan to examine how this approach can be extended to learn from state-based experts since the RoboCup expert we examined is purely reactive (i.e., the expert’s action is based entirely on its current visual inputs).
References
Coates, A., Abbeel, P., and Ng, A. Y. 2008. Learning for control from multiple demonstrations. In Proceedings of the 25th International Conference on Machine Learning, 144-151. Helsinki, Finland: ACM.
Floyd, M. W., Bicakci, M. V., and Esfandiari, B. 2012. Case-based learning by observation in robotics using a dynamic case representation. In Proceedings of the 25th International Florida Artificial Intelligence Research Society Conference, 323-328. Marco Island, USA: AAAI Press.
Floyd, M. W., and Esfandiari, B. 2011. A case-based reasoning framework for developing agents using learning by observation. In Proceedings of the 23rd IEEE International Conference on Tools with Artificial Intelligence, 531-538. Boca Raton, USA: IEEE Computer Society Press.
Floyd, M. W., Esfandiari, B., and Lam, K. 2008. A case-based reasoning approach to imitating RoboCup players. In Proceedings of the 21st International Florida Artificial Intelligence Research Society Conference, 251-256. Coconut Grove, USA: AAAI Press.
Gómez-Martín, P. P., Llansó, D., Gómez-Martín, M. A., Ontañón, S., and Ram, A. 2010. MMPM: A generic platform for case-based planning research. In Proceedings of the International Conference on Case-Based Reasoning Workshops, 45-54. Alessandria, Italy.
Grollman, D. H., and Jenkins, O. C. 2007. Learning robot soccer skills from demonstration. In Proceedings of the IEEE International Conference on Development and Learning, 276-281. London, UK: IEEE Press.
Hausknecht, M., and Stone, P. 2016. Deep reinforcement learning in parameterized action space. In Proceedings of the International Conference on Learning Representations. San Juan, Puerto Rico.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R. B., Guadarrama, S., and Darrell, T. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia, 675-678. Orlando, USA: ACM.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems, 1106-1114. Lake Tahoe, USA.
LeCun, Y., Bengio, Y., and Hinton, G. E. 2015. Deep learning. Nature, 521, 436-444.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature, 518, 529-533.
Ontañón, S., Mishra, K., Sugandh, N., and Ram, A. 2007. Case-based planning and execution for real-time strategy games. In Proceedings of the 7th International Conference on Case-Based Reasoning, 164-178. Belfast, UK: Springer.
RoboCup. 2016. RoboCup Official Site. Retrieved from http://www.robocup.org
Romdhane, H., and Lamontagne, L. 2008. Forgetting reinforced cases. In Proceedings of the 9th European Conference on Case-Based Reasoning, 474-486. Trier, Germany: Springer.
Rubin, J., and Watson, I. 2010. Similarity-based retrieval and solution re-use policies in the game of Texas Hold’em. In Proceedings of the 18th International Conference on Case-Based Reasoning, 465-479. Alessandria, Italy: Springer.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484-489.
Thurau, C., Bauckhage, C., and Sagerer, G. 2003. Combining self organizing maps and multilayer perceptrons to learn bot-behaviour for a commercial game. In Proceedings of the 4th International Conference on Intelligent Games and Simulation, 119-123. London, UK: EUROSIS.
Young, J., and Hawes, N. 2015. Learning by observation using qualitative spatial relations. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, 745-751. Istanbul, Turkey: ACM.