The REX-D library
REX-D is a C++ library implementing algorithms that combine reinforcement learning and active learning. It provides the following features:
- The V-MIN [1] and REX-D [2] algorithms.
- An algorithm to learn models with action and exogenous effects from a set of input state transitions. The learned models can be used by standard probabilistic task planners [3].
- Integrates Tobias Lang's implementations of Pasula et al.'s learner [4] and the PRADA [5] planner.
- Provides a wrapper for IPPC planners. It currently works with the PROST [6] and the G-PACK [7] planners, but it should be easy to integrate with other planners.
- Implements teacher guidance [1] to facilitate interaction with a teacher.
Code
The code is available on Bitbucket.
Documentation
Documentation for the code is available here.
Considerations about the library
- Most features should work in the master branch.
- A few features, such as subgoals, may only work in the v1.0 tag or earlier tags.
- Teacher guidance only works with the G-PACK planner.
- Exogenous effects only work with the PROST planner.
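The planner restrictions above can be captured in a small lookup, which makes it easy to check a configuration before running an experiment. The following is only a sketch: the `Planner` and `Feature` enums and the `isSupported` function are hypothetical names chosen for illustration and are not part of the library's API. The sketch encodes only the two planner-specific restrictions stated above, not the branch or tag constraints.

```cpp
// Hypothetical names for illustration; none of these come from the REX-D code.
enum class Planner { PRADA, PROST, GPACK };
enum class Feature { TeacherGuidance, ExogenousEffects };

// Returns true if, according to the considerations listed above,
// the given feature is expected to work with the given planner.
bool isSupported(Feature feature, Planner planner) {
    switch (feature) {
        case Feature::TeacherGuidance:
            return planner == Planner::GPACK;  // teacher guidance: G-PACK only
        case Feature::ExogenousEffects:
            return planner == Planner::PROST;  // exogenous effects: PROST only
    }
    return false;  // unknown feature: treat as unsupported
}
```

For example, `isSupported(Feature::ExogenousEffects, Planner::GPACK)` evaluates to `false`, reflecting that exogenous effects require the PROST planner.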
[1] V-MIN: Efficient reinforcement learning through demonstrations and relaxed reward demands. D. Martínez, G. Alenyà, and C. Torras. Proceedings of the AAAI Conference on Artificial Intelligence, 2015, pp. 2857–2863.
[2] Relational reinforcement learning with guided demonstrations. D. Martínez, G. Alenyà, and C. Torras. Artificial Intelligence, 2017, 247, pp. 295–312.
[3] Learning Relational Dynamics of Stochastic Domains for Planning. D. Martínez, G. Alenyà, C. Torras, T. Ribeiro, and K. Inoue. International Conference on Automated Planning and Scheduling, 2016, pp. 235–243.
[4] Learning symbolic models of stochastic domains. H. M. Pasula, L. S. Zettlemoyer, and L. P. Kaelbling. Journal of Artificial Intelligence Research, 2007, 29(1), pp. 309–352.
[5] Planning with noisy probabilistic relational rules. T. Lang and M. Toussaint. The Journal of Machine Learning Research, 2012, 39, pp. 1–49.
[6] PROST: Probabilistic Planning Based on UCT. T. Keller and P. Eyerich. International Conference on Automated Planning and Scheduling, 2012, pp. 119–127.
[7] LRTDP Versus UCT for Online Probabilistic Planning. A. Kolobov, Mausam, and D. S. Weld. Proceedings of the AAAI Conference on Artificial Intelligence, 2012, pp. 1786–1792.