Combining Reinforcement and Active Learning
The methods presented below integrate model-based reinforcement learning (RL) and active learning with the objective of minimizing both the number of action executions and the number of teacher demonstration requests. These approaches learn rule-based models that can be used by planners.
The code is available on Bitbucket.
Documentation for the code is also available.
V-MIN extends REX (Lang et al., 2012) by including a teacher in the loop to reduce the number of actions required to learn.
- REX applies relational generalization to R-MAX, reducing the amount of exploration needed.
- V-MIN actively requests a teacher demonstration when it cannot find a plan whose value is larger than a threshold Vmin.
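The interplay of these two points can be illustrated with a simplified decision rule. The sketch below is an assumption-laden illustration, not the authors' implementation: it uses concrete states instead of relational rules (so the relational generalization that distinguishes REX from R-MAX is left out), a deterministic tabular model, and an exhaustive finite-horizon planner. The names `TabularModel`, `best_plan`, `vmin_step`, the visit threshold `m`, and the optimistic reward `r_max` are hypothetical. Unexplored pairs are valued optimistically as in R-MAX, and the agent asks for a demonstration only when no plan has a value larger than Vmin.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

State = str
Action = str


@dataclass
class TabularModel:
    """Hypothetical learned model for a deterministic domain: visit counts and
    the observed (next state, reward) of each state-action pair."""
    counts: Dict[Tuple[State, Action], int] = field(default_factory=dict)
    outcomes: Dict[Tuple[State, Action], Tuple[State, float]] = field(default_factory=dict)

    def record(self, s: State, a: Action, s_next: State, r: float) -> None:
        self.counts[(s, a)] = self.counts.get((s, a), 0) + 1
        self.outcomes[(s, a)] = (s_next, r)

    def is_known(self, s: State, a: Action, m: int) -> bool:
        # R-MAX-style known-ness: a pair counts as known after m visits
        return self.counts.get((s, a), 0) >= m


def best_plan(model: TabularModel, state: State, actions: List[Action], horizon: int,
              m: int, r_max: float, gamma: float = 0.9) -> Tuple[float, Optional[Action]]:
    """Exhaustive finite-horizon search returning (best value, first action)."""
    if horizon == 0:
        return 0.0, None
    best_v, best_a = float("-inf"), None
    for a in actions:
        if model.is_known(state, a, m):
            s_next, r = model.outcomes[(state, a)]
            v = r + gamma * best_plan(model, s_next, actions, horizon - 1, m, r_max, gamma)[0]
        else:
            # Optimism under uncertainty: an unexplored pair is assumed to
            # yield r_max at every remaining step, which drives exploration
            v = sum(r_max * gamma ** k for k in range(horizon))
        if v > best_v:
            best_v, best_a = v, a
    return best_v, best_a


def vmin_step(model: TabularModel, state: State, actions: List[Action], vmin: float,
              horizon: int = 3, m: int = 1, r_max: float = 1.0) -> Tuple[str, Optional[Action]]:
    """Act if a plan with value larger than vmin exists; otherwise ask the teacher."""
    value, action = best_plan(model, state, actions, horizon, m, r_max)
    if value > vmin:
        return "execute", action
    return "request_demonstration", None


model = TabularModel()
print(vmin_step(model, "s0", ["a", "b"], vmin=0.5))  # ('execute', 'a'): unknown pairs look promising
model.record("s0", "a", "s0", 0.0)
model.record("s0", "b", "s0", 0.2)
print(vmin_step(model, "s0", ["a", "b"], vmin=0.5))  # ('execute', 'b'): value approx. 0.542
print(vmin_step(model, "s0", ["a", "b"], vmin=1.0))  # ('request_demonstration', None)
```

Once both pairs have been explored, the same model is acceptable with `vmin=0.5` but not with `vmin=1.0`, where the agent asks the teacher instead of acting.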
As a result, V-MIN can learn models even when exploration is very scarce. If an important state-action pair is not visited, teacher demonstrations are requested until the agent learns a model that can obtain values larger than Vmin.
V-MIN has the following features:
- V-MIN can learn with scarce exploration: important state-action pairs that remain unexplored are visited through demonstrations.
- Because the learned model can be used by task planners, the initial state, the goal state, and the number of objects can be changed without further learning.
- Demonstrations can be used to learn previously unknown actions that were not required before.
- A high Vmin forces the system to learn good policies at the cost of more demonstrations and exploration actions, whereas a low Vmin leads to faster and easier learning but yields worse models.
- The teacher can change Vmin online until the system performs as desired.
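Because the learned model is symbolic, a planner can reuse it for new initial states, goals, and object sets. The sketch below illustrates this reuse under simplifying assumptions: the two rules are hand-written, deterministic, and STRIPS-style rather than learned from experience and demonstrations, the planner is a plain breadth-first search, and the names `Rule`, `plan`, `pick`, and `place` are hypothetical.

```python
from collections import deque
from itertools import permutations
from typing import FrozenSet, List, NamedTuple, Optional, Tuple

Fact = Tuple[str, ...]  # e.g. ("onTable", "cup"); terms starting with "?" are variables


class Rule(NamedTuple):
    """Hypothetical deterministic relational rule (STRIPS-style)."""
    name: str
    params: Tuple[str, ...]
    pre: Tuple[Fact, ...]
    add: Tuple[Fact, ...]
    delete: Tuple[Fact, ...]


def ground(fact: Fact, binding: dict) -> Fact:
    # Replace each variable by its bound object; constants are left unchanged
    return tuple(binding.get(term, term) for term in fact)


def successors(state: FrozenSet[Fact], rules, objects):
    """Yield (ground action, next state) for every applicable grounding of every rule."""
    for rule in rules:
        for combo in permutations(objects, len(rule.params)):
            binding = dict(zip(rule.params, combo))
            if all(ground(p, binding) in state for p in rule.pre):
                removed = {ground(d, binding) for d in rule.delete}
                added = {ground(a, binding) for a in rule.add}
                yield (rule.name,) + combo, (state - removed) | added


def plan(init, goal, rules, objects) -> Optional[List[Tuple[str, ...]]]:
    """Breadth-first search: a shortest action sequence reaching all goal facts, or None."""
    start = frozenset(init)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if set(goal) <= state:
            return path
        for action, nxt in successors(state, rules, objects):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


# The same two rules are reused for every problem below; nothing is relearned.
RULES = [
    Rule("pick", ("?x",), (("onTable", "?x"), ("handEmpty",)),
         (("holding", "?x"),), (("onTable", "?x"), ("handEmpty",))),
    Rule("place", ("?x",), (("holding", "?x"),),
         (("inBox", "?x"), ("handEmpty",)), (("holding", "?x"),)),
]

print(plan({("handEmpty",), ("onTable", "cup"), ("onTable", "plate")},
           [("inBox", "cup")], RULES, ["cup", "plate"]))
# [('pick', 'cup'), ('place', 'cup')]
objs = ["cup", "plate", "fork"]
print(len(plan({("handEmpty",)} | {("onTable", o) for o in objs},
               [("inBox", o) for o in objs], RULES, objs)))  # 6
```

Growing the problem from two to three objects, or changing the goal, only changes the inputs passed to `plan`; the rules stay the same.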
Video comparing REX and V-MIN
The video below compares REX with V-MIN in the AUTAS scenario. The video is slowed down during the first episodes to show the teacher demonstration requests and the exploration in more detail.
The algorithms solve three problems with five episodes each. AUTAS 1 is the standard problem, while AUTAS 2 and AUTAS 3 introduce unexpected cases in which previously unknown actions are required (and thus V-MIN needs a new demonstration).
REX (RL without demonstrations)
V-MIN (RL + active learning)
Teacher Guidance 
In the work presented above, the agent actively requests demonstrations from a teacher. However, the teacher may not know which parts of the model are unknown. Here we analyze the model to decide which parts cause the planner to find bad solutions, and use these causes to provide guidance to the teacher.
To explain planning failures, Göbelbecker et al. (2010) designed a method to find excuses: changes to the state that would allow the planner to find a solution. Based on these excuses, we can give the teacher feedback about preconditions that may be wrong or needed effects that are still unknown, which may be why the planner fails.
An example is shown in the video below. Here the robot has to learn a new action to reposition a shaft vertically. When the planner cannot obtain a solution, the robot tells the teacher that the placeShaft action requires the shaft to be in a non-horizontal position, but that position cannot be obtained.
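A much-simplified version of the excuse idea, applied to the shaft example, is sketched below. It is an assumption-based illustration, not Göbelbecker et al.'s algorithm or the authors' code: an excuse is restricted to flipping ground facts from a given candidate set, candidates are tried by increasing number of changes, and the planner is a breadth-first search with positive and negative preconditions. The fact names (`horizontal(shaft)`, `onTable(shaft)`, `holding(shaft)`, `placed(shaft)`), the `pickShaft` action, and the helpers `find_plan`, `find_excuse`, and `explain` are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import FrozenSet, List, NamedTuple, Optional, Set


class Action(NamedTuple):
    """Hypothetical ground action with positive and negative preconditions."""
    name: str
    pre_pos: FrozenSet[str]  # facts that must hold
    pre_neg: FrozenSet[str]  # facts that must NOT hold
    add: FrozenSet[str]
    delete: FrozenSet[str]


def find_plan(init, goal, actions) -> Optional[List[str]]:
    """Breadth-first planner; returns a list of action names or None."""
    start = frozenset(init)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:
            return path
        for a in actions:
            if a.pre_pos <= state and not (a.pre_neg & state):
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [a.name]))
    return None


def find_excuse(init, goal, actions, candidates, max_changes=2) -> Optional[Set[str]]:
    """Smallest set of candidate facts whose flip in the initial state makes a plan exist."""
    init = frozenset(init)
    for k in range(1, max_changes + 1):
        for flips in combinations(sorted(candidates), k):
            if find_plan(init.symmetric_difference(flips), goal, actions) is not None:
                return set(flips)
    return None


def explain(excuse, init, actions) -> List[str]:
    """Turn an excuse into hints about preconditions that no known action can satisfy."""
    hints = []
    for fact in sorted(excuse):
        removed = fact in init  # the excuse deletes this fact rather than adding it
        achievable = any(fact in (b.delete if removed else b.add) for b in actions)
        for a in actions:
            if fact in (a.pre_neg if removed else a.pre_pos) and not achievable:
                requirement = f"not {fact}" if removed else fact
                hints.append(f"{a.name} requires {requirement}, but no known action achieves it")
    return hints


F = frozenset
ACTIONS = [
    Action("pickShaft", F({"onTable(shaft)"}), F(), F({"holding(shaft)"}), F({"onTable(shaft)"})),
    Action("placeShaft", F({"holding(shaft)"}), F({"horizontal(shaft)"}),
           F({"placed(shaft)"}), F({"holding(shaft)"})),
]
INIT = {"onTable(shaft)", "horizontal(shaft)"}
GOAL = F({"placed(shaft)"})

print(find_plan(INIT, GOAL, ACTIONS))  # None: nothing makes the shaft non-horizontal
excuse = find_excuse(INIT, GOAL, ACTIONS, {"horizontal(shaft)", "onTable(shaft)"})
print(excuse)  # {'horizontal(shaft)'}
print(explain(excuse, INIT, ACTIONS))
# ['placeShaft requires not horizontal(shaft), but no known action achieves it']
```

`explain` only reports preconditions that no known action can establish, which matches the kind of hint described above for placeShaft.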
Safe Reinforcement Learning 
In real-world domains, there are usually sequences of actions that, if executed, may produce unrecoverable errors (e.g., breaking an object). Robots should avoid repeating such errors while learning, and should therefore explore the state space more carefully. To do so, they should reason about dead-ends and their causes; once dangerous actions are identified, the RL algorithm can avoid them.
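One simple way to reason about dead-ends in a learned model is a backward reachability check: states from which no goal can be reached are dead-ends, and actions leading into them from recoverable states are flagged as dangerous and pruned. The sketch below is a hypothetical illustration and is not taken from the authors' work: it assumes a deterministic tabular model, a known goal set, made-up tableware states such as `cup_broken`, and hypothetical helper names (`goal_reachable`, `dangerous_actions`, `safe_actions`).

```python
from typing import Dict, List, Set, Tuple

# Hypothetical learned deterministic model: state -> {action: next state}
Model = Dict[str, Dict[str, str]]


def goal_reachable(model: Model, goals: Set[str]) -> Set[str]:
    """Backward fixpoint: states from which some goal state can still be reached."""
    alive = set(goals)
    changed = True
    while changed:
        changed = False
        for state, outs in model.items():
            if state not in alive and any(nxt in alive for nxt in outs.values()):
                alive.add(state)
                changed = True
    return alive


def dangerous_actions(model: Model, goals: Set[str]) -> Set[Tuple[str, str]]:
    """(state, action) pairs that lead from a recoverable state into a dead-end.
    Next states missing from the model count as dead-ends, a deliberately conservative choice."""
    alive = goal_reachable(model, goals)
    return {(s, a) for s, outs in model.items() if s in alive
            for a, nxt in outs.items() if nxt not in alive}


def safe_actions(model: Model, goals: Set[str], state: str) -> List[str]:
    """Actions the learner may still try in `state` once dangerous ones are pruned."""
    bad = dangerous_actions(model, goals)
    return [a for a in model.get(state, {}) if (state, a) not in bad]


MODEL: Model = {
    "start": {"grasp_cup": "holding_cup", "push_cup": "cup_broken"},
    "holding_cup": {"place_in_bin": "cleared", "drop_cup": "cup_broken"},
    "cup_broken": {},  # unrecoverable: no action leads out of it
    "cleared": {},
}
print(sorted(dangerous_actions(MODEL, {"cleared"})))
# [('holding_cup', 'drop_cup'), ('start', 'push_cup')]
print(safe_actions(MODEL, {"cleared"}, "start"))  # ['grasp_cup']
```

Treating unmodeled next states as dead-ends keeps the sketch conservative; a learner that must still explore would need a less restrictive rule for unknown outcomes.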
We show this in a tableware clearing task.
References
- D. Martínez, G. Alenyà, and C. Torras. V-MIN: Efficient reinforcement learning through demonstrations and relaxed reward demands. Proceedings of the AAAI Conference on Artificial Intelligence, 2015, pp. 2857–2863.
- D. Martínez, G. Alenyà, and C. Torras. Relational reinforcement learning with guided demonstrations. Artificial Intelligence, 247, 2017, pp. 295–312.
- D. Martínez, G. Alenyà, and C. Torras. Safe robot execution in model-based reinforcement learning. IEEE/RSJ International Conference on Intelligent Robots and Systems, 2015, pp. 6422–6427.
- T. Lang, M. Toussaint, and K. Kersting. Exploration in relational domains for model-based reinforcement learning. The Journal of Machine Learning Research, 13(1), 2012, pp. 3725–3768.
- M. Göbelbecker, T. Keller, P. Eyerich, M. Brenner, and B. Nebel. Coming Up With Good Excuses: What to do When no Plan Can be Found. International Conference on Automated Planning and Scheduling, 2010, pp. 81–88.