David Martínez: RL and AL

The methods presented below integrate model-based reinforcement learning (RL) and active learning with the objective of minimizing both the number of action executions and the teacher demonstration requests. These approaches learn rule models that can be used by planners.

The code is available on Bitbucket.

Documentation for the code is available here.

V-MIN [1] extends REX [4] by including a teacher in the loop, which reduces the number of actions required to learn.

The REX algorithm is devised to apply relational generalizations to R-MAX, reducing the exploration needed.
V-MIN actively requests a teacher demonstration when a plan with a value larger than a certain threshold V_min cannot be found.

As a result, V-MIN can learn models even when exploration is very scarce. If an important state-action pair has not been visited, teacher demonstrations are requested until the agent learns a model that can obtain values larger than V_min.
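The request rule described above can be sketched in a few lines. This is a minimal illustration, not the released implementation: the `Plan` class, the `next_step` function and the returned labels are hypothetical names, and the planner is assumed to return either no plan or a plan with an estimated value.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Plan:
    actions: Sequence[str]
    value: float  # value the planner expects under the current learned model


def next_step(plan: Optional[Plan], v_min: float) -> str:
    """Decide what the agent does next (hypothetical helper).

    A demonstration is requested when no plan with a value larger
    than v_min could be found; otherwise the plan is executed.
    """
    if plan is None or plan.value <= v_min:
        return "request_demonstration"
    return "execute_plan"


# The same plan is accepted or rejected depending on the threshold,
# which is why changing V_min changes how often the teacher is asked.
plan = Plan(actions=("pick", "place"), value=0.6)
with_low_threshold = next_step(plan, v_min=0.5)
with_high_threshold = next_step(plan, v_min=0.8)
```

With the lower threshold the plan is executed, while the higher threshold triggers a demonstration request for the same plan.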

V-MIN has the following features:

V-MIN can learn with scarce exploration. V-MIN will visit unexplored important state-action pairs through demonstrations.
Because the learned model can be used by task planners, the initial state, the goal state, and the number of objects can be changed without requiring further learning.
Demonstrations can be used to learn new unknown actions that weren't required before.
A high V_min forces the system to learn good policies at the cost of more demonstrations and exploration actions, whereas a lower V_min makes learning faster and easier but yields worse models.
The teacher can change V_min online until the system performs as desired.
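To illustrate why a rule model carries over when the number of objects changes, here is a small sketch of grounding a lifted rule. It is a deterministic simplification under stated assumptions: the `Rule` class, the `pick` action and the `clear` predicate are hypothetical, and real learned rules would also carry effects and probabilities.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Rule:
    action: str
    params: tuple             # variable names, e.g. ("X",)
    preconditions: frozenset  # atoms over the variables, e.g. {("clear", "X")}


def ground(atoms, binding):
    """Replace each variable in the atoms with the object bound to it."""
    return frozenset(
        (atom[0],) + tuple(binding[v] for v in atom[1:]) for atom in atoms
    )


def applicable_bindings(rule, state, objects):
    """Return every variable binding whose grounded preconditions hold in state."""
    bindings = []
    for combo in permutations(objects, len(rule.params)):
        binding = dict(zip(rule.params, combo))
        if ground(rule.preconditions, binding) <= state:
            bindings.append(binding)
    return bindings


# One lifted rule, reused unchanged on problems with two and three objects.
pick = Rule("pick", ("X",), frozenset({("clear", "X")}))
small = applicable_bindings(pick, frozenset({("clear", "a")}), ["a", "b"])
large = applicable_bindings(
    pick, frozenset({("clear", "a"), ("clear", "c")}), ["a", "b", "c"]
)
```

The rule is written over variables, so the planner only needs to re-ground it for the new set of objects.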

Video comparing REX and V-MIN

The video below compares REX with V-MIN in the AUTAS scenario. The video is slowed down during the first episodes to show the teacher demonstration requests and the exploration in more detail.

The algorithms solve three problems with 5 episodes each. AUTAS 1 is the standard one, while AUTAS 2 and AUTAS 3 show new unexpected cases where previously unknown actions are required (and thus a new demonstration is required in V-MIN).

Demonstration Request	Exploration	Exploitation	Rules

REX (RL without demonstrations)

V-MIN (RL + active learning)

In V-MIN, the agent actively requests demonstrations from a teacher. However, the teacher may not know which parts of the model are unknown. Here we analyze the model to decide which parts cause the planner to find bad solutions, and use these causes to guide the teacher.

To explain planning errors, Göbelbecker et al. [5] designed a method to find excuses, which are changes to the state that allow the planner to find a solution. Based on these excuses, we can give the teacher feedback about possibly wrong preconditions or unknown required effects that may be causing the planner to fail.
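The idea of an excuse can be sketched as a search for a small change to the state that makes the task solvable. The brute-force search below is only an illustration and is not the method of [5]: `find_excuse`, `can_solve` and the predicates are hypothetical, and the sketch only considers adding atoms to the state.

```python
from itertools import combinations


def find_excuse(state, candidate_atoms, can_solve, max_size=2):
    """Return the smallest set of added atoms that makes the task solvable.

    state: frozenset of atoms describing the current state.
    candidate_atoms: atoms that may be added to the state.
    can_solve: callable standing in for the planner; it returns True
        when a plan exists from the given state.
    Returns None if no excuse with at most max_size atoms exists.
    """
    candidates = [atom for atom in candidate_atoms if atom not in state]
    for size in range(1, max_size + 1):
        for added in combinations(candidates, size):
            if can_solve(state | frozenset(added)):
                return frozenset(added)
    return None


# Toy stand-in for the planner: the task is solvable only when
# the shaft is in a vertical position.
def toy_planner(state):
    return ("vertical", "shaft") in state


state = frozenset({("horizontal", "shaft")})
candidates = [("clear", "table"), ("vertical", "shaft")]
excuse = find_excuse(state, candidates, toy_planner)
```

The returned atoms point at what the model is missing, which is the kind of information that can be reported to the teacher.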

Video

An example is shown in the video below. Here the robot has to learn a new action to reposition a shaft in a vertical position. When the planner cannot obtain a solution, the robot tells the teacher that the placeShaft action requires a non-horizontal position for the shaft, but that this position cannot be obtained.

In real-world domains, there are usually sequences of actions that, if executed, may produce unrecoverable errors (e.g. breaking an object). Robots should avoid repeating such errors when learning, and thus explore the state space in a more intelligent way. Robots should reason about dead-ends and their causes, and once dangerous actions are identified, the RL algorithm can avoid them.

We show this in a tableware clearing task.
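One simple way to make an RL agent avoid actions that have been identified as dangerous is to filter them out before action selection. The sketch below assumes that each dangerous case is stored as an action name plus the atoms under which it leads to a dead-end; the function, action and predicate names are hypothetical and are not taken from the tableware experiments.

```python
def safe_actions(state, actions, dangerous_cases):
    """Keep only the actions that match no known dangerous case.

    state: frozenset of atoms describing the current state.
    dangerous_cases: list of (action, condition) pairs, where condition
        is a frozenset of atoms; the action is considered dangerous
        when every atom of the condition holds in the state.
    """
    return [
        action
        for action in actions
        if not any(
            action == bad_action and condition <= state
            for bad_action, condition in dangerous_cases
        )
    ]


# Hypothetical case: picking up a plate while a cup is on top of it.
dangerous_cases = [("pick_plate", frozenset({("on", "cup", "plate")}))]
actions = ["pick_plate", "pick_cup"]

cup_on_plate = frozenset({("on", "cup", "plate")})
allowed_when_stacked = safe_actions(cup_on_plate, actions, dangerous_cases)
allowed_when_clear = safe_actions(frozenset(), actions, dangerous_cases)
```

The action stays available in states where the dangerous condition does not hold, so exploration is restricted only where a dead-end is expected.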

[1] V-MIN: Efficient reinforcement learning through demonstrations and relaxed reward demands
D. Martínez, G. Alenyà, and C. Torras
Proceedings of the AAAI Conference on Artificial Intelligence, 2015, pp. 2857–2863

[2] Relational reinforcement learning with guided demonstrations
D. Martínez, G. Alenyà, and C. Torras
Artificial Intelligence, 2017, 247, pp. 295–312

[3] Safe robot execution in model-based reinforcement learning
D. Martínez, G. Alenyà, and C. Torras
IEEE/RSJ International Conference on Intelligent Robots and Systems, 2015, pp. 6422–6427

[4] Exploration in relational domains for model-based reinforcement learning
T. Lang, M. Toussaint, and K. Kersting
The Journal of Machine Learning Research, 2012, 13(1), pp. 3725–3768

[5] Coming Up With Good Excuses: What to do When no Plan Can be Found
M. Göbelbecker, T. Keller, P. Eyerich, M. Brenner, and B. Nebel
International Conference on Automated Planning and Scheduling, 2010, pp. 81–88

Combining Reinforcement and Active Learning

V-MIN [1]

Teacher Guidance [2]

Safe Reinforcement Learning [3]