_requests_for_research/multitask-rl-with-continuous-actions.html (41 lines of code) (raw):
---
title: 'Multitask RL with continuous actions.'
summary: ''
difficulty: 2 # out of 3
---
<p>At present, most machine learning algorithms are trained to solve one task
and one task only. But we do not necessarily train models on only one task
at a time because we believe that it is the best approach in the long term;
on the contrary, while we would like to use multitask learning in as many
problems as possible, the multitask learning algorithms are not yet at a stage
where they provide a robust and a sizeable improvement across a wide range of domains.
</p>
<p> This sort of multitask learning should be particularly important in reinforcement
learning settings, since in the long run, experience will be very expensive
relative to computation and possibly supervised data. For this reason, it is
worthwhile to investigate the feasibility of multitask learning using
the RL algorithms that have been developed so far.
</p>
<p>Thus the goal is to train a single neural network that can simultaneously solve a collection of <a href="https://gym.openai.com/envs#mujoco">MuJoCo
environments</a> in Gym. The current enviroments are dissimilar enough that it is unlikely that information can be shared between them. Therefore, your job is to create a set of similar environments that will serve as a good testbed for multi-task learning. Some possibilities include (1) bipedally walking with different limb dimensions and masses, (2) reaching with <a href="http://papers.nips.cc/paper/2785-learning-to-control-an-octopus-arm-with-gaussian-process-temporal-difference-methods.pdf">octopus arms</a> that have different numbers of links, (3) using the same robot model for walking, jumping, and standing.</p>
<p> At the end of learning, the trained neural network should be told
(via an additional input) which task it's running on, and
achieve high cumulative reward on this task. The goal of this problem
is to determine whether there is any benefit whatsoever to training a
single neural network on multiple environments versus a single
one, where we measure benefit via training speed. </p>
<p> We already know that the multitask learning on Atari has been difficult
(see the <a href="http://arxiv.org/pdf/1511.06295.pdf">relevant</a> <a href="http://arxiv.org/pdf/1511.06342.pdf">papers</a>). But will
multitask learning work better on MuJoCo environments? The goal is to
find out. </p>
<p>The most interesting experiment is to train a multitask net of this kind
on all but one MuJoCo environment, and then see if the resulting net
can be trained more rapidly on a task that it hasn't been trained on.
In other words, we hope that this kind of multitask learning can
accelerate training of new tasks. If successful, the results
can be significant.</p>
<hr />
<h3> Notes </h3>
<p> It is a reasonably risky project, since there is a chance that
this kind of transfer will be as difficult as it has been for
Atari.</p>