_requests_for_research/multitask-rl-with-continuous-actions.html

---
title: 'Multitask RL with continuous actions.'
summary: ''
difficulty: 2 # out of 3
---

At present, most machine learning algorithms are trained to solve one task and one task only. We do not train models on a single task at a time because we believe it is the best approach in the long term. On the contrary, we would like to use multitask learning in as many problems as possible, but multitask learning algorithms are not yet at a stage where they provide a robust and sizeable improvement across a wide range of domains.

Multitask learning should be particularly important in reinforcement learning settings, since in the long run experience will be very expensive relative to computation and possibly to supervised data. For this reason, it is worthwhile to investigate the feasibility of multitask learning using the RL algorithms that have been developed so far.

The goal is therefore to train a single neural network that can simultaneously solve a collection of <a href="https://gym.openai.com/envs#mujoco">MuJoCo environments</a> in Gym. The current environments are dissimilar enough that it is unlikely that information can be shared between them. Your job is therefore to create a set of similar environments that will serve as a good testbed for multitask learning. Some possibilities include (1) bipedal walking with different limb dimensions and masses, (2) reaching with <a href="http://papers.nips.cc/paper/2785-learning-to-control-an-octopus-arm-with-gaussian-process-temporal-difference-methods.pdf">octopus arms</a> that have different numbers of links, and (3) using the same robot model for walking, jumping, and standing. At the end of learning, the trained neural network should be told (via an additional input) which task it is running on, and it should achieve high cumulative reward on that task.
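
One simple way to supply the "additional input" mentioned above is to append a one-hot task indicator to each observation before it reaches the network. The sketch below is only an illustration under stated assumptions: observations are flat vectors, they may differ in length across tasks (so they are zero-padded to a common size), and the names `pad_and_tag`, `num_tasks`, and `max_obs_dim` are hypothetical and not part of Gym or MuJoCo.

```python
import numpy as np


def pad_and_tag(obs, task_id, num_tasks, max_obs_dim):
    """Zero-pad a flat observation to max_obs_dim and append a one-hot task ID.

    Hypothetical helper for illustration only; it is not a Gym API.
    """
    obs = np.asarray(obs, dtype=np.float64).ravel()
    if obs.size > max_obs_dim:
        raise ValueError("observation is longer than max_obs_dim")
    if not 0 <= task_id < num_tasks:
        raise ValueError("task_id must be in [0, num_tasks)")

    # Pad so that every task presents the same input size to the network.
    padded = np.zeros(max_obs_dim, dtype=np.float64)
    padded[: obs.size] = obs

    # One-hot indicator telling the network which task it is running on.
    one_hot = np.zeros(num_tasks, dtype=np.float64)
    one_hot[task_id] = 1.0

    return np.concatenate([padded, one_hot])
```

With this encoding the network input has a fixed size of `max_obs_dim + num_tasks` regardless of the task. Other encodings, such as a learned task embedding, would serve the same purpose.
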
The goal of this problem is to determine whether there is any benefit whatsoever to training a single neural network on multiple environments rather than on a single one, where we measure benefit via training speed. We already know that multitask learning on Atari has been difficult (see the <a href="http://arxiv.org/pdf/1511.06295.pdf">relevant</a> <a href="http://arxiv.org/pdf/1511.06342.pdf">papers</a>). But will multitask learning work better on MuJoCo environments? The goal is to find out.

The most interesting experiment is to train a multitask net of this kind on all but one MuJoCo environment, and then see whether the resulting net can be trained more rapidly on the task it has not been trained on. In other words, we hope that this kind of multitask learning can accelerate the training of new tasks. If successful, the results could be significant.

<hr />
<h3> Notes </h3>
This is a reasonably risky project, since there is a chance that this kind of transfer will be as difficult as it has been for Atari.
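
As a rough sketch of how the held-out experiment and the training-speed measure described above might be organized, the helpers below enumerate leave-one-out task splits and report how many episodes a learning curve needs before its moving-average return reaches a target. Everything here is an assumption made for illustration: the function names, the moving-average criterion, the `window` size, and the choice of threshold are not prescribed by this request.

```python
def leave_one_out_splits(task_names):
    """Yield (training_tasks, held_out_task) pairs, one per task."""
    task_names = list(task_names)
    for i, held_out in enumerate(task_names):
        yield task_names[:i] + task_names[i + 1:], held_out


def episodes_to_threshold(returns, threshold, window=10):
    """Return the 1-based episode count at which the mean return over the
    last `window` episodes first reaches `threshold`, or None if it never does.

    `returns` is a sequence of per-episode cumulative rewards.
    """
    for end in range(window, len(returns) + 1):
        if sum(returns[end - window:end]) / window >= threshold:
            return end
    return None
```

For each split, one could train on the training tasks, fine-tune on the held-out task, and compare `episodes_to_threshold` for the fine-tuned network against a network trained from scratch on that task. A smaller count for the fine-tuned network would suggest that multitask training accelerated learning on the new task.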
