_requests_for_research/parallel-trpo.html

---
title: Parallel TRPO
summary: ''
difficulty: 2 # out of 3
---
<p> Because it is always desirable to train larger models on harder domains, parallelization is an important area of research. Parallelization has played an <a href="http://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks.pdf">important role</a> in deep learning, and has been <a href="https://arxiv.org/pdf/1507.04296.pdf">especially</a> <a href="https://arxiv.org/pdf/1602.01783.pdf">successful</a> in reinforcement learning. Algorithms that parallelize well will make it possible to train larger models faster, which will advance the field. </p>
<p> The goal of this project is to implement the <a href="https://arxiv.org/pdf/1502.05477v4.pdf">Trust Region Policy Optimization (TRPO)</a> algorithm so that it uses multiple computers to achieve a 15x lower wall-clock time than <a href="https://gym.openai.com/evaluations/eval_W27eCzLQBy60FciaSGSJw">joschu's single-threaded implementation</a> on the MuJoCo or Atari <a href="https://gym.openai.com/envs">Gym environments</a>. Given that TRPO is a highly stable algorithm that is extremely easy to use, a well-tuned parallel implementation could have a lot of practical significance. </p>
<p> You may worry that solving this problem requires access to a large number of computers. It does not: it is straightforward to simulate a set of parallel computers using a single core. </p>
<p> Make sure your code remains generic and readable. </p>
<hr />
<h3>Notes</h3>
<p>It is known that RL algorithms can be parallelized well, so we expect it to be possible to improve upon the basic implementation. What is less obvious is whether it is possible to get a 15x speedup using, say, only 20x more nodes.</p>
<h3>Solutions</h3>
<p> A preliminary paper describing TRPO with parallel actors is available <a href="http://kvfrans.com/static/trpo.pdf">here</a>, with the implementation available <a href="https://github.com/kvfrans/parallel-trpo">at this repo</a>. Current results are a 3x speedup when using 4 cores. </p>
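<p> To compare speedup figures such as the ones above, and to illustrate what simulating parallel computers on a single core could look like, here is a minimal Python sketch. It is not taken from any of the linked implementations. The function names <code>simulate_parallel_step</code> and <code>parallel_efficiency</code> are hypothetical, and the timing model is an assumption: workers are taken to run fully concurrently, communication cost is ignored, and "20x more nodes" is read as 20 nodes against a single-node baseline. </p>

```python
import time
from typing import Callable, List, Tuple


def simulate_parallel_step(
    worker_tasks: List[Callable[[], object]],
    serial_task: Callable[[], object],
) -> Tuple[float, float]:
    """Run all worker tasks one after another on the current core.

    Returns (sequential_seconds, simulated_parallel_seconds).
    Assumption: on real hardware the workers would overlap perfectly,
    so the parallel phase costs as much as the slowest worker, and
    communication between workers is free.
    """
    if not worker_tasks:
        raise ValueError("at least one worker task is required")

    # Time each worker in isolation (e.g. collecting one batch of rollouts).
    worker_seconds = []
    for task in worker_tasks:
        start = time.perf_counter()
        task()
        worker_seconds.append(time.perf_counter() - start)

    # Time the part that cannot be parallelized (e.g. the policy update).
    start = time.perf_counter()
    serial_task()
    serial_seconds = time.perf_counter() - start

    sequential = sum(worker_seconds) + serial_seconds
    simulated_parallel = max(worker_seconds) + serial_seconds
    return sequential, simulated_parallel


def parallel_efficiency(speedup: float, num_nodes: int) -> float:
    """Fraction of ideal linear scaling achieved: speedup / node count."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return speedup / num_nodes


if __name__ == "__main__":
    # Stand-in workloads; a real experiment would run environment rollouts.
    workers = [lambda: sum(range(200_000)) for _ in range(4)]
    seq, par = simulate_parallel_step(workers, lambda: sum(range(50_000)))
    print(f"simulated speedup with 4 workers: {seq / par:.2f}x")
    print(f"target efficiency (15x on 20 nodes): {parallel_efficiency(15, 20)}")
    print(f"reported efficiency (3x on 4 cores): {parallel_efficiency(3, 4)}")
```

<p> Under this definition, a 15x speedup on 20 nodes and a 3x speedup on 4 cores both correspond to an efficiency of 0.75. The open question is whether that efficiency can be sustained as the number of nodes grows. </p>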
