---
title: 'The Inverse DRAW model'
summary: ''
difficulty: 2 # out of 3
---
<p>Investigate an “Inverse DRAW” model. </p>
<p>
The <a href="https://arxiv.org/abs/1502.04623">DRAW</a> model is a generative
model of natural images that operates by making a large number of small
contributions to an additive canvas using an attention model. The attention
model used by the DRAW model identifies a small area in the image and "writes" to it.
</p>
<p>
In the Inverse DRAW model, there is a set of stochastic hidden variables and
an attention model that reads from them. The outputs
of the attention model are provided to an LSTM that produces the
observation one dimension (or one group of dimensions) at a time.
Thus, while the DRAW model uses attention to decide where to write on the output canvas,
the Inverse DRAW uses attention to choose which latent variable to use
at a given timestep. The Inverse DRAW model can be seen as a
<a href="https://arxiv.org/pdf/1410.5401v2.pdf">Neural Turing Machine</a> generative
model that emits one dimension at a time, where the memory is a read-only set of latent variables.
</p>
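<p>
As a rough illustration of this decoder, here is a minimal sketch in PyTorch.
All names, layer sizes, and the exact attention form are assumptions made for
illustration, not a prescribed architecture: the LSTM state produces a query,
the query soft-attends over a read-only set of latent vectors, and the attended
read is used to emit one observation dimension per step.
</p>
<pre><code>
# Hypothetical sketch of an Inverse DRAW decoder (p-model).
# Sizes and names (n_latents, latent_dim, ...) are illustrative assumptions.
import torch
import torch.nn as nn

class InverseDrawDecoder(nn.Module):
    def __init__(self, n_latents=16, latent_dim=32, hidden_dim=128, obs_dim=1):
        super().__init__()
        self.query = nn.Linear(hidden_dim, latent_dim)   # attention query from the LSTM state
        self.lstm = nn.LSTMCell(latent_dim + obs_dim, hidden_dim)
        self.readout = nn.Linear(hidden_dim, obs_dim)    # emits one dimension (or group) per step

    def forward(self, z, n_steps):
        # z: (batch, n_latents, latent_dim) -- the read-only latent "memory"
        batch = z.size(0)
        h = z.new_zeros(batch, self.lstm.hidden_size)
        c = z.new_zeros(batch, self.lstm.hidden_size)
        x_prev = z.new_zeros(batch, self.readout.out_features)
        outputs, attentions = [], []
        for _ in range(n_steps):
            # Soft attention over the latent variables: which latent to read at this step.
            scores = torch.einsum('bd,bnd->bn', self.query(h), z)
            alpha = torch.softmax(scores, dim=-1)
            read = torch.einsum('bn,bnd->bd', alpha, z)
            # The LSTM consumes the attended latent plus the previously emitted dimension.
            h, c = self.lstm(torch.cat([read, x_prev], dim=-1), (h, c))
            x_prev = self.readout(h)
            outputs.append(x_prev)
            attentions.append(alpha)
        return torch.stack(outputs, dim=1), torch.stack(attentions, dim=1)
</code></pre>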
<p> The Inverse DRAW model is an interesting concept to explore
because the dimensionality of the hidden variable is decoupled from
the length of the input. In more detail, the Inverse DRAW model is
a <a href="https://arxiv.org/pdf/1312.6114v10.pdf">variational
autoencoder</a>, whose p-model emits the observation one dimension at
a time, using attention to choose the appropriate latent variable for
each visible dimension. There is a fair amount of freedom in the
architecture of the approximate posterior. A natural choice is to use
the same architecture for the posterior, with the observation
playing the role of the latent variables.
</p>
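<p>
For concreteness, the training objective would be the standard variational lower bound.
Here is a minimal sketch, assuming a diagonal Gaussian posterior over the latent variables
and a unit-variance Gaussian likelihood factored over the dimensions emitted by the decoder
(the names and tensor shapes are assumptions):
</p>
<pre><code>
# Hypothetical ELBO for the Inverse DRAW under a diagonal Gaussian posterior
# and a unit-variance Gaussian likelihood per observation dimension.
import torch

def elbo(x, x_recon, mu, logvar):
    # x, x_recon: (batch, obs_size); mu, logvar: (batch, n_latents, latent_dim)
    recon = -0.5 * ((x - x_recon) ** 2).sum(dim=-1)                      # log p(x|z), up to a constant
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=(1, 2))  # KL(q(z|x) || N(0, I))
    return (recon - kl).mean()
</code></pre>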
<p> A useful property of the Inverse DRAW model is that its latent variables
may operate at a <i>rate</i> that differs from that of the observation. This
is the case because each dimension of the observation gets assigned to one
hidden state. If this model could successfully be made deep, we would get
a hierarchy of representations, where each representation operates at a
variable rate that is trained to be as well suited as possible to the
current dataset.
</p>
<p>It would be interesting to apply this model to a text dataset, and to visualize
the latent variables, as well as the precise way in which the model assigns
words to latent variables.
</p>
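<p>
For example, if the attention weights from a decoder like the sketch above are collected
for a single sequence, a heat map of emitted words against latent slots would show the
assignment directly. A small matplotlib sketch, with the interface assumed:
</p>
<pre><code>
# Hypothetical visualization: rows are emitted words, columns are latent slots,
# and cell intensity is the attention weight placed on that latent at that step.
import matplotlib.pyplot as plt

def plot_assignments(attn, words):
    # attn: array of shape (n_steps, n_latents) for one sequence; words: list of tokens
    fig, ax = plt.subplots(figsize=(6, 0.3 * len(words)))
    ax.imshow(attn, aspect='auto', cmap='viridis')
    ax.set_yticks(range(len(words)))
    ax.set_yticklabels(words)
    ax.set_xlabel('latent variable index')
    fig.tight_layout()
    plt.show()
</code></pre>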
<hr>
<h3>Notes</h3>
<p> This is a hard project, as it is not clear that a model like this
can be made to work with current techniques. However, that makes
success all the more impressive.</p>
<p>The inverse DRAW model may have a cost function that's very difficult to optimize, so expect a struggle.</p>
<h3>Solutions</h3>
<p>The code and an associated paper for a model implementing a version of the
"Inverse DRAW" model are available <a href="https://github.com/dojoteef/glas">here</a>.</p>