---
title: 'Im2Latex'
summary: ''
difficulty: 1 # out of 3
---

Sequence-to-sequence models with attention have been enormously successful. They made it possible for neural networks to reach new levels of state of the art in machine translation, speech recognition, and syntactic parsing. Thanks to this work, neural networks can now consume inputs of arbitrary shape and output sequences of variable length, without much effort on the practitioner's side.

Implement an attention model that takes an image of a PDF math formula, and outputs the characters of the LaTeX source that generates the formula.
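As a concrete starting point, here is a minimal PyTorch sketch of such a model: a small CNN encodes the formula image into a grid of feature vectors, and an LSTM decoder attends over that grid while emitting LaTeX tokens one at a time. All names and sizes (`Im2LatexModel`, `VOCAB`, `HIDDEN`, and so on) are illustrative assumptions, not a reference implementation.

```python
# Illustrative sketch of an image-to-LaTeX attention model.
# Sizes below are arbitrary placeholders, not tuned values.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMBED, HIDDEN, FEAT = 500, 64, 256, 128

class Im2LatexModel(nn.Module):
    def __init__(self):
        super().__init__()
        # CNN encoder: grayscale image -> (B, FEAT, H', W') feature grid
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, FEAT, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.embed = nn.Embedding(VOCAB, EMBED)
        self.cell = nn.LSTMCell(EMBED + FEAT, HIDDEN)
        self.attn_query = nn.Linear(HIDDEN, FEAT)
        self.out = nn.Linear(HIDDEN + FEAT, VOCAB)

    def forward(self, images, tokens):
        feats = self.encoder(images)                      # (B, FEAT, H', W')
        B, C, H, W = feats.shape
        feats = feats.view(B, C, H * W).transpose(1, 2)   # (B, H'*W', FEAT)
        h = feats.new_zeros(B, HIDDEN)
        c = feats.new_zeros(B, HIDDEN)
        ctx = feats.mean(dim=1)                           # initial context vector
        logits = []
        for t in range(tokens.size(1)):                   # teacher forcing
            x = torch.cat([self.embed(tokens[:, t]), ctx], dim=-1)
            h, c = self.cell(x, (h, c))
            # Dot-product attention over the image feature grid
            scores = torch.bmm(feats, self.attn_query(h).unsqueeze(2))
            alpha = F.softmax(scores, dim=1)              # (B, H'*W', 1)
            ctx = (alpha * feats).sum(dim=1)              # weighted context
            logits.append(self.out(torch.cat([h, ctx], dim=-1)))
        return torch.stack(logits, dim=1)                 # (B, T, VOCAB)
```

At inference time the decoder would instead feed its own predictions (or a beam) back in, rather than the ground-truth tokens used here for training.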


## Getting Started

For a quick start, download a prebuilt dataset or use these tools to build your own. Alternatively, you can build a dataset manually: collect LaTeX formulas (for example, from arXiv source files), normalize them, and render each one to an image, producing (image, LaTeX) training pairs, as sketched below.
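If you build the dataset yourself, the rendering step might look like the following sketch, which uses matplotlib's built-in math renderer to turn each formula into a PNG. The input file `formulas.txt` and the output layout are hypothetical placeholders.

```python
# A minimal sketch for rendering (image, LaTeX) pairs, assuming a plain-text
# file formulas.txt (hypothetical) with one normalized formula per line.
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
from pathlib import Path

def render_formula(latex: str, out_path: Path, dpi: int = 120) -> None:
    """Render a single math formula to a PNG via matplotlib mathtext."""
    fig = plt.figure(figsize=(6, 1))
    fig.text(0.05, 0.5, f"${latex}$", fontsize=14)
    fig.savefig(out_path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)

out_dir = Path("rendered")
out_dir.mkdir(exist_ok=True)
for i, line in enumerate(Path("formulas.txt").read_text().splitlines()):
    try:
        render_formula(line.strip(), out_dir / f"{i:06d}.png")
    except Exception:
        pass  # skip formulas the renderer cannot parse (mathtext covers only a LaTeX subset)
```

For fidelity to real documents you would likely swap mathtext for an actual `pdflatex` pipeline, but the sketch above is enough to get a small training set going.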


## Notes

A success here would be a very cool result and could be used to build a useful online tool.

While this is a very non-trivial project, we've marked it with a one-star difficulty rating because we know it's solvable using current methods. It remains challenging in practice, as it requires correctly assembling several ML components.

## Solutions

Results, dataset, code, and a write-up are available at http://lstm.seas.harvard.edu/latex/. The model is trained on the above datasets and uses an extension of the attention model from the Show, Attend and Tell paper, combined with a multi-row LSTM encoder. Code is written in Torch (based on the seq2seq-attn system), and the model is optimized using SGD. Additional experiments use the model to generate HTML from small webpages.
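For intuition, the multi-row LSTM encoder mentioned above can be sketched roughly as follows: run an LSTM along each row of the CNN feature grid, so that every spatial position carries horizontal context before the decoder attends to it. This is a hedged PyTorch sketch with illustrative dimensions, not the authors' Torch implementation.

```python
# Rough sketch of a row-wise LSTM encoder over a CNN feature grid.
import torch
import torch.nn as nn

class RowEncoder(nn.Module):
    def __init__(self, feat_dim: int = 128, hidden: int = 128):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) CNN feature grid -> (B, H*W, 2*hidden) annotations
        B, C, H, W = feats.shape
        rows = feats.permute(0, 2, 3, 1).reshape(B * H, W, C)  # one sequence per row
        encoded, _ = self.rnn(rows)                            # (B*H, W, 2*hidden)
        return encoded.reshape(B, H * W, -1)                   # flatten back for attention
```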