model/train.py (6 lines):
- line 50: # TODO: can we move handling of mems to inside transformer_gan?
- line 403: # TODO Add more options
- line 406: # Gen optimizer TODO: URGENT
- line 615: # TODO: reload epoch and scheduler
- line 927: # TODO: dis training messes up memory structure maintained during batch loading
- line 1000: # TODO: investigate training tricks for dis different clip?

model/transformer_gan.py (5 lines):
- line 244: # TODO: low priority could potentially make forward_generate a static func?
- line 275: # TODO: When training gen do not pass only context into dis (since no grads anyway)
- line 276: # TODO: do not loop over context
- line 301: # TODO: Can we retain sub graph after calling backward?
- line 502: self.cfg.DISCRIMINATOR.batch_chunk * self.cfg.DISCRIMINATOR.sample_chunks_mem)).backward() # TODO CNN WGAN-GP

model/mem_transformer.py (1 line):
- line 494: # TODO Think about how to enable masking when we reach BOS.

model/utils/config_helper.py (1 line):
- line 60: cfg.DISCRIMINATOR.beta_max = 100. # TODO: temperature decay
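
The line-50 and line-927 items in model/train.py point at the same design question: the Transformer-XL recurrence memory (`mems`) is threaded through the training loop, where discriminator updates can corrupt it. A minimal sketch of the alternative the line-50 TODO suggests, with the memory owned by the wrapper module; `TransformerGANSketch`, `generator`, and `reset_mems` are illustrative names, not the repo's actual API:

```python
import torch.nn as nn

class TransformerGANSketch(nn.Module):
    """Hypothetical wrapper: keep the Transformer-XL mems inside the module
    instead of threading them through the training loop."""

    def __init__(self, generator):
        super().__init__()
        self.generator = generator
        self._mems = None  # recurrence state, owned by the wrapper

    def forward(self, inp):
        # Assumes the underlying model returns (output, new_mems),
        # as Transformer-XL implementations typically do.
        out, self._mems = self.generator(inp, mems=self._mems)
        return out

    def reset_mems(self):
        # Call at sequence boundaries, and around discriminator updates
        # if they would otherwise clobber the cached state (cf. line 927).
        self._mems = None
```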
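
For the line-615 item ("reload epoch and scheduler"), the standard PyTorch pattern is to checkpoint the epoch counter and the scheduler's `state_dict()` next to the model and optimizer, so a resumed run continues the LR curve instead of restarting it. A minimal sketch under that assumption; `ckpt_path` and the dict keys are placeholders:

```python
import torch

def save_checkpoint(ckpt_path, model, optimizer, scheduler, epoch):
    # Persist everything needed to resume mid-run, not just the weights.
    torch.save({
        'epoch': epoch,
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
        'scheduler': scheduler.state_dict(),
    }, ckpt_path)

def load_checkpoint(ckpt_path, model, optimizer, scheduler):
    # Restoring the scheduler state keeps its internal step count, so the
    # learning-rate schedule picks up exactly where it stopped.
    state = torch.load(ckpt_path, map_location='cpu')
    model.load_state_dict(state['model'])
    optimizer.load_state_dict(state['optimizer'])
    scheduler.load_state_dict(state['scheduler'])
    return state['epoch'] + 1  # first epoch of the resumed run
```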
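
On the line-301 question in model/transformer_gan.py: PyTorch frees the autograd graph when `.backward()` returns, and there is no built-in way to retain only a sub-graph; `retain_graph=True` keeps the whole graph and its intermediate buffers alive, which permits a second backward pass but carries the full memory cost. A self-contained illustration:

```python
import torch

x = torch.randn(4, 8, requires_grad=True)
shared = (x * 2).tanh()        # shared subgraph feeding two losses
loss_a = shared.sum()
loss_b = (shared ** 2).mean()

# Without retain_graph=True the second backward would raise a RuntimeError,
# because the buffers of the shared subgraph are freed after the first call.
loss_a.backward(retain_graph=True)
loss_b.backward()              # last pass through the graph may free it
```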
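
The line-502 note names WGAN-GP for the CNN discriminator. For reference, the standard gradient-penalty term from Gulrajani et al. (2017) penalizes the discriminator's gradient norm on random interpolates between real and generated samples; a generic sketch follows, with `dis`, `real`, `fake`, and `lambda_gp` as placeholder names:

```python
import torch

def gradient_penalty(dis, real, fake, lambda_gp=10.0):
    # Random interpolation between real and generated samples.
    alpha = torch.rand(real.size(0), *([1] * (real.dim() - 1)),
                       device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = dis(interp)
    # create_graph=True makes the penalty itself differentiable, so it can
    # be backpropagated into the discriminator weights.
    grads, = torch.autograd.grad(outputs=scores.sum(), inputs=interp,
                                 create_graph=True)
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```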
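
Finally, the line-60 item in model/utils/config_helper.py ("temperature decay" next to `beta_max`) suggests annealing the temperature beta over training instead of fixing it. One common choice, shown purely as an assumption about what the TODO intends, is a log-linear interpolation from a hypothetical `beta_min` up to `beta_max`:

```python
def beta_schedule(step, total_steps, beta_min=1.0, beta_max=100.0):
    # Exponential (log-linear) anneal: returns beta_min at step 0 and
    # beta_max once step reaches total_steps, then holds it there.
    t = min(step / total_steps, 1.0)
    return beta_min * (beta_max / beta_min) ** t
```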