Summary: 28 instances, 24 unique

| Text | Count |
| --- | --- |
| # TODO: Wasteful, but for now "deserialize" the mask set into individual positions | 1 |
| # # TODO: The loading code in `transformers` assumes pred_layer is under transformers, so the LM head is not loaded properly. We load manually: | 1 |
| # TODO: WIP | 1 |
| # TODO: Manual numerically-stable softmax | 1 |
| # TODO: Not needed in transformers v3? Will vet. | 1 |
| # TODO: Dedup with BaseScorer's score() | 2 |
| # TODO: SEE COMMENT ABOVE REGARDING FIXEDBUCKETSAMPLER | 1 |
| # TODO: It also unnecessarily locks the GPUs to each other | 1 |
| # TODO: Test score accumulation | 1 |
| # TODO: Assumes newline at the end | 1 |
| # TODO: This does not restrict to specific GPUs however, use CUDA_VISIBLE_DEVICES? | 1 |
| # TODO: More than one file | 1 |
| # TODO: Super inefficient where we go from MXNet to NumPy to PyTorch | 1 |
| # TODO: What is correct # of steps? | 1 |
| # TODO: Manual numerically-stable softmax | 1 |
| # TODO: deduplicate | 1 |
| # TODO: WORK ON THIS | 1 |
| # TODO: Return the BLEU score | 1 |
| # TODO: Write tests about batching over multiple GPUs and getting the same scores | 1 |
| # TODO: Stale grad? | 1 |
| # TODO: Here we aggregate all edits and then | 1 |
| # TODO: There is a 'by-design' bug in FixedBucketSampler with num_shards > 0, where it silently reuses the last utterances: | 4 |
| # TODO: Manual numerically-stable softmax | 1 |
| # TODO: Have the scorers condition on what the vocab and tokenizer class are | 1 |
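Several rows above refer to a "manual numerically-stable softmax". For reference only, a minimal sketch of the standard max-subtraction trick (written here in NumPy; the function name and signature are illustrative, not the codebase's actual implementation):

```python
import numpy as np

def stable_log_softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Illustrative numerically stable log-softmax (not the repo's code).

    Subtracting the per-row maximum before exponentiating keeps exp() from
    overflowing for large logits; the result is mathematically unchanged.
    """
    shifted = logits - logits.max(axis=axis, keepdims=True)
    # shifted <= 0 everywhere, so exp(shifted) stays in (0, 1]
    log_norm = np.log(np.exp(shifted).sum(axis=axis, keepdims=True))
    return shifted - log_norm
```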