Summary: 28 instances, 24 unique

| Text | Count |
| --- | --- |
| # TODO: Wasteful, but for now "deserialize" the mask set into individual positions | 1 |
| # # TODO: The loading code in `transformers` assumes pred_layer is under transformers, so the LM head is not loaded properly. We load manually: | 1 |
| # TODO: WIP | 1 |
| # TODO: Manual numerically-stable softmax | 1 |
| # TODO: Not needed in transformers v3? Will vet. | 1 |
| # TODO: Dedup with BaseScorer's score() | 2 |
| # TODO: SEE COMMENT ABOVE REGARDING FIXEDBUCKETSAMPLER | 1 |
| # TODO: It also unnecessarily locks the GPUs to each other | 1 |
| # TODO: Test score accumulation | 1 |
| # TODO: Assumes newline at the end | 1 |
| # TODO: This does not restrict to specific GPUs however, use CUDA_VISIBLE_DEVICES? | 1 |
| # TODO: More than one file | 1 |
| # TODO: Super inefficient where we go from MXNet to NumPy to PyTorch | 1 |
| # TODO: What is correct # of steps? | 1 |
| # TODO: Manual numerically-stable softmax | 1 |
| # TODO: deduplicate | 1 |
| # TODO: WORK ON THIS | 1 |
| # TODO: Return the BLEU score | 1 |
| # TODO: Write tests about batching over multiple GPUs and getting the same scores | 1 |
| # TODO: Stale grad? | 1 |
| # TODO: Here we aggregate all edits and then | 1 |
| # TODO: There is a 'by-design' bug in FixedBucketSampler with num_shards > 0, where it silently reuses the last utterances: | 4 |
| # TODO: Manual numerically-stable softmax | 1 |
| # TODO: Have the scorers condition on what the vocab and tokenizer class are | 1 |
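Several rows above refer to a "manual numerically-stable softmax". For reference only, a minimal sketch of the standard max-subtraction trick (written here in NumPy; the function name and signature are illustrative, not the codebase's actual implementation):

```python
import numpy as np

def stable_log_softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Illustrative numerically stable log-softmax (not the repo's code).

    Subtracting the per-row maximum before exponentiating keeps exp() from
    overflowing for large logits; the result is mathematically unchanged.
    """
    shifted = logits - logits.max(axis=axis, keepdims=True)
    # shifted <= 0 everywhere, so exp(shifted) stays in (0, 1]
    log_norm = np.log(np.exp(shifted).sum(axis=axis, keepdims=True))
    return shifted - log_norm
```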