Summary: 28 instances, 28 unique

| Text | Count |
| --- | --- |
| # TODO: investigate how to avoid using torch.cat (because it seems to be slow for CPU tensors) | 1 |
| # TODO: Optimizer state gets cast to FP16 and back to FP32 for | 1 |
| # TODO: is there a way to split by device and dtype without appending in the inner loop? | 1 |
| # TODO (Min): these two factors doesn't have to be the same. More tuning can be done. | 1 |
| # TODO: We need to make sure following conditions hold before preparing the graph for the pipeline: | 1 |
| // TODO: Print which tensor fails. | 1 |
| # TODO: There seems to be a memory leak that is solved by following line. | 1 |
| # TODO type process_group once `distributed` module is stubbed | 1 |
| // TODO: Kernel arg size limit may be <4KB for some other cards (ie Jetson) | 1 |
| # TODO (min): this may not work with activation checkpointing when | 1 |
| # TODO (Min): Future PyTorch versions may provide a way to completely disable this | 1 |
| # TODO: This dependency injection causes deadlock if this partition | 1 |
| # TODO: fill in the types for these, or otherwise figure out some | 1 |
| TODO (min): need a way of determine how much to increase the step size? | 1 |
| # TODO (Min): the `group` used here in the key is the object hash, not the content | 1 |
| # TODO (min, mike): We need to have a way to check that training loop & DDP | 1 |
| TODO (min): have both `set_scale` and `set_num_gradients_to_accumulate` | 1 |
| # TODO: Remove the toggle here when github open issue #801 is resolved. | 1 |
| # TODO More precise types here. | 1 |
| # TODO (Min): this global used to be assigned every time this file is imported. | 1 |
| # TODO: One downside of doing it this way, is direct use of | 1 |
| # TODO: deprecate this function | 1 |
| # TODO set process_group to the write type once torch.distributed is stubbed | 1 |
| # TODO: Remove the toggle-enable_nccl_base_collectives when github open issue #801 is resolved. | 1 |
| TODO (Min): It would be nice to capture the stepping state as well. | 1 |
| # TODO: this list could be huge. Can we avoid materializing? | 1 |
| # TODO: Debug why we need this special handling | 1 |
| # TODO make Variable and Function more precise | 1 |