csrc/kernels/internode.cu (5 lines):
- line 146: // TODO: faster encoding
- line 253: // TODO: more light fence or barrier or signaling
- line 254: // TODO: overlap EP barrier and NVL cleaning
- line 283: // TODO: may use NVSHMEM reduction
- line 1246: // TODO: maybe too many registers here

csrc/config.hpp (3 lines):
- line 47: // TODO: add assertions
- line 73: // TODO: add assertions
- line 145: // TODO: optimize memory usages

csrc/kernels/runtime.cu (1 line):
- line 61: // TODO: we still use `nvshmem_barrier` under IBRC mode, which should be switch to IBGDA mode later

csrc/kernels/ibgda_device.cuh (1 line):
- line 47: // TODO: simplify PTX using 16-bit instructions

setup.py (1 line):
- line 12: # TODO: currently, we only support Hopper architecture, we may add Ampere support later