src/hyperpod_nemo_adapter/collections/model/nlp/custom_models/modeling_deepseek.py (3 lines):
- line 786: # TODO: Should be removed once Flash Attention for RoCm is bumped to 2.1.
- line 858: # TODO: These transpose are quite inefficient but Flash Attention requires the layout [batch_size, sequence_length, num_heads, head_dim]. We would need to refactor the KV cache (see the layout sketch after this list)
- line 943: # TODO: Remove the `query_length != 1` check once Flash Attention for RoCm is bumped to 2.1. For details, please see the comment in DeepseekV3FlashAttention2 __init__.

src/hyperpod_nemo_adapter/collections/model/sagemaker_base_model.py (2 lines):
- line 375: # TODO add support later for flash att
- line 402: # TODO add support later for flash att

src/hyperpod_nemo_adapter/patches/patch_llama_flash_attn_cp.py (2 lines):
- line 92: # TODO: Should be removed once Flash Attention for RoCm is bumped to 2.1.
- line 195: # TODO: These transpose are quite inefficient but Flash Attention requires the layout [batch_size, sequence_length, num_heads, head_dim]. We would need to refactor the KV cache

src/hyperpod_nemo_adapter/utils/get_rank.py (1 line):
- line 23: TODO: Add support for getting EKS rank

src/hyperpod_nemo_adapter/utils/callbacks/checkpoint.py (1 line):
- line 337: # TODO: resample if hardward configurations change

src/hyperpod_nemo_adapter/collections/model/nlp/sagemaker_deepseek_model.py (1 line):
- line 32: # TODO add a model class for the first-party DeepSeek models (DeepSeek-R1, DeepSeek-V3...)
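
The two layout TODOs above (modeling_deepseek.py line 858 and patch_llama_flash_attn_cp.py line 195) point at the same constraint: Hugging Face-style attention modules typically keep the KV cache in [batch_size, num_heads, sequence_length, head_dim], while flash-attn's `flash_attn_func` consumes [batch_size, sequence_length, num_heads, head_dim], so each forward pass pays for extra transposes. The sketch below is illustrative only and is not the adapter's code; the tensor sizes are made up, and the CPU fallback via `scaled_dot_product_attention` is an assumption for environments without the flash-attn package.

```python
import torch

# Illustrative sizes only; not taken from the adapter's configs.
batch, num_heads, seq_len, head_dim = 2, 8, 16, 64

# Layout commonly kept by HF-style attention modules so past keys/values can be
# concatenated along the sequence dimension: [batch, num_heads, seq_len, head_dim].
q = torch.randn(batch, num_heads, seq_len, head_dim)
k = torch.randn(batch, num_heads, seq_len, head_dim)
v = torch.randn(batch, num_heads, seq_len, head_dim)

# Layout required by flash_attn_func: [batch, seq_len, num_heads, head_dim].
# These per-step transposes are what the TODO calls out as inefficient.
q_fa, k_fa, v_fa = (t.transpose(1, 2) for t in (q, k, v))

if torch.cuda.is_available():
    # flash-attn runs on CUDA devices and expects fp16/bf16 inputs.
    from flash_attn import flash_attn_func

    out = flash_attn_func(
        q_fa.half().cuda(), k_fa.half().cuda(), v_fa.half().cuda(), causal=True
    )
else:
    # Fallback assumption: PyTorch SDPA consumes the original [b, h, s, d] layout.
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
```

Refactoring the KV cache to store tensors in the flash-attn layout directly would remove these transposes from the hot path, which is what the TODOs propose.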