src/hyperpod_nemo_adapter/collections/model/nlp/custom_models/modeling_deepseek.py (3 lines):
- line 786: # TODO: Should be removed once Flash Attention for RoCm is bumped to 2.1.
- line 858: # TODO: These transpose are quite inefficient but Flash Attention requires the layout [batch_size, sequence_length, num_heads, head_dim]. We would need to refactor the KV cache (see the layout sketch after this list)
- line 943: # TODO: Remove the `query_length != 1` check once Flash Attention for RoCm is bumped to 2.1. For details, please see the comment in DeepseekV3FlashAttention2 __init__.

src/hyperpod_nemo_adapter/collections/model/sagemaker_base_model.py (2 lines):
- line 375: # TODO add support later for flash att
- line 402: # TODO add support later for flash att

src/hyperpod_nemo_adapter/patches/patch_llama_flash_attn_cp.py (2 lines):
- line 92: # TODO: Should be removed once Flash Attention for RoCm is bumped to 2.1.
- line 195: # TODO: These transpose are quite inefficient but Flash Attention requires the layout [batch_size, sequence_length, num_heads, head_dim]. We would need to refactor the KV cache

src/hyperpod_nemo_adapter/utils/get_rank.py (1 line):
- line 23: TODO: Add support for getting EKS rank

src/hyperpod_nemo_adapter/utils/callbacks/checkpoint.py (1 line):
- line 337: # TODO: resample if hardward configurations change

src/hyperpod_nemo_adapter/collections/model/nlp/sagemaker_deepseek_model.py (1 line):
- line 32: # TODO add a model class for the first-party DeepSeek models (DeepSeek-R1, DeepSeek-V3...)
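
The two layout TODOs above (modeling_deepseek.py line 858 and patch_llama_flash_attn_cp.py line 195) point at the same constraint: Hugging Face-style attention modules typically keep the KV cache in [batch_size, num_heads, sequence_length, head_dim], while flash-attn's `flash_attn_func` consumes [batch_size, sequence_length, num_heads, head_dim], so each forward pass pays for extra transposes. The sketch below is illustrative only and is not the adapter's code; the tensor sizes are made up, and the CPU fallback via `scaled_dot_product_attention` is an assumption for environments without the flash-attn package.

```python
import torch

# Illustrative sizes only; not taken from the adapter's configs.
batch, num_heads, seq_len, head_dim = 2, 8, 16, 64

# Layout commonly kept by HF-style attention modules so past keys/values can be
# concatenated along the sequence dimension: [batch, num_heads, seq_len, head_dim].
q = torch.randn(batch, num_heads, seq_len, head_dim)
k = torch.randn(batch, num_heads, seq_len, head_dim)
v = torch.randn(batch, num_heads, seq_len, head_dim)

# Layout required by flash_attn_func: [batch, seq_len, num_heads, head_dim].
# These per-step transposes are what the TODO calls out as inefficient.
q_fa, k_fa, v_fa = (t.transpose(1, 2) for t in (q, k, v))

if torch.cuda.is_available():
    # flash-attn runs on CUDA devices and expects fp16/bf16 inputs.
    from flash_attn import flash_attn_func

    out = flash_attn_func(
        q_fa.half().cuda(), k_fa.half().cuda(), v_fa.half().cuda(), causal=True
    )
else:
    # Fallback assumption: PyTorch SDPA consumes the original [b, h, s, d] layout.
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
```

Refactoring the KV cache to store tensors in the flash-attn layout directly would remove these transposes from the hot path, which is what the TODOs propose.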