deepseek_vl2/models/modeling_deepseek.py (6 lines):
- line 948: # TODO: Should be removed once Flash Attention for ROCm is bumped to 2.1.
- line 1022: # TODO: Support compressed_kv for the kv_cache (instead of key_states, value_states) in the Flash Attention version.
- line 1029: # TODO: These transposes are quite inefficient, but Flash Attention requires the layout [batch_size, sequence_length, num_heads, head_dim]. We would need to refactor the KV cache.
- line 1121: # TODO: Remove the `query_length != 1` check once Flash Attention for ROCm is bumped to 2.1. For details, please see the comment in DeepseekV2FlashAttention2 __init__.
- line 1809: # TODO @gante: we should only keep a `cache_position` in generate, and do +=1.
- line 1819: # TODO: Use `next_tokens` directly instead.
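The TODO at line 1029 is about a layout mismatch: the decoder keeps query/key/value states in the KV-cache layout [batch_size, num_heads, sequence_length, head_dim], while the flash-attn kernels expect [batch_size, sequence_length, num_heads, head_dim]. A minimal sketch of the cost, with illustrative shapes and the `flash_attn_func` call left commented out so the snippet runs with torch alone:

```python
import torch

batch, num_heads, seq_len, head_dim = 2, 16, 128, 64  # illustrative sizes

# KV-cache layout: [batch, num_heads, seq_len, head_dim]
query_states = torch.randn(batch, num_heads, seq_len, head_dim)
key_states = torch.randn(batch, num_heads, seq_len, head_dim)
value_states = torch.randn(batch, num_heads, seq_len, head_dim)

# Flash Attention wants [batch, seq_len, num_heads, head_dim], so every
# forward pass pays for three transposes (the inefficiency the TODO notes).
query_states = query_states.transpose(1, 2)
key_states = key_states.transpose(1, 2)
value_states = value_states.transpose(1, 2)

# from flash_attn import flash_attn_func  # requires the flash-attn package
# attn_output = flash_attn_func(query_states, key_states, value_states, causal=True)
```

Refactoring the KV cache to store tensors in the second layout would remove the per-step transposes, which is what the comment proposes.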
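The TODO at line 1809 reads as a suggestion that generate() should own a single `cache_position` tensor and advance it by one per decoding step, rather than re-deriving it from the cache length on every call. A rough sketch of that bookkeeping under that reading, with illustrative sizes and the forward pass elided:

```python
import torch

prompt_length, max_new_tokens = 8, 4  # illustrative sizes

# Prefill: one position per prompt token.
cache_position = torch.arange(prompt_length)
for _ in range(max_new_tokens):
    # ... model forward pass consuming `cache_position` would go here ...
    # Decode: keep only the last position and advance it by one (the "+=1").
    cache_position = cache_position[-1:] + 1
```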