deepseek_vl2/models/modeling_deepseek.py (6 lines):
- line 948: # TODO: Should be removed once Flash Attention for ROCm is bumped to 2.1.
- line 1022: # TODO: Support compressed_kv for the kv_cache (instead of key_states, value_states) in the Flash Attention version.
- line 1029: # TODO: These transposes are quite inefficient, but Flash Attention requires the layout [batch_size, sequence_length, num_heads, head_dim]. We would need to refactor the KV cache.
- line 1121: # TODO: Remove the `query_length != 1` check once Flash Attention for ROCm is bumped to 2.1. For details, please see the comment in DeepseekV2FlashAttention2 __init__.
- line 1809: # TODO @gante: we should only keep a `cache_position` in generate, and do +=1.
- line 1819: # TODO: Use `next_tokens` directly instead.
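The TODO at line 1029 is about a layout mismatch: the decoder keeps query/key/value states in the KV-cache layout [batch_size, num_heads, sequence_length, head_dim], while the flash-attn kernels expect [batch_size, sequence_length, num_heads, head_dim]. A minimal sketch of the cost, with illustrative shapes and the `flash_attn_func` call left commented out so the snippet runs with torch alone:

```python
import torch

batch, num_heads, seq_len, head_dim = 2, 16, 128, 64  # illustrative sizes

# KV-cache layout: [batch, num_heads, seq_len, head_dim]
query_states = torch.randn(batch, num_heads, seq_len, head_dim)
key_states = torch.randn(batch, num_heads, seq_len, head_dim)
value_states = torch.randn(batch, num_heads, seq_len, head_dim)

# Flash Attention wants [batch, seq_len, num_heads, head_dim], so every
# forward pass pays for three transposes (the inefficiency the TODO notes).
query_states = query_states.transpose(1, 2)
key_states = key_states.transpose(1, 2)
value_states = value_states.transpose(1, 2)

# from flash_attn import flash_attn_func  # requires the flash-attn package
# attn_output = flash_attn_func(query_states, key_states, value_states, causal=True)
```

Refactoring the KV cache to store tensors in the second layout would remove the per-step transposes, which is what the comment proposes.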
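The TODO at line 1809 reads as a suggestion that generate() should own a single `cache_position` tensor and advance it by one per decoding step, rather than re-deriving it from the cache length on every call. A rough sketch of that bookkeeping under that reading, with illustrative sizes and the forward pass elided:

```python
import torch

prompt_length, max_new_tokens = 8, 4  # illustrative sizes

# Prefill: one position per prompt token.
cache_position = torch.arange(prompt_length)
for _ in range(max_new_tokens):
    # ... model forward pass consuming `cache_position` would go here ...
    # Decode: keep only the last position and advance it by one (the "+=1").
    cache_position = cache_position[-1:] + 1
```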