parler_tts/modeling_parler_tts.py (7 lines):
- line 611: # TODO: Should be removed once Flash Attention for RoCm is bumped to 2.1.
- line 676: # # TODO: These transpose are quite inefficient but Flash Attention requires the layout [batch_size, sequence_length, num_heads, head_dim]
- line 743: # TODO: Remove the `query_length != 1` check once Flash Attention for RoCm is bumped to 2.1. For details, please see the comment in LlamaFlashAttention2 __init__.
- line 833: # TODO: Improve this warning with e.g. `model.config._attn_implementation = "manual"` once this is implemented.
- line 1187: TODO: it's passed through enc_to_dec_proj and optionnally we concat the prompt hidden states in certain cases.
- line 1508: # TODO: As it is, the masked ids from the prompt will still count in the positions embeddings
- line 1666: # TODO: As of torch==2.2.0, the `attention_mask` passed to the model in `generate` is 2D and of dynamic length even when the static

parler_tts/dac_wrapper/modeling_dac.py (2 lines):
- line 68: # TODO: for now, no chunk length
- line 133: # TODO: for now, no chunk length
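Note on the two Flash Attention TODOs (source lines 611 and 743): they point at the same version quirk. flash_attn < 2.1 builds a top-left aligned causal mask, while flash_attn >= 2.1 uses the bottom-right alignment that decoding needs, so older builds have to special-case the single-token (query_length == 1) decode step. Below is a minimal sketch of that kind of version gate, assuming transformers-style handling; the helper names are illustrative and not the actual parler_tts code.

    # Sketch of the flash_attn version gate the TODOs describe (assumed,
    # not copied from parler_tts).
    from importlib.metadata import version

    from packaging.version import parse


    def flash_attn_is_at_least_2_1() -> bool:
        # True when the installed flash_attn already produces the
        # bottom-right aligned causal mask (>= 2.1.0).
        return parse(version("flash_attn")) >= parse("2.1.0")


    def resolve_causal_flag(is_causal: bool, query_length: int) -> bool:
        # With flash_attn >= 2.1 the causal flag passes through unchanged.
        # With older builds, single-token decoding (query_length == 1) must
        # drop the causal flag to avoid the misaligned top-left mask.
        if flash_attn_is_at_least_2_1():
            return is_causal
        return is_causal and query_length != 1

Once Flash Attention for RoCm reaches 2.1, the whole gate collapses to returning is_causal, which is what both TODOs ask for.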