muse/modeling_transformer.py (4 lines):
  - line 888: # TODO: should norm be applied to encoder_hidden_states as well?
  - line 1208: # TODO: make this configurable
  - line 1290: can_remask_prev_masked=False, # TODO: implement this
  - line 1360: scores = rearrange(scores, "... 1 -> ...") # TODO: use torch

muse/pipeline_muse.py (4 lines):
  - line 120: ).input_ids # TODO: remove hardcode
  - line 299: # TODO: Add config for pipeline to specify text encoder
  - line 311: # TODO: make this more robust
  - line 436: ).input_ids # TODO: remove hardcode

training/train_muse.py (3 lines):
  - line 179: # TODO - would be nice to vectorize
  - line 526: # TODO: make this configurable
  - line 911: # TODO: Add generation

scripts/gen_sdxl_synthetic_dataset.py (1 line):
  - line 97: # TODO - can we avoid syncing images to cpu

muse/modeling_paella_vq.py (1 line):
  - line 11: # TODO: This model only supports inference, not training. Make it trainable.

scripts/pre_encode.py (1 line):
  - line 92: TODO - probably would be better to wait until the thread pool is full and then

muse/modeling_utils.py (1 line):
  - line 831: # TODO: remove this when we remove the deprecation warning, and the `kwargs` argument,

training/data.py (1 line):
  - line 75: # FIXME webdataset version throws if suffix in current_sample, but we have a potential for

training/train_maskgit_imagenet.py (1 line):
  - line 499: # TODO: Add generation

muse/modeling_transformer_v2.py (1 line):
  - line 160: # TODO: Allow enabling fused norm using a function (like we do for xformers attention)
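
For muse/modeling_transformer.py line 1360, the einops pattern "... 1 -> ..." only drops a trailing size-1 axis, so a minimal pure-torch sketch (assuming `scores` really ends in a singleton dimension, as the pattern requires) is:

    import torch

    scores = torch.randn(4, 16, 1)  # stand-in tensor with a trailing size-1 axis
    # Equivalent of rearrange(scores, "... 1 -> ...") without einops:
    scores = scores.squeeze(-1)     # shape becomes (4, 16)

Note that squeeze(-1) silently no-ops when the last axis is not size 1, whereas the einops call would raise; an explicit shape assert can preserve that check if it matters.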
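
For the two ).input_ids items in muse/pipeline_muse.py, the hardcode is likely the sequence length passed to the tokenizer alongside the prompt; if so, one hedged sketch is the following, where the checkpoint name and prompt are purely illustrative, not the pipeline's actual values:

    from transformers import AutoTokenizer

    # Illustrative tokenizer; the pipeline's own text encoder/tokenizer would be used instead.
    tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    inputs = tokenizer(
        ["a photo of a cat"],
        padding="max_length",
        max_length=tokenizer.model_max_length,  # read from the tokenizer instead of a hardcoded literal
        truncation=True,
        return_tensors="pt",
    )
    input_ids = inputs.input_ids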
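
For muse/modeling_transformer_v2.py line 160, a sketch of a per-module toggle in the spirit of the existing xformers-attention enabler; the function and attribute names here are assumptions, not the repository's API:

    import torch.nn as nn

    def enable_fused_norm(model: nn.Module, enabled: bool = True) -> None:
        # Hypothetical helper: walk the module tree and flip a flag on any norm
        # layer that exposes one, mirroring how xformers attention is switched on.
        for module in model.modules():
            if hasattr(module, "use_fused_norm"):
                module.use_fused_norm = enabled

Exposing the switch as a function keeps backend-specific flags out of the model config and lets callers opt in after the model is constructed.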