Summary: 4 instances, 2 unique

Text	Count
# FIXME: A hack for models pre-trained with layernorm over all the tokens not just the CLS	3
# TODO: support multiple images per gpu (only minor changes are needed)	1