Summary: 4 instances, 2 unique

| Text | Count |
|------|-------|
| # FIXME: A hack for models pre-trained with layernorm over all the tokens not just the CLS | 3 |
| # TODO: support multiple images per gpu (only minor changes are needed) | 1 |