Summary: 8 instances, 8 unique

Text                                                                          Count
# TODO: try to reduce this / make it a function of "hash_in_mem" / num_langs  1
# TODO: should we use another format ?                                        1
# TODO: better default                                                        1
# TODO find a tokenizer for those languages                                   1
# TODO: try copying models file, try READ or PARALLEL_READ                    1
# TODO use classic files directory.                                           1
# TODO: open the remote file in streaming mode.                               1
# TODO: start downloading the next segment in the background                  1
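The summary line can be read as follows: "instances" is the total number of TODO comments found, and "unique" is the number of distinct comment texts. Because every count in the table is 1, the two totals agree. The sketch below rebuilds a report of the same shape from a list of comment strings. It assumes the comments have already been extracted from the source files, which is not shown. `summarize_todos` is a hypothetical helper and not part of whatever tool produced this report.

```python
from collections import Counter


def summarize_todos(comments: list[str]) -> str:
    """Build a plain-text TODO summary (hypothetical helper, a sketch only).

    Assumes `comments` already holds the TODO comment texts, one string per
    occurrence; how they are extracted from source files is not shown.
    """
    counts = Counter(comments)
    # Text column is as wide as the longest entry (or the header), plus a gap.
    width = max([len("Text"), *map(len, counts)]) + 2
    lines = [
        # instances = every occurrence; unique = distinct comment texts
        f"Summary: {sum(counts.values())} instances, {len(counts)} unique",
        "",
        "Text".ljust(width) + "Count",
    ]
    # most_common() sorts by count; ties keep first-encountered order.
    for text, count in counts.most_common():
        lines.append(text.ljust(width) + str(count))
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize_todos(["# TODO: better default", "# TODO: better default"]))
```

If the eight distinct comments listed above are passed in, each appears once, so the first line reads "Summary: 8 instances, 8 unique".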