cc_net/jsonql.py (2 lines):
  - line 1014: # TODO: should we use another format ?
  - line 1121: # TODO: open the remote file in streaming mode.

cc_net/mine.py (2 lines):
  - line 300: # TODO: try to reduce this / make it a function of "hash_in_mem" / num_langs
  - line 415: # TODO: better default

cc_net/tokenizer.py (1 line):
  - line 56: # TODO find a tokenizer for those languages

cc_net/tools/expand_corpus.py (1 line):
  - line 48: # TODO use classic files directory.

cc_net/perplexity.py (1 line):
  - line 155: # TODO: try copying models file, try READ or PARALLEL_READ

cc_net/process_wet_file.py (1 line):
  - line 198: # TODO: start downloading the next segment in the background