The distribution of file sizes (measured in lines of code).
File | # lines | # units |
---|---|---|
jsonql.py in cc_net | 948 | 97 |
mine.py in cc_net | 464 | 20 |
dedup.py in cc_net | 360 | 24 |
perplexity.py in cc_net | 284 | 26 |
expand_corpus.py in cc_net/tools | 230 | 15 |
minify.py in cc_net | 230 | 22 |
process_wet_file.py in cc_net | 197 | 18 |
execution.py in cc_net | 172 | 9 |
flat_hash_set.py in cc_net | 160 | 28 |
text_normalizer.py in cc_net | 150 | 8 |
dl_cc_100.py in cc_net/tools | 133 | 6 |
split_by_lang.py in cc_net | 117 | 10 |
regroup.py in cc_net | 81 | 5 |
get_wiki_cirrus.py in cc_net | 74 | 6 |
make_dmoz_corpus.py in cc_net/tools | 55 | 4 |
tokenizer.py in cc_net | 55 | 6 |
setup.py in root | 43 | - |
pyproject.toml in root | 20 | - |
__main__.py in cc_net | 6 | 1 |
__init__.py in cc_net/tools | 1 | - |
__init__.py in cc_net | 1 | - |
File | # lines | # units |
---|---|---|
jsonql.py in cc_net | 948 | 97 |
flat_hash_set.py in cc_net | 160 | 28 |
perplexity.py in cc_net | 284 | 26 |
dedup.py in cc_net | 360 | 24 |
minify.py in cc_net | 230 | 22 |
mine.py in cc_net | 464 | 20 |
process_wet_file.py in cc_net | 197 | 18 |
expand_corpus.py in cc_net/tools | 230 | 15 |
split_by_lang.py in cc_net | 117 | 10 |
execution.py in cc_net | 172 | 9 |
text_normalizer.py in cc_net | 150 | 8 |
get_wiki_cirrus.py in cc_net | 74 | 6 |
dl_cc_100.py in cc_net/tools | 133 | 6 |
tokenizer.py in cc_net | 55 | 6 |
regroup.py in cc_net | 81 | 5 |
make_dmoz_corpus.py in cc_net/tools | 55 | 4 |
__main__.py in cc_net | 6 | 1 |
There are 2 files with lines longer than 120 characters. In total, there are 2 long lines.
File | # lines | # units | # long lines |
---|---|---|---|
dedup.py in cc_net | 360 | 24 | 1 |
process_wet_file.py in cc_net | 197 | 18 | 1 |
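
The long-line count above could be reproduced roughly as follows. This is a minimal sketch, not the report generator's actual method: the helper names `count_long_lines` and `long_lines_per_file` are hypothetical, and it assumes that a line's length is its character count without the trailing newline and that every line (including comments and blank lines) is considered. The real tool may count differently.

```python
from pathlib import Path

MAX_LINE_LENGTH = 120  # threshold stated in the report


def count_long_lines(text: str, limit: int = MAX_LINE_LENGTH) -> int:
    """Return how many lines in `text` are longer than `limit` characters."""
    # splitlines() drops the line terminators, so they do not count toward the length.
    return sum(1 for line in text.splitlines() if len(line) > limit)


def long_lines_per_file(root: str, pattern: str = "*.py") -> dict:
    """Map each matching file under `root` to its long-line count.

    Files without any long line are left out, mirroring the table above.
    """
    result = {}
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        n = count_long_lines(text)
        if n:
            result[str(path)] = n
    return result
```

Calling `long_lines_per_file("cc_net")` from a checkout of the repository would list the files that exceed the limit; whether the result matches the table depends on the assumptions stated above.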