huggingface / tokenizers
Duplication

Places in the code where 6 or more consecutive lines are exactly the same.
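The definition above can be sketched as a sliding window over each file's lines. This is a minimal sketch, not the tool that produced this report. It assumes lines are compared exactly as written, and the function name `find_duplicate_windows` and its input format (a dict mapping a path to file text) are hypothetical.

```python
from collections import defaultdict


def find_duplicate_windows(files, min_lines=6):
    """Map every run of `min_lines` consecutive, exactly equal lines that
    occurs more than once to its (path, first_line) locations, 1-indexed.

    Hypothetical helper for illustration; `files` maps path -> file text.
    """
    locations = defaultdict(list)
    for path, text in files.items():
        lines = text.splitlines()
        # Slide a fixed-size window over the file and record where it starts.
        for start in range(len(lines) - min_lines + 1):
            window = tuple(lines[start:start + min_lines])
            locations[window].append((path, start + 1))
    # Keep only windows that appear in two or more places.
    return {w: locs for w, locs in locations.items() if len(locs) > 1}
```

This reports overlapping fixed-size windows. A real detector would also merge adjacent matching windows into one maximal block, which is how entries such as "58 x 2" below describe a single 58-line duplicate rather than many 6-line ones.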

Duplication Overall
System: 17% (4,475 lines)
Duplication per Extension
rs: 17% (3,986 lines)
py: 19% (336 lines)
pyi: 21% (133 lines)
css: 14% (20 lines)
Duplication per Component (primary)
tokenizers: 20% (2,979 lines)
bindings: 14% (1,496 lines)
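Both breakdowns partition the same duplicated lines, so their line counts should add up to the system-wide figure. The percentages do not add up because each one appears to be relative to its own group's size. A quick check using only the numbers reported above:

```python
# Duplicated-line counts copied from the report above.
system_total = 4_475
per_extension = {"rs": 3_986, "py": 336, "pyi": 133, "css": 20}
per_component = {"tokenizers": 2_979, "bindings": 1_496}

# Each breakdown should sum to the system-wide duplicated-line count.
for name, breakdown in [("extension", per_extension), ("component", per_component)]:
    total = sum(breakdown.values())
    print(f"{name}: {total} lines, matches system total: {total == system_total}")
```

Both checks print True: 3,986 + 336 + 133 + 20 and 2,979 + 1,496 each equal 4,475.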
Longest Duplicates
893 duplicates were found in total; the 50 longest are listed below.
Size (lines) x # (copies) | Folder and lines of each copy
58 x 2 | tokenizers/src/tokenizer 1989:2049 (3%) | tokenizers/src/tokenizer 2051:2111 (3%)
40 x 2 | tokenizers/src/tokenizer 1632:1673 (2%) | tokenizers/src/tokenizer 1675:1716 (2%)
37 x 2 | tokenizers/src/pre_tokenizers 510:547 (7%) | tokenizers/src/processors 126:163 (26%)
35 x 2 | tokenizers/src/processors 56:95 (14%) | tokenizers/src/processors 86:125 (12%)
31 x 2 | tokenizers/src/processors 97:130 (12%) | tokenizers/src/processors 127:160 (10%)
28 x 2 | bindings/python/src 152:179 (3%) | bindings/python/src 372:399 (3%)
26 x 2 | bindings/python/src 320:348 (3%) | bindings/python/src 527:555 (3%)
23 x 2 | tokenizers/src/processors 172:200 (9%) | tokenizers/src/processors 213:241 (8%)
22 x 2 | tokenizers/src/models/unigram 635:662 (3%) | tokenizers/src/models/wordlevel 104:131 (16%)
22 x 2 | tokenizers/src/tokenizer 1996:2018 (1%) | tokenizers/src/tokenizer 2120:2142 (1%)
22 x 2 | tokenizers/src/tokenizer 2058:2080 (1%) | tokenizers/src/tokenizer 2120:2142 (1%)
21 x 2 | tokenizers/src/processors 254:274 (8%) | tokenizers/src/processors 930:950 (2%)
20 x 2 | bindings/python/src/utils 427:453 (5%) | bindings/python/src/utils 467:493 (5%)
20 x 2 | tokenizers/src/pre_tokenizers 28:47 (28%) | tokenizers/src/pre_tokenizers 61:80 (33%)
18 x 2 | tokenizers/src/tokenizer 2062:2080 (<1%) | tokenizers/src/tokenizer 2192:2210 (<1%)
18 x 2 | tokenizers/src/tokenizer 1639:1657 (<1%) | tokenizers/src/tokenizer 1725:1744 (<1%)
18 x 2 | tokenizers/src/tokenizer 2000:2018 (<1%) | tokenizers/src/tokenizer 2192:2210 (<1%)
18 x 2 | bindings/python/src 237:257 (2%) | bindings/python/src 789:809 (2%)
18 x 2 | bindings/python/src 238:258 (2%) | bindings/python/src 452:472 (2%)
18 x 2 | tokenizers/src/tokenizer 2124:2142 (<1%) | tokenizers/src/tokenizer 2192:2210 (<1%)
18 x 2 | bindings/node/src 96:115 (12%) | bindings/node/src 161:180 (9%)
18 x 2 | tokenizers/src/tokenizer 1682:1700 (<1%) | tokenizers/src/tokenizer 1725:1744 (<1%)
18 x 2 | tokenizers/src/processors 1048:1065 (2%) | tokenizers/src/processors 1099:1116 (2%)
18 x 2 | bindings/python/scripts 119:139 (5%) | bindings/python/scripts 254:274 (5%)
17 x 2 | bindings/python/src 643:660 (2%) | bindings/python/src 789:806 (2%)
17 x 2 | bindings/python/src 452:471 (2%) | bindings/python/src 790:809 (2%)
17 x 2 | bindings/python/src/utils 399:421 (4%) | bindings/python/src/utils 431:453 (4%)
17 x 2 | bindings/python/src/utils 399:421 (4%) | bindings/python/src/utils 471:493 (4%)
17 x 2 | bindings/python/src 237:254 (2%) | bindings/python/src 643:660 (2%)
17 x 2 | tokenizers/src/pre_tokenizers 488:504 (3%) | tokenizers/src/processors 101:117 (12%)
17 x 2 | tokenizers/src/tokenizer 1390:1408 (1%) | tokenizers/src/tokenizer 1443:1461 (1%)
17 x 2 | bindings/python/src 1038:1055 (1%) | bindings/python/src 1097:1114 (1%)
17 x 2 | tokenizers/src/processors 230:246 (6%) | tokenizers/src/processors 908:924 (2%)
16 x 2 | bindings/python/src 329:346 (2%) | bindings/python/src 683:700 (2%)
16 x 2 | tokenizers/src/processors 237:252 (6%) | tokenizers/src/processors 280:295 (5%)
16 x 2 | tokenizers/src/processors 903:918 (2%) | tokenizers/src/processors 981:996 (2%)
16 x 2 | bindings/python/src 452:468 (2%) | bindings/python/src 644:660 (2%)
16 x 2 | bindings/python/src 536:553 (2%) | bindings/python/src 683:700 (2%)
15 x 2 | tokenizers/src/models/bpe 623:642 (2%) | tokenizers/src/models/wordlevel 111:130 (11%)
15 x 2 | tokenizers/src/models/bpe 623:642 (2%) | tokenizers/src/models/unigram 642:661 (2%)
15 x 2 | tokenizers/src/processors 135:152 (6%) | tokenizers/src/processors 162:179 (6%)
15 x 2 | tokenizers/src/processors 1069:1083 (1%) | tokenizers/src/processors 1101:1115 (1%)
15 x 2 | tokenizers/src/tokenizer 1523:1539 (<1%) | tokenizers/src/tokenizer 1642:1657 (<1%)
15 x 2 | bindings/python/src 329:345 (2%) | bindings/python/src 856:872 (2%)
15 x 2 | tokenizers/src/tokenizer 1523:1539 (<1%) | tokenizers/src/tokenizer 1685:1700 (<1%)
15 x 2 | tokenizers/src/processors 213:228 (6%) | tokenizers/src/processors 891:906 (1%)
15 x 2 | tokenizers/src/tokenizer 2120:2134 (<1%) | tokenizers/src/tokenizer 2239:2253 (<1%)
15 x 2 | tokenizers/src/tokenizer 1523:1539 (<1%) | tokenizers/src/tokenizer 1728:1744 (<1%)
15 x 2 | tokenizers/src/tokenizer 2058:2072 (<1%) | tokenizers/src/tokenizer 2239:2253 (<1%)
15 x 2 | bindings/python/src 683:699 (2%) | bindings/python/src 856:872 (2%)
Duplicated Units
All 4 duplicated units found are listed below.
Size (lines) x # (copies) | Folder and lines of each copy
27 x 2 | tokenizers/src/models/unigram 628:657 | tokenizers/src/models/wordlevel 97:126
11 x 2 | tokenizers/src/models/unigram 86:98 | tokenizers/src/models/bpe 223:235
6 x 3 | bindings/python/src 551:558 | bindings/python/src 840:847 | bindings/python/src 658:665
6 x 2 | bindings/python/src 851:858 | bindings/python/src 669:676
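Unit-level duplicates, such as the "6 x 3" entry, can be sketched by grouping units whose normalized bodies hash to the same value. This is a minimal sketch under stated assumptions, not how this report defines a duplicated unit. It assumes a unit's source text is already extracted, that indentation and blank lines are ignored, and that the 6-line threshold from the top of the report applies. `find_duplicate_units` is a hypothetical name.

```python
import hashlib
from collections import defaultdict


def find_duplicate_units(units, min_lines=6):
    """Group unit names whose bodies are identical after normalization.

    Hypothetical helper: `units` maps a unit name to its source text.
    Returns one list of names per body shared by two or more units.
    """
    by_digest = defaultdict(list)
    for name, body in units.items():
        # Normalize: drop indentation and blank lines (an assumption).
        lines = [line.strip() for line in body.splitlines() if line.strip()]
        if len(lines) < min_lines:
            continue  # too short to count as a duplicate
        digest = hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
        by_digest[digest].append(name)
    return [names for names in by_digest.values() if len(names) > 1]
```

Hashing keeps memory flat when many units are compared. A group with three names corresponds to an "x 3" entry in the table.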