huggingface / datatrove
Duplication

Places in the code where 6 or more consecutive lines are exactly the same.
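
The report does not say how the tool detects these blocks. The following is a minimal sketch of one common approach, not the tool's own method. It slides a 6-line window over every file and groups windows whose text is identical. It assumes that lines are compared after stripping surrounding whitespace and that blank lines are skipped. The names `normalize` and `find_duplicate_windows` are hypothetical. Unlike the report, it does not merge overlapping windows into a single longest duplicate.

```python
from collections import defaultdict

MIN_LINES = 6  # the report's threshold: 6 or more identical lines


def normalize(source: str) -> list[tuple[int, str]]:
    """Return (original_line_number, stripped_text) for each non-blank line.

    Assumption: whitespace-only differences and blank lines are ignored.
    """
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if line.strip()
    ]


def find_duplicate_windows(files: dict[str, str], min_lines: int = MIN_LINES):
    """Group every run of `min_lines` identical normalized lines by its text.

    Returns {window_text: [(filename, first_line, last_line), ...]},
    keeping only windows that occur at least twice (in one or more files).
    """
    occurrences = defaultdict(list)
    for name, source in files.items():
        lines = normalize(source)
        # Slide a fixed-size window over the normalized lines of this file.
        for start in range(len(lines) - min_lines + 1):
            window = lines[start:start + min_lines]
            key = tuple(text for _, text in window)
            occurrences[key].append((name, window[0][0], window[-1][0]))
    return {key: locs for key, locs in occurrences.items() if len(locs) > 1}


if __name__ == "__main__":
    shared = "\n".join(f"x{i} = {i}" for i in range(7))
    files = {"a.py": shared + "\nprint('a')\n", "b.py": "import os\n" + shared + "\n"}
    for places in find_duplicate_windows(files).values():
        print(places)
```

In the usage example the two files share seven lines, so two overlapping 6-line windows are reported. A fuller tool would extend such overlapping windows into one 7-line duplicate, which is what the "Size" column below reflects.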

Duplication Overall
system: 7% (865 lines)
Duplication per Extension
py: 4% (509 lines)
rs: 53% (356 lines)
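
The per-extension counts add up to the system total (509 + 356 = 865), so every duplicated line is in either Python or Rust. Suppose each percentage is the duplicated share of that extension's code, rounded to a whole percent. That is an assumption, because the report does not define it. Under that assumption the rounded figures also bound how large each code base is. The small sketch below does the arithmetic. `implied_size_range` is a hypothetical helper.

```python
# Figures copied from the "Duplication per Extension" section.
duplicated_lines = {"py": 509, "rs": 356}
duplicated_percent = {"py": 4, "rs": 53}  # whole-percent values as reported

# The two extensions together account for every duplicated line in the system.
total_duplicated = sum(duplicated_lines.values())
print(f"total duplicated lines: {total_duplicated}")


def implied_size_range(lines: int, percent: int) -> tuple[float, float]:
    """Range of total line counts consistent with a rounded percentage.

    Assumes percent = duplicated lines / total lines * 100, rounded to the
    nearest whole number (an assumption; the report does not define it).
    """
    low = lines / ((percent + 0.5) / 100)
    high = lines / ((percent - 0.5) / 100)
    return low, high


for ext, lines in duplicated_lines.items():
    low, high = implied_size_range(lines, duplicated_percent[ext])
    print(f"{ext}: roughly {low:.0f} to {high:.0f} lines in total")
```

Under this assumption, the Rust code is small (roughly 665 to 678 lines). That would explain why only 356 duplicated lines give it a 53% ratio, while 509 duplicated Python lines amount to just 4%.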
Duplication per Component (primary)
src: 7% (865 lines)
ROOT: 0% (0 lines)
Longest Duplicates
The 50 longest of the 61 duplicates found.
Size  Copies  Folder (both copies)              Lines, copy 1   Lines, copy 2
51    2       src/datatrove/tools/fast_mh3/src  298:367 (21%)   490:559 (11%)
33    2       src/datatrove/pipeline/readers    34:66 (71%)     35:67 (64%)
25    2       src/datatrove/pipeline/readers    40:64 (42%)     42:66 (49%)
25    2       src/datatrove/pipeline/readers    42:66 (49%)     43:67 (47%)
25    2       src/datatrove/pipeline/readers    40:64 (42%)     43:67 (47%)
25    2       src/datatrove/pipeline/readers    41:65 (54%)     40:64 (42%)
25    2       src/datatrove/pipeline/readers    41:65 (54%)     43:67 (47%)
23    2       src/datatrove/tools/fast_mh3/src  167:192 (9%)    352:378 (5%)
22    2       src/datatrove/executor            85:111 (25%)    228:254 (15%)
20    2       src/datatrove/tools/fast_mh3/src  373:394 (4%)    456:477 (4%)
20    2       src/datatrove/pipeline/dedup      246:267 (5%)    248:269 (7%)
18    2       src/datatrove/tools/fast_mh3/src  111:131 (7%)    283:303 (4%)
17    2       src/datatrove/tools/fast_mh3/src  223:247 (7%)    413:437 (3%)
17    2       src/datatrove/pipeline/readers    34:50 (36%)     38:54 (19%)
17    2       src/datatrove/pipeline/readers    35:51 (33%)     38:54 (19%)
15    2       src/datatrove/pipeline/readers    50:64 (25%)     56:70 (17%)
15    2       src/datatrove/pipeline/readers    52:66 (29%)     56:70 (17%)
15    2       src/datatrove/pipeline/readers    51:65 (32%)     56:70 (17%)
15    2       src/datatrove/pipeline/readers    53:67 (28%)     56:70 (17%)
14    2       src/datatrove/tools/fast_mh3/src  42:59 (5%)      83:100 (3%)
12    2       src/datatrove/pipeline/readers    139:150 (9%)    39:50 (26%)
12    2       src/datatrove/tools/fast_mh3/src  305:320 (5%)    324:339 (5%)
12    2       src/datatrove/pipeline/readers    139:150 (9%)    40:51 (23%)
12    2       src/datatrove/pipeline/readers    139:150 (9%)    43:54 (13%)
12    2       src/datatrove/tools/fast_mh3/src  305:320 (5%)    516:531 (2%)
12    2       src/datatrove/tools/fast_mh3/src  324:339 (5%)    497:512 (2%)
12    2       src/datatrove/tools/fast_mh3/src  497:512 (2%)    516:531 (2%)
11    2       src/datatrove/pipeline/dedup      218:230 (3%)    214:226 (3%)
11    2       src/datatrove/pipeline/dedup      146:170 (3%)    145:158 (3%)
10    2       src/datatrove/pipeline/readers    141:150 (7%)    43:52 (18%)
10    2       src/datatrove/pipeline/dedup      235:244 (2%)    237:246 (3%)
10    2       src/datatrove/pipeline/readers    43:52 (18%)     45:54 (11%)
10    2       src/datatrove/pipeline/readers    141:150 (7%)    40:49 (16%)
10    2       src/datatrove/pipeline/stats      38:47 (22%)     38:47 (25%)
10    2       src/datatrove/tools/fast_mh3/src  88:102 (4%)     258:272 (2%)
10    2       src/datatrove/pipeline/readers    40:49 (16%)     45:54 (11%)
10    2       src/datatrove/tools/fast_mh3/src  22:38 (4%)      40:56 (2%)
9     2       src/datatrove/pipeline/readers    88:96 (15%)     79:87 (16%)
9     2       src/datatrove/pipeline/dedup      494:503 (2%)    406:416 (3%)
9     2       src/datatrove/pipeline/tokens     169:177 (6%)    186:194 (6%)
9     2       src/datatrove/tools/fast_mh3/src  259:268 (3%)    452:461 (2%)
8     2       src/datatrove/pipeline/stats      40:47 (17%)     47:54 (16%)
8     2       src/datatrove/pipeline/stats      40:47 (20%)     47:54 (16%)
7     2       src/datatrove/pipeline/dedup      472:478 (2%)    384:390 (2%)
7     2       src/datatrove/pipeline/dedup      386:394 (2%)    321:329 (2%)
7     2       src/datatrove/pipeline/readers    31:38 (11%)     33:40 (13%)
7     2       src/datatrove/pipeline/tokens     202:209 (2%)    235:241 (2%)
6     2       src/datatrove/pipeline/readers    135:140 (4%)    33:38 (10%)
6     2       src/datatrove/tools/fast_mh3/src  187:192 (2%)    263:268 (2%)
6     2       src/datatrove/tools/fast_mh3/src  263:268 (2%)    373:378 (1%)
Duplicated Units
1 duplicated unit was found:
Size  Copies  Folder (both copies)              Lines, copy 1   Lines, copy 2
6     2       src/datatrove/tools/fast_mh3/src  95:102          54:61