huggingface / fineweb-2
Duplication

Places in code with 6 or more lines that are exactly the same.
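
As an illustration of the definition above, here is a minimal sketch of how blocks of 6 or more identical lines could be detected with a sliding window. It is not the analysis tool's actual algorithm: the function name `find_duplicate_blocks`, the whitespace stripping, and the skipping of blank lines are all assumptions made for this example.

```python
from collections import defaultdict

MIN_LINES = 6  # threshold quoted in the report


def find_duplicate_blocks(files, min_lines=MIN_LINES):
    """Return windows of `min_lines` identical consecutive lines seen in 2+ places.

    `files` maps a file name to its source text. Lines are stripped and blank
    lines are skipped before comparison (an assumption, not the tool's rule).
    """
    seen = defaultdict(list)  # window text -> [(file name, 1-based start line)]
    for name, text in files.items():
        # Keep the original line numbers alongside the normalized content.
        lines = [
            (number, line.strip())
            for number, line in enumerate(text.splitlines(), 1)
            if line.strip()
        ]
        for start in range(len(lines) - min_lines + 1):
            window = lines[start:start + min_lines]
            key = "\n".join(content for _, content in window)
            seen[key].append((name, window[0][0]))
    # A window counts as a duplicate only if it occurs in more than one place.
    return {key: places for key, places in seen.items() if len(places) > 1}
```

Calling it with a dictionary such as `{"a.py": text_a, "b.py": text_b}` returns each repeated window together with the files and starting lines where it occurs. Longer duplicates show up as several overlapping windows, which a real tool would merge into one entry.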

Duplication Overall
system: 24% (441 lines)
Duplication per Extension
py: 24% (441 lines)
Duplication per Component (primary)
misc: 35% (409 lines)
ablations: 7% (32 lines)
ROOT: 0% (0 lines)
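
The per-component figures can be cross-checked against the system total with a few lines of arithmetic. The sketch below only uses the counts quoted in this report; the variable names are made up for the example.

```python
# Duplicated line counts per component, copied from the report.
component_dup_lines = {"misc": 409, "ablations": 32, "ROOT": 0}
system_dup_lines = 441

# The components should add up to the system-wide count.
total = sum(component_dup_lines.values())
assert total == system_dup_lines

# Share of all duplicated lines contributed by each component.
share = {name: count / total for name, count in component_dup_lines.items()}
for name, fraction in share.items():
    print(f"{name}: {fraction:.1%} of duplicated lines")
```

Nearly all of the duplicated lines sit in the misc component, which matches the folders listed under the longest duplicates.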
Longest Duplicates
The list of 18 longest duplicates.
| Size | # | Folder (instance 1) | Lines (instance 1) | Folder (instance 2) | Lines (instance 2) |
|---|---|---|---|---|---|
| 54 | x 2 | misc/reference_datasets/monolingual/ar | 8:67 (79%) | misc/reference_datasets/monolingual/fr | 7:66 (78%) |
| 34 | x 2 | misc/reference_datasets/monolingual/zh | 46:89 (26%) | misc/reference_datasets/multilingual | 56:98 (32%) |
| 31 | x 2 | misc/reference_datasets/monolingual/zh | 10:44 (24%) | misc/reference_datasets/multilingual | 9:43 (29%) |
| 15 | x 2 | misc/reference_datasets/multilingual | 96:114 (15%) | misc/reference_datasets/multilingual | 115:132 (14%) |
| 14 | x 2 | misc/reference_datasets/monolingual/zh | 90:103 (10%) | misc/reference_datasets/multilingual | 100:113 (13%) |
| 12 | x 2 | misc/reference_datasets/monolingual/ar | 33:44 (17%) | misc/reference_datasets/multilingual | 49:60 (12%) |
| 12 | x 2 | misc/reference_datasets/monolingual/fr | 32:43 (17%) | misc/reference_datasets/multilingual | 49:60 (12%) |
| 11 | x 2 | misc/precision_filtering | 7:27 (27%) | misc/precision_filtering | 66:86 (21%) |
| 10 | x 2 | misc/reference_datasets/monolingual/hi | 11:20 (55%) | misc/reference_datasets/monolingual/te | 11:20 (31%) |
| 9 | x 2 | misc/precision_filtering | 37:45 (14%) | misc/precision_filtering | 41:49 (23%) |
| 9 | x 2 | ablations/evaluation | 49:66 (7%) | ablations/evaluation | 11:28 (20%) |
| 8 | x 2 | misc/precision_filtering | 18:25 (8%) | misc/precision_filtering | 42:49 (21%) |
| 8 | x 2 | misc/precision_filtering | 18:25 (8%) | misc/precision_filtering | 38:45 (12%) |
| 7 | x 2 | misc/reference_datasets/monolingual/ar | 13:19 (46%) | misc/reference_datasets/monolingual/ar | 75:81 (10%) |
| 7 | x 2 | ablations/evaluation | 189:269 (5%) | ablations/evaluation | 58:106 (15%) |
| 7 | x 2 | misc/precision_filtering | 4:17 (10%) | misc/precision_filtering | 66:77 (13%) |
| 7 | x 2 | misc/precision_filtering | 7:18 (17%) | misc/precision_filtering | 4:17 (10%) |
| 6 | x 2 | misc/reference_datasets/monolingual/te | 36:41 (18%) | misc/reference_datasets/monolingual/th | 14:19 (46%) |
Duplicated Units
The single duplicated unit found.
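| Size | # | Folder (instance 1) | Lines (instance 1) | Folder (instance 2) | Lines (instance 2) |
|---|---|---|---|---|---|
| 15 | x 2 | misc/reference_datasets/monolingual/ar | 0:0 | misc/reference_datasets/monolingual/fr | 0:0 |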