facebookresearch / bio-lm
Duplication

Places in code with 6 or more lines that are exactly the same.

Intro
  • For duplication, we look at places in code where there are 6 or more lines of code that are exactly the same.
  • Before duplication is calculated, the code is cleaned to remove empty lines, comments, and frequently duplicated constructs such as imports.
  • Aim to keep duplicated code as low as possible (below 5%), because a high level of duplication can lead to maintenance difficulties, poor factoring, and logical contradictions.
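The procedure described above (clean the code, then look for runs of 6 or more identical lines) can be sketched as follows. This is a minimal illustration and not the report tool's actual algorithm: the names `clean_lines` and `find_duplicates` are hypothetical, the cleaning rules are simplified to Python-style comments and imports, and "exactly the same" is assumed to mean equal after stripping surrounding whitespace.

```python
from typing import List, Tuple

MIN_LINES = 6  # threshold stated in the report

# (original line number, stripped text)
CleanLine = Tuple[int, str]


def clean_lines(source: str) -> List[CleanLine]:
    """Drop empty lines, '#' comments and import statements (simplified rules)."""
    cleaned = []
    for number, raw in enumerate(source.splitlines(), start=1):
        text = raw.strip()
        if not text or text.startswith("#"):
            continue
        if text.startswith("import ") or text.startswith("from "):
            continue
        cleaned.append((number, text))
    return cleaned


def find_duplicates(a: List[CleanLine], b: List[CleanLine],
                    min_lines: int = MIN_LINES) -> List[Tuple[int, int, int, int, int]]:
    """Return maximal runs of identical cleaned lines shared by a and b.

    Each result is (first_line_a, last_line_a, first_line_b, last_line_b, size),
    where the line numbers refer to the original, uncleaned files.
    """
    n, m = len(a), len(b)
    # run[i][j] = length of the matching run that ends at a[i-1] and b[j-1]
    run = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1][1] == b[j - 1][1]:
                run[i][j] = run[i - 1][j - 1] + 1

    found = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            size = run[i][j]
            # A run is maximal when the next pair of lines does not match.
            extends = i < n and j < m and a[i][1] == b[j][1]
            if size >= min_lines and not extends:
                found.append((a[i - size][0], a[i - 1][0],
                              b[j - size][0], b[j - 1][0], size))
    return found
```

Calling `find_duplicates(clean_lines(text_a), clean_lines(text_b))` on the contents of two files yields line ranges in the same spirit as the "Lines" column of the tables in this report. The sketch uses a quadratic table for clarity, and comparing a file with itself would need an extra rule to skip the trivial match of every line with itself.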
Duplication Overall
  • 39% duplication:
    • 2,484 lines of cleaned code (without empty lines, comments, and frequently duplicated constructs such as imports)
    • 973 duplicated lines
  • 46 duplicates
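The overall figure follows from the two line counts above. A small sketch of the arithmetic, where the function name `duplication_percent` and the `TARGET_PERCENT` constant are illustrative names and the 5% value is the guideline quoted in the intro:

```python
CLEANED_LINES = 2484     # lines of cleaned code reported above
DUPLICATED_LINES = 973   # duplicated lines reported above
TARGET_PERCENT = 5.0     # guideline from the intro: keep duplication below 5%


def duplication_percent(duplicated: int, cleaned: int) -> float:
    """Share of cleaned lines that are part of a duplicate, in percent."""
    return 100.0 * duplicated / cleaned


overall = duplication_percent(DUPLICATED_LINES, CLEANED_LINES)
meets_target = overall < TARGET_PERCENT

print(f"{overall:.1f}% duplicated")          # 39.2% duplicated
print(f"below 5% target: {meets_target}")    # below 5% target: False
```

Rounded to a whole number this gives the 39% shown in the report, well above the suggested 5% ceiling.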
system: 39% (973 lines)
Duplication per Extension
py: 39% (973 lines)
Duplication per Component (primary)
biolm: 37% (690 lines)
preprocessing: 44% (283 lines)
Longest Duplicates
The list of 20 longest duplicates.
Size  #    Folder         File                         Lines
61    x 2  preprocessing  preprocess_i2b2_2012_ner.py  98:171 (29%)
           preprocessing  preprocess_i2b2_2014_ner.py  88:161 (29%)
38    x 2  biolm          run_classification.py        591:639 (6%)
           biolm          run_sequence_labelling.py    511:559 (7%)
34    x 2  biolm          run_classification.py        71:117 (5%)
           biolm          run_sequence_labelling.py    75:120 (6%)
32    x 2  biolm          utils_classification.py      332:376 (5%)
           biolm          utils_classification.py      382:426 (5%)
29    x 2  preprocessing  preprocess_i2b2_2012_ner.py  192:229 (13%)
           preprocessing  preprocess_i2b2_2014_ner.py  179:216 (13%)
28    x 2  biolm          run_classification.py        119:153 (4%)
           biolm          run_sequence_labelling.py    122:156 (5%)
22    x 2  biolm          run_classification.py        495:516 (3%)
           biolm          run_sequence_labelling.py    415:436 (4%)
18    x 2  biolm          run_classification.py        568:586 (3%)
           biolm          run_sequence_labelling.py    491:509 (3%)
17    x 2  biolm          run_classification.py        543:560 (2%)
           biolm          run_sequence_labelling.py    469:486 (3%)
16    x 2  biolm          run_classification.py        462:477 (2%)
           biolm          run_sequence_labelling.py    383:398 (3%)
14    x 2  biolm          utils_classification.py      287:308 (2%)
           biolm          utils_classification.py      434:455 (2%)
13    x 2  biolm          run_classification.py        188:203 (2%)
           biolm          run_sequence_labelling.py    177:193 (2%)
12    x 2  biolm          utils_classification.py      184:201 (1%)
           biolm          utils_classification.py      287:304 (1%)
12    x 2  biolm          run_classification.py        530:541 (2%)
           biolm          run_sequence_labelling.py    456:467 (2%)
12    x 2  biolm          utils_classification.py      184:201 (1%)
           biolm          utils_classification.py      434:451 (1%)
11    x 2  preprocessing  preprocess_i2b2_2012_ner.py  240:252 (5%)
           preprocessing  preprocess_i2b2_2014_ner.py  227:239 (5%)
11    x 2  preprocessing  preprocess_i2b2_2012_ner.py  17:27 (5%)
           preprocessing  preprocess_i2b2_2014_ner.py  17:27 (5%)
9     x 2  preprocessing  preprocess_i2b2_2010_ner.py  216:225 (5%)
           preprocessing  preprocess_i2b2_2012_ner.py  263:271 (4%)
9     x 2  preprocessing  preprocess_i2b2_2012_ner.py  64:73 (4%)
           preprocessing  preprocess_i2b2_2014_ner.py  59:68 (4%)
9     x 2  biolm          run_classification.py        483:493 (1%)
           biolm          run_sequence_labelling.py    397:407 (1%)
Duplicated Units
The list of top 3 duplicated units.
Size  #    Folder         File                         Lines
60    x 2  preprocessing  preprocess_i2b2_2012_ner.py  0:0
           preprocessing  preprocess_i2b2_2014_ner.py  0:0
14    x 2  biolm          utils_classification.py      0:0
           biolm          utils_classification.py      0:0
7     x 2  biolm          utils_classification.py      0:0
           biolm          utils_classification.py      0:0