facebookresearch / CodeGen
Duplication

Places in code with 6 or more lines that are exactly the same.

Intro
  • For duplication, we look at places in code where there are 6 or more lines of code that are exactly the same.
  • Before duplication is calculated, the code is cleaned to remove empty lines, comments, and frequently duplicated constructs such as imports.
  • Aim to keep duplicated code as low as possible (below 5%); a high level of duplication can lead to maintenance difficulties, poor factoring, and logical contradictions.
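The line-based check described above can be sketched in a few lines of Python. This is a minimal sketch, not the analyzer's actual implementation. It assumes a Python-only cleaning step: blank lines, full-line `#` comments, and `import`/`from` lines are dropped, and the remaining lines are compared after stripping surrounding whitespace. The helper names `clean_lines`, `duplicated_windows` and `duplicated_line_count` are hypothetical.

```python
from collections import defaultdict

MIN_BLOCK = 6  # the report's threshold: 6 or more identical lines


def clean_lines(source: str) -> list[str]:
    """Assumed cleaning rules (Python only): drop blanks, full-line comments, imports."""
    cleaned = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(("import ", "from ")):
            continue
        cleaned.append(line)
    return cleaned


def duplicated_windows(files: dict[str, str], size: int = MIN_BLOCK) -> dict:
    """Map each run of `size` cleaned lines that occurs more than once to its (file, index) locations."""
    seen = defaultdict(list)
    for name, source in files.items():
        lines = clean_lines(source)
        for i in range(len(lines) - size + 1):
            seen[tuple(lines[i:i + size])].append((name, i))
    return {window: locs for window, locs in seen.items() if len(locs) > 1}


def duplicated_line_count(files: dict[str, str], size: int = MIN_BLOCK) -> int:
    """Count cleaned lines covered by at least one duplicated window."""
    covered = set()
    for locs in duplicated_windows(files, size).values():
        for name, start in locs:
            covered.update((name, start + k) for k in range(size))
    return len(covered)


# Usage: two files sharing the same 8 cleaned lines.
body = "\n".join(f"x{i} = {i}" for i in range(8))
files = {"a.py": "import os\n" + body, "b.py": "# copy\n" + body}
print(duplicated_line_count(files))  # 16
```

Overlapping 6-line windows are merged through the `covered` set, so each duplicated line is counted once.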
Duplication Overall
  • 14% duplication:
    • 74,756 lines of cleaned code (without empty lines, comments, and frequently duplicated constructs such as imports)
    • 10,676 duplicated lines
  • 159,032 duplicates
  • system: 14% (10,676 lines)
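The overall figure appears to be the duplicated-line count divided by the cleaned-line count. The sketch below shows that arithmetic and assumes this is how the percentage is derived. With the report's numbers it reproduces the stated 14%. `duplication_percentage` and `TARGET_PERCENT` are hypothetical names.

```python
def duplication_percentage(duplicated_lines: int, cleaned_lines: int) -> float:
    """Percentage of cleaned lines that belong to at least one duplicated block."""
    if cleaned_lines <= 0:
        raise ValueError("cleaned_lines must be positive")
    return 100.0 * duplicated_lines / cleaned_lines


TARGET_PERCENT = 5.0  # the "<5%" guideline from the intro

overall = duplication_percentage(10_676, 74_756)
print(f"{overall:.1f}% (target < {TARGET_PERCENT:.0f}%)")  # 14.3% (target < 5%)
```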
Duplication per Extension
  • py: 24% (8,343 lines)
  • java: 5% (1,457 lines)
  • cpp: 6% (822 lines)
  • html: 11% (54 lines)
Duplication per Component (primary)
  • data/transcoder_evaluation_gfg/python: 27% (5,332 lines)
  • data/transcoder_evaluation_gfg/java: 5% (1,457 lines)
  • data/transcoder_evaluation_gfg/cpp: 6% (822 lines)
  • codegen_sources/model/src: 10% (703 lines)
  • CodeXGLUE/Code-Text/code-to-text: 57% (497 lines)
  • CodeXGLUE/Text-Code/text-to-code: 48% (400 lines)
  • codegen_sources/preprocessing/dataset_modes: 32% (359 lines)
  • CodeXGLUE/Code-Code/Clone-detection-BigCloneBench: 56% (311 lines)
  • CodeXGLUE/Text-Code/NL-code-search-Adv: 56% (302 lines)
  • codegen_sources/model: 17% (201 lines)
  • codegen_sources/wrappers: 28% (161 lines)
  • CodeXGLUE: 11% (54 lines)
  • codegen_sources/preprocessing/lang_processors: 4% (40 lines)
  • codegen_sources/preprocessing/bpe_modes: 10% (16 lines)
  • codegen_sources/preprocessing/obfuscation: 1% (14 lines)
  • codegen_sources/preprocessing: 1% (7 lines)
  • codegen_sources: 0% (0 lines)
  • codegen_sources/model/tools: 0% (0 lines)
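The two breakdowns can be cross-checked against the overall total with a short script. The counts below are transcribed from the lists above. `share_of_total` is a hypothetical helper, and its prefix match assumes that each component is identified by its folder path.

```python
# Duplicated-line counts transcribed from the report.
TOTAL_DUPLICATED = 10_676

by_extension = {"py": 8_343, "java": 1_457, "cpp": 822, "html": 54}

by_component = {
    "data/transcoder_evaluation_gfg/python": 5_332,
    "data/transcoder_evaluation_gfg/java": 1_457,
    "data/transcoder_evaluation_gfg/cpp": 822,
    "codegen_sources/model/src": 703,
    "CodeXGLUE/Code-Text/code-to-text": 497,
    "CodeXGLUE/Text-Code/text-to-code": 400,
    "codegen_sources/preprocessing/dataset_modes": 359,
    "CodeXGLUE/Code-Code/Clone-detection-BigCloneBench": 311,
    "CodeXGLUE/Text-Code/NL-code-search-Adv": 302,
    "codegen_sources/model": 201,
    "codegen_sources/wrappers": 161,
    "CodeXGLUE": 54,
    "codegen_sources/preprocessing/lang_processors": 40,
    "codegen_sources/preprocessing/bpe_modes": 16,
    "codegen_sources/preprocessing/obfuscation": 14,
    "codegen_sources/preprocessing": 7,
    "codegen_sources": 0,
    "codegen_sources/model/tools": 0,
}


def share_of_total(counts: dict[str, int], prefix: str) -> float:
    """Percentage of all duplicated lines in components whose path starts with `prefix`."""
    selected = sum(n for name, n in counts.items() if name.startswith(prefix))
    return 100.0 * selected / TOTAL_DUPLICATED


# Each breakdown should account for every duplicated line exactly once.
assert sum(by_extension.values()) == TOTAL_DUPLICATED
assert sum(by_component.values()) == TOTAL_DUPLICATED

print(f"{share_of_total(by_component, 'data/transcoder_evaluation_gfg'):.1f}%")  # 71.3%
```

By this arithmetic, roughly 71% of the duplicated lines sit in the transcoder_evaluation_gfg evaluation data. About 14% sit under codegen_sources.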

Duplication Between Components (50+ lines)

  • CodeXGLUE/Code-Code/Clone-detection-BigCloneBench -- CodeXGLUE/Text-Code/NL-code-search-Adv: 540 lines
  • CodeXGLUE/Code-Code/Clone-detection-BigCloneBench -- CodeXGLUE/Code-Text/code-to-text: 68 lines
  • CodeXGLUE/Code-Code/Clone-detection-BigCloneBench -- CodeXGLUE/Text-Code/text-to-code: 209 lines
  • CodeXGLUE/Text-Code/NL-code-search-Adv -- CodeXGLUE/Text-Code/text-to-code: 207 lines
  • CodeXGLUE/Code-Text/code-to-text -- CodeXGLUE/Text-Code/NL-code-search-Adv: 68 lines
  • CodeXGLUE/Code-Text/code-to-text -- CodeXGLUE/Text-Code/text-to-code: 322 lines
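An equivalent Graphviz description can be rebuilt from the component pairs listed above. This is a minimal sketch that assumes the 50-line threshold applies to each pair's count. `EDGES` and `to_dot` are hypothetical names. The output is plain DOT text, which Graphviz can render with `dot -Tsvg`.

```python
# Component pairs and duplicated-line counts transcribed from the list above.
EDGES = [
    ("CodeXGLUE/Code-Code/Clone-detection-BigCloneBench", "CodeXGLUE/Text-Code/NL-code-search-Adv", 540),
    ("CodeXGLUE/Code-Code/Clone-detection-BigCloneBench", "CodeXGLUE/Code-Text/code-to-text", 68),
    ("CodeXGLUE/Code-Code/Clone-detection-BigCloneBench", "CodeXGLUE/Text-Code/text-to-code", 209),
    ("CodeXGLUE/Text-Code/NL-code-search-Adv", "CodeXGLUE/Text-Code/text-to-code", 207),
    ("CodeXGLUE/Code-Text/code-to-text", "CodeXGLUE/Text-Code/NL-code-search-Adv", 68),
    ("CodeXGLUE/Code-Text/code-to-text", "CodeXGLUE/Text-Code/text-to-code", 322),
]


def to_dot(edges: list[tuple[str, str, int]], min_lines: int = 50) -> str:
    """Render an undirected DOT graph, keeping only pairs with at least `min_lines`."""
    out = ["graph G {"]
    for left, right, lines in edges:
        if lines >= min_lines:
            # "--" is DOT's undirected edge operator; the count becomes the edge label.
            out.append(f'  "{left}" -- "{right}" [label="{lines}"];')
    out.append("}")
    return "\n".join(out)


print(to_dot(EDGES))
```

Raising `min_lines` (for example to 300) keeps only the strongest couplings between components.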

Longest Duplicates
The list of 20 longest duplicates.
Each entry gives the duplicate size in lines and the number of occurrences. It then lists each occurrence's file and line range, with the percentage of that file covered by the duplicate.
  • 152 x 2
    • CodeXGLUE/Code-Text/code-to-text/code/bleu.py: lines 3:199 (100%)
    • CodeXGLUE/Code-Text/code-to-text/evaluator/evaluator.py: lines 3:199 (100%)
  • 85 x 2
    • CodeXGLUE/Code-Code/Clon...tion-BigCloneBench/code/run.py: lines 435:528 (17%)
    • CodeXGLUE/Text-Code/NL-code-search-Adv/code/run.py: lines 414:507 (18%)
  • 84 x 2
    • data/transcoder_evaluation_gfg/java/SORT_EVEN_PLACED_ELEMENTS_INCREASING_...: lines 12:105 (100%)
    • data/transcoder_evaluation_gfg/java/SPLIT_ARRAY_ADD_FIRST_PART_END.java: lines 12:106 (100%)
  • 81 x 2
    • CodeXGLUE/Code-Text/code-to-text/code/model.py: lines 104:226 (54%)
    • CodeXGLUE/Text-Code/text-to-code/code/beam.py: lines 96:218 (57%)
  • 71 x 2
    • CodeXGLUE/Text-Code/text-to-code/code/bleu.py: lines 28:134 (100%)
    • CodeXGLUE/Text-Code/text-to-code/evaluator/bleu.py: lines 28:134 (100%)
  • 60 x 2
    • CodeXGLUE/Code-Text/code-to-text/code/model.py: lines 9:94 (40%)
    • CodeXGLUE/Text-Code/text-to-code/code/beam.py: lines 9:94 (42%)
  • 51 x 2
    • codegen_sources/preprocessing/dataset_modes/obfuscation_functions_mode.py: lines 47:101 (34%)
    • codegen_sources/preprocessing/dataset_modes/obfuscation_mode.py: lines 44:97 (42%)
  • 51 x 2
    • data/transcoder_evaluation_gfg/java/FIND_THE_MINIMUM_DISTANCE_BETWEEN_TWO...: lines 25:84 (82%)
    • data/transcoder_evaluation_gfg/java/FIND_THE_MINIMUM_DISTANCE_BETWEEN_TWO...: lines 36:96 (72%)
  • 45 x 2
    • data/transcoder_evaluation_gfg/java/FIND_WHETHER_AN_ARRAY_IS_SUBSET_OF_AN...: lines 30:74 (72%)
    • data/transcoder_evaluation_gfg/java/FIND_WHETHER_AN_ARRAY_IS_SUBSET_OF_AN...: lines 33:77 (70%)
  • 41 x 2
    • CodeXGLUE/Code-Code/Clon...tion-BigCloneBench/code/run.py: lines 532:587 (8%)
    • CodeXGLUE/Text-Code/NL-code-search-Adv/code/run.py: lines 511:569 (8%)
  • 40 x 2
    • data/transcoder_evaluation_gfg/java/DISTRIBUTING_ITEMS_PERSON_CANNOT_TAKE...: lines 26:74 (76%)
    • data/transcoder_evaluation_gfg/java/DISTRIBUTING_ITEMS_PERSON_CANNOT_TAKE...: lines 22:70 (81%)
  • 32 x 2
    • data/transcoder_evaluation_gfg/python/MOVE_VE_ELEMENTS_END_ORDER_EXTRA_SPAC...: lines 26:57 (68%)
    • data/transcoder_evaluation_gfg/python/MOVE_ZEROES_END_ARRAY.py: lines 20:51 (78%)
  • 32 x 2
    • codegen_sources/preprocessing/dataset_modes/obfuscation_functions_mode.py: lines 153:187 (21%)
    • codegen_sources/preprocessing/dataset_modes/obfuscation_mode.py: lines 124:158 (26%)
  • 32 x 2
    • codegen_sources/preprocessing/dataset_modes/monolingual_functions_mode.py: lines 44:77 (28%)
    • codegen_sources/preprocessing/dataset_modes/monolingual_mode.py: lines 40:74 (27%)
  • 31 x 2
    • CodeXGLUE/Code-Code/Clon...tion-BigCloneBench/code/run.py: lines 208:246 (6%)
    • CodeXGLUE/Text-Code/NL-code-search-Adv/code/run.py: lines 182:220 (6%)
  • 29 x 2
    • codegen_sources/preprocessing/dataset_modes/monolingual_functions_mode.py: lines 48:79 (25%)
    • codegen_sources/preprocessing/dataset_modes/obfuscation_mode.py: lines 48:79 (23%)
  • 29 x 2
    • codegen_sources/preprocessing/dataset_modes/monolingual_functions_mode.py: lines 48:79 (25%)
    • codegen_sources/preprocessing/dataset_modes/obfuscation_functions_mode.py: lines 51:83 (19%)
  • 28 x 2
    • data/transcoder_evaluation_gfg/java/PRODUCT_NODES_K_TH_LEVEL_TREE_REPRESE...: lines 32:63 (68%)
    • data/transcoder_evaluation_gfg/java/SUM_NODES_K_TH_LEVEL_TREE_REPRESENTED...: lines 32:63 (68%)
  • 28 x 2
    • data/transcoder_evaluation_gfg/python/FIND_WHETHER_AN_ARRAY_IS_SUBSET_OF_AN...: lines 21:48 (73%)
    • data/transcoder_evaluation_gfg/python/FIND_WHETHER_AN_ARRAY_IS_SUBSET_OF_AN...: lines 27:54 (63%)
  • 28 x 2
    • codegen_sources/preprocessing/dataset_modes/monolingual_mode.py: lines 44:74 (24%)
    • codegen_sources/preprocessing/dataset_modes/obfuscation_functions_mode.py: lines 51:81 (19%)
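Most of the longest duplicates are far larger than the 6-line threshold, which suggests that overlapping matches are reported as one maximal block. The sketch below illustrates that idea for two already-cleaned files. It uses a longest-common-substring style table over lines and records each run that cannot be extended further. This is an illustration under that assumption, not the analyzer's documented method, and `common_runs` is a hypothetical name.

```python
def common_runs(a: list[str], b: list[str], min_len: int = 6) -> list[tuple[int, int, int]]:
    """Return (start_a, start_b, length) for each maximal identical run of at least min_len lines."""
    n, m = len(a), len(b)
    # run[i][j] = number of identical lines ending at a[i-1] and b[j-1]
    run = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1

    found = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            length = run[i][j]
            # A run is maximal when the next pair of lines no longer matches.
            extendable = i < n and j < m and a[i] == b[j]
            if length >= min_len and not extendable:
                found.append((i - length, j - length, length))
    # Longest blocks first, as in the "Longest Duplicates" list.
    return sorted(found, key=lambda r: r[2], reverse=True)


a = [f"l{k}" for k in range(10)]
b = ["x"] + a[2:9] + ["y"]
print(common_runs(a, b))  # [(2, 1, 7)]
```

The table costs O(n·m) time and memory, which is acceptable for a pair of files but not for a whole repository.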
Duplicated Units
The list of top 20 duplicated units (59 unit duplicates in total).
Each entry gives the unit size in lines and the number of occurrences, followed by each occurrence's file and line range.
  • 78 x 2
    • data/transcoder_evaluation_gfg/java/SORT_EVEN_PLACED_ELEMENTS_INCREASING_...: lines 28:106
    • data/transcoder_evaluation_gfg/java/SPLIT_ARRAY_ADD_FIRST_PART_END.java: lines 29:107
  • 54 x 2
    • data/transcoder_evaluation_gfg/java/FIND_THE_MINIMUM_DISTANCE_BETWEEN_TWO...: lines 31:85
    • data/transcoder_evaluation_gfg/java/FIND_THE_MINIMUM_DISTANCE_BETWEEN_TWO...: lines 42:97
  • 54 x 2
    • data/transcoder_evaluation_gfg/java/FIND_WHETHER_AN_ARRAY_IS_SUBSET_OF_AN...: lines 30:84
    • data/transcoder_evaluation_gfg/java/FIND_WHETHER_AN_ARRAY_IS_SUBSET_OF_AN...: lines 33:87
  • 55 x 2
    • CodeXGLUE/Text-Code/text-to-code/evaluator/bleu.py: lines 0:0
    • CodeXGLUE/Text-Code/text-to-code/code/bleu.py: lines 0:0
  • 43 x 2
    • data/transcoder_evaluation_gfg/java/DISTRIBUTING_ITEMS_PERSON_CANNOT_TAKE...: lines 32:75
    • data/transcoder_evaluation_gfg/java/DISTRIBUTING_ITEMS_PERSON_CANNOT_TAKE...: lines 28:71
  • 32 x 11 (first 10 occurrences shown)
    • data/transcoder_evaluation_gfg/java/COUNT_DISTINCT_OCCURRENCES_AS_A_SUBSE...: lines 40:72
    • data/transcoder_evaluation_gfg/java/WAYS_TRANSFORMING_ONE_STRING_REMOVING...: lines 50:82
    • data/transcoder_evaluation_gfg/java/WILDCARD_CHARACTER_MATCHING.java: lines 24:56
    • data/transcoder_evaluation_gfg/java/CHECK_STRING_CAN_OBTAINED_ROTATING_AN...: lines 26:58
    • data/transcoder_evaluation_gfg/java/SPACE_OPTIMIZED_SOLUTION_LCS.java: lines 35:67
    • data/transcoder_evaluation_gfg/java/FIND_ONE_EXTRA_CHARACTER_STRING_1.java: lines 31:63
    • data/transcoder_evaluation_gfg/java/MAXIMUM_LENGTH_PREFIX_ONE_STRING_OCCU...: lines 27:59
    • data/transcoder_evaluation_gfg/java/LONGEST_COMMON_SUBSTRING_SPACE_OPTIMI...: lines 44:76
    • data/transcoder_evaluation_gfg/java/FIND_NUMBER_TIMES_STRING_OCCURS_GIVEN...: lines 39:71
    • data/transcoder_evaluation_gfg/java/CHECK_POSSIBLE_TRANSFORM_ONE_STRING_A...: lines 45:77
    • ...
  • 32 x 4
    • data/transcoder_evaluation_gfg/java/SUM_TWO_LARGE_NUMBERS.java: lines 46:78
    • data/transcoder_evaluation_gfg/java/SUM_TWO_LARGE_NUMBERS_1.java: lines 44:76
    • data/transcoder_evaluation_gfg/java/MULTIPLY_LARGE_NUMBERS_REPRESENTED_AS...: lines 49:81
    • data/transcoder_evaluation_gfg/java/PROGRAM_CENSOR_WORD_ASTERISKS_SENTENC...: lines 32:64
  • 32 x 2
    • data/transcoder_evaluation_gfg/java/PROGRAM_COUNT_OCCURRENCE_GIVEN_CHARAC...: lines 26:58
    • data/transcoder_evaluation_gfg/java/CHECK_OCCURRENCES_CHARACTER_APPEAR_TO...: lines 30:62
  • 32 x 2
    • data/transcoder_evaluation_gfg/java/PRODUCT_NODES_K_TH_LEVEL_TREE_REPRESE...: lines 32:64
    • data/transcoder_evaluation_gfg/java/SUM_NODES_K_TH_LEVEL_TREE_REPRESENTED...: lines 32:64
  • 28 x 2
    • CodeXGLUE/Code-Text/code-to-text/evaluator/evaluator.py: lines 0:0
    • CodeXGLUE/Code-Text/code-to-text/code/bleu.py: lines 0:0
  • 33 x 2
    • codegen_sources/model/translate.py: lines 0:0
    • codegen_sources/model/deobfuscate.py: lines 0:0
  • 25 x 2
    • data/transcoder_evaluation_gfg/cpp/FIND_WHETHER_AN_ARRAY_IS_SUBSET_OF_AN...: lines 35:60
    • data/transcoder_evaluation_gfg/cpp/FIND_WHETHER_AN_ARRAY_IS_SUBSET_OF_AN...: lines 35:60
  • 23 x 2
    • data/transcoder_evaluation_gfg/cpp/SUM_NODES_K_TH_LEVEL_TREE_REPRESENTED...: lines 35:58
    • data/transcoder_evaluation_gfg/cpp/PRODUCT_NODES_K_TH_LEVEL_TREE_REPRESE...: lines 35:58
  • 23 x 2
    • CodeXGLUE/Code-Text/code-to-text/evaluator/evaluator.py: lines 0:0
    • CodeXGLUE/Code-Text/code-to-text/code/bleu.py: lines 0:0
  • 22 x 2
    • data/transcoder_evaluation_gfg/cpp/PANGRAM_CHECKING.cpp: lines 35:57
    • data/transcoder_evaluation_gfg/cpp/FIND_EXPRESSION_DUPLICATE_PARENTHESIS...: lines 40:62
  • 21 x 63 (first 10 occurrences shown)
    • data/transcoder_evaluation_gfg/java/COUNT_CHARACTERS_STRING_DISTANCE_ENGL...: lines 27:48
    • data/transcoder_evaluation_gfg/java/LONGEST_PALINDROME_SUBSEQUENCE_SPACE....: lines 41:62
    • data/transcoder_evaluation_gfg/java/LONGEST_EVEN_LENGTH_SUBSTRING_SUM_FIR...: lines 39:60
    • data/transcoder_evaluation_gfg/java/CHECK_LARGE_NUMBER_DIVISIBLE_9_NOT.java: lines 25:46
    • data/transcoder_evaluation_gfg/java/NUMBER_DIGITS_REMOVED_MAKE_NUMBER_DIV...: lines 31:52
    • data/transcoder_evaluation_gfg/java/CHECK_GIVEN_STRING_ROTATION_PALINDROM...: lines 23:44
    • data/transcoder_evaluation_gfg/java/COUNT_PALINDROMIC_SUBSEQUENCE_GIVEN_S...: lines 38:59
    • data/transcoder_evaluation_gfg/java/CHECK_LARGE_NUMBER_DIVISIBLE_3_NOT.java: lines 25:46
    • data/transcoder_evaluation_gfg/java/LONGEST_PREFIX_ALSO_SUFFIX_1.java: lines 42:63
    • data/transcoder_evaluation_gfg/java/LONGEST_REPEATING_SUBSEQUENCE.java: lines 32:53
    • ...
  • 21 x 16 (first 10 occurrences shown)
    • data/transcoder_evaluation_gfg/java/PRINT_A_CLOSEST_STRING_THAT_DOES_NOT_...: lines 31:52
    • data/transcoder_evaluation_gfg/java/LEXICOGRAPHICALLY_NEXT_STRING.java: lines 25:46
    • data/transcoder_evaluation_gfg/java/LEXICOGRAPHICALLY_MINIMUM_STRING_ROTA...: lines 29:50
    • data/transcoder_evaluation_gfg/java/DECODE_MEDIAN_STRING_ORIGINAL_STRING....: lines 40:61
    • data/transcoder_evaluation_gfg/java/LEXICOGRAPHICAL_MAXIMUM_SUBSTRING_STR...: lines 28:49
    • data/transcoder_evaluation_gfg/java/BINARY_REPRESENTATION_OF_NEXT_NUMBER....: lines 36:57
    • data/transcoder_evaluation_gfg/java/REMOVE_BRACKETS_ALGEBRAIC_STRING_CONT...: lines 45:66
    • data/transcoder_evaluation_gfg/java/PRINT_WORDS_STRING_REVERSE_ORDER.java: lines 34:55
    • data/transcoder_evaluation_gfg/java/LEXICOGRAPHICAL_CONCATENATION_SUBSTRI...: lines 36:57
    • data/transcoder_evaluation_gfg/java/NTH_EVEN_LENGTH_PALINDROME.java: lines 24:45
    • ...
  • 21 x 2
    • data/transcoder_evaluation_gfg/java/WRITE_ONE_LINE_C_FUNCTION_TO_FIND_WHE...: lines 25:46
    • data/transcoder_evaluation_gfg/java/WRITE_ONE_LINE_C_FUNCTION_TO_FIND_WHE...: lines 20:41
  • 33 x 2
    • CodeXGLUE/Text-Code/text-to-code/code/beam.py: lines 0:0
    • CodeXGLUE/Code-Text/code-to-text/code/model.py: lines 0:0
  • 19 x 2
    • CodeXGLUE/Text-Code/text-to-code/evaluator/bleu.py: lines 0:0
    • CodeXGLUE/Text-Code/text-to-code/code/bleu.py: lines 0:0
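Unit-level duplicates compare whole functions rather than arbitrary line windows. For Python files, one way to approximate this is to compare the abstract syntax trees of function bodies. The sketch below assumes that "unit" means a function or method. It ignores function names and source positions: `ast.dump` omits line and column attributes by default. `duplicated_functions` is a hypothetical helper and is not the analyzer's documented method. Most unit duplicates above are Java, which this Python-only sketch does not cover.

```python
import ast
from collections import defaultdict


def duplicated_functions(sources: dict[str, str]) -> list[list[str]]:
    """Group functions whose parameters and bodies have identical ASTs (names ignored)."""
    groups = defaultdict(list)
    for filename, source in sources.items():
        tree = ast.parse(source, filename=filename)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Key on structure only: the parameter list plus every body statement.
                key = ast.dump(node.args) + "".join(ast.dump(stmt) for stmt in node.body)
                groups[key].append(f"{filename}:{node.name}")
    return [names for names in groups.values() if len(names) > 1]


src_a = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
src_b = src_a.replace("total", "add_all") + "\ndef other(xs):\n    return len(xs)\n"
print(duplicated_functions({"a.py": src_a, "b.py": src_b}))
# [['a.py:total', 'b.py:add_all']]
```

Because renamed local variables change the AST, this only finds exact structural copies. That matches the report's "exactly the same" criterion.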