microsoft / nlp-recipes
Duplication

Places in code with 6 or more lines that are exactly the same.

Intro
  • To measure duplication, we look for places in the code where 6 or more lines are exactly the same.
  • Before duplication is calculated, the code is cleaned to remove empty lines, comments, and frequently duplicated constructs such as imports.
  • Aim to keep duplicated code as low as possible (under 5%): a high level of duplication can lead to maintenance difficulties, poor factoring, and logical contradictions.
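The detection rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the analyzer's actual implementation: the cleaning step here only drops blank lines, `#` comments and lines starting with `import`/`from`, and the name `find_duplicate_windows` is hypothetical. The sketch reports every matching 6-line window rather than merging overlapping windows into longer blocks, whereas the report below lists merged blocks of up to 49 lines.

```python
import hashlib
from collections import defaultdict

MIN_LINES = 6  # threshold from the report: 6 or more identical lines


def clean(lines):
    """Simplified cleaning (assumption): drop blank lines, '#' comments and import lines."""
    cleaned = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.startswith(("import ", "from ")):
            continue
        cleaned.append(stripped)
    return cleaned


def find_duplicate_windows(files, min_lines=MIN_LINES):
    """Map each window of min_lines cleaned lines to every (file, cleaned-line index) where it occurs."""
    seen = defaultdict(list)
    for name, text in files.items():
        cleaned = clean(text.splitlines())
        for i in range(len(cleaned) - min_lines + 1):
            window = "\n".join(cleaned[i:i + min_lines])
            key = hashlib.sha1(window.encode()).hexdigest()
            seen[key].append((name, i))
    # Keep only windows that occur more than once.
    return {key: locs for key, locs in seen.items() if len(locs) > 1}


if __name__ == "__main__":
    block = "\n".join(f"x{i} = {i}" for i in range(6))
    files = {"a.py": "import os\n" + block, "b.py": "# note\n\n" + block + "\nprint(x0)"}
    for locations in find_duplicate_windows(files).values():
        print(locations)  # [('a.py', 0), ('b.py', 0)]
```

The indices refer to positions in the cleaned lines, not to original file line numbers; mapping back to source lines would require keeping the original line number alongside each cleaned line.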
Duplication Overall
  • 12% duplication:
    • 13,195 lines of cleaned code (without empty lines, comments, and frequently duplicated constructs such as imports)
    • 1,685 duplicated lines
  • 115 duplicates
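The overall percentage is the number of duplicated lines divided by the number of cleaned lines. A quick check of the figures above: 1,685 / 13,195 is about 12.77%. The report displays 12%, which suggests (an assumption; the page does not say) that it truncates rather than rounds, since rounding would give 13%.

```python
cleaned_lines = 13_195    # cleaned lines of code, from the report
duplicated_lines = 1_685  # duplicated lines, from the report

ratio = duplicated_lines / cleaned_lines
print(f"{ratio:.2%}")                     # 12.77%
print(f"{int(ratio * 100)}% truncated")   # 12% truncated
print(f"{round(ratio * 100)}% rounded")   # 13% rounded
```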
  • system: 12% (1,685 lines)
Duplication per Extension
  • py: 12% (1,561 lines)
  • c: 12% (124 lines)
Duplication per Component (primary)
  • utils_nlp/models: 14% (1,273 lines)
  • utils_nlp/dataset: 22% (274 lines)
  • utils_nlp/eval: 5% (138 lines)
  • ROOT: 0% (0 lines)
  • tools: 0% (0 lines)
  • utils_nlp: 0% (0 lines)
  • utils_nlp/azureml: 0% (0 lines)
  • utils_nlp/common: 0% (0 lines)
  • utils_nlp/interpreter: 0% (0 lines)
  • utils_nlp/language_utils: 0% (0 lines)
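Both breakdowns partition the same 1,685 duplicated lines, which allows a simple consistency check on the figures: 1,561 + 124 for the extensions and 1,273 + 274 + 138 for the components both add up to the system total. The dictionary names below are only illustrative.

```python
total_duplicated = 1_685  # "system" total from the report

by_extension = {"py": 1_561, "c": 124}
# Only components with non-zero duplication are listed; all others report 0 lines.
by_component = {
    "utils_nlp/models": 1_273,
    "utils_nlp/dataset": 274,
    "utils_nlp/eval": 138,
}

for name, breakdown in [("extension", by_extension), ("component", by_component)]:
    total = sum(breakdown.values())
    print(f"per-{name} sum: {total}, matches total: {total == total_duplicated}")
```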
Longest Duplicates
The list of the 20 longest duplicates (out of 115 in total).
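Each entry lists the size (duplicated lines x number of occurrences), followed by the file and line range (start:end) of each occurrence.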
  • 49 lines x 2
    • utils_nlp/models/transformers/named_entity_recognition.py, lines 290:395 (18%)
    • utils_nlp/models/transformers/sequence_classification.py, lines 216:321 (26%)
  • 24 lines x 2
    • utils_nlp/models/glove/src/cooccur.c, lines 121:163 (6%)
    • utils_nlp/models/glove/src/vocab_count.c, lines 122:164 (16%)
  • 23 lines x 2
    • utils_nlp/models/bert/common.py, lines 330:363 (10%)
    • utils_nlp/models/transformers/named_entity_recognition.py, lines 223:256 (8%)
  • 22 lines x 2
    • utils_nlp/models/bert/common.py, lines 278:306 (10%)
    • utils_nlp/models/transformers/named_entity_recognition.py, lines 181:210 (8%)
  • 22 lines x 2
    • utils_nlp/models/transformers/named_entity_recognition.py, lines 40:77 (8%)
    • utils_nlp/models/transformers/sequence_classification.py, lines 38:75 (12%)
  • 20 lines x 2
    • utils_nlp/models/transformers/question_answering.py, lines 745:834 (1%)
    • utils_nlp/models/transformers/question_answering.py, lines 1042:1125 (1%)
  • 20 lines x 2
    • utils_nlp/models/transformers/datasets.py, lines 42:62 (5%)
    • utils_nlp/models/transformers/datasets.py, lines 104:126 (5%)
  • 20 lines x 2
    • utils_nlp/dataset/bbc_hindi.py, lines 127:150 (28%)
    • utils_nlp/dataset/dac.py, lines 120:143 (30%)
  • 20 lines x 2
    • utils_nlp/eval/SentEval/senteval/sick.py, lines 56:80 (12%)
    • utils_nlp/eval/SentEval/senteval/sick.py, lines 160:184 (12%)
  • 20 lines x 2
    • utils_nlp/models/bert/common.py, lines 122:150 (9%)
    • utils_nlp/models/bert/common.py, lines 179:207 (9%)
  • 18 lines x 2
    • utils_nlp/models/transformers/question_answering.py, lines 542:559 (1%)
    • utils_nlp/models/transformers/sequence_classification.py, lines 224:241 (9%)
  • 18 lines x 2
    • utils_nlp/dataset/bbc_hindi.py, lines 140:161 (25%)
    • utils_nlp/dataset/multinli.py, lines 224:245 (15%)
  • 18 lines x 2
    • utils_nlp/models/transformers/named_entity_recognition.py, lines 298:315 (6%)
    • utils_nlp/models/transformers/question_answering.py, lines 542:559 (1%)
  • 16 lines x 2
    • utils_nlp/models/transformers/named_entity_recognition.py, lines 397:432 (6%)
    • utils_nlp/models/transformers/sequence_classification.py, lines 323:358 (8%)
  • 14 lines x 2
    • utils_nlp/models/transformers/question_answering.py, lines 1007:1026 (1%)
    • utils_nlp/models/transformers/question_answering.py, lines 1254:1275 (1%)
  • 14 lines x 2
    • utils_nlp/models/gensen/gensen.py, lines 72:87 (5%)
    • utils_nlp/models/gensen/multi_task_model.py, lines 117:132 (6%)
  • 13 lines x 2
    • utils_nlp/dataset/dac.py, lines 50:62 (20%)
    • utils_nlp/dataset/multinli.py, lines 134:146 (11%)
  • 13 lines x 2
    • utils_nlp/models/transformers/named_entity_recognition.py, lines 316:379 (4%)
    • utils_nlp/models/transformers/question_answering.py, lines 561:629 (1%)
  • 13 lines x 2
    • utils_nlp/models/transformers/question_answering.py, lines 561:629 (1%)
    • utils_nlp/models/transformers/sequence_classification.py, lines 242:305 (7%)
  • 13 lines x 2
    • utils_nlp/dataset/bbc_hindi.py, lines 59:71 (18%)
    • utils_nlp/dataset/dac.py, lines 50:62 (20%)
Duplicated Units
The list of all 8 duplicated units.
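Each entry lists the unit size (lines x number of occurrences), followed by the file and line range (start:end) of each occurrence.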
  • 28 lines x 2
    • utils_nlp/models/glove/src/cooccur.c, lines 134:164
    • utils_nlp/models/glove/src/vocab_count.c, lines 135:165
  • 17 lines x 2
    • utils_nlp/models/transformers/named_entity_recognition.py, lines 0:0
    • utils_nlp/models/transformers/sequence_classification.py, lines 0:0
  • 12 lines x 4
    • utils_nlp/models/glove/src/cooccur.c, lines 451:463
    • utils_nlp/models/glove/src/glove.c, lines 353:365
    • utils_nlp/models/glove/src/shuffle.c, lines 177:189
    • utils_nlp/models/glove/src/vocab_count.c, lines 226:238
  • 11 lines x 3
    • utils_nlp/dataset/bbc_hindi.py, lines 0:0
    • utils_nlp/dataset/dac.py, lines 0:0
    • utils_nlp/dataset/multinli.py, lines 0:0
  • 9 lines x 2
    • utils_nlp/models/transformers/named_entity_recognition.py, lines 0:0
    • utils_nlp/models/transformers/sequence_classification.py, lines 0:0
  • 16 lines x 2
    • utils_nlp/dataset/data_loaders.py, lines 0:0
    • utils_nlp/dataset/data_loaders.py, lines 0:0
  • 6 lines x 2
    • utils_nlp/models/gensen/gensen.py, lines 0:0
    • utils_nlp/models/gensen/gensen.py, lines 0:0
  • 9 lines x 3
    • utils_nlp/dataset/bbc_hindi.py, lines 0:0
    • utils_nlp/dataset/dac.py, lines 0:0
    • utils_nlp/dataset/multinli.py, lines 0:0
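This page does not define a duplicated unit. Assuming it means a whole function whose body appears more than once, a minimal Python sketch could compare normalized function bodies using the standard `ast` module (`ast.unparse` requires Python 3.9+), which drops comments and blank lines automatically. The name `find_duplicated_units` and the 6-line minimum are illustrative assumptions; the sketch ignores function names and parameters and handles only Python, not the C files listed above.

```python
import ast
from collections import defaultdict


def find_duplicated_units(sources, min_lines=6):
    """Group functions whose unparsed bodies are identical (hypothetical helper).

    sources maps a file name to its Python source. Returns a list of groups,
    each a list of (file name, function name, line number) tuples.
    """
    groups = defaultdict(list)
    for filename, source in sources.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # ast.unparse normalizes formatting and discards comments.
                body = "\n".join(ast.unparse(stmt) for stmt in node.body)
                if body.count("\n") + 1 >= min_lines:
                    groups[body].append((filename, node.name, node.lineno))
    # A unit counts as duplicated only if its body occurs more than once.
    return [locs for locs in groups.values() if len(locs) > 1]


if __name__ == "__main__":
    body = "\n".join(f"    v{i} = {i}" for i in range(6))
    sources = {
        "a.py": "def f():\n" + body + "\n",
        "b.py": "def g():\n    # a comment\n" + body + "\n",
    }
    print(find_duplicated_units(sources))  # [[('a.py', 'f', 1), ('b.py', 'g', 1)]]
```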