tensorflow / datasets
Duplication

Places in code with 6 or more lines that are exactly the same.

Intro
  • For duplication, we look at places in code where there are 6 or more lines of code that are exactly the same.
  • Before duplication is calculated, the code is cleaned to remove empty lines, comments, and frequently duplicated constructs such as imports.
  • Aim to keep duplicated code as low as possible (below 5%), since a high level of duplication can lead to maintenance difficulties, poor factoring, and logical contradictions.
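
The two steps described above (clean the code, then look for runs of 6 or more identical lines) can be sketched roughly as follows. This is a minimal illustration, not the report tool's actual implementation: the function names `clean_lines` and `find_duplicates` are hypothetical, the cleaning rules are assumed to cover only blank lines, full-line `#` comments, and import statements, and lines are compared after stripping indentation, which may differ from how the tool compares them.

```python
import re
from collections import defaultdict

MIN_BLOCK = 6  # threshold stated in the report: 6 or more identical lines


def clean_lines(source: str) -> list[tuple[int, str]]:
    """Return (original_line_number, stripped_text) for lines kept after cleaning.

    Assumption: cleaning drops blank lines, full-line '#' comments, and
    import statements. The real tool's rules may be broader.
    """
    kept = []
    for number, raw in enumerate(source.splitlines(), start=1):
        text = raw.strip()
        if not text or text.startswith("#"):
            continue
        if re.match(r"(import|from)\s", text):
            continue
        kept.append((number, text))
    return kept


def find_duplicates(files: dict[str, str], min_block: int = MIN_BLOCK):
    """Group every window of `min_block` cleaned lines by its exact text.

    Returns a dict mapping the window text to a list of
    (file_name, first_line, last_line) for windows seen more than once.
    """
    windows = defaultdict(list)
    for name, source in files.items():
        lines = clean_lines(source)
        for i in range(len(lines) - min_block + 1):
            chunk = lines[i:i + min_block]
            key = "\n".join(text for _, text in chunk)
            windows[key].append((name, chunk[0][0], chunk[-1][0]))
    return {key: locs for key, locs in windows.items() if len(locs) > 1}
```

A longer duplicate shows up here as several overlapping 6-line windows; a fuller implementation would merge adjacent windows into one block before reporting sizes such as those listed under Longest Duplicates.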
Duplication Overall
  • 9% duplication:
    • 49,426 lines of cleaned code (without empty lines, comments, and frequently duplicated constructs such as imports)
    • 4,910 duplicated lines
  • 1,433 duplicates
  • system: 9% (4,910 lines)
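
As a quick check of the headline figure, the duplication share is the number of duplicated lines divided by the number of cleaned lines. The sketch below uses the two counts reported above; the helper name `duplication_percent` is hypothetical. The exact ratio comes out slightly under 10%, so the displayed 9% is presumably rounded down, which is an assumption about how the report formats percentages.

```python
def duplication_percent(duplicated_lines: int, cleaned_lines: int) -> float:
    """Return duplicated lines as a percentage of cleaned lines."""
    if cleaned_lines <= 0:
        raise ValueError("cleaned_lines must be positive")
    return 100 * duplicated_lines / cleaned_lines


# Counts taken from the report.
percent = duplication_percent(duplicated_lines=4_910, cleaned_lines=49_426)

# The report recommends staying below 5%.
exceeds_target = percent >= 5
print(f"{percent:.2f}% duplicated, exceeds 5% target: {exceeds_target}")
```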
Duplication per Extension
  • py: 10% (4,910 lines)
Duplication per Component (primary)
  • tensorflow_datasets/text: 16% (1,338 lines)
  • tensorflow_datasets/image_classification: 11% (905 lines)
  • tensorflow_datasets/structured: 9% (355 lines)
  • tensorflow_datasets/audio: 17% (342 lines)
  • tensorflow_datasets/question_answering: 27% (325 lines)
  • tensorflow_datasets/testing: 10% (271 lines)
  • tensorflow_datasets/image: 11% (256 lines)
  • tensorflow_datasets/translate: 10% (160 lines)
  • tensorflow_datasets/summarization: 9% (158 lines)
  • tensorflow_datasets/rl_unplugged: 19% (155 lines)
  • tensorflow_datasets/core: 1% (152 lines)
  • tensorflow_datasets/d4rl: 19% (139 lines)
  • tensorflow_datasets/vision_language: 12% (92 lines)
  • tensorflow_datasets/object_detection: 5% (92 lines)
  • tensorflow_datasets/rlds: 21% (86 lines)
  • tensorflow_datasets/ranking: 8% (28 lines)
  • tensorflow_datasets/text_simplification: 7% (24 lines)
  • tensorflow_datasets: 13% (14 lines)
  • tensorflow_datasets/video: <1% (6 lines)
  • tensorflow_datasets/robomimic: 2% (6 lines)
  • tensorflow_datasets/time_series: 4% (6 lines)
  • ROOT: 0% (0 lines)
  • tensorflow_datasets/scripts: 0% (0 lines)
  • tensorflow_datasets/graphs: 0% (0 lines)
  • tensorflow_datasets/proto: 0% (0 lines)

Duplication Between Components (50+ lines)

Component pairs that share 50 or more duplicated lines, with the number of shared lines:
  • tensorflow_datasets/structured and tensorflow_datasets/text: 319
  • tensorflow_datasets/structured and tensorflow_datasets/summarization: 72
  • tensorflow_datasets/image and tensorflow_datasets/structured: 212
  • tensorflow_datasets/image and tensorflow_datasets/text: 264
  • tensorflow_datasets/image and tensorflow_datasets/image_classification: 271
  • tensorflow_datasets/image and tensorflow_datasets/question_answering: 84
  • tensorflow_datasets/image and tensorflow_datasets/summarization: 108
  • tensorflow_datasets/image and tensorflow_datasets/testing: 76
  • tensorflow_datasets/image_classification and tensorflow_datasets/structured: 168
  • tensorflow_datasets/image_classification and tensorflow_datasets/text: 237
  • tensorflow_datasets/image_classification and tensorflow_datasets/question_answering: 95
  • tensorflow_datasets/image_classification and tensorflow_datasets/summarization: 120
  • tensorflow_datasets/question_answering and tensorflow_datasets/structured: 66
  • tensorflow_datasets/question_answering and tensorflow_datasets/text: 211
  • tensorflow_datasets/summarization and tensorflow_datasets/text: 194
  • tensorflow_datasets/audio and tensorflow_datasets/structured: 54
  • tensorflow_datasets/audio and tensorflow_datasets/text: 102
  • tensorflow_datasets/audio and tensorflow_datasets/image: 60
  • tensorflow_datasets/audio and tensorflow_datasets/question_answering: 60
  • tensorflow_datasets/audio and tensorflow_datasets/summarization: 54
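
The component pairs above can be read as an undirected weighted graph, with one edge per pair and the shared line count as its weight. The sketch below, which is only one possible way to explore the data, stores those edges in a dictionary and ranks components by the sum of their edge weights. The variable names are hypothetical, the common `tensorflow_datasets/` prefix is dropped for brevity, and the per-component sums may count the same duplicated line more than once when it is shared with several components.

```python
from collections import Counter

# Edges copied from the report: (component_a, component_b) -> shared lines.
edges = {
    ("structured", "text"): 319,
    ("structured", "summarization"): 72,
    ("image", "structured"): 212,
    ("image", "text"): 264,
    ("image", "image_classification"): 271,
    ("image", "question_answering"): 84,
    ("image", "summarization"): 108,
    ("image", "testing"): 76,
    ("image_classification", "structured"): 168,
    ("image_classification", "text"): 237,
    ("image_classification", "question_answering"): 95,
    ("image_classification", "summarization"): 120,
    ("question_answering", "structured"): 66,
    ("question_answering", "text"): 211,
    ("summarization", "text"): 194,
    ("audio", "structured"): 54,
    ("audio", "text"): 102,
    ("audio", "image"): 60,
    ("audio", "question_answering"): 60,
    ("audio", "summarization"): 54,
}

# Sum the weights of all edges touching each component.
totals = Counter()
for (first, second), shared_lines in edges.items():
    totals[first] += shared_lines
    totals[second] += shared_lines

# Components ordered from most to least cross-component duplication.
ranking = totals.most_common()
for component, shared_lines in ranking:
    print(f"tensorflow_datasets/{component}: {shared_lines}")
```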

Longest Duplicates
The list of 20 longest duplicates.
Each entry gives the size in lines x the number of occurrences, followed by the folder, file, and line range (with the share of the file) of each occurrence.
  • 56 x 2
    • tensorflow_datasets/text: math_dataset.py, lines 70:125 (28%)
    • tensorflow_datasets/text: math_dataset.py, lines 128:183 (28%)
  • 37 x 2
    • tensorflow_datasets/text/gem: gem.py, lines 840:877 (2%)
    • tensorflow_datasets/text/gem: gem.py, lines 909:945 (2%)
  • 31 x 2
    • tensorflow_datasets/audio: crema_d.py, lines 56:121 (31%)
    • tensorflow_datasets/audio: savee.py, lines 60:124 (32%)
  • 30 x 2
    • tensorflow_datasets/text/gem: gem.py, lines 840:870 (1%)
    • tensorflow_datasets/text/gem: gem.py, lines 1189:1218 (1%)
  • 30 x 2
    • tensorflow_datasets/text/gem: gem.py, lines 909:938 (1%)
    • tensorflow_datasets/text/gem: gem.py, lines 1189:1218 (1%)
  • 26 x 2
    • tensorflow_datasets/audio: crema_d.py, lines 56:114 (26%)
    • tensorflow_datasets/audio: dementiabank.py, lines 46:103 (29%)
  • 26 x 2
    • tensorflow_datasets/audio: dementiabank.py, lines 46:103 (29%)
    • tensorflow_datasets/audio: savee.py, lines 60:117 (27%)
  • 24 x 2
    • tensorflow_datasets/text: eraser_multi_rc.py, lines 74:101 (38%)
    • tensorflow_datasets/text: movie_rationales.py, lines 69:93 (38%)
  • 24 x 2
    • tensorflow_datasets/image_classification: plant_leaves.py, lines 101:132 (30%)
    • tensorflow_datasets/image_classification: plantae_k.py, lines 92:123 (32%)
  • 24 x 2
    • tensorflow_datasets/rlds...te_panda_pick_place_can: robosuite_panda_pick_place_can.py, lines 94:117 (12%)
    • tensorflow_datasets/rlds...te_panda_pick_place_can: robosuite_panda_pick_place_can.py, lines 162:185 (12%)
  • 23 x 2
    • tensorflow_datasets/image_classification: imagenette.py, lines 111:135 (34%)
    • tensorflow_datasets/image_classification: imagewang.py, lines 118:142 (32%)
  • 22 x 2
    • tensorflow_datasets/imag...cation/i_naturalist2018: i_naturalist2018.py, lines 130:155 (21%)
    • tensorflow_datasets/image_classification: inaturalist.py, lines 135:160 (22%)
  • 22 x 2
    • tensorflow_datasets/text/gem: gem.py, lines 879:900 (1%)
    • tensorflow_datasets/text/gem: gem.py, lines 923:944 (1%)
  • 22 x 2
    • tensorflow_datasets/text/gem: gem.py, lines 855:876 (1%)
    • tensorflow_datasets/text/gem: gem.py, lines 879:900 (1%)
  • 21 x 2
    • tensorflow_datasets/question_answering: ai2_arc.py, lines 74:98 (22%)
    • tensorflow_datasets/question_answering: ai2_arc_with_ir.py, lines 95:119 (21%)
  • 20 x 2
    • tensorflow_datasets/text: xnli.py, lines 60:83 (40%)
    • tensorflow_datasets/text/xtreme_xnli: xtreme_xnli.py, lines 51:74 (27%)
  • 19 x 2
    • tensorflow_datasets/text/gem: gem.py, lines 786:806 (1%)
    • tensorflow_datasets/text/gem: gem.py, lines 1061:1080 (1%)
  • 19 x 2
    • tensorflow_datasets/structured/web_nlg: web_nlg.py, lines 61:84 (17%)
    • tensorflow_datasets/structured: wiki_bio.py, lines 78:98 (15%)
  • 19 x 2
    • tensorflow_datasets/text/gem: gem.py, lines 955:973 (1%)
    • tensorflow_datasets/text/gem: gem.py, lines 1061:1080 (1%)
  • 19 x 2
    • tensorflow_datasets/text/gem: gem.py, lines 786:806 (1%)
    • tensorflow_datasets/text/gem: gem.py, lines 955:973 (1%)
Duplicated Units
The list of top 15 duplicated units.
Each entry gives the size x the number of occurrences, followed by the folder and file of each occurrence.
  • 24 x 2
    • tensorflow_datasets/image_classification: plant_leaves.py
    • tensorflow_datasets/image_classification: plantae_k.py
  • 19 x 2
    • tensorflow_datasets/text/xtreme_xnli: xtreme_xnli.py
    • tensorflow_datasets/text: xnli.py
  • 27 x 4
    • tensorflow_datasets/audio: dementiabank.py
    • tensorflow_datasets/audio: voxforge.py
    • tensorflow_datasets/audio: savee.py
    • tensorflow_datasets/audio: crema_d.py
  • 25 x 2
    • tensorflow_datasets/audio: voxforge.py
    • tensorflow_datasets/audio: crema_d.py
  • 12 x 2
    • tensorflow_datasets/audio/gtzan_music_speech: gtzan_music_speech.py
    • tensorflow_datasets/audio/gtzan: gtzan.py
  • 12 x 2
    • tensorflow_datasets/core/logging: logging_logger.py
    • tensorflow_datasets/core/logging: base_logger.py
  • 14 x 2
    • tensorflow_datasets/translate: ted_hrlr.py
    • tensorflow_datasets/translate: flores.py
  • 12 x 2
    • tensorflow_datasets/text/paws_x_wiki: paws_x_wiki.py
    • tensorflow_datasets/text: paws_wiki.py
  • 8 x 2
    • tensorflow_datasets/testing: feature_test_case.py
    • tensorflow_datasets/testing: feature_test_case.py
  • 9 x 2
    • tensorflow_datasets/image_classification: imagewang.py
    • tensorflow_datasets/image_classification: imagenette.py
  • 7 x 3
    • tensorflow_datasets/question_answering: mctaco.py
    • tensorflow_datasets/question_answering: cosmos_qa.py
    • tensorflow_datasets/question_answering: web_questions.py
  • 6 x 2
    • tensorflow_datasets/testing/fake_data_generation: voc2007.py
    • tensorflow_datasets/image/bccd: dummy_data_generation.py
  • 6 x 3
    • tensorflow_datasets/testing/fake_data_generation: cassava.py
    • tensorflow_datasets/testing/fake_data_generation: beans.py
    • tensorflow_datasets/testing/fake_data_generation: caltech.py
  • 6 x 2
    • tensorflow_datasets/image: shapes3d.py
    • tensorflow_datasets/image: dsprites.py
  • 6 x 2
    • tensorflow_datasets/image_classification: citrus.py
    • tensorflow_datasets/image_classification: plant_village.py