awslabs / sagemaker-debugger
Duplication

Places in code with 6 or more lines that are exactly the same.

Intro
  • For duplication, we look at places in code where there are 6 or more lines of code that are exactly the same.
  • Before duplication is calculated, the code is cleaned to remove empty lines, comments, and frequently duplicated constructs such as imports.
  • You should aim to keep duplicated code as low as possible (<5%), because a high level of duplication can lead to maintenance difficulties, poor factoring, and logical contradictions.
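The procedure described above (clean the code, then look for runs of 6 or more identical lines) can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the function names `clean` and `find_duplicates` are hypothetical, the cleaning rules are simplified to full-line `#` comments and Python import statements, and lines are compared after stripping surrounding whitespace, which is an assumption.

```python
from collections import defaultdict

MIN_LINES = 6  # threshold stated in the report


def clean(source: str) -> list[tuple[int, str]]:
    """Return (original_line_number, stripped_text) for lines that survive cleaning."""
    kept = []
    for number, raw in enumerate(source.splitlines(), start=1):
        text = raw.strip()  # assumption: whitespace is ignored when comparing
        if not text or text.startswith("#"):
            continue  # empty line or full-line comment
        if text.startswith(("import ", "from ")):
            continue  # imports count as frequently duplicated constructs
        kept.append((number, text))
    return kept


def find_duplicates(files: dict[str, str], min_lines: int = MIN_LINES):
    """Map each repeated block of `min_lines` cleaned lines to its locations.

    Each location is (file_name, first_original_line, last_original_line).
    Overlapping windows are not merged into longer duplicates here.
    """
    windows = defaultdict(list)
    for name, source in files.items():
        lines = clean(source)
        for i in range(len(lines) - min_lines + 1):
            block = tuple(text for _, text in lines[i:i + min_lines])
            start, end = lines[i][0], lines[i + min_lines - 1][0]
            windows[block].append((name, start, end))
    # Keep only blocks that occur in more than one place.
    return {block: places for block, places in windows.items() if len(places) > 1}
```

Calling `find_duplicates({"a.py": text_a, "b.py": text_b})` returns an empty dict when no 6-line block repeats. A production tool would also extend matching windows into the longest possible duplicate, which this sketch omits.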
Duplication Overall
  • 8% duplication:
    • 14,710 lines of cleaned code (without empty lines, comments, and frequently duplicated constructs such as imports)
    • 1,274 duplicated lines
  • 129 duplicates
system: 8% (1,274 lines)
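The overall figure follows from the two line counts reported above. The short check below recomputes it; the observation that the report seems to truncate rather than round the percentage is an inference from these numbers, not something the report states.

```python
# Figures taken from the report.
cleaned_lines = 14_710
duplicated_lines = 1_274
target_percent = 5  # recommended upper bound from the intro

percent = 100 * duplicated_lines / cleaned_lines
print(f"{percent:.2f}%")  # 8.66%
print(f"{int(percent)}%")  # 8%, matching the reported figure if it is truncated

# The project is above the recommended threshold.
exceeds_target = percent > target_percent
print(exceeds_target)  # True
```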
Duplication per Extension
py: 8% (1,118 lines)
yml: 51% (156 lines)
Duplication per Component (primary)
smdebug/profiler: 12% (511 lines)
smdebug/core: 4% (255 lines)
smdebug/tensorflow: 8% (184 lines)
config: 53% (156 lines)
smdebug/pytorch: 11% (63 lines)
smdebug/mxnet: 16% (49 lines)
smdebug/analysis: 41% (24 lines)
smdebug/trials: 3% (20 lines)
smdebug/xgboost: 2% (12 lines)
ROOT: 0% (0 lines)
smdebug: 0% (0 lines)
smdebug/rules: 0% (0 lines)

Duplication Between Components (50+ lines)

Duplicated lines shared between pairs of components (from the dependency graph):
  • smdebug/pytorch and smdebug/tensorflow: 123 lines
  • smdebug/mxnet and smdebug/pytorch: 98 lines
  • smdebug/mxnet and smdebug/tensorflow: 95 lines
  • smdebug/core and smdebug/tensorflow: 79 lines
  • smdebug/core and smdebug/profiler: 51 lines
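The pairwise figures in this section (pytorch/tensorflow 123, mxnet/pytorch 98, mxnet/tensorflow 95, core/tensorflow 79, core/profiler 51) form a small undirected weighted graph. As a rough sketch, it can be written out in Graphviz DOT syntax and summed per component. The helper names `to_dot` and `shared_lines_per_component` are hypothetical, and the per-component totals may double-count lines that are shared by three components at once.

```python
# Edge list transcribed from the report: (component A, component B, duplicated lines).
edges = [
    ("smdebug/pytorch", "smdebug/tensorflow", 123),
    ("smdebug/mxnet", "smdebug/pytorch", 98),
    ("smdebug/mxnet", "smdebug/tensorflow", 95),
    ("smdebug/core", "smdebug/tensorflow", 79),
    ("smdebug/core", "smdebug/profiler", 51),
]


def to_dot(edges):
    """Render the edges as an undirected Graphviz graph named G."""
    lines = ["graph G {"]
    for a, b, weight in edges:
        lines.append(f'    "{a}" -- "{b}" [label="{weight}"];')
    lines.append("}")
    return "\n".join(lines)


def shared_lines_per_component(edges):
    """Sum the edge weights touching each component."""
    totals = {}
    for a, b, weight in edges:
        totals[a] = totals.get(a, 0) + weight
        totals[b] = totals.get(b, 0) + weight
    return totals


totals = shared_lines_per_component(edges)
for name, count in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{name}: {count}")
```

On these figures smdebug/tensorflow has the largest total (297), which suggests the framework hooks are the most promising place to look for shared code that could be factored out.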

Longest Duplicates
The list of 20 longest duplicates.
Size | # | Folders | Files | Lines
27 | 2 | config; config | buildspec_tensorflow_2_3.yml; buildspec_tensorflow_2_4.yml | 10:39 (55%); 5:34 (52%)
19 | 2 | smdebug/profiler/analysis/utils; smdebug/profiler/analysis/utils | profiler_data_to_pandas.py; profiler_data_to_pandas.py | 358:380 (4%); 521:543 (4%)
18 | 2 | smdebug/mxnet; smdebug/pytorch | hook.py; hook.py | 30:47 (10%); 67:84 (3%)
17 | 2 | smdebug/profiler/analysis/utils; smdebug/profiler/analysis/utils | merge_timelines.py; merge_timelines.py | 167:183 (6%); 222:238 (6%)
15 | 2 | smdebug/profiler/analysis/notebook_utils; smdebug/profiler/analysis/notebook_utils | heatmap.py; timeline_charts.py | 133:148 (9%); 104:118 (4%)
14 | 2 | smdebug/pytorch; smdebug/tensorflow | hook.py; session.py | 66:79 (3%); 25:38 (5%)
13 | 2 | config; config | buildspec.yml; buildspec_tensorflow_2_4.yml | 50:63 (25%); 50:63 (25%)
13 | 2 | smdebug/mxnet; smdebug/tensorflow | hook.py; base_hook.py | 35:47 (7%); 69:81 (3%)
13 | 2 | smdebug/mxnet; smdebug/tensorflow | hook.py; session.py | 30:42 (7%); 26:38 (4%)
13 | 2 | smdebug/pytorch; smdebug/tensorflow | collection.py; collection.py | 8:20 (33%); 19:31 (11%)
13 | 2 | smdebug/pytorch; smdebug/tensorflow | hook.py; base_hook.py | 72:84 (2%); 69:81 (3%)
12 | 2 | smdebug/profiler/analysis/utils; smdebug/profiler/analysis/utils | profiler_data_to_pandas.py; profiler_data_to_pandas.py | 330:341 (3%); 481:492 (3%)
12 | 2 | config; config | buildspec.yml; buildspec_tensorflow_2_3.yml | 18:32 (23%); 23:37 (24%)
12 | 2 | config; config | buildspec.yml; buildspec_tensorflow_2_4.yml | 18:32 (23%); 18:32 (23%)
11 | 2 | smdebug/profiler/analysis/notebook_utils; smdebug/profiler/analysis/notebook_utils | metrics_histogram.py; step_histogram.py | 124:136 (9%); 112:124 (11%)
11 | 2 | smdebug/pytorch; smdebug/tensorflow | hook.py; base_hook.py | 86:96 (2%); 83:93 (2%)
11 | 2 | smdebug/profiler/analysis/utils; smdebug/profiler/analysis/utils | merge_timelines.py; merge_timelines.py | 273:284 (4%); 291:302 (4%)
11 | 2 | smdebug/tensorflow; smdebug/tensorflow | keras.py; keras.py | 1027:1038 (1%); 1396:1407 (1%)
11 | 2 | smdebug/mxnet; smdebug/tensorflow | hook.py; base_hook.py | 49:59 (6%); 83:93 (2%)
11 | 2 | smdebug/tensorflow; smdebug/tensorflow | base_hook.py; session.py | 83:93 (2%); 79:89 (4%)
Duplicated Units
The list of top 3 duplicated units.
Size | # | Folders | Files | Lines
11 | 3 | smdebug/pytorch; smdebug/mxnet; smdebug/tensorflow | hook.py; hook.py; session.py | 0:0; 0:0; 0:0
8 | 2 | smdebug/core/tfevent; smdebug/core/tfrecord | event_file_reader.py; tensor_reader.py | 0:0; 0:0
7 | 3 | smdebug/pytorch; smdebug/core; smdebug/tensorflow | collection.py; collection.py; collection.py | 0:0; 0:0; 0:0