aws-samples / amazon-sagemaker-build-train-deploy
Duplication

Places in code with 6 or more lines that are exactly the same.

Intro
  • For duplication, we look at places in code where there are 6 or more lines of code that are exactly the same.
  • Before duplication is calculated, the code is cleaned to remove empty lines, comments, and frequently duplicated constructs such as imports.
  • Aim for as little duplicated code as possible (<5%), because a high level of duplication can lead to maintenance difficulties, poor factoring, and logical contradictions.
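
The counting rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the analyzer's actual implementation: the names `clean` and `find_duplicate_blocks` are hypothetical, and the cleaning pass only recognizes Python-style `#` comments and import lines.

```python
from collections import defaultdict

MIN_BLOCK = 6  # threshold stated in the intro: 6 or more identical lines


def clean(source: str) -> list[str]:
    """Drop empty lines, '#' comments, and import statements (simplified cleaning)."""
    kept = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(("import ", "from ")):
            continue
        kept.append(line)
    return kept


def find_duplicate_blocks(files: dict[str, str], min_block: int = MIN_BLOCK):
    """Group (file, start) positions whose next `min_block` cleaned lines match exactly.

    `start` is an index into the cleaned lines, not the original file.
    """
    windows = defaultdict(list)
    for name, source in files.items():
        lines = clean(source)
        for start in range(len(lines) - min_block + 1):
            key = tuple(lines[start:start + min_block])
            windows[key].append((name, start))
    # Keep only windows that occur in more than one place.
    return [locations for locations in windows.values() if len(locations) > 1]
```

Calling `find_duplicate_blocks({"a.py": text_a, "b.py": text_b})` returns one group per repeated 6-line window. A real tool would also merge overlapping windows into longer blocks, which this sketch leaves out.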
Duplication Overall
  • 52% duplication:
    • 1,495 lines of cleaned code (without empty lines, comments, and frequently duplicated constructs such as imports)
    • 781 duplicated lines
  • 17 duplicates
  • system: 52% (781 lines)
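
The overall figure follows directly from the two line counts above. A small sketch of the arithmetic, assuming the percentage is simply duplicated lines divided by cleaned lines (the helper name `duplication_percent` is hypothetical):

```python
def duplication_percent(duplicated_lines: int, cleaned_lines: int) -> float:
    """Share of cleaned lines that belong to at least one duplicate, in percent."""
    if cleaned_lines <= 0:
        raise ValueError("cleaned_lines must be positive")
    return 100.0 * duplicated_lines / cleaned_lines


# Figures taken from the report: 781 duplicated lines out of 1,495 cleaned lines.
overall = duplication_percent(781, 1495)

# The report's guideline is to stay below 5% duplication.
TARGET_PERCENT = 5.0
exceeds_target = overall >= TARGET_PERCENT

print(f"{overall:.0f}% duplicated, exceeds target: {exceeds_target}")
```

The per-extension line counts reported below (402 for yaml, 379 for py) add up to the same 781 duplicated lines.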
Duplication per Extension
  • yaml: 52% (402 lines)
  • py: 51% (379 lines)
Duplication per Component (primary)
  • setup: 52% (402 lines)
  • 03_train_model/source_dir: 92% (107 lines)
  • 08_projects/modelbuild/pipelines/endtoendmlsm/train: 98% (56 lines)
  • 08_projects/modelbuild/pipelines/endtoendmlsm/deploy: 94% (55 lines)
  • 08_projects/modelbuild/pipelines/endtoendmlsm/dataprep: 98% (53 lines)
  • 02_data_exploration_and_feature_eng/source_dir: 100% (53 lines)
  • 04_deploy_model/sklearn_source_dir: 100% (28 lines)
  • 04_deploy_model/xgboost_source_dir: 100% (27 lines)
  • 08_projects/modelbuild/pipelines/endtoendmlsm: 0% (0 lines)
  • 04_deploy_model: 0% (0 lines)
  • 02_data_exploration_and_feature_eng: 0% (0 lines)

Duplication Between Components (50+ lines)

  • 03_train_model/source_dir -- 08_projects/modelbuild/pipelines/endtoendmlsm/train: 163 lines
  • 02_data_exploration_and_feature_eng/source_dir -- 08_projects/modelbuild/pipelines/endtoendmlsm/dataprep: 106 lines
  • 04_deploy_model/sklearn_source_dir -- 08_projects/modelbuild/pipelines/endtoendmlsm/deploy: 56 lines
  • 04_deploy_model/xgboost_source_dir -- 08_projects/modelbuild/pipelines/endtoendmlsm/deploy: 54 lines
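
The component pairs above form a small undirected graph, with the duplicated line count as the edge weight. One way to express that as Graphviz DOT text is sketched here. The helper `to_dot` and the edge-label layout are assumptions made for illustration and may differ from the file the report itself generates.

```python
# (component A, component B, duplicated lines), copied from the report.
PAIRS = [
    ("03_train_model/source_dir",
     "08_projects/modelbuild/pipelines/endtoendmlsm/train", 163),
    ("02_data_exploration_and_feature_eng/source_dir",
     "08_projects/modelbuild/pipelines/endtoendmlsm/dataprep", 106),
    ("04_deploy_model/sklearn_source_dir",
     "08_projects/modelbuild/pipelines/endtoendmlsm/deploy", 56),
    ("04_deploy_model/xgboost_source_dir",
     "08_projects/modelbuild/pipelines/endtoendmlsm/deploy", 54),
]


def to_dot(edges, name: str = "G") -> str:
    """Render undirected, labeled edges as Graphviz DOT source."""
    lines = [f"graph {name} {{"]
    for left, right, count in edges:
        # Quote node names because they contain slashes.
        lines.append(f'  "{left}" -- "{right}" [label="{count}"];')
    lines.append("}")
    return "\n".join(lines)


dot_source = to_dot(PAIRS)
print(dot_source)
```

The printed text can be pasted into any Graphviz renderer to draw the graph.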

Longest Duplicates
The list of 17 longest duplicates.
Each entry gives Size x #, then Folder/File and Lines (%) for each copy.
  • 91 x 2
    • setup/create_studio_domain.yaml, lines 189:282 (33%)
    • setup/create_studio_domain_vpc.yaml, lines 283:376 (18%)
  • 67 x 2
    • setup/create_studio_domain.yaml, lines 1:69 (24%)
    • setup/create_studio_domain_vpc.yaml, lines 1:69 (13%)
  • 56 x 2
    • 03_train_model/source_dir/training.py, lines 11:86 (100%)
    • 08_projects/modelbuild/p...ines/endtoendmlsm/train/train.py, lines 11:86 (100%)
  • 53 x 2
    • 02_data_exploration_and_feature_eng/source_dir/preprocessor.py, lines 6:97 (100%)
    • 08_projects/modelbuild/p...s/endtoendmlsm/dataprep/preprocess.py, lines 6:97 (100%)
  • 37 x 2
    • 03_train_model/source_dir/training_debug.py, lines 14:65 (61%)
    • 08_projects/modelbuild/p...ines/endtoendmlsm/train/train.py, lines 11:62 (66%)
  • 37 x 2
    • 03_train_model/source_dir/training.py, lines 11:62 (66%)
    • 03_train_model/source_dir/training_debug.py, lines 14:65 (61%)
  • 28 x 2
    • 04_deploy_model/sklearn_source_dir/inference.py, lines 14:48 (100%)
    • 08_projects/modelbuild/p...oendmlsm/deploy/sklearn/inference.py, lines 14:48 (100%)
  • 27 x 2
    • 04_deploy_model/xgboost_source_dir/inference.py, lines 8:41 (100%)
    • 08_projects/modelbuild/p...oendmlsm/deploy/xgboost/inference.py, lines 8:41 (100%)
  • 19 x 2
    • setup/create_studio_domain.yaml, lines 152:172 (7%)
    • setup/create_studio_domain_vpc.yaml, lines 246:266 (3%)
  • 14 x 2
    • 03_train_model/source_dir/training_debug.py, lines 71:86 (23%)
    • 08_projects/modelbuild/p...ines/endtoendmlsm/train/train.py, lines 64:79 (25%)
  • 14 x 2
    • 03_train_model/source_dir/training.py, lines 64:79 (25%)
    • 03_train_model/source_dir/training_debug.py, lines 71:86 (23%)
  • 13 x 2
    • setup/create_studio_domain.yaml, lines 174:187 (4%)
    • setup/create_studio_domain_vpc.yaml, lines 268:281 (2%)
  • 7 x 2
    • setup/create_studio_domain.yaml, lines 284:290 (2%)
    • setup/create_studio_domain_vpc.yaml, lines 532:538 (1%)
  • 6 x 2
    • setup/create_studio_domain.yaml, lines 266:271 (2%)
    • setup/create_studio_domain_vpc.yaml, lines 441:446 (1%)
  • 6 x 2
    • setup/create_studio_domain_vpc.yaml, lines 297:302 (1%)
    • setup/create_studio_domain_vpc.yaml, lines 439:444 (1%)
  • 6 x 2
    • setup/create_studio_domain_vpc.yaml, lines 360:365 (1%)
    • setup/create_studio_domain_vpc.yaml, lines 441:446 (1%)
  • 6 x 2
    • setup/create_studio_domain.yaml, lines 203:208 (2%)
    • setup/create_studio_domain_vpc.yaml, lines 439:444 (1%)
Duplicated Units
The list of top 6 duplicated units.
Each entry gives Size x #, then Folder/File for each copy. All line ranges are reported as 0:0.
  • 39 x 2
    • 08_projects/modelbuild/p...ines/endtoendmlsm/train/train.py
    • 03_train_model/source_dir/training.py
  • 13 x 3
    • 08_projects/modelbuild/p...ines/endtoendmlsm/train/train.py
    • 03_train_model/source_dir/training_debug.py
    • 03_train_model/source_dir/training.py
  • 13 x 2
    • 08_projects/modelbuild/p...oendmlsm/deploy/xgboost/inference.py
    • 04_deploy_model/xgboost_source_dir/inference.py
  • 10 x 2
    • 08_projects/modelbuild/p...oendmlsm/deploy/sklearn/inference.py
    • 04_deploy_model/sklearn_source_dir/inference.py
  • 8 x 2
    • 08_projects/modelbuild/p...oendmlsm/deploy/sklearn/inference.py
    • 04_deploy_model/sklearn_source_dir/inference.py
  • 7 x 2
    • 08_projects/modelbuild/p...oendmlsm/deploy/xgboost/inference.py
    • 04_deploy_model/xgboost_source_dir/inference.py