awslabs / aws-serverless-data-lake-framework
Duplication

Places in code with 6 or more lines that are exactly the same.

Intro
  • Duplication is measured by finding places in the code where 6 or more lines are exactly the same.
  • Before duplication is calculated, the code is cleaned to remove empty lines, comments, and frequently duplicated constructs such as imports.
  • Aim to keep duplicated code as low as possible (<5%), because a high level of duplication can lead to maintenance difficulties, poor factoring, and logical contradictions.
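The two steps described above (clean the code, then look for runs of 6 or more identical lines) can be sketched as follows. This is a minimal illustration, not the analysis tool's actual implementation: the function names `clean` and `find_duplicates` are hypothetical, and the cleaning rules (Python-style `#` comments, `import`/`from` lines) are assumptions chosen for the example.

```python
from collections import defaultdict

MIN_BLOCK = 6  # threshold stated in the report: 6 or more identical lines


def clean(lines, comment_prefix="#"):
    """Drop empty lines, comment-only lines, and import statements (assumed rules).

    Returns (original line number, stripped text) pairs for the lines that remain.
    """
    kept = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text or text.startswith(comment_prefix):
            continue
        if text.startswith(("import ", "from ")):
            continue
        kept.append((number, text))
    return kept


def find_duplicates(files, min_block=MIN_BLOCK):
    """Group locations whose next `min_block` cleaned lines are identical.

    `files` maps a file name to its list of source lines. Each returned group
    is a list of (file name, original start line) tuples.
    """
    windows = defaultdict(list)
    for name, lines in files.items():
        cleaned = clean(lines)
        for i in range(len(cleaned) - min_block + 1):
            block = "\n".join(text for _, text in cleaned[i:i + min_block])
            windows[block].append((name, cleaned[i][0]))
    return [locations for locations in windows.values() if len(locations) > 1]


files = {
    "a.py": ["import os", "", "# setup", "x = 1", "y = 2", "z = 3", "w = 4", "v = 5", "u = 6"],
    "b.py": ["x = 1", "y = 2", "z = 3", "w = 4", "v = 5", "u = 6", "t = 7"],
}
print(find_duplicates(files))
```

The sketch reports fixed-size windows only, so a longer duplicate shows up as several overlapping groups; a real tool would merge those into a single longer duplicate.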
Duplication Overall
  • 67% duplication:
    • 33,761 lines of cleaned code (without empty lines, comments, and frequently duplicated constructs such as imports)
    • 22,817 duplicated lines
  • 7,491 duplicates
system: 67% (22,817 lines)
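The overall figure follows from the two line counts above. A small sketch of the arithmetic, assuming the percentage is simply duplicated lines divided by cleaned lines (the helper name `duplication_percent` is hypothetical):

```python
def duplication_percent(duplicated_lines, cleaned_lines):
    """Share of cleaned lines that belong to a duplicate, as a percentage."""
    if cleaned_lines == 0:
        return 0.0  # e.g. a component with no cleaned lines
    return 100.0 * duplicated_lines / cleaned_lines


overall = duplication_percent(22_817, 33_761)
print(f"{overall:.1f}%")
```

The result is about 67.6%, so the reported 67% appears to be truncated rather than rounded to the nearest whole percent.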
Duplication per Extension
yaml: 76% (18,223 lines)
py: 48% (4,420 lines)
yml: 41% (113 lines)
scala: 21% (61 lines)
Duplication per Component (primary)
sdlf-utils: 75% (15,482 lines)
sdlf-team: 94% (2,479 lines)
sdlf-foundations: 31% (1,372 lines)
sdlf-stageB: 99% (820 lines)
sdlf-stageA: 100% (782 lines)
sdlf-cicd: 36% (676 lines)
sdlf-datalakeLibrary: 34% (629 lines)
sdlf-pipeline: 99% (326 lines)
sdlf-dataset: 100% (138 lines)
thirdparty-scms: 41% (113 lines)
sdlf-pipLibrary: 0% (0 lines)

Duplication Between Components (50+ lines)

Duplicated lines shared between each pair of components:
  • sdlf-team -- sdlf-utils: 8,255 lines
  • sdlf-stageA -- sdlf-team: 168 lines
  • sdlf-stageA -- sdlf-utils: 5,747 lines
  • sdlf-stageA -- sdlf-stageB: 971 lines
  • sdlf-stageB -- sdlf-team: 142 lines
  • sdlf-stageB -- sdlf-utils: 5,641 lines
  • sdlf-foundations -- sdlf-team: 390 lines
  • sdlf-foundations -- sdlf-utils: 2,298 lines
  • sdlf-foundations -- sdlf-stageA: 378 lines
  • sdlf-foundations -- sdlf-stageB: 372 lines
  • sdlf-cicd -- sdlf-team: 865 lines
  • sdlf-cicd -- sdlf-utils: 1,943 lines
  • sdlf-cicd -- sdlf-stageA: 89 lines
  • sdlf-cicd -- sdlf-stageB: 89 lines
  • sdlf-cicd -- sdlf-foundations: 406 lines
  • sdlf-cicd -- sdlf-pipeline: 65 lines
  • sdlf-pipeline -- sdlf-team: 168 lines
  • sdlf-pipeline -- sdlf-utils: 1,455 lines
  • sdlf-datalakeLibrary -- sdlf-utils: 1,248 lines
  • sdlf-dataset -- sdlf-utils: 633 lines
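The component pairs above form an undirected graph whose edges carry the shared duplicated-line counts. As a rough sketch, such pairs could be rendered in Graphviz DOT syntax like this; the function name `to_dot`, the tuple layout, and the label styling are assumptions made for illustration, and only three of the pairs are included.

```python
def to_dot(edges, min_lines=50):
    """Render (component_a, component_b, duplicated_lines) tuples as an undirected DOT graph.

    Pairs sharing fewer than `min_lines` duplicated lines are skipped,
    mirroring the 50+ lines cut-off in the section title.
    """
    rows = ["graph G {"]
    for a, b, lines in sorted(edges, key=lambda edge: -edge[2]):
        if lines >= min_lines:
            rows.append(f'  "{a}" -- "{b}" [label="{lines}"];')
    rows.append("}")
    return "\n".join(rows)


sample_edges = [
    ("sdlf-team", "sdlf-utils", 8255),
    ("sdlf-stageA", "sdlf-utils", 5747),
    ("sdlf-cicd", "sdlf-pipeline", 65),
]
print(to_dot(sample_edges))
```

The resulting text can be passed to any Graphviz renderer to draw the graph.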

Longest Duplicates
The 20 longest duplicates, out of 7,491 duplicates in total.
Size | # | Folder | File | Lines
1307 | 2 | sdlf-utils/ingestion-exa...sdlf-team/nested-stacks | template-iam.yaml | 1:1322 (100%)
     |   | sdlf-utils/pipeline-exam...sdlf-team/nested-stacks | template-iam.yaml | 1:1322 (100%)
947 | 2 | sdlf-team/nested-stacks | template-cicd.yaml | 1:979 (100%)
     |   | sdlf-utils/ingestion-exa...sdlf-team/nested-stacks | template-cicd.yaml | 1:979 (100%)
536 | 2 | sdlf-team/nested-stacks | template-iam.yaml | 610:1153 (47%)
     |   | sdlf-utils/pipeline-examples/topic-modelling | template-iam.yaml | 643:1186 (45%)
380 | 2 | sdlf-utils/pipeline-exam...sdlf-team/nested-stacks | template-iam.yaml | 936:1322 (29%)
     |   | sdlf-utils/pipeline-examples/topic-modelling | template-iam.yaml | 800:1186 (32%)
380 | 2 | sdlf-utils/ingestion-exa...sdlf-team/nested-stacks | template-iam.yaml | 936:1322 (29%)
     |   | sdlf-utils/pipeline-examples/topic-modelling | template-iam.yaml | 800:1186 (32%)
380 | 2 | sdlf-team/nested-stacks | template-iam.yaml | 767:1153 (33%)
     |   | sdlf-utils/pipeline-exam...sdlf-team/nested-stacks | template-iam.yaml | 936:1322 (29%)
380 | 2 | sdlf-team/nested-stacks | template-iam.yaml | 767:1153 (33%)
     |   | sdlf-utils/ingestion-exa...sdlf-team/nested-stacks | template-iam.yaml | 936:1322 (29%)
359 | 2 | sdlf-team/nested-stacks | template-iam.yaml | 248:609 (31%)
     |   | sdlf-utils/pipeline-examples/topic-modelling | template-iam.yaml | 280:641 (30%)
319 | 2 | sdlf-stageA | template.yaml | 102:441 (53%)
     |   | sdlf-utils/pipeline-exam.../topic-modelling/stageA | template.yaml | 101:440 (53%)
246 | 2 | sdlf-utils/pipeline-examples/manifests/stageA | template.yaml | 95:355 (27%)
     |   | sdlf-utils/pipeline-exam.../topic-modelling/stageA | template.yaml | 98:358 (41%)
244 | 2 | sdlf-stageA | template.yaml | 102:359 (41%)
     |   | sdlf-utils/pipeline-examples/manifests/stageA | template.yaml | 98:355 (27%)
222 | 2 | sdlf-datalakeLibrary/pyt...lake_library/interfaces | dynamo_interface.py | 47:312 (88%)
     |   | sdlf-utils/pipeline-exam...lake_library/interfaces | dynamo_interface.py | 58:323 (86%)
212 | 2 | sdlf-utils/ingestion-exa...-pipeline/nested-stacks | template-statemachine.yaml | 1:218 (100%)
     |   | sdlf-utils/pipeline-exam...-pipeline/nested-stacks | template-statemachine.yaml | 1:218 (100%)
198 | 2 | sdlf-utils/ingestion-exa...sdlf-team/nested-stacks | template-iam.yaml | 379:576 (15%)
     |   | sdlf-utils/pipeline-examples/topic-modelling | template-iam.yaml | 323:520 (16%)
198 | 2 | sdlf-team/nested-stacks | template-iam.yaml | 291:488 (17%)
     |   | sdlf-utils/ingestion-exa...sdlf-team/nested-stacks | template-iam.yaml | 379:576 (15%)
198 | 2 | sdlf-team/nested-stacks | template-iam.yaml | 291:488 (17%)
     |   | sdlf-utils/pipeline-exam...sdlf-team/nested-stacks | template-iam.yaml | 379:576 (15%)
198 | 2 | sdlf-utils/pipeline-exam...sdlf-team/nested-stacks | template-iam.yaml | 379:576 (15%)
     |   | sdlf-utils/pipeline-examples/topic-modelling | template-iam.yaml | 323:520 (16%)
181 | 2 | sdlf-team/nested-stacks | template-iam.yaml | 1:184 (15%)
     |   | sdlf-utils/pipeline-examples/topic-modelling | template-iam.yaml | 1:184 (15%)
180 | 2 | sdlf-utils/ingestion-exa...sqoop/sdlf-team/scripts | template-team-repos.yaml | 1:186 (100%)
     |   | sdlf-utils/pipeline-exam...ncies/sdlf-team/scripts | template-team-repos.yaml | 1:186 (100%)
175 | 2 | sdlf-utils/ingestion-examples/sqoop/sdlf-pipeline | template.yaml | 1:181 (100%)
     |   | sdlf-utils/pipeline-exam...lf-engineering-pipeline | template.yaml | 1:181 (92%)
Duplicated Units
The top 20 duplicated units, out of 40 unit duplicates in total.
Size | # | Folder | File | Lines
125 | 2 | sdlf-utils/pipeline-exam...orms/stage_a_transforms | light_transform_athena_ctas_sqoop.py | 0:0
     |   | sdlf-utils/pipeline-exam...orms/stage_a_transforms | light_transform_athena_ctas.py | 0:0
64 | 2 | sdlf-utils/pipeline-exam...lake_library/interfaces | dynamo_interface.py | 0:0
     |   | sdlf-datalakeLibrary/pyt...lake_library/interfaces | dynamo_interface.py | 0:0
63 | 2 | sdlf-utils/pipeline-exam...lake_library/interfaces | dynamo_interface.py | 0:0
     |   | sdlf-datalakeLibrary/pyt...lake_library/interfaces | dynamo_interface.py | 0:0
63 | 2 | sdlf-stageB/lambda/stage-b-routing/src | lambda_function.py | 0:0
     |   | sdlf-utils/pipeline-exam...bda/stage-b-routing/src | lambda_function.py | 0:0
46 | 3 | sdlf-utils/pipeline-exam...-preupdate-metadata/src | lambda_function.py | 0:0
     |   | sdlf-utils/pipeline-exam...-preupdate-metadata/src | lambda_function.py | 0:0
     |   | sdlf-stageA/lambda/stage-a-preupdate-metadata/src | lambda_function.py | 0:0
36 | 2 | sdlf-stageB/lambda/stage-b-crawl-data/src | lambda_function.py | 0:0
     |   | sdlf-utils/pipeline-exam.../stage-b-crawl-data/src | lambda_function.py | 0:0
26 | 2 | sdlf-stageB/lambda/stage-b-redrive/src | lambda_function.py | 0:0
     |   | sdlf-utils/pipeline-exam...bda/stage-b-redrive/src | lambda_function.py | 0:0
28 | 2 | sdlf-utils/data_lake_testing_pipeline/src/lambdas | pickTest.py | 0:0
     |   | sdlf-utils/pipeline-testing/src/lambdas | pickTest.py | 0:0
26 | 2 | sdlf-utils/pipeline-exam...lake_library/interfaces | dynamo_interface.py | 0:0
     |   | sdlf-datalakeLibrary/pyt...lake_library/interfaces | dynamo_interface.py | 0:0
30 | 2 | sdlf-utils/pipeline-exam...lake_library/interfaces | sqs_interface.py | 0:0
     |   | sdlf-datalakeLibrary/pyt...lake_library/interfaces | sqs_interface.py | 0:0
24 | 2 | sdlf-utils/data_lake_testing_pipeline/src/lambdas | nextTest.py | 0:0
     |   | sdlf-utils/pipeline-testing/src/lambdas | nextTest.py | 0:0
23 | 2 | sdlf-utils/pipeline-exam...postupdate-metadata/src | lambda_function.py | 0:0
     |   | sdlf-utils/pipeline-exam...a/stage-b-check-job/src | lambda_function.py | 0:0
19 | 2 | sdlf-utils/pipeline-exam...bda/stage-a-redrive/src | lambda_function.py | 0:0
     |   | sdlf-stageA/lambda/stage-a-redrive/src | lambda_function.py | 0:0
24 | 2 | sdlf-foundations/lambda/elasticsearch/src | lambda_function.py | 0:0
     |   | sdlf-foundations/lambda/topic/src | lambda_function.py | 0:0
17 | 2 | sdlf-foundations/lambda/catalog-redrive/src | lambda_function.py | 0:0
     |   | sdlf-foundations/lambda/routing-redrive/src | lambda_function.py | 0:0
17 | 2 | sdlf-utils/pipeline-exam...lake_library/interfaces | sqs_interface.py | 0:0
     |   | sdlf-datalakeLibrary/pyt...lake_library/interfaces | sqs_interface.py | 0:0
14 | 2 | sdlf-utils/pipeline-exam...bda/stage-a-routing/src | lambda_function.py | 0:0
     |   | sdlf-stageA/lambda/stage-a-routing/src | lambda_function.py | 0:0
13 | 2 | sdlf-foundations/scripts/deequ/resources | deequ-controller.py | 0:0
     |   | sdlf-utils/ingestion-exa...dms-replication/scripts | DMSCDC_Controller.py | 0:0
13 | 2 | sdlf-utils/pipeline-exam...lake_library/interfaces | states_interface.py | 0:0
     |   | sdlf-utils/ingestion-exa...lake_library/interfaces | states_interface.py | 0:0
12 | 2 | sdlf-foundations/scripts/deequ/resources | deequ-profile-runner.scala | 93:106
     |   | sdlf-foundations/scripts/deequ/resources | deequ-analysis-verification-runner.scala | 305:318