aws-samples / aws-analytics-reference-architecture
Duplication

Places in code with 6 or more lines that are exactly the same.

Intro
  • For duplication, we look at places in code where there are 6 or more lines of code that are exactly the same.
  • Before duplication is calculated, the code is cleaned to remove empty lines, comments, and frequently duplicated constructs such as imports.
  • You should aim to keep duplicated code as low as possible (<5%), because a high level of duplication can lead to maintenance difficulties, poor factoring, and logical contradictions.
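The detection rule described above (clean the code, then look for runs of 6 or more identical lines) can be sketched as follows. This is a minimal illustration, not the implementation used by the tool that produced this report: the cleaning filters (`#`/`//` whole-line comments, `import`/`from` lines) are simplified assumptions, and the names `clean`, `duplicated_lines`, and `duplication_stats` are hypothetical.

```python
from collections import defaultdict

MIN_BLOCK = 6  # report threshold: 6 or more identical cleaned lines

# Simplified cleaning rules (an assumption, not the report tool's exact filters).
COMMENT_PREFIXES = ("#", "//")
IMPORT_PREFIXES = ("import ", "from ")


def clean(source):
    """Return the stripped lines that survive cleaning."""
    kept = []
    for raw in source.splitlines():
        text = raw.strip()
        if not text or text.startswith(COMMENT_PREFIXES):
            continue  # drop empty lines and whole-line comments
        if text.startswith(IMPORT_PREFIXES):
            continue  # drop frequently duplicated constructs such as imports
        kept.append(text)
    return kept


def duplicated_lines(files):
    """Map each file name to the cleaned-line indexes that fall inside some
    MIN_BLOCK-line window occurring more than once (within or across files)."""
    windows = defaultdict(list)  # window contents -> [(file name, start index)]
    for name, source in files.items():
        lines = clean(source)
        for start in range(len(lines) - MIN_BLOCK + 1):
            windows[tuple(lines[start:start + MIN_BLOCK])].append((name, start))

    covered = defaultdict(set)
    for occurrences in windows.values():
        if len(occurrences) > 1:  # the same 6-line block appears at least twice
            for name, start in occurrences:
                # Sets merge overlapping windows of a longer duplicate.
                covered[name].update(range(start, start + MIN_BLOCK))
    return covered


def duplication_stats(files):
    """Return (duplicated cleaned lines, total cleaned lines)."""
    total = sum(len(clean(source)) for source in files.values())
    duplicated = sum(len(idx) for idx in duplicated_lines(files).values())
    return duplicated, total


# Usage: two files share a 6-line block; imports and comments are ignored.
a = "import os\nx = 1\ny = 2\nz = 3\nw = 4\nv = 5\nu = 6\nonly_in_a()\n"
b = "# header\nx = 1\ny = 2\nz = 3\nw = 4\nv = 5\nu = 6\n"
print(duplication_stats({"a.py": a, "b.py": b}))  # (12, 13)
```

Sliding a fixed 6-line window and unioning the covered indexes keeps the sketch short while still counting a 10-line duplicate as 10 lines, not as several overlapping 6-line matches.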
Duplication Overall
  • 17% duplication:
    • 10,475 lines of cleaned code (without empty lines, comments, and frequently duplicated constructs such as imports)
    • 1,842 duplicated lines
  • 179 duplicates
system: 17% (1,842 lines)
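The overall figure is simply duplicated lines divided by cleaned lines. The arithmetic below uses only the two counts reported above; the exact ratio is about 17.58%, so the displayed 17% is consistent with the tool truncating to a whole percent (rounding would give 18%), although that display rule is an assumption. The "budget" line is a hypothetical illustration of the <5% guideline from the Intro and holds the cleaned-line total fixed, which is a simplification.

```python
cleaned_lines = 10_475  # "lines of cleaned code" from the report
duplicated = 1_842      # "duplicated lines" from the report
target_pct = 5          # guideline from the Intro: aim for < 5%

share = duplicated / cleaned_lines * 100
print(f"{share:.2f}%")           # 17.58%
print(int(share), round(share))  # 17 18

# Duplicated-line budget at the target, holding the cleaned total fixed
# (a simplification: removing duplicates also shrinks the total).
budget = cleaned_lines * target_pct // 100
print(budget, duplicated - budget)  # 523 1319
```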
Duplication per Extension
  • py: 23% (1,202 lines)
  • ts: 14% (339 lines)
  • scala: 13% (139 lines)
  • yaml: 42% (104 lines)
  • java: 3% (58 lines)
Duplication per Component (primary)
  • refarch/aws-native/common: 26% (814 lines)
  • refarch/aws-native/dataviz: 33% (363 lines)
  • core/src/emr-eks-platform: 18% (186 lines)
  • core/src/data-generator: 26% (148 lines)
  • core/src/datasets: 21% (77 lines)
  • refarch/aws-native/streaming: 5% (58 lines)
  • core/src/synchronous-athena-query: 28% (46 lines)
  • refarch/aws-native/batch: 9% (40 lines)
  • refarch/aws-native/dwh: 5% (38 lines)
  • core/src: 7% (30 lines)
  • core/src/notebook-platform: 5% (16 lines)
  • core/src/synchronous-crawler: 12% (14 lines)
  • refarch/aws-native: 14% (12 lines)
  • refarch/aws-native/cicd: 0% (0 lines)
  • core/src/common: 0% (0 lines)
  • core/src/db-schema-manager: 0% (0 lines)
  • doc: 0% (0 lines)
  • doc/content/javascripts: 0% (0 lines)
  • doc/content/stylesheets: 0% (0 lines)
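The per-extension and per-component figures each sum exactly to the overall 1,842 duplicated lines, which suggests every duplicated line is attributed to exactly one extension and one primary component (an interpretation, not something the report states). A quick consistency check using only numbers copied from the two lists above; the dictionary names are illustrative:

```python
OVERALL = 1_842  # duplicated lines reported under "Duplication Overall"

per_extension = {"py": 1_202, "ts": 339, "scala": 139, "yaml": 104, "java": 58}

# Components with 0 duplicated lines are omitted.
per_component = {
    "refarch/aws-native/common": 814,
    "refarch/aws-native/dataviz": 363,
    "core/src/emr-eks-platform": 186,
    "core/src/data-generator": 148,
    "core/src/datasets": 77,
    "refarch/aws-native/streaming": 58,
    "core/src/synchronous-athena-query": 46,
    "refarch/aws-native/batch": 40,
    "refarch/aws-native/dwh": 38,
    "core/src": 30,
    "core/src/notebook-platform": 16,
    "core/src/synchronous-crawler": 14,
    "refarch/aws-native": 12,
}

# Both breakdowns account for the same total.
assert sum(per_extension.values()) == OVERALL
assert sum(per_component.values()) == OVERALL

# Largest single contributor and its share of all duplicated lines.
top = max(per_component, key=per_component.get)
print(top, f"{per_component[top] / OVERALL:.0%}")  # refarch/aws-native/common 44%
```

In other words, refarch/aws-native/common alone holds close to half of the duplicated lines, so it is the natural first target for refactoring.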

Duplication Between Components (50+ lines)

  • refarch/aws-native/common and refarch/aws-native/dataviz: 494 duplicated lines

Longest Duplicates
The list of 20 longest duplicates.
Size (duplicated lines x occurrences), then each occurrence as folder/file, line range (percentage):
  • 64 x 2
    • refarch/aws-native/common/common_cdk/batch_data_generator.py, 46:109 (15%)
    • refarch/aws-native/common/common_cdk/stream_data_generator.py, 63:126 (14%)
  • 58 x 2
    • refarch/aws-native/common/common_cdk/config.py, 327:384 (9%)
    • refarch/aws-native/datav...ft_cdk/dataviz_redshift/redshift_config.py, 154:211 (13%)
  • 39 x 2
    • refarch/aws-native/common/common_cdk/config.py, 162:206 (6%)
    • refarch/aws-native/datav...ft_cdk/dataviz_redshift/redshift_config.py, 10:54 (9%)
  • 38 x 2
    • refarch/aws-native/common/common_cdk/config.py, 407:444 (6%)
    • refarch/aws-native/datav...ft_cdk/dataviz_redshift/redshift_config.py, 238:275 (9%)
  • 36 x 2
    • refarch/aws-native/common/common_cdk/config.py, 629:664 (5%)
    • refarch/aws-native/datav...ft_cdk/dataviz_redshift/redshift_config.py, 456:491 (8%)
  • 26 x 2
    • refarch/aws-native/common/common_cdk/batch_data_generator.py, 122:158 (6%)
    • refarch/aws-native/common/common_cdk/stream_data_generator.py, 141:177 (5%)
  • 23 x 2
    • refarch/aws-native/common/common_cdk/config.py, 383:405 (3%)
    • refarch/aws-native/datav...ft_cdk/dataviz_redshift/redshift_config.py, 214:236 (5%)
  • 22 x 2
    • core/src/data-generator/data-generator.ts, 103:124 (8%)
    • core/src/data-generator/data-generator.ts, 318:339 (8%)
  • 22 x 2
    • core/src/data-generator/data-generator.ts, 109:130 (8%)
    • core/src/data-generator/data-generator.ts, 157:178 (8%)
  • 18 x 2
    • refarch/aws-native/common/common_cdk/config.py, 551:568 (2%)
    • refarch/aws-native/datav...ft_cdk/dataviz_redshift/redshift_config.py, 286:303 (4%)
  • 16 x 2
    • core/src/data-generator/data-generator.ts, 197:212 (5%)
    • core/src/data-generator/data-generator.ts, 236:251 (5%)
  • 16 x 2
    • core/src/data-generator/data-generator.ts, 157:172 (5%)
    • core/src/data-generator/data-generator.ts, 324:339 (5%)
  • 15 x 2
    • refarch/aws-native/common/common_cdk/batch_data_generator.py, 165:180 (3%)
    • refarch/aws-native/common/common_cdk/stream_data_generator.py, 184:199 (3%)
  • 15 x 2
    • refarch/aws-native/common/common_cdk/batch_data_generator.py, 307:321 (3%)
    • refarch/aws-native/common/common_cdk/stream_data_generator.py, 253:267 (3%)
  • 14 x 2
    • core/src/emr-eks-platfor...ambdas/managed-endpoint/lambda.py, 14:31 (18%)
    • core/src/synchronous-crawler/resources/lambdas/lambda.py, 10:25 (24%)
  • 14 x 2
    • refarch/aws-native/batch/glue-scripts/raw2clean_hudi.py, 59:84 (15%)
    • refarch/aws-native/batch/glue-scripts/raw2clean_parquet.py, 45:70 (33%)
  • 14 x 2
    • core/src/emr-eks-platfor...ambdas/managed-endpoint/lambda.py, 14:31 (18%)
    • core/src/synchronous-ath...query/resources/lambdas/lambda.py, 12:27 (25%)
  • 14 x 2
    • core/src/synchronous-ath...query/resources/lambdas/lambda.py, 12:27 (25%)
    • core/src/synchronous-crawler/resources/lambdas/lambda.py, 10:25 (24%)
  • 13 x 2
    • core/src/datasets/dataset.ts, 253:265 (5%)
    • core/src/datasets/partitioned-dataset.ts, 104:119 (17%)
  • 12 x 2
    • refarch/aws-native/common/common_cdk/stream_data_generator.py, 379:390 (2%)
    • refarch/aws-native/common/common_cdk/stream_data_generator.py, 479:490 (2%)
Duplicated Units
The list of top 2 duplicated units.
Size (unit lines x occurrences), then each occurrence as folder/file, line range:
  • 9 x 4
    • core/src/synchronous-crawler/resources/lambdas/lambda.py, 0:0
    • core/src/emr-eks-platfor...mbdas/nodegroup-asg-tag/lambda.py, 0:0
    • core/src/emr-eks-platfor...ambdas/managed-endpoint/lambda.py, 0:0
    • core/src/synchronous-ath...query/resources/lambdas/lambda.py, 0:0
  • 6 x 3
    • refarch/aws-native/strea...ara/streaming/operators/ItemParser.java, 18:24
    • refarch/aws-native/strea...ara/streaming/operators/ItemParser.java, 60:66
    • refarch/aws-native/strea...ara/streaming/operators/PromoParser.java, 18:24