aws-samples / aws-big-data-blog
Duplication

Places in code with 6 or more lines that are exactly the same.

Intro
  • For duplication, we look at places in code where there are 6 or more lines of code that are exactly the same.
  • Before duplication is calculated, the code is cleaned to remove empty lines, comments, and frequently duplicated constructs such as imports.
  • You should aim to have as little duplicated code as possible (<5%), as a high level of duplication can lead to maintenance difficulties, poor factoring, and logical contradictions.
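The measurement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis tool's actual implementation: the cleaning rules (dropping lines that start with `#`, `//`, `import ` or `from `) and the function names `clean`, `find_duplicated_lines`, and `duplication_percent` are hypothetical and chosen for the example.

```python
from collections import defaultdict

MIN_BLOCK = 6  # threshold stated in the report: 6 or more identical lines


def clean(lines):
    """Drop blank lines, comments, and import statements (simplified, assumed rules)."""
    kept = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # empty line
        if line.startswith(("#", "//")):
            continue  # single-line comment (assumed comment markers)
        if line.startswith(("import ", "from ")):
            continue  # frequently duplicated construct
        kept.append(line)
    return kept


def find_duplicated_lines(files, min_block=MIN_BLOCK):
    """Return (cleaned, marked): cleaned lines per file and, per file,
    the set of cleaned-line indexes covered by a repeated block."""
    cleaned = {name: clean(lines) for name, lines in files.items()}
    windows = defaultdict(list)  # block of min_block lines -> [(file, start)]
    for name, lines in cleaned.items():
        for start in range(len(lines) - min_block + 1):
            key = tuple(lines[start:start + min_block])
            windows[key].append((name, start))
    marked = {name: set() for name in cleaned}
    for occurrences in windows.values():
        if len(occurrences) > 1:  # the same block appears more than once
            for name, start in occurrences:
                marked[name].update(range(start, start + min_block))
    return cleaned, marked


def duplication_percent(files):
    """Share of cleaned lines that belong to at least one repeated block."""
    cleaned, marked = find_duplicated_lines(files)
    total = sum(len(lines) for lines in cleaned.values())
    duplicated = sum(len(indexes) for indexes in marked.values())
    return 100.0 * duplicated / total if total else 0.0
```

Sliding a fixed window of 6 cleaned lines and then merging the marked indexes means that longer repeated runs are counted once per line rather than once per window. A real tool would likely use language-specific cleaning rules.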
Duplication Overall
  • 12% duplication:
    • 9,771 lines of cleaned code (without empty lines, comments, and frequently duplicated constructs such as imports)
    • 1,213 duplicated lines
  • 74 duplicates
system: 12% (1,213 lines)
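The overall figure follows directly from the two line counts above. A minimal check, assuming the report rounds the ratio to the nearest whole percent (the function name `percent_duplicated` is hypothetical):

```python
def percent_duplicated(duplicated_lines, cleaned_lines):
    """Duplicated lines as a percentage of cleaned lines."""
    return 100 * duplicated_lines / cleaned_lines


# Figures taken from the report: 1,213 duplicated lines out of 9,771 cleaned lines.
overall = percent_duplicated(1213, 9771)
print(f"{overall:.1f}%")  # 12.4%
print(round(overall))     # 12
```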
Duplication per Extension
yml: 35% (375 lines)
html: 30% (210 lines)
java: 5% (196 lines)
R: 49% (178 lines)
py: 18% (152 lines)
js: 2% (62 lines)
scala: 10% (28 lines)
yaml: 14% (12 lines)
Duplication per Component (primary)
BDB-701-Moving-From-Transactional-to-Stateful-Batch-Processing: 31% (435 lines)
aws-blog-sparkr-geospatial-analysis: 49% (178 lines)
aws-blog-lambda-map-reduce-streaming: 15% (120 lines)
aws-blog-kinesis-data-visualization: 6% (117 lines)
aws-blog-multi-az-multi-region-redshift: 49% (98 lines)
aws-blog-jobserver-emr: 16% (68 lines)
aws-blog-mirth-healthcare-hub: 26% (62 lines)
aws-blog-firehose-lambda-elasticsearch-near-real-time-discovery-platform: 8% (32 lines)
aws-blog-real-time-in-memory-oltp-and-analytics-with-apache-ignite: 9% (25 lines)
aws-blog-hbase-on-emr: 2% (24 lines)
aws-blog-vpcflowlogs-athena-quicksight: 4% (14 lines)
aws-blog-nodejs-on-emr: 17% (12 lines)
aws-blog-s3-index-with-lambda-ddb: 3% (12 lines)
aws-blog-kinesis-beanstalk-workers: 2% (10 lines)
aws-blog-kinesis-storm-clickstream-app: 2% (6 lines)
aws-blog-firehose-lambda: 0% (0 lines)
aws-blog-end-to-end-emrencyrption: 0% (0 lines)
aws-blog-spark-parquet-conversion: 0% (0 lines)
aws-blog-campanile: 0% (0 lines)
aws-blog-retain-redshift-stl: 0% (0 lines)
aws-blog-strata2016: 0% (0 lines)
aws-blog-sqoop: 0% (0 lines)
aws-blog-sparkstreaming-from-kafka: 0% (0 lines)
aws-blog-machine-learning-with-spark: 0% (0 lines)
aws-blog-bigtop-application-emr: 0% (0 lines)
aws-blog-analyzing-using-ipython-notebook: 0% (0 lines)
aws-blog-real-time-clickstream: 0% (0 lines)
aws-blog-event-driven-batch-analytics: 0% (0 lines)
aws-blog-elasticsearch-cascading-commoncrawl: 0% (0 lines)
aws-blog-redshift-python-udf-tutorial: 0% (0 lines)
aws-blog-emrencyrption: 0% (0 lines)
aws-blog-athena-genomics: 0% (0 lines)
aws-blog-athena-importing-hive-metastores: 0% (0 lines)
aws-blog-titan-graph-database: 0% (0 lines)
aws-blog-iot-athena-quicksight-bi: 0% (0 lines)

Duplication Between Components (50+ lines)

[Graph: aws-blog-kinesis-data-visualization -- aws-blog-real-time-in-memory-oltp-and-analytics-with-apache-ignite (edge label: 50)]

Longest Duplicates
The list of 20 longest duplicates.
Size | # | Folders | Files | Lines
67 | 2 | aws-blog-sparkr-geospatial-analysis; aws-blog-sparkr-geospatial-analysis | SparkRGeoInt.R; SparkRGeoIntPipes.R | 23:89 (37%); 27:93 (40%)
28 | 2 | aws-blog-kinesis-data-visualization/Globe; aws-blog-kinesis-data-visualization/Globe | globe.html; heatmap.html | 1:32 (70%); 1:32 (45%)
23 | 2 | BDB-701-Moving-From-Tran...ateful-Batch-Processing; BDB-701-Moving-From-Tran...ateful-Batch-Processing | template.yml; template.yml | 653:675 (2%); 691:713 (2%)
23 | 2 | BDB-701-Moving-From-Tran...ateful-Batch-Processing; BDB-701-Moving-From-Tran...ateful-Batch-Processing | template.yml; template.yml | 1027:1049 (2%); 1076:1098 (2%)
23 | 2 | BDB-701-Moving-From-Tran...ateful-Batch-Processing; BDB-701-Moving-From-Tran...ateful-Batch-Processing | template.yml; template.yml | 613:635 (2%); 691:713 (2%)
23 | 2 | BDB-701-Moving-From-Tran...ateful-Batch-Processing; BDB-701-Moving-From-Tran...ateful-Batch-Processing | template.yml; template.yml | 613:635 (2%); 653:675 (2%)
21 | 2 | aws-blog-multi-az-multi-region-redshift; aws-blog-multi-az-multi-region-redshift | loader.py; loader.py | 136:156 (10%); 197:217 (10%)
19 | 2 | aws-blog-lambda-map-reduce-streaming; aws-blog-lambda-map-reduce-streaming | index.html; scatter.html | 48:66 (23%); 208:226 (8%)
18 | 2 | aws-blog-kinesis-data-vi...s/dataviz/kinesisclient; aws-blog-real-time-in-me...mostreams/clientlibrary | KinesisRecordProcessor.java; StreamsRecordProcessor.java | 172:200 (20%); 116:141 (19%)
16 | 2 | aws-blog-lambda-map-reduce-streaming; aws-blog-lambda-map-reduce-streaming | index.html; scatter.html | 1:16 (19%); 1:16 (7%)
16 | 2 | BDB-701-Moving-From-Tran...m/amazon/aws/blog/model; BDB-701-Moving-From-Tran...com/amazon/aws/blog/ddb | StatefulArtifactIndex.java; StatefulArtifactIndex.java | 13:32 (100%); 18:49 (100%)
14 | 2 | BDB-701-Moving-From-Tran...mazon/aws/blog/function; BDB-701-Moving-From-Tran...ava/com/amazon/aws/blog | PreFetcherFlatMapFunction.java; StateIndexerLambdaImpl.java | 64:81 (20%); 279:296 (9%)
14 | 2 | aws-blog-lambda-map-reduce-streaming; aws-blog-lambda-map-reduce-streaming | index.html; scatter.html | 33:46 (17%); 29:42 (6%)
14 | 2 | BDB-701-Moving-From-Tran...ateful-Batch-Processing; BDB-701-Moving-From-Tran...ateful-Batch-Processing | template.yml; template.yml | 1006:1019 (1%); 1052:1065 (1%)
13 | 2 | aws-blog-multi-az-multi-region-redshift; aws-blog-multi-az-multi-region-redshift | loader.py; loader.py | 173:186 (6%); 234:246 (6%)
11 | 2 | BDB-701-Moving-From-Tran...ateful-Batch-Processing; BDB-701-Moving-From-Tran...ateful-Batch-Processing | template.yml; template.yml | 823:833 (1%); 902:912 (1%)
11 | 2 | BDB-701-Moving-From-Tran...ateful-Batch-Processing; BDB-701-Moving-From-Tran...ateful-Batch-Processing | template.yml; template.yml | 823:833 (1%); 876:886 (1%)
11 | 2 | aws-blog-lambda-map-reduce-streaming; aws-blog-lambda-map-reduce-streaming | scatter.html; scatter.html | 89:99 (5%); 167:177 (5%)
11 | 2 | aws-blog-mirth-healthcare-hub/bb-webapp; aws-blog-mirth-healthcare-hub/hl7-webapp | bb-conversion-listener.js; service-xml-to-json.js | 11:26 (45%); 12:25 (37%)
11 | 2 | BDB-701-Moving-From-Tran...ateful-Batch-Processing; BDB-701-Moving-From-Tran...ateful-Batch-Processing | template.yml; template.yml | 876:886 (1%); 902:912 (1%)