aws-samples / aws-analytics-reference-architecture
File Size

The distribution of file sizes, measured in lines of code.

Intro
  • File size measurements show how file sizes are distributed across the codebase.
  • Files are classified into five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), and 1001+ (very long files).
  • It is good practice to keep files small. Long files can become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
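The five-way classification above can be sketched as a simple bucketing function. This is a minimal sketch, assuming each file's size is already available as a line-of-code count; how the report counts lines (for example, whether blank lines and comments are excluded) is not stated here, and `classify` and `size_distribution` are hypothetical helpers, not part of any tool's API.

```python
from collections import Counter

# Size categories as listed in the report: (inclusive upper bound, label).
# Anything above the last bound falls into "very long (1001+)".
CATEGORIES = [
    (100, "very small (1-100)"),
    (200, "small (101-200)"),
    (500, "medium (201-500)"),
    (1000, "long (501-1000)"),
]
VERY_LONG = "very long (1001+)"


def classify(loc: int) -> str:
    """Return the size category for a file with `loc` lines of code."""
    for upper, label in CATEGORIES:
        if loc <= upper:
            return label
    return VERY_LONG


def size_distribution(line_counts):
    """Return (files per category, lines of code per category)."""
    files, lines = Counter(), Counter()
    for loc in line_counts:
        label = classify(loc)
        files[label] += 1
        lines[label] += loc
    return files, lines


# Line counts of config.py, emr-eks-cluster.ts, GitService.java and
# data-lake-storage.ts, taken from the "Longest Files" list.
files, lines = size_distribution([631, 487, 182, 92])
print(files)
print(lines)
```

Each category tracks both a file count and a line total because the report states both (for example, "11 medium size files (3,523 lines of code)").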
File Size Overall
  • There are 175 files with 11,876 lines of code.
    • 0 very long files (0 lines of code)
    • 1 long file (631 lines of code)
    • 11 medium size files (3,523 lines of code)
    • 22 small files (2,896 lines of code)
    • 141 very small files (4,826 lines of code)
Share of lines of code per size category:
1001+: 0% | 501-1000: 5% | 201-500: 29% | 101-200: 24% | 1-100: 40%
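The percentages can be reproduced from the line counts listed under "File Size Overall". This worked check assumes each share is computed over lines of code rather than file counts. It also assumes the report truncates instead of rounding, which is consistent with the five shares summing to 98% rather than 100%.

```python
# Lines of code per size category, copied from the "File Size Overall" figures.
loc_by_category = {
    "1001+": 0,
    "501-1000": 631,
    "201-500": 3523,
    "101-200": 2896,
    "1-100": 4826,
}

total = sum(loc_by_category.values())  # 11,876 lines of code in 175 files

# Integer division truncates, e.g. 3523 * 100 // 11876 == 29 (29.66 before truncation).
shares = {cat: loc * 100 // total for cat, loc in loc_by_category.items()}
print(total, shares)
# 11876 {'1001+': 0, '501-1000': 5, '201-500': 29, '101-200': 24, '1-100': 40}
```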


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 0% | 11% | 27% | 17% | 42%
ts | 0% | 0% | 46% | 23% | 30%
java | 0% | 0% | 22% | 27% | 49%
scala | 0% | 0% | 20% | 50% | 29%
yaml | 0% | 0% | 0% | 54% | 45%
yml | 0% | 0% | 0% | 0% | 100%
sbt | 0% | 0% | 0% | 0% | 100%
css | 0% | 0% | 0% | 0% | 100%
js | 0% | 0% | 0% | 0% | 100%
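A per-extension breakdown like the table above can be sketched by grouping files on their extension before bucketing. This is a minimal sketch, assuming each row gives the (truncated) share of that extension's lines of code in each bucket, as the overall figures do; `bucket` and `distribution_per_extension` are hypothetical helpers, and the line counts used below are taken from the "Longest Files" list.

```python
import os
from collections import defaultdict

BUCKETS = ["1001+", "501-1000", "201-500", "101-200", "1-100"]


def bucket(loc: int) -> str:
    """Map a line count to the report's size bucket label."""
    if loc > 1000:
        return "1001+"
    if loc > 500:
        return "501-1000"
    if loc > 200:
        return "201-500"
    if loc > 100:
        return "101-200"
    return "1-100"


def distribution_per_extension(files):
    """files: iterable of (path, lines_of_code) pairs.

    Returns {extension: {bucket: truncated % of that extension's lines}}.
    """
    loc = defaultdict(lambda: defaultdict(int))
    for path, n in files:
        # "config.py" -> "py", "build.sbt" -> "sbt"; files without one -> "(none)"
        ext = os.path.splitext(path)[1].lstrip(".") or "(none)"
        loc[ext][bucket(n)] += n
    result = {}
    for ext, per_bucket in loc.items():
        total = sum(per_bucket.values())
        result[ext] = {
            b: (per_bucket[b] * 100 // total if total else 0) for b in BUCKETS
        }
    return result


sample = [
    ("refarch/aws-native/common/common_cdk/config.py", 631),
    ("refarch/aws-native/common/common_cdk/foundations.py", 83),
    ("core/src/datasets/dataset.ts", 267),
]
print(distribution_per_extension(sample))
```

With this sample, 631 of the 714 Python lines sit in one long file, so that bucket dominates the py row. The same effect explains the 11% long-file share for py in the real table.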
File Size per Logical Decomposition
Logical decomposition: primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
refarch/aws-native/common | 0% | 19% | 34% | 20% | 25%
core/src/db-schema-manager | 0% | 0% | 35% | 26% | 38%
core/src/emr-eks-platform | 0% | 0% | 43% | 22% | 33%
refarch/aws-native/dataviz | 0% | 0% | 37% | 0% | 62%
core/src/data-generator | 0% | 0% | 47% | 38% | 13%
core/src/datasets | 0% | 0% | 67% | 0% | 32%
core/src/notebook-platform | 0% | 0% | 60% | 35% | 4%
refarch/aws-native/streaming | 0% | 0% | 15% | 26% | 58%
refarch/aws-native/dwh | 0% | 0% | 0% | 62% | 37%
core/src | 0% | 0% | 0% | 32% | 67%
refarch/aws-native/batch | 0% | 0% | 0% | 31% | 68%
core/src/synchronous-athena-query | 0% | 0% | 0% | 66% | 33%
core/src/synchronous-crawler | 0% | 0% | 0% | 0% | 100%
refarch/aws-native | 0% | 0% | 0% | 0% | 100%
refarch/aws-native/cicd | 0% | 0% | 0% | 0% | 100%
doc | 0% | 0% | 0% | 0% | 100%
core/src/common | 0% | 0% | 0% | 0% | 100%
doc/content/stylesheets | 0% | 0% | 0% | 0% | 100%
doc/content/javascripts | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | Location | # lines | # units
config.py | refarch/aws-native/common/common_cdk | 631 | -
emr-eks-cluster.ts | core/src/emr-eks-platform | 487 | 8
stream_data_generator.py | refarch/aws-native/common/common_cdk | 446 | 1
batch_data_generator.py | refarch/aws-native/common/common_cdk | 416 | 1
redshift_config.py | refarch/aws-native/dataviz/dataviz_redshift_cdk/dataviz_redshift | 415 | -
data-generator.ts | core/src/data-generator | 292 | 1
FlywayRequest.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/flyway/request | 274 | 65
dataset.ts | core/src/datasets | 267 | 7
FlywayService.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/flyway | 253 | 14
RawSchema.scala | refarch/aws-native/common/data-generator/src/main/scala | 230 | -
notebook-platform.ts | core/src/notebook-platform | 223 | 2
es_domain.py | refarch/aws-native/streaming/streaming_cdk | 220 | 4
GitService.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/lambda/git | 182 | 19
BaseData.scala | refarch/aws-native/common/data-generator/src/main/scala | 176 | 4
redshift_admin.py | refarch/aws-native/dwh/dwh_cdk | 161 | 3
data-lake-exporter.ts | core/src | 155 | 1
BaseSchema.scala | refarch/aws-native/common/data-generator/src/main/scala | 149 | -
dwh_loader.py | refarch/aws-native/dwh/dwh_cdk | 148 | 1
raw2clean_job.py | refarch/aws-native/batch/batch_cdk | 146 | 2
kda_application.py | refarch/aws-native/streaming/streaming_cdk | 143 | 2
CleanSchema.scala | refarch/aws-native/common/data-generator/src/main/scala | 138 | -
emr-containers.yaml | core/src/emr-eks-platform/resources/k8s/rbac | 133 | -
notebook-platform-helpers.ts | core/src/notebook-platform | 131 | 7
StreamFactory.java | refarch/aws-native/streaming/stream-processing/src/main/java/com/amazonaws/ara/streaming | 128 | 12
write-in-batch.py | core/src/data-generator/resources/lambdas/write-in-batch | 123 | 5
emr-eks-nodegroup.ts | core/src/emr-eks-platform | 119 | -
redshift.py | refarch/aws-native/dwh/dwh_cdk | 115 | 7
synchronous-athena-query.ts | core/src/synchronous-athena-query | 114 | 1
AWSRequestSigningApacheInterceptor.java | refarch/aws-native/streaming/stream-processing/src/main/java/com/amazonaws/ara/http | 112 | 4
batch-replayer.ts | core/src/data-generator | 112 | 1
StreamGenerator.scala | refarch/aws-native/common/data-generator/src/main/scala | 106 | 3
GitRequest.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/flyway/request | 103 | 15
data_lake.py | refarch/aws-native/common/common_cdk | 101 | 2
ValueManager.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/flyway/request | 101 | 11
raw2clean_hudi.py | refarch/aws-native/batch/glue-scripts | 100 | 1
data-lake-storage.ts | core/src | 92 | 1
DataGenerator.scala | refarch/aws-native/common/data-generator/src/main/scala | 89 | 2
crawler.py | refarch/aws-native/batch/batch_cdk | 88 | 2
S3Service.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/lambda/s3 | 87 | 11
redshift_datasource.py | refarch/aws-native/dataviz/dataviz_redshift_cdk/dataviz_redshift/lambda | 83 | 4
audit_trail_glue.py | refarch/aws-native/common/common_cdk | 83 | 1
foundations.py | refarch/aws-native/common/common_cdk | 83 | 15
partitioned-dataset.ts | core/src/datasets | 82 | 3
qs_redshift_dataset.py | refarch/aws-native/dataviz/dataviz_redshift_cdk/dataviz_redshift | 81 | 2
DataframeUtils.scala | refarch/aws-native/common/data-generator/src/main/scala | 80 | 11
flyway-runner.ts | core/src/db-schema-manager | 80 | 1
lambda.py | core/src/emr-eks-platform/resources/lambdas/managed-endpoint | 79 | 5
batch_module.py | refarch/aws-native/batch/batch_cdk | 78 | 1
StreamingJob.java | refarch/aws-native/streaming/stream-processing/src/main/java/com/amazonaws/ara/streaming | 76 | 2
qs_athena_dataset.py | refarch/aws-native/dataviz/dataviz_cdk | 76 | 2
Files With Most Units (Top 20)
File | Location | # lines | # units
FlywayRequest.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/flyway/request | 274 | 65
GitService.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/lambda/git | 182 | 19
foundations.py | refarch/aws-native/common/common_cdk | 83 | 15
GitRequest.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/flyway/request | 103 | 15
FlywayService.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/flyway | 253 | 14
StreamFactory.java | refarch/aws-native/streaming/stream-processing/src/main/java/com/amazonaws/ara/streaming | 128 | 12
DataframeUtils.scala | refarch/aws-native/common/data-generator/src/main/scala | 80 | 11
ValueManager.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/flyway/request | 101 | 11
S3Service.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/lambda/s3 | 87 | 11
emr-eks-cluster.ts | core/src/emr-eks-platform | 487 | 8
Request.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/flyway/request | 35 | 8
redshift.py | refarch/aws-native/dwh/dwh_cdk | 115 | 7
dataset.ts | core/src/datasets | 267 | 7
notebook-platform-helpers.ts | core/src/notebook-platform | 131 | 7
DBRequest.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/flyway/request | 54 | 7
dwh_module.py | refarch/aws-native/dwh/dwh_cdk | 59 | 6
ResponseInfo.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/flyway/response | 33 | 6
streaming_module.py | refarch/aws-native/streaming/streaming_cdk | 59 | 5
lambda.py | core/src/synchronous-crawler/resources/lambdas | 61 | 5
lambda.py | core/src/emr-eks-platform/resources/lambdas/nodegroup-asg-tag | 74 | 5
Files With Long Lines (Top 20)

There are 48 files with lines longer than 120 characters. In total, there are 135 long lines.
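A long-line count like this can be sketched by scanning each file for lines over the 120-character threshold. This is a minimal sketch, assuming "longer than 120 characters" means a strict length check on each line after removing the line terminator, with no tab expansion; `long_lines` and `long_line_summary` are hypothetical helpers, and the tool behind this report may measure lines differently.

```python
from pathlib import Path

LIMIT = 120  # threshold used in the report


def long_lines(text: str, limit: int = LIMIT) -> list:
    """Return the 1-based numbers of lines longer than `limit` characters."""
    return [
        i for i, line in enumerate(text.splitlines(), start=1) if len(line) > limit
    ]


def long_line_summary(paths, limit: int = LIMIT):
    """Return (files with long lines, total long lines, per-file counts)."""
    per_file = {}
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        count = len(long_lines(text, limit))
        if count:
            per_file[str(path)] = count
    return len(per_file), sum(per_file.values()), per_file
```

For example, `long_line_summary(Path(".").rglob("*.py"))` would return the equivalents of "48 files" and "135 long lines" for the Python files under the current directory, along with the per-file counts used to rank the table below.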

File | Location | # lines | # units | # long lines
raw2clean_hudi.py | refarch/aws-native/batch/glue-scripts | 100 | 1 | 10
emr-eks-cluster.ts | core/src/emr-eks-platform | 487 | 8 | 10
DataframeUtils.scala | refarch/aws-native/common/data-generator/src/main/scala | 80 | 11 | 8
BaseData.scala | refarch/aws-native/common/data-generator/src/main/scala | 176 | 4 | 7
StreamGenerator.scala | refarch/aws-native/common/data-generator/src/main/scala | 106 | 3 | 6
batch_module.py | refarch/aws-native/batch/batch_cdk | 78 | 1 | 6
raw2clean_parquet.py | refarch/aws-native/batch/glue-scripts | 51 | 1 | 6
notebook-platform-helpers.ts | core/src/notebook-platform | 131 | 7 | 6
ValueManager.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/flyway/request | 101 | 11 | 6
batch_data_generator.py | refarch/aws-native/common/common_cdk | 416 | 1 | 5
stream_data_generator.py | refarch/aws-native/common/common_cdk | 446 | 1 | 4
notebook-platform.ts | core/src/notebook-platform | 223 | 2 | 4
FlywayCustomResourceHandler.java | core/src/db-schema-manager/resources/flyway-lambda/src/main/java/com/geekoosh/flyway | 66 | 4 | 4
streaming_module.py | refarch/aws-native/streaming/streaming_cdk | 59 | 5 | 3
BatchModuleData.scala | refarch/aws-native/common/data-generator/src/main/scala | 42 | 1 | 3
raw2clean_job.py | refarch/aws-native/batch/batch_cdk | 146 | 2 | 3
dwh_module.py | refarch/aws-native/dwh/dwh_cdk | 59 | 6 | 3
redshift_admin.py | refarch/aws-native/dwh/dwh_cdk | 161 | 3 | 3
es_domain.py | refarch/aws-native/streaming/streaming_cdk | 220 | 4 | 2
CustomerAddressJoiner.java | refarch/aws-native/streaming/stream-processing/src/main/java/com/amazonaws/ara/streaming/operators | 46 | 4 | 2