awslabs / aws-serverless-data-lake-framework
File Size

The distribution of size of files (measured in lines of code).

Intro
  • File size measurements show how the files in the repository are distributed by size.
  • Files are classified in five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown so large that it is hard to work with.
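The classification above can be sketched in a few lines of Python. This is a minimal illustration, not the reporting tool's own implementation: the names `BUCKETS`, `classify`, and `summarize` are hypothetical, and the sketch assumes that a file's line count is already known (the report may count lines of code differently from a raw line count, for example by skipping blank lines or comments).

```python
from collections import Counter

# Upper bound (inclusive, in lines of code) and label for each category,
# taken from the ranges listed in the report. Names are illustrative only.
BUCKETS = [
    (100, "very small"),
    (200, "small"),
    (500, "medium size"),
    (1000, "long"),
]


def classify(line_count: int) -> str:
    """Return the size category for a file with the given line count."""
    for upper_bound, label in BUCKETS:
        if line_count <= upper_bound:
            return label
    return "very long"  # 1001 lines or more


def summarize(line_counts) -> Counter:
    """Count how many files fall into each size category."""
    return Counter(classify(n) for n in line_counts)


# A few line counts taken from the tables in this report.
sample = [1307, 947, 410, 197, 53]
print(summarize(sample))
```

Keeping the boundaries in a single ordered list makes it easy to adjust the thresholds without touching the classification logic.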
File Size Overall
  • There are 221 files with 34,734 lines of code.
    • 4 very long files (4,922 lines of code)
    • 17 long files (12,181 lines of code)
    • 25 medium size files (6,545 lines of code)
    • 39 small files (5,620 lines of code)
    • 136 very small files (5,466 lines of code)
Share of lines of code by file size: 1001+ 14% | 501-1000 35% | 201-500 18% | 101-200 16% | 1-100 15%
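The percentages in the bar can be reproduced from the line counts listed above. The sketch below is a worked check rather than the tool's actual code. It assumes each share is truncated to a whole percent, which would explain why the five values add up to 98% rather than 100%. The variable names are illustrative.

```python
# Lines of code per size category, copied from the list above.
LINES_PER_CATEGORY = {
    "1001+": 4922,
    "501-1000": 12181,
    "201-500": 6545,
    "101-200": 5620,
    "1-100": 5466,
}

total = sum(LINES_PER_CATEGORY.values())  # 34,734 lines of code in total

# Assumption: each share is truncated (not rounded) to a whole percent.
shares = {cat: lines * 100 // total for cat, lines in LINES_PER_CATEGORY.items()}

print(" | ".join(f"{pct}%" for pct in shares.values()))
# prints: 14% | 35% | 18% | 16% | 15%
```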


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
yaml | 20% | 48% | 18% | 10% | 2%
py | 0% | 6% | 19% | 26% | 47%
scala | 0% | 0% | 46% | 35% | 17%
yml | 0% | 0% | 0% | 83% | 16%
js | 0% | 0% | 0% | 0% | 100%
q | 0% | 0% | 0% | 0% | 100%
toml | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
Logical decomposition: primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
sdlf-utils | 17% | 33% | 15% | 16% | 16%
sdlf-team | 43% | 36% | 7% | 12% | 0%
sdlf-foundations | 0% | 45% | 27% | 15% | 11%
sdlf-cicd | 0% | 47% | 52% | 0% | 0%
sdlf-stageA | 0% | 73% | 0% | 0% | 26%
sdlf-stageB | 0% | 64% | 0% | 0% | 35%
sdlf-datalakeLibrary | 0% | 0% | 36% | 27% | 36%
sdlf-pipeline | 0% | 0% | 63% | 36% | 0%
thirdparty-scms | 0% | 0% | 0% | 83% | 16%
sdlf-dataset | 0% | 0% | 0% | 100% | 0%
sdlf-pipLibrary | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | # lines | # units
template-iam.yaml
in sdlf-utils/pipeline-examples/event-dataset-dependencies/sdlf-team/nested-stacks
1307 -
template-iam.yaml
in sdlf-utils/ingestion-examples/sqoop/sdlf-team/nested-stacks
1307 -
template-iam.yaml
in sdlf-utils/pipeline-examples/topic-modelling
1170 -
template-iam.yaml
in sdlf-team/nested-stacks
1138 -
template-cicd.yaml
in sdlf-utils/ingestion-examples/sqoop/sdlf-team/nested-stacks
947 -
template-cicd.yaml
in sdlf-team/nested-stacks
947 -
template-kibana.yaml
in sdlf-foundations/nested-stacks
937 -
template.yaml
in sdlf-utils/pipeline-examples/manifests/stageA
887 -
template-cicd-child-foundations.yaml
in sdlf-cicd
878 -
template.yaml
in sdlf-utils/pipeline-examples/topic-modelling/stageB
831 -
template.yaml
in sdlf-utils/pipeline-examples/event-dataset-dependencies/sdlf-engineering-stageA
696 -
template.yaml
in sdlf-utils/pipeline-examples/event-dataset-dependencies/sdlf-engineering-stageB
687 -
template.yaml
in sdlf-utils/pipeline-examples/manifests/stageB
668 -
template-s3.yaml
in sdlf-foundations/nested-stacks
662 -
template.yaml
in sdlf-utils/ingestion-examples/sqoop/sdlf-stageX
625 -
examplepipeline-glue-job.py
in sdlf-utils/pipeline-examples/glue-jobs-deployer/pipeline_scripts
617 26
template.yaml
in sdlf-utils/pipeline-examples/topic-modelling/stageA
595 -
template.yaml
in sdlf-stageA
595 -
template.yaml
in sdlf-stageB
555 -
template-dynamo.yaml
in sdlf-foundations/nested-stacks
528 -
template.yaml
in sdlf-utils/pipeline-examples/cloudfront
526 -
template-codecommit-pr-check.yaml
in sdlf-cicd
410 -
template-glue.yaml
in sdlf-foundations/nested-stacks
396 -
template-workload-management.yaml
in sdlf-utils/pipeline-examples/datalake-workload-management/wlm-standalone/nested-stack
378 -
template-cicd-team-repos.yaml
in sdlf-cicd
345 -
template.yaml
in sdlf-utils/ingestion-examples/preprocessing/fargate-unzip
313 -
light_transform_athena_ctas_sqoop.py
in sdlf-utils/pipeline-examples/event-dataset-dependencies/sdlf-engineering-datalakeLibrary/python/datalake_library/transforms/stage_a_transforms
309 3
template.yaml
in sdlf-utils/ingestion-examples/cdc/dms-replication
302 -
light_transform_athena_ctas.py
in sdlf-utils/pipeline-examples/event-dataset-dependencies/sdlf-engineering-datalakeLibrary/python/datalake_library/transforms/stage_a_transforms
290 3
template.yaml
in sdlf-utils/pipeline-examples/dataset-dependency/stageA
285 -
dynamo_interface.py
in sdlf-utils/pipeline-examples/datalake-workload-management/sdlf-wlm-integration/sdlf-datalakelibrary/python/datalake_library/interfaces
264 20
dynamo_interface.py
in sdlf-datalakeLibrary/python/datalake_library/interfaces
257 19
lambda_function.py
in sdlf-foundations/lambda/topic/src
228 12
metric.py
in sdlf-datalakeLibrary/python/datalake_library/octagon
228 13
template.yaml
in sdlf-foundations
227 -
template.yaml
in sdlf-utils/ingestion-examples/cdc/dms-task
227 -
peh.py
in sdlf-datalakeLibrary/python/datalake_library/octagon
220 6
template-cicd-shared-foundations.yaml
in sdlf-cicd
219 -
template-statemachine.yaml
in sdlf-utils/pipeline-examples/event-dataset-dependencies/sdlf-engineering-pipeline/nested-stacks
212 -
template-statemachine.yaml
in sdlf-utils/ingestion-examples/sqoop/sdlf-pipeline/nested-stacks
212 -
template-statemachine.yaml
in sdlf-pipeline/nested-stacks
207 -
deequ-analysis-verification-runner.scala
in sdlf-foundations/scripts/deequ/resources
206 8
lambda_function.py
in sdlf-utils/pipeline-examples/event-dataset-dependencies/sdlf-engineering-stageA/lambda/stage-a-postupdate-metadata/src
205 3
template-statemachine.yaml
in sdlf-utils/pipeline-examples/manifests/pipeline/nested-stacks
203 -
template-cloudtrail.yaml
in sdlf-foundations/nested-stacks
201 -
template-kms.yaml
in sdlf-team/nested-stacks
201 -
template-sns.yaml
in sdlf-foundations/nested-stacks
197 -
lambda_function.py
in sdlf-utils/pipeline-examples/event-dataset-dependencies/sdlf-engineering-stageB/lambda/stage-b-postupdate-metadata/src
194 1
template.yaml
in sdlf-utils/pipeline-examples/event-dataset-dependencies/sdlf-engineering-pipeline
189 -
DMSCDC_Controller.py
in sdlf-utils/ingestion-examples/cdc/dms-replication/scripts
186 1
Files With Most Units (Top 20)
File | # lines | # units
resource_configs.py
in sdlf-utils/pipeline-examples/datalake-workload-management/sdlf-wlm-integration/sdlf-datalakelibrary/python/datalake_library/configuration
177 29
resource_configs.py
in sdlf-datalakeLibrary/python/datalake_library/configuration
160 27
examplepipeline-glue-job.py
in sdlf-utils/pipeline-examples/glue-jobs-deployer/pipeline_scripts
617 26
client.py
in sdlf-datalakeLibrary/python/datalake_library/octagon
106 23
config.py
in sdlf-datalakeLibrary/python/datalake_library/octagon
102 21
dynamo_interface.py
in sdlf-utils/pipeline-examples/datalake-workload-management/sdlf-wlm-integration/sdlf-datalakelibrary/python/datalake_library/interfaces
264 20
dynamo_interface.py
in sdlf-datalakeLibrary/python/datalake_library/interfaces
257 19
metadata.py
in sdlf-datalakeLibrary/python/datalake_library/octagon
73 14
event_configs.py
in sdlf-datalakeLibrary/python/datalake_library/configuration
83 13
metric.py
in sdlf-datalakeLibrary/python/datalake_library/octagon
228 13
lambda_function.py
in sdlf-foundations/lambda/topic/src
228 12
s3_interface.py
in sdlf-datalakeLibrary/python/datalake_library/interfaces
163 11
utils.py
in sdlf-datalakeLibrary/python/datalake_library/octagon
53 11
lambda_function.py
in sdlf-foundations/lambda/replicate/src
136 8
deequ-analysis-verification-runner.scala
in sdlf-foundations/scripts/deequ/resources
206 8
heavy_transform_manifest.py
in sdlf-utils/pipeline-examples/manifests/transforms
105 7
artifact.py
in sdlf-datalakeLibrary/python/datalake_library/octagon
67 7
schema_validator.py
in sdlf-datalakeLibrary/python/datalake_library/data_quality
67 7
states_interface.py
in sdlf-utils/pipeline-examples/datalake-workload-management/sdlf-wlm-integration/sdlf-datalakelibrary/python/datalake_library/interfaces
44 6
sqs_interface.py
in sdlf-utils/pipeline-examples/datalake-workload-management/sdlf-wlm-integration/sdlf-datalakelibrary/python/datalake_library/interfaces
85 6
Files With Long Lines (Top 20)

There are 60 files with lines longer than 120 characters. In total, there are 217 long lines.
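A count like this can be approximated with a short script. The sketch below is illustrative only: `count_long_lines` and `long_line_report` are hypothetical helpers, the list of file suffixes is an assumption based on the extensions shown earlier in the report, and the tool that produced the numbers may measure line length differently (for example in how it treats tabs or trailing whitespace).

```python
from pathlib import Path

MAX_LINE_LENGTH = 120  # threshold stated in the report


def count_long_lines(text: str, limit: int = MAX_LINE_LENGTH) -> int:
    """Count the lines in `text` that are longer than `limit` characters."""
    return sum(1 for line in text.splitlines() if len(line) > limit)


def long_line_report(root: str, suffixes=(".py", ".yaml", ".yml", ".scala")) -> dict:
    """Map each matching file under `root` that has long lines to its count."""
    report = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            count = count_long_lines(text)
            if count:
                report[str(path)] = count
    return report
```

Calling `long_line_report(".")` from a checkout of the repository would return a dictionary whose length is the number of files with long lines and whose values sum to the total number of long lines.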

File | # lines | # units | # long lines
template-cicd.yaml
in sdlf-utils/ingestion-examples/sqoop/sdlf-team/nested-stacks
947 - 23
template-cicd.yaml
in sdlf-team/nested-stacks
947 - 23
bitbucket-team-pipelines.yml
in thirdparty-scms/bbucket
124 - 12
deequ-suggestion-analysis-verification-runner.scala
in sdlf-foundations/scripts/deequ/resources
154 1 11
template-iam.yaml
in sdlf-utils/pipeline-examples/topic-modelling
1170 - 10
template-iam.yaml
in sdlf-utils/pipeline-examples/event-dataset-dependencies/sdlf-team/nested-stacks
1307 - 10
template-iam.yaml
in sdlf-utils/ingestion-examples/sqoop/sdlf-team/nested-stacks
1307 - 10
template-iam.yaml
in sdlf-team/nested-stacks
1138 - 10
deequ-analysis-verification-runner.scala
in sdlf-foundations/scripts/deequ/resources
206 8 9
template-codecommit-pr-check.yaml
in sdlf-cicd
410 - 8
template-cicd-child-foundations.yaml
in sdlf-cicd
878 - 6
template.yaml
in sdlf-utils/ingestion-examples/cdc/dms-task
227 - 5
template.yaml
in sdlf-utils/ingestion-examples/cdc/dms-replication
302 - 5
template-s3.yaml
in sdlf-foundations/nested-stacks
662 - 4
template-kibana.yaml
in sdlf-foundations/nested-stacks
937 - 3
template-cicd-team-repos.yaml
in sdlf-cicd
345 - 3
template.yaml
in sdlf-utils/pipeline-examples/event-dataset-dependencies/sdlf-engineering-pipeline
189 - 3
template.yaml
in sdlf-utils/ingestion-examples/sqoop/sdlf-pipeline
175 - 3
metric.py
in sdlf-datalakeLibrary/python/datalake_library/octagon
228 13 3
azure-pipelines-team-template.yml
in thirdparty-scms/ado
103 - 3