aws-samples / amazon-textract-transformer-pipeline
File Size

The distribution of file sizes, measured in lines of code.

Intro
  • File size measurements show the distribution of file sizes across the codebase.
  • Files are classified into five categories based on their size in lines of code: 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), and 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such proportions that it is hard to work with.
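The classification above can be sketched in a few lines of Python. This is a minimal illustration, not the report tool's actual implementation: the names `BUCKETS`, `classify`, and `summarize` are hypothetical, and the sample sizes are taken from the file table later in this report.

```python
# Upper bounds (inclusive) and labels for the size categories listed above.
# Anything above the last bound is treated as "very long".
BUCKETS = [
    (100, "very small"),
    (200, "small"),
    (500, "medium"),
    (1000, "long"),
]


def classify(lines_of_code: int) -> str:
    """Return the size category for a file with the given line count."""
    if lines_of_code < 1:
        raise ValueError("expected at least one line of code")
    for upper, label in BUCKETS:
        if lines_of_code <= upper:
            return label
    return "very long"


def summarize(sizes: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Map each category to (number of files, total lines of code)."""
    summary: dict[str, tuple[int, int]] = {}
    for loc in sizes.values():
        label = classify(loc)
        count, total = summary.get(label, (0, 0))
        summary[label] = (count + 1, total + loc)
    return summary


# A few files and line counts copied from this report.
sample = {
    "notebooks/annotation/ocr-bbox-and-validation.liquid.tpl.html": 560,
    "pipeline/ocr/sfn_semaphore/__init__.py": 519,
    "notebooks/util/smgt.py": 200,
    "annotation/fn-SMGT-Pre/main.py": 7,
}
print(summarize(sample))
```

With this sample, the two files over 500 lines land in the "long" category with 1,079 lines between them, which matches the overall figure reported for long files.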
File Size Overall
  • There are 53 files with 7,069 lines of code.
    • 0 very long files (0 lines of code)
    • 2 long files (1,079 lines of code)
    • 11 medium size files (2,908 lines of code)
    • 14 small files (2,242 lines of code)
    • 26 very small files (840 lines of code)
Share of lines of code per category:
1001+: 0% | 501-1000: 15% | 201-500: 41% | 101-200: 31% | 1-100: 11%
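The percentage breakdown can be reproduced from the line counts in the summary. The sketch below assumes each share is truncated to a whole percent rather than rounded; that is an inference from the numbers (for example, 840 of 7,069 lines is about 11.9% but is shown as 11%), not something the report states. The function name `percent_shares` is hypothetical.

```python
# Lines of code per size category, copied from the overall summary.
lines_per_category = {
    "1001+": 0,
    "501-1000": 1079,
    "201-500": 2908,
    "101-200": 2242,
    "1-100": 840,
}


def percent_shares(counts: dict[str, int]) -> dict[str, int]:
    """Return each category's share of the total, truncated to a whole percent."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no lines of code to distribute")
    # Integer floor division truncates instead of rounding (assumed behavior).
    return {label: (100 * n) // total for label, n in counts.items()}


shares = percent_shares(lines_per_category)
print(" | ".join(f"{pct}%" for pct in shares.values()))
```

Because every share is truncated, the displayed percentages add up to 98% rather than 100%.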


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
html | 0% | 58% | 41% | 0% | 0%
py | 0% | 8% | 41% | 36% | 13%
toml | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
notebooks/annotation | 0% | 100% | 0% | 0% | 0%
pipeline/ocr | 0% | 47% | 23% | 28% | 0%
notebooks/src | 0% | 0% | 52% | 44% | 3%
notebooks/util | 0% | 0% | 47% | 31% | 20%
notebooks/review | 0% | 0% | 100% | 0% | 0%
pipeline | 0% | 0% | 73% | 0% | 26%
notebooks/preproc | 0% | 0% | 99% | 0% | <1%
ROOT | 0% | 0% | 74% | 0% | 25%
pipeline/postprocessing | 0% | 0% | 31% | 55% | 13%
pipeline/review | 0% | 0% | 0% | 76% | 23%
annotation | 0% | 0% | 0% | 100% | 0%
annotation/fn-SMGT-Post | 0% | 0% | 0% | 100% | 0%
pipeline/enrichment | 0% | 0% | 0% | 0% | 100%
pipeline/fn-trigger | 0% | 0% | 0% | 0% | 100%
annotation/fn-SMGT-Pre | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | Location | # lines | # units
ocr-bbox-and-validation.liquid.tpl.html | notebooks/annotation | 560 | -
__init__.py | pipeline/ocr/sfn_semaphore | 519 | 17
fields-validation.liquid.html | notebooks/review | 402 | -
ner.py | notebooks/src/code/data | 343 | 12
preproc.py | notebooks/util | 293 | 7
__init__.py | pipeline | 262 | 10
viz.py | notebooks/util | 262 | 7
main.py | pipeline/ocr/fn-call-textract | 258 | 7
config.py | notebooks/src/code | 236 | 4
imgclean.py | notebooks/preproc | 226 | 7
cdk_demo_stack.py | root | 220 | 1
main.py | pipeline/postprocessing/fn-postprocess | 203 | 5
inference.py | notebooks/src/code | 203 | 6
smgt.py | notebooks/util | 200 | 5
base.py | notebooks/src/code/data | 190 | 10
__init__.py | annotation | 184 | 4
__init__.py | pipeline/postprocessing | 181 | 1
boxes.py | pipeline/postprocessing/fn-postprocess/util | 170 | 14
boxes.py | notebooks/util/postproc | 170 | 14
__init__.py | pipeline/ocr | 169 | 1
train.py | notebooks/src/code | 169 | 3
mlm.py | notebooks/src/code/data | 161 | 8
main.py | pipeline/review/fn-review-callback | 151 | 1
geometry.py | notebooks/src/code/data | 143 | 20
main.py | pipeline/ocr/sfn_semaphore/fn-acquire-lock | 136 | 2
__init__.py | pipeline/review | 113 | 1
main.py | annotation/fn-SMGT-Post | 105 | 1
iam_utils.py | pipeline | 93 | 4
project.py | notebooks/util | 90 | 4
main.py | pipeline/fn-trigger | 88 | 3
__init__.py | pipeline/enrichment | 83 | 1
main.py | pipeline/review/fn-start-review | 81 | 2
__init__.py | notebooks/util | 46 | 3
setup.py | root | 45 | -
deser.py | pipeline/postprocessing/fn-postprocess/util | 42 | 6
deser.py | notebooks/util/postproc | 42 | 6
config.py | pipeline/postprocessing/fn-postprocess/util | 41 | 3
config.py | notebooks/util/postproc | 41 | 3
main.py | pipeline/enrichment/fn-call-sagemaker | 36 | 1
pyproject.toml | root | 27 | -
logging_utils.py | notebooks/src/code | 20 | 3
__init__.py | notebooks/src/code/data | 15 | 1
train.py | notebooks/src | 13 | -
training.py | notebooks/util | 11 | 1
uid.py | notebooks/util | 8 | 1
main.py | annotation/fn-SMGT-Pre | 7 | 1
cdk_app.py | root | 5 | -
__init__.py | pipeline/postprocessing/fn-postprocess/util | 1 | -
__init__.py | notebooks/src | 1 | -
__init__.py | notebooks/src/code | 1 | -
Files With Most Units (Top 20)
File | Location | # lines | # units
geometry.py | notebooks/src/code/data | 143 | 20
__init__.py | pipeline/ocr/sfn_semaphore | 519 | 17
boxes.py | pipeline/postprocessing/fn-postprocess/util | 170 | 14
boxes.py | notebooks/util/postproc | 170 | 14
ner.py | notebooks/src/code/data | 343 | 12
__init__.py | pipeline | 262 | 10
base.py | notebooks/src/code/data | 190 | 10
mlm.py | notebooks/src/code/data | 161 | 8
main.py | pipeline/ocr/fn-call-textract | 258 | 7
preproc.py | notebooks/util | 293 | 7
viz.py | notebooks/util | 262 | 7
imgclean.py | notebooks/preproc | 226 | 7
deser.py | pipeline/postprocessing/fn-postprocess/util | 42 | 6
inference.py | notebooks/src/code | 203 | 6
deser.py | notebooks/util/postproc | 42 | 6
main.py | pipeline/postprocessing/fn-postprocess | 203 | 5
smgt.py | notebooks/util | 200 | 5
iam_utils.py | pipeline | 93 | 4
__init__.py | annotation | 184 | 4
config.py | notebooks/src/code | 236 | 4
Files With Long Lines (Top 1)

There is only one file with lines longer than 120 characters. In total, there are 3 long lines.

File | Location | # lines | # units | # long lines
fields-validation.liquid.html | notebooks/review | 402 | - | 3
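A long-line check of this kind can be approximated with the standard library. This is a rough sketch under stated assumptions: the 120-character threshold comes from the report, but how the report tool counts characters (tabs, trailing whitespace, encoding) is not documented here, and the helper names `count_long_lines` and `files_with_long_lines` are hypothetical.

```python
from pathlib import Path

MAX_LINE_LENGTH = 120  # threshold stated in the report


def count_long_lines(text: str, limit: int = MAX_LINE_LENGTH) -> int:
    """Count lines strictly longer than `limit` characters."""
    return sum(1 for line in text.splitlines() if len(line) > limit)


def files_with_long_lines(root: Path, pattern: str = "*.html") -> dict[Path, int]:
    """Map each matching file under `root` to its number of long lines.

    Files without any long lines are left out of the result.
    """
    result: dict[Path, int] = {}
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        # Undecodable bytes are replaced so one bad file does not stop the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        long_lines = count_long_lines(text)
        if long_lines:
            result[path] = long_lines
    return result
```

Calling `files_with_long_lines(Path("notebooks/review"))` from a checkout of the repository would be one way to cross-check the single entry in the table, assuming the tool measures line length the same way.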