awslabs / aws-data-wrangler
File Size

The distribution of file sizes (measured in lines of code).

Intro
  • File size measurements show how file sizes are distributed.
  • Files are classified in five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
  • It is a good practice to keep files small. Long files may become "bloaters": code that has grown to such proportions that it is hard to work with.
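The classification above can be sketched in a few lines of Python. This is a minimal illustration, not the report tool's own code: the names `UPPER_BOUNDS`, `LABELS`, and `classify` are hypothetical, and only the bucket boundaries come from the list above.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first four categories (from the list above).
# Anything above the last bound falls into the "very long" category.
UPPER_BOUNDS = [100, 200, 500, 1000]
LABELS = ["very small", "small", "medium size", "long", "very long"]


def classify(lines_of_code: int) -> str:
    """Return the size category for a file with the given line count."""
    if lines_of_code < 1:
        raise ValueError("expected at least one line of code")
    # bisect_left finds the first bound that is >= lines_of_code,
    # which is exactly the index of the matching category.
    return LABELS[bisect_left(UPPER_BOUNDS, lines_of_code)]


if __name__ == "__main__":
    for size in (97, 199, 490, 771):
        print(size, classify(size))
```

Using inclusive upper bounds with `bisect_left` keeps the boundary cases (100, 200, 500, 1000) in the lower category, matching the ranges as stated.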
File Size Overall
  • There are 72 files with 13,062 lines of code.
    • 0 very long files (0 lines of code)
    • 8 long files (5,140 lines of code)
    • 13 medium size files (3,947 lines of code)
    • 18 small files (2,559 lines of code)
    • 33 very small files (1,416 lines of code)
Share of lines of code per category (1001+ | 501-1000 | 201-500 | 101-200 | 1-100):
0% | 39% | 30% | 19% | 10%
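As a worked check, the percentages can be derived from the line counts in the overall breakdown. The sketch below assumes the shares are truncated to whole percents; the report does not state its rounding rule, but truncation reproduces the displayed 19% and 10%, whereas rounding to nearest would give 20% and 11%. The function name `shares` is hypothetical.

```python
# Lines of code per size category, taken from the overall breakdown.
LINES_PER_CATEGORY = {
    "1001+": 0,
    "501-1000": 5140,
    "201-500": 3947,
    "101-200": 2559,
    "1-100": 1416,
}


def shares(lines_per_category: dict) -> dict:
    """Return each category's share of total lines as a truncated whole percent."""
    total = sum(lines_per_category.values())
    # Integer division truncates; this is an assumption about the report's rule.
    return {label: 100 * n // total for label, n in lines_per_category.items()}


if __name__ == "__main__":
    print(sum(LINES_PER_CATEGORY.values()))
    print(" | ".join(f"{p}%" for p in shares(LINES_PER_CATEGORY).values()))
```

Because each share is truncated, the displayed values sum to 98% rather than 100%.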


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 0% | 39% | 30% | 18% | 10%
toml | 0% | 0% | 0% | 100% | 0%
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
awswrangler | 0% | 50% | 27% | 18% | 3%
awswrangler/s3 | 0% | 44% | 25% | 19% | 11%
awswrangler/athena | 0% | 54% | 32% | 11% | 2%
awswrangler/catalog | 0% | 37% | 38% | 19% | 4%
awswrangler/quicksight | 0% | 0% | 58% | 15% | 26%
awswrangler/opensearch | 0% | 0% | 64% | 0% | 35%
awswrangler/lakeformation | 0% | 0% | 0% | 92% | 7%
ROOT | 0% | 0% | 0% | 100% | 0%
awswrangler/data_api | 0% | 0% | 0% | 0% | 100%
awswrangler/dynamodb | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | Location | # lines | # units
redshift.py | awswrangler | 771 | 24
_read.py | awswrangler/athena | 737 | 16
_create.py | awswrangler/catalog | 704 | 14
_read_parquet.py | awswrangler/s3 | 703 | 21
_data_types.py | awswrangler | 622 | 28
_write_text.py | awswrangler/s3 | 582 | 4
_fs.py | awswrangler/s3 | 513 | 28
emr.py | awswrangler | 508 | 13
_write_parquet.py | awswrangler/s3 | 490 | 7
_get.py | awswrangler/catalog | 448 | 20
_utils.py | awswrangler/athena | 433 | 18
_config.py | awswrangler | 321 | 57
_read_text.py | awswrangler/s3 | 277 | 7
_definitions.py | awswrangler/catalog | 276 | 8
_write_dataset.py | awswrangler/s3 | 264 | 6
_write.py | awswrangler/opensearch | 257 | 14
_get_list.py | awswrangler/quicksight | 244 | 24
_utils.py | awswrangler | 243 | 24
_create.py | awswrangler/quicksight | 242 | 6
_databases.py | awswrangler | 240 | 13
timestream.py | awswrangler | 212 | 13
_list.py | awswrangler/s3 | 199 | 8
mysql.py | awswrangler | 198 | 8
sqlserver.py | awswrangler | 197 | 10
postgresql.py | awswrangler | 186 | 8
_utils.py | awswrangler/lakeformation | 177 | 12
_add.py | awswrangler/catalog | 157 | 5
_cache.py | awswrangler/athena | 153 | 11
_read.py | awswrangler/lakeformation | 148 | 4
_delete.py | awswrangler/quicksight | 126 | 9
_select.py | awswrangler/s3 | 122 | 4
_read.py | awswrangler/s3 | 119 | 9
_describe.py | awswrangler/s3 | 119 | 5
pyproject.toml | root | 115 | -
_utils.py | awswrangler/catalog | 113 | 11
_copy.py | awswrangler/s3 | 111 | 3
_delete.py | awswrangler/s3 | 111 | 4
cloudwatch.py | awswrangler | 106 | 5
_delete.py | awswrangler/catalog | 102 | 5
redshift.py | awswrangler/data_api | 97 | 8
__init__.py | awswrangler/catalog | 93 | -
_describe.py | awswrangler/quicksight | 92 | 5
_wait.py | awswrangler/s3 | 92 | 5
_write.py | awswrangler/s3 | 87 | 4
rds.py | awswrangler/data_api | 87 | 5
__init__.py | awswrangler/quicksight | 82 | -
_merge_upsert_table.py | awswrangler/s3 | 73 | 3
_read.py | awswrangler/opensearch | 68 | 6
_utils.py | awswrangler/opensearch | 60 | 5
__init__.py | awswrangler | 51 | -
Files With Most Units (Top 20)
File | Location | # lines | # units
_config.py | awswrangler | 321 | 57
_data_types.py | awswrangler | 622 | 28
_fs.py | awswrangler/s3 | 513 | 28
_get_list.py | awswrangler/quicksight | 244 | 24
redshift.py | awswrangler | 771 | 24
_utils.py | awswrangler | 243 | 24
_read_parquet.py | awswrangler/s3 | 703 | 21
_get.py | awswrangler/catalog | 448 | 20
_utils.py | awswrangler/athena | 433 | 18
_read.py | awswrangler/athena | 737 | 16
_create.py | awswrangler/catalog | 704 | 14
_write.py | awswrangler/opensearch | 257 | 14
emr.py | awswrangler | 508 | 13
_databases.py | awswrangler | 240 | 13
timestream.py | awswrangler | 212 | 13
_utils.py | awswrangler/lakeformation | 177 | 12
_utils.py | awswrangler/catalog | 113 | 11
_cache.py | awswrangler/athena | 153 | 11
sqlserver.py | awswrangler | 197 | 10
_delete.py | awswrangler/quicksight | 126 | 9
Files With Long Lines (Top 1)

There is only one file with lines longer than 120 characters. In total, there is only one long line.

File | Location | # lines | # units | # long lines
emr.py | awswrangler | 508 | 13 | 1
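A long-line count like the one above can be approximated with a short Python helper. This is a sketch under stated assumptions: the 120-character threshold comes from the text, while the function name `count_long_lines` and the choice to measure characters without the trailing newline are assumptions, not the report tool's documented behavior.

```python
def count_long_lines(text: str, limit: int = 120) -> int:
    """Count lines whose length (excluding the newline) exceeds `limit` characters."""
    return sum(1 for line in text.splitlines() if len(line) > limit)


if __name__ == "__main__":
    sample = "\n".join([
        "short line",
        "x" * 120,  # exactly at the limit, not counted
        "y" * 121,  # one character over the limit, counted
    ])
    print(count_long_lines(sample))
```

To apply it to a file, read the file's contents as text and pass the string to the function.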