aws-samples / aws-glue-samples
File Size

The distribution of file sizes (measured in lines of code).

Intro
  • File size measurements show the distribution of file sizes.
  • Files are classified into five categories based on their size in lines of code: 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), and 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
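The five categories amount to a simple threshold lookup. The sketch below assumes the bounds are inclusive (a 100-line file is very small, a 101-line file is small); the function name `classify_file_size` is hypothetical and not part of the tool that produced this report.

```python
def classify_file_size(loc: int) -> str:
    """Map a file's lines of code to the size category used in this report.

    Assumption: category bounds are inclusive, so 100 lines is "very small"
    and 101 lines is "small".
    """
    if loc < 1:
        raise ValueError("lines of code must be at least 1")
    if loc <= 100:
        return "very small"
    if loc <= 200:
        return "small"
    if loc <= 500:
        return "medium size"
    if loc <= 1000:
        return "long"
    return "very long"


# Example: a few file sizes taken from the tables in this report.
for name, loc in [("hive_metastore_migration.py", 1051),
                  ("SparkConnectorMySQL.scala", 199),
                  ("MinimalSpark3Connector.scala", 97)]:
    print(f"{name}: {loc} lines -> {classify_file_size(loc)}")
```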
File Size Overall
  • There are 59 files with 4,268 lines of code.
    • 1 very long file (1,051 lines of code)
    • 0 long files (0 lines of code)
    • 0 medium size files (0 lines of code)
    • 7 small files (1,053 lines of code)
    • 51 very small files (2,164 lines of code)
Size category | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
Share of lines of code | 24% | 0% | 0% | 24% | 50%
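These percentages are shares of the total lines of code, not of the file count: 1 of the 59 files is very long, yet it holds 24% of the code. The arithmetic below reproduces them from the counts in the list above. It assumes the report truncates each share to a whole percent rather than rounding it; that assumption is consistent with the published figures, which add up to 98% rather than 100%.

```python
# Lines of code per size category, copied from the "File Size Overall" list.
loc_by_category = {
    "1001+": 1051,
    "501-1000": 0,
    "201-500": 0,
    "101-200": 1053,
    "1-100": 2164,
}

total = sum(loc_by_category.values())  # 4,268 lines of code in total

# Assumption: each share is truncated (integer floor division), not rounded.
# Rounding would give 25%, 25% and 51% instead of the reported 24%, 24% and 50%.
shares = {cat: loc * 100 // total for cat, loc in loc_by_category.items()}

print(total)
print(shares)
```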


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 59% | 0% | 0% | 0% | 40%
java | 0% | 0% | 0% | 63% | 36%
yaml | 0% | 0% | 0% | 100% | 0%
scala | 0% | 0% | 0% | 20% | 79%
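The per-extension rows apply the same categories within each file type: each cell is the share of that extension's lines of code that falls into one size category. Below is a minimal sketch, assuming a hypothetical `file_sizes` mapping from path to lines of code and the same truncated percentages as the overall bar. The sample uses three Java files from the Longest Files list, so its result is not meant to match the java row, which also covers files not listed in this report.

```python
import os
from collections import defaultdict

CATEGORIES = ("1001+", "501-1000", "201-500", "101-200", "1-100")


def size_category(loc):
    """Inclusive bounds, matching the categories listed in the Intro."""
    if loc <= 100:
        return "1-100"
    if loc <= 200:
        return "101-200"
    if loc <= 500:
        return "201-500"
    if loc <= 1000:
        return "501-1000"
    return "1001+"


def shares_per_extension(file_sizes):
    """Return {extension: {category: whole-percent share of that extension's lines}}.

    Shares are truncated, not rounded, so a row may sum to less than 100.
    """
    loc = defaultdict(lambda: defaultdict(int))  # extension -> category -> lines
    for path, n in file_sizes.items():
        ext = os.path.splitext(path)[1].lstrip(".")
        loc[ext][size_category(n)] += n
    table = {}
    for ext, by_cat in loc.items():
        total = sum(by_cat.values())
        table[ext] = {c: (by_cat[c] * 100 // total if total else 0) for c in CATEGORIES}
    return table


# Sample input: line counts of three Java files from the Longest Files list.
sample = {
    "GlueCustomConnectors/development/Spark/SparkConnectorCSV.java": 162,
    "GlueCustomConnectors/development/Spark/MinimalSparkConnector.java": 120,
    "GlueCustomConnectors/gluescripts/withConnection/SparkSnowflake.java": 23,
}
print(shares_per_extension(sample)["java"])
```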
File Size per Logical Decomposition
Decomposition: primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
utilities/Hive_metastore_migration/src | 84% | 0% | 0% | 0% | 15%
GlueCustomConnectors/development/Spark | 0% | 0% | 0% | 47% | 52%
utilities/sagemaker_notebook_automation | 0% | 0% | 0% | 100% | 0%
GlueCustomConnectors/localValidation | 0% | 0% | 0% | 18% | 81%
GlueCustomConnectors/development/Athena | 0% | 0% | 0% | 55% | 44%
GlueCustomConnectors/gluescripts/withoutConnection | 0% | 0% | 0% | 0% | 100%
GlueCustomConnectors/gluescripts/withConnection | 0% | 0% | 0% | 0% | 100%
utilities/Crawler_undo_redo/src | 0% | 0% | 0% | 0% | 100%
GlueCustomConnectors/glueJobValidation | 0% | 0% | 0% | 0% | 100%
utilities/use_only_IAM_access_controls | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | # lines | # units
hive_metastore_migration.py
in utilities/Hive_metastore_migration/src
1051 92
SparkConnectorMySQL.scala
in GlueCustomConnectors/development/Spark
199 17
glue_sagemaker_notebook_cn.yaml
in utilities/sagemaker_notebook_automation
177 -
glue_sagemaker_notebook.yaml
in utilities/sagemaker_notebook_automation
177 -
SparkConnectorCSV.java
in GlueCustomConnectors/development/Spark
162 14
MinimalSparkConnector.java
in GlueCustomConnectors/development/Spark
120 14
DbtableQueryTest.scala
in GlueCustomConnectors/localValidation
112 2
ExampleMetadataHandler.java
in GlueCustomConnectors/development/Athena/src/main/java/com/amazonaws/athena/connectors/example
106 7
MinimalSpark3Connector.scala
in GlueCustomConnectors/development/Spark/glue-3.0
97 17
export_from_datacatalog.py
in utilities/Hive_metastore_migration/src
96 6
glue_job_validation_update.py
in GlueCustomConnectors/glueJobValidation
96 -
import_into_datacatalog.py
in utilities/Hive_metastore_migration/src
95 5
update_permission.py
in utilities/use_only_IAM_access_controls
93 -
ExampleRecordHandler.java
in GlueCustomConnectors/development/Athena/src/main/java/com/amazonaws/athena/connectors/example
84 4
crawler_undo.py
in utilities/Crawler_undo_redo/src
80 4
scripts_utils.py
in utilities/Crawler_undo_redo/src
74 8
DataSinkTest.scala
in GlueCustomConnectors/localValidation
74 4
TPCDSUtils.scala
in GlueCustomConnectors/development/Spark/glue-3.0/tpcds-custom-connector-for-glue3.0/src/main/scala/com/amazonaws/services/glue/marketplace/connector...
64 4
GlueJobValidationDataSchemaTest.scala
in GlueCustomConnectors/development/Spark/glue-3.0/tpcds-custom-connector-for-glue3.0/jobvalidation/scala
62 1
TPCDSScan.scala
in GlueCustomConnectors/development/Spark/glue-3.0/tpcds-custom-connector-for-glue3.0/src/main/scala/com/amazonaws/services/glue/marketplace/connector...
59 2
GlueJobValidationDataPartitioningTest.scala
in GlueCustomConnectors/development/Spark/glue-3.0/tpcds-custom-connector-for-glue3.0/jobvalidation/scala
57 1
DataTypeMappingTest.scala
in GlueCustomConnectors/localValidation
54 2
JDBCUrlTest.scala
in GlueCustomConnectors/localValidation
50 2
ReadWriteTest.scala
in GlueCustomConnectors/localValidation
49 2
ColumnPartitioningTest.scala
in GlueCustomConnectors/localValidation
49 2
DataSourceTest.scala
in GlueCustomConnectors/localValidation
48 2
CatalogConnectionTest.scala
in GlueCustomConnectors/localValidation
48 2
SecretsManagerTest.scala
in GlueCustomConnectors/localValidation
47 2
crawler_redo_from_backup.py
in utilities/Crawler_undo_redo/src
46 3
FilterPredicateTest.scala
in GlueCustomConnectors/localValidation
46 2
TPCDSPartitionReader.scala
in GlueCustomConnectors/development/Spark/glue-3.0/tpcds-custom-connector-for-glue3.0/src/main/scala/com/amazonaws/services/glue/marketplace/connector...
45 2
DataSchemaTest.scala
in GlueCustomConnectors/localValidation
44 2
JDBCSalesforce.scala
in GlueCustomConnectors/gluescripts/withoutConnection
37 1
jdbc-script-example.py
in GlueCustomConnectors/gluescripts/withoutConnection
34 -
JDBCSalesforce.scala
in GlueCustomConnectors/gluescripts/withConnection
32 1
jdbc-script-example.py
in GlueCustomConnectors/gluescripts/withConnection
30 -
MinimalSparkConnectorTest.scala
in GlueCustomConnectors/development/Spark
28 1
MinimalSparkConnectorTestWithConnection.scala
in GlueCustomConnectors/development/Spark
28 1
SparkSnowflake.scala
in GlueCustomConnectors/gluescripts/withoutConnection
28 1
SparkSnowflake.scala
in GlueCustomConnectors/gluescripts/withConnection
27 1
JDBCSalesforce.java
in GlueCustomConnectors/gluescripts/withoutConnection
27 1
GlueJobValidationDataSourceTest.scala
in GlueCustomConnectors/development/Spark/glue-3.0/tpcds-custom-connector-for-glue3.0/jobvalidation/scala
26 1
JDBCSalesforce.java
in GlueCustomConnectors/gluescripts/withConnection
24 1
SparkSnowflake.java
in GlueCustomConnectors/gluescripts/withoutConnection
24 1
spark-script-example.py
in GlueCustomConnectors/gluescripts/withoutConnection
24 -
SparkSnowflake.java
in GlueCustomConnectors/gluescripts/withConnection
23 1
spark-script-example.py
in GlueCustomConnectors/gluescripts/withConnection
23 -
TPCDSTable.scala
in GlueCustomConnectors/development/Spark/glue-3.0/tpcds-custom-connector-for-glue3.0/src/main/scala/com/amazonaws/services/glue/marketplace/connector...
21 1
AthenaCloudwatch.scala
in GlueCustomConnectors/gluescripts/withConnection
21 1
AthenaCloudwatch.java
in GlueCustomConnectors/gluescripts/withConnection
21 1
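A top-N list like this one is a partial sort by line count. The sketch below uses `heapq.nlargest` from the Python standard library; `longest_files` and its input mapping are hypothetical, and files with equal line counts are not necessarily ordered the way the report orders them.

```python
import heapq


def longest_files(file_sizes, n=50):
    """Return up to n (path, lines) pairs with the most lines, largest first.

    file_sizes is a hypothetical mapping from file path to lines of code.
    """
    return heapq.nlargest(n, file_sizes.items(), key=lambda item: item[1])


# Line counts copied from the list above.
sizes = {
    "utilities/Hive_metastore_migration/src/hive_metastore_migration.py": 1051,
    "GlueCustomConnectors/development/Spark/SparkConnectorMySQL.scala": 199,
    "utilities/sagemaker_notebook_automation/glue_sagemaker_notebook.yaml": 177,
    "utilities/Crawler_undo_redo/src/crawler_undo.py": 80,
}
for path, lines in longest_files(sizes, n=3):
    print(lines, path)
```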
Files With Most Units (Top 20)
File | # lines | # units
hive_metastore_migration.py
in utilities/Hive_metastore_migration/src
1051 92
SparkConnectorMySQL.scala
in GlueCustomConnectors/development/Spark
199 17
MinimalSpark3Connector.scala
in GlueCustomConnectors/development/Spark/glue-3.0
97 17
MinimalSparkConnector.java
in GlueCustomConnectors/development/Spark
120 14
SparkConnectorCSV.java
in GlueCustomConnectors/development/Spark
162 14
scripts_utils.py
in utilities/Crawler_undo_redo/src
74 8
ExampleMetadataHandler.java
in GlueCustomConnectors/development/Athena/src/main/java/com/amazonaws/athena/connectors/example
106 7
export_from_datacatalog.py
in utilities/Hive_metastore_migration/src
96 6
import_into_datacatalog.py
in utilities/Hive_metastore_migration/src
95 5
crawler_undo.py
in utilities/Crawler_undo_redo/src
80 4
TPCDSUtils.scala
in GlueCustomConnectors/development/Spark/glue-3.0/tpcds-custom-connector-for-glue3.0/src/main/scala/com/amazonaws/services/glue/marketplace/connector...
64 4
ExampleRecordHandler.java
in GlueCustomConnectors/development/Athena/src/main/java/com/amazonaws/athena/connectors/example
84 4
DataSinkTest.scala
in GlueCustomConnectors/localValidation
74 4
crawler_redo_from_backup.py
in utilities/Crawler_undo_redo/src
46 3
TPCDSScan.scala
in GlueCustomConnectors/development/Spark/glue-3.0/tpcds-custom-connector-for-glue3.0/src/main/scala/com/amazonaws/services/glue/marketplace/connector...
59 2
TPCDSPartitionReader.scala
in GlueCustomConnectors/development/Spark/glue-3.0/tpcds-custom-connector-for-glue3.0/src/main/scala/com/amazonaws/services/glue/marketplace/connector...
45 2
DataSchemaTest.scala
in GlueCustomConnectors/localValidation
44 2
JDBCUrlTest.scala
in GlueCustomConnectors/localValidation
50 2
DataTypeMappingTest.scala
in GlueCustomConnectors/localValidation
54 2
ReadWriteTest.scala
in GlueCustomConnectors/localValidation
49 2
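This section does not define a unit; most likely it means a function or method. Under that assumption, the units in a Python file can be counted with the standard `ast` module, as sketched below. The sample source and its names (`export_table`, `Migrator`) are hypothetical, and the tool behind this report may treat nested functions or lambdas differently.

```python
import ast


def count_units(source):
    """Count function and method definitions (sync and async) in Python source.

    Assumption: a "unit" is a function or method. Lambdas are not counted;
    nested functions are, because ast.walk visits every node in the tree.
    """
    tree = ast.parse(source)
    return sum(isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
               for node in ast.walk(tree))


# Hypothetical module: one top-level function and a class with two methods.
example = '''
def export_table(name):
    return name.upper()

class Migrator:
    def __init__(self, conn):
        self.conn = conn

    async def run(self):
        return await self.conn.fetch()
'''
print(count_units(example))
```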
Files With Long Lines (Top 16)

There are 16 files with lines longer than 120 characters. In total, there are 64 long lines.

File | # lines | # units | # long lines
hive_metastore_migration.py
in utilities/Hive_metastore_migration/src
1051 92 9
import_into_datacatalog.py
in utilities/Hive_metastore_migration/src
95 5 8
crawler_undo.py
in utilities/Crawler_undo_redo/src
80 4 8
scripts_utils.py
in utilities/Crawler_undo_redo/src
74 8 7
glue_sagemaker_notebook_cn.yaml
in utilities/sagemaker_notebook_automation
177 - 7
glue_sagemaker_notebook.yaml
in utilities/sagemaker_notebook_automation
177 - 6
export_from_datacatalog.py
in utilities/Hive_metastore_migration/src
96 6 4
MinimalSpark3Connector.scala
in GlueCustomConnectors/development/Spark/glue-3.0
97 17 4
crawler_redo_from_backup.py
in utilities/Crawler_undo_redo/src
46 3 2
TPCDSPartitionReaderFactory.scala
in GlueCustomConnectors/development/Spark/glue-3.0/tpcds-custom-connector-for-glue3.0/src/main/scala/com/amazonaws/services/glue/marketplace/connector...
10 1 2
ExampleRecordHandler.java
in GlueCustomConnectors/development/Athena/src/main/java/com/amazonaws/athena/connectors/example
84 4 2
update_permission.py
in utilities/use_only_IAM_access_controls
93 - 1
TPCDSSource.scala
in GlueCustomConnectors/development/Spark/glue-3.0/tpcds-custom-connector-for-glue3.0/src/main/scala/com/amazonaws/services/glue/marketplace/connector...
19 1 1
TPCDSScan.scala
in GlueCustomConnectors/development/Spark/glue-3.0/tpcds-custom-connector-for-glue3.0/src/main/scala/com/amazonaws/services/glue/marketplace/connector...
59 2 1
TPCDSPartitionReader.scala
in GlueCustomConnectors/development/Spark/glue-3.0/tpcds-custom-connector-for-glue3.0/src/main/scala/com/amazonaws/services/glue/marketplace/connector...
45 2 1
ExampleMetadataHandler.java
in GlueCustomConnectors/development/Athena/src/main/java/com/amazonaws/athena/connectors/example
106 7 1
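The long-line check compares each line's length with a 120-character threshold. Below is a minimal sketch of that check, assuming length is measured in characters with the line terminator excluded; `count_long_lines` and `long_lines_per_file` are hypothetical helpers, and the extension filter simply mirrors the file types listed in this report.

```python
import os


def count_long_lines(text, limit=120):
    """Count lines longer than `limit` characters (line terminators excluded)."""
    return sum(1 for line in text.splitlines() if len(line) > limit)


def long_lines_per_file(root, limit=120, extensions=(".py", ".java", ".scala", ".yaml")):
    """Walk `root` and map each matching file with long lines to its count, largest first."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    n = count_long_lines(f.read(), limit)
                if n:
                    counts[path] = n
    return dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))
```

Calling `long_lines_per_file(".")` from the root of a checkout returns a mapping from file path to its number of long lines, ordered from the most to the fewest; files without long lines are left out, as in the table above.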