aws-samples / aws-big-data-blog
File Size

The distribution of file sizes (measured in lines of code).

Intro
  • File size measurements show how file sizes are distributed across the codebase.
  • Files are classified into five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown so large that it is hard to work with.
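The size categories above can be sketched as a small lookup. This is a minimal illustration, not the report tool's implementation: the function name `classify` and the `BUCKETS` table are hypothetical, and how the tool counts "lines of code" (for example, whether blank or comment lines are excluded) is not stated in the report.

```python
# Bucket boundaries taken from the report; labels follow its wording.
# Each entry is (exclusive lower bound, label), checked from largest to smallest.
BUCKETS = [
    (1000, "very long"),   # 1001+
    (500, "long"),         # 501-1000
    (200, "medium size"),  # 201-500
    (100, "small"),        # 101-200
    (0, "very small"),     # 1-100
]


def classify(lines_of_code: int) -> str:
    """Return the size category for a file with the given line count."""
    for lower_bound, label in BUCKETS:
        if lines_of_code > lower_bound:
            return label
    raise ValueError("a file must have at least one line of code")


if __name__ == "__main__":
    # Line counts taken from the "Longest Files" table in this report.
    for name, loc in [
        ("template.yml", 1024),
        ("addpartitions.hql", 365),
        ("MCAWS.java", 193),
        ("HBaseUtils.java", 95),
    ]:
        print(f"{name}: {classify(loc)}")
```

Checking the bounds from largest to smallest keeps each category's upper limit implicit, so the table only has to list one number per category.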
File Size Overall
  • There are 166 files with 11,873 lines of code.
    • 1 very long file (1,024 lines of code)
    • 0 long files (0 lines of code)
    • 11 medium size files (2,945 lines of code)
    • 23 small files (3,134 lines of code)
    • 131 very small files (4,770 lines of code)
1001+: 8% | 501-1000: 0% | 201-500: 24% | 101-200: 26% | 1-100: 40%
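The percentages above can be reproduced from the line counts listed under "File Size Overall". They are shares of lines of code, not of file count: the single very long file is 1 of 166 files but holds 1,024 of 11,873 lines. The sketch below assumes the report truncates rather than rounds. That is an inference from the numbers (8.6% is shown as 8%), not something the report states, and the names used are hypothetical.

```python
# Lines of code per size category, as listed in the report.
LINES_PER_CATEGORY = {
    "1001+": 1024,
    "501-1000": 0,
    "201-500": 2945,
    "101-200": 3134,
    "1-100": 4770,
}


def shares(lines_per_category: dict) -> dict:
    """Return each category's share of total lines of code as a whole percent."""
    total = sum(lines_per_category.values())
    # Integer division truncates toward zero (assumed to match the report).
    return {name: loc * 100 // total for name, loc in lines_per_category.items()}


if __name__ == "__main__":
    print(sum(LINES_PER_CATEGORY.values()))  # total lines of code
    print(shares(LINES_PER_CATEGORY))
```

Because every share is truncated, the displayed percentages add up to 98% rather than 100%.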


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
yml | 95% | 0% | 0% | 0% | 4%
js | 0% | 0% | 49% | 21% | 29%
java | 0% | 0% | 14% | 42% | 43%
html | 0% | 0% | 61% | 0% | 38%
hql | 0% | 0% | 79% | 0% | 20%
py | 0% | 0% | 23% | 13% | 62%
R | 0% | 0% | 0% | 96% | 3%
scala | 0% | 0% | 0% | 0% | 100%
spec | 0% | 0% | 0% | 0% | 100%
yaml | 0% | 0% | 0% | 0% | 100%
css | 0% | 0% | 0% | 0% | 100%
q | 0% | 0% | 0% | 0% | 100%
gvy | 0% | 0% | 0% | 0% | 100%
sbt | 0% | 0% | 0% | 0% | 100%
pp | 0% | 0% | 0% | 0% | 100%
SQL | 0% | 0% | 0% | 0% | 100%
jade | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
BDB-701-Moving-From-Transactional-to-Stateful-Batch-Processing | 65% | 0% | 13% | 7% | 13%
aws-blog-kinesis-data-visualization | 0% | 0% | 43% | 30% | 26%
aws-blog-lambda-map-reduce-streaming | 0% | 0% | 55% | 25% | 19%
aws-blog-spark-parquet-conversion | 0% | 0% | 85% | 0% | 14%
aws-blog-hbase-on-emr | 0% | 0% | 24% | 35% | 40%
aws-blog-jobserver-emr | 0% | 0% | 50% | 0% | 49%
aws-blog-multi-az-multi-region-redshift | 0% | 0% | 100% | 0% | 0%
aws-blog-event-driven-batch-analytics | 0% | 0% | 69% | 0% | 30%
aws-blog-sparkr-geospatial-analysis | 0% | 0% | 0% | 96% | 3%
aws-blog-kinesis-beanstalk-workers | 0% | 0% | 0% | 52% | 47%
aws-blog-kinesis-storm-clickstream-app | 0% | 0% | 0% | 56% | 43%
aws-blog-mirth-healthcare-hub | 0% | 0% | 0% | 61% | 38%
aws-blog-s3-index-with-lambda-ddb | 0% | 0% | 0% | 37% | 62%
aws-blog-titan-graph-database | 0% | 0% | 0% | 100% | 0%
aws-blog-firehose-lambda-elasticsearch-near-real-time-discovery-platform | 0% | 0% | 0% | 31% | 68%
aws-blog-real-time-in-memory-oltp-and-analytics-with-apache-ignite | 0% | 0% | 0% | 35% | 64%
aws-blog-firehose-lambda | 0% | 0% | 0% | 68% | 31%
aws-blog-vpcflowlogs-athena-quicksight | 0% | 0% | 0% | 0% | 100%
aws-blog-bigtop-application-emr | 0% | 0% | 0% | 0% | 100%
aws-blog-athena-importing-hive-metastores | 0% | 0% | 0% | 0% | 100%
aws-blog-machine-learning-with-spark | 0% | 0% | 0% | 0% | 100%
aws-blog-strata2016 | 0% | 0% | 0% | 0% | 100%
aws-blog-sparkstreaming-from-kafka | 0% | 0% | 0% | 0% | 100%
aws-blog-retain-redshift-stl | 0% | 0% | 0% | 0% | 100%
aws-blog-elasticsearch-cascading-commoncrawl | 0% | 0% | 0% | 0% | 100%
aws-blog-nodejs-on-emr | 0% | 0% | 0% | 0% | 100%
aws-blog-athena-genomics | 0% | 0% | 0% | 0% | 100%
aws-blog-end-to-end-emrencyrption | 0% | 0% | 0% | 0% | 100%
aws-blog-iot-athena-quicksight-bi | 0% | 0% | 0% | 0% | 100%
aws-blog-emrencyrption | 0% | 0% | 0% | 0% | 100%
aws-blog-analyzing-using-ipython-notebook | 0% | 0% | 0% | 0% | 100%
aws-blog-real-time-clickstream | 0% | 0% | 0% | 0% | 100%
aws-blog-sqoop | 0% | 0% | 0% | 0% | 100%
aws-blog-redshift-python-udf-tutorial | 0% | 0% | 0% | 0% | 100%
aws-blog-campanile | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | # lines | # units
template.yml
in BDB-701-Moving-From-Transactional-to-Stateful-Batch-Processing
1024 -
addpartitions.hql
in aws-blog-spark-parquet-conversion
365 -
TrackballControls.js
in aws-blog-kinesis-data-visualization/Globe/public/js
337 17
OrbitControls.js
in aws-blog-kinesis-data-visualization/Globe/public/js
334 19
KinesisMessageModel.java
in aws-blog-hbase-on-emr/hbase-connector/src/main/java/com/amazonaws/hbase/kinesis
300 41
Quaternion.js
in aws-blog-kinesis-data-visualization/Globe/public/js
257 24
streams.js
in aws-blog-lambda-map-reduce-streaming
251 11
mltest.html
in aws-blog-jobserver-emr/html
245 -
loader.py
in aws-blog-multi-az-multi-region-redshift
219 1
scatter.html
in aws-blog-lambda-map-reduce-streaming
217 -
StateIndexerLambdaImpl.java
in BDB-701-Moving-From-Transactional-to-Stateful-Batch-Processing/lambda/src/main/java/com/amazon/aws/blog
216 7
LambdaContainer.java
in aws-blog-event-driven-batch-analytics/src/main/java/com/amazonaws/bigdatablog/edba
204 8
MCAWS.java
in aws-blog-mirth-healthcare-hub/mirth-aws-sample-app/src/main/java/org/mirth/project
193 8
SparkRGeoInt.R
in aws-blog-sparkr-geospatial-analysis
180 -
S3DataGenerator.java
in aws-blog-s3-index-with-lambda-ddb/s3-log-generator/src/main/java/com/amazonaws/bigdatablog/s3index
177 15
earth.js
in aws-blog-kinesis-data-visualization/Globe/public/js
168 13
EMRUtils.java
in aws-blog-hbase-on-emr/hbase-connector/src/main/java/com/amazonaws/hbase/kinesis/utils
167 8
SparkRGeoIntPipes.R
in aws-blog-sparkr-geospatial-analysis
164 -
RestaurantFactory.java
in aws-blog-titan-graph-database/src/main/java/com/amazonaws/bigdatablog/titanrestaurants
156 2
KinesisUtils.java
in aws-blog-hbase-on-emr/hbase-connector/src/main/java/com/amazonaws/hbase/kinesis/utils
154 9
ManagedConsumer.java
in aws-blog-kinesis-beanstalk-workers/src/main/java/com/amazonaws/services/kinesis
139 12
tweet_utils.py
in aws-blog-firehose-lambda-elasticsearch-near-real-time-discovery-platform/lambda-s3-twitter-to-es-python
127 5
KinesisRecordProcessor.java
in aws-blog-kinesis-data-visualization/KinesisApplication/src/main/java/com/amazonaws/kinesis/dataviz/kinesisclient
127 6
StreamsRecordProcessor.java
in aws-blog-real-time-in-memory-oltp-and-analytics-with-apache-ignite/src/main/java/com/amazon/dynamostreams/clientlibrary
127 7
KinesisToFirehose.java
in aws-blog-firehose-lambda/kinesisFirehose/src/main/java/com/amazonaws/proserv/lambda
124 9
KinesisConnectorExecutor.java
in aws-blog-hbase-on-emr/hbase-connector/src/main/java/com/amazonaws/hbase/kinesis
120 7
KinesisApplication.java
in aws-blog-kinesis-data-visualization/KinesisApplication/src/main/java/com/amazonaws/kinesis/dataviz/kinesisclient
118 4
earth2.js
in aws-blog-kinesis-data-visualization/Globe/public/js
117 7
PreFetcherFlatMapFunction.java
in BDB-701-Moving-From-Transactional-to-Stateful-Batch-Processing/emr/src/main/java/com/amazon/aws/blog/function
114 4
RollingCountBolt.java
in aws-blog-kinesis-storm-clickstream-app/src/main/java/KinesisStormClickstreamApp
112 10
cascade.js
in aws-blog-lambda-map-reduce-streaming
112 3
KinesisWorkerServletInitiator.java
in aws-blog-kinesis-beanstalk-workers/src/main/java/com/amazonaws/services/kinesis/beanstalk/connector
111 4
TwitterProducer.java
in aws-blog-kinesis-data-visualization/TwitterProducer/src/main/java/com/amazonaws/kinesis/dataviz/twitter
111 4
SampleTopology.java
in aws-blog-kinesis-storm-clickstream-app/src/main/java/KinesisStormClickstreamApp
110 3
wordcount.js
in aws-blog-lambda-map-reduce-streaming
106 2
twitter_stream_producer.js
in aws-blog-firehose-lambda-elasticsearch-near-real-time-discovery-platform/firehose-twitter-streaming-nodejs
98 7
HBaseUtils.java
in aws-blog-hbase-on-emr/hbase-connector/src/main/java/com/amazonaws/hbase/kinesis/utils
95 4
PointCloud.js
in aws-blog-kinesis-data-visualization/Globe/public/js
93 3
MovieRecML.scala
in aws-blog-jobserver-emr/src/main/scala/com/amazonaws/proserv/ml
93 7
HBaseEmitter.java
in aws-blog-hbase-on-emr/hbase-connector/src/main/java/com/amazonaws/hbase/connector
91 3
ProcessVendorTrasactions.java
in aws-blog-event-driven-batch-analytics/src/main/java/com/amazonaws/bigdatablog/edba/emr
91 2
index.html
in aws-blog-lambda-map-reduce-streaming
90 -
elasticsearch.spec
in aws-blog-bigtop-application-emr/bigtop-packages/src/rpm/elasticsearch/SPECS
89 -
template.yaml
in aws-blog-s3-index-with-lambda-ddb/example-indexer-app
82 -
extract_rs_query_logs.py
in aws-blog-retain-redshift-stl/scripts
81 -
index.html
in aws-blog-kinesis-storm-clickstream-app/static/visualization
79 -
VerifyIndex.java
in aws-blog-s3-index-with-lambda-ddb/s3-log-generator/src/main/java/com/amazonaws/bigdatablog/s3index
79 6
StreamSource.java
in aws-blog-hbase-on-emr/hbase-connector/src/main/java/com/amazonaws/hbase/kinesis
78 5
AmazonDynamoDBStreamstoIgnite.java
in aws-blog-real-time-in-memory-oltp-and-analytics-with-apache-ignite/src/main/java/com/amazon/dynamostreams/clientlibrary
77 2
TruncateBucket.java
in aws-blog-s3-index-with-lambda-ddb/s3-log-generator/src/main/java/com/amazonaws/bigdatablog/s3index
76 6
Files With Most Units (Top 20)
File | # lines | # units
KinesisMessageModel.java
in aws-blog-hbase-on-emr/hbase-connector/src/main/java/com/amazonaws/hbase/kinesis
300 41
Quaternion.js
in aws-blog-kinesis-data-visualization/Globe/public/js
257 24
OrbitControls.js
in aws-blog-kinesis-data-visualization/Globe/public/js
334 19
TrackballControls.js
in aws-blog-kinesis-data-visualization/Globe/public/js
337 17
S3DataGenerator.java
in aws-blog-s3-index-with-lambda-ddb/s3-log-generator/src/main/java/com/amazonaws/bigdatablog/s3index
177 15
earth.js
in aws-blog-kinesis-data-visualization/Globe/public/js
168 13
S3Object.java
in aws-blog-vpcflowlogs-athena-quicksight/PartitioningFunction/src/main/java/com/amazonaws/services/lambda/model
51 12
ManagedConsumer.java
in aws-blog-kinesis-beanstalk-workers/src/main/java/com/amazonaws/services/kinesis
139 12
PartitionType.java
in aws-blog-vpcflowlogs-athena-quicksight/PartitioningFunction/src/main/java/com/amazonaws/services/lambda/model
75 11
streams.js
in aws-blog-lambda-map-reduce-streaming
251 11
RollingCountBolt.java
in aws-blog-kinesis-storm-clickstream-app/src/main/java/KinesisStormClickstreamApp
112 10
OrderData.java
in aws-blog-real-time-in-memory-oltp-and-analytics-with-apache-ignite/src/main/java/com/amazon/dynamostreams/clientlibrary
51 10
KinesisToFirehose.java
in aws-blog-firehose-lambda/kinesisFirehose/src/main/java/com/amazonaws/proserv/lambda
124 9
KinesisUtils.java
in aws-blog-hbase-on-emr/hbase-connector/src/main/java/com/amazonaws/hbase/kinesis/utils
154 9
MCAWS.java
in aws-blog-mirth-healthcare-hub/mirth-aws-sample-app/src/main/java/org/mirth/project
193 8
EMRUtils.java
in aws-blog-hbase-on-emr/hbase-connector/src/main/java/com/amazonaws/hbase/kinesis/utils
167 8
LambdaContainer.java
in aws-blog-event-driven-batch-analytics/src/main/java/com/amazonaws/bigdatablog/edba
204 8
twitter_stream_producer.js
in aws-blog-firehose-lambda-elasticsearch-near-real-time-discovery-platform/firehose-twitter-streaming-nodejs
98 7
StateIndexerLambdaImpl.java
in BDB-701-Moving-From-Transactional-to-Stateful-Batch-Processing/lambda/src/main/java/com/amazon/aws/blog
216 7
KinesisConnectorExecutor.java
in aws-blog-hbase-on-emr/hbase-connector/src/main/java/com/amazonaws/hbase/kinesis
120 7
Files With Long Lines (Top 20)

There are 52 files with lines longer than 120 characters. In total, there are 528 long lines.
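A long-line count of this kind can be sketched as follows. The 120-character threshold comes from the report; the function name `count_long_lines` is hypothetical, and details such as how the tool treats tabs or trailing whitespace are not stated, so this sketch simply measures each line's length in characters.

```python
MAX_LINE_LENGTH = 120  # threshold stated in the report


def count_long_lines(text: str, limit: int = MAX_LINE_LENGTH) -> int:
    """Count the lines in `text` that are longer than `limit` characters."""
    # splitlines() drops the line terminators, so they do not count toward length.
    return sum(1 for line in text.splitlines() if len(line) > limit)


if __name__ == "__main__":
    sample = "\n".join([
        "x" * 120,  # exactly at the limit: not counted
        "y" * 121,  # one character over: counted
        "short line",
    ])
    print(count_long_lines(sample))
```

A file such as addpartitions.hql, listed with 365 lines and 365 long lines, is one where every line exceeds the limit.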

File | # lines | # units | # long lines
addpartitions.hql
in aws-blog-spark-parquet-conversion
365 - 365
SparkRGeoIntPipes.R
in aws-blog-sparkr-geospatial-analysis
164 - 15
SparkRGeoInt.R
in aws-blog-sparkr-geospatial-analysis
180 - 15
extract_rs_query_logs.py
in aws-blog-retain-redshift-stl/scripts
81 - 11
EMRUtils.java
in aws-blog-hbase-on-emr/hbase-connector/src/main/java/com/amazonaws/hbase/kinesis/utils
167 8 9
LambdaContainer.java
in aws-blog-event-driven-batch-analytics/src/main/java/com/amazonaws/bigdatablog/edba
204 8 9
template.yml
in BDB-701-Moving-From-Transactional-to-Stateful-Batch-Processing
1024 - 8
FlightsBatch.scala
in aws-blog-jobserver-emr/src/main/scala/com/amazonaws/proserv/blog
44 2 7
mltest.html
in aws-blog-jobserver-emr/html
245 - 7
ProcessVendorTrasactions.java
in aws-blog-event-driven-batch-analytics/src/main/java/com/amazonaws/bigdatablog/edba/emr
91 2 7
diff.q
in aws-blog-campanile/hive
7 - 4
PreFetcherFlatMapFunction.java
in BDB-701-Moving-From-Transactional-to-Stateful-Batch-Processing/emr/src/main/java/com/amazon/aws/blog/function
114 4 4
sample_addpartitions.hql
in aws-blog-athena-importing-hive-metastores
4 - 4
SampleTopology.java
in aws-blog-kinesis-storm-clickstream-app/src/main/java/KinesisStormClickstreamApp
110 3 3
StateIndexerLambdaImpl.java
in BDB-701-Moving-From-Transactional-to-Stateful-Batch-Processing/lambda/src/main/java/com/amazon/aws/blog
216 7 3
init.pp
in aws-blog-bigtop-application-emr/bigtop-deploy/puppet/modules/elasticsearch/manifests
33 - 3
TrackballControls.js
in aws-blog-kinesis-data-visualization/Globe/public/js
337 17 3
Detector.js
in aws-blog-kinesis-data-visualization/Globe/public/js
38 2 3
Quaternion.js
in aws-blog-kinesis-data-visualization/Globe/public/js
257 24 3
dummyOrderGenerator.py
in aws-blog-real-time-in-memory-oltp-and-analytics-with-apache-ignite/sample
39 - 3