aws / sagemaker-spark
File Size

The distribution of file sizes, measured in lines of code.

Intro
  • File size measurements show how file sizes are distributed across the codebase.
  • Files are classified in five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such proportions that it is hard to work with.
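The size ranges listed above can be expressed as a small lookup. This is a minimal sketch, not the report tool's actual implementation; the names `UPPER_BOUNDS`, `LABELS`, and `size_category` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first four size ranges listed above.
# Anything above the last bound falls into the "very long" category.
# (Hypothetical names; the report does not show how it classifies files.)
UPPER_BOUNDS = [100, 200, 500, 1000]
LABELS = ["very small", "small", "medium size", "long", "very long"]


def size_category(lines_of_code: int) -> str:
    """Return the size category for a file with the given line count."""
    if lines_of_code < 1:
        raise ValueError("expected at least one line of code")
    # bisect_left returns the index of the first bound >= lines_of_code,
    # so a file with exactly 100 lines is still "very small".
    return LABELS[bisect_left(UPPER_BOUNDS, lines_of_code)]
```

For example, `size_category(772)` returns `"long"` and `size_category(413)` returns `"medium size"`.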
File Size Overall
  • There are 69 files with 7,304 lines of code.
    • 0 very long files (0 lines of code)
    • 3 long files (1,873 lines of code)
    • 8 medium size files (2,363 lines of code)
    • 13 small files (1,748 lines of code)
    • 45 very small files (1,320 lines of code)
Share of lines of code per size category:
1001+ | 501-1000 | 201-500 | 101-200 | 1-100
0% | 25% | 32% | 23% | 18%
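The percentages above follow from the per-category line counts. The sketch below reproduces them under one assumption: that shares are truncated to whole percents rather than rounded. That assumption is inferred from the figures (rounding would give 26% and 24% for two of the categories) and is not documented behavior of the report tool. The names `lines_per_category` and `format_shares` are hypothetical.

```python
# Lines of code per size category, copied from the totals listed above.
lines_per_category = {
    "1001+": 0,
    "501-1000": 1873,
    "201-500": 2363,
    "101-200": 1748,
    "1-100": 1320,
}


def format_shares(categories: dict) -> str:
    """Format each category's share of the total as a whole percent."""
    total = sum(categories.values())
    # Integer floor division truncates toward zero for these non-negative
    # counts. Truncation (not rounding) is an assumption inferred from
    # the figures in the report.
    return " | ".join(f"{100 * loc // total}%" for loc in categories.values())


print(format_shares(lines_per_category))
```

The line counts sum to 7,304, which matches the overall total stated in the report.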


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 0% | 37% | 31% | 20% | 11%
scala | 0% | 17% | 37% | 24% | 20%
proto | 0% | 0% | 0% | 100% | 0%
yml | 0% | 0% | 0% | 0% | 100%
sbt | 0% | 0% | 0% | 0% | 100%
java | 0% | 0% | 0% | 0% | 100%
cfg | 0% | 0% | 0% | 0% | 100%
in | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
sagemaker-pyspark-sdk/src/sagemaker_pyspark/algorithms | 0% | 57% | 28% | 13% | <1%
sagemaker-spark-sdk/src/main/scala | 0% | 17% | 37% | 24% | 21%
sagemaker-pyspark-sdk/src/sagemaker_pyspark | 0% | 0% | 51% | 25% | 23%
protobuf | 0% | 0% | 0% | 100% | 0%
sagemaker-pyspark-sdk/src/sagemaker_pyspark/transformation | 0% | 0% | 0% | 66% | 33%
sagemaker-pyspark-sdk | 0% | 0% | 0% | 0% | 100%
ROOT | 0% | 0% | 0% | 0% | 100%
sagemaker-spark-sdk | 0% | 0% | 0% | 0% | 100%
sagemaker-spark-sdk/project | 0% | 0% | 0% | 0% | 100%
sagemaker-pyspark-sdk/src | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | Location | # lines | # units
LinearLearnerSageMakerEstimator.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/algorithms | 772 | 95
LinearLearnerSageMakerEstimator.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/algorithms | 599 | 13
XGBoostSageMakerEstimator.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/algorithms | 502 | 73
FactorizationMachinesSageMakerEstimator.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/algorithms | 413 | 56
FactorizationMachinesSageMakerEstimator.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/algorithms | 361 | 2
XGBoostSageMakerEstimator.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/algorithms | 322 | 3
SageMakerModel.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk | 320 | 8
SageMakerEstimator.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk | 291 | 8
KMeansSageMakerEstimator.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/algorithms | 220 | 27
SageMakerModel.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 219 | 25
SageMakerEstimator.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 217 | 40
deserializers.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/transformation/deserializers | 179 | 32
KMeansSageMakerEstimator.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/algorithms | 164 | 2
LDASageMakerEstimator.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/algorithms | 157 | 17
PCASageMakerEstimator.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/algorithms | 152 | 15
ProtobufConverter.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/protobuf | 150 | 10
RequestBatchIterator.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/transformation/util | 146 | 5
LDASageMakerEstimator.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/algorithms | 136 | 2
PCASageMakerEstimator.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/algorithms | 133 | 3
ProtobufResponseRowDeserializer.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/transformation/deserializers | 112 | 4
NamePolicy.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 107 | 22
wrapper.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 106 | 16
record.proto | protobuf | 104 | -
record-2.5.proto | protobuf | 102 | -
ImageURIProvider.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/algorithms | 93 | 2
setup.py | sagemaker-pyspark-sdk | 91 | 2
DataUploader.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/internal | 88 | 8
build.sbt | sagemaker-spark-sdk | 84 | -
SageMakerResourceCleanup.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk | 64 | 5
serializers.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/transformation/serializers | 59 | 10
NamePolicy.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk | 56 | -
SageMakerResourceCleanup.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 47 | 9
IAMRoleResource.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 46 | 7
SageMakerAlgorithmParams.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/algorithms | 42 | 3
__init__.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 40 | 1
buildspec.yml | root | 38 | -
RecordIOOutputFormat.java | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/protobuf | 38 | 4
SageMakerProtobufWriter.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/protobuf | 36 | 4
S3Resources.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 35 | 6
SageMakerProtobufFileFormat.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/protobuf | 34 | 2
buildspec-release.yml | root | 33 | -
LibSVMResponseRowDeserializer.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/transformation/deserializers | 32 | 2
SageMakerClients.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 31 | 6
LibSVMRequestRowSerializer.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/transformation/serializers | 29 | 3
UnlabeledCSVRequestRowSerializer.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/transformation/serializers | 28 | 4
UnlabeledLibSVMRequestRowSerializer.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/transformation/serializers | 25 | 3
SchemaValidators.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/transformation/serializers | 23 | 2
__init__.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/transformation/deserializers | 23 | -
S3Resource.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk | 20 | 1
ProtobufRequestRowSerializer.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/transformation/serializers | 20 | 2
Files With Most Units (Top 20)
File | Location | # lines | # units
LinearLearnerSageMakerEstimator.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/algorithms | 772 | 95
XGBoostSageMakerEstimator.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/algorithms | 502 | 73
FactorizationMachinesSageMakerEstimator.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/algorithms | 413 | 56
SageMakerEstimator.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 217 | 40
deserializers.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/transformation/deserializers | 179 | 32
KMeansSageMakerEstimator.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/algorithms | 220 | 27
SageMakerModel.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 219 | 25
NamePolicy.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 107 | 22
LDASageMakerEstimator.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/algorithms | 157 | 17
wrapper.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 106 | 16
PCASageMakerEstimator.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/algorithms | 152 | 15
LinearLearnerSageMakerEstimator.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/algorithms | 599 | 13
ProtobufConverter.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/protobuf | 150 | 10
serializers.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark/transformation/serializers | 59 | 10
SageMakerResourceCleanup.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 47 | 9
SageMakerModel.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk | 320 | 8
SageMakerEstimator.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk | 291 | 8
DataUploader.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/internal | 88 | 8
IAMRoleResource.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 46 | 7
S3Resources.py | sagemaker-pyspark-sdk/src/sagemaker_pyspark | 35 | 6
Files With Long Lines (Top 7)

There are 7 files with lines longer than 120 characters. In total, there are 9 long lines.

File | Location | # lines | # units | # long lines
buildspec.yml | root | 38 | - | 3
SageMakerResourceCleanup.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk | 64 | 5 | 1
LinearLearnerSageMakerEstimator.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/algorithms | 599 | 13 | 1
FactorizationMachinesSageMakerEstimator.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/algorithms | 361 | 2 | 1
SageMakerEstimator.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk | 291 | 8 | 1
SageMakerSparkSDKException.scala | sagemaker-spark-sdk/src/main/scala/com/amazonaws/services/sagemaker/sparksdk/exceptions | 10 | - | 1
buildspec-deploy.yml | root | 17 | - | 1
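
A check like the long-line count above can be approximated with a short script. This is a sketch under stated assumptions, not the report tool's method: only the 120-character threshold comes from the report. How the tool treats tabs, trailing whitespace, or non-UTF-8 files is unknown, and the function names here are hypothetical.

```python
from pathlib import Path

MAX_LINE_LENGTH = 120  # threshold stated in the report


def count_long_lines(text: str, limit: int = MAX_LINE_LENGTH) -> int:
    """Count lines whose length (excluding the newline) exceeds the limit."""
    return sum(1 for line in text.splitlines() if len(line) > limit)


def files_with_long_lines(root: str, limit: int = MAX_LINE_LENGTH) -> dict:
    """Map each file under root to its number of long lines (if any)."""
    result = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            # Assumption: skip binary or unreadable files.
            continue
        long_lines = count_long_lines(text, limit)
        if long_lines:
            result[str(path)] = long_lines
    return result
```

Calling `files_with_long_lines(".")` from a checkout of a repository returns only the files that contain at least one line over the limit. The result may differ from the table above if the report tool filters files or measures line length differently.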