Petastorm
File Size

Intro
  • File size measurements show how file sizes are distributed across the codebase.
  • Files are classified into five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
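The classification above can be sketched in a few lines of Python. This is only an illustration of the thresholds given in the legend; the function name `classify_file_size` and the `CATEGORIES` table are hypothetical and are not taken from the tool that produced this report.

```python
# Upper bounds (inclusive) and labels for each size category, following the
# report legend: 1-100, 101-200, 201-500, 501-1000, 1001+.
# CATEGORIES and classify_file_size are hypothetical names for illustration.
CATEGORIES = [
    (100, "very small"),
    (200, "small"),
    (500, "medium size"),
    (1000, "long"),
]


def classify_file_size(lines_of_code: int) -> str:
    """Return the size category for a file with the given lines of code."""
    if lines_of_code < 1:
        raise ValueError("expected at least one line of code")
    for upper_bound, label in CATEGORIES:
        if lines_of_code <= upper_bound:
            return label
    # Anything above the last bound falls in the 1001+ category.
    return "very long"


if __name__ == "__main__":
    # Line counts taken from the "Longest Files" list in this report.
    for name, loc in [("spark_dataset_converter.py", 380),
                      ("dataset_metadata.py", 200),
                      ("setup.py", 86)]:
        print(f"{name}: {loc} lines -> {classify_file_size(loc)}")
```

Under these thresholds, a 200-line file such as dataset_metadata.py counts as small, not medium size, which matches the 14 small files reported below.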
File Size Overall
  • There are 84 files with 5,134 lines of code.
    • 0 very long files (0 lines of code)
    • 0 long files (0 lines of code)
    • 4 medium size files (1,183 lines of code)
    • 14 small files (2,015 lines of code)
    • 66 very small files (1,936 lines of code)
0% | 0% | 23% | 39% | 37%
Legend:
1001+
501-1000
201-500
101-200
1-100
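As a rough check, the percentage bar can be recomputed from the line counts listed above. The sketch below assumes the report truncates each share to a whole percent (the 1-100 category holds about 37.7% of the lines but is shown as 37%) and shows non-zero shares below one percent as "<1%"; both rules are inferred from the numbers on this page, not confirmed by the tool. The helper name `share_label` is hypothetical.

```python
# Lines of code per size category, in legend order, as reported above.
lines_per_category = {
    "1001+": 0,
    "501-1000": 0,
    "201-500": 1183,
    "101-200": 2015,
    "1-100": 1936,
}

total_lines = sum(lines_per_category.values())  # 5,134 lines of code


def share_label(lines: int, total: int) -> str:
    """Format a category's share of all lines as a truncated percentage.

    Assumption: shares are truncated (not rounded), and non-zero shares
    below one percent are shown as "<1%".
    """
    if lines == 0:
        return "0%"
    percent = lines * 100 // total
    return "<1%" if percent == 0 else f"{percent}%"


bar = " | ".join(share_label(n, total_lines) for n in lines_per_category.values())
print(bar)  # 0% | 0% | 23% | 39% | 37%
```

Because of truncation, the displayed shares can add up to slightly less than 100%, as they do here (99%).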
File Size per Extension
py: 0% | 0% | 23% | 39% | 37%
File Size per Logical Decomposition
primary
petastorm: 0% | 0% | 27% | 49% | 22%
petastorm/spark: 0% | 0% | 99% | 0% | <1%
petastorm/workers_pool: 0% | 0% | 42% | 25% | 32%
petastorm/etl: 0% | 0% | 0% | 38% | 61%
petastorm/hdfs: 0% | 0% | 0% | 99% | <1%
petastorm/benchmark: 0% | 0% | 0% | 50% | 49%
petastorm/reader_impl: 0% | 0% | 0% | 52% | 47%
examples/mnist: 0% | 0% | 0% | 43% | 56%
examples/spark_dataset_converter: 0% | 0% | 0% | 57% | 42%
ROOT: 0% | 0% | 0% | 0% | 100%
examples/hello_world: 0% | 0% | 0% | 0% | 100%
examples: 0% | 0% | 0% | 0% | 100%
examples/imagenet: 0% | 0% | 0% | 0% | 100%
petastorm/tools: 0% | 0% | 0% | 0% | 100%
petastorm/gcsfs_helpers: 0% | 0% | 0% | 0% | 100%
petastorm/pyarrow_helpers: 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | # lines | # units
spark_dataset_converter.py (in petastorm/spark) | 380 | 36
reader.py (in petastorm) | 325 | 20
unischema.py (in petastorm) | 265 | 23
process_pool.py (in petastorm/workers_pool) | 213 | 14
dataset_metadata.py (in petastorm/etl) | 200 | 12
arrow_reader_worker.py (in petastorm) | 183 | 10
tf_utils.py (in petastorm) | 177 | 17
pytorch.py (in petastorm) | 171 | 14
namenode.py (in petastorm/hdfs) | 162 | 16
codecs.py (in petastorm) | 154 | 24
py_dict_reader_worker.py (in petastorm) | 147 | 12
throughput.py (in petastorm/benchmark) | 126 | 8
thread_pool.py (in petastorm/workers_pool) | 126 | 11
pytorch_shuffling_buffer.py (in petastorm/reader_impl) | 125 | 23
pytorch_example.py (in examples/mnist) | 115 | 6
fs_utils.py (in petastorm) | 115 | 9
pytorch_converter_example.py (in examples/spark_dataset_converter) | 109 | 6
ngram.py (in petastorm) | 105 | 15
petastorm_generate_metadata.py (in petastorm/etl) | 98 | 3
predicates.py (in petastorm) | 93 | 21
setup.py (in ROOT) | 86 | -
ventilator.py (in petastorm/workers_pool) | 81 | 12
copy_dataset.py (in petastorm/tools) | 77 | 4
tf_example.py (in examples/mnist) | 76 | 2
shuffling_buffer.py (in petastorm/reader_impl) | 75 | 20
rowgroup_indexing.py (in petastorm/etl) | 73 | 4
rowgroup_indexers.py (in petastorm/etl) | 73 | 14
generate_petastorm_imagenet.py (in examples/imagenet) | 67 | 3
cli.py (in petastorm/benchmark) | 67 | 3
namedtuple_gt_255_fields.py (in petastorm) | 65 | 2
generate_petastorm_mnist.py (in examples/mnist) | 63 | 3
tensorflow_converter_example.py (in examples/spark_dataset_converter) | 62 | 4
utils.py (in petastorm) | 62 | 3
dummy_reader.py (in petastorm/benchmark) | 56 | 6
gcsfs_wrapper.py (in petastorm/gcsfs_helpers) | 49 | 3
compat.py (in petastorm) | 48 | 7
selectors.py (in petastorm) | 46 | 11
metadata_util.py (in petastorm/etl) | 45 | -
weighted_sampling_reader.py (in petastorm) | 44 | 8
dummy_pool.py (in petastorm/workers_pool) | 38 | 7
transform.py (in petastorm) | 32 | 3
generate_petastorm_dataset.py (in examples/hello_world/petastorm_dataset) | 31 | 2
exec_in_new_process.py (in petastorm/workers_pool) | 31 | 1
batching_table_queue.py (in petastorm/pyarrow_helpers) | 30 | 4
spark_session_cli.py (in petastorm/tools) | 28 | 3
local_disk_cache.py (in petastorm) | 26 | 3
pyspark_hello_world.py (in examples/hello_world/petastorm_dataset) | 21 | 1
__init__.py (in petastorm/etl) | 21 | 6
tensorflow_hello_world.py (in examples/hello_world/petastorm_dataset) | 19 | 1
tensorflow_hello_world.py (in examples/hello_world/external_dataset) | 19 | 1
Files With Most Units (Top 50)
File | # lines | # units
spark_dataset_converter.py (in petastorm/spark) | 380 | 36
codecs.py (in petastorm) | 154 | 24
pytorch_shuffling_buffer.py (in petastorm/reader_impl) | 125 | 23
unischema.py (in petastorm) | 265 | 23
predicates.py (in petastorm) | 93 | 21
shuffling_buffer.py (in petastorm/reader_impl) | 75 | 20
reader.py (in petastorm) | 325 | 20
tf_utils.py (in petastorm) | 177 | 17
namenode.py (in petastorm/hdfs) | 162 | 16
ngram.py (in petastorm) | 105 | 15
rowgroup_indexers.py (in petastorm/etl) | 73 | 14
pytorch.py (in petastorm) | 171 | 14
process_pool.py (in petastorm/workers_pool) | 213 | 14
dataset_metadata.py (in petastorm/etl) | 200 | 12
py_dict_reader_worker.py (in petastorm) | 147 | 12
ventilator.py (in petastorm/workers_pool) | 81 | 12
selectors.py (in petastorm) | 46 | 11
thread_pool.py (in petastorm/workers_pool) | 126 | 11
arrow_reader_worker.py (in petastorm) | 183 | 10
fs_utils.py (in petastorm) | 115 | 9
throughput.py (in petastorm/benchmark) | 126 | 8
weighted_sampling_reader.py (in petastorm) | 44 | 8
compat.py (in petastorm) | 48 | 7
dummy_pool.py (in petastorm/workers_pool) | 38 | 7
pytorch_example.py (in examples/mnist) | 115 | 6
pytorch_converter_example.py (in examples/spark_dataset_converter) | 109 | 6
dummy_reader.py (in petastorm/benchmark) | 56 | 6
__init__.py (in petastorm/etl) | 21 | 6
tensorflow_converter_example.py (in examples/spark_dataset_converter) | 62 | 4
copy_dataset.py (in petastorm/tools) | 77 | 4
rowgroup_indexing.py (in petastorm/etl) | 73 | 4
pyarrow_serializer.py (in petastorm/reader_impl) | 19 | 4
batching_table_queue.py (in petastorm/pyarrow_helpers) | 30 | 4
generate_petastorm_mnist.py (in examples/mnist) | 63 | 3
generate_petastorm_imagenet.py (in examples/imagenet) | 67 | 3
cli.py (in petastorm/benchmark) | 67 | 3
spark_session_cli.py (in petastorm/tools) | 28 | 3
gcsfs_wrapper.py (in petastorm/gcsfs_helpers) | 49 | 3
local_disk_cache.py (in petastorm) | 26 | 3
petastorm_generate_metadata.py (in petastorm/etl) | 98 | 3
utils.py (in petastorm) | 62 | 3
transform.py (in petastorm) | 32 | 3
worker_base.py (in petastorm/workers_pool) | 11 | 3
tf_example.py (in examples/mnist) | 76 | 2
utils.py (in examples/spark_dataset_converter) | 16 | 2
generate_petastorm_dataset.py (in examples/hello_world/petastorm_dataset) | 31 | 2
generate_external_dataset.py (in examples/hello_world/external_dataset) | 18 | 2
cache.py (in petastorm) | 9 | 2
pickle_serializer.py (in petastorm/reader_impl) | 6 | 2
arrow_table_serializer.py (in petastorm/reader_impl) | 12 | 2