uber / petastorm File Size

Intro

File size measurements show how the code is distributed across files of different sizes.
Files are classified into five categories based on their size in lines of code: 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), and 1001+ (very long files).
It is good practice to keep files small. Long files may become "bloaters": code that has grown to such proportions that it is hard to work with.
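
The classification above can be sketched as a small lookup. This is a minimal illustration, not the report tool's own implementation: the function name `classify_file_size` and the `BUCKETS` table are hypothetical, and only the boundaries come from the categories listed above.

```python
# Upper bound (in lines of code) and label for each category in the report.
# Anything above the last bound falls into the "very long" category (1001+).
BUCKETS = [
    (100, "very small"),
    (200, "small"),
    (500, "medium size"),
    (1000, "long"),
]


def classify_file_size(lines_of_code: int) -> str:
    """Return the size category for a file with the given number of lines."""
    if lines_of_code < 1:
        raise ValueError("lines_of_code must be at least 1")
    for upper_bound, label in BUCKETS:
        if lines_of_code <= upper_bound:
            return label
    return "very long"
```

For example, a file with 403 lines falls into the 201-500 range and is classified as medium size, while a file with 98 lines is very small.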

File Size Overall

There are 56 files with 4,596 lines of code.

0 very long files (0 lines of code)
0 long files (0 lines of code)
7 medium size files (1,938 lines of code)
9 small files (1,256 lines of code)
40 very small files (1,402 lines of code)
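
As a quick consistency check, the per-category figures can be summed and turned into shares of the total. The sketch below copies the counts from the list above; the names `overall` and `share_of_lines` are illustrative only.

```python
# (number of files, lines of code) per category, copied from the list above.
overall = {
    "very long": (0, 0),
    "long": (0, 0),
    "medium size": (7, 1938),
    "small": (9, 1256),
    "very small": (40, 1402),
}

total_files = sum(files for files, _ in overall.values())
total_lines = sum(lines for _, lines in overall.values())


def share_of_lines(category: str) -> float:
    """Percentage of all lines of code that sit in the given category."""
    return 100.0 * overall[category][1] / total_lines


for category in overall:
    print(f"{category}: {share_of_lines(category):.1f}% of lines")
```

The totals match the 56 files and 4,596 lines of code reported above. The 7 medium size files hold roughly 42% of all lines, even though they are only one eighth of the files.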

Legend: 1001+ | 501-1000 | 201-500 | 101-200 | 1-100

File Size per Extension

[Chart: file size distribution per extension, by size category (1001+, 501-1000, 201-500, 101-200, 1-100)]

File Size per Logical Decomposition

primary

[Chart: file size distribution per component, by size category (1001+, 501-1000, 201-500, 101-200, 1-100)]

Longest Files (Top 50)

File	# lines	# units
spark_dataset_converter.py in petastorm/spark	403	37
reader.py in petastorm	381	21
unischema.py in petastorm	266	23
pytorch.py in petastorm	248	19
arrow_reader_worker.py in petastorm	225	11
process_pool.py in petastorm/workers_pool	214	14
dataset_metadata.py in petastorm/etl	201	12
tf_utils.py in petastorm	180	17
namenode.py in petastorm/hdfs	159	16
py_dict_reader_worker.py in petastorm	155	12
codecs.py in petastorm	154	24
thread_pool.py in petastorm/workers_pool	150	13
throughput.py in petastorm/benchmark	124	8
pytorch_shuffling_buffer.py in petastorm/reader_impl	123	22
fs_utils.py in petastorm	106	9
ngram.py in petastorm	105	15
petastorm_generate_metadata.py in petastorm/etl	98	3
predicates.py in petastorm	93	21
setup.py in root	92	-
ventilator.py in petastorm/workers_pool	87	12
copy_dataset.py in petastorm/tools	77	4
shuffling_buffer.py in petastorm/reader_impl	75	20
rowgroup_indexers.py in petastorm/etl	73	14
rowgroup_indexing.py in petastorm/etl	71	4
namedtuple_gt_255_fields.py in petastorm	65	2
cli.py in petastorm/benchmark	63	3
utils.py in petastorm	61	3
dummy_reader.py in petastorm/benchmark	56	6
weighted_sampling_reader.py in petastorm	48	10
selectors.py in petastorm	46	11
metadata_util.py in petastorm/etl	45	-
local_disk_cache.py in petastorm	43	3
dummy_pool.py in petastorm/workers_pool	38	7
legacy.py in petastorm/etl	36	3
transform.py in petastorm	32	3
batching_table_queue.py in petastorm/pyarrow_helpers	30	4
spark_session_cli.py in petastorm/tools	28	3
__init__.py in petastorm/etl	21	6
spark_utils.py in petastorm	18	1
exec_in_new_process_entrypoint.py in petastorm/workers_pool	18	-
exec_in_new_process.py in petastorm/workers_pool	18	1
generator.py in petastorm	15	1
arrow_table_serializer.py in petastorm/reader_impl	12	2
worker_base.py in petastorm/workers_pool	11	3
cache.py in petastorm	9	2
pickle_serializer.py in petastorm/reader_impl	6	2
__init__.py in petastorm	4	-
__init__.py in petastorm/workers_pool	3	-
setup.cfg in root	2	-
__init__.py in petastorm/spark	2	-

Files With Most Units (Top 43)

File	# lines	# units
spark_dataset_converter.py in petastorm/spark	403	37
codecs.py in petastorm	154	24
unischema.py in petastorm	266	23
pytorch_shuffling_buffer.py in petastorm/reader_impl	123	22
reader.py in petastorm	381	21
predicates.py in petastorm	93	21
shuffling_buffer.py in petastorm/reader_impl	75	20
pytorch.py in petastorm	248	19
tf_utils.py in petastorm	180	17
namenode.py in petastorm/hdfs	159	16
ngram.py in petastorm	105	15
rowgroup_indexers.py in petastorm/etl	73	14
process_pool.py in petastorm/workers_pool	214	14
thread_pool.py in petastorm/workers_pool	150	13
dataset_metadata.py in petastorm/etl	201	12
py_dict_reader_worker.py in petastorm	155	12
ventilator.py in petastorm/workers_pool	87	12
arrow_reader_worker.py in petastorm	225	11
selectors.py in petastorm	46	11
weighted_sampling_reader.py in petastorm	48	10
fs_utils.py in petastorm	106	9
throughput.py in petastorm/benchmark	124	8
dummy_pool.py in petastorm/workers_pool	38	7
__init__.py in petastorm/etl	21	6
dummy_reader.py in petastorm/benchmark	56	6
batching_table_queue.py in petastorm/pyarrow_helpers	30	4
rowgroup_indexing.py in petastorm/etl	71	4
copy_dataset.py in petastorm/tools	77	4
transform.py in petastorm	32	3
utils.py in petastorm	61	3
legacy.py in petastorm/etl	36	3
petastorm_generate_metadata.py in petastorm/etl	98	3
cli.py in petastorm/benchmark	63	3
local_disk_cache.py in petastorm	43	3
worker_base.py in petastorm/workers_pool	11	3
spark_session_cli.py in petastorm/tools	28	3
cache.py in petastorm	9	2
pickle_serializer.py in petastorm/reader_impl	6	2
arrow_table_serializer.py in petastorm/reader_impl	12	2
namedtuple_gt_255_fields.py in petastorm	65	2
generator.py in petastorm	15	1
spark_utils.py in petastorm	18	1
exec_in_new_process.py in petastorm/workers_pool	18	1

Files With Long Lines (Top 0)

There are 0 files with lines longer than 120 characters. In total, there are 0 long lines.
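
A check of this kind can be approximated with a few lines of Python. This is a sketch under assumptions: the report tool may count characters differently (for example around tabs or trailing whitespace), and the function names and the `*.py` pattern are hypothetical choices, with only the 120-character threshold taken from the report.

```python
from pathlib import Path

MAX_LINE_LENGTH = 120  # threshold stated in the report


def count_long_lines(text: str, limit: int = MAX_LINE_LENGTH) -> int:
    """Count lines strictly longer than `limit` characters (newline excluded)."""
    return sum(1 for line in text.splitlines() if len(line) > limit)


def files_with_long_lines(root: str, pattern: str = "*.py") -> dict:
    """Map each file under `root` that has long lines to its long-line count."""
    results = {}
    for path in Path(root).rglob(pattern):
        text = path.read_text(encoding="utf-8", errors="replace")
        long_lines = count_long_lines(text)
        if long_lines:
            results[str(path)] = long_lines
    return results
```

Calling `files_with_long_lines("petastorm")` on a local checkout would return an empty dictionary if, as reported above, no file has a line longer than 120 characters.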
