uber / petastorm
File Size

The distribution of file sizes (measured in lines of code).

File Size Overall
Size bucket (lines of code) | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
Overall | 0% | 0% | 42% | 27% | 30%
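The bucket percentages can be reproduced approximately with a short script. The following is a minimal sketch, not the reporting tool's implementation: it assumes each percentage is the share of total lines of code that falls into files of that size bucket (the line counts in the Longest Files table are consistent with this reading). The names `BUCKETS`, `bucket_for`, and `size_distribution` are hypothetical.

```python
# Buckets copied from the report legend: (label, lower bound, upper bound).
BUCKETS = [
    ("1001+", 1001, None),
    ("501-1000", 501, 1000),
    ("201-500", 201, 500),
    ("101-200", 101, 200),
    ("1-100", 1, 100),
]


def bucket_for(line_count):
    """Return the legend label for a file with the given number of lines."""
    for label, low, high in BUCKETS:
        if line_count >= low and (high is None or line_count <= high):
            return label
    return None  # files with zero lines fall outside every bucket


def size_distribution(line_counts):
    """Return {bucket label: percent of all lines found in files of that size}.

    Assumption: the share is weighted by lines of code, not by file count.
    """
    totals = {label: 0 for label, _, _ in BUCKETS}
    for count in line_counts:
        label = bucket_for(count)
        if label is not None:
            totals[label] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {label: 0.0 for label in totals}
    return {label: 100.0 * lines / grand_total for label, lines in totals.items()}


if __name__ == "__main__":
    # Three made-up files of 300, 150 and 50 lines.
    for label, share in size_distribution([300, 150, 50]).items():
        print(f"{label}: {share:.0f}%")
```

Feeding the function the line counts of all files in a repository should give figures close to the row above, up to rounding and to differences in how lines of code are counted.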
File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 0% | 0% | 42% | 27% | 30%
cfg | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
Decomposition: primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
petastorm | 0% | 0% | 43% | 27% | 29%
ROOT | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | Location | # lines | # units
spark_dataset_converter.py | petastorm/spark | 403 | 37
reader.py | petastorm | 381 | 21
unischema.py | petastorm | 266 | 23
pytorch.py | petastorm | 248 | 19
[file name not captured] | [not captured] | 225 | 11
process_pool.py | petastorm/workers_pool | 214 | 14
dataset_metadata.py | petastorm/etl | 201 | 12
tf_utils.py | petastorm | 180 | 17
namenode.py | petastorm/hdfs | 159 | 16
[file name not captured] | [not captured] | 155 | 12
codecs.py | petastorm | 154 | 24
thread_pool.py | petastorm/workers_pool | 150 | 13
throughput.py | petastorm/benchmark | 124 | 8
pytorch_shuffling_buffer.py | petastorm/reader_impl | 123 | 22
fs_utils.py | petastorm | 106 | 9
ngram.py | petastorm | 105 | 15
[file name not captured] | [not captured] | 98 | 3
predicates.py | petastorm | 93 | 21
setup.py | root | 92 | -
ventilator.py | petastorm/workers_pool | 87 | 12
copy_dataset.py | petastorm/tools | 77 | 4
shuffling_buffer.py | petastorm/reader_impl | 75 | 20
rowgroup_indexers.py | petastorm/etl | 73 | 14
rowgroup_indexing.py | petastorm/etl | 71 | 4
[file name not captured] | [not captured] | 65 | 2
cli.py | petastorm/benchmark | 63 | 3
utils.py | petastorm | 61 | 3
dummy_reader.py | petastorm/benchmark | 56 | 6
[file name not captured] | [not captured] | 48 | 10
selectors.py | petastorm | 46 | 11
metadata_util.py | petastorm/etl | 45 | -
[file name not captured] | [not captured] | 43 | 3
dummy_pool.py | petastorm/workers_pool | 38 | 7
legacy.py | petastorm/etl | 36 | 3
transform.py | petastorm | 32 | 3
batching_table_queue.py | petastorm/pyarrow_helpers | 30 | 4
spark_session_cli.py | petastorm/tools | 28 | 3
__init__.py | petastorm/etl | 21 | 6
spark_utils.py | petastorm | 18 | 1
exec_in_new_process_entrypoint.py | petastorm/workers_pool | 18 | -
exec_in_new_process.py | petastorm/workers_pool | 18 | 1
generator.py | petastorm | 15 | 1
arrow_table_serializer.py | petastorm/reader_impl | 12 | 2
worker_base.py | petastorm/workers_pool | 11 | 3
cache.py | petastorm | 9 | 2
pickle_serializer.py | petastorm/reader_impl | 6 | 2
__init__.py | petastorm | 4 | -
__init__.py | petastorm/workers_pool | 3 | -
setup.cfg | root | 2 | -
__init__.py | petastorm/spark | 2 | -
Files With Most Units (Top 43)
File | Location | # lines | # units
spark_dataset_converter.py | petastorm/spark | 403 | 37
codecs.py | petastorm | 154 | 24
unischema.py | petastorm | 266 | 23
pytorch_shuffling_buffer.py | petastorm/reader_impl | 123 | 22
reader.py | petastorm | 381 | 21
predicates.py | petastorm | 93 | 21
shuffling_buffer.py | petastorm/reader_impl | 75 | 20
pytorch.py | petastorm | 248 | 19
tf_utils.py | petastorm | 180 | 17
namenode.py | petastorm/hdfs | 159 | 16
ngram.py | petastorm | 105 | 15
rowgroup_indexers.py | petastorm/etl | 73 | 14
process_pool.py | petastorm/workers_pool | 214 | 14
thread_pool.py | petastorm/workers_pool | 150 | 13
dataset_metadata.py | petastorm/etl | 201 | 12
[file name not captured] | [not captured] | 155 | 12
ventilator.py | petastorm/workers_pool | 87 | 12
[file name not captured] | [not captured] | 225 | 11
selectors.py | petastorm | 46 | 11
[file name not captured] | [not captured] | 48 | 10
fs_utils.py | petastorm | 106 | 9
throughput.py | petastorm/benchmark | 124 | 8
dummy_pool.py | petastorm/workers_pool | 38 | 7
__init__.py | petastorm/etl | 21 | 6
dummy_reader.py | petastorm/benchmark | 56 | 6
batching_table_queue.py | petastorm/pyarrow_helpers | 30 | 4
rowgroup_indexing.py | petastorm/etl | 71 | 4
copy_dataset.py | petastorm/tools | 77 | 4
transform.py | petastorm | 32 | 3
utils.py | petastorm | 61 | 3
legacy.py | petastorm/etl | 36 | 3
[file name not captured] | [not captured] | 98 | 3
cli.py | petastorm/benchmark | 63 | 3
[file name not captured] | [not captured] | 43 | 3
worker_base.py | petastorm/workers_pool | 11 | 3
spark_session_cli.py | petastorm/tools | 28 | 3
cache.py | petastorm | 9 | 2
pickle_serializer.py | petastorm/reader_impl | 6 | 2
arrow_table_serializer.py | petastorm/reader_impl | 12 | 2
[file name not captured] | [not captured] | 65 | 2
generator.py | petastorm | 15 | 1
spark_utils.py | petastorm | 18 | 1
exec_in_new_process.py | petastorm/workers_pool | 18 | 1
Files With Long Lines (Top 0)

There are 0 files with lines longer than 120 characters. In total, there are 0 long lines.
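The long-line check is easy to repeat locally. The sketch below assumes a line counts as long when it has more than 120 characters, excluding the line terminator; the report states only the 120-character threshold, so the exact counting rule is an assumption, and the function names are hypothetical.

```python
LONG_LINE_THRESHOLD = 120  # characters, the threshold quoted in the report


def count_long_lines(text, threshold=LONG_LINE_THRESHOLD):
    """Count lines in `text` that are longer than `threshold` characters."""
    return sum(1 for line in text.splitlines() if len(line) > threshold)


def files_with_long_lines(sources, threshold=LONG_LINE_THRESHOLD):
    """Map each path to its long-line count, keeping only files that have any.

    `sources` is a dict of {path: file contents as a string}.
    """
    result = {}
    for path, text in sources.items():
        count = count_long_lines(text, threshold)
        if count > 0:
            result[path] = count
    return result


if __name__ == "__main__":
    # Made-up contents: one file with a 130-character line, one without.
    sample = {"long.py": "x" * 130 + "\nshort line\n", "ok.py": "y" * 120}
    print(files_with_long_lines(sample))
```

An empty result from `files_with_long_lines` corresponds to the "0 files" figure reported above.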