Uber / petastorm
File Size

The distribution of file sizes, measured in lines of code.
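This page does not define exactly what counts as a line of code. The sketch below is a rough Python approximation. It assumes blank lines and full-line # comments are excluded. Docstrings are still counted here, and the report's own counting rules may treat them differently.

```python
def count_loc(source: str) -> int:
    """Count lines that are neither blank nor full-line '#' comments.

    Rough approximation only: docstrings and other multi-line strings
    are counted as code.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


example = '''# module comment

import os


def main():
    # print the working directory
    print(os.getcwd())
'''

print(count_loc(example))  # -> 3 (import, def, print)
```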

File Size Overall
Scope | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
all files | 0% | 0% | 37% | 31% | 30%
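The percentages appear to be shares of all lines of code, not of the number of files. They also appear to be truncated to whole percents, since the overall row adds up to 98%. Both points are inferred from the figures on this page rather than stated by it. Under those assumptions, the distribution could be computed as in the sketch below. The file names in the usage line are made up.

```python
# Size categories from the legend, as (label, lowest LOC, highest LOC).
CATEGORIES = [
    ("1001+", 1001, float("inf")),
    ("501-1000", 501, 1000),
    ("201-500", 201, 500),
    ("101-200", 101, 200),
    ("1-100", 1, 100),
]


def size_distribution(loc_by_file: dict[str, int]) -> dict[str, int]:
    """Share of all LOC that sits in files of each size category."""
    total = sum(loc_by_file.values())
    totals = {label: 0 for label, _, _ in CATEGORIES}
    for loc in loc_by_file.values():
        for label, low, high in CATEGORIES:
            if low <= loc <= high:
                totals[label] += loc
                break
    # int() truncates, e.g. 67.17% -> 67% (assumed, not confirmed by the report).
    return {
        label: int(100 * lines / total) if total else 0
        for label, lines in totals.items()
    }


print(size_distribution({"a.py": 403, "b.py": 150, "c.py": 47}))
# -> {'1001+': 0, '501-1000': 0, '201-500': 67, '101-200': 25, '1-100': 7}
```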


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 0% | 0% | 37% | 31% | 30%
yml | 0% | 0% | 0% | 0% | 100%
cfg | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition (primary)
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
petastorm | 0% | 0% | 38% | 32% | 29%
ROOT | 0% | 0% | 0% | 0% | 100%
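The "primary" decomposition appears to group files by their top-level folder. Files in the repository root are reported as ROOT; on this page those are setup.py, setup.cfg and readthedocs.yml. The sketch below assumes that rule. component_of and loc_per_component are illustrative names, and the sample LOC values are taken from the tables on this page.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def component_of(path: str) -> str:
    """Top-level folder of a repository-relative path, or 'ROOT' for root files."""
    parts = PurePosixPath(path).parts
    return parts[0] if len(parts) > 1 else "ROOT"


def loc_per_component(loc_by_file: dict[str, int]) -> dict[str, int]:
    """Sum lines of code per component (assumed rule: top-level folder)."""
    totals: dict[str, int] = defaultdict(int)
    for path, loc in loc_by_file.items():
        totals[component_of(path)] += loc
    return dict(totals)


files = {
    "setup.py": 91,
    "setup.cfg": 2,
    "petastorm/reader.py": 367,
    "petastorm/spark/spark_dataset_converter.py": 403,
}
print(loc_per_component(files))  # -> {'ROOT': 93, 'petastorm': 770}
```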
Longest Files (Top 50)
File | Folder | # lines | # units
spark_dataset_converter.py | petastorm/spark | 403 | 37
reader.py | petastorm | 367 | 20
unischema.py | petastorm | 266 | 23
pytorch.py | petastorm | 248 | 19
process_pool.py | petastorm/workers_pool | 214 | 14
dataset_metadata.py | petastorm/etl | 201 | 12
- | - | 188 | 10
tf_utils.py | petastorm | 180 | 17
namenode.py | petastorm/hdfs | 159 | 16
- | - | 155 | 12
codecs.py | petastorm | 154 | 24
thread_pool.py | petastorm/workers_pool | 126 | 11
throughput.py | petastorm/benchmark | 124 | 8
pytorch_shuffling_buffer.py | petastorm/reader_impl | 123 | 22
fs_utils.py | petastorm | 106 | 9
ngram.py | petastorm | 105 | 15
- | - | 98 | 3
predicates.py | petastorm | 93 | 21
setup.py | root | 91 | -
ventilator.py | petastorm/workers_pool | 83 | 12
copy_dataset.py | petastorm/tools | 77 | 4
shuffling_buffer.py | petastorm/reader_impl | 75 | 20
rowgroup_indexers.py | petastorm/etl | 73 | 14
rowgroup_indexing.py | petastorm/etl | 71 | 4
- | - | 65 | 2
cli.py | petastorm/benchmark | 63 | 3
utils.py | petastorm | 61 | 3
dummy_reader.py | petastorm/benchmark | 56 | 6
- | - | 48 | 10
selectors.py | petastorm | 46 | 11
metadata_util.py | petastorm/etl | 45 | -
dummy_pool.py | petastorm/workers_pool | 38 | 7
legacy.py | petastorm/etl | 36 | 3
transform.py | petastorm | 32 | 3
batching_table_queue.py | petastorm/pyarrow_helpers | 30 | 4
spark_session_cli.py | petastorm/tools | 28 | 3
- | - | 28 | 3
__init__.py | petastorm/etl | 21 | 6
exec_in_new_process.py | petastorm/workers_pool | 18 | 1
exec_in_new_process_entrypoint.py | petastorm/workers_pool | 18 | -
spark_utils.py | petastorm | 18 | 1
generator.py | petastorm | 15 | 1
arrow_table_serializer.py | petastorm/reader_impl | 12 | 2
worker_base.py | petastorm/workers_pool | 11 | 3
- | - | 9 | -
cache.py | petastorm | 9 | 2
pickle_serializer.py | petastorm/reader_impl | 6 | 2
__init__.py | petastorm | 4 | -
__init__.py | petastorm/workers_pool | 3 | -
setup.cfg | root | 2 | -
Files With Most Units (Top 43)
File | Folder | # lines | # units
spark_dataset_converter.py | petastorm/spark | 403 | 37
codecs.py | petastorm | 154 | 24
unischema.py | petastorm | 266 | 23
pytorch_shuffling_buffer.py | petastorm/reader_impl | 123 | 22
predicates.py | petastorm | 93 | 21
shuffling_buffer.py | petastorm/reader_impl | 75 | 20
reader.py | petastorm | 367 | 20
pytorch.py | petastorm | 248 | 19
tf_utils.py | petastorm | 180 | 17
namenode.py | petastorm/hdfs | 159 | 16
ngram.py | petastorm | 105 | 15
rowgroup_indexers.py | petastorm/etl | 73 | 14
process_pool.py | petastorm/workers_pool | 214 | 14
dataset_metadata.py | petastorm/etl | 201 | 12
- | - | 155 | 12
ventilator.py | petastorm/workers_pool | 83 | 12
selectors.py | petastorm | 46 | 11
thread_pool.py | petastorm/workers_pool | 126 | 11
- | - | 188 | 10
- | - | 48 | 10
fs_utils.py | petastorm | 106 | 9
throughput.py | petastorm/benchmark | 124 | 8
dummy_pool.py | petastorm/workers_pool | 38 | 7
dummy_reader.py | petastorm/benchmark | 56 | 6
__init__.py | petastorm/etl | 21 | 6
copy_dataset.py | petastorm/tools | 77 | 4
rowgroup_indexing.py | petastorm/etl | 71 | 4
batching_table_queue.py | petastorm/pyarrow_helpers | 30 | 4
cli.py | petastorm/benchmark | 63 | 3
spark_session_cli.py | petastorm/tools | 28 | 3
- | - | 28 | 3
legacy.py | petastorm/etl | 36 | 3
- | - | 98 | 3
utils.py | petastorm | 61 | 3
transform.py | petastorm | 32 | 3
worker_base.py | petastorm/workers_pool | 11 | 3
cache.py | petastorm | 9 | 2
pickle_serializer.py | petastorm/reader_impl | 6 | 2
arrow_table_serializer.py | petastorm/reader_impl | 12 | 2
- | - | 65 | 2
generator.py | petastorm | 15 | 1
exec_in_new_process.py | petastorm/workers_pool | 18 | 1
spark_utils.py | petastorm | 18 | 1
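This page does not define a "unit"; the sketch below assumes it means a function or method. Under that assumption, the units in a Python file can be approximated with the standard ast module, counting nested and async functions as well. The report's own parser may count differently, and the sample source is made up.

```python
import ast


def count_units(source: str) -> int:
    """Count function and method definitions, including nested and async ones."""
    tree = ast.parse(source)
    return sum(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        for node in ast.walk(tree)
    )


example = '''
class Buffer:
    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)


async def fetch():
    return None


def build():
    return Buffer()
'''

print(count_units(example))  # -> 4 (__init__, add, fetch, build)
```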
Files With Long Lines (Top 0)

No file has lines longer than 120 characters; in total, there are 0 long lines.
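The check behind this section can be sketched in a few lines. The sketch assumes "longer than 120 characters" means strictly more than 120 characters, not counting the line break.

```python
def long_lines(source: str, limit: int = 120) -> list[int]:
    """Return the 1-based numbers of lines longer than `limit` characters."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


text = "short line\n" + "x" * 121 + "\n" + "y" * 120
print(long_lines(text))  # -> [2]; a line of exactly 120 characters is not long
```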

Correlations

File Size vs. Commits (all time): 57 points

File | Commits (all time) | Lines of code
petastorm/__init__.py | 113 | 4
petastorm/spark/spark_dataset_converter.py | 39 | 403
petastorm/reader.py | 82 | 367
petastorm/local_disk_cache.py | 5 | 28
petastorm/arrow_reader_worker.py | 26 | 188
petastorm/py_dict_reader_worker.py | 15 | 155
petastorm/workers_pool/ventilator.py | 9 | 83
readthedocs.yml | 2 | 9
petastorm/reader_impl/pytorch_shuffling_buffer.py | 4 | 123
petastorm/workers_pool/process_pool.py | 26 | 214
petastorm/predicates.py | 7 | 93
petastorm/pytorch.py | 23 | 248
petastorm/reader_impl/pickle_serializer.py | 2 | 6
petastorm/hdfs/namenode.py | 15 | 159
petastorm/workers_pool/exec_in_new_process.py | 9 | 18
petastorm/workers_pool/exec_in_new_process_entrypoint.py | 1 | 18
petastorm/tf_utils.py | 22 | 180
petastorm/unischema.py | 41 | 266
petastorm/etl/legacy.py | 3 | 36
petastorm/fs_utils.py | 22 | 106
setup.py | 54 | 91
petastorm/reader_impl/arrow_table_serializer.py | 2 | 12
petastorm/workers_pool/thread_pool.py | 10 | 126
petastorm/etl/dataset_metadata.py | 40 | 201
petastorm/ngram.py | 15 | 105
petastorm/weighted_sampling_reader.py | 8 | 48
petastorm/benchmark/cli.py | 4 | 63
petastorm/benchmark/throughput.py | 10 | 124
petastorm/etl/rowgroup_indexing.py | 15 | 71
petastorm/utils.py | 15 | 61
petastorm/codecs.py | 21 | 154
petastorm/benchmark/dummy_reader.py | 1 | 56
petastorm/transform.py | 10 | 32
petastorm/spark/__init__.py | 4 | 2
petastorm/reader_impl/shuffling_buffer.py | 6 | 75
petastorm/etl/petastorm_generate_metadata.py | 18 | 98
petastorm/tools/copy_dataset.py | 6 | 77
petastorm/selectors.py | 4 | 46
petastorm/errors.py | 1 | 1
petastorm/namedtuple_gt_255_fields.py | 1 | 65
petastorm/etl/metadata_util.py | 8 | 45
petastorm/spark_utils.py | 6 | 18
petastorm/pyarrow_helpers/batching_table_queue.py | 1 | 30
petastorm/cache.py | 4 | 9
petastorm/etl/__init__.py | 4 | 21
petastorm/workers_pool/__init__.py | 3 | 3
petastorm/workers_pool/dummy_pool.py | 5 | 38
petastorm/tools/spark_session_cli.py | 1 | 28
petastorm/generator.py | 3 | 15
setup.cfg | 1 | 2
petastorm/etl/rowgroup_indexers.py | 5 | 73
petastorm/workers_pool/worker_base.py | 1 | 11

Lines of code: min: 1.0 | average: 79.12 | 25th percentile: 11.5 | median: 48.0 | 75th percentile: 114.5 | max: 403.0
Commits (all time): min: 1.0 | average: 13.3 | 25th percentile: 2.0 | median: 6.0 | 75th percentile: 15.0 | max: 113.0
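Summary statistics of this kind (min, average, quartiles, max) can be computed with Python's statistics module. The sketch assumes the report interpolates quartiles the same way as the module's default "exclusive" method. That fits half-point values such as 11.5 and 114.5, but this page does not state its method. The sample values are made up.

```python
import statistics


def summarize(values: list[float]) -> dict[str, float]:
    """Min, average, quartiles and max, in the order the report lists them."""
    # statistics.quantiles uses method="exclusive" by default.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "average": round(statistics.fmean(values), 2),
        "25th percentile": q1,
        "median": median,
        "75th percentile": q3,
        "max": max(values),
    }


print(summarize([1, 2, 3, 4, 5, 6, 7]))  # quartiles 2.0, 4.0 and 6.0
```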

File Size vs. Contributors (all time): 57 points

File | Contributors (all time) | Lines of code
petastorm/__init__.py | 13 | 4
petastorm/spark/spark_dataset_converter.py | 6 | 403
petastorm/reader.py | 22 | 367
petastorm/local_disk_cache.py | 4 | 28
petastorm/arrow_reader_worker.py | 6 | 188
petastorm/py_dict_reader_worker.py | 5 | 155
petastorm/workers_pool/ventilator.py | 6 | 83
readthedocs.yml | 2 | 9
petastorm/reader_impl/pytorch_shuffling_buffer.py | 3 | 123
petastorm/workers_pool/process_pool.py | 8 | 214
petastorm/predicates.py | 5 | 93
petastorm/pytorch.py | 11 | 248
petastorm/reader_impl/pickle_serializer.py | 2 | 6
petastorm/hdfs/namenode.py | 7 | 159
petastorm/workers_pool/exec_in_new_process.py | 4 | 18
petastorm/workers_pool/exec_in_new_process_entrypoint.py | 1 | 18
petastorm/tf_utils.py | 10 | 180
petastorm/unischema.py | 10 | 266
petastorm/etl/legacy.py | 2 | 36
petastorm/fs_utils.py | 16 | 106
setup.py | 13 | 91
petastorm/reader_impl/arrow_table_serializer.py | 2 | 12
petastorm/workers_pool/thread_pool.py | 5 | 126
petastorm/etl/dataset_metadata.py | 14 | 201
petastorm/ngram.py | 6 | 105
petastorm/weighted_sampling_reader.py | 4 | 48
petastorm/benchmark/cli.py | 2 | 63
petastorm/benchmark/throughput.py | 6 | 124
petastorm/etl/rowgroup_indexing.py | 7 | 71
petastorm/utils.py | 5 | 61
petastorm/codecs.py | 8 | 154
petastorm/benchmark/dummy_reader.py | 1 | 56
petastorm/transform.py | 4 | 32
petastorm/spark/__init__.py | 2 | 2
petastorm/reader_impl/shuffling_buffer.py | 3 | 75
petastorm/etl/petastorm_generate_metadata.py | 9 | 98
petastorm/tools/copy_dataset.py | 4 | 77
petastorm/selectors.py | 3 | 46
petastorm/errors.py | 1 | 1
petastorm/namedtuple_gt_255_fields.py | 1 | 65
petastorm/etl/metadata_util.py | 5 | 45
petastorm/spark_utils.py | 5 | 18
petastorm/pyarrow_helpers/batching_table_queue.py | 1 | 30
petastorm/etl/__init__.py | 3 | 21
petastorm/workers_pool/dummy_pool.py | 2 | 38
petastorm/tools/spark_session_cli.py | 1 | 28
petastorm/generator.py | 3 | 15
setup.cfg | 1 | 2
petastorm/etl/rowgroup_indexers.py | 4 | 73
petastorm/workers_pool/worker_base.py | 1 | 11

Lines of code: min: 1.0 | average: 79.12 | 25th percentile: 11.5 | median: 48.0 | 75th percentile: 114.5 | max: 403.0
Contributors (all time): min: 1.0 | average: 4.88 | 25th percentile: 2.0 | median: 4.0 | 75th percentile: 6.0 | max: 22.0
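The scatter plots report no correlation coefficient. If one is wanted, Pearson's r can be computed directly from the listed (contributors, lines of code) pairs. This is an illustrative addition, not part of the report. A rank-based coefficient may suit this skewed data better. The usage lines take five of the pairs listed in the contributors table.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# (contributors, lines of code) for reader.py, pytorch.py, fs_utils.py,
# setup.py and petastorm/__init__.py.
pairs = [(22, 367), (11, 248), (16, 106), (13, 91), (13, 4)]
contributors = [c for c, _ in pairs]
loc = [n for _, n in pairs]
print(round(pearson(contributors, loc), 2))
```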

File Size vs. Commits (30 days): 0 points

No data for "commits (30d)" vs. "lines of code".

File Size vs. Contributors (30 days): 0 points

No data for "contributors (30d)" vs. "lines of code".


File Size vs. Commits (90 days): 0 points

No data for "commits (90d)" vs. "lines of code".

File Size vs. Contributors (90 days): 0 points

No data for "contributors (90d)" vs. "lines of code".