Uber / petastorm
File Change Frequency

File change frequency (churn) shows the distribution of file updates, where one update is a day with at least one commit to the file.
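The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the `change_days` and `bucket` helpers and the sample commit list are hypothetical, and only the bucket labels come from the report's legend.

```python
from collections import defaultdict
from datetime import date

# Lower bounds and labels of the buckets used in the report's legend.
BUCKETS = [(101, "101+"), (51, "51-100"), (21, "21-50"), (6, "6-20"), (1, "1-5")]


def change_days(commits):
    """Map each file path to the number of distinct days with a commit."""
    days = defaultdict(set)
    for day, path in commits:
        days[path].add(day)
    return {path: len(seen) for path, seen in days.items()}


def bucket(changes):
    """Return the legend label for a given number of change days."""
    for lower, label in BUCKETS:
        if changes >= lower:
            return label
    raise ValueError("a tracked file has at least one change day")


# Hypothetical commit log: (commit date, file path) pairs.
commits = [
    (date(2018, 7, 24), "petastorm/reader.py"),
    (date(2018, 7, 24), "petastorm/reader.py"),  # same day, counted once
    (date(2018, 7, 25), "petastorm/reader.py"),
    (date(2018, 7, 24), "petastorm/errors.py"),
]
counts = change_days(commits)
```

Under this reading, two commits to the same file on the same day count as a single change, which is why the tables label the column "# changes (days)".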

Overview
File Change Frequency Overall
  • There are 57 files with 4,510 lines of code.
    • 0 files changed more than 100 times (0 lines of code)
    • 2 files changed 51-100 times (371 lines of code)
    • 9 files changed 21-50 times (1,897 lines of code)
    • 17 files changed 6-20 times (1,465 lines of code)
    • 29 files changed 1-5 times (777 lines of code)
101+ | 51-100 | 21-50 | 6-20 | 1-5
0% | 8% | 42% | 32% | 17%

Contributors Count Frequency Overall
  • There are 57 files with 4,510 lines of code.
    • 0 files changed by more than 25 contributors (0 lines of code)
    • 6 files changed by 11-25 contributors (1,017 lines of code)
    • 12 files changed by 6-10 contributors (2,045 lines of code)
    • 26 files changed by 2-5 contributors (1,232 lines of code)
    • 13 files changed by 1 contributor (216 lines of code)
26+ | 11-25 | 6-10 | 2-5 | 1
0% | 22% | 45% | 27% | 4%
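The percentage rows in the two overviews appear to be shares of lines of code per bucket, rounded down to a whole percent. That rounding rule is an inference from the published numbers, not something the report states. A short check, using only the line counts listed above (the `loc_shares` helper is hypothetical):

```python
def loc_shares(loc_per_bucket):
    """Share of total lines of code per bucket, floored to whole percent."""
    total = sum(loc_per_bucket)
    return [100 * loc // total for loc in loc_per_bucket]


# Lines of code per bucket, copied from the two overview lists.
by_changes = [0, 371, 1897, 1465, 777]        # 101+, 51-100, 21-50, 6-20, 1-5
by_contributors = [0, 1017, 2045, 1232, 216]  # 26+, 11-25, 6-10, 2-5, 1

changes_row = " | ".join(f"{p}%" for p in loc_shares(by_changes))
contributors_row = " | ".join(f"{p}%" for p in loc_shares(by_contributors))
```

Both lists sum to 4,510, the total reported for the 57 files. Because every share is rounded down, a row may sum to slightly less than 100%.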

File Change Frequency per File Extension
py, rst, yml, css, ini, cfg, gitignore, txt, sh, md, json
File Change Frequency per Extension
The number of recorded file updates
extension | 101+ | 51-100 | 21-50 | 6-20 | 1-5
py | 0% | 8% | 42% | 32% | 17%
yml | 0% | 0% | 0% | 0% | 100%
cfg | 0% | 0% | 0% | 0% | 100%
File Change Frequency per Logical Decomposition
primary (file change frequency)
The number of recorded file updates
component | 101+ | 51-100 | 21-50 | 6-20 | 1-5
petastorm | 0% | 8% | 40% | 33% | 17%
ROOT | 0% | 0% | 89% | 0% | 10%
Most Frequently Changed Files (Top 50)

File | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
__init__.py
in petastorm
4 - 2018-07-24 2023-02-03 89 13 yevgeni@uber.com yevgeni@aurora.tech
reader.py
in petastorm
367 20 2018-07-24 2022-12-13 67 22 yevgeni@uber.com udhayaraj12@gmail.com
setup.py
in root
91 - 2018-07-20 2021-07-26 40 13 yevgeni@uber.com selitvin@users.noreply.gith...
unischema.py
in petastorm
266 23 2018-07-24 2021-07-29 37 10 yevgeni@uber.com selitvin@users.noreply.gith...
dataset_metadata.py
in petastorm/etl
201 12 2018-07-24 2021-04-26 36 14 yevgeni@uber.com tgaddair@gmail.com
spark_dataset_converter.py
in petastorm/spark
403 37 2020-02-20 2023-01-30 26 6 liang.zhang@databricks.com weichen.xu@databricks.com
pytorch.py
in petastorm
248 19 2018-08-16 2022-04-14 23 11 ocheng@uberatc.com tir.karthi@gmail.com
process_pool.py
in petastorm/workers_pool
214 14 2018-07-24 2022-06-24 22 8 yevgeni@uber.com farahani.mostafa@gmail.com
fs_utils.py
in petastorm
106 9 2018-07-24 2021-07-26 21 16 yevgeni@uber.com manjuransari143@gmail.com
tf_utils.py
in petastorm
180 17 2018-07-24 2021-10-22 21 10 yevgeni@uber.com gregor.ehrensperger@gmx.com
arrow_reader_worker.py
in petastorm
188 10 2018-12-05 2022-08-23 21 6 yevgeni@uber.com 74630762+chongxiaoc@users.n...
codecs.py
in petastorm
154 24 2018-07-24 2020-09-11 19 8 yevgeni@uber.com dmcguire81@gmail.com
petastorm_generate_metadata.py
in petastorm/etl
98 3 2018-08-13 2019-06-08 15 9 robbieg@uber.com dimitrov@uber.com
py_dict_reader_worker.py
in petastorm
155 12 2018-12-05 2022-08-23 15 5 yevgeni@uber.com 74630762+chongxiaoc@users.n...
utils.py
in petastorm
61 3 2018-07-24 2020-11-09 14 5 yevgeni@uber.com selitvin@users.noreply.gith...
rowgroup_indexing.py
in petastorm/etl
71 4 2018-07-24 2020-11-09 13 7 yevgeni@uber.com selitvin@users.noreply.gith...
namenode.py
in petastorm/hdfs
159 16 2018-07-24 2022-01-10 12 7 yevgeni@uber.com selitvin@users.noreply.gith...
throughput.py
in petastorm/benchmark
124 8 2018-09-21 2020-11-09 10 6 yevgeni@uber.com selitvin@users.noreply.gith...
exec_in_new_process.py
in petastorm/workers_pool
18 1 2018-07-24 2022-01-08 9 4 yevgeni@uber.com selitvin@users.noreply.gith...
ventilator.py
in petastorm/workers_pool
83 12 2018-07-24 2022-08-23 9 6 yevgeni@uber.com 74630762+chongxiaoc@users.n...
ngram.py
in petastorm
105 15 2018-08-18 2020-12-07 9 6 kash@uber.com 44522052+ritwikbera@users.n...
thread_pool.py
in petastorm/workers_pool
126 11 2018-07-24 2021-06-29 9 5 yevgeni@uber.com selitvin@users.noreply.gith...
transform.py
in petastorm
32 3 2018-12-29 2020-04-02 8 4 yevgeni@uber.com weichen.xu@databricks.com
weighted_sampling_reader.py
in petastorm
48 10 2018-09-13 2020-12-02 7 4 yevgeni@uber.com gueguenster@gmail.com
predicates.py
in petastorm
93 21 2018-07-24 2022-04-14 7 5 yevgeni@uber.com tir.karthi@gmail.com
spark_utils.py
in petastorm
18 1 2018-07-24 2019-01-17 6 5 yevgeni@uber.com kimham@kth.se
metadata_util.py
in petastorm/etl
45 - 2018-07-24 2019-01-17 6 5 yevgeni@uber.com kimham@kth.se
shuffling_buffer.py
in petastorm/reader_impl
75 20 2018-08-18 2019-08-09 6 3 yevgeni@uber.com yevgeni@uber.com
local_disk_cache.py
in petastorm
28 3 2018-07-24 2022-09-15 5 4 yevgeni@uber.com selitvin@users.noreply.gith...
rowgroup_indexers.py
in petastorm/etl
73 14 2018-07-24 2018-08-18 5 4 yevgeni@uber.com m.eremeev@uber.com
cache.py
in petastorm
9 2 2018-07-24 2018-11-26 4 2 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/etl
21 6 2018-07-24 2018-11-26 4 3 yevgeni@uber.com yevgeni@uber.com
dummy_pool.py
in petastorm/workers_pool
38 7 2018-07-24 2018-10-06 4 2 yevgeni@uber.com yevgeni@uber.com
cli.py
in petastorm/benchmark
63 3 2018-09-21 2020-11-09 4 2 yevgeni@uber.com selitvin@users.noreply.gith...
copy_dataset.py
in petastorm/tools
77 4 2018-09-25 2019-06-08 4 4 yevgeni@uber.com dimitrov@uber.com
pytorch_shuffling_buffer.py
in petastorm/reader_impl
123 22 2020-04-18 2022-07-27 4 3 felipe.such@uber.com 74630762+chongxiaoc@users.n...
__init__.py
in petastorm/spark
2 - 2020-02-20 2020-03-10 3 2 liang.zhang@databricks.com weichen.xu@databricks.com
__init__.py
in petastorm/workers_pool
3 - 2018-07-24 2018-11-26 3 2 yevgeni@uber.com yevgeni@uber.com
generator.py
in petastorm
15 1 2018-07-24 2018-08-30 3 3 yevgeni@uber.com ocheng@uber.com
legacy.py
in petastorm/etl
36 3 2018-08-10 2021-07-29 3 2 selitvin@users.noreply.gith... selitvin@users.noreply.gith...
selectors.py
in petastorm
46 11 2018-07-24 2019-05-05 3 3 yevgeni@uber.com greg.arutiunian@ge.com
pickle_serializer.py
in petastorm/reader_impl
6 2 2018-10-03 2022-03-24 2 2 yevgeni@uber.com 54285210+rbetz@users.norepl...
readthedocs.yml
in root
9 - 2018-08-30 2022-07-28 2 2 ocheng@uber.com 74630762+chongxiaoc@users.n...
arrow_table_serializer.py
in petastorm/reader_impl
12 2 2018-12-05 2021-07-26 2 2 yevgeni@uber.com selitvin@users.noreply.gith...
errors.py
in petastorm
1 - 2019-04-06 2019-04-06 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/hdfs
1 - 2018-07-24 2018-07-24 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/pyarrow_helpers
1 - 2018-11-11 2018-11-11 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/reader_impl
1 - 2018-08-18 2018-08-18 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/tools
1 - 2018-09-25 2018-09-25 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/benchmark
1 - 2018-09-21 2018-09-21 1 1 yevgeni@uber.com yevgeni@uber.com
Files With Most Contributors (Top 50)
Based on the number of unique email addresses found in commits.
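Counting contributors by unique email address can be sketched as follows. This is an illustrative sketch only: the `contributors_per_file` helper, the lowercasing step, and the `example.com` addresses are assumptions, not details taken from the report.

```python
from collections import defaultdict


def contributors_per_file(commits):
    """Map each file path to the number of distinct committer emails."""
    emails = defaultdict(set)
    for email, path in commits:
        # Normalising case is an assumption; the report does not say
        # whether addresses are compared case-sensitively.
        emails[path].add(email.strip().lower())
    return {path: len(seen) for path, seen in emails.items()}


# Hypothetical commits: (author email, file path) pairs.
commits = [
    ("alice@example.com", "petastorm/reader.py"),
    ("alice@users.noreply.example.com", "petastorm/reader.py"),
    ("bob@example.com", "petastorm/reader.py"),
    ("bob@example.com", "petastorm/cache.py"),
]
per_file = contributors_per_file(commits)
```

A consequence of this metric is that one person committing under two addresses is counted as two contributors, so the counts are an upper bound on the number of distinct people.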

File | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
reader.py
in petastorm
367 20 2018-07-24 2022-12-13 67 22 yevgeni@uber.com udhayaraj12@gmail.com
fs_utils.py
in petastorm
106 9 2018-07-24 2021-07-26 21 16 yevgeni@uber.com manjuransari143@gmail.com
dataset_metadata.py
in petastorm/etl
201 12 2018-07-24 2021-04-26 36 14 yevgeni@uber.com tgaddair@gmail.com
__init__.py
in petastorm
4 - 2018-07-24 2023-02-03 89 13 yevgeni@uber.com yevgeni@aurora.tech
setup.py
in root
91 - 2018-07-20 2021-07-26 40 13 yevgeni@uber.com selitvin@users.noreply.gith...
pytorch.py
in petastorm
248 19 2018-08-16 2022-04-14 23 11 ocheng@uberatc.com tir.karthi@gmail.com
unischema.py
in petastorm
266 23 2018-07-24 2021-07-29 37 10 yevgeni@uber.com selitvin@users.noreply.gith...
tf_utils.py
in petastorm
180 17 2018-07-24 2021-10-22 21 10 yevgeni@uber.com gregor.ehrensperger@gmx.com
petastorm_generate_metadata.py
in petastorm/etl
98 3 2018-08-13 2019-06-08 15 9 robbieg@uber.com dimitrov@uber.com
process_pool.py
in petastorm/workers_pool
214 14 2018-07-24 2022-06-24 22 8 yevgeni@uber.com farahani.mostafa@gmail.com
codecs.py
in petastorm
154 24 2018-07-24 2020-09-11 19 8 yevgeni@uber.com dmcguire81@gmail.com
rowgroup_indexing.py
in petastorm/etl
71 4 2018-07-24 2020-11-09 13 7 yevgeni@uber.com selitvin@users.noreply.gith...
namenode.py
in petastorm/hdfs
159 16 2018-07-24 2022-01-10 12 7 yevgeni@uber.com selitvin@users.noreply.gith...
spark_dataset_converter.py
in petastorm/spark
403 37 2020-02-20 2023-01-30 26 6 liang.zhang@databricks.com weichen.xu@databricks.com
arrow_reader_worker.py
in petastorm
188 10 2018-12-05 2022-08-23 21 6 yevgeni@uber.com 74630762+chongxiaoc@users.n...
throughput.py
in petastorm/benchmark
124 8 2018-09-21 2020-11-09 10 6 yevgeni@uber.com selitvin@users.noreply.gith...
ngram.py
in petastorm
105 15 2018-08-18 2020-12-07 9 6 kash@uber.com 44522052+ritwikbera@users.n...
ventilator.py
in petastorm/workers_pool
83 12 2018-07-24 2022-08-23 9 6 yevgeni@uber.com 74630762+chongxiaoc@users.n...
py_dict_reader_worker.py
in petastorm
155 12 2018-12-05 2022-08-23 15 5 yevgeni@uber.com 74630762+chongxiaoc@users.n...
utils.py
in petastorm
61 3 2018-07-24 2020-11-09 14 5 yevgeni@uber.com selitvin@users.noreply.gith...
thread_pool.py
in petastorm/workers_pool
126 11 2018-07-24 2021-06-29 9 5 yevgeni@uber.com selitvin@users.noreply.gith...
predicates.py
in petastorm
93 21 2018-07-24 2022-04-14 7 5 yevgeni@uber.com tir.karthi@gmail.com
metadata_util.py
in petastorm/etl
45 - 2018-07-24 2019-01-17 6 5 yevgeni@uber.com kimham@kth.se
spark_utils.py
in petastorm
18 1 2018-07-24 2019-01-17 6 5 yevgeni@uber.com kimham@kth.se
exec_in_new_process.py
in petastorm/workers_pool
18 1 2018-07-24 2022-01-08 9 4 yevgeni@uber.com selitvin@users.noreply.gith...
transform.py
in petastorm
32 3 2018-12-29 2020-04-02 8 4 yevgeni@uber.com weichen.xu@databricks.com
weighted_sampling_reader.py
in petastorm
48 10 2018-09-13 2020-12-02 7 4 yevgeni@uber.com gueguenster@gmail.com
local_disk_cache.py
in petastorm
28 3 2018-07-24 2022-09-15 5 4 yevgeni@uber.com selitvin@users.noreply.gith...
rowgroup_indexers.py
in petastorm/etl
73 14 2018-07-24 2018-08-18 5 4 yevgeni@uber.com m.eremeev@uber.com
copy_dataset.py
in petastorm/tools
77 4 2018-09-25 2019-06-08 4 4 yevgeni@uber.com dimitrov@uber.com
shuffling_buffer.py
in petastorm/reader_impl
75 20 2018-08-18 2019-08-09 6 3 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/etl
21 6 2018-07-24 2018-11-26 4 3 yevgeni@uber.com yevgeni@uber.com
pytorch_shuffling_buffer.py
in petastorm/reader_impl
123 22 2020-04-18 2022-07-27 4 3 felipe.such@uber.com 74630762+chongxiaoc@users.n...
selectors.py
in petastorm
46 11 2018-07-24 2019-05-05 3 3 yevgeni@uber.com greg.arutiunian@ge.com
generator.py
in petastorm
15 1 2018-07-24 2018-08-30 3 3 yevgeni@uber.com ocheng@uber.com
cli.py
in petastorm/benchmark
63 3 2018-09-21 2020-11-09 4 2 yevgeni@uber.com selitvin@users.noreply.gith...
cache.py
in petastorm
9 2 2018-07-24 2018-11-26 4 2 yevgeni@uber.com yevgeni@uber.com
dummy_pool.py
in petastorm/workers_pool
38 7 2018-07-24 2018-10-06 4 2 yevgeni@uber.com yevgeni@uber.com
legacy.py
in petastorm/etl
36 3 2018-08-10 2021-07-29 3 2 selitvin@users.noreply.gith... selitvin@users.noreply.gith...
__init__.py
in petastorm/workers_pool
3 - 2018-07-24 2018-11-26 3 2 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/spark
2 - 2020-02-20 2020-03-10 3 2 liang.zhang@databricks.com weichen.xu@databricks.com
readthedocs.yml
in root
9 - 2018-08-30 2022-07-28 2 2 ocheng@uber.com 74630762+chongxiaoc@users.n...
pickle_serializer.py
in petastorm/reader_impl
6 2 2018-10-03 2022-03-24 2 2 yevgeni@uber.com 54285210+rbetz@users.norepl...
arrow_table_serializer.py
in petastorm/reader_impl
12 2 2018-12-05 2021-07-26 2 2 yevgeni@uber.com selitvin@users.noreply.gith...
setup.cfg
in root
2 - 2018-08-30 2018-08-30 1 1 robbieg@uber.com robbieg@uber.com
dummy_reader.py
in petastorm/benchmark
56 6 2020-04-18 2020-04-18 1 1 felipe.such@uber.com felipe.such@uber.com
__init__.py
in petastorm/benchmark
1 - 2018-09-21 2018-09-21 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/tools
1 - 2018-09-25 2018-09-25 1 1 yevgeni@uber.com yevgeni@uber.com
spark_session_cli.py
in petastorm/tools
28 3 2018-09-25 2018-09-25 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/reader_impl
1 - 2018-08-18 2018-08-18 1 1 yevgeni@uber.com yevgeni@uber.com
Files With Least Contributors (Top 50)
Based on the number of unique email addresses found in commits.

File | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
namedtuple_gt_255_fields.py
in petastorm
65 2 2019-03-28 2019-03-28 1 1 yevgeni@uber.com yevgeni@uber.com
dummy_reader.py
in petastorm/benchmark
56 6 2020-04-18 2020-04-18 1 1 felipe.such@uber.com felipe.such@uber.com
batching_table_queue.py
in petastorm/pyarrow_helpers
30 4 2018-11-11 2018-11-11 1 1 yevgeni@uber.com yevgeni@uber.com
spark_session_cli.py
in petastorm/tools
28 3 2018-09-25 2018-09-25 1 1 yevgeni@uber.com yevgeni@uber.com
exec_in_new_process_entrypoint.py
in petastorm/workers_pool
18 - 2022-01-08 2022-01-08 1 1 selitvin@users.noreply.gith... selitvin@users.noreply.gith...
worker_base.py
in petastorm/workers_pool
11 3 2018-07-24 2018-07-24 1 1 yevgeni@uber.com yevgeni@uber.com
setup.cfg
in root
2 - 2018-08-30 2018-08-30 1 1 robbieg@uber.com robbieg@uber.com
__init__.py
in petastorm/benchmark
1 - 2018-09-21 2018-09-21 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/tools
1 - 2018-09-25 2018-09-25 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/reader_impl
1 - 2018-08-18 2018-08-18 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/pyarrow_helpers
1 - 2018-11-11 2018-11-11 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/hdfs
1 - 2018-07-24 2018-07-24 1 1 yevgeni@uber.com yevgeni@uber.com
errors.py
in petastorm
1 - 2019-04-06 2019-04-06 1 1 yevgeni@uber.com yevgeni@uber.com
cli.py
in petastorm/benchmark
63 3 2018-09-21 2020-11-09 4 2 yevgeni@uber.com selitvin@users.noreply.gith...
dummy_pool.py
in petastorm/workers_pool
38 7 2018-07-24 2018-10-06 4 2 yevgeni@uber.com yevgeni@uber.com
legacy.py
in petastorm/etl
36 3 2018-08-10 2021-07-29 3 2 selitvin@users.noreply.gith... selitvin@users.noreply.gith...
arrow_table_serializer.py
in petastorm/reader_impl
12 2 2018-12-05 2021-07-26 2 2 yevgeni@uber.com selitvin@users.noreply.gith...
readthedocs.yml
in root
9 - 2018-08-30 2022-07-28 2 2 ocheng@uber.com 74630762+chongxiaoc@users.n...
cache.py
in petastorm
9 2 2018-07-24 2018-11-26 4 2 yevgeni@uber.com yevgeni@uber.com
pickle_serializer.py
in petastorm/reader_impl
6 2 2018-10-03 2022-03-24 2 2 yevgeni@uber.com 54285210+rbetz@users.norepl...
__init__.py
in petastorm/workers_pool
3 - 2018-07-24 2018-11-26 3 2 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/spark
2 - 2020-02-20 2020-03-10 3 2 liang.zhang@databricks.com weichen.xu@databricks.com
pytorch_shuffling_buffer.py
in petastorm/reader_impl
123 22 2020-04-18 2022-07-27 4 3 felipe.such@uber.com 74630762+chongxiaoc@users.n...
shuffling_buffer.py
in petastorm/reader_impl
75 20 2018-08-18 2019-08-09 6 3 yevgeni@uber.com yevgeni@uber.com
selectors.py
in petastorm
46 11 2018-07-24 2019-05-05 3 3 yevgeni@uber.com greg.arutiunian@ge.com
__init__.py
in petastorm/etl
21 6 2018-07-24 2018-11-26 4 3 yevgeni@uber.com yevgeni@uber.com
generator.py
in petastorm
15 1 2018-07-24 2018-08-30 3 3 yevgeni@uber.com ocheng@uber.com
copy_dataset.py
in petastorm/tools
77 4 2018-09-25 2019-06-08 4 4 yevgeni@uber.com dimitrov@uber.com
rowgroup_indexers.py
in petastorm/etl
73 14 2018-07-24 2018-08-18 5 4 yevgeni@uber.com m.eremeev@uber.com
weighted_sampling_reader.py
in petastorm
48 10 2018-09-13 2020-12-02 7 4 yevgeni@uber.com gueguenster@gmail.com
transform.py
in petastorm
32 3 2018-12-29 2020-04-02 8 4 yevgeni@uber.com weichen.xu@databricks.com
local_disk_cache.py
in petastorm
28 3 2018-07-24 2022-09-15 5 4 yevgeni@uber.com selitvin@users.noreply.gith...
exec_in_new_process.py
in petastorm/workers_pool
18 1 2018-07-24 2022-01-08 9 4 yevgeni@uber.com selitvin@users.noreply.gith...
py_dict_reader_worker.py
in petastorm
155 12 2018-12-05 2022-08-23 15 5 yevgeni@uber.com 74630762+chongxiaoc@users.n...
thread_pool.py
in petastorm/workers_pool
126 11 2018-07-24 2021-06-29 9 5 yevgeni@uber.com selitvin@users.noreply.gith...
predicates.py
in petastorm
93 21 2018-07-24 2022-04-14 7 5 yevgeni@uber.com tir.karthi@gmail.com
utils.py
in petastorm
61 3 2018-07-24 2020-11-09 14 5 yevgeni@uber.com selitvin@users.noreply.gith...
metadata_util.py
in petastorm/etl
45 - 2018-07-24 2019-01-17 6 5 yevgeni@uber.com kimham@kth.se
spark_utils.py
in petastorm
18 1 2018-07-24 2019-01-17 6 5 yevgeni@uber.com kimham@kth.se
spark_dataset_converter.py
in petastorm/spark
403 37 2020-02-20 2023-01-30 26 6 liang.zhang@databricks.com weichen.xu@databricks.com
arrow_reader_worker.py
in petastorm
188 10 2018-12-05 2022-08-23 21 6 yevgeni@uber.com 74630762+chongxiaoc@users.n...
throughput.py
in petastorm/benchmark
124 8 2018-09-21 2020-11-09 10 6 yevgeni@uber.com selitvin@users.noreply.gith...
ngram.py
in petastorm
105 15 2018-08-18 2020-12-07 9 6 kash@uber.com 44522052+ritwikbera@users.n...
ventilator.py
in petastorm/workers_pool
83 12 2018-07-24 2022-08-23 9 6 yevgeni@uber.com 74630762+chongxiaoc@users.n...
namenode.py
in petastorm/hdfs
159 16 2018-07-24 2022-01-10 12 7 yevgeni@uber.com selitvin@users.noreply.gith...
rowgroup_indexing.py
in petastorm/etl
71 4 2018-07-24 2020-11-09 13 7 yevgeni@uber.com selitvin@users.noreply.gith...
process_pool.py
in petastorm/workers_pool
214 14 2018-07-24 2022-06-24 22 8 yevgeni@uber.com farahani.mostafa@gmail.com
codecs.py
in petastorm
154 24 2018-07-24 2020-09-11 19 8 yevgeni@uber.com dmcguire81@gmail.com
petastorm_generate_metadata.py
in petastorm/etl
98 3 2018-08-13 2019-06-08 15 9 robbieg@uber.com dimitrov@uber.com
unischema.py
in petastorm
266 23 2018-07-24 2021-07-29 37 10 yevgeni@uber.com selitvin@users.noreply.gith...
Correlations

File Size vs. Number of Changes: 57 points

file | lines of code (x) | # changes (y)
petastorm/__init__.py | 4 | 89
petastorm/spark/spark_dataset_converter.py | 403 | 26
petastorm/reader.py | 367 | 67
petastorm/local_disk_cache.py | 28 | 5
petastorm/arrow_reader_worker.py | 188 | 21
petastorm/py_dict_reader_worker.py | 155 | 15
petastorm/workers_pool/ventilator.py | 83 | 9
readthedocs.yml | 9 | 2
petastorm/reader_impl/pytorch_shuffling_buffer.py | 123 | 4
petastorm/workers_pool/process_pool.py | 214 | 22
petastorm/predicates.py | 93 | 7
petastorm/pytorch.py | 248 | 23
petastorm/reader_impl/pickle_serializer.py | 6 | 2
petastorm/hdfs/namenode.py | 159 | 12
petastorm/workers_pool/exec_in_new_process.py | 18 | 9
petastorm/workers_pool/exec_in_new_process_entrypoint.py | 18 | 1
petastorm/tf_utils.py | 180 | 21
petastorm/unischema.py | 266 | 37
petastorm/etl/legacy.py | 36 | 3
petastorm/fs_utils.py | 106 | 21
setup.py | 91 | 40
petastorm/reader_impl/arrow_table_serializer.py | 12 | 2
petastorm/workers_pool/thread_pool.py | 126 | 9
petastorm/etl/dataset_metadata.py | 201 | 36
petastorm/ngram.py | 105 | 9
petastorm/weighted_sampling_reader.py | 48 | 7
petastorm/benchmark/cli.py | 63 | 4
petastorm/benchmark/throughput.py | 124 | 10
petastorm/etl/rowgroup_indexing.py | 71 | 13
petastorm/utils.py | 61 | 14
petastorm/codecs.py | 154 | 19
petastorm/benchmark/dummy_reader.py | 56 | 1
petastorm/transform.py | 32 | 8
petastorm/spark/__init__.py | 2 | 3
petastorm/reader_impl/shuffling_buffer.py | 75 | 6
petastorm/etl/petastorm_generate_metadata.py | 98 | 15
petastorm/tools/copy_dataset.py | 77 | 4
petastorm/selectors.py | 46 | 3
petastorm/errors.py | 1 | 1
petastorm/namedtuple_gt_255_fields.py | 65 | 1
petastorm/etl/metadata_util.py | 45 | 6
petastorm/spark_utils.py | 18 | 6
petastorm/pyarrow_helpers/batching_table_queue.py | 30 | 1
petastorm/cache.py | 9 | 4
petastorm/etl/__init__.py | 21 | 4
petastorm/workers_pool/__init__.py | 3 | 3
petastorm/workers_pool/dummy_pool.py | 38 | 4
petastorm/tools/spark_session_cli.py | 28 | 1
petastorm/generator.py | 15 | 3
setup.cfg | 2 | 1
petastorm/etl/rowgroup_indexers.py | 73 | 5
petastorm/workers_pool/worker_base.py | 11 | 1
# changes: min: 1.0 | average: 11.32 | 25th percentile: 2.0 | median: 5.0 | 75th percentile: 14.5 | max: 89.0
lines of code: min: 1.0 | average: 79.12 | 25th percentile: 11.5 | median: 48.0 | 75th percentile: 114.5 | max: 403.0
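The report plots these pairs but does not give a correlation coefficient. As a rough sketch of how one might be computed, the snippet below applies a hand-written Pearson formula to six of the data points listed above. The choice of Pearson, the `pearson` helper, and the six-file subset are all assumptions made for illustration, so the result says nothing definitive about the full set of 57 files.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# (lines of code, # changes) for six files, copied from the report.
points = {
    "petastorm/__init__.py": (4, 89),
    "petastorm/reader.py": (367, 67),
    "setup.py": (91, 40),
    "petastorm/unischema.py": (266, 37),
    "petastorm/etl/dataset_metadata.py": (201, 36),
    "petastorm/spark/spark_dataset_converter.py": (403, 26),
}
loc, changes = zip(*points.values())
r_subset = pearson(loc, changes)

# The reported average file size is the total divided by the file count.
avg_loc = round(4510 / 57, 2)
```

The last line reproduces the reported average of 79.12 lines of code from the totals in the overview. Note that `petastorm/__init__.py` (4 lines, 89 changes) is a strong outlier, so a rank-based measure may be a more robust choice than Pearson for this data.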

Number of Contributors vs. Number of Changes: 57 points

file | # contributors (x) | # changes (y)
petastorm/__init__.py | 13 | 89
petastorm/spark/spark_dataset_converter.py | 6 | 26
petastorm/reader.py | 22 | 67
petastorm/local_disk_cache.py | 4 | 5
petastorm/arrow_reader_worker.py | 6 | 21
petastorm/py_dict_reader_worker.py | 5 | 15
petastorm/workers_pool/ventilator.py | 6 | 9
readthedocs.yml | 2 | 2
petastorm/reader_impl/pytorch_shuffling_buffer.py | 3 | 4
petastorm/workers_pool/process_pool.py | 8 | 22
petastorm/predicates.py | 5 | 7
petastorm/pytorch.py | 11 | 23
petastorm/hdfs/namenode.py | 7 | 12
petastorm/workers_pool/exec_in_new_process.py | 4 | 9
petastorm/workers_pool/exec_in_new_process_entrypoint.py | 1 | 1
petastorm/tf_utils.py | 10 | 21
petastorm/unischema.py | 10 | 37
petastorm/etl/legacy.py | 2 | 3
petastorm/fs_utils.py | 16 | 21
setup.py | 13 | 40
petastorm/workers_pool/thread_pool.py | 5 | 9
petastorm/etl/dataset_metadata.py | 14 | 36
petastorm/weighted_sampling_reader.py | 4 | 7
petastorm/benchmark/cli.py | 2 | 4
petastorm/benchmark/throughput.py | 6 | 10
petastorm/etl/rowgroup_indexing.py | 7 | 13
petastorm/utils.py | 5 | 14
petastorm/codecs.py | 8 | 19
petastorm/transform.py | 4 | 8
petastorm/reader_impl/shuffling_buffer.py | 3 | 6
petastorm/etl/petastorm_generate_metadata.py | 9 | 15
petastorm/tools/copy_dataset.py | 4 | 4
petastorm/selectors.py | 3 | 3
petastorm/etl/metadata_util.py | 5 | 6
# changes: min: 1.0 | average: 11.32 | 25th percentile: 2.0 | median: 5.0 | 75th percentile: 14.5 | max: 89.0
# contributors: min: 1.0 | average: 4.88 | 25th percentile: 2.0 | median: 4.0 | 75th percentile: 6.0 | max: 22.0

Number of Contributors vs. File Size: 57 points

file | # contributors (x) | lines of code (y)
petastorm/__init__.py | 13 | 4
petastorm/spark/spark_dataset_converter.py | 6 | 403
petastorm/reader.py | 22 | 367
petastorm/local_disk_cache.py | 4 | 28
petastorm/arrow_reader_worker.py | 6 | 188
petastorm/py_dict_reader_worker.py | 5 | 155
petastorm/workers_pool/ventilator.py | 6 | 83
readthedocs.yml | 2 | 9
petastorm/reader_impl/pytorch_shuffling_buffer.py | 3 | 123
petastorm/workers_pool/process_pool.py | 8 | 214
petastorm/predicates.py | 5 | 93
petastorm/pytorch.py | 11 | 248
petastorm/reader_impl/pickle_serializer.py | 2 | 6
petastorm/hdfs/namenode.py | 7 | 159
petastorm/workers_pool/exec_in_new_process.py | 4 | 18
petastorm/workers_pool/exec_in_new_process_entrypoint.py | 1 | 18
petastorm/tf_utils.py | 10 | 180
petastorm/unischema.py | 10 | 266
petastorm/etl/legacy.py | 2 | 36
petastorm/fs_utils.py | 16 | 106
setup.py | 13 | 91
petastorm/reader_impl/arrow_table_serializer.py | 2 | 12
petastorm/workers_pool/thread_pool.py | 5 | 126
petastorm/etl/dataset_metadata.py | 14 | 201
petastorm/ngram.py | 6 | 105
petastorm/weighted_sampling_reader.py | 4 | 48
petastorm/benchmark/cli.py | 2 | 63
petastorm/benchmark/throughput.py | 6 | 124
petastorm/etl/rowgroup_indexing.py | 7 | 71
petastorm/utils.py | 5 | 61
petastorm/codecs.py | 8 | 154
petastorm/benchmark/dummy_reader.py | 1 | 56
petastorm/transform.py | 4 | 32
petastorm/spark/__init__.py | 2 | 2
petastorm/reader_impl/shuffling_buffer.py | 3 | 75
petastorm/etl/petastorm_generate_metadata.py | 9 | 98
petastorm/tools/copy_dataset.py | 4 | 77
petastorm/selectors.py | 3 | 46
petastorm/errors.py | 1 | 1
petastorm/namedtuple_gt_255_fields.py | 1 | 65
petastorm/etl/metadata_util.py | 5 | 45
petastorm/spark_utils.py | 5 | 18
petastorm/pyarrow_helpers/batching_table_queue.py | 1 | 30
petastorm/etl/__init__.py | 3 | 21
petastorm/workers_pool/dummy_pool.py | 2 | 38
petastorm/tools/spark_session_cli.py | 1 | 28
petastorm/generator.py | 3 | 15
setup.cfg | 1 | 2
petastorm/etl/rowgroup_indexers.py | 4 | 73
petastorm/workers_pool/worker_base.py | 1 | 11
lines of code: min: 1.0 | average: 79.12 | 25th percentile: 11.5 | median: 48.0 | 75th percentile: 114.5 | max: 403.0
# contributors: min: 1.0 | average: 4.88 | 25th percentile: 2.0 | median: 4.0 | 75th percentile: 6.0 | max: 22.0