uber / petastorm
File Change Frequency

File change frequency (churn) shows how file updates are distributed across the codebase. An update is counted once per file per day on which that file has at least one commit.
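The counting rule above can be sketched in Python. This is a minimal sketch under stated assumptions, not the report tool's actual implementation: it assumes the history was exported with `git log --name-only --date=short --pretty=format:@%ad`, so each commit starts with a line such as `@2018-07-24` followed by the paths it touched. The function name `change_days_per_file`, the `@` marker, and the sample log are illustrative only.

```python
from collections import defaultdict


def change_days_per_file(log_text):
    """Count, per file, the distinct days on which it has at least one commit."""
    days = defaultdict(set)
    current_day = None
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank separator between commits
        if line.startswith("@"):
            current_day = line[1:]  # commit header, e.g. "@2018-07-24"
        elif current_day is not None:
            days[line].add(current_day)  # a path touched by the current commit
    return {path: len(dates) for path, dates in days.items()}


# Hypothetical log excerpt: two commits on the same day, one on the next day.
sample_log = """@2018-07-24
petastorm/reader.py
petastorm/utils.py

@2018-07-24
petastorm/reader.py

@2018-07-25
petastorm/reader.py
"""

churn = change_days_per_file(sample_log)
print(churn)
```

Two commits to the same file on the same day count as a single change, which is why the sketch stores dates in a set rather than incrementing a counter.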

Overview
File Change Frequency Overall
  • There are 56 files with 4,596 lines of code.
    • 0 files changed more than 100 times (0 lines of code)
    • 2 files changed 51-100 times (385 lines of code)
    • 9 files changed 21-50 times (1,935 lines of code)
    • 18 files changed 6-20 times (1,536 lines of code)
    • 27 files changed 1-5 times (740 lines of code)
Share of lines of code by number of changes:
101+: 0% | 51-100: 8% | 21-50: 42% | 6-20: 33% | 1-5: 16%
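The bucket figures can be reproduced with a short Python sketch. It assumes that the percentages are each bucket's share of total lines of code, truncated to a whole number. That is an inference from the figures listed above, not something the report states. The helper names `bucket_for` and `loc_share_by_bucket` are hypothetical.

```python
def bucket_for(changes):
    """Return the change-frequency bucket label for a number of change days."""
    if changes > 100:
        return "101+"
    if changes > 50:
        return "51-100"
    if changes > 20:
        return "21-50"
    if changes > 5:
        return "6-20"
    return "1-5"


def loc_share_by_bucket(loc_per_bucket):
    """Return each bucket's share of total lines of code as a truncated percent."""
    total = sum(loc_per_bucket.values())
    return {bucket: loc * 100 // total for bucket, loc in loc_per_bucket.items()}


# Lines of code per bucket, taken from the list above (total 4,596).
loc_by_changes = {"101+": 0, "51-100": 385, "21-50": 1935, "6-20": 1536, "1-5": 740}

shares = loc_share_by_bucket(loc_by_changes)
print(" | ".join(f"{pct}%" for pct in shares.values()))
# prints: 0% | 8% | 42% | 33% | 16%
```

Because each share is truncated rather than rounded, the five values can sum to slightly less than 100%.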

Contributors Count Frequency Overall
  • There are 56 files with 4,596 lines of code.
    • 0 files changed by more than 25 contributors (0 lines of code)
    • 6 files changed by 11-25 contributors (1,032 lines of code)
    • 14 files changed by 6-10 contributors (2,279 lines of code)
    • 23 files changed by 2-5 contributors (1,069 lines of code)
    • 13 files changed by 1 contributor (216 lines of code)
Share of lines of code by number of contributors:
26+: 0% | 11-25: 22% | 6-10: 49% | 2-5: 23% | 1: 4%

File Change Frequency per File Extension
py, rst, yaml, css, cfg, ini, gitignore, txt, sh, md
File Change Frequency per Extension
The number of recorded file updates
Extension | 101+ | 51-100 | 21-50 | 6-20 | 1-5
py | 0% | 8% | 42% | 33% | 16%
cfg | 0% | 0% | 0% | 0% | 100%
File Change Frequency per Logical Decomposition
primary (file change frequency)
The number of recorded file updates
Component | 101+ | 51-100 | 21-50 | 6-20 | 1-5
petastorm | 0% | 8% | 40% | 34% | 16%
ROOT | 0% | 0% | 97% | 0% | 2%
Most Frequently Changed Files (Top 50)

File | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
__init__.py
in petastorm
4 - 2018-07-24 2025-12-19 91 15 yevgeni@uber.com kashish@uber.com
reader.py
in petastorm
381 21 2018-07-24 2025-10-31 72 25 yevgeni@uber.com kashish@uber.com
setup.py
in root
92 - 2018-07-19 2025-08-09 44 14 yevgeni@uber.com 40239280+arushi297@users.no...
dataset_metadata.py
in petastorm/etl
201 12 2018-07-24 2021-04-26 37 14 yevgeni@uber.com tgaddair@gmail.com
unischema.py
in petastorm
266 23 2018-07-24 2021-07-29 37 10 yevgeni@uber.com selitvin@users.noreply.gith...
spark_dataset_converter.py
in petastorm/spark
403 37 2020-02-20 2023-01-30 27 6 liang.zhang@databricks.com weichen.xu@databricks.com
225 11 2018-12-05 2025-12-16 24 9 yevgeni@uber.com kashish@uber.com
pytorch.py
in petastorm
248 19 2018-08-16 2022-04-14 23 11 ocheng@uberatc.com tir.karthi@gmail.com
process_pool.py
in petastorm/workers_pool
214 14 2018-07-24 2022-06-24 22 8 yevgeni@uber.com farahani.mostafa@gmail.com
fs_utils.py
in petastorm
106 9 2018-07-24 2021-07-26 21 16 yevgeni@uber.com manjuransari143@gmail.com
tf_utils.py
in petastorm
180 17 2018-07-24 2021-10-22 21 10 yevgeni@uber.com gregor.ehrensperger@gmx.com
codecs.py
in petastorm
154 24 2018-07-24 2020-09-10 19 8 yevgeni@uber.com dmcguire81@gmail.com
98 3 2018-08-13 2019-06-07 15 9 robbieg@uber.com dimitrov@uber.com
155 12 2018-12-05 2022-08-23 15 5 yevgeni@uber.com 74630762+chongxiaoc@users.n...
utils.py
in petastorm
61 3 2018-07-24 2020-11-09 14 5 yevgeni@uber.com selitvin@users.noreply.gith...
rowgroup_indexing.py
in petastorm/etl
71 4 2018-07-24 2020-11-09 14 7 yevgeni@uber.com selitvin@users.noreply.gith...
namenode.py
in petastorm/hdfs
159 16 2018-07-24 2022-01-10 12 7 yevgeni@uber.com selitvin@users.noreply.gith...
ventilator.py
in petastorm/workers_pool
87 12 2018-07-24 2025-08-09 10 7 yevgeni@uber.com 40239280+arushi297@users.no...
ngram.py
in petastorm
105 15 2018-08-18 2020-12-07 10 6 kash@uber.com 44522052+ritwikbera@users.n...
throughput.py
in petastorm/benchmark
124 8 2018-09-20 2020-11-09 10 6 yevgeni@uber.com selitvin@users.noreply.gith...
thread_pool.py
in petastorm/workers_pool
150 13 2018-07-24 2025-08-09 10 6 yevgeni@uber.com 40239280+arushi297@users.no...
exec_in_new_process.py
in petastorm/workers_pool
18 1 2018-07-24 2022-01-08 9 4 yevgeni@uber.com selitvin@users.noreply.gith...
transform.py
in petastorm
32 3 2018-12-29 2020-04-02 8 4 yevgeni@uber.com weichen.xu@databricks.com
43 3 2018-07-24 2025-10-31 7 6 yevgeni@uber.com kashish@uber.com
48 10 2018-09-13 2020-12-02 7 4 yevgeni@uber.com gueguenster@gmail.com
predicates.py
in petastorm
93 21 2018-07-24 2022-04-14 7 5 yevgeni@uber.com tir.karthi@gmail.com
spark_utils.py
in petastorm
18 1 2018-07-24 2019-01-17 6 5 yevgeni@uber.com kimham@kth.se
metadata_util.py
in petastorm/etl
45 - 2018-07-24 2019-01-17 6 5 yevgeni@uber.com kimham@kth.se
shuffling_buffer.py
in petastorm/reader_impl
75 20 2018-08-17 2019-08-09 6 3 yevgeni@uber.com yevgeni@uber.com
rowgroup_indexers.py
in petastorm/etl
73 14 2018-07-24 2018-08-18 5 4 yevgeni@uber.com m.eremeev@uber.com
cache.py
in petastorm
9 2 2018-07-24 2018-11-26 4 2 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/etl
21 6 2018-07-24 2018-11-26 4 3 yevgeni@uber.com yevgeni@uber.com
dummy_pool.py
in petastorm/workers_pool
38 7 2018-07-24 2018-10-06 4 2 yevgeni@uber.com yevgeni@uber.com
cli.py
in petastorm/benchmark
63 3 2018-09-20 2020-11-09 4 2 yevgeni@uber.com selitvin@users.noreply.gith...
copy_dataset.py
in petastorm/tools
77 4 2018-09-25 2019-06-07 4 4 yevgeni@uber.com dimitrov@uber.com
pytorch_shuffling_buffer.py
in petastorm/reader_impl
123 22 2020-04-18 2022-07-26 4 3 felipe.such@uber.com 74630762+chongxiaoc@users.n...
__init__.py
in petastorm/spark
2 - 2020-02-20 2020-03-10 3 2 liang.zhang@databricks.com weichen.xu@databricks.com
__init__.py
in petastorm/workers_pool
3 - 2018-07-24 2018-11-26 3 2 yevgeni@uber.com yevgeni@uber.com
generator.py
in petastorm
15 1 2018-07-24 2018-08-30 3 3 yevgeni@uber.com ocheng@uber.com
legacy.py
in petastorm/etl
36 3 2018-08-10 2021-07-28 3 2 selitvin@users.noreply.gith... selitvin@users.noreply.gith...
selectors.py
in petastorm
46 11 2018-07-24 2019-05-05 3 3 yevgeni@uber.com greg.arutiunian@ge.com
pickle_serializer.py
in petastorm/reader_impl
6 2 2018-10-03 2022-03-24 2 2 yevgeni@uber.com 54285210+rbetz@users.norepl...
arrow_table_serializer.py
in petastorm/reader_impl
12 2 2018-12-05 2021-07-26 2 2 yevgeni@uber.com selitvin@users.noreply.gith...
__init__.py
in petastorm/tools
1 - 2018-09-25 2018-09-25 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/benchmark
1 - 2018-09-20 2018-09-20 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/reader_impl
1 - 2018-08-17 2018-08-17 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/hdfs
1 - 2018-07-24 2018-07-24 1 1 yevgeni@uber.com yevgeni@uber.com
errors.py
in petastorm
1 - 2019-04-05 2019-04-05 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/pyarrow_helpers
1 - 2018-11-11 2018-11-11 1 1 yevgeni@uber.com yevgeni@uber.com
setup.cfg
in root
2 - 2018-08-30 2018-08-30 1 1 robbieg@uber.com robbieg@uber.com
Files With Most Contributors (Top 50)
Based on the number of unique email addresses found in commits.
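As a rough illustration of this counting rule, the Python sketch below counts distinct author emails per file. It is a sketch under assumptions rather than the report's own method: the `(email, paths)` record shape, the function name `contributors_per_file`, the sample addresses, and the lowercasing of emails are all hypothetical choices.

```python
from collections import defaultdict


def contributors_per_file(commits):
    """Map each file to the number of distinct author emails that changed it."""
    emails = defaultdict(set)
    for email, paths in commits:
        for path in paths:
            # Assumption: addresses are compared case-insensitively.
            emails[path].add(email.strip().lower())
    return {path: len(addrs) for path, addrs in emails.items()}


# Hypothetical commit records: (author email, files touched).
commits = [
    ("alice@example.com", ["petastorm/reader.py", "setup.py"]),
    ("bob@example.com", ["petastorm/reader.py"]),
    ("Alice@example.com", ["petastorm/reader.py"]),
]

counts = contributors_per_file(commits)
print(counts)
```

Counting by email address means that one person who commits under two different addresses is counted as two contributors.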

File | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
reader.py
in petastorm
381 21 2018-07-24 2025-10-31 72 25 yevgeni@uber.com kashish@uber.com
fs_utils.py
in petastorm
106 9 2018-07-24 2021-07-26 21 16 yevgeni@uber.com manjuransari143@gmail.com
__init__.py
in petastorm
4 - 2018-07-24 2025-12-19 91 15 yevgeni@uber.com kashish@uber.com
setup.py
in root
92 - 2018-07-19 2025-08-09 44 14 yevgeni@uber.com 40239280+arushi297@users.no...
dataset_metadata.py
in petastorm/etl
201 12 2018-07-24 2021-04-26 37 14 yevgeni@uber.com tgaddair@gmail.com
pytorch.py
in petastorm
248 19 2018-08-16 2022-04-14 23 11 ocheng@uberatc.com tir.karthi@gmail.com
unischema.py
in petastorm
266 23 2018-07-24 2021-07-29 37 10 yevgeni@uber.com selitvin@users.noreply.gith...
tf_utils.py
in petastorm
180 17 2018-07-24 2021-10-22 21 10 yevgeni@uber.com gregor.ehrensperger@gmx.com
225 11 2018-12-05 2025-12-16 24 9 yevgeni@uber.com kashish@uber.com
98 3 2018-08-13 2019-06-07 15 9 robbieg@uber.com dimitrov@uber.com
process_pool.py
in petastorm/workers_pool
214 14 2018-07-24 2022-06-24 22 8 yevgeni@uber.com farahani.mostafa@gmail.com
codecs.py
in petastorm
154 24 2018-07-24 2020-09-10 19 8 yevgeni@uber.com dmcguire81@gmail.com
rowgroup_indexing.py
in petastorm/etl
71 4 2018-07-24 2020-11-09 14 7 yevgeni@uber.com selitvin@users.noreply.gith...
namenode.py
in petastorm/hdfs
159 16 2018-07-24 2022-01-10 12 7 yevgeni@uber.com selitvin@users.noreply.gith...
ventilator.py
in petastorm/workers_pool
87 12 2018-07-24 2025-08-09 10 7 yevgeni@uber.com 40239280+arushi297@users.no...
spark_dataset_converter.py
in petastorm/spark
403 37 2020-02-20 2023-01-30 27 6 liang.zhang@databricks.com weichen.xu@databricks.com
ngram.py
in petastorm
105 15 2018-08-18 2020-12-07 10 6 kash@uber.com 44522052+ritwikbera@users.n...
throughput.py
in petastorm/benchmark
124 8 2018-09-20 2020-11-09 10 6 yevgeni@uber.com selitvin@users.noreply.gith...
thread_pool.py
in petastorm/workers_pool
150 13 2018-07-24 2025-08-09 10 6 yevgeni@uber.com 40239280+arushi297@users.no...
43 3 2018-07-24 2025-10-31 7 6 yevgeni@uber.com kashish@uber.com
155 12 2018-12-05 2022-08-23 15 5 yevgeni@uber.com 74630762+chongxiaoc@users.n...
utils.py
in petastorm
61 3 2018-07-24 2020-11-09 14 5 yevgeni@uber.com selitvin@users.noreply.gith...
predicates.py
in petastorm
93 21 2018-07-24 2022-04-14 7 5 yevgeni@uber.com tir.karthi@gmail.com
metadata_util.py
in petastorm/etl
45 - 2018-07-24 2019-01-17 6 5 yevgeni@uber.com kimham@kth.se
spark_utils.py
in petastorm
18 1 2018-07-24 2019-01-17 6 5 yevgeni@uber.com kimham@kth.se
exec_in_new_process.py
in petastorm/workers_pool
18 1 2018-07-24 2022-01-08 9 4 yevgeni@uber.com selitvin@users.noreply.gith...
transform.py
in petastorm
32 3 2018-12-29 2020-04-02 8 4 yevgeni@uber.com weichen.xu@databricks.com
48 10 2018-09-13 2020-12-02 7 4 yevgeni@uber.com gueguenster@gmail.com
rowgroup_indexers.py
in petastorm/etl
73 14 2018-07-24 2018-08-18 5 4 yevgeni@uber.com m.eremeev@uber.com
copy_dataset.py
in petastorm/tools
77 4 2018-09-25 2019-06-07 4 4 yevgeni@uber.com dimitrov@uber.com
shuffling_buffer.py
in petastorm/reader_impl
75 20 2018-08-17 2019-08-09 6 3 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/etl
21 6 2018-07-24 2018-11-26 4 3 yevgeni@uber.com yevgeni@uber.com
pytorch_shuffling_buffer.py
in petastorm/reader_impl
123 22 2020-04-18 2022-07-26 4 3 felipe.such@uber.com 74630762+chongxiaoc@users.n...
generator.py
in petastorm
15 1 2018-07-24 2018-08-30 3 3 yevgeni@uber.com ocheng@uber.com
selectors.py
in petastorm
46 11 2018-07-24 2019-05-05 3 3 yevgeni@uber.com greg.arutiunian@ge.com
cache.py
in petastorm
9 2 2018-07-24 2018-11-26 4 2 yevgeni@uber.com yevgeni@uber.com
cli.py
in petastorm/benchmark
63 3 2018-09-20 2020-11-09 4 2 yevgeni@uber.com selitvin@users.noreply.gith...
dummy_pool.py
in petastorm/workers_pool
38 7 2018-07-24 2018-10-06 4 2 yevgeni@uber.com yevgeni@uber.com
legacy.py
in petastorm/etl
36 3 2018-08-10 2021-07-28 3 2 selitvin@users.noreply.gith... selitvin@users.noreply.gith...
__init__.py
in petastorm/spark
2 - 2020-02-20 2020-03-10 3 2 liang.zhang@databricks.com weichen.xu@databricks.com
__init__.py
in petastorm/workers_pool
3 - 2018-07-24 2018-11-26 3 2 yevgeni@uber.com yevgeni@uber.com
pickle_serializer.py
in petastorm/reader_impl
6 2 2018-10-03 2022-03-24 2 2 yevgeni@uber.com 54285210+rbetz@users.norepl...
arrow_table_serializer.py
in petastorm/reader_impl
12 2 2018-12-05 2021-07-26 2 2 yevgeni@uber.com selitvin@users.noreply.gith...
setup.cfg
in root
2 - 2018-08-30 2018-08-30 1 1 robbieg@uber.com robbieg@uber.com
__init__.py
in petastorm/pyarrow_helpers
1 - 2018-11-11 2018-11-11 1 1 yevgeni@uber.com yevgeni@uber.com
batching_table_queue.py
in petastorm/pyarrow_helpers
30 4 2018-11-11 2018-11-11 1 1 yevgeni@uber.com yevgeni@uber.com
errors.py
in petastorm
1 - 2019-04-05 2019-04-05 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/hdfs
1 - 2018-07-24 2018-07-24 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/reader_impl
1 - 2018-08-17 2018-08-17 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/benchmark
1 - 2018-09-20 2018-09-20 1 1 yevgeni@uber.com yevgeni@uber.com
Files With Fewest Contributors (Top 50)
Based on the number of unique email addresses found in commits.

File | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
65 2 2019-03-28 2019-03-28 1 1 yevgeni@uber.com yevgeni@uber.com
dummy_reader.py
in petastorm/benchmark
56 6 2020-04-18 2020-04-18 1 1 felipe.such@uber.com felipe.such@uber.com
batching_table_queue.py
in petastorm/pyarrow_helpers
30 4 2018-11-11 2018-11-11 1 1 yevgeni@uber.com yevgeni@uber.com
spark_session_cli.py
in petastorm/tools
28 3 2018-09-25 2018-09-25 1 1 yevgeni@uber.com yevgeni@uber.com
exec_in_new_process_entrypoint.py
in petastorm/workers_pool
18 - 2022-01-08 2022-01-08 1 1 selitvin@users.noreply.gith... selitvin@users.noreply.gith...
worker_base.py
in petastorm/workers_pool
11 3 2018-07-24 2018-07-24 1 1 yevgeni@uber.com yevgeni@uber.com
setup.cfg
in root
2 - 2018-08-30 2018-08-30 1 1 robbieg@uber.com robbieg@uber.com
__init__.py
in petastorm/pyarrow_helpers
1 - 2018-11-11 2018-11-11 1 1 yevgeni@uber.com yevgeni@uber.com
errors.py
in petastorm
1 - 2019-04-05 2019-04-05 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/hdfs
1 - 2018-07-24 2018-07-24 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/reader_impl
1 - 2018-08-17 2018-08-17 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/benchmark
1 - 2018-09-20 2018-09-20 1 1 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/tools
1 - 2018-09-25 2018-09-25 1 1 yevgeni@uber.com yevgeni@uber.com
cli.py
in petastorm/benchmark
63 3 2018-09-20 2020-11-09 4 2 yevgeni@uber.com selitvin@users.noreply.gith...
dummy_pool.py
in petastorm/workers_pool
38 7 2018-07-24 2018-10-06 4 2 yevgeni@uber.com yevgeni@uber.com
legacy.py
in petastorm/etl
36 3 2018-08-10 2021-07-28 3 2 selitvin@users.noreply.gith... selitvin@users.noreply.gith...
arrow_table_serializer.py
in petastorm/reader_impl
12 2 2018-12-05 2021-07-26 2 2 yevgeni@uber.com selitvin@users.noreply.gith...
cache.py
in petastorm
9 2 2018-07-24 2018-11-26 4 2 yevgeni@uber.com yevgeni@uber.com
pickle_serializer.py
in petastorm/reader_impl
6 2 2018-10-03 2022-03-24 2 2 yevgeni@uber.com 54285210+rbetz@users.norepl...
__init__.py
in petastorm/workers_pool
3 - 2018-07-24 2018-11-26 3 2 yevgeni@uber.com yevgeni@uber.com
__init__.py
in petastorm/spark
2 - 2020-02-20 2020-03-10 3 2 liang.zhang@databricks.com weichen.xu@databricks.com
pytorch_shuffling_buffer.py
in petastorm/reader_impl
123 22 2020-04-18 2022-07-26 4 3 felipe.such@uber.com 74630762+chongxiaoc@users.n...
shuffling_buffer.py
in petastorm/reader_impl
75 20 2018-08-17 2019-08-09 6 3 yevgeni@uber.com yevgeni@uber.com
selectors.py
in petastorm
46 11 2018-07-24 2019-05-05 3 3 yevgeni@uber.com greg.arutiunian@ge.com
__init__.py
in petastorm/etl
21 6 2018-07-24 2018-11-26 4 3 yevgeni@uber.com yevgeni@uber.com
generator.py
in petastorm
15 1 2018-07-24 2018-08-30 3 3 yevgeni@uber.com ocheng@uber.com
copy_dataset.py
in petastorm/tools
77 4 2018-09-25 2019-06-07 4 4 yevgeni@uber.com dimitrov@uber.com
rowgroup_indexers.py
in petastorm/etl
73 14 2018-07-24 2018-08-18 5 4 yevgeni@uber.com m.eremeev@uber.com
48 10 2018-09-13 2020-12-02 7 4 yevgeni@uber.com gueguenster@gmail.com
transform.py
in petastorm
32 3 2018-12-29 2020-04-02 8 4 yevgeni@uber.com weichen.xu@databricks.com
exec_in_new_process.py
in petastorm/workers_pool
18 1 2018-07-24 2022-01-08 9 4 yevgeni@uber.com selitvin@users.noreply.gith...
155 12 2018-12-05 2022-08-23 15 5 yevgeni@uber.com 74630762+chongxiaoc@users.n...
predicates.py
in petastorm
93 21 2018-07-24 2022-04-14 7 5 yevgeni@uber.com tir.karthi@gmail.com
utils.py
in petastorm
61 3 2018-07-24 2020-11-09 14 5 yevgeni@uber.com selitvin@users.noreply.gith...
metadata_util.py
in petastorm/etl
45 - 2018-07-24 2019-01-17 6 5 yevgeni@uber.com kimham@kth.se
spark_utils.py
in petastorm
18 1 2018-07-24 2019-01-17 6 5 yevgeni@uber.com kimham@kth.se
spark_dataset_converter.py
in petastorm/spark
403 37 2020-02-20 2023-01-30 27 6 liang.zhang@databricks.com weichen.xu@databricks.com
thread_pool.py
in petastorm/workers_pool
150 13 2018-07-24 2025-08-09 10 6 yevgeni@uber.com 40239280+arushi297@users.no...
throughput.py
in petastorm/benchmark
124 8 2018-09-20 2020-11-09 10 6 yevgeni@uber.com selitvin@users.noreply.gith...
ngram.py
in petastorm
105 15 2018-08-18 2020-12-07 10 6 kash@uber.com 44522052+ritwikbera@users.n...
43 3 2018-07-24 2025-10-31 7 6 yevgeni@uber.com kashish@uber.com
namenode.py
in petastorm/hdfs
159 16 2018-07-24 2022-01-10 12 7 yevgeni@uber.com selitvin@users.noreply.gith...
ventilator.py
in petastorm/workers_pool
87 12 2018-07-24 2025-08-09 10 7 yevgeni@uber.com 40239280+arushi297@users.no...
rowgroup_indexing.py
in petastorm/etl
71 4 2018-07-24 2020-11-09 14 7 yevgeni@uber.com selitvin@users.noreply.gith...
process_pool.py
in petastorm/workers_pool
214 14 2018-07-24 2022-06-24 22 8 yevgeni@uber.com farahani.mostafa@gmail.com
codecs.py
in petastorm
154 24 2018-07-24 2020-09-10 19 8 yevgeni@uber.com dmcguire81@gmail.com
225 11 2018-12-05 2025-12-16 24 9 yevgeni@uber.com kashish@uber.com
98 3 2018-08-13 2019-06-07 15 9 robbieg@uber.com dimitrov@uber.com
unischema.py
in petastorm
266 23 2018-07-24 2021-07-29 37 10 yevgeni@uber.com selitvin@users.noreply.gith...
tf_utils.py
in petastorm
180 17 2018-07-24 2021-10-22 21 10 yevgeni@uber.com gregor.ehrensperger@gmx.com