Petastorm
File Change Frequency

File change frequency measurements show how file updates are distributed across the code base. One update is a day on which a file received at least one commit.
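The counting rule above (one update per day with at least one commit) can be sketched as follows. This is a minimal illustration, not the report generator's actual code: the `count_updates` function and the sample commit log are hypothetical, and the input is assumed to be a list of (file path, commit date) pairs.

```python
from collections import defaultdict
from datetime import date


def count_updates(commits):
    """Return {path: number of distinct days on which the file was committed}."""
    days_by_file = defaultdict(set)
    for path, day in commits:
        # A set keeps each day once, so several commits on one day count as one update.
        days_by_file[path].add(day)
    return {path: len(days) for path, days in days_by_file.items()}


# Hypothetical commit log: (file path, commit date) pairs.
commits = [
    ("petastorm/reader.py", date(2020, 1, 6)),
    ("petastorm/reader.py", date(2020, 1, 6)),  # same day: not a new update
    ("petastorm/reader.py", date(2020, 1, 9)),
    ("setup.py", date(2020, 1, 9)),
]
updates = count_updates(commits)
print(updates)
```

Under this rule, two commits to the same file on the same day count as a single update, so the update count can be lower than the raw commit count.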

File Change Frequency Overall
The number of recorded file updates
  • There are 84 files with 5,134 lines of code.
    • 0 files changed more than 100 times (0 lines of code)
    • 2 files changed 51 to 100 times (329 lines of code)
    • 4 files changed 21 to 50 times (931 lines of code)
    • 19 files changed 6 to 20 times (2,207 lines of code)
    • 59 files changed 5 or fewer times (1,667 lines of code)
Share of lines of code by number of updates:
101+: 0% | 51-100: 6% | 21-50: 18% | 6-20: 42% | 1-5: 32%
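The bucketing behind these figures can be sketched as below. This is an illustration under stated assumptions, not the tool's implementation: `BUCKETS`, `bucket_for` and `loc_share` are hypothetical names. The percentages are truncated rather than rounded because that reproduces the 42% shown for the 6-20 range (2,207 of 5,134 lines is about 42.99%). That choice is inferred from the numbers in this report, not from documented behaviour.

```python
# Update-count ranges from the legend, as (label, lower bound, upper bound).
BUCKETS = [
    ("101+", 101, None),
    ("51-100", 51, 100),
    ("21-50", 21, 50),
    ("6-20", 6, 20),
    ("1-5", 1, 5),
]


def bucket_for(changes):
    """Return the legend label for a file with the given number of updates."""
    for label, low, high in BUCKETS:
        if changes >= low and (high is None or changes <= high):
            return label
    raise ValueError(f"no bucket for {changes} changes")


def loc_share(loc_by_bucket):
    """Return each bucket's share of total lines of code, truncated to whole percent."""
    total = sum(loc_by_bucket.values())
    return {label: 100 * loc // total for label, loc in loc_by_bucket.items()}


# Lines of code per bucket, taken from the summary list in this report.
loc_by_bucket = {"101+": 0, "51-100": 329, "21-50": 931, "6-20": 2207, "1-5": 1667}
shares = loc_share(loc_by_bucket)
print(shares)
```

Because each share is truncated, the five percentages add up to 98% rather than 100%.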


File Change Frequency per File Extension
Extensions: py, crc, parquet
The number of recorded file updates
Extension | 101+ | 51-100 | 21-50 | 6-20 | 1-5 updates
py | 0% | 6% | 18% | 42% | 32%
File Change Frequency per Logical Decomposition
Logical decomposition: primary
The number of recorded file updates
Component | 101+ | 51-100 | 21-50 | 6-20 | 1-5 updates
petastorm | 0% | 15% | 12% | 56% | 15%
petastorm/spark | 0% | 0% | 99% | 0% | <1%
petastorm/etl | 0% | 0% | 38% | 32% | 29%
ROOT | 0% | 0% | 100% | 0% | 0%
petastorm/workers_pool | 0% | 0% | 0% | 83% | 16%
petastorm/hdfs | 0% | 0% | 0% | 99% | <1%
petastorm/benchmark | 0% | 0% | 0% | 50% | 49%
petastorm/reader_impl | 0% | 0% | 0% | 31% | 68%
examples/mnist | 0% | 0% | 0% | 23% | 76%
examples/spark_dataset_converter | 0% | 0% | 0% | 0% | 100%
examples/hello_world | 0% | 0% | 0% | 0% | 100%
examples | 0% | 0% | 0% | 0% | 100%
examples/imagenet | 0% | 0% | 0% | 0% | 100%
petastorm/tools | 0% | 0% | 0% | 0% | 100%
petastorm/gcsfs_helpers | 0% | 0% | 0% | 0% | 100%
petastorm/pyarrow_helpers | 0% | 0% | 0% | 0% | 100%
Most Frequently Changed Files (Top 50)
File | Component | # lines | # units | Last modified (days ago) | Created (days ago) | # changes
__init__.py | petastorm | 4 | - | 17 | 781 | 62
reader.py | petastorm | 325 | 20 | 50 | 781 | 59
setup.py | ROOT | 86 | - | 19 | 787 | 37
unischema.py | petastorm | 265 | 23 | 47 | 781 | 32
dataset_metadata.py | petastorm/etl | 200 | 12 | 2 | 781 | 31
spark_dataset_converter.py | petastorm/spark | 380 | 36 | 51 | 206 | 23
process_pool.py | petastorm/workers_pool | 213 | 14 | 225 | 781 | 19
codecs.py | petastorm | 154 | 24 | 2 | 781 | 18
pytorch.py | petastorm | 171 | 14 | 64 | 758 | 16
tf_utils.py | petastorm | 177 | 17 | 143 | 781 | 16
arrow_reader_worker.py | petastorm | 183 | 10 | 50 | 648 | 16
fs_utils.py | petastorm | 115 | 9 | 152 | 781 | 14
rowgroup_indexing.py | petastorm/etl | 73 | 4 | 338 | 781 | 13
petastorm_generate_metadata.py | petastorm/etl | 98 | 3 | 463 | 761 | 13
utils.py | petastorm | 62 | 3 | 153 | 781 | 12
namenode.py | petastorm/hdfs | 162 | 16 | 103 | 781 | 11
py_dict_reader_worker.py | petastorm | 147 | 12 | 50 | 648 | 10
ngram.py | petastorm | 105 | 15 | 339 | 757 | 9
transform.py | petastorm | 32 | 3 | 163 | 624 | 8
thread_pool.py | petastorm/workers_pool | 126 | 11 | 657 | 781 | 8
throughput.py | petastorm/benchmark | 126 | 8 | 150 | 723 | 8
weighted_sampling_reader.py | petastorm | 44 | 8 | 123 | 731 | 7
generate_petastorm_mnist.py | examples/mnist | 63 | 3 | 449 | 764 | 7
ventilator.py | petastorm/workers_pool | 81 | 12 | 529 | 781 | 7
shuffling_buffer.py | petastorm/reader_impl | 75 | 20 | 401 | 757 | 6
spark_utils.py | petastorm | 18 | 1 | 604 | 781 | 5
exec_in_new_process.py | petastorm/workers_pool | 31 | 1 | 596 | 781 | 5
dummy_pool.py | petastorm/workers_pool | 38 | 7 | 707 | 781 | 5
metadata_util.py | petastorm/etl | 45 | - | 604 | 781 | 5
generate_petastorm_imagenet.py | examples/imagenet | 67 | 3 | 744 | 775 | 5
predicates.py | petastorm | 93 | 21 | 454 | 781 | 5
pytorch_example.py | examples/mnist | 115 | 6 | 596 | 721 | 5
cache.py | petastorm | 9 | 2 | 657 | 781 | 4
compat.py | petastorm | 48 | 7 | 153 | 338 | 4
rowgroup_indexers.py | petastorm/etl | 73 | 14 | 757 | 781 | 4
copy_dataset.py | petastorm/tools | 77 | 4 | 463 | 718 | 4
__init__.py | petastorm/spark | 2 | - | 186 | 206 | 3
__init__.py | petastorm/workers_pool | 3 | - | 657 | 781 | 3
schema.py | examples/mnist | 9 | - | 758 | 764 | 3
generator.py | petastorm | 15 | 1 | 744 | 781 | 3
__init__.py | petastorm/etl | 21 | 6 | 657 | 781 | 3
local_disk_cache.py | petastorm | 26 | 3 | 759 | 781 | 3
selectors.py | petastorm | 46 | 11 | 496 | 781 | 3
gcsfs_wrapper.py | petastorm/gcsfs_helpers | 49 | 3 | 51 | 234 | 3
tensorflow_converter_example.py | examples/spark_dataset_converter | 62 | 4 | 64 | 162 | 3
cli.py | petastorm/benchmark | 67 | 3 | 467 | 723 | 3
__init__.py | examples/mnist | 1 | - | 739 | 764 | 2
schema.py | examples/imagenet | 9 | - | 768 | 775 | 2
legacy.py | petastorm/etl | 16 | 1 | 759 | 765 | 2
utils.py | examples/spark_dataset_converter | 16 | 2 | 64 | 162 | 2