Petastorm
Unit Size

Intro
  • Unit size measurements show how the code is distributed across units (methods, functions, etc.) of different sizes.
  • Units are classified in five categories based on their size (lines of code): 1-10 (very small units), 11-20 (small units), 21-50 (medium size units), 51-100 (long units), 101+ (very long units).
  • You should aim to keep units small (no more than 20 lines). Long units may become "bloaters": code that has grown so large that it is hard to work with.
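The size categories can be expressed as a simple lookup. The following is a minimal Python sketch, assuming the five ranges used in the charts of this report (1-10, 11-20, 21-50, 51-100, 101+). The names `BUCKETS` and `classify_unit` are hypothetical and are not part of the analysis tool or of Petastorm.

```python
# Size buckets as (inclusive upper bound, label); the last bucket is open-ended.
# The bounds are assumed from the ranges used in this report.
BUCKETS = [
    (10, "very small"),
    (20, "small"),
    (50, "medium size"),
    (100, "long"),
]


def classify_unit(lines_of_code: int) -> str:
    """Return the size category for a unit with the given line count."""
    if lines_of_code < 1:
        raise ValueError("a unit must have at least one line of code")
    for upper_bound, label in BUCKETS:
        if lines_of_code <= upper_bound:
            return label
    return "very long"


if __name__ == "__main__":
    for size in (10, 20, 47, 67, 101):
        print(size, classify_unit(size))
```

With these bounds, a 20-line unit still counts as small, while a 21-line unit is already medium size.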
Unit Size Overall
  • There are 467 units containing 4,204 lines of code (81.9% of all code).
    • 0 very long units (0 lines of code)
    • 6 long units (345 lines of code)
    • 43 medium size units (1,233 lines of code)
    • 92 small units (1,315 lines of code)
    • 326 very small units (1,311 lines of code)
Share of lines of code by unit size (lines of code per unit):
101+: 0% | 51-100: 8% | 21-50: 29% | 11-20: 31% | 1-10: 31%
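The overall percentages follow from the line counts listed above. Here is a small Python sketch that reproduces them, assuming each share is the category's lines of code divided by the total lines of code in units, truncated to a whole percent. The truncation is an assumption, inferred from per-component rows in this report that sum to slightly less than 100%. The names `LINES_PER_CATEGORY` and `distribution` are hypothetical.

```python
# Lines of code per size category, copied from the overall summary.
LINES_PER_CATEGORY = {
    "101+": 0,
    "51-100": 345,
    "21-50": 1233,
    "11-20": 1315,
    "1-10": 1311,
}


def distribution(lines_per_category):
    """Return each category's share of the total as a whole percent.

    Shares are truncated (not rounded); this is an assumption about
    how the report derives its figures.
    """
    total = sum(lines_per_category.values())
    if total == 0:
        return {category: 0 for category in lines_per_category}
    return {
        category: 100 * lines // total
        for category, lines in lines_per_category.items()
    }


if __name__ == "__main__":
    for category, share in distribution(LINES_PER_CATEGORY).items():
        print(f"{category}: {share}%")
```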
Unit Size per Extension
Extension | 101+ | 51-100 | 21-50 | 11-20 | 1-10
py | 0% | 8% | 29% | 31% | 31%
Unit Size per Logical Component
primary logical decomposition
Component | 101+ | 51-100 | 21-50 | 11-20 | 1-10
petastorm | 0% | 15% | 26% | 25% | 32%
petastorm/workers_pool | 0% | 12% | 18% | 37% | 31%
petastorm/etl | 0% | 0% | 50% | 22% | 26%
examples/mnist | 0% | 0% | 57% | 33% | 9%
petastorm/benchmark | 0% | 0% | 49% | 36% | 14%
examples/spark_dataset_converter | 0% | 0% | 40% | 40% | 19%
petastorm/tools | 0% | 0% | 58% | 11% | 30%
examples/imagenet | 0% | 0% | 66% | 20% | 13%
petastorm/reader_impl | 0% | 0% | 15% | 13% | 70%
petastorm/gcsfs_helpers | 0% | 0% | 55% | 25% | 18%
petastorm/hdfs | 0% | 0% | 16% | 44% | 38%
petastorm/spark | 0% | 0% | 6% | 55% | 38%
examples/hello_world | 0% | 0% | 0% | 74% | 25%
petastorm/pyarrow_helpers | 0% | 0% | 0% | 55% | 44%
Longest Units
Top 50 longest units
Unit | File | # lines | McCabe index | # params
def __init__() | petastorm/fs_utils.py | 67 | 19 | 6
def make_reader() | petastorm/reader.py | 63 | 11 | 21
def namedtuple_gt_255_fields() | petastorm/namedtuple_gt_255_fields.py | 55 | 6 | 5
def make_batch_reader() | petastorm/reader.py | 54 | 9 | 19
def __init__() | petastorm/reader.py | 53 | 20 | 17
def _worker_bootstrap() | petastorm/workers_pool/process_pool.py | 53 | 8 | 8
def train_and_test() | examples/mnist/tf_example.py | 47 | 6 | 4
def main() | examples/mnist/pytorch_example.py | 47 | 9 | 0
def _parse_args() | petastorm/benchmark/cli.py | 40 | 2 | 1
def generate_petastorm_metadata() | petastorm/etl/petastorm_generate_metadata.py | 37 | 15 | 5
def imagenet_directory_to_petastorm_dataset() | examples/imagenet/generate_petastorm_imagenet.py | 35 | 7 | 5
def from_arrow_schema() | petastorm/unischema.py | 35 | 13 | 3
def read_next() | petastorm/arrow_reader_worker.py | 34 | 10 | 4
def _main() | petastorm/etl/petastorm_generate_metadata.py | 34 | 3 | 1
def materialize_dataset() | petastorm/etl/dataset_metadata.py | 34 | 7 | 6
def run() | examples/spark_dataset_converter/tensorflow_converter_example.py | 33 | 7 | 1
def _load_rows_with_predicate() | petastorm/arrow_reader_worker.py | 33 | 7 | 5
def _load_rows_with_predicate() | petastorm/py_dict_reader_worker.py | 33 | 14 | 5
def _numpy_and_codec_from_arrow_type() | petastorm/unischema.py | 33 | 15 | 1
def get_results() | petastorm/workers_pool/process_pool.py | 32 | 16 | 1
def _add_many() | petastorm/reader_impl/pytorch_shuffling_buffer.py | 31 | 12 | 2
def run() | examples/spark_dataset_converter/pytorch_converter_example.py | 30 | 7 | 1
def main() | petastorm/benchmark/dummy_reader.py | 29 | 7 | 3
def reader_throughput() | petastorm/benchmark/throughput.py | 29 | 6 | 13
def process() | petastorm/arrow_reader_worker.py | 29 | 11 | 4
def copy_dataset() | petastorm/tools/copy_dataset.py | 28 | 10 | 9
def _iter_impl() | petastorm/pytorch.py | 28 | 10 | 1
def mnist_data_to_petastorm_dataset() | examples/mnist/generate_petastorm_mnist.py | 27 | 5 | 5
def form_ngram() | petastorm/ngram.py | 26 | 16 | 3
def args_parser() | petastorm/tools/copy_dataset.py | 26 | 1 | 0
def dict_to_spark_row() | petastorm/unischema.py | 26 | 9 | 2
def _init_spark() | petastorm/etl/dataset_metadata.py | 25 | 1 | 4
def load_row_groups() | petastorm/etl/dataset_metadata.py | 25 | 9 | 1
def process() | petastorm/py_dict_reader_worker.py | 25 | 10 | 4
def _iter_impl() | petastorm/pytorch.py | 25 | 12 | 1
def add_to_dataset_metadata() | petastorm/utils.py | 25 | 11 | 3
def walk() | petastorm/gcsfs_helpers/gcsfs_wrapper.py | 24 | 11 | 2
def build_rowgroup_index() | petastorm/etl/rowgroup_indexing.py | 24 | 7 | 4
def _load_rows() | petastorm/arrow_reader_worker.py | 23 | 7 | 4
def encode() | petastorm/codecs.py | 23 | 11 | 3
def _apply_predicate_to_row_groups() | petastorm/reader.py | 23 | 6 | 4
def __init__() | petastorm/workers_pool/ventilator.py | 23 | 8 | 7
def __init__() | petastorm/hdfs/namenode.py | 23 | 5 | 2
def make_petastorm_dataset() | petastorm/tf_utils.py | 23 | 3 | 1
def run() | petastorm/workers_pool/thread_pool.py | 22 | 10 | 1
def __init__() | petastorm/weighted_sampling_reader.py | 21 | 11 | 3
def _split_row_groups() | petastorm/etl/dataset_metadata.py | 21 | 8 | 1
def _sanitize_field_tf_types() | petastorm/tf_utils.py | 21 | 17 | 1
def _wait_file_available() | petastorm/spark/spark_dataset_converter.py | 21 | 6 | 1
def main() | examples/mnist/tf_example.py | 20 | 1 | 0
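A ranking like the one above can be approximated for any Python source with the standard `ast` module. The sketch below is illustrative only and is not how this report was produced. It measures the physical line span of each function (from the `def` line to its last line), so blank lines and comments are included, which may differ from the lines-of-code count used in the report. The function name `longest_units` and the `SAMPLE` source are hypothetical; `end_lineno` requires Python 3.8 or later.

```python
import ast


def longest_units(source: str, top: int = 5):
    """Return (name, line_span) pairs for the longest functions in source."""
    tree = ast.parse(source)
    units = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Physical span from the "def" line to the last line of the body.
            span = node.end_lineno - node.lineno + 1
            units.append((node.name, span))
    # Longest first; ties keep their discovery order because sort is stable.
    return sorted(units, key=lambda unit: unit[1], reverse=True)[:top]


# A made-up source string used only to exercise the function.
SAMPLE = '''
def short():
    return 1


def longer(x):
    total = 0
    for i in range(x):
        total += i
    return total
'''

if __name__ == "__main__":
    for name, span in longest_units(SAMPLE):
        print(f"{name}: {span} lines")
```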