Petastorm
Duplication

Intro
  • Duplication is measured by finding places in the code where six or more lines are exactly the same.
  • Before duplication is calculated, the code is cleaned to remove empty lines, comments, and frequently duplicated constructs such as imports.
  • Aim to keep duplicated code as low as possible (<5%), because a high level of duplication can lead to maintenance difficulties, poor factoring, and logical contradictions.
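
The approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis tool's actual implementation: the cleaning rules (dropping blank lines, `#` comment lines and import statements) and the helper names `clean` and `find_duplicate_windows` are hypothetical.

```python
from collections import defaultdict

MIN_BLOCK = 6  # threshold stated in the report: six or more identical lines


def clean(source: str) -> list[tuple[int, str]]:
    """Return (original line number, stripped text) for lines kept after cleaning.

    Assumption: cleaning drops blank lines, '#' comment lines and import
    statements; the real tool's rules are not shown in the report.
    """
    kept = []
    for number, raw in enumerate(source.splitlines(), start=1):
        text = raw.strip()
        if not text or text.startswith("#"):
            continue
        if text.startswith("import ") or text.startswith("from "):
            continue
        kept.append((number, text))
    return kept


def find_duplicate_windows(files: dict[str, str], min_block: int = MIN_BLOCK):
    """Map each window of `min_block` cleaned lines seen more than once to its locations."""
    seen = defaultdict(list)
    for name, source in files.items():
        lines = clean(source)
        for start in range(len(lines) - min_block + 1):
            window = tuple(text for _, text in lines[start:start + min_block])
            first_line = lines[start][0]
            last_line = lines[start + min_block - 1][0]
            seen[window].append((name, first_line, last_line))
    # Keep only windows that occur in at least two places.
    return {w: locs for w, locs in seen.items() if len(locs) > 1}
```

Each location is reported as (file, first line, last line) in original line numbers, which is why a duplicate of 15 cleaned lines can span a longer range such as 142:164 in the source file. A duplicate longer than six lines produces several overlapping windows here; a fuller implementation would merge them into one maximal block.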
Duplication Overall
  • 5% duplication:
    • 4,696 lines of cleaned code (without empty lines, comments, and frequently duplicated constructs such as imports)
    • 246 duplicated lines
  • 13 duplicates
system: 5% (246 lines)
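
The overall percentage can be reproduced from the two line counts. The sketch below assumes the figure is simply duplicated lines divided by cleaned lines, which is consistent with the numbers reported here; the function name `duplication_percent` is hypothetical.

```python
cleaned_lines = 4696      # lines of cleaned code reported above
duplicated_lines = 246    # duplicated lines reported above


def duplication_percent(duplicated: int, cleaned: int) -> float:
    """Share of cleaned lines that belong to a duplicate, in percent."""
    return 100.0 * duplicated / cleaned


percent = duplication_percent(duplicated_lines, cleaned_lines)
print(f"{percent:.1f}%")  # prints "5.2%"
```

Under that assumption the unrounded value is about 5.2%, so the system sits just above the <5% target even though the rounded figure reads 5%.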
Duplication per Extension
py: 5% (246 lines)
Duplication per Component (primary)
petastorm: 6% (120 lines)
examples/spark_dataset_converter: 26% (44 lines)
petastorm/reader_impl: 14% (32 lines)
petastorm/etl: 6% (32 lines)
petastorm/benchmark: 8% (18 lines)
ROOT: 0% (0 lines)
examples/mnist: 0% (0 lines)
examples/hello_world: 0% (0 lines)
examples: 0% (0 lines)
examples/imagenet: 0% (0 lines)
petastorm/tools: 0% (0 lines)
petastorm/gcsfs_helpers: 0% (0 lines)
petastorm/pyarrow_helpers: 0% (0 lines)
petastorm/workers_pool: 0% (0 lines)
petastorm/hdfs: 0% (0 lines)
petastorm/spark: 0% (0 lines)
Longest Duplicates
The list of 13 longest duplicates.
Size  #    Folder                            File                             Lines
15    x 2  examples/spark_dataset_converter  pytorch_converter_example.py     142:164 (15%)
           examples/spark_dataset_converter  tensorflow_converter_example.py  94:116 (27%)
14    x 2  petastorm                         codecs.py                        136:151 (9%)
           petastorm                         codecs.py                        177:192 (9%)
11    x 2  petastorm                         arrow_reader_worker.py           140:154 (6%)
           petastorm                         py_dict_reader_worker.py         141:155 (8%)
10    x 2  petastorm/reader_impl             pytorch_shuffling_buffer.py      46:76 (8%)
           petastorm/reader_impl             shuffling_buffer.py              36:65 (14%)
10    x 2  petastorm                         reader.py                        62:71 (3%)
           petastorm                         reader.py                        187:196 (3%)
10    x 2  petastorm                         arrow_reader_worker.py           225:244 (5%)
           petastorm                         py_dict_reader_worker.py         189:209 (7%)
9     x 2  petastorm/benchmark               dummy_reader.py                  54:65 (19%)
           petastorm/benchmark               dummy_reader.py                  72:83 (19%)
9     x 2  petastorm/etl                     rowgroup_indexers.py             42:53 (12%)
           petastorm/etl                     rowgroup_indexers.py             96:107 (12%)
8     x 2  petastorm                         selectors.py                     59:71 (18%)
           petastorm                         selectors.py                     84:96 (18%)
7     x 2  petastorm/etl                     rowgroup_indexers.py             59:66 (10%)
           petastorm/etl                     rowgroup_indexers.py             113:120 (10%)
7     x 2  examples/spark_dataset_converter  pytorch_converter_example.py     114:127 (7%)
           examples/spark_dataset_converter  tensorflow_converter_example.py  61:74 (12%)
7     x 2  petastorm                         pytorch.py                       169:184 (4%)
           petastorm                         pytorch.py                       294:310 (4%)
6     x 2  petastorm/reader_impl             pytorch_shuffling_buffer.py      122:129 (4%)
           petastorm/reader_impl             shuffling_buffer.py              89:96 (8%)
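
As a consistency check, the 13 entries in the list above can be tied back to the overall total. The sketch assumes the Size column is the number of cleaned lines in each duplicated block and that "x 2" is the number of copies; the report does not state this explicitly, but the arithmetic agrees with the 246 duplicated lines given in the overall summary.

```python
# Size of each of the 13 duplicates, in cleaned lines, as listed above.
sizes = [15, 14, 11, 10, 10, 10, 9, 9, 8, 7, 7, 7, 6]
copies = 2  # every entry in the list is marked "x 2"

# Each duplicate contributes its size once per copy.
total = sum(size * copies for size in sizes)
print(total)  # prints 246
```

Because every duplicate appears exactly twice, removing one copy of each (for example by extracting a shared helper) would eliminate at most 123 cleaned lines.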