Petastorm
Duplication

Intro

Duplication is measured by finding places in the code where six or more lines are exactly the same.
Before duplication is calculated, the code is cleaned to remove empty lines, comments, and frequently duplicated constructs such as imports.
Aim to keep duplicated code to a minimum (below 5%), because a high level of duplication can lead to maintenance difficulties, poor factoring, and logical contradictions.
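
The procedure described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the analyzer's actual implementation: the function names `clean_lines` and `find_duplicates` are hypothetical, the cleaning rules are a rough approximation for Python sources, and the sketch reports fixed-size windows of six lines rather than merging overlapping windows into maximal duplicates as a real tool likely would.

```python
from collections import defaultdict

MIN_BLOCK = 6  # threshold stated in the report: six or more identical lines


def clean_lines(source):
    """Return (original_line_number, stripped_text) pairs.

    Assumption: "cleaning" means dropping blank lines, '#' comments and
    import statements; the real analyzer may apply different rules.
    """
    cleaned = []
    for number, raw in enumerate(source.splitlines(), start=1):
        text = raw.strip()
        if not text or text.startswith("#"):
            continue
        if text.startswith(("import ", "from ")):
            continue
        cleaned.append((number, text))
    return cleaned


def find_duplicates(files, min_block=MIN_BLOCK):
    """Map each repeated window of min_block cleaned lines to its locations.

    files: dict of file name -> source text.
    Returns: dict of window (tuple of lines) -> list of
             (file name, first original line, last original line).
    """
    windows = defaultdict(list)
    for name, source in files.items():
        cleaned = clean_lines(source)
        for i in range(len(cleaned) - min_block + 1):
            block = cleaned[i:i + min_block]
            key = tuple(text for _, text in block)
            windows[key].append((name, block[0][0], block[-1][0]))
    # Keep only windows that occur in more than one place.
    return {key: places for key, places in windows.items() if len(places) > 1}
```

Calling `find_duplicates({"a.py": text_a, "b.py": text_b})` returns the shared six-line windows together with the original line ranges in each file, which is the same kind of information the table of longest duplicates reports.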

Duplication Overall

5% duplication:

4,696 lines of cleaned code (without empty lines, comments, and frequently duplicated constructs such as imports)
246 duplicated lines

13 duplicates

Duplication per Extension

Duplication per Component (primary)

Longest Duplicates

The list of 13 longest duplicates.

Size	#	Folders	Files	Lines
15	x 2	examples/spark_dataset_converter examples/spark_dataset_converter	pytorch_converter_example.py tensorflow_converter_example.py	142:164 (15%) 94:116 (27%)
14	x 2	petastorm petastorm	codecs.py codecs.py	136:151 (9%) 177:192 (9%)
11	x 2	petastorm petastorm	arrow_reader_worker.py py_dict_reader_worker.py	140:154 (6%) 141:155 (8%)
10	x 2	petastorm/reader_impl petastorm/reader_impl	pytorch_shuffling_buffer.py shuffling_buffer.py	46:76 (8%) 36:65 (14%)
10	x 2	petastorm petastorm	reader.py reader.py	62:71 (3%) 187:196 (3%)
10	x 2	petastorm petastorm	arrow_reader_worker.py py_dict_reader_worker.py	225:244 (5%) 189:209 (7%)
9	x 2	petastorm/benchmark petastorm/benchmark	dummy_reader.py dummy_reader.py	54:65 (19%) 72:83 (19%)
9	x 2	petastorm/etl petastorm/etl	rowgroup_indexers.py rowgroup_indexers.py	42:53 (12%) 96:107 (12%)
8	x 2	petastorm petastorm	selectors.py selectors.py	59:71 (18%) 84:96 (18%)
7	x 2	petastorm/etl petastorm/etl	rowgroup_indexers.py rowgroup_indexers.py	59:66 (10%) 113:120 (10%)
7	x 2	examples/spark_dataset_converter examples/spark_dataset_converter	pytorch_converter_example.py tensorflow_converter_example.py	114:127 (7%) 61:74 (12%)
7	x 2	petastorm petastorm	pytorch.py pytorch.py	169:184 (4%) 294:310 (4%)
6	x 2	petastorm/reader_impl petastorm/reader_impl	pytorch_shuffling_buffer.py shuffling_buffer.py	122:129 (4%) 89:96 (8%)
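
The overall figures can be cross-checked against the table above. The short sketch below assumes that the reported number of duplicated lines is the sum of size times copy count over all rows (the Size and # columns); the variable names are illustrative only.

```python
# Size and number of copies for each of the 13 rows in the table above.
sizes = [15, 14, 11, 10, 10, 10, 9, 9, 8, 7, 7, 7, 6]
copies = 2  # every row is marked "x 2"

cleaned_lines = 4696      # lines of cleaned code, from Duplication Overall
duplicated_lines = sum(size * copies for size in sizes)

# Share of cleaned code that is duplicated, as a percentage.
duplication_pct = 100 * duplicated_lines / cleaned_lines

print(len(sizes))                 # 13
print(duplicated_lines)           # 246
print(f"{duplication_pct:.2f}%")  # 5.24%
```

Under that assumption the table accounts for all 246 duplicated lines, and the unrounded share is about 5.24%, which the report shows as 5% and which sits just above the suggested limit of less than 5%.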