uber / gluten-fork
Duplication

Places in code with 6 or more lines that are exactly the same.
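To make the definition above concrete, here is a minimal sketch of one way to find blocks of 6 or more identical lines across files. It is not the report tool's actual algorithm. The function names `normalized_lines` and `find_duplicate_windows` are hypothetical, and the normalization (strip surrounding whitespace, skip blank lines) is an assumption, since the report does not say how lines are compared.

```python
from collections import defaultdict

MIN_LINES = 6  # threshold stated in the report


def normalized_lines(text):
    """Return (original_line_number, stripped_line) pairs, skipping blank lines.

    Assumption: whitespace-only differences and blank lines are ignored.
    """
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if line.strip()
    ]


def find_duplicate_windows(files, min_lines=MIN_LINES):
    """Map each window of `min_lines` consecutive lines to every place it occurs.

    `files` maps a file name to its text. Only windows that occur at
    least twice are returned; each location is (file name, first line number).
    """
    seen = defaultdict(list)
    for name, text in files.items():
        lines = normalized_lines(text)
        for start in range(len(lines) - min_lines + 1):
            window = tuple(line for _, line in lines[start:start + min_lines])
            seen[window].append((name, lines[start][0]))
    return {window: places for window, places in seen.items() if len(places) > 1}


if __name__ == "__main__":
    shared = "\n".join(f"step_{i}()" for i in range(6))
    files = {
        "a.py": "import os\n" + shared + "\nprint('a')\n",
        "b.py": shared + "\nprint('b')\n",
    }
    for places in find_duplicate_windows(files).values():
        print(places)
```

A fixed-size sliding window keeps the sketch short. A real tool would typically merge overlapping windows into one longer duplicate, which is how a single entry in the tables below can span hundreds of lines.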

Duplication Overall
system: 20% (20,052 lines)
Duplication per Extension
scala: 29% (10,175 lines)
proto: 99% (3,894 lines)
java: 16% (1,472 lines)
cc: 7% (1,345 lines)
sql: 23% (1,209 lines)
cpp: 6% (1,016 lines)
h: 4% (374 lines)
yaml: 50% (330 lines)
cmake: 14% (209 lines)
py: 7% (28 lines)
Duplication per Component (primary)
shims: 70% (4,819 lines)
cpp-ch: 12% (3,528 lines)
gluten-core: 21% (3,512 lines)
backends-clickhouse: 32% (2,928 lines)
tools: 20% (1,632 lines)
cpp: 7% (1,377 lines)
gluten-data: 23% (1,028 lines)
backends-velox: 19% (728 lines)
gluten-celeborn: 42% (338 lines)
substrait: 8% (111 lines)
gluten-delta: 18% (23 lines)
gluten-iceberg: 6% (14 lines)
dev: 1% (14 lines)
gluten-ui: 0% (0 lines)

Duplication Between Components (50+ lines)

Duplicated lines shared by each pair of components:
cpp-ch -- gluten-core: 3894
backends-clickhouse -- shims: 1491
backends-clickhouse -- backends-velox: 858
backends-clickhouse -- gluten-core: 566
backends-velox -- gluten-core: 266
backends-clickhouse -- gluten-data: 254
gluten-celeborn -- gluten-data: 215
backends-clickhouse -- gluten-celeborn: 209
backends-velox -- shims: 82
gluten-core -- shims: 74
gluten-delta -- shims: 56

Longest Duplicates
The 50 longest duplicates, out of 1,453 in total.
Size | # | Folders | Lines
1175 | 2 | cpp-ch/local-engine/proto/substrait; gluten-core/src/main/res...bstrait/proto/substrait | 1:1388 (100%); 1:1388 (100%)
202 | 2 | cpp-ch/local-engine/proto/substrait; gluten-core/src/main/res...bstrait/proto/substrait | 1:241 (100%); 1:241 (100%)
193 | 2 | shims/spark32/src/main/s...l/execution/datasources; shims/spark33/src/main/s...l/execution/datasources | 308:623 (53%); 310:625 (52%)
193 | 2 | shims/spark32/src/main/s...park/sql/execution/stat; shims/spark33/src/main/s...park/sql/execution/stat | 41:354 (100%); 41:362 (100%)
134 | 2 | cpp-ch/local-engine/proto/substrait; gluten-core/src/main/res...bstrait/proto/substrait | 1:160 (100%); 1:160 (100%)
121 | 2 | shims/spark33/src/main/j...ql/execution/vectorized; shims/spark34/src/main/j...ql/execution/vectorized | 28:221 (100%); 28:221 (100%)
115 | 2 | cpp-ch/local-engine/proto/substrait; gluten-core/src/main/res...bstrait/proto/substrait | 1:148 (100%); 1:148 (100%)
106 | 2 | cpp-ch/local-engine/proto/substrait; gluten-core/src/main/res...bstrait/proto/substrait | 1:128 (100%); 1:128 (100%)
105 | 2 | shims/spark32/src/main/s...ion/datasources/parquet; shims/spark33/src/main/s...ion/datasources/parquet | 287:441 (29%); 279:433 (29%)
99 | 2 | shims/spark32/src/main/s...ion/datasources/parquet; shims/spark33/src/main/s...ion/datasources/parquet | 463:623 (27%); 449:609 (27%)
98 | 2 | shims/spark32/src/main/s...l/execution/datasources; shims/spark33/src/main/s...l/execution/datasources | 140:286 (57%); 142:288 (57%)
93 | 2 | shims/spark32/src/main/s...l/execution/datasources; shims/spark33/src/main/s...l/execution/datasources | 51:209 (25%); 44:202 (25%)
92 | 2 | backends-clickhouse/src/...tasources/v1/clickhouse; shims/spark33/src/main/s...l/execution/datasources | 530:662 (23%); 493:625 (25%)
92 | 2 | backends-clickhouse/src/...tasources/v1/clickhouse; shims/spark32/src/main/s...l/execution/datasources | 530:662 (23%); 491:623 (25%)
81 | 2 | shims/spark32/src/main/s...l/execution/datasources; shims/spark33/src/main/s...l/execution/datasources | 280:395 (29%); 301:415 (27%)
79 | 2 | shims/spark32/src/main/s...park/sql/hive/execution; shims/spark33/src/main/s...park/sql/hive/execution | 56:163 (67%); 54:160 (61%)
77 | 2 | shims/spark32/src/main/s...l/execution/datasources; shims/spark33/src/main/s...l/execution/datasources | 47:180 (59%); 39:172 (59%)
73 | 2 | shims/spark32/src/main/s...l/execution/datasources; shims/spark33/src/main/s...l/execution/datasources | 60:165 (26%); 52:157 (25%)
69 | 2 | backends-clickhouse/src/.../v1/clickhouse/commands; backends-clickhouse/src/.../v1/clickhouse/commands | 194:281 (30%); 265:352 (24%)
68 | 2 | cpp-ch/local-engine/proto/substrait; gluten-core/src/main/res...bstrait/proto/substrait | 1:82 (100%); 1:82 (100%)
64 | 2 | cpp-ch/local-engine/proto/substrait/extensions; gluten-core/src/main/res...to/substrait/extensions | 1:81 (100%); 1:81 (100%)
63 | 2 | shims/spark32/src/main/j...ql/execution/vectorized; shims/spark34/src/main/j...ql/execution/vectorized | 63:157 (54%); 70:164 (52%)
63 | 2 | shims/spark32/src/main/j...ql/execution/vectorized; shims/spark33/src/main/j...ql/execution/vectorized | 63:157 (54%); 70:164 (52%)
63 | 2 | backends-clickhouse/src/...es/v2/clickhouse/source; backends-clickhouse/src/...es/v2/clickhouse/source | 32:108 (96%); 31:107 (98%)
58 | 2 | backends-clickhouse/src/...e/spark/sql/delta/files; backends-clickhouse/src/...e/spark/sql/delta/files | 37:131 (53%); 37:136 (53%)
56 | 2 | backends-clickhouse/src/.../v1/clickhouse/commands; backends-clickhouse/src/.../v1/clickhouse/commands | 107:192 (24%); 178:263 (19%)
55 | 2 | shims/spark32/src/main/s...che/spark/sql/execution; shims/spark33/src/main/s...che/spark/sql/execution | 74:146 (64%); 75:147 (64%)
55 | 2 | shims/spark32/src/main/s...ecution/datasources/orc; shims/spark33/src/main/s...ecution/datasources/orc | 227:300 (28%); 191:264 (35%)
54 | 2 | shims/spark32/src/main/s...ion/datasources/parquet; shims/spark33/src/main/s...ion/datasources/parquet | 66:154 (15%); 58:146 (15%)
54 | 2 | gluten-celeborn/velox/sr...rg/apache/spark/shuffle; gluten-data/src/main/sca...lutenproject/vectorized | 175:263 (37%); 156:250 (39%)
52 | 2 | shims/spark32/src/main/s...l/execution/datasources; shims/spark33/src/main/s...l/execution/datasources | 67:135 (30%); 69:137 (30%)
48 | 2 | gluten-data/src/main/jav...enproject/columnarbatch; gluten-data/src/main/jav...enproject/columnarbatch | 80:157 (61%); 38:115 (88%)
48 | 2 | shims/spark32/src/main/s...l/execution/datasources; shims/spark33/src/main/s...l/execution/datasources | 397:462 (17%); 417:482 (16%)
45 | 2 | shims/spark32/src/main/s...che/spark/sql/execution; shims/spark33/src/main/s...che/spark/sql/execution | 26:91 (100%); 26:91 (100%)
43 | 2 | backends-clickhouse/src/...tasources/v1/clickhouse; shims/spark33/src/main/s...l/execution/datasources | 224:308 (10%); 202:286 (11%)
43 | 2 | gluten-data/src/main/sca...o/glutenproject/metrics; gluten-data/src/main/sca...o/glutenproject/metrics | 27:76 (97%); 23:71 (97%)
41 | 2 | cpp-ch/local-engine/proto/substrait; gluten-core/src/main/res...bstrait/proto/substrait | 1:51 (100%); 1:51 (100%)
40 | 2 | backends-clickhouse/src/...e/spark/sql/delta/files; backends-clickhouse/src/...e/spark/sql/delta/files | 159:230 (36%); 160:232 (37%)
38 | 2 | shims/spark32/src/main/s...park/sql/hive/execution; shims/spark33/src/main/s...park/sql/hive/execution | 170:223 (32%); 182:235 (29%)
37 | 2 | gluten-core/src/main/res...es/substrait/extensions; gluten-core/src/main/res...es/substrait/extensions | 45:81 (13%); 84:120 (13%)
37 | 2 | gluten-core/src/main/res...es/substrait/extensions; gluten-core/src/main/res...es/substrait/extensions | 6:42 (13%); 84:120 (13%)
37 | 2 | gluten-core/src/main/res...es/substrait/extensions; gluten-core/src/main/res...es/substrait/extensions | 6:42 (13%); 45:81 (13%)
36 | 2 | backends-velox/src/main/...oject/backendsapi/velox; backends-velox/src/main/...oject/backendsapi/velox | 90:125 (7%); 131:166 (7%)
36 | 2 | shims/spark32/src/main/s...ion/datasources/parquet; shims/spark33/src/main/s...ion/datasources/parquet | 157:216 (10%); 153:212 (10%)
36 | 2 | shims/spark32/src/main/s...ion/datasources/parquet; shims/spark33/src/main/s...ion/datasources/parquet | 223:263 (10%); 218:258 (10%)
35 | 2 | shims/spark32/src/main/s...l/execution/datasources; shims/spark33/src/main/s...l/execution/datasources | 213:258 (26%); 205:250 (26%)
35 | 2 | tools/gluten-it/common/s...ject/integration/tpc/ds; tools/gluten-it/common/s...oject/integration/tpc/h | 28:63 (22%); 28:63 (46%)
34 | 2 | shims/spark32/src/main/s...ecution/datasources/orc; shims/spark33/src/main/s...ecution/datasources/orc | 106:154 (17%); 85:133 (21%)
33 | 2 | backends-clickhouse/src/...tasources/v1/clickhouse; shims/spark33/src/main/s...l/execution/datasources | 398:442 (8%); 369:413 (8%)
33 | 2 | backends-clickhouse/src/...lutenproject/vectorized; gluten-celeborn/clickhou...rg/apache/spark/shuffle | 130:185 (35%); 191:246 (25%)
Duplicated Units
The top 50 duplicated units, out of 73 unit duplicates in total.
Size | # | Folders | Lines
78 | 2 | shims/spark33/src/main/s...park/sql/hive/execution; shims/spark32/src/main/s...park/sql/hive/execution | 70:164; 72:167
71 | 2 | shims/spark33/src/main/s...park/sql/execution/stat; shims/spark32/src/main/s...park/sql/execution/stat | 276:363; 269:355
48 | 2 | shims/spark33/src/main/s...ion/datasources/parquet; shims/spark32/src/main/s...ion/datasources/parquet | 450:507; 464:521
40 | 2 | shims/spark33/src/main/s...park/sql/execution/stat; shims/spark32/src/main/s...park/sql/execution/stat | 75:123; 77:124
39 | 2 | shims/spark33/src/main/s...l/execution/datasources; shims/spark32/src/main/s...l/execution/datasources | 81:140; 89:148
39 | 2 | shims/spark33/src/main/s...park/sql/execution/stat; shims/spark32/src/main/s...park/sql/execution/stat | 219:273; 216:266
35 | 2 | shims/spark33/src/main/s...l/execution/datasources; shims/spark32/src/main/s...l/execution/datasources | 83:131; 91:139
35 | 3 | shims/spark33/src/main/s...l/execution/datasources; shims/spark32/src/main/s...l/execution/datasources; backends-clickhouse/src/...tasources/v1/clickhouse | 577:614; 575:612; 614:651
34 | 2 | backends-clickhouse/src/...es/v2/clickhouse/source; backends-clickhouse/src/...es/v2/clickhouse/source | 70:110; 71:111
33 | 3 | shims/spark33/src/main/s...l/execution/datasources; shims/spark32/src/main/s...l/execution/datasources; backends-clickhouse/src/...tasources/v1/clickhouse | 501:544; 499:542; 538:581
32 | 2 | backends-clickhouse/src/.../io/glutenproject/utils; gluten-core/src/main/scala/io/glutenproject/utils | 90:125; 77:112
26 | 2 | gluten-celeborn/velox/sr...rg/apache/spark/shuffle; gluten-data/src/main/sca...lutenproject/vectorized | 187:215; 168:196
25 | 2 | shims/spark33/src/main/s...che/spark/sql/execution; shims/spark32/src/main/s...che/spark/sql/execution | 117:148; 116:147
24 | 2 | shims/spark33/src/main/s...l/execution/datasources; shims/spark32/src/main/s...l/execution/datasources | 231:258; 229:256
22 | 2 | shims/spark33/src/main/s...ion/datasources/parquet; shims/spark32/src/main/s...ion/datasources/parquet | 514:540; 528:554
22 | 2 | backends-clickhouse/src/...e/spark/sql/delta/files; backends-clickhouse/src/...e/spark/sql/delta/files | 112:137; 107:132
21 | 2 | shims/spark33/src/main/s...che/spark/sql/execution; shims/spark32/src/main/s...che/spark/sql/execution | 68:92; 68:92
21 | 2 | gluten-data/src/main/sca...o/glutenproject/metrics; gluten-data/src/main/sca...o/glutenproject/metrics | 50:73; 55:78
20 | 2 | shims/spark33/src/main/s...l/execution/datasources; shims/spark32/src/main/s...l/execution/datasources | 409:431; 407:429
20 | 2 | cpp/velox/benchmarks; cpp/velox/benchmarks | 54:83; 60:89
19 | 2 | shims/spark33/src/main/s...l/execution/datasources; shims/spark32/src/main/s...l/execution/datasources | 461:484; 441:464
18 | 3 | shims/spark33/src/main/s...l/execution/datasources; shims/spark32/src/main/s...l/execution/datasources; backends-clickhouse/src/...tasources/v1/clickhouse | 385:407; 383:405; 414:436
17 | 2 | shims/spark34/src/main/s...oject/sql/shims/spark34; shims/spark33/src/main/s...oject/sql/shims/spark33 | 89:107; 88:106
17 | 2 | shims/spark33/src/main/s...l/execution/datasources; shims/spark32/src/main/s...l/execution/datasources | 267:285; 265:283
17 | 3 | shims/spark33/src/main/s...l/execution/datasources; shims/spark32/src/main/s...l/execution/datasources; backends-clickhouse/src/...tasources/v1/clickhouse | 547:566; 545:564; 584:603
17 | 2 | shims/spark33/src/main/s...ion/datasources/parquet; shims/spark32/src/main/s...ion/datasources/parquet | 590:610; 604:624
17 | 2 | cpp-ch/local-engine/Storages; cpp-ch/local-engine/Storages/ch_parquet | 36:57; 99:120
16 | 2 | backends-clickhouse/src/...es/v2/clickhouse/source; backends-clickhouse/src/...es/v2/clickhouse/source | 39:56; 40:57
15 | 2 | backends-velox/src/main/...glutenproject/execution; backends-clickhouse/src/...glutenproject/execution | 29:49; 29:48
15 | 2 | backends-clickhouse/src/.../v1/clickhouse/commands; backends-clickhouse/src/.../v1/clickhouse/commands | 73:91; 71:89
14 | 2 | shims/spark33/src/main/s...l/execution/datasources; shims/spark32/src/main/s...l/execution/datasources | 64:79; 72:87
14 | 2 | shims/spark33/src/main/s...che/spark/sql/execution; shims/spark32/src/main/s...che/spark/sql/execution | 90:105; 89:104
14 | 2 | backends-clickhouse/src/.../v1/clickhouse/commands; backends-clickhouse/src/.../v1/clickhouse/commands | 381:400; 302:321
13 | 2 | shims/spark33/src/main/s...ion/datasources/parquet; shims/spark32/src/main/s...ion/datasources/parquet | 556:574; 570:588
13 | 2 | backends-clickhouse/src/.../v1/clickhouse/commands; backends-clickhouse/src/.../v1/clickhouse/commands | 419:433; 339:353
12 | 2 | shims/spark34/src/main/s...che/spark/sql/execution; shims/spark33/src/main/s...che/spark/sql/execution | 36:49; 36:49
12 | 2 | shims/spark33/src/main/s...l/execution/datasources; shims/spark32/src/main/s...l/execution/datasources | 486:499; 484:497
12 | 2 | shims/spark33/src/main/s...park/sql/execution/stat; shims/spark32/src/main/s...park/sql/execution/stat | 155:168; 154:167
12 | 2 | backends-clickhouse/src/...e/spark/sql/delta/files; backends-clickhouse/src/...e/spark/sql/delta/files | 198:213; 197:212
11 | 2 | shims/spark33/src/main/s...ion/datasources/parquet; shims/spark32/src/main/s...ion/datasources/parquet | 218:230; 223:235
11 | 2 | cpp-ch/local-engine/Operator; cpp-ch/local-engine/Operator | 261:273; 36:48
10 | 2 | shims/spark33/src/main/s...l/execution/datasources; shims/spark32/src/main/s...l/execution/datasources | 63:76; 63:76
10 | 2 | shims/spark33/src/main/s...l/execution/datasources; shims/spark32/src/main/s...l/execution/datasources | 152:166; 159:173
10 | 3 | shims/spark33/src/main/s...l/execution/datasources; shims/spark32/src/main/s...l/execution/datasources; backends-clickhouse/src/...tasources/v1/clickhouse | 174:187; 181:194; 196:209
10 | 3 | cpp-ch/local-engine/Operator; cpp-ch/local-engine/Operator; cpp-ch/local-engine/Operator | 37:48; 34:45; 90:101
10 | 2 | backends-clickhouse/src/...e/spark/sql/delta/files; backends-clickhouse/src/...e/spark/sql/delta/files | 72:85; 67:80
10 | 2 | backends-clickhouse/src/...e/spark/sql/delta/files; backends-clickhouse/src/...e/spark/sql/delta/files | 95:110; 90:105
10 | 2 | backends-clickhouse/src/.../v1/clickhouse/commands; backends-clickhouse/src/.../v1/clickhouse/commands | 96:107; 94:105
10 | 2 | cpp/velox/benchmarks; cpp/core/benchmarks | 39:50; 41:52
10 | 6 | gluten-data/src/main/sca...o/glutenproject/metrics (all 6 copies) | 23:34; 23:34; 23:34; 23:34; 23:34; 23:34