uber / gluten-fork
File Size

The distribution of file sizes, measured in lines of code.

File Size Overall
Size (lines of code) | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
All files | 16% | 7% | 28% | 18% | 29%
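
A minimal Python sketch of how files could be assigned to the five size buckets above and how a per-bucket percentage could be computed. It assumes each percentage is the share of total lines of code rather than the share of files. The report does not say this explicitly, but the orc row, where the single 1151-line benchmark file puts 93% of orc code in the 1001+ bucket, fits that reading. `BUCKETS`, `bucket_for` and `size_distribution` are hypothetical names, not part of the analysis tool's API.

```python
from collections import Counter

# (lowest LOC, highest LOC or None if open-ended, label), smallest bucket first.
BUCKETS = [
    (1, 100, "1-100"),
    (101, 200, "101-200"),
    (201, 500, "201-500"),
    (501, 1000, "501-1000"),
    (1001, None, "1001+"),
]


def bucket_for(loc: int) -> str:
    """Return the size bucket label for a file with `loc` lines of code."""
    for low, high, label in BUCKETS:
        if loc >= low and (high is None or loc <= high):
            return label
    raise ValueError(f"expected at least 1 line of code, got {loc}")


def size_distribution(file_locs: dict[str, int]) -> dict[str, float]:
    """Percentage of all lines of code that falls in each bucket.

    Assumption: buckets are weighted by lines of code, not by file count.
    """
    per_bucket = Counter()
    for loc in file_locs.values():
        per_bucket[bucket_for(loc)] += loc  # weight each file by its size
    total = sum(per_bucket.values())
    # Largest bucket first, matching the column order used in the report.
    labels = [label for _, _, label in reversed(BUCKETS)]
    if total == 0:
        return {label: 0.0 for label in labels}
    return {label: 100.0 * per_bucket[label] / total for label in labels}
```

For example, `size_distribution({"encoding.cc": 2718, "CHUtil.cpp": 694, "Print.h": 60})`, using line counts from the tables below, puts roughly 78% of the code in 1001+, 20% in 501-1000 and 2% in 1-100, even though each bucket holds exactly one file. This is why a few very long files can dominate a row.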


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
cc | 51% | 10% | 22% | 7% | 7%
cpp | 14% | 14% | 30% | 21% | 18%
proto | 60% | 0% | 10% | 18% | 11%
java | 12% | 0% | 15% | 19% | 52%
scala | 2% | 7% | 39% | 24% | 26%
orc | 93% | 0% | 0% | 0% | 6%
h | 0% | 4% | 25% | 11% | 58%
yaml | 0% | 0% | 41% | 15% | 42%
xml | 0% | 0% | 88% | 0% | 11%
sql | 0% | 0% | 4% | 17% | 78%
py | 0% | 0% | 48% | 0% | 51%
cmake | 0% | 0% | 0% | 18% | 81%
File Size per Logical Decomposition (decomposition: primary)
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
cpp-ch | 25% | 11% | 24% | 16% | 22%
cpp | 30% | 7% | 26% | 9% | 25%
gluten-data | 25% | 0% | 19% | 24% | 30%
shims | 15% | 0% | 48% | 18% | 18%
gluten-core | 5% | 9% | 26% | 23% | 35%
tools | 0% | 7% | 8% | 22% | 60%
backends-velox | 0% | 11% | 40% | 25% | 23%
backends-clickhouse | 0% | 0% | 46% | 18% | 35%
substrait | 0% | 0% | 29% | 30% | 40%
dev | 0% | 0% | 40% | 0% | 59%
gluten-celeborn | 0% | 0% | 25% | 56% | 17%
gluten-ui | 0% | 0% | 56% | 0% | 43%
gluten-iceberg | 0% | 0% | 0% | 46% | 53%
gluten-delta | 0% | 0% | 0% | 64% | 35%
Longest Files (Top 50)
File | Folder | # lines | # units
encoding.cc | cpp-ch/local-engine/Storages/ch_parquet/arrow | 2718 | 184
SerializedPlanParser.cpp | cpp-ch/local-engine/Parser | 1971 | 46
SubstraitToVeloxPlan.cc | cpp/velox/substrait | 1957 | 65
column_reader.cc | cpp-ch/local-engine/Storages/ch_parquet/arrow | 1672 | 86
ArrowWritableColumnVector.java | gluten-data/src/main/java/io/glutenproject/vectorized | 1523 | 214
GlutenConfig.scala | shims/common/src/main/scala/io/glutenproject | 1262 | 5
JniWrapper.cc | cpp/core/jni | 1205 | 58
algebra.proto | cpp-ch/local-engine/proto/substrait | 1175 | -
algebra.proto | gluten-core/src/main/resources/substrait/proto/substrait | 1175 | -
VeloxShuffleWriter.cc | cpp/velox/shuffle | 1163 | 50
part-00000-482d0e2c-7667-42bb-8750-64ecfe331d28-c000.snappy.orc | cpp/velox/benchmarks/data/bm_lineitem/orc | 1151 | -
local_engine_jni.cpp | cpp-ch/local-engine | 1135 | 72
reader.cc | cpp-ch/local-engine/Storages/ch_parquet/arrow | 1121 | 79
SubstraitToVeloxPlanValidator.cc | cpp/velox/substrait | 1086 | 27
CHColumnToSparkRow.cpp | cpp-ch/local-engine/Parser | 829 | 39
ColumnarOverrides.scala | gluten-core/src/main/scala/io/glutenproject/extension | 720 | 23
reader_internal.cc | cpp-ch/local-engine/Storages/ch_parquet/arrow | 712 | 23
TransformHintRule.scala | gluten-core/src/main/scala/io/glutenproject/extension/columnar | 709 | 28
CHUtil.cpp | cpp-ch/local-engine/Common | 694 | 32
TpcdsDataGen.scala | tools/gluten-it/common/src/main/scala/io/glutenproject/integration/tpc/ds | 648 | 3
ReadBufferBuilder.cpp | cpp-ch/local-engine/Storages/SubstraitSource | 596 | 19
OptimizedArrowColumnToCHColumn.cpp | cpp-ch/local-engine/Storages/ch_parquet | 590 | 17
ExpressionConverter.scala | gluten-core/src/main/scala/io/glutenproject/expression | 568 | 8
SparkParseURL.cpp | cpp-ch/local-engine/Functions | 555 | 34
VeloxToSubstraitExpr.cc | cpp/velox/substrait | 522 | 3
HashAggregateExecTransformer.scala | backends-velox/src/main/scala/io/glutenproject/execution | 521 | 13
SubstraitToVeloxExpr.cc | cpp/velox/substrait | 521 | 26
SparkFunctionHashingExtended.h | cpp-ch/local-engine/Functions | 517 | 27
VeloxShuffleReader.cc | cpp/velox/shuffle | 508 | 28
GraceMergingAggregatedStep.cpp | cpp-ch/local-engine/Operator | 497 | 19
MetricsApiImpl.scala | backends-velox/src/main/scala/io/glutenproject/backendsapi/velox | 494 | 4
MergeTreeFileFormatDataWriter.scala | backends-clickhouse/src/main/scala/org/apache/spark/sql/execution/datasources/v1/clickhouse | 487 | 25
LocalPartitionWriter.cc | cpp/core/shuffle | 481 | 23
HashAggregateExecBaseTransformer.scala | gluten-core/src/main/scala/io/glutenproject/execution | 459 | 16
Payload.cc | cpp/core/shuffle | 458 | 11
FileFormatDataWriter.scala | shims/spark33/src/main/scala/org/apache/spark/sql/execution/datasources | 451 | 24
CHSparkPlanExecApi.scala | backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse | 447 | 24
FileFormatDataWriter.scala | shims/spark32/src/main/scala/org/apache/spark/sql/execution/datasources | 443 | 24
ParquetFileFormat.scala | shims/spark32/src/main/scala/org/apache/spark/sql/execution/datasources/parquet | 442 | 13
ParquetFileFormat.scala | shims/spark33/src/main/scala/org/apache/spark/sql/execution/datasources/parquet | 439 | 12
ArrowColumnVector.java | gluten-data/src/main/java/io/glutenproject/vectorized | 437 | 50
WholeStageResultIterator.cc | cpp/velox/compute | 435 | 11
ExcelNumberReader.h | cpp-ch/local-engine/Storages/Serializations | 416 | 6
CHExpressionTransformer.scala | backends-clickhouse/src/main/scala/io/glutenproject/expression | 409 | 9
SparkPlanExecApiImpl.scala | backends-velox/src/main/scala/io/glutenproject/backendsapi/velox | 396 | 19
CompressionBenchmark.cc | cpp/core/benchmarks | 388 | 14
CHHashAggregateExecTransformer.scala | backends-clickhouse/src/main/scala/io/glutenproject/execution | 380 | 12
JniFileSystem.cc | cpp/velox/jni | 373 | 43
SelectorBuilder.cpp | cpp-ch/local-engine/Shuffle | 372 | 11
ShuffleSplitter.cpp | cpp-ch/local-engine/Shuffle | 368 | 23
Files With Most Units (Top 50)
File | Folder | # lines | # units
ArrowWritableColumnVector.java | gluten-data/src/main/java/io/glutenproject/vectorized | 1523 | 214
encoding.cc | cpp-ch/local-engine/Storages/ch_parquet/arrow | 2718 | 184
column_reader.cc | cpp-ch/local-engine/Storages/ch_parquet/arrow | 1672 | 86
reader.cc | cpp-ch/local-engine/Storages/ch_parquet/arrow | 1121 | 79
local_engine_jni.cpp | cpp-ch/local-engine | 1135 | 72
SubstraitToVeloxPlan.cc | cpp/velox/substrait | 1957 | 65
JniWrapper.cc | cpp/core/jni | 1205 | 58
WritableColumnVectorShim.java | shims/spark34/src/main/java/org/apache/spark/sql/execution/vectorized | 143 | 52
WritableColumnVectorShim.java | shims/spark33/src/main/java/org/apache/spark/sql/execution/vectorized | 143 | 52
WritableColumnVectorShim.java | shims/spark32/src/main/java/org/apache/spark/sql/execution/vectorized | 134 | 50
VeloxShuffleWriter.cc | cpp/velox/shuffle | 1163 | 50
ArrowColumnVector.java | gluten-data/src/main/java/io/glutenproject/vectorized | 437 | 50
SerializedPlanParser.cpp | cpp-ch/local-engine/Parser | 1971 | 46
JniFileSystem.cc | cpp/velox/jni | 373 | 43
ExpressionBuilder.java | gluten-core/src/main/java/io/glutenproject/substrait/expression | 210 | 41
CHColumnToSparkRow.cpp | cpp-ch/local-engine/Parser | 829 | 39
SparkParseURL.cpp | cpp-ch/local-engine/Functions | 555 | 34
CHUtil.cpp | cpp-ch/local-engine/Common | 694 | 32
TransformHintRule.scala | gluten-core/src/main/scala/io/glutenproject/extension/columnar | 709 | 28
TaskResources.scala | gluten-core/src/main/scala/org/apache/spark/util | 274 | 28
VeloxShuffleReader.cc | cpp/velox/shuffle | 508 | 28
QatCodec.cc | cpp/core/utils/qat | 239 | 28
SparkFunctionHashingExtended.h | cpp-ch/local-engine/Functions | 517 | 27
SubstraitToVeloxPlanValidator.cc | cpp/velox/substrait | 1086 | 27
RelBuilder.java | gluten-core/src/main/java/io/glutenproject/substrait/rel | 238 | 26
SubstraitToVeloxExpr.cc | cpp/velox/substrait | 521 | 26
MergeTreeFileFormatDataWriter.scala | backends-clickhouse/src/main/scala/org/apache/spark/sql/execution/datasources/v1/clickhouse | 487 | 25
SubstraitParser.cc | cpp/velox/substrait | 332 | 25
IndicatorVector.java | gluten-data/src/main/java/io/glutenproject/columnarbatch | 116 | 25
FileFormatDataWriter.scala | shims/spark33/src/main/scala/org/apache/spark/sql/execution/datasources | 451 | 24
FileFormatDataWriter.scala | shims/spark32/src/main/scala/org/apache/spark/sql/execution/datasources | 443 | 24
ToSubstraitRel.scala | substrait/substrait-spark/src/main/scala/io/substrait/spark/logical | 294 | 24
SparkRowToCHColumn.h | cpp-ch/local-engine/Parser | 310 | 24
MetricsData.java | backends-clickhouse/src/main/java/io/glutenproject/metrics | 91 | 24
CHSparkPlanExecApi.scala | backends-clickhouse/src/main/scala/io/glutenproject/backendsapi/clickhouse | 447 | 24
ConverterUtils.scala | gluten-core/src/main/scala/io/glutenproject/expression | 361 | 24
OnHeapFileSystem.java | backends-velox/src/main/java/io/glutenproject/fs | 231 | 23
SparkFunctionRoundHalfUp.h | cpp-ch/local-engine/Functions | 205 | 23
ShuffleSplitter.cpp | cpp-ch/local-engine/Shuffle | 368 | 23
reader_internal.cc | cpp-ch/local-engine/Storages/ch_parquet/arrow | 712 | 23
ColumnarOverrides.scala | gluten-core/src/main/scala/io/glutenproject/extension | 720 | 23
LocalPartitionWriter.cc | cpp/core/shuffle | 481 | 23
ColumnarBatches.java | gluten-data/src/main/java/io/glutenproject/columnarbatch | 299 | 23
Print.h | cpp/core/utils | 60 | 22
ColumnarBatch.cc | cpp/core/memory | 164 | 22
JniLibLoader.java | gluten-core/src/main/java/io/glutenproject/vectorized | 267 | 21
TypeBuilder.java | gluten-core/src/main/java/io/glutenproject/substrait/type | 62 | 20
RewriteDateTimestampComparisonRule.scala | gluten-core/src/main/scala/io/glutenproject/extension | 271 | 20
JniCommon.h | cpp/core/jni | 300 | 20
SparkPlanExecApiImpl.scala | backends-velox/src/main/scala/io/glutenproject/backendsapi/velox | 396 | 19
Files With Long Lines (Top 50)

There are 282 files with lines longer than 120 characters. In total, there are 1257 long lines.
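
A minimal Python sketch of how such long-line totals could be computed. It assumes a line is "long" when it has strictly more than 120 characters, newline excluded, and that every physical line counts, including comments and blank lines; the tool behind this report may count differently (for example, by expanding tabs or skipping generated files). `count_long_lines` and `long_line_summary` are hypothetical helper names, and the set of file suffixes to scan is left to the caller.

```python
from pathlib import Path

# Threshold stated in the report: lines longer than 120 characters are "long".
LONG_LINE_THRESHOLD = 120


def count_long_lines(text: str, threshold: int = LONG_LINE_THRESHOLD) -> int:
    """Count physical lines with more than `threshold` characters (newline excluded)."""
    return sum(1 for line in text.splitlines() if len(line) > threshold)


def long_line_summary(root: Path, suffixes: set[str]) -> tuple[int, int]:
    """Return (files with at least one long line, total long lines) under `root`.

    Only files whose suffix is in `suffixes` (e.g. {".cc", ".scala"}) are scanned.
    """
    files_with_long_lines = 0
    total_long_lines = 0
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Decode leniently so binary-ish files (such as .orc data) do not abort the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        n = count_long_lines(text)
        if n > 0:
            files_with_long_lines += 1
            total_long_lines += n
    return files_with_long_lines, total_long_lines
```

Calling `long_line_summary(Path("."), {".cc", ".cpp", ".h", ".java", ".scala"})` from a repository root returns a pair analogous to the two totals above, restricted to the chosen extensions.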

File | Folder | # lines | # units | # long lines
part-00000-482d0e2c-7667-42bb-8750-64ecfe331d28-c000.snappy.orc | cpp/velox/benchmarks/data/bm_lineitem/orc | 1151 | - | 161
local_engine_jni.cpp | cpp-ch/local-engine | 1135 | 72 | 74
SerializedPlanParser.cpp | cpp-ch/local-engine/Parser | 1971 | 46 | 67
OptimizedArrowColumnToCHColumn.cpp | cpp-ch/local-engine/Storages/ch_parquet | 590 | 17 | 41
ReadBufferBuilder.cpp | cpp-ch/local-engine/Storages/SubstraitSource | 596 | 19 | 25
SparkParseURL.cpp | cpp-ch/local-engine/Functions | 555 | 34 | 24
AggregateRelParser.cpp | cpp-ch/local-engine/Parser | 341 | 8 | 19
GraceMergingAggregatedStep.cpp | cpp-ch/local-engine/Operator | 497 | 19 | 19
CHColumnToSparkRow.cpp | cpp-ch/local-engine/Parser | 829 | 39 | 17
PartitionWriter.cpp | cpp-ch/local-engine/Shuffle | 344 | 13 | 14
SerializedPlanParser.h | cpp-ch/local-engine/Parser | 368 | 9 | 14
JoinRelParser.cpp | cpp-ch/local-engine/Parser | 351 | 10 | 14
TpcMixin.java | tools/gluten-it/common/src/main/java/io/glutenproject/integration/tpc | 126 | 3 | 14
SparkFunctionTupleElement.cpp | cpp-ch/local-engine/Functions | 185 | 10 | 13
SparkFunctionHashingExtended.h | cpp-ch/local-engine/Functions | 517 | 27 | 13
CHUtil.cpp | cpp-ch/local-engine/Common | 694 | 32 | 13
WindowRelParser.cpp | cpp-ch/local-engine/Parser | 328 | 9 | 11
lineitem.orc | cpp/velox/benchmarks/data/bm_lineitem/orc | 62 | - | 11
SparkFunctionMakeDecimal.cpp | cpp-ch/local-engine/Functions | 187 | 9 | 10
SparkFunctionGetJsonObject.h | cpp-ch/local-engine/Functions | 322 | 13 | 10
StreamingAggregatingStep.cpp | cpp-ch/local-engine/Operator | 270 | 9 | 10
SubstraitToVeloxPlanValidator.cc | cpp/velox/substrait | 1086 | 27 | 10
SparkFunctionPositionUTF8.cpp | cpp-ch/local-engine/Functions | 221 | 8 | 9
SparkRowToCHColumn.cpp | cpp-ch/local-engine/Parser | 356 | 12 | 9
ExcelTextFormatFile.cpp | cpp-ch/local-engine/Storages/SubstraitSource | 319 | 13 | 9
ShuffleSplitter.cpp | cpp-ch/local-engine/Shuffle | 368 | 23 | 8
slice.cpp | cpp-ch/local-engine/Parser/scalar_function_parser | 76 | 5 | 8
AggregateFunctionParser.cpp | cpp-ch/local-engine/Parser | 156 | 5 | 8
AggregateFunctionParser.h | cpp-ch/local-engine/Parser | 137 | 7 | 8
ExcelStringReader.cpp | cpp-ch/local-engine/Storages/Serializations | 210 | 5 | 8
ExcelNumberReader.h | cpp-ch/local-engine/Storages/Serializations | 416 | 6 | 8
FunctionsBloomFilter.h | cpp-ch/local-engine/Functions | 179 | 9 | 7
SparkFunctionNextDay.cpp | cpp-ch/local-engine/Functions | 263 | 12 | 7
SparkFunctionCheckDecimalOverflow.cpp | cpp-ch/local-engine/Functions | 158 | 10 | 7
SelectorBuilder.cpp | cpp-ch/local-engine/Shuffle | 372 | 11 | 7
SparkRowToCHColumn.h | cpp-ch/local-engine/Parser | 310 | 24 | 7
sequence.cpp | cpp-ch/local-engine/Parser/scalar_function_parser | 67 | 2 | 7
ProjectRelParser.cpp | cpp-ch/local-engine/Parser | 152 | 2 | 7
GlutenSignalHandler.cpp | cpp-ch/local-engine/Common | 338 | 11 | 7
SparkFunctionRegexpExtractAll.cpp | cpp-ch/local-engine/Functions | 309 | 13 | 6
SparkFunctionTrim.cpp | cpp-ch/local-engine/Functions | 142 | 8 | 6
RelParser.h | cpp-ch/local-engine/Parser | 75 | 6 | 6
FunctionParser.h | cpp-ch/local-engine/Parser | 86 | 6 | 6
NativeReader.cpp | cpp-ch/local-engine/Storages/IO | 193 | 7 | 6
SubstraitFileSource.cpp | cpp-ch/local-engine/Storages/SubstraitSource | 326 | 9 | 6
ParquetFormatFile.cpp | cpp-ch/local-engine/Storages/SubstraitSource | 148 | 4 | 6
ExcelBoolReader.cpp | cpp-ch/local-engine/Storages/Serializations | 152 | 3 | 6
AggregateFunctionGroupBloomFilter.h | cpp-ch/local-engine/AggregateFunctions | 131 | 12 | 6
SparkRunModes.java | tools/gluten-it/common/src/main/java/io/glutenproject/integration/tpc/command | 108 | 7 | 6
SparkFunctionStrToMap.cpp | cpp-ch/local-engine/Functions | 134 | 9 | 5