backends-clickhouse/src/main/scala/org/apache/gluten/backendsapi/clickhouse/CHSparkPlanExecApi.scala (8 lines):
- line 226: // FIXME: The operation happens inside ReplaceSingleNode().
- line 228: // FIXME: HeuristicTransform is costly. Re-applying it may cause performance issues.
- line 252: // FIXME: The operation happens inside ReplaceSingleNode().
- line 254: // FIXME: HeuristicTransform is costly. Re-applying it may cause performance issues.
- line 522: // TODO: remove this after pushdowning preprojection
- line 842: // FIXME: DeltaMergeTreeFileFormat should not inherit from ParquetFileFormat.
- line 848: // TODO: datasource v2 ?
- line 849: // TODO: Push down conditions with scalar subquery
cpp-ch/local-engine/Common/GlutenSignalHandler.cpp (8 lines):
- line 236: std::string build_id; // TODO : Build ID
- line 238: std::string stored_binary_hash; // TODO: binary checksum
- line 312: /// TODO: Please keep the below log messages in-sync with the ones in ~programs/server/Server.cpp~
- line 342: /// FIXME: Write crash to system.crash_log table if available.
- line 346: ///TODO: Send crash report to developers (if configured)
- line 349: /// TODO: SentryWriter::onFault(sig, error_message, stack_trace);
- line 351: /// TODO: Advice the user to send it manually.
- line 393: /// TODO:: Set up Poco ErrorHandler for Poco Threads.
shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala (8 lines):
- line 52: // FIXME the option currently controls both JVM and native validation against a Substrait plan.
- line 465: * TODO: Improve the get native conf logic.
- line 581: * TODO: Improve the get native conf logic.
- line 773: // FIXME the option currently controls both JVM and native validation against a Substrait plan.
- line 1310: // FIXME: This option is no longer only used by RAS. Should change key to
- line 1468: // FIXME: This only works with CH backend.
- line 1476: // FIXME: This only works with CH backend.
- line 1483: // FIXME: This only works with CH backend.
cpp/velox/shuffle/GlutenByteStream.h (7 lines):
- line 18: // TODO: wait to delete after rss sort reader refactored.
- line 25: /// TODO Remove after refactoring SpillInput.
- line 54: /// TODO Remove after refactoring SpillInput.
- line 70: /// TODO: Remove 'virtual' after refactoring SpillInput.
- line 229: /// TODO: Remove 'virtual' after refactoring SpillInput.
- line 244: // TODO: Remove after refactoring SpillInput.
- line 249: // TODO: Remove after refactoring SpillInput.
gluten-substrait/src/main/resources/substrait/proto/substrait/algebra.proto (6 lines):
- line 50: // TODO: nodes, cpu threads/%, memory, iops, etc.
- line 289: // TODO -- Remove this unnecessary type.
- line 617: //TODO add PK/constraints/indexes/etc..?
- line 1029: // greater than the upper bound, TODO (null range/no records passed?
- line 1039: // less than the lower bound, TODO (null range/no records passed?
- line 1344: // TODO: should allow expressions
cpp-ch/local-engine/proto/substrait/algebra.proto (6 lines):
- line 50: // TODO: nodes, cpu threads/%, memory, iops, etc.
- line 289: // TODO -- Remove this unnecessary type.
- line 617: //TODO add PK/constraints/indexes/etc..?
- line 1029: // greater than the upper bound, TODO (null range/no records passed?
- line 1039: // less than the lower bound, TODO (null range/no records passed?
- line 1344: // TODO: should allow expressions
backends-clickhouse/src/main/scala/org/apache/gluten/utils/CHExpressionUtil.scala (4 lines):
- line 89: // TODO: When limit is positive, CH result is wrong, fix it later
- line 103: // TODO: CH substringIndexUTF8 function only support string literal as delimiter
- line 108: // TODO: CH substringIndexUTF8 function only support single character as delimiter
- line 144: // TODO: CH formatDateTimeInJodaSyntax/fromUnixTimestampInJodaSyntax only support
gluten-flink/runtime/src/main/java/org/apache/gluten/vectorized/FlinkRowToVLVectorConvertor.java (4 lines):
- line 54: // TODO: support more types
- line 74: // TODO: refine this
- line 106: // TODO: support precision
- line 133: // TODO: support more types
gluten-substrait/src/main/scala/org/apache/gluten/expression/ConverterUtils.scala (4 lines):
- line 102: // TODO: This is used only by `BasicScanExecTransformer`,
- line 347: case BooleanType => // TODO: Not in Substrait yet.
- line 363: // TODO: different with Substrait due to more details here.
- line 368: // TODO: different with Substrait due to more details here.
cpp/velox/compute/VeloxBackend.cc (4 lines):
- line 203: // FIXME: Make this configurable.
- line 224: // FIXME It's known that if spill compression is disabled, the actual spill file size may
- line 275: // TODO: this is not tracked by Spark.
- line 278: // TODO: this is not tracked by Spark.
gluten-substrait/src/main/scala/org/apache/gluten/extension/columnar/enumerated/RasOffload.scala (4 lines):
- line 90: // TODO: Tag the original plan with fallback reason.
- line 107: // TODO: Remove this catch block
- line 131: // TODO: Tag the original plan with fallback reason. This is a non-trivial work
- line 140: // TODO: Tag the original plan with fallback reason. This is a non-trivial work
cpp-ch/local-engine/Parser/RelParsers/MergeTreeRelParser.cpp (3 lines):
- line 341: // TODO need to test
- line 410: // TODO: get primary_key_names
- line 527: 10); // TODO: Expect use driver cores.
backends-clickhouse/src-delta-32/main/scala/org/apache/spark/sql/delta/ClickhouseOptimisticTransaction.scala (3 lines):
- line 73: // TODO: update FallbackByBackendSettings for mergetree always return true
- line 267: // TODO: val checkInvariants = DeltaInvariantCheckerExec(empty2NullPlan, constraints)
- line 306: val fileFormat = deltaLog.fileFormat(protocol, metadata) // TODO support changing formats.
gluten-core/src/main/scala/org/apache/gluten/extension/columnar/heuristic/RewriteSparkPlanRulesManager.scala (3 lines):
- line 56: // TODO: Find a better approach than checking `p.isInstanceOf[ProjectExec]` which is not
- line 77: // TODO: Remove this catch block
- line 93: // TODO: Fix the exception and remove this branch
backends-clickhouse/src/main/scala/org/apache/gluten/utils/MergeTreePartitionUtils.scala (3 lines):
- line 26: * `DelayedCommitProtocol.parsePartitions`. This is a copied version. TODO: Remove it.
- line 33: // TODO: timezones?
- line 34: // TODO: enable validatePartitionColumns?
backends-clickhouse/src-delta-23/main/scala/org/apache/spark/sql/delta/DeltaLog.scala (3 lines):
- line 501: // TODO: Don't add the bucketOption here, it will cause the OOM when the merge into update
- line 524: // TODO: If snapshotToUse is unspecified, get the correct snapshot from update()
- line 691: // TODO: We should use ref-counting to uncache snapshots instead of a manual timed op
backends-clickhouse/src-delta-32/main/scala/org/apache/spark/sql/delta/DeltaLog.scala (3 lines):
- line 543: // TODO: Don't add the bucketOption here, it will cause the OOM when the merge into update
- line 563: // TODO: If snapshotToUse is unspecified, get the correct snapshot from update()
- line 737: // TODO: We should use ref-counting to uncache snapshots instead of a manual timed op
backends-clickhouse/src/main/scala/org/apache/gluten/backendsapi/clickhouse/CHRuleApi.scala (3 lines):
- line 169: * TODO: Remove this since tricky to maintain.
- line 184: // TODO: Currently there are some fallback issues on CH backend when SparkPlan is
- line 185: // TODO: SerializeFromObjectExec, ObjectHashAggregateExec and V2CommandExec.
cpp-ch/local-engine/Common/GlutenConfig.cpp (2 lines):
- line 41: // TODO: Remove BackendInitializerUtil::initSettings
- line 159: // TODO support transfer spark settings from spark session to native engine
cpp-ch/local-engine/Parser/SerializedPlanParser.cpp (2 lines):
- line 316: // TODO: set optimize_plan to true when metrics could be collected while ch query plan optimization is enabled.
- line 378: // TODO: make it the same as spark, it's too simple at present.
cpp-ch/local-engine/Storages/SubstraitSource/ParquetFormatFile.cpp (2 lines):
- line 107: // TODO: format_settings_.parquet.max_block_size
- line 203: // TODO: enable filter push down again
gluten-substrait/src/main/scala/org/apache/gluten/execution/BasicPhysicalOperatorTransformer.scala (2 lines):
- line 108: // FIXME: Should use field "condition" to store the actual executed filter expressions.
- line 250: // FIXME: Avoid such practice for plan immutability.
gluten-arrow/src/main/java/org/apache/gluten/vectorized/ArrowWritableColumnVector.java (2 lines):
- line 767: // TODO: should be final after removing ArrayAccessor workaround
- line 1171: // TODO: Workaround if vector has all non-null values, see ARROW-1948
gluten-ras/common/src/main/scala/org/apache/gluten/ras/path/PathMask.scala (2 lines):
- line 23: // FIXME: This is not currently in use. Use pattern instead.
- line 43: // FIXME: This is a rough validation.
backends-velox/src/main/scala/org/apache/spark/sql/execution/ColumnarCachedBatchSerializer.scala (2 lines):
- line 53: * 2. TODO: support push down filter
- line 54: * 3. Super TODO: support store offheap object directly
gluten-substrait/src/main/scala/org/apache/spark/sql/execution/ShuffledColumnarBatchRDD.scala (2 lines):
- line 41: // TODO this check is based on assumptions of callers' behavior but is sufficient for now.
- line 77: // TODO order by partition size.
cpp-ch/local-engine/Storages/Output/NormalFileWriter.h (2 lines):
- line 71: // TODO Support delta.dataSkippingStatsColumns, detail see https://docs.databricks.com/aws/en/delta/data-skipping
- line 458: // FIXME if toString(partition_column) is empty
backends-velox/src/main/scala/org/apache/gluten/execution/HashJoinExecTransformer.scala (2 lines):
- line 117: // TODO: Support cross join with Cross Rel
- line 130: // FIXME: Do we have to make build side a RDD?
cpp-ch/local-engine/Common/CHUtil.cpp (2 lines):
- line 647: // TODO: we need set Setting::max_threads to 1 by default, but now we can't get correct metrics for the some query if we set it to 1.
- line 657: /// TODO: FIXME set true again.
cpp/velox/substrait/SubstraitToVeloxPlan.cc (2 lines):
- line 168: // TODO Simplify Velox's aggregation steps
- line 1429: // TODO: Use the names as the output names for the whole computing.
gluten-core/src/main/scala/org/apache/gluten/extension/columnar/enumerated/planner/property/Conv.scala (2 lines):
- line 43: // TODO: Add a similar case to RAS UTs.
- line 81: // TODO: Should the convention-transparent ops (e.g., aqe shuffle read) support
cpp-ch/local-engine/Common/DebugUtils.cpp (2 lines):
- line 59: //TODO: Implement this method
- line 434: //TODO: ColumnSet
backends-clickhouse/src/main/scala/org/apache/gluten/execution/CHHashJoinExecTransformer.scala (2 lines):
- line 79: // TODO: Support cross join with Cross Rel
- line 272: // FIXME: Do we have to make build side a RDD?
cpp-ch/local-engine/Functions/SparkFunctionGetJsonObject.h (2 lines):
- line 265: /// FIXME: It will be OK if we just return a leaf value, but it will have different result for
- line 541: /// FIXME: If it contains \t, \n, simdjson cannot parse.
backends-clickhouse/src/main/scala/org/apache/gluten/extension/FallbackBroadcastHashJoinRules.scala (2 lines):
- line 162: // FIXME Hongze: In following codes we perform a lot of if-else conditions to
- line 248: // FIXME did we consider the case that AQE: OFF && Reuse: ON ?
backends-clickhouse/src/main/scala/org/apache/spark/sql/execution/CHColumnarWriteFilesExec.scala (2 lines):
- line 107: // TODO: task commit time
- line 108: // TODO: get the schema from result ColumnarBatch and verify it.
gluten-arrow/src/main/java/org/apache/gluten/vectorized/ArrowColumnVector.java (2 lines):
- line 210: // TODO: should be final after removing ArrayAccessor workaround
- line 490: // TODO: Workaround if vector has all non-null values, see ARROW-1948
gluten-core/src/main/scala/org/apache/gluten/GlutenPlugin.scala (2 lines):
- line 212: // FIXME: Do we still need this trick since
- line 219: // FIXME Hongze 22/12/06
cpp/velox/shuffle/VeloxHashShuffleWriter.cc (2 lines):
- line 703: // TODO: maybe an estimated row is more reasonable
- line 707: // TODO: maybe memory issue, copy many times
gluten-flink/planner/src/main/java/org/apache/gluten/rexnode/RexNodeConverter.java (2 lines):
- line 86: // TODO: use LogicalRelDataTypeConverter
- line 112: // TODO: fix precision check
backends-clickhouse/src/main/scala/org/apache/spark/sql/execution/datasources/v1/CHFormatWriterInjects.scala (2 lines):
- line 42: // TODO: move to SubstraitUtil
- line 88: // TODO: parquet and mergetree
gluten-substrait/src/main/scala/org/apache/gluten/execution/SortMergeJoinExecTransformer.scala (2 lines):
- line 157: // TODO: Support cross join with Cross Rel
- line 158: // TODO: Support existence join
shims/spark33/src/main/scala/org/apache/spark/sql/execution/AbstractFileSourceScanExec.scala (2 lines):
- line 232: // TODO SPARK-24528 Sort order is currently ignored if buckets are coalesced.
- line 234: // TODO Currently Spark does not support writing columns sorting in descending order
cpp/velox/compute/WholeStageResultIterator.cc (2 lines):
- line 279: // FIXME: The whole metrics system in gluten-substrait is magic. Passing metrics trees through JNI with a trivial
- line 485: // TODO: Move the calculations to Java side.
backends-velox/src/main/scala/org/apache/gluten/datasource/v2/ArrowCSVPartitionReaderFactory.scala (2 lines):
- line 110: // TODO: support array/map/struct types in out-of-order schema reading.
- line 113: // TODO: support array/map/struct types in out-of-order schema reading.
backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxListenerApi.scala (2 lines):
- line 231: // TODO shutdown implementation in velox to release resources
- line 236: // TODO: Implement graceful shutdown and remove these flags.
shims/spark32/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java (2 lines):
- line 476: // TODO: try to find space on previous pages
- line 802: * TODO: support forced spilling
gluten-substrait/src/main/scala/org/apache/spark/sql/execution/datasources/GlutenWriterColumnarRules.scala (2 lines):
- line 92: // TODO: support ctas in Spark3.4, see https://github.com/apache/spark/pull/39220
- line 93: // TODO: support dynamic partition and bucket write
cpp-ch/local-engine/Common/GlutenConfig.h (2 lines):
- line 179: /// TODO: spark_version
- line 180: /// TODO: pass spark configs to clickhouse backend.
gluten-core/src/main/scala/org/apache/gluten/extension/columnar/heuristic/HeuristicTransform.scala (2 lines):
- line 98: * TODO: Handle tags internally. Remove tag handling code in user offload rules.
- line 120: // TODO: Avoid using this and eventually remove the API.
backends-clickhouse/src-delta-20/main/scala/org/apache/spark/sql/delta/DeltaLog.scala (2 lines):
- line 140: // TODO: There is a race here where files could get dropped when increasing the
- line 149: // TODO (Fred): Get rid of this FrameProfiler record once SC-94033 is addressed
gluten-substrait/src/main/scala/org/apache/gluten/extension/columnar/rewrite/RewriteIn.scala (2 lines):
- line 32: * TODO: Remove this rule once Velox support the list option in `In` is not literal.
- line 65: // TODO: Support datasource v2
shims/spark32/src/main/scala/org/apache/spark/sql/execution/AbstractFileSourceScanExec.scala (2 lines):
- line 222: // TODO SPARK-24528 Sort order is currently ignored if buckets are coalesced.
- line 224: // TODO Currently Spark does not support writing columns sorting in descending order
backends-velox/src/main/scala/org/apache/gluten/execution/GenerateExecTransformer.scala (2 lines):
- line 166: // TODO: supports outer and remove this param.
- line 251: // TODO: The prefix is just for adapting to GetJsonObject.
cpp-ch/local-engine/Storages/MergeTree/SparkMergeTreeSink.h (2 lines):
- line 35: // TODO: Remove ConcurrentDeque
- line 336: // TODO implement with bucket
backends-velox/src-uniffle/main/java/org/apache/spark/shuffle/gluten/uniffle/UniffleShuffleManager.java (1 line):
- line 39: // FIXME: remove this after https://github.com/apache/incubator-uniffle/pull/2193
gluten-ras/common/src/main/scala/org/apache/gluten/ras/memo/Memo.scala (1 line):
- line 128: // TODO: Traverse up the tree to do more merges.
cpp/core/compute/ResultIterator.h (1 line):
- line 28: // FIXME the code is tightly coupled with Velox plan execution. Should cleanup the abstraction for uses from
gluten-ras/common/src/main/scala/org/apache/gluten/ras/dp/DpPlanner.scala (1 line):
- line 28: // TODO: Branch and bound pruning.
gluten-core/src/main/scala/org/apache/spark/task/TaskResources.scala (1 line):
- line 210: // TODO:
cpp-ch/local-engine/Storages/SubstraitSource/ParquetFormatFile.h (1 line):
- line 60: /// TODO: we should use KeyCondition instead of ColumnIndexFilter, this is a temporary solution
shims/common/src/main/scala/org/apache/gluten/config/ReservedKeys.scala (1 line):
- line 23: * TODO: Other internal constant key should be moved here.
cpp-ch/local-engine/Storages/Output/ORCOutputFormatFile.cpp (1 line):
- line 44: // TODO: align all spark orc config with ch orc config
cpp-ch/local-engine/Functions/SparkCastComplexTypesToString.h (1 line):
- line 110: // TODO: respect spark.sql.legacy.castComplexTypesToString.enabled
cpp/velox/compute/VeloxRuntime.h (1 line):
- line 50: // FIXME This is not thread-safe?
shims/spark35/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AbstractBatchScanExec.scala (1 line):
- line 43: // TODO: unify the equal/hashCode implementation for all data source v2 query plans.
shims/spark33/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala (1 line):
- line 53: * TODO: implement the read logic.
cpp/core/shuffle/ShuffleReader.h (1 line):
- line 28: // FIXME iterator should be unique_ptr or un-copyable singleton
gluten-substrait/src/main/scala/org/apache/gluten/extension/columnar/enumerated/RemoveSort.scala (1 line):
- line 31: * TODO: Sort's removal could be made much simpler once output ordering is added as a physical
gluten-substrait/src/main/scala/org/apache/gluten/extension/columnar/rewrite/RewriteEligibility.scala (1 line):
- line 28: * TODO: Remove this then implement API #isRewritable in rewrite rules.
gluten-flink/runtime/src/main/java/org/apache/flink/streaming/runtime/translators/SourceTransformationTranslator.java (1 line):
- line 100: // TODO: should use config to get parameters
gluten-substrait/src/main/scala/org/apache/gluten/execution/WriteFilesExecTransformer.scala (1 line):
- line 154: // TODO: Currently Velox doesn't support Parquet write of constant with complex data type.
cpp/core/jni/JniCommon.h (1 line):
- line 310: // TODO: Move the static functions to namespace gluten
gluten-substrait/src/main/scala/org/apache/gluten/extension/columnar/rewrite/RewriteMultiChildrenCount.scala (1 line):
- line 46: * TODO: Remove this rule when Velox support multi-children Count
cpp-ch/local-engine/Storages/Parquet/ColumnIndexFilter.h (1 line):
- line 142: //TODO: parameters
backends-clickhouse/src/main/scala/org/apache/gluten/execution/CHWindowGroupLimitExecTransformer.scala (1 line):
- line 84: // TODO: Make the framework aware of grouped data distribution
backends-clickhouse/src-delta-32/main/scala/org/apache/spark/sql/delta/commands/OptimizeTableCommandOverwrites.scala (1 line):
- line 289: // TODO: Remove this wrapper and let former callers invoke DeltaTableV2.extractFrom directly.
gluten-arrow/src/main/java/org/apache/gluten/vectorized/ShuffleWriterJniWrapper.java (1 line):
- line 209: * @param memLimit memory usage limit for the split operation FIXME setting a cap to pool /
backends-velox/src/main/scala/org/apache/gluten/execution/VeloxBroadcastNestedLoopJoinExecTransformer.scala (1 line):
- line 45: // FIXME: Do we have to make build side a RDD?
backends-clickhouse/src/main/scala/org/apache/gluten/backendsapi/clickhouse/CHTransformerApi.scala (1 line):
- line 147: // TODO: consider compression or orc.compression in table options.
cpp/velox/substrait/SubstraitExtensionCollector.cc (1 line):
- line 26: // TODO: Currently we treat all velox registry based function signatures as
cpp-ch/local-engine/Storages/MergeTree/MetaDataHelper.cpp (1 line):
- line 261: //TODO: name
gluten-substrait/src/main/scala/org/apache/gluten/extension/columnar/EnsureLocalSortRequirements.scala (1 line):
- line 39: // FIXME: HeuristicTransform is costly. Re-applying it may cause performance issues.
shims/spark33/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala (1 line):
- line 258: // TODO: if you move this into the closure it reverts to the default values.
backends-clickhouse/src/main/java/org/apache/gluten/vectorized/BlockOutputStream.java (1 line):
- line 103: // FIXME: finalize
gluten-substrait/src/main/scala/org/apache/gluten/expression/ExpressionTransformer.scala (1 line):
- line 37: // TODO: the funcName seems can be simplified to `substraitExprName`
gluten-ras/common/src/main/scala/org/apache/gluten/ras/dp/DpClusterAlgo.scala (1 line):
- line 26: // FIXME: Code is so similar with DpGroupAlgo.
gluten-substrait/src/main/scala/org/apache/spark/sql/execution/ColumnarSubqueryBroadcastExec.scala (1 line):
- line 71: // TODO: support BooleanType, DateType and TimestampType
cpp-ch/local-engine/Storages/SubstraitSource/FormatFile.cpp (1 line):
- line 99: /// TODO: check whether using const column is correct or not.
cpp-ch/local-engine/Parser/RelParsers/AggregateRelParser.cpp (1 line):
- line 151: /// FIXME: Really don't like this implementation. It's too easy to be broken.
gluten-arrow/src/main/java/org/apache/gluten/vectorized/ColumnarBatchOutIterator.java (1 line):
- line 80: // TODO: Remove this API if we have other choice, e.g., hold the pools in native code.
backends-clickhouse/src/main/scala/org/apache/gluten/parser/GlutenClickhouseSqlParserBase.scala (1 line):
- line 203: // TODO: Spark 3.5 supports catalog parameter
shims/spark32/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala (1 line):
- line 264: // TODO: if you move this into the closure it reverts to the default values.
backends-velox/src/main/scala/org/apache/gluten/datasource/ArrowCSVFileFormat.scala (1 line):
- line 139: // TODO: support array/map/struct types in out-of-order schema reading.
backends-velox/src/main/scala/org/apache/gluten/metrics/MetricsUtil.scala (1 line):
- line 204: // FIXME: Metrics updating code is too magical to maintain. Tree-walking algorithm should be made
gluten-flink/runtime/src/main/java/org/apache/gluten/util/LogicalTypeConverter.java (1 line):
- line 54: // TODO: may need precision
backends-clickhouse/src/main/scala/org/apache/gluten/metrics/MetricsUtil.scala (1 line):
- line 109: // TODO: if `RuntimeSettings.COLLECT_METRICS` set to false, we should not log the warning
gluten-substrait/src/main/scala/org/apache/gluten/execution/TakeOrderedAndProjectExecTransformer.scala (1 line):
- line 34: // FIXME: The operator is simply a wrapper for sort + limit + project (+ exchange if needed).
backends-clickhouse/src/main/scala/org/apache/gluten/backendsapi/clickhouse/CHBackend.scala (1 line):
- line 248: // FIXME: verify Support compression codec
gluten-flink/planner/src/main/java/org/apache/gluten/rexnode/FunctionMappings.java (1 line):
- line 27: // TODO: support more functions.
gluten-arrow/src/main/java/org/apache/gluten/columnarbatch/IndicatorVector.java (1 line):
- line 52: // TODO use stronger restriction (IllegalStateException probably)
gluten-ras/common/src/main/scala/org/apache/gluten/ras/path/OutputWizard.scala (1 line):
- line 370: // TODO: Document
gluten-substrait/src/main/scala/org/apache/gluten/extension/columnar/offload/OffloadSingleNodeRules.scala (1 line):
- line 205: // TODO: Add DynamicPartitionPruningHiveScanSuite.scala
gluten-substrait/src/main/scala/org/apache/spark/sql/execution/datasources/GlutenFormatWriterInjectsBase.scala (1 line):
- line 44: // FIXME: HeuristicTransform is costly. Re-applying it may cause performance issues.
gluten-substrait/src/main/scala/org/apache/gluten/extension/columnar/MergeTwoPhasesHashBaseAggregate.scala (1 line):
- line 48: // TODO: now it can not support to merge agg which there are the filters in the aggregate exprs.
cpp-ch/local-engine/Storages/SubstraitSource/Iceberg/EqualityDeleteFileReader.cpp (1 line):
- line 132: //TODO: deleteFile_.equalityfieldids(i) - 1 ? why
gluten-substrait/src/main/scala/org/apache/spark/sql/execution/ColumnarCollapseTransformStages.scala (1 line):
- line 176: // TODO: Make this inherit from GlutenPlan.
cpp/core/compute/Runtime.cc (1 line):
- line 56: // FIXME: Pass the path through relevant member functions.
gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala (1 line):
- line 376: // TODO: May can apply to BatchScanExecTransformer without key group partitioning
cpp-ch/local-engine/Parser/scalar_function_parser/arithmetic.cpp (1 line):
- line 131: //TODO: checkDecimalOverflowSpark throw exception per configuration
backends-velox/src/main/scala/org/apache/spark/sql/execution/utils/ExecUtil.scala (1 line):
- line 159: .recyclePayload(p => ColumnarBatches.forceClose(p._2)) // FIXME why force close?
cpp-ch/local-engine/Builder/SerializedPlanBuilder.cpp (1 line):
- line 199: // TODO support group
gluten-core/src/main/scala/org/apache/gluten/extension/columnar/rewrite/RewriteSingleNode.scala (1 line):
- line 30: * TODO: Ideally for such API we'd better to allow multiple alternative outputs.
cpp-ch/local-engine/Storages/SubstraitSource/ORCFormatFile.cpp (1 line):
- line 72: //TODO: support prefetch
gluten-ras/common/src/main/scala/org/apache/gluten/ras/path/RasPath.scala (1 line):
- line 128: // TODO: Make inner builder list mutable to reduce memory usage
backends-clickhouse/src/main/scala/org/apache/gluten/extension/RemoveDuplicatedColumns.scala (1 line):
- line 71: // TODO: we cannot build a UT for this case.
cpp-ch/local-engine/Storages/SubstraitSource/Iceberg/SimpleParquetReader.cpp (1 line):
- line 63: // TODO: set min_bytes_for_seek
backends-clickhouse/src/main/scala/org/apache/spark/sql/execution/datasources/v1/CHOrcWriterInjects.scala (1 line):
- line 29: // TODO: implement it
backends-clickhouse/src/main/resources/org/apache/spark/sql/execution/datasources/v1/write_optimization.proto (1 line):
- line 9: //TODO : set compression codec
gluten-substrait/src/main/scala/org/apache/gluten/execution/WindowGroupLimitExecTransformer.scala (1 line):
- line 79: // TODO: Make the framework aware of grouped data distribution
shims/spark32/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AbstractBatchScanExec.scala (1 line):
- line 39: // TODO: unify the equal/hashCode implementation for all data source v2 query plans.
cpp-ch/local-engine/local_engine_jni.cpp (1 line):
- line 635: // TODO support multiple sinks
gluten-arrow/src/main/java/org/apache/gluten/memory/arrow/alloc/ArrowBufferAllocators.java (1 line):
- line 40: // FIXME: Remove this then use contextInstance(name) instead
cpp-ch/local-engine/Storages/Output/WriteBufferBuilder.cpp (1 line):
- line 104: //TODO: support azure and S3
backends-clickhouse/src/main/scala/org/apache/spark/sql/execution/datasources/mergetree/MetaSerializer.scala (1 line):
- line 51: // TODO: remove pathList
cpp-ch/local-engine/Storages/Parquet/VectorizedParquetRecordReader.h (1 line):
- line 217: // TODO: create ColumnIndexFilter here, currently disable it now.
backends-clickhouse/src-delta-23/main/scala/org/apache/spark/sql/delta/ClickhouseOptimisticTransaction.scala (1 line):
- line 156: // TODO: support native delta parquet write
cpp-ch/local-engine/Shuffle/SelectorBuilder.cpp (1 line):
- line 118: /// TODO: implement new hash function sparkCityHash64 like sparkXxHash64 to process null literal as column more gracefully.
backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxBackend.scala (1 line):
- line 488: // TODO: Support LeftSemi after resolve issue
backends-clickhouse/src-delta-20/main/scala/org/apache/spark/sql/delta/ClickhouseOptimisticTransaction.scala (1 line):
- line 155: // TODO: support native delta parquet write
cpp-ch/local-engine/Storages/Kafka/GlutenKafkaSource.cpp (1 line):
- line 270: // TODO: it seems like in case of put_error_to_stream=true we may need to process those differently
backends-velox/src/main/scala/org/apache/spark/sql/execution/ColumnarBuildSideRelation.scala (1 line):
- line 93: .recyclePayload(ColumnarBatches.forceClose) // FIXME why force close?
cpp/velox/compute/VeloxRuntime.cc (1 line):
- line 238: // FIXME: Check file formats?
backends-clickhouse/src-delta-32/main/scala/org/apache/spark/sql/delta/commands/OptimizeTableCommand.scala (1 line):
- line 47: // TODO: Remove this file once we needn't support bucket
cpp-ch/local-engine/Parser/scalar_function_parser/arrayHighOrderFunctions.cpp (1 line):
- line 161: /// TODO: make a new version of arrayFold that can handle nullable array.
backends-velox/src/main/scala/org/apache/spark/sql/execution/unsafe/UnsafeColumnarBuildSideRelation.scala (1 line):
- line 202: .recyclePayload(ColumnarBatches.forceClose) // FIXME why force close?
cpp/core/shuffle/Spill.cc (1 line):
- line 51: // TODO: Add compression threshold.
backends-clickhouse/src/main/scala/org/apache/gluten/vectorized/NativeExpressionEvaluator.scala (1 line):
- line 24: // TODO: move CHNativeExpressionEvaluator to NativeExpressionEvaluator
shims/spark32/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala (1 line):
- line 255: // TODO: to optimize, bucket value is computed twice here
cpp-ch/local-engine/Storages/Parquet/ColumnIndexFilter.cpp (1 line):
- line 539: /// TODO: bencnmark
gluten-kafka/src/main/scala/org/apache/gluten/execution/MicroBatchScanExecTransformer.scala (1 line):
- line 53: // TODO: unify the equal/hashCode implementation for all data source v2 query plans.
gluten-ras/common/src/main/scala/org/apache/gluten/ras/exaustive/ExhaustivePlanner.scala (1 line):
- line 86: // TODO: ONLY APPLY RULES ON ALTERED GROUPS (and close parents)
cpp/core/shuffle/LocalPartitionWriter.cc (1 line):
- line 182: // TODO: Merging complex type is currently not supported.
cpp-ch/local-engine/Storages/MergeTree/SparkStorageMergeTree.cpp (1 line):
- line 529: //TODO: set settings though ASTStorage
gluten-flink/runtime/src/main/java/org/apache/flink/streaming/runtime/translators/SinkTransformationTranslator.java (1 line):
- line 184: // TODO: this is a constrain of velox.
gluten-substrait/src/main/scala/org/apache/gluten/backendsapi/SparkPlanExecApi.scala (1 line):
- line 646: // TODO: For data lake format use pushedFilters in SupportsPushDownFilters
gluten-arrow/src/main/scala/org/apache/spark/sql/utils/SparkArrowUtil.scala (1 line):
- line 75: // TODO: Time unit is not handled.
cpp-ch/local-engine/proto/write_optimization.proto (1 line):
- line 9: //TODO : set compression codec
backends-clickhouse/src/main/scala/org/apache/spark/sql/execution/datasources/v1/CHMergeTreeWriterInjects.scala (1 line):
- line 97: * TODO: We should refactor the code to avoid creating the JNI wrapper in this case.
gluten-core/src/main/scala/org/apache/gluten/extension/columnar/enumerated/EnumeratedTransform.scala (1 line):
- line 75: // TODO: Avoid using this and eventually remove the API.
backends-clickhouse/src-delta-32/main/scala/org/apache/gluten/sql/shims/delta32/Delta32Shims.scala (1 line):
- line 51: * TODO: native size needs to support the ZeroMQ Base85
backends-velox/src/main/scala/org/apache/gluten/execution/ColumnarPartialProjectExec.scala (1 line):
- line 250: // TODO: should check the size <= 1, but now it has bug, will change iterator to empty
backends-clickhouse/src/main/scala/org/apache/gluten/extension/CommonSubexpressionEliminateRule.scala (1 line):
- line 62: // TODO: CSE in Filter doesn't work for unknown reason, need to fix it later
cpp-ch/local-engine/Parser/RelParsers/CrossRelParser.cpp (1 line):
- line 174: /// FIXME: There is mistake in HashJoin::needUsedFlagsForPerRightTableRow which returns true when
backends-velox/src/main/scala/org/apache/gluten/execution/VeloxColumnarToRowExec.scala (1 line):
- line 127: // TODO: Pass the jni jniWrapper and arrowSchema and serializeSchema method by broadcast.
gluten-substrait/src/main/scala/org/apache/gluten/extension/columnar/RemoveNativeWriteFilesSortAndProject.scala (1 line):
- line 110: // TODO: support bucket write
backends-clickhouse/src/main/scala/org/apache/gluten/backendsapi/clickhouse/RuntimeSettings.scala (1 line):
- line 40: // TODO: support check value
gluten-ras/common/src/main/scala/org/apache/gluten/ras/PropertyModel.scala (1 line):
- line 21: // TODO Use class tags to restrict runtime user-defined class types.
gluten-substrait/src/main/scala/org/apache/gluten/expression/ExpressionConverter.scala (1 line):
- line 707: // TODO: Remove after fix ready for
tools/scripts/gen-function-support-docs.py (1 line):
- line 728: # TODO: Remove this filter as it may exclude supported expressions, such as Builder.
cpp-ch/local-engine/Parser/RelParsers/GroupLimitRelParser.cpp (1 line):
- line 460: // TODO: WindowGroupLimitStep has bad performance, need to improve it. So we still use window + filter here.
gluten-core/src/main/scala/org/apache/gluten/extension/columnar/enumerated/planner/plan/GlutenPlanModel.scala (1 line):
- line 44: // TODO: Make this inherit from GlutenPlan.
shims/spark33/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AbstractBatchScanExec.scala (1 line):
- line 40: // TODO: unify the equal/hashCode implementation for all data source v2 query plans.
gluten-core/src/main/scala/org/apache/gluten/extension/columnar/ColumnarRuleExecutor.scala (1 line):
- line 34: // TODO: Remove this exclusion then manage to pass Spark's idempotence check.
shims/spark33/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala (1 line):
- line 275: // TODO: to optimize, bucket value is computed twice here
gluten-substrait/src/main/scala/org/apache/gluten/metrics/MetricsUpdater.scala (1 line):
- line 25: * TODO: place it to somewhere else since it's used not only by whole stage facilities.
gluten-substrait/src/main/scala/org/apache/gluten/utils/SubstraitUtil.scala (1 line):
- line 50: // TODO: Support existence join
shims/spark34/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AbstractBatchScanExec.scala (1 line):
- line 46: // TODO: unify the equal/hashCode implementation for all data source v2 query plans.
backends-clickhouse/src-delta-32/main/scala/org/apache/spark/sql/execution/datasources/v1/clickhouse/MergeTreeFileFormatWriter.scala (1 line):
- line 201: // TODO: to optimize, bucket value is computed twice here
gluten-flink/runtime/src/main/java/org/apache/flink/client/StreamGraphTranslator.java (1 line):
- line 164: // TODO: may need fallback if failed.
cpp-ch/local-engine/Storages/Kafka/ReadFromGlutenStorageKafka.cpp (1 line):
- line 96: // TODO: add more configuration
cpp/velox/substrait/SubstraitParser.cc (1 line):
- line 206: // TODO Refactor using Bison.
gluten-flink/planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/common/CommonExecSink.java (1 line):
- line 587: // TODO: support it
gluten-core/src/main/java/org/apache/gluten/memory/memtarget/spark/TreeMemoryConsumer.java (1 line):
- line 207: while (true) { // FIXME should we add retry limit?
gluten-substrait/src/main/scala/org/apache/spark/sql/hive/HiveTableScanExecTransformer.scala (1 line):
- line 80: // TODO: get root paths from hive table.
backends-velox/src/main/java/org/apache/gluten/fs/OnHeapFileSystem.java (1 line):
- line 53: // FIXME: This is rough. JVM heap can still be filled out by other threads
cpp/velox/memory/VeloxMemoryManager.cc (1 line):
- line 194: const uint64_t memoryPoolInitialCapacity_; // FIXME: Unused.
backends-velox/src/main/scala/org/apache/gluten/execution/VeloxResizeBatchesExec.scala (1 line):
- line 40: * FIXME: Code duplication with ColumnarToColumnarExec.
backends-velox/src/main/scala/org/apache/spark/sql/execution/BroadcastUtils.scala (1 line):
- line 40: // FIXME: Truncate output with batch size.
gluten-substrait/src/main/java/org/apache/gluten/substrait/utils/SubstraitUtil.java (1 line):
- line 44: // TODO: generate the message according to the object type
cpp/velox/operators/functions/Arithmetic.h (1 line):
- line 52: // TODO: Make this more efficient with Boost to support high arbitrary precision at runtime.
cpp-ch/local-engine/Parser/RelParsers/JoinRelParser.cpp (1 line):
- line 311: /// TODO: make smj support mixed conditions
backends-velox/src/main/scala/org/apache/gluten/execution/HashAggregateExecTransformer.scala (1 line):
- line 66: // TODO: We should have a check to make sure the returned schema actually matches the output
cpp-ch/local-engine/Storages/SubstraitSource/ReadBufferBuilder.cpp (1 line):
- line 601: //TODO: support online change config for cached per_bucket_clients
backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxTransformerApi.scala (1 line):
- line 73: // TODO: IMPLEMENT SPECIAL PROCESS FOR VELOX BACKEND
tools/gluten-it/common/src/main/scala/org/apache/spark/sql/SparkQueryRunner.scala (1 line):
- line 244: // We have 50% chance to kill the task. FIXME make it configurable?
gluten-substrait/src/main/scala/org/apache/gluten/backendsapi/BackendSettingsApi.scala (1 line):
- line 136: // TODO: Move this to test settings as used in UT only.
cpp-ch/local-engine/Storages/Output/ParquetOutputFormatFile.cpp (1 line):
- line 46: // TODO: align all spark parquet config with ch parquet config
cpp/velox/substrait/SubstraitToVeloxPlanValidator.cc (1 line):
- line 1249: // The supported aggregation functions. TODO: Remove this set when Presto aggregate functions in Velox are not
gluten-iceberg/src/main/scala/org/apache/gluten/execution/IcebergScanTransformer.scala (1 line):
- line 118: // TODO: get root paths from table.
cpp-ch/local-engine/Storages/SubstraitSource/ExcelTextFormatFile.cpp (1 line):
- line 299: /// FIXME: move it to ExcelSerialization ???