spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (14 lines):
- line 859: // TODO implement these types with tests for formatting options and timezone
- line 914: // TODO custom escape char
- line 1407: // TODO enable once https://github.com/apache/datafusion/issues/11557 is fixed or
- line 2069: val childExpr = exprToProtoInternal(child, inputs, binding) // TODO review
- line 2301: // TODO remove flatMap and add error handling for unsupported data filters
- line 2306: // TODO: modify CometNativeScan to generate the file partitions without instantiating RDD.
- line 2424: // TODO: We don't support negative limit for now.
- line 2428: // TODO: Spark 3.3 might have negative limit (-1) for Offset usage.
- line 2948: case ArrayType(ArrayType(_, _), _) => false // TODO: nested array is not supported
- line 2949: case ArrayType(MapType(_, _, _), _) => false // TODO: map array element is not supported
- line 2952: case MapType(MapType(_, _, _), _, _) => false // TODO: nested map is not supported
- line 2954: case MapType(StructType(_), _, _) => false // TODO: struct map key/value is not supported
- line 2956: case MapType(ArrayType(_, _), _, _) => false // TODO: array map key/value is not supported
- line 2978: // TODO: Remove this constraint when we upgrade to new arrow-rs

native/core/src/execution/shuffle/map.rs (9 lines):
- line 89: // TODO: Getting key/value builders outside loop when new API is available
- line 124: // TODO: Getting key/value builders outside loop when new API is available
- line 164: // TODO: Getting key/value builders outside loop when new API is available
- line 199: // TODO: Getting key/value builders outside loop when new API is available
- line 241: // TODO: Getting key/value builders outside loop when new API is available
- line 275: // TODO: Getting key/value builders outside loop when new API is available
- line 306: // TODO: Getting key/value builders outside loop when new API is available
- line 334: // TODO: Getting key/value builders outside loop when new API is available
- line 368: // TODO: Getting key/value builders outside loop when new API is available

common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java (6 lines):
- line 132: // TODO: (ARROW NATIVE)
- line 141: // TODO: (ARROW NATIVE)
- line 251: // TODO: enable off-heap buffer when they are ready
- line 482: // TODO: read from native reader
- line 490: // TODO: (ARROW NATIVE) Add Metrics
- line 540: // TODO: (ARROW NATIVE) handle tz, datetime & int96 rebase

native/core/src/parquet/read/values.rs (5 lines):
- line 227: // TODO: optimize this further as checking value one by one is not very efficient
- line 270: // TODO: optimize this further as checking value one by one is not very efficient
- line 324: // TODO: optimize this further as checking value one by one is not very efficient
- line 379: // TODO: optimize this further as checking value one by one is not very efficient
- line 840: // TODO: optimize this further as checking value one by one is not very efficient
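The five identical values.rs TODOs above ask for replacing per-value range checks in the decode loop with a slice-level pass. A minimal sketch of the idea (the `check_all_in_range` helper is hypothetical, not Comet's API): one bulk predicate over the decoded slice gives the compiler a branch-free, auto-vectorizable loop, where a check interleaved with decoding usually does not.

```rust
/// Hypothetical helper: validate a whole decoded slice at once instead of
/// checking each value as it is decoded. `iter().all` over a slice compiles
/// to a tight loop that LLVM can auto-vectorize.
fn check_all_in_range(values: &[i32], min: i32, max: i32) -> bool {
    values.iter().all(|&v| v >= min && v <= max)
}

fn main() {
    let decoded = vec![3, 7, 42, 9];
    // One bulk check per batch rather than one check per value.
    assert!(check_all_in_range(&decoded, 0, 100));
    assert!(!check_all_in_range(&decoded, 0, 10));
}
```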
common/src/main/java/org/apache/comet/parquet/BatchReader.java (4 lines):
- line 252: // TODO: enable off-heap buffer when they are ready
- line 549: // TODO: We may expose metrics from `FileReader` and get from it directly.
- line 574: // TODO: handle tz, datetime & int96 rebase
- line 575: // TODO: consider passing page reader via ctor - however we need to fix the shading issue

native/core/src/common/bit.rs (4 lines):
- line 88: // TODO: support f32 and f64 in the future, but there is no use case right now
- line 474: // TODO: should we return `Result` for this func?
- line 541: buffer: Buffer, // TODO: generalize this
- line 851: // FIXME assert!(memory::is_ptr_aligned(in_ptr));

spark/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/CometShuffledRowRDD.scala (3 lines):
- line 57: // TODO this check is based on assumptions of callers' behavior but is sufficient for now.
- line 77: // TODO order by partition size.
- line 146: // TODO: Read IPC via native code

native/core/src/parquet/mod.rs (3 lines):
- line 602: // TODO: (ARROW NATIVE) remove this if not needed.
- line 762: // TODO: (ARROW NATIVE) We can update metrics here
- line 770: // TODO: (ARROW NATIVE): Just keep polling??

spark/src/main/spark-3.x/org/apache/comet/shims/ShimCometShuffleExchangeExec.scala (3 lines):
- line 29: // TODO: remove after dropping Spark 3.4 support
- line 44: // TODO: remove after dropping Spark 3.x support
- line 48: // TODO: remove after dropping Spark 3.x support

native/core/src/parquet/util/bit_packing.rs (3 lines):
- line 21: // TODO: may be better to make these more compact using if-else conditions.
- line 25: // TODO: we should use SIMD instructions to further optimize this. I have explored
- line 28: // TODO: support packing as well, which is used for encoding.

native/core/src/parquet/read/column.rs (3 lines):
- line 879: // TODO: consider using dictionary here
- line 909: // TODO: consider using dictionary here
- line 935: // TODO: is it better to convert self.value_buffer to &mut [i64] and for-loop update?

native/core/src/execution/shuffle/row.rs (3 lines):
- line 1875: // TODO: support other types of map after new release of Arrow. In new API, `MapBuilder`
- line 3115: // TODO: nested list is not supported. Due to the design of `ListBuilder`, it has
- line 3165: // TODO: We can tune this parameter automatically based on row size and cache size.

native/core/src/execution/planner.rs (3 lines):
- line 1128: // TODO: I think we can remove partition_count in the future, but leave for testing.
- line 2084: false, // TODO: Ignore nulls
- line 2191: // TODO this should try and find scalar

native/core/src/parquet/util/jni.rs (2 lines):
- line 79: .unwrap(); // TODO: convert Parquet error to JNI error
- line 229: // TODO: (ARROW NATIVE) check the use of unwrap here

common/src/main/java/org/apache/comet/parquet/NativeColumnReader.java (2 lines):
- line 36: // TODO: extend ColumnReader instead of AbstractColumnReader to reduce code duplication
- line 134: // TODO: (ARROW NATIVE) Handle Uuid?

spark/src/main/scala/org/apache/spark/sql/comet/operators.scala (2 lines):
- line 201: // TODO: support native metrics for all operators.
- line 637: // TODO: support native Expand metrics

spark/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/CometShuffleExchangeExec.scala (2 lines):
- line 123: // TODO: add `override` keyword after dropping Spark 3.x support
- line 325: // TODO: Handle BroadcastPartitioning.
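On the bit_packing.rs entries above: the file holds hand-unrolled unpack routines per bit width, and the TODOs ask for SIMD and a packing counterpart. As a reference point, here is a minimal scalar sketch of the unpack operation (the `unpack_u32` function is hypothetical, not the crate's unrolled code); a SIMD version would replace the per-value 64-bit window with wide loads and shuffles.

```rust
/// Hypothetical scalar reference: extract `output.len()` values of
/// `bit_width` bits (<= 32) from an LSB-first packed byte buffer.
fn unpack_u32(input: &[u8], bit_width: usize, output: &mut [u32]) {
    assert!(bit_width <= 32);
    let mask = (1u64 << bit_width) - 1;
    for (i, out) in output.iter_mut().enumerate() {
        let start_bit = i * bit_width;
        let byte = start_bit / 8;
        // Copy up to 8 bytes into a 64-bit window; the in-byte shift is < 8,
        // so shift + 32 value bits always fit in 64.
        let mut window = [0u8; 8];
        let n = (input.len() - byte).min(8);
        window[..n].copy_from_slice(&input[byte..byte + n]);
        *out = ((u64::from_le_bytes(window) >> (start_bit % 8)) & mask) as u32;
    }
}

fn main() {
    // Two 3-bit values packed LSB-first: 5 (0b101) then 3 (0b011) -> 0b00_011_101.
    let packed = [0x1Du8];
    let mut out = [0u32; 2];
    unpack_u32(&packed, 3, &mut out);
    assert_eq!(out, [5, 3]);
}
```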
spark/src/main/scala/org/apache/spark/sql/comet/DecimalPrecision.scala (2 lines):
- line 40: * TODO: instead of relying on this rule, it's probably better to enhance arithmetic kernels to
- line 111: // TODO: consider using `org.apache.spark.sql.types.DecimalExpression` for Spark 3.5+

common/src/main/java/org/apache/comet/parquet/TypeUtil.java (2 lines):
- line 149: // TODO: use dateTimeRebaseMode from Spark side
- line 173: // TODO: use dateTimeRebaseMode from Spark side

native/core/src/parquet/util/memory.rs (2 lines):
- line 243: // TODO: implement this for other types
- line 290: // TODO: will this create too many references? rethink this.

native/spark-expr/src/conversion_funcs/cast.rs (2 lines):
- line 807: // TODO we should change timezone to Tz to avoid repeated parsing
- line 1084: // TODO some of this logic may be specific to converting Parquet to Spark

native/core/src/parquet/parquet_support.rs (2 lines):
- line 55: // TODO we should change timezone to Tz to avoid repeated parsing
- line 188: // TODO some of this logic may be specific to converting Parquet to Spark

native/core/src/execution/operators/scan.rs (2 lines):
- line 152: /// TODO: revisit this once DF has improved its dictionary type support. Ideally we shouldn't
- line 279: // TODO: validate array input data

spark/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/CometNativeShuffleWriter.scala (1 line):
- line 134: Array.empty, // TODO: add checksums

native/core/src/parquet/schema_adapter.rs (1 line):
- line 142: // TODO SchemaMapping is mostly copied from DataFusion but calls spark_cast

spark/src/main/scala/org/apache/comet/parquet/CometParquetPartitionReaderFactory.scala (1 line):
- line 99: // TODO: we may want to revisit this as we're going to only support flat types at the beginning

spark/src/main/scala/org/apache/comet/parquet/ParquetFilters.scala (1 line):
- line 52: * Copied from Spark 3.4 in order to fix a Parquet shading issue. TODO: find a way to remove this

common/src/main/java/org/apache/comet/parquet/Native.java (1 line):
- line 239: // TODO: Add partitionValues(?), improve requiredColumns to use a projection mask that corresponds

common/src/main/scala/org/apache/spark/sql/comet/execution/arrow/CometArrowConverters.scala (1 line):
- line 35: // TODO: should we reuse the same root allocator across the Comet code base?

common/src/main/java/org/apache/comet/parquet/MetadataColumnReader.java (1 line):
- line 47: // TODO: should we handle legacy dates & timestamps for metadata columns?

native/spark-expr/src/json_funcs/to_json.rs (1 line):
- line 18: // TODO upstream this to DataFusion as long as we have a way to specify all

spark/src/main/scala/org/apache/spark/sql/comet/CometTakeOrderedAndProjectExec.scala (1 line):
- line 133: // TODO: support offset for Spark 3.4

spark/src/main/spark-3.4/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala (1 line):
- line 55: // TODO: remove after PARQUET-2161 becomes available in Parquet (tracked in SPARK-39634)

spark/src/main/scala/org/apache/spark/sql/comet/CometCollectLimitExec.scala (1 line):
- line 40: * TODO: support offset semantics

spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala (1 line):
- line 221: // TODO: consider converting other intermediate operators to columnar.

spark/src/main/java/org/apache/spark/sql/comet/execution/shuffle/SpillWriter.java (1 line):
- line 135: // TODO: try to find space in previous pages

common/src/main/java/org/apache/comet/parquet/ReadOptions.java (1 line):
- line 34: * TODO: merge this with {@link org.apache.parquet.HadoopReadOptions} once PARQUET-2203 is done.

spark/src/main/scala/org/apache/comet/expressions/CometCast.scala (1 line):
- line 51: // TODO add DataTypes.TimestampNTZType for Spark 3.4 and later

spark/src/main/scala/org/apache/spark/sql/comet/CometMetricNode.scala (1 line):
- line 49: // TODO: throw an exception, e.g. IllegalArgumentException, instead?

native/core/src/parquet/read/levels.rs (1 line):
- line 53: current_buffer: Vec, // TODO: double check this

common/src/main/java/org/apache/comet/parquet/Utils.java (1 line):
- line 39: // TODO: support `useLegacyDateTimestamp` for Iceberg

native/core/src/execution/shuffle/list.rs (1 line):
- line 292: // TODO: support nested list

spark/src/main/scala/org/apache/spark/sql/comet/CometBatchScanExec.scala (1 line):
- line 130: // TODO: find a better approach than this hack

native/core/src/execution/sort.rs (1 line):
- line 117: let presize = cmp::max(16, (n << 2) / cfg_nbuckets); // TODO: justify the presize value

spark/src/main/scala/org/apache/spark/sql/comet/CometScanExec.scala (1 line):
- line 70: // FIXME: ideally we should reuse wrapped.supportsColumnar, however that fails many tests

native/core/src/parquet/mutable_vector.rs (1 line):
- line 32: /// TODO: unify the two structs in the future

spark/src/main/scala/org/apache/comet/rules/RewriteJoin.scala (1 line):
- line 69: // TODO this was added as a workaround for TPC-DS q14 hanging and needs

common/src/main/java/org/apache/arrow/c/CometBufferImportTypeVisitor.java (1 line):
- line 300: // TODO: need better tests to cover the failure when I forget to multiply by offset width

spark/src/main/java/org/apache/parquet/filter2/predicate/SparkFilterApi.java (1 line):
- line 28: * TODO: find a way to remove this duplication
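The paired cast.rs / parquet_support.rs TODOs above ("change timezone to Tz to avoid repeated parsing") suggest caching the parsed timezone. A minimal sketch with chrono-tz, assuming the session timezone arrives as a string: parse it into a `Tz` once up front and reuse it for every conversion, instead of re-parsing the string per value.

```rust
use chrono::TimeZone;
use chrono_tz::Tz;

fn main() {
    // Parse the session timezone string once, outside the per-row loop...
    let tz: Tz = "America/Los_Angeles".parse().expect("invalid timezone");

    // ...then reuse the parsed `Tz` for each timestamp conversion.
    for micros in [0i64, 1_700_000_000_000_000] {
        let ts = tz.timestamp_micros(micros).unwrap();
        println!("{ts}");
    }
}
```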

native/core/src/execution/operators/copy.rs (1 line):
- line 190: // TODO: replace copy_or_cast_array with copy_array if upstream sort kernel fixes

spark/src/main/spark-3.x/org/apache/comet/shims/ShimCometSparkSessionExtensions.scala (1 line):
- line 26: * TODO: delete after dropping Spark 3.x support and directly call

spark/src/main/java/org/apache/spark/sql/comet/execution/shuffle/CometBypassMergeSortShuffleWriter.java (1 line):
- line 236: // TODO: We probably can move checksum generation here when concatenating partition files
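The copy.rs entry concerns `copy_or_cast_array`, which unpacks dictionary-encoded arrays before kernels that don't handle dictionaries well. A minimal sketch of that cast-based unpacking using arrow-rs directly (illustrative only, not Comet's implementation):

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, DictionaryArray};
use arrow::compute::cast;
use arrow::datatypes::{DataType, Int32Type};

fn main() {
    // A dictionary-encoded string array: values ["a", "b"], keys [0, 1, 0].
    let dict: DictionaryArray<Int32Type> = vec!["a", "b", "a"].into_iter().collect();
    let arr: ArrayRef = Arc::new(dict);

    // Casting to the value type materializes ("unpacks") the dictionary,
    // which is what the copy achieves for kernels that can't consume it.
    let plain = cast(&arr, &DataType::Utf8).expect("cast failed");
    assert_eq!(plain.len(), 3);
    assert_eq!(plain.data_type(), &DataType::Utf8);
}
```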