Summary: 2361 instances, 2030 unique Text Count // TODO: Less copy/paste between this and normal dicts. 1 // TODO: https://github.com/pytorch/pytorch/issues/47442 1 // TODO: propagate exception_ptr to the caller side 1 # TODO: is there a way to split by device and dtype without appending in the inner loop? 1 // TODO static_assert(AllowDeprecatedTypes, "You tried to register a kernel with an unsupported output type: std::vector. Please use List instead."); 1 // TODO: we don't actually need separate instantiations per dtype; 1 // TODO: currently we do not extract a common factor if divisor and 1 // TODO Avoid vector allocation. One idea would be to keep the std::vector 1 // TODO: Optimize somehow. Remember iterator instead of searching for it linearly. 1 # FIXME this may have the wrong shape when support contains batched 1 # TODO: handle module -> add_scalar -> add_scalar 1 // TODO: support runtime state capture similar to `jitted_gpu_kernel`. 1 # FIXME: Unfortunately, for Windows, we are missing a worker 1 # TODO: grad_output size asserts in THNN 1 # TODO: Make sure out argument is guaranteed to be self 1 # TODO: need to extend this to consider all relevant args instead of just arg[0] 1 // TODO: much of preamble is common to both jitted_gpu_kernel and gpu_kernel 1 // TODO: Since there is some issues in gpu_kernel_multiple_outputs, we are 1 // TODO: remove the "buf_flat_size" process below and extend the buf bound 1 # TODO: Maybe add a separate section for the layernorm/dropout lstms 1 // TODO: This approach is a bit hacky as we assume that the second output is 1 //TODO: actually support int64_t index_t 1 # TODO: See if it's possible to use those directly. 2 // TODO: this should arguably support 3d as well 1 // TODO: Add actual patterns (like Conv-Relu). 1 // TODO: Deprecate this 1 # TODO (wanchaol): remove/merge this with ScriptLocalOptimizer once 1 // TODO: trick the optimizer for case where C % 4 == 0? 1 // TODO: not sure if we need to check for output size != 1, since we 1 // TODO: remove code duplication and unify code 2 // TODO (tugsuu): make sure this is optimized later 1 // TODO: changing at::ArrayRef to at::ArrayRef? 1 # TODO: rccl_LIBRARIES should return fullpath to the library file, 1 TODO: We can optimize this to get rid of unneccesary transformation. 1 // TODO: The binary ops are copypasta. 1 // TODO: Ideally, this pass could run earlier, before inlining 1 # TODO write up issue, maybe fix 1 // TODO: code cleaning in parser so we don't need these. 1 # TODO: make this handle more cases 1 // TODO: It is possible to implement efficient batched orgqr for small tau (tau.size(-1) <= 32) 1 # TODO currently vgg19 doesn't work in the CI environment, 1 // TODO: Add hierarchical collapsible tree? 1 # TODO: Two weeks after this lands, combine these two overloads, 1 // TODO: this doesn't work with Scalar-Tensor ops! We should 1 // TODO: [T87180544] Implment softmax/log_softmax in metal shaders 1 /* FIXME: workaround for bug: https://github.com/pytorch/pytorch/issues/20342 */ \ 1 // TODO: can batch_first be a wrapper around this function? 1 aliased_input = aliased_input_iv.toTensor(); // TODO: Can we avoid saving this tensor and incurring the refcount bump? 1 # TODO: Add type annotations 1 # TODO: https://msdata.visualstudio.com/Vienna/_workitems/edit/1408006 1 // TODO: Remove the last use of Device(const std::string& device_spec). 1 # TODO: better name 1 // TODO: Fix resize_as_. See pytorch/pytorch#11665. 
1 // TODO: fix the constness of target 1 // TODO: merge this into populate_operands 1 // TODO: A useful internal assert would be to show that device_opt_ is null 1 # TODO: this only checks one input and one output, need to generalize to multiple 1 padding_mode: str = 'zeros', # TODO: refine this type 3 /// TODO change this to const GraphT& g 1 // TODO Re-enable logging 1 # TODO: rename weight_is_statically_quantized to weight_is_int8_quantized 1 # FIXME: Ideally these functions should be methods on Type class, but we have a 1 // TODO: Set device to input 2 # TODO: xinyu, figure out why Nvidia do this? 1 TODO (mingzhe09088): 1 # TODO: handling of slice 2 // TODO: Calling add_context O(n) times has O(n^2) cost. We can fix 1 // TODO optimize 1 // Run backward (TODO: make this async?). 1 input.ndimension() == 4 && // TODO: 5-D contiguous depthwise is not supported yet, need benchmarks 1 // TODO: not that it matters, but mark the right type here; 1 # TODO: I'm not sure if the clone here is necessary but it is safer 1 # TODO: maybe add a pass to cleanup bn modules? 1 // TODO: AutoNonVariableTypeMode should be removed in release 1.10. 1 // TODO: If we need TORCH_CHECK(available()) calls here as a sanity check, add it. 1 // TODO: Clean this up, Naoya added a mechanism we should be able to reuse. 1 # TODO: consider regex matching for test filtering. 1 // TODO: Support ComplexHalf accessor 1 // TODO: cut this over to HIP dispatch once we stop pretending that CUDA 1 // TODO: This attempts to keep the underlying memory alive by setting the base 1 // TODO: lower simple tuples ? 1 //! TODO: we also need to assert/check reduction axes and replace it with 1 # TODO: Update this once bfloat16 and float16 are better supported. 1 // TODO: check if all nodes have lowerings 1 // TODO: war_sync_ is only used for testing/validation purposes. 1 # TODO: make img_data a single example instead of a list 1 // TODO: make this point at hip_dispatch_ptr 1 # TODO: __reduce__ 2 # FIXME: delete this 1 // TODO Kimish, we are allocating affine_quantized regardless of per channel or not. 1 // TODO: Better formatting. Right-align this. 1 // TODO: why do we assume there's a single TV output? 1 # flaky test - TODO fix 1 // TODO: eventually, codegen these calculations and make them part of the 1 // TODO: change the types to vector 1 // TODO: remove or return launch parameters 1 // TODO: move this to a more shared place. 1 // TODO: If I put Undefined as entry 64 and then adjust the 1 // TODO: Rfactor a Welford 1 // TODO: Find way to expose alias info for opaque tensors. 1 {} # backend_config_dict, TODO: point to README doc when it's ready 1 # TODO: use foreach API in optim._functional to do all the computation 1 # TODO: when https://github.com/pytorch/pytorch/issues/33782 is fixed 1 // TODO: this is only needed when dequantize_output_ == false but leave 2 # TODO: FIXME: RuntimeError: "bitwise_xor_cuda" not implemented for 'Half' 1 // TODO: Maybe distinguish between visible size and storage size. 1 # TODO: add more validations 1 # FIXME: mean does not support passing None to dim 1 // TODO: can we convert outputs_ to store indices? 1 # TODO: Validate ceil_mode semantics. 1 # TODO: update sample inputs with for_inplace_variant kwarg to support this test 2 // TODO optimize 1 // TODO exprs can have multiple accesses... we're returning the first but that 1 // TODO : omp parallelization 1 # FIXME: "prod_cpu" not implemented for 'Half' 1 // TODO: these groups should never merged into any other groups, but are 1 // TODO Licensing. 
4 # TODO: when python 2 support is dropped, change the signature to 1 // TODO: deprecate this in favor of aten::getelem 1 IntType::get()}, // TODO This type should be removed from the schema 1 // TODO: move as much of this into the constructors. 1 // TODO: T21635077 fix float-divide-by-zero undefined behavior 1 * TODO: Look into changing the threading semantics of Generators in ATen (e.g., making 1 // TODO: Enable view in parser by detecting non-alias view operation 1 // TODO: support? 1 // TODO: This is nuts! There is no reason to let the default tensor type id 1 TODO: add dequant, if needed 1 // TODO: Explicitly test the 3 cases below 1 * TODO: https://github.com/pytorch/pytorch/issues/56482 1 // TODO: follow up with MKLDNN what the best way is 1 // TODO Find a better way to handle this. 1 // TODO: CUDA polling 1 // TODO: can use inplace ops? 1 # TODO: refactor this part to a function 1 // TODO: maybe this can go later in pipeline / directly in autodiff forward 1 // FIXME: this is a slow, simple implementation; need up/down sweep, 2 // TODO: We're going to get a lot of similar looking string literals 1 // TODO Replace by Reshape(), once wrappers are written 1 //! TODO: Some expr simplifications could also be helpful 1 // TODO: assert requires_grad=False 1 # TODO: delete this! 1 raise AssertionError(f"TODO add support for type {repr(typ)}") 1 // TODO: consider if this should explicilty check for the file's existence or not to throw 1 // TODO: [T87350528] Fallback to shader kernels for 10.0 users 1 // TODO: Check if miopen has the functions above and unify 1 // TODO: support dynamic input by profiling it 1 except Exception as e: # TODO: Catch only timeout exceptions 3 // TODO: Cleanup this. 1 // TODO: make it easier not to do O(k) iterations over the graph, where 1 // TODO: lookup by historic string key to start, then issue key 1 // TODO: add a test 3 # FIXME: improve load_tests() documentation here 1 # TODO: FIXME: RuntimeError: "max_elementwise_cuda" not implemented for 'ComplexFloat' 2 # TODO: Once we decide to break serialization FC, we can 1 # TODO: This method needs a refactor for clarity 1 // TODO: Only replay dispatch is really borrowed from TransformIter, we should 1 # TODO: remove dispatch section when porting TH CUDA to ATen 1 //! (TODO: need structure better so we don't have to do this) 1 # TODO @ansley: add `Union` once landed 1 # TODO: Once we decide to break serialization FC, `storage` no longer needs to 2 (NOTE: Numbers below prior with missing parameter=update step, TODO to update) 1 # TODO: Possibly support variable-sized inputs. 1 // TODO: this class needs cleanup 1 # TODO this should be removed now that gpu support for quantization is being supported. 1 // TODO: handle dynamic shapes here. 1 // TODO: Check contiguous and dtype. 2 // strong refcount TODO: pack these into one word 1 // FIXME: this may not actually make any sense if we can efficiently move 1 // TODO: generalize logic 1 // FIXME: do these actually need to be zeros_like or can they be empty_like? 1 at::AutoDispatchBelowADInplaceOrView guard; // TODO: remove 3 // TODO: Need to handle other Stmts / Exprs that read / write buffers. 1 // TODO: today, we put a single bailout template at the front to 1 // TODO: Potentially the following logic can be replaced by special logic in VariableType_x.cpp 1 // TODO Use list_element_from? 2 /// TODO: it's not in native_functions.yaml yet as it's not exposed to python 1 // TODO: Caffe2 Concat has an extra output. 
It should be only 1 // TODO: This should be something like TVMC2Frontend::supports(op); 1 # FIXME: mean reduces all dimensions when dim=[] 1 // TODO: maybe this should be a more shared location? 1 // TODO: This qint special case looks very suspicious... 1 // TODO We shouldn't use c10::impl stuff directly here. We should use the 1 # TODO currently onnx can't translate squeezenet :( 1 // TODO return generator object when torchscript supports RNG 1 // TODO Test use_count 1 # FIXME: min does not accept scalar inputs 1 // TODO: Brian Vaughan observed that we might be able to get this to work on 1 // TODO: consider printing more proper schema strings with defaults, optionals, etc. 1 // TODO: TensorArg check should start handle memory format 2 # TODO: complete the data type: bool, float16, byte, string 1 // TODO: Figure out why const-correctness doesn't work here 1 // TODO: review jiterating igamma and igammac if/when a persistent (across processes) 1 // TODO: Maybe eliminate required_size and just rely on next_pointer for bounds checking. 1 /// \todo FIXME: remove when \c constexpr becomes really \c constexpr 1 # hardcoded for now, TODO: expose the api to user, 1 // TODO: Remove??? 1 # FIXME: nansum reduces all dimensions when dim=[] 1 // TODO: once modules support arbitrary ivalue attributes, we don't need this 1 // TODO: handle loop 1 // TODO: update this to support COO sparse layout 3 // TODO: temporarily disabled 1 // TODO: may need/want to initialize CUDA context here (refactor into nvrtc call) 1 # TODO: In principle, we track device information in our trace, so it 1 // TODO Disallow this and rather use std::unordered_map/set everywhere 1 * - Pattern graph nodes cannot alias. TODO: the check not implemented yet. 1 # TODO: create dedicated columns 1 # TODO: remove the skip after these two operators schemas are fixed 1 # TODO: default exact_device to True 1 # TODO: review porting these to make_tensor 1 // TODO (possible optimization): 1 // TODO: Potentially more enforcements are necessary to avoid accidental 2 # FIXME add an override for JIT and revert 0. back to 0 1 // TODO: try more memory reuse algorithms and compare their memory efficiency. 1 # TODO: L1HingeEmbeddingCriterion 1 // FIXME: doesn't work for bias so we shouldn't quantize bias before 1 // TODO: we are sometimes emitting expressions like 1 // TODO: current RRef implementation does not tolerate failures 1 // TODO: more sane strategy 1 // TODO: handle channels last 1 TORCH_CHECK(std > 0.0, "normal_ expects std > 0.0, but found std=", std); // TODO: dedupe 1 # TODO: add shape checks 2 # TODO: some signatures of std_mean do support out 1 // TODO: Handle non-contiguous Tensors. 1 // TODO: Consider readding Tensor and TensorList constructors here, when 1 // TODO - XXX - if any output is the same tensor multiple times, views 1 // TODO Add GPU support by writing a generic wrapper. 1 // TODO: consider unrolling CopyItems to make elemental types copy faster 1 // TODO: ListType case 1 # TODO: merge this to the case above? 1 // TODO: unify to C10_MOBILE. In theory this header could be used in OSS. 1 # FIXME: prod does not support passing None to dim 1 // TODO: This unwrapping code is ONLY used for TH bindings; once TH goes 1 // TODO: remove after empty Tensor serialization is forbidden 1 // TODO: Remove join() 1 # TODO: Once we have proper scoping, stop reimplementing chunk, delete this 1 // TODO: handle multiple kernels. 
1 # TODO: FIXME: jiterator doesn't support non-tensor inputs 1 // TODO move to typeid.h (or codemod away) when TypeMeta et al 1 // TODO: can i haz an out version of the conv2d? 1 * TODO: move this to fbgemm after making transpose routine more general 1 // TODO clean up 1 # TODO properly map the exceptions in pybind (c10d/init.cpp) 1 // FIXME: try switch statement and explicitly handle cases 1 prim::unchecked_unwrap_optional, // TODO remove 1 // TODO Make quantize_tensor_per_channel_impl work for other datatypes too 1 // FIXME: it is actually possible for us to handle padding, figure 1 // TODO: should be b.transpose(1, 0)? 2 // TODO: add the max/min clip 1 // TODO this should go in the global Python CU 1 // TODO: return iterator 2 # TODO: Add calibration for the sparsity 2 # TODO: sure looks like we unconditionally initialize the context here 1 # TODO: RuntimeError: cholesky_inverse does not support automatic differentiation for outputs 1 // TODO: channels last 3d 1 # TODO: Consolidate `i0e` with sample_inputs_unary when `make_tensor`, 1 # TODO: when FL is migrated from full-jit to lite trainer, remove '__ROOT__' 1 # TODO: Delete this line once https://github.com/pytorch/pytorch/pull/55889 lands 1 # hacky check for collections.namedtuple, TODO improve this 1 // TODO move this to c10 namespace 1 # TODO: Handle dilation 1 # TODO: __name__ not set for submodules in recursive script 1 // TODO: type is unknown until a user starts to fill data; 1 # TODO: type dim as BroadcastingList when 1 # TODO: remove these flags when 1 // TODO: verify the same thing in miopen 1 # TODO: Once we decide to break serialization FC, this case 2 // TODO: only handling conv2d at this moment, expand this to convXd 1 # TODO (wanchaol): remove this once we added TorchScript 1 // TODO: add %dtype after when https://github.com/pytorch/pytorch/issues/34351 1 // TODO Can be replaced with packB->getOutputChannels() when update pre-pack 1 # FIXME: mean does not support passing keepdim without passing dim 1 # TODO: Add Gumbel-Laplace KL Divergence 1 # TODO: assertions could be expanded with the error messages 1 // TODO: Instead of cudaStreamSynchronize it is possible to add Stream 1 # TODO: These are deprecated, maybe we shouldn't type hint them 1 # TODO: create a dedicated SelfArgument type for 'self'? 1 // TODO: total hack. Switch to numel when it is available. 1 //! TODO: unify the segmented and un-segmented code-path 2 # TODO: Need to handle collisions with argument names at some point 2 // TODO: split out 1 // TODO: Look up cached contained types. this is kind of tricky 1 # TODO: output dtype in the config and parse it back from the str 1 # TODO: --op_registration_whitelist will be removed when all call-sites 1 // TODO: scalar-tensor ops should be canonicalized 1 // TODO: Once copy_ is fully migrated to use dispatcher, handle named 1 // TODO: replace Half by BFloat16, after BFloat16 is supported by Nvidia 2 // TODO: support channels last in sum 1 // FIXME: really, overlapping writes should be illegal/an error in Torch 1 // TODO: remove this code path once Variable and Tensor are merged in Python 1 * FIXME: Can we specialize elementwise_kernel and launch_kernel in Loops.cuh 1 //TODO: [ROCm] Need to remove this after CUDA->HIP mapping is updated. 
1 # FIXME: prod reduces all dimensions when dim=[] 2 // TODO: use tensor.index() after improving perf 1 # TODO: the name should be weight is int8 quantized 1 // TODO: Workaround since MIOpen does not support NHWC bias 2 // TODO: jiterate kaiser window and make them only available when not jiterating 1 // TODO: handle strides? 1 // TODO: fix it after the storage sharing is merged 1 return f.func.name.name.base # TODO: should be str(f.func.name.name)? 1 // TODO: enable input fillers 1 // TODO: fix this when windows can correctly capture variables in nested lambda 2 # FIXME: Once 3.7 is the minimum version, type annotate `other` per PEP 563 1 // TODO: _batch_norm_impl_index_backward is only used in JIT. cudnn NHWC 1 // TODO Replace by Reshape(), once wrappers are written 1 // TODO: use nccl reduce 1 // TODO: Make HIPIFY understand CUDART_VERSION macro 1 // TODO: contiguous is called for further jit optimizations. 3 # TODO: implement add_param_group 1 // TODO: Re-audit this; it used to be an indexSelect directly into r_values 1 // TODO: think more about TensorExpr alias correctness 1 // TODO: This is a known bug that it's not yet implemented and the allocation is failing. 1 // TODO: if we need to, we can also enable this path for quantized tensor 1 // TODO: Rewrite this using dispatchKeyToTensorOptions 1 # TODO: hip_hcc has an interface include flag "-hc" which is only 1 // TODO: Maybe skip this for fixed-size outputs? 1 # TODO: There is a bug in rocblas's & rocfft's cmake files that exports the wrong targets name in ${rocblas_LIBRARIES} 1 // TODO: Support more dtypes. 1 // TODO: implement equality 1 # TODO: reference function 1 # FIXME: does not support passing keepdim without dim 2 // TODO: first operand for pow can be Tensor / Scalar 1 // TODO: this could be insertReferencedAttr to be more clear, 1 // TODO: canonicalize as aten::dim ? 1 // TODO: There is no need to branch with every element 1 // TODO: packing + quantization in same block. 1 // TODO: cache these result from the forward pass 1 // TODO: channels last kernel can be made faster. 1 // TODO: implement _out variant avoiding copy and using already allocated storage directly 5 # TODO: fix torch.zeros(sizes, grad.options()) before enabling select, topk, kthvalue 1 # TODO: current patterns are the ones after fusion, we will want to expose fusion 1 // TODO: check for overflow 1 // TODO: we need to figure out what are supported input scalar 1 // TODO: When adding failure retries and timeout, this fork needs to be 1 # TODO: Update the comment reference to the correct location 1 # TODO: Remove this dependency 1 // TODO: check type and return the right flag 1 # TODO remove Dropout special after codebase stable 1 // TODO: not sure if optimizer is able to compile two levels of 1 // TODO: implement negative step 1 /* TODO: TORCH_CHECK just have 2 args: condition and message */ 1 // TODO: replace "PredictorParameters" with the constant in OSS bbp 1 # Stores underlying RecordFunction as a tensor. TODO: move to custom 1 // TODO: Write a more refined GRU heuristic. 1 .target_device(options.device()) // TODO: this shouldn't be necessary if it came from options 1 # TODO (alband) Why this is not automatically broadcasted? (had to add the repeat) 1 // TODO: incorporate center distance normalization. 
1 # FIXME: does not support dim=None 2 // TODO: avoid extra copy by directly feed initializers to backend blobs 1 // TODO: limit it only to amp related node; 2 // FIXME: crashes if exception type is not RuntimeError 1 # TODO: This discrepancy isn't required; we could also generated 1 // TODO: this doesn't work with Scalar-Tensor ops! We should 2 # TODO: This implies that ellipses is valid syntax for allocating 1 // TODO: we should support any type, not just float 1 // TODO: I think we should do this but 1 # TODO: refactor this part to return WEIGHT_INDEX_DICT and BIAS_INDEX_DICT 1 // FIXME The two resizes below zero out the vectors, which is not needed. 1 # TODO: This is a potential source of accuracy drop. 1 # TODO: review with var_mean tests in test_autograd.py 1 // TODO reuse col_offsets_with_zero_pt_s8acc32_ref in fbgemm 1 # TODO: Add an example to use such a wrapper. 1 # TODO: Add Gamma-Laplace KL Divergence 1 // TODO: elaborate in this comment on the structure of math.cuh 1 // TODO: We currently expect contiguous memory layout. 1 // TODO: Maybe 260 KB is a bit small... 1 # TODO: Call the torch.ao.utils.convert in here 1 # TODO: Handle this case better. TorchScript ranges are in bytes, 1 // TODO: remove output dequantization once NNC supports quantized outputs. 1 // TODO: Maybe more user friendly to report where the expected size 1 // TODO: restore the above, see https://github.com/pytorch/pytorch/issues/64709 2 # TODO: after having proper ways to map Python strings to ATen Enum, move 1 // TODO: rotated_rect_intersection_pts is a replacement function for 1 // TODO: remove when possible, since it just slows down 1 # TODO: We should really error in this case, but its bc-breaking so 2 // TODO: check x for consistency with input_size? 1 // TODO: put this in the correct device??? 1 // TODO: Make this a real argument 1 # TODO: clone storage aliasing 1 // TODO benchmark to decide whether to remove this special case 1 // TODO: tune this with cache size detection code. Changing to 32 helps on some 1 # TODO: bfloat16 support. 1 // TODO: add requires_grad once we decide on semantics for sharing data. 1 //! TODO: we should probably change shared_ptr to unique_ptr, as we want to 1 TODO: when scale != 1 is introduced then use: 1 # TODO blowtorch 1 // TODO: switch to PyTorch dtype as it's closer to truth. 1 // TODO: Make rpc_op(..) support taking kwargs, 1 # TODO: We will save the original weight and bias, because the unpacking is not yet there. 1 // TODO: Add documentation. 1 // TODO: even though both scopes are conditional, we can merge accesses if 1 # FIXME: torchscript: torch.zeros(sizes, grad.options()) 3 // TODO: for (1), multiple -1 may conflict each other. Consider use 1 // TODO: add parallelization once fbgemmGroupwiseConv supports multi-threading 1 // TODO: consider caching the heuristics value so tryMerge doesn't have to be 1 // TODO: hadd_pd() & hsub_pd() may have scope for improvement. 2 # TODO Add some asserts about input type 1 // FIXME: This probably should be called WorkGloo since the work is executed in sync mode 1 # TODO: can we use JIT here to reduce python overhead? 1 // TODO At some point this should probably be done, including tricks 1 # TODO: PartialLinear - maybe in sparse? 1 # FIXME: figure out a better way when we support sparse tensors in jit 1 // TODO: Unify 2d and 3d when ChannelsLast3d is ready. 
1 # TODO: @kefeilu: also incorporates a validator to do inference (and optionally) 1 # TODO: These functions are not used outside the `fuse_modules.py` 1 // TODO: perhaps it would be nice to have int128, a signed 128-bit type? 1 # TODO: Do this in translate instead 1 // TODO: expand support to wire non-constant inputs, this is currently 1 # TODO: Move orelse to the body after calling ConditionalExceptionWrapper. 1 // TODO: hacky way of deciding the groups 1 // TODO: add a note explaining the design decisions 1 // TODO: update this when codegen can output scalar 1 # TODO: merge with get_static_quant_module_class 1 // TODO: We could squeeze some perf by calling at::cuda::mul_out here instead, to bypass the dispatcher. 1 # TODO: remove this as onnx opset 11 spec allows negative axes 2 # FIXME: logical_or does not accept scalar inputs 1 // TODO: convert to c10::optional 1 // TODO: I'm assuming we have stride information in `graph->toString` 1 # TODO: create an internal helper function and extract the duplicate code in FP16_compress and BF16_compress. 1 # TODO: we should probably add them in 1 // TODO: ideally we should be strict here and return nullopt if the dtype is 1 // For cases with scripting, TODO: Add logic to handle NoneType outputs 1 // this should be removed. TODO (kimishpatel) 1 // TODO Update quantize_tensor_per_channel_impl implementation to follow 1 //TODO use _assert_fail, because assert is disabled in non-debug builds 1 # TODO: delete the original weights 2 // TODO: replace conv1d with conv2d ? 1 //! TODO: refactor this similar to isConsumerOf 1 // TODO: If caching inputs would require persistence we are sending it to the 1 // FIXME: by doing this only on the inputs, we only capture graph inputs and 1 # TODO: is_reference option for conv module 1 DecorateInfo(unittest.skip("FIXME: Jiterator does not support complex outs!"), 1 # TODO: consider adding torch.unravel_index 1 * TODO: make this API also take "offsets" rather than "lengths" to match the 2 this->Size = this->Capacity = 0; // FIXME: Setting Capacity to 0 is suspect. 1 // TODO: Add the assert after all uninitialized states are eliminated 1 # TODO: Update comment below since it is out of date. 1 // TODO: it seems like sparse_dim == 0 could be supported even if self.dim() > 1 // TODO: after we add Get(DeviceType) 1 # TODO: update this when promote_types supports bfloat16 and/or 1 // TODO: figure out how to do chaining 1 # TODO finish disentangling control flow so we don't do in-projections when statics are passed 2 // TODO: run root tasks inline in inference mode 1 // TODO: Remove this else. Or add assert 1 rpc_backend_options (rpc): configurations/options for the rpc TODO: fix 1 # TODO: Determine if the other cases need to be fixed as well 1 // TODO: Maybe show key? 1 // TODO: Deduplicate from compute_at.cpp 1 // TODO: Maybe TensorAccessor can be used here? 1 # TODO: properly handle case when u is tuple instead of only taking first element 1 // TODO Direct implementation might be faster 1 # FIXME: document this and move it to test_serialization 1 # TODO (zaf): Inherit from `quantized.Linear` (T83294430) 1 // TODO: Is it worth it to have a contiguous call or maybe we should go with 1 // TODO: Revisit this once we decide on how dependencies analysis should look 1 // TODO: replace __syncthreads with __threadfence for alias ops 1 // TODO: this is not a legit assumption? Can't we run with 1 # TODO: rename this to supports_bwgrad_bwgrad to be consistent with below 1 // TODO: we are only checking output 0. 
This means that our current check for 1 // TODO: do HIP 1 // TODO: revisit this later to use batch_iterator_with_broadcasting in triangular_solve 1 base_name = f.func.name.name.base # TODO: should be str(f.func.name.name)? 1 // loop variable. TODO: Remove this constraint. 1 // TODO: move to this an internal IR. 1 // TODO: put this into ir_cloner instead 1 // TODO: are these a single call to cublas batched matmul? 4 // TODO: dont require # of dimensions of tensors set ? 1 // FIXME: it's vA < vB because the sorting order for V (aka 1 # TODO: maybe the logic to search for all variants is no longer necessary? 1 # TODO: Make this an enum. 1 # FIXME: remove this by updating test suites using it 2 // TODO: investigate parallelization of the accumulate kernel. 1 ret = input # TODO: remove when jit supports exception flow 3 # TODO: What exactly is the semantics of the 'dispatch' field? 1 // TODO: use guards to avoid leaking 1 // TODO (@zasdfgbnm): this function assume trivial 1d and no dynamic casting 1 // TODO: This will only be useful if we write a backend fallback that plumbs dispatch keys (currently there are none) 1 // TODO: refactor all current uses of this function to the Opt one 1 // TODO: Move this function to MetalContext 1 // TODO: fast exp 1 // TODO: Allow individual functions to specify non-default translations: 1 // TODO: make sure there's nothing wrong with grouping of nodes that 1 # TODO: Why skip this? Because @torch.jit._overload_method will 1 // TODO: enable other min/max variants, operators that can be both 1 // TODO: this should be handled during dispatch, but that's missing... 1 // FIXME: we don't actually handle a dynamic padL > 0 1 //! TODO: remove inline versions of this as much as possible 1 # FIXME: figure out the flaky -1024 anti-leaks on windows. See #8044 1 # TODO: handle the other Ju 1 // TODO: specialize fbgemm::Quantize for a single vector and make it 1 // TODO: need to call GenerateClone sometimes? Or else return LowerBuiltIn() directly 1 // TODO: runOnGraph takes const ref? 1 // TODO: don't strictly need to reset write cache, evaluate on models 2 // TODO: move this to Buf printer 1 // TODO: 1. Add assert in the dist engine to ensure no GPU NodeTasks during 1 # TODO: consider more complicated noncontiguity schemes 1 // TODO: Handle all other cases here. 1 // TODO: Decouple and improve error handling and messages. 4 // TODO: free GIL - but remember to reacquire it when an exception is thrown 1 // TODO: caffe2::PThreadPool only provides a data-parallel API. 2 # TODO: write a non-layer checker and log it 1 // TODO: Update alias key precedence after we add new alias keys AutogradDispatchCPUOrCUDA . 1 // TODO: Deprecate me 1 // TODO: refactor bindFusionInputs to better support this 1 points for compilation (TODO add a link when the rules are published). 1 // TODO: Use name-hint of the producer indices instead of 'idx' 1 //! TODO: make sure there's nothing wrong with segmentation on nodes that 1 // TODO: can be unified with at::from_blob when Tensor is merged and string 1 // TODO: trick the optimizer for case where C == 4? 1 # TODO: discuss a unified TorchScript-friendly API for autocast 2 * // TODO modify docs 1 - Fix various TODO comments in this file and the JS. 
1 // TODO: Activation transpose before and after the kernel can be removed if we 1 // TODO we need to refactor graph APIs (e.g., addInputs) 1 # TODO: What follows is a reference implementation of a masked sum 1 # TODO: it's possible that the following is confusing: 1 // TODO: remove following two after at::kDouble and its friends are TypeMeta's. 1 // TODO: This is annoying, having to put the cudnnTensorDescriptor_t 1 // TODO: Make this accept options instead of dispatch key 1 # TODO: this is not handling non-tensor tuple args (for example, 1 // TODO: investigate parallelization of the accumulate kernel. Unlike the non-accumulate case, 1 # TODO: handle objects with deeper nested tensors 1 /// TODO: Eliminate this function as much as possible, as it can be expressed 1 // TODO This should be deprecated in favor of linalg_matrix_exp_differential 2 // TODO: When lint fails, give better indication about which 1 # TODO: need to specify this is side-effectful somehow 1 // TODO: Use name-hint of the producer instead of "temp" 1 // TODO: can non_c10_complex go through the other path? Need to verify. 2 inline void Record(Event* ev, const char*&) const { /* TODO */ 1 // TODO: we could attempt to follow the GetAttr chain and 1 // TODO: Use itensor after 0-dim is supported. Now use CPU tensor. 3 // TODO: Issue #20497 2 // XXX TODO: revisit the alternatives 1 # TODO: replace torch.subtract/divide/square/maximum with 1 # TODO Movie this to C++ once the jit has better support for torch.Size. 1 # TODO: some signatures of var_mean do support out 1 // TODO: failure in buildShapeExpressions should not break fusion execution, 1 # TODO: maybe create a PythonTensorOptionsArgument? 1 # TODO: this is looking into how the value is used in the future 1 // TODO: Make this actually return something that's "user friendly". 1 // TODO: empirically, on OS X this assert appears to be untrue 1 // TODO: figure out if we can narrow gO and save some compute, 1 // Trigger tests for D25440771. TODO: Remove this line any time you want. 1 # TODO: support get_autocast_gpu/cpu_dtype 1 // TODO: Do we have a ScalarOrTensor type? Would such a thing exist? 1 // FIXME: _fft does not support complex_output=false with inverse=false 1 # TODO: There's an issue here with FC. It might be impossible to 1 // TODO Making operator!= noexcept if operator== is noexcept doesn't work with 1 // TODO: This file is still in the caffe2 namespace, despite living 1 // TODO Handle other composite types, such as vector<...> 1 # TODO: expose other parameters in the future. 1 # TODO: Conv2dMap 1 TODO: Need a clean way of loading the state of the "preapred" module 1 // TODO: Cache indices in forward pass to re-use in backward 1 # TODO: Why do I have to call this grad?! 1 // TODO: this is not exposed here, I need to remove that before inserting 2 // TODO: This code can path can be removed if #61309 is resolved 1 // TODO this should come from cmake 1 // TODO: We might need to use nodes_map instead of value_map. Otherwise, we 1 // TODO: pass tracer and counters through ExecutorHelper 1 // TODO: uncomment when we properly support pow 1 // TODO: share more logic with tensorexpr_fuser ? 1 # FIXME: keepdim parameter is ignored when dim=None 2 # TODO: Transform at load time to share weights with CPU model. 
3 // TODO: Reuse convertToDotString once convertToDotString can work 1 # TODO: Add support for multiple parametrizations for the same weight 1 // TODO: RRef internal messages are not yet idempotent 1 // TODO: verify that the CUDA handles copy from device to device correctly 1 // TODO: Add an IR verifier check to detect invalidly compressed buffers. 1 # TODO: Once we decide to break serialization FC, no longer 2 // TODO: Handle nested dictionaries. 1 # TODO: more robust handling of recognizing ignore context manager 1 // TODO: Use bit_cast once C++20 becomes available. 1 // FIXME choose desired unroll level 1 # TODO from zdevito: 1 // TODO: use a dedicated bind-var to make sure v is not evalualted multiple 1 # FIXME: torchscript: int - bool 1 // TODO: may be more efficient to place after the first non-computeAt 1 /// TODO: The consistency check here is inconsistent with StreamGuard's 1 assert len(schema.keyword_values) == 0, "TODO the logic for operand(i) is broken if there are kw values" 1 // TODO thread_local might fix this 1 // TODO Union types are not supported on embedded runtime, and we need to 1 // TODO:: check restrictions for inputs; outputs not used elsewhere 6 // TODO: consider adding a 32bit indexed kernel for improved performance 1 using Stack = torch::jit::Stack; // TODO Instead of this, move torch::jit::Stack to the c10 namespace. 2 // TODO: Would be great if we didn't need this, but we have nice functionality 1 # TODO: clone indices in sparse tensor ctor. 1 name = f.func.arguments.out[0].name # TODO: old codegen behavior - should fix 1 // TODO: add memory types. 1 // TODO: delete this and update accessor for value_map(_) 1 # TODO: figure out how to return torch.return_types.histogramdd 1 // TODO: turn contiguous_ into an enum CONTIGUOUS, NONCONTIGUOUS, 1 // FIXME: TestJit.test_ge_optimized fails this assertion. 1 // TODO: Make this more efficient 1 // TODO: consider using C++ stack trace 1 // TODO: When handling fused ops d = a + b + c, the correct 1 // TODO: Support only 9 args once the old signature has been removed. 1 # TODO: return this to the caller 1 # TODO : Fix these discrepancies 1 // TODO: here, we create a Storage 1 // TODO: Perhaps we should use cloneFrom now, as it seems unlikely 2 // TODO remove these switches once interface call is rolled out. 1 # TODO: check schema 1 // TODO: merge this with at::empty after Tensor is merged 1 attr.isTensor()) { // TODO: Handle float/int 1 TODO: refactor so we don't set inputs onto the subgraphs 1 // TODO: consider using C++ stack trace 1 # TODO: See if we can remove this in the future 1 // TODO: this should be replaced by RAII wrappers. 1 // TODO: Maybe track by dtype as well. 1 raise unittest.SkipTest('TODO: Memory availability checks for XLA?') 1 # TODO: Determine whether this can be removed after type inference. 1 // TODO - Implement rfactor 1 # TODO: Uniform-Laplace KL Divergence 1 // TODO: Do we need this, or can we get it from getMatcherGraph? 1 # TODO: This was historically used to help some JIT interop code 1 # fbgemm (FIXME: this assumes fbgemm is used only for NHWC and im2col 1 // TODO: Consolidate this file with util.h 1 # TODO: Update all quantization tests to use this decorator. 1 // TODO : `unary_jitted_gpu_kernel` for cleaner UX. 
1 {}); // TODO: https://github.com/pytorch/pytorch/issues/55757 1 * TODO: This will only be useful if we write a backend fallback that plumbs dispatch keys (currently there are none) 1 // TODO: int -> int64_t 1 ss << "{TODO implement Node::shape}"; 1 tmp_y[i] = _cvtss_sh(y[i], 0); // TODO: vectorize 1 // FIXME: In case forward has multi outputs, we only support one requires grad 1 # TODO: support dynamic quant 1 # FIXME: heaviside does not accept scalar inputs 1 TODO: This is fragile, whether output is quantized should not depend on `is_reference` since 1 # FIXME: cannot specify keepdim without dim 2 // TODO: arange doesn't have complex support 2 1); // TODO: expose as argument? 1 # TODO: optional start/end attribute. 1 // TODO: gcc vectorization 2 // TODO: Find way to expose alias info for opaque tensors. 1 // TODO: MemoryFormat is not implemented in this way 1 # TODO: Maybe embed the enforced zero_point in the `torch.iinfo`. 1 // TODO: remove this comment after enabling autograd support for CSR tensor 1 // TODO: Reduce this extra TensorIterator construction for Reduction::Mean & Sum. 1 // TODO: (@pavithran) size is overloaded with int[] and Tensor 1 seed=random.randint(0, 100000), # TODO: dropout seed 1 // TODO: unlock GIL when contacting the manager 1 // TODO: call into `convertOutputToCorrectStrides`. Currently this causes a 1 // TODO Needs to be fixed this to work in all cases 1 // TODO: We should fix this when we have some time. 1 // TODO: may be more efficient to place before the first non-computeAt 1 // TODO: try_get_weight_buf returns a Tensor, but _cudnn_rnn below takes a c10::optional 1 # TODO: enforce that functions in fp32_to_int8_fun_mapping must both be 1 // TODO these arrays are potentially of the different types, use function 1 # TODO: there should be a more principled way of doing this. 1 # TODO: When `len(inputs) == 1` and all inputs are on `destination`, just 1 // TODO: make this cheaper by computing hash of fdata. 1 // TODO: Resize and copy should be avoided with 1 // TODO: remove when the function in math are changed to use vector 1 // TODO: replace input.svd with linalg_svd 1 // TODO: Ideally AutoNonVariableTypeMode in this file should be changed to 1 // TODO: move to TensorMath.cpp 1 # TODO: Should we do this even for non-contiguous tensors? 1 // TODO: we can actually support it by adding an extra inputs to the 1 // TODO: To support all features of MemoryFormat::Preserve we need to add 1 // TODO need guards on init returning none 1 # TODO: move all tri/tril/triu testing to tensor creation op test suite and remove 1 // TODO: Type Inference does not propagate Shape Information 1 # TODO: it's not used, so actually we can skip quantization 1 // TODO: support input tensor with dynamic shape (PR #54982) 1 // Brute-force algorithm. TODO: Come up with something better. 1 # TODO: Need to add options to qconfig to avoid the calibration. 2 // TODO: revisit this (do we need to consider float64 types?) 1 // TODO: call UDF if it exists 1 // TODO: refactor codegenOutputQuery into its own file 1 # TODO: we should probably handle a few additional errors, 1 // TODO add other op signatures. 1 // TODO: there should be a shorter way to spell this 1 // TODO: The following don't work on Windows. Specifically, sigaction, waitid 1 // TODO: This is a very naive implementation with a single mutex. 
We can do the 1 # TODO: we may want to try to remove the special case here 1 // TODO: add LSTM function to composite operations 1 # TODO: make sure virtual devices such as 'cpu1' and 'cpu4' are supported. 1 // TODO: remove later 2 // TODO This backward pass uses a very complext expression to compute (diff 1 // TODO: separate passes into different file; 1 # TODO: include the configuration in backend_config_dict 1 # TODO: Once we decide to break serialization FC, we can 1 # TODO: only to keep it byte-for-byte compatible with the old codegen, should remove. 2 // TODO: Use something less heavy duty than a big honking mutex 1 // TODO: Is there a way to py::cast that doesn't raise an exception on 1 return NamedCType(binds, MutRefCType(BaseCType(tensorT))) # TODO: fix this discrepancy 1 // TODO: lift the restriction of not fusing producer containing reduction when 1 // FIXME: remove this check once cub sort supports bool 1 // TODO: should we set this as human-readable time instead of unixtime? 1 for (int i = 0; i < N; ++i) { // TODO: multithreading 5 schema.overload_name().empty(), // @TODO: is this check correct? 1 // FIXME: _fft_r2c doesn't support native r2c IFFT 1 // TODO: class types cannot be redefined because we have no way right now 1 // TODO: static_assert that a templated function exists, and throw a friendly 2 # TODO: remove them when users are ready to take a hard dependency on PyTorch 1. 1 // TODO: move this somewhere more generally useful 1 // TODO @wconstab refactor to use ModuleValue::asTuple instead of new API 1 // TODO: Make this call the TensorOptions version, maybe? 2 // FIXME: this doesn't actually unroll; clang has per-loop unroll 1 // TODO: C++17 has the fileystem header, which may replace these 1 // TODO: replace Half by BFloat16, after BFloat16 is supported by Nvidia 1 # TODO: Kill this when we eventually remove it! 3 # TODO: make sharding spec a ChunkShardingSpec by inferring from the metadata list. 2 // TODO: Only remaining use of this is in index compute, remove use from there, 1 # TODO: try to recover the location of else:? Python doesn't give us useful 1 // TODO: The question about how to handle negative orders when the input 1 // TODO: going to need to change this if we want nested functionalize() transforms. 1 // TODO: change the number of microprocessors 1 // TODO: enable slice, shape inference is not implemented for this op yet 1 // TODO: Handle arg promotion. 4 // TODO: Stop calling PyObject_HasAttrString() in a loop on our read loop 1 # TODO: deprecate this 1 // TODO: #ProfileIValue List should update this 1 // TODO: Error if an operator is def'ed multiple times. Right now we just 1 //! TODO: Space efficiency of this class will be important, 1 // TODO: eliminate const_cast 1 // FIXME: We'd have to find some other trick with Thrust to perform a 1 # TODO: See if we can remove this in the future if we are 1 # TODO: To cover more problematic cases, replace stride = 0 check with 1 # TODO: update this when isclose is implemented for CPU float16 1 // TODO: enable 3d batchnorm. 1 // TODO: This qint special case looks very suspicious... 1 // TODO dim == 4 case will be enabled once it is fully tested 2 // TODO: not supporting casting to outputs is only really necessary for arg{min,max} 1 // TODO: uncomment once we can handle rand+broadcasts 1 // TODO: we need to check input type when we handle `to()` 1 # TODO: handle other tuple subclasses more generically 1 // TODO: is it worth optimizing this loop via padding in C? 
1 # TODO: for _shared_model, do only NCCLReduce 1 // TODO: checking this is not free, so we should stop if this keeps 1 // TODO: This is better represented as an OrderedDict, but alas it is not yet 1 # FIXME: logical_and does not accept scalar inputs 1 # TODO: Once we decide to break serialization FC, no longer 1 // FIXME: We need to call it here since Future completion requires all 1 # TODO: add args/kwargs for passing to assertEqual (e.g. rtol, atol) 1 # TODO: https://github.com/pytorch/pytorch/issues/53023 1 // TODO: Write in more idiomatic C++17 1 // TODO: come up with a more user friendly interface 1 // FIXME : code duplication with 3 auto dx = at::empty(input.sizes(), input.options()); // TODO: more compact way of saying this 1 # FIXME: nansum does not support passing keepdim without passing dim 1 # TODO: enable later 1 // TODO: This is bad; this test should apply universally 1 # TODO: I think this means structured won't work with method 1 // TODO: pass pre-resized output? 1 // TODO: factor out isCallFunc 1 # TODO: Can we improve this error message to point out the gaps? 1 // TODO: implement MAGMA-based path using magma_zgeqrf_expert_batched 1 // TODO: clarify hasSideEffects, isNondeterministic 1 // TODO: use CUDAGuard here instead of context and employ explicit sync 1 # TODO: not sure if typing supports recursive data types 1 // TODO: It's possible this is still triggering 1 // TODO: add support for the following fusible operators. 1 // TODO: Use shape information from weight tensor 1 # FIXME: dim=[] reduces all dimensions 3 // TODO: support the case where kernel allocates output tensors dynamically. 1 // TODO: support int64_t dims in ideep::tensor to avoid extra conversion 2 // TODO: kernel implementation could stride on spatial dimension. We probably 1 // TODO: hacky way of inferring the groups number for grouped Conv3D 1 // TODO: This may be a common operation, could be worth making a utility 1 # TODO: Support automatic reshape 1 # TODO If this codepath becomes popular, it may be worth 1 # TODO: Update to provide the libraries and paths for linking npymath lib. 1 // TODO: pass block_size here; 1 // TODO: retry on collision 1 // TODO: change to a proper error report 2 // TODO: Improved heuristic on when to coalesce or remove need to coalesce 1 // TODO: old fuser is not maintained internally, somewhere it is being turned on 1 // TODO: Add human-readable sizes? 1 // TODO: Should we revert to the original timeout at the end of the call? 1 # TODO: fix lint 1 // TODO: only check `!impl_->requires_grad()` after Variable and Tensor are 2 // TODO: update cast_op signature to take dynamic context flags 1 // TODO: this should be merged with the storage flattener. 1 # TODO: we don't have _concrete_type set after load(), and in general we lose constant information. 1 # TODO: Uncomment when negative weights is supported. 1 //FIXME - this are defined in Loops.cuh, but including Loops.cuh here would lead to circular includes Loops.cuh -> CUDALoops.cuh -> jit_utils.h -> Loops.cuh 1 // TODO This probably should be using at::native::make_reduction 1 // TODO: Fill in output sizes 1 // TODO: dedup this part with code in quantizeTensors 1 // TODO: make IR nodes extensible. 1 # TODO: remove these special cases, ArrayRef fallthrough works fine 1 // TODO: Stop using legacyExtractDispatchKey here (probably need to build 1 # TODO: We need to change this to rpc.remote, and make it async (see the else branch below). 
1 # TODO: This code can path can be removed if #61309 is resolved 1 // TODO: Remove PYTORCH_MIOPEN_SUGGEST_NHWC once ROCm officially supports NHWC in MIOpen 1 # TODO: maybe rename this to MatchInputNode 1 // TODO: split it into two kernels to make it more similar to exact 2 // TODO: Remove this function when at::native::empty() is modified to accept a 1 // TODO: unhook this 1 // TODO: insert specialized cases (e.g. depthwise convolutions, the direct 1 // TODO: check consistency, e.g.: code version, input shape and compiled 1 // TODO: qscheme 1 * TODO Instead of doing it this way, we should only have pure-jit ops in 1 // TODO: There are several places that recurse over IValue. This is fragile. 1 // TODO: Raise if not all output values are visible in input geometry. 1 # TODO: this can be simplified after https://github.com/pytorch/pytorch/issues/69316 is fixed 1 // TODO: fix this when windows can correctly capture variables in nested lambda 1 // TODO: is this necessary? We used to treat nullptr-vs-not in IntList differently 1 # TODO: can we emit the union of these? What are the implications on TorchScript 1 // TODO: replace this with a real overload_name when FunctionSchema supports 1 // TODO - Implement cache_access 1 // TODO: Any chance to make this cleaner? 1 // TODO: uncomment the following when svd is deprecated not only in docs 1 // TODO: XZP kernels won't be supporting per channel quantization. 1 # TODO: (@krshrimali), add error_inputs_func once https://github.com/pytorch/pytorch/pull/67354 is merged 1 // TODO: Rename to register 5 // TODO: add a mutex to make it thread safe. 1 // TODO: refactor findObserverName to take Node* as input 1 // TODO: not currently using these gradients, investigate t16675365 1 // TODO: we should use wrapped pg_'s timeout here, but C++ ProcessGroup API 1 // TODO: does this works? 1 // TODO: This code path is not ideal as we are "lying" to the caller in 1 // TODO: consider using trash can 1 # TODO: figure out one liner to .clone() and set requires_grad 1 # TODO: why this needs to be special case? 1 # TODO: fixme 1 // TODO: hack to make `test_lstm_gates_permutations_cuda` 1 // TODO: we need to figure out how to profile calls to custom functions 1 # TODO: enable inplace in aten exporting mode. 2 // TODO: should we consider adding support for NoneType; 1 # TODO: fix along with var_mean autograd tests 1 # FIXME: maximum does not accept scalar inputs 2 // TODO: check multiple uses ? 1 // TODO: Check adding iteration domain unrolling 1 // TODO: update and add a usage example after https://github.com/pytorch/pytorch/pull/58092 lands. 1 # TODO: Properly handle aliasing caused by get_attr. For now, 1 // TODO Is there some way to implement this? 1 # TODO: improve the handling of complex tensors here 1 # TODO: Support this by adding trailing 1 dims. 1 // TODO: Turn this into an honest to goodness class. I briefly attempted to do 1 // TODO: do MLC 1 // TODO: We do go through different code path, should investigate whether this 1 // TODO: The logic for cast_outputs will need to be handled by the 1 // TODO Do schema inference without relying on WrapFunctionIntoFunctor 4 // TODO: better TensorOptions argument passing(e.g. 
default argument) 1 # TODO: Get rid of dynamic_type, after getting tools/autograd 1 # TODO: In principle, we could provide more structured version/config 1 // TODO: could this be a reference and not allocated on 1 # TODO (refactor) this is duplicated, maybe have a helper function 1 // FIXME: remove magic > 0 after we ensure no models were serialized with -1 defaults. 2 // TODO - expensive (>1ms) - cache these. 1 # TODO: Uggggh, parsing the schema string here, really??? 1 # TODO: Add GPU support 1 // TODO maybe avoid call to vec 1 //FIXME I didn't find how complex -> real conversion is done in eager 1 // TODO - come up with better message 1 // FIXME: warn if this is the case -- see comment about skipped 1 # TODO: this can probably be optimized 1 // TODO: Typecheck the parameters 1 // TODO: Kimish 1 // TODO: Order of this list is important as it affects type promotion. it's not 1 // Grab consumer domain entries and reverse replay map. TODO: Maybe 1 # TODO: do we want to test this too? 1 # 'fuse_fx', 'quantize_fx', # TODO: add quantize_dynamic_fx 1 # TODO: maybe need more complex attr name here 1 // TODO: Remove this once the following issue is addressed: 1 # TODO: deduplicate annotation matching with Return 1 # FIXME: there are a few things that fall under this like 1 // TODO: rename context.h -> context_cpu.h & context_base.h -> context.h 1 # TODO: We might not need this anymore, since most scalars now show up 1 // TODO: Improve this by checking if it is mutated in the graph region 1 # TODO: make GivenTensor generic 1 # TODO: Deprecate and remove the following alias `_ConvTransposeMixin`. 1 // TODO: should serialize parameters with Module instead of with each Method. 1 // TODO: This constructor should probably use an ATen abstract method in order 1 # TODO: add support for more ops 1 // TODO: This is morally the same thing as KernelRegistrationConfig, but it's 1 # TODO: allow convert_custom_config_dict to override backend_config_dict 1 // TODO: Figure out how to unify these call interfaces. 1 // TODO: Consider a way to pre-allocate and reuse 1 // TODO: more vectorization with loop interleaving 1 // TODO After we actually export CALL instructions we can remove this. 1 // TODO: figure out why this needs to be computed... 1 // TODO: [algo] improve this algorithm, as it is horrendously inefficient. 1 # TODO: consider more complex/custom dynamic ranges for 1 # TODO: maybe update the cpp argument API to take optional namespace argument? 1 # TODO: miopen_LIBRARIES should return fullpath to the library file, 1 # TODO: should be `arg.type.is_tensor_like()`? 1 // TODO: vector may be faster 1 # TODO: If we ever implement tensor.nextafter, below is what we want ideally. 1 // TODO: is it ok that we're doing it eagerly? In the other implementation we 1 // TODO Make this explicit 1 # TODO: im not good enough with regexes to ignore -> * 1 // TODO: it's also odd these ops use gpu_kernel_with_scalars 1 // TODO: can we support caching this? 2 /// a graph (probably a dataflow graph). 
TODO: refactor this 1 // TODO xinyu: standardrize reset_parameters virtual funcs 2 // TODO: should be input 1 # TODO: Remove this once script supports type() calls 1 # TODO: I think this is not necessary anymore 1 // TODO: Add constructors for all of the descriptors 1 // TODO: piping down the parallel dimension map here would 1 # TODO use arg_dequant_infos 2 // TODO: strides would also be important when we handle permutations in 1 # TODO: refactor to solve this weird dependency where 1 // TODO: Skip this if not writing tensors 1 // TODO: remove constant prop in the pass 1 # TODO: Use TypeAlias when Python 3.6 is deprecated 1 // TODO: _fused_dropout_cuda is to be removed, see PR #63937 1 # TODO (maybe): merge with embedding quantize handler 1 //! TODO: May want to sort this based on size of connections between this and 1 // TODO: Run more tests for bsize > 128. 1 /* TODO: Use [[maybe-unused]] when C++17 becomes the standard */ \ 1 // TODO: Support skipping python frames 1 * TODO: add docs after this is finalized. 1 // TODO: Consider a way to pre-allocate and reuse 1 """FIXME: Temporarily replace std:: invocations of math functions 1 # FIXME: modernize these to be consistent with make_tensor 1 # TODO: include the configuration in backend_config_dict 1 # TODO allow kwargs such as unsafe and others for parametrization 1 // TODO: remove this when the cuda kernel is updated to support the channels_last memory format. 1 // TODO: this keeps reallocating map_size at every iteration, but we know 1 // TODO: Don't go through WrapRuntimeKernelFunctor 3 // TODO: we should have normal_like operator 1 # TODO: rename Relu -> ReLU to be more consistent with other classes 1 // TODO: There are places in core where a scalar is wrapped but not marked as 1 // TODO: make all operations that resize given outputs use this function 1 # TODO: should have an unspported list of operators, be optimistic for now 1 // TODO: a better invariant is that if we tagged, we MUST have a valid 1 // TODO: Change to fieldDesc 1 # TODO: change the way we get binary file -- binary may not in build/bin ? 1 ret = -1 # TODO: remove once JIT exceptions support control flow 1 // TODO: matrix-vector products in the code below are dispatched to matrix-matrix products. 1 // TODO: merge HasRand with CudaAnalysis. 1 # TODO: binary search 1 // TODO: Use the list from AMP eager directly 1 # FIXME: move to test_sparse or sparse utils 1 // TODO: Move to a dedicated validation pass 1 // TODO: support negative strides 1 // TODO: assert it's actually immediate 1 // TODO extend to support 4-bit qtensor. 1 // TODO: Consider representing debug info as a struct instead so you 1 # TODO: This can be further optimized by passing dim_in, dim_out = features, 1 // TODO: convert to schema, add a test 2 // TODO: verify whether source or dest device should be a priority in picking 1 // TODO: Remove this constraint. 1 // TODO: Compare vs OpenSSL and/or CryptoPP implementations 1 * FIXME: use std::random_device with entropy information 1 // TODO: Change this when ChannelsLast3d is ready. 1 // TODO: Remove the condition on AT_ROCM_ENABLED entirely, 2 # TODO: TensorBase should work 1 // TODO: here is an optimization opportunity since welford uses int64_t for 1 # TODO: LSTM can't be TorchScript'd 1 // TODO: This test seems a bit goofy 2 // TODO: Dimension collapsing should be abstracted out and integrated into 1 //! 
TODO: Remove this interface as we do not intend to support dynamic 1 # TODO: do this with in-memory files as soon as torch.save will support it 1 # TODO: Fancier types? 1 # TODO: The unpacking is not yet implemented 1 # [old codegen] TODO: remove this? doesn't rename in codegen, it's just 2 // TODO: This error message seems awfully opaque 1 // TODO replace with TensorIterator implementation once #33166 is fixed. 1 // details. TODO Update to actually call pre-pack here once bias is removed 2 // TODO (viswanath): Should angle info be included as well while filtering? 2 //! TODO: In the next refactor PR, should put segment candidate 1 # TODO: use actual dtype instead of defaulting to float 1 // TODO: Fix this filter. Requires_grad is not the appropriate 1 # TODO: make this more efficient 1 # TODO: Re-enable this check (.type isn't supported in TorchScript) 1 # TODO: figure out what this does 1 # TODO: Implement `SeedSequence` like object for `torch.random` 1 /* TODO: result here is truncated to scalar_t, 1 // TODO: fast sigmoid 1 # TODO: move into common_utils.py or the test suite(s) that use this 2 // TODO: implement scale_grad_by_freq 1 # TODO: remove this (prefer make_symmetric_matrices below) 1 // TODO: remove this once we no longer have old TorchBind code 1 # TODO: avoid this special handling? 1 // TODO: try making the CUcontext thread local to see if that improves performance - why is this slow? 1 // TODO: enable if once shared libraries are unified in CMake 1 default = 'at::kLong' # TODO: this is wrong 1 // TODO [unpickler refactor] __main__ isn't used by the pickler anymore, this 1 // TODO: remove this once the Kernel IR split is complete 1 // TODO: merge this code with case 1. 1 // manually. TODO: make declarable in native_functions 1 # TODO: include files like this should not set the default dtype 1 // TODO: these functions are unconditionally available because kaiser window depends on them 1 // TODO: merge this check up 1 // TODO: assert on empty buffer 1 // TODO: This line doesn't seem to be exercised at all in tests 1 # TODO: CUDA path doesn't work with batched or empty inputs 1 # TODO: reduce signatures down to one when optional args is available 1 // TODO: check the bounds only once 1 // TODO: [Need verify] looks like we can quantize simple functionals that just 1 inputs[4])})); // TODO: handle other dtypes of alpha and beta 1 // TODO: when we do legacy group convolution support, we'll repeatedly 1 // TODO add FBGEMM kernel 1 at::AutoDispatchBelowADInplaceOrView guard; // TODO: remove 1 # TODO: Don't allocate a in-memory string for the protobuf 1 # TODO: directly translate a.default to python default 1 // FIXME: crappy implementation 2 # TODO: implement state_dict 1 // TODO: remove this custom tracing code once the custom op bugfix 2 # TODO: compute correct memory usage and CPU time once 1 // FIXME: assert ((D % VEC) == 0) 1 # TODO: skipping storage copy is wrong for meta, as meta 1 # TODO: test_cpu_gpu_parity doesn't handle case where output is not a singleton, submit fix 1 // TODO Make quantize_tensor_arm work for other datatypes too (int8, int32). 1 // TODO: are the `else` branches needed? 2 // TODO We don't call .cpu() on quantized tensors as it fails when calling 1 // TODO: This is marginally less efficient than it could 1 // TODO: generate a proper error log, as this probably means something 1 //! TODO: 1 // FIXME: remove magic > 0 after we ensure no models were serialized with -1 defaults. 1 // TODO: can be cleaner. 
1 // TODO (T90180710): Simplify type_resolver and obj_loader when getting 1 // TODO: assert the output is a buffer and not a scalar 1 // TODO refactor so this function is usable both from jit and from aten 1 // TODO: For now, we are not moving the loads with the IfThenElse. 1 // TODO: handle the mask 1 //! TODO: cleaner way to set options? 1 // TODO: migrate all to using torchscript 1 // TODO Do schema inference without relying on WrapFunctionIntoRuntimeFunctor 1 // TODO: Remove duplicate declaration. 1 {}) # backend_config_dict, TODO: point to README doc when it's ready 1 ss << "{TODO implement Node::shape} "; 1 // TODO: maybe we should just push false and fallback 1 # TODO: Undo this special-case; see the header for motivation behind this 1 // TODO: Replace this helper with DECLARE/DEFINE_DISPATCH 1 # TODO: Add Pareto-Laplace KL Divergence 1 // TODO: call nvrtc. 1 // TODO @wconstab refactor using Symbol instead of string compare 1 // TODO: alias should be made aware to segmentation, so we'll always include 1 // TODO: Improved heuristic on when to coalesce or remove need to coalesce 1 // TODO: this is kind of... blegh 1 // TODO: when we fail commandline flag parsing, shall we continue, or 1 # TODO type process_group once `distributed` module is stubbed 1 // TODO: The assert is not necessary when we can handle matmul, right now we 1 // TODO: reduce the apparent redundancy of all the code below. 2 # TODO: Remove PYTORCH_MIOPEN_SUGGEST_NHWC once ROCm officially supports NHWC in MIOpen 1 # TODO: remove the skip after these two operators schemas are fixed 1 # TODO: add an API to map real -> complex dtypes 1 // TODO: Shall we handle the case when shape has -1 here? 2 // TODO Delete this once kernels don't do that anymore 1 # TODO: Remove the configuration by reference ('module') 1 // TODO: use LCM of stride and dilation to avoid unnecessary loops 2 # TODO: replace torch.divide with masked divide when available. 1 # FIXME: nansum does not support passing None to dim 1 # TODO: do we need eagerly calculate and save it here? Can it be derived 1 // TODO: remove, all constant tensors should have typed sizes 1 # TODO: consider removing this check and allowing users to specify 1 // TODO: These two functions below are slow! Fix internal data structures so 1 # TODO: where 1 # FIXME: torchscript: div(float, float) 2 // TODO: Improve this once D31357486 is landed. 1 // TODO This is an incomplete implementation of std::apply, not working for 1 # TODO: return self 1 // FIXME: Allow any integral type. 2 // TODO if constexpr instead of enable_if 2 // TODO - relax this? CAFFE_ENFORCE_EQ(op->output_size(), 1); 1 # TODO define a bijection for LowerCholeskyTransform 1 # TODO: expand this to `_ConvNd` when channels_last support is extended 1 # TODO support future 1 // TODO we should attempt to call __str__ if the object defines it. 1 // TODO: investigate making this SingletonOrSharedTypePtr 1 # FIXME: second derivative is implemented but seems to be incorrect 1 // TODO: contiguous is called for further jit optimizations. 
1 // FIXME: the shape of the input to the fictional PadPacked node has 1 # TODO: See https://github.com/pytorch/pytorch/issues/56285 1 // TODO: Make this take Variable by const reference 1 # FIXME: dim=None not supported 2 # TODO: Not sure why the arguments assigned here are for 1 # TODO: remove QConfigAny and replace it with Optional[QConfig] 1 // TODO: 1 # TODO remove and replace in favor of contextlib.nullcontext 1 # TODO: Missing type hints for nn 1 // TODO: setup grid-stride loop 2 # FIXME: amax reduces all dimensions when dim=[] 2 // TODO: check that weight matches output->sizes() 1 // TODO: Make a more type safe std::includes wrapper which disallows use 1 const float pred_ctr_x = ctr_x + width * dx; // TODO fuse madd 1 // TODO: This is a temporary apporoach to enable calling user fucntion 1 # TODO: maybe the logic to handle the legacy schema is no longer necessary? 1 # TODO: should use some canonical form instead of 'str(arg.type)' - see comments 1 # TODO @kiuk - make entrypoint a required field 1 // TODO This is an inefficient way to compite sign, and can be much faster 1 * TODO: t15868555 This algorithm is fast but can miss dependencies. 1 // TODO: move abs to aten namespace because it's schematized! 1 # FIXME: Remove when back testing is no longer required. 1 // TODO: guard impl_index, but I think that's not needed; 1 # TODO this should probably be a separate loss, not hacked in this one here 1 /* TODO: Use [[maybe-unused]] when C++17 becomes the standard */ \ 1 // TODO: Adding this attr should be correct, but as of LLVM 9.0.1 adding it 1 // inplace argument is ignored now, TODO:support inplace 1 * TODO: This should be jettisoned in favor of `set_sizes_and_strides`, 2 // TODO: have to copy output because at::embedding doesnt have an out 1 // TODO: to really support input tensor large enought to go beyond int32, 2 // TODO: Turn on autocast by default. default turned off to avoid tests failures 1 # TODO: update notes/cuda.rst when this class handles 8+ GPUs well 1 # TODO is it correct to call_cur_module twice here? 1 // TODO Deprecate this function in favour of linalg_lu_factor_ex 1 // TODO: decide on fixpoint strategy 1 // TODO: avoid having to set this guard for custom mobile build with mobile 1 // TODO: implement prefetching if it starts mattering (TF does it) 1 # FIXME: prod does not support passing keepdim without passing dim 1 // TODO: remove this special case for HIP when issue is fixed: 2 # TODO: allow structured external backends later. 1 // TODO: restore decomposition after fusion, in case we are decomposing 1 # TODO: See https://github.com/pytorch/pytorch/issues/68592 1 // TODO: Expose in PyTorch Frontend 1 // TODO: It's a bit irritating that we have to do logical ORs here, it would 1 # TODO: consolidate this with the get_cases function from 1 // TODO: we could probably merge the two if it has perf impact on generated 1 # TODO: Once step_param interface is robust, refactor step to call 1 // TODO: Return const ptr eventually if possible 1 # TODO: only assertion error is bound in C++ compilation right now 2 /* TODO: remove once the bug is fixed. 
*/ \ 1 // TODO: Remove once [serialization type tags] is landed 1 # TODO: Fix for python < 3.3 1 inline void WaitEvent(const Event& ev) { /* TODO */ 1 # TODO: include the deprecation as soon as torch.testing.assert_close is stable 1 // TODO: this can be improved with summarizes of what the function does 1 // TODO: CPU instruction set selection should be folded into whatever 1 # TODO: add input, output validator 1 # TODO: Verify that sysconfig isn't inaccurate 1 // TODO: fix this when windows can correctly capture variables in nested lambda 2 TODO: test coverage 1 const auto output_padding = output_padding_arg; // TODO: Deconvolutions 1 // TODO find a way to parallelize this... 2 // TODO: use MaybeOwned 1 # TODO: check in THNN (if inplace == True, then assert value <= threshold) 1 # TODO: for _shared_model, no need to broadcast 1 // TODO: Check if the tensors with symbolic shapes are contiguous. 1 # TODO: complete the data type: bool, float16, byte, int64, string 1 # TODO not sure what this message really means 1 // TODO: we are having one unnecessary copy here if the context is already 2 TODO: To remove this check once Union support lands. 1 # TODO: FIXME 1 // TODO: add TORCH_API 2 // TODO: consolidate other ELF file related functions in loader.cpp to this file 1 // TODO: SetDeviceTensor accept vector 1 // TODO: figure out how to make compiler happy without dynamic casts 1 // TODO: replace filler distribution enum with a better abstraction 1 // TODO: support 16bit, 32bit, and etc. 2 thread=0) # TODO: find in sqlite database 1 'ConstQuantizerPtr', # TODO: rename 1 # TODO: how come ValuesView isn't a Sequence lol 1 // TODO: how to do this more intelligently 1 scalar_t scale = 1; // TODO: expose as argument? 1 # TODO: DivisiveNorm2d 1 // TODO: Currently tarjans mutates the graph, and that's the only reason we 1 // TODO: we don't support list type in codegen yet; 1 // TODO: exceptions in future 1 // TODO: Currently we do not support signal handling in non-Linux yet - below is 1 // TODO: if image decoding was unsuccessful, set label to 0 1 //! TODO: have a interface for grabbing all recent logs. Need to put a buffer 1 // TODO: move this from `at::` to `jit::torch::` after 1 # TODO: These invariants are weirdly asymmetric? 1 // TODO: More efficient would be to create event inside of main thread (at 1 # FIXME: `x` is a sparse view of `v`. Currently rebase_history for 1 // TODO: It might be good to use cpuinfo third-party dependency instead for 1 # TODO: Remove me once https://bugs.python.org/issue42666 is resolved 1 // TODO: only support NCHW for now 2 // TODO Change Ptr to DynamicTypePtr when all migrations are done. 1 // TODO: Expose this for real in ATen, some day? 1 // TODO: fix this when windows can correctly capture variables in nested lambda 1 # TODO: Consider not exporting these during wildcard import (reserve 1 // TODO: this might be slow - consider batched updates? 1 * TODO: need to support customizing equality 1 // TODO: Abstract stride logic to reuse with consumer indexing 1 // TODO: Add sorting options? 1 # FIXME: Undefined behavior sanitizer: shift exponent -9 is negative 1 TODO: when scale != 1 is introduced then use: 1 # TODO: It would be better to export this as a chunk directly, as this is 1 AutoDispatchBelowADInplaceOrView guard{}; // TODO: Remove. 1 // TODO: Re-audit this; it used to be an indexSelect directly into r_values 1 // TODO: c10::optional<>::value returns an rvalue ref so can't use it here?? 
1 # TODO update this when inplace namings are unified 1 # TODO: Consider adding a utility function to torch.jit to test 1 // TODO: switch to kUint64 when it is available. 1 false)); // TODO: nDim is bad, as it is collapsed 1 # TODO: maybe move to the generator side as it's not related to binding. 1 // TODO: gcc/clang has __builtin_clz() but it's not portable. 1 # TODO: currently model building does not have access to iter counter or 1 // TODO: remove 2 # TODO: enforce typing for each instance based on mode, otherwise 1 # FIXME: logical_xor does not accept scalar inputs 1 // TODO: dynamic allocation size: cur_rows*factor_j[i]*ranks[i+1] 1 // TODO: Remove this code in a separate diff, since we only have one 1 int stream_id = 0; // TODO: thread local stream id 1 # TODO: create weight observers from qconfig.weight 2 # TODO: Limitations and things about enable_python_mode we should fix before exposing it: 1 // TODO: modify this after resize_ added `memory_format` tag 1 # TODO: remove this (prefer make_symmetric_pd_matrices below) 1 // TODO: we need to stay on safer side instead of "default to return true 1 // TODO: extract arg_count from packed. 1 # FIXME (by @ssnl): Improve adaptive pooling docs: specify what the input and 1 # TODO: Validate supported requests 1 # TODO: Use a real parser here; this will get bamboozled 1 # TODO: ClassSimplexCriterion 1 # TODO: diagnostic if dir does not exist 3 // TODO: https://github.com/pytorch/pytorch/pull/59380#pullrequestreview-725310492 1 # TODO: some signatures of median do support out 1 // TODO: Remove once torchvision has been updated to use the ATen header 1 // TODO: No need to have this whole header, we can just put it all in 1 // TODO: compile the generated C++ kernel into a library, 2 // TODO Add tracing here 1 * FIXME: The behavior in this function is from legacy code 1 // TODO: support clamp_min, clamp_max 1 // TODO: test if Python key is disabled 1 // TODO use fast math when possible 2 // TODO: Change dims related arguments to int64_t? 1 // TODO: Build utility to strip off debug map. It should also do the 1 // TODO: When all kernels that use gpu_kernel_with_scalars are 2 // TODO: Simply traverse through uses from of. Would be a lot faster than 1 // TODO: implement prealloc optimization and fill in temp_sizes 1 * TODO: As people experiment with capture, keep an eye out for use cases that might need to 1 // TODO HIP support 1 /// TODO: it might be possible to handle cases where backward is 1 # TODO: Undo at least that second hack. We should support string states. 1 # TODO: change the signature for fuser_method to take matched module patterns 1 // TODO: this message is not correct anymore, since this InferredType is 1 # TODO: switch to scale.item() after adding JIT support 1 OpInfo('trapz', # TODO: in the future, 'trapz' should be made a proper alias of 'trapezoid' 1 // FIXME: Epsilon parameter isn't required, we don't take the reciprocal 2 // TODO Add GPU support by writing a generic wrapper. 1 // TODO: rand_like should support cast. 1 // TODO: deprecate? 1 // TODO this can be a qualified name check 1 # TODO: maybe we should change activation_post_process to _activation_post_process 1 // TODO add "canRunNatively" once memory management is audited 1 # TODO: Remove this scripting logic once the 2-week FC window has passed. 2 # TODO: If indexing is supported natively in ONNX in future opsets, 1 // TODO: we need to properly restore shape information after fusion. 2 # TODO: could add more detail here. 
For example, what the user should do 1 # TODO: Remove the try/except once all operators have sample_inputs_func with 2 // TODO: add fbgemm for per channel 1 # TODO: probably better to accumulate these errors and report them all 1 # TODO: specify __all__ 1 // TODO: reuse memory for bufs with dynamic shapes 1 // TODO: check that output->size() matches output_sizes 1 // TODO: must be investigated and unified!!! 1 // TODO: [T87340633] Support reducing the batch dimension 1 # TODO: can we simplify this to always return a tuple of Tensor or None? 1 # FIXME: sum reduces all dimensions when dim=[] 7 // TODO: add in-place variant 1 // TODO: this function needs to be implemented and tested. Currently just throw 4 // TODO: if the divisor is a scalar, rewrite as multiplication by a constant. 1 // TODO: can't get rid of this use of TensorType 1 # FIXME: remove after implementing reflection pad 3d 1 // TODO Do schema inference without relying on WrapFunctionIntoRuntimeFunctor 4 // TODO: are the `else` branches needed? 2 # TODO: use slicing when slice optimization has landed 1 // TODO extend this special case to when the underlying storage of new_grad 1 // TODO: Handle tensors with different dtype, layout, device, memory 1 // TODO: numa? 1 // TODO: this should be a TypeError 3 // TODO: use op info to print out the op in a more user-friendly way 1 // TODO: investigate how "ExprPtr" can be implicitly converted to "ExprHandle" 1 // TODO: remove when script supports setting grad mode 5 // TODO: eliminate newCapacity. 1 // TODO: Deprecate these structs after we land this diff 1 // TODO - confirm that this is correct for NHWC 1 // TODO this needs to go in `m`s compilation unit 1 // TODO: How do we adjust this so we can reduce to a single scalar value? 1 # TODO: fail fast on quantization API usage error, then remove this class 1 // TODO: handle other alpha and beta dtypes, e.g. alpha=0.6, beta=0.2 1 // TODO: For future tasks, since output quantization parameters are set equal to 1 // TODO: make RRef system messages idempotent and retry on failures. 1 # TODO: dedup with BatchNorm2d 1 // TODO: This is a massive hack! There is some confusion about 1 # TODO: add a copy kwarg that guarantees that the tensor is put into fresh 1 // TODO: Check what happens with MKL, the output error reported with non square matrices tends to be high 1 # TODO: maybe change to this when https://github.com/pytorch/pytorch/pull/32958 is landed 1 // TODO: We clone grad_slice because we modify it below and "fn" might save 1 // TODO: use CUDAGuard here instead of context and employ explicit sync 1 // TODO: channels last 3d 1 # TODO (zaf): Mask might not be part of the qconfig (T83295194) 1 // TODO include edges in the SCC in a smarter way. 1 // TODO: check latency here!!!! 1 # TODO Make return type more specific 1 # TODO: when standard argument type for "nets" is introduced, 1 // TODO This can probably use fused add multiply to get better perf 1 // TODO: is there a more idiomatic way to do this? 1 // TODO: Remove duplication with Upsample.h (CPU). 1 # TODO: shouldn't this be OptionalType[ListType[...]], since it defaults to None? 1 # FIXME: sum does not support passing keepdim without passing dim 1 # TODO: some cpp naming logic (e.g. resolving name conflict) might be irrelevant? 1 * TODO: Should (2) and (3) be swapped? 
1 // TODO: I think the answer is we shouldn't have used Symbol here 1 // TODO: need to see if there is extra error checking needed 1 # TODO: it would be nice to not have these special cases 1 // TODO: support runtime flag 1 # TODO: handle attrs 1 # TODO: we should pipe the exception of the failed subprocess here. 1 # TODO: 1 # TODO: Lambda for picking 1 // TODO: implement prefetching if it starts mattering (TF does it) 1 # TODO: instead of always doing this if there is an observer, 1 // TODO: Once the code in caffe2/python/onnx/backend.py no longer calls 1 TODO: fixme 1 // TODO possible to remove this arg by deferring the init value until we 1 // TODO: Tuning NumThreads for w_grad 2 // TODO: This function may be sub optimial. If we find that an iteration domain 1 // TODO: When we move to SM 3.5 we should update this 2 // TODO: This isn't right if there's a thread index at a higher level 1 // TODO Also register c10 operators on mobile 1 # TODO: Once we decide to break serialization FC, we can 4 // TODO: 5D channels last 1 // TODO: supports only single comprehension for now 2 # TODO: T18892922, use device annotations 1 # Special tree reduction for 16 gpus, TODO generalize like in muji.py 1 // TODO: Unify with DepTracker 1 # TODO: use argv 1 // TODO: handle dynamic dimension. 1 // risk of breaking existing clients. TODO: A better way would be to allow 1 # TODO: remove these once we support Type's in the JIT IR and we can once again 1 raise AssertionError("TODO not sure if there are other valid types to handle here") 2 // FIXME: make this thread-safe by reusing the benchmark cache in Conv_v7.cpp 1 # TODO: remove observed_op, looks like it's not used 1 // TODO: vectorization 1 # TODO: maybe don't represent default here 1 # TODO: This method has some duplicate lines with the 1 # TODO: maybe don't need keep scattered out fields for python signature? 1 # TODO: My kingdom for a pattern matcher 1 // TODO: in a follow up we need a global logging structure 1 // TODO: Update input names of function to match those in Module source code 1 // TODO: add filter to the clamp patterns and remove this pass 1 // TODO: Add remaining transforms 1 # TODO: compose all metas into one AssertionError 1 # [old codegen] TODO: because these aren't guaranteed to be 100% faithful 1 // TODO fix duplication caused by referencing same op across multiple 1 .device(at::kCPU) // TODO: support GPUs too 2 # TODO: compare structure (ensure analytic jacobian has correct shape) 1 // TODO: remove when serialization of dtype uninitialized tensor is removed 1 # TODO: type annotations for *args and **kwargs 1 // TODO: ThreadDim should be BlockDim and BlockDim should be GridDim 1 // TODO: consider to initializing to a blocked layout 1 # TODO: the second dim (num of input nodes) of param is after feature preproc, 2 // TODO: we frequently use pairwise root mapping from consumers to producers. 1 // TODO: Relax the checks to support dynamic shapes 2 # TODO: add a typing.Protocol to be able to tell Mypy that only objects with 1 # TODO: rename this to node_name_to_target_dtype_info 1 // TODO: Update this as well; 1 # TODO: Once we decide to break serialization FC, we can 3 // TODO: this should be really swapped for something more efficient 1 schema.overload_name().empty(), // @TODO: is this check correct? 1 // TODO: vectorize in accscalar_t? 2 // TODO: What if these parameters are not of the correct dimensionality? 
1 // TODO: remove d) from the requirements because the simplification formula 4 // TODO: We could allow freezing in this case but we would need to 1 // TODO: Land on a general solution for RPC ThreadLocalState. See 1 // TODO: add a macro to declare the filters 1 // TODO: consider to convert non-contiguous tensor to `ideep::tensor` directly. 1 # TODO check if we should set reduce_rage = True by default here 1 // FIXME: how do cases work? 1 // TODO: strides variant? 1 // TODO: Extend support to N-D batched embeddings, similar to 1 # TODO move op-specific logic out of here 2 # TODO use valid to mask invalid areas due to padding in loss 1 # TODO: this doesn't seem right... 1 // "aten::masked_fill.Tensor(Tensor self, Tensor mask, Tensor value) -> Tensor", TODO: requires 0-dim Tensor 1 int64_t model_version) { /* TODO: T90339189 deprecate all v3 when v3 models 1 // TODO: we don't care about merging multiple profiling runs as we don't 5 // TODO: Write gradient for this when needed 3 // FIXME: consider each updater above will broadcast its value with 1 // TODO: #ProfileIValue List should update this 1 # TODO: implement load_state_dict 1 # TODO: check that criterions don't ignore grad_output 1 // TODO: Stop manually allocating CUDA memory; allocate an ATen byte 1 // TODO: revisit complex inputs and equal_nan=true after 1 # FIXME: AssertionError: False is not true : Tensors failed to compare as equal! 2 * TODO: we could use different names for the following 'handle_torch_function' 1 // TODO: hacky way of determine the group size 1 # TODO refactor this code once we update the prepare logic to have additional information on 1 # FIXME: minimum does not accept scalar inputs 1 // TODO this faster solution does not work on Android build 1 // TODO: specialize for float2half2/half2float2? 1 // FIXME : code duplication with conv_dnnlowp_acc16_op.cc 1 // FIXME refactor aliasdb construction to be more robust to mutation so this 1 # (e.g., at::cpu::add). We don't generate methods (TODO: do this 1 // TODO: Autotune/use better heuristics, improve speed more. 1 // TODO If just B requires grad, the following formula is better: 1 # TODO: backward uses in-place operations that vmap doesn't like 1 # TODO: Maybe this should be in tensor_classes? :) 1 // TODO: preserve the func type. 1 // TODO: rename to c10 1 // TODO: maybe consider deduplicating the definitions here, it's getting 1 # TODO: Possibly check scale and zero point. 1 # TODO: The above procedure does two matmul+allreduce steps per iteration -- 2 # TODO: Conv2dLocal 1 // TODO: sameAs should have better logic to check against any type and return 1 // TODO: The implementation of `tensor_kernel_scan_outer_dim` and 1 // TODO: Replace me with inline constexpr variable when C++17 becomes available 1 # TODO move op-specific logic out of here 1 # TODO: implement timeout 1 // TODO: Remove once we clean up the GraphExecutor usage. 1 # TODO Consider using Q = torch.orgqr(*torch.geqrf(A)) to compute the Q of the QR _much_ faster 1 # TODO: core_trainer_sources is not necessary for libtorch lite 1 # TODO allow (loc,scale) parameterization to allow independent constraints. 1 // TODO Disallow this and rather use std::unordered_map/set everywhere 1 // TODO: 2 // TODO: memory stride should be considered here, our inference above is not 1 // TODO: Disable cont. branch to test more risky code 1 // TODO Make it work for more compilers 1 // TODO: handle optional out_qscale, out_qzero 1 // FIXME: don't do this if they're efficiently moveable. 
1 # TODO add more checks 1 // TODO: Actually record which one we actually picked 1 // TODO: change the condition to `self_.dim() != 0` once we expose scalars 1 // TODO: expand this to convXd 1 // TODO: support the mask case 1 # TODO: Support context manager interface 1 // TODO: we're not using the most efficient algorithm here for simplicity. 1 // TODO: temporary hack to resolve my is_constructible issue; 1 // TODO: we could add __torch_function__ dispatch here but I don't know 1 // TODO: set correct domain for function proto. 1 # TODO: This is probably not exhaustive, but it's a start 1 # FIXME: uint8 input returns uint8 instead of bool 2 // TODO: Remove functions below when ChannelsLast3d is ready. 1 // TODO: the TensorIterator reduction implementation of mean 1 // TODO: The deprecation here triggers a deprecated use warning 1 // TODO: optimized kernel 1 # TODO: add try-except and destroy _agent in all processes if any fails. 1 // TODO: we probably have done this already up to this point 1 // TODO: change whether to include the parenthesis to the parent expression, 1 TODO: This is an inefficient implementation that uses `.dequantize`. 1 // TODO: Maybe check that compressed_size === file_size. 1 // TODO: Remove all these messages and use rpc + registered functions instead. 1 // TODO: Fix this pass/maybe get rid of this part. 1 // TODO Update quantize_tensor_arm implementation to follow quantize_val, 1 // TODO: replace self.inverse with linalg_inverse 1 // TODO: inference mode for chaining 1 # TODO: move this into library proper 2 # TODO: Add Exponential-Laplace KL Divergence 1 # TODO: explain this 1 // TODO: Memory use can probably be optimized by re-using kernels across GPUs with 1 // case we need to run aten ops (TODO: support different devices). The first 1 # TODO: Expand to remote RRefs. 1 // TODO: a possible optimization is to fuse the fp16 conversion into Concat 1 // TODO: restore the above, see https://github.com/pytorch/pytorch/issues/64709 1 # TODO currently placeholders/parameters aren't put into random partitions, 1 // TODO: trick the optimizer for case where C == 4? 1 // TODO: optimize in-place opererations and copy operations 1 // TODO: use adapter instead of istream? 2 # TODO: this does the wrong thing with KeyError 1 // TODO: I am not sure if we actually need the 'dropout' and 'train' parameters 1 // TODO: add GUARDED_BY once it is available 1 // TODO: remove this once we don't automatically enabled Autograd dispatch 1 # TODO: don't explicitly list dtypes here; get it from canonical 1 // TODO: this is an unnecessary copy. In theory we can directly 1 # TODO: handle fqn 1 * TODO: We may want to have ordering of outputs to inputs. I'm not sure why we 1 // TODO: Swtich to TensorIterator for better maintainablility and 1 // TODO What if it gets set later? 1 // TODO dim == 3 case will be enabled once it is fully tested 2 // TODO: call disarm rather than leak gil_scoped_acquired once PyThreadState_Clear can safely be called from finalize 1 // TODO: Make this structured to undo the perf regression from native:: removal 2 # TODO: FIXME: RuntimeError: "bitwise_or_cuda" not implemented for 'Half' 1 // TODO: remove after broadcasting is supported 3 # TODO: This feature could be added in the future 2 // TODO: remove this. 
This is a temporary list of functions that allow Python 1 // TODO: we could also parse extra arguments here and allow to pass in 1 1}); // TODO investigate how this is different from normal empty_strided 1 // TODO: better error message 1 // TODO: It's not good for these ops to be top-level, it makes cases 1 // TODO: enable with better TLS support on mobile 1 // TODO: keep running estimates. 1 # TODO: remove allow_list 1 // TODO: Actually, would this make ASAN's job harder catching 1 // TODO: mutex this guy; 1 CUDNN_RNN_ALGO_STANDARD, // TODO: verify correctness / efficiency. 1 # TODO: Pretty sure this approach loses ConstSequential status and such 1 # TODO: replace torch.maximum with masked maximum when available. 1 // TODO: Use lookahead to avoid creating the tuple and immediately 1 // TODO: this checks that the metavars occur directly as an index, but this 1 // TODO: remove when we have Type support in the IR 1 // TODO: could also take intersection of refinements present in 1 /* TODO: move this to a common place */ 1 // TODO: Maybe show simple lists and tuples on one line. 2 // TODO: extend the supporting inputs here. 1 // TODO: code duplication with dnnlowp_op.h 1 // TODO holding this thing is creepy 1 // TODO: Try to avoid a copy here. 1 assert_jit_shape_analysis=False, # TODO: support index.Tensor() 1 // TODO: Add more types (int32, int64) 1 // TODO: Need to redesign this part a bit to 1 // TODO: potential optimization - if there is a Symbolic 1 // TODO: This can be optimized. 1 # TODO: return self 1 # TODO: make DDP uneven inputs context manager support buffer 1 // TODO: These don't really belong here but torchvision builds in CI need them 1 // FIXME: code duplication with ConvDNNLowPOp::QuantizeBias_ 1 // TODO: make this a context guard 1 # TODO: `cpp_type` is only to keep it byte-for-byte compatible with the old codegen, should remove. 1 // TODO handle multiple levels here 1 // TODO: In the ideal end state, it's okay to set disabled version_counter 1 // TODO: support shape inferencing. Right now we only handles static shape 1 // TODO: Swtich to TensorIterator for better maintainablility and 1 # TODO: see https://github.com/pytorch/pytorch/issues/64709 4 // TODO: Maybe show simple (empty?) dicts on one line. 1 // TODO: Check size, stride, offset, and numel and indicate if 1 // TODO: uncomment once unpack is implemented for BCSRMatrix 1 // TODO: proper implementation with masking. 2 // TODO: We have to extend it to support shapes vector. 1 // TODO: Consider generalizing this into a call stack. 1 // TODO: Proper fix is to create real descriptor classes 2 # TODO: Support non-equal-rank broadcast where semantics match. 1 # TODO MAKE SURE THAT DISABLING WORKS 1 // TODO: 1 // TODO: clean this up when https://github.com/pytorch/pytorch/issues/60306 is improved 1 // TODO: Complete after verifying utility of TT-layer's forward pass. 1 // TODO: consider interleaving herrmman merge and bruteforce merge, as 1 // TODO: this is in no way unique and is just a hack right now. 1 // TODO: add error reporting for graphs that can't be converted. 1 // TODO: descriptor checking 1 # TODO: move this to a separate function 1 // TODO: replace input.svd with linalg_svd when torch/xla can work with at::linalg_svd 1 // TODO: Make the Python API above to just call this C++ API. 1 // TODO: put this into the public API 1 # TODO: So far we don"t have a module using this method. We"ll keep 1 default_complex[0] = atof(str.c_str()); // TODO: parse "x + xj"? 
1 // TODO: Remove this header 1 // TODO: Here can only use std::partial_sum for C++14, 1 // TODO: we should ideally be able to interrupt this blocking wait if we check 1 // TODO: remove the extra check when all the Tensors are properly initialized 1 # TODO: make helper functions for (torch.quint8, torch.qint8, None) 1 // TODO: trick the optimizer for case where C % 4 == 0? 1 # TODO: To remove this check once Union suppport in TorchScript lands. 1 // TODO: the above were the only checks in rnn.py, but it doesn't seem 1 # FIXME: numpy reference diverges: Comparing (nan+nanj) and (-0+0j) 1 # TODO (mingzhe09088): get rid of noqa 1 // TODO: will unify the two macros BUILD_LITE_INTERPRETER and C10_MOBILE soon. 1 # TODO: FIXME: sigmoid fails on complex inputs that require grad 1 // TODO: need to clean up all the env options 1 // TODO: Include alpha check for add/sub 1 // TODO: Support multiple quantization methods instead of assuming 2b1b. 1 # TODO: implement numpy-like issubdtype 1 // TODO: I think tensor geometry sufficient for weight_buf/weight 1 // FIXME: adding value comparison is slow 2 .DisallowInputFillers() // TODO: enable the filler 11 // TODO: Make this configurable. 1 # TODO: is this right? Don't really understand this 2 // TODO: move this to C10 and make it C10_API 1 // TODO: We should change parallelize interface to be on tensorview or at least 1 # TODO: reconcile with torch.linalg.det and torch.linalg.slogdet 1 // TODO: Ideally, this function would never be called if requires_grad is 1 // TODO: Deep-copy the module 1 // TODO: Can be changed to FIFO in order to avoid full traverse on every 1 # TODO: this is not handling non-tensor tuple args (for example, 1 // TODO: use llvm.abs intrinsic for LLVM 12 1 // TODO: when resize_cuda_ is re-written to be unified with resize_, 1 // TODO: move this to more generic location. 1 // TODO: extend to fusion of consumer into _producer's_ fusion blob 2 // TODO: remove context 1 // TODO: Refacto qnnpack_utils.h so as to separate code 1 // FIXME: This error is conservative. Detected an interface module 1 // TODO: resolve VarType if necessary 1 // TODO: This probably shouldn't actually be static inline 1 // TODO: once copy is exposed in Declarations.yaml we may be able to bind 1 // TODO: Refactor this so we just pass everything in via options 1 // TODO: Keep only the else branch once constant_folding is enabled by 1 // TODO: Should this be rfactor instead of root?? 1 // TODO: This could be bad juju if someone calls globalContext() in the 2 // TODO: Test on GPUs. 1 # TODO: flatten allocates a std::vector, which could be expensive 1 // TODO this function makes broadcast communication call and 1 # FIXME: "prod_cpu" not implemented for 'BFloat16' 1 // TODO: some of these ops will not get generated because 1 // TODO: throw saved exceptions 1 // TODO: Can we change the logic of vectorizer so that we don't need this? 1 // TODO: currently only identify terms with one variable being mod; it is 1 // TODO: To achieve better performance we can have a pipe pool per 1 # TODO: put this somewhere else, maybe 1 # TODO: ConvTranspose2dMap 1 // TODO: this will return only the AccessInfo for A. It's included for 1 # TODO: for some reason weak_script_method causes a destruction of the 1 // TODO: this function is nontrivial and since CudaRTCFunction uses CRTP, it 1 // FIXME: pass in num_reduce_dims?! 
2 // FIXME: this is a temporary solution to add a special-case for 1 # TODO: the following line needs to only check fqn 1 // TODO: what if the previous handler uses sa_sigaction? 1 // TODO: Use TensorGeometry here instead of the entire Tensor, which we 2 # TODO: remove the overriding implementations for LSTM and GRU when TorchScript 1 // TODO fields 1 // FIXME: There is no `operator<<` overload for `at::kBFloat16` type, 1 # TODO: I'm not sure if this counts as an implementation detail of 1 // TODO: Unranked SymbolicShape printing is ambiguous with that of 1 // TODO: the Python equivalent code has special-cased copy_to 1 # TODO: Improve this error message, possibly after converting 1 TODO: 1 // TODO: should we get symbolic_size instead and check for size 1 # TODO: SubtractiveNorm2d 1 # TODO: add different size support for sparse_nn_partition 1 // TODO: Verify that nodes in the pattern don't alias. 1 // TODO: Validate method_compile_spec. 1 TODO: The current implementation of this script only generates interfaces for built-in methods. To generate 1 // TODO: trying to reduce the variable number (common subexpression 2 // TODO (after Tensor merge) If we pass in a Blob holding a Tensor, extract 1 // TODO: use bitwise operator overloads once we add them 1 # TODO: create a new type of reducer with external weights to wrap 1 // TODO: too many things are currently abstracted under the term 1 // TODO: combine with TensorArg? So far that's been for debugging, and this is functional... 1 # TODO: Fix test_out_arg_all_dtypes as torch.empty_like(expected_output) where expected_output=op(input) 1 # TODO: sm_backend_config_dict can fallback to use parent's backend_config_dict 1 # TODO: Add qat support for BNReLU2d 1 // TODO There is nothing in the system that relies on aten:: and prim:: 2 # TODO mypy doesn't support @property, see: https://github.com/python/mypy/issues/6185 4 // TODO: make RRef an IValue, and edit createStackForSchema accordingly 1 // TODO: check scalarType 1 // TODO: reuse temporaries when possible (e.g. for inplace operations) 1 # TODO: this is hack to recognize NumberType 1 // TODO: Say what actually used it 1 // TODO: Some ops have conversion happen at Peephole pass. 1 // TODO enable fast handling for reductions 1 # TODO: Consider incorporating this into the data model 1 # TODO: handle errors here and just ignore the file? 1 # TODO: Either lint that GHA scripts only use 'set -eux' or make this more 1 # TODO: Add qat support for BNReLU3d 1 # TODO: make WEIGHT_INDEX_DICT and BIAS_INDEX_DICT an argument to the functions that needs them 1 # TODO: refactor & remove the following alias 1 _dim = [i for i in range(ndim)] # noqa: C416 TODO: rewrite as list(range(m)) 1 # TODO: FIXME: RuntimeError: "min_elementwise_cuda" not implemented for 'ComplexFloat' 2 // TODO combine this with quantize_val once the numerics for ARM are aligned 1 # TODO: Checking `ps.method and ('requires_grad' in parser_outputs)` is a hacky 1 // TODO torch::autograd::backward should take the c10::optional gradient directly 1 # TODO: investigate nondeterminism 1 with torch.onnx.select_model_mode_for_export(model, torch.onnx.TrainingMode.EVAL): # TODO: move outside of torch.onnx? 1 // TODO (this will include the version number later) 1 // TODO: output(1) & output(2) should also be marked 4 // TODO: check type? 
1 """ TODO (mingzhe): it is not necessary to sum up everything by myself, 1 // TODO: Now that set_output resizes both the original_tensor 1 // TODO: the following logic can be merged into regular Tensor class methods 1 // TODO: This function is not used. 1 // FIXME We should always extract DataPtrs, in order to catch the case of 1 // TODO: I think it may be possible to track inside the loop and 1 # TODO: Remove this scripting logic once the 2-week FC window has passed. 1 # TODO: no union or any types in TorchScript, make step a scalar tensor instead 1 main() # TODO: Run this script automatically within the build and CI process 1 // TODO: - refactor and make explicit part of TE Kernel api 1 // TODO: Remove ROCM-specific behavior when https://github.com/pytorch/pytorch/issues/59750 is fixed. 1 * TODO: Deduplicate this with THTensor_(newWithTensor) 1 // TODO: put this into the library 1 // TODO: fast erf 1 # TODO: DDPSink is currently enabled for unused parameter detection and 1 // TODO: we can optimize dequantization by doing a premultiplication 1 // TODO: Use re2. 1 # TODO: Expose these directly to Python to avoid maintaining this list. 1 // FIXME: share version counters 1 # TODO: handle in place on tensor list 1 // TODO: is there a danger of us fusing operations that's supposed to be on 1 # TODO: make this part of something more general, or get rid of it. 1 # TODO: delete these special cases; see tools.codegen.api.cpp--these 1 // TODO: The algorithm below can probably be optimized. 1 // TODO: Replace me with std::numbers::pi when C++20 is there 1 // TODO: should this run through dispatch on this and other? 1 # TODO: These are the modules that cannot be observed 1 // TODO: contiguous is called for further JIT optimizations. 1 # TODO: verify that weight comply with TRT structured sparsity requirements: 1 // TODO: support for exceptions 1 // TODO: Not compatible with embedding dim larger than maxThread 1 // TODO: This at::abs() call is used so that the at::abs() call in the 1 // TODO: this is currently too wide. It detects whether a store-target 1 // TODO: try passing the "mapped" file directly to cuModuleLoadCall instead of using an intermediate buffer 1 // TODO: use PackAMatrix if filter_qparams_[0].zero_point == 0 1 // TODO: it would be nice if we could use 1 # TODO: move these out 1 # TODO: maybe orgnize this better (e.g. break down to more functions) 1 * FIXME: You can move this function to Generator.cpp if the algorithm 2 // FIXME: Consider assigning over existing elements, rather than clearing & 1 // TODO: we should comply to codegen type promotion. 1 // TODO: it's possible to make the _out variant to be a primal function and implement linalg_eigh on top of _out 1 // FIXME: stride should be optional 1 // TODO Making operator== noexcept if underlying type is noexcept equality 1 // TODO: replace (module, method_name) with graph? 1 // TODO: handle scaling factor when it's not constant 1; 1 node_->kind() == aten::cat, "TODO: generalize logic"); 1 // TODO: multi-scale histogram for this thing 1 // TODO: is weight required to be contiguous? 2 // TODO: if necessary, use dispatcher. 1 // TODO Support more types 1 * TODO: we could consider exposing API to allow custom registration of parsing 1 // TODO: Why?! Can't we just flip the order here... 1 // TODO: figure out how to commonize this with int8 quantize 1 // TODO: provide guarantees that compiler won't optimize this out 1 # TODO: Now, there is something interesting going on here. 
In the code below, 1 // TODO: add invariants 1 # TODO generalize this for more things 1 // TODO: we should probably get a list that's close to what our fuser handles 1 # TODO: RuntimeError: While computing batched gradients, 1 # TODO: support more than just LSTM 1 // TODO: out op variants 1 # TODO: add qat.Conv1d 1 //TODO handle multi-return functors 1 # TODO: eliminate mask_input as unnecessary when using masked divide. 1 // FIXME: this should be (and was) Scalar::toTensor, but there is currently no way 1 // TODO: to support more data type 1 // TODO: different iterator types for sizes & strides to prevent 1 // TODO: use current device id from thread local instead of passing gpu in 1 // FIXME: Current Alias analysis fails to track subvalues. 1 // TODO: uncomment the following when passing incorrectly sized 'result' is not allowed 1 #include // TODO: remove, debugging only 1 // TODO: The kernels are copied from fbgemm_gpu, we should dedup them later 2 // TODO: replace with at::zeros when it's implemented for sparse csr 2 // TODO: Enable view in parser by detecting non-alias view operation 1 // TODO: insert CUDA's async stream waits; tracing and counters 1 // TODO: Add List[bool] once .to> doesn't throw an error 1 // TODO: if we fail GlobalInit(), should we continue? 1 // TODO: check that listeners are not relying on prepareForDeregistration() 1 // TODO this isn't a scalable way to determine parallelism. 1 # TODO: Check tensor types for ops 1 # TODO: make into assert 1 # TODO: Remove string escape once Python-3.6 no longer supported 1 // TODO: output 3 & 4 are not created 1 // TODO: check for casting once it's supported 1 # TODO: Add Beta-Laplace KL Divergence 1 // TODO: we should make the tradeoff here to use thread_local instead of global 1 # TODO: this list might be incomplete. 1 // TODO: Move this to fixed_divisor.h 1 # TODO: may need to change the key to Node regenerate the map in each transformation, 3 # TODO: Use a real parser here; this will get bamboozled 1 # TODO: remove special case for operator.getitem 1 // TODO: once modules are first class in the interpreter and methods are not 1 // TODO: we can consider preallocating and pre-filling the args vector. 1 // TODO: Remove template? 1 // TODO remove after TensorOptions rationalization 1 // TODO: better error message 1 // TODO: handle cases where we need to generate > 2^32 element tensors 1 # TODO: Move this into ONNX main library 2 # TODO: Add ContinuousBernoulli-Laplace KL Divergence 1 // TODO: refactor this file (one per namespace) 1 // TODO: Remove once we fully migrate to non-blocking mode. 1 # TODO: See if we can extract GPU vs CPU information from the PyTorch model 1 // TODO: named constructor to avoid default initialization 1 TODO: @kefeilu: this function's body should be moved into the actual calling 1 # TODO use glob 1 // TODO: this can return MaybeOwned 1 # FIXME: bfloat16 backward support likely depends on CUDA11+ 2 // TODO: Replace the link to the documentation once it's available. 1 # TODO: delete this list once we make all nn_tests work 1 # TODO: This could get us in recomputation trouble if b.expr is nontrivial 1 // TODO: We should add the following, but we need to go through schedulers 1 // TODO: fix fb internal use-case so that it doesn't trigger this internal assert when the base is not a view. 2 // TODO this is deprecated but we don't throw a warning because a lot of ops in 1 // TODO: (@krshrimali) Try inheriting from TensorIteratorBase instead. 
1 # hardcoded for now, TODO: expose the api to user, 1 // TODO: Set to 1? 1 // TODO: Tuning NumThreads for sum_squares 2 // TODO: remove? 1 Callable, # fp32 op type (TODO future PR: add quantized op type) 1 // TODO: masked_scale_cuda is to be removed, see PR #63937 1 //! TODO: this can be largely expanded to look at complete 1 // TODO: Clean up what remains here 1 # TODO: Manually add `self.param_groups` if using a functional 1 # TODO: proper overriding analysis when implementing class inheritance 1 // TODO: Extend support for attribute of type List[Tensor] etc. 1 // TODO: generalize logic to for other tensor input ops when they are 1 // TODO: Currently we only support (*, Sparse) combination for (tensor.layout(), tensor.grad.layout()) 1 // TODO: should we support runtime compilation with updated dynamic shape; 1 // TODO: even though this API is currently used **only** in codegen to 1 // TODO: consider storing namespace separately too 1 // TODO: allow abstract kernels to reuse generated kernels from common pool 1 # TODO: test coverage for mixed types inputs. 1 // TODO: When Variable is added, delete these constructors 1 /// TODO: This function encourages bad behavior (assuming CUDA is 1 // of graph inputs. TODO: remove 1 # TODO: FIXME: complex inputs requiring grad error in forward 3 * TODO: Look into changing the threading semantics of Generators in ATen (e.g., 1 // TODO: eliminate this conditional when zero-size dims supported correctly 1 // TODO: find a better metric in using ldg or not. Support different dtypes. 1 // FIXME: remove const_cast once unary_op_impl_out is updated 1 # FIXME: improve precision 6 // TODO: avoid spilling W by breaking out the non-padded vs padded case. 2 # TODO: we don't currently do this functions that are recursively 1 // TODO: Blegh, bare references 1 // TODO: could transform up to 2 other dims in the same cuFFT operation 1 // TODO: Use liveness analysis to catch more general scenario 1 # TODO: byte-for-byte compatible with old codegen behavior - it's incorrect to assume 1 // TODO (Ashkan): Disabling temporarily. 1 // TODO: Assert it's an ATen identifier??? 1 // TODO: assert all provided preferred roots are in the history of reference 1 # TODO: FIXME: cholesky_inverse throws an error in forward when requires_grad=True 1 # TODO: merge this with elementwise bench 1 # TODO: @krshrimali, once to_numpy method in SampleInput class is modified to take None inputs, 1 # TODO (add more FileCheck signature) 1 // TODO: eliminate me 1 # TODO: fix LSTM handling in eager mode static quant and remove this 1 // TODO: what's an "inverted interval"? Open on the left 2 // TODO: Should this actually be in launch params? 1 # TODO: dedup this branch 1 // TODO: See if it's possible to use those directly. 2 // TODO Use fbgemm kernel to pack values 2 // TODO: This is a temporary approach to allow C++ thread to correctly 1 # TODO: clean up old codegen behavior 1 # TODO: support regex as well 1 // TODO: Could consider putting some of 1 # TODO: move the exceptions to proper locations 1 // FIXME: Not actually doing floor division (#43874) 2 # TODO: Might need a fix in torch group_norm module 1 // TODO: for now don't attempt partial factorization of this 2 # TODO fix when https://github.com/python/mypy/issues/2427 is address 1 # TODO: look into using weakref here instead. 
2
# TODO (zaf): Inherit from `quantized.LinearPackedParams` (T83294430) 1
// TODO: no ifs for now 2
# TODO: this WAR is for https://github.com/pytorch/pytorch/issues/18524 1
# TODO: move DefaultFuseHandler 1
# TODO: Maybe make these names match the original. 1
# TODO: some signatures of nanmedian do support out 1
// TODO: try to remove this 1
// TODO: more user friendly API 3
# TODO: rename this file to config_utils 1
# TODO: Remove this once ScriptModule supports registering None buffer 2
// TODO: we can avoid this guard by moving operations 1
# TODO: do not run this twice on input and output 1
// TODO: get rid of temp memory use 1
// TODO: Maybe add better logging here. 1
// TODO: support clamp_min.Tensor(Tensor self, Tensor min) -> Tensor 1
// TODO: this step would not be deterministic, because valuesBetween isn't 1
// TODO: Remove once we migrate everything to non-blocking mode. 1
// TODO: DebugUtil will be upstreamed after LazyTensor is in. 1
// TODO: Ideally we only add AutogradBackend key when the tensor requires 1
int64_t model_version); /* TODO: T90339189 deprecate all v3 when v3 models 1
// TODO: `aten::sum` is too flexible, we should restrict for a better 1
# TODO: add limited pickling support for sharing an iterator 1
# TODO: fix bug in the documentation for svd_lowrank: 1
"TODO: support more cuDNN activation modes"); 1
// TODO: get std::forward<> to work 1
// FIXME This does some unpickling, which could be a bit expensive: 1
// TODO: support color jitter and color lighting in gpu_transform 2
# TODO: switch to zero_point.item() after adding JIT support 1
// TODO We still save ws is because of the current design of workspace and 1
// TODO: let's check the data type for buffer and skip if it's good 1
# TODO: make attentions a generic state 1
// TODO: For real input, perform rfftn then mirror with conjugate symmetry 2
// TODO: check if objects have been freed from time to time 1
# TODO: may need to change the mapping when we support dynamic quantization 1
// TODO: allow abstract kernels to use multiple generated kernels 1
* found matches, no nodes in the subgraph alias with each other). TODO: check 1
// TODO: is_reduction is too hacky here. we should categorize operation types 1
// TODO this leads to ambiguous cases (NC11) to be always treated as contiguous 1
// TODO: fill? 1
// TODO: Maybe check that only the batch dimension is changed? 1
// TODO: handle epilogue 2
// TODO: be more explicit about the full key set at call sites so we 1
// TODO remove the mutation here 1
// TODO: rename flags to C10 1
# TODO: maybe make "pattern" to be a list of patterns 1
# TODO: look into rewriting with early return and getting loop unrolling to fire 1
/// // TODO: when tensors are stored in the pickle, delete this 1
# TODO: check gradients for parameters, not just inputs 1
# TODO: When PyTorch drops the support for Python 3.6, it can be converted 1
// TODO : init value support or remove. 1
// TODO: Look into using DepthFirstGraphNodeIterator 1
// TODO: Deprecate this instancecheck entirely. It's here to make 1
// FIXME: warn if this is the case 1
# TODO: MSECriterion weight 1
# TODO More precise types here. 1
# TODO: if statement only here to tell the jit to skip emitting this when it is None 1
# TODO make these types more precise 1
// TODO This whole file should be deleted and replaced with the mechanism 1
constexpr int thread_work_size = 4; // TODO: make template substitution once we decide where those vars live 1
// TODO: std::vector -> std::vector 1
# TODO: this doesn't work with RNN ops 1
# TODO: This is a little inaccurate, because it will also pick 1
// TODO: consider storing shape compute graph directly on operator, 1
// TODO: if we extend TensorIterator to accept 3 inputs, 1
# TODO: Should handle optional here? 2
// TODO: need to raise an error when you impl a function that has a 2
// TODO: Handle the storage_order properly to get the NCWH. 1
// TODO: change it to CUDAGuard 1
// TODO: Would be great if we didn't need this. 1
// TODO: TO be removed, once this properly works from libkineto 1
# TODO: clamp shares tensors among its sample inputs --- we should prohibit this! 1
// TODO: why is this optional? 2
// TODO: for_blob produces non-resizable tensors, we might want this to be 1
// TODO: make this more const correct 1
// TODO: The new CUDNN_BATCHNORM_SPATIAL_PERSISTENT mode was 1
// TODO: Need to bump 'num_expected_refs' here when we support post_hooks for 1
# FIXME: clone indices on construction. 1
// TODO: Maybe add better logging here. 29
* FIXME: Remove support of the legacy state in the future? 1
// TODO: only use this if necessary (add a pass to find all shared ivalues, 1
// TODO: The Python version also accepts arguments 1
// TODO: Check stride and indicate if the tensor is channels-last or non-contiguous 1
// TODO: Implement precison-based formatting 1
// TODO: Seriously consider writing the derivative formulas for 1
// TODO: change to a proper error report 4
# TODO: make use of reduce like below when JIT is ready with the missing features: 1
# FIXME: ldexp does not accept scalar inputs 1
// TODO: merge this loop below. 1
// TODO: [T87567124] Fully implement chunk in Metal shader 1
# TODO: ContrastiveNorm2d 1
// TODO: add a check for OpArgCount and op_type 1
* TODO: We may want a BFS version of this code to extract ILP, not implemented 1
// TODO make this an iterator and simply promote it on insertion. 1
// TODO: Remove annotateInputShapes pass when TraceGraph can also capture 1
// TODO: remove this, there are no codegenerated checks for devices yet 1
// Zero-fill undefined grads (TODO: do this more efficiently) 1
// TODO: the operator depends on output being set to 0 before the run 1
// TODO: is it worth optimizing this loop via padding in C? 1
// TODO: merge with the binary counter parts 1
// TODO: No version; c.f. https://github.com/Maratyszcza/NNPACK/issues/165 1
// TODO: if neither *at nor *bt is 1, ensure they are identical 1
// TODO: either decompose composite ops like slice or add handling here 1
return at::dequantize(out); // TODO: optimized kernel that outputs fp32 so 2
// TODO: Review const model, and objects 1
// TODO: debugging mode to see the qualifier. We definitely 1
raise AssertionError("TODO not sure if there are other valid types to handle here") 1
// TODO: Revisit this if Bufs need to be cloned as well. 1
// TODO add factor as an input, need to check Split::Split during validation 1
// TODO: This is a temporary solution. We should pass enough information to 1
// TODO: check perf vs dedicated kernel. 2
// TODO: raise warning when parsing deprecated signatures 1
// TODO: make mkldnn tensor serialize... 1
// TODO: There is no need to branch with every element 1
// TODO: getDevice() ? 1
// TODO: make a version that takes an impl argument. Unfortunately, 1
// TODO: parallel_method->getParallelType 1
// TODO: do XLA 1
// FIXME Isn't it too verbose for a library to print logs in normal operation? 1
# TODO Remove once gloo submodule is recent enough to contain upstream fix. 2
# FIXME Once we consolidate the error messages returned by the 1
SparseHIP, // TODO: I think this is not actually used, due to Note 1
// TODO (tugsuu) move this calculation into a seperate step. 1
// TODO: we are following the convention for no good reason; 1
# TODO: 2
# TODO: torch.complex32 when properly supported 1
// TODO: calculate the version_token. 1
# TODO: currently we hard code the root node, which only works for 1
// TODO: Shouldn't all returned results be successful? 1
# FIXME: count_nonzero does not accept keepdim kwarg 1
# TODO: I guess we should do copyreg too? 1
# TODO: Because torch.jit._IgnoreContextManager relies on Python's `exec` method 1
# TODO: for ops with structured_delegate it should check the dispatch table of 1
// TODO: consider TLS (tid + tls counter) 1
// TODO: review if this is computing in double when given a float input 1
// TODO: In C++17 we should be able to use the filesystem header. 1
# TODO: This is to keep same byte-for-byte result as the old codegen - maybe unnecessary? 1
// TODO: Decide what kind of fixed point strategy we will have 1
# TODO: FIXME: lcm doesn't support scalars 1
// TODO: Think of a smarter way to leverage tbb::thread_arena to limit the 1
//TODO maybe think about unifying offset calculators and reuse 1
// TODO: simplify 1
// TODO: see if this pass can be replaced with peephole pass 1
# TODO: Need to benchmark the performance of lowering linear as fully_connected versus 1
// TODO: dont require dimensions of tensors to be set AOT ? 1
// TODO: merge this check up 1
// TODO - smallvector here ? 1
// TODO: This is awful code. Also it doesn't work on Windows. 1
# FIXME: the docs say that persistent_id should only return a string 2
// TODO: update this to become a static assertion 1
# FIXME: bfloat16 backward support likely depends on CUDA11+ and SM53+ 1
# TODO: refactor this to use iterate_and_apply 1
// TODO: Make traversal items local to this function. 1
// TODO: Inline intermediate operations (avoid inlining unrolled/vectorized 1
TODO: maybe rename this to TensorValueOpQuantizeHandler 1
# FIXME: both derivatives are implemented incorrectly 1
// ReductionSchedulerMultiDimNonFastest TODO: test reenablement to make 1
// TODO: should be inputs 1
// TODO: Abstract stride logic to reuse with producer indexing 1
# TODO: state dict offloading, activation offloading 1
// TODO: measure which default value will give better 1
// TODO: Improve this once D31357486 is landed. 1
// TODO (zach): we should consider skipping tensor factories in the cases 1
// TODO: this is not needed as we are only using the first val 1
// TODO: Make this an aten function and replace as_strided_qtensorimpl once that is done. 1
// TODO: support cast of output types 2
# TODO: Talk to ONNX about unconditional cast of scalar to float 1
# FIXME: Undefined behavior sanitizer 1
# FIXME matmul(x,y) + bias currently goes through jit AD, and backward formula in AD is not optimized for this 1
# TODO: byte-for-byte compatible with old codegen behavior - should clean up 1
# TODO: Merge these two templates together in the future once TorchScript syntax is improved. 1
// TODO: unique_name 2
// TODO: Make this more sophisticated. A value being the same as another value 1
# TODO: Figure out if this is safe. It seems like when generating the type signatures for 1
# TODO: remove this as onnx opset 11 spec allows negative axes 1
# TODO: fix this special case in PythonArgParser? 1
# TODO: Annotate with TypedDict when 3.8 is the minimum supported verson. 1
input.ndimension() == 4 && // TODO: 5-D contiguous depthwise is not supported yet, need benchmarks 1
# TODO after stopping workers, wait at least monitor_interval*2 for 1
//! TODO: This implementation looks at common producers only, since common 1
// TODO: another copy paste from jit, refactor so it's usable from both 1
# TODO: rename to something more general 1
# TODO: Stop hardcoding that the output type is a Tensor. Note 1
# TODO: evaluate optimizing this if needed. 1
// TODO: Remove this global var 1
// TODO: Do these make sense? 1
# TODO (refactor) this is duplicated, maybe have a helper function 1
# TODO: support check for standalone module 1
self.model = model # TODO: Need to figure out how to load without this. 2
// TODO: would be nice if there were easy facility to look at uses and see 1
// TODO: check the op_type and make a real decision 3
// TODO: Instead of going up the loop nest we should go through the indices in 1
// TODO: Enable view in parser by detecting non-alias view operation 1
// TODO: What if inputSizes is not of the expected dimensionality? 1
# TODO: namespace threshold in 'nn' 1
// TODO: Return an ArrayRef instead (and delete the singleton while you're at 1
// TODO: Our current linear mode impls use unbound indices 1
// TODO A previous implementation of alias analysis always accessed 1
// TODO: out_axis_dim is assumed to be the same as the extent of 1
// TODO: To support all features of MemoryFormat::Preserve we need to add 1
// FIXME: use occupancy calculator instead 1
# TODO: handle non-tensor inputs 2
# FIXME: would be nicer if TensorOptions was optional based; not adding default arguments for options given 1
// TODO: I'm not really sure if we're actually obligated to traverse PyObject 1
// FIXME: if input < kVecSizeInFloat, can't vectorize at all 1
// TODO: fetch scalar type from Tensor? But it doesn't really matter... 1
# TODO: Add support for comparing meta tensors. See https://github.com/pytorch/pytorch/pull/67032. 1
// TODO: Consider putting the stub definitions in another class, so that one 1
// TODO: contiguous can be made to preserve the memory format 1
// TODO: extract & guard profile_ivalue; but how do we restore it??? 1
// TODO: include all the other ways of adding these args. 1
// TODO: we will need a kernel_ir cloner to make this 1
# TODO: improve error propagation 1
# TODO: something smarter 1
at::AutoDispatchBelowAutograd guard; // TODO: remove 4
// TODO: fbgemm::Quantize doesn't support taking in the 1
// TODO: Are we sure these tensors will always come into this fucntion with the 1
# TODO: merge with default static mapping 1
# TODO: Consider defining some aliases for our Union[...] types, to make 1
for (const auto i : c10::irange(N)) { // TODO: multithreading 1
TODO: add a README when it's more stable 1
# TODO: Provide more useful diagnostics. 1
// TODO Reuse stack vector instead of allocating? 1
# FIXME: sum does not support passing None to dim 1
# TODO: consider time-bound constraints as well. 1
// TODO: Improve the performance of this by figuring out a better approach. 1
// TODO: preallocate `runArgs` during compilation and fill in values where 1
// TODO: RpcCommandBase should have an abstract execute() method that we can 1
# TODO: test_non_contiguous_tensors doesn't handle case where output is not a singleton (such as 1
# TODO: Remove this when `make_tensor` supports excluding `0`. 1
# TODO: FIXME: jiterator does not support casting to complex outs 1
// Half. TODO: __truncdfhf2 1
# TODO: This only works with new style gcc and clang (not the old -faddress-sanitizer). 1
// TODO to support single process multiple devices and multi device modules, 1
// TODO: we might need to do two pass to avoid adverse memory allocations 1
# FIXME: isclose does not accept scalar inputs 1
// TODO: include this as warning once we have a more consolidated 1
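For readers who want to reproduce a tally of this kind, the following is a minimal sketch in Python. It is not the tool that generated this summary; the regex, the file-extension filter, the helper name collect_todos, and the output layout are illustrative assumptions only, and the real summary above clearly records more context (comment markers, surrounding code) than this sketch does.

# Minimal sketch, assuming a local checkout of the source tree: walk the
# files, collect TODO/FIXME comment texts, and print a "Text Count" table.
import re
from collections import Counter
from pathlib import Path

# Capture from the first TODO/FIXME token to the end of the line
# (simplified: the summary above also keeps comment markers and context).
TODO_RE = re.compile(r"(?:TODO|FIXME).*")

def collect_todos(root: str, exts=(".py", ".h", ".cpp", ".cc", ".cu", ".mm")) -> Counter:
    counts: Counter = Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in exts:
            continue
        try:
            lines = path.read_text(errors="ignore").splitlines()
        except OSError:
            continue
        for line in lines:
            match = TODO_RE.search(line)
            if match:
                counts[match.group(0).strip()] += 1
    return counts

if __name__ == "__main__":
    todos = collect_todos(".")
    print(f"Summary: {sum(todos.values())} instances, {len(todos)} unique")
    print("Text Count")
    for text, count in todos.most_common():
        print(f"{text} {count}")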