Summary: 102 instances, 64 unique

| Text | Count |
|------|-------|
| `//TODO: need to make sure opId is not an output layer of the model` | 1 |
| `#FIXME MaxPool2d supports ceil_mode` | 2 |
| `// FIXME: These 2 functions could execute independently in parallel` | 1 |
| `// TODO: remove this line` | 1 |
| `// FIXME: currently we zero init the region if not read output` | 1 |
| `#TODO: target shape does not support -1` | 2 |
| `// TODO: to be removed when we have attention layers` | 1 |
| `#FIXME assume there is 1 output` | 2 |
| `//TODO: implement` | 4 |
| `// FIXME: Currently assume 1st input for 1st layer = batch_size` | 1 |
| `#FIXME fix kernel, stride and padding` | 2 |
| `// TODO: why cublas/cudnn stream is needed here?` | 1 |
| `# TODO: input shape should not contain batch size for now` | 2 |
| `// TODO: only support relu and sigmoid for now` | 1 |
| `//TODO: change to use output instead of recomputing` | 1 |
| `//TODO: currently do not support splitting over the channel dimension` | 1 |
| `//FIXME: for now only consider i * j == NUM_PARTITIONS` | 1 |
| `//TODO: check data type matches` | 2 |
| `//FIXME: uncomment us` | 1 |
| `//TODO: implement measure_forward` | 1 |
| `# TODO: automatically call this function` | 2 |
| `# TODO: finish API` | 2 |
| `// FIXME: Legion tracing currently does not support MUST_EPOCH` | 1 |
| `// TODO: Use index launcher instead of task launcher` | 4 |
| `//TODO: need to make sure opId is not an output layer of the model` | 1 |
| `_remove_long_seq = sequence._remove_long_seq # TODO: make it public?` | 2 |
| `// FIXME: For now, set upper limits Better: Do as follows, but memory is` | 3 |
| `// TODO: missing profiling here` | 1 |
| `// FIXME: even though it is a CPU task, we use data parallelism` | 2 |
| `//TODO: delete fused_op to avoid memory leakage` | 1 |
| `// FIXME: currently assume the final layer has exactly one output` | 1 |
| `// TODO: why cublas/cudnn stream is needed here` | 1 |
| `//FIXME: currently assume continous indices` | 4 |
| `// TODO: Check why cublas/cudnn stream is needed here` | 1 |
| `//TODO: dataloader does not support CR` | 1 |
| `//TODO: Currently we use a task launch, change to index launch for NCCL parameter` | 1 |
| `int k = out1_domain.hi()[0] - out1_domain.lo()[0] + 1; /*TODO: This prints to 5*/` | 1 |
| `# TODO: move check shape into another function` | 4 |
| `// FIXME: currently create gradients for constants since the current auto grad algorithm` | 1 |
| `//TODO: swich to use the Legion dim ordering` | 1 |
| `// TODO: perform prefetch for performance imporvement` | 1 |
| `// TODO add parameter synchronization time` | 1 |
| `# TODO: add unsqueeze` | 2 |
| `Tensor label_tensor_with_final_part;//FIXME: to be removed` | 1 |
| `assert(_input.numDim == 2); // TODO: support dims > 2` | 1 |
| `// TODO: assign priorities` | 1 |
| `//TODO: remove data loaders except single data loader` | 1 |
| `// TODO: add support for meta->a_seq_length_dim >= 0` | 1 |
| `// FIXME: Currently only support the sample dimension for operators with NCCL` | 1 |
| `if(l == metrics_input && metrics_input < (int)layers.size()-1) continue; // TODO: If layer serves for metrics and for further prop` | 1 |
| `// FIXME: Need to be atomic depending on the strategy` | 1 |
| `#TODO: finish API` | 6 |
| `// TODO: temp work, will let users to pick either NCCL or PS` | 1 |
| `#FIXME may be 3` | 2 |
| `#TODO: seperate compute_metrics from backward` | 2 |
| `//TODO: consider reduction dim for conv2d and linear` | 1 |
| `#FIXME assume it is a merge` | 2 |
| `# TODO: add range` | 2 |
| `// FIXME: it seems curand has an internal bug with volume < 4` | 1 |
| `# FIXME BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) args are not in FF` | 2 |
| `//TODO: implement broadcast op` | 2 |
| `# TODO: add cast` | 2 |
| `#TODO: this path has not tested` | 2 |
| `//FIXME: Not functionaly correct.` | 1 |

Comment texts are reproduced verbatim from the source, including their original typos, since the counts apply to the exact strings as written.
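The table itself does not say how these counts were produced. As a rough illustration only, the following is a minimal Python sketch of one way to reproduce such a summary: walk a source tree, match TODO/FIXME markers inside C/C++ (`//`, `/* */`) and Python (`#`) comments, and tally the distinct comment lines. The file extensions, the regex, and the `count_debt` helper are all assumptions, not the tooling actually used for this table.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: tally TODO/FIXME comment lines in a source tree.

The extension set and marker pattern below are assumptions; real tooling
may classify comments differently and therefore produce different counts.
"""
import os
import re
from collections import Counter

# Match an uppercase TODO or FIXME appearing after a comment opener.
MARKER = re.compile(r"(//|#|/\*).*\b(TODO|FIXME)\b")
EXTS = {".cc", ".cpp", ".cu", ".h", ".py"}  # assumed file types

def count_debt(root: str) -> Counter:
    counts: Counter = Counter()
    for dirpath, _, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1] not in EXTS:
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="ignore") as f:
                for line in f:
                    if MARKER.search(line):
                        # Key on the whole stripped line, matching the
                        # "Text" column above.
                        counts[line.strip()] += 1
    return counts

if __name__ == "__main__":
    counts = count_debt(".")  # run from the repository root
    print(f"Summary: {sum(counts.values())} instances, {len(counts)} unique")
    for text, n in counts.most_common():
        print(f"{n:4d}  {text}")
```

Note that total and unique counts depend on normalization choices (whitespace stripping, case sensitivity, whether surrounding code is kept in the key), so a script like this will not necessarily reproduce the 102/64 figures exactly.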