models/nlp/electra/run_pretraining.py (7 lines):
- line 9: TODO: Combine two segments into a single example. https://github.com/google-research/electra/blob/master/build_pretraining_dataset.py
- line 10: TODO: Add zero-padding for shorter sequences
- line 64: # TODO: Limit code duplication between train_step and val_step.
- line 67: # TODO: Re-add validation step
- line 82: # TODO: Make temperature a hyperparameter
- line 151: # TODO: Check that accuracy_mask is different
- line 280: # SageMaker may have some extra strings. TODO: Test this on SM.

models/nlp/common/arguments.py (4 lines):
- line 40: # TODO: Change this to per_gpu_train_batch_size
- line 96: # TODO: Move this to PathArguments?
- line 107: # TODO: Pre-layer norm is not yet supported in transformers. PR is at https://github.com/huggingface/transformers/pull/3929, but maintainers are unresponsive.
- line 163: # TODO: Remove these since they're a little too specific

models/nlp/albert/run_squad.py (3 lines):
- line 238: # TODO: Improve. If only tf.keras.clone_model(model) worked.
- line 440: # TODO: Abstract out to specify any checkpoint path
- line 448: # TODO: Abstract out to specify any logs path

models/nlp/albert/run_pretraining.py (3 lines):
- line 162: # TODO: On iteration 0, loss=11 and loss_scale()=32768, so scaled_loss=inf.
- line 181: # TODO: Does placing this clip before or after allreduce affect accuracy?
- line 368: # SageMaker may have some extra strings. TODO: Test this on SM.

models/vision/detection/awsdet/models/anchor_heads/retina_head.py (3 lines):
- line 78: positive_fraction=1.0, # no sampling TODO: pass sampler as arg into anchor target generator
- line 82: #TODO make losses package common to all models
- line 278: TODO: move to utils

models/nlp/common/optimizers.py (3 lines):
- line 19: wd_schedule = train_args.weight_decay # TODO: Get weight decay schedule working.
- line 40: TODO: If weight decay is decoupled from the gradient update, then we need to decay the weight decay
- line 44: TODO: Unclear if LAMB needs a weight decay schedule. If so, that would explain why performance gets

models/vision/classification/models/resnet_evo.py (2 lines):
- line 313: use_bias: whether to use biases for convolutional layers or not #FIXME:
- line 365: if not backend.is_keras_tensor(input_tensor): # FIXME: dead code

legacy/models/resnet/tensorflow/train_imagenet_resnet_hvd.py (2 lines):
- line 642: inputs = features # TODO: Should be using feature columns?
- line 686: loss = tf.identity(loss, name='loss') # For access by logger (TODO: Better way to access it?)

models/vision/detection/awsdet/utils/runner/hooks/lr_updater.py (2 lines):
- line 39: #TODO param groups not supported
- line 63: # TODO: setting different learning rates for different param groups unsupported

models/vision/detection/awsdet/utils/runner/runner.py (2 lines):
- line 171: # TODO: move this method out of runner
- line 205: list: Current learning rate (#TODO: support individual LR for param groups)

models/vision/detection/awsdet/core/anchor/anchor_target.py (2 lines):
- line 60: Used to provide a normalizing factor for bbox regression loss calculation. TODO: revisit this implementation
- line 142: # TODO: handle scenario where all flags are False

models/vision/detection/awsdet/models/detectors/faster_rcnn.py (2 lines):
- line 46: #TODO: delegate to assigner and sampler in the future
- line 153: # AS: currently we limit eval to 1 image bs per GPU - TODO: extend to multiple

models/vision/detection/tools/test.py (1 line):
- line 124: # TODO: support multiple images per gpu (only minor changes are needed)

models/vision/detection/awsdet/datasets/builder.py (1 line):
- line 10: #TODO: Handle cases with multiple datasets, etc.
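The albert/run_pretraining.py TODO at line 162 reports that on iteration 0, loss=11 and loss_scale()=32768 give scaled_loss=inf. This quick NumPy check shows one plausible cause. It assumes the multiplication happens in float16 (mixed precision); that assumption is not confirmed by the TODO itself. The product 11 × 32768 = 360448 is larger than float16's largest finite value, 65504, so it overflows to inf. The same product in float32 stays finite.

```python
import numpy as np

# Values quoted in the TODO for iteration 0.
loss = np.float16(11.0)
loss_scale = np.float16(32768.0)  # 2**15, exactly representable in float16

# Assumption: the scaled loss is computed in float16.
# errstate silences the overflow RuntimeWarning so the result can be inspected.
with np.errstate(over="ignore"):
    scaled_fp16 = loss * loss_scale

# The same product computed in float32 for comparison.
scaled_fp32 = np.float32(11.0) * np.float32(32768.0)

print(float(np.finfo(np.float16).max))  # 65504.0
print(scaled_fp16)                      # inf
print(float(scaled_fp32))               # 360448.0
```

If the float16 assumption holds, this is the usual motivation for dynamic loss scaling. The scaler skips the step that produced inf and lowers the scale.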
models/vision/detection/awsdet/core/bbox/bbox_target.py (1 line):
- line 90: # TODO: concat proposals list and rois_list

models/vision/detection/awsdet/utils/runner/hooks/logger/text.py (1 line):
- line 42: # TODO: resolve this hack

models/vision/detection/awsdet/models/backbones/resnet_aws.py (1 line):
- line 294: use_bias: whether to use biases for convolutional layers or not #FIXME:

models/nlp/common/utils.py (1 line):
- line 97: # TODO: Merge this with gather_indexes()

ci/frcnn/train_sagemaker.py (1 line):
- line 124: # FIXME: currently hardcoded to SGD

models/vision/detection/awsdet/core/bbox/transforms.py (1 line):
- line 59: max_ratio = np.abs(np.log(16.0/1000)) #TODO: make part of config

models/vision/detection/awsdet/apis/train.py (1 line):
- line 101: # add regularization losses TODO: change multiplier based on type of regularizer (currently L2)

models/vision/detection/awsdet/models/anchor_heads/anchor_head.py (1 line):
- line 98: TODO: build common when assigner and sampler functionality is done

models/vision/detection/awsdet/datasets/loader/build_loader.py (1 line):
- line 27: for each GPU. - TODO: unused

models/vision/classification/train_backbone.py (1 line):
- line 141: #TODO support optimizers choice via config

models/vision/classification/models/resnet.py (1 line):
- line 300: use_bias: whether to use biases for convolutional layers or not #FIXME:

models/vision/detection/awsdet/utils/runner/dist_utils.py (1 line):
- line 20: return hvd.rank(), hvd.local_rank(), hvd.size(), hvd.local_size() #TODO return a dict instead

models/vision/detection/awsdet/models/anchor_heads/rpn_head.py (1 line):
- line 252: TODO: move to utils
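The awsdet/core/bbox/transforms.py TODO at line 59 hardcodes max_ratio = np.abs(np.log(16.0/1000)). That value works out to ln(62.5), roughly 4.135. The sketch below assumes awsdet uses it the way mmdetection's delta2bbox does: to clip the predicted log-scale width/height deltas symmetrically. Under that assumption, no prediction can scale a box by more than 1000/16 = 62.5 or less than 16/1000. The dw array is hypothetical example data.

```python
import numpy as np

# Expression quoted from transforms.py line 59 (currently hardcoded, per the TODO).
max_ratio = np.abs(np.log(16.0 / 1000))

# Hypothetical predicted log-scale width deltas.
dw = np.array([-6.0, -1.0, 0.0, 2.5, 7.0])

# Assumption: deltas are clipped to [-max_ratio, max_ratio], as in mmdetection.
dw_clipped = np.clip(dw, -max_ratio, max_ratio)

# Largest multiplicative change exp(dw) can then apply to a box width.
max_scale = np.exp(max_ratio)

print(round(float(max_ratio), 4))  # 4.1352
print(round(float(max_scale), 4))  # 62.5
print(dw_clipped)
```

Moving the 16/1000 constant into the config, as the TODO suggests, would make this bound tunable per model.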