Summary: 50 instances, 46 unique

| Text | Count |
| --- | --- |
| `TODO: Add zero-padding for shorter sequences` | 1 |
| `# TODO: Abstract out to specify any logs path` | 1 |
| `# TODO: Merge this with gather_indexes()` | 1 |
| `# AS: currently we limit eval to 1 image bs per GPU - TODO: extend to multiple` | 1 |
| `#TODO: delegate to assigner and sampler in the future` | 1 |
| `if not backend.is_keras_tensor(input_tensor): # FIXME: dead code` | 1 |
| `# TODO: setting different learning rates for different param groups unsupported` | 1 |
| `positive_fraction=1.0, # no sampling TODO: pass sampler as arg into anchor target generator` | 1 |
| `# TODO: support multiple images per gpu (only minor changes are needed)` | 1 |
| `# TODO: Move this to PathArguments?` | 1 |
| `# TODO: Check that accuracy_mask is different` | 1 |
| `#TODO param groups not supported` | 1 |
| `# TODO: move this method out of runner` | 1 |
| `# TODO: Make temperature a hyperparameter` | 1 |
| `# add regularization losses TODO: change multiplier based on type of regularizer (currently L2)` | 1 |
| `# TODO: Limit code duplication between train_step and val_step.` | 1 |
| `# TODO: Improve. If only tf.keras.clone_model(model) worked.` | 1 |
| `#TODO support optimizers choice via config` | 1 |
| `TODO: Unclear if LAMB needs a weight decay schedule. If so, that would explain why performance gets` | 1 |
| `loss = tf.identity(loss, name='loss') # For access by logger (TODO: Better way to access it?)` | 1 |
| `list: Current learning rate (#TODO: support individual LR for param groups)` | 1 |
| `for each GPU. - TODO: unused` | 1 |
| `# TODO: Pre-layer norm is not yet supported in transformers. PR is at https://github.com/huggingface/transformers/pull/3929, but maintainers are unresponsive.` | 1 |
| `# TODO: Abstract out to specify any checkpoint path` | 1 |
| `wd_schedule = train_args.weight_decay # TODO: Get weight decay schedule working.` | 1 |
| `# TODO: Change this to per_gpu_train_batch_size` | 1 |
| `#TODO make losses package common to all models` | 1 |
| `Used to provide a normalizing factor for bbox regression loss calculation. TODO: revisit this implementation` | 1 |
| `# TODO: Re-add validation step` | 1 |
| `max_ratio = np.abs(np.log(16.0/1000)) #TODO: make part of config` | 1 |
| `# FIXME: currently hardcoded to SGD` | 1 |
| `return hvd.rank(), hvd.local_rank(), hvd.size(), hvd.local_size() #TODO return a dict instead` | 1 |
| `# SageMaker may have some extra strings. TODO: Test this on SM.` | 2 |
| `# TODO: Does placing this clip before or after allreduce affect accuracy?` | 1 |
| `#TODO: Handle cases with multiple datasets, etc.` | 1 |
| `TODO: move to utils` | 2 |
| `# TODO: handle scenario where all flags are False` | 1 |
| `# TODO: resolve this hack` | 1 |
| `TODO: If weight decay is decoupled from the gradient update, then we need to decay the weight decay` | 1 |
| `# TODO: Remove these since they're a little too specific` | 1 |
| `# TODO: On iteration 0, loss=11 and loss_scale()=32768, so scaled_loss=inf.` | 1 |
| `use_bias: whether to use biases for convolutional layers or not #FIXME:` | 3 |
| `TODO: Combine two segments into a single example. https://github.com/google-research/electra/blob/master/build_pretraining_dataset.py` | 1 |
| `# TODO: concat proposals list and rois_list` | 1 |
| `TODO: build common when assigner and sampler functionality is done` | 1 |
| `inputs = features # TODO: Should be using feature columns?` | 1 |