Summary: 50 instances, 46 unique

| Text | Count |
| --- | --- |
| `TODO: Add zero-padding for shorter sequences` | 1 |
| `# TODO: Abstract out to specify any logs path` | 1 |
| `# TODO: Merge this with gather_indexes()` | 1 |
| `# AS: currently we limit eval to 1 image bs per GPU - TODO: extend to multiple` | 1 |
| `#TODO: delegate to assigner and sampler in the future` | 1 |
| `if not backend.is_keras_tensor(input_tensor): # FIXME: dead code` | 1 |
| `# TODO: setting different learning rates for different param groups unsupported` | 1 |
| `positive_fraction=1.0, # no sampling TODO: pass sampler as arg into anchor target generator` | 1 |
| `# TODO: support multiple images per gpu (only minor changes are needed)` | 1 |
| `# TODO: Move this to PathArguments?` | 1 |
| `# TODO: Check that accuracy_mask is different` | 1 |
| `#TODO param groups not supported` | 1 |
| `# TODO: move this method out of runner` | 1 |
| `# TODO: Make temperature a hyperparameter` | 1 |
| `# add regularization losses TODO: change multiplier based on type of regularizer (currently L2)` | 1 |
| `# TODO: Limit code duplication between train_step and val_step.` | 1 |
| `# TODO: Improve. If only tf.keras.clone_model(model) worked.` | 1 |
| `#TODO support optimizers choice via config` | 1 |
| `TODO: Unclear if LAMB needs a weight decay schedule. If so, that would explain why performance gets` | 1 |
| `loss = tf.identity(loss, name='loss') # For access by logger (TODO: Better way to access it?)` | 1 |
| `list: Current learning rate (#TODO: support individual LR for param groups)` | 1 |
| `for each GPU. - TODO: unused` | 1 |
| `# TODO: Pre-layer norm is not yet supported in transformers. PR is at https://github.com/huggingface/transformers/pull/3929, but maintainers are unresponsive.` | 1 |
| `# TODO: Abstract out to specify any checkpoint path` | 1 |
| `wd_schedule = train_args.weight_decay # TODO: Get weight decay schedule working.` | 1 |
| `# TODO: Change this to per_gpu_train_batch_size` | 1 |
| `#TODO make losses package common to all models` | 1 |
| `Used to provide a normalizing factor for bbox regression loss calculation. TODO: revisit this implementation` | 1 |
| `# TODO: Re-add validation step` | 1 |
| `max_ratio = np.abs(np.log(16.0/1000)) #TODO: make part of config` | 1 |
| `# FIXME: currently hardcoded to SGD` | 1 |
| `return hvd.rank(), hvd.local_rank(), hvd.size(), hvd.local_size() #TODO return a dict instead` | 1 |
| `# SageMaker may have some extra strings. TODO: Test this on SM.` | 2 |
| `# TODO: Does placing this clip before or after allreduce affect accuracy?` | 1 |
| `#TODO: Handle cases with multiple datasets, etc.` | 1 |
| `TODO: move to utils` | 2 |
| `# TODO: handle scenario where all flags are False` | 1 |
| `# TODO: resolve this hack` | 1 |
| `TODO: If weight decay is decoupled from the gradient update, then we need to decay the weight decay` | 1 |
| `# TODO: Remove these since they're a little too specific` | 1 |
| `# TODO: On iteration 0, loss=11 and loss_scale()=32768, so scaled_loss=inf.` | 1 |
| `use_bias: whether to use biases for convolutional layers or not #FIXME:` | 3 |
| `TODO: Combine two segments into a single example. https://github.com/google-research/electra/blob/master/build_pretraining_dataset.py` | 1 |
| `# TODO: concat proposals list and rois_list` | 1 |
| `TODO: build common when assigner and sampler functionality is done` | 1 |
| `inputs = features # TODO: Should be using feature columns?` | 1 |