Summary: 576 instances, 550 unique

Count  Text
-----  ----
1      # TODO: Currently double disk usage, saving model in HPO and also saving model in bag
1      # TODO: Clean this code, for loop
1      self.name = name # TODO: v0.1 Consider setting to self._name and having self.name be a property so self.name can't be set outside of self.rename()
1      # TODO: Add functionality for advanced feature generators such as gl_code_matrix_generator (inter-row dependencies, apply to train differently than test, etc., can only run after train/test split, rerun for each cv fold)
1      # TODO: Hyperparam could have non-serializble objects. Save as pkl and loaded on demand
1      # TODO: Currently, if this is a stacker above level 1, it will be very slow taking raw input due to each stacker needing to repeat computation on the base models.
1      # TODO: Add option to disable OOF generation of newly fitted models
1      # TODO: Can instead do `_compute_oob_predictions` but requires post-processing. Skips scoring func.
1      # FIXME: Unsure if this works for custom metrics!
1      s3_utils.delete_s3_prefix(bucket=bucket, prefix=prefix) # TODO: Might only delete the first 1000!
1      # TODO: Resolve raw text feature usage in default feature generator
1      # TODO: QUANTILE VERSION?
1      # TODO: Save dataset to binary and reload for HPO. This will avoid the memory spike overhead when training each model and instead it will only occur once upon saving the dataset.
1      # FIXME: Efficient OOF doesn't work in RAPIDS
1      # TODO: Support S3 paths
1      # TODO: Consider having a removal stage, remove each model and see if score is affected, if improves or not effected, remove it.
1      # TODO: Add fillna_features for feature specific fill values
1      # TODO: MXNetErrorCatcher was removed because it didn't return traceback
1      fit_new_weighted_ensemble = False # TODO: Add as option
1      # FIXME: use FULL features (children can have different features)
1      self._dummy_pred_proba = self._compute_dummy_pred_proba(y[null_indices]) # FIXME: Do this one for better results
1      # TODO Dynamically cache the preprocessor that has been fitted.
1      # TODO: Consider highlighting why a feature was unused (complex to implement, can check if was valid input to any generator in a generator group through feature chaining)
1      # TODO: Move this outside of this function so it is visible even if problem type was not inferred.
1      # TODO: Confirm if GPU is used in HPO (Probably not)
1      # FIXME: imagenet does not work, crashes in validating data due to empty DataFrames.
1      model_obj.save() # TODO: consider omitting for sake of efficiency
1      # TODO: rename to `advice`
1      # TODO: Optimize for inference
1      # TODO: Respect time limit
1      elif params['optimizer'] == 'adam': # TODO: Can we try AdamW?
1      # TODO: Ensure arbitrary feature removal does not result in inconsistencies (add unit test)
1      # TODO: Callbacks not supported in GPU mode
1      # TODO: refit_full does not work as expected: It won't use all data, will just split train data internally.
1      # TODO: Unused? There are not many reasonable situations that seem to require this method.
1      # TODO: ImagePredictor doesn't use problem_type in any way at present.
1      # TODO: categorical features: special encoding: _
1      # TODO: User could be intending to blend instead. Add support for blend stacking.
1      # TODO: stack_name=REFIT_FULL_NAME_AUX?
1      # TODO: Report errors?
1      # TODO: make core_kwargs a kwargs argument to predictor.fit
1      DEFAULT_QUANTILE_MODEL = ['RF', 'XT', 'FASTAI', 'NN_TORCH', 'ENS_WEIGHTED'] # TODO: OTHERS will be added
2      # TODO: Documentation
1      # FIXME: Benchmarks show that CPU KNN can be trained in ~3 seconds with 0.2 second validation time for CoverType on automlbenchmark (m5.2xlarge)
1      # TODO: Consider passing the metadata search space to searcher to avoid having to do this
1      # TODO: Make this work with unlabelled data
1      # TODO: Be aware: When converted to float32 by downstream models, the seconds value will be up to 3 seconds off the true time due to rounding error. If seconds matter, find a separate way to generate (Possibly subtract smallest datetime from all values).
1      # TODO: Note that temperature scaling is known to worsen calibration in the face of shifted test data.
1      # TODO: Move code logic to learner/trainer
1      # TODO: May have to split search space's by problem type. Not necessary right now.
1      # TODO: Remove train_data argument once we start caching the raw original data: Can just load that instead.
1      # TODO Avoid using psutil when lgb fixed the mem leak.
1      # TODO: Split this to avoid confusion, HPO should go elsewhere?
1      possible TODO: although there is a supervised pretext option below, i.e. pretrain using
1      # TODO: Move to autogluon.core
1      # TODO: max_desired_bins and min_desired_bins are currently equivalent, but in future they will be parameterized to allow for flexibility.
1      # TODO: Consider changing how this works or where it is done
1      # FIXME: Sometimes doesn't detect GPU on Windows
1      # TODO: add pip freeze + python version output after fit + log file, validate that same pip freeze on load as cached
1      'allow_post_generators': False, # TODO: This might not be necessary anymore
1      hyperparameter_tune = False # TODO: add as argument with scheduler options.
1      # TODO: Test softclass
1      # FIXME: Consider removing
1      # TODO: Remove all `time_limits` in project, replace with `time_limit`
1      # TODO: v0.1 Proper error catching
1      # TODO: model_name change to model in params
1      # TODO: use get_gpu_count_torch() or some better way once torch models are available.
1      # TODO v0.1: This import depends on mxnet, consider refactoring to avoid mxnet
1      # TODO: v0.1 Document trainer_fit_kwargs
1      # TODO: Try returning bool type instead of uint8
1      # TODO: calculate max emb concat layer size and use 1st layer as that value and 2nd in between number of classes and the value
1      # TODO: Find way to increase flexibility here, possibly through init args
1      # TODO: In future perhaps give option for the reduce_memory_size arguments, perhaps trainer level variables specified by user?
1      # TODO: Would be ideal to not create dir, but still track that it is unique. However, this isn't possible to do without a global list of used dirs or using UUID.
1      # TODO: Add S3 support
1      # TODO: We will want to track size of model as well, even if we early stop before OOM, we will still crash when saving if the model is large enough
1      # TODO: Binary log_loss doesn't work for some reason
1      # TODO: Add support for different namespaces
1      # FIXME: This doesn't assign proper special types to stack features, relying on a hack in StackerEnsembleModel to assign S_STACK to feature metadata, don't do this.
1      X, X_val = self.bundle_weights(X, w, X_val, w_val) # TODO: consider not bundling sample-weights inside X, X_val
1      X[self.sample_weight] = w # TODO: consider not bundling sample weights inside X
1      # FIXME: This is a hack, add refit tag in a nicer way than via the _model_full_dict_val_score
1      ) # TODO: use 'auto' downstream
1      # FIXME: Unknown if this works with quantile regression
1      # FIXME: v0.1 This section is a hack
1      # TODO: Add error in AG_ARGS if unknown key present
1      # TODO: Robert dataset, LightGBM is super good but RF and KNN take all the time away from it on 1h despite being much worse
1      # TODO: After _get_inputs_to_stacker_legacy is removed, this if/else is not necessary, instead pass fit param to get_model_pred_proba_dict()
1      'hyperparameters_fit': self.params_trained, # TODO: Explain in docs that this is for hyperparameters that differ in final model from original hyperparameters, such as epochs (from early stopping)
1      # TODO: Add more granular switch, currently this affects all future KNN models even if they had `use_daal=False`
2      TODO: ensure features with zero variance have already been removed before this function is called.
1      # TODO: There should also be PL added to weighted ensemble model name to notify
1      model_loaded.save() # TODO: Avoid this!
1      # TODO: v0.1 Add reference link to all valid keys and their usage or keep full docs here and reference elsewhere?
1      # Note: Don't use self.params_aux['max_memory_usage_ratio'] here as LightGBM handles memory per iteration optimally. # TODO: Consider using when ratio < 1.
1      # TODO: copy_X=True currently set during regression problem type, could potentially set to False to avoid unnecessary data copy.
1      transform_matrix = vectorizer.fit_transform(text_data) # TODO: Consider upgrading to pandas 0.25.0 to benefit from sparse attribute improvements / bug fixes! https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html
1      # TODO: Take best performance model with lowest inference
1      # TODO: consider returning self._transform(X) if we allow users to specify real dtypes as input
1      # TODO: Add HashingVectorizer support
1      self._features_internal = list(df.columns) # FIXME: Don't edit _features_internal
1      # TODO: Make sure that pretraining on X_unlabeled only happens 1 time rather than every fold of bagging. (Do during pretrain API work?)
1      # TODO: Move this to model code
1      R_BOOL = 'bool' # TODO: R_BOOL/R_BOOLEAN?
1      # TODO: Improve error messages when trying to get oof from refit_full and distilled models.
1      # TODO: This uses private methods in sklearn, could potentially break without warning in future sklearn releases
1      # FIXME: This section is a hack, compute genuine feature_metadata for each stack level instead
1      # TODO: Investigate allowing categorical features as well
1      # TODO: Also enable deletion of models which didn't succeed in training (files may still be persisted)
1      # TODO: object -> obj?
1      # TODO: Consider adding cutoff arguments such as top-k models
1      # TODO: In online-inference this becomes expensive, add option to remove it (only safe in controlled environment where it is already known features are present
1      # TODO: What if path is in S3?
1      # TODO: Expand print statement to multiclass as well
1      # TODO: Dynamic model loading for ensemble models during prediction, only load more models if prediction is uncertain. This dynamically reduces inference time.
1      objective_goal_is_negative = False # Fixed to false if using sklearn scorers # self.model_base.problem_type == REGRESSION # TODO: if objective function goal = lower (logloss, MAE, etc.)
1      holdout_frac=None, # TODO: Potentially error if num_bag_folds is also specified
1      # TODO: remove the switch if mxnet is deprecated
1      # TODO: Remove _get_inputs_to_stacker_legacy eventually, move logic internally into this function instead
1      # TODO: Add time component
1      # TODO: Consider moving to AbstractModel or as a separate function
1      # TODO: path_context -> path for v0.1
1      self._log(log_level, f'\t\tThese features carry no predictive signal and should be manually investigated.') # TODO: What about features with 1 unique value but also np.nan?
1      # TODO: What about datasets that are 100k+? At a certain point should we not bag?
1      # TODO: Add option to minimize memory usage of feature names by making them integers / strings of integers
1      # TODO: could also return an extra boolean column is_nan which could provide predictive signal.
1      # TODO: Consider HPO-enabled configs if training time doesn't matter but inference latency does.
1      # TODO: handle collinear features - they will impact results quality
1      # TODO: Incorporate score, rolling window
1      # TODO: Improve time estimate (Currently pessimistic)
1      # TODO: Add holdout here, it is overfitting with Logistic Regression
1      # FIXME: TODO: v0.1 X_unlabeled isn't cached so it won't be available during refit_full or fit_extra.
1      # TODO: Add argument to output ngrams as a sparse matrix
1      # TODO: Add unit test for this situation, to confirm it is handled properly.
1      # FIXME: This is extremely slow due to unoptimized metric / objective sent to CatBoost
1      # TODO: Have documentation for all AG_ARGS values
1      # TODO: How to deal with models that fail during this? They have trained valid models before, but should we still use those models or remove the entire model? Currently we still use models.
1      problem_type = MULTICLASS # TODO: Check if integers are from 0 to n-1 for n unique values, if they have a wide spread, it could still be regression
1      # TODO: cache_data must be set to True to be able to pass X and y as None in this function, otherwise it will error.
1      # TODO: Code is largely identical to `predict` and `predict_proba` methods, but due to how those methods are coded, we can't call them directly.
1      # TODO: Add special exception type when there are no more new configurations to try (exhausted search space)
3      # FIXME: Avoid hardcoding model names.
1      # TODO: Consider avoiding converting to string for improved memory efficiency
1      # SoftclassCustomMetric = make_softclass_metric() # TODO: remove after catboost 0.24
1      # TODO: explore/add other hyperparameters like weight decay, use of batch-norm, activation-function choice, etc.
1      # TODO: consider adding kwarg option for data which has already been preprocessed by feature generator to skip feature generation.
1      df.to_parquet(path, compression=compression, engine='fastparquet') # TODO: Might be slower than pyarrow in multiprocessing
1      # FIXME: Consider representing categories as int
1      self._ensure_no_duplicate_column_names(X=X) # TODO: Remove this, move pre_memory_usage and post_memory_usage into super().
1      # TODO: Rewrite preprocess() in greedy_weighted_ensemble_model to enable
1      hyperparameters = self._process_hyperparameters(hyperparameters=hyperparameters) # TODO: consider exposing ag_args_fit, excluded_model_types as distill() arguments.
1      # TODO: utilize sample_weight_val in early-stopping if provided
1      # TODO: v0.1 consider adding documentation to each model highlighting which feature dtypes are valid
1      pass # TODO: print detailed bagging info
1      # FIXME This is a HACK. Passing in value -1, 0, or None will only use 1 cores. Need to pass in a large number instead
1      self.summary_writer.add_scalar(tag='train_loss', value=train_loss.asscalar(), global_step=e) # TODO: do we want to keep mxboard support?
1      # TODO: re-enable macos once this issue is addressed
1      # TODO: Allow disable aux (default to disabled)
1      # TODO: Technically isn't OOF, but can be used inplace of OOF. Perhaps rename to something more accurate?
1      # TODO: Consider limiting X to 10k rows here instead of inside the model call
1      # TODO: Consider adding _log and verbosity methods to mixin
1      # TODO: Add special error message if called and training/val data was not cached.
1      # TODO: Add fit() arg to perform this automatically at end of training
1      # TODO: Require is_valid == True (add option param to ignore is_valid)
1      # TODO: Add default values to X_val, X_unlabeled, holdout_frac, and num_bag_folds
1      # TODO: Move this up to top of data before removing data, this way our feature generator is better
1      # TODO: Add ability to track which input features created which output features.
1      # TODO: Remove features from models option for fit_extra
1      # TODO: Augment to generate OOF after shuffling each column in X (Batching), this is the fastest way.
1      num_bag_sets = 20 # TODO: v0.1 Reduce to 5 or 3 as 20 is unnecessarily extreme as a default.
1      # TODO: Consider moving base model info to a separate pkl file so that it can be edited without having to load/save the model again
1      # FIXME: REMOVE THIS, first GluonCV needs to stop depending on AG, as it imports this class
1      bag._child_oof = True # TODO: Consider a separate tag for refit_folds vs efficient OOF
1      # TODO: Advice on feature_importance
2      # TODO: Bin size max increase
1      # TODO: Consider adding special optional AG args for use_original_features,features_to_use,etc.
1      # TODO: if provided, feature_types in X, X_val are ignored right now, need to pass to Learner/trainer and update this documentation.
1      'psutil': '>=5.7.3,<5.9', # TODO: Consider capping to <6.0 instead, capping to 5.9 to avoid possible issues.
1      # TODO: Save gain_df, banned_features
1      # FIXME: Don't add `_FULL` to name
1      # TODO: Remove unnecessary load when no stacking
1      # TODO: R_FLOAT_SPARSE/R_INT_SPARSE/R_CATEGORY_SPARSE?
1      # TODO: Preprocess text?
1      # TODO: if we don't want to save intermediate network parameters, need to do something like saving in temp directory to clean up after training:
1      # TODO: move logic to self._preprocess_nonadaptive()
1      # TODO: Consider adding persist to disk functionality for pred_proba dictionary to lessen memory burden on large multiclass problems.
1      #TODO: abstract general loss shared across tasks
1      # FIXME: with use_bag_holdout=True, the fold-1 scores that are logged are of the inner validation score, not the holdout score.
1      # TODO: Consider replacing with C++ implementation (but requires building catboost from source).
1      # TODO: Check memory after loading best model predictions, only load top X model predictions that fit in memory
1      self._user_params = hyperparameters # TODO: Delete after initialization?
1      # TODO: raise exception if no base models and level != 1?
1      # TODO: Try creating multiple Datasets for subsets of features, then combining with Dataset.add_features_from(), this might avoid memory spike
1      # TODO: Enable raw=True for bagged models when X=None
1      # TODO: enable HPO for RF models
1      # TODO: Advice on fit_extra
1      ensemble_selection.fit(predictions=pred_probas, labels=y_internal, identifiers=None, sample_weight=w) # TODO: Only fit non-nan
1      if self.feature_metadata is None: # TODO: This is probably not the best way to do this
1      # TODO: This doesn't have to be done twice, can update transform matrix based on new vocab instead of calling .transform
1      # TODO: This is expensive to convert at inference time, try to avoid in future
1      # TODO: Add valid_names, invalid_names arguments which override all other arguments for the features listed?
1      # TODO: Add regression support
6      if sample_weight is not None: # TODO: support
1      # FIXME: get_gpu_count by itself doesn't always work for Windows
1      # TODO: Add recursive=True to avoid repeatedly loading models each time this is called for bagged ensembles (especially during repeated bagging)
2      # TODO:
1      # TODO: Add persist_if_mem_safe() function to persist in memory all models if reasonable memory size (or a specific model+ancestors)
1      language_featnames = [] # TODO: not implemented. This should fetch text features present in the data
1      # FIXME: v0.1 Must store final num rows for refit_full or else will use everything! Worst case refit_full could train far longer than the original model.
1      # TODO: Move to a generator
1      # TODO: Only update scores when finished, only update model as part of final models if finished!
1      # TODO: Remove below lines or write logic to switch between using these lines and the multithreaded version once multithreaded version is optimized
1      # TODO: Add argument to define the text preprocessing logic
1      # TODO: v0.1 see if this can be removed and logic moved to greedy weighted ensemble model -> Use StackerEnsembleModel as stacker instead
1      # sample_weight=sample_weight, # TODO: add sample_weight support
1      logger = logging.getLogger(__name__) # TODO: Currently unused
1      # TODO: add inplace option?
1      if sample_weight == AUTO_WEIGHT: # TODO: update auto_weight strategy and make it the default
1      # FIXME: Avoid hard-coding 'image' column name
1      # TODO explore using liblinear for smaller datasets
1      X = self.cleaner.fit_transform(X) # TODO: Consider merging cleaner into label_cleaner
1      # TODO: Unit test this
1      # TODO: Consider making hyperparameters arg in fit() accept lists, concatenate hyperparameter sets together.
1      # FIXME: Hack to be compatible with gluoncv
1      # TODO: special optional AG arg for only training model if eval_metric in list / not in list. Useful for F1 and 'is_unbalanced' arg in LGBM.
1      kwargs = self.initialize(**kwargs) # FIXME: This might have to go before self._preprocess_fit_args, but then time_limit might be incorrect in **kwargs init to initialize
1      # TODO: Add more parameters (possibly pass in binning function as an argument for full control)
1      # TODO: Add documentation
2      kwargs['num_classes'] = self.num_classes # TODO: maybe don't pass num_classes to children
1      # TODO: Improve handling of missing datetimes
1      # TODO: - Differentiate between advanced generators that require fit (stateful, gl_code_matrix) and those that do not (bucket label averaging in SCOT GC 2019)
1      # TODO: add time/ngram features
1      # TODO: hpo_results likely not correct because no renames
1      if type_family != 'object': # TODO: seconds from epoch support
1      model_set = [m for m in model_set if m != model.name] # TODO: Can probably be faster, get this result from graph
1      # TODO: Time allowance not accurate if running from fit_continue
2      # TODO: Preprocess data here instead of repeatedly
1      # TODO: Reduce logging clutter during OOF importance calculation (Currently logs separately for each child)
1      # TODO: This can also be done via setting `oob_score=True` in model params,
1      # TODO: Add more strategies
1      # TODO: Confirm this works with sparse and other feature types!
1      # TODO: use sample_weight?
1      # TODO: Consider having CatBoost variant that converts all categoricals to numerical as done in RFModel, was showing improved results in some problems.
1      # TODO: Replace XGBoost, NN, and Linear Model OHE logic with this
1      # TODO: Add to HPO
1      # TODO: Remove this limitation if n_repeats > 1
1      # TODO: Return transformed data?
1      # TODO: Check if low numeric numbers, could be categorical encoding!
1      # TODO: Consider imputing to mode? This is tricky because training data had no missing values.
1      layer_expansion_factor = 1 # TODO: consider scaling based on num_rows, eg: layer_expansion_factor = 2-np.exp(-max(0,train_dataset.num_examples-10000))
1      # TODO: Advice on distill
1      # TODO: Consider making this fully generic and available to all models
1      # TODO: Gets stuck after infering feature types near infinitely in nyc-jiashenliu-515k-hotel-reviews-data-in-europe dataset, 70 GB of memory, c5.9xlarge
2      # TODO: Remove kwargs?
1      # TODO: adjust max_memory_ratio correspondingly if prefilter_tokens==True
1      # TODO: Identify optimal threshold values for early_stopping based on lack of memory
1      # TODO: Could alternatively add epsilon to y_val_probs in order to avoid.
1      # TODO: Add data argument
1      # TODO: Custom metrics not supported in GPU mode
1      # TODO: Won't work for RAPIDS without modification
1      # TODO: Add Early Stopping support via validation
1      # TODO: Add metadata object with info like score on each model, train time on each model, etc.
1      # TODO: This will break on S3. Use tabular/utils/savers for datasets, add new function
1      # TODO: Depending on max_depth parameter, RFRapidsModel is slower than RFModel.
1      # TODO: Consider moving `_child_oof` logic to a separate class / refactor OOF logic.
1      # TODO: Add feature inference / feature engineering info to get_info output
2      X = X.reset_index(drop=True) # TODO: Theoretically inplace=True avoids data copy, but can lead to altering of original DataFrame outside of method context.
1      # TODO: Could this be optimized to be faster? This might be a bit slow for large data.
1      # self._exceptions_list = [] # TODO: Keep exceptions list for debugging during benchmarking.
1      # TODO: What about time_limit? Metalearning can tell us expected runtime of each model, then we can select optimal folds + stack levels to fit time constraint
1      # TODO: What about nulls / unknowns?
1      # TODO: Consider NaN values as a separate value?
1      pass # TODO: print detailed feature-selection info once feature-selection is functional.
1      # TODO: s3 support
1      # TODO: Given this is so fast, consider doing rapid feature pruning
1      # TODO: Try midstack Semi-Supervised. Just take final models and re-train them, use bagged preds for SS rows. This would be very cheap and easy to try.
1      # TODO: Create new Ensemble Here
1      # TODO: Rename for v0.1
1      # TODO: Add documentation for valid args for each model. Currently only `ag.early_stop`
1      # TODO: v0.1 cleanup and avoid hardcoded logic with model names
1      # TODO: load time
1      # TODO: respect CUDA_VISIBLE_DEVICES to select proper GPU
1      # FIXME: Cleanup self
2      # TODO: Standardize / unify this code with ag.save()
1      # TODO: lightgbm must have a special install to support GPU: https://github.com/Microsoft/LightGBM/tree/master/python-package#build-gpu-version
1      # return postprocess_augmented(X_aug, X) # TODO: dropping duplicates is much more efficient, but may skew distribution for entirely-categorical data with few categories.
1      TODO: Fix Mac OS X warning spam.
1      # TODO: Consider contributing to sklearn officially
1      if self.problem_type != SOFTCLASS: # TODO: remove this after catboost 0.24
1      scheduler_options[1]['time_out'] = orig_time * 0.8 # TODO: Scheduler doesn't early stop on final model, this is a safety net. Scheduler should be updated to early stop
1      # TODO: Note that raw importance will not equal non-raw importance for bagged models, even if raw features are identical to the model features.
1      aux_kwargs = None # TODO: Add as option
1      # TODO: Add TFIDF support
1      # TODO: currently fastai automatically detect and use CUDA if available - add code to honor autogluon settings
1      # TODO: Update so verbose prints at level 20, and adjust calls to save accordingly
1      # TODO: Can these be optimized? What computational cost do they have compared to the default catboost versions?
1      # TODO: Advice on dropping poor models
1      # TODO Add the "labels" option to metrics that will require the label map.
1      # TODO: Add toggle parameter to early_stopping to disable this
1      # TODO: Handle multiple image columns?
1      # TODO: disk size
1      # TODO: Handle cases where self.features_in or self.feature_metadata_in was already set at init.
1      # FIXME: Avoid depending on torch for temp scaling
1      X, w = extract_column(X, self.sample_weight) # TODO: consider redesign with w as separate arg instead of bundled inside X
1      # TODO: Add weight support?
1      # FIXME: Avoid unnecessary refit during refit_full on `_child_oof=True` models, just re-use the original model.
1      # TODO: Add fillna_special_map, fillna_combined_map to increase options
1      # TODO: write out in English the full set of transformations that are applied (and eventually host page on website). Also explicitly write out all of the feature-generator "hyperparameters" that might affect the results from the AutoML FeatureGenerator
1      # TODO: Expand functionality if necessary
1      # TODO: hack to subpress a deprecation warning from fasttext
1      # TODO: this is using unique on X again. Ideally we should integrate
1      # TODO: Investigate speed-ups from GPU inference
1      # FIXME: Don't save folds except for refit
1      # TODO: uncomment once feature_prune is functional: self._summarize('feature_prune', 'feature-selection used', results)
1      self.save() # TODO: This could be more efficient by passing in arg to not save if called by refit_ensemble_full since it saves anyways later.
1      # TODO: currently is buggy
1      self.sample_weight = sample_weight # TODO: consider redesign where Trainer doesnt need sample_weight column name and weights are separate from X
1      # TODO: Add time_limit_per_model
1      # TODO: Add option to crash if null is present for faster predict_proba
1      # TODO: Add more params
1      # TODO: Use code from problem type detection for column types. Ints/Floats could be Categorical through this method. Maybe try both?
1      # TODO: v0.1 Identify why distributed logs are spammed if not suppressed via the below code
1      # self.bagged_mode = True # TODO: Add options for bagging
1      scheduler = optim.lr_scheduler.ExponentialLR(optimizer_embeds, gamma=base_exp_decay) # TODO: Should we be using this in _epoch()?
1      # TODO: text features: alternate text preprocessing steps
1      # TODO: super() call?
1      # TODO: Add option to update hyperparameters with only added keys, so disabling CatBoost would just be {'CAT': []}, which keeps the other models as is.
1      # TODO: Consider some kind of weighting of the two options so there isn't a harsh cutoff at 50
1      # TODO: package_dependencies, package_dependencies_full
1      # TODO: Remove this limitation
1      # TODO: Consider NaN?
1      compute_base_preds = False # TODO: Consider removing, this can be dangerous but the code to make this work otherwise is complex (must rewrite predict_proba)
1      # TODO: Consider deleting these variables after initialization
1      # TODO: This results in a doubling of memory usage of the model to calculate its size.
1      self.weight_evaluation = weight_evaluation # TODO: sample_weight and weight_evaluation can both be properties that link to self._learner.sample_weight, self._learner.weight_evaluation
1      # TODO: Ban KNN from being a Stacker model outside of aux. Will need to ensemble select on all stack layers ensemble selector to make it work
1      # TODO: Clean code
1      # TODO: Advice on leaderboard
1      # FIXME: Doesn't ensure the GPUs are actually usable by the model (MXNet, PyTorch, etc.)
1      # TODO: REMOVE NEED FOR text_data input!
1      val = val['importance'].to_dict() # TODO: Don't throw away stddev information of children
1      drop_unique=False, # TODO: Get the value from child instead
1      # TODO: This call has very high fixed cost with many features (0.7s for a single row with 3k features)
1      # TODO: Can get feature importances of all children of model at no extra cost, requires scoring the values after predict_proba on each model
1      # FIXME: There is a bug in sklearn that causes an incorrect ValueError if performing stratification and all classes have fewer than n_splits samples.
1      elif self.problem_type == SOFTCLASS: # TODO: delete this elif if it's unnecessary.
1      # TODO: Consider making level be auto-determined based off of max(base_model_levels)+1
1      # FIXME: This is a hack, stack feature special types should be already present in feature_metadata, not added here
1      # TODO: This could be preprocess_nonadaptive=True in general, just have preprocess_nonadaptive=False for child models
1      # TODO: maybe need to initialize/hybridize?
1      # TODO this method is generalizable and potentially should be moved out into framework
1      # TODO: Below should not occur until at time of saving
1      # TODO: - Minimize memory usage of DataFrames (convert int64 -> uint8 when possible etc.)
1      ignored_type_group_raw=[R_BOOL, R_CATEGORY, R_OBJECT], # TODO: Eventually use category features
1      # TODO: add option for only top-k ngrams
1      logger.exception('Detailed Traceback:') # TODO: Avoid logging if verbosity=0
1      # TODO: feature_generator_kwargs?
1      # FIXME: Sample weight `extract_column` is a hack, have to compute feature_metadata here because sample weight column could be in X upstream, extract sample weight column upstream instead.
1      # TODO: Potentially move to fit_extra, raise exception if value too large / invalid in fit_extra.
1      # TODO: Improve memory safety
1      'min_data_in_leaf': Int(lower=2, upper=60, default=20), # TODO: Use size of dataset to set upper, if row count is small upper should be small
1      self._user_params_aux = hyperparameters.pop(AG_ARGS_FIT) # TODO: Delete after initialization?
1      # TODO: Refactor as a dictionary mapping as done in LGBM
1      # FIXME: Don't do this, fix the outer code to not require this
1      # self._save_preprocessor() # TODO: should save these things for hyperparam tunning. Need one HP tuner for network-specific HPs, another for preprocessing HPs.
1      'num_leaves': Int(lower=16, upper=96, default=31), # TODO: Use row count and feature count to set this, the higher feature count the higher num_leaves upper
1      # TODO: fill_nan add additional options: group_rares, possibly percentile based
1      # TODO: Consider adding to get_info() output
1      # TODO: Consider adding 'skex' to 'all'
1      # TODO: Add hashing trick if minimize_memory=True to avoid storing full original mapping
1      # TODO: Optimize below to perform better before using
1      # TODO: can_infer is technically more complicated, if an ancestor can't infer then the model can't infer.
1      k_fold=num_bag_folds, # TODO: Consider moving to fit call
1      # TODO: consider moving to self._remove_features_out
1      # TODO: Can skip features which were pruned on all models that model depends on (Complex to implement, requires graph representation)
1      # TODO: LabelEncoderTransformer
1      self.num_dataloading_workers = 0 # TODO: >0 crashes on MacOS
1      # TODO: Enable HPO for KNN
1      # needed for gluon-cv TODO: remove after gluon-cv is updated https://github.com/dmlc/gluon-cv/issues/1633
1      # TODO: Experimental, currently unused
1      AG_ARGS_ENSEMBLE = 'ag_args_ensemble' # Contains arguments that impact model ensembling, such as if an ensemble model is allowed to use the original features. # TODO: v0.1 add to documentation
1      # TODO: Args such as max_repeats, num_folds
1      # TODO: Remove `_set_oob_score` after sklearn version requirement is >=1.0
1      # TODO: Add options to specify mean/median/mode for int/float
1      # FIXME: Right now the upper bound on the number of features we evaluate feature importance at once is determined by our expected feature
1      pass # TODO: print detailed stacking info, like how much it improves validation performance
1      # TODO: Add data info gathering at beginning of .fit() that is used by all learners to add to get_info output
1      # TODO: add aux_kwargs to predictor.fit
1      # TODO: remove this once mxnet is deprecated
1      model.normalize_pred_probas = True # FIXME: Do we need to do this for child models too?
1      # TODO: Add cv / OOF generator option, so that AutoGluon can be used as a base model in an ensemble stacker
1      # TODO: QuantileTransformer in pipelines accounts for majority of online inference time
1      # TODO: The code fixes the crash on mxnet gluon interpreting a single value in a batch incorrectly.
1      # TODO: Still have to change paths of models in trainer + trainer object path variables
1      # TODO: consider moving weight normalization into AbstractModel.fit()
1      # TODO: Add documentation on shared_raw_features usage
1      # FIXME: Raise exception if multiclass/binary and a single val fold contains all instances of a class. (Can happen if custom groups is specified)
1      X_stack_preds = self.get_inputs_to_stacker(X_val, base_models=base_model_names, fit=False, use_orig_features=False) # TODO: May want to cache this during original fit, as we do with OOF preds
1      # TODO: Consider adding final check of validity/that features are reasonable.
1      # TODO: re-enable bayesopt after it's been implemented
1      # TODO: 600 is much better, but increases info leakage in stacking -> therefore 300 is ~equal in stack ensemble final quality.
1      if i == indices_to_check[0]: # TODO: documentation needs to note that we assume 0th index is the 'official' validation performance metric.
1      # TODO: Time allowance can be made better by only using time taken during final model training and not during HPO and feature pruning.
1      base_size = max(1, min(num_net_outputs, 100) / 50) # TODO: Updated because it improved model quality and made training far faster
1      # TODO: v0.1 add tutorial related to this method, as it is very powerful.
1      X_stacker.append(y_pred_proba) # TODO: This could get very large on a high class count problem. Consider capping to top N most frequent classes and merging least frequent
1      # TODO: Can Bagged LinearModels be combined during inference to 1 model by averaging their weights?
1      # TODO: Use DOCS_PACKAGES and TEST_PACKAGES
1      # TODO: Documentation, flesh out capabilities
1      # TODO: Consider adding optional AG args to dynamically disable models such as valid_num_classes_range, valid_row_count_range, valid_feature_count_range, etc.
1      # TODO: Advice on refit_full
1      # FIXME: update to use only torch for TIMM or find a better GPU detection strategy
1      # TODO: maybe integrate this part with the one above
1      # TODO: - Those that do not could be added to preprocessing function of model, but would then have to be recomputed on each model.
1      # TODO: Can be improved further to make it more accurate
1      # TODO: Add log of # of observation counts to high cardinality categorical features
1      # TODO: Custom metrics don't seem to work anymore
1      # FIXME: Below is major defect!
1      # TODO: CV5 instead of holdout? Should be better
1      # TODO: Better solution: Track trend to early stop when score is far worse than best score, or score is trending worse over time
1      # TODO: feature_metadata
1      # TODO: - Semi-supervised learning
1      # TODO: Adjust max_bins to 254?
1      TODO : Add dedicated confusion_matrix function to AbstractLearner
1      self.num_dataloading_workers = 0 # TODO: verify 0 is typically faster and uses less memory than 1 in pytorch
1      # TODO: It could be possible to adaptively set max_iter [1] to approximately respect time_limit based on sample-size, feature-dimensionality, and the solver used.
1      df = pd.read_parquet(path, columns=columns_to_keep, engine='fastparquet') # TODO: Deal with extremely strange issue resulting from torch being present in package, will cause read_parquet to either freeze or Segmentation Fault when performing multiprocessing
1      self.bagged_mode = og_bagged_mode # TODO: Confirm if safe to train future models after training models in both bagged and non-bagged modes
1      # TODO: Ensure reporter/scheduler properly handle None/nan values after refactor
1      # TODO: X can be optional because it isn't needed if fit=True
1      # TODO: Regression does not return NaN for missing rows, instead it sets them to 0. This makes life hard.
1      # FIXME: If rapids is installed, normal CPU LinearModel crashes.
1      # TODO: Handle multiple images in a single image column?
1      # TODO: This can easily be optimized heavily
1      # TODO: Optimize predict speed when fit on kfold, can simply sum weights
1      n_repeats=num_bag_sets, # TODO: Consider moving to fit call
1      # TODO: Do it for all models in the level at once to avoid repeated processing of data?
1      # TODO: make core_kwargs a kwargs argument to predictor.fit, add aux_kwargs to predictor.fit
1      # FIXME: Address situation where test-time invalid type values cause crash:
1      # TODO: Consider merging feature_metadata and feature_metadata_real, have FeatureMetadata contain exact dtypes, grouped raw dtypes, and special dtypes all at once.
1      # TODO: Add option to return feature_metadata instead to avoid data copy
1      # TODO: Potentially set reset_paths=False inside load function if it is the same path to avoid re-computing paths on all models
1      # TODO: Add support for calculating pred_time_test_full for oracle_ensemble, need to copy graph from trainer and add oracle_ensemble to it with proper edges.
1      # TODO: Consider moving this into end of abstract model fit for all models.
3      approx_mem_size_req = data_mem_uasge * 7 + data_mem_uasge / 4 * num_classes # TODO: Extremely crude approximation, can be vastly improved
1      # FIXME: What if all rows in a class are null? Will probably crash.
1      # TODO: Control CPU vs GPU usage during inference
1      if threshold < -100000000: # FIXME: Hacked for regression
1      # TODO: This is a hack! self.predict_prob does not update to use weighted ensemble
1      X_category[column] = X_category[column].astype(CategoricalDtype(categories=category_list)) # TODO: Remove columns if all NaN after this?
1      Assumes baseline validation score has already been computed. TODO: Take into account speedup from parallel feature
1      # TODO: Delete from all the other model dicts
4      # TODO: Add HPO
1      # TODO: Possibly rename to features_in or consider refactoring all feature_generators features_in -> features
1      # eval_metric=self.eval_metric, # TODO: multiclass/binary vision problem works only with accuracy, regression with rmse
1      # TODO: Rename feature_generator -> feature_pipeline for users?
1      # TODO: Use Pool in preprocess, optimize bagging to do Pool.split() to avoid re-computing pool for each fold! Requires stateful + y
1      # TODO: This crashes if a feature is already named 'index'.
1      # TODO: We need to handle user provide custom num_cpus
1      # TODO: Add option to stop if current run's metric value is X% lower, such as min 30%, current 40% -> Stop
1      # TODO: Remove level and stack_name arguments, can get them automatically
1      # FIXME: ImagePredictor crashes if given float time_limit
2      # TODO: Don't use os.makedirs here, have save_parameters function in tabular_nn_model that checks if local path or S3 path
1      # TODO: We should probably uncomment the below lines, NaN label should be treated as just another value in multiclass classification -> We will have to remove missing, compute problem type, and add back missing if multiclass
2      one_hot_threshold = 10000 # FIXME research memory constraints
1      # TODO: no filtering of data-frame columns based on statistics, e.g. categorical columns with all unique variables or zero-variance features.
1      # TODO: If low numeric, potentially it is just numeric instead of date
1      elif self.sample_weight == AUTO_WEIGHT: # TODO: support more sophisticated auto_weight strategy
1      # TODO: DON'T THROW AWAY SAMPLES! USE LARGER N
1      # TODO: Add parameter to add prefix to each generator to guarantee no name collisions: 'G1_', 'G2_', etc.
1      # TODO: Legacy code, still used during training because it is technically slightly faster and more memory efficient than get_model_pred_proba_dict()
1      fit_kwargs=dict(num_classes=self.num_classes, groups=None), # FIXME: Is this the right way to do this?
2      self.features_to_drop = [] # may change between different bagging folds. TODO: consider just removing these from self._features_internal
1      # TODO: Add in documentation that Categorical default is the first index
1      # FIXME: Avoid using private API! (https://github.com/awslabs/autogluon/issues/1381)
1      # self.model.set_verbosity(verbosity) # TODO: How to set verbosity of fit predictor?
1      # TODO: Do the below more elegantly, ideally as a parameter to the trainer train function to disable recording scores/pred time.
1      # TODO: Technically we don't need to re-train the weighted ensemble, we could just copy the original and re-use the weights.
1      # TODO: We may want to throw an exception instead and avoid calling fit more than once
1      # TODO: min_samples_leaf=5 is too large on most problems, however on some datasets it helps a lot (airlines likes >40 min_samples_leaf, adult likes 2 much better than 1)
1      # TODO: Advice on persist
1      # TODO: num_bag_sets
1      mean_squared_error='rmse', # TODO: not supported from default eavl metric. Firstly, use `rsme` refenced by catboost model.
1      # TODO: Add docs to dedicated page, or should it live in AbstractModel?
1      # TODO: Epochs could take a very long time, we may want smarter logic than simply # of epochs without improvement (slope, difference in score, etc.)
1      sample_count=sample_count, worker_count=worker_count, multiprocessing_method=multiprocessing_method) # TODO: Add arguments!
1      # TODO: Can speedup shuffle_repeats by incorporating into X_raw (do multiple repeats in a single predict call)
1      time_limit=time_limit, # FIXME: Also limit augmentation time
1      # TODO: Remove this after simplifying _predict_proba to reduce code duplication. This is only present for SOFTCLASS support.
1      # TODO: Advice on unused features (if no model uses a feature)
1      # TODO: X.fillna -inf? Add extra is_missing column?
1      # FIXME: Switch to adaptive ES
1      language_featnames = [] # TODO: Disabled currently, have to pass raw text data features here to function properly
1      # TODO: ignore models which were killed early by scheduler (eg. in Hyperband). How to ID these?
1      # TODO: Add more granular switch, currently this affects all future LR models even if they had `use_daal=False`
1      # TODO: Add memory optimal topological ordering -> Minimize amount of pred_probas in memory at a time, delete pred probas that are no longer required
1      # TODO: Remove once sklearn < 1.0 support is dropped
1      # TODO: Loading learner from S3 on Windows may cause issues due to os.path.sep
1      # TODO: Stratify? We currently don't know in this function the problem_type (could pass as additional arg).
1      # TODO: Takes extremely long time prior to training start if many (10000) continuous features from ngrams, debug - explore TruncateSVD option to reduce input dimensionality
1      # TODO: pd.Series(['20170204','20170205','20170206']) is incorrectly not detected as datetime_as_object
1      # TODO: Since data preprocessor is fitted on original train_data it cannot account for if
1      # TODO: Add int fillna input value options: 0, set value, mean, mode, median
1      y = y.reset_index(drop=True) # TODO: this assumes y and X had matching indices prior
1      # save_bag_folds = kwargs['save_bag_folds'] # TODO: Enable
1      # TODO: 0.20 causes OOM error with 64 GB ram on NN with several datasets. LightGBM and CatBoost succeed
1      # TODO: Enable easier re-mapping of trained models -> hyperparameters input (They don't share a key since name can change)
1      # TODO: num_bag_sets -> ag_args
1      # FIXME: DAAL OOB score is broken, returns biased predictions. Without this optimization, can't compute Efficient OOF.
2      # TODO: Not necessary to exist after fitting, can just update outer context feature_out/feature_in and then delete this
1      # TODO: Consider moving convert_pred_probas_df_to_list into inner model to ensure X remains a dataframe after preprocess is called
1      # TODO: Consider re-using bins variable instead of making bins_2-7 variables
1      def _predict_tabular_data(self, new_data, process=True, predict_proba=True): # TODO ensure API lines up with tabular.Model class.
1      # TODO: MXNetErrorCatcher was removed because it didn't return traceback,
1      X_nlp_features = pd.DataFrame(transform_array, columns=self._feature_names_dict[nlp_feature], index=X.index) # TODO: Consider keeping sparse
1      # TODO: Rewrite with normalized value counts as binning technique, will be more performant and optimal
1      # TODO: v0.1 add documentation for arguments
1      # TODO no label processing for now
1      # FIXME: Has a leak somewhere, training additional models in a single python script will slow down training for each additional model. Gets very slow after 20+ models (10x+ slowdown)
1      class EmbedNet(gluon.Block): # TODO: hybridize?
1      # TODO: Multiply epochs/n_iterations by some value (such as 1.1) to account for having more training data than bagged models
1      # TODO: v0.1 clean method
1      To apply label-smoothing: teacher_preds='onehot' will use original training data labels converted to one-hots for multiclass (no data augmentation). # TODO: expose smoothing-hyperparameter.
1      # TODO: v0.1 update to be aligned with _set_default_auxiliary_params(), add _get_default_params()
1      # TODO: Finetune this, or find a better way to ensure stability
1      # TODO: Add logging comments that models are serialized on disk after fit
1      self.models = {} # Dict of model name -> model object. A key, value pair only exists if a model is persisted in memory. # TODO: v0.1 Rename and consider making private
1      self._oof_pred_proba = model_base.predict_proba(X=X) # TODO: Cheater value, will be overfit to valid set
1      # TODO: v0.1 Change this to update path_root only, path change to property
1      # TODO: Add more control over these params (specifically early_stopping_rounds)
1      # FIXME: Use ImagePredictor's tuning_data split logic when None, currently this does not perform an ideal split.
1      # TODO: bagged-with-holdout refit to bagged-no-holdout should still be able to return out-of-fold predictions
1      if val_dataset is not None and (not np.isnan(val_metric)): # TODO: This might work without the if statement
2      # TODO: Add support for sample_weight when pseudo is present
1      # TODO: How to reference correct version of docs?
1      # TODO: Use sklearnex instead once a suitable toggle option is provided that won't impact future models
1      # TODO: Remove or fix, likely doesn't work anymore
1      # TODO: Normalize data!
1      # TODO: Consider adding special optional AG args for #cores,#gpus,num_early_stopping_iterations,etc.
1      self.path = self.create_contexts(self.path_root + self.path_suffix) # TODO: Make this path a function for consistency.
1      # TODO: This takes a surprisingly long time to run, ~30 seconds a laptop for 50,000 rows of datetime_as_object for a single column. Try to optimize.
1      # FIXME: Consider counting NaNs as unique values, if unique_counts == 2 (including NaN), then treat as boolean
1      # TODO: Remove name_suffix, hacked in
1      # TODO: inefficient copy of approxes, targets to np.array from provided UniTuple (required for JIT to work)
1      # TODO: predictor.clone()
1      # TODO: Expand to int64 -> date features (milli from epoch etc)
1      # TODO: Add is_persisted() function to check which models are persisted in memory
1      # TODO: Add argument keep_unused, which creates an identity feature generator at each stage to pipe unused input features into the next stage instead of dropping them.
1      # TODO: v0.1 Document get_features_kwargs_extra in task.fit
1      # TODO: Constructor?
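
The inventory above is a raw dump of comment texts and occurrence counts. As a rough illustration of how such a summary could be regenerated, the sketch below is a hypothetical helper (an assumption, not part of the AutoGluon codebase): it walks a Python source tree, counts duplicate TODO/FIXME comment texts, and prints them in the same instances/unique format. Note that the actual report retains the full source line including surrounding code, which this simplified scan does not.

```python
# Hypothetical regeneration script (assumption, not part of AutoGluon):
# scan a source tree for TODO/FIXME comments and print an occurrence summary.
import os
import re
from collections import Counter

TODO_RE = re.compile(r'\b(?:TODO|FIXME)\b.*')


def collect_todos(root: str) -> Counter:
    """Count occurrences of each distinct TODO/FIXME comment text under `root`."""
    counts = Counter()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith('.py'):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding='utf-8', errors='ignore') as f:
                for line in f:
                    match = TODO_RE.search(line)
                    if match:
                        # Key on the comment text from TODO/FIXME onward.
                        counts[match.group(0).strip()] += 1
    return counts


if __name__ == '__main__':
    counts = collect_todos('.')
    total = sum(counts.values())
    print(f'Summary: {total} instances, {len(counts)} unique')
    for text, count in counts.most_common():
        print(f'{count:<5}  {text}')
```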