In FilterPermutation.R, the permutation importance algorithm performs a complete resampling (train and predict) for each permuted column. In Breiman’s original paper introducing the technique for random forests, he used an already-trained model and measured the effect of permuting each feature on the performance of that specific model. This is consistent with how the technique is usually described in the literature and with the scikit-learn implementation. It is also considerably less computationally expensive, since the model is fit once rather than once per feature.
While there are potential upsides to retraining the model for each permutation, it seems like that shouldn’t be the default behavior. I’d like to propose that the default behavior should be:
1. Build the original unpermuted resample result and calculate the performance measure.
2. Shuffle each column one by one and recalculate the performance measure without retraining.
3. Return the result.
There could be an option `retrain`, defaulting to FALSE, that could be set to TRUE when the user wants to refit the model for each permuted column.
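To make the proposal concrete, here is a minimal language-agnostic sketch in plain Python, not the mlr code. Everything in it is an assumption made for illustration: the function names, the toy learner `fit_additive`, the use of mean squared error as the performance measure, and the exact meaning of `retrain=True` (refit on training data with the column permuted, then score on test data with the same column permuted).

```python
import random


def mse(y_true, y_pred):
    """Mean squared error, used here as the performance measure."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_additive(X, y):
    """Toy learner (illustrative only): one least-squares slope per feature,
    each fit independently on centered data. Returns a predict function."""
    n, k = len(X), len(X[0])
    y_bar = sum(y) / n
    x_bar = [sum(row[j] for row in X) / n for j in range(k)]
    slopes = []
    for j in range(k):
        sxx = sum((row[j] - x_bar[j]) ** 2 for row in X)
        sxy = sum((row[j] - x_bar[j]) * (t - y_bar) for row, t in zip(X, y))
        slopes.append(sxy / sxx if sxx else 0.0)

    def predict(rows):
        return [
            y_bar + sum(s * (r[j] - x_bar[j]) for j, s in enumerate(slopes))
            for r in rows
        ]

    return predict


def permute_column(X, j, rng):
    """Return a copy of X (a list of lists) with column j shuffled."""
    col = [row[j] for row in X]
    rng.shuffle(col)
    return [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]


def permutation_importance(fit, X_train, y_train, X_test, y_test,
                           retrain=False, seed=0):
    """Increase in test error when each column is permuted.

    retrain=False: fit once, then only re-predict on permuted test data.
    retrain=True:  refit for every permuted column (assumed semantics).
    """
    rng = random.Random(seed)
    model = fit(X_train, y_train)            # step 1: unpermuted baseline
    baseline = mse(y_test, model(X_test))
    importances = []
    for j in range(len(X_test[0])):          # step 2: one column at a time
        X_test_perm = permute_column(X_test, j, rng)
        if retrain:
            scorer = fit(permute_column(X_train, j, rng), y_train)
        else:
            scorer = model                   # reuse the already-fitted model
        importances.append(mse(y_test, scorer(X_test_perm)) - baseline)
    return importances                       # step 3: return the result
```

With `retrain=False` the learner is fit once regardless of the number of features, whereas `retrain=True` fits it once more per feature, which is where the extra cost of the current behavior comes from.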