Permutation importance with already trained model? #175

Open
Prometheus77 opened this issue Jan 4, 2025 · 0 comments
In FilterPermutation.R, the permutation importance algorithm performs a complete resampling (train and predict) for each permuted column. In Breiman’s original paper introducing the technique for random forests, he used a pre-trained model and observed the effect of permuting a feature on the performance of that already-fitted model. This is consistent with how the technique is usually described in the literature, as well as with the scikit-learn implementation. It is also considerably less computationally expensive.

While there are potential upsides to retraining the model for each permutation, it seems that shouldn’t be the default behavior. I’d like to propose the following default behavior:

  • Build the original unpermuted resample result and calculate the performance measure
  • Shuffle each column one by one and recalculate the performance measure without retraining
  • Return the result

A `retrain` argument, defaulting to `FALSE`, could be set to `TRUE` for users who want to refit the model for each column (the current behavior).
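
For illustration, here is a minimal sketch of what the proposed `retrain = FALSE` path would do, written in base R. The function name `permutation_importance` and the `measure` argument are hypothetical stand-ins for this sketch, not the FilterPermutation.R or mlr3 API:

```r
# Minimal sketch of the proposed default (retrain = FALSE), base R only.
# `permutation_importance` and `measure` are hypothetical names, not the
# mlr3 / FilterPermutation.R API. Assumes `measure` is "higher is better".
permutation_importance <- function(fit, data, target, measure, n_repeats = 5L) {
  # Step 1: score the already-trained model on unpermuted data
  baseline <- measure(data[[target]], predict(fit, data))
  features <- setdiff(names(data), target)
  # Step 2: shuffle each column in turn and rescore WITHOUT retraining
  sapply(features, function(feature) {
    drops <- replicate(n_repeats, {
      permuted <- data
      permuted[[feature]] <- sample(permuted[[feature]])  # shuffle one column
      baseline - measure(permuted[[target]], predict(fit, permuted))
    })
    # Step 3: mean performance drop is the feature's importance
    mean(drops)
  })
}

# Example: per-feature R^2 drop for a linear model on mtcars.
fit <- lm(mpg ~ ., data = mtcars)
rsq <- function(truth, pred) 1 - sum((truth - pred)^2) / sum((truth - mean(truth))^2)
permutation_importance(fit, mtcars, "mpg", rsq)
```

The single fit is reused for every permutation, so the cost is one training run plus one prediction per feature and repeat, rather than a full resampling per column.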
