Hello! Thanks for the great package! I'm wondering how I can pass stop words to pyserini when building a BM25 index. The default does not use any stop words, but I want to speed up index construction and inference by removing stop words. Thanks a lot!
Answered by lintool, Nov 28, 2023
Answer selected by RulinShao
Start here: https://github.com/castorini/pyserini/blob/master/docs/usage-index.md#building-a-bm25-index-direct-java-implementation

The pyserini indexer is just a wrapper around the anserini indexer, and there's a `-stopwords` option to specify your own stopwords: https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/index/IndexCollection.java#L144

Hope this helps!
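
For anyone landing here later, here is a minimal sketch of how the option might be wired into an indexing call from Python. It only writes a stop word file and assembles the argument list; it does not run the indexer. Several things here are assumptions rather than something confirmed in this thread: that the option takes a path to a plain-text file with one stop word per line, that the indexer is invoked as the `pyserini.index.lucene` module, and the helper names `write_stopwords` and `with_stopwords`, which are made up for illustration. Whether the flag is spelled with one dash or two may also depend on your installed version, so check it against the usage doc linked above.

```python
import sys
import tempfile
from pathlib import Path


def write_stopwords(path, words):
    """Write stop words to a text file, one per line (assumed file format)."""
    path = Path(path)
    path.write_text("\n".join(words) + "\n", encoding="utf-8")
    return str(path)


def with_stopwords(base_command, stopwords_file, flag="-stopwords"):
    """Return a copy of base_command with the stopwords option appended.

    The flag spelling is an assumption taken from the answer above;
    pass flag="--stopwords" if your version expects double dashes.
    """
    return list(base_command) + [flag, str(stopwords_file)]


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    stop_file = write_stopwords(workdir / "stopwords.txt", ["the", "a", "of"])
    # Placeholder: extend this list with the indexing options you already use.
    base = [sys.executable, "-m", "pyserini.index.lucene"]
    # Print the assembled command instead of executing it.
    print(" ".join(with_stopwords(base, stop_file)))
```

Once the printed command looks right for your setup, the list could be handed to `subprocess.run` to build the index.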