Rank by Boolean Expr BM25
Description
Ranks the objects in SOURCE [OBJ,STRING] according to the relevance score of each object's STRING with respect to the boolean expression in QUERY [STRING]. Relevance is computed with the Okapi BM25 ranking function.
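For reference, this is the textbook form of Okapi BM25 as it would apply here, treating each SOURCE string as a document D and the query terms as q_1 to q_n. f(q_i, D) is the number of occurrences of q_i in D, |D| is the length of D in tokens, avgdl is the average token length over all SOURCE strings, N is the number of strings and n(q_i) is the number of strings containing q_i. The IDF shown is one common smoothed variant; which IDF variant this block actually uses is not stated here, so treat that part as an assumption.

```latex
\[
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
\]
\[
\mathrm{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
\]
```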
Inputs
SOURCE [OBJ,STRING]
: a two-column input of object-string pairs, typically obtained with the Extract string block.
Outputs
RESULT [OBJ]
: a list of ranked objects
Parameters
Query
: a boolean query.
Use AND and OR (case does not matter) to express conjunctions and disjunctions of terms.
Use parentheses to group sub-expressions.
Negations are not yet supported.
Quotes to group terms into a phrase are not yet supported.
Example: apple AND (pear OR banana)
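The query grammar above can be sketched as a small recursive-descent parser. This is a hypothetical illustration, not the block's implementation: the names `parse_query` and `matches` are made up, and the sketch assumes that AND binds more tightly than OR (the usual convention; precedence is not documented above) and that two terms with no operator between them are an error rather than an implicit AND. Terms are compared verbatim, so case handling is left to the preprocessing parameters.

```python
import re

# Hypothetical sketch: tokens are "(", ")" or any run of non-space, non-paren characters.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_query(query):
    """Parse e.g. "apple AND (pear OR banana)" into a nested tuple tree."""
    tokens = TOKEN_RE.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        # Assumption: OR has the lowest precedence.
        node = parse_and()
        while peek() is not None and peek().upper() == "OR":
            advance()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_atom()
        while peek() is not None and peek().upper() == "AND":
            advance()
            node = ("AND", node, parse_atom())
        return node

    def parse_atom():
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        if tok == "(":
            advance()
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return node
        if tok == ")" or tok.upper() in ("AND", "OR"):
            raise ValueError(f"unexpected token {tok!r}")
        return ("TERM", advance())

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


def matches(tree, doc_tokens):
    """Evaluate the parsed query against a set of tokens from one SOURCE string."""
    op = tree[0]
    if op == "TERM":
        return tree[1] in doc_tokens
    if op == "AND":
        return matches(tree[1], doc_tokens) and matches(tree[2], doc_tokens)
    return matches(tree[1], doc_tokens) or matches(tree[2], doc_tokens)
```

For example, `matches(parse_query("apple AND (pear OR banana)"), {"apple", "banana"})` returns `True`, while the token set `{"pear", "banana"}` does not match because `apple` is missing.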
Stemming
: tokens can be stemmed for a specific language or left as they are.
Case-sensitive
: if set to false, upper/lower case is ignored.
Normalize diacritics
: transliterates non-ASCII characters into their closest ASCII form.
Tokenization
: the method used to tokenize the input strings:
  None
  : perform no tokenization
  Spaces
  : split on all valid Unicode space characters
  Spaces/Punctuation
  : the Spaces characters plus all valid Unicode punctuation characters
  Spaces/Punctuation/Digits
  : the Spaces/Punctuation characters plus all valid Unicode digit characters
  Spaces/Punctuation/Digits/Symbols
  : the Spaces/Punctuation/Digits characters plus all valid Unicode symbol characters
  Custom Regular Expression
  : any regular expression
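The Case-sensitive, Normalize diacritics and Tokenization options can be approximated with Python's `unicodedata` module, as in the sketch below. The mapping from Unicode general categories to the option names (whitespace for Spaces, `P*` for punctuation, `Nd` for digits, `S*` for symbols), the numeric `level` argument and the function names are assumptions for illustration. Dropping combining marks after NFKD decomposition only approximates transliteration (it turns é into e but leaves characters such as ø unchanged), and Stemming and the Custom Regular Expression mode are not covered.

```python
import unicodedata


def normalize(text, case_sensitive=False, strip_diacritics=True):
    """Hypothetical sketch of the Case-sensitive and Normalize diacritics options."""
    if not case_sensitive:
        text = text.casefold()
    if strip_diacritics:
        # NFKD splits "é" into "e" + a combining accent; the accent is then dropped.
        decomposed = unicodedata.normalize("NFKD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    return text


def is_separator(ch, level):
    """Assumed levels: 1 = Spaces, 2 = +Punctuation, 3 = +Digits, 4 = +Symbols."""
    category = unicodedata.category(ch)
    if ch.isspace():
        return True
    if level >= 2 and category.startswith("P"):
        return True
    if level >= 3 and category == "Nd":
        return True
    if level >= 4 and category.startswith("S"):
        return True
    return False


def tokenize(text, level):
    """Split text on separator characters; level 0 means no tokenization."""
    if level == 0:
        return [text] if text else []
    tokens, current = [], []
    for ch in text:
        if is_separator(ch, level):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens
```

For instance, with `level=2` (Spaces/Punctuation) the string `"cafe, creme-brulee 2024!"` yields `["cafe", "creme", "brulee", "2024"]`; with `level=3` the digit run disappears as well.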
Min token length
: tokens shorter than this number of characters are discarded.
All query terms must match
: if set to true, a candidate is considered a match only if every term of the query (QTERMS) matches a token in its SOURCE string.
k1
: controls non-linear term-frequency normalisation (saturation). A lower value means quicker saturation: repeated occurrences of a term add less and less to the score.
b
: the degree of document-length normalisation applied: 0 = no normalisation, 1 = full normalisation.
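The sketch below shows how k1, b, Min token length and All query terms must match would enter a plain BM25 ranking. It is an illustration under stated assumptions, not the block's code: the names `bm25_rank` and `tf_weight`, the defaults k1 = 1.2 and b = 0.75 (common choices for BM25, not documented for this block) and the smoothed IDF are all assumptions, and the boolean structure of the query is ignored, so `query_terms` is just a flat list of already-normalized tokens.

```python
import math
from collections import Counter


def tf_weight(f, dl, avgdl, k1, b):
    """BM25 term-frequency component for a term seen f times in a string of dl tokens."""
    # k1 controls saturation; b scales the length normalisation (0 = none, 1 = full).
    return f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))


def bm25_rank(docs, query_terms, k1=1.2, b=0.75, min_token_len=1, require_all=False):
    """Return the objects of (obj, tokens) pairs ordered by BM25 score, best first.

    Hypothetical sketch: objects with no matching query term are dropped.
    """
    # Discard tokens shorter than min_token_len (the "Min token length" parameter).
    docs = [(obj, [t for t in toks if len(t) >= min_token_len]) for obj, toks in docs]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avgdl = sum(len(toks) for _, toks in docs) / n_docs
    terms = set(query_terms)
    # Document frequency: how many strings contain each query term.
    df = {t: sum(1 for _, toks in docs if t in toks) for t in terms}

    scored = []
    for obj, toks in docs:
        tf = Counter(toks)
        # "All query terms must match": skip strings missing any query term.
        if require_all and any(tf[t] == 0 for t in terms):
            continue
        score = 0.0
        for t in terms:
            if tf[t] == 0:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
            score += idf * tf_weight(tf[t], len(toks), avgdl, k1, b)
        if score > 0:
            scored.append((score, obj))
    # Sort by score only; ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [obj for _, obj in scored]
```

With k1 = 1.2, ten occurrences of a term in an average-length string give `tf_weight` ≈ 1.96 versus 1.0 for a single occurrence; with k1 = 0.5 the same ten occurrences give only ≈ 1.43, which is the quicker saturation described for k1. Likewise, for a string twice the average length, one occurrence scores 1.0 with b = 0 but 2.2 / 3.4 ≈ 0.65 with b = 1 (k1 = 1.2).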