# Match by BM25

### Description
This is a multi-query BM25 block: it takes multiple lists of query keywords instead of a single one.
Running many queries against the same candidates makes it, in effect, a matching operation.
It finds matches between the `STRING`-columns in the inputs by calculating the [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) relevance score.
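
For reference, the commonly cited Okapi BM25 score of a document $D$ (a `SOURCE` string) for a query $Q$ with tokens $q_1, \dots, q_n$ (a `QTERMS` string) is shown below. This is the textbook form from the linked article; the exact IDF variant used by this block is not stated here, so treat it as an illustration only.

$$
\text{score}(D, Q) = \sum_{i=1}^{n} \text{IDF}(q_i)\,\frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1\left(1 - b + b\,\frac{|D|}{\text{avgdl}}\right)}
$$

Here $f(q_i, D)$ is the number of times $q_i$ occurs in $D$, $|D|$ is the length of $D$ in tokens, $\text{avgdl}$ is the average length over all `SOURCE` strings, and $k_1$ and $b$ are the `k1` and `b` parameters.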

### Input
Because this block was originally a retrieval block, the notation `SOURCE` / `QTERMS` is used instead of the `A` / `B` of other matching blocks.
- `SOURCE [OBJ,STRING]`: a list of candidates to search; the `STRING`-column is used for comparison and the `OBJ`-column is returned in the result
- `QTERMS [OBJ,STRING]`: a list of queries; the `STRING`-column holds the query keywords used for comparison and the `OBJ`-column is returned in the result

### Output
- `RESULT [OBJ,OBJ]`: the matched objects from `SOURCE` and `QTERMS`
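
The following is a minimal sketch of how such a multi-query match could work, not the block's actual implementation. The function name `bm25_match`, the whitespace tokenization, the lower-casing, the IDF variant, the default values for `k1` and `b`, and the "score above zero" match criterion are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_match(source, qterms, k1=1.2, b=0.75):
    """Score every QTERMS entry against every SOURCE entry.

    source, qterms: lists of (obj, string) pairs.
    Returns (source_obj, query_obj, score) triples with score > 0.
    """
    # Assumption: lower-cased whitespace tokenization.
    docs = [(obj, text.lower().split()) for obj, text in source]
    if not docs:
        return []
    n_docs = len(docs)
    # Average document length in tokens (fall back to 1.0 if all are empty).
    avgdl = sum(len(tokens) for _, tokens in docs) / n_docs or 1.0

    # Document frequency: in how many SOURCE strings each token occurs.
    df = Counter()
    for _, tokens in docs:
        df.update(set(tokens))

    def idf(term):
        # Assumption: the non-negative IDF variant with "+ 1" inside the log.
        n = df.get(term, 0)
        return math.log((n_docs - n + 0.5) / (n + 0.5) + 1.0)

    results = []
    for q_obj, q_text in qterms:
        q_tokens = set(q_text.lower().split())
        for d_obj, d_tokens in docs:
            tf = Counter(d_tokens)
            # Length normalisation term controlled by b.
            norm = k1 * (1.0 - b + b * len(d_tokens) / avgdl)
            score = sum(
                idf(t) * tf[t] * (k1 + 1.0) / (tf[t] + norm)
                for t in q_tokens
                if tf[t] > 0
            )
            if score > 0:
                results.append((d_obj, q_obj, score))
    return results


source = [("s1", "red apple pie"), ("s2", "green apple"), ("s3", "blueberry muffin")]
qterms = [("q1", "apple pie"), ("q2", "muffin")]
for d_obj, q_obj, score in bm25_match(source, qterms):
    print(d_obj, q_obj, round(score, 3))
```

In this example, `q1` pairs with both `s1` and `s2` (with `s1` scoring higher because it contains both query tokens), and `q2` pairs with `s3`. Dropping the score from each triple gives rows in the shape of `RESULT [OBJ,OBJ]`.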

### Parameters
- `Stemming`: tokens can be stemmed for a specific language or left as they are
- `Case-sensitive`: if set to `false`, upper/lower case is ignored
- `Normalize diacritics`: transliterates non-ASCII characters into their closest ASCII form
- `Tokenization`: the method used to split the input strings into tokens. Each option names the characters that act as token separators.
  - `None`: perform no tokenization
  - `Spaces`: all valid Unicode space characters
  - `Spaces/Punctuation`: `Spaces` + all valid Unicode punctuation characters
  - `Spaces/Punctuation/Digits`: `Spaces/Punctuation` + all valid Unicode digit characters
  - `Spaces/Punctuation/Digits/Symbols`: `Spaces/Punctuation/Digits` + all valid Unicode symbol characters
  - `Custom Regular Expression`: any [regular expression](https://www.regular-expressions.info)
- `Min token length`: tokens whose character length is shorter than this value are discarded
- `Gram type`:
  - `Word` (default): each token is composed of UTF-8 word n-grams
  - `Character`: each token is composed of UTF-8 character n-grams
- `Grams`: the n-gram size used when extracting tokens (default is 1)
- `All query terms must match`: if set to `true`, a string in `SOURCE` is only considered a match for a `QTERMS` entry when every token of that entry matches it
- `k1`: controls the non-linear term-frequency normalisation (saturation). Lower values saturate sooner, so repeated occurrences of a term add less to the score
- `b`: the degree of document-length normalisation applied. `0` = no normalisation, `1` = full normalisation
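
To make the text-preparation parameters more concrete, here is a rough sketch of how `Case-sensitive`, `Normalize diacritics`, `Tokenization`, `Min token length`, `Gram type` and `Grams` might combine. The function `preprocess` and its argument names are hypothetical, the order of the steps is an assumption, diacritics are approximated by Unicode NFKD decomposition (which does not transliterate every non-ASCII character), and character n-grams are assumed to be taken within each token.

```python
import re
import unicodedata


def preprocess(text, case_sensitive=False, normalize_diacritics=True,
               split_pattern=r"\s+", min_token_length=1,
               gram_type="word", grams=1):
    """Turn a string into a list of n-gram tokens (illustrative only)."""
    if not case_sensitive:
        text = text.lower()
    if normalize_diacritics:
        # Decompose characters, then drop the combining marks (accents).
        decomposed = unicodedata.normalize("NFKD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    # split_pattern plays the role of the tokenization setting;
    # the default r"\s+" roughly corresponds to `Spaces`.
    tokens = [
        t for t in re.split(split_pattern, text)
        if t and len(t) >= min_token_length
    ]
    if gram_type == "word":
        # Sliding window of `grams` consecutive tokens.
        return [
            " ".join(tokens[i:i + grams])
            for i in range(len(tokens) - grams + 1)
        ]
    # Character n-grams: sliding window of `grams` characters per token.
    # Tokens shorter than `grams` yield nothing in this sketch.
    out = []
    for t in tokens:
        out.extend(t[i:i + grams] for i in range(len(t) - grams + 1))
    return out


print(preprocess("Crème Brûlée  recipe"))
print(preprocess("Crème Brûlée  recipe", grams=2))
print(preprocess("apple", gram_type="character", grams=3))
```

With word bigrams, a query only matches strings that contain the same two tokens in the same order, which is stricter than unigram matching; character n-grams go the other way and tolerate small spelling differences.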