Match by string
Description
Finds matches between the STRING-columns in the inputs.
Several comparison functions can be chosen, including equal, contains, startsWith, endsWith and edit distance.
The output provides the matching items as well as the items from each input that did not produce a match.
Optional input Cands [OBJ,OBJ] can limit the matching to only the pairs of candidates listed.
The first column corresponds to the first column of A.
The second column corresponds to the first column of B.
Scores are propagated to final matches.
Input
- A [OBJ,STRING]: a list of candidates, in which the STRING-column will be used for comparison and the OBJ-column will be the result
- Cands [OBJ,OBJ] (optional): candidate pairs; only As and Bs that are in Cands will be matched
- B [OBJ,STRING]: a list of candidates, in which the STRING-column will be used for comparison and the OBJ-column will be the result
Output
- RESULT [OBJ,OBJ]: the matched objects from A and B
- NOTA [OBJ]: the objects from A that did not match with an item from B
- NOTB [OBJ]: the objects from B that did not match with an item from A
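The relationship between the inputs and the three outputs can be illustrated with a minimal Python sketch. It is not the block's actual implementation: the function name `match_by_string`, the representation of tables as lists of tuples, and the default equality comparison are all assumptions made for illustration, and score propagation is not modelled.

```python
from itertools import product


def match_by_string(a, b, cands=None, compare=lambda x, y: x == y):
    """Hypothetical sketch of the block's behaviour.

    a, b  : lists of (obj, string) rows, mirroring A and B [OBJ,STRING]
    cands : optional iterable of (obj_a, obj_b) pairs, mirroring Cands
    Returns (result, nota, notb), mirroring RESULT, NOTA and NOTB.
    """
    allowed = set(cands) if cands is not None else None
    result = []
    for (obj_a, str_a), (obj_b, str_b) in product(a, b):
        # With Cands connected, only the listed pairs are considered.
        if allowed is not None and (obj_a, obj_b) not in allowed:
            continue
        if compare(str_a, str_b):
            result.append((obj_a, obj_b))

    matched_a = {obj_a for obj_a, _ in result}
    matched_b = {obj_b for _, obj_b in result}
    # Objects that took part in no match at all.
    nota = [obj for obj, _ in a if obj not in matched_a]
    notb = [obj for obj, _ in b if obj not in matched_b]
    return result, nota, notb


a = [("a1", "apple"), ("a2", "banana"), ("a3", "cherry")]
b = [("b1", "banana"), ("b2", "cherry"), ("b3", "date")]

result, nota, notb = match_by_string(a, b)
# result == [("a2", "b1"), ("a3", "b2")], nota == ["a1"], notb == ["b3"]

restricted = match_by_string(a, b, cands=[("a2", "b1")])
# restricted == ([("a2", "b1")], ["a1", "a3"], ["b2", "b3"])
```

In this sketch an object lands in NOTA or NOTB only if it appears in no matched pair, so restricting the candidate pairs can move objects from RESULT into the unmatched outputs.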
Parameters
- Comparison: Comparison function to use
  - equal: the strings must be equal
  - contains: the string in B must be contained in A
  - containsWholeWord: the string in B must be contained in A, as a whole word (only punctuation/spaces around)
  - startsWith: the string in A must start with B
  - endsWith: the string in A must end with B
  - prefix: strings in A and B share a prefix of a given length
  - levenshtein: the string in A may not have more than Max edit-distance differences (character insertions or deletions) with B.
  - jaro-winkler: the strings in A and B must have a Jaro-Winkler similarity score not smaller than Min similarity.
- Invert comparison: the function selected is inverted: results include pairs for which the comparison is false. Only possible with a Cands input connected.
- Case-sensitive: if set to false, upper/lower case is ignored
- Exclude self-matches: whether to emit the match if the objects in A and B are the same. Mostly useful when A and B come from the same source
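As a rough illustration of how the comparison options might behave, the Python sketch below implements most of them as a single predicate. The function names, argument names and default values are hypothetical, not the block's API. The edit distance used here is the standard Levenshtein distance, which also counts substitutions; whether the block counts differences in exactly this way is an assumption. Jaro-Winkler is omitted for brevity.

```python
import re


def levenshtein(s, t):
    """Standard Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def compare(a, b, mode="equal", case_sensitive=True, invert=False,
            max_edit_distance=1, prefix_length=3):
    """Hypothetical predicate: does string a (from A) match string b (from B)?"""
    if not case_sensitive:
        a, b = a.lower(), b.lower()

    if mode == "equal":
        ok = a == b
    elif mode == "contains":
        ok = b in a
    elif mode == "containsWholeWord":
        # b must not be directly preceded or followed by a word character.
        ok = re.search(r"(?<!\w)" + re.escape(b) + r"(?!\w)", a) is not None
    elif mode == "startsWith":
        ok = a.startswith(b)
    elif mode == "endsWith":
        ok = a.endswith(b)
    elif mode == "prefix":
        n = prefix_length
        ok = len(a) >= n and len(b) >= n and a[:n] == b[:n]
    elif mode == "levenshtein":
        ok = levenshtein(a, b) <= max_edit_distance
    else:
        raise ValueError(f"unsupported mode: {mode}")

    # Invert comparison: keep the pairs for which the comparison is false.
    return ok != invert


compare("New York, NY", "York", mode="containsWholeWord")    # True
compare("New Yorkshire", "York", mode="containsWholeWord")   # False
compare("Apple", "apple", case_sensitive=False)              # True
compare("kitten", "sitting", mode="levenshtein", max_edit_distance=3)  # True
```

Inverting a comparison over all pairs of A and B would typically match almost everything, which is presumably why the option is only available with a Cands input connected.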