Replace with RegEx¶
Description¶
Transforms strings in a [OBJ,STRING] input using a regular expression replacement.
Input¶
SOURCE [OBJ,STRING]: a 2-column input of object-string pairs. Typically obtained with the Extract string block.
Output¶
RESULT [OBJ,STRING]: the pairs from SOURCE, where the string has been modified
STRINGS [STRING]: the modified strings, without the object they were paired to
Parameters¶
Pattern RegEx: the regular expression to match in SOURCE.
Replacement: the replacement to use in RESULT.
Occurrences:
  First: replace only the first occurrence in each input string
  All: replace all occurrences in each input string
Case-sensitive: if set to false, upper/lower case is ignored
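The effect of the Occurrences and Case-sensitive parameters can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in `re` module as a stand-in for the block's PCRE engine, and the function `replace_with_regex` and its argument names are hypothetical, not part of the product.

```python
import re

def replace_with_regex(source, pattern, replacement,
                       occurrences="All", case_sensitive=True):
    """Hypothetical stand-in for the block: returns (RESULT, STRINGS)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    # In re.sub, count=0 replaces every match and count=1 stops after the first.
    count = 1 if occurrences == "First" else 0
    result = [(obj, regex.sub(replacement, text, count=count))
              for obj, text in source]
    # STRINGS is RESULT without the object column.
    strings = [text for _, text in result]
    return result, strings

source = [("doc1", "Foo foo FOO"), ("doc2", "no match here")]

# Case-sensitive, first occurrence only: just the lowercase "foo" is replaced.
first, _ = replace_with_regex(source, "foo", "bar", occurrences="First")

# Case-insensitive, all occurrences: every spelling of "foo" is replaced.
all_ci, strings = replace_with_regex(source, "foo", "bar",
                                     occurrences="All", case_sensitive=False)
```

In this sketch, strings that contain no match pass through unchanged, which is the usual behaviour of a regular expression replacement.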
Output scores can be aggregated and/or normalised.
Regular expressions¶
Regular expressions are internally evaluated by a PCRE engine. For a syntax reference, see this page. For a 1-page syntax reference, see this cheat-sheet.
Some of the most common questions/mistakes¶
Regular expressions are different from [glob patterns](https://en.wikipedia.org/wiki/Glob_(programming)) using wildcards. In particular, * does NOT mean “anything”; .* does.
All special characters (. * + ? | \ ( ) [ ] ^ $) must be escaped (prefixed with \) when they are meant literally in the Pattern RegEx. They are always meant literally (thus, no escaping!) in the Replacement (except group references, see below).
Capturing groups are indicated by parentheses, and back-references by either \n or $n, with n being the n-th group in the pattern.
Parentheses can also be used to group sub-expressions together, for example in choices: (one|two|three). To use parentheses only for grouping and not capturing, use the ?: prefix, as in (?:one|two|three).
^ indicates the beginning of an input text, or negation when it is the first character inside a character class (e.g. [^\d-_]).
$ indicates the end of an input text.
\b indicates a word boundary (the position between a word character and a space, punctuation, etc.).
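The points above can be tried out in a short script. The sketch below assumes Python's built-in `re` module rather than the PCRE engine used by the block; the behaviour shown is the same for these simple patterns, but Python writes group references in the replacement as `\1` only, not `$1`. The sample strings are made up for illustration.

```python
import re

# "*" is a quantifier, not a wildcard: "b*" means "zero or more b".
star_only = re.fullmatch(r"ab*", "abc")   # None, the trailing "c" is not matched
dot_star = re.fullmatch(r"ab.*", "abc")   # a match, ".*" means "anything"

# An unescaped "." matches any character; "\." matches only a literal dot.
unescaped = re.sub(r".", "_", "v1.2")
escaped = re.sub(r"\.", "_", "v1.2")

# Capturing groups are referenced from the replacement (Python syntax: \1, \2).
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "key=value")

# (?:...) groups without capturing, so the year is still group 1.
year = re.sub(r"^(?:in|since) (\d{4})$", r"\1", "since 1999")

# \b keeps the pattern from matching inside a longer word such as "concat".
bounded = re.sub(r"\bcat\b", "dog", "cat concat cat.")
```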
Examples¶
Normalize spaces (with Occurrences = All):
  Pattern RegEx: \s+
  Replacement: ⎵ (a single space)
Turn Smith, John into John Smith:
  Pattern RegEx: ^([^,]+)\s*,\s*(.+)$
  Replacement: $2 $1
Extract any day of the week (with Case-sensitive = false):
  Pattern RegEx: .*\b((?:mon|tues|wednes|thurs|fri|satur|sun)day)\b.*
  Replacement: $1
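For readers who want to check these patterns outside the block, here is a rough equivalent of the three examples. It assumes Python's built-in `re` module as a stand-in for the PCRE engine, so group references are written `\1` and `\2` instead of `$1` and `$2`, and the input strings are invented for the demonstration.

```python
import re

# Normalize spaces: every run of whitespace becomes a single space
# (re.sub replaces all occurrences by default, like Occurrences = All).
normalized = re.sub(r"\s+", " ", "too   many \t spaces")

# Turn "Smith, John" into "John Smith" by swapping the two captured groups.
full_name = re.sub(r"^([^,]+)\s*,\s*(.+)$", r"\2 \1", "Smith, John")

# Extract a day of the week, ignoring case (Case-sensitive = false).
# The alternation is spelled "tues" and "satur" here so that Tuesday and
# Saturday are matched as well.
day_pattern = re.compile(
    r".*\b((?:mon|tues|wednes|thurs|fri|satur|sun)day)\b.*",
    re.IGNORECASE,
)
day = day_pattern.sub(r"\1", "See you on Tuesday at noon")

# A string without any day name does not match and is left as it is.
no_day = day_pattern.sub(r"\1", "See you soon")
```

Because the last pattern starts and ends with `.*`, the whole input string is matched and replaced by the captured day, which is what turns a replacement into an extraction.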