Replace with RegEx¶
Description¶
Transforms the strings in an [OBJ,STRING] input using a regular expression replacement.
Input¶
SOURCE [OBJ,STRING]
: a 2-column input of object-string pairs, typically obtained with the Extract string block
Output¶
RESULT [OBJ,STRING]
: the pairs from SOURCE, where the string has been modified

STRINGS [STRING]
: the modified strings, without the object they were paired to
Parameters¶
Pattern RegEx
: the regular expression to match against each string in SOURCE

Replacement
: the text that replaces each match in RESULT

Occurrences
: First: replace only the first occurrence in each input string
: All: replace all the occurrences in each input string

Case-sensitive
: if set to false, upper/lower case is ignored
Output scores can be aggregated and/or normalised.
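As a rough illustration of how the parameters above interact, the sketch below mimics the block with Python's `re` module. This is an assumption made for illustration only: the block itself uses a PCRE engine, and the function name `replace_with_regex` and its argument names are hypothetical, not part of the product.

```python
import re

def replace_with_regex(source, pattern, replacement,
                       occurrences="All", case_sensitive=True):
    """Hypothetical stand-in for the block.

    source is a list of (object, string) pairs; returns (RESULT, STRINGS).
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    # In re.sub, count=0 means "replace every match"; count=1 stops after the first.
    count = 1 if occurrences == "First" else 0
    compiled = re.compile(pattern, flags)
    result = [(obj, compiled.sub(replacement, text, count=count))
              for obj, text in source]
    strings = [text for _, text in result]
    return result, strings

source = [("doc1", "Foo foo FOO"), ("doc2", "no match here")]

# First occurrence only, ignoring case: only the leading "Foo" is replaced.
result_first, strings_first = replace_with_regex(
    source, "foo", "bar", occurrences="First", case_sensitive=False)

# All occurrences, case-sensitive: only the lowercase "foo" matches.
result_all, strings_all = replace_with_regex(
    source, "foo", "bar", occurrences="All", case_sensitive=True)

print(strings_first)
print(result_all)
```

Strings that do not match the pattern pass through unchanged, so every pair in the input still appears in the output.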
Regular expressions¶
Regular expressions are internally evaluated by a PCRE engine. For a syntax reference, see this page. For a 1-page syntax reference, see this cheat-sheet.
Some of the most common questions/mistakes¶
- Regular expressions are different from [glob patterns](https://en.wikipedia.org/wiki/Glob_(programming)) using wildcards. In particular, `*` does NOT mean “anything”; `.*` does.
- All special characters (`. * + ? | \ ( ) [ ] ^ $`) must be escaped (prefixed with `\`) when they are meant literally in the Pattern RegEx. They are always meant literally (thus, no escaping!) in the Replacement (except group references, see below).
- Capturing groups are indicated by parentheses, and back-references by either `\n` or `$n`, with `n` being the n-th group in the pattern.
- Parentheses can also be used to group sub-expressions together, for example in choices: `(one|two|three)`. To use parentheses only for grouping and not capturing, use the `?:` prefix, as in `(?:one|two|three)`.
- `^` indicates the beginning of an input text, or negation when used at the start of a character class (e.g. `[^\d-_]`).
- `$` indicates the end of an input text.
- `\b` indicates a word boundary: the position between a word character and a non-word character (space, punctuation, etc.) or the start or end of the text.
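The points above can be checked quickly with a short script. The sketch below uses Python's `re` module as a stand-in, which is an assumption: the block uses PCRE, but these particular constructs behave the same way in both. Note that Python writes back-references in the replacement as `\1` and does not accept `$1`.

```python
import re

# `*` is a quantifier, not a wildcard: "ab*" is "a" followed by zero or more "b".
star_only = bool(re.fullmatch(r"ab*", "abxyz"))     # False
dot_star = bool(re.fullmatch(r"ab.*", "abxyz"))     # True

# An unescaped "." matches any character; escaping it makes it literal.
unescaped = re.sub(r"a.b", "X", "a.b axb")          # "X X"
escaped = re.sub(r"a\.b", "X", "a.b axb")           # "X axb"

# Non-capturing group: only the second pair of parentheses gets a number.
swapped = re.sub(r"(?:one|two|three) (\w+)", r"\1!", "two apples")  # "apples!"

# ^ and $ anchor to the start and the end of the text.
trimmed = re.sub(r"^\s+|\s+$", "", "  padded  ")    # "padded"

# \b keeps the "cat" inside "concat" from matching.
bounded = re.sub(r"\bcat\b", "dog", "cat concat cat.")  # "dog concat dog."

print(star_only, dot_star, unescaped, escaped, swapped, trimmed, bounded)
```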
Examples¶
- Normalize spaces (with `Occurrences = All`):
    - Pattern RegEx: `\s+`
    - Replacement: `⎵` (a single space)
- Turn `Smith, John` into `John Smith`:
    - Pattern RegEx: `^([^,]+)\s*,\s*(.+)$`
    - Replacement: `$2 $1`
- Extract any day of the week (with `Case-sensitive = false`):
    - Pattern RegEx: `.*\b((?:mon|tues|wednes|thurs|fri|satur|sun)day)\b.*`
    - Replacement: `$1`
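To try these three examples outside the block, they can be reproduced with Python's `re` module. This is a sketch under assumptions: Python's engine is not PCRE, and it does not understand `$n` in replacements, so the hypothetical helper `to_python_replacement` rewrites `$n` as `\g<n>` first. The day-of-week pattern is written here with `tues` and `satur` so that Tuesday and Saturday match.

```python
import re

def to_python_replacement(replacement):
    # Illustrative helper: turn "$2 $1" into "\g<2> \g<1>" for re.sub.
    return re.sub(r"\$(\d+)", r"\\g<\1>", replacement)

# Normalize spaces (Occurrences = All): every whitespace run becomes one space.
normalized = re.sub(r"\s+", " ", "too   many\t spaces\n here")

# Turn "Smith, John" into "John Smith" by swapping the two captured groups.
swapped = re.sub(r"^([^,]+)\s*,\s*(.+)$",
                 to_python_replacement("$2 $1"),
                 "Smith, John")

# Extract any day of the week (Case-sensitive = false).
day_pattern = r".*\b((?:mon|tues|wednes|thurs|fri|satur|sun)day)\b.*"
day = re.sub(day_pattern,
             to_python_replacement("$1"),
             "See you next Tuesday at noon",
             flags=re.IGNORECASE)

# A string with no day name does not match, so it is returned unchanged.
no_day = re.sub(day_pattern, to_python_replacement("$1"),
                "See you soon", flags=re.IGNORECASE)

print(normalized)
print(swapped)
print(day)
print(no_day)
```

The last case is worth keeping in mind: "extracting" with a replacement only works on strings that match, so non-matching strings come out untouched rather than empty.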