# Replace with RegEx [Strings]

### Description

Transforms input strings using a [regular expression](https://www.regular-expressions.info/) replacement.

### Input

- `SOURCE [STRING]`: Input strings

### Output

- `RESULT [STRING]`: the modified strings

### Parameters

- `Pattern RegEx`: the regular expression to use for the match in `SOURCE`.
- `Replacement`: the replacement to use in `RESULT`.
- `Occurrences`:
  - `First`: replace only the first occurrence in each input string
  - `All`: replace all occurrences in each input string
- `Case-sensitive`: if set to `false`, upper/lower case is ignored

Output scores can be [aggregated](docs://score_aggregation) and/or [normalised](docs://score_normalisation).

### Regular expressions

Regular expressions are internally evaluated by a [PCRE](https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions) engine.

For a syntax reference, see [this page](https://www.regular-expressions.info/refflavors.html).

For a 1-page syntax reference, see this [cheat-sheet](https://www.debuggex.com/cheatsheet/regex/pcre).

#### Some of the most common questions/mistakes

- Regular expressions are different from wildcard-based [glob patterns](https://en.wikipedia.org/wiki/Glob_(programming)). In particular, `*` does NOT mean "anything", `.*` does.
- All special characters (`. * + ? | \ ( ) [ ] ^ $`) must be escaped (prefixed with `\`) in the `Pattern RegEx` when they are meant literally. In the `Replacement` they are always meant literally (thus, no escaping!), except for group references (see below).
- [Capturing groups](https://www.regular-expressions.info/refcapture.html) are indicated by parentheses, and back-references by either `\n` or `$n`, with `n` being the n-th group in the pattern.
- Parentheses can also be used to group sub-expressions together, for example in choices: `(one|two|three)`. To use parentheses only for grouping and not capturing, use the `?:` prefix, as in `(?:one|two|three)`.
- `^` indicates the beginning of an input text, or negation when it is the first character inside a character class (e.g. `[^\d-_]`). `$` indicates the end of an input text.
- `\b` indicates a word boundary: the position between a word character and a non-word character (space, punctuation, start or end of the text).

#### Examples

- Normalize spaces (with `Occurrences = All`)
  - `Pattern RegEx`: `\s+`
  - `Replacement`: `⎵` (a single space)
- Turn `Smith, John` into `John Smith`:
  - `Pattern RegEx`: `^([^,]+)\s*,\s*(.+)$`
  - `Replacement`: `$2 $1`
- Extract any day of the week (with `Case-sensitive = false`):
  - `Pattern RegEx`: `.*\b((?:mon|tues|wednes|thurs|fri|satur|sun)day)\b.*`
  - `Replacement`: `$1`
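
As a rough illustration, the examples above can be tried outside the tool with Python's `re` module. This is only a sketch under stated assumptions: `re` is not the PCRE engine used internally (the two agree on the constructs used here), `replace_with_regex` is a hypothetical helper that mimics the `Occurrences` and `Case-sensitive` parameters and is not part of the tool, and Python writes group references in the replacement as `\1` rather than `$1`.

```python
import re


def replace_with_regex(source, pattern, replacement,
                       occurrences="All", case_sensitive=True):
    """Hypothetical stand-in for the block: one replacement per input string."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # re.sub treats count=0 as "replace every match"; count=1 stops after the first.
    count = 1 if occurrences == "First" else 0
    compiled = re.compile(pattern, flags)
    return [compiled.sub(replacement, text, count=count) for text in source]


# Normalize spaces (Occurrences = All)
print(replace_with_regex(["a   b \t c"], r"\s+", " "))
# → ['a b c']

# Turn "Smith, John" into "John Smith" ($2 $1 becomes \2 \1 in Python)
print(replace_with_regex(["Smith, John"], r"^([^,]+)\s*,\s*(.+)$", r"\2 \1"))
# → ['John Smith']

# Extract a day of the week (Case-sensitive = false)
days = r".*\b((?:mon|tues|wednes|thurs|fri|satur|sun)day)\b.*"
print(replace_with_regex(["See you on Friday at noon"], days, r"\1",
                         case_sensitive=False))
# → ['Friday']
```

In the last example the leading and trailing `.*` consume the rest of the text, so replacing the whole match with the captured group leaves only the day name. A string that contains no day name does not match at all and is returned unchanged.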