# Replace with RegEx [Strings]

### Description
Transforms input strings using a [regular expression](https://www.regular-expressions.info/) replacement.

### Input
- `SOURCE [STRING]`: Input strings

### Output
- `RESULT [STRING]`: the modified strings

### Parameters
- `Pattern RegEx`: the regular expression to match against each string in `SOURCE`.
- `Replacement`: the text that replaces each match in `RESULT`.
- `Occurrences`:
  - `First`: replace only the first occurrence in each input string
  - `All`: replace all occurrences in each input string
- `Case-sensitive`: if set to `false`, upper/lower case is ignored

Output scores can be [aggregated](docs://score_aggregation) and/or [normalised](docs://score_normalisation).
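
As a rough illustration of how the parameters above interact, here is a minimal sketch using Python's `re` module as a stand-in. This is an assumption for illustration only: the node itself uses a PCRE engine, and the helper name `replace_with_regex` and its argument names are hypothetical.

```python
import re

def replace_with_regex(source, pattern, replacement,
                       occurrences="All", case_sensitive=True):
    """Hypothetical helper mimicking the node's parameters on a list of strings."""
    # Case-sensitive = false corresponds to a case-insensitive match.
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(pattern, flags)
    # In re.sub, count=0 replaces every match; count=1 stops after the first.
    count = 1 if occurrences == "First" else 0
    return [compiled.sub(replacement, s, count=count) for s in source]

source = ["Foo foo FOO", "bar"]

# Case-sensitive, first occurrence only: only the lowercase "foo" is replaced.
first_only = replace_with_regex(source, "foo", "x", occurrences="First")

# Case-insensitive, all occurrences: every spelling of "foo" is replaced.
all_ignoring_case = replace_with_regex(source, "foo", "x", case_sensitive=False)

print(first_only)         # ['Foo x FOO', 'bar']
print(all_ignoring_case)  # ['x x x', 'bar']
```

Strings without a match (here `bar`) pass through unchanged.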

### Regular expressions
Regular expressions are internally evaluated by a [PCRE](https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions) engine.
For a syntax reference, see [this page](https://www.regular-expressions.info/refflavors.html).
For a 1-page syntax reference, see this [cheat-sheet](https://www.debuggex.com/cheatsheet/regex/pcre).

#### Some of the most common questions/mistakes
- Regular expressions are different from wildcard-based [glob patterns](https://en.wikipedia.org/wiki/Glob_(programming)).
  In particular, `*` does NOT mean "anything"; `.*` does.
- In the `Pattern RegEx`, all special characters (`. * + ? | \ ( ) [ ] ^ $`) must be escaped (prefixed with `\`) when they are meant literally.
  In the `Replacement`, they are always meant literally (thus, no escaping!), except for group references (see below).
- [Capturing groups](https://www.regular-expressions.info/refcapture.html) are indicated by parentheses, and back-references by either `\n` or `$n`,
  with `n` being the number of the group in the pattern (groups are counted from 1, in the order of their opening parentheses).
- Parentheses can also be used to group sub-expressions together, for example in choices: `(one|two|three)`.
  To use parentheses only for grouping and not capturing, use the `?:` prefix, as in `(?:one|two|three)`.
- `^` indicates the beginning of an input text, or negation when it is the first character inside a character class (e.g. `[^\d-_]`).
  `$` indicates the end of an input text.
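- `\b` indicates a word boundary: the zero-width position between a word character (letter, digit or `_`) and a non-word character (space, punctuation, etc.) or the start/end of the text.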
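
The points above can be checked with a few short experiments. The sketch below uses Python's `re` module, which is an assumption made for illustration: it is not the PCRE engine used by the node, and it writes group references in the replacement as `\1` rather than `$1`. The constructs shown here (quantifiers, escaping, non-capturing groups, `\b`) behave the same way in both.

```python
import re

# `*` alone is a quantifier, not a wildcard: `ab*` means "a" then zero or more "b".
glob_like = re.fullmatch(r"ab*", "abcdef")    # no match
any_suffix = re.fullmatch(r"ab.*", "abcdef")  # matches the whole string

# An unescaped `.` matches any character; `\.` matches only a literal dot.
unescaped = re.sub(r"1.5", "X", "1.5 and 1x5")
escaped = re.sub(r"1\.5", "X", "1.5 and 1x5")

# `(?:...)` groups without capturing, so the digits are group 1.
renumbered = re.sub(r"(?:id|ref)-(\d+)", r"#\1", "id-42 ref-7")

# `\b` matches a position, not a character: "concat" is left alone.
whole_word = re.sub(r"\bcat\b", "dog", "cat concat cat.")

print(glob_like, bool(any_suffix))  # None True
print(unescaped)                    # X and X
print(escaped)                      # X and 1x5
print(renumbered)                   # #42 #7
print(whole_word)                   # dog concat dog.
```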

#### Examples
- Normalize spaces (with `Occurrences = All`):
  - `Pattern RegEx`: `\s+`
  - `Replacement`: `⎵` (a single space)
- Turn `Smith, John` into `John Smith`:
  - `Pattern RegEx`: `^([^,]+)\s*,\s*(.+)$`
  - `Replacement`: `$2 $1`
- Extract any day of the week (with `Case-sensitive = false`):
  - `Pattern RegEx`: `.*\b((?:mon|tues|wednes|thurs|fri|satur|sun)day)\b.*`
  - `Replacement`: `$1`
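
To try these examples outside the node, the following sketch reproduces them with Python's `re` module. This is an assumption for illustration: Python is not the PCRE engine used here, and it writes the replacements `$2 $1` and `$1` as `\2 \1` and `\1`. The day-of-week alternation spells out the stems `tues` and `satur` so that Tuesday and Saturday are matched as well.

```python
import re

# Normalize spaces (Occurrences = All): each run of whitespace becomes one space.
normalized = re.sub(r"\s+", " ", "a  b\t\n c")

# Turn "Smith, John" into "John Smith" by swapping the two captured groups.
swapped = re.sub(r"^([^,]+)\s*,\s*(.+)$", r"\2 \1", "Smith, John")

# Extract a day of the week (Case-sensitive = false).
day_pattern = re.compile(
    r".*\b((?:mon|tues|wednes|thurs|fri|satur|sun)day)\b.*",
    re.IGNORECASE,
)
extracted = day_pattern.sub(r"\1", "See you on Friday at noon")

# When nothing matches, the input string is returned unchanged.
unchanged = day_pattern.sub(r"\1", "See you tomorrow")

print(normalized)  # a b c
print(swapped)     # John Smith
print(extracted)   # Friday
print(unchanged)   # See you tomorrow
```

Because the leading `.*` is greedy, a string containing several day names yields the last one.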