Extract RegEx [Obj,String]

Description

Extracts, from each input string, the sub-strings that match the given regular expression (Pattern RegEx).

Input

  • SOURCE [OBJ,STRING]: a list of object-string pairs.

Output

  • PAIR [OBJ,STRING]: one pair per extracted sub-string, containing the object from the input source and the sub-string extracted from its string.

  • RESULT [STRING]: the extracted sub-strings only. Note that the link between each sub-string and the object it came from is lost.

Parameters

  • Pattern RegEx: the regular expression used to match sub-strings.

  • Max matches: extract up to this number of sub-strings. Use 0 (default) for unlimited.

  • Case-sensitive: if set to false, upper and lower case are treated as equal.
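The behaviour described above can be sketched in Python as follows. This is a hypothetical illustration, not the node's actual implementation: the function name extract_regex is invented, Python's re module stands in for the PCRE engine, and the sketch assumes that Max matches limits the matches taken from each input string rather than the total across all strings.

```python
from __future__ import annotations

import re
from typing import Any, Iterable


def extract_regex(
    source: Iterable[tuple[Any, str]],
    pattern: str,
    max_matches: int = 0,
    case_sensitive: bool = True,
) -> tuple[list[tuple[Any, str]], list[str]]:
    """Hypothetical sketch of the node: returns the (PAIR, RESULT) outputs."""
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    pairs: list[tuple[Any, str]] = []
    for obj, text in source:
        # Assumption: Max matches is applied per input string; 0 = unlimited.
        for count, match in enumerate(regex.finditer(text), start=1):
            pairs.append((obj, match.group(0)))
            if max_matches and count >= max_matches:
                break
    # RESULT keeps only the sub-strings; the originating object is dropped.
    results = [substring for _, substring in pairs]
    return pairs, results


pairs, results = extract_regex(
    [("doc1", "cat hat"), ("doc2", "bat")], r"\w+at", max_matches=1
)
print(pairs)    # [('doc1', 'cat'), ('doc2', 'bat')]
print(results)  # ['cat', 'bat']
```

With max_matches=1 only "cat" is taken from "cat hat"; with the default of 0 both "cat" and "hat" would be returned.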

Regular expressions

Regular expressions are internally evaluated by a PCRE engine. For a syntax reference, see this page. For a 1-page syntax reference, see this cheat-sheet.

Some of the most common questions/mistakes

  • Regular expressions are different from glob patterns (https://en.wikipedia.org/wiki/Glob_(programming)) that use wildcards. In particular, * does NOT mean “anything”: it means “zero or more of the preceding element”. .* is what matches “anything”.

  • Special characters (. * + ? | \ ( ) [ ] { } ^ $) must be escaped (prefixed with \) when they are meant literally.

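  • ^ matches the beginning of the input text, or negates a character class when it is the first character inside square brackets (e.g. [^\d_-] matches any character that is not a digit, an underscore or a hyphen; the hyphen is placed last so it is read literally). $ matches the end of the input text.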

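  • \b matches a word boundary: the zero-width position between a word character (\w) and a non-word character (a space, punctuation, etc.), or the start or end of the text. It does not consume any characters itself.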
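The points above can be checked with a few assertions. This sketch uses Python's re module rather than PCRE; the constructs shown behave the same way in both engines, although the two differ in other areas. re.escape is a Python-specific helper and is shown only for illustration.

```python
import re

# "*" repeats the preceding element: "report*" is "repor" plus zero or more "t".
assert re.fullmatch(r"report*", "repor") is not None
# ".*" is what matches "anything".
assert re.fullmatch(r"report.*", "report.txt") is not None

# An unescaped "." matches any character, so it also accepts "report_txt".
assert re.fullmatch(r"report.txt", "report_txt") is not None
assert re.fullmatch(r"report\.txt", "report_txt") is None
# Python-specific helper that adds the escaping backslashes for you.
assert re.escape("report.txt") == r"report\.txt"

# "^" and "$" anchor to the start and end of the text.
assert re.search(r"^\d+", "2024-05-01").group() == "2024"
assert re.search(r"\d+$", "2024-05-01").group() == "01"
# As the first character inside [...], "^" negates the class; the hyphen
# is placed last so that it is read literally.
assert re.findall(r"[^\d_-]+", "ab12_cd-ef") == ["ab", "cd", "ef"]

# "\b" is zero-width: "cat" inside "concat" has no boundary before it.
assert re.findall(r"\bcat\b", "cat concat cat.") == ["cat", "cat"]
print("all checks passed")
```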

Examples

  • Find names in the form of Smith, John:

    • Pattern RegEx: \b[^,]+\s*,\s*\b\w+\b

  • Find any day of the week (with Case-sensitive = false):

    • Pattern RegEx: \b(mon|tue|wednes|thurs|fri|sat|sun)day\b
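Both examples can be tried in Python, again with the re module standing in for PCRE; all_matches is a hypothetical helper. Note that [^,]+ matches any character except a comma, including spaces and punctuation, so text in front of the surname can end up in the match. A stricter variant such as \b\w+\s*,\s*\w+\b (a suggestion, not part of the original examples) avoids this for single-word names. The day pattern contains a capturing group, so Python's findall returns only the group, while finditer with group(0) returns the whole match.

```python
from __future__ import annotations

import re

NAME_RE = re.compile(r"\b[^,]+\s*,\s*\b\w+\b")
DAY_RE = re.compile(r"\b(mon|tue|wednes|thurs|fri|sat|sun)day\b", re.IGNORECASE)
# Hypothetical stricter variant: one word on each side of the comma.
STRICT_NAME_RE = re.compile(r"\b\w+\s*,\s*\w+\b")


def all_matches(regex: re.Pattern, text: str) -> list[str]:
    """Return every full match; group(0) is the whole match, not just a group."""
    return [m.group(0) for m in regex.finditer(text)]


print(all_matches(NAME_RE, "Smith, John"))                  # ['Smith, John']
print(all_matches(NAME_RE, "Contact: Smith, John"))         # ['Contact: Smith, John']
print(all_matches(STRICT_NAME_RE, "Contact: Smith, John"))  # ['Smith, John']
print(all_matches(DAY_RE, "Meet on Monday or FRIDAY, not Fridays."))  # ['Monday', 'FRIDAY']
print(DAY_RE.findall("Monday"))  # ['Mon'] (only the captured group)
```

"Fridays" is not matched because the final \b requires a word boundary right after "day".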
