Lookaround

Lookarounds assert that a certain expression matches before or after the current position. As an assertion, a lookaround does not consume any text; it matches between two code points.

Syntax

let Lookaround = LookaroundPrefix Expression;

let LookaroundPrefix =
    | '<<'
    | '>>';

See Expression.

A lookaround must be wrapped in parentheses if it is followed by another expression:

(>> [word]) [Greek]

Note that a lookaround contains an expression, so it introduces a new scope and can include statements.

Example

(!<< [w])
(>>
  disable unicode;
  let aw = [w];
  aw{3}
)

Support

Support for lookaround is gated by the lookahead and lookbehind features. Specify features with the --allowed-features option.

Lookahead is supported almost everywhere. Lookbehind support is more limited:

PCRE

PCRE does not support arbitrary-length lookbehind. It must be able to determine the length of the lookbehind in advance, so << 'foo'{3} works, but << 'foo'+ does not. As a special case, a lookbehind containing an alternation works even if the alternatives have different lengths, as long as each alternative has a constant length.

JavaScript

JavaScript fully supports lookahead and lookbehind. However, lookbehind is still unsupported in some older browsers (notably, Safari up to version 16.3).

Java

Before Java 13, repetition in lookbehind had to be bounded, so * and + did not work. Since Java 13, repetition can be unbounded, but the engine may not correctly handle repetition with multiple quantifiers if one of them is unbounded. Lookbehind also may not contain backreferences.

Python

Python supports lookahead and constant-length lookbehind. Variable-length repetitions and alternations whose alternatives differ in length, like << 'a' | 'bb', are forbidden in lookbehind.
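The restriction can be checked with Python's standard re module. This is a minimal sketch: the regexes are hand-written equivalents of the lookbehind expressions, not output produced by Pomsky, and the helper name compiles is made up for illustration.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Constant-length lookbehind is accepted.
fixed_ok = compiles(r"(?<=foo)bar")

# Alternatives of different lengths ('a' vs. 'bb') are rejected.
mixed_alternation_ok = compiles(r"(?<=a|bb)c")

# Unbounded repetition is rejected as well.
unbounded_ok = compiles(r"(?<=a+)c")

print(fixed_ok, mixed_alternation_ok, unbounded_ok)  # True False False
```

The rejected patterns fail when the regex is compiled, not when it is matched, so the problem shows up as soon as the generated regex is loaded.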

Ruby, .NET

Ruby and .NET fully support both lookahead and lookbehind.

Rust

Lookaround is not supported.

Behavior

Lookahead checks whether the contained expression matches at the current position. If it matches, the lookahead succeeds; otherwise it fails. Lookahead can be negated: a negative lookahead succeeds if the expression does not match. After the lookahead succeeds, the regex engine returns to the position in the string where it was before the lookahead, so the text matching the lookahead is not consumed.
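The behavior described above can be observed with Python's re module. This is an illustrative sketch using hand-written regexes ((?=...) and (?!...) are the regex forms of positive and negative lookahead); the sample text is arbitrary.

```python
import re

text = "foobar"

# Positive lookahead: (?=foo) succeeds at position 0 but consumes nothing,
# so the following \w+ still starts matching at position 0.
positive = re.match(r"(?=foo)\w+", text)

# Negative lookahead: (?!foo) fails at position 0 because "foo" matches there.
negative = re.match(r"(?!foo)\w+", text)

# A lookahead on its own yields an empty match between two code points.
empty = re.match(r"(?=foo)", text)

print(positive.group())  # foobar
print(negative)          # None
print(empty.span())      # (0, 0)
```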

Conceptually, lookbehind works in the same way, except that the expression is matched in the reverse direction against the text preceding the current position. In reality, however, many regex engines do not match in the reverse direction; instead, they go back n characters and check whether the next n characters match the lookbehind.
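The "go back n characters" strategy can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular engine is implemented; the function lookbehind_succeeds and its parameters are hypothetical names.

```python
import re

def lookbehind_succeeds(inner: re.Pattern, text: str, pos: int, n: int) -> bool:
    """Emulate a lookbehind of fixed length n at position pos."""
    start = pos - n
    if start < 0:
        return False  # fewer than n characters precede pos
    # Step back n characters, then match forward over exactly text[start:pos].
    return inner.fullmatch(text, start, pos) is not None

inner = re.compile(r"foo")
text = "foobar"

at_3 = lookbehind_succeeds(inner, text, 3, 3)  # "foo" precedes position 3
at_4 = lookbehind_succeeds(inner, text, 4, 3)  # "oob" precedes position 4
at_1 = lookbehind_succeeds(inner, text, 1, 3)  # not enough preceding text

# The built-in lookbehind finds the same position.
builtin = re.search(r"(?<=foo)", text)

print(at_3, at_4, at_1, builtin.start())  # True False False 3
```

Because the helper needs n before it can step back, the sketch also shows why an engine using this strategy has to know the length of the lookbehind in advance.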

Compilation

  • >> ... is compiled to (?=...)
  • !>> ... is compiled to (?!...)
  • << ... is compiled to (?<=...)
  • !<< ... is compiled to (?<!...)
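The mapping above can be expressed as a small lookup table. The following Python sketch is not Pomsky's actual implementation; the names LOOKAROUND_PREFIXES and wrap_lookaround are hypothetical, and the inner regexes are assumed to be already compiled.

```python
import re

# Hypothetical lookup table mirroring the list above.
LOOKAROUND_PREFIXES = {
    ">>": "(?=",
    "!>>": "(?!",
    "<<": "(?<=",
    "!<<": "(?<!",
}

def wrap_lookaround(prefix: str, inner_regex: str) -> str:
    """Wrap an already-compiled inner regex in the matching lookaround group."""
    return LOOKAROUND_PREFIXES[prefix] + inner_regex + ")"

negative_lookbehind = wrap_lookaround("!<<", r"\w")
lookahead = wrap_lookaround(">>", r"\w{3}")

# Combined: a position that is not preceded by a word character
# and is followed by three word characters.
pattern = re.compile(negative_lookbehind + lookahead)
positions = [m.start() for m in pattern.finditer("ab cde")]

print(negative_lookbehind, lookahead, positions)  # (?<!\w) (?=\w{3}) [3]
```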

Issues

The lookbehind limitations of the various regex engines are not currently enforced.

Security concerns

Lookbehind can be slow in some regex engines.

History

Initial implementation in Pomsky 0.1