Modifier

Modifiers change how the following expression should be treated.

Syntax

let Modifier = ModifierKeyword BooleanSetting ';';

let ModifierKeyword =
    | 'enable'
    | 'disable';

let BooleanSetting =
    | 'lazy'
    | 'unicode';

Example

enable lazy;
disable unicode;

[w]*
(
  disable lazy;
  .+
)

Support

Modifiers are supported in all flavors.

Support for each mode is gated by the lazy-mode and ascii-mode features. Specify features with the --allowed-features option.

Behavior

Modes can be enabled and disabled in any scope.

There are two modifiers that can be enabled or disabled:

Lazy

Enabling lazy mode means that all repetitions in the same scope are lazy by default; opting out is done with the greedy keyword, e.g.

enable lazy;

[w]* greedy

Unicode

Unicode mode is enabled by default. Disabling it means that the expression in the same scope is no longer Unicode aware and assumes an ASCII-only input. As a result, shorthand character classes are compiled differently (e.g. [space] is compiled to [ \t-\r]), and Unicode properties (e.g. [Greek]) are unavailable. Non-ASCII strings and code points are still allowed.

In JavaScript, Unicode must be disabled in order to use %, < and > word boundaries.

Disabling Unicode can vastly improve runtime performance, especially for [word] and [digit]. Alternatively, you can use [ascii_word], [ascii_digit], and so on.

Compilation

Modifiers produce no output, but they change how other expressions are compiled.

Issues

The dot and word boundaries are Unicode-aware in some regex engines even when Unicode mode is disabled.

Some mode modifiers are not yet implemented, most importantly ignore_case, single_line and multi_line.

History

  • Non-Unicode mode added in Pomsky 0.10
  • Lazy mode added in Pomsky 0.3