Modifiers

Modifiers allow you to change the behavior of a Pomsky expression. Modifiers are statements; they can appear either at the top of the file, or inside a group:

disable unicode;

[word]+ (enable unicode; '.' [word]+)

Modifiers must appear before the expression they modify. They consist of two parts: The enable or disable keyword, and a mode, followed by a ;.

There are currently two modes that can be enabled or disabled:

Unicode mode

Unicode is enabled by default; disable it with disable unicode;.

When Unicode mode is disabled, shorthands like [word] no longer recognize Unicode, only ASCII. Unicode properties like [Letter] or [Emoji] are forbidden when Unicode is disabled.

Unicode mode also affects word boundaries: When disabled, only the ASCII characters a-z, A-Z, 0-9 and underscore _ are treated as word characters. This means that the word Königsstraße has word boundaries around the ö and ß, because they are not in the ASCII character set.

Lazy mode

The lazy mode is enabled with enable lazy;. It has the effect that repetition (which is usually greedy) becomes lazy: The regex engine will then try to repeat the expression as few times as possible.

For example, the expression 'la'+ will always match exactly one la in lazy mode, even when the search string is lalalala, because the regex engine stops searching as soon as it found the first la.

Lazy mode is a solution to the problem that occurs when the dot is repeated:

enable lazy;

'{' .* '}'

Without lazy mode, this greedily consumes as many characters as possible. So if the string {foo} bar {baz} should contain two matches, lazy mode is required. However, it is usually better to make the repetition more specific:

'{' !['{}']* '}'

This is more performant because it avoids backtracking, and it is unambiguous.

Note that laziness and greediness can also be set individually for each repetition:

.* lazy     # make only this repetition lazy
.* greedy   # make only this repetition greedy