Boundaries

Boundaries match a position in a string without consuming any code points. There are 4 boundaries:

  • % matches a word boundary: a position that is preceded, but not followed, by a word character, or vice versa. For example, Codepoint % Codepoint matches "A;" and ";A", but not "AA" or ";;".

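  • !% matches a position that is not a word boundary, i.e. one where the code points on both sides are either both word characters or both non-word characters. For example, Codepoint !% Codepoint matches "aa" and "::", but not "a:" or ":a".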

  • ^ (or Start) matches the start of the string.

  • $ (or End) matches the end of the string.
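The definitions of % and !% can be restated as a predicate over the position between two code points. The following is a minimal Python sketch, not how Pomsky or any regex engine implements boundaries. The function names are hypothetical, the word-character test is ASCII-only for simplicity, and positions outside the string are treated as non-word characters, which is how regex engines usually handle the string edges.

```python
def is_word(ch: str) -> bool:
    """Hypothetical helper: ASCII-only word character test ([a-zA-Z0-9_])."""
    return ch.isascii() and (ch.isalnum() or ch == "_")


def is_word_boundary(s: str, i: int) -> bool:
    """Like %: position i lies between s[i-1] and s[i]."""
    # Positions outside the string count as non-word characters.
    before = i > 0 and is_word(s[i - 1])
    after = i < len(s) and is_word(s[i])
    # A boundary has a word character on exactly one side.
    return before != after


def is_not_word_boundary(s: str, i: int) -> bool:
    """Like !%: both sides are word characters, or both are not."""
    return not is_word_boundary(s, i)


def is_start(s: str, i: int) -> bool:
    """Like ^ (Start)."""
    return i == 0


def is_end(s: str, i: int) -> bool:
    """Like $ (End)."""
    return i == len(s)


# Codepoint % Codepoint tests the position between the two code points.
for text in ["A;", ";A", "AA", ";;"]:
    print(text, is_word_boundary(text, 1))
```

The loop prints True for "A;" and ";A" and False for "AA" and ";;", matching the examples in the list.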

A word character is anything that matches [word]. If the regex engine is Unicode-aware, this is [Alphabetic Mark Decimal_Number Connector_Punctuation]. For some regex engines, Unicode-aware matching has to be enabled first.
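To make the Unicode-aware definition concrete, here is a rough Python approximation using the standard unicodedata module, next to the ASCII set [a-zA-Z0-9_] that word characters typically fall back to when an engine is not Unicode-aware. It is only a sketch with hypothetical function names: unicodedata exposes general categories but not the Alphabetic property, so Alphabetic is approximated by the letter categories (L*) plus letter numbers (Nl). This misses the few Other_Alphabetic code points that are not marks, such as the circled letter Ⓐ.

```python
import unicodedata


def is_word_unicode(ch: str) -> bool:
    """Approximates [Alphabetic Mark Decimal_Number Connector_Punctuation].

    Alphabetic is approximated by the L* and Nl general categories;
    Mark is M*, Decimal_Number is Nd, Connector_Punctuation is Pc.
    """
    cat = unicodedata.category(ch)
    return cat[0] in ("L", "M") or cat in ("Nl", "Nd", "Pc")


def is_word_ascii(ch: str) -> bool:
    """Assumed non-Unicode-aware definition: [a-zA-Z0-9_]."""
    return ch.isascii() and (ch.isalnum() or ch == "_")


# "é" (letter), U+0301 (combining mark), U+0663 (Arabic-Indic digit three)
# and U+203F (undertie, a connector punctuation) are word characters
# only under the Unicode-aware definition.
for ch in ["a", "_", "é", "\u0301", "\u0663", "\u203f", ";"]:
    print(repr(ch), is_word_unicode(ch), is_word_ascii(ch))
```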

In JavaScript, % and !% are never Unicode-aware, even when the u flag is set. That’s why Unicode must be disabled to use them:

disable unicode;

% 'Pomsky' %
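The practical effect of an ASCII-only boundary can be reproduced outside JavaScript. This sketch uses Python's re module, not JavaScript and not Pomsky output: it writes the boundary as \b, the conventional regex syntax for a word boundary, and uses the re.ASCII flag, which makes \b ASCII-only in the way JavaScript's \b always is. Next to an accented letter, the two modes disagree.

```python
import re

# \b plays the role of Pomsky's %; the pattern mirrors % 'Pomsky' %.
pattern = r"\bPomsky\b"
text = "éPomsky"

# Unicode-aware (Python's default for str patterns): "é" is a word
# character, so there is no boundary between "é" and "P".
unicode_match = re.search(pattern, text)

# ASCII-only, comparable to JavaScript's \b: "é" is not a word
# character, so a boundary is found before "P".
ascii_match = re.search(pattern, text, re.ASCII)

print(unicode_match)        # None
print(ascii_match.group())  # Pomsky
```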

Note that a disable statement can be nested within a group:

(disable unicode; %) 'Pomsky' (disable unicode; %)

This is useful when you want to disable Unicode for only part of an expression.
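Python offers a loose analogue of a scoped disable statement: since Python 3.7, the a (ASCII) flag can be set for a single group with (?a:...). In this sketch (again Python, not Pomsky output), only the two boundaries are ASCII-only, while the \w+ after them stays Unicode-aware. Setting re.ASCII for the whole pattern instead makes \w+ stop at the first non-ASCII letter.

```python
import re

text = "Pomsky café"

# Only the boundaries are ASCII-only; \w+ stays Unicode-aware.
scoped = re.search(r"(?a:\b)Pomsky(?a:\b)\s+\w+", text)

# ASCII mode for the whole pattern also restricts \w to [a-zA-Z0-9_].
global_ascii = re.search(r"\bPomsky\b\s+\w+", text, re.ASCII)

print(scoped.group())        # Pomsky café
print(global_ascii.group())  # Pomsky caf
```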