Boundaries
Boundaries match a position in a string without consuming any code points. There are 4 boundaries:
% matches a word boundary. It matches if it is preceded, but not followed, by a word character, or vice versa. For example, Codepoint % Codepoint matches "A;" and ";A", but not "AA" or ";;".
!% matches a position that is not a word boundary. For example, Codepoint !% Codepoint matches "aa" and "::", but not "a:" or ":a".
^ (or Start) matches the start of the string.
$ (or End) matches the end of the string.
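The word boundary examples above can be checked against a real regex engine. The following is a minimal sketch using Python's re module. It assumes that Codepoint % Codepoint behaves like the regex [\s\S]\b[\s\S] and that Codepoint !% Codepoint behaves like [\s\S]\B[\s\S]. These translations are illustrative assumptions, not Pomsky's actual compiler output, and the helper name matches is hypothetical.

```python
import re

# Assumption: "Codepoint % Codepoint" is approximated as [\s\S]\b[\s\S]
# (any code point, a word boundary, any code point), and
# "Codepoint !% Codepoint" as [\s\S]\B[\s\S].
WORD_BOUNDARY = re.compile(r"[\s\S]\b[\s\S]")
NOT_WORD_BOUNDARY = re.compile(r"[\s\S]\B[\s\S]")


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern` (hypothetical helper)."""
    # fullmatch() anchors the pattern to the entire input, so each test
    # string must be exactly two code points long to match.
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ["A;", ";A", "AA", ";;"]:
        print(f"%  {text!r}: {matches(WORD_BOUNDARY, text)}")
    for text in ["aa", "::", "a:", ":a"]:
        print(f"!% {text!r}: {matches(NOT_WORD_BOUNDARY, text)}")
```

The first loop reports True only for "A;" and ";A", and the second only for "aa" and "::", mirroring the examples in the list.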
A word character is anything that matches [word]. If the regex engine is Unicode-aware, this is [Alphabetic Mark Decimal_Number Connector_Punctuation].
For some regex engines, Unicode-aware matching has to be enabled first (see here).
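To make the Unicode definition of a word character more concrete, here is a rough sketch using Python's unicodedata module. It approximates the Alphabetic property as the letter categories (L*) plus letter numbers (Nl). The real Alphabetic property also includes Other_Alphabetic characters, such as circled letters, which this sketch misses. The function name is_word_char is hypothetical.

```python
import unicodedata


def is_word_char(ch: str) -> bool:
    """Roughly check whether `ch` is in [Alphabetic Mark Decimal_Number Connector_Punctuation].

    Approximation (assumption): Alphabetic is treated as L* plus Nl, which
    omits Other_Alphabetic code points that fall into other categories.
    """
    category = unicodedata.category(ch)
    return (
        category.startswith("L")     # letters (part of Alphabetic)
        or category == "Nl"          # letter numbers, e.g. Roman numerals (part of Alphabetic)
        or category.startswith("M")  # Mark, e.g. combining accents
        or category == "Nd"          # Decimal_Number, in any script
        or category == "Pc"          # Connector_Punctuation, e.g. "_"
    )


if __name__ == "__main__":
    for ch in ["a", "é", "\u0301", "\u0663", "_", ";", " ", "-"]:
        print(f"U+{ord(ch):04X} {unicodedata.category(ch)}: {is_word_char(ch)}")
```

Under this approximation, a combining acute accent (U+0301) and an Arabic-Indic digit (U+0663) both count as word characters, while punctuation such as ";" and "-" does not.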
In JavaScript, % and !% are never Unicode-aware, even when the u flag is set. That’s why Unicode must be disabled to use them:
disable unicode;
% 'Pomsky' %
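Whether a boundary is Unicode-aware changes which strings match. As a loose analogy (Python's re module, not JavaScript, and not Pomsky's output), \b in a Python str pattern is Unicode-aware by default, while the re.ASCII flag restricts word characters to ASCII. Python's notion of a Unicode word character is also not identical to [word], so treat this only as an illustration.

```python
import re

TEXT = "éPomsky"

# Unicode-aware (the default for str patterns): "é" counts as a word
# character, so there is no word boundary between "é" and "P".
unicode_match = re.search(r"\bPomsky\b", TEXT)

# ASCII-only: "é" is not a word character, so a boundary exists before "P".
ascii_match = re.search(r"\bPomsky\b", TEXT, re.ASCII)

print(unicode_match)        # None
print(ascii_match.group())  # Pomsky
```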
Note that a disable statement can be nested within a group:
(disable unicode; %) 'Pomsky' (disable unicode; %)
This is useful when you want to disable Unicode for only part of an expression.
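Python's re module has a loose counterpart to a nested disable statement: a scoped inline flag such as (?a:...) (Python 3.7+) switches only its group to ASCII-only matching and leaves the rest of the pattern Unicode-aware. The sketch below is Python syntax used as an analogy. It does not show how Pomsky compiles the expression above.

```python
import re

TEXT = "Pomskyé"

# Only the boundary is ASCII-only: the scoped flag (?a:...) applies to the
# group's contents alone, so the \w+ outside it stays Unicode-aware.
scoped = re.search(r"(?a:\b)\w+", TEXT)

# The whole pattern is ASCII-only, so \w+ stops before "é".
global_ascii = re.search(r"\b\w+", TEXT, re.ASCII)

print(scoped.group())        # Pomskyé
print(global_ascii.group())  # Pomsky
```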