Boundaries (word boundaries and anchors) are assertions that match if the current position has a certain property.
let Boundary =
    | '^'
    | '$'
    | '%'
    | '<'
    | '>';
^ $                # match empty string
% 'foo' %          # match 'foo' surrounded by word boundaries
!% 'foo' !%        # match 'foo' not surrounded by word boundaries
< 'foo' >          # match 'foo' as a whole word
Anchors (^ and $) are supported in all flavors. Word boundaries (
>) are not
Support for boundaries is gated by the
boundaries feature. Specify features with the
All boundaries are assertions – they match between two characters. They do not contain any text, and repeating them has no effect.
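The zero-width property can be observed directly in a regex engine. The following is a minimal sketch using Python's `re` module as a stand-in (an assumption; Pomsky itself is not invoked), with `\b` as the conventional regex word boundary. The sample text and variable names are illustrative only.

```python
import re

text = "foo bar"

# A boundary consumes no characters: the match has zero length.
first = re.search(r"\b", text)
boundary_span = first.span()

# Repeating a boundary changes nothing: \b\b\b matches exactly where \b does.
single = [hit.span() for hit in re.finditer(r"\bfoo\b", text)]
tripled = [hit.span() for hit in re.finditer(r"\b\b\bfoo\b\b\b", text)]
```

Both lists contain the same single span, which suggests that stacking assertions at one position is redundant rather than harmful.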
^ and $ are anchors. They match at the start and end of the string, respectively. Regex engines
usually have a way to change their behavior to match at the start and end of the line instead.
They have the built-in Start and End variables as aliases.
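How an engine switches anchors from string mode to line mode differs between engines. As one hedged illustration, Python's `re` module (used here as a stand-in, not as Pomsky output) exposes this through the `re.MULTILINE` flag. The sample text is an assumption chosen for the demonstration.

```python
import re

text = "one\ntwo"

# Default mode: ^ and $ anchor to the whole string, so no single word spans it.
whole_string = re.findall(r"^\w+$", text)

# Multiline mode: ^ and $ also match at the start and end of each line.
per_line = re.findall(r"^\w+$", text, flags=re.MULTILINE)
```

In default mode the pattern finds nothing, while in multiline mode it finds each line separately.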
% is a word boundary, which matches either at the start or at the end of a word. < only matches at the start of a word, > only at the end. Surround a word with % % or with < > to make sure it doesn’t match a substring of a word, e.g.
test in the word
A word boundary is a position next to a “word character” (matching [word])
only on one side. A word character is a character in one of the following Unicode general
In the ASCII subset of Unicode, this would be the letters a-z, A-Z, the digits 0-9, and the _ underscore.
The % word boundary is the only boundary that can be negated. !% matches a position that is not a word boundary, which means that it must be surrounded by either 0 or 2 word characters.
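The difference between a word boundary and its negation is easiest to see side by side. This sketch uses Python's `re` module with `\b` and `\B`, the conventional regex spellings of a word boundary and a non-boundary; treating them as counterparts of % and !% is an assumption made for illustration, and the sample strings are invented.

```python
import re

# Sample inputs chosen for illustration only.
standalone = "a test case"
embedded = "contested"

whole_word = r"\btest\b"   # 'test' with a word boundary on both sides
inside_word = r"\Btest\B"  # 'test' with no word boundary on either side

whole_in_standalone = bool(re.search(whole_word, standalone))
whole_in_embedded = bool(re.search(whole_word, embedded))
inside_in_standalone = bool(re.search(inside_word, standalone))
inside_in_embedded = bool(re.search(inside_word, embedded))
```

The whole-word pattern only matches the standalone occurrence, and the negated pattern only matches the occurrence buried inside a longer word, where both neighbouring positions have word characters on each side.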
Relation to lookaround
Every boundary can be expressed in terms of lookaround assertions:
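One way to check such equivalences is to compare match positions in a regex engine. The sketch below uses Python's `re` module as a stand-in (an assumption; this is not Pomsky output). The lookaround spellings are standard regex identities written for this example, and the `positions` helper and sample text are hypothetical.

```python
import re

# Each boundary paired with a lookaround-only spelling (illustrative, not Pomsky output).
EQUIVALENTS = {
    r"^": r"(?<![\s\S])",                   # nothing before the position
    r"$": r"(?![\s\S])",                    # nothing after the position
    r"\b": r"(?<=\w)(?!\w)|(?<!\w)(?=\w)",  # word character on exactly one side
    r"\B": r"(?<=\w)(?=\w)|(?<!\w)(?!\w)",  # word character on both sides or neither
}


def positions(pattern, text):
    """Return every offset at which the zero-width pattern matches."""
    return [hit.start() for hit in re.finditer(pattern, text)]


text = "hi, world"  # no trailing newline, so Python's $ only matches at the very end
results = {
    boundary: (positions(boundary, text), positions(lookaround, text))
    for boundary, lookaround in EQUIVALENTS.items()
}
```

For this input, each boundary and its lookaround spelling match at the same offsets. Note that Python's `$` also matches before a trailing newline, so the comparison only holds as written for text without one.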
Anchors are compiled verbatim to ^ and $. Word boundaries are compiled to \b.
< and > are compiled to
- [[:<:]] and [[:>:]] when targeting PCRE
- \< and \> when targeting Rust
- (?<!\w)(?=\w) and (?<=\w)(?!\w) when targeting any other flavor
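The lookaround form can be tried out in any engine that supports lookbehind. This is a minimal sketch in Python's `re` module (a stand-in engine, not Pomsky output). `WORD_END` is the expression quoted above; `WORD_START` is its mirror image and is an assumption of this example, as are the sample strings.

```python
import re

WORD_START = r"(?<!\w)(?=\w)"  # no word character before, one after
WORD_END = r"(?<=\w)(?!\w)"    # one word character before, none after

text = "red green"
starts = [hit.start() for hit in re.finditer(WORD_START, text)]
ends = [hit.start() for hit in re.finditer(WORD_END, text)]

# Wrapping a literal in both assertions matches it only as a whole word.
whole = re.findall(WORD_START + "red" + WORD_END, "red redder bred")
```

Only the standalone "red" is found; the occurrences inside "redder" and "bred" fail one of the two assertions.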
In other flavors, word boundaries are always Unicode aware, even when Unicode has been disabled.
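What Unicode awareness means for a word boundary can be shown with a non-ASCII letter. The sketch below uses Python's `re` module, where `str` patterns are Unicode aware by default and `re.ASCII` restricts word characters to ASCII. Python is only a stand-in here; how each Pomsky target flavor behaves is not demonstrated by this example.

```python
import re

text = "caf\u00e9"  # "café" with a precomposed é

# Unicode aware: 'é' is a word character, so there is no boundary after 'f'.
unicode_hits = re.findall(r"\bcaf\b", text)

# ASCII only: 'é' is not a word character, so a boundary appears after 'f'.
ascii_hits = re.findall(r"\bcaf\b", text, flags=re.ASCII)
```

The same pattern therefore matches or fails depending solely on which characters the engine counts as word characters.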
- Added < and > in Pomsky 0.11
- Removed deprecated <% and %> syntax in Pomsky 0.7
- Added ^ and $ in Pomsky 0.6
- Added Start and End variables in Pomsky 0.4.2
- Initial implementation in Pomsky 0.1
- Using old syntax