Lookaround
Lookarounds allow you to see if the characters before or after the current position match a certain expression. Lookarounds are assertions, meaning that they have a length of 0, much like anchors and word boundaries.
Example
Let’s say we want to match all keys in a JSON file. JSON is a simple, structured text format, e.g.
{
  "languages": [
    {
      "name": "Pomsky",
      "proficiency": "expert",
      "open_source": true
    }
  ]
}
To match all keys, we need to look for strings followed by a colon (:). However, we don’t want the colon to be part of our match; we just want to check that it’s there! Here’s a possible solution:
'"' !['"']* '"' (>> [space]* ':')
The >> introduces a lookahead assertion. It checks that the closing " is followed by a :, possibly with spaces in between. The contents of the lookahead are not included in the match.
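To see the lookahead at work, here is a minimal sketch using Python’s `re` module. The regex is a hand-written equivalent of the Pomsky expression above (an assumption, not the compiler’s actual output), and the names `KEY`, `doc` and `keys` are made up for illustration.

```python
import re

# Hand-written equivalent of:  '"' !['"']* '"' (>> [space]* ':')
# (?=...) is a lookahead: it has to match, but it consumes no characters.
KEY = re.compile(r'"[^"]*"(?=\s*:)')

doc = '{ "languages": [ { "name": "Pomsky", "proficiency": "expert", "open_source": true } ]}'

# Only strings that are followed by a colon are reported.
keys = KEY.findall(doc)
print(keys)

# The colon is checked but not consumed: the first match ends right before it.
first = KEY.search(doc)
char_after_first = doc[first.end()]
print(first.group(), char_after_first)
```

The string values "Pomsky" and "expert" are skipped because they are followed by a comma rather than a colon.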
But what if there’s a key containing escaped quotes, e.g. "foo \"bar\" baz"? To handle this, we need to allow escape sequences in the string:
'"' (!['\"'] | '\\' | '\"')* '"' (>> [space]* ':')
There’s just one piece missing: The first quote should not be preceded by a backslash, so we need another assertion:
(!<< '\') '"' (!['\"'] | '\\' | '\"')* '"' (>> [space]* ':')
This is a negative lookbehind assertion. It asserts that the string is not preceded by the contained expression.
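The full expression can be tried out with the following sketch in Python’s `re` module. Again, the regex is a hand-written translation (an assumption), and `KEY`, `doc` and `keys` are illustrative names.

```python
import re

# Hand-written equivalent of:
#   (!<< '\') '"' (!['\"'] | '\\' | '\"')* '"' (>> [space]* ':')
KEY = re.compile(
    r'(?<!\\)'                # negative lookbehind: no backslash before the quote
    r'"'                      # opening quote
    r'(?:[^\\"]|\\\\|\\")*'   # ordinary character, escaped backslash, or escaped quote
    r'"'                      # closing quote
    r'(?=\s*:)'               # lookahead: optional whitespace, then a colon
)

# Raw string, so the backslashes are part of the text being searched.
doc = r'{ "foo \"bar\" baz": 1, "plain": "value" }'

keys = KEY.findall(doc)
print(keys)
```

The key with escaped quotes is matched as a whole, and the value "value" is still skipped because no colon follows it.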
In total, there are 4 kinds of lookaround assertions:
>> (positive lookahead)
<< (positive lookbehind)
!>> (negative lookahead)
!<< (negative lookbehind)
Note that lookbehind isn’t supported everywhere. Rust’s regex crate supports neither lookbehind nor lookahead.
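As a rough illustration of the four kinds, the sketch below uses the corresponding regex syntax in Python’s `re` module, which supports all of them: `(?=…)`, `(?<=…)`, `(?!…)` and `(?<!…)`. The sample text and variable names are invented for this example.

```python
import re

text = "price: $42, discount: 15%, code 7"

# Positive lookahead (>>): digits followed by '%'.
followed_by_percent = re.findall(r"\d+(?=%)", text)

# Positive lookbehind (<<): digits preceded by '$'.
preceded_by_dollar = re.findall(r"(?<=\$)\d+", text)

# Negative lookahead (!>>): digits not followed by '%' or another digit.
not_followed_by_percent = re.findall(r"\d+(?![%\d])", text)

# Negative lookbehind (!<<): digits not preceded by '$' or another digit.
not_preceded_by_dollar = re.findall(r"(?<![$\d])\d+", text)

print(followed_by_percent, preceded_by_dollar)
print(not_followed_by_percent, not_preceded_by_dollar)
```

The negative variants also exclude a neighbouring digit. Otherwise the engine could satisfy the assertion by matching only part of a number, such as the 1 in 15%.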
Intersection expressions
Lookaround makes it possible to simultaneously match a string in multiple ways. For example:
< (!>> ('_' | 'for' | 'while' | 'if') >) [word]+ >
This matches a string consisting of word characters, but not one of the keywords _, for, while and if.
Be careful when using this technique, because the lookahead might not match the same length as the expression after it. Here, we ensured that both match until the word end with >.
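The length pitfall can be demonstrated with a small sketch in Python’s `re` module. Python has no separate word-start and word-end assertions, so this hand-written translation uses `\b` for both `<` and `>` (an assumption). The names `IDENT`, `UNANCHORED` and `text` are illustrative.

```python
import re

# Hand-written equivalent of:
#   < (!>> ('_' | 'for' | 'while' | 'if') >) [word]+ >
IDENT = re.compile(r"\b(?!(?:_|for|while|if)\b)\w+\b")

# Same pattern, but without the word end inside the lookahead.
UNANCHORED = re.compile(r"\b(?!_|for|while|if)\w+\b")

text = "for x in _ while_loop if format"

# Keywords are rejected, words that merely start with a keyword are kept.
with_word_end = IDENT.findall(text)

# Without the word end, "while_loop" and "format" are wrongly rejected,
# because the lookahead only looks at their first few characters.
without_word_end = UNANCHORED.findall(text)

print(with_word_end)
print(without_word_end)
```

Anchoring the lookahead at the word end forces it to cover the same span as the word being matched, so only exact keywords are excluded.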