Variables
Variables can be declared and later used to keep your code DRY. Variables are inlined into the resulting expressions, similarly to macros in some programming languages.
Syntax
let LetDeclaration = 'let' Name '=' OrExpression ';';
let Variable = Name;
Variables are used simply by mentioning their name.
Example
let number = [digit]+;
let identifier = [ascii_alnum '-']+;
let identifiers = identifier ('.' identifier)*;
number '.' number '.' number ('-' identifiers)? ('+' identifiers)?
Support
Variables are supported in all flavors, since they are inlined.
Support for variables is gated by the variables feature (enabled by default). Specify features with the --allowed-features option.
Behavior
A variable may be used multiple times, but not recursively:
let a = '.' b;
let b = ':' a?; # ERROR
This is because variables are inlined, so recursion would produce a regex of infinite size.
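The cycle check described above can be sketched as a depth-first search over the "uses" relation between variables. This is a minimal illustration in Python, not Pomsky's actual implementation: the function name `find_cycle` and the dictionary representation of declarations are assumptions made for this example.

```python
def find_cycle(deps):
    """Return a list of names forming a cycle, or None if there is none.

    `deps` maps each declared variable name to the list of names it mentions.
    This representation is hypothetical; it only models the idea.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {name: WHITE for name in deps}
    path = []

    def visit(name):
        state[name] = GRAY
        path.append(name)
        for ref in deps.get(name, ()):
            # Names without a declaration (e.g. built-ins) count as finished.
            ref_state = state.get(ref, BLACK)
            if ref_state == GRAY:
                # `ref` is already on the current path: a cycle was found.
                return path[path.index(ref):] + [ref]
            if ref_state == WHITE:
                found = visit(ref)
                if found:
                    return found
        path.pop()
        state[name] = BLACK
        return None

    for name in deps:
        if state[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


# let a = '.' b;  let b = ':' a?;
print(find_cycle({"a": ["b"], "b": ["a"]}))
# let a = b b;  let b = 'test';
print(find_cycle({"a": ["b", "b"], "b": []}))
```

The first call reports the cycle between a and b from the example above; the second declaration set is acyclic, so inlining it terminates.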
Variable declarations must be written before the actual expression. They can be nested within groups and lookaround assertions. When nested, the variables can only be used within the enclosing scope:
(
let foo = 'foo';
foo # allowed
)
foo # ERROR
Variables from an outer scope can be “shadowed” (redeclared) in an inner scope. When using it in the inner scope, it refers to the inner (shadowed) declaration, but when using it in the outer scope, it refers to the outer variable:
let foo = '1';
(
let foo = '2';
foo # 2
)
foo # 1
Technically, these are considered two different variables that just happen to have the same name, but the inner variable is only accessible within the group in which it was declared.
Variables can depend on each other, as long as there are no cycles, and the order of declarations does not matter. Notably, a variable can be used before it was declared:
let a = b b;
let b = 'test';
a
There are a few built-in variables. These can also be shadowed.
Built-in variables
There are 6 built-in variables:
- Grapheme matches a single extended grapheme cluster. It compiles to the regex \X. Note that this functionality is not available in all regex flavors.
- G is an alias for Grapheme.
- Codepoint matches a single Unicode code point. It compiles to the regex [\s\S].
- C is an alias for Codepoint.
- Start matches the start of the string. Equivalent to ^.
- End matches the end of the string. Equivalent to $.
Compilation
Compilation works by recursively substituting variables with the expression in their declaration. This is called expansion:
let a = '.' b?;
let b = 'test'*;
a+
becomes:
let b = 'test'*;
('.' b?)+
becomes:
('.' 'test'*)+
Note that expressions sometimes need to be wrapped in a group. Also, the expansion sometimes enables optimizations, such as the removal of the ? repetition above.
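The substitution step can be modeled with a few lines of Python. This is a toy sketch under stated assumptions, not Pomsky's compiler: it works on source text rather than a syntax tree, only recognizes single-quoted string literals, always wraps the inlined expression in a group, performs no optimizations, and assumes the declarations contain no cycles. The names `expand` and `TOKEN` are made up for this example.

```python
import re

# Matches either a single-quoted string literal or an identifier.
# String literals are matched first so that names inside them are left alone.
TOKEN = re.compile(r"'[^']*'|[A-Za-z_][A-Za-z0-9_]*")


def expand(expr, decls):
    """Recursively replace variable names in `expr` with their declarations.

    `decls` maps variable names to their (unexpanded) expression text.
    Assumes there are no cyclic declarations; otherwise this recurses forever.
    """
    def replace(match):
        token = match.group(0)
        if token.startswith("'") or token not in decls:
            return token  # string literal or unknown name: keep as-is
        # Always group the inlined expression so repetitions bind correctly.
        return "(" + expand(decls[token], decls) + ")"

    return TOKEN.sub(replace, expr)


decls = {"a": "'.' b?", "b": "'test'*"}
print(expand("a+", decls))
```

For the declarations above, this toy version yields ('.' ('test'*)?)+, which keeps the redundant group and the ? that a real compiler may remove.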
Issues
Because of the way variables are compiled, the resulting regex can be quite large. It can be so large, in fact, that regex engines may run out of memory trying to compile it into a state machine. This is particularly likely in the Rust flavor. To remedy this, be careful how often you use variables that expand to complicated expressions.
Security concerns
Expansion of variables is not cached, so compilation time can be exponential in the size of the input. The Billion Laughs attack is an example of this.
Do not compile untrusted Pomsky expressions on a server: an attacker could supply an expression that takes extremely long to compile and exhausts the server’s resources.
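To make the growth concrete, the following Python sketch counts how many literals a chain of declarations inlines to when each variable mentions the previous one several times. It is an illustrative model only: the helper names `laughs` and `count_leaves` and the list-based representation of declarations are assumptions for this example. The counting function deliberately does no caching, mirroring the uncached expansion described above.

```python
def laughs(depth, fanout):
    """Build a chain of declarations: v0 is a literal, and each v(i)
    mentions v(i-1) `fanout` times (a Billion Laughs style input)."""
    decls = {"v0": ["'lol'"]}
    for i in range(1, depth + 1):
        decls[f"v{i}"] = [f"v{i - 1}"] * fanout
    return decls


def count_leaves(name, decls):
    """Count the literals produced by inlining `name`, without caching.

    Every mention of a variable is expanded again from scratch, so the
    running time grows with the size of the fully expanded expression.
    """
    total = 0
    for ref in decls[name]:
        if ref in decls:
            total += count_leaves(ref, decls)
        else:
            total += 1  # a literal such as 'lol'
    return total


# Five short declarations expand to fanout ** depth literals.
print(count_leaves("v4", laughs(depth=4, fanout=10)))
```

With a depth of 4 and a fan-out of 10, five short declarations already expand to 10,000 literals; each additional level multiplies that by 10.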
History
- Built-in variables Start, End, Codepoint, Grapheme added in Pomsky 0.4.2
- Initial implementation in Pomsky 0.3