Example: Email Addresses
A StackOverflow answer contains a massive regular expression intended to match any RFC 5322 compliant email address:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
If your regex engine supports insignificant whitespace mode (?x), it can be written like this:
(?x)
(?:
  [a-z0-9!#$%&'*+/=?^_`{|}~-]+
  (?: \. [a-z0-9!#$%&'*+/=?^_`{|}~-]+ )*
| "
  (?:
    [\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]
  | \\ [\x01-\x09\x0b\x0c\x0e-\x7f]
  )*
  "
)
@
(?:
  (?: [a-z0-9] (?: [a-z0-9-]* [a-z0-9] )? \. )+
  [a-z0-9]
  (?: [a-z0-9-]* [a-z0-9] )?
| \[
  (?:
    (?: (2 (5 [0-5] | [0-4] [0-9]) | 1 [0-9] [0-9] | [1-9]? [0-9]) )
    \.
  ){3}
  (?:
    (2 (5 [0-5] | [0-4] [0-9]) | 1 [0-9] [0-9] | [1-9]? [0-9])
  | [a-z0-9-]*
    [a-z0-9]
    :
    (?:
      [\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]
    | \\ [\x01-\x09\x0b\x0c\x0e-\x7f]
    )+
  )
  \]
)
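As a quick sanity check, the free-spacing form can be tried out in Python, whose `re.VERBOSE` flag plays the role of `(?x)`. The sketch below is only an illustration: the name `EMAIL_VERBOSE` and the sample addresses are made up for this example, and the pattern is kept lowercase-only exactly as written above (adding `re.IGNORECASE` would be a separate choice).

```python
import re

# The pattern from above, passed with re.VERBOSE instead of an inline (?x).
# Whitespace outside character classes is ignored; '#' inside a class is literal.
EMAIL_VERBOSE = re.compile(r"""
(?:
  [a-z0-9!#$%&'*+/=?^_`{|}~-]+
  (?: \. [a-z0-9!#$%&'*+/=?^_`{|}~-]+ )*
| "
  (?:
    [\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]
  | \\ [\x01-\x09\x0b\x0c\x0e-\x7f]
  )*
  "
)
@
(?:
  (?: [a-z0-9] (?: [a-z0-9-]* [a-z0-9] )? \. )+
  [a-z0-9]
  (?: [a-z0-9-]* [a-z0-9] )?
| \[
  (?:
    (?: (2 (5 [0-5] | [0-4] [0-9]) | 1 [0-9] [0-9] | [1-9]? [0-9]) )
    \.
  ){3}
  (?:
    (2 (5 [0-5] | [0-4] [0-9]) | 1 [0-9] [0-9] | [1-9]? [0-9])
  | [a-z0-9-]*
    [a-z0-9]
    :
    (?:
      [\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]
    | \\ [\x01-\x09\x0b\x0c\x0e-\x7f]
    )+
  )
  \]
)
""", re.VERBOSE)


def is_email(text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return EMAIL_VERBOSE.fullmatch(text) is not None


# Hypothetical sample inputs, chosen to hit each top-level alternative.
for sample in ["john.doe@example.com", '"john.doe"@example.com',
               "user@[192.168.0.1]", "user@[256.1.1.1]", "no-at-sign"]:
    print(sample, is_email(sample))
```

Using `fullmatch` rather than `search` matters here, because the pattern itself carries no anchors.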
Here’s a straightforward translation into Pomsky:
(
  | ['a'-'z' '0'-'9' "!#$%&'*+/=?^_`{|}~-"]+
    ('.' ['a'-'z' '0'-'9' "!#$%&'*+/=?^_`{|}~-"]+ )*
  | '"'
    (
      [U+01-U+08 U+0b U+0c U+0e-U+1f U+21 U+23-U+5b U+5d-U+7f]
      | '\' [U+01-U+09 U+0b U+0c U+0e-U+7f]
    )*
    '"'
)
'@'
(
  | ( ['a'-'z' '0'-'9'] ( ['a'-'z' '0'-'9' '-']* ['a'-'z' '0'-'9'] )? '.' )+
    ['a'-'z' '0'-'9']
    ( ['a'-'z' '0'-'9' '-']* ['a'-'z' '0'-'9'] )?
  | '['
    (:(range '0'-'255') '.'){3}
    (
      | :(range '0'-'255')
      | ['a'-'z' '0'-'9' '-']*
        ['a'-'z' '0'-'9']
        ':'
        (
          | [U+01-U+08 U+0b U+0c U+0e-U+1f U+21-U+5a U+53-U+7f]
          | '\' [U+01-U+09 U+0b U+0c U+0e-U+7f]
        )+
    )
    ']'
)
Notice how the complex logic for matching a number between ‘0’ and ‘255’ is replaced by a simple range expression in Pomsky.
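To see what that hand-written logic is doing, the octet alternation from the original regex can be checked exhaustively in Python. This is a small sketch: `OCTET` and `is_octet` are names invented for the example, and the pattern is the alternation copied from the regex above, not the output that Pomsky generates for `range '0'-'255'`.

```python
import re

# Hand-written alternation from the original regex:
#   2(5[0-5]|[0-4][0-9])  covers 200-255
#   1[0-9][0-9]           covers 100-199
#   [1-9]?[0-9]           covers 0-99 (no leading zeros)
OCTET = re.compile(r"2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]")


def is_octet(text: str) -> bool:
    """Return True if the whole string is a decimal number from 0 to 255."""
    return OCTET.fullmatch(text) is not None


# Collect every number below 1000 whose decimal form is accepted.
accepted = [n for n in range(1000) if is_octet(str(n))]
print(accepted == list(range(256)))
```

The three alternatives each cover one band of values, which is exactly the bookkeeping a range expression spares the author from doing by hand.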
We can also write the above as follows using variables:
let before_at = ['a'-'z' '0'-'9' "!#$%&'*+/=?^_`{|}~-"];
let escaped = '\' [U+01-U+09 U+0b U+0c U+0e-U+7f];
let quoted_before_at = [U+01-U+08 U+0b U+0c U+0e-U+1f U+21 U+23-U+5b U+5d-U+7f];
let port_digit = [U+01-U+08 U+0b U+0c U+0e-U+1f U+21-U+5a U+53-U+7f];
let lower_digit = ['a'-'z' '0'-'9'];
let lower_digit_dash = ['a'-'z' '0'-'9' '-'];
let domain_label = lower_digit (lower_digit_dash* lower_digit)?;
(
  | before_at+ ('.' before_at+)*
  | '"' (quoted_before_at | escaped)* '"'
)
'@'
(
  | (domain_label '.')+ domain_label
  | '['
    (:(range '0'-'255') '.'){3}
    (
      | :(range '0'-'255')
      | lower_digit_dash* lower_digit ':' (port_digit | escaped)+
    )
    ']'
)
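For comparison, the same decomposition can be approximated in a host language by composing the regex from named string fragments. The Python sketch below mirrors the variable names of the Pomsky version; `octet`, `local_part`, `hostname`, `address_literal` and `EMAIL_COMPOSED` are names introduced only for this illustration, and `octet` is the hand-written alternation from the original regex rather than anything produced by Pomsky.

```python
import re

# Fragments mirroring the `let` bindings above (plain regex syntax).
before_at = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]"
escaped = r"\\[\x01-\x09\x0b\x0c\x0e-\x7f]"
quoted_before_at = r"[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]"
port_digit = r"[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]"
lower_digit = r"[a-z0-9]"
lower_digit_dash = r"[a-z0-9-]"
domain_label = rf"{lower_digit}(?:{lower_digit_dash}*{lower_digit})?"

# Stand-in for `range '0'-'255'`: the alternation from the original regex.
octet = r"(?:2(?:5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])"

# Part before the '@': dot-separated atoms, or a quoted string.
local_part = rf'(?:{before_at}+(?:\.{before_at}+)*|"(?:{quoted_before_at}|{escaped})*")'

# Part after the '@': a host name, or a bracketed address literal.
hostname = rf"(?:{domain_label}\.)+{domain_label}"
address_literal = (
    rf"\[(?:{octet}\.){{3}}"
    rf"(?:{octet}|{lower_digit_dash}*{lower_digit}:(?:{port_digit}|{escaped})+)\]"
)
domain = rf"(?:{hostname}|{address_literal})"

EMAIL_COMPOSED = re.compile(local_part + "@" + domain)

# Hypothetical sample inputs.
for sample in ["john.doe@example.com", "user@[192.168.0.1]", "user@[256.1.1.1]"]:
    print(sample, EMAIL_COMPOSED.fullmatch(sample) is not None)
```

String composition like this gets the naming benefit, but every fragment has to be wrapped in its own group by hand and braces must be escaped in f-strings, which are the kinds of details Pomsky variables handle for you.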