Dots
You can use the dot (.) to match any character, except line breaks. For example:
... # 3 characters
Most regex engines have a “singleline” option that changes the behavior of the dot. When enabled,
the dot matches everything, even line breaks. You could use this to check if a text fits in an SMS:
.{1,160} # enforces the 160 character limit
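To see the difference the “singleline” option makes, here is a minimal sketch using Python's re module as a stand-in regex engine. In Python the option is called re.DOTALL; the helper name fits_in_sms is made up for this illustration. Note that the length check only works when the whole text has to match, which is why the sketch uses re.fullmatch:

```python
import re

text_with_newline = "ab\ncd"

# By default, the dot stops at the first line break.
default_match = re.match(r".*", text_with_newline).group()

# With the "singleline" option (re.DOTALL in Python), it crosses line breaks.
dotall_match = re.match(r".*", text_with_newline, re.DOTALL).group()


def fits_in_sms(message: str) -> bool:
    """Hypothetical helper: True if the message has 1 to 160 codepoints."""
    # fullmatch requires the whole message to match, not just a prefix.
    return re.fullmatch(r".{1,160}", message, re.DOTALL) is not None
```

Without re.DOTALL, a message containing a line break would be rejected by fits_in_sms, no matter how short it is.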
If you want to match any character, without having to enable the “singleline” option, Pomsky also
offers the variable C, or Codepoint:
Codepoint{1,160}
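As a rough sketch of what this buys you, the snippet below assumes that Codepoint behaves like the character class [\s\S] (whitespace or non-whitespace, so every codepoint). That equivalence is an assumption made for this illustration, and the names ANY_CODEPOINT and fits_in_sms_no_flag are hypothetical. Python's re module serves as the regex engine:

```python
import re

# Assumption: a class that matches whitespace or non-whitespace,
# i.e. any codepoint, including line breaks.
ANY_CODEPOINT = r"[\s\S]"


def fits_in_sms_no_flag(message: str) -> bool:
    """Hypothetical helper: 1 to 160 codepoints, no DOTALL flag needed."""
    return re.fullmatch(ANY_CODEPOINT + r"{1,160}", message) is not None
```

No flag is passed here, yet line breaks still count towards the limit.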
What’s a codepoint?
I lied when I said that the dot matches a character; it actually matches a Unicode codepoint.
A Unicode codepoint usually, but not always, represents a character. Exceptions are
composite characters like ć (which may consist of a ´ and a c when it isn’t normalized).
Composite characters are common in many scripts, including Japanese, Indian and Arabic scripts.
Also, an emoji can consist of multiple codepoints, e.g. when it has a gender or skin tone modifier.
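You can check this yourself. The following sketch uses Python, whose strings are sequences of codepoints and whose dot also matches one codepoint at a time. The variable names are made up for this illustration:

```python
import re
import unicodedata

# "ć" as a single precomposed codepoint (NFC) ...
composed = unicodedata.normalize("NFC", "c\u0301")
# ... and as a "c" followed by a combining acute accent (NFD).
decomposed = unicodedata.normalize("NFD", composed)

# A single dot matches the composed form, but not the decomposed one.
one_dot_matches_composed = re.fullmatch(r".", composed) is not None
one_dot_matches_decomposed = re.fullmatch(r".", decomposed) is not None
two_dots_match_decomposed = re.fullmatch(r"..", decomposed) is not None

# A thumbs-up emoji followed by a skin tone modifier is two codepoints.
thumbs_up_medium = "\U0001F44D\U0001F3FD"
emoji_codepoints = len(thumbs_up_medium)
```

Both forms of ć look the same when rendered, but the decomposed one needs two dots to match.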
Repeating the dot
Be careful when repeating C or the dot. My personal recommendation is to never repeat them. Let’s
see why:
'{' .* '}'
This matches any content surrounded by curly braces. Why is this bad? Because .*
will greedily consume anything, even curly braces, so looking for matches in the string
{ab} de {fg} will return the whole string, but we probably expected to get the two
matches {ab} and {fg}.
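Here is the same problem reproduced in Python, as a small sketch. The pattern \{.*\} is a hand-written regex counterpart of the expression above, and the variable names are made up for this illustration:

```python
import re

text = "{ab} de {fg}"

# The greedy .* runs to the end of the string and then backtracks
# only as far as the last closing brace.
greedy_matches = re.findall(r"\{.*\}", text)
```

Here greedy_matches contains a single element, the entire string, rather than the two brace groups.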
We’ll see how this can be fixed in a bit.