Dots
You can use the dot (.) to match any character, except line breaks. For example:
... # 3 characters
Most regex engines have a “singleline” option that changes the behavior of the dot. When enabled,
. matches everything, even line breaks. You could use this to check if a text fits in an SMS:
.{1,160} # enforces the 160 character limit
If you want to match any character without having to enable the “singleline” option, Pomsky also
offers the variable C, or Codepoint:
Codepoint{1,160}
What’s a codepoint?
I lied when I said that the dot matches a character; it actually matches a Unicode codepoint.
A Unicode codepoint usually, but not always, represents a character. Exceptions are
composite characters like ć (which may consist of a c followed by a combining ´ when the text isn’t normalized to its composed form).
Composite characters are common in many scripts, including Japanese, Indian and Arabic scripts.
Also, an emoji can consist of multiple codepoints, e.g. when it has a gender or skin tone modifier.
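The difference between codepoints and characters can be checked in any engine whose dot matches one codepoint. The following is a minimal sketch using Python's re module, not Pomsky itself: the pattern .{1,160} with re.DOTALL is a hand-written stand-in for Codepoint{1,160}, and the helper name fits_in_sms is hypothetical.

```python
import re
import unicodedata

# Assumption: a hand-written regex standing in for the Pomsky expression
# Codepoint{1,160}. re.DOTALL plays the role of the "singleline" option,
# so the dot also matches line breaks.
SMS_LIMIT = re.compile(r".{1,160}", flags=re.DOTALL)


def fits_in_sms(text: str) -> bool:
    """Hypothetical helper: True if text has between 1 and 160 codepoints."""
    return SMS_LIMIT.fullmatch(text) is not None


# The same visible character, stored in two different ways.
composed = unicodedata.normalize("NFC", "c\u0301")    # single codepoint U+0107
decomposed = unicodedata.normalize("NFD", "\u0107")   # "c" + combining acute

# One dot matches the composed form, but two dots are needed
# for the decomposed form, because the dot counts codepoints.
one_dot_composed = re.fullmatch(r".", composed) is not None
one_dot_decomposed = re.fullmatch(r".", decomposed) is not None
two_dots_decomposed = re.fullmatch(r"..", decomposed) is not None

# Without DOTALL the dot refuses to match a line break.
three_dots_default = re.fullmatch(r"...", "a\nb") is not None
three_dots_dotall = re.fullmatch(r"...", "a\nb", flags=re.DOTALL) is not None
```

In this sketch a message of 160 decomposed ć characters would be rejected, because it contains 320 codepoints even though a reader sees only 160 characters.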
Repeating the dot
Be careful when repeating C or the dot. My personal recommendation is to never repeat them. Let’s
see why:
'{' .* '}'
This matches any content surrounded by curly braces. Why is this bad? Because .*
will greedily consume anything, even curly braces, so looking for matches in the string
{ab} de {fg} will return the whole string, but we probably expected to get the two matches {ab}
and {fg}.
We’ll see how this can be fixed in a bit.
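The greedy behavior described above can be reproduced with a short sketch. It uses Python's re module rather than Pomsky, and the pattern \{.*\} is a hand-written regex assumed to correspond to '{' .* '}'.

```python
import re

text = "{ab} de {fg}"

# Assumption: hand-written regex equivalent of the Pomsky expression '{' .* '}'.
# The greedy .* runs to the end of the line, then backtracks only as far
# as the last closing brace.
greedy_matches = re.findall(r"\{.*\}", text)

print(greedy_matches)  # ['{ab} de {fg}']
```

The search returns a single match spanning the whole string, not the two matches {ab} and {fg}.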