
Dots

You can use the dot (.) to match any character, except line breaks. For example:

... # 3 characters

Most regex engines have a “singleline” option that changes the behavior of the dot. When enabled, . matches everything, even line breaks. You could use this to check whether a text fits in an SMS:

.{1,160} # enforces the 160 character limit
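To see the effect of the “singleline” option, here is a minimal sketch in Python, used only as a stand-in regex engine. It assumes that re.DOTALL is Python's equivalent of “singleline”; the helper name fits_in_sms is made up for this illustration.

```python
import re


def fits_in_sms(text: str, singleline: bool = True) -> bool:
    """Return True if `text` consists of 1 to 160 codepoints."""
    # re.DOTALL is Python's name for the "singleline" option:
    # with it, the dot also matches line breaks.
    flags = re.DOTALL if singleline else 0
    return re.fullmatch(r".{1,160}", text, flags) is not None


message = "See you at 8?\nBring snacks."
print(fits_in_sms(message))                    # line break is accepted
print(fits_in_sms(message, singleline=False))  # line break makes the match fail
```

Without the option, a message that contains a line break is rejected even though it is well below the limit, which is probably not what you want for an SMS check.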

If you want to match any character, without having to enable the “singleline” option, Pomsky also offers the variable C, or Codepoint:

Codepoint{1,160}

I lied when I said that the dot matches a character; it actually matches a Unicode codepoint.

A Unicode codepoint usually, but not always, represents a character. Exceptions are composite characters like ć (which may consist of a c followed by a combining ´ when it isn’t normalized). Composite characters are common in many scripts, including Japanese, Indian and Arabic scripts. Also, an emoji can consist of multiple codepoints, e.g. when it has a gender or skin tone modifier.
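The difference between codepoints and characters is easy to check. The following sketch uses Python, whose strings are sequences of codepoints and whose dot, like the one described here, matches a single codepoint. The variable names are only for illustration.

```python
import re
import unicodedata

decomposed = "c\u0301"  # "c" followed by U+0301 COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # single codepoint U+0107

# Both render as the same character, but the codepoint counts differ.
print(len(decomposed), len(composed))

# A single dot matches exactly one codepoint, so it only covers
# the normalized (composed) form of the character.
one_dot_decomposed = re.fullmatch(r".", decomposed) is not None
one_dot_composed = re.fullmatch(r".", composed) is not None
print(one_dot_decomposed, one_dot_composed)

# A thumbs-up emoji followed by a medium skin tone modifier.
thumbs_up = "\U0001F44D\U0001F3FD"
print(len(thumbs_up))
```

So a limit like {1,160} counts codepoints, and a text that looks shorter than 160 characters can still exceed it.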

Be careful when repeating C or the dot. My personal recommendation is to never repeat them. Let’s see why:

'{' .* '}'

This matches any content surrounded by curly braces. Why is this bad? Because .* greedily consumes anything, even curly braces. Searching the string {ab} de {fg} therefore returns the whole string as a single match, although we probably expected the two matches {ab} and {fg}.
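You can reproduce the problem with any regex engine. The sketch below uses Python and assumes that the expression above corresponds to the regex \{.*\}; the exact output of the Pomsky compiler may differ.

```python
import re

text = "{ab} de {fg}"

# Assumed regex equivalent of the Pomsky expression '{' .* '}'
greedy = re.compile(r"\{.*\}")

# The greedy .* runs to the end of the line and then backtracks
# only as far as the last closing brace.
matches = greedy.findall(text)
print(matches)
```

The search yields one match that spans the whole string, not one match per pair of braces.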

We’ll see how this can be fixed in a bit.