Pomsky 0.10 released with new tools

Posted on March 22, 2023 by Ludwig Stecher ‐ 4 min read

Cover art: A grown husky standing in the grass on a river bank, as if he is thinking about taking a swim. It is a beautiful, sunny day, and the husky looks happy as he is sticking out his tongue. His fur is black on the back and changes to light gray towards the legs. His face is mostly white, with a black forehead. His neck has white and dark parts.

Just last week, Pomsky celebrated its first birthday! Today, I’m announcing version 0.10, along with a VSCode extension and a JavaScript plugin. The plugin lets you write regular expressions in Pomsky and import them as RegExp objects with a bundler like Vite or Webpack.

What is Pomsky?

Pomsky is a modern syntax for regular expressions, transpiled to regexes compatible with JavaScript, Java, PCRE, Rust, Ruby, Python, or .NET. It aims to be more readable, portable, and powerful than traditional regexes. The biggest difference is that Pomsky is whitespace-insensitive, and string literals are quoted. For example, the regex \[.+\] could be written as '[' .+ ']'. Here’s a larger example that shows Pomsky’s strengths:

# variables allow code re-use
let octet = range '0'-'255';

# ranges can match multi-digit numbers
let subnetMask = '/' range '0'-'32';

# concise syntax for named capturing groups (ipv4, mask)
:ipv4(octet ('.' octet){3}) :mask(subnetMask)?

This is transpiled to the following regex:

(?<ipv4>(?:0|1[0-9]{0,2}|2(?:[0-4][0-9]?|5[0-5]?|[6-9])?|[3-9][0-9]?)(?:\.(?:0|1[0-9]{0,2}|2(?:[0-4][0-9]?|5[0-5]?|[6-9])?|[3-9][0-9]?)){3})(?<mask>/(?:0|[1-2][0-9]?|3[0-2]?|[4-9]))?
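To see the named groups in action, here is a minimal JavaScript sketch that wraps the compiled regex shown above. The pattern string is copied verbatim from the output; the `^...$` anchors and the helper name `parseCidr` are assumptions made for this illustration and are not produced by Pomsky.

```javascript
// Compiled output from above, copied verbatim. String.raw keeps the
// backslashes intact, so the pattern needs no extra escaping.
const compiled = String.raw`(?<ipv4>(?:0|1[0-9]{0,2}|2(?:[0-4][0-9]?|5[0-5]?|[6-9])?|[3-9][0-9]?)(?:\.(?:0|1[0-9]{0,2}|2(?:[0-4][0-9]?|5[0-5]?|[6-9])?|[3-9][0-9]?)){3})(?<mask>/(?:0|[1-2][0-9]?|3[0-2]?|[4-9]))?`;

// Assumption for this sketch: anchor the pattern so that the whole
// input has to be an address, not just a part of it.
const anchored = new RegExp(`^(?:${compiled})$`);
const unanchored = new RegExp(compiled);

// Hypothetical helper: returns the captured parts, or null if the
// input is not an IPv4 address with an optional subnet mask.
function parseCidr(input) {
  const match = anchored.exec(input);
  if (match === null) return null;
  const { ipv4, mask } = match.groups;
  return { ipv4, mask: mask ?? null };
}

console.log(parseCidr('80.255.255.254/24')); // ipv4: '80.255.255.254', mask: '/24'
console.log(parseCidr('127.0.0.1'));         // ipv4: '127.0.0.1', mask: null
console.log(parseCidr('400.1.2.3'));         // null
```

Note that the compiled regex itself is not anchored: `unanchored.test('400.1.2.3')` is true, because the substring `0.1.2.3` is a valid address. If you need to validate whole strings, add anchors or check the match boundaries yourself.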

What’s new

Version 0.10 includes a number of language changes and new tools.

VSCode extension

Pomsky now has a VSCode extension, with syntax highlighting, auto-completion, and diagnostics (errors and warnings). It also has a preview panel, which you can open with the icon in the top right corner or from the right-click menu:

VSCode window showing a Pomsky file with the ‘.pom’ extension on the left, and a panel with the compiled regular expression on the right. The panel has a toolbar with a dropdown to select a regex flavor. VSCode’s blue toolbar at the bottom tells us that the expression was compiled in 0.07 ms, using Pomsky 0.10.0

To use the VSCode extension, you currently need to have Pomsky installed locally. You can download it from the releases page. If you want to try Pomsky without installing anything, check out the playground!

Note that we don’t have an official file extension yet. You can vote for one here.

JavaScript plugin

Thanks to @Kyza, we now have a JavaScript plugin that allows you to easily import Pomsky expressions in your JavaScript projects using Vite, Webpack, Rollup, ESBuild or ESM. Once you have followed the installation instructions, you can use it like this:

import ipv4 from './ipv4.pom'

function isValidIpV4(string) {
  return ipv4().test(string)
}

How does it work? The plugin looks for imports with the file extension .pom or .pomsky and transforms them to export a function that returns the corresponding RegExp. This is done at build time, so it doesn’t affect website performance. I would like to thank @Kyza for creating this wonderful plugin.
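As a rough mental model, importing a Pomsky file behaves as if you had written a small module by hand. The sketch below is such a hand-written stand-in, not the plugin’s actual generated code: the function names are hypothetical, and the real output (including any regex flags) may look different. It uses the expression '[' .+ ']' from the introduction, which corresponds to the regex \[.+\].

```javascript
// Hand-written stand-in for a module generated from a Pomsky file
// containing:  '[' .+ ']'
// Hypothetical: the real plugin output and its flags may differ.
function bracketed() {
  // A regex literal is evaluated anew on every call, so each caller
  // gets its own RegExp object.
  return /\[.+\]/;
}

// Usage mirrors the import example above.
function hasBracketedText(string) {
  return bracketed().test(string);
}

console.log(hasBracketedText('see [note]'));  // true
console.log(hasBracketedText('no brackets')); // false
```

Because the stand-in returns a new RegExp on each call, state such as `lastIndex` is not shared between callers.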

Disabling Unicode

One annoying thing about regular expressions in JavaScript is that \w and \b do not respect Unicode, even if the u flag (short for unicode) is set! Pomsky polyfills Unicode support for [word], but not for % (a word boundary), for two reasons (see footnote 1).

To address this issue, Pomsky now lets you opt out of Unicode support. Word boundaries are now forbidden in the JavaScript flavor by default, but can be used once you disable Unicode support:

disable unicode;
% 'Pomsky' %

This makes sense when you know that your input contains only ASCII characters. It also has the advantage that some shorthands, such as [word], can be optimized better: [word] normally matches over 800 Unicode ranges, but when Unicode is disabled, it’s just [a-zA-Z0-9_], which is more performant.
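The behaviour described in this section can be checked in any modern JavaScript engine. The following sketch compares the built-in \w, the polyfilled character class from footnote 1, and the ASCII-only class. The variable names and sample words are chosen for illustration only.

```javascript
// \w stays ASCII-only in JavaScript, even with the u flag.
const builtinWord = /^\w+$/u;
// Character class Pomsky emits for [word] (taken from footnote 1).
const unicodeWord = /^[\p{Alphabetic}\p{M}\p{Nd}\p{Pc}]+$/u;
// What [word] becomes when Unicode is disabled.
const asciiWord = /^[a-zA-Z0-9_]+$/;

const results = {
  builtinAscii: builtinWord.test('Pomsky'),   // true
  builtinUmlaut: builtinWord.test('größe'),   // false: ö and ß are not \w
  unicodeUmlaut: unicodeWord.test('größe'),   // true
  asciiUmlaut: asciiWord.test('größe'),       // false
  // \b sees a "word boundary" between ü and b, because ü is not \w.
  falseBoundary: /\bber\b/u.test('über'),     // true
};

console.log(results);
```

The last result shows why \b is risky on non-ASCII input: it reports a word boundary in the middle of the word "über". With ASCII-only input, none of these differences matter, which is the case `disable unicode;` is meant for.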

Reserved keywords

U was reserved to make the syntax for codepoints (e.g. U+FF) unambiguous. Now codepoints can be parsed even if they contain spaces (e.g. U + FF), although this isn’t usually recommended.

Also, test has been reserved as a keyword to support unit testing in the future. Pomsky will be able to run unit tests during compilation to make sure that the regex matches certain strings:

test {
  match '127.0.0.1';
  match '80.255.255.254/24';
  reject '400.1.2.3';
}

# your Pomsky expression here...

These tests not only give you immediate feedback if the expression is incorrect, they also document the code so that a reader can see examples of what the expression does or does not match. Unit tests will probably land in Pomsky 0.11, but the syntax is not fully fleshed out yet. Let me know what you think!

You can find the full list of changes in the changelog.

A glimpse into the crystal ball

In January, I published my roadmap for this year, and some things are already done! The VSCode extension is not feature complete, but it is already very useful. Most of the language improvements I envisioned in January have also been implemented. The testing infrastructure has also made progress: All regex flavors are now tested in CI and compiled in our fuzzer.

As for the future, it can be hard to predict what will happen. For example, I didn’t expect someone to create a JavaScript plugin, which was a pleasant surprise! I also underestimated my own velocity, so some of the things I put in the “not planned” section could be completed this year.

If there is something you would like to work on, any help is appreciated! Pomsky has some good first issues. For example, if you’re a native English speaker, we could use your help to overhaul the documentation.

If you like Pomsky, consider sponsoring me. You can also give Pomsky a star and join our Discord server.

Cheers,
Ludwig


  1. [word] is polyfilled as [\p{Alphabetic}\p{M}\p{Nd}\p{Pc}]. Word boundaries (%) aren’t polyfilled for two reasons. First, the produced regex would be very large:

     (?:(?<![\p{Alphabetic}\p{M}\p{Nd}\p{Pc}])(?=[\p{Alphabetic}\p{M}\p{Nd}\p{Pc}])|(?<=[\p{Alphabetic}\p{M}\p{Nd}\p{Pc}])(?![\p{Alphabetic}\p{M}\p{Nd}\p{Pc}]))

     Second, the polyfill requires lookbehind, which isn’t universally supported. Notably, lookbehind doesn’t work in Safari.