The standard notation for regular expressions dates back to a time when short, concise notations were preferred over lengthy, verbose ones. Such notational brevity, powerful though it is, can all too easily trigger panic in the uninitiated. In Perl, we can reduce the shock value of regex-induced punctuation overload with a few simple constructs.
The /x modifier is the most important of these. It causes whitespace to be ignored in the regex (well, except in a character class), and also allows you to use normal comments there, too. As you can imagine, whitespace and comments help significantly.
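As a small sketch of the character-class exception, the pattern below is spread over several lines with comments, yet the space inside each [ -] class still has to match. The parse_date helper and the date format are made up for illustration; they are not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: split a date such as "2024-01-15" into its parts.
sub parse_date {
    my ($text) = @_;
    return $text =~ m{
        \A \s*
        (\d{4})     # year
        [ -]        # separator: a space or a hyphen (whitespace counts in a class)
        (\d{2})     # month
        [ -]        # same separator again
        (\d{2})     # day
        \s* \z
    }x;
}

my @parts = parse_date("2024 01-15");
print "@parts\n";   # prints "2024 01 15"
```

One caution when commenting this way: the pattern is still scanned for its closing delimiter and for variables before the comments are thrown away, so it is safest to keep the delimiter character and any $ or @ out of the comments.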
Another feature that can enhance legibility is selecting your own delimiters for matching or substitution. This way an unfortunate pattern afflicted with LTS (that's Leaning Toothpick Syndrome) can be written in a variety of ways:
    if ( /^\/usr\/bin\/perl\b/ ) { ... }
    if ( m(^/usr/bin/perl\b) )   { ... }
    if ( m{^/usr/bin/perl\b} )   { ... }
    if ( m#^/usr/bin/perl\b# )   { ... }
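The same trick works for substitution, where the toothpicks tend to pile up twice as fast. Here is a minimal sketch; the relocate helper and both directory names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: both directory names are made up for illustration.
sub relocate {
    my ($path) = @_;
    # Toothpick version: s/^\/usr\/local\/bin\//\/opt\/tools\/bin\//
    $path =~ s{^/usr/local/bin/}{/opt/tools/bin/};
    return $path;
}

print relocate("/usr/local/bin/perl"), "\n";   # prints /opt/tools/bin/perl
```

With bracketing delimiters such as braces, the pattern and the replacement each get their own pair, which keeps the two halves visually distinct.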
To see how much /x helps, contrast this apparent hiccup from your modem:
    s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;
with its legible rewrite, derived from the striphtml program described in the FAQ section on Networking:
    s{ <                    # opening angle bracket
        (?:                 # Non-backreffing grouping paren
            [^>'"] *        # 0 or more things that are neither > nor ' nor "
                |           #    or else
            ".*?"           # a section between double quotes (stingy match)
                |           #    or else
            '.*?'           # a section between single quotes (stingy match)
        ) +                 #   all occurring one or more times
        >                   # closing angle bracket
    }{}gsx;                 # replace with nothing, i.e. delete
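To try the commented version out, it can be wrapped in a small subroutine and fed a tag whose quoted attribute contains a stray >. This is only a sketch: the strip_tags name and the sample input are assumptions, and a pattern like this is not a full HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the commented substitution.
sub strip_tags {
    my ($html) = @_;
    $html =~ s{
        <                # opening angle bracket
        (?:
            [^>'"] *     # plain characters inside the tag
          |
            ".*?"        # a double-quoted attribute value
          |
            '.*?'        # a single-quoted attribute value
        ) +
        >                # closing angle bracket
    }{}gsx;
    return $html;
}

print strip_tags(q{<a href="x>y">link</a> text}), "\n";   # prints "link text"
```

The quoted alternatives are what keep the > inside the attribute value from ending the tag early.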
Ok, so it's still not quite so clear as prose, but at least now you have a chance of going back to it later and having a clue what you were trying to do.