Let's suppose you have some weird Martian encoding where pairs of ASCII uppercase letters encode single Martian letters (i.e., the two bytes ``CV'' make a single Martian letter, as do the two bytes ``SG'', ``VS'', ``XX'', etc.). Other bytes represent single characters, just like ASCII.
So, the string of Martian ``I am CVSGXX!'' uses 12 bytes to encode the nine characters 'I', ' ', 'a', 'm', ' ', 'CV', 'SG', 'XX', '!'.
Now, say you want to search for the single character /GX/. Perl doesn't know about Martian, so it'll find the two bytes ``GX'' in the ``I am CVSGXX!'' string, even though that character isn't there. It only looks like it is because ``SG'' sits next to ``XX''; there is no real ``GX''. This is a big problem.
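To see the false match concretely, here is a minimal sketch that runs the naive byte-level search on the example string and reports where it matched. It relies only on core Perl: `$-[0]` holds the offset at which the last successful match began.

```perl
use strict;
use warnings;

my $martian = "I am CVSGXX!";

# Naive byte-level search: Perl sees 12 ordinary bytes, so it matches the
# "G" from the Martian letter SG followed by the "X" from the letter XX.
my $hit = ($martian =~ /GX/);
my $offset = $hit ? $-[0] : -1;   # byte offset where the match began

print "false match at byte offset $offset\n" if $hit;
```

The match starts at byte 8, which is the second half of ``SG'', so it straddles two Martian characters.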
Here are a few ways, all painful, to deal with it:
    $martian =~ s/([A-Z][A-Z])/ $1 /g;  # Make sure adjacent "martian" bytes
                                        # are no longer adjacent.
    print "found GX!\n" if $martian =~ /GX/;
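The padding trick rewrites `$martian` in place, which is often not what you want. A minimal sketch that pads a copy instead; `has_martian_letter` is a hypothetical helper name, and it assumes well-formed input in which every uppercase byte belongs to a pair.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the two-byte Martian letter $want occurs
# as a real character in $str. Pads every uppercase pair on a copy, so
# the caller's string is left untouched.
sub has_martian_letter {
    my ($str, $want) = @_;
    (my $spaced = $str) =~ s/([A-Z][A-Z])/ $1 /g;
    return $spaced =~ /\Q$want\E/ ? 1 : 0;
}

my $text = "I am CVSGXX!";
print "GX in '$text'? ", has_martian_letter($text, "GX"), "\n";    # straddling pair, not a letter
print "SG in '$text'? ", has_martian_letter($text, "SG"), "\n";    # a real letter
print "GX in 'I am CVGXXX!'? ", has_martian_letter("I am CVGXXX!", "GX"), "\n";
```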
Or like this:
    @chars = $martian =~ m/([A-Z][A-Z]|[^A-Z])/g;
    # above is conceptually similar to:     @chars = $text =~ m/(.)/g;
    #
    foreach $char (@chars) {
        if ($char eq 'GX') {
            print "found GX!\n";    # print first, then leave the loop
            last;
        }
    }
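The same tokenizing regex also confirms the byte and character counts given for ``I am CVSGXX!''. This is a small sketch using only core Perl.

```perl
use strict;
use warnings;

my $martian = "I am CVSGXX!";

# Split into Martian characters: an uppercase pair, or any other single byte.
my @chars = $martian =~ m/([A-Z][A-Z]|[^A-Z])/g;

printf "%d bytes, %d characters: %s\n",
    length($martian), scalar(@chars), join('|', @chars);
# prints: 12 bytes, 9 characters: I| |a|m| |CV|SG|XX|!
```

One design caveat: on malformed input, a lone uppercase byte matches neither alternative, so m//g silently skips it rather than reporting an error.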
Or like this:
    while ($martian =~ m/\G([A-Z][A-Z]|.)/gs) {  # \G is redundant here: with /s,
                                                  # the pattern matches at every position
        if ($1 eq 'GX') {
            print "found GX!\n";    # print first, then leave the loop
            last;
        }
    }
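A subtlety of the `while (m//g)` approach: leaving the loop with `last` skips the final failing match that would normally reset `pos()`, so a later m//g on the same string resumes where the loop stopped. A minimal sketch, using a string in which ``GX'' really is a Martian letter:

```perl
use strict;
use warnings;

my $martian = "I am CVGXXX!";   # tokens: ... CV GX XX !

while ($martian =~ m/\G([A-Z][A-Z]|.)/gs) {
    last if $1 eq 'GX';
}

# pos() still points just past the GX pair.
my $stopped_at = pos($martian);
print "pos after last: $stopped_at\n";

# Reset it so any later m//g on $martian starts from the beginning again.
pos($martian) = undef;
```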
Or like this:
die "sorry, Perl doesn't (yet) have Martian support )-:\n";
There are many double- (and multi-) byte encodings commonly used these days, including:
    Big Five   (Chinese)
    EUC-JP     (Japanese)
    GB         (Chinese)
    KS         (Korean)
    SJIS       (Microsoft Braindamage to Japanese)
    Unicode    (various)

Some of these (UTF-8, for one) mix 1-, 2-, 3-, and 4-byte characters in the same string.
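For real encodings like these, the Encode module (bundled with Perl since 5.8) can decode bytes into Perl characters before you match, so the regex engine sees characters instead of bytes. The sketch below assumes Shift-JIS input and uses the kanji 表 (U+8868), whose Shift-JIS bytes are 0x95 0x5C. The second byte is the ASCII backslash, which gives exactly the kind of false match described for Martian.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Shift-JIS bytes for one kanji; the trailing byte 0x5C is also ASCII "\".
my $bytes = "\x95\x5C";

# Byte-level search: a false match, like /GX/ in the Martian example.
my $byte_hit = ($bytes =~ /\\/) ? 1 : 0;

# Decode to Perl characters first; the regex now sees a single character.
my $chars    = decode('shiftjis', $bytes);
my $char_hit = ($chars =~ /\\/) ? 1 : 0;

printf "bytes: length %d, backslash match=%d\n", length($bytes), $byte_hit;
printf "chars: length %d, backslash match=%d\n", length($chars), $char_hit;
```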