This problem shows up most often when people try using chmod, mkdir, umask, or sysopen, which all want real permissions in octal.
chmod(644, $file);   # WRONG -- perl -w catches this
chmod(0644, $file);  # right
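When the permission arrives as a string (say, from a config file) rather than as a literal in the source, the leading zero is not available. A minimal sketch, assuming a hypothetical $mode_string variable, of using the built-in oct function to get the number chmod expects:

```perl
use strict;
use warnings;

# Hypothetical input: a permission string as it might arrive from a
# config file or the command line.
my $mode_string = "644";

# oct() interprets the digits as octal, giving the number chmod wants.
my $mode = oct($mode_string);

printf "%s parses to decimal %d, which is octal %o\n",
    $mode_string, $mode, $mode;
```

The resulting $mode could then be passed as the first argument to chmod.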
The POSIX module (part of the standard perl distribution) implements ceil, floor, and a number of other mathematical and trigonometric functions.
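As a small illustration (the sample values are arbitrary), ceil and floor from POSIX behave like this on positive and negative inputs:

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

# ceil rounds toward positive infinity, floor toward negative infinity.
my @demo = (ceil(3.2), floor(3.8), ceil(-3.8), floor(-3.2));

print "@demo\n";
```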
The Math::Complex module (part of the standard perl distribution) defines a number of mathematical functions that can also work on real numbers. It's not as efficient as the POSIX library, but the POSIX library can't work with complex numbers.
Rounding in financial applications can have serious implications, and the rounding method used should be specified precisely. In these cases, it probably pays not to trust whichever system rounding is being used by Perl, but to instead implement the rounding function you need yourself.
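One possible shape for such a hand-written function is sketched below. It rounds halves away from zero; the name round_half_away is made up for this example, and a real financial application may need a different rule (or integer cents) altogether. Amounts that have no exact binary representation can still surprise you, so treat this as a starting point only.

```perl
use strict;
use warnings;

# Round $amount to $places decimal places, halves away from zero.
# Illustrative only: the function name and the rounding rule are
# assumptions, not something mandated by Perl.
sub round_half_away {
    my ($amount, $places) = @_;
    my $scale = 10 ** $places;
    my $sign  = $amount < 0 ? -1 : 1;
    return $sign * int(abs($amount) * $scale + 0.5) / $scale;
}

my $up   = round_half_away( 2.5,   0);
my $down = round_half_away(-2.5,   0);
my $frac = round_half_away( 0.125, 2);
```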
$decimal = ord(pack('B8', '10110110'));   # pack alone yields a one-byte string
Here's an example of going the other way:
$binary_string = join('', unpack('B*', "\x29"));
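On perls that understand the 0b prefix and the %b format (5.6 and later), the same conversions can be sketched with oct and sprintf. The sample values here are chosen only for illustration:

```perl
use strict;
use warnings;

# String of binary digits to a number: oct() honours a leading "0b".
my $number = oct("0b10110110");

# Number to a string of binary digits, with and without zero padding.
my $bits   = sprintf("%b",   41);
my $padded = sprintf("%08b", 41);

print "$number $bits $padded\n";
```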
@results = map { my_func($_) } @array;
For example:
@triple = map { 3 * $_ } @single;
To call a function on each element of an array, but ignore the results:
foreach $iterator (@array) { &my_func($iterator); }
To call a function on each integer in a (small) range, you can use:
@results = map { &my_func($_) } (5 .. 25);
but you should be aware that the .. operator creates an array of all integers in the range. This can take a lot of memory for large ranges. Instead use:
@results = ();
for ($i=5; $i < 500_005; $i++) {
    push(@results, &my_func($i));
}
You should also check out the Math::TrulyRandom module from CPAN.
$day_of_year = (localtime(time()))[7];
or more legibly (in 5.004 or higher):
use Time::localtime; $day_of_year = localtime(time())->yday;
You can find the week of the year by dividing this by 7:
$week_of_year = int($day_of_year / 7);
Of course, this believes that weeks start at zero.
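To make the arithmetic concrete, here is a sketch that uses a fixed timestamp (1005613200, which is 13 November 2001 at 01:00:00 UTC) and gmtime instead of localtime, so the result does not depend on the current date or time zone:

```perl
use strict;
use warnings;

my $epoch = 1005613200;                    # 2001-11-13 01:00:00 UTC

# Element 7 of the list returned by gmtime is the zero-based day of year.
my $day_of_year  = (gmtime($epoch))[7];
my $week_of_year = int($day_of_year / 7);  # zero-based week number

print "day $day_of_year, week $week_of_year\n";
```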
When gmtime and localtime are used in a scalar context they return a timestamp string that contains a fully-expanded year. For example,
$timestamp = gmtime(1005613200);
sets $timestamp to ``Tue Nov 13 01:00:00 2001''. There's no year 2000 problem here.
s/\\(.)/$1/g;
Note that this won't expand \n or \t or any other special escapes.
s/(.)\1/$1/g;
print "My sub returned @{[mysub(1,2,3)]} that time.\n";
If you prefer scalar context, similar chicanery is also useful for arbitrary expressions:
print "That yields ${\($n + 5)} widgets\n";
/x([^x]*)x/
will get the intervening bits in $1. For multiple ones, then something more like
/alpha(.*?)omega/
would be needed. But none of these deals with nested patterns, nor can they. For that you'll have to write a parser.
$reversed = reverse $string;
1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
Or you can just use the Text::Tabs module (part of the standard perl distribution).
use Text::Tabs; @expanded_lines = expand(@lines_with_tabs);
use Text::Wrap; print wrap("\t", ' ', @paragraphs);
$first_byte = substr($a, 0, 1);
If you want to modify part of a string, the simplest way is often to use substr as an lvalue:
substr($a, 0, 3) = "Tom";
Although those with a regexp kind of thought process will likely prefer
$a =~ s/^.../Tom/;
$count = 0;
s{((whom?)ever)}{
    ++$count == 5       # is it the 5th?
        ? "${2}soever"  # yes, swap
        : $1            # renege and leave it there
}igex;
$string = "ThisXlineXhasXsomeXx'sXinXit";
$count = ($string =~ tr/X//);
print "There are $count X characters in the string";
This is fine if you are just looking for a single character. However, if you are trying to count multiple character substrings within a larger string, tr/// won't work. What you can do is wrap a while loop around a global pattern match. For example, let's count negative integers:
$string = "-9 55 48 -2 23 -76 4 14 -44";
$count = 0;
while ($string =~ /-\d+/g) { $count++ }
print "There are $count negative numbers in the string";
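A more compact variant, shown here only as a sketch, evaluates the global match in list context and assigns it to an empty list; the count of matches is what the outer scalar assignment receives:

```perl
use strict;
use warnings;

my $string = "-9 55 48 -2 23 -76 4 14 -44";

# The empty list forces list context on the match; assigning that list
# assignment to a scalar yields the number of matches.
my $count = () = $string =~ /-\d+/g;

print "There are $count negative numbers in the string\n";
```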
To make the whole line upper case: $line = uc($line);
To force each word to be lower case, with the first letter upper case: $line =~ s/(\w+)/\u\L$1/g;
SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
Due to the restriction of the quotes, this is a fairly complex problem. However, we thankfully have Jeffrey Friedl, author of a highly recommended book on regular expressions, to handle these for us. He suggests (assuming your string is contained in $text):
@new = ();
push(@new, $+) while $text =~ m{
    "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
  | ([^,]+),?
  | ,
}gx;
push(@new, undef) if substr($text,-1,1) eq ',';
$string =~ s/^\s*(.*?)\s*$/$1/;
It would be faster to do this in two steps:
$string =~ s/^\s+//; $string =~ s/\s+$//;
Or more nicely written as:
for ($string) { s/^\s+//; s/\s+$//; }
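If the two-step version is needed in several places, it can be wrapped in a small helper. The name trim is an assumption made for this sketch; it returns a trimmed copy and leaves its argument alone:

```perl
use strict;
use warnings;

# Return a copy of the argument with leading and trailing whitespace
# removed, using the two separate substitutions shown above.
sub trim {
    my ($string) = @_;
    for ($string) {
        s/^\s+//;
        s/\s+$//;
    }
    return $string;
}

my $clean = trim("   hello world \n");
```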
$text = 'this has a $foo in it and a $bar'; $text =~ s/\$(\w+)/${$1}/g;
Before version 5 of perl, this had to be done with a double-eval substitution:
$text =~ s/\$(\w+)/$$1/eeg;
Which is bizarre enough that you'll probably actually need an EEG afterwards. :-)
If you get used to writing odd things like these:
print "$var";     # BAD
$new = "$old";    # BAD
somefunc("$var"); # BAD
You'll be in trouble. Those should (in 99.8% of the cases) be the simpler and more direct:
print $var; $new = $old; somefunc($var);
Otherwise, besides slowing you down, you're going to break code when the thing in the scalar is actually neither a string nor a number, but a reference:
func(\@array);
sub func {
    my $aref = shift;
    my $oref = "$aref"; # WRONG
}
You can also get into subtle problems on those few operations in Perl that actually do care about the difference between a string and a number, such as the magical ++ autoincrement operator or the syscall function.
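The damage done by quoting a reference can be seen with the built-in ref function. In this sketch (variable names are illustrative), the plain copy is still an array reference while the quoted copy is only a string that looks like one:

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;

my $copy   = $aref;     # still a reference to @array
my $quoted = "$aref";   # just a string such as "ARRAY(0x...)"

my $copy_kind   = ref($copy);     # 'ARRAY'
my $quoted_kind = ref($quoted);   # '' because it is no longer a reference

print "copy: [$copy_kind], quoted: [$quoted_kind]\n";
```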
Sometimes it doesn't make a difference, but sometimes it does. For example, compare:
$good[0] = `some program that outputs several lines`;
with
@bad[0] = `same program that outputs several lines`;
The -w flag will warn you about these matters.
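The difference comes from context: the scalar assignment calls the right-hand side in scalar context, the slice assignment in list context. The sketch below stands in for backticks with a made-up function, fake_backticks, that uses wantarray to behave the same way (all output as one string in scalar context, one line per element in list context):

```perl
use strict;

# Hypothetical stand-in for backticks: one string in scalar context,
# a list of lines in list context.
sub fake_backticks {
    my @lines = ("one\n", "two\n", "three\n");
    return wantarray ? @lines : join('', @lines);
}

my (@good, @bad);
$good[0] = fake_backticks();      # scalar context: all three lines
{
    no warnings;                  # silence the one-element slice warning
    @bad[0] = fake_backticks();   # list context: only the first line kept
}

print "good holds ", length($good[0]), " chars, bad holds ",
      length($bad[0]), " chars\n";
```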
$prev = 'nonesuch'; @out = grep($_ ne $prev && ($prev = $_), @in);
This is nice in that it doesn't use much extra memory, simulating uniq's behavior of removing only adjacent duplicates.
undef %saw; @out = grep(!$saw{$_}++, @in);
@out = grep(!$saw[$_]++, @in);
undef %saw; @saw{@in} = (); @out = sort keys %saw; # remove sort if undesired
undef @ary; @ary[@in] = @in; @out = sort @ary;
@blues = qw/azure cerulean teal turquoise lapis-lazuli/; undef %is_blue; for (@blues) { $is_blue{$_} = 1 }
Now you can check whether $is_blue{$some_color}. It might have been a good idea to keep the blues all in a hash in the first place.
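The lookup hash can also be built in a single statement with map. This is only an alternative spelling of the loop above, using lexical variables:

```perl
use strict;
use warnings;

my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;

# Each element becomes a key whose value is 1.
my %is_blue = map { ($_ => 1) } @blues;

my $some_color = 'teal';
print "$some_color is blue\n" if $is_blue{$some_color};
```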
If the values are all small integers, you could use a simple indexed array. This kind of an array will take up less space:
@primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31); undef @is_tiny_prime; for (@primes) { $is_tiny_prime[$_] = 1; }
Now you check whether $is_tiny_prime[$some_number].
If the values in question are integers instead of strings, you can save quite a lot of space by using bit strings instead:
@articles = ( 1..10, 150..2000, 2017 ); undef $read; grep (vec($read,$_,1) = 1, @articles);
Now check whether vec($read,$n,1) is true for some $n.
Please do not use
$is_there = grep $_ eq $whatever, @array;
or worse yet
$is_there = grep /$whatever/, @array;
These are slow (checks every element even if the first matches), inefficient (same reason), and potentially buggy (what if there are regexp characters in $whatever?).
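If a one-off search of an array is still wanted, one option is the first function from List::Util (bundled with perl since 5.8), which stops at the first match. The sketch below also uses \Q...\E so that regexp characters in the search string are taken literally; the data is invented for the example:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array    = qw(alpha beta abc a.c gamma);
my $whatever = 'beta';

# Stops scanning as soon as one element is equal to $whatever.
my $is_there = defined(first { $_ eq $whatever } @array) ? 1 : 0;

# \Q...\E quotes the dot, so "abc" does not count as a match for "a.c".
my $literal = 'a.c';
my $hit     = first { /\Q$literal\E/ } @array;

print "found: $is_there, literal hit: $hit\n";
```

Note that first returns undef when nothing matches, so a list that may legitimately contain undef needs a different test.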
@union = @intersection = @difference = ();
%count = ();
foreach $element (@array1, @array2) { $count{$element}++ }
foreach $element (keys %count) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}
for ($i=0; $i < @array; $i++) { if ($array[$i] eq "Waldo") { $found_index = $i; last; } }
And now $found_index
has what you want.
If you really, really wanted, you could use structures as described in the perldsc manpage or the perltoot manpage and do just what the algorithm book tells you to do.
unshift(@array, pop(@array));  # the last shall be first
push(@array, shift(@array));   # and vice versa
srand;
@new = ();
@old = 1 .. 10;  # just a demo
while (@old) {
    push(@new, splice(@old, rand @old, 1));
}
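Repeated splicing gets slow on long arrays. A commonly used alternative is the Fisher-Yates shuffle, which swaps elements in place; the sketch below is one way to write it (the function name is made up here). List::Util, bundled with recent perls, also offers a ready-made shuffle function.

```perl
use strict;
use warnings;

# Shuffle the array behind the given reference in place.
sub fisher_yates_shuffle {
    my ($array) = @_;
    for (my $i = $#$array; $i > 0; $i--) {
        my $j = int rand($i + 1);            # 0 <= $j <= $i
        @$array[$i, $j] = @$array[$j, $i];   # swap the two elements
    }
    return;
}

my @deck = (1 .. 10);
fisher_yates_shuffle(\@deck);
print "@deck\n";
```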
for/foreach:
for (@lines) { s/foo/bar/; tr[a-z][A-Z]; }
Here's another; let's compute spherical volumes:
for (@radii) {
    $_ **= 3;
    $_ *= (4/3) * 3.14159;  # this will be constant folded
}
srand;  # not needed for 5.004 and later
$index = rand @array;
$element = $array[$index];
If you just want a random line from a file, you can do this:
srand; rand($.) < 1 && ($line = $_) while <>;
This has a significant advantage in space over reading the whole file in.
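The one-liner works because line number N replaces the current choice with probability 1/N, which leaves every line equally likely once the input ends. A spelled-out sketch of the same idea follows; the function name random_line and the in-memory file are assumptions made to keep the example self-contained:

```perl
use strict;
use warnings;

# Return one line chosen uniformly at random from the filehandle,
# holding only a single line in memory at any time.
sub random_line {
    my ($fh) = @_;
    my ($chosen, $seen) = (undef, 0);
    while (my $candidate = <$fh>) {
        $seen++;
        # The Nth line replaces the current choice with probability 1/N.
        $chosen = $candidate if rand($seen) < 1;
    }
    return $chosen;
}

my $text = "red\ngreen\nblue\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my $picked = random_line($fh);
close($fh);
print "picked: $picked";
```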
permut function should work on any list:
#!/usr/bin/perl -n
# permute - tchrist@perl.com
permut([split], []);
sub permut {
    my @head = @{ $_[0] };
    my @tail = @{ $_[1] };
    unless (@head) {
        print "@tail\n";
    } else {
        my(@newhead,@newtail,$i);
        foreach $i (0 .. $#head) {
            @newhead = @head;
            @newtail = @tail;
            unshift(@newtail, splice(@newhead, $i, 1));
            permut([@newhead], [@newtail]);
        }
    }
}
@list = sort { $a <=> $b } @list;
The default sort function is cmp, string comparison, which would sort (1, 2, 10) into (1, 10, 2). <=>, used above, is the numerical comparison operator.
If you have a complicated function needed to pull out the part you want, then don't do it inside the sort function. Pull it out first, because the sort function can be called many times for the same element. Here's an example of how to pull out the first word after the first number on each item, and then sort those words case-insensitively.
@idx = ();
for (@data) {
    ($item) = /\d+\s*(\S+)/;
    push @idx, uc($item);
}
@sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
Which could also be written this way, using a trick that's come to be known as the Schwartzian Transform:
@sorted = map  { $_->[0] }
          sort { $a->[1] cmp $b->[1] }
          map  { [ $_, uc((/\d+\s*(\S+)/)[0]) ] } @data;
If you need to sort on several fields, the following paradigm is useful.
@sorted = sort {
    field1($a) <=> field1($b) ||
    field2($a) cmp field2($b) ||
    field3($a) cmp field3($b)
} @data;
This can be conveniently combined with precalculation of keys as given above.
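As a sketch of that combination, the following sorts invented records of the form "number word" first numerically by the number, then case-insensitively by the word, extracting both keys only once per record:

```perl
use strict;
use warnings;

my @data = ("3 pear", "1 Apple", "3 apple", "2 fig");

my @sorted =
    map  { $_->[0] }                                   # recover the record
    sort { $a->[1] <=> $b->[1] || $a->[2] cmp $b->[2] }
    map  { my ($num, $word) = /(\d+)\s*(\S+)/;         # compute keys once
           [ $_, $num, uc $word ] }
    @data;

print join(", ", @sorted), "\n";
```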
See CPAN/doc/FMTEYEWTK/sort.html for more about this approach.
See also the question below on sorting hashes.
For example, this sets $vec to have bit N set if $ints[N] was set:
$vec = ''; foreach(@ints) { vec($vec,$_,1) = 1 }
And here's how, given a vector in $vec, you can get those bits into your @ints array:
sub bitvec_to_list {
    my $vec = shift;
    my @ints;
    # Find null-byte density then select best algorithm
    if ($vec =~ tr/\0// / length $vec > 0.95) {
        use integer;
        my $i;
        # This method is faster with mostly null-bytes
        while($vec =~ /[^\0]/g ) {
            $i = -9 + 8 * pos $vec;
            push @ints, $i if vec($vec, ++$i, 1);
            push @ints, $i if vec($vec, ++$i, 1);
            push @ints, $i if vec($vec, ++$i, 1);
            push @ints, $i if vec($vec, ++$i, 1);
            push @ints, $i if vec($vec, ++$i, 1);
            push @ints, $i if vec($vec, ++$i, 1);
            push @ints, $i if vec($vec, ++$i, 1);
            push @ints, $i if vec($vec, ++$i, 1);
        }
    } else {
        # This method is a fast general algorithm
        use integer;
        my $bits = unpack "b*", $vec;
        push @ints, 0 if $bits =~ s/^(\d)// && $1;
        push @ints, pos $bits while($bits =~ /1/g);
    }
    return \@ints;
}
This method gets faster the more sparse the bit vector is. (Courtesy of Tim Bunce and Winfried Koenig.)
while (($key,$value) = each %hash) { print "$key = $value\n"; }
If you want it sorted, you'll have to use foreach on the result of sorting the keys as shown in an earlier question.
What happens if I add or remove keys from a hash while iterating over it?
Don't do that.
%by_value = reverse %by_key; $key = $by_value{$value};
That's not particularly efficient. It would be more space-efficient to use:
while (($key, $value) = each %by_key) { $by_value{$value} = $key; }
If your hash might have repeated values, the methods above will only find one of the associated keys. This may or may not worry you.
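If it does worry you, one approach is to invert into a hash of arrays so that every key is kept for each value. A minimal sketch with invented data:

```perl
use strict;
use warnings;

my %by_key = (apple => 'red', cherry => 'red', banana => 'yellow');

# Each value maps to a reference to the array of all keys that had it.
my %keys_for_value;
while (my ($key, $value) = each %by_key) {
    push @{ $keys_for_value{$value} }, $key;
}

my @red_things = sort @{ $keys_for_value{red} };
print "red: @red_things\n";
```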
$num_keys = scalar keys %hash;
In void context, keys just resets the iterator, which is faster for tied hashes.
@keys = sort keys %hash;                              # sorted by key
@keys = sort { $hash{$a} cmp $hash{$b} } keys %hash;  # and by value
Here we'll do a reverse numeric sort by value, and if two values are identical, sort by length of key, and if that fails, by straight ASCII comparison of the keys (well, possibly modified by your locale -- see the perllocale manpage).
@keys = sort { $hash{$b} <=> $hash{$a} || length($b) <=> length($a) || $a cmp $b } keys %hash;
If a key $key is present in the hash, exists will return true. The value for a given key can be undef, in which case $array{$key} will be undef while exists $array{$key} will return true. This corresponds to ($key, undef) being in the hash.
Pictures help... here's the %ary table:
  keys  values
+------+------+
|  a   |  3   |
|  x   |  7   |
|  d   |  0   |
|  e   |  2   |
+------+------+
And these conditions hold
$ary{'a'}                       is true
$ary{'d'}                       is false
defined $ary{'d'}               is true
defined $ary{'a'}               is true
exists $ary{'a'}                is true (perl5 only)
grep ($_ eq 'a', keys %ary)     is true
If you now say
undef $ary{'a'}
your table now reads:
  keys  values
+------+------+
|  a   | undef|
|  x   |  7   |
|  d   |  0   |
|  e   |  2   |
+------+------+
and these conditions now hold; changes in caps:
$ary{'a'}                       is FALSE
$ary{'d'}                       is false
defined $ary{'d'}               is true
defined $ary{'a'}               is FALSE
exists $ary{'a'}                is true (perl5 only)
grep ($_ eq 'a', keys %ary)     is true
Notice the last two: you have an undef value, but a defined key!
Now, consider this:
delete $ary{'a'}
your table now reads:
  keys  values
+------+------+
|  x   |  7   |
|  d   |  0   |
|  e   |  2   |
+------+------+
and these conditions now hold; changes in caps:
$ary{'a'}                       is false
$ary{'d'}                       is false
defined $ary{'d'}               is true
defined $ary{'a'}               is false
exists $ary{'a'}                is FALSE (perl5 only)
grep ($_ eq 'a', keys %ary)     is FALSE
See, the whole entry is gone!
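The three stages can be replayed in a short script. The helper name state_of is made up for this sketch; it reports truth, definedness, and existence of one key:

```perl
use strict;
use warnings;

my %ary = (a => 3, x => 7, d => 0, e => 2);

# Summarise truth, definedness and existence for one key of %ary.
sub state_of {
    my ($key) = @_;
    return join ' ',
        ($ary{$key}         ? 'true'    : 'false'),
        (defined $ary{$key} ? 'defined' : 'undef'),
        (exists $ary{$key}  ? 'exists'  : 'missing');
}

my $initial = state_of('a');
undef $ary{'a'};
my $after_undef = state_of('a');
delete $ary{'a'};
my $after_delete = state_of('a');

print "$initial\n$after_undef\n$after_delete\n";
```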
EXISTS and DEFINED methods differently. For example, there isn't the concept of undef with hashes that are tied to DBM* files. This means the true/false tables above will give different results when used on such a hash. It also means that exists and defined do the same thing with a DBM* file, and what they end up doing is not what they do with ordinary hashes.
keys %hash in a scalar context returns the number of keys in the hash and resets the iterator associated with the hash. You may need to do this if you use last to exit a loop early so that when you re-enter it, the hash iterator has been reset.
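A small sketch of that situation, with an invented three-element hash: the first loop bails out early, keys resets the iterator, and the second loop then sees every pair again.

```perl
use strict;
use warnings;

my %hash = (one => 1, two => 2, three => 3);

# Leave the iterator part-way through the hash.
while (my ($key, $value) = each %hash) {
    last;
}

# Scalar-context keys returns the count and resets the iterator.
my $num_keys = keys %hash;

# A fresh pass now starts from the beginning.
my $visited = 0;
while (my ($key) = each %hash) {
    $visited++;
}

print "$num_keys keys, $visited visited\n";
```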
%seen = (); for $element (keys(%foo), keys(%bar)) { $seen{$element}++; } @uniq = keys %seen;
Or more succinctly:
@uniq = keys %{{%foo,%bar}};
Or if you really want to save space:
%seen = (); while (($key) = each %foo) { $seen{$key}++; } while (($key) = each %bar) { $seen{$key}++; } @uniq = keys %seen;
somefunc($hash{"nonesuch key here"});
Then that element ``autovivifies''; that is, it springs into existence whether you store something there or not. That's because functions get scalars passed in by reference. If somefunc modifies $_[0], it has to be ready to write it back into the caller's version.
This is considered a bug that we hope to fix.
Normally, merely accessing a key's value for a nonexistent key does not cause that key to be forever there. This is different than awk's behavior.
if (`cat /vmunix` =~ /gzip/) { print "Your kernel is GNU-zip enabled!\n"; }
On some systems, however, you have to play tedious games with ``text'' versus ``binary'' files. See binmode.
If you're concerned about 8-bit ASCII data, then see the perllocale manpage.
If you want to deal with multi-byte characters, however, there are some gotchas. See the section on Regular Expressions.
warn "has nondigits"        if     /\D/;
warn "not a whole number"   unless /^\d+$/;
warn "not an integer"       unless /^[+-]?\d+$/;
warn "not a decimal number" unless /^[+-]?\d+\.?\d*$/;
warn "not a C float"        unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
Or you could check out CPAN/modules/by-module/String/String-Scanf-1.1.tar.gz instead.
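On perls that ship Scalar::Util (5.8 and later), its looks_like_number function asks perl itself whether it would treat a string as a number. A brief sketch with invented sample strings:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my @candidates = ("42", "-3.14", "1e5", "abc", "12abc", "");

# Record a 1 or 0 verdict for each candidate string.
my %verdict;
foreach my $candidate (@candidates) {
    $verdict{$candidate} = looks_like_number($candidate) ? 1 : 0;
}

foreach my $candidate (@candidates) {
    printf "%-7s %s\n", "[$candidate]",
        $verdict{$candidate} ? "numeric" : "not numeric";
}
```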
use FreezeThaw qw(freeze thaw); $new = thaw freeze $old;
Where $old can be (a reference to) any kind of data structure you'd like. It will be deeply copied.
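Another option, if the Storable module is available (it is bundled with perl from 5.8 on), is its dclone function. The sketch below, with an invented structure, shows that changing the copy leaves the original untouched:

```perl
use strict;
use warnings;
use Storable qw(dclone);

my $old = { name => 'demo', list => [1, 2, 3] };

# dclone returns a deep copy: nested references are duplicated too.
my $new = dclone($old);
push @{ $new->{list} }, 4;

print "old has ", scalar @{ $old->{list} },
      " items, new has ", scalar @{ $new->{list} }, "\n";
```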