[Tfug] OT: Fairly simple relational database quick job bids...
Zack Williams
zdwzdw at gmail.com
Fri Oct 1 12:35:55 MST 2010
On Fri, Oct 1, 2010 at 10:48 AM, Jim March <1.jim.march at gmail.com> wrote:
> Folks,
>
> I need a contract on what we think is a fairly simple relational
> database problem.
>
> What he's got is "raw data" for somebody to chew on.
>
> He has two .CSV files.
>
> The first is 112megs, listing people and details about them as a
> single line (record). There's a unique ID number in one field.
>
> The second is 100megs, with each record only three fields long. The
> first field contains the unique ID number, the second contains a
> number for an event they participated in, the third is not really
> relevant. For each person there will be several lines (records).
>
> Example:
>
> 1234543,2
> 1234543,5
> 1234543,6
> 1234543,7
>
> This tells us that voter 1234543 was involved in events 2, 5, 6 and 7.
>
> One complication: it's common for the data on events to contain an ID
> number of "123456" where the main data on the people lists the ID
> number as "00123456". That'll have to be parsed somehow.
Here you go:
--
#!/usr/bin/env perl
use strict;
use warnings;
use diagnostics;
# this is the name of the input file
my $infile = "votes.csv";
# sort the file numerically on the voter ID (the first comma-separated field)
`sort -t, -k1,1n $infile > $infile.sorted`;
open(INFILE, "<", "$infile.sorted") or die "couldn't open input file: $!";
my $current_voter = 0;
while(<INFILE>){
chomp; # get rid of the newline
my ($voter,$vote)= split(","); # split on the comma
# if this isn't the same voter as the last line (numeric comparison),
# print a newline and the new voter id
if($voter != $current_voter) {
print "\n$voter,";
}
print "$vote,"; # print the vote
$current_voter = $voter;
}
print "\n";
---
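
For the leading-zero complication mentioned in the quoted message, here is a rough sketch of one way to handle it without the external sort: strip the leading zeros from each ID and collect the events in a hash keyed by the cleaned-up ID. The names normalize_id and group_events are made up for this example, and it assumes the whole events file fits in memory and that no field contains an embedded comma.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: strip leading zeros so "00123456" and "123456"
# compare equal as strings. An all-zero ID collapses to "0".
sub normalize_id {
    my ($id) = @_;
    $id =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
    $id =~ s/^0+(?=\d)//;    # drop leading zeros, keep at least one digit
    return $id;
}

# Hypothetical helper: group event numbers by normalized voter ID.
# Takes a list of "id,event[,ignored]" lines and returns a hash reference
# mapping each ID to an array reference of its events, in input order.
sub group_events {
    my (@lines) = @_;
    my %events_for;
    for my $line (@lines) {
        chomp $line;
        next unless length $line;              # skip blank lines
        my ($voter, $event) = split /,/, $line;
        push @{ $events_for{ normalize_id($voter) } }, $event;
    }
    return \%events_for;
}

# Demonstration with the sample records from the quoted message,
# with one ID deliberately zero-padded.
my $grouped = group_events("1234543,2", "001234543,5", "1234543,6", "1234543,7");
for my $voter (sort { $a <=> $b } keys %$grouped) {
    print join(",", $voter, @{ $grouped->{$voter} }), "\n";
}
```

The same normalize_id call could be applied to the ID field of the people file before looking a person up in the hash, so that both files agree on the form of the key.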
- Zack