[Tfug] OT: Fairly simple relational database quick job bids...

Fri Oct 1 12:35:55 MST 2010

On Fri, Oct 1, 2010 at 10:48 AM, Jim March <1.jim.march at gmail.com> wrote:
> Folks,
>
> I need a contract on what we think is a fairly simple relational
> database problem.
>
> What he's got is "raw data" for somebody to chew on.
>
> He has two .CSV files.
>
> The first is 112megs, listing people and details about them as a
> single line (record).  There's a unique ID number in one field.
>
> The second is 100megs, with each record only three fields long.  The
> first field contains the unique ID number, the second contains a
> number for an event they participated in, the third is not really
> relevant.  For each person there will be several lines (records).
>
> Example:
>
> 1234543,2
> 1234543,5
> 1234543,6
> 1234543,7
>
> This tells us that voter 1234543 was involved in events 2, 5, 6 and 7.
>
> One complication: it's common for the data on events to contain an ID
> number of "123456" where the main data on the people lists the ID
> number as "00123456".  That'll have to be parsed somehow.
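The leading-zero mismatch is easy to trip over in Perl. Numeric comparison treats "00123456" and "123456" as equal, but string comparison and hash-key lookups do not. Here is a minimal sketch of the difference. The `normalize_id` helper is hypothetical and not part of the original post; it strips leading zeros so either form of an ID can serve as a hash key:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): canonical string form
# of an ID, suitable for use as a hash key.
sub normalize_id {
    my ($id) = @_;
    $id =~ s/^\s+|\s+$//g;    # trim stray whitespace, including a DOS "\r"
    $id =~ s/^0+(?=\d)//;     # drop leading zeros, but keep a lone "0"
    return $id;
}

my ($padded, $plain) = ("00123456", "123456");

# Numeric comparison ignores the leading zeros...
print "numeric:    ", ($padded == $plain ? "same" : "different"), "\n";
# ...but string comparison (and therefore hash-key lookup) does not.
print "string:     ", ($padded eq $plain ? "same" : "different"), "\n";
# After normalizing, the two forms are the same string.
my $same = normalize_id($padded) eq normalize_id($plain);
print "normalized: ", ($same ? "same" : "different"), "\n";
```

This prints "same", "different", "same". The numeric behaviour explains why a `!=` test on the IDs already treats both forms as one voter. A hash-based lookup, however, needs the normalized key. Adding zero (`0 + $id`) would also drop the zeros, but it turns the ID into a number. Keeping the ID as a string avoids surprises if an ID ever contains a non-digit character.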

Here you go:

--
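#!/usr/bin/env perl
use strict;
use warnings;
use diagnostics;

# this is the name of the input file
my $infile = "votes.csv";

# sort the contents of the file numerically so each voter's lines are adjacent
system("sort", "-n", "-o", "$infile.sorted", $infile) == 0
    or die "sort failed with status $?";

open(my $in, '<', "$infile.sorted") or die "couldn't open $infile.sorted: $!";

my $current_voter;    # undefined until the first record is read

while (my $line = <$in>) {
    $line =~ s/\r?\n\z//;                     # get rid of the newline (and any DOS \r)
    my ($voter, $vote) = split /,/, $line;    # split on the comma; the third field is ignored
    next unless defined $vote;                # skip blank or malformed lines

    # if this isn't the same voter as the last line (numeric comparison, so
    # "00123456" and "123456" match), end the old line and start a new one
    if (!defined($current_voter) || $voter != $current_voter) {
        print "\n" if defined $current_voter;
        print $voter;
        $current_voter = $voter;
    }
    print ",$vote";    # append the vote
}
print "\n" if defined $current_voter;
close($in);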
---
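The script above groups the events file but never joins it to the people file. One way to do that join is sketched below under stated assumptions. It loads the events into a hash keyed on the normalized ID, which also makes the external `sort` unnecessary. It then streams the people file and appends each person's events. Several details are hypothetical and not confirmed by the post: the file layout (ID as the first field of the people file, no quoted commas) and the helper names `normalize_id`, `load_events`, and `join_people`. The demo uses in-memory data rather than the real files.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumptions (hypothetical, not confirmed by the original post): plain
# comma-separated files with no quoted commas, the person ID is the FIRST
# field of the people file, and each event line is ID,event,other.

# Canonical ID: trim whitespace and leading zeros ("00123456" -> "123456").
sub normalize_id {
    my ($id) = @_;
    $id =~ s/^\s+|\s+$//g;
    $id =~ s/^0+(?=\d)//;
    return $id;
}

# Read the events file into a hash: normalized ID => [event numbers].
sub load_events {
    my ($fh) = @_;
    my %events;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                  # strip newline and any DOS \r
        my ($id, $event) = split /,/, $line;   # third field is ignored
        next unless defined $event;            # skip blank or short lines
        push @{ $events{ normalize_id($id) } }, $event;
    }
    return \%events;
}

# Copy each person record to $out with their events appended as one
# semicolon-separated field (empty if they have no events).
sub join_people {
    my ($fh, $events, $out) = @_;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        next unless length $line;
        my ($id) = split /,/, $line;
        my $list = $events->{ normalize_id($id) } || [];
        print {$out} "$line," . join(';', @$list) . "\n";
    }
}

# Demo with in-memory data; for the real job, open "votes.csv" and the
# people file (name not given in the post) with open(my $fh, '<', $name).
my $people_csv = "00123456,Jane,Doe\n00999999,John,Roe\n00555555,No,Events\n";
my $votes_csv  = "123456,2,x\n123456,5,x\n999999,7,x\n";
open(my $vfh, '<', \$votes_csv)  or die "in-memory open failed: $!";
open(my $pfh, '<', \$people_csv) or die "in-memory open failed: $!";
join_people($pfh, load_events($vfh), \*STDOUT);
```

With the demo data this prints `00123456,Jane,Doe,2;5`, `00999999,John,Roe,7` and `00555555,No,Events,`. Holding the events in memory trades RAM for skipping the sort. Perl hashes carry a lot of per-entry overhead, so a 100 MB events file may need several times that much memory. That is worth checking before running the script on the real data. If the fields can contain quoted commas, the CPAN module Text::CSV is a more robust parser than `split`.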

- Zack