[Tfug] text file help
Choprboy
choprboy at dakotacom.net
Thu Jun 26 10:07:40 MST 2008
On Thursday 26 June 2008 06:25, christopher floess wrote:
> Alright, I imagine this to be at least in part a sed-type problem. I'd
> like to try to figure
> it out on my own, but I'm on a time crunch here. I've got a 1300 page
> document that needs to be parsed and edited.
[snip]
>The problem is that if a location has 30 of one
> thing, that thing is
> listed 30 times.
[snip]
> 883230063751 BELT A REVERSIBLE /RED /L 402008 1 13.21 13.21
> 883230063805 BELT A REVERSIBLE /RED/NVY /XL 402008 1 13.41 13.41
> 883230063805 BELT A REVERSIBLE /RED/NVY /XL 402008 1 13.41 13.41
> 883230063751 BELT A REVERSIBLE /RED /L 402008 1 13.21 13.21
> 883230063751 BELT A REVERSIBLE /RED /L 402008 1 13.21 13.21
> 883230063751 BELT A REVERSIBLE /RED /L 402008 1 13.21 13.21
[snip]
> here you can see what's in fixture 122, area 13 and you can see item
> #883230063751
> is listed 6 times and in fact, the 6th column on any one of those
> lines is the quantity
So... this could be VERY simple to fix. How consistent is the data? If it is
well formatted, you could just do a sort and a uniq count. (Even if it isn't
completely consistent, this will clean it up and make the differences
apparent.) Using your snippet of data:
[ajensen at tatertot ~]$ sort <inventory.txt |uniq -c
      1 883230063737 BELT A REVERSIBLE /RED /S 402008 6 13.21 79.27
      1 883230063744 BELT A REVERSIBLE /RED/NVY /M 402008 6 13.41 80.48
      8 883230063751 BELT A REVERSIBLE /RED /L 402008 1 13.21 13.21
      3 883230063805 BELT A REVERSIBLE /RED/NVY /XL 402008 1 13.41 13.41
      1 883230063805 BELT A REVERSIBLE /RED/NVY /XL 402008 6 13.41 80.48
From your 14 lines of data you end up with 5 lines of results, showing that you
have 8 of SKU 883230063751 in location 402008 listed as a single quantity.
Since the output is sorted by SKU/location/quantity (assuming the description
field is the same), other quantity counts of the same SKU will appear on
sequential lines.
That is a simple first-step reduction of the data, grouping and counting all
identical entries. Afterward, add in a couple of awk and sed commands, perhaps
multiplying the uniq count by the qty field to produce a total field and then
reordering the columns; lather, rinse, repeat to continue consolidating the
data.
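The awk step described above might look something like this. This is only a
sketch, assuming each record in inventory.txt is a single line whose last
three fields are qty, unit price, and extended price; the field positions and
the sample file contents here are my assumptions, not tested against the real
1300-page document:

```shell
# Hypothetical sample file matching the snippet's layout:
# SKU  DESCRIPTION...  LOCATION  QTY  UNIT  EXTENDED
cat > inventory.txt <<'EOF'
883230063751 BELT A REVERSIBLE /RED /L 402008 1 13.21 13.21
883230063751 BELT A REVERSIBLE /RED /L 402008 1 13.21 13.21
883230063805 BELT A REVERSIBLE /RED/NVY /XL 402008 1 13.41 13.41
EOF

# After uniq -c, $1 is the repeat count. Counting fields from the end
# sidesteps the variable-length description:
#   $(NF-3) = location, $(NF-2) = qty, $NF = extended price
sort inventory.txt | uniq -c | awk '{
    printf "%s %s %d %.2f\n", $2, $(NF-3), $1 * $(NF-2), $1 * $NF
}'
# prints:
# 883230063751 402008 2 26.42
# 883230063805 402008 1 13.41
```

This collapses the duplicated lines into one SKU/location row with a true
total quantity and total value; further sed/awk passes could reorder or
reformat the columns as needed.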
Adrian