[Tfug] grep question [was find | grep]
Rich
r-lists at studiosprocket.com
Fri Oct 24 08:23:22 MST 2008
Hang on a sec...
Maxim #644: Use of temporary files with sed is an admission of failure.
So: (tested)
find . -type f -name abook.mab -exec grep @ {} \; \
| sed -e 's/[^=]*=\([^@]*@[^)]*\))[^=]*/\1,/g'
I'll assume you understand the "find" you were given earlier. And
there's more after the regex explanation.
Explanation of regex:
s = search & replace
/ = beginning of search pattern
[^=]*=\([^@]*@[^)]*\))[^=]* = search pattern
    [^=]* = any number (*) of not (^) an equals sign '='
    = = the '=' sign -- opening delimiter for email addresses
    \([^@]*@[^)]*\) = the bit we want to keep
        \( = the opening delimiter
        \) = the closing delimiter
        [^@]* = any number (*) of not (^) at signs '@'
        @ = the at sign
        [^)]* = any number (*) of not (^) right parentheses ')'
    ) = the right parenthesis which is the closing delimiter for email addresses
    [^=]* = again, any number (*) of not (^) an equals sign '='
/ = end of search pattern and beginning of replace pattern
\1, = replace pattern
    \1 = first remembered string in search pattern
    , = the comma sign ','
/ = end of replace pattern
g = 'global': do this s&r more than once per line
So, all in all, pretty straightforward. It doesn't even get into the
advanced features of sed.
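
If you want to poke at the regex without an address book handy, here's a
minimal sketch. The extract_addresses function name and the example.com
addresses are made up for illustration; only the sed expression comes from
the one-liner above.

```shell
# Hypothetical helper: run the substitution from above on one line of text.
extract_addresses() {
  printf '%s\n' "$1" | sed -e 's/[^=]*=\([^@]*@[^)]*\))[^=]*/\1,/g'
}

# Two made-up address fields on one line.
extract_addresses '(1=alice@example.com)(2=bob@example.org)'
# prints: alice@example.com,bob@example.org,

# An address field followed by a field with no '@' in it.
extract_addresses '(1=alice@example.com)(2=alice)'
# prints: alice@example.com,=alice)
```

The second call shows that a trailing field with no '@' is only eaten up to
its '=' sign, so it's worth eyeballing the output on your real data. Splitting
the fields onto separate lines first, as in Jeff's tr suggestion quoted
below, may sidestep that.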
Caveats:
Currently, the one-liner preserves newlines, so you get one line of
output per matching input line. If you want to squash them, use xargs:
find . -type f -name abook.mab -exec grep @ {} \; \
| sed -e 's/[^=]*=\([^@]*@[^)]*\))[^=]*/\1,/g' | xargs
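
To see what the bare xargs does on its own, here's a small sketch with
three made-up lines standing in for the sed output. With no command given,
xargs runs echo, which glues its input together with single spaces.

```shell
# Three made-up lines, shaped like the output of the sed stage.
joined=$(printf '%s\n' 'alice@example.com,' 'bob@example.org,' 'carol@example.net,' | xargs)
printf '%s\n' "$joined"
# prints: alice@example.com, bob@example.org, carol@example.net,
```

One thing to watch: xargs treats quotes and backslashes specially, so an
address containing an apostrophe would probably trip it up.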
And there's a trailing comma. Squish:
find . -type f -name abook.mab -exec grep @ {} \; \
| sed -e 's/[^=]*=\([^@]*@[^)]*\))[^=]*/\1,/g' | xargs \
| sed -e 's/,$//'
And it doesn't check for dups. Splat:
find . -type f -name abook.mab -exec grep @ {} \; \
| sed -e 's/[^=]*=\([^@]*@[^)]*\))[^=]*/\1,/g' \
| sort -u | xargs \
| sed -e 's/,$//'
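
Here's the tail end of that pipeline in isolation, as a sketch on made-up
input with one repeated address. The sort -u has to come before xargs,
because after xargs everything is a single line and there's nothing left
to dedupe.

```shell
# Made-up sed output with one duplicate line.
deduped=$(printf '%s\n' 'bob@example.org,' 'alice@example.com,' 'bob@example.org,' \
  | sort -u | xargs | sed -e 's/,$//')
printf '%s\n' "$deduped"
# prints: alice@example.com, bob@example.org
```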
Only problem here is that when a line holds more than one email, sort
still sees them as a single line, so dups can slip through. I'm pretty
sure GNU sed allows you to insert a newline with \n, but I'm on a Mac
without GNU sed, and it's a nice challenge to work around this:
find . -type f -name abook.mab -exec grep @ {} \; \
| sed -e 's/[^=]*=\([^@]*@[^)]*\))[^=]*/\1,=/g' \
| tr = '\n' | sort -u | xargs \
| sed -e 's/,$//'
So I added an equals sign to the sed replace string, as a placeholder
for where I want the newline to be. The output is piped to tr to
convert '=' into newlines. Then it goes to sort, and everything's
nice again.
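
A sketch of the placeholder trick on made-up input, with two addresses on
the first line and a repeat of one of them on the second. It assumes no
address contains an '=' itself, since tr would split it there.

```shell
# Two made-up input lines; bob appears on both.
result=$(printf '%s\n' '(1=bob@example.org)(2=alice@example.com)' '(3=bob@example.org)' \
  | sed -e 's/[^=]*=\([^@]*@[^)]*\))[^=]*/\1,=/g' \
  | tr = '\n' | sort -u | xargs \
  | sed -e 's/,$//')
printf '%s\n' "$result"
# prints: alice@example.com, bob@example.org
```

The tr stage leaves some empty lines behind (one per placeholder at the
end of a line), but xargs skips blank lines, so they don't show up in the
result.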
Now that we're happy with the output, we can push it into a file:
find . -type f -name abook.mab -exec grep @ {} \; \
| sed -e 's/[^=]*=\([^@]*@[^)]*\))[^=]*/\1,=/g' \
| tr = '\n' | sort -u | xargs \
| sed -e 's/,$//' > spam_list.csv
The best thing about one-liners is that you don't have to fit them on
one line :-)
R.
On Oct 23, 2008, at 9:21 pm, Jeff Breadner wrote:
>
>>
>>
>> ...ummm, now that all the terms/lines containing '@' are in one
>> file (which is fine in itself), is there any way I could extract
>> the string of characters containing the '@' symbol, but only print
>> those characters in the string between the '=' symbol and the ')'
>> symbol???
>>
>> and output that into a nice text file with commas in between?????
>>
>> I suppose I can write my own script someday.
>>
>> :|
>>
>> Here are two sample lines of output:
>>
>> (145=lasalledre at aol.com)(146=lasalledre)(14D=438f81ad)(147=2f)(A1
>> (152=marc diMinno)(142=marcdiminno at hotmail.com)(153=43919ff2)(143=2e
>>
>> which would be perfect to see like this:
>>
>> ...
>>
>> lasalledre at aol.com,
>> marcdiminno at hotmail.com,
>>
>> ...
>>
>>
> Another one, this one might be more reliable for your specific
> situation, but less portable to other files (should anyone else be
> looking for a similar solution). This one replaces the bracket and
> equals signs with newline characters, then just greps out any line
> that has an @ symbol in it:
>
> cat inputfile | tr \(=\) \\\n\\\n\\\n | grep @
>
>
> cheers
> Jeff
>
> _______________________________________________
> Tucson Free Unix Group - tfug at tfug.org
> Subscription Options:
> http://www.tfug.org/mailman/listinfo/tfug_tfug.org