>  Robert, let me disclose some more information about the nature of the 
> problem. Actually I have Protein  sequences (strings of variable length 
> composed of a 20 letter alphabet) for example "CAARGNDLYSKNIG" can be 
> considered as a protein sequence. basically it's just a string.

I happen to be a molecular biologist myself, so this of course gets my 
attention ;)

I assume there is a particular reason why you are looking for sequence 
features using regexps? It seems somewhat inefficient, which of course 
is what got you here in the first place. Is there no way you could use 
distance measures between the sequences to cluster them, for example 
(like BLAST + MCL)? If you are looking to group your seqs, that is. 
Also, databases like ProDom do basically just that: they look for 
particular sequence features in protein sequences. Sure, they focus on 
domains, but depending on the nature of your regexps, their tools may 
be applicable regardless. Then there is the new CS-BLAST, which uses 
scoring matrices that may perhaps be derivable from your regexps, 
dunno. I suppose I could offer more help if the idea behind this 
fishing experiment were a little clearer (i.e. what it is that you 
want to find out).
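
To make the "distance measure" idea a bit more concrete, here is a 
rough Ruby sketch. It is NOT what BLAST does; it just compares the sets 
of overlapping 3-letter substrings (k-mers) of two sequences with a 
Jaccard distance, as a cheap stand-in for a real alignment score. The 
method names (kmers, kmer_distance) and the second and third sequences 
are made up by me for illustration; only the first sequence is from 
your mail.

```ruby
require "set"

# Split a sequence into its set of overlapping k-mers (substrings of
# length k). Sequences shorter than k yield an empty set.
def kmers(seq, k = 3)
  return Set.new if seq.length < k
  (0..seq.length - k).map { |i| seq[i, k] }.to_set
end

# Jaccard distance between the k-mer sets of two sequences:
# 0.0 means identical k-mer content, 1.0 means no k-mer in common.
# This is an illustrative stand-in, not a BLAST-style alignment score.
def kmer_distance(a, b, k = 3)
  ka = kmers(a, k)
  kb = kmers(b, k)
  union = (ka | kb).size
  return 0.0 if union.zero?
  1.0 - (ka & kb).size.to_f / union
end

# First sequence is the example from the original post; the other two
# are hypothetical and only there to have something to compare against.
seqs = ["CAARGNDLYSKNIG", "CAARGNDLYSRNIG", "MKTLLVAGGHWQPE"]

seqs.combination(2) do |a, b|
  printf("%s %s %.2f\n", a, b, kmer_distance(a, b))
end
```

A pairwise distance table like this is the kind of input a clustering 
step would work on. For real data you would want proper alignment 
scores instead, since k-mer overlap ignores substitutions between 
similar amino acids.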

In any case, I suppose that, depending on the size of your search 
space, Ruby may not really be the ideal tool. Hope it works out in the 
end!
-- 
Posted via http://www.ruby-forum.com/.