> Robert, let me disclose some more information about the nature of the
> problem. Actually I have Protein sequences (strings of variable length
> composed of a 20 letter alphabet) for example "CAARGNDLYSKNIG" can be
> considered as a protein sequence. basically it's just a string.

I happen to be a molecular biologist myself, so this of course gets my attention ;)

I assume there is a particular reason why you are looking for sequence features using regexps? It seems somewhat inefficient, which of course is what got you here in the first place. Is there no way you could use distance measures between the sequences to cluster them instead (BLAST + MCL, for example)? If you are looking to group your seqs, that is.

Also, databases like ProDom do basically just that: they look for particular sequence features in protein sequences. Sure, they focus on domains, but depending on the nature of your regexps, their tools may be applicable regardless. Then there is the new CS-BLAST, which uses scoring matrices - which may perhaps be derivable from your regexps, dunno.

Well, I suppose I could offer more help if the idea behind this fishing experiment was a little clearer (i.e. what it is that you want to find out). In any case, depending on your search space, Ruby may really not be the ideal approach.

Hope it works out in the end!

-- 
Posted via http://www.ruby-forum.com/.