2010/8/30 Martin Hansen <mail / maasha.dk>:
>> You could do this though
>>
>> seq.gsub! /./ do |m|
>> scores[$`.length].ord - BASE_SOLEXA < cutoff ? m.downcase : m
>> end
>>
>> Not too nice though.  (You could however do some preparation, e.g.
>> store scores as an Array of Fixnum instead of using #ord.)
>
> Yes, converting scores to arrays is bad since the scores are parsed from
> files as strings (millions of them). And I am unsure if substr
> substitutions are very efficient ...
>
> We need to trick this into the regex engine somehow.
>
> How about transforming scores to a mask string like this: 000111 where 1
> indicates that the corresponding sequence char should be lowercased
> (that can be done with tr). Then we plug this onto the sequence string:
>
> seq = "ATCGAT000111"
>
> And then we construct a regex with a forward looking identifier that
> reads the mask and manipulates the ATCG chars?

Here's what I'd probably do.

Create a custom class (and not use a Hash) for this, e.g.

Score = Struct.new :seq, :score

Create another structure for caching scores and a bit representation
for downcase dependent on cutoff value, e.g.

ScoreCache = Struct.new :score do
  def mask(cutoff)
    cache[cutoff]
  end

  def mask_sequence(cutoff, seq)
    mask(cutoff).each_bit do |idx|
      # downcase! returns nil when nothing changes, so use downcase here
      seq[idx] = seq[idx].downcase
    end

    seq
  end

private
  def cache
    @cache ||= Hash.new do |h,cutoff|
      c = 0

      # String is not Enumerable in 1.9, so go through each_char
      self.score.each_char.with_index do |ch,idx|
        c |= (1 << idx) if ch.ord - BASE_SOLEXA < cutoff
      end

      h[cutoff] = c
    end
  end
end

# Store score string -> ScoreCache
global_score_cache = Hash.new do |h,score|
  h[score] = ScoreCache.new score
end

class Integer
  def each_bit
    raise "Currently only positive implemented" if self < 0

    if block_given?
      idx = 0
      x = self

      while x != 0
        yield idx if x[0] == 1
        idx += 1
        x >>= 1
      end

      self
    else
      enum_for :each_bit
    end
  end
end


And then use it and profile.
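
For instance, the pieces above could be wired together roughly like
this. It is only a sketch under a few assumptions: BASE_SOLEXA = 64 is
my guess at the offset used in this thread, the sequence and score
strings are made-up sample data, and Benchmark.realtime is a crude
wall-clock timing, not a real profiler. The definitions are repeated in
condensed form so the snippet runs on its own.

```ruby
require 'benchmark'

# Assumption: quality characters are offset by 64 (Solexa style).
BASE_SOLEXA = 64

class Integer
  # Yield the index of every set bit, lowest bit first.
  def each_bit
    return enum_for(:each_bit) unless block_given?
    raise ArgumentError, "only non-negative integers supported" if self < 0

    idx = 0
    x = self
    while x != 0
      yield idx if x[0] == 1
      idx += 1
      x >>= 1
    end
    self
  end
end

ScoreCache = Struct.new(:score) do
  # Integer whose bit i is set when score[i] is below the cutoff.
  # Computed once per cutoff and memoized in a Hash.
  def mask(cutoff)
    @cache ||= Hash.new do |h, c|
      bits = 0
      score.each_char.with_index do |ch, idx|
        bits |= (1 << idx) if ch.ord - BASE_SOLEXA < c
      end
      h[c] = bits
    end
    @cache[cutoff]
  end

  # Downcase (in place) every position of seq flagged in the mask.
  def mask_sequence(cutoff, seq)
    mask(cutoff).each_bit { |idx| seq[idx] = seq[idx].downcase }
    seq
  end
end

# One ScoreCache per distinct score string.
score_cache = Hash.new { |h, score| h[score] = ScoreCache.new(score) }

seq    = "ATCGAT"
scores = "hhhIII"  # 'h' gives 40, 'I' gives 9 after subtracting 64

masked = score_cache[scores].mask_sequence(20, seq.dup)
puts masked

# Crude timing of repeated lookups that hit the cached mask.
n = 10_000
elapsed = Benchmark.realtime do
  n.times { score_cache[scores].mask_sequence(20, seq.dup) }
end
puts format("%d calls: %.3fs", n, elapsed)
```

With a cutoff of 20 only the last three positions fall below the
threshold, so the mask corresponds to the 000111 pattern from your
mail. Whether the caching pays off depends on how often identical
score strings actually repeat in the input, which is why measuring on
real data matters.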

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/