2010/8/30 Martin Hansen <mail / maasha.dk>:
>> You could do this though
>>
>> seq.gsub! /./ do |m|
>>   scores[$`.length].ord - BASE_SOLEXA < cutoff ? m.downcase : m
>> end
>>
>> Not too nice though.  You could however do some preparation, e.g.
>> store scores as an Array of Fixnum instead of using #ord.
>
> Yes, converting scores to arrays is bad since the scores are parsed from
> files as strings (millions of them). And I am unsure if substr
> substitutions are very efficient ...
>
> We need to trick this into the regex engine somehow.
>
> How about transforming scores to a mask string like this: 000111 where 1
> indicates that the corresponding sequence char should be lowercased
> (that can be done with tr). Then we plug this onto the sequence string:
>
> seq = "ATCGAT000111"
>
> And then we construct a regex with a forward looking identifier that
> reads the mask and manipulates the ATCG chars?

Here's what I'd probably do.

Create a custom class (and not use a Hash) for this, e.g.

Score = Struct.new :seq, :score

Create another structure for caching scores and a bit representation
of which positions to downcase, dependent on the cutoff value, e.g.

ScoreCache = Struct.new :score do
  def mask(cutoff)
    cache[cutoff]
  end

  def mask_sequence(cutoff, seq)
    mask(cutoff).each_bit do |idx|
      # downcase, not downcase!: the bang version returns nil if nothing changed
      seq[idx] = seq[idx].downcase
    end

    seq
  end

private
  def cache
    @cache ||= Hash.new do |h,cutoff|
      c = 0

      # bit idx is set when the score at position idx is below cutoff
      score.each_char.with_index do |ch,idx|
        c |= (1 << idx) if ch.ord - BASE_SOLEXA < cutoff
      end

      h[cutoff] = c
    end
  end
end
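
To make the bit representation concrete, here is a small sketch of the
mask for a single score string. BASE_SOLEXA = 64, the helper name
build_mask and the sample scores are my assumptions for illustration;
none of them come from this thread.

```ruby
# Assumption: Solexa-style offset of 64; the thread does not state the value.
BASE_SOLEXA = 64

# Build an Integer whose bit i is set when score character i is below cutoff.
# build_mask is a hypothetical helper, equivalent to the cache block above.
def build_mask(score, cutoff)
  mask = 0
  score.each_char.with_index do |ch, idx|
    mask |= (1 << idx) if ch.ord - BASE_SOLEXA < cutoff
  end
  mask
end

score = "hhhTTT"            # hypothetical scores: 'h' -> 40, 'T' -> 20
mask  = build_mask(score, 30)

puts mask                   # 56
puts mask.to_s(2)           # 111000 (bit 0 is the rightmost digit)
```

Bits 3, 4 and 5 are set, so those are the positions that get lowercased.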

# Store score string -> ScoreCache
global_score_cache = Hash.new do |h,score|
  h[score] =3D ScoreCache.new score
end

class Integer
  def each_bit
    raise "Currently only positive implemented" if self < 0

    if block_given?
      idx = 0
      x = self

      while x != 0
        yield idx if x[0] == 1
        idx += 1
        x >>= 1
      end

      self
    else
      # Enumerator.new(obj, :meth) is gone in current Ruby; enum_for does the same
      enum_for :each_bit
    end
  end
end


And then use it and profile.
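
A rough sketch of what "use it and profile" could look like, comparing
the gsub variant quoted above with a cached bit mask. It is flattened
into plain methods so it runs on its own. BASE_SOLEXA = 64, CUTOFF, the
MASKS hash, the method names and the sample data are all assumptions
for illustration, and real timings will depend on your data.

```ruby
require 'benchmark'

BASE_SOLEXA = 64   # assumed offset, not stated in this thread
CUTOFF      = 30   # hypothetical cutoff

# Mask cache keyed by [score, cutoff]; a flattened stand-in for ScoreCache.
MASKS = Hash.new do |h, key|
  score, cutoff = key
  mask = 0
  score.each_char.with_index do |ch, idx|
    mask |= (1 << idx) if ch.ord - BASE_SOLEXA < cutoff
  end
  h[key] = mask
end

# Lowercase every position of seq whose bit is set in mask (modifies seq).
def apply_mask(seq, mask)
  idx = 0
  while mask != 0
    seq[idx] = seq[idx].downcase if mask[0] == 1
    idx  += 1
    mask >>= 1
  end
  seq
end

# The gsub variant, for comparison; $` is the text before the current match.
def mask_gsub(seq, scores, cutoff)
  seq.gsub(/./) do |m|
    scores[$`.length].ord - BASE_SOLEXA < cutoff ? m.downcase : m
  end
end

seq    = "ATCGATCGAT" * 5   # hypothetical read
scores = "hhhhhTTTTT" * 5   # hypothetical scores: 'h' -> 40, 'T' -> 20

expected = mask_gsub(seq, scores, CUTOFF)
result   = apply_mask(seq.dup, MASKS[[scores, CUTOFF]])
raise "results differ" unless result == expected

n = 10_000
Benchmark.bm(8) do |bm|
  bm.report("gsub")    { n.times { mask_gsub(seq, scores, CUTOFF) } }
  bm.report("bitmask") { n.times { apply_mask(seq.dup, MASKS[[scores, CUTOFF]]) } }
end
```

The cache only pays off if the same score string shows up more than
once, so measure with real input before committing to either version.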

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/