On 6/14/06, Leslie Viljoen <leslieviljoen / gmail.com> wrote:
>
> Here's a quick version that is closer to having speech synth. It's not
> a real synthesiser, but if you can provide the corresponding ogg files
> it can look for certain phrases and play them. The result should sound
> a bit better than a real synthesiser since the sections will be spoken
> fairly naturally. Only 53 files to record!
>

I know this comes after the summary (thanks for the nice writeup),
but I wanted to share.

I had an idea similar to Leslie's, but I wanted to actually write out
an audio file, instead of sending the narration to the speakers.  The
solution has 2 parts.

class WaveRead extracts all the information from a wave file.  I put
it together in under 2 hours last night.  It was so much easier to
write than the one I did in C a few years ago, and I'm really pleased
with the result.  It's clean and extensible.  I already have an idea
for making it trivial to add the other chunk definitions.
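
For anyone who hasn't looked inside a wave file, here is a minimal
sketch of the layout WaveRead walks: a 'RIFF' header, a form type, and
then a run of tagged, length-prefixed chunks.  The helper name
list_chunks and the sample values are assumptions made up for
illustration; this is not the class from the listing below.

```ruby
require 'stringio'

# Build a tiny in-memory WAVE file: a 16-byte PCM "fmt " chunk
# (PCM, mono, 8000 Hz, 16000 bytes/sec, block align 2, 16 bits)
# followed by 4 bytes of sample data.  All values are made up.
fmt  = [1, 1, 8000, 16000, 2, 16].pack('vvVVvv')
data = "\0" * 4
body = 'WAVE' + 'fmt ' + [fmt.size].pack('V') + fmt +
       'data' + [data.size].pack('V') + data
riff = 'RIFF' + [body.size].pack('V') + body

# list_chunks is a hypothetical helper: it returns the form type and
# a [tag, size] pair for every chunk, skipping over the chunk bodies.
def list_chunks io
  raise "Not a RIFF file" if io.read(4) != "RIFF"
  io.read(4)                                 # overall size, unused here
  form = io.read(4)
  chunks = []
  while (tag = io.read(4))
    size = io.read(4).unpack('V')[0]
    io.seek(size + size % 2, IO::SEEK_CUR)   # bodies are padded to even length
    chunks << [tag, size]
  end
  [form, chunks]
end

form, chunks = list_chunks(StringIO.new(riff))
p form     # "WAVE"
p chunks   # [["fmt ", 16], ["data", 4]]
```

Every chunk has the same tag-plus-size header, which is why adding a
new chunk definition can be as small as one more parse method.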

class WaveSpeaker writes a new wave file with everything it was told
to say.  It does this by using a wave file feature called cues, which
are a way of marking a point in the file and giving it a name.   I
created a wave file with several words, and a cue marking each one.
(see more about this below.)  WaveSpeaker parses this file, and starts
writing a new output file with the same format.  Then, when #say is
called, it looks for each word in the list of cues, and if found,
pastes the appropriate part of the source wave into the output file.
It inserts silence for each #wait, compensating for the length of the
previous sentences.  At the end it just fixes up the filesize data,
and closes the file.  All you need to do is convert the file to MP3
and transfer to your iPod.
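
The two steps described above, cutting a word out of the source data
by its cue and working out how much silence a pause still needs, can
be sketched roughly like this.  The helper names, their arguments and
the sample data are assumptions for illustration only; the real code
is in the listing below, and it rounds an odd byte count up where this
sketch rounds down to a whole sample frame.

```ruby
# Hypothetical helpers; the names and signatures are made up here and
# are not the ones used in wavespeaker.rb.

# Return the bytes for one word.  cue_samples holds each cue's offset
# in sample frames, labels holds the word for each cue in the same
# order, and frame_bytes is the size of one sample frame.
def word_slice data, cue_samples, labels, word, frame_bytes
  i = labels.index { |l| l.downcase == word.downcase }
  return nil unless i
  start = cue_samples[i] * frame_bytes
  # A word runs up to the next cue, or to the end for the last one.
  stop = cue_samples[i + 1] ? cue_samples[i + 1] * frame_bytes : data.size
  data[start...stop]
end

# Number of silent bytes for a pause of `seconds`, given that
# `elapsed` seconds of speech were written since the last pause.
def silence_bytes seconds, elapsed, bytes_per_sec, frame_bytes
  remaining = seconds - elapsed
  return 0 if remaining <= 0
  bytes = (remaining * bytes_per_sec).to_i
  bytes - bytes % frame_bytes        # round down to whole frames
end

data = "aabbbbcc"                    # stand-in for 16-bit mono samples
p word_slice(data, [0, 1, 3], %w[run walk stop], "Walk", 2)   # "bbbb"
p silence_bytes(1, 0.25, 16000, 2)                            # 12000
```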

------wavespeaker.rb------
require 'ostruct'

class RiffRead
  def initialize io
    @io = io
    raise "Not a RIFF file" if io.read(4) != "RIFF"
    @size = get_long
    @type = get_word
  end
  def parse
    chunks = []
    chk = get_chunk
    while chk
      chunks << chk
      chk = get_chunk
    end
    chunks
  end
  def self.get_long io
    io.read(4).unpack('V')[0]
  end
  def self.get_short io
    io.read(2).unpack('v')[0]
  end
  def self.get_word io
    io.read(4)
  end

private
  def get_chunk
    tag = get_word
    return nil if !tag
    if tag == 'LIST'
      handle_list
    else
      size = get_long
      size+=1 if size%2 != 0
      data = handle_tag(tag,size)
      data ||= @io.read(size)
      [tag, size, data]
    end
  end
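  def handle_tag tag,size
    # Dispatch to parse_<tag> when a subclass defines it (e.g. parse_fmt).
    # respond_to? works whether #methods returns strings or symbols.
    funcname = "parse_"+tag.strip
    if respond_to? funcname
      return self.send(funcname, size)
    end
  end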
  def handle_list
    listsize = get_long
    @listtype = get_word
    ['LIST',listsize,@listtype]
  end
  def get_long
    self.class::get_long @io
  end
  def get_short
    self.class::get_short @io
  end
  def get_word
    self.class::get_word @io
  end
end

def make_cue io
  cue = OpenStruct.new
  cue.name = RiffRead::get_long io
  cue.position = RiffRead::get_long io
  cue.chkname = RiffRead::get_word io
  cue.chkstart = RiffRead::get_long io
  cue.blockstart = RiffRead::get_long io
  cue.samplestart = RiffRead::get_long io
  cue
end

class WaveRead < RiffRead
  attr_reader :cues,:labels,:format, :data
  def initialize io
    super
    raise "Not a Wave File" if @type != 'WAVE'
  end
  def parse_fmt size
    @format = OpenStruct.new
    @format.data = @io.read(size)
    @format.size = size
    @format.tag = format.data[0,2].unpack('v')[0]
    @format.channels = format.data[2,2].unpack('v')[0]
    @format.samples_per_sec = format.data[4,4].unpack('V')[0]
    @format.bytes_per_sec = format.data[8,4].unpack('V')[0]
    @format.blockAlign = format.data[12,2].unpack('v')[0]
    @format
  end
  def parse_data size
    @data = @io.read(size)
  end
  def parse_cue size
    @cues = []
    numcues = get_long
    numcues.times  do
      @cues << make_cue(@io)
    end
    @cues
  end
  def parse_labl size
    id = get_long
    string = @io.read(size-4)
    @labels||=[]
    @labels << [id,string.strip]
    @labels.last
  end
  def parse_note size
    id = get_long
    string = @io.read(size-4)
    @notes||=[]
    @notes << [id,string.strip]
    @notes.last
  end
end


class WaveSpeaker
  def initialize filename
    File.open(filename, "rb") do |f|
      @data = WaveRead.new(f)
      @data.parse
    end
    @elapsed = 0
  end
  def begin outfile
    @out = File.open(outfile, "wb")
    @out.write('RIFF')
    @filesize_marker = @out.pos
    @out.write [0].pack('V')
    @written = @out.write('WAVEfmt ')
    @written+= @out.write [@data.format.size].pack('V')
    @written+= @out.write @data.format.data
    @written+= @out.write('data')
    @datasize_marker = @out.pos
    @written+= @out.write [0].pack('V')
  end
  def say string
    fixup(string).split.each do |str|
      str = fixup(str)
      if str == 'COMMA'
        wait 0.2
      else
        cue_id = nil
        @data.labels.each_with_index{|label,i|
          if label[1].downcase == str.downcase
            cue_id = i
            break
          end
        }
        if cue_id
          #p "saying #{str}"
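          # blockAlign is the number of bytes per sample frame
          # (2 for 16-bit mono), so this turns samples into bytes.
          frame = @data.format.blockAlign
          start = @data.cues[cue_id].samplestart*frame
          # The last cue has no successor; run to the end of the data.
          nextcue = @data.cues[cue_id+1]
          endpt = nextcue ? nextcue.samplestart*frame : @data.data.size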
          endpt+=1 if (endpt-start)%2 != 0
          @written+= @out.write(@data.data[start...endpt])
          @elapsed += (endpt-start).to_f / @data.format.bytes_per_sec
        else
          p "CAN'T FIND <#{str}>"
        end
      end
    end
  end
  def wait seconds
    a = "\0"
    delay = (seconds - @elapsed)
    p delay
    if delay > 0
      bytes = (delay * @data.format.bytes_per_sec).to_i
      p "wait #{bytes}"
      bytes+=1 if (bytes%2 != 0)
      silence = a*bytes
      @written+= @out.write silence
      @elapsed = 0
    else
      @elapsed -= seconds
    end
  end
  def fixup str
    #remove punctuation, mark pauses
    #(returns a new string so the caller's argument is not modified)
    str.gsub(/,/," COMMA ").gsub(/[^\w\s]/,"")
  end
  def quit
    @out.seek @filesize_marker
    @out.write [@written].pack('V')
    @out.seek @datasize_marker
    @out.write [@written-@datasize_marker+4].pack('V')
    @out.close
    p @written
  end
end

if __FILE__ == $0
  wr = WaveSpeaker.new("coach.wav")
  wr.begin("todays_run.wav")
  wr.say 'run 60 seconds'
  wr.wait 1
  wr.say 'walk 15 minutes'
  wr.quit
end
-----end-----

To get this to work with my solution, just add the following lines:
in Coach#initialize, add
   @speaker = WaveSpeaker.new "coach.wav"
   @speaker.begin "current_workout.wav"

at the end of Coach#coach add
   @speaker.quit

and replace these two functions:
 def say s
   @speaker.say s
 end
 def wait n
   @speaker.wait n
   @target_time -= n
 end


To get the source file, I generated a wave file with 53 words from my
coaching script using a synth (couldn't find a microphone), and used
my wave editor's auto cue feature to insert numbered cues in all the
gaps between words.  After running a simple script to replace the
numbers with the words, I have a complete solution that produces a 20
minute long wav file of a robot coach.  It would probably be better if
you used a real voice.  If anyone is actually interested in this, I
can give you more details on the wave file creation.
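
As a rough idea of what naming the cues involves (this is a sketch,
not the script I actually ran): each cue name lives in a 'labl' chunk,
normally inside a LIST chunk of type 'adtl'.  The helper labl_chunk,
the word list, and the assumption that cue IDs count up from 1 are all
made up for illustration.

```ruby
# labl_chunk is a hypothetical helper.  A 'labl' chunk is the tag, a
# 4-byte little-endian size, the 4-byte ID of the cue it names, and
# NUL-terminated text; a pad byte follows if the size is odd.
def labl_chunk cue_id, text
  payload = [cue_id].pack('V') + text + "\0"
  chunk = 'labl' + [payload.size].pack('V') + payload
  chunk += "\0" if payload.size % 2 != 0
  chunk
end

words = %w[run walk seconds]         # made-up list, in cue order
chunks = words.each_with_index.map { |w, i| labl_chunk(i + 1, w) }
p chunks.map { |c| c.size }          # [16, 18, 20]
```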

-Adam