On Sep 6, 5:29 pm, "Simon Schuster" <significa... / gmail.com> wrote:
> I know this could be more idiomatic to ruby.
>
> it's basically to turn http://www.gutenberg.org/etext/18362 into an
> array. the "_no_extras" refers to me having snipped the intro and
> outro of the text outside of ruby. I still have to do something with
> the "SECTION" fields and the "A", "B", etc. fields. (not to mention
> some kind of linguistic parsing which would make f[rand(f.size)] + " "
> f[rand(f.size)] + " " .... link together in a coherent matter, but
> that's a little beyond me. any direction in this area would be kindly
> appreciated though! I'm thinking of separating it into different text
> files maybe. certain sections are almost whole sentences, they're
> grouped in all kinds of ways that will maybe help with this. no
> long-term goal, really, just learning ruby and having fun. :)
>
> anyhow, the sloppy newbie code is as follows:
>
> f = File.read("phrases_no_extra.txt")
> f = f.to_a
> f = f.each { |x| x.chop! }
> f.each_with_index { |x,y|    # deletes the empty array items
>         if x.size == 0
>                 f.delete_at(y)
>         end
>         }
> f.each_with_index { |x,y|     # deleting all but the last (which is spread of two lines)
>         if x.include? "]"         # of his comments
>                 f.delete_at(y)
>         end
>         }
> f.each_with_index { |x,y|         # yes, this is me unable to recall how to do "or" hahaha.
>         if x.include? "["
>                 f.delete_at(y)
>         end
>         }
> f.delete_at(-1)          # random whitespace item at the end from the last quote
>
> puts f[rand(f.size)]

While this does not handle everything in the input you mention, it
will print most of the lines you are looking for. It skips blank
lines and lines whose only content is a single uppercase letter, and
it removes the full and partial comments that follow a phrase. With
some work on the regular expressions it could probably do what you
want. This solution has the advantages of being concise and of not
reading the entire input into memory before processing. It has the
disadvantage of requiring Rio (http://rio.rubyforge.org), which is
not part of the standard Ruby library.

require 'rio'

rio('phrases_no_extra.txt').chomp.lines(/^\S/).skip(/^[A-Z]?$/) do |line|
  puts line.gsub(/\s+\[.*$/,'')
end
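
If installing Rio is not an option, roughly the same filtering can be sketched with only the standard library. This is a minimal sketch and has not been run against the actual Gutenberg text. The method name each_phrase is a hypothetical helper, not part of Ruby or Rio, and the regular expressions are copied from the Rio version above. File.foreach reads one line at a time, so the whole file is still never held in memory.

```ruby
# Hypothetical helper (not part of Ruby or Rio): yields each cleaned
# phrase from the file at +path+, reading one line at a time.
def each_phrase(path)
  File.foreach(path) do |raw|
    line = raw.chomp                  # strip the trailing newline
    next unless line =~ /^\S/         # keep lines starting with a non-space character
    next if line =~ /^[A-Z]?$/        # skip empty lines and single uppercase letters
    yield line.sub(/\s+\[.*$/, '')    # drop a trailing "[comment..." if present
  end
end
```

Calling each_phrase('phrases_no_extra.txt') { |phrase| puts phrase } should then print about the same lines as the Rio version, assuming the file is in the current directory. Collecting the yielded phrases into an array instead would give you something to index with rand(f.size), as in your original code.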