William James wrote:
> Tom Cloyd wrote:
>
> > I'm baffled by this strange outcome - I cannot reduce multiple
> > spaces from a text file. This isn't just a regex problem, somehow.
> > I'm failing to grasp something essential, but don't know what it
> > is. All help appreciated, as usual!
> >
> > Here is a demo of my problem, in which I try two different ways,
> > and both fail:
> >
> > === code ===
> > # h2t.rb
> >
> > def main
> >   # conversion table spec
> >   conv = [
> >     [ '<h1>', 'h1. ' ], [ '<h2>', 'h2. ' ], [ '<h3>', 'h3. ' ],
> >     [ '<h4>', 'h4. ' ], [ '<h5>', 'h5. ' ], [ '<h6>', 'h6. ' ], [
> >     /<\/h\d>/, '' ],
> >     [ " +", ' ' ]] # <= this last array element should do the trick,
> >     but doesn't
> >
> >   data = open( 'h2t-in2.txt', 'r' ) { |f| ( f.readlines( data
> >   )).to_s }
> >   conv.each do |i|
> >     data.gsub!( i[0], i[1] )
> >   end
> >   data.squeeze(' ') # <= putting this here was sheer desperations,
> >   but even THIS fails
> >
> >   open( "h2t-out.txt", "w" ) { |f| f.write( data ) }
> >
> > end
> >
> > %w(rubygems ruby-debug readline strscan logger fileutils).each{
> > |lib| require lib }
> >
> > main
> >
> > === input file ===
> >
> > <h1>Library catalog listing </h1>x
> >
> > <h3>Library catalog listing </h3>x
> >
> > <h2>Library catalog listing </h2>x
> >
> > p(subtitle). A complete listing of all material in the Library
> >
> >
> > === output file ===
> >
> >
> > h1. Library catalog listing x
> >
> > h3. Library catalog listing x
> >
> > h2. Library catalog listing x
> >
> > p(subtitle). A complete listing of all material in the Library
> >
> > ==============
> >
> > The "x"s in the input file are to show that while the end tags are
> > being removed the space before them is NOT.
> >
> > t.
>
> puts IO.readlines("data2").map{|line|
>   line.sub( /<(h\d)>/, '\1. ' ).sub( /<\/h\d>/, "").
>   squeeze " " }
>
> --- output ---
>
> h1. Library catalog listing x
>
> h3. Library catalog listing x
>
> h2. Library catalog listing x
>
> p(subtitle). A complete listing of all material in the Library

puts IO.read("data2").gsub( /<(h\d)>/, '\1. ' ).gsub( /<\/h\d>/, "").
  squeeze " "
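
As for why the two original attempts seem to do nothing, here is a minimal sketch, going by how String#gsub and String#squeeze are documented to behave. When gsub is given a String pattern, it matches that string literally, so " +" looks for a space followed by a plus sign rather than "one or more spaces". And squeeze without the bang returns a new string and leaves the receiver alone, so the result is lost unless it is assigned or squeeze! is used. The sample line below is a hypothetical stand-in for the file contents, not the actual input.

```ruby
# Hypothetical sample line standing in for the contents of the input file.
data = "h1. Library   catalog    listing   x"

# A String pattern is matched literally: " +" means a space followed by "+".
# Nothing in the sample matches, so the text comes back unchanged.
literal = data.gsub(" +", " ")

# A Regexp pattern treats "+" as "one or more", so runs of spaces collapse.
collapsed = data.gsub(/ +/, " ")

# squeeze returns a new string; the receiver is left as it was
# because the return value is thrown away here.
untouched = data.dup
untouched.squeeze(" ")

# squeeze! modifies the receiver in place.
squeezed = data.dup
squeezed.squeeze!(" ")

puts literal
puts collapsed
puts untouched
puts squeezed
```

In the original script, changing the last table entry to the regexp / +/ or replacing data.squeeze(' ') with data.squeeze!(' ') would probably be enough on its own for the space problem.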