<--- On Mar 12, William James wrote --->

> Aditya Mahajan wrote:
>> I am trying to convert a plain text file into a tex file with
>> some markup. I have the following piece of code that does most of the
>> work. It works fine but does not "look good". Can someone suggest how
>> I can improve this code?
>>
>> ---------------[snip]----------------------
>> file = File.new(filename, 'r')
>> texfile = File.open(basename + ".tex", 'w')
>>
>> CHAPTER = Regexp.new("CHAPTER")
>> SPACES  = Regexp.new("\s\s\s\s")
>> BLANK   = Regexp.new(/^\s*$/)
>>
>> chapter = true
>> verse   = false
>> prev_line = ""
>>
>> file.each_line do |line|
>>    if chapter && !BLANK.match(line)
>>      chapter = false
>>      texfile.puts "\\chapter{#{line.chomp}}"
>>    elsif CHAPTER.match(line)
>>      chapter = true
>>    elsif verse && !SPACES.match(line)
>>      texfile.puts '\stoplines\stopnarrower'
>>      texfile.puts line.chomp
>>      verse = false
>>    elsif !verse && BLANK.match(prev_line) && SPACES.match(line)
>>      texfile.puts '\startnarrower\startlines'
>>      texfile.puts line.chomp
>>      verse = true
>>    else
>>      texfile.puts line.chomp
>>    end
>>    prev_line = line
>> end
>> ------------------[snip]--------------------
>>
> ruby -p01e'gsub(/(\n\s*\n)((?:^\s{4}.*\n)+)/,
> "\\1\\startnarrower\\startlines\n\\2\\stoplines\\stopnarrower\n");
> gsub(/(CHAPTER\s+)(.*)\n/,"\\chapter{\\2}\n")' in >out
>
Thank you for the regex. The chapter part of your regex is better than
what I was doing, but it does not correctly identify the narrower region.
I think I can tweak it a little to make it work correctly. What I
really want to know is whether there is a better way to do this in Ruby.

I am not too comfortable coding with regexes, as they can be very
difficult to maintain. Each time I come back to one, I have to work
through the expression and understand it all over again.
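
One thing that might ease the maintenance worry is Ruby's extended
regex syntax (the /x flag), which ignores whitespace inside the pattern
and allows comments. Below is a minimal sketch of the verse part only.
It assumes a verse is a run of lines indented by exactly four or more
spaces that follows a blank line; the names VERSE_BLOCK and mark_verses
and the sample text are made up for illustration.

```ruby
# Hypothetical constant: the verse pattern, one piece per line.
# \x20 is a literal space (plain spaces are ignored under the x flag).
VERSE_BLOCK = /
  (\n[\x20\t]*\n)          # a blank line that precedes the verse
  ((?:\x20{4}.*\n)+)       # one or more lines indented by four spaces
/x

# Hypothetical helper: wraps every verse block in the narrower markup.
# The block form of gsub avoids backslash escaping in the replacement.
def mark_verses(text)
  text.gsub(VERSE_BLOCK) do
    "#{$1}\\startnarrower\\startlines\n#{$2}\\stoplines\\stopnarrower\n"
  end
end

sample = "Intro line\n\n    verse one\n    verse two\nAfter\n"
puts mark_verses(sample)
```

Whether this is readable enough is a matter of taste, but at least each
piece of the expression carries its own explanation.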

Basically, the logic of my program depends on the "state", which I am
keeping track of using flags. Your code gets rid of the flags by using a
two-pass algorithm. What are the pros and cons of each approach? With
gsub, the program needs to read the entire file before it can make any
changes. I thought that this would be memory inefficient, but for a
~160 KB file it is almost instantaneous. Even for a 3 MB file it takes
less than a second. At what file sizes should one read the file line by
line rather than the entire file in a single shot?
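
To make the flag-based alternative concrete, here is a minimal sketch
of the same line-by-line logic with the two booleans folded into a
single state variable. It is only a sketch under assumptions: the
method name convert and the state names are hypothetical, verse lines
are assumed to start with four spaces, and closing an open verse at end
of file is an addition that my original code does not have.

```ruby
require 'stringio'

BLANK    = /\A\s*\z/
INDENTED = /\A {4}/    # assumption: verse lines start with four spaces

# Hypothetical helper: reads lines from input, writes markup to output.
def convert(input, output)
  state = :title       # one of :title, :body, :verse
  prev_blank = true    # mirrors prev_line = "" in the original
  input.each_line do |raw|
    line = raw.chomp
    case state
    when :title        # waiting for the first non-blank line
      if line =~ BLANK
        output.puts line
      else
        output.puts "\\chapter{#{line}}"
        state = :body
      end
    when :body
      if line.include?("CHAPTER")
        state = :title
      elsif prev_blank && line =~ INDENTED
        output.puts '\startnarrower\startlines', line
        state = :verse
      else
        output.puts line
      end
    when :verse
      if line =~ INDENTED
        output.puts line
      else
        output.puts '\stoplines\stopnarrower'
        if line.include?("CHAPTER")
          state = :title
        else
          output.puts line
          state = :body
        end
      end
    end
    prev_blank = !!(line =~ BLANK)
  end
  # Assumption: close a verse that runs to the end of the input.
  output.puts '\stoplines\stopnarrower' if state == :verse
end

out = StringIO.new
convert(StringIO.new("Title\n\n    verse a\nProse.\n"), out)
puts out.string
```

Because each_line works on any IO object, the same method can be handed
two File objects, in which case only one line is held in memory at a
time, whereas the gsub approach needs the whole text in memory at once.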

Thanks


-- 
Aditya Mahajan, EECS Systems, University of Michigan
http://www.eecs.umich.edu/~adityam || Ph: 7342624008