Dear all,

I have many scanned pages that I'd like to split up to prepare them
for OCR.
There are two things I'd like to do:

1.) Cut off the header of each page, which contains the page number;

2.) Find the largest horizontal blank areas on each page (these should
separate the chapters), as in this layout:

Chapter1's text        Chapter1's text
Chapter1's text        Chapter1's text
Chapter1's text        Chapter1's text
Chapter1's text        Chapter1's text
                                       <---- cut here, at this blank
Chapter2's text        Chapter2's text
Chapter2's text        Chapter2's text
Chapter2's text        Chapter2's text
Chapter2's text        Chapter2's text
                   ^
                   |
                    --- (Then cut vertically)
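Both steps can be sketched in Ruby, on the assumption that you already have a "profile" of the page: an Array with the number of dark pixels in each pixel row (or, for the vertical cut, in each pixel column). The helper names `blank_runs`, `interior_runs`, `header_cut` and `largest_gap` are hypothetical, not RMagick API. Blank bands touching the top or bottom edge are ignored, because the page margins are blank too and would otherwise always be the "largest blank".

```ruby
# Hypothetical helpers (not RMagick API). A "profile" is an Array of
# dark-pixel counts, one entry per pixel row (or per pixel column).

# Returns [start, length] for every run of consecutive entries whose
# count is <= tolerance, i.e. every blank band.
def blank_runs(profile, tolerance = 0)
  runs = []
  start = nil
  profile.each_with_index do |count, i|
    if count <= tolerance
      start ||= i
    elsif start
      runs << [start, i - start]
      start = nil
    end
  end
  runs << [start, profile.size - start] if start
  runs
end

# Blank bands that touch neither edge; the edge bands are page margins.
def interior_runs(profile, tolerance = 0)
  blank_runs(profile, tolerance).select do |start, length|
    start > 0 && start + length < profile.size
  end
end

# Step 1, assuming the header is the first block of ink on the page:
# the middle row of the first interior gap. Everything above it is the
# header. Returns nil if there is no such gap.
def header_cut(profile, tolerance = 0)
  gap = interior_runs(profile, tolerance).first
  gap && gap[0] + gap[1] / 2
end

# Step 2: the middle of the widest interior gap, or nil if there is none.
# Run it on the profile of the page with the header already removed.
def largest_gap(profile, tolerance = 0)
  gap = interior_runs(profile, tolerance).max_by { |_, length| length }
  gap && gap[0] + gap[1] / 2
end
```

For example, with the toy row profile `[0, 0, 5, 5, 0, 9, 9, 9, 0, 0, 0, 0, 8, 8, 0]`, `header_cut` returns 4 and `largest_gap` returns 10. For the vertical cut, the same `largest_gap` can be applied to the column profile of each chapter band. A small `tolerance` helps with scanner noise and specks of dirt.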

I have tried converting my pages, which are A4 at 600 dpi, into pixel arrays,
but this is quite slow. Is there a faster method, e.g. using to_blob?
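At 600 dpi an A4 page is roughly 4960 x 7016, about 35 million pixels, so anything that creates a Ruby object per pixel will be slow. One possible alternative, sketched here under assumptions: get the page as one packed 8-bit grayscale string (one byte per pixel, row by row, 0 = black). With RMagick, asking `to_blob` for the raw `GRAY` format at depth 8 should give this layout, but please check the RMagick documentation for your version. The row profile can then be computed with `String#count`, which runs in C. The threshold of 128 and the helper names `row_profile` and `column_profile` are assumptions for illustration.

```ruby
# Assumed input: `data` is width * height bytes, one 8-bit gray sample per
# pixel, row-major, 0 = black. A byte below 128 counts as "dark".
DARK = "\x00-\x7F".b  # tr-style byte range 0..127 for String#count

# Dark-pixel count per row. String#count scans each row in C, so no
# Ruby object is created per pixel.
def row_profile(data, width, height)
  data = data.b unless data.encoding == Encoding::BINARY
  raise ArgumentError, "size mismatch" unless data.bytesize == width * height
  Array.new(height) { |y| data.byteslice(y * width, width).count(DARK) }
end

# Dark-pixel count per column, restricted to rows y0...y1 (exclusive).
# This loop is per pixel in Ruby, so only run it on one chapter band.
def column_profile(data, width, y0, y1)
  data = data.b unless data.encoding == Encoding::BINARY
  cols = Array.new(width, 0)
  (y0...y1).each do |y|
    data.byteslice(y * width, width).each_byte.with_index do |b, x|
      cols[x] += 1 if b < 128
    end
  end
  cols
end
```

Since the gaps you are looking for are large, scaling the page down (say to 150 dpi) before computing the profiles would also cut the work considerably, at the cost of less precise cut positions.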

Thank you very much,

Axel 
