On Sep 2, 2008, at 5:52 AM, Axel Etzold wrote:

> Dear all,
>
> I have a number of black-and-white scanned pages. To prepare them for OCR,
> I have to split them into columns and rows. Additionally, somewhere in
> between, there are pictures, which also need to be separated.
>
> So, in a page that might look like this:
>
> Text1 Text4 Text6
>
> Text2 Pict1 Text7
>
> Text3 Text5 Pict2
>
> I'd like to find the largest blocks of white which separate the texts
> and pictures, both horizontally and vertically.
>
> Right now, I would use RMagick with export_pixels_to_str and then regular
> expressions to find the zeros, but I am not sure whether there's a more
> effective way for this purpose.
>
> Do you have any suggestions?
>
> Thank you very much,
>
> Best regards,
>
> Axel
>
>

you are attempting to roll your own image segmentation.  google for
'computer vision'.  some helpful links:

   http://kogs-www.informatik.uni-hamburg.de/~koethe/vigra/

   http://www.itk.org/

   http://camellia.sourceforge.net/

computer vision can be quite a different domain than normal image processing
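
for the simple case in the question (clean scans, blocks separated by
straight bands of white), a projection profile may already be enough
before reaching for one of the libraries above.  what follows is only a
minimal sketch in plain ruby, not a tested ocr pipeline.  it assumes the
page has already been turned into an array of rows of integers, with 0 for
white and 1 for ink (for example from the RMagick export mentioned in the
question).  the names blank_runs, row_profile, column_profile and
whitespace_gaps are made up for illustration.

```ruby
# Assumption: pixels is an Array of rows, each row an Array of Integers,
# where 0 = white and 1 = ink. How you obtain it is up to you.

# Return [start, length] for every maximal run of zeros in a profile.
def blank_runs(profile)
  runs = []
  start = nil
  profile.each_with_index do |ink, i|
    if ink.zero?
      start ||= i                 # a blank run begins (or continues)
    elsif start
      runs << [start, i - start]  # the run ended just before index i
      start = nil
    end
  end
  runs << [start, profile.size - start] if start  # run reaching the edge
  runs
end

# Amount of ink in each pixel row.
def row_profile(pixels)
  pixels.map { |row| row.sum }
end

# Amount of ink in each pixel column.
def column_profile(pixels)
  pixels.transpose.map { |col| col.sum }
end

# Blank horizontal bands (:rows) and vertical bands (:columns) that are at
# least min_length pixels thick, each given as [start, length].
def whitespace_gaps(pixels, min_length = 1)
  {
    rows:    blank_runs(row_profile(pixels)).select { |_, len| len >= min_length },
    columns: blank_runs(column_profile(pixels)).select { |_, len| len >= min_length }
  }
end

page = [
  [1, 1, 0, 0, 1],
  [1, 1, 0, 0, 1],
  [0, 0, 0, 0, 0],
  [1, 0, 0, 0, 1]
]

gaps = whitespace_gaps(page)
p gaps[:rows]     # prints [[2, 1]]
p gaps[:columns]  # prints [[2, 2]]
```

a band only shows up here if it is blank across the whole page, so on a
layout like the one in the question you would cut at the widest band and
run the same search again inside each piece (the idea usually called a
recursive x-y cut).  skewed or noisy scans will break this, which is where
the libraries above come in.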


a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being  
better. simply reflect on that.
h.h. the 14th dalai lama