On Sep 2, 2008, at 5:52 AM, Axel Etzold wrote:

> Dear all,
>
> I have a number of black-and-white scanned pages. To prepare them
> for OCR, I have to split them into columns and rows. Additionally,
> somewhere in between, there are pictures, which also need to be
> separated.
>
> So, in a page that might look like this:
>
> Text1 Text4 Text6
>
> Text2 Pict1 Text7
>
> Text3 Text5 Pict2
>
> I'd like to find the largest blocks of white which separate the
> texts and pictures, both horizontally and vertically.
>
> Right now, I would use RMagick with export_pixels_to_str and then
> regular expressions to find the zeros, but I am not sure whether
> there's a more effective way for this purpose.
>
> Do you have any suggestions?
>
> Thank you very much,
>
> Best regards,
>
> Axel

you are attempting to roll your own image segmentation. google for
'computer vision'. some helpful links

http://kogs-www.informatik.uni-hamburg.de/~koethe/vigra/
http://www.itk.org/
http://camellia.sourceforge.net/

it can be quite a different domain than normal image processing

a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama
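
For readers who want to see what the whitespace search in the question amounts to, here is a minimal sketch of a projection-profile approach in plain Ruby. It is not taken from any of the libraries linked above, and it is far cruder than real image segmentation. It assumes the page has already been decoded into an array of rows in which 0 means white and 1 means black; the names white_runs and largest_gaps are hypothetical, and getting the pixels out of RMagick is left out.

```ruby
# Assumption: each pixel is 0 (white) or 1 (black).

# Return [start, length] pairs for each run of consecutive lines
# that contain only white pixels.
def white_runs(lines)
  runs = []
  start = nil
  lines.each_with_index do |line, i|
    if line.all?(&:zero?)
      start ||= i                  # open a run at the first white line
    elsif start
      runs << [start, i - start]   # close the run at the first non-white line
      start = nil
    end
  end
  runs << [start, lines.length - start] if start
  runs
end

# Longest all-white band of rows and of columns, each as [start, length].
# Either value is nil when no all-white row or column exists.
def largest_gaps(page)
  {
    rows: white_runs(page).max_by { |_, len| len },
    cols: white_runs(page.transpose).max_by { |_, len| len }
  }
end

page = [
  [1, 1, 0, 0, 1],
  [1, 1, 0, 0, 1],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [1, 0, 0, 0, 1]
]

gaps = largest_gaps(page)
puts "widest row gap: start=#{gaps[:rows][0]} length=#{gaps[:rows][1]}"
puts "widest column gap: start=#{gaps[:cols][0]} length=#{gaps[:cols][1]}"
```

Note that a band only counts if it is white across the whole page, so a gap that separates Text4 from Pict1 but is interrupted by another column will not be found unless the search is repeated inside each block after the first split. Page margins also count as white bands, and scanner noise will break up otherwise clean gaps, which is part of why a dedicated library is the more robust route.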