On Sun, 24 Oct 2004 17:35:46 GMT,
Martin Pfeffer <udlduz / chello.at> wrote:
> Hi,
> my problem is that I need a file of German words, so I am trying to
> create one by parsing HTML sites and writing the extracted words to a
> database. So my question is: what is the easiest way to extract text
> from HTML pages?
> Thanks,
> Martin
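For the question as asked, here is a minimal sketch in Python using only the standard library's html.parser module. It assumes the page has already been fetched into a string (downloading and the database step are left out). It treats a "word" as any run of letters, so umlauts and ß count while digits and hyphens split words, and it skips the contents of <script> and <style>. TextExtractor and extract_words are illustrative names, not an existing API.

```python
import re
from html.parser import HTMLParser

# Tags whose contents are not visible text and should be ignored.
SKIP_TAGS = {"script", "style"}

# Assumption: a "word" is a run of Unicode letters (no digits, no underscore),
# so "Grüße" is one word and hyphenated compounds are split at the hyphen.
WORD_RE = re.compile(r"[^\W\d_]+")


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, skipping script/style."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &uuml; into "ü".
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_words(html):
    """Return the set of distinct words in the visible text of `html`."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    text = " ".join(parser.parts)
    return set(WORD_RE.findall(text))
```

For example, extract_words("<p>Gr&uuml;&szlig;e aus Wien!</p>") returns {"Grüße", "aus", "Wien"}. Collecting into a set removes duplicates before anything is written to a database.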

There's a /usr/share/dict/ngerman on my Debian box:
$ wc ngerman
 308860  308860 3998536 ngerman

which works out to about 12.9 bytes per line; since wc -c also counts
each line's newline, the average word is about 12 (!) letters long.
Unimaginable!
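The same calculation as a small Python sketch. wc -c counts bytes including the newline at the end of every line, so bytes / lines overstates the word length by one; the function below strips the newline and counts characters instead. average_word_length is an illustrative name, and the "letters" figure from the wc numbers assumes one byte per letter (true for a Latin-1 word list; in UTF-8 each umlaut takes two bytes).

```python
def average_word_length(lines):
    """Mean number of characters per word in a one-word-per-line list.

    The trailing newline is stripped first; `wc -c` counts it as well,
    which is why bytes / lines overstates the word length by one.
    """
    total_chars = 0
    words = 0
    for line in lines:
        word = line.rstrip("\r\n")
        if word:  # ignore blank lines
            total_chars += len(word)
            words += 1
    return total_chars / words if words else 0.0


# Byte arithmetic from the wc output: 3998536 bytes over 308860 lines.
# Subtracting 1 removes the newline (assumes one byte per letter).
bytes_per_line = 3998536 / 308860
print(f"{bytes_per_line:.2f} bytes per line, "
      f"~{bytes_per_line - 1:.2f} letters without the newline")
# → 12.95 bytes per line, ~11.95 letters without the newline
```

On a machine that has the file, average_word_length(open("/usr/share/dict/ngerman", encoding="utf-8")) gives the character-based figure directly; pass encoding="latin-1" if the installed list is an older ISO-8859-1 file.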

s.