martinus [mailto:martin.ankerl / gmail.com] wrote:

> The goal is to automatically create summaries of a text. For 
> example, if you have a large text file and you have no idea 
> what this is about, the analyzer should be able to give you a 
> short summary of the file. Another nice idea might be to add 
> such a feature to a blogging webpage, each entry could show a 
> short summary, or at least the most important words.

I find your analyzer quite promising.

1. It would be very useful (to me) if you could extend it so that one can
create indexes.

For example:

#analyze.rb *.txt -index
aardvark
   file1.txt 24,100
   file2.txt 9,1000
abacus
   file1.txt 24,100
   file2.txt 9,1000

.....

The numbers after the filename are just line numbers in the file; you can
extend it to page level, though.
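
To make the index idea concrete, here is a rough sketch. It is not taken
from analyze.rb: build_index and format_index are made-up names, the -index
flag above is only a wish, and the input is assumed to be pairs of
filename and text so it can be tried without touching the disk.

```ruby
# Hypothetical sketch, not part of analyze.rb.
# Builds word => { filename => [line numbers] } from [filename, text] pairs.
def build_index(files)
  index = Hash.new { |h, word| h[word] = Hash.new { |g, file| g[file] = [] } }
  files.each do |name, text|
    text.each_line.with_index(1) do |line, lineno|
      # Record each word once per line, lowercased so "Aardvark" == "aardvark".
      line.downcase.scan(/[a-z]+/).uniq.each do |word|
        index[word][name] << lineno
      end
    end
  end
  index
end

# Renders the index in the layout shown above: word, then indented file lines.
def format_index(index)
  index.keys.sort.map do |word|
    entries = index[word].sort.map { |file, lines| "   #{file} #{lines.join(',')}" }
    [word, *entries].join("\n")
  end.join("\n")
end
```

Something like
puts format_index(build_index(Dir["*.txt"].map { |f| [f, File.read(f)] }))
would then print the listing for real files.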


2. Also, you say important, but how important? It would be helpful if one
could do grading. Sometimes one would not want to display simple words. E.g.,
if I scan a list of Ruby files, I would want the analyzer to ignore words
like "def, class, if, then, case", etc.
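
A minimal sketch of what grading with an ignore list could look like. The
names graded_words and RUBY_STOP_WORDS are hypothetical, the ignore list is
only a sample, and the "grade" here is assumed to be plain relative
frequency, which may not be how the analyzer weighs words.

```ruby
# Hypothetical sketch, not part of analyze.rb.
# Sample ignore list for Ruby sources; extend as needed.
RUBY_STOP_WORDS = %w[def class if then case end else elsif do module].freeze

# Returns [word, grade] pairs, highest grade first.
# Assumption: grade = share of the remaining (non-ignored) words, 0.0..1.0.
def graded_words(text, stop_words: RUBY_STOP_WORDS, top: 10)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z_]+/) do |word|
    counts[word] += 1 unless stop_words.include?(word)
  end
  total = counts.values.sum.to_f
  counts.sort_by { |word, n| [-n, word] }
        .first(top)
        .map { |word, n| [word, (n / total).round(2)] }
end
```

Passing a different stop_words list per file type would let the same
grading work for plain text and for source code.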

3. Maybe you can extend it to include types also. E.g., I could search for
files that are scientific, or maybe related to music, etc. This is possible
if the analyzer can classify.
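
The crudest form of such classification might be keyword matching, sketched
below. Everything in it is an assumption for illustration: the classify
name, the two categories, and their tiny keyword lists. A serious
classifier would need far better word lists or a trained model.

```ruby
# Hypothetical sketch, not part of analyze.rb.
# Tiny made-up keyword lists, for illustration only.
CATEGORY_KEYWORDS = {
  "science" => %w[experiment hypothesis theory molecule energy],
  "music"   => %w[melody chord rhythm guitar orchestra]
}.freeze

# Returns the category whose keywords occur most often, or nil if none occur.
def classify(text, categories = CATEGORY_KEYWORDS)
  words = text.downcase.scan(/[a-z]+/)
  scores = categories.map do |name, keywords|
    [name, words.count { |w| keywords.include?(w) }]
  end
  best, score = scores.max_by { |_, s| s }
  best && score > 0 ? best : nil
end
```

The important words the analyzer already extracts could be fed in instead
of the raw text, so that common filler words do not dilute the score.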

4. Also, it would be great if it recognized phrases too, not just words.
E.g., it would recognize "human body" or "programming language" as one entity.
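
One simple way to approximate phrase recognition is to count adjacent word
pairs and keep the ones that repeat. The sketch below assumes that
approach; frequent_phrases and its min_count parameter are hypothetical
names, and pairs are not counted across sentence boundaries.

```ruby
# Hypothetical sketch, not part of analyze.rb.
# Returns two-word phrases that occur at least min_count times,
# most frequent first (ties broken alphabetically).
def frequent_phrases(text, min_count: 2)
  counts = Hash.new(0)
  # Split on sentence punctuation so pairs never span two sentences.
  text.downcase.split(/[.!?;:\n]+/).each do |sentence|
    sentence.scan(/[a-z]+/).each_cons(2) do |first, second|
      counts["#{first} #{second}"] += 1
    end
  end
  counts.select { |_, n| n >= min_count }
        .sort_by { |phrase, n| [-n, phrase] }
        .map(&:first)
end
```

Raw pair counting also picks up pairs like "the human", so in practice it
would need the same kind of ignore list as the single-word case.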

Of course, I'm throwing out big suggestions that require a big project. Just
consider it a challenge, pls :-)

> 
> martinus
> 

Thanks for the analyzer.

kind regards -botp