Victor "Zverok" Shepelev wrote:

/ ...

> Now we have some part of page, need to delete all tables, images, and so
> on, and strip all "non-content" tags (everything but p, ul, ol, li, b,
> i...), and I need to have "consistent" HTML to show.

"Consistent" is easy to say in one word, but that one word cannot be turned
into code.

> It is a task definition.
> 
> The task may vary for different dictionaries. For ex., with some
> dictionaries tables must not be deleted, but "normalized":
> "<td>text1<td>text2" => "<table><tr><td>text1<td>text2</table>"

Both the before and after forms show big syntax errors. I hope you
understand HTML syntax. If not, this may be more difficult than I thought.

> Or even XHTMLish "<table><tr><td>text1</td><td>text2</td></tr></table>"

Well, your description of the problem is way too general for any progress
toward a solution.

Perhaps you could post what you consider to be the desired end result for a
particular entry from the "dictionary" site of your choice.

By the way (my boilerplate remark about page scraping), if this is for any
purpose other than your own personal use, it represents a copyright
problem.

I want to emphasize this is not difficult at all, once there is a clear
statement of purpose. It can be done in a few (maybe a few dozen) lines of
Ruby code.
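
To give an idea of the scale, here is a minimal sketch, assuming the purpose
is the one quoted above: delete tables and images outright, and strip every
tag except p, ul, ol, li, b and i while keeping the text inside. The names
ALLOWED_TAGS and strip_non_content are made up for illustration, and the
regular-expression approach is only one possible choice. It does not handle
nested tables, unclosed tables, comments or scripts.

```ruby
# Hypothetical whitelist: only the tags actually named in the thread.
ALLOWED_TAGS = %w[p ul ol li b i].freeze

# Sketch only. Removes complete <table>...</table> blocks and <img> tags,
# then drops every remaining tag that is not on the whitelist. The text
# inside a dropped tag is kept, and attributes on kept tags are discarded.
def strip_non_content(html, allowed = ALLOWED_TAGS)
  # Non-greedy match, so nested tables are NOT handled correctly.
  text = html.gsub(%r{<table\b.*?</table\s*>}mi, '')
  text = text.gsub(/<img\b[^>]*>/i, '')
  text.gsub(%r{<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>}) do
    closing = Regexp.last_match(1)
    name = Regexp.last_match(2).downcase
    allowed.include?(name) ? "<#{closing}#{name}>" : ''
  end
end

sample = '<div><p class="x">Word <b>bold</b></p>' \
         '<table><tr><td>x</td></tr></table><img src="a.png"></div>'
puts strip_non_content(sample)
# prints: <p>Word <b>bold</b></p>
```

Whether this counts as "consistent" HTML depends entirely on the statement
of purpose, which is why that has to come first.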

-- 
Paul Lutus
http://www.arachnoid.com