On 30.03.2007 17:34, Jon wrote:
> I'm trying to translate a strange derivative of xml into valid xml. Here
> is an example line:
> 
> <SUBEVENTSTATUS
> 1:2><OPERATIONNAME></OPERATIONNAME>gofast<OPERATIONSTATUS>stopped</OPERATIONSTATUS><TARGETOBJECTNAME>name</TARGETOBJECTNAME><TARGETOBJECTVALUE>val</TARGETOBJECTVALUE></SUBEVENTSTATUS
> 1:1><SUBEVENTSTATUS 2:2><......and on
> 
> REXML pukes on the <SUBEVENTSTATUS 1:2> tag... which it should. There
> should be some kind of attribute declaration instead. I want to
> translate it to something like this: <SUBEVENTSTATUS no="1" of="2">
> 
> I'm trying to make a regex to detect the funny tags. Here is what I have
> so far:
> 
> xml_fix=/<(\S+)\s+(\d+):(\d+)>/
> 
> This is great, but it will match this:
> 
> <Request><code_set_list 1:2>
> 
> instead of just this:
> 
> <code_set_list 1:2>
> 
> ..because there is no guaranteed whitespace between tags. Basically, I
> need to stop matching if a ">" is found. I've never had to deal with
> anything quite like this in my regex experience. Any help or thoughts of
> a better way to do things is much appreciated!

I can think of several solutions:

/<([^>\s]+)\s+(\d+):(\d+)>/
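
As a rough sketch of how that pattern could drive the translation you describe (the constant XML_FIX and the method name fix_tags are made up for illustration, and the no/of attribute names are taken from your mail):

```ruby
# Tag name may contain neither ">" nor whitespace, so the match
# can no longer start inside a preceding tag such as <Request>.
XML_FIX = /<([^>\s]+)\s+(\d+):(\d+)>/

# Hypothetical helper: rewrite every "<NAME n:m>" as
# "<NAME no="n" of="m">" and leave everything else untouched.
def fix_tags(text)
  text.gsub(XML_FIX) { %(<#{$1} no="#{$2}" of="#{$3}">) }
end

puts fix_tags('<Request><code_set_list 1:2>')
```

Note that your sample also shows the numbers on a closing tag (</SUBEVENTSTATUS 1:1>). This pattern would rewrite that one too, so closing tags probably need separate treatment.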

Or even a two-phase approach. First pick out each tag with

/<[^>]+>/

and then, on each matched tag, look for the trailing numbers with
/(\d+):(\d+)>\z/
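
The two-phase idea might look roughly like this. The method name fix_tags_two_phase is invented for the example, and the replacement text is again an assumption based on the no/of format you asked for:

```ruby
# Hypothetical helper for the two-phase variant.
def fix_tags_two_phase(text)
  # Phase 1: hand every complete tag to the block, one at a time.
  text.gsub(/<[^>]+>/) do |tag|
    # Phase 2: if the tag ends in "n:m>", swap that tail for
    # attributes; tags without it are returned unchanged.
    tag.sub(/(\d+):(\d+)>\z/) { %(no="#{$1}" of="#{$2}">) }
  end
end

puts fix_tags_two_phase('<Request><code_set_list 1:2>')
```

Since the second regex only ever sees a single tag, it does not have to care about where the tag begins.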

HTH

	robert