On 10/21/06, Paul Lutus <nospam / nosite.zzz> wrote:
> Harold Hausman wrote:
>
> > On 10/21/06, Paul Lutus <nospam / nosite.zzz> wrote:
> >>
> >> Show us the problem. There are many kinds of character sequences that are
> >> not allowed in XML data fields, and there are a number of ways to escape
> >> the data fields, but they have to be applied in order to work. Arbitrary
> >> data can't simply be dropped between XML delimiters, without certain
> >> precautions being taken.
> >>
> >>
> >> --
> >> Paul Lutus
> >> http://www.arachnoid.com
> >>
> >
> > Hi Paul,
> >
> > It sounds like you might have some experience in this area. Not to
> > hijack the OP, but could you possibly describe the process you would
> > go through if you had a completely random pile of binary barf that you
> > wanted to store as an XML attribute?
>
> Okay, you need to know I am famously lazy. In fact, I think Larry Wall was
> describing me when he made his well-known remark about programmer laziness
> and hubris. Being lazy, the first simple approach I would take is to
> enclose the binary data like this:
>
> <enclosing XML tag><![CDATA[(binary data here)]]></enclosing XML tag>
>
> The next step would be to make sure neither the starting nor ending CDATA tag
> appears in the enclosed binary data, otherwise this strategy will fail.
>
> The next step after that is to escape (and later unescape) the binary data
> if needed to assure the uniqueness of the delimiters.
>
> You need to understand that, with a sufficiently large and varied binary
> data set, every imaginable character string will appear in the data,
> eventually including the delimiters.
>
> This, in turn, means that escaping the data is eventually a requirement, and
> escaping the data means it will be larger than if this step were not
> needed.
>
> You should realize that another, possibly better, approach for truly large
> binary globs is to store them as files, and store links to the files in the
> XML data set, rather than the raw data itself.
>

Thanks for this insight. It's funny, to me, how laziness has become a
defense mechanism. I think *I* personally kind of like it. (:

Storing the binary as a separate file is a great solution. In our
particular case we like to have the data in one big XML file for the
purposes of source control. I'm sure I don't need to expound on the
greatness of plain text on the Ruby list, but the source control
system we use doesn't play exceptionally well with binary files.

Thanks again,
-Harold
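
For anyone who wants to see the "escape and later unescape" step from the
quoted message spelled out, here is a minimal sketch. It assumes Base64 as
the escaping scheme, which is a swapped-in choice and not something either
poster named. The reasoning: a CDATA section cannot appear inside an
attribute value, and XML 1.0 rejects some bytes (NUL, for one) even inside
CDATA, so the bytes have to become plain text first. The helper names and
the <blob> element are made up for illustration. The encoded form is about
a third larger than the raw bytes, which matches the size cost mentioned
above.

```ruby
# Hypothetical helpers for illustration; none of these names come from the thread.

# Encode arbitrary bytes as Base64 with no line breaks ("m0" pack directive).
# The output uses only A-Z, a-z, 0-9, "+", "/" and "=", so it needs no
# further XML escaping inside an attribute value.
def encode_blob(bytes)
  [bytes].pack("m0")
end

# Reverse of encode_blob; raises ArgumentError if the text is not valid Base64.
def decode_blob(text)
  text.unpack1("m0")
end

# Wrap the encoded bytes in a made-up <blob> element.
def blob_element(bytes)
  %(<blob encoding="base64" data="#{encode_blob(bytes)}"/>)
end

# Sample input containing a NUL byte, XML metacharacters and "]]>".
raw = "\x00\xFF<&\"]]>".b
xml = blob_element(raw)

# Pull the attribute value back out (a regex is enough for this sketch;
# a real XML parser would be the sensible choice in practice).
round_trip = decode_blob(xml[/data="([^"]*)"/, 1])

puts xml
puts round_trip == raw
```

Because the encoded value is ordinary ASCII text, the whole XML file stays
diff-friendly for source control, at the cost of the blobs themselves being
unreadable in a diff.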