On 10/21/06, Paul Lutus <nospam / nosite.zzz> wrote:
> Harold Hausman wrote:
>
> > On 10/21/06, Paul Lutus <nospam / nosite.zzz> wrote:
> >>
> >> Show us the problem. There are many kinds of character sequences that are
> >> not allowed in XML data fields, and there are a number of ways to escape
> >> the data fields, but they have to be applied in order to work. Arbitrary
> >> data can't simply be dropped between XML delimiters, without certain
> >> precautions being taken.
> >>
> >>
> >> --
> >> Paul Lutus
> >> http://www.arachnoid.com
> >>
> >
> > Hi Paul,
> >
> > It sounds like you might have some experience in this area. Not to
> > hijack the OP, but could you possibly describe the process you would
> > go through if you had a completely random pile of binary barf that you
> > wanted to store as an XML attribute?
>
> Okay, you need to know I am famously lazy. In fact, I think Larry Wall was
> describing me when he made his well-known remark about programmer laziness
> and hubris. Being lazy, the first simple approach I would take is to
> enclose the binary data like this:
>
> <enclosing XML tag><![CDATA[(binary data here)]]></enclosing XML tag>
>
> The next step would be to make sure neither the starting nor the ending
> CDATA tag appears in the enclosed binary data; otherwise this strategy
> will fail.
>
> The next step after that is to escape (and later unescape) the binary data
> if needed to assure the uniqueness of the delimiters.
>
> You need to understand that, with a sufficiently large and varied binary
> data set, every imaginable character string will appear in the data,
> eventually including the delimiters.
>
> This, in turn, means that escaping the data is eventually a requirement, and
> escaping the data means it will be larger than if this step were not
> needed.
>
> You should realize that another, possibly better, approach for truly large
> binary globs is to store them as files, and store links to the files in the
> XML data set, rather than the raw data itself.
>

Thanks for this insight.
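
For the archives, here is a minimal sketch of the delimiter problem
described above, assuming the payload is text that XML can legally carry.
The helper names cdata_wrap and cdata_unwrap are made up for illustration.
The usual trick is to end the CDATA section whenever "]]>" shows up in the
data and immediately start a new one, so the delimiter never appears
inside a section. As far as I know, CDATA sections are only allowed in
element content (not in attribute values), and they still cannot hold
characters XML forbids outright, such as NUL, so this alone does not cover
truly arbitrary binary.

```ruby
# Hypothetical helper: wrap text in a CDATA section, splitting any "]]>"
# in the payload across two adjacent sections so it cannot end one early.
def cdata_wrap(text)
  "<![CDATA[" + text.gsub("]]>", "]]]]><![CDATA[>") + "]]>"
end

# Inverse for this sketch only: collect the contents of every CDATA
# section and join them. A real XML parser does this concatenation itself.
def cdata_unwrap(xml)
  xml.scan(/<!\[CDATA\[(.*?)\]\]>/m).join
end

payload = "before ]]> after"
wrapped = cdata_wrap(payload)
puts wrapped
puts cdata_unwrap(wrapped) == payload
```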

It's funny, to me, how laziness has become a defense mechanism. I
think *I* personally kind of like it. (:

Storing the binary as a separate file is a great solution. In our
particular case we like to have the data in one big xml file for the
purposes of source control. I'm sure I don't need to expound on the
greatness of plain text on the Ruby list, but the source control
system we use doesn't play exceptionally well with binary files.
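
One way to keep everything in a single plain-text file is to Base64-encode
the binary before it goes into the attribute. This is only a sketch under
my own assumptions, not something proposed earlier in the thread: the
element name "blob", the "encoding" and "data" attribute names, and the
helper names are all hypothetical. Base64 output uses only letters,
digits, "+", "/" and "=", none of which needs escaping in an attribute
value, at the cost of the data growing by roughly a third.

```ruby
# Hypothetical helpers; the names are illustrative only.

# Encode arbitrary bytes as Base64 text. The "m0" directive produces
# Base64 with no line breaks, so the result fits on one attribute line.
def encode_blob(bytes)
  [bytes].pack("m0")
end

# Decode Base64 text back into the original bytes (a binary string).
def decode_blob(text)
  text.unpack1("m0")
end

# Build the element by hand to keep the sketch dependency-free;
# a real project would more likely use an XML library for this.
def blob_element(name, bytes)
  %(<#{name} encoding="base64" data="#{encode_blob(bytes)}"/>)
end

# Bytes that would be trouble if dropped into XML unencoded.
raw = "\x00\xFF]]>&<\"binary barf\"".b
xml = blob_element("blob", raw)
puts xml

# Pull the attribute value back out and check the round trip.
encoded = xml[/data="([^"]*)"/, 1]
puts decode_blob(encoded) == raw
```

Since the encoded text sits on one line, a small change to the binary
shows up as a change to that single line in a diff.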

Thanks again,
-Harold