On Mon, Apr 11, 2005 at 06:21:39AM +0900, Sam Roberts wrote:
> You will also have to take into account invalidly encoded DER, though,
> unless you can really take the moral high-ground and refuse to interop
> with invalid DER. It's quite common for implementations to neglect the
> leading zero necessary to make INTEGER positive if the high bit is set,

But then, they are actually sending you a negative value, are they not?

What I mean is, there's no ambiguity. If somebody sends you b11111111 then
it's -1, not 255, and there's no question about it. It must be a contextual
thing to decide that -1 is an invalid value here, and that therefore the
sender 'must' have meant 255.

> for example. So, when you reencode (correctly) you don't have the same
> input. There's a whole set of common errors like this.

Ah. OK, I can see the case where you receive b11111111 b11111111 - you could
either reject this as an invalid encoding (the BER rules say that it is), or
you could decode it as -1, in which case you'll generate a different
encoding when you re-encode.

I'd prefer to take the view that the encoding is invalid: the standard is
absolutely unambiguous.
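
A minimal sketch of the strict reading, assuming the content octets of the INTEGER have already been separated from the tag and length. The method name decode_integer is made up for illustration. It treats the octets as two's complement and rejects any encoding whose first nine bits are all zeros or all ones, which is the case the encoding rules forbid:

```ruby
# Hypothetical helper: decode the content octets of a BER/DER INTEGER.
def decode_integer(octets)
  bytes = octets.bytes
  raise ArgumentError, "empty INTEGER" if bytes.empty?

  if bytes.size > 1
    # The first nine bits must not be all zeros or all ones.
    padded_zero = bytes[0] == 0x00 && bytes[1] < 0x80
    padded_ones = bytes[0] == 0xFF && bytes[1] >= 0x80
    raise ArgumentError, "non-minimal INTEGER encoding" if padded_zero || padded_ones
  end

  # Build the unsigned value, then subtract 2**(8*n) if the sign bit is set.
  value = bytes.inject(0) { |acc, b| (acc << 8) | b }
  value -= 1 << (8 * bytes.size) if bytes[0] >= 0x80
  value
end
```

With this, a single 0xFF octet decodes as -1, 0x00 0xFF decodes as 255, and 0xFF 0xFF raises.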

However, if I really needed to interoperate with something so broken, I'd
probably define an UNSIGNEDINTEGER type internally. It would encode using
the same universal tag as INTEGER, but the value would be treated as
unsigned. Hence b11111111 would be 255 and b11111111 b11111111 would be
65535. Propagating invalid encodings in this way should be something of a
last resort.
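
A rough sketch of what such an unsigned variant might do, purely as an illustration. Both method names are hypothetical, and the encoder deliberately omits the leading zero octet so that it reproduces what the broken peer sent:

```ruby
# Hypothetical lenient decoder: treat the content octets as unsigned.
def decode_unsigned_integer(octets)
  octets.bytes.inject(0) { |acc, b| (acc << 8) | b }
end

# Hypothetical matching encoder: big-endian, no sign padding.
def encode_unsigned_integer(value)
  raise ArgumentError, "negative value" if value < 0
  bytes = []
  loop do
    bytes.unshift(value & 0xFF)
    value >>= 8
    break if value.zero?
  end
  bytes.pack("C*")
end
```

Here 0xFF decodes as 255 and 0xFF 0xFF as 65535, and both re-encode to the octets they came from. An input that does carry a leading zero octet would still not round-trip.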

I'd be interested to know what the other common errors are that you mention.
This is the sort of knowledge which only an experienced implementor will
have...

> > If there's a possibility that a single attribute will be one of multiple
> > types, then it should be wrapped in an ASN.1 'choice'
> 
> ASN.1 choices aren't a "wrapping" in the sense that you see any wrapping
> in the BER or DER encoding, not unless you tag, anyhow. When an ASN.1
> choice appears, you literally encode whichever one you want.

Yes indeed. What I meant was, if I have
      foo   PrintableString,
      bar   UTF8String,

then I can assign @foo = "xxx" and @bar = "yyy", i.e. using native Ruby
strings, since when it comes to re-encoding them I'll know what ASN.1 type
to use from the ASN.1 definition for each attribute.

However if foo were a CHOICE between PrintableString and UTF8String, then
this information would be lost. One solution would be to decode as
     @foo = PrintableString.new("xxx")
or
     @foo = UTF8String.new("xxx")
in which case the class of foo carries forward that information. But that
makes a new object with an instance variable (say @value) holding the
string. Alternatively that information could be recorded in the singleton
class of the object:
     @foo = "xxx"
     @foo.extend PrintableString

That may be cleaner, although this metadata is easily lost:

     @foo.downcase!        # keeps singleton class
     @foo = @foo.downcase  # loses it
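
To make that concrete, here is a small sketch. The marker modules and the asn1_tag_for helper are invented for the example; the tag numbers are the universal tags for the two string types (19 and 12).

```ruby
# Hypothetical marker modules carrying the universal tag number.
module PrintableString
  ASN1_TAG = 19
end

module UTF8String
  ASN1_TAG = 12
end

# Hypothetical helper: find which ASN.1 string type a value was marked with.
def asn1_tag_for(str)
  marker = [PrintableString, UTF8String].find { |m| str.is_a?(m) }
  marker && marker::ASN1_TAG
end

foo = String.new("XXX")      # unfrozen string, so it can be extended
foo.extend(PrintableString)

foo.downcase!                # same object, still marked
p asn1_tag_for(foo)

foo = foo.downcase           # new String object, marker is gone
p asn1_tag_for(foo)
```

The first call still finds the PrintableString marker; the second returns nil, so an encoder would have to fall back to a default or raise.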

> This is the
> common case for strings, for example. ASN.1 to BER/DER is one-way, there
> are numbers of places where you cannot infer the ASN.1 from the
> encoding. Not necessarily a criticism, just an observation.

Yes, I gathered that. That's why you'd need to carry metadata about the
required ASN.1 encodings with the class, or (in some cases, as outlined
above) individual values.

> > Incidentally, Ruby's ASN.1 library does appear to have a 'traverse' method
> > which acts as a stream parser. You still need to build a suitable state
> > machine for it to 'yield' each element to, of course.
> 
> Probably built on top of openssl's tree-based routines, so you pay the
> memory cost, and the complexity cost.

ossl_asn1_decode0 is basically a loop on ASN1_get_object, and as far as I
can tell that just walks along a DER stream in memory, updating a start
pointer as it goes. So it should work along an object in its linear form,
not having expanded to a tree; and with mmap() I guess it could work
directly from a file too. It calls itself recursively when it meets a
constructed item.

$ cat traverse.rb
require 'openssl'

a = "\xA1\x0A\x43\x08..test.."
OpenSSL::ASN1.traverse(a) { |y| p y }

$ ruby traverse.rb
[0, 0, 2, 10, true, :CONTEXT_SPECIFIC, 1]
[1, 2, 2, 8, false, :APPLICATION, 3]
$

The parameters to the block seem to be (looking at ext/openssl/ossl_asn1.c):
 depth
 start offset
 header length
 data length
 constructed flag (true = constructed, false = primitive)
 tag class
 tag
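
Going by that reading of the source, the seven values can be given names. This is only a sketch: the TLV struct and list_tlvs are hypothetical, and the field order is taken from the output shown above rather than from any documentation.

```ruby
require 'openssl'

# Hypothetical record type naming the seven values yielded by traverse.
TLV = Struct.new(:depth, :offset, :header_length, :data_length,
                 :constructed, :tag_class, :tag)

# Collect one TLV per element found in the DER string.
def list_tlvs(der)
  items = []
  # traverse yields a single array per element; splat it into the struct.
  OpenSSL::ASN1.traverse(der) { |fields| items << TLV.new(*fields) }
  items
end

list_tlvs("\xA1\x0A\x43\x08..test..".b).each do |tlv|
  kind = tlv.constructed ? "constructed" : "primitive"
  puts "#{'  ' * tlv.depth}#{tlv.tag_class} #{tlv.tag} (#{kind}), " \
       "#{tlv.data_length} bytes at offset #{tlv.offset + tlv.header_length}"
end
```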

A more friendly API could be a stream of tag_start / data / tag_end method
calls on an object, like an REXML stream parser.
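
One way this might look, layered on top of traverse. Everything here apart from OpenSSL::ASN1.traverse is made up for the sketch (the listener class, the method names and their arguments), and it assumes definite lengths only, since it works out where each constructed item ends from its offset and length:

```ruby
require 'openssl'

# Hypothetical listener that simply records the events it receives.
class RecordingListener
  attr_reader :events

  def initialize
    @events = []
  end

  def tag_start(tag_class, tag, constructed)
    @events << [:tag_start, tag_class, tag, constructed]
  end

  def data(bytes)
    @events << [:data, bytes]
  end

  def tag_end(tag_class, tag)
    @events << [:tag_end, tag_class, tag]
  end
end

# Turn the flat traverse output into tag_start / data / tag_end calls.
def stream_asn1(der, listener)
  open_items = [] # stack of [end_offset, tag_class, tag] for constructed items

  # Close every constructed item that ends at or before the given offset.
  close_finished = lambda do |offset|
    while !open_items.empty? && open_items.last[0] <= offset
      _end, tag_class, tag = open_items.pop
      listener.tag_end(tag_class, tag)
    end
  end

  OpenSSL::ASN1.traverse(der) do |_depth, offset, hlen, len, constructed, tag_class, tag|
    close_finished.call(offset)
    listener.tag_start(tag_class, tag, constructed)
    if constructed
      open_items.push([offset + hlen + len, tag_class, tag])
    else
      listener.data(der.byteslice(offset + hlen, len))
      listener.tag_end(tag_class, tag)
    end
  end

  close_finished.call(der.bytesize)
  listener
end
```

Calling stream_asn1 on the test string from traverse.rb with a RecordingListener gives a tag_start for the outer item, then tag_start, data and tag_end for the inner one, and finally the outer tag_end.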

I don't think the reverse exists, i.e. for taking a stream of these tags and
turning them into DER/CER.

Shame that none of this appears to be documented! Somebody has taken a lot
of time to wrap openssl's ASN.1 parsing for Ruby, but anyone who wants to
use it (like me) has to do quite a bit of work to reverse-engineer the API.

> Anyhow, mostly I just wanted to say writing a stream-based BER/DER
> decoder in ruby would be easy. Writing stream-based DER encoders is
> impossible, unfortunately (the output size is encoded at the beginning,
> they should have used CER more often, but it's too late now), but
> stream-based BER encoders are also easy.

Understood. Once upon a time I wrote a one-pass machine-code assembler that
used to rewind to previous points and insert branch offsets once it was able
to resolve a label :-)

DER makes this a bit more difficult with the variable-sized encoding of the
length octets, but I think it could be made a two-pass operation. Or you
could write out as CER, and then have a two-pass CER to DER converter (pass
one reads in the CER and writes out some auxiliary data about lengths seen;
pass two reads the CER again and merges in the length data to create DER).
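
A sketch of the two-pass idea on an in-memory tree rather than a CER stream, to keep it short. The node layout ([tag_byte, content], where content is either a String or an Array of child nodes) and all method names are assumptions for the example, and only single-octet tags are handled:

```ruby
# DER definite-length octets: short form below 128, long form otherwise.
def der_length(n)
  return [n].pack("C") if n < 0x80
  bytes = []
  while n > 0
    bytes.unshift(n & 0xFF)
    n >>= 8
  end
  ([0x80 | bytes.size] + bytes).pack("C*")
end

# Pass one: record each node's content length in pre-order.
# Returns the total encoded size of the node (tag + length octets + content).
def measure(node, lengths)
  _tag, content = node
  slot = lengths.size
  lengths << nil # reserve this node's slot before visiting its children
  len = if content.is_a?(String)
          content.bytesize
        else
          content.sum { |child| measure(child, lengths) }
        end
  lengths[slot] = len
  1 + der_length(len).bytesize + len
end

# Pass two: walk the tree in the same order and merge in the recorded lengths.
def emit(node, lengths, out)
  tag, content = node
  len = lengths.shift
  out << [tag].pack("C") << der_length(len)
  if content.is_a?(String)
    out << content.b
  else
    content.each { |child| emit(child, lengths, out) }
  end
  out
end

# Convenience wrapper running both passes.
def encode_der(node)
  lengths = []
  measure(node, lengths)
  emit(node, lengths, String.new)
end
```

For the tree [0xA1, [[0x43, "..test.."]]] the first pass records lengths 10 and 8, and the second pass writes the same twelve octets as the test string in traverse.rb.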

Regards,

Brian.