Although a string in Ruby is defined as a sequence
of arbitrary (i.e. unrestricted) bytes, there is
no way to specify the initial contents of a string
in a manner that reflects this. That is, there is
no 'hex string literal'. It is of course possible
to use escapes within a normal character string,
but for strings with large numbers of unprintable
characters this is a clumsy solution (and the
particular itch I needed to scratch). This
proposal is an attempt to address that omission.
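
For context, here is a small sketch of what can be done today without any new syntax. It compares \x escapes against Array#pack with the 'H*' directive, a runtime call rather than a literal. The variable names are only illustrative.

```ruby
# Option 1: \x escapes inside an ordinary double-quoted string.
escaped = "\x01\x23\x45\x67\x89\xab\xcd\xef"

# Option 2: Array#pack with 'H*' (hex digits, high nibble first).
# This runs at run time; it is a method call, not a literal.
packed = ["0123456789abcdef"].pack("H*")

# Compare byte-wise so that string encodings do not affect the result.
p escaped.bytes == packed.bytes
```

Both forms produce the same eight bytes; the escaped form needs roughly twice as many characters as the bare hex digits.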

The specification of a hex string seems like a
natural fit for 'general delimited input'. Sadly, the
obvious character, 'x', has been taken already, so
unless we are interested in breaking a serious
amount of code (none of which is mine, so I would
be game - anyone for a religious war?) I suggest
'h' as a suitable replacement.

The functionality is defined as follows:
1) Hex literals may be specified as general
delimited input using the %h prefix
2) Each byte within the string contents is
specified as two hex digits
3) For readability, arbitrary numbers of spaces
may be placed between the bytes (although the two
digits representing a byte must be adjacent)
4) The usual escape sequences (\n, \t etc.) are
allowed
5) Embedded string expressions (#{...}) are also
allowed

This is perhaps best specified by the following
test cases:

    assert_equal "123", %h"313233"
    assert_equal "123", %h| 31 32 33 |
    assert_equal "\x01\x23\x45\x67\x89\xab\xcd\xef\xAB\xCD\xEF",
                 %h"0123 4567 89ab cdef AB CD EF"
    assert_equal "1ac2", %h"31#{'ac'}32"
    assert_equal "1\t3", %h[31\t33]
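
As a rough illustration only (this is not part of the patch), the two-digits-per-byte and optional-spaces rules can be emulated in current Ruby with an ordinary method. The name hex_decode is hypothetical. Raising on an unpaired digit is an assumption, since the proposal only says that the two digits of a byte must be adjacent.

```ruby
# Hypothetical helper, not part of the proposed patch: decodes text made of
# space-separated groups of hex digit pairs into a string of raw bytes.
def hex_decode(text)
  bytes = text.scan(/[^ ]+/).flat_map do |group|
    # Each group must consist of whole bytes, i.e. pairs of hex digits.
    unless group =~ /\A(?:[0-9a-fA-F]{2})+\z/
      raise ArgumentError, "invalid hex group: #{group.inspect}"
    end
    group.scan(/../).map { |pair| pair.hex }
  end
  bytes.pack('C*')
end

p hex_decode("313233")      # => "123"
p hex_decode(" 31 32 33 ")  # => "123"

# Interpolated text is taken verbatim in the proposal, so it is emulated
# here by plain concatenation around the decoded pieces.
p hex_decode("31") + 'ac' + hex_decode("32")  # => "1ac2"
```

Escape sequences such as \t would be handled the same way as the interpolated text above, by concatenating them between decoded pieces.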


The diff for my proposed changes to enable this
(somewhat modest) extension is attached below. I
would be very pleased if these could find their
way into 1.9, though I am not sure what the next
step is toward making this happen. Do I need a
committer to champion my cause? Any volunteers? (I
would also be very happy to make the
corresponding, and fairly obvious, changes for
1.8.6 or 1.8.7 if anyone would be interested.)

As this is my first attempt to work with the ruby
code, I extend my apologies for any violence I may
have done to the ruby coding standards and
conventions, and for any inadvertent breaches of
protocol. Any suggestions and/or comments would be
welcome.

graeme

[graeme@localhost ruby]$ svn diff
Index: sample/test.rb
===================================================================
--- sample/test.rb	(revision 18231)
+++ sample/test.rb	(working copy)
@@ -1466,6 +1466,11 @@
 test_ok("abcd" == "abcd")
 test_ok("abcd" =~ /abcd/)
 test_ok("abcd" === "abcd")
+# general delimited hex strings
+test_ok(%h"31 32 33" === "123")
+test_ok(%h"31#{'ac'}32" === "1ac2")
+test_ok(%h|3132| === "12")
+test_ok(%h[31\t33] === "1\x093")
 # compile time string concatenation
 test_ok("ab" "cd" == "abcd")
 test_ok("#{22}aa" "cd#{44}" == "22aacd44")
Index: parse.y
===================================================================
--- parse.y	(revision 18231)
+++ parse.y	(working copy)
@@ -4870,10 +4870,12 @@
 #define STR_FUNC_QWORDS 0x08
 #define STR_FUNC_SYMBOL 0x10
 #define STR_FUNC_INDENT 0x20
+#define STR_FUNC_HEXSTR 0x40
 
 enum string_type {
     str_squote = (0),
     str_dquote = (STR_FUNC_EXPAND),
+    str_hquote = (STR_FUNC_EXPAND|STR_FUNC_HEXSTR),
     str_xquote = (STR_FUNC_EXPAND),
     str_regexp = (STR_FUNC_REGEXP|STR_FUNC_ESCAPE|STR_FUNC_EXPAND),
     str_sword  = (STR_FUNC_QWORDS),
@@ -5385,6 +5387,10 @@
 	if (paren && c == paren) {
 	    ++*nest;
 	}
+	else if ((func & STR_FUNC_HEXSTR) && c == ' ') {
+//	    c = nextc();
+	    continue;
+	}
 	else if (c == term) {
 	    if (!nest || !*nest) {
 		pushback(c);
@@ -5462,6 +5468,27 @@
 	    pushback(c);
 	    break;
 	}
+
+	else if (func & STR_FUNC_HEXSTR) {
+#define hexval(x) (((x)>='0' && (x)<='9') ? (x)-'0' :        \
+		   ((x)>='a' && (x)<='f') ? (x)-'a'+10 :     \
+		   ((x)>='A' && (x)<='F') ? (x)-'A'+10 : -1)
+#define hexchar(x) (hexval(x)>=0)
+	    if (hexchar(c)) {
+	    	int temp = hexval(c);
+	    	c = nextc();
+	    	if (hexchar(c)) {
+		    c = (temp<<4) + hexval(c);
+	        } else {
+		    pushback(c);
+		    continue;
+		}
+	    } else {
+		yyerror("invalid character in hex literal");
+		continue;
+	    }		
+	}
+
 	if (!c && (func & STR_FUNC_SYMBOL)) {
 	    func &= ~STR_FUNC_SYMBOL;
 	    compile_error(PARSER_ARG "symbol cannot contain '\\0'");
@@ -6983,6 +7010,10 @@
 		lex_strterm = NEW_STRTERM(str_squote, term, paren);
 		return tSTRING_BEG;
 
+	      case 'h':
+		lex_strterm = NEW_STRTERM(str_hquote, term, paren);
+		return tSTRING_BEG;
+
 	      case 'W':
 		lex_strterm = NEW_STRTERM(str_dword, term, paren);
 		do {c = nextc();} while (ISSPACE(c));
Index: test/ruby/test_basicinstructions.rb
===================================================================
--- test/ruby/test_basicinstructions.rb	(revision 18231)
+++ test/ruby/test_basicinstructions.rb	(working copy)
@@ -49,6 +49,18 @@
     assert_equal "xOKx", "x#{s}x"
   end
 
+  def test_hstring
+    assert_equal "12", %h"3132"
+    assert_equal "\x01\x23\x45\x67\x89\xab\xcd\xef\xAB\xCD\xEF",
+			%h"0123456789abcdefABCDEF"
+    s = 'OK'
+    assert_equal "OK", %h"#{s}"
+    assert_equal "OK0", %h"#{s}30"
+    assert_equal "0OK", %h"30#{s}"
+    assert_equal "0OK1", %h"30#{s}31"
+    assert_equal "OK\n2", %h"#{s}\n32"
+  end
+
   def test_dsym
     assert_equal :a3c, :"a#{1+2}c"
     s = "sym"
[graeme@localhost ruby]$