Hi Kubo,

I tried proceeding with the above-mentioned APIs. However, I am seeing
some puzzling results, and I am not sure if I am using the right constructs.

Below is the Ruby script I am using:

======================================
#encoding: utf-8

puts "Results in C extension"
puts "----------------------"
require 'ibm_db'
str = "insert into woods (name) values ('GHRING')"

conn = IBM_DB.connect 'DRIVER={IBM DB2 ODBC DRIVER};DATABASE=devdb;HOSTNAME=9.124.159.74;PORT=50000;PROTOCOL=TCPIP;UID=db2admin;PWD=db2admin;','',''
stmt = IBM_DB.exec conn, str
IBM_DB.close conn

print "----------------------\n\n"

puts "Results in Ruby script"
puts "----------------------"

puts "str.length is :#{str.length}"
puts "str.bytesize: #{str.bytesize}"
puts "**Forcing encoding**"
str1 = str.force_encoding("UTF-16LE")
puts "str.length is :#{str1.length}"
puts "str.bytesize: #{str1.bytesize}"
======================================

In the script above, IBM_DB is the C extension module. The database
call itself has nothing to do with the Unicode API usage; I have
just reused the module to try out the Unicode support.

The snippet in C extension that uses the unicode functions is as
below:

======================================
VALUE ibm_db_exec(int argc, VALUE *argv, VALUE self) {
  VALUE connection, stmt, options;

  rb_scan_args(argc, argv, "21", &connection, &stmt, &options);
  if (!NIL_P(stmt)) {
    VALUE stmt_ucs2;
    rb_encoding *enc_received;
    rb_encoding *ucs2_enc = rb_enc_find("UTF-16LE");
    rb_encoding *ucs4_enc = rb_enc_find("UTF-32LE");

    enc_received = rb_enc_from_index(ENCODING_GET(stmt));

    printf("\nString in received format: %s\n", RSTRING_PTR(stmt));
    /* Note: rb_str_length() returns a VALUE (a Ruby Integer), not a C int */
    printf("\nrb_str_length is: %d\n", rb_str_length(stmt));
    printf("\nRSTRING_LEN is: %ld\n", RSTRING_LEN(stmt));
    printf("\nEncoding format received: %s\n", enc_received->name);

    stmt_ucs2 = rb_str_export_to_enc(stmt, ucs2_enc);

    printf("\nString in utf16 format: %s\n", RSTRING_PTR(stmt_ucs2));
    printf("\nrb_str_length is: %d\n", rb_str_length(stmt_ucs2));
    printf("\nRSTRING_LEN is: %ld\n", RSTRING_LEN(stmt_ucs2));
    printf("\nEncoding after conversion: %s\n", ucs2_enc->name);
  }
  return Qnil;
}

======================================

Running the above Ruby script produces the following output:

======================================

Results in C extension
----------------------

String in received format: insert into woods (name) values
('GHRING')

rb_str_length is: 89

RSTRING_LEN is: 47

Encoding format received: UTF-8

String in utf16 format: i   # expected, since printf stops at the first NUL byte of the UTF-16 data

rb_str_length is: 89

RSTRING_LEN is: 88

Encoding after conversion: UTF-16LE
----------------------

Results in Ruby script
----------------------
str.length is :44
str.bytesize: 47
**Forcing encoding**
str.length is :24
str.bytesize: 47

======================================
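One thing that may explain the "89" above: rb_str_length returns a VALUE (a tagged Ruby Integer), not a C long, so passing it straight to printf with %d prints CRuby's tagged representation, which for a small integer n is 2n + 1; for a length of 44 that is 2 * 44 + 1 = 89. A quick Ruby-level check of this (in CRuby the tag is also visible as the object_id of a small integer):

```ruby
# CRuby stores a small Integer n as the tagged word 2n + 1,
# which is also its object_id.
len = 44
puts len.object_id            # prints 89 -- what %d on the raw VALUE shows
puts (len.object_id - 1) / 2  # prints 44 -- the actual character length
```

So the 89 in both C-extension length lines corresponds to an actual length of 44; use FIX2LONG (or NUM2LONG) on the returned VALUE before printing it.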

I am not sure why the string length differs between the original
string [44] (in UTF-8) and the same string after forcing the encoding
[24] (to UTF-16LE). The output from the C extension is similarly
confusing: the reported length and the bytesize are nearly the same
(off by one), and the length differs between the two encoding formats.
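For what it's worth, the length difference seems consistent with force_encoding only relabeling the existing bytes (no transcoding), so the same UTF-8 byte sequence gets reinterpreted as 16-bit units, whereas String#encode actually converts the bytes. A minimal sketch with a plain-ASCII stand-in string (not the actual insert statement):

```ruby
str = "Hello DB2!"   # 10 ASCII characters, 10 bytes in UTF-8

# force_encoding relabels the bytes without converting them:
relabeled = str.dup.force_encoding("UTF-16LE")
puts relabeled.length    # 5  -- the same 10 bytes read as five 16-bit units
puts relabeled.bytesize  # 10 -- bytes are untouched

# encode transcodes the bytes to the target encoding:
converted = str.encode("UTF-16LE")
puts converted.length    # 10 -- still ten characters
puts converted.bytesize  # 20 -- two bytes per character now
```

That would match the numbers above: 47 UTF-8 bytes relabeled as UTF-16LE yield 24 units, while rb_str_export_to_enc (like String#encode) preserves the character count and doubles the byte size.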

Could you tell me what I am doing wrong?

Along with this: in a C extension, is there an API I can call to
check whether a given string is in a particular encoding, or should I
use rb_enc_from_index, read the struct's name member, and make that
determination myself in the extension I write?
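(In case it helps frame the question: at the Ruby level the check is an equality comparison on the encoding object itself rather than on its name. My understanding, which I'd like confirmed, is that the C-level analogue would be comparing the rb_encoding pointers, e.g. rb_enc_get(stmt) == rb_enc_find("UTF-16LE"), since registered encodings are singletons. The Ruby-level version:)

```ruby
str = "woods".encode("UTF-16LE")

# Compare the string's encoding object directly, not its name string.
puts str.encoding == Encoding::UTF_16LE  # true
puts str.encoding.name                   # "UTF-16LE"
```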

Thanks

Praveen