Hi,

2010/2/16 Praveen <praveendevarao / gmail.com>:
> Hi Kubo,
>
> I tried proceeding with the above mentioned APIs. However I am seeing
> some interesting stuffs. Not sure if I am using the right constructs.
>
> Below is the Ruby script I am using:
>
> ======================================
> #encoding: utf-8
>
> puts "Results in C extension"
> puts "----------------------"
> require 'ibm_db'
> str = "insert into woods (name) values ('GHRINGʸ')"
>
> conn = IBM_DB.connect 'DRIVER={IBM DB2 ODBC DRIVER};DATABASE=devdb;HOSTNAME=9.124.159.74;PORT=50000;PROTOCOL=TCPIP;UID=db2admin;PWD=db2admin;','',''
> stmt = IBM_DB.exec conn, str
> IBM_DB.close conn
>
> print "----------------------\n\n"
>
> puts "Results in Ruby script"
> puts "----------------------"
>
> puts "str.length is :#{str.length}"
> puts "str.bytesize: #{str.bytesize}"
> puts "**Forcing encoding**"
> str1 = str.force_encoding("UTF-16LE")
> puts "str.length is :#{str1.length}"
> puts "str.bytesize: #{str1.bytesize}"
> ======================================
>
> In the script above, IBM_DB is the C extension module. However the
> database call has got nothing to do with the unicode API usage. I have
> just reused the module for trying the unicode support.
>
> The snippet in C extension that uses the unicode functions is as
> below:
>
> ======================================
> VALUE ibm_db_exec(int argc, VALUE *argv, VALUE self){
> rb_scan_args(argc, argv, "21", &connection, &stmt, &options);
> if (!NIL_P(stmt)) {
>  rb_encoding *enc_received;
>  rb_encoding *ucs2_enc = rb_enc_find("UTF-16LE");
>  rb_encoding *ucs4_enc = rb_enc_find("UTF-32LE");
>
>  enc_received = rb_enc_from_index(ENCODING_GET(stmt));
>
>  printf("\nString in received format: %s\n",RSTRING_PTR(stmt));
>  printf("\nrb_str_length is: %d\n",rb_str_length(stmt));
>  printf("\nRSTRING_LEN is: %d\n",RSTRING_LEN(stmt));
>  printf("\nEncoding format received: %s\n",enc_received->name);
>
>  stmt_ucs2 = rb_str_export_to_enc(stmt,ucs2_enc);
>
>  printf("\nString in utf16 format: %s\n",RSTRING_PTR(stmt_ucs2));
>  printf("\nrb_str_length is: %d\n",rb_str_length(stmt_ucs2));
>  printf("\nRSTRING_LEN is: %d\n",RSTRING_LEN(stmt_ucs2));
>  printf("\nEncoding after conversion: %s\n",ucs2_enc->name);
> }
> }
>
> ======================================
>
> The above ruby script run produces the following output:
>
> ======================================
>
> Results in C extension
> ----------------------
>
> String in received format: insert into woods (name) values
> ('GHRING')
>
> rb_str_length is: 89
>
> RSTRING_LEN is: 47
>
> Encoding format received: UTF-8
>
> String in utf16 format: i #Expected because used printf
>
> rb_str_length is: 89
>
> RSTRING_LEN is: 88
>
> Encoding after conversion: UTF-16LE
> ----------------------
>
> Results in Ruby script
> ----------------------
> str.length is :44
> str.bytesize: 47
> **Forcing encoding**
> str.length is :24
> str.bytesize: 47
>
> ======================================
>
> I am not sure why is there a difference in the string length in the
> original string [44] (UTF-8 format) and string after changing the
> encoding [24] (to UTF-16LE). The same is the case in case of output in
> the C extension, the bytesize and the length are same (+1 or -1) and
> the length is different in different encoding formats.
>
89 is not a C integer but a VALUE. A VALUE of 89 represents the Fixnum 44
(89 = 44 * 2 + 1), which matches the str.length printed by the Ruby script.
> Could you tell me what is that I am doing wrong?
>
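You should use String#encode instead of String#force_encoding.
force_encoding only relabels the existing bytes as UTF-16LE without
converting them, which is why the bytesize stays at 47 while the length
changes. Use encode like this: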

puts "**Converting encoding**"
str1 = str.encode("UTF-16LE")
puts "str.length is :#{str1.length}"
puts "str.bytesize: #{str1.bytesize}"
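
To make the difference concrete, here is a minimal sketch with a shorter,
hypothetical sample string (not the original statement). It assumes only the
standard String#force_encoding and String#encode methods. The helper names are
made up for illustration. Note that force_encoding modifies its receiver, so
the sketch works on a copy.

```ruby
# Hypothetical sample: six ASCII letters plus U+02B8, which takes 2 bytes in UTF-8.
SAMPLE = "GHRING\u02B8"

# Relabel the same bytes as UTF-16LE. No bytes are converted.
# dup is used because force_encoding changes the receiver in place.
def relabel_as_utf16le(s)
  s.dup.force_encoding("UTF-16LE")
end

# Transcode to UTF-16LE. The bytes are rewritten, the characters are kept.
def convert_to_utf16le(s)
  s.encode("UTF-16LE")
end

relabeled = relabel_as_utf16le(SAMPLE)
converted = convert_to_utf16le(SAMPLE)

puts "original:  length=#{SAMPLE.length} bytesize=#{SAMPLE.bytesize}"
puts "relabeled: length=#{relabeled.length} bytesize=#{relabeled.bytesize}"
puts "converted: length=#{converted.length} bytesize=#{converted.bytesize}"
```

The relabeled copy keeps its byte count, but the bytes are now read two at a
time, so the character count shrinks. The converted string keeps its character
count and its byte count grows.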

> Along with this, in C extension is there any API that I can call to
> check if the given string is in a particular encoding or should I use
> rb_enc_from_index and from there read the struct member name and
> determine in the extension that I write?
>
Using rb_enc_get is simpler than rb_enc_from_index, like this:
    enc_received = rb_enc_get(stmt);
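
For comparison, the Ruby-level counterpart of this check is String#encoding.
The sketch below is only an illustration on the Ruby side, not the C API, and
the helper name encoded_as? is hypothetical.

```ruby
# Returns true if the string is tagged with the named encoding.
# Encoding.find raises ArgumentError for an unknown encoding name.
def encoded_as?(str, name)
  str.encoding == Encoding.find(name)
end

utf8  = "GHRING\u02B8"            # a "\u" escape yields a UTF-8 string
utf16 = utf8.encode("UTF-16LE")   # transcoded copy

puts encoded_as?(utf8, "UTF-8")
puts encoded_as?(utf16, "UTF-16LE")
puts encoded_as?(utf16, "UTF-8")
```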

And, rb_str_length returns not an integer but a VALUE. So you should
use NUM2INT like this:
    printf("\nrb_str_length is: %d\n",NUM2INT(rb_str_length(stmt)));
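
The 89 versus 44 relation can be reproduced with plain arithmetic. This is
only a sketch of the small-integer tagging (a Fixnum n is stored as the word
2n + 1), written in Ruby for illustration. The helper names are hypothetical.
In C you would simply use NUM2INT as shown above.

```ruby
# Tag a small integer the way a Fixnum is stored in a VALUE: shift left, set the low bit.
def fixnum_to_value_bits(n)
  (n << 1) | 1
end

# Undo the tagging: drop the low flag bit.
def value_bits_to_fixnum(bits)
  bits >> 1
end

puts fixnum_to_value_bits(44)   # the raw number that "%d" printed
puts value_bits_to_fixnum(89)   # the string length it stands for
```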

Regards,

Park Heesob