Martin Bosslet <Martin.Bosslet / googlemail.com> wrote:
> This is related to the proposal in [ruby-core:41321][1].
> 
> I'd like to take advantage of streaming IO in an extension I am
> working on. The problem I'm having is that I don't want to call
> IO#read on the rb_funcall level because that would kill the
> performance due to wrapping the bytes into Ruby objects back and
> forth again.

Is it not possible to start with Ruby String objects (with binary
encoding) and have read(2)/write(2) operate directly on RSTRING_PTR?

> I saw two solutions to my problem:
> 
> 1. Duplicating the file descriptor to obtain a pure FILE*
> like it is done in ext/openssl/ossl_bio.c[2] and continue
> working on the raw FILE*.

That may be from the old 1.8 days when all IO objects wrapped FILE *.
It might be better to use BIO_new_fd() nowadays instead since 1.9
generally prefers bare file descriptors (for all fd > 2).

> 2. Since I really only need to read and write on the stream,
> I was looking for public Ruby C API that would support me
> in the process, and I found
> 
>  - ssize_t rb_io_bufwrite(VALUE io, const void *buf, size_t size)
>  - ssize_t rb_io_bufread(VALUE io, void *buf, size_t size)

Is userspace buffering really necessary in your case?

If you're working with sockets/pipes, I would reckon not (Ruby already
defaults to IO#sync=true on sockets/pipes when writing).  If you're
reading (and probably parsing), you would need to do your own read
buffering anyway, no?
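
To illustrate what I mean by doing your own read buffering, here is a
rough sketch in plain POSIX C (no Ruby API involved).  The names
`struct linebuf` and `linebuf_next` are made up for this example, and
newline-delimited records are only an assumption about what you might
be parsing:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical parser-owned read buffer layered directly on an fd. */
struct linebuf {
    int fd;
    size_t start;      /* offset of the first unconsumed byte */
    size_t end;        /* offset one past the last valid byte */
    char data[4096];
};

/*
 * Hand out the next newline-terminated line without copying it.
 * Returns 1 with *line / *len set (newline excluded), 0 at EOF
 * (an unterminated trailing line is dropped), -1 on error.
 * *line is only valid until the next call.
 */
int linebuf_next(struct linebuf *lb, const char **line, size_t *len)
{
    assert(lb != NULL && line != NULL && len != NULL);

    for (;;) {
        const char *base = lb->data + lb->start;
        const char *nl = memchr(base, '\n', lb->end - lb->start);

        if (nl != NULL) {
            *line = base;
            *len = (size_t)(nl - base);
            lb->start += *len + 1;   /* consume the line and its newline */
            return 1;
        }

        /* No full line yet: slide the partial line to the front. */
        if (lb->start > 0) {
            memmove(lb->data, base, lb->end - lb->start);
            lb->end -= lb->start;
            lb->start = 0;
        }
        if (lb->end == sizeof(lb->data)) {
            errno = EMSGSIZE;        /* line longer than the buffer */
            return -1;
        }

        ssize_t n = read(lb->fd, lb->data + lb->end,
                         sizeof(lb->data) - lb->end);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return 0;
        lb->end += (size_t)n;
    }
}
```

The parser already needs to hold on to partial input between reads,
so a second buffer underneath it would only add another copy.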

> I think both cases are valid use cases, 1. is likely necessary
> if there is the need to pass a FILE* on to an external C library,

It's not easily possible to share userspace buffers in FILE * with
userspace buffers in rb_io_t.  Userspace buffering is pretty miserable
and error-prone whenever/wherever IPC is concerned.

> 2. is for cases like mine where there is the need to operate
> on raw C data types for performance reasons.

It depends on what you're doing, but if performance is a concern you
should try to work on largish chunks straight off the file descriptor
and skip the userspace buffering stages.  Userspace buffering can
improve performance by reducing syscalls, but it can also double the
memory bandwidth required, since every byte gets copied through the
intermediate buffer on its way to or from your own.
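
Roughly what I have in mind, as a plain POSIX C sketch.  The name
`read_chunk` is made up for this example, and it assumes a blocking
descriptor (on a non-blocking one, EAGAIN would come back as an
error here):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill `buf` with up to `cap` bytes from `fd`
 * using read(2) directly, with no intermediate userspace buffer.
 * Returns the number of bytes stored (less than `cap` only at EOF),
 * or -1 on error with errno set.
 */
ssize_t read_chunk(int fd, void *buf, size_t cap)
{
    assert(buf != NULL);

    size_t used = 0;
    while (used < cap) {
        /* The kernel copies straight into the caller's memory. */
        ssize_t n = read(fd, (char *)buf + used, cap - used);

        if (n > 0)
            used += (size_t)n;
        else if (n == 0)
            break;                  /* EOF */
        else if (errno != EINTR)
            return -1;              /* real error; EINTR just retries */
    }
    return (ssize_t)used;
}
```

With a large enough `cap` (tens of kilobytes, say) the syscall count
stays low and each byte is copied only once, from the kernel into
the buffer you actually work on.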