Wes Gamble wrote:
> Some more info:
> 
> Recently, I was confronted with a task in one of the apps I'm building: 
> parsing the data in an Excel spreadsheet where the number of rows could 
> be on the order of 30000/40000/50000 or higher.
> 
> Originally, I was using the parseexcel gem to handle the parsing - 
> however, it proved to be fairly slow and consumed a lot of memory.  When 
> I presented it with a > 42000 row spreadsheet, it basically cratered. 
> So I had to figure out another way to handle this problem.  Someone 
> mentioned that there was a nice open source Java-based Excel parser 
> called JExcelAPI (http://jexcelapi.sourceforge.net/).  A quick native 
> Java test showed that the performance and memory footprint would be 
> much better.
> 
> In order to take advantage of JExcelAPI, I looked at JRuby briefly - but 
> still had problems implementing that (and I didn't want to run this app 
> on it yet since it's still so young), so I took a look at some of the 
> Java-Ruby bridges.  I gave one called Rjb 
> (http://arton.no-ip.info/collabo/backyard/?RubyJavaBridge) a shot.  I 
> was very pleasantly surprised - it was really easy to use this to 
> integrate with the JExcelAPI.
> 
> If I understand correctly, Rjb uses JNI to start, and then interact with 
> an available JVM (a JDK, not a JRE).  Works on Windows or UNIX.  You 
> basically embed a JVM in your Ruby interpreter and then load classes 
> into it and start using them.  Basic type casting to/from Java types is 
> done for you.  The documentation is terrible but there's just enough of 
> it to get you started.
> 
> Here's what I did:
> 
> 0) Get the Rjb gem using "gem install rjb"
> 1) Put the JAR file that I wanted to use - jxl.jar in my RAILS_ROOT/lib 
> directory.
> 2) Start the JVM using Rjb::load("#{RAILS_ROOT}/lib/jxl.jar", 
> ['-Xms256M', '-Xmx512M']) - the array is a set of parameters to send to 
> the JVM for startup.
> 3) Load classes using Rjb::import(classname)
> 
> Here's an example of using it in my app:
> 
>   file_class = Rjb::import('java.io.File')
>   workbook_class = Rjb::import('jxl.Workbook')
>   workbook = workbook_class.getWorkbook(file_class.new(filename))
> 
> Some things to notice:
> * filename is a Ruby string - that's being passed straight to the 
> java.io.File constructor.
> * The return of the call to file_class.new is a wrapped Java File object 
> and can be immediately passed to the getWorkbook method.
> * workbook is a Java object that can then be used in other parts of the 
> app.
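For anyone following along, here is a minimal sketch of how the steps and the snippet above might fit together to read every cell of a sheet. It assumes the rjb gem, a JDK, and jxl.jar are available. The helper names open_workbook and each_row are made up for illustration and are not from the original post. The JExcelAPI calls used (getSheet, getRows, getColumns, getCell, getContents, close) are standard jxl methods, but I have not verified how Rjb resolves each one, so treat this as a starting point rather than tested production code.

```ruby
# Sketch only: walk the rows of a sheet through Rjb + JExcelAPI.
# open_workbook and each_row are hypothetical helper names.

# Start the embedded JVM with jxl.jar on the classpath and open the file.
def open_workbook(jar_path, filename)
  require 'rjb'                                  # loaded lazily; needs a JDK
  Rjb::load(jar_path, ['-Xms256M', '-Xmx512M'])  # classpath, then JVM arguments
  file_class     = Rjb::import('java.io.File')
  workbook_class = Rjb::import('jxl.Workbook')
  workbook_class.getWorkbook(file_class.new(filename))
end

# Yield each row of a sheet as an array of cell contents.
# Note that JExcelAPI's getCell takes (column, row), in that order.
def each_row(sheet)
  columns = sheet.getColumns
  sheet.getRows.times do |r|
    yield Array.new(columns) { |c| sheet.getCell(c, r).getContents }
  end
end

# Possible usage (assumption: the first sheet holds the data):
#   workbook = open_workbook("#{RAILS_ROOT}/lib/jxl.jar", filename)
#   each_row(workbook.getSheet(0)) { |row| puts row.join("\t") }
#   workbook.close   # releases the workbook's resources inside the JVM
```

Because each_row only relies on the sheet responding to getRows, getColumns and getCell, it can be exercised with a plain Ruby stand-in object before a JVM is involved at all.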
> 
> The good news: Once you get past loading a class and/or instantiating an 
> object, doing method calls is as simple as just calling the methods on 
> the Java objects you've instantiated or received from other method 
> calls.
> 
> The bad news: This is so seamless, it would be very easy to forget that 
> some of the objects that you're dealing with are effectively Java 
> objects, and then you might forget how to use them correctly.
> 
> For production for this app, I may need to change approaches since I'll 
> probably be running multiple Mongrel processes and I don't know if I 
> want to have one embedded JVM per process (if I understand Mongrel 
> deployment correctly - currently I'm on Apache/FastCGI, so I know it's a 
> problem there).  That may force me to host this Excel parser component 
> in a separate process and expose it over DRb so that it can be used from 
> anywhere (if it comes to that, I could also do a Web service-y thing on 
> top of a JRuby process or whatever).
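The separate-process idea could look roughly like the sketch below, using DRb from Ruby's standard library. The class name ExcelParserService, the helper names, and the port are all hypothetical. The one design point I would watch is that the service should hand back plain Ruby arrays and strings, since wrapped Java objects cannot be marshalled across DRb. I have not run this against Rjb itself, so treat it as an outline.

```ruby
require 'drb/drb'

# Hypothetical front object for the parser process. The block passed to
# new is where the Rjb/JExcelAPI work would go; it must return plain Ruby
# data (arrays of strings) so that DRb can marshal the result.
class ExcelParserService
  def initialize(&parser)
    @parser = parser
  end

  def parse(filename)
    @parser.call(filename)
  end
end

# Run in the single process that embeds the JVM.
def start_parser_server(uri, service)
  DRb.start_service(uri, service)
  DRb.uri   # the URI the service is actually listening on
end

# Run in each Mongrel (or FastCGI) process; returns a remote proxy.
def connect_parser(uri)
  DRbObject.new_with_uri(uri)
end

# Possible usage (the port number is an arbitrary example):
#   server:  start_parser_server('druby://localhost:9000',
#                                ExcelParserService.new { |f| rows_from(f) })
#            DRb.thread.join
#   client:  rows = connect_parser('druby://localhost:9000').parse(path)
```

Here rows_from stands for whatever code opens the workbook and collects its rows; it is a placeholder, not an existing method.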
> 
> Hope this is useful for someone.
> 
> Wes

That seems to me to be an awfully roundabout way of doing things. I
think you could hack something up with ODBC to just take your whole
spreadsheet and upload it to a database, then use Rails or DBI to read
the database.