Hello Sunny-

On Nov 26, 2006, at 1:10 PM, Sunny Hirai wrote:

> First, I am a Ruby newbie but am an experienced developer of highly
> scalable applications.
> <snip>
>
> QUESTION 1
>
> Is there a way (or does it do this already) for the Classes of the
> application to be cached such that it doesn't add performance  
> overhead?
> Because it is such a dynamic language, my understanding is that the
> classes themselves are created at run-time and DO add overhead before
> any object instantiation occurs. My guess is that this would still
> happen under YARV too?
>
> In other words, can I create a scope in the web application (say a  
> scope
> that lasts the lifetime of the web server) where I can store class
> definitions and/or object instances themselves and then use them to
> create instances in the page request scope?
>
> Why I want to do this is so I can define many classes without  
> having to
> worry about the overhead of having them defined at runtime for every
> page request. In this way, if I don't use the classes in a page  
> request,
> they won't add any extra to the execution time.
>
> When I write Javascript, I know that there is overhead so I have to  
> keep
> my libraries short and sweet. In ColdFusion, I have created a  
> framework
> that stores shared classes and object instances in what ColdFusion  
> calls
> an "application" scope so that classes and objects are setup only once
> at application startup (my framework is a little more complex than  
> this
> but you probably get the point). Because of this, I have many  
> libraries
> and they are wide and deep. This is very helpful because I can create
> many helper libraries without worrying about performance overhead.
>
> I want to know if this is possible in Ruby.

Yes, this is entirely possible in Ruby. Once you are ready for a
production environment, you can easily preload all classes at server
start time instead of per request. In development mode it is very
handy to have classes reload on every request, because you can make
code changes and see them instantly reflected in the browser. That is
only useful while you are developing the application. Once you are no
longer changing the source code, you can preload all classes and stop
reloading them per request. This reduces the overhead for apps in
production quite a bit while still letting you develop in an easy way.
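
To make the difference concrete, here is a minimal sketch in plain
Ruby, not tied to any framework. It assumes a throwaway file named
helper_lib.rb and a global counter ($definitions); both are made up
for illustration. `require` evaluates a file once per process, which
is the production behavior described above, while `load` re-evaluates
it every time, which is roughly what development-mode reloading does.

```ruby
require 'tmpdir'

$definitions = 0 # counts how many times the file body is evaluated

Dir.mktmpdir do |dir|
  # Hypothetical library file; its body bumps the counter when evaluated.
  path = File.join(dir, 'helper_lib.rb')
  File.write(path, <<~RUBY)
    $definitions += 1
    class HelperLib
      def greet
        'hi'
      end
    end
  RUBY

  # Production style: the second require is a no-op because the file is
  # already recorded in $LOADED_FEATURES.
  2.times { require path }
  $count_after_require = $definitions

  # Development style: load re-reads and re-evaluates the file each time.
  2.times { load path }
  $count_after_load = $definitions
end

puts "evaluations after two requires: #{$count_after_require}"
puts "evaluations after two more loads: #{$count_after_load}"
```

The class is defined once by the two requires and then twice more by
the two loads, so a long-running server process that only requires its
libraries at startup pays the definition cost once, not per request.
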
>
> Overall, I'm not really sure what kind of persistence and
> non-persistence there is between page requests when Ruby is attached a
> web server.

This is entirely up to you. In something like Rails you have the
session for state between requests. You can also run a DRb
(distributed Ruby) daemon to do longer tasks asynchronously and
increase speed. In effect, you offload any time-consuming task to a
background daemon, let the HTTP request (made through an
XMLHttpRequest) return right away, and then poll to check the status
of the job. These daemons can be available to all the Ruby processes
running your application code.
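
As a rough sketch of that offload-and-poll idea using the DRb library
that ships with Ruby: the JobServer class, its submit/status methods,
and the sleep standing in for real work are all assumptions made for
illustration. For brevity the server and the client live in one
process here; in practice the server would be a separate daemon and
each app process would connect to its URI.

```ruby
require 'drb/drb'

# Hypothetical job server: runs slow work in background threads and
# lets callers poll for the status of a job by id.
class JobServer
  def initialize
    @jobs = {}
    @next_id = 0
    @lock = Mutex.new
  end

  # Start a job and return its id immediately.
  def submit(seconds)
    id = @lock.synchronize do
      @next_id += 1
      @jobs[@next_id] = :running
      @next_id
    end
    Thread.new do
      sleep seconds # stand-in for the time-consuming task
      @lock.synchronize { @jobs[id] = :done }
    end
    id
  end

  # Report :running, :done, or nil for an unknown id.
  def status(id)
    @lock.synchronize { @jobs[id] }
  end
end

# Port 0 asks the OS for any free port; DRb.uri reports the one chosen.
DRb.start_service('druby://localhost:0', JobServer.new)

# What a web request handler would do: submit, return, poll later.
jobs = DRbObject.new_with_uri(DRb.uri)
id = jobs.submit(0.2)
puts "job #{id} submitted, status: #{jobs.status(id)}"
sleep 0.05 until jobs.status(id) == :done
puts "job #{id} finished"
```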

The best way to obtain high throughput in Ruby web applications is to
add more processes behind an HTTP or FastCGI proxy. This is how Rails
and other frameworks scale. You add more processes to the cluster and
they share state through the database or other means like memcached
or DRb.

>
> Any thoughts or pointers to resources would be helpful.
>
> QUESTION 2
>
> I've found a lot of documentation on ERB but a lot less on eRuby. All
> the documentation I have found on eRuby has it executing from the
> command line or through a web server plugin, usually through  
> Apache. Can
> eRuby be called from inside Ruby to do parsing? I ask this because  
> eRuby
> seems like it would execute faster seeing it is built using C.
>
> Using ERB is straightforward but I'd love to get the performance
> benefits of using eRuby if I could; however, my framework would likely
> requiring making calls from inside Ruby and not ONLY .rhtml files
> directly.
>
> I'm guessing we can use ERB to generate the Ruby code and saving the
> generated code to a file and then executing the generated file. This
> would improve performance since the parsing step only happens once;
> however, I'd still like to know if eRuby can be used this way.


OK, this you are going to like. There is an ERB-compatible alternative
that is 3 times faster than ERB and 10-15% faster than the C eruby,
and it is written in pure Ruby. It's called Erubis:

http://www.kuwata-lab.com/erubis/
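
On the "parse only once" part of your question: you do not need to
save generated code to a file for that. Here is a small sketch using
the standard library ERB (not Erubis, so that it runs without
installing anything). The Greeter class and its template are made up
for illustration. ERB#def_method compiles the template into an
ordinary Ruby method once, and every later call just runs that method.

```ruby
require 'erb'

# Hypothetical view class, used only for this example.
class Greeter
  TEMPLATE = ERB.new('Hello, <%= name %>!')

  # Compile the template into the instance method render(name) once,
  # at class load time. No template parsing happens per call.
  TEMPLATE.def_method(self, 'render(name)')
end

greeter = Greeter.new
puts greeter.render('Sunny')
puts greeter.render('Ezra')
```

If you are curious about the generated Ruby, ERB#src returns it as a
string.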

I also want to mention a project I am working on. It's called Merb
(Mongrel + ERB):

http://merb.devjavu.com/
http://svn.devjavu.com/merb/README

Merb is a faster, lightweight replacement for ActionPack, which is the
VC layer of the Rails MVC stack. Merb still uses ActiveRecord for
database persistence, but it can also use Og or Mongoose (a pure-Ruby
db). It is integrated into Mongrel for HTTP serving and has its own
controller and view abstraction with sessions, filters and ERB. It is
just a lot smaller and closer to the metal than ActionPack. I wrote it
mainly to use in conjunction with Rails applications, so that a small
Merb app can stand in for the performance-sensitive portions of an
application.

ActionPack is not thread safe and requires a mutex around the entire
dispatch to Rails. This can cause problems with file uploads, because
each file upload blocks an entire Rails app server for the duration
of the upload. This means that if you have numerous users uploading
large files all at once, you will need an app server instance for
each concurrent upload(!). This was one of the original reasons I
made Merb. It has its own MIME parser and does not use cgi.rb or
anything else that makes ActionPack non-thread-safe, so it can
process many concurrent file uploads or requests at one time in one
multi-threaded Mongrel app server process. Merb does use a mutex for
the parts of the request that can call out to ActiveRecord code,
because although ActiveRecord is thread safe, it does not perform
better than in single-threaded mode and does cause some other
problems. So all of the header and MIME parsing is handled in
thread-safe sections of the code, and a mutex is used only for the
sections that call the database. ActionPack, by contrast, has a mutex
around all MIME body parsing as well as everything else it does to
serve one request.
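
The locking pattern described above looks roughly like this. This is
an illustrative sketch, not Merb's actual code: parse_request,
save_record, serve_concurrently and the array standing in for the
database are all invented for the example. Parsing runs in parallel
across threads, and only the database call is serialized.

```ruby
DB_LOCK = Mutex.new

# Thread-safe work: touches only its own arguments, so many threads
# can run it at the same time without a lock.
def parse_request(raw)
  raw.split('&').map { |pair| pair.split('=', 2) }.to_h
end

# Stand-in for a call into ActiveRecord; only this part is locked.
def save_record(store, params)
  DB_LOCK.synchronize { store << params['name'] }
end

# Handle each raw request in its own thread and return what was saved.
def serve_concurrently(raw_requests)
  store = []
  threads = raw_requests.map do |raw|
    Thread.new do
      params = parse_request(raw) # outside the mutex
      save_record(store, params)  # inside the mutex
    end
  end
  threads.each(&:join)
  store
end

saved = serve_concurrently(%w[name=a&size=1 name=b&size=2 name=c&size=3])
puts saved.sort.inspect
```

A slow upload then only ties up its own thread during parsing; other
requests wait only while someone else is inside the database section.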

You mention you would rather build most of your own framework to be
closer to the metal. But you may want to look at Merb and see if you
want to work on it with me. I plan on continuing its development, and
it is already being used in heavy production, augmenting Rails
applications for faster response times and file uploads.

>
> FINAL COMMENTS
>
> Sorry for the monster large post. This is incredibly important for us
> and will help us decide if we want to switch to Ruby for our new
> application. We have a large amount of good code in ColdFusion but  
> as an
> agile company, I can see the benefits of Ruby down the line,  
> especially
> after a couple of years. Mostly, I love the clean syntax and the  
> overall
> design of the language.
>
> Thanks for your input and I hope (beg) that somebody can help answer
> these questions.

I also find that Xen virtualization works very well for scaling Ruby
applications. Scaling Ruby apps usually means adding more application
servers and maxing out your database servers. Caching plays an
important role as well. Anything that can be cached to static files,
partially cached, or kept in memcached for expensive sections of code
can yield big performance gains. Using a number of Xen virtual
machines with a shared filesystem like GFS can make it easy to scale
your Ruby applications pretty much horizontally. You just end up
pushing the persistence into the database, memcached or DRb and
trying to use the "shared nothing" approach for as many portions of
the system as you can.
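
For the "cache expensive sections of code" part, the usual shape is
fetch-or-compute. Here is a minimal in-process sketch. SimpleCache and
its fetch method are made up for illustration, and a plain Hash stands
in for memcached, which is what you would actually use so that every
app server process shares the same cache.

```ruby
# Hypothetical in-memory cache; a Hash stands in for memcached.
class SimpleCache
  def initialize
    @store = {}
    @lock = Mutex.new
  end

  # Return the cached value for key, or run the block once, remember
  # its result, and return that.
  def fetch(key)
    @lock.synchronize do
      return @store[key] if @store.key?(key)
      @store[key] = yield
    end
  end
end

cache = SimpleCache.new
computations = 0

3.times do
  cache.fetch('sidebar') do
    computations += 1 # stand-in for an expensive query or render
    '<ul>...</ul>'
  end
end

puts "expensive block ran #{computations} time(s) for 3 requests"
```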

In an application stack like this, adding nodes to the app server
cluster is easy and gives you very good scalability up or down. Ruby
is really a small part of a technology stack like this, and there are
lots of other places to optimize performance. We have built a custom
Gentoo distribution that is tailored to running Ruby applications at
optimal performance in Xen instances. I hope to release this distro
as soon as I get some free time to package it up.

Cheers-

-- Ezra Zygmuntowicz 
-- Lead Rails Evangelist
-- ez / engineyard.com
-- Engine Yard, Serious Rails Hosting
-- (866) 518-YARD (9273)