Robert wrote:
> How many ruby-ists have to do statistical analysis or data cleaning
> prior to analysis?
> 
> Is it not something that is done often by web developers?
> 
> What is the well known software out there for statistical software or
> data transformation software? That is open source, or at least free of
> charge? I mean besides R, I think I understand what R's strengths and
> limitations are.
> 
> There is a number of applications at
> http://directory.fsf.org/math/stats/
> 
> but I do not know how mature they are (except for the one I submitted
> (vilno)).
> 
> Is there currently a successful project for incredibly user-friendly
> open source statistical software, usually using a GUI, to compete with
> SAS (JMP)  or SPSS? ( R is more for research statistics, with a tough
> learning curve.).
> 
> Appreciate your feedback,
> 
> Robert

I do a lot of data cleaning/pre-processing. Most of it is numerical data
rather than more "traditional" business data mining like
name/address/zip code stuff. My main current modus operandi is

1. Do the data extraction in Perl. I'd use Ruby, but
   a) I learned Perl years ago and just learned Ruby about a year ago
   b) There are no other Ruby programmers around for backup.

2. Load the extracted data into a PostgreSQL database. I used to use
   Access, then migrated to SQL Server, and now I'm on PostgreSQL.

3. Do SQL queries for the easy stuff and R (via RODBC) for the fancy stuff.
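
For the Ruby folks, here is roughly what step 1 looks like when the data is numeric. This is only a minimal sketch in Ruby rather than the Perl I actually use, and the sample data, the column names (sample_id, reading, temp_c) and the helper names (to_number, clean_rows) are all made up for illustration. It drops any row where a measurement is missing or not a number, and writes tab-separated text of the kind a database bulk loader can take.

```ruby
# Hypothetical raw instrument dump: ragged whitespace, a missing-value
# marker ("NA"), and one field that is not numeric at all.
RAW = <<~DATA
  sample_id, reading, temp_c
  1,  12.5 , 20.1
  2, NA, 19.8
  3, 13.1, twenty
  4, 11.9, 20.4
DATA

# Convert a field to Float, or return nil when it is missing or malformed.
def to_number(field)
  Float(field.to_s.strip)
rescue ArgumentError
  nil
end

# Parse the raw text and keep only rows whose measurements are all numeric.
def clean_rows(text)
  lines  = text.lines.map(&:strip).reject(&:empty?)
  header = lines.shift.split(",").map(&:strip)

  lines.map do |line|
    record  = header.zip(line.split(",").map(&:strip)).to_h
    reading = to_number(record["reading"])
    temp    = to_number(record["temp_c"])
    next if reading.nil? || temp.nil?   # skip rows that need manual review

    { sample_id: Integer(record["sample_id"]), reading: reading, temp_c: temp }
  end.compact
end

rows = clean_rows(RAW)

# Emit tab-separated lines, one per clean row, ready for a bulk load.
rows.each { |r| puts r.values.join("\t") }
```

In real work I would log the rejected rows somewhere instead of silently dropping them, since the rejects are often where the interesting problems are.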

Mind you, I've been doing this with minor alterations in the tools for
something like 15 years, so I haven't really dug into the way other
folks do it. But there are starting to be projects, both open source and
commercial, in the so-called ETL (Extract, Transform, Load) arena, that
promise to revolutionize this type of work. One name that sticks in my
mind in open source is Pentaho, but I have not had a chance to check it
out. Most of the big ETL products are Java-based, IIRC.

As for the learning curve of R, there *are* a few GUI front-ends that
take some of the sting out of it, but the basic underlying *philosophy*
of R is that it *is* a language (and a damn good one!) for
scientific/statistical/graphical computing. The GUI builders expect you
to start with the GUI and learn the language, rather than continue using
the GUI like you would Excel, Minitab, or some of the other packages.
That said, the most complete and user-friendly of them is probably
R Commander (Rcmdr), which runs under R on both Windows and Linux.

This is something I'd like to see built in Rails -- you've got the RDBMS
back ends, the AJAX and MVC GUI tools, the ORM, etc. There is an
interface to R from Ruby, but IIRC the bridge logic between the two
languages currently only works on Linux -- there's no way yet for a
Windows Ruby program to hook up with the R DLL. There are some R DCOM
interfaces, though -- that might be the way to do it on a Windows machine.

By the way, I think the Windows R UI is *far* superior to the one on
Linux. The Linux version hasn't changed substantially from its origin --
it's a simple xterm -- X windows application.