Hi Benjamin,

This can be done in Ruby, and my responses below will hopefully point you in
the right direction. However, depending on the size of the data and the amount
of available memory, holding everything in Ruby may not be the best choice.
If I were doing this, I would probably import both files into a database
like sqlite3, using a script, of course, and generate the output with SQL.
The front end script would accept parameters for the input files, the columns
to output (or a list of columns to suppress, if any) and produce the desired
results.
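
Here is a rough sketch of what I mean by scripting sqlite. The method name
merge_script and the table names t1 and t2 are made up for illustration, and
it assumes the sqlite3 command-line shell is installed and that its .import
command creates a missing table using the first input row as column names
(which, as far as I know, it does). It emulates a full outer join with two
LEFT JOINs and a UNION, so it should not depend on a recent sqlite version.
It does no quoting of the file names, so treat it as a starting point only.

```ruby
# Hypothetical helper: builds a script for the sqlite3 command-line shell.
# Assumes both files are ';'-separated with a header row, as in your sample.
def merge_script(file1, file2)
  <<~SQL
    .separator ";"
    .import #{file1} t1
    .import #{file2} t2
    .headers on
    SELECT ref,
           COALESCE(qty, '"missing_data"')         AS qty,
           COALESCE(price, '"missing_data"')       AS price,
           COALESCE(qty * price, '"missing_data"') AS total
    FROM (
      -- every ref from file1, with its price if file2 has one
      SELECT t1.ref AS ref, t1.qty AS qty, t2.price AS price
        FROM t1 LEFT JOIN t2 ON t1.ref = t2.ref
      UNION
      -- every ref from file2, with its qty if file1 has one
      SELECT t2.ref, t1.qty, t2.price
        FROM t2 LEFT JOIN t1 ON t1.ref = t2.ref
    )
    ORDER BY ref;
  SQL
end
```

You would then pipe the generated text into the shell, for example
IO.popen(["sqlite3", ":memory:"], "w") { |db| db.write(merge_script("file1", "file2")) }
and parameterize the column list once the basic merge works.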

Rather than generating two arrays, I would read one file
into a hash, keyed by the common key, 'ref', then read the second file
and match up each record, if any, with the one in the other file, and output
the combined set. You'd want to keep track of which lines from file1 have a
match so you can output all the non-matching values afterwards. However, I
would strongly suggest that rather than doing the work yourself, you generate
code to script sqlite or another database system.
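
If you do want to stay in Ruby, the hash approach might look something like
the sketch below. The names index_by_ref, merge_lines and MISSING are mine,
not anything standard, and it assumes two-column, ';'-separated input with a
header row and integer values, as in your sample. It sorts the result by ref
only to reproduce the order of your File3.

```ruby
MISSING = '"missing_data"'

# Build { ref => qty } from the lines of file1, skipping the header row.
def index_by_ref(lines)
  lines.drop(1).each_with_object({}) do |line, by_ref|
    ref, qty = line.strip.split(";")
    by_ref[ref] = qty if ref
  end
end

# Walk file2 once, pairing each record with its file1 entry (if any),
# then append the file1 records that never found a match.
def merge_lines(file1_lines, file2_lines)
  qty_by_ref = index_by_ref(file1_lines)
  matched = {}
  rows = []

  file2_lines.drop(1).each do |line|
    ref, price = line.strip.split(";")
    next unless ref                      # skip blank lines
    qty = qty_by_ref[ref]
    matched[ref] = true if qty
    total = qty ? Integer(qty) * Integer(price) : MISSING
    rows << [ref, qty || MISSING, price, total]
  end

  qty_by_ref.each do |ref, qty|
    rows << [ref, qty, MISSING, MISSING] unless matched[ref]
  end

  rows.sort_by(&:first).map { |row| row.join(";") }
end
```

To run it against real files, something like
puts "ref;qty;price;total", merge_lines(File.foreach("file1"), File.foreach("file2"))
should do. Only file1 is held in memory; file2 is read line by line.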

HTH

-Gyepi

On Thu, Oct 22, 2009 at 06:20:24AM +0900, Benjamin Thomas wrote:
> I have 2 csv files I'd like to merge such as :
> 
> File1               File2
> -----               -----
> ref;qty             ref;price
> A;10                A;100
> B;20                D;150
> C;30                C;200
> E;5                 B;75
> 
> outputs to File3 =>
> 
> File3
> -----
> ref;qty;price;total
> A;10;100;1000
> B;20;75;1500
> C;30;200;6000
> D;"missing_data";150;"missing_data"
> E;5;"missing_data";"missing_data"
> 
> Here is the code I have so far:
> 
> ###########################################################################
> def process_file(file)
>   processed = []
>   opened = File.open(file, "r").readlines
> 
>   opened.each {|line| processed << line.strip.split(";")}
>   return processed
> end
> 
> arr1 = process_file("file1")
> arr2 = process_file("file2")
> 
> p arr1, arr2
> ###########################################################################
> 
> ==> [["ref", "qty"], ["A", "10"], ["B", "20"], ["C", "30"], ["E", "5"]]
> ==> [["ref", "price"], ["A", "100"], ["D", "150"], ["C", "200"], ["B", "75"]]
> 
> ----
> So I have 2 arrays which is great but I'm not sure on how to go about
> merging all this since data is sometimes missing from file1, some other
> times from file2. Of course those 2 files are more complex (and
> bigger!), and I'd like to setup a mechanism that would allow me to
> select which columns to filter and merge, but that's the basic idea
> anyway.