On Sat, Mar 9, 2013 at 4:31 PM, Joel Pearson <lists / ruby-forum.com> wrote:
> Thanks for your input, Robert.

You're welcome!

> I did wonder whether I should convert the underlying dataset into an
> Array rather than using a Hash, since spreadsheets are "structured" and
> I find the easiest way to manipulate the structure is with Array
> methods.

It does not really matter what you do.  You could even use a hybrid
approach where you start with an Array based storage and exchange it
with a Hash based storage once sparseness is too large (for your
particular measure of "too large").  If you follow the layered
approach (see the end of this mail) you could have two
implementations of the plain data store which have the exact same API
but one uses a Hash internally and the other an Array...
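
A minimal sketch of two such stores, to make the idea concrete.  The
class names HashStore and ArrayStore and the [row, col] addressing are
made up for illustration; they are not taken from RubyExcel.

```ruby
# Hypothetical stores with the same API: store[row, col] and store[row, col] = v.
class HashStore
  def initialize
    @cells = {}                 # only occupied cells take up memory
  end

  def [](row, col)
    @cells[[row, col]]
  end

  def []=(row, col, value)
    @cells[[row, col]] = value
  end
end

class ArrayStore
  def initialize
    @rows = []                  # nested Arrays; gaps are padded with nil
  end

  def [](row, col)
    r = @rows[row]
    r && r[col]
  end

  def []=(row, col, value)
    (@rows[row] ||= [])[col] = value
  end
end

# Client code cannot tell which implementation it is talking to.
[HashStore.new, ArrayStore.new].each do |store|
  store[2, 3] = "x"
  p store[2, 3]   # prints "x"
  p store[0, 0]   # prints nil
end
```

Because both classes answer the same two methods, swapping one for the
other later does not touch the code built on top of them.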

> Enumerable seems to be Array-based as well, and I'm still

No, not at all.  Enumerable is just a module which relies solely on
the existence of a method #each.  It's as simple as

irb(main):001:0> class X
irb(main):002:1> def each; yield 1; self end
irb(main):003:1> include Enumerable
irb(main):004:1> end
=> X
irb(main):005:0> x = X.new
=> #<X:0x802ec310>
irb(main):006:0> x.to_a
=> [1]
irb(main):007:0> x.select {|n| n.odd?}
=> [1]
irb(main):008:0> x.select {|n| n.even?}
=> []
irb(main):009:0> x.find {|n| n > 0}
=> 1

Or a simplistic integer range:

irb(main):014:0> class IntRange
irb(main):015:1> include Enumerable
irb(main):016:1> def initialize(a, b)
irb(main):017:2> @low, @high = [Integer(a), Integer(b)].sort
irb(main):018:2> end
irb(main):019:1> def each
irb(main):020:2> n = @low
irb(main):021:2> while n < @high
irb(main):022:3> yield n
irb(main):023:3> n += 1
irb(main):024:3> end
irb(main):025:2> self
irb(main):026:2> end
irb(main):027:1> end
=> nil
irb(main):028:0> ir = IntRange.new 3, 8
=> #<IntRange:0x80280048 @low=3, @high=8>
irb(main):029:0> ir.to_a
=> [3, 4, 5, 6, 7]
irb(main):030:0> ir.each {|x| p x}
3
4
5
6
7
=> #<IntRange:0x80280048 @low=3, @high=8>

No Arrays around. :-)

> rather hazy on when to override methods like "map", or when to rely on
> the methods already available through "each".

You should normally not override Array methods.  You generally
shouldn't inherit from Array either.  Those core classes are best used
through delegation.
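
A small sketch of delegation using Forwardable from the standard
library.  The class Row and the choice of delegated methods are only
an illustration, not a recommendation for your actual class layout.

```ruby
require 'forwardable'

# Hypothetical class: a row that wraps an Array instead of inheriting from it.
class Row
  extend Forwardable
  include Enumerable

  # Expose only the Array methods a Row should really have.
  def_delegators :@cells, :size, :[], :empty?

  def initialize(*cells)
    @cells = cells
  end

  # Append a cell; returns self so the internal Array does not leak out.
  def <<(cell)
    @cells << cell
    self
  end

  # Enumerable needs only #each; map, select, find etc. come for free.
  def each(&block)
    return enum_for(:each) unless block
    @cells.each(&block)
    self
  end
end

row = Row.new(1, 2, 3)
row << 4
p row.size                 # prints 4
p row.map { |c| c * 2 }    # prints [2, 4, 6, 8]
```

Note that map is not overridden anywhere: it is provided by Enumerable
on top of #each, and methods like push or delete_at are simply not
part of Row's interface.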

> Still, I've learned a lot about Hashes while writing this code, so even
> if I do abandon their use for the main data storage I'll still find good
> use for them elsewhere.

That's good!  I am glad you see it that way.  Others might view these
exercises as useless detours - but they underestimate how much one
learns from them.  You certainly learned a lot more than you would
have by posting a question for every detail that occurred to you, as
a few other members of the community seem to have chosen to do
recently.

> My reasoning behind the prevalence of headers is simply that if you
> wanted data without headers you'd just use arrays rather than this
> class. One of the big things I find helpful with this is that code is
> much more readable if I can reference a header rather than an index.

Then I would at least make the number of header rows and header
columns a property of the individual instance - and not a constant in
the implementation.  Still, I believe that with the introduction of
the concept of "headers" in this class you may make things too
complex too fast.
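
Roughly like this.  The class name Sheet, the option names and the
defaults are assumptions on my part; only the idea of per-instance
header counts matters.

```ruby
# Hypothetical class; header counts are per instance, not constants.
class Sheet
  attr_reader :header_rows, :header_cols

  # An options Hash is used so this also runs on older Ruby versions.
  def initialize(data, options = {})
    @data        = data
    @header_rows = options.fetch(:header_rows, 1)
    @header_cols = options.fetch(:header_cols, 0)
  end

  # The data without header rows and header columns.
  def body
    @data.drop(header_rows).map { |row| row.drop(header_cols) }
  end
end

with_header    = Sheet.new([%w[id name], [1, "a"], [2, "b"]])
without_header = Sheet.new([[1, "a"], [2, "b"]], :header_rows => 0)

p with_header.body      # prints [[1, "a"], [2, "b"]]
p without_header.body   # prints [[1, "a"], [2, "b"]]
```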

> I'm not sure what a test class or wrapper class is. I'll look them up.

A test class would be a class implementing unit tests.  A wrapper
class simply wraps around your class RubyExcel in much the same way
that RubyExcel wraps a Hash.  In other words: it presents a different
abstraction.  It's a general approach in software engineering to
create several layers of abstraction, which makes it easier to deal
with only a few aspects on each layer.  The ISO/OSI 7 layer model is
a famous representative of that approach.
http://www.technology-training.co.uk/understandingtheiso7layermodel_10.php
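
To give you an idea of a test class, here is a sketch using Minitest
(I am assuming Minitest 5, where the base class is Minitest::Test).
TinyStore is a made-up stand-in for whatever class you want to test.

```ruby
require 'minitest/autorun'

# Hypothetical class under test; stands in for the real storage class.
class TinyStore
  def initialize
    @cells = {}
  end

  def [](row, col)
    @cells[[row, col]]
  end

  def []=(row, col, value)
    @cells[[row, col]] = value
  end
end

# The test class: every method whose name starts with test_ is one unit test.
class TinyStoreTest < Minitest::Test
  def setup
    @store = TinyStore.new    # fresh instance before each test
  end

  def test_empty_cell_is_nil
    assert_nil @store[0, 0]
  end

  def test_returns_stored_value
    @store[1, 2] = "x"
    assert_equal "x", @store[1, 2]
  end
end
```

Running the file executes all tests and reports failures, so you
notice immediately when a change breaks existing behavior.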

> I see your point about to_s. I suppose I should differentiate between
> using interpolation for multiple variables and to_s for single cases.

+1

> The +, -, and << methods are recent additions; mostly because I only
> just learned that you can define these. I'm sure there are multiple ways
> to write these;

I wrote about numeric operators in Ruby a while back:
http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html
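
As a quick illustration of the usual conventions for these operators
(Column is a made-up class, not your RubyExcel code): + and - return
a new object and leave both operands alone, while << changes the
receiver and returns self.

```ruby
# Hypothetical class used only to show operator conventions.
class Column
  attr_reader :values

  def initialize(values = [])
    @values = values.dup       # copy, so callers cannot modify our state
  end

  # + builds a new Column; neither operand is modified.
  def +(other)
    Column.new(values + other.values)
  end

  # - builds a new Column without the values found in other.
  def -(other)
    Column.new(values - other.values)
  end

  # << modifies the receiver; returning self allows chaining.
  def <<(value)
    @values << value
    self
  end
end

a = Column.new([1, 2, 3])
b = Column.new([3, 4])

p((a + b).values)        # prints [1, 2, 3, 3, 4]
p((a - b).values)        # prints [1, 2]
p((a << 9 << 10).values) # prints [1, 2, 3, 9, 10]
```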

> my first attempt was very poor in performance... and my
> thinking was to avoid re-inventing the wheel by using the Array methods
> written by someone much smarter than me :)

... which is perfectly understandable and OK.  In this case the
conversion to an Array based structure might burn a lot of memory
though.

> All in all, this still needs a lot of work to make it useful; but now
> that I have a better feel of what the weaknesses and strengths are, I
> hope to improve on this starting point and eventually build something
> genuinely useful to others as well as myself.

That's a good approach.

> I'd be particularly interested on the question of Array vs Hash for the
> internals. Hash is great because of the simplicity of addresses and its
> efficient way of coping with blank space, but Arrays can keep their
> "form" much more effectively and already support things like sorting,
> rows, and columns.

In the end it does not matter that much what you use internally for
representation.  The important bit is to use the proper API to your
storage to allow for a consistent view of the model and ease of use.
For the moment I'd stick with Hash but it may make sense to use a
layered approach: split class RubyExcel into (at least) two where one
is only responsible for providing a consistent API to your data with
the minimal operations needed to make it work.  Use that internally as
storage and put everything else like header handling, those
convenience methods mentioned or reading from file and writing to file
in the wrapper class.  That way you get a clean separation of
concerns: you have a proper abstraction of the storage and you get a
second layer which adds all the bells and whistles you need to work
efficiently with it (like treating header rows and columns
differently).
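
A rough sketch of that split.  All names here (CellStore, HeaderSheet,
get, set) are invented for the example, and I assume a single header
row in row 0 to keep it short.

```ruby
# Layer 1 (hypothetical): plain storage, knows nothing about headers or files.
class CellStore
  def initialize
    @cells = {}
  end

  def [](row, col)
    @cells[[row, col]]
  end

  def []=(row, col, value)
    @cells[[row, col]] = value
  end
end

# Layer 2 (hypothetical): wrapper that adds header handling on top.
class HeaderSheet
  def initialize(store = CellStore.new)
    @store = store
    @width = 0
  end

  # Writes the header names into row 0 of the underlying store.
  def headers=(names)
    @width = names.size
    names.each_with_index { |name, col| @store[0, col] = name }
  end

  def get(row, header)
    @store[row, column_of(header)]
  end

  def set(row, header, value)
    @store[row, column_of(header)] = value
  end

  private

  # Translates a header name into a column index.
  def column_of(header)
    col = (0...@width).find { |c| @store[0, c] == header }
    raise ArgumentError, "unknown header: #{header}" if col.nil?
    col
  end
end

sheet = HeaderSheet.new
sheet.headers = %w[id name]
sheet.set(1, "name", "Joel")
p sheet.get(1, "name")   # prints "Joel"
```

Since HeaderSheet only uses [] and []= of the store, any object with
those two methods can be passed to the constructor, whether it keeps
its data in a Hash or in an Array.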

Kind regards

robert


-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/