On Thu, Jul 5, 2012 at 10:40 PM, Eliezer Croitoru <eliezer / ngtech.co.il> wrote:
> On 7/5/2012 10:03 AM, Robert Klemme wrote:
>>
>> On Thu, Jul 5, 2012 at 4:13 AM, Eliezer Croitoru <eliezer / ngtech.co.il>
>> wrote:
>>>
>>> thanks in advance i need a bit help to break the ice that my head is in.
>>>
>>> i am working on ICAP server and i am building some acls for content
>>> filtering.
>>
>>
>> Ah, interesting!
>>
> i have used greasyspoon as icap server but it's too much for my needs.
> it takes a lot of memory and cpu for nothing so i started writing my own in
> ruby and i then found out that i can add some nice features to it.

That sounds fun!

>>> i have a list of domains and i want to apply acl such as ".example.com"
>>> will
>>> match all domains that starts with "example.com".
>>> but the problem is that domain start in a reverse order.
>>> i was thinking on what is the best\better way to match a domain to domain
>>> list ?
>>
>>
>> If in memory (which does not seem to be the case here) I would create
>> a forest for matching with TLDs as root level:
>>
>> com
>> +- example
>> +- google
>>
> i will use mysql\pgsql so i will use mysql+memcached or pgsql with shm.
> i will be more than happy to hear about how to implement such a forest just
> in case i will need it later.

Well, basically you need a structure of nested Hashes.  Matching would
start at the TLD and descend to the most specific domain.
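
A minimal sketch of what I mean, not a finished implementation.  The
class and method names are made up for illustration, and I am assuming
that ".example.com" and "example.com" should both match the domain
itself and all of its subdomains:

```ruby
# Hypothetical sketch of a forest built from nested Hashes.
# Labels are stored from the TLD downwards; the :end key marks a
# node at which a listed domain terminates.
class DomainForest
  def initialize
    @root = {}
  end

  # Add a domain.  A leading "." is ignored (assumption of this sketch).
  def add(domain)
    node = @root
    labels(domain).each { |label| node = (node[label] ||= {}) }
    node[:end] = true
  end

  # True if the host itself or one of its parent domains was added.
  def match?(host)
    node = @root
    labels(host).each do |label|
      node = node[label]
      return false unless node
      return true if node[:end]
    end
    false
  end

  private

  # "www.example.com" -> ["com", "example", "www"]
  def labels(name)
    name.downcase.sub(/\A\./, '').split('.').reverse
  end
end

forest = DomainForest.new
forest.add(".example.com")
forest.match?("www.example.com")  # => true
forest.match?("badexample.com")   # => false
```

Lookup cost depends on the number of labels in the host name, not on
the size of the list.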

> i am thinking of loading some if not all acls into memory and in this case i
> will need to use it.

That's certainly faster if memory is sufficient for the size of lists
you want to handle.

>> You might be able to write a procedure to convert regular
>> representation to reverse representation, make that a calculated
>> column using the procedure and create an index on that calculated
>> column.  Alternatively calculate the value via a trigger.  Then your
>> insertion and update logic can stay as is.
>>
> i was thinking about the same idea but it will be much simpler for me to
> store the domain in a full reverse such as "moc.elpmaxe".

Why?

irb(main):020:0> "foo.bar.baz".split(/\./).reverse
=> ["baz", "bar", "foo"]
irb(main):021:0> "foo.bar.baz".split(/\./).reverse.join('.')
=> "baz.bar.foo"
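
The label-wise reversed form still allows simple prefix matching.  A
small sketch with made-up method names; the trailing "." is my own
addition so that "badexample.com" is not mistaken for a subdomain of
"example.com":

```ruby
# Hypothetical helpers, for illustration only.

# "www.example.com" -> "com.example.www"
def reverse_labels(domain)
  domain.split('.').reverse.join('.')
end

# True if host equals acl_domain or is a subdomain of it.
# A leading "." on the ACL entry is ignored (assumption of this sketch).
def covered_by?(host, acl_domain)
  h = reverse_labels(host.downcase) + '.'
  a = reverse_labels(acl_domain.downcase.sub(/\A\./, '')) + '.'
  h.start_with?(a)
end

covered_by?("www.example.com", ".example.com")  # => true
covered_by?("badexample.com", ".example.com")   # => false
```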

> i have two objectives i want to achieve:
> 1. a dstdomain acl like in squid for simple allow\deny.
> 2. a dstdomain from squidGuard blacklists to block porn spyware and others
> based on category.

Well, if you make "allow" and "deny" (or only "deny") a category then
it's just one mechanism. :-)
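
Roughly like this, as a sketch only.  The rule table, the category
names and the "most specific entry wins" policy are assumptions on my
side:

```ruby
# Hypothetical rule table: label-reversed domain => category.
RULES = {
  "com.example"        => :deny,
  "com.example.static" => :allow,
  "com.badsite"        => :porn
}

# Return the category of the most specific matching entry, or nil.
def category_for(host, rules)
  labels = host.downcase.split('.').reverse
  labels.length.downto(1) do |n|
    category = rules[labels.first(n).join('.')]
    return category if category
  end
  nil
end

category_for("www.example.com", RULES)     # => :deny
category_for("static.example.com", RULES)  # => :allow
category_for("ruby-lang.org", RULES)       # => nil
```

The caller then decides what each category means, so allow/deny and
the blacklist categories go through the same lookup.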

> the blacklists are updated via a txt file with only one match domain or
> domain wildcard per line and i will use it as is.
> so i will just use:
> LOAD DATA INFILE '/tmp/porndoms.txt'
> INTO TABLE porn (@var1)
> SET dom= REVERSE(@var1);
>
> this is not supposed to be a "readable" field and it's 30MB+ in size so i
> don't really care how it's ordered in the table.

Row order in a table rarely matters with an RDBMS.  It's the indexes
that are important.

> about acls that the admin writes in "acltable" this is another story because
> it's most of the time very short and must be readable for the admin as a
> human.

Isn't it more important that the UI presents human readable
information?  But I agree, readable DB contents help in debugging
etc.  That's why I suggested the reversed format based on domains.

>> Note: this is not exactly identical from a logical point of view - I
>> just assume that you do not have a dom which starts with ".".
>>
> i heard this podcast about programming and this guy said "it's better to have
> code that works and then improve it than not having code at all"

I'd be careful with that.  That philosophy only works for small
systems where it is easy to do a major rewrite.  In other cases it
might put you in a situation where you are stuck with an
architecture that does not fit your needs.

Cheers

robert


-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/