On 7/5/2012 10:03 AM, Robert Klemme wrote:
> On Thu, Jul 5, 2012 at 4:13 AM, Eliezer Croitoru <eliezer / ngtech.co.il> wrote:
>> thanks in advance i need a bit help to break the ice that my head is in.
>>
>> i am working on ICAP server and i am building some acls for content
>> filtering.
>
> Ah, interesting!
>
i have used greasyspoon as icap server but it's too much for my needs.
it takes a lot of memory and cpu for nothing so i started writing my own 
on ruby and i then found out that i can add some nice features to it.

>> i have a list of domains and i want to apply acl such as ".example.com" will
>> match all domains that starts with "example.com".
>> but the problem is that domain start in a reverse order.
>> i was thinking on what is the best\better way to match a domain to domain
>> list ?
>
> If in memory (which does not seem to be the case here) I would create
> a forest for matching with TLD's as root level:
>
> com
> +- example
> +- google
>
i will use mysql\pgsql so i will use mysql+memcached or pgsql with shm.
i will be more then happy to hear about how to implement such a forest 
just for a case i will need it later.
i am thinking of loading some if not all acls into memory and in this 
case i will need to use it.

>> i have a sql(PGSQL\MYSQL) db that contains the list of domains.
>> so i was thinking to use "divide and Conquer way" to dissemble the requested
>> domain such as "subporndomain.example.com" to
>> ["com","example","subporndomain"] and then to fetch from the db all the
>> domains that starts with "example.com" and then check match each and one of
>> them(if any record exists) to the requested domain.
>
> That won't work since if you have a domain "foo.example.com" using
> "example.com" as prefix won't retrieve it.
>
> Since you have a relational database, I would store domains in reverse
> order ("example.com" -> "com.example"), create an index on that column
> and use LIKE operator, for example:
>
> -- just a matching test
> SELECT MIN(rev_domain)
> WHERE rev_domain = 'com.example'
> OR rev_domain LIKE 'com.example.%'
>
> SELECT rev_domain
> WHERE rev_domain = 'com.example'
> OR rev_domain LIKE 'com.example.%'
> LIMIT 1
>
> -- all
> SELECT domain
> WHERE rev_domain = 'com.example'
> OR rev_domain LIKE 'com.example.%'
>
>
> You might be able to write a procedure to convert regular
> representation to reverse representation, make that a calculated
> column using the procedure and create an index on that calculated
> column.  Alternatively calculate the value via a trigger.  Then your
> insertion and update logic can stay as is.
>
i was thinking about the same idea but it will much more simple for me 
to store the domain in a full reverse such as "moc.elpmaxe".
i have two objectives i want to achieve:
1. a dstdomain acl like in squid for simple allow\deny.
2. a dstdomain from squidGuard blacklists to block porn spyware and 
others based on category.

the blacklists are updated via a txt file with only one match domain or 
domain wildcard per line and i will use it as is.
so i will just use:
LOAD DATAT INFILE '/tmp/porndoms.txt'
INTO TABLE porn (@var1)
SET dom= REVERSE(@var1);

this is not suppose to be a "readable" field and it's 30MB+ size so i 
dont really care how it's ordered in the table.
about acls that the admin writes in "acltable" this is another story 
because it's most of the time very sort and must be readable for the 
admin as a human.
the options for the admins is either create a db front end web\cli\other 
or to store the two fields dom and reversedom with index only on the 
reversedom.



>> i was thinking of using the "string".start_with?
>> method in reverse such as
>> "moc.elpmaxe.niamodnropbus".start_with?(["moc.elpmaxe", ...])
>> with the list of domains as a prefix..
>> but there is one problem: i cant match all the domains as prefix because
>> some of them are not a prefix but an exact match.
>> so i will need to do some code like this:
>>
>> dom = "subdomain.example.com"
>> domacl = ".example.com"
>> domacl1 = "example.com"
>>
>> def match?(domacl,dom)
>>          if domacl.start_with?(".")
>>             return  ( dom.reverse.start_with?(domacl.reverse.chop) )
>>          else
>>             return true if dom.eql?(domacl)
>>             return false
>>          end
>> end
>
> You can make that simpler and I guess also a tad more efficient:
>
> def match? domacl, dom
>    dom == domacl ||
>      domacl[0] == '.' && dom.reverse.start_with?(domacl.reverse.chop)
> end
>
> Note: this is not exactly identical from a logical point of view - I
> just assume that you do not have a dom which starts with ".".
>
i heard this podcast about programing and this guy said "it's better to 
have code that works and then improve it then not having code at all"

as i will use a reversed domain stored almost in every part of the code 
i will make it:
# @param domacl = reversed domain acl
# @param dom = reversed domain to check
def match? domacl, dom
     dom == domacl ||
       domacl.endwith?"." && dom.start_with?(domacl.chop)
end


>> so it works..
>
> :-)
>
>> my goal is maximum efficiency.
>
> Yes, but you should not loose sight of functionality.
>
> Kind regards
>
> robert
>
>
Thanks,
Eliezer


-- 
Eliezer Croitoru
https://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer <at> ngtech.co.il