On 7/6/2012 9:21 AM, Robert Klemme wrote:
> On Thu, Jul 5, 2012 at 10:40 PM, Eliezer Croitoru <eliezer / ngtech.co.il> wrote:
<SNIP>
>> i have used greasyspoon as icap server but it's too much for my needs.
>> it takes a lot of memory and cpu for nothing so i started writing my own on
>> ruby and i then found out that i can add some nice features to it.
>
> That sounds fun!
>
it's nice, because the main reason was to coordinate two cache proxies 
together.
in order to cache dynamic content (youtube and some others) I needed to 
"fake" the request on one proxy, and then, when the other proxy requests 
the fake, serve the real one.
so I used ICAP to get the original url from the first proxy, store it 
in memDB/sql, rewrite it to a fake url and send that back.
the proxy has an explicit rule that directs requests for the fake domain 
through proxy2, and proxy2 passes the url to the same ICAP server,
which then rewrites the fake url back to the original one.

the first cache thinks it fetched the fake url and stores it in mem.
the second fetches the real one for proxy1.
what do I gain? the dynamic content is stored under a "static" url in the 
cache and can be served to other users.
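Roughly, the scheme could be sketched like this in Ruby (all the names here — the fake domain, the store class — are just illustrative; the real server keeps the mapping in memDB/sql, not an in-process Hash):

```ruby
require 'digest'

# Illustrative sketch of the two-proxy URL de-duplication idea:
# proxy1's ICAP pass maps a dynamic URL to a stable fake URL, and
# proxy2's ICAP pass looks the original URL back up.
class FakeUrlStore
  FAKE_DOMAIN = 'fake.cache.local' # assumed internal-only domain

  def initialize
    @store = {} # stand-in for the memDB/sql table
  end

  # Called for proxy1: map a dynamic URL to a stable "static" fake URL.
  def to_fake(real_url)
    key = Digest::SHA1.hexdigest(real_url)
    @store[key] = real_url
    "http://#{FAKE_DOMAIN}/#{key}"
  end

  # Called for proxy2: recover the original URL from the fake one.
  def to_real(fake_url)
    key = fake_url.split('/').last
    @store.fetch(key)
  end
end

store = FakeUrlStore.new
fake  = store.to_fake('http://youtube.example/videoplayback?id=abc&range=0-1000')
real  = store.to_real(fake) # gives back the original dynamic URL
```

Because the fake URL is derived only from the original URL, the same dynamic content always maps to the same "static" cache key.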



> Well, basically you need a structure of nested Hashes.  Matching would
> start at the TLD and descend to the most specific domain.
>
>> i am thinking of loading some if not all acls into memory and in this case i
>> will need to use it.
>
> That's certainly faster if memory is sufficient for the size of lists
> you want to handle.
>
>> i was thinking about the same idea but it will much more simple for me to
>> store the domain in a full reverse such as "moc.elpmaxe".
>
> Why?
>
> irb(main):020:0> "foo.bar.baz".split(/\./).reverse
> => ["baz", "bar", "foo"]
> irb(main):021:0> "foo.bar.baz".split(/\./).reverse.join('.')
> => "baz.bar.foo"
>
instead of splitting and reversing, a single reverse will use less cpu.
it gives me the same function..
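For the record, the two forms don't produce identical strings — a plain String#reverse also flips the characters inside each label — but as long as the stored column is transformed the same way, prefix matching works with either, and the plain reverse is a single pass:

```ruby
# Label-wise reverse vs. full string reverse:
labelwise = "example.com".split('.').reverse.join('.') # "com.example"
full      = "example.com".reverse                      # "moc.elpmaxe"

# Either form groups domains by TLD/parent when sorted, which is what a
# reversed-domain column is for; you just have to pick one and stick to it.
```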

>> i have two objectives i want to achieve:
>> 1. a dstdomain acl like in squid for simple allow\deny.
>> 2. a dstdomain from squidGuard blacklists to block porn spyware and others
>> based on category.
>
> Well, if you make "allow" and "deny" (or only "deny") a category then
> it's just one mechanism. :-)
>
yes, indeed it is one mechanism, up and running as we speak.
but dstdomain and the blacklists are not the same:
dstdomain "example.com" matches only that exact domain and no others, while
in the blacklists "example.com" also matches subdomains, but not "1example.com".
so I had to add a condition for that.
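The condition boils down to something like this (an illustrative sketch, not the server's actual code):

```ruby
# dstdomain-style: the host must equal the domain exactly.
def dstdomain_match?(host, domain)
  host == domain
end

# blacklist-style: the domain itself or any subdomain of it matches,
# but a host that merely ends with the same characters ("1example.com")
# does not, because of the required dot boundary.
def blacklist_match?(host, domain)
  host == domain || host.end_with?(".#{domain}")
end

blacklist_match?('www.example.com', 'example.com') # => true
blacklist_match?('1example.com', 'example.com')    # => false
dstdomain_match?('www.example.com', 'example.com') # => false
```

The `end_with?(".#{domain}")` check is what keeps "1example.com" out: a subdomain match must cross a "." boundary.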

>> the blacklists are updated via a txt file with only one match domain or
>> domain wildcard per line and i will use it as is.
>> so i will just use:
>> LOAD DATA INFILE '/tmp/porndoms.txt'
>> INTO TABLE porn (@var1)
>> SET dom= REVERSE(@var1);
>>
>> this is not suppose to be a "readable" field and it's 30MB+ size so i dont
>> really care how it's ordered in the table.
>
> Order in table rarely matters with RDBMS.  It's the indexes which are important.
>
ordered.. as in organized.. as in stored..
potato, potahto.

>> about acls that the admin writes in "acltable" this is another story because
>> it's most of the time very sort and must be readable for the admin as a
>> human.
>
> Isn't it more important that the UI presents human readable
> information?  But I agree, readable DB contents helps in debugging
> etc.  That's why I suggested the reversed format based on domains.
>
yes indeed, the point is that the UI will present it, but I am not 
writing any UI right now, and if the admin doesn't know how to work 
with the command line it will be very hard to do anything with the 
server in its current state.
the server still has some exceptions here and there that I have found.
and speaking of the devil: I want to log ruby exceptions to a specific 
file. do you know anything about that?
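One minimal way to do that with nothing but the stdlib is to wrap the request handling in a rescue that writes to a dedicated Logger (the file path and helper name here are just examples):

```ruby
require 'logger'
require 'tmpdir'

# Dedicated error log file; in a real server this would be something like
# /var/log/icap_errors.log (tmpdir is used here only so the example runs).
LOG_PATH   = File.join(Dir.tmpdir, 'icap_server_errors.log')
$error_log = Logger.new(LOG_PATH)

# Wrap a unit of work so any raised exception is logged with its backtrace,
# then re-raised so the caller can still decide how to recover.
def with_error_logging
  yield
rescue => e
  $error_log.error("#{e.class}: #{e.message}\n  #{(e.backtrace || []).join("\n  ")}")
  raise
end

begin
  with_error_logging { raise ArgumentError, 'bad ICAP header' }
rescue ArgumentError
  # the exception was logged to LOG_PATH before being re-raised
end
```

For exceptions that escape everything (e.g. in the accept loop), the same pattern applied at the top level of each thread covers the whole server.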

>> i heard this podcast about programing and this guy said "it's better to have
>> code that works and then improve it then not having code at all"
>
> I'd be careful with that.  That philosophy only works for small
> systems where it is easy to do a major rewrite.  In other cases it
> might bring you into a situation where you are stuck with an
> architecture that does not fit the needs.
>
> Cheers
>
> robert
>
well, this is the reason I am trying to:
1. make it more modular by using methods that can be changed easily.
2. think about efficiency.
3. consult with others.

for now there is one guy who asked me for that allow/deny ACL 
per ldap group policy.
so my main goals now are:
1. fix bugs to make it bug-free (I have some that I know of and might 
have others that I don't).
2. add more accurate url-match filtering than just host/domain.
3. add user/ip db integration for future filtering/acl capabilities.
4. improve the filtering based on categories/levels.
5. add a form that will allow a user to report a false positive to the 
admin.
6. add a "user custom allowed/denied domains/urls list".
7. create a category option for the "custom allowed/denied domains/urls" 
so a user/admin can add allowed categories for a specific user.
for that option I must really think more before implementing the 
filtering acls as levels or categories etc..
8. a content auditor module.
( I had in mind adding a "content inspection/auditing" option:
a feature that logs requested urls/domains/pages to a db so some human 
inspection of the content can be done later.
so in an environment like a small isp/office that wants to build its own 
blacklists/categories based on the users' browsing habits, the 
"content auditor" would get the list from the db somehow. )
9. live url/domain access statistics in a db for admins.
(squid has logs but no live statistics)

I had just one simple goal and it became more than just that, and I'm 
happy about that.

any ideas on the subjects?

Eliezer

-- 
Eliezer Croitoru
https://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer <at> ngtech.co.il