--0-51391511-10327667483010
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Thanks, all, for the speedup tips.

I have tried all of them and the fastest one is attached.
Results:
ruby : 115 sec -> 62 sec  (wow :-)
python : 60 sec -> 53 sec

The Ruby speedup is really impressive. Half of the improvement comes from using a
regex to skip the comment lines, and the other half from a different way of extracting the
second field (slicing it out between the first two tabs with String#index).
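To make the two tricks concrete, here is a minimal sketch that pulls the 2nd tab-separated field out of a log line both ways. The method names are hypothetical, not taken from parse.rb. The index version mirrors what parse.rb does: it creates one substring and never builds an array of all the fields, which is a plausible (but unmeasured) reason it is faster.

```ruby
COMMENT = /^#/   # header lines such as "#Fields: ..." start with '#'

# Straightforward: split the whole line and take element 1.
def second_field_by_split(line)
  line.split("\t")[1]
end

# As in parse.rb: slice from just after the 1st tab up to the 2nd tab.
# Only one substring is created; no array of all fields is built.
def second_field_by_index(line)
  t1 = line.index("\t") + 1
  line[t1...line.index("\t", t1)]
end

lines = [
  "#Fields: c-ip\tcs-username\tc-agent\n",
  "10.xxx.xxx.xxx\txxx\\xxxxx\tMozilla/4.0 (xxxxxxxx)\t2002-09-19\n",
]
lines.each do |line|
  next if line =~ COMMENT   # skip directive lines before doing any field work
  puts second_field_by_split(line)
  puts second_field_by_index(line)
end
```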

I have also attached a snippet of my web.log file.
The data in it is faked, but the structure is the same.
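The snippet starts with W3C extended log directives (#Software, #Version, #Date, #Fields), and its #Fields line names the tab-separated columns: c-ip, cs-username, c-agent, date, time, r-host, and so on. So the "2nd field" is cs-username. A minimal sketch, assuming the #Fields line has the shape shown in the snippet, that turns it into a name => index lookup; field_indexes is a hypothetical helper.

```ruby
# Hypothetical helper: map each column name in a "#Fields:" directive to its
# 0-based position in the tab-separated data lines.
def field_indexes(fields_line)
  # Drop the "#Fields:" prefix, trim the leading space and trailing "\r\n",
  # then split on whitespace (the snippet separates names with tabs).
  names = fields_line.sub(/\A#Fields:/, "").strip.split(/\s+/)
  indexes = {}
  names.each_with_index { |name, i| indexes[name] = i }
  indexes
end

# Header as it appears in the attached snippet (columns separated by tabs).
header = "#Fields: c-ip\tcs-username\tc-agent\tdate\ttime\tr-host\tcs-bytes\t" \
         "sc-bytes\tcs-protocol\tcs-uri\ts-object-source\tsc-status\r\n"
idx = field_indexes(header)
puts idx["cs-username"]
puts idx["sc-status"]
```

With such a lookup, conditions can be written against column names instead of bare positions like 1 and 3.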

The attached parse.rb seems to be the fastest way (or close to it) to select the lines whose
2nd field satisfies the condition.
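Which variant wins can depend on the Ruby version and the machine, so it may be worth timing the candidates directly. A minimal sketch using Benchmark.realtime from Ruby's standard benchmark library, written for a current Ruby rather than the 1.7.2 used here. The sample line is modeled on the attached (faked) web.log, and the variant names are hypothetical. It prints timings; it does not claim which one is fastest.

```ruby
require 'benchmark'

# One data line shaped like the attached web.log sample (faked values).
LINE = "10.xxx.xxx.xxx\txxx\\xxxxx\tMozilla/4.0 (xxxxxxxx)\t2002-09-19\t" \
       "02:37:10\t1.im.xxx\t386\t188\thttp\thttp://im..gif\tVCache\t304\n"
USER = /xxx\\xxxxx/i

# Each variant answers the same question: does the 2nd field match USER?
VARIANTS = {
  "split"       => lambda { |l| l.split("\t")[1] =~ USER },
  "split+limit" => lambda { |l| l.split("\t", 3)[1] =~ USER },
  "index"       => lambda { |l| t1 = l.index("\t") + 1; l[t1...l.index("\t", t1)] =~ USER },
  # [^\t]* cannot cross a tab, so the match is confined to the 2nd field
  "one regex"   => lambda { |l| l =~ /\A[^\t]*\t[^\t]*xxx\\xxxxx/i },
}

# Returns { variant name => elapsed seconds for n calls }.
def time_variants(n)
  VARIANTS.each_with_object({}) do |(name, check), times|
    times[name] = Benchmark.realtime { n.times { check.call(LINE) } }
  end
end

time_variants(100_000).each { |name, secs| printf("%-12s %.3f s\n", name, secs) }
```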

In general, though, there could be conditions on more fields. What then?
Should I use String#split to get the fields and match each field separately, or build an
all-in-one regex and try to match the whole line?

<snippet_1>
pom = line.split( "\t" )
if pom[1] =~ /expr_to_match_1/ and pom[3] =~ /expr_to_match_2/ # and .....
  do_something
end
</snippet_1>
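snippet_1 can be made runnable by passing the conditions as a hash of 0-based field index => Regexp. A minimal sketch; fields_match? and CONDS are hypothetical names, and the patterns are placeholders in the style of the faked log. Giving split a limit of (highest tested index + 2) stops splitting right after the last field that is examined, so the rest of the line is not broken apart.

```ruby
# Hypothetical helper: true if every field named in +conds+
# (0-based index => Regexp) matches its pattern.
def fields_match?(line, conds)
  # With limit n, split returns at most n pieces; the last one holds the
  # unsplit remainder, so fields 0..conds.keys.max come out separately.
  fields = line.chomp.split("\t", conds.keys.max + 2)
  # A missing field (short line) counts as a failed condition.
  conds.all? { |i, re| fields[i] && fields[i] =~ re }
end

CONDS = { 1 => /xxx\\xxxxx/i, 3 => /\A2002-09-19\z/ }   # placeholder conditions

[
  "#Fields: c-ip\tcs-username\tc-agent\tdate\n",
  "10.xxx.xxx.xxx\txxx\\xxxxx\tMozilla/4.0 (xxxxxxxx)\t2002-09-19\t02:37:10\n",
  "10.xxx.xxx.xxx\tanonymous\tMozilla/4.08 [en] (WinNT; U ;Nav)\t2002-09-19\t02:37:10\n",
].each do |line|
  next if line =~ /^#/
  puts line if fields_match?(line, CONDS)
end
```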

OR?

<snippet_2>
if line =~ /expr_to_match_1 .... expr_to_match_2  .... expr_to_match_3/
  do_something
end
</snippet_2>
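One caveat with snippet_2: a pattern like /expr_1 .... expr_2/ can match across field boundaries, because "." also matches a tab, so a line can pass even when the text sits in the wrong column. A minimal sketch that builds the all-in-one regex from the same index => Regexp hash and keeps each piece inside its column with [^\t]*. all_in_one_regex is a hypothetical name; it assumes the per-field patterns cannot themselves match a tab and contain no \A/\z anchors.

```ruby
# Hypothetical helper: one regex equivalent to "field i contains a match of
# conds[i]" for every i in conds (0-based index => Regexp).
def all_in_one_regex(conds)
  parts = (0..conds.keys.max).map do |i|
    # [^\t]* cannot cross a tab, so each piece stays in its own column.
    # Interpolating a Regexp embeds it as (?flags:source), keeping e.g. /i.
    conds[i] ? "[^\\t]*#{conds[i]}[^\\t]*" : "[^\\t]*"
  end
  Regexp.new("\\A" + parts.join("\\t"))
end

ONE = all_in_one_regex(1 => /xxx\\xxxxx/i, 3 => /2002-09-19/)
puts ONE.source
```

The trade-off: the combined regex scans the line once without allocating field strings, while the split version is easier to read and extend; timing both on the real log is the only reliable way to choose.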



Thanks
Tom


--- Joseph McDonald <joe / vpop.net> wrote:
> 
> can you give a few examples of the lines in the logfile?
> 
> thanks,
> -joe
> 
> Friday, September 20, 2002, 9:22:34 AM, you wrote:
> 
> TB> Hello,
> 
> TB> I have written just a simple script to analyze a log file and (just for fun) I have written
> TB> exactly the same in python to see the difference and ...
> TB> python is almost twice as fast doing the same job :-| (???)
> TB> You can see attached files for sources.
> TB> Environment: P4 1.8GHz, 256MB, WinXP Pro. Python 2.2.1, Ruby 1.7.2-4 - the
> TB> Pragmatic distribution.
> TB> The analyzed file is about 420 Mbytes and python does it in about 60 sec and
> TB> ruby in about 115 sec.
> TB> Have some suggestion how to speed the ruby code?
> 
> TB> Regards
> 
> TB> Tom
> 
> 
> TB> __________________________________________________
> TB> Do you Yahoo!?
> TB> New DSL Internet Access from SBC & Yahoo!
> TB> http://sbc.yahoo.com
> 
> 
> -- 
> Best regards,
>  Joseph                            mailto:joe / vpop.net
> 

--0-51391511-10327667483010
Content-Type: text/plain; name="parse.rb"
Content-Description: parse.rb
Content-Disposition: inline; filename="parse.rb"

# Fastest variant so far: skip comment lines with a regex, then slice the
# 2nd tab-separated field out with String#index instead of splitting the line.
outf = File.new( "found.log", "w" )
# Placeholder pattern (the data is faked): the user name "xxx\xxxxx",
# i.e. domain\user; the backslash is escaped so it is matched literally.
r = /xxx\\xxxxx/i
rc = /^#/
t = Time.new
puts t
File.foreach( "WEB.log" ) do |line|
  next if line =~ rc                    # skip "#Software:", "#Fields:" ... lines
  t1 = line.index( "\t" ) + 1           # start of the 2nd field
  pom = line[t1..(line.index( "\t", t1 ) - 1)]
  if pom =~ r
    outf.puts( line )
  end
end
te = Time.new
puts te
puts "Time: " + (te - t).to_s
outf.close

--0-51391511-10327667483010
Content-Type: application/octet-stream; name="web.log"
Content-Transfer-Encoding: base64
Content-Description: web.log
Content-Disposition: attachment; filename="web.log"

I1NvZnR3YXJlOiBNaWNyb3NvZnQoUikgSW50ZXJuZXQgU2VjdXJpdHkgYW5k
IEFjY2VsZXJhdGlvbiBTZXJ2ZXIgMjAwMA0KI1ZlcnNpb246IDEuMA0KI0Rh
dGU6IDIwMDItMDktMTkgMDA6MDA6MDANCiNGaWVsZHM6IGMtaXAJY3MtdXNl
cm5hbWUJYy1hZ2VudAlkYXRlCXRpbWUJci1ob3N0CWNzLWJ5dGVzCXNjLWJ5
dGVzCWNzLXByb3RvY29sCWNzLXVyaQlzLW9iamVjdC1zb3VyY2UJc2Mtc3Rh
dHVzDQoxMC54eHgueHh4Lnh4eAl4eHhceHh4eHgJTW96aWxsYS80LjAgKHh4
eHh4eHh4KQkyMDAyLTA5LTE5CTAyOjM3OjEwCTEuaW0ueHh4CTM4NgkxODgJ
aHR0cAlodHRwOi8vaW0uLmdpZglWQ2FjaGUJMzA0DQoxMC54eHgueHh4Lnh4
eAl4eHhceHh4eHgJTW96aWxsYS80LjAgKHh4eHh4eHh4KQkyMDAyLTA5LTE5
CTAyOjM3OjEwCTEuaW0ueHh4CTM4NwkxODgJaHR0cAlodHRwOi8vLi4uLmdp
ZglWQ2FjaGUJMzA0DQoxMC54eHgueHh4Lnh4eAl4eHhceHh4eHgJTW96aWxs
YS80LjAgKHh4eHh4eHh4KQkyMDAyLTA5LTE5CTAyOjM3OjEwCTEuaW0ueHh4
CTM4NgkxODgJaHR0cAlodHRwOi8vLi4uLmdpZglWQ2FjaGUJMzA0DQoxMC54
eHgueHh4Lnh4eAl4eHhceHh4eHgJTW96aWxsYS80LjAgKHh4eHh4eHh4KQky
MDAyLTA5LTE5CTAyOjM3OjEwCTEuaW0ueHh4CTM5MgkxODgJaHR0cAlodHRw
Oi8vLi4uLmdpZglWQ2FjaGUJMzA0DQoxMC54eHgueHh4Lnh4eAl4eHhceHh4
eHgJTW96aWxsYS80LjAgKHh4eHh4eHh4KQkyMDAyLTA5LTE5CTAyOjM3OjEw
CTEuaW0ueHh4CTM4NQkxODgJaHR0cAlodHRwOi8vLi4uLmdpZglWQ2FjaGUJ
MzA0DQoxMC54eHgueHh4Lnh4eAl4eHhceHh4eHgJTW96aWxsYS80LjAgKHh4
eHh4eHh4KQkyMDAyLTA5LTE5CTAyOjM3OjEwCTEuaW0ueHh4CTM5MgkxODgJ
aHR0cAlodHRwOi8vLi4uLmpwZwlWQ2FjaGUJMzA0DQoxMC54eHgueHh4Lnh4
eAlhbm9ueW1vdXMJTW96aWxsYS80LjA4IFtlbl0gKFdpbk5UOyBVIDtOYXYp
CTIwMDItMDktMTkJMDI6Mzc6MTAJLQkyNDkJLQktCWh0dHA6Ly8uLi4uaHRt
bAktCTEyMjA5DQoxMC54eHgueHh4Lnh4eAlhbm9ueW1vdXMJTW96aWxsYS80
LjA4IFtlbl0gKFdpbk5UOyBVIDtOYXYpCTIwMDItMDktMTkJMDI6Mzc6MTAJ
LQkzNjMJLQktCWh0dHA6Ly8uLi4uaHRtbAktCTANCjEwLnh4eC54eHgueHh4
CXh4eFx4eHh4eAlNb3ppbGxhLzQuMDggW2VuXSAoV2luTlQ7IFUgO05hdikJ
MjAwMi0wOS0xOQkwMjozNzoxMQk2NC54eHguMTQ1Lnh4eAk0OTgJMjM1CWh0
dHAJaHR0cDovLy4uLi5odG1sCUluZXQJMjAwDQoxMC54eHgueHh4Lnh4eAl4
eHhceHh4eHgJTW96aWxsYS80LjAgKHh4eHh4eHh4KQkyMDAyLTA5LTE5CTAy
OjM3OjEzCXd3dy54eHgueHh4CTUwNAkzNzAxCWh0dHAJaHR0cDovLy4uLi4u
Li4uCUluZXQJMjAwDQoxMC54eHgueHh4Lnh4eAl4eHhceHh4eHgJTW96aWxs
YS80LjAgKHh4eHh4eHh4KQkyMDAyLTA5LTE5CTAyOjM3OjEzCWltLnh4eC54
eHh4CTM3NwkxODgJaHR0cAlodHRwOi8vLi4uLmdpZglWQ2FjaGUJMzA0DQox
MC54eHgueHh4Lnh4eAlhbm9ueW1vdXMJTW96aWxsYS80LjAgKHh4eHh4eHh4
KQkyMDAyLTA5LTE5CTAyOjM3OjEzCS0JMzc2CS0JLQlodHRwOi8vLi4uLmdp
ZgktCTEyMjA5DQoxMC54eHgueHh4Lnh4eAl4eHhceHh4eHgJTW96aWxsYS80
LjAgKHh4eHh4eHh4KQkyMDAyLTA5LTE5CTAyOjM3OjEzCWltLnh4eC54eHgJ
Mzc3CTE4OAlodHRwCWh0dHA6Ly9pbS4uZ2lmCVZDYWNoZQkzMDQNCg
--0-51391511-10327667483010
Content-Type: text/plain; name="parse.py"
Content-Description: parse.py
Content-Disposition: inline; filename="parse.py"

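import re
import time

outf = open( "found.log", "w" )
# Placeholder pattern (the data is faked): the user name "xxx\xxxxx",
# i.e. domain\user; raw string + escaped backslash so it is matched literally.
r = re.compile( r"xxx\\xxxxx", re.I )
rc = re.compile( "^#" )
t = time.time()
print time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.localtime( t ) )
for line in open( "WEB.log" ):
    if rc.search( line ) is not None:   # skip "#Fields:" etc. lines
        continue
    # 2nd field: from just after the 1st tab up to (not including) the 2nd tab
    t1 = line.index( "\t" ) + 1
    pom = line[t1:line.index( "\t", t1 )]
    if r.search( pom ) is not None:
        outf.write( line )              # line already ends with "\n"

te = time.time()
print time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.localtime( te ) )
print "Time: " + str( te - t )
outf.close()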

--0-51391511-10327667483010--