| <schneik / us.ibm.com> wrote:
|
| David Douthitt writes:
| > I have a set of applications that were written
| > in Perl 4, which scan the UNIX system logs and generate color-coded
| > HTML pages for them.
| > All they do is scan the log (41,000 lines plus) and generate HTML
| > files based on them.
|
| Yes, but HOW do they do this?

Well, the Ruby version used to do:

    LogFile.open.readlines.each { }

Now it does

    LogFile.open { |f|
      f.readlines.each { }
    }

Then it checks every line to see if the system name has changed, and
changes FONT color if it has.

| > At one time I had them (Ruby version, Perl
| > version) generating separate files for each system in the log;
| > when I switched to using ksh and grep, the speed increase was incredible.
|
| Again, HOW was this done? (Also recall that you already know that Ruby can
| invoke grep too.) And what is incredible here? Was the speedup 50%? 100%?
| 500%?

It went from something like 3-6 minutes to about .3 seconds. I might note
that I used ksh ("grep foo file > file.out") inside a for-loop instead of
my Perl/Ruby variants.

| > I'm still stuck though, since scanning for one particular host
| > (with 41,000 lines!) can take over 3 minutes.
|
| That doesn't seem (wild guess here) large enough to cause problems if you
| are running on a moderately fast machine. Is the run time pretty much a
| straight linear function of the number of lines of input?

I haven't taken statistical samples and done regression analysis :-)

| The basic problem here is that if you don't show people your (suitably
| sanitized if necessary) code (or at least representative critical pieces of
| your code), the answers you get to such questions will be based largely on
| people's imagination, which may or may not have anything to do with the
| most relevant factors in this case. (Maybe you should have been using Perl
| 5 with compiled regular expressions instead of Perl 4. Maybe you were
| somehow doing unnecessary or unbuffered I/O without realizing it.
| Maybe you were building some sort of table or index for your HTML output
| that inadvertently did something in an O(n**2) fashion. Maybe you stashed
| everything in memory on a machine with insufficient RAM. Maybe any number
| of other things....)

Behave! Be nice! :-) You get out of bed on the wrong side this morning?
Remember to SMILE when you say that! :-)

You think I want to RELEARN Perl from the ground up - corrupting my Perl
and OOP knowledge at the same time? Yuck! I noticed that HP-UX 11 STILL
comes with Perl 4, not Perl 5.

Here is some code:

The following Ruby code was replaced entirely by (ksh):

    for sys in $*
    do
        egrep ":.. [^ ]* (in\.|)$sys" $MESSAGES > ${sys}
    done

(MESSAGES=/var/log/messages)

And it FLIES! Here is the Ruby code:

    #!/usr/bin/ruby

    # Interesting problem reached here...
    #
    # ARGV is [ "arg1", "arg2", "arg3" ]
    # results from a scan are [ [ "str1" ] ]
    #
    # Thus, ARGV.each is "arg1" ... "arg2" ... "arg3" ...
    #       scanXX.each is [ "str1" ] ... [ "str2" ] ...
    #
    # Thus ARGV[0] is "arg1" ; scanXX[0] is [ "str1" ] ;
    #      and scanXX[0][0] is "str1"
    #
    # This explains a lot.

    systems = Array.new

    class String
      def systemName
        self.scan("^... .. ..:..:.. ([^ ]*)")
      end
    end

    File.open("/var/log/messages").each { |line|
      line.chomp!
      # sys = line.scan("^... .. ..:..:.. ([^ ]*)")
      sys = line.systemName
      systems = systems | sys
      if not ARGV.include?(sys[0][0])
        print(" ", line, "\n")
      end
    }

    # (systems.sort!).each { |sys|
    #   print(" ", sys, "\n")
    # }

[.......end.......]

I thought about posting Perl code, but..... this is a RUBY mailing list...

    #!/usr/bin/env ruby

    #----------------------------------
    # CLASSES
    #----------------------------------

    require("Html.rb")
    require("getopts")

    # class FixNum
    #   def format_color
    #     format("#%04X", self)
    #   end
    # end

    class Logs
      at_exit { Logs.end_body }

      def Logs.header (str = nil)
        Html.header {
          Html.title str
        }
      end

      def Logs.color_table (colors)
        Html.table {
          Html.table_row {
            colors.each { |machine, color|
              Html.table_data(color) {
                print machine
              }
            }
          }
        }
      end

      def Logs.date_heading(date)
        Html.named_anchor (date)
        Html.table (Colors::BREAKLINES, "100%") {
          Html.table_row {
            Html.table_data {
              print " "
              Html.em {
                Html.strong {
                  print date
                }
              }
            }
            Html.table_data(Colors::BREAKLINES, "RIGHT") {
              Html.anchor("Top", "#HTMLTop")
            }
          }
        }
      end
    end

    class OutputFile < File
      TMPDIR = "/tmp/log2html.rb"

      def OutputFile.open (sys)
        File.open(OutputFile::TMPDIR + "/" + sys + ".html", "w")
      end
    end

    class LogFile < File
      def LogFile.open
        if ($*[0] == nil)
          super("/var/log/messages")
        else
          super($*[0])
        end
      end

      def LogFile.unlink
        if ($*[0] != nil)
          super($*[0])
        end
      end

      def LogFile.copy
        f = File.expand_path $*[0]
        `/bin/cat #{f}`            # -- this is the speed up
      end
    end

    class String
      def log_fields
        self.chomp!

        # Interesting pattern: subsystem field (such as "in.identd[27062]:")
        # is not guaranteed to be present. So on occasion, $5 == nil.
        self =~ /(.{6,6}) (.{8,8}) ([^ ]*) (([^[]+).*: (.*)|.*)$/
        [ $1, $2, $3, $4, $5 ]
      end
    end

    #----------------------------------
    # MAIN PROGRAMME
    #----------------------------------

    getopts("1s")

    machines = {
      "sys0" => Colors.green,
      "sys1" => Colors.blue,
      "sys2" => Colors.purple,
      "sys3" => Colors.red,
      "sys4" => Colors.yellow,
      "sys5" => Colors.blue_green
    }

    sysfiles = Hash.new

    Html.html {
      Logs.header "System Logs"
      Html.body(Colors.buff) {
        Html.target "main"
        Html.named_anchor "HTMLTop"
        Html.heading "messages"

        print "This page was generated on "
        print Time.now.ctime

        Logs.color_table machines
        Html.paragraph

        if ($OPT_1)                # -- this is new
          Html.pre {               # -- this is new
            print LogFile.copy     # -- this is new
          }                        # -- this is new
        else
          old_date = ""
          old_sys = ""

          LogFile.open { |f|
            f.readlines.each { |line|
              date, time, sys, entry, subsys = line.log_fields

              if date != old_date
                Html.end_pre
                Logs.date_heading date
                Html.begin_pre
                Html.font (machines[sys])   # not always needed.... but is at first
              else
                if sys != old_sys
                  Html.font (machines[sys])
                end
              end

              print line, "\n"
              old_date = date
              old_sys = sys
            }
          }
        end
      }
    }

The box says this:

    # uname -a
    Linux mysys.nowhere.nope.nyet.nada.zip.zilch.nicht 2.2.9-27mdk #1 Mon Jun 14 16:44:05 CEST 1999 i586 unknown

It's a Compaq Prolinea 5100e - 100MHz Pentium. 32Megs of memory, 600M
disk (yes, the disk fills up routinely :-)

Now aren't you glad you asked for code? :-)
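
For comparison with the ksh loop, here is a minimal sketch of doing the same per-host split in one streaming pass in Ruby. It is only an illustration, not a measured fix: the helper name split_by_host, the HOST_FIELD pattern, and the sample log lines are all hypothetical, and it assumes the host name is the fourth whitespace-separated field of each line. It is written for a current Ruby rather than the interpreter the scripts above were run on.

```ruby
require "stringio"

# Hypothetical pattern: assumes syslog-style lines of the form
# "Mon DD HH:MM:SS host ...", so the host is the fourth field.
HOST_FIELD = /\A\S+\s+\S+\s+\S+\s+(\S+)/

# Hypothetical helper: read the log once, line by line, and collect
# the lines belonging to each wanted host.
def split_by_host(io, wanted)
  buckets = Hash.new { |hash, host| hash[host] = [] }
  io.each_line do |line|
    match = HOST_FIELD.match(line)
    next unless match                 # skip lines that do not fit the format
    host = match[1]
    buckets[host] << line.chomp if wanted.include?(host)
  end
  buckets
end

# Made-up sample data standing in for /var/log/messages.
sample = StringIO.new(<<~LOG)
  Jun 14 16:44:05 sys0 kernel: eth0 up
  Jun 14 16:44:06 sys1 in.identd[27062]: started
  Jun 14 16:44:07 sys0 syslogd: restart
LOG

result = split_by_host(sample, ["sys0"])
result.each { |host, lines| puts "#{host}: #{lines.length} lines" }
```

Unlike readlines, each_line hands over one line at a time, so the whole log never has to sit in memory as an array; whether that is what actually costs the minutes here would still need timing on the real file.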