Brian Candler <b.candler / pobox.com> wrote:
>> Anyway, below is the code. I ran it through the profiler, but the top
>> two most costly ops were Dir.foreach, which I don't see any way to
>> optimize*, and the loop that gathers environment information, which I
>> again see no way to optimize.
> 
> Could you post your profiling? If you run using "time", how much user 
> CPU versus system CPU are you using?
> 
> Have you tried using Dir.open.each instead of Dir["/foo/*"]? Maybe 
> globbing is expensive.
> 
> Your environment loop does a fresh sysread(1024) for each var=val pair, 
> even if you've only consumed (say) 7 bytes from the previous call. You 
> would make many fewer system calls if you read a big chunk and chopped 
> it up afterwards. This may also avoid non-byte-aligned reads.
> 
> I would also be tempted to write one long unpack instead of lots of 
> string slicing and unpacking. The overhead here may be negligible, but 
> the code may end up being smaller and simpler. e.g.
> 
>  struct = ProcTableStruct.new(*psinfo.unpack(<<PATTERN))
> i i i i
> i i i i
> i i L L
> L x4i ss
> ..etc
> PATTERN
> 
> Perhaps you could combine it with your struct building, e.g.
> 
>      FIELDS = [
>         [:flag,"i"],      # process flags (deprecated)
>         [:nlwp,"i"],      # number of active lwp's in the process
>         ...
>         [:size,"s"],      # size of process in kbytes
>         [:rssize,"s"],    # resident set size in kbytes
>         [nil,"X4"],       # skip pr_pad1
>         ... etc
> 
> HTH,
> 
> Brian.

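On the point about reading one big chunk and chopping it up afterwards, here is a minimal sketch of the idea. It is not the library's code: the method name read_env_block, the 64 KB limit, and the assumption that the environment block is a run of NUL-terminated "VAR=val" strings ending at the first empty string are all mine, and a StringIO stands in for the real file.

```ruby
require 'stringio'

# Hypothetical helper: read up to +limit+ bytes with a single call and split
# the result in Ruby, instead of issuing one small read per variable.
# +io+ is anything that responds to #read (a File in real use).
def read_env_block(io, limit = 64 * 1024)
  data = io.read(limit) || ""
  env = {}
  data.split("\0").each do |pair|
    break if pair.empty?            # assumed end-of-block marker
    key, value = pair.split("=", 2)
    env[key] = value if value       # ignore entries without an "="
  end
  env
end

# Stand-in data; anything after the empty string is ignored.
sample = StringIO.new("HOME=/root\0PATH=/usr/bin:/bin\0TERM=xterm\0\0garbage")
env = read_env_block(sample)
puts env["PATH"]
```

The whole block costs one read call however many variables it holds, which is where the savings would come from.
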
Here's a metaprogramming approach that I was able to generate mostly by
running regexes to transform the original code. I've only tackled
/proc/#{file}/psinfo, but it should be fairly simple to extend to
the other files as well:

# The Sys module serves as a namespace only.
module Sys

   # The ProcTable class encapsulates process table information.
   class ProcTable

      # The version of the sys-proctable library
      VERSION = '0.8.0'

      private

      PRNODEV = -1 # non-existent device

       
      # Each entry maps a field name (a symbol) to one segment of an unpack
      # format string. "@N" moves to absolute byte offset N in the string, and
      # the text after it is the format of the data to unpack at that offset.
      FIELDS=[
            [:flag , "@0 i"],
            [:nlwp , "@4 i"],
            [:pid , "@8 i"],
            [:ppid , "@12 i"],
            [:pgid , "@16 i"],
            [:sid , "@20 i"],
            [:uid , "@24 i"],
            [:euid , "@28 i"],
            [:gid , "@32 i"],
            [:egid , "@36 i"],
            [:addr , "@40 L"],
            [:size , "@44 L"],
            [:rssize , "@48 L"],
            [:ttydev , "@56 i"],
            [:pctcpu , "@60 S"],
            [:pctmem , "@62 S"],
            [:start , "@64 L"],
            [:time , "@72 L"],
            [:ctime , "@80 L"],
#note that the A format specifier strips trailing spaces and NULs on its own,
#so I don't have to call .strip in the ps method
            [:fname , "@88 A16"],
            [:psargs , "@104 A80"],
            [:wstat , "@184 i"],
            [:argc , "@188 i"],
            [:argv , "@192 L"],
            [:envp , "@196 L"],
            [:dmodel , "@200 C"],
            [:taskid , "@204 i"],
            [:projid , "@208 i"],
            [:nzomb , "@212 i"],
            [:poolid , "@216 i"],
            [:zoneid , "@220 i"],
            [:contract , "@224 i"],
            [:lwpid , "@236 i"],
            [:wchan , "@244 L"],
            [:stype , "@248 C"],
            [:state , "@249 C"],
            [:sname , "@250 a1"],
            [:nice , "@251 C"],
            [:syscall , "@252 S"],
            [:pri , "@256 i"],
            [:clname , "@280 A8"],
            [:name , "@288 A16"],
            [:onpro , "@304 i"],
            [:bindpro , "@308 i"],
            [:bindpset , "@312 i"]
      ]

      field_names,format_strings=FIELDS.transpose

      # first_pass_fill is defined on the class itself (def self.) because it
      # is called from the class method ps. It returns the struct explicitly,
      # since a multiple assignment evaluates to the unpacked array.
      eval <<-"end;"
        def self.first_pass_fill string
          struct=ProcTableStruct.new

          #{ field_names.collect{|x| "struct.#{x}"}.join(", ") } = string.unpack "#{format_strings.join ' '}"
          struct
        end
      end;

      #repeat the above with a new array instead of FIELDS and a new method name
      #for any other file you want to unpack this way

=begin
This eval will define a method with the following code. The arrays and
metaprogramming are just an easier way to manage the format string and
field names, so that you can understand them when maintenance time comes around.

        def self.first_pass_fill string
          struct=ProcTableStruct.new

          struct.flag, struct.nlwp, struct.pid, struct.ppid, struct.pgid,
          struct.sid, struct.uid, struct.euid, struct.gid, struct.egid, struct.addr,
          struct.size, struct.rssize, struct.ttydev, struct.pctcpu, struct.pctmem,
          struct.start, struct.time, struct.ctime, struct.fname, struct.psargs,
          struct.wstat, struct.argc, struct.argv, struct.envp, struct.dmodel,
          struct.taskid, struct.projid, struct.nzomb, struct.poolid, struct.zoneid,
          struct.contract, struct.lwpid, struct.wchan, struct.stype, struct.state,
          struct.sname, struct.nice, struct.syscall, struct.pri, struct.clname,
          struct.name, struct.onpro, struct.bindpro, struct.bindpset = string.unpack "@0
          i @4 i @8 i @12 i @16 i @20 i @24 i @28 i @32 i @36 i @40 L @44 L @48 L @56 i
          @60 S @62 S @64 L @72 L @80 L @88 A16 @104 A80 @184 i @188 i @192 L @196 L @200
          C @204 i @208 i @212 i @216 i @220 i @224 i @236 i @244 L @248 C @249 C @250 a1
          @251 C @252 S @256 i @280 A8 @288 A16 @304 i @308 i @312 i"
          struct
        end
=end

      public

      ProcTableStruct = Struct.new("ProcTableStruct", *field_names)

      #if you are reading from multiple files and keep their field names
      #in separate variables, you'll want to replace field_names
      #with a concatenation of those arrays
      

      # In block form, yields a ProcTableStruct for each process entry that you
      # have rights to. This method returns an array of ProcTableStruct's in
      # non-block form.
      #                
      # If a +pid+ is provided, then only a single ProcTableStruct is yielded or
      # returned, or nil if no process information is found for that +pid+.
      #
      # Example:
      #         
      #   # Iterate over all processes
      #   ProcTable.ps do |proc_info| 
      #      p proc_info             
      #   end           
      #      
      #   # Print process table information for only pid 1001
      #   p ProcTable.ps(1001)                               
      #                       
      def self.ps(pid = nil)
         array = block_given? ? nil : []
         Dir.foreach("/proc") do |file|
            next if file =~ /\D/  # Skip non-numeric entries under /proc

            # Only return information for a given pid, if provided
            if pid
               next unless file.to_i == pid
            end
               
            # Skip over any entries we don't have permissions to read
            begin
               psinfo = IO.read("/proc/#{file}/psinfo")
            rescue StandardError, Errno::EACCES
               next
            end
               

            #the first pass fill just gets the raw data and unpacks it
            struct = first_pass_fill psinfo
            #now we do the transformations we need on the few fields that need it
            struct.pctcpu= (struct.pctcpu*100).to_f / 0x8000
            struct.pctmem= (struct.pctmem*100).to_f / 0x8000
            struct.start=Time.at(struct.start)
            #the fields that needed stripping were handled by unpack

            #repeat the above for other files that we need to deal with

            if block_given?
               yield struct
            else
               array << struct
            end
         end   
            
         # array is nil in block form, so only index it when it exists
         pid && array ? array[0] : array
      end                      
   end
end

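To see what the "@" offsets and the "A" directive do in isolation, here is a small sketch on a made-up 16-byte buffer. The layout is invented for the demonstration and has nothing to do with the real psinfo structure, and it assumes a native int is 4 bytes.

```ruby
# A 16-byte stand-in for a binary record: a native int at offset 0,
# four bytes of padding, then an 8-byte NUL-padded name at offset 8.
buffer = [1234].pack("i") + ("\0" * 4) + ["init"].pack("a8")

# "@N" jumps to absolute offset N, so the padding needs no "x" directives.
pid, name = buffer.unpack("@0 i @8 A8")

# Because the offsets are absolute, the fields can be listed in any order.
name_first, pid_second = buffer.unpack("@8 A8 @0 i")

# "a" keeps the NUL padding; "A" drops trailing NULs and spaces only.
raw_name = buffer.unpack("@8 a8").first
padded   = ["  x "].pack("a8").unpack("A8").first

p pid, name, raw_name, padded
```

Note that "A" leaves leading whitespace alone, so it matches #strip only for fields that are padded on the right.
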

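If the eval feels heavy, the same table can drive the unpacking without generating source text. This is a swapped-in variant, not what the code above does: it passes the unpacked values straight to the struct constructor, so the field order in the table must match the struct's member order. The name build_unpacker and the three-field demo table are hypothetical.

```ruby
# Hypothetical helper: turn a [[name, format], ...] table into a Struct class
# plus a lambda that fills an instance of it from a binary string.
def build_unpacker(fields)
  names, formats = fields.transpose
  struct_class = Struct.new(*names)
  format = formats.join(" ")
  filler = lambda { |string| struct_class.new(*string.unpack(format)) }
  [struct_class, filler]
end

# Invented layout for the demonstration: two native ints and a 16-byte name.
demo_fields = [
  [:pid,   "@0 i"],
  [:ppid,  "@4 i"],
  [:fname, "@8 A16"]
]

demo_struct, fill = build_unpacker(demo_fields)

record = [42, 1].pack("i i") + ["ruby"].pack("a16")
info = fill.call(record)
p info
```

One call to build_unpacker per file would then replace the repeated eval blocks.
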
-- 
Chanoch (Ken) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/