sean.swolfe / gmail.com wrote:
> Hi gang. Sorry I haven't been able to respond to my last post about
> Stop Word dictionaries. Been all too busy, but the posted info was very
> useful.
> 
> Anyways, I have a bit of a situation. I have a collection of files,
> around 10,000, that I need to parse and then suck that data into a
> database, along with their linked images. I've had a script that has
> been working pretty much for over 99.5% of the articles. Both the
> article data, and the images were getting imported fine.
> 
> The images also have to go through a few processing steps before being
> put into the database. They are resized to meet a certain constraint,
> of a new document format, and are also resized again a few more times
> to create 2 sizes of Thumbnails. I've been using the RMagick library to
> do the image resizing.
> 
> I then had to make a change, because I realized that I wasn't
> accommodating Animated GIF's and the resulting images that were saved
> only contained the first frame. So now I added just a routine around
> the resizing to iterate through an ImageList and do all the resizing
> for each frame.
> 
> Now with this change I get a "[BUG] Segmentation fault" error. I
> thought maybe I was starving the memory resources and the GC couldn't
> keep up. (I'm uncomfortably unfamiliar with how the Ruby GC works as
> opposed to Java or .NET GC), so I explicitly call garbage_collect after
> about 100 imports. Still get the same error.
> 
> This happens in both Ruby 1.8.2 and 1.8.4 on Mac OSX as well as Linux.
> Also, if it possibly means anything, this script is run in a Rails
> environment using the runner script. It will run successfully for about
> 4000 articles before it bombs out.
> 
> Here is an excerpt of the possibly offensive code:
> # image_object is an ActiveRecord object created
> # a little before this code.
> # get the ImageList from the image file.
> image_file_list = Magick::ImageList.new(@old_site_path + image_path)
> # create ImageLists for the thumbnails copied from the original ImageList
> smaller_list = image_file_list.copy
> smallest_list = image_file_list.copy
> tiniest_list = image_file_list.copy
> 
> # loop through the images in the list
> for image_index in 0...image_file_list.length
>   image_file = image_file_list[image_index]
>   # resize the loaded image to the main constraints
>   image_file.change_geometry!('150x150') do |cols, rows, img|
>     img.resize!(cols, rows)
>     image_object.original_x = cols
>     image_object.original_y = rows
>   end
> 
>   smaller_list[image_index] = image_file.change_geometry('110x110') do |cols, rows, img|
>     image_object.thumb_x = cols
>     image_object.thumb_y = rows
>     img.resize(cols, rows)
>   end
>   smallest_list[image_index] = image_file.change_geometry('91x91') do |cols, rows, img|
>     image_object.small_thumb_x = cols
>     image_object.small_thumb_y = rows
>     img.resize(cols, rows)
>   end
>   tiniest_list[image_index] = image_file.change_geometry('50x50') do |cols, rows, img|
>     image_object.tiny_thumb_x = cols
>     image_object.tiny_thumb_y = rows
>     img.resize(cols, rows)
>   end
> end
> 
> image_object.original_filename = File.basename(image_path)
> image_object.title = "#{ review_data[:artist] }: #{ review_data[:title] }"
> 
> image_object.image_data = image_file_list.to_blob
> image_object.tiny_thumb = tiniest_list.to_blob    # <-- segmentation fault usually happens here.
> image_object.big_thumb = smaller_list.to_blob
> image_object.small_thumb = smallest_list.to_blob
> 
> 
> Thanks in advance....
> 

Anything that runs for 4000 iterations and then bombs almost has to be 
an out-of-memory condition. Depending on the size of your images, 100 
images could represent a very sizable chunk of memory. For example, 100 
4 megapixel images from a digital camera require >1600MB of memory _if_ 
you've configured ImageMagick to use 8 bits per channel (4 million 
pixels x 4 channels x 1 byte is about 16MB per image). By default 
ImageMagick is configured to use 16 bits per channel, which doubles 
that figure.
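
If you want to check that estimate for your own image sizes, a few lines 
of Ruby will do it. This is only a rough sketch: it assumes ImageMagick 
keeps 4 channels per pixel in memory and counts 1MB as 1,000,000 bytes, 
and the method name is made up for illustration.

```ruby
# Rough estimate of the memory needed to hold uncompressed images.
# Assumptions: 4 channels per pixel, 1 MB = 1_000_000 bytes.
# The method name is hypothetical, not part of RMagick.
def estimated_image_megabytes(image_count, pixels_per_image, bits_per_channel, channels = 4)
  bytes_per_pixel = channels * bits_per_channel / 8
  image_count * pixels_per_image * bytes_per_pixel / 1_000_000
end

puts estimated_image_megabytes(100, 4_000_000, 8)   # 8 bits per channel
puts estimated_image_megabytes(100, 4_000_000, 16)  # 16 bits per channel
```

Real usage will be higher than this, since the resized copies and the 
blobs themselves also take up space.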

Why don't you try putting a GC.start after the .to_blob calls to free up 
those image lists?
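
Something along these lines, as a sketch. The helper name and the 
block-based shape are made up for illustration; your own load, resize 
and to_blob code would go inside the block. GC.start is the only call 
I'm actually suggesting.

```ruby
# Hypothetical helper: process one image path at a time and force a
# garbage collection pass after each one, so image lists that are no
# longer referenced can be reclaimed before the next file is loaded.
def each_image_with_gc(image_paths)
  image_paths.each do |image_path|
    # The caller's block would build the ImageLists, resize them and
    # call to_blob; its local variables go out of scope when it returns.
    yield image_path
    GC.start
  end
end

processed = []
each_image_with_gc(["a.gif", "b.gif"]) { |path| processed << path }
puts processed.length
```

Running the collector after every image is slow, so once the crash is 
gone you may be able to back off to every 10 or 20 images.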

Here's a description of memory problems that can arise when you're using 
RMagick: http://rubyforge.org/forum/forum.php?thread_id=1374&forum_id=1618