Nokogiri goes bump (or segfaults) in the night…
Recently we’ve been working on upgrading the version of Ruby the site runs on, from 1.8.7 to 1.9.3. We’re doing this for a number of reasons, including improved performance, new language features, and trying to stay relatively current with our tech.
Things went fairly smoothly until we hit a problem where one of our specs would segfault (i.e. actually crash the Ruby interpreter) every time we ran it on our (OS X Lion) development machines:
ruby(89240,0x7fff7b133960) malloc: *** error for object 0x24: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Or sometimes we would see the more dramatic:
.../Users/joff/.rvm/gems/ruby-1.9.3-p194@redbubble/gems/activesupport-3.0.12/lib/active_support/whiny_nil.rb:48: [BUG] Segmentation fault
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.4.0]
-- Control frame information -----------------------------------------------
c:0041 p:0054 s:0161 b:0161 l:000160 d:000160 METHOD /Users/joff/.rvm/gems/ruby-1.9.3-p194@redbubble/gems/activesupport-3.0.12/lib/active_support/whiny_nil.rb:48
c:0040 p:---- s:0154 b:0154 l:000153 d:000153 FINISH
c:0039 p:0019 s:0152 b:0152 l:000151 d:000151 METHOD /Users/joff/.rvm/gems/ruby-1.9.3-p194@redbubble/gems/mail-2.2.19/lib/mail/header.rb:165
c:0038 p:0123 s:0148 b:0144 l:000143 d:000143 METHOD /Users/joff/.rvm/gems/ruby-1.9.3-p194@redbubble/gems/mail-2.2.19/lib/mail/header.rb:160
c:0037 p:0033 s:0137 b:0136 l:000135 d:000135 METHOD /Users/joff/.rvm/gems/ruby-1.9.3-p194@redbubble/gems/mail-2.2.19/lib/mail/message.rb:1105
c:0036 p:0018 s:0131 b:0131 l:000130 d:000130 METHOD /Users/joff/.rvm/gems/ruby-1.9.3-p194@redbubble/gems/mail-2.2.19/lib/mail/message.rb:581
c:0035 p:0048 s:0127 b:0127 l:000126 d:000126 METHOD /Users/joff/.rvm/gems/ruby-1.9.3-p194@redbubble/gems/actionmailer-3.0.12/lib/action_mailer/test_case.rb:48
... REMOVED MASSIVE AMOUNTS OF STACKTRACE ....
-- Other runtime information -----------------------------------------------
* Loaded script: /Users/joff/.rvm/gems/ruby-1.9.3-p194@redbubble/bin/rspec
* Loaded features:
0 enumerator.so
1 /Users/joff/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/x86_64-darwin11.4.0/enc/encdb.bundle
2 /Users/joff/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/x86_64-darwin11.4.0/enc/trans/transdb.bundle
3 /Users/joff/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/defaults.rb
4 /Users/joff/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/x86_64-darwin11.4.0/rbconfig.rb
2635 /Users/joff/.rvm/gems/ruby-1.9.3-p194@redbubble/gems/mail-2.2.19/lib/mail/elements/address_list.rb
2636 /Users/joff/.rvm/gems/ruby-1.9.3-p194@redbubble/gems/mail-2.2.19/lib/mail/fields/to_field.rb
2637 /Users/joff/.rvm/gems/ruby-1.9.3-p194@redbubble/gems/mail-2.2.19/lib/mail/fields/subject_field.rb
2638 /Users/joff/dev/redbubble-19/lib/red_cloth/formatters/compact_text.rb
[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html
Abort trap: 6
A bit of searching around suggested that the Nokogiri gem might be at fault here, specifically during Garbage Collection.
We tested this by adding a ‘GC.disable’ line at the top of our spec, which prevented the crash from happening. Unfortunately, simply turning off garbage collection isn’t a viable option, so a fix needed to be found.
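For anyone wanting to try the same diagnostic with less collateral damage, here is a minimal sketch that switches the garbage collector off around a single block rather than the whole spec file. The `without_gc` helper is a hypothetical name of ours, not something from RSpec or Nokogiri, and it is only meant for narrowing down a crash, not as a fix.

```ruby
# Diagnostic sketch only: run a block with the garbage collector switched off,
# then restore the previous state. `without_gc` is a hypothetical helper name.
def without_gc
  # GC.disable returns true only if GC had already been disabled.
  was_disabled = GC.disable
  yield
ensure
  # Re-enable GC unless the caller had already turned it off.
  GC.enable unless was_disabled
end

# Stand-in for the code under suspicion (here just some throwaway allocations).
result = without_gc { Array.new(1_000) { |i| i.to_s }.size }
puts result
```

If the crash disappears inside the block and comes back outside it, that points at something being freed incorrectly during collection, which is what we saw.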
Further research seemed to suggest that the issue lies somewhere in the interaction between the Nokogiri gem and libxml2, and that using a newer version of that library could alleviate the problem.
As we use homebrew on our development machines, we wanted to use that to recompile Nokogiri with the updated library.
First, we installed some prerequisites:
libiconv
Recent versions of homebrew do not provide libiconv, so we compiled it ourselves:
curl "http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.13.1.tar.gz" -o /tmp/libiconv-1.13.1.tar.gz
cd /tmp
tar xvfz libiconv-1.13.1.tar.gz
cd libiconv-1.13.1
# This places it alongside your brew installations
./configure --prefix=/usr/local/Cellar/libiconv/1.13.1
make
sudo make install
libxml2 and libxslt
brew install libxml2 libxslt
Homebrew does provide these libraries, but by default brew won’t symlink them into the library path. This is because they are already provided by OS X and are somewhat core to its operation, so putting Homebrew’s copies on the path could screw things up. This is fine, as we explicitly point to them when we compile Nokogiri.
Nokogiri
We also use bundler, so instead of manually compiling Nokogiri, we can configure bundler with the compile flags we need:
bundle config build.nokogiri --with-xml2-include=/usr/local/Cellar/libxml2/2.7.8/include/libxml2 --with-xml2-lib=/usr/local/Cellar/libxml2/2.7.8/lib --with-xslt-dir=/usr/local/Cellar/libxslt/1.1.26 --with-iconv-include=/usr/local/Cellar/libiconv/1.13.1/include --with-iconv-lib=/usr/local/Cellar/libiconv/1.13.1/lib
Then we can run a bundle install and Nokogiri will be compiled with the correct libraries.
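That flag line is long and easy to get wrong, so here is a small sketch that assembles it from the Cellar root and the library versions used above. The `nokogiri_build_flags` helper is hypothetical (our own name), and the paths assume the same Homebrew layout as in this post; adjust the versions to whatever brew actually installed on your machine.

```ruby
# Hypothetical helper: assembles the Nokogiri build flags from a Homebrew
# Cellar root and a hash of library versions.
def nokogiri_build_flags(cellar, versions)
  xml2  = File.join(cellar, 'libxml2',  versions.fetch(:libxml2))
  xslt  = File.join(cellar, 'libxslt',  versions.fetch(:libxslt))
  iconv = File.join(cellar, 'libiconv', versions.fetch(:libiconv))
  [
    "--with-xml2-include=#{xml2}/include/libxml2",
    "--with-xml2-lib=#{xml2}/lib",
    "--with-xslt-dir=#{xslt}",
    "--with-iconv-include=#{iconv}/include",
    "--with-iconv-lib=#{iconv}/lib"
  ].join(' ')
end

# Versions taken from this post; yours may differ.
flags = nokogiri_build_flags('/usr/local/Cellar',
                             { libxml2: '2.7.8', libxslt: '1.1.26', libiconv: '1.13.1' })

# Print the command rather than running it, so it can be checked first.
puts "bundle config build.nokogiri #{flags}"
```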
Even after this we were still seeing segfaults, so something was not quite right. In our output we would see a message like “Nokogiri was built against LibXML version 2.7.8, but has dynamically loaded 2.7.3”.
Doing a nokogiri -v showed:
warnings: []
nokogiri: 1.4.3.1
ruby:
  version: 1.9.3
  platform: x86_64-darwin11.4.0
  engine: ruby
libxml:
  binding: extension
  compiled: 2.7.8
  loaded: 2.7.8
Which looked correct. So why was it loading the older libxml version when we ran the spec? It turns out that other gems in our Gemfile were also using libxml2, and they were loading the version supplied by the operating system. When Nokogiri loaded afterwards, it would just use the already loaded library. Listing Nokogiri at the top of our Gemfile ensures the newer library is loaded before the older one has a chance to be, and thus (hopefully) fixes the issue.
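As a rough illustration of that ordering, a Gemfile might start like the sketch below. Only the Nokogiri line reflects what we actually did; the source URL, the `rails` line and the placeholder gem name are assumptions for the sake of the example.

```ruby
# Gemfile (sketch; everything except the nokogiri line is a placeholder)
source 'https://rubygems.org'

# Listed first so that its libxml2 is loaded before any other gem can pull in
# the copy supplied by the operating system.
gem 'nokogiri'

gem 'rails', '3.0.12'          # assumed; the post only shows activesupport 3.0.12
gem 'some_other_libxml_gem'    # hypothetical gem that also links against libxml2
```

This works because gems are required in the order the Gemfile lists them, so whichever gem touches libxml2 first decides which copy gets loaded.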
Once we did this (and removed Gemfile.lock and ran bundler again), our spec showed much better results:
$ rspec spec/mailers/activity_mailer_spec.rb
...........
Finished in 6.63 seconds
11 examples, 0 failures
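Since `nokogiri -v` on the command line looked fine while the spec run did not, it can help to make the same comparison from inside the process that is actually crashing. The sketch below assumes that `Nokogiri::VERSION_INFO` exposes the same `libxml` / `compiled` / `loaded` keys that `nokogiri -v` prints; `libxml_mismatch` is a hypothetical helper name of ours.

```ruby
# Hypothetical helper: returns a warning string when the compiled and loaded
# libxml2 versions differ, or nil when they match or cannot be determined.
def libxml_mismatch(info)
  libxml   = info.fetch('libxml', {})
  compiled = libxml['compiled']
  loaded   = libxml['loaded']
  return nil if compiled.nil? || loaded.nil? || compiled == loaded

  "Nokogiri was compiled against libxml2 #{compiled} but loaded #{loaded}"
end

begin
  require 'nokogiri'
  # Assumption: VERSION_INFO has the same layout as the `nokogiri -v` output.
  message = libxml_mismatch(Nokogiri::VERSION_INFO)
  warn(message) if message
rescue LoadError
  # Nokogiri is not installed in this environment; nothing to check.
end
```

Dropping something like this into a spec helper would surface a mismatch early in the run, instead of leaving it to show up later as a segfault.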