2008/9/10 Xiong Chiamiov <xiong.chiamiov+ruby_forum / gmail.com>:
> Ruby 1.8.6 with Oniguruma installed and working (everywhere else, this
> seems to be my problem).
>
> Let me preface this by saying that I am new to Ruby (and kinda jumped
> in, rather than learning it properly), and regexes are not my thing -
> that's why I have nifty regex-checkers.
>
> I am trying to extract some parts out of a string
> ("<p><b>'Algebra'</b><br>") that I scraped from some HTML. I'm getting
> nil returned from the expression:
>
> Oniguruma::ORegexp.new("(?<=<p><b>').*(?='</b><br>)").scan(scraped_html)
>
> with scraped_html being the string mentioned above.
>
> Doing some experimenting, I have found that the first part works just as
> planned (i.e., everything except the lookahead). Using wildcards (. and
> *) works as well:
>
> Oniguruma::ORegexp.new("(?<=<p><b>').*(?=.)").scan(scraped_html)
>
> returns [#<MatchData "Foo'</b><br">, #<MatchData "Bar'</b><br">], as
> expected. However, anything else (<, b, \w, etc.) causes the regex to
> not match.
>
> I am quite befuddled about this, though I (almost certainly) know it is
> my fault. Any help would be much appreciated.

With 1.9:

irb(main):001:0> s="<p><b>'Algebra'</b><br>"
=> "<p><b>'Algebra'</b><br>"
irb(main):002:0> s.scan %r{(?<=<p><b>').*(?='</b><br>)}
=> []
irb(main):003:0> s.scan %r{(?<=<p><b>').*?(?='</b><br>)}
=> ["Algebra"]

Note the non-greedy match (.*? instead of .*). In cases like this I
usually prefer to do the following:

irb(main):005:0> s.scan %r{<p><b>'(.*?)'</b><br>}
=> [["Algebra"]]

That is, I use a capture group to extract the part that I am interested
in, rather than lookbehind and lookahead.

Kind regards

robert

-- 
use.inject do |as, often| as.you_can - without end
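
For reference, here is a minimal sketch of the capture-group approach applied to a fragment with more than one entry. The helper names (extract_titles, extract_greedy), the constants and the sample string are made up for illustration and do not come from the original script. It only assumes plain Ruby regexps and String#scan, not the Oniguruma gem.

```ruby
# Hypothetical helpers (not from the thread): collect every title that
# appears as <p><b>'...'</b><br> in a scraped fragment.

# Non-greedy group: .*? stops at the first '</b><br> it can reach.
TITLE_PATTERN = %r{<p><b>'(.*?)'</b><br>}

# Same pattern with a greedy group, kept only for comparison.
GREEDY_PATTERN = %r{<p><b>'(.*)'</b><br>}

def extract_titles(html)
  # With one capture group, scan returns an array of one-element arrays;
  # flatten turns [["Foo"], ["Bar"]] into ["Foo", "Bar"].
  html.scan(TITLE_PATTERN).flatten
end

def extract_greedy(html)
  html.scan(GREEDY_PATTERN).flatten
end

# Made-up sample with two entries.
sample = "<p><b>'Foo'</b><br><p><b>'Bar'</b><br>"

p extract_titles(sample)  # one result per entry
p extract_greedy(sample)  # a single result spanning both entries
```

With two entries in the input, the greedy group runs on to the last '</b><br> in the string and so swallows the markup between the entries, while the non-greedy group yields one clean result per entry.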