Simon Schuster wrote:
> compile a whole lot of ruby regex examples, with commentary on what's
> going on. the few websites I've found, and books I've looked through
> just touch on the basics with minimal examples and explanation, or are
> specifically for perl/etc. a nice-looking and lengthy site could be
> extremely helpful to a lot of people starting with ruby, I imagine.
> 
> - dealing with unicode?
This one bothered me a lot, but the solution is simple. At the beginning 
of your script, set
$KCODE = "u"

This makes Ruby's regular expressions treat strings as UTF-8, so that, 
for example, . matches a whole multibyte character instead of a single 
byte. I assume the default behavior will be improved by Ruby 2.0, but 
I'm not using 1.9, so I can't say for sure.
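For readers on newer Rubies: starting with 1.9, $KCODE is ignored, because every string carries its own encoding and regexes follow it. The sketch below assumes Ruby 2.0 or later, where source files are UTF-8 by default. It shows that . matches one character rather than one byte, and that \p{L} matches a letter in any script.

```ruby
# Assumes Ruby 2.0+ (UTF-8 source by default); no $KCODE needed.
text = "naïve café 日本語"

# . matches one whole character, not one byte of a multibyte character.
chars = "日本語".scan(/./)

# \p{L} is a Unicode property class: any letter, in any script.
words = text.scan(/\p{L}+/)

p chars  # => ["日", "本", "語"]
p words  # => ["naïve", "café", "日本語"]
```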

> - mingling literal " / \ etc, with their regex counterparts, in ways
> that would be daunting for the inexperienced
The first thing to keep in mind is that it never hurts to escape a 
punctuation character unnecessarily in a double quoted (soft quoted) 
string or regex. (Letters are different: \n, \d and friends have special 
meanings.) So if you aren't sure, "\"", "\'", and "\\" are all okay, as 
are /\"/, /\//, and %r|\/| (%r being an alternative way to write a 
regex). But you only need to escape characters that have special 
meaning. In a slash-delimited regex, a slash has special meaning, but in 
a %r regex it does not (there, the delimiter you picked is the special 
character instead):
%r{/} is the same as /\//, since the slash in the former does not need 
to be escaped.
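Here is a small sketch of the delimiter rules above. It writes a regex for a single literal slash three ways, then checks that escaping punctuation needlessly in a double quoted string is harmless. The variable names are just for illustration.

```ruby
# Three ways to write a regex matching one literal slash.
slash_literal = /\//    # slash is the delimiter here, so it must be escaped
slash_percent = %r{/}   # inside %r{...} the slash is an ordinary character
slash_escaped = %r|\/|  # escaping it anyway does no harm

# Needless escapes of punctuation are harmless in double quoted strings.
double_quote = "\""     # a literal double quote
apostrophe   = "\'"     # the same string as "'"

p [slash_literal, slash_percent, slash_escaped].map { |re| "a/b" =~ re }
# => [1, 1, 1]
```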

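If you use Regexp.new(" ... "), then the regexp comes from a string, so 
it must follow the escaping rules for strings first: you need to escape 
double quotes, and every backslash meant for the regex has to be 
doubled, so Regexp.new("\\d+") is equivalent to /\d+/.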
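A minimal sketch of building regexes from strings. It also uses Regexp.escape, a standard Ruby method the post doesn't mention, which backslash-escapes every metacharacter in a string. That makes it useful when part of a pattern comes from user input.

```ruby
# Regexp.new takes a string, so string escaping happens first.
from_double = Regexp.new("\\d+")  # "\\d" in the string becomes \d in the regex
from_single = Regexp.new('\d+')   # single quotes pass \d through unchanged
literal     = /\d+/

# Double quotes inside a double quoted pattern string must be escaped.
quoted = Regexp.new("say \"(\\w+)\"")

# Regexp.escape turns "a.b" into a pattern where "." is a literal dot.
dotted = Regexp.new(Regexp.escape("a.b"))

p "order 42"[from_double]  # => "42"
p 'say "hi"'[quoted, 1]    # => "hi"
p "axb" =~ dotted          # => nil, because "." no longer matches any character
```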

A single quoted string is sometimes called "hard quoted". This means 
nothing is expanded and almost nothing has special meaning, so almost 
nothing needs to be escaped. Backslash is not an escape character here, 
with two exceptions: a backslash before a single quote escapes the quote 
('it\'s'), and a doubled backslash ('\\') produces a single backslash.
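This sketch compares single and double quoted strings. It shows that a backslash in single quotes is literal, except in the two cases just described.

```ruby
plain   = 'a\nb'   # four characters: a, \, n, b
escaped = "a\nb"   # three characters: a, newline, b
its     = 'it\'s'  # \' gives a literal single quote
one_bs  = '\\'     # \\ gives ONE backslash

p plain.length    # => 4
p escaped.length  # => 3
p its             # => "it's"
p one_bs.length   # => 1
```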

Sorry if these rules are confusing; you will get used to them. The way 
to learn regular expressions is to use them, and you will get 
comfortable with them when you need them.

> - just generally "higher-level" regex, leave the "intro to regex" to
> all the other places. that's easy enough to find.
> 
> 

Here's one of mine:
/<a[^>]+?href=['"]?(.+?)['"\s>][^>]*>/im
This matches a link (an <a> tag) and captures its href value. 
Throughout the regex I use [^>] frequently, which means "any character 
that does not end the tag". Think of [^>]* as a safer .* for use inside 
a tag: it can never run past the closing >.
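Here is the link regex from above, run over a small made-up HTML fragment with String#scan. The second link is uppercase and split across lines, so it exercises /i. It also shows that [^>] matches newlines even without /m.

```ruby
# The link regex from the post, unchanged.
LINK = /<a[^>]+?href=['"]?(.+?)['"\s>][^>]*>/im

# A hypothetical fragment: one normal link, one uppercase link across lines.
html = <<-HTML
<p>See <a href="http://www.ruby-lang.org/">Ruby</a> and
<A class="nav"
   HREF='/docs/index.html'>the docs</A>.</p>
HTML

# scan returns one array of capture groups per match; flatten them.
urls = html.scan(LINK).flatten
p urls  # => ["http://www.ruby-lang.org/", "/docs/index.html"]
```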
Interesting bits:
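-using +? makes the match non-greedy (lazy): it will match as little as 
possible. *? does the same thing, but I find less use for it, since its 
minimum is zero characters and it often ends up matching an empty string.
-the /i and /m at the end mean "case insensitive" and "multi-line". In 
Ruby, /m lets . match newlines too (Perl calls this /s); ^ and $ always 
match at line boundaries in Ruby. You can mix and match /i, /m, and /x 
(extended: ignores whitespace in the regex and allows # comments).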
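A short sketch of the points in that list: greedy vs. lazy quantifiers, *? settling for an empty match, and the /m, /x and /i options. The sample strings and the PHONE pattern are made up for illustration.

```ruby
html = "<b>one</b> and <b>two</b>"

greedy = html[/<b>(.+)<\/b>/, 1]   # .+ grabs as much as it can
lazy   = html[/<b>(.+?)<\/b>/, 1]  # .+? stops at the first </b>
empty  = "aaa"[/a*?/]              # *? is happy to match nothing at all

# /m: in Ruby this lets . match a newline.
no_m   = "a\nb" =~ /a.b/
with_m = "a\nb" =~ /a.b/m

# /x: whitespace is ignored and # starts a comment.
PHONE = /
  (\d{3})   # exchange
  -         # literal dash
  (\d{4})   # line number
/x

p greedy                # => "one</b> and <b>two"
p lazy                  # => "one"
p empty                 # => ""
p no_m                  # => nil
p with_m                # => 0
p "555-1234"[PHONE, 2]  # => "1234"
p "HELLO" =~ /hello/i   # => 0
```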

I don't know what your level is, so this may be a bit too cryptic, but 
you can probably puzzle it out if you are complaining about regex 
tutorials being too basic.

Dan