On Feb 12, 8:25 am, "J. mp" <joaomiguel.pere... / gmail.com> wrote:
> Gavin Kistner wrote:
> > OK, but *why* aren't they allowed? You haven't described exactly what
> > your requirements are. Is it because you can't have two non-letters in
> > a row? Is it because the string must contain at least three letters?

You didn't answer these questions.

> > BTW, where are these requirements coming from? Are these business
> > requirements that must be enforced? Are you just making up what you
> > think people should probably have to use as a name? Or are you just
> > trying to learn regexp?
>
> It's a business requirement. The user name will be used before the
> domain, for example:
> I have the domain http://somedomain.com and for each user a unique URL
> will exist, like http://user.name.somedomain.com
> http://david_coperfield.somedomain.com
> http://andreas-blast.somedomain.com
>
> This is my business requirement, so I can only allow user names that can
> be used in a URI.

So the question is, what is legal in the host part of a URI? The best
resource I can find is RFC 3986 [1], and it says:
"The most common name registry mechanism is the Domain Name System
(DNS). A registered name intended for lookup in the DNS uses the
syntax defined in Section 3.5 of [RFC1034] and Section 2.1 of
[RFC1123]."


Section 2.1 of RFC 1123 [2] says:
"The syntax of a legal Internet host name was specified in RFC-952
[DNS:4].  One aspect of host name syntax is hereby changed: the
restriction on the first character is relaxed to allow either a letter
or a digit.  Host software MUST support this more liberal syntax.

Host software MUST handle host names of up to 63 characters and SHOULD
handle host names of up to 255 characters."


RFC 952 [3] says:
"<domainname> ::= <hname>
<hname> ::= <name>*["."<name>]
<name>  ::= <let>[*[<let-or-digit-or-hyphen>]<let-or-digit>]"


So, my reading of that (and I'm not an expert) is that a machine name
MAY have digits in it (including at the start or end), MAY have
hyphens (but not at the start or end), may NOT have underscores, and
may be pretty darn long: up to 63 characters must be supported.
(Though it makes sense to put some sort of bound on it - if you think
30 chars is OK, so be it.)

A regexp for this, allowing multiple dotted names joined together:

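# Regexp for a single name.
# \A and \z anchor the match so the whole string must be a legal name;
# without them the pattern also matches inside an illegal string.
/\A[a-z\d](?:[a-z\d-]*[a-z\d])?\z/i

# Regexp for 1 or more of those joined by periods
/\A(?:[a-z\d](?:[a-z\d-]*[a-z\d])?)(?:\.[a-z\d](?:[a-z\d-]*[a-z\d])?)*\z/i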
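
For what it's worth, here is a rough sketch of how those patterns might be wired into a validation check. The names LABEL, HOSTNAME, MAX_LENGTH and valid_user_name? are made up for illustration, and the 30-character bound is only the assumed policy mentioned above, not something the RFCs require. The match is anchored with \A and \z so the whole string has to be legal.

```ruby
# Sketch only: LABEL, HOSTNAME, MAX_LENGTH and valid_user_name? are
# hypothetical names, and the 30-character limit is an assumed policy.

# One name: starts and ends with a letter or digit, hyphens only inside.
LABEL = /[a-z\d](?:[a-z\d-]*[a-z\d])?/i

# One or more names joined by periods, anchored to the whole string.
HOSTNAME = /\A#{LABEL}(?:\.#{LABEL})*\z/i

MAX_LENGTH = 30

# True if name fits the length bound and the host name syntax.
def valid_user_name?(name)
  name.length <= MAX_LENGTH && HOSTNAME.match?(name)
end

valid_user_name?("user.name")        # => true
valid_user_name?("andreas-blast")    # => true
valid_user_name?("david_coperfield") # => false (underscore)
valid_user_name?("-oops")            # => false (leading hyphen)
```

Building the dotted pattern out of the single-name pattern keeps the two from drifting apart if you tweak one of them later.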


[1] http://www.gbiv.com/protocols/uri/rfc/rfc3986.html
[2] http://rfc-ref.org/RFC-TEXTS/1123/chapter2.html#sub1
[3] http://rfc.net/rfc952.html#sA.