On 2007-05-07 16:39:12 +0900 (Mon, May), Nanyang Zhan wrote:
> Don't get me wrong, because I just want to know how to separate English
> words from a string with ruby.
> There are strings (UTF-8 encoded) to record people's name,
> like:
>
> 摩根·弗里曼 Morgan Freeman
> 布鲁斯·威利斯 Bruce Willis
> 李小明 Lee xiao ming
> these strings containing Chinese name(without space between characters),
> separated by a space, following an English name
>
> or
> Frank Darabont
> Just an English name.
>
> Would you give me an idea how to separate these Chinese characters(if
> any)?

Maybe a regexp similar to
/^([^qazwsxedcrfvtgbyhnujmikolpQAZWSXEDCRFVTGBYHNUJMIKOLP ]+)/
would help?
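
A minimal sketch of how that regexp could be applied, assuming UTF-8
strings and a Ruby version that handles them natively (1.9 or later).
The helper names leading_chinese_name and english_name are made up for
illustration. Note that the pattern captures any leading run of
characters that are not ASCII letters or spaces, so digits or
punctuation at the start of a line would be captured as well.

```ruby
# encoding: utf-8

# The regexp from the message: a leading run of characters that are
# neither ASCII letters nor spaces.
LEADING_NON_ASCII = /^([^qazwsxedcrfvtgbyhnujmikolpQAZWSXEDCRFVTGBYHNUJMIKOLP ]+)/

# Hypothetical helper: returns the leading non-ASCII-letter run
# (the Chinese name, if there is one) or nil.
def leading_chinese_name(line)
  match = LEADING_NON_ASCII.match(line)
  match && match[1]
end

# Hypothetical helper: returns what is left once the leading run
# has been removed, with surrounding whitespace stripped.
def english_name(line)
  line.sub(LEADING_NON_ASCII, "").strip
end
```

For a line such as "李小明 Lee xiao ming" the first helper should give
the Chinese part and the second the English part; for "Frank Darabont"
the first helper returns nil and the second returns the line unchanged.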

Does [a-zA-Z] include Chinese characters? In a Polish locale it includes
Polish non-ASCII characters, so I guess it might include Chinese ones.
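
One way to settle that question is to test it directly. The sketch
below assumes Ruby 1.9 or later with UTF-8 strings, where character
ranges in a regexp are compared by code point rather than by locale, so
[a-zA-Z] should match only the 52 ASCII letters. The helper name
ascii_letter? is made up for illustration.

```ruby
# encoding: utf-8

ASCII_LETTER = /[a-zA-Z]/

# Hypothetical helper: true if the string contains at least one
# character matched by [a-zA-Z].
def ascii_letter?(text)
  !!(text =~ ASCII_LETTER)
end
```

Calling it on "M", on a Polish letter such as "ł", and on a Chinese
character such as "摩" shows which of them the range covers.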

I guess you want to split a given string into words (separated by spaces),
and then check whether the first word starts with or includes at least one
Chinese character.
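
The split-and-check idea could be sketched as follows. This assumes
Ruby 1.9 or later with UTF-8 source and strings, where the regexp
property \p{Han} matches Han (Chinese) characters; that property was
not available in the Ruby 1.8 series. The helper name split_name is
made up for illustration.

```ruby
# encoding: utf-8

# Assumption: Ruby 1.9+ with UTF-8 strings, so \p{Han} is available.
HAN_CHARACTER = /\p{Han}/

# Hypothetical helper: returns [chinese_name_or_nil, english_name].
def split_name(line)
  stripped = line.strip
  # Split off the first whitespace-separated word; keep the rest intact.
  first, rest = stripped.split(" ", 2)
  return [nil, ""] if first.nil?

  if first =~ HAN_CHARACTER
    # The first word contains at least one Chinese character.
    [first, rest.to_s]
  else
    # No Chinese part: the whole line is the English name.
    [nil, stripped]
  end
end
```

Unlike the negated character class, this only treats the first word as
a Chinese name when it really contains a Han character, so a line that
starts with a digit or punctuation is left alone.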

-- 
No virus found in this outgoing message.
Checked by 'grep -i virus $MESSAGE'
Trust me.

