On Sat, Dec 4, 2010 at 9:38 AM, Rajarshi Chakravarty
<raj_plays / yahoo.com> wrote:
> Hi,
> I read records from a text file and insert them in the DB.
> Sometimes the data contains non ascii characters and I want to keep
> these out of the DB.
> How can I cleanse them and where?
> I mean should it be done while reading data or has ActiveRecord got any
> feature to do it?

What exactly do you mean by "non-ASCII"? Do you mean extended ASCII
(aka high ASCII), non-printable ASCII, or Unicode?

Without knowing details, I would suggest a regular expression like:

  text.gsub(/[^[:ascii:]]/, '')

Or, if you're using a Ruby older than 1.9 or want cross-version compatibility:

  text.gsub(/[^\x00-\x7F]/, '')
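To see what that does, here is a minimal sketch. The helper name
strip_non_ascii is made up for illustration (it is not part of Ruby or
ActiveRecord), and the sample string uses \u escapes, so the snippet
itself assumes Ruby 1.9 or later and a validly encoded string:

```ruby
# Hypothetical helper name, used only for this example.
def strip_non_ascii(text)
  # Drop every character outside the 7-bit ASCII range 0x00-0x7F.
  # gsub returns a new string; the original is left untouched.
  text.gsub(/[^\x00-\x7F]/, '')
end

sample = "caf\u00e9 na\u00efve"   # "café naïve", written with escapes
puts strip_non_ascii(sample)      # prints "caf nave"
```

Note that the accented characters are removed outright, not replaced
with an unaccented equivalent.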

Note that the class [:ascii:] and the range in the second regular
expression match all valid ASCII characters, including the control
characters such as \r (0x0D) and \n (0x0A). If you only want letters,
digits, punctuation, spaces, and newlines, then you need to exclude
the other control characters and try something like:

  text.gsub(/[^\x20-\x7E\x0D\x0A]/, '')
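As for where to do it, one option is to clean each line as you read it,
before it gets anywhere near the DB. A rough sketch, assuming Ruby 1.9
or later: printable_ascii is a made-up helper name, and StringIO only
stands in for your real text file. The range stops at \x7E so that DEL
(0x7F), which is itself a control character, is dropped too:

```ruby
require 'stringio'

# Hypothetical helper: keep printable ASCII (0x20-0x7E) plus CR and LF.
def printable_ascii(text)
  text.gsub(/[^\x20-\x7E\x0D\x0A]/, '')
end

# StringIO stands in for the real file; File.open would work the same way.
input = StringIO.new("Ren\u00e9\tSmith\x07\nplain line\n")

# Clean every line as it is read.
cleaned = input.each_line.map { |line| printable_ascii(line) }

cleaned.each { |line| puts line.inspect }
```

Be aware that this also strips tabs (0x09). If your records are
tab-delimited, add \x09 to the character class.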

HTH,
Ammar