I have a need for something like this as well, but I need to
replace the chars with plain ASCII equivalents rather than a single
placeholder. Any ideas how to do that?

	I ended up looking up the escape codes for all the chars, like "\322"
and friends, so I could replace, say, curly quotes with standard quotes
and so on.
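
	For what it's worth, here is a minimal sketch of that kind of
lookup-table replacement. It assumes UTF-8 input and a Ruby new enough to
understand "\u" escapes (1.9 or later). ASCII_EQUIVALENTS and
asciify_with_map are made-up names for illustration, and the table only
covers a handful of characters.

```ruby
# Hypothetical lookup table: non-ASCII character => plain ASCII stand-in.
# Only a few entries are shown; extend it for the characters you care about.
ASCII_EQUIVALENTS = {
  "\u2018" => "'",    # left single curly quote
  "\u2019" => "'",    # right single curly quote
  "\u201C" => '"',    # left double curly quote
  "\u201D" => '"',    # right double curly quote
  "\u2013" => "-",    # en dash
  "\u2014" => "--",   # em dash
  "\u2026" => "..."   # horizontal ellipsis
}.freeze

# Replaces every non-ASCII character in +text+ with its entry in
# ASCII_EQUIVALENTS, or with +fallback+ when the table has no entry.
def asciify_with_map(text, fallback = "?")
  text.gsub(/[^\x00-\x7F]/) { |char| ASCII_EQUIVALENTS.fetch(char, fallback) }
end
```

Characters that are missing from the table still come out as the
placeholder, so the table can grow as new characters show up.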

	I will play with your code a bit and see if I can make it do what I  
want. Thanks for sharing it though.

Cheers-
-Ezra



On Jan 22, 2006, at 10:41 AM, Levin Alexander wrote:

> Hi,
>
> I needed a method to convert a piece of text to plain ASCII and
> replace all non-ASCII chars with a placeholder.  I could not find
> anything in the stdlib so I wrote one.
>
> I'd love to hear your comments (or pointers to existing libraries for
> this task).
>
> -Levin
>
>
> #!/usr/bin/ruby
>
> require 'iconv'
>
> class String
>
>   # removes all characters which are not part of ascii
>   # and replaces them with +replacement+
>   #
>   # +replacement+ is supposed to be the same encoding as +source+
>   #
>   def asciify(replacement = "?", target = "ASCII", source = "UTF-8")
>     intermediate = "UCS-4"
>     pack_format = "N*"
>     i = Iconv.new(intermediate, source)
>
>     ucs4 = i.iconv(self)
>     repl = i.iconv(replacement).unpack(pack_format)
>
>     s = ucs4.unpack(pack_format).collect { |codepoint|
>       codepoint < 128 ? codepoint : repl
>     }.flatten.pack(pack_format)
>
>     return Iconv.new(target, intermediate).iconv(s)
>   end
> end
>
> if __FILE__ == $0
>   require 'test/unit'
>
>   class TestAsciify < Test::Unit::TestCase
>     def test_asciify
>       assert_equal "Iñtërnâtiônàlizætiøn".asciify, "I?t?rn?ti?n?liz?ti?n"
>       assert_equal "Mötorhead".asciify("(removed)"), "M(removed)torhead"
>     end
>   end
> end
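
A possible alternative to the Iconv round trip quoted above, assuming a
Ruby that has String#encode (1.9 or later, where strings carry their own
encoding). This is only a sketch; asciify_via_encode is a hypothetical
name and not part of Levin's code.

```ruby
# Converts +text+ to US-ASCII, substituting +replacement+ for every
# character that has no ASCII representation (undef: :replace) and for
# every invalid byte sequence in the input (invalid: :replace).
def asciify_via_encode(text, replacement = "?")
  text.encode("US-ASCII", invalid: :replace, undef: :replace, replace: replacement)
end
```

Like the original method, this swaps in one placeholder per non-ASCII
character; it does not pick per-character equivalents.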

-Ezra Zygmuntowicz
WebMaster
Yakima Herald-Republic Newspaper
http://yakimaherald.com
ezra / yakima-herald.com
blog: http://brainspl.at