Logan Capaldo wrote:
> Here's an example
> % cat test.txt
> ��Hello darkness my old friend, I've come to talk to you again.
> What's new pussy-cat?
> Hello world!

Mine comes out exactly the same, even through cat. I just never noticed
the first characters before.

> As you can see I saved this file as UTF-16. You can also see that my
> cat isn't quite as smart as yours, we see the BOM at the beginning.
> The next step is to write a ruby script that can handle this:

> Sadly, this will _only_ handle utf-16 encoded files, it can't even
> handle utf-8.

Here's the code I've decided I'm happy with:

#!/usr/bin/env ruby

search_term = /#{ARGV[0]}/
notes_dir = Dir.new(".").to_a - ['.', '..']
positive_results = []

notes_dir.each do |note|
  fl = `cat "#{note}"`
  if fl =~ search_term
    positive_results.push(note)
  end
end

positive_results.uniq.each do |x|
  puts "\"#{x}\""
end

The search script is in the directory I want to traverse (~/notes). I
just want to get the names of files that contain the search terms. From
there, I can pipe the output to another script.

Come to think of it, I'm still only checking against ARGV[0] as a search
term. I should be iterating through ARGV. Easy fix.

>
> Detecting utf-16 or ascii isn't so bad, if you know for sure the
> utf-16 will have a BOM, you just have to look for it. (It's going to
> be either 0xFEFF or 0xFFFE). On the other hand if you have to handle
> more than just utf-16 and ascii, things are going to get confusing
> quick, it's difficult to detect the proper encoding of a file,
> especially since so many encodings are supersets of ascii.

I'll just let `cat` do that for me :-)

--
Posted via http://www.ruby-forum.com/.
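
For reference, here is a rough sketch of the two follow-ups mentioned in this post: iterating over all of ARGV instead of only ARGV[0], and looking for the UTF-16 BOM (bytes FE FF or FF FE) directly in Ruby rather than shelling out to cat. The names decode_note and matching_notes are hypothetical. The sketch also assumes that a file without a BOM is ASCII or UTF-8, and that a file must match every search term to be listed. Neither assumption comes from the thread, so adjust to taste.

```ruby
#!/usr/bin/env ruby
# Sketch only: decode_note and matching_notes are hypothetical helper names.

# The two byte orders a UTF-16 BOM can appear in at the start of a file.
UTF16_BOMS = {
  "\xFF\xFE".b => Encoding::UTF_16LE,
  "\xFE\xFF".b => Encoding::UTF_16BE
}.freeze

# Read a file as raw bytes, check the first two bytes for a UTF-16 BOM,
# and return the contents as a UTF-8 string.
# Assumption: a file without a BOM is ASCII or UTF-8.
def decode_note(path)
  raw = File.binread(path)
  encoding = UTF16_BOMS[raw.byteslice(0, 2)]
  if encoding
    # Drop the 2-byte BOM, then transcode the rest to UTF-8.
    raw.byteslice(2..-1)
       .force_encoding(encoding)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  else
    # No BOM: treat as UTF-8 and replace any invalid byte sequences.
    raw.force_encoding(Encoding::UTF_8).scrub
  end
end

# Return the sorted names of regular files in dir whose text matches
# every term. Assumption: "every" rather than "any" term must match.
def matching_notes(dir, terms)
  patterns = terms.map { |term| Regexp.new(term) }
  (Dir.entries(dir) - ['.', '..']).sort.select do |name|
    path = File.join(dir, name)
    next false unless File.file?(path)
    text = decode_note(path)
    patterns.all? { |pattern| text =~ pattern }
  end
end

# Run from inside the notes directory; does nothing without search terms.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  matching_notes(".", ARGV).each { |name| puts "\"#{name}\"" }
end
```

Saved under a hypothetical name such as search.rb in the notes directory, it would be run as `ruby search.rb Hello darkness` and print each matching filename in double quotes, one per line, the same output format as the script above. Note that the script file itself sits in the directory being searched, so it can show up in its own results.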