Issue #12275 has been updated by tad (Tadashi Saito).
File benchmark.rb added
File v1.patch added
Sorry for the delay. I implemented `#undump` as `v1.patch`, based on my "string_undump" gem.
Please also see https://github.com/ruby/ruby/pull/1765.
## Spec
Roughly speaking, my implementation follows the steps below:
1. If `self` is wrapped in double quotes, ignore them
2. Parse `self` and build a new string by appending characters one by one
   1. If an escaped character (one beginning with a backslash) is found, unescape it and add it to the new string
   2. Otherwise, add the character to the new string as is
3. Return the produced string
Note that this method does not require the wrapping double quotes. This helps in cases
like the one in the initial proposal, `"\\\t".undump`.
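The steps above can be sketched in pure Ruby. This is only an illustration, not the code in `v1.patch`: the names `naive_undump`, `SIMPLE_ESCAPES` and `ESCAPE_PATTERN` are hypothetical, the result is assumed to be UTF-8, and malformed input is not rejected.

```ruby
# Hypothetical pure-Ruby sketch of the steps above; this is not the code in v1.patch.
SIMPLE_ESCAPES = {
  'n' => "\n", 'r' => "\r", 't' => "\t", 'f' => "\f",
  'v' => "\v", 'b' => "\b", 'a' => "\a", 'e' => "\e"
}.freeze

# A backslash followed by \u{...}, \uXXXX, \xXX, or any single character.
ESCAPE_PATTERN = /\\(?:u\{(\h+)\}|u(\h{4})|x(\h{2})|(.))/m

def naive_undump(dumped)
  # Step 1: ignore a wrapping pair of double quotes, if present.
  body = dumped.sub(/\A"(.*)"\z/m, '\1')

  # Step 2: work on raw bytes so that \xXX escapes can be appended freely.
  bytes = body.b.gsub(ESCAPE_PATTERN) do
    if $1 || $2
      [($1 || $2).hex].pack('U').b      # \u{XXXXX} or \uXXXX: one code point
    elsif $3
      $3.hex.chr.b                      # \xXX: one byte
    else
      SIMPLE_ESCAPES.fetch($4, $4).b    # \n etc.; \\ \" \# stand for themselves
    end
  end

  # Step 3: return the produced string (UTF-8 is an assumption of this sketch).
  bytes.force_encoding(Encoding::UTF_8)
end

puts naive_undump('"a\tb"').inspect   # the input contains a backslash and "t"
puts naive_undump("\\\t").inspect     # unwrapped input: a backslash and a TAB
```

Characters that are not part of an escape are copied through untouched by `gsub`, which corresponds to the "otherwise" branch of step 2.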
Supported escaping formats are:
* Backslash itself
  * \\\\
* Double quote after backslash
  * \" yields double quote itself
* One ASCII character after backslash
  * \n \r \t \f \v \b \a \e
* "u" after backslash (Unicode)
  * \uXXXX form
  * \u{XXXXX} form (number of hex digits is variable)
* "x" and two hex digits after backslash
  * \xXX form
* "#$", "#@" or "#{" after backslash
  * These are embedded-Ruby-variable-like strings
I was careful to cover all escaping cases in `String#dump` so that `s.dump.undump == s`
holds wherever possible. Unfortunately, some limitations remain; see the Limitations section below.
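As a quick illustration of the round-trip property, the snippet below dumps one sample string per escape format and checks that `undump` restores it. It assumes a Ruby that provides `String#undump` (Ruby 2.5 or later); the sample strings are chosen here for illustration and are not taken from the patch's tests.

```ruby
# One sample per escape format; the samples are illustrative assumptions.
originals = [
  "\\",                  # backslash itself
  "\"",                  # double quote
  "\n\r\t\f\v\b\a\e",    # single-character escapes
  "\u3042",              # dumped in \uXXXX form
  "\u{1F600}",           # dumped in \u{XXXXX} form
  "\xFF".b,              # dumped in \xXX form (binary string)
  "\#$x \#@y \#{z}"      # interpolation-like sequences
]

round_trips = originals.map do |s|
  dumped = s.dump
  puts dumped            # show the escaped form
  dumped.undump == s     # does the round trip restore the original?
end

puts round_trips.all?
```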
## Testing
I added some test cases to test/ruby/test_string.rb
https://github.com/ruby/ruby/pull/1765/files#diff-25eb856a893dbc53c562f6865b215083
and they pass, of course.
Other test cases based on the original gem also pass.
https://gist.github.com/tadd/634b6e4b09b6dfe7c8b97bca138d31ec
Furthermore, at this year's RubyKaigi, I learned about AFL (American Fuzzy Lop).
http://lcamtuf.coredump.cx/afl/
(I was fortunate to learn about it. Thank you, shyouhei!)
It can torture-test my implementation. I checked my original gem (string_undump 0.1.0) with AFL 2.36b,
and confirmed that:
* It did not cause a SEGV during one night, over about 9 million executions
* It did not cause a round-trip error during one night, over about 10 million executions
  * `s == s.dump.undump` was always `true`
* I ran it in a UTF-8 environment
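For readers without AFL at hand, the round-trip property can also be probed with a small random check. This is a much weaker substitute for the fuzzing described above, not a reproduction of it: the `POOL` of code points, the `random_string` helper, the seed and the iteration count are all assumptions made for this sketch, and it requires a Ruby that provides `String#undump` (2.5 or later).

```ruby
# Code points that String#dump renders as plain characters, named escapes,
# or \u forms. The pool itself is an assumption of this sketch.
POOL = [*0x20..0x7E, 0x09, 0x0A, 0x0D, 0x00E9, 0x3042, 0x1F600].freeze

# Hypothetical helper: build a random UTF-8 string from the pool.
def random_string(rng, max_len = 16)
  Array.new(rng.rand(0..max_len)) { POOL.sample(random: rng) }.pack('U*')
end

rng = Random.new(12275)  # fixed seed so that runs are repeatable
failures = Array.new(10_000) { random_string(rng) }.reject { |s| s.dump.undump == s }
puts "round-trip failures: #{failures.size}"
```

Because the pool includes `#`, `$`, `@`, `{`, backslash and double quote, random strings regularly hit the escaping paths rather than only plain characters.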
## Performance
This may be a boring result, but I'll also mention performance. In a really naive
benchmark, `undump` is about 9 times faster than `eval(string)`.
Try the attached `benchmark.rb` file, and feel free to experience Ruby 3x3x3 now...
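A comparison of this kind can be sketched with the standard `benchmark` library. This is not the attached `benchmark.rb`: the sample string and the iteration count are assumptions, and the measured ratio will vary by machine and Ruby version. It requires a Ruby that provides `String#undump` (2.5 or later).

```ruby
require 'benchmark'

# Sample input and iteration count are assumptions, not taken from benchmark.rb.
dumped = "tab\there \"quoted\" \u3042".dump
n = 50_000

Benchmark.bm(8) do |x|
  x.report('eval')   { n.times { eval(dumped) } }    # parse as a Ruby string literal
  x.report('undump') { n.times { dumped.undump } }   # unescape only
end
```

Only trusted input should be passed to `eval` here, since it executes arbitrary Ruby code.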
## Limitations
Sorry, the current implementation has some limitations.
* Can't undump a non-ASCII-compatible string
  * `'"abc"'.encode('utf-16le').undump` raises `Encoding::CompatibilityError` for now
  * This is simply due to my lack of implementation knowledge. Advice is welcome
* Can't correctly undump a dumped string that was produced from a non-ASCII-compatible string
  * String#dump appends `.force_encoding("encoding name here")` to the dumped string,
    but String#undump doesn't parse this. Please check the code below:
~~~ ruby
s = 'abc'.encode('utf-16le')
puts s.dump #=> "a\x00b\x00c\x00".force_encoding("UTF-16LE")
s == s.dump.undump #=> false
~~~
  * I believe this is a rare case, and the method is convenient enough even as it stands
  * But of course, I will not commit the patch if this limitation is not acceptable
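One possible way to live with these limitations is to transcode to UTF-8 before dumping and back after undumping. The sketch below assumes the string can be transcoded to UTF-8 without loss and that `String#undump` is available (Ruby 2.5 or later); `dump_via_utf8` and `undump_via_utf8` are hypothetical helper names, not part of the patch.

```ruby
# Hypothetical helpers (not part of v1.patch): transcode around dump/undump.
def dump_via_utf8(str)
  str.encode(Encoding::UTF_8).dump
end

def undump_via_utf8(dumped, encoding)
  dumped.undump.encode(encoding)
end

s = 'abc'.encode('utf-16le')
restored = undump_via_utf8(dump_via_utf8(s), s.encoding)
puts restored == s
```

The caller has to carry the original encoding separately, which is the information the `.force_encoding(...)` suffix would otherwise provide.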
## Future work
* Improve support for non-ASCII-compatible encodings (eliminate the limitations above)
* Optimize for single-byte-optimizable strings
## Conclusion
I implemented `#undump` to be the "someone" matz mentioned. The code
* covers most practical cases that `dump` handles
* is safe enough from SEGV
* runs far faster than `eval()`
but some limitations still exist.
Any comments?
----------------------------------------
Feature #12275: String unescape
https://bugs.ruby-lang.org/issues/12275#change-67942
* Author: asnow (Andrew Bolshov)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
I think it would be useful to have a function that converts an input string as if it had been written in a single-quoted ("prime quoted") or double-quoted string literal. It's part of metaprogramming.
Example:
~~~ ruby
class String
  # Return a new string as if the receiver had been written inside quotes.
  # The optional argument selects the quoting: true for single ("prime") quotes,
  # false for double quotes (the default).
  def unescape(prime = false)
    eval(prime ? "'#{self}'" : "\"#{self}\"")
  end
end
"\\\t".unescape # => "\t"
~~~
Other requests:
http://www.rubydoc.info/github/ronin-ruby/ronin-support/String:unescape
http://stackoverflow.com/questions/4265928/how-do-i-unescape-c-style-escape-sequences-from-ruby
http://stackoverflow.com/questions/8639642/best-way-to-escape-and-unescape-strings-in-ruby
Already implemented in:
http://www.rubydoc.info/github/ronin-ruby/ronin-support/String:unescape
---Files--------------------------------
benchmark.rb (193 Bytes)
v1.patch (8.95 KB)