Issue #12744 has been updated by Martin Dürst.


Bouke van der Bijl wrote:

> I don't really have a use case for reverse_chars, but I added it for symmetry with the other methods.

Other languages may do that, but Ruby doesn't add something just for symmetry.

> I meant `str.reverse_each_char`; I typo'd it in the issue, but it's correct in the patch. The allocating equivalent would be `str.chars.reverse.each`. I could use `reverse_each_char` in Sprockets, where we need to iterate over the string backwards to check that it ends with certain characters (and know what it ends with).

Wouldn't this usually be done with a Regexp? If using a Regexp directly isn't efficient, what about just applying the reverse of the Regexp to the reverse of the string (so that it gets applied from the start)?
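As a rough illustration of both suggestions, the sketch below finds the run of trailing characters from a hypothetical set (`;`, space, newline) that a string ends with. The character set and the method names are made up for the example; neither comes from Sprockets or the patch.

```ruby
# Hypothetical example: which characters from the set [; \n] does a string end with?

# Direct approach: a pattern anchored at the end of the string with \z.
def trailing_run_regexp(str)
  str[/[; \n]*\z/]
end

# Mirrored approach: apply the reversed pattern (now anchored at the start
# with \A) to the reversed string, then reverse the match back.
# Note that str.reverse allocates a reversed copy of the string.
def trailing_run_reversed(str)
  str.reverse[/\A[; \n]*/].reverse
end

s = "foo();\n  "
p trailing_run_regexp(s)    # => ";\n  "
p trailing_run_reversed(s)  # => ";\n  "
```

The mirrored form only pays off when the forward pattern is slow (for example, because the engine may retry it from many start positions); reversing a pattern by hand becomes error-prone as patterns grow.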


> Not sure why you think we can't make it faster than `reverse.each_char`; I've already implemented it and attached the patch. It uses `rb_enc_left_char_head`, which every encoding implements, to scan a string backwards.

Some of these implementations are not exactly trivial. Please look at `enc/shift_jis.c` or `enc/gb18030.c`. Please try your code on something like

```ruby
"\x95\x95".force_encoding('Shift_JIS') * x
```

where you increase `x` and see whether the time increases linearly or not.
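One way to run that experiment is a small timing loop. This is a sketch: the doubling sizes are arbitrary, and since `reverse_each_char` exists only with the patch applied, the loop times the allocating `s.reverse.each_char` baseline. On a patched interpreter you would swap in `s.reverse_each_char` and compare how the times scale.

```ruby
# Build the test string from the issue: x copies of the 2-byte Shift_JIS
# character 0x95 0x95, then time one pass over it with the given block.
def seconds_for(x)
  str = ("\x95\x95" * x).force_encoding(Encoding::Shift_JIS)
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield str
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

[25_000, 50_000, 100_000, 200_000].each do |x|
  # Baseline: reverse the string (one allocation), then walk it forwards.
  # With the patch applied, try `s.reverse_each_char { |_c| }` here instead.
  t = seconds_for(x) { |s| s.reverse.each_char { |_c| } }
  printf("x = %7d  %.4f s\n", x, t)
end
```

If doubling `x` roughly doubles the time, the iteration is linear; if it roughly quadruples it, the iteration is quadratic.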

 
> For the most common encoding (UTF-8), it is always possible to scan a string backwards from any point. Looking at the other encodings implemented in Ruby, it seems only gb18030 has a stateful way to back up to previous characters, so iterating backwards over that one could end up being O(N^2).

Yes indeed.
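Shift_JIS, used in the test string above, shows why scanning backwards can be expensive even without explicit state: its trail bytes overlap both its lead bytes and the ASCII range, so the bytes at the end of a string do not by themselves determine where the last character starts. In the sketch below (byte values and the helper name `sjis` chosen for illustration), two valid strings end in the same three bytes but split them differently, depending on how many `0x95` bytes come before.

```ruby
# Interpret a byte string as Shift_JIS (.b makes a fresh copy first,
# so the literal itself is never mutated).
def sjis(bytes)
  bytes.b.force_encoding(Encoding::Shift_JIS)
end

x = sjis("\x95\x95\x95\x95A")  # [95 95] [95 95] [41]
y = sjis("\x95\x95\x95A")      # [95 95] [95 41]

p [x.valid_encoding?, y.valid_encoding?]  # => [true, true]
p x.chars.map(&:bytesize)                 # => [2, 2, 1]
p y.chars.map(&:bytesize)                 # => [2, 2]
```

Both strings end in the bytes `95 95 41`, yet `x` ends with the one-byte character `A` and `y` with the two-byte character `95 41`. Telling them apart means walking back over the whole run of possible lead bytes, which for `"\x95\x95" * x` is the entire string.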


----------------------------------------
Feature #12744: Add str.reverse_each_char and str.reverse_chars
https://bugs.ruby-lang.org/issues/12744#change-60525

* Author: Bouke van der Bijl
* Status: Feedback
* Priority: Normal
* Assignee: 
----------------------------------------
This patch adds `str.reverse_each` and `str.reverse_chars`. It's currently not really possible to iterate over a Ruby string in reverse while guaranteeing that you're not accidentally introducing an O(N^2) bug, short of transcoding to a fixed-width encoding like UTF-32. This is because variable-width encodings like UTF-8 require scanning the string from the start if you want to address characters by index.
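To make the problem concrete: in CRuby, `str[i]` on a variable-width string has to count characters from the start, so the index-based loop below does O(N) work per character and O(N^2) overall. The second method is the fixed-width workaround the description mentions, sketched with `UTF_32LE` as an assumed choice; it only works for encodings that can be transcoded to UTF-32. Both method names are hypothetical.

```ruby
# Naive: index-based reverse iteration. Each str[i] scans from the start
# of a variable-width string, so this is O(N^2) overall.
def reverse_chars_by_index(str)
  out = []
  (str.length - 1).downto(0) { |i| out << str[i] }
  out
end

# Workaround: transcode to a fixed-width encoding so each index lookup is
# constant time, then convert every character back to the original encoding.
def reverse_chars_via_utf32(str)
  wide = str.encode(Encoding::UTF_32LE)
  (wide.length - 1).downto(0).map { |i| wide[i].encode(str.encoding) }
end

s = "héllo, wörld"
p reverse_chars_by_index(s) == s.chars.reverse   # => true
p reverse_chars_via_utf32(s) == s.chars.reverse  # => true
```

The price of the workaround is a full transcoded copy of the string, which is the kind of extra allocation the patch aims to avoid.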

The patch uses `rb_enc_left_char_head` to step backwards through the string one character at a time, so no reversed copy of the string has to be allocated.
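For UTF-8 specifically, finding the previous character head is cheap: continuation bytes always have the bit pattern `10xxxxxx`, so stepping back over them reaches the lead byte. The pure-Ruby sketch below walks a valid UTF-8 string from the end in O(N). The method name is hypothetical, and this is not the patch, which works in C for every encoding via `rb_enc_left_char_head`.

```ruby
# Sketch for valid UTF-8 only: walk backwards through the bytes, skipping
# continuation bytes (0b10xxxxxx) to find each character's lead byte.
def reverse_each_utf8_char(str)
  return enum_for(__method__, str) unless block_given?
  unless str.encoding == Encoding::UTF_8 && str.valid_encoding?
    raise ArgumentError, "expected a valid UTF-8 string"
  end

  finish = str.bytesize            # exclusive end of the current character
  while finish > 0
    start = finish - 1
    # Step back while the byte is a continuation byte (10xxxxxx).
    start -= 1 while (str.getbyte(start) & 0xC0) == 0x80
    yield str.byteslice(start, finish - start)
    finish = start
  end
  str
end

p reverse_each_utf8_char("añb€").to_a  # => ["€", "b", "ñ", "a"]
```

Each byte is examined once, so the walk is linear; it still allocates one small string per yielded character, as `each_char` does.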

---Files--------------------------------
add-reverse-string-iteration.patch (5.91 KB)


-- 
https://bugs.ruby-lang.org/
