Issue #13263 has been updated by Jabari Zakiya.


It would be really helpful if people produced empirical results from actual coded
examples of these techniques, to establish their efficacy.

**Math may suggest theoretical possibilities, but engineering determines their feasibility.**

I've tried to be objective in assessing alternative techniques, creating objective tests, and
presenting empirical test results (accuracy and performance) that anyone can run themselves to verify.

Based on these empirical results, **bbm** is the "best" technique that satisfies a host of criteria I have listed.

The only thing Newton's method possibly has over **bbm** is speed, but only under certain conditions, and only if
you select an optimized initial estimate. If the estimate is larger than necessary you get speed degradation.
If the estimate is too small you can get incorrect results. And Newton's speed advantage is only consistently
observed for a version optimized for square roots, which cannot be applied generally to arbitrary nth-roots;
those require their own specialized implementations.

Below I present a technique for creating one optimized C implementation of **bbm** that will produce accurate results
for any nth-root of any integer.  I think if something like this (or possibly better; I leave that to the C
gurus) is done, all the relevant criteria I think should be considered (accuracy, speed, memory
use, power consumption, efficiency on any cpu) will be satisfied.

Here is my proposal on how to optimally implement (in C) the **bbm** technique, to use for all nth-roots.

```
class Integer
  def irootn(n)   # binary bit method (bbm) for nth root
    return nil if self < 0 && n.even?
    raise "root n is < 2 or not an Integer" unless n.is_a?(Integer) && n > 1
    num  = self.abs
#-------------------------------------------------------- 
    root = bit_mask = 1 << (num.bit_length - 1)/n   # msb of the root
    numb = root**n
    until ((bit_mask >>= 1) == 0) || numb == num
      root |= bit_mask                               # tentatively set next lower root bit
      root ^= bit_mask if (numb = root**n) > num     # clear it if root**n overshoots num
    end
#-------------------------------------------------------- 
    root *= self < 0 ? -1 : 1
  end
end
```

Here is a possible approach to an optimized C implementation of **bbm** that will work on cpus of any word size.

This example assumes a 64-bit cpu for illustration purposes.

Let's use the example below to illustrate the technique.

Let:  ``num = 10**50; num.size => 21`` bytes to represent.

Therefore **num** is represented in memory as below, where each character is a 4-bit nibble.

```
             W2          W1               W0
num   =  xxxxxxxx yyyyyyyyyyyyyyyy zzzzzzzzzzzzzzzz
```

The first value the algorithm computes is the **bit_mask**, and the first computation for it is: ``(num.bit_length - 1) => (167 - 1) = 166``

We then (integer) divide this by the root value **n**.

Let's use the square root, n = 2, but the algorithm works for any n >= 2.
    
Now: ``bit_mask = 1 << (166/2) =>  bit_mask = 1 << 83``

In a naive implementation this value would take up 84 bits, or two (2) 64-bit words.

But in an optimized implementation we only need one cpu register, plus a word count value/register, to keep
track of this value.  Here the word count is 2, and the initial value for **bit_mask** is the portion of it that
fits in the upper word. As we shift **bit_mask** right, when it reaches "0" we decrement the word count to "1",
set the msb of **bit_mask** back to "1", and continue. When **bit_mask** hits "0" again and we decrement its
word count to "0", we know we've finished the whole process (if we get that far, i.e. **num** wasn't a perfect
nth power).

This  ``bit_mask = 1 << 83`` would computationally look like: ``bit_mask = 0x80000 0x0000000000000000``

but be initially represented as: ``bit_mask = 0x80000``; ``bit_mask_word_count = 2``

So we only ever have to handle register-sized values when working with **bit_mask**.
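
To make this bookkeeping concrete, here is a minimal C sketch (illustrative names only, not actual CRuby code), assuming **num** is stored as an array of 64-bit words (limbs), least-significant word first:

```
#include <stdint.h>

/* Sketch: bit_mask held as one register-sized word plus a limb index. */
typedef struct {
    uint64_t mask;   /* the single set bit, within the current limb   */
    int      limb;   /* which 64-bit limb that bit lives in (0-based) */
} bit_pos;

/* bit_mask = 1 << ((num.bit_length - 1) / n), expressed as limb + in-limb bit */
static bit_pos initial_bit_mask(int num_bit_length, int n)
{
    int shift = (num_bit_length - 1) / n;      /* e.g. (167 - 1) / 2 = 83        */
    bit_pos bm;
    bm.limb = shift / 64;                      /* 83 / 64 = 1, i.e. word count 2 */
    bm.mask = (uint64_t)1 << (shift % 64);     /* 1 << 19 = 0x80000              */
    return bm;
}

/* One "bit_mask >>= 1" step: returns 0 once the mask walks off the bottom limb. */
static int shift_bit_mask(bit_pos *bm)
{
    bm->mask >>= 1;
    if (bm->mask == 0) {
        if (bm->limb == 0) return 0;           /* word count would hit 0: done   */
        bm->limb -= 1;                         /* drop to the next lower limb    */
        bm->mask = (uint64_t)1 << 63;          /* and restart at its msb         */
    }
    return 1;
}
```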

The **root** value is then also set to the initial **bit_mask** value, and is represented as:

      root = 0x80000 0x0000000000000000; root_word_count = 2

The next computation is: ``numb = root**n``

Now **numb** has the same bit size as **num**, but we can do an optimized ``root**n``
computation because we know that here only the msb of **root** is set, so we can do the ``root**n``
computation on just the MSword of **root**, and store that as **numb** at the bit length of **num**.
This, again, eliminates doing an initial **n-exponentiation** on an arbitrary sized integer.
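
One way to realize that observation, continuing the same hedged C sketch: since **root** initially equals ``2**shift``, ``root**n`` is simply ``2**(shift*n)``, so the initial **numb** can be formed by setting a single bit instead of running a full arbitrary-precision exponentiation:

```
#include <stdint.h>

/* Sketch: initial numb = root**n when root == 2**shift (only its msb set).
 * numb is an array of 64-bit limbs, least-significant first, sized to hold
 * num's bit length. */
static void initial_numb(uint64_t *numb, int numb_limbs, int shift, int n)
{
    int bit = shift * n;                          /* e.g. 83 * 2 = 166 for 10**50 */
    for (int i = 0; i < numb_limbs; i++)
        numb[i] = 0;
    numb[bit / 64] |= (uint64_t)1 << (bit % 64);
}
```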

Now we start the loop: ``until ((bit_mask >>= 1) == 0) || numb == num``

Here again, we only shift the word of **bit_mask** at the current word position, held in a cpu register,
which is a fast/cheap in-place cpu operation on a cpu of any size.

We can also do the ``numb == num`` comparison on a word-by-word basis, in a cpu register.
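
A hedged sketch of that word-by-word comparison (illustrative only):

```
#include <stdint.h>

/* Sketch: compare two equal-length limb arrays (least-significant limb first)
 * one register-sized word at a time, from the most significant limb down.
 * Like memcmp: "numb == num" is compare_limbs(...) == 0 and
 * "numb > num" is compare_limbs(...) > 0. */
static int compare_limbs(const uint64_t *a, const uint64_t *b, int limbs)
{
    for (int i = limbs - 1; i >= 0; i--) {
        if (a[i] > b[i]) return  1;
        if (a[i] < b[i]) return -1;
    }
    return 0;
}
```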

Now we do the operations: ``root |= bit_mask`` and ``root ^= bit_mask``

Again, we only operate on the current **root** and **bit_mask** words Wi, held in a cpu register,
until we need to move down to the next lower word.

As we do the operation: ``if (numb = root**n) > num``

the **root** exponentiation only needs to be performed on the non-zero Wi for **root**, as the
lower "0" valued Wi will contribute nothing to the computation's value.
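
In the same illustrative sketch style: the ``|=`` / ``^=`` steps only touch the limb the mask currently sits in, and because every limb of **root** below it is still zero, ``root`` equals ``R * 2**(64*limb)`` where ``R`` is just the upper limbs; so ``root**n == R**n * 2**(64*limb*n)``, and the exponentiation can be done on the short value ``R`` and the result shifted into place.

```
#include <stdint.h>

/* Sketch: set or clear the candidate bit of root in place.  `limb` and `mask`
 * are the current word index and in-word bit of bit_mask (as in the earlier
 * fragments); all limbs of root below `limb` are still zero at this point. */
static void set_candidate_bit(uint64_t *root, int limb, uint64_t mask)
{
    root[limb] |= mask;                 /* root |= bit_mask  (tentative) */
}

static void clear_candidate_bit(uint64_t *root, int limb, uint64_t mask)
{
    /* root ^= bit_mask: back the bit out when root**n overshoots num */
    root[limb] ^= mask;
}
```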

Thus, we've reduced the algorithm to a series of very fast, computationally inexpensive, primitive
cpu operations that can easily be performed on a cpu of any size.  Apart from the ``root**n`` exponentiation itself,
we need no adds, multiplies, or divisions, which are much more cpu intensive/expensive and which are necessary
to perform Newton's method.

So with one standard optimized C implementation we will always get correct results for any/all integer roots,
it will work optimally on any cpu, and it will use the least amount of electrical power (a key consideration
for power sensitive platforms like phones, tablets, embedded systems, robots, and IoT devices).

I am also 99.99% sure that this type of **bbm** optimized implementation will be faster than any other
theoretically possible technique, as a general technique to always accurately produce the nth-root of an integer.

This can be empirically verified by creating this implementation, doing an apples-to-apples comparison against other
techniques, and assessing them against the list of criteria I suggested, as well as any other relevant criteria.

----------------------------------------
Feature #13263: Add companion integer nth-root method to recent Integer#isqrt
https://bugs.ruby-lang.org/issues/13263#change-63338

* Author: Jabari Zakiya
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
Following on the heels of adding the method ``Integer#isqrt``, which computes exact integer
square root values for arbitrary sized integers, based on the following threads:

https://bugs.ruby-lang.org/issues/13219
https://bugs.ruby-lang.org/issues/13250

I also request adding its companion method to compute any integer nth-root.

Below are sample methods of high level Ruby code that compute exact results.

https://en.wikipedia.org/wiki/Nth_root_algorithm

The Newton's method code is a Python version I tweaked to make it look like ``Integer#isqrt``'s form.

Benchmarks show the **bbm** method is generally faster than Newton's method, especially as the roots become larger,
with the added benefits that it's simpler to code/understand, is less sensitive to the initial root value, and
handles small numbers better.

```
class Integer
  def irootn(n)   # binary bit method (bbm) for nth root
    return nil if self < 0 && n.even?
    raise "root n is < 2 or not an Integer" unless n.is_a?(Integer) && n > 1
    num  = self.abs
    bits_shift = (num.bit_length - 1)/n + 1   # add 1 for initial loop >>= 1
    root, bitn_mask = 0, (1 << bits_shift)
    until (bitn_mask >>= 1) == 0
      root |= bitn_mask
      root ^= bitn_mask if root**n > num
    end
    root *= self < 0 ? -1 : 1
  end

  def irootn1(n)   # Newton's method for nth root
    return nil if self < 0 && n.even?
    raise "root n is < 2 or not an Integer" unless n.is_a?(Integer) && n > 1
    return self if self == 0 || (self == -1 && n.odd?)
    num = self.abs
    b = num.bit_length
    e, u, x = n-1, (x = 1 << (b-1)/(n-1)), x+1
    while u < x
      x = u
      t = e * x + num / x ** e
      u = t / n
    end
    x *= self < 0 ? -1 : 1
  end

  def irootn2(n)   # Newton's restructured coded method for nth root
    return nil if self < 0 && n.even?
    raise "root n is < 2 or not an Integer" unless n.is_a?(Integer) && n > 1
    return self if self == 0 || (self == -1 && n.odd?)
    num = self.abs
    b = num.bit_length
    e, x = n-1, 1 << (b-1)/(n-1) + 1
    while t = (e * x + num / x ** e)/n < x
      x = (e * x + num / x ** e)/n
    end
    x *= self < 0 ? -1 : 1
  end
end

require "benchmark/ips"

[50, 500, 1000, 2000, 4000, 5000].each do |exp|
  [3, 4, 7, 13, 25, 33].each do |k|
    Benchmark.ips do |x|
      n = 10**exp
      puts "integer root tests for root #{k} of n = 10**#{exp}"
      x.report("bbm"     ) { n.irootn(k)  }
      x.report("newton1" ) { n.irootn1(k) }
      x.report("newton2" ) { n.irootn2(k) }
      x.compare!
    end
  end
end
```
Here are results.

```
def tm; t=Time.now; yield; Time.now-t end

2.4.0 :022 > exp = 111; n = 10**exp; r = 10; puts n, "#{ tm{ puts n.irootn(r)} }", "#{ tm{ puts n.irootn1(r)} }", "#{ tm{ puts n.irootn2(r)} }"
125892541179
125892541179
125892541179
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
4.6673e-05
6.5506e-05
0.000121357
 => nil 
2.4.0 :023 > exp = 150; n = 10**exp; r = 50; puts n, "#{tm{ puts n.irootn(r)}}", "#{ tm{ puts n.irootn1(r)}}", "#{ tm{ puts n.irootn2(r)} }"
1000
1000
1000
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
2.28e-05
1.8762e-05
0.000128852
 => nil 
2.4.0 :024 >
```
The benchmarks show that ``irootn2`` is the slowest but it has the same
form as ``Integer#isqrt`` in the numeric.c and bignum.c files in trunk.
It probably can be tweaked to make it faster. 

bignum.c, starting at line 6772
https://bugs.ruby-lang.org/projects/ruby-trunk/repository/revisions/57705/entry/bignum.c
numeric.c, starting at line 5131
https://bugs.ruby-lang.org/projects/ruby-trunk/repository/revisions/57705/entry/numeric.c

Thus, a hybrid method could be created that switches between the two.

```
def isqrt(num=self)

  b = num.bit_length
  x = 1 << (b-1)/2 | num >> (b/2 + 1)     # optimum first root estimate
  while (t = num / x) < x
    x = ((x + t) >> 1) 
  end
  x
end

def irootn2(n)
  num = self.abs
  b = num.bit_length
  e, x = n-1, 1 << (b-1)/(n-1) + 1       # optimum first root estimate(?)
  while (t = (e * x + num / x ** e)/n) < x
    x = t
  end
  x
end

def irtn(n)  # possible hybrid combination for all nth-roots
  num = self.abs
  b = num.bit_length
  if n == 2   # for squareroot
    x = 1 << (b-1)/2 | num >> (b/2 + 1)
    while (t = num / x) < x
      x = ((x + t) >> 1)
    end
  else        # for roots > 2
    e, x = n-1, 1 << (b-1)/(n-1) + 1
    while (t = (e * x + num / x ** e)/n) < x
      x = t
    end
  end
  x *= self < 0 ? -1 : 1
end
```

So with just a little more work, a highly performant nth-root method can be added
to the std lib, as with ``Integer#isqrt``, to take care of all the exact integer roots
for arbitrary sized integers, by whatever name is preferable.

This will further enhance Ruby's use in fields like number theory, advanced math, cryptography,
etc., by providing fast, standard primitive methods to compute these values.



