Issue #15166 has been reported by jzakiya (Jabari Zakiya).

----------------------------------------
Bug #15166: 2.5 times faster implementation than current gcd implmentation
https://bugs.ruby-lang.org/issues/15166

* Author: jzakiya (Jabari Zakiya)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
This is to be more explicit (and accurate) than https://bugs.ruby-lang.org/issues/15161

This is my modified gcd benchmarks code, originally presented by Daniel Lemire (see 15161).

https://gist.github.com/jzakiya/44eae4feeda8f6b048e19ff41a0c6566

Ruby's current implementation of Stein's gcd algorithm is only slightly faster than the
code posted on the wikepedia page, and over 2.5 times slower than the fastest implementation
in the benchmarks.

```
[jzakiya@localhost ~]$ ./gcdbenchmarks
gcd between numbers in [1 and 2000]

gcdwikipedia7fast32   :  time = 99
gcdwikipedia4fast     :  time = 121
gcdFranke             :  time = 126
gcdwikipedia3fast     :  time = 134
gcdwikipedia2fastswap :  time = 136
gcdwikipedia5fast     :  time = 139
gcdwikipedia7fast     :  time = 138
gcdwikipedia2fast     :  time = 136
gcdwikipedia6fastxchg :  time = 144
gcdwikipedia2fastxchg :  time = 156
gcd_iterative_mod     :  time = 210
gcd_recursive         :  time = 215
basicgcd              :  time = 211
rubygcd               :  time = 267
gcdwikipedia2         :  time = 321
gcd between numbers in [1000000001 and 1000002000]

gcdwikipedia7fast32   :  time = 100
gcdwikipedia4fast     :  time = 121
gcdFranke             :  time = 126
gcdwikipedia3fast     :  time = 134
gcdwikipedia2fastswap :  time = 136
gcdwikipedia5fast     :  time = 138
gcdwikipedia7fast     :  time = 138
gcdwikipedia2fast     :  time = 136
gcdwikipedia6fastxchg :  time = 144
gcdwikipedia2fastxchg :  time = 156
gcd_iterative_mod     :  time = 210
gcd_recursive         :  time = 215
basicgcd              :  time = 211
rubygcd               :  time = 269
gcdwikipedia2         :  time = 323
```

This is Ruby's code per: https://github.com/ruby/ruby/blob/3abbaab1a7a97d18f481164c7dc48749b86d7f39/rational.c#L285-L307
which is basically the wikepedia implementation.

```
inline static long
i_gcd(long x, long y)
{
    unsigned long u, v, t;
    int shift;

    if (x < 0)
	x = -x;
    if (y < 0)
	y = -y;

    if (x == 0)
	return y;
    if (y == 0)
	return x;

    u = (unsigned long)x;
    v = (unsigned long)y;
    for (shift = 0; ((u | v) & 1) == 0; ++shift) {
	u >>= 1;
	v >>= 1;
    }

    while ((u & 1) == 0)
	u >>= 1;

    do {
	while ((v & 1) == 0)
	    v >>= 1;

	if (u > v) {
	    t = v;
	    v = u;
	    u = t;
	}
	v = v - u;
    } while (v != 0);

    return (long)(u << shift);
}
```

This is the fastest implementation from the benchmarks. (I originally, wrongly, cited
the implementation in the article, which is 4|5th fastest in benchmarks, but
still almost 2x faster than the Ruby implementation.)

```
// based on wikipedia's article, 
// fixed by D. Lemire,  K. Willets
unsigned int gcdwikipedia7fast32(unsigned int u, unsigned int v)
{
     int shift, uz, vz;
     if ( u == 0) return v;
     if ( v == 0) return u;
     uz = __builtin_ctz(u);
     vz = __builtin_ctz(v);
     shift = uz > vz ? vz : uz;
     u >>= uz;
     do {
       v >>= vz;
       int diff = v;
       diff -= u;
       if ( diff == 0 ) break;
       vz = __builtin_ctz(diff);
       if ( v <  u ) u = v;
       v = abs(diff);
    
     } while( 1 );
     return u << shift;
}
```

The key to speeding up all the algorithms is using the ``__builtin_ctz(x)`` directive
to determine the number of trailing binary '0's.



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>