Here's a little exercize. Make the most efficient algorithm that returns the integer closest to, but not exceeding the square root of another integer. This is to be done without using floating point conversions.

1->1

2->1

3->1

4->2

5->2

6->2

7->2

8->2

9->3

...

1023->31

1024->32

1025->32

...

etc.

I was working on this for a program that needed to calculate lots of approximate square roots and converting to floats is unacceptable. In other words, doing something like:

int(sqr(99))

is not acceptable.

I'm doing the work in c++, but QB should allow us to find the fastest alg. Anyone want to work on this one?

OK...this works and is 50% faster on my compiler than using pow().

This approach first makes a guess at the root and then goes through the bits and flips them one at a time if necessarry. it goes through the main loop once for each bit in the final number. Any other approaches?

Code:

`#include<iostream>`

//#include <cmath> // only needed if using pow() for speed comparison

using namespace std;

unsigned isqrt(unsigned);

int countbits(unsigned);

int main(){

for (unsigned i = 1; i <= 0xffffffff; i++ ){

//test raw speed with many iterations

//isqrt(i);

//(unsigned)pow(i, 0.5);

//show output to check function accuracy

cout << i << ' ' << isqrt(i) << endl;

//cout << i << ' ' << (unsigned)pow(i,0.5) << endl;

//if(((unsigned)pow(i,0.5)) != isqrt(i)){

// cout << "Bad match " << i << endl;

//}

}

}

unsigned isqrt(unsigned in){

int i = countbits(in)/2; //likely # bits in answer...could be 1 more bit

unsigned testvalue = 1 << i; //first guess eg 1000000

for (int j = i-1; j>=0 ; --j){

unsigned tsquare = testvalue * testvalue;

if( tsquare < in){

testvalue |= (1 << j) ; // turn on next bit 1000000 -> 1100000

}else if(tsquare > in){

testvalue ^= (3 << j); // if bust move small bit down 1010000 -> 1001000

}else return testvalue; // but add-a-bit in the case

}

if((testvalue * testvalue) > in){ // to deal with the final one's place

testvalue ^= 1;

}

return testvalue;

}

int countbits(unsigned in){

if (in){

unsigned msb = 1 << ((sizeof(unsigned) * CHAR_BIT) - 1); // 0x80000000;

int count = (sizeof(unsigned) * CHAR_BIT);

while(!(in & msb)){

msb >>= 1;

count--;

}

return count;

}else return 0;

}

Here's one that's about half as fast as int(sqr()) (perhaps faster if ported to C). It's identical to int(sqr()) from 0 to 2 billion (that's as high as I tested it).

The weird thing is, I couldn't remember the square root approximation method I remember seeing in my math textbook. Instead I came up with my own way of doing it and to my surprise it actually works! I'd be happy to try to explain it as clearly as I can if anyone's interested.

This should work in both QB and FB:

Code:

`deflng a-z`

function intSquareRoot (n)

if n <= 0 then intSquareRoot = 0: exit function

j = 1

r = 1

for i = 0 to 15

j = j + j

j = j + j

r = r + r

if j > n then exit for

next i

do

if r <= 46340 then ' prevent overflow error on r*r

i = r * r

if i = n then exit do

if i < n then

i = r + 1

if i * i > n then exit do

end if

end if

r = r - ((r - n \ r) + 1) \ 2

loop

intSquareRoot = r

end function

Quote:

well...if you run the following:

a& = 1234567890

print a&, a&^0.5

you will see that it doesn't meet my requirements, as the "to the power of" operator returns a float. I am actually looking for FAST algorithms that don't use any floating point arithmatic.

Thanks anyway for the suggestion.

Quote:Here's one that's about half as fast as int(sqr()) (perhaps faster if ported to C). It's identical to int(sqr()) from 0 to 2 billion (that's as high as I tested it).

The weird thing is, I couldn't remember the square root approximation method I remember seeing in my math textbook. Instead I came up with my own way of doing it and to my surprise it actually works! I'd be happy to try to explain it as clearly as I can if anyone's interested.

This should work in both QB and FB:Code:

`#include<iostream>`

//#include <cmath> // only needed if using pow() for speed comparison

using namespace std;

unsigned isqrt(unsigned);

int countbits(unsigned);

unsigned intSquareRoot(unsigned n);

int main(){

for (unsigned i = 1; i <= 0x1ffffff; i++ ){

//test raw speed with many iterations

//isqrt(i);

//(unsigned)pow(i, 0.5);

//intSquareRoot(i);

//show output to check function accuracy

cout << i << ' ' << isqrt(i) << endl;

//cout << i << ' ' << (unsigned)pow(i,0.5) << endl;

cout << i << ' ' << intSquareRoot(i) << endl;

//if(((unsigned)pow(i,0.5)) != isqrt(i)){

//if(intSquareRoot(i) != isqrt(i)){

// cout << "Bad match " << i << endl;

//}

}

}

unsigned isqrt(unsigned in){

int i = countbits(in)/2; //likely # bits in answer...could be 1 more bit

unsigned testvalue = 1 << i; //first guess eg 1000000

for (int j = i-1; j>=0 ; --j){

unsigned tsquare = testvalue * testvalue;

if( tsquare < in){

testvalue |= (1 << j) ; // turn on next bit 1000000 -> 1100000

}else if(tsquare > in){

testvalue ^= (3 << j); // if bust move small bit down 1010000 -> 1001000

}else return testvalue; // but add-a-bit in the case

}

if((testvalue * testvalue) > in){ // to deal with the final one's place

testvalue ^= 1;

}

return testvalue;

}

int countbits(unsigned in){

if (in){

unsigned msb = 1 << ((sizeof(unsigned) * CHAR_BIT) - 1); // 0x80000000;

int count = (sizeof(unsigned) * CHAR_BIT);

while(!(in & msb)){

msb >>= 1;

count--;

}

return count;

}else return 0;

}

unsigned intSquareRoot(unsigned n){

if (n <= 0) return 0;

unsigned j = 1;

unsigned r = 1;

unsigned i;

for(i = 0; i < 15; i++){

j += j;

j += j;

r += r;

if (j > n) break;

}

while(1){

if (r <= 65536){ //' prevent overflow error on r*r

i = r * r;

if( i == n) break;

if( i < n){

i = r + 1;

if( i * i > n) break;

}

}

r = r - ((r - n / r) + 1) / 2;

}

return r;

}

Thanks Christian. This looks similar to the "Babylon method". I'll port it and check its speed.

EDIT: I added your function to my code...It's not quite working...gives incorrect roots. However it is much faster. Perhaps you see the bug? It looks like the root has a factor of 2 error.

EDIT2: OK...found the bug. I used the "=" operator (assignment) rather than the "==" operator (logical comparison) in one of the conditionals...one of the hazards of porting to c++. Anyway...looks to work. I'll look at the speed shortly.

EDIT3: to calculate the square roots of the first (hex)1ffffff numbers to the nearest whole integer

pow() 12 sec

mango's method 8 sec

Sterling Christensen's method 6 sec

Good job Sterling. Can anyone offer any improvements?

Quote:Good job Sterling. Can anyone offer any improvements?

Thanks

The compiler probably is intelligent enough to do this for you, but just in case, you could change this:

Code:

`r = r - ((r - n / r) + 1) / 2;`

To this:

Code:

`r = r - (((r - n / r) + 1) >> 1);`

Quote:Mango Wrote:Good job Sterling. Can anyone offer any improvements?

Thanks

The compiler probably is intelligent enough to do this for you, but just in case, you could change this:Code:

`r = r - ((r - n / r) + 1) / 2;`

To this:Code:

`r = r - (((r - n / r) + 1) >> 1);`

yeah...I'm using dev-c++ 4.9.9.1 (gcc 3.4 I believe) with the optimization set to 'more' which I've found better than 'best'. The shift vs /2 doesn't make a difference.

however, I have made a couple of changes that take another second off (17% speed increase):

Code:

`unsigned intSquareRoot(unsigned n){ `

if (n <= 0) return 0;

unsigned j = 1;

unsigned r = 1;

while(j<n){

j += j;

j += j;

r += r;

}

unsigned i;

unsigned z;

while((i = r * r) ^ n){

if(i < n){

z = r + 1;

if( z*z > n) break;

}

r -= ((r - (n / r)) + 1 ) / 2;

}

return r;

}

changing the first for loop to a while got rid of 1 conditional

and modifications to the second loop got rid of the overflow-checking conditional which is unnecessary since the root of an unsigned int will always be within range when passed an unsigned int plus another couple of ordering of execution changes that lowered the number of times conditionals are determined.

my method doesn't stand up, as it *always* requires a fixed number of loop iteration, while yours exits the iterative loop once it's close enough. Can anyone materially improve on the function shown in this post???

Again, nice job, Sterling. BTW...Happy new year, everyone.

-M