Qbasicnews.com

Full Version: depth of recursion and resources
I was messing around with the quicksort alg, and my test-bed for improvement was 10,000,000 4-byte random integers. (got it down to a hair under 3 secs on my 1.5 GHz :o )

Anyway...The two improvements I made involved switching sort methods once a chunk got pretty small, and improving the partition function (that is...the method for determining the split-point of the two pieces that are being sorted).

After I maxed out on my tweaks and could get no further, I started wondering how many layers of recursion I was using for the 10 mil integer testbed. It turned out that the original code I had used went to a max depth of 72, the "change methods for small chunks" tweak got it down to 66 levels, and the "improve partition function" tweak got it down to a max depth of 50, with most of the time spent at circa 20-30. (this is just for one particular set of random data...I'm sure the numbers would change with the data-set).

Each recursive call passes 2 variables to the next level, in 2 steps...for a total of 4 variables passed. These are 4-byte pointers...which means that at a depth of 50, there should only be 800 bytes of memory occupied by the recursion. IS THIS CORRECT?

If so...for this particular alg...I'm really limited by my ability to hold the array of test data much more than I'm limited by the depth of recursion. ??!!!?? I always thought recursion was death...Me Wrong??
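A hypothetical side experiment (not the poster's program; the names `qsDepth` and `measureDepth` are mine) that shows why the "recursion is death" fear exists: with a naive last-element pivot, already-sorted input drives the depth to roughly n, while random-ish input stays near log2(n). This uses a Lomuto-style partition for brevity, not the Hoare-style one in the thread.

```cpp
#include <algorithm>
#include <vector>

// Quicksort that tracks its maximum recursion depth, to show how
// pivot choice controls stack use.
static int maxDepth = 0;

void qsDepth(std::vector<unsigned>& a, int lo, int hi, int depth) {
    if (lo >= hi) return;
    maxDepth = std::max(maxDepth, depth);
    unsigned pivot = a[hi];              // naive pivot: last element
    int i = lo;                          // Lomuto partition
    for (int j = lo; j < hi; ++j)
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    std::swap(a[i], a[hi]);
    qsDepth(a, lo, i - 1, depth + 1);
    qsDepth(a, i + 1, hi, depth + 1);
}

int measureDepth(std::vector<unsigned> a) {  // copy on purpose
    maxDepth = 0;
    qsDepth(a, 0, (int)a.size() - 1, 1);
    return maxDepth;
}
```

On 200 already-sorted elements this reaches depth 199 (one level per element), which is exactly the degenerate case the partition-function tweaks in this thread are trying to avoid.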

The second thing...I was thinking about the efficiency of the partition function and realized that it is pretty poor, considering that in the ideal case, the max depth would be only about 20. What I was using (which is still fast) is just to average the first and last data of the array-chunk using integer division unless that caused an out-of-bounds value...in which case use just the first data...

Can any of you think of other fast ways to get a representative sampling of data of unknown array sizes?
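One standard answer (my suggestion, not something from the thread) is median-of-three: sample the first, middle, and last elements of the chunk and use their median as the pivot. It costs at most three compares and never produces an out-of-range value, since the pivot is always an actual element. A sketch using the same pointer interface as the code below:

```cpp
#include <algorithm>  // std::swap

// Median-of-three pivot selection: a cheap way to get a more
// representative split point than averaging the endpoints.
unsigned medianOfThree(const unsigned* left, const unsigned* right) {
    unsigned a = *left;
    unsigned b = *(left + (right - left) / 2);  // middle element
    unsigned c = *right;
    if (a > b) std::swap(a, b);   // now a <= b
    if (b > c) std::swap(b, c);   // now c is the largest
    if (a > b) std::swap(a, b);   // now b is the median
    return b;
}
```

Because the pivot is guaranteed to be one of the sampled elements, the "val can be out of range = hang" bug mentioned in the comments can't occur with this scheme.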

edit::: ok...I changed the partition function and now it only goes to circa 35 for 10 mil...but the speed is similar to my last best which went to 50 levels. The tradeoff is more processing to find a better split point. Anyway...I've posted my code if anyone is interested. Also...note my posted code uses the c++ rand() function which has zero at bit 31 and 15...don't know how this will affect the sort...seems a bit slower than with my better rand function I was using when developing.
(2[Variables]*4[BytePerVariable]) ^ 50 [Depth]
Only thing I could think of that's more efficient than a quicksort at small amounts is a bin sort. But, you have to check for the depth you're in each time, and, the "log2(n)/k" is miniscule with small numbers, so that probably kills it.
Not that anyone is interested....

but, this is about as fast as I could get this to run for a 10 mil data set. My results were compiled with GCC with the optimization settings maxed. Some tweaks don't look like they'd be good, but gave better results with this compiler. Initially I didn't use so many pointers, but I was on a quest to break the 3 second barrier...and it was the only way I could get there. If anyone knows of faster code, I'm interested. This code will sort 10,000,000 in ~2900 ms, and 20,000,000 in ~6100 ms on my machine (1.5 GHz P4, 256MB).

Can anyone find some fat in this prog??

Note...there is a lot of commented code in the partition function which makes it hard to read here...These code chunks represent different methods of selecting the split point. I left them in so you could see what I've tried, in case anyone wants to make a suggestion.

Code:
#include <iostream>
#include <cstdlib>   // rand, srand, system
#include <ctime>     // clock

using std::cout;
using std::cin;
using std::endl;

        void  fillRND      (unsigned int* a   , int size);
unsigned int* partition    (unsigned int* left, unsigned int* right);
        void  quicksort    (unsigned int* left, unsigned int* right);
        void  selectionsort(unsigned int* left, unsigned int* right);
        void  showit       (unsigned int* a   , int size);
         int verify        (unsigned int* a   , int size);

const int alternate_sort = 8; //when to call selection-sort
int depth = 0;                // use along with lines in "quicksort" to check depth
int dmax = 0;                 // ditto

int main(){
  int clo;
  cout << "Enter the number of elements to be sorted. ";
  int sz;
    cin >> sz;
  unsigned int* a = new unsigned int[sz];
  cout << "generating data...";
  fillRND(a, sz);
  cout << "done\nchecking data...";
  verify(a, sz);
  //showit(a, sz);                   //only use this for small data sets...
  
  cout << "sorting data...\n";
  clo=clock();
  quicksort(a, a+sz-1);
  cout << "done in " << clock() - clo << " (ms)\nchecking data...";
  verify(a, sz);
  //showit(a, sz);
  
  delete [] a;
  system("pause");
  return 0;
}

void quicksort(unsigned int* left, unsigned int* right){
    if (++depth > dmax) cout << ++dmax << endl;      //to note max recursion depth
   //cout << '.' << ++depth;             //use this and matching to monitor recursion
   if(left + alternate_sort < right) {        //this line to use selection sort...next line without  
    //if(right > left) {                          //use this line if you don't want selection sort...
      unsigned int* split_pt = partition(left, right);
        quicksort(left, split_pt);
      quicksort(split_pt+1 , right);
   }
   else selectionsort(left, right);           //comment this line to see w/o selection sort
   --depth;                                          //eliminate along with first line of function...if you want max speed
   //cout << '`' << --depth;             //use this and the above matching line to monitor depth
}

unsigned int* partition(unsigned int* left, unsigned int* right){
    //unsigned int val = *left;       // this one is pretty basic...
  
/*   unsigned int x = *left;         // this is one of the fastest I found...recursion to circa 50 for 10 mil data
      unsigned int z = *right;
     unsigned int val = x&1 ?  z&1 ? x :x/2 + z/2  : (x/2 + z/2);
*/
    
     int mod = 3;                    // My current favorite...gets depth down to circa 35 recursion levels for a 10 mil rand sort
     unsigned int val = *left/mod + left[1]/mod + *right/mod+ // note the line continues...
     (*left%mod + left[1]%mod + *right%mod)/mod;
  
//   unsigned int val = *left/2 + *right/2;  //yuck...there's a bug here...val can be out of range = hang....

/* // this is the equivalent to the one I call "one of the fastest"...easier to understand, though
   unsigned int val;  
   {  
      unsigned int x = *left;
      unsigned int z = *right;
      val = x/2 + z/2;      

      if(x < val)       // uses middle value
         if(val < z);   //val = y; //x < y < z
         else if(x < z)   val = z; //x < z < y      
              else        val = x; //z < x < y
      else if(z < val); //val = y; //z < y < x
           else if(z < x) val = z; //y < z < x
                else      val = x; //y < x < z  
   }
*/
  
   unsigned int* lm = left - 1;  //lm = left mover
   unsigned int* rm = right + 1;
   for(;;){
        while(*--rm > val);
        while(*++lm < val);
      if(lm < rm){
            unsigned int temp = *rm;
            *rm = *lm;
            *lm = temp;
        }else return rm;
   }
}

void selectionsort(unsigned int* lp, unsigned int* rp){
    unsigned int* i = lp;
    unsigned int* r = rp;
   for(; i < r; ++i) {
        unsigned int* min = i;
        for(unsigned int* j=i; ++j <= r;)
            if(*j < *min) min = j;
        unsigned int temp = *min;
        *min = *i;
        *i = temp;
    }
}

int verify(unsigned int* a, int size){
   for(int i = 0; i<size-1; ++i){
     if(a[i] > a[i+1]){
        cout << "numbers not sorted\n";
        return 1;
     }
   }
   cout << "numbers are sorted\n";
   return 0;
}

void showit(unsigned int* a, int size){
   for(int i = 0; i < size; ++i)
      cout << i << ' ' << a[i] << endl;
   cout << endl;  
}

// to make it easy for you to compile, I changed to using rand() instead of some high-quality random numbers ;-)
void fillRND(unsigned int* a, int size){
   srand(clock());
   for(int i = 0; i < size; ++i)
      a[i] = rand() + (rand() << 16); //to fill more than 15 bits...
}
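One piece of possible "fat" (my suggestion, not tested against the poster's machine): insertion sort is usually faster than selection sort on small runs, because it stops each inner scan as soon as the element is placed instead of always scanning the whole remainder. A drop-in alternative with the same inclusive-pointer interface as `selectionsort` above:

```cpp
// Drop-in alternative to selectionsort: sorts the inclusive range
// [left, right] by inserting each element into the sorted prefix.
void insertionsort(unsigned int* left, unsigned int* right) {
    for (unsigned int* i = left + 1; i <= right; ++i) {
        unsigned int key = *i;
        unsigned int* j = i;
        while (j > left && *(j - 1) > key) {
            *j = *(j - 1);   // shift larger elements right
            --j;
        }
        *j = key;            // drop the element into its slot
    }
}
```

Since the chunks handed to it are at most `alternate_sort + 1` elements, either small sort is cheap; whether this one wins is compiler- and cutoff-dependent, so it's worth timing both.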
Quote:(2[Variables]*4[BytePerVariable]) ^ 50 [Depth]

Hmmm...that's a pretty big number...1.4e+45
a few more than my machine has.

I don't think that this analysis is correct. At each level of recursion, the stack only needs to hold the frames above it...not all the possible ones....I think :-?
but when you leave one depth and return to the upper depth you must have the old variables too

but I am not sure
Quote:but when you leave one depth and return to the upper depth you must have the old variables too

but I am not sure

I know this one Big Grin When exiting a function, all non-static variables cease to exist. They are wiped off the stack. Anyway...I think that in my case, each call to qsort passes two 4-byte variables (by value, not by reference), so 8 bytes are consumed. Then, each call to qsort calls partition, again passing 8 bytes...and partition returns a 4-byte variable...8 + 8 = 16 - 4 = 12 bytes per recursive level. So...if in the depths of a large sort, the program maxes out at 40 recursive levels, then the memory requirements should be

1) the program itself (wherever programs go when they are loaded :-? )
2) The array created on the Heap (where there's lots of room!!)
3) all other variables...created on the stack.

so...partition pushes a few more bytes (5 four-byte ints = 20, as I count) on the stack, but always erases these before exiting. So...let's call:
1) 200,000 bytes
2) 40,000,000 for a 10 mil array
3) 500 bytes (main variables + recursion variables...12 bytes/recursion * 40 recursion levels)

The point is that, in this case, it appears that the act of recursing *is not* a huge drain on resources...contrary to what was my understanding prior to doing the measurements/calculations. But I still wonder...IS MY ANALYSIS CORRECT?
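A back-of-the-envelope check of the figures above, using the poster's own numbers (12 bytes/level, 40 levels, and the earlier 4 pointers x 4 bytes x 50 levels). The key point: live stack use grows *linearly* with depth (bytes-per-level times depth), not exponentially, so the earlier `(2*4) ^ 50` formula overstates it astronomically.

```cpp
// Stack cost of recursion: each *live* frame sits on the stack exactly
// once, so the total is bytesPerLevel * maxLevels (additive, not
// exponential). Real frames also carry a return address, saved
// registers, and alignment padding, so measured numbers run higher.
long recursionStackBytes(long bytesPerLevel, long maxLevels) {
    return bytesPerLevel * maxLevels;
}
```

With the thread's numbers: 12 x 40 = 480 bytes (the "3) 500 bytes" estimate) and 16 x 50 = 800 bytes (the earlier post's figure), both trivial next to the 40 MB array on the heap.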

Thanks for the replies. If I sound curt, I certainly don't mean to.
It's not a huge drain, because you are doing things step by step, so each level has exactly the amount of data it needs. Still, though... the stack data is 99.99999% dynamic in a recursive function... in an iterative function it is 99.99999% static.. dynamic is slower than static....
Quote:the stack data is 99.99999% dynamic in a recursive function... in an iterative function it is 99.99999% static.. dynamic is slower than static....

aga...can you elaborate? My (most likely incorrect) understanding is that the stack is memory-space preallocated to a particular program to hold its variables, as they are needed. It's like a pre-defined array that holds all the variables a program uses. The stack pointer points to the highest "used" portion of this array, and new variables are "pushed" to the next available memory site.

Is this more or less correct? If so...then with the Qsort I'm using, each recursive level adds 12 bytes to the stack...and each level exited gives those bytes back...where is the speed loss if the program *only* uses the memory at (near??) the top of the stack???

If my understanding of the stack is correct, it appears that my program is always operating within 12 bytes (3 words) of the "top"...and seems to be pretty quick. Is there a faster way to sort the same data-set?
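One easy baseline to race against (my suggestion, and it assumes a compiler with C++11's `<chrono>`, which postdates this thread): the standard library's `std::sort`, typically an introsort, i.e. quicksort with median-of-three-style pivots, a heapsort fallback to cap recursion depth, and insertion sort for small runs. The helper name `timeStdSort` is mine.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Sorts the vector in place with std::sort and returns the elapsed
// wall-clock time in milliseconds.
long long timeStdSort(std::vector<unsigned>& data) {
    auto t0 = std::chrono::steady_clock::now();
    std::sort(data.begin(), data.end());
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0)
        .count();
}
```

Filling a vector with the same `rand()`-based data and timing this against the hand-rolled version would show how much headroom (if any) is left on the same machine.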
When a function call occurs, several things are pushed into the stack:

1.- The return address
2.- Each parameter
3.- The previous stack pointer

Then, local (automatic) variables defined in the function are allocated on the stack as well.

Dynamic arrays are created anywhere in the memory space reserved to the program; only the pointers to the allocated memory are on the stack. That means that a dynamic array (malloc-ated) always takes 32 bits (4 bytes) on the stack.

Note that the stack is usually word-aligned for speed in a 16-bit environment, and most likely dword-aligned in a 32-bit environment. That means that one "char" variable can take 32 bits, 'cause it takes a whole slot in the stack.

I don't get what Agamemnus says either.