Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Sorting Arrays
#1
I'm trying to come up with a fast sorting routine to sort
string arrays. I tried a quicksort and bubble sort (which
I admittedly borrowed from some posted source I found),
but I can't come close to the speed of the ARRAY SORT
function built into PB. I'm trying to stay away from PB,
but "just when I thought I was out, they pull me back in..."

I could use a hand if anyone can offer some FB code that will
perform as fast as PB's built in function, or at least come close.
In this PB example, on my PIII 1GHZ laptop, it takes
15.5 seconds to sort a 3 million element 7 byte string array.

#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG

DIM X AS LONG
DIM Y AS LONG
DIM Z AS LONG
DIM S AS STRING
DIM TI1 AS DOUBLE
DIM TI2 AS DOUBLE

X = 3000000

DIM MyArray(X) AS STRING * 7

PRINT "Started loading array in descending order..."
FOR Z = X TO 1 STEP - 1
S = LTRIM$(STR$(Z))
IF LEN(S) < 7 THEN S = S + STRING$(7 - LEN(S), " ")
Y = Y + 1
MyArray(Y) = S
NEXT Z

PRINT "Started to sort array in ascending order..."
TI1 = TIMER
ARRAY SORT MyArray() : 'PB built in sorting function
TI2 = TIMER

PRINT "Time to sort array = ";TI2 - TI1; " seconds."
INPUT "Hit return to quit...", S

END FUNCTION

Dean
Reply
#2
Dean,

You are really making things difficult for the subject sort.

1) 3 million strings of 7 bytes each. That's 21 million bytes. I don't know the limits of FreeBasic for handling that capacity in memory, but even with memory paging or swapping, it's going to run slow.

2) Next, you generate your input in descending order. For most sort algorithms, that's the worst case for speed.

3) No sort algorithm that I have ever seen in 40 years is going to solve your problem. The volume of strings (or records) that you have will slow down any sort.

4) Your fundamental problem is sorting these 7 million strings (or records) in memory. I suggest the following:-

* Do the sorting "off-line" from your program. That is, put or generate the strings onto a flat, text file.

* Do a test using the old MSDOS SORT command from the command line. Run SORT /? for help. In your case, the command would be:
SORT INPFILE /O OUTFILE

* Then process the sorted OUTFILE as required by your program.

* If SORT does not satisfy you need for speed, then go to:
WWW.OPTTECH.COM and buy the best sort program I have ever seen. The DOS version costs $149. You won't find anything better. The OptTech people have been exclusively providing sorts for over 20 years.

Good luck. Let us know how it went.
*****
Reply
#3
3 million damn at that size i can't see normal array sorting being able to do it that just way to many recoreds to sort.

a LinkList soluation might be better off for what your trying to do since you can easy delete , add , append data togather the only down side is you have to run through the whole link list to find to search the list.

also FB should be able to handle 3 million elements at 8byte per element it would only be 22 to 23 megs i think.
Reply
#4
A simple recursive Quicksort compiled in FB 0.13 took 3,3 seconds in my PC (P4 2.4Mhz) to order the 3M elements.

Code:
Sub Quicksort(min,max)
dim p1,p2,midd as string*7
IF min < max THEN
    p1 = min
    p2 = max
    midd = Myarray((min + max) \ 2)
    DO UNTIL p1 > p2
        DO WHILE Myarray(p1) < midd
            p1 = p1 + 1
        LOOP
        DO WHILE midd < Myarray(p2)
            p2 = p2 - 1
        LOOP
        IF p1 <= p2 THEN
            SWAP Myarray(p1), Myarray(p2)
            p1 = p1 + 1
            p2 = p2 - 1
        END IF
    LOOP
    IF min < p2 THEN Quicksort min, p2
    IF p1 < max THEN Quicksort p1, max
END IF
end sub
It's said recursion is slow..I have somewhere a nonrecursive quicksort that should be faster.
I have no PBCC to compare.
Antoni
Reply
#5
The example I posted was intended to test a worst case scenario
just to get some benchmarks.

As for memory, since FB can take in 2gb, that's not an issue. And
it did work fine -- just that it took twice as long as the built in
PB function. Actually, using the same quicksort in PB without
using their built in function, FB was over 50% faster than PB.
But their built in funciton was twice as fast as FB doing a quicksort.
And I did try a nonrecursive quicksort, which was really slow, so
I think recursion is helping speed.

I was trying to see if someone had an algorithm that performed
as well as the PB function just so I wasn't led into PB temptation.
I want FB to deliver me from evil! As Antoni pointed out, it's not
too slow using a quicksort in FB, it's just not as fast as that ARRAY
SORT function in PB. I'll try Antoni's code to see if its faster than
the quicksort I was using and let you know the results.

I can live with using a quicksort in FB -- just hoped that maybe
there's an algo that I don't know about that is faster.

Thanks for the feedback!

Dean
Reply
#6
Quicksort is actually one of the fastest algorithms around. What is ARRAY SORT using?
Antoni
Reply
#7
Chances are that PBCC SORT routine is coded in assembly. Look for an iterative quicksort routine written in assembly.
SCUMM (the band) on Myspace!
ComputerEmuzone Games Studio
underBASIC, homegrown musicians
[img]http://www.ojodepez-fanzine.net/almacen/yoghourtslover.png[/i
Reply
#8
Don't know what they're doing -- it's a built-in function.
Probably done in assembly. I heard of algo's that
are supposedly faster than quicksort - LINEAR TIME.
I'm having a conultant potentially write me a .dll that
I can use which should be 3x faster than a quicksort.

Dean
Reply
#9
Quote:There are sorting algorithms that run faster than O(n lg n) time but they require special assumptions about the input sequence to be sort. Examples of sorting algorithms that run in linear time are counting sort, radix sort and bucket sort. Counting sort and radix sort assume that the input consists of integers in a small range. Whereas, bucket sort assumes that the input is generated by a random process that distributes elements uniformly over the interval.

Since we already know that the best comparison-based sorting can to is Ω(n lg n). It is not difficult to figure out that linear-time sorting algorithms use operations other than comparisons to determine the sorted order.

Despite of linear time usually these algorithms are not very desirable from practical point of view. Firstly, the efficiency of linear-time algorithms depend on the keys randomly ordered. If this condition is not satisfied, the result is the degrading in performance. Secondly, these algorithms require extra space proportional to the size of the array being sorted, so if we are dealing with large file, extra array becomes a real liability. Thirdly, the "inner-loop" of these algorithms contain quite a few instructions, so even though they are linear, they would not be as faster than quick sort (say).
Antoni
Reply
#10
How about a link to that post?

There's a sort utility I use to work on files called PSORT, which
is incredibly fast, and it is based on a linear time algorithm. He calls
it a "postman's sort". Some analogy to sorting mail.

I suppose I could also just compile a PB .dll which calls their
built in sort function, then link FB to that .dll. Hmm...
Reply


Forum Jump:


Users browsing this thread: 1 Guest(s)