Qbasicnews.com

Full Version: External sort compo?
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2 3 4 5 6 7 8 9
AHH! Big avatar!
Antoni, Just a thought.

Here's your getform FUNCTION:
FUNCTION getform
FOR j = 0 TO maxi
GET #1,,s1(j)
IF EOF(1) THEN EXIT FOR
NEXT
IF j =< maxi THEN getform = j-1 ELSE getform = maxi
END FUNCTION

It looks like you will read one long variable into s1(j) 16k times. Since the unsort file is 4MB, using this function you will perform 1 million reads (gets) from the file. Won't this slow down the whole process? Isn't there a way you could read some larger chunks, like maybe 4000 bytes, to reduce the number of reads down to 1000?

Of course you would have to parse these chunks in 4 byte increments and convert each 4 bytes to a long variable, and then store them into the s1 array. Would this be difficult? Assuming you have an extra 4000 bytes in your program's memory, wouldn't that increase the speed?
*****
I can read 12MB at a time, reducing the number of reads. I'm not sure if this is standadr accross the board for all disk I/O buffers, but mine is 12MB. Meaning I can read a total of 40 or 50MB per second.

Then all you wold do is split it into chunks that you can deal with. Tongue
Quote: Isn't there a way you could read some larger chunks, like maybe 4000 bytes, to reduce the number of reads down to 1000?

Of course you would have to parse these chunks in 4 byte increments and convert each 4 bytes to a long variable, and then store them into the s1 array. Would this be difficult? Assuming you have an extra 4000 bytes in your program's memory, wouldn't that increase the speed?
Unfortunately QB can only GET simple variables.
The only way to reduce the number of gets would be to use a 32K string, then move the values to the array with CVL(MID$ ) in a loop. I don't know if it would increase the speed , because of the overhead in using a string function as MID$. I should try it..

BTW: FB can read a complete array in a single operation, this makes it fast!
Still awaiting a reply Tongue

Quote:The only way to reduce the number of gets would be to use a 32K string, then move the values to the array with CVL(MID$ ) in a loop. I don't know if it would increase the speed , because of the overhead in using a string function as MID$. I should try it..
If the overhead on using the string function is too big, play it static. I.e. use PEEK and read the 4-byte data directly from the string instead of creating many string duplicates. Then merge the 4 bytes together using binary operations.
I think if done correctly, you might reduce the overhead that you would have got if you applied the string method.

- Neo
I rechecked my code, I think the botleneck is not where Moneo suggests, in fact the bottleneck are the 255 different SEEKS required to save the sorted data en each pass. Each file pointer movement does flush the output buffer...

Now I think the fastest sort would be an in-memory radix sort combined with a multiway merge at the end.
Quote:Antoni, I read Knuth's section on external sorting, with special attention to Radix sorting. As with many chapters of Knuth's books, I hardly understood what he was talking about.

I've been reading some Knuth during the last few days, its pretty tough going. I found an error too, and reported it, but apparently some people had to wait 15 years for their cheque because Knuth was too busy!

I also found this great quote, from when Steve Jobs (of Apple) met Knuth.

Quote:"It's a pleasure to meet you, Professor Knuth," Steve said. "I've read all of your books."

"You're full of sh*t," Knuth responded.
I made a program, unfortunately when i tried running it in qb 4.5 I discovered that it only works in PDS 7.1, which i used when I was writing it.

Is it still ok if I post the program?
Please do it!
Ok.. here we go, it's kind of spaghetti like, but I was trying to keep it simple and as fast as i could.
It's based on radix sorting.

Code:
DEFINT A-Z
CLS
CLOSE

TYPE buffertype
array(1 TO 8000) AS LONG
END TYPE

DIM buffer AS buffertype
DIM overallcount(0 TO 3, 0 TO 255) AS LONG
DIM fileoffset(0 TO 3, 0 TO 255)  AS LONG
DIM temp AS LONG
DIM tmp(0 TO 3) AS LONG
DIM tempstr(0 TO 255) AS STRING

OPEN "unsortqb.dat" FOR BINARY AS #1

t! = TIMER
FOR foo = 1 TO 125
GET #1, , buffer

DEF SEG = VARSEG(temp)
pointer = VARPTR(temp)

FOR i = 1 TO 8000
temp = buffer.array(i)
FOR j = 0 TO 3
a = PEEK(pointer + j)
overallcount(j, a) = overallcount(j, a) + 1
NEXT j
NEXT

NEXT foo
CLOSE

FOR i = 0 TO 255
FOR j = 0 TO 3
fileoffset(j, i) = tmp(j) * 4 + 1
tmp(j) = tmp(j) + overallcount(j, i)
NEXT
NEXT

OPEN "unsortqb.dat" FOR BINARY AS #1
OPEN "tmpsort.dat" FOR BINARY AS #2
tmpfile1$ = "tmpsort.dat"
tmpfile2$ = "tmpsort2.dat"

FOR rep = 0 TO 3

IF FREEFILE = 1 THEN
OPEN tmpfile1$ FOR BINARY AS #1
OPEN tmpfile2$ FOR BINARY AS #2
SWAP tmpfile1$, tmpfile2$
END IF

FOR foo = 1 TO 125
GET #1, , buffer

FOR i = 1 TO 8000
temp = buffer.array(i)
a = PEEK(pointer + rep)
tempstr(a) = tempstr(a) + CHR$(PEEK(pointer)) + CHR$(PEEK(pointer + 1)) + CHR$(PEEK(pointer + 2)) + CHR$(PEEK(pointer + 3))
NEXT

FOR i = 0 TO 255
PUT #2, fileoffset(rep, i), tempstr(i)
fileoffset(rep, i) = fileoffset(rep, i) + LEN(tempstr(i))
tempstr(i) = ""
NEXT

NEXT foo
CLOSE
NEXT rep

t! = TIMER - t!
PRINT t!

DEF SEG

KILL tmpfile2$
IF LEN(DIR$("sort.dat")) > 0 THEN KILL "sort.dat"
NAME tmpfile1$ AS "sort.dat"

Please post comments if you have any.
Pages: 1 2 3 4 5 6 7 8 9