05-16-2006, 11:46 AM
AHH! Big avatar!
Screwing with your reality since 1998.
External sort compo?
05-17-2006, 05:36 AM
Antoni, just a thought.
Here's your getform FUNCTION:
Code:
FUNCTION getform
  FOR j = 0 TO maxi
    GET #1, , s1(j)
    IF EOF(1) THEN EXIT FOR
  NEXT
  IF j <= maxi THEN getform = j - 1 ELSE getform = maxi
END FUNCTION
It looks like you read one LONG variable into s1(j) at a time, 16K times per call. Since the unsorted file is 4MB, using this function you will perform 1 million reads (GETs) from the file. Won't this slow down the whole process? Isn't there a way you could read larger chunks, say 4000 bytes, to cut the number of reads down to 1000? Of course you would then have to parse each chunk in 4-byte increments, convert each group of 4 bytes to a LONG, and store the results in the s1 array. Would that be difficult? Assuming you have an extra 4000 bytes in your program's memory, wouldn't that increase the speed?
*****
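Moneo's chunked-read idea can be sketched outside QB. Below is a C illustration (read_chunked and CHUNK are invented names for this sketch, not from the thread's code): one fread() pulls a whole chunk, which is then parsed in 4-byte increments into an array of 32-bit values.

```c
/* Hypothetical sketch of the chunked-read idea: one fread() per chunk
 * instead of one read per 4-byte value.  CHUNK and read_chunked are
 * invented names, not from the thread's QB code. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 4000                      /* bytes per read, as suggested */

/* Fill dst with up to n 32-bit values from f; return how many were read. */
size_t read_chunked(FILE *f, int32_t *dst, size_t n)
{
    unsigned char buf[CHUNK];
    size_t total = 0;
    while (total < n) {
        size_t want = (n - total) * 4;
        if (want > CHUNK) want = CHUNK;
        size_t got = fread(buf, 1, want, f);
        if (got == 0) break;            /* EOF or read error */
        /* parse the chunk in 4-byte increments */
        for (size_t i = 0; i + 4 <= got; i += 4, total++)
            memcpy(&dst[total], buf + i, 4);
    }
    return total;
}
```

For the 4MB file in question this would turn a million per-value reads into roughly a thousand reads of 4000 bytes each.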
05-17-2006, 06:01 AM
I can read 12MB at a time, reducing the number of reads. I'm not sure if this is standard across the board for all disk I/O buffers, but mine is 12MB, meaning I can read a total of 40 or 50MB per second.
Then all you would do is split it into chunks that you can deal with.
05-17-2006, 11:12 AM
Quote: Isn't there a way you could read some larger chunks, like maybe 4000 bytes, to reduce the number of reads down to 1000?
Unfortunately QB can only GET simple variables. The only way to reduce the number of GETs would be to use a 32K string, then move the values to the array with CVL(MID$()) in a loop. I don't know if it would increase the speed, because of the overhead of using a string function like MID$. I should try it...
BTW: FB can read a complete array in a single operation; this makes it fast!
Antoni
05-17-2006, 11:20 AM
Quote: The only way to reduce the number of gets would be to use a 32K string, then move the values to the array with CVL(MID$ ) in a loop. I don't know if it would increase the speed, because of the overhead in using a string function as MID$. I should try it..
If the overhead of using the string function is too big, play it static: use PEEK and read the 4-byte data directly from the string instead of creating many string duplicates, then merge the 4 bytes together using binary operations. I think if done correctly, you could avoid the overhead you would have had with the string method.
- Neo
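Neo's PEEK-and-merge suggestion, sketched in C for illustration (merge4 is an invented name): take the 4 raw bytes straight from the buffer and combine them with shifts and ORs, least significant byte first, which matches how QB stores a LONG.

```c
/* Sketch of merging 4 raw buffer bytes into a 32-bit value with binary
 * operations, per Neo's suggestion; merge4 is an invented name.  QB
 * stores a LONG least-significant-byte first (little-endian). */
#include <stdint.h>

int32_t merge4(const unsigned char *p)
{
    return (int32_t)((uint32_t)p[0]
                  | ((uint32_t)p[1] << 8)
                  | ((uint32_t)p[2] << 16)
                  | ((uint32_t)p[3] << 24));
}
```

No temporary string is created per value, which is the point of going "static" here.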
05-17-2006, 12:00 PM
I rechecked my code, and I think the bottleneck is not where Moneo suggests; in fact the bottleneck is the 255 different SEEKs required to save the sorted data in each pass. Each file pointer movement flushes the output buffer...
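One way around that flush-per-seek cost, sketched in C with invented names (Antoni's actual pass structure may differ): buffer each bucket's output in memory and issue a single seek and one large write only when a buffer fills.

```c
/* Sketch: per-bucket RAM buffers, so each bucket needs one seek per
 * BUFVALS values instead of one seek per value.  Bucket, bucket_put
 * and bucket_flush are invented names for this illustration. */
#include <stdio.h>
#include <stdint.h>

#define BUFVALS 1024    /* values buffered per bucket before a write */

typedef struct {
    int32_t v[BUFVALS];
    int     n;          /* values currently buffered */
    long    pos;        /* next file offset for this bucket */
} Bucket;

/* One seek + one large write instead of BUFVALS small ones. */
void bucket_flush(FILE *out, Bucket *b)
{
    if (b->n == 0) return;
    fseek(out, b->pos, SEEK_SET);
    fwrite(b->v, sizeof(int32_t), (size_t)b->n, out);
    b->pos += (long)sizeof(int32_t) * b->n;
    b->n = 0;
}

void bucket_put(FILE *out, Bucket *b, int32_t x)
{
    b->v[b->n++] = x;
    if (b->n == BUFVALS) bucket_flush(out, b);
}
```

With 255 buckets that is one such structure per bucket; the output buffer is then flushed hundreds of times less often per pass.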
Now I think the fastest sort would be an in-memory radix sort combined with a multiway merge at the end.
Antoni
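For the in-memory half of that plan, an LSD radix sort over 32-bit values takes four counting passes, one per byte. A minimal C sketch follows (radix_sort_u32 and the scratch array are my own names, not Antoni's code); the sorted chunks would then be combined by the multiway merge he mentions.

```c
/* Minimal LSD radix sort on unsigned 32-bit values, one byte per pass
 * (four passes total).  The caller supplies a scratch array tmp of the
 * same size; names here are invented for this sketch. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

void radix_sort_u32(uint32_t *a, uint32_t *tmp, size_t n)
{
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[256] = {0};
        for (size_t i = 0; i < n; i++)          /* histogram this byte */
            count[(a[i] >> shift) & 0xFF]++;
        size_t pos = 0;                         /* prefix sums -> offsets */
        for (int b = 0; b < 256; b++) {
            size_t c = count[b];
            count[b] = pos;
            pos += c;
        }
        for (size_t i = 0; i < n; i++)          /* stable distribution */
            tmp[count[(a[i] >> shift) & 0xFF]++] = a[i];
        memcpy(a, tmp, n * sizeof *a);
    }
}
```

Each pass is a stable counting sort on one byte, so after the fourth pass the whole 32-bit keys are in order, in O(n) time per pass with no comparisons.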
05-21-2006, 10:33 PM
Quote: Antoni, I read Knuth's section on external sorting, with special attention to radix sorting. As with many chapters of Knuth's books, I hardly understood what he was talking about.
I've been reading some Knuth during the last few days; it's pretty tough going. I found an error too, and reported it, but apparently some people had to wait 15 years for their cheque because Knuth was too busy! I also found this great quote, from when Steve Jobs (of Apple) met Knuth. Quote: "It's a pleasure to meet you, Professor Knuth," Steve said. "I've read all of your books."
05-23-2006, 07:59 PM
I made a program; unfortunately, when I tried running it in QB 4.5 I discovered that it only works in PDS 7.1, which I used while writing it.
Is it still ok if I post the program?
05-23-2006, 08:27 PM
OK... here we go. It's kind of spaghetti-like, but I was trying to keep it simple and as fast as I could.
It's based on radix sorting.
Code: DEFINT A-Z
Please post comments if you have any.