Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
External sort compo?
#1
Anyone for a file sorting compo?

The idea is to see how fast we can sort a file that does'nt fit in memory, for example 5MB in QBasic and 2GB in Freebasic. The tests should be done in a 256Mb RAM computer, with EMS and XMS disabled in the case of QB...

The idea is to see how close of this
http://research.microsoft.com/barc/SortBenchmark/
we can reach.
Antoni
Reply
#2
i'm not too sure exactly what "file sorting" means... care to explain a bit?
Reply
#3
Well, we have a lot of examples of sorting methods applied to arrays.
I propose to sort an array so big that does'nt fit in memory, and has to be read and written from/to the HD
Antoni
Reply
#4
This sounds like an interesting challenge, but the guide lines need to be thought out further.

How will the speed of each program be judged?
Would there be restrictions on external libraries, and if so, what sort?

Would the test file be distributed or randomly generated? (2G over a dialup connection is a long, long download.)
Reply
#5
Antoni,

The type of sort program that you propose is very complex. One would need to understand sorting theories, including the handling and merging of data strings and blocks on disk work files.

To be truthful, I don't think anyone is going to take the time to start from zero to develop a utility like this. Someone, maybe you, already has something in the works which he can use.

I have not bothered to attempt to implement such a utility because in 1985 I found an excellent sort utility by a company called OptTech (www.opttech.com), versions of which I have been using ever since. I'll be glad to use this utility as a benchmark for your evaluation.
*****
Reply
#6
Quote:To be truthful, I don't think anyone is going to take the time to start from zero to develop a utility like this.

yeah.... sorry... it's way too complex. and like RK said, the input ata could not be standardized, i'm NOT spending 2 months downloading the file =) lmao
Reply
#7
Ryan:
We should not allow external libraries, to do it with the language's resources. Assembler should not be allowed.

Testing should be done in the same PC to be fair. We could do it in a single machine, or to try it in several machines and average results.

Test file could be generated locally using basic. RND is known to give always the same sequence if it's not randomized. The only download would be the a short exe.

Moneo: The idea is to apply some theory and have fun. A professional quality program not required.

I have tried to do a direct sort of a a file of 200000 LONG integers (comparing and swapping each single register directly to disc using quicksort) and it took one minute in my machine.
Antoni
Reply
#8
Antoni,
Sounds good so far. I don't think such a competition would get a large number of entries, so one test machine should suffice. Once the rules are formalized, I definitely want to take at crack at it.
Reply
#9
Antoni,

It's been over 40 years since I was involved in doing such a sort. Here's a bit of what I can remember.
1) You read blocks of unsorted records from an input file.
2) The size of these blocks corresponds to the maximum amount of memory that the program can allocate for this.
3) Your sort the records in the block, and write each block onto a work file.
4) You will need at least 2 of these work files.
5) Continue reading, sorting, and writing sorted blocks onto the work files, until the input is depleted.
6) Now close the work files.
7) Now begin the MERGE PASSES, reading 2 (or more) work files and merging them onto a another work file. In truth, I don't remember how this works. Somehow, you keep doing merge passes until all the records are merged onto one final output file.

The sort in memory is just a small part. The tricky part is the merge passes.

Any ideas?
*****
Reply
#10
Quote:Any ideas?
Yeah, you can implement it and enter it into the contest. Wink
stylin:
Reply


Forum Jump:


Users browsing this thread: 1 Guest(s)