Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
External sort compo?
#21
Quote:P.S. Also I did not understand someone else's question that seem to reduce to
163532
If I sort this to
123356
does it matter which 3 comes first?
Not in this contest. The sort isn't required to be stable, as per Antoni Gual's rules.
stylin:
Reply
#22
OK. I tried the data generator and it is evidently not clear what it does. It ends with a bug in any event.

Anyway, I use QB1.0 as distributed by Microsoft with WindowsNT

For QB1.0 and QB1.1 users, I submit the following rules.

Rules for QB1.0 or QB1.1 players:

1) The following file will be input:

OPEN "QB1Data.rnd" FOR OUTPUT AS #1
FOR i = 1 TO 1000000
PRINT #1, INT(RND * 999999)
NEXT i
CLOSE
SYSTEM

2) The file must be read via INPUT or LINE INPUT

3) The file to be produced: QB1Data.srt"

4) It must be in order of lowest to highest.

5) Of course, it must be run in IDE and not be coded specifically for one million records. Should work on any size file.

Mac
Reply
#23
Thanks, Mac!

We QBers gone illegal (plus law-abiding people as Moneo and a few others) tend to forget QBasic 1.1!

I hope there is more than one entry in each category so we can compare..


EDIT: I have my entry for QBasic 1.1 running. It sorts the 1M registers file in 150 seconds...The QBasic test file is text , this makes everything slower.
Antoni
Reply
#24
Hi antoni, can i use qsort from the CRT? that cuts about 25% off my time, compared to implementing it myself (down from 200 to 150 seconds on my machine). Then again it may be a moot point if my new idea works!
EVEN MEN OF STEEL RUST.
[Image: chav.gif]
Reply
#25
Yetifoot:
I allowed to use crt lib, so i must say yes, you can use qsort. It did'nt looked that fast the last time I tried...perhaps I missed something.

Are you much below the 400 seconds you told us in the last post?

My entry will not require a merge sort at all, I don't know if this will make it faster or slower than the "classic" method Moneo detailed.
Antoni
Reply
#26
Hi Antoni, Yes i have tested some more and the crt qsort does not appear faster, it must have been a blip. I believe this is because you must supply a compare function, whereas the FB implementation i use compares within the function.

My big new idea was to try Flashsort, but that was slower than both qsort methods, this may be due to a poor implementation however as i don't understand flashsort fully.

The code i posted above is the best i have come up with so far. This runs in approx 190 seconds on my machine (P4 1.8Ghz, 256 Mb)

The speed using crt qsort was 220 seconds, using flashsort it was 265 seconds.

I have not thought of a way yet to eliminate my merge phase, which is very expensive. I probably will not update my code unless i can think of a way to eliminate the merge phase, which is unlikely unless i do a lot of research as my knowledge of sorting methods is very limited. The method Moneo oulined is how I do it, and was what I initially thought of, it seemed most the natural to me.
EVEN MEN OF STEEL RUST.
[Image: chav.gif]
Reply
#27
I suppose this is not good LOL

Mac

Code:
CLS
SHELL "sort <QB1Data.rnd >QB1Data.srt"
SYSTEM
Reply
#28
Yetifoot:
Sorry I did'nt see you had already posted a link to your code. 190 seconds is a lot better than your first 400. After all qucksort is almost the fastest sorting method around.
Don't know if my "idea" will do better, after all "traditional" methods with a merge phase are used for some reason.

I must thank Mac for opening the QBasic 1.1 category, this made me think a little, and devise a very simple method that works pretty well. Now I must add it the bells and whistles of FB: after all, 100 times more data have to be sorted there, my QB1.1 code would do it in 3.5 hours!

Mac:
I think you designed the data of the QB1.1 category to be able to use that trick Big Grin
But other categories can't use it as they sort binary data so to be fair...
Antoni
Reply
#29
Quote:......
P.S. Also I did not understand someone else's question that seem to reduce to
163532
If I sort this to
123356
does it matter which 3 comes first?

Hunhh? Whoever it was: what are you talking about? 3=3. No such concept in my mind of which comes first.

Mac
When the issue of stability was raised, it wasn't apparent that we are only sorting keys for this contest. Usually, data records are composed of keys and some data portion. When you have a data portion, stability can be an issue. If two input records had the same key but diffferent data, and your sort changed the original order of these records, then the sort is not stable.

Why would you need stability? Assume we had a file where the data portiion was already sorted by name. Now we want to sort the file by zipcode. Using a stable sort, we just need to sort on the zipcode key, and the names will still be in sequence within each zipcode. For an unstable sort, we would have to sort on 2 keys, the zipcode as major and the name as minor --- more time.

BTW, when Antoni says "REGISTER" he is translating from Spanish, and he really means "RECORD".
*****
Reply
#30
Quote:Assume we had a file where the data portiion was already sorted by name. Now we want to sort the file by zipcode. Using a stable sort, we just need to sort on the zipcode key, and the names will still be in sequence within each zipcode. For an unstable sort, we would have to sort on 2 keys, the zipcode as major and the name as minor --- more time.
I think you have that backwards.

In an unstable sort, if two records' zip codes compare equal, it's not guaranteed that the record on the lhs will stay on the lhs: the order of the two equal records may be switched, it just depends on the sorting implementation. Unstable sorts are faster because they don't have to worry about relative position; they can just put a record at the most convenient (efficient) spot in the sorted list.

Stable sorts do have to keep the relative position of two equal records, thus are slower, since any arbitrary position for a record won't do.

You can add more sort criterion - for example by name - but all you're changing is the definition for equality - you still have to decide what to do with two (fully) equal records (whether that equality is based on zip code, name and/or whatever). The distinction is subtle, but IMO, important.

Quote:BTW, when Antoni says "REGISTER" he is translating from Spanish, and he really means "RECORD".
Yeah, I was confused for a minute there. Tongue
stylin:
Reply


Forum Jump:


Users browsing this thread: 1 Guest(s)