Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Challenge: 3-way merge:
#1
3-WAY MERGE PROGRAM

PURPOSE:
Write a program to merge 3 input text files, by key, and generate a merged output text file.

GIVEN:
* There are 3 input text files: AAA, BBB, and CCC.
* Let's call the output file DDD.
* The 3 files are always present and contain at least 1 record.
* Each file should be in sequence within itself.
* The records are variable length, minimum 4 bytes, maximum 100.
* The key is in the first 4 bytes of each record.
* The key is strictly numeric, unsigned with leading zeros.
* The key can have values from 0000 to 9999.
* Records with duplicate keys can occur.
* Therefore, the output file could have over 30,000 records.

TIP:
In order not to have to check the sequence of each input file, just check the sequence of the output file. If it's out of sequence, then (1) one of the input files was out of sequence, or (2) your program is not merging correctly.

NOTE:
This is a problem that has appeared on programming aptitude tests. It is not as simple as it seems. I suggest you do a flowchart first. It wil take about 70-100 lines of code, without comments. Please debug your code before posting.

I will prepare some sample input files on my machine for testing your posted programs.

Good luck!
*****
Reply
#2
1.-The records are variable length? How are they separed? CR+LF?

2.-We can suppose the records in the input files have been sorted previously?
Antoni
Reply
#3
Yes, Antoni, the records end with CR LF.

You can "suppose" that the records on each input file have been sorted, but you should sequence check them anyway. I ALWAYS check it to make sure. Otherwise, like I said in some other post, my program looks like it failed when the output comes out unsorted. In this particular program, I would only check the sequence of the output.
*****
Reply
#4
Well it looks like nobody jumped in with a solution. You probably figured out that this is a tricky program. Let me give you a few tips that really simplify the logic of this program:

COMPARE CONTROL INDICATOR:
----------------------------------------
I call this NEXTLOW. This is a integer whose bits indicate the results of comparing the keys of the 3 files (AAA,BBB,CCC).
Bit 1 of NEXTLOW corresponds to file AAA.
Bit 2 of NEXTLOW corresponds to file BBB.
Bit 4 of NEXTLOW corresponds to file CCC.

As a result of the compares, you OR the corresponding bit or bits into NEXTLOW based on which file key or keys are low.
Then:
NEXTLOW is 1 when key of file AAA compared the lowest.
NEXTLOW is 2 when key of file BBB compared the lowest.
NEXTLOW is 3 when keys of AAA & BBB compared the lowest.
NEXTLOW is 4 when key of file CCC compared the lowest.
NEXTLOW is 5 when keys of AAA & CCC compared the lowest.
NEXTLOW is 6 when keys of BBB & CCC compared the lowest.
NEXTLOW is 7 when keys of AAA, BBB & CCC compared equal.

When you're done with the compares and setting up NEXTLOW, which is about 16 lines of code, the rest of the program is trivial because the NEXTLOW indicator tells you what you have to do. Example: If NEXTLOW IS 5, you have to write the records from file AAA and CCC to the output, and then you have to read new records from AAA and CCC, and go back to the compare logic again.

HIGHVALUE CONTROL:
---------------------------
Using the concept of "highvalue" save you from setting switches to indicate that particular files have gone to end-of-file. How does this work? When the read for a particular file detects an end-of-file, you stuff a "highvalue" into the key for this file. A highvalue is a string or number that you have determined will be higher that any key of any file. In this sample program, we have a 4 byte key, To be absolutely safe we could set up a highvalue of 5 bytes of hex'FF'.
Having done this for all 3 files, then at the end of the compare logic we add one line of code:
IF NEXTLOW = 7 AND KEY.OF.AAA = HIGHVALUE THEN GOTO EOJ

What has happened is that the HIGHVALUE has flushed all the files. When the keys of all 3 files are sitting at HIGHVALUE, the program is finished --- there's no more records to process.

The above concepts of NEXTLOW and HIGHVALUE have been used successfully for over 40 years by industry expert. I didn't just pull them out of my hat.
*****
Reply
#5
A little late...
Errors I have considered:
Input or output files can't be opened
Input files out of order
Blank lines
files with 0 records
....

But ,you know, the errors can come in infinite flavours... Big Grin


Code:
'3 Way Merge by Antoni Gual 2003

SCREEN 0: CLS

ON ERROR GOTO FILERR
OPEN "aaa.txt" FOR INPUT AS #1
IF ERRATA THEN PRINT "FILE AAA.TXT CAN'T BE OPENED": END
OPEN "bbb.txt" FOR INPUT AS #2
IF ERRATA THEN PRINT "FILE BBB.TXT CAN'T BE OPENED": END
OPEN "ccc.txt" FOR INPUT AS #3
IF ERRATA THEN PRINT "FILE CCC.TXT CAN'T BE OPENED": END
OPEN "ddd.txt" FOR OUTPUT AS #4
IF ERRATA THEN PRINT "OUTPUT FILE  CAN'T BE OPENED": END
ON ERROR GOTO 0

CONST highval = &H7FFF
DIM A$(1 TO 3)
DIM ln(1 TO 3) AS INTEGER
DIM cnt(1 TO 3) AS INTEGER
FOR i% = 1 TO 3
IF LOF(i%) = 0 THEN ln(i%) = highval ELSE ln(i%) = -1
NEXT
MIN% = -2

DO
  LMIN% = MIN%
  MIN% = highval: indx% = 0
  FOR i% = 1 TO 3
    IF ln(i%) < MIN% THEN
      MIN% = ln(i%): indx% = i%
    END IF
  NEXT
  IF MIN% = highval THEN
    EXIT DO
  ELSE
    IF MIN% < LMIN% THEN
       PRINT USING "FILE _## OUT OF ORDER AT POS ##  \     \"; indx%; cnt(indx%); A$(indx%)
       CLOSE
       KILL "DDD.TXT"
       END
    END IF
    IF MIN% >= 0 THEN
      PRINT #4, A$(indx%)
      PRINT A$(indx%)
    END IF
    DO
    IF EOF(indx%) THEN
      ln(indx%) = highval: EXIT DO
    ELSE
        LINE INPUT #indx%, A$(indx%)
        
        IF LEN(A$(indx%)) = 0 THEN
          PRINT USING "Skipping blank line after record _#### in file #"; cnt(indx%); indx%
          GOTO skipblank
        END IF
        cnt(indx%) = cnt(indx%) + 1
      ln(indx%) = VAL(LEFT$(A$(indx%), 4))
    END IF
skipblank:
    LOOP UNTIL LEN(A$(indx%))
  END IF
LOOP
PRINT
PRINT USING "#### records of file 1 "; cnt(1)
PRINT USING "#### records of file 2 "; cnt(2)
PRINT USING "#### records of file 3 "; cnt(3)
PRINT "Have been sorted"
CLOSE
END

FILERR: ERRATA = ERR: RESUME NEXT
Antoni
Reply
#6
Excellent Antoni. I gave it a good test with all 7 posible matching conditions, and it works 100%.

Too bad I can't understand your code, as usual. I'm surprised you didn't manage to use recursion.
*****
Reply
#7
No it' does not use recursion, but it has a GOTO, that's even worse. :evil:
Antoni
Reply
#8
ANTONI,

This is your original code which contains the GOTO.
Code:
DO
    IF EOF(indx%) THEN
      ln(indx%) = highval: EXIT DO
    ELSE
        LINE INPUT #indx%, A$(indx%)
        
        IF LEN(A$(indx%)) = 0 THEN
          PRINT USING "Skipping blank line after record _#### in file #"; cnt(indx%); indx%
          GOTO skipblank
        END IF
        cnt(indx%) = cnt(indx%) + 1
      ln(indx%) = VAL(LEFT$(A$(indx%), 4))
    END IF
skipblank:
    LOOP UNTIL LEN(A$(indx%))
I think it can be structured as follows to avoid using GOTO.
Code:
DO
    IF EOF(indx%) THEN
      ln(indx%) = highval: EXIT DO
    ELSE
        LINE INPUT #indx%, A$(indx%)
        
        IF LEN(A$(indx%)) = 0 THEN
          PRINT USING "Skipping blank line after record _#### in file #"; cnt(indx%); indx%
        ELSE
          cnt(indx%) = cnt(indx%) + 1
          ln(indx%) = VAL(LEFT$(A$(indx%), 4))
        END IF
     END IF
    LOOP UNTIL LEN(A$(indx%))
Won't this work?
*****
Reply
#9
Hey guys,
This challenge has been open for 2 weeks and we only had one submitted solution, an excellent piece of code by Antoni, which complies with all the specifications and works 100%.

The imlementation of this solution is considered a classic programming problem. You all should take a crack at it to appreciate the subtleties. You will definitely learn by doing.

I'm going to establish a deadline of 1 week from today, 26-Jul-2003 23:59 GMT for additional solutions, otherwise Antoni will be the winner.
*****
Reply


Forum Jump:


Users browsing this thread: 1 Guest(s)