Qbasicnews.com

Full Version: NIO 2003 Exercise 3
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2 3
Note: these examples are from the Dutch Informatics Olympiad, on very high level. I post them here because moneo finally found out that every challenge here is about games. Now, here come some very straightforward problems... Big Grin

Exercise 3. Recognising Letters.
In this exercise you'll make a start with recognising letters. To limit us a bit with it, we'll only work with capital letters.

These letters are shaped as a series of black boxes on a field of 8 by 7 squares. The patterns to compare these "characters" with must also been seen as a series of black boxes.

Such a field of 8 by 7 boxes can also be displayed with 7 numbers. Every number indicates which boxes in that column are coloured. Every row of the field has a number and the numbers of the black boxes summed up return the total number of that column. The seven column numbers indicate how the character looks like.

The rows are designated 128, 64, 32, 16, 8, 4, 2 and 1 from top to bottom. Here is an example of the capital letter A. The numbers on the bottom are the column numbers, unique to the character.

Code:
...#...   <row 128
..###..   <row 64
..###..   <row 32
.##.##.   <row 16
.#####.   <row 8
#######   <row 4
##...##   <row 2
##...##   <row 1
abcdefg

a = 7
b = 31
c = 124
d = 236
e = 124
f = 31
g = 7

File:
There is a file patroon.dat (patroon is dutch for pattern) available with a description of all these patterns. The file has the same structure as all input files:
The first line contains a number N, which indicates that there are N patterns in the file (in patroon.dat this obviously is 26).
Then follow N lines with each 7 numbers k, with 0 <= k <= 255. On every line the column numbers of ONE pattern are displayed.

The contents of the file patroon.dat is as follows:
Code:
26
7 63 124 236 124 63 7
255 255 219 219 219 255 110
60 126 231 195 195 102 36
255 255 195 195 231 126 60
255 255 219 219 219 195 195
255 255 216 216 216 192 192
60 126 227 203 235 110 44
255 255 24 24 24 255 255
0 0 255 255 255 0 0
4 6 7 3 7 254 252
255 255 24 60 102 195 129
255 255 3 3 3 3 3
255 127 48 24 48 127 255
255 255 112 60 14 255 255
60 126 231 195 231 126 60
255 216 216 216 216 216 112
60 126 199 203 199 126 61
255 216 216 216 220 223 115
115 251 219 219 219 223 206
192 192 255 255 255 192 192
252 254 7 3 7 254 252
224 252 30 3 30 252 224
254 255 6 12 6 255 254
195 231 126 60 126 231 195
199 238 124 56 112 224 192
195 199 207 219 243 227 195

When in the exercises a pattern fits just as good with two or more letters, you choose for the letter which occurs most in the English language. The file vaakst.dat contains a line of text that exists of all 26 capital letters, ordered by occurance in the the English language, starting at the letter with the highest occurance.

This is the contents of the vaakst.dat file:
Code:
ENATRDOISLGHVMKUWPCBZYFJXQ
You can use this file whenever possible. If a pattern fits just as good with the C as with the G, your program has to select the G because it is more often used in English than the C.

Input
Your program will receive as input a file TEKST.IN (dutch for text), which has the same structure as the file patroon.dat. On the first line there is a number N, with 1 <= N <= 75. Next there are N lines of each 7 numbers, these seven numbers indicate a pattern that your program tries to recognise as a letter.

Example: File tekst0.in
Code:
3
224 129 129 255 129 129 224
0 62 42 42 35 54 0
0 0 28 4 4 0 0
This file will be used as an example input in all exercises.

Example files and testing
There are files tekst0.in, tekst1.in up to and including tekst5.in available which you can test your program with.


Overview
Code:
Subpart  Program   Output   Timelimit   Number of tests   Points per test   Total
3A       nio3a     3a.out   2 sec       8                 2                 16
3B       nio3b     3b.out   2 sec       10                2                 20
3C       nio3c     3c.out   2 sec       8                 3                 24
3D       nio3d     3d.out   2 sec       10                4                 40


Exercise 3A: Draw the input as a picture
Write a program nio3a that receives a file tekst.in as input. The program has to print to file 3a.out the patterns from the input file by means of points (.) and X's. The letters will be printed one after another, always followed by a blank line.

Output with the give example (tekst0.in):
Code:
XXXXXXX
X..X..X
X..X..X
...X...
...X...
...X...
...X...
.XXXXX.

.......
.......
.XXXXX.
.X...X.
.XXX...
.X...X.
.XXXXX.
.......

.......
.......
.......
..X....
..X....
..XXX..
.......
.......


Exercise 3B: Count the similarities
Write a program nio3b that gets its input from a file tekst.in. The program will give as output a file 3b.out that contains ONE line of N capital letters. The text in the output file is the interpretation of the input patterns by means of the following algorithm.

You compare every inputted pattern with the 26 character patterns from pattern.dat. For every letter you count the amount of boxes that are the same both in the character as in the pattern. With an ideal similarity this amount will be 56 (7x8). Your program chooses as interpretation for the pattern the letter with the highest number of equal boxes, in case these are more letters the letter first called in the file vaakst.dat is taken.

Example output with the tekst0.in:
Code:
TAI




Some problems
When you compare a pattern with a standardized character using the above algorithm, of course things can go wrong. Two examples:
- Letter thickness: Maybe the one whose letters are to be recognised makes his letters less or more bold.
- Letter size: Maybe the letters to be recognised don't fill the entire area of 7x8 squares.

Intermediate Patterns
To avoid the problem of letter size we will limit us to the smallest rectangle which contains all black squares. In the input files for the exercises the letters will contain at least one black square. To also avoid letter thickness, we have to look at color-intermediates. A way to do this is as follows:
1. Write the pattern by means of W and B (white and Black), for example the 2nd letter from tekst0.in:
Code:
BBBBB
BWWWB
BBBWW
BWWWB
BBBBB
2. Now remove all double B's and W's:
Code:
B
BWB
BW
BWB
B
3. Now remove all subsequent double lines (doesn't occur in this example).

This process can be repeated for the columns, from top to bottom. The number of lines that remain of the rows we call rowlines, and the number of lines that remain of the columns we call columnlines.

Here is an application to the A from pattern.dat (pattern b) and a smaller a-pattern (pattern p).

Code:
...X...
..XXX..
..XXX..
.XX.XX.
.XXXXX.
XXXXXXX
XX...XX
XX...XX

Rows:
WBW
WBWBW
WBW
B
BWB

Columns:
WB
WBW
BWBW
WBW
WB

Rowlines Rb = 5; Columnlines Cb =   5
Code:
.X.
   X.X
   XXX
   X.X
   X.X
      
      

Rows:
WBW
BWB
B
BWB

Columns:
WB
BWBW
WB

Rowlines Rp = 4; Columnslines Cp = 3


Exercise 3C: Detemining Intermediate Patterns
Write a program nio3c that gets its input from a file tekst.in and writes its output to a file 3c.out. The output consists of the intermediate patterns of the inputted patterns.
In the example beneath is the output format. Note that in the format the rows come before the columns and that each is started by a number indicating the number of lines. After each pattern there is a blank line.

Output with the example input (tekst0.in):
Code:
Pattern 1:
3
B
BWBWB
WBW
5
BW
BWB
B
BWB
BW

Pattern 2:
5
B
BWB
BW
BWB
B
3
B
BWBWB
BWB

Pattern 3:
2
BW
B
2
B
WB



Comparing by means of intermediate patterns
To determine the amount of similarities between two intermediate patterns, you'll go as follows. Both for the rows as for the columns you go look for the longest series of lines that occur in both intermediate patterns, in the same order. In above example these are the 3 lines of the rows and all lines of the columns of the second pattern. The number of common rowlines Rg and the number of common Cg you use in the next formula, that is a measurement of the amount of equality between two patterns:

O(b, p) = Rg / Rp + Rg / Rb + Cg / Cp + Cg / Cb

In the example the amount of equality is:
3/4+3/5+3/3+3/5 = 59/20 = 2.95
Obviously, you now choose as interpretation of a pattern the letter for which this number is as high as possible.

Exercise 3D: Comparing with Intermediate Patterns
Write a program nio3d that gets its input from a file tekst.in. The program outputs a file 3d.out that contains ONE line of N capital letters. The text in the output file is the interpretation of the inputted patterns, according to above algorithm based upon the amount of equality of two patterns.

Output with the given example (tekst0.in):
Code:
TEL




If you're able to complete these exercises, you're on your way to understand a bit about OCR... Wink

About me, I got for 3A 16 points, for 3B 20 points, for 3C 24 points and for 3D 0 (!) points, for a total of 60 out of 100, which I got 3rd place with, as you can see here. The reason for me to get 0 points at exercise 3D was that I had little time left to finish it completely:
During the competition, you had about 1 hour and 15 minutes for all these 4 small exercises!

Have fun I'd say! (I already made them last year Wink :kingSmile
Aah, a *real* challenge Smile. Thus, you'll probably get no entries Wink

What language did you write it in?
Unless I totally misunderstood it then I should have one done by tommorow (in qbasic)
Aw - this is the kind of problems I had on my exams... Too many bad memories Tongue
Quote:Aah, a *real* challenge . Thus, you'll probably get no entries

What language did you write it in?
... :roll: A *real* challenge -> no entries Big Grin Do you imply that no one here can make it? Wink lol
I wrote it in QB of course ... Smile

Quote:Unless I totally misunderstood it then I should have one done by tommorow (in qbasic)
No, I just mentioned that I only had 1.25 hours to make it. You can have all the time you want Smile

Quote:Aw - this is the kind of problems I had on my exams... Too many bad memories Tongue
Then you can finally overcome the memories by entering some code Wink


So... it may be a *real* challenge, but I didn't sit here 2 hours translating the problem from dutch to english for nothing... Tongue So... moneo, it won't take any time (at max about 2 hours or so!). So... nathan, oracle, white, try to write some Smile So... the rest, where are you? Wink
Quote:No, I just mentioned that I only had 1.25 hours to make it. You can have all the time you want Smile

Ok. I shall work on it in my spare time.
I stopped reading when I saw the word TEKST. What an eyeful!
I almost finished college and... NO MORE EXAM PROBLEMS Big Grin

Real life problems are far more easy than exam problems Sad
Quote:Real life problems are far more easy than exam problems Sad

I beg to differ. Real life problems can be exponentially tough than exam problems.


Neo: quite a cool challenge there. I wanted to enter. But unfortunately I have exams now... :???:
Okay, I'll take up the challenge, though I might just write it in java to annoy you Wink

I've only got free time on Tuesday (which might not be free time, I have to code an airport simulation in java for COMP then), and on Friday. So we'll see what I've done around then Wink
Pages: 1 2 3