"Out of string space" opening a big sequential file
#1
I have to open a 140 MB text file with 48- and 49-byte records and write a new one containing only the records that have "AC" at position 44.

I use:

OPEN "D:\FILE.TXT" FOR INPUT AS #1
OPEN "C:\CUIT\NEW.TXT" FOR OUTPUT AS #2

FOR RECORD& = 1 TO 1000 ' THERE ARE NEARLY 3M RECORDS
LINE INPUT #1, RECORD$
IF MID$(RECORD$,44,2)="AC" THEN WRITE #2, RECORD$
NEXT RECORD&


All I get is the following error:

Out of string space.

But if I replace D:\FILE.TXT (nearly 140 MB) with
C:\CUIT\PARTFILE.TXT (100 KB)

no errors occur.

Could it be that OPEN tests the file size before reading the first record?
Does OPEN have any limit on the size of a file it can open?


Alejandro Lieber
Rosario Argentina
The command line is a vestige of an era of macho computing
#2
Hi alajandrolieber

The file size limit (which is a DOS limit, not a QB limit per se) is 2 GB.

I'm not sure what the big file looks like, but one thing that can cause this error is a line longer than a string variable can hold.

Since you are using a LINE INPUT # statement, it seems likely that one of those lines is indeed longer than the string type allows. If that's the case, open the file in a Windows text editor and see if you can spot where the problem is.

It must be the cause.
When they say it can't be done, THAT's when they call me ;-).

#3
Hey Alejandro,

Use PRINT # instead of WRITE #.
WRITE # is strongly discouraged except in the special cases where you want quotes around string data. Try this, although it may not solve the "out of string space" problem.

You should not use a FOR loop from 1 to 1000. If there are fewer than 1000 records in the file, you will get an error. If there are more than 1000, you will not process the extra records.
Use the following to process all the records:
Code:
DO WHILE NOT EOF (1)
      LINE INPUT #1, RECORD$
      IF MID$(RECORD$,44,2)="AC" THEN PRINT #2, RECORD$
LOOP
What is the format of the input file D:\FILE.TXT ?
It should be text only.
Are all the records 48 or 49 bytes long?
How was this file created?
Are there other programs that can read it successfully?

Like MystikShadows says, maybe this input file has been corrupted somehow. If you can, view it with a text editor like he says. If the file is really 140MB, few editors will handle it.

Let us know what happened.
#4
Thanks all.

The input file is pure text, approx. 3,200,000 records; I cannot know the exact count because the records are 48 and 49 bytes long.

This file is the complete list of all Argentine taxpayers. It should have only 49-byte records, so it should be easy to open with random file access.

Here in Argentina we have V.A.T., so before issuing an invoice we need to know whether the buyer is on that list.

Up to now I have used XTREE Gold's View to find text in that file, but it reads the file sequentially, so it can take up to 2 minutes to find a record.

I think QBasic can read approx. 2M records in a random file.

I used FOR T=1 to 1000 just to begin programming.
It just cannot read the first record.

I repeat: the program can read and rewrite the data when I use only a small part of that huge file.
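
For reference, once every record has the same length, direct access could look roughly like the sketch below. It is only a sketch under assumptions: the file name C:\CUIT\FIXED.TXT is hypothetical, and it assumes every record occupies exactly 49 bytes including the line terminator, which is not yet true of the real file.

```basic
' Sketch only: read one record by number from a fixed-length file.
' Assumptions (not from the real file): FIXED.TXT is a hypothetical
' copy in which every record is exactly 49 bytes, terminator included.
CONST RECLEN = 49

TYPE TaxRecord
    Raw AS STRING * 49      ' whole record, including the terminator
END TYPE

DIM Rec AS TaxRecord
DIM Total AS LONG, Wanted AS LONG

OPEN "C:\CUIT\FIXED.TXT" FOR RANDOM AS #1 LEN = RECLEN
Total = LOF(1) \ RECLEN     ' number of complete records
Wanted = 1000               ' example record number
IF Wanted <= Total THEN
    GET #1, Wanted, Rec     ' jumps straight to the record, no scan
    PRINT LEFT$(Rec.Raw, 47)
ELSE
    PRINT "File has only"; Total; "records"
END IF
CLOSE #1
```

Reading by record number avoids scanning the file, but it still needs some way to map a taxpayer to a record number, which this sketch does not address.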

Alejandro Lieber
#5
Hey,
You said "I cannot read the first record".
This makes me suspect that this "text" file does not have records that end with a Carriage Return and Line Feed (CRLF) at the end of each record. Maybe this text file was created by a "C" program or on a Unix machine, which only put a Line Feed on the end of each record.

When you do a LINE INPUT, it expects to find a CRLF at the end of each record. It keeps reading until it finds one, or it fails with an "Out of string space" error.

You will probably need a hex editor to see what's at the end of each record. See if XTree has a hex option. Another quick way is to have XTree search the file for a CRLF. If it doesn't find any, then we know the file doesn't have any.

Let me know if this is the problem, so we can figure out how to fix it. For example: if XTree has a search and replace option, you could search for a Line Feed and replace with Carriage Return and Line Feed.
#6
Moneo is right :-).

Some freely available text editors offer the option to save for Linux or save for Windows (the latter would add that CRLF at the end of every line).
#7
Both files, the approx. 140 MB one and the approx. 100 KB one that is a part of it,
have character 0A at the end of each 38/39-byte record.

The program works perfectly with the small one, but gives "Out of string space" with the big one.

The problem is not the record delimiter.

I also did a:

PRINT FRE("") and got 29856

Alejandro Lieber
Rosario Argentina
#8
Quote:Both files, the approx. 140 MB one and the approx. 100 KB one that is a part of it,
have character 0A at the end of each 38/39-byte record.

The program works perfectly with the small one, but gives "Out of string space" with the big one.

The problem is not the record delimiter....
Alejandro, I am trying to help you, but you are not listening closely to what I am telling you. I have experience in these matters. Tell me if you want me to keep helping you.

You said ".... have character 0A at the end of each ....". Of course. A record ending in only a Line Feed (hex 0A) will end with a 0A just like a record ending with a Carriage Return and Line Feed (hex 0D0A) also ends with a 0A.
Go back and look at the record delimiters again and see what the character JUST BEFORE the ending 0A is. Check both the long and the short file. Let me know the results of this.
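
If no hex editor is at hand, a few lines of QBasic can do the same check. This is a minimal sketch, assuming the file holds at least 200 bytes; it opens the file in BINARY mode so that LINE INPUT is not involved at all. Run it once for each file by changing the name.

```basic
' Sketch: report what precedes the first Line Feed (0A) in a file.
' Assumption: the file is at least 200 bytes long.
DIM P AS INTEGER

OPEN "D:\FILE.TXT" FOR BINARY AS #1
Buf$ = SPACE$(200)          ' GET reads as many bytes as Buf$ is long
GET #1, , Buf$
CLOSE #1

P = INSTR(Buf$, CHR$(10))   ' position of the first LF, 0 if none
IF P = 0 THEN
    PRINT "No LF (0A) in the first 200 bytes"
ELSEIF P = 1 THEN
    PRINT "The very first byte is an LF (0A)"
ELSEIF MID$(Buf$, P - 1, 1) = CHR$(13) THEN
    PRINT "Records end in CR LF (0D 0A)"
ELSE
    PRINT "Records end in LF only (0A); byte before it is";
    PRINT ASC(MID$(Buf$, P - 1, 1))
END IF
```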
#9
Moneo: of course I am listening to you, and I am very grateful.
I have programmed for years, but only with RANDOM files, not SEQUENTIAL ones.

The character before the end-of-record 0A is ordinary text.
It is record data, with character codes between 33 and 127.

Could someone try to open any BIG text file with OPEN and report the results?

Alejandro Lieber
Rosario Argentina
#10
Quote:Moneo: of course I am listening to you, and I am very grateful.
I have programmed for years, but only with RANDOM files, not SEQUENTIAL ones.

The character before the end-of-record 0A is ordinary text.
It is record data, with character codes between 33 and 127.
....
Good. My experience with different dialects of Basic since 1969 has been with sequential access, never random.

If the character before the end of record 0A is NOT a 0D, then the records of this file cannot be read with LINE INPUT. I suspect that RANDOM won't work either.

I assume that this is the case for the large file. I suspect that the records of the small file, which can be processed, end in 0D0A. How was this small file created?

Anyway, I see the following options for you:
1) Obtain another version of the large file with records delimited by Carriage Return and Line Feed (0D0A).

2) If XTree has the option, search for 0A and replace with 0D0A, generating a new large file.

3) I had the same problem 10 years ago, and wrote a utility program to replace the record delimiters. I would need to find this program and send it to you.

So, which of the above 3 options is the best for you?
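
For option 3, a delimiter-replacing utility could be sketched along these lines. This is not the program mentioned above, only an illustration under assumptions: the output name C:\CUIT\FILECRLF.TXT and the 4096-byte chunk size are hypothetical choices. It reads the input in BINARY mode in small chunks, so no string ever grows beyond the chunk size, and it writes CR LF wherever it finds an LF that is not already preceded by a CR.

```basic
' Sketch: copy a file, turning lone LF (0A) into CR LF (0D 0A).
' Assumptions: output name and chunk size are illustrative only.
CONST CHUNK = 4096
DIM Remaining AS LONG
DIM N AS INTEGER, P AS INTEGER, S AS INTEGER

Lf$ = CHR$(10)
CrLf$ = CHR$(13) + CHR$(10)

OPEN "C:\CUIT\FILECRLF.TXT" FOR OUTPUT AS #2   ' create or empty it
CLOSE #2
OPEN "D:\FILE.TXT" FOR BINARY AS #1
OPEN "C:\CUIT\FILECRLF.TXT" FOR BINARY AS #2

Remaining = LOF(1)
Prev$ = ""                       ' last byte written so far
DO WHILE Remaining > 0
    IF Remaining < CHUNK THEN N = Remaining ELSE N = CHUNK
    Buf$ = SPACE$(N)
    GET #1, , Buf$               ' read the next N bytes
    Remaining = Remaining - N
    S = 1                        ' start of the unwritten part of Buf$
    DO
        P = INSTR(S, Buf$, Lf$)
        IF P = 0 THEN EXIT DO
        IF P > S THEN            ' write the text before the LF
            Piece$ = MID$(Buf$, S, P - S)
            PUT #2, , Piece$
            Prev$ = RIGHT$(Piece$, 1)
        END IF
        ' keep an existing CR LF as is, otherwise add the CR
        IF Prev$ = CHR$(13) THEN PUT #2, , Lf$ ELSE PUT #2, , CrLf$
        Prev$ = Lf$
        S = P + 1
    LOOP
    IF S <= N THEN               ' write what follows the last LF
        Piece$ = MID$(Buf$, S)
        PUT #2, , Piece$
        Prev$ = RIGHT$(Piece$, 1)
    END IF
LOOP
CLOSE
```

The output grows by one byte per converted record, so roughly 3 MB of extra disk space would be needed for about 3 million records. Writing one record per PUT is simple but slow; a faster version would build a larger output buffer before each write.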