
DrV · 09-19-2005, 06:03 AM

You could also avoid using LINE INPUT and just read a buffer of a certain number of bytes, searching for 0A and splitting at those points.
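A minimal C sketch of the buffered approach, assuming a hypothetical helper `split_on_lf` and an assumed maximum record length of 1024 bytes. It reads the file in fixed-size chunks and splits at every 0x0A byte. A partial record is carried over into the next chunk, and a last record without a trailing LF is flushed at end-of-file. Any CR bytes are left inside the records.

```c
#include <stdio.h>

#define CHUNK_MAX 4096   /* largest chunk read by one fread */
#define MAX_RECORD 1024  /* assumed upper bound on one record's length */

/* Called once per record, without the trailing LF. */
typedef void (*record_fn)(const char *rec, size_t len, void *ctx);

/* Reads fp in chunks of chunk_size bytes and splits at each 0x0A byte.
   A record that straddles two chunks is carried over in rec[].
   on_record may be NULL to count records only.
   Returns the number of records, or -1 if one exceeds MAX_RECORD. */
long split_on_lf(FILE *fp, size_t chunk_size, record_fn on_record, void *ctx)
{
    char chunk[CHUNK_MAX];
    char rec[MAX_RECORD];
    size_t rec_len = 0, n, i;
    long count = 0;

    if (chunk_size == 0 || chunk_size > CHUNK_MAX)
        chunk_size = CHUNK_MAX;

    while ((n = fread(chunk, 1, chunk_size, fp)) > 0) {
        for (i = 0; i < n; i++) {
            if (chunk[i] == '\n') {            /* 0x0A closes the record */
                if (on_record) on_record(rec, rec_len, ctx);
                count++;
                rec_len = 0;
            } else if (rec_len < MAX_RECORD) {
                rec[rec_len++] = chunk[i];     /* may continue into the next chunk */
            } else {
                return -1;                     /* record longer than MAX_RECORD */
            }
        }
    }
    if (rec_len > 0) {                         /* last record had no LF */
        if (on_record) on_record(rec, rec_len, ctx);
        count++;
    }
    return count;
}
```

Open the file with `fopen(path, "rb")` so that the C runtime does not translate line endings. Passing `NULL` as the callback only counts records.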

Moneo · 09-19-2005, 06:41 AM

Quote: You could also avoid using LINE INPUT and just read a buffer of a certain number of bytes, searching for 0A and splitting at those points.

Yes, this could be done, but then I/O becomes the major part of the program: you have to figure out where records begin and end, handle a record that spans two buffers, and detect end-of-file.

You would be programming around a file input problem instead of fixing the file problem in the first place.
*****

Moneo · 09-20-2005, 12:30 AM

Alejandro,

Of the 3 options that I mentioned before, I see only options 1 and 3 working correctly.
Option 1 goes back to the source of the file to get a version delimited by CRLF only. This is the best option, since it spares us from having to manipulate the file at all.

Option 3 uses my utility to convert the delimiters to CRLF. I found this utility program, which I wrote in C, and its documentation indicates that UNIX text files can contain several combinations of record delimiters, not just a lone Line Feed, such as:
Carriage Return, Carriage Return, Line Feed
Carriage Return and Line Feed
Carriage Return and Form Feed
Carriage Return only
Line Feed only
Form Feed only
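Below is a C sketch of the kind of conversion such a utility performs. It is not the utility described above, whose source is not shown. The function name `normalize_to_crlf` is hypothetical, and the rule that the longest delimiter wins at each CR is an assumption. That rule is also where ambiguity creeps in: CR CR LF could be read as a lone CR (an empty record) followed by CR LF. The correct rule therefore depends on what produced the file.

```c
#include <stddef.h>

/* Copies src to dst, replacing each record delimiter with CR LF.
   At a CR the longest match wins: CR CR LF, then CR LF or CR FF,
   then a lone CR. A lone LF or a lone FF is also one delimiter.
   dst must hold 2 * len bytes (worst case: every byte is a lone
   delimiter). Returns the number of bytes written. */
size_t normalize_to_crlf(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i = 0, out = 0;

    while (i < len) {
        unsigned char c = src[i];
        size_t skip = 0;                       /* delimiter length; 0 = data byte */

        if (c == '\r') {
            if (i + 2 < len && src[i + 1] == '\r' && src[i + 2] == '\n')
                skip = 3;                      /* CR CR LF */
            else if (i + 1 < len && (src[i + 1] == '\n' || src[i + 1] == '\f'))
                skip = 2;                      /* CR LF or CR FF */
            else
                skip = 1;                      /* CR alone */
        } else if (c == '\n' || c == '\f') {
            skip = 1;                          /* LF alone or FF alone */
        }

        if (skip) {
            dst[out++] = '\r';                 /* emit one standard CR LF */
            dst[out++] = '\n';
            i += skip;
        } else {
            dst[out++] = c;                    /* ordinary data byte */
            i++;
        }
    }
    return out;
}
```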

Since we don't know what system this file was created on, if they won't do the conversion to CRLF for us (option 1), I suggest we use my utility.

Option 2, using XTree, turns into a manual job: running the search and replace once for every combination of delimiters.

No matter what method we use to fix the file, your original program should include a test of the length of each record. As per your specifications, the records must be 48 or 49 bytes long. If your program finds a record that is not 48 or 49 bytes long, then there was some file conversion problem.
*****

Oz · 09-20-2005, 03:36 AM

May I also suggest a BINARY file?

just pad every record to the same number of bytes (50 would probably be easiest) by adding spaces or "NULL" characters

then you could jump straight to any record using

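[syntax="qbasic"]somevar$ = SPACE$(50)                 ' BINARY-mode GET reads LEN(somevar$) bytes
GET #ff, rec_num& * 50 + 1, somevar$  ' byte positions start at 1; rec_num& counts from 0[/syntax]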

that would be the easiest way to get at records
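The same fixed-width idea looks roughly like this in C. `REC_WIDTH`, `pad_record` and `read_record` are hypothetical names, and padding with spaces is one of the two options mentioned above. C file offsets start at 0, while QBasic's BINARY-mode GET positions start at 1. Record n, counting from 0, therefore sits at byte n * 50 here and at n * 50 + 1 in QBasic.

```c
#include <stdio.h>
#include <string.h>

#define REC_WIDTH 50  /* fixed record width suggested above (assumption) */

/* Pads one record (delimiter already removed) to REC_WIDTH bytes with
   spaces. Returns 0 on success, -1 if the record is too long. */
int pad_record(const char *rec, size_t len, char out[REC_WIDTH])
{
    if (len > REC_WIDTH)
        return -1;
    memcpy(out, rec, len);
    memset(out + len, ' ', REC_WIDTH - len);
    return 0;
}

/* Reads record rec_num (counting from 0) from a file of fixed-width
   records. A long offset avoids the 16-bit overflow that an INTEGER
   record number hits on a 140 MB file. Returns 0 or -1. */
int read_record(FILE *fp, long rec_num, char out[REC_WIDTH])
{
    if (fseek(fp, rec_num * REC_WIDTH, SEEK_SET) != 0)
        return -1;
    return fread(out, 1, REC_WIDTH, fp) == REC_WIDTH ? 0 : -1;
}
```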

Oz~

Moneo · 09-20-2005, 04:19 AM

Oz,
That's a good idea for later, but first we need to get this "text" file into a standard format with only CRLF as record delimiters.
*****

alajandrolieber · 09-20-2005, 05:36 AM

Moneo was right.

OPEN will not detect a new record with only 0A as the last character.

So QBasic could not open the file because it saw it as a single 140 MB record.

I have the program GSAR: General Search and Replace Utility by Tormod Tjaberg.

It seems it can do what I need:

gsar -ud -o file.txt

will rewrite file.txt replacing each 0A with 0D0A (UNIX to DOS)
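Here is the replacement described above as a naive C sketch. It is not GSAR's code, and `lf_to_crlf` is a hypothetical name. A record that already ends in CR LF comes out as CR CR LF, so this kind of conversion is only safe when every record ends in a single LF.

```c
#include <stddef.h>

/* Naive UNIX-to-DOS conversion: each 0x0A becomes 0x0D 0x0A and every
   other byte is copied unchanged. dst needs room for 2 * len bytes.
   Returns the number of bytes written. */
size_t lf_to_crlf(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\n')
            dst[out++] = '\r';   /* insert a CR before every LF, even after an existing CR */
        dst[out++] = src[i];
    }
    return out;
}
```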

I will try it next Wednesday.

Alejandro Lieber
Rosario Argentina

Moneo · 09-20-2005, 07:20 AM

Alejandro, Please read my comments below.

Quote: Moneo was right.

OPEN will not detect a new record with only 0A as the last character.
NO. "OPEN" DOES NOT DETECT RECORD DELIMITERS. THE PROBLEM OCCURS ON THE "LINE INPUT".

So QBasic could not open the file because it saw it as a single 140 MB record.
AGAIN, NOT A QBASIC "OPEN" PROBLEM. THE FIRST EXECUTION OF "LINE INPUT" SAW THE HUGE RECORD.

I have the program GSAR: General Search and Replace Utility by Tormod Tjaberg.

It seems it can do what I need:

gsar -ud -o file.txt

will rewrite file.txt replacing each 0A with 0D0A (UNIX to DOS)

I will try it next Wednesday.
IN A PREVIOUS POST, I MENTIONED AT LEAST 5 OTHER COMBINATIONS OF RECORD DELIMITERS THAT "COULD" APPEAR IN THE FILE. UNLESS YOU ARE ABSOLUTELY SURE THAT EVERY RECORD IS DELIMITED BY A SINGLE LINE FEED, THIS GSAR PROGRAM WILL NOT WORK AND WILL ONLY SCREW UP THE RECORD DELIMITERS EVEN MORE.
IF YOU KNOW EXACTLY HOW MANY RECORDS THE FILE HAS, YOU COULD USE XTREE OR ANOTHER UTILITY TO COUNT THE NUMBER OF LINE FEEDS IN THE FILE. IF THE TWO COUNTS MATCH EXACTLY, THEN YOU CAN USE THE GSAR UTILITY.
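The counting check can be sketched in C as a single pass over the file. `count_delimiters` and `lf_only_is_safe` are hypothetical names. Requiring zero CR and FF bytes goes one step beyond the LF count suggested here. Open the file with `"rb"` so that no line-ending translation happens.

```c
#include <stdio.h>

/* Tally of the delimiter-like bytes found in a file. */
struct delim_counts {
    long lf;  /* 0x0A */
    long cr;  /* 0x0D */
    long ff;  /* 0x0C */
};

/* One pass over the file, counting LF, CR and FF bytes. */
struct delim_counts count_delimiters(FILE *fp)
{
    struct delim_counts c = {0, 0, 0};
    int ch;

    while ((ch = getc(fp)) != EOF) {
        if (ch == '\n')      c.lf++;
        else if (ch == '\r') c.cr++;
        else if (ch == '\f') c.ff++;
    }
    return c;
}

/* LF-to-CRLF conversion looks safe when the LF count equals the known
   record count and no CR or FF bytes appear anywhere in the file. */
int lf_only_is_safe(struct delim_counts c, long expected_records)
{
    return c.lf == expected_records && c.cr == 0 && c.ff == 0;
}
```

If the last record has no trailing LF, the LF count will be one less than the record count.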

Alejandro Lieber
Rosario Argentina

Alejandro,
Where was this file produced? Was it on Unix? Is it a "print file"? Are you familiar with the program or utility that produced this exact version of the large file?

I've asked you these questions before, but you don't answer. You seem to want to find alternative solutions. I've told you that I have run into this problem before. It is not a simple problem, and the solution is not simple either.

The simplest solution, as I've said before, is to go back to the source of the data file, and request a version with records delimited only by CRLF. Is this option not feasible?

Again, if you run the GSAR program without the assurance that ALL the records are delimited only by one Line Feed, that is, without having counted, then you will be headed for disaster. Believe me.
*****

alajandrolieber · 09-20-2005, 07:47 AM

>IN A PREVIOUS POST, I MENTIONED AT LEAST 5 OTHER COMBINATIONS
>OF RECORD DELIMITERS WHICH "COULD" APPEAR ON THE FILE.
>UNLESS YOU ARE ABSOLUTELY SURE THAT ALL THE RECORDS ARE
>DELIMITED ONLY BY ONE LINE FEED, THEN THIS GSAR PROGRAM WILL
>NOT WORK AND ONLY SCREW UP THE RECORD DELIMITERS EVEN
>MORE.

All the records are delimited by a single LF.

>Where was this file produced? Was it on Unix? Is it a "print file"? Are you
>familiar with the program or utility that produced the exact version of this large
>file?

The file was produced by the Argentine government.

>The simplest solution, as I've said before, is to go back to the source of the
>data file, and request a version with records delimited only by CRLF. Is this
>option not feasible?

I have told them about the problem of mixing 38- and 39-byte records.
Nothing happened in the following release.

>Again, if you run the GSAR program without the assurance that ALL the
>records are delimited only by one Line Feed, that is, without having counted,
>then you will be headed for disaster. Believe me.

I believe you. But the original file is on a CD, so I can try several ideas.

Alejandro Lieber

Moneo · 09-20-2005, 10:57 PM

Ok, Alejandro, it looks like you are in control of all the information.

So, are you going to run the GSAR program? Later, when you have a new large file with records delimited by CRLF, don't forget to add the following logic that I suggested to your original program:

..... your original program should include a test of the length of each record. As per your specifications, the records must be 48 or 49 bytes long. If your program finds a record that is not 48 or 49, then there was some file conversion problem.

You should also add a record count to the program, printing it out at the end, to ensure that the number of records processed matches your original specifications.
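A C sketch of both checks, run against the converted CRLF file. `check_records` and `struct run_stats` are hypothetical names. The 48/49 limits come from the specification quoted above. A later message mentions 38- and 39-byte records, so adjust `MIN_LEN` and `MAX_LEN` to the real specification.

```c
#include <stdio.h>

#define MIN_LEN 48  /* record lengths from the specification in this thread */
#define MAX_LEN 49

struct run_stats {
    long records;    /* total records read */
    long bad;        /* records whose length is outside MIN_LEN..MAX_LEN */
    long first_bad;  /* 1-based number of the first bad record, 0 if none */
};

/* Counts one record and checks its length (delimiter excluded). */
static void tally(struct run_stats *s, long rec_len)
{
    s->records++;
    if (rec_len < MIN_LEN || rec_len > MAX_LEN) {
        s->bad++;
        if (s->first_bad == 0)
            s->first_bad = s->records;
    }
}

/* Reads a CR LF delimited file and returns the record count plus the
   number and position of records with an unexpected length. */
struct run_stats check_records(FILE *fp)
{
    struct run_stats s = {0, 0, 0};
    long len = 0;        /* bytes in the current record so far */
    int ch, last = EOF;  /* previous byte, to spot the CR of CR LF */

    while ((ch = getc(fp)) != EOF) {
        if (ch == '\n') {
            tally(&s, last == '\r' ? len - 1 : len);  /* don't count the CR */
            len = 0;
        } else {
            len++;
        }
        last = ch;
    }
    if (len > 0)         /* final record without a delimiter */
        tally(&s, len);
    return s;
}
```

At the end of the run, print `records`, `bad` and `first_bad`, and compare `records` with the record count in the specification.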

Let us know of the results or any new problems. Good luck!
*****

alajandrolieber · 09-22-2005, 03:08 AM

Moneo:

Your help has been indispensable.

Thank you very much; I hope I can help you someday.

Alejandro Lieber
Rosario Argentina
