Qbasicnews.com

Full Version: Beware of the 4.3 gigabyte boundary
Hi guys,
Recently I made a comment that sheer volume will blow away many a good program. Well, at work, for backup purposes, we tried to zip a 4.3 gigabyte SQL database file using WinZip. Guess what? WinZip can't handle that large a file and issued an error.

I thought about it and then realized that 4.3 gigs is just over 2 to the 32nd power bytes (4,294,967,296, about 4.29 GB), which is one more than the largest value you can keep in a 32-bit register.
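For anyone who wants to check the arithmetic, here's a throwaway C snippet (nothing to do with WinZip itself, just the numbers):
Code:
#include <stdio.h>

int main(void)
{
    /* 2^32 is where a 32-bit size field runs out of room. */
    unsigned long long boundary = 1ULL << 32;              /* 4,294,967,296 bytes */
    unsigned long long db_size  = 4300ULL * 1000 * 1000;   /* roughly 4.3 GB      */

    printf("32-bit boundary: %llu bytes (about %.2f GB)\n", boundary, boundary / 1e9);
    printf("Database size:   %llu bytes, which is %s the boundary\n",
           db_size, db_size > boundary ? "over" : "under");
    return 0;
}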

I tried my personal copy of PKZIP 2.04g on the file and also got an error. Then I went to the PKWARE site and discovered that they had already run into this problem and have newer versions: PKZIP 2.5 and PKZIP 6.0 for Windows. I'm deciding on a replacement for WinZip.

I wonder how many other programs are out there that will bomb when given a 4.3+ gig file.

In QB, when you define a LONG variable, it is a signed 32-bit integer, so it can only hold values up to 2,147,483,647 (about 2.1 gigabytes). So if your program does a LOF on a larger file, it will blow up. This is only one example. How many other things will not work?
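To see the same wrap-around outside of QB, here's a small C illustration (int32_t plays the role of a QB LONG; strictly speaking the cast is implementation-defined, but on an ordinary PC it wraps like this):
Code:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* A QB LONG has the same range as int32_t: -2,147,483,648 to 2,147,483,647. */
    long long three_gig = 3000LL * 1000 * 1000;   /* a 3 GB file          */
    long long four_four = 4400LL * 1000 * 1000;   /* a 4.4 GB test file   */

    printf("3.0 GB seen by a LONG: %ld\n", (long)(int32_t)three_gig);  /* negative           */
    printf("4.4 GB seen by a LONG: %ld\n", (long)(int32_t)four_four);  /* bogus small value  */
    return 0;
}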

Does anyone have any experience or ideas on this matter?
*****
It would add a lot of code to make a zip handle files that large, and such huge files are very rarely zipped... A quicksort, for example, takes up more memory with a larger input than a smaller one, so multiple versions would have to be made.
Quote:Recently I made a comment that sheer volume will blow away many a good program. Well, at work, for backup purposes, we tried to zip a 4.3 gigabyte SQL database file using WinZip.

Many 32-bit filesystems won't allow files larger than 4 GB either: FAT32 is limited to 4 GB, while NTFS allows files so much larger (well into the terabytes) that its limit isn't a practical concern.

If you are using NTFS, you could try using Windows' native file compression on the file to reduce it below 4 GB and then archive it with WinZip. Alternatively, you could look at a file splitting utility to break the file into a number of smaller pieces.
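A minimal sketch of such a splitter is below, with made-up file names and a made-up 650 MB chunk size; whether stdio on a given system can even open a 4.3 GB file sequentially is its own question, but streaming with fread/fwrite at least avoids 32-bit seek offsets:
Code:
#include <stdio.h>

int main(void)
{
    const char *name = "input.dat";                        /* hypothetical input file */
    const unsigned long long chunk_size = 650ULL * 1024 * 1024;
    unsigned char buf[64 * 1024];

    FILE *in = fopen(name, "rb");
    if (!in) { perror(name); return 1; }

    FILE *out = NULL;
    unsigned long long written = 0;
    int part = 0;
    size_t got;

    while ((got = fread(buf, 1, sizeof buf, in)) > 0) {
        if (!out || written >= chunk_size) {
            if (out) fclose(out);
            char outname[256];
            sprintf(outname, "%s.%03d", name, part++);     /* input.dat.000, .001, ... */
            out = fopen(outname, "wb");
            if (!out) { perror(outname); return 1; }
            written = 0;
        }
        fwrite(buf, 1, got, out);
        written += got;
    }
    if (out) fclose(out);
    fclose(in);
    return 0;
}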

Hopefully this problem will be resolved when 64-bit operating systems/computers become mainstream. In the meantime, projects such as this http://ftp.sas.com/standards/large.file/ are aiming to get standardised large file support on 32-bit operating systems (mainly Unixes).
How would your memory handle something that big, anyway?
What may (also) be the problem is that the ZIP file format only allows 4 bytes (32 bits) each for the uncompressed size and the compressed size - see PKWARE's ZIP specification at wotsit.org. That spec is from 2001 though; there might be a newer version of the file format by now that supports larger file sizes - I dunno.
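Concretely, the size fields in question sit in the local file header. Per PKWARE's appnote the layout is roughly this (the struct is only an illustration; real code has to worry about packing and little-endian byte order):
Code:
#include <stdint.h>

/* ZIP local file header, field widths per PKWARE's appnote. */
struct zip_local_file_header {
    uint32_t signature;          /* always 0x04034b50            */
    uint16_t version_needed;
    uint16_t flags;
    uint16_t compression_method;
    uint16_t last_mod_time;
    uint16_t last_mod_date;
    uint32_t crc32;
    uint32_t compressed_size;    /* only 32 bits...              */
    uint32_t uncompressed_size;  /* ...so it tops out at ~4.29 GB */
    uint16_t file_name_length;
    uint16_t extra_field_length;
    /* file name and extra field follow */
};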

BTW, what version of WinZip are you using?

Quote:How would your memory handle something that big, anyway?
WinZip doesn't need to load the entire file into memory to work with it. Besides input/output buffers (which Windows probably takes care of), adaptive dictionary-based compression algorithms only have to store the dictionary in memory. And they can limit the dictionary size and still compress effectively. For a quick overview of how it works, go here. EDIT: this is clearer and explains it better. Found both links by googling "data compression". This site has links to tutorials on various compression algorithms.
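Here's a rough sketch of that idea using zlib as a stand-in (no claim about WinZip's actual internals, and binary-mode stdin/stdout is glossed over); the point is that memory use stays fixed at two small buffers plus deflate's 32 KB window, no matter how big the input stream is:
Code:
#include <stdio.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
    unsigned char in[16384], out[16384];
    z_stream strm;
    memset(&strm, 0, sizeof strm);          /* default allocators */

    if (deflateInit(&strm, Z_DEFAULT_COMPRESSION) != Z_OK)
        return 1;

    int flush;
    do {
        strm.avail_in = (uInt)fread(in, 1, sizeof in, stdin);
        strm.next_in  = in;
        flush = feof(stdin) ? Z_FINISH : Z_NO_FLUSH;

        do {                                 /* drain the output buffer */
            strm.avail_out = sizeof out;
            strm.next_out  = out;
            deflate(&strm, flush);
            fwrite(out, 1, sizeof out - strm.avail_out, stdout);
        } while (strm.avail_out == 0);
    } while (flush != Z_FINISH);

    deflateEnd(&strm);
    return 0;
}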

Quote:It would add a lot of code to make a zip handle files that large, and such huge files are very rarely zipped... A quicksort, for example, takes up more memory with a larger input than a smaller one, so multiple versions would have to be made.
Not so. C++ compilers usually support the long long type, which is 64-bit, and it takes no more C++ code to use it than to use an int. The compiler does have to generate a little more machine code for it, but not that much more.
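To make that concrete, a quick snippet (plain C here rather than C++, but the point is the same: the code has the same shape as the 32-bit version, just a wider type and a different printf format):
Code:
#include <stdio.h>

int main(void)
{
    /* Summing file sizes that would overflow a 32-bit counter. */
    unsigned long long sizes[] = { 4300000000ULL, 17179869184ULL };  /* ~4.3 GB, 16 GB */
    unsigned long long total = 0;

    for (int i = 0; i < 2; i++)
        total += sizes[i];

    printf("Total: %llu bytes\n", total);   /* no wrap-around at 2^32 */
    return 0;
}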

And thankfully, nothing like quicksort is required. Whether the file length is greater than 2^32 bytes or tiny makes no difference to the way the algorithm builds the dictionary - the compressor can keep the dictionary as small as it wants at the expense of compression quality/efficiency. See the links above.
What on earth do you have in that database????!?!?! Even a 100 MB database would store *massive* amounts of data. 4.3 gigs??!?!
Well, say you have a large company, 1500 employees....150 different tables with lots of columns, storing info about them all...
I'd say you could easily fill up 4.3 gigs in a database.
Quote:Not so. C++ compilers usually support the long long type, which is 64-bit, and it takes no more C++ code to use it than to use an int.

That's not always true; you are still limited by the library and system calls that talk to the filesystem. For example, the standard C library on Unix traditionally used the following two calls for getting and setting the offset of a file stream:
Code:
int fseek(FILE *stream, long int offset, int whence);
long int ftell(FILE *stream);

The problem with these two calls is that the offset has to fit inside a long int, so you can't just switch your own variables to long long. To get around this, there is another pair of calls:
Code:
int fsetpos(FILE *stream, const fpos_t *pos);
int fgetpos(FILE *stream, fpos_t *pos);

Now the offset is only limited by whatever fpos_t is defined as; most Unixes have something like:
Code:
typedef long long fpos_t;
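For completeness, here's roughly what using that pair looks like (the file name is hypothetical, and whether it actually works past 4 GB depends on how the particular C library defines fpos_t):
Code:
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("bigfile.dat", "rb");   /* hypothetical large file */
    if (!f) { perror("bigfile.dat"); return 1; }

    fpos_t mark;
    fgetpos(f, &mark);            /* save the current offset in an fpos_t */

    char buf[4096];
    fread(buf, 1, sizeof buf, f); /* read ahead a bit */

    fsetpos(f, &mark);            /* jump back; no long-sized offset involved */
    fclose(f);
    return 0;
}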

Other problems like this always crop up when attempting to port code from 32-bit to 64-bit.
Yeah, I didn't think about that.
LooseCaboose,

What is NTFS?

Well, this subject of 4.3 gig files is really weird. I made a 4.4 gig test file last week and tried some of my old software on it. Everything that reads files crashed, including Norton's famous FS file search.

If volumes keep getting larger and larger, we may be faced with a problem similar to Y2K in the near future. Can you imagine having to fix all programs that read/write files to handle files beyond the 4.3 gig boundary? Wow!
*****