Does anybody know anything about compression?
#11
I always thought Huffman encoding was easy; I had to write a Huffman encoder in Java as a first-year project, and it gives really nice results for the right kinds of data as well. :-)
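
For the curious, the core of the encoder side is just a priority queue over symbol frequencies. A minimal Python sketch (not the Java coursework version mentioned above; the function name is my own):
Code:
# Minimal Huffman code construction from symbol frequencies (sketch only).
import heapq
from collections import Counter

def huffman_code(data):
    freq = Counter(data)
    # Heap entries: (frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate single-symbol input
        return {sym: "0" for sym in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)      # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_code("this is an example of a huffman tree")
print(sorted(codes.items(), key=lambda kv: (len(kv[1]), kv[0])))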

* digs out notes on JPEG format *

Yeah, that is nasty, even so far as to have different colour conversions for PAL and NTSC JPEGs. Compression works on one luminance and two chrominance components (the chrominance can be treated more coarsely because the eye responds more strongly to luminance) and breaks the image into 8x8 blocks. The blocks are then passed through a Discrete Cosine Transformation (DCT) to create a set of 8x8 coefficient matrices. Quantization then removes the less important DCT values; each individual application must provide its own quantization tables to trade loss against compression (these aren't part of the JPEG standard). The differences between the {0, 0} (DC) values of consecutive blocks are then calculated, reducing most of them to small values.
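
A rough Python sketch of the DCT + quantization step (the sample block and the quantization table below are made-up placeholders, since as noted real encoders ship their own tables):
Code:
# Sketch of the 8x8 DCT + quantization step described above.
import math

def dct_8x8(block):
    """Naive 2-D DCT-II of an 8x8 block of samples."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for x in range(8) for y in range(8))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

def quantize(coeffs, qtable):
    """Divide each coefficient by its table entry and round: this is where
    the 'less important' DCT values collapse to zero."""
    return [[round(coeffs[u][v] / qtable[u][v]) for v in range(8)]
            for u in range(8)]

# Nearly flat example block and a crude placeholder quantization table.
block = [[16 + ((x + y) % 4) for y in range(8)] for x in range(8)]
qtable = [[10 + 2 * (u + v) for v in range(8)] for u in range(8)]
for row in quantize(dct_8x8(block), qtable):
    print(row)          # mostly zeros apart from the DC corner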

The 64 elements are then linearized with a zig-zag scan and compressed using RLE (;-)); usually there are lots of zeros in the lower right corner of the matrix, so this compresses quite well. The resulting data is then Huffman encoded for easy transmission. JPEG gives about a 20:1 compression ratio. Running the algorithm backwards decompresses it in about the same time it took to compress.
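
The zig-zag plus run-length step looks roughly like this (Python sketch; the (zeros, value) pairs are illustrative, not the exact JPEG run/size symbols):
Code:
# Zig-zag linearization and a simple run-length pass over 64 quantized values.
def zigzag_order(n=8):
    # Standard zig-zag visiting order: walk the anti-diagonals, alternating
    # direction, starting at the top-left (DC) coefficient.
    return sorted(((x, y) for x in range(n) for y in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def rle_zigzag(quant_block):
    linear = [quant_block[x][y] for x, y in zigzag_order()]
    pairs, zeros = [], 0
    for v in linear[1:]:               # DC is coded separately, as differences
        if v == 0:
            zeros += 1
        else:
            pairs.append((zeros, v))   # (number of preceding zeros, value)
            zeros = 0
    pairs.append("EOB")                # trailing zeros become one end-of-block
    return linear[0], pairs

# Typical quantized block: only a few low-frequency values survive top-left.
quant = [[0] * 8 for _ in range(8)]
quant[0][0], quant[0][1], quant[1][0], quant[2][0] = 14, -3, 2, 1
dc, ac_pairs = rle_zigzag(quant)
print("DC:", dc, "AC:", ac_pairs)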

JPEG is a bit tricky, but at least it isn't as bad as MPEG (which is the same only worse) or bzip, which one of my tutors commented on: "The author of bzip compression was clearly on heavy drugs at the time, and I could show you how the compression algorithm works, but you honestly wouldn't believe me" ;-).
Jesus saves.... Passes to Moses, shoots, he scores!
Reply
#12
The idea of how the LZW decoder predicts the next character is just brilliant! :o (a very simple and effective algorithm, but not so easy to grasp).
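
For anyone curious: the decoder rebuilds the same dictionary the encoder built, one step behind, and there is exactly one awkward case where a code arrives before the decoder has added it. A minimal Python sketch (decoder only, assuming the codes are already unpacked into a list of integers):
Code:
# Minimal LZW decoder, including the tricky "code not yet in the table" case.
def lzw_decode(codes):
    table = {i: bytes([i]) for i in range(256)}    # start with single bytes
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):                   # the not-yet-known case
            entry = prev + prev[:1]
        else:
            raise ValueError("bad LZW code")
        out.append(entry)
        table[len(table)] = prev + entry[:1]       # mirror the encoder's new entry
        prev = entry
    return b"".join(out)

print(lzw_decode([97, 98, 256, 258, 98]))          # b'abababab'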

The Huffman algorithm is easy to understand: very theoretical, not a spark of imagination, just statistics.
JPEG is a funny thing: an algorithm on top of an algorithm on top of another.. It's difficult to put it all together, and it's harder to guess why all that complication is needed. Because it was designed by a committee? :???:

Some day I will try progressive JPEGs; they add a new layer of complexity, more fun! I need to master EMS first.. :-?


Bzip sounds interesting. Where can I find something about it? :bounce:
Antoni
Reply
#13
The home page for bzip is at:
http://sources.redhat.com/bzip2/#what-is-bzip2
You can get the freely available source there, which is fully ANSI compliant, so it should compile on any platform. Check the further reading section in the help page for information regarding the algorithm. The author comments that bzip doesn't actually present any new ideas in compression, but is more of an engineering of old ideas. I think you would have to use the source code in conjunction with the cited papers to get a full grasp of how the compression method works. I haven't actually looked at it myself because I haven't had the time, with so many other projects and issues in real life™.
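
From the docs and the cited papers, bzip2 chains run-length coding, the Burrows-Wheeler transform, a move-to-front pass and Huffman coding. A toy Python sketch of the two middle stages (my own illustration; the naive rotation sort below is far too slow for real files):
Code:
# Toy sketch of the two stages at the heart of bzip2: Burrows-Wheeler
# transform followed by move-to-front coding.
def bwt(data: bytes):
    rotations = sorted(data[i:] + data[:i] for i in range(len(data)))
    last_column = bytes(rot[-1] for rot in rotations)
    return last_column, rotations.index(data)     # index needed to invert

def move_to_front(data: bytes):
    alphabet = list(range(256))
    out = []
    for b in data:
        i = alphabet.index(b)                     # recently seen bytes -> small numbers
        out.append(i)
        alphabet.insert(0, alphabet.pop(i))
    return out

text = b"banana_bandana_banana"
last, idx = bwt(text)
print(last, idx)
print(move_to_front(last))                        # lots of small values and zero runs
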
Jesus saves.... Passes to Moses, shoots, he scores!
Reply
#14
I have worked out a genial but bad way of compressing files: more or less, a file is made of ASCII chars, which are values from 0-255. If you then could make a mathematical algorithm to decide what ASCII number the character in the nth position is, something like:
Code:
asciiChar=(filePosition*52)^filePosition/LotsOfOtherMathStuff

If you know what I mean

...But the disadvantage of it is that the equation tends to be bigger than the initial file...

but it would be quite nice :wink: ...
Reply
#15
Yep, I thought of the same thing before. A minimal equation to represent your file. That should compress it to the theoretical maximum.

But how does one get such an equation?

Plot the ASCII characters on a graph with X from 0 to LOF (the length of the file) and Y from 0 to 255..

You could use a genetic algorithm to find approximate matches.. but if the approximate match corrupts the whole file, what then?

It has to be perfect.

I suppose you could start with a bunch of *exact* (no decimals for ASCII chars!!) sine/cosine curve approximations... Then you could use some exotic logic system to shorten your N equations to M equations, where M is the theoretical maximum.

You could start by enveloping most of the curves as determined by a density analysis into one sine/cosine curve and then using another analysis make another one, and so forth...

The COS/SIN format can be provably the smallest in existence for the current data when you can't compress the actual SIN/COS data anymore!!!!
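
Just to toy with the idea, here's a rough Python sketch (mine, nothing standard): represent a short byte sequence as a sum of cosines via a 1-D DCT, then reconstruct and round back to whole ASCII values. Keeping every term gives an exact, "no decimals" reconstruction, but then there are as many coefficients as bytes, which is exactly the size problem mentioned above.
Code:
# Represent bytes as a sum of cosines (DCT-II), reconstruct, round to ints.
import math

def dct(xs):
    n = len(xs)
    return [sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, x in enumerate(xs)) for k in range(n)]

def idct(cs):
    n = len(cs)
    return [(cs[0] / 2 + sum(cs[k] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                             for k in range(1, n))) * 2 / n
            for i in range(n)]

data = list(b"HELLO WORLD!")
coeffs = dct(data)
print([round(v) for v in idct(coeffs)] == data)   # True: all terms -> perfect
truncated = coeffs[:4] + [0.0] * (len(coeffs) - 4)
print([round(v) for v in idct(truncated)])        # only a rough approximation
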
"Peace cannot be obtained without war. Why? If there is already peace, it is unnecessary for war. If there is no peace, there is already war."

Visit www.neobasic.net to see rubbish in all its finest.
Reply
#16
Just found a link to a series of articles on compression
http://www.rebolforces.com/articles/compression1.html
it seems interesting and well explained.
Antoni
Reply
#17
PNG is based somewhat on zip compression, right? Anyway, that's something to look up when loading up PNG; the whole format is open source and comes with libs and stuff, so I guess it's a good start.

The reason I assume PNG uses some form of zip compression is because, well, it needs zlib to run.

And I think zip is LZW, right? And UPX uses a slightly different form of LZW.
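
As far as I know, zip and PNG both use DEFLATE (LZ77 plus Huffman) rather than LZW, and the zlib dependency is exactly that DEFLATE stream: PNG's filtered pixel data lives in IDAT chunks as one zlib stream. A small Python sketch that pulls the IDAT chunks out of a file and inflates them ("image.png" is just a placeholder path):
Code:
# Walk a PNG's chunk list and inflate the concatenated IDAT data with zlib.
import struct
import zlib

def png_idat_bytes(path):
    with open(path, "rb") as f:
        assert f.read(8) == b"\x89PNG\r\n\x1a\n"    # PNG signature
        idat = b""
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            length, ctype = struct.unpack(">I4s", header)
            payload = f.read(length)
            f.read(4)                               # skip the chunk CRC
            if ctype == b"IDAT":
                idat += payload
            if ctype == b"IEND":
                break
    return zlib.decompress(idat)                    # still PNG-filtered scanlines

raw = png_idat_bytes("image.png")
print(len(raw), "bytes of filtered scanline data")
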
Hard Rock
[The Stars Dev Company] [Metal Qb flopped] [The Terror]
Stop Double Posts!
What's better? HTML or Variables?
Reply
#18
Red_Marvin, your approach may actually work if you try to create a mathematical formula for blocks of data rather than the entire file. For example, if a sequence of bytes in the file were a, b, c, d, e, f, g, ... then you could create a formula for that block and store a block lookup table at the beginning of the compressed file.

The problem is that not all data can be represented by a mathematical formula, completely random data being the prime example, and that it would be computationally expensive for a compression algorithm to try to derive formulas from data: how would the machine know how big to make the blocks, or find the best arrangement?

Most good compression algorithms (e.g. JPEG and bzip) are hybrids of several compression techniques. A good compression algorithm should also yield good results for most types of data; an algorithm that compresses some data very well but makes random data larger is not ideal.
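
That last point is easy to check: by a counting argument no lossless compressor can shrink every input, and already-random data typically comes out slightly larger because of container overhead. A quick test using Python's zlib as a stand-in general-purpose compressor:
Code:
# Random data does not compress; the stream overhead makes it slightly bigger.
import os
import zlib

random_data = os.urandom(100_000)
text_like = (b"the quick brown fox jumps over the lazy dog " * 2500)[:100_000]

print("random:", len(zlib.compress(random_data, 9)))   # a bit over 100000
print("text:  ", len(zlib.compress(text_like, 9)))     # a tiny fraction of 100000
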
Jesus saves.... Passes to Moses, shoots, he scores!
Reply

