Perfect Antivirus
Perfect Compressor


This article is my attempt to collect and list interesting “proofs” that demonstrate why certain types of “perfect” software tools are impossible to create.

I plan to add more when I can recollect them or if I come across more such demonstrations.

For each of the tools, I first define what is meant by a perfect such tool and then demonstrate why it is not possible to create it. Wherever possible, I have tried to give credit to the author or forum where I first came across the proof.

Perfect Antivirus

Definition: A perfect antivirus would be a program that can reliably detect all computer virii and can remove them from infected programs - or can at least quarantine infected programs.

Notes: We will show that it is not possible in the first place to write a program that can detect all computer virii with 100% accuracy. I came across this proof in a book by Fred Cohen, the pioneer researcher on computer virii.

Proof: Let us suppose that there indeed exists a program that can tell us with 100% accuracy whether a given program is a virus or not. It is easy to see that we can write this program as a procedure (or write a wrapper procedure around the program) that returns true if a given program is a virus, false otherwise. Let us name this procedure isVirus( ).

Now we can write a program that uses this procedure on itself in the following manner:

if( isVirus( self))
/* Behave like a normal program */
/* Behave like a virus */

It can easily be seen that our program always behaves exactly opposite to what the virus detector says is its behaviour. So the virus detector is always wrong about our program and is not 100% accurate after all!

Thus there can never be a program that can always correctly determine if a given program is a virus or not.

Perfect Compressor

Definition: A perfect compressor would be a program that can always compress any given data file to a file with a smaller size, such that a corresponding decompressor will always be able to decompress the compressed file to exactly the same data that was present in the original file without using any extra information.

Notes: We will use the “pigeon hole” principle to demonstrate the impossibility of such a tool. I do not know who came up with this proof first, but it is seen frequently on the comp.compression newsgroup.

Proof: A compressor will have to compress a given sequence of n bytes to a sequence of at most (n-1) bytes to be called a “compressor” and will have to always do it for any such sequence to be even considered for being called “perfect”.

Let us suppose that such a tool exists.

Now each byte can store 256 different values, so there are 256n possible input sequences of n bytes. In other words, there are

256n + 256(n-1) + 256(n-2) + ... + 2561

possible non-empty input sequences of n bytes or less for our compressor.

Since the compressor can output sequences of only (n-1) bytes or less, it can only output

256(n-1) + 256(n-2) + ... + 2561

possible non-empty sequences of bytes of compressed data.

Clearly there just isn't enough room to fit the compressed output of the possible set of input data - we have more pigeons than we have pigeon holes to put them in.

In other words, it is just not possible to build a perfect compressor as defined here.

Note that we didn't even have to consider the additional restriction that n, (n-1), etc. bytes have to be respectively compressed to (n-1), (n-2), etc. or less bytes.