"What I tell you three times is true." (The Bellman, in The Hunting of the Snark)
Context is roughly the same thing as a language game, which is roughly the same thing as background knowledge, prior agreement, protocol, and so on.
Shannon's context is information transfer: I post, you read. We are used to a faithfully exact transfer, in which what you read is exactly what I wrote, complete with typos. In this context we are used to the elimination of noise, and that is achieved by applying Shannon's theory.
But if one downloads a large app, one generally 'verifies' it, because there is always some noise and thus the possibility of a 'wrong' bit. Verification uses redundancy to (hopefully) eliminate noise. And the Bellman does the same thing. Saying everything three times is very redundant, but it reduces 'noise' if one compares the three versions and, wherever they disagree, goes with the majority. A checksum does part of the job at much less informational cost: if the checksum matches, the probability of error is vanishingly small, but if it fails to match, there is no way to correct the error and one must begin again.
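To make the contrast concrete, here is a minimal sketch in Python of the Bellman's method next to a checksum. The message, the three damaged copies and the helper name `majority_vote` are made up for illustration, and CRC-32 is simply one convenient checksum from the standard library, not necessarily what any particular downloader uses.

```python
import zlib
from collections import Counter

def majority_vote(copies):
    """Rebuild a message from equal-length noisy copies, one position at a time."""
    return "".join(Counter(chars).most_common(1)[0][0] for chars in zip(*copies))

message = "the snark was a boojum"

# Three transmissions of the same message, each damaged in a different place.
copies = [
    "the snark wxs a boojum",
    "thq snark was a boojum",
    "the snark was a boojun",
]

# Saying it three times: the two good copies outvote the bad one at each position.
repaired = majority_vote(copies)
print(repaired == message)

# A CRC-32 checksum adds only four bytes, but it can only detect the damage.
sent_crc = zlib.crc32(message.encode())
for copy in copies:
    print(repr(copy), "checksum matches:", zlib.crc32(copy.encode()) == sent_crc)
```

The triple copy costs three times the bandwidth and repairs the message on the spot. The checksum costs almost nothing, but all it can say is "something is wrong, send it again".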
Good writing tends to follow the Bellman's method: the introduction sets out what is to be said, the body of the piece says it, and the conclusion says what has been said. 'Redundancy' is something of a misnomer, because it helps reduce misunderstanding.
So now imagine, as an analogy to the rain in the Sahara, an array of 100 × 100 pixels, each black or white, 0 or 1.
There are 2^10,000 possible pictures. That is a large number, with just over 3,000 decimal digits. But now consider the Saharan version, where we know that almost all the pixels are black, or 0. Obviously, one does not bother to specify all the dry days in the Sahara; one gives the date of the exceptional wet day and says, "else dry".
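As a rough sketch of the "else dry" idea, the Python below counts the possible pictures and then stores a Saharan picture as nothing more than a short list of exceptions. The three 'wet' pixel positions are invented for the example, and the cost of 14 bits per exception assumes each position is written as a plain fixed-width index, which is only one of many possible encodings.

```python
import math

WIDTH = HEIGHT = 100
PIXELS = WIDTH * HEIGHT

# How many distinct black/white pictures are there, and how long is that number?
n_pictures = 2 ** PIXELS
print("decimal digits in 2^10,000:", len(str(n_pictures)))

# Saharan picture: everything is 0 ("dry") except a few listed exceptions.
wet = {(12, 40), (57, 3), (99, 99)}  # hypothetical (row, column) positions

def decode(exceptions):
    """Expand 'these pixels are 1, else 0' back into the full picture."""
    return [[1 if (r, c) in exceptions else 0 for c in range(WIDTH)]
            for r in range(HEIGHT)]

picture = decode(wet)

# Cost of the two descriptions, in bits.
raw_bits = PIXELS                                   # one bit per pixel
bits_per_position = math.ceil(math.log2(PIXELS))    # enough to index any pixel
sparse_bits = len(wet) * bits_per_position

print("raw description:   ", raw_bits, "bits")
print("exceptions only:   ", sparse_bits, "bits")
```

The saving depends entirely on the prior agreement that "unlisted means dry". Without that shared context the short list of exceptions means nothing.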
In the same way, any regularity that might predominate (stripes, chequerboards or whatever) can be specified and then any deviations noted. This is called compression, and it is the elimination of redundancy. The elimination of redundancy is equivalent to the elimination of order. Maximum compression results in a message that is maximally disordered and thus looks exactly like noise.
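One can watch this happen with an ordinary compressor. The sketch below assumes one byte per pixel for simplicity, uses Python's `zlib` as a stand-in for "maximum compression" (it is a practical compressor, not an ideal one), and fixes a random seed only so that the run is repeatable. The helper `byte_entropy` is my own name for the usual empirical entropy of the byte values.

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy(data):
    """Empirical Shannon entropy of a byte string, in bits per byte (0 to 8)."""
    total = len(data)
    return sum(n / total * math.log2(total / n) for n in Counter(data).values())

rng = random.Random(0)   # fixed seed, purely for repeatability
N = 100 * 100            # one byte per pixel

stripes = bytes([0, 1]) * (N // 2)           # perfectly regular picture

sahara_pixels = bytearray(N)                 # all dry ...
for i in rng.sample(range(N), 100):
    sahara_pixels[i] = 1                     # ... except 100 wet pixels
sahara = bytes(sahara_pixels)

noise = bytes(rng.getrandbits(8) for _ in range(N))   # no regularity at all

sizes = {}
for name, data in [("stripes", stripes), ("sahara", sahara), ("noise", noise)]:
    packed = zlib.compress(data, 9)
    sizes[name] = len(packed)
    print(f"{name:8s} {len(data):6d} -> {len(packed):6d} bytes, "
          f"entropy {byte_entropy(data):.2f} -> {byte_entropy(packed):.2f} bits/byte")
```

The ordered pictures shrink to a small fraction of their size, and what is left has a far more even spread of byte values than the original. The noise does not shrink at all, because there is no order left in it to remove.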
This then explains the rather counter-intuitive finding that disordered systems contain more information than ordered systems, in the sense that they take more bits to describe, and thus that rising entropy reduces available energy but increases information. One proceeds from the simple 'sun hot, everything else cold' to the much more complex, but essentially dull, 'lukewarm heat death of everything'.
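For the pixel picture, the claim can be put in numbers with Shannon's entropy formula for a two-valued source, H(p) = -p log2(p) - (1 - p) log2(1 - p). The sketch below assumes each pixel is white independently with probability p, which is my simplification and not something the picture itself guarantees.

```python
import math

def binary_entropy(p):
    """Shannon entropy, in bits, of one pixel that is white with probability p."""
    if p in (0.0, 1.0):
        return 0.0   # a certain outcome carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

PIXELS = 100 * 100

for p in (0.0, 0.01, 0.1, 0.5):
    bits = binary_entropy(p) * PIXELS
    print(f"p = {p:<4}  about {bits:6.0f} bits for the whole picture")
```

The all-black picture needs no bits at all, the Saharan picture with one white pixel in a hundred needs roughly 800, and the fully disordered picture with p = 0.5 needs the full 10,000. The most disordered case is the one that takes the most telling.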