Objectives of Information Theory.
Measure information: how much information about A is contained in B?
Make messages as short as possible, yet resistant to transmission errors.
Understand randomness.
Observation. Any 1:1 mapping F from the natural numbers to words over an alphabet of size r satisfies |F(n)| > log_r n for infinitely many n.
From this observation one can infer, among other things, that there are infinitely many prime numbers.
Codes, prefix-free codes (also called instantaneous). Relation to the 20-questions game: minimizing the average number of questions.
Kraft inequality: a necessary and sufficient condition for a tuple of lengths to be the code-word lengths of some prefix-free code.
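A minimal sketch in Python (not from the lecture; the function names are my own) that checks Kraft's inequality for a tuple of proposed lengths and, when it holds, builds a prefix-free code with exactly those lengths, i.e. the "sufficient" half of the statement above.

```python
from math import floor

def kraft_sum(lengths, r=2):
    """Sum of r^(-l) over the proposed code-word lengths."""
    return sum(r ** (-l) for l in lengths)

def prefix_code(lengths, r=2):
    """Build a prefix-free code with the given lengths, assuming the Kraft sum is <= 1.
    Standard construction: process lengths in increasing order and carve out
    consecutive subintervals of [0, 1), writing each left end in base r."""
    assert kraft_sum(lengths, r) <= 1, "Kraft's inequality fails: no prefix-free code exists"
    code = {}
    acc = 0.0  # left end of the next free interval in [0, 1)
    for i, l in sorted(enumerate(lengths), key=lambda pair: pair[1]):
        word, x = "", acc
        for _ in range(l):          # first l base-r digits of acc
            x *= r
            d = floor(x)
            word += str(d)
            x -= d
        code[i] = word
        acc += r ** (-l)
    return code

print(prefix_code([1, 2, 3, 3]))    # {0: '0', 1: '10', 2: '110', 3: '111'}
```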
10 October 2018
McMillan inequality: the Kraft inequality holds for arbitrary (finite) uniquely decodable codes.
Reminder: convex functions, the second-derivative criterion, Jensen's inequality, convexity of x log x.
Gibbs' inequality -- the Golden Lemma and its consequences. Roughly: given a probability distribution (p_i) on a finite set S, among all tuples of numbers (n_i) satisfying Kraft's inequality, the weighted sum of the p_i n_i is minimized for n_i = -log p_i.
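A small numerical illustration (my own sketch, not part of the notes): for the distribution p below, the choice n_i = -log2 p_i attains the weighted sum H(p) = 1.75, while any other admissible choice, here n_i = -log2 q_i for a different distribution q, does worse, which is exactly Gibbs' inequality.

```python
from math import log2

def average_length(p, n):
    """Weighted sum of p_i * n_i (the n_i need not be integers here)."""
    return sum(pi * ni for pi, ni in zip(p, n))

p = [0.5, 0.25, 0.125, 0.125]
optimal = [-log2(pi) for pi in p]       # n_i = -log2 p_i; Kraft sum equals 1
print(average_length(p, optimal))       # 1.75 = H(p)

q = [0.25, 0.25, 0.25, 0.25]            # any other distribution q
other = [-log2(qi) for qi in q]         # also satisfies Kraft's inequality
print(average_length(p, other))         # 2.0 >= 1.75 (Gibbs' inequality)
```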
Shannon's entropy H(S) of a finite probability space (always between 0 and log |S|).
Relation between entropy and the minimal average length of a code.
17 October 2018
The minimal average length of a (binary) code of objects in S is between the entropy H(S) and H(S) + 1.
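A quick check of both bounds (a sketch under the assumption that the lecture used the Shannon code lengths n_i = ceil(-log2 p_i); the distribution is arbitrary):

```python
from math import ceil, log2

def entropy(p):
    return -sum(pi * log2(pi) for pi in p if pi > 0)

p = [0.4, 0.3, 0.2, 0.1]
lengths = [ceil(-log2(pi)) for pi in p]            # Shannon code lengths
print(sum(2 ** (-l) for l in lengths) <= 1)        # True: Kraft's inequality holds
average = sum(pi * l for pi, l in zip(p, lengths))
print(entropy(p), average)                         # H(S) <= average < H(S) + 1
```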
Coding blocks of n symbols -- instead of single symbols -- brings the average code length per symbol closer to the entropy. Shannon's Source Coding Theorem.
The proof uses the observation that H(A x B) = H(A) + H(B); consequently H(S^n) = n H(S).
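A sanity check of the additivity used in the proof (my own sketch): the entropy of n independent copies of a distribution equals n times the entropy of one copy.

```python
from itertools import product
from math import log2

def entropy(p):
    return -sum(x * log2(x) for x in p if x > 0)

p = [0.5, 0.3, 0.2]
blocks = [a * b * c for a, b, c in product(p, repeat=3)]   # distribution on S^3
print(entropy(blocks), 3 * entropy(p))                     # equal: H(S^3) = 3 H(S)
```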
Entropy of a random variable, H(X).
24 October 2018
Conditional entropy H(A | B).
Entropy of a pair of variables, H(A,B), is at most H(A) + H(B), with equality iff A and B are independent.
Mutual information of random variables A and B: I(A;B) = H(A) + H(B) - H(A,B); always non-negative and equal to 0 iff A and B are independent.
Chain rule: H(A,B) = H(A | B) + H(B) = H(B | A) + H(A). Consequently I(A;B) = H(A) - H(A | B) = H(B) - H(B | A).
Conditional chain rule: H(A,B | C) = H(A | B,C) + H(B | C).
Conditional mutual information: I(A;B | C) = H(A | C) + H(B | C) - H(A,B | C); always non-negative and equal to 0 iff A and B are conditionally independent given C.
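A sketch (not from the lecture) that computes these quantities from a toy joint distribution, using the chain rule H(A|B) = H(A,B) - H(B) and the formula for I(A;B); the particular numbers are chosen only for illustration.

```python
from math import log2

def H(dist):
    """Entropy of a distribution given as a dict {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# A toy joint distribution of the pair (A, B).
joint = {('a', 0): 0.3, ('a', 1): 0.2, ('b', 0): 0.1, ('b', 1): 0.4}

A, B = {}, {}
for (a, b), p in joint.items():
    A[a] = A.get(a, 0) + p
    B[b] = B.get(b, 0) + p

H_A_given_B = H(joint) - H(B)        # chain rule: H(A,B) = H(A|B) + H(B)
I_AB = H(A) + H(B) - H(joint)        # mutual information, always >= 0
print(H_A_given_B, I_AB)
```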
Mutual information of three variables: R(A;B;C) = I(A;B) - I(A;B | C); it can be negative! (Example -- see the sketch below.)
Useful visualisation: Venn diagram.
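The example showing that R(A;B;C) can be negative is presumably the standard XOR one; here is a sketch of it: A and B are independent fair bits and C = A xor B, so I(A;B) = 0 while I(A;B|C) = 1, hence R(A;B;C) = -1.

```python
from math import log2

def H(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    """Marginal distribution of the coordinates listed in idx."""
    out = {}
    for key, p in joint.items():
        k = tuple(key[i] for i in idx)
        out[k] = out.get(k, 0) + p
    return out

# A, B independent fair bits, C = A xor B: each reachable triple has probability 1/4.
joint = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

A, B, C = (marginal(joint, (i,)) for i in (0, 1, 2))
AB, AC, BC = marginal(joint, (0, 1)), marginal(joint, (0, 2)), marginal(joint, (1, 2))

I_AB = H(A) + H(B) - H(AB)                       # = 0: A and B are independent
I_AB_given_C = H(AC) + H(BC) - H(C) - H(joint)   # = 1
print(I_AB - I_AB_given_C)                       # R(A;B;C) = -1
```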
31 October 2018
Data processing inequality: I(A;f(B)) is not greater than I(A;B).
Perfect cryptosystem: M (messages), K (keys), C (cipher-texts), such that I(M;C) = 0. Example: the one-time pad.
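A minimal sketch of the one-time pad in Python (my own illustration): encryption XORs the message with a fresh uniformly random key of the same length, and the same operation decrypts.

```python
import secrets

def one_time_pad(data: bytes, key: bytes) -> bytes:
    """XOR with the key; applying it twice with the same key gives the data back."""
    assert len(key) == len(data)
    return bytes(d ^ k for d, k in zip(data, key))

message = b"attack at dawn"
key = secrets.token_bytes(len(message))   # uniform key, as long as the message, never reused
cipher = one_time_pad(message, key)
print(one_time_pad(cipher, key))          # b'attack at dawn'
```

With a uniform, single-use key the cipher-text is uniformly distributed whatever the message is, which is why I(M;C) = 0.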
Shannon's pessimistic theorem: in a perfect cryptosystem the key must carry at least as much entropy as the messages (H(K) >= H(M)), so keys must essentially be as long as messages.
Information channel, input-output pair, channel capacity.
Definition and intuitions. Examples: the faithful channel (and its inverse), the noisy typewriter.
7 November 2018
Capacity of the binary symmetric channel: 1 - H(p), where p is the crossover probability and H is the binary entropy function.
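A two-line computation of this formula (a sketch; the crossover probabilities below are arbitrary):

```python
from math import log2

def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """Capacity of the binary symmetric channel with crossover probability p."""
    return 1 - binary_entropy(p)

for p in (0.0, 0.1, 0.5):
    print(p, bsc_capacity(p))   # 1.0, about 0.531, 0.0
```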
Decision rules: decoding the input from the output.
Quality of a decision rule Delta measured by the probability of correct decoding: Pr_C(Delta, A) = Pr(Delta(B) = A).
The ideal observer rule (if we know the distribution of A) and the maximum likelihood rule (otherwise). Average optimality of the maximum likelihood rule.
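A toy comparison of the two rules (my own sketch; the channel matrix and the prior are invented for illustration): with a skewed prior the two rules can decode the same output differently.

```python
# Toy channel: channel[a][b] = p(b | a).
channel = {
    'a0': {'b0': 0.9, 'b1': 0.1},
    'a1': {'b0': 0.4, 'b1': 0.6},
}
prior = {'a0': 0.2, 'a1': 0.8}   # distribution of the input A, if it is known

def maximum_likelihood(b):
    """Pick the input a maximizing p(b | a); needs no knowledge of the prior."""
    return max(channel, key=lambda a: channel[a][b])

def ideal_observer(b):
    """Pick the input a maximizing p(a | b), i.e. maximizing p(a) * p(b | a)."""
    return max(channel, key=lambda a: prior[a] * channel[a][b])

print(maximum_likelihood('b0'), ideal_observer('b0'))   # a0 vs a1
```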
Multiple uses of the channel.
Independence of symbols principle: p(b_1...b_k | a_1...a_k) = p(b_1|a_1) * ... * p(b_k|a_k).
Examples showing that neither independence of the A's nor independence of the B's guarantees this property.
14 November 2018
Characterization of independence of symbols in terms of memorylessness and absence of feedback.
Improving the reliability of the channel by repetition. Weak tradeoff: error tends to 0, but the transmission rate tends to 0 as well.
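A sketch quantifying the tradeoff for a binary symmetric channel with error probability p = 0.1 (my own numbers): sending each bit n times and decoding by majority vote drives the residual error to 0, but the rate 1/n goes to 0 as well.

```python
from math import comb

def majority_error(p, n):
    """Probability that majority decoding of n repetitions over a BSC(p) fails (n odd):
    more than half of the copies are flipped."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range((n + 1) // 2, n + 1))

p = 0.1
for n in (1, 3, 5, 11):
    print(n, 1 / n, majority_error(p, n))   # rate 1/n -> 0 and error -> 0
```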
Transmission algorithm (encoding, transmission, decoding).
Why the worst case is when the input to the channel is uniformly distributed.
21 November 2018
Transmission rate of a code. Example: how to achieve zero-error for noisy typewriter.
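A sketch of the zero-error strategy, assuming the standard 26-letter noisy typewriter in which each letter is received either as itself or as the next letter (cyclically), with probability 1/2 each: using only every second letter, no two code letters can produce the same output, so decoding is error-free and the rate is log2(13) bits per channel use.

```python
import random
import string

letters = string.ascii_lowercase       # the 26-letter noisy typewriter
codebook = letters[::2]                # use only every second letter: 13 of them

def channel(c):
    """Each input is received as itself or as the next letter, with probability 1/2."""
    i = letters.index(c)
    return letters[(i + random.choice((0, 1))) % 26]

def decode(b):
    """Outputs of distinct code letters never overlap, so decoding is exact."""
    i = letters.index(b)
    return letters[i if i % 2 == 0 else i - 1]

message = "acey"                       # built from code letters only
received = "".join(channel(c) for c in message)
print("".join(decode(b) for b in received))   # always 'acey'
```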
The transmission rate of a zero-error code is upper-bounded by the channel capacity.
The Shannon Channel Coding Theorem: we can achieve arbitrarily small error with the transmission rate arbitrarily close to the channel capacity.
An overview of the proof: a geometric intuition and the use of the Weak Law of Large Numbers.
28 November 2018
The Shannon Channel Coding Theorem: end of the proof.
The existence of the desired codes is shown by the probabilistic method.
12 December 2018
Codes detecting/correcting a fixed number of errors. Relation to the minimal distance between the code words.
Idea of check bits. Example: PESEL (the check digit in the Polish national identification number).
Hamming codes correcting one bit error and their optimality. Construction of the Hamming (7,4) code and, more generally, the (2^k - 1, 2^k - k - 1) codes.
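A compact sketch of the Hamming (7,4) code (my own implementation of the standard construction): parity bits sit at positions 1, 2 and 4, and the syndrome computed by the decoder is exactly the position of a single flipped bit.

```python
import random

def encode(d):
    """d = [d1, d2, d3, d4] -> 7-bit code word with parity bits at positions 1, 2, 4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4        # checks positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4        # checks positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4        # checks positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(w):
    """The syndrome (read in binary) is the 1-based position of the flipped bit, 0 if none."""
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]     # positions 1, 3, 5, 7
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]     # positions 2, 3, 6, 7
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]     # positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        w = w.copy()
        w[syndrome - 1] ^= 1           # correct the single error
    return [w[2], w[4], w[5], w[6]]    # recover the data bits

data = [1, 0, 1, 1]
word = encode(data)
word[random.randrange(7)] ^= 1         # flip one bit at random
print(decode(word) == data)            # True: any single error is corrected
```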
19 December 2018
Kolmogorov's algorithmic complexity. Intuition: the length of a shortest program generating x.
Definition: C_U(x) = the length of a shortest string y such that U(y) = x, for a fixed universal Turing machine U.
Invariance: the choice of the actual universal machine is inessential, up to an additive constant.
Infinitely many strings x have complexity at least |x| (by counting: there are 2^n strings of length n but only 2^n - 1 programs of length less than n).
(Intuitively: for such x the shortest program generating x is essentially write(x).)
Kolmogorov viewed such strings as random.
The function x |--> C_U(x) is uncomputable.
An alternative definition: the algorithmic prefix complexity K_U, defined with a universal prefix Turing machine.
9 January 2019
The Chaitin Omega number and its properties.
A Turing machine with Omega as an oracle can solve the halting problem: the first n bits of Omega suffice to decide the halting of all programs of length at most n.
Prefixes of Omega are incompressible.
Probabilistic interpretation of Chaitin's constant: the probability that a random (prefix) program halts.
Algorithmic probability p_U of a string y: the probability that a random program outputs y.
Connection with Shannon's entropy: -log p_U(y) equals K_U(y) up to an additive constant.
16 January 2019
Connection with Shannon's entropy, continued: -log p_U(y) equals K_U(y) up to an additive constant.
The proof is based on a simultaneous approximation of all the numbers p_U(y), representing these numbers as intervals I_{t,y} on the real line, and selecting the left end of a maximal binary interval included in I_{t,y} as a description of y.
Application of Kolmogorov complexity in an alternative proof of the Goedel incompleteness theorem in mathematical logic.
23 January 2019
Randomness as lack of regularity. The concept of a test. When is one test better than another?
A universal Martin-Lof test, majorizing all (partially computable) tests.
Entropy of English: a brief overview of two approaches, via a guessing game and via gambling.
End of the course.
Note. The title of the site is borrowed from a song by Tom Paxton.