Objectives of Information Theory. How much information about A is in B?
Make a message as short as possible, yet resistant to transmission errors.
Notation. Any 1:1 mapping F from natural numbers to words over an alphabet of size r satisfies |F(n)| > log_r n for infinitely many n.
From this observation one can infer, among other things, that there are infinitely many prime numbers.
Codes, prefix-free codes (also called instantaneous). Relation to the 20-question game: minimizing the average number of questions.
Kraft inequality.
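A small Python sketch (mine, not from the lecture): checking Kraft's inequality for a tuple of codeword lengths and, when it holds, assembling a prefix-free code with exactly those lengths via the usual canonical construction. The function names are made up.

    def kraft_sum(lengths, r=2):
        """Sum of r^(-l) over the codeword lengths."""
        return sum(r ** (-l) for l in lengths)

    def canonical_prefix_code(lengths, r=2):
        """Build a prefix-free code with the given lengths (shorter words first),
        assuming Kraft's inequality holds."""
        assert kraft_sum(lengths, r) <= 1, "Kraft's inequality violated"
        code = {}
        next_word, prev_len = 0, 0             # next free codeword, read as a base-r integer
        for i, l in sorted(enumerate(lengths), key=lambda t: t[1]):
            next_word *= r ** (l - prev_len)   # pad with zeros to the new length
            w, digits = next_word, []
            for _ in range(l):
                digits.append(str(w % r))
                w //= r
            code[i] = "".join(reversed(digits))
            next_word, prev_len = next_word + 1, l
        return code

    print(kraft_sum([1, 2, 3, 3]))             # 1.0, so a prefix code with these lengths exists
    print(canonical_prefix_code([1, 2, 3, 3])) # {0: '0', 1: '10', 2: '110', 3: '111'}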
11 October 2017
Kraft inequality, continued.
McMillan inequality for arbitrary codes.
Strategies in 20-question games -- relation to codes.
How to minimize the number of questions?
Reminder: convex functions, criterion of the second derivative.
18 October 2017
Reminder continued: Jensen inequality, convexity of x log x.
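For reference, the inequality in the form used later (a standard statement, my phrasing): for a convex function f, weights p_i >= 0 summing to 1, and points x_i,

    \[ f\Big(\sum_i p_i x_i\Big) \;\le\; \sum_i p_i f(x_i), \]

with the reverse inequality for concave f. Convexity of x log x follows from the second-derivative criterion, since (x ln x)'' = 1/x > 0 for x > 0.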
Gibbs' inequality -- Golden lemma and its consequences. Roughly: given a probability
distribution (p_i) on a finite set S, among all tuples of numbers n_i satisfying
Kraft's inequality, the weighted sum of the p_i n_i is minimized for n_i = -log p_i.
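In symbols (my paraphrase, with logarithms taken to the base r, the size of the code alphabet, and ignoring the integrality of the n_i):

    \[ \min\Big\{ \sum_i p_i n_i \;:\; \sum_i r^{-n_i} \le 1 \Big\} \;=\; \sum_i p_i \log_r \frac{1}{p_i}, \qquad \text{attained at } n_i = -\log_r p_i. \]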
Shannon's entropy of a finite probabilistic space.
Relation between entropy and the minimal average length of a code.
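A minimal Python illustration (my own toy numbers, not from the lecture): computing H(S) for a small distribution and comparing it with the average length of a prefix code whose lengths happen to match -log_2 p_i exactly.

    from math import log2

    def entropy(p):
        """Shannon entropy (in bits) of a probability vector."""
        return -sum(pi * log2(pi) for pi in p if pi > 0)

    p = [0.5, 0.25, 0.25]
    lengths = [1, 2, 2]                       # lengths of the prefix code 0, 10, 11
    avg_len = sum(pi * li for pi, li in zip(p, lengths))
    print(entropy(p), avg_len)                # both equal 1.5: this code is optimal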
25 October 2017
The minimal average length of a code for objects in S lies between the entropy H(S) and H(S) + 1.
Coding blocks -- instead of single symbols -- may help. Shannon's Source Coding Theorem.
The proof uses the observation that H(A x B) = H(A) + H(B); consequently H(S^n) = n H(S).
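A numerical sketch of why block coding helps (my own hypothetical source, not an example from the lecture): give each block w of n symbols a codeword of length ceil(-log_2 p(w)) -- these lengths satisfy Kraft's inequality -- and the cost per symbol, bounded by H(S) + 1/n, approaches H(S).

    from itertools import product
    from math import ceil, log2, prod

    p = {'a': 0.9, 'b': 0.1}                      # a biased memoryless source
    H = -sum(q * log2(q) for q in p.values())     # about 0.469 bits per symbol

    for n in (1, 2, 4, 8):
        avg = 0.0
        for block in product(p, repeat=n):
            q = prod(p[s] for s in block)         # probability of the block
            avg += q * ceil(-log2(q))             # Shannon-Fano length for the block
        print(n, round(avg / n, 3))               # per-symbol cost drops towards H

    print(round(H, 3))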
8 November 2017
Entropy of a random variable, H(X).
Entropy of a product variable, H(A,B), is at most H(A) + H(B), with equality iff A and B are independent.
Conditional entropy H(A|B). Chain rule: H(A,B) = H(A|B) + H(B).
Mutual information of random variables A and B: I(A;B) = H(A) + H(B) - H(A,B).
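A Python sketch (my own toy joint distribution; the names are made up) computing these quantities from a table of p(a, b).

    from math import log2

    def H(dist):
        """Entropy of a distribution given as a dict of probabilities."""
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    # a joint distribution of (A, B); the numbers are arbitrary
    joint = {('0', '0'): 0.4, ('0', '1'): 0.1,
             ('1', '0'): 0.1, ('1', '1'): 0.4}

    pA = {}
    pB = {}
    for (a, b), p in joint.items():
        pA[a] = pA.get(a, 0) + p
        pB[b] = pB.get(b, 0) + p

    H_AB, H_A, H_B = H(joint), H(pA), H(pB)
    print(H_A + H_B - H_AB)        # I(A;B) > 0: here A and B are not independent
    print(H_AB - H_B)              # H(A|B), by the chain rule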
15 November 2017
Conditional chain rule: H(A,B|C) = H(A|B,C) + H(B|C).
Conditional mutual information: I(A;B|C) = H(A|C) + H(B|C) - H(A,B|C) (always non-negative).
Triple mutual information: R(A;B;C) = I(A;B) - I(A;B|C); it can be negative (example).
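A standard example of negativity (quite possibly the one shown in the lecture): let A and B be independent fair bits and C = A XOR B. Then I(A;B) = 0, but given C each of A and B determines the other, so I(A;B|C) = 1 bit, and hence R(A;B;C) = 0 - 1 = -1.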
Data processing inequality: I(A;f(B)) is not greater than I(A;B).
Perfect cryptosystem: M (messages), K (keys), C (ciphertexts), such that
I(M;C) = 0. Example: the one-time pad.
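A toy one-time pad in Python (an illustrative sketch of my own, not the lecture's notation): with a uniformly random key as long as the message, the ciphertext distribution is uniform whatever the message is, so I(M;C) = 0.

    import secrets

    def otp_encrypt(message: bytes, key: bytes) -> bytes:
        """XOR the message with a one-time key of the same length."""
        assert len(key) == len(message)
        return bytes(m ^ k for m, k in zip(message, key))

    msg = b"attack at dawn"
    key = secrets.token_bytes(len(msg))        # fresh uniform key, used only once
    ct = otp_encrypt(msg, key)
    print(ct)
    print(otp_encrypt(ct, key))                # decryption: XOR with the same key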
Shannon's pessimistic theorem: in a perfect system the keys must be at least as long as the messages.
22 November 2017
Information channel, input-output pair, channel capacity.
Definition and intuitions. Examples: faithful channel (and inverse), noisy typewriter, binary symmetric channel.
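A small Python computation (mine) for the binary symmetric channel with crossover probability p, whose capacity is 1 - H(p) bits per use; the faithful channel is the special case p = 0.

    from math import log2

    def h2(p):
        """Binary entropy function."""
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    def bsc_capacity(p):
        """Capacity of the binary symmetric channel with crossover probability p."""
        return 1 - h2(p)

    for p in (0.0, 0.1, 0.5):
        print(p, bsc_capacity(p))      # 1.0, about 0.531, 0.0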
29 November 2017
Decision rule: decoding the input from the output.
Quality of a decision rule Delta in terms of the probability of correct decoding: Pr_C(Delta, A) = P(Delta(B) = A).
Ideal observer rule (if we know the distribution of A), maximum likelihood rule (otherwise). Average optimality of the maximum likelihood rule.
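A Python sketch of the two rules (toy channel and notation of my own): the ideal observer maximizes p(a) * p(b|a), maximum likelihood maximizes p(b|a) alone; with a skewed input distribution they can disagree.

    # channel p_b_given_a[a][b] and input distribution p_a (toy numbers, mine)
    p_b_given_a = {'0': {'0': 0.9, '1': 0.1},
                   '1': {'0': 0.2, '1': 0.8}}
    p_a = {'0': 0.95, '1': 0.05}

    def ideal_observer(b):
        """Decode b to the input maximizing the posterior, i.e. p(a) * p(b|a)."""
        return max(p_a, key=lambda a: p_a[a] * p_b_given_a[a][b])

    def max_likelihood(b):
        """Decode b to the input maximizing p(b|a) alone."""
        return max(p_b_given_a, key=lambda a: p_b_given_a[a][b])

    for b in ('0', '1'):
        print(b, ideal_observer(b), max_likelihood(b))   # the rules disagree on b = '1' here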
Multiple use of channel.
Independence of symbols principle: p(b_1...b_k | a_1...a_k) = p(b_1|a_1) * ... * p(b_k|a_k).
Examples showing that neither independence of the A's nor independence of the B's guarantees this property.
Characterization of independence of symbols in terms of memorylessness and absence of feedback.
6 December 2017
(lecture given by Szymon Torunczyk)
Multiple use of channel continued. Transmission algorithm, transmission rate.
Improving the reliability of the channel by repetition. Weak tradeoff: the error probability tends to 0, but so does the transmission rate.
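A quick Python check of this tradeoff (my own numbers): send each bit n times through a binary symmetric channel with crossover probability p and decode by majority vote; the residual error vanishes as n grows, but the rate 1/n vanishes with it.

    from math import comb

    def majority_error(p, n):
        """Probability that more than half of n independent transmissions flip."""
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

    p = 0.1
    for n in (1, 3, 5, 11, 21):
        print(n, 1 / n, majority_error(p, n))   # rate and residual error both shrink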
The Shannon Channel Coding Theorem. Intuition:
we can achieve arbitrarily small error with a transmission rate close to a positive constant, namely the channel capacity.
20 December 2017
The Shannon Channel Coding Theorem: proof.
10 January 2018
Weak converse to the Shannon Channel Coding Theorem. If the error is 0, then the transmission rate cannot exceed the channel capacity.
Codes detecting/correcting a fixed number of errors. Relation to the minimal distance between the codewords.
Idea of check bits. Example: the PESEL number (the Polish national identification number).
Hamming codes correcting one bit and their optimality. Construction of Hamming (7,4), and more generally (2^k-1, 2^k-k-1).
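A compact Python sketch of Hamming (7,4) (my own encoding of the standard construction, not necessarily the lecture's): parity bits sit at positions 1, 2 and 4, and the syndrome -- the XOR of the positions holding a 1 -- directly names the position of a single flipped bit.

    def hamming74_encode(d):
        """Encode 4 data bits into 7 bits; parity bits sit at positions 1, 2, 4."""
        c = [0] * 8                     # positions 1..7 used, index 0 ignored
        c[3], c[5], c[6], c[7] = d
        c[1] = c[3] ^ c[5] ^ c[7]       # covers positions with bit 0 set: 1, 3, 5, 7
        c[2] = c[3] ^ c[6] ^ c[7]       # covers positions with bit 1 set: 2, 3, 6, 7
        c[4] = c[5] ^ c[6] ^ c[7]       # covers positions with bit 2 set: 4, 5, 6, 7
        return c[1:]

    def hamming74_decode(word):
        """Correct at most one flipped bit and return the 4 data bits."""
        c = [0] + list(word)
        syndrome = 0
        for pos in range(1, 8):
            if c[pos]:
                syndrome ^= pos         # XOR of the positions holding a 1
        if syndrome:                    # a nonzero syndrome points at the flipped bit
            c[syndrome] ^= 1
        return [c[3], c[5], c[6], c[7]]

    code = hamming74_encode([1, 0, 1, 1])
    code[5] ^= 1                        # flip one bit during "transmission" (position 6)
    print(hamming74_decode(code))       # [1, 0, 1, 1] again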
Idea of Kolmogorov's algorithmic complexity. Intuition: the length of a shortest program generating x.
17 January 2018
Kolmogorov's algorithmic complexity cont'd.
Definition: C_U(x) = the length of a shortest string y such that U(y) = x, where U is a universal Turing machine.
Invariance: The choice of an actual universal machine is inessential, up to an additive constant.
There exist infinitely many strings x whose complexity is at least |x|; Kolmogorov called such strings random.
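The counting argument behind this (standard, spelled out): there are 2^n strings of length n, but only 2^0 + ... + 2^(n-1) = 2^n - 1 programs of length less than n, so for every n at least one string x of length n satisfies C_U(x) >= n = |x|.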
The function x |--> C_U(x) is uncomputable.
Universal prefix Turing machine and algorithmic prefix complexity K_U.
A Chaitin Omega number and its properties.
A Turing machine with Omega as an oracle can solve the halting problem in finite time.
24 January 2018
Chaitin's Omega number cont'd.
Prefixes of Omega are incompressible.
Probabilistic interpretation of Chaitin's constant: the probability that a random (prefix) program halts.
Algorithmic probability p_U of a string y: the probability that a random program generates y.
Connection with Shannon's entropy: the negative logarithm of p_U(y) equals K_U(y) up to an additive constant.
End of the course.
Note. The title of the site is borrowed from a song by Tom Paxton.