Objectives of Information Theory. How much information about A is in B?
Make a message as short as possible, but resistant to errors of transmission.
Notation. Any 1:1 mapping F from natural numbers to words over an alphabet of size r satisfies |F(n)| >= log_r n for infinitely many n.
From this observation one can infer, among other things, that there are infinitely many prime numbers.
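A small sketch of the counting behind this observation (the binary alphabet, r = 2, is only an illustrative choice): there are fewer words of length <= k than numbers up to r^(k+1), so an injective F must give some of those numbers words of length at least log_r n.

    # Count words of length <= k over an alphabet of size r versus numbers up to r^(k+1).
    # Since there are fewer short words than numbers, for every k some n <= r^(k+1)
    # must receive |F(n)| >= k + 1 >= log_r n.
    r = 2  # alphabet size (illustrative choice)
    for k in range(1, 11):
        words_up_to_k = (r ** (k + 1) - 1) // (r - 1)  # words of length <= k, incl. the empty word
        numbers = r ** (k + 1)                          # naturals 1 .. r^(k+1)
        print(k, words_up_to_k, numbers, words_up_to_k < numbers)
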
Codes, prefix-free codes (also called instantaneous). Relation to the 20-questions game: minimizing the average number of questions.
Kraft inequality.
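A minimal check of the Kraft inequality, sum over codewords of r^(-length) <= 1; the binary prefix-free code below is a made-up example.

    code = {"a": "0", "b": "10", "c": "110", "d": "111"}  # an example prefix-free binary code
    r = 2
    kraft_sum = sum(r ** -len(w) for w in code.values())
    print(kraft_sum)        # 0.5 + 0.25 + 0.125 + 0.125 = 1.0
    assert kraft_sum <= 1   # holds for every prefix-free code
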
12 October 2016
McMillan's inequality for arbitrary (uniquely decodable) codes.
Jensen inequality for convex functions. Convexity of x log x.
Golden lemma. Roughly: the sum of P(s) log_r (1/P(s)) is at most any sum of
P(s) log_r (1/r^{-L(s)}) = P(s) L(s), for L(s) satisfying the Kraft inequality.
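A numeric illustration of the golden lemma; the distribution and the two length functions below are made up for the example.

    from math import log2

    P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # a source distribution
    L_opt = {"a": 1, "b": 2, "c": 3, "d": 3}             # L(s) = log2(1/P(s)), satisfies Kraft
    L_flat = {"a": 2, "b": 2, "c": 2, "d": 2}            # also satisfies Kraft, but suboptimal

    entropy_sum = sum(p * log2(1 / p) for p in P.values())
    for L in (L_opt, L_flat):
        avg_len = sum(P[s] * L[s] for s in P)
        print(entropy_sum, avg_len)   # 1.75 <= 1.75, and 1.75 <= 2.0
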
19 October 2016
Entropy of a finite probability space.
Relation between entropy and the minimal average length of a code.
Coding blocks (instead of single symbols) may help.
Shannon's Coding Theorem (statement).
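A sketch of why coding blocks may help, using Shannon-style lengths ceil(log2(1/P)); the biased source P(0) = 0.9, P(1) = 0.1 is an illustrative choice, not one taken from the lecture.

    from itertools import product
    from math import ceil, log2

    p = {"0": 0.9, "1": 0.1}                       # a biased binary source
    H = sum(q * log2(1 / q) for q in p.values())
    print("entropy per symbol:", H)                # about 0.469 bits

    for k in (1, 2, 3):                            # block length
        blocks = {}
        for tup in product(p, repeat=k):
            prob = 1.0
            for s in tup:
                prob *= p[s]
            blocks["".join(tup)] = prob
        # Lengths ceil(log2(1/prob)) satisfy the Kraft inequality, so a prefix code with
        # these lengths exists; its average length per source symbol drops towards H.
        avg_per_symbol = sum(q * ceil(log2(1 / q)) for q in blocks.values()) / k
        print(k, avg_per_symbol)                   # 1.3, 0.8, then about 0.63
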
26 October 2016
Entropy of product space. Proof of Shannon's Theorem.
Entropy of random variable.
Mutual information I(A;B) of random variables A and B is the (always non-negative) difference H(A) + H(B) - H(A,B).
Conditional entropy. Chain rule. Conditional chain rule.
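Entropies and mutual information computed from a small joint distribution (the table is invented for illustration); the sketch checks I(A;B) = H(A) + H(B) - H(A,B) >= 0 and the chain rule H(A,B) = H(A) + H(B|A).

    from math import log2

    joint = {("0", "0"): 0.4, ("0", "1"): 0.1,
             ("1", "0"): 0.1, ("1", "1"): 0.4}     # an example joint distribution of (A, B)

    def H(dist):
        return sum(p * log2(1 / p) for p in dist.values() if p > 0)

    pA, pB = {}, {}
    for (a, b), p in joint.items():
        pA[a] = pA.get(a, 0) + p                   # marginal of A
        pB[b] = pB.get(b, 0) + p                   # marginal of B

    mutual = H(pA) + H(pB) - H(joint)
    print(H(pA), H(pB), H(joint), mutual)          # mutual information is non-negative

    # Conditional entropy H(B|A) = sum_a P(a) * H(B | A = a); chain rule check.
    H_B_given_A = sum(pA[a] * H({b: joint[(a, b)] / pA[a] for b in pB}) for a in pA)
    print(H(joint), H(pA) + H_B_given_A)           # equal, by the chain rule
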
2 November 2016
Further properties of conditional entropy. Venn diagrams.
Mutual information R(A;B;C); an example where it is negative.
Application to perfect secrecy. Example: the one-time pad.
Shannon's pessimistic theorem: in perfectly secret systems, keys must be as long as messages.
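A minimal one-time pad over bits (the message below is made up): with a uniformly random key as long as the message, the ciphertext distribution does not depend on the message, so it gives no information about it; this is exactly where the key-length requirement comes from.

    import secrets

    def xor(xs, ys):
        return [x ^ y for x, y in zip(xs, ys)]

    message = [1, 0, 1, 1, 0, 0, 1, 0]
    key = [secrets.randbits(1) for _ in message]   # uniformly random key, as long as the message

    ciphertext = xor(message, key)                 # each ciphertext bit is a fair coin flip
    assert xor(ciphertext, key) == message         # decryption with the same key
    print(key, ciphertext)
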
Inequality: I(A; f(B)) is not greater than I(A; B).
9 November 2016
Information channel, input-output pair, channel capacity.
Definition and intuitions. Examples: the faithful channel (and its inverse), the noisy typewriter, the binary symmetric channel.
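The capacity of the binary symmetric channel with crossover probability p is C = 1 - H(p), where H is the binary entropy function; the sample values of p below are arbitrary.

    from math import log2

    def binary_entropy(p):
        if p in (0.0, 1.0):
            return 0.0
        return p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))

    for p in (0.0, 0.01, 0.1, 0.5):
        print(p, 1 - binary_entropy(p))    # capacity falls to 0 at p = 0.5
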
16 November
Decision rule: decoding input from output.
Quality of decision rules in terms of minimizing the transmission error.
Ideal observer rule, maximum likelihood rule.
Average optimality of the maximum likelihood rule.
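A small sketch contrasting the two rules on a single channel use; the input distribution and channel matrix are invented for the example (on output "1" the two rules disagree).

    p_input = {"a": 0.9, "b": 0.1}                         # prior on channel inputs
    channel = {("a", "0"): 0.9, ("a", "1"): 0.1,           # p(output | input)
               ("b", "0"): 0.4, ("b", "1"): 0.6}

    def ideal_observer(output):
        # maximizes p(input | output), proportional to p(input) * p(output | input)
        return max(p_input, key=lambda a: p_input[a] * channel[(a, output)])

    def maximum_likelihood(output):
        # maximizes p(output | input) alone, ignoring the prior
        return max(p_input, key=lambda a: channel[(a, output)])

    for y in ("0", "1"):
        print(y, ideal_observer(y), maximum_likelihood(y))  # they differ on output "1"
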
23 November
Multiple use of channel.
Independence principle: p(b_1...b_k | a_1...a_k) = p(b_1|a_1) * ... * p(b_k|a_k).
Characterization in terms of memorylessness and absence of feedback.
Improving reliability by repetition. Weak tradeoff: the error tends to 0, but the length of the message tends to infinity.
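A worked illustration of the weak tradeoff: each bit is repeated 2k+1 times over a binary symmetric channel and decoded by majority vote; the crossover probability p = 0.1 is an arbitrary choice.

    from math import comb

    p = 0.1                                        # crossover probability of the channel
    for k in range(5):
        n = 2 * k + 1                              # each bit is sent n times
        # majority decoding fails iff more than half of the n copies are flipped
        err = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1, n + 1))
        print(n, 1 / n, err)                       # the rate 1/n tends to 0 as err tends to 0
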
30 November
Transmission algorithm, transmission rate.
The Shannon Channel Coding Theorem. Statement, intuition, idea of proof (not yet a complete proof).
14 December
Proof of Shannon's theorem (continuation).
21 December
Proof of Shannon's theorem (end).
Error-correcting codes --- a short discussion.
11 January 2017
Information vs. complexity in Shannon's theory. New idea of Kolmogorov: algorithmic complexity.
Intuition: the length of a shortest program generating x.
Definition: C_U(x) = the length of a shortest string y such that U(y) = x, for a universal Turing machine U.
Properties: The choice of an actual universal machine is inessential, up to an additive constant.
There exist infinitely many strings x such that C_U(x) >= |x|; Kolmogorov called such strings random.
The function x |--> C_U(x) is uncomputable.
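Although C_U is uncomputable, it can be bounded from above by any compressor (up to an additive constant for the decompressor); the sketch below uses zlib purely as an illustration of such an upper bound, not as anything from the lecture.

    import os
    import zlib

    def compressed_length(x: bytes) -> int:
        # length of a zlib-compressed description of x: an upper bound on C(x) up to a constant
        return len(zlib.compress(x, 9))

    print(compressed_length(b"a" * 1000))          # highly regular string: far below |x|
    print(compressed_length(os.urandom(1000)))     # random-looking string: close to |x| or above
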
18 January
Algorithmic prefix complexity K_U.
Universal prefix Turing machine.
A Chaitin Omega number and its properties.
A Turing machine with Omega as an oracle can solve the halting problem in finite time.
Prefixes of Omega are incompressible.
25 January
Probabilistic interpretation of Chaitin's constant: the probability that a random (prefix) program halts.
Algorithmic probability p_U of a string y: the probability that a random program generates y.
Connection with Shannon's entropy: log (1/p_U(y)) equals K_U(y) up to an additive constant.
End of the course.