Lectures 4-5
Divide and Conquer
Algorithms based on the divide and conquer technique reduce a larger input into $a$ inputs which are $b$ times smaller,
solve them using the same algorithm, and then
combine the results into the solution of the full problem. Binary search can be understood as
an example of the divide and conquer technique -- after comparing x with
the number in the middle, it is sufficient to solve $a=1$ subproblem which is $b=2$ times
smaller! We will see more examples in this lecture.
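To make this pattern concrete, here is a minimal Python sketch of binary search written in this style (the function and variable names are just for illustration):

def binary_search(a, x):
    # looks for x in the sorted array a; returns an index of x, or None if absent
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == x:
            return mid
        elif a[mid] < x:
            lo = mid + 1   # keep only the right half: one subproblem, half the size
        else:
            hi = mid - 1   # keep only the left half
    return None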
Example: multiplication of polynomials
In this example we use arrays to denote polynomials. The array $a[0..d-1]$ is used
to denote the polynomial $A(x) = \sum_i a[i] x^i$ (and similarly $b$, $c$, etc.).
We want to multiply two such polynomials.
INPUT: Arrays a[0..d-1], b[0..d-1]
OUTPUT: Array c[0..2d-2] such that $C(x) = A(x) B(x)$.
For example, for $a=[1,1,1]$, $b=[1,2,2]$, we have $c=[1,3,5,4,2]$.
If you prefer, you can think about multiplication of numbers rather than polynomials
-- $A(10)$ is the number, and $a$ are its consecutive digits (starting with the
least significant one) -- the example above corresponds to $111\cdot 221=24531$.
Numbers in a positional system are multiplied exactly like polynomials, but with slightly more work,
because of having to 'carry' when a temporary result is greater than 9.
The trivial algorithm which we have learned in primary school simply multiplies each
value in a with each value in b. This algorithm has time complexity
$O(d^2)$.
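For concreteness, a minimal Python sketch of the trivial algorithm (the name multiply_trivial is just for illustration):

def multiply_trivial(a, b):
    # multiplies polynomials given as coefficient lists (least significant first)
    d = len(a)
    c = [0] * (2*d - 1)
    for i in range(d):
        for j in range(d):
            c[i+j] += a[i] * b[j]
    return c

# multiply_trivial([1, 1, 1], [1, 2, 2]) returns [1, 3, 5, 4, 2]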
The more efficient Karatsuba algorithm is based on divide and conquer. Assume that $d=2s$.
Split the arrays a and b into two halves of length s each. Thus,
we obtain $a_0=a[0..s-1]$, $a_1=a[s..d-1]$, $b_0=b[0..s-1]$, $b_1=b[s..d-1]$.
If we denote by $A(x)$, $B(x)$, ... the value of the polynomial described by $a$, $b$, ...,
we have $A(x) = A_0(x) + x^s A_1(x)$, and $B(x) = B_0(x)+x^s B_1(x)$. Therefore,
$C(x) = A(x) \cdot B(x) = A_0(x)B_0(x) + (A_0(x)B_1(x) + A_1(x) B_0(x))x^s + A_1(x) B_1(x)x^d$.
Let $C_0(x)=A_0(x)B_0(x)$, $C_1(x)=(A_0+A_1)(x)(B_0+B_1)(x)$, and $C_2(x) = A_1(x)B_1(x)$;
thus, we have $C(x) = C_0(x) + (C_1(x)-C_0(x)-C_2(x))x^s + C_2(x)x^d$. This leads to the following
algorithm (pseudocode):
to multiply(a[0..d-1], b[0..d-1]):
    if d == 1:
        just multiply a[0] by b[0]
    else:
        let s = d/2
        a0 = a[0..s-1]
        a1 = a[s..d-1]
        b0 = b[0..s-1]
        b1 = b[s..d-1]
        c0 = multiply(a0, b0)
        c1 = multiply(a0+a1, b0+b1)
        c2 = multiply(a1, b1)
        return c0 + shift(c1-c0-c2, s) + shift(c2, d)
(Note: This pseudocode assumes that $d$ is even -- if not, we can simply add an extra 0)
In this pseudocode, + and - correspond to addition/subtraction of polynomials, which can be done
in $O(d)$ -- simply return the array whose $i$-th element is $a[i]±b[i]$ for each $i$. shift(a, i) adds i zeros to the front of array a, which
corresponds to multiplication of $A$ by $x^i$.
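A runnable Python sketch of the whole scheme, assuming coefficient lists as above; the helper names (add_poly, sub_poly, shift) are just for illustration, and odd lengths are handled by padding with a 0 coefficient:

def add_poly(a, b):
    # coefficient-wise addition; the shorter array is treated as padded with zeros
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]

def sub_poly(a, b):
    # coefficient-wise subtraction
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0) for i in range(n)]

def shift(a, i):
    # multiply A(x) by x^i: prepend i zeros
    return [0]*i + a

def multiply(a, b):
    d = len(a)
    if d == 1:
        return [a[0] * b[0]]
    if d % 2 == 1:
        # pad with an extra 0 coefficient so that d is even
        a = a + [0]
        b = b + [0]
        d += 1
    s = d // 2
    a0, a1 = a[:s], a[s:]
    b0, b1 = b[:s], b[s:]
    c0 = multiply(a0, b0)
    c1 = multiply(add_poly(a0, a1), add_poly(b0, b1))
    c2 = multiply(a1, b1)
    return add_poly(add_poly(c0, shift(sub_poly(sub_poly(c1, c0), c2), s)), shift(c2, d))

# multiply([1, 1, 1], [1, 2, 2]) returns [1, 3, 5, 4, 2, 0, 0] (trailing zeros come from the padding)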
What is the time complexity of this algorithm? Note that to multiply $A$ and $B$ of length $d>1$, we need to do three
multiplications of numbers of length $s=d/2$, and several additions/subtractions, which are all done in $O(d)$. Therefore,
if we denote the running time of this algorithm for arrays of length $d$ by $T(d)$, we get that $T(d) = 3T(d/2) + O(d)$
for $d \geq 2$ (and $T(1) = O(1)$).
How to solve this? Assume that $d=2^n$. By rewriting $T(d/2)$ using the same formula, and repeating until we reach $T(1)$,
we obtain that $T(2^n) = O(2^n) + O(3 \cdot 2^{n-1}) + \ldots + O(3^n) = O(3^n)$ (a geometric series with ratio $3/2$, dominated by its last term). Thus, $T(d) = O(3^{\log_2 d}) = O(d^{\log_2 3})$
as long as $d$ is a power of 2. If $d$ is not a power of 2, take $d'$ to be the smallest power of 2 larger than $d$ --
we still have $T(d) < T(d') = O(d'^{\log_2 3}) = O(d^{\log_2 3})$, because $d'< 2d$.
Therefore, the Karatsuba algorithm based on Divide and conquer runs in time $O(d^{\log_2 3}) = O(d^{1.585})$, which is better than the
trivial algorithm running in time $O(d^2)$. The memory complexity of both algorithms is $O(d)$ (for the Karatsuba algorithm this will be shown below).
Thus, for example, if we take $d=1024$, we can expect the trivial algorithm to
execute 1048576 multiplications, while the Karatsuba algorithm will execute just 59049 multiplications -- much faster!
In practice, the hidden constant in the $O$ notation is quite high, so the trivial algorithm will still run faster for small
values of $d$ -- we can solve this by changing the base case of our algorithm ("if $d == 1$") so that the trivial algorithm is used for all
sufficiently small values of $d$.
Note that the Karatsuba algorithm is not the best known algorithm for this problem --
Fast Fourier Transform (FFT) can be used to multiply polynomials or large numbers in roughly $O(d \log d)$ (this algorithm will
not be covered in this course).
Master theorem for divide-and-conquer recurrences
The Master theorem for divide-and-conquer recurrences
provides a general way to solve recurrences like this. Suppose that $T(n) = aT(n/b)+f(n)$. Let $c=\log_b a$. Then:
- if $f(n) = O(n^{c'})$ for $c'< c$, we have $T(n) = \Theta(n^c)$;
- if $f(n) = \Theta(n^c)$, we have $T(n) = \Theta(n^c \log n)$ (more generally, if $f(n) = \Theta(n^c \log^k n)$, then $T(n) = \Theta(n^c \log^{k+1} n)$);
- if $f(n) = \Omega(n^{c'})$ where $c'>c$, and $af(n/b) \leq kf(n)$ for some $k<1$ (so called regularity condition), then $T(n) = \Theta(f(n))$.
In the first case, the recursive part (solving the subproblems) dominates the non-recursive parts (splitting into subproblems and merging the results).
In the last case (which rarely occurs in practice), the non-recursive part dominates. In the middle case, both the recursive and non-recursive parts
contribute to the resulting complexity. Examples:
- $T(d) = 3T(d/2) + O(d)$ (Karatsuba algorithm) is an example of the first case,
- $T(d) = 7T(d/2) + O(d)$ is achieved by the Strassen algorithm for multiplying matrices; the algorithm could be understood as the matrix version of the
Karatsuba algorithm. By solving the recurrence, we get $T(d) = O(d^{\log_2 7}) = O(d^{2.808})$, again with a rather big constant.
Matrix multiplication is an important research problem -- algorithms better than $O(d^{2.808})$ are known, and we expect to find even better algorithms in the future.
- Binary search can be described with $T(d) = O(1) + T(d/2)$, which is covered by the second case.
- The merge sort algorithm (described below) has the following recurrence relation: $T(d) = O(d) + 2T(d/2)$, which is also covered by the second case. Thus, merge sort
runs in time $T(d) = O(d \log d)$.
- We could write a recurrence for the memory complexity of the Karatsuba algorithm: to multiply polynomials of size $d$, we use several auxiliary arrays of size $d$,
and perform the multiplication of polynomials of size $d/2$; since these multiplications can reuse the same memory, we get the recurrence $T(d) = O(d) + T(d/2)$. This is covered
by the last case, and we get that the memory complexity is $O(d)$.
Sorting
Now, we will be talking about the problem of sorting:
INPUT: Array a[0..n-1]
OUTPUT: Array a'[0..n-1] such that $a$ and $a'$ have the same elements, but $a'[0] \leq a'[1] \leq ... \leq a'[n-1]$.
Although we need to sort something in our programs very frequently, we rarely need to actually implement a sorting algorithm,
as all the popular programming languages have sorting functions in their standard libraries. However, it is still good to
know the theory, and there are situations where our program will run faster if we implement our own sorting algorithm.
Furthermore, sorting algorithms are a good showcase of various basic techniques used throughout algorithmics, from
ones we have already learned (counting, Divide and Conquer) to new ones (data structures).
Some theory first...
Theorem: Any sorting algorithm based on comparison requires $\Omega(n\log n)$ comparisons to sort an array of length $n$.
Proof (sketch). This could be viewed as a puzzle: we have $n$ coins and we want to order them from lightest to heaviest. We can compare
two coins using a balance scale. How can we do this using the scale the smallest possible number of times?
As an example, let's try $n=5$. The coins can be ordered in 5! = 120 possible ways. After one comparison, 60 ways are left. If we
perform the second comparison correctly, there are at least 30 possibilities left in the worst case (i.e., the one where the number
of possibilities is the greatest); by continuing this process (120 -> 60 -> 30 -> 15 -> 8 -> 4 -> 2 -> 1), we learn that at least
7 comparisons are necessary in the worst case. In general, we need at least $\log_2(n!)$ comparisons, which is $\Omega(n \log n)$
(because $n! > (n/2)^{n/2}$, so $\log_2(n!) > (n/2)\log_2(n/2) = \Omega(n \log n)$).
Classic sorting algorithms
The following algorithms have been covered in the lecture. We only present the general idea in these notes
-- the details can be found e.g. in the CLRS book; Internet sources such as Wikipedia tend to be rather reliable too.
InsertionSort
InsertionSort works as follows. We add the first, second, third, fourth, etc. element to a sorted prefix of the array, always inserting the new element into its correct position.
Unfortunately, as we already know, whenever we insert a new element into an array which has $k$ elements, we need $O(k)$ time to push the later elements to the right.
This makes the time complexity of InsertionSort $O(n^2)$ even though we could do only $O(n \log n)$ comparisons when using Binary Search. For this reason, we usually
do not use Binary Search when implementing InsertionSort -- it does not make it faster, and actually makes it slower in special cases, such as an array which is
already sorted.
InsertionSort has time complexity $O(n^2)$ and memory complexity $O(1)$; it is stable (if two elements are equal, they remain in the same order) and fast ($O(n)$) when the input is already sorted.
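A minimal Python sketch of InsertionSort (in-place, without Binary Search):

def insertion_sort(a):
    for i in range(1, len(a)):
        x = a[i]
        j = i
        # push larger elements one position to the right, then drop x into the gap
        while j > 0 and a[j-1] > x:
            a[j] = a[j-1]
            j -= 1
        a[j] = x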
MergeSort
This sorting algorithm is based on the Divide and Conquer technique: split the array into two halves, sort them, and merge them.
Time complexity is always $O(n \log n)$, memory complexity $O(n)$, stable.
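A simple Python sketch of MergeSort (this version allocates new lists, so it is not in-place; the function names are just for illustration):

def merge(left, right):
    # merge two sorted lists into one sorted list; taking from 'left' on ties keeps the sort stable
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    return result + left[i:] + right[j:]

def merge_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))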
QuickSort
This also can be seen as a sorting algorithm based on the Divide and Conquer technique, but it works in a different way.
We choose a pivot, move elements smaller than the pivot to the left, and larger than the pivot to the right.
(This is done using the "Polish flag" method shown during the exercise sessions.)
Then we sort both parts using the same algorithm.
There are bad cases where it runs in time $\Theta(n^2)$ (when one of the smallest/largest elements is always
chosen to be the pivot), but on average it runs very quickly ($O(n \log n)$). If we choose the pivot randomly,
the algorithm will run very quickly with very high probability, so in this particular case, the average time
complexity is more useful in practice than the pessimistic one.
Memory complexity is usually $O(\log n)$ but it can be $O(n)$ in the worst case. (This is because, when a function
calls itself recursively, the computer needs to remember the previous call (the so-called recursion stack); therefore, in the worst case above,
the computer will have to remember $n$ calls on the recursion stack. It is possible to do this more cleverly and
use $O(\log n)$ memory even in the worst case -- use recursion only for the smaller part, and then solve the bigger part without
using a recursive call.)
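A short Python sketch of QuickSort with a random pivot (for simplicity this version builds new lists instead of partitioning in place with the "Polish flag" method, so its memory use is higher than described above):

import random

def quick_sort(a):
    if len(a) <= 1:
        return a
    pivot = random.choice(a)
    smaller = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    larger  = [x for x in a if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)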
HeapSort
HeapSort is interesting, because it is the first algorithm based on a non-trivial data structure.
Data structures are ways of arranging data. In most data structures there is some important property that defines how data is arranged.
This property has to be satisfied after every operation
(this property is called the invariant of the structure -- similarly to loop invariants, which must be satisfied after every iteration of the loop).
So far we have seen two basic data structures: the unsorted array (no invariant) and
the sorted array (the invariant here is that the array should be sorted after every operation).
In an unsorted array, we can add a new element very quickly, but it takes very long to find anything.
In a sorted array, we can find every given element very quickly (the invariant helps us); however, as we have seen in InsertionSort, adding a new element
takes time (we need to spend extra time to keep the invariant satisfied). This is a common tradeoff in algorithmics (as well as in real life): it takes more time to
make the data more ordered, but then it takes less time to search them!
HeapSort is based on the data structure known as complete binary heap. This is a data structure into which we can
insert multiple elements (in time $O(\log n)$ each), quickly find the current largest element (in time $O(1)$)
and remove the current largest element (in time $O(\log n)$ each).
This is perfect for sorting -- we first add all the elements
to the heap, then we repeatedly take the greatest element in the heap and remove it. The time complexities stated above
give us an $O(n \log n)$ algorithm.
A complete binary heap is represented as an array $a[1..i]$. (In most programming languages, including Python and C++, array indices start with 0; however, it is a bit easier to
describe the complete binary heap starting with index 1. In practice, we can just add a dummy [0] element and not use it.)
We arrange the elements into a tree-like structure
(see this visualization), where elements
$2k$ and $2k+1$ are considered the children of $k$. The heap has to satisfy the invariant called the heap property: the value in the
parent is always greater than or equal to the values in its children. (Note: in the visualization, only the 'white background' part is in the heap; $i$ is the index of the last element in the white
part. Ignore the remaining part for now, the heap property is
not satisfied there.)
("Binary" refers to every element having two spots for children, and "complete" refers to the shape where all the possible children spots are filled, until the last
row which may be partially filled, but it still has to be completely filled from the left.)
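For instance, the array $a[1..7] = [9, 7, 8, 3, 1, 5, 6]$ satisfies the heap property: the children of $a[1]=9$ are $a[2]=7$ and $a[3]=8$, the children of $a[2]=7$ are $a[4]=3$ and $a[5]=1$, and the children of $a[3]=8$ are $a[6]=5$ and $a[7]=6$; every parent is greater than its children.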
The heap property ensures that the currently largest element is always $a[1]$. To add an element to the heap (with $i$ elements), we put it at the last position (as $a[i+1]$),
then we move it upwards as long as it is greater than its parent. Note that we make at most $O(\log n)$ steps here.
def upheap(a, i):
    # move a[i] upwards while it is greater than its parent a[i//2]
    while i > 1 and a[i//2] < a[i]:
        (a[i], a[i//2]) = (a[i//2], a[i])
        i = i//2
To remove the greatest element from the heap, we switch its location with the last element ($a[i]$), then remove it.
After this, the new element $a[1]$ is probably not placed correctly, so we similarly move it downwards to its correct place:
def downheap(a, i, n):
    # move a[i] downwards while it is smaller than one of its children
    while True:
        left_greater = 2*i <= n and a[2*i] > a[i]
        right_greater = 2*i+1 <= n and a[2*i+1] > a[i]
        if left_greater and right_greater:
            # both children are greater -- swap with the greater of the two
            if a[2*i] > a[2*i+1]:
                right_greater = False
            else:
                left_greater = False
        if left_greater:
            (a[i], a[2*i]) = (a[2*i], a[i])
            i = 2*i
        elif right_greater:
            (a[i], a[2*i+1]) = (a[2*i+1], a[i])
            i = 2*i+1
        else:
            return
It is possible to implement HeapSort in memory $O(1)$. To do this, we use the first $i$ elements of the array $a[1..n]$ given in the input as the heap.
During the first phase (storing all the elements in the heap), after $i$ steps, the first $i$ elements are in the heap, and the remaining elements have
not yet been added. During the second phase (moving the elements from the heap back to the array in the correct order), after $j$ steps, the last $j$ elements
are the greatest $j$ elements which have been already found, and the remaining $n-j$ elements are in the heap.
(In the visualization, the elements in the heap are white, and elements yet to be inserted into the heap /
already sorted are gray.)
The whole algorithm is implemented as follows:
# phase 1: insert all elements into the heap
for i in range(1, n+1):
    upheap(a, i)
# phase 2: repeatedly move the greatest element of the heap to the end
for i in range(n, 1, -1):
    (a[1], a[i]) = (a[i], a[1])
    downheap(a, 1, i-1)
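For completeness, a minimal runnable wrapper around the functions above (it adds the dummy element at index 0, as suggested earlier):

def heapsort(values):
    # a[0] is a dummy element so that the heap occupies a[1..n]
    a = [None] + list(values)
    n = len(values)
    for i in range(1, n+1):
        upheap(a, i)
    for i in range(n, 1, -1):
        (a[1], a[i]) = (a[i], a[1])
        downheap(a, 1, i-1)
    return a[1:]

# example: heapsort([5, 2, 9, 1]) returns [1, 2, 5, 9]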
Constructing the heap can be implemented more efficiently than we have shown here (in time $O(n)$), but the upheap function will be useful to us again later.
Time complexity of HeapSort is always $O(n \log n)$, memory complexity $O(1)$, but not stable.
It is a bit faster than MergeSort in practice, but the lack of stability may be an issue in some situations.
Sorting in real life
What sorting algorithms are used in practice? Popular programming languages today have built-in sorting functions.
They are based on the rough ideas given above, but they use powerful optimizations, which make them even better.
We will roughly study implementations in C++ and in Python.
C++
C++ has two sorting functions available: sort and stable_sort. Stability is an important
property in many applications, so we often want to use a stable algorithm;
most popular implementations of C++ use MergeSort for stable_sort.
However, it takes extra resources to guarantee stability, so if stability is not necessary, it is better to use an unstable algorithm.
The fastest algorithm in practice is QuickSort, and so the implementation of sort is based on QuickSort (with
a smart method of choosing the pivot). However,
QuickSort has two disadvantages:
- When sorting a very small number of elements, InsertionSort is actually the fastest. However, we can easily combine
the advantages of QuickSort and InsertionSort to get something better than both of them: we use QuickSort for the general
structure, but if the subarray to be sorted is very small, sort it using InsertionSort instead!
- QuickSort has bad cases which take $O(n^2)$ time. This can be mitigated by using so-called introspection: the algorithm
monitors how well it is doing, and if it finds out that it is not doing very well (bad pivots are chosen, so almost all the elements
end up in one part more often than expected), it switches to another sorting algorithm. This other sorting algorithm
would be HeapSort (as it is more efficient than MergeSort). This approach is known as IntroSort.
Python
For simplicity, Python has only one sorting function, which sorts in a stable way. As above, the general idea is based on
the best stable sorting algorithm covered in these notes (MergeSort), but we use the insights provided by other algorithms
in order to improve it.
We have mentioned that InsertionSort works very quickly (in $O(n)$) on data that is already sorted. Similarly, it is easy
to construct an algorithm that works very quickly on data that is in the reverse order. All the other algorithms mentioned
above (QuickSort, MergeSort, HeapSort) take $O(n \log n)$ time in the best case.
The sorting algorithm implemented in most versions of Python, called TimSort, combines the advantages of MergeSort and
InsertionSort. We first look for 'runs' in our data: these are segments which are already sorted, or which are reverse
sorted. The reverse sorted segments are reversed, and afterwards, all the runs are merged, using MergeSort.
This yields a sorting algorithm which is not only $O(n \log n)$ in the worst case, but also $O(n)$ in many "easy" cases.
Since these "easy" cases actually occur quite often in real world data, TimSort is very fast in practice.
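As mentioned above, the built-in sort is stable; a small illustration using Python's sorted with a key function:

records = [("Alice", 3), ("Bob", 1), ("Carol", 3), ("Dave", 2)]
# sort by the numeric key only; Alice and Carol share the key 3,
# and stability guarantees that Alice stays before Carol
print(sorted(records, key=lambda r: r[1]))
# [('Bob', 1), ('Dave', 2), ('Alice', 3), ('Carol', 3)]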
Can we sort even better?
The last section may give an impression that it is pointless to implement sorting algorithms of our own -- after all,
our programming languages use algorithms which are more sophisticated than what we could probably implement ourselves,
and also more efficient. We have also proven a Theorem saying that we cannot sort data faster than in $O(n \log n)$.
However, this is not the case -- there are situations when a simple sorting algorithm actually works better!
This is because our Theorem assumed that our sorting algorithm was based on comparisons. If we know that
our data to be sorted has some extra structure, it may be possible to sort it faster than in $O(n \log n)$, and also
much faster than the built-in implementations!
CountSort
We have shown the techniques of counting and cumulative sums in previous lectures, but we have not yet tried to use them for sorting.
Imagine that you are a teacher, just after grading the tests (let's say there are 200 of them), and you want to sort
the test papers alphabetically by the students' last names, so that you can find them easily when the students
complain. What is the quickest way to do this?
A method that turns out to be very good in practice is not based on any of the sorting algorithms above.
It works as follows: we create 26 piles of tests, one for each letter of the alphabet (A..Z). Then we look
at every test, and we drop it on the pile corresponding to the first letter of the student's last name.
Then we simply collect all the piles in order. Of course this still leaves students not sorted correctly if they
share the first letter of their names. Let's assume for now that it is good enough: every student is close to the expected
position, and we can easily find them.
CountSort is an algorithm based on the idea above. It is
used if there is a small number ($k$) of possible keys -- for example, when we are sorting 10000 people
by their birth year (1900..1999 -- a small number of possibilities). Simply count the number of people born
in each year, then for every key we compute where people having this key should start (using cumulative sums),
and then place each person in their correct location.
For example, assume that we have the following people A (1907), B (1905), C (1907), D (1909). We count that there
is 1 person born in 1905, 2 persons born in 1907, 1 person born in 1909. By using cumulative sums, we find out
that people born in 1905 should start at position 0, people born in 1907 should start at position 1, and people
born in 1909 should start at position 3. We look through our list again and move all the people to their correct
positions (A is moved to 1, B is moved to 0, C is moved to 2 because 1 was already taken, D is moved to 3).
Time complexity of CountSort is $O(n+k)$, memory complexity also $O(n+k)$. This sorting algorithm is stable.
This algorithm is not based on comparisons,
which allows it to circumvent the Theorem mentioned above, and run faster than $O(n \log n)$.
As the time complexity suggests,
this can work noticeably faster than the standard library algorithms -- but it needs a specific kind of data to work.
However, it is often the case that the number of possible keys is small.
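A Python sketch of CountSort for items with an integer key in the range 0..k-1 (the helper name count_sort and the key convention are just for illustration):

def count_sort(items, key, k):
    # items: a list of objects; key(item) is an integer in 0..k-1
    count = [0] * k
    for x in items:
        count[key(x)] += 1
    # start[v] = position where the block of items with key v begins (cumulative sums)
    start = [0] * k
    for v in range(1, k):
        start[v] = start[v-1] + count[v-1]
    result = [None] * len(items)
    for x in items:                 # scanning in the original order keeps the sort stable
        result[start[key(x)]] = x
        start[key(x)] += 1
    return result

# the example from the text: keys are birth years, shifted to start at 0
people = [("A", 1907), ("B", 1905), ("C", 1907), ("D", 1909)]
print(count_sort(people, key=lambda p: p[1] - 1900, k=100))
# [('B', 1905), ('A', 1907), ('C', 1907), ('D', 1909)]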
RadixSort
The CountSort examples given above are not completely satisfying. In the example with the students, we still had to
sort every pile alphabetically by the remaining letters. In the birth year example, we have sorted people by their
birth year, but what about month and day?
One way of solving this issue is using RadixSort. We will explain RadixSort on the birth year example. RadixSort
works as follows:
- First, we sort all the people by day (1..31), using CountSort. We ignore the month and year for now.
- Then, we sort all the people by month (1..12), using CountSort. We do this in a stable way.
- Last, we sort all the people by year (0..99), using CountSort again. We do this in a stable way.
Note that, since we were using a stable sorting algorithm in the later phases, people in every year will
remain sorted with respect to month. Likewise, for every month and year, people will be correctly sorted
with respect to the day. Therefore all the people are sorted correctly, and we did it quite quickly, in time
$O(n + 31) + O(n + 12) + O(n + 100) = O(n)$! A similar method could also be used for alphabetical sorting, although it is more
difficult there, because the names could be long.
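A sketch of RadixSort for the birth-date example, assuming the count_sort helper from the previous section and that each person is stored as a pair (name, (year, month, day)):

def radix_sort_dates(people):
    # each person is assumed to be (name, (year, month, day)), with year in 1900..1999
    people = count_sort(people, key=lambda p: p[1][2] - 1, k=31)      # by day (1..31)
    people = count_sort(people, key=lambda p: p[1][1] - 1, k=12)      # by month (1..12), stable
    people = count_sort(people, key=lambda p: p[1][0] - 1900, k=100)  # by year (1900..1999), stable
    return people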
Both CountSort and RadixSort are easy to implement, and they often can be used to sort data much quicker than the
built-in sorting functions.
Exercise: we have to sort $N$ numbers in the range from $0$ to $N^2-1$. How to do this in $O(N)$ time?