Basic definitions
We start by defining the most basic notions a bit more formally.
A computational problem is defined by:
- the set of possible inputs, called instances
- expected correct output(s) for each possible input
The last lecture included an example computational problem:
Search problem
INPUT:
A non-decreasing array of integers, i.e., a[0], a[1], ..., a[n-1], and a number v.
OUTPUT:
The smallest x such that a[x] ≥ v, or n if no such x exists.
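For example (a small illustration of our own): for the array a = [1, 3, 3, 7, 10] (so n = 5) and v = 4,
the correct output is 3, since a[3] = 7 is the first element ≥ v; for v = 100, the correct output is 5 = n.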
An algorithm is a sequence of steps which computes a correct output for each
possible input.
The last lecture included two algorithms for the search problem (linear search and binary search).
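As a reminder, a minimal Python sketch of binary search for this problem might look as follows
(the function and variable names are our own, not from the lecture):

def binary_search(a, v):
    # Invariant: the answer always lies in the range [lo, hi].
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] >= v:
            hi = mid          # a[mid] is a candidate answer
        else:
            lo = mid + 1      # the answer lies to the right of mid
    return lo                 # smallest x with a[x] >= v, or n if no such x exists

print(binary_search([1, 3, 3, 7, 10], 4))   # prints 3, as in the example above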
In this course, we are concerned with computational problems which are precisely stated
mathematically (not e.g. "given a picture, tell if it is a cat or a dog").
How to compare algorithms?
Suppose we have two algorithms solving the same computational problem. Which of them is better?
Probably the most important characteristic is the time complexity, i.e., the amount of time
used by the program. Typically, algorithms take more time for larger instances, for example,
for the search problem, the time/memory complexity would typically depend on n, the size
of the array. Therefore, time complexity is given as a function f(n) of parameter(s)
describing the instance. It is also possible that our algorithm runs in different time
for different instances of size n -- we are usually concerned with the worst
(pessimistic) case, i.e., f(n) is the longest running time on instances of size n.
We use this approach for the following reasons:
- It is good to give a guarantee that the program runs in given time.
- The optimistic case is rarely interesting -- linear search runs very quickly
if it happens to find v right away, but this happens rarely
- The average (or typical) case might be difficult to define
- The pessimistic case often gives a good approximation of how fast our algorithm typically runs
As usual, our theory might be a bit too idealistic -- in practice, we might be perfectly
happy with an algorithm which works slowly (or maybe even incorrectly) in the worst case,
but works fast in the real world cases. For example, there are programs called SAT solvers,
which solve the computational problem called SAT. Many problems in the industry can be
rephrased as instances of SAT and solved very efficiently by SAT solvers, even though it is
widely believed that no algorithm exists that solves SAT efficiently in the worst case. (It depends on
the instance, though -- some ciphers used in cryptography could also be rephrased as instances of SAT,
but these instances are hard.)
Now, what unit should we use to measure the time complexity? We will discuss several options:
1. Real time
This is a rather bad measure, as it strongly depends on the hardware and software
used. We want our theory to be more universal. Also, we do not want to get bogged down in technical
details.
2. Number of operations executed by a high level programming language
...such as C++ or Python.
This is also a poor measure, because what can be written simply in such a language might actually
be a very complex operation (that is why these programming languages are called high level).
For example, consider the two following Python programs:
tab = [i for i in range(1000000)]
for i in range(100000):
    tab[0:1] = []

tab = [i for i in range(1000000)]
for i in range(100000):
    tab[len(tab)-1:len(tab)] = []
In both programs we create a list of size 1000000 and then delete 100000 of its elements. The second program (which removes elements from the end) looks a bit more complex than the first one
(which removes them from the beginning) -- yet it is much faster (a fraction of a second, versus about 100 seconds)!
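If you want to reproduce this yourself, a rough sketch of how to time both variants (the function
names are our own, and the exact numbers will of course differ between machines) is:

import time

def from_front():
    tab = [i for i in range(1000000)]
    for i in range(100000):
        tab[0:1] = []                    # delete the first element

def from_back():
    tab = [i for i in range(1000000)]
    for i in range(100000):
        tab[len(tab)-1:len(tab)] = []    # delete the last element

for f in (from_front, from_back):
    start = time.perf_counter()
    f()
    print(f.__name__, time.perf_counter() - start, "seconds")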
A similar thing would happen with C++ vectors. Why does this happen? Real computers are reasonably well approximated by a theoretical model called the
Random Access Machine. In this model, the memory of the computer is represented as a sequence of memory cells.
Each memory cell can contain an integer (usually from a bounded range), and cells are indexed by integers
called their addresses.
The Random Access Machine may execute basic arithmetical operations on the contents of the memory cells (for example,
add the contents of memory cells 0 and 1 and put the result in the memory cell whose index is given in the memory
cell 2). Everything in our computers is also represented in such a way. For example, in C++, a vector v is
roughly represented as follows: consecutive memory cells, starting from the memory cell number k, contain the
contents of the vector; there is also a memory cell which contains the address k, and the next cell
contains the number of elements in the vector, n. If the vector v is a global variable, the address of the memory
cell containing k will be fixed.
[The actual lecture contained a picture -- so it should be much more clear.]
To remove an element from the end of the vector, it is sufficient to decrease
n -- however, to remove an element from the beginning, we also need to move
each remaining element of the vector to the previous memory cell, which
requires many operations of our RAM!
(Well, we could increase k and decrease n, but C++ and Python do not use this approach
for technical/performance reasons, and it would not help anyway if we removed an element from the middle.)
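To make the picture concrete, here is a toy sketch (our own simplification, not the actual C++ or
Python implementation) simulating both removals on such a layout of memory cells:

# The elements live in consecutive "memory cells" starting at address k;
# n is the number of elements.
memory = [0] * 16
k, n = 3, 5
for i in range(n):
    memory[k + i] = 10 * (i + 1)    # vector contents: 10, 20, 30, 40, 50

def pop_back():
    global n
    n -= 1                          # one cell update: constant work

def pop_front():
    global n
    for i in range(n - 1):          # shift every remaining element one cell to the left
        memory[k + i] = memory[k + i + 1]
    n -= 1                          # about n cell updates: linear work

pop_front()
print(memory[k:k+n])                # [20, 30, 40, 50]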
Python lists use a similar representation, but a bit more complex -- since Python is dynamically typed,
it is not sufficient to simply store the number representing each element; Python also needs to store
information about the type of each element. (This causes such Python programs to run slower and
use more memory.)
3. Number of RAM operations
As we have said, the Random Access Machine (RAM) is a theoretical model, so we could try to
compute the number of steps executed by it. This corresponds quite closely (i.e., is roughly
proportional) to the real time, but it does not have the aforementioned disadvantage of being hardware
dependent. However, this is still a bit too complex -- we would have to write a Random Access
Machine program for each algorithm in order to compute its exact running time! Thus, we would
like even more abstraction.
4. Dominating operations
A more practical solution is to choose a dominating operation, and count the number of
times this dominating operation is executed. We give the
running time of our algorithm as the number of dominating
operations performed.
For example, for the search problem, the dominating
operation could be a comparison of v with an element of the array a,
and we can easily compute that the binary search algorithm will execute at most roughly the ceiling
of log2(n) of these dominating operations.
If we choose the dominating operation
correctly (i.e. the number of actual RAM operations per one dominating operation is bounded),
this gives a good idea of the actual running time of our algorithm.
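As a sanity check, we can instrument a binary search to count its dominating operations; a rough
sketch (our own instrumentation, not from the lecture) might look like this:

import math

def binary_search_counting(a, v):
    comparisons = 0
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1             # one comparison of v with an element of a
        if a[mid] >= v:
            hi = mid
        else:
            lo = mid + 1
    return lo, comparisons

a = list(range(0, 2000, 2))          # sorted array of size n = 1000
_, c = binary_search_counting(a, 999)
print(c, math.ceil(math.log2(len(a))))   # the count stays close to the ceiling of log2(n)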
5. Asymptotics
In practice, we abstract even further, using asymptotics.
Intuitively -- if our algorithm requires, say, 8n^2 + 4n + 125 operations,
we would not care at all about the last two summands, since they are small compared to 8n^2 for larger n;
also, we usually do not care about the constant factor 8, because this factor is too dependent on the
specific computer/theoretical model used. Thus, we just say that the running time of our algorithm is
on the order of n^2. Formally, we use the asymptotic notation (also called the Landau notation
or the Big Oh notation). Given two functions f(n) and g(n), we say that:
- f(n) = O(g(n)) if there exist constants C>0 and n0, such that for each n>n0 we have f(n) < C * g(n);
- f(n) = Omega(g(n)) if there exist constants c>0 and n0, such that for each n>n0 we have f(n) > c * g(n);
- f(n) = Theta(g(n)) if f(n) = O(g(n)) and f(n) = Omega(g(n)).
Intuitively, f(n) = O(g(n)) says "f(n) is on the order of at most g(n)", f(n) = Omega(g(n)) says
"f(n) is of the order of at least g(n)", f(n) = Theta(g(n)) says "f(n) is exactly of order g(n)".
(Two more notations are sometimes used: o(g(n)), "of order smaller than g(n)", and omega(g(n)), "of order greater than
g(n)" -- we will not be using them, though, so we do not define them here.) This notation is a bit weird
for modern mathematics -- since O(g(n)) actually denotes a class of functions of order at most g(n),
it would be more logical to say f(n)∈O(g(n)); yet, we use the equality notation for historical
reasons.
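As a small worked example (ours, not from the lecture): for f(n) = 8n^2 + 4n + 125 we have
f(n) = Theta(n^2). Indeed, for n > 13 we have 4n + 125 < n^2, so f(n) < 9n^2, which gives
f(n) = O(n^2) with C = 9 and n0 = 13; and f(n) > n^2 for every n > 0, which gives f(n) = Omega(n^2)
with c = 1 and n0 = 0.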
Typical algorithms could have time complexities of order Theta(n^d) for d>=0 (typically d is an integer, but
real values are possible too); the greater d, the greater the order. Binary search was Theta(log n), which is
O(n^d) for any d>0, but Omega(1). Since O(1) means a function that is bounded by a constant, we sometimes use the
notation n^O(1) to denote any function on the order of at most some polynomial (without giving the actual degree --
only that it is bounded by a constant). Any polynomial is O(2^n), 2^n is O(3^n), 3^n is O(n!), n! is O(n^n), etc.
Memory complexity
Another very important characteristic is the memory complexity -- i.e., the amount of working memory used
by our algorithm. Most of what we said about time complexity also applies to memory complexity, in particular,
we prefer to abstract from the technicalities and just use the asymptotic notation.
Example algorithms and their time and memory complexities
will be given in the next lecture.