Basic definitions
We start by defining the most basic notions a bit more formally.
A computational problem is defined by:
- the set of possible inputs, called instances
- expected correct output(s) for each possible input
The last lecture included an example computational problem:
Search problem
INPUT:
A non-decreasing array of integers, i.e., a[0], a[1], ..., a[n-1], and a number v.
OUTPUT:
The smallest x such that a[x] ≥ v, or n if no such x exists.
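For example (a small illustration of our own): for the array a = [1, 3, 3, 7, 10] (so n = 5) and v = 4,
the correct output is 3, since a[3] = 7 is the first element ≥ v; for v = 100, the correct output is 5 = n.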
An algorithm is a sequence of steps which computes a correct output for each
possible input.
The last lecture included two algorithms for the search problem (linear search and binary search).
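As a reminder, a minimal Python sketch of binary search for this problem might look as follows
(the function and variable names are our own, not from the lecture):

def binary_search(a, v):
    # Invariant: the answer always lies in the range [lo, hi].
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] >= v:
            hi = mid          # a[mid] is a candidate answer
        else:
            lo = mid + 1      # the answer lies to the right of mid
    return lo                 # smallest x with a[x] >= v, or n if no such x exists

print(binary_search([1, 3, 3, 7, 10], 4))   # prints 3, as in the example above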
In this course, we are concerned with computational problems which are precisely stated
mathematically (not e.g. "given a picture, tell if it is a cat or a dog").
How to compare algorithms?
Suppose we have two algorithms solving the same computational problem. Which of them is better?
Probably the most important characteristic is the time complexity, i.e., the amount of time
used by the program. Typically, algorithms take more time for larger instances, for example,
for the search problem, the time/memory complexity would typically depend on n, the size
of the array. Therefore, time complexity is given as a function f(n) of parameter(s)
describing the instance. It is also possible that our algorithm runs in different time
for different instances of size n -- we are usually concerned with the worst
(pessimistic) case, i.e., f(n) is the longest running time on instances of size n.
We use this approach for the following reasons:
- It is good to give a guarantee that the program runs in given time.
- The optimistic case is rarely interesting -- linear search runs very quickly
if it happens to find v right away, but this happens rarely
- The average (or typical) case might be difficult to define
- The pessimistic case often gives a good approximation of how fast our algorithm typically runs
As usual, our theory might be a bit too idealistic -- in practice, we might be perfectly
happy with an algorithm which works slowly (or maybe even incorrectly) in the worst case,
but works fast in the real world cases. For example, there are programs called SAT solvers,
which solve the computational problem called SAT. Many problems in the industry can be
rephrased as instances of SAT and solved very efficiently by SAT solvers, even though it is
widely believed that no algorithm exists that solves SAT efficiently in the worst case. (It depends on
the instance, though -- some ciphers used in cryptography could also be rephrased as instances of SAT,
but these instances are hard.)
Now, what unit should we use to measure the time complexity? We will discuss several options:
1. Real time
This is a rather bad measure, as it strongly depends on the hardware and software
used. We want our theory to be more universal. Also, we do not want to get bogged down in technical
details.
2. Number of operations executed by a high level programming language
...such as C++ or Python.
This is also a poor measure, because what can be written simply in such a language might actually
be a very complex operation (that is why these programming languages are called high level).
For example, consider the two following Python programs:
tab = [i for i in range(1000000)]
for i in range(100000):
    tab[0:1] = []

tab = [i for i in range(1000000)]
for i in range(100000):
    tab[len(tab)-1:len(tab)] = []
In both programs we create a list of size 1000000 and then delete 100000 of its elements. The second program (which removes elements from the end) looks a bit more complex than the first one
(which removes them from the beginning) -- yet it is much faster (a fraction of a second, versus about 100 seconds)!
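If you want to reproduce this yourself, a rough sketch of how to time both variants (the function
names are our own, and the exact numbers will of course differ between machines) is:

import time

def from_front():
    tab = [i for i in range(1000000)]
    for i in range(100000):
        tab[0:1] = []                    # delete the first element

def from_back():
    tab = [i for i in range(1000000)]
    for i in range(100000):
        tab[len(tab)-1:len(tab)] = []    # delete the last element

for f in (from_front, from_back):
    start = time.perf_counter()
    f()
    print(f.__name__, time.perf_counter() - start, "seconds")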
A similar thing would happen with C++ vectors. Why does this happen? Real computers are reasonably well approximated by a theoretical model called the
Random Access Machine. In this model, the memory of the computer is represented as a sequence of memory cells.
Each memory cell can contain an integer (usually from a bounded range), and cells are indexed by integers
called their addresses.
The Random Access Machine may execute basic arithmetical operations on the contents of the memory cells (for example,
add the contents of memory cells 0 and 1 and put the result in the memory cell whose index is given in the memory
cell 2). Everything in our computers is also represented in such a way. For example, in C++, a vector v is
roughly represented as follows: consecutive memory cells, starting from the memory cell number k, contain the
contents of the vector; there is also a memory cell which contains the address k, and the next cell
contains the number of elements in the vector, n. If the vector v is a global variable, the address of the memory
cell containing k will be fixed.
[The actual lecture contained a picture -- so it should be much more clear.]
To remove an element from the end of the vector, it is sufficient to decrease
n -- however, to remove an element from the beginning, we also need to move
each remaining element of the vector to the previous memory cell, which
requires many operations of our RAM!
(Well, we could increase k and decrease n, but C++ and Python do not use this approach
for technical/performance reasons, and it would not help anyway if we removed an element from the middle.)
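To make the picture concrete, here is a toy sketch (our own simplification, not the actual C++ or
Python implementation) simulating both removals on such a layout of memory cells:

# The elements live in consecutive "memory cells" starting at address k;
# n is the number of elements.
memory = [0] * 16
k, n = 3, 5
for i in range(n):
    memory[k + i] = 10 * (i + 1)    # vector contents: 10, 20, 30, 40, 50

def pop_back():
    global n
    n -= 1                          # one cell update: constant work

def pop_front():
    global n
    for i in range(n - 1):          # shift every remaining element one cell to the left
        memory[k + i] = memory[k + i + 1]
    n -= 1                          # about n cell updates: linear work

pop_front()
print(memory[k:k+n])                # [20, 30, 40, 50]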
Python lists use a similar representation, but a bit more complex -- since Python is dynamically typed,
it is not sufficient to simply store the number representing each element; Python also needs to store
information about the type of each element. (This causes such Python programs to run slower and
use more memory.)
3. Number of RAM operations
As we have said, the Random Access Machine (RAM) is a theoretical model, so we could try to
compute the number of steps executed by it. This corresponds quite closely (i.e., is roughly
proportional) to the real time, but it does not have the aforementioned disadvantage of being hardware
dependent. However, this is still a bit too complex -- we would have to write a Random Access
Machine program for each algorithm in order to compute its exact running time! Thus, we would
like even more abstraction.
4. Dominating operations
A more practical solution is to choose a dominating operation, and count the number of
times this dominating operation is executed. We give the
running time of our algorithm as the number of dominating
operations performed.
For example, for the search problem, the dominating
operation could be a comparison of v with an element of the array a,
and we can easily compute that the binary search algorithm will execute at most roughly the ceiling
of log2(n) of these dominating operations.
If we choose the dominating operation
correctly (i.e. the number of actual RAM operations per one dominating operation is bounded),
this gives a good idea of the actual running time of our algorithm.
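As a sanity check, we can instrument a binary search to count its dominating operations; a rough
sketch (our own instrumentation, not from the lecture) might look like this:

import math

def binary_search_counting(a, v):
    comparisons = 0
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1             # one comparison of v with an element of a
        if a[mid] >= v:
            hi = mid
        else:
            lo = mid + 1
    return lo, comparisons

a = list(range(0, 2000, 2))          # sorted array of size n = 1000
_, c = binary_search_counting(a, 999)
print(c, math.ceil(math.log2(len(a))))   # the count stays close to the ceiling of log2(n)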
5. Asymptotics
In practice, we abstract even further, using asymptotics.
Intuitively -- if our algorithm requires, say, 8n^2 + 4n + 125 operations,
we would not care at all about the last two summands, since they are small compared to 8n^2 for larger n;
also, we usually do not care about the constant factor 8, because this factor is too dependent on the
specific computer/theoretical model used. Thus, we just say that the running time of our algorithm is
on the order of n^2. Formally, we use the asymptotic notation (also called the Landau notation
or the Big Oh notation). Given two functions f(n) and g(n), we say that:
- f(n) = O(g(n)) if there exist constants C>0 and n0, such that for each n>n0 we have f(n) < C * g(n);
- f(n) = Omega(g(n)) if there exist constants c>0 and n0, such that for each n>n0 we have f(n) > c * g(n);
- f(n) = Theta(g(n)) if f(n) = O(g(n)) and f(n) = Omega(g(n)).
Intuitively, f(n) = O(g(n)) says "f(n) is on the order of at most g(n)", f(n) = Omega(g(n)) says
"f(n) is of the order of at least g(n)", f(n) = Theta(g(n)) says "f(n) is exactly of order g(n)".
(Two more notations are sometimes used: o(g(n)), "of order smaller than g(n)", and omega(g(n)), "of order greater than
g(n)" -- we will not be using them, though, so we do not define them here.) This notation is a bit weird
for modern mathematics -- since O(g(n)) actually denotes a class of functions of order at most g(n),
it would be more logical to say f(n)∈O(g(n)); yet, we use the equality notation for historical
reasons.
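As a small worked example (ours, not from the lecture): for f(n) = 8n^2 + 4n + 125 we have
f(n) = Theta(n^2). Indeed, for n > 13 we have 4n + 125 < n^2, so f(n) < 9n^2, which gives
f(n) = O(n^2) with C = 9 and n0 = 13; and f(n) > n^2 for every n > 0, which gives f(n) = Omega(n^2)
with c = 1 and n0 = 0.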
Typical algorithms could have time complexities of order Theta(n^d) for d>=0 (typically d is an integer, but
real values are possible too); the greater d, the greater the order. Binary search was Theta(log n), which is
O(n^d) for any d>0, but Omega(1). Since O(1) means a function that is bounded by a constant, we sometimes use the
notation n^O(1) to denote any function on the order of at most some polynomial (without giving the actual degree --
only that it is bounded by a constant). Any polynomial is O(2^n), 2^n is O(3^n), 3^n is O(n!), n! is O(n^n), etc.
Memory complexity
Another very important characteristic is the memory complexity -- i.e., the amount of working memory used
by our algorithm. Most of what we said about time complexity also applies to memory complexity, in particular,
we prefer to abstract from the technicalities and just use the asymptotic notation.
Example algorithms and their time and memory complexities
will be given in the next lecture.