5. Analysis of Algorithms
AEA 2021/2022
Based on
Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice - ch. 4
Content
Introduction
I. Worst-Case and Average-Case Analysis
II. Amortized Analysis
III. Smoothed Analysis
Computational Testing
Representative Operation Counts
Introduction
Efficient optimal algorithms for many optimization problems from industrial applications do not exist.
TSP: enumerate all tours and choose the shortest one; there are (n−1)!/2 tours to evaluate; at 1 ns per tour, this takes about 2 years for an instance with 23 points.
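The blow-up of exhaustive search is easy to reproduce. The sketch below (pure Python; the helper name `brute_force_tsp` is ours, not from the slides) fixes the first city and skips each tour's reversal, so that exactly (n−1)!/2 tours are evaluated:

```python
from itertools import permutations
from math import dist, factorial

def brute_force_tsp(points):
    """Enumerate all (n-1)!/2 distinct tours (n >= 3) and return the shortest.

    Fixing the first city removes rotational symmetry; skipping the
    reversed twin of each permutation removes directional symmetry.
    """
    n = len(points)
    first, rest = points[0], points[1:]
    best_len, best_tour = float("inf"), None
    seen = 0
    for perm in permutations(rest):
        if perm[0] > perm[-1]:          # skip the reversed duplicate
            continue
        seen += 1
        tour = (first,) + perm
        length = sum(dist(tour[i], tour[(i + 1) % n]) for i in range(n))
        if length < best_len:
            best_len, best_tour = length, tour
    assert seen == factorial(n - 1) // 2  # the count from the slides
    return best_tour, best_len
```

Already for a dozen points this becomes unusably slow, which is exactly the point of the estimate above.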
If we relax the condition of optimality and the travel costs satisfy the triangle inequality, the Christofides heuristic finds a tour at most 1.5 times longer than the optimal one (in less than 1 s):
1. build a minimum spanning tree (MST) on the set of all cities
2. compute a minimum-weight matching (MWM) on the set of nodes having odd degree in the MST; combine the MST with the MWM
3. create an Eulerian cycle in the combined graph and traverse it, taking shortcuts past already-visited nodes
Christofides heuristic - example
Introduction
An algorithm is "efficient" if its running time grows polynomially in the input size, for all input instances.
- Worst-case analysis: find an upper bound on the running time over all instances.
- The ideal case: an algorithm whose worst-case running time is polynomially bounded; for NP-hard problems no such algorithm is known (and none exists unless P = NP).
- Versions of a problem:
  - the decision version: Is there a TSP tour of length at most b?
  - the optimization version: What is the length of the shortest TSP tour?
  - the construction version: Which TSP tour has minimum length?
Introduction
Dealing with the lack of efficient algorithms:
- relax the requirement of solving optimally
- give up the requirement that an algorithm should be efficient on every input; such heuristics perform poorly in the worst case, but work well on inputs from practical applications
2-Opt heuristic
- Remove two edges from the tour and reconnect the two resulting paths (a 2-opt move). Perform the move only if the new tour is shorter.
- Continue removing and reconnecting until no improving 2-opt move can be found.
On real-world Euclidean TSP instances, the 2-Opt heuristic needs a sub-quadratic number of iterations until it reaches a local optimum.
The approximation ratio of the 2-Opt heuristic for Euclidean TSP instances with n points is Θ(log n / log log n).
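The two bullet points translate almost directly into code. This is a minimal sketch (function names ours), taking a distance callback so it works for any metric:

```python
from math import dist

def tour_length(tour, d):
    n = len(tour)
    return sum(d(tour[i], tour[(i + 1) % n]) for i in range(n))

def two_opt(tour, d):
    """Apply improving 2-opt moves until a local optimum is reached.

    A move removes edges (tour[i], tour[i+1]) and (tour[j], tour[j+1])
    and reconnects the two paths by reversing the segment between them;
    it is applied only if it strictly shortens the tour.
    """
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue           # the two removed edges would coincide
                a, b = tour[i], tour[i + 1]
                c, e = tour[j], tour[(j + 1) % n]
                if d(a, c) + d(b, e) < d(a, b) + d(c, e) - 1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

# usage: a crossing tour on the unit square is uncrossed to the perimeter
crossing = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
best = two_opt(crossing, dist)
```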
Introduction
- Elementary operation: an operation whose execution time is bounded by a constant.
How to deal with arithmetic operations?
- Logarithmic model: the costs of arithmetic operations depend on the lengths of the representations of the (large) numbers involved.
- Uniform model: arithmetic operations are assumed to be elementary.
Introduction
Definition
f : N → R is an asymptotically positive function if there exists a natural number n_0 s.t. f(n) > 0 for all n ≥ n_0. Let F⊕ be the set of asymptotically positive functions, f, g ∈ F⊕.
Function f belongs to O(g) (f is of growth order at most g) iff there exist positive constants c and n_0 s.t. f(n) ≤ c·g(n) for all n ≥ n_0.
Function f belongs to Ω(g) iff g ∈ O(f).
Θ(g) = O(g) ∩ Ω(g); f ∈ Θ(g): f and g have the same order of growth.
f ∈ o(g) (f grows slower than g) iff lim_{n→∞} f(n)/g(n) = 0.
f ∈ ω(g) (f grows faster than g) iff lim_{n→∞} f(n)/g(n) = ∞.
Polynomial-time algorithm: the running time is O(p(n)), for some polynomial p(n).
Content
Introduction
I. Worst-Case and Average-Case Analysis
II. Amortized Analysis
III. Smoothed Analysis
Computational Testing
Representative Operation Counts
Worst-Case Analysis
The worst-case analysis yields an upper bound on the running time of an algorithm over all input instances.
Let T(i), i = 1, ..., k, be the running time of a single execution of line i multiplied by the maximum number of times the line is executed.
Definition
A line i dominates a line j if T(i) > T(j). A line that is not dominated by any other line is called a bottleneck.
Definition
The worst-case running time of an algorithm A is the maximal number of steps that algorithm A can perform on an instance of a given size.
Corollary
Let S be the set of those lines of algorithm A that are not dominated by other lines. The worst-case running time of A can be bounded by O(Σ_{i∈S} T(i)).
Worst-Case Analysis - example
The minimum weight spanning tree problem: G = (V, E), E = {e1, ..., em}, n vertices, a weight function c : E → R; find a connected cycle-free subgraph of minimum total weight.
1 Sort the edges s.t. c(e1) ≤ c(e2) ≤ ... ≤ c(em);
2 T ← (V, ∅);
3 for i ← 1 to m do
4   if T + ei contains no cycle then
5     T ← T + ei
Algorithm 1: Kruskal's algorithm
Sorting the edges takes O(m log m) (mergesort); lines 3 and 4 are executed m times, line 5 at most n times, and proving the non-existence of a cycle (BFS) takes O(n); the worst-case running time: O(m log m + 1 + m + mn + n) = O(mn).
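Algorithm 1 can be sketched as follows. The only liberty taken is replacing the O(n) BFS cycle test with a union-find structure, the standard way to make the loop fast, so treat this as an illustration rather than the exact procedure analyzed above:

```python
def kruskal(n, edges):
    """Kruskal's algorithm; edges given as (weight, u, v) with 0 <= u, v < n.

    A union-find structure replaces the BFS cycle test of the slides:
    adding edge (u, v) creates a cycle iff u and v are already in the
    same component.
    """
    parent = list(range(n))

    def find(x):                      # find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, total = [], 0
    for w, u, v in sorted(edges):     # O(m log m), as with mergesort
        ru, rv = find(u), find(v)
        if ru != rv:                  # no cycle: take the edge
            parent[ru] = rv
            tree.append((u, v))
            total += w
    return tree, total

# usage: a 4-vertex graph; edge (0, 2) of weight 3 closes a cycle and is skipped
mst, weight = kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)])
```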
Worst-Case Analysis
- Disadvantage: if an algorithm performs badly on a single instance, this instance determines the worst-case running time
  - there are algorithms with exponential worst-case running time that are used in practice
- Advantages
  - it yields an upper bound on the running time over all instances
  - it provides theoretical insight into the behavior of an algorithm; it helps to identify bottleneck operations
  - it can usually be performed easily and fast
Average-Case Analysis
Idea: average the running time over all valid instances of the same size. The instances on which the algorithm performs badly may be rare in practice.
Let A be an algorithm, I_n the set of all instances of length n, g_A : I_n → N the function mapping each instance I ∈ I_n to the number of basic steps performed by A on I, and f : I_n → [0,1] the density function of a probability distribution on I_n. The average-case running time on the set of instances of length n is:

T_AVG(n) = ∫_{I∈I_n} f(I) · g_A(I) dI

If I_n is finite, the density function is a discrete probability function p_I. If all instances are equally likely, p_I = 1/|I_n|, and the average running time is (1/|I_n|) Σ_{I∈I_n} g_A(I).
Average-Case Analysis
- Advantage: single instances do not influence the running time by too much
- Disadvantages
  - one must know the probability distribution of the input instances; a uniform distribution is not a realistic assumption in most cases
  - not robust under changes of the computational model
Example:
Let I_n = {0,1}^n be the set of all instances of a fixed length n, each occurring with equal probability 1/2^n, and let D ⊆ I_n have cardinality 2^n(1 − 2^{−0.1n}). Algorithm A runs in polynomial time p(n) on every instance from D and in 2^{0.09n} time on all other instances.
The average-case running time:

T_AVG1 = Σ_{I∈D} (1/2^n) p(n) + Σ_{I∈I_n∖D} (1/2^n) 2^{0.09n}
       = (1/2^n) [2^n p(n) − 2^{0.9n} p(n) + 2^{0.99n}]
       = p(n) − (1/2^{0.1n}) p(n) + 1/2^{0.01n}    (1)
Average-Case Analysis
- Disadvantages (Example cont.)
Consider a quadratic increase in the running time of the algorithm: A needs p²(n) time on instances from D and 2^{0.18n} time on instances from I_n∖D. The average-case running time is no longer polynomial:

T_AVG2 = p²(n) − (1/2^{0.1n}) p²(n) + 2^{0.08n}

- An algorithm with average polynomial running time on one machine model can have an average exponential running time on another machine model
- Although an algorithm may have a small average-case running time, it can perform badly on some instances
Content
Introduction
I. Worst-Case and Average-Case Analysis
II. Amortized Analysis
III. Smoothed Analysis
Computational Testing
Representative Operation Counts
Amortized Analysis
"to average the running times of operations in a sequence over the sequence" (Tarjan)
Used for analyzing online algorithms and data structures.
Definition
The actual cost c_i of an operation is the amount of time (number of basic steps) actually consumed by this operation. The total actual cost of a sequence of m operations is Σ_{i=1}^m c_i.
Idea: assign each operation an amortized cost a_i; if a_j > c_j, the difference is stored and used to pay for operations with a_i < c_i, i ≠ j. We require Σ_{i=1}^m c_i ≤ Σ_{i=1}^m a_i. Amortized analysis gives a worst-case bound for a sequence of operations.
Amortized Analysis - Example
The manipulation of a stack: operations POP and PUSH consume one time unit each. Operation MULTI consists of zero or more POP operations and exactly one PUSH after all POPs.
Consider a sequence of m operations (PUSH, POP, or MULTI) executed on an initially empty stack. The worst-case running time of a single MULTI is O(m), so the naive bound on the worst-case running time of the sequence is O(m²).
In fact, the running time is bounded by 2m (the stack is initially empty, and there are at most m PUSHes and hence at most m POPs).
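The 2m bound can be checked empirically with a small instrumented stack (class and names ours):

```python
import random

class CountingStack:
    """Stack whose operations count their actual cost: one unit per
    elementary PUSH or POP, as in the slides."""

    def __init__(self):
        self.items = []
        self.cost = 0

    def push(self, x):
        self.items.append(x)
        self.cost += 1

    def pop(self):
        self.cost += 1
        return self.items.pop()

    def multi(self, k, x):
        """k POPs (or as many as the stack holds), then exactly one PUSH."""
        for _ in range(min(k, len(self.items))):
            self.pop()
        self.push(x)

# run a random sequence of m operations on an initially empty stack
rng = random.Random(0)
s = CountingStack()
m = 1000
for _ in range(m):
    op = rng.choice(["push", "pop", "multi"])
    if op == "push":
        s.push(0)
    elif op == "pop" and s.items:
        s.pop()
    elif op == "multi":
        s.multi(rng.randrange(5), 0)

assert s.cost <= 2 * m   # total actual cost is bounded by 2m
```

Each pushed element pays one unit when pushed and at most one unit when popped, so the total cost never exceeds twice the number of operations, regardless of the sequence.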
1. Aggregate Analysis
The amortized cost per operation: a_i = (1/m) Σ_{j=1}^m c_j, for all 1 ≤ i ≤ m
- m is the length of the sequence, Σ_{j=1}^m c_j the total actual cost
- Observation: the amortized costs are the same for every operation
Example (stack manipulation)
- the actual cost of PUSH and POP is 1; the actual cost of MULTI is min(s, k) + 1, with s the number of objects on the stack and k the number of POPs in the MULTI
- at most m PUSHes in a sequence of length m → at most m POPs (single POPs or POPs contained in MULTIs)
→ the total actual cost is bounded by 2m → O(m)
- the amortized cost per operation is O(1)
2. The Accounting Method (the banker’s view)
Explicitly allows different amortized costs for different operations.
If the amortized cost exceeds the actual cost, the difference (credit) is stored and is used to pay for operations where the amortized cost is less than the actual cost.
The total credit of a sequence of length k: Σ_{i=1}^k a_i − Σ_{i=1}^k c_i. To bound the total actual cost by the total amortized cost, the total credit must be nonnegative. If the length of the sequence is known in advance, this condition is sufficient; otherwise, the total credit has to be nonnegative after every operation:

Σ_{i=1}^k c_i ≤ Σ_{i=1}^k a_i, for all k ≥ 1.
The crucial step: the allocation of the amortized costs.
The Accounting Method
Example (stack manipulation)
- Assign an amortized cost of 2 to PUSH and 0 to POP → the amortized cost of MULTI is 2
- For every PUSH, store one unit of amortized cost, used later to pay for the POPs
- The stack is initially empty and there are at most as many POPs as PUSHes, so Σ_{i=1}^k c_i ≤ Σ_{i=1}^k a_i for all k ≥ 1.
3. The potential method (physicist’s view)
The difference a_i − c_i is not stored as a credit, but is reflected by the change of a potential function.
- An object is modified by a sequence of m operations (e.g. a data structure modified by a sequence of insertions or deletions).
- D is the set of all configurations of the object, D_0 the initial configuration, D_i the configuration after the ith operation, i = 1, ..., m.
- Choose a potential function φ : D → R which maps each configuration to a real number.
- The amortized cost of the ith operation:

a_i = c_i + φ(D_i) − φ(D_{i−1})

If the change in the potential function φ is positive, the difference is stored as "potential energy"; otherwise, the stored "energy" is used to perform the operation.
The potential method
Σ_{i=1}^m a_i = Σ_{i=1}^m c_i + φ(D_m) − φ(D_0)

The total amortized cost must bound the total actual cost, so φ(D_m) − φ(D_0) must be nonnegative. If the length of the sequence is not known in advance, require φ(D_k) − φ(D_0) ≥ 0 for all k ≥ 1.
The potential method
Example (stack manipulation)
- the potential function: the number of objects on the stack
- start with the empty stack → φ(D_0) = 0, and φ is nonnegative for all other D_i
→ the total amortized cost is an upper bound on the total actual cost
The potential method
- compute the amortized cost for each operation; let l be the number of items on the stack before the ith operation
POP: a_i = c_i + φ(D_i) − φ(D_{i−1}) = 1 + (l − 1) − l = 0
PUSH: a_i = c_i + φ(D_i) − φ(D_{i−1}) = 1 + (l + 1) − l = 2
MULTI (s POPs and one PUSH): a_i = c_i + φ(D_i) − φ(D_{i−1}) = (s + 1) + (l − s + 1) − l = 2
- a_i ≤ 2 for i ∈ {1, ..., m} → the total amortized cost is bounded by 2m
- for a sequence of length k, the potential is nonnegative for all k ≥ 1 → the total actual cost is bounded by 2k
Amortized analysis: example
Binary counter: counting from 0 to n = 2^k − 1;
b[0..k−1], with b[0] the least significant bit and b[k−1] the most significant bit
1 i ← 0
2 while i < k and b[i] = 1 do
3   b[i] ← 0
4   i ← i + 1
5 b[i] ← 1
6 return b
Algorithm 2: increment(b[0..k−1])
1 b[0..k−1] ← 0; write b[0..k−1]
2 for j ← 1 to n do
3   b[0..k−1] ← increment(b[0..k−1])
4   write b[0..k−1]
Algorithm 3: counting(n, k)
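Algorithms 2 and 3 in executable form, instrumented to count bit flips (the overflow guard on the final assignment is ours; the slides assume the counter never overflows):

```python
def increment(b):
    """Algorithm 2 of the slides; returns the number of bit flips."""
    i, flips = 0, 0
    while i < len(b) and b[i] == 1:
        b[i] = 0            # flip 1 -> 0
        i += 1
        flips += 1
    if i < len(b):          # guard against overflow (not in the slides)
        b[i] = 1            # flip 0 -> 1
        flips += 1
    return flips

def count_flips(n, k):
    """Total bit flips when performing n increments on a k-bit counter."""
    b = [0] * k
    return sum(increment(b) for _ in range(n))

# n increments flip fewer than 2n bits in total -> O(1) amortized
assert count_flips(1000, 11) < 2000
```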
Binary counter
Worst-case running time: increment O(k), counting O(kn), with k = log n; the amortized analysis shows the total running time is in fact Θ(n).
1. Aggregate method
- increment does not flip Θ(log n) bits every time:
the least significant bit b[0] flips in every iteration, b[1] flips in every other iteration, b[2] in every 4th iteration; in general, b[i] flips in every 2^i-th iteration
Binary counter
1. Aggregate method
- a sequence of n increments flips each bit b[i] exactly ⌊n/2^i⌋ times
- the total number of bit-flips for the entire sequence:

Σ_{i=0}^{⌊lg n⌋} ⌊n/2^i⌋ < Σ_{i=0}^{∞} n/2^i = 2n

- on average, each call to increment flips fewer than two bits, and therefore runs in constant time
- the amortized time for each increment is O(1)
Binary counter
2. Accounting method
Setting a bit from 0 to 1: charge a two-dollar increment tax; one of those dollars is spent changing the bit from 0 to 1; the other is stored as credit until we need to reset the bit to 0.
The amortized cost of an increment is 2.
Total actual cost: Σ_{i=1}^n c_i ≤ Σ_{i=1}^n 2 = 2n
Binary counter
3. Potential method
- the potential φ_i after the ith increment: the number of bits with value 1
- initially, all bits are equal to zero: φ_0 = 0; φ_i > 0 for all i > 0
- define:
c_i = #bits changed from 0 to 1 + #bits changed from 1 to 0
φ_i − φ_{i−1} = #bits changed from 0 to 1 − #bits changed from 1 to 0
- the amortized cost of the ith increment is

a_i = c_i + φ_i − φ_{i−1} = 2 × #bits changed from 0 to 1

- since increment changes only one bit from 0 to 1, the amortized cost of increment is 2
Content
Introduction
I. Worst-Case and Average-Case Analysis
II. Amortized Analysis
III. Smoothed Analysis
Computational Testing
Representative Operation Counts
Smoothed Analysis
- Worst-case analyses are often too pessimistic, and average-case analyses are often problematic
- Smoothed analysis: a hybrid of the worst-case and average-case models;
in the 1st step, an adversary specifies an arbitrary input; in the 2nd step, the input is slightly perturbed at random (σ is the magnitude of the perturbation).
The smoothed running time is the worst expected running time that the adversary can achieve.
Smoothed Analysis
Let A be an algorithm, I an input for A, C_A(I) a complexity measure, and I_n the inputs of length n. The worst-case complexity is

C_A^worst(n) = max_{I∈I_n} C_A(I)

Given a family of probability distributions μ_n on I_n, the average-case complexity is

C_A^avg(n) = E_{I∼μ_n}[C_A(I)]

Let per_σ(I) be the random variable that describes the instance obtained from I by a perturbation of magnitude σ. The smoothed complexity is

C_A^smooth(n, σ) = max_{I∈I_n} E[C_A(per_σ(I))]
Smoothed Analysis
Figure: Complexity measures; σ_1 < σ_2 → C_A^smooth(n, σ_1) > C_A^smooth(n, σ_2)
Interpolate between worst- and average-case analyses by adjusting σ (for σ → 0, a worst-case analysis; for σ → ∞, an average-case analysis).
Machine learning
k-means clustering: partition a set of d-dimensional vectors Q = {q_1, ..., q_n} into k clusters Q_1, ..., Q_k so that the intra-cluster variance

V = Σ_{i=1}^k Σ_{q_j∈Q_i} ||q_j − μ(Q_i)||²

is minimized; μ(Q_i) = (Σ_{q_j∈Q_i} q_j)/|Q_i|.
Lloyd’s algorithm
- choose an arbitrary set of k centers
- partition Q into k clusters (use the Voronoi diagram of the centers)
- repeat the process:
  - use the centroids of the current clusters as the new centers
  - re-partition Q accordingly
In the worst case, the number of iterations can be as large as 2^{Ω(√n)}.
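A minimal sketch of Lloyd's iteration in pure Python (names ours; real implementations would vectorize the assignment step):

```python
from math import dist

def lloyd(points, centers, rounds=100):
    """Lloyd's iteration: assign each point to its nearest center
    (Voronoi-style), then move every center to the centroid of its
    cluster; stop when the centers no longer change."""
    centers = [tuple(c) for c in centers]
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:    # local optimum reached
            break
        centers = new_centers
    return centers

# usage: two well-separated groups converge in two rounds
centers = lloyd([(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)],
                [(0.0, 0.0), (10.0, 0.0)])
```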
Lloyd’s algorithm
Lloyd's algorithm has polynomial smoothed complexity.
Theorem
Fix an arbitrary set X_0 ⊆ [0,1]^d of n points and assume that each point in X_0 is independently perturbed by a normal distribution with μ = 0 and stdev σ, yielding a new set X of points.
The expected running time of k-means on X is bounded by a polynomial in n and 1/σ.
Binary Optimization Problems
An instance I of a linear binary optimization problem Π consists of a set of feasible solutions S ⊆ {0,1}^n and a linear objective function f : {0,1}^n → R, f(x) = c^T x, c ∈ R^n, to maximize (or minimize).
Examples: the minimum spanning tree problem, the knapsack problem, TSP, etc.
Study these problems in a probabilistic input model, where the coefficients are randomly perturbed.
Binary Optimization Problems
Asemi-random model: the input model combines adversarial and random decisions:
- 1st step: the coefficients are chosen by an adversary (real-valued coefficients from [−1, 1])
- 2nd step: the numbers specified by the adversary are perturbed (by adding independent Gaussian random variables with μ = 0 and stdev σ)
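The two-step semi-random model is straightforward to simulate (names ours):

```python
import random

def perturb(coefficients, sigma, rng=random):
    """Second step of the semi-random model: add independent Gaussian
    noise with mean 0 and standard deviation sigma to each adversarial
    coefficient (assumed to lie in [-1, 1])."""
    return [c + rng.gauss(0.0, sigma) for c in coefficients]

rng = random.Random(42)
adversarial = [1.0, -0.5, 0.25, -1.0]          # chosen by the adversary
smoothed = perturb(adversarial, sigma=0.01, rng=rng)
```

As σ → 0 the smoothed instance approaches the adversarial one (worst case); large σ drowns the adversary's choice in noise (average case).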
Smoothed Analysis of Binary Optimization Problems
Let I_N be the set of unperturbed instances of length N and per_σ(I) the random instance obtained by a perturbation of an instance I ∈ I_N with magnitude σ.
Π has polynomial smoothed complexity iff it admits a polynomial P and a randomized algorithm A whose running time C_A satisfies

Pr[C_A(per_σ(I)) ≥ P(N, 1/σ, 1/p)] ≤ p for every N ∈ N, σ ∈ (0,1], p ∈ (0,1], I ∈ I_N.

With probability at least 1 − p, the running time of A is polynomially bounded in the input length N, 1/σ, and 1/p.
Theorem
A linear binary optimization problem Π has polynomial smoothed complexity iff there exists a randomized algorithm for solving Π whose expected worst-case running time is pseudo-polynomial.
Smoothed Analysis of Binary Optimization Problems
Examples
- The knapsack problem (solvable by dynamic programming in pseudo-polynomial time) has polynomial smoothed complexity, even if the weights are fixed and only the profits are randomly perturbed.
I TSP does not have polynomial smoothed complexity when only the distances are randomly perturbed
2-opt heuristic for TSP in the plane
Input: n points in [0,1]×[0,1]
Goal: define a tour x_1, ..., x_n minimizing Σ_{i=1}^n ||x_{i+1} − x_i||_1 (indices taken cyclically), where ||x||_1 is the l1 norm of x (|x(1)| + |x(2)|).
There is a polynomial-time approximation scheme (but it is not particularly practical).
2-opt local search: in the worst case, the algorithm might require an exponential number of iterations before finding a locally optimal solution.
2-opt heuristic
The adversary chooses an arbitrary set of pointsx1, . . . ,xn in [0,1]×[0,1], which are then randomly perturbed.
Combine these two steps into an equivalent one: an adversary picks a density function f_i from which each point x_i is drawn.
Assume the distributions are "not too spiky": f_i(x) ≤ 1/σ for all i and all x ∈ [0,1] × [0,1] → the support of f_i has area at least σ.
Theorem: The smoothed complexity of the 2-opt heuristic for the TSP in the plane with the l1 metric is polynomial. In particular, the expected number of iterations is O(σ^{−1} n^6 log n).
2-opt heuristic
Sufficient condition: every swap used by the local search algorithm results in a significantly improved tour.
Consider a local move that begins with some tour, removes the edges (u,v), (x,y), and replaces them with (u,x), (v,y).
The decrease in the TSP objective under this swap is

||u − v||_1 + ||x − y||_1 − ||u − x||_1 − ||v − y||_1, that is,
|u_1 − v_1| + |u_2 − v_2| + |x_1 − y_1| + |x_2 − y_2| − |u_1 − x_1| − |u_2 − x_2| − |v_1 − y_1| − |v_2 − y_2|. (*)
2-opt heuristic
Prove the Theorem by
- lower bounding the probability that there is no ε-bad swap (as a function of ε), and
- upper bounding the number of iterations when this condition is satisfied.
Lemma: For every perturbed instance and every ε > 0, the probability (over the perturbation) that there is an ε-bad swap (an improving swap that decreases the tour length by less than ε) is O(n^4 ε/σ).
2-opt heuristic
The l1 distance between any pair of points is at most 2 → every tour has length at most 2n.
If there are no ε-bad swaps, then the local search must terminate within 2n/ε iterations.
Because every iteration strictly improves the quality of the current tour, the worst-case number of iterations of the algorithm is at most the number of different tours, which is bounded above by n!.
Using these observations and the Lemma (with ε = 2n/M):

E[# of iterations] = Σ_{M=1}^{n!} Pr[# of iterations ≥ M]
                   ≤ Σ_{M=1}^{n!} Pr[there is a (2n/M)-bad swap]
                   ≤ Σ_{M=1}^{n!} O(n^5/(Mσ)) = O(σ^{−1} n^6 log n)    (2)
Content
Introduction
I. Worst-Case and Average-Case Analysis
II. Amortized Analysis
III. Smoothed Analysis
Computational Testing
Representative Operation Counts
Computational Testing
Many optimization problems arising from real-world applications (TSP, packing problems, the Steiner tree problem) are known to be computationally hard. The goal is to solve instances optimally in "reasonable" time.
- Worst-case analysis is often too pessimistic and might prevent us from applying an algorithm
- Even if a problem admits a polynomial-time algorithm, this does not imply practical usefulness
Consider not only theoretical, but also experimental measures of performance. An experimental measure for the performance of an algorithm is the time needed to solve a set of test instances.
Computational Testing
Evaluating algorithms under the criterion ofcomputational running time means to measure theCPU time each algorithm needs to solve the set of instances.
Definition
The CPU time a computer C needs to solve an instance I, using algorithm A, is the actual time (measured by a stopwatch) C needs to solve I if 100% of the CPU power can be used to run A.
To compare algorithms using the CPU time:
- use the same programming language for the implementations
- create instances (e.g. using randomization)
- run each algorithm (on the same computer) on the instances while measuring the CPU time
Computational Testing
In practical settings, computational testing is a reasonable way to analyze an algorithm.
- problems:
  - the CPU time depends on the computational environment
  - the programming skills of the implementer matter
  - algorithms should not only be fast from a theoretical point of view, but also allow efficient implementations
  - it requires an implementation, whereas a theoretical analysis needs only the pseudocode
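In Python, the "100% of the CPU power" assumption of the definition is approximated by measuring process time rather than wall-clock time; a minimal harness (names ours) might look like:

```python
import time

def cpu_time(algorithm, instances):
    """Total CPU time an algorithm needs on a set of test instances.

    time.process_time() counts time this process actually spends on the
    CPU, excluding sleeps and other processes, which approximates the
    'stopwatch with 100% of the CPU' in the definition above."""
    start = time.process_time()
    for instance in instances:
        algorithm(instance)
    return time.process_time() - start

# usage: time sorted() on 50 reversed lists of 1000 elements
t = cpu_time(sorted, [list(range(1000, 0, -1)) for _ in range(50)])
```

Such measurements remain environment-dependent, which is exactly the disadvantage listed above.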
Content
Introduction
I. Worst-Case and Average-Case Analysis
II. Amortized Analysis
III. Smoothed Analysis
Computational Testing
Representative Operation Counts
Representative Operation Counts
- Identify operations that dominate the running time of an algorithm and specify the CPU time in O-notation by considering only these operations (bottlenecks)
- Does not provide a theoretical upper bound on the number of performed operations, as worst-case analysis does
- Experimentally counts the number of executions of the bottlenecks
Identifying Representative Operations
Let
- A be the code of a computer program (lines a_1, ..., a_K of code),
- the time for executing a_i be θ(1),
- α_k(I), k = 1, ..., K, the number of times a_k is executed while running A on instance I.
The CPU time A takes to solve I, CPU(I), lies within a constant factor of the number of times each line of code is executed.
Lemma
CPU(I) = θ(Σ_{k=1}^K α_k(I))
Identifying Representative Operations
Example:
1 sum ← 0; prod ← 1; i ← 1;
2 while i ≤ n do
3   sum ← sum + i;
4   prod ← prod · i;
5   i ← i + 1;
Algorithm 4: Summation and Multiplication
Count the number of times line 3 is executed (representative operation)
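Algorithm 4, instrumented to record α_k(I) for each counted line (the counter keys are ours):

```python
from collections import Counter

counts = Counter()   # alpha_k(I): executions of each counted line

def summation_and_multiplication(n):
    """Algorithm 4 with explicit operation counts; the addition on
    'line 3' is taken as the representative operation."""
    total, prod, i = 0, 1, 1
    while i <= n:
        counts["line3_add"] += 1
        total = total + i
        counts["line4_mul"] += 1
        prod = prod * i
        i += 1
    return total, prod

summation_and_multiplication(10)
# every line of the loop body runs the same number of times, so any one
# of them forms a representative set on its own
assert counts["line3_add"] == counts["line4_mul"] == 10
```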
Identifying Representative Operations
Definition
Let S ⊆ {1, ..., K}, a_S = {a_k : k ∈ S}.
a_S is a representative set of lines of a program code if there exists c ∈ R s.t.

α_i(I) ≤ c · Σ_{k∈S} α_k(I), for every instance I and every line a_i of code.

The number of times a_i is executed is bounded, or dominated up to a constant factor, by the number of times the lines from S are executed.
Corollary
Let a_S be a representative set of lines of a program code. Then CPU(I) = θ(Σ_{k∈S} α_k(I)).
Applications of Representative Operation Counts
Representative operations can be used to:
1. identify operations that asymptotically have a strong influence on the running time of an algorithm
2. define virtual running time (tool for estimating the CPU time)
1. Applications of Representative Operation Counts
Let a_S be a set of representative operations and α_S = Σ_{k∈S} α_k.
Definition
An operation is called an asymptotic non-bottleneck operation if its share of the computation time approaches zero as the problem size increases (otherwise, it is an asymptotic bottleneck operation).
To find asymptotic bottleneck operations: plot α_k/α_S, for all k ∈ S, over increasing instance sizes; the operations whose fractions show a non-decreasing trend for growing problem sizes are the asymptotic bottlenecks.
Counting representative operations → identify bottleneck operations → improve the running time.
1. Applications of Representative Operation Counts
Comparing Algorithms Using Representative Operations
Let a_{S_1} and a_{S_2} be sets of representative operations for algorithms A_1 and A_2, respectively. A_1 is asymptotically superior to A_2 if and only if

lim_{|I|→∞} (Σ_{k∈S_1} α_k(I)) / (Σ_{k∈S_2} α_k(I)) = 0.
2. Applications of Representative Operation Counts
Virtual Running Time: an approach towards a machine-independent measure, like worst-/average-case analysis.
The virtual running time of algorithm A on instance I with representative operations a_1, ..., a_K is

V_A(I) = c_1 · α_1(I) + ... + c_K · α_K(I), with c_1, ..., c_K ∈ R≥0.

c_1, ..., c_K can be computed using the least-squares method on the points (CPU(I), α_1(I), ..., α_K(I)). The least-squares method minimizes Σ_{I∈I} (CPU(I) − V_A(I))², with I a set of instances.
Virtual Running Time
Example: network simplex algorithm: V_A(I) = (α_1(I) + 2α_2(I) + α_3(I))/69000. After computing the constants, a comparison of the actual running times on 25 instances with their estimates V_A showed an error of at most 7%.
The instances chosen for the linear regression may have special properties that can lead to under- or overestimation.
Advantages:
- it points out the percentage of the CPU time a representative operation consumes
- it can easily be transferred from one computer to another
- it eliminates effects such as paging and caching