### 7. Implementation Aspects

AEA 2021/2022

Based on

Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice - ch. 6

2/51

### Introduction

The description of an algorithm allow the reader to easily understand the algorithm and its key properties. Sometimes omitted details are highly nontrivial.

An implementation must be: correct,efficient,flexible andeasy to use.

I Correctness: implementations must be correct - performs according to the algorithm’s specification.

I Efficiency: solve small problems fast and large ones in a reasonable time.

I Flexibility: an implementation, especially in a library, should be flexible: modularity, reusability, extensibility, and

adaptability; efficiency and flexibility are competing goals I Ease of Use: integrate the provided code easily, the user must

understand the interface and design quickly

### Content

Correctness Testing Checking Verification Debugging

Efficiency

Tuning the Algorithms Tuning the Code

4/51

### Motivation and Description

Definition

f a function, P a deterministic program that (seemingly) computes f andI the set of all feasible inputs forf (resp. P). Forx ∈I,f(x) (resp. P(x)) is called theoutput off (resp. P) on the inputx. P is”correct” ifP(x) =f(x), ∀x∈I, otherwiseP is called ”buggy”.

I Testing, checking, verifying - methods that detect whether an implemented program satisfies its specification, i.e. the program is correct

I Debugging - methods that help us to debug the program if a bug is found

### Content

Correctness Testing Checking Verification Debugging

Efficiency

Tuning the Algorithms Tuning the Code

6/51

### Testing

Program testing: the process of executing a programP on a subsetT of I and verifying whether P(x) =f(x), ∀x∈T. IfP(x)6=f(x), for anyx ∈T, thenP isbuggy; ifP(x) =f(x),

∀x∈T, we don’t know whether the program is correct (except T =I).

I Thus, testing is the process of finding errors, rather than of increasing the reliability of the program. Nevertheless, testing can provide some confidence in the correctness of a program.

I The central job in testing is to find a test setT for which it is likely to find an error.

### Testing

I Random Data Generation: the data is randomly chosen from the set of possible inputs; probably uncovers mistakes which occur in many instances.

If the ”actual probability distribution on the input sequences”

is given or a sampling procedure that reflects it, then one can estimate the ”operational reliability”; if no such distribution is given, the random process naturally ignores possible structures in the input domain.

”Black box method”: ignore any structure of the code itself.

8/51

### Testing

I Input Space Partitioning: partitions the input space into classes and executes P on representative elements of each class. The goal: identify classesfor which the execution of elements leads to similar errors.

I The selection of representatives of many different classes should reveal many different errors.

### Space Partitioning

Statement coverage demands on the test set that every statement of the code is executedat least once

I the different classes are formed by the inputs that execute specific statements; these classes are not necessarily disjoint I unfortunately, it may ignore specific classes of mistakes

10/51

### Space Partitioning

Definition

Thecontrol flow graphis constructed from the code by means of some basic programming structures in the following way:

1. Graph (a) is an I/O, an assignment, a procedure call.

2. (b) is anif-then-else statement, 3. (c) is a whileloop, and

4. (d) constitutes the graph of two sequential statements.

### Space Partitioning

Generating test data on the basis of the control flow graph:

I The edge coverage demands that each edge of the control flow graph is executed at least once

I The no. of paths needed is linear in the no. of edges; thus, the approach is applicable if there are no non-reachable edges in the graph.

I Finer than the statement coverage: a non-existingelse branch is covered at least once by the edge coverage.

I The path coverage demands that each possible path in the control graph is traversed at least once

I fulfill the edge coverage condition

I it may discover mistakes that are possibly overlooked by the edge coverage: there may be errors that only occur under specific preconditions of a statement

12/51

### Space Partitioning

If there arewhile loops, the control flow graph contains an infinite no. of paths. Theif-then-elsestatements let the no. of paths increase exponentially in the no. of statements. Thus, a coverage of all possible paths cannot be guaranteed for all programs; design test data that executes many paths.

Drawbacks of Testing:

I provides correctness statements for the executed inputs which may be a very small part of the set of all possible inputs;

therefore the confidence in the correctness of the program may be small;

### Content

Correctness Testing Checking Verification Debugging

Efficiency

Tuning the Algorithms Tuning the Code

14/51

### Checking

Definition

Aprogram checker C_{f} for a computational decision/search
problemf is defined as follows: C_{f}^{P}(x) is a program that satisfies
the following conditions, for any programP (supposedly forf) and
all inputsx∈I (the instances of the problemf):

1. IfP is correct, thenC_{f}^{P}(x) =CORRECT (i.e. P(x) is correct)
2. IfP(x)6=f(x), then C_{f}^{P}(x) =BUGGY (i.e. P is buggy)
Run the programP on any input; if the checker outputs

CORRECT, then it has verified that the program works correctly on the given input. If it outputsBUGGY, then the program is verified to contain a bug.

Note: the checker doesn’t verify that the output ofP is correct for all inputs, butjust for the given input. Inprogram verificationit is proven that the program is correctfor all inputs.

### Checking

Program checking uses the program as a black box: it gives some inputs, checks whether the output is correct, and then it makes a statement about the correctness of the program.

Certifying algorithms request extra services from the program code to simplify the construction of checkers. A certifying algorithm doesn’t only compute the output, it also computes acertificate (witness)that makes it easy for a checker to prove that the computed answer is correct.

16/51

### Checking

Example: graph G = (V,E) is bipartite? i.e., there are two
disjoint subsetsV0,V1 of V with V0∪V1 =V s.t. ∀{u,v} ∈E,
eitheru ∈V_{0},v∈V_{1} or reversely u∈V_{1},v ∈V_{0}.

P - a program that supposedly decides this question. How to check whether a given answer YES or NO is correct?

”a program should justify (prove) its answer in a way that is easily checked by the user of the program.” (K. Melhorn)

Lemma

A graph G = (V,E) is bipartite iff it contains no cycle with odd length.

### Checking

A certifying algorithm: if the answer is YES, the program outputs
YES andV_{0},V_{1} that indicate the partitioning of V (the output
makes it easy to check the correctness of the answer). If the
answer is NO, the algorithm outputs NO and a cycle of odd length.

A certifying bipartiteness checker

1. Find a spanning tree T in G. Put the vertices with even
depth into V_{0} and the vertices with odd depth intoV_{1}.
2. Check for all edges in G\T whether the adjacent nodes are

both in V_{0} or both in V_{1}.

3. If there is such an edge (u,v), output NO and the path between u andv inT, together with (u,v). If there is no such edge, output YES, together with V0 andV1.

18/51

### Checking

I Constructing checkers is not an easy task; The checker itself may be buggy

I A drawback of the concept of certifying algorithms: for some problems it may be difficult to say what a certificate could actually look like, or it may be hard to compute a certificate according to a given description.

### Content

Correctness Testing Checking Verification Debugging

Efficiency

Tuning the Algorithms Tuning the Code

20/51

### Verification

The strongest way of showing the correctness of a program is to verify it (prove that the program fulfills the specification).

To verify if an algorithms solves the problem:

I Experimental analysis (testing): test the algorithm for different instances

I advantage: simplicity

I disadvantage: testing cannot cover always all possible instances

I Formal analysis (proving): prove that the algorithm works for any instance

I advantage: guarantee the correctness

I disadvantage: the difficulty of finding a proof, mainly for complex algorithms

### Formal methods in correctness verification

The main steps in the formal analysis of the correctness of an algorithm:

I Identify the properties of input data (problem’s preconditions) and of the output data (problem’s postconditions).

I Prove that starting from the preconditions and executing the actions specified in the algorithm, obtain the postconditions.

22/51

### Formal methods in correctness verification

I Algorithmstate: the set of the values corresponding to all variables used in the algorithm.

I The state of the algorithm changes from one processing step to another one.

I The algorithm is correct if at the end of the algorithm its state implies the postconditions.

### Formal methods in correctness verification

I Assertions: statements about the program’s state (”asserted”

to be true); used to annotate the algorithms.

Example:

Preconditions: a,b,c distinct real numbers Postconditions: m=min(a,b,c)

if a<b then //{a<b};

if a<c then

//{a<b,a<c};

min ←a //{min =a,a<b,a<c};

else

min ←c //{min=c,c <a,c <b};

24/51

### Steps in correctness verification

I Identify the preconditionsandpostconditions

I Annotate the algorithm with assertions concerning its state s.t.

- the preconditions are satisfied

- the final assertion implies the postconditions

I Prove that by each processing step one arrives from the previous assertion to the next assertion

The calculus of Hoare: a formal logic system which provides a set of logical rules that allow the derivation of statements. The central element of Hoare’s logic is the Hoare triple{P}A{Q}.

### The calculus of Hoare

{P}A{Q} corresponds to acorrectalgorithm (P →^{A} Q) if:

for input data which satisfies the preconditionsP, the algorithm:

I will lead to a result which satisfies the problem’s postconditions Q

I stop after a finite no. of processing steps.

Axioms and rules: the assertion axiom, the rules of consequence (if-then-else), the composition rule (considering the logical AND), the rule of iteration (while-loops). These axioms and rules handle the basic elements of procedural programs.

26/51

### Formal methods in correctness verification

To verify the correctness of an algorithm, analyze the correctness of each processing structure. For each processing structure, there are rules which make the verification easy:

I Sequential statement rule. Aan algorithm consisting of a
sequence of processing steps (A_{1},A_{2},· · ·,A_{n}), Pi−1 andP_{i}
the state of the algorithm before and after executing the
actionA_{i}. The sequential structure’s rule:

IfP =⇒ P_{0},Pi−1 →^{A}^{i} P_{i},i = 1,n and P_{n} =⇒ Q then
P →^{A} Q.

### Sequential statement rule

Example: x andy variables having the valuesa andb, respectively. Swap the values ofx andy.

P :{x =a,y =b},Q:{x=b,y =a}.

//P_{0} ={x =a,y=b};

aux←x;

//P1 ={aux =a,x =a,y =b};

x←y;

//P_{2} ={aux =a,x =b,y =b};

y←aux;

//P3 ={aux =a,x =b,y =a};

P =P_{0},P_{3} =⇒ Q andP_{i}−1→^{A}^{i} P_{i},i = 1,3. According to the
rule of sequential structure it follows that the algorithm is correct.

28/51

### Formal methods in correctness verification

I Conditional statement rule

The conditional structure: if c then A1 else A2

The rule can be stated as follows: If c is well-defined (it can
be evaluated), P∧c →^{A}^{1} Q andP∧c →^{A}^{2}Q thenP →^{A} Q.

The rule suggest that we have to verify the correctness of each branch (both when c is true and when is false).

### Conditional statement rule

Example: the minimum of two distinct real values.

P :{a∈R,b ∈R,a6=b},Q:{m=min{a,b}}.

if a<b then

A_{1} :m←a {a<b,m=a};

else

A_{2} :m←b {a≥b,m=b};

The conditionc is a<b. Ifa<b thenm=a<b thus

m=min{a,b}. In the other casea>b and the processing stepA2

leads tom=b <a thus we obtainm=min{a,b}.

30/51

### Formal methods in correctness verification

I Loop statement rule

Consider the following structure: while c do A

The loop statement is correct if it exists an assertionI (loop invariant) concerning the algorithm’s state and a function t :N→N (termination function) s.t.:

I I is true at the beginning (P =⇒ I).

I I is aninvariant property: ifI is true before executingAandc is true thenI remains true after the execution ofA

(I∧c→^{A}I).

I At the end of the loop (whenc becomes false),Q can be inferred fromI (I∧c =⇒ Q).

I After each execution ofAthe value oft decreases, i.e.

c∧(t(p) =k)→^{A}t(p+ 1)<k, wherep can be interpreted
as a counting variable associated to each loop execution.

I Ifc is true thent(p)≥1 (at least one iteration remains to be executed) and whent becomes 0 then the conditionc becomes false (the algorithm stops after a finite no. of iterations).

### Loop statement rule

Example: finding the minimum in a finite sequence.

A1 : min←a[1] {min=a[1]};

A2 : i ←2 {min=a[1],i = 2};

while i ≤n do

A3 :if min>a[i]then

min←a[i] {min≤a[j],j = 1,i};

A_{4} : i ←i+ 1 {min ≤a[j],j = 1,i−1};

returnmin {i =n+ 1,min≤a[j],j = 1,i−1};

Algorithm 1: minim(a[1..n]) P :n≥1,Q :min≤a[i],i = 1,n.

The loop invariant ismin≤a[j],j = 1,i−1. After the execution ofA2 the property is satisfied. The property remains true after each iteration.

32/51

### Loop statement rule

Example(cont.):

The stopping condition: i =n+ 1; if it is satisfied, then the final assertion implies the postcondition.

The termination function: t(p) =n+ 1−ip,ip the value of indexi
corresponding to iterationp. Since i_{p+1}=i_{p}+ 1 it follows that
t(p+ 1)<t(p) andt(p) = 0 impliesi =n+ 1 which implies the
stopping condition.

### Content

Correctness Testing Checking Verification Debugging

Efficiency

Tuning the Algorithms Tuning the Code

34/51

### Debugging

I A failure: an instancex ∈I for which f(x)6=P(x). The failure became apparent by the execution of the program code.

I Defect: the part of code that is responsible for the existence of the failure.

I The defect causes a state of the program execution that doesn’t fit to the specification (infected state). The infection propagates through the states.

I A state that is not infected is calledsane.

I An infection: a transition from a sane state to an infected state.

Debugging is a search problem: search for that part of the code that is responsible for the infection.

### Debugging

Definition

Acause of a failure is an event preceding the failure, without which the failure would not occur. Aminimum cause is an event for which one cannot subtract any non-empty subset of the circumstances of the event s.t. the failure still occurs.

36/51

### Debugging

To find out which part of the code is responsible for the existence of an infected state:

I inspect the states in between and decide whether they are already infected.

Reasoning about the existence of an infected state usually goes backward.

”Omniscient debugger”: records all program states, stepping backward in time while observing the current state; drawback: the set of data that must be recorded is very large.

### Debugging

Program slicing: a method used for abstracting from programs (start with a subset of a program’s behavior and reduce it to a minimal form that still produces that behavior).

I static: slices are computed statically using a dependence graph

I dynamic: adynamic slice is a set of statements that affected the value of a variable for one specific input; dynamic data dependence information is traversed to compute the slices, using an execution trace of the program.

38/51

### Debugging

Determinenecessary conditions for the sanity of a state; these extensions of the code are calledassertions and have the structure if (condition) then printf; and halt(); The condition determines whether the state is still sane.

### Content

Correctness Testing Checking Verification Debugging

Efficiency

Tuning the Algorithms Tuning the Code

40/51

### Efficiency

I Asymptotic running time vs. Actual running time

I Optimize code which contribute significantly to the runtime I Profiling: by instrumenting the code with timers or tools like

gprof.

Optimizing for a library or for a special problem is a different task:

I for a library - an implementation that is efficient on all possible inputs - several algorithms available or let the user take certain decisions

I for a certain problem - additional information about the input - exploit the information to speed up the algorithm.

### Content

Correctness Testing Checking Verification Debugging

Efficiency

Tuning the Algorithms Tuning the Code

42/51

### Implementation Tricks - Tuning the Algorithms

I Numerical parameters (e.g. the size of some buffer), data structures (balanced binary trees, priority queues),

sub-algorithms (sorting), the algorithm

I Example: computing a maximum cardinality matching in bipartite graphs

M ← some matching in G; for all nodes v in G do

if there is an M-augmenting path p starting in v then augment M byp

return M

Algorithm 2:FordFulkerson(G) I how to find the initial matching (empty vs. greedy);

recommend the use of a heuristic

I how to compute the augmenting path: BFS (shortest
augmenting path) or DFS; clearing the marking for all node
after each augmentation step Θ(n), the runtime of the
algorithm Ω(n^{2}); put on an additional stack each marked node

→lowerboundO(m)

### Implementation Tricks - Tuning the Algorithms

Divide-and-conquer algorithms;stop dividing early: stop dividing if the subproblem becomes smaller thanM and useB to solve it

I quicksort - sort sub-arrays of length smaller than M using insertion sort

I quicksort: if the recursion depth is larger than some bound in O(log n), the sub-array is sorted using heapsort

44/51

### Implementation Tricks - Tuning the Algorithms

I Sentinels: used to remove the end-of-data-structure tests

I Lazy Evaluation: the computations are not performed before their result isused

Example: matrix3 =matrix1∗matrix2; after the assignment,
matrix3 holdsn^{2} values; if only the top left entry is used?

I Over-Eager Evaluation: values are computed just in case they might be used sometimes later

I the length of a linked list

I prefetching: cache the block that has been read; the neighboring blocks are read and cached;

Dynamic arrays: if the size is increased by the amount needed,
fornappend operations→Θ(n^{2}); with prefetchingO(n)
amortized time by doubling the size

### Content

Correctness Testing Checking Verification Debugging

Efficiency

Tuning the Algorithms Tuning the Code

46/51

### Implementation Tricks - Tuning the Code

I Many tricks are specific to hardware, operating system, programming language

I Temporary Objects: str1 += str2 results in str1 = (new Stringbuffer(str1)).append(str2).toString(); the use of an object as return value of a function is one of the most obvious source of temporary objects being created and destroyed shortly after

I return pointers to new objects instead of returning objects I use non-const references instead of returning objects

I use optimizations that modern compilers provide: return value optimization (eliminate the temporary object created to hold a function’s return value)

const Rational

operator * (const Rational& lhs, const Rational& rhs) { return Rational(lhs.numerator() * rhs.numerator(), lhs.denominator() * rhs.denominator());

}

Rational r1, r2, r3;

r3 = r1 * r2

### Implementation Tricks - Tuning the Code

I Built-In Data Types: loading anunsigned charinto a register - 2 instructions less than loading asigned char, less instructions to manipulate an intthan a char /short I Conditional Branches

I pipelining - process operations before the preceding ones are totally completed;

I conditional branches slow down the computation; branch prediction; avoid conditional branches (inside loops)

48/51

### Conditional Branches. Example

if ((character >= ’0’) and ((character <= ’9’)) { characterType = DIGIT;

} else if ((character >= ’A’) and ((character <= ’Z’)) { characterType = UPPERCASE;

} else if ((character >= ’a’) and ((character <= ’z’)) { characterType = LOWERCASE;

} else { characterType = OTHERS; } count[characterType]++;

Solution:

characterType = typeTable[character];

count[characterType]++;

or

int count[256];

forall (character in word) { count[character]++;

}

int digits = 0;

for (int i = ’0’; i <= ’9’; i++) digits += count[i];

### Implementation Tricks - Tuning the Code

I Memory Hierarchy: the cache contains copies of several regions of the main memory that were recently accessed;

accessing an address that is currently cached is much faster than accessing one that is not cached (cache miss).

Cache aware techniques: data structures or algorithms to minimize the no. of cache misses.

I Inlining: copying the function body to the place where the function is called; automatic inlining (inlinekeyword in C/

C++); not only removes the overhead of the function call, but also allows the compiler to perform its optimizations over a larger part of code

50/51

### Implementation Tricks - Tuning the Code

I Argument Passing to Functions: by value or by reference;

passing by value involves copying - a source of inefficiency;

often objects are passed by value to ensure that the calling function doesn’t modify them - a better alternative: const references;mutable

I Squeezing Space to Improve Locality

I reducing space complexity also reduces time complexity (smaller objects fit into faster memory) - ex: bit vectors I the order of member declarations in a class: int- aligned at a

four byte boundary,short - two byte boundary;char, int, char, int, short, int- 6 bytes;char, char, short, int, int, int- 4 bytes.

### Implementation Tricks - Tuning the Code

I Exploiting Algebraic Identities

I replace an arithmetic expression with an algebraically

equivalent expression (cheaper to evaluate); ex: Horner’s rule for polynomial evaluation

I save time by replacing multiplications/divisions by powers of 2 by left and right shift operations

I take care at floating-point operations

I example: comparison of Euclidean distances - not necessary to take square roots

I Exploiting Word Parallelism: using logical bitwise OR on two 64-bit sets, 64 operations in parallel;Broadword computing: efficient n-bit computations for large values ofn.