An Experimental Study of Minimum Mean Cycle Algorithms

UCI-ICS Technical Report #98-32

Ali Dasdan

Department of Computer Science, University of Illinois at Urbana-Champaign, 1304 W. Springfield Ave., Urbana, IL 61801

E-mail: [email protected]

Sandra S. Irani

(Corresponding author) and

Rajesh K. Gupta

Department of Information and Computer Science, 444 Computer Science Building

University of California, Irvine, CA 92697

E-mails: [email protected] and [email protected]

Abstract

We present a comprehensive experimental study of ten leading algorithms for the minimum mean cycle problem. For most of these algorithms, there has not been a clear understanding of their performance in practice although theoretical bounds have been proved for their running times. Only an experimental study can shed light on whether changes in an algorithm that make its running time theoretically more efficient are worth the overhead in terms of their payoff in practice. To this end, our experimental study provides a great deal of insight. In our evaluation, we programmed these algorithms uniformly and efficiently. We systematically compared them on a test suite composed of random graphs as well as benchmark circuits.

Above all, our experimental results provide important insights into the individual performance as well as relative performance of these algorithms in practice. One of the most surprising results of this study is that Howard's algorithm, a well known algorithm to the stochastic control community but a relatively unknown algorithm to the graph theory community, is by far the fastest algorithm on our test suite although the only known bound on its running time is exponential. We also present two new, stronger bounds on its running time.


1. Introduction

1.1. Notation and definitions.

Consider a digraph G = (V, E) with n nodes and m arcs. Associate with each arc e in E two numbers: a cost (or weight) w(e) and a transit time t(e). The weight and the transit time of a path in G are equal to the sum of the weights and of the transit times, respectively, of the arcs on the path. Let w(C) and t(C) denote the weight and the transit time of a cycle C in G.

The cost to time ratio ρ(C) of cycle C is defined as

    ρ(C) = w(C) / t(C),   t(C) > 0,                                    (1)

which basically gives the average cost per transit time for the cycle. The minimum cost to time ratio of G is defined as ρ* = min_C {ρ(C)}, where C ranges over all cycles in G. The problem of finding ρ* is called the minimum cost to time ratio problem (MCTRP).

Suppose now we simplify this problem by setting the transit time of every arc to unity. Then the cost to time ratio of cycle C is referred to as its (cycle) mean and is denoted by λ(C). More precisely,

    λ(C) = w(C) / t(C) = w(C) / |C|,                                   (2)

where |C| is the length of C. Notice that the mean of a cycle gives its average arc cost. The minimum cycle mean of G is defined as λ* = min_C {λ(C)}, where C ranges over all cycles in G. The problem of finding λ* is called the minimum mean cycle problem (MMCP). The definitions of the maximum versions of both of these problems are analogous.
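As a small numerical illustration (our example, not one from the paper's test suite): for a cycle C with three arcs of weights 2, 3, and 7 and unit transit times, w(C) = 12, |C| = 3, and λ(C) = 12/3 = 4; if one of these arcs instead had transit time 2, then t(C) = 4 and ρ(C) = 12/4 = 3.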

1.2. Applications.

The applications of both MCTRP and MMCP are important and numerous. See [11] for the applications in graph theory. We will mostly focus on their applications in performance analysis: these problems are of fundamental importance to the performance analysis of discrete event systems [3], which are general enough to model manufacturing systems [3] and digital systems such as synchronous [20], asynchronous [4], dataflow [12], and embedded real-time [16] systems. Simply put, the algorithms for these problems are essential tools to find the cycle period of a given cyclic discrete event system. Once determined, the cycle period is used to describe the behavior of the system analytically over an infinite time period.

1.3. Related work.

There are many algorithms proposed for both MCTRP and MMCP. We give a comprehensive classification of the fastest and the most common ones in Table I. References to a few more algorithms can be found in [11]. Note that as MMCP is a special case of MCTRP, any algorithm for the latter problem can be used to solve the former problem. Conversely, it is also possible to solve MCTRP using an algorithm for MMCP [10].

In Table I, the polynomial and pseudopolynomial algorithms are respectively ordered according to their worst-case running times. Those with the same running time are presented in alphabetical order of their inventors' names. Some references are cited more than once because they contain more than one algorithm.


TABLE I
Minimum mean cycle and minimum cost to time ratio algorithms for a graph with n nodes and m arcs. (W: the maximum arc weight; T: the total transit time of the graph.)

Minimum mean cycle algorithms

    #   Name      Source                        Year  Running time       Result       Complexity
    1   DG        Dasdan & Gupta [8]            1997  O(nm)              Exact        Polynomial
    2   HO        Hartmann & Orlin [11]         1993  O(nm)              Exact        Polynomial
    3   Karp's    Karp [13]                     1978  Θ(nm)              Exact        Polynomial
    4             Hartmann & Orlin [11]         1993  O(nm + n² lg n)    Exact        Polynomial
    5   YTO       Young, Tarjan, & Orlin [22]   1991  O(nm + n² lg n)    Exact        Polynomial
    6             Karp & Orlin [14]             1981  Θ(n³)              Exact        Polynomial
    7   KO        Karp & Orlin [14]             1981  O(nm lg n)         Exact        Polynomial
    8   OA1       Orlin & Ahuja [19]            1992  O(√n m lg(nW))     Approximate  Pseudopoly.
    9   OA2       Orlin & Ahuja [19]            1992  O(√n m lg²(nW))    Approximate  Pseudopoly.
    10            Cuninghame-Green & Yixun [7]  1996  O(n⁴)              Exact        Polynomial

Minimum cost to time ratio algorithms

    #   Name      Source                        Year  Running time       Result       Complexity
    11  Burns'    Burns [4]                     1991  O(n²m)             Exact        Polynomial
    12            Megiddo [17]                  1979  O(n²m lg n)        Exact        Polynomial
    13            Hartmann & Orlin [11]         1993  O(Tm)              Exact        Pseudopoly.
    14  Lawler's  Lawler [15]                   1976  O(nm lg(nW))       Approximate  Pseudopoly.
    15            Ito & Parhi [12]              1995  O(Tm + T³)         Exact        Pseudopoly.
    16            Gerez et al. [9]              1992  O(Tm + T³ lg T)    Approximate  Pseudopoly.
    17            Gerez et al. [9]              1992  O(Tm + T⁴)         Exact        Pseudopoly.
    18  Howard's  Cochet-Terrasson et al. [6]   1997  O(Nm)              Exact        Pseudopoly.
Here, W is the largest integer arc weight, T is the sum of the transit times in the input graph (assumed to be an integer for the algorithms in rows 13 and 15), and N is a bound on the number of iterations that Howard's algorithm performs. The previously known bound on N is the product of the out-degrees of all the nodes [6]. In this paper, we give two improved bounds.

Let ε denote the amount of error that we are willing to tolerate when comparing two numbers. Lawler's algorithm converges when the interval that contains the exact λ* is smaller than ε, which is why its result is approximate. The algorithms in rows 8, 9, and 16 can produce approximate results because they use the same convergence criterion as Lawler's algorithm. We also note that Burns' algorithm, the algorithm in row 6, and Howard's algorithm produce exact results, but they may have precision problems. For example, Burns' algorithm needs to perform many equality tests between two floating-point numbers, and such tests are controlled by ε.

1.4. Scope, motivations, and contributions.

This paper reports the results of our experimental study of ten leading minimum mean cycle algorithms and the minimum mean cycle versions of the minimum cost to time ratio algorithms from Table I, all of which are named in the table. The remaining algorithms in this table are not included in our study because they are very similar to the chosen ones.

We implemented each algorithm in a uniform and efficient manner. We tested the algorithms on a series of random graphs, obtained using a generator from [5], and on real benchmark circuits, obtained from logic synthesis benchmarks. We measured and compared the running times as well as representative operation counts, as advocated in [2].

The main contribution of this paper is a comprehensive empirical study comparing the ten leading algorithms for the minimum mean cycle problem. For many of these algorithms, there has not been


a clear understanding of their performance in practice, although theoretical bounds have been proven for their running times. Only an empirical analysis can shed light on whether changes in an algorithm that make its running time theoretically more efficient are worth the overhead in terms of their payoff in practice. To the best of our knowledge, this is the first study that systematically compares their performance, and the first that brings a comprehensive study of these algorithms to the attention of the diverse research communities to which this problem is important, e.g., graph theory, discrete event systems, and computer-aided design of digital systems.

In the process of this study, we have gained a great deal of insight into the behavior of these algorithms and have found effective implementation improvements. One of the most surprising results of this study is that Howard's algorithm, an algorithm unknown to the graph theory community, was by far the fastest on the graphs tested in this study. There is no polynomial bound on the running time of this algorithm, although it has been proven that the algorithm terminates within a worst-case number of iterations equal to the product of the out-degrees of all the nodes [6]. We present two alternative time bounds which, although not polynomial, give stronger bounds on most graphs.

1.5. Paper organization.

§2 presents a brief description of each algorithm together with our improvements and our bounds on the running time of Howard's algorithm. §3 describes our experimental framework. §4 presents our experimental results and observations. Finally, §5 gives our conclusions. We also have an appendix that contains the tables of the experimental results; although we summarize most of the experimental results before the appendix, it is included to present the actual numbers.

2. Algorithm Descriptions

2.1. Definitions.

The minimum cycle mean of a graph G = (V, E) can be defined as the optimum value of λ in the following linear program:

    max  λ
    s.t. d(v) − d(u) ≤ w(u, v) − λ,   ∀(u, v) ∈ E,                     (3)

where d(v) is called the distance (or the node potential) of v. When the inequalities are all satisfied, d(v) is equal to the weight of the shortest path from s to v in G when λ is subtracted from every arc weight. The node s is arbitrarily chosen as the source in advance. Let G_λ denote the graph obtained from G by subtracting λ from its every arc weight. The minimum cycle mean λ* is the largest value of λ for which G_λ has no negative cycles.

We say that an arc (u, v) ∈ E is critical if d(v) − d(u) = w(u, v) − λ*, which we refer to as the criticality criterion. We say that a node is critical if it is adjacent to a critical arc, and that a graph is critical if its every arc is critical. The critical subgraph of G contains all the minimum mean cycles of G; this is implied by the above linear program. Note that if the critical subgraph of G is acyclic, then it is the shortest path tree of G_λ*.

Henceforth, we assume that the input to the algorithms, G = (V, E), is cyclic and strongly connected. This assumption not only simplifies most of the algorithms and their coding but also generally improves


their running times in practice. If G is not strongly connected, it can be partitioned into strongly connected components, and its minimum cycle mean is the minimum of those of its strongly connected components. This is the way we implemented all of the algorithms.

2.2. Burns' algorithm.

Burns' algorithm [4] is given in Figure 2. This algorithm is actually the minimum mean cycle version of the original Burns' algorithm, which can also solve the minimum cost to time ratio problem. The algorithm in [7] is almost identical to this algorithm. The only "major" difference is that in the former algorithm, the node heights are positive, whereas in Burns' algorithm, they are negative (lines 14-19). Note that the authors of [7] give the running time of their algorithm as O(n⁴), which agrees with the O(n²m) running time of Burns' algorithm.

Burns' algorithm is based on linear programming. It is an iterative algorithm constructed by applying the primal-dual method: it solves the above linear program (Equation 3) and its dual simultaneously. In essence, the behavior of Burns' algorithm is very similar to that of the parametric shortest path algorithms such as the KO algorithm. The KO algorithm improves upon an initial acyclic critical subgraph of G until the critical subgraph becomes cyclic, at which point the minimum cycle mean is found. Burns' algorithm also operates on the critical subgraph and terminates when it becomes cyclic, but in every iteration, it reconstructs the critical subgraph from scratch.

Burns' algorithm produces exact results, but the equality check in line 9 can create precision problems because the variables involved in this check are floating-point numbers. This equality check is used to determine whether an arc is critical. In our implementation, we changed this check so that an arc e = (u, v) is critical when |(d(v) − d(u)) − (w(u, v) − λ)| < ε, where ε is a very small, user-defined constant.
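The following minimal C++ sketch shows such a tolerance-based criticality test; the function and constant names are ours, not those of the paper's implementation.

    #include <cmath>

    const double EPS = 1e-9;  // the small, user-defined tolerance epsilon

    // An arc (u, v) is treated as critical when the criticality criterion
    // d(v) - d(u) = w(u, v) - lambda holds up to the tolerance EPS.
    bool isCritical(double du, double dv, double w, double lambda) {
        return std::fabs((dv - du) - (w - lambda)) < EPS;
    }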

In our implementation, lines 14-19 are performed during the topological sorting of the critical graph in line 11. Moreover, as a byproduct of the topological sorting, we know whether or not the critical graph is cyclic.

2.3. Karp's algorithm and its variants.

Define D_k(v) to be the weight of the shortest path of length k from s, the source, to v; if no such path exists, then D_k(v) = +∞. Karp's algorithm [13], which is given in Figure 3, is based on his observation that

    λ* = min_{v ∈ V} max_{0 ≤ k ≤ n−1} (D_n(v) − D_k(v)) / (n − k),    (4)

which is called Karp's theorem. Karp's algorithm computes the quantities D_k(v) by the recurrence D_k(v) = min_{(u,v) ∈ E} {D_{k−1}(u) + w(u, v)}, k = 1, 2, …, n, with the initial conditions D_0(s) = 0 and D_0(v) = +∞ for v ≠ s. As observed in [8], [11], [22], this recurrence, which is not implemented recursively, makes the best and worst cases of Karp's algorithm the same, which is why it runs in Θ(nm).
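For concreteness, the following self-contained C++ sketch implements the recurrence and Karp's theorem directly, assuming a strongly connected input graph; the type and function names are ours, not the paper's.

    #include <vector>
    #include <limits>
    #include <algorithm>

    struct Arc { int u, v; double w; };  // arc (u, v) with weight w(u, v)

    // Returns the minimum cycle mean lambda* of a strongly connected digraph
    // with n nodes (0..n-1), using Karp's theorem.
    double karpMinimumCycleMean(int n, const std::vector<Arc>& arcs) {
        const double INF = std::numeric_limits<double>::infinity();
        // D[k][v] = weight of the shortest path of length k from source 0 to v.
        std::vector<std::vector<double>> D(n + 1, std::vector<double>(n, INF));
        D[0][0] = 0.0;  // node 0 serves as the source s
        for (int k = 1; k <= n; ++k)
            for (const Arc& a : arcs)
                if (D[k - 1][a.u] < INF)
                    D[k][a.v] = std::min(D[k][a.v], D[k - 1][a.u] + a.w);
        // lambda* = min over v of max over k of (D_n(v) - D_k(v)) / (n - k).
        double lambda = INF;
        for (int v = 0; v < n; ++v) {
            if (D[n][v] == INF) continue;      // no walk of length n ends at v
            double worst = -INF;
            for (int k = 0; k < n; ++k)
                if (D[k][v] < INF)
                    worst = std::max(worst, (D[n][v] - D[k][v]) / (n - k));
            if (worst > -INF) lambda = std::min(lambda, worst);
        }
        return lambda;
    }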

We have three variants of Karp's algorithm: the DG algorithm, the HO algorithm, and the Karp2 algorithm. To summarize, the first algorithm improves Karp's algorithm by processing the nodes and arcs of the input graph more efficiently. The second algorithm checks whether any of the cycles found before and during certain iterations of Karp's algorithm are critical; if so, it terminates. The

(6)

third algorithm improves the space complexity of Karp's algorithm. The remainder of this subsection describes these three variants in more detail.

The DG algorithm [8] improves upon Karp's algorithm by eliminating unnecessary work introduced by the above recurrence. It is given in Figure 4. It works in a breadth-first manner: starting from the source, it visits the successors of nodes rather than their predecessors, as done in the recurrence. This process creates an unfolding of G, and when the algorithm is implemented using linked lists, its running time becomes equal to the size of the "unfolded" graph. Depending on the structure of G, the running time ranges from Θ(m) to O(nm).

The HO algorithm [11] also improves upon Karp's algorithm. It is given in Figure 5. It helps Karp's algorithm terminate early without changing its structure, i.e., it still uses the above recurrence. It is based on the following observation: suppose D_0(v), …, D_k(v) are computed for all v ∈ V and for some k < n. At this point, many of these paths will contain cycles. If one of these cycles, say C, is critical in G_λ when λ = λ(C), then the minimum cycle mean is found and is equal to λ(C). Note that d(v) = min_{0 ≤ k ≤ n−1} {D_k(v) − kλ}, which is needed to check the criticality of C. We check for early termination when k becomes a power of 2, as suggested in [11]. If early termination is not possible, this algorithm can add an overhead of O(n² + m lg n) in total to the running time of Karp's algorithm, although it does not change the running time asymptotically.

In our implementation of the HO algorithm, we changed line 33 in Figure 5 to a comparison with ε so that an arc e = (u, v) is critical when |(d(v) − d(u)) − (w(u, v) − λ)| < ε. The reason for this change is exactly the same as for the similar change in Burns' algorithm.

The Karp2 algorithm is a space-efficient version of Karp's algorithm¹. Karp's algorithm takes up Θ(n²) space in order to store the D-values. The Karp2 algorithm reduces this space requirement to linear in the number of nodes n. The Karp2 algorithm performs two passes. In the first pass, it computes D_n(v) for each node v without storing D_k(v) for k < n. In the second pass, it computes the fraction in Karp's theorem as it recomputes each D_k(v), k < n. The DG and HO algorithms also suffer from this large space complexity; fortunately, the technique used in the Karp2 algorithm is also applicable to them.
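A minimal C++ sketch of this two-pass, O(n)-space idea follows; it is our illustration of the technique, not the authors' code, and all names are ours.

    #include <vector>
    #include <limits>
    #include <algorithm>

    struct Arc { int u, v; double w; };

    // Two-pass, linear-space evaluation of Karp's theorem.
    double karp2MinimumCycleMean(int n, const std::vector<Arc>& arcs) {
        const double INF = std::numeric_limits<double>::infinity();
        std::vector<double> prev(n, INF), cur(n, INF);
        // Pass 1: roll the recurrence forward, keeping two rows, to get D_n only.
        prev[0] = 0.0;                              // node 0 is the source s
        for (int k = 1; k <= n; ++k) {
            std::fill(cur.begin(), cur.end(), INF);
            for (const Arc& a : arcs)
                if (prev[a.u] < INF)
                    cur[a.v] = std::min(cur[a.v], prev[a.u] + a.w);
            std::swap(prev, cur);
        }
        std::vector<double> Dn = prev;              // prev now holds D_n
        // Pass 2: regenerate each row D_k and fold it into Karp's fraction.
        std::vector<double> best(n, -INF);
        std::fill(prev.begin(), prev.end(), INF);
        prev[0] = 0.0;
        for (int k = 0; k < n; ++k) {
            for (int v = 0; v < n; ++v)
                if (prev[v] < INF && Dn[v] < INF)
                    best[v] = std::max(best[v], (Dn[v] - prev[v]) / (n - k));
            std::fill(cur.begin(), cur.end(), INF);
            for (const Arc& a : arcs)
                if (prev[a.u] < INF)
                    cur[a.v] = std::min(cur[a.v], prev[a.u] + a.w);
            std::swap(prev, cur);
        }
        double lambda = INF;
        for (int v = 0; v < n; ++v)
            if (Dn[v] < INF && best[v] > -INF)
                lambda = std::min(lambda, best[v]);
        return lambda;
    }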

2.4. Parametric shortest path algorithms.

The KO algorithm [14] and the YTO algorithm [22] are in this category. They are given in Figure 6 and Figure 7, respectively. The YTO algorithm is essentially an efficient implementation of the KO algorithm. These algorithms are based on the observation that the minimum cycle mean λ* is the largest λ such that G_λ does not have any negative cycles. Thus, these algorithms start with λ = −∞ and always maintain a tree of shortest paths to a source node s. These algorithms change λ incrementally so that the shortest path tree changes by one arc in each iteration. When a cycle of weight zero is detected in G_λ, that cycle is the cycle with the minimum mean.

In our implementation, we slightly improved the KO algorithm. First, we obtain the initial tree by creating zero-weight arcs from the source s, a node not in the input graph, to all the other nodes, as suggested in [22]. These arcs exist in the shortest path tree only; we do not insert them into the graph. In [14], the use of a shortest path algorithm is recommended to find this initial tree, but our

¹ Suggested by S. Gaubert of INRIA, France.


implementation reduces the time to find it to Θ(n). Note that the shortest path tree for λ is a critical subgraph of G_λ. Second, line 26 in Figure 6 ensures that an update for an arc is performed only when exactly one of its endpoints is in the shortest path tree. In [14], these updates are performed when at least one of its endpoints is in the shortest path tree. It is easy to prove that our improvement is correct. The first improvement is also applied to the YTO algorithm.

2.5. Lawler's algorithm.

Lawler's algorithm [15] is given in Figure 8. It is based on the same observation as the parametric shortest path algorithms. It also uses the fact that λ* of G lies between the minimum and the maximum arc weights in G. Lawler's algorithm does a binary search over the possible values of λ* and checks for a negative cycle in G_λ every iteration. If one is found, then the chosen λ is too large, so it is decreased; if not, it is too small, so it is increased. Lawler's algorithm terminates when the interval of possible values for λ* becomes too small. The size of that interval, ε, determines the precision of the algorithm.

In our implementation, we observed that line 5 in Figure 8 may sometimes evaluate λ to either H or L even though they are different. This is a precision problem due to the use of floating-point numbers. When it occurs, the algorithm goes into an infinite loop. To prevent this problem, we added the checks in lines 9 and 11: if λ is too close to one of the bounds, the algorithm simply terminates because λ is good enough.

The performance of this algorithm depends on the width of the interval between H and L and on the performance of the algorithm used in line 7 to find a negative cycle. We used the standard Bellman-Ford algorithm, optimized for this setting. The time complexity of the best negative-cycle detection algorithms is the same as that of the Bellman-Ford algorithm, but their performance in practice may differ considerably; thus, using a faster negative-cycle detection algorithm in line 7 will improve the performance of Lawler's algorithm. Another improvement is to set H to a smaller value. Since the mean of any cycle is an upper bound on the minimum cycle mean, we can set H to the mean of any cycle instead of the maximum arc weight. In order to ensure that we have a small cycle mean for H, we can compute the cycle mean in the graph in which each node has one successor arc that has the minimum weight among all the successor arcs of that node. This graph has n arcs, which also speeds up the computation of the cycle mean within it.
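The C++ sketch below illustrates the binary search with a Bellman-Ford-style negative cycle test, under our assumptions about the interval endpoints L and H; all names are ours, not those of the paper's implementation.

    #include <vector>

    struct Arc { int u, v; double w; };

    // Returns true if G with every arc weight reduced by lambda (i.e., G_lambda)
    // contains a negative cycle; Bellman-Ford from an implicit virtual source.
    bool hasNegativeCycle(int n, const std::vector<Arc>& arcs, double lambda) {
        std::vector<double> d(n, 0.0);       // all-zero initial potentials
        for (int pass = 0; pass < n; ++pass) {
            bool relaxed = false;
            for (const Arc& a : arcs)
                if (d[a.u] + (a.w - lambda) < d[a.v]) {
                    d[a.v] = d[a.u] + (a.w - lambda);
                    relaxed = true;
                }
            if (!relaxed) return false;      // distances settled: no negative cycle
        }
        return true;                         // still relaxing after n passes
    }

    // Binary search for lambda*: the largest lambda with no negative cycle.
    double lawler(int n, const std::vector<Arc>& arcs,
                  double L, double H, double eps) {
        while (H - L > eps) {
            double lambda = (L + H) / 2.0;
            if (hasNegativeCycle(n, arcs, lambda))
                H = lambda;                  // lambda too large: decrease it
            else
                L = lambda;                  // lambda too small: increase it
        }
        return (L + H) / 2.0;
    }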

2.6. Howard's Algorithm.

Howard's algorithm [6] is given in Figure 9. Although this algorithm is well known in the control community, the version in [6] that we use here is slightly different from the classical version known to the control community. In particular, the authors of [6] have adapted the algorithm specifically for the problems we consider in this paper, whereas the previous versions addressed a broader class of problems. In addition, the version in [6] uses linear time per iteration, whereas previous versions used O(n³) time per iteration.

Howard's algorithm is similar in style to the parametric shortest path algorithms except that it starts with a large λ and decreases it until the shortest paths in G_λ are well defined. For a given λ, the algorithm attempts to find the shortest paths from every node to a chosen node w. In doing so, it either discovers that the shortest paths are well defined, in which case the correct λ has been found,


or it discovers a negative cycle in G_λ. In the latter case, the negative cycle has a smaller mean weight than the current λ, so λ can be updated to the mean weight of the new cycle, and the process continues.

Howard's algorithm computes the shortest paths in G_λ in a somewhat unusual way. A policy graph is maintained, which is simply a subgraph of G such that the out-degree of each node is exactly one. The designated sink node w is always a node on a cycle in the policy graph. The current λ is always the mean weight of a cycle in the policy graph. The policy graph is an estimate of the current shortest paths from each node to w.

Howard's algorithm maintains a distance d(u) for each node u, which is an estimate of the distance to the designated node. In each iteration of Howard's algorithm, a pass is made through all the arcs, and each d(u) is updated as follows:

    d(u) = min_{(u,v) ∈ E} {d(v) + w(u, v) − λ}.                       (5)

The policy graph is also updated so that the arc going out of u points to the node v for which the minimum was achieved. In the next iteration, λ is chosen to be the mean weight of a cycle in the policy graph, w is updated, and the process continues. It is also possible to choose the cycle with the minimum mean in the policy graph; since the policy graph contains n arcs, such a cycle can be found in linear time.
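A minimal C++ sketch of this value-update pass (Equation 5) follows, under our assumption that the policy is stored as one outgoing arc index per node; the names are ours, not the paper's.

    #include <cstddef>
    #include <vector>

    struct Arc { int u, v; double w; };

    // One pass of Equation 5 over all arcs: lower d(u) and repoint the policy
    // arc of u whenever d(u) improves by more than eps (the precision).
    // Returns true if the policy graph changed during the pass.
    bool improvePolicy(const std::vector<Arc>& arcs, double lambda, double eps,
                       std::vector<double>& d, std::vector<int>& policy) {
        bool changed = false;
        for (std::size_t i = 0; i < arcs.size(); ++i) {
            const Arc& a = arcs[i];
            double candidate = d[a.v] + a.w - lambda;  // d(v) + w(u, v) - lambda
            if (candidate < d[a.u] - eps) {
                d[a.u] = candidate;
                policy[a.u] = static_cast<int>(i);     // arc out of u now heads to v
                changed = true;
            }
        }
        return changed;
    }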

If there is an iteration in which the policy graph does not change, then Equation 5 has been satisfied for every node. This can only happen when the graph G_λ does not have a negative cycle. In this case, λ is the minimum cycle mean, and each d(u) is its distance in G_λ to some node on a zero-weight cycle. Note that Equation 5 is almost equivalent to Equation 3, with the difference that the places of d(u) and d(v) in the latter are exchanged in the former.

In the implementation of the algorithm, d(u) in Equation 5 and the arc going out of u in the policy graph are updated only if d(u) improves by some ε, which is the precision of the algorithm. Note that unlike Lawler's algorithm, Howard's algorithm does not use ε to control the value of λ.

The beauty of Howard's algorithm is that each iteration is extremely simple and requires only Θ(m) time. Meanwhile, although the algorithm only ensures that the value of λ is non-increasing from one iteration to the next, it usually manages to make significant progress in decreasing λ within very few iterations.

In [6], it is proven that Howard's algorithm terminates by showing that the algorithm never repeats a policy graph during its run. Thus, the running time is bounded by the total number of policy graphs, which is the product of the out-degrees of all the nodes in the input graph G. Below, we prove two bounds on the running time of Howard's algorithm. For our proofs, we use a slightly different version of Howard's algorithm than the one in Figure 9. The new version is given in Figure 10; it essentially selects the cycle with the minimum mean in the policy graph in each iteration rather than an arbitrary cycle in the policy graph.

Theorem 1: λ decreases by at least ε/n at least every n iterations of the main loop of Howard's algorithm in Figure 10, where ε is the precision of the algorithm.

The proof of this theorem appears in the Appendix. Of course, this is a very pessimistic bound since, in our experiments, λ almost always decreases, and usually by much more than ε.


The theorem yields the following two corollaries.

Corollary 1: The running time of Howard's algorithm in Figure 10 is at most O(c·nm), where c is the number of simple cycles in the graph G.

Corollary 2: The running time of Howard's algorithm in Figure 10 is at most O(n²m(w_max − w_min)/ε), where w_max and w_min are the maximum and minimum arc weights in the graph G and ε is the precision of the algorithm.

2.7. Scaling algorithms.

The OA1 and OA2 algorithms [19] are in this category. They assume that the arc weights are integers bounded by W. If W is polynomial in n, then these algorithms are asymptotically the fastest. The OA2 algorithm applies scaling to a hybrid of an assignment algorithm, called the auction algorithm, and the successive shortest path algorithm. It uses an approximate binary search technique. The OA1 algorithm is the same as the OA2 algorithm except that it does not use the successive shortest path algorithm. We do not give pseudocode for the OA1 and OA2 algorithms because it would be quite long if described at the level of detail of the others in this paper; for the pseudocode, we refer to [19].

3. Experimental Framework

We programmed the algorithms in C++ using the LEDA library version 3.4.1. This library is a template library for efficient data types and algorithms [18]. It may be slower than a custom implementation of these data structures built solely for comparing the algorithms in this paper, but we found it quite efficient and useful. In addition, the data structures of this library are the basis of every algorithm, so we expect the comparison of the same kinds of operations between two different algorithms to be accurate.

In order to ensure uniformity of implementation, all the algorithms were implemented by one of us; thus, they were programmed in the same style. In addition, the same routines were used to read and write the graph and to find its strongly connected components. We also flattened each algorithm in that we manually inlined all the functions other than those needed by the LEDA data types. This eliminated the overhead of function invocations. The total size of the programs is approximately 2700 lines of C++ code.

We compiled and linked each program using the Sun C++ compiler CC version 3.0.1 under the O4 optimization option. We conducted the experiments on a Sun Sparc 20 Model 512 with two CPUs, 64 MB of main memory, and 105 MB of virtual memory. The operating system was SunOS version 5.5.1.

We did two sets of experiments: one to measure the running time of each algorithm and another to count the key operations of each algorithm, as advocated in [2]. Our test suite contains random graphs, generated using SPRAND [5], and cyclic sequential multi-level logic benchmark circuits, obtained from the 1991 Logic Synthesis and Optimization Benchmarks [21]. SPRAND produces a graph with n nodes


and m arcs by first building a Hamiltonian cycle on the nodes and then adding m − n arcs at random. This cycle makes the graph strongly connected. We generated 10 versions of each random graph; the experimental data reported for these graphs in this paper are averages over these 10 versions. The arc weights in the random graphs as well as in the benchmark circuits were randomly assigned such that they are uniformly distributed in [1, 10000], which is the default weight interval in SPRAND. The properties of the random graphs and circuits are given together with the experimental results, e.g., see Tables II and IV.
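The following C++ sketch illustrates this style of generation (a Hamiltonian cycle for strong connectivity plus m − n random arcs); it is our illustration of the construction, not the SPRAND code itself, and all names are ours.

    #include <vector>
    #include <random>

    struct Arc { int u, v; double w; };

    std::vector<Arc> randomGraph(int n, int m, unsigned seed = 1) {
        std::mt19937 gen(seed);
        std::uniform_int_distribution<int> node(0, n - 1);
        std::uniform_int_distribution<int> weight(1, 10000);  // default interval
        std::vector<Arc> arcs;
        arcs.reserve(m);
        // Hamiltonian cycle 0 -> 1 -> ... -> n-1 -> 0: guarantees strong connectivity.
        for (int u = 0; u < n; ++u)
            arcs.push_back({u, (u + 1) % n, (double)weight(gen)});
        // Remaining m - n arcs chosen uniformly at random
        // (self-loops and parallel arcs are possible, as with random generation).
        for (int i = n; i < m; ++i)
            arcs.push_back({node(gen), node(gen), (double)weight(gen)});
        return arcs;
    }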

A close look at our test suite will reveal that the random graphs in it are quite sparse. The reason for this choice is that we wanted random graphs that best represent real circuits, which are sparse. In addition, all the applications of these algorithms mentioned in §1.2 usually have sparse graphs. In general, we were not interested in evaluating the performance of these algorithms on graphs that do not arise in any application we know of.

We did more experiments than are reported in this paper. Since the trend for the dependence of performance on the graph parameters is evident from those included here, we did not see any need to include more experimental results. When doing our experiments, we tried to follow the guidelines in [1], e.g., for the graph parameters, the number of versions of the same graph, etc. When comparing the algorithms using their operation counts, we compared only the relevant ones because not all the algorithms have the same kinds of operations. For instance, we compared only the KO and YTO algorithms for the number of heap operations.

We are not aware of any other published work that compares the algorithms we compare. As a matter of fact, most of the works that introduced these algorithms do not present any experimental results on them; worse, some of the works are not even aware of the others. Hence, our study should be taken as a first step toward obtaining better versions of these algorithms using the insight from experimental analysis. We believe that these algorithms have a great deal of potential for improvement.

4. Experimental results and observations

We now list the experimental results and our observations as they apply to our test suite. We also present the results of some of the experiments on the representative operation counts of the algorithms. We explain these counts only briefly for lack of space.

4.1. The minimum cycle mean and the graph parameters.

For the random graphs, the minimum cycle mean is almost independent of the number of nodes, and it changes inversely with the density of the graph, as shown in Figure 1. This is expected because as the density of a graph increases, the graph contains more cycles and the critical cycles get smaller.

This simple observation will be used to explain the behavior of some of the algorithms.

4.2. KO versus YTO.

In our implementation, both algorithms use Fibonacci heaps, the default heap data structure in LEDA. The YTO algorithm stores the node keys in the heap rather than the arc keys, as is the case in the KO algorithm, in an attempt to decrease the number of increase-key operations.


Figure 1. The minimum cycle mean versus the number of nodes n on the random graphs. Here d is the density of the graph, d = 2m/n, and the x-axis is logarithmic.

In our implementation, increase-key and decrease-key operations are implemented using heap insertions and deletions.

These two algorithms can be compared in terms of the number of iterations, the number of heap insertions and deletions (in total and per iteration), and the running time. Tables VI and VII compare these algorithms on their parameters other than the running time, on the random graphs and the circuits, respectively. We can see that both algorithms perform almost the same number of iterations on each test case; however, the YTO algorithm provides savings in the number of heap operations, especially in the number of insertions. The savings are larger on the random graphs, and they get better as the density increases because the rate of increase in these counts is larger in the KO algorithm.

As shown in Tables II and IV, on the random graphs, their running times are comparable, but the YTO algorithm performs a bit faster as the density increases. This is expected because the YTO algorithm performs fewer heap operations. On the circuits, their running times are comparable for the smaller circuits, but the KO algorithm is a bit faster as the size of the circuits gets larger. This is unexpected. Our explanation for this behavior is that the savings achieved by the decrease in the number of heap operations are not enough to compensate for the extra time needed to maintain the node keys in the YTO algorithm.

4.3. Number of iterations.

Burns', KO, YTO, and Howard's algorithms perform a number of iterations before they converge. An upper bound on the number of iterations for the first three algorithms is n², and that for Howard's algorithm is the product of the out-degrees of all the nodes. We have also measured the value of k when the HO algorithm terminates. We refer to this value as the number of iterations of the HO algorithm although it is not an iteration count in the same sense as for the other algorithms. It is always less than n.

As shown in Tables VIII and IX, the number of iterations is always less than the number of nodes for each algorithm. One anomaly is the behavior of Howard's algorithm on a random graph with 512 nodes and 1024 arcs. It seems that unless n = m, the number of iterations for the first three algorithms is around n/2 on the random graphs, each of which is strongly connected. Moreover, Burns' algorithm performs fewer iterations than the KO algorithm, and the KO and YTO algorithms perform


the same number of iterations. The number of iterations of Howard's algorithm is drastically small compared to the other algorithms. In [6], it is conjectured that the number of iterations is O(lg n) on average and O(m) in the worst case. Our experiments support the worst-case conjecture. They also show that the number of iterations for Howard's algorithm and the HO algorithm gets smaller as the density of the graph increases, although some anomalies exist. This can be explained by the first observation.

4.4. Karp's algorithm and its variants.

The improvement achieved by the DG algorithm in the number of arcs visited during the computation of D_k(v) for each v is very small on the random graphs, indicating that it is not effective for dense graphs. Note that when n = m, the running time of the DG algorithm is linear whereas that of Karp's algorithm is quadratic. The improvement on the circuits is far better, which explains why the DG algorithm performs better than Karp's algorithm there.

The space-efficient version of Karp's algorithm, the Karp2 algorithm, roughly doubles its running time, as expected. The space-saving technique of the Karp2 algorithm is directly applicable to the DG and HO algorithms. The most effective improvement on Karp's algorithm is the HO algorithm: its running time is even better than those of the algorithms that are asymptotically faster. Extrapolating from the Karp2 algorithm, we can say that a space-efficient version of the HO algorithm will double its running time, which still maintains its superiority over most of the other algorithms.

4.5. Running times.

The running time comparisons are given in Tables II, III, IV, and V. The results show the following:

1. Howard's algorithm is the fastest; its running time is impressive. The HO algorithm ranks second, which indicates that the early termination scheme in the HO algorithm is very effective. The slowest algorithm is Lawler's. This ranking is based on our implementation choices.

2. The good performance of Karp's algorithm, especially on small test cases, is mostly due to its simplicity; it contains three simple nested loops. Its simplicity facilitates its optimization by a compiler, e.g., when compiled without optimization, the DG algorithm almost always beats it. However, it seems that as the number of nodes gets larger, its performance degrades more rapidly.

3. Burns' algorithm is slower than the KO and YTO algorithms although it performs fewer iterations and does not perform expensive operations such as heap operations. We attribute this behavior to the fact that it is not incremental; every iteration builds from scratch.

4. The OA1 and OA2 algorithms are not as fast as their running times imply. They are in general slower than Karp's algorithm. We attribute much of this to their complexity; they are more difficult to optimize than the other algorithms. It is interesting to note that these algorithms perform very well on the two largest circuits and very badly when n = m. We have also performed experiments on the parameters of the OA1 and OA2 algorithms. We observed that the successive shortest path algorithm, which comes after the assignment algorithm in the OA2 algorithm, is never invoked. Experiments on some other random graphs show that whenever it is invoked on a graph, it is invoked only a few times. This indicates that the assignment algorithm used in these algorithms is very good at assigning all but a few of the nodes, so it may be more


profitable to focus on improving it rather than the successive shortest path algorithm.

5. Conclusions and Future Work

We have presented a comprehensive experimental evaluation of ten leading algorithms for the minimum mean cycle problem. This is the first such study. We systematically compare these algorithms on random graphs as well as benchmark circuits and provide important insights into their individual as well as relative performance in practice. One of the most surprising results of this study is that Howard's algorithm, well known in the stochastic control community but unknown in the graph theory community, is by far the fastest algorithm on the graphs tested in this study. Unfortunately, the only known bound on the running time of this algorithm is exponential. We present two new bounds on its running time that are stronger on most graphs.

We are working on improving these algorithms based on the insight that we have obtained from this study. We have achieved some encouraging results for Howard's algorithm and Lawler's algorithm, and we will present them in a future paper.

References

[1] Ahuja, R. K., Kodialam, M., Mishra, A. K., and Orlin, J. B. Computational investigation of maximum flow algorithms. European J. of Operational Research 97 (1997), 509-542.

[2] Ahuja, R. K., Magnanti, T. L., and Orlin, J. B. Network Flows. Prentice Hall, Upper Saddle River, NJ, USA, 1993.

[3] Baccelli, F., Cohen, G., Olsder, G. J., and Quadrat, J.-P. Synchronization and Linearity. John Wiley & Sons, New York, NY, USA, 1992.

[4] Burns, S. M. Performance analysis and optimization of asynchronous circuits. PhD thesis, California Institute of Technology, 1991.

[5] Cherkassky, B. V., Goldberg, A. V., and Radzik, T. Shortest path algorithms: Theory and experimental evaluation. In Proc. 5th ACM-SIAM Symp. on Discrete Algorithms (1994), pp. 516-525.

[6] Cochet-Terrasson, J., Cohen, G., Gaubert, S., McGettrick, M., and Quadrat, J.-P. Numerical computation of spectral elements in max-plus algebra. In Proc. IFAC Conf. on Syst. Structure and Control (1998).

[7] Cuninghame-Green, R. A., and Yixun, L. Maximum cycle-means of weighted digraphs. Applied Math.-JCU 11 (1996), 225-234.

[8] Dasdan, A., and Gupta, R. K. Faster maximum and minimum mean cycle algorithms for system performance analysis. To appear in IEEE Trans. on Computer-Aided Design, 1997.

[9] Gerez, S. H., de Groot, S. M. H., and Herrmann, O. E. A polynomial-time algorithm for the computation of the iteration-period bound in recursive data-flow graphs. IEEE Trans. on Circuits and Syst.-I 39, 1 (Jan. 1992), 49-52.

[10] Gondran, M., and Minoux, M. Graphs and Algorithms. John Wiley and Sons, Chichester, NY, USA, 1984.

[11] Hartmann, M., and Orlin, J. B. Finding minimum cost to time ratio cycles with small integral transit times. Networks 23 (1993), 567-574.

[12] Ito, K., and Parhi, K. K. Determining the minimum iteration period of an algorithm. J. VLSI Signal Processing 11, 3 (Dec. 1995), 229-244.

[13] Karp, R. M. A characterization of the minimum cycle mean in a digraph. Discrete Mathematics 23 (1978), 309-311.

[14] Karp, R. M., and Orlin, J. B. Parametric shortest path algorithms with an application to cyclic staffing. Discrete Applied Mathematics 3 (1981), 37-45.

[15] Lawler, E. L. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart and Winston, New York, NY, USA, 1976.

[16] Mathur, A., Dasdan, A., and Gupta, R. K. Rate analysis of embedded systems. ACM Trans. on Design Automation of Electronic Systems 4, 2 (Apr. 1999).

[17] Megiddo, N. Combinatorial optimization with rational objective functions. Mathematics of Operations Research 4, 4 (Nov. 1979), 414-424.

[18] Mehlhorn, K., and Näher, S. LEDA: A platform for combinatorial and geometric computing. Comm. of the ACM 38, 1 (1995), 96-102.

[19] Orlin, J. B., and Ahuja, R. K. New scaling algorithms for the assignment and minimum mean cycle problems. Mathematical Programming 54 (1992), 41-56.

[20] Teich, J., Sriram, S., Thiele, L., and Martin, M. Performance analysis and optimization of mixed asynchronous synchronous systems. IEEE Trans. on Computer-Aided Design 16, 5 (May 1997), 473-484.

[21] Yang, S. Logic synthesis and optimization benchmarks user guide version 3.0. Tech. rep., Microelectronics Center of North Carolina, Jan. 1991.

[22] Young, N. E., Tarjan, R. E., and Orlin, J. B. Faster parametric shortest path and minimum-balance algorithms. Networks 21 (1991), 205-221.


Appendix

A. The Proof of the Time Complexity of Howard's Algorithm

Proof of Theorem 1.

We refer in this proof to the outline of the algorithm in Figure 10. Consider the policy graph G_π before and after the for-loop of line 21, and denote the resulting graph after the for-loop of line 21 by G′. Note that G_π has exactly one cycle after the while-loop of line 13. We call an arc new if it is in G′ but not in G_π. If there are no new arcs in G′, then the algorithm stops. If there is a cycle in G′ with any new arcs, then λ must decrease. This is because we know that for each u ∈ V, d(u) + λ − w(u, π(u)) − d(π(u)) = 0. The policy π(u) only gets reassigned to, say, v if d(u) + λ − w(u, v) − d(v) > ε. Thus, for any new arc (u, v) in G′, we know that d(u) + λ − w(u, v) − d(v) > ε. Hence, if there is a cycle C with a new arc in G′, we know that

    Σ_{(u,v)∈C} [d(u) + λ − w(u, v) − d(v)] = Σ_{(u,v)∈C} [λ − w(u, v)] = |C|λ − w(C) > ε.    (6)

Let the mean of C be λ(C), where λ(C) = w(C)/|C|. Using Equation 6, we have

    |C|λ − w(C) = |C|λ − |C|λ(C) = |C|(λ − λ(C)) > ε,    (7)

which implies that

    λ − λ(C) > ε/|C| ≥ ε/n.    (8)

Thus, the mean λ(C) of cycle C is at least ε/n less than λ.

We only have to worry now about the case where there are new arcs in G′ but no cycles with new arcs. We will prove that this can happen at most n − 1 times in a row. Let G_1, G_2, …, G_x be the sequence of policy graphs where, in each iteration, there are no cycles with new arcs. This means that all these graphs have exactly one cycle, and it is the same cycle C in every graph. This also implies that the chosen vertex u from which we do our reverse BFS is always the same. Since each graph G_j has exactly one cycle, every vertex v has a unique simple path to the chosen vertex u. Given a graph G_j, we assign each vertex v a number n(v), which is the minimum number of arcs on this path from v to u. We will prove that for k > j, there will be no new paths in G_k to u with j or fewer arcs. This implies that after G_n, either the graph does not change (in which case the algorithm terminates) or there is a new path with more than n arcs, which means there is a cycle with a new arc in the graph.

We prove our claim by contradiction. Consider the first time, in making G_j for some j, that π(y) is reassigned from x to z, where n(z) < j − 1. Let d′(y) and d′(z) be the d() values for y and z at the time of this reassignment. We know that d′(y) > d′(z) + w(y, z) − λ. Let d″(y) and d″(z) be the values of d() at the time the arc (y, z) was considered in making G_{j−1}. Since this is the first time that the claim has been violated, we know that the path from z to u has remained unchanged since G_{j−2}, which means that d″(z) = d′(z). We also know that d″(y) ≥ d′(y) since the d() values cannot increase over time. Putting these inequalities together, we get d″(y) > d″(z) + w(y, z) − λ, which implies that at the time G_{j−1} was being made, π(y) would have been reassigned to z, and d(y) would have been assigned the value d″(z) + w(y, z) − λ, which is strictly less than d′(y), the value of d(y) at the time G_j is made. Since the d() values cannot increase, this is a contradiction.


B. Algorithms in Pseudocode

Input: A strongly connected digraph G = (V, E).
Output: λ* of G.

   /* Initialize */
1  foreach node u ∈ V do
2    d(u) ← 0
   /* Set λ to the minimum arc weight. */
3  λ ← +∞
4  foreach arc (u, v) ∈ E do
5    λ ← MIN{λ, w(u, v)}
   /* Iterate */
6  while (true) do
     /* Find the set E′ of critical arcs */
7    E′ ← ∅
8    foreach arc e = (u, v) ∈ E do
9      if ((d(v) − d(u)) − (w(u, v) − λ) = 0) then
10       E′ ← E′ + {e}  /* e is critical */
11   Topologically sort the graph G′ = (V, E′) of critical arcs.
12   if (G′ is cyclic) then
13     return λ
     /* Compute the (negative) node heights h in G′ */
14   foreach node v ∈ V in the topological order of G′ do
15     if (v has no predecessors) then
16       h(v) ← 0
17     else
18       foreach predecessor node u of node v in E′ do
19         h(v) ← MIN{h(v), h(u) − 1}
     /* Compute Δ */
20   Δ ← −∞
21   foreach arc (u, v) ∈ E do
22     if (h(v) − h(u) + 1 > 0) then
23       Δ ← MAX{Δ, ((d(v) − d(u)) − (w(u, v) − λ)) / (h(v) − h(u) + 1)}
     /* Update λ and d using Δ */
24   λ ← λ − Δ
25   foreach node u ∈ V do
26     d(u) ← d(u) − Δ·h(u)

Figure 2. Burns' algorithm.


Input: A strongly connected digraph G = (V, E).
Output: λ* of G.

   /* Initialize */
1  foreach k = 0 to n do
2    foreach node u ∈ V do
3      D_k(u) ← +∞
4  D_0(s) ← 0  /* s is any node in V */
   /* Compute the distances */
5  foreach k = 1 to n do
6    foreach node v ∈ V do
7      foreach arc (u, v) ∈ E do
8        D_k(v) ← MIN{D_k(v), D_{k−1}(u) + w(u, v)}
   /* Compute λ* using Karp's theorem */
9  λ ← +∞
10 foreach node u ∈ V do
11   λ′ ← −∞
12   foreach k = 0 to (n − 1) do
13     λ′ ← MAX{λ′, (D_n(u) − D_k(u)) / (n − k)}
14   λ ← MIN{λ, λ′}
15 return λ

Figure 3. Karp's algorithm.


Input: A strongly connected digraph G = (V, E).
Output: λ* of G.

   /* Initialize */
1  foreach node u ∈ V do
2    level(u) ← −1
3  Find a "good" or arbitrary source node s.
4  D_0(s) ← 0; π_0(s) ← −1
5  level(s) ← 0; ENQUEUE(Q, ⟨0, s⟩)
   /* Compute the distances */
6  ⟨k, u⟩ ← DEQUEUE(Q)
7  do
8    foreach arc (u, v) ∈ E do
9      if (level(v) < k + 1) then
10       ENQUEUE(Q, ⟨k + 1, v⟩)
11       π_{k+1}(v) ← level(v)
12       level(v) ← k + 1
13       D_{k+1}(v) ← +∞
14     D_{k+1}(v) ← MIN{D_{k+1}(v), D_k(u) + w(u, v)}
15   ⟨k, u⟩ ← DEQUEUE(Q)
16 while (k < n)
   /* Compute λ* using Karp's theorem */
17 λ ← +∞
18 foreach node u ∈ V do
19   if (level(u) = n) then
20     λ′ ← −∞
21     k ← π_n(u)
22     while (k > −1) do
23       λ′ ← MAX{λ′, (D_n(u) − D_k(u)) / (n − k)}
24       k ← π_k(u)
25     λ ← MIN{λ, λ′}
26 return λ

Figure 4. Dasdan-Gupta's algorithm (the DG algorithm).
