**Part I Sciencific achievements**

**4.1 Resource allocation optimization for clouds**

Bel(S1) = mS1(T) (3.21)

Pl (S1) = mS1(T)+ mS1(T, F) (3.22)

**3.5.3.3 Attacks assessment **

The attacks assessment consists of data fusion of the evidences obtained from sensors by using the Dempster’s combination rule, with the purpose of maximizing the DDoS true positive rates and minimizing the false positive alarm rate. mS1,S2(T) can be calculated using Table 16 and equation (3.16).

*m*_{S1}*(T) * *m*_{S1}*(F) * *m*_{S1}*(T, F) *

*m*_{S2}*(T) * m_{S1}(T) * m_{S2}(T) m_{S1}(F) * m_{S2}(T) m_{S1}(T, F) * m_{S2}(T)
*m*_{S2}*(F) * m_{S1}(T) * m_{S2}(F) m_{S1}(F) * m_{S2}(F) m_{S1}(T, F) * m_{S2}(F)
*m*_{S2}*(T, F) * m_{S1}(T) * m_{S2}(T, F) m_{S1}(F) * m_{S2}(T, F) m_{S1}(T, F) * m_{S2}(T, F)

Table 16 mS1,S2 calculation **[R 126**]

As a conclusion of our work, we affirm that by using DST our proposed solution has the following advantages: to accommodate the uncertain state, reduce the false negative rates, increase the detection rate, resolve the conflicts generated by the combination of information provided by multiple sensors and alleviate the work for cloud administrators.

**4 R****ESOURCE ALLOCATION **

The properties of the resources may allow us to model the system elements in terms of resource segments. For example, we may model one particular memory as a resource, if we know the probability of failure for it. On the other hand, if we don’t have the probability of failure at this level, we may model the whole memory at a given locality as a resource. Therefore, this model is flexible in supporting a complete resolution of the system elements.

We allocate required resources to executing an “to be executed” objects. These resources are physically linked according to their geographic and hardware constraints. However, in addition to resources, objects may need services that are provided by other objects, which in turn may need other services and resources and so on. We represent this relationship with a graph, where objects and

resources are nodes and the relations between them are direct arcs. The resources are always the leaf-nodes; they are not expected to need services from other resources.

This model can be utilized for identifying the number of resources that need to be allocate to ensure the availability/reliability of a service provided by a Cloud Provider.

The services provided by CSP each involve the running of a series of instances that we can associate with the objects of the proposed model. Successful completion of the service is conditional upon the successful completion of the instances, depending on their turn, by the proper functioning of the resources involved.

We assume that realization of a cloud service involves the use of several instances in the cloud (all of which must be functional), which in turn are conditioned by the proper functioning of several resources. The Service Reliability Model will be a series model that will include a number of n components (the resources involved in performing the service) serially connected from the reliability point of view is considered.

q1 q2

c1 c2

qn

cn

a

q1,1 c1

q1,2 c1

q1,m 1

c1

q2,1 c2

q2,2 c2

q2,m2 c2

qn,1 cn

qn,2 cn

qn,mn cn

b

Figure 42 Reliability Model

Each element *i* (1 £ *i* £ *n*) is characterized by its failure probability, *q*_{i}, and by its
cost, *c*_{i}, as depicted in Figure 42.a.

The reliability function *P*_{p}, respectively the cost function *C*_{p} for this system, are
computed using the following expressions [R 163]:

Figure 41 The Allocation model

### Õ

=-

= ^{n}

*i*

*i*

*p* *q*

*P*

1

) 1 (

(3.23)

### å

=

= ^{n}

*i*
*i*

*p* *c*

*C*

1 (3.24)
We consider for the entire system the Distributive Static Redundancy (DSR) by which
each element *i (i=1…n)* is replaced with a group of *m*_{i} identical elements of the same
type, connected in parallel from the reliability point of view (Figure 42.b). We admit
the case that the elements in the same group have the same attributes of the safety
running, which is stated by the following relation [R 163]:

*q*_{i} *q*_{i} *q*_{i}_{m} *q*_{i}

*i* =

=

=

= _{,}_{2} _{,}

1

, ... , for *i* = 1, *n* (3.25)

In these conditions, for the redundant system, the function of reliability (*P*_{D}) and the
expression for the cost (*C*_{D}) are given by the relations (3.26) and (3.27) [R 163].

𝑃_{ž} = ^{(}_{32%} 1 − 𝑞_{3}^{&}^{P} (3.26) 𝐶_{ž} = ^{(}_{32%}𝑚_{3}∙𝑐_{3} (3.27)
We will consider only the situation in which we have: 𝑄_{ž} = 1 − 𝑃_{ž} ≪ 1. This condition
is fulfilled in the case that every group of elements has a high reliability, which is
equivalent with relation (3.28) [R 163].

<<1

*m**i*

*q**i* , for *i* = 1, *n* (3.28)

Considering this last relation, it results:

𝑃_{ž} ≅ 1 − ^{(}_{32%}𝑞^{&}^{P} (3.29)

**4.1.2 The allocation algorithm **

The distributed DSR problem can be addressed in two ways. The first approach
considers the maximum allowed value of the cost, *C*_{DM}, and finds the values of

*n*
*i**, ... m*
*, ... m*
*, m*

*m*_{1} _{2} in such a way, that the reliability function of the redundant system,
*P**D*, has the maximum (optimum) value. The second approach considers the minimum
allowed value of the reliability function, *P*_{Dm} and searches for the values of

*n*
*i**, ... m*
*, ... m*
*, m*

*m*_{1} _{2} so that the cost function of the redundant system, *C*_{D}, has the
minimum (optimum) value. The un-faulty working probability, *P*_{D}, is a monotonous
rising function dependent of *n* variables, *m*_{1}*, m*_{2}*, ... m*_{i}*, ... m*_{n}, which respect the relation
[R 163]:

*DM*
*n*

*i*

*i*

*i* *c* *C*

*m* × =

### å

=1 (3.31)One of these variables (*m*_{n}, for instance) can be expressed depending on the others,
as presented below [R 163]:

Expressing *m*_{n} from relation (3.31) and replacing it in relation (3.29), leads to the next
relation.

𝑃_{ž} = 1 − ^{(}_{32%}𝑞_{3}^{&}^{P}− 𝑞_{(}^{†£¤¥} ^{OP∙¦P}

KPMN

¦K (3.32)

Next, looking for the maximum value of the function *P*_{D}

### (

*m*

_{1}

*, m*

_{2}

*, ... m*

_{i}

*, ... m*

_{n}

### )

, we will use the partial differential annulment method.ln 0

ln × =

+

× -

=

*n*
*n*
*m* *i*

*i*
*m*
*i*
*i*
*D*

*c*
*q*
*q* *c*

*q*
*m* *q*

*P* _{n}

*n*
*i*

¶

¶

Þ × =a

× =

*n*
*n*
*m*
*n*
*i*

*i*
*i*

*c*
*q*
*q*

*c*
*q*

*q* ln ^{n} ln

(3.33)

It is important to note that a is a coefficient that is independent of * i*.

Let: 𝛽_{3} = _{œ(¨}^{—}^{P}

P (3.34)

Þ 𝛼 =^{¨}_{ª}^{P}^{OP}

P (3.35)

Then: 𝑚_{3} =^{œ( «∙ª}_{œ(¨}^{P}

P = ^{ª}^{P}^{∙œ( «∙ª}_{—} ^{P}

P (3.36)

Substituting this computed relation for *m**i* in relation (3.31) gives:

𝐶_{ž?} = ^{(}_{32%} 𝛽_{3}∙ 𝑙𝑛 𝛼 ∙ 𝛽_{3} = ^{(}_{32%}𝛽_{3} ∙ 𝑙𝑛 𝛼 + ^{(}_{32%} 𝛽_{3} ∙ 𝑙𝑛 𝛽_{3} (3.37)
Þ 𝑙𝑛 𝛼 =^{H}^{£¤}^{I} ^{K}^{PMN}^{ª}_{ª}^{P}^{∙œ( ª}^{P}

K P

PMN (3.38)

Consequently, returning to (3.36), we obtain:

𝑚_{3} =^{œ( « -œ( ª}_{œ(¨} ^{P}

P (3.38)

So, for calculating the values of *m**1*, *m**1*, … *m**i*, … *m**n*, we have the following algorithm:

The Input data are: the cost ci, and the failure rate qi, of each component For a serial reliability model

Begin

For each elementar component from 1 to n calculate Begin

𝛽3= 𝑐_{3}
𝑙𝑛𝑞3

𝑆1 = 𝑆1 + 𝛽_{3}∙ 𝑙𝑛 𝛽_{3} ,
𝑆3 = 𝑆3 + 𝛽3

end

Calculate: 𝑙𝑛 𝛼 =^{H}^{£¤}_{@¯}^{I@%}

For each elementar component from 1 to n calculate begin

𝑚3=𝑙𝑛 𝛼 + 𝑙𝑛 𝛽3

𝑙𝑛𝑞3

end

Apply GA for obtaining integer solutions End

Figure 43 Pseudocode for obtaining the integer values for mi

Applying the above relations, one usually obtains fractional values for *m**1*, *m**1*, … *m**i*, …
*m**n*. The cost of the obtained system is exactly the maximum imposed one, *C*_{DM}. But
*m**i* should have positive integer values.

In order to obtain these integers values, we will initially consider for *m**i* the integer part
of the value obtained using (19). Thus, the cost of the resulted system, *C* , is less than
the imposed cost, *C*_{DM}. It remains to decide how we can use more efficiently the money
resulted from the removal of the fractional parts of elements, in order to increase the
system reliability [R 163].

For solving this, we use the genetic algorithm. The imposed cost is now [R 163]:

*DM*'

*C* _{ = }*C*_{DM}_{ - }_{C} _{. } _{(3.38) }

The inferior limit for each type of element is 0, as it is possible not to add any element of that type. Thus, the superior limit for each type is computed like this:

*i*
*DM*

*i* *c*

*C* '
max =

(3.39)

In [R 163] is investigated the effect of reducing the searching space of the genetic algorithm using a mathematical algorithm. It makes a comparison between two

implementations of a reliability system optimizing problem using genetic algorithms and mathematical methods.

[R 163] attempts to show how mathematical methods can be used to improve the results of genetic algorithms, particularly GAlib, considering two approaches for solving the distributed static redundancy. GAlib is a C++ library of genetic algorithms, implemented by Matthew Wall form the Massachusets Institute of Technology.

**4.1.3 Genetic Alghorityms **

Genetic algorithms (GAs) are global random search methods widely employed in optimization problems, or in problems where the gradient of a given objective function is not available. The power of GAs consists in only needing objective function evaluations to carry out their search.

GAs consist in having a population of candidate solutions (individuals, chromosomes) to an optimization problem that evolves at each iteration t of the algorithm, called generation [R 27] [R 75]

[R 92]. In each generation, relatively successful individuals are selected as parents for the next generation. A new generation of solutions evolves, using genetic operators like crossover and mutation. Each individual is evaluated relatively to the user defined fitness function. Individuals are then ordered by fitness and the process is repeated until a criterion of stop is reached. Fitness function is a scalar value that combines the optimization objectives and is obtained in the evaluation step, when a problem specific routine returns its value. The fittest individuals will survive generation after generation while also reproducing itself. At the same time the weakest individuals disappear from each generation [R 163].

Individuals must be encoded in some alphabet, like binary strings, real numbers, and vectors.

In a practical application of GAs, a
population pool* P(t)* of chromosomes must
be installed and they can be randomly set
initially. In each cycle of genetic evolution,

a subsequent generation is created from the chromosomes in the current population, shown in Figure 44 [R 163].

In the evaluation step, the Fitness value for each individual in the population is computed. During the selection stage, a temporary population denoted “Mating pool”

is created in which the fittest individuals have a higher number of instances than those
less fit. During the evolution, GAs employ *genetic operators * like crossover and
mutation. The crossover is applied with the probability PI. It randomly mates the
individuals and creates offsprings D1 and D2 from two parents I1 and I2 by combining

Listă de reguli Rezolvare

conflict Mesaje interne Listă de

mesaje

Interfaţă de intrare (detectori)

Interfaţă de ieşire (efectori)

Algoritm genetic

Reguli genetice noi

Componentă de alocare

credit

Mediu black box potrivire

feedback Mesaj de ieşire

(acţiune) Mesaj de intrare

potrivire

Ajustare reguli

Figure 44 Creating the next generation in Gas [R 163]:

the parental genes and transferring them to the next generation. The mutation operator is applied with a low probability Pm. It creates new individuals D3, which are inserted into the population, by randomly changing the parent I3 [R 163].

The algorithm establishes the next generation P(t+1) and by this way, individuals of the original population are substituted by the new created individuals.

In essence, the procedure of a GA is given as follows:

*Step 1. Generate randomly a population of chromosomes. *

*Step 2. Calculate the fitness for each chromosome in the population. *

*Step 3. Create offspring’s by using genetic operators. *

*Step 4. Stop if the search goal is achieved. Otherwise continue with Step 2. *

In usually optimization problems, besides the design variables there are constraints related to some physical or economical restrictions [R 76].

GAs can consider the constraints by using different methods. The most used approach is to reject the infeasible individuals. Another approach is to use penalty functions. In this case, violation of constraints takes the form of penalties. The basic idea of this approach is to “punish” the fitness value of an individual whenever the solution produced violates some of the constraints imposed by the problem [R 163].

**4.1.4 System Modeling and Implementation of the Genetic Algorithm **

The solution for the presented problem was implemented in Microsoft Visual C++ and uses the GAlib class library [R 71].

The genomes are objects of the class* *GA1DArrayAlleleGenome<int>. This class
defines an integer array genome with alleles, which means that the domain of possible
values of the elements can be specified. There can be specified one common domain
for all the elements in the array – by defining a set of alleles. We used this type of
genome to limit the searching space [R 163].

**The genome creation**

For the creation of the genome, the array of sets of alleles and the objective function are specified. The array is created as an integer array, having the size equal with the maximum number of elements that can be serially connected. The searching space for the genetic algorithm can be easily limited by limiting the size of the domains of the elements in the array genome. To encode the individuals, an array of sets of alleles is used. For each type of element, the corresponding set of alleles is appended. The program uses a configuration function that is called when another genetic algorithm is started or when the constant values of the problem change [R 163].

The configuration function performs the following actions:

- Creates the genome and the corresponding operators;

- Creates the genetic algorithm (the function gets as argument the created genome);

- Sets the parameters of the genetic algorithm;

- Calls the initialization function for the genetic algorithm.

Then, an initialize, a crossover and a mutation operator are established [R 76], [R 165], [R 210]. The initialize function initializes a genome like this: each element in the array is set to a randomly chosen value in the allele set. This value is then adjusted so that the initial genome respects the maximum imposed cost. This adjusting procedure, intended to obtain valid individuals, is also performed when mutation is applied.

The mutation operator performs two steps. First, the number of elements is computed proportionally with the mutation rate. If this number is less than 1, we cross twice the

array in each sense, changing the value of each element with a value that is randomly chosen but so that the cost does not go beyond the imposed cost. If the calculated number of elements is greater than 1, this difference will be the number of elements that will be affected by mutation. These elements are randomly selected and the new value for each is randomly established. Then adjusting is performed in order not to surpass the imposed cost [R 163].

As crossover operator, we use the uniform crossover between parents of the same length, which generates each child by randomly taking bits from each parent. For each bit we flip a coin to see if that bit should come from the mother or the father.

Figure 2 depicts the uniform crossover operator for arrays of the same length.

**Figure 45 Uniform crossover [R 163] **

The objective function for the genetic algorithm (the evaluator of the genome or fitness function) is implemented as a function that receives as argument the genome and returns the fitness value. If the cost correspondent to the genome surpasses the maximum imposed cost, the genome is penalized. This is performed returning a negative value. If the cost is allowable, the function returns the computed reliability of the system, computed using relation (3.26). Relation (3.29) could also have been used.

The presented mathematical solution works for the case the failure probability of the
elements in the system, *q**i*, is very small [R 163].

**4.1.5 Program Facilities and Obtained Results **

The implementation is available as two executable files, one for each implementation.

The user can specify the maximum imposed cost. The user is also allowed to set how
many groups of elements there are in the system (n) and, for every group, the failure
probability (𝑞_{3}) and the cost (𝑐_{3}). It can be selected the type of the genetic algorithm to
run: *Simple*, *Steady-State*, *Crowding*, *Incremental* and *Deme.*The evolution for the
selected genetic algorithm can be controlled by the following commands: reset the
evolution, evolve one generation, evolve many generations, continuously evolve, and
stop the evolution. These commands are also available through the toolbar. There is
also the possibility of specifying the genetic algorithm’s parameters: mutation
probability, crossover probability and population size.

The current generation is displayed during evolution. The best solution found until the current generation is displayed both textually and graphically [R 163].

Using the presented above mathematical method, the genetic algorithm searching space is not limited all the time with the same percent. The limitation performed by this method is dependent of the initial data [R 163].

• For limitations of the searching space to 30 % or 16 %, for instance, there are no differences between the two solutions.

• For a limitation to or 8.5 %, in 87 % of cases, there is no difference of performance between the two implementations, and in the rest of cases (13 %), the pure genetic algorithm needs mean 7 generations more.

• Not even for a limitation to 0.5 %, the difference is not sensible.

The graphic in Figure 46 presents the differences between the two implementations:

pure genetic algorithm and mathematical method + genetic algorithm in the case of a limitation of the searching space to 0.26 %, from 12168 to 32 points. The vertical axis represents the number of generations necessary to find the optimum solution. We can notice that the second method (represented with blue) is more efficient [R 163].

case

21 19 17 15 13 11 9 7 5 3 1

Generation

70

60

50

40

30

20

10 0

GA only MATH + GA

GA only

15 8 6 5 4 3 2 1 0

Count

40

30

20

10

0

**Figure 46 Differences between the two **
**implementations [R 163]**

**Figure 47 limitation of the searching space [R 163]**

In Figure 47 we can observe the difference in the case of a limitation of the searching space to 0.0128 %, from 7776 to 1 point. After this limitation, the solution is found in the first generation. The graphic shows the situation before the limitation. We can see that in 59 % of cases, the solution is obtained in the first generation, in 21.3 % – after one generation and in the rest of 19.7 % cases – after approximating 6.5 generations.

The mean number of generation after which the solution is found is 1.51. So, we can see that the effect is not considerable but for very big limitations [R 163].

The tests were made using the crowding genetic algorithm, having the following parameters [R 163]:

• population size = 500,

• mutation probability = 0.01,

• crossover probability = 0.9.

We conclude that genetic algorithms are powerful search techniques suitable to solve reliability problems, but an efficient modeling of the problem using mathematical methods can improve even more their performance. The main advantage of using genetic algorithms is that they require no gradient information about the problem and they are resistant to becoming trapped in local optima [R 163].

**4.2 Optimizing File Availability in Peer-to-Peer Content Distribution Cloud **