Part I Sciencific achievements
4.2 Optimizing File Availability in Peer-to-Peer Content Distribution Cloud Storage
• Not even for a limitation to 0.5 %, the difference is not sensible.
The graphic in Figure 46 presents the differences between the two implementations:
pure genetic algorithm and mathematical method + genetic algorithm in the case of a limitation of the searching space to 0.26 %, from 12168 to 32 points. The vertical axis represents the number of generations necessary to find the optimum solution. We can notice that the second method (represented with blue) is more efficient [R 163].
case
21 19 17 15 13 11 9 7 5 3 1
Generation
70
60
50
40
30
20
10 0
GA only MATH + GA
GA only
15 8 6 5 4 3 2 1 0
Count
40
30
20
10
0
Figure 46 Differences between the two implementations [R 163]
Figure 47 limitation of the searching space [R 163]
In Figure 47 we can observe the difference in the case of a limitation of the searching space to 0.0128 %, from 7776 to 1 point. After this limitation, the solution is found in the first generation. The graphic shows the situation before the limitation. We can see that in 59 % of cases, the solution is obtained in the first generation, in 21.3 % – after one generation and in the rest of 19.7 % cases – after approximating 6.5 generations.
The mean number of generation after which the solution is found is 1.51. So, we can see that the effect is not considerable but for very big limitations [R 163].
The tests were made using the crowding genetic algorithm, having the following parameters [R 163]:
• population size = 500,
• mutation probability = 0.01,
• crossover probability = 0.9.
We conclude that genetic algorithms are powerful search techniques suitable to solve reliability problems, but an efficient modeling of the problem using mathematical methods can improve even more their performance. The main advantage of using genetic algorithms is that they require no gradient information about the problem and they are resistant to becoming trapped in local optima [R 163].
4.2 Optimizing File Availability in Peer-to-Peer Content Distribution Cloud
can be optimized, but optimization is difficult since it includes a high number of parameters and implies integer programming [R 6] [R 141].
Genetic algorithms (GAs) require little knowledge about the problem being solved, are easy to implement and robust. being suited to many optimization. So, is presented a genetic algorithm approach for optimizing the File Availability in Peer-to-Peer (P2P) file sharing systems [R 165].
4.2.1 P2P SHARING SYSTEMS
Based on the notations and the assumptions presented in [R 6] let consider a P2P sharing system with the following characteristics: a nuber of I nodes with Pi, the up probability for node I and Si, the shared storage for node i. Let consider J the number of distinct files, with bj the size of the j th file, qj the request probability of the j th file and xij - a zero-one variable which is equal to 1 if node i contains a replica of file and is zero otherwise. Also, nj denote the number of replicas for file j and let consider the special case, where each node is up with the same probability pi=p. The problem is to choose a number of non-negative integers ni, … , nJ such that the relation (3.40) is maximized subject to constraints given by relation (3.41).
The hit probability is [R 6]:
( )
^{xij}I i
i J
j j
hit q p
P
å Õ
=
=
- -
=
1 1
1
1 (3.40)
The constraints that must be satisfy are:
I i
S x b _{ij} _{i}
J j
j , 1,.. ,.
1
=
å
£=
(3.41) In conclusion we have to maximize ^{J}
( )
^{n}^{j}j
j p
q - -
å
=
1 1
1
when ^{J} b n_{j} S
j
j £
å
=1, where SI
S
S= _{1}+...+ is the total shared storage.
This optimization problem can be also solved efficiently by genetic programming [R 165].
4.2.2 The Genetic Algorithms
For considering the constraints we reject the infeasible individuals. The basic idea is to “punish” the fitness value of an individual whenever the solution produced violates some of the constraints imposed by the problem. A simple GA is applied, having the following main features [R 165]:
- Individuals are encoded like vectors of binary strings;
- The genetic operators are implemented according with the encoding scheme used;
- Randomly generates the initial population, but allows the use of an initial population specified by the user;
- Performs the GAs specific iteration;
- Includes the best performing individual of the parent generation in the new generation in order to prevent a good individual being lost by the probabilistic nature of reproduction;
- Allows the user to establish the GAs parameters: the size of the population, the type of selection scheme, crossover and mutation and the probability of applying the genetic operators.
The design variables of the problem are the positive integer numbers: n1, … , nJ that maximize relation (3.40) related to the restrictions given by relation (3.41).
The coding scheme uses vectors of pozitive integer numbers, having the form of relation and presented in Figure 48 [R 165].
n1
One individual
nJ
…
Figure 2
Figure 48 Individuals encoding scheme [R 165
It has to be noted that, in the evaluation step, the binary strings are decoded to the corresponding pozitive integers and are used to establish the Fitness function [R 165.
4.2.2.1 The crossover operator
The crossover operator acts at genotypic level, and so it is intended to work with binary strings. The crossover algorithm consists in following steps [R 165:
- Randomly choose two parents p1 and p2, with the crossover probability.
- Randomly choose k1Î[1,2,..,l1-1], a number that becomes the crossover point, where l1 is the length of the first parent. The chromosomes of the first parent are divided in 3 subchromosomes: a left subromosome placed on the left side of k1-1, a right subromosome placed on the right side of k1+1 and a central one, placed between k1- 1 and k1+1.
- Randomly choose k2Î[1,2,..,l2-1], a number that becomes the crossover point, where l2 is the length of the first parent. The chromosomes of the first parent are divided in 3 subchromosomes: a left subchromosome placed on the left side of k2-1, a right subchromosome placed on the right side of k2+1 and a central one, placed between k2-1 and k2+1.
- By exchanging the central subchromosomes between the two parents, the crossover produces two offsprings.
Besides this basic approach, the crossover operator verifies if the obtained individuals fulfill the constraints of the problem. If not, it uses a repairing strategy, by repeatedly generating offsprings until the restrictions are fulfilled [R 165.
4.2.2.2 The mutation operator
Similar with the crossover operator, the mutation operator also acts at genotypic level, and so it is intended to work with binary strings. The one point mutation algorithm consists in following steps [R 165:
- Randomly choose a parent p with the mutation probability.
- Randomly choose kÎ[1,2,..,l], a number that becomes the mutation point, where l is the length of the parent. The selected element is replaced with a complementary value.
Similar with the crossover operator, the mutation operator verifies if the mutated individual fulfills the constraints of the problem. If not, it uses a repairing strategy, by repeatedly generating a new individual, until the restrictions are fulfilled.
4.2.2.3 Appling GAs in optimisation problems
The first step to start the optimization is to define the initial population. This is carried out, by generating individuals having the form depicted in Figure 44 with the randomly picked positive integer parameters in their domain. Also, the individuals are subject to the constraints given in relation (3.41).
By generating the initial population, the infeasible individuals are rejected and the random generation of individuals continues until the required number of individuals is established without violating the constraints [R 75].
Once the parent population is available, the individuals are to be evaluated related to their quality in the search space. The approached problem deals with the objective given by relation (3.43):
Fitness= ^{J}
( )
^{n}^{j}j
j p
q - -
å
=
1 1
1
(3.43)
The selection operator uses the Fitness value. In this approach, the roulette wheel selection is used, which is the traditional selection function. The probability of surviving is equal to the fitness of a given individual, divided by the sum of the fitness of all individuals. In the implementation of the algorithm, simple elitism was also applied.
This technique guarantees survival of the best individual [R 165].
Through the recombination operator a new population is created. All the parameter values have been calculated based on inherited values from the parent population.
Therefore, no new information has been inserted in the population but only old information has been recombined. In order to introduce new information into the population pool, the mutation operator is used [R 165].
The traditional crossover and mutation operators are modified in such a way to produce allowable offspings, that respect the restrictions [R 75] [R 165].
4.2.2.4 Study Cases
A P2P network with the following characteristics was considered [R 165]:
• Number of files: J= 250
• Number of nodes: I=10
• The shared storages Si for each node are equal quantities.
• The size of the files are equals.
The request probability of the j th file: 𝑞_{1} = ^{%}_{°}. The up probability for i th node: 𝑝_{1} =^{%}_{²} The problem is to find the J positive integers nj that maximize the hit probability given by (3.46). The unknown parameters nj belong to the interval [1, 50]. Was considered a GAs with a population of 100 individuals. An individual is a vector of 250
integers.The restriction are given by (3.43). Since all the sizes bj are equals and their value related to S is established in such a way that the constraint becomes:
å
=£
J ×
j
nj 1
6200 (3.44)
The Fitness value is equal with the objective function (3.47). The initialization routine continues to randomly generate vectors of integers until the restriction is fulfilled. The crossover operator and the mutation operators use a reparing strategy. These operators are applied repeatedly until the restriction criteria is fulfilled. Two different runs were considered [R 165].
Study case 1
GAs used a population of 50 individuals and evolved for 30 generations.
After the genetic run, the Fitness value for the solution gives a high hit probability:
8164 .
=0
Phit (3.45)
The obtained solution fufills the constraints since
6200 6140
1
<
=
å
×= J j
nj (3.46)
Figure 49 plots the obtained solution, that is, the vector having 250 integer components. The evolution of the Fitness during GAs generations is presented in Figure 50, where the Fitness value is plotted related to the generation number [R 165].
Figure 49 The solution for the study case 1 (a vector with 250 integer components)
Figure 50 Fitness value related to the generation number in study case 1
Study case 2
GAs used a population of 100 individuals and evolved for 30 generations.
After the genetic run, the Fitness value for the solution gives a high hit probability:
9236 .
=0
Phit (3.47)
The obtained solution fufills the constraints since 𝑛_{1}
°
12% = 6166 < 6200 (3.48)
Figure 51 plots the obtained solution, that is, the vector having 250 integer components. The evolution of the Fitness during GAs generations is presented in Figure 52, where the Fitness value is plotted related to the generation number [R 165].
Figure 51 The solution for the study case 2 (a vector
with 250 integer components) Figure 52 The solution for the study case 2 (a vector with 250 integer components)
By examining the Figure 50 and Figure 52, it can be seen that the Fitness function increases generation after generation. Also, a higher number of individuals leads to better solutions. The obtained results show that genetic algorithms seem to be suitable
0 50 100 150 200 250
0 5 10 15 20 25 30 35 40 45 50
J
0 50 100 150 200 250
0 5 10 15 20 25 30 35 40 45 50
0 5 10 15 20 25 30
0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Generation Fitness
0 5 10 15 20 25 30
0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1
Generations Fitness
for the approached task. The disadvantage that belong to the used method consists in the fact that, for large networks the design problem is time demanding. But, in recent years, the increasing computation power of computers makes this disadvantage lesser, so that it is possible to use stochastic algorithms effectively in many applications, such as this type of problems [R 165].
This approach can be continued by investigating other, more complex networks.
5 THE INTEGRATION OF CLOUD SERVICES INTO ORGANIZATIONS
1.1 The overall process taken by enterprises to manage the IaaS cloud services