

Computer Science Manual for Bachelor Graduation Examination

July and September 2019

Computer Science Specialization

General topics:

Part 1. Algorithms and Programming

1. Search (sequential and binary), merging, sorting (selection sort, bubble sort, insertion sort, merge sort, quicksort). The backtracking method.

2. OOP concepts in programming languages (Python, C++, Java, C#): class and object, members of a class and access modifiers, constructors and destructors.

3. Derived classes and inheritance. Method overriding. Polymorphism. Dynamic binding. Abstract classes and interfaces.

4. UML class diagrams. Relationships between classes.

5. Lists, Maps. Specification of typical operations (without implementations).

6. Identifying data structures and data types suitable (efficient) for solving problems (only the data structures specified at point 5). The use of existing libraries for these structures (Python, Java, C++, C#).

Part 2. Databases

1. Relational databases. First three normal forms of a relation.

2. Querying databases using relational algebra operators.

3. Querying relational databases using SQL (Select).

Part 3. Operating systems

1. The structure of UNIX file systems

2. UNIX processes: creation; the fork, exec, exit, wait system calls; pipe and FIFO communication

3. Unix Shell Programming

a. Basic concepts: variables, control structures (if/then/elif/else/fi, for/done, while/do/done, shift, break, continue), predefined variables ($0, $1, ..., $9, $*, $@, $?), I/O redirections (|, >, >>, <, 2>, 2>>, 2>&1, the /dev/null file, backquotes ``)

b. Regular expressions

c. Basic commands (functioning and the effect of the specified arguments): cat, chmod (-R), cp (-r), cut (-d, -f), echo, expr, file, find (-name, -type), grep (-i, -q, -v), head (-n), ls (-l), mkdir (-p), mv, ps (-e, -f), pwd, read (-p), rm (-f, -r), sed (only the commands d, s, y), sleep, sort (-n, -r), tail (-n), test (numerical, string and file operators), true, uniq (-c), wc (-c, -l, -w), who


Content

1. ALGORITHMICS AND PROGRAMMING
1.1. SEARCHING AND SORTING
1.1.1. Searching
1.1.2. Merging
1.1.3. Internal sorting
1.1.4. The backtracking method
1.2. OOP CONCEPTS IN PROGRAMMING LANGUAGES
1.2.1. Classes
1.3. DERIVED CLASSES AND INHERITANCE
1.3.1. Theoretical basis
1.3.2. Declaration of derived classes
1.3.3. Virtual functions
1.3.4. Abstract classes
1.3.5. Interfaces
1.4. UML CLASS DIAGRAMS. RELATIONSHIPS BETWEEN CLASSES
1.4.1. Class diagrams
1.5. LISTS AND MAPS
1.5.1. Lists
1.5.2. Maps
1.6. PROPOSED PROBLEMS
2. DATABASES
2.1. RELATIONAL DATABASES. THE FIRST THREE NORMAL FORMS OF A RELATION
2.1.1. Relational model
2.1.2. First three normal forms of a relation
2.2. QUERYING DATABASES USING RELATIONAL ALGEBRA OPERATORS
2.3. QUERYING RELATIONAL DATABASES USING SQL (SELECT)
2.4. PROPOSED PROBLEMS
3. OPERATING SYSTEMS
3.1. THE STRUCTURE OF UNIX FILE SYSTEMS
3.1.1. Unix File System
3.1.2. File Types and File Systems
3.2. UNIX PROCESSES
3.2.1. Main System Calls for Process Management
3.2.2. Communicating between processes using pipe
3.2.3. Communicating between processes with FIFO
3.3. COMMAND FILE INTERPRETERS
3.3.1. Shell Command Interpreter Functioning
3.3.2. Shell Programming
3.4. PROPOSED PROBLEMS
3.5. UNIX SHELL PROGRAMMING EXAMPLES
4. GENERAL BIBLIOGRAPHY


1. Algorithmics and programming

1.1. Searching and sorting

1.1.1 Searching

The data are available in the internal memory, as a sequence of records. We search for a record having a given value for one of its fields, called the search key. If the search is successful, we obtain the position of the record in the given sequence.

We denote by k1, k2, ..., kn the record keys and by a the key value to be found. Our problem is thus to find the position p characterized by a = kp.

It is a usual practice to store the keys in increasing order. Consequently, in the following we will assume that

k1 < k2 < ... < kn.

Sometimes, when the keys are already sorted, we may be interested not only in finding the record with the requested key but also, if no such record exists, in knowing the insertion position of a new record with this key, such that the sort order is preserved.

We thus have the following specification for the searching problem:

Data: a, n, (ki, i=1..n);

Precondition: n ∈ ℕ, n ≥ 1, and k1 < k2 < ... < kn

Results: p;

Postcondition: (p=1 and a ≤ k1) or (p=n+1 and a > kn) or ((1 < p ≤ n) and (kp-1 < a ≤ kp)).

1.1.1.1 Sequential search

The first method is sequential search, where the keys are examined successively. We distinguish three cases: a ≤ k1, a > kn, and k1 < a ≤ kn; only the last case leads to the actual search.

Subalgorithm SearchSeq(a, n, K, p) is:   {n ∈ ℕ, n ≥ 1 and k1 < k2 < ... < kn}
{Search p such that: (p=1 and a ≤ k1) or (p=n+1 and a > kn) or ((1 < p ≤ n) and (kp-1 < a ≤ kp))}
  Let p := 0;   {Case "not yet found"}
  If a ≤ k1
    then p := 1
    else If a > kn
           then p := n+1
           else For i := 2; n do
                  If (p = 0) and (a ≤ ki) then p := i endif
                endfor
         endif
  endif
endsub


We remark that this method performs n-1 comparisons in the worst case, because the counter i takes all values from 2 to n. The n keys divide the real axis into n+1 intervals.

When a is between k1 and kn, the number of comparisons is still n-1; when a is outside the interval [k1, kn], at most two comparisons are made. So the average complexity has the same order of magnitude as the worst-case complexity.

There are many situations when this algorithm performs useless computations: once the key has been found, it is useless to continue the loop for the remaining values of i. It is therefore desirable to replace the for loop with a while loop. We obtain the second subalgorithm, described as follows.

Subalgorithm SearchSucc(a, n, K, p) is:   {n ∈ ℕ, n ≥ 1 and k1 < k2 < ... < kn}
{Search p such that: (p=1 and a ≤ k1) or (p=n+1 and a > kn) or ((1 < p ≤ n) and (kp-1 < a ≤ kp))}
  Let p := 1;
  If a > k1 then
    While p ≤ n and a > kp do p := p+1 endwh
  endif
endsub

The algorithm SearchSucc performs n comparisons in the worst case. On average, however, the number of comparisons is halved, so the average running-time complexity of SearchSucc has the same order as that of SearchSeq.
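As a quick illustration, here is a minimal C++ sketch of SearchSucc (the function name and the use of std::vector are our additions for exposition; the pseudocode above remains the reference):

#include <vector>

// Returns the 1-based insertion position p from the specification:
// p = 1 if a <= k1, p = n+1 if a > kn, otherwise kp-1 < a <= kp.
int searchSucc(int a, const std::vector<int>& k) {
    int n = (int)k.size();          // precondition: n >= 1, k sorted increasingly
    int p = 1;
    if (a > k[0])
        while (p <= n && a > k[p - 1]) ++p;
    return p;
}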

Note that the sequential search algorithm can also be applied if the keys are not sorted. In this case the specification becomes:

Data: a, n, (ki, i=1..n);

Precondition: n ∈ ℕ, n ≥ 1;

Results: p;

Postcondition: (p=n+1 and a ≠ ki, ∀ 1 ≤ i ≤ n) or ((1 ≤ p ≤ n) and (a = kp)).

The sequential search algorithm is described below:

Subalgorithm SeqSearch(a, n, K, p) is:   {n ∈ ℕ, n ≥ 1}
{Search p such that: (p=n+1 and a ≠ ki, ∀ 1 ≤ i ≤ n) or ((1 ≤ p ≤ n) and (a = kp))}
  Let p := 1;
  While p ≤ n and a ≠ kp do p := p+1 endwh
endsub

1.1.1.2 Binary search

Another method, called binary search, more efficient than the previous two, uses the “divide and conquer” technique with respect to working with the data. We start by considering the relation of the search key to the key of the element in the middle of the collection. Based on this check we continue our search in one of the two halves of the collection; we can thus successively halve the collection portion used for the search. Since we modify the size of the searched portion, we need to consider its ends as parameters of the search. The binary search is effectively realized with the call SearchBin(a, n, K, p). This subalgorithm is described below.

Subalgorithm SearchBin(a, n, K, p) is:   {n ∈ ℕ, n ≥ 1 and k1 < k2 < ... < kn}
{Search p such that: (p=1 and a ≤ k1) or (p=n+1 and a > kn) or ((1 < p ≤ n) and (kp-1 < a ≤ kp))}
  If a ≤ k1
    then p := 1
    else If a > kn
           then p := n+1
           else p := BinarySearch(a, K, 1, n)
         endif
  endif
endsub

Function BinarySearch(a, K, Left, Right) is:
  If Left ≥ Right-1
    then BinarySearch := Right
    else m := (Left+Right) Div 2;
         If a ≤ Km
           then BinarySearch := BinarySearch(a, K, Left, m)
           else BinarySearch := BinarySearch(a, K, m, Right)
         endif
  endif
endfunc

The variables Left and Right in the BinarySearch function described above represent the ends of the search interval, and m represents the middle of the interval. Using this method, in a collection with n elements, the search result is obtained after at most log2(n) comparisons. Thus, the worst-case time complexity is proportional to log2(n).

We remark that the function BinarySearch is a recursive function. We can easily remove the recursion, as we see in the following function:

Function BinarySearchN(a, K, Left, Right) is:
  While Right - Left > 1 do
    m := (Left+Right) Div 2;
    If a ≤ Km
      then Right := m
      else Left := m
    endif
  endwh
  BinarySearchN := Right
endfunc
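A C++ sketch of the iterative variant, under the same conventions (1-based positions over a 0-based std::vector; the names are ours):

#include <vector>

// BinarySearchN: on entry k[left] < a <= k[right] holds in 1-based positions.
int binarySearchN(int a, const std::vector<int>& k, int left, int right) {
    while (right - left > 1) {
        int m = (left + right) / 2;
        if (a <= k[m - 1]) right = m;   // answer lies in the left half
        else               left = m;    // answer lies in the right half
    }
    return right;
}

// Wrapper corresponding to SearchBin.
int searchBin(int a, const std::vector<int>& k) {
    int n = (int)k.size();              // precondition: n >= 1, k sorted
    if (a <= k[0])    return 1;
    if (a > k[n - 1]) return n + 1;
    return binarySearchN(a, k, 1, n);
}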


1.1.2 Merging

Let us consider two data sequences, sorted in increasing (or, possibly, decreasing) order by the values of a key. We need to build a sequence sorted in the same manner, whose elements are all elements of the two given sequences.

We mention that there are two types of merging: (1) merging that preserves duplicates (the resulting sequence keeps all elements of the two given sequences, even if they are not distinct) and (2) merging without duplicates (the resulting sequence keeps only the distinct elements).

In the following we consider merging with duplicates preserved; we leave it to the reader to write the algorithm for merging without duplicates.

We thus need an algorithm, to solve the problem with the following specification:

Data: m, (xi, i=1..m), n, (yi, i=1..n);

Precondition: x1 ≤ x2 ≤ ... ≤ xm and y1 ≤ y2 ≤ ... ≤ yn

Results: k, (zi, i=1..k);

Postcondition: k = m+n and z1 ≤ z2 ≤ ... ≤ zk and (z1, z2, ..., zk) is a permutation of the values (x1, ..., xm, y1, ..., yn).

A possible solution is to simply create the sequence Z by concatenating the sequences X and Y; this achieves the second part of the postcondition. We then sort the elements of Z and get the desired solution. This algorithm is correct but inefficient, and moreover is not useful in external sorting (see the next section). It is important to create the sorted sequence Z with a single traversal of the sequences X and Y. We achieve this with the following merging algorithm:

Subalgorithm Merge(m, X, n, Y, k, Z) is:
{X has m components, non-decreasingly sorted; the same for Y, with n components.}
{The m+n values are placed in Z, non-decreasingly sorted.}
  Let i := 1; j := 1; k := 0;
  While (i ≤ m) and (j ≤ n) do   {There are components to process in both X and Y}
    If xi ≤ yj
      then Call ADD(xi, k, Z)    {take from X}
           i := i+1
      else Call ADD(yj, k, Z)    {take from Y}
           j := j+1
    endif
  endwh
  While i ≤ m do                 {Components remain only in X}
    Call ADD(xi, k, Z)
    i := i+1                     {increase i}
  endwh
  While j ≤ n do                 {Components remain only in Y}
    Call ADD(yj, k, Z)
    j := j+1                     {increase j}
  endwh
endsub

We have used the subalgorithm ADD(val, k, Z), which places the value val at the end of the sequence Z of length k. This subalgorithm follows:

Subalgorithm ADD(val, k, Z) is:   {Add the value val}
  k := k+1;                       {at the end of vector Z,}
  zk := val;                      {which now has k components}
endsub

The overall running-time complexity of the Merge subalgorithm described above is Θ(m+n) [13]. The extra space complexity required by the Merge subalgorithm is Θ(1).
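A minimal C++ sketch of merging with duplicates preserved (vector-based; the names are ours):

#include <cstddef>
#include <vector>

// Merge two non-decreasing sequences into one non-decreasing sequence
// of length m+n, with a single traversal of X and Y.
std::vector<int> mergeSeq(const std::vector<int>& x, const std::vector<int>& y) {
    std::vector<int> z;
    z.reserve(x.size() + y.size());
    std::size_t i = 0, j = 0;
    while (i < x.size() && j < y.size())        // components left in both
        z.push_back(x[i] <= y[j] ? x[i++] : y[j++]);
    while (i < x.size()) z.push_back(x[i++]);   // rest of X
    while (j < y.size()) z.push_back(y[j++]);   // rest of Y
    return z;
}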

1.1.3 Internal sorting

Internal sorting is the operation of reorganizing the elements of a collection already available in the internal memory, in such a way that the record keys are sorted in increasing (or decreasing, if necessary) order.

From the point of view of algorithm complexity, the problem reduces to sorting the keys. So, the specification of the internal sorting problem is the following:

Data: n, K;   {K = (k1, k2, ..., kn)}

Precondition: ki ∈ ℝ, i = 1..n

Results: K';

Postcondition: K' is a permutation of K, having sorted elements, i.e. k'1 ≤ k'2 ≤ ... ≤ k'n.

1.1.3.1 Selection sort

The first technique, called Selection Sort, works by determining the element having the minimal (or maximal) key and swapping it with the first element. We then ignore the first element and repeat the procedure for the remaining elements, until all elements have been considered.

Subalgorithm SelectionSort(n, K) is:   {Permute the n components of}
{vector K such that k1 ≤ k2 ≤ ... ≤ kn}
  For i := 1; n-1 do
    Let ind := i;
    For j := i+1; n do
      If kj < kind then ind := j endif
    endfor
    If i < ind then t := ki; ki := kind; kind := t endif
  endfor
endsub

We remark that the total number of comparisons is (n-1)+(n-2)+...+2+1 = n(n-1)/2, independently of the input data. So the average computational complexity, as well as the worst-case computational complexity, is O(n²) [13].
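The same algorithm in C++, as a sketch (the names are ours):

#include <cstddef>
#include <utility>
#include <vector>

// SelectionSort: always n(n-1)/2 comparisons, at most n-1 swaps.
void selectionSort(std::vector<int>& k) {
    for (std::size_t i = 0; i + 1 < k.size(); ++i) {
        std::size_t ind = i;                     // position of the minimum
        for (std::size_t j = i + 1; j < k.size(); ++j)
            if (k[j] < k[ind]) ind = j;
        if (ind != i) std::swap(k[i], k[ind]);   // move the minimum to the front
    }
}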

1.1.3.2 Bubble sort

Another method, called BubbleSort, compares two consecutive elements, which, if not in the expected order relationship, are swapped. The comparison process ends when all pairs of consecutive elements are in the expected order relationship.

Subalgorithm BubbleSort(n, K) is:
  Repeat
    Let kod := 0;   {Hypothesis: "is sorted"}
    For i := 2; n do
      If ki-1 > ki then
        t := ki-1; ki-1 := ki; ki := t;
        kod := 1   {Not sorted yet!}
      endif
    endfor
  until kod = 0 endrep   {Sorted}
endsub

This algorithm performs (n-1)+(n-2)+...+2+1 = n(n-1)/2 comparisons in the worst case, so the time complexity is O(n²).

An optimized variant of BubbleSort is:

Subalgorithm BubbleSort(n, K) is:
  Let s := 0;
  Repeat
    Let kod := 0;   {Hypothesis: "is sorted"}
    For i := 2; n-s do
      If ki-1 > ki then
        t := ki-1; ki-1 := ki; ki := t;
        kod := 1   {Not sorted yet!}
      endif
    endfor
    Let s := s+1;
  until kod = 0 endrep   {Sorted}
endsub
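A C++ sketch of the optimized variant (a bool flag plays the role of kod; the names are ours):

#include <cstddef>
#include <utility>
#include <vector>

// Optimized BubbleSort: each pass settles the largest remaining element,
// so the scanned range shrinks by one position per pass (the s counter).
void bubbleSort(std::vector<int>& k) {
    std::size_t s = 0;
    bool swapped;
    do {
        swapped = false;                         // hypothesis: "is sorted"
        for (std::size_t i = 1; i + s < k.size(); ++i)
            if (k[i - 1] > k[i]) {
                std::swap(k[i - 1], k[i]);
                swapped = true;                  // not sorted yet
            }
        ++s;
    } while (swapped);
}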


1.1.3.3 Insertion Sort

The idea of this sorting method is that, while traversing the elements, we insert the current element at the right position in the subsequence of already sorted elements. In this way, the subsequence of already processed elements is kept sorted, and at the end of the traversal the whole sequence is sorted. This algorithm is called InsertionSort.

Subalgorithm InsertionSort(n, K) is:
  For i := 2; n do
    Let ind := i-1; a := ki;
    While ind > 0 and a < kind do
      kind+1 := kind;
      ind := ind-1
    endwh
    kind+1 := a
  endfor
endsub

This algorithm performs (n-1)+(n-2)+...+2+1 = n(n-1)/2 comparisons in the worst case, so the time complexity is O(n²).
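In C++, the same idea may be sketched as follows (the names are ours):

#include <cstddef>
#include <vector>

// InsertionSort: insert k[i] into the already sorted prefix k[0..i-1].
void insertionSort(std::vector<int>& k) {
    for (std::size_t i = 1; i < k.size(); ++i) {
        int a = k[i];
        std::size_t ind = i;
        while (ind > 0 && a < k[ind - 1]) {      // shift larger elements right
            k[ind] = k[ind - 1];
            --ind;
        }
        k[ind] = a;
    }
}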

1.1.3.4 Merge Sort

We are now ready to give the full details of the MergeSort subalgorithm.

A slightly modified version of the Merge subalgorithm, described in Section 1.1.2, is aimed at merging subsequences only. This version merges X[sx, …, dx] and Y[sy, …, dy] into Z[1,…,k].

Subalgorithm MergeSubSeq(sx, dx, X, sy, dy, Y, k, Z) is:
{X has components sx..dx increasingly sorted; the same for Y with components sy..dy.}
{All these values are placed in Z, increasingly sorted, Z having size k.}
  Let i := sx; j := sy; k := 0;
  While (i ≤ dx) and (j ≤ dy) do   {There are components to process in both X and Y}
    If xi ≤ yj
      then Call ADD(xi, k, Z)      {take from X}
           i := i+1
      else Call ADD(yj, k, Z)      {take from Y}
           j := j+1
    endif
  endwh
  While i ≤ dx do                  {Components remain only in X}
    Call ADD(xi, k, Z)
    i := i+1
  endwh
  While j ≤ dy do                  {Components remain only in Y}
    Call ADD(yj, k, Z)
    j := j+1
  endwh
endsub

The MergeSort algorithm for sorting a sequence S of n elements is based on a divide-and-conquer approach:

1. If S has at least two elements (nothing needs to be done if S has zero or one elements), let S1 and S2 be the subsequences of S each containing about half of the elements of S (i.e. S1 contains the first ⌈n/2⌉ elements and S2 contains the remaining ones).

2. Sort the sequences S1 and S2 using MergeSort.

3. Put the elements back into S by merging the sorted sequences S1 and S2 into one sorted sequence.

The MergeSort subalgorithm, to sort a sequence by merging sorted subsequences, is described hereby:

Subalgorithm MergeSort(n, A) is:
  Call MergeSortRec(1, n, A);
endsub

Subalgorithm MergeSortRec(Left, Right, A) is:
{Sort by merging the elements ALeft, ALeft+1, ..., ARight}
  If Left < Right then
    Let m := (Left+Right) div 2;
    Call MergeSortRec(Left, m, A);
    Call MergeSortRec(m+1, Right, A);
    Call MergeSubSeq(Left, m, A, m+1, Right, A, k, C);
    Let A[Left..Right] := C[1..k];
  endif
endsub

The algorithm recursively sorts the two halves of the sequence and then merges them into an auxiliary array C. Before returning, the array C is copied back onto the corresponding subsequence of A. The overall time complexity of MergeSort is Θ(n·log2(n)).

We can also observe that the extra space complexity of MergeSort is Θ(n), needed for merging the two sorted subarrays.
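A compact C++ sketch of the recursive scheme, reusing the mergeSeq sketch from Section 1.1.2 above (the names are ours):

#include <algorithm>
#include <vector>

std::vector<int> mergeSeq(const std::vector<int>&, const std::vector<int>&); // sketch above

// MergeSort over a[left..right] (inclusive): sort the halves recursively,
// merge them into a temporary array, copy the result back.
void mergeSortRec(std::vector<int>& a, int left, int right) {
    if (left >= right) return;                   // 0 or 1 elements: already sorted
    int m = (left + right) / 2;
    mergeSortRec(a, left, m);
    mergeSortRec(a, m + 1, right);
    std::vector<int> x(a.begin() + left,  a.begin() + m + 1);
    std::vector<int> y(a.begin() + m + 1, a.begin() + right + 1);
    std::vector<int> c = mergeSeq(x, y);         // the Θ(n) extra space
    std::copy(c.begin(), c.end(), a.begin() + left);
}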

1.1.3.5 Quicksort

Another, more efficient sorting method is described next. The method, called QuickSort, is based on the “divide and conquer” technique. The subsequence to be sorted is given through two input parameters: the lower and upper limits of the indices of the subsequence elements. The call that sorts the whole sequence is QuickSortRec(K, 1, n), where n is the number of records in the given collection. So,

Subalgorithm QuickSort(n, K) is:
  Call QuickSortRec(K, 1, n)
endsub


The procedure QuickSortRec(K, Left, Right) sorts the subsequence kLeft, kLeft+1, ..., kRight. Before the actual sort, the subsequence is partitioned in such a way that the value of the element kLeft (called the pivot) occupies its final position in the subsequence. If i is this position, the subsequence is rearranged such that the following condition is fulfilled:

kj ≤ ki ≤ kl, for Left ≤ j < i < l ≤ Right (*)

As soon as the partitioning is achieved, we only need to sort the subsequence kLeft, kLeft+1, ..., ki-1 using the recursive call QuickSortRec(K, Left, i-1), and then the subsequence ki+1, ..., kRight using the recursive call QuickSortRec(K, i+1, Right). Of course, we need to sort these subsequences only if they have at least two elements; a subsequence of one element is already sorted.

The procedure QuickSortRec is described below:

Subalgorithm QuickSortRec(K, Left, Right) is:
  Let i := Left; j := Right; a := ki;
  Repeat
    While kj ≥ a and (i < j) do j := j-1 endwh
    ki := kj;
    While ki ≤ a and (i < j) do i := i+1 endwh
    kj := ki;
  until i = j endrep
  Let ki := a;
  If Left < i-1 then Call QuickSortRec(K, Left, i-1) endif
  If i+1 < Right then Call QuickSortRec(K, i+1, Right) endif
endsub

The time complexity of the described algorithm is O(n²) in the worst case, but the average time complexity is O(n·log2(n)).
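A C++ sketch of the partitioning scheme above, with the first element as pivot (the names are ours; call quickSortRec(k, 0, (int)k.size() - 1) for a non-empty k):

#include <vector>

void quickSortRec(std::vector<int>& k, int left, int right) {
    int i = left, j = right;
    int a = k[i];                                // the pivot
    while (i < j) {
        while (k[j] >= a && i < j) --j;          // scan from the right
        k[i] = k[j];
        while (k[i] <= a && i < j) ++i;          // scan from the left
        k[j] = k[i];
    }
    k[i] = a;                                    // pivot in final position i
    if (left < i - 1)  quickSortRec(k, left, i - 1);
    if (i + 1 < right) quickSortRec(k, i + 1, right);
}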

1.1.4 The backtracking method

The backtracking method is applicable to search problems with several solutions. Its main disadvantage is its exponential running time. We first consider two examples and then give a few general algorithms for this method.

Problem 1. (Permutations generation) Let n be a natural number. Print all permutations of numbers 1, 2, ..., n.

A solution for the permutation generation problem in the particular case n = 3, is:


Subalgorithm Permutations1 is:
  For i1 := 1; 3 do
    For i2 := 1; 3 do
      For i3 := 1; 3 do
        Let possible := (i1, i2, i3)
        If the components of the array possible are distinct
          then Print possible
        endif
      endfor
    endfor
  endfor
endsub

[Figure 1.1: graphical representation of the Cartesian product {1, 2, 3}³ as a tree; each possible array is a path from the root to a leaf, and the labels x1, x2, x3 mark the levels.]

Let us discuss a few remarks on the subalgorithm Permutations1:

• It is not general: for the general case we cannot describe an algorithm with n nested for loops.

• The total number of checked arrays is 3³, and in the general case nⁿ. The checked possible arrays are graphically represented in Figure 1.1: each array is a path from the tree root to a leaf.

• The algorithm first assigns values to all components of the array possible and only afterwards checks whether the array is a permutation.

One way to improve the efficiency of this algorithm is to check certain conditions during the construction of the array, thus avoiding the construction of a complete array when we are certain it does not lead to a correct solution. For example, if the first component of the array is 1, then it is useless to assign the second component the value 1; if the second component has been assigned the value 2, it is useless to assign the third component the values 1 or 2. In this way, for a large n we avoid generating many arrays of the type (1, ...). For example, (1, 3, ...) is a potential array (it may lead to a solution), while the array (1, 1, ...) is definitely not. The array (1, 3, ...) satisfies the conditions required to continue (it may still lead to a solution): its components are distinct. The gray nodes in Figure 1.1 represent values that do not lead to a solution.


We will describe a general algorithm for the backtracking method and then show how it may be used to solve Problem 1 stated at the beginning of this section. To start, we state a few remarks and notations concerning the backtracking algorithm, applied to a problem whose solutions are represented on arrays of length not necessarily constant:

1. the solutions search space: S = S1 x S2 x ... x Sn;

2. possible is the array representing the solutions;

3. possible[1..k] ∈ S1 x S2 x ... x Sk is the subarray of solution candidates; it may or may not lead to a solution, i.e. it may or may not be extended to form a complete solution; the index k is the number of already constructed solution elements;

4. possible[1..k] is promising if it satisfies the conditions that may lead to a solution (Figure 1.2);

5. solution(n, k, possible) is a function checking whether the potential array possible[1..k] is a solution of the problem.

[Figure 1.2: the search space for the permutations problem.]

The search process may be seen in the following subalgorithm:

Subalgorithm Backtracking(n) is:   {draft version}
  Let k := 1;
  @ Initialize the search for index k (= 1)
  While k > 0 do
    {possible[1..k-1] is a solution candidate}
    @ Sequentially search for index k a value v that extends the subarray possible[1..k-1] such that possible[1..k] is still a solution candidate
    If the search is successful
      then Let possible[k] := v;   {possible[1..k] is a solution candidate}
           If solution(n, k, possible)   {found a solution; we stay on level k}
             then Print possible[1..k]
             else @ Initialize the search for index k+1   {a potential array}
                  Let k := k+1   {step forward to level k+1}
           endif
      else k := k-1   {step backward (backtrack to level k-1)}
    endif
  endwh
endsub

In order to write the final version of this algorithm we need to specify the non-standard elements. We thus need the Boolean function

condToContinue (k, possible, v)

that checks whether the subarray with the solution candidate possible[1..k-1], extended with the value v, is still a solution candidate.

Then, to initialize the search at level j, we need a way to select a fictitious element of the set Sj, indicating that no element of Sj has been selected yet. The function

init (j)

returns this fictive element.

In order to search for a value at level j, assuming the current value is not good, we need the Boolean function

next (j, v, new)

which returns true if a new value, denoted new, can be selected in Sj after the value v, and false when no other values exist in Sj, so no new choice can be made. With these notations, the subalgorithm becomes:

Subalgorithm Backtracking(n) is:   {final version}
  Let k := 1;
  possible[1] := init(1);
  While k > 0 do   {possible[1..k-1] is potential}
    Let found := false; v := possible[k];
    While next(k, v, new) and not found do
      Let v := new;
      If condToContinue(k, possible, v) then found := true endif
    endwh
    If found
      then Let possible[k] := v   {possible[1..k] is potential}
           If solution(n, k, possible)   {found a solution; we stay on level k}
             then Print possible[1..k]
             else Let k := k+1;   {a potential array}
                  possible[k] := init(k)   {step forward to level k+1}
           endif
      else k := k-1   {step backward (backtrack to level k-1)}
    endif
  endwh
endsub

The process of searching a value on level k and the functions condToContinue and solution are problem-dependent. For example, for the permutation generation problem, these functions are:

function init(k) is:
  init := 0
endfunc

function next(k, v, new) is:
  if v < n
    then next := true;
         new := v+1
    else next := false
  endif
endfunc

function condToContinue(k, possible, v) is:
  kod := true; i := 1;
  while kod and (i < k) do
    if possible[i] = v then kod := false endif
    i := i+1
  endwh
  condToContinue := kod
endfunc

function solution(n, k, possible) is:
  solution := (k = n)
endfunc

To conclude, we also provide the recursive version of the backtracking algorithm, described using the same helper functions:

Subalgorithm backtracking(n, k, possible) is:   {possible[1..k] is potential}
  if solution(n, k, possible)   {a solution; terminate the recursive call}
    then print possible[1..k]   {otherwise, stay on the same level}
    else for each value v possible for possible[k+1] do
           if condToContinue(k+1, possible, v)
             then possible[k+1] := v
                  backtracking(n, k+1, possible)   {step forward}
           endif
         endfor
  endif   {terminating the backtracking(n, k, possible) call}
endsub   {means a step backward}

The problem is solved by the call backtracking(n, 0, possible).
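For concreteness, here is a C++ sketch of the recursive version, specialized to the permutations problem (the names are ours; possible must have size n before the first call):

#include <iostream>
#include <vector>

// backtracking(n, k, possible): possible[0..k-1] is the candidate subarray.
void permutations(int n, int k, std::vector<int>& possible) {
    if (k == n) {                                // solution(n, k, possible)
        for (int x : possible) std::cout << x << ' ';
        std::cout << '\n';
        return;
    }
    for (int v = 1; v <= n; ++v) {               // values possible for level k+1
        bool ok = true;                          // condToContinue: v unused so far
        for (int i = 0; i < k; ++i)
            if (possible[i] == v) { ok = false; break; }
        if (ok) {
            possible[k] = v;                     // step forward
            permutations(n, k + 1, possible);
        }                                        // returning = step backward
    }
}

// Usage: std::vector<int> p(n); permutations(n, 0, p);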

1.2. OOP concepts in programming languages

1.2.1 Classes

1.2.1.1 Data protection in modular programming

In procedural programming, developing programs means using functions and procedures. In the C/C++ programming language, instead of separate functions and procedures, we have functions that return a value and functions that do not return a value. In large applications, however, some kind of data protection is desirable: only certain functions should have access to the problem data, namely the functions referring to that data. In modular programming, data protection may be achieved by using static memory allocation: if a datum declared in a file, outside any function, is declared static, then it can be used from the place where it was declared to the end of the file, but not outside it.

Let us consider the following example dealing with integer vector processing. Write a module for integer vector processing that contains functions for vector initialization, disposing of the occupied memory, raising the elements to the power two, and printing the vector elements. A possible implementation of this module is presented in the file vector1.cpp:

#include <iostream>

using namespace std;

static int* e; //vector elements
static int d;  //vector size

void init(int* e1, int d1) //initialization
{
    d = d1;
    e = new int[d];
    for(int i = 0; i < d; i++)
        e[i] = e1[i];
}

void destroy() //disposing occupied memory
{
    delete [] e;
}

void squared() //raising to the power two
{
    for(int i = 0; i < d; i++)
        e[i] *= e[i];
}

void print() //printing
{
    for(int i = 0; i < d; i++)
        cout << e[i] << ' ';
    cout << endl;
}


The module is individually compiled and an object file is produced. A main program example is presented in the file vector2.cpp:

extern void init(int*, int); //extern may be omitted
extern void destroy();
extern void squared();
extern void print();
//extern int* e;

int main() {
    int x[5] = {1, 2, 3, 4, 5};
    init(x, 5);
    squared();
    print();
    destroy();

    int y[] = {1, 2, 3, 4, 5, 6};
    init(y, 6);
    //e[1] = 10; error, data are protected
    squared();
    print();
    destroy();
    return 0;
}

Note that even though the main program uses two vectors, it cannot use them at the same time; for example, the module vector1.cpp cannot be extended to implement vector addition. In order to overcome this drawback, abstract data types have been introduced.

1.2.1.2 Abstract data types

Abstract data types enable a tighter binding between the problem data and the operations (functions) referring to these data. An abstract data type declaration is similar to a struct declaration which, apart from the data, also declares or defines functions referring to these data.

For example, in the integer vector case we can declare the abstract data type:

struct vect {
    int* e;
    int d;
    void init(int* e1, int d1);
    void destroy() { delete [] e; }
    void squared();
    void print();
};

The functions declared or defined within the struct will be called methods and the data will be called attributes. If a method is defined within the struct (like the destroy method from the previous example) then it is considered an inline method. If a method is defined outside the struct then the function name will be replaced by the abstract data type name followed by the scope resolution operator (::) and the method name. Thus the init, squared and print methods will be defined within the module as follows:

void vect::init(int* e1, int d1)
{
    d = d1;
    e = new int[d];
    for(int i = 0; i < d; i++)
        e[i] = e1[i];
}

void vect::squared()
{
    for(int i = 0; i < d; i++)
        e[i] *= e[i];
}

void vect::print()
{
    for(int i = 0; i < d; i++)
        cout << e[i] << ' ';
    cout << endl;
}

Even though the above approach achieves a tighter binding between the problem data and the functions referring to these data, the data are not protected: they can be accessed by any user-defined function, not only by the methods. This drawback may be overcome by using classes.

1.2.1.3 Class declaration

A class abstract data type is declared like a struct, but the keyword struct is replaced with class. As in the struct case, a class data type is referred to by the name following the keyword class (the class name). Data protection is achieved with the access modifiers private, protected and public, each followed by the character ':'. The private and protected access modifiers introduce protected members, while the public access modifier introduces unprotected ones. An access modifier is valid until the next access modifier occurs within the class, the default access modifier being private. Note that structs also allow the use of access modifiers, but in their case the default access modifier is public.

For example the vector class may be declared as follows:

class vector {
    int* e; //vector elements
    int d;  //vector size
public:
    vector(int* e1, int d1);
    ~vector() { delete [] e; }
    void squared();
    void print();
};

Note that the attributes e and d have been declared private (restricted access), while the methods have been declared public (unrestricted access). Of course, some attributes may be declared public and some methods private if the problem requires so. In general, private attributes can be accessed only by the methods of the class and by friend functions.

Another important remark regarding the above example is that attribute initialization and disposal of the occupied memory are done via special methods.

Data declared as having some class data type are called the class's objects, or simply objects. They are declared as follows:

class_name list_of_objects;

For example, a vector object is declared as follows:

vector v;

Object initialization is done with a special method called constructor. Objects are disposed by an automatic call of another special method called destructor. In the above example


vector(int* e1, int d1);

is a constructor and

~vector() { delete [] e; }

is a destructor.

Abstract data types of type struct may also be seen as classes where all elements have unrestricted access. The above constructor is declared inside the class, but it is not defined, while the destructor is defined inside the class. So the destructor is an inline function. In order to define methods outside a class, the scope resolution operator is used (like in the struct case).

1.2.1.4 Class members. The this pointer

In order to refer to class attributes or methods, the dot (.) or arrow (->) operator is used (as in the struct case). For example, if the following declarations are considered:

vector v;

vector* p;

then printing the vector v and the vector referred by the p pointer is done as follows:

v.print();

p->print();

However, inside methods, attributes and (other) methods may be referred to simply by their names; the dot (.) or arrow (->) operators are not needed. In fact, the compiler automatically generates a special pointer, the this pointer, at each method call and uses it to identify attributes and methods.

The this pointer is automatically declared as a pointer to the current object. In the example above, this is the address of the vector v and the address stored in the pointer p, respectively.

For example, if inside the print method an attribute d is used then it is interpreted as this->d.

The this pointer may also be used explicitly by the programmer.
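A minimal hypothetical illustration (the class and method names are ours): the two method bodies below are equivalent, the compiler inserting this-> implicitly in the first.

class counter {
    int d;
public:
    int size()         { return d; }        // d is read as this->d
    int sizeExplicit() { return this->d; }  // explicit use of this
};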

1.2.1.5. The constructor

Object initialization is done with a special method called constructor. The constructor's name has to be the same as the class name. A class may have multiple constructors; these methods have the same name, which is possible due to function overloading. Of course, the number and/or types of the formal parameters have to differ, otherwise the compiler cannot choose the correct constructor.


Constructors do not return any value; even the use of the keyword void is forbidden here.

In the following we show an example of a class having as attributes a person's last name and first name and a method for printing the person's whole name.

File person.h:

class person {
    char* lname;
    char* fname;
public:
    person();                   //default constructor
    person(char* ln, char* fn); //constructor
    person(const person& p1);   //copy constructor
    ~person();                  //destructor
    void print();
};

File person.cpp:

#include <iostream>

#include <cstring>

#include "person.h"

using namespace std;

person::person() {
    lname = new char[1];
    *lname = 0;
    fname = new char[1];
    *fname = 0;
    cout << "Calling default constructor." << endl;
}

person::person(char* ln, char* fn) {
    lname = new char[strlen(ln)+1];
    fname = new char[strlen(fn)+1];
    strcpy(lname, ln);
    strcpy(fname, fn);
    cout << "Calling constructor (lname, fname).\n";
}

person::person(const person& p1) {
    lname = new char[strlen(p1.lname)+1];
    strcpy(lname, p1.lname);
    fname = new char[strlen(p1.fname)+1];
    strcpy(fname, p1.fname);
    cout << "Calling copy constructor." << endl;
}

person::~person() {
    delete[] lname;
    delete[] fname;
}

void person::print() {
    cout << fname << ' ' << lname << endl;
}

File personTest.cpp:

#include "person.h"

int main() {

person A; //calling default constructor A.print();

person B("Stroustrup", "Bjarne");

B.print();

person *C = new person("Kernighan","Brian");

C->print();

delete C;

person D(B); //equivalent to person D = B;

//calling copy constructor D.print();

return 0;

}

We may notice the presence of two special types of constructors: the default constructor and the copy constructor. If a class has a constructor without any parameters, then this is called the default constructor. The copy constructor is used for initializing an object from another object of the same type (in the above example, a person having the same last and first name). The copy constructor is declared as follows:

class_name(const class_name& object);

The const keyword expresses the fact that the copy constructor's argument is not changed.

A class may contain attributes of other class types. Declaring the class as:

class class_name {
    class_name_1 ob_1;
    class_name_2 ob_2;
    ...
    class_name_n ob_n;
    ...
};

its constructor's header will have the following form:

class_name(argument_list):
    ob_1(l_arg_1), ob_2(l_arg_2), ..., ob_n(l_arg_n)

where argument_list is the list of formal parameters of class_name's constructor and l_arg_i is the argument list for the constructor of object ob_i.


From the list ob_1(l_arg_1), ob_2(l_arg_2), ..., ob_n(l_arg_n) one may omit the objects that have no user-defined constructors, the objects initialized by the default constructor, and the objects initialized by a constructor having only default arguments.

If a class contains attributes of another class type, then these attributes' constructors are called first, followed by the statements of this class's constructor.

File pair.cpp:

#include <iostream>
#include "person.h"

using namespace std;

class pair {
    person husband;
    person wife;
public:
    pair()  //default constructor definition:
    {       //the default constructors for the
    }       //objects husband and wife are called
    pair(person& ahusband, person& awife);
    pair(char* lname_husband, char* fname_husband,
         char* lname_wife, char* fname_wife):
        husband(lname_husband, fname_husband),
        wife(lname_wife, fname_wife)
    { }
    void print();
};

inline pair::pair(person& ahusband, person& awife):
    husband(ahusband), wife(awife)
{
}

void pair::print() {
    cout << "husband: ";
    husband.print();
    cout << "wife: ";
    wife.print();
}

int main() {
    person A("Pop", "Ion");
    person B("Popa", "Ioana");
    pair AB(A, B);
    AB.print();
    pair CD("C", "C", "D", "D");
    CD.print();
    pair EF;
    EF.print();
    return 0;
}


Note that in the second constructor the formal parameters ahusband and awife have been declared as references to type person. If they had been declared as formal parameters of type person, then in the following situation:

pair AB(A, B);

the copy constructor would have been called four times. In situations like this, temporary objects are first created using the copy constructor (two calls in this case), and then the constructors of the attributes having a class type are executed (another two calls).

1.2.1.6 The destructor

The destructor is the method called when an object is disposed of. The destructors of global objects are called automatically at the end of the main function, as part of the exit function; therefore, using the exit function in a destructor is not recommended, as it leads to an infinite loop. The destructors of local objects are executed automatically when the block in which these objects were defined ends. In the case of dynamically allocated objects, the destructor is usually called indirectly, via the delete operator (provided that the object was previously created using the new operator). The destructor may also be called explicitly; in this case the destructor name must be preceded by the class name and the scope resolution operator.

The destructor's name starts with the ~ character followed by the class name. As in the constructor case, the destructor does not return any value, and using the void keyword is forbidden. The destructor call in various situations is shown in the following example:

File destruct.cpp:

#include <iostream>
#include <cstring>

using namespace std;

class write { //writes on stdout what it does
    char* name;
public:
    write(char* n);
    ~write();
};

write::write(char* n) {
    name = new char[strlen(n)+1];
    strcpy(name, n);
    cout << "Created object: " << name << '\n';
}

write::~write() {
    cout << "Destroyed object: " << name << '\n';
    delete[] name;
}

void function()
{
    cout << "Call function" << '\n';
    write local("Local");
}

write global("Global");

int main() {
    write* dynamic = new write("Dynamic");
    function();
    cout << "In main" << '\n';
    delete dynamic;
    return 0;
}

1.3. Derived classes and inheritance

1.3.1 Theoretical basis

The use of abstract data types creates an ensemble for managing data and the operations on this data. By means of the class abstract type, data protection is also achieved: usually the protected elements can be accessed only by the methods of the given class. This property of objects is called encapsulation.

But in everyday life we do not see only separate objects; we also see different relationships among these objects and among the classes these objects belong to. In this way a class hierarchy is formed. The result is a second property of objects: inheritance. This means that all attributes and methods of the base class are inherited by the derived class, to which new members (both attributes and methods) can be added. If a derived class has more than one base class, we talk about multiple inheritance.

Another important property of objects belonging to a derived class is that methods can be overridden. This means that an operation related to objects belonging to the hierarchy has a single signature, but the methods that implement it can differ: the name and the list of formal parameters of the method are the same in both the base and the derived class, but the implementation can be different. Thus, methods in the derived class can be specific to that class, although the operation is identified by the same name. This property is called polymorphism.

1.3.2. Declaration of derived classes

A derived class is declared in the following way:

class name_of_derived_class : list_of_base_classes {
    //new attributes and methods
};

where list_of_base_classes has the form:

elem_1, elem_2, ..., elem_n

and each elem_i, 1 ≤ i ≤ n, can be

public base_class_i
or
protected base_class_i
or
private base_class_i

The public, protected and private keywords are called inheritance access modifiers in this situation, too. They can be missing, in which case the default modifier is private. Access to the elements of the derived class is presented in Table 1.

Access in the base class   Inheritance access modifier   Access in the derived class
public                     public                        public
protected                  public                        protected
private                    public                        inaccessible
public                     protected                     protected
protected                  protected                     protected
private                    protected                     inaccessible
public                     private                       private
protected                  private                       private
private                    private                       inaccessible

Table 1: Access to elements from the derived class

We can observe that private members of the base class are inaccessible in the derived class. Protected and public members become protected or private, respectively, if the inheritance access modifier is protected or private, and keep their access level if the inheritance access modifier is public. This is why, generally, attributes and methods are declared protected and the inheritance access modifier is public: they can thus be accessed, but remain protected in the derived class, too.

1.3.3. Virtual functions

Polymorphism leads naturally to the problem of determining the method that will be called for a given object. Let us consider the following example. We declare a base class, called base, and a class derived from it, called derived. The base class has two methods, method_1 and method_2, and method_2 calls method_1. In the derived class method_1 is overridden, but method_2 is not. In the main program an object of the derived class is declared and method_2, inherited from the base class, is called. In the C++ language, this example is written in the following way:

File virtual1.cpp:

#include <iostream>

using namespace std;

class base {
public:
    void method_1();
    void method_2();
};

class derived : public base {
public:
    void method_1();
};

void base::method_1() {
    cout << "Method method_1 of the"
         << " base class is called" << endl;
}

void base::method_2() {
    cout << "Method method_2 of the"
         << " base class is called" << endl;
    method_1();
}

void derived::method_1() {
    cout << "Method method_1 of the"
         << " derived class is called" << endl;
}

int main() {
    derived D;
    D.method_2();
}

Executing the code, we get the following result:

Method method_2 of the base class is called
Method method_1 of the base class is called

But this is not the desired result: in the main function method_2, inherited from the base class, was called, but the method_1 called by method_2 was determined at compile-time. Consequently, although method_1 was overridden in the derived class, the method from the base class was called, not the overridden one.

This shortcoming can be overcome by introducing the notion of virtual methods. If a method is virtual, then for every call of it the implementation corresponding to the class hierarchy is determined not at compile-time but at execution time, depending on the type of the object on which the call is made. This property is called dynamic binding; when a method call is resolved at compile-time, we talk about static binding.

We have seen that when the virtual1.cpp program is executed, the methods method_1 and method_2 from the base class are called. But method_1 being overridden in the derived class, we want the overridden method to be called instead of the one from the base class. This can be realised by declaring method_1 as a virtual method. Thus, for each call of method_1, the implementation that will be called is determined at execution time, not at compile-time; that is, method_1 is resolved through dynamic binding.

In the C++ language a method is declared virtual in the following way: in the declaration of the class, the header of the method will start with the keyword virtual.

If a method is declared virtual in the base class, then the methods overriding it will be considered virtual in all derived classes of the hierarchy.

For the above example the declaration of the base class is modified in the following way:

class base {
public:
    virtual void method_1();
    void method_2();
};

The result of the execution becomes:

Method method_2 of the base class is called
Method method_1 of the derived class is called

So, method_1 from the derived class is indeed called.

Further we present another example where the necessity of virtual methods appears. Let us define the class fraction for rational numbers, having as attributes the numerator and the denominator of the fraction. The class has a constructor (with default value 0 for the numerator and 1 for the denominator) and two methods: product, which computes the product of two fractions, and multiply, which multiplies the current object by a fraction given as parameter. The fraction class also has a method for displaying a rational number. Using fraction as base class, we define the derived class fraction_write, in which the product function is overridden so that, besides performing the multiplication, it displays the operation on stdout. The multiply method is not overridden, but the performed operation has to be displayed on the standard output in this case, too.

File fvirt1.cpp:

#include <iostream>

using namespace std;

class fraction {
protected:
    int numerator;
    int denominator;
public:
    fraction(int numerator1 = 0, int denominator1 = 1);
    fraction product(fraction& r); //computes the product of two
                                   //fractions, but does not simplify
    fraction& multiply(fraction& r);
    void display();
};

fraction::fraction(int numerator1, int denominator1) {
    numerator = numerator1;
    denominator = denominator1;
}

fraction fraction::product(fraction& r) {
    return fraction(numerator * r.numerator, denominator * r.denominator);
}

fraction& fraction::multiply(fraction& q) {
    *this = this->product(q);
    return *this;
}

void fraction::display() {
    if ( denominator )
        cout << numerator << " / " << denominator;
    else
        cerr << "Incorrect fraction";
}

class fraction_write: public fraction {
public:
    fraction_write(int numerator1 = 0, int denominator1 = 1);
    fraction product(fraction& r);
};

inline fraction_write::fraction_write(int numerator1, int denominator1)
    : fraction(numerator1, denominator1)
{ }

fraction fraction_write::product(fraction& q) {
    fraction r = fraction(*this).product(q);
    cout << "(";
    this->display();
    cout << ") * (";
    q.display();
    cout << ") = ";
    r.display();
    cout << endl;
    return r;
}

int main() {
    fraction p(3,4), q(5,2), r;
    r = p.multiply(q);
    p.display();
    cout << endl;
    return 0;
}
