Functional Programming in C LEAN

(1)

Functional Programming in C ^LEAN

D

RAFT

S

^EPTEMBER

2, 2002

P

^IETER

K

^OOPMAN

, R

^INUS

P

^LASMEIJER

,

M

ARKO VAN

E

EKELEN

, S

JAAK

S

METSERS

(2)

(3)

Functional Programming in C

^LEAN

Preface

Functional languages enable programmers to concentrate on the problem one would like to solve without being forced to worry too much about all kinds of uninteresting implementation details. A functional program can be regarded as an executable specification. Func- tional programming languages are therefore popular in educational and research environments. Functional languages are very suited for teaching students the first principles of programming. In research environments they are used for rapid prototyping of complex systems. Recent developments in the implementation techniques and new insights in the underlying concepts such as input/output handling make that modern functional languages can nowadays also be used successfully for the development of real world applications.

The purpose of this book is to teach practical programming skills using the state-of-the art pure functional language CONCURRENT CLEAN. CLEAN has many aspects in common with other modern functional languages like MIRANDA, HASKELL and ML. In addition CLEAN offers additional support for the development of stand-alone window based applications and it offers support for process communication and for the development of dis- tributed applications.

This book on functional programming using CLEAN is split into three parts.

In the first part an introduction into functional programming is given. In six Chapters we treat the basic aspects of functional programming, functions, data structures, the type system and I/O handling. The idea is that you are able to write simple functions and applications as soon as possible. The intention is not to treat all available features of CLEAN but only the most important ones that are most commonly used. A complete description of all available language constructs can be found in the CLEAN language manual.

The main emphasis of this book lies in the second part in which several case studies are presented. Each case treats a tiny, but complete application in an illustrative problem domain. The case studies include applications like a simple database, an object-oriented drawing program, a data compression utility, an interpreter for a functional language. Each case furthermore illustrates a certain aspect of functional programming. Some case applications are reused in others to illustrate the reusability of code.

In the third part of this book we discuss the different kinds of programming development techniques for functional programming and efficiency aspects are treated.

So, a lot of material is presented in this book. However, one certainly does not have to work through all case studies. Depending on the programming experience already acquired and the time available one can use this book as a textbook for or one or two semester course on functional programming. The book can be used as an introductory textbook for people with little programming experience. It can also be used for people who already have programming experience in other programming paradigm (imperative, object-oriented or logical) and now want to learn how to develop applications in a pure functional language.

We hope that you enjoy the book and that it will stimulate you to use a functional language for the development of your applications.

(4)

(5)

Introduction to Functional Programming 1 1.1 Functional languages 1 1.2 Programming with functions 2 1.2.1 The `Start' expression 2 1.2.2 Defining new functions 3 1.2.3 Program evaluation with functions 4 1.3 Standard functions 4 1.3.1 Names of functions and operators 4 1.3.2 Predefined functions on numbers 5 1.3.3 Predefined functions on Booleans 6 1.3.4 Predefined functions on lists 6 1.3.5 Predefined functions on functions 6 1.4 Defining functions 7 1.4.1 Definition by combination 7 1.4.2 Definition by cases 8 1.4.3 Definition using patterns 8 1.4.4 Definition by induction or recursion 9 1.4.5 Local definitions, scope and lay-out 10 1.4.6 Comments 12 1.5 Types 12 1.5.1 Sorts of errors 12 1.5.2 Typing of expressions 13 1.5.3 Polymorphism 14 1.5.4 Functions with more than one argument 15 1.5.5 Overloading 15 1.5.6 Type annotations and attributes 16 1.5.7 Well-formed Types 17 1.6 Synonym definitions 19 1.6.1 Global constant functions (CAF’s) 19 1.6.2 Macro’s and type synonyms 19 1.7 Modules 20 1.8 Overview 22 1.9 Exercises 22 Functions and Numbers 25 2.1 Operators 25 2.1.1 Operators as functions and vice versa 25 2.1.2 Priorities 25 2.1.3 Association 26 2.1.4 Definition of operators 27

(6)

iv FUNCTIONAL PROGRAMMING IN CLEAN

2.2 Partial parameterization 28 2.2.1 Currying of functions 28 2.3 Functions as argument 29 2.3.1 Functions on lists 30 2.3.2 Iteration 31 2.3.3 The lambda notation 32 2.3.4 Function composition 32 2.4 Numerical functions 33 2.4.1 Calculations with integers 33 2.4.2 Calculations with reals 37 2.5 Exercises 39 Data Structures 41 3.1 Lists 42 3.1.1 Structure of a list 42 3.1.2 Functions on lists 44 3.1.3 Higher order functions on lists 49 3.1.4 Sorting lists 51 3.1.5 List comprehensions 53 3.2 Infinite lists 55 3.2.1 Enumerating all numbers 55 3.2.2 Lazy evaluation 56 3.2.3 Functions generating infinite lists 57 3.2.4 Displaying a number as a list of characters 58 3.2.5 The list of all prime numbers 59 3.3 Tuples 60 3.3.1 Tuples and lists 62 3.4 Records 62 3.4.1 Rational numbers 64 3.5 Arrays 66 3.5.1 Array comprehensions 67 3.5.2 Lazy, strict and unboxed arrays 67 3.5.3 Array updates 68 3.5.4 Array patterns 68 3.6 Algebraic data types 69 3.6.1 Tree definitions 71 3.6.2 Search trees 72 3.6.3 Sorting using search trees 74 3.6.4 Deleting from search trees 75 3.7 Abstract data types 75 3.8 Correctness of programs 77 3.8.1 Direct proofs 77 3.8.2 Proof by case distinction 78 3.8.3 Proof by induction 79 3.8.4 Program synthesis 81 3.9 Run-time errors 82 3.9.1 Non-termination 83 3.9.2 Partial functions 83 3.9.3 Cyclic dependencies 84 3.9.4 Insufficient memory 85 3.10 Exercises 87

(7)

FUNCTIONAL PROGRAMMING IN CLEAN v

The Power of Types 89 4.1 Type Classes 89 4.1.2 A class for Rational Numbers 92 4.1.3 Internal overloading 94 4.1.4 Derived class members 94 4.1.5 Type constructor classes 95 4.2 Existential types 96 4.3 Uniqueness types 98 4.3.1 Graph Reduction 99 4.3.2 Destructive updating 101 4.3.4 Uniqueness information 102 4.3.5 Uniqueness typing 102 4.3.5 Nested scope style 105 4.3.6 Propagation of uniqueness 106 4.3.7 Uniqueness polymorphism 107 4.3.8 Attributed data types 108 4.3.9 Higher order uniqueness typing 109 4.3.10 Creating unique objects 110 4.4 Exercises 110 Interactive Programs 113 5.1 Changing Files in the World 113 5.1.1 Hello World 115 5.1.2 Tracing Program Execution 116 5.2 Environment Passing Techniques 117 5.2.1 Composing Functions with Results 118 5.2.2 Monadic Style 119 5.3 Handling Events 120 5.4 Dialogs and Menus 123 5.4.1 A Hello World Dialog 123 5.4.2 A File Copy Dialog 126 5.4.3 Function Test Dialogs 128 5.4.4 An Input Dialog for a Menu Function 131 5.4.5 General Notices 131 5.5 The Art of State 133 5.5.1 A Dialog with state 133 5.5.2 A Control with State 134 5.5.3 A reusable Control 135 5.5.4 Adding an Interface to the Counter 137 5.6 Windows 139 5.6.1 Hello World in a Window 140 5.6.2 Peano Curves 141 5.6.3 A Window to show Text 147 5.7 Timers 152 5.8 A Line Drawing Program 154 5.9 Exercises 161 Efficiency of Programs 163 6.1 Reasoning About Efficiency 163 6.1.1 Upper Bounds 164

(8)

vi FUNCTIONAL PROGRAMMING IN CLEAN

6.1.2 Under Bounds 165 6.1.3 Tight Upper Bounds 165 6.2 Counting Reduction Steps 166 6.2.1 Memorization 166 6.2.2 Determining the Complexity for Recursive Functions 168 6.2.3 Manipulation of Recursive Data Structures 169 6.2.4 Estimating the Average Complexity 171 6.2.5 Determining Upper Bounds and Under Bounds 173 6.3 Constant Factors 174 6.3.1 Generating a Pseudo Random List 175 6.3.2 Measurements 176 6.3.3 Other Ways to Speed Up Programs 177 6.4 Exploiting Strictness 179 6.5 Unboxed Values 180 6.6 The Cost of Currying 182 6.6.1 Folding to the Right or to the Left 184 6.7 Exercises 185 Program development 187 A.1 A program development strategy 187 A.1.1 Analysis 188 A.1.2 Design 189 A.1.3 Implementation 190 A.1.4 Validation 190 A.1.5 Reflection 190 A.2 Top-down versus Bottom up 191 A.3 Evolutionary development 191 A.4 Quality and reviews 192 Standard Environment 195 B.1 StdEnv 195 B.2 StdOverloaded 196 B.3 StdBool 197 B.4 StdInt 198 B.5 StdReal 199 B.6 StdChar 200 B.7 StdArray 200 B.8 StdString 201 B.9 StdFile 202 B.10 StdClass 204 B.11 StdList 205 B.12 StdOrdList 207 B.13 StdTuple 207 B.14 StdCharList 208 B.15 StdFunc 208 B.16 StdMisc 209 B.17 StdEnum 209 B.18 Dependencies between modules from StdEnv 210

(9)

FUNCTIONAL PROGRAMMING IN CLEAN vii

Gestructureerd programmeren in P1 en P2 211 C.1 Probleem analyse 212 C.2 Algoritme en datastructuren 213 C.3 Reflectie 214 C.4 Implementeren 214 C.5 Evaluatie 215 Index 217

(10)

(11)

Part I

Chapter 1 Introduction to Functional Programming

1.1 Functional languages

1.2 P r o g r a m m i n g w i t h f u n c t i o n s 1.3 S t a n d a r d f u n c t i o n s

1.4 D e f i n i n g f u n c t i o n s 1.5 Types

1.6 S y n o n y m d e f i n i t i o n s 1.7 Modules

1.8 O v e r v i e w 1.9 E x e r c i s e s

1.1 Functional languages

Many centuries before the advent of digital computers, functions have been used to de- scribe the relation between input and output of processes. Computer programs, too, are descriptions of the way a result can be computed, given some arguments. A natural way to write a computer program is therefore to define some functions and applying them to concrete values.

We need not to constrain ourselves to numeric functions. Functions can also be defined that have, e.g., sequences of numbers as argument. Also, the result of a function can be some compound structure. In this way, functions can be used to model processes with large, structured, input and output.

The first programming language based on the notion of functions was LISP, developed in the early 60s by John McCarthy. The name is an abbreviation of `list processor', which re- flects the fact that functions can operate on lists (sequences) of values. An important feature of the language was that functions themselves can be used as argument to other functions.

Experience with developing large programs has showed that the ability to check programs before they are ran is most useful. Apart from the syntactical correctness of a program, the compiler can check whether it actually makes sense to apply a given function to a particular argument. This is called type checking. For example, a program where the square root of a list is taken, is considered to be incorrectly typed and is therefore rejected by the compiler.

In the last decade, functional languages have been developed in which a type system ensures the type correctness of programs. Some examples are ML, MIRANDA, HASKELL, and CLEAN. As functions can be used as arguments of other functions, functions are `values' in some sense. The ability of defining functions operating on functions and having functions as a result (higher-order functions) is an important feature of these functional languages.

In this book, we will use the language CLEAN. Compared to the other languages mentioned above, CLEAN provides an accurate control over the exact manipulations needed to exe- cute a program. There is a library that offers easy access to functions manipulating the user interface in a platform independent way. Also, the type system is enriched with uniqueness types, making it possible for implementations to improve the efficiency of program execu-

(12)

2 FUNCTIONAL PROGRAMMING IN CLEAN

tion. Finally, the CLEAN development system is fast and generates very efficient applications.

1.2 Programming with functions

In a functional programming language like CLEAN one defines functions. The functions can be used in an expression, of which the value must be computed.

The CLEAN compiler is a program that translates a CLEAN program into an executable application. The execution of such an application consists of the evaluation of an indicated expression given the functions you have defined in the program.

1.2.1 The `Start' expression

The expression to be evaluated is named ^Start. By providing an appropriate definition for the function ^Start, you can evaluate the desired expression. For example:

Start = 5+2*3

When this ^Start expression is evaluated, the result of the evaluation, '¹¹', will be shown to the user. For the evaluation of the start expression, other functions have to be applied. In this case the operators ⁺ and ^*. The operators ⁺ and ^* are actually special functions which have been predefined in the standard library which is part of the CLEAN system.

The standard library consists of several modules. Each module is stored in a separate file.

Each module contains the definition of a collection of functions and operators that some- how belong to each other.

In the program you write you have to specify which of the predefined functions you would like to use in your program. For the time being you just simply add the line

import StdEnv

and all commonly used predefined functions from the standard library, called the standard environment, can be used. The program you write yourself is a kind of module as well. It therefore should have a name, say

module test

and be stored in a file which in that case must have the name ^test.icl. So, an example of a tiny but complete CLEAN program which can be translated by the compiler into an executable application is:

module test import StdEnv Start = 5+2*3

From now on the lines containing the module name and the import of the standard environment will not be written, but are assumed in all examples in this text.

In the library commonly used mathematical functions are available, such as the square root function. For example, when the start expression

Start = sqrt(2.0)

is evaluated, the value ^1.414214 is displayed to the user.

Functions are, of course, heavily used in a functional language. To reduce notational complexity in expressions, the parentheses around the argument of a function are commonly omitted. Thus, the expression below is also valid:

Start = sqrt 2.0

This is a digression from mathematical practice that juxtaposition of expressions indicates multiplication. In CLEAN multiplication must be written explicitly, using the ^* operator. As function application occurs far more often than multiplication in functional programming practice, this reduces notational burden. The following would be a correct ^Start expression:

(13)

I.1 INTRODUCTION TO FUNCTIONAL PROGRAMMING 3

Start = sin 0.3 * sin 0.3 + cos 0.3 * cos 0.3

A sequence of numbers can be put into a list in CLEAN. Lists are denoted with square brackets. There is a number of standard functions operating on lists:

Start = sum [1..10]

In this example ^[1..10] is the CLEAN notation for the list of numbers from ¹ to ¹⁰. The standard function ^sum can be applied to such a list to calculate the sum (⁵⁵) of those numbers. Just as with ^sqrt and ^sin the (round) parentheses are redundant when calling the function ^sum.

A list is one of the ways to compose data, making it possible to apply functions to large amounts of data. Lists can also be the result of a function. Execution of the program

Start = reverse [1..10]

will display the list ^[10,^9,^8,^7,^6,^5,^4,^3,^2,^1] to the user. The standard function ^reverse reverses the order of a list.

There are many more standard functions manipulating lists. What they do can often be guessed from the name: ^length determines the length of a list, ^sort sorts the elements of a list from small to large.

In a single expression, several functions can be combined. It is, for example, possible to first sort a list and then reverse it. The program

Start = reverse (sort [1,6,2,9,2,7])

will sort the numbers in the list, and then reverse the resulting list. The result ^[9,^7,^6,^2,^2,

1] is displayed to the user. As conventional in mathematical literature, ^g^(f^x) means that ^f should be applied to ^x and ^g should be applied to the result of that. The parentheses in this example are (even in CLEAN!) necessary, to indicate that ^(f^x) as a whole is an argument of

g.

1.2.2 Defining new functions

In a functional programming language it is possible to define new functions by yourself.

The function can be used like the predefined functions from the standard environment, in the ^Start expression and in other function definitions. Definitions of functions are always part of a module. Such a module is always stored in a file.

For instance, a function ^fac, which calculates the factorial of a number, can be defined.

The factorial of a number n is the product of all numbers between 1 and n. For example, the factorial of 4 is 1*2*3*4 = 24. The ^fac function and its use in the ^Start expression can be defined in a CLEAN program:

fac n = prod [1..n]

Start = fac 6

The value of the ^Start expression, ⁷²⁰, will be shown to the user.

Functions that are defined can be used in other functions as well. A function that can make use of the ^fac function is ^over. It calculates the number of possibilities in which k objects can be chosen from a collection of n objects. According to statistics literature this number equals

(_kⁿ) = ^n!

k! (n−k)!

These numbers are called binomial coefficients, (ⁿ_k) is pronounced as n over k. The definition can, just as with ^fac, be almost literally (^n! means the factorial of ⁿ) been written down in CLEAN:

over n k = fac n / (fac k * fac (n-k)) Start = over 10 3

(14)

The arguments appearing on the left-hand side of a function definition, like ⁿ and ^k in the function ^over, are called the formal arguments or formal parameters of the function. For using it, one applies a function with actual arguments (also called actual parameters). For example, on the right-hand side of the start expression the function ^over is applied to the actual arguments ³and ¹²⁰. The actual argument corresponding to ⁿ is ^3, and the actual argument corresponding to ^k is ¹²⁰.

When run, this program displays the number of ways a committee of three persons can be chosen from a group of ten people (¹²⁰).

Apart from functions, also constants may be defined. This might be useful for definitions like

pi = 3.1415926

Another example of a constant is ^Start, which must be defined in every program. In fact, constants are just functions without arguments.

1.2.3 Program evaluation with functions

So, a functional program generally consists of a collection of function definitions and one initial expression (the ^Start expression). The execution of a functional program starts with the evaluation of the initial expression (the ^Start expression). This initial expression is re- peatedly replaced by its result after evaluating a function application. This process of evaluation is called reduction. One step of the process, evaluating a single function application, is called a reduction step. This step consists of the replacement of a part of the expression which matches a function definition (this part is called the redex, a reducable expression) with (a copy of) the function definition in which for the formal arguments uni- formly the actual arguments are substituted. When the expression contains no redexes reduction cannot take place anymore: the expression is said to be in normal form. In principle, the normal form of the start expression is the result of the evaluation of the program.

Suppose we define a function as follows

extremelyUsefulFunction x = (x + 19) * x

A program using this function consists then of a start expression

Start = extremelyUsefulFunction 2

This expression will be reduced as follows (the arrow → indicates a reduction step, the redex which is reduced is underlined):

Start

→ extremelyUsefulFunction 2

→ (2 + 19) * 2

→ 21 * 2

→ 42

So, the result of evaluating this extremely useless program is ⁴². In other words, ⁴² is the normal form of extremelyUsefulFunction 2.

1.3 Standard functions

1.3.1 Names of functions and operators

In the CLEAN standard environment, a large number of standard functions is predefined.

We will discuss some of them in the subsections below.

The rules for names of functions are rather liberal. Function names start with a letter, followed by zero or more letters, digits, or the symbol ^_ or ^`. Both lower and upper case letters are allowed, and treated as distinct symbols. Some examples are:

(15)

f sum x3 Ab g` to_the_power_of AverageLengthOfTheDutchPopulation

The underscore sign is mostly used to make long names easier to read. Another way to achieve that is to start each word in the identifier with a capital. This is a common con- vention in many programming languages.

Numbers and back-quotes in a name can be used to emphasize the dependencies of some functions or parameters. However, this is only meant for the human reader. As far as the CLEAN compiler is concerned, the name ^x3 is as related to ^x2 as to ^qX`a_y. Although the names of functions and function arguments are completely irrelevant for the semantics (the meaning) of the program, it is important to choose these names carefully. A program with well-chosen names is much easier to understand and maintain than a program without meaningful names. A program with misleading names is even worse.

Another possibility to choose function names is combining one or more `funny' symbols from the set

~ @ # % ^ ? ! + - * < > \ / | & = :

Some examples of names that are allowed are:

+ ++ && || <= == <> . %

@@ -*- \/ /\ ... <+> ? :->

The names on the first of these two lines are defined in some of the standard modules. The operators on the second line are examples of other names that are allowed.

There is one exception to the choice of names. The following words are reserved for special purposes, and cannot be used as the name of a function:

Bool Char default definition derive case class code export from if implementation import in infix infixl infixr instance Int let module of otherwise special system where with

Also, the following symbol combinations are reserved, and may not be used as function name:

// \\ & : :: { } /* */ | ! & # #! . [ ] = =: :== => -> <- <-:

However, enough symbol combinations remain to attain some interesting graphic effects…

1.3.2 Predefined functions on numbers

There are two kinds of numbers available in CLEAN: Integer numbers, like ^17,⁰ and ^-3; Floating-point numbers, like ^2.5,^-7.81,^0.0,^1.2e3 and ^0.5e-2. The character ^e in floating- point numbers means `times ten to the power of'. For example ^1.2e3 denotes the number

1.2*10³=1200.0. The number ^0.5e-2 is in fact ^0.5*10^-2⁼^0.005.

In the standard modules ^StdInt and ^StdReal some functions and operators are defined on numbers. The four mathematical operators addition (⁺), subtraction (^-), multiplication (^*) and division (^/) can be used on both integer and real numbers, as in the programs

Start = 5-12

and

Start = 2.5 * 3.0

When dividing integer numbers, the fractional part is discarded:

Start = 19/4

displays the result ⁴ to the user. If exact division is required, real numbers must be used:

Start = 19.0/4.0

will show the result ^4.75. The arguments of an arithmetic operator must both be integer or both be real. The expression ^{1.5 + 2} is not accepted by the compiler. However, there are standard functions ^toInt and ^toReal that convert numbers to an integer or real, respectively.

Other standard functions on integer numbers include

abs the absolute value of a number

sign -1 for negative numbers, ⁰for zero, ¹ for positive numbers

gcd the greatest common divisor of two numbers

(16)

^ raising a number to a power

Some standard functions on real numbers are:

sqrt the square root function

sin the sine function

ln the natural logarithm

exp the exponential function (e-to-the-power-of) 1.3.3 Predefined functions on Booleans

The operator ^< determines whether a number is smaller than another number. The result is the constant ^True (if it is true) or the constant ^False (if it is false). For example, the value of

1<2 is ^True.

The values ^True and ^False are the only elements of the set of truth values or Boolean values (named after the English mathematician George Boole, who lived from 1815 till 1864).

Functions (and operators) resulting such a value are called Boolean functions or predicates.

Next to ^< there is also an operator ^> (greater than), an operator ^<= (smaller or equal to), and an operator ^>= (greater or equal to). Furthermore, there is the operator ⁼⁼ (equal to) and an operator ^<> (not equal to).

Results of Boolean functions can be combined with the operators ^&& (`and') and ^|| (`or').

The operator ^&& only returns ^True if the results left and right are true:

Start = 1<2 && 3<4

will show the result ^True to the user. The `or' operator needs only one of the two state- ments to be true (both may be true as well), so ¹⁼⁼¹^||²⁼⁼³ will yield ^True. There is a function ^not swapping ^True and ^False. Furthermore there is a function ^isEven which checks whether an integer number is even.

1.3.4 Predefined functions on lists

In the standard module ^StdList a number of functions operating on lists is defined. Some functions on lists have already been discussed: ^length determines the length of a list, ^sum calculates the sum of a list of whole numbers.

The operator ⁺⁺ concatenates two lists. For example,

Start = [1,2] ++ [3,4,5]

will show the list [1,2,3,4,5].

The function ând operates on a list of which the elements are Booleans; ând checks if all the elements in the list are ^True. For example, the expression ând^[1<2,^2<3,^1==0] returns ^False. Some functions have two parameters. The function ^take operates on a number and a list. If the number is ^n, the function will return the first ⁿ elements of the list. For example, ^take³

[2..10] returns the list ^[2,3,4].

1.3.5 Predefined functions on functions

In the functions discussed so far, the parameters were numbers, Booleans or lists. Howe- ver, the argument of a function can be a function itself, too! An example of that is the function ^map, which takes two arguments: a function and a list. The function ^map applies the argument function to all the elements of the list. It is defined in the standard module

StdList.

Some examples of the use of the ^map functions are:

Start = map fac [1,2,3,4,5]

Applying the ^fac function to all five numbers, this shows the list [1,2,6,24,120]. Running the program

Start = map sqrt [1.0,2.0,3.0,4.0]

(17)

shows the list ^[1.0,^1.41421,^1.73205,^2.0], and the program

Start = map isEven [1..8]

checks all eight numbers for even-ness, yielding a list of Boolean values: ^[False,^True,^False,

True,False,True,False,True]. 1.4 Defining functions 1.4.1 Definition by combination

The easiest way to define functions is by combining other functions and operators, for example by applying predefined functions that have been imported:

fac n = prod [1..n]

square x = x * x

Functions can also have more than one argument:

over n k = fac n / (fac k * fac (n-k)) roots a b c = [ (~b+sqrt(b*b-4.0*a*c)) / (2.0*a) , (~b-sqrt(b*b-4.0*a*c)) / (2.0*a) ]

The operator ~ negates its argument. See 1.3.2 and Chapter 2 for the difference between ² and ^2.0.

Functions without arguments are commonly known as `constants':

pi = 3.1415926535 e = exp 1.0

These examples illustrate the general form of a function definition:

• the name of the function being defined

• names for the formal arguments (if there are any)

• the symbol ⁼

• an expression, in which the arguments may be used, together with functions (either functions imported form another module or defined elsewhere in the module) and elementary values (like ⁴²).

In definitions of functions with a Boolean result, at the right hand side of the ⁼-symbol an expression of Boolean value is given:

negative x = x < 0 positive x = x > 0 isZero x = x == 0

Note, in this last definition, the difference between the ⁼-symbol and the ⁼⁼-operator. A single `equals'-symbol (⁼) separates the left-hand side from the right-hand side in function definitions. A double `equals'-symbol (==) is an operator with a Boolean result, just as ^<

and ^>.

In the definition of the ^roots function in the example above, the expressions ^sqrt(b*b-

4.0*a*c) and ^(2.0*a) occur twice. Apart from being boring to type, evaluation of this kind of expression is needlessly time-consuming: the identical subexpressions are evaluated twice.

To prevent this, in CLEAN it is possible to name a subexpression, and denote it only once.

A more efficient definition of the function ^roots would be:

roots a b c = [ (~b+s)/d , (~b-s)/d ]

where

s = sqrt (b*b-4.0*a*c) d = 2.0*a

The word ^where is not the name of a function. It is one of the `reserved words' that where mentioned in subsection 1.3.1. Following the word ^where in the definition, again some definitions are given. In this case the constants ^s and ^d are defined. These constants may be used in the expression preceding the ^where. They cannot be used elsewhere in the program;

(18)

they are local definitions. You may wonder why ^s and ^d are called `constants', although their value can be different on different uses of the ^roots function. The word `constants' is justi- fied however, as the value of the constants is fixed during each invocation of ^roots. 1.4.2 Definition by cases

In some occasions it is necessary to distinguish a number of cases in a function definition.

The function that calculates the absolute value of a number is an example of this: for negative arguments the result is calculated differently than for positive arguments. In CLEAN, this is written as follows:

abs x

| x<0 = ~x | x>=0 = x

You may also distinguish more than two cases, as in the definition of the ^signum function below:

signum x | x>0 = 1 | x==0 = 0 | x<0 = -1

The expressions in the three cases are `guarded' by Boolean expressions, which are therefore called guards. When a function that is defined using guarded expressions is called, the guards are tried one by one, in the order they are written in the program. For the first guard that evaluates to ^True, the expression at the right hand side of the =-symbol is evaluated.

Because the guards are tried in textual order, you may write ^True instead of the last guard.

For clarity, you can also use the keyword ^otherwise.

abs x

| x<0 = ~x | otherwise = x

A guard which yields always true, like ^True or ^otherwise, is in principle superfluous and may be omitted.

abs x

| x<0 = ~x = x

The description of allowed forms of function definition (the `syntax' of a function definition) is therefore more complicated than was suggested in the previous subsection. A more adequate description of a function definition is:

• the name of the function;

• the names of zero or more arguments;

• an ⁼-symbol and an expression, or: one ore more `guarded expressions';

• (optional:) the word ^where followed by local definitions.

A `guarded expression' consists of a ^|-symbol, a Boolean expression, a ⁼-symbol, and an expression. But still, this description of the syntax of a function definition is not complete…

1.4.3 Definition using patterns

Until now, we used only variable names as formal arguments. In most programming languages, formal arguments may only be variables. But in CLEAN, there are other possibilities: a formal argument may also be a pattern.

An example of a function definition in which a pattern is used as a formal argument is

h [1,x,y] = x+y

This function can only be applied to lists with exactly three elements, of which the first must be ¹. Of such a list, the second and third elements are added. Thus, the function is not defined for shorter and longer list, nor for lists of which the first element is not ¹. It is a common phenomenon that functions are not defined for all possible arguments. For ex-

(19)

ample, the ^sqrt function is not defined for negative numbers, and the ^/ operator is not defined for ⁰ as its second argument. These functions are called partial functions.

You can define functions with different patterns as formal argument:

sum [] = 0 sum [x] = x sum [x,y] = x+y sum [x,y,z] = x+y+z

This function can be applied to lists with zero, one, two or three elements (in the next subsection this definition is extended to lists of arbitrary length). In each case, the elements of the list are added. On use of this function, it is checked whether the actual argument

`matches' one of the patterns. Again, the definitions are tried in textual order. For example, the call ^sum^[3,4] matches the pattern in the third line of the definition: The ³ corresponds to the ^x and the ⁴ to the ^y.

As a pattern, the following constructions are allowed:

• numbers (e.g. ³);

• the Boolean constants ^Trueand ^False;

• names (e.g. ^x);

• list enumeration's, of which the elements must be patterns (e.g. ^[1,x,y]);

• lists patterns in which a distinction is made between the first element and the rest of the list (e.g. ^[a:b]).

Using patterns, we could for example define the logical conjunction of two Boolean functions:

AND False False = False AND False True = False AND True False = False AND True True = True

By naming the first element of a list, two useful functions can be defined, as is done in the module ^StdList:

hd [x:y] = x tl [x:y] = y

The function ^hd returns the first element of a list (its `head'), while the function ^tl returns all but the first element (the `tail' of the list). These functions can be applied to almost all lists. They are not defined, however, for the empty list (a list without elements): an empty list has no first element, let alone a `tail'. This makes ^hd and ^tl partial functions.

Note the difference in the patterns (and expressions) ^[x:y] and ^[x,y]. The pattern ^[x:y] denotes a list with first element (head) ^x and rest (tail) ^y. This tail can be any list, including the empty list ^[]. The pattern ^[x,y] denotes a list of exactly two elements, the first one is called

x, and the other one ^y.

1.4.4 Definition by induction or recursion

In definitions of functions, other functions may be used. But also the function being defined may be used in it's own definition! A function which is used in its own definition is called a recursive function (because its name `re-(oc)curs' in its definition). Here is an example of a recursive definition:

fac n | n==0 = 1

| n>0 = n * fac (n-1)

The name of the function being defined (^fac) occurs in the defining expression on the right hand side of the ⁼-symbol.

Another example of a recursively defined function is `raising to an integer power'. It can be defined as:

power x n

(20)

| n==0 = 1

| n>0 = x * power x (n-1)

Also, functions operating on lists can be recursive. In the previous subsection we intro- duced a function to determine the length of some lists. Using recursion we can define a function ^sum for lists of arbitrary length:

sum list

| list == [] = 0

| otherwise = hd list + sum (tl list)

Using patterns we can also define this function in a much more readable way:

sum [] = 0

sum [first: rest] = first + sum rest

Using patterns, you can give the relevant parts of the list a name directly (like ^first and ^rest in this example). In the definition that uses guarded expressions to distinguish the cases, auxiliary functions ^hd and ^tl are necessary.

Using patterns, we can define a function ^length that operates on lists:

length [] = 0

length [first:rest] = 1 + length rest

The value of the first element is not used (only the fact that a first element exists). For cases like this, it is allowed to use the ‘^_’ symbol instead of an identifier:

length [] = 0

length [_:rest] = 1 + length rest

Recursive functions are generally used with two restrictions:

• for a base case there is a non-recursive definition;

• the actual argument of the recursive call is closer to the base case (e.g., numerically smaller, or a shorter list) than the formal argument of the function being defined.

In the definition of ^fac given above, the base case is ⁿ⁼⁼⁰; in this case the result can be determined directly (without using the function recursively). In the case that ^n>0, there is a recursive call, namely ^fac ^(n-1). The argument in the recursive call (^n-1) is, as required, smaller than ⁿ.

For lists, the recursive call must have a shorter list as argumen. There should be a non- recursive definition for some finite list, usually the empty list.

1.4.5 Local definitions, scope and lay-out

If you want to define a function to solve a certain problem you often need to define a number of additional functions each solving a part of the original problem. Functions following the keyword ^where are locally defined which means that they only have a meaning within the surrounding function. It is a good habit to define functions that are only used in a particular function definition, locally to the function they belong. In this way you make it clear to the reader that these functions are not used elsewhere in the program. The scope of a definition is the piece of program text where the definition can be used. The box in figure 1.1 shows the scope of a local function definition, i.e. the area in which the locally defined function is known and can be applied. The figure also shows the scope of the arguments of a function. If a name of a function or argument is used in an expression one has to look for a corresponding definition in the smallest surrounding scope (box). If the name is not defined there one has to look for a definition in the nearest surrounding scope and so on.

function args

| guard1 = expression1

| guard2 = expression2 where

function args = expression

Figure 1.1: Defining functions and values locally for a function alternative.

(21)

With a let statement one can locally define new functions which only have a meaning within a certain expression.

roots a b c = let s = sqrt (b*b-4.0*a*c) d = 2.0*a

in [(~b+s)/d , (~b-s)/d ]

A let statement is allowed in any expression on the right-hand side of a function or value definition. The scope of a let expression is illustrated in Figure 1.2.

let function args = expression in expression

Figure 1.2: Defining functions and values locally for a certain expression.

Layout

On most places in the program extra whitespace is allowed, to make the program more readable for humans. In the examples above, for example, extra spaces have been added in order to align the ⁼-symbols. Of course, no extra whitespace is allowed in the middle of an identifier or a number: ^len^gth is different from ^length, and ¹⁷ is different from ¹⁷.

Also, newlines can be added in most places. We did so in the definition of the ^roots function, because the line would be very long otherwise. However, unlike most other programming languages, newlines are not entirely meaningless. Compare these two ^where-expressions:

where where

a = f x y a = f x b = g z y b = g z

The place where the new line is inserted (between the ^y and the ^b, or between the ^x and the

y) does make a difference: in the first situation ^a and ^b are defined while in the second situation ^a and ^y are defined (^y has ^b as formal argument).

The CLEAN compiler uses the criteria below for determining which text groups together:

• a line that is indented exactly as much as the previous line, is considered to be a new definition;

• a line that is indented more belongs to the expression on the previous line;

• a line that is indented less does not belong to the same group of definitions any more.

The CLEAN compiler assumes that you use a fixed width font. The default tab-size is 4.

The third rule is necessary only when ^where-constructions are nested, as in:

f x y = g (x+w) where

g u = u + v where

v = u * u w = 2 + y

Here, ^w is a local definition of ^f, not of ^g. This is because the definition of ^w is indented less than the definition of ^v; therefore it doesn't belong to the local definitions of ^g. If it would be indented even less, it would not be a local definition of ^f anymore as well. This would result in an error message, because ^y is not defined outside the function ^f and its local definitions.

All this is rather complicated to explain, but in practice everything works fine if you adhere to the rule:

Definitions on the same level should be indented the same amount.

This is also true for global definitions, the global level starts at the very beginning of a line.

(22)

Although programs using this layout rule are syntactically appealing, it is allowed to define the scope of definitions explicitly. For example:

f x y = g (x+w) where { g u =u + v where { v = u * u };

w = 2 + y };

This form of layout cannot be mixed with the layout rule within a single module. When there is a semicolon after the module name on the first line of the module the scope of definitions in this module should be indicated by the symbols ^{ and ^}, otherwise the layout rule is used. The semicolons to separate definitions might be written when the layout rule is used and ought to be written otherwise.

1.4.6 Comments

On all places in the program where extra whitespace is allowed (that is, almost everywhere) comments may be added. Comments are neglected by the compiler, but serve to elucidate the text for human readers. There are two ways to mark text as comment:

• with symbols ^// a comment is marked that extends to the end of the line

• with symbols ^/* a comment is marked that extends to the matching symbols ^*/.

Comments that are built in the second way may be nested, that is contain a comment themselves. The comment is finished only when every ^/* is closed with a matching ^*/. For example in

/* /* hello */ f x = 3 */

There is no function ^f defined: everything is comment.

1.5 Types

All language elements in CLEAN have a type. These types are used to group data of the same kind. We have seen some integers, like ⁰, ¹ and ⁴². Another kind of values are the Boolean values ^True and ^False. The type system of CLEAN prevents that these different kinds of data are mixed. The type system of CLEAN assigns a type to each and every element in the language. This implies that basic values have a type, compound datatypes have a type and functions have a type. The types given to the formal arguments of a function specify the domain the function is defined on. The type given to the function result specifies the range (co-domain) of the function.

The language CLEAN, and many (but not all) other functional languages, have a static type system. This means that the compiler checks that type conflicts cannot occur during pro- gram execution. This is done by assigning types to all function definitions in the program.

1.5.1 Sorts of errors

To err is human, especially when writing programs. Fortunately, the compiler can warn for some errors. If a function definition does not conform to the syntax, this is reported by the compiler. For example, when you try to compile the following definition:

isZero x = x=0

the compiler will complain: the second ⁼ should have been a ⁼⁼. Since the compiler does not know your intention, it can only indicate that there is something wrong. In this case the error message is (the part ^[…] indicates the file and line where the error is found):

Parse error [...]: Unexpected token in input: definition expected instead of =

Other examples of parse and syntax errors that are detected by the compiler are expressions in which not every opening parenthesis has a matching closing one, or the use of reserved words (such as ^where) in places where this is not allowed.

(23)

Also wrong use of the layout rule, or expecting a layout rule while you switched it off by writing a semicolon after the module name causes syntax errors.

A second sort of errors for which the compiler can warn is the use of functions that are neither defined nor included from another module. For example, if you define, say on line

20 of a CLEAN module called ^test.icl

Start = Sqrt 5.0

the compiler notices that the function ^Sqrt was never defined (if the function in the module

StdReal was intended, it should have been spelled ^sqrt). The compiler reports:

Error [test.icl,20,Start]: Sqrt undefined

The next check the compiler does is type checking. Here it is checked whether functions are only used on values that they were intended to operate on. For example, functions which operate on numbers may not be applied to Boolean values, neither to lists. Functions which operate on lists, like ^length, may in turn not be applied to numbers, and so on.

If in an expression the term ^1+True occurs, the compiler will complain:

Type error […]: "argument 1 of +" cannot unify Bool with Int

The ^[…] replaces the indication of the file, the line and the function of the location where the error was detected. Another example of an error message occurs when the function

reverse is applied to anything but a list, as in ^reverse³:

Type error […]: "argument 1 of reverse" cannot unify [v1] with Int

The compiler uses a technique called unification to verify that, in any application, the actual types match the corresponding types of the formal arguments. This explains the term 'unify' in the type error messages if such a matching fails. Only when a program is free of type errors, the compiler can generate code for the program. When there are type errors, there is no program to be executed.

In strongly typed languages like CLEAN, all errors in the type of expressions are detected by the compiler. Thus, a program that survives checking by the compiler is guaranteed to be type-error free. In other languages only a part of the type correctness can be checked at compile time. In these languages a part of the type checks are done during the execution of the generated application when function is actually applied. Hence, parts of the program that are not used in the current execution of the program are not checked for type consis- tency. In those languages you can never be sure that at run time no type errors will pop up.

Extensive testing is needed to achieve some confidence in the type correctness of the program. There are even language implementations where all type checks are delayed until program execution.

Surviving the type check of the compiler does not imply that the program is correct. If you used multiplication instead of addition in the definition of ^sum, the compiler will not complain about it: it has no knowledge of the intentions of the programmer. These kind of errors, called `logical errors', are among the hardest to find, because the compiler does not warn you for them.

1.5.2 Typing of expressions

Every expression has a type. The type of a constant or function that is defined can be spec- ified in the program. For example:

Start :: Int Start = 3+4

The symbol ^:: can be pronounced as `is of type'.

There are four basic types:

• ^Int: the type of the integer numbers (also negative ones);

• ^Real: the type of floating-point numbers (an approximation of the Real numbers);

• ^Bool: the type of the Boolean values ^True and ^False;

(24)

• ^Char: the type of letters, digits and symbols as they appear on the keyboard of the computer.

In many programming languages string, sequence of ^Char, is a predefined or basic type.

Some functional programming languages use a list of Char as representation for string. For efficiency reasons Clean uses an unboxed array of ^Char, ^{#Char}, as representation of strings. See below.

Lists can have various types. There exist lists of integers, lists of Boolean values, and even lists of lists of integers. All these types are different:

x :: [Int]

x = [1,2,3]

y :: [Bool]

y = [True,False]

z :: [[Int]]

z = [[1,2],[3,4,5]]

The type of a list is denoted by the type of its elements, enclosed in square brackets: ^[Int] is the type of lists of integers. All elements of a list must have the same type. If not, the compiler will complain.

Not only constants, but also functions have a type. The type of a function is determined by the types of its arguments and its result. For example, the type of the function ^sum is:

sum :: [Int] -> Int

That is, the function sum operates on lists of integers and yields an integer. The symbol ^->

in the type might remind you of the arrow symbol (→) that is used in mathematics. More examples of types of functions are:

sqrt :: Real -> Real isEven :: Int -> Bool

A way to pronounce lines like this is `îsEven is of type Înt to ^Bool' or 'îsEven is a function from Înt to ^Bool'.

Functions can, just as numbers, Booleans and lists, be used as elements of a list as well.

Functions occurring in one list should be of the same type, because elements of a list must be of the same type. An example is:

trigs :: [Real->Real]

trigs = [sin,cos,tan]

The compiler is able to determine the type of a function automatically. It does so when type checking a program. So, if one defines a function, it is allowed to leave out its type definition. But, although a type declaration is strictly speaking superfluous, it has two ad- vantages to specify a type explicitly in the program:

• the compiler checks whether the function indeed has the type intended by the programmer;

• the program is easier to understand for a human reader.

It is considered a very good habit to supply types for all important functions that you define. The declaration of the type has to precede to the function definition.

1.5.3 Polymorphism

For some functions on lists the concrete type of the elements of the list is immaterial. The function ^length, for example, can count the elements of a list of integers, but also of a list of Booleans, and –why not– a list of functions or a list of lists. The type of length is denoted as:

length :: [a] -> Int

(25)

This type indicates that the function has a list as argument, but that the concrete type of the elements of the list is not fixed. To indicate this, a type variable is written, ^a in the example.

Unlike concrete types, like ^Int and ^Bool, type variables are written in lower case.

The function ^hd, yielding the first element of a list, has as type:

hd :: [a] -> a

This function, too, operates on lists of any type. The result of ^hd, however, is of the same type as the elements of the list (because it is the first element of the list). Therefore, to hold the place of the result, the same type variable is used.

A type which contains type variables is called a polymorphic type (literally: a type of many shapes). Functions with a polymorphic type are called polymorphic functions, and a language allowing polymorphic functions (such as CLEAN) is called a polymorphic language.

Polymorphic functions, like ^length and ^hd, have in common that they only need to know the structure of the arguments. A non-polymorphic function, such as ^sum, also uses proper- ties of the elements, like `addibility'. Polymorphic functions can be used in many different situations. Therefore, a lot of the functions in the standard modules are polymorphic.

Not only functions on lists can be polymorphic. The simplest polymorphic function is the identity function (the function that yields its argument unchanged):

id :: a -> a id x = x

The function îd can operate on values of any type (yielding a result of the same type). So it can be applied to a number, as in îd^3, but also to a Boolean value, as in îd^True. It can also be applied to lists of Booleans, as in îd [True,False] or lists of lists of integers: îd

[[1,2,3],[4,5]]. The function can even be applied to functions: îd^sqrt or îd^sum. The argument may be of any type, even the type â->a. Therefore the function may also be applied to itself: îdîd.

1.5.4 Functions with more than one argument

Functions with more arguments have a type, too. All the types of the arguments are listed before the arrow. The function ^over from subsection 1.4.1 has type:

over :: Int Int -> Int

The function ^roots from the same subsection has three floating-point numbers as arguments and a list of floats as result:

roots :: Real Real Real -> [Real]

Operators, too, have a type. After all, operators are just functions written between the arguments instead of in front of them. Apart from the actual type of the operator, the type declaration contains some additional information to tell what kind of infix operator this is (see section 2.1). You could declare for example:

(&&) infixr 1 :: Bool Bool -> Bool

An operator can always be transformed to an ordinary function by enclosing it in brackets.

This means that a && b and (&&) a b are equivalent expresions. In the type declaration of an operator and in the left-hand side of its own definition the form with brackets is obliga- tory.

1.5.5 Overloading

The operator ⁺ can be used on two integer numbers (^Int) giving an integer number as result, but it can also be used on two real numbers (^Real) yielding a real number as result. So, the type of ⁺ can be both Int Int->Int and Real Real->Real. One could assume that + is a polymorphic function, say of type ^{a a->a}. If that would be the case, the operator could be applied on arguments of any type, for instance ^Bool arguments too, which is not the case. So, the operator ⁺ seems to be sort of polymorphic in a restricted way.

Functional Programming in C LEAN

Functional Programming in C LEAN

D

S

2, 2002

P

K

, R

P

,

M

E

, S

S

Functional Programming in C

Preface

Table of Contents

Part I

Chapter 1

Introduction to Functional Programming

Functional Programming in C ^LEAN