MODERN SYNTACTIC THEORY AND COGNITIVE SCIENCE1

IAN ROBERTS2

Abstract. This paper critically reviews the assessment of the history of linguistic theory, particularly mainstream generative grammar, in relation to cognitive science since the 1950s as described in Boden (2006). After attempting to set the record straight by considering some of the more egregious omissions in Boden’s account, the paper addresses the question of repeated misconceptions of generative grammar that abound in much of the secondary cognitive-science literature. Adopting Dawkins’ (1976) concept of “meme”, several of these misconceptions are discussed and refuted.

Keywords: Linguistic theory, generative grammar, Chomsky, meme.

INTRODUCTION

Margaret Boden’s book Mind as Machine: A History of Cognitive Science (OUP 2006, 2 Vols; paperback edition 2008) is a monumental work. It traces the development of modern cognitive science from its 17th century origins through to its first flowering in the mid 20th century, and provides detailed sketches of the development of the various subfields of cognitive science since then (following the “Sloan Hexagon”, Fig 8.1, p. 523, these are philosophy, psychology, computer science, neuroscience, anthropology and linguistics − and their various hybrids such as psycholinguistics, neuropsychology, etc.).

Cognitive science is deliberately defined rather broadly, so as to include the study of all mental processes (including emotion, motor control and motivation, in addition to ‘purely’ cognitive aspects of psychology such as language, reasoning, perception, etc.; see pp. 10f. for discussion). The book is therefore very rich and extremely informative.

1 A note on the history of this paper. I read Boden (2006) in the summer of 2007, having seen it reviewed (by Michael Corballis) in the Times Literary Supplement. I was perturbed by the contents of Chapter 9, “Transforming Linguistics”, for the reasons given below. I submitted the paper to Cognition, which declined to consider it for publication as it does not contain empirical results. It was subsequently very negatively reviewed by Language and rejected by Journal of Linguistics. It has therefore remained unpublished until now, although it is, aside from Chomsky (2007), the only response to this chapter that I am aware of. It is now somewhat out of date, but nonetheless perhaps deserves to see the light of day.

My thanks to Margaret Boden, Noam Chomsky and Neil Smith for comments on an earlier draft. It is perhaps more important than usual to emphasise that the errors are all mine.

2 Downing College, University of Cambridge, [email protected].

RRL, LIX, 2, p. 151–178, Bucureşti, 2014

(2)

However, as a theoretical linguist working on comparative syntax from a Chomskyan perspective, I was dismayed by much of the chapter dealing with linguistics (Chapter 9, “Transforming Linguistics”, pp. 590−700). This article is an attempt to fill a gap in Boden’s presentation concerning Chomsky’s work in syntax since 1965, and to reflect on why this work has been overlooked here, and why, more generally, mainstream cognitive scientists (non-linguists working in fields that are, by any definition of cognitive science, allied to linguistics under the cognitive-science umbrella) feel that this work can safely be overlooked.

1. “TRANSFORMING LINGUISTICS”

Boden’s chapter on linguistics is divided into 11 sections but really falls into five parts. In her very succinct introduction, Boden quite accurately points out that pioneers of cognitive science interested in language in the first half of the 20th century, such as Bateson, Turing and McCulloch, paid no attention to the theoretical linguistics of their day; one could add that the same is true of philosophers and logicians such as Frege and Russell, in whose foundational work the analysis of language played a central role. She then points out that “[b]y the early 1960s, no one involved in cognitive science… could ignore [theoretical linguistics – specifically, the study of syntax]” (590, emphasis hers). According to Boden, this changed state of affairs was due entirely to Chomsky’s influence. Boden outlines the various ways in which Chomsky’s influence made itself felt in cognitive science generally in the 1950s and early 1960s, and concludes that “for the field as a whole, he was a Good Thing”.

But Boden goes on to give what she calls a health warning: “beware of the passions that swirl under any discussion of Chomsky” (591; italics hers). This is arguably well-advised, given the shockingly ad hominem nature of much anti-Chomsky polemic (including many of Boden’s quotations), and the rhetorical consequences of certain claims, to which I return below. But part of the health warning is the “tenfold Chomsky myth” (592): ten theses that “those who uncritically take Chomsky as their scientific guru” (ibid.) allegedly share. I will not comment in detail on this aspect of Boden’s presentation, although it deserves a critical review in itself. Suffice it to say that no citations are given as to who the holders of these beliefs actually are, nor where in the linguistics or cognitive-science literature they have been expounded.3 This preliminary “health-warning” section goes on to observe the kinds of tensions that exist in linguistics, presenting several highly negative remarks about Chomsky and Chomskyans as illustration, and deploring “the unscholarly ignorance of the very existence of important competing theories” (591) on the part of an “anonymous young MIT linguist”. Boden is, however, at pains to profess impartiality (“I’m not a professional linguist, so have no axe to grind either way” (594); “my account … isn’t in a polemical spirit” (594)). I will not call her good faith into question, but, as pointed out in Note 3, her presentation is rather tendentious for such an impartial observer.

3 They are: (1) Chomsky always achieved mathematical rigour in his work; (2) Chomsky’s linguistic theory is indisputably correct in its essentials; (3) the (nativist) psychological implications of Chomsky’s theory were convincingly argued at the time; (4) they are now empirically beyond doubt; (5) Chomsky’s work in the 50s was wholly original; (6) his writings of the 60s were the “culmination of an august tradition of rationalism”; (7) without Chomsky’s grammar there would have been no computer modelling of language; (8) Chomsky was responsible for the demise of behaviourism; (9) he reawakened and strengthened the discipline of linguistics; (10) linguistics is as prominent in cognitive science as it was in the late 1950s to 1970s. These formulations are highly tendentious, and have been put forward in this form nowhere that I am aware of (no references are given). In any case, the use of the term “myth” is somewhat prejudicial: one could regard some of these points as misconceptions, or as points which have not been properly argued, etc. But instead certain centrally important substantive points (e.g. (3)) are mixed up with historiographical and sociological questions ((8)–(10)) and portrayed as a handed-down set of mythological (and therefore presumably false) beliefs. It is difficult to understand why an impartial observer would want to introduce an entire field in this way, especially since the actual substance of Chomsky’s theory of syntax is not mentioned in this part of the presentation. I am sure that a similar exercise could be carried out with cognitive anthropology, or with AI, but Boden does not attempt this.

The second part of Boden’s chapter (594−627) sketches the historical background to Chomsky’s work. Here the “Cartesian linguistics” of the 17th century is presented, followed by a discussion of Humboldt’s ideas on language, culture and creativity, and then the forms of structuralism which dominated American linguistics prior to Chomsky, culminating in a brief discussion of Zellig Harris’ work. Here, too, a number of critical points could be raised. These concern the historiography of 17th-, 18th- and early 20th-century linguistic thought4 and the accuracy or otherwise of Boden’s claims, of Chomsky’s (1964, 1965 and especially 1966) claims, and of Boden’s claims about Chomsky’s claims. But I will leave these issues aside, not because they are uninteresting or unimportant, but because they do not bear directly on the question of the status of theoretical linguistics in contemporary cognitive science, which is my concern here.

4 The very important achievements of 19th-century comparative philology are not mentioned. This is reasonable to the extent that this work has had only a slight influence on Chomsky’s linguistics, but a full understanding of the developments in 20th-century linguistics arguably requires an understanding of the nature of the 19th-century legacy.

The third part (627−647) covers what Boden takes to be the three works of Chomsky’s which had the greatest influence on cognitive science: Syntactic Structures (1957), his review of Skinner’s Verbal Behavior (1959) and Aspects of the Theory of Syntax (1965). The discussion here is detailed and interesting, and Boden rightly draws attention to the importance of Chomsky’s 1956 paper “Three Models for the Description of Language”, in which the “Chomsky hierarchy” of formal grammars was put forward. The upshot of this highly formal and abstract work is that “human minds must have a level of computational power capable of dealing with whatever class of grammar is appropriate to natural languages” (628) − clearly a matter of central importance to cognitive science (and also an empirical question, whose answer remains unclear). This part of Boden’s presentation is clear and largely accurate as far as I am aware, although many would insist that it is incomplete in that the importance of Chomsky’s 1955 PhD thesis (The Logical Structure of Linguistic Theory, published in 1975) is underplayed (as well as his 1951 MA thesis The Morphophonemics of Modern Hebrew). Again, I will leave these potentially important points aside here.

The fourth part of Boden’s presentation is what concerns me most. Here Boden treats the “Aftermath”: the development of Chomskyan linguistics, and its changing place in cognitive science, after 1965. Here, Chomsky’s work in syntactic theory is given very short shrift. The Extended Standard Theory − Chomsky’s model from roughly 1971 to 1980 − is barely mentioned. Government-binding (GB) theory, a highly influential new model developed around 1980 and dominant until the early 1990s, is described in two paragraphs (650−651); here the “principles-and-parameters” model of universal grammar (UG) is presented but not illustrated − although there is a perceptive comment about the similarity between this model and developmental pathways in biology. Finally, the minimalist programme (MP), which has occupied Chomsky since about 1991, is described in five lines, followed by a paragraph consisting largely of quotations from an extremely hostile review by Pullum (1996). In contrast, alternative (all but one of them nonetheless Chomskyan) theories are described in relative detail: LFG (656−657), Montague grammar (657−660) and GPSG/HPSG (660−666). The descriptions of these models are concise, (mostly) accurate and (broadly) sympathetic. This contrasts with the extremely brief discussion of Chomsky’s own work in syntax a few pages earlier, amounting almost to the “unscholarly ignorance… of competing theories” deplored earlier (see above). My main goal in this article is to attempt to redress this imbalance in presentation, which gives an extremely misleading picture of the development of syntactic theory over the past decades, and to reflect on why this imbalance exists. Why is mainstream cognitive science content to overlook developments in theoretical linguistics, one of its core disciplines?

Finally, Boden turns to the discussion of Natural Language Processing (NLP) (669−700). I have nothing to say about this here, as I take the concerns of NLP to be extrinsic to those of core syntactic theory.

2. PRINCIPLES AND PARAMETERS

Boden’s description of the essential idea behind the principles and parameters (P&P) approach to UG is succinct and accurate: “Principles are unvarying linguistic universals; parameters are like variables, each has a limited number of possible values, which are partially independent. The diversity of actual languages is explained by differences between their sets of parameter values. So infants, in acquiring language, must discover which values characterize their mother tongue and set their parameters accordingly” (651). This is followed by a brief quotation from Chomsky (1980) making the key point that small differences in parameter values may give rise to large differences in the overall grammatical system.

But that’s it. No further discussion, illustration or mention of this approach appears.

On the other hand, Chomsky has described the introduction of the P&P approach as a major paradigm shift in linguistic theory, possibly more profound than the introduction of generative grammar itself:

This [the P&P model – IGR] constituted a radical break from the rich tradition of thousands of years of linguistic inquiry, far more so than early generative grammar... the P&P approach maintains that the main ideas of this tradition are misguided in principle (Chomsky 1995a: 5).

The whole history of the subject, for thousands of years, had been a history of rules and constructions, and transformational grammar in the early days, generative grammar, just took that over. So the early generative grammar had a very traditional flair... What happened in the Pisa discussions [the earliest formulation of P&P theory, an early version of Chomsky (1981) – IGR] was that the whole framework was turned upside down. So from that point of view, there was nothing left of the whole traditional approach to the structure of language, other than taxonomic artefacts. It initiated a period of great excitement in the field. In fact I think it is fair to say that more has been learned about language in the past twenty years than in the previous 2,000 years (Chomsky 2002: 95).

To see why this is so, we need to consider Chomsky’s (1964: 28f.) definitions of levels of adequacy for linguistic theory. These were observational, descriptive and explanatory adequacy. An observationally adequate grammar presents the data correctly, while a descriptively adequate grammar “specifies the observed data… in terms of significant generalizations that express underlying regularities in the language” (Chomsky 1964: 28). Explanatory adequacy “can be interpreted as asserting that data of the observed kind will enable a speaker whose intrinsic capacities are as represented in th[e] general theory to construct for himself a grammar that characterizes exactly this intuition”; in other words, attaining explanatory adequacy involves showing how a given empirical phenomenon can be deduced from UG. P&P was a very large step in the direction of explanatory adequacy, since, one could assume, if we can say that this syntactic feature of this language is due to setting that parameter to that value, we have provided an explanatorily adequate account of the syntactic feature in question in that we have related it directly to UG. And we may have done more, as a brief (and inevitably much oversimplified) illustration of the P&P approach in practice may show.

It is well known that many, perhaps all, languages can be divided into two classes according to whether, in a simple transitive clause, the direct object precedes or follows the verb. For example in English, the verb precedes the object:

(1) a. John ate the apple. (VO)
    b. *John the apple ate. (OV)

In Japanese, on the other hand, the object precedes the verb:

(2) a. Sensei-wa Taro-o sikatta. (OV)
       teacher-TOP Taro-ACC scolded
       ‘The teacher scolded Taro.’
    b. *Sensei-wa sikatta Taro-o. (VO)
       teacher-TOP scolded Taro-ACC

In addition to English, the Scandinavian, Romance, Celtic and Bantu languages are all VO; alongside Japanese, the Turkic, Indic, Dravidian and most Amerindian languages are OV.

Indeed, the observation that languages tend to fall into one of these two classes, and some of the consequences of this, which I’ll mention below, was originally made in the context of the highly empiricist, non-computational, non-Chomskyan approach to documenting “language typology” initiated by Greenberg (1963) and developed ever since (culminating most recently in the World Atlas of Language Structures (Dryer, Matthew S. & Haspelmath, Martin, World Atlas of Language Structures Online, Leipzig: Max Planck Institute for Evolutionary Anthropology, 2013; www.wals.info; henceforth WALS)).

Now, let us assume that UG provides a characterisation of what a (transitive) verb is; this should be deducible from the universal theory of grammatical categories. UG should also provide a definition of a direct object (this should follow from the universal theory of grammatical functions/relations; see for example Chomsky (1965: 68f.)). UG may, as has been assumed in most versions of generative grammar since Chomsky (1965), further specify that object and verb must combine syntactically to form a Verb Phrase (VP). What UG does not specify, however, is the (surface) order of verb and object. This is a parameter, or the deducible consequence of a parameter. If this parameter is set to its “verb-initial in VP” value, we have a VO language; if to its “verb-final in VP” value, we have an OV one. Children, on a simple nativist view of UG and parameter-setting (see below), “come equipped” with the UG notions of verb, object and VP; experience (of a fairly basic nature: exposure to simple transitive clauses) gives rise to the setting of the parameter as OV or VO.

Parameters are generally thought to determine more than a single aspect of surface form. The setting of the OV/VO parameter, for example, has been held to be implicationally linked to the relative ordering of Pre/Postposition (P) and object. In his chapter on this correlation in the WALS, Dryer (2013a,b) gives the following figures concerning this correlation in 981 languages (a further 139 languages were defined as not falling into one of the four types):

(3) OV & Postpositions   472
    OV & Prepositions     14
    VO & Postpositions    41
    VO & Prepositions    454

More than 90% (926) of the languages sampled show consistent orders in this respect, while only 55 (approximately 5.6%) of the languages diverge. This is clearly a significant result, although the 55 divergent languages of course require further investigation.
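The arithmetic behind these figures can be checked directly. A minimal sketch (the counts are Dryer’s; the script itself is mine, purely for illustration):

```python
# Dryer's (2013a,b) WALS counts for the OV/VO vs. adposition correlation.
counts = {
    ("OV", "Postpositions"): 472,
    ("OV", "Prepositions"): 14,
    ("VO", "Postpositions"): 41,
    ("VO", "Prepositions"): 454,
}

total = sum(counts.values())                      # 981 classifiable languages
consistent = counts[("OV", "Postpositions")] + counts[("VO", "Prepositions")]
divergent = total - consistent

print(total)                                            # 981
print(consistent, round(100 * consistent / total, 1))   # 926 94.4
print(divergent, round(100 * divergent / total, 1))     # 55 5.6
```

So just over 94% of the classifiable languages are harmonic in this respect, with roughly 5.6% divergent.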

The order of object and verb, and object and pre/postposition, is also implicationally linked to that of auxiliary (Aux) and VP, and of clausal particles (C) and the body of the clause. In each case, the relevant element consistently either precedes or follows its notional dependent in very many languages. Compare again English and Japanese in these respects:

(4) a. to London (Pr)
    b. has eaten the apple (Aux VP)
    c. that John will leave (C clause)

(5) a. Nihon kara (Po)
       Japan from
       ‘from Japan’
    b. John-ga Mary-to renaisite iru (VP Aux)
       John-NOM Mary-with in-love is
       ‘John is in love with Mary.’
    c. John-ga Mary-o nikundeiru to (clause C)
       John-NOM Mary-ACC be-hating that
       ‘that John hated Mary’

(7)

These correlations hold surprisingly well (Dryer (1992) observes 14 such putative correlations across 625 languages carefully sampled from the genetic phyla of the world’s languages). They can be formulated, in terms of the “X′ theory” of phrase structure developed by Chomsky and others in the 1970s (Chomsky 1970, Jackendoff 1977), in terms of a general choice of “head-initial” vs “head-final” ordering in a phrase, viz:

(6) a. English:  XP → X YP
    b. Japanese: XP → YP X

(where X is a variable ranging over V, P, Aux, C; and Y is a dependent variable ranging over N, V and, for simplicity, “clause”).
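The head-directionality schema in (6) lends itself to a one-line computational statement. A toy sketch (the rule is (6); the function name and sample words are mine):

```python
# Linearise a head X and its dependent YP according to the single
# head-direction choice in (6): head-initial gives XP -> X YP (English),
# head-final gives XP -> YP X (Japanese).
def linearise(head, dependent, head_initial):
    return [head, dependent] if head_initial else [dependent, head]

# English-type setting: V O, P NP, Aux VP all follow from one choice.
print(linearise("ate", "the apple", head_initial=True))    # ['ate', 'the apple']
# Japanese-type setting: O V, NP P, VP Aux.
print(linearise("sikatta", "Taro-o", head_initial=False))  # ['Taro-o', 'sikatta']
```

The point of the parameter is precisely that one boolean choice fixes the order for V, P, Aux and C alike.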

Of course, there are difficulties. Many languages appear to present “mixed” typology (among these are German, Latin and Chinese), and an ongoing, open research question is how “mixed” a language can be (see Biberauer, Holmberg & Roberts (2014) for the claim that at least one kind of “mixed” order does not exist at all). But what I hope the above shows is (i) the empirical coverage of the P&P approach and (ii) its explanatory power, given the P&P view of acquisition. In fact, P&P has opened up the possibility of a general, formal theory of comparative syntax. As Richard Kayne has pointed out, “[c]omparative syntax has become an indispensable, if not privileged, part of our attempt to understand the (syntactic component of the) human language faculty” (Kayne 2005: 55). Chomsky (2000: 8) observes that the P&P research programme “gives at least an outline of a genuine theory of language, really for the first time.”

Many putative parameters have been discussed in the literature on comparative syntax, and some will no doubt turn out to be spurious. Here I will very briefly mention one or two others that have been the focus of attention and seem to be fairly well-established.

Wh-movement. This operation is active in “wh-questions”. In English and similar languages, it places a phrase containing a wh-pronoun in initial position in the clause, independently of its grammatical function as subject, direct object, etc. (with associated “subject-auxiliary inversion” in direct questions):

(7) a. Who did John meet − at the party? (direct question)
    b. I wonder who John met − at the party. (indirect question)

Here the wh-phrase bearing the direct-object function appears initially in the interrogative clause (a direct question in (7a), and indirect one in (7b)). The usual position for direct objects in English is immediately postverbal, indicated by the dashes in (7). A standard account of the syntax of English wh-questions is to say that there is a particular kind of dependency – usually called “movement” – holding between the initial wh-phrase and the “gap” in the grammatical-function position. In short, we say that direct object in the examples in (7) “undergoes wh-movement.”

Many languages lack wh-movement, however. Mandarin Chinese is an example:

(8) a. Zhangsan yiwei Lisi mai-le shenme?
       Zhangsan thinks Lisi bought what
       ‘What does Zhangsan think Lisi bought?’
    b. Zhangsan xiang-zhidao Lisi mai-le shenme.
       Zhangsan wonders Lisi bought what
       ‘Zhangsan wonders what Lisi bought.’

Here the word for ‘what’ (shenme) occupies the usual direct-object grammatical-function position in Chinese, just as in the corresponding declarative sentences. There is no syntactic operation creating a dependency between an initial, or otherwise “displaced”, wh-phrase and the usual position for phrases bearing the relevant grammatical function.

Other languages lacking the “wh-movement” dependency include Japanese, Dravidian, Korean, etc. It has been suggested that the absence of wh-movement is linked to the absence of a class of specialised interrogative wh-pronouns (cf. English who, which, etc.), or perhaps to aspects of the formation of “yes/no” interrogatives (Cheng 1991), or perhaps OV order (Bach 1971).

There is a further option: in English, if more than one constituent in a clause is questioned, only one undergoes wh-movement:

(9) a. Which facts did Alistair suppress in his testimony to whom?
    b. *Which facts to whom did Alistair suppress in his testimony?

But many languages, including most Slavic languages, require all wh-phrases to move, as in Bulgarian and Serbian/Croatian:

(10) a. Koj kogo e vidjal? (Bulgarian)
        who whom AUX saw-3SG
     b. Ko je koga vidjeo? (Serbian/Croatian)
        who AUX whom saw
        ‘Who saw whom?’

Here we observe an important property of parameter values: they may be implicationally related in that the value of one parameter may depend on that of another.

Clearly, “multiple” wh-movement, of the Slavic kind, isn’t possible if wh-movement isn’t possible at all. So we can describe the following implicational schema:

(11) wh-movement?
         N (Chinese)
         Y (Slavic, English) → multiple wh-movement?
                                   N (English)
                                   Y (Slavic)

Naturally, it is a goal of work in P&P theory to set up such hierarchies (for a more elaborate example, see Baker 2001: 183).
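The schema in (11) is, in effect, a small decision tree, and can be sketched as code (the type labels are simply my shorthand for the language groups in (11)):

```python
# The implicational hierarchy in (11): the multiple-wh question only
# arises for languages that have wh-movement in the first place.
def classify(wh_movement, multiple_wh=False):
    if not wh_movement:
        return "Chinese-type"    # wh-in-situ; the second parameter is moot
    if multiple_wh:
        return "Slavic-type"     # all wh-phrases front
    return "English-type"        # exactly one wh-phrase fronts

print(classify(wh_movement=False))                     # Chinese-type
print(classify(wh_movement=True, multiple_wh=True))    # Slavic-type
print(classify(wh_movement=True, multiple_wh=False))   # English-type
```

The dependency is captured structurally: the second parameter is simply never consulted unless the first is set to Y.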

Verb-placement. Languages differ in where in the clause the main verb may appear: initially, as in the Celtic languages; second (i.e. following exactly one phrasal constituent), as in main clauses in most Germanic languages; after the subject, as in English or French; or finally, as in Japanese. This parameter interacts with the OV/VO parameter to give rise to varying surface orders, although evidence from the Germanic languages, which I will not go into here, shows these two parameters to be distinct.

(9)

Null Subjects. Many languages allow the verb alone, seemingly, to express a definite pronominal subject in a finite clause. Others do not. The phenomenon is illustrated in (12):

(12) a. Civis Romanus sum. (Latin)
     b. Sono cittadino romano. (Italian)
     c. *Am a Roman citizen. (English)

(12) shows that Latin and Italian are both “null-subject” languages while English is not; Latin and Italian do not require the definite pronominal subject to be expressed by an overt pronoun separately from the verb. It has been suggested that this property is implicationally related to a range of others (Rizzi 1982, Gilligan 1987, Nicolis 2004 and the papers in Biberauer 2008).

If the four parameters just illustrated are (perhaps only partially) independent of one another, then, for any given language, we can informatively specify its value for each one. In (13), I give simple examples of this in relation to the parameters just discussed for certain fairly familiar languages:

(13)            OV/VO   Wh-movement   V-position       Null subjects
     English:   VO      Yes           after subject    No
     Italian:   VO      Yes           after subject    Yes
     German:    OV5     Yes           2nd (main cl.)   No
     Latin:     OV      Yes           last             Yes
     Japanese:  OV      No            last             Yes
     Welsh:     VO      Yes           1st              Yes

With a small amount of further technical elaboration, these (and, in principle, all other) parameters can be reduced to binary values. We thus have the possibility of developing a kind of genetic code, or binary index, for any given grammatical system. This notion takes us some way towards a formal theory of comparative syntax. Recent work by Gianollo, Guardiano and Longobardi (2008, henceforth GGL) illustrates this point much more fully. Restricting themselves to a specific syntactic domain, the nominal phrase, GGL illustrate the values of 46 parameters in 24 languages (the parameters include such variant properties as Article-Noun order, Plural marker-Noun order, Noun-Relative Clause order and Noun-Genitive order). GGL use the huge amount of data they are able to summarise in order both to quantify grammatical distance between languages (which can then be compared with the depth of historical relatedness) and to produce a phylogenetic tree, using perfect phylogeny software developed by evolutionary biologists (Felsenstein 2001). Here we begin to see the potential of systematic comparison using the P&P approach.
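The “binary index” idea can be made concrete with a short sketch. The values below follow table (13) (with V-position crudely binarised as verb-final or not), and the Hamming-distance measure is merely a stand-in for, not a reproduction of, GGL’s actual metric:

```python
# Grammars as vectors of binary parameter values, simplifying table (13):
# (VO order?, wh-movement?, verb-final?, null subjects?).
grammars = {
    "English":  (1, 1, 0, 0),
    "Italian":  (1, 1, 0, 1),
    "Latin":    (0, 1, 1, 1),
    "Japanese": (0, 0, 1, 1),
}

def distance(a, b):
    """Number of parameters on which two grammars differ (Hamming distance)."""
    return sum(x != y for x, y in zip(grammars[a], grammars[b]))

# Latin and Italian differ in exactly two of these values
# (OV/VO and verb position), as table (13) shows.
print(distance("Latin", "Italian"))    # 2
print(distance("English", "Italian"))  # 1
```

With realistically many parameters (GGL’s 46, rather than the four here), such distances become informative enough to feed phylogenetic software.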

Mainstream cognitive science does not need to interest itself in the details of the position of the verb in German, the nature of interrogative clauses in Chinese, or the order of words in Japanese. But a fully formal theory of comparative syntax, of the kind which, thanks to P&P theory, can now be envisaged and partially implemented, ought to be of interest. Boden clearly indicates the interest in finding and developing cross-cultural comparisons in her discussions of cognitive anthropology (see e.g. 527f.), and one would think that, for similar reasons, a theory of comparative syntax would be of interest. But work in P&P theory, which has proceeded apace for 25 years and has been one of Chomsky’s main concerns, is all but ignored in her discussion of linguistics, as we have seen.

5 In German, the verb must be second in the main clause, but it is final in the subordinate clause. Since Koster (1975), it has been assumed that the former order is derived by a verb-movement operation from the latter.

I said above that reducing this or that syntactic phenomenon in a given language to a given parameter value is a way of attaining explanatory adequacy in the sense defined in Chomsky (1964). Clearly, what should be added is how a given parameter value can be acquired, both in principle (i.e. in what sense it is learnable from primary linguistic data) and in practice (i.e. acquisition studies should support the postulation of given parameters). Progress has been made using P&P theory in both of these areas: Niyogi (2006) contains detailed studies of learning algorithms for parametrised grammars, while Guasti (2002) summarises the vast literature of first-language-acquisition studies couched in P&P terms. Again, these are issues which deal with questions which have always been at the heart of cognitive science; again, they are entirely overlooked in Boden’s treatment of modern linguistics.
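For a flavour of what a learning algorithm for a parametrised grammar involves, here is a toy error-driven learner, loosely in the spirit of the triggering models Niyogi (2006) analyses. The two parameters, the data format and the deterministic revision rule are all my simplifications (real models typically revise blindly and stochastically):

```python
# Error-driven parameter setting over two binary parameters. The learner
# keeps a current guess and, whenever an input "sentence" (here reduced to
# the parameter values it expresses) conflicts with that guess, revises
# one conflicting parameter.
TARGET = {"VO": True, "null_subject": True}      # an Italian-type grammar

def conflicts(grammar, sentence):
    """Parameters on which the sentence is incompatible with the grammar."""
    return [p for p in sentence if grammar[p] != sentence[p]]

def learn(data):
    grammar = {"VO": False, "null_subject": False}   # initial guess
    for sentence in data:
        wrong = conflicts(grammar, sentence)
        if wrong:
            grammar[wrong[0]] = not grammar[wrong[0]]  # revise one parameter
    return grammar

# Idealised input: each datum directly exhibits the target values.
print(learn([TARGET, TARGET, TARGET]))   # {'VO': True, 'null_subject': True}
```

Even this toy raises the questions that the formal literature studies seriously: whether the data contain unambiguous triggers, and whether the revision procedure is guaranteed to converge.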

A final remark on the possible strengths of the P&P approach: I mentioned above that P&P theory can take the data gathered − generally in an entirely atheoretical, non-computational, non-cognitive-scientific spirit − by language typologists and attempt, on that rich empirical basis, to develop a formal theory of comparative syntax. It can also be used to develop a formal theory of syntactic change. Consider our little “parameter table” in (13) above. There I listed both Latin and Italian, and one can observe that two parameters differ in value between these two languages. Yet we know that modern Italian has evolved directly from Latin, through two millennia of parent-to-child transmission. Parameter values can change over time, then, as the population of native speakers changes over time, and P&P theory must develop an account of how this happens. Interesting work has been carried out in this area (see Roberts 2007 for a recent overview, and Niyogi 2006 for a formal treatment of the issues). In this connection, P&P theory offers the prospect, and at this stage it is only that, of bringing historical and comparative linguistics into the purview of cognitive science. This field is arguably the great success story of linguistics to date, in that the discoveries of 19th-century comparative and historical work are now universally accepted; as Morpurgo-Davies (1998: 20) said in this connection: “No one now rejects the suggestion that German and Sanskrit are related and continue an earlier Indo-European language”. P&P theory offers the promise of a cognitive explanation of these empirical discoveries, telling us why this is the case, something 19th-century linguists were unable to achieve. As Longobardi (2003: 5) points out, P&P theory offers a way to combine the insights of the two major scientific revolutions in linguistics: the historical-comparative paradigm of the 19th century, and what he calls the “synchronic-cognitive” paradigm of the 20th.
Again, we begin to see the outlines of a true theory of UG here.

3. THE MINIMALIST PROGRAMME

Since its inception around 1980, P&P theory has had two principal manifestations: government-binding theory (GB) and the minimalist programme (MP). I will not linger over the details of GB theory here, since this model has been largely superseded and is no longer the object of ongoing research. The MP differs from GB in being essentially an attempt to reduce GB to its barest essentials. One could in fact see it as an (informal) attempt to axiomatise GB (for a partial formalisation of one version of the MP, see Stabler 1997).

Accordingly, where GB assumed four levels of representation for every sentence (D- Structure, S-Structure, Phonological Form (PF) and Logical Form (LF); the former two correspond roughly to the earlier deep and surface structures, the latter two to interpretative levels involving, respectively, phonological and semantic interpretation), the MP assumes just the two interpretative levels. The core syntax is seen as a derivational mechanism which relates these two, i.e. it relates sound (PF) and meaning (LF) over an unbounded domain (and hence contains recursive operations).

Recent versions of the MP, stemming primarily from Chomsky (2001), rely on three basic operations: Merge, Move and Agree. Merge combines two syntactic objects to form a third object, a set consisting of the set of the two merged elements and their label. Thus, for example, a verb (V) and an object (O) may be merged to form the complex element {V,{V,O}}. The label of this complex element is V, indicating that V and O combine to form a “big V”, or a VP. The use of set-theoretic notation implies that V and O are not ordered by Merge, merely combined; the relative ordering of V and O is parametrised, as we saw above, and so order is handled by some operation distinct from that which combines the two elements − perhaps a (PF) “interpretative” operation, or perhaps a transformational (Move) operation (this, too, is at present an open question). In general, syntactic structure is built up by the recursive application of Merge.
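The combinatorics just described can be sketched in a few lines of code. The sketch below is purely illustrative (none of these names come from the literature), and Python tuples stand in for sets, so the left-to-right order shown is an artifact of the notation rather than a claim about linear order, which, as noted, is fixed separately.

```python
# A minimal sketch of Merge as set formation with labelling, under the
# simplifying assumption that the label of {X, Y} is always the head
# supplied as the first argument. Tuples stand in for sets here, so the
# apparent ordering is notational only.

def head_label(obj):
    """A lexical item is a bare string; a complex object stores its
    label in first position."""
    return obj if isinstance(obj, str) else obj[0]

def merge(head, other):
    """Combine two syntactic objects into {label, {head, other}}."""
    return (head_label(head), (head, other))

# Merging V and O yields the "big V" (VP) {V, {V, O}}; recursive
# application of Merge then builds larger structures.
vp = merge("V", "O")    # ('V', ('V', 'O'))
tp = merge("T", vp)     # ('T', ('T', ('V', ('V', 'O'))))
```

The recursion is carried entirely by the fact that `merge` accepts its own outputs as inputs, which is all that “recursive application of Merge” amounts to in this toy setting.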

Move is the descendant of the transformational component of earlier versions of generative syntax. Chomsky (2004) proposed that Move is nothing more than a special case of Merge, where the two elements to be merged are (a) an already constructed piece of syntactic structure S which non-exhaustively contains a given category C (i.e. [S … X … C … Y …], where it is not the case that both X=0 and Y=0), and (b) C itself. This creates the new structure {L,{C,S}} (where the way in which the label L of the resulting structure is determined need not detain us). Move, then, is a natural consequence of Merge as long as Merge is not subjected to arbitrary constraints. We therefore expect generative grammars to have, in older terminology, a transformational component. (I will return to the question of the need for a transformational component in the next section; here we can merely observe that special stipulations would be needed to avoid having one − and these then require empirical motivation.)
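The idea that Move is “internal Merge” can be given the same kind of illustrative sketch. All names below are hypothetical, tuples again stand in for sets, and letting the host S project the label is just one of the labelling options the text leaves open:

```python
# An illustrative sketch of "internal Merge" (Move): re-merging a
# category C that is already properly contained in the structure S
# yields {L, {C, S}}.

def contains(structure, target):
    """True if `target` occurs somewhere inside `structure`."""
    if structure == target:
        return True
    return isinstance(structure, tuple) and any(
        contains(part, target) for part in structure)

def internal_merge(s, c):
    """Move as a special case of Merge: C must already occur,
    non-exhaustively, inside S."""
    assert contains(s, c) and s != c, "C must be properly contained in S"
    label = s if isinstance(s, str) else s[0]  # host S projects (one option)
    return (label, (c, s))

# "Moving" the object O to the edge of {T, {T, {V, {V, O}}}}:
tp = ('T', ('T', ('V', ('V', 'O'))))
moved = internal_merge(tp, 'O')
# moved == ('T', ('O', ('T', ('T', ('V', ('V', 'O'))))))
```

Nothing extra had to be added to obtain movement: the only work is the containment check, which corresponds to the requirement that C be non-exhaustively contained in S.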

Finally, in recent versions of the MP, Agree does the work of some of the “anthropomorphic” operations derided in Pullum (1996), as quoted by Boden (651). As its name suggests, this operation underlies a range of morphosyntactic phenomena related to “concord” (e.g. subject-verb agreement in many languages), case and related matters. The essence of the Agree relation is that it copies “missing” feature values onto certain positions which intrinsically lack them but would fail to be properly “interpreted” (by PF or LF) if their values were not filled in by some syntactic operation. For example, in English a subject NP agrees in number with the verb (the boys leave/the boy leaves). Number is an intrinsic property of Nouns, and hence of NPs, and so we can say that boy is singular and boys is plural. More precisely, let us say that (count) Nouns have the attribute [Number] with (in English) the values [{Singular, Plural}]. Verbs lack intrinsic number specification, but, as an idiosyncrasy of English (shared by many, but not all, languages), have the [Num] attribute with no value. The morphosyntactic Agree relation ensures that the number value of the subject NP is copied into the feature-matrix of the verb (if singular, this is realised in PF as the -s ending on present-tense verbs). It should be clear that Agree is the locus of a great deal of cross-linguistic morphosyntactic variation. Sorting out the parameters associated with the Agree relation is a major topic of ongoing research.
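Agree as feature valuation lends itself to a similarly small sketch. The dictionary representation and all the names below are expository assumptions, not notation from the literature:

```python
# A toy rendering of Agree: an unvalued attribute on a probe (here the
# verb's Number) is filled in by copying the value from a goal (the
# subject NP). `None` marks an unvalued attribute.

def agree(probe, goal, attribute):
    """Copy the goal's value for `attribute` onto the probe, which must
    carry that attribute unvalued; otherwise the derivation 'crashes'."""
    if attribute in probe and probe[attribute] is None \
            and goal.get(attribute) is not None:
        probe[attribute] = goal[attribute]
        return probe
    raise ValueError("Agree fails: nothing to value")

subject = {"cat": "N", "Number": "Plural"}   # 'the boys'
verb = {"cat": "V", "Number": None}          # finite verb, Number unvalued
agree(verb, subject, "Number")
# verb["Number"] is now "Plural"; with "Singular" instead, PF would
# realise the value as the -s ending on a present-tense verb.
```

The asymmetry between probe and goal in this sketch mirrors the asymmetry in the text: Nouns carry Number intrinsically, verbs acquire it only via the syntactic operation.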

In addition to a lexicon, specifying the idiosyncratic properties of lexical items (including Saussurian arbitrariness), and an operation selecting lexical items for “use” in a syntactic derivation, the operations of Merge, Move and Agree form the core of minimalist syntax, as currently conceived. There is no doubt that this situation represents a simplification as compared to GB theory.

But there is more to the MP than this, and this is again where considerations of interest to cognitive science as a whole come into play. Having approached classical explanatory adequacy with the development of P&P theory, the MP, as a “second-phase” version of P&P, attempts to go “beyond explanatory adequacy”. Essentially, the question is: why this UG, and not one of countless other imaginable possibilities? One approach to answering this question has been to bring to bear “third-factor” explanations of UG.

To see what this means, consider again the basic P&P idea. A given syntactic structure is acquired on the basis of the combination of a UG specification (what a verb is, what an object is) and experience (exposure to OV or VO order). UG and experience are thus the first two factors making up adult competence. But there remains the possibility that “domain-general” factors such as optimal design, computational efficiency, etc., play a role. In fact, it is a priori likely that such factors play a role in UG; as Chomsky (2005a: 6) points out, principles of computational efficiency, for example, “would be expected to be of particular significance for computational systems such as language”.

Factors of this kind make up the third factor determining adult competence. In these terms, the MP can be viewed as asking the question “How far can we progress in showing that all language-specific technology is reducible to principled explanation, thus isolating core processes that are essential to the language faculty” (Chomsky 2005a: 11). The more we can bring third-factor properties into play, the less we have to attribute to “pure” (domain-specific) UG, and the MP leads us to attribute as little as possible to “pure” UG.

An important area of current research involves investigating the extent to which “locality principles” (principles which define and limit the syntactic distance between a “moved” element such as the fronted wh-phrase in (7) and “its” gap) may be reducible to “optimal search” conditions relating one position in a structure to another. An example of this may be the general existence of “intervention effects”, whereby, given two elements {A,B}, A cannot be related to a third element C (by Agree or Move) if there is an element of the same kind, B, “closer” to C (in terms of some formal definition) than A is. This may be why, for example, subjects only agree with their “local” verb (cf. *The boys think that she are nice; are cannot Agree with the plural NP the boys because the closer subject she intervenes). The notions “of the same kind” and “closer” require, and have received, fully formal definitions, but the general idea should be clear without going into these details. It is currently thought that intervention effects may not be domain-specific, but may instead reflect more general computational constraints concerning optimal search and grouping like with like.
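The intervention condition can be made concrete with a deliberately over-simplified sketch in which hierarchical structure is flattened to a search path ordered by “closeness” − purely an expository assumption, standing in for the formal definitions just mentioned:

```python
# A schematic intervention ("minimality") check: a probe searching for a
# goal of a given kind must stop at the first matching element on its
# path, so a more distant match of the same kind is inaccessible.

def closest_matching(kind, search_path):
    """Return the first element of the required kind on the path,
    or None if there is no match at all."""
    for element in search_path:
        if element["kind"] == kind:
            return element
    return None

# In '*The boys think that she are nice', the probe 'are' searches for a
# nominal goal; 'she' is closer than 'the boys' and so intervenes.
path_from_are = [{"kind": "N", "form": "she"},
                 {"kind": "N", "form": "the boys"}]
goal = closest_matching("N", path_from_are)
# goal["form"] == "she": agreement with the distant 'the boys' is blocked.
```

Nothing language-specific appears in `closest_matching`; it is just a nearest-match search, which is the sense in which intervention effects might reflect domain-general principles of optimal search.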

In a similar spirit, we may consider the syntactic derivation to be “forgetful”, in the sense that at a given point in a complex derivation the system may not have access to all of its preceding steps. More precisely, it has been proposed that derivations proceed in “phases”, each of which is (largely) impenetrable to other parts of the derivation. The postulation of phases may yield an account of locality phenomena in terms of what can be “remembered” at any point in the derivation. The simplest way to understand what makes phases impenetrable is to think that the information in each phase is separately interpreted by PF and LF and, once interpreted, is unavailable to the formal operations of syntax (which, as mentioned above, subserve interpretation). Currently, the notion of phase is being actively explored and developed as a leading idea in the MP. It is clear that dividing a potentially unbounded derivation into discrete phases reduces computational load.
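The computational point − that phase-by-phase spell-out bounds working memory − can be illustrated schematically. The representation of phases as flat lists, and every name below, is an expository assumption only:

```python
# A schematic phase-based derivation: each completed phase is "spelled
# out" to the interfaces and frozen, so the active workspace never grows
# with the length of the whole derivation.

def derive(phases):
    """Process phases in order, freezing each on completion; return the
    spelled-out material and the peak workspace size."""
    spelled_out = []
    peak = 0
    for phase in phases:
        workspace = list(phase)               # only this phase is accessible
        peak = max(peak, len(workspace))
        spelled_out.append(tuple(workspace))  # interpreted, hence frozen
    return spelled_out, peak

# However many phases a sentence contains, peak memory is bounded by the
# largest single phase, not by the derivation's total length.
_, peak = derive([["C", "T"], ["v", "V", "O"], ["C", "T"], ["v", "V"]])
# peak == 3
```

The frozen tuples stand in for material already interpreted by PF and LF: it survives in the output but is no longer available to further operations, which only ever see the current `workspace`.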

It may be, then, that while P&P theory was originally conceived in terms of a domain-specific, innate UG (principles given by the genetic endowment, parameters specified but their values left open), many aspects of UG are less domain-specific than was once thought. The leading ideas of the MP certainly encourage one to think along these lines. Again, this is a position that, one would think, would be of interest to cognitive science in general.

This leads us to the question of nativism. Boden (492−499) takes the Piagetian concept of epigenesis, defined as “a self-organising dialectic between biological maturation and experience” (493), to represent a possible “third way” between nativist and empiricist accounts of learning. As Boden observes in her all-too-brief discussion of P&P (651), the conception of L1 acquisition as the interaction of open parameter values with environmental input is reminiscent of this concept. It is at least possible to envisage, then, that “pure UG” (whatever is left over after the third-factor explanation has been run) will contain very little, perhaps as little as the recursive combinatorial operation Merge (for some speculations about how this may have evolved, see Chomsky 2005a: 11−12), and that this, interacting with third-factor properties such as self-organisation, may in fact create parametric options in interaction with the environment.

This is perhaps easiest to see if we take parameters typically to involve the option of the presence or absence of some formal property F: if the environment provides no clue, the default value (absence of F) is “acquired”; if the environment provides a specific trigger for F, the presence of F is “acquired”. All UG needs is the optionality of F; we can assume that a parameter-setting mechanism, active in the critical period of language acquisition, requires that F be consistently assumed to be absent or present (this is really the content of the notion of “fixing a value” and, in essence, of “learning” in this context), and has the requisite sensitivity to triggers. So parameters may emerge from optional features; parameters arise where UG doesn’t mind. In this way, much of the P&P scenario may be attributable to “third factors”, along with an “epigenetic” view of parameters and parameter-setting of the kind just sketched. The P&P approach does not require us to think in terms of an innate “menu” of options among which the child selects; particularly in its minimalist instantiation, it naturally leads us to think of a creative interaction between the innate endowment and the environment, each effectively influencing the other as acquisition progresses and the cognitive state matures. Once again, the fact that we can even contemplate this kind of picture, in the context of the formal theory of comparative syntax and first-language acquisition, ought to be of interest to cognitive science at large.

But something must be innate (if not domain-specific). At least two considerations militate against retreat into purely empiricist views of acquisition. One is the clear intuition that a competent adult native speaker of English is using “the same” mental faculty as a competent adult native speaker of Japanese, despite the glaring and manifest differences between the two forms of behaviour (and, usually, the difficulty a single individual has in switching from one to the other). Comparative linguistics can explain what underlies the intuition that the speakers are using the “same faculty”, but this requires some notion of UG and of the formal comparability of grammatical systems. It doesn’t, in itself, require rich assumptions of either domain-specificity or innateness, though, given the above considerations. The weakest assumption regarding the innate endowment is that it must be such as to make epigenesis possible: a pure tabula rasa is not possible. And that innate endowment still requires investigation and specification.

The other consideration, which I was genuinely surprised not to see in Boden's discussion of Chomskyan linguistics, is the argument from the poverty of the stimulus. In discussing trans-generational transmission of culture in her chapter on anthropology, Boden argues that purely inductive learning of the empiricist kind, as implicitly assumed by the majority of anthropologists (who exclude themselves from cognitive science), cannot work.

She endorses a pure poverty-of-the-stimulus argument given by the cognitive anthropologist Boyer (1994):

The variety of situations that a subject experiences supports indefinitely many possible inferences, of which only a small subset are ever entertained by the subject. Unless we have a good description of the mechanisms that constrain the range of hypotheses, the empiricist account is insufficient. (Boyer 1994: 25, cited in Boden, p. 587)

Compare this with the following statement of the poverty of the stimulus in relation to L1 acquisition:

The astronomical variety of sentences any natural language user can produce and understand has an important implication for language acquisition … A child is exposed to only a small proportion of the possible sentences in its language, thus limiting its database for constructing a more general version of that language in its own mind/brain. This point has logical implications for any system that attempts to acquire a natural language on the basis of limited data. It is immediately obvious that given a finite array of data, there are infinitely many theories consistent with it but inconsistent with one another. In the present case, there are in principle infinitely many target systems … consistent with the data of experience, and unless the search space and acquisition mechanisms are constrained, selection among them is impossible…. No known ‘general learning mechanism’ can acquire a natural language solely on the basis of positive or negative evidence, and the prospects for finding any such domain-independent device seem rather dim. The difficulty of this problem leads to the hypothesis that whatever system is responsible must be biased or constrained in certain ways. Such constraints have historically been termed ‘innate dispositions,’ with those underlying language referred to as ‘universal grammar’

(Hauser, Chomsky & Fitch 2002: 1576−1577).

Compare also the following comment from Niyogi (2006: 12), who observes that formal studies of learning all point to:

the inherent difficulty of inferring an unknown target from finite resources, and in all such investigations, one concludes that tabula rasa learning is not possible. Thus children do not entertain every possible hypothesis that is consistent with the data they receive but only a limited class of hypotheses. This class of grammatical hypotheses H is the class of possible grammars children can conceive and therefore constrains the range of possible languages that humans can invent and speak. It is Universal Grammar in the terminology of generative linguistics.

A further quotation (from Samuel Epstein (personal communication)) makes the same point:

The stimulus (language exposure) is acoustic disturbances hitting the child’s eardrums − or for SIGN, retinal images of hand-shapes in motion. The child assigns meaning to these, yet the meaning is not part of the input − the meaning does not float across the room from my mouth/handshapes to you … we give a tadpole food and light and water − and it develops into a frog. I did not give it FROG, so the stimulus (food, light, water) is insufficient (too impoverished) to account for developmental biological facts like the tadpole becoming a frog. Therefore, we appeal to genetics.

The external stimulus for language acquisition simply cannot account for it; something must come from within the acquirer. Boden in fact points out that this kind of conclusion is very widespread in cognitive science: “inductive learning requires some initial guidance and/or restriction … That’s been shown independently by work in GOFAI, in connectionism, in linguistics, in development psychology, in adult cognitive psychology, in the philosophy of science, and in neuroscience” (587−8). Poverty-of-stimulus considerations, then, are clearly domain-general. The importance of this argument for the acquisition of syntax has been known since at least Chomsky (1965: 58). In the context of the MP, this does not entail the assumption of a fully domain-specific UG, and it does not preclude at least a partially epigenetic account of parameter-setting. So P&P in its MP version can provide a formal theory of UG and comparative syntax, language acquisition and syntactic change, while at the same time raising important questions for our understanding of nativism, domain-specificity and epigenesis. Given this, Boden’s cursory dismissal of this field of research seems very surprising, and certainly does a disservice to her book as a history of cognitive science.

4. LINGUISTICS ECLIPSED?

The final section of Boden’s discussion of generative linguistics proper, before she embarks on her discussion of NLP, has this title (without the question mark). Unfortunately, it is a largely correct observation: “mainstream” cognitive science researchers feel quite entitled to ignore “mainstream” syntactic theory. This is the case despite the fact that, as I have tried to indicate above, in an inevitably oversimplified and non-technical way, current syntactic theory can offer the broad outlines of a formal theory of comparative syntax, specifying what is universal in natural-language syntax and what is not, and also how and why various parametric options may pattern together, along with an account of first-language acquisition which may − depending on advances in our understanding of third-factor aspects of UG − not depend on the assumption of domain-specific innate mechanisms. Why is this research programme overlooked by mainstream cognitive science (including Boden, in her presentation)?


One possibility is that the relevant notions (the elements of the theory of syntax itself, parameters, learning and epigenesis) are not adequately formalised. But Boden in many instances points out that an approach can be computational without being fully formalised or implementable; cf. in particular her comment, at the end of her chapter on cognitive anthropology, that “we now have computational accounts of a wide range of cultural phenomena. These are mostly schematic outlines, not nitty-gritty computer models. Nevertheless [cognitive anthropology] has been respected, polished and sharpened” (589).

While P&P syntax is generally more than “a schematic outline” (the extreme simplicity of the presentation above is not representative in this regard), it is not (usually) formulated in terms of “nitty-gritty computer models” and, it is true, generally fails to reach the standards of formal precision of Chomsky’s early work, of some work in GPSG/HPSG, or of Montague grammar. But it is not a requirement for theoretical work in cognitive science to reach these standards of precision, as Boden acknowledges in the quotation just given and elsewhere. It is worth pointing out here that, perhaps owing to the intellectual legacy of Frege and Russell, cognitive science and, to some extent, linguistic theory may have placed too much emphasis on the need for formalisation. In this connection, it is worth quoting Chomsky’s (1990) remarks (in response to Pullum 1989):

Inquiry is advanced by better data and fuller formalization, though the natural sciences would rarely if ever take seriously Pullum’s injunction that one should make ‘a concerted effort’ to meet ‘the criteria for formal theories set out in logic books’… Even in mathematics, the concept of formalization in our sense was not developed until a century ago, when it became important for advancing research and understanding. I know of no reason to suppose that linguistics is so much more advanced than 19th century mathematics or contemporary molecular biology that pursuit of Pullum’s injunction would be helpful, but if that can be shown, fine. For the present, there is lively interchange and exciting progress without any sign, to my knowledge, of problems related to the level of formality of ongoing work (Chomsky 1990: 146).

AI is different; since it involves computer simulation, everything has to be fully formalised. But in theoretical linguistics, as in anthropology, formalisation for its own sake is not required.

Moreover, Pinker (1994) and Jackendoff (2002) are both cited as exceptions to the general lack of rigour in work on generative grammar, but these works are no more or less formal than most P&P work. Further, there is formal work: Niyogi’s (2006) work on learnability and change in parametric systems; Clark & Roberts’ (1993) attempt to apply a genetic algorithm to learning a GB-type parametric system; Yang’s (2002) work on modelling both acquisition in the individual and variation in populations, and Stabler’s (1997) formalisation of an early version of the MP come to mind. So relative lack of formalism cannot be the whole story (unless this criterion is being applied selectively).

We are left with a serious question, one which I would like to think is of concern to both linguists and mainstream cognitive scientists: why is mainstream generative syntax overlooked in cognitive science as a whole? In the next section, I will suggest that this is due to some widespread misconceptions, several of which are suggested by Boden's discussion of generative grammar. I'll try to show why these are misconceptions, and conclude that they should therefore be put aside. This ought, in principle, to lead to a reevaluation of contemporary generative syntax by cognitive scientists, although before this can happen Boden’s version of history must be abandoned.


5. MEMES, MYTHS AND MISUNDERSTANDINGS OF CHOMSKYAN GENERATIVE SYNTAX

Boden’s critique of generative grammar is quite wide-ranging. A number of the points she makes are very familiar, and are often repeated (several of them feature in Lappin, Levine and Johnson’s (1998) attack on the MP). These ideas seem to have become memes in Dawkins’ (1976) sense: units of cultural transmission or imitation (see Boden’s discussion of this concept on pp. 562−568). The only thing required of a meme, according to Dawkins, is its ability to replicate in other minds or cultures. There is no requirement that any propositional content associated with a meme be true; Dawkins, for example, regards religions as “meme complexes”, and he certainly doesn’t think religions contain many true propositions (see Dawkins 2006). Here I’ll discuss the “Chomsky memes” or “generative syntax memes” which Boden replicates in her discussion, and try to show why they are all, with one exception, false. The exception is computational intractability, which seems to be an empirical fact about natural languages that we just have to live with. Leaving this exception aside, I submit that it is the set of false beliefs described below − or something close to it − which has led to the situation I have been describing.

1) Chomsky as dictator/guru.

Chomsky is an intellectual leader, who deserves our admiration for his achievements in linguistics and beyond. Here I see little ground for controversy. Chomsky is also much admired for his political courage, integrity and commitment (whether or not one agrees with his views on a given issue). In his combination of intellectual acumen, opening up of new intellectual territory and unstinting commitment to certain political causes, he is reminiscent of Russell. Boden appears to agree with all of this, and indeed mentions the similarity with Russell (641). Whether either should be called a “guru” is nonetheless debatable, but since it’s not clear what the term is supposed to mean (and Boden’s discussion is of no help in this respect) I’ll simply drop the matter. (It is in this context that she expounds “the tenfold Chomsky myth” that I mentioned above. Again, it’s not really clear what this is supposed to mean, but there is a clear implication that Chomsky and his associates have somehow developed a largely illusory belief system − see Note 2 above.)

Others have compared Chomsky to a totalitarian dictator; two quotations from Edelman (2003) to this effect are given (pp. 593, 667−668). Both claims imply that those who defend his ideas are somehow not in full possession of their faculties, either through devotion to the guru or terror of the dictator. Both claims are in fact nonsensical: nearly every linguist who works in a broadly Chomskyan paradigm disagrees with Chomsky on some point; one is committed to a research programme, not a religion or a personality cult. A moment’s reflection shows that, on the one hand, there is no organised, quasi-religious cult of devotion to Chomsky (or to his scientific ideas), and, on the other, no one has been sent to a gulag − real or virtual − or executed or in any other way harmed for dissenting from Chomsky’s scientific views (every single one of Chomsky’s detractors quoted by Boden has pursued a successful career as an academic linguist in a major institution, and this applies virtually across the board).


Why, then, are such ideas perpetuated? How does the guru/dictator meme replicate? The reason is that it performs a useful rhetorical function for Chomsky’s critics: it poisons the well for his defenders. It means, for example, that all of the points that I am trying to make in this article can be safely ignored, since I can be viewed as either a terrorised apparatchik or a mindless acolyte. If the reader believes either of these, there is no need to read on; the prejudices engendered by this meme can save him or her the trouble. If, on the other hand, the reader will trust to my good faith, then, perhaps, some headway can be made in resolving the dilemma described in the previous section.

So let us recognise that Chomsky is a charismatic intellectual figure who deserves our admiration and respect, but who neither commands nor deserves absolute obeisance or unthinking credulity.

This last point can perhaps be supported by considering the history of model-theoretic (or truth-functional, or Montague) semantics in relation to generative grammar. As Boden correctly points out in her discussion of Montague grammar (657−660), Barbara Partee was mainly responsible for introducing Montague’s work to linguists; this work applies the Tarskian approach to models and truth, developed for formal languages, to English. For a time, “Montague grammar” represented an alternative to “mainstream” Chomskyan generative syntax, again as Boden describes. However, the relationship between syntax and semantics in the architecture of the grammar shifted once evidence emerged that surface structure affects scopal and other semantic relations (see in particular Jackendoff 1972), so that semantic interpretation became a post-syntactic interpretative component, and it became increasingly apparent that Montague-style interpretations of pieces of syntactic structure could be added at this interpretative level.

The most obvious examples of this are the “Transparent Logical Forms” (TLFs) proposed by Heim (1993) and von Stechow (1993) (see, for example, the papers collected in Lutz, Müller & von Stechow 2000). Here each node in the phrase marker is associated with a (partial) logical form, written in intensional logic or whichever other formal language facilitates the evaluation of the structure (ultimately the truth or falsity of the proposition it expresses) in relation to a model.

Other approaches to integrating formal semantics (meaning any approach to interpretation which takes Fregean notions of truth and reference as its central ideas) with generative syntax have been advocated. Neo-Davidsonian approaches have been advocated by Higginbotham (1985), Schein (1992), Herburger (2000), for example. What is clear by now, though, is that the view that syntactic structures generated by Move and Merge are interpreted at (or after) the LF level into some formalism on the basis of which truth can be computed is quite widespread. Indeed, this assumption − which of course allows for a multitude of variants in practice − is so widespread that an entire textbook is based on it (Heim & Kratzer 1996).

But Chomsky has given voice numerous times to profound scepticism regarding the Fregean approach to semantics:

“[Human lexical items] lack the kind of relation to mind-independent elements that appears to be a general property of animal communication systems … there is no reference-like relation for human language, hence no semantics in the technical sense of Frege, Peirce, Tarski, Carnap, Quine, and others” (Chomsky 2005b: 5).


See also the more extended discussion in Chomsky (1995b: 22f.), and the reply to Ludlow (2003) and Horwich (2003) in Antony & Hornstein (2003: 287−304). Consensus in the field − the field of generative syntax − has to a fair extent gone in a different direction. And this isn’t a peripheral question; the relation between syntax and semantics is central to linguistic theory as a whole (and, arguably, to cognitive science, since it forms the basis of Searle’s famous (1980) critique of “strong AI”).

This situation hardly shows Chomskyan linguists as mindless acolytes or terrorised apparatchiks. Instead, I submit that most of us are engaged in theoretical linguistics for intellectual reasons, and find ourselves in agreement with Chomsky much, but not all, of the time. The idea that Chomsky is some kind of guru or dictator is nonsensical and merely serves to short-circuit genuine discussion.

2) “Real” data

It is often maintained that generative grammar does not deal with ‘real’ linguistic data. Boden appears to support this view in her quotation of the personal communication from Larry Trask that “Anything that any human being actually says is at once defined by Chomsky as out of the discipline” (593), and in her critical discussion of the competence/performance distinction (pp. 417−419, 667).

Trask’s claim is simply false. Indeed, for many years Chomsky would draw attention to the “creative aspect of language use” by pointing out that most of the sentences we say and hear every day are entirely novel (Chomsky 1965: 57−58; 1968/2006: 6). Moreover, generative grammarians make use of real examples, and make use of corpora of various kinds for various purposes, all the time. Corpora are particularly necessary where native-speaker intuitions are unavailable, as in language acquisition studies (CHILDES is as useful to generativists as to anyone else) and in work on dead languages (see in particular the work on the historical syntax of English carried out in recent years by Anthony Kroch and his associates: Kroch & Taylor 1997, 2000; Pintzuk 1993, 1999, 2002; Pintzuk & Taylor 2003).

But, since generative grammar is not a behaviourist theory, it does not limit itself to observed behaviour. Native-speaker intuitions are readily available as a source of evidence, and adequate for many purposes. To deny that one’s own linguistic intuitions are a manifestation of the language faculty seems perverse. It is also worth noting that the other theories considered by Boden (LFG, Montague grammar, GPSG/HPSG) all rely just as heavily on native-speaker intuitions as does “mainstream” generative grammar. And for comparative work one is often forced to rely on native-speaker intuitions at one remove, by questioning informants; one cannot wait for the crucial one-in-a-million example simply to turn up.

In summary, if “real” data means what people say, then such data are as valid for generative grammarians as for linguists of any other theoretical persuasion. However, there is no methodological stricture which requires one to rely on actual attestations where data based on native-speaker intuitions would do just as well.

Boden is highly critical of Chomsky’s (1965: 4−5) distinction between competence and performance, ultimately identifying it as the source of the way mainstream cognitive science overlooks generative grammar (417, 667). This strikes me as surprising, coming from a non-behaviourist. The distinction is really that between a system of knowledge (competence) and the actualisation of that system in behaviour (performance). When I am asleep, unconscious, or drunk, or if I decide to become a Trappist monk and never speak again, I remain a native speaker of English. Unless and until some catastrophe (ECT, stroke, lobotomy, death) removes the relevant parts of my brain, that is true whatever I do. When I actually speak or write English, I put that competence into action, but I may do lots of other things (walk, chew gum, think about cognitive science) at the same time; and some of those things (chewing gum, perhaps) may affect my performance. This distinction would be inadmissible to behaviourists, but it is a necessary and useful abstraction as a preliminary to discussing formal grammars (seen as aspects of the computational mind). It is also useful in other areas; one can surely conceive of a “cultural competence” which is in principle distinct from actual behaviour. Furthermore, in AI, as in other areas of programming, the distinction between the existence of a program and actually running that program is obvious. In this light, Boden’s rejection of the competence-performance distinction is very hard to understand.

Boden, however, claims that performance is “what experimenters have to work on when they observe their speaking subjects”. But an inference from performance to competence can be made. In fact, native-speaker judgements reflect performance, not competence (and as such are acceptability, not grammaticality, judgements). But practising syntacticians and semanticists − of all theoretical persuasions − make the inference that certain types of judgements can reflect competence. Exactly the same move is available to experimental psycholinguists and psychologists. Hence, it is false to claim that the competence/performance distinction “erected a solid firewall” between “theoretical and empirical studies of language” (417); and doubly false to imply that this is true only of Chomskyan syntax, as opposed to LFG, GPSG/HPSG or Montague grammar. This conclusion implies that Boden’s diagnosis of the origin of the rift between cognitive science and generative grammar is incorrect. In fact, I submit that that diagnosis is part of the problem rather than part of the solution.

3) “Psychological reality”

Related to the above point is the idea that in studying abstract formal competence grammars, Chomsky removes theoretical linguistics from the domain of the “real world”.

But Chomsky’s work, including the work on the hierarchy of formal grammars, has always been concerned with discovering something about the human mind. Like Turing (but unlike Frege or Montague), Chomsky has always understood that this could be done by constructing abstract formal models of cognitive capacities. This was what Chomsky was doing in the 1950s when his results so impressed the nascent cognitive science community, and it is what he is still doing now, with the MP version of P&P as described above. Further validation of these models from psychological, neurological or computational work is welcome but not necessary. The same goes for invalidation. But it seems to me that neither has been forthcoming. Boden mentions the early failure of the “derivational theory of complexity” − the idea that the more transformations employed in the derivation of a sentence, the longer it would take to parse − as an early setback. But it has been clear for many years that this conclusion was based on the wrong theory of transformations and the misapprehension that an abstract syntactic derivation might have a real-time processing analogue. In discussing NLP under a different heading from syntactic theory − and indeed mentioning that NLP could have been included in the separate chapter on GOFAI (591) − Boden in fact seems to implicitly acknowledge this.
