Inductive learning of attribute path values in typed-unification grammars

Liviu Ciortuz*

Department of Computer Science, University of Iasi, Romania. E-mail: [email protected]

Abstract. This work aims to show that inductive logic programming (ILP) is a suitable tool for learning linguistic phenomena in typed-unification grammars, a class of attribute-value grammars increasingly used in natural language processing (NLP).

We present a strategy for generating hypothesis spaces for either generalisation or specialisation of attribute-path values inside type definitions. This strategy is the core of a prototype module extending the LIGHT system for parsing with typed-unification grammars.

1 Introduction

The work on typed-unification grammars can be traced back to the seminal paper on the PATR-II system [18]. Basically, the design of such grammars rests on two simple ideas: i. context-free rules may be augmented with constraints, generalising the grammar symbols to attribute-value matrices (also called feature structures); ii. feature structures (FSs) may be organised into hierarchies of types, a very convenient way to describe classes of rules and lexical entries in a concise manner.

Different logical perspectives on such a grammar design have been studied extensively; see [4] for a survey. For the LIGHT parsing system [7] we adopted the Order-Sorted Feature (OSF) constraint logic framework [2]. In this way, typed-unification grammars may be seen as a generalisation of Definite Clause Grammars (DCG, [15]), similar to the way in which LOGIN generalised Prolog [1].2

* This paper was published in The Scientific Annals of the “Al.I. Cuza” University of Iasi, Romania, Computer Science Series, 2003, pages 105–125.

2 As a matter of terminology, the ψ-terms in LOGIN, or OSF-terms as they were renamed in [2], correspond to Feature Structures in the Natural Language Processing (NLP) and Computational Linguistics (CL) literature.

OSF-terms generalise first-order terms in the following respects:

i. constant symbols (‘sorts’) are organised into a hierarchy, assumed to be an inferior semi-lattice;

ii. the fixed order of subterms is replaced by identification via attributes/features;

iii. variables/tags can identify not merely leaf nodes in the tree representing a term, but whole sub-trees.

Unification on such terms is either OSF-unification or OSF-theory unification, the latter being known as ‘typed-unification’ to NLP/CL people.
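These generalisations can be made concrete with a small sketch. The sort hierarchy, sort names, and helper functions below are illustrative assumptions, not the LIGHT implementation; structure sharing via tags (point iii) would need a union-find and is omitted:

```python
# Minimal OSF-term sketch (illustrative; not the LIGHT implementation).
# i.  sorts form a hierarchy (child -> parent), here a toy one;
# ii. subterms are identified by feature names, not by position;
# iii. tags/structure sharing would need a union-find and is omitted.
PARENT = {"noun": "categ", "det": "categ", "categ": "top", "top": None}

def ancestors(s):
    chain = []
    while s is not None:
        chain.append(s)
        s = PARENT[s]
    return chain

def glb(s1, s2):
    """Greatest lower bound of two sorts, or None on a sort clash."""
    if s1 in ancestors(s2):
        return s2              # s2 lies below s1, hence is more specific
    if s2 in ancestors(s1):
        return s1
    return None                # incomparable sorts

class FS:
    """A feature structure: a root sort plus feature -> FS values."""
    def __init__(self, sort, **feats):
        self.sort, self.feats = sort, feats

def unify(a, b):
    s = glb(a.sort, b.sort)
    if s is None:
        return None                      # sort clash: bottom
    feats = dict(a.feats)
    for f, v in b.feats.items():
        # features present in both terms must unify recursively
        if f in feats:
            u = unify(feats[f], v)
            if u is None:
                return None
            feats[f] = u
        else:
            feats[f] = v
    return FS(s, **feats)
```

For instance, unify(FS("categ"), FS("noun")) succeeds with root sort noun, while unify(FS("noun"), FS("det")) fails with a sort clash.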


The aim of our work is to explore the usefulness of a logic-based learning approach — namely Inductive Logic Programming (ILP, [14]) — to improve the coverage of a given typed-unification grammar: we either generalise some rule or lexical type feature structures (FSs) in the grammar so as to make it accept a given sentence/parse tree, or try to specialise a certain type in the grammar so as to reject a (wrongly accepted) parse.

Among several related approaches to ILP-based learning of unification grammars, we mention first the work by Cussens and Pulman [9]. By generalising over ‘examples’ of chart-based parse actions which can build full parses starting from partial ones, they ‘invent’ new rules to be added to the grammar. Somewhat complementary to theirs, our approach proposes new versions of existing rules, obtained either through generalisation or specialisation. Moreover, our approach is not limited to rules, being capable of affecting any type FS in the hierarchy of types which defines the grammar. Conceptually, while the previously mentioned approach is closer to the parsing process, ours is closer to the logic underlying the grammar. An earlier approach [21] proposed learning grammars from treebanks (sets of sentences with associated parses) and performed ILP-based lexical inferences in order to make the learned grammar act deterministically.

The organisation of this paper proceeds somewhat top-down: Section 2 outlines our ILP-based strategy for learning attribute path values in typed-unification grammars. Section 3 first presents the two procedures for learning generalisations and, respectively, specialisations of type FSs, and then formalises the refinement operators used by these learning procedures. Sections 4 and 5 go into low-level details to exemplify how the two learning procedures work on a simple but significant HPSG grammar, which is given in the Appendix. Section 6 briefly presents the strategies we designed to scale up the present learning approach to a large HPSG grammar for English.

The present paper is a revised and extended version of [8].

2 Overview

The main idea of ILP-based learning is to generate (and then evaluate) new clauses starting from those defined in the input program/grammar. The acceptance/rejection of new ‘hypotheses’ is done using a set of examples and an evaluation function that rates the inferred hypotheses. Learning a new clause in the case of typed-unification grammars amounts to the creation of new type FSs, and this is done either by generalisation or by specialisation. Generalisation either relaxes or removes one or more atomic OSF-constraints, while specialisation adds new such constraints to a type FS.

We have to note that in lexicalized large-scale typed-unification grammars, due to the large number of exceptions in human languages, NLP/CL people prefer doing parsing in a bottom-up manner (rather than top-down, as Prolog and DCG do).

satisfy_HPSG_principles
[ PHON diff_list,
  CAT #1:categ,
  SUBCAT #2:categ_list,
  HEAD #4:phrase_or_word
       [ PHON diff_list,
         CAT #1,
         SUBCAT categ_list [ FIRST #3:categ,
                             REST #2 ] ],
  COMP #5:phrase_or_word
       [ PHON diff_list,
         CAT #3,
         SUBCAT nil ],
  ARGS <#4, #5> ]

          satisfy_HPSG_principles
           /                   \
    lH_phrase               rH_phrase

X:s — sort constraint    X.f = Y — feature constraint    X .= Y — equational constraint

Fig. 1. A sample type feature structure, a simple sort/type hierarchy, and the atomic OSF-constraints for the logical description of feature structures.

To illustrate a type feature structure, Figure 1 presents satisfy_HPSG_principles, the parent of the two (binary) rules used in a simple Head-driven Phrase Structure Grammar (HPSG, [16]) appearing in [19]. The satisfy_HPSG_principles type encodes three HPSG principles: the Head Feature Principle, the Subcategorization Principle, and the Saturation Principle.3 The knowledge embodied in satisfy_HPSG_principles will be inherited by two rules: lH_phrase and rH_phrase.4 5

The architecture of the ilpLIGHT prototype system we implemented for learning attribute path values in typed-unification grammars is shown in Figure 2. The initial grammar is submitted to an Expander module, which propagates the appropriate constraints down into the type hierarchy. The Parser module uses the expanded form of (syntactic) rules and lexical descriptions to analyse input sentences. The ILP/learner module receives (maybe partial) parses produced by the Parser, and — by calling one of two hypothesis-generation procedures — infers either more specific or more general type descriptions for the considered grammar, such that the new grammar will provide a better coverage of the given sample sentences.

3 The feature constraint encoding of the three principles is, respectively:
   Ψ.cat .= Ψ.head.cat,   Ψ.head.subcat .= Ψ.comp.cat | Ψ.subcat,   Ψ.comp.subcat .= nil,
where Ψ = satisfy_HPSG_principles, as given in Figure 1.

4 One can easily imagine classical np and vp rules as derived from rH_phrase and lH_phrase, respectively.

5 Mainly, the constraints specific to these rules, as shown in the Appendix, impose that the head argument occupies the left, respectively the right, position inside a phrase, reflected in the value of the phon feature. Hence the names of the two rules, standing for left-headed phrase and right-headed phrase, respectively. The constraints imposed on the subcat values — cons and nil, respectively — for these two rules are meant to rule out phrases like “girl the” and “nice is”, which would otherwise be accepted by the grammar.

[Fig. 2 diagram: the input grammar feeds the Expander; the Parser consumes the expanded grammar and input sentences; the ILP/learner receives example parses and produces the output grammar; Parser and Expander are linked to the ILP/learner by bidirectional arrows.]
Fig. 2. The ilpLIGHT architecture for learning typed-unification grammars.

The bidirectional arrows in the diagram in Figure 2 are due to the double functionality of the Parser and Expander modules. When asked to act in ‘reverse’ mode, the Parser takes as input a parse and tries to build up the FS associated to that parse. If the construction fails, then LIGHT's tracer component is able to deliver an ‘explanation’ for the failure.6 This failure ‘explanation’ will be analysed by the ILP/learner module to propose ‘fixes’ to the grammar so that the parse gets accepted. The Expander can also work in ‘reverse’ mode: given as input a type and an atomic constraint contained in the description of that type, the Expander will indicate which type in the input (unexpanded) grammar is responsible for the introduction of that constraint in the expanded form of the designated type.

We conducted a series of experiments during which the ilpLIGHT prototype system was able to learn by specialisation each of the three HPSG principles, provided that the other two were present and a (rather small) set of sample sentences was given.

Equally, the system learned lexical descriptions by specialisation, and was able to recover (by generalisation) the definition of the HPSG principles if they were provided in an over-restricted form. In all the cases mentioned above, learning took only a few minutes.7

6 This explanation assembles information on: i. the parse operation at which the ‘derivation’ of the FS associated to the input parse was blocked; ii. the feature path along which the last (tried) unification failed; iii. the (atomic) unification steps which led to the sort clash causing the unification failure.

7 For instance, learning the Subcategorization Principle as presented in Section 4 took 4 min. 25 sec. on a Pentium III PC at 933 MHz running Linux Red Hat 7.1, using the sample HPSG grammar in [18] and a set of 14 training sentences. Generalising the over-constrained form of satisfy_HPSG_principles presented in Section 5 took 2 min. 23 sec.

We suggest that this speed can be substantially improved in a system which avoids recompilation of the grammar (despite the fact that the compilation speeds up parsing).


3 Learning attribute values

3.1 Algorithms

Two procedures, in charge of grammar generalisation and type specialisation respectively, make up our strategy for learning typed-unification grammars.

These procedures are presented in detail in Figure 3. The key idea behind the two procedures is simple: given a type FS, we generalise or specialise it by removing/relaxing, respectively adding, one or more atomic OSF-constraints inside the given FS.

The operators for creation/deletion of atomic OSF-constraints invoked by the two procedures will be given in detail in Section 3.2, Figures 5 and 7. The names of these operators — relax-sort, remove (feature or equation constraint), equation introduction, sort specialisation, type unfolding, hypothesis combination — are intuitive enough to let the reader go through the text of this subsection without needing to refer to low-level technical details here.

A design difference between the two procedures: the generalisation procedure picks up automatically — by running the parser in ‘reverse’ mode, taking as input a parse which is (wrongly) rejected by the grammar — the type FS it has to generalise. The specialisation procedure, in the form given in Figure 3, instead requires that the type and the path (inside that type) from which it has to start building sub-type specialisations be specified as part of its input.

In other words, the specialisation procedure works only on a single type FS, inferring different FSs which are subsumed by the value of the specified path (inside that type). Once called, it never looks for another type (to specialise). The generalisation procedure, by contrast, may proceed with identifying different types that can make the grammar accept the (wrongly) rejected parse/sentence.

Note that if generalisation finds a hypothesis which reaches this goal, but the new form of the grammar over-parses some of the given sample sentences, then the generalisation procedure may call the specialisation procedure to refine that hypothesis/type.8

Both procedures work iteratively: the main operations are performed inside a while loop. At each iteration, a set of new hypotheses is added to the search space.

The new hypotheses are evaluated one by one, and then the algorithm decides either to stop or to further search for other, better hypotheses.

For both the generalisation and specialisation procedures, the acceptance or rejection of a hypothesis is based — as inspired by ILP — on comparing the parsing results with a given annotation on the set of sample sentences. In the current prototype of the ilpLIGHT system, we annotate each sentence with (the number of) its correct parses. Any other parse will be considered a negative example.
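The count-based acceptance test just described can be sketched as follows (a toy stand-in; the actual ilpLIGHT evaluator operates on LIGHT parses, and the function names are illustrative):

```python
# Toy stand-in for the count-based hypothesis evaluation described above;
# the real ilpLIGHT evaluator works on LIGHT parse results.
def evaluate(parse_counts, annotation):
    """parse_counts: sentence -> number of parses under the hypothesis grammar.
    annotation:   sentence -> annotated number of correct parses.
    Returns (missed, spurious); every parse beyond the annotated count
    counts as a negative example."""
    missed = spurious = 0
    for sentence, gold in annotation.items():
        got = parse_counts.get(sentence, 0)
        missed += max(gold - got, 0)
        spurious += max(got - gold, 0)
    return missed, spurious

def acceptable(parse_counts, annotation):
    # keep a hypothesis only if it reproduces the annotation exactly
    return evaluate(parse_counts, annotation) == (0, 0)
```

An over-general grammar that also parses an ungrammatical sentence would score a non-zero spurious count and be rejected.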

8 Of course, the specialisation procedure may also proceed independently, as in fact will be exemplified in the next section.


Generating typed-unification grammar generalisations

Input: a grammar G, already expanded (all types being normalised),
       E = E+ ∪ E− — a set of sample grammatical (E+) and un-grammatical (E−) sentences, and
       a parse δ for a grammatical sentence rejected by G.
Output: G′, a grammar more general than G (logically: G |= G′), such that G′ accepts δ.

Procedure (Generalisation):
1. G′ := G;
2. While δ is not accepted by G′ do:
   a. Find a failing unification in δ — unify(ϕ, ψ) — and an associated failure path π.
   b. Eliminate the sort clash ϕ.π ∧ ψ.π = ⊥ by using the rule
         X:s & Y:s′ & X .= Y & s ∧ s′ = ⊥
         ⇒ relax-sort(X, LUB(s, s′)) or relax-sort(Y, LUB(s, s′)) or remove(X .= Y)
   c. Take G′ one of the grammar versions produced by relax-sort and remove.
3. For the pairs (Ψ, π) of ‘faulting’ types Ψ and failure paths π identified successively for G′ at point 2.a, in case of over-generation run the specialisation procedure (see below).

Generating a type specialisation hypothesis space

Input: a grammar G = {Ψ(s)}, s ∈ S, already expanded (and all types being normalised),
       a type Ψ(t) (with t ∈ S), and a feature path π in Ψ(t);
       E = E+ ∪ E− — a set of sample sentences.
Output: a rooted directed acyclic graph Γ of specialisation hypotheses (ψ, ρ):
       ψ ⊑ Ψ(t), with π a prefix of ρ, and
       (ψ1, ρ1) precedes (ψ2, ρ2) in Γ iff ψ1 ⊒ ψ2 and ρ1 is a prefix of ρ2.

Procedure (Specialisation):
(ψ, ρ) := (Ψ(t), π);                 % initialise (the root hypothesis of) Γ
while (ψ, ρ) is not a convenient hypothesis        % cf. expand + parse + evaluate
      and the specialisation hypothesis space was not yet exhaustively searched
{  generate specialisation hypotheses for (ψ, ρ) by
      equation introduction, sort specialisation, type unfolding, or hypothesis combination;
   take (ψ, ρ) the next not-yet-analysed hypothesis
      % (“next”: in the sense of the above stated “precedence” relation)
}

Fig. 3. Procedures for hypothesis space generation in inductive learning of attribute-path values in typed-unification grammars.
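The control flow of the generalisation procedure in Figure 3 can be rendered minimally as follows; accepts, find_clash, and repairs are hypothetical stand-ins for LIGHT's parser, tracer, and the step-2.b repair rules:

```python
# Control-flow skeleton of the generalisation procedure of Figure 3
# (step 2), over a grammar represented as a frozenset of atomic
# constraints. accepts, find_clash, and repairs are hypothetical
# stand-ins for LIGHT's parser, tracer, and the step-2.b repair rules.
def generalise(grammar, accepts, find_clash, repairs):
    g = grammar
    while not accepts(g):               # step 2: parse delta under g
        clash = find_clash(g)           # step 2.a: failing unification + path
        candidates = repairs(g, clash)  # step 2.b: relax-sort / remove
        if not candidates:
            return None                 # no more general grammar accepts delta
        g = candidates[0]               # step 2.c: keep one produced version
    return g
```

A real run would evaluate every candidate grammar version and, on over-generation, fall through to the specialisation procedure of step 3.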


X:s & Y:s′ & X .= Y
⇒ X : (s ∧ s′)                                        sort refinement

X.f = U & X .= Y & (∃Z, g, Y.g = Z)
⇒ Y.f = U                                             feature carrying

Y.f = U & Y.f = V
⇒ U .= V                                              coreferencing variables

X : (s ∧ s′) & (s ∧ s′) ≠ s & (∃Z, f, X.f = Z)
⇒ unfold(X, (s ∧ s′))                                 type-checking
                                                      (in connection with sort refinement)

Fig. 4. The rewriting rules for synthesising typed FS unification.

3.2 Operators

This subsection formalises the operators used in the two procedures for generating hypotheses while learning attribute values for typed-unification grammars, as presented in the previous subsection.

On a first reading, this subsection may be skipped; the reader interested in the technical details of implementing the two learning procedures may come back to it later.

The rewriting rules which synthesise the unification of typed FSs are presented in Figure 4. The generalisation operators have been designed in connection with these rewriting rules. They are presented in Figure 5 (the basic version) and Figure 6 (the extension corresponding to type-checking).
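As an illustration of how rules of this kind drive unification, here is a toy normaliser over sets of atomic constraints. It is a sketch, not the LIGHT machinery: glb() hard-codes a two-level sort hierarchy, and feature carrying plus coreferencing variables are compressed into one cross-equation step:

```python
# Toy normaliser applying Figure 4-style rewriting rules to a set of atomic
# OSF-constraints. Illustrative only: glb() hard-codes a two-level sort
# hierarchy, and feature carrying + coreferencing variables are compressed
# into one cross-equation step.
def glb(s1, s2, order=(("categ", "noun"), ("categ", "det"))):
    if s1 == s2:
        return s1
    if (s1, s2) in order:
        return s2          # s2 lies below s1 in the hierarchy
    if (s2, s1) in order:
        return s1
    return None            # incomparable sorts: bottom

def normalise(sorts, feats, eqs):
    """sorts: var -> sort;  feats: (var, feature) -> var;  eqs: set of (x, y)."""
    changed = True
    while changed:
        changed = False
        for x, y in list(eqs):
            # sort refinement: X:s & Y:s' & X=Y  ==>  X:(s /\ s')
            if x in sorts and y in sorts and sorts[x] != sorts[y]:
                s = glb(sorts[x], sorts[y])
                if s is None:
                    return None          # sort clash: unification fails
                sorts[x] = sorts[y] = s
                changed = True
            for (v, f), u in list(feats.items()):
                # feature carrying: X.f=U & X=Y  ==>  Y.f=U
                if v == x and (y, f) not in feats:
                    feats[(y, f)] = u
                    changed = True
                # carrying + coreferencing: X.f=U & X=Y & Y.f=V  ==>  U=V
                if v == y and (x, f) in feats and feats[(x, f)] != u \
                        and (feats[(x, f)], u) not in eqs:
                    eqs.add((feats[(x, f)], u))
                    changed = True
    return sorts, feats, eqs
```

For example, from X .= Y, X.f = U, Y.f = V, U:categ, V:noun, the normaliser derives U .= V and refines both sorts to noun; from X:noun, Y:det, X .= Y it detects the sort clash and fails.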

As a matter of notation, in Figures 5 and 6 the underscore character (–) conveniently designates either i. the actual sort of a given variable or ii. a certain (computable) sort which meets a specified condition. The notation relax-sort(X/Y, –) is an abbreviation for relax-sort(X, –) and/or relax-sort(Y, –). The term genuine constraint denotes an atomic constraint which appears as such in the input grammar. Non-genuine constraints are those produced/synthesised during unification.

The specialisation operators presented in Figure 7 are responsible for the generation of new hypotheses, generally by the introduction of new constraints inside the FS of a given hypothesis. Figure 8 outlines the improvements that can be added to the specialisation operators when using positive examples to guide the search through the hypothesis space.
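Two of the Figure 7 operators can be sketched over a flattened hypothesis representation (a dict from feature paths to root sorts); the SUBSORTS hierarchy and the path names below are illustrative assumptions:

```python
# Sketch of two Figure 7 operators over a flattened hypothesis: `paths` maps
# feature paths to their root sorts, `rho` is the path under specialisation.
# SUBSORTS (a toy two-level hierarchy) and the path names are assumptions.
SUBSORTS = {"categ_list": ["nil", "categ_cons"], "categ": ["noun", "det"]}

def compatible(s1, s2):
    # non-bottom meet in the toy hierarchy: equal, or one subsumes the other
    return s1 == s2 or s2 in SUBSORTS.get(s1, []) or s1 in SUBSORTS.get(s2, [])

def sort_specialisation(paths, rho):
    # replace the root sort at rho by each of its immediate subsorts
    for sub in SUBSORTS.get(paths[rho], []):
        yield {**paths, rho: sub}, rho

def equation_introduction(paths, rho):
    # equate rho with every other path whose root sort has a non-bottom meet
    for other in paths:
        if other != rho and compatible(paths[rho], paths[other]):
            meet = paths[other] if paths[other] in SUBSORTS.get(paths[rho], []) else paths[rho]
            yield {**paths, rho: meet, other: meet}, rho
```

Each generated pair is one node of the hypothesis graph Γ, to be expanded, parsed, and evaluated as in Figure 3.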

relax-sort(X, s) ≡
   if X = Ψ.π, and X : – is a genuine constraint, then
      replace the original sort constraint on Ψ.π with Ψ.π : s
   else (i.e. X : (s1 ∧ s2), using sort refinement)
      (relax-sort(X, s) and relax-sort(Y, s)) or remove(X .= Y).

remove(Y.f = Z) ≡
   if Y.f = Z is a genuine constraint (Y = Ψ.π), then delete it from Ψ;
   else (i.e. if remove(Y.f = Z) comes as a result of feature carrying)
      remove(X .= Y) or remove(X.f = U) or remove(Y.f = V),
      where X .= Y, X.f = U were used in the premise of feature carrying.

remove(U .= V) ≡
   if U .= V is a genuine constraint (U = Ψ.π, V = Ψ.π′, U ≠ V), then
      delete it from Ψ (either a rule, or a type used in type-checking), and
      propagate the dis-equation U ≠ V in Ψ;
   else, i.e. U .= V was synthesised by (combined) application of feature carrying
      and coreferencing variables,
      remove(X .= Y) or remove(X.f = U) or remove(Y.f = V),
      where X .= Y, X.f = U were used in the premise of feature carrying,
      and Y.f = V in coreferencing variables.

Fig. 5. The (basic) operators used in generalising typed-unification grammars.

The next two sections will show in detail how the two procedures presented in the previous subsection learn attribute path values in type FSs by generating (and searching through) spaces of hypotheses for grammar generalisation and, respectively, sub-type specialisation. The HPSG grammar we used for testing our learning strategy is detailed in the Appendix. This grammar is capable of analysing sentences like:

Mary laughs.

John meets Mary.

John embarrasses the girl.

The girl is nice.

John thinks Mary is nice.

Mary thinks John thinks Mary is pretty.

Mary thinks John embarrasses the girl.

John thinks Mary thinks the girl thinks John is embarrassed.

4 Learning generalisations of attribute values: exemplification

This section will exemplify how to learn/generate grammar generalisations using the first procedure in Figure 3.


relax-sort(X, s):
   if X : – comes with type-checking, then
      eliminate (that) type-checking or relax-sort(X/Y, –) so as to imply X : s.

remove(Y.f = Z):
   if Y.f = Z comes with type-checking, then
      eliminate (that) type-checking or relax-sort(X/Y, –) so as not to involve Y.f = Z.

remove(U .= V):
   if U .= V comes with type-checking, then
      eliminate (that) type-checking or relax-sort(X/Y, –) so as not to involve U .= V.

eliminate type-checking ≡
   relax-sort(Y, –) so that s ∧ s′ = s, or
   remove all feature constraints X.f = Z, or
   remove(X .= Y), where X .= Y was used in the premise of sort refinement.

Fig. 6. The type-checking extensions to the generalising operators.

Consider that the type satisfy_HPSG_principles in our training grammar was further constrained such that cat .= comp.cat.9 Let us denote by Ψ0 this (over-constrained) form of the type satisfy_HPSG_principles. As a consequence, the grammar will now recognise at most sequences of words having the same category (like noun noun), and therefore no sentence from the initial set of examples will be accepted. How will the ilpLIGHT system have to proceed to recover the initial form of the grammar (or a form close to it), so as to parse the initial set of examples in an (acceptably) correct manner?

We run the new (expanded) grammar using the LIGHT parser in ‘reverse’ mode, giving as input a simple grammatical parse like rH_phrase(the, girl). The information provided by the tracer module of the LIGHT unifier — which gets stopped at the moment when that parse fails to be recognised — is that after having accepted the key argument girl for the rule rH_phrase, the unification with the complement argument the fails. The unification failure involves two incompatible sort constraints, X:noun and Y:det, found on the path values X = girl.cat and, respectively, Y = girl.subcat.first (.= the.cat).

The LIGHT unification tracer module finds this conflict as a sort-inconsistent pair of paths: (girl.cat, girl.subcat.first). How will the ilpLIGHT system use this information to arrive (if possible) at the removal of the constraint cat .= comp.cat (or, equivalently, head.cat .= comp.cat) in Ψ0? There are two recovery possibilities:

9 Adding this constraint corresponds to equating the tags/variables #1 and #3 in the expanded form of the satisfy_HPSG_principles type in Figure 1.


equation introduction ≡
   if root(ψ.ρ) = s, then
      for all paths ρ′ in ψ such that root(ψ.ρ) ∧ root(ψ.ρ′) ≠ ⊥,
      generate the hypothesis (ψ′, ρ), where
      ψ′ is obtained from ψ by unifying/equating ψ.ρ with ψ.ρ′.

sort specialisation ≡
   if root(ψ.ρ) = s and s1, s2, ..., sn are all the immediate subsorts of s in S, then
      generate the hypotheses (ψi, ρ), where
      ψi is obtained from ψ by replacing the root sort of ψ.ρ with si.

(eventual) unfolding ≡
   If ψ.ρ is an atomic FS, s = root(ψ.ρ), and the type Ψ(s) is non-atomic, then
      generate the hypotheses (ψ′, ρi), where
      ψ′ is obtained from ψ by unifying ψ.ρ with (a copy of) Ψ(s), and
      ρi = ρ.fi, for each feature fi defined at root level in Ψ(s).
   If ψ.ρ is non-atomic, then
      generate the hypotheses (ψ, ρi), where ρi = ρ.fi,
      for each feature fi defined at root level in ψ.ρ,
      provided that the hypothesis (ψ, ρi) was not already generated.

hypotheses combination ≡
   if (ψ, ρ) was created before (ψ′, ρ′),
      neither ρ′ is a prefix of ρ nor ρ is a prefix of ρ′, and unify(ψ, ψ′) doesn't fail, then
      generate the hypothesis (ψ & ψ′, ρ).

Fig. 7. The operators used in generating the type specialisation hypothesis space.

1. Relax (or even remove) either one of the two constraints X:noun and Y:det involved in the sort clash.

Suppose that we run the grammar with the sort value of either one of girl.cat or girl.subcat.first relaxed to categ (or removed). When the grammar is re-expanded — as it has to be, for the sake of logical soundness — the old, clashing constraints will be re-instated from Ψ0. Therefore, the removal or relaxation of such a sort constraint will have to be reversely propagated into the un-expanded grammar.

To reach this aim, reverse expansion first suggests relaxing/removing the constraint cat:noun from the noun_le type. Alternatively, the constraint on det_le.cat can be relaxed to categ. However, in all of these ways the grammar will accept many incorrect sentences (like “the laughs”, if the cat value for det_le was relaxed). (This is why these three alternatives were represented as compacted into a single search path in Figure 9.) Therefore the second generalisation path must be pursued:

sort specialisation:
   it is enough to consider directly those subsorts si (of s) for which si : LUB(S), where
   S = { root(ϕ.ρ) | ϕ is a substructure in the FS associated to a parse for a positive example, with root(ϕ) = root(Ψ(t)) };

equation introduction:
   consider X .= Y, where ψ.ρ = X and ψ.ρ′ = Y, only if
   for ϕ chosen as above, ϕ.ρ and ϕ.ρ′ are identical/equated;

eventual unfolding:
   do not apply it for attribute paths ρ for which ϕ.ρ is undefined for any ϕ taken as above.

Fig. 8. Refinements of the specialisation operators — using positive examples to guide the search.

2. Remove the “inner” source of the sort clash, namely the equation rH_phrase.head.cat .= rH_phrase.head.subcat.first. LIGHT's unification tracer will reveal that this equation — identified by analysing backwards the coreference chain between the heap cells representing variables — has led to the sort clash for rH_phrase.head.cat = girl.cat and rH_phrase.head.subcat.first = girl.subcat.first (= the.cat).

Actually, the equation rH_phrase.head.cat .= rH_phrase.head.subcat.first will be removed not (only) from the rH_phrase type but from satisfy_HPSG_principles, i.e. Ψ0, as dictated by reverse expansion.

Removing an equation constraint U .= V requires the creation of a copy U of the feature structure V shared in Ψ0 by the paths head.cat and head.subcat.first. The reader should notice that in our example, removing the equation Ψ0.head.cat .= Ψ0.head.subcat.first does not generate a single FS generalising Ψ0, but four! This is because the value shared (before equation elimination) by the two paths inside Ψ0 was also shared with the paths cat and comp.cat. Removing the equation head.cat .= head.subcat.first from Ψ0 entails the creation of all possible permutations corresponding to the potential values (U and V) for the paths cat and comp.cat. This is why/how Ψ1, Ψ2, Ψ3, Ψ4 are generated, as shown in Figures 9 and 10. The evaluation of the parsing results in each case will show that one of them (Ψ3) is the right generalisation.
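The combinatorics behind these four generalisations can be sketched as follows (the path names follow the text; split_equation is a hypothetical helper, not ilpLIGHT code):

```python
# Why removing head.cat .= head.subcat.first from Psi0 yields four
# hypotheses: the shared value splits into two copies (U for head.cat,
# V for head.subcat.first), and each other path that shared the old value
# (cat, comp.cat) must independently follow one of the two copies.
# split_equation() is a hypothetical helper, not ilpLIGHT code.
from itertools import product

def split_equation(kept_paths, freed_paths):
    hypotheses = []
    for choice in product(["U", "V"], repeat=len(freed_paths)):
        binding = dict(zip(kept_paths, ["U", "V"]))
        binding.update(zip(freed_paths, choice))
        hypotheses.append(binding)
    return hypotheses

hyps = split_equation(["head.cat", "head.subcat.first"], ["cat", "comp.cat"])
```

With two freed paths there are 2 × 2 = 4 bindings, matching the four permutations Ψ1, Ψ2, Ψ3, Ψ4 mentioned above.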

5 Learning specialisations of attribute values: exemplification

This section will exemplify how the procedure for generating type specialisations presented in Figure 3 works for learning the Subcategorization Principle encoded in the type satisfy_HPSG_principles, already presented in Figure 1.

[Fig. 9 diagram: starting from Ψ0 = satisfy_HPSG_principles & [CAT = COMP.CAT], derivational/reverse parsing detects the sort-inconsistent pair (girl.SUBCAT.FIRST, girl.CAT); (reverse) expansion compacts the relaxations of noun_le [CAT categ] and det_le [CAT categ] into one failing search path, while removing the equation rH_phrase [HEAD.SUBCAT.FIRST = HEAD.CAT] and propagating the disequation generates the hypotheses Ψ1, Ψ2, Ψ3, Ψ4, of which only Ψ3 succeeds under evaluation.]
Fig. 9. A sample (grammar) search space when generalising a type.

Note that the Subcategorization Principle is directly linked to the nature of parsing itself in lexicalized typed-unification grammars like HPSG, Lexicalized Functional Grammars (LFG, [12]), and Categorial Unification Grammars (CUG, [20]): the (few) rules only provide very general principles/constraints about how phrases or words may be combined, while information about the specific combinatorial valences of words is provided in the (many) lexical descriptions.

Basically, as formalised here on the right, the Subcategorization Principle says that when applying a rule — in order to build a mother FS out of a head daughter and a complement daughter — the complement daughter “consumes” the first element of the head daughter's subcat list, and “transmits” the rest of that list to the mother FS.

satisfy_HPSG_principles
[ SUBCAT #2,
  HEAD phrase_or_word [ SUBCAT #3|#2 ],
  COMP phrase_or_word [ CAT #3 ] ]

So, can we learn this very nature of parsing in such lexicalized typed-unification grammars? How should we proceed? We outline below how we apply our learning strategy for type specialisation. Figure 11 shows the specialisation hypothesis search space explored/created when applying the second procedure in Figure 3.
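Reduced to plain lists of category names, the principle amounts to a single check (illustrative only; actual subcat values are typed feature structures, not Python lists):

```python
# The Subcategorization Principle reduced to plain lists of category names
# (illustrative; real subcat values are typed feature structures).
def subcat_principle(head_subcat, comp_cat):
    """Returns the mother's subcat list, or None if the rule cannot apply."""
    if not head_subcat or head_subcat[0] != comp_cat:
        return None              # complement doesn't match FIRST of subcat
    return head_subcat[1:]       # REST is "transmitted" to the mother
```

For instance, building “the girl” with rH_phrase: the head girl subcategorises for a det, the complement the has category det, and the resulting phrase's subcat list is empty, i.e. saturated.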

1. To eliminate the Subcategorization Principle from our grammar, we delete the value of the head.subcat feature path in the unexpanded definition of the satisfy_HPSG_principles type in Figure 14. Then the expansion of this type would come up with the constraints head.subcat : categ_list, cat : categ, and comp.cat : categ. The expanded form of the satisfy_HPSG_principles type will now be Ψ0, as shown in Figure 12.

satisfy_HPSG_principles [ HEAD.CAT ≠ HEAD.SUBCAT.FIRST ]

Ψ1 = satisfy_HPSG_principles
[ CAT #1:categ,
  SUBCAT #2:categ_list,
  HEAD #3:phrase_or_word
       [ CAT categ,
         SUBCAT categ_cons [ FIRST #1, REST #2 ] ],
  COMP #4:phrase_or_word
       [ CAT #1, SUBCAT nil ],
  ARGS <#3, #4> ]

Ψ2 = satisfy_HPSG_principles
[ CAT #1:categ,
  SUBCAT #2:categ_list,
  HEAD #4:phrase_or_word
       [ CAT #3:categ,
         SUBCAT categ_cons [ FIRST #1, REST #2 ] ],
  COMP #5:phrase_or_word
       [ CAT #3, SUBCAT nil ],
  ARGS <#4, #5> ]

Ψ3 = satisfy_HPSG_principles
[ CAT #1:categ,
  SUBCAT #2:categ_list,
  HEAD #4:phrase_or_word
       [ CAT #1,
         SUBCAT categ_cons [ FIRST #3:categ, REST #2 ] ],
  COMP #5:phrase_or_word
       [ CAT #3, SUBCAT nil ],
  ARGS <#4, #5> ]

Ψ4 = satisfy_HPSG_principles
[ CAT #1:categ,
  SUBCAT #2:categ_list,
  HEAD #3:phrase_or_word
       [ CAT #1,
         SUBCAT categ_cons [ FIRST categ, REST #2 ] ],
  COMP #4:phrase_or_word
       [ CAT #1, SUBCAT nil ],
  ARGS <#3, #4> ]

Fig. 10. Propagating a dis-equation in generalising a typed-unification grammar.

All sentences which have been parsed with the initial grammar are parsable now too — because the associated parses are entailed by the new grammar, which is less restrictive — but also many ungrammatical input sentences become (incorrectly) acceptable. (The grammar over-generates.) For instance, the phrase “the laughs” is acceptable, since the two words, taken respectively as complement and head for the (under-constrained) rH_phrase, make it satisfiable. How should one proceed to fix this over-parsing issue?

2. The first possibility: try to equate the head.subcat value — whose sort is categ_list — with a substructure inside the type satisfy_HPSG_principles which has a compatible root sort. There are two such substructures: the subcat attribute value, whose sort is also categ_list, and the comp.subcat value, which is nil. Either one of the equational constraints (denoted as head.subcat .= subcat and head.subcat .= comp.subcat, respectively) makes the new (expanded) form of the grammar reject all sample sentences, and therefore is not acceptable.

3. Alternatively, we may try to refine the sort constraint on the head.subcat path value (in Ψ0). Choosing to make it nil — one of the two descendants of categ_list found in the grammar's sort hierarchy — would block any parse; we have to abandon this alternative, and consider refining the head.subcat path value in Ψ0 to categ_cons instead. Let Ψ1 denote this new form of satisfy_HPSG_principles. The new grammar over-generates again.

[Fig. 11 diagram: the specialisation search space rooted at Ψ0 = remove(satisfy_HPSG_principles, π), with π = HEAD.SUBCAT. Equating Ψ0.π with Ψ0.SUBCAT or Ψ0.COMP.SUBCAT fails, as does the sort refinement Ψ0.π : nil; Ψ0.π : categ_cons gives Ψ1; unfold_type(Ψ1) (with Ψ1.π = Ψ1.SUBCAT failing) gives Ψ2; Ψ2.π.FIRST = Ψ2.COMP.CAT gives Ψ3, while Ψ2.π.FIRST = Ψ2.CAT and the sort refinements of Ψ2.π.FIRST to det, noun, adjective, verb fail; Ψ2.π.REST = Ψ2.SUBCAT gives Ψ4, while Ψ2.π.REST = Ψ2.COMP.SUBCAT and the sort refinements of Ψ2.π.REST (nil, categ_cons) fail; finally unify_types(Ψ3, Ψ4) = Ψ5 succeeds.]
Fig. 11. A sample search space in specialising a typed attribute value.

The outcome of parsing is still not significantly improved with respect to point 1 above: all sentences which were grammatical w.r.t. the initial grammar get accepted — since our most recent grammar is still less specific than the original grammar — and most (if not all) ungrammatical sentences tested before are still accepted. Therefore more constraints must be pushed onto Ψ0.head.subcat. In order to do so, we have to unfold it, using the definition of its type value, namely categ_cons. Let Ψ2 denote the newly obtained form of satisfy_HPSG_principles.

4. We study the possibility of equating the value of the path head.subcat.first with compatible substructures in the current form (Ψ2) of satisfy_HPSG_principles. The potential candidates are: cat, head.cat, and comp.cat. The first two possibilities — which are in fact one and the same, due to the equation constraint cat .= head.cat in the type satisfy_HPSG_principles — are discarded when the result of parsing with the new form of the grammar is compared with the parsing results for the original grammar.

Ψ0 = satisfy_HPSG_principles
[ PHON diff_list,
  CAT #1:categ,
  SUBCAT categ_list,
  HEAD #4:phrase_or_word
       [ PHON diff_list, CAT #1:categ, SUBCAT categ_list ],
  COMP #5:phrase_or_word
       [ PHON diff_list, CAT categ, SUBCAT nil ],
  ARGS <#4, #5> ]

Ψ3 = satisfy_HPSG_principles
[ PHON diff_list,
  CAT #1:categ,
  SUBCAT categ_list,
  HEAD #4:phrase_or_word
       [ PHON diff_list,
         CAT #1,
         SUBCAT cons [ FIRST #2:categ, REST categ_list ] ],
  COMP #5:phrase_or_word
       [ PHON diff_list, CAT #2, SUBCAT nil ],
  ARGS <#4, #5> ]

Fig. 12. Specialisation types created during learning the Subcategorization Principle.

The provided sample sentences will support the equation satisfy HPSG principles[head.subcat.first .

=comp.cat].

which make the the current version ofsatisfy HPSG principlesbecomeΨ3as shown in Figure 12. The number of ungrammatical sentences accepted by the current form of the grammar is significantly reduced, but still sentences likethe embarrasses mary are not rejected.

5. We can try to further constrain the value of the path head.subcat.first. But replacing its actual sort, categ, with any one of its descendants would reject some of the positive examples, i.e. grammatical sentences.10

6. Try to equate/coreference head.subcat.rest. The (type-guided) search sub-space is represented by the attribute paths subcat and comp.subcat. The examples support the equation

satisfy_HPSG_principles[head.subcat.rest =. subcat].
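The type-guided candidate generation used here can be sketched as follows: a path q is a candidate for equation with the target path p when their sort constraints are compatible in the hierarchy. The subsort table and all names below are illustrative, not the LIGHT implementation.

```python
# Illustrative subsort table; in the real grammar this comes from the signature.
SUBSUMES = {"categ_list": {"categ_cons", "nil"}}

def compatible(s, t):
    """Two sorts are compatible if equal or one subsumes the other (toy test)."""
    return s == t or t in SUBSUMES.get(s, set()) or s in SUBSUMES.get(t, set())

def equation_candidates(target, typed_paths):
    """Paths whose sort constraint is compatible with the target path's sort."""
    p, sp = target
    return [q for q, sq in typed_paths if q != p and compatible(sp, sq)]
```

On the example above, a path of sort categ (such as cat) is filtered out, while subcat and comp.subcat, both of sort categ_list, survive as candidates.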

7. The new, expanded form of satisfy_HPSG_principles (Ψ4), tested via parsing the given examples, proves fully adequate after combination/unification with the previously obtained specialisation (Ψ3). (In fact it coincides with the form in Figure 1.) At this point the Subcategorization Principle as described in the initial grammar has been learnt!
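The accept/reject criterion implicit in steps 3–7 can be sketched as a simple test: a candidate refinement is kept only if every grammatical sample still parses and no previously rejected ungrammatical sample is re-admitted. This is a hedged sketch with hypothetical names; `parses` stands in for a call to a real parser such as LIGHT.

```python
# A toy acceptance test for a candidate grammar refinement.
def acceptable(parses, positives, negatives, baseline_rejected):
    # Over-specialisation check: every grammatical sample must still parse.
    if not all(parses(s) for s in positives):
        return False
    # Progress check: no previously rejected ungrammatical sample re-admitted.
    rejected = {s for s in negatives if not parses(s)}
    return rejected >= baseline_rejected
```

Under this test, a refinement like Ψ3 survives (it keeps the positives and rejects more negatives), while the nil refinement of step 3, which blocks all parses, is discarded immediately.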

10 As a matter of fact, this alternative (or: path in the specialisation search space) can be pruned by analysing the parses possibly provided as annotations to the grammatical sentences (positive examples): the least upper bound (LUB) of the cat values in the lexical descriptions (which unify with the head value of satisfy_HPSG_principles during parsing) is categ, therefore no sort refinement is acceptable here. See Figure 8 in Section 3.2 for different improvements to sub-type specialisation using positive examples.
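The LUB-based pruning described in this footnote amounts to computing a least upper bound over parent links in the sort hierarchy. A minimal version, over a hypothetical hierarchy fragment (tree-shaped, parent links only):

```python
# A toy LUB computation over a parent-link sort hierarchy. If the LUB of the
# observed sorts equals the current sort, every proper subsort refinement of
# that sort is pruned from the specialisation search space.
PARENT = {"noun": "categ", "verb": "categ", "det": "categ", "categ": "top"}

def ancestors(s):
    """The chain from a sort up to the root, the sort itself included."""
    chain = [s]
    while s in PARENT:
        s = PARENT[s]
        chain.append(s)
    return chain

def lub(sorts):
    """Least upper bound: the most specific common ancestor of the sorts."""
    common = set(ancestors(sorts[0]))
    for s in sorts[1:]:
        common &= set(ancestors(s))
    # The first common ancestor along any chain is the most specific one.
    return next(a for a in ancestors(sorts[0]) if a in common)

def refinement_pruned(observed_sorts, current_sort):
    return lub(observed_sorts) == current_sort
```

On the footnote's scenario, observing cat values noun, verb and det in the lexical descriptions gives LUB categ, so no refinement of categ at that path is attempted.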


6 Scalability issues

We have shown in the preceding sections that ILP learning of attribute path values is both feasible and interesting on a reduced-scale grammar. The important question is how it can be scaled up to large grammars, since in such cases the search space may easily grow beyond acceptable limits. In this section we give two answers to this question, reflecting on our recent and, respectively, current work with LinGO [10], the wide-coverage HPSG grammar for English developed at the Center for the Study of Language and Information (CSLI), Stanford University. The LinGO grammar — 2.5Mb of source code, growing to 40.4Mb in expanded form — has 15059 types, among which 62 are rules and 6897 are lexical entries.

1. The ILP-based generalisation of typed unification grammars presented in Section 3 and exemplified in Section 4 inspired us to elaborate another learning technique, which we called Generalised Reduction (GR) and presented in [6]. A couple of issues differentiate these two techniques:

— GR eliminates as many atomic feature constraints as possible, as long as the parsing results on a training corpus are not affected. The equation constraints and sort constraints associated with "reduced"/eliminated feature values are implicitly discarded.

— this form of unification grammar generalisation through successive elimination of atomic feature constraints is applied only to the rule FSs. Lexical FSs and other type FSs are not affected.

Applying our Generalised Reduction technique to LinGO led to an interesting result: more than 59% of the feature constraints in the rule FSs can be eliminated, thus downsizing the rule FSs and yielding a significant reduction factor for memory consumption (62%) and for the average parsing time (22%). As training corpus for applying the GR learning technique to LinGO, we used the CSLI test-suite, which has 1348 sentences with an average length of 6.38 tokens and an ambiguity of 1.5 parses per sentence. For the test, we used a test-suite made of 96 sentences, with an average length of 8.5 tokens and an ambiguity of 14.14 parses per sentence. After learning the GR-reduced form of LinGO, the average parsing precision we obtained on this second test-suite was 83.79%.
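As a rough sketch (not the actual implementation), the GR loop greedily tries to drop each atomic constraint of the rule FSs, keeping a deletion only when a re-parse of the training corpus yields identical results. The `same_results` predicate below abstracts that corpus re-parse; all names are illustrative.

```python
# Greedy Generalised Reduction sketch: eliminate each atomic feature
# constraint whose removal leaves the training-corpus parses unchanged.
def generalised_reduction(constraints, same_results):
    kept = list(constraints)
    for c in list(constraints):
        trial = [k for k in kept if k != c]
        if same_results(trial):   # corpus parses identically without c
            kept = trial          # constraint c is redundant: eliminate it
    return kept
```

Note the design choice: the loop is greedy and order-dependent, so it finds *a* reduced constraint set rather than a provably minimal one; this is enough for the memory and speed gains reported above.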

2. Coming back to our (general) ILP-based procedures presented in Section 3, we tested the ilpLIGHT prototype extension to the LIGHT system on a number of cases (similar to those presented in Sections 4 and 5) on the LinGO grammar. We saw that the potentially very large search space to be explored (especially) during generalisation can be effectively controlled by parameterising that procedure with a set P of types to which the search is limited.11 For instance, when limiting the search space for the example presented in Section 4 to the type satisfy_HPSG_principles,

11For the implementation, that means that when running the parser in ‘reverse’ mode, its output will be limited to failure paths insideP.


the tree in Figure 9 is pruned to its right-branch subtree. Similarly, for specialisation we can opt to limit the newly proposed feature paths (through unfolding) to a certain depth.

We make the important remark that restricting the application of the generalisation procedure through the above proposed parameter (a set of type FSs) is not only justified by efficiency reasons; it corresponds to the naturally modularised development of unification grammars, supported directly by their hierarchical organisation.

We are currently doing engineering work aiming to replace the ilpLIGHT prototype with a much faster version, suitable for more extensive tests. We expect that once this new version becomes stable, parallelising the tasks of checking the validity of the hypotheses proposed by the ILP learner will be possible without much difficulty, by using multiple instances of the LIGHT parser distributed over a cluster of PCs.

Conclusion: We presented a framework for logic-based inductive learning of attribute path values in typed-unification grammars. The implemented prototype — ilpLIGHT — extends the LIGHT parsing system with an ILP/learner module whose core is made of a pair of procedures for generating/inferring hypotheses. Driven by simply annotated sample sentences, the creation of these hypotheses either generalises (one or more types automatically identified in) the input grammar or specialises a designated type FS definition. We tested both generalisation and specialisation, on a simple but relevant HPSG grammar as well as on a wide-coverage grammar for English.

We expect that our ILP-based learning technique will first serve as a debugging tool assisting users in the elaboration of new (extensions to) typed-unification grammars.12 Moreover, in combination with shallow parsing techniques and lexical semantics resources like WordNet [13], our approach may also serve to perform lexical inferences. We will explore this idea while extending the LIGHT system (and the LinGO grammar) so as to parse abstracts of bio-medical reports.13

Acknowledgements: This work was primarily supported by the EPSRC ROPA grant "Machine Learning for Natural Language Processing in a Computational Logic Framework". The LIGHT system was developed at the Language Technology Lab of the German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany. Many thanks go to Ulrich Callmeier, who implemented the first versions of the tracer and derivation functions for the LIGHT system. I am grateful also to Steven Muggleton, Dimitar Kazakov, Suresh Manandhar, James Cussens and Ross King who, in different ways, made possible my work on this subject.

12See [3] for a recent proposal of methodological, concerted development of such grammars.

13 Note that existing approaches to such a task [5] [11] do not tackle the issue of building specific lexical descriptions for words not found in the grammar's lexicon. They simply associate with such words only a word-independent/categorial characterisation.


Appendix: A sample HPSG grammar

We give here the grammar that we adapted from [19] to exemplify our learning strategy for typed-unification grammars, as presented in Section 3.

This grammar is made of two parts: the sort/type hierarchy and the type definitions. For conciseness, the sort hierarchy is given in Figure 13, while the types are presented in Figure 14 using the LIGHT syntax:

Strings are surrounded by quotes. The sorts start, list and the list subsorts cons and nil are special (reserved LIGHT) types, and so is the difference-list sort, diff_list. The notation <! !> in the grammar's code is syntactic sugar for difference lists, just as < > is used for lists. Also, ! is a constructor for difference lists, playing a role similar to that played by the constructor | for non-nil (i.e. cons) lists.

Just for convenience, in Figure 13 the sorts situated under a dashed line are derived in the same way (i.e. have the same parents) as the sort found immediately above that line. For instance, the adjective pretty is derived from adjective_le, similarly to the adjective nice. The grammar has two rules, lH_phrase and rH_phrase (see section program), and 15 lexical entries (see section query).
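One way to see the difference-list PHON constraints in the rules of this grammar (e.g. PHON #1!#3 with HEAD.PHON #1!#2 and COMP.PHON #2!#3): a difference list is an "open" list whose tail is shared by coreference, so concatenation needs no traversal. A small functional simulation in Python (not LIGHT code; the function names are ours):

```python
# Model <! xs !> as a function prepending xs to a still-unknown tail.
# Concatenating A!B with B!C then amounts to composing the two functions,
# exactly the coreference chain #1!#2 + #2!#3 = #1!#3 used in the rules.
def dl(*items):
    return lambda tail: list(items) + tail

def dl_concat(a, b):
    # a's back is unified with b's front, simply by feeding b's result to a.
    return lambda tail: a(b(tail))

def dl_close(d):
    return d([])   # instantiate the still-open tail with the empty list
```

This mirrors how the lH_phrase rule builds a phrase's PHON value from the head's and the complement's PHON values by sharing, rather than by copying, the underlying lists.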

[Figure 13, a tree diagram of the sort hierarchy rooted at top, cannot be reproduced here. It comprises, among others: list with its subsorts cons and nil; string; diff_list; categ with the subsorts det, noun, verb and adjective; categ_list with the subsorts categ_cons and nil; the phrasal sorts phrase_or_word, phrase, word, satisfy_HPSG_principles, lh_phrase and rh_phrase; the lexical-entry sorts det_le, noun_le, pnoun_le, cnoun_le, adjective_le, verb_le, iverb_le and tverb_le; and the lexeme sorts the, girl, john, mary, nice, pretty, embarrassed, met, kissed, is, laughs, kisses, thinks, meets and embarrasses.]

Fig. 13. The sort signature for the grammar in Figure 14.


types:

cons
  [ FIRST top, REST list ]
diff_list
  [ FIRST_LIST list, REST_LIST list ]
categ_cons
  [ FIRST categ, REST categ_list ]
phrase_or_word
  [ PHON diff_list,
    CAT categ,
    SUBCAT categ_list ]
phrase
  [ HEAD #1:phrase_or_word,
    COMP #2:phrase_or_word,
    ARGS <#1, #2> ]
satisfy_HPSG_principles
  [ CAT #1,
    SUBCAT #2,
    HEAD top [ CAT #1, SUBCAT #3|#2 ],
    COMP top [ CAT #3, SUBCAT nil ] ]
det_le
  [ CAT det, SUBCAT nil ]
noun_le
  [ CAT noun ]
pnoun_le
  [ SUBCAT nil ]
cnoun_le
  [ SUBCAT <det> ]
adjective_le
  [ CAT adjective, SUBCAT nil ]
iverb_le
  [ CAT verb, SUBCAT <noun> ]
tverb_le
  [ CAT verb, SUBCAT <noun, noun> ]

program: // rules
lH_phrase
  [ SUBCAT cons,
    PHON #1!#3,
    HEAD.PHON #1!#2,
    COMP.PHON #2!#3 ]
rH_phrase
  [ SUBCAT nil,
    PHON #1!#3,
    HEAD.PHON #2!#3,
    COMP.PHON #1!#2 ]

query: // lexical entries
the [ PHON <!"the"!> ]
girl [ PHON <!"girl"!> ]
john [ PHON <!"john"!> ]
mary [ PHON <!"mary"!> ]
nice [ PHON <!"nice"!> ]
embarrassed [ PHON <!"embarrassed"!> ]
pretty [ PHON <!"pretty"!> ]
met [ PHON <!"met"!> ]
kissed [ PHON <!"kissed"!> ]
is [ PHON <!"is"!>, CAT verb, SUBCAT <adjective, noun> ]
laughs [ PHON <!"laughs"!> ]
kisses [ PHON <!"kisses"!> ]
thinks [ PHON <!"thinks"!>, CAT verb, SUBCAT <verb, noun> ]
meets [ PHON <!"meets"!> ]
embarrasses [ PHON <!"embarrasses"!> ]

Fig. 14. A sample typed-unification grammar (adapted from [17]), in LIGHT format.


References

1. H. Aït-Kaci and R. Nasr. Login: A logic programming language with built-in inheritance. Journal of Logic Programming, 3:185–215, 1986.

2. H. Aït-Kaci and A. Podelski. Towards a meaning of LIFE. Journal of Logic Programming, 16:195–234, 1993.

3. E. Bender, D. Flickinger, and S. Oepen. The Grammar Matrix: An open-source starter-kit for the rapid development of cross-linguistically consistent broad-coverage precision grammars. In Proceedings of the COLING 2002 Workshop on Grammar Engineering and Evaluation, Taipei, Taiwan, 2002.

4. B. Carpenter. The Logic of Typed Feature Structures – with applications to unification grammars, logic programs and constraint resolution. Cambridge University Press, 1992.

5. J. Carroll and E. Briscoe. High precision extraction of grammatical relations. In Proceedings of the 7th ACL/SIGPARSE International Workshop on Parsing Technologies, pages 78–89, Beijing, China, 2001.

6. L. Ciortuz. Learning attribute values in typed-unification grammars: On generalised rule reduction. In D. Roth and A. van den Bosch, editors, Proceedings of the 6th Conference on Natural Language Learning (CoNLL-2002), pages 70–76, Taipei, Taiwan, 2002. Morgan Kaufmann Publishers and ACL.

7. L. Ciortuz. LIGHT — a constraint language and compiler system for typed-unification grammars. In KI-2002: Advances in Artificial Intelligence, volume 2479, pages 3–17, Berlin, Germany, 2002. Springer-Verlag. Proceedings of the 25th Annual German Conference on AI, Aachen, Germany.

8. L. Ciortuz. Towards inductive learning of typed-unification grammars. In Electronic Proceedings of the 17th Workshop on Logic Programming, 2002. Dresden Technical University. (http://www.computational-logic.org/local/wlp2002/WLP schedule.html).

9. J. Cussens and S. Pulman. Experiments in inductive chart parsing. In Learning Language in Logic, volume 1925 of LNAI. Springer, 2000.

10. Daniel P. Flickinger, Ann Copestake, and Ivan A. Sag. HPSG analysis of English. In Wolfgang Wahlster, editor, Verbmobil: Foundations of Speech-to-Speech Translation, Artificial Intelligence, pages 254–263. Springer-Verlag, 2000.

11. C. Grover, M. Lapata, and A. Lascarides. A comparison of parsing technology for the biomedical domain. 2002. Submitted for publication to The Journal of Natural Language Engineering. (http://www.ltg.ed.ac.uk/disp/papers.html).

12. R. M. Kaplan and J. Bresnan. Lexical-functional grammar: A formal system for grammatical representation. In J. Bresnan, editor, The Mental Representation of Grammatical Relations, pages 173–381. MIT Press, 1983.

13. G. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. Introduction to WordNet: an on-line lexical database. International Journal of Lexicography, 3(4):235–244, 1990.

14. S. Muggleton and L. De Raedt. Inductive logic programming: Theory and methods. Journal of Logic Programming, 19–20:629–679, 1994.

15. F.C.N. Pereira and D. Warren. Definite Clause Grammars for Language Analysis – a survey of the formalism and a comparison with Augmented Transition Networks. Artificial Intelligence, 13:231–278, 1980.


16. C. Pollard and I. Sag. Head-driven Phrase Structure Grammar. CSLI Publications, Stanford, 1994.

17. S. Shieber. An Introduction to Unification-Based Approaches to Grammar. CSLI Publications, Stanford, 1986.

18. S. M. Shieber, H. Uszkoreit, F. C. Pereira, J. Robinson, and M. Tyson. The formalism and implementation of PATR-II. In J. Bresnan, editor, Research on Interactive Acquisition and Use of Knowledge. SRI International, Menlo Park, Calif., 1983.

19. G. Smolka and R. Treinen, editors. The DFKI Oz documentation series. German Research Center for Artificial Intelligence (DFKI), Stuhlsatzenhausweg 3, Saarbrücken, Germany, 1996.

20. H. Uszkoreit. Categorial Unification Grammar. In International Conference on Computational Linguistics (COLING'92), pages 498–504, Nancy, France, 1986.

21. J. Zelle and R. Mooney. Learning semantic grammars with constructive inductive logic programming. In Proceedings of the 11th National Conference on Artificial Intelligence (AAAI-93), pages 817–822. AAAI Press/MIT Press, 1993.
