
Linguistic complexity: locality of syntactic dependencies

Edward Gibson*

Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

*E-mail: gibson@9b79fbc26137ee06eff9187f

Received 18 September 1997; accepted 24 April 1998

Abstract

This paper proposes a new theory of the relationship between the sentence processing mechanism and the available computational resources. This theory – the Syntactic Prediction Locality Theory (SPLT) – has two components: an integration cost component and a component for the memory cost associated with keeping track of obligatory syntactic requirements. Memory cost is hypothesized to be quantified in terms of the number of syntactic categories that are necessary to complete the current input string as a grammatical sentence. Furthermore, in accordance with results from the working memory literature, both memory cost and integration cost are hypothesized to be heavily influenced by locality: (1) the longer a predicted category must be kept in memory before the prediction is satisfied, the greater is the cost for maintaining that prediction; and (2) the greater the distance between an incoming word and the most local head or dependent to which it attaches, the greater the integration cost. The SPLT is shown to explain a wide range of processing complexity phenomena not previously accounted for under a single theory, including (1) the lower complexity of subject-extracted relative clauses compared to object-extracted relative clauses, (2) numerous processing overload effects across languages, including the unacceptability of multiply center-embedded structures, (3) the lower complexity of cross-serial dependencies relative to center-embedded dependencies,

(4) heaviness effects, such that sentences are easier to understand when larger phrases are placed later and (5) numerous ambiguity effects, such as those which have been argued to be evidence for the Active Filler Hypothesis. © 1998 Elsevier Science B.V. All rights reserved

Keywords: Linguistic complexity; Syntactic dependency; Sentence processing; Computational resources

1. Introduction

The process of comprehending a sentence involves structuring a sequence of words syntactically and semantically to arrive at a representation of the sentence's meaning. These processes consume computational resources. Computational resources in the form of memory resources are also required to maintain the current unintegrated syntactic and conceptual structures activated in memory during the processing of a sentence. Thus an important part of understanding how humans process language involves understanding the relationship between the sentence processing mechanism and the available computational resources.

One well-established complexity phenomenon to be explained by a theory of the relationship between the sentence processing mechanism and the available computational resources is the higher complexity of an object-extracted relative clause (RC) compared with a subject-extracted RC in a Subject-Verb-Object language like English:1

(1) (a) [S The reporter [S′ who [S the senator attacked]] admitted the error].

(b) [S The reporter [S′ who [S attacked the senator]] admitted the error].

In (1b), the relative pronoun 'who' is extracted from the subject position of the RC, whereas the same pronoun is extracted from the object position in (1a). The object extraction is more complex by a number of measures including phoneme-monitoring, on-line lexical decision, reading times, and response-accuracy to probe questions (Holmes, 1973; Hakes et al., 1976; Wanner and Maratsos, 1978; Holmes and O'Regan, 1981; Ford, 1983; Waters et al., 1987; King and Just, 1991). In addition, the volume of blood flow in the brain is greater in language areas for object-extractions than for subject-extractions (Just et al., 1996a,b; Stromswold et al., 1996), and aphasic stroke patients cannot reliably answer comprehension questions about object-extracted RCs, although they perform well on subject-extracted RCs (Caramazza and Zurif, 1976; Caplan and Futter, 1986; Grodzinsky, 1989; Hickok et al., 1993).

The source of the complexity difference is not related to lexical or plausibility differences because both structures involve the same lexical items in equally plausible relationships among one another. Furthermore, the complexity difference is not caused by a re-analysis difference due to a local ambiguity (a 'garden-path' effect). Although there is potentially a brief local ambiguity at the word 'who', there is no reanalysis effect at the disambiguating subject NP in the object-extraction (or at the disambiguating verb in the subject-extraction), compared with unambiguous control sentences (e.g. Stowe, 1986). Thus, theories of ambiguity resolution (e.g. Frazier, 1987a; MacDonald et al., 1994; Mitchell, 1994; Trueswell et al., 1994) make no predictions in these kinds of examples, nor more generally in any construction which does not involve ambiguity. The only remaining plausible cause of the complexity difference is in the quantity of computational resources that the two constructions require to process.

1 For simplicity, I will assume a theoretically neutral phrase structure grammar, one which includes expansions of the form S → NP VP and S′ → Comp S. No complexity results hinge on this assumption. In particular, the processing theory to be described here is compatible with a wide range of phrase structure theories, including lexical functional grammar (Bresnan, 1982), dependency grammars (e.g. Hudson, 1984, 1990), categorial grammars (e.g. Ades and Steedman, 1982), Head-Driven Phrase Structure Grammar (Pollard and Sag, 1994), and the Minimalist program (Chomsky, 1995).

A more general class of complexity effects to be accounted for by a theory of computational resources is the high complexity associated with nested (or center-embedded) structures, where a syntactic category A is said to be nested within another category B in the configuration in (2):

(2) [B X A Y]

Increasing the number of nestings soon makes sentence structures unprocessable (Chomsky, 1957, 1965; Yngve, 1960; Chomsky and Miller, 1963; Miller and Chomsky, 1963; Miller and Isard, 1964). For example, consider the contrast between (3a), which contains a singly-nested relative clause (RC) structure and (3b), which contains a doubly-nested RC structure:2

(3) (a) [S The intern [S′ who [S the nurse supervised]] had bothered the administrator [S′ who [S lost the medical reports]]].

(b) #The administrator [S′ who [S the intern [S′ who [S the nurse supervised]] had bothered]] lost the medical reports.

In (3a), the RC 'who the nurse supervised' is nested within the matrix sentence subject-verb dependency 'the intern...had bothered'. In (3b) a second RC ('who the nurse supervised') interrupts the subject-verb dependency in the first embedded sentence ('the intern had bothered'), resulting in a structure that is so complex that it is unprocessable for most people. Note that the two sentences contain the same words and have the same meaning, so the complexity difference is not due to plausibility differences. Furthermore, there is no local ambiguity in (3b), so the processing difficulty associated with this sentence is not related to ambiguity confusions. This type of sentence processing breakdown effect is often referred to as a processing overload effect.3

Multiply nested structures are similarly complex cross-structurally and cross-linguistically. For example, the English clausal modifier structures are increasingly nested in (4) and are correspondingly increasingly difficult to understand. Similarly, the Japanese sentential complement structures in (5) have the same meaning, but (5b) is more nested than (5a), and is therefore harder to understand.

2 Sentences that cause extreme processing difficulty are prefixed with the symbol '#'.

3 This effect is distinct from processing breakdown caused by a preference in local ambiguity (a 'garden-path' effect), such as in (i):

(i) #The dog walked to the park was chasing the squirrel.

Although garden-path sentences like (i) are difficult to process, once the reader/listener realizes the correct interpretation, he/she can obtain this interpretation and process the structure without difficulty. In contrast, even when the reader/listener understands what the appropriate interpretation for a sentence structure causing a processing overload effect is, it is still not possible to arrive at this interpretation using the normal sentence processing mechanism.

(4) (a) [S [S′ If [S the mother gets upset]] [S′ when [S the baby is crying]], [S the father will help], [S′ so [S the grandmother can rest easily]]].

(b) [S [S′ If [S [S′ when [S the baby is crying]], [S the mother gets upset]]], [S the father will help], [S′ so [S the grandmother can rest easily]]].

(c) #[S [S′ Because [S [S′ if [S [S′ when [S the baby is crying]], [S the mother gets upset]]], [S the father will help]]], [S the grandmother can rest easily]].

(5) (a) [S [S′ [S Bebiisitaa-ga [S′ [S ani-ga imooto-o ijimeta] to] itta] to] obasan-ga omotteiru]

babysitter-nom older-brother-nom younger-sister-acc bullied that said that aunt thinks

'My aunt thinks that the babysitter said that my older brother bullied my younger sister'

(b) #[S Obasan-ga [S′ [S bebiisitaa-ga [S′ [S ani-ga imooto-o ijimeta] to] itta] to] omotteiru]

aunt-nom babysitter-nom older-brother-nom younger-sister-acc bullied that said that thinks

'My aunt thinks that the babysitter said that my older brother bullied my younger sister'

Although the nesting complexity effects and the subject- versus object-extraction complexity effects are well-established phenomena in the literature, there is currently little agreement as to what properties of the complex constructions make them hard to understand. Some of the proposals in the literature include the following (see Gibson (1991) for a comprehensive review):4

4 One factor which can contribute to the processing complexity of nested structures but is orthogonal to the factors to be investigated here is semantic similarity. Center-embedded RC structures like (3b) are easier to comprehend if the NPs come from distinct semantic classes and if the roles assigned by the following verbs also are compatible with distinct semantic classes, so that it is easy to guess who is doing what to whom (Stolz, 1967; Schlesinger, 1968; King and Just, 1991). For example, (ii) is easier to comprehend than (3b) (Stolz, 1967):

(ii) #The vase that the maid that the agency hired dropped on the floor broke into a hundred pieces.

Although semantic-role disambiguation improves the acceptability of these kinds of structures, a complexity theory based on semantic role interference alone is insufficient to explain many complexity effects. For example, although (ii) is easier to comprehend than (3b), it is still very complex, and this complexity needs to be accounted for. Furthermore, including an additional pragmatically-distinguishable nested RC makes the structure virtually unprocessable, similar or more complex than (3b):

(iii) #The vase that the maid that the agency that the lawyer represented hired dropped on the floor broke into a hundred pieces.

Hence factors other than semantic similarity or interference are responsible for the complexity of nested structures like (iii).

• Stacking incompletely parsed phrase-structure rules (Yngve, 1960; Chomsky and Miller, 1963; Miller and Chomsky, 1963; Miller and Isard, 1964; Abney and Johnson, 1991). This theory hypothesizes that complexity is indexed by the number of phrase structure rules that the parser has to hold in memory on a stack at a particular parse state. Assuming that the human parsing algorithm is partially top-down and partially bottom-up, using, e.g. a left-corner algorithm (Kimball, 1975; Johnson-Laird, 1983; see Abney and Johnson, 1991; Gibson, 1991; Stabler, 1994; for recent summaries), this theory accounts for the contrast between (3a) and (3b) as follows. The maximal stack depth in parsing (3a) is three, at the point of processing the determiner 'the' following 'who' in the first relative clause. There are three incompletely parsed phrase structure rules at this point: (1) the matrix S → NP VP rule, (2) the relative clause rule S′ → Comp S and (3) the NP rule NP → Det N. In contrast, the maximal stack depth in parsing (3b) is five, at the point of processing the determiner 'the' of the subject NP in the most embedded relative clause.

• Self-embedding (Miller and Chomsky, 1963; Miller and Isard, 1964; Gibson and Thomas, 1996). A category A is said to be self-embedded within another category B if it is nested as in the configuration in (2) and A and B are the same type of syntactic category, e.g. a sentence (S) node. The hypothesis is that the parser has additional parsing difficulty with self-embedded structures. The motivation for this claim is that a stack-based parser might confuse two stacked instances of the same category, and not know which one to go back to, creating additional difficulty. According to this theory, part of the difficulty associated with understanding each of (3b), (4c) and (5b) is due to the fact that the most embedded clause is self-embedded within the second clause, which is itself self-embedded within the matrix clause.

• Perspective shifts (MacWhinney, 1977, 1982; MacWhinney and Pleh, 1988; cf. Bever's (1970) double function hypothesis). According to the perspective-shift theory, processing resources are required to shift the perspective of a clause, where the perspective of a clause is taken from the subject of the clause. Under this theory, the reason that an object-extracted RC is harder to process than a subject-extracted RC is that processing an object-extracted RC requires two perspective shifts: (1) from the perspective of the matrix subject to the subject of the RC and (2) from the perspective of the subject of the RC back to the matrix subject, after the RC is processed. Processing the subject-extracted RC requires no perspective shifts, because the matrix subject is also the subject of the RC, so that both clauses come from the same perspective. Processing a doubly nested RC structure like (3b) requires multiple perspective shifts, and so is correspondingly more difficult.

• Incomplete syntactic/thematic dependencies (Kimball, 1973; Hakuta, 1981; MacWhinney, 1987; Gibson, 1991; Pickering and Barry, 1991; Lewis, 1993; Stabler, 1994). An extension of the stack-based idea which can account for the subject- versus object-extraction contrast associates memory cost with incomplete syntactic dependencies rather than phrase structure rules. This theory allows an explanation of the contrast between subject- and object-extractions in (1), because object-extracted RCs involve one more nested dependency than do subject-extracted RCs: the embedding of the subject 'the senator' between the relative pronoun and its role-assigning verb 'attacked'.

A slightly narrower version of this idea was proposed by Gibson (1991), who, building on similar ideas from Hakuta (1981) and MacWhinney (1987), hypothesized that there is a memory cost associated with each incomplete dependency involving a thematic-role assignment. Thus, an NP that requires a thematic role but has not yet received a role is associated with memory cost, as is a thematic role assigner which requires an argument but has not found one yet. Processing a subject-extraction like (1b) is associated with a maximal memory cost of two local thematic violations, at the point of processing the relative pronoun 'who': one for the matrix subject 'the reporter' and one for the relative pronoun 'who'. In contrast, the maximal memory cost of processing (1a) is three local thematic violations: one for each of 'the reporter', 'who' and 'the senator'.

Observations about the unacceptability of multiply nested structures like (3b) led Gibson to hypothesize that the maximal memory capacity is four local thematic violations, so that sentences with parse states requiring five local thematic violations are unprocessable. In particular, (3b) is unprocessable because its processing involves a state with five local thematic violations. At the point of processing the most embedded subject 'the nurse' there are five thematic violations: one corresponding to each of the initial NPs: 'the administrator', 'who', 'the intern', 'who' and 'the nurse'.

A related account of the difficulty of processing doubly nested structures like (3b) is Kimball's (1973) principle of two sentences, which states that the constituents of no more than two sentences (clauses) can be parsed at one time (cf. Cowper, 1976; Lewis, 1993; Stabler, 1994). According to this theory, the parser can maintain the predictions of one or two sentence nodes at one time, but there are not enough memory resources for the parser to predict three sentence nodes at the same time. This theory therefore correctly accounts for the unacceptability of sentence structures like (3b), because it requires three incomplete sentence nodes at its most complex point, at the point of processing the most embedded subject NP.

• The CC-READER (Capacity Constrained Reader) model (Just and Carpenter, 1992). This is an activation-based production-rule model of reading times for English singly-embedded relative clause extractions. This model assumes that there is a limited pool of working memory resources available for both integration operations and storage. This assumption allows the model to account for experimental observations involving different groups of participants with different quantities of working memory capacity. This model is not linguistically formalized well enough yet to see how it applies more generally, beyond the relative clause constructions it was designed to account for.

• Connectionist models. Another class of models of linguistic complexity are connectionist models (e.g. Kempen and Vosse, 1989; Elman, 1990, 1991; Weckerly and Elman, 1992; Miikkulainen, 1996; Christiansen, 1996, 1997; Christiansen and Chater, 1998). The goal for these models is to have the complexity phenomena fall out from the architecture of the processor. This kind of model, with a basis in neural architecture, may eventually provide an architectural explanation of the approach proposed here. However, because these types of models are still quite novel, they have not yet been applied to a wide range of phenomena across languages.

Although there have been many theories of the computational resources available for sentence processing, each of the theories accounts for only a limited range of the data currently available. Each of the theories is incompatible with a wide range of evidence. Furthermore, with the exception of the CC-READER model, the theories currently available do not account for comprehension times.

One complexity contrast which is particularly interesting, because no current theory accounts for it, is the contrast between embedding a relative clause within a sentential complement (SC) and the reverse embedding consisting of an SC within an RC (Cowper, 1976; Gibson, 1991; Gibson and Thomas, 1997a):5

(6) (a) Sentential complement, then relative clause (SC/RC):

The fact that the employee who the manager hired stole office supplies worried the executive.

(b) Relative clause, then sentential complement (RC/SC):

#The executive who the fact that the employee stole office supplies worried hired the manager.

The SC/RC embedding is much easier to understand than the RC/SC embedding (Gibson and Thomas, 1997a). This contrast is not explained by any of the theories listed above. For example, the principle of two sentences predicts that both constructions are unprocessable, because each includes a parse state at which there are three incomplete sentences: at the point of processing the most embedded subject ('the manager' in (6a), 'the employee' in (6b)).

5 The difference between a relative clause and a sentential complement is that one of the NP positions inside a relative clause is empty, reflecting the presence of the wh-pronoun at the front of the clause. In contrast, there is no such empty argument position in a sentential complement. For example, the initial sequence of words 'that the reporter discovered' in (iv) and (v) is ambiguous between a relative clause modifier and a sentential complement of 'information':

(iv) [NP The information [S′ that [S the reporter discovered the tax documents]] worried the senator.

(v) [NP The information [S′ that [S the reporter discovered]] worried the senator.

The presence of the overt NP as the object of 'discovered' in (iv) disambiguates this clause as a sentential complement. The lack of an overt object of 'discovered' in (v) disambiguates this clause as a relative clause.

This paper provides a new theory of the relationship between the language comprehension mechanism and the available computational resources: the Syntactic Prediction Locality Theory (SPLT). This theory contains two major components: (1) a memory cost component which dictates what quantity of computational resources are required to store a partial input sentence and (2) an integration cost component which dictates what quantity of computational resources need to be spent on integrating new words into the structures built thus far. The important idea in both of these components of the theory is locality: syntactic predictions held in memory over longer distances are more expensive (hence the name Syntactic Prediction Locality Theory), and longer distance head-dependent integrations are more expensive. The SPLT is notable for its simplicity, its quantitative predictions, and its ability to account for a range of data unexplained by other current theories, including the contrast between embedding orderings of sentential complements and relative clauses, as well as numerous other data from a number of languages.

The paper is organized as follows. Section 2 proposes the SPLT and provides motivations and initial evidence for each of its components. Section 3 demonstrates how a variety of processing overload effects from the literature, such as the relative clause/sentential complement embedding effects, are accounted for by the SPLT. Section 4 shows how the SPLT accounts for a range of heaviness or length effects in sentence processing. Section 5 shows how the SPLT can be applied to account for a range of ambiguity effects from the literature. A variant of the proposed memory cost theory is then presented in Section 6, and it is shown how this theory makes similar predictions to the SPLT. Concluding remarks, including some possible consequences of a theory like the SPLT in other aspects of the study of language, are presented in Section 7.

2. The Syntactic Prediction Locality Theory

This section has the following organization. First, some assumptions regarding the structure of the underlying sentence comprehension mechanism are presented in Section 2.1. The integration component of the SPLT is then presented in Section 2.2, followed by the memory component of the theory in Section 2.3. The relationship between the two components of the theory is then discussed in Section 2.4, leading to a theory of comprehension times. Section 2.5 presents a theory of intuitive complexity judgments within the SPLT framework. Evidence for various aspects of these proposals is provided in Section 2.6.

2.1. The underlying sentence comprehension mechanism

Recent results have suggested that constructing an interpretation for a sentence involves the moment-by-moment integration of a variety of different information sources, constrained by the available computational resources. The information sources include lexical constraints (Ford et al., 1982; MacDonald et al., 1994; Trueswell et al., 1994; Trueswell, 1996), plausibility constraints (Tyler and Marslen-Wilson, 1977; McClelland et al., 1989; Pearlmutter and MacDonald, 1992; Trueswell et al., 1994) and discourse context constraints (Crain and Steedman, 1985; Altmann and Steedman, 1988; Ni et al., 1996). Following Just and Carpenter (1992), MacDonald et al. (1994), Stevenson (1994) and Spivey-Knowlton and Tanenhaus (1996) among others, I will assume an activation-based approach to implement the interaction of these constraints. In particular, I will assume that each discourse structure for an input string is associated with an activation (e.g. a number between 0 and 1) which indicates how highly rated the representation is according to the combination of the constraints. Furthermore, I will assume that there is a target activation threshold T1 such that the processor works to activate a discourse representation above this threshold. In reader-paced comprehension, I will assume that the processor does not move on to a new region of analysis until at least one representation for the input so far reaches this threshold.

I will also assume that there is a limited pool of computational resource units available to activate representations, so that if there are more resources available, then the activations occur more quickly. By the same token, the more resources that a particular component of a structural activation requires, the slower the activation will be. Furthermore, I assume that it takes different quantities of energy (or work) from this resource pool to perform different aspects of representation activation, reflecting the different constraints on sentence comprehension. For example, high frequency lexical items require fewer energy resources to become activated than low frequency lexical items do. As a result, high frequency lexical items can be activated by the resource pool more quickly. Relatedly, structures representing plausible meanings require fewer energy resources to become activated than structures representing implausible meanings. The two primary aspects of resource use to be discussed in this paper are structural integration and structural maintenance (or storage). Details of the hypotheses regarding these latter two aspects of resource use will be discussed in depth in the following sections.

In this framework, the structural integrations considered by the processor are limited by the syntactic constraints associated with each lexical item. Syntactic constraints are assumed to be implemented as lexically-based predictions about the categories that can follow the current lexical item L, which are in a head-dependent relationship with L (Gibson, 1991; cf. MacDonald et al., 1994). Some of these syntactic predictions are obligatory predictions (e.g. the prediction of a noun following a determiner) while others are optional (e.g. optional arguments of a verb, and all modifier relationships). Structure building within this framework consists of looking up the current word in the lexicon and then matching the categories of these lexical entries to the predictions in the structures built thus far.

If all the constraints favor one structure for the input, then that structure will quickly receive a high activation. If there are two competing analyses for the input which are similar in heuristic value, then it will take longer to bring one of the representations over the threshold T1, because of the limited pool of resources available (MacDonald et al., 1994; Trueswell et al., 1994). If one or more constraints disfavor a representation (e.g. it is implausible, or it contains syntactic predictions that require a lot of resources to keep activated) then many energy resources will be needed to activate it above the threshold activation T1. If there are no better competing structures available, then this structure will be followed, but it will be processed slowly because of the large quantity of resources that it needs to activate it. Furthermore, because of the limited size of the resource pool, a representation might require more resources to activate it over the threshold activation T1 than are available in the pool.

Another important parsing issue concerns how many representations the processor can retain at one time during the parse of a sentence: one (the serial hypothesis) or more than one (the parallel hypothesis). Although this issue is orthogonal to the questions of processing complexity in unambiguous structures to be discussed in this section and the following two sections of this paper, the issue becomes more important when the resource complexity influences on ambiguity resolution are considered in Section 5. It is clear that the processor does not always follow all possible interpretations of a locally ambiguous input string, because of the existence of garden-path effects, as in (7) (see Gibson (1991) for a catalogue of ambiguities in English leading to processing difficulty):

(7) #The dog walked to the park was chasing the squirrel.

However, the existence of ambiguities which are difficult to resolve toward one of their possible resolutions does not decide between a serial processor and a ranked parallel processor. Following Gibson (1991) (cf. Kurtzman, 1985; Gorrell, 1987; Jurafsky, 1996), I will assume that the processor is ranked parallel, such that a lower-rated representation can be retained in parallel with the most highly rated representation from one parse state to the next as long as the heuristic value of the lower-ranked representation, as indicated by its activation, is close to that of the highest ranked interpretation at that point. In order to implement this assumption within the activation-based framework, I will assume the existence of a second threshold quantity of activation, T2, where T2 < T1, such that a representation is retained from one parse state to the next as long as its activation is greater than or equal to T2. Hence, if by the time that the activation on the most highly rated representation R1 has reached the target threshold activation T1 the activation on a second representation R2 is greater than T2, then R2 is retained along with R1 in the active representation set, the set of representations that the parser continues to work on. If the activation on another representation R3 does not reach T2 by the time that R1's activation has reached T1, then R3 is not retained in the active representation set.

The set of representations for the input considered by the processor is therefore divided into two sets: one – the active representation set – in which the representations are being considered as integration sites for incoming words; and a second – the inactive representation set – in which the representations are no longer being considered as integration sites for incoming words. Resources are spent to keep the representations in the active set activated, whereas no resources are spent to keep the representations in the inactive set activated, so that the activation of the inactive representations decays over time. Although the inactive representations are not being worked on, they remain in the representation space with a low activation, so that they may be reactivated later in a reanalysis stage if an incoming word cannot be integrated with any of the representations in the active representation set (see Gibson et al. (1998a) for discussion of how reanalysis takes place in this framework). Some important issues that are yet to be worked out in this general framework include: (1) how the different sources of information interact, (2) whether there is a hierarchy among the constraints such that some information is available earlier, (3) what the resource costs and activation levels are for the different constraints, (4) where the resource costs and activation levels originate and (5) what the thresholds T1 and T2 are. The answers to these questions are beyond the scope of the work presented here.
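To make the two-threshold retention rule concrete, the sketch below implements the partition into active and inactive representation sets just described. The numeric values of T1, T2 and the candidate activations are hypothetical illustrations (the text only commits to T2 being lower than T1); this is a sketch of the bookkeeping, not an implementation of the SPLT.

```python
# Sketch of the two-threshold retention rule (hypothetical threshold values).
T1 = 0.8  # target activation: processing of a region ends when some
          # representation reaches this level (assumed value)
T2 = 0.5  # retention threshold for the active representation set (assumed value)

def partition(activations):
    """Split candidate representations into the active set (still considered
    as integration sites) and the inactive set (decaying, reanalysis only)."""
    if max(activations.values()) < T1:
        raise ValueError("no representation has reached the target threshold yet")
    active = {name: a for name, a in activations.items() if a >= T2}
    inactive = {name: a for name, a in activations.items() if a < T2}
    return active, inactive

# Hypothetical activations at the moment R1 reaches T1:
active, inactive = partition({"R1": 0.82, "R2": 0.55, "R3": 0.30})
print(active)    # {'R1': 0.82, 'R2': 0.55} -> retained in parallel
print(inactive)  # {'R3': 0.3}              -> decays; available for reanalysis
```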

2.2. Integration cost

Sentence comprehension involves integrating new input words into the currently existing syntactic and discourse structure(s). Each integration has a syntactic component, responsible for attaching structures together, such as matching a syntactic category prediction or linking together elements in a dependency chain. Each integration also has a semantic and discourse component which assigns thematic roles and adds material to the discourse structure. It is assumed that each linguistic integration requires a fixed quantity of computational resources to perform the integration plus additional resources proportional to the distance between the elements being integrated (cf. Hudson (1995) for a related hypothesis regarding dependency distance and syntactic complexity). Thus longer-distance integrations require more resources, other factors being equal. The motivation for distance-based integration cost within the activation-based framework is as follows. It is assumed that each lexical item in a structure has an activation level independent of the activation level for the whole structure. The lexical activations decay as additional words are integrated. To perform an integration, it is necessary to first match the category of the current word with a syntactic prediction that is part of one of the candidate structures being pursued. This match then reactivates the lexical head/dependent associated with the syntactic prediction so that the plausibility of the head-dependent relationship can be evaluated within the discourse context. The lexical activation on a word w decays as more words are input and integrated into the current structure for the input, unless the new words are also involved in a head-dependent relationship with w. Thus it generally takes more resources to reactivate the lexical head for words further back in the input string. If some of the intermediate words are also involved in a head-dependent semantic relationship with w, then w's lexical activation will have been recently re-activated, so that the integration cost will be less than if none of the intermediate words were involved in a head-dependent relationship with w.

The hypothesis that longer distance integrations are more expensive is supported by evidence that local integrations are easier to make than more distant integrations, leading to a locality or recency preference in instances of ambiguity (Kimball, 1973; Frazier and Fodor, 1978; Gibson, 1991; Stevenson, 1994; Gibson et al., 1996a). For example, consider the ambiguous attachment of the adverb 'yesterday' in (8):

(8) The bartender told the detective that the suspect left the country yesterday.

The adverb 'yesterday' can be linked to either the local verb 'left' or to the more distant verb 'told'. The more local attachment is strongly preferred. In this activation-based framework, the local attachment is preferred because it takes less resources to reactivate the verb 'left' because it occurred more locally in the input string, so that its activation has decayed less than that of the verb 'told'.

Two components of a distance-based integration cost function I(n) need to be specified: (1) what the function I(n) is and (2) what kind of linguistic elements cause processing increments, i.e. what characterizes n in I(n). For simplicity, it will initially be assumed that the function I(n) is a linear function whose slope is one and whose intercept is zero, i.e. I(n) = n. (However, see Section 2.6.7 for evidence that the integration and memory cost functions are not linear as n gets large.) Some possibilities to be considered for the kind of linguistic elements causing processing increments include: words, morphemes, components of syntactic structures (e.g. noun phrases, verb phrases), or components of discourse structures. Although processing all words probably causes some integration cost increment, it is hypothesized here that substantial integration cost increments are caused by processing words indicating new discourse structure (Kamp, 1981; Heim, 1982), particularly new discourse referents. A discourse referent is an entity that has a spatio-temporal location so that it can later be referred to with an anaphoric expression, such as a pronoun for NPs, or tense on a verb for events (Webber, 1988). The motivation for this hypothesis is that much computational effort is involved in building a structure for a new discourse referent (e.g. Haviland and Clark, 1974; Halliday and Hasan, 1976; Garrod and Sanford, 1977; see Garrod and Sanford (1994) for a recent summary). Expending this effort causes substantial decays in the activations associated with preceding lexical items. Thus processing an NP which refers to a new discourse object eventually leads to a substantial integration cost increment, as does processing a tensed verb, which indicates a discourse event.6 The discourse-reference-based hypothesis provided here is a first approximation of the integration cost function:

(9) Linguistic integration cost

The integration cost associated with integrating a new input head h2 with a head h1 that is part of the current structure for the input consists of two parts: (1) a cost dependent on the complexity of the integration (e.g. constructing a new discourse referent); plus (2) a distance-based cost: a monotone increasing function I(n) energy units (EUs) of the number of new discourse referents that have been processed since h1 was last highly activated. For simplicity, it is assumed that I(n) = n EUs.

It is likely that processing other aspects of discourse structures, such as new discourse predicates, also causes additional integration cost. See Section 2.6.4 where this possibility is elaborated. It is also likely that processing every intervening word, whether introducing a new discourse structure or not, causes some integration cost increment for the distance-based component. Furthermore, because the distance-based cost function is assumed to be determined by the amount of resources expended between the head and dependent that are being integrated, integration cost will also depend on the complexity of the integrations in the intermediate region. Although this is potentially an important source of integration cost, for simplicity of presentation we will also initially ignore this complexity source.7

6 The hypothesis that there is a substantial computational cost associated with constructing structures for new discourse referents is related to Crain and Steedman's (1985) ambiguity resolution principle, the principle of parsimony, which prefers simpler discourse structures in cases of ambiguity (cf. Altmann and Steedman, 1988). According to the principle of parsimony, the sentence processing mechanism prefers not to assume the existence of additional unmentioned objects in the discourse, if it has the choice. See Section 3.5.3 for more discussion of the relevance of Crain et al.'s referential theory to the integration and memory cost functions.
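As a rough illustration of the distance-based component of (9), the sketch below computes I(n) = n by counting the new discourse referents that intervene between a head and an incoming dependent. The word positions and the hand-coded referent annotation are assumptions made only for this example; the paper does not specify an algorithmic discourse model.

```python
def I(n):
    """Distance-based integration cost in energy units (EUs); the text's
    initial simplifying assumption is I(n) = n."""
    return n

def distance_cost(new_referent, head_pos, current_pos):
    """Count new discourse referents processed after the head at head_pos,
    up to and including the incoming word at current_pos, and return I(n)."""
    n = sum(1 for i in range(head_pos + 1, current_pos + 1) if new_referent[i])
    return I(n)

# 'The reporter who the senator attacked ...' (the start of example (1a)).
# True marks words assumed to introduce a new discourse referent
# (the head nouns and the tensed verb); this annotation is by hand.
words        = ["The", "reporter", "who", "the", "senator", "attacked"]
new_referent = [False, True,       False, False, True,      True]

# Attaching 'attacked' (position 5) to its subject noun 'senator' (position 4):
print(distance_cost(new_referent, 4, 5))  # 1 -> I(1), as in the text

# Co-indexing the object gap of 'attacked' with 'who' (position 2):
print(distance_cost(new_referent, 2, 5))  # 2 -> I(2), as in the text
```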

2.3. Memory cost

The second component of the SPLT is a theory of linguistic memory cost. According to this component of the theory, there is a memory cost associated with remembering each category that is required to complete the current input string as a grammatical sentence. This claim requires a theory of phrase structure. For simplicity, I will assume a syntactic theory with a minimal number of functional categories, such as Head-Driven Phrase Structure Grammar (Pollard and Sag, 1994) or Lexical Functional Grammar (Bresnan, 1982).8 Under these theories, the minimal number of syntactic head categories in a sentence is two: a head noun for the subject, and a head verb for the predicate. If words are encountered that necessitate other syntactic heads to form a grammatical sentence, then these categories are also predicted, and additional memory load is incurred. For example, at the point of processing the second occurrence of the word 'the' in the object-extracted RC example (1a) 'The reporter who the senator attacked admitted the error', there are four obligatory syntactic predictions: (1) a verb for the matrix clause, (2) a verb for the embedded clause, (3) a subject noun for the embedded clause and (4) an empty category NP for the wh-pronoun 'who' (see the tree diagram in Fig. 1).

Fig. 1. The tree structure for the sentence 'The reporter who the senator attacked admitted the error'.

7 Another potential source of integration complexity is the interference of the intervening head-dependent relationships on the one being formed (cf. Lewis, 1996, 1998). To the extent that the intervening head-dependent relationships are similar to the one being formed, this may make the current head-dependent relationship more complex. This possible component of integration difficulty might account for the additional difficulty of processing nested structures like (vi) (Stolz, 1967; Schlesinger, 1968; example from Bever, 1970) (see footnote 4):

(vi) #The lion which the dog which the monkey chased bit died.

8 The SPLT is also compatible with grammars assuming a range of functional categories such as Infl, Agr, Tense, etc. (e.g. Chomsky, 1995) under the assumption that memory cost indexes predicted chains rather than predicted categories, where a chain is a set of categories that are co-indexed through syntactic movement (Chomsky, 1981).

It will initially be assumed that the memory cost function is the same as the integration cost function: a discourse-based locality function.9 The conceptual motivation for this claim is as for the locality-based integration cost hypothesis: there is computational effort associated with building an intervening new discourse structure, which makes it harder to keep the current syntactic predictions in mind, so more effort must be spent in keeping the syntactic predictions activated. Hence, a category that was predicted earlier is associated with more cost than a category that was predicted more recently (cf. Kaplan, 1974; Wanner and Maratsos, 1978; Hawkins, 1990, 1994; Joshi, 1990; Rambow and Joshi, 1994). This assumption fits with what is known about short-term memory recall in non-linguistic domains: it is harder to retain items in short-term memory as more interfering items are processed (see, e.g. Waugh and Norman, 1965; see Baddeley, 1990; Anderson, 1994; and Lewis, 1996 for recent summaries).10

9 See Section 6 for a different approach, in which the memory cost for a predicted category remains at the same level over the processing of intervening material.

10 It has been demonstrated that it is not just the passage of time that makes retaining items in working memory difficult. In particular, a recall task performed at two different rates of presentation within a natural range (i.e. not so fast as to make lexical access difficult: 1 s per item versus 4 s per item) resulted in the same pattern of memory recall (Waugh and Norman, 1965). Thus the initial hypothesis for the SPLT memory cost function is stated in terms of linguistic elements processed, not the quantity of time that has passed.

11 This claim turns out to be a slight oversimplification, because of closure phenomena: the fact that opening a new clause causes shunting of the material in the old clause out of working memory. Closure phenomena are discussed in Section 2.6.6, and a more accurate hypothesis regarding which predicted predicates are cost-free is presented there also.

It is hypothesized that there is one exception to the locality-based memory cost proposal: the prediction of the matrix predicate is proposed to be cost-free. The motivation for this proposal is that the parser is always expecting a predicate (and possibly a subject for this predicate also), so that this expectation is built into the parsing mechanism, and is therefore cost-free.11 All other category predictions are dependent on the lexical material that is encountered, and therefore cannot be predicted in advance, leading to a memory cost associated with each of these lexically-dependent categories. Memory cost will be quantified in terms of memory units (MUs). The SPLT memory cost hypothesis is summarized in (10):

(10) Syntactic prediction memory cost

(a) The prediction of the matrix predicate, V0, is associated with no memory cost.

(b) For each required syntactic head Ci other than V0, associate a memory cost of M(n) memory units (MUs), where M(n) is a monotone increasing function and n is the number of new discourse referents that have been processed since Ci was initially predicted.

As for the integration cost function, it turns out that almost any monotonically increasing function makes the correct predictions with respect to the contrasts to be accounted for here. For simplicity, it is initially assumed that the memory cost function M(n) is linear with a slope of one and an intercept of zero, i.e. M(n) = n. This assumption is consistent with much of the processing evidence to be reported here, with one class of exceptions: examples in which incomplete dependencies are greatly lengthened. These kinds of examples suggest that the memory cost function is not linear in its limiting behavior, but rather approaches a maximal cost (cf. Church, 1980; Gibson et al., 1996b). The relevant examples, along with a discussion of their implications with respect to the memory cost function, are presented in Section 2.6.7.
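The following sketch illustrates how the memory cost hypothesis in (10) could be tallied at a single parse state, under the text's initial assumption M(n) = n and the exemption of the matrix predicate in (10a). The particular parse state shown is a hypothetical placeholder, included only to show the bookkeeping.

```python
def M(n):
    """Memory cost in memory units (MUs) for one predicted category; the
    text's initial assumption is M(n) = n."""
    return n

def state_memory_cost(pending_predictions):
    """Total memory cost of one parse state. The argument maps each required
    syntactic head (other than the cost-free matrix predicate V0, which is
    simply omitted) to the number of new discourse referents processed since
    that head was first predicted."""
    return sum(M(n) for n in pending_predictions.values())

# A hypothetical parse state inside an object-extracted relative clause:
pending = {
    "embedded_verb": 2,         # predicted two new referents ago
    "wh_empty_category": 2,     # predicted at 'who', two new referents ago
    "embedded_subject_noun": 1  # predicted one new referent ago
}
print(state_memory_cost(pending))  # 5 MUs under M(n) = n
```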

2.4. The relationship between memory and integration cost

Following Just and Carpenter (1992), it is assumed that linguistic integration processes and storage access the same pool of working memory resources.12 The memory units in this pool of resources can therefore be used for either storage or computation (integration). The amount of energy or work that is necessary to perform an integration is quantified in terms of energy units:

(11) An energy unit (EU) = memory unit (MU) × time unit (TU)

For example, suppose there are ten MUs available for performing an integration which requires five EUs. If all ten MUs perform the integration, then the time required to complete the integration is 5 EUs / 10 MUs = 0.5 TUs. If seven of the MUs are occupied with another task (e.g. storage) so that only three MUs are available to perform the integration, then the integration will take 5 EUs / 3 MUs = 1.67 TUs.

12 No claim is being made here with respect to whether the computational resources used in sentence processing are a general pool of memory resources, as argued by Just and Carpenter (1992), or whether they are a modular linguistic memory pool, as argued by Waters and Caplan (1996).


These assumptions result in the hypothesis that the time required to perform a linguistic integration, as indexed by reading times, is a function of the ratio of the integration cost required at that state to the space currently available for the computation. Thus the greater the memory cost, the smaller the resources available to perform linguistic integrations, and the longer the integration steps will take. These ideas are formalized in (12):13

(12) The timing of linguistic integration:

t_struct-integ = C × I_struct-integ / (M_capacity − M_current-memory-used)

where t_struct-integ is the time required to perform an integration;

C is a constant;

I_struct-integ is the quantity of energy resources (in EUs) required to perform the integration, as determined by the function I(n) in (9);

M_capacity is the linguistic working memory resource capacity of the listener/reader (in MUs);

M_current-memory-used is the memory resources already in use (in MUs), as determined by the function M(n) in (10).

The evidence currently available which supports this hypothesis comes from experiments performed by King and Just (1991), who looked at the reading time behavior of subject pools with different memory capacities for language. This evidence is discussed in Section 2.6.3.
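A small sketch of the timing relation in (12), reusing the worked numbers from the text above (a 5 EU integration against a 10 MU pool, with 0 or 7 MUs tied up in storage). The constant C is set to 1 here purely for illustration; the paper does not fix its value.

```python
def integration_time(I_eu, M_capacity, M_used, C=1.0):
    """Time (in TUs) to perform an integration requiring I_eu energy units,
    following (12): t = C * I / (M_capacity - M_used)."""
    free = M_capacity - M_used
    if free <= 0:
        raise ValueError("no free memory units: the integration cannot proceed")
    return C * I_eu / free

print(integration_time(I_eu=5, M_capacity=10, M_used=0))  # 0.5 TUs
print(integration_time(I_eu=5, M_capacity=10, M_used=7))  # ~1.67 TUs
```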

2.5. Memory cost and intuitive complexity judgments

Following earlier researchers, it is assumed that the relative intuitive acceptability of two unambiguous sentences which are controlled for plausibility is determined by the maximal quantity of memory resources that are required at any point during the parse of each sentence (see, e.g. Gibson (1991) and the references therein). If more memory resources are required than the available working memory capacity, then processing breakdown eventually results, as in the processing of (3b), repeated here as (13):

(13) #The administrator who the intern who the nurse supervised had bothered lost the medical reports.

According to this hypothesis, (13) is unprocessable because there is a state during its parse that requires more memory resources than are available. If sentence length were a major factor in determining intuitive sentence complexity, then (14) should be judged roughly as complex as (13), since the two sentences are of the same length and have the same meaning. However, (13) is much more complex than (14).

13 The time for linguistic integration t_struct-integ in (12) is not the only time involved in comprehending a word. In particular, lexical access, eye-movements in reading and button-presses in self-paced reading all require some time to compute, so that there is a base-line time on top of which linguistic integration time is added.

(14) The nurse supervised the intern who had bothered the administrator who lost the medical reports.

Furthermore, including additional right-branching clauses at either the beginning or end of (13) as in (15a) and (15b) results in sentence structures with the same intuitive complexity as (13):

(15) (a) #The doctor thought that the medical student had complained that the administrator who the intern who the nurse supervised had bothered lost the medical reports.

(b) #The administrator who the intern who the nurse supervised had bothered lost the medical reports which the doctor thought that the medical student had complained about.

Thus intuitive complexity is not determined by the average memory complexity over a sentence. We therefore assume that the intuitive complexity of a sentence is determined by the maximal memory complexity reached during the processing of the sentence.
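A minimal sketch of this assumption: the intuitive complexity of a sentence is read off the peak, not the mean, of its memory cost profile. The per-state numbers below are hypothetical placeholders standing in for the detailed computations given later in Section 3.

```python
def intuitive_complexity(memory_cost_profile):
    """Peak memory cost (in MUs) reached over the parse states of a sentence."""
    return max(memory_cost_profile)

# Hypothetical per-state memory cost profiles: a doubly nested structure
# like (13) versus a right-branching paraphrase like (14).
nested_profile          = [0, 1, 2, 4, 6, 3, 1, 0]
right_branching_profile = [0, 1, 1, 2, 1, 2, 1, 0]

print(intuitive_complexity(nested_profile))           # 6 -> high peak, hard
print(intuitive_complexity(right_branching_profile))  # 2 -> low peak, easy
```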

2.6. Empirical evidence for the components of the SPLT

This section provides empirical evidence for a number of components of the SPLT. Section 2.6.1 provides evidence for the discourse-based definition of locality within the memory cost function. Section 2.6.2 provides evidence for the locality-based integration hypothesis. Section 2.6.3 provides evidence for the single resource pool hypothesis. In Section 2.6.4 evidence is described which supports an extended version of the memory and integration cost functions according to which there is additional cost for new discourse predicates. Section 2.6.5 provides evidence for the claim that there is no memory cost for predicting the matrix predicate in a sentence. In Section 2.6.6, evidence is provided for a clause-based closure principle. Section 2.6.7 provides evidence that the integration and memory cost functions are non-linear in the limit, approaching a maximal cost.

2.6.1. A discourse-based memory cost function

According to the discourse-based memory cost hypothesis, intervening elements which cause substantial integration and memory cost increments are words introducing new discourse referents: NPs (object referents) and the main verbs of VPs (event referents). Evidence for this claim is provided by Gibson and Warren (1998a), who showed that doubly nested RC structures are easier to process when a first- or second-person pronoun (an indexical pronoun) is in the subject position of the most embedded clause, as compared with similar structures in which a proper name, a full NP or a pronoun with no referent is in the subject position of the most embedded clause:

(16) (a) Indexical pronoun: The student who the professor who I collaborated with had advised copied the article.

(b) Short name: The student who the professor who Jen collaborated with had advised copied the article.

(c) Full NP: The student who the professor who the scientist collaborated with had advised copied the article.

(d) No referent pronoun: The student who the professor who they collaborated with had advised copied the article.

In an acceptability questionnaire, participants rated the items with the indexical pronouns significantly easier to process than any of the other three conditions (see Fig. 2): (1) items with short names (e.g. 'Jen' in (16b)), (2) items with full definite NPs (e.g. 'the scientist' in (16c)) and (3) items with a pronoun lacking a discourse referent (e.g. 'they' in (16d)).14 These observations can be accounted for in the SPLT framework if the memory cost for a predicted category is increased when a new discourse referent is processed. At the point of processing the most embedded subject in these structures, there are four predicted categories associated with increasing memory cost (in addition to the prediction of the matrix verb): two embedded verbs and two embedded NP-empty-categories. According to the proposed SPLT memory cost function, the memory cost for each of these predicted categories increases when an NP which is new to the discourse is processed at this point (e.g. 'they' in (16d); 'Jen' in (16b); or 'the scientist' in (16c)). If the most embedded subject is an indexical pronoun (e.g. the first-person pronoun 'I' in (16a)), then its referent is in the current discourse (which is assumed to always include a speaker/writer and a hearer/reader) and the memory costs for the predicted categories do not increase. As a result, the maximal memory cost required to process a doubly nested RC structure with a new referent in its most embedded subject position is greater than that required to process a similar structure with an old referent in its most embedded subject position. See Section 3.2 for details of memory cost computations on doubly nested RC structures like these.

Fig. 2. Complexity ratings for nested structures containing different kinds of NPs in the most embedded subject position (from Gibson and Warren, 1998a). The scale that participants used went from 1 (not complex: easy to understand) to 5 (very complex: hard to understand).

14 The higher complexity ratings of the condition containing a pronoun lacking a discourse referent may reflect the infelicity of using a third person pronoun without an antecedent.

The discourse-based memory cost hypothesis therefore provides an account for the observation that a doubly nested RC structure in a null context is processable only when the most embedded NP is an indexical pronoun (Bever, 1970, 1974; Kac, 1981; Gibson, 1991; Gibson and Warren, 1998a; Kluender, 1998):

(17) (a) The reporter [everyone [I met] trusts] said the president won't resign yet. (Bever, 1974)

(b) A book [that some Italian [I've never heard of] wrote] will be published soon by MIT Press. (Frank, 1992)

(c) Isn't it true that example sentences [that people [that you know] produce] are more likely to be accepted. (De Roeck et al., 1982)

Bever (1970, 1974) was the first to note that examples like these were acceptable. He attributed their acceptability to the syntactic non-similarity of the three kinds of subject NPs in the structure. However, this account does not explain why a pronoun and not some other dissimilar NP, such as a proper name or an indefinite NP, must occur in the most embedded subject position in order to make the structure acceptable. Kac (1981) was the first to notice the generalization that these structures were acceptable with pronouns in the most embedded position, except that he hypothesized that having any pronoun in the subject position of the most embedded RC makes a doubly nested RC structure easier to process than corresponding sentences with full NPs. This claim is too strong, because structures whose most embedded subject is a pronoun without a referent in the discourse, as in (16d), are as complex as those whose most embedded subject is a name or a full NP, as in (16b) and (16c).

Other syntactic complexity theories are also not compatible with the discourse-dependent complexity difference discussed here. For example, Kimball's (1973) theory indexes complexity with the number of incomplete clauses that need to be kept in memory at a particular parse location, and Gibson's (1991) theory indexes complexity with the number of incomplete thematic role assignments. These metrics do not predict the observed discourse-based contrast, nor are they extendible to account for the difference.

2.6.2. Evidence for the distance-based integration hypothesis

Under the single resource pool hypothesis in (12) above, it is necessary to determine the quantity of linguistic memory resources that are used at each processing state in order to predict exact SPLT integration times. However, reasonable first approximations of comprehension times can be obtained from the integration costs alone, as long as the linguistic memory storage used is not excessive at these integration points. When memory costs are also taken into consideration, integration times at points of higher memory cost will be increased relative to integration times at points of lower memory cost. For simplicity, examples of the SPLT integration costs will first be compared with reading times, before the memory component of the theory is presented. Additionally, we will consider only the distance-based component of the integration cost function. The SPLT distance-based integration cost profiles for the object-extraction (1a) and the subject-extraction (1b) are given in (18) and (19) respectively.

(18) Object-extracted relative clause

Input word: The | reporter | who | the | senator | attacked | admitted | the | error
Integration cost (in EUs): – | I(0) | I(0) | I(0) | I(0) | I(1)+I(2) | I(3) | I(0) | I(0)+I(1)

The first point at which an integration takes place is at the second word 'reporter', which is integrated with the preceding word 'the'. No new discourse referents have been processed since the determiner 'the' was processed, so that the distance-based integration cost is I(0) EUs at this point. Note that the two words 'the' and 'reporter' together form a new discourse referent, so constructing this new referent also consumes some integration resources. We are initially ignoring this cost, because it is not part of the distance-based component of integration cost. We will also ignore this cost for the other words in these sentences which head new discourse referents: the nouns 'reporter', 'senator', and 'error'; and the tensed verbs 'attacked' and 'admitted'.

The next word 'who' is now integrated into the structure, attaching to the most recent word 'reporter'. No new discourse referents have been processed since the attachment site 'reporter' was processed (note that 'who' is not a new discourse referent: it is a pronominal element referring to the established discourse referent 'the reporter'), resulting in a cost of I(0) EUs. The word 'the' is integrated next, again with no intervening new discourse referents, for a cost of I(0) EUs. The word 'senator' is integrated next, again at a cost of I(0) EUs.

Processing the next word, 'attacked', involves two integration steps. First, the verb 'attacked' is attached as the verb for the NP 'the senator'. This attachment involves assigning the agent thematic role from the verb 'attacked' to the NP 'the senator'. This integration consumes I(1) EUs, because one new discourse referent ('attacked') has been processed since the subject NP 'the senator' was processed. The second integration consists of attaching an empty-category as object of the verb 'attacked' and co-indexing it with the relative pronoun 'who'. Two new discourse referents have been processed since 'who' was input – the object referent 'the senator' and the event referent 'attacked' – so that this integration consumes an additional I(2) EUs, for a total cost of I(1)+I(2) EUs.

The verb 'admitted' is then integrated as the main verb for the subject NP 'the reporter'. This integration consumes I(3) EUs because three new discourse referents have been processed since 'the reporter' was input: the object referent 'the senator' and the two event referents 'attacked' and 'admitted'. Next, the word 'the' is integrated into the structure, at a cost of I(0) EUs. Finally, the noun 'error' is integrated into the structure. This step involves two integrations: the noun 'error' integrating with the determiner 'the' at a cost of I(0) EUs, and the NP 'the error' integrating as the object of the verb 'admitted' at a cost of I(1) EUs, because one new discourse referent – 'the error' – has been processed since the verb 'admitted' was input.

The distance-based integration cost profile for the subject-extraction, (1b), is presented in (19). The integration costs for this construction are the same as for the object-extraction, except in the embedded clause. In the subject-extraction, the cost of integrating 'attacked' is I(0)+I(1) EUs, corresponding to (1) the attachment of a gap in subject-position of the embedded clause, crossing zero new discourse referents and (2) the attachment of the embedded verb 'attacked' to its preceding subject, an integration which crosses one new discourse referent ('attacked').15 The determiner 'the' is integrated next, consuming a cost of I(0) EUs. The noun 'senator' is then integrated at a cost of I(0)+I(1) EUs.

(19) Subject-extracted relative clause

Input word: The | reporter | who | attacked | the | senator | admitted | the | error
Integration cost (in EUs): – | I(0) | I(0) | I(0)+I(1) | I(0) | I(0)+I(1) | I(3) | I(0) | I(0)+I(1)

Thus the SPLT integration theory predicts reading times to be fast throughout both relative clause constructions, with the exception of (1) the matrix verb in each, where a long distance integration occurs and (2) the embedded verb in the object-extraction construction, where two substantial integrations take place. Furthermore, if integration cost is also dependent on the complexity of the intervening integrations, as the activation-based account predicts, then the integration cost on the matrix verb 'admitted' should be larger in the object-extraction than in the subject-extraction, because more complex integrations take place between the subject and the matrix verb in the object-extraction than in the subject-extraction. In particular, integrating the embedded verb is more complex in the object-extraction structure than in the subject-extraction structure. (Note that the integration cost profiles presented here have been simplified, so that the predicted difference between the integration costs of the matrix verb in the two extraction structures is not represented in (18) and (19).)
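The per-word totals in (18) and (19) can be recomputed mechanically from the individual integration distances given in the text, assuming I(n) = n. The sketch below does exactly that; the nested lists simply transcribe the n values listed in the two tables, with None marking the first word, where no integration takes place.

```python
def I(n):
    return n  # the text's simplifying assumption

def word_costs(per_word_distances):
    """Sum I(n) over the integrations performed at each word."""
    return [None if ns is None else sum(I(n) for n in ns)
            for ns in per_word_distances]

# n values transcribed from (18) and (19):
object_extraction  = [None, [0], [0], [0], [0], [1, 2], [3], [0], [0, 1]]
subject_extraction = [None, [0], [0], [0, 1], [0], [0, 1], [3], [0], [0, 1]]

print(word_costs(object_extraction))   # [None, 0, 0, 0, 0, 3, 3, 0, 1]
print(word_costs(subject_extraction))  # [None, 0, 0, 1, 0, 1, 3, 0, 1]
```

The peaks fall on the embedded verb of the object-extraction and on the matrix verb of both constructions, which is where the text predicts slow reading times.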

A comparison between SPLT predicted integration costs and reading times for the two relative clause constructions is presented in Fig. 3, based on data gathered by

15 If one assumes a syntactic theory according to which there is no empty category mediating the subject-extraction in an example like (19), then the integration cost is simply I(1) EUs, corresponding to the cost of attaching the verb 'attacked' to its subject 'who'. The integration cost is virtually the same under either set of assumptions, because the cost of attaching the mediating empty category is very low, at I(0) EUs. Thus theories with and without subject empty categories make virtually the same predictions here.
