

Verb Sense and Verb Subcategorization Probabilities

by

Douglas William Roland

B.S., University of Delaware, 1989

M.A., University of Colorado, 1994

A thesis submitted to the

Faculty of the Graduate School of the

University of Colorado in partial fulfillment

of the requirement for the degree of

Doctor of Philosophy

Department of Linguistics

2001

Copyright © 2001

by

Douglas William Roland

This thesis entitled:

Verb Sense and Verb Subcategorization Probabilities

written by Douglas William Roland

has been approved for the Department of Linguistics

___________________________

Daniel Jurafsky

___________________________

Lise Menn

____________

Date

The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline.


Abstract

Roland, Douglas William (Ph.D., Linguistics)

Verb Sense and Verb Subcategorization Probabilities

Thesis directed by Associate Professor Daniel S. Jurafsky

This dissertation investigates a variety of problems in psycholinguistics and computational linguistics caused by the differences in verb subcategorization probabilities found between various corpora and experimental data sets. For psycholinguistics, these problems include the practical problem of which frequencies to use for norming psychological experiments, as well as the more theoretical issue of which frequencies are represented in the mental lexicon and how those frequencies are learned. In computational linguistics, these problems include the decreases in the accuracy of probabilistic applications such as parsers when they are used on corpora other than the one on which they were trained.

Evidence is presented showing that different senses of verbs and their corresponding differences in subcategorization, as well as inherent differences between the production of sentences in psychological norming protocols and language use in context, are important causes of the subcategorization frequency differences found between corpora. This suggests that verb subcategorization probabilities should be based on inpidual senses of verbs rather than the whole verb lexeme, and that “test tube” sentences are not the same as “wild” sentences. Hence, the influences of experimental design on verb subcategorization probabilities should be given careful consideration.

This dissertation will demonstrate a model of how the relationship between verb sense and verb subcategorization can be employed to predict verb subcategorization based on the semantic context preceding the verb in corpus data. The predictions made by the model are shown to be the same as predictions made by human subjects given the same contexts.

For Sumiyo


Acknowledgements

This dissertation is the result of an enormous amount of help, advice, and support from many people. However, unlike the body of the dissertation, where a simple table or graph can adequately convey a message, a page of text in the introduction cannot convey the impact that the people named here have had on this dissertation and on me as a researcher and as a human being.

At the top of the list are my advisor, Dan Jurafsky, my co-advisor, Lise Menn, and the other members of my committee, Alan Bell, Jim Martin, and Tom Landauer. Over the years, they have all given me more than a fair share of time, energy, insight, help, and patience. I can’t even begin to list all they have done.

Other present and former Boulder faculty members have also had a great influence on me, including: Paul Smolensky, who introduced me to computational modeling during my first semester in Boulder, Mike Eisenburg, who showed me the artistic side of programming, Laura Michaelis, and Barbara Fox.

I also owe special gratitude to Susanne Gahl, who read multiple versions of this dissertation and the preceding papers, and provided many useful suggestions for improving both the content and the clarity.

Several people have very kindly contributed their data to this dissertation. These include Charles Clifton, who lent me the original hand-written subject response sheets from the Connine, Ferreira, Jones, Clifton, & Frazier (1984) study, and thus caused me to re-evaluate many of my original thoughts about the causes of the differences between norming study data and other corpus data. Of similar importance was the subject response data from Garnsey, Pearlmutter, Myers, & Lotocky (1997), provided by Susan Garnsey. Chapter 3 would not exist at all if it were not for the data and support provided by Mary Hare, Ken McRae, and Jeff Elman.

Financially, this work was supported in part by: NSF CISE/IRI/Interactive Systems Proposal 9818827, NSF CISE/IRI/Interactive Systems Award IRI-9618838, NSF CISE/IRI/Interactive Systems Award IRI-970406, and NSF CISE/IRI/Interactive Systems Award IIS-9733067.

Many people have provided helpful feedback either on this dissertation, or on the papers and presentations, such as Roland & Jurafsky (1997), Roland & Jurafsky (1998), Roland & Jurafsky (in press), and Roland et al. (2000), that contain various pieces of the data and analysis in this dissertation. These include Charles Clifton, Charles Fillmore, Adele Goldberg, Mary Hare, Uli Heid, Paola Merlo, Neal Perlmutter, Philip Resnik, Suzanne Stevenson, and the ever-famous anonymous reviewers. This list also includes assorted office-mates and other people associated with Dan’s research group: (in an attempt at chronological order) Bill Raymond, Taimi Metzler, Giulia Bencini, Michelle Gregory, Traci Curl, Mike O’Connell, Cynthia Girand, Noah Coccaro, Chris Riddoch, Beth Elder, Keith Herold, and finally,

Dan Gildea and Sameer Pradhan, both of whom provided significant help while I was attempting to tag and parse data from various corpora.

Special thanks to Yoshie Matsumoto for showing me where all the good coffee shops were for getting work done, and setting a pace and spirit during my first year in Boulder which I have attempted (not always successfully) to maintain since. Also, thanks to Faridah Hudson and Linda Nicita, who started with me, and have provided support and camaraderie during the long trip.

Many friends (mostly human, but also one canine and one feline) from the real world also made life a much better place to be: (in order of neighborhood proximity to our kitchen) Rie, Pepper, Emi, Virgil, Hal, Elizabeth, Jeremy, Benny, Raj, Noriko, Wes, Lee Wah, Nanako, Hiroko, Kiyoshi, (and now for a great leap in neighborhood distance) Yoshie, Kazu, Tomoko, and Taro. Also thanks to Imanaka Sensei for ocha and okashi.

I would also like to thank my parents for being parents, a job which included Mom spending a day of her vacation last summer proof-reading the whole dissertation.

None of the help from the people above would mean anything were it not for the support, encouragement, love, and extreme patience shown by my wife Sumi. I couldn’t have done it without her.


Contents

1 Introduction 1

1.1 Overview (1)

1.2 The importance of verb subcategorization probabilities in psycholinguistics (1)

1.3 The problem with verb subcategorization frequencies (4)

1.4 Verb subcategorizations and computational linguistics (11)

1.4.1 The importance of verb subcategorization information for statistical parsers (12)

1.4.2 Problem: the need to retrain parsers for new domains (13)

1.5 Solving the verb subcategorization frequency problem (15)

1.5.1 Evidence for the relationship between verb semantics and subcategorization from linguistics (17)

1.5.2 Evidence for the relationship between verb semantics and subcategorization from computational linguistics (19)

1.5.3 Evidence for the relationship between verb semantics and subcategorization from psycholinguistics (20)

1.6 Outline of Chapters (21)

1.6.1 Chapter 2 (22)

1.6.2 Chapter 3 (23)

1.6.3 Chapter 4 (24)

2 Subcategorization probability differences in corpora and experiments 25

2.1 Combining sense-based verb subcategorization probabilities with other factors to yield observed subcategorization frequencies (25)

2.1.1 Verb senses (26)

2.1.2 Probabilistic factors (27)

2.2 Experiment – comparing norming studies and corpus data (28)

2.2.1 Methodology (28)

2.2.1.1 Connine et al. (1984) sentence production study (29)

2.2.1.2 Garnsey et al. (1997) sentence completion study (30)

2.2.1.3 Brown Corpus (31)

2.2.1.4 Wall Street Journal Corpus (32)

2.2.1.5 Switchboard Corpus (32)

2.2.1.6 Extracting subcategorization probabilities from the corpora (32)

2.2.1.7 Measuring differences between corpora (39)

2.2.2 Results and discussion: Part 1 – Subcategorization differences resulting from comparing isolated-sentence and connected-discourse corpora (41)

2.2.2.1 Discourse cohesion (41)

2.2.2.2 Other experimental factors – subject animacy (46)

2.2.2.3 Conclusion for section 2.2.2 (47)

2.2.3 Results and discussion: Part 2 – Subcategorization differences resulting from verb sense differences (48)

2.2.3.1 Verbs have different subcategorization frequencies in different corpora (48)

2.2.3.2 Verbs have different distributions of sense in different corpora (49)

2.2.3.3 Topics provided in norming studies also influence verb sense (51)

2.2.3.4 Subcategorization frequencies for each verb sense (51)

2.2.3.5 Factors that contribute to stable cross-corpus subcategorization frequencies (52)

2.2.3.6 Conclusion for section 2.2.3 (55)

2.2.4 Results and discussion: Part 3 – Reducing sense and discourse differences decreases the differences in subcategorization probabilities (56)

2.2.5 Conclusion (58)

2.3 Experiment – Controlling for discourse type and verb sense to generate stable cross-corpus subcategorization frequencies (58)

2.3.1 Data (59)

2.3.2 Verb Frequency (59)

2.3.3 Subcategorization Frequency (60)

2.3.3.1 Methodology (60)

2.3.3.2 Results (61)

2.3.4 Discussion (62)

2.3.5 Conclusion (65)

2.4 Conclusion (66)

3 Predicting verb subcategorization from semantic context using the relationship between verb sense and verb subcategorization 68

3.1 Overview (68)

3.2 Psycholinguistic evidence for the effects of verb sense on human sentence processing (68)

3.3 Model for predicting subcategorization from semantic context (72)

3.3.1 Previous related uses of LSA (74)

3.3.2 Details of how LSA is used to measure semantic similarity (79)

3.3.3 Corpus (training) data used in model (80)

3.4 Experiments (81)

3.4.1 Predicting the subcategorizations of the Hare et al. (2001) bias contexts (81)

3.4.1.1 Results and discussion (82)

3.4.2 Predicting the subcategorizations of corpus bias contexts (87)

3.4.2.1 Results and discussion (88)

3.4.2.2 Additional analysis using corpus bias contexts (91)

3.4.3 Predicting the subcategorizations of examples of ‘admit’ (93)

3.4.3.1 Methods (94)

3.4.3.2 Results and discussion (95)

3.5 Conclusion (97)

4 Conclusions and future work 99

4.1 Psycholinguistics (99)

4.2 Computational linguistics (100)

4.3 Future work (100)

Bibliography 101

Appendix A: Subcategorizations and tgrep search strings 106

Appendix B: Stimuli used in Hare et al. (2001) 112


Tables

Table 1: Correlation values from comparisons in Merlo (1994) (5)

Table 2: High, middle, and low attachment sites from Gibson & Schuetze (1999) (7)

Table 3: Correlations (r) and agreement (b) for comparisons for DO and SC subcategorizations from Lapata et al. (2001) (9)

Table 4: Correlations (r) and agreement (b) for comparisons for NP and 0 subcategorizations from Lapata et al. (2001) (9)

Table 5: Sample grammar rules (made-up probabilities) (12)

Table 6: Results from Gildea (2001) (14)

Table 7: Senses and possible subcategorizations of admit from WordNet (Miller, Beckwith, Fellbaum, Gross, & Miller 1993) (16)

Table 8: Approximate size of each corpus (29)

Table 9: Connine et al. (1984) protocol 1 sample prompts and subject responses (29)

Table 10: Connine et al. (1984) protocol 2 sample prompts and subject responses (30)

Table 11: 127 verbs used from Connine et al. (1984) (30)

Table 12: Sentence completion protocol used to collect subcategorization frequencies by Garnsey et al. (1997) (30)

Table 13: Example subcategorization-frame probabilities for each of the three subcategorization frame classes (DO-bias, SC-bias, and EQ-bias) of Garnsey et al. (1997) (31)

Table 14: 127 verbs used from Garnsey et al. (1997) (31)

Table 15: List of subcategorizations (33)

Table 16: Examples of each subcategorization frame taken from the Brown Corpus (34)

Table 17: Examples of each subcategorization frame from the response sheets for the CFJCF data (35)

Table 18: Raw subcategorization vectors for hear from BC and WSJ (40)

Table 19: Modified subcategorization vectors for hear from BC and WSJ for use in calculating Chi Square (41)

Table 20: Use of passives in each corpus (42)

Table 21: The object of follow is only omitted in connected-discourse corpora (numbers are hand-counted, and indicate % of omitted objects out of all instances of follow) (43)

Table 22: Greater use of first person subject in isolated sentences (44)

Table 23: Use of VP-internal NPs which are anaphorically related to the subject (44)

Table 24: Token/Type ratio for arguments of accept (45)

Table 25: Subcategorization of worry affected by sentence-completion paradigm (46)

Table 26: Uses of worry (46)

Table 27: Agreement between WSJ and BC data (49)

Table 28: Differences in distribution of verb senses between BC and WSJ (49)

Table 29: Examples of common senses of charge and their frequencies (50)

Table 30: Examples of common senses of jump and their frequencies (50)

Table 31: Examples of common senses of pass and their frequencies (51)

Table 32: Uses of pass in different settings in the CFJCF sentence production study (51)

Table 33: Different senses of charge in WSJ have different subcategorization probabilities. Dominant prepositions are listed in parentheses after the frequency (52)

Table 34: Improvement in agreement when controlling for verb sense (52)

Table 35: Agreement between BC and WSJ data (52)

Table 36: Differences in distribution of verb sense between BC and WSJ (53)

Table 37: Examples of common senses of kill (53)

Table 38: Examples of common senses of stay (54)

Table 39: Examples of common senses of try (54)

Table 40: Senses and subcategorizations of kill in WSJ (55)

Table 41: Improvements in agreement for the verb hear (57)

Table 42: 64 verbs chosen for analysis (59)

Table 43: Number of verbs out of 64 showing a significant difference in frequency between corpora (60)

Table 44: Verbs that BNC and Brown both have more of than WSJ (60)

Table 45: Verbs that WSJ has more of than both Brown and BNC (60)

Table 46: Transitivity bias in each corpus (62)

Table 47: Sentence completion results from Hare et al. (2001) (69)

Table 48: Word sense disambiguation results from Schütze (1997) (77)

Table 49: Sample size for each verb (verbs marked with * were used in Hare et al. (2001), but were not used in the experiments in this dissertation due to small sample sizes) (81)

Table 50: Average subcategorization frequencies for 15 verbs used in experiment 3.4.1, taken from corpus frequencies reported in Hare et al. (2001) (86)

Table 51: Sample sentence completion prompts (87)

Table 52: Average subcategorization frequencies for 15 verbs taken from sentence completion experiment in Hare et al. (2001) (87)

Table 53: Examples of senses and subcategorizations of admit (94)

Table 54: Counts and examples of subsenses of the 50 corpus examples of the DO-enter sense of admit (97)

Table 55: 20 nearest neighbors of grape in the TASA LSA semantic space (100)

Table 56: Errors not including quote-finding errors for communication verbs (106)


Figures

Figure 1: High, middle, and low attachment sites (from Gibson et al. 1996) (7)

Figure 2: Sample lexicalized tree taken from Charniak (1995) (12)

Figure 3: Semantic structures for two different syntactic patterns of ‘spray’ (Pinker 1989, page 228) (19)

Figure 4: Model showing why different corpora have different subcategorization probabilities for the same verb (26)

Figure 5: Effect of bias context on reading times in ambiguous condition, from Hare et al. (2001) (DA = DO bias, ambiguous condition; SA = SC bias, ambiguous condition) (71)

Figure 6: Effect of bias context on reading times in unambiguous condition, from Hare et al. (2001) (DU = DO bias, unambiguous condition; SU = SC bias, unambiguous condition) (72)

Figure 7: Predicting subcategorization from the context preceding the verb (73)

Figure 8: Use of semantic similarity to predict subcategorization (74)

Figure 9: Disambiguating the subcategorization of a target using Schütze-style clusters in LSA semantic space (78)

Figure 10: Disambiguating the subcategorization of a target using the subcategorizations of the nearest neighbors (79)

Figure 11: Average % SC corpus examples in neighborhood of SC bias contexts (83)

Figure 12: Accuracy in predicting the subcategorization bias of the SC bias contexts (84)

Figure 13: Average % DO corpus examples in neighborhood of DO bias contexts (85)

Figure 14: Accuracy in predicting the subcategorization bias of the DO bias contexts (85)

Figure 15: Average % SC corpus examples in neighborhood of SC target contexts (88)

Figure 16: Accuracy in predicting the subcategorization bias of the SC corpus contexts (89)

Figure 17: Average % DO corpus examples in neighborhood of DO corpus contexts (90)

Figure 18: Accuracy in predicting the subcategorization bias of the DO bias contexts (91)

Figure 19: Comparison of various LSA weighting methods (92)

Figure 20: Effects of weighting neighborhoods by cosine on accuracy in predicting subcategorization (93)

Figure 21: Relative frequencies of each type of example in the neighborhood of DO-confess corpus examples (95)

Figure 22: Relative frequencies of each type of example in the neighborhood of SC-confess corpus examples (96)

Figure 23: Relative frequencies of each type of example in the neighborhood of DO-enter corpus examples (96)

1 Introduction

1.1 Overview

This dissertation will investigate a variety of problems in psycholinguistics and computational linguistics caused by the differences in verb subcategorization probabilities found between various corpora and experimental data sets. For psycholinguistics, these problems include the practical problem of which frequencies to use for norming psychological experiments as well as the more theoretical issue of which frequencies are represented in the mental lexicon and how those frequencies are learned. In computational linguistics, these problems include the decreases in the accuracy of probabilistic applications such as parsers when they are used on corpora other than the one on which they were trained.

Chapter 2 will demonstrate that different senses of verbs and their corresponding differences in subcategorization as well as inherent differences between the production of sentences in psychological norming protocols and language use in context are important causes of the subcategorization frequency differences found between corpora. This leads to two conclusions: 1) verb subcategorization probabilities, for psycholinguistic models and for norming purposes, should be based on individual senses of verbs rather than the whole verb lexeme, and 2) “test tube” sentences are not the same as “wild” sentences, and thus the influences of experimental design on verb subcategorization probabilities should be given careful consideration.

Chapter 3 will demonstrate a computational model, based on Latent Semantic Analysis, of how the relationship between verb sense and verb subcategorization can be employed to predict verb subcategorization based on the semantic context preceding the verb in corpus data. This chapter will also demonstrate that the predictions made by the model are the same as predictions made by human subjects given the same contexts. This will be accomplished by showing that the predictions from the algorithm correspond with parsing decisions made by human subjects in reading time experiments performed by Hare, Elman, & McRae (2001).

1.2 The importance of verb subcategorization probabilities in psycholinguistics

Verb subcategorization probabilities play an important role in recent psycholinguistic theories of human language processing and in computational linguistic applications such as probabilistic parsers. This section will address the role of verb subcategorization probabilities in psycholinguistics, providing examples of both evidence of how verb subcategorization probabilities affect sentence processing and of how various researchers have generated norming materials for their experiments. Section 1.4 will address the role of verb subcategorization probabilities in computational linguistics.

Fodor (1978) argued that the transitivity preferences of verbs affect the processing of sentences containing those verbs. She argued, based on intuition and informant judgment, that sentence (1) is more difficult to understand than sentence (2). The additional difficulty in (1) is attributed to the parser proposing a gap after the verb read at the location marked by (_).

(1) Which book_i did the teacher read (_) to the children from _i?

(2) Which student_i did the teacher go to the concert with _i?

Alternatively, example (3) is more difficult than example (4). This time, the difficulty is attributed to the parser not proposing the filled gap (_) after the verb walk.

(3) Which student_i did the teacher walk (_)_i to the cafeteria?

(4) Which student_i did the teacher walk to the cafeteria with _i?

These patterns of difficulty argue against both theories where the parser always proposes gaps, and theories where the parser never proposes gaps. Fodor argues that the key difference between these sets of examples is that the verb read commonly takes a direct object (DO), so the parser proposes a gap, while the verb walk occurs more commonly without a DO, so no gap is proposed.

Clifton, Frazier, & Connine (1984) provided experimental evidence for the relationship between verb subcategorization expectations and processing difficulty discussed by Fodor. Clifton et al. (1984) relied on a norming study by Connine, Ferreira, Jones, Clifton, & Frazier (1984) for verb bias data. Parsing difficulties were measured by on-line reaction times in grammaticality judgment and secondary task protocols.

Ford, Bresnan, & Kaplan (1982) showed how lexical preference affects the parsing of ambiguous sentences. Subjects were given different ambiguous sentences and asked to choose a meaning for each sentence. The subjects’ interpretations of the sentences changed when different verbs were used in otherwise identical sentences. Examples (5) and (6) show how parse preferences, indicated in parentheses, change when the verb is changed. These changes in parse preference indicate that some information is associated with the verb that influences parsing decisions.

(5) They objected to everyone that they couldn’t hear.

a. They objected to everyone who they couldn’t hear. (55%)

b. They objected to everyone about the fact that they couldn’t hear. (45%)

(6) They signaled to everyone that they couldn’t hear.

a. They signaled to everyone who they couldn’t hear. (10%)

b. They signaled to everyone the fact that they couldn’t hear. (90%)

Trueswell, Tanenhaus, & Kello (1993) showed that the subcategorization bias of the verb affects the parsing difficulties in sentences with the sentential complement (SC) / direct object (DO) ambiguity. In this ambiguity, the noun phrase after the verb can be interpreted either as the direct object of the verb or as the subject of a sentential complement. In example (7), the student is a direct object of the verb accept, while in example (8), the student is the subject of the sentential complement the student wrote the paper.

(7) The teacher accepted the student.

(8) The teacher accepted the student wrote the paper.

If the verb is more frequently used with a direct object (i.e. has a DO bias), then parsing is more difficult in the region after the words the student than it is if the verb is more frequently used with a sentential complement (i.e. has an SC bias). In order to determine the subcategorization bias of different verbs, Trueswell et al. (1993) used a sentence completion task. Subjects were given the initial portion of a sentence, such as John insisted, and asked to complete the sentence. The subjects’ completions were categorized as to whether they had written a direct object, a sentential complement, or some other use. In separate experiments relying on naming latency, self-paced reading times, and eye tracking, they found that subjects had difficulties in sentences with the SC/DO ambiguity when the verbs had a DO bias, but not when the verbs had an SC bias.
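The bias computation implied by such a sentence completion task can be sketched as follows. This is a minimal illustration in Python; the completion counts are invented for the example, not taken from Trueswell et al. (1993):

```python
# Hypothetical completion counts for two verbs. Each subject completion
# is coded as a direct object (DO), a sentential complement (SC), or
# some other use. The numbers here are illustrative only.
counts = {
    "accepted": {"DO": 38, "SC": 5, "other": 7},
    "insisted": {"DO": 2, "SC": 41, "other": 7},
}

def subcat_bias(c):
    """Classify a verb as DO-bias or SC-bias from raw completion counts."""
    total = sum(c.values())
    p_do = c["DO"] / total   # proportion of DO completions
    p_sc = c["SC"] / total   # proportion of SC completions
    bias = "DO" if p_do > p_sc else "SC"
    return bias, p_do, p_sc

for verb, c in counts.items():
    bias, p_do, p_sc = subcat_bias(c)
    print(f"{verb}: {bias}-bias (DO={p_do:.2f}, SC={p_sc:.2f})")
```

Studies differ in the exact threshold used to call a verb biased (some require, say, a 2:1 ratio between the two readings rather than a bare majority); the sketch above uses the simplest majority rule.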

Garnsey, Pearlmutter, Myers, & Lotocky (1997) showed that both verb bias and the plausibility of the noun phrase following the verb as a direct object played a role in parsing in the same direct object / sentential complement ambiguity investigated in Trueswell et al. (1993). This confirmed the results of Trueswell et al. (1993), and separated out the effects caused by the plausibility of the noun phrase. As part of this project, Garnsey et al. (1997) performed a much larger sentence completion norming study on a superset of the verbs normed by Trueswell et al. (1993). Garnsey et al. (1997) selected candidate verbs based on the Connine et al. (1984) sentence production study.

MacDonald (1994) proposed that the difficulties in resolving syntactic ambiguities, such as in garden path sentences, were influenced by probabilistic constraints including “the frequencies of the alternative argument structures of ambiguous verbs.” This claim was supported by showing that the interpretation of reduced relative constructions was related to the degree to which the verb was used intransitively. Reduced relatives formed with highly transitive verbs such as interview, as in (9), are less difficult to process than reduced relatives formed with verbs that are frequently used intransitively, such as race, as in (10).

(9) The homeless people interviewed in the film are exceptionally calm … (MacDonald (1994), taken from Maslin (1991))

(10) # The horse raced past the barn fell. (MacDonald (1994), originally from Bever (1970))

(The # symbol is used to indicate anomalous examples or garden path examples.)

Jennings, Randall, & Taylor (1997) demonstrated graded effects of verb subcategorization preferences on sentence parsing. They found the degree of bias for 93 verbs that could take both a direct object and a sentential complement completion using a sentence completion task where subjects had to complete sentences consisting of a determiner, an adjective, an animate noun, and a past tense verb, such as “The old man observed _________.” They used 12 SC bias and 16 DO bias verbs in a cross-modal priming experiment where subjects heard the sentence up to the verb, and then had to name the visually presented prompt of either they, suggesting an SC completion, or them, indicating a DO completion. The naming latency was related to the degree of preference or dispreference of the prompt as a possible continuation.

The papers discussed in this section have illustrated evidence for the role of verb subcategorization frequencies in human sentence processing. It is important to note that all of these papers implicitly assume that a single set of subcategorization probabilities can be defined for each verb. In general, these probabilities in the mental lexicon are assumed to be acquired through exposure to language use.

1.3 The problem with verb subcategorization frequencies

The previous section shows that verb subcategorization frequencies play an important role in human language processing. However, studies such as Merlo (1994), Gibson, Schuetze, & Salomon (1996), and Gibson & Schuetze (1999) have found differences between syntactic and subcategorization frequencies computed from corpora and those computed from psychological experiments. Additionally, Biber and colleagues (Biber, Conrad, & Reppen 1998, Biber 1993, Biber 1988) have found that corpora differ in a wide variety of phenomena, including the use of various syntactic structures. This presents two problems for the psycholinguistic community. On one hand, one must answer the practical question of which verb subcategorization frequencies are the most appropriate ones to use for norming experiments. On the other hand, if processing relies on frequencies, and these frequencies are learned through exposure to language use, which frequencies are actually represented in the lexicon? Norming studies and corpora such as the Brown Corpus have different verb subcategorization frequencies, yet frequencies from both are commonly used to represent language use.

Merlo (1994) compared subcategorization frequency data for a set of verbs taken from psycholinguistic norming studies with corpus subcategorization frequencies for the same verbs. The norming data considered in Merlo (1994) was taken from four separate studies: a sentence production study (Connine et al. 1984) and three sentence completion studies (Garnsey et al. 1997, Holmes, Stowe, & Cupples 1989, and Trueswell et al. 1993). In a sentence production study, subjects are asked to write a sentence using a given verb, while in a sentence completion study, subjects are asked to complete a sentence based on a provided partial sentence, typically a grammatical subject followed by the verb. The corpus data used in her study was taken from the Penn Treebank (Marcus, Santorini, & Marcinkiewicz 1993), and consisted of a combination of Wall Street Journal, MARI radio broadcast transcriptions, and DARPA Air Travel Information System training data where subjects requested flight scheduling information from a reservation system. The comparisons between the corpus data and the Connine data were based on a set of five possible subcategorizations (NP, PP, S, SBAR, SBAR0) and an “other” category, while the comparisons between the corpus data and the other norming studies were based on the categories of NP, SC, and Other.

Table 1 shows the correlation values for the comparisons of the various data sets performed in Merlo (1994). The first column shows the corpora being compared. The second column shows the correlation between the frequencies of the NP subcategorization in each of the two corpora for each of the N verbs available in both corpora. This can be visualized by imagining a graph where each of the verbs is represented by a point in a two dimensional space where the X axis represents the frequency of the NP subcategorization in one corpus, and the Y axis represents the frequency of the NP subcategorization in the other corpus. If all of the verbs have the same NP frequency in both corpora, then they would all be located on the line Y=X, and the correlation would be 1. For example, there are 36 verbs for which data is available in both the Trueswell and Merlo data sets, and the correlation between the frequency of the NP subcategorization for each of these verbs in the two data sets is .739. The third column shows the correlations between the frequencies of the SC subcategorizations.

Comparison [a]             NP                                      SC [b]
Trueswell vs. Garnsey [c]  r = .935                                r = .916
Trueswell vs. Merlo        r = .739, F(1,36) = 43.36, p < .0001    r = .444, F(1,36) = 8.848, p = .0052
Holmes vs. Merlo           F(1,21) = 10.883, p = .0036             F(1,21) = 15.990, p = .0007
Garnsey vs. Merlo          r = .727, F(1,48) = 52.723, p < .0001   r = .585, F(1,48) = 24.503, p < .0001
Connine vs. Merlo (<50)    r = .598, F(1,24) = 38.326, p < .0001   r = .751, F(1,24) = 55.258, p < .0001

Table 1: Correlation values from comparisons in Merlo (1994).

[a] Merlo = corpus numbers in (Merlo 1994), Garnsey = Garnsey et al. (1997), Holmes = Holmes et al. (1989), and Trueswell = Trueswell et al. (1993)

[b] SC is Merlo’s category “Clause” for all comparisons except the Connine comparison, in which case it is “S” (not “SBAR”)

[c] Results for this comparison taken from Trueswell et al. (1993), and also appear in Merlo (1994).

Merlo concluded that the norming study data was “not strongly correlated with the corpus counts”, but that it was “appropriate to keep using corpus counts when needed and to continue exploring the possible sources of difference” because corpus probabilities do correlate with experimental evidence in some cases. The important point of this data is that the comparisons between the corpus data and the different norming study data sets have much lower correlations than the comparison between the norming studies done by Trueswell et al. (1993) and Garnsey et al. (1997). This shows that the results produced by two norming studies relying on similar protocols are much more like each other than they are like corpus data.

One must be careful in drawing conclusions about the degree of difference between various corpora based on such data, however. This is because each of the comparisons shown in Table 1 is based on the data for a different set of verbs (note that the values of N range between 21 and 48). One of the conclusions that will be drawn from the data presented in this dissertation is that the degree of subcategorization variability between corpora is related to the choice of which verbs are included in the frequency counts. Thus, if one wants to directly compare the relative degree of subcategorization frequency differences between a series of corpora, one should use the same set of verbs for each case (a difficult objective for both the Merlo paper and this dissertation, in that both rely on fixed data sets from norming studies conducted by other authors).
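The kind of verb-by-verb correlation reported in Table 1 can be sketched as follows. This is an illustrative Python sketch; the verbs and frequencies are invented, not Merlo's data, and it computes only Pearson's r (not the accompanying F statistics):

```python
import math

# Hypothetical NP-subcategorization frequencies for the same verbs in two
# corpora. Each verb is a point (x, y); r = 1 would mean every verb lies
# on the line Y = X up to a linear rescaling.
corpus_a = {"accept": 0.72, "read": 0.65, "walk": 0.20, "insist": 0.05}
corpus_b = {"accept": 0.60, "read": 0.70, "walk": 0.35, "insist": 0.10}

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length frequency vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Use the same verb set for both corpora, as the text recommends.
verbs = sorted(corpus_a)
r = pearson_r([corpus_a[v] for v in verbs], [corpus_b[v] for v in verbs])
print(f"r = {r:.3f}")
```

Because r is computed only over the verbs shared by both data sets, comparisons based on different verb subsets (as in Table 1, where N varies from 21 to 48) are not directly comparable to one another.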

Other studies have found differences between corpus probabilities and experimental data. Gibson et al. (1996) and Gibson & Schuetze (1999) compared corpus frequencies and experimental data, although they do not address the issue of verb subcategorization probabilities directly. Gibson et al. (1996) found the frequencies of three different structures in the Penn Treebank Brown Corpus and Wall Street Journal Corpus. These structures corresponded to high, middle, and low attachment of an NP with three possible attachment sites. This ambiguity is shown in Figure 1, while examples of these structures are shown in Table 2.
