A Preliminary Inquiry Into Using Corpus Word Frequency Data in the Automatic Generation of English Language Cloze Tests

David Coniam
The Chinese University of Hong Kong

ABSTRACT

This paper outlines how a multiple choice vocabulary cloze test can be produced from a text. The process described involves assigning word class tags to the text and then retrieving word frequencies for the words in the text from an analyzed corpus. The system allows for the creation of three types of test: one based on the "nth-word deletion" principle, one based on user-specified frequency ranges, and one based on a particular word class. After the user's selection, the word class and word frequency of each test item key are matched with similar word class and word frequency options to construct the test items. Analysis of tests produced by the system and administered to students indicates the potential of the computer aided test system, although the three test production modes are not equally successful in their production of "acceptable" test items, with the nth-word deletion mode producing considerably fewer acceptable items than the two language oriented test production modes of specified word frequency ranges and particular word classes. The paper concludes with a discussion of the extent to which good test material can realistically be produced by computer aided systems and the different computer tools which may be of use in the process.
KEY WORDS

Corpus, word frequency, language testing, word class tagging, computer test production
INTRODUCTION
This project investigates the extent to which it is possible to produce tests (and preferably "good" tests) using currently available computer technology. The paper describes the automatic production of English language multiple choice vocabulary cloze tests by accessing word frequency data from a large corpus. One of the impetuses for the project has been involvement with secondary school teachers in Hong Kong who need to design and administer tests on a regular basis. The tests which secondary school teachers produce are often of a quality which provides teachers with far from reliable information on their students. The system described in this paper is therefore a tool to aid, or to ease the burden of, inexperienced test-producing teachers.

If tests are to provide accurate information on students, they need to be pretested and refined, with poor items amended or deleted after pretesting. It is difficult to specify exactly what constitutes a good test writer; however, personal experience over several years of setting public examinations suggests that the attrition rate between pretesting and the final product is often in the region of one third of the items initially produced. Given this situation, an acceptable-item rate for a computer aided system should be set somewhat lower than the 66% returned by a competent human setter; an initial target of 50% would appear reasonable. It may well be, however, that a system need not be capable of producing only "good" test items. Since even experienced test setters do not produce perfect tests, a system which produces the first draft of a test (which could then be moderated and amended) would in itself be a useful facility for teachers.

Apart from the issue of the extent to which a computer aided testing system can produce "good" tests, the relationship between students' proficiency in English and the size of their vocabulary is also a point at issue and is addressed briefly below.
LANGUAGE PROFICIENCY, VOCABULARY, AND WORD FREQUENCY
Spolsky's (1985) definitions of proficiency in terms of knowledge and use (Structural, Functional, and General Proficiency) are a useful starting point for placing proficiency into perspective. The type of proficiency tapped by the current system falls into Spolsky's third category, General Proficiency. With regard to vocabulary, corpus data analysis has been causing a significant refocusing of views concerning the nature of the English language, with some researchers arguing that descriptions of English need to be considered as much, if not more, in lexical terms than in purely grammatical terms (see, for example, Willis 1990; Sinclair 1991; and Lewis 1993).

In addition to studies into the role of lexis, fruitful research has been conducted into the relationship between word frequency and language proficiency. Harlech-Jones (1983), in a small-scale study with ESL teacher-trainees in South Africa, compared word frequency counts and language proficiency levels. His results indicate that it is indeed the case that the more "unusual" ("infrequent" in corpus terms) the words which students have control over, the higher their expected level of proficiency. Meara and Jones (1988), in a rather larger study (250 ESL students), investigated the extent to which a larger vocabulary indicates a greater degree of proficiency. They concluded that there is a strong relationship between the size of students' vocabulary and their level of proficiency, with the results of their study suggesting that vocabulary size can be a useful tool for placement testing.

One of the most extensive studies is Laufer and Nation's Lexical Frequency Profile (1995), which illustrates the importance of word frequency as a factor in language proficiency. They argue that ESL students' vocabulary size is, to some degree, a reflection of their productive use of the language, indicating how lexis can be assessed independently of grammar. In their study, they examined essays written by 22 students of differing levels of proficiency in English and compared the word frequencies in the essays against three word list blocks. Significant correlations emerged between vocabulary levels in two essays by the same writer and proficiency level as determined by a placement test. Coniam (in preparation) also investigated word frequency, examining the occurrences of different word tokens and word types in 18-year-old ESL students' examination scripts against a frequency word list of all words in English. With word tokens, no significant correlations emerged between students' use of the less frequent words and the grades awarded to their writing. With word types, however, significant correlations emerged between students' use of the less frequent words and grade. Given that empirical evidence supports the relationship between word frequency and proficiency levels, the concept of attempting to use a word frequency list to produce vocabulary tests may therefore be less controversial than it appears.
PROGRAM OPERATION

The setup of the current computer system allows tests to be constructed in three different ways:

· by selecting every nth word in the text to be a test item;
· by specifying word frequency ranges for the test (e.g., only high-frequency or only low-frequency words); and
· by specifying a certain word class (e.g., noun, verb, or adjective).
This paper will limit itself to describing the first of these modes (nth-word deletion) since the procedure for all three test construction modes is similar, and this type of cloze test is the type most familiar to teachers. The manner in which a test is produced by the automatic test generating system is as follows.
1. Take a plain text file

There are no restrictions on the type of text to be used, except that text length should not exceed 1,000 words, a computer memory limitation stemming from the system having been developed under the MS-DOS operating system. In terms of the production of vocabulary cloze tests, however, this limit is more than adequate because cloze passages with 30 test items, with a deletion occurring, say, every eighth word, are generally in the region of 300-400 words in length.
2. Pass the text file to a word class tagger

The word class tagger that has been used in the current project is the Automatic Grammatical Tagging System of Shanghai Jiaotong University's Institute of Natural Language Processing (1988) (henceforth AGTS). The output of the tagger consists of each word with its corresponding word class tag on the line beside it. A sample of the tagged output of a sentence taken from the text-to-test in the Appendix at the end of this article is presented in table 1.
Table 1
Sample Word Class Tagging

The Australian economy has really been going down for several years and it is no longer competitive on the world market.

Word          Tag     Word          Tag
The           ATI     and           CC
Australian    JNP     it            PP3
economy       NN      is            BEZ
has           HVZ     no            RB
really        RB      longer        RB
been          BEN     competitive   JJ
going         VBG     on            IN
down          RI      the           ATI
for           IN      world         NN
several       AP      market        NN
years         NNS

Legend: AP = postdeterminer, ATI = article, BEN = "been", BEZ = "is", CC = coordinating conjunction, HVZ = "has", IN = preposition, JJ = adjective, JNP = word initial capital adjective, NN = singular noun, NNS = plural noun, PP3 = 3rd person pronoun, RB = adverb, RI = adverb/preposition, VBD = past tense verb, VBG = -ing form of verb.
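Although the paper gives no implementation detail, the two-column output described above is straightforward to consume programmatically. A minimal Python sketch (the file layout and function name are assumptions for illustration, not the author's code) follows:

```python
# A minimal sketch (not the author's code) of consuming tagger output
# of the kind described above: one word and its tag per line.

def read_tagged_text(path):
    """Return the tagged text as a list of (word, tag) tuples."""
    tagged = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:              # skip blank or malformed lines
                tagged.append((parts[0], parts[1]))
    return tagged
```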
Note: For a full list of AGTS word class tags, the reader is referred to the set of word class tags for the Constituent-Likelihood Automatic Word Tagging System (CLAWS) described by Sampson (1987, 165). The AGTS tagset is very similar to the CLAWS tagset.

The AGTS's claimed accuracy rate is 95%, which is comparable to that of many other taggers, e.g., the Constituent-Likelihood Automatic Word Tagging System (Garside 1987) and the English Constraint Grammar Parser (Karlsson et al. 1995). The 5% error rate is broadly acceptable. (It should not be forgotten that human test setters also produce bad items.) However, the AGTS's mistagging does result in some "bad" items being produced. This factor can be improved upon and is discussed later.

3. Set the starting word/test item spacing settings

The "classic" cloze procedure involves setting every nth (e.g., sixth or eighth) word as a test item, often with the first sentence left intact to give the student some initial context. (See Madsen [1983, 48-50] and Hughes [1989, 62-66] for overviews of some of the principles behind nth-deletion cloze tests.) Users are therefore prompted to select the word at which they want the test to begin (which allows them to leave the first sentence untouched) and the length of the gap they want between items.

4. Obtain the word class for each word around which a test item is to be constructed

The AGTS's word class tagset consists of 114 word classes. Only a small number of word classes can be used, however, to construct test items. For example, some word classes have a very limited membership that makes them unsuitable for forming test items. Selecting the word class article does not work because the multiple-choice alternatives would always be the same in that they would draw on the same limited set. Similarly, the word class number does not work because, with the exception of numbers such as hundred, thousand, and million, numbers are generally interchangeable as test items. There are also a number of word classes in the AGTS which consist of unique words, such as "DOZ" ("does") and "CD1" ("one"), and which cannot therefore be used to form test items. (A sketch of this selection logic follows table 2.)

5. Obtain the word frequency for each word to be a test item

Having set the nth-word deletion option, the program calls up a word list. The word list comes from the tagged 211-million-word Bank of English (BoE) corpus, which comprises, in total, some 600,000 different word forms. To give an indication of relative word frequency, table 2 lays out the BoE in terms of frequency "bands."
Table 2
Word Frequencies in the Bank of English

Most frequent words    % of text accounted for in English
157                    50%
394                    60%
915                    70%
2,145                  80%
6,358                  90%
15,292                 95%
32,400                 97.5%
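Returning to the selection procedure of steps 3 and 4: the logic amounts to a simple loop over the tagged text, taking every nth word from a chosen starting point and skipping word classes that cannot form items. A minimal Python sketch, with an illustrative (not exhaustive) exclusion set, might be:

```python
# Minimal sketch of steps 3 and 4: nth-word selection with word class
# filtering. The exclusion set is illustrative, not the AGTS inventory.

UNUSABLE_TAGS = {"ATI", "CD1", "DOZ"}        # articles, unique-word classes, etc.

def select_items(tagged, start, gap):
    """Return (position, word, tag) triples for candidate test items."""
    items = []
    next_slot = start
    for pos, (word, tag) in enumerate(tagged):
        if pos < next_slot:
            continue
        if tag in UNUSABLE_TAGS:             # unusable class: try the next word
            next_slot = pos + 1
            continue
        items.append((pos, word, tag))
        next_slot = pos + gap                # resume nth-word spacing
    return items
```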
It is apparent that, since the 32,400 most frequent words account for the vast majority of English text (97.5%), the list of 600,000 word forms can be drastically reduced for present purposes with little loss of usable information.1 An amended list of the 158,000 most frequent word types in English has therefore been constructed by excluding word forms with fewer than four occurrences, which has eliminated most proper nouns, word forms containing numbers, and typographical errors (such as "theamerican" for "the American"). A sample of the amended list is set out in table 3.
Table 3
The Bank of English Tagged Word List

Word        Word class tag    Frequency position    Number of occurrences
the         AT                1                     11,610,921
be          V                 2                     6,861,894
of          IN                3                     5,359,123
and         CC                4                     4,941,292
a           AT                5                     4,537,315
in          IN                6                     3,777,696
to          TO                7                     3,306,951
abashing    VBG               158,614               4
abandoned   VBD               158,615               4
aban        NN                158,616               4
abalones    NNS               158,617               4
abah        NN                158,618               4
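The pruning described above (a cutoff at four occurrences) might be sketched as follows; the input layout mirrors table 3 but is an assumption about the actual file format:

```python
# Minimal sketch of the pruning step: keep only word forms with four or
# more occurrences. Assumes one "word tag rank count" record per line,
# with plain (uncomma'd) integers.

def load_pruned_word_list(path, min_occurrences=4):
    """Return {tag: [(rank, word), ...]}, each list sorted by rank."""
    by_tag = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, tag, rank, count = line.split()
            if int(count) >= min_occurrences:
                by_tag.setdefault(tag, []).append((int(rank), word))
    for entries in by_tag.values():
        entries.sort()                       # most frequent first
    return by_tag
```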
Even when the word list has been pruned to remove word forms such as proper nouns, numbers, etc., the use of an extensive corpus word list is not without its own problems. The fact that the BoE has been constructed from authentic language inevitably means that the word list contains elements which, from a testing point of view, may lead to the creation of poor test items. School tests do not usually include items of language such as colloquialisms ("wimp," "guvnor"), abbreviations ("bbc," "agm"), or swear words ("shit," "fuck").

Once the spacing between test items has been established, word frequencies are retrieved for each word at position n in the text. Table 4 sets out a sample of a test produced, with word classes and word frequencies indicated for the words designated as test items.

A University of Wollongong researcher, Ms. Robyn Iredale, commented that a __(2)__ of the hiring practices of 55 companies also said "there was no __(3)__ putting a small Asian in a __(4)__ of authority over taller Australians." She said: "They said __(5)__ workers would not like having Asians __(6)__ because they work too hard."
Table 4
Word Classes and Word Frequencies in Test Items

Item no.    Word (test key)    Word class tag    Frequency
2           survey             noun              1,715
3           point              noun              299
4           position           noun              632
5           other              determiner        80
6           around             preposition       201
As mentioned above, with the AGTS's accuracy being around 95%, a degree of error has to be expected. As can be seen in table 4, item (5) "other" has been tagged as a determiner, while item (6) "around" has been tagged as a preposition. In cases in which the computer analysis is deficient, the possibility of poor test item creation obviously exists. This issue will be discussed later.
6. Construct the test items

Having established the frequency and word class of each test item, the program now calls up the tagged BoE word list again and matches the word class tag and word frequency of potential alternatives with those of the item key. The test items for items (2) and (3) from table 4 emerge as in table 5.
Table 5
Two Sample Items

Item (2)
Option               Frequency
A. driver            1,716
B. distance          1,717
C. survey [key]      1,715
D. dream             1,719
E. tree              1,724

Item (3)
Option               Frequency
A. war               210
B. course            222
C. point [key]       299
D. lot               231
E. thing             234
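The matching principle visible in tables 4 and 5 (distractors share the key's word class tag and have the nearest frequency positions) can be sketched as below; `by_tag` is the index from the pruning sketch above, and the function illustrates the principle rather than the author's implementation:

```python
# Minimal sketch of distractor selection: same word class tag as the key,
# nearest frequency ranks first.

def build_item(key_word, key_tag, key_rank, by_tag, n_options=5):
    """Return the key plus (n_options - 1) frequency-matched distractors."""
    candidates = [(abs(rank - key_rank), word)
                  for rank, word in by_tag.get(key_tag, [])
                  if word != key_word]
    candidates.sort()                        # closest frequency ranks first
    distractors = [word for _, word in candidates[:n_options - 1]]
    return [key_word] + distractors
```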
As the program constructs test items, it needs to maintain parallelism among the items' alternatives (a sketch of the article adjustment appears below):

· If the key is capital-initial, all alternatives in the item become capital-initial.
· Any mismatch between any of the alternatives in terms of being vowel-initial with respect to the preceding word in the body of the text is examined and adjusted; for example, "a orange" becomes "an orange" and "an banana" becomes "a banana."
· If a test item has a similar frequency (i.e., within 100 word frequency steps) to that of a previous test item, the frequency of the most recent item is "adjusted"; that is, its frequency is incremented slightly in order to avoid items with the same alternatives being constructed.

7. Write output logfile

Once the test is complete, a logfile is written to the computer's hard disk which incorporates the original text, the constructed test with blanks and test items, a summary of the answer key, and the word class and word frequency of each test item. On a PC operating at 133 MHz, the process of word class tagging, word frequency analysis, and test construction takes approximately two minutes. (See the sample in the Appendix.)
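By way of illustration, the capitalization and article adjustments above might look like the following; the vowel-letter test is a rough stand-in for the real rule and would need exceptions ("an hour," "a university"), and all names here are assumed:

```python
# Minimal sketch of two of the parallelism adjustments described above.

def adjust_option(option, key, preceding_word):
    """Match the key's capitalization and fix "a"/"an" before the blank."""
    if key[:1].isupper():                    # capital-initial key
        option = option.capitalize()
    starts_with_vowel = option[:1].lower() in "aeiou"
    if preceding_word.lower() == "a" and starts_with_vowel:
        preceding_word = "an"                # "a orange" -> "an orange"
    elif preceding_word.lower() == "an" and not starts_with_vowel:
        preceding_word = "a"                 # "an banana" -> "a banana"
    return preceding_word, option
```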
AN EXAMINATION OF TESTS PRODUCED BY THE SYSTEM
A number of cloze tests have been produced by the system, utilizing the three different test-producing modes. Of these tests, two from each of the three test modes have been administered to two Grade 12 classes (approximately 60 students) of average ability Hong Kong secondary school ESL students.2 This section examines the results of the cloze tests from the perspective of acceptable test items.

For the purposes of the current discussion, "acceptable" and "unacceptable" items are defined in terms of facility index and discrimination index. An acceptable item has a facility index of between 30% and 80% and a discrimination index greater than 0.2. An unacceptable item has a facility index lower than 30% (very difficult) or greater than 80% (very easy), or a discrimination index lower than 0.2. Table 6 presents the results for the two cloze tests produced by each of the three test modes.
Table 6
Acceptable Items Produced in Trial Tests

                               Cloze 1                     Cloze 2
                               No. of items  Acc. items    No. of items  Acc. items
Nth-word based test items      33            15 (45%)      26            10 (38%)
Frequency based test items     19            13 (68%)      27            12 (44%)
Word class based test items    22            11 (50%)      48            28 (58%)
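The acceptability criteria are simple to compute once item-level responses are available. In the sketch below, the discrimination index is taken as a high-group/low-group difference, one common convention; the paper does not specify which formula was used:

```python
# Minimal sketch of the acceptability criteria defined above. `scores`
# holds 1 (correct) or 0 (incorrect) per student, pre-ranked by total
# test score (an assumption about the ranking).

def facility(scores):
    return 100.0 * sum(scores) / len(scores)

def discrimination(scores):
    k = max(1, len(scores) // 3)             # size of high and low groups
    return sum(scores[:k]) / k - sum(scores[-k:]) / k

def acceptable(scores):
    fac = facility(scores)
    return 30.0 <= fac <= 80.0 and discrimination(scores) > 0.2
```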
As can be seen in table 6, test results vary, with the target of 50% acceptable items being achieved on only three of the six cloze tests. Although only a few tests have been field-tested, the nth-word-deletion mode appears, on initial results, to be the least successful. Hughes (1989, 66) comments on the generally unsatisfactory nature of nth-deletion cloze tests and on how it is preferable to target test items at specific testing points. This advice is confirmed by the results from the current system, in which tests constructed around specific word frequency ranges or certain word classes have produced better results than tests in which a deletion has been specified for every nth word. Table 2 presented the BoE in terms of frequency bands. An analysis of the acceptable and unacceptable items against these frequency bands for the three modes of test production does not, however, show an even distribution, as table 7 illustrates.
Table 7
Good and Bad Items and Word Frequency Levels

Nth-word based test items
Frequency     Cloze 1 Acc. items    Cloze 1 Unacc. items    Cloze 2 Acc. items    Cloze 2 Unacc. items
<80%          12                    17                      5                     14
80%-90%       1                     5                       …                     2
90%-95%       1                     …                       …                     -
95%-97.5%     …                     …                       …                     -
>97.5%        …                     …                       …                     -
Total         15 (45%)              18                      10 (38%)              16

Frequency based test items
Frequency     Cloze 1 Acc. items    Cloze 1 Unacc. items    Cloze 2 Acc. items    Cloze 2 Unacc. items
<80%          -                     -                       1                     6
80%-90%       8                     2                       7                     8
90%-95%       3                     4                       3                     1
95%-97.5%     1                     -                       1                     -
>97.5%        1                     -                       -                     -
Total         13 (68%)              6                       12 (44%)              15

Word class based test items
Frequency     Cloze 1 Acc. items    Cloze 1 Unacc. items    Cloze 2 Acc. items    Cloze 2 Unacc. items
<80%          5                     7                       14                    12
80%-90%       1                     2                       4                     1
90%-95%       3                     1                       3                     4
95%-97.5%     -                     -                       3                     2
>97.5%        2                     1                       4                     1
Total         11 (50%)              11                      28 (58%)              20
The picture that emerges from table 7 is that test items constructed from the very frequent words appear to result in the highest number of unacceptable items. This result is perhaps not surprising since many more of the common words belong to more than one word class than do the less common words. (Of the most frequent 2,500 words [i.e., the top 80%], 13.4% belong to more than one word class, whereas from 2,501 to 20,000, only 7.4% of the words belong to more than one word class [Coniam 1995, 318].) It is also the case that, with regard to single word class membership, more frequent words tend to have a greater number of different senses than the less common words. Since, as table 2 showed, a small number of very frequent words accounts for the bulk of any running text, it is a fact of language that the nth-word-deletion mode will result in many high-frequency words appearing as test items.
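Multiple-class-membership figures of the kind cited from Coniam (1995) can be computed directly from a tagged word list; a sketch (with an assumed list layout) follows:

```python
# Minimal sketch of the multiple-word-class calculation: the percentage
# of word forms within a frequency-rank band carrying more than one
# word class tag. The (word, tag, rank) layout is an assumption.

from collections import defaultdict

def percent_multiclass(word_list, lo_rank, hi_rank):
    tags_for = defaultdict(set)
    for word, tag, rank in word_list:
        if lo_rank <= rank <= hi_rank:
            tags_for[word.lower()].add(tag)
    if not tags_for:
        return 0.0
    multi = sum(1 for tags in tags_for.values() if len(tags) > 1)
    return 100.0 * multi / len(tags_for)
```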
CONCLUSION
This paper has described a preliminary investigation into automatic test production, outlining how a test might be produced from a text, but also examining the extent to which the process produces usable output from the point of view of an end user, i.e., an ESL teacher. The paper has described three ways in which multiple choice vocabulary tests may be constructed, concluding that the "classic" nth-word-deletion manner of producing test items produces much less usable output than the modes in which the user specifies either a word class or a range of word frequencies. The fact that better tests emerge from the two language oriented test production modes than from the specifying of an arbitrary number does vindicate, to some extent, the computer tools used in the project.

While the relationship between word frequency and proficiency was justified earlier in the paper and does appear to have, as an operational principle, a certain amount of validity, a valid criticism of the system described is that the word frequency approach is still a "blunt" one in that meaning (word sense, for example) plays no part in the way test items are produced. While the program can differentiate between different word classes for a particular word (e.g., "light" as adverb, noun, adjective), it does not draw distinctions for a word with different senses in the same word class (e.g., "tie" = equal score, article of clothing). One direction which is currently being explored involves allying word frequency with a thesaurus (such as the machine-readable Roget's thesaurus from Project Gutenberg 1995), so that word frequency could be combined with an element of meaning.

The computer tools used in this project require some final discussion. Some poor quality output can be attributed to the fact that the AGTS's 5% error rate, although low, does lead to poor item creation. A more accurate tagger would probably be an aid to better quality item production; for example, Karlsson et al. (1995) quote figures of 96% or higher for the word class tagging achieved by the English Constraint Grammar Parser.3 Poor items produced because of problems with the BoE have also been mentioned, although, as has been outlined, part of the problem lies in the fact that some of the authentic language which the BoE contains (abbreviations, colloquialisms, swear words, etc.) would generally not be expected to appear in test items. It would of course be possible to edit these expressions out. Editing a list of some 158,000 words is, however, quite a formidable task, and one which is arguably not cost effective, given that the rate of problematic cases is less than 1%.4

One possibility involves using a parser, which would render a rather more detailed (albeit still syntactic) analysis. Multi-word lexical items could be identified; attributive or predicative adjectives within an adjectival group could be differentiated, for example. In general, however, the accuracy rate of parsers on authentic text is very low. (See Black [1993, 5] for a discussion of the "dismal state of the art of parsing in English" over the period 1990-1992, where the accuracy rates of a number of "major parsers" ranged from 16% to 41%.) It could therefore be argued that the more sophisticated methods of computer analysis and parsing are perhaps not as reliable in terms of their output as that achieved by marrying a word list's "predigested" form with a tool such as a word class tagger.

Computer tools are available which could have an impact, however. Sinclair (1992), for example, proposed a number of specific lexical and lexico-grammatical "partial parsers" which would analyze corpora for major syntactic or semantic patterns. Amongst these are
· a collocator, to determine the collocational patterns of a given word;
· a disambiguator, to determine the likely meaning of words through examining their collocational patterns; and
· a setter, to build up lexical sets through the study of collocational consistencies among words.
Output from such tools, which would present corpora in "predigested" form that could be accessed in its entirety like a word list, will indeed have more potential than the simple word frequency approach utilized by the current system.

In terms of the quality of the test output, the current system, although not complete "garbage-in-garbage-out," cannot be said to be an indication of total success. The benchmark for acceptable test items was proposed at the start of the paper as 50%, considerably less than for an experienced human setter. Of the three test production modes (selecting every nth word, selecting particular word frequency ranges, and selecting particular word classes), the nth-word mode has shown the least promise. The latter two, with their clearer linguistic focuses, show greater potential.

To conclude, word frequency can be a starting point for teachers to sample their students' proficiency by means of a computer test generator. The current study has demonstrated a first step towards how a computer system can be used to produce vocabulary cloze tests, with around 50% of the items produced on word frequency based tests and word class based tests determined acceptable. This figure is not high enough for the system to be used in an unsupervised manner to generate achievement tests (such as end-of-year tests, in which the percentage of good items produced needs to be high). However, the system has considerable potential with respect to the production of first drafts which can then be moderated and revised. Further, from personal experience, any test production requires that a number of ideas be initially explored, with some first draft tests having to be discarded completely. Just as a human setter needs to invest time in the initial exploration of suitable material, a number of texts may need to be run through the system before an acceptable-looking test emerges; given that the text-to-test process takes only two minutes, this is not an onerous requirement.

One obvious criticism of the availability of computer generated tests is that easy access to an automatic test generator may result in teachers foisting more tests on their students than would otherwise be the case. An abundance of testing material is, however, already available to teachers (much of which, admittedly, consists of poorly designed test items). The design of a good test by a computer will be influenced in part by the teacher's choice of a good authentic text; that is, a text in which the vocabulary is appropriate to the level of a particular group of students. A teacher using a computer aided testing system should therefore already have made a number of important pedagogical decisions.
APPENDIX
Test Produced Using Nth-Word Deletion Mode
Original text

Australians "Fear Hong Kong Workaholics"
Australian bosses disliked hiring Hong Kong employees because they worked too hard and made their Australian colleagues uneasy, according to a recent study in Australia. A University of Wollongong researcher, Ms. Robyn Iredale, commented that a survey of the hiring practices of 55 companies also said "there was no point putting a small Asian in a position of authority over taller Australians." She said: "They said other workers would not like having Asians around because they work too hard. They put their heads down to work and show the others off badly. Australians feel uncomfortable with them." Legislator Dr. Huang Chen-ya, who holds an Australian passport, said the survey findings were "a very sad story." Dr. Huang said: "The Australian economy has really been going down for several years and it is no longer competitive on the world market. One of the key factors is in productivity -- if Australian companies are going to reverse Darwin's rule of the survival of the fittest, and reject the most productive workers, they are committing suicide." The director of the Hong Kong General Chamber of Commerce, Mr. Ian Christie, however, said the findings were a "marvellous, although back-handed, praise for Hong Kong." He said Hong Kong migrants' hard work was "exactly why Hong Kong is where it is today and why Australia is where it is today." Mr. Christie said: "I wish the UK also had woken up to the marvellous opportunities they have bypassed by not getting enough Hong Kong migrants through the British Nationality Scheme." The chamber's chief economist, Mr. Ian Perkin, agreed that Hong Kong migrants were hard-working but disagreed that this made Australians uncomfortable. Hong Kong overtook Britain last year to become the prime source of immigrants to Australia, with about 13,540 Hong Kong people migrating there in the year ended June 30, 1991. Australian bosses showed a preference for American, British, Canadian and New Zealand migrants ahead of other nationalities. Companies which nominated particular nationalities said Hong Kong Chinese were too hard-working, South Africans too aggressive, Indians too bureaucratic, while Poles, Turks and French too "hot-headed." There was also a bias against recognising qualifications from anything but English-speaking countries. Ms. Iredale said outdated Australian attitudes led to an "entrenched preference" which was "covertly discriminatory." She said she was most surprised that only one of the 55 companies could see a benefit from hiring foreign nationals. The one that did, an engineering firm, said hiring Asians would help it win contracts in the region. Source: South China Morning Post, 7-10-1992
Constructed test: 23 test items

Australians "Fear Hong Kong Workaholics"

Australian bosses disliked hiring Hong Kong employees because they worked too hard and made their Australian colleagues uneasy, __(1)__ to a recent study in Australia. A University of Wollongong researcher, Ms. Robyn Iredale, commented that a __(2)__ of the hiring practices of 55 companies also said "there was no __(3)__ putting a small Asian in a __(4)__ of authority over taller Australians." She said: "They said __(5)__ workers would not like having Asians __(6)__ because they work too hard. They put their heads down to work and show the others off badly. Australians feel uncomfortable with them." Legislator Dr. Huang Chen-ya, who holds an Australian passport, said the survey findings were "a very sad story." Dr. Huang said: "The Australian economy has really been going __(7)__ for several years and it is no longer competitive on the world market. One of the key factors is in productivity -- if Australian companies are __(8)__ to reverse Darwin's rule of the __(9)__ of the fittest, and reject the most productive workers, they are __(10)__ suicide." The director of the Hong Kong General Chamber of Commerce, Mr. Ian Christie, however, __(11)__ the findings were a "marvellous, although back-handed, praise for Hong Kong." He said Hong Kong migrants' hard work was "exactly why Hong Kong is where it is today and why Australia is where it is __(12)__." Mr. Christie said: "I wish the UK also had __(13)__ up to the marvellous opportunities they __(14)__ bypassed by not getting enough Hong Kong migrants through the British Nationality Scheme." The chamber's chief economist, Mr. Ian Perkin, agreed that Hong Kong migrants were hard-working but disagreed that this made Australians uncomfortable. Hong Kong __(15)__ Britain last year to become the __(16)__ source of immigrants to Australia, __(17)__ about 13,540 Hong Kong people migrating __(18)__ in the year ended June 30, 1991. Australian bosses showed a __(19)__ for American, British, Canadian and New Zealand migrants ahead of other nationalities. Companies which nominated particular nationalities __(20)__ Hong Kong Chinese were too hard-working, South Africans too aggressive, Indians too bureaucratic, while Poles, Turks and French too "hot-headed." There was also a bias against recognising qualifications from anything but English-speaking countries. Ms. Iredale said outdated Australian attitudes led to an "entrenched preference" which was "covertly discriminatory." She __(21)__ she was most surprised that only __(22)__ of the 55 companies could see a benefit from hiring foreign nationals. The one that did, an engineering __(23)__, said hiring Asians would help it win contracts in the region.
__(1)__ A. certainly B. according [key] C. sometimes D. instead E. particularly
__(2)__ A. driver B. distance C. survey [key] D. dream E. tree
__(3)__ A. war B. course C. point [key] D. lot E. thing
__(4)__ A. situation B. letter C. summer D. position [key] E. stage
__(5)__ A. other [key] B. last C. good D. such E. more
__(6)__ A. around [key] B. without C. within D. among E. behind
__(7)__ A. down [key] B. still C. back D. much E. too
__(8)__ A. being B. going [key] C. making D. doing E. looking
__(9)__ A. reception B. laboratory C. survival [key] D. setting E. thinking
__(10)__ A. condemning B. flashing C. widening D. committing [key] E. boosting
__(11)__ A. had B. were C. did D. said [key] E. came
__(12)__ A. today [key] B. still C. back D. much E. too
__(13)__ A. woken [key] B. wheeled C. subsidized D. sharpened E. enlisted
__(14)__ A. know B. think C. have [key] D. don't E. see
__(15)__ A. depicted B. notched C. overtook [key] D. ticked E. shipped
__(16)__ A. prime [key] B. past C. main D. major E. short
__(17)__ A. with [key] B. from C. about D. out E. into
__(18)__ A. not B. there [key] C. more D. just E. then
__(19)__ A. bloke B. preference [key] C. implementation D. beard E. prosecutor
__(20)__ A. went B. didn't C. took D. made E. said [key]
__(21)__ A. used B. saw C. said [key] D. looked E. knew
__(22)__ A. something B. nothing C. anything D. one [key] E. someone
__(23)__ A. loss B. minute C. start D. firm [key] E. return
Summary: Item Keys, Word Class Tags, Word Frequency Distribution

Item    Key    Word          Tag    Gloss               Word Frequency
(1)     B      according     RB     adverb              546
(2)     C      survey        NN     singular noun       1,715
(3)     C      point         NN     singular noun       299
(4)     D      position      NN     singular noun       632
(5)     A      other         AP     post-determiner     80
(6)     A      around        IN     preposition         201
(7)     A      down          RI     adverb particle     120
(8)     B      going         VBG    -ing verb           144
(9)     C      survival      NN     singular noun       3,608
(10)    D      committing    VBG    -ing verb           9,722
(11)    D      said          VBD    past tense verb     47
(12)    A      today         RT     time adverb         164
(13)    A      woken         VBN    past participle     17,014
(14)    C      have          HV     verb "have"         25
(15)    C      overtook      VBD    past tense verb     23,967
(16)    A      prime         JJ     adjective           459
(17)    A      with          IN     preposition         18
(18)    B      there         RL     place adverb        37
(19)    B      preference    NN     singular noun       6,615
(20)    E      said          VBD    past tense verb     47
(21)    C      said          VBD    past tense verb     47
(22)    D      one           PN     pronoun             39
(23)    D      firm          NN     singular noun       915
NOTES
1 The calculations in table 2 concerning the number of words which account for various percent bands of the Bank of English were obtained from the analyzed 211-million-word corpus in late 1995. The definition of a "word" in the BoE is "a string of characters delimited at either end of the string by white space." Understandably, therefore, the corpus contains a lot of numbers, foreign words, misspellings, words run together, etc. For the purposes of the current project, where the definition of a word needs to be one that is standard English such as might be found in a dictionary, a cutoff point was set at word forms with four occurrences or more. This amounts to almost 204 million words. Almost half (48%) of the 600,000 words occur only once, and a substantial number of these occurrences are proper nouns.
2 Student ability was established by incorporating a number of multiple choice items (calibrated using IRT techniques against a representative sample of the Hong Kong secondary school population) in the tests along with the multiple choice cloze subtests (see Coniam 1995).

3 A number of taggers are available both commercially (e.g., the TOSCA ICE tagger for 500 Dutch guilders [e-mail [email protected]]) and free of charge (e.g., Brill's tagger [http://www.cs.jhu.edu/~brill/home.html] and the Xerox tagger [http://www.xerox.com/lexdemo/xlt-overview.html]). It is also possible to send text via e-mail to (free) tagging facilities which will return the text in a matter of hours (e.g., the Part-of-speech Tagging Service at the Corpus Research Unit, University of Birmingham, UK [http://www.clg.bham.ac.uk/tagger.html] and the Automatic Mapping Among Lexico-Grammatical Annotation Models project at the University of Leeds, UK [http://agora.leeds.ac.uk/amalgam/]).

4 Sinclair (personal communication 1995) suggests that an error rate of some 2% may be a fact of life in any analysis of language.
REFERENCES

Black, E. (1993). "Statistically-based Computer Analysis of English." In Statistically-driven Computer Grammars of English: The IBM/Lancaster Approach, edited by E. Black, R. Garside, and G. Leech, 1-16. Amsterdam: Rodopi.

Coniam, D. (1995). "Towards a Common Ability Scale for Hong Kong English Secondary School Forms." Language Testing 12, 2, 184-195.

_____. (1995). "Partial Parsing: Software for Marking Linguistic Boundaries in English Texts." Ph.D. diss., University of Birmingham.

_____. (in preparation). "Word Frequency and Language Proficiency."

Falvey, P., J. Holbrook, and D. Coniam. (1994). Assessing Students. Hong Kong: Longman.

Garside, R. (1987). "The CLAWS Word-tagging System." In The Computational Analysis of English, edited by R. Garside, G. Leech, and G. Sampson, 30-41. London: Longman.

Harlech-Jones, B. (1983). "ESL Proficiency and a Word Frequency Count." ELT Journal 37, 1, 62-70.

Hughes, A. (1989). Testing for Language Teachers. Cambridge: Cambridge University Press.

Institute of Natural Language Processing, Shanghai Jiaotong University. (1988). Automatic Grammatical Tagging System, V. 1.0. Shanghai: Institute of Natural Language Processing, Shanghai Jiaotong University.

Karlsson, F., A. Voutilainen, J. Heikkila, and A. Anttila. (1995). Constraint Grammar: A Language-independent System for Parsing Unrestricted Text. Berlin: Mouton de Gruyter.
Laufer, B., and P. Nation. (1995). "Vocabulary Size and Use: Lexical Richness in L2 Written Production." Applied Linguistics 16, 3, 307-322.

Lewis, M. (1993). The Lexical Approach. Hove: Language Teaching Publications.

Madsen, H. (1983). Techniques in Testing. New York: Oxford University Press.

Project Gutenberg. (1995). Etext of Roget's Thesaurus Number Two. Lisle, IL: Benedictine College.

Sampson, G. (1987). "Alternative Grammatical Coding Systems." In The Computational Analysis of English, edited by R. Garside, G. Leech, and G. Sampson, 165-183. London: Longman.

Sinclair, J. M., ed. (1987). "The Nature of the Evidence." In Looking Up. London: Collins.

_____. (1991). Corpus Concordance Collocation. Oxford: Oxford University Press.

_____. (1992). "Automatic Analysis of Corpora." In Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82, Stockholm, 4-8 Aug. 1991, edited by J. Svartvik, 379-397. Berlin: Mouton de Gruyter.

Spolsky, B. (1985). "What Does it Mean to Know How to Use a Language? An Essay on the Theoretical Basis of Language Testing." Language Testing 2, 180-191.

Willis, D. (1990). The Lexical Syllabus. London and Glasgow: Collins ELT.

ACKNOWLEDGMENTS

I would like to thank Cobuild of the University of Birmingham for access to the Bank of English corpus, and the Institute of Natural Language Processing of Shanghai Jiaotong University for use of the Automatic Grammatical Tagging System word class tagger.
AUTHOR'S BIODATA
David Coniam is an Associate Professor in the Faculty of Education at the Chinese University of Hong Kong, where he is a teacher educator, working with ESL teachers in Hong Kong secondary schools. His main publication and research interests are in computational linguistics, language testing and language teaching methodology.
AUTHOR'S ADDRESS
Faculty of Education
The Chinese University of Hong Kong
Sha Tin
Hong Kong
Phone: (852) 2619 6917
Fax: (852) 2818 6591
E-mail: [email protected]