Wednesday, April 9, 2008

Corpus / Corpora vs. Concordancer

I. What is a corpus?

A corpus is a collection of texts, either written texts or transcriptions of recorded speech, that has been designed and compiled according to a set of clearly defined criteria. It is also a body of linguistic data that can serve as a starting point for linguistic description or as a means of verifying hypotheses about a language.

Corpora (that's the plural of corpus) are now used by many people involved in language teaching. All of the modern learners' dictionaries are based on corpora. People who study grammar and how best to teach it use corpora to discover new grammar principles not found in older grammar books.

Corpora differ along several dimensions: native speaker vs. learner, monolingual vs. multilingual, and plain vs. annotated.

*Extra information provided:

1. BNC Spoken Corpus



II. What is a concordancer?


Corpus users rely on a type of software called a concordancer to search for a specific word or phrase. The concordancer searches the whole database and then returns all the sentences containing that word or phrase. In this way, the user can see how it has been used by many different people, and can study its meaning, grammar, and so on. (Also see Corpus Linguistics).

A concordancer is a kind of search engine designed for language study. If you enter a word, it looks through a large body of texts, called a corpus, and lists every single example of the word.
This lets you look at a word in context, see how common it is, and see the style associated with it. A concordancer is a computer-based tool that you may not have met if you learned English in more traditional ways, but it is worth spending some time experimenting with one and getting to know how to use it. In addition to giving you a clear and objective picture of language use, concordancers can help you with words that you are unsure of. You can use them to compare your own usage with that of native speakers.
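
The search a concordancer performs can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular concordancer is implemented: the function name concordance, the width parameter, and the three-sentence toy corpus are all made up for the example. It finds every whole-word match of a search term and shows it with a little context on each side, which is the usual "keyword in context" layout.

```python
import re


def concordance(texts, keyword, width=30):
    """Return one keyword-in-context line per match of `keyword` in `texts`."""
    # \b keeps "strong" from matching inside "strongly"; case is ignored.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            # Up to `width` characters of context on each side of the match.
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# A tiny made-up corpus, for illustration only.
toy_corpus = [
    "She made a strong argument for the plan.",
    "He likes strong coffee in the morning.",
    "The bridge is strong enough to carry trucks.",
]

hits = concordance(toy_corpus, "strong")
for line in hits:
    print(line)
print(len(hits), "occurrences")
```

Counting the lines gives a rough idea of how common the word is, and reading down the aligned column shows which words tend to come before and after it. A real corpus would be far larger and is usually searched through dedicated software rather than a script like this.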


*Extra information provided:

1. Also see Corpus Linguistics


2. http://www.usingenglish.com/articles/concordancers.html


Examples of English language corpora




1. The Bank of English – written and spoken English (used extensively by researchers and for the COBUILD series of English language books)


2. The BNC – written and spoken British English (used extensively by researchers and for the Oxford University Press, Chambers and Longman publishing houses)



3. BYU Corpus of American English – a large corpus of American English containing a wide array of texts from a number of genres
