In two weeks, several scholars affiliated with the Centre will be heading south to attend the 5th International Language in the Media Conference, taking place this year at Queen Mary, University of London. We are particularly excited about the theme — “Redefining journalism: Participation, practice, change” — as well as the conference’s continued prioritization of papers on “language and class, dis/ability, race/ethnicity, gender/sexuality and age; political discourse, commerce and global capitalism” (among other important themes). As a taster for those of you who will be joining us in London and an overview for those who are unfortunately unable to make it this year, abstracts of the CASS affiliated papers to be given at the conference are reproduced below.

“I hate that tranny look”: a corpus-based analysis of the representation of trans people in the national UK press

Paul Baker

In early 2013, two high-profile incidents involving press representation of trans people resulted in claims that the British press were transphobic. For example, Jane Fae wrote in The Independent that ‘the trans community… is now a stand-in for various minorities… and a useful whipping girl for the national press… trans stories are only of interest when trans folk star as villains’ (1/13/13). This paper examines Fae’s claims by using methods from corpus linguistics in order to identify the most frequent and salient representations of trans people in the national UK press. Corpus approaches use computational tools as an aid in human research, offering a good balance between quantitative and qualitative analyses. My analysis is based upon previous corpus-based research where I have examined the construction of gay people, refugees and asylum seekers and Muslims in similar contexts.

Using a 660,000 word corpus of news articles about trans people published in 2012, I employ concordancing techniques to examine collocates and discourse prosodies of terms like transgender, transsexual and tranny, in order to identify repetitive patterns of representation that occur across newspapers. I compare such patterns to sets of guidelines on language use by groups like The Beaumont Society, and discuss how certain representations can be enabled by the Press Complaints Commission’s Code of Practice. While the analysis found that there are very different patterns of representation around the three labels under investigation, all of them showed a general preference for negative representations, with occasional glimpses of more positive journalism.

“I think we’d rather be called survivors”: A corpus-based critical discourse analysis of the semantic preferences of referential strategies in Hurricane Katrina news articles as indicators of ideology

Amanda Potts

In times of great crisis, people often rely upon the discourse of powerful institutions to help frame experiences and reinforce established ideologies (van Dijk 1985). Selection of referential strategies in such discourses can reveal much about our society; for instance, some words have the power to comfort addressees but further oppress the referents. Taking a corpus-based critical discourse analytical approach, in this paper I explore the discursive cues of underlying ideology (of both the publications and perhaps the assumed audience) with special attention to journalists’ referential and predicational strategies (Reisigl and Wodak 2000). Analysis is based on a custom-compiled 36.7-million-word corpus of American news print articles concerning Hurricane Katrina.

A variety of forms of reference have been identified in the corpus using part-of-speech tagged word lists. Collocates of each form of reference have been calculated and automatically assigned a semantic tag by the UCREL USAS tagger (Archer et al. 2002). Semantic categories represented by the highest proportion of collocates overall have been identified as the most salient indicators of ideology.
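As a rough illustration of the procedure described above, the following sketch counts the collocates of a reference term within a fixed window and then tallies those collocates by semantic category. It is only a toy example under stated assumptions: the sample sentence, the window size and the small TOY_LEXICON mapping are hypothetical stand-ins, raw co-occurrence counts are used in place of any collocation statistic, and the code does not call the UCREL USAS tagger or reproduce the study's actual settings.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real semantic tagger.
# The single-letter tags only imitate the style of top-level category labels.
TOY_LEXICON = {
    "thousands": "N",
    "hundreds": "N",
    "fled": "M",
    "returned": "M",
    "helped": "S",
    "stranded": "S",
}


def collocates(tokens, node, window=3):
    """Count words occurring within `window` tokens either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def category_proportions(colloc_counts, lexicon):
    """Share of tagged collocate occurrences falling in each category."""
    tagged = Counter()
    for word, n in colloc_counts.items():
        tag = lexicon.get(word)
        if tag is not None:  # words missing from the toy lexicon are skipped
            tagged[tag] += n
    total = sum(tagged.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in tagged.items()}


# Invented example sentence, not taken from the corpus.
tokens = "thousands of evacuees fled while hundreds of evacuees returned".split()
counts = collocates(tokens, "evacuees", window=2)
proportions = category_proportions(counts, TOY_LEXICON)
```

In a real study the collocates would normally be filtered by a statistical association measure and tagged by a full semantic tagger; the sketch only shows how the category with the highest share of collocates could be read off once both steps are done.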

The semantic preferences of the referential strategies are found to be quite distinct. For instance, resident prefers the M: Movement semantic category, whereas collocates of evacuee tend to fall under N: Numbers. This may prime readers to interpret Gulf residents and evacuees as large, threatening, ‘invading’ masses (often in conjunction with negative water metaphors such as flood). The highest collocate semantic category for victim, displaced, and survivor is S: Social actions, states and processes, indicating that the [social] experiences of these referents (such as being helped or stranded, or linked to social identities such as wife) are foregrounded rather than their numbers or movement.

Finally, the plummeting frequency of refugee following a unique debate in the media over the word’s meaning and even its semantic preference will also be discussed as an illustrative example of how unconscious language patterns can sometimes come to the fore in contested usage and influence the journalistic lexicon. Following from this, a more considered use of referential strategies is recommended, particularly in the media, where this could encourage heightened compassion for, and understanding of, those gravely affected by catastrophic events.

Journalism through the Guardian’s goggles

Anna Marchi

‘Journalism is an intensely reflexive occupation, which constantly talks to and about itself’ (Aldridge and Evetts 2003: 560). Journalists create interpretative communities (Zelizer 2004) through the discourses they circulate about their profession; the meaning and role of journalism are constituted through daily performance (Matheson 2003) and can be studied by means of the self-reflexive traces in texts. That is, they can be detected and studied in a newspaper corpus.

This paper proposes a corpus-assisted discourse analysis (Partington 2009) of the ways journalists represent their trade in their own news-work. The research focuses on one newspaper in particular: the Guardian. Previous research (Marchi and Taylor 2009) suggested that among British broadsheets the Guardian is by far the most interested in other media, as well as the most inclined to talk about itself. Using newspaper data from 2005, a particularly relevant year in the newspaper’s biography (it changed format from traditional broadsheet to berliner) and rich with self-reflexivity, I examine the discursive behavior of media-related lexical items in the corpus (such as journalist, reporter, hack, media, newspaper, press, tabloid), exploring the ways in which the Guardian conceptualises the role of the news media, how it represents professional values and the divide between good and bad journalism, and, ultimately, how it constructs its own identity. The study relies on the typical tools of corpus linguistics research (collocation analysis, keywords analysis, concordance analysis) and aims at a comprehensive description of the data, following the principle of total accountability (McEnery and Hardie 2012: 17), while keeping track of the broader extralinguistic context. From a methodological point of view this work encourages interdisciplinary contamination and a serendipitous approach to the data and wishes to offer an example of how corpus-based research can contribute to the academic investigation of journalism across disciplines.

Visit the conference website for more details, including a list of plenary speakers.

Corpus linguistics, particularly in relation to discourse analysis or critical discourse analysis, or recent diachronic change. Representation of identity, especially gender and sexuality. Analysis of news or online corpora.

My current PhD students are working on the following topics:

Construction of Islam in the BBC sitcom Citizen Khan
Metrosexuality in Malaysia
Discourses of infertility in blogs, news and clinic websites
Representation of dialect in fiction
Children's books containing same-sex parent families
Language around schizophrenia in the British press

Previous PhDs I have supervised include:
A corpus-based examination of the concept of political correctness in British broadsheet newspapers
The language of marriage rituals in Botswana
Combining corpus approaches and CDA to examine discourses of terrorism in the British and Chinese popular press
Combining corpus approaches and CDA to examine discourses of homophobia in a right-wing political organisation
A corpus study to compare lexical bundle use of Chinese learners of English with native speakers of English
A corpus study of keywords to examine gender identity in British and Malaysian children's writing
The construction of gender identity in Iranian bloggers
A corpus-based comparison of two academic books about Wahhabi Islam

Research Interests

My research interests include corpus linguistics, language and identities, and (critical) discourse analysis. Books include: Using Corpora to Analyse Gender (2014), Discourse Analysis and Media Attitudes (2013), Sociolinguistics and Corpus Linguistics (2010), Sexed Texts: Language, Gender and Sexuality (2008), Using Corpora in Discourse Analysis (2006), Public Discourses of Gay Men (2005) and Polari: The Lost Language of Gay Men (2002). I am the commissioning editor for the journal Corpora.



Baker, P. (2017) American and British English. Divided by a Common Language? Cambridge: Cambridge University Press.

Baker, P. and Balirano, G. (eds) (2017) Queering Masculinities in Language and Culture. London: Palgrave.

Baker, P. and Egbert, J. (eds) (2016) Triangulating Methodological Approaches in Corpus-Linguistic Research. London: Routledge.

Baker, P. and McEnery, T. (eds) (2015) Corpora and Discourse Studies: Integrating Discourse and Corpora. London: Palgrave.

Baker, P. (2014) Using Corpora to Analyse Gender. London: Bloomsbury.

Baker, P., Gabrielatos, C. and McEnery, T. (2013) Discourse Analysis and Media Attitudes: The Representation of Islam in the British Press. Cambridge: Cambridge University Press.

Baker, P. and Ellece, S. (2011) Key Terms in Discourse Analysis. London: Continuum.

Baker, P. (2010) Sociolinguistics and Corpus Linguistics. Edinburgh: Edinburgh University Press.

Baker, P. (ed.) (2009) Contemporary Corpus Linguistics. London: Continuum.

Baker, P. (2008) Sexed Texts: Language, Gender and Sexuality. London: Equinox.

Baker, P. (2006) Using Corpora in Discourse Analysis. London: Continuum.

Baker, P., Hardie, A. & McEnery, A. (2006) A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press.

Baker, P. (2005) Public Discourses of Gay Men. London: Routledge.

Baker, P. & Stanley, J. (2003) Hello Sailor! Seafaring life for gay men: 1945-1990. London: Pearson.

Baker, P. (2002) Fantabulosa: A Dictionary of Polari and Gay Slang. London: Continuum.

Baker, P. (2002) Polari: The Lost Language of Gay Men. London: Routledge.


I am commissioning editor of the journal Corpora published by Edinburgh University Press.

I am on the editorial board for the Journal of English Linguistics, the Journal of Language and Sexuality, Gender and Language, Applied Linguistics, Journalism and Discourse Studies, Text and Talk and Discourse Coherence, Cognition and Creativity.

Journal Articles

Brookes, G. and Baker, P. (2017) 'What does patient feedback reveal about the NHS? A mixed methods study of comments posted to the NHS Choices online service'. BMJ Open 7(4).

Paknahad Jaborooty, M. and Baker, P. (2017) 'Resisting silence: moments of empowerment in Iranian women's blogs'. Gender and Language 11(1): 77-99. 

Baker, P. (2016) 'The shapes of collocation.' International Journal of Corpus Linguistics 21(2): 139-164.

Baker, P. and Levon, E. (2016) '"That's what I call a man": representations of racialised and classed masculinities in the UK print media.' Gender and Language 10(1): 106-139.

Anthony, L. and Baker, P. (2015) 'ProtAnt: A tool for analysing the prototypicality of texts.' International Journal of Corpus Linguistics 20(3): 273-292.

Baker, P. (2015) 'Introduction to the Special Issue.' Discourse and Communication 9(2): 143-147.

Chen, Y-H., and Baker, P. (2015) 'Investigating criterial discourse features across second language development: lexical bundles in rated learner essays, CEFR B1, B2 and C1.' Applied Linguistics

Baker, P. and Levon, E. (2015) 'Picking the right cherries?: a comparison of corpus-based and qualitative analyses of news articles about masculinity.' Discourse and Communication 9(2): 221-336

Baker, P., Gabrielatos, C. and McEnery T. (2013) ‘Sketching Muslims: A corpus-driven analysis of representations around the word “Muslim” in the British press 1998-2009’ Applied Linguistics 34:3

Baker, P. and Potts, A. (2013) '"Why do white people have thin lips?": Google and the perpetuation of stereotypes via auto-complete search forms.' Critical Discourse Studies 10:2 187-204.

Baker, P. (2012) ‘From gay language to normative discourse: a diachronic corpus analysis of Lavender Linguistics conference abstracts 1994-201.’ Journal of Language and Sexuality 2:2 179-205.

Potts, A. and Baker, P. (2012) 'Does semantic tagging identify cultural change in British and American English?' International Journal of Corpus Linguistics 17:3 295-324.

Baker, P. (2012) 'Acceptable bias?: Using corpus linguistics methods with critical discourse analysis.' Critical Discourse Studies 9:3 247-256.

Gabrielatos, C., McEnery, T., Diggle, P., Baker, P. and ESRC (funder). (2012) 'The peaks and troughs of corpus-based contextual analysis.' International Journal of Corpus Linguistics. 17:2 151-175.

Baker, P. (2011) 'Times may change but we'll always have money: a corpus driven examination of vocabulary change in four diachronic corpora.' Journal of English Linguistics 39: 65-88.

Baker, P. (2010) 'Will Ms ever be as frequent as Mr? A corpus-based comparison of gendered terms across four diachronic corpora of British English.' Gender and Language 4.1: 125-129.

Chen, Y. and Baker, P. (2010) 'Lexical Bundles in L1 and L2 Academic Writing.' Language Learning and Technology. 14: 2 30-49.

Baker, P. (2010) 'Representations of Islam in British broadsheet and tabloid newspapers 1999-2005.' Language and Politics. 9:2 310-338.

Baker, P. (2009) 'The BE06 Corpus of British English and recent language change.' International Journal of Corpus Linguistics. 14:3 312-337.

Baker, P., Gabrielatos, C., Khosravinik, M., Krzyzanowski, M., McEnery, T. and Wodak, R. (2008) 'A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press.' Discourse and Society 19(3): 273-306.

Gabrielatos, C. and Baker, P. (2008) 'Fleeing, sneaking, flooding: a corpus analysis of discursive constructions of refugees and asylum seekers in the UK Press 1996-2005.' Journal of English Linguistics 36:1 pp. 5-38.

Baker, P. and McEnery, A. (2005) 'A corpus-based approach to discourses of refugees and asylum seekers in UN and newspaper texts.' Language and Politics 4:2 pp. 197-226.

Baker, P. Hardie, A. McEnery, A., Xiao, R., Bontcheva, K., Cunningham, H., Gaizauskas, R., Hamza, O., Maynard, D., Tablan, V., Ursu, C., Jayaram, B.D., Leisher, M. (2004) 'Corpus linguistics and South Asian languages: Corpus creation and tool development', Literary and Linguistic Computing, Volume 19, Issue 4, pp 509-524.

Baker, P. (2004) 'Querying keywords: questions of difference, frequency and sense in keywords analysis.' Journal of English Linguistics. 32: 4 pp 346-359.

Baker, P. (2004) '"Unnatural acts": Discourses of homosexuality within the House of Lords debates on gay male law reform.' Journal of Sociolinguistics 8:1 88-106.

Baker, P. (2002) 'Construction of Gay Identity via Polari in the Julian and Sandy Radio Sketches,' Lesbian and Gay Review, 3:3: pp 75-84.

Baker, P. (2001) 'Moral Panic and Alternative Identity Construction in Usenet'. Journal of Computer Mediated Communication 7:1.

Baker, P. Lie, M., McEnery, A. and Sebba, M. (2000) 'Building a Corpus of Spoken Sylheti', Literary and Linguistic Computing, Volume 15, Issue 4, pp 419-431.

McEnery, A., Wilson, A. and Baker, P. (2000) 'Language teaching: corpus based help for teaching grammar', Journada de Corpus Linguistics, Volume 6, pp 65-77.

McEnery, A. Baker, P. Gaizauskas, R. & Cunningham, H. (2000) 'EMILLE: towards a corpus of South Asian languages', British Computing Society Machine Translation Specialist Group, London, pp 11-1 - 11-9.

McEnery, A., Wilson, A. and Baker, P. (1997) 'Teaching Grammar Again after Twenty Years: Corpus based help for grammar teaching.' New Approaches to Grammar Teaching, RECALL Journal, Volume 9, Number 2, pp 8-17.

Baker, P., McEnery, A. and Wilson, A. (1995) 'A brief report on a statistical analysis of corpus-based versus traditional human-teaching methods of part-of-speech analysis', Language Testing Update, Issue 18, pp 59-62.

McEnery, A., Baker, P. and Wilson, A. (1995) 'A Statistical Analysis of Corpus Based Computer vs Traditional Human Teaching Methods of Part of Speech Analysis', Computer Assisted Language Learning, Volume 8, Number 2-3, pp 259-274.

Baker, P. (1994) 'Lithium Discontinuation - A meta-analysis.' Lithium.

Book Chapters

Subtirelu, N. C. and Baker, P. (2017) Corpus-based approaches. In Richardson, J. and Flowerdew, J. (eds) The Routledge Handbook of Critical Discourse Studies, pp. 107-120. 

Baker, P. (2017) Sexuality. In E. Friginal (ed) Studies in Corpus-Based Sociolinguistics. London: Routledge, pp. 159-177.

Baker, P. (2016) 'Gendered Discourses' in Baker, P. and Egbert, J. (eds) Triangulating Methodological Approaches in Corpus-Linguistic Research. London: Routledge, pp. 138-151.

Baker, P. and McEnery, T. (2015) 'Who benefits when discourse gets democratised? Analysing a Twitter corpus around the British Benefits Street debate.' In Baker, P. and McEnery T. (eds) (2015) Corpora and Discourse Studies: Integrating Discourse and Corpora. London: Palgrave, pp 244-265.

Baker, P. and McEnery, T. (2015) 'Introduction' In Baker, P. and McEnery, T. (eds) (2015) Corpora and Discourse Studies: Integrating Discourse and Corpora. London: Palgrave, pp 1-20.

Baker, P. (2015) 'Two hundred years of the American man.' In T. Milani (ed) Language and Masculinities: performances, intersections, dislocations. London: Routledge.

Baker, P. and McEnery, A. (2014) '"FIND THE DOCTORS OF DEATH": The UK Press and the Issue of Foreign Doctors Working in the NHS, a Corpus-Based Approach.' In A. Jaworski and N. Coupland (eds) The Discourse Reader. London: Routledge.

Baker, P. (2014) '"Bad wigs and screaming mimis": Using corpus-assisted techniques to carry out critical discourse analysis of the representation of trans people in the British press.' In C. Hart and P. Cap (eds) Contemporary Critical Discourse Studies. London: Bloomsbury, pp. 211-236.

Baker, P. (2013) ‘Discourse and Gender’. In K. Hyland and B. Paltridge (eds) Continuum Companion to Discourse Analysis. London: Continuum.

Baker, P. (2013) ‘Corpus Linguistics and Sociolinguistics’. In J. Holmes (ed) Research Methods in Sociolinguistics. A Practical Guide. Wiley Blackwell.

Baker, P. (2012) 'Corpora and Gender studies' In K. Hyland, C. M. Huat and M. Handford (eds) Corpus Applications in Applied Linguistics. London: Continuum, pp. 100-116.

Baker, P. (2012) ‘Diachronic lexical change in American English (1961-2006).’ In J. Zhang (ed). A Morphologically-based Study of the Lexical Collocation Heterogeneity in EST Texts. Shanghai Jiaotong University.

Baker, P. (2011) 'Social involvement in Corpus Studies.' In V. Viana, S. Zyngier, and G. Barnbrook (eds) Perspectives on Corpus Linguistics. Amsterdam: John Benjamins, pp. 17-28.

Baker, P. (2010) 'Corpus Linguistics'. L. Litosseleti (ed) Research Methods in Linguistics. London: Continuum, pp. 93-113.

Baker, P. (2009) 'Issues in teaching corpus-based discourse analysis' In L. Lombardo (ed). Using Corpora to Learn about Language and Discourse. Peter Lang, pp. 73-98.

Baker, P. (2009) 'Introduction' In P. Baker (ed) Contemporary Corpus Linguistics. London: Continuum, pp. 1-8.

Baker, P. (2009) 'Language and Sexuality'. In J. Culpeper, F. Katamba, P. Kerswill, R. Wodak and T. McEnery (eds) English Language and Linguistics. London: Palgrave, pp. 550-563.

Baker, P. (2008) 'Eligible' bachelors and 'frustrated' spinsters: corpus linguistics, gender and language. In J. Sunderland, K. Harrington and H. Sauntson (eds) Gender and Language Research Methodologies. London: Palgrave.

McEnery, T. and Baker, P. (2003) 'Corpora, translation and multilingual computing' in F. Zanettin (ed.) Corpora in Translator Education, St. Jerome Press, Manchester.

Baker, P. (2002) 'No Fats, Femmes or Flamers: Changing Constructions of Identity and the Object of Desire in Gay Men's Magazines.' B. Benwell (ed.) Masculinity and Men's Lifestyle Magazines. Sociological Review.

McEnery, A., Baker, P. and Cheepen, C. (2001) 'Lexis, Indirectness and Politeness in Operator Calls.' In C. Meyer & P. Leistyna. (eds.) Corpus Analysis: Language Structure and Language Use. Rodopi: Amsterdam.

Singh, S., McEnery, A. and Baker, P. (2000) 'Building a Parallel Corpus of English/Punjabi', in J. Veronis (ed) Parallel Text Processing. Kluwer: Dordrecht, pp 335-347.

McEnery, A.M., Baker, P. and Hardie, A. (2000) 'Swearing and Abuse in Modern British English', in B. Lewandowska-Tomaszczyk and P.J. Melia (eds.) Practical Applications of Language Corpora, Peter Lang: Hamburg, pp 37-48.

McEnery, A. and Baker, P. (2000) 'Minority Language Engineering', in B. Lewandowska-Tomaszczyk and P.J. Melia (eds.) Practical Applications of Language Corpora, Peter Lang: Hamburg, pp 411-428.

McEnery, A.M., Baker, P. and Hardie, A. (2000) 'Assessing Claims about Language Use with Corpus Data - Swearing and Abuse', in J. Kirk (ed) Corpora Galore, Rodopi: Amsterdam, pp 45-55.

Baker, P. (1997) 'Consistency and Accuracy in Correcting Automatically Tagged Data.' In Corpus Annotation. R. Garside, G. Leech & A. McEnery (eds.) Longman Addison-Wesley, pp 243-250.

McEnery, A.M., Baker, P. & Hutchinson, J.E. (1997) 'A Corpus Based Grammar Tutor'. In R.G. Garside, G.N. Leech & A.M. McEnery (eds.) Corpus Annotation, Longman Addison-Wesley, pp 209-219.

Conference Proceedings

Xiao, Z, McEnery, A, Baker, P and Hardie, A (2004) 'Developing Asian language corpora: standards and practice'. In: Proceedings of the 4th Workshop on Asian Language Resources, Sanya, China.

Baker, P, Hardie, A, McEnery, T and Jayaram, BD (2003) 'Constructing corpora of South Asian languages'. In: Archer, D, Rayson, P, Wilson, A, and McEnery, T (eds.) (2003) Proceedings of the Corpus Linguistics 2003 conference. UCREL Technical Papers Volume 16. Department of Linguistics, Lancaster University.

Baker, P, Hardie, A, McEnery, AM and Jayaram, BD (2003) 'Corpus data for South Asian language processing'. In: Proceedings of the EACL Workshop on South Asian Languages, Budapest.

Tablan, V., Ursu, C., Bontcheva, K., Cunningham, H., Maynard, D., Hamza, O., McEnery, T., Baker, P. & Leisher, M. (2002) 'A Unicode-based Environment for Creation and Use of Language Resources,' in LREC 2002 Proceedings, pp 66-71.

Baker, P, Hardie, A, McEnery, A, Cunningham, H and Gaizauskas, R (2002) 'EMILLE, a 67-million word corpus of Indic languages: data collection, markup and harmonisation'. In: Proceedings of LREC 2002.

Baker, P, Hardie, A, McEnery, A and Siewierska, A (eds.) (2000) Proceedings of the Third Discourse Anaphora and Reference Resolution Colloquium (2000). UCREL Technical Papers Volume 12 Special Issue. Department of Linguistics, Lancaster University.

McEnery, T., Baker, P., and Burnard, L. (2000) 'Corpus Resources and Minority Language Engineering', in M. Gavrilidou, G. Carayannis, S. Markantontou, S. Piperidis and G. Stainhauoer (eds) Proceedings of the Second International Conference on Language Resources and Evaluation, Athens, Greece, pp. 801-806.

McEnery, A. and Baker, P. (1998) 'Integrating the Intranet into the teaching of linguistics.' The Future of the Humanities in the Digital Age. International Conference Bergen, Norway. 138-140.

Current Teaching

I currently teach various modules in Corpus Linguistics at MA level (on four different schemes), have several PhD students and supervise third year UG dissertations.

I have supervised the following PhD students (dates show completion):

Saiqa Asif (2006), Stephanie Suhr (2007), Sibonile Ellece (2008), Yufang Qian (2008), Andrew Brindle (2009), Yuhua Chen (2009), Sheena Kaur (2009), Maryam Pakhnahad (2011), Amir Salama (2011), Rob Bianchi (2011), Hiroko Usami (2012), Bandar Al-Hejin (2012), Rajab Zahrani (2013), Amanda Potts (2014), Anna Marchi (2014).

My current PhD students are Karen Kinloch (discourses around infertility), David Brown (historical representation of 'othered' speech in fiction), Bilal Kadiri (language use in Citizen Khan), Khushairi Tohiar (metrosexuality in Malaysian men), Mark McGlashan (same-sex parent family books aimed at children) and James Balfour (newspaper discourses around schizophrenia).

The BE06 Corpus

The BE06 Corpus is a one million word corpus of published general written British English. It has the same sampling frame as the LOB and FLOB corpora. This consists of 500 files of 2000 word samples taken from 15 genres of writing.

Eighty-two per cent of the texts were published between 2005 and 2007, while the remainder were published in 2003-4 and early 2008. The median sampling point is 2006, hence the title BE06 (British English 2006). The corpus is described in this paper:

Baker, P. (2009) 'The BE06 Corpus of British English and recent language change.' International Journal of Corpus Linguistics. 14:3 312-337.

Using the corpus

Due to copyright issues, there are no plans to make the corpus files fully available. However, the corpus has been placed on the CQP (Corpus Query Processor) system at Lancaster University and users can carry out concordances, get distribution information (and eventually have access to collocation information). Contact Andrew Hardie in order to obtain a username and password.

Additionally, the following links give frequency lists of the BE06 in various formats (right click on the link and then save it).

BE06 in AntConc format

BE06 Wordlist in WordSmith 5 format

BE06 Wordlist in WordSmith 4 format

BE06 Wordlist in WordSmith 3 format

The AmE06 Corpus

The AmE06 Corpus is a one million word corpus of published general written American English, also using the same sampling frame as the LOB and FLOB corpora. This consists of 500 files of 2000 word samples taken from 15 genres of writing. The vast majority of the texts were published in 2006. The corpus is also available via CQPweb, and the wordlist is available below.

AmE06 in AntConc format

AmE06 Wordlist in WordSmith 5 format

