- Natural Language Processing Techniques
- Lexicography and Language Studies
- Translation Studies and Practices
- Geographic Information Systems Studies
- Language, Metaphor, and Cognition
- Linguistic Variation and Morphology
- Linguistics and Terminology Studies
- Topic Modeling
- Linguistics, Language Diversity, and Identity
- Language, Linguistics, Cultural Analysis
- Swearing, Euphemism, Multilingualism
- Digital Humanities and Scholarship
- Gender Studies in Language
- Historical Linguistics and Language Studies
- Syntax, Semantics, Linguistic Variation
- Text Readability and Simplification
- Discourse Analysis in Language Studies
- Empathy and Medical Education
- Second Language Acquisition and Learning
- Historical and Linguistic Studies
- Language, Discourse, Communication Strategies
- Authorship Attribution and Profiling
- Humor Studies and Applications
- Social Media and Politics
- Interpreting and Communication in Healthcare
Lancaster University
2014-2024
University of Birmingham
2018
Universities UK
2016
Curtin University
2016
The Open University
2015
Solomon R. Guggenheim Museum
2014
Manchester Metropolitan University
2014
Centre National de la Recherche Scientifique
2014
Institut d'Etudes Politiques de Paris
2014
Infection et inflammation
2014
CQPweb is a new web-based corpus analysis system, intended to address the conflicting requirements for usability and power in software. To do this, its user interface emulates the BNCweb system. Like BNCweb, it is built on two separate query technologies: the IMS Open Corpus Workbench and the MySQL relational database. CQPweb’s main innovative feature is its flexibility; its more generalised data model makes it compatible with any corpus. The options available include: concordancing; collocations; distribution tables...
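CQPweb delegates its queries to the Corpus Workbench and MySQL, so the following is not its code. It is only a minimal pure-Python sketch, assuming a naive lowercase regex tokeniser, of what two of the listed options compute: a keyword-in-context (KWIC) concordance and a raw window-based collocate count. The function names `tokenize`, `kwic` and `collocates` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase word tokeniser (an assumption for illustration only).
    return re.findall(r"\w+", text.lower())


def kwic(tokens: list[str], node: str, width: int = 3) -> list[str]:
    """Return one keyword-in-context line per occurrence of `node`."""
    node = node.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the node words line up.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


def collocates(tokens: list[str], node: str, span: int = 2) -> Counter:
    """Count raw co-occurrences within `span` tokens either side of `node`."""
    node = node.lower()
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


if __name__ == "__main__":
    toks = tokenize("The cat sat on the mat; the cat ran.")
    for line in kwic(toks, "cat", width=1):
        print(line)
    print(collocates(toks, "cat", span=1).most_common(3))
```

Raw co-occurrence counts favour very frequent words such as "the"; corpus tools therefore usually rank collocates with an association measure (for example mutual information or log-likelihood) rather than by frequency alone.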
This paper introduces the Spoken British National Corpus 2014, an 11.5-million-word corpus of orthographically transcribed conversations among L1 speakers of English from across the UK, recorded in the years 2012–2016. After showing that a survey of the recent history of spoken corpora justifies the compilation of this new corpus, we describe the main stages of the BNC2014’s creation: design, data and metadata collection, transcription, XML encoding, and annotation. In doing so, we aim to (i) encourage users to approach the corpus with...
To compare the frequencies with which patients with cancer and health professionals use Violence and Journey metaphors when writing online, and to investigate the uses of these metaphors by patients with cancer, in view of critiques of war-related metaphors for cancer and the adoption of the notion of the 'cancer journey' in UK policy documents. Computer-assisted quantitative and qualitative study of two data sets totalling 753 302 words. A UK-based online forum (500 134 words) and a website (253 168 words). 56 participants writing between 2007 and 2012; 307 writing between 2008 and 2013. Patients and health professionals both ... approximately 1.5 times per 1000 words...
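Because the two data sets differ in size, raw metaphor counts are not directly comparable; results such as "1.5 times per 1000 words" are normalised frequencies. A minimal sketch of that normalisation follows. The function name `per_thousand` is hypothetical, and the 30-hit, 20,000-word example is invented for illustration, not a figure from the study; only the two corpus sizes come from the abstract.

```python
def per_thousand(count: int, corpus_size: int, base: int = 1000) -> float:
    """Normalise a raw frequency to occurrences per `base` words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    # Multiply before dividing so simple cases stay exact in floating point.
    return count * base / corpus_size


# Corpus sizes as reported in the abstract; they sum to the stated total.
FORUM_WORDS = 500_134
WEBSITE_WORDS = 253_168
assert FORUM_WORDS + WEBSITE_WORDS == 753_302

if __name__ == "__main__":
    # Hypothetical example: 30 hits in a 20,000-word sample.
    print(per_thousand(30, 20_000))  # 1.5
```

The same rate can then be compared across the forum and the website even though one is roughly twice the size of the other.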
This alphabetic guide provides definitions and discussion of key terms used in corpus linguistics. Corpus data is being used in a growing number of English Linguistics departments which have no record of past research with corpus data. The first comprehensive glossary of the many specialist terms used in corpus linguistics, this book will be useful for linguists and non-linguists alike. Clearly written by a team of experienced academics in the field, it gives full coverage of both traditional and contemporary terminology. Entries are focused around the following broad groupings: * Important...
This study combines quantitative semi-automated corpus methods with manual qualitative analysis to investigate the use of Violence metaphors for cancer and the end of life in a 1,500,000-word corpus of data from three stakeholder groups in healthcare: patients, family carers and healthcare professionals. In general, Violence metaphors, especially military metaphors, are conventionally used to talk about illness, particularly cancer. However, they have also been criticized for their potentially negative implications. The innovative methodology...
Corpus linguistics and Geographical Information Systems (GIS) are approaches exploiting computer-based methodologies in the study of, respectively, language usage and spatial patterns in geographical databases. We present an approach that uses corpus methods to bridge the gap between the textual content of a corpus (and, thus, the typical concerns of many branches of the humanities) and the geo-referenced database at the heart of a GIS. Using part-of-speech tagging to extract instances of proper nouns from the corpus, and a gazetteer to limit these to those...
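The two filtering steps named above (proper nouns from part-of-speech tags, then a gazetteer lookup) can be sketched as follows. This is not the authors' pipeline: it assumes the text is already tagged with Penn Treebank-style tags (`NNP`/`NNPS`), and the three-entry gazetteer with approximate coordinates, like the function names, is hypothetical.

```python
# Hypothetical toy gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "lancaster": (54.05, -2.80),
    "manchester": (53.48, -2.24),
    "lake district": (54.45, -3.09),
}

PROPER_NOUN_TAGS = ("NNP", "NNPS")  # assumed Penn Treebank-style tagset


def proper_noun_phrases(tagged: list[tuple[str, str]]) -> list[str]:
    """Group runs of consecutive proper-noun tokens into candidate names."""
    phrases: list[str] = []
    current: list[str] = []
    for word, tag in tagged:
        if tag in PROPER_NOUN_TAGS:
            current.append(word)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


def geoparse(tagged, gazetteer=GAZETTEER):
    """Keep only candidate names found in the gazetteer, with coordinates."""
    hits = []
    for name in proper_noun_phrases(tagged):
        coords = gazetteer.get(name.lower())
        if coords is not None:
            hits.append((name, coords))
    return hits


if __name__ == "__main__":
    sentence = [
        ("Smith", "NNP"), ("walked", "VBD"), ("from", "IN"),
        ("Lancaster", "NNP"), ("to", "TO"), ("the", "DT"),
        ("Lake", "NNP"), ("District", "NNP"), (".", "."),
    ]
    print(geoparse(sentence))
```

Grouping adjacent proper nouns lets multi-word names such as "Lake District" match a single gazetteer entry, while names absent from the gazetteer (here "Smith") are discarded; the resulting coordinates are what a GIS would then map.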
This paper argues for, and presents, a modest approach to XML encoding for use by the majority of contemporary linguists who need to engage in corpus construction. While extensive standards exist (most notably, the Text Encoding Initiative’s Guidelines and the Corpus Encoding Standard based on them), these are rather heavyweight approaches, implicitly intended for major corpus-building projects, which are different from the increasingly common efforts in corpus construction undertaken by individual researchers in support of their...
This paper describes the work carried out on the EMILLE Project (Enabling Minority Language Engineering), which was undertaken by the Universities of Lancaster and Sheffield. The primary resource developed by the project is the EMILLE Corpus, which consists of a series of monolingual corpora for fourteen South Asian languages, totalling more than 96 million words, and a parallel corpus of English and five of these languages. The EMILLE Corpus also includes an annotated component, namely, part-of-speech tagged Urdu data, together with twenty written Hindi...
This article focuses on how register considerations informed and guided the design of the spoken component of the British National Corpus 2014 (Spoken BNC2014). It discusses why the compilers of the corpus sought to gather recordings from just one broad register, ‘informal conversation’, and how this and other decisions afforded contributors much freedom with regard to the selection of situational contexts for the recordings. This resulted in a high level of diversity in parameters such as recording location and activity type, each of which was captured...
Topic modelling is a method of statistical data mining of a corpus of documents, popular in the digital humanities and, increasingly, in the social sciences. A critical methodological issue is how ‘topics’ (groups of co-selected word types) can be interpreted in analytically meaningful terms. In the current literature, this is typically done by ‘eyeballing’; that is, a cursory and largely unsystematic examination of the ‘top’ words in each algorithmically identified group. We critically evaluate this approach through a dual analysis,...
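To make concrete what "eyeballing the top words" inspects, here is a minimal sketch, assuming a topic is already available as a mapping from word types to weights (as a fitted topic model would provide). It does not reproduce the paper's dual analysis. The helpers `top_words` and `mass_covered` and the example weights are hypothetical. The second function shows how much of a topic's total weight the inspected list actually accounts for.

```python
def top_words(topic_weights: dict[str, float], n: int = 10) -> list[str]:
    """Return the n highest-weighted word types (ties broken alphabetically)."""
    ranked = sorted(topic_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def mass_covered(topic_weights: dict[str, float], n: int = 10) -> float:
    """Share of the topic's total weight carried by its top n words."""
    total = sum(topic_weights.values())
    if total <= 0:
        raise ValueError("topic weights must sum to a positive value")
    top = sorted(topic_weights.values(), reverse=True)[:n]
    return sum(top) / total


if __name__ == "__main__":
    # Hypothetical weights for a single topic.
    topic = {"war": 0.30, "battle": 0.25, "fight": 0.20, "army": 0.15, "peace": 0.10}
    print(top_words(topic, n=3))
    print(round(mass_covered(topic, n=3), 2))
```

In a real model, a topic spreads its weight over thousands of word types, so the share returned by `mass_covered` for a short top-word list can be small; that gap is one reason an unsystematic reading of the top words alone may mislead.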
The aim of this article is to present new research showcasing how Geographic Information Systems, in combination with Natural Language Processing and Corpus Linguistics methods, can offer innovative avenues to analyze large textual collections in the Humanities, particularly in historical research. Using as examples parts of the collection of the Registrar General's Reports, which contain more than 200,000 pages of descriptions, census data and vital statistics for the UK, we introduce newly developed automated tools...
The Glencairn Uprising (1653–1654) was a military rebellion by Scottish Highlanders under the leadership of William, Earl of Glencairn, against the English government of Oliver Cromwell. This paper investigates the presentation of actors and groups on both sides, but most especially of Glencairn himself, in the contemporary London press. The theoretical framework of the analysis is Critical Discourse Analysis (modelled on the approach of van Dijk 1991); however, a corpus-based methodology and partially quantitative analysis are employed. The documents...