Variational Text Linguistics (2016)
Topics in English Linguistics
Editors
Elizabeth Closs Traugott
Bernd Kortmann
Volume 90
Variational Text Linguistics
Edited by
Christoph Schubert
Christina Sanchez-Stockhammer
ISBN 978-3-11-044310-3
e-ISBN (PDF) 978-3-11-044355-4
e-ISBN (EPUB) 978-3-11-043533-7
ISSN 1434-3452
www.degruyter.com
Acknowledgements
The foundations for this edited collection of articles were laid at the interna-
tional conference Register revisited: New perspectives on functional text variety in
English, which took place at the University of Vechta, Germany, from June 27 to 29,
2013. The aim of the present volume is to preserve the research papers and the many
inspiring discussions stimulated there and to make them available to
a larger audience.
We could only achieve this aim thanks to the help of the many people who joined
us in this effort. First and foremost, we would like to thank all contributors
for their continued cooperation in this project. Furthermore, we are very grate-
ful to the external peer reviewers who contributed their expertise to the selec-
tion and improvement of the contributions. These are (in alphabetical order):
Federica Barbieri (Swansea, Wales), Eniko Csomay (San Diego, USA), Jürgen Esser
(Bonn, Germany), Maria Freddi (Pavia, Italy), Christer Geisler (Uppsala, Sweden),
Bethany Gray (Ames, Iowa, USA), Joachim Grzega (Eichstätt, Germany), Thomas
Kohnen (Cologne, Germany), Rocío Montoro (Granada, Spain), Neal Norrick (Saar-
brücken, Germany), Caroline Tagg (Birmingham, UK), Sanna-Kaisa Tanskanen
(Helsinki, Finland) and Marija Zlatnar Moe (Ljubljana, Slovenia).
We are very happy that this volume appears in the series Topics in English Lin-
guistics (TiEL) and would like to thank the series editors Elizabeth Traugott and
Bernd Kortmann as well as Wolfgang Konwitschny, Julie Miess and Birgit Sievert
at de Gruyter Mouton for their invaluable support in the preparation of this book.
Needless to say, we alone are responsible for any remaining inadequacies.
Going back to the roots of this project, we would like to express our grat-
itude to the German Research Foundation/Deutsche Forschungsgemeinschaft
(DFG) for the generous funding of the conference as well as to the Kommission
für Forschung und Nachwuchsförderung der Universität Vechta, the Universitäts-
gesellschaft Vechta (UGV), the Volksbank Vechta and the city of Vechta for their
financial support and hospitality, which contributed immensely to the memorably
pleasant atmosphere of the event.
Contents
Christoph Schubert
Introduction: Current trends in register research
Heidrun Dorgeloh
The interrelationship of register and genre in medical discourse
Markus Bieswanger
Aviation English: Two distinct specialised registers?
Rolf Kreyer
‘Now niggas talk a lotta Bad Boy shit’: The register hip-hop from a corpus-linguistic perspective
Teresa Pham
The register of English crossword puzzles: Studies in intertextuality
Christina Sanchez-Stockhammer
Punctuation as an indication of register: Comics and academic texts
Martina Lampert
Linking up register and cognitive perspectives: Parenthetical constructions in academic prose and experimentalist poetry
Barbara Güldenring
Metaphors in New English academic writing
Steffen Schaub
The influence of register on noun phrase complexity in varieties of English
Valentin Werner
Real-time online text commentaries: A cross-cultural perspective
Javier Pérez-Guerra
Word order is in order here: A diachronic register analysis of syntactic markedness in English
Index
Christoph Schubert
Introduction: Current trends in register research
and analysis. The book by Szmrecsanyi and Wälchli (2014) not only discusses
register and dialectology but also includes language typology and therefore
comprises articles on a number of languages such as Dutch or members of the Slavic
family. Its editors also formulate the central diagnosis that “[e]ven though
dialectologists, register analysts, typologists, and quantitative linguists all deal with
linguistic variation, there is astonishingly little interaction across these fields”
(Wälchli and Szmrecsanyi 2014: 1).
In general, register analysis offers a constantly widening range of research
opportunities because of the ever-increasing possibilities of communication,
mainly triggered by the advent of modern communication technologies. As the
main body of linguistic research has concentrated on well-established and fre-
quent registers such as newspaper writing or face-to-face conversations, many
descriptive and theoretical issues have not yet been sufficiently investigated.
Accordingly, the report on major register studies in Biber and Conrad (cf. 2009:
271–295) reveals that research on specialized registers has had a clear preference
for academic and newspaper texts. In particular, the language of popular genres
such as pop music, comics or puzzles has hardly been investigated so far, and
there are several forms of electronic communication, such as online text com-
mentaries, which need to be described more closely. Hence, by giving room to the
description of registers which have not received an appropriate amount of atten-
tion so far, we intend to point out emerging trends as well as new directions for
future research. By means of cross-cultural comparisons of registers, the volume
aims to build bridges to neighbouring disciplines such as cultural studies, espe-
cially with regard to intercultural communication. By pointing out the ubiquitous
nature of register, we also intend to show that adequate register choice is not a
marginal phenomenon but a fundamental prerequisite for successful communi-
cation in specific social situations.
2 Definitions of “register”
As far as the semantic origin of the term “register” is concerned, the linguistic
use of the term represents a metaphorical borrowing from the domain of music,
in particular organ playing (cf. Renkema 2004: 146), where it refers to a “sliding
device controlling a set of organ-pipes which share a tonal quality” or “the
compass of a voice or musical instrument; a particular range of this compass”
(Trumble and Stevenson 2002: 2514), so that it is common to speak of “the upper/
middle/lower register” (Summers et al. 2005: 1380) of a specific instrument.
Hence, in this analogy, “[l]anguage is seen to be regulated in the same way as the
musical tuning of an organ” (Dittmar 2010: 223), and competent speakers of a
language have the ability to fine-tune their linguistic choices according to their
intended contextual functions.
As regards the semantic extension of the term register, it is worthwhile to con-
sider different subdisciplines of linguistics in more detail (cf. Gut and Schubert
2012: 4–6). It is striking, for instance, that sociolinguistic approaches usually employ a
narrow definition of the term, reducing it to the language of occupations, such
as “the register of law”, “the register of medicine” and the like. Since the topic of
discourse is the central determining factor in this type of approach, it is mainly
the vocabulary that is responsible for the constitution of a register. The follow-
ing two quotations taken from standard introductions to sociolinguistics aptly
demonstrate this narrow notion of “register”.
Linguistic varieties that are linked […] to particular occupations or topics can be termed
registers. […] Registers are usually characterized entirely, or almost so, by vocabulary differ-
ences. (Trudgill 2000: 81)
Register is another complicating factor in any study of language varieties. Registers are sets
of language items associated with discrete occupational or social groups. Surgeons, airline
pilots, bank managers, sales clerks, jazz fans, and pimps employ different registers. (Ward-
haugh 2002: 51)
In this type of approach to “register”, the linguistic choices made by discourse
participants are evidently determined by the subject matter connected to certain
types of activity. Although the second quotation includes the term “social
groups”, this is conceptualized in a narrow way, excluding the language of social
classes in the sense of working- or middle-class sociolects.
In contrast to this narrow notion of “register”, a wide definition of the term
is employed by the tradition of Systemic Functional Linguistics (SFL), as can be
seen in the next two definitions taken from a classic introduction to cohesion and
a recent study on register variation.
The linguistic features which are typically associated with a configuration of situational fea-
tures – with particular values of the field, mode and tenor – constitute a register. (Halliday
and Hasan 1976: 22, emphasis original)
Just as situations tend to recur and thus form types, registers represent recurring ways of
using language in a given situation. […] Registers can thus be described as sub-systems of
the language system or, when viewed from below, as types of instantiated texts reflecting a
similar situation. (Neumann 2013: 16)
“dialects” are defined as varieties based on the respective user, who has a certain
social or regional background that surfaces in linguistic behaviour. The fact that
registers can rightly be viewed as “sub-systems” of a given language underlines
their formative and constitutive character in that language. As for the three situational
features determining register choices, “field” refers to the subject matter
under discussion, “tenor” pertains to the relationship between the participants
in a given context and “mode” characterizes the medium of transmission (cf. also
Bex 1996: 94–110 and Matthiessen 1993: 236–238).
This wide notion of “register” is also adopted by the currently prevailing
approach of Multidimensional Analysis (MDA) à la Douglas Biber (e.g. Biber
1988, 1995, 2006, 2007; Gray 2013: 363–366), which relies on corpus-derived
co-occurrences of lexico-grammatical features that serve equivalent functions in
discourse. Despite this more sophisticated methodology, the definition remains close to the SFL one,
since a register is regarded as “a variety associated with a particular situation
of use (including particular communicative purposes)” (Biber and Conrad 2009:
6). By increasing the degree of specificity, it is possible to distinguish between
“sub-registers” (Biber and Gray 2013), so that, for instance, academic writing
can be subdivided into sub-registers such as social science, multi-disciplinary
science and humanities.
In text linguistics, the terminological differentiation between “register” and
“genre” has always been a notorious issue. One possible solution to the problem
is offered by Dorgeloh and Wanner (2010: 10), who suggest three main differ-
ences, although the distinction between the concepts is still seen as scalar and
gradient. First, while register implies linguistic features dependent on situational
contexts, genres are regarded as types of “social action” (Dorgeloh and Wanner
2010: 10) used to perform interindividual tasks. Second, register is predominantly
geared towards the function of linguistic features, whereas genres rely to a large
degree on “patterned practice” (Dorgeloh and Wanner 2010: 10), involving char-
acteristic textual structures. Third, register operates at a high level of generality,
while genre has a more specific and concrete character, such as “on-line medical
advice” or a “corporate blog” (Giltrow 2010: 47). In fact, this more specific
definition secures a niche for the term “genre” in linguistics, since in recent years
linguistic interest in “register” has superseded research on “genre” (cf. Giltrow 2010:
31). Literary criticism, by contrast, clearly maintains a preference for the concept
of “genre”.
An alternative approach to terminological differentiation is provided by Biber
and Conrad (2009: 15–23), who regard the three terms “register”, “genre” and
“style” as “different perspectives on text varieties” (2009: 15–16). The perspective
of register pertains to all kinds of “frequent and pervasive” lexico-grammatical
items that fulfil specific communicative functions in “a sample of text excerpts”,
so that it can be applied to all sorts of discourse. As opposed to that, “[i]n the
genre perspective, the focus is on the linguistic characteristics that are used to
structure complete texts” (Biber and Conrad 2009: 16). Genres thus rely on rather
specific expressions that occur “in a particular place in the text” (2009: 16) and
together add up to a distinct rhetorical organization, which can be found in texts with
a fixed structure, such as formal letters. Finally, “style” is very similar to “reg-
ister” but depends on linguistic features that are “not directly functional” and
“are preferred because they are aesthetically valued” (Biber and Conrad 2009:
16). In other words, it is possible to determine the style of specific authors or
periods of literary history, because these linguistic items do not correspond to
particular contexts of situation but serve the poetic function of language. To
sum up, in an extension of the “music” metaphor previously mentioned in the
definition of “register”, “genre” corresponds to the specific musical piece chosen by the
church organist, while “style” is the organist’s individual interpretation and
performance of the composition.1
1 The editors sincerely thank Jan Renkema for this metaphorical insight.
Along similar lines, Biber and Gray (2013) investigate diachronic change in news
reportage and academic research writing during the twentieth century.
Second, there is a considerable body of research on register variation in spe-
cialized domains. The dimensions under discussion include parameters such
as medium, public and private spheres as well as the discourse of certain fields
of knowledge. Research on academic English is most frequent, as exemplified by
Csomay’s (2002) analysis of lectures and Biber’s (2006) comprehensive multidi-
mensional study of spoken and written register variation in university discourse.
Fryer (2013) investigates medical research articles with regard to evaluation
practices, while Schutz (2013) discusses the use of verbs in registers pertaining
to business, linguistics, and medical research. Gotti (2012) argues that academic
English is by no means uniform but varies according to a number of criteria,
such as disciplinary conventions, expertise in the respective field, and linguistic
competence of the author. A particular focus on interdisciplinary discourses is
found in Teich (2009), while further recent studies on academic English and
scientific texts respectively have been published by Bartsch (2009) and Teich
(2010). In Quinto-Pozos and Mehta’s (2010) study of American Sign Language,
it becomes clear that different registers are present not only in verbal but also in
nonverbal communication. Concerning the parameter of medium, earlier studies
on spoken and written registers have been complemented by research on com-
puter-mediated communication (Biber 2007). As the research survey in Biber and
Conrad (cf. 2009: 271–295) underlines, interest in electronic discourse has signif-
icantly increased over the last ten to fifteen years. Further studies on specialized
domains comprise register shifting in US public discourse (Cole 2012), the crea-
tion of humour through incongruity in register (Venour, Ritchie and Mellish 2011),
the register of news reporting in its social context (Lukin 2010), Business English
(Cortés de los Ríos 2010), the evaluative language of corporate social reporting
(Fuoli 2013), legal language (Battarbee 2010) and the language of linguistics
(Freddi 2005). There is also some research on the use of registers in literary texts,
as exemplified by Pollner’s (2005) analysis of language variation in Irvine Welsh’s
novel Trainspotting.
Third, a quickly developing trend brings together register research with
sociolinguistic investigations of regional variation, usually concentrating on
international varieties of English, or “World Englishes”, used as a second language
(ESL). Xiao (2009) discusses general issues in the study of World
Englishes from the perspective of multidimensional analysis. The recent volume
by Szmrecsanyi and Wälchli (2014) contains a number of papers which combine
quantitative techniques in register analysis, dialectology, and language typology.
For instance, the contribution by Diwersy, Evert and Neumann (2014) shows how
a corpus-driven multivariate approach can be used for the study of both
register and regional variation. Hilbert and Krug (2012) present a study on the use
of progressives in spoken conversations and written press language in Maltese
English, as compared to British and American English. As far as Asian varieties
are concerned, there is research on registers in Singapore English (Bao and Hong
2006) and on Indian English registers (Balasubramanian 2009a), complemented
by a special focus on adverbials (Balasubramanian 2009b). Regarding Africa,
there is multidimensional research on various registers in East African English,
pointing out, among other aspects, the presence of a greater degree of formal-
ity and an increased involvement of the addressee (Van Rooy et al. 2010). Other
papers analyse expository writing in Cameroon English (Nkemleke 2006) and
academic texts by African American college students (Syrquin 2006). Neumann
(2012) chooses a more comprehensive approach, comparing a number of registers
in the Englishes spoken in New Zealand, Hong Kong, India, Jamaica, Singapore
and Canada. The ultimate goal of most of these studies is to give a comprehensive
account of geographical varieties by describing their internally diversified
registers, thus extending the descriptive scope of sociolinguistics. Along these
lines, Balasubramanian (2009a: 19) argues that “[t]o provide a thorough linguis-
tic description of a variety […], it is important to study registers of that variety –
i.e. to study the variation within the dialect” and that “[s]uch study of register
was missing in the earlier methodologies of dialectology”. As has been pointed
out in research on postcolonial Englishes, it is common for these new Englishes to
develop use-related varieties in addition to user-related ones, which corresponds
to the stage of “differentiation” in the evolutionary development of postcolonial
varieties (cf. Schneider 2007: 52–55). Hence, the study of registers aptly comple-
ments sociolinguistic approaches, so that this liaison will undoubtedly prove
highly fruitful in future research on linguistic variety.
Fourth, contrastive register analysis investigates register variation across
two or more languages and is often linked to questions of translation studies.
For instance, Teich (2003) compares textual variety in English and German and
thereby significantly extends the scope of Contrastive Linguistics, which used to
focus mainly on relatively isolated phonological and morphosyntactic features.
Neumann (2013) likewise contrasts English and German registers, taking into account
both cross-linguistic variation and variational differences between original and
translated texts. One central result is that related registers in the two languages
show different register features with regard to the chosen subdimensions, so
that separate register studies are necessary for each language. More specifically,
the monograph by Barron (2012) compares public information messages in
Irish English and German, while register shifts in translations from English into
Slovene are investigated by Zlatnar Moe (2010). Focusing on the digital medium,
Hardy (2012) contrasts electronic discourse in Filipino and American English.
butions to the volume take different research perspectives, all deal with frequent
and recurrent linguistic features throughout texts supporting specific
superordinate functions. In sum, the papers cover theoretical considerations, case
studies and reflections on presently employed methods, suggesting approaches
and topics for future research on variational text linguistics in English.
Bibliography
Alonso-Almeida, Francisco. 2008. The Middle English medical charm: Register, genre and text
type variables. Neuphilologische Mitteilungen 109(1). 9–38.
Andersen, Gisle & Kristin Bech (eds.). 2013. English corpus linguistics: Variation in time, space
and genre. Amsterdam: Rodopi.
Balasubramanian, Chandrika. 2009a. Register variation in Indian English. Amsterdam:
Benjamins.
Balasubramanian, Chandrika. 2009b. Circumstance adverbials in registers of Indian English.
World Englishes 28(4). 485–508.
Bao, Zhiming & Huaqing Hong. 2006. Diglossia and register variation in Singapore English.
World Englishes 25(1). 105–114.
Barron, Anne. 2012. Public information messages: A contrastive genre analysis of state-citizen
communication. Amsterdam: Benjamins.
Bartsch, Sabine. 2009. Corpus studies of register variation: An exploration of academic
registers. Anglistik: International Journal of English Studies 20(1). 105–124.
Battarbee, Keith. 2010. Shifts in the language of the law: Reading the registers of official-
language statutes. Text & Talk 30(6). 637–655.
Bex, Tony. 1996. Variety in written English: Texts in society – societies in text. London:
Routledge.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge UP.
Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison.
Cambridge: Cambridge UP.
Biber, Douglas. 2006. University language: A corpus-based study of spoken and written
registers. Amsterdam: Benjamins.
Biber, Douglas. 2007. Towards a taxonomy of web registers and text types: A multidimensional
analysis. In Marianne Hundt, Nadja Nesselhauf & Carolin Biewer (eds.). Corpus linguistics
and the Web, 109–131. Amsterdam: Rodopi.
Biber, Douglas. 2012. Register as a predictor of linguistic variation. Corpus Linguistics and
Linguistic Theory 8(1). 9–37.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge UP.
Biber, Douglas & Edward Finegan. 2001. Diachronic relations among speech-based and
written registers in English. In Susan Conrad & Douglas Biber (eds.). Variation in English:
Multi-dimensional studies, 66–83. Harlow: Pearson Education.
Biber, Douglas & Bethany Gray. 2013. Being specific about historical change: The influence of
sub-register. Journal of English Linguistics 41(2). 104–134.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999.
Longman grammar of spoken and written English. London: Longman.
Cole, Debbie. 2012. Uptake (un)limited: The mediatization of register shifting in US public
discourse. Language in Society 41(4). 449–470.
Cortés de los Ríos, Mª Enriqueta. 2010. A combined genre-register approach in texts of
business English. LSP Journal 1(1). 13–28.
Crespo García, Begoña. 2004. The scientific register in the history of English: A corpus-based
study. Studia Neophilologica 76(2). 125–139.
Csomay, Eniko. 2002. Variation in academic lectures: Interactivity and level of instruction. In
Randi Reppen, Susan M. Fitzmaurice & Douglas Biber (eds.). Using corpora to explore
linguistic variation, 203–224. Amsterdam: Benjamins.
Davies, Mark. 2009. Word frequency in context: Alternative architectures for examining related
words, register variation and historical change. In Dawn Archer (ed.). What’s in a word-list?
Investigating word frequency and keyword extraction, 53–68. Surrey: Ashgate.
De Beaugrande, Robert-Alain. 1993. ‘Register’ in discourse studies: A concept in search of a
theory. In Mohsen Ghadessy (ed.). Register analysis: Theory and practice, 7–25. London:
Pinter Publishers.
De Beaugrande, Robert-Alain & Wolfgang Ulrich Dressler. 1981. Introduction to text linguistics.
London: Longman.
Dittmar, Norbert. 2010. Register. In Mirjam Fried, Jan-Ola Östman & Jef Verschueren (eds.).
Variation and change: Pragmatic perspectives, 221–233. Amsterdam: Benjamins.
Diwersy, Sascha, Stefan Evert & Stella Neumann. 2014. A weakly supervised multivariate
approach to the study of language variation. In Benedikt Szmrecsanyi & Bernhard Wälchli
(eds.). Aggregating dialectology, typology, and register analysis: Linguistic variation in
text and speech, 174–204. Berlin: de Gruyter.
Dorgeloh, Heidrun & Anja Wanner. 2010. Introduction. In Heidrun Dorgeloh & Anja Wanner
(eds.). Syntactic variation and genre, 1–26. Berlin: De Gruyter Mouton.
Egbert, Jesse. 2012. Style in nineteenth century fiction: A multi-dimensional analysis. Scientific
Study of Literature 2(2). 167–198.
Esser, Jürgen. 2009. Introduction to English text-linguistics. Frankfurt/Main: Peter Lang.
Freddi, Maria. 2005. From corpus to register: The construction of evaluation and argumentation
in linguistics textbooks. In Elena Tognini-Bonelli & Gabriella Del Lungo Camiciotti (eds.).
Strategies in academic discourse, 133–151. Amsterdam: Benjamins.
Fryer, Daniel Lees. 2013. Exploring the dialogism of academic discourse: Heteroglossic
engagement in medical research articles. In Gisle Andersen & Kristin Bech (eds.). English
corpus linguistics: Variation in time, space and genre, 183–207. Amsterdam: Rodopi.
Fuoli, Matteo. 2013. Texturing a responsible corporate identity: A comparative analysis of
appraisal in BP’s and IKEA’s 2009 corporate social reports. In Gisle Andersen & Kristin
Bech (eds.). English corpus linguistics: Variation in time, space and genre, 209–235.
Amsterdam: Rodopi.
Gardner, Sheena. 2012. Genres and registers of student report writing: An SFL perspective on
texts and practices. Journal of English for Academic Purposes 11(1). 52–63.
Geisler, Christer. 2002. Investigating register variation in nineteenth-century English: A
multi-dimensional comparison. In Randi Reppen, Susan M. Fitzmaurice & Douglas Biber
(eds.). Using corpora to explore linguistic variation, 249–271. Amsterdam: Benjamins.
Gilquin, Gaëtanelle. 2008. Too chatty: Learner academic writing and register variation. English
Text Construction 1(1). 41–61.
Giltrow, Janet. 2010. Genre as difference: The sociality of linguistic variation. In Heidrun
Dorgeloh & Anja Wanner (eds.). Syntactic variation and genre, 29–51. Berlin: De Gruyter
Mouton.
Gotti, Maurizio. 2012. Variation in academic texts. In Maurizio Gotti (ed.). Academic identity
traits: A corpus-based investigation, 23–42. Bern: Peter Lang.
Gray, Bethany. 2013. Interview with Douglas Biber. Journal of English Linguistics 41(4).
359–379.
Gut, Ulrike & Christoph Schubert. 2012. Approaches to language variation: Introduction. In
Monika Fludernik & Benjamin Kohlmann (eds.). Anglistentag 2011 Freiburg: Proceedings,
3–9. Trier: WVT.
Halliday, Michael A. K. 1978. Language as social semiotic: The social interpretation of language
and meaning. London: Arnold.
Halliday, Michael A. K. & Ruqaiya Hasan. 1976. Cohesion in English. London: Longman.
Han, Huabing. 2010. On the methodology employed in ESP teaching under register theory. The
1st Asian ESP conference. [Special edition]. Asian ESP Journal, 158–163.
Hardy, Jack A. 2012. Filipino and American online communication and linguistic variation. World
Englishes 31(2). 143–161.
Hilbert, Michaela & Manfred Krug. 2012. Progressives in Maltese English: A comparison with
spoken and written text types of British and American English. In Marianne Hundt &
Ulrike Gut (eds.). Mapping unity and diversity world-wide, 103–136. Amsterdam:
Benjamins.
Lukin, Annabelle. 2010. ‘News’ and ‘register’: A preliminary investigation. In Ahmar Mahboob &
Naomi K. Knight (eds.). Appliable linguistics, 92–113. London: Continuum.
Matthiessen, Christian M. I. M. 1993. Register in the round: Diversity in a unified theory of
register analysis. In Mohsen Ghadessy (ed.). Register analysis: Theory and practice,
221–292. London: Pinter Publishers.
Moore, Nick. 2006. Advanced language for intermediate learners: Corpus and register analysis
for curriculum specification in English for academic purposes. In Heidi Byrnes (ed.).
Advanced language learning: The contribution of Halliday and Vygotsky, 246–264.
London: Continuum.
Neumann, Stella. 2012. Applying register analysis to varieties of English. In Monika Fludernik &
Benjamin Kohlmann (eds.). Anglistentag 2011 Freiburg: Proceedings, 75–94. Trier: WVT.
Neumann, Stella. 2013. Contrastive register variation: A quantitative approach to the
comparison of English and German. Berlin: Mouton de Gruyter.
Nkemleke, Daniel A. 2006. Some characteristics of expository writing in Cameroon English.
English World-Wide 27(1). 25–44.
Painter, Clare. 2001. Understanding genre and register: Implications for language teaching.
In Anne Burns & Caroline Coffin (eds.). Analysing English in a global context, 167–180.
London: Routledge.
Pollner, Clausdirk. 2005. English 0 – and drugs galore: Varieties and registers in Irvine Welsh’s
Trainspotting. In Gisela Hermann-Brennecke & Wolf Kindermann (eds.). Anglo-American
awareness: Arpeggios in aesthetics, 193–202. Münster: LIT.
Quinto-Pozos, David & Sarika Mehta. 2010. Register variation in mimetic gestural complements
to signed language. Journal of Pragmatics 42(3). 557–584.
Renkema, Jan. 2004. Introduction to discourse studies. Amsterdam: Benjamins.
Reppen, Randi. 2001. Register variation in student and adult speech and writing. In Susan
Conrad & Douglas Biber (eds.). Variation in English: Multi-dimensional studies, 187–199.
Harlow: Pearson Education.
Rühlemann, Christoph. 2008. A register approach to teaching conversation: Farewell to
Standard English? Applied Linguistics 29(4). 672–693.
Sardinha, Tony Berber & Marcia Veirano Pinto (eds.). 2014. Multi-dimensional analysis, 25
years on: A tribute to Douglas Biber. Amsterdam: Benjamins.
Schneider, Edgar W. 2007. Postcolonial English: Varieties around the world. Cambridge:
Cambridge UP.
Schneider, Klaus P. & Anne Barron (eds.). 2008. Variational pragmatics: A focus on regional
varieties in pluricentric languages. Amsterdam: Benjamins.
Schubert, Christoph. 2012. Englische Textlinguistik: Eine Einführung. 2nd edn. Berlin: Erich
Schmidt.
Schutz, Natassia. 2013. How specific is English for academic purposes? A look at verbs
in business, linguistics and medical research articles. In Gisle Andersen & Kristin
Bech (eds.). English corpus linguistics: Variation in time, space and genre, 237–257.
Amsterdam: Rodopi.
Summers, Della et al. (eds.). 2005. Longman dictionary of contemporary English. Harlow:
Pearson Education Limited.
Syrquin, Anna F. 2006. Registers in the academic writing of African American college students.
Written Communication 23(1). 63–90.
Szmrecsanyi, Benedikt & Bernhard Wälchli (eds.). 2014. Aggregating dialectology, typology,
and register analysis: Linguistic variation in text and speech. Berlin: de Gruyter.
Taavitsainen, Irma. 2001. Language history and the scientific register. In Hans-Jürgen Diller
& Manfred Görlach (eds.). Towards a history of English as a history of genres, 185–202.
Heidelberg: Winter.
Teich, Elke. 2003. Cross-linguistic variation in system and text. Berlin: Mouton de Gruyter.
Teich, Elke. 2009. Scientific registers in contact: An exploration of the lexico-grammatical
properties of interdisciplinary discourses. International Journal of Corpus Linguistics
14(4). 524–548.
Teich, Elke. 2010. Exploring a corpus of scientific texts using data mining. In Stefan Th. Gries,
Stefanie Wulff & Mark Davies (eds.). Corpus-linguistic applications: Current studies, new
directions, 233–247. Amsterdam: Rodopi.
Trudgill, Peter. 2000. Sociolinguistics: An introduction to language and society. 4th edn.
London: Penguin.
Trumble, William R. & Angus Stevenson (eds.). 2002. Shorter Oxford English dictionary on
historical principles. 2 vols. Oxford: Oxford UP.
Van Rooy, Bertus, Lize Terblanche, Christoph Haase & Joseph Schmied. 2010. Register
differentiation in East African English: A multidimensional study. English World-Wide
31(3). 311–349.
Venour, Chris, Graeme Ritchie & Chris Mellish. 2011. Dimensions of incongruity in register
humour. In Marta Dynel (ed.). The pragmatics of humour across discourse domains,
125–144. Amsterdam: Benjamins.
Volden, Joanne. 2009. Bossy and nice requests: Varying language register in speakers with
autism spectrum disorder (ASD). Journal of Communication Disorders 42(1). 58–73.
Wälchli, Bernhard & Benedikt Szmrecsanyi. 2014. Introduction: The text-feature-aggregation
pipeline in variation studies. In Benedikt Szmrecsanyi & Bernhard Wälchli (eds.).
Aggregating dialectology, typology, and register analysis: Linguistic variation in text and
speech, 1–25. Berlin: de Gruyter.
Wardhaugh, Ronald. 2002. An introduction to sociolinguistics. 4th edn. Oxford: Blackwell.
Warner, Anthony. 2005. Why DO dove: Evidence for register variation in Early Modern English.
Language Variation and Change 17(3). 257–280.
Xiao, Richard. 2009. Multidimensional analysis and the study of World Englishes. World
Englishes 28(4). 421–450.
Zlatnar Moe, Marija. 2010. Register shifts in translations of popular fiction from English into
Slovene. In Daniel Gile, Gyde Hansen & Nike K. Pokorn (eds.). Why translation studies
matters, 125–136. Amsterdam: Benjamins.
Section I:
Specialised registers
The volume opens with five contributions discussing the lexico-grammatical
features of previously underdescribed registers, which are situated on different
levels in the hierarchy of specificity: web registers, medical discourse, Aviation
English, hip-hop and crossword puzzles. The first two registers comprise heterogeneous
sub-registers, as, for instance, a distinction is made among the web
registers between interviews, discussion forums, encyclopedia articles, adver-
tisements and recipes, while Aviation English is a twofold construct and hip-hop
and crossword puzzles constitute relatively uniform categories. All studies can be
situated within the analytical register framework described in Biber and Conrad
(2009) and examine to what extent their object of inquiry can be considered a
register or where the boundaries between more general categories and sub-
registers may be drawn. In addition, Dorgeloh’s contribution extends the model
by including the genre perspective in the analyses.
The first paper in the volume, Douglas Biber and Jesse Egbert’s study
“Towards a user-based taxonomy of web registers”, stands out from the other
papers’ corpus-based approaches by its use of a bottom-up design in which inter-
net users were asked to identify basic situational characteristics of web docu-
ments. These characteristics were then used to construct a hierarchical decision
tree, which permitted the successful categorisation of most internet texts by the
same type of informants in the next step. Among the most important results of
this study are the finding that some sub-registers might be easier to identify than
their superordinate category and the observation that a relatively large propor-
tion of registers on the internet can be considered hybrid with regard to their
communicative purposes.
Hybridity of either form, discourse function or both is also observed by
Heidrun Dorgeloh in her study “The interrelationship of register and genre in
medical discourse”, which finds hybridity in the three medical registers under
consideration: illness blogs, medical case reports and medical case presenta-
tions. She argues that the correlations between form and function in medical dis-
course are less linked to the communicative situation than to the type of activity
and concludes that the notion of genre should be conferred primacy over that of
(sub-)registers.
Markus Bieswanger, by contrast, applies a classical Biberian register analy-
sis to the field of air traffic communication in his paper “Aviation English: Two dis-
tinct specialised registers?”. While the term Aviation English is generally used to
designate both the standardised phraseology promoted by the International Civil
Aviation Organization and the plain English used in exceptional situations where
communicative needs transcend the routine repertoire, Bieswanger’s analysis of
authentic air traffic communication material manages to demonstrate that these
are actually two distinct registers and not just one register with two sub-registers.
While Dorgeloh’s and Bieswanger’s material-based approaches place a particular
focus on the qualitative analysis of their data in order to explore the boundaries
of their particular register(s), the remaining two studies represent quantitative
corpus-based studies of specialised corpora.
Rolf Kreyer’s contribution, “‘Now niggas talk a lotta Bad Boy shit’: The reg-
ister hip-hop from a corpus-linguistic perspective”, targets a question similar to
Bieswanger’s, namely whether hip-hop lyrics should be considered a sub-register
of pop song lyrics. Based on a corpus of lyrics from the top albums in the US
album charts in 2003 and 2011, Kreyer contrasts a hip-hop sub-corpus, consisting of lyrics
by rappers and hip-hoppers, with the lyrics from the remaining albums. His analyses
yield differences regarding the semantically annotated content and some non-
standard spellings but particularly regarding the absence of the copula. Kreyer
therefore concludes that the language used in hip-hop can be considered a regis-
ter in its own right.
The section closes with Teresa Pham’s corpus analysis entitled “The register
of English crossword puzzles: Studies in intertextuality”, in which she reaches
the conclusion that cryptic and non-cryptic puzzles constitute sub-registers of
the general register of crossword puzzles. The differences with regard to the use
of intertextuality between the two types of crossword puzzle suggest the addition
of intertextuality to the list of linguistic features that can be used to distinguish
registers from each other in the Biberian framework.
Douglas Biber and Jesse Egbert
Towards a user-based taxonomy of web
registers
Abstract: There is a well-established need for a comprehensive taxonomy of
English web registers grounded in the actual experiences of end-users. In this
paper, we introduce a new grant-funded initiative aimed at filling this gap. We
first describe the methods used to develop a hierarchical web register framework
and introduce our bottom-up, user-based method of web register classification.
Using a hierarchical decision tree, four raters each classified a large sample of webpage URLs
(N = 1,000) into register and sub-register categories. The
results indicate that the approach can be effectively used to identify the register
category for most internet texts, although the results also show that many texts
belong to ‘hybrid’ registers. The primary goals of the paper are to present the
overall distribution of internet texts across general registers, sub-registers and
‘hybrid’ registers, and to discuss some of the key characteristics of the major reg-
ister categories. We conclude with a discussion of challenges and future direc-
tions for web register research.
1 Introduction
There is a mind-boggling amount of information available on the World Wide
Web. For example, Fletcher (2012: 1) estimates that Google indexes about 40
billion webpages. Although not its intended purpose, the WWW also provides
a tremendous resource for linguists, who can use the web as a corpus to investigate
linguistic patterns of use. This approach has become so prevalent that the
acronym WAC (Web-as-Corpus) is now commonplace among researchers
who explore ways to mine the WWW for linguistic analysis.
One of the major challenges for WAC research is that a typical web search
usually provides us with no information about the kinds of texts investigated. For
example, Fletcher notes that a linguistic search of the Web-as-Corpus will tell us
nothing about:
For whom and what purpose is the text intended? What […] target audience does it repre-
sent? Was it written carefully or carelessly by a native speaker, or is it an unreliable transla-
tion by man or machine? Is the document authoritative – accurate in content and represent-
ative in linguistic form? (2012: 1341)
Similar problems were noted a decade earlier by Kilgarriff and Grefenstette (2003)
in their introduction to a special issue of Computational Linguistics on WAC. Thus,
they write:
“Text type” is an area in which our understanding is, as yet, very limited. Although further
work is required irrespective of the Web, the use of the Web forces the issue. Where research-
ers use established corpora, such as Brown, the BNC, or the Penn Treebank, researchers and
readers are willing to accept the corpus name as a label for the type of text occurring in it
without asking critical questions. Once we move to the Web as a source of data, and our
corpora have names like “April03-sample77,” the issue of how the text type(s) can be char-
acterized demands attention. (2003: 343)
These concerns are shared widely among WAC researchers, and as a result, there
has been a surge of interest over the last several years in Automatic Genre Identi-
fication (AGI): computational methods using a wide range of descriptors to auto-
matically classify web texts into genre classes. The typical methodology used in
an AGI study is to manually identify the genre (or register) of selected internet
texts and to then test the extent to which computer programs can automatically
place those texts into the same categories. However, although some studies have
achieved high accuracy rates (e.g., Lindemann and Littig 2010; Santini 2010),
serious questions have been raised about the validity of those results. First, some
scholars raise doubts about the representativeness of the web corpora analysed
in previous AGI studies: researchers often disregard the question of whether the
sample used in an AGI study represents the full population of internet texts (see
discussion in Santini and Sharoff 2009).
There have also been questions raised about the actual genre/register cate-
gories that we are trying to predict. Most studies have followed the same general
procedure: they first begin with a list of possible genre categories; then internet
texts are manually classified into those categories by an ‘expert’; and then com-
putational methods are used to determine whether those genre categories can
be automatically predicted. This approach is based on two assumptions: 1) that
researchers have identified the ‘correct’ set of possible genre/register categories
found on the web, based on a priori intuitive consideration of internet texts; and
2) that a single expert user is able to ‘correctly’ identify the genre/register cat-
egory of individual internet texts. Unfortunately, neither assumption seems to
be warranted. The few cases where inter-rater reliability is reported have shown
that it tends to be quite low, even for linguists. This is especially true for corpora
composed of randomly extracted web texts (see discussion in Sharoff, Wu, and
Markert 2010). Given the problems that ‘experts’ have identifying web genre cat-
egories, it is not surprising that non-expert web users also vary in their under-
standing of genre labels (see Crowston, Kwaśnik, and Rubleske 2010) and that
reliability among lay users is often unacceptably low (Rosso and Haas 2010).
More importantly, though, it is not clear that the genre categories being pre-
dicted in AGI studies are actually valid. This problem has been recognised and
discussed in previous research; thus, for example, Rehm et al. (2008: 352) note:
One of the most important problems concerns the elusiveness of the concept of genre. The
consequence is that, in practical terms, genre researchers usually have different ideas of
what a genre is, how genres should be defined and identified and, therefore, they use dif-
ferent genre labels in their approaches.
A few years ago, there was considerable effort to agree on a standard set of
register/genre categories for AGI research, as part of a wiki-based collaboration
among Web-as-Corpus experts (http://www.webgenrewiki.org/). That collaborative
effort resulted in a list of 78 register/genre distinctions, but the initiative
appears to have faded out in the last few years, with little consensus regarding the
relative status of those categories. As a result, there is still no generally agreed-on
set of register/genre categories used in current AGI research. (In the remainder
of this paper, we use the term ‘register’ rather than ‘genre’ to refer to situational-
ly-based textual distinctions, following the research tradition developed in Biber
1995, Biber et al. 1999, Biber and Conrad 2009, etc.).
In the present study, we tackle this problem with a completely different
approach: instead of relying on expert coders, we recruit typical end-users of
the web for our register analyses, assessing the degree of agreement among
those users. Most importantly, we do not force users to choose directly from a
pre-defined set of register categories. Rather, we ask users to identify basic situational
characteristics of each web document, coded in a hierarchical manner
(see below). Those situational characteristics lead to general register categories,
which in turn allow users to select a specific sub-register category. By working
through a hierarchical decision tree, users are able to identify the register category
of most internet texts with a high degree of reliability.
In Section 2 below, we briefly document the methodological procedures used
for this project. (Readers are referred to Egbert and Biber 2013 for more detailed
discussions.) In Section 3, we introduce the register framework used for our study.
In Section 4, then, we describe the overall prevalence of different types of regis-
ters on the web and briefly describe and illustrate some of the major web regis-
ters identified in the study. Section 5 discusses a more specialised type of register
identified by users in this study: ‘hybrid registers’. Finally, in the conclusion we
outline our on-going research to extend this methodological approach to a large
representative corpus of web documents.
2 Methods
The corpus used for our study was extracted from the Corpus of Global Web-based
English (GloWbE), constructed by Mark Davies (see http://corpus2.byu.edu/glowbe/).
The entire corpus contains ca. 1.9 billion words and 1.8 million
web pages, collected by using the results of Google searches of highly frequent
English 3-grams (e.g., is not the, and from the). The use of n-grams as search
engine seeds is an approach that has been used in the past by many WAC schol-
ars (see, e.g., Baroni and Bernardini 2004; Baroni et al. 2009; Sharoff 2005, 2006).
Our decision to use 3-grams (rather than 2-grams or 4-grams) was based largely
on empirical evidence from the Longman Grammar of Spoken and Written English
(Biber et al. 1999). 2-grams are generally collocations that are semantically based
and likely to result in topic-driven Google search results. 4-grams, on the other
hand, are much less frequent than 3-grams and were thus not likely to offer us
a broad enough sample of n-grams to choose from. To create the actual corpus,
the web pages identified through these random searches were downloaded
using HTTrack (http://www.httrack.com). Our ultimate goal in this project is to
carry out linguistic analyses of internet texts from the range of web registers. To
prepare the corpus for such analyses, non-textual material was removed from all
web pages (HTML scrubbing and boilerplate removal) using JusText (http://code.google.com/p/justext).
Finally, for the present pilot study, we randomly extracted
1,000 web pages from the larger corpus (with URLs from the US, UK, CA, AU, NZ).
Roughly 7 % of the web pages in this initial sample were dropped from the reg-
ister analysis: 33 of the 1,000 web sites in the corpus were no longer available at
the time of coding and an additional 36 web pages consisted mostly of photos or
graphics. Consequently, the results reported below are based on a corpus of 931
web pages.
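The seeding step can be illustrated with a short Python sketch that counts word 3-grams in a text and keeps the most frequent ones as candidate search queries. This is a minimal sketch under our own assumptions: the function names, the lower-casing and whitespace tokenisation, and the toy sample text are hypothetical. The study itself chose its 3-grams on the basis of the frequency evidence in the Longman Grammar, and its queries were submitted to Google, which this code does not do.

```python
from collections import Counter
from typing import Iterator, List, Tuple


def trigrams(tokens: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield every run of three consecutive tokens."""
    return zip(tokens, tokens[1:], tokens[2:])


def top_trigram_seeds(text: str, k: int = 3) -> List[str]:
    """Return the k most frequent word 3-grams as space-joined query strings.

    Assumption for illustration: tokens are lower-cased, whitespace-separated words.
    """
    tokens = text.lower().split()
    counts = Counter(trigrams(tokens))
    return [" ".join(gram) for gram, _ in counts.most_common(k)]


# Hypothetical toy text, not data from the study.
sample = ("this is not the end and from the start this is not the way "
          "and from the top this is not the point")
print(top_trigram_seeds(sample, k=3))  # prints ['this is not', 'is not the', 'and from the']
```

Ties in frequency come back in first-occurrence order, since `Counter.most_common` orders elements with equal counts in the order they were first encountered.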
The study described here is part of a larger project, designed to identify the reg-
isters found on the web, document the extent to which each of those registers is
actually used and ultimately undertake comprehensive linguistic analyses of those
register categories as the basis for automatic register and genre identification.
The first step required to reach these goals was to establish a set of regis-
ter distinctions that end-users actually recognise and can reliably identify. This
step turned out to be highly challenging, requiring several rounds of pilot testing
with end-users. In the process, we reconsidered our basic approach, developing
a decision tree of situational characteristics rather than asking users to directly
identify the register category of a given internet text. We discuss these register
distinctions, and the development of a web classification tool, in Section 3 below.
Once we had developed this tool, and verified that end-users were able to
reliably identify the register distinctions built into the tool, we moved on to the
larger pilot study to explore the types and distributions of registers found on the
web. We recruited 85 raters (typical end-users of the web) to analyse the 1,000
web pages in our pilot corpus. Raters were recruited through Mechanical Turk.
Mechanical Turk is an Amazon-based online crowd-sourcing utility that connects
Requesters – or people who need small tasks completed by human raters – with
Workers – or people who are willing to complete those small tasks for money.
Each web page was coded by four independent raters, so we were able to analyse
the reliability of the coding. We determined that four was the optimal number of
raters as a result of several rounds of pilot research. The choice to use 1,000 URLs
was based mostly on practicality and the money available to us. While there was
consensus on the coding of the majority of pages, this approach also allowed us
to identify the existence of ‘hybrid registers’ (see Section 5 below). Finally, we
compiled distributional results from the coding, providing the basis for our pre-
liminary description of register variation on the web (Sections 4–5).
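Because every page received four independent codes, each page falls into one of a few agreement patterns: three or four raters agree, a 2-2 split, a 2-1-1 split, or no two raters agree. A minimal Python sketch of that classification follows; the function names and the toy codings are hypothetical and are not the study's data.

```python
from collections import Counter
from typing import Dict, List


def agreement_pattern(codes: List[str]) -> str:
    """Classify four raters' codes for one web page by the sizes of agreeing groups."""
    if len(codes) != 4:
        raise ValueError("expected exactly four codes per page")
    sizes = sorted(Counter(codes).values(), reverse=True)
    if sizes[0] >= 3:
        return "agreement"      # 3 or all 4 raters chose the same category
    if sizes == [2, 2]:
        return "2-2 split"
    if sizes == [2, 1, 1]:
        return "2-1-1 split"
    return "no agreement"       # 1-1-1-1: all four raters chose differently


def tally_patterns(pages: List[List[str]]) -> Dict[str, int]:
    """Count how many pages show each agreement pattern."""
    return dict(Counter(agreement_pattern(codes) for codes in pages))


# Hypothetical codings for four pages.
pages = [
    ["Narrative", "Narrative", "Narrative", "Opinion"],
    ["Narrative", "Narrative", "Opinion", "Opinion"],
    ["Opinion", "Opinion", "Narrative", "Spoken"],
    ["Lyrical", "Lyrical", "Lyrical", "Lyrical"],
]
print(tally_patterns(pages))
```

Dividing each tally by the number of coded pages gives proportions of the kind reported below for general registers and sub-registers.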
nicative purpose; see Biber and Conrad 2009, Chapter 2), and based on that anal-
ysis, we developed a framework with the eight general registers shown in Table 1.
In our early pilot studies, we asked non-expert users of the internet to categorise
web pages by directly identifying the register category of each page. However, this
approach proved problematic, in some cases achieving agreement rates below
50 %. As a result, we developed a more bottom-up approach involving a deci-
sion tree with basic situational characteristics. At the top level, we asked users to
make a 2-way decision about the mode of production:
1. Internet texts that originated in the spoken mode (e.g., transcripts of speeches
or interviews)
2. Internet texts that originated in the written mode
Then, for the written texts, we asked users to distinguish between interactive dis-
cussions (e.g., discussion forums) versus non-interactive internet texts. Even this
simple distinction is often not clear-cut on the web, because authored web documents
are often followed by reader comments. We thus made it clear to coders
that ‘written interactive discussions’ are distinct from written documents fol-
lowed by reader comments, and that coders would be able to note the existence
of reader comments for non-interactive texts later in the process. These reader
comments are common in web documents. While we do not currently have plans
to classify documents with reader comments differently than those without com-
ments, coding for their presence makes this a possibility for future analyses.
For the first two general categories above (spoken and interactive written), we
immediately asked coders to identify a specific sub-register (see Table 2 below).
In both cases, users could select ‘other’ if the page did not fit clearly into one of
the existing categories.
Then, once a user had selected one of those general categories (2.a.–2.f. in the list
above), we asked them to identify the specific sub-register. The full list of general
register and specific sub-register distinctions in our framework is given in Table
2 below.
Table 2 (continued)
5. express OPINION
– opinion blog
– review (product, service, movie, etc.)
– advice
– religious blog/sermon
– advertisement
– self-help
– letter to the editor
– other (opinion)
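The coding procedure amounts to a short decision tree: mode of production first, then (for written texts) interactive versus non-interactive, then communicative purpose, and finally a sub-register. The Python sketch below is a simplified illustration under our own assumptions. The function and parameter names are hypothetical, the six purpose-based registers for non-interactive written texts are inferred from the eight general registers named in this chapter, and only the sub-register lists given for Opinion in Table 2 and for How-to/Instructional and Spoken in Table 6 are included.

```python
from typing import Dict, List, Optional, Tuple

# Partial sub-register lists: only those shown for Opinion in Table 2 and for
# How-to/Instructional and Spoken in Table 6 are reproduced in this sketch.
SUB_REGISTERS: Dict[str, List[str]] = {
    "Spoken": ["interview", "transcript of video/audio", "TV/movie script"],
    "Opinion": ["opinion blog", "review", "advice", "religious blog/sermon",
                "advertisement", "self-help", "letter to the editor"],
    "How-to/Instructional": ["how-to", "technical support", "recipe",
                             "instructions", "FAQ"],
}

# Assumed: the six purpose-based registers for non-interactive written texts.
WRITTEN_NON_INTERACTIVE = {
    "Narrative", "Informational Description/Explanation", "Opinion",
    "Informational Persuasion", "How-to/Instructional", "Lyrical",
}


def classify(spoken: bool, interactive: bool = False,
             purpose: Optional[str] = None,
             sub: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Walk the decision tree and return (general register, sub-register)."""
    if spoken:                                  # level 1: mode of production
        general = "Spoken"
    elif interactive:                           # level 2: written interactive discussion
        general = "Interactive Discussion"
    elif purpose in WRITTEN_NON_INTERACTIVE:    # level 3: communicative purpose
        general = purpose
    else:
        raise ValueError(f"unknown purpose for a written text: {purpose!r}")
    options = SUB_REGISTERS.get(general)
    if options is None:     # sub-register list not reproduced here; keep the choice as given
        return general, sub
    # level 4: a sub-register outside the listed options is coded as 'other (...)'
    return general, (sub if sub in options else f"other ({general.lower()})")


print(classify(spoken=False, purpose="Opinion", sub="review"))
```

Asking for situational characteristics first and a sub-register last mirrors the move away from asking raters to name a register directly.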
Table 3: Agreement results for the general register classification of 931 webpages
Table 4 shows that the levels of agreement were somewhat lower for the coding of
specific sub-register categories: raters were able to agree on the sub-register for
ca. 43 % of the web pages (with 3 or all 4 raters in agreement), while an additional
ca. 8 % of these pages were coded with a 2-2 split.
Table 4: Agreement results for the specific sub-register classification of 931 webpages
Taken together, the distributional results from the pilot study show that non-
expert web users can, to a large extent, reliably classify web pages into general
register categories, and that there is substantial agreement even for specific
sub-register categories.
The data obtained from this coding process allow us to begin to explore the
content of the web, asking what registers are especially prevalent and which ones
are relatively rare. Thus, Table 5 shows the breakdown of general register cate-
gories (presented in order of frequency) for all 931 texts in our corpus (see Table
3 above). Table 6 shows the breakdown of specific sub-registers within each of
these general register categories.
General Register # %
Register # %
Narrative 177
Table 6 (continued)
Register # %
Opinion 121
Interactive Discussion 79
How-to/Instructional 27
How-to 13 48.1
Technical support 2 7.4
Recipe 1 3.7
Instructions 0 0
FAQ 0 0
No agreement on sub-register 11 40.7
Table 6 (continued)
Register # %
Lyrical 19
Informational Persuasion 15
Spoken 6
Interview 5 83.3
Transcript of video/audio 1 16.7
TV/movie script 0 0
No agreement on sub-register 0 0
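The percentages in Table 6 express each sub-register count as a share of its general register total; for example, 13 of the 27 How-to/Instructional pages gives 48.1 %. A minimal Python sketch of that calculation, using the counts from the table (the function name is hypothetical):

```python
from typing import Dict, Tuple


def sub_register_shares(counts: Dict[str, int]) -> Dict[str, Tuple[int, float]]:
    """Map each sub-register to (count, percentage of the general register total),
    rounded to one decimal place as in Table 6."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("general register has no texts")
    return {name: (n, round(100 * n / total, 1)) for name, n in counts.items()}


how_to = {"How-to": 13, "Technical support": 2, "Recipe": 1,
          "Instructions": 0, "FAQ": 0, "No agreement on sub-register": 11}
spoken = {"Interview": 5, "Transcript of video/audio": 1,
          "TV/movie script": 0, "No agreement on sub-register": 0}

for name, (n, pct) in sub_register_shares(how_to).items():
    print(f"{name:<30}{n:>4}{pct:>7.1f}")
```

Summing the sub-register counts recovers the general register totals shown in the table (27 for How-to/Instructional, 6 for Spoken).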
Based on the data in our pilot corpus, the most common general internet register
is Narrative (19 % of the texts in our corpus; see Table 5). Table 6 shows that ca.
65 % of the texts in this general register were classified as either News report/
blogs or Sports reports/blogs. Many of these texts are examples of registers found
in print media that have simply been transferred to the web. At first we planned
to distinguish news/sports blogs, which have their origin on the web, from news/
sports reports that have their origin in print media. In practice, though, it proved
nearly impossible to determine whether a news/sports report was originally pub-
lished in a print newspaper or whether it had been written specifically for a web
blog. As a result, we treat these reports and blogs as a single category (although
it was generally easy for raters to distinguish between news reports/blogs versus
sports reports/blogs, based on the topic of the text).
The second most frequent general register is Informational Description/
Explanation (15 % of the texts in our corpus; see Table 5). However, as Table 6
shows, raters often failed to agree on the specific sub-register for this general
category (52 % of the texts in this register). In future research, we plan to investigate the
possibility of hybrid registers at the sub-register level to better understand the
nature of these texts.
Opinion web pages were nearly as common as description pages (see Table 5).
Nearly half of these were classified as Opinion blogs (47 %), while another 19 %
were classified as Reviews. In general, there was much higher agreement about
these sub-register categories of Opinion than there was for the general category
of Informational Description/Explanation.
The Interactive Discussion general register was also used relatively fre-
quently, and the majority of these texts were classified as Question/Answer
forums. Similar to blogs, these are specialised web registers not found in print
media.
The other four general register categories – Lyrical, How-to/Instructional,
Informational Persuasion and Spoken – occurred much less frequently than the
major categories of Narration, Informational Description/Explanation, Opinion
and Interactive Discussion. However, it is clear that these registers each comprise
one or two important sub-register categories. For example, the specific sub-regis-
ters of song lyrics and spoken interviews were especially prevalent.
While some of these general registers and sub-registers are very similar to
traditional print registers (e.g., News reports, Sports reports, Reviews, Research
articles, Song lyrics), many of them are unique to the domain of the internet. For
example, the sub-registers of Personal/diary blogs and Opinion blogs, as well
as the general register of Interactive Discussion are distinctive to the internet.
Furthermore, some of the web registers that appear to be traditional are actu-
ally quite different from their printed, non-internet counterparts. This is due to
several factors, including the relative ease of ‘publishing’ on the internet and
the reduced attention to pre-planning and editing that is common in many internet
registers. In future research, we plan to explore these innovative registers in
considerably more detail (see Section 6 below).
5 Hybrid registers
At the beginning of Section 4, we noted that many web pages were coded with a
2-2 split. For example, two raters might have coded a given page as a ‘narrative’,
while two other raters classified the same page as an ‘informational description/
explanation’. One interpretation of these splits is that they simply show a lack of
agreement among raters, reflecting a lack of reliability in the register framework.
However, the actual distribution of these pairings suggests a different interpreta-
tion.
In theory, there are 28 different 2-2 categories that could be formed by com-
bining the 8 general register categories in our framework. So, for example, there
are 7 different 2-2 categories that could have been formed by combining ‘narra-
tive’ with one of the other categories (narrative-spoken, narrative-interactive
discussion, narrative-informational description, narrative-opinion, narrative-in-
formation presented with the intent to persuade, narrative-how-to, narrative-lyr-
ical). Similarly, there are 21 other pairings of general registers that are theoreti-
cally possible.
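The number of possible pairings is the number of unordered pairs that can be drawn from 8 categories, 8 × 7 / 2 = 28, of which 7 include Narrative. A quick Python check with the standard library's `itertools.combinations`; the category labels follow the general registers named in this chapter:

```python
from itertools import combinations

GENERAL_REGISTERS = [
    "Narrative", "Spoken", "Interactive Discussion",
    "Informational Description/Explanation", "Opinion",
    "Informational Persuasion", "How-to/Instructional", "Lyrical",
]

# Every unordered pair of distinct general registers is a possible 2-2 split.
pairs = list(combinations(GENERAL_REGISTERS, 2))
with_narrative = [pair for pair in pairs if "Narrative" in pair]
others = len(pairs) - len(with_narrative)
print(len(pairs), len(with_narrative), others)  # prints: 28 7 21
```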
Given this fact, it is surprising that only four combinations of general registers
commonly occurred in 2-2 splits (see Table 7): Narrative+Informational Descrip-
tion, Narrative+Opinion, Informational Description+Opinion and Informational
Persuasion+Opinion. Other combinations occur in 2-1-1 splits (see Table 8). This
restricted set of commonly occurring register combinations suggests an alterna-
tive explanation for the lack of agreement among raters: rather than reflecting a
problem with the coding rubric, these common 2-2 combinations (and 2-1-1 com-
binations) can be interpreted as evidence that these texts belong to ‘hybrid’ reg-
isters – registers that combine the communicative purposes and other situational
characteristics of two or more general registers.
Evidence for this interpretation comes from the fact that these combina-
tions were identified by coders much more often than others. In particular, the
frequent hybrid combinations are restricted to four general register categories:
Narrative, Informational Description/Explanation, Opinion and Informational
Persuasion. These four general register categories are distinguished primarily by
their communicative purposes. For example, Table 7 shows that Narrative+Informational
Description occurred 43 times, accounting for ca. 41 % of all 2-2 splits.
Table 8 shows that Narrative+Description+Other also accounts for ca. 56 % of 2-1-1
splits, further supporting the existence of a hybrid register that combines these
purposes.
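Finding such recurring combinations rests on a simple tally: for every page with a 2-2 split, record which pair of general registers the two rater pairs chose, then count each pair and its share of all 2-2 splits (in Table 7, for instance, the 43 Narrative+Informational Description splits make up ca. 41 % of the total). A minimal Python sketch with hypothetical function names and toy codings, not the study's data:

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]


def two_two_pair(codes: List[str]) -> Optional[Pair]:
    """Return the alphabetically sorted register pair if four codes form a 2-2 split."""
    counts = Counter(codes)
    if len(codes) == 4 and sorted(counts.values()) == [2, 2]:
        first, second = sorted(counts)
        return (first, second)
    return None


def tally_pairs(pages: List[List[str]]) -> Dict[Pair, Tuple[int, float]]:
    """Count each register pair among 2-2 splits, with its share (%) of all 2-2 splits."""
    pairs = Counter(p for p in map(two_two_pair, pages) if p is not None)
    total = sum(pairs.values())
    return {pair: (n, round(100 * n / total, 1)) for pair, n in pairs.most_common()}


# Hypothetical codings for four pages.
pages = [
    ["Narrative", "Narrative", "Opinion", "Opinion"],
    ["Opinion", "Narrative", "Opinion", "Narrative"],
    ["Narrative", "Description", "Description", "Narrative"],
    ["Opinion", "Opinion", "Opinion", "Narrative"],   # 3-1: not a 2-2 split
]
print(tally_pairs(pages))
```

Sorting each pair makes Narrative+Opinion and Opinion+Narrative count as the same combination, regardless of rater order.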
Text Sample 1 illustrates a web page from the Daily Mail with combined Narra-
tive+Informational Description communicative purposes. Two raters coded the
sub-register of this text as a news report/blog and two other raters coded it as
a description of people. This text occurs online as a single web page (which is
still available on the web, despite its dated content). However, the text comprises
a series of topics, demarcated only by the use of ALL-CAPS. (The formatting of
the 8th paragraph is corrupted in the original version of the page online, since
THURSDAY nights and THE fashionable residents seem to begin new topics.) The
title of the page (It’s King Tony to see you, ma’am) seemingly relates only to the
first of these embedded topics. Such pages are common on the web (and perhaps
becoming more common in print media). They have no single topic or commu-
nicative purpose, except maybe to present a bunch of information that the author
happens to find interesting or amusing. The information in the page is sometimes
descriptive and sometimes narrative, resulting in the hybrid nature of such texts.
Text Sample 1:
<http://www.dailymail.co.uk/debate/columnists/article-316674/Its-King-Tony-maam.html>
<p> DON’T be taken in by claims that Tory chairman Liam Fox patched up the row over the
warning by Karl Rove --George Bush’s aide – that Michael Howard will never be allowed to
meet the President. Rove was “too busy” even to speak to Fox at the Republican convention,
let alone sit next to him during Bush’s speech, as was claimed.
<p> CHERIE BLAIR’S new job as ambassador for Britain’s 2012 Olympic bid has surprised
friends who cannot recall her interest in sport. She is being ‘coached’ by her new spin doctor
Jo Gibbons, a former Football Association aide.
<p> Gibbons is best friends with Jo Moore, the Labour aide who “coached” the former Trans-
port THURSDAY nights at London disco, Base 1, situated in a basement beneath the Tory
Party’s new HQ in Victoria Street, Westminster, are booming. The club has been “adopted”
by smart preppy males who work for the Conservatives and pop downstairs for a sweaty
session of high-energy dancing once a week. THE fashionable residents of Suffolk resort
Walberswick – including film-maker Richard Curtis and his partner Emma Freud, daughter
of ex-MP Clement – may be alarmed to learn the least fashionable member of the Cabinet
has moved in. Defence Secretary Geoff Hoon, the kind of man who wears knee-length socks
with open-toed sandals on his hols, is a new neighbour. Somehow he mingled with them
unnoticed at last week’s summer fete.
<p> THE death of spin has been greatly exaggerated. Labour HQ has sent out invitations to
MPs summoning them to a series of three all-day training sessions on how to ‘spin’ stories
to the media.
It is perhaps not surprising that such texts also often include opinionated purposes.
(Even Text Sample 1 could be interpreted in that way, although there are
few overt lexico-grammatical expressions of stance.) In particular, personal
blogs commonly combine narrative and opinionated purposes. For example,
Text Sample 2 was coded by two raters as a narrative-personal blog, and by two
raters as an opinion blog. A quick read through this text shows both purposes: it
begins with a narrative, but it also includes considerable discussion that could be
regarded as overt opinion (e.g., my gut feeling is; Here’s one good reason to do that; But
I’m already on-side with that argument. It’s time to convince people…; ‘Making the
internet happen’ shouldn’t be magic).
Text Sample 2:
<http://matthewsheret.com/2011/08/26/time-to-get-out-more/>
manufacture. But I’m already on-side with that argument. It’s time to convince people
who’ll have to live with those products and live alongside the places that produce them.
<p> Here’s another. Russell jokingly mentioned the ‘Google apprenticeship’ as a means of
answering some of the questions floating around the room to do with aspiration, but my
gut feeling is that you get people engaged with working in companies like Google when
you demystify the whole process. ‘Making the internet happen’ shouldn’t be magic that
someone else does anymore, it should be something we show off.
<h> Find me at
<h> Email me
Text Sample 3:
<http://beta.fool.com/leglamp/2012/11/09/get-a-leg-up-on-the-market/16123/>
<p> WEAKNESSES
<p> The payout ratio on the yield is 90 % , very high for a company that is not a REIT or a
master limited partnership.
<p> Their P/E is higher than the industry average and higher than the 15.63 P/E of competitor
Genuine Parts Company (NYSE: GPC )
<p> While they manufacture most of their steel wire in house, steel is their number one raw
material and fluctuations in steel prices are a continuing concern, according to their 10-K.
<p> Revenue from international operations dropped due to currency fluctuations. […]
1 This option is not applicable to written interactive discussions, which incorporate reader
comments by definition. We are not sure why transcribed texts of spoken events are not followed
by reader comments in our sub-corpus.
Towards a user-based taxonomy of web registers 37
Narrative 87 49.1 %
Opinion 86 61.4 %
Description 37 30.6 %
Informational Persuasion 12 80.0 %
How-to/Instructional 8 29.6 %
Lyrical 4 21.1 %
Spoken 0 0
Discussion 0 0
Total 234 --
Table 10: Extent to which each register category was identified as a simple register (3 or 4 raters
in agreement), as a hybrid category (2-2 or 2-1-1 splits), or by only 1 rater
registers (e.g., a sports blog with both narrative and opinionated purposes).
We also expect to find some common hybrid sub-register categories that bridge
general registers (e.g., a personal blog + opinion blog hybrid; or an editorial +
review hybrid). We would not argue that one or the other of these approaches is
correct; taken together, however, we hope that they will allow us to offer a more
comprehensive description of the incredible range of register variation found on the
web.
Acknowledgements
This material is based upon work supported by the National Science Foundation
under Grant No. 1147581. We also thank Anna Gates and Rahel Oppliger for their
help with the pilot testing of register classification schemes.
References
Baroni, Marco & Silvia Bernardini. 2004. BootCaT: Bootstrapping corpora and terms from the
web. In Proceedings of LREC 2004, 1313–1316. Lisbon: ELDA.
Baroni, Marco, Silvia Bernardini, Adriano Ferraresi & Eros Zanchetta. 2009. The WaCky wide
web: A collection of very large linguistically processed web-crawled corpora. Language
Resources and Evaluation 43(3). 209–226.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University
Press.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999.
Longman grammar of spoken and written English. London: Longman.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge
University Press.
Connors, Robert J. 1981. The rise and fall of the modes of discourse. College Composition and
Communication 32(4). 444–455.
Crowston, Kevin, Barbara Kwaśnik & Joseph Rubleske. 2010. Problems in the use-centered
development of a taxonomy of web genres. In Alexander Mehler, Serge Sharoff & Marina
Santini (eds.), Genres on the Web: Computational models and empirical studies, 69–84.
New York: Springer.
Egbert, Jesse & Douglas Biber. 2013. Developing a user-based method of web register
classification. In Stefan Evert, Egon Stemle & Paul Rayson (eds.), Proceedings of the 8th
Web as Corpus Workshop (WAC-8) @Corpus Linguistics 2013, 16–23.
Fletcher, William H. 2012. Corpus analysis of the World Wide Web. In Carol A. Chapelle (ed.),
Encyclopedia of applied linguistics, 1339–1347. Hoboken, NJ: Wiley-Blackwell.
Kilgarriff, Adam & Gregory Grefenstette. 2003. Introduction to the special issue on the Web
as Corpus. Computational Linguistics 29. 333–347.
Lindemann, Christoph & Lars Littig. 2010. Classification of Web sites at super-genre level.
In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the Web:
Computational models and empirical studies, 211–235. New York: Springer.
Rehm, Georg, Marina Santini, Alexander Mehler, Pavel Braslavski, Rüdiger Gleim, Andrea
Stubbe, Svetlana Symonenko, Mirko Tavosanis & Vedrana Vidulin. 2008. Towards a
reference corpus of Web genres for the evaluation of genre identification systems.
In Proceedings of the 6th Language Resources and Evaluation Conference, 351–358.
Marrakech, Morocco.
Rosso, Mark A. & Stephanie W. Haas. 2010. Identification of Web genres by user warrant.
In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the Web:
Computational models and empirical studies, 47–68. New York: Springer.
Santini, Marina. 2007. Characterizing genres of Web pages: Genre hybridism and
individualization. In Proceedings of the 40th Hawaii International Conference on System
Sciences (HICSS-40). Hawaii.
Santini, Marina. 2008. Zero, single, or multi? Genre of Web pages through the users’
perspective. Information Processing and Management 44(2). 702–737.
Santini, Marina. 2010. Cross-testing a genre classification model for the Web. In Alexander
Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the Web: Computational models
and empirical studies, 87–127. New York: Springer.
Santini, Marina & Serge Sharoff. 2009. Web genre benchmark under construction. Journal for
Language Technology and Computational Linguistics 25(1). 125–141.
Sharoff, Serge. 2005. Creating general-purpose corpora using automated search engine
queries. In Marco Baroni & Silvia Bernardini (eds.), WaCky! Working papers on the Web
as Corpus, 63–98. Bologna: Gedit.
Sharoff, Serge. 2006. Open-source corpora: Using the net to fish for linguistic data.
International Journal of Corpus Linguistics 11(4). 435–462.
Sharoff, Serge, Zhili Wu & Katja Markert. 2010. The Web library of Babel: Evaluating genre
collections. In Proceedings of the Seventh Language Resources and Evaluation Conference,
LREC 2010. Malta.
Vidulin, Vedrana, Mitja Luštrek & Matjaž Gams. 2009. Multi-label approaches to Web genre
identification. Journal for Language Technology and Computational Linguistics 24(1).
97–114.
Heidrun Dorgeloh
The interrelationship of register and genre
in medical discourse
Abstract: This chapter is concerned with medical discourse which is produced
beyond the established roles of doctors and patients. The text varieties investigated
are all somewhat hybrid, either in form, discourse function, or both. A
study based on a small corpus of these texts investigates the presence of features
from a narrative discourse mode and finds variable relationships of textual form
and textual function, which are then discussed from a genre as well as from a
register perspective. While it turns out that the narrative register cuts across
specific discourse activities, the genre perspective can explain the
nature of this textual variation. It accounts for the pervasiveness of linguistic features
but, more importantly, for the variant discourse functions which apply to
the verbalisation of medical experience. In such cases, it is argued, a genre analysis
logically subsumes and pre-determines a register analysis.
1 Introduction
Medicine uses a variety of texts since it is both an “area of knowledge […] and the
applied practice of that knowledge to medical praxis” (Gotti and Salager-Meyer
2006: 9). Accordingly, most linguistic research on medical discourse focuses
either on written genres of the medical profession, such as case reports or medical
research articles, or on the speech of medical practitioners and their patients, i.e.
on medical encounters or interviews. By contrast, the present study is concerned
with text varieties in medicine which are produced beyond the established roles
of both speaker groups. It deals with illness blogs, on the one hand, and medical
case presentations, including some innovative forms, on the other. These constitute,
in line with the purpose of the present volume (cf. Schubert, this volume),
less established and more hybrid forms of medical case writing and thus provide
good cases in point for illustrating new directions in register research. In particular,
I will argue for a close interrelationship between register and genre as well as
for the primacy of the notion of genre, rather than (sub-)register.
As laid down in the introduction to this volume, register and genre are different
perspectives for analysing text variety: the register perspective considers
functional correlations of linguistic co-occurrence patterns with variables from
the situation of use, while the genre perspective refers to properties of entire texts
and has a conventional basis (Biber and Conrad 2009: 15; also Schubert, this
volume). It follows from this distinction that a register analysis rests upon quantitative
co-occurrence patterns in a given situation, whereas genre characteristics
can actually be quite rare. They contribute to the rhetorical organisation of a text,
often occurring only once or in a particular position (Biber and Conrad 2009: 16).
Since textual variation can in principle refer to any level of text classification
(Biber 2006: 12), other approaches to register and genre point out that the concepts
also differ in the level of generality at which they determine situational varieties
(Giltrow 2010; Dorgeloh and Wanner 2010). The concept of a genre focuses
primarily on the discourse goals and purposes (e.g. Martin and Rose 2003; Swales
2004), on the kind of “social action” (Miller 1984); therefore the classification
is typically more specific for genres than for registers (Giltrow 2010: 30). More
specialised text varieties are also referred to as “sub-registers” (cf. Biber and Gray
2013), but genre studies have emphasised that the textual or social event is an
important basis for text classification, thus subsuming in one category a co-patterning
of setting, structure, and function (Richards and Schmidt 2002: 224). I will
argue here that for text varieties of medical discourse, which are often marked by
“discourse hybridity” (Sarangi and Roberts 1999; Sarangi 2001; also cf. Biber and
Egbert, this volume), a genre perspective in line with these approaches covers
the relevant linguistic patterns at a sufficient level of specificity. In particular, I
will show that the form-function correlations that one finds have more to do with
activity types, such as those covered by the concept of genre, than with general
situational parameters.
The case studies presented below contrast with more recently developing
medical genres. The aim of the analysis is to show that, on the one hand, there
are general discourse goals and purposes within medical discourse, notably
narration, which cut across all the texts investigated. The resulting language
variation is covered by the register perspective, since it defines a rather general,
presumably universal, register pattern (Biber and Conrad 2009: 259). On the other
hand, in a given genre this pattern serves more specific discourse goals, which
are expressed by features that need be neither frequent nor pervasive. For example,
the interactional hybridity of a medical encounter includes a narrative discourse
type, but this type is embedded within a more complex social event, in which a
doctor fulfils several tasks such as data gathering, relationship building, and educating
the patient about diagnoses and treatment (Frankel 2000: 85; also Maseide
2003). This variation within one activity produces more hybrid registers. In such
cases, the genre perspective has clear advantages over the register perspective,
since it focuses on the social activities going on and hence provides text classification
at a rather low level of generality. However, this means that the concept of
genre must be taken beyond the limits of rhetorical conventions.
The chapter is structured as follows: in Section 2, I offer a more detailed
consideration of the concepts of register and genre as categories for text classification
from a theoretical point of view. Section 3 introduces three varieties
of medical discourse: on the one hand, it describes how they are situated with
regard to a general narrative dimension of textual variation (level of form); on the
other, the texts are discussed as instantiating different genres (level of discourse
function and social activity). The resulting profiles of the three functional varieties
show that the sample texts investigated are all hybrid in form, function,
or both. This complex picture is typical of the domain of medicine, and it can
best be understood from the genre perspective. Based on these profiles, an analysis
of characteristic form-function relations within the medical register, in particular
with regard to narrative features, is provided in Section 4, followed by a concluding
discussion in Section 5.
2.1 Register and genre in the context of the study of language variation1
principle also involves “different ways of saying different things” (Halliday 1978:
35; emphasis added). As a result, the study of textual variation deals with “variation
in verbalization [which] is not occasional [… but] UBIQUITOUS” (Croft 2010:
10; emphasis in the original).
This difference allows for some insights regarding the nature of both registers
and genres. Rosenbach (2002: 77) proposes the attribute “choice-based” for
this type of linguistic variation, in contrast to the “variation-based” perspective,
which concentrates on sets of formal variants. The study presented here, and
in fact the entire volume, belongs to the choice-based, “text-linguistic” tradition
(Biber 2012: 12), which means that the texts themselves are the target of the
description and not a predictor for the occurrence of formal variants.2 It follows
from this approach that register and genre differences are typically “not categorical
(such that one variety has a certain grammatical element or syntactic construction
which another has not)” (Kortmann 2006: 603); instead, the choices
motivated and reflected by them are “meaningful choices”, in the sense of
serving “the […] needs of the language user” (Schulze 1998: 7). As shown below,
this applies not only to the occurrence of individual linguistic features, but also
to entire patterns of textual form, which can be shared by what are nonetheless
distinct text varieties.
Another consequence of the “polyvalent” nature of “grammatical structure in
discourse” (Sankoff 1988: 141; emphasis in the original) is that genres, but not registers,
are in principle formally “underdetermined” (Giltrow and Stein 2009: 3).
Only by virtue of their being “typified responses to situations” (Salmon 2010: 219)
do users of a genre generally know what to expect and infer “both the stable and
variable aspects of form” (Salmon 2010: 223). For the linguistic variation taking
place within them, this means that the genre perspective includes both frequently
occurring features and patterns that occur less pervasively; i.e. the genre
perspective logically subsumes, rather than opposes, the register perspective.
focuses primarily on the discourse goals and purposes, including “culturally recognized”
patterns (Coupland 2007: 15) for realising them. As a result, the level of
genre classification tends to be lower, i.e. more specific, suggesting that genres
can, and typically do, contrast in registers, for example when requiring a certain
level of formality or technicality. Use of a certain register is therefore a function
of, but not a sufficient condition for, a genre, i.e. the genre perspective is the more
encompassing one.
In the text-linguistic tradition, discourse goals and purposes have also led
to the establishment of text typologies, which often integrate basic rhetorical
types (e.g. Kinneavy 1971; Werlich 1976). The text or discourse types here refer to
entire texts, but this tradition is still rather separate from genre analysis, if only
because the two “feature in different studies” (Virtanen 2010: 55). By
contrast, corpus-linguistic work (e.g. Biber 1988, 1989) understands text types
as “co-occurrence variables” (Eckert and Rickford 2001: 5), i.e. these text types
are, much like registers, the outcome of a classification based on linguistic form
(Biber 1988: 170). It is a central insight from this corpus-based tradition that genre
distinctions do not “adequately represent the underlying text types” (Biber 1989:
6). This finding further supports the position that genres are to a certain
extent underdetermined by, and hence independent of, their form.
The category of discourse type, in contrast to text type, refers more directly to
the function of a discourse (Virtanen 2010: 57), but, in contrast to the discourse
goal pertaining to a genre, this has traditionally meant a discourse classification
based on a limited set of functions, for instance on a classification of illocutions
(e.g. Brinker 2005). It is an important insight from this kind of work that the functional
discourse types are related in different ways to their linguistic form, since
a discourse type can express its function more or less directly (Virtanen 1992a,
2010). Narrative structures, in particular, have been noted to have primary or secondary
uses, i.e. they are a textual pattern that “can be put to use in very different
genres” (Virtanen 2010: 76).3
The analysis of medical texts presented here rests upon such a principled
separation of linguistic form, i.e. register features and text structure, and dis-
course function. A classification by discourse function leads, at a more general
level, to the identification of the discourse type; at a more specific level, it results
in genres. The analysis is also based on the assumption that the category of “narrative”
refers both to a very basic and presumably universal register and text type
(Virtanen 1992a; Biber and Conrad 2009) as well as to a widely used discourse
type or meta-genre (Fludernik 1996; Smith 2003). In the domain of medicine,
3 Werner (this volume), for example, notes the narrative properties of online text commentaries.
both narrative form and function play a prominent role, since knowledge in this
discipline is not just expertise, i.e. “relevant biological and pathological information”,
but is primarily evidence based on human experience (Hunter 1991: 8).
It is interesting to note in this context that recent discussions on medical discourse
have argued quite explicitly in favour of a more “narrative” kind of medicine
(e.g. Charon 2006), emphasising the importance of the individual patient
and his or her experience. As a result, there are now genres within the medical
register which are innovative particularly with respect to the role of narration.
While proper storytelling is absent from professional medical reporting, there are
now other types of medical discourse which are more open to narration. This difference,
however, does not primarily manifest itself in a more or less extensive
use of narrative features. Looking at three different genres from the medical register,
I therefore hypothesise that 1) a narrative discourse function
correlates only insufficiently with a narrative form, and that 2) a discourse
purpose other than narration does not necessarily result from the absence of narrative
form. This in turn suggests that the function or goal of a discourse is not
primarily something to be observed in the form of frequencies of occurrence. On a
more theoretical level, these findings will lead me to the claim that, with respect
to the specific discourse goals and purposes typical of the context of medicine,
the target of the description should be the genre, rather than the register.
The instances of medical discourse which I will cover in my analysis come from
three different sources: illness blogs written by patients, case reports written by
doctors, and texts from a special section termed “Clinical Crossroads” of The
Journal of the American Medical Association (JAMA). Each of these text varieties
is characterised more closely in Sections 3.2 to 3.4. Before discussing these genre
profiles, I will first comment on the general nature of the relation between their
situational characteristics, in particular the discourse function, and their linguistic
form.
The three text varieties represent discourse with different perspectives on the
topic of disease or illness; i.e. the medical topic is the only situational variable
which they share. The texts differ, not only in the different speaker roles of doctor
and patient, but, more specifically, in that these groups of authors assume, by
different ways of speaking throughout their own discourse, different “voices”
(Mishler 1984: 103). In the professional medical discourse “of disease” (Fleischman
2001: 475), such as in case reports, doctors primarily use the voice of medicine;
however, they also have a doctor’s voice when they occur in the discourse as
a participant, for example, when concerned with “information about the patient’s
current health condition, […] patient compliance, and […] test results” (Murawska
2012: 71). Patients, by contrast, have primarily a voice of health-related storytelling,
but over time they also develop a medical competence of their own (Cordella
2004: 119). At some point, diagnosis and further treatment become a collaborative
effort, which is when patients also use elements of a voice of medicine. The
interactional hybridity of medical discourse referred to above is thus primarily a
hybridity of voices, and it is one of the central variables that guide linguistic variation
across all medical text varieties.
By contrast, illness blogs, professional case reports and the discourse jointly
produced by doctors and patients for “Clinical Crossroads” (for details, cf.
Section 3.4) differ in a variety of other situational variables, especially those pertaining
to production circumstances and setting (cf. Biber and Conrad 2009: 40).
The text varieties under investigation are therefore not easily subsumed under a
single register. However, instead of taking up a principled position about where a
register ends and a new (sub-)register starts, the analysis below rests upon two
observations. On the one hand, the verbalisation of a disease or illness leads to
a concern with medical case histories, which cuts across general communicative
purposes, such as to narrate or to report (cf. Biber and Conrad 2009: 40). Linguistically,
this is marked by a pervasive presence of linguistic features such as “past
tense, communication verbs, third person pronouns, and time adverbials”, i.e.
the characteristic features of a narrative dimension of linguistic variation (Biber
and Conrad 2009: 259). It is with regard to these features, which arise out of the
topic of illness, that the texts share the same register.
On the other hand, although there are recognisably different discourse
goals involved in the verbalisation of a case history, the difference between
“private” and “public” medicine has always been gradual, as the evolution of
medical research writing has also shown (Atkinson 1992: 361–363). While professional
medicine has long drifted away from the “rhetoric of immediate experience”
(Atkinson 1992: 359), and while published case reports are professional
and public, only illness blogs constitute real narratives of personal experience.
However, nowadays, with the movement towards a narrative medicine, there are
also professional texts which aim at being more “patient-focused” again (Winker
2006: 2888).
Genre categories capture this mixing of purposes and voices present in such
developments, not only due to the level of specificity they refer to, but also
because genres are often formally underdetermined and may therefore be
hybrid in form. This is illustrated in Figure 1, which shows the three text
varieties as three different genres, with distinctly different discourse goals and
purposes, as the discussion has just shown. On the level of the general communicative
purpose, i.e. at a high level of generality, these discourse functions can
be described as being narrative, non-narrative, or hybrid. This categorisation
links the genre classification to register variation, because narrative as a discourse
mode (Georgakopoulou and Goutsos 2004: 43–47) is an important aspect
of the register in all three cases. As the analysis below will illustrate in detail,
the narrativisation of the events (Georgakopoulou and Goutsos 2004: 43) which
have to do with the course of an illness is a major source of hybrid form across
the three text varieties and therefore explains some pervasive register features.
Before turning to the linguistic features and their interrelationship with the genre
category in Section 4, the next three subsections will introduce each text variety
and the sample texts used in more detail.
Medical topics are ubiquitous on the internet (Döring 2003:
19). When patients tell their stories on the web, i.e. when they produce narratives
of illness (cf. McCullough 1989: 124), this is not “a solitary occupation”
but one which is shaped by the context of “the community of web users” (Page
2012: 45). Patients’ tales in illness blogs are thus more interactive than when elicited
in medical interviews, and they establish a particularly strong relation to the
audience: “the primary function of the comments on the […] blogs is to provide
or seek support in the form of shared experience, advice, and encouragement”
(Page 2012: 45).
From the point of view of this interactive function, illness blogs qualify as
patients’ tales, i.e. proper stories, but not primarily from a structural point
of view. Narrative discourse, in essence, “attempts to sweep narrator and audience
into a community of rapport”, i.e. the aim is to move, rather than to inform
(Georgakopoulou and Goutsos 2004: 53; also Tannen 1989). This means that,
although patients’ tales typically employ a “narrative syntax” (Labov 1997: 3),
they show the narrative mode primarily due to the “function of personal interest”
(Labov and Waletzky 1967: 13; emphasis added). This function rests upon
the sharing of the individual experience of illness (Dorgeloh 2012: 263) and distinguishes
a patient’s tale, like any other kind of story, from a report, which “is
most typically elicited by the recipient […] or in response to circumstances which
require an accounting of what went on” (Polanyi 1985: 10–11).
The examples of illness blogs come from a website where
patients share their stories about a rare neurological disease [SPS: The Real
Stories4]. Note that, as its title suggests, the website focuses primarily on the publication
of the stories and not, like other types of illness blogs, on the discussion of
and commenting on postings about illness (cf. Page 2012). As sample (1) illustrates,
the typical structure is that the patients introduce themselves and then turn to the
chronology of the events:
(1) Hi my name is Ann. I was officially diagnosed in Sept of last year. I have had symptoms
for the past several years that got worse as the years went on. I was exercising and
swimming three times a week and then I started getting more muscle cramps. I went to
the doctor and he just told me to take calcium and magnesium and drink more water.
It took him a long time to understand that the muscle cramp were extremely painful
happening several time a day. I would have abdominal muscle cramps that felt like i
was in full-blown labor. They would come on suddenly when I was startled or when I
coughed. They would ease up for a few seconds and then just get worse again. Several
times my feet and hands would cramp up until they were fully distorted. I did go to a
neurologist who seemed to have an idea of what I had but made no effort to diagnosis
what I had. He told me that it would not do any good to try to diagnosis my disease
and instead gave me all kinds of different pills and most of them did not work well and
also caused several side effects. Often when I went to see him I did not feel like he even
remembered me. I did finally request a new doctor, which has been a Godsend to me
and now is treating me with IVIG, which is working well. My symptoms still get worse
at times but they are manageable. I am eager to talk to people that have the same syndrome.
Most people do not understand the pain and all the other symptoms. I found
your web site today and am eager to learn more. (http://www.stiffpersonsyndrome.net,
accessed March 17, 2011)
The proper narrative contained in (1) ends when the course of the events reaches
its most recent state. This description of the current situation (My symptoms still
get worse at times but they are manageable) serves as a coda and is followed by
an explicit mention of the story point. This point relates to the ill person him- or
herself, as in (1), or it centres on the social function of the blog by addressing the
readers’ interests, as in (2) and (3):
(2) If in any way I can contribute to bringing awareness to this insidious disease I throw in
my hat. (Wendy’s story; http://www.stiffpersonsyndrome.net)
(3) I must tell you that neither my wife nor myself ever gave up hope, In fact just the oppo-
site. We were very pro active in the treatment of our diseases. […] My prayer is for all
of you to see your journey through SMS with the knowledge that there is hope for all.
Stay the course, keep the faith, and fight on. (John’s story; http://www.stiffpersonsyndrome.net)
The story point expressed in (3) shows that the verbalisation of the experience
of illness has a strong component of self-reflection and evaluation. Many illness
blogs have such properties of “reflective anecdotes” (Page 2012: 58–59) and in
this respect tend towards less purely narrative text forms. It is highly typical that, instead
of the completeness of the recount and the degree of detail which one can expect
of more trivial narration (Georgakopoulou and Goutsos 2000: 125), patients’ tales
often limit themselves to “remarkable event[s], characterized by an evaluative
punch line” (Page 2012: 59). As was illustrated in Figure 1, a patient’s tale therefore
is hybrid in its narrative form, since it limits the experience which
is shared to the main points of interest.
Case presentations in the form of published case reports are used by medical pro-
fessionals “to communicate the salient details of patient cases to one another”
(Schryer et al. 2003: 63; also Hurwitz 2006: 217), which means that the texts
pursue a predominantly professional discourse goal. On a more general level, the
discourse function is thus to inform, i.e. to state “verifiable events”, rather than to
move. This function contrasts with the point of personal interest which applies to
proper storytelling, which is why the discourse mode in case reports is essentially
non-narrative (cf. Georgakopoulou and Goutsos 2004: 53).
The central component of a case report is the case presentation itself. It
begins “ritualistically with a brief account of a patient’s complaint as translated
by the doctor” (Hurwitz 2006: 234; emphasis added), followed by an account of
the examinations, findings, diagnosis and suggestions for treatment. Text (4)
exemplifies such an initial case presentation, referring to the same disease as
text (1):
(4) A 27-year-old Hispanic woman presented to the University Medical Center Emergency
Department in Las Vegas, Nevada with a sudden onset of shortness of breath and
increased difficulty in moving her right arm. She reported that during the evening
prior to her presentation, she was lying down when she began to experience shortness
of breath with worsening right-arm weakness. She also reported that for the past two
months her arm weakness was characterized as having limited strength and range of
motion. She also complained of chest pains that were localized behind her sternum.
The pain was characterized as a pressure sensation that was non-radiating. She did not
have any aggravating or relieving factors. Pertinent positive findings included nausea,
palpitations and lightheadedness. Pertinent negative symptoms included no loss of
consciousness, headache, vomiting, diarrhea, or vertigo. (Journal of Medical Case
Reports 4, 2010)
It has been noted that case reports published in journals “reorganize clinical
data using a variety of narrativising techniques” (Hurwitz 2006: 217; also Hunter
1991). However, as one can see in (4), from a narratological viewpoint this is
only a “degree-zero” narrativity (Fludernik 1996: 358); i.e. although a sequence
of events is verbalised, it is “translated” by a medical professional. The result is
a discourse which deals with a disease, i.e. which foregrounds the medical facts
and assigns “the sufferer […] the experiencer role” (Fleischman 2001: 476). In
such a text, the chronology lacks “experientiality”, the central component of
narrativity (Fludernik 1996), and the text therefore represents only a hybrid narrative form.
The Grand Round begins with the case history of a patient and that patient’s firsthand
account of the medical decision he or she faced, occasionally along with the patient’s
primary care physician’s perspective. These accounts are followed by questions for the
Grand Rounds discussant, which the discussant, usually a well-recognized authority on
the clinical topic, addresses based on available evidence in the literature, and, where no
evidence exists, clinical experience. Following the presentation, the discussant drafts the
manuscript for submission to JAMA, including the case description, the patient’s perspec-
tive, the discussion (including references and pertinent tables and figures), and the ques-
tion-and-answer session that occurred at the end of the Grand Rounds. The manuscript
then undergoes editorial evaluation, external peer review, and revision. If the manuscript
is revised satisfactorily and determined to have a level of quality appropriate for JAMA, the
manuscript is accepted and published in JAMA and usually is featured in Clinician’s Corner.
(Winker 2006: 2888)
The idea behind this more innovative medical text variety is to approach a case
from various perspectives, including that of the patient. The purpose is not only
to offer and exchange information, but also to improve medical decisions, which is
to be achieved by “aligning the goal of the patient and physician” (Winker 2006:
2888). Since its foundation, the section has been restructured several times, but
the core idea, a joint context for doctors and patients, who contribute different
perspectives, has essentially remained unchanged. (5) and (6) are text samples of
a patient and a doctor presenting the same case:
(5) After I had bladder surgery […], my doctor told me, “I have good news and bad news
and good news; it’s not bladder cancer, but the bad news is that it’s something else.”
I accepted the complete hysterectomy, which at my age was not disturbing news. But
in terms of the treatment and how it was going to affect me, the thing that worried me
most was that I kept hearing about nausea, exhaustion, and that I wouldn’t be able to
do things. As a result of that, I canceled my teaching for that fall.
I remembered being very anxious the first day of chemotherapy because I just didn’t
know what to expect. I decided to do the intraperitoneal chemotherapy because it
made spatial logic to me. If you are aiming a treatment at the area of the cancer, it was
going to get there more rapidly. I probably had some benefit from having had this mode
of treatment before I went back to complete the treatment with the IV.
Now, I have CAT [computed tomography] scans every 3 to 4 months. I don’t like to go to
doctors, my mother never went until she was 80, but I go now because I’ve learned to
trust the process, so I keep my appointments. The last time I chatted with the oncolo-
gist, I asked him if we could talk about the kinds of symptoms I should look for going
forward. What should I expect for myself? (Journal of the American Medical Associa-
tion, 4 April 2010; Ms W)
(6) Ms W is a 75-year-old woman with epithelial ovarian cancer. She first developed lower
abdominal pain in 2008. After workup for a genitourinary origin of the pain, she was
found to have a 13.5 × 11 × 15.5–cm complex right adnexal mass. She had an optimal
surgery cytoreductive, with less than 1 cm of peritoneal disease remaining at the end of
the procedure. The pathologic findings were consistent with epithelial ovarian cancer
The texts in (5) and (6), although from a highly professional medical journal, illustrate
that the discourse is intended for the narrative kind of medicine described in
Section 2.2. This situation makes for text varieties that show a more mixed character
than both illness narratives, as exemplified by (1), and case reports, such as
(4). On the one hand, both (5) and (6) have a chronological structure, i.e. “degree-zero”
narrativity; on the other hand, the patient in (5) shows a degree of expertise
and professional competence, a voice of medicine (cf. Section 3.1), which makes
the register in the text more similar to professional medical discourse, like (4) and
(6). As a result, (5) and (6) possess hybridity in form, i.e. they combine narrative
and non-narrative register features.
By contrast, considering the discourse function, the doctor’s motivation in
this context is not limited to presenting a case to colleagues. Instead, there is a
more personal, though third-party, point in telling the patient’s story, as expressed
by She is […] questioning her prognosis and how she should be followed up in the
long term. Although the doctor uses the voice of medicine throughout the main body
of the presentation, the main purpose is collaboration and a joint effort; the
presentation thus comes from the doctor’s voice and carries an indirect, and ultimately
more hybrid, narrative function. As the analysis of linguistic features in the corpus
study will show, this complex relationship between form and function becomes
apparent from a genre perspective.
4 Register and genre profile of the three types of medical discourse
The analysis which follows is based on a small corpus of texts, covering in roughly
equal shares the three genres under investigation and amounting to 3,777 words in
total. The exact proportions are included in Table 1. The analysis is intended as
a pilot study and rests upon a limited database, but it will demonstrate how the
interpretation of findings on register features benefits from a genre perspective.
Numerous quantitative studies, starting with Biber (1988, 1989), already document
the co-occurrence of features from a narrative dimension of variation, most notably
past tense forms, pronominal reference, and time adverbials. The claim here is that
these features, which are pervasive to varying extents in the texts investigated, on
the one hand testify to the formal hybridity of the genres as illustrated in Figure 1
but, on the other, do not determine the text variety at a sufficient level of specification.
The more integrated genre analysis will be presented in two steps: in 4.1, the
register features indicative of a chronology, i.e. past tense narration and time
adverbials, are functionally re-interpreted from the point of view of the genre in
which they occur. This part of the analysis illustrates that in medical discourse
high frequencies of narrative features may in fact correlate with a non-narrative
discourse mode. It is argued, in particular, that the dominance of such a narrative
text form goes beyond the presence of narrative episodes, which is something
that applies to many kinds of discourse (e.g. Csomay 2006, 2007; also Werner,
this volume), but is specifically motivated by the “object-oriented” discourse goal
of the genres investigated here. Section 4.2 then looks at features reflecting the
expression of human experience: pronoun usage and choice of subjects. The aim
of this section is to show that genres with a narrative as opposed to a non-narrative
purpose differ characteristically not so much in a grammatical form such as pronoun
usage as in their use of semantic categories. The more general claim behind
both analyses is that genre categories, in the sense of referring to discourse at a
relatively low level of generality, are effective beyond both register features and
textual conventions, and lead to patterns at several levels of analysis. Complex
discourse goals, such as the verbalisation of medical experience, are therefore
better accounted for from a genre than from a register perspective.
other tenses.5 The second feature is the use of time adverbials, which situate the
events in their temporal sequence. For example, text (1), shown here as (7), has
non-narrative passages (printed in italics) at the beginning and in the closing
evaluative comment, serving as a coda, while the main body of the narration is
structured in episodes marked by explicit temporal reference (in bold print).
(7) Hi my name is Ann. I was officially diagnosed in Sept of last year. I have had symp-
toms for the past several years that got worse as the years went on. I was exercising
and swimming three times a week and then I started getting more muscle cramps. I
went to the doctor and he just told me to take calcium and magnesium and drink more
water. It took him a long time to understand that the muscle cramp were extremely
painful happening several time a day. I would have abdominal muscle cramps that
felt like i was in full-blown labor. They would come on suddenly when I was startled
or when I coughed. They would ease up for a few seconds and then just get worse
again. Several times my feet and hands would cramp up until they were fully dis-
torted. I did go to a neurologist who seemed to have an idea of what I had but made
no effort to diagnosis what I had. He told me that it would not do any good to try to
diagnosis my disease and instead gave me all kinds of different pills and most of them
did not work well and also caused several side effects. Often when I went to see him
I did not feel like he even remembered me. I did finally request a new doctor, which
has been a Godsend to me and now is treating me with IVIG, which is working well. My
symptoms still get worse at times but they are manageable. I am eager to talk to people
that have the same syndrome. Most people do not understand the pain and all the other
symptoms. I found your web site today and am eager to learn more. (http://www.stiffpersonsyndrome.net, accessed March 17, 2011)
Table 1 shows the proportion of text passages in the narrative mode across the
three genres. While case presentations from the medical case report contain only
past tense passages, patients’ tales from the blog, i.e. from a medium that encour-
ages reflection and relation-building (cf. Section 3.2), have a lower proportion of
the narrative mode. The texts from “Clinical Crossroads” contain the lowest proportion
of proper narration, which is in line with a discourse goal that consists not only
in sharing information but also in preparing an adequate decision.
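Note 5 explains that the proportion of the narrative mode is measured by the relative length of past-tense passages rather than by counting verb forms. The following Python sketch illustrates that measure under two assumptions that the study does not spell out: each passage has already been labelled by hand as past tense (narrative mode) or not, and length is approximated by whitespace-separated words. The function name is illustrative only.

```python
def narrative_mode_share(passages):
    """Percentage of running words that fall in past-tense (narrative-mode) passages.

    passages: iterable of (text, is_past) pairs. The is_past label is assumed
    to be assigned by the analyst; word counts use simple whitespace splitting.
    """
    total_words = 0
    past_words = 0
    for text, is_past in passages:
        n = len(text.split())
        total_words += n
        if is_past:
            past_words += n
    if total_words == 0:
        return 0.0
    return 100 * past_words / total_words


# Three sentences from text (7), labelled by hand for illustration.
sample = [
    ("Hi my name is Ann.", False),
    ("I was officially diagnosed in Sept of last year.", True),
    ("I am eager to talk to people that have the same syndrome.", False),
]
print(round(narrative_mode_share(sample), 1))  # 34.6
```

Because the measure weights passages by length, a long present-tense coda lowers the share of the narrative mode even if it contains only a few finite verbs.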
5 Instead of counting verb forms, the proportion of narrative as opposed to non-narrative mode
is measured as the relative length of the text passages in which past as opposed to non-past
tenses are used.
Feature | Illness blogs | Case reports | Clinical Crossroads
Time adverbials per 100 words (absolute frequency) | 3.11 (35) | 1.44 (20) | 2.78 (35)
The results are almost the opposite for the occurrence of time adverbials: their frequency
is high in the patients’ tales, including the case presentations in “Clinical
Crossroads”, and much lower in case reports. Explicit temporal reference thus
seems to be directly related to more personal accounts, i.e. to a narrative, or at
least to a partly narrative (hybrid) function (cf. Figure 1). The finding is in line
with research which has shown that in proper stories time adverbials not only
carry temporal meaning but also serve as text-strategic devices (cf. Virtanen 1992b).
Note, however, that this applies, in particular, to time adverbials in sentence-initial
position, where they mark temporal shifts in the progression of a narrative
strategy (Virtanen 2004). For example, in (7) then I started getting more muscle
cramps marks the beginning of a new episode, whereas uses of the same temporal
adverb in (6) (She then started […] chemotherapy, […]; She […] was then registered
in a clinical trial) do not mark a text structure based on temporal sequence
and are thus placed sentence-medially. This means that the point of departure
is the patient as a medical case rather than as a character. The lower number of time
adverbials in case reports thus reflects their “topic-oriented strategy” focussing
on the medical case, which turns them into an expository, rather than a narrative, text
(Virtanen 2010: 66–67).
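The frequencies in the tables are normalised to occurrences per 100 words, with absolute counts given in brackets. Since both figures are reported, the approximate length of each sub-corpus can be recovered from them. The following Python sketch illustrates this arithmetic; the function names are illustrative only, and the example uses the first figure in the time-adverbial row (35 occurrences at 3.11 per 100 words).

```python
def per_100_words(count, n_words):
    """Normalised frequency: occurrences per 100 running words."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return 100 * count / n_words


def implied_length(count, rate_per_100):
    """Invert the normalisation: number of words implied by a count and its rate."""
    if rate_per_100 <= 0:
        raise ValueError("rate_per_100 must be positive")
    return 100 * count / rate_per_100


# First figure in the time-adverbial row: 35 occurrences at 3.11 per 100 words.
n_words = round(implied_length(35, 3.11))
print(n_words)                               # 1125
print(round(per_100_words(35, n_words), 2))  # 3.11
```

Since the published rates are rounded to two decimal places, a length recovered in this way is only approximate.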
These two findings together suggest that a differentiated approach is necessary
when interpreting quantitative results about pervasive linguistic features in their
discourse context. In particular, a narrative form and a narrative text function
need to be distinguished, as the outline in Sections 3.2 to 3.4 and the illustration
in Figure 1 have shown. In the texts investigated, the non-narrative function of the
case reports, in the sense of a lack of personal story point, goes together with an
exclusive use of past tense forms, showing that the verbalisation of a chronology
of events has a variety of uses (cf. Section 2.2). A narrative function, by contrast,
also involves passages in which the narrative mode is absent, since it is evaluative
comments, particularly the coda, which verbalise the point of a proper story. In
this way, although less dominated by past tense narration, illness blogs as well
as case presentations from “Clinical Crossroads” gain their narrative or hybrid
function from passages in the non-narrative mode – a form-function complexity
which a genre perspective helps to explain.
6 Besides the individual sample texts discussed for illustration in Section 3, the corpus consists
of further texts from the same genres, totalling the number of words indicated.
7 As reflected by the use of past tense verbs. As in texts (1) and (5), this also includes the use of
the so-called “habitual conditional” (cf. Haiman and Kuteva 2002: 120).
While past tense forms and time adverbials have to do with the past temporality
of the events reported, the pervasiveness of pronominal reference, as opposed
to more explicit forms of expression, arises from the fact that a narrative verbalises
human experience (Biber and Conrad 2009: 259; see also Neumann and Fest,
this volume). The presence of a narrator allows “readers to immerse themselves
in a different world and in the life of the protagonists” (Fludernik 2009: 6). The
main protagonists in a medical process are the doctor and the patient, and reference
to them is made, in particular, when the doctor’s voice and the patient’s storytelling
are used (cf. Section 3.1). By contrast, the voice of medicine tends to
de-focus human experience, turning the language of medical discourse into a
more scientific register, which is “object”- rather than “agent”-oriented (Atkinson
1999). It is therefore expected that reference to these different components of an
illness correlates in significant ways with the genre of medical discourse.
As Table 2 shows, the frequency of pronouns8 is higher in the text samples
with a narrative or hybrid function, i.e. in the patients’ tales and in “Clinical
Crossroads”. It is lower, though not very low, in the case reports. The lower rate
reflects the non-narrative, object-oriented discourse goal of professional medical
discourse, while the fact that it is not very low reflects that the main object of
investigation is nonetheless a human agent. The hybrid narrative form of medical
case reports is thus also confirmed by the use of pronominal reference.
Since the use of pronouns as a register feature distinguishes the three genres
only insufficiently, Table 2 also presents the results of an alternative analysis of the
referential patterns found in the texts. For all (finite and non-finite) clauses in the
corpus, the (explicit or implicit) subjects were categorised as referring to the patient,
the doctor, or the domain of medicine.9 Since subjects are the unmarked point of
departure of the English clause and therefore more often than not its topic (e.g.
Börjars and Burridge 2010: 226), it was assumed that their reference is likely to
indicate which voice is talking (cf. Section 3.1) and to what extent the discourse
truly focuses on human experience.
8 This feature includes personal and possessive (including reflexive) pronouns as well as rela-
tive pronouns referring to a noun phrase (and not to a clause).
Feature | Illness blogs | Case reports | Clinical Crossroads
Personal pronouns per 100 words (absolute frequency) | 10.48 (118) | 7.55 (111) | 10.64 (134)
Clausal topics per 100 words (absolute frequency) | 12.79 (144) | 8.63 (120) | 13.02 (164)
Topics from the domain of medicine per 100 words (absolute frequency) | 5.16 (58) | 4.10 (57) | 2.70 (34)
While the overall frequencies of clausal topics per text category differ mainly for
reasons of sentence length, the semantic sub-categorisation contained in Table 2
yields some notable similarities and differences. In particular, illness blogs
and case reports are quite similar with respect to their reference to the domain
of medicine, and neither reaches the extent of reference made to the patient in
“Clinical Crossroads”. Although they pursue opposite, i.e. narrative as opposed
to non-narrative, discourse goals and are produced by opposite speaker roles,
illness blogs and case reports, which otherwise differ in their use of narrative
register features, reveal a striking similarity in this respect.
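The coding procedure described in note 9 (every lexical verb yields a clause whose subject is assigned to a semantic category, and a subject referring to both patient and doctor counts towards both categories) and the percentages explained in note 10 can be sketched in Python as follows. This is a minimal illustration, not the author's tool: the category labels, the function name and the coded sample are hypothetical, and taking the number of subjects as the denominator is one reading of note 10.

```python
from collections import Counter

CATEGORIES = ("patient", "doctor", "medicine", "other")


def topic_shares(subjects):
    """Percentage of clausal topics (subjects) assigned to each semantic category.

    subjects: one set of category labels per clause subject. A subject that
    refers to both patient and doctor carries both labels and is counted
    towards both categories, so the shares can add up to slightly more than 100%.
    """
    counts = Counter()
    for labels in subjects:
        for label in labels:
            if label not in CATEGORIES:
                raise ValueError(f"unknown category: {label}")
            counts[label] += 1
    n_topics = len(subjects)
    if n_topics == 0:
        return {cat: 0.0 for cat in CATEGORIES}
    return {cat: 100 * counts[cat] / n_topics for cat in CATEGORIES}


# Hypothetical hand-coded subjects; the last-but-one stands for a subject like
# "we" in "if we could talk", which refers to both patient and doctor.
coded = [{"patient"}, {"patient"}, {"medicine"}, {"patient", "doctor"}, {"other"}]
print(topic_shares(coded))
# {'patient': 60.0, 'doctor': 20.0, 'medicine': 20.0, 'other': 20.0}
```

Because only two subjects in the corpus were double-counted, the choice of denominator makes very little difference to the resulting proportions.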
9 Assuming that every lexical verb gives rise to a clause, each explicit or implicit subject belong-
ing to a lexical verb was categorised semantically. The category “patient” includes reference to
the person as well as to body parts. The category “medicine” covers symptoms (weakness, pain)
and reference to the disease, as well as to elements from the diagnosis (tests, findings) or therapy
(e.g. medication or treatment). In the majority of cases, these categories were distinct; there were
only two instances of a subject referring to both patient and doctor, as in: The last time I chatted
with the oncologist, I asked him if we could talk. Subjects like these were counted towards both
categories.
[Figure 2: Proportions (%) of clausal topics by semantic category (medicine, doctor, patient, other) for: patient in blog, doctor in case report, patient in CC, doctor in CC]
The results from Table 1 and Figure 2 show that genres with opposite
functions, i.e. illness blogs and medical case reports, can in fact be more
similar than genres with a related function, such as patients communicating
their illness in different situations. The reason is that different voices are used
for communicating illness (cf. Section 3.1), which highlight different aspects of
the course of the events. While illness blogs and “Clinical Crossroads” are similar in
their register usage owing to general situational parameters, such as speaker or
discourse function, they nonetheless differ in their choice of topics. It is this
interrelationship of form (register), function, and social context that suggests the
primacy of the notion of genre for the analysis of medical discourse.
10 Percentages show the proportion of the four semantic categories in relation to the total of
topics as given in Table 2.
5 Conclusion
My analysis of text varieties from medical discourse has sought to show that
investigating linguistic variation with a view to genre adds an important perspective
to the understanding of form-function relationships in text-linguistic studies.
While these commonly rest upon the assumption that “linguistic co-occurrence
reflects shared function” (Biber 1989: 5) and present corpus-linguistic evidence
for this, the interrelationship of register and genre can only be made explicit by
combining the two perspectives. Since a genre classifies discourse at a rather low
level of generality, especially with regard to the purpose and goal of a discourse,
it determines both pervasive linguistic features and the choice of discourse
topics and semantic categories. Hence, I have argued here that a genre analysis
logically subsumes and pre-determines a register analysis.
Genres, especially in the domain of medicine, make regular use of the nar-
rative discourse type with its attested register features. This is not surprising,
given the acknowledged role of the narrative as a basic text type or meta-genre
(cf. Section 2.2). A similar interrelationship underlies the observation that the
dividing line between lay and professional communication is also one between
narrative and non-narrative discourse (Georgakopoulou and Goutsos 2000). The
discussion here has added to this view the insight that one needs to distinguish between
narrative form and narrative discourse function, and that more professional social
and cognitive activities typically go together with more complex (in the sense of
more indirect) uses of narrative register variation. Text varieties of this kind are
best understood from a genre perspective, which can account for their mixed pur-
poses and voices and, thus, their hybridity in register.
References
Atkinson, Dwight. 1992. The evolution of medical research writing from 1735 to 1985. Applied
Linguistics 13. 337–374.
Atkinson, Dwight. 1999. Scientific discourse in sociohistorical context: The Philosophical
Transactions of the Royal Society of London 1675–1975. Mahwah, NJ: Lawrence Erlbaum.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University
Press.
Biber, Douglas. 1989. A typology of English texts. Linguistics 27. 3–43.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999.
Longman grammar of spoken and written English. Harlow: Longman.
Biber, Douglas. 2006. University language: A corpus-based study of spoken and written
registers. Amsterdam & Philadelphia: John Benjamins.
Biber, Douglas. 2012. Register as a predictor of linguistic variation. Corpus Linguistics and
Linguistic Theory 8(1). 9–37.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge
University Press.
Biber, Douglas & Bethany Gray. 2013. Being specific about historical change: The influence of
sub-register. Journal of English Linguistics 41(2). 104–134.
Börjars, Kersti & Kate Burridge. 2010. Introducing English grammar. London: Arnold.
Brinker, Klaus. 2005. Linguistische Textanalyse: Eine Einführung in Grundbegriffe und
Methoden. Berlin: Schmidt.
Charon, Rita. 2006. Narrative medicine: Honoring the stories of illness. Oxford & New York:
Oxford University Press.
Cordella, Marisa. 2004. The dynamic consultation: A discourse analytical study of doctor-
patient communication. Amsterdam & Philadelphia: John Benjamins.
Coupland, Nikolas. 2007. Style: Language variation and identity. Cambridge: Cambridge
University Press.
Croft, William. 2010. The origins of grammaticalization in the verbalization of experience.
Linguistics 48. 1–48.
Csomay, Eniko. 2006. Academic talk in American university classrooms: Crossing the
boundaries of oral‐literate discourse? Journal of English for Academic Purposes 5(2).
117–135.
Csomay, Eniko. 2007. A corpus-based look at linguistic variation in classroom interaction:
Teacher talk versus student talk in American University classes. Journal of English for
Academic Purposes 6(4). 336–355.
Dorgeloh, Heidrun. 2012. Arztbericht vs. Patientengeschichte: Story point als Genremerkmal
im medizinischen Internetdiskurs. In Ansgar Nünning, Jan Rupp, Rebecca
Hagelmoser & Jonas Ivo Meyer (eds.), Narrative Genres im Internet: Theoretische
Bezugsrahmen, Mediengattungstypologie und Funktionen (WVT-Handbücher zum
literaturwissenschaftlichen Studium), 261–276. Trier: WVT.
Dorgeloh, Heidrun. 2014. ‘If it didn’t work the first time, we can try it again’: Conditionals as a
grounding device in a genre of illness discourse. Communication & Medicine 11(1). 55–67.
Dorgeloh, Heidrun & Anja Wanner. 2010. Syntactic variation and genre. Berlin & New York: de
Gruyter Mouton.
Döring, Nicola. 2003. Sozialpsychologie des Internet. Göttingen: Hogrefe.
Eckert, Penelope & John R. Rickford (eds.). 2001. Style and sociolinguistic variation. Cambridge:
Cambridge University Press.
Fleischman, Suzanne. 2001. Language and medicine. In Deborah Schiffrin, Deborah Tannen &
Heidi E. Hamilton (eds.), The handbook of discourse analysis, 470–502. Malden, Mass.:
Blackwell.
Fludernik, Monika. 1996. Towards a ‘natural’ narratology. London: Routledge.
Frankel, Richard M. 2000. The (socio)linguistic turn in physician-patient communication
research. In James E. Alatis, Heidi E. Hamilton & Ai-Hui Tan (eds.), Linguistics, language,
and the professions, 81–103. Georgetown: Georgetown University Press.
Georgakopoulou, Alexandra & Dionysis Goutsos. 2000. Mapping the world of discourse: The
narrative vs. non-narrative distinction. Semiotica 131(1–2). 112–141.
Georgakopoulou, Alexandra & Dionysis Goutsos. 2004. Discourse analysis: An introduction.
Edinburgh: Edinburgh University Press.
Gerteis, Margaret, Susan Edgman-Levitan, Jennifer Daley & Thomas L. Delbanco (eds.). 1993.
Through the patient’s eyes: Understanding and promoting patient-centered care. San
Francisco: Jossey-Bass.
Giltrow, Janet. 2010. Genre as difference: The sociality of linguistic variation. In Heidrun
Dorgeloh & Anja Wanner (eds.), Syntactic variation and genre, 29–52. Berlin & New York:
de Gruyter Mouton.
Giltrow, Janet & Dieter Stein. 2009. Genres in the internet. Amsterdam & Philadelphia: John
Benjamins.
Gotti, Maurizio & Françoise Salager-Meyer. 2006. Introduction. In Maurizio Gotti & Françoise
Salager-Meyer (eds.), Advances in medical discourse analysis: Oral and written contexts,
9–16. Bern: Peter Lang.
Haiman, John & Tania Kuteva. 2002. The symmetry of counterfactuals. In Joan Bybee & Michael
Noonan (eds.), Complex sentences in grammar and discourse: Essays in honor of Sandra
A. Thompson, 101–124. Amsterdam & Philadelphia: John Benjamins.
Halliday, Michael A. K. 1978. Language as social semiotic: The social interpretation of language
and meaning. London: Edward Arnold.
Honeybone, Patrick. 2011. Variation and linguistic theory. In Warren Maguire & April McMahon
(eds.), Analysing variation in English, 151–177. Cambridge: Cambridge University Press.
Hunter, Kathryn M. 1991. Doctors’ stories: The narrative structure of medical knowledge.
Princeton, NJ: Princeton University Press.
Hurwitz, Brian. 2006. Form and representation in clinical case reports. Literature and Medicine
25(2). 216–240.
Kinneavy, James Louis. 1971. A theory of discourse: The aims of discourse. Englewood Cliffs, NJ:
Prentice-Hall.
Kortmann, Bernd. 2006. Syntactic variation in English: A global perspective. In Bas Aarts & April
McMahon (eds.), Handbook of English linguistics, 603–624. Oxford: Blackwell.
Labov, William. 1997. Some further steps in narrative analysis. The Journal of Narrative and Life
History 7. 395–415.
Labov, William & Joshua Waletzky. 1967. Narrative analysis: Oral versions of personal
experience. In June Helm (ed.), Essays on verbal and visual arts, 12–44. Seattle: University
of Washington Press.
Martin, James Robert & David Rose. 2003. Working with discourse: Meaning beyond the clause.
London: Continuum.
Maseide, Per. 2003. Medical talk and moral order: Social interaction and collaborative clinical
work. Text 23(3). 369–403.
McCullough, Laurence B. 1989. The abstract character and transforming power of medical
language. Soundings 72(1). 111–125.
Miller, Carolyn R. 1984. Genre as social action. Quarterly Journal of Speech 70. 151–167.
Mishler, Elliot G. 1984. The discourse of medicine: Dialectics of medical interviews. Norwood,
NJ: Ablex.
Murawska, Magdalena. 2012. The many narrative faces of medical case reports. Poznan Studies
in Contemporary Linguistics 48(1). 55–75.
Page, Ruth. 2012. Stories and social media: Identities and interaction. New York: Routledge.
Polanyi, Livia. 1985. Telling the American story: A structural and cultural analysis of
conversational storytelling. Norwood, NJ: Ablex.
Richards, Jack C. & Richard W. Schmidt. 2002. Longman dictionary of language teaching and
applied linguistics. Harlow, UK: Longman.
Rosenbach, Annette. 2002. Genitive variation in English: Conceptual factors in synchronic
and diachronic studies (Topics in English linguistics 42). Berlin & New York: Mouton de
Gruyter.
Salmon, William N. 2010. Formal idioms and action: Toward a grammar of genres. Language &
Communication 30(4). 211–224.
Sankoff, David. 1988. Sociolinguistics and syntactic variation. In Frederick J. Newmeyer (ed.),
Linguistics: The Cambridge survey, 140–161. Oxford: Blackwell.
Sarangi, Srikant & Celia Roberts. 1999. Introduction: Discourse hybridity in medical work. In
Srikant Sarangi & Celia Roberts (eds.), Talk, work, and institutional order: Discourse in
medical, mediation, and management settings, 61–74. Berlin: Mouton de Gruyter.
Sarangi, Srikant. 2001. Activity types, discourse types and interactional hybridity: The case of
genetic counseling. In Srikant Sarangi & Malcolm Coulthard (eds.), Discourse and social
life, 1–27. Harlow: Longman.
Schilling-Estes, Natalie. 2002. Investigating stylistic variation. In Jack K. Chambers, Peter
Trudgill & Natalie Schilling-Estes (eds.), The handbook of variation and change, 374–401.
Oxford: Blackwell.
Schmid, Hans-Jörg. 2013. Is usage more than usage after all? The case of English not that.
Linguistics 51(1). 75–116.
Schryer, Catherine, Lorelei Lingard, Marlee Spafford & Kim Garwood. 2003. Structure and
agency in medical case presentations. In Charles Bazerman & David R. Russell (eds.),
Writing selves/writing societies, 62–96. Fort Collins: WAC.
Schulze, Rainer (ed.). 1998. Making meaningful choices in English: On dimensions,
perspectives, methodology, and evidence. Tübingen: Gunter Narr.
Smith, Carlota S. 2003. Modes of discourse: The local structure of texts (Cambridge Studies in
Linguistics 103). Cambridge: Cambridge University Press.
Swales, John M. 2004. Research genres: Explorations and applications. Cambridge: Cambridge
University Press.
Tannen, Deborah. 1989. Talking voices: Repetition, dialogue and imagery in conversational
discourse. Cambridge: Cambridge University Press.
Virtanen, Tuija. 1992a. Issues of text typology: Narrative – a ‘basic’ type of text? Text 12(2).
293–310.
Virtanen, Tuija. 1992b. Given and new information in adverbials: Clause-initial adverbials of
time and place. Journal of Pragmatics 17(2). 99–115.
Virtanen, Tuija. 2010. Variation across texts and discourses: Theoretical and methodological
perspectives on text type and genre. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic
variation and genre, 53–84. Berlin & New York: de Gruyter Mouton.
Werlich, Egon. 1976. A text grammar of English. Heidelberg: Quelle & Meyer.
Winker, Margaret A. 2006. Clinical crossroads: Expanding the horizons. The Journal of the
American Medical Association 295(24). 2888–2889.
Markus Bieswanger
Aviation English: Two distinct specialised registers?
Abstract: The communication between air traffic controllers and pilots via voice
radio is regularly referred to as Aviation English in the literature. Responding to
growing international air travel after the Second World War and in reaction to
several accidents and incidents at least partly caused by controller-pilot miscom-
munication, the International Civil Aviation Organization (ICAO) developed a set
of standards and recommended practices concerning language use in air traffic
control communication. These ICAO guidelines permit the use of two different
and precisely defined varieties of Aviation English: standardised phraseology
in most routine situations and plain Aviation English when standardised phra-
seology is insufficient to serve an intended transmission. Based on the official
ICAO recommendations and the analysis of text excerpts from authentic air traffic
control communication, this paper addresses the question of whether the two varieties
currently referred to as Aviation English are distinct registers in the sense
of Biber and Conrad (2009). The relationship between the two varieties in actual
controller-pilot communication is examined, and the linguistic characteristics of
these varieties are investigated and compared. The
analysis shows that the two varieties in question are indeed distinct specialised
registers and supports the main objective of the volume by demonstrating that
adequate register choice is a prerequisite for successful communication, in this
case in aviation contexts.
1 Introduction
For several decades, aviate – navigate – communicate has been widely known as
the axiomatic set of any pilot’s duties, particularly during non-routine and emer-
gency situations, but also in everyday routine flying. From the point of view of pri-
oritisation of tasks in high workload situations, the order implies that the primary
concern of any flight crew must be to maintain control over their aircraft, the
second most important duty is to make sure that the aircraft moves in the direction
it is supposed to fly, and the third priority is to communicate the intentions
of the flight crew to air traffic control and to receive instructions from it. However,
this order does not imply that communication plays an unimportant role in aviation.
Despite this highly plausible prioritisation of tasks, it should be noted that
communication is still one of the three most important duties of pilots
(cf. Kostecka 2007: 13).
As a result of a number of incidents and accidents associated with commu-
nication problems as well as several decades of continuous growth of air traffic
around the globe, communication issues in air traffic control contexts are cur-
rently taken very seriously by the aviation authorities and play a heightened role
in pilot and air traffic controller training. The International Civil Aviation Organ-
ization explains this as follows:
With mechanical failures featuring less prominently in aircraft accidents, more attention
has been focused in recent years on human factors that contribute to accidents. Communi-
cation is one human element that is receiving renewed attention. (ICAO 2010: vii)
The renewed interest in air traffic control communication is also reflected in the
desire for an exchange of ideas and expertise between aviation professionals and
linguists, as illustrated by the recent volume entitled Aviation Communication:
Between Theory and Practice (Hansen-Schirra and Maksymski 2013). Voice-based
communication between pilots and air traffic controllers, so-called radiotelephony,
is regularly referred to as Aviation English or at least constitutes a central
part of even the broadest definitions of Aviation English. Moder (2013: 227) pro-
vides such a broad definition:
Aviation English describes the English used by pilots, air traffic controllers and other per-
sonnel associated with the aviation industry. Although the term may encompass a wide
variety of language use situations, including the language of airline mechanics, flight
attendants, or ground service personnel, most research and teaching focus on the more
specialized communication between pilots and air traffic controllers, often called radiote-
lephony.
of the history of English in air traffic control contexts and then go on to answer
the question whether the two varieties currently referred to by the term Aviation
English are distinct registers which can be categorised as specialised registers in
the sense of Biber and Conrad (2009).
Over 800 people lost their lives in three major accidents […]. In each of these seemingly
different types of accidents, accident investigators found a common contributing element:
insufficient English language proficiency on the part of the flight crew or a controller had
played a contributing role in the chain of events leading to the accident. In addition to these
high-profile accidents, multiple incidents and near misses are reported annually as a result
of language problems, instigating a review of communication procedures and standards
worldwide. (ICAO 2010: 1-1)
3.1 Situational analysis
According to Biber and Conrad (2009: 40), the major situational characteristics
of registers are: participants, relations among participants, channel, production
circumstances, setting, communicative purposes and topic (cf. also Schubert,
this volume).
Participants
The participants in both varieties of Aviation English are identical. The stakeholders
in aeronautical radiotelephony communication, i.e. pilots and controllers
engaging in air traffic control communication, are both addressors, who produce
text, and addressees, i.e. the intended listeners (cf. Biber and Conrad
2009: 41). Depending on national regulations, it may or may not be legal for
outsiders to listen to air traffic control communication, but there is no difference
between the two varieties concerning what Biber and Conrad (2009: 42) call
“on-lookers”. Since all parameters concerning participants and participation are
identical, differences between the use of standardised phraseology and plain Avi-
ation English cannot be attributed to this situational characteristic.
Channel
By channel, Biber and Conrad (2009: 43) mean both the binary distinction between the
physical modes of speech and writing and what they call the “specific mediums
of communication.” Both types of Aviation English are voice-based and thus
clearly spoken registers. Written air traffic control communication via a
so-called controller-pilot data link is still in its infancy. It faces a number of
disadvantages that seem to inhibit its more widespread use. One example is the resulting
loss of situational awareness among pilots of surrounding aircraft, since messages
are exchanged bilaterally between one pilot and one air traffic controller. The
specific medium for transmitting speech in air traffic control
communication is voice radio. Unlike face-to-face communication, Aviation
English is thus generally a form of mediated spoken communication
(cf. also setting below).
Production circumstances
As both kinds of Aviation English are spoken registers, there is typically not much
time for speakers to plan what to say next and no possibility to “edit or erase
language once it is spoken” (Biber and Conrad 2009: 43). As in all spoken conver-
sations, there are certain expectations as to when a speaker has to say something
as well as limitations with respect to the length of pauses. Since all pilots for whom a
particular air traffic controller is responsible are tuned to the same frequency, and
since aviation radio technology does not allow more than one pilot to address the
controller at the same time, efficient communication is one of the main concerns
in air traffic control communication.
Setting
According to Biber and Conrad (2009: 44), “the setting refers to the physical
context of the communication – the time and place” (original emphasis). As with
most spoken communication, the time is shared by the interlocutors in air traffic
control communication, as the messages are transmitted instantaneously. Aviation
English, however, is generally mediated communication, and thus the situation
is special with respect to place. The participants have some knowledge
of the place of production of their interlocutor’s speech but do not share the
place of production as in face-to-face communication. The quality of
transmission in air traffic control communication can be adversely affected by
weather, distance and other circumstances. This is one of the reasons for the
implementation of SARPs (Standards and Recommended Practices).
Communicative purposes
The two varieties of Aviation English show their biggest differences in relation to
the communicative purposes. It could be argued that both share what Biber and
Conrad (2009: 45) call the “general purpose”, i.e. the aim to ensure efficient and
effective communication between pilots and controllers, and differ only in the
specific purpose. If register status were decided by the general purpose alone, the
two varieties of Aviation English could be termed specific “subregister[s]” (Biber
and Conrad 2009: 45) of one register. However, according to the ICAO (2001: 5-1),
there should be no overlap between these two varieties: “ICAO standardized
phraseology shall be used in all situations for which it has been specified. Only
when standardized phraseology cannot serve an intended transmission, plain
language shall be used.” Considering the fundamentally different and comple-
mentary situations of use – routine versus non-routine air traffic control com-
munication (cf. ICAO 2010: 3-4, 3-5) – and the considerable linguistic differences
between the two varieties, as shown below, it can be argued that we are con-
cerned with two distinct, albeit related, registers.
Topic
The situation with regard to topic resembles that of communicative purposes:
the shared general topic of both varieties is aviation, but the specific topics
covered differ. While standardised phraseology is concerned with a fairly
restricted set of routine air traffic control issues, plain
Aviation English covers a broader range of topics in non-routine situations, such
as emergencies as well as other unusual or unexpected contexts. “Topic is the
most important situational factor influencing vocabulary choice” (Biber and
Conrad 2009: 46) and so it is not surprising that standardised phraseology and
plain Aviation English should differ to a large extent at the lexical level (cf. also
Sections 3.2 and 3.3).
Summary
With respect to the situational characteristics of the two varieties of Aviation
English, many of Biber and Conrad’s (2009: 40) parameters such as participants,
relations among participants, channel, production circumstances and setting
are shared by both registers. However, there are clear differences in the commu-
nicative purposes and the range of topics covered by standardised phraseology
and plain Aviation English respectively, which leads to the conclusion that we
are not concerned with sub-registers of a single register. From the perspective of
situational characteristics, which “can be definitely specified” (Biber and Conrad
2009: 33) for both registers, standardised phraseology and plain Aviation English
can be categorised as two distinct specialised registers.
3.2 Standardised phraseology
Lexical characteristics
Standardised phraseology is probably best known for its characteristics at the
lexical level. At the heart of this register is a reduced vocabulary consisting of a
limited number of words and fixed phrases, each with a single precise meaning in
the situational context of routine air traffic control communication.
Section 5.2.1.5.8 of Annex 10 to the Convention on International Civil Aviation
(ICAO 2001) contains a brief list of words and phrases that “shall be used in radio
telephony communications as appropriate and shall have the meaning ascribed
hereunder.” The list contains key terms of radiotelephony communication, such
as affirm for ‘yes’, cleared (cf. Transcript 1) for ‘authorised to proceed [with the air-
craft] under the conditions specified’, go ahead (cf. Transcript 4) meaning ‘proceed
with your message’ but not ‘proceed with your aircraft’, monitor (cf. Transcript 3) for
‘listen out on (frequency)’ and maintain (cf. Transcript 2) for ‘continue in accord-
ance with the condition(s) specified’. Section 12.3 of ICAO Document 4444 on Air
Traffic Management (ICAO 2007b) provides a more comprehensive collection of
words and phrases to be used in specific circumstances. For example, climb (cf.
Transcript 2) is prescribed as the phonetically dissimilar opposite of descend in
standardised phraseology, ruling out the use of ascend, which is regularly listed
as an antonym of descend in dictionaries of plain English (cf. OALDO 2014). The
recommendations even explicitly include words and phrases that should not be
used at all. For example, Section 3.1.4 of the Manual of Radiotelephony (ICAO
2007a: 3-1) suggests that “the use of courtesies should be avoided” altogether.
In practice, however, courtesies such as greetings and parting expressions are often used and
tolerated in non-urgent contexts (cf. Transcript 3). Standardised phraseology is
thus not among the many text varieties that native speakers of a language acquire
“without explicitly studying them” (cf. Biber and Conrad 2009: 2) but has to be
learned through explicit instruction by native and non-native speakers of English
alike.
From the lexical perspective, two main characteristics of the special regis-
ter referred to as standardised phraseology can be identified. First, in contrast to
most other varieties of English – where it is the rule rather than the exception for
words to have multiple meanings – each word and phrase has just one specific
and precisely defined meaning in aviation phraseology. Other meanings of words
which are polysemous in plain English are thus explicitly excluded from this reg-
ister and some of the defined meanings of words and phrases in aviation phra-
seology do not occur outside of this specialised register. Meanings of words and
phrases that do not occur in other registers are called “register markers” (Biber
and Conrad 2009: 53). Unlike register markers in many other registers, however,
these unique characteristics are strictly functional in standardised phraseology
(cf. Biber and Conrad 2009: 55). The second main lexical characteristic of this
register is the fact that words and phrases are carefully selected to avoid con-
fusion and misunderstandings due to phonetically similar expressions, since
“maximum clarity, brevity and unambiguity” (ICAO 2007a: 3-2) are considered
the most important aims of the prescription of aviation phraseology.
Grammatical characteristics
At the grammatical level, standardised phraseology is also characterised by a
number of pervasive and frequent “register features” (Biber and Conrad 2009:
53).
With respect to the use of verbs in aviation phraseology, the prescription that
most verbs in the list of essential “words and phrases” be used only in the imperative
is particularly striking (cf. ICAO 2001: 5-6 and 5-7). According to the definitions in
this list, verbs such as cancel ‘annul the previously transmitted clearance’, check
‘examine a system or procedure’, contact (cf. Transcript 2) ‘establish communications
with …’, disregard ‘ignore’, monitor (cf. Transcript 3) ‘listen out on frequency’,
maintain (cf. Transcript 2) ‘continue in accordance with the condition(s) specified’,
report ‘pass me the following information …’, and many more can only be
used in imperatives, which clearly constitutes a register feature of this variety. Aviation
phraseology even prescribes the imperative use of certain words
which are not commonly used as verbs and are thus not listed as verbs in
general-use dictionaries, e.g. the verbal use of standby (cf. Transcript 4) meaning
‘wait and I will call you’ (ICAO 2001: 5-7).
Further grammatical features characteristic of aviation phraseology are the
prescribed order of elements in an utterance and the high frequency of
ellipsis, as illustrated by the following authentic examples:
Transcript 1:
Aerogal seven hundred heavy Kennedy Tower (.) winds calm (.) runway one three left (.)
cleared to land
(JFK Tower, own transcript, 2010)
Transcript 2:
Lufthansa four two four heavy climb [to and] maintain 3000 [feet] (.) fly runway heading
[…] contact Boston Departure […]
(Boston Tower, own transcript, 2015; imperatives in bold)
Pronunciation characteristics
The ICAO publications on standardised phraseology also make specific recommendations
concerning pronunciation and speech rate, which lead to additional linguistic features
of this register. For example, there are recommendations concerning the pronunciation
of numbers and letters. The “Radiotelephony Spelling Alphabet” defines the “desired
pronunciation” (ICAO 2001: 5-4) of the words representing letters when spelling out
“names, service abbreviations and words of which the spelling is doubtful” (ICAO
2001: 5-3). According to the ICAO (2001: 5-4), for example, the letter <z> has to be
pronounced as zulu /'zu:lu:/ and <k> has to be realised as kilo /'ki:lo/ (cf.
Transcript 3).
Transcript 3:
Delta four twenty-seven (.) good day (.) continue down to kilo kilo [= taxiway KK] (.) follow
company [= another Delta jet] seven three seven (.) monitor tower one two three point niner
(JFK Ground, own transcript, 2008, my emphasis)
native speakers of English, who typically pronounce ‘3’ and ‘5’ in the usual plain
English way” (Moder 2013: 229–230). This is illustrated by Transcript 3, in which
the air traffic controller at JFK International Airport in New York City, most likely
a native speaker of English, pronounces <3> “in the usual plain English way”
(Moder 2013: 230) but realises <9> as niner.
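To make the letter and digit conventions concrete, the following minimal Python sketch renders a written designator in radiotelephony form, character by character. It is an illustration under stated assumptions, not an ICAO tool. The letter words follow the ICAO spelling alphabet. The digit words tree, fower, fife and niner, however, are assumed orthographic approximations of pronunciations that ICAO specifies phonetically. The function name spell_out and its decimal_word parameter are hypothetical, and the default point simply mirrors the usage in Transcript 3.

```python
# Illustrative sketch only: renders letters and digits in radiotelephony form.
# Letter words follow the ICAO spelling alphabet; the digit words are assumed
# orthographic approximations of the ICAO digit pronunciations.

LETTERS = {
    "A": "alfa", "B": "bravo", "C": "charlie", "D": "delta", "E": "echo",
    "F": "foxtrot", "G": "golf", "H": "hotel", "I": "india", "J": "juliett",
    "K": "kilo", "L": "lima", "M": "mike", "N": "november", "O": "oscar",
    "P": "papa", "Q": "quebec", "R": "romeo", "S": "sierra", "T": "tango",
    "U": "uniform", "V": "victor", "W": "whiskey", "X": "x-ray",
    "Y": "yankee", "Z": "zulu",
}

DIGITS = {
    "0": "zero", "1": "one", "2": "two", "3": "tree", "4": "fower",
    "5": "fife", "6": "six", "7": "seven", "8": "eight", "9": "niner",
}


def spell_out(token: str, decimal_word: str = "point") -> str:
    """Render a designator or number one character at a time.

    Letters become spelling-alphabet words, each digit is read individually,
    and a '.' is read as decimal_word (a hypothetical parameter).
    """
    words = []
    for char in token.upper():
        if char in LETTERS:
            words.append(LETTERS[char])
        elif char in DIGITS:
            words.append(DIGITS[char])
        elif char == ".":
            words.append(decimal_word)
        else:
            # Anything else has no defined form in this simplified sketch.
            raise ValueError(f"no radiotelephony form defined for {char!r}")
    return " ".join(words)


if __name__ == "__main__":
    print(spell_out("KK"))     # taxiway designator from Transcript 3
    print(spell_out("123.9"))  # tower frequency from Transcript 3
```

Applied to the tower frequency in Transcript 3, the sketch yields one two tree point niner, whereas the controller actually says three. This is the native-speaker deviation from the prescribed pronunciation discussed above. The sketch deliberately ignores grouped number forms such as four twenty-seven in the same transcript.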
Unlike most other registers, Aviation English is even subject to provisions concerning
the speed of delivery of utterances. The ICAO recommends “an
even rate of speech not exceeding 100 words per minute” (ICAO 2001: 5-5) and
an even slower rate “[w]hen it is known that elements of the message will be
written down by the recipient” (ICAO 2007a: 2-1). Studies, however, have shown
that native speakers in particular tend to use a much higher speech rate, often
over 200 words per minute, which can lead to misunderstandings and the need
for time-consuming clarifications (cf. Bieswanger 2013: 19–20). Silberstein and
Dietrich (2003: 9) quote an air traffic controller who admits: “I talk faster, a lot
faster – I talk so fast that they have to slow me down because they don’t
understand me anymore.” Since speech rate is clearly crucial in Aviation English,
all pilots and air traffic controllers need to be trained to develop an awareness of
the importance of their speed of delivery.
Ever since its introduction after the Chicago Convention more than half a
century ago, the ICAO standardised phraseology has been refined and expanded.
The continuous development of standardised phraseology has been based on
pilots’ and controllers’ experiences and the analysis of language-related accidents,
in order to cover more areas of language use in aviation, to accommodate new
procedures and technologies, and to deal with previously unknown or rare
situations. For example, in reaction to recent events, the 15th edition of the ICAO
Procedures for Air Navigation Services: Air Traffic Management (ICAO 2007b: xv)
adds, among other regulations, new “pilot procedures in the event of unlawful
interference” and “procedures related to volcanic ash”.
Pilots and air traffic controllers are constantly urged to use standardised phra-
seology and to avoid non-standard communication whenever possible (cf., e.g.,
ICAO 2001: 5-1; ICAO 2007a: 3-2; ICAO 2010: 2-3; Prinzo et al. 2010: 15). Despite all
efforts to regularly update the standardised phraseology, the ICAO also acknowledges
that “[i]t is not possible, however, to develop phraseologies to cover every
conceivable situation” (ICAO 2010: 4-2) and that “plain language shall be used”
(ICAO 2001: 5-1) when standardised phraseology is not available to cover the com-
municative needs of the stakeholders in air traffic control communication. The
following section will describe the use of plain language in such situations and
show that plain Aviation English can also be considered a specialised register.
Plain language has never been excluded from pilot-controller communication;
on the contrary, it has always been permitted and
used in clearly defined situations in which “standardized phraseology cannot
serve an intended transmission” (ICAO 2001: 5-1). As a result of this precisely
defined situational context, however, plain Aviation English is fundamentally
different from everyday conversation in several respects:
Plain Aviation English is thus characterised by features that result from the func-
tion it has to fulfil with respect to safety and the topics covered in air traffic control
communication. These constraints are the reason for distinctive register features
at all linguistic levels, described and illustrated in the following subsections.
Lexical characteristics
The lexicon of plain Aviation English is less precisely defined than the words and
phrases used in standardised phraseology, but at the same time more restricted
than, for example, the lexicon of everyday conversation in what could be called
plain English. The ICAO recommendations make it very clear that the obvious
need for plain language in non-routine situations “should in no way be inter-
preted as permission to chat” (ICAO 2010: 4-3). At the lexical level, plain Avia-
tion English is thus characterised by words and phrases corresponding to topics
related to pilot-controller communication. These topics, which are also addressed
in textbooks and courses on plain Aviation English (cf., e.g., Emery and Roberts
2008), include fields such as technology, health, animals, fire
and weather (for a detailed list of domains, cf. ICAO 2010: B5-B8). For example,
in-flight medical emergencies often make the use of plain Aviation English neces-
sary (cf. Transcript 4). In Transcript 4, standardised phraseology is used in the
first two transmissions to establish contact but then turns out to be insufficient
to serve all of the communicative needs of the pilots. Hence a code-switch takes
place, and the following three transmissions are carried out in plain Aviation English.
The vocabulary in these transmissions, however, is different from plain everyday
English in that it is characterised by aviation-related terms such as diversion,
declaring emergency and met report.
Transcript 4:
American 182 Tokyo Control American one eight two
Tokyo Control American one eight two (.) go ahead
American 182 Yes sir (.) we are (.) have a possible diversion to Narita [=Tokyo Narita
International Airport] (.) we are not declaring emergency yet but would
like Narita weather
[…] Narita airport is closed, Tokyo Haneda is suggested for a possible diver-
sion
Tokyo Control American one eight two (.) do you need met report [=weather report] of
Haneda?
American 182 Yes sir (.) request met report for Haneda
Tokyo Control Okay, standby
(Tokyo Control, own transcript, 2014)
Grammatical characteristics
The grammatical structure of plain Aviation English is similar to that of plain English
and is characterised only by a few tendencies that constitute functionally oriented
register features. Of the factors mentioned in the quotation above, “concision”
(ICAO 2010: 3-5) is certainly one of the main driving forces responsible for
these characteristics. Concision is defined as ‘giving only the information that is
necessary, using few words’ in the OALDO (2014). In the context of plain Aviation
English, this means that the utterances produced by pilots and air traffic control-
lers have to be as brief as possible and simply structured. According to Prinzo
et al. (2010: 15), the rate of readback errors is affected by “both message length
and complexity”, and they recommend that “controllers should transmit less information
more often.” With reference to concision, it has also been reported that the
desire for brevity leads standardised phraseology to influence plain
Aviation English. This shows in the deletion of function words such as determiners
even when speakers are not using phraseology (ICAO 2010: 3-6). The two transmissions
in Transcript 4 that mention met report illustrate this claim, as the determiner the
is omitted before met report in both of them.
Pronunciation characteristics
At the level of pronunciation, plain Aviation English is less restricted than stand-
ardised phraseology, as there are no specific recommendations concerning the
realisation of individual words and phrases. Other ICAO recommendations con-
cerning pronunciation, however, also apply to the use of plain language and
make plain Aviation English more restricted than plain English in many other
situations. For example, the recommended maximum speech rate of 100 words per
minute (ICAO 2001: 5-5; cf. above) also applies to plain Aviation English, which
aims for maximum “intelligibility” (cf. ICAO 2010: 3-5), just like standardised
phraseology.
4 Conclusion
The above sections have shown that Aviation English is not monolithic and that
there are not one but two varieties referred to as Aviation English, namely standardised
phraseology and plain Aviation English. Both varieties occur in precisely
defined and complementary situations in pilot-controller communication:
standardised phraseology covers most routine situations, whereas plain Aviation
English is only permitted in non-routine situations. Both varieties share many
of the situational characteristics Biber and Conrad (2009: 39) consider “relevant
for describing and comparing registers”. They are employed by the same participants,
i.e. pilots and air traffic controllers, with identical relations between the
participants, use the same channel, face the same production circumstances and
share the same setting. The main differences with regard to the situational char-
acteristics can be found in the communicative purposes and the topics covered.
While both varieties share their general purpose, namely to facilitate efficient
and effective air traffic control communication, standardised phraseology is
restricted to a limited set of frequently used communicative purposes in routine
situations, whereas plain Aviation English covers a whole range of less frequently
used and non-routine communicative purposes such as emergencies. A similar
pattern can be identified concerning the topics covered by these two varieties:
while standardised phraseology covers a restricted but very frequently used set of
topics in routine air traffic control communication, plain Aviation English covers
a much broader range of air traffic related topics in non-routine situations.
As a result of their partially different situational contexts, both varieties of
Aviation English are characterised by pervasive linguistic features that fulfil specific
functions in their respective situations. Standardised phraseology is characterised
by a very precisely defined, reduced set of words and phrases, each with a
References
Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison.
Cambridge: Cambridge University Press.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge
University Press.
Bieswanger, Markus. 2013. Applied linguistics and air traffic control: Focus on language
awareness and intercultural communication. In Silvia Hansen-Schirra & Karin Maksymski
(eds.), Aviation communication: Between theory and practice, 15–30. Frankfurt am Main:
Peter Lang.
Convention on International Civil Aviation. 1944. Convention on international civil aviation
done at Chicago on the 7th day of December 1944. Original version available at
http://www.icao.int/publications/Documents/7300_orig.pdf (accessed 31 January 2014).
Crystal, David. 2003. English as a global language. 2nd edn. Cambridge: Cambridge University
Press.
Cushing, Steven. 1994. Fatal words: Communication clashes and aircraft crashes. Chicago: The
University of Chicago Press.
Emery, Henry & Andy Roberts. 2008. Aviation English: For ICAO compliance. Oxford: Macmillan.
Hansen-Schirra, Silvia & Karin Maksymski (eds.). 2013. Aviation communication: Between
theory and practice. Frankfurt am Main: Peter Lang.
ICAO (International Civil Aviation Organization). 2001. Annex 10: Aeronautical
telecommunications. Volume II. 6th edn.
ICAO (International Civil Aviation Organization). 2007a. Manual of radiotelephony. 4th edn.
ICAO Document 9432-AN/925.
ICAO (International Civil Aviation Organization). 2007b. Procedures for air navigation services:
Air traffic management. 15th edn. ICAO Document 4444-ATM/501.
ICAO (International Civil Aviation Organization). 2010. Manual on the implementation of ICAO
language proficiency requirements. 2nd edn. ICAO Document 9835-AN/453.
Intemann, Frauke. 2008. ‘Taipei ground, confirm your last transmission was in English … ?’ – An
analysis of Aviation English as a world language. In Claus Gnutzmann & Frauke Intemann
(eds.), The globalisation of English and the English language classroom, 76–93. 2nd edn.
Tübingen: Narr.
Jenkins, Jennifer. 2008. Teaching pronunciation for English as a Lingua Franca: A sociopolitical
perspective. In Claus Gnutzmann & Frauke Intemann (eds.), The globalisation of English
and the English language classroom, 145–158. 2nd edn. Tübingen: Narr.
Jones, R. Kent. 2003. Miscommunication between pilots and air traffic control. Language
Problems and Language Planning 27(3). 233–248.
Kostecka, Robert. 2007. Aviate—Navigate—Communicate. Transport Canada: Aviation safety
letter 2/2007, 12–14.
Live-atc.net. www.live-atc.net (accessed 19 February 2015).
Mathews, Elizabeth. 2004. New provisions for English language proficiency are expected to
improve aviation safety. ICAO Journal 59(1). 4–6, 27.
Mell, Jeremy. 2004. Language training and testing in aviation need to focus on job-specific
competencies. ICAO Journal 59(1). 12–14, 27.
Mitsutomi, Marjo & Kathleen O’Brien. 2004. Fundamental aviation language issues addressed
by new proficiency requirements. ICAO Journal 59(1). 7–9, 26–27.
Moder, Carol Lynn. 2013. Aviation English. In Brian Paltridge & Sue Starfield (eds.), The
handbook of English for specific purposes, 227–242. Malden: John Wiley & Sons.
OALDO (Oxford advanced learner’s dictionary online). 2014. http://oald8.oxfordlearnersdictionaries.com/ (accessed 31 January 2014).
Prinzo, Veronika O., Alan Campbell, Alfred M. Hendrix & Ruby Hendrix. 2010. U.S. airline
transport pilot international flight language experiences. Report 5: Language experiences
in native English-speaking airspace/airports. Technical report DOT/FAA/AM-10/18.
Washington, DC: Federal Aviation Administration, Office of Aerospace Medicine.
Silberstein, Dagmar & Rainer Dietrich. 2003. Cockpit communication under high cognitive
workload. In Rainer Dietrich (ed.), Communication in high risk environments (Special issue
12 of Linguistische Berichte), 9–56. Hamburg: Buske.
Rolf Kreyer
‘Now niggas talk a lotta Bad Boy shit’:
The register hip-hop from a corpus-linguistic perspective
Abstract: The present paper provides a first corpus-based analysis of one of the most successful kinds of popular music, namely hip-hop. In particular, the paper explores to what extent hip-hop can be regarded as a register in its own right, analysing data drawn from a 200,000-word corpus of the most successful hip-hop albums of 2003 and 2011. Taking Biber and Conrad’s (2009) register-defining triad of situation of use, linguistic features, and associated functions as a descriptive framework, it argues that hip-hop does indeed warrant the status of a register in its own right.
1 Introduction
In Western societies, pop songs are an integral part of everyday life: we are surrounded by pop songs in the supermarket, in the elevator or when driving a car. Moreover, listening to pop songs is one of the most popular pastimes (if not the most popular) among adolescents in America and Western Europe (cf., for instance, Schwartz and Fouts 2003). Given the pervasiveness of pop songs, it is surprising that the scientific study of this register does not figure very prominently in linguistics, although pop songs have received considerable attention in fields like cultural studies.
In this respect, it is telling that none of the major corpora of the English language contains any pop song lyrics. The linguistic analysis of this register is still in its infancy, and corpus-linguistic studies are few and far between. An early corpus-based analysis of pop songs is Murphey (1989; cf. also 1990 and 1992), who provides both quantitative and qualitative data from a 13,000-word corpus of pop-song lyrics. His main focus, however, is not the description of a register but the exploitation of pop songs for the learning and teaching of English as a foreign language. A much more ambitious project is the BLUR (Blues Lyrics collected at the University of Regensburg) corpus, which contains 7,341 song texts comprising roughly 1.5 million words (Miethaner 2001, 2005; Schneider and Miethaner 2006). However, this corpus, which consists of recordings from the 1920s to the 1940s, was compiled as evidence for earlier African American Vernacular English and is accordingly of only limited value for the study of pop songs as an important present-day register. More detailed analyses of modern pop songs can be found in Kreyer and Mukherjee (2007) and Kreyer (2012). The former make a first attempt at describing the major linguistic properties of the register at issue, such as deviant spellings (cf. also Mukherjee 2000) and lexical/lexico-grammatical aspects. One focus of their research is the degree to which pop songs can be considered a written or a spoken register. The data show that the register is, in general, more spoken-like, as shown by similarities with spoken language in average word length and by the high frequency of the personal pronouns you and I. Interestingly, other features that are typical of spoken language, such as the frequent use of you know as a discourse marker, turned out to be less important in pop songs. Kreyer (2012) explores the use of love-related metaphors in pop songs within the framework of conceptual metaphor theory (e.g. Lakoff and Johnson 1980; Kövecses 2002). He finds that, despite the (perhaps) popular assumption that pop songs are clichéd, their metaphors are quite varied and creative. The most recent register-related study of pop songs is Werner (2012). Since he is interested in small-scale diachronic as well as varietal aspects of pop songs, his corpus consists of two subcorpora, one with British lyrics and the other with American lyrics. The 1,128 songs included in the corpus span the years 1952–2008 and 1946–2005, totalling 171,968 and 170,234 words, respectively (Werner 2012: 23). Werner’s findings also confirm earlier claims about the informal and conversational nature of pop song lyrics. However, he argues convincingly that subsuming pop song lyrics under the conversational register would go too far. Rather, the low frequencies of typical spoken features such as interjections or non-standard morphosyntactic elements call for a more careful analysis: “the picture of pop-song lyrics as exemplars of spoken/informal register […] had to be […] altered to be thought of as a ‘special’ register” (Werner 2012: 43).
The present paper aims to contribute further to our understanding of pop song lyrics from a register perspective by exploring hip-hop as a potential sub-register. An obvious question is whether pop songs can be regarded as one single monolithic register or whether it makes sense to assume more specific registers covered by the umbrella term ‘pop songs’. Biber and Conrad (2009: 10) claim that “[t]here is no one correct level on which to identify a register” and “that registers can be studied on many different levels of specificity”.
The present paper aims to provide a first corpus-based analysis of one of the most successful (musical) genres among pop songs, namely hip-hop. At this point, the label ‘genre’ is also to be understood in its linguistic sense, since we cannot yet be sure that hip-hop constitutes a register. Based on data from an updated pilot version of the Giessen-Bonn Corpus of Popular Music (GBoP; cf. Kreyer and Mukherjee 2007), the paper explores Biber and Conrad’s (2009: 50) three criteria for register analysis (situational characteristics, linguistic characteristics and function; cf. Schubert, this volume) and shows that, with regard to all of these, hip-hop must be regarded as a register in its own right.
2 The data
The data for the present study are taken from an extended pilot version of GBoP, which contains lyrics from the top albums in the US album charts of the years 2003 and 2011.¹ More specifically, for 2003, 48 of the top 52 albums were included. Four albums had to be ignored because they either did not contain any lyrics at all or only contained non-English lyrics. The 2003 lyrics were taken from internet lyrics archives or from CD booklets (cf. Kreyer and Mukherjee 2007 for details). The 2003 material has been supplemented by the (English) lyrics of the top 50 albums from 2011. These lyrics were primarily taken from A-Z lyrics (www.azlyrics.com). This site is particularly suitable, since the lyrics it provides are usually reviewed by a number of different users, resulting in a fairly ‘reliable’ version of the texts. In some cases, other archives like metrolyrics (www.metrolyrics.com) or lyricsfreak (www.lyricsfreak.com) had to be consulted.
From this collection of albums, a subcorpus was compiled of albums that would usually be considered representative of hip-hop. Of course, the decision whether or not to include an album is not an easy one. The criterion applied was whether the featured artist was primarily considered a rapper/hip-hopper (information taken from www.discogs.com). Nelly, for instance, is primarily regarded as a rapper, which is why his album Nellyville was included in the corpus, even though it contains tracks that might rather be considered R&B. Stripped by Christina Aguilera, by contrast, was not included, since the performer is not primarily regarded as a rapper or hip-hopper, although some of the songs on her album would fall under that category. Compilation albums were excluded if they featured more than one artist. All in all, the hip-hop corpus contains the lyrics from 18 albums, 9 from 2003 and 9 from 2011. Table 1 shows the composition of the corpus.
1 My first explorations of the development of pop music registers started in 2012 when the data
from 2011 was the most recent data available.
Album # words
Since “[t]he analysis of register characteristics […] will generally focus on the comparison of two or more registers” (Biber and Conrad 2009: 36), the hip-hop data will be contrasted with the data from the remaining albums, in the following referred to as the ‘non-hip-hop corpus’ or ‘control corpus’ (cf. Appendix 1 for its composition). Although this control corpus contains almost four times as many albums, its word count is comparatively small, namely slightly below 350,000.
In all the texts, the original punctuation and spelling deviations were retained. This is particularly important for hip-hop, as spelling conventions are an important means of creating identity (cf. Morgan 2001, 2002 and Olivio 2001). Metatextual comments such as verse, chorus or bridge, or the identity of the singer in duets, were removed from the text. Choruses were spelt out every time they appeared in the text, i.e. a comment like Chorus [2x] was replaced by a repetition of the lines of the chorus. Where it was not clear from the text layout which words were still part of the chorus and which belonged to the verse, an audio version of the song was consulted. Other kinds of repetition were spelt out if they contained words, e.g. a line like She (When she loves) [3x] was represented three times in the corpus (without the [3x], of course). However, if repetitions consisted of non-lexical material only, they were not made explicit, e.g. Oooooh oooh ooohohhh [x2]. All texts were stored in .txt format. An example of a text is given in (1) below (note that <Z>, from German Zeilenumbruch, stands for line break).
(1) G-Unit (What) <Z> We in here (What) <Z> We can get the drama popping <Z> We don’t
care (What, what, what) <Z> It’s going down (What) <Z> ’Cause I’m around (What) <Z>
50 Cent, you know how I gets down (Down) <Z> What up, Blood? (What) <Z> What
up, Cuz? (What) <Z> What up, Blood? (What) <Z> What up, Gangstaaa? </C> What up,
Blood? (What) <Z> What up, Cuz? (What) <Z> What up, Blood? (What) <Z> What up,
Gangstaaa? <Z>
(50 Cent – What Up Gangsta?)
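The preprocessing conventions described above (choruses spelt out, lexical repetitions expanded, non-lexical repetitions left implicit, <Z> marking line breaks) can be sketched as a small normalisation function. This is a minimal illustration under our own assumptions, not necessarily the procedure used for GBoP: the marker syntax ([2x], [x2]), the set of section labels, supplying the chorus text separately, and the crude test for ‘non-lexical’ lines (every token made up only of the letters o, h, a, m, u) are all hypothetical.

```python
import re

# Assumed repetition markers such as "[2x]" or "[x2]" at the end of a line.
REPEAT = re.compile(r"\s*\[(?:(\d+)\s*x|x\s*(\d+))\]\s*$", re.IGNORECASE)
# Assumed label-only lines such as "Chorus", "Verse 2:" or "Bridge".
LABEL = re.compile(r"^\s*(chorus|verse|bridge|hook)\b[\s\d:]*$", re.IGNORECASE)

def is_lexical(line):
    """Crude heuristic: a line is non-lexical if every token is a vocalisation
    made only of the letters o, h, a, m, u (e.g. 'Oooooh', 'ahh', 'mmm')."""
    tokens = re.findall(r"[a-z']+", line.lower())
    return any(not re.fullmatch(r"[ohamu]+", t) for t in tokens)

def split_repeat(line):
    """Strip a trailing repetition marker and return (text, count)."""
    m = REPEAT.search(line)
    if m is None:
        return line.strip(), 1
    return line[:m.start()].strip(), int(m.group(1) or m.group(2))

def normalise(lines, chorus):
    """Expand choruses and lexical repetitions; join lines with <Z>."""
    out = []
    for raw in lines:
        text, n = split_repeat(raw)
        if not text:
            continue
        if LABEL.match(text):
            # Metatextual labels are dropped; a chorus label is replaced by the chorus.
            if text.lower().startswith("chorus"):
                out.extend(chorus * n)
            continue
        # Non-lexical repetitions are kept once (the marker itself is dropped here).
        out.extend([text] * (n if is_lexical(text) else 1))
    return " <Z> ".join(out) + " <Z>"

print(normalise(["She (When she loves) [3x]", "Oooooh oooh ooohohhh [x2]", "Chorus [2x]"],
                chorus=["We in here"]))
```

Whether the original [x2] marker was kept for non-lexical lines is not stated in the text; the sketch simply drops it.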
All analyses of the corpus material were conducted using AntConc 3.2.4 (Anthony 2011) and Wmatrix (Rayson 2003, 2009).
In many respects, hip-hop and pop songs in general share situational features. For instance, in both cases the channel is identical: the primary mode is (sung) speech, and the speech event is captured on a permanent medium (apart from live concerts, of course). Similarly, the settings are identical, e.g. participants communicate at different times and in different places. Features of addresser and addressee can be regarded as similar as well, at least on a general level. Production circumstances might be described as ‘revised and edited’ in both cases, although spontaneous rapping plays an extremely important role in hip-hop culture (e.g. during battlin’ or cypha, i.e. rap competitions).
A   general and abstract terms
B   the body and the individual
C   arts and crafts
E   emotion
F   food and farming
G   government and the public
H   architecture, housing and the home
I   money and commerce in industry
K   entertainment, sports and games
L   life and living things
M   movement, location, travel and transport
N   numbers and measurement
O   substances, materials, objects and equipment
P   education
Q   language and communication
S   social actions, states and processes
T   time
W   world and environment
X   psychological actions, states and processes
Y   science and technology
Z   names and grammar
At the highest level of specificity, a total of 232 category labels is provided. The category E ‘Emotion’, for instance, contains six subcategories, one of which is subdivided into two further sub-classes. Figure 2 shows the structure of the category ‘Emotion’:
Figure 2: The semantic category ‘Emotion’ in USAS (Archer et al. 2002: 10–11).
An example of the semantic tagging can be seen in (2), which shows a few words
from Tupac Shakur’s Still Ballin’.
The verb blame is tagged as a ‘speech act term’ (Q2.2) and, alternatively, as either ‘general ethics’ (G2.2) or ‘Crime, law and order: Law & order’ (G2.1). The minus sign following G2.2 indicates the lack of ethics. Note that the tags are not given in alphanumerical order; their sequence reflects the likelihood that USAS assigns to each tag. The following three words, it, on and my, are tagged either as ‘pronoun’ (Z8) or as ‘grammatical bin’ (Z5). The tag S4f for mama tells us that we are dealing with a kinship term, more specifically, female kin.
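The tag anatomy just described (a major field letter, numeric subdivisions, an optional gender marker such as f, and optional + or - signs, with alternative tags listed in order of likelihood) can be parsed with a regular expression. This is a simplified sketch: the pattern covers only the features mentioned here, and real USAS/Wmatrix output contains further markers (e.g. for multi-word units or slash tags) that it would reject. The names `UsasTag`, `parse_tag` and `parse_candidates` are hypothetical.

```python
import re
from typing import NamedTuple

# Simplified USAS tag shape: field letter, optional numeric subdivisions,
# optional gender marker letters (f/m/n), optional run of + or - signs.
TAG = re.compile(r"^([A-Z])(\d+(?:\.\d+)*)?([fmn]*)([+-]*)$")

class UsasTag(NamedTuple):
    field: str        # major discourse field, e.g. "G"
    subdivision: str  # e.g. "2.2"; "" if absent
    gender: str       # gender marker letters such as "f" (female); "" if none
    polarity: int     # +1 per "+", -1 per "-"

def parse_tag(tag):
    """Split one tag string into its components (simplified pattern)."""
    m = TAG.match(tag)
    if m is None:
        raise ValueError(f"unrecognised tag: {tag!r}")
    letter, sub, gender, signs = m.groups()
    return UsasTag(letter, sub or "", gender, signs.count("+") - signs.count("-"))

def parse_candidates(tag_string):
    """Split a space-separated list of alternative tags, most likely first."""
    return [parse_tag(t) for t in tag_string.split()]

print(parse_candidates("Q2.2 G2.2- G2.1"))
print(parse_tag("S4f"))
```

Keeping the candidates as an ordered list preserves the likelihood ranking, so the first element corresponds to the preferred reading.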
Like all automatic annotation, semantic annotation is not fully accurate. In particular, hip-hop, with its idiosyncratic spelling and use of words, can lead to problems. For instance, the frequencies of individual semantic categories showed ‘Food and Farming’ (category F) to be a topic of particular relevance for rappers and hip-hoppers, a somewhat counter-intuitive finding. A closer look at the data quickly revealed that this was due to the ambiguity of the string hoe, which denotes both a farming tool and, in its slang use, a ‘promiscuous woman’. Another problem became apparent with the tag G1.2, ‘Politics’: the Patois personal pronoun form dem, which is highly frequent in the lyrics by Sean Paul, was evidently interpreted as an abbreviation for democrat or related words. Similarly, the form dat (that), presumably misinterpreted as the acronym for digital audio tape, led to a very high frequency of the semantic category K3, ‘Recorded Sound’, which as a consequence has also been ignored.
Such problematic cases aside, semantic annotation can give us an idea of topics that are comparatively frequent or rare in hip-hop as opposed to other pop songs. To this end, all semantic categories with a relative frequency higher than 0.02 % in the hip-hop corpus were checked against the respective categories in the control corpus, i.e. the non-hip-hop corpus. Table 2 provides an overview of some semantic categories that seem especially revealing of the image the artists project.
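The selection step can be illustrated as follows. The sketch assumes that relative frequency means the percentage of a category’s tags among all tokens of a corpus, and it lets the user exclude categories identified as tagging artefacts (such as K3 above). Sorting by the difference between the two corpora is our own choice, and the toy counts are invented rather than taken from the study.

```python
from collections import Counter

def relative_freq(counts, total_tokens):
    """Percentage of all tokens carrying each semantic tag."""
    return {tag: 100 * n / total_tokens for tag, n in counts.items()}

def compare_categories(hip, hip_total, ctrl, ctrl_total, threshold=0.02, exclude=frozenset()):
    """Return (tag, % in hip-hop, % in control) for tags above `threshold` % in hip-hop."""
    rf_hip = relative_freq(hip, hip_total)
    rf_ctrl = relative_freq(ctrl, ctrl_total)
    rows = [
        (tag, pct, rf_ctrl.get(tag, 0.0))
        for tag, pct in rf_hip.items()
        if pct > threshold and tag not in exclude
    ]
    # Our own choice: list the largest hip-hop surplus first.
    rows.sort(key=lambda r: r[1] - r[2], reverse=True)
    return rows

hip = Counter({"G2.1": 30, "K3": 50, "E4.1": 1})  # toy counts, not study data
ctrl = Counter({"G2.1": 10})
print(compare_categories(hip, 10_000, ctrl, 20_000, exclude={"K3"}))
```

Normalising to percentages is what makes the two corpora comparable despite their different sizes.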
Table 2: A sample of semantic categories that are particularly frequent in the hip-hop corpus.
from the semantic categories is in line with analyses of rap and hip-hop videos. Jones (1997: 353), for instance, claims that rap music shows a high incidence of “socially questionable behaviors [… like] guntalk, drugtalk, the presence of alcohol, bleeping of profanity, and gambling” (cf. also DuRant et al. 1997; Smith and Boysen 2002; Kreyer 2015). On the whole, it could be argued that the topics explored in hip-hop promote a ‘bad boy’ image of the artist.
In addition to topic-related contrasts between pop songs and hip-hop, another major difference seems to lie in the relations among the participants, which, in turn, have a bearing on the communicative purpose of hip-hop as opposed to other pop songs. In Biber and Conrad’s (2009) approach, relations among participants are described along four dimensions, namely interactiveness, social roles, personal relationship, and shared knowledge. With regard to this variable, hip-hop seems to occupy a special position. Spady et al. (1999: 67) provide the following quote from the rapper Method Man: “The streets is where you get you stripes at”. This hints at the important role of street credibility, i.e. a hip-hopper’s remaining close to his or her cultural background in ‘the streets’. Alim (2006: 113) writes: “Hip-hop Culture not only began in the streets of Black America, but the streets continue to be a driving force in contemporary Hip-hop Culture.” Although successful hip-hop artists, like any other successful pop singers, mostly interact with a displaced audience, “[t]he members of the Black American Street Culture, to whom the artists are directing their lyrics, are not physically present, yet they are in conversation” (Alim 2006: 123). This suggests a relatively high level of (perhaps abstract) interactiveness that might not be typical of other pop songs. Similarly, the artists’ focus on street identity and group solidarity seems to have important consequences for the other three dimensions of participant relations: artists assume a relation with the members of their audience that can be characterised by relative similarity of status, a large amount of shared knowledge (gained on the streets) and a personal relationship that would be described as that of friends or brothas and sistas, rather than that of star and fan as in many other pop music genres. This special relation between artist and audience leads to an additional communicative purpose, namely that of “staying street”, i.e. of staying connected to the streets and to their cultural background. Hip-hoppers use their art to “represent ‘the streets’” but at the same time “to connect with the streets as a space of culture, creativity, cognition, and consciousness” (Alim 2006: 124). A particularly impressive example of this is provided by Ja Rule’s Connected from the album The Last Temptation.
(3) We world wide connected, and ya’ll don’t want to fuck with us
In the streets we respected, so ya’ll don’t want to fuck wit us
World wide connected nigga, ya’ll don’t want to fuck wit us
We gangster ass niggas and we hard to hit
Murder Inc in the role who could fuck wit this
On the whole, then, the differences in situational characteristics between hip-hop and other pop songs justify granting hip-hop the status of a register in its own right.
another way of addressing the particular audience […]. In other words, rap artists construct
themselves as ‘authentic’ through the use of language […,] through the use of locally signif-
icant images, sounds, and written texts.
He, too, reports on the ‘r-lessness’ of AAVE, as in the two examples above or in cases like gangsta, rida, murda etc. In some cases, stressing the AAVE pronunciation leads to a decisive shift in meaning, as the late Tupac Shakur points out regarding nigga: “Niggers was the ones on the rope, hanging off the thing; Niggas is the ones with gold ropes, hanging out at clubs” (Lazin 2003). In the following, we will take a look at two idiosyncratic spelling features, namely orthographic -a instead of -er and word-final -z as a plural marker. Table 3 shows the frequency of these two non-standard spelling variants in the hip-hop corpus and the non-hip-hop control corpus.²
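Candidate forms for both spelling features can be located semi-automatically by checking whether a token becomes a standard word when final -a is replaced with -er, or when final -z is replaced with -s. The sketch below uses a tiny, hypothetical lexicon in place of a real word list. It is not the procedure reported in this paper, and it deliberately misses forms involving further changes (e.g. mutha, muhfucka), which would still require manual inspection.

```python
# Tiny hypothetical lexicon of standard spellings; a real study would need a full word list.
LEXICON = {"another", "better", "gangster", "killer", "murder", "number",
           "player", "rider", "super", "boys", "guns", "killers", "riders"}

def r_less_source(token):
    """Return the standard -er spelling if `token` looks like an r-less -a form."""
    t = token.lower()
    if t.endswith("a") and t[:-1] + "er" in LEXICON:
        return t[:-1] + "er"
    return None

def z_plural_source(token):
    """Return the standard -s plural if `token` uses word-final -z as plural marker."""
    t = token.lower()
    if not t.endswith("z"):
        return None
    stem = t[:-1]
    if stem + "s" in LEXICON:            # boyz -> boys
        return stem + "s"
    base = r_less_source(stem)           # killaz -> killa -> killer
    if base is not None and base + "s" in LEXICON:
        return base + "s"
    return None

for form in ["Gangsta", "playa", "mutha", "Boyz", "Killaz"]:
    print(form, r_less_source(form), z_plural_source(form))
```

Chaining the two checks lets a form such as killaz count as both an r-less spelling and a -z plural, which matches how such forms combine the two features.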
Table 3: ‘r-less’ forms in the hip-hop corpus and the non-hip-hop control corpus.
anotha 11 83 0 mutha 1 0 0
balla 0 10 1 Muthafucka 1 0 0
betta 12 43 0 muthafucka 7 0 0
bigga 1 16 0 muthafuka 1 0 0
brotha 1 16 3 neitha 1 7 0
docka 1 0 0 Numba 1 49 0
fucka 1 2 0 playa 18 26 3
gangsta 41 6 4 Rida 3 4 0
2 The frequencies shown here are not entirely unproblematic because the texts were primarily
taken from lyrics archives (i.e. are most likely transcribed by fans) and not from official booklets.
To some extent, then, the numbers represent the audience rather than the artists themselves.
However, they still provide us with an idea of the use of non-standard spelling within the hip-
hop community, of which the artists want and claim to be a part.
Table 3(continued)
Gangstaa 3 0 0 rocka 2 2 0
Harda 1 12 0 stoppa 1 0 0
hotta 1 13 0 stunna 1 1 0
killa 2 15 0 sucka 1 6 0
lova 0 3 12 Sucka 2 0 0
mobsta 1 1 0 supa 1 26 0
motha 1 33 16 swagga 2 6 0
mothafucka 7 51 0 trigga 3 6 0
Motherfucka 1 7 0 wanksta 8 0 0
muhfucka 1 0 0 whateva 4 37 0
Murda 4 112 0
Boyz 21 41 0
Dredz 1 0 0
gangstaz 0 8 1
Gunnerz 1 0 0
Gunz 2 0 0
Hoez 1 148 0
Killaz 1 6 0
Outlawz 6 0 0
Ridaz 8 5 0
[…] the use of non-standard orthographic choices may be another way of addressing the particular audience, while these forms appear alongside standard orthographic forms which are available to be consumed by a more general audience. In other words, rap artists construct themselves as ‘authentic’ through the use of language and accounts of the social and economic realities in late-capitalist society, and the effects of this reality on the lives of rap artists and their communities; but they also construct an ‘authentic’ audience through the use of locally significant images, sounds, and written texts.³
The only consistent use of non-standard spelling in the present corpus is found in the texts by the Jamaican rapper Sean Paul. His texts seem to be primarily addressed to a specific audience consisting of speakers of Patois. Consider the example below:
Other possible register features or even register markers can, of course, be found in the lexis of hip-hop, particularly taboo expressions. Beers Fägersten (2008) reports on the frequency of taboo terms as a feature of hip-hop. In her analysis of a 100,000-word corpus of postings on a hip-hop message board, she found that the frequency of “swear words, profanity or taboo terms” such as shit, fuck, ass, nigga and bitch “suggests that such linguistic behaviour is in fact characteristic of the hip-hop community” (Beers Fägersten 2008: 223–224). These taboo words “serve to discursively represent the hip-hop individual, and subsequently the community as well, by virtue of their recognisability as taboo words” (Beers Fägersten 2006: 29).
With some uses of these taboo words we see what Morgan (2002: 121) refers to as inversion, where “an AAE [African American English] word means the opposite of at least one definition of the word in dominant culture”. The word shit, for instance, “can refer to almost anything – positions, events, etc.” (Smitherman 2000: 257). The shit is “a person who is the ultimate; most powerful; above all others; top dog” (Smitherman 2000: 257). Another example is the form nigga, where the idiosyncratic spelling signals a decisive shift in meaning, as discussed above.
3 Of course, orthographic choices play a comparatively minor role, since the main way of addressing the audience is through the auditory channel.
Table 5 shows the 30 word forms with the highest keyness values (as calculated by AntConc) in the hip-hop corpus when compared to the non-hip-hop control corpus.
Table 5: The top 30 key word forms in hip-hop when compared to the non-hip-hop corpus.
Rank | Token | Freq. hip-hop | Rel. freq. hip-hop | Freq. non-hip-hop | Rel. freq. non-hip-hop | Keyness of token in hip-hop
The frequent use of taboo words and profanity reported in Beers Fägersten (2006 and 2008) can also be observed in the present corpus, the top five keywords being nigga, shit, fuck, niggas and bitch. Inflectionally related forms occur at rank 10 (niggaz), rank 15 (bitches) and rank 16 (fucking). In addition, we see a marked preference for terms with strong sexual connotations, such as ass, hoes and pussy. Some of the forms in the list might even be considered register markers: niggaz, Zoop and yuh do not occur at all in the control corpus. The form Zoop, however, cannot be regarded as indicative of the register, as it lacks the pervasiveness necessary for register features/markers: it only occurs in one song, CG by Nelly.
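AntConc computes keyness with a statistical test that compares each word’s frequency in the two corpora; the paper does not state which setting was used. Assuming the log-likelihood statistic (one of the options AntConc offers), the computation can be sketched as follows: expected frequencies are derived from the pooled count and the corpus sizes, and words that are relatively rarer in the target corpus receive a negative sign. The function names are hypothetical.

```python
import math

def log_likelihood(a, b, c, d):
    """Signed log-likelihood keyness.
    a, b: frequency of a word in the target and the reference corpus;
    c, d: total number of tokens in the target and the reference corpus."""
    e1 = c * (a + b) / (c + d)   # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)   # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    ll *= 2
    # Negative keyness: the word is relatively rarer in the target corpus.
    return ll if a / c >= b / d else -ll

def keywords(target, reference, top=30):
    """Rank the word forms of `target` (dict word -> freq) by keyness against `reference`."""
    c, d = sum(target.values()), sum(reference.values())
    scores = {w: log_likelihood(f, reference.get(w, 0), c, d) for w, f in target.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

A word that never occurs in the reference corpus, like niggaz or yuh above, still gets a finite score, because only the non-zero term contributes to the sum.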
Anyone who has ever listened to hip-hop or seen hip-hop videos is well aware of the fact that it is an art form dominated by African Americans, at least in the US. Are, then, the linguistic features of hip-hop merely a consequence of the AAVE dialect? If that were the case, one would be hard pressed to argue that these linguistic features fulfil a particular function in a particular situation. An answer to this question is provided by Alim (2009: 117–123) in an analysis of the absence of the present tense copula forms is and are. He compares the frequencies of copula absence in the language of two hip-hoppers, Juvenile and Eve, in two kinds of texts: an interview and their lyrics. For both artists, Alim (2009: 121–122) finds
an increase in the frequency of absence […] when moving from the interview data to the
lyrical data. […] it is clear that both of these artists display the absent form more frequently
in their lyrical data than in their interview speech data. […] the data suggest that the more
attention the artists pay to their speech (comparing interviews to lyrics) the more ‘non-
standard’ their speech becomes […].
His claim “that Hip-hop artists are indeed in conscious control of their copula
variability” (Alim 2009: 123) suggests that hip-hoppers deliberately make use of
AAVE features to achieve a particular (yet to be identified) effect. It makes sense,
therefore, to regard idiosyncratic linguistic features as exponents of register.
We will now look at patterns in which a personal pronoun is either followed or not followed by a present tense form of BE (in the past tense the copula is not absent; cf. Alim 2006: 117), which is in turn followed by either an NP (with definite or indefinite article) or an ing-form of a verb, as in the examples below.
Originally, it was planned to conduct an automatic search for the above patterns. Since Wmatrix provides the means to tag corpora for part of speech, a query for strings of parts of speech seemed to be the method of choice. However, it soon became clear that the accuracy of the CLAWS tagger suffered from idiosyncratic syntax and idiosyncratic spelling conventions, particularly in the hip-hop corpus. As a consequence, the patterns above were identified on the basis of lexical queries, for instance ‘I a/an’, ‘I’m a/an’ or ‘I am a/an’ as the possible instantiations of pattern (8) with the first person singular personal pronoun. The resulting concordances were post-edited to weed out non-target hits, such as those shown below.
As can be seen in example (11), a query that is based only on lexical information will also find tokens that end in -ing although they are not progressive forms. Example (12) shows a written representation of an extremely reduced variant of I am going to. Example (13) shows how problems can arise because of Patois transcription and grammar: a is not the indefinite article in this case; rather, it seems to be an equivalent of emphatic do in British English.⁴ Example (14) is particularly challenging, since the text alone would allow two readings, namely as an instance of the pattern we are interested in or as an appositive construction. The only way to resolve the ambiguity was to listen to the track, which showed that the second reading is the more plausible one.
(11) I’m everything you love (Kid Rock: I’m Wrong But You Ain’t Right)
(12) I’m a call you as soon as I land (Wiz Khalifa: Top Floor)
(13) We nuh cater fi nuh guy and only girls we a request (Sean Paul: Like Glue)
(14) We the people / Are we the people? (Metallica: Some Kind of Monster)
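The lexical-query approach, and the reason post-editing was needed, can be sketched with regular expressions. The query table and function names below are hypothetical. The pattern accepts an absent, contracted or full copula followed by a, an, the or any word ending in -ing, which is exactly why strings like (11) to (14) surface as candidates and have to be weeded out by hand.

```python
import re

# Hypothetical query table: pronoun -> (full form, contracted suffix) of present-tense BE.
PRONOUNS = {
    "I": ("am", "m"),
    "you": ("are", "re"), "ya": ("are", "re"),
    "he": ("is", "s"), "she": ("is", "s"), "it": ("is", "s"),
    "we": ("are", "re"), "they": ("are", "re"),
}

def build_query(pronoun):
    """Compile a lexical query for pronoun + (copula) + a/an/the/-ing."""
    full, contr = PRONOUNS[pronoun]
    return re.compile(
        rf"\b(?P<pron>{pronoun})"
        rf"(?P<cop>['’]{contr}|\s+{full}|)"   # contracted, full or absent copula
        r"\s+(?P<comp>an?|the|\w+ing)\b",       # article or word ending in -ing
        re.IGNORECASE,
    )

def copula_hits(line, pronoun):
    """Return (variant, match) candidates; each still needs manual checking."""
    hits = []
    for m in build_query(pronoun).finditer(line):
        cop = m.group("cop")
        if not cop:
            variant = "abs."
        elif cop[0] in "'’":
            variant = "contr."
        else:
            variant = "full"
        hits.append((variant, m.group(0)))
    return hits

print(copula_hits("I’m everything you love", "I"))             # false positive like (11)
print(copula_hits("We the people / Are we the people?", "we"))  # ambiguous like (14)
```

All of these hits are formally well-formed matches; only the manual step can tell that none of them is a genuine instance of the target pattern.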
Table 6: Copula be and copula absence in the hip-hop corpus (‘abs.’, ‘contr.’ and ‘full’ refer to
absent, contracted and full form of the copula, respectively).
abs. contr. full abs. contr. full abs. contr. full abs. contr. full
ya are/ø a/the/…ing 0 0 0 0 0 0 15 0 0 6 0 0
He is/ø a/the/…ing 2 5 0 1 7 0 6 14 0 9 26 0
Table 7: Copula be and copula absence in the non-hip-hop control corpus (‘abs.’, ‘contr.’ and
‘full’ refer to absent, contracted and full form of the copula, respectively).
abs. contr. full abs. contr. full abs. contr. full abs. contr. full
He is/ø a/the/…ing 0 14 9 0 7 0 3 19 1 3 40 10
We are/ø a/the/…ing 0 0 0 4 11 1 48 67 2 52 78 3
A summary of the results shown in the two tables is provided in Figure 3, which compares the relative frequency of absent copula BE in hip-hop and in non-hip-hop lyrics.
As can be seen, the data show a very pronounced preference for copula absence in the hip-hop corpus compared to the non-hip-hop corpus. The largest proportion of copula absence in non-hip-hop songs is found with the personal pronoun we. A closer look at the data shows that this exception can largely be explained by the African-American R&B artist R. Kelly: 17 of the relevant tokens occur in a single song, namely Ignition. If we ignore this song, the relative frequency of copula absence with we in non-hip-hop drops to 30 %. All in all, these results suggest that copula absence is indicative of hip-hop. Future research will have to show to what extent this particular feature is also pervasive in other possible sub-registers of pop songs, such as R&B.
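The proportions plotted in Figure 3 are the absent tokens divided by all tokens of the pattern (absent + contracted + full). As a worked check, the sketch below applies this to the row for we in Table 7. It rests on two assumptions: that the last of the four column groups holds the row totals (for this row it equals the sum of the other three groups: 0 + 4 + 48 = 52, 0 + 11 + 67 = 78, 0 + 1 + 2 = 3), and that the 17 tokens from Ignition are all instances of copula absence. Under these assumptions, the figure of 30 % quoted above is reproduced.

```python
def absence_rate(absent, contracted, full):
    """Share of copula-absent tokens among all tokens of a pattern."""
    total = absent + contracted + full
    return absent / total if total else 0.0

# Row "We" in Table 7 (non-hip-hop), last column group read as totals (assumption).
we_abs, we_contr, we_full = 52, 78, 3
print(f"{100 * absence_rate(we_abs, we_contr, we_full):.1f} %")       # 39.1 %
# Drop the 17 absent-copula tokens assumed to come from R. Kelly's Ignition.
print(f"{100 * absence_rate(we_abs - 17, we_contr, we_full):.1f} %")  # 30.2 %
```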
Figure 3: Copula absence in the hip-hop and the non-hip-hop control corpus (bar chart: relative frequency of absent copula BE, 0–100 %, for I, you, ya, he, she, it, we and they; series absent_hip-hop and absent_other).
-z), lexical features (the frequent use of taboo expressions and profanity often
with a significant change of meaning) and grammatical characteristics (copula
absence) focus on the common language background of the artist and his or her
audience. So, when “niggas talk a lotta Bad Boy shit”, as the late Tupac Shakur
raps, they portray themselves as representatives of ‘the streets’, while at the same
time connecting back to the streets and the people living there.
References
Anthony, Laurence. 2011. AntConc (Version 3.2.4) [Computer Software]. Tokyo, Japan: Waseda
University. http://www.antlab.sci.waseda.ac.jp (accessed May 2014).
Archer, Dawn, Andrew Wilson & Paul Rayson. 2002. Introduction to the USAS category system.
University of Lancaster. http://ucrel.lancs.ac.uk/usas/usasguide.pdf (accessed May 2014).
Beers Fägersten, Kristy. 2006. The discursive construction of identity in an internet hip-hop
community. Revista Alicantina de Estudios Ingleses 19. 23–44.
Beers Fägersten, Kristy. 2008. A corpus approach to discursive construction of a hip-hop
identity. In Annelie Ädel & Randi Reppen (eds.), Corpora and discourse: The challenges of
different settings, 211–240. Amsterdam: John Benjamins.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge
University Press.
DuRant, Robert H., Michael Rich, S. Jean Emans, Ellen S. Rome, Elizabeth Allred & Elizabeth R.
Woods. 1997. Violence and weapon carrying in music videos: A content analysis. Archives
of Pediatrics and Adolescent Medicine 151(5). 443–448.
Forman, Murray & Mark Anthony Neal (eds.). 2004. That’s the joint! The hip-hop studies reader.
New York: Routledge.
Jones, Kenneth. 1997. Are rap videos more violent? Style differences and the prevalence of sex
and violence in the age of MTV. Howard Journal of Communications 8(4). 343–356.
Kövecses, Zoltan. 2002. Metaphor: A practical introduction. Oxford: Oxford University Press.
Kreyer, Rolf. 2012. ‘Love is like a stove – it burns you when it’s hot’: A corpus-linguistic view on
the (non-) creative use of love-related metaphors in pop songs. In Sebastian Hoffmann,
Paul Rayson & Geoffrey Leech (eds.), English corpus linguistics: Looking back, moving
forward, 103–115. Amsterdam: Rodopi.
Kreyer, Rolf. 2015. ‘Funky fresh dressed to impress’: A corpus-linguistic view on gender roles in
pop songs. International Journal of Corpus Linguistics 20(2). 174–204.
Kreyer, Rolf & Joybrato Mukherjee. 2007. The style of pop song lyrics: A corpus-linguistic pilot
study. Anglia 125. 31–58.
Lakoff, George & Mark Johnson. 1980. Metaphors we live by. Chicago: University of Chicago Press.
Lazin, Lauren. 2003. Tupac: Resurrection. Paramount.
Miethaner, Ulrich. 2001. The BLUR (Blues Lyrics Collected at the University of Regensburg)
corpus: Blues lyricism and the African American literary tradition. Current Objectives of
Postgraduate Studies 2. http://copas.uni-regensburg.de/article/view/64/78 (accessed 3
January 2015).
108 Rolf Kreyer
Miethaner, Ulrich. 2005. I can look through Muddy: Analyzing earlier African American English
in blues lyrics (BLUR). Frankfurt am Main: Peter Lang.
Morgan, Marcyliena. 2001. ‘Nuthin’ but a G thang’: Grammar and language ideology in hip-hop
identity. In Sonja L. Lanehart (ed.), Sociocultural and historical contexts of African
American Vernacular English, 187–210. Athens: University of Georgia Press.
Morgan, Marcyliena. 2002. Language, discourse and power in African American culture.
Cambridge: Cambridge University Press.
Mukherjee, Joybrato. 2000. ‘Krisis at Kamp Krusty’: Deviant spellings in popular culture as
examples of medium-dependent graphic presentation structures. Arbeiten aus Anglistik
und Amerikanistik 25. 161–172.
Murphey, Tim. 1989. The where, when and who of pop song lyrics: The listener’s prerogative.
Popular Music 8. 58–70.
Murphey, Tim. 1990. Music and song in language learning: An analysis of pop song lyrics and
the use of music and song in teaching English to speakers of other languages. Bern: Lang.
Murphey, Tim. 1992. The discourse of pop songs. TESOL Quarterly 26. 770–774.
Olivo, Warren. 2001. Phat lines: Spelling conventions in rap music. Written Language and
Literacy 4. 67–85.
Rayson, Paul. 2003. Matrix: A statistical method and software tool for linguistic analysis
through corpus comparison. Lancaster University: Ph.D. thesis.
Rayson, Paul. 2009. Wmatrix: A web-based corpus processing environment. Computing
Department, Lancaster University. http://ucrel.lancs.ac.uk/wmatrix/ (accessed May 2014).
Schneider, Edgar W. & Ulrich Miethaner. 2006. When I started to using BLUR. Accounting for
unusual verb phrase patterns in an electronic corpus of Earlier African American English.
Journal of English Linguistics 34. 233–256.
Schwartz, Kelly D. & Gregory T. Fouts. 2003. Music preferences, personality style, and
developmental issues of adolescents. Journal of Youth and Adolescence 32. 205–213.
Seidman, Steven A. 1992. An investigation of sex-role stereotyping in music videos. Journal of
Broadcasting and Electronic Media 36(2). 209–216.
Smith, Stacy L. & Aaron R. Boyson. 2002. Violence in music videos: Examining the prevalence
and context of physical aggression. Journal of Communication 52(1). 61–83.
Smitherman, Geneva. 2000. Black talk: Words and phrases from the hood to the Amen corner.
Boston: Houghton Mifflin Company.
Spady, James G., Charles G. Lee & H. Samy Alim. 1999. Street conscious rap. Philadelphia:
Unum Loh Publishers.
Werner, Valentin. 2012. Love is all around: A corpus-based study of pop lyrics. Corpora 7(1).
19–50.
‘Now niggas talk a lotta Bad Boy shit’ 109
3 Doors Down – Away From The Sun | Brad Paisley – This Is Country Music
Aaliyah – I care 4 U | Adele – 19
Alan Jackson – Greatest Hits II … | Adele – 21
Audioslave – Audioslave | Beyoncé – 4
Avril Lavigne – Let Go | Bon Jovi – Greatest Hits
Beyoncé – Dangerously In Love | Britney Spears – Femme Fatale
Celine Dion – One Heart | Bruno Mars – Doo-Wops And Hooligans
Cher – The Very Best Of Cher | Chris Brown – F.A.M.E.
Christina Aguilera – Stripped | Coldplay – Mylo Xyloto
Coldplay – A Rush Of Blood To The Head | Florence and the Machine – Lungs
Dixie Chicks – Home | Foo Fighters – Wasting Light
Elvis Presley – 30 #1 Hits | Glee – The Music; Season 2
Evanescence – Fallen | Glee – The Music, The Christmas …
Faith Hill – Cry | Jackie Evancho – Dream With Me
Good Charlotte – The Young And … | Jackie Evancho – O Holy Night
Hilary Duff – Metamorphosis | Jason Aldean – My Kinda Party
Jennifer Lopez – This Is Me … Then | Josh Groban – Illuminations
John Mayer – Room For Squares | Justin Bieber – My World 2.0
Justin Timberlake – Justified | Justin Bieber – My World’s Acoustic
Kelly Clarkson – Thankful | Justin Bieber – Never Say Never …
Kenny Chesney – No Shoes, … | Katy Perry – Teenage Dream
Kid Rock – Cocky | Keith Urban – Get Closer
Linkin Park – Meteora | Kenny Chesney – Hemingway’s Whiskey
Luther Vandross – Dance With My Father | Kid Rock – Born Free
Matchbox Twenty – More Than You … | Lady Antebellum – Need You Now
Metallica – St. Anger | Lady Antebellum – Own the Night
R. Kelly – Chocolate Factory | Lady Gaga – Born This Way
Rascal Flatts – Melt | Mumford and Sons – Speak Now
Rod Stewart – It Had To Be You … | P!nk – Greatest Hits … So Far!!!
Santana – Shaman | R. Kelly – Loveletter
Shania Twain – Up! | Rascal Flatts – Nothing Like This
Tim McGraw – Tim McGraw And … | Rihanna – Loud
Toby Keith – Unleashed | Sugarland – The Incredible Machine
| Susan Boyle – The Gift
| Taylor Swift – Speak Now
| The Band Perry – The Band Perry
| The Black Keys – Brothers
| Tony Bennett – Duets 2
| Zac Brown Band – You Get What You Give
Teresa Pham
Teresa Pham
The register of English crossword puzzles:
Studies in intertextuality
Abstract: Despite their popularity, crossword puzzles have so far been neglected
in text-linguistic publications. Therefore, this paper provides a detailed analysis
of crosswords. As a textual variety related to a specific situation, fulfilling specific
functions and displaying pervasive, frequent linguistic and formal features, this
type of linguistic riddle must be regarded as an independent register according to
the framework by Biber and Conrad (2009). Moreover, a detailed linguistic analysis
establishes non-cryptic and cryptic crosswords as two distinct sub-registers.
For the purpose of exploring the role of intertextuality in those two sub-registers,
a corpus of 270 intertextual non-cryptic and cryptic clue-answer pairs from The
Sun (N.N. 2009) and The Times (Browne 2009) was compiled. A quantitative analysis
of this corpus reveals that intertextual references in cryptic puzzles primarily
target classical mythology, Shakespeare and the Bible, whereas non-cryptic
puzzles additionally require knowledge of Anglo-American popular culture. The
qualitative analysis of the corpus discusses the particular forms and functions of
intertextuality in non-cryptic and cryptic puzzles (Stocker 1998), providing also
an explanation for their use from a cognitive linguistic perspective (Geeraerts &
Cuyckens 2007) as well as a comparison with intertextuality in other registers.
The paper shows that intertextual references and their particular forms and functions
may be distinctive features of certain registers. Since intertextuality is context-dependent
and serves particular communicative functions, it should be incorporated as one
possible feature into the linguistic analysis of registers according to the
framework by Biber and Conrad (2009).
1 Introduction
Crossword puzzles (or simply crosswords) are the most popular type of linguistic
puzzle today (cf. Augarde 2003: 57) and hold a permanent place in most British
and American newspapers. Given this prominence in regular, if not everyday,
language use, their marginalisation as a register in text-linguistic analysis and
the resulting scarcity of relevant linguistic publications are surprising. Most pub-
2.1 Situational analysis
and current affairs, familiar with and perhaps active in what have been classed as
middle-class sports”. However, since certain strategies of codification, chunks of
knowledge and even clues are recurrent, crossword experience is also a major pre-
dictor of crossword proficiency (cf. Hambrick, Salthouse, and Meinz 1999: 140).
From the cognitive linguistic perspective, this correlation, like the phenomenon
of agenda-setting (cf. Scheufele and Tewksbury 2007), can be explained by the fact
that frequent activation makes cognitive representations more easily retrievable.
Accordingly, other authors (e.g. Scott and O’Donnell 1998: 237) claim that the
knowledge and skills necessary for crosswords can be acquired by anyone and
consequently regard crosswords as democratic.
II. Production circumstances and channel: With their close interde-
pendence between clues and answers, crosswords result from a careful and
time-consuming process of planning and editing. The reception process may
be equally time-consuming and non-linear. Therefore, the written mode is one
essential characteristic of crosswords – even in the digital age, where puzzles can
be downloaded from websites or generated by computer programmes or applica-
tions on mobile devices.
A further marker of crosswords, and one that equally depends on their written
form, is their physical layout on the page. Answers must be inserted, letter by
letter, into a grid, available either on paper or digitally, that consists of white
squares (generally called lights; cf. Scott and O’Donnell 1998: 219) and black
squares (blocks; cf. Moorey 2008: 5). The corresponding numbering of clues and
squares indicates into which light the first grapheme of the respective answer has
to be inserted. Subsequent letters of the answer are inserted either horizontally or
vertically into the grid, depending on whether the clue is labelled Across or Down.
Answers are interdependent because they intersect in so-called crosslights or
checked letters (cf. Biddlecombe 2009). Consequently, each correct answer
simplifies the search for intersecting answers to a greater or lesser extent (cf.
Nickerson 2011; Goldblum and Frost 1987). The number of letters which are part
of only one answer (unchecked letters or unches) is an indicator of the difficulty
of a crossword (cf. Augarde 2003: 63; Scott and O’Donnell 1998: 219).
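The distinction between checked and unchecked letters can be made concrete with a small sketch. The following Python code is an illustration under stated assumptions, not a description of how any newspaper or setter works: it uses an invented 5×5 grid in which `#` marks a black square (block) and `.` a white square (light), treats a light as checked if it belongs to both an Across and a Down answer of at least two letters, and counts all remaining lights as unches. The grid and the function names are hypothetical.

```python
# Hypothetical 5x5 grid: "#" = black square (block), "." = white square (light).
GRID = [
    ".....",
    ".#.#.",
    ".....",
    ".#.#.",
    ".....",
]


def in_answer(grid, r, c, dr, dc):
    """True if light (r, c) lies in a run of at least two lights in direction (dr, dc)."""
    rows, cols = len(grid), len(grid[0])

    def is_light(rr, cc):
        return 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] != "#"

    # A run of >= 2 lights exists iff a neighbour before or after (r, c) is also a light.
    return is_light(r - dr, c - dc) or is_light(r + dr, c + dc)


def count_checked(grid):
    """Return (checked, unchecked) counts of the lights in the grid."""
    checked = unchecked = 0
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == "#":
                continue
            across = in_answer(grid, r, c, 0, 1)
            down = in_answer(grid, r, c, 1, 0)
            if across and down:
                checked += 1  # crosslight: shared by an Across and a Down answer
            else:
                unchecked += 1  # unch: part of only one answer
    return checked, unchecked


print(count_checked(GRID))  # (9, 12)
```

For this invented grid, 12 of the 21 lights are unches; on the view cited above, such a high proportion of unchecked letters would make the puzzle comparatively difficult, since fewer letters of each answer can be confirmed by intersecting answers.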
III. Setting: Setters and solvers do not share the physical context of com-
munication. As already mentioned, crosswords (as well as their solutions) are
usually first printed in newspapers, i.e. in a public space, but are typically
solved in private. Heinemann (2000: 610–611) therefore assigns them to the
(semi-)official, public domain.
IV. Purposes: Crosswords lack the usual purpose of language use, namely
communication (cf. Schlepper 1981: 63). Instead, the primary purpose of
crosswords is to entertain and delight the addressee: they allow
501). Thus, in example (1) the clue requires prior identification of the answer to
clue number 11:
(1) Line also transported 11 to shore (9) – LANDWARDS (Browne 2009: 124)
Apart from such cross-references, the only links between clues are usually their
joint appearance in one uniform layout and the combinatory interdependence of
the respective answers in the grid. If, following Halliday and Hasan (1976: 1), a text
is defined as “a unit of language in use” whose texture arises from inter-sentential
cohesive ties on the surface, crosswords do not normally constitute texts.
Besides cohesion, further standards of textuality according to de Beaugrande
and Dressler (1981) are met only partially or not at all: clues are thematically
independent and there is no continuity of, or even connection between, underlying
concepts (coherence). Furthermore, even if clues need to be new and creative to
be intellectually challenging for solvers, crosswords do not have the function of
transmitting information (informativity). However, the setter’s primary intention
of entertaining solvers is evident (intentionality) and, although most clues
would be unacceptable and irrelevant in ordinary communicative situations,
crossword initiates accept these linguistic inconsistencies as part of this type of
puzzle (acceptability, situationality). Thus, if we define a text as a passage of
language which “functions as a unity with respect to its environment” (Halliday and
Hasan 1976: 1) and consider cohesion, informativity (cf. Schubert 2012: 23) and
also coherence as frequent but non-obligatory features of texts, then crosswords
must certainly be regarded as texts.
pers (e.g. Le Figaro, Le Nouvel Observateur; cf. Mok 1987: 98). Since the 1970s,
Die Zeit, a weekly national German quality paper, has been publishing a type of
crossword puzzle which combines cryptic and straightforward clues (Um die Ecke
gedacht, literally ‘thought around the corner’). However, the British cryptic
crossword remains unique: “Although traces of the cryptic crossword can be found in
some European countries, it is nowhere developed to anything like the extent it
has now reached in the UK […]. German-language puzzles are those which come
closest to the British model […]. By and large, however, these are all relatively
modest by British standards” (Scott and O’Donnell 1998: 211–213).
A quantitative analysis performed on 20 puzzles (523 clue-answer pairs) from
The Times (cryptic puzzles; Browne 2009), The Guardian (non-cryptic puzzles;
Rusbridger 11.–16.05.2013) and The Sun (two-speed crosswords giving a non-cryp-
tic and a cryptic clue for each answer; N.N. 2009) confirms the basic distinction
between the two types of puzzle:
Despite variability within each type, clues and answers are considerably shorter
in non-cryptic than in cryptic puzzles. Furthermore, both turns are morpho-
syntactically more complex in the latter type. Non-cryptic clues are usually very
simple phrases, often consisting of a head only as in (2), sometimes in combina-
tion with a simple pre- or postmodifier (3), whereas the corresponding answers
are mostly single content words or proper names:
Cryptic clues, by contrast, resemble headlines in block language. When they are
constituted by phrases, these are typically more complex, containing for example
longer prepositional phrases or (finite or nonfinite) clauses as postmodifiers (4).
Cryptic clues may also have an often elliptical clause structure, taking the form
of simple or complex, mainly declarative sentences (cf. Quirk et al. 1985: 40, 803)
as in (5). In addition to single content words and proper names, the answers to
cryptic clues often comprise morphologically complex lexemes (e.g. idioms as
in (5), compound nouns or multi-word verbs) as well as function words (6) or
phrases (4).
(4) Bloomer made by top performer in nativity scene? (4,2,9) – STAR OF BETHLEHEM
(5) Find a lovely partner to share a seasonal moment (4,1,7) – PULL A CRACKER
(6) Jarring we hear’s in contrast (7) – WHEREAS (Browne 2009: 122, 110, 124)
Furthermore, the relationship between the turns of the same non-cryptic adja-
cency pair is overtly governed by the “Rule of Inflection” and the “Rule of Iden-
tity” (Schlepper 1981: 67). The former prescribes that clue and answer must “be
able to fulfil the same syntactic function” (Schlepper 1981: 67). Therefore, they
usually have the same inflection (7) and/or belong to the same formal syntactic
category. However, a prepositional phrase may also point to an adverb, or a
nonfinite clause to an adjective (8).
The latter rule dictates that clue and answer have to be semantically equivalent,
allowing (absolute or near) synonymy (9), negated antonymy (10), hyponymy (11)
as well as paraphrases and definitions of variable precision (12).
Therefore, according to Greimas (1970: 287), crosswords work like a reverse dic-
tionary, where only the definitions are given and the appropriate lemmata have
to be provided by the solver. Yet, to complicate matters, solving a non-cryptic clue
may require considering polysemy, homonymy and proper names. In addition,
the relationship between clues and answers may also be syntagmatic, being
based on phraseological units such as idioms or collocations.
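Greimas’s comparison with a reverse dictionary, and the ambiguity that remains when one definition fits several lemmata, can be illustrated with a toy lookup. The sketch below is a hypothetical illustration, not a model of any actual solving aid: the one-entry table mapping a definition to candidate answers is invented, and the code simply filters the candidates by the letter count given in brackets and, optionally, by crosslights already entered in the grid, written as a pattern in which `?` marks an empty square.

```python
import re

# Hypothetical reverse dictionary: one definition maps to several possible answers,
# mirroring the ambiguity a solver faces without further constraints.
REVERSE_DICT = {
    "marine beast": ["WHALE", "SEAL", "SHARK", "DOLPHIN"],
}


def candidates(definition, length, pattern=None):
    """Return answers for `definition` with exactly `length` letters.

    `pattern` optionally encodes crosslights already in the grid, e.g. "S????",
    where "?" marks a square that is still empty.
    """
    words = REVERSE_DICT.get(definition.lower(), [])
    fits = [w for w in words if len(w) == length]
    if pattern is not None:
        regex = re.compile(pattern.upper().replace("?", "."))
        fits = [w for w in fits if regex.fullmatch(w)]
    return fits


print(candidates("Marine beast", 5))           # ['WHALE', 'SHARK']
print(candidates("Marine beast", 5, "S????"))  # ['SHARK']
```

The letter count alone leaves two candidates here; only a crosslight from an intersecting answer resolves the ambiguity, which is the interplay between clue and grid described above.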
The aforementioned rules apply less overtly to cryptic crosswords. The
reason for this opacity is that cryptic clues have a binary structure. It is only the
definition (underlined in the following examples of cryptic clues) that is syn-
tactically and semantically equivalent to the answer. The subsidiary indication,
however, encodes the same answer a second time semantically, phonologically
(13) Huge mines exploded around me (7) – IMMENSE (Browne 2009: 28)
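Example (13) can be verified mechanically. In one plausible parse, offered here only as an illustration, “Huge” is the definition, “exploded” signals an anagram of MINES, and “around me” places that anagram around ME. The helper names in the sketch below are hypothetical; the code checks that the rearranged letters really are an anagram of MINES and then performs the container operation.

```python
from collections import Counter


def letters(text):
    """Multiset of the letters in `text`, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in text.upper() if ch.isalpha())


def is_anagram(fodder, candidate):
    """True if `candidate` uses exactly the letters of `fodder`."""
    return letters(fodder) == letters(candidate)


def insert(outer, inner, pos):
    """Container operation: place `inner` inside `outer` after its first `pos` letters."""
    return outer[:pos] + inner + outer[pos:]


# Example (13): "mines exploded" -> an anagram of MINES; "around me" -> wrap it around ME.
anagram_of_mines = "IMNSE"
print(is_anagram("mines", anagram_of_mines))  # True
print(insert(anagram_of_mines, "ME", 2))      # IMMENSE
```

Once the solver has recognised these two operations, the answer follows unambiguously from the subsidiary indication, independently of the definition “Huge”.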
Only two clue types deviate from this basic structure: In so-called all-in-one or
& lit clues (‘and literally true clues’; cf. Moorey 2008: 22), which are sometimes
marked by exclamation marks, the definition and the subsidiary indication are
merged (14). Cryptic definition clues (cf. Moorey 2008: 27), by contrast, consist of
a misleading definition or paraphrase of the answer (15). They frequently rely on
homonymy or a morphological reinterpretation of lexemes or idiomatic expres-
sions and may be marked by question marks. Non-cryptic clues were banned
when the rules for cryptic puzzles were reformulated by setters in the 1930s and
1940s (cf. Scott and O’Donnell 1998: 236).
A cryptic clue thus offers two approaches to the answer and points to it unam-
biguously, if interpreted correctly. Some crossword initiates therefore insist that
cryptic crosswords are easier to solve than non-cryptic ones (cf. Skinner 2008: 7;
Schlepper 1981: 75). However, a solver may encounter several difficulties in
interpreting cryptic clues. First, the definition and the subsidiary indication are
integrated into a stretch of language which seemingly permits literal interpretation.
Yet the sole purpose of the surface structure of the clue is to mislead the solver. Its
meaning, however, is exhausted once the clue has been solved. Therefore, clues
have to be regarded as a succession of fragments which correspond to neither
morpho-syntactic nor orthographic units, since word boundaries may be shifted
and punctuation marks overruled: “A cryptic clue is a sentence or phrase, appear-
ing to make some kind of sense and putting ideas into the solver’s head. These
often have little or nothing to do with the answer, which can be derived by inter-
preting all or part of the clue in ways which are less obvious” (Biddlecombe 2009).
Second, the definition and the subsidiary indication are unmarked, may
occur in variable order and may even overlap. There may also be words or phrases
which are superfluous for solving the clue (cf. Schlepper 1981: 66), added solely
to enhance the coherence of the surface structure. Third, even when the
definition has been identified, it may be a zero-derivation, polyseme or homonym
and thus, due to the absence of any context, syntactically and/or semantically
ambiguous. Fourth, the subsidiary indication may contain several operations of
codification not necessarily indicated by signal words (for lists of such indicators
cf. Stephenson 2007: 35–63; indicators will be underlined by a broken line in the
following examples of cryptic clues).
Cryptic clues whose subsidiary indication encodes the answer semantically,
so-called double or multiple definition clues (for the names of clue types used here
cf. Moorey 2008: 13–31; Biddlecombe 2009), contain a second definition. They
are usually based on polysemy, homonymy, homography or the metaphorical or
literal reinterpretation of one or several lexemes in the clue and/or the answer
(16).
(16) Poorly educated and characterless? (10) – UNLETTERED (Moorey 2008: 154)
By contrast, homophone clues encode the answer phonologically and are based
on the phonological similarity (homeophony) or identity (homophony) of lexemes
such as whale and wail in (17).
(17) Marine beast’s audible cry (4) – WAIL (Stephenson 2007: 55)
(21) Statement: Last month, a cat swallowed a rat (11) – DECLARATION (Biddlecombe 2009)
(22) Confine the heartless king in a public house (6) – INTERN (Gilbert 2001: 64)
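Examples (21) and (22) can be reconstructed in the same step-by-step way. The parses below are one plausible reading under common cryptic conventions, assumed here rather than taken from the sources: DEC for “last month”, LION for “a cat”, R for “king”, INN for “public house”, and “heartless” as removal of the central letter. The helper functions are hypothetical illustrations.

```python
def insert(outer, inner, pos):
    """Container operation: place `inner` inside `outer` after its first `pos` letters."""
    return outer[:pos] + inner + outer[pos:]


def heartless(word):
    """Remove the central letter (odd length) or two central letters (even length)."""
    mid = len(word) // 2
    if len(word) % 2:
        return word[:mid] + word[mid + 1:]
    return word[:mid - 1] + word[mid + 1:]


# (21) Statement: DEC ("last month") + LION ("a cat") swallowing A RAT
declaration = "DEC" + insert("LION", "A" + "RAT", 1)

# (22) Confine: TE ("the heartless") + R ("king") placed in INN ("a public house")
intern = insert("INN", heartless("THE") + "R", 2)

print(declaration, intern)  # DECLARATION INTERN
```

In both clues the surface reading (a cat eating a rat, a king locked in a pub) has nothing to do with the answer; only the segmentation into definition and codified fragments yields it.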
tion and possible indicators are not marked, the surface structure of cryptic clues
permits multiple interpretations, semantically as well as morpho-syntactically.
As with non-cryptic puzzles, the difficulty of cryptic puzzles increases when rare
lexemes or specialised or esoteric knowledge are targeted. In contrast to
non-cryptic clues, however, well-constructed cryptic clues can be answered
unambiguously, even without resorting to crosslights in the grid, once the
structure underlying the clue has been recognised and the operations of
codification have been identified.
The previous analysis showed that crosswords are associated with a particu-
lar situation and particular purposes, which are reflected in pervasive formal as
well as linguistic features. Consequently, crosswords must clearly be regarded
as an independent register according to Biber and Conrad’s definition (2009:
31; see also Schubert’s introduction to this volume). Furthermore, the detailed
semantic and morpho-syntactic analysis of crosswords revealed that non-cryptic
and cryptic puzzles, despite their being based on the same linguistic building
blocks, have developed different strategies for fulfilling their primary purpose
as entertainment. They codify answers to a different extent and therefore require
different skills on the part of the solver. Since non-cryptic and cryptic puzzles
thus differ linguistically, the two types of crossword must be regarded as
distinct sub-registers of the register of crosswords.
tion. So far, however, no study has examined how intertextuality contributes to
the characteristics and purposes of crosswords and to what extent the analysis of
intertextual references can contribute to establishing crosswords as a register or
non-cryptic and cryptic puzzles as distinct sub-registers.
3.1 Working definitions
The term intertextuality was coined in the late 1960s by the Bulgarian linguist
and literary critic Julia Kristeva (1968). Yet, although intertextual references occur
particularly frequently in texts from the 20th and 21st centuries, intertextuality is
by no means an exclusively modern or postmodern phenomenon. On the con-
trary, references to previous texts or utterances may be regarded as an intrinsic
property of human language. Consequently, the study of intertextual references,
especially in the fields of rhetoric and literary theory, can be traced back to classi-
cal antiquity, albeit under different labels such as parody, quotation or imitation.
Today, there are two principal tendencies in research on intertextuality. The
theory of intertextuality is historically rooted in post-structuralist literary criti-
cism, which deconstructs the traditional concept of text. Post-structuralists like
Kristeva, Barthes and Derrida furthermore regard intertextuality as a character-
istic of all texts and consequently contest the autonomy of any text. Thus, inter-
textuality does not refer back to individual, identifiable pre-texts, but to a “texte
infini [infinite text]” (Barthes 1973: 59) or a “texte général [general text]” (Derrida
1972: 125), which is extended to comprise even the ‘social’, ‘cultural’ or ‘historical
text’ (cf. Barthes [1968] 1977: 146). However, this ontological conception of
intertextuality has never yielded a feasible method for textual analysis.
Consequently, for actual textual analysis as in the present paper, scholars
revert to the second, narrower conception of intertextuality. It regards intertex-
tual references as a gradable feature of some, yet not all texts, examines the forms
and functions of such references and, being related to structuralism, accepts
the traditional concept of text. For structuralists like Genette (1982) or Riffaterre
(1981), intertextuality theoretically refers back to isolated, identifiable pre-texts
(or groups of pre-texts). It is this narrow conception of intertextuality that
was adopted by linguistics in the 1980s. Linguists usually distinguish between
typological intertextuality, i.e. the relationships between post-texts and groups of
texts (registers, genres, styles or textual patterns), and referential intertextuality,
i.e. the relationships between post-texts and individual, identifiable pre-texts.
The previous section showed that crosswords should be regarded as an inde-
pendent register comprising two sub-registers. Typical examples of crossword
puzzles thus follow certain conventions and are necessarily characterised by
3.2 Methodology
(24) Consumed. “But answer came there none / And this was scarcely odd because /
They’d ____ every one” (Carroll’s Through the Looking-Glass) (5) – EATEN (Gilbert 2001: 12)
Thus, for devising such clues, the setters relied on their knowledge of those pre-
texts. In order to identify the answers, solvers had to be able to access similar
knowledge of the pre-texts by activating (or constructing) appropriate cognitive
representations (cf. Geeraerts and Cuyckens 2007: 170–187). In 1995, however,
quotation clues like (24) were forbidden because they were not strictly cryptic and
because some puzzles had devoted too much attention to literary background
knowledge (cf. Biddlecombe 2009). By contrast, quotation clues like (25) are still
to be found in non-cryptic puzzles.
This suggests that today references to works of literature or popular culture are
considerably more frequent in non-cryptic than in cryptic puzzles and that less
knowledge of existing texts is required to solve the latter. Hence, one further aim
of the empirical study was to investigate this assumption comparatively by exam-
ining intertextual references in the two sub-registers of crosswords as to their fre-
quency, pre-texts, forms and functions.
For the corpus, two collections of crosswords were analysed, both published
in 2009, i.e. well after the abolition of quotation clues in cryptic puzzles. In total,
80 non-cryptic puzzles (2080 clue-answer pairs) from The Sun (N.N. 2009) and
80 cryptic puzzles (2372 clue-answer pairs) from The Times (Browne 2009) were
scrutinised for intertextual references according to the above definitions. When
several references occurred in one clue-answer pair or when references pointed
to several pre-texts, those were counted separately. This yielded a corpus of 270
intertextual clue-answer pairs (The Sun: 112; The Times: 158) and 295 intertextual
references (The Sun: 112; The Times: 183; 38.0 % vs. 62.0 %), which were manually
classified into five categories according to their respective pre-text(s).
Category (1) comprises references to folkloristic and mythological texts, orig-
inally transmitted orally. Clue-answer pairs requiring knowledge of literary texts
produced by individual authors according to aesthetic standards are summarised
in category (2). References to the visual arts are subsumed under category (3) and
subdivided into (a) painting/drawing/sculpture and (b) broadcasting/TV series/
film. For references to music, category (4) was created with the subcategories (a)
classical music (both orchestral and vocal) and (b) popular music. Remaining ref-
erences to religious, philosophical or other theoretical texts constitute category
(5). In some cases, the distinction between these (sub-)categories is not clear-
cut. Thus, further criteria were introduced. For example, popular music, in con-
trast to classical music, was regarded as being typically commercially oriented,
addressed to large audiences and distributed by the music industry.
In addition, each group was analysed according to the provenances of the
pre-texts or their authors. Thus, texts from Greek and Roman antiquity are classi-
fied as Classical, British is the label for pre-texts from the UK and the Republic of
Ireland, American denotes pre-texts from the USA, etc. Provenances relevant for
fewer than four intertextual references per category were subsumed under Other.
Due to their importance for intertextuality, Shakespeare and the Bible are listed
separately (cf. Table 2).
The first conclusion we can draw from the quantitative analysis of the corpus is
that, on the whole, and contrary to the previous assumption, intertextual refer-
ences are relatively more frequent in the cryptic puzzles published in The Times
than in the non-cryptic ones from The Sun. While crosswords in The Sun contain
1.4 intertextual references on average, puzzles in The Times contain 2.3 intertex-
tual references. Even if references to different pre-texts occurring in the same
clue-answer pair as in example (32) are not counted separately, this distributional
difference remains obvious (1.4 vs. 2.0 intertextual clue-answer pairs/puzzle). A
comparison with the frequency of intertextual references in non-cryptic puzzles
from another quality paper, The Guardian (Rusbridger 11.–16.05.2013; 0.8 references
or intertextual clue-answer pairs/puzzle), suggests that this difference depends on
the type of crossword rather than on the journalistic standards or the target
readership of the respective newspapers. Consequently, despite there
being considerable variability in the frequency of intertextuality within the same
sub-register, cryptic puzzles generally require more knowledge of other texts than
non-cryptic puzzles. The qualitative analysis of the corpus will shed light on how
intertextual references are incorporated into cryptic puzzles, despite quotation
clues having been banned.
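The per-puzzle averages above follow directly from the corpus figures given in Section 3.2 (80 puzzles per newspaper; 112 and 183 references; 112 and 158 intertextual clue-answer pairs). The short Python sketch below only reproduces this arithmetic; the function and variable names are illustrative.

```python
PUZZLES_PER_PAPER = 80  # 80 puzzles from each newspaper, as stated in Section 3.2


def per_puzzle(count, puzzles=PUZZLES_PER_PAPER):
    """Average number of items per puzzle, rounded to one decimal place."""
    return round(count / puzzles, 1)


references = {"The Sun": 112, "The Times": 183}
clue_answer_pairs = {"The Sun": 112, "The Times": 158}

for paper in references:
    print(paper, per_puzzle(references[paper]), per_puzzle(clue_answer_pairs[paper]))
# The Sun 1.4 1.4
# The Times 2.3 2.0
```

Because each intertextual pair in The Sun carries exactly one reference, its two averages coincide, whereas in The Times multiple references per pair raise the reference average (2.3) above the pair average (2.0).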
(1) Folkloristic and mythological texts: The Sun 14.3 %, The Times 9.3 %, Average 11.2 %
Table 2 (continued)
Note: All values are percentages and are calculated based on the number of intertextual
references in the crosswords from The Sun (112), The Times (183) or both newspapers (295;
labelled Average). Any differences between the percentage sums for (sub-)categories (shaded
cells) and the sums of the corresponding individual percentage values for provenances (white
cells) result from rounding to one decimal place.
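The calculation described in the note can be sketched as follows. This is a reconstruction of the arithmetic, not the author’s procedure: the Python code recomputes the 38.0 % and 62.0 % shares of the 295 references and then asks which whole-number counts are compatible with the rounded percentages of row (1). Those counts are implied by the percentages, not reported in the text; the fact that 16 + 17 = 33 agrees with the Average column is a useful consistency check.

```python
def share(count, base):
    """Percentage of `count` in `base`, rounded to one decimal place."""
    return round(100 * count / base, 1)


def implied_counts(pct, base):
    """All whole counts whose rounded share of `base` equals the reported `pct`."""
    return [k for k in range(base + 1) if share(k, base) == pct]


BASES = {"The Sun": 112, "The Times": 183, "Average": 295}
ROW_1 = {"The Sun": 14.3, "The Times": 9.3, "Average": 11.2}  # category (1)

print(share(112, 295), share(183, 295))  # 38.0 62.0
for column, pct in ROW_1.items():
    print(column, implied_counts(pct, BASES[column]))
# The Sun [16]
# The Times [17]
# Average [33]
```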
prising, since it has often been claimed that, at least since the mid-20th century,
the traditional pre-texts of the Victorian Age have declined in importance in
Anglo-American culture: “until recently Classical mythology, the works of Shake-
speare and the Bible were regular sources for compilers” (Scott and O’Donnell
1998: 207; cf. also Hebel 1991: 149). Consequently, the predominance of these pre-
texts in crosswords may have been even clearer in the first half of the 20th century.
This result supports Partridge’s assumption that typical solvers are thoroughly
and “humanistically educated” (Partridge 1992: 504).
Furthermore, it is equally revealing to compare the favourite pre-texts of the
two sub-registers of crosswords. Thus, clues in non-cryptic crosswords from The
Sun require knowledge of literary works in general (28.6 %), British literature
(excluding Shakespeare; 16.1 %) and Shakespeare (2.7 %) less frequently than
clues in cryptic crosswords from The Times (51.9 %, 31.1 % and 8.2 %). By contrast,
puzzles from The Sun refer to the Bible (14.3 %) and to the oral tradition (14.3 %),
especially to classical mythology (8.0 %), relatively more frequently than puzzles
from The Times (7.1 %, 9.3 % and 4.9 %). The reason for these different preferences,
especially with regard to the traditional pre-texts of the Victorian Age, might be
that a British solver with an average education can be expected to possess more
extensive general knowledge of the Bible and of classical mythology as a whole
than of the 38 plays and 154 sonnets commonly attributed to Shakespeare (cf.
Greenblatt 1997: 65–66, 1923–1976). The most striking distributional differences
between the two sub-registers can, however, be found in categories (3b) and (4b).
Knowledge of (especially Anglo-American) video, broadcasting, TV series, films
and popular music is necessary for the solution of nearly one third of all inter-
textual non-cryptic clues (30.4 %) but is hardly relevant for cryptic puzzles at all
(4.9 %). Cryptic crosswords of the corpus thus primarily target traditional pre-texts
like classical mythology, Shakespeare and the Bible, whereas non-cryptic puzzles
draw on Shakespeare to a lesser extent but on classical mythology and the Bible to a
greater extent, and additionally require knowledge of texts of popular, especially
Anglo-American, culture. However, only a corpus including non-cryptic and
cryptic clue-answer pairs from further (popular and quality) newspapers could
reveal whether these preferences for certain pre-textual categories are correlated
with the respective sub-register of crosswords or with the expected knowledge of the
target solvership (or with both).
complex intension as well as their high selectivity and explicit markedness (cf.
Pfister 1985: 28; Karrer 1985: 106–108), proper nouns contribute to the codifica-
tion of answers as well as the unequivocal solution of clues. Hence, they are well-
suited for intertextual references in crosswords.
In more than two thirds of all intertextual non-cryptic adjacency pairs of the
corpus (67.9 %), proper nouns referring to the same pre-text occur in both the clue
and the answer, usually in combination with common nouns providing further
information on the referent (26). Thus, although these references are unmarked,
proper nouns can usually activate the necessary cognitive representations unequivocally even without the grid.
In about one third of the non-cryptic clue-answer pairs of the corpus, proper
nouns occur either in the answer as in (27) (23.2 %) or, more rarely, in the clue
as in (28) (6.3 %), whereas the other component of the pair gives a semantically
equivalent common noun or noun phrase. Since only one selective proper noun is involved, more pre-textual knowledge is required to associate clue and answer correctly. Furthermore, the solver may encounter a certain ambiguity, which is
resolved only when the number of letters of the answer is considered or crosslights
are already given in the grid:
In cryptic puzzles, by contrast, proper nouns are used with greater variation as
intertextual references. One major difference between the two sub-registers in the
corpus is that intertextual proper nouns may occur in the subsidiary indication
130 Teresa Pham
of cryptic clues, i.e. as an intermediate step in the solution of the clue (32.9 %).
From a cognitive linguistic point of view, especially well-known proper nouns
automatically activate easily accessible pre-textual frames. Whereas the frames
activated by intertextual references in non-cryptic puzzles are directly relevant
for the answers, this is not always the case in cryptic puzzles. Only lexemes in the
definition need to be interpreted literally. Intertextual references in the subsid-
iary indication, however, usually require no pre-textual knowledge at all. They
activate frames which mislead the solver and inhibit finding the answer, espe-
cially when knowledge of a completely different pre-text is required. Thus, in (32)
no knowledge of Lewis or the Lake poets is necessary because the answer, the
name of a different poet, is an anagram of the letters <TV CS Lewis Lake> given in
the subsidiary indication.
(32) TV broadcast with C S Lewis and Lake poet (9-4) – SACKVILLE-WEST (Browne 2009: 52)
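The letter-level operation in (32) can be checked mechanically. The following Python sketch is an illustration added here, not part of the study; the helper names `letters` and `is_anagram` are hypothetical, and the sketch assumes, as is usual in cryptic crosswords, that case, spaces and hyphens are disregarded when the letters of the subsidiary indication are compared with the answer.

```python
from collections import Counter


def letters(text: str) -> Counter:
    """Multiset of the letters in text, ignoring case, spaces and hyphens."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def is_anagram(fodder: str, answer: str) -> bool:
    """True if answer uses exactly the letters of fodder, each equally often."""
    return letters(fodder) == letters(answer)


# Example (32): the letters given in the subsidiary indication vs. the answer
print(is_anagram("TV CS Lewis Lake", "SACKVILLE-WEST"))  # True
```

The check only confirms that the thirteen letters of <TV CS Lewis Lake> are exactly those of SACKVILLE-WEST; it involves no knowledge of Lewis or the Lake poets at all.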
Cryptic clues whose definitions and answers contain intertextual proper nouns
(usually referring to the same pre-text; 15.9 %) resemble the first type of non-
cryptic clue discussed before: an intertextual name in the definition is often
sufficient for an unequivocal solution and only basic pre-textual knowledge is
required. Although the additional subsidiary indication initially complicates the activation of the necessary cognitive representations, once identified it confirms or refutes the supposed answer. In (33) the name of a Shakespearean spirit also results from the insertion of the Roman numeral for one into
an anagram of Lear. Equally, the answer in (34) is not only indicated by the defini-
tion but is also confirmed by the subsidiary indication: for the mythological place
name the graphemes of no and lava, paraphrased by sign of volcanic activity, are
reversed.
When intertextual proper nouns occur in the answer (41.1 %) or, more rarely, in
the definition only (5.1 %) and the corresponding counterpart is constituted by a
semantically equivalent common noun or noun phrase, as with the second type of
non-cryptic clue discussed above, the answer usually cannot be inferred unambiguously from the definition alone. However, in these cryptic clues, the subsidiary indication may resolve the ambiguity. Furthermore, such clues require more detailed knowledge of pre-texts than the previous categories. While the definition in (35) does not unambiguously identify the intertextual answer, the subsidiary indication requires the formation of an anagram of relies on. By contrast, splitting
The register of English crossword puzzles: Studies in intertextuality 131
a couple, i.e. a lady and a man, by S (from succeeded) results in a synonym of the
intertextual eponym Casanova in (36).
Moreover, seven cryptic clues (4.4 %) require knowledge of the exact wording of
pre-textual passages. Thus, although they do not follow the traditional pattern of
quotation clues (featuring e.g. quotation marks and a gap which has to be recov-
ered), they must be classified as quotation clues. Not only is their share larger
than in non-cryptic puzzles, but they also refer to a different category of pre-texts.
While only two clues, (37) and (38), refer to popular culture (an English nursery
rhyme and a musical based on poems by Eliot), the others require knowledge of works by well-known British and international authors: Shakespeare (39), Shelley (42), Carroll (43), Gray (41) and Plutarch (40); (40) and (41) only seemingly refer to Shakespeare.
(37) When Grundy was christened, 48 hours before Chesterton’s man (7) – TUESDAY
(38) Reason for Macavity’s lack of presence (5) – ALIBI
(39) Underworld scam over shelter – it blighted Gloucester’s winter (10) – DISCONTENT
(40) Composer includes girl in second act of Julius Caesar (7) – VIVALDI
(41) Hamlet’s rude ancestor heard warning priest (10) – FOREFATHER
(42) Lovely old piece describing Shelley’s traveller’s land (7) – ANTIQUE
(43) Giving nasty looks? Alice never heard of such a thing! (12) – UGLIFICATION (Browne
2009: 156, 104, 92, 58, 90, 44, 42)
Finally, three cryptic clues (1.9 %) are based on idioms derived from individual
pre-texts. For these, the activation of pre-textual frames may be helpful, yet is by
no means essential. The idiomatic collocation representing the answer in (44) is
derived from Shakespeare’s Antony and Cleopatra (1.5.72). The subsidiary indication instructs the solver to insert lad (‘boy’) into sad (‘blue’) and to add ays (‘votes’).
(44) Boy in blue votes for Green term (5,4) – SALAD DAYS (Browne 2009: 58)
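The container-and-contents instruction in (44) can likewise be spelled out step by step. This is a hedged Python sketch, not the author's method; `normalise` and `insertions` are hypothetical helper names, and the sketch simply tries every interior position, reading "Boy in blue" as LAD placed inside SAD before AYS is appended.

```python
def normalise(text: str) -> str:
    """Lower-case and keep letters only, so spaces and enumerations vanish."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def insertions(container: str, content: str) -> list[str]:
    """All ways of placing content strictly inside container ('A in B')."""
    return [container[:i] + content + container[i:] for i in range(1, len(container))]


# Example (44): LAD ('boy') in SAD ('blue'), then AYS ('votes') appended
candidates = [word + "ays" for word in insertions("sad", "lad")]
print(candidates)                              # ['sladadays', 'saladdays']
print(normalise("SALAD DAYS") in candidates)   # True
```

Only one of the two possible insertion points yields the answer, which illustrates how the subsidiary indication confirms a candidate suggested by the definition (Green term).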
Thus, an advertising slogan like “To smoke or not to smoke” for cigarettes (Mieder
1985: 126) can be interpreted as a statement about the world, expressing that the
consumer has to decide between two alternative actions, or as an intertextual
reference to Shakespeare’s Hamlet, additionally suggesting that the decision is
essential to the consumer. By contrast, a literal, non-intertextual interpretation
of references in non-cryptic clues as well as in the definition of cryptic puzzles
does not lead to the answer, whereas intertextual references in the subsidiary
indication of cryptic clues must be interpreted literally only. In both cases, the
clues’ meaning is exhausted as soon as the answer has been identified. Intertextual references in puzzles thus cannot be regarded as doubly referential.
The analysis of the corpus and the comparison with non-cryptic intertextual
clues from a popular newspaper (The Sun) further identified various types of intertextual
clue-answer pairs in non-cryptic and cryptic puzzles. These types typically estab-
lish intertextual relationships of different intensity and occur more frequently or
even exclusively in one or the other sub-register of crosswords. Cryptic puzzles not only use intertextuality more often to encode the answer; their intertextual clue-answer pairs also tend to require the activation of more comprehensive pre-textual knowledge than those in non-cryptic puzzles. Furthermore, cryptic puzzles require knowledge of a greater variety of pre-texts and also of pre-texts which cannot be regarded as part of popular culture. Finally, well-known pre-texts like Shakespeare’s Hamlet are used to mislead the solver by activating easily accessible frames of knowledge.
4 Conclusion
While crosswords had never been studied in detail from a text linguistic perspec-
tive, the present paper established and analysed crossword puzzles as an inde-
pendent register with non-cryptic and cryptic puzzles as distinct sub-registers.
In addition, neither had referential intertextuality been investigated as a charac-
teristic of crosswords, nor had it been considered as a linguistic feature relevant
for register analysis. Thus, Biber and Conrad only mention references to previous
scientific publications or postings in chatgroups (2009: 68, 289), but no other
types of intertextuality. However, since intertextual clue-answer pairs occur on average more than once in every crossword in the present corpus (1.7 intertextual clue-answer pairs per puzzle), this paper has shown intertextuality to be an important strategy of codification in this type of word game. Furthermore, intertextuality is
used in a manner differing radically from other texts, formally as well as func-
tionally. As a pervasive, frequent and distinctive linguistic feature of crosswords
Bibliography
Augarde, Tony. 2003. The Oxford guide to word games. Oxford: Oxford University Press.
Barthes, Roland. [1968] 1977. The death of the author. In Roland Barthes, Image music text,
142–148. London: Fontana Press.
Barthes, Roland. 1973. Le plaisir du texte. Paris: Editions du Seuil.
Beaugrande, Robert-Alain de & Wolfgang Ulrich Dressler. 1981. Introduction to text linguistics.
London & New York: Longman.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge
University Press.
Biddlecombe, Peter. 2009. Yet another guide to cryptic crosswords. http://www.biddlecombe.demon.co.uk/yagcc/ (accessed 27 January 2015).
Browne, Richard. 2009. The Times crossword book 13. London: Times Books.
Bühler, Karl. [1934] 1982. Sprachtheorie: Die Darstellungsfunktion der Sprache. Stuttgart &
New York: Gustav Fischer Verlag.
Coffey, Steve. 1998. Linguistic aspects of the cryptic crossword. English Today 14(1). 14–18.
Cornell, Alan & Marion Cornell. 1980. Fragen und Antworten im englischen Kreuzworträtsel. In
Ernst Burgschmidt (ed.), Beiträge zu einer Linguistischen Landeskunde und Sprachpraxis,
44–63. Braunschweig: Verlag E. Burgschmidt.
Derrida, Jacques. 1972. Positions: Entretiens avec Henri Ronse, Julia Kristeva, Jean-Louis
Houdebine, Guy Scarpetta. Paris: Les Editions de Minuit.
Dienhart, John M. 1998. A linguistic look at riddles. Journal of Pragmatics 31. 95–125.
Fix, Ulla. 2011. Das Rätsel: Bestand und Wandel einer Textsorte. Oder: Warum sich die
Textlinguistik als Querschnittsdisziplin verstehen kann. In Ulla Fix (ed.), Texte und
Textsorten – sprachliche, kommunikative und kulturelle Phänomene, 185–214. 2nd edn.
Berlin: Frank & Timme.
Furthmann, Katja. 2006. Die Sterne lügen nicht: Eine linguistische Analyse der Textsorte
Pressehoroskop. Göttingen: V&R unipress.
Geeraerts, Dirk & Hubert Cuyckens (eds.). 2007. The Oxford handbook of cognitive linguistics.
Oxford: Oxford University Press.
Genette, Gérard. 1982. Palimpsestes: La littérature au second degré. Paris: Éditions du Seuil.
Gilbert, Val. 2001. The Daily Telegraph: How to crack the cryptic crossword. London: Pan Books.
Goldblum, Naomi & Ram Frost. 1987. The crossword puzzle paradigm: The effectiveness of
different word fragments as cues for the retrieval of words. Haskins laboratories status
report on speech research SR-89/90. 133–146.
Greenblatt, Stephen (ed.). 1997. The Norton Shakespeare. Based on the Oxford Edition. London:
W. W. Norton & Company.
Greimas, Algirdas Julien. 1970. L’écriture cruciverbiste. In Algirdas Julien Greimas (ed.), Du sens:
Essais sémiotiques, 285–307. Paris: Éditions du Seuil.
Halliday, Michael A. K. & Ruqaiya Hasan. 1976. Cohesion in English. London: Longman.
Hambrick, David Z., Timothy A. Salthouse & Elizabeth J. Meinz. 1999. Predictors of crossword
puzzle proficiency and moderators of age–cognition relations. Journal of Experimental
Psychology: General 128(2). 131–164.
Hebel, Udo J. 1991. Towards a descriptive poetics of allusion. In Heinrich F. Plett (ed.),
Intertextuality, 135–164. Berlin & New York: Walter de Gruyter.
Heinemann, Margot. 2000. Textsorten des Alltags. In Klaus Brinker, Gerd Antos, Wolfgang
Heinemann & Sven F. Sager (eds.), Text- und Gesprächslinguistik. Ein internationales
Handbuch zeitgenössischer Forschung, 604–614. Berlin & New York: Walter de Gruyter.
Helbig, Jörg. 1996. Intertextualität und Markierung: Untersuchungen zur Systematik und
Funktion der Signalisierung von Intertextualität. Heidelberg: Universitätsverlag C. Winter.
Karrer, Wolfgang. 1985. Intertextualität als Elementen- und Struktur-Reproduktion. In Ulrich
Broich & Manfred Pfister (eds.), Intertextualität: Formen, Funktionen, anglistische
Fallstudien, 98–116. Tübingen: Niemeyer.
Kristeva, Julia. 1968. Le texte clos. Langages 12. 103–125.
Mieder, Wolfgang. 1985. Sprichwort, Redensart, Zitat: Tradierte Formelsprache in der Moderne.
Bern, Frankfurt am Main & New York: Peter Lang.
Mok, Quirinus Ignatius Maria. 1987. Mots croisés et ambiguïté. In Brigitte Kampers-Manhe &
Co Vet (eds.), Études de linguistique Française offertes à Robert de Dardel par ses amis et
collègues, 97–108. Amsterdam: Éditions Rodopi B. V.
Mollica, Anthony. 2007. Crossword puzzles and second-language teaching. Italica 84(1). 59–78.
Moorey, Tim. 2008. How to master the Times crossword: The Times cryptic crossword
demystified. London: Harper Collins Publishers.
Stratmann, Gerd. 1995. Kreuzworträtsel. In Rüdiger Ahrens, Wolf-Dietrich Bald & Werner Hüllen
(eds.), Handbuch Englisch als Fremdsprache (HEF), 192–195. Berlin: Erich Schmidt.
Underwood, Geoffrey, Caroline Deihim & Viv Batt. 1994. Expert performance in solving word
puzzles: From retrieval cues to crossword clues. Applied Cognitive Psychology 8. 531–548.
Weisskirch, Robert S. 2006. An analysis of instructor-created crossword puzzles for student
review. College Teaching 54(1). 198–201.
Witte, Kenneth L. & Joel S. Freund. 1995. Anagram solution as related to adult age, anagram
difficulty, and experience in solving crossword puzzles. Aging, Neuropsychology, and
Cognition 2(2). 146–155.
Section II:
Cross-register comparison
While the studies in Section I concentrated on single registers, Section II provides
cross-register comparisons, in which the distinctive features and markers of reg-
isters can be identified with great accuracy and perspicuity by means of juxtapo-
sition. As the contributions will show, such comparisons are particularly reveal-
ing when the registers under discussion are from clearly divergent domains. The
fact that each of the three papers in Section II includes academic writing demon-
strates that this register is highly distinctive and therefore well-suited as a yard-
stick for text-linguistic collation.
Christina Sanchez-Stockhammer’s study “Punctuation as an indica-
tion of register: Comics and academic texts” establishes a link to the papers by
Rolf Kreyer and Teresa Pham in Section I, since it also analyses a register from
popular culture, in this case the language of comics. At the same time, this con-
tribution enters uncharted linguistic territory by focusing on punctuation as a
register marker, which has been widely neglected so far despite its pervasive-
ness in written discourse. The study is based on two small-scale corpora, namely
AcadText, a corpus of journal articles, and CoCo, a comic corpus, both of which
were designed and compiled for register comparison by the author. It is shown
that different punctuation marks have varying functions and differing frequencies
in relation to the written or spoken mode prominent in the registers. As a result,
features of punctuation are suggested as a valid and necessary extension of Biber
and Conrad’s (2009) model of register analysis.
In her paper “Linking up register and cognitive perspectives: Parenthetical
constructions in academic prose and experimentalist poetry”, Martina Lampert
chooses a specific linguistic feature as the standard of register comparison. By
concentrating on the syntactic construction of parenthesis, she draws an analogy
between a minimalist poem by E. E. Cummings and a scientific research paper
within the framework of a microscopic qualitative analysis. She picks two regis-
ters which are located at the opposite ends of a continuum of written discourse
and pays attention to punctuation marks as well, in this case to parenthetical
round brackets (so-called lunulae). Since situational features of register descrip-
tion are closely linked to cognitive principles, a correspondence is established
between Biber’s register analysis and Leonard Talmy’s cognitive semantic
approach. Lampert concludes by arguing that parenthesis should be included in
Biber and Conrad’s (2009) list of lexico-grammatical features relevant to register
investigation.
The study “Cohesive devices across registers and varieties: The role of
medium in English” by Stella Neumann and Jennifer Fest combines the compar-
ative analysis of academic writing, administrative writing, broadcast discussions,
conversations and exams with regional variation. The term “regional” is used here and in the paper in the broader Hallidayan sense of variation according to the user, here the speakers’ geographical background, as opposed to functional variation, which varies according to the context of use rather than the user. Based on data from the International Corpus of English,
functional variation is investigated within the six L1 and L2 Englishes of Singa-
pore, Hong Kong, India, Canada, Jamaica and New Zealand. An examination of
the lexico-grammatical features of pronouns, conjunctions and lexical density
sheds new light on the use of cohesive ties across both varieties and registers.
In particular, quantitative surveys show that there are significant differences in
the frequency of the cohesive items between spoken and written registers. Along
these lines, it becomes obvious that an exhaustive discussion of any regional or
national variety of English needs to take into account register variation as well, so
that text linguistics is shown to be an indispensable complement to sociolinguis-
tics. Moreover, this paper builds a bridge to Section III, in which the interrelation
between regional and register variation is further elucidated.
Christina Sanchez-Stockhammer
Punctuation as an indication of register:
Comics and academic texts
Abstract: The currently most established definition of a register is the one devel-
oped by Douglas Biber in numerous publications (e.g. Biber 1988, 1995, 2006),
namely “a variety associated with a particular situation of use” (Biber and Conrad
2009: 6). The delimitation of individual registers such as telephone conversations
or newspaper editorials is based on their situational context, their lexical and
grammatical characteristics and the functional relationship obtaining between
context and language.
While Biber’s multidimensional approach already considers a multitude
of different lexico-grammatical features as potential indicators of register, this
paper adds a new perspective by exploring a feature type which has not been
taken into account so far in the different versions of his model, namely punctu-
ation.
After discussing the functions of various punctuation marks, the paper pre-
sents the corpus-based evidence of a small-scale study on two registers tending
towards the extremes of the spoken – written dimension, namely academic texts
and comics. To this end, the corpus AcadText was compiled for the present study
by analogy to the comic component of the comic corpus CoCo (described in
Sanchez-Stockhammer 2012), which comprises excerpts from Superman, Batman
and Uncle Scrooge and considers the text occurring in text boxes with narration,
inside speech bubbles, as onomatopoeia superimposed on the pictures etc.
The results show that some punctuation marks (such as exclamation marks
and round brackets) correlate strongly with spoken and written style respectively
and barely occur in the contrasting register. Furthermore, even in those cases
where the results are quantitatively similar, differences in usage become obvious
upon closer consideration – e.g. the dominant use of commas after introduc-
tory interjections or proper nouns with vocative function in comics as compared
to more varied uses of that punctuation mark in academic texts. These results
suggest that punctuation is indeed indicative of register and that it makes sense
to introduce punctuation as an additional category in Biber’s register model.
(1) Those who are fond of sleeping late make unreliable workers.
is usually spoken with a pause after late, but it does not contain a comma if
common spelling conventions are adhered to (Meyer 1987: 70). By contrast, the
sentence
is realised with a comma but arguably not produced with a pause in speech
(Meyer 1987: 71). This raises the question of whether the reverse relation, with punctuation as the primary feature and prosody as its realisation in speech, can also be postulated. One of the few cases in which a feature of the written modality is claimed to be rendered in oral communication is that of so-called air quotes, which are drawn into the air manually while speaking and which “intermodally” refer the listeners/receivers to the printed source of a spoken quote (cf. Lampert 2013). However, air quotes rely on visual gestures rather than prosody. By contrast, punctuation marks are produced orally on some occasions, such as when separating whole numbers from decimals, e.g. in
However, in such cases it is actually the terms referring to the punctuation marks
that are realised in the spoken modality rather than the corresponding function of
the punctuation mark. A third conceivable option regarding the relation between
punctuation and prosody is that there is none: for instance, Nunberg (1990: 7)
argues that punctuation has no correspondence in speech and that it exploits
“the particular expressive resources that graphical presentation makes availa-
(5) The increasing evidence that language processing is sensitive to lexical and structural
co-occurrences at different levels of granularity and abstraction has led to the hypothesis
that lexical and structural processing may be unified…???!!!
in actual language usage – at least not in the original context of use.1 Various
explanations can be advanced for this: for instance, the first sentence is very
short and therefore lends itself to the incredulous intonation associated with
such a cluster of punctuation marks far better than the second sentence with its
complex structure. Possibly even more importantly, the second sentence contains
information situating it in the register of written academic language (it has been
adapted from Snider’s 2009 article “Similarity and structural priming”), and it
would seem that the punctuation above is unusual for an academic text, to say the least. This discrepancy between the constructed example above and readers’
expectations suggests that language users tend to expect particular types of punc-
tuation mark and their combination in some types of text rather than in others.
If that is the case, then it should also be possible to use punctuation marks as
an indication or even marker of individual registers – a hypothesis that will be
explored in the remainder of this paper.
Following Peters (2004: 447), the present contribution distinguishes between
word punctuation (comprising e.g. hyphens and apostrophes occurring within
full stop .
question mark ?
exclamation mark !
comma ,
semicolon ;
colon :
dash –
slash /
suspension dots …
single quotation marks ‘’
double quotation marks “”
round brackets ()
square brackets [].
Register, the second concept which needs to be defined for the empirical study presented here, is used with different meanings in the literature (cf. Schubert, this volume). The most commonly used definitions of register are based on the work of Douglas Biber. In numerous publications (e.g. Biber 1988, 1995, 2006), his use of the term has developed from what might be called a synonym of genre (Biber 1995: 9–10 about Biber 1988) to “a variety associated with a particular situation of use” (Biber and Conrad 2009: 6), i.e. a concept comprising all situation-dependent variation in language use, regardless of the level of specialisation (Biber and Conrad 2009: 32), but with specific sub-registers displaying less variation than more general registers (Biber and Conrad 2009: 33). In Biber’s model
(for a summary cf. Schubert, this volume), register features occur throughout texts
from a particular register and are more frequent in the target register than in most
other registers. Thus the passive voice is not restricted to academic writing and
may occur in different types of text, but it is particularly frequent in that register.
Register features can be structures on any linguistic level, from words to syntactic
constructions. The occurrence of specific lexico-grammatical features in regis-
ters is attributed to their functionality (Biber 2006: 11): they are believed to be
“particularly well suited to the purposes and situational context of the register”
(Biber and Conrad 2009: 6). The co-occurrence of features is therefore interpreted
as reflecting their shared functions (Biber 1995: 30). With regard to the features
under consideration, Biber’s approach has evolved in the course of time:
– Both Biber (1988: 73–75) and Biber (1995: 94–96) consider 16 major categories
comprising 67 linguistic features:
1) Vocabulary features
2) Content word classes
3) Function word classes
4) Derived words
5) Verb features
6) Pronoun features
Without going into detail about what these various categories represent precisely, it
becomes immediately obvious that punctuation or other orthographic character-
istics (such as capitalisation) do not figure among the distinctive features treated
in any of Biber’s approaches, in spite of the fact that Biber (1995: 29) maintains
that “[a]ny linguistic feature having a functional or conventional association can
be distributed in a way that distinguishes among registers”.
This raises the question whether there are any arguments supporting the
deliberate exclusion of punctuation as a distinctive feature. Based on Biber’s defi-
nition above, one might consider arguing that punctuation does not constitute a
linguistic feature – but this is hard to maintain: while punctuation is restricted
to the written modality, it is used nonetheless to represent linguistic meaning
(cf. below). Punctuation marks may even reverse the meaning of a sentence com-
pletely; compare
(6) The Democrats say the Republicans are sure to win the next election.
(7) The Democrats, say the Republicans, are sure to win the next election.
In the second example, the Democrats are expected to be victorious (cf. Runkel
and Runkel 1984: 34). In view of its meaning-distinguishing function, punctua-
tion should consequently be considered a linguistic feature.
If punctuation had no conventional or functional association, as required by
the definition of linguistic features above, it should be possible to use all punctu-
ation marks interchangeably. This is, however, not the case (cf. the next section).
Since punctuation is restricted to writing, using it as a feature would seem
to have the disadvantage of disregarding all registers belonging to spoken lan-
guage. This is, however, only true to a certain extent, since spoken texts may be
transcribed (e.g. in interviews for magazines or in corpora), and punctuation is
conventionally inserted for the convenience of the reader in such cases. The rela-
tion between the two dimensions is clarified by Söll and Hausmann (1985: 17),
who distinguish between the medium of realisation (auditory vs. visual code) and the characteristics of conception (spoken vs. written style). Punctuation is thus only present in the visual code but may be used in texts belonging to either the spoken or the written style. Söll and Hausmann’s distinction is thus useful e.g. in view of the possibilities offered by computer-mediated communication, which may combine the visual code with a spoken style. Note also that Biber
and Conrad’s (2009: 78–82) long list of linguistic features includes a subcategory
“Special features of conversation”, which is restricted to a subgroup of registers
with a tendency towards oral realisation and includes e.g. pauses, fillers and
backchannels. As a consequence, the addition of a subcategory “Punctuation”,
which applies to registers in the visual code only, would appear to be legitimate.
Furthermore, one should not overlook the fact that Biber and Conrad
(2009: 63) speak of a “list of features that you might consider” in register anal-
ysis, which means that they do not claim completeness. They also state that
“[C]onsulting a corpus-based reference grammar is useful for deciding which fea-
tures to study”. Since punctuation is only marginally treated in such grammars,
possibly in view of written language’s widely assumed status as a secondary
system (cf. e.g. Bloomfield 1933: 21), this may have led to its omission from the
most influential model of register so far.
To conclude, there are no convincing reasons for excluding punctuation as
a possible register feature. Instead, it is argued in the following that there are
several good reasons for considering it.
of the corresponding registers and should also fulfil that role with regard to punc-
tuation.
: . . .
: ?
: .
: .
: ? [ ]
: ?
: .
The most striking feature is presumably the occurrence of a colon at the begin-
ning of every line, which is followed by either full stops or a question mark,
thereby suggesting an interactive communicative situation. Indeed, the text is
part of a conversation between a group of friends walking to a restaurant, which
is included in the Longman Spoken and Written English Corpus:
Judith: Yeah I just found out that Rebekah is going to the University of Chicago to get
her PhD. I really want to go visit her. Maybe I’ll come out and see her.
Eric: Oh is she?
Judith: Yeah.
Eric: Oh good.
Elias: Here, do you want one? [offering a candy]
Judith: What kind is it?
Elias: Cinnamon.
Text A: Text sample 1.1 from the LSWE Corpus (Biber and Conrad 2009: 7–8)
The colons in the full text are actually not line-initial but follow the names of the
speakers, just as they would in the scripted version of a play. Following the same
type of convention, the information referring to the extra-linguistic context has
been added in square brackets at the end of one line. The punctuation marks are
thus strongly indicative of conversation.
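The reduction of Text A to its punctuation marks can be reproduced mechanically. The Python sketch below is an illustration, not the author's procedure; the function name `skeleton` is hypothetical, the mark inventory is taken from the list of punctuation marks given earlier in the chapter, and treating ’ between two letters as an apostrophe (word punctuation) rather than a quotation mark is a simplifying assumption.

```python
# Assumed inventory: the punctuation marks listed earlier (full stop to square
# brackets); hyphens and apostrophes (word punctuation) are deliberately left out.
MARKS = frozenset(".?!,;:–/…‘’“”()[]")


def skeleton(line: str, marks: frozenset = MARKS) -> str:
    """Reduce a line to its punctuation marks, in order, separated by spaces."""
    kept = []
    for i, ch in enumerate(line):
        if ch not in marks:
            continue
        # Assumption: ’ between two letters (as in I’ll) is an apostrophe, so skip it.
        if ch == "’" and 0 < i < len(line) - 1 and line[i - 1].isalpha() and line[i + 1].isalpha():
            continue
        kept.append(ch)
    return " ".join(kept)


TEXT_A = [
    "Judith: Yeah I just found out that Rebekah is going to the University of Chicago "
    "to get her PhD. I really want to go visit her. Maybe I’ll come out and see her.",
    "Eric: Oh is she?",
    "Judith: Yeah.",
    "Eric: Oh good.",
    "Elias: Here, do you want one? [offering a candy]",
    "Judith: What kind is it?",
    "Elias: Cinnamon.",
]

for line in TEXT_A:
    print(skeleton(line))
```

Line by line, the output matches the punctuation-only rendering of Text A shown above, except that the sketch also keeps the comma after Here, which that rendering does not display. Three ASCII full stops used as suspension dots would likewise be counted as three full stops.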
The same is true of Figure 2. The large number of exclamation marks, colons, question marks and (this time angled) brackets in Text B makes it highly unlikely that the text is a tax declaration or a newspaper article. While the fact that it is an excerpt from a drama – i.e. scripted speech – and not a transcript of a conversation cannot be deduced from punctuation alone, the oral dimension of the text emerges by analogy with Text A.
Text B: Text sample 1.7 from Biber and Conrad (2009: 20): Paul Zindel’s 1970 drama The
Effect of Gamma Rays on Man-in-the-Moon Marigolds
This raises the question of which typical register features are linked to punctuation.
For example, the large number of first- and second-person pronouns typical of
spoken conversation (Biber and Conrad 2009: 7–8) – which is supported by the
prototypical extracts above – cannot be derived from punctuation. By contrast,
another characteristic linguistic feature can: the pervasiveness of questions,
which are usually marked by sentence-final question marks in many (but not all)
transcripts of spoken language, e.g. in Text A, and in texts that are written to be
spoken (e.g. Text B). The presence of question marks can thus be linked to the
presence of questions: both are indicative of interaction (cf. Biber and Conrad
2009: 7–8). Since questions favour the production of answers as the privileged
second pair part (Levinson 1983: 307), full stops following question marks are
likely to represent not only statements but answers. This assumption is supported
by Texts A and B above. According to Biber (1988: 227), questions “indicate a
concern with interpersonal functions and involvement with the addressee”. It
follows from this that they should be more frequent in registers involving that
148 Christina Sanchez-Stockhammer
It therefore seems safe to claim that the punctuation marks closing sentences
follow a prototype-based distribution (cf. e.g. Rosch 1973, 1975) with an ideal
exemplar in the centre of the category and fuzzy boundaries in its periphery. The
latter would include less typical uses, such as
2 Note, however, that puzzles need not necessarily be phrased as questions, e.g. in the case of
crosswords (cf. Pham, this volume), which tend not to use question marks.
ation with a full stop in Quirk et al. (1985) for this particular example seems to
suggest that in doubtful cases, punctuation follows the syntactic rather than the
pragmatic perspective. While the use of an exclamation mark does not seem to be
entirely excluded in this particular example (even if an informal internet search
confirms the full stop as the norm), other indirect speech acts such as Searle’s
(1975: 73) famous
( ). ( ) , ( . , . , . ). , ( ; . ). , ( , ; . ; . ; . ; ). ( ) , ( . ). . . ( ) .
This sample is not only characterised by its complete lack of question marks and
exclamation marks but also by a large proportion of full stops and brackets, many
commas and even some semicolons. It comes from the introduction to a scientific
research article and is thus situated clearly towards the extreme of the written
dimension of language conception.
Hybridization between species can severely affect a species status and recovery (Rhymer & Simberloff 1996). Threatened species (and others) may be directly affected by hybridization and gene flow from invasive species, which can result in reduced fitness or lowered genetic variability (Gilbert et al. 1993, Gottelli et al. 1994, Wolf et al. 2001). In other cases, hybridization may provide increased polymorphisms that allow for rapid evolution to occur (Grant & Grant 1992; Rhymer et al. 1994). Species can also be influenced indirectly, because hybridization may affect the conservation status of threatened species and their legal protection (O’Brien & Mayr 1991a, 1991b; Jones et al. 1995; Allendorf et al. 2001; Schwartz et al. 2004; Haig & Allendorf 2005). The Northern Spotted Owl (Strix occidentalis caurina) is a threatened subspecies associated with rapidly declining, late-successional forests in western North America (Gutierrez et al. 1995). Listing of this subspecies under the U.S. Endangered Species Act (ESA) attracted considerable controversy because of concern that listing would lead to restrictions on timber harvest.
Text C: Text sample 6.13 from Biber and Conrad (2009: 163): Scientific research article
(Genetic identification of Spotted Owls … , Conservation Biology, 2004).
While scientific research attempts to answer research questions, these are usually
formulated indirectly, with the consequence that the number of direct questions
and the ensuing question marks is relatively low (although not necessarily zero).
Exclamation marks, by contrast, seem to be practically excluded in this register.
This is presumably because the discourse functions usually associated with that
punctuation mark (cf. Quirk et al. 1985: 803–804 above) contradict the general
principles of academic research: it is neither directive (at least not overtly) nor
concerned with the expression of emotions such as being impressed. These con-
ventions are communicated between researchers, e.g. by supervisors marking
their students’ papers or by means of style guides.3
The large number of full stops in this particular passage is due not only to the
focus of research papers on transmitting information but also to the frequent
occurrence of the abbreviation et al., which is rarely found outside academia.
The use of brackets is also highly conventionalised: with few excep-
3 Note, however, that very popular style guides giving advice on academic research, such as
Booth et al. (2008), do not mention punctuation (merely style), and others, such as Swales and
Feak (2010: 27), limit themselves to the discussion of semicolons, colons, dashes and commas.
4 For a more detailed theoretical account of the guide functions of punctuation cf. Patt (2013).
In most other cases, however, the relation is not as unequivocal, because the
punctuation marks have several functions (some of which may overlap with the
functions of other punctuation marks): as we have seen, colons can be used to
set off the names of characters in a play from their lines, but very frequently, they
are followed by explanations or specifications and they can therefore commonly
be found in registers with an argumentative function, such as academic papers.
Alternatively, additional information may be included in brackets or follow-
ing a dash,5 but different degrees of formality are associated with the various
punctuation marks. According to Seely (2007: 84), brackets are “the most formal
(and most obvious) way of showing parenthesis”, commas are “less forceful”
and dashes “the least formal”. This seems to imply that a superficial analysis of
punctuation marks does not suffice: it is not enough to simply count the number
of commas, question marks etc. (not even if the number of words in the texts
is taken into consideration), but it is also necessary to consider their individual
functions and possibly even their stylistic value. This is the only means of iden-
tifying highly conventionalised register-specific uses, such as initial exclamation
marks expressing negation (e.g. !interesting = not interesting) in “hacker-influ-
enced interactions” (Crystal 2001: 90) or the specialised use of double quotation
marks in comics (cf. below).
Table 1: The Comic Corpus (CoCo) texts (cf. Sanchez-Stockhammer 2012: 68)
6 While the absence of emoticons can be explained by the fact that the multimodality of comics
permits the representation of facial expression in a more detailed manner by the drawn faces
of the interlocutors, the absence of obscenicons from the corpus is presumably due to chance.
However, since the expression of anger in comic strips seems to draw mainly on question marks and
exclamation marks from the set of punctuation marks, while frequently using symbols (e.g.
<@>, <#>, <$>, <%>, <&> and <*>) and also drawings of spirals etc. (cf. Law 2010), the treatment
of obscenicons belongs to the periphery of the use of punctuation marks anyway.
7 Since academic English is a register with a particularly strong lingua franca element and since
all articles in AcadText come from high-quality journals and have consequently undergone
intensive editing, the native language of the authors was expected to play only a marginal role. While
the individual author Schneider has a German-language background, either all or the majority
of the authors of the jointly written articles were working at universities in English-speaking
countries at the time of publishing.
the first two pages with numbers ending in zero from each article. End-of-line
hyphens were deleted and em dashes were flanked by spaces. Word-internal
bracketing, e.g. in (semi-)automatic, was deleted so as not to skew the automated
counts. While full stops, question marks and quotation marks counted as sen-
tence endings, colons and semicolons were considered sentence-internal. Head-
ings and rows in tables counted as one sentence each. It becomes immediately
obvious that the number of words per sentence is considerably larger in the aca-
demic texts than in the comics.
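The sentence-counting convention just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the function name is hypothetical, exclamation marks are added as sentence endings (the passage does not list them), a closing quotation mark is read as part of a sentence ending only when it follows a full stop, question mark or exclamation mark, and words are counted as whitespace-delimited tokens.

```python
import re

# Assumed sentence boundary: one or more full stops or question marks, plus
# exclamation marks (an added assumption), optionally followed by a closing
# quotation mark, then whitespace or the end of the text.
# Colons and semicolons are deliberately NOT boundaries (sentence-internal).
SENTENCE_END = re.compile(r"[.?!]+[\"'”’]?(?=\s|$)")


def words_per_sentence(text: str) -> float:
    """Mean number of whitespace-delimited words per sentence (hypothetical helper)."""
    sentences = [chunk for chunk in SENTENCE_END.split(text) if chunk.strip()]
    if not sentences:
        return 0.0
    return len(text.split()) / len(sentences)


# Three turns from Text A: three sentences, six words
print(words_per_sentence("Oh is she? Yeah. Oh good."))  # 2.0
# Example (23): the semicolon does not end the sentence
print(words_per_sentence(
    "traces of the previous stage will still be found; that is, some insecurity remains."
))  # 14.0
```

The sketch would mis-split abbreviations such as et al. followed by a space (frequent in Text C), and it does not implement the rule that headings and table rows count as one sentence each; both would need explicit handling in a real pipeline.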
(due to spatial restrictions and the fact that the speakers in a conversation are
indicated by the pointed side of speech bubbles in contrast to usual scripted
conversation)
6. fewer brackets than dashes
(because these represent the most and least formal punctuation marks indi-
cating parenthesis according to Seely 2007: 84)
7. a certain number of suspension dots
(in order to permit longer sentences to continue in the following speech
bubble).
For the quantitative analysis of the punctuation marks, all letters and numbers in
the original corpus texts were deleted, and the punctuation marks were counted
semi-automatically by using the “replace” function in Microsoft Word. The
results in Table 3 were normalised by dividing the absolute results by the number
of words in the respective texts, then multiplying them by 1,000 (in order to
increase readability) and finally rounding them to the nearest whole number.
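The normalisation amounts to frequency per 1,000 words = count ÷ number of words × 1,000, rounded to a whole number. A minimal Python sketch of the whole procedure follows, under assumptions that are not from the study: words are whitespace-delimited tokens, letters and digits are deleted with str.isalnum instead of Word's replace function, round brackets are counted by their opening bracket (i.e. as pairs), halves are rounded up, and all names are hypothetical. Apostrophes and single quotation marks are left out because, as the chapter explains, they had to be told apart manually.

```python
import math

# Hypothetical selection of marks; brackets are counted via the opening
# bracket, i.e. as pairs.
MARKS = {
    "full stops": ".",
    "question marks": "?",
    "exclamation marks": "!",
    "commas": ",",
    "semicolons": ";",
    "colons": ":",
    "round bracket pairs": "(",
}


def strip_letters_and_digits(text: str) -> str:
    """Keep only non-alphanumeric characters, mimicking the deletion step."""
    return "".join(ch for ch in text if not ch.isalnum())


def per_thousand_words(count: int, n_words: int) -> int:
    """Divide by the word count, multiply by 1,000, round half up."""
    return math.floor(count / n_words * 1000 + 0.5)


def punctuation_profile(text: str) -> dict[str, int]:
    n_words = len(text.split())  # assumption: whitespace-delimited words
    residue = strip_letters_and_digits(text)
    return {name: per_thousand_words(residue.count(mark), n_words)
            for name, mark in MARKS.items()}


profile = punctuation_profile("Oh is she? Yeah. Oh good.")
print(profile["question marks"], profile["full stops"])  # 167 333
```

Rounding half up is implemented explicitly because Python's built-in round() rounds halves to the nearest even number, which would occasionally differ from conventional rounding.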
Table 3: Normalised results (divided by the number of words per text, multiplied by 1,000 and
rounded)
                    Comics (CoCo)     Academic texts (AcadText)
Full stops           78   50    4      53   69   24
Question marks       20   22   14       0    1    0
Commas               60   62   37      62   72   71
Semicolons            0    0    0       4    1    2
Colons                0    1    0       1    0   11
Dashes               16    3    0       1    0    1
Slashes               0    0    0       4    0    0
Suspension dots      40   43   18       0    0    1
Apostrophes          58   43   71       1    4    5
For each line (i.e. for each punctuation mark), shaded cells indicate intra-group
similarity and inter-group dissimilarity between comics and academic texts. This
is either based on a very obvious difference in the results (e.g. for the suspension
dots) or, in some cases, on the presence of at least two values larger than zero in
one type of register as against all-zero in the three texts from the other register
(e.g. for the semicolons).
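The second, mechanical shading criterion can be made explicit. The sketch below applies only that criterion (at least two values above zero in one register against zero in all three texts of the other) to the rows of Table 3 that survive in this version; the first criterion, a very obvious difference, is a judgement call and is not modelled. The names are hypothetical, and the column grouping (first three values comics, last three academic texts) is inferred from the discussion of question marks and suspension dots below.

```python
# Normalised values from Table 3: (comics, academic texts), three texts each.
TABLE_3 = {
    "Full stops": ([78, 50, 4], [53, 69, 24]),
    "Question marks": ([20, 22, 14], [0, 1, 0]),
    "Commas": ([60, 62, 37], [62, 72, 71]),
    "Semicolons": ([0, 0, 0], [4, 1, 2]),
    "Colons": ([0, 1, 0], [1, 0, 11]),
    "Dashes": ([16, 3, 0], [1, 0, 1]),
    "Slashes": ([0, 0, 0], [4, 0, 0]),
    "Suspension dots": ([40, 43, 18], [0, 0, 1]),
    "Apostrophes": ([58, 43, 71], [1, 4, 5]),
}


def presence_contrast(group_a: list[int], group_b: list[int]) -> bool:
    """True if one group has at least two non-zero values and the other is all zero."""
    nonzero_a = sum(value > 0 for value in group_a)
    nonzero_b = sum(value > 0 for value in group_b)
    return (nonzero_a >= 2 and nonzero_b == 0) or (nonzero_b >= 2 and nonzero_a == 0)


flagged = [mark for mark, (comics, academic) in TABLE_3.items()
           if presence_contrast(comics, academic)]
print(flagged)  # ['Semicolons']
```

Only semicolons pass this test. Question marks and suspension dots narrowly fail it because one academic text has a normalised value of 1, so their shading has to rest on the first criterion.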
Note that the number of quotation marks and brackets corresponds to the
number of pairings of these punctuation marks. This is because it obligatorily
takes two exemplars to set off parentheses – in contrast to dashes or commas,
which may open a parenthesis closed by the final punctuation mark in a sen-
tence, e.g. a full stop (cf. Lampert 2011: 91–92). While an alternative single-punc-
tuation-mark use of brackets can be imagined, namely when a single closing
bracket is employed to set off the introductory ordering letters in lists, such as
a) xx
b) yy
c) zz,
the fact that this type of usage did not occur in the corpus made it unnecessary to
establish a more detailed distinction here. If the results from Table 3 are analysed
in relation to the hypotheses formulated above, the following findings emerge:
(i) As expected, there is a marked difference in the use of question marks
and exclamation marks in comics and academic texts: only one academic text
contains a single question mark at the end of the sentence
(12) What function do beginning and ending lexemes assume in compound recognition?
and no text from this register uses any exclamation marks. This is in line with
the usual correlation of these two punctuation marks with conceptually spoken
language: all the comic texts contain both question and exclamation marks,
although the proportion varies considerably, with results ranging from 14 to 134
instances.
(ii) The discussion of quotation marks requires a distinction between single
and double quotation marks. As for the distribution of the single quotation marks,
their analysis made it necessary to distinguish manually between single quotation
marks and the formally identical apostrophes. Since apostrophes are word-inter-
nal punctuation marks, they were only included in the analyses because of this
necessary distinction, but they actually yielded interesting results: while both
academic texts and comic texts contain a small number of stylistically neutral
genitives (4 in Superman, 3 in Batman, 2 in Uncle Scrooge), the majority of the
large number of apostrophes in the comic texts mark either informal contractions
(e.g. won’t) or omissions or shortenings characteristic of informal language
usage, e.g.
(13) With a swoop to his left an’ a peck to th’ right, he catches rat finks way out west!
Note, however, that the article by Schneider, which uses single quotation marks,
appeared in Language, which is an American journal.
The analysis of the article by Biber et al. beyond the passage included in the
corpus shows that a considerable proportion of single quotation marks enclose
no quotations but paraphrases of meaning, e.g. in
(15) any global characterizations of ‘General English’ should be regarded with caution
(16) we need to remember that ‘nations are mental constructs, “imagined communities” ’
which are constructed discursively […] (Wodak et al. 1999:4).
and it becomes clear that these merely mark a quotation within a quotation
whose reference is given later in the text, the convention being that quotation
marks inside single quotation marks are double, and vice versa (cf. Achtert and
Gibaldi 1985: 80; Sanchez-Stockhammer, forthcoming).
While this quasi-absence of double quotation marks from AcadText may be
attributed to the small size of the random sample or the conventions of individual
publishers, chance cannot explain the other unexpected finding, namely the rel-
ative frequency of double quotation marks in the comic corpus (8 pairs; at least
one per text). Since direct speech is already marked as such by its inclusion in
speech bubbles, the double quotation marks must have a different function here:
indeed, the quotation marks in the comics are used in their general (academic)
function and serve to quote the speech of others. Thus the utterance
is countered by
Double quotation marks are also employed in the comics to refer to the metalin-
guistic use of words, e.g.
(19) Funny, I didn’t think you even knew the word “honest,” Penguin.
This use is completely missing in the academic texts. Alternatively, commas occur
after introductory interjections in the comics, e.g. in
in another use that was not found in the academic writing. These register-spe-
cific uses explain why commas occur relatively frequently in the comic texts. The
most frequent use of commas in the comics that is also to be expected in academic
texts (though it is not very frequent in the sample) is the delimitation of sentence-initial
adverbials, e.g. in
(22) According to the contract, they are RABBIT eggs for your children, King!
(iv) Semicolons, by contrast, only occur in the academic writing, e.g. in Schneider
(2003):
(23) traces of the previous stage will still be found; that is, some insecurity remains
Since they are absent from the sample of comic texts – presumably due to the fact
that most of their uses require relatively long sentences – they can generally be
used as an indication of register with regard to the spoken/written dimension.
(v) Surprisingly, the number of colons does not vary greatly between the comics
and the academic texts considered. Only Schneider (2003) stands out, since it is
the only one of the three academic texts to indicate precise page numbers even in
text-internal references that are not attached to quotations.
(vi) Neither sample contained any square brackets. As expected, not a single
pair of round brackets was used in the comic corpus – in contrast to the academic
texts, where brackets are commonly used to indicate references. The extremely
high number in Juhasz (2003), with 41 pairs of round brackets, is due to the
fact that a large part of the passage randomly included in the AcadText corpus
consists of the results section, in which relevant figures and examples are
added in brackets, e.g. in
(24) high-frequency beginning lexemes were responded to quicker than low-frequency beginning lexemes, t1(27) = ± 3.78, p < .01, t2(18) = ± 2.02, p = .059.
This use of dashes represents a function which is not usually required in aca-
demic texts.
(vii) The difference in frequency between the use of suspension dots in
comics and academic texts is far more pronounced than expected: the only aca-
demic text using them is Schneider (2003) in one instance where omission in a
quoted passage is indicated:
(26) ‘the discursive constructs of nations and national identities … primarily emphasize
national uniqueness and intra-national uniformity but largely ignore intra-national dif-
ferences’ (Wodak et al. 1999:4).
This is a use which is highly unlikely to occur in comics. However, the low fre-
quency of suspension dots in the sample of academic texts seems to suggest
that quotations are usually extracted in shorter portions and that omissions are
avoided. This is supported by the quotations in AcadText, all of which represent
extracts from individual sentences only, e.g. the following series of quotations
from Schneider (2003):
(27) a case of ‘identity revision’ triggered by the insight that one’s traditional identity turns
out to be ‘manifestly untrue’ or at least ‘consistently unrewarding’ (Jenkins 1996:95)
Comics, by contrast, use suspension dots very frequently (their normalised values
in Table 3 range from 18 to 43) and often in order to create cohesion through their
occurrence not only at the end of an utterance which is interrupted in one panel, e.g. in
(28) You might be stronger and faster than I am right now, Parasite…
but also at the beginning of the continued speech or thought in the next panel:
(29) …but you’ve barely had forty-eight hours to practice using my powers.
Such interruptions are not merely attributable to spatial restrictions, it seems, but
also to the fact that the picture in the new panel corresponds more closely to the
action indicated in the second part, such as a punch with a fist in the Superman
example above.
The differences between the use of punctuation marks in the texts from
the comic corpus and the academic texts are even more striking if considered
graphically. Figure 4 summarises the features which are characteristic of comics
(question marks, exclamation marks, suspension dots and apostrophes); Figure
5 those which are more typical of academic writing (semicolons, single quotation
marks and round brackets).
Figure 4: Punctuation marks occurring more frequently in comics than in academic texts
It may therefore come as a surprise that this striking difference between the
two registers cannot be backed statistically: non-parametric statistical tests for
independent samples were carried out in SPSS in order to compare the medians
between groups (i.e. comics vs. academic texts), but even the Mann–Whitney U
test yielded no significant results for any of the variables (e.g. question marks)
due to the small number of texts considered. Nonetheless, the graphically imme-
diately obvious difference between comics and academic texts in Figures 4 and 5
permits the tentative conclusion that the use of punctuation in different registers
can be employed as a register feature. At the same time, these results call for
further empirical research, which may well provide statistical backing
for the clear tendencies observed in this exploratory study.
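Why no test could reach significance becomes clear from the exact null distribution of U. With three texts per register there are only C(6, 3) = 20 ways of assigning the six values to the two groups, so even a complete separation such as the question-mark row of Table 3 yields an exact two-sided p of 2/20 = 0.1, above the conventional 0.05 level. The sketch below assumes the exact permutation version of the test, with ties counted as one half; an asymptotic (normal-approximation) p-value, which statistics software may also report, can differ for samples this small. The function names are hypothetical, and this is not a reconstruction of the SPSS output.

```python
from itertools import combinations


def u_statistic(x: list[float], y: list[float]) -> float:
    """Mann–Whitney U for x: each pair with x > y counts 1, each tie counts 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x: list[float], y: list[float]) -> float:
    """Exact permutation p-value: the share of all ways of splitting the pooled
    values into groups of the original sizes whose U lies at least as far from
    its null mean (len(x) * len(y) / 2) as the observed U."""
    pooled = list(x) + list(y)
    null_mean = len(x) * len(y) / 2
    observed = abs(u_statistic(x, y) - null_mean)
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), len(x)):
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(group_x, group_y) - null_mean) >= observed - 1e-9:
            extreme += 1
    return extreme / total


# Question marks (Table 3): the three comics vs the three academic texts
print(exact_two_sided_p([20, 22, 14], [0, 1, 0]))  # 0.1
```

Because the observed split and its mirror image are always both counted, 2/20 is the smallest two-sided p-value any variable can reach with three texts per group, which is consistent with the absence of significant results reported above.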
Figure 5: Punctuation marks occurring more frequently in academic texts than in comics
4 Conclusion
Punctuation is a completely underresearched feature in register studies at the
time of writing: thus Barbieri’s extensive annotation of major register and genre
studies in Biber and Conrad’s Appendix A (2009: 271–295) does not mention
punctuation a single time in the column “features under investigation”. It is only
in Barbieri’s summary of Crystal’s (2001) major findings that there is a minor ref-
erence to it, when “minimal punctuation” is found to be one of the “common
characteristics of internet registers” (Biber and Conrad 2009: 289).
However, the empirical analysis of two register-specific corpora in the present
study – one of comics and one of academic texts – suggests that certain types of
punctuation tend to occur more frequently in certain types of register and that
punctuation can therefore be employed as an indication of register. For instance,
some punctuation marks correlate strongly with spoken and written style respec-
tively and barely occur in the contrasting register. While question marks, excla-
mation marks, suspension dots and apostrophes are far more frequent in comics
than in academic texts, the latter use a larger proportion of semicolons, single
quotation marks and round brackets. Furthermore, even in those cases where the
results are similar from a quantitative perspective, differences in usage emerge
upon closer consideration: for instance, comics tend to use commas after intro-
ductory interjections or proper nouns with vocative function, whereas academic
texts make more varied use of that punctuation mark. Further research into this
topic is required to establish the register-distinctive functions of the punctuation
marks in more detail and for a larger number of registers.
Biber’s distinction between different registers is “based on the premise that
most formal differences reflect functional differences” (Biber 1995: 136). None-
theless, he claims that his multidimensional approach differs from the studies
of his predecessors in that he does not conduct a functional analysis in the first
place so as to identify characteristic linguistic features. Instead, he states that he
“first identifies groups of co-occurring features and subsequently interprets them
in functional terms” (Biber 1988: 24). While at first sight this seems to contradict an approach
such as the one used in the present study, one should not forget
that Biber’s analyses presuppose a list of linguistic features which were then
subjected to statistical analyses. Taking into account that he reviewed “previous
research to identify potentially important linguistic features” in his preliminary
analysis (Biber 1988: 64) and that these are understood as features “that have
been associated with particular communicative functions and therefore might be
used to differing extents in different types of text” (Biber 1988: 71–72), it becomes
clear that he is not correlating random phenomena but only the results of previ-
ous functional analyses – even if these were carried out by other researchers. In
this sense, the present study can be regarded as a legitimate suggestion for the
extension of the original model.
Within such a framework, punctuation is on a level with the 15 other major
categories such as “Special features of conversation” (Biber and Conrad 2009:
82). “Punctuation” is thus tentatively suggested as category 16 with the following
subordinate features (some of which did not prove distinctive for comics vs. aca-
demic texts but may play a more important role with regard to the differentiation
between other registers):
1. full stop
2. question mark
3. exclamation mark
4. comma
5. semicolon
6. colon
7. dash
8. slash
In a very wide reading, the division of a text into paragraphs could also be con-
sidered as punctuation (cf. Huddleston and Pullum 2002: 1725). According to
Nunberg (1990: 17), “punctuation must be considered together with a variety of
other graphical features of the text, including font- and face-alternations, capi-
talization, indentation and spacing”, all of which are said to fulfil a similar func-
tion. To this can be added the use of italics and bold print. At first sight, these
features seem to go beyond the purely linguistic means and to unduly emphasise
the visual and multimodal aspect of written language – but they sometimes find
a correspondence in spoken language in pauses, stress, intonation etc., even if it
is not completely systematic (cf. above).
What makes the proposed category 16 special is the fact that the register features
listed therein are not lexico-grammatical, unlike the other features included
in Biber’s models up to the time of writing. Some of the punctuation features
correlate with lexico-grammatical features (e.g. question marks with syntactic
questions), which are in turn typical of specific registers (e.g. conversations).
However, this does not mean that punctuation is a secondary register feature.
Many other punctuation marks correlate with more abstract categories; e.g. quo-
tation marks with quotations, which may take practically any lexical or syntactic
form. Furthermore, it is normal that “linguistic features co-occur in texts because
they reflect shared functions” (Biber 1995: 30). This does not necessarily imply
that one should receive more weight than the other. As a consequence, punctua-
tion is considered a register feature in its own right.
With regard to register analysis, Biber (1988: 71–72) states that “the goal is to include the
widest possible range of potentially important linguistic features”. The empirical
analysis presented here clearly suggests punctuation as such a feature. However,
the proposed addition of punctuation to the set of categories is not to be regarded
as any form of criticism of the original model, but merely as the suggestion of a
valuable category to add to the long list of previously used features.
5 References
Achtert, Walter S. & Joseph Gibaldi. 1985. The MLA style manual. New York: The Modern
Language Association of America.
Arendholz, Jenny, Wolfram Bublitz, Monika Kirner & Iris Zimmermann. 2013. Food for thought –
or, what’s (in) a recipe? A diachronic analysis of cooking instructions. In Cornelia Gerhardt,
Maximiliane Frobenius & Susanne Ley (eds.), Culinary linguistics: The chef’s special,
119–137. Amsterdam: Benjamins.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University
Press.
Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison.
Cambridge: Cambridge University Press.
Biber, Douglas. 2006. University language: A corpus-based study of spoken and written
registers. Amsterdam: Benjamins.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge
University Press.
Bloomfield, Leonard. 1933. Language. New York: Holt.
Booth, Wayne C., Gregory G. Colomb & Joseph M. Williams. 2008. The craft of research. 3rd edn.
Chicago: University of Chicago Press.
Crystal, David. 2001. Language and the internet. Cambridge: Cambridge University Press.
Halliday, Michael A.K. 1978. Language as social semiotic: The social interpretation of language
and meaning. London: Arnold.
Huddleston, Rodney & Geoffrey K. Pullum. 2002. The Cambridge grammar of the English
language. Cambridge: Cambridge University Press.
Jakobson, Roman. 1985. Closing statement: Linguistics and poetics. In Robert E. Innis (ed.),
Semiotics: An introductory anthology, 145–175. Bloomington: Indiana University Press.
Lampert, Martina. 2011. Attentional profiles of parenthetical constructions: Some thoughts
on a cognitive-semantic analysis of written language. International Journal of Cognitive
Linguistics 2(1). 81–106.
Lampert, Martina. 2013. Say, be like, quote (unquote), and the air-quotes: Interactive
quotatives and their multimodal implications. English Today 29(4). 45–56.
Law, Gwillim. 2010. Grawlixes past and present. http://www.statoids.com/comicana/grawlist.html
(accessed 15 July, 2014).
Levinson, Stephen C. 1983. Pragmatics. Cambridge: Cambridge University Press.
Meyer, Charles F. 1987. A linguistic study of American punctuation. Frankfurt am Main: Peter
Lang.
Mithun, Marianne. 2012. The deeper regularities behind irregularities. In Thomas Stolz et al.
(eds.), Irregularity in morphology (and beyond), 39–59. Berlin: Akademie.
Moore, David S. & William I. Notz. 2006. Statistics: Concepts and controversies. New York: W.H.
Freeman.
Nunberg, Geoffrey. 1990. The linguistics of punctuation. Menlo Park, CA: CSLI.
Patt, Sebastian. 2013. Punctuation as a means of medium-dependent presentation structure in
English: Exploring the guide functions of punctuation. Tübingen: Narr.
Peters, Pam. 2004. The Cambridge guide to English usage. Cambridge: Cambridge University
Press.
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive
grammar of the English language. London: Longman.
Rosch, Eleanor. 1973. On the internal structure of perceptual and semantic categories. In
Timothy E. Moore (ed.), Cognitive development and the acquisition of language, 111–144.
New York: Academic Press.
Corpora:
Comic Corpus (CoCo):
Re-print: Englisch lernen mit Batman. Bad Guys Gallery. 2007. Munich: Berlitz.
Re-print: Englisch lernen mit Superman. Up, up and away! 2007. Munich: Berlitz.
Walt Disney’s Uncle $crooge. No. 376. April 2008. York (PA): Gemstone.
Biber, Douglas, Susan Conrad & Randi Reppen. 1994. Corpus-based approaches to issues in
applied linguistics. Applied Linguistics 15. 169–189.
Juhasz, Barbara, Matthew S. Starr, Albrecht W. Inhoff & Lars Placke. 2003. The effects of
morphology on the processing of compound words: Evidence from naming, lexical
decisions and eye fixations. British Journal of Psychology 94. 223–244.
Schneider, Edgar. 2003. The dynamics of new Englishes: From identity construction to dialect
birth. Language 79. 233–281.
Martina Lampert
Linking up register and cognitive
perspectives: Parenthetical constructions in
academic prose and experimentalist poetry
Abstract: This paper will explore the possibility of linking up Biber’s register
analysis and Talmy’s cognitive semantics, based on the assumption that some
fundamental cognitive principles inform situational features and hence would,
in part, determine linguistic characteristics. As one case in point, two samples
of parenthetical constructions from opposite written registers, academic science
writing and minimalist poetry, are scrutinised in an initial qualitative analysis.
The study identifies both a general structural and functional similarity in the
examples selected for illustration, suggesting that no significant register distinc-
tion will ensue, while the parenthetical pattern is likely to exhibit a substantial
cross-medial difference between speech and writing. These preliminary findings
invoke properties of the human cognitive architecture as well as evolutionary spe-
cifics of the language modalities as critical parameters of influence and would
speak for their recognition as potential determinants of register and, in turn, for
a principled compatibility of the two linguistic approaches.
1 Introduction
In this paper, I will present some arguments for linking up Douglas Biber’s register
analysis with a recent (re)conceptualization of register as a cognitive construct
framed in Leonard Talmy’s cognitive semantics, suggesting that the traceable
principled compatibility of these two major approaches to linguistic analysis
might open up some promising insights.
In his forthcoming The Attention System of Language1, Talmy advances the view that register, generally couched in terms of “types of speech situations”, may allow for a consistent re-analysis as speaker attitude, for instance, “toward [a lexical item’s] core meaning itself; toward the speech participants (the speaker himself, the addressee, or the relation between the two); or toward the current circumstance”. That is, from a cognitive semantics perspective, register distinctions would become conceivable as a backgrounded speaker role, or attitude, which is introjected into the minds of participants, thus inevitably involving attention and memory as relevant cognitive categories. To illustrate:
what might best be treated at root as a speaker’s attitude of respect toward the addressee –
or a speaker’s attitude of solemnity about the circumstance – could also be interpreted as
the presence of a formal situation that triggers the use of a formal register.
1 As always, I am grateful to Len Talmy for granting me the privilege of access to a very substantial current draft version of this forthcoming book; unless otherwise indicated, all quotes are from this work, and the references to this unformatted draft lack page numbers.
The fundamental significance of register for any appropriate analysis of any linguistic item, as it surfaces in Talmy’s explication, ties in with Biber’s conviction that “all linguistic descriptions”, such as, for instance, “collocational studies of particular words […] must include consideration of register differences as a central organizing parameter, if they hope to achieve an accurate account of the patterns of use” (Gray 2013: 361). Accordingly, “register differences should be an essential component of any investigation of language use” (Gray 2013: 369). These two statements, then, concur in the view that, in general, any linguistic construction carries an inherent register ‘signature’.
Moreover, Biber’s and Talmy’s approaches might in fact be read as suggestive of such a link-up, precisely because they converge in acknowledging the major role of both medial and cognitive determinants of linguistic patterns: introjected in participants’ minds, cognitive parameters appear to effectively constrain pertinent situational characteristics, as Biber’s (1988: 160) remark tracing medially distinctive effects back to “different cognitive constraints on the speakers and writers” unambiguously demonstrates – apart from and in addition to the hard-wired effectors of the medium and the tangible properties of the setting in their specific interdependence. Talmy (2007b) furthermore recognises the prime significance of the options and constraints of both the production and the reception circumstances, capitalizing on their essentially evolutionary ‘design’, while attention proves to be the single most decisive of the situational specifics in communicative interaction in shaping a linguistic item’s representational format and its functional potential.
As a case in point, I will focus on a much neglected though highly pervasive phenomenon in language – what I have proposed calling parenthetical constructions (cf. Lampert 1992: 16 and Section 2 below). To give a cursory impression of the pattern’s range of structural variability, the following examples, exclusively from academic writing, are in order. It should be noted that they are all in line
(1) Yet for all these changes, there is a continuity here, too, in the way that change is
(sometimes heatedly) debated and (sometimes grudgingly) accommodated.
(2) And there is a large number of common words for talking about the language itself, for
example slang, usage, jargon, succinct, and literate. (It is striking how many of these
words are particular to English. No other language has an exact synonym for slang, for
example, or a single word that covers the territory that literate covers in English, from
“able to read and write” to “knowledgeable or educated”.)
(3) Robertson, John M., Chi-Wei Linn, Joyce Woodford, Kimberly, K. Danos, and Mark A.
Hurst. 2001. The (Un)Emotional Male: Physiological, Verbal, and Written Correlates of
Expressiveness. The Journal of Men’s Studies 9, 393–412.
(4) Widdowson, Peter. 1990. W(h)ither English? In Martin Coyle, Peter Garside, Malcolm
Kelsall & John Peck (eds.), Encyclopaedia of literature and criticism, 1221–1236. London:
Routledge.
(5) He took pianists, guitarists and harpists in stride, but expressed shock at “13 young
lady violinists (!), 1 young lady violist (!!), 4 violoncellists (!!!) and 1 young lady contra-
bassist (!!!!).
(6) While ego orientation did not emerge as a significant predictor of likelihood to aggress
in any of the three groups, significant correlations were found between ego orienta-
tion and likelihood to aggress for boys, r (????) =.20, p <.005, and girls in the all-girls
league, r (???) =.40, p <.005.
Along the general lines sketched in the introductory paragraphs of this paper,
I will thus probe into parenthetical constructions’ common cognitive basis,
arguing that attention direction turns out to be a relevant consistent principle for
the explanation of parenthetical constructions’ usage profile, which would then
have to be added, as a principal determinant of the participants’ cognitive make-
up, to the list of situational features defining a register (cf. Biber and Conrad
2009: 40; for a similar suggestion, though more global and including punctuation
marks in general, see Sanchez-Stockhammer, this volume).
Why parenthetical constructions – and why attention? Apart from the general
neglect of attentional effects as ubiquitous phenomena in language (cf. Lampert
2009: 20–25), the pattern has somehow – vaguely, intuitively, anecdotally – been
associated with reduced attention and, in consequence, been dismissed as an
informational and textual ‘aside’ at least since the beginning of research on par-
enthetical constructions in Schwyzer’s (1939) seminal study. The central issue is,
2 Some initial empirical evidence challenging this view was offered in a plenary talk at an inter-
national workshop on “Cognitive Motivations of Second(ary) Voices: A Multimodal Perspective
on Parentheses and Quotations” (Bamberg 12/06/2014).
3 In this case study, I will, for reasons of space, disregard dashes and commas as principal competitors, which exhibit some distinctive constraints on the syntactic patterns they tolerate.
4 It comes as no surprise that register-specific details are available neither on the total frequencies nor on the relative proportions of parenthetical constructions; however, on a cursory and informal inspection, frequencies of occurrence and especially variation in structural complexity seem to increase toward the written end of a conceived spoken-written continuum, i.e., in those registers that are at a considerable distance from casual and spontaneous conversation, where recurrent formulaic patterns like comment clauses dominate.
some cursory notes on the options and constraints of the production and reception circumstances are advanced from a cognitive semantics perspective, which might again support the plausibility of the cross-framework alignment proposed in this paper.
5 For register analysis, see Biber (1988: 20) and Biber and Conrad (2009: 36); Talmy’s attention
factors are essentially framed in terms of same- and cross-venue comparison.
6 Typically, academic prose is contextualised by shared background knowledge (cf. Biber 1988:
48).
7 Disregarding some marginal cases, as when the poem serves as an exercise in literary discourse or, as in the present context, in register and attention analysis.
(Biber and Conrad 2009: 46; see already Biber 1988: 11 and 38 as well as very
explicitly Nunberg 1990: 3–4, 7 and 14–15).
Regarding these two major determinants, I will in the following elaborate on the compatibility of Biber’s and Talmy’s approaches as it becomes relevant for the subsequent analysis of the samples: for a potential mapping of the situational features of register analysis, two factors suggest themselves in cognitive semantics.
First, Talmy (2007b) acknowledges the substantial import of the channel-related situational features (including, as it were, the production circumstances), which rank high as major determinants of linguistic variation. He in fact refers to the fundamental nature of the two modes’ production and reception circumstances, which are inherited from evolution and give rise to their characteristic modality-related reflexes – a view that would correspond to Biber’s privileging them as more decisive. More specifically, it is categorical physical differences in the representational format that essentially separate the analogue, coextensive and simultaneous spoken modality (which in principle allows for gradient and relative distinctions, as in vocal dynamics) from the exclusively digital and discrete written system of representation, which disallows gradient and relative distinctions and is characteristically confined to two-dimensional space (see Section 5 below for some details on the constraints imposed by conventionalised print).
Second, in Talmy’s cognitive semantics, situational features of register analysis may be conceived as built into lexical items’ associated meaning sectors. To
illustrate: participant-related characteristics like (encyclopaedic and shared)
knowledge or epistemic, affective and attitudinal stance become accessible via
the conceptual complexes of linguistic items themselves, which, in turn, are
notably shaped by another language-external general principle, a language
user’s cognitive state (including attention resources and memory capacity). Such
cognitive reflexes are at the heart of Talmy’s (forthcoming) The Attention System
of Language, and they may be captured, quite generally, as a linguistic item’s
attentional profile, critically determining its usage (cf. Sections 4 and 5).
Such salience effects, I would argue on a more general level, comply with
register analysis in many respects: Talmy (forthcoming), in fact, proposes to (re-)
analyse the contextual components of lexical items as part of their associated
meaning; and the “central notion of a speaker’s particular attitude can then –
through a backgrounding of the role of the speaker – be interpreted instead as a
type of speech situation”, which, in turn, accommodates the concept of regis-
ter. Accordingly, “any speaker attitude or register pertaining to the core meaning
that is lexicalised in a morpheme” as well as targeting “the speech participants
(the speaker himself, the addressee, or the relation between the two) or […] the
current circumstance” would then appear as “introjecting” register distinctions
into their “minds” and thus be subject to the fundamental attentional processes
of activation and attenuation. Under such analysis, “register can always be traced
back upstream to speaker attitude”, incorporating specifics of the communicative
setting in the contextual sector of an item’s meaning for that matter; and
what [for example] might best be treated at root as a speaker’s attitude of respect toward
the addressee – or a speaker’s attitude of solemnity about the circumstance – could also
be interpreted as the presence of a formal situation that triggers the use of a formal register
(Talmy forthcoming).
4 An attentional analysis of parenthetical delivery
Following this sketch of a situational analysis, this section will focus on and
contextualise one meta-linguistic attentional mechanism from Leonard Talmy’s
(forthcoming) The Attention System of Language8 that specifically accounts for
the pattern of parenthetical delivery in the spoken mode.
In this model, each individual attention-specifying device is seen to increase
or decrease the relative attentional weight of a particular linguistic representa-
tion’s (semantic) component or (surface) constituent, which, irrespective of its
linguistic format or structural category, thus coherently accounts for the linguis-
tic variation in terms of attentionally specified, discriminate usage profiles. It is
this functionality of linguistic choices to which “skilled speakers and writers can
devote considerable meta-cognitive attention [in] their options for setting an enti-
ty’s degree of salience” (Talmy forthcoming) and which arguably again invokes
fundamental issues likewise systematically addressed in Biber’s register, genre
and style perspectives.
For the present analysis, I will only selectively and cursorily call on two such
basic attention factors: one that captures attentional properties of an individual
morpheme and one that specifies attentional effects of one entity on another; that
8 Talmy’s forthcoming book introduces a coherent theoretical and powerful analytical factor
model of linguistic attention, informed by a sophisticated theory of language-specific attentional
parameters and accounting for a wide range of attentional effects in language, so far privileging
the (more basic) spoken modality. The individual basic attention factors successively integrate
as component mechanisms, or Areas, in (hierarchically organised) Domains: Domain A, Atten-
tional properties of an individual morpheme, Domain B, Attentional properties of a morpheme
combination, Domain C, Attentional effects of one entity on another.
is, they assign “different degrees of salience to the parts of an expression or of its
reference or of the context” (Talmy 2007a: 264).9
The attentional mechanisms relevant for parenthetical constructions are
framed in terms of meta-linguistic causal triggers and targets with two distin-
guishable attentional effects: for one, as an immediate effect of a target’s “desig-
nation as the relevant entity out of the entities co-present in the environment”,
its activation level is raised, thus increasing its salience; as a second effect, the
conceptual or referential content of the respective entity will either be activated
or attenuated, lending this selectional target its specific “dual character” that,
in turn, calls for its “differential attentional treatment”, which depends on the
actual impact on the referent, i.e., foregrounding or backgrounding it (Talmy
forthcoming).
The factor addressing attentional effects of parentheticity in the spoken
modality identifies a prosodic device as trigger that first highlights the parenthe-
sised sequence’s referential content as the selected-out target and whose salience
is subsequently attenuated via its prosodically differential realisation. The widely
assumed medium-specific mechanism induces an “expression-spanning loud-
ness reduction and pitch lowering” that together “seems in general to reduce a
hearer’s attention on the expression’s meaning”; and such parenthetical delivery
would then “trigger attentional decrease in a target – in particular, to attenuate
the expression’s reference”, in effect instructing the addressee to consider the
target’s referential content as incidental (Talmy forthcoming).
Accordingly, the parenthesised clause in example (8), “if pronounced as just described, seems to encourage a hearer to treat its content as merely incidental information, readily disregarded” (Talmy forthcoming):
(8) My cousin Sue (who happened to be visiting at the time) wanted to go to the museum.
9 This is a substantially simplified version of the actual attentional analysis, abstracting away
from intriguing details in the description and largely avoiding the usage of Talmy’s terminology.
5 Toward an attentional profile of parenthetical constructions
In view of the general comparative perspective, I will now address major medial
differences as well as cross-medium correspondences in the “attentional behav-
iour” of the parenthetical pattern: like the supposed prosodic signature in the
spoken medium, the selected-out parenthesised sequence will, in print, undergo
(some) activation as an effect of being marked off as different from the adjacent
text; but unlike parenthetical delivery in the spoken mode, which allows for
gradience along a quantitative parameter (i.e., activation and attenuation from
minimal to substantial) and is essentially “fluid” in character as well as subject
to individual variation (hence, probably less discriminately effective), the lunulae
will attract (some) attention to themselves by virtue of their qualitative differ-
ence in figural shape as against their linguistic environment. Though members in
the inventory of alphabetical script, the crescent-shaped delimiters are perceiv-
ably distinct from their graphemic vicinity on account of their physical make-up
instantiating a non-alphanumeric representational system.10
Quite analogous to parenthetical delivery in the spoken modality, the lunulae
will attract attention to themselves as well as direct attention to another entity,
the parenthesised item(s); however, the figural elements themselves, categor-
ically different from the analogue gradient parameters of vocal delivery, lack
any perceptual quality to iconically induce an attention attenuating effect on
10 Nunberg (1990: 6–7), for instance, emphasizes the independence of this figural “linguistic
subsystem” as relatively autonomous and sets it on a par with non-linguistic graphical-rep-
resentational systems (cf. also Biber 1988: 7 and 9; Bredel 2008: 10–14).
the target and will instead initiate an(other) activation process. The digital and
discrete lunulae do not readily support gradience in a quantitative dimension,
and the vision-based linguistic subsystem in alphabetical languages exclusively
relies on the principle of categorical (figural) difference, having conventionalised
only discrete, all-or-none devices – with no perceptual gradient available to indicate reduced salience in some physical parameter.11 The lunulae are essentially
separative, ‘point-like’ spatial delimiters, unequivocally signalling the begin-
ning and end of what is considered the prototype of parenthetical constructions
in print. Again, contrary to their functional equivalent of parenthetical overlay
delivery in the spoken modality, they are not coextensive with the parenthesised
sequence: by their curved shape, wide at their centres and pointed at their two
ends, lunulae – iconically speaking – “embrace” a sequence that in this way
receives an identity of its own, both separated from and integrated into its envi-
ronment, and effective in delimiting the item(s) “inside”. Accordingly, any poten-
tial attenuation that may be associated with the parenthetical pattern does not
derive from perceptual stimuli but would exclusively have to be understood as
a mere convention that has been negotiated in the literate community. It is thus
ultimately an effect of (prescriptive) formal instruction or cultural practice exhib-
iting the view that these characters signal the reader to treat the parenthesised
target as an aside deserving lesser attention.
In conclusion: while the parenthetical pattern cross-medially shares the
essentially dual character of attentional selection and weighting, the outcome
is different: discrete lunulae do not allow for the unequivocal attenuating effect
of parenthetical delivery. Readers encounter a visual stimulus whose categorical
difference both in the type of triggering device and its attentional impact is at the
mercy of the written medium’s essentially digital nature; and with the parenthesised sequence being perceptually non-distinct from the preceding and subsequent typographical environment, no perceptual effect in either attentional direction can reasonably be expected. This cross-modal variance in parenthetical constructions’ fundamental characteristics ultimately results in a categorical difference of
the same functional pattern: it derives from the tangible features pertaining to the
production and reception circumstances and gives rise to the profound (though
11 In principle, the written modality would not prohibit attentional gradience in the target,
though: light fonts, e.g., in a context of regular fonts might conventionally correspond to the re-
duced loudness over an expression in the parenthetical delivery and would in fact implement the
attentional principle of reduced quantity in a physical parameter. Exploiting such attenuating potential has apparently never been considered as a possible general strategy.
not absolutely but continuously quantifiable) divide between speech and writing along major situational parameters (cf. Biber 1988: 38–45).
6 Tracing the parenthetical pattern across written registers
Whereas the preceding section focused on mode-dependent differences, the
analysis to follow will now highlight commonalities across registers: my prelimi-
nary findings from an initial small-scale case study of arguably the extreme ends
of the register continuum, (scientific) academic prose and experimental poetry,
seem to speak in favour of a common cognitive principle underlying the pattern –
despite its enormous range of structural variability (cf. Lampert 2011 for some
exemplification). Parenthetical constructions may then qualify as an integration
feature (cf. Biber 1988: 43), significantly sharing both the structural pattern and
the communicative function across the two samples. In fact, scrutinising a novel
candidate for inclusion in the pool of register-indicating characteristics is not
unlikely to produce unexpected results, as when “certain linguistic features will
occur more frequently […] than […] expected” (Biber and Conrad 2009: 10).
A case in point appears to be the lunulae – a pertinent and prominent signature feature of E. E. Cummings’ minimalist poetry, in combination with an unconventional use and expressive functionalisation of not only punctuation marks
in general but also of deviant orthography and innovative typography. Though
clearly a major issue of style analysis12 and typically “associated with aesthetic
preferences” rather than being functional (Biber and Conrad 2009: 18), Cum-
mings’ usage of lunulae indeed appears to sensibly allow for, or even invite, a
comparative analysis regarding the parenthetical pattern, all the more so in light
of Biber’s (1988: 13) emphasis that any such function must not be “posited on an
a priori basis; rather [it is] required to account for co-occurrence patterns among
linguistic features”. Following a similar rationale and giving the linguistic dimen-
sion priority for the time being, I would indeed suggest that, across written regis-
ters (perhaps even including styles), parenthetical constructions are more likely
to share an essentially cognitive function than to exhibit a great extent of
variation; and it might turn out that it is “only” their contextual co-occurrence
12 Biber and Conrad (2009: 18) note that style analyses are “similar to register perspective” in that “typical linguistic features [are] associated with a collection of text samples from a variety”; however, they characteristically differ “in the underlying reasons for the observed linguistic patterns”.
(9) a. In the case of DAF, the speech is amplified and delayed (alteration in the time
domain), whereas FSF shifts the whole spectrum of speech.
Gloss: DAF abbreviates externally-delayed auditory feedback, and FSF replaces fre-
quency-shifted feedback.
(9) b. Three speech tasks (Reading Aloud, Monologue, Conversation) were used to examine
speech fluency at baseline and in each condition repeated independently for each
participant with the device in the left and right ear.
(9) c. For purposes of this study, attention was measured by a computerized version of the
CPT with this measure approaching significance with the PWS having higher scores
(more impaired attention) compared to controls.
Gloss: CPT abbreviates Conners’ Continuous Performance Test and PWS substitutes
people who stutter.
Following the linearity constraints of the reading process,14 a reader of the above
samples will, after an uninterrupted sequence of graphemes (delayed, tasks and
scores) and an obligatory blank space, encounter the opening character, which
would – according to the attentional analysis presented in the previous section –
13 In this description, I do not imply any claim whatsoever about the actual on-line processing.
14 These constraints testify to the strict(er) principle of linearity in written language; cf. Biber
(1988: 38), Bredel (2008: 9 and 30–31).
first attract their attention to the character itself, resulting in its activation. The
lunula is then immediately succeeded by another uninterrupted sequence of
graphemes (alteration, Reading, and more) – the first constituents of the respec-
tive parenthesised sequences, all separating, by blank spaces, the word forms
in their specific linear sequences. Directly attached to the final items, domain, Conversation and attention, another such figural element, the complementary closing lunula, signals the end of the parenthetical construction, which is immediately followed by a comma as well as another blank space in (9) a., while (9) b. and c. only feature a blank space preceding the word forms of the non-parenthesised text: whereas, were and compared.
Example (10), Cummings’ untitled experimentalist poem, exhibits only some
variations on the same theme:
(10) l(a
le
af
fa
ll
s)
one
l
iness
Note, first, that all instances of the shape <l> have to be considered typographically ambiguous between the lower-case grapheme representing the lateral approximant /l/, the numeral one, and the first-person singular pronoun I.
In addition to the deviant vertical arrangement of the characters, a reader
is confronted with the homograph I, which adds to the effect of alienation ini-
tiated by the unconventional assembly of symbols and is likely to arouse sur-
prise, irritation or delight in the recipient. Note that the integration of a paren-
thetical construction into a morpheme is, in principle, admissible beyond the
poetic context, as in formal academic registers, for instance: cf. as one example
W(h)ither English, the title of a 1990 article by Peter Widdowson15. Deviating,
however, from conventionalised practice, the poem incorporates a sequence of
15 See Lampert (2011) for further exemplification of how deliberate academic writers exploit the
device to create the ambivalence crucial for their intended reading.
16 It may be worth noting that the clausal pattern in (10) would comply with an expected general preference in non-academic texts (but see (6)), while (9) a. through c. represent the phrasal/nominal prototype of informational writing (cf. Gray 2013: 368). Both, however, document referentially non-explicit structures (without any indication of the relation), which would, following Biber (cf. 1988: 145), go against the stereotype of academic samples.
(11) No significant associations were observed between device effect and any of the three
measures of motor (manual) laterality (i.e., handedness) (all Bonferroni-corrected
p-values = 1.00).
(12) It should be noted, however, that findings regarding the influence of the device on
stuttering must be interpreted with caution as other factors (e.g., changes in speech
rate, amplification) may contribute to enhanced fluency with this treatment.
17 Any claim to do justice to the literary intricacies involved is explicitly beyond the purpose of
these few remarks; my concern is solely with a demonstration of a common cognitive principle.
18 Cf. also Brown’s (2009) “ambivalent nature of the parenthesis” and his repeated appeal to the
concept of attention, i.e., “foregrounded” vs. “unimportant” and attention vs. importance in the
introductory lines of his “dissertation”.
19 On the associated implications of figure-ground reversal in vision see Palmer (1999: 280–287).
20 Perceptual psychology abounds in experiments that confirm a robust effect: having realised
this ambivalence, a (per/con)ceiver’s perceptual system is likely to switch to and fro, and it is
hard to keep attention on only one “interpretation” or effect (see Palmer 1999). By analogy, the
two alternatives are accessible for a reader, who may, however, choose one option as primary.
creating, through competing readings, the illusion of conveying more than one
message at a time. Accordingly, an individual reader may choose to focus their processing capacity either on the parenthesised or on the non-parenthesised information – depending, in the case of the academic text, presumably on a particular reader’s expertise and knowledge; hence, the actual processing proves a question of adequacy of understanding. In the poetic context, in contrast, the
choice appears to be(come) an issue of preference, with the ensuing effect of sur-
prise or delight, and thus, a matter of propensity for playfulness.
A final word goes to the poem, which, according to literary critic Alistair Brown (2009), stretches the edges of the pattern (too far?); he expresses his
“verdict” that “the example […] is extreme […], useful for illustrating the range of
possibility in the lunulae but hardly representative of the general use.” I would,
however, argue that this conclusion is justified only at first sight: in attentional
terms, the poem ‘only’ employs the principle of divided attention, which quite
naturally invites strategic instrumentalisation – playing on the systematic ambi-
guity of what to attend to more. This option is indeed exploited by Cummings,
yet entirely within the ‘legitimate’ confines admissible in the visual medium of
print: “The synthesis of two different possibilities occurs here, but visually rather
than metaphorically” (Brown 2009). My conjecture is rather that both the figural
elements and the structural pattern – even in such an allegedly extreme case –
“just” perform their general cognitive “task” along with its well-known effect,
generating (per)ceptual ambiguity – which, if sensible, may indeed be quite disillusioning in the context of an avant-garde piece of art.
In contrast to the distinctive dual nature of parenthetical delivery’s attentional profile, with its (potential) selective activation and attenuation in the spoken mode, the pattern in the vision-based written modality rather suggests divided attention as an appropriate reference concept to capture parenthetical constructions’ modality-specific effect: well known from Gestalt psychology’s figure-ground distinction, it entails that attention should be divided between two possible readings and thus creates the illusion of conveying two “messages” at a time. In particular, with respect to the device’s predictive potential, I would argue that the difference in impact across the two sample registers selected for this paper “boils down” to an after-effect of surprise in the poem, resulting from its non-conformity to reader expectations associated with the conventionalised genre norms that are prevalent in formal (academic) writing.
7 New vistas: Balancing out cognitive determinants and situational constraints on parentheticity
Though definitely limited in both scope and variation of the constructional type,
with only two samples scrutinised, this outline account may nevertheless have
given at least a sense of the central argument, spelling out a principal cognitive
determinant of parenthetical constructions in general in its medium-specific
profile of print in particular: first, the nature of human cognition imparts, and
allows for, certain forms of implementation that are shared across cognitive
systems and is largely controlled by attention; as a second major effector, the
tangible properties of the production and reception circumstances impose their
constraints on the pattern’s structural options, manifesting a fundamental divide
between the language modalities and giving rise to two distinct medium-sensitive
attentional profiles of the parenthetical construction.
Based on the general principle of divided (visual) attention, parenthetical constructions emerge, in the written mode, as a register- and genre-independent phenomenon. The pattern’s perceptual ambivalence “naturally” follows from the lunulae’s attentional activation as non-graphemes, on the one hand, and from the parenthesised sequence’s lack of difference from its graphemic environment, on the other. Since the construction’s formal representation lacks any palpable perceptual attenuation effect, neither a more nor a less activated sequence can be identified, unless one would permit, once more, that the bias toward the spoken modality be conceptually imposed on print, perpetuating the false ideology that writing is dependent on speech.
Thus, the crucial question is why parenthetical constructions exist in the first place, and why they are so pervasive in the written registers, given that written production in principle allows for multiple revision. One reasonable suggestion may be that, as a meta-cognitive device, the pattern allows for a conceived additional (or separate) information level, hence fictively circumventing the linearity constraint of the linguistic medium’s spatial two-dimensionality. With their general cognitive parameter of divided attention, parenthetical constructions convey two “messages” at a time, or, as Nunberg (1990: 115–116) has it, they – like quotations – “depart from a presumptive text”. If attenuation were retained as a general feature, equivalent processing of the alternatives regarding a specific ideational content would be precluded in principle in the linear form of progression available in print, thus sacrificing a text’s adaptability to the mind-sets and expectations of individual readers, which play a decisive role in determining
the preferred reading. Globally, then, parenthetical constructions indeed instantiate a convenient textual strategy that may both support specific communicative purposes and be (consciously) exploited for them.
Linking up register and cognitive perspectives 191
Apart from these general (register- and genre-independent) cognitive implications, however, it is an entirely empirical issue whether the suggested discourse function(s) – here restricted to the complementary logical relations of generalisation and specification (plausible as they may be for the selected cases) – can be hypothesised to hold across registers. In this vein, an in-depth corpus-based register analysis of representative samples that takes account of higher-level discourse functions and their expected complex, systematic interaction with situational characteristics is essential to ultimately determine the range of variation across the textual dimensions – whether a limited set of functional relations constrains parenthetical constructions or whether any determination will rest on the unique interaction between the given text and an individual reading experience, probably with few or no a priori generalisations possible.
What the commonality of the two samples from disparate written registers
may, however, indeed suggest is the significance of the communicative “task”,
which perhaps proves a – or: the – decisive criterion of register variation (cf.
Gray 2013: 364), being largely independent of the type of structural integration:
while Cummings’ text with its clausal specimen rather invokes a presumptive
oral written register, the academic article conforms to the stereotype of phrasal
modification characteristic of formal writing; but in both cases the parenthetical
construction manifests itself as a “cognitive ‘marker’ of written discourse, which
can only be produced in circumstances that allow planning and manipulation
of the text” (cf. Gray 2013: 368). Register analysis will certainly provide insights into the exact frequency distribution of distinctive and salient co-occurrence patterns across (sub)registers, taking account of functional differences between registers in terms of their “internal coherence”, i.e., the degree of variation that they tolerate (Biber 1988: 26). Converging on the same observation of non-linearity or multi-layeredness (cf. Biber 1988: 21), the cognitive semantics view might offer a sensible motivation for the abstracted underlying functional dimension – effects resulting from the cognitive constraints that divided attention (dis)allows.
References
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: CUP.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: CUP.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999.
Longman grammar of spoken and written English. London: Longman.
Bredel, Ursula. 2008. Die Interpunktion des Deutschen. Tübingen: Niemeyer.
Brown, Alistair. 2009. Parentheses and ambiguity in poetry of the twentieth century. http://www.thepequod.org.uk/essays/litcrit/parenthe.htm (accessed 30 January 2015).
Corpus of Contemporary American English (COCA). corpus.byu.edu/coca (accessed 29
September 2015).
Cummings, E. E. 1973. Complete poems, 1904–1962. George J. Firmage (ed.). New York: Liveright
Publishing Corporation.
Dehé, Nicole. 2014. Parentheticals in spoken English: The syntax-prosody relation. Cambridge:
CUP.
Foundas, Anne L., Jeffrey R. Mock, David M. Corey, Edward J. Golob & Edward G. Conture. 2013.
The SpeechEasy device in stuttering and nonstuttering adults: Fluency effects while
speaking and reading. Brain and Language 126(2). 141–150.
Gray, Bethany. 2013. Interview with Douglas Biber. Journal of English Linguistics 41(4).
359–379.
Huddleston, Rodney & Geoffrey K. Pullum. 2002. The Cambridge grammar of the English
language. Cambridge: CUP.
James, William. 1950 [1890]. The principles of psychology. New York: Dover Publications.
Kasimir, Elke. 2008. Prosodic correlates of subclausal quotation marks. ZAS Papers in
Linguistics 49. 67–77.
Lampert, Martina. 1992. Die parenthetische Konstruktion als textuelle Strategie. Zur kognitiven
und kommunikativen Basis einer grammatischen Kategorie. München: Otto Sagner.
Lampert, Martina. 2009. Attention and recombinance: A cognitive-semantic investigation into
morphological compositionality in English. Frankfurt am Main: Peter Lang.
Lampert, Martina. 2011. Attentional profiles of parenthetical constructions: Some thoughts
on a cognitive-semantic analysis of written language. International Journal of Cognitive
Linguistics 2(1). 86–106.
Lampert, Martina. Forthcoming. Cognitive motivations of second(ary) voices: A multimodal
perspective on parentheses and quotations. [Conference proceedings of the international
workshop on secondary syntax: Parentheticals, vocatives, quotations. University of
Bamberg, 6 December 2014].
Lennard, John. 1991. But I digress: The exploitation of parentheses in English printed verse.
Oxford: Clarendon Press.
Nunberg, Geoffrey. 1990. The linguistics of punctuation. Stanford: CSLI.
Nunberg, Geoffrey. 1999. Introductory essay to the Norton Anthology of English Literature, seventh edition. http://people.ischool.berkeley.edu/~nunberg/norton.pdf [1–22] (accessed 29 September 2015).
Palmer, Stephen E. 1999. Vision science: Photons to phenomenology. Cambridge, MA: MIT
Press.
Patt, Sebastian. 2013. Punctuation as a means of medium-dependent presentation structure in
English: Exploring the guide functions of punctuation. Tübingen: Narr.
Schwyzer, Eduard. 1939. Die Parenthese im engern und im weitern Sinne. Berlin: de Gruyter.
Talmy, Leonard. 2003. The representation of spatial structure in spoken and signed language.
In Karen Emmorey (ed.), Perspectives on classifier constructions in sign language,
169–195. Mahwah, NJ: Erlbaum.
Talmy, Leonard. 2007a. Attention phenomena. In Dirk Geeraerts & Hubert Cuyckens (eds.), The
Oxford handbook of cognitive linguistics, 264–293. Oxford: OUP.
Talmy, Leonard. 2007b. Recombinance in the evolution of language. Proceedings of the 39th
annual meeting of the Chicago Linguistic Society: The panels. Chicago: Chicago Linguistic
Society. 26–60.
Talmy, Leonard. Forthcoming. The attention system in language. Cambridge, MA: MIT Press.
[draft version from 2010]
Tartakovsky, Roi. 2009. E. E. Cummings’s parentheses: Punctuation as poetic device. Style
43(2). 215–247.
Widdowson, Peter. 1990. W(h)ither English? In Martin Coyle, Peter Garside, Malcolm Kelsall &
John Peck (eds.), Encyclopaedia of literature and criticism, 1221–1236. London: Routledge.
Stella Neumann and Jennifer Fest
Cohesive devices across registers and
varieties: The role of medium in English
Abstract: The present paper aims to analyse varieties of English from a functional as well as a regional perspective, arguing that these two parameters of variation differ but are closely related in the way they influence and shape language.
For that purpose, the six regional varieties of Singapore, Hong Kong, India,
Canada, Jamaica and New Zealand are examined in a corpus-based approach
drawing on the data from the International Corpus of English (ICE). All regional
varieties are represented in the study by the same five registers: academic writing,
administrative writing, broadcast discussions, conversations and exams.
The analysis focuses on the dimension of medium, which is examined in terms of three concrete linguistic markers: the use of pronouns, the use of conjunctions and lexical density. The results clearly show differences along both regional and functional lines, which allow comparative conclusions about the speech communities in question.
1 Introduction
English has a peculiar status amongst the languages found in different parts
of the world. For numerous reasons it developed along diverse lines in various
regions, resulting in a large number of different varieties spoken in almost all
corners of the world (for an overview of 76 varieties including pidgins and creoles
cf. Kortmann and Lunkenheimer 2013). These different regional varieties show
particularities which depend on the socio-cultural background and history of
the respective speech communities and the status English has in that context.
New varieties continue to evolve, arguably because the role of English is still
growing. One interesting question in this context is how to determine whether a
speech community’s use of English can be categorised as a new variety with its
own set of linguistic features (or whether the observed peculiarities are mistakes
and the putative variety is simply learner language). Amongst the criteria that
have been mentioned to determine whether a given use of English has emerged
as a new variety is the development of a distinct set of registers (cf. Mollin 2007).
The notions of functional variation, i.e. register, and regional variation are thus
closely related (cf. Schubert, this volume). It should be noted that throughout this paper the notion of regional variation is used drawing on Halliday’s distinction between variation according to the language user and variation according to language use (cf. Halliday 1978: 183): regional variation in this sense refers to speaker-related variation based on the speaker’s geographical provenance, in contradistinction to functional variation, which captures context-related variation independent of the speakers’ personal background. ‘Regional’ is thus used here in a broader sense than the more specific reference to dialects and related, more local varieties that is common in variational linguistics.
Varieties of English have been described extensively both individually and
comparatively (e.g. Kortmann and Szmrecsanyi 2004; cf. Section 2), and the same
is true for register variation (e.g. Biber 1995; Neumann 2013). What is still largely
missing is a systematic account of register variation across varieties of English, a
notable exception being Xiao (2009; cf. Section 2). Moreover, even though Systemic Functional Linguistics prioritises paradigmatic relations, i.e. the language user’s choice depending on the meaning s/he wants to express and on the context, register-based research across varieties of English is still in its early stages and often focuses on individual linguistic features (cf. e.g. Güldering, this volume; Schaub, this volume).
This study presents a partial analysis of registers across different varieties of English as part of an ongoing research project that aims to take stock of the differences and similarities in terms of register variation across varieties of English.
In the framework of this corpus-based project, we examine six components of the
International Corpus of English (ICE; Nelson, Wallis and Aarts 2002)1 and cur-
rently five of its text categories in order to collect findings for the different regis-
ter parameters drawing on systemic functional register theory (e.g. Halliday and
Hasan 1989). A previous study (Neumann 2012) provided a first set of findings
on one subcategory for each of the three register parameters “field”, “tenor” and
“mode” of discourse respectively, namely experiential domain, social distance
and medium. This study takes up medium again, this time concentrating on cohe-
sion, since the choice and frequency of cohesive devices reflect some interesting
specificities of the spoken versus written medium.
Since linguistic features are polyfunctional, it should not come as a surprise
that a register study re-analyses the same features in the light of different register
parameters. In the case of this study, this is true for two of the three indicators
that will be examined in Section 4, which have also been discussed by Neumann
(2012) in the context of other register parameters.
The remainder of the paper is organised as follows: Section 2 discusses the
relationship between register and variety in more detail, thus motivating the
approach chosen for this study. We will go on to summarise the corpus method-
ology including the operationalisation of medium as well as the concrete quan-
titative measures used in this study in Section 3 before discussing the results of
the corpus analysis in Section 4. The paper closes with some concluding remarks
in Section 5.
ship between the abstract system and concrete instances of language use as
mediated by probabilities of (co-)occurrence of linguistic features (e.g. Nesbitt
and Plum 1988; Halliday and James 1993; cf. also similar ideas in usage-based
accounts of the cognitive family of theories). This entails a skewed distribution
of features across different types of instances. Systemic Functional Linguistics
specifically argues that this probabilistic distribution results in subsystems
which filter the available features depending on the requirements or conventions
of recurring situations (e.g. Matthiessen 1993; cf. from a sociolinguistic point of
view Berruto 2004). The specific constellation of features in a given situational
context is called register. Given this link between register and situation types, it
appears plausible to assume that registers are constrained by the cultural context in which they occur: given the range of variation in terms of cultural contexts across varieties of English, it is unlikely that registers (and the situation types they are linked to) are congruent across varieties.
Sociolinguistics usually focuses on the description of varieties and often does
not emphasise more general claims about language – even though current the-
orising in the cognitive family of theories tends to interact with sociolinguistics
and in particular investigates areas traditionally associated with sociolinguis-
tics (cf., for instance, Kristiansen and Dirven 2008). Descriptive linguistics, on
the other hand, tends to ignore variety-specific features and take an immediate
shortcut to general claims about the language system. A general account of a lan-
guage system organised by register is still not the norm.2 A further stratification
of language according to varieties is even less common, even though it may make
sense to insert variety as an intermediate category between language and register
(Berruto 2004). Significant differences in the type and number of registers as well
as in their linguistic characterisation should affect the general description of the
English language.
On the basis of this line of reasoning, we can conceive of the relationship
between language, variety and register as follows: a language may consist of
several varieties, and an established variety is partly identifiable as such because
it has its own set of registers. More specifically, this means that the particular cul-
tural context of a speech community gives rise to a specific set of situation types
which are linked to specific linguistic registers.
One way of verifying this model and – if shown to be viable – of using it for
systematic descriptions is to analyse corpora covering a broad range of situations
2 Examples of how the register perspective can be integrated in general descriptions of English are the fourth edition of Halliday’s introduction to functional linguistics (Halliday and Matthiessen 2013) and the Longman Grammar of Spoken and Written English (Biber et al. 1999).
Cohesive devices across registers and varieties: The role of medium in English 199
3 This is exactly the problem that Biber’s (1995) comparative corpus design resolves.
200 Stella Neumann and Jennifer Fest
3 Methodology
3.1 Data
As already mentioned in the introduction, the texts that were used for our analysis
were extracted from the International Corpus of English, a comparative corpus of
English worldwide, which contains spoken and written data from a whole range
of varieties in the form of different components. Several additional components
are currently being collected.
This study adopts the approach to the corpus analysis introduced by
Neumann (2012) and thus analyses five different text categories from six ICE com-
ponents. It should be noted that “text category” is a notion of the corpus compil-
ers which is taken here to provide a rough estimate of what could turn out to be
registers. The same is true for the notion of component. Again this is a label for
sub-corpora representing what the compilers identified as a variety of English. In
what follows, we will assume that the text categories in the ICE components can
be roughly equated to registers in varieties. The registers examined are:
AcWrit: Academic writing from the natural sciences (file numbers W2A-021 – W2A-030 of
the original corpus design)
AdWrit: Administrative writing (W2D-001 – W2D-010)
BCDiscs: Broadcast discussions (S1B-021 – S1B-040)
Conv: Conversations (S1A-001 – S1A-030)
Exams: Timed exams (W1A-011 – W1A-020)
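The file ranges listed above can be tallied mechanically, which makes the arithmetic behind the totals easy to check. The sketch below is purely illustrative: it assumes, in line with the ICE file naming, that IDs starting with S denote spoken and IDs starting with W written files, and that each listed range is inclusive.

```python
import re

# Inclusive file ID ranges per text category, as listed above
CATEGORY_RANGES = {
    "AcWrit": ("W2A-021", "W2A-030"),
    "AdWrit": ("W2D-001", "W2D-010"),
    "BCDiscs": ("S1B-021", "S1B-040"),
    "Conv": ("S1A-001", "S1A-030"),
    "Exams": ("W1A-011", "W1A-020"),
}

def file_number(file_id: str) -> int:
    """Return the numeric part of an ICE file ID such as 'S1A-030'."""
    match = re.fullmatch(r"[SW]\d[A-Z]-(\d{3})", file_id)
    if match is None:
        raise ValueError(f"unexpected file ID: {file_id}")
    return int(match.group(1))

def count_files(first: str, last: str) -> int:
    """Number of files in an inclusive ID range."""
    return file_number(last) - file_number(first) + 1

counts = {cat: count_files(*ids) for cat, ids in CATEGORY_RANGES.items()}
# Assumption: the initial letter of the ID marks the medium (S spoken, W written)
spoken = sum(n for cat, n in counts.items() if CATEGORY_RANGES[cat][0].startswith("S"))
written = sum(n for cat, n in counts.items() if CATEGORY_RANGES[cat][0].startswith("W"))
print(counts)
print(spoken, written, spoken + written)
```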
Altogether, the set totals 80 files per component, with 50 files for the two spoken and 30 for the three written categories. This roughly mirrors the design of the ICE collection, which likewise contains more spoken than written data (300 vs. 200 files), although to a slightly lesser degree. Note that the standard ICE design uses a fixed set of 500 files, where the individual file may, depending on the text category, consist of several texts. Usually, the different texts in one file are not identified by individual IDs but are simply marked by the tag <I> in the internal structure of the file. This poses a problem for register studies, where the unit of analysis is the text (not the file). As it is consequently impossible to compute frequencies per text, in Section 4 we pool the frequencies of all files in each text category into a single value.
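Pooling of this kind amounts to summing feature and token counts over all files of a category before dividing. A minimal sketch, with invented counts and a hypothetical `FileCounts` record standing in for whatever the ICE annotation actually yields:

```python
from dataclasses import dataclass

@dataclass
class FileCounts:
    """Feature and token counts for one ICE file (hypothetical record)."""
    file_id: str
    feature_hits: int  # e.g. number of personal pronouns in the file
    tokens: int        # total number of tokens in the file

def pooled_percentage(files: list[FileCounts]) -> float:
    """Pool all files of a text category and return hits per 100 tokens.

    Counts are summed before dividing, so the category receives a single
    value rather than one value per text.
    """
    total_hits = sum(f.feature_hits for f in files)
    total_tokens = sum(f.tokens for f in files)
    if total_tokens == 0:
        raise ValueError("category contains no tokens")
    return 100 * total_hits / total_tokens

# Invented counts for two files of one category
conv_files = [
    FileCounts("S1A-001", feature_hits=180, tokens=2000),
    FileCounts("S1A-002", feature_hits=220, tokens=2000),
]
print(pooled_percentage(conv_files))  # 400 / 4000 * 100 = 10.0
```

Summing before dividing weights each file by its length; averaging per-file percentages instead would weight every file equally. Neither option recovers the per-text values that the ICE file structure hides.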
Both spoken categories are classified as dialogic and unscripted, the only
difference being that conversations are identified as private, whereas broadcast
discussions are marked as public. They are distinguished from broadcast news
and broadcast talks, which are marked as monologic and scripted.
3.2 Operationalising cohesion
Ivan Pavlov, was the first person studying about classical conditioning in 1903. His demon-
stration was about the salivating of dog. He noticed that dogs accustomed to the proce-
dure would start salivating before the meat powder was presented. These are considered
as unconditioned stimulus and unconditioned response and they occur without previous
conditioning. Ivan then used the ringing of a bell as another stimulus to be paired as a pres-
entation with the meat powder. After a number of times, he found that the dogs responded
by salivating to the sound of bell alone. (ICE Hong Kong, text W1A-004, Student Essays)
Personal pronouns serve as a basic form of (personal) reference, i.e. the realisa-
tion of the same or similar referential meaning by different linguistic expressions.
Typically, a full lexical item (including phrases) as an antecedent is taken up in
the ensuing text by a pro-form, especially personal pronouns, articles or demon-
stratives. The excerpt given above includes several such references: Ivan Pavlov,
who is the sole human actor in this paragraph, is referred to using the pronouns
he and his. Furthermore, the cause and effect of his experiments are referred to
as these and they.
In accordance with previous studies and as mentioned above, pronominal
reference is considered a characteristic typical of spoken registers (cf. Halliday
and Hasan 1976; Biber et al. 1999). Lexical cohesion refers to links between text chunks created by repeating a previously used lexical item or by replacing it with a semantically related one. Typical indicators include various types of sense relations; this study, however, examines a summary indicator, namely lexical density, which summarises the overall role of the vocabulary in texts and is said to be higher in written language (cf. Biber et al. 1999: 62; Halliday 2001). In the example from the corpus, there are 43 function words and 49 content words (based on the part-of-speech tagging included in ICE HK), and the 92-word paragraph therefore shows a lexical density of 53.26 (i.e. 49 content words out of 92, expressed as a percentage).
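The lexical density figure follows from the share of content words among all words. The sketch below reproduces the 53.26 reported for the excerpt; the tag set in `CONTENT_TAGS` is a hypothetical stand-in, since which tags count as content words (auxiliaries, for instance) depends on the tagset actually used in ICE HK.

```python
def lexical_density(content_words: int, total_words: int) -> float:
    """Percentage of content (lexical) words among all words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100 * content_words / total_words

# Hypothetical coarse tags treated as content words (not the ICE tagset)
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}

def lexical_density_from_tags(tags: list[str]) -> float:
    """Lexical density of a tagged passage, given one tag per word."""
    content = sum(1 for tag in tags if tag in CONTENT_TAGS)
    return lexical_density(content, len(tags))

# Figures reported for the Pavlov excerpt: 43 function + 49 content words
density = lexical_density(49, 43 + 49)
print(round(density, 2))  # 53.26
```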
Conjunctions represent a third type of cohesive device: they mark the log-
ico-semantic relationships between linguistic units, rather than operating by
3.3 By all means: Measures for the comparison of registers and varieties
varieties and registers, thus describing the respective feature without any restric-
tions. It is a necessary benchmark in order to put the register- and variety-specific
results into perspective. We thus give the magnitude of difference between the
grand mean for the respective feature and the specific value for one register in
one variety by subtracting the specific value from the grand mean across varieties
and registers.
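The comparison with the grand mean can be sketched as follows. Two points are assumptions not stated in the text: the grand mean is taken as the unweighted mean of all variety-register cells, and the difference is signed so that positive values lie above the grand mean, which matches how Section 4 describes values as above or below it. Reversing the subtraction, as the wording above literally describes, only flips the sign. The figures in `toy` are invented.

```python
from statistics import mean

# values[variety][register] = relative frequency (%) of one feature
Table = dict[str, dict[str, float]]

def grand_mean(values: Table) -> float:
    """Unweighted mean over all variety-register cells (assumption)."""
    return mean(v for regs in values.values() for v in regs.values())

def diff_from_grand_mean(values: Table) -> Table:
    """Signed difference per cell; positive means above the grand mean."""
    gm = grand_mean(values)
    return {var: {reg: v - gm for reg, v in regs.items()}
            for var, regs in values.items()}

# Invented figures for two varieties and two registers
toy = {
    "VarA": {"Conv": 12.0, "AcWrit": 2.0},
    "VarB": {"Conv": 10.0, "AcWrit": 4.0},
}
print(grand_mean(toy))  # 7.0
print(diff_from_grand_mean(toy))
```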
In contrast to this register-oriented characterisation, a second cycle analyses the cohesive devices based on their occurrence within each variety, again as the magnitude of difference from the grand mean, but now disregarding register. Neumann (2012: 84) describes the related variety mean, i.e. the arithmetic mean of the relative frequency of a feature within a variety across all registers. The present study, by contrast, compares other descriptive statistics, represented by boxplots (cf. Section 4.2).
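For comparison, the variety mean described by Neumann (2012: 84) can be sketched in the same way. The data layout and the toy figures are hypothetical; the final check illustrates that in a balanced design, where every variety is represented by the same set of registers, the mean of the variety means coincides with the grand mean.

```python
from statistics import mean

def variety_means(values: dict[str, dict[str, float]]) -> dict[str, float]:
    """Arithmetic mean of a feature within each variety across all registers."""
    return {var: mean(regs.values()) for var, regs in values.items()}

def variety_diffs(values: dict[str, dict[str, float]]) -> dict[str, float]:
    """Difference of each variety mean from the grand mean over all cells."""
    gm = mean(v for regs in values.values() for v in regs.values())
    return {var: m - gm for var, m in variety_means(values).items()}

# Invented figures: two varieties, the same three registers each
toy = {
    "VarA": {"Conv": 12.0, "AcWrit": 2.0, "Exams": 4.0},
    "VarB": {"Conv": 9.0, "AcWrit": 3.0, "Exams": 3.0},
}
grand = mean(v for regs in toy.values() for v in regs.values())
print(variety_means(toy))  # VarA 6.0, VarB 5.0
print(variety_diffs(toy))  # VarA 0.5, VarB -0.5
# Balanced design: the mean of the variety means equals the grand mean
print(mean(variety_means(toy).values()) == grand)
```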
The major purpose of the final combinatory step is to compare the range of
variation that register features display in a variety, thus allowing conclusions
about register-specific characteristics. The cohesive features of pronominal ref-
erence, conjunctions and lexical density will be analysed in terms of the range
of variation across registers within one variety. The range values for the three cohesion indicators are then averaged to obtain the mean range of variation, which shows the overall degree of register variation in the variety.
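The range-based step can be sketched as follows, assuming (as the term suggests) that the mean range is the arithmetic mean of the three per-feature ranges. The values are invented, and because a range is unaffected by subtracting a constant, it makes no difference whether raw relative frequencies or differences from the grand mean are used.

```python
from statistics import mean

def register_range(by_register: dict[str, float]) -> float:
    """Range (max - min) of one feature across the registers of one variety."""
    return max(by_register.values()) - min(by_register.values())

def mean_range(features: dict[str, dict[str, float]]) -> float:
    """Average the per-feature ranges to summarise register variation."""
    return mean(register_range(regs) for regs in features.values())

# Invented values for one variety: feature -> register -> value
one_variety = {
    "pronouns": {"AcWrit": -4.0, "Conv": 6.0, "Exams": -1.0},
    "conjunctions": {"AcWrit": -0.5, "Conv": 0.5, "Exams": 0.0},
    "lexical_density": {"AcWrit": 7.0, "Conv": -7.0, "Exams": 2.0},
}
print({f: register_range(r) for f, r in one_variety.items()})
print(mean_range(one_variety))  # (10 + 1 + 14) / 3
```

One consequence of this design is that features measured on a wider scale, such as lexical density here, contribute more to the mean range than features with a narrower spread, such as conjunctions; standardising each feature first would be one way to counter this, though the text does not indicate whether this was done.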
As stated in Section 3.1 above, it is impossible to calculate occurrences per
text given the structure of the components of the International Corpus of English
used for this study. As a consequence, we cannot subject the data set to meaning-
ful mean-based inferential statistics, even though this would allow us to examine
the interaction between register and variety statistically. In Section 4, all results
will therefore be reported in the form of descriptive statistics.
4 Analysis
The register-based variation found in our corpus was, for the most part, in line with previous findings on spoken and written language (e.g. Biber et al. 1999). This section looks at the results for the individual linguistic features, namely pronouns, conjunctions and lexical density, in more detail and examines their distribution within registers.
The relative frequency of pronouns given as the difference from the grand
mean (cf. Section 3.3 for the calculation of the values) in Table 1 is clearly higher
Table 1: Personal pronouns per tokens presented as the difference from the grand mean in
percentage points
These values, always in comparison to the grand mean, are not unexpected. They
do, however, give a first indication of the differences between the registers when
compared across the varieties. Although pronouns are frequent in spoken regis-
ters in general, Canadian English, Jamaican English and Singapore English stand
out in both broadcast discussions and conversations as containing considerably
more instances of pronouns than the other three varieties.
Although these variety-based differences are less distinct in the written cat-
egories, here, too, Jamaican English stands out as deviating most from the grand
mean in all but the timed exams, where Singapore English displays a slightly
clearer divergence from the mean. Canadian English, on the other hand, demon-
strates less distinctive pronoun frequency in the written registers and stands out
as the only variety to contain more pronouns in timed exams than indicated by
the grand mean.4
4 This particularity should be treated with caution. The texts in this category in the Canadian
ICE component show some striking similarities in topic, which suggests a glitch in the corpus
compilation.
Figure 1: Boxplot5 of the percentage of personal pronouns per tokens for each variety in the five
registers
The boxplots in Figure 1 show striking register differences, with academic writing
and administrative writing having a low frequency and broadcast discussions
and conversations displaying high frequencies. Exams display slightly higher
frequencies than the other written registers. Interestingly, the range of variation across varieties is higher in the spoken registers, with negligible variation in academic writing. Writers in this register seem to align with some common convention in the use of pronouns (cf. Section 4.4).
Similarly, interesting observations can be made when looking at the registers
separately. As shown by the diverging extension of the boxes for each text cate-
gory, which indicates the range of variation of the middle 50 % of all observations,
there is much less variation in academic writing and exams across varieties than
in the other text categories (with the mentioned outlier of Canadian English). The
other three registers show a broader range of the usage of this linguistic feature, most strikingly so the spoken registers of broadcast discussions and conversations. In the former case, the difference from the grand mean varies between 1.51 in India and 4.63 in Jamaica, while in the latter the range is from 5.15 in New Zealand to 8.92 in Singapore.
5 Boxplots contain information on the smallest and largest observations in a category that are not outliers (by the whiskers), the interquartile range, i.e. the middle 50 % of all observations (by the box) and the median, i.e. the value separating the higher half of the observations from the lower half (by the line in the box). Outliers, i.e. observations clearly distant from the other observations, are plotted as separate points.
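The boxplot conventions in footnote 5 can be made concrete with a short sketch. It assumes the widespread Tukey convention, in which observations more than 1.5 times the interquartile range beyond the box are treated as outliers and the whiskers stop at the most extreme remaining observations; the plotting software behind Figures 1 and 2 may use a different quartile method or whisker rule. The six values are invented.

```python
from statistics import median, quantiles

def box_summary(values: list[float], whisker: float = 1.5) -> dict:
    """Quantities drawn by a Tukey-style boxplot (assumed convention)."""
    # Quartiles via linear interpolation; other software may differ slightly
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": median(values),   # line in the box
        "q1": q1,                   # lower edge of the box
        "q3": q3,                   # upper edge of the box
        "iqr": iqr,                 # middle 50 % of observations
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Invented per-variety values for one register (six varieties)
values = [1.0, 2.0, 3.0, 3.0, 4.0, 12.0]
print(box_summary(values))
```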
Figure 2: Boxplot of the lexical density for each variety in the five registers
Like the use of pronouns, the lexical density values in the texts meet the expectations for their respective register, as can be seen in Table 2.
Table 2: Lexical density for each register presented as the difference from the grand mean in
percentage points
All varieties show the highest degree of lexical density in academic writing, and there is no clear variation within this register. Administrative writing,
too, yields only values above the grand mean. In this register, however, Hong
Kong English stands out as having a rather low lexical density, especially in com-
parison to the higher value in Jamaican English. The difference between the two
varieties amounts to 2.99 percentage points.
In contrast to academic writing and administrative writing, the last written
category, which contains texts from timed exams, appears surprisingly diverse.
Here, Hong Kong English, Jamaican English and Singapore English lie closest together, ranging only between 2.23 and 2.72 percentage points. New Zealand English deviates less from the grand mean than these three. In contrast, Indian English
shows a value of 6.43, making it the only variety to nearly reach the degree of
lexical density it displays in the register of academic writing and surpassing that
of administrative writing. Indian English shows the highest average of lexical
density in written language, as opposed to Hong Kong English with the lowest.
The spoken registers, on the other hand, only show values of lexical density below the grand mean. The register of conversations yields the lowest values for lexical density in all varieties: Canadian English, Jamaican English and New Zealand English have nearly identical values at the high end of the range, while Hong Kong English, Indian English and Singapore English are grouped around a lower value. In total, however, the range between the
two most distant components is no more than 1.48. In comparison, the register
of broadcast discussions shows slightly more internal variation. Here, Canadian
English and Jamaican English are furthest below the grand mean with values
of 6.90 and 6.84, and Hong Kong English, New Zealand English and Singapore
English cluster between 5.01 and 5.45, while Indian English again stands out with
a rather high lexical density (3.96). This variety seems to rely fairly strongly on
lexical means to create cohesion.
While the results for pronouns and lexical density proved consistent with earlier studies of spoken and written registers (cf. Biber et al. 1999: 65, 333–334), the frequency of conjunctions, the last linguistic feature in our study, yields many unexpected values for the six varieties. Neither spoken
nor written registers appear consistent in displaying values above or below the
grand mean, with the sole exception of the category of broadcast discussions,
which, however, still contains a considerable amount of intra-register variation,
as Table 3 shows.
Table 3: Conjunctions per tokens presented as the difference from the grand mean in
percentage points
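The values in Table 3 are differences from the grand mean in percentage points, as are those for pronouns and lexical density. The chapter does not spell out the arithmetic at this point, so the following Python sketch rests on stated assumptions. First, each cell's relative frequency is taken to be the percentage of tokens that are conjunctions. Second, the grand mean is taken to be the unweighted mean over all variety-register cells. The counts are invented for illustration and do not come from ICE.

```python
from statistics import mean

# Hypothetical counts per (variety, register) cell: (conjunction tokens, all tokens).
# These numbers are invented for illustration and do not come from ICE.
counts = {
    ("CAN", "conversations"): (620, 10_000),
    ("CAN", "academic writing"): (540, 10_000),
    ("SIN", "conversations"): (510, 10_000),
    ("SIN", "academic writing"): (530, 10_000),
}


def percent_per_tokens(feature_count: int, token_count: int) -> float:
    """Relative frequency of a feature as a percentage of all tokens."""
    return 100 * feature_count / token_count


# Relative frequency (percent) for every variety-register cell.
rel_freq = {cell: percent_per_tokens(*c) for cell, c in counts.items()}

# Assumption: the grand mean is the unweighted mean over all cells.
grand_mean = mean(rel_freq.values())

# Difference from the grand mean in percentage points (positive = above average).
deviation = {cell: round(value - grand_mean, 2) for cell, value in rel_freq.items()}

for (variety, register), diff in sorted(deviation.items()):
    print(f"{variety:4} {register:18} {diff:+.2f}")
```

A positive value marks a cell above the grand mean and a negative value a cell below it, which is how the tables are read. Weighting the grand mean by corpus size would be an equally plausible design choice and could shift individual deviations.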
As with pronouns and lexical density, the use of conjunctions in the spoken
registers is particularly pronounced in Jamaican English. But while other varieties,
mainly Canadian and Singapore English, are almost as distinctive as Jamaican English
for the first two features, Jamaican English stands out on its own regarding
conjunctions.
Canadian, Hong Kong, Indian and New Zealand English do not show any
extraordinary values in the spoken registers, varying only slightly between
broadcast discussions and conversations and displaying values in both catego-
ries above the grand mean. Singapore English, in contrast, is the only variety that
yields a frequency of conjunctions clearly below the grand mean in the register
of conversations. This makes broadcast discussions the only one of the five
registers that conforms to what is usually observed in spoken language, namely
an above-average use of conjunctions in comparison to the grand mean and
especially to written texts. Given the setting of broadcast discussions involving
several speakers who all contribute to a particular topic, the register is likely to
have some argumentative character. The frequent use of conjunctions appears
particularly suitable for linking the arguments across clauses.
The written registers in this study each display exceptions, too. In academic
writing, Jamaican English is again the variety which deviates furthest from the
grand mean, while Canadian English is the only variety using more conjunctions
than average, if only marginally. In administrative writing,
it is New Zealand English that deviates clearly; the use of conjunctions in this
variety may reflect some argumentative style in presenting the administrative
contents. In contrast, Singapore English shows the strongest tendency towards
the written medium by using clearly fewer conjunctions in comparison to the
grand mean. The most peculiar values, however, can be found in the register of
timed exams. Indian English and, at some distance, Hong Kong English
Cohesive devices across registers and varieties: The role of medium in English 211
rely much less on conjunctions, both showing a frequency of this cohesive device
below the average given by the grand mean. Singapore English almost equals the
grand mean. New Zealand English, Jamaican English and Canadian English, on
the other hand, contain more conjunctions in exams than average. Although
this makes timed exams the most diverse register, it has to be kept in mind that
exams are written by students. Depending on the age and educational level of
the examinees as well as the topic of the exam, their writing styles will thus
differ due to factors other than their language variety alone.
Figure 3: Boxplot of the percentage of conjunctions per tokens for each variety in the five
registers
While the boxplots for pronouns and lexical density (Figure 1 and Figure 2)
display clear differences between the registers, the differences are much less
pronounced for conjunctions (see Figure 3). Academic writing and administrative
writing have almost identical medians in terms of relative frequency (i.e. the
percentage itself, not its difference from the grand mean); only the range of
variation across varieties is larger in administrative writing. The other three
registers display higher medians.
4.2 Summing up varieties
While the tables and figures above show values per register for all six varieties,
this section sums up the register variation that each variety displays for every
one of the linguistic features. So far, every variety has yielded five values, one
per register, for the use of pronouns, conjunctions and lexical density. Each of
these values captures the register-specific distinctiveness of
this feature by comparing it to the grand mean. The distribution of each linguistic
feature within a variety will be examined with the help of boxplots displaying the
descriptive statistics for each variety for a linguistic feature across text categories.
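Each boxplot summarises how one feature's values for a single variety spread across the five text categories. The following minimal sketch derives the statistics such a boxplot displays. It assumes Python's `statistics.quantiles` with `method="inclusive"` for the quartiles; the software used to draw the figures may compute quartiles differently. The five input values are hypothetical.

```python
from statistics import median, quantiles


def box_summary(values: list[float]) -> dict[str, float]:
    """Descriptive statistics of the kind a boxplot displays.

    Quartiles use method="inclusive" (linear interpolation between data
    points); this is an assumption, and plotting software may differ.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": max(values),
        "range": max(values) - min(values),  # whisker-to-whisker spread
        "iqr": q3 - q1,  # interquartile range, the height of the box
    }


# Hypothetical values for ONE variety across the five text categories
# (differences from the grand mean in percentage points); not ICE data.
one_variety = [-6.9, -5.2, 1.4, 3.0, 6.4]
summary = box_summary(one_variety)
print(summary)
```

The discussion that follows relies mainly on the median, the overall range and the interquartile range.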
As can be seen in Figure 4, Jamaican English shows the widest range of
variation in relative pronoun frequency, closely followed by Singaporean and
Canadian English. The latter is clearly distinct from the other varieties
because it displays the highest median, whereas all other varieties have an almost
identical median. New Zealand and Indian English show the smallest range for
the use of pronouns.
Figure 4: Percentage of pronouns per tokens represented as variation of text categories per
variety
The picture is somewhat more diverse for lexical density (Figure 5). Here,
Canadian English and again Jamaican English display the widest overall ranges, yet
their medians differ greatly: Canadian English, as it did for pronouns, stands out,
this time with the lowest median. Jamaican English shows considerable variation
across text categories, as its interquartile range, the widest of all varieties,
indicates.6 Indian English has the highest median of all varieties, reflecting what
was said from the register perspective in Section 4.1. Hong Kong, Jamaican and
Singaporean English are almost identical in terms of the median, whereas New Zealand
English displays a slightly lower median than these three varieties. This tentatively
suggests a similarity between the two L1 varieties, as Canadian English and New
Zealand English are the two varieties with the lowest median values for lexical
density. These two L1 varieties thus appear to draw slightly less on lexical means to
create cohesion in the registers under investigation.
The results for conjunctions are the most diverse (Figure 6). This is hardly
surprising, as the closer analysis in the previous section already pointed in this
direction, but an overview of the varieties makes it even more obvious. The two L1
varieties, Canadian and New Zealand English, are again clearly similar: both display
narrow ranges and rather high medians of 6.5 % (CAN) and 6.6 % (NZ). As with lexical
density, Jamaican English again has the widest range of conjunction frequencies,
particularly in terms of the interquartile range. Its median is relatively high, at
least in comparison to the other L2 varieties. Among the L2 varieties, Indian English
has a relatively small interquartile range. Singaporean English, despite most values
being clustered around a median of 5.7 %, shows two outliers that deviate considerably
from the median, one below and one above it. The two East Asian varieties (Hong Kong
and Singapore English) are similar in displaying the lowest medians for conjunctions.
Figure 6: Percentage of conjunctions per tokens represented as variation of text categories per
variety
The last step towards a single value representing the register variation within a
variety is to combine the ranges obtained for the individual linguistic features
described in the previous section. For every feature, the absolute range from the
highest to the lowest observation across text categories is calculated. The three
ranges are then added up and their mean is obtained, as shown in Table 4.
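The combination of ranges behind Table 4 can be sketched as follows. The sketch assumes that each variety contributes five values per feature (one per register) and that each range is taken over those five values. The two varieties and all their figures are hypothetical and do not reproduce Table 4.

```python
# Hypothetical register values per variety and feature (five text categories each),
# expressed as differences from the grand mean; invented, not Table 4's figures.
data = {
    "variety A": {
        "pronouns": [4.1, 3.2, -1.5, -2.8, -3.0],
        "lexical density": [-6.0, -5.1, 0.9, 4.2, 5.0],
        "conjunctions": [0.8, 0.3, -0.2, 0.5, -0.6],
    },
    "variety B": {
        "pronouns": [2.0, 1.0, -0.5, -1.0, -1.5],
        "lexical density": [-4.0, -3.0, 1.0, 2.0, 3.0],
        "conjunctions": [0.5, 0.2, -0.1, 0.1, -0.5],
    },
}


def range_summary(features: dict[str, list[float]]) -> tuple[float, float]:
    """Sum and mean of the absolute ranges (max - min) of each feature."""
    ranges = [max(values) - min(values) for values in features.values()]
    total = sum(ranges)
    return total, total / len(ranges)


summaries = {variety: range_summary(feats) for variety, feats in data.items()}

# Varieties ordered from most to least register variation (by mean range).
ranking = sorted(summaries, key=lambda v: summaries[v][1], reverse=True)

for variety in ranking:
    total, avg = summaries[variety]
    print(f"{variety}: sum of ranges = {total:.2f}, mean = {avg:.2f}")
```

With three features, the sum and the mean rank the varieties identically. The mean only becomes the more convenient figure if varieties are compared on differing numbers of features.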
Since this study only analysed three linguistic features, the values for the
varieties do not differ much from each other, yet some tendencies can be observed
that allow tentative conclusions. As the previous discussion already suggested,
Jamaican English displays the highest register variation for the three indicators,
as represented by the sum and mean of ranges, if only slightly so. Canadian
and Singapore English are almost level with each other and close to Jamaican
English. All three display a fair amount of variation in their use of cohesive
devices, which can be traced back mainly to distinct deviations in the spoken
registers. Indian and Hong Kong English, on the other hand, show less variation. In
these varieties, the registers representing spoken and written language are
therefore less distant from each other in terms of cohesive devices. As the more
detailed results show, this pattern originates in both the spoken and the written
parts of the analysis. New Zealand English displays the least variation for the
three indicators across registers.
4.4 Discussion
The analysis in the previous sections gives insight into the distribution and usage
of cohesive devices from two perspectives: variation across registers and regional
variation. While combining these points of view makes the analyses comparable, the
explanatory power of three features alone is limited: this methodology only reaches
its full potential when applied to a broader range of features and registers.
When looking at registers, there is a notable, if unsurprising, difference
between the spoken and written registers. Lexical density and the frequency
of pronouns behave complementarily: whenever a register displays a low value for
one feature across all varieties, it invariably displays a high value for the
other. This confirms the findings from the literature mentioned in Section 3. The
frequency of conjunctions also confirms the distribution described for other
varieties (cf. Biber et al. 1999: 81); the differences between registers are,
however, much less pronounced than for the other two features. The registers can
be organised along a scale of orality, with conversations and broadcast discussions
at one end and academic writing and administrative writing at the other.
Interestingly, timed exams always occupy a middle position. Although written, they
apparently still show a clear influence of the spoken medium as far as cohesive
devices are concerned, which might be traced back to the still incomplete register
repertoire of the students sitting the exams.
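The complementarity of lexical density and pronoun frequency can be expressed as a simple check: in every register, the two deviations from the grand mean should have opposite signs. The following sketch applies that check to hypothetical values for one variety. As one illustrative option, not the authors' procedure, it also orders the registers on a scale of orality by their pronoun deviation.

```python
# Hypothetical differences from the grand mean (percentage points) for one
# variety; invented for illustration, not values reported in this study.
pronouns = {
    "conversations": 5.0,
    "broadcast discussions": 2.1,
    "timed exams": -0.4,
    "administrative writing": -3.2,
    "academic writing": -3.5,
}
lexical_density = {
    "conversations": -6.1,
    "broadcast discussions": -4.0,
    "timed exams": 0.6,
    "administrative writing": 4.3,
    "academic writing": 5.2,
}


def is_complementary(a: dict[str, float], b: dict[str, float]) -> bool:
    """True if, in every register, the two features lie on opposite sides of
    the grand mean (a value of exactly 0 counts as not complementary)."""
    return all(a[register] * b[register] < 0 for register in a)


# One illustrative way to order registers on a scale of orality:
# sort by pronoun deviation, most "oral" (highest) first.
orality_scale = sorted(pronouns, key=pronouns.get, reverse=True)

print(is_complementary(pronouns, lexical_density))
print(orality_scale)
```

With these invented values, timed exams land in the middle of the scale, mirroring the pattern described above. Real data could, of course, order the registers differently.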
However, within these two major groups we can also find particularities and
variation. The register of conversations displays more extreme deviations from
the grand mean than that of broadcast discussions. This might be due to the fact
that broadcast talk is more formal in terms of social distance (cf. below) and, as
regards medium, often prepared to a considerable degree. It is, of course, also
public. Conversations, by contrast, are as spontaneous as can be. When looking
at pronouns, especially, the difference between these two registers is clear and
reflects their respective functions. While conversations might vary with respect
to their goal, the number and identity of the participants can be assumed to be
rather stable. Therefore, using pronominal reference poses no problem. Radio
audiences, however, fluctuate substantially, and pronominal reference might be
lost on listeners who tune in partway through the program. This makes it advisable
for radio moderators to avoid or at least minimise this feature. Furthermore, the
purely phonic channel in which
the radio broadcasts are transmitted requires more reliance on auto- instead of
syn-semantic reference. These functional assumptions are also reflected in the
lexical density, which is higher in broadcast discussions than in conversations.
While the latter contain a considerable number of function words (not least
pronouns) and are also described as featuring more intricate sentence structures
(cf. Halliday 2001), broadcast information has to be designed to be easily
understood while remaining brief enough not to strain the listeners’ attention span.
That conversations and broadcast discussions show similar tendencies but also some
striking differences has already been shown in the study of social distance on the
same data set (cf. Neumann 2012).7 While both registers showed an above-average use
of contractions and interjections, which can certainly be considered typical of
spoken discourse in general, the less spontaneous and, above all, more anonymous
register of broadcast talk showed a higher use of titles. This was especially the
case in the L2 varieties Indian, Jamaican, Singaporean and Hong Kong English. Titles
are certainly a means of creating or maintaining a distance that is rare in
conversations. Their use indicates that participants in broadcast discussions might
not know each other very well, or that they use titles as a piece of information for
the radio audience.
In contrast to the spoken registers, the written categories of academic writing,
administrative writing and exams show less variation. The former two are very
close in their use of cohesive devices, which might be traced back to their high
degree of standardisation and their adherence to norms. Administrative writing, especially, can
be assumed to follow strict guidelines or even use pre-fabricated forms or text
blocks depending on the topic, which arguably draw more on lexical means than
on pronominal reference. Lexical density is above average, which relates to Neu-
mann’s (2012) finding of the register being content-oriented and rather neutral in
social distance. Academic writing even shows slightly more extreme tendencies
regarding cohesive devices; the reasons for this, however, are surely very different.
7 The numbers discussed in this paper are updated from those reported in Neumann (2012). Nevertheless, all tendencies reported there remain unchanged and consequently the interpretation is also maintained.
The register shows very little internal variation when separated according to
the regional variety it comes from, which hints at the international character of
the research community. Furthermore, the texts for this study were taken from the
same general thematic area, namely natural sciences, and thus render the sample
even more homogeneous. This observation is confirmed in the study of social dis-
tance, where academic writing stands out as the most content-oriented register.
In contrast to this register-based perspective, differences between vari-
eties are less clear-cut. Although there are many small divergences among the
six regional varieties, no patterns or groupings emerge that would allow strong
claims about the varieties; rather, individual trends stand out which hint at
particularities in some of the varieties. The most striking of these can be found
in Jamaican English, which displays extreme values for most features as well as
most registers. While other varieties show peaks in the usage of one feature or in
single registers, the frequencies of cohesive devices in Jamaican English are
notably high or low throughout most categories. Singaporean English, too, is set
apart in some respects, most obviously regarding the use of conjunctions. Especially in
administrative writing, conjunctions are by far rarer than in the other varieties,
and it is the only variety which displays a below-average use of conjunctions in
conversations.
Even though there is much less variation across varieties than across registers
(an observation which can hardly be surprising given that we are looking at
varieties of English in contexts where a sufficient amount of functional variation
is required), some interesting observations can be made when comparing the boxplots
displaying the register variation within each variety. Canadian English diverges
from the other varieties in the median for all three indicators. The other L1
variety, New Zealand English, behaves similarly for lexical density and
conjunctions. The L2 varieties display more variation, so the L1 varieties seem to
show a tendency towards more homogeneous patterns than the L2 varieties. Reasons for
this could be found in the exonormative character of these L2 varieties. Because
the standard towards which the language is oriented originates in a very different
part of the world, a certain degree of adaptation is inevitable. The language is made
to fit the needs of the new speech community with regard to societal, regional and
geographical contexts, which can be expected to be mirrored in the registers in use.
Furthermore, as part of this adaptation, L2 varieties often come into more contact
with other languages and might thus display linguistic characteristics resulting from
this interference.
More generally, one might speculate that this coarse patterning into L1 and L2
varieties reflects exactly this: the status of the respective types of varieties with
a long history of (transplanted) mother tongue speakers in the case of Canadian
and New Zealand English on the one hand and indigenized non-native varieties
with reduced exposure to native English on the other hand. At the same time,
the two L1 varieties might also betray more interaction with a standard variety.
However, only a multivariate study of the type reported by Szmrecsanyi and Kort-
mann (2009), one which is based entirely on corpus findings rather than intro-
spective data, can tell whether these assumptions hold true across a wide range
of indicators and varieties.
5 Conclusion
The analysis presented in this paper aimed at determining the degrees of cohesion
within a variety, based on the particularities displayed by different registers.
By examining the distribution of cohesive devices as indicators of medium, the
distinctiveness of individual registers can be observed and compared across
different regional varieties of English. In order to obtain a broader and more
representative overview of register variation within different varieties of
English, more linguistic features representing other register parameters
have to be analysed. At the same time, more varieties would ensure a more even
coverage of the English language, which would be a benefit particularly for the
calculation of the comparative value of the grand mean – not only would it then
represent more varieties and thus become ever more general or ‘grand’, but every
new variety taken into the framework of the study would automatically be drawn
on for comparisons by being included in this value.
Similar considerations of course also apply to the inclusion of more registers.
For both spoken and written language, the ICE components contain many more files
than those of the five registers analysed here, and augmenting the data in this way
would allow more universal statements about spoken and written texts and their
differences. Apart from insights into individual registers, this distinction would
then show most clearly which functions a variety mainly serves in a community by
revealing whether its written or spoken registers have developed a more distinct
character. The present study also showed, however, that
despite the usefulness of the investigation of the interaction between register and
variety, the International Corpus of English, independent of its undisputed value
for other types of varieties-related research questions, may not be the best data
set to investigate this interaction. The recently compiled GloWbE Corpus (Davies
2013), a collection of English web texts from 20 countries, cannot be used as an
alternative because it does not provide the information needed to distinguish reg-
isters. Currently, research is under way (Fest, forthcoming) that will afford a more
detailed and statistically more robust analysis of the interaction between register
and variety in English based on a corpus compiled for this specific purpose.
The combination of functional and regional variation thus still leaves a wide
field to be explored and many questions to be answered. Even with the limited
varieties and registers analysed so far, however, it becomes apparent that reg-
isters function as a very suitable gateway to understanding and describing the
development and status of a variety, determining its particularities and putting it
into perspective among Englishes worldwide at the same time.
References
Berruto, Gaetano. 2004. Sprachvarietät – Sprache (Gesamtsprache, Historische Sprache).
Linguistic Variety – Language (Whole Language, Historical Language). In Ulrich Ammon,
Norbert Dittmar, Klaus J. Mattheier & Peter Trudgill (eds.), Sociolinguistics/Soziolinguistik.
An International Handbook of the Science of Language and Society/Ein Internationales
Handbuch zur Wissenschaft von Sprache und Gesellschaft, 188–195. Berlin, New York: de
Gruyter.
Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison.
Cambridge: CUP.
Biber, Douglas, Geoffrey Leech, Stig Johansson, Susan Conrad & Edward Finegan. 1999.
Longman grammar of spoken and written English. Harlow: Longman.
Davies, Mark. 2013. Corpus of Global Web-Based English: 1.9 billion words from speakers in 20
countries. Available online at http://corpus.byu.edu/glowbe/
Diwersy, Sascha, Stefan Evert & Stella Neumann. 2014. A weakly supervised multivariate
approach to the study of language variation. In Benedikt Szmrecsanyi & Bernhard Wälchli
(eds.), Aggregating dialectology, typology, and register analysis: Linguistic variation in
text and speech, 174–204. Berlin: de Gruyter.
Fest, Jennifer. Forthcoming. “News language in varieties of English: A corpus-based analysis of
newspaper reports.” PhD Thesis, Department of English, American and Romance Studies,
RWTH Aachen University.
Gregory, Michael. 1967. Aspects of varieties differentiation. Journal of Linguistics 3(2). 177–198.
Halliday, Michael A. K. 1978. Language as Social Semiotic: The Social Interpretation of
Language and Meaning. London: Arnold.
Halliday, Michael A. K. 2001. Literacy and linguistics: Relationships between spoken and
written language. In Anne Burns & Caroline Coffin (eds.), Analysing English in a global
context, 181–193. London: Routledge.
Halliday, Michael A. K. & Christian M. I. M. Matthiessen. 2013. Halliday’s introduction to
functional grammar. 4th ed, rev. Abingdon: Routledge.
Halliday, Michael A. K. & Ruqaiya Hasan. 1976. Cohesion in English. London: Longman.
Halliday, Michael A. K. & Ruqaiya Hasan. 1989. Language, context, and text: Aspects of
language in a social-semiotic perspective. Oxford: OUP.
Halliday, Michael A. K. & Zoe L. James. 1993. A quantitative study of polarity and primary tense
in the English finite clause. In John McHardy Sinclair (ed.), Techniques of description:
Spoken and written discourse, 3–35. London: Routledge.
Kortmann, Bernd & Benedikt Szmrecsanyi. 2004. Global synopsis: Morphological and syntactic
variation in English. In Bernd Kortmann, Edgar W. Schneider, Kate Burridge, Rajend
Mesthrie & Clive Upton (eds.), A handbook of varieties of English, 1142–1202. Berlin:
Mouton de Gruyter.
Kortmann, Bernd & Kerstin Lunkenheimer (eds.). 2013. eWAVE – The electronic world atlas of
varieties of English. Leipzig: Max Planck Institute for Evolutionary Anthropology.
http://ewave-atlas.org/. (Accessed on 2014-03-08.)
Kristiansen, Gitte & René Dirven. 2008. Cognitive sociolinguistics: Language variation, cultural
models, social systems. Berlin: de Gruyter.
Matthiessen, Christian M. I. M. 1993. Register in the round: Diversity in a unified theory of
register analysis. In Mohsen Ghadessy (ed.), Register analysis: Theory and practice,
221–292. London: Pinter Publishers.
Mollin, Sandra. 2007. New variety or learner English? Criteria for variety status and the case of
Euro-English. English World-Wide 28(2). 167–185.
Nelson, Gerald. 2006. The core and periphery of World Englishes: A corpus-based exploration.
World Englishes 25(1). 115–129.
Nelson, Gerald, Sean Wallis & Bas Aarts. 2002. Exploring natural language: Working with the
British component of the International Corpus of English. Amsterdam: Benjamins.
Nesbitt, Christopher & Günter Plum. 1988. Probabilities in a systemic grammar: The clause
complex in English. In Robin P. Fawcett & David J. Young (eds.), New Developments in
Systemic Linguistics, 6–33. London: Pinter Publishers.
Neumann, Stella. 2012. Applying register analysis to varieties of English. In Monika Fludernik &
Benjamin Kohlmann (eds.), Anglistentag 2011 Freiburg: Proceedings, 75–94. Trier: WVT.
Neumann, Stella. 2013. Contrastive register variation. A quantitative approach to the
comparison of English and German. Berlin: de Gruyter Mouton.
Rayson, Paul. 2009. Wmatrix: A web-based corpus processing environment. Website. Lancaster.
http://ucrel.lancs.ac.uk/wmatrix/. (Accessed on 2014-03-09.)
Sand, Andrea. 2004. Shared morpho-syntactic features of contact varieties: Article use. World
Englishes 23(2). 281–298.
Sand, Andrea. 2008. Angloversals? Concord and interrogatives in contact varieties of English.
In Terttu Nevalainen, Irma Taavitsainen, Päivi Pahta & Minna Korhonen (eds.), The
dynamics of linguistic variation: Corpus evidence on English past and present, 183–202.
Amsterdam: Benjamins.
Szmrecsanyi, Benedikt & Bernd Kortmann. 2009. The morphosyntax of varieties of English
worldwide: A quantitative perspective. Lingua 119(11). 1643–1663.
Van Rooy, Bertus, Lize Terblanche, Christoph Haase & Joseph Schmied. 2010. Register
differentiation in East African English: A multidimensional study. English World-Wide
31(3). 311–349.
Wong, Deanna, Steve Cassidy & Pam Peters. 2011. Updating the ICE annotation system:
Tagging, parsing and validation. Corpora 6(2). 115–144.
Xiao, Richard. 2009. Multidimensional analysis and the study of World Englishes. World
Englishes 28(4). 421–450.
Section III:
Regional, contrastive and diachronic
register variation
The final section of the present volume broadens the analytical perspective in
order to include further issues that need to be addressed in a comprehensive dis-
cussion of variational text linguistics. While Section I gave a detailed analysis of
selected registers and Section II provided a juxtaposition of registers, Section III
offers a synchronic investigation of regional and contrastive register variation
as well as a diachronic study. The contributions will show that these different
approaches are by no means mutually exclusive but represent different facets of
one common research paradigm.
Both Barbara Güldenring’s paper “Metaphors in New English academic
writing” and Steffen Schaub’s contribution “The influence of register on noun
phrase complexity in varieties of English” deal with international varieties of
global English on the basis of the International Corpus of English, each focusing
on one particular linguistic category. In this way, Güldenring and Schaub con-
tinue Neumann and Fest’s comparative approach that concluded Section II,
although they place more emphasis on variational and sociolinguistic aspects.
While Güldenring discusses the semantic phenomenon of metaphor, Schaub
concentrates on the syntactic structure of the noun phrase. Güldenring deals
with English as a Second Language exclusively in Asia (India, Hong Kong and
Singapore), whereas Schaub includes Englishes in Asia (India, Hong Kong and
Singapore), the Caribbean (Jamaica) and North America (Canada), covering first-
and second-language use. Since it is not feasible to compare all these regional
varieties per se, Güldenring focuses on academic discourse, as previous research
has shown that this register is particularly rich in metaphor. In particular, she
compares metaphors in academic writing from New Englishes with more tradi-
tional varieties of English and examines the occurrence of metaphorical domains
in the sub-registers Humanities, Natural Science and Social Science with the help
of Conceptual Metaphor Theory. By contrast, Schaub takes into account not only
academic writing but also the registers of conversation, unscripted speeches and
social letters. He argues that an investigation of noun phrase complexity based
on modification types sheds new light on both internal register consistency and
regional variability, especially with respect to the situational features of
communicative purpose and production circumstances. Both contributions demonstrate
that such multivariate approaches open up numerous new research possibilities with
each potential parameter shift.
1 Introduction
The field of World Englishes, including research devoted to the study of New
Englishes1, has grown into a prominent linguistic discipline within the last thirty
years. In addition to this body of research, an increasing number of studies
devoted to metaphor in authentic discourse have been exploring the relation-
ship between metaphor and register (Goatly 1997; Cameron 2003; Skorczynska
and Deignan 2006; Semino 2008; Steen et al. 2010; Semino et al. 2013). Steen
1 While “World Englishes” in reference to the academic discipline usually refers to the study of
any kind of English variety worldwide, “New Englishes”, a term attributed to Platt, Weber, and
Ho (1984), denotes those varieties that have grown as a direct consequence of English’s spread
around the world (prominently via British colonialism). These varieties have thus developed as
nativised varieties in areas in which English was not traditionally the native language of the
population, and they fulfil various (institutionalised) societal functions. In a descriptive sense New Englishes are
often characterised by variation on all linguistic levels due to substrate influence.
et al. (2010: 203) have found that the academic register “is characterized by the
highest proportion of metaphor-related words” of the registers they investigated,
including news, fiction and conversation. Corroborating this finding, Krennmayr
(2011: 322) concludes that “news texts contain a larger proportion of metaphor-
ically used words than fiction and conversation but a smaller proportion than
academic texts”. This not only indicates a significant metaphoricity of academic
texts vis-à-vis other registers, but it also marks a departure from older views on
the value of metaphor in the academic realm:
Nevertheless, Black (1954: 294) comes to the conclusion that there is “[n]o doubt
metaphors are dangerous and perhaps especially so in philosophy. But prohibition
against their use would be a wilful and harmful restriction upon our powers of
inquiry”. This last statement in particular reads as a reluctant acknowledgement of
the inescapable presence of metaphor, particularly in academic texts. Nowadays,
after decades of metaphor research, most prominently in the vein of Conceptual
Metaphor Theory (henceforth CMT), a negative stance, such as the one described
above, has been largely dispelled. This is due to the observations that academic
texts display a multitude of metaphors that are closely connected to the phenom-
ena they are used to describe (cf. Römer 2000: 353) and that metaphors, in fact,
constitute a large part of expert (as well as everyday) discourse (cf. Jäkel 1997:
284). Furthermore, this understanding of metaphor has led to important insights
about the function of metaphor in academic discourse, including its role in both the
acquisition and the imparting of knowledge (cf. Drewer 2003; Cameron 2003).
The present pilot study aims to contribute to the growing understanding of
metaphor and register by introducing a varietal perspective on both. It also aims
to contribute to the cognitive approach to World Englishes, which has only recently
developed from the merging of two previously isolated paradigms, cognitive
linguistics and World Englishes research (cf. Wolf and Polzenhagen 2009: 1). With
these goals in mind, I address the following questions on New English metaphor
and academic writing:
1) What kinds of metaphor occur across New English academic texts in the first
place?
2) In view of their functional contributions, how are these metaphors distrib-
uted across the academic sub-registers of Humanities, Natural Science and
Social Science?
The first question will be largely devoted to issues pertaining to metaphor distri-
bution on the basis of several conceptual mappings for the target domain concept
IDEAS in New English academic texts. In addition to this, I will briefly consider to
what extent the varieties under investigation differ in terms of the various entail-
ments or elaborations apparent in shared mappings. Addressing the second ques-
tion will involve scrutiny of the distribution of IDEAS metaphors according to the
academic sub-registers and, consequently, discussion of their functional roles.
However, before delving into these issues, the next section outlines the theoretical
background and analytical framework of the present study.
2 Theoretical Background
In their immensely influential book Metaphors We Live By, Lakoff and Johnson
(2003 [1980]: 3) present the theory of metaphor now widely known as CMT: “metaphor
is pervasive in everyday life, not just in language but in thought and action. Our
ordinary conceptual system, in terms of which we both think and act, is fundamentally
metaphorical in nature”. They continue by asserting that “[t]he essence of metaphor
is understanding and experiencing one kind of thing in terms of another” (Lakoff
and Johnson 2003: 5). Thus, one important claim of CMT is that metaphor has an
experiential basis, as can clearly be seen in the way an abstract concept like IDEAS
is understood in terms of more concrete concepts with which we have more direct
(bodily) experience:
and Johnson for not making room for genre in their theory, because “[c]ertain
metaphors may well be much more prevalent in one kind of writing than another,
in fact, one of the characterising features of a genre is probably the kind of meta-
phor generally to be found therein.” By extending this argument to New English
academic writing from the register perspective, the present study does suggest
that metaphor can be viewed as a characteristic register feature in the sense of
Biber and Conrad (2009):
The register perspective combines an analysis of linguistic characteristics that are common
in a text variety with the analysis of the situation of use of the variety. The underlying
assumption of the register perspective is that core linguistic features like pronouns and
verbs are functional, and, as a result, particular features are commonly used in association
with the communicative purposes and situational context of the texts.
With the aim of extending the notion of “core linguistic features”, this paper takes
the position that metaphors can be viewed as features of the text with which a
communicative function is linked.3 As for their “linguistic” status, metaphors,
owing to their pervasiveness, are readily accessible: they can be located through
their lexico-grammatical, that is, linguistic realisations of underlying conceptual
metaphors. Lakoff and Johnson (2003: 7) maintain that “[s]ince metaphorical
expressions in our language are tied to metaphorical concepts in a systematic way, we
can use metaphorical linguistic expressions to study the nature of metaphorical
concepts”, and this principle has been applied in various methods of metaphor
identification. I will therefore explore metaphors as core functional features that,
while conceptual in nature, provide evidence of their existence via the linguistic
expressions that point to them. This, in turn, allows for a more flexible notion of
linguistic feature or characteristic in the register perspective. Similarly, Wolf and
Polzenhagen (2009: 16–17) make the case for including metaphor in research on
varieties of English, a field otherwise dominated by studies describing variation in
the more traditional structural elements of language, i.e. phonology, morphology
and syntax:
From a CL perspective, the core of the descriptive approach advocates, however, a far too
narrow understanding of ‘form’ and of what counts as ‘linguistic peculiarities.’ This narrow
understanding deliberately excludes important dimensions of variation […] Unaddressed
are also crucial aspects of relations between linguistic units, beyond standard structuralist
formal and semantic parameters. Specifically, little or no attention is paid to the fact that
linguistic material from various domains is systematically linked through metaphoric and
metonymic mappings, which constitutes a key dimension of relatedness.
3 C
orpus data
The data in the present study stems from those components of the International
Corpus of English (henceforth ICE) which are representative of the New English
varieties associated with Hong Kong, India and Singapore. The subcategory of
academic writing was explored for metaphors pertaining to the target domain
IDEAS, since it was assumed that this domain would feature prominently in academic
writing of all kinds. True to the overall design of the ICE project, the academic
writing section of each component under investigation includes ten 2,000-word
published texts from each of the disciplines of Humanities, Natural Science and
Social Science.4
Nelson (1996: 32) provides a description of ICE academic writing in terms of
intended readership and mode of composition:
Printed material is written for a large, unrestricted audience that the writer does not know.
[…] Academic writing reaches a smaller, more well-defined readership [as compared to
newspapers, popular writing or fiction], but the exact individual readership is unknown
to the writer at the time of composition. […] writers of printed works are usually required to
follow the house style of the publisher […] for which they are writing. Printed material may
have been edited by a number of different people, and the final version is often a product
of several earlier revisions. […] Learned writing is produced by specialists for specialists. In
the humanities, for example, it may include journal articles by academic historians written
for other academic historians.
Within Biber and Conrad’s (2009) framework for the analysis of the situational
characteristics of a register, New English academic writing, as represented by ICE,
can be described as most commonly involving single or plural authors addressing
an unenumerated audience. The author-reader relationship is on a professional,
specialist level that is characterised most significantly by shared knowledge.
Furthermore, as Nelson (1996) points out above, ICE academic texts, because of
their printed status, most likely entail highly revised and edited pro-
4 The ICE project also includes texts from the field of Technology.
4 Methodology
For identifying metaphors in ICE academic writing, a corpus-based approach, which
has become increasingly prominent in metaphor research in general, was the logical
choice. Berber Sardinha (2007: 12) summarises the distinct advantages of studying
metaphor with the help of corpora, such as ICE, over intuition-based studies:
corpus-based studies can offer reliable information about the use of metaphors in language.
Another [advantage] is that corpora typically include large amounts of data, which can be
searched to provide information about the frequency of known metaphorical expressions.
Yet another is that genre or register-specific corpora can be explored to indicate metaphors
that are typical of certain fields or subject areas.
young to be in the running for Prime Minister or For months, polls showed the two
main parties neck and neck.
A similar sampling technique is used in Stefanowitsch’s (2006) Metaphor
Pattern Analysis (MPA), which is a corpus-based approach that aims at investigat-
ing metaphorical target domains by pre-determining lexical items that represent
those domains. Yet, in contrast to Deignan (2005), Stefanowitsch (2006: 66) is
more specific about identifying metaphor on the basis of what he calls metaphor-
ical patterns: “A metaphorical pattern is a multi-word expression from a given
source domain (SD) into which one or more specific lexical item [sic] from a given
target domain have been inserted”. Furthermore, metaphorical patterns “do not
merely instantiate general mappings between two semantic domains […]. [T]hey
establish specific paradigmatic relations between target domain lexical items
and the source domain items that would be expected in their place in a non-met-
aphorical use” (Stefanowitsch 2006: 67). This can be illustrated by a linguistic
metaphor such as That idea went out of style years ago, for which Lakoff and
Johnson (2003 [1980]) formulated the conceptual metaphor IDEAS ARE FASHIONS.
From the perspective of Metaphor Pattern Analysis, the metaphorical pattern
establishes a paradigmatic relationship between idea and words denoting clothing,
which would be expected in the same slot, as in That shirt went out of style years
ago. Stefanowitsch (2006: 71) investigates EMOTION metaphors, such as those for
ANGER, by pre-defining a set of lexical items that correspond to this domain, e.g.
anger, fury, rage, wrath, etc., which help to retrieve metaphors such as ANGER IS
AN OPPONENT IN A STRUGGLE (X wrestle with anger, X protect Y from anger, etc.)
(Stefanowitsch 2006: 76).
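The retrieval step of MPA, searching for pre-defined target-domain items and inspecting their contexts, can be sketched as a simple keyword-in-context (KWIC) search. This is a minimal Python illustration under stated assumptions, not Stefanowitsch's own procedure or software: the input is a list of plain sentences, the search set is the four ANGER items named above, and matching is on exact word forms, so inflected forms such as raged are missed. Every hit is only a candidate; whether it instantiates a metaphorical pattern still has to be decided manually.

```python
import re
from typing import Iterable

# Illustrative search set: the four ANGER items named in the text (assumption:
# exact word forms only, no lemmatisation).
TARGET_ITEMS = {"anger", "fury", "rage", "wrath"}


def kwic(sentences: Iterable[str], items: set[str], width: int = 4) -> list[tuple[str, str, str]]:
    """Return (left context, node, right context) for every target-domain item.

    Each hit is a candidate for a metaphorical pattern, to be checked by hand.
    """
    hits = []
    for sentence in sentences:
        tokens = re.findall(r"[A-Za-z']+", sentence)
        for i, token in enumerate(tokens):
            if token.lower() in items:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits


corpus = [
    "He had to wrestle with his anger for years.",
    "She tried to protect the children from his fury.",
    "The storm raged all night.",  # "raged" is not an exact match, so no hit
]
for left, node, right in kwic(corpus, TARGET_ITEMS):
    print(f"{left:>30} [{node}] {right}")
```

Running it yields two hits, anger after wrestle with his and fury after protect the children from his, which are the kinds of contexts from which a mapping such as ANGER IS AN OPPONENT IN A STRUGGLE is inferred.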
The present study recognises the value of such methods and draws on them by using,
in essence, a sampling technique of a similar kind. However,
the particular method used here departs from these types of sampling technique
in two important ways. Firstly, Deignan (2005: 92) claims that “the direction of
investigation in corpus studies is from the linguistic form through to meaning.
It is not possible to use the corpus to proceed in the other direction”. This is cer-
tainly valid from the perspective of an approach that relies on pre-formulated lists
of lexical items or strings of words. However, the present study seeks to challenge
this unidirectional view by taking a cue from Hardie et al. (2007) and attempting to
approximate a meaning-to-form direction. Secondly, the present study
automates the initial step of establishing key or representative lexical items for
investigating a specific target domain by prompting the corpus itself to provide
all lexis related to that domain. In the interest of incorporating both aspects into
the present method, I employed the web-based corpus analysis software Wmatrix
(Rayson 2009).
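The meaning-to-form direction can be approximated by letting a semantic tagger define the search set: instead of listing IDEAS words in advance, one collects every word form that the tagger assigned to the relevant semantic field (the study reports using Wmatrix's X4.1 items for this). The sketch below is hypothetical throughout: it assumes a `word_TAG` token format and invented tag assignments (`OTHER` marks everything outside the field), and it does not reproduce Wmatrix's actual output format or interface.

```python
from collections import Counter

TARGET_TAG = "X4.1"  # semantic field whose items were used to sample IDEAS


def domain_lexis(tagged_tokens: list[str], target: str) -> Counter:
    """Count the word forms tagged with the target semantic field.

    Tokens are assumed to look like "word_TAG"; rpartition splits at the
    last underscore. Forms are lower-cased but not lemmatised, so "idea"
    and "ideas" are counted separately.
    """
    lexis: Counter = Counter()
    for token in tagged_tokens:
        word, _, tag = token.rpartition("_")
        if tag == target:
            lexis[word.lower()] += 1
    return lexis


# Invented sample built from two corpus examples; tags are hypothetical.
sample = (
    "These_OTHER early_OTHER ideas_X4.1 form_OTHER the_OTHER foundation_OTHER "
    "of_OTHER the_OTHER modern_OTHER idea_X4.1 ._OTHER Medical_OTHER "
    "personnel_OTHER should_OTHER keep_OTHER this_OTHER concept_X4.1 "
    "in_OTHER mind_OTHER"
).split()
print(domain_lexis(sample, TARGET_TAG))
```

Because the semantic field, rather than a pre-compiled word list, determines what is retrieved, items the analyst did not anticipate can still surface; the retrieved forms then serve as search nodes for the manual analysis.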
Once all potential IDEAS metaphors were retrieved from the corpus data, these
candidates were manually analysed and marked as being instances of linguistic
metaphors or not. To minimise reliance on analyst intuition, for this step I relied
on MIP (the Metaphor Identification Procedure), initially developed by the
Pragglejaz Group (2007) and further refined as MIPVU (Metaphor Identification
Procedure Vrije Universiteit) (Steen et al. 2010),6 which assumes that “[m]etaphorical
meaning in usage is indirect meaning: it arises out of a contrast between the
contextual meaning of a lexical unit and its more basic meaning, the latter being
absent from the actual context but observable in others.” Berber Sardinha (2012: 22)
classifies this procedure as a census technique. The present study, however, did
not fully adhere to the “census” quality of this technique, because not all of the
ICE academic texts under investigation were read in their entirety. Nevertheless,
deciding on many of the metaphors often required a closer reading of the wider
context, to the extent that a contextual meaning for the metaphor could be
established.
To provide further support for the metaphorical decisions made in this study, I
often also consulted the VU Amsterdam Metaphor Corpus Online, the largest corpus
annotated for metaphorical language according to MIPVU (Steen et al. 2010), in
order to resolve uncertain cases and to benefit from the insight of multiple analysts.
If a linguistic metaphor was identified in the VU Amsterdam Metaphor Corpus Online
for the uncertain case I was investigating, then I also considered it metaphorical.7
For instance, advocates of a theory and advocating a theory were considered
metaphorical because similar formulations were judged to be metaphorical in the
VU Amsterdam Metaphor Corpus Online: Most advocates of biological theories was
identified there as an indirect metaphor (Steen et al. 2010).
6 For details of the individual steps of MIPVU, cf. Pragglejaz Group (2007) and Steen et al. (2010).
7 If it was not found in the VU Amsterdam Metaphor Corpus Online, I discarded the uncertain
case under investigation in order to avoid relying solely on my own intuition.
After establishing which linguistic metaphors were present in the data, a final step
was undertaken to formulate potential conceptual mappings underlying these
metaphors. Of the mappings in Lakoff and Johnson’s (2003) list of IDEAS metaphors,
a few were found in the data, but not all were present.
The data therefore warranted the consideration of other conceptual mappings, and
the following (broadly formulated) source domains have been suggested as a means of
categorising and, thus, quantifying the metaphors encountered in the data:
OBJECTS IN CONTAINERS: Medical personnel […] should keep this concept in mind <ICE-HK:W2A-027#104:1>
CONTAINERS: philosophy does not confine to one particular subject matter <ICE-IND:W2A-001#18:1>
POSSESSIONS: the parties […] would take the view <ICE-HK:W2A-014#52:1>
LANDMARKS: feminists also turn to the phenomenological reflection on the body, especially to the idea […] <ICE-HK:W2A-003#86:1>
TOOLS: countries choose to use environmental issues to spark trade wars <ICE-HK:W2A-011#27:2>
MIRRORS: Through an ideological mirror, individuals are constituted as subjects. <ICE-HK:W2A-002#59:1>
CLOTHS: The common thread linking these two ideals <ICE-HK:W2A-004#41:1>
GOODS: straight thinking, therefore, is at a discount <ICE-IND:W2A-012#57:1>
BUILDINGS: These early ideas […] form the foundation of the modern idea of corporate social responsibility <ICE-SIN:W2A-017#32:1>
PARTS OF BUILDINGS: These early ideas […] form the foundation of the modern idea of corporate social responsibility <ICE-SIN:W2A-017#32:1>
Table 1 (continued)
JOURNEYS: the safe and well trodden areas of basic and general principles and practices <ICE-SIN:W2A-002#5:1>
(VIOLENT) CONFLICTS: the concept […] came to be challenged exceedingly in Supreme Court <ICE-IND:W2A-005#32:1>
GAMES: The succeeding tales also […] play off the themes <ICE-IND:W2A-008#59:1>
COMMUNICATIVE EVENTS: The notion of an individual text develops […] to the historical and cultural dialogue <ICE-HK:W2A-002#110:1>
PRECIOUS METAL: The touchstone, of all ideas, should be not their novelty <ICE-IND:W2A-012#97:1>
LIGHT/LIQUID: Christian theology/philosophy also absorbs the idea of process philosophy <ICE-HK:W2A-005#46:1>
He regards the political perspective of understanding as the absolute horizon of all reading and interpretation <ICE-HK:W2A-002#67:1>
5 Results
The method described above elicited a total of 458 metaphors for the target
domain IDEAS across all varieties (Hong Kong, India and Singapore) and aca-
demic sub-registers (Humanities, Natural Science and Social Science) with the
aid of a total of 1,011 X4.1 lexical items as provided by Wmatrix. An initial
finding is thus that 45.3 % (458 of 1,011) of the words used to talk about IDEAS in
New English academic writing show up in metaphors.
Considering the basic distribution across varieties as well as across sub-registers
of New English academic writing, as illustrated in Table 2, Hong Kong and academic
Humanities emerge as the most metaphorical variety and the most metaphorical
sub-register, respectively, in conceptualising the IDEAS domain.
                           Hong Kong               India                   Singapore
Source domain              Hum    NS    SS Total   Hum    NS    SS Total   Hum    NS    SS Total
ORGANISMS                   70     4     8    82    25     3    13    41    37     2    16    55
OBJECTS                     47     2    14    63    27     4     5    36    21     1    10    32
ARTEFACTS                   16     0     3    19    14     3     3    20    14     1     4    19
STRUCTURES                  16     3     1    20     4     4     0     8    10     0     4    14
EVENTS/ACTIVITIES            3     1     0     4     8     1     3    12    11     0     2    13
MATTER/ENERGY/OTHER
NATURAL PHENOMENA            4     0     3     7     2     0     2     4     1     0     1     2
IMAGES                       4     0     0     4     2     1     0     3     0     0     0     0
Hum = Humanities, NS = Natural Science, SS = Social Science
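As a consistency check on the counts above, the per-category figures can be summed by variety and by sub-register; the grand total should equal the 458 IDEAS metaphors reported, and dividing it by the 1,011 X4.1 items should give the 45.3 % share. A minimal Python sketch, assuming the columns are ordered Humanities, Natural Science, Social Science within each variety, as the PEOPLE figures reported for Humanities (59 of 70, 24 of 25, 30 of 37) indicate:

```python
# (Humanities, Natural Science, Social Science) counts per broad source domain,
# transcribed from the table above.
COUNTS = {
    "Hong Kong": {
        "ORGANISMS": (70, 4, 8), "OBJECTS": (47, 2, 14), "ARTEFACTS": (16, 0, 3),
        "STRUCTURES": (16, 3, 1), "EVENTS/ACTIVITIES": (3, 1, 0),
        "MATTER/ENERGY/OTHER NATURAL PHENOMENA": (4, 0, 3), "IMAGES": (4, 0, 0),
    },
    "India": {
        "ORGANISMS": (25, 3, 13), "OBJECTS": (27, 4, 5), "ARTEFACTS": (14, 3, 3),
        "STRUCTURES": (4, 4, 0), "EVENTS/ACTIVITIES": (8, 1, 3),
        "MATTER/ENERGY/OTHER NATURAL PHENOMENA": (2, 0, 2), "IMAGES": (2, 1, 0),
    },
    "Singapore": {
        "ORGANISMS": (37, 2, 16), "OBJECTS": (21, 1, 10), "ARTEFACTS": (14, 1, 4),
        "STRUCTURES": (10, 0, 4), "EVENTS/ACTIVITIES": (11, 0, 2),
        "MATTER/ENERGY/OTHER NATURAL PHENOMENA": (1, 0, 1), "IMAGES": (0, 0, 0),
    },
}
SUB_REGISTERS = ("Humanities", "Natural Science", "Social Science")
X41_ITEMS = 1011  # X4.1 lexical items retrieved via Wmatrix

# Row sums per variety, column sums per sub-register, and the grand total.
by_variety = {v: sum(sum(row) for row in cats.values()) for v, cats in COUNTS.items()}
by_register = {
    r: sum(cats[d][i] for cats in COUNTS.values() for d in cats)
    for i, r in enumerate(SUB_REGISTERS)
}
total = sum(by_variety.values())
print(by_variety)   # {'Hong Kong': 199, 'India': 124, 'Singapore': 135}
print(by_register)  # {'Humanities': 336, 'Natural Science': 30, 'Social Science': 92}
print(total, f"{total / X41_ITEMS:.1%}")  # 458 45.3%
```

Summed this way, the categories reproduce the reported total of 458, and Hong Kong (199) and the Humanities (336) come out highest, in line with the ranking stated above.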
One tendency that is apparent from the data in Table 3 is the reliance on the
source domain ORGANISMS to conceptualise IDEAS. As outlined above, this
broadly formulated category groups together more specific domains such as
PEOPLE and PLANTS. Of the domains in this category, the most prominent for all
varieties and sub-registers is PEOPLE: the bulk of IDEAS metaphors with the source
domain ORGANISMS make use of the PEOPLE domain. In Hong Kong academic
Humanities the PEOPLE domain is used 84.3 % of the time (59 out of 70), in India
academic Humanities 96 % of the time (24 out of 25) and in Singapore academic
Humanities 81 % of the time (30 out of 37). In academic Natural Science texts, no
matter what variety, PEOPLE is the sole domain involved in the ORGANISMS category.
This type of metaphor clearly involves personification, which is in turn “the most
obvious ontological metaphor” (Lakoff and Johnson 2003: 33). This finding is
consistent with other studies that have pinpointed personification as a
characteristic feature of academic texts, especially of the type “when a non-human
entity (referring to some discourse entity, such as a text) is the subject with a verb
that requires a human agent” (Steen et al. 2010: 108). This type was found in the
data across varieties as well as across sub-registers, as the following examples
briefly illustrate:
(b) “Old stone walls as an ecological habitat for urban trees in Hong Kong” <ICE-HK:W2A-022>
(c) “Patterns of referral to the paediatric specialist clinic of a regional hospital: descriptive study” <ICE-HK:W2A-023>
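The PEOPLE shares reported for the ORGANISMS category can be recomputed from two sets of counts given in this section: the ORGANISMS row of the table of source domains and the per-variety IDEAS ARE PEOPLE counts tabulated further on (Hong Kong 59, 4, 7; India 24, 3, 11; Singapore 30, 2, 12). A minimal Python sketch, assuming that both sets are ordered Humanities, Natural Science, Social Science:

```python
# (Humanities, Natural Science, Social Science) counts per variety.
ORGANISMS = {"Hong Kong": (70, 4, 8), "India": (25, 3, 13), "Singapore": (37, 2, 16)}
PEOPLE = {"Hong Kong": (59, 4, 7), "India": (24, 3, 11), "Singapore": (30, 2, 12)}
SUB_REGISTERS = ("Humanities", "Natural Science", "Social Science")

# Share of ORGANISMS metaphors that draw on the PEOPLE domain.
shares = {}
for variety, totals in ORGANISMS.items():
    for register, people, organisms in zip(SUB_REGISTERS, PEOPLE[variety], totals):
        shares[(variety, register)] = people / organisms
        print(f"{variety:<10} {register:<16} {people:>2}/{organisms:<2} {people / organisms:.1%}")
```

The Humanities rows reproduce the 84.3 %, 96 % and 81 % figures (the last printed as 81.1 %), and every Natural Science row comes out at 100 %, matching the observation that PEOPLE is the sole ORGANISMS domain there.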
Moreover, given this expected topical diversity,8 IDEAS metaphors occur with
varying elaborations from discipline to discipline and also contribute functionally
to academic writing in different ways, as will be demonstrated below.
From the cross-varietal perspective, it is difficult to establish which variety is
the most metaphorical on the basis of data concerning a single conceptual domain.
Additionally, although Hong Kong is clearly characterised by the highest frequency
of IDEAS metaphors, these metaphors still show up in comparable numbers in
Indian and Singaporean academic writing. Therefore, what is of greater interest
here is metaphorical variation beyond frequency. Kövecses (2010: 216) states that
“two languages may share the same conceptual metaphor, but the metaphor is
elaborated differently in the two languages”. For instance, the conceptual
metaphors THE BODY IS A CONTAINER FOR THE EMOTIONS and ANGER IS FIRE
are attested in both Hungarian and English, but in Hungarian the body with fire
inside is often elaborated as a pipe, an elaboration that does not appear to be at
work in conventional English metaphors of this kind (Kövecses 2010: 216). By
extending this notion to
the study of varieties, it is possible to establish variation along the lines of this
kind of elaboration. For instance, IDEAS were conceptualised in Hong Kong aca-
8 In the study of metaphor, we should not underestimate the problematic aspect of topical
diversity, which is related to the design of the ICE corpora. As far as the author of the
present paper is aware, the ICE texts, despite being carefully selected as representative
examples of the text types comprising the general design of the ICE project, were not
selected on the basis of topic similarity. Thus, ICE-based research into metaphor may
run into the problem of the absence of a domain, not because a variety does not make
use of this domain, but because the topics of the texts selected happen not to involve
it. This factor, along with the small size of the ICE components, presents difficulties
in the long run for more extensive research into metaphor variation, for which higher
frequencies for a particular domain may be required. However, in terms of register
research, ICE’s design is still the best option for the comparative study of varieties
and has thus been used in the present study.
6 Discussion
(8) identity as a woman depends on the specific social regulatory ideals by which female
bodies are trained and formed <ICE-HK:W2A-003#26:1>
(9) it is widely accepted that general principles serve to guide moral conduct and decisions
<ICE-HK:W2A-004#114:1>
(10) Ethical behaviour is guided by the ethical ideal of caring and not by principles or rules.
<ICE-HK:W2A-004#125:1>
(11) we are under the guidance of the ethical ideal, that vision of the best self. <ICE-HK:W2A-004#103:1>
(15) a controversial issue could be either a good or bad teacher by affecting learning through
its contents or through its dynamics. <ICE-SIN:W2A-002#6:1>
All in all, these metaphors show that, despite the potential for individual prefer-
ence for certain elaborations, such as IDEAS AS MORAL GUIDES or TEACHERS,
New English varieties, specifically in academic writing, tend to draw from the
same conceptual pool, that is, their metaphors display more conceptual similarities
than differences. In this respect, they are perhaps not so different from varieties
traditionally regarded as more “standard”, such as British or American English, which
would also speak to the strongly conventional nature of the academic register, to
which I now turn.
Variety     Humanities  Natural Science  Social Science
Hong Kong           59                4               7
India               24                3              11
Singapore           30                2              12
Total:             113                9              30
All varieties consistently place Humanities on the more metaphorical side and
Natural Science on the less metaphorical side of the continuum, with Social
Science somewhere in between. This is a general trend for most other categories,9
as Table 5 illustrates for the second most prominent ontological metaphor, IDEAS
ARE OBJECTS.
9 The exceptions are 1) India academic Natural Science and Social Science, which both contain
three IDEAS ARE ARTEFACTS metaphors; 2) Hong Kong and India academic Natural Science, which
have more IDEAS ARE STRUCTURES metaphors than Social Science; and 3) India academic Natural
Science, which has one IDEAS ARE IMAGES metaphor, whereas India Social Science has none.
However, the frequencies involved here are very small and do not necessarily detract from
the general trend.
Variety     Humanities  Natural Science  Social Science
Hong Kong           47                2              14
India               27                4               5
Singapore           21                1              10
Total:              95                7              29
However, in view of the topical diversity sketched out above, the mere fact that
these metaphors can occur in all academic sub-registers (albeit for some in small
numbers) is not necessarily an indication of similarity in the way these meta-
phors are elaborated. Just as we entertained the notion of variety-specific concep-
tualisations, we can consider discipline-specific conceptualisations, or at least
preferences, by examining those metaphors from the OBJECTS category that are
not found in all academic sub-registers.
When IDEAS are conceptualised as OBJECTS in general, there is a stronger tendency,
first and foremost in the Humanities and secondarily in Social Science, to highlight
certain qualities, whereas in Natural Science no specific qualities are attributed to
IDEAS as OBJECTS. To exemplify this, consider the following qualities, which can be
formulated as individual mappings:
Examples (21) to (22) may not illustrate metaphors in the strictest “discipline-spe-
cific” sense due to their occurrence in two separate sub-registers, or they may be
an indication of Social Science containing texts of a more “Humanities” nature
than a “Natural Science” one. Nevertheless, the frequencies of these metaphors
show that Humanities has a preference for them over Social Science, slight in the
case of IDEAS ARE MOVEABLE OBJECTS, which occurs 8 times in the Humanities and 6
times in Social Science, and marked in the case of IDEAS ARE VISIBLE OBJECTS, which
occurs 18 times in the Humanities and only 4 times in Social Science.
A closer look at the latter category reveals a perhaps more suitable candidate
for a discipline-specific elaboration, because IDEAS are not only VISIBLE OBJECTS in
the Humanities, but are also represented as VISIBLE OBJECTS that were previously
hidden from view and, by being revealed, have attained the VISIBLE quality:
This particular elaboration makes up 72.2 % of IDEAS ARE VISIBLE OBJECTS (13
out of 18) in the Humanities texts and perhaps points to a functional role for this
metaphor in this academic sub-register. Humanities texts, often in introductory
sections, typically inform the reader about the history of ideas involved in the
discussion of the topic at hand. For instance, if we consider the wider context of
(23c), taken from a linguistics paper in the India corpus entitled “Pragmatic
Principles and Language”, it becomes clear that IDEAS ARE VISIBLE OBJECTS
(PREVIOUSLY HIDDEN FROM VIEW) functions to locate the paper within these
previous ideas and to accentuate its contribution to them:
(24) Philosophers have found pragmatics to be quite close to what they have called “ordi-
nary language analysis”. They have often used isolated insights about the working of
language in solving philosophical riddles without paying much attention to many of
the underlying pragmatic principles of the language that they are using. As they have
primarily concerned themselves with the theories of meaning, rules, and other related
issues, they were forced to study pragmatics of language incidentally without which
they would not have found it possible to explain, for example, what is “meaning”.
A fuller understanding of pragmatic aspects of the working of language is yet to be
achieved despite numerous attempts by philosophers and linguists. This paper aims
to put a step towards that by highlighting certain pragmetic [sic] principles, some of
which may go otherwise unnoticed. <ICE-IND:W2A-002#23:1-26:1>
is not only determined by the register in which they prominently feature, but also
perhaps by the extent to which a variety is nativised.10
Nevertheless, the present data provides insight into another preference and,
thus, another potential metaphorical register feature, which can be seen in IDEAS
ARE PEOPLE metaphors, particularly those that stretch beyond the sentence
boundary over a larger portion of the text. Consider (25) below, which shows how
metaphors can influence textual structuring, that is, how they contribute to the
cohesion as well as the coherence of a text, more significantly in Humanities than
in Natural Science and Social Science:
(25) It is time for courses to introduce controversial issues in management studies. A con-
troversial issue covers new grounds. It enhances the learning process. It could facil-
itate further the practice of examining, analyzing and deciding skills. However, if not
carefully introduced, controversial issues could generate a disproportionate degree
of confusion, and result in demotivating the students. As such, the introduction of a
controversial issue in the curriculum would have to be properly managed because a
controversial issue could be either a good or bad teacher by affecting learning through
its contents or through its dynamics. <ICE-SIN:W2A-002#6:1-11>
We have encountered this metaphor before as IDEAS ARE TEACHERS (15) and
determined that it is a metaphor specific to the Singapore corpus. However, in
(25) we see that it functions to promote the coherence of the text, because an IDEA
(issue) is portrayed as having all those teacher-like qualities one could expect
when encountering a real teacher: A good teacher covers new grounds (top-
ic-wise), enhances the learning process, facilitates the practice of skills, while
a bad teacher can generate confusion and demotivate students. These qualities
are attributed to IDEAS via the repeated presence of the metaphor IDEAS ARE
TEACHERS, which is then directly stated at the end of the passage, acting as a
summary of sorts.
This metaphor can also be seen as creating cohesion, since almost every
instantiation of IDEAS (issue(s), it) is embedded in the same metaphor throughout
the passage, and all are linked by language pertaining both to helpful attributes of a
teacher (e.g. enhancing learning and facilitating the practice of skills) and to negative
attributes (e.g. generating confusion and demotivating). This is different for Natural
Science and Social Science texts, which do not give such prominence to IDEAS
metaphors and, in doing so, leave little room for them to structure their respective
texts in this manner. Again, from this perspective, it seems to make more sense to
talk about potential metaphorical “register features” than “register markers” (cf.
Schubert, this volume).
10 For extensive discussion of nativisation and of the extent to which a variety, as it
develops, orientates itself towards the English input variety, cf. Schneider’s “Dynamic
Model” (Schneider 2003, 2007). Furthermore, research is currently being completed
by the author of the present paper exploring the relationship between metaphor and
nativisation and, thus, considering to what extent a variety, e.g. Indian English,
behaves metaphorically differently from its traditional input variety, British
English, for certain target domains, e.g. EMOTIONS.
7 Conclusions
The assumption behind the present study is that metaphor is a characteristic and
functional feature of the academic register. Although this study focuses on meta-
phors conceptualising a single domain, it shows that, despite traditional notions
of the metaphorical poverty of this register, academic writing is by no means
void of metaphorical language, which, in turn, indicates the presence of concep-
tual metaphors. In particular, New English academic writing, as represented by
the ICE components under investigation, makes use of conventional metaphors
that can be encountered in academic writing associated with more traditional
varieties of English. This is perhaps the result of the highly revised and edited
production circumstances and international reach of this register, which, taken
together, may discourage more variety-specific conceptualisations in favour of
conventional metaphors intelligible to speakers of all varieties and non-native
speakers alike. Despite this conventionality, it is nevertheless possible to point
out potentially variety-specific conceptualisations by taking a finer-grained look
at how a variety elaborates on a more general metaphor. In fact, it is perhaps on
this level of analysis that metaphorical variation across varieties can be encoun-
tered in general. In order to provide more evidence for this, research on other
domains and with other varieties is required.
From the sub-register perspective, it is possible to pinpoint the most meta-
phorical discipline for a specific domain, e.g. Humanities as most metaphorical
for the IDEAS domain. Nevertheless, if other domains were examined, it could
very well be the case that a completely different academic sub-register emerges
as the most metaphorical. Furthermore, for metaphorical variation across the
disciplines in this study of New English academic writing, at this stage it is pos-
sible to identify potential candidates for metaphorical “register features” rather
than metaphorical “register markers”, because none of the metaphors in the data was
exclusive to one specific academic sub-register, although a preference for certain
metaphors can be determined. This also requires more research, which would
most certainly benefit from the inclusion of other sub-registers or comparison
Metaphors in New English academic writing 247
with metaphorical data from popular texts pertaining to the Humanities, Natural
Sciences and Social Sciences, which the ICE corpora also provide. In terms of
their functional properties, a metaphor conceptualising a certain domain may
exhibit functional features that can only be demonstrated for a particular
sub-register, like signalling a paper’s contribution to a body of research in the
Humanities. However, here again, further research can improve on the study of
metaphorical function by adhering more strictly to a “census” technique, such as
MIPVU, and by relying on texts that are less topically diverse than the ICE
components. Additionally, recent work on metaphorical variation across
varieties11 exploits the advantages of using a significantly larger corpus, like
Davies’ (2013) Corpus of Global Web-Based English (GloWbE), in order to make
more extensive frequency-based claims about variety-specific domain prefer-
ences as well as to contribute to research into web registers (cf. Biber and Egbert,
this volume) from the cross-variety perspective12. All things considered, employ-
ing metaphor as a feature to investigate both variety-based and register variation
has the potential to provide many more insights into the nature of these highly
relevant fields of study.
References
Anthony, Laurence. 2012. AntConc (Version 3.3.5) [Computer Software]. Tokyo, Japan: Waseda
University. http://www.antlab.sci.waseda.ac.jp/
Archer, Dawn, Andrew Wilson & Paul Rayson. 2002. Introduction to the USAS Category System.
http://ucrel.lancs.ac.uk/usas/usas%20guide.pdf (accessed 5 May 2011).
Berber Sardinha, Tony. 2007. Metaphor in corpora: A corpus-driven analysis of Applied
Linguistics dissertations. Rev. Brasileira de Lingüística Aplicada 7(1). 11–35.
Berber Sardinha, Tony. 2012. An assessment of metaphor retrieval methods. In Fiona
MacArthur, José Luis Oncins-Martínez, Manuel Sánchez-García & Ana María Piquer-Píriz
(eds.), Metaphor in use: Context, culture, and communication, 21–50. Amsterdam &
Philadelphia: John Benjamins.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: CUP.
Black, Max. 1954. Metaphor. Proceedings of the Aristotelian Society 55. 273–294.
Cameron, Lynne. 2003. Metaphor in educational discourse. London: Continuum.
Davies, Mark. 2013. Corpus of global web-based English. http://corpus.byu.edu/glowbe/.
11 Cf. Díaz-Vera’s (2015) study on various conceptualisations of LOVE in India, Pakistan and
Nigeria.
12 GloWbE provides an opportunity to efficiently compare 20 distinct varieties of English world-
wide, of which the bulk could be categorised as belonging to the “New Englishes”.
248 Barbara Güldenring
Deignan, Alice. 2005. Metaphor and corpus linguistics. Amsterdam & Philadelphia: John
Benjamins.
Díaz-Vera, Javier E. 2015. Love in the time of corpora. Preferential conceptualizations of love in
world Englishes. In Vito Pirrelli, Claudia Marzi & Marcello Ferro (eds.), Word structure and
word usage. Proceedings of the NetWordS final conference, 161–165.
http://ceur-ws.org/Vol-1347/paper37.pdf (accessed 13 May 2015).
Drewer, Petra. 2003. Die kognitive Metapher als Werkzeug des Denkens. Zur Rolle der Analogie
bei der Gewinnung und Vermittlung wissenschaftlicher Erkenntnisse. Tübingen: Narr.
Goatly, Andrew. 1997. The language of metaphors. London & New York: Routledge.
Hardie, Andrew, Veronika Koller, Paul Rayson & Elena Semino. 2007. Exploring a semantic
annotation tool for metaphor analysis. In Matthew Davies, Paul Rayson, Susan Hunston &
Pernilla Danielsson (eds.), Proceedings of the Corpus Linguistics 2007 Conference, 1–12.
http://corpus.bham.ac.uk/corplingproceedings07/paper/49_Paper.pdf (accessed 19 August 2011).
Jäkel, Olaf. 1997. Metaphern in abstrakten Diskurs-Domänen. Eine kognitiv-linguistische
Untersuchung anhand der Bereiche Geistestätigkeit, Wirtschaft und Wissenschaft.
Frankfurt am Main: Peter Lang.
Kövecses, Zoltán. 2010. Metaphor: A practical introduction, 2nd edn. Oxford: OUP.
Krennmayr, Tina. 2011. Metaphor in newspapers. Utrecht: LOT.
Lakoff, George & Mark Johnson. 2003 [1980]. Metaphors we live by, 2nd edn. Chicago & London:
Chicago UP.
Nelson, Gerald. 1996. The design of the corpus. In Sidney Greenbaum (ed.), Comparing English
worldwide: The International Corpus of English, 27–35. Oxford: Clarendon.
Partington, Alan. 1998. Patterns and meanings: Using corpora for English language research
and teaching. Amsterdam & Philadelphia: John Benjamins.
Platt, John, Heidi Weber & Ho Mian Lian. 1984. The New Englishes. London: Routledge.
Pragglejaz Group. 2007. A practical and flexible method for identifying metaphorically-used
words in discourse. Metaphor and Symbol 22(1). 1–39.
Rayson, Paul. 2009. Wmatrix: A web-based corpus processing environment. Computing
Department, Lancaster University. http://ucrel.lancs.ac.uk/wmatrix/
Römer, Christine. 2000. Metaphern in der Wissenschaftssprache: Bildfelder der
sprachwissenschaftlichen Fachkommunikation. In Josef Bayer & Christine Römer (eds.),
Von der Philologie zur Grammatiktheorie, 353–365. Tübingen: Max Niemeyer.
Schneider, Edgar W. 2003. The dynamics of New Englishes: From identity construction to dialect
birth. Language 79(2). 233–281.
Schneider, Edgar W. 2007. Postcolonial English: Varieties around the world. Cambridge: CUP.
Semino, Elena. 2008. Metaphor in discourse. Cambridge: CUP.
Semino, Elena, Alice Deignan & Jeannette Littlemore. 2013. Metaphor, genre, and
recontextualization. Metaphor and Symbol 28(1). 41–59.
Skorczynska, Hanna & Alice Deignan. 2006. Readership and purpose in the choice of
economics metaphors. Metaphor and Symbol 21(2). 87–104.
Steen, Gerard J., Aletta G. Dorst, J. Berenike Herrmann, Anna Kaal, Tina Krennmayr & Trijntje
Pasma. 2010. A method for linguistic metaphor identification. From MIP to MIPVU.
Amsterdam & Philadelphia: John Benjamins.
Stefanowitsch, Anatol. 2006. Words and their metaphors: A corpus-based approach. In Anatol
Stefanowitsch & Stefan Th. Gries (eds.), Corpus-based approaches to metaphor and
metonymy, 63–106. Berlin & New York: Mouton de Gruyter.
Wolf, Hans-Georg & Frank Polzenhagen. 2009. World Englishes: A cognitive sociolinguistic
approach. Berlin & New York: Mouton de Gruyter.
Zichler, Csilla. 2010. Metaphern in der Wissenschaftssprache. Sprachtheorie und
germanistische Linguistik 20(1). 95–112.
Steffen Schaub
The influence of register on noun phrase
complexity in varieties of English
Abstract: This study explores noun phrase (NP) complexity variation in registers
of regional varieties of English. The focus is on the description of NP complex-
ity in four registers (academic writing, conversation, unscripted speeches and
social letters) across five regional varieties of English (Canada, Hong Kong, India,
Jamaica, Singapore). To this end, noun phrases are extracted from a register-stratified
subsample of the International Corpus of English and annotated for NP complexity
based on a four-way categorisation system: i) unmodified, ii) premodified
only, iii) postmodified only, iv) pre- and postmodified. The results corroborate the
strong influence of register on NP complexity, depending on two situational char-
acteristics: communicative purpose (informational vs. interactional) and mode
(written vs. spoken). Finally, it is assessed whether NP complexity is a viable
marker of regional variation in comparative varieties research.
1 Introduction
This study explores noun phrase (NP) complexity variation in registers of regional
varieties of English. There are three motivations for pursuing this particular
research topic: the lack of descriptive work on the noun phrase in varieties of
English, a growing interest in register variation in English varieties research and
awareness of the strong influence of register on NP structure. These motivations
are discussed in more detail in the following.
Descriptive work on the regional varieties of English has developed a focus
on comparison. With the emergence of comparable linguistic corpora, such as the
International Corpus of English (ICE), linguists have compared individual varieties
against a normative ‘yardstick’ (usually British English) or against each other.
Most of the attention has been devoted to phonology, lexis and morphosyntax.
Interest in the latter was mainly guided by investigations of ‘non-standard’ fea-
tures, i.e. features reported to occur in Englishes around the world that do not
occur in the norm-providing standard varieties. The task is to re-evaluate early
feature reports based on anecdotal observation (e.g. Platt, Weber and Ho 1984)
and to confirm their validity using empirical means. With regard to the noun
phrase across regional varieties of English, three ‘non-standard’ features are
frequently mentioned in surveys and grammatical descriptions: noun pluralisa-
tion (Platt, Weber and Ho 1984; Ahulu 1998; Hall, Schmidtke and Vickers 2013),
use of the article system (Sand 2004; Lamidi 2007; Wahid 2013; Sand forthc.),
and subject-verb concord (Asante 1995; Ahulu 1998; Blair and Collins 2001; Sand
forthc.). Other, less frequently reported phenomena include variation in the
pronoun system (Lamidi 2007; Kortmann and Lunkenheimer 2013), the expres-
sion of possession (Kortmann and Lunkenheimer 2013) and adjective comparison
(Kortmann and Lunkenheimer 2013).
More recently, interest in the noun phrase across varieties of English has
moved beyond the investigation of isolated morphosyntactic features. Brunner
(2014) introduces NP modification patterns as a marker of regional variation
across varieties of English. He compares NP structures in British, Kenyan and
Singapore English and finds that “[i]n Singapore English, premodified NPs are
significantly overrepresented [while] in Kenyan English, postmodifiers are more
frequent than premodifiers” (Brunner 2014: 44). He attributes these preferences
to contact influence from the indigenous languages of the respective areas, based
on their typological profiles (head-final vs. head-initial word order). These find-
ings are drawn from the register of spontaneous spoken conversation, which is
“arguably the least stylized and can therefore be expected to be susceptible to
contact-induced language change” (Brunner 2014: 30). In order to substantiate
the claim that preferences in NP modification are the result of language contact,
it is necessary to study more registers to see if these tendencies can be confirmed.
The notion of ‘register’ is a relatively recent addition to research into vari-
eties of English. Register is defined here, in accordance with Biber and Conrad
(2009: 6), as “a variety associated with a particular situation of use (including
particular communicative purposes)”. So far, English varieties have mainly been
handled as homogeneous entities conveniently defined by the borders of political
nation-states rather than linguistic criteria, but this is not due to a lack of aware-
ness. Already in early reports we find observations that take register variation
into consideration. Platt, Weber and Ho (1984: 49), for instance, frequently dif-
ferentiate between written and spoken as well as formal and colloquial language
when discussing individual features, e.g.: “It is common in some New Englishes
to mark the plural of the noun more often in writing and in more formal speech.
There would be less marking in colloquial speech”. Nevertheless, for much of
World Englishes research, the nation-state variety remained the preferred level of
comparison. Macro-scale projects such as the Electronic World Atlas of Varieties
The influence of register on noun phrase complexity in varieties of English 253
just as traditionally recognized ‘native’ varieties of English are recognized for the variation
within them, so too, should the emerging new varieties. The ‘native’ varieties of English are
recognized for the differences within them stemming from region, social status, and reason
for use or register […] to name just a few variables. […] Any study of a new variety of English,
then, should focus on identifying the variation within it, (and not just on describing a set
of features that characterize the national variety), and provide detailed descriptions of the
national variety […].
Xiao (2009) explores variation across twelve registers and five varieties using the
multidimensional analysis (MDA) approach developed by Biber (1988). The study
encompasses 141 grammatical and semantic features. Xiao concludes that “var-
iations in language use involve regional varieties as well as variants in different
registers and along different dimensions” (Xiao 2009: 447). In sum, register dif-
ferences are increasingly addressed in English varieties research, and it becomes
clear that the influence of register on the overall structural variation of regional
English varieties must be taken into account.
The connection between register and NP complexity has been demonstrated
repeatedly. Aarts (1971) analyses NP complexity across four different text types
and concludes that NP complexity correlates with syntactic function: While the
subject slot prefers ‘light’ noun phrases, the object slot prefers ‘heavy’ ones. In
1 The eWAVE database covers 76 mostly national varieties of English, including, however, a
number of localised dialectal varieties, for instance East Anglian English or Appalachian English
(Kortmann and Lunkenheimer 2013).
254 Steffen Schaub
addition, Aarts found a tendency for heavy noun phrases to be much less fre-
quent in spoken than in written texts. The latter point is taken up by de Haan
(1993), who confirms Aarts’ (1971) hunch about the relation between NP complex-
ity and text type. De Haan (1993) further investigates the combined influence of
text type and syntactic function on NP complexity, and finds that, in some cases,
the two reinforce each other, while in other cases they cancel each other out.
Halliday (1989) argues that spoken language is no less complex than written lan-
guage, but that the complexity is located differently. While spoken language has
a more elaborate clausal structure, in written language, the complexity lies in
the constituents below the clausal level, foremost in what he calls the nominal
group. Nominals, in writing, carry “the meat of the message” (Halliday 1989: 72).
Schäpers (2009), using a corpus of spoken and written British English, confirms
that “[n]oun phrases are more complex in written language with regard to pre-
modification, postmodification, and both pre- and postmodification” (2009: 153).
On the level of registers, Biber et al. (1999) find that almost 60 % of noun phrases
in academic prose have a modifier, while only 15 % of noun phrases in conversa-
tion are modified (Biber et al. 1999: 578). In general, academic prose is character-
ised by a more frequent use of nouns than conversation (Biber and Conrad 2009:
116–117). The linguistic differences between these two registers, Biber and Conrad
argue, can be explained on the basis of their different situational characteristics:
while the purpose of conversation is to develop personal relationships, academic
prose focuses on communicating information (Biber and Conrad 2009: 109). To
sum up, the strong connection between NP complexity and register has been con-
firmed in various studies of British and American English.
The present study combines the three interconnected research interests
outlined above. NP complexity is systematically compared across five varieties
of English (Canadian English, Indian English, Jamaican English, Hong Kong
English and Singapore English) and four registers (academic writing, conversa-
tion, unscripted speeches and social letters). The regional varieties reflect diverse
sociocultural and linguistic backgrounds. The registers were selected as counter-
parts based on two situational characteristics, namely mode (spoken vs. written)
and communicative purpose (information vs. interaction).2
2 Although the texts are meant to represent the extremes of these two situational characteristics,
a strict line cannot be drawn. For example, social letters may also be used to inform, as in
work-related exchanges between colleagues. Likewise, unscripted speeches contain
interactional elements, as will be evident from the discussion of personal pronouns below.
2 Methodology
The present section describes the data and the annotation process used in the
following analysis. Section 2.1 discusses various categorisation systems used to
mark NP complexity and introduces the system used in the analysis to follow.
Section 2.2 describes the corpus data and the annotation process.
complex distinction referred to above. The former is identical with class 1, while
the latter comprises classes 2–4. The system is summarised in Table 2.
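To make the decision logic of the four-way system explicit, it can be stated as a rule over three yes/no judgements per NP. What follows is only a sketch under stated assumptions. The `NounPhrase` record, its field names and the `complexity_class` function are hypothetical and are not the author's annotation scheme, which was applied manually in a spreadsheet. The rule follows the class definitions given with Table 3 and in footnote 6, where coordinated multi-head NPs are assigned to class 4. It also assumes that determiners do not count as premodifiers, which fits the treatment of class-2 NP length in Section 3.2 ("including head and any determiners").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NounPhrase:
    """Hypothetical annotation record for one NP; field names are illustrative."""
    text: str
    premodified: bool         # modifier before the head (adjective, noun, ...); determiners excluded
    postmodified: bool        # modifier after the head (e.g. prepositional phrase, relative clause)
    multi_head: bool = False  # coordinated construction with more than one head


def complexity_class(np_: NounPhrase) -> int:
    """Map an annotated NP to complexity class 1-4 (hypothetical helper)."""
    if np_.multi_head:
        return 4  # coordinated multi-head NPs are grouped with class 4
    if np_.premodified and np_.postmodified:
        return 4  # pre- and postmodified
    if np_.postmodified:
        return 3  # postmodified only
    if np_.premodified:
        return 2  # premodified only
    return 1      # unmodified, including pronouns


examples = [
    NounPhrase("your voice", premodified=False, postmodified=False),
    NounPhrase("a pleasant surprise", premodified=True, postmodified=False),
    NounPhrase("the other end of the line", premodified=True, postmodified=True),
    NounPhrase("exchange of technocrats", premodified=False, postmodified=True),
]
for example in examples:
    print(example.text, "->", complexity_class(example))
```

The example NPs are taken from the chapter's own illustrations. The yes/no judgements attached to them are the kind of decisions a human annotator makes; they are not produced automatically.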
Table 2: Categorisation system for NP complexity (based on de Haan 1993); the (+) symbol
indicates possible multiple instances
(1a) Was a pleasant surprise to hear your voice again from the other end of the line.
(1b) Was [NP a pleasant surprise] to hear [NP your voice] again from [NP the other end of
the line].
a sample of 400 NPs for each register–variety combination was extracted ran-
domly, adding up to a total of 8,000 NPs. In the second step, the extracted noun
phrases were annotated in a spreadsheet: the annotation includes the variables
complexity, based on the four-way categorisation system outlined in Section 2.1,
as well as variety, register and length (in orthographic words).
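The sampling step above is a stratified random draw: 400 NPs per cell, 20 register–variety cells, 8,000 NPs in total. It can be sketched as follows. The chapter does not describe how the NPs were first extracted from ICE, so that step is left out. The function name `stratified_sample`, the dictionary layout, the seed and the toy input are all hypothetical. Word length is approximated by splitting on whitespace, and the complexity column is left empty for manual annotation.

```python
import random
from itertools import product

VARIETIES = ["Canada", "Hong Kong", "India", "Jamaica", "Singapore"]
REGISTERS = ["academic writing", "conversation", "unscripted speeches", "social letters"]
SAMPLE_SIZE = 400  # NPs drawn per register-variety combination


def stratified_sample(nps_by_cell, k=SAMPLE_SIZE, seed=2016):
    """Hypothetical helper: draw k NPs at random (without replacement) per cell.

    nps_by_cell maps (variety, register) to a list of extracted NP strings.
    Returns one spreadsheet-style row per sampled NP.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    rows = []
    for variety, register in product(VARIETIES, REGISTERS):
        for np_text in rng.sample(nps_by_cell[(variety, register)], k):
            rows.append({
                "variety": variety,
                "register": register,
                "np": np_text,
                "length": len(np_text.split()),  # length in orthographic words
                "complexity": None,  # class 1-4, filled in during manual annotation
            })
    return rows


# Toy input: 1,000 dummy NP strings per cell stand in for the real extraction.
toy = {(v, r): [f"NP {i} from {v} {r}" for i in range(1000)]
       for v, r in product(VARIETIES, REGISTERS)}
sample = stratified_sample(toy)
print(len(sample))  # 8000
```

Fixing the seed is a design choice that makes the draw reproducible. Without it, a rerun would yield a different 8,000-NP sample.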
3 Results
Table 3 shows the frequencies of the four complexity classes across the four reg-
isters for all five varieties combined. In general, simple NPs without modification
(class 1) are most frequent overall (5,084 tokens or 64 %). Complex NPs (classes
2 to 4) are considerably less frequent: NPs with premodification only (13 %) and
postmodification only (14 %) are roughly equally frequent, while NPs with both pre- and
postmodification are the least frequent class (9 %).
Table 3: NP complexity across registers (class 1 = unmodified NPs incl. pronouns; class 2 =
premodified NPs; class 3 = postmodified NPs; class 4 = pre- and postmodified NPs and coordi-
nated multi-head NPs)
          Conversation     Unscripted speeches  Social letters   Academic writing  Total
Class 1   1,559 (77.95 %)  1,291 (64.55 %)      1,402 (70.10 %)  832 (41.60 %)     5,084 (63.55 %)
Class 2   209 (10.45 %)    249 (12.45 %)        249 (12.45 %)    334 (16.70 %)     1,041 (13.01 %)
Class 3   159 (7.95 %)     300 (15.00 %)        209 (10.45 %)    466 (23.30 %)     1,134 (14.18 %)
Class 4   73 (3.65 %)      160 (8.00 %)         140 (7.00 %)     368 (18.40 %)     741 (9.26 %)
Total     2,000 (100 %)    2,000 (100 %)        2,000 (100 %)    2,000 (100 %)     8,000 (100 %)
The frequencies of the four classes vary with regard to register: simple NPs (class
1) are frequent in conversation (78 %), unscripted speeches (65 %) and social
letters (70 %), but relatively infrequent in academic writing (42 %). Analogously,
complex NPs (classes 2–4) are relatively infrequent in conversation (22 %) and
highly frequent in academic writing (58 %). Taking into consideration the two
situational characteristics of the registers as defined in the introduction (mode
and communicative purpose), NP complexity increases from spoken to written
mode: social letters have a higher mean NP complexity4 than conversation (1.54
compared to 1.37), and academic writing has a higher mean NP complexity than
unscripted speeches (2.18 compared to 1.66). In addition, NP complexity increases
from interactional to informational communicative purpose: unscripted speeches
have a higher mean NP complexity than conversation (1.66 compared to 1.37),
while academic writing has a higher mean NP complexity than social letters (2.18
compared to 1.54).
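The mean values above follow directly from the class counts in Table 3 and the definition in footnote 4: the sum of the class values of n NPs, divided by n. The sketch below recomputes them from the published counts, together with the share of complex NPs (classes 2–4). The exact means are 1.373 (conversation), 1.6645 (unscripted speeches), 1.5435 (social letters) and 2.185 (academic writing), reported above to two decimal places. The dictionary and function names are illustrative.

```python
# Class counts (classes 1, 2, 3, 4) per register, copied from Table 3.
TABLE_3 = {
    "conversation": [1559, 209, 159, 73],
    "unscripted speeches": [1291, 249, 300, 160],
    "social letters": [1402, 249, 209, 140],
    "academic writing": [832, 334, 466, 368],
}


def mean_complexity(counts):
    """Sum of the class values of all n NPs divided by n (cf. footnote 4)."""
    n = sum(counts)
    return sum(value * count for value, count in enumerate(counts, start=1)) / n


def complex_share(counts):
    """Proportion of complex NPs, i.e. classes 2-4."""
    return sum(counts[1:]) / sum(counts)


for register, counts in TABLE_3.items():
    print(f"{register:20} mean {mean_complexity(counts):.4f}  "
          f"complex {complex_share(counts):.2%}")
```

Comparing the means in pairs reproduces the two contrasts discussed above. Mode (spoken vs. written): social letters > conversation and academic writing > unscripted speeches. Communicative purpose (interactional vs. informational): unscripted speeches > conversation and academic writing > social letters.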
Table 4 shows the distribution of complexity classes across the varieties for all
registers combined. Class 1 is the most frequent and class 4 is the least frequent
in all varieties (with a relatively low value in Singapore English). Looking at
classes 2 and 3, the frequencies are differently balanced across varieties: while
most varieties have a higher frequency of class 3, Hong Kong English shows a
tendency towards class 2. Furthermore, the frequencies of classes 2 and 3 are rel-
atively balanced in some varieties (Indian English, Canadian English, Singapore
English), while in others there is greater divergence (Jamaican English, Hong
Kong English).
Both Table 3 and Table 4 provide a general overview of NP complexity distri-
butions across register and variety. They allow the formulation of first tentative
conclusions, such as variety-specific tendencies towards particular classes (e.g.
pre- or postmodified NPs). As a second step, it is necessary to look at the distribu-
tion of NP classes across both varieties and registers simultaneously.
4 Mean NP complexity is defined here as a numeric value ranging from 1.0 to 4.0. It is the sum of
complexity values of n noun phrases divided by n. The higher the mean value, the more frequent-
ly we find ‘complex’ noun phrases, i.e. classes 2–4.
Tables 5 to 8 show the distribution of the complexity classes for each individual
register across all varieties. In the following sections, the registers are discussed
separately.
3.1 Academic writing
Academic writing yields the highest frequency of complex NPs across all classes
(2–4). This is expected, as academic writing is characterised by dense informa-
tion packaging (due to its informational communicative purpose) and carefully
planned and revised production, both of which facilitate the use of complex NPs.
In academic writing, NPs often contain elaborate pre- and postmodification and
typically carry most of the lexical content of a sentence. Examples (2) to (6)
illustrate typical uses of noun phrases in academic writing (NPs are emphasised
in bold).
(2) The left side of Ayearst’s diptych reproduces in painstaking detail, and with close
attention to seventeenth-century techniques of glazing, Rembrandt’s frag-
mentary Anatomy Lesson of Dr Joan Deijman of 1656, now in the Rijksmuseum,
Amsterdam. (ICE-CAN:W2A-001#10:1)
(3) The integration of these two perspectives can form a more comprehensive picture
of the person of Jesus Christ. (ICE-HK:W2A-005#14:1)
(4) The whole misunderstanding about Hume’s philosophical position is the
outcome of his treatment of causation that is often misunderstood. (ICE-IND:W2A-001#58:1)
(5) The casual centrality of the ‘supernatural’ in Brodber’s fiction is also an excellent
example of the writer’s adaptation of marginalised thematic concepts from the
oral tradition which she legitimises in the very process of ‘writing them up’. (ICE-
JA:W2A-005#X14:1)
(6) Though Wittgenstein was mainly concerned with the problem of philosophical
explanation, his writings on the relation between language and thought and
language and meaning have tremendous implications for both the theory and
practice of linguistic science. (ICE-SIN:W2A-005#48:1)
Analogously, academic writing has the lowest frequency of class-1 (or ‘simple’)
NPs in our sample (832 tokens or 41.6 %). The relatively low frequency of unmodi-
fied noun phrases can likewise be accounted for by the informational character of
the register: unmodified noun phrases carry less information than modified ones.
Personal pronouns are particularly uncommon: only 225 tokens (11 % of all NPs
in academic writing) are realised by personal pronouns, the most frequent being
it (61 tokens) and I (33 tokens). 1st and 2nd person pronouns are rare, which can
be attributed to the fact that interaction in academic texts is uncommon. The 2nd
person pronoun you is particularly rare, since no specific addressee is involved.
With regard to regional variation, I find that academic writing is largely
homogeneous across varieties. Few differences appear to exist with regard to
pronouns, although two exceptions are worth a brief discussion here. The first
person singular pronoun I occurs more frequently in some varieties (Hong Kong
English: 15; Jamaican English: 10) than in others (Canadian English: 2; Indian
English: 4; Singapore English: 2). However, it would be premature to attribute
a more personal writing style to the Hong Kong and Jamaican English varieties
based on such low absolute frequencies. Secondly, looking at the frequencies
of you, it is noteworthy that the sample contains six occurrences in Singapore
English, while the remaining varieties have zero occurrences. A closer look at
the data reveals that all occurrences of you in Singapore English originate from
one text unit, which is not an academic text in the traditional sense, but instead
could best be described as a guide to real estate investment in Singapore. This
text unit is characterised by a much more interactive style of writing; it frequently
addresses the reader directly and makes use of imperatives, e.g. Take advantage
of this law (ICE-SIN:W2A-001#48:1), or Invest your CPF savings in property
(ICE-SIN:W2A-001#49:1). Whether such a text constitutes an instance of academic
writing at all, let alone of academic writing in the humanities, is debatable.
Nevertheless, the text could
be clearly distinguished from other texts of the same register on the basis of one
grammatical feature.
There are slight indications of regional variation in the distribution of the
complex NP classes, for instance the relative overuse of class 2 and underuse of
class 3 in Indian English. Overall, however, there appears to be little variation
in academic writing across varieties. This can be interpreted in two ways. The first
interpretation is that there is no discernible difference between regional varieties in this register. An
argument in favour of this interpretation would be that the homogeneity of the
register, and by extension its conformity on an international level, is guaranteed
by the publication process. A second interpretation is that the level of abstrac-
tion in categorising NP complexity, as it is used in this analysis, is too superficial
to bring to light any discernible differences; in other words, although there may
be no differences across regional varieties on the superficial level of abstraction
assumed here, significant distributional differences might be observed when, for
instance, specifying the types of modification involved. At this point, however,
we have to conclude that we cannot find regional variation with regard to NP
complexity in academic writing.
3.2 Conversation
Conversation has the highest frequency of simple noun phrases of all registers in
the study (78 %). This is in line with Biber et al., who find that ca. 85 % of all NPs
in their conversation data have no modifier (Biber et al. 1999: 578). Of the class-1
NPs in conversation, more than half are personal pronouns (857 tokens, or 55 %).
This also confirms Biber et al.’s finding that “pronouns are slightly more common
than nouns in conversation” (Biber et al. 1999: 235). The relatively frequent reli-
ance on pronouns is due to the “shared situation and personal involvement of the
participants” (Biber et al. 1999: 235).
Class-2 NPs are the most common type of modified noun phrase in conversa-
tion. They account for 10 % of the NPs. With regard to premodification, Biber et
al. find that the vast majority of premodification sequences in noun phrases do
not exceed two words (Biber et al. 1999: 597). This is confirmed in the present
analysis: the average length (in orthographic words) of class-2 NPs in conversa-
tion is 3.2 (including head and any determiners). This means that premodification
amounts to 1–2 words on average. The most common type of premodification is
by adjective or noun, optionally including a determiner, as the examples below
illustrate.
(7) Uhm because David does say that hiking boots make an enormous difference not
slide on anything (ICE-CAN:S1A-001#3:1:A)
(8) Sometimes uhm the people uh sorry people of India they are they belong to different
communities and they have their separate cultures (ICE-IND:S1A-005#62:1:B)
Longer class-2 NPs (>3 words) are uncommon and usually the result of correction
or coordination, as can be seen in examples (9) and (10) below. Proper cases of
multiple premodification, as in examples (11) and (12), are rare. This is because
the real-time processing of longer premodification sequences places a heavy cognitive
burden on the listener and would thus render spoken communication less effective.5
5 See Quirk et al. (1985: 1039): “Considerable left-branching is possible in the noun phrase, […]
although comprehension becomes more difficult as the complexity of left-branching increases”.
words. The slightly higher mean value (7.1) is caused by rare instances of complex
postmodification, as in examples (13) and (14).
(13) Uhh I remember my friend Mendela that beautiful millionaire meatpacker from
Saskatoon who was so nice to me when I was a young man […] (ICE-CAN:S1A-
009#85:1:A)
(14) Naturally if Mitterand President Mitterand [sic] can run his government for a period
of ten years uh why India cannot have a government consisting of some <uh> party
<uh> national party national party representing the national capital or some pro-
gressive elements <uh> <uh> in some some political parties like Congress-I Con-
gress-S or even Janata Dal with some <uh> radical members belonging to <uh>
communist party or socialist party (ICE-IND:S1A-005#19:1:A)
Finally, class-4 NPs are extremely rare in conversation, accounting for only 4 %
of all noun phrases in the data. The most frequent type is a combination of a one-
word (nominal or adjectival) premodification plus postmodification by a short
prepositional phrase (usually with of), as the following examples illustrate:
(15) But what is after the road No the other side of the road (ICE-SIN:S1A-001#88:1:B)
(16) I said I behave as if this might be the last day of my life […] (ICE-CAN:S1A-009#88:1:A)
(17) […] and you would have seen a different spin to the thing (ICE-JA:S1A-009#X67:1:A)
Orthographically longer class-4 NPs are often the result of multiple coordina-
tion or performance phenomena, including repetitions, repairs and hesitations.
Example (18) is a coordinated list of postmodified NPs, which contains several
repairs and repetitions as well as a hesitation marker (uh).
(18) Political exchange <uh> tourist exchange tourist exchange or scholars exchange
of scholars or exchange of technocrats (ICE-IND:S1A-005#37:1:A)6
6 The example in (18) is assigned the complexity value 4, as it is a coordinated (multi-head) con-
struction (see Section 2.1). A ‘cleaned-up’ version of the noun phrase could be political exchange,
tourist exchange or exchange of scholars or exchange of technocrats.
3.3 Unscripted speeches
(19) Okay don’t think that they’re going to give you time okay after your job interview Don’t
think they’re going to take care of you in a very big way okay (ICE-CAN:S2A-021#29–
30:1:A)
(20) You have to vote more opposition strong opposition not only to establish opposition in
parliament Make opposition part of our political culture not only that but also an effec-
tive an effective hammer over the head of PAP If you don’t do that what will happen
You can bet your last dollar after this election prices will sure to go up (ICE-SIN:S2A-
021#34–37:1:A)
With regard to complex noun phrases, unscripted speeches have the second-highest
overall frequency in the sample (35 %). This is due to the informational
communicative purpose of speeches, which requires modified noun phrases to
convey information. The overall level of NP complexity is higher in speeches than
in social letters, even though the latter are written. In direct comparison,
unscripted speeches and social letters make equally frequent use of premodification,
while in classes 3 and 4, unscripted speeches surpass social letters. As in
conversation, the stronger reliance on postmodification rather than premodification
in unscripted speeches can be explained by the easier comprehensibility of
right-branching structures (see Quirk et al. 1985: 1039).
266 Steffen Schaub
A comparison across varieties yields the following noteworthy observations:
assuming an even distribution, the frequency of premodified NPs (class 2) is
relatively low in Canadian English (33 tokens) and high in Hong Kong English
(78 tokens). Furthermore, postmodified NPs (class 3) are relatively frequent in
Jamaican English (82 tokens) but infrequent in Singapore English (44 tokens).
3.4 Social letters
Class-1 NPs are by far the most frequent noun phrase class in social letters, con-
stituting between 65 % and 75 % of all NPs in each 400-word variety sample. Per-
sonal pronouns form the majority of class-1 NPs (ranging from 52 % to 61 % across
varieties). This can be attributed to the interactional character of social letters,
which mainly rests on the frequent use of I and you.
The frequencies of class-2 and class-3 NPs are relatively balanced, with a slight
preference for class 2. Class 4 is the least frequent noun phrase type in this
register across all varieties except Canadian English. Constructions in this
category vary considerably. A typical class-4 construction is a multi-head NP
coordinated with and or or. Class-4 NPs that are not coordinated often consist of
a noun premodified by one adjective or noun and postmodified by a prepositional
phrase, as in examples (21) to (23). Complex noun phrases in social letters are
very similar to those found in conversation and contrast with the lexically heavy
class-4 NPs found in academic writing.
(21) I hope that I will be able to come to Kolhapur in the first week of Jan. (ICE-IND:W1B-002#47:1)
(22) My point is that if one can love the other person without calculate what one can get back
from the relationship, this will be the greatest love of all. (ICE-HK:W1B-001#144:5)
(23) The team is still waiting for a final reply from the administration of this university
but I’m not optimistic. (ICE-SIN:W1B-001#148:2)
More complex examples are rare in social letters. Long, heavily modified noun
phrases occur in letters with an academic background, as example (24)
illustrates.
(24) I would need a formal invitation from you for collaboration with specific refer-
ence to the project & [sic] that it would not involve financial liabilities for the
University. (ICE-IND:W1B-005#7:1)
practice, reports from an exchange year) and others clearly coming from an aca-
demic context (correspondence between students and professors). NP complex-
ity is higher in the latter. It remains debatable whether one text category should
include both subtypes.
Comparing NP complexity across varieties, there is relative underuse of pre-
modified NPs in Jamaican English, overuse of postmodified NPs in Singapore
English, and overuse of pre- and postmodified NPs in Canadian English and
Indian English.
There are numerous ways in which subsequent research can improve on the
study presented here. First, the database of noun phrases has to be extended to
provide a more solid empirical foundation. Second, by adding further annotation
to the data, such as syntactic function, type of head noun, and type of modifi-
cation, more fine-grained statements about differences in NP complexity across
varieties are possible. This study has also shown that random selection of text
units from the International Corpus of English for the purposes of a register anal-
ysis is not desirable. The texts included in some of the register categories in ICE
are too heterogeneous. Instead, text units have to be carefully selected in order to
ensure compatibility across varieties. Finally, any variety-specific structural pref-
erences have to be matched against the typological inventory found in the sub-
strate languages. Only then is it possible to draw any connections to the possible
origin of such preferences, and to substantiate claims about structural transfer.
6 References
Aarts, Flor G. A. M. 1971. On the distribution of noun-phrase types in English clause-structure.
Lingua 26. 281–293.
Ahulu, Samuel. 1998. Grammatical variation in international English. English Today: The
International Review of the English Language 14(4). 19–25.
Asante, Mabel Yeboah. 1995. Ghanaian English: Motivation for divergence from the standard
in certain grammatical categories. Tübingen: Eberhard Karls University Tübingen
dissertation.
Asante, Mabel Yeboah. 2012. Variation in subject-verb concord in Ghanaian English. World
Englishes 31(2). 208–225.
Balasubramanian, Chandrika. 2009. Register variation in Indian English. Amsterdam:
Benjamins.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: CUP.
Biber, Douglas, Stig Johansson, Geoffrey N. Leech, Susan Conrad & Edward Finegan. 1999.
Longman grammar of spoken and written English. 9th impr. (2011). Harlow: Longman.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: CUP.
Blair, David & Peter Collins. 2001. English in Australia. Amsterdam: John Benjamins.
Brunner, Thomas. 2014. Structural nativization, typology and complexity: Noun phrase
structures in British, Kenyan and Singaporean English. English Language and Linguistics
18. 23–48.
Fludernik, Monika & Bernd Kortmann (eds.). 2012. Proceedings: Anglistentag 2011 Freiburg.
Trier: Wissenschaftlicher Verlag Trier.
Gut, Ulrike. 2011. Studying structural innovations in new English varieties. In Joybrato
Mukherjee & Marianne Hundt (eds.), Exploring second-language varieties of English and
learner Englishes: Bridging a paradigm gap (Studies in corpus linguistics 44), 101–124.
Amsterdam: John Benjamins.
Haan, Pieter de. 1993. Noun phrase structure as an indication of text variety. In Andreas H.
Jucker (ed.), The noun phrase in English: Its structure and variability, 85–106. Heidelberg:
Winter.
Hall, Christopher J., Daniel Schmidtke & Jamie Vickers. 2013. Countability in World Englishes.
World Englishes 32(1). 1–22.
Halliday, Michael A. K. 1989. Spoken and written language. 2nd edn. Oxford: OUP.
Jucker, Andreas H. 1992. Social stylistics: Syntactic variation in British newspapers (Topics in
English Linguistics 6). Berlin: Mouton de Gruyter.
Jucker, Andreas H. (ed.). 1993. The noun phrase in English: Its structure and variability.
Heidelberg: Winter.
Kortmann, Bernd & Kerstin Lunkenheimer (eds.). 2013. The electronic world atlas of varieties of
English. Leipzig: Max Planck Institute for Evolutionary Anthropology. http://ewave-atlas.org
(accessed 28 February 2015).
Lamidi, Mufutau T. 2007. The noun phrase structure in Nigerian English. Studia Anglica
Posnaniensia: An International Review of English Studies 43. 237–250.
Mukherjee, Joybrato & Marianne Hundt (eds.). 2011. Exploring second-language varieties of
English and learner Englishes: Bridging a paradigm gap (Studies in corpus linguistics 44).
Amsterdam: John Benjamins.
Neumann, Stella. 2012. Applying register analysis to varieties of English. In Monika Fludernik
& Bernd Kortmann (eds.), Proceedings: Anglistentag 2011 Freiburg, 75–94. Trier:
Wissenschaftlicher Verlag Trier.
Platt, John, Heidi Weber & Mian Lian Ho. 1984. The new Englishes. London: Routledge and
Kegan Paul.
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive
grammar of the English language. 4th edn. London: Longman.
Sand, Andrea. 2004. Shared morpho-syntactic features in contact varieties of English: Article
use. World Englishes 23(2). 281–298.
Sand, Andrea. forthc. Angloversals? Shared morpho-syntactic features in contact varieties of
English. Amsterdam: Benjamins.
Schäpers, Uta Katharina Elisabeth. 2009. Nominal versus clausal complexity in spoken and
written English: Theory and description (English Corpus Linguistics 8). Frankfurt: Peter
Lang.
Schilk, Marco & Steffen Schaub. forthc. Noun phrase complexity across varieties of English:
Focus on syntactic function and text type. English World-Wide 37(1).
Setter, Jane, Cathy Wong & Brian Chan. 2010. Hong Kong English. Edinburgh: Edinburgh UP.
Wahid, Ridwan. 2013. Definite article usage across varieties of English. World Englishes 32(1).
23–41.
Xiao, Richard. 2009. Multidimensional analysis and the study of world Englishes. World
Englishes 28(4). 421–450.
Valentin Werner
Real-time online text commentaries:
A cross-cultural perspective
Abstract: In the area of electronically-mediated communication, real-time online
text commentaries (OTCs) as a new specialised register have become popular as an
alternative to traditional broadcasting. OTCs have been recognised as “mediated
quasi-interaction” (Chovanec 2010) and a hybrid genre showing characteristics
of spoken discourse within a written mode (Jucker 2006), as well as a characteristic
combination of simultaneous information and entertainment (“infotainment”) in
which familiarity or “pseudo-intimacy” (O’Keeffe 2006; cf. Chovanec 2008) is created
between the commentator and the audience. This contribution helps to situate this
emerging register from a cross-cultural perspective. I use OTCs by
English and German media outlets from the EURO 2012 football championship to
tackle the following issues with the help of a corpus-linguistic approach: (i) What
are register-specific structural features of OTCs? (ii) Are there any culture-specific
aspects along language boundaries or the dimension “intended readership”? I
also consider the interaction of layout and content, production circumstances,
and the influence of recent developments (such as the incorporation of Twitter
messages) on reporting styles.
0 Introduction
Real-time online text commentaries (henceforth OTCs)1 have become increasingly
popular2 and represent an alternative to traditional live TV and radio broadcasting,
reporting and commenting on live events that are bounded in duration, location
and topic (cf. Siever 2011: 171), in particular major sports events. As the name
implies, they are usually categorised as a written form of web communication (cf.
Biber and Egbert, this volume) and resemble (we)blogs in that they consist of
individual consecutive postings (cf. Grieve et al. 2010: 303).
While previous research has recognised the narrative properties of football
reportage in general and analysed its vocabulary and morphosyntax (Brandt
and Quentin 1983; Ghadessy 1988; Hennig 2000; Krone 2005; Müller 2007; Levin
2008), other studies have noted that OTCs, as “mediated quasi-interaction”
(Fairclough 1995: 40), constitute a hybrid register: they show characteristics of
spoken discourse within a written mode (Jucker 2006; cf. Lakeberg, this volume)
and are an interesting combination of simultaneous information and entertainment
(“infotainment”). In this way, familiarity or “pseudo-intimacy” (O’Keeffe 2006: 92;
cf. Chovanec 2008, 2010; Jucker 2010) is created between the commentator and
the audience.
Two further issues are important for establishing OTCs as a register, defined
(following Biber 1988) as a language variety characterised by its situational (i.e.
non-linguistic) features (see also Schubert, this volume). First, “situational context
tends to exert functional pressures on linguistic output” (Grieve et al. 2010: 315),
which implies that common linguistic features should be traceable across different
OTCs, particularly if they report on the same matches. Second, there is the
contrastive view. It has been hypothesised, albeit for other types of football
reportage, that “[t]ypological differences between […] two languages are expected
to be neutralised to a certain degree” (Krone 2005: 51) when texts from two
languages fulfil the same function (in our case, football match reportage). Others
(e.g. Müller 2007: 44), however, have emphasised that cultural differences may
lead to noticeable stylistic differences.
Starting from these observations, this paper addresses the following questions
with the help of a corpus-linguistic approach:
(i) What are the register-specific structural features of OTCs on different levels of
linguistic analysis?
(ii) Are there any culture-specific aspects along language boundaries or along the
dimension “intended readership”, or do OTCs rather form a relatively uniform
cross-linguistic/cross-cultural register?
Further aspects addressed are the interaction of the layout of OTCs with their
content and the influence of very recent developments (such as the incorporation
of Twitter messages) on the style of reporting.
After a few notes on data and methodology, the present study first locates OTCs
as a register in general terms in Section 2. Section 3 provides an analysis of the
language of OTCs, focusing on vocabulary, collocations and related semantic
aspects, discourse features and potential implications of the interaction of format
and textual commentary. A discussion of OTCs as a cross-cultural register follows
in Section 4, while Section 5 sums up the results and presents some generalisations
as well as avenues for further research.
1 Alternative labels are live text commentary (LTC), live blogging, live ticker, news ticker and the
more sport-specific minute-by-minute report (MBM) or live match tracker.
2 According to a recent survey, OTCs have become “the default format for covering major breaking
news stories, sports events, and scheduled entertainment news”, even surpassing online
articles and picture galleries in popularity (Thurman and Walters 2013: 82; cf. also Wells 2011).
The growing importance of the format is revealed both by the sheer number of OTCs (almost 150
per month for The Guardian) and by page view counts, which are at least twice as high for OTCs
as for articles and galleries. User reports seem to confirm that OTCs are the online format par
excellence for tracking stretches of live events, as more than 35 % of respondents follow OTCs
continuously. Nearly two fifths of all OTCs are sports-related (Thurman and Walters 2013: 82–95).
Data for Der Spiegel are in line with these general findings, as OTC football reportage receives
more than 1 million clicks per match (see <www.spiegelgruppe.de/spiegelgruppe/home.nsf/0/
CEF3A44164AED9BBC1256F720034CBAC>, accessed 20 April 2013).
2 OTCs as a register
2.1 Electronically-mediated communication and sports reportage
Broadly speaking, the scarce work available to date has described the style of
football reportage as resembling conversation (cf. Ferguson 1983: 156–157), while
other studies have highlighted its monologic quality, emphasising narrative
properties whereby the commentator acts as mediator and filter (Brandt and
Quentin 1983: 21; Hennig 2000: 44). A summary of the results of previous analyses
comparing OTCs with traditional types of live reportage (Perez-Sabater et al. 2008;
Chovanec 2008, 2010; Jucker 2010; Thurman and Walters 2013) yields the picture
displayed in Table 1.
Radio TV OTC
Unscripted ✓ ✓ ✓
Temporal limitation ✓ ✓ ✓
Jargon/slang/idioms ✓ ✓ ✓
Formulaic language ✓ ✓ ✓
Ellipsis ✓ ✓ ✗
Table 1 shows the characteristics that OTCs and traditional live reportage share
as forms of mass communication; humour is a further broad communicative
strategy characteristically used in all types. Owing to these features, sports
reportage in general has been described as a kind of “entertainment” genre, even
though its primary function is arguably to report factual content (Brandt and
Quentin 1983: 20; Chovanec 2011: 253–254).
However, a number of differences emerge owing to the channel of distribution
(a web page with mainly textual content plus interactive elements) and to the
particular properties of electronic communication (e.g. the staging of familiarity;3
see further Section 3.2 and Jucker 2010: 66). Above all, the way in which recipients
consume media forms such as OTCs is worth noting. OTCs are produced fairly
quickly and without many corrections, as the commentator is under time pressure
because the event and its description unfold simultaneously (Jucker 2010: 64).4
Likewise, consumption is quick and cursory, as is the case with many other
electronic offerings (Dürscheid 1999: 21). These findings suggest that there are
areas of both overlap and divergence between OTCs and traditional forms of
sports reportage. Beyond the aspects mentioned so far, the following sections
show how OTCs relate to the domains of sports and news reportage, but also why
they should be categorised as a separate, fairly institutionalised register serving
a discourse community (O’Keeffe 2006: 19, 29).
3 According to Dürscheid (1999: 23), the staging of familiarity (and the resulting “pseudo-intimacy”
between participants; O’Keeffe 2006; see further Section 3.2) in written electronic
communication is characterised by an apparent closeness of those involved in such a communicative
situation. This is due to the immediacy of the exchanges via the electronic medium, which is
supported by the use and acceptance of features typically occurring in the spoken mode.
4 Indeed, typos, interpretable as a typical feature of online production under time constraints,
repeatedly occur in all of the OTCs analysed (see e.g. examples (41), (56) and (59) below).
5 The corpus even contains a few meta-comments on technical issues during production, after
the conventional layout and the technical platform apparently had been changed: Yes, yes this
looks a bit different to our usual minute-by-minute reports, but rather than moan about change,
why not embrace it? Or moan about it privately. I’m just a drone who’s following orders and doing
what he’s told. And besides, I quite like it, because I can put in big red quotation marks…
(ukr_eng_1906_guar); I do love this new headline facility… (ukr_eng_1906_guar)
Figure 1: Commentary and overview section of the SUN OTC (from swe_eng_1506_sun;
<www.thesun.co.uk/sol/homepage/sport/football/match_centre/article3670013.ece>,
accessed 12/07/2012, 10:21)
Jucker defines OTCs as a “complex combination of visual and textual features […]
giv[ing] the recipient not only a narrative account of the events so far, but also an
overview of the situation at present” (2010: 59). Typically, the textual information
is shown in reverse chronological order, with the most recently added post
appearing at the top of the page (see Figure 1 for an example).6 This post-by-post
(or minute-by-minute) reporting style is supposedly a fairly recent development
illustrating the influence of structure on activity (O’Keeffe 2006: 31); that is, the
special properties of OTCs as a form of electronically mediated communication
have an impact on the style of reporting. However, OTCs strikingly resemble a
certain type of after-match report which appeared in printed publications as
early as the 1950s (see Figure 2).
Figure 2: Excerpt from Kicker FUSSBALL-ILLUSTRIERTE (1954) adapted from Burkhardt (2010: 11)
6 The content management system of a media outlet may allow the reverse-chronological order to be
switched once the event has finished, so that the report appears as a kind of article readable from
top to bottom. This is the case with GUAR, for instance (Thurman and Walters 2013: 92), but does
not apply to the other OTCs explored in this study. Occasionally, earlier postings are corrected or
altered to make them more readable after the description of the actual event (e.g. during
half-time breaks or before the order is reversed; Simons 2011: 181). Thus, OTCs are a register that
is both dynamic and static (Chovanec 2010: 239).
What is new, however, are the opportunities the technology offers for using a
similar reporting style in live reportage, as well as the additional options the
electronic medium provides. Readers can sometimes filter the textual data to get
a quick update on the most important events in the match (i.e. goals, fouls and
substitutions). Further elements that may be added (usually outside the frame or
area where the main commentary appears) are links and embedded audiovisual
content (Thurman and Walters 2013: 83). In football reportage in particular, most
OTCs offer sections, tabs or links on the score (including those of simultaneous
matches) and scorers, the current and starting line-ups, and general statistics
(shots on goal, cards, ball possession, etc.). One of the most elaborate OTCs is
offered by SPON, where readers can also retrieve real-time statistics for each
individual player. This OTC further includes “heatmaps” (see Figure 3) showing
the positions or operating range of an individual player or of the whole team on
the pitch.
Figure 3: Heatmap of the English team (left) and Italian central midfielder Andrea Pirlo (right) in
SPON (from ita_eng_2806_spon; <www.spiegel.de/sport/fussball/em-2012-liveticker-spielplan-und-alle-statistiken-a-836448.html>,
accessed 02/07/2012, 10:28)
simplified scheme (in chronological order), abstracted from the four OTC types
investigated:
Summary and overall match comment    Consequences for teams, naming goal scorers and order of scoring
Goodbye
This highly structured layout largely corresponds to the progression of traditional
football reportage, but OTCs usually end shortly after the match itself and lack the
post-match comments and interviews commonly found on radio and especially on
TV (cf. Ferguson 1983: 154). There are, however, differences between the
individual OTCs: while the posts of some outlets (e.g. SUN) are always organised
in the same fixed way (preview – early team news – head to head – the ref – etc.)
and are apparently prepared in advance (cf. Simons 2011: 180–181), the data from
the other outlets suggest a more liberal approach that leaves the exact
arrangement of the posts (particularly in the phases before the actual commentary
begins) up to the commentator.7
7 Boundaries between the (idealised) phases are blurred at times, so that information typically
found in one phase may also appear somewhere else. For example, information on jersey colours
may appear within the first minutes of the actual match commentary, as illustrated by 1’ KICK
OFF Germany, in their all-white kit, start the game kicking from right to left (ger_den_1706_sun).
The length of the individual phases varies. For example, pre-match coverage
ranges from 176 words (swe_eng_1506_spon) to 2,969 words (ita_eng_2406_guar),
with GUAR being the most verbose overall in this respect (see Figure 4), particularly
when matches of the English squad are reported (for a further quantitative
assessment of OTCs, see Section 3.3 below).
           GUAR      SUN      BILD     SPON
AVG        1258.3    696.9    606.4    558.9
AVG ENG    1708.5    771.25   534.25   298.5
AVG GER    898.2     637.4    664.2    767.2
Figure 4: Length (in words) of pre-match commentary (AVG = overall average; AVG ENG =
average of England match reports; AVG GER = average of Germany match reports)
The phases before the match starts serve at least two important communicative
functions. First, the ‘appetiser’ section is a device to arouse readers’ interest and
to emphasise the relevance of the match. Examples (1) and (2) can be seen as
typical posts.
8 Translation: Germany versus the Netherlands, that’s the classic, the ne plus ultra of football –
what am I saying, the Holy Grail of this European Championship. A warm welcome to this top event.
The commentary can be viewed as the core part, with the main communicative
function of conveying factual information, although further functions, such as
entertainment (see below), should not be discounted. OTCs usually finish with a
summary and overall match comment, potentially aimed at members of the audi-
ence who only look for a quick round-up of the match and who do not want to
read the full coverage.
2.3 Audience participation
9 Translation: The national anthems. Creeps for every football fan. What an atmosphere.
On the other hand, the advent and growing popularity of genuinely interactive
internet applications (the so-called “web 2.0” technologies) might have been
expected to lead to their widespread integration into OTCs as another “webby”
form of communication creating dynamic content. The most popular such
application, and potentially the one best suited to OTCs as another immediate
form of journalism (cf. Chovanec 2010: 239), is the microblogging service Twitter
(<www.twitter.com>). Although Twitter has been on the market since 2006, only
one of the OTCs considered in the present study, SPON, has reserved some space
for Tweets (that is, Twitter posts). This area (called “Live-Fanblock”, ‘live fan
section’) is placed prominently next to the main commentary box (see Figure 5).
(5) Jetzt ist es amtlich – Klose, Schürrle und Reus spielen von Beginn an. Twittert der
DFB. Sollten Sie auch den Drang verspüren, ihren Kommentar via Twitter in den
Live-Fanblock rechts nebenan zu Tickern, so benutzen Sie bitte den Hashtag #gergre
(ger_gre_2206_spon)10
10 Translation: Now we know for sure – Klose, Schürrle and Reus are in the starting line-up,
tweets the DFB (= the German football association). Should you also feel the urge to post your
comments via Twitter to the Live-Fanblock on the right, please use the hashtag #gergre
(6) PS: Mein Tweet des Abends: Dehnen ist gut für die Bänder, Bender ist schlecht für die
Dänen – @wintersjon! In diesem Sinne, gute Nacht! (ger_den_1706_spon)11
(7) Over on the Twitter @ianapplegate has this suggestion. Maybe they should at least give
Esperanto a go? Can anyone even speak Esperanto? (ger_den_1706_guar)
The analysis shows that, at present, no unequivocal answer can be given to the
question of whether interactive elements influence OTC commentary. However, it
could be shown (i) that the extent to which reader-generated content influences
the style and content of OTCs varies considerably and (ii) that different OTCs take
different approaches to interactivity. While two outlets (SUN, BILD) do not provide
any opportunity for readers to get involved, OTC reportage in GUAR includes
extensive, though filtered, reader-generated content and related comments, and
thus yields a quasi-conversational structure as defined above. The most direct
approach is arguably taken by SPON, where Tweets are displayed unfiltered as a
by-commentary right next to the commentator’s text. However, the commentator
does not usually refer to the Tweets in any way, so audience participation could
be viewed as constrained in a different way.12
11 Translation: PS: My Tweet of the night: Stretching is good for the ligaments, Bender is bad for
the Danes – @wintersjon! In this spirit, good night! Note: In the German version, the author of
the Tweet exploits the homophony /bendɐ/ between Bänder (‘ligaments’) and Bender (player’s
name) for comic effect.
12 Even if the extent of filtering varies, both ways of incorporating interactive elements
presumably take account of a point made in audience studies of other media types. Specifically,
Gerhardt (2006: 129) maintains that the audience consists of “active social agents whose lives do
not come to a halt when they are exposed to a mass medium”. Accordingly, it could be argued
that OTCs with interactive elements take a socially more adequate approach towards their
readers. This view is also supported by Simons (2011: 156), who asserts that modern audiences
have developed a feeling of being entitled to participation and interaction. Therefore, it is argued
that state-of-the-art journalistic practice is likely to incorporate social media in order to render
mass media production and use a shared experience. A related point of minor importance is that
OTCs sometimes also serve as a kind of by-medium to TV broadcasts, where a commentator adds
“colour commentary” to the “action” on the screen. This is especially salient in designated OTCs
on particular shows, for instance the regular SPON OTC on “Tatort”, a popular German crime
series.
3.1.1 General picture
Like traditional types of sports reportage (cf. Ghadessy 1988: 19), OTCs can be
expected to contain a substantial amount of technical vocabulary to describe the
gameplay. An exploration of the most frequent content words reveals that these
items fall broadly into the categories shown in Table 3.
Table 3: Categories of content words in the top 100 wordlist created with AntConc
Category | English OTCs | German OTCs
Sports-/game-related terms | ball, goal, shot, corner, side, kick, area, cross, chance, post, team, game | Ball (‘ball’), Tor (‘goal’), Ecke (‘corner’), (gelbe) Karte (‘(yellow) card’), Strafraum (‘penalty area’), Flanke (‘cross’), Spiel (‘game’), Wechsel (‘substitution’)
Names of players and coaches | Hart, Rooney | Hart, Gomez, Klose, Özil, Neuer, Löw
Overall, the comparison between the most frequent content words in English and
German OTCs reveals some striking similarities (especially as regards the first
three categories in Table 3), albeit with a slight shift in national focus as regards
the players’ names. Note also that function words expressing movement, location
and direction, mainly prepositions, figure prominently amongst the highly frequent
lexical items (e.g. right, up, left, down, back, over, against, to, in, for, from, on, at,
by, into vs. in, auf, mit, von, zu, im, aus, an, bei, gegen, vor, nach, zum, ab, am,
über, durch, zur, ins).
These findings tie in closely with a semantic keyword analysis in Wmatrix, in
which the English OTC data are compared against the spoken and written BNC
Sampler. From this quantitative perspective, the following salient semantic areas
emerge: ‘competition’, ‘numbers’ (usually related to spatial and temporal
orientation), ‘warfare, defence and the army; weapons’, ‘violent/angry’, ‘chance,
luck’, ‘long, tall and wide’, ‘success’, ‘failure’ and ‘anatomy and physiology’,
illustrated by examples (8) to (15) respectively.13
(8) As it stands, Portugal will go through with a better head-to-head record. (ger_den_1706_sun)
(9) And how England love that decision, because the second effort is sent right onto
Lescott’s head, eight yards out, level with the left-hand post. (fra_eng_1106_guar)
(10) That was Klose’s 64th goal for Germany four off Gerd Muller’s record and he almost
made it 65 moments later, following up a loose ball and sweeping in a low shot that
was kicked behind at the near post by the besieged Sifakis. (ger_gre_2206_guar)
(11) Evra whips a cross into the England area from the left. (fra_eng_1106_guar)
(12) It’s high-stakes major-championship Holland versus Germany. (ger_ned_1306_guar)
(13) Germany also prevailed in the third-place play-off at World Cup 2006, winning 3-1 in
Stuttgart. (ger_por_0906_sun)
(14) Designated scapegoat for when it all goes wrong: Pedro Proenca (Portugal). (ita_
eng_2406_sun)
(15) He curls a cross onto the head of Gomez, but the big striker’s header is weak and
wafted miles to the left of the target. (ger_ita_2806_guar)
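Wmatrix identifies key semantic domains by comparing how often a tag occurs in the study corpus with how often it occurs in a reference corpus (here the BNC sampler), using the log-likelihood statistic described by Rayson (2008). The Python sketch below shows that calculation for a single word or tag; it is an illustration rather than Wmatrix's own code, and the counts in the usage line are invented, not taken from the OTC data or the BNC.

```python
import math


def log_likelihood(freq_study: int, size_study: int,
                   freq_ref: int, size_ref: int) -> float:
    """Log-likelihood (G2) keyness of one item in a study vs. a reference corpus.

    freq_*: observed frequency of the item (word or semantic tag);
    size_*: total number of tokens in the respective corpus.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the item were equally common in both corpora
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # the term 0 * ln(0) is treated as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def is_overused(freq_study: int, size_study: int,
                freq_ref: int, size_ref: int) -> bool:
    """True if the item is relatively more frequent in the study corpus."""
    return freq_study / size_study > freq_ref / size_ref


# Hypothetical counts: a tag occurring 120 times in a 50,000-token study corpus
# and 300 times in a 1,000,000-token reference corpus.
score = log_likelihood(120, 50_000, 300, 1_000_000)
print(round(score, 2), is_overused(120, 50_000, 300, 1_000_000))
```

Since G2 is approximately chi-square distributed with one degree of freedom, values above 3.84 are conventionally read as significant at p < 0.05; the direction of the difference (overuse or underuse) has to be checked separately, which is why the sketch includes `is_overused`.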
The analysis of highly frequent content items and the semantic keyword analysis
suggest that OTCs do not fundamentally differ from other forms of football
reportage, particularly radio reportage, as “good playing, moments of risk, significant
points of heightened competition” (Ferguson 1983: 156–157) receive the most
extensive coverage. This can be deduced, for example, from the high salience of the
‘success’ and ‘failure’ semantic tags or from the high frequencies of the names of the
players usually involved when chances occur in a game, that is, strikers/offensive players
(Rooney, Özil, Klose, Gomez) and goalkeepers (Hart, Neuer).
Levin (2008: 146) has pointed out that “traditions developed in sports com-
mentary are often unintelligible to the uninitiated”, one reason being that com-
mentators rely on formulaic language with specialised meanings. In order to test
13 Some of the findings of the corpus software may be due to the metaphorical processes in-
volved (cf. also the usage of the terms shot, target and squad, captain, etc.). It is controversial
whether “football is war” metaphors still apply or whether they have conventionalised (see also
Section 3.1.2).
Real-time online text commentaries: A cross-cultural perspective 287
this claim, I compared the ten most frequent 4-grams in the material for both
languages, as shown in Table 4.
GUAR+SUN SPON+BILD
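The ranking of 4-grams in Table 4 rests on counting every contiguous sequence of four word tokens and sorting the sequences by absolute frequency. A minimal Python sketch of that procedure follows. The tokeniser (lower-cased runs of word characters, apostrophes and hyphens) is an assumption and will not reproduce the output of whichever concordancer was actually used; n-grams are also allowed to cross sentence boundaries here, and the sample sentences are invented.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; hyphens and apostrophes stay inside words."""
    return re.findall(r"[\w'’-]+", text.lower())


def top_ngrams(text: str, n: int = 4, k: int = 10) -> list[tuple[tuple[str, ...], int]]:
    """Return the k most frequent n-grams with their absolute frequencies."""
    tokens = tokenize(text)
    # zip over n staggered copies of the token list yields every n-gram
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams).most_common(k)


sample = ("Evra whips a cross in from the edge of the box. "
          "Ozil shoots from the edge of the area.")
for gram, freq in top_ngrams(sample, n=4, k=3):
    print(" ".join(gram), freq)
```

In this toy sample, from the edge of and the edge of the each occur twice, which mirrors how a longer formula such as on the edge of the X surfaces in a 4-gram list as several overlapping entries.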
According to the absolute usage frequencies, English OTCs apparently use for-
mulaic expressions much more than the German ones. A particularly common
collocation (see ranks 1, 2 and 3 in Table 4), better represented as a 6-gram,
is on the edge of the X.14 Levin’s (2008) findings can be confirmed insofar as
somebody reading OTC reportage has to have (i) knowledge about conventions
and a mental image as regards the layout of a football pitch and (ii) knowledge
of football-related jargon. Point (i) is illustrated especially by the English data,
where the majority of the 4-grams describe movement and/or position, and point (ii)
especially by the German data, where technical terms (partly also related to position)
such as Strafraum (‘penalty area’), Flügel (lit. ‘wing’; ‘outer part of the pitch’) or
aus der zweiten Reihe (lit. ‘from the second row’; ‘from far away’) appear. The
present data therefore suggest that it is not merely in “goal scoring and measuring
time” (Levin 2008: 146) that formulaic language is employed, although some
of the items included in Table 4 (e.g. in the first half; for the first time; Tooor für
Deutschland, X:X ‘Goooal for Germany, X:X’; in der zweiten Hälfte ‘in the second
half’) support Levin’s claim.
14 Realisations for X occurring in the data are D, six yard box, England box, Italy penalty area,
Sweden penalty area, penalty area.
A related aspect is the extended reliance on informal and slang items
(Perez-Sabater et al. 2008: 242; cf. Ferguson 1983: 156–157), exemplified by Kasten
(‘goal’, lit. ‘box, case’) in Table 4. A recent study on informality (Burkhardt 2010:
14–15) has identified a long-standing tradition of dialectal and informal influence
as regards (German) football language, and a similar situation in English appears
highly likely. Indeed, the OTC data from both languages confirm a general ten-
dency towards informal usage, as examples (16) to (19) show (see also below):
(16) Neat turn from Ozil who twists in the box before feeding Khedira for a low 20-yarder,
which Sifakis parries. (ger_gre_2206_sun)
(17) (…) on the sideline Joachim Low is waving his hands around in frustration like an eejit.
(ger_gre_2206_guar)
(18) Huiuiui, dieser Reus hat sich einiges vorgenommen. Diesmal rutscht ihm das Spiel-
gerät über den Schlappen und fliegt zwei Meter am rechten Außenpfosten vorbei.
(ger_gre_2206_bild)15
(19) Fortakis hält einfach mal drauf. Neuer hält einfach mal fest. (ger_gre_2206_spon)16
15 Translation: Huiuiui, this Reus guy is up for something. This time, the playing device (infml.)
slides over his worn-out shoe/slipper and misses the right outer post by two meters.
16 Translation: Fortakis just shoots. Neuer just saves.
Table 5: Standardised type/token ratios (TTR) calculated with frequencies from AntConc
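Table 5 reports standardised rather than raw type/token ratios because a raw TTR falls as a text grows longer, so commentaries of different lengths cannot be compared directly. A common remedy, sketched below in Python, is to take the mean TTR of consecutive fixed-size chunks. The chunk size of 1,000 tokens (the WordSmith Tools default) and the percentage scaling are assumptions; the chunk size actually used for Table 5 is not stated here, so it is left as a parameter.

```python
def ttr(tokens: list[str]) -> float:
    """Plain type/token ratio: number of distinct forms divided by tokens."""
    return len(set(tokens)) / len(tokens)


def standardised_ttr(tokens: list[str], chunk_size: int = 1000) -> float:
    """Mean TTR (as a percentage) over consecutive, non-overlapping chunks.

    A trailing chunk shorter than chunk_size is discarded, because its TTR
    would be inflated simply by its shorter length.
    """
    full_chunks = [tokens[i:i + chunk_size]
                   for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]
    if not full_chunks:
        raise ValueError("text is shorter than one chunk")
    return 100 * sum(ttr(chunk) for chunk in full_chunks) / len(full_chunks)
```

For example, the token list `"the ball goes out".split() * 3` gives 100.0 with `chunk_size=4` (every chunk contains four different words) but 50.0 with `chunk_size=8`, which illustrates why ratios are only comparable when computed over the same chunk size.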
In fact, keyword analyses contrasting the vocabulary of the two OTCs within each
language (GUAR vs. SUN and SPON vs. BILD) yield a very diverse picture. First, a look at
the top 100 keyness words of GUAR vs. SUN (and vice versa) reveals some (groups
of) characteristic items. Commentators for GUAR seem to have a preference for
technical terms such as tiki-taka or its ad-hoc (mock) variant (das) bundestikiundtaka17
to describe the particular playing style the Spanish and German teams are
known for. On a related note, the acronym TBOF (‘two banks of four’), referring
to the traditional tactical formation of the England squad, reaches a high keyness
rating. Another conspicuous item in the GUAR data is beard. Here, an idiosyncratic
use by the GUAR commentator, again from the Germany vs. Greece match,
is responsible for its salience. While at the beginning of the coverage the player
Salpingidis is introduced with the metonymic nickname beard to be feared, as in
example (20), at a later point in the match we can witness a process of personification,
in which reference merely by a physical feature is taken as established,
as can also be seen from the capitalisation of the term in example (21).18
(20) Gekas will go up front, with the beard to be feared, Salpingidis moving to the right of
midfield. (ger_gre_2206_guar)
(21) The Beard To Be Feared slides a cool low penalty to the right as Neuer goes the other
way. (ger_gre_2206_guar)
In contrast, we can generalise from the SUN vs. GUAR keyness list that SUN com-
mentators more often than not refer to players by their first names (Mario, Bastian,
Antonio, Manuel, Mesut, Cristiano, Miroslav, etc.) and employ more war-/aggres-
17 Burkhardt (2010: 14) presents an overview of the genesis of the term tikitaka. Consider also
the word formations das bundestikiundtakafussball (ger_por_0906_guar); I fell asleep after 63
minutes and have only just woken up from a tiki-taka-induced snooze (ita_eng_2406_guar) or Be-
cause over-intellectualising Spain’s tiki-totalitarianism isn’t going to be enough when you try to big
this up in ten years’ time, I can tell you that for nothing (ger_ita_2806_guar).
18 Cf. the following references to England striker Wayne Rooney: Dicke Chance für Mister Haupthaar!
(‘Big opportunity for Mister scalp hair!’; ita_eng_2406_bild); Wieder kommt das lebende
Haartransplantat Rooney angeflogen, doch sein Kopfball ist eher eine Rettungstat denn ein Torversuch.
(‘Again the living hair transplant Rooney comes flying in, but his header is more of a save
than an attempt on target.’; ita_eng_2406_spon).
sion-related terminology (e.g. fires, impact, strike, shot, kill, onslaught) – although
it might be argued that some of these items have become conventionalised meta-
phors. Puns on players’ names and ad-hoc formations are a common feature of all
OTCs and illustrate creative language use in this type of sports commentary (see
also Section 3.2 on discourse features below; cf. Golebiowski 2012: 58):
The keyword analysis of the German OTCs shows that BILD is much more prone
to using dialectal and jargon words than SPON. Two illustrative instances are
references to Ball (‘ball’) and Tor (‘goal’). While the standard variants (i.e. Ball
and Tor) rank high in the keyness list of SPON, within the top 100 keyness items
of BILD a variety of informal terms both for the former (e.g. Kugel ‘bowl’, Leder
‘leather’, Pille ‘pill’, Murmel ‘marble’)19 and the latter (e.g. Kasten ‘box’, Hütte
‘shed’) occur. On a related note, other salient items worth mentioning due to their
high keyness in BILD are Schlappen (‘foot’; lit. ‘worn-out shoe/slipper’) and Dampfhammer
(‘fast shot on goal’; lit. ‘steam hammer’). This does not mean, however,
that SPON commentators do not use informal or jargon items, as the occurrence
of some other words listed in Burkhardt (2010) shows (see examples (30) to (32));
such items are simply used less frequently.
(30) Also Balotelli sollte heute besser keinen Elfer mehr schießen (ita_eng_2406_spon)20
(31) De Rossi schießt, Hart lässt prallen, Balotelli feuert aus kurzer Distanz drauf, wieder
Hart und dann muss Monotolivo das Ding im Nachschuss machen (ita_eng_2406_
spon)21
(32) Garmash wagt einen Distanzschuss und knallt aus 30 Metern vom linken Flügel aus
auf das Tor. (ukr_eng_1906_spon)22
3.2 Discourse features
Again relating to the in-group knowledge required of the audience (see also Gerhardt
2006: 140; O’Keeffe 2006: 155), an earlier analysis has identified “Britishness”
(Chovanec 2008: 261) as the common ground of the cross-references in GUAR
OTCs. Some of these findings can be extended to OTCs from other media outlets.
In-group knowledge is required of the reader whenever commentators refer or
allude to particular players, coaches or commentators who are not part of the current
game or action (and to their alleged characteristics, statements or achievements).
Examples (33) to (38) illustrate that this happens in OTCs of all kinds.
In the GUAR data, this is also often observable in the readers’ comments included
in the actual OTC. A similar effect is created by numerous references to scenes
from other games and to other teams, as shown in examples (39) to (43).
(39) Mellberg produces a tackle not too dissimilar to Bobby Moore’s famous one on Jair-
zinho in the 1970 World Cup. (swe_eng_1506_guar)
(40) He makes it to penalty area before old hand Mellberg stops him in his tracks with a
challenge akin to Moore on Pele, 1970. (swe_eng_1506_sun)
(41) I just had a horrible premonition of Balotelli making this match his Maradona ’86
moment and crushing us single-handledly [sic] because he feels like it (ita_eng_2406_
guar)
22 Translation: Garmash tries a distance shot and rifles the ball from 30 meters from the left wing
towards the goal.
23 Translation: But Kroos with a Christian-Rahn-memorial corner.
24 Translation: Pirlo gets the ball anyway, but does the Robben.
25 Translation: Balotelli wants to do the Ibrahimovic.
(42) Doch im Gegensatz zum FC Bayern nimmt keiner Reißaus oder zeigt auf den Anderen.
(ita_eng_2406_bild)26
(43) Schlecht war die deutsche Mannschaft gegen Portugal eigentlich nur im Jahr 2000.
Damals setzte es ein 0:3. Aber die Abwehrspieler hießen auch Rehmer oder Nowotny.
(ger_por_0906_spon)27
While the intertextual28 references listed above are not restricted to GUAR OTCs,
it is in these that they occur most frequently (see Table 6).
This is also due to another unique feature of GUAR OTCs, namely references to
popular culture (e.g. actors, movie titles) in both the commentators’ entries and the
audience comments, as exemplified in (44) and (45):
(44) See you in 10 minutes for more of the same, or the most dramatic twist since The Crying
Game/The Usual Suspects/Fight Club/Turner & Hooch. (ger_gre_2206_guar)
(45) Now that Walcott has replaced Ron Perlman England might actually win. (ita_
eng_2406_guar)
All this nicely illustrates the extensive additional knowledge required to become
an actual part of the game, or rather its mediated presentation (see also Gerhardt
2006: 140). In other types of media, commentators deliberately employ intertex-
tual references as one way to create “pseudo-intimacy”, that is, “some sense of
common identity and nationality or some other familiarity built up through fre-
quent ‘contact’” (O’Keeffe 2006: 92)29 and this seems to be the case also in OTC
reportage, most clearly in the GUAR data.
26 Translation: But in contrast to Bayern Munich nobody runs away or points to somebody else.
27 Translation: The only time the German team actually was bad against Portugal was in the year
2000. They got defeated 0:3. But the defenders were called Nowotny and Rehmer.
28 Intertextuality is conceived of in broad terms, including, for example, previous matches, scenes, other
players etc. as (non-linguistic) pre-texts. In addition, this intertextuality may also comprise ste-
reotyped (national) clichés requiring generalised cultural knowledge, such as “[…] but Andreas
Brehme has to be the best Left Back,” says John Duffy. “He had a few problems in the hairstyle
department, mind, but what German doesn’t?” (swe_eng_1506_guar).
29 Cf. also Ferguson’s term “dialog on stage” (1983: 156).
Therefore, Perez-Sabater et al.’s (2008: 255) finding that prosody is usually not
typographically marked in OTCs from British newspapers has to be revised. In
addition, commentators indicate spoken modes of discourse by other means such
as (i) question tags, (ii) interjections and (iii) hesitation markers (or combinations
of these), all typically found in speech (cf. Chovanec 2008). Examples (51) to (54)
illustrate the first type, which commonly serves to form rhetorical questions or to
convey surprise.
(51) You’d fancy that run continuing this year, no? (ger_por_0906_guar)
(52) Motta reißt Kroos um, Italien bekommt Freistoß. Häh? (ger_ita_2806_spon)33
(53) Oh no they didn’t! Football eh? (ger_gre_2206_sun)
(54) Wenn man sowas übersteht, kann doch nichts mehr schiefgehen, oder? (ger_
por_0906_spon)34
The wide range of interjections found in the data fulfils a similar function of sim-
ulating spoken discourse. Again, they occur across all OTCs, as examples (55) to
(59) show.
30 Expressive punctuation, exemplified in (47), could also be added to the list of parlando pro-
sodics and may thus be seen as a characteristic register feature (cf. Sanchez-Stockhammer, this
volume).
31 Translation: But well, at least this means: IT’S ABOUT TO START!
32 Translation: Rooooooooney pays back.
33 Translation: Motta knocks Kroos down, Italy gets a free kick. Eh?
34 Translation: If you get over such a thing, nothing can go wrong, right?
Table 7 shows the relevant frequencies, and a “wordiness hierarchy” along the
lines of GUAR > SUN > SPON > BILD emerges, which suggests that the German OTCs
are shorter on average. Two aspects are worth considering here: in addition to the
textual commentary, the different OTCs rely on various other forms of presenta-
tion of match-related information, all allocated to different areas on the page or
reachable by clicking on a tab (see Section 2.2 above). Table 8 gives an overview
of the presence or absence of these features.
Textual commentary ✓ ✓ ✓ ✓
Team line-ups ✓ ✓ ✓ ✓
Live table ✗ ✓ ✓ ✓
Tactical formations ✗ ✓ ✓ ✗
Player positions/“heatmaps” ✗ ✓ ✗ ✓
Player ratings ✓ ✗ ✓ ✗
Referee statistics ✗ ✗ ✓ ✗
While Table 8 shows that some basic elements are common to all OTCs (match score
and goal scorers, team line-ups, statistics), it also illustrates a fundamental structural
split between GUAR and the remaining three OTCs. GUAR emerges as the
one with the fewest additional informational elements, which in turn necessitates a more
explicit, or “wordy”, style of reportage. The other OTCs, in contrast, rely more on
iconographic and tabular representations (see also Figures 6 and 7), which provides
a first explanation for the lower number of tokens in these.
A second decisive point is that GUAR focuses on the entertainment aspect (Chovanec
2010: 242), whereas the other three OTCs are more informational in the
sense that they provide an extended range of factual information and statistics.
This might also be the reason why the individual entries in the commentary
are short, as noted by Jucker (2010: 58–60).39 GUAR, in contrast, not only has
longer individual entries than the other OTCs, but also relies extensively on readers’
comments and replies by the commentator, which comprise up to one third of the
textual material (in number of words). Another characteristic feature of GUAR
is the incorporation of pictures, video clips and links only indirectly related to
the actual match, which serve mainly to support the entertainment function.
The other OTCs do not incorporate audience participation at all (SUN, BILD) or
do so in a more direct manner, via Twitter messages displayed next to the main
commentary (SPON), thus creating another layer of commentary (see Section 2.3
above), which breaks the uni-directionality of the communication.
39 However, the length of individual entries (in words) varies considerably across the OTCs, ranging
from just a few words (e.g. Ecke Deutschland ‘corner Germany’; ger_por_0906_bild) to more than
125 tokens.
Word count   GUAR     SUN      BILD     SPON
AVG          4746.5   3147.2   3059.8   2528.0
AVG ENG      5646     3525.5   3132.8   2054.5
AVG GER      3847     2768.8   2986.8   3001.4
Figure 8: Overall average word count and average word count by team playing (AVG = overall average;
AVG ENG = average of England match reports; AVG GER = average of Germany match reports)
Both GUAR and SPON are more extensive in their coverage of the “home” team
(these commentaries comprise almost half as many words again as those on the
respective other team, which are in turn about one third shorter), while this tendency
is less clear for SUN (approximately one quarter more words for England matches)
and even slightly reversed for BILD. Thus, despite claims that audiences of new media
are “potentially global” (O’Keeffe 2006: 16), this finding indicates some kind of persisting
“national allegiances”.
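The relative differences between home-team and other-team coverage can be recomputed from the averages in Figure 8. The short Python check below divides each OTC's average for its home team's matches by its average for the other team's matches; treating England as the home team for GUAR and SUN and Germany for SPON and BILD is an assumption based on each outlet's country of publication. It also confirms that each overall average is the mean of the two team averages.

```python
# Average word counts per match report, copied from Figure 8:
#          (AVG,    AVG ENG, AVG GER)
averages = {
    "GUAR": (4746.5, 5646.0, 3847.0),
    "SUN":  (3147.2, 3525.5, 2768.8),
    "BILD": (3059.8, 3132.8, 2986.8),
    "SPON": (2528.0, 2054.5, 3001.4),
}
# Assumed home team of each outlet (country of publication)
home = {"GUAR": "ENG", "SUN": "ENG", "BILD": "GER", "SPON": "GER"}


def home_ratio(otc: str) -> float:
    """Home-team average word count divided by the other team's average."""
    _, eng, ger = averages[otc]
    return eng / ger if home[otc] == "ENG" else ger / eng


for otc, (overall, eng, ger) in averages.items():
    # The overall average equals the mean of the two team averages (to rounding)
    assert abs(overall - (eng + ger) / 2) < 0.1
    print(f"{otc}: home/other = {home_ratio(otc):.2f}")
```

The resulting ratios are roughly 1.47 for GUAR and 1.46 for SPON, 1.27 for SUN and 0.95 for BILD, i.e. the reverse tendency for BILD shows up as a ratio just below 1.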
Turning to the lexicon and collocations, the analysis above revealed that
content and function vocabulary are broadly comparable across languages.
Equally, OTCs of all types rely on formulaic language, as could be expected
from earlier research on football discourse. From a quantitative perspective,
however, English OTCs tend to use these combinations more than
German OTCs, in particular when referring to the location of the action on the pitch.
Other commonalities are, first, the use of slang terms and informal items typical
of football language in general. Second, a comparison of the type-token ratios
did not yield any significant differences. Thus, one of the points mentioned above,
namely that this particular register has a restricted lexical range and that
OTCs associated with yellow press papers (SUN, BILD) in particular are “simple” as regards
lexical content, has to be qualified to a certain extent.
An area where the OTCs clearly diverged along the dimension “intended
audience” emerged in the keyness analysis. Both the English and the German
OTCs showed some internal differentiation, namely a higher salience of war-related
metaphors in SUN and a higher salience of dialectal and jargon vocabulary in BILD.
Given the quantitative evidence, it is highly unlikely
that this is a chance finding. Rather, it may be interpreted as an adaptation of
the SUN and BILD commentators to the alleged language use of their intended
readership. Whether this adaptation is deliberate or intuitive remains a matter
of speculation. Puns on players’ names and creative ad-hoc formations can be
found across all OTCs, however.
Discourse features represent a further area where differences and similarities
could be observed. On the one hand, the salience of football- and culture-related
intertextual references as identified by Chovanec (2008) for GUAR OTCs could
also be traced in the other OTCs considered, thus representing another uniting
feature. However, these references are most frequent in GUAR and SPON, sug-
gesting that both the creation of an in-group atmosphere and the often-related
entertainment aspect are more important in the quality-press-related OTCs. On
the other hand, the present study confirmed and extended earlier research positing
the staging of orality as a trademark feature of OTCs, showcasing the creative
manipulation of the restrictions of the written medium, while no cultural specificity
of this phenomenon can be claimed on the basis of the present data (see Perez-Sa-
bater et al. 2008: 256 for a comparison of English, Spanish and French).
Finally, with regard to the interaction between the textual commentary and
other elements of the OTCs, it was evident that all OTCs apart from GUAR rely on
an extended range of supplementary features (mainly tabular and iconographic),
while GUAR may compensate for this lack of factual information with a more
40 This could also include a case study focusing on the linguistic properties and functions of the
“twitterese” mentioned above.
41 Translation: Martin Olsson prevails against Walcott and Johnson on the left with a great solo.
While the present study offered a select comparison of German and English
OTCs, an analysis including even more OTCs from other languages and intended
audiences may help to establish a more fine-grained typology of OTCs world-
wide, potentially also considering diachronic developments. In this connection,
it remains to be seen whether audience participation, found to be relatively
restricted in the present study, will play a more important role in the future and
whether further technological developments (e.g. in terms of an integration of TV
and OTC reportage) will have an impact on the style of reporting.
References
Bateman, John A. 2012. Multimodal corpus-based approaches. In Carol A. Chapelle (ed.), The
encyclopedia of applied linguistics, 3983–3991. Oxford: Wiley-Blackwell.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: CUP.
Brandt, Wolfgang & Regina Quentin. 1983. Zeitstruktur und Tempusgebrauch in
Fussballreportagen des Hörfunks [Temporal structure and tense use in radio football
reportage]. Marburg: Elwert.
Burkhardt, Armin. 2010. Abseits, Kipper, Tiqui-Taca: Zur Geschichte der Fußballsprache in
Deutschland [Offside, keeper, tiki-taka: The history of football language in Germany]. Der
Deutschunterricht 62(3). 2–16.
Chovanec, Jan. 2008. Enacting an imaginary community: Infotainment in on-line minute-
by-minute sports commentaries. In Eva Lavric, Gerhard Pisek, Andrew Skinner & Wolfgang
Stadler (eds.), The linguistics of football, 255–268. Tübingen: Narr.
Chovanec, Jan. 2009. ‘Call Doc Singh’: Textual structure and coherence in live text sports
commentaries. In Olga Dontcheva-Navratilova & Renata Povolná (eds.), Coherence and
cohesion in spoken and written discourse, 124–137. Newcastle: Cambridge Scholars.
Chovanec, Jan. 2010. Online discussion and interaction: The case of live text commentary. In
Leonard Shedletsky & Joan E. Aitken (eds.), Cases on online discussion and interaction:
Experiences and outcomes, 234–251. Hershey: IGI Global.
Chovanec, Jan. 2011. Humor in quasi-conversations: Constructing fun in online sports
journalism. In Marta Dynel (ed.), The pragmatics of humour across discourse domains,
243–264. Amsterdam: Benjamins.
Dürscheid, Christa. 1999. Zwischen Mündlichkeit und Schriftlichkeit: Die Kommunikation
im Internet [Between speech and writing: Communication on the Internet]. Papiere zur
Linguistik 60(1). 17–30.
Fairclough, Norman. 1995. Media discourse. London: Arnold.
Ferguson, Charles A. 1983. Sports announcer talk: Syntactic aspects of register variation.
Language in Society 12(2). 153–172.
Gerhardt, Cornelia. 2006. Moving closer to the audience: Watching football on television.
Revista Alicantina de Estudios Ingleses 19. 125–148.
Ghadessy, Mohsen. 1988. The language of written sports commentary: Soccer – a description.
In Mohsen Ghadessy (ed.), Registers of written English: Situational factors and linguistic
features, 17–51. London: Pinter.
Press Gazette. 2013. UK national newspaper sales: Relatively strong performances from Sun
and Mirror. http://www.pressgazette.co.uk/uk-national-newspaper-sales-relatively-strong-performances-sun-and-mirror
(accessed 21 May 2013).
Rayson, Paul. 2008. From key words to key semantic domains. International Journal of Corpus
Linguistics 13(4). 519–549.
Santini, Marina, Alexander Mehler & Serge Sharoff. 2010. Riding the rough waves of genre on
the web: Concepts and research questions. In Alexander Mehler, Serge Sharoff & Marina
Santini (eds.), Genres on the web: Computational models and empirical studies, 3–30.
Dordrecht: Springer.
Schmidt, Thomas. 2007. The Kicktionary: A multilingual resource of the language of football.
In Georg Rehm, Andreas Witt & Lothar Lemnitzer (eds.), Data structures for linguistic
resources and applications, 189–196. Tübingen: Narr.
Siever, Torsten. 2011. Texte i. d. Enge: Sprachökonomische Reduktion in stark raumbegrenzten
Textsorten [Constricted texts: Language-economical reduction in heavily space-constrained
text types]. Frankfurt am Main: Lang.
Simons, Anton. 2011. Journalismus 2.0 [Journalism 2.0]. Konstanz: UVK.
Thurman, Neil & Anna Walters. 2013. Live blogging: Digital journalism’s pivotal platform. Digital
Journalism 1(1). 82–101.
Wells, Matt. 2011. How live blogging has transformed journalism: The benefits and the
drawbacks of the open-to-all digital format. http://www.guardian.co.uk/media/2011/mar/28/live-blogging-transforms-journalism
(accessed 13 April 2013).
Appendix
1 Introduction1
The linguistic analysis of registers/genres/text types in a language has always
been controversial, possibly because of the intangible status of such key concepts
(see Schubert, this volume). As Swales (1990: 33) points out when referring
specifically to genres, “[t]he word [‘genre’] is highly attractive – even to the Parisian
timbre of its normal pronunciation – but extremely slippery”. A first terminological
remark thus seems in order here as regards the definition of ‘register’,
which constitutes the research topic of this study. Following, for example, Taavitsainen
(2001), who maintains that genres are based on “external evidence in the
context of culture” (140; my italics), where “external evidence” refers to the conventions
that have become institutionalised “so that they can function […] as ‘horizons
of expectation’ for readers to know what to expect and models of writing
for authors” (141), I will use ‘genre’ when referring exclusively to the cultural and/or
social dimension of a given textual category. ‘Register’ will be used here with
a focus on the way in which the internal linguistic features of texts are codified
in a given text or category of texts, which matches Taavitsainen’s (2001: 141) term
‘text type’. Even though text types and genres commonly go hand in hand, since
the linguistic characterisation of a textual category prototypically leads to the
latter’s conventionalisation and specialisation in fulfilling a certain discursive,
communicative or social function, Taavitsainen herself recalls Fairclough’s (1992:
126) claim that a “genre [on occasions] implies not only a particular text type, but
also particular processes of producing, distributing and consuming texts”, which
broadens the notion of genre and covers elements which lie beyond the scope of
this chapter.
1 I am grateful to the following institutions for generous financial support: the Spanish Ministry
of Economy and Competitiveness and the European Regional Development Fund (grant no.
FFI2013-44065-P), and the Autonomous Government of Galicia (grant no. GPC2014/060).
Such a lack of definition of concepts such as register, genre or text type has led
to multifaceted studies in this area, adopting a number of different theoretical
frameworks. On some occasions, linguists have addressed the linguistic analysis
of registers by focusing on the core or prototypical communicative purposes
attributed to these in (quite often traditional) stylistics. For example, Swales
(1990: 46) notes that “[t]he principal criterion that turns a collection of commu-
nicative events into a [register] is some shared set of communicative purposes”.
In Halliday’s (1978: 122) Systemic Functional Grammar, registers (genres, in their
terminology) are analysed in terms of three variables: their content (or ‘field’), the
participants (‘tenor’) and the channel of communication (‘mode’), that is, three
dimensions which focus on the communicative elements and purposes involved
in a given register. On other occasions, in an approach that will be used in the
present chapter, the study of registers has been addressed by focusing on
empirically observable stylometric features (e.g. type-token ratios, length of syllables,
words, sentences, paragraphs), which are themselves said to reflect higher-level
concepts such as lexical or syntactic complexity, lexical richness
and ornamentation, etc. In Biber and Conrad (2009) the two basic approaches
just summarised, which I refer to, respectively, as the ‘communicative’ and the
‘language-based’ views, are embodied in a taxonomy which identifies three
perspectives on text varieties (see, for a brief overview, their Table 1.1): (i) style,
which analyses aesthetic and authorial preferences in a given text or group of
texts; (ii) genre, which focuses on the conventional linguistic devices specific to
Diachronic register analysis of markedness 309
a text variety (e.g. ‘genre markers’ such as Dear Sir in a letter); and (iii) register,
which, as already pointed out, deals with the linguistic characteristics common
within a text variety ‒ and also with the situation of use of the variety as will be
argued later. The taxonomy is described in more detail in Dorgeloh (this volume;
Section 2 in particular) and Schubert (this volume).
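The ‘language-based’ view relies on surface measures of the kind listed above. As an illustration only, the Python sketch below computes three of them (type-token ratio, mean word length in letters, mean sentence length in words) for a plain-text sample. The regex-based word and sentence splitting is a simplifying assumption rather than the procedure used in this chapter, and syllable length is omitted because it requires a pronunciation dictionary or a syllabification heuristic.

```python
import re
from statistics import mean


def stylometric_profile(text: str) -> dict[str, float]:
    """Return a few surface measures often used as register indicators."""
    # Naive sentence split: after '.', '!' or '?' followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words: runs of ASCII letters, optionally with one internal apostrophe
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not words:
        raise ValueError("no words found")
    return {
        "type_token_ratio": len({w.lower() for w in words}) / len(words),
        "mean_word_length": mean(len(w) for w in words),
        "mean_sentence_length": len(words) / len(sentences),
    }


print(stylometric_profile("The cat sat. The cat ran away!"))
```

Such raw measures are only a starting point; in the multidimensional approach they are interpreted in terms of the functions they jointly serve rather than in isolation.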
So far I have equated register with the language-based characterisation of
a given textual category. In this scenario, a further dimension of register must
be brought into play. In line with previous proposals couched in the multidimensional
tradition, Biber and Conrad (2009: 6) claim that the linguistic characteristics
of the textual categories, materialised by means of pervasive and frequent
linguistic features, are “well suited to the purposes and situational context of the
register”. Accordingly, this chapter adheres to such a two-fold view of text varieties,
that is, both language-based and situational, and, within a register-centred
approach (as suggested in, for example, Biber 1995a: 1), focuses on the study of a
number of texts in an attempt to explore register variation over the course of the
history of English. On the one hand, I will describe a number of textual categories
by exploring their dependency on a list of structural features, thus adhering to
what is commonly understood by ‘text type’, that is, “grouping of texts that are
similar in their linguistic form” (Biber 1988: 170) or, in other words, codifications
of linguistic features (Taavitsainen 2001: 141). On the other hand, I will connect
the language-based characteristics of the texts with their situational interpretation,
thus accepting, for example, Virtanen’s (2010: 57) claim that such linguistic
features “clearly relate to the form that [discourse functions] will take through
aggregates of linguistic exponents of the particular text strategies that are associated
with them”. The situational interpretation (or, more precisely, the functional
interpretation) of the linguistic characteristics of a given text type will lead to the
latter’s status as a ‘register’, in Biber’s terminology. This approach departs from,
for example, Dorgeloh and Wanner’s (2010: 10) terminological account, summarised
in Figure 1, where ‘register’ is used as a cover term for text type, genre and
style, and instead adopts a twofold characterisation of register which mainly comprises
what Dorgeloh and Wanner call text type and genre.
310 Javier Pérez-Guerra
Figure 1: Register, text type, genre and style in Dorgeloh and Wanner (2010)
This chapter will focus on register variation and, more specifically, on the relevance
of syntax for this issue. In this respect, Dorgeloh and Wanner (2010)
observe that register is “language variation beyond the limits of semantic equivalence,
which is why syntax […] provides a promising area of study” (8) and that
“[i]t is form, and here morphosyntactic form in particular, that constitutes ‘a prior
condition for reasoning about [register]’” (9). Against this background, and in the
spirit of Biber’s (1988, 1995a) groundbreaking multifactorial, multidimensional
model, this study will combine the two main approaches to the analysis of registers
already mentioned, that is, the communicative and the more language-based (syntactic)
standpoints, in that findings from the latter will be associated with a corresponding
functional interpretation (or dimensional interpretation, as Biber puts it).
In other words, by investigating the spread of a number of objectively identified
linguistic constructions in a selection of registers, and by interpreting the statistical
results of (co-)occurrence, this study will not only shed some light on the
functional interpretation of registers but also detect diachronic variation across
them. Furthermore, this chapter will suggest a link between syntactic
markedness and the degree of (functional) conventionalisation or specialisation
of registers.
This paper, then, focuses on the analysis of registers in English while also
describing variation in the recent history of the language. It also aims to apply
some of the assumptions of Biber’s model to syntactic strategies at a supra-phrasal
level. In Section 2, I will very briefly summarise the features of the multidimensional
model which inspired the study, and then present the case study and its
specific methodology. The results are discussed in Section 3. Section 4 summarises
the investigation and suggests some avenues for further research.
Diachronic register analysis of markedness 311
2.1 The linguistic variables
positions, relativisers which, who, etc. In this chapter, and this is what makes the
study particularly innovative, I concentrate on syntactic supra-phrasal variables,
specifically word-order phenomena, which cannot be determined by focusing on the
occurrence ratios of specific lexical elements. Following the multidimensional
model, these will be given so-called social or functional interpretations, which
will pave the way for the detection of diachronic variation in English as far as
sentence linearisation is concerned.
As regards the variables to be analysed here, I have focused on syntactic
markedness at the level of the clause. From (at least) a statistical standpoint,
the default organisational schema of a declarative clause in English is
subject-verb(-complement), this being the most versatile design of the clause from the
point of view of information structure and processing. Deviation from such a
schema implies some degree of markedness. In particular, in what follows I will
focus on three syntactic strategies which, first, lead to marked designs as far as
word order is concerned and, second, involve elements other than the subject in
sentence-initial position. Since this methodology aims to determine not only
linguistic but also social or situational variation in the language, I will follow
Virtanen (2004: 12) in her claim that “the sentence-initial slot itself constitutes a
rich source of discourse meanings precisely because of its cognitive relevance for
our processing capacities and memory constraints”. The three constructions are:
(i) Topicalisation (TOP), in which a (marked) constituent is in sentence-initial
position: example (1) below illustrates the topicalisation of the that-clause
object that I had received such from Edward;
(ii) Left dislocation (LFD), with a (marked) non-argument constituent in
sentence-initial position: in (2), the constituent he that thynkethe it a harde thynge
to agre to the conclusion is a left-dislocated noun phrase which corefers with the
pronominal object hym in the ensuing main clause;
(iii) What I call other ‘subject-last’ strategies (SUBJ-LAST), which contain
(marked) non-subject constituents in sentence-initial pre-verbal position. The
SUBJ-LAST strategy basically comprises examples of subject-verb inversion
and subject-extraposition: example (3) below illustrates subject-verb inversion,
with the subject complement very great in sentence-initial position and the
subject following the verb; example (4), in which the that-clause that for x. yeres
then next folowyng sevãll Comyssions of Sewers shuld be made to dyv~s p~sones
functions as the (logical) subject of the sentence and occurs in sentence-final
position, illustrates subject-extraposition.
(1) [That I had received such from Edward]i also I need not mention ∆i (Austen-180X,187.621)
[TOP]
(2) […] but [he that thynkethe it a harde thynge to agre to the conclusion,]i it behoueth
hymi to shew eyther that some false thynge hath gone before, (BOETHCO-E1-H,99.610)
[LFD]
(3) […] and very great was [my pleasure in going over the house and grounds]Subject.
(Austen-180X,168.182) [SUBJ-LAST, subject inversion]
(4) yt was enacted ordeigned and graunted by auctorite of the same p~liament, [that
for x. yeres then next folowyng sevãll Comyssions of Sewers shuld be made to dyv~s
p~sones]Subject, (Statutes(II):524) [SUBJ-LAST, subject extraposition]
As already pointed out, TOP, LFD and the so-called SUBJ-LAST constructions
investigated here have been chosen because they are syntactically marked: they
do not comply with the default subject-verb(-complement) design. In particular,
their markedness is basically due to the position occupied by the subject, which
is not clause-initial when constituents are topicalised (TOP) or left-dislocated
(LFD), when verbs and subjects swap positions (subject-verb inversion, a type of
SUBJ-LAST) or when the subject is placed in clause-final position
(subject-extraposition, another instance of the SUBJ-LAST construction). Since subject
placement is the trigger for these constructions, and in line with the above-mentioned
consequences which the unmarked placement of the subject has for the processing
and interpretation of clauses and sentences, in what follows I will provide a very
brief overview of the informative and/or communicative properties of the
strategies TOP, LFD and SUBJ-LAST.
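The position-based reasoning above (markedness follows from where the subject sits relative to the verb and to the start of the clause) can be illustrated with a toy classifier. This is a hypothetical sketch, not the procedure used in this study: it assumes a declarative clause has already been reduced to an ordered list of coarse function labels, and the labels SBJ, VFIN, OBJ, PRD, ADJ, LFD, RSP and EXPL are invented for the illustration rather than taken from the corpus tagset. It also does not distinguish fronted complements from fronted adjuncts.

```python
from typing import List

# Hypothetical coarse labels for this sketch only (not the corpus tagset):
# SBJ = (logical) subject, VFIN = finite verb, OBJ = object,
# PRD = subject complement, ADJ = adjunct, LFD = left-dislocated phrase,
# RSP = resumptive proform, EXPL = dummy 'it' filling the subject slot.


def classify_word_order(labels: List[str]) -> str:
    """Classify a declarative clause, given as an ordered list of
    hypothetical function labels, by the position of its subject."""
    if "SBJ" not in labels or "VFIN" not in labels:
        raise ValueError("clause needs a subject (SBJ) and a finite verb (VFIN)")
    subj = labels.index("SBJ")
    verb = labels.index("VFIN")

    # LFD: an initial dislocated phrase picked up later by a resumptive proform
    if labels[0] == "LFD" and "RSP" in labels[1:]:
        return "LFD"
    # SUBJ-LAST: the (logical) subject follows the finite verb
    if subj > verb:
        # a dummy 'it' before the verb signals extraposition; otherwise inversion
        if "EXPL" in labels[:verb]:
            return "SUBJ-LAST (extraposition)"
        return "SUBJ-LAST (inversion)"
    # default subject-verb(-complement) design: the subject comes first
    if subj == 0:
        return "unmarked"
    # the subject still precedes the verb, but a non-subject element comes first
    return "TOP"


if __name__ == "__main__":
    print(classify_word_order(["SBJ", "VFIN", "OBJ"]))         # unmarked
    print(classify_word_order(["OBJ", "ADJ", "SBJ", "VFIN"]))  # TOP, cf. (1)
    print(classify_word_order(["PRD", "VFIN", "SBJ"]))         # SUBJ-LAST (inversion), cf. (3)
    print(classify_word_order(["EXPL", "VFIN", "SBJ"]))        # SUBJ-LAST (extraposition), cf. (4)
```

The order of the checks matters: LFD is tested first because a dislocated phrase precedes the whole clause, and only then is the subject's position relative to the verb used to separate SUBJ-LAST from TOP and the unmarked order.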
First, TOP merits attention in register analysis because this syntactic strategy
involves a specific arrangement of the clause that is not only syntactic but also
informative. Following Virtanen (2004: 80–82) [my italics],
Starting points are assumed to be light, small in size, and consist of given information. The
reader’s main inferencing effort is expected to take place later in the sentence […]. Secondly,
elements placed at the outset of a sentence also help readers anticipate what is to come as
they pinpoint what the sentence is about and how it relates to the discourse topic (…).
Furthermore, it is occasionally profitable to start with what is regarded as ‘crucial information’
[…] Sentence-initial adverbials […] tend to form chains of text-strategic markers which have
two basic functions in the discourse. They help create coherence and at the same time they
signal text segmentation.

2 TOP, LFD and the constructions within the frame of the SUBJ-LAST strategy have been
approached from different perspectives in, for example, Virtanen’s (2010) qualitative scrutiny of
sentence openers in narrative texts and in Kreyer’s (2010) paper on sentence-initial locatives in
inversion constructions, in which a qualitative perspective on the description of the so-called
‘immediate-observer effect’ function is adopted.
Virtanen thus summarises the informative function of TOP, that is, introduc-
ing constituents which do not convey given information in a position which is
reserved for given elements according to the given-new principle. This analysis
of TOP is in keeping with Prince (1981: 128), who highlights the salient status
of topicalised constituents. Prince claims that TOP implies “inference on the
part of the hearer that the entity represented by the initial NP stands in a salient
partially-ordered set relation to some entity or entities already evoked in the dis-
course-model”. Furthermore, she contends that “if the entity evoked by the left-
most NP represents an element of some salient set, make the set-membership
explicit”.
Second, the discourse functions which have been attributed to LFD in the
literature can be reduced to two: (i) a ‘simplifying’ function, according to which a
constituent conveying discourse-new information can be placed in sentence-initial
position, and (ii) a ‘poset’ function. As for the simplifying function, Prince
(1997: 138–139) contends that LFD can “simplify discourse processing by removing
a Discourse-new entity from a position in the clause which favors Discourse-old
entities, replacing it with a Discourse-old entity (i.e. a pronoun)”. In the same
vein, Gundel (1985) and Geluykens (1992) claim that LFD introduces a new topic
into discourse. On the other hand, Prince (1997: 138–139) maintains that sentences
containing left-dislocated phrases “trigger an inference that the entity represented
by the initial NP stands in a salient partially-ordered set relation to some
entities already in the discourse-model”, and that this favours the so-called poset
function. In other words, the left-dislocated constituent resumes a number of
referents previously evoked in the discourse by introducing a new expression which
activates earlier (thus, informatively given or old) referents. In short,
like TOP, LFD implies the placement of a new constituent in sentence-initial
position, the main difference between TOP and LFD being that the former selects an
extralinguistic referent already evoked in discourse and marks it as informatively
salient, whereas LFD constituents seldom refer to topics which have already been
introduced in the discourse.
Third, as already stated, the SUBJ-LAST constructions involve examples of
subject-verb inversion and subject-extraposition, illustrated, respectively, in (3)
and (4) above. As regards subject-verb inversion, it is commonly acknowledged
in the literature (e.g. Green 1980: 583; Birner 1994: 241; Dorgeloh 1997: 46) that
the informative principle given-new is not at work in subject-verb inversion, since
the preverbal constituent conveys information which is salient in the discourse,
3 The data in Pérez-Guerra (2005: 350) confirm that the determinant of subject-extraposition
is not end-focus but end-weight. The strategy of extraposition is, then, redistributional in the
sense that its main role is to place long clausal subjects in final position and thus keep the
unmarked subject-verb(-complement) pattern free of non-prototypical material in
sentence-initial position.
The data for the present study were retrieved from the following corpora:
– the Penn-Helsinki Parsed Corpus of Middle English, second edition (1150–
1500; henceforth PPCME2; Kroch and Taylor 2000),
– the Penn-Helsinki Parsed Corpus of Early Modern English (1500–1710;
PPCEME; Kroch et al. 2004), and
– the Penn Parsed Corpus of Modern British English (1700–1914; PPCMBE;
Kroch et al. 2010).
The periods to be investigated are Middle (ME), Early Modern (EModE) and Late
Modern English (LModE), that is, the periods following the onset of the
process of word-order syntacticisation, or fixation, in English around the default
pattern subject-verb(-complement) in declarative clauses. These corpora were
selected because, first, they are multi-register and, as noted above, this
accommodates the need for representativeness. Second, they are parsed corpora
following (almost) identical parsing conventions, which make use of part-of-speech and
syntactic tagsets based on what we might call a shallow version of
Principles-and-Parameters. To give an example from the corpora, (5’) shows a graphical
adaptation of the parsed version of the sentence in (5) from PPCMBE:
(5) a serious cheerfulness; that is the right mood in this as in all cases.
(CARLYLE-1835,2,278.374)
for object, LFD for left-dislocated constituent and RSP for resumptive, that is, the
proform which corefers in the clause with the left-dislocated material).
LFD is explicitly annotated in the corpora, which means that the data can be
retrieved automatically by means of dedicated software, although the raw empirical
results of the search still had to undergo extensive manual revision. LFD was
retrieved by means of the (CorpusSearch) query in (6), which identifies clauses
(or IPs) dominating left-dislocated constituents.
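Query (6) is written in the CorpusSearch query language and is not reproduced here. Purely to illustrate what such a search does, the following hypothetical Python sketch parses a Penn-style labelled bracketing and returns the IP nodes that immediately dominate a constituent carrying an LFD function tag. Several things are assumptions: the label conventions (NP-LFD, NP-OB1-RSP, an outer unlabelled wrapper bracket with an ID node), the choice of immediate rather than general dominance, and the toy tree itself, which is an invented sentence and not a PPCMBE parse.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    label: str                       # e.g. "IP-MAT", "NP-LFD"; "" for the outer wrapper
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None       # filled in for terminal (tag + word) nodes only


def tokenize(text: str) -> List[str]:
    """Split a labelled bracketing into '(', ')' and label/word tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str], pos: int = 0) -> Tuple[Node, int]:
    """Parse the constituent opening at tokens[pos]; return it and the next index."""
    assert tokens[pos] == "(", "expected an opening bracket"
    pos += 1
    label = ""
    if tokens[pos] not in ("(", ")"):
        label = tokens[pos]
        pos += 1
    node = Node(label)
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
            node.children.append(child)
        else:
            node.word = tokens[pos]
            pos += 1
    return node, pos + 1


def iter_nodes(node: Node) -> Iterator[Node]:
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def is_lfd(label: str) -> bool:
    # assumed convention: a dash-separated LFD function tag, e.g. NP-LFD
    return "LFD" in label.split("-")[1:]


def ips_dominating_lfd(tree: Node) -> List[Node]:
    """Return every IP node that immediately dominates a left-dislocated constituent."""
    return [n for n in iter_nodes(tree)
            if n.label.startswith("IP") and any(is_lfd(c.label) for c in n.children)]


# Toy tree for illustration only (invented, not taken from PPCMBE).
TOY_TREE = ("( (IP-MAT (NP-LFD (D The) (N book)) (, ,) (NP-SBJ (PRO I)) "
            "(VBD read) (NP-OB1-RSP (PRO it)) (. .)) (ID TOY,1))")

if __name__ == "__main__":
    tree, _ = parse(tokenize(TOY_TREE))
    print([n.label for n in ips_dominating_lfd(tree)])  # ['IP-MAT']
```

As the surrounding discussion makes clear, a hit list of this kind is only a starting point: the raw results still have to be revised manually, for instance to set aside dislocations resumed by non-(pro)nominal items such as then, yet or so.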
(7) But of the tree of the knowledge of good and euill, thou shalt not eate of it:
(AUTHOLD-E2-H,II,1G.155)
By contrast, many of the examples parsed as LFD in the corpora which contain
non-(pro)nominal resumptives have not been considered in this study. Exam-
ples of such constructions are given in (8) to (10), in which the resumptives are,
respectively, then, yet and so:
(8) […] but if it worke vpon it selfe, as the Spider worketh his webbe, then it is endlesse,
(BACON-E2-H,1,20R.49)
(9) […] and though he suffer’d only the name of a slave, and had nothing of the toil and
labour of one, yet that was sufficient to render him uneasy; (BEHN-E3-H,193.231)
(10) And as these Languages ought to be well understood, so they shou’d be learn’d in as
short a Time as may be. (ANON-1711,3.6)
As regards TOP, which was not specifically tagged in the corpora used here, the
CorpusSearch queries in (11) and (12) were used to retrieve examples, respectively,
of topicalised complements (more specifically, nominal objects, subject predica-
4 A few examples from the database contain TOP and LFD of that-clauses. As regards LFD, since
such that-clauses are resumed by a (pro)nominal copy, they fit the concept of LFD as established
in this study. An example of a left-dislocated that-clause is given in (i):
(i) [That false Locks as they call them of some Hair, being by curling or otherwise brought to a
certain degree of driness, or of stiffness, will be attracted by the flesh of some persons, or
seem to apply themselves to it, as Hair is wont to do to Amber or Jet excited by rubbing.]i Of
thisi I had a Proof in such Locks worn by two very Fair Ladies that you know.
(BOYLE-E3-H,27E.93)
Table 1 provides the raw figures of the distribution of the three constructions
under analysis (the TOP data in Table 1 include only topicalised complements,
for reasons which will be explained below). Figure 2 sets out the frequencies for
LModE normalised to 1,000 clauses (or IPs):
(14) [After that a childe is come to seuen yeres of age,]Adjunct I holde it expedient that he be
taken from the company of women (ELYOT-E1-H,23.27)
The proportions of LFD, TOP and SUBJ-LAST were analysed in all the registers
in the corpora, namely Biography, Diary, Drama, Education, Fiction, Handbook,
History, Law, Letters, Philosophy, Science, Sermon, Religious treatises, Travelogue,
Trials and Romance. Owing to their archaic style and clausal organisation, I
did not include Bible texts. Also, given that comparison with Fiction texts
from the later periods is impracticable, the Fiction material in ME was not
analysed. Following Culpeper and Kytö’s (2010: 16–18) typology of registers, those
listed above can be argued to provide an overall view of the English language in
its recent history: (i) writing-related registers such as Science, Law, Education
and Religious treatises, that is, registers which are primarily attested in written
form; (ii) speech-purposed registers, designed to be articulated orally (either read
out or performed), like Drama and Sermons; (iii) speech-like texts such as Diaries,
Letters and Biographies, which contain features of “communicative immediacy”
(Culpeper and Kytö 2010: 17); and (iv) speech-based registers, based on actual
real-life speech events, here illustrated by the Trials.
The normalised frequencies of the three constructions in all the registers are
given for ME, EModE and LModE in, respectively, Tables 2, 3 and 4.
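Normalisation to 1,000 IPs is simple arithmetic: the raw count of a construction in a register is divided by the number of clauses (IPs) in that register and multiplied by 1,000, so that registers of very different sizes can be compared. A minimal sketch follows; the function names are illustrative, and the register names and counts in the usage example are invented placeholders, not figures from this study.

```python
from typing import Dict


def per_thousand_ips(raw_count: int, ip_total: int) -> float:
    """Frequency of a construction per 1,000 clauses (IPs)."""
    if ip_total <= 0:
        raise ValueError("the number of IPs must be positive")
    return raw_count / ip_total * 1000


def normalise(raw: Dict[str, Dict[str, int]],
              ips: Dict[str, int]) -> Dict[str, Dict[str, float]]:
    """Map register -> construction -> frequency per 1,000 IPs."""
    return {reg: {cx: per_thousand_ips(n, ips[reg]) for cx, n in counts.items()}
            for reg, counts in raw.items()}


if __name__ == "__main__":
    # Invented placeholder figures, NOT data from this study.
    raw = {"register_A": {"LFD": 12, "TOP": 30, "SUBJ-LAST": 45},
           "register_B": {"LFD": 3, "TOP": 18, "SUBJ-LAST": 9}}
    ips = {"register_A": 20_000, "register_B": 6_000}
    for reg, freqs in normalise(raw, ips).items():
        print(reg, {cx: round(f, 2) for cx, f in freqs.items()})
```

With these made-up figures, register_B has fewer raw TOP examples than register_A (18 vs 30) but the higher normalised frequency (3.0 vs 1.5 per 1,000 IPs), which is precisely why raw counts alone cannot be compared across registers.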
Table 3: Normalised frequencies (/1,000 IPs) of LFD, TOP and SUBJ-LAST in EModE
Table 4: Normalised frequencies (/1,000 IPs) of LFD, TOP and SUBJ-LAST in LModE
With a view to determining the statistical role of each construction in the periods
under investigation, Figure 3 below displays the frequencies of the three con-
structions and reveals that, in line with the syntacticisation of subject-verb(-com-
plement) word order in English, they all decrease considerably over time. More
Figure 3: Frequencies of LFD, TOP and SUBJ-LAST in ME (PPCME2), EModE (PPCEME) and LModE
(PPCMBE)
This section is organised as follows: Section 3.1 deals with the distribution of the
LFD data, Section 3.2 focuses on the analysis of the TOP examples from the
database and, finally, Section 3.3 considers the diachronic progression of the
SUBJ-LAST constructions under investigation.
Figures 4, 5 and 6 contain the normalised frequencies (per 1,000 clauses) of LFD
in, respectively, ME, EModE and LModE. Table 5 provides an overview of the
frequency of LFD per register.7
Figure 4: LFD in ME (the dotted line plots the mean normalised frequency)
7 In the columns containing the registers with lower/higher proportions of LFD, TOP and
SUBJ-LAST in, respectively, Tables 5, 6 and 7, I have included a selection of the registers occurring
either before (lower proportions) or after (higher proportions) the dotted line expressing the
mean normalised frequency of the distribution in the figures preceding the tables. As the figures
reveal, the classification of registers into those exhibiting more or fewer examples of the
constructions under investigation does not yield neat groups and, consequently, in order to
determine the connection between register and syntactic markedness I have considered only
those registers which are most representative for that purpose.
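The first step of the grouping described in footnote 7, placing each register on one side or the other of the mean normalised frequency (the dotted line in the figures), can be sketched as follows. This is a hypothetical illustration: the function name and the input values are invented, registers falling exactly on the mean are assigned to neither group by assumption, and the subsequent manual selection of the most representative registers is a judgement that the sketch does not attempt to model.

```python
from statistics import mean
from typing import Dict, List, Tuple


def split_at_mean(freqs: Dict[str, float]) -> Tuple[float, List[str], List[str]]:
    """Split registers into those below and those above the mean normalised
    frequency; a register exactly at the mean is put in neither group."""
    m = mean(list(freqs.values()))
    lower = sorted(r for r, f in freqs.items() if f < m)
    higher = sorted(r for r, f in freqs.items() if f > m)
    return m, lower, higher


if __name__ == "__main__":
    # Invented frequencies per 1,000 IPs, NOT data from this study.
    toy = {"register_A": 1.2, "register_B": 0.4, "register_C": 2.9, "register_D": 0.5}
    m, lower, higher = split_at_mean(toy)
    print(round(m, 2), lower, higher)
```

Because the arithmetic mean is sensitive to a single very frequent register (register_C here), most registers can end up below the line; this is one reason why such a split rarely produces the neat groups that the footnote notes are missing.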
8 The ascription of the historical registers under investigation to the more/less literate options
is based on stylistic pervasiveness within the text types. Even though the degree of stylistic
hybridity is noteworthy in some of the registers (see my comments in Section 4), in order to
determine connections between register and productivity of LFD, I have adhered to the ±literate
taxonomy by relying on the style which is dominant in the texts studied.
Following the outline in Section 3.1, Figures 7, 8 and 9 show the distribution of
TOP in, respectively, ME, EModE and LModE in the database. Table 6 summarises
the results by classifying the registers into those in which TOP is frequent and
those with low levels of TOP.
Figure 7: TOP in ME
Philosophy texts) in the group of registers containing fewer examples of TOP, and
more formal registers (Religious treatises and Law) with many more instances of
TOP. Nonetheless, Figures 8 and 9, which provide the information corresponding
to EModE and LModE respectively, and Table 6, with an overview of the prevailing
trends over time, reveal that TOP is no longer a textual marker in Modern
English, in that it is a frequent syntactic device found in registers like Law and
History, commonly classified as formal registers, as well as in Fiction or Diary,
which are indisputably less formal. The data thus make clear that TOP is textually
unmarked, that is, it does not act as a functional or situational marker.
As mentioned in Section 3.1, in an attempt to substantiate the connection
between the distribution of formal linguistic features and the situational or
functional status of registers, I would like to establish a link between the unmarked
textual condition of TOP, resulting from the analysis of the data, and the linguistic
characterisation of TOP as a syntactic device in English. Syntactically, TOP
involves the promotion of a constituent (either a complement or a modifier/adjunct)
to sentence-initial position, which does not imply the violation of the unmarked
subject-verb design of the English declarative clause. In Section 4 I will hold the
position that if a given linguistic feature (in this research, a construction type)
does not trigger a significant level of linguistic (here, syntactic) markedness,
then a clear functional or situational interpretation derived from the occurrence
of the feature will not necessarily be at work. What I will be hypothesising
later, although I am aware that this demands further research, is that linguistic
markedness runs parallel to consistent functional specificity. If this is indeed the
case, it would further emphasise the empirical relevance of multidimensional
approaches.
ME      Handbook, Law, Philosophy      Travelogue, Romance
Both the previous figures and Table 7 show that the frequencies of the SUBJ-LAST
constructions investigated in this study appear to be connected to the degree of
subject-involvement characteristic of the registers in the database. In registers such
as Law and Science (and many Handbooks in the LModE corpus), which proto-
typically avoid speaker/writer- or hearer/reader-oriented linguistic features, one
finds fewer examples of SUBJ-LAST constructions. By contrast, practically all the
registers in the rightmost column in Table 7 (Travelogue, Romance, Fiction, Diary,
Drama) would be described as subject-oriented registers in the traditional stylo-
metric literature and do contain many examples classifiable as SUBJ-LAST in this
structions which are syntactically most marked as far as word order is concerned
constitute hallmarks of well-defined situational interpretations of the registers
in which they occur with sufficient frequency. In this respect, since TOP does
not significantly alter the unmarked subject-verb(-complement) organisation of
the English clause, it has been shown not to trigger a register-specific situational
interpretation and, as already reported, has been defined as a textually
unmarked linguistic device. By contrast, the occurrence of LFD and SUBJ-LAST in
sentences which end up exhibiting syntactically marked word-order designs has
been related to specific situational interpretations: LFD evinces register literacy
and SUBJ-LAST is a marker of subject- or participant-involvement in a register.
The study concludes that word-order strategies can be added to the list of
linguistic features, units or variables on which register analysis can rely. This
notwithstanding, a final remark is in order here to acknowledge the high level of
heterogeneity in the registers which the statistical analysis of the texts has
identified. First, hybridity in registers is sometimes a formal or a linguistic issue. In
this respect, Biber and Finegan (1988: 3) recognise that for some registers “greater
linguistic differences exist among texts within the categories than across them”;
to give some examples, in this chapter I noted both the speech-related status of
some Philosophy texts and the differences in subject-involvement among modern
Handbooks. Second, as contended by writers such as Virtanen (2010: 58), who
says that “texts are seldom unitype; text types usually appear in embedded
hybridized forms, resulting in multiple texts”, the multidimensional model must
be able to encompass the existence of texts and even text types which are not
prototypical indicators of a given situational or textual interpretation. Finally, as
recognised in Biber and Conrad (2009: Chapter 7), hybridity also underlies the
classification of texts into registers. See also Biber and Egbert (this volume) for
an experiment on the classification of (mostly) hybrid internet registers. Virtanen
(2010: 76) also refers to this when she says that “[o]ne and the same text
type can be put to use in very different genres [registers], and one and the same
genre easily manifests texts that can be related to very different types”. The model
would thus benefit from the statistical analysis of individual texts by means of
factorial or logistic regression techniques.
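As a hedged illustration of this last suggestion, and not an analysis carried out in this chapter, the sketch below fits a plain binary logistic regression by batch gradient descent. It predicts a hypothetical two-way register grouping of individual texts (for instance, the ±literate distinction) from their per-text normalised frequencies of LFD, TOP and SUBJ-LAST. The function names, learning rate, number of epochs, feature values and labels are all invented for the example; factor analysis is not covered, and a real study would rely on an established statistics package with proper validation.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """P(y = 1 | x) for weights w = [bias, w_1, ..., w_k]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 5000) -> List[float]:
    """Fit [bias, w_1, ..., w_k] by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of the log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Invented per-text frequencies per 1,000 IPs: [LFD, TOP, SUBJ-LAST].
    # Hypothetical labels: 1 = text from a "more literate" register, 0 = otherwise.
    X = [[2.0, 3.0, 0.5], [1.8, 2.5, 0.7], [2.2, 3.1, 0.4],
         [0.3, 2.8, 2.5], [0.5, 3.0, 2.2], [0.2, 2.6, 2.8]]
    y = [1, 1, 1, 0, 0, 0]
    w = fit_logistic(X, y)
    for xi, yi in zip(X, y):
        print(yi, round(predict_proba(w, xi), 3))
```

Working with individual texts rather than register totals is what allows hybrid texts to show up: a text whose predicted probability sits near 0.5 is one that the chosen features do not clearly assign to either group.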
To conclude, two issues have been left for further research. On the one hand,
the validity of the findings in this study should be tested by extending the time
span of the investigation to include Present-Day English data. In this respect,
parsed corpora of contemporary English would provide empirical evidence bearing on the
issues raised in this chapter. On the other hand, a key issue in historical register
variation, one pointed out in Biber and Conrad (2009: 166), is the distinction
between language change and register variation. As recognised in Lijffijt et al.
(2012), the null assumption in diachronic textual studies has usually been that
References
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University
Press.
Biber, Douglas. 1995a. Dimensions of register variation. A cross-linguistic comparison.
Cambridge: Cambridge University Press.
Biber, Douglas. 1995b. On the role of computational, statistical, and interpretive techniques in
a multi-dimensional analysis of register variation. A reply to Watson. Text 15(3). 341–370.
Biber, Douglas. 2013. Register as a predictor of linguistic variation. Paper presented at
‘Register revisited: New perspectives on functional text variety in English’ International
Conference, University of Vechta, 27–29 June.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge
University Press.
Biber, Douglas & Jesse Egbert. This volume. Towards a user-based taxonomy of web registers.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999.
Longman grammar of spoken and written English. London: Longman.
Birner, Betty J. 1994. Information status and word order: An analysis of English inversion.
Language 70(2). 233–259.
Bolinger, Dwight. 1992. The role of accent in extraposition and focus. Studies in Language
16(2). 265–324.
Culpeper, Jonathan & Merja Kytö. 2010. Early Modern English dialogues: Spoken interaction as
writing. Cambridge: Cambridge University Press.
Dorgeloh, Heidrun. 1997. Inversion in modern English: Form and function. Amsterdam: John
Benjamins.
Dorgeloh, Heidrun. This volume. The interrelation of register and genre in the medical register.
Dorgeloh, Heidrun & Anja Wanner. 2010. Introduction. In Heidrun Dorgeloh & Anja Wanner
(eds.), Syntactic variation and genre, 1–26. Berlin: Mouton de Gruyter.
Fairclough, Norman. 1992. Discourse and social change. Cambridge: Cambridge University
Press.
Geluykens, Ronald. 1992. From discourse process to grammatical construction: On
left-dislocation in English. Amsterdam: John Benjamins.
Green, Georgia M. 1980. Some wherefores of English inversion. Language 56. 582–601.
Gundel, Jeanette K. 1985. ‘Shared knowledge’ and topicality. Journal of Pragmatics 9(1).
83–107.
Halliday, Michael A. K. 1978. Language as social semiotic. London: Edward Arnold.
Kreyer, Rolf. 2010. Syntactic constructions as a means of spatial representation in fictional
prose. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic variation and genre, 277–303.
Berlin: Mouton de Gruyter.
Kroch, Anthony & Ann Taylor. 2000. Penn-Helsinki Parsed Corpus of Middle English, second
edition.
Kroch, Anthony, Beatrice Santorini & Lauren Delfs. 2004. Penn-Helsinki Parsed Corpus of Early
Modern English.
Kroch, Anthony, Beatrice Santorini & Ariel Diertani. 2010. Penn-Helsinki Parsed Corpus of
Modern British English.
Lijffijt, Jefrey, Tanya Säily & Terttu Nevalainen. 2012. CEECing the baseline: Lexical stability and
significant change in a historical corpus. In Jukka Tyrkkö, Matti Kilpiö, Terttu Nevalainen
& Matti Rissanen (eds.), Studies in variation, contacts and change in English. Vol. 10:
Outposts of historical corpus linguistics: From the Helsinki Corpus to a proliferation of
resources. Helsinki: University of Helsinki (Research unit for Variation, Contacts and
Change in English). http://www.helsinki.fi/varieng/series/volumes/10/lijffijt_saily_nevalainen
(accessed 9 February 2015).
McCawley, James D. 1988. The syntactic phenomena of English. Vols. 1, 2. Chicago: The
University of Chicago Press.
Pérez-Guerra, Javier. 2005. Word order after the loss of the verb-second constraint or the
importance of Early Modern English in the fixation of syntactic and informative (un-)
markedness. English Studies 86(4). 342–369.
Prince, Ellen F. 1981. Topicalization, focus-movement, and Yiddish-movement: a pragmatic
differentiation. In Danny K. Alford, Karen Ann Hunold & Monica A. Macaulay (eds.),
Proceedings of the Seventh Annual Meeting of the Berkeley Linguistics Society, 249–264.
Berkeley: Berkeley Linguistics Society.
Prince, Ellen F. 1997. On the functions of left-dislocation in English discourse. In Akio Kamio
(ed.), Directions in functional linguistics, 117–144. Philadelphia: John Benjamins.
Schubert, Christoph. This volume. Introduction: current trends in register research.
Swales, John M. 1990. Genre analysis: English in academic and research settings. Cambridge:
Cambridge University Press.
Taavitsainen, Irma. 2001. Changing conventions of writing: The dynamics of genre, text types,
and text traditions. European Journal of English Studies 5(2). 139–150.
Takahashi, Kunitoshi. 1992. Constructionally presentational sentences. Lingua 86. 119–148.
Virtanen, Tuija. 2004. Point of departure: Cognitive aspects of sentence-initial adverbs. In
Tuija Virtanen (ed.), Approaches to cognition through texts and discourse, 78–97. Berlin:
Mouton de Gruyter.
Virtanen, Tuija. 2010. Variation across texts and discourses: Theoretical and methodological
perspectives on text type and genre. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic
variation and genre, 53–84. Berlin: Mouton de Gruyter.
Index
academic writing 4, 8, 10, 137–138, 139–165, 169–191, 195, 200–201, 206–211, 215–217, 221, 223–247, 251, 254–269
air traffic control communication 67–73, 75, 79–80, 82–83
air traffic management 69–70, 75, 79
attention system 169, 178–179
attenuation effect 172, 190
audience participation 282, 284, 296, 300, 302
automatic register/genre identification 20, 23, 39
Aviation English 10, 17, 67–83

brackets 137, 139, 142, 147, 150–153, 156–157, 160–162, 164–165, 173

cognitive linguistic(s) 111, 113, 130
cognitive representation 113, 121, 124, 129–130
cognitive semantics 169–170, 173, 176, 178, 191
cohesion/cohesive 3, 114–115, 120, 138, 196, 202–205, 209, 213–218, 245
comic 2, 10, 137, 139, 153–165
comma 139–140, 142, 150–157, 160, 164, 185
conceptual metaphor/conceptual metaphor theory (CMT) 88, 221, 224–226, 227, 230–234, 238, 240, 246
conjunction 138, 195, 203–205, 209–215, 217, 311
contrastive linguistics 7, 10
cross-cultural/cross-linguistic register 2, 7, 222, 268, 271, 273–274, 298, 301

description 9, 24, 26, 28–38
diachronic 1, 5–6, 10, 88, 221–222, 307–334
dialect 2, 4, 6–7, 82–83, 102, 196–197, 237, 288, 290, 299
discourse hybridity 17, 44, 49, 52, 55–56, 62
discourse type 44, 47, 62
discussion 24–25, 28–29, 31–32, 34, 37–38
dislocation 222, 307, 312, 323, 332, see also left dislocation
divided attention 189–191
double referentiality/doubly referential 122, 131–132
dual nature 181, 189

electronic communication 275
electronic media 276, see also medium, electronic medium
electronically-mediated 271, 278, 300
exclamation mark 118, 139, 142, 147–153, 155–158, 162–164
extraposition 9, 222, 307, 312–315, 329, 332

face-to-face conversation 2, 72–73, 83, 255
football language 272, 274, 279–280, 282, 286, 288, 299–300
frame 130–133

genre 1, 2, 4–5, 8, 17, 20–21, 23, 43–62, 88, 95, 123, 142, 163, 189, 227, 253, 271, 275, 300, 307–309, 333

hip-hop 10, 17–18, 87–109
hybrid(ity) 33, 43–45, 49–53, 55–59, 271, 325, 333
––hybrid register 19, 22–23, 27–28, 30–32, 36–40, 44, 62, 222, 272

ICE see International Corpus of English (ICE)
illness blog 17, 43, 48–52, 58–61
infotainment 271–272
intercultural communication 2
International Corpus of English (ICE) 138, 195–196, 199–201, 203, 205, 218, 221, 223, 228–229, 231–234, 237–247, 251, 257, 261–269, 288
internet/web 9–10, 17, 19–40, 50–52, 89, 92, 113, 151, 163, 218, 222, 230, 247, 271–302, 333
intertextual/intertextuality 18, 111–133, 292, 299, 301
inversion 100, 148, 307, 312–315, 329, 332

left dislocation (LFD) 312–326, 328, 332–333
lexical density 138, 195, 203, 205, 208–217, 288
lyrics 18, 25–26, 30–31, 87–109

marked(ness) 49, 57, 83, 118, 129, 152, 155, 158, 161, 200, 222, 307–334
MDA see multidimensional analysis (MDA)
medical case report 17, 50, 52–53, 57, 59, 61
medical discourse 17, 43–62
medium 4, 6–7, 9, 57, 72, 91, 114, 138, 145, 170, 172–175, 180–181, 195–219, 222, 279, 294
––electronic medium 275, 300
––medium of print 176, 188–189
––spoken medium 181, 215
––written medium 174, 182, 196, 202, 210, 293, 299, 325
metaphor(ical) 2, 5, 10, 88, 119, 187, 189, 221, 223–247, 290, 299, 301, see also conceptual metaphor/conceptual metaphor theory (CMT)
multidimensional 6–7, 139, 164, 253, 307, 309–312, 329, 332–333
multidimensional analysis (MDA) 4, 6, 9–10, 253, 326

narration 9, 31, 37–38, 44, 48, 52, 56–59, 139, 154, 300
narrative/narrativity 24–25, 28–40, 43–45, 47–62, 160, 177, 272, 274–275, 277, 328
New English(es) 7, 10, 221, 223, 252, 257, 268
newspaper writing 2, 139, 147–148, 222, 293
noun phrase (NP) 9–10, 103, 114, 129–130, 144, 221, 251–269, 312, 314, 316, 318
noun phrase complexity/NP complexity 221, 251–269

opinion 24–26, 29, 31–40, 177, 284, 300
OTCs see real-time online text commentaries (OTCs)

paratext(ual) 274, 279, 295, 300
parenthetical construction 137, 151–152, 170–175, 179–186, 189–191
persuasion 9, 26, 28, 30–33, 36–39, 265
plain Aviation English 10, 17, 67–83
popular music/pop songs 2, 18, 87–89, 91–92, 94–96, 98–99, 105, 125, 127–128
pronominal reference 56, 59–60, 202–203, 205, 216, 312
pronoun 9, 49, 56, 59, 93, 114, 120, 138, 143, 147, 195, 203–212, 214–216, 227, 252, 255–258, 314, 317
––personal pronoun 60, 88, 93, 103–105, 114, 185, 203, 206–207, 255–256, 261–262, 265–266
punctuation 90, 119, 137, 139–165, 171, 173, 175, 183

quasi-conversation 282, 284, 300
question mark 118, 142, 146–153, 155–158, 162–165

raters 19, 23, 27, 30–39
real-time online text commentaries (OTCs) 2, 10, 222, 271–302
reference/referential 59, 122–124, 131–132, 180, 186, 203
regional variation 6–7, 10, 138, 196, 215, 219, 251–252, 255, 261–262, 268

SFL see Systemic Functional Linguistics (SFL)
sociolect 3
sociolinguistic approach 3, 6–7, 112, 138
specialised registers 10, 17, 67–83
spoken mode 9, 24–25, 137, 172–173, 179, 181, 189, 255, 265, 275, 293
standard(s) of textuality 115, 122
standardised phraseology 17, 67, 70–83
style 1, 4–5, 123, 139, 145, 150, 155, 158, 163, 179, 183, 210–211, 262, 271, 273–279, 284, 295, 300–302, 308–310, 320
sub-register 4, 17–18, 19–40, 44, 73–74, 88, 105, 111, 122–133, 142, 176, 221, 223–247, 257
suspension dots 142, 152, 154, 156–157, 160–165
synchronic 221
Systemic Functional Linguistics (SFL) 3, 8, 169, 196–198

teaching 8, 87
text 1–5, 8–9
time adverbials 49, 56–59, 143
topic 3, 8, 33, 48–50, 58, 60–62, 74, 80, 82–83, 92–95, 114, 176–177, 237–238, 242, 272
topicalisation (TOP) 222, 307, 312–334
Twitter 271–274, 283–284, 296, 301

unmarked 60, 119, 129, 152, 313, 315, 326, 329, 332–333

variational text linguistics 1, 221
variety 2–11, 23, 40, 43–62, 71–78, 82–83, 138–139, 142, 158, 195–219, 221, 223–247, 251–269, 272, 308–309, 332
––regional variety 1, 195, 217–218, 221, 251–254, 262, 267–268
––text(ual) variety 1, 7, 44, 50, 54, 56, 111, 227, 309, 332
variety-specific 198–199, 205, 231, 239–240, 242, 246–247, 259, 267, 269

web see internet/web
word order 9–10, 144, 222, 252, 307, 312, 316, 321–322, 332–333
World Englishes 6, 223–224, 252
written mode 24–25, 113, 172, 181, 190, 271