Variational Text Linguistics (2016)

Christoph Schubert and Christina Sanchez-Stockhammer (Eds.)

Topics in English Linguistics
Editors
Elizabeth Closs Traugott
Bernd Kortmann
Volume 90

Variational Text Linguistics
Revisiting Register in English
Edited by
Christoph Schubert
Christina Sanchez-Stockhammer

ISBN 978-3-11-044310-3
e-ISBN (PDF) 978-3-11-044355-4
e-ISBN (EPUB) 978-3-11-043533-7
ISSN 1434-3452
Library of Congress Cataloging-in-Publication Data

A CIP catalog record for this book has been applied for at the Library of Congress.
Bibliographic information published by the Deutsche Nationalbibliothek

The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data are available on the Internet at http://dnb.dnb.de.
© 2016 Walter de Gruyter GmbH, Berlin/Boston

Cover image: Brian Stablyk/Photographer’s Choice RF/Getty Images
Typesetting: fidus Publikations-Service GmbH, Nördlingen
Printing and binding: CPI books GmbH, Leck
♾ Printed on acid-free paper
Printed in Germany
www.degruyter.com

Acknowledgements
The foundations for this edited collection of articles were laid at the interna-
tional conference Register revisited: New perspectives on functional text variety in
English, which took place at the University of Vechta, Germany, from June 27 to 29,
2013. The aim of the present volume is to conserve the research papers and many
inspiring discussions which were stimulated then and to make them available to
a larger audience.
It was only possible to achieve this aim thanks to the help of many people
joining us in the effort. First and foremost, we would like to thank all contributors
for their continued cooperation in this project. Furthermore, we are very grate-
ful to the external peer reviewers who contributed their expertise to the selec-
tion and improvement of the contributions. These are (in alphabetical order):
Federica Barbieri (Swansea, Wales), Eniko Csomay (San Diego, USA), Jürgen Esser
(Bonn, Germany), Maria Freddi (Pavia, Italy), Christer Geisler (Uppsala, Sweden),
Bethany Gray (Ames, Iowa, USA), Joachim Grzega (Eichstätt, Germany), Thomas
Kohnen (Cologne, Germany), Rocío Montoro (Granada, Spain), Neal Norrick (Saar-
brücken, Germany), Caroline Tagg (Birmingham, UK), Sanna-Kaisa Tanskanen
(Helsinki, Finland) and Marija Zlatnar Moe (Ljubljana, Slovenia).
We are very happy that this volume appears in the series Topics in English Lin-
guistics (TiEL) and would like to thank the series editors Elizabeth Traugott and
Bernd Kortmann as well as Wolfgang Konwitschny, Julie Miess and Birgit Sievert
at de Gruyter Mouton for their invaluable support in the preparation of this book.
Needless to say, we are to blame for any remaining inadequacies.
Going back to the roots of this project, we would like to express our grat-
itude to the German Research Foundation/Deutsche Forschungsgemeinschaft
(DFG) for the generous funding of the conference as well as to the Kommission
für Forschung und Nachwuchsförderung der Universität Vechta, the Universitäts-
gesellschaft Vechta (UGV), the Volksbank Vechta and the city of Vechta for their
financial support and hospitality, which contributed immensely to the memorable
and pleasant atmosphere of the event.
Christoph Schubert and Christina Sanchez-Stockhammer

April 2016
Table of contents
Acknowledgements
Christoph Schubert
Introduction: Current trends in register research
Section I: Specialised registers
Douglas Biber and Jesse Egbert
Towards a user-based taxonomy of web registers
Heidrun Dorgeloh
The interrelationship of register and genre in medical discourse
Markus Bieswanger
Aviation English: Two distinct specialised registers?
Rolf Kreyer
‘Now niggas talk a lotta Bad Boy shit’: The register hip-hop from a corpus-linguistic perspective
Teresa Pham
The register of English crossword puzzles: Studies in intertextuality
Section II: Cross-register comparison
Christina Sanchez-Stockhammer
Punctuation as an indication of register: Comics and academic texts
Martina Lampert
Linking up register and cognitive perspectives: Parenthetical constructions in academic prose and experimentalist poetry
Stella Neumann and Jennifer Fest
Cohesive devices across registers and varieties: The role of medium in English
Section III: Regional, contrastive and diachronic register variation
Barbara Güldenring
Metaphors in New English academic writing
Steffen Schaub
The influence of register on noun phrase complexity in varieties of English
Valentin Werner
Real-time online text commentaries: A cross-cultural perspective
Javier Pérez-Guerra
Word order is in order here: A diachronic register analysis of syntactic markedness in English
Index
Christoph Schubert
Introduction: Current trends in register research
1 Research interest and goals of the volume

The discipline of text linguistics is firmly established as “any work in language
science devoted to the text as the primary object of inquiry” (de Beaugrande and
Dressler 1981: 14). Although there is a variety of theories and approaches in text
linguistics, common research issues are the definition of “text” in old and new
media, the formal and functional connections between sentences, typological
classifications of texts and processes in the production and comprehension of
texts (cf. Esser 2009: 20–21 and Schubert 2012: 29). Just as the new discipline of
“variational pragmatics”, which investigates contextual language use across regional
varieties of English, has been established in recent years (cf. Schneider and Barron
2008), the present volume aims to foster and further develop the discipline of
“variational text linguistics”. Since this new field of research covers both functional
and regional types of textual variation, it intends to provide novel insights
into the multi-faceted concept of “register”. Along the lines of Biber and Conrad’s
monograph Register, Genre, and Style (2009: 6), we regard “register analysis” as
a perspective on text variety which investigates context-dependent communica-
tive functions of characteristic lexico-grammatical features in discourse. Thus,
quantitative results based on adequate corpora are here combined with qualita-
tive assessment. We approach the subject of “register” from a wide perspective,
incorporating stylistics, variational linguistics and discourse analysis, so that
convergences and synergistic effects between disciplines become obvious.
In recent years, other volumes dedicated to textual variety have placed
emphasis on different research foci, which may be illustrated by three examples:
the essay collection by Dorgeloh and Wanner (2010) is interested in textual variety
in English exclusively from the perspective of syntactic parameters and it inves-
tigates genre rather than register. In the volume by Andersen and Bech (2013),
genre variation is only one parameter next to diachronic variation in time and
geographical variation in space. Moreover, the three types of variation are largely
discussed separately, and the editors’ main interest lies in corpus development
and analysis. The book by Szmrecsanyi and Wälchli (2014) not only discusses
register and dialectology but also includes language typology and therefore
comprises articles on a number of languages such as Dutch or members of the Slavic
family. Yet, they also formulate the central diagnosis that “[e]ven though
dialectologists, register analysts, typologists, and quantitative linguists all deal with
linguistic variation, there is astonishingly little interaction across these fields”
(Wälchli and Szmrecsanyi 2014: 1).
Christoph Schubert, University of Vechta
In general, register analysis offers a constantly widening range of research
opportunities because of the ever-increasing possibilities of communication,
mainly triggered by the advent of modern communication technologies. As the
main body of linguistic research has concentrated on well-established and fre-
quent registers such as newspaper writing or face-to-face conversations, many
descriptive and theoretical issues have not yet been sufficiently investigated.
Accordingly, the report on major register studies in Biber and Conrad (cf. 2009:
271–295) reveals that research on specialized registers has had a clear preference
for academic and newspaper texts. In particular, the language of popular genres
such as pop music, comics or puzzles has hardly been investigated so far, and
there are several forms of electronic communication, such as online text com-
mentaries, which need to be described more closely. Hence, by giving room to the
description of registers which have not received an appropriate amount of atten-
tion so far, we intend to point out emerging trends as well as new directions for
future research. By means of cross-cultural comparisons of registers, the volume
aims to build bridges to neighbouring disciplines such as cultural studies, espe-
cially with regard to intercultural communication. By pointing out the ubiquitous
nature of register, we also intend to show that adequate register choice is not a
marginal phenomenon but a fundamental prerequisite for successful communi-
cation in specific social situations.
2 Definitions of “register”
As far as the semantic origin of the term “register” is concerned, the linguistic
use of the term represents a metaphorical borrowing from the domain of music,
in particular organ playing (cf. Renkema 2004: 146), where it refers to a “sliding
device controlling a set of organ-pipes which share a tonal quality” or “the
compass of a voice or musical instrument; a particular range of this compass”
(Trumble and Stevenson 2002: 2514), so that it is common to speak of “the upper/
middle/lower register” (Summers et al. 2005: 1380) of a specific instrument.
Hence, in this analogy, “[l]anguage is seen to be regulated in the same way as the
musical tuning of an organ” (cf. Dittmar 2010: 223), and competent speakers of a
language have the ability to fine-tune their linguistic choices according to their
intended contextual functions.
As regards the semantic extension of the term register, it is worthwhile to con-
sider different subdisciplines of linguistics in more detail (cf. Gut and Schubert
2012: 4–6). Thus, it is striking that sociolinguistic approaches usually employ a
narrow definition of the term, reducing it to the language of occupations, such
as “the register of law”, “the register of medicine” and the like. Since the topic of
discourse is the central determining factor in this type of approach, it is mainly
the vocabulary that is responsible for the constitution of a register. The follow-
ing two quotations taken from standard introductions to sociolinguistics aptly
demonstrate this narrow notion of “register”.
Linguistic varieties that are linked […] to particular occupations or topics can be termed
registers. […] Registers are usually characterized entirely, or almost so, by vocabulary differ-
ences. (Trudgill 2000: 81)
Register is another complicating factor in any study of language varieties. Registers are sets
of language items associated with discrete occupational or social groups. Surgeons, airline
pilots, bank managers, sales clerks, jazz fans, and pimps employ different registers. (Ward-
haugh 2002: 51)
It is obvious that subject matters connected to certain types of activity are respon-
sible for the linguistic choices made by discourse participants in this type of
approach to “register”. Although the second quotation includes the term “social
groups”, this is conceptualized in a narrow way, excluding the language of social
classes in the sense of working- or middle-class sociolects.
In contrast to this narrow notion of “register”, a wide definition of the term
is employed by the tradition of Systemic Functional Linguistics (SFL), as can be
seen in the next two definitions taken from a classic introduction to cohesion and
a recent study on register variation.
The linguistic features which are typically associated with a configuration of situational fea-
tures – with particular values of the field, mode and tenor – constitute a register. (Halliday
and Hasan 1976: 22, emphasis original)
Just as situations tend to recur and thus form types, registers represent recurring ways of
using language in a given situation. […] Registers can thus be described as sub-systems of
the language system or, when viewed from below, as types of instantiated texts reflecting a
similar situation. (Neumann 2013: 16)
As is the case in the influential monograph by Halliday (1978), “registers” are
here seen as functional varieties, corresponding to use in specific contexts, while
“dialects” are defined as varieties based on the respective user, who has a certain
social or regional background that surfaces in linguistic behaviour. The fact that
registers can be rightfully viewed as “sub-systems” of a given language under-
lines their formative and constitutive character in a language. As for the three sit-
uational features determining register choices, “field” refers to the subject matter
under discussion, “tenor” pertains to the relationship between the participants
in a given context and “mode” characterizes the medium of transmission (cf. also
Bex 1996: 94–110 and Matthiessen 1993: 236–238).
This wide notion of “register” is also adopted by the currently prevailing
approach of Multidimensional Analysis (MDA) à la Douglas Biber (e.g. Biber
1988, 1995, 2006, 2007; Gray 2013: 363–366), which relies on corpus-derived
co-occurrences of lexico-grammatical features that serve equivalent functions in
discourse. Despite the enhanced methodology, the definition is relatively similar,
since a register is regarded as “a variety associated with a particular situation
of use (including particular communicative purposes)” (Biber and Conrad 2009:
6). By increasing the degree of specificity, it is possible to distinguish between
“sub-registers” (Biber and Gray 2013), so that, for instance, academic writing
can be subdivided into sub-registers such as social science, multi-disciplinary
science and humanities.
In text linguistics, the terminological differentiation between “register” and
“genre” has always been a notorious issue. One possible solution to the problem
is offered by Dorgeloh and Wanner (2010: 10), who suggest three main differ-
ences, although the distinction between the concepts is still seen as scalar and
gradient. First, while register implies linguistic features dependent on situational
contexts, genres are regarded as types of “social action” (Dorgeloh and Wanner
2010: 10) used to perform interindividual tasks. Second, register is dominantly
geared towards the function of linguistic features, whereas genres rely to a large
degree on “patterned practice” (Dorgeloh and Wanner 2010: 10), involving char-
acteristic textual structures. Third, register operates at a high level of generality,
while genre has a more specific and concrete character, such as “on-line medical
advice” or a “corporate blog” (Giltrow 2010: 47). In fact, this more specific defini-
tion offers a niche for the term “genre” in linguistics, since recently, research on
“genre” has been superseded by linguistic interest in “register” (cf. Giltrow 2010:
31). Literary criticism, by contrast, clearly maintains a preference for the concept
of “genre”.
An alternative approach to terminological differentiation is provided by Biber
and Conrad (2009: 15–23), who regard the three terms “register”, “genre” and
“style” as “different perspectives on text varieties” (2009: 15–16). The perspective
of register pertains to all kinds of “frequent and pervasive” lexico-grammatical
items that fulfil specific communicative functions in “a sample of text excerpts”,
so that it can be applied to all sorts of discourse. As opposed to that, “[i]n the
genre perspective, the focus is on the linguistic characteristics that are used to
structure complete texts” (Biber and Conrad 2009: 16). Thus, genres rely on rather
specific expressions that occur “in a particular place in the text” (2009: 16) and
thus add up to a distinct rhetorical organization, which can be found in texts with
a fixed structure, such as formal letters. Finally, “style” is very similar to “reg-
ister” but depends on linguistic features that are “not directly functional” and
“are preferred because they are aesthetically valued” (Biber and Conrad 2009:
16). That is to say that it is possible to determine the style of specific authors or
periods of literary history, because these linguistic items do not correspond to
particular contexts of situation but serve the poetic function of language. To
conclude, in an extension of the “music” metaphor previously mentioned in the
definition of “register”, “genre” equals the specific musical piece chosen by the
church organist, while “style” is the organ-player’s individual interpretation and
performance of the composition.1
3 Recent developments in register research

While some twenty years ago in the volume Register Analysis Robert-Alain de
Beaugrande still diagnosed that “[t]hroughout much of linguistic theory and
method, the concept of ‘register’ has led a rather shadowy existence” (1993: 7),
research in the field has gained considerable momentum since then. As regards
recent developments in register research in English, five main strands may be
distinguished.
First, there are numerous studies on diachronic register variation,
which cover various periods of English and usually focus on specific aspects. For
instance, Alonso-Almeida (2008) discusses the Middle English medical charm
with reference to register, genre and text type variables, whereas Warner (2005)
investigates the variable use of do-support in different registers of Early Modern
English. Moving on to Modern English, Biber and Finegan (2001) discuss varia-
tion in written and spoken registers from the 17th to the 20th centuries. Various
19th-century registers are covered by Geisler (2002) as well as by Egbert (2012).
More generally, Davies (2009) examines word frequency in registers from a
diachronic perspective, whereas Crespo Garcia (2004) and Taavitsainen (2001)
employ a narrower focus, concentrating on the history of the scientific register.
Along similar lines, Biber and Gray (2013) investigate diachronic change in news
reportage and academic research writing during the twentieth century.
1 The editors sincerely thank Jan Renkema for this metaphorical insight.
Second, there is a considerable body of research on register variation in spe-
cialized domains. The dimensions under discussion include parameters such
as medium, public and private spheres as well as the discourse of certain fields
of knowledge. Research on academic English is most frequent, as shown by
Csomay’s (2002) analysis of lectures and Biber’s (2006) comprehensive multidi-
mensional study of spoken and written register variation in university discourse.
Fryer (2013) investigates medical research articles with regard to evaluation
practices, while Schutz (2013) discusses the use of verbs in registers pertaining
to business, linguistics, and medical research. Gotti (2012) argues that academic
English is by no means uniform but varies according to a number of criteria,
such as disciplinary conventions, expertise in the respective field, and linguistic
competence of the author. A particular focus on interdisciplinary discourses is
found in Teich (2009), whereas further recent studies on academic English and
scientific texts respectively have been published by Bartsch (2009) and Teich
(2010). In Quinto-Pozos and Mehta’s (2010) study of American Sign Language,
it becomes clear that different registers are present not only in verbal but also in
nonverbal communication. Concerning the parameter of medium, earlier studies
on spoken and written registers have been complemented by research on com-
puter-mediated communication (Biber 2007). As the research survey in Biber and
Conrad (cf. 2009: 271–295) underlines, interest in electronic discourse has signif-
icantly increased over the last ten to fifteen years. Further studies on specialized
domains comprise register shifting in US public discourse (Cole 2012), the crea-
tion of humour through incongruity in register (Venour, Ritchie and Mellish 2011),
the register of news reporting in its social context (Lukin 2010), Business English
(Cortés de los Ríos 2010), the evaluative language of corporate social reporting
(Fuoli 2013), legal language (Battarbee 2010) and the language of linguistics
(Freddi 2005). There is also some research on the use of registers in literary texts,
as exemplified by Pollner’s (2005) analysis of language variation in Irvine Welsh’s
novel Trainspotting.
Third, a quickly developing trend brings together register research with sociolinguistic
investigations of regional variation, usually concentrating on international
varieties of English, or “World Englishes”, used as a second language
(ESL). Xiao (2009) provides a discussion of general issues of the study of World
Englishes from the perspective of multidimensional analysis. The recent volume
by Szmrecsanyi and Wälchli (2014) contains a number of papers which combine
quantitative techniques in register analysis, dialectology, and language typology.
For instance, the contribution by Diwersy, Evert and Neumann (2014) shows how
a corpus-driven multivariate approach can be used for the study of both regis-
ter and regional variation. Hilbert and Krug (2012) present a study on the use
of progressives in spoken conversations and written press language in Maltese
English, as compared to British and American English. As far as Asian varieties
are concerned, there is research on registers in Singapore English (Bao and Hong
2006) and on Indian English registers (Balasubramanian 2009a), complemented
by a special focus on adverbials (Balasubramanian 2009b). Regarding Africa,
there is multidimensional research on various registers in East African English,
pointing out, among other aspects, the presence of a greater degree of formal-
ity and an increased involvement of the addressee (Van Rooy et al. 2010). Other
papers analyse expository writing in Cameroon English (Nkemleke 2006) and
academic texts by African American college students (Syrquin 2006). Neumann
(2012) chooses a more comprehensive approach, comparing a number of registers
in the Englishes spoken in New Zealand, Hong Kong, India, Jamaica, Singapore
and Canada. The ultimate goal of most of these studies is to give a complete and
comprehensive account of geographical varieties by describing their internally
diversified registers, thus taking sociolinguistics to the next level. Along these
lines, Balasubramanian (2009a: 19) argues that “[t]o provide a thorough linguis-
tic description of a variety […], it is important to study registers of that variety –
i.e. to study the variation within the dialect” and that “[s]uch study of register
was missing in the earlier methodologies of dialectology”. As has been pointed
out in research on postcolonial Englishes, it is common for these new Englishes to
develop use-related varieties in addition to user-related ones, which corresponds
to the stage of “differentiation” in the evolutionary development of postcolonial
varieties (cf. Schneider 2007: 52–55). Hence, the study of registers aptly comple-
ments sociolinguistic approaches, so that this liaison will undoubtedly prove
highly fruitful in future research on linguistic variety.
Fourth, contrastive register analysis investigates register variation across
two or more languages and is often linked to questions of translation studies.
For instance, Teich (2003) compares textual variety in English and German and
thereby significantly extends the scope of Contrastive Linguistics, which used to
focus mainly on relatively isolated phonological and morphosyntactic features.
Neumann (2013) likewise contrasts English and German registers by including
both cross-linguistic variation and variational differences between original and
translated texts. One central result is that related registers in the two languages
show different register features with regard to the chosen subdimensions, so
that individual register studies for both languages are necessary. More specifi-
cally, the monograph by Barron (2012) compares public information messages in
Irish English and German, while register shifts in translations from English into
Slovene are investigated by Zlatnar Moe (2010). Focusing on the digital medium,
Hardy (2012) contrasts electronic discourse in Filipino and American English.
Fifth, from an applied linguistic perspective there are numerous publications
on register and language teaching. While Painter (2001) writes on general
issues of teaching genre and register and Reppen (2001) compares spoken and
written registers of school-aged students and adults, many articles – quite unsur-
prisingly – deal with the teaching of academic English. For instance, Halliday’s
Systemic-Functional Linguistics is used for the analysis of student report writing
by Gardner (2012), and Gilquin (2008) as well as Moore (2006) investigate Learner
Academic Writing. On the basis of similar research interests, Han (2010) discusses
the teaching of English for Specific Purposes (ESP) from the perspective of register
theory. Another language-pedagogical topic is addressed by Volden (2009), who
concentrates on registers used by autistic children. Rühlemann (2008) examines
the teaching of the informal conversational register, which is frequently neglected
in EFL research. With the exception of language pedagogical approaches, all of
the trends mentioned are taken up by the papers in the present volume.
4 A model for register analysis

All of the contributions in this volume refer to the theoretical model of the influ-
ential textbook by Biber and Conrad (2009). The central statement underlying
register analysis in this textbook names the following crucial parameters: “[t]he
description of a register covers three major components: the situational context,
the linguistic features, and the functional relationships between the first two
components” (Biber and Conrad 2009: 6). By establishing meaningful relations
between these aspects, any given register can be described on the basis of a qual-
itative and quantitative investigation.
As far as the situational context is concerned, Biber and Conrad expand the
three parameters proposed by Halliday (1978) by establishing the following seven
characteristics (2009: 40–47): (1) participants: the addressor(s) as the produc-
er(s) of texts can be defined according to number, situation in society (individual
or institutional) and personal parameters (age, gender, education etc.). Address-
ees as the recipients of texts may also be classified according to number and the
question whether they can be personally identified or not. In addition, there may
be onlookers, who do not directly contribute to the verbal exchange but whose
physical presence may nevertheless influence the linguistic choices made by
the interlocutors. (2) Relations among participants: it is crucial to analyse
whether the communication is immediately interactive, which social roles are
played by the participants in terms of power, whether they have a personal
relationship, and to what degree the interactants share relevant background
knowledge. (3) Channel: the communication can be conducted in the written or
spoken mode, and a particular medium may be utilized, such as telephone, radio,
television or the internet. (4) Production circumstances: while spoken com-
munication commonly takes place in real time, written or electronic discourse
may be carefully planned and additionally revised. (5) Setting: in spoken inter-
action, the participants often share time and place, which is usually not the case
in written texts. Moreover, communication can take place in a private or public
setting or at a specific location such as a church. In temporal terms, linguistic
conventions change through the decades and centuries. (6) Communicative
purposes: while general discourse intentions include description, persuasion
or narration, they may be complemented by specific textual functions referring
to particular states of affairs, such as scientific findings or political spin. What
is more, the text may be presented as fictitious or factual, and addressors often
use linguistic items expressing their personal stance. (7) Topic: the theme of any
kind of communication can be classified at a very general level as belonging to a
certain field of discourse, such as science or business, while such domains obviously
offer manifold possibilities of topical subdivisions.
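The seven situational characteristics can be treated as a simple record that is filled in for each register and then compared across registers. The following Python sketch is only an illustration of that idea: the class name, the helper function and all descriptive labels are assumptions introduced here and do not come from Biber and Conrad (2009).

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class SituationalContext:
    """Situational profile of one register (the seven characteristics)."""
    participants: str
    relations: str
    channel: str
    production: str
    setting: str
    purposes: List[str]
    topic: str


def differing_characteristics(a: SituationalContext,
                              b: SituationalContext) -> List[str]:
    """Return the names of the characteristics on which two profiles differ."""
    return [f.name for f in fields(a)
            if getattr(a, f.name) != getattr(b, f.name)]


# Hypothetical profiles; the labels are illustrative, not taken from the source.
conversation = SituationalContext(
    participants="two identifiable individuals",
    relations="interactive, personal relationship",
    channel="spoken, face-to-face",
    production="real time",
    setting="shared time and place, private",
    purposes=["narration"],
    topic="everyday matters",
)
research_article = SituationalContext(
    participants="individual addressor, unidentified addressees",
    relations="not interactive, shared expert knowledge",
    channel="written, print",
    production="planned and revised",
    setting="no shared time or place, public",
    purposes=["description", "persuasion"],
    topic="science",
)

print(differing_characteristics(conversation, research_article))
```

In practice the free-text labels would probably be replaced by a fixed coding scheme, so that profiles of many registers can be compared systematically.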
Those seven situational characteristics can be related to fifteen linguistic
categories that may be worth investigating in a register analysis (cf. Biber
and Conrad 2009: 78–82): vocabulary features (e.g. technical terms), content
word classes, function word classes, derived words, verb features (e.g. tense and
aspect), pronoun features, reduced forms and dispreferred structures (e.g. con-
tractions or ellipsis), prepositional phrases, coordination, main clause types,
noun phrases, adverbials, complement clauses, word order choices (e.g. raising
or extraposition) and special features of conversation (e.g. backchannels, pauses
and repetitions). Any of these features may then function as either “register fea-
tures” or “register markers”, which are distinguished in the following way (Biber
and Conrad 2009: 53–54): register features are both pervasive and frequent, as
they occur in all parts of a sample text belonging to a given register and appear
more often in a selected register than in others. In contrast, register markers are
unique to a particular register, as they do not occur in any other register, such as
technical expressions in specific types of sport broadcasts.
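The distinction between register features and register markers can be made concrete with a small Python sketch. It rests on simplifying assumptions that are not part of the source: each register is represented by a handful of tokenised text segments, "pervasive" is read as "present in every segment", and "frequent" as "a higher rate per 1,000 words than in every other register". The toy corpus and the function names are hypothetical.

```python
def rate_per_1000(segments, item):
    """Occurrences of item per 1,000 tokens across all segments of a register."""
    tokens = [token for segment in segments for token in segment]
    return 1000 * tokens.count(item) / len(tokens)


def classify(item, target, corpus):
    """Label item as register marker, register feature or neither for target."""
    target_segments = corpus[target]
    target_rate = rate_per_1000(target_segments, item)
    other_rates = [rate_per_1000(segments, item)
                   for name, segments in corpus.items() if name != target]

    # Marker: attested in the target register and in no other register.
    if target_rate > 0 and all(rate == 0 for rate in other_rates):
        return "register marker"

    # Feature: pervasive (in every segment) and more frequent than elsewhere.
    pervasive = all(item in segment for segment in target_segments)
    frequent = all(target_rate > rate for rate in other_rates)
    if pervasive and frequent:
        return "register feature"
    return "neither"


# Toy corpus, invented for illustration only.
corpus = {
    "broadcast": [
        "he takes the corner and he scores".split(),
        "offside says the referee and he agrees".split(),
    ],
    "academic": [
        "the results confirm the hypothesis".split(),
        "we argue that he was mistaken".split(),
    ],
}

for word in ("offside", "he", "the"):
    print(word, "->", classify(word, "broadcast", corpus))
```

A real analysis would work with much larger samples and would usually test frequency differences statistically rather than by direct comparison of rates.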
In order to make a comparison of registers possible, it is necessary to intro-
duce a limited set of dimensions along which various registers show different
frequencies of the respective linguistic features. For instance, dimensions used
for the study of spoken and written university registers may be “oral versus lit-
erate discourse” or “procedural versus content-focused discourse” (Biber and
Conrad 2009: 226–230). This approach, accordingly entitled “multidimensional
(MD) analysis”, heavily relies on corpus-derived quantitative data. With the help
of factor analysis, co-occurring clusters of linguistic features in target registers
can be retrieved. Eventually, it is possible to identify register-specific dimension
scores, by means of which the registers can be compared. This approach also
underlies the register distinction present in the seminal Longman Grammar of
Spoken and Written English (Biber, Johansson, Leech, Conrad and Finegan 1999)
as well as in the monograph University Language (Biber 2006), and it is the foun-
dation of numerous studies on registers in recent years. For instance, Biber (2012)
challenges the common practice of reference grammars which fail to take into
account register distinctions and treat grammatical structures as general features
of English at large. Biber’s impact can be measured by the fact that his method of
multidimensional analysis has become more and more widespread (e.g. Egbert
2012; Geisler 2002; Reppen 2001; van Rooy et al. 2010; Xiao 2009). This trend
is further corroborated by a recent edited volume which is dedicated explicitly
to Biber’s MDA and contains articles on regional and register variation in both
English and Romance languages (Sardinha and Pinto 2014).
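How dimension scores allow registers to be compared may be illustrated by a simplified Python sketch. It skips the factor analysis itself and simply assumes that two groups of features have already been found to load on opposite poles of an "oral versus literate" dimension. Each feature rate is standardised across texts, and a text's score is the sum of its standardised rates for the first group minus the sum for the second. The feature groupings, the rates and all names are invented for illustration, and the scoring scheme is a common simplification rather than a procedure spelled out in this chapter.

```python
from statistics import mean, pstdev

# Hypothetical feature rates per 1,000 words for four texts.
rates = {
    "conv1": {"contractions": 30, "pronouns": 60, "nouns": 150, "prepositions": 70},
    "conv2": {"contractions": 40, "pronouns": 70, "nouns": 140, "prepositions": 60},
    "acad1": {"contractions": 2, "pronouns": 10, "nouns": 300, "prepositions": 140},
    "acad2": {"contractions": 4, "pronouns": 12, "nouns": 310, "prepositions": 150},
}
register_of = {"conv1": "conversation", "conv2": "conversation",
               "acad1": "academic", "acad2": "academic"}

# Assumed outcome of a prior factor analysis (not computed here).
POSITIVE = ["contractions", "pronouns"]   # "oral" pole
NEGATIVE = ["nouns", "prepositions"]      # "literate" pole


def standardise(rates_by_text):
    """Convert each feature rate to a z-score across all texts."""
    features = list(next(iter(rates_by_text.values())))
    z = {text: {} for text in rates_by_text}
    for feature in features:
        values = [r[feature] for r in rates_by_text.values()]
        mu, sigma = mean(values), pstdev(values)
        for text, r in rates_by_text.items():
            z[text][feature] = (r[feature] - mu) / sigma if sigma else 0.0
    return z


def dimension_score(z_scores, positive, negative):
    """Sum of z-scores on the positive pole minus those on the negative pole."""
    return (sum(z_scores[f] for f in positive)
            - sum(z_scores[f] for f in negative))


z_scores = standardise(rates)
scores = {text: dimension_score(z, POSITIVE, NEGATIVE)
          for text, z in z_scores.items()}
means = {reg: mean(s for text, s in scores.items() if register_of[text] == reg)
         for reg in set(register_of.values())}

for reg, score in sorted(means.items()):
    print(f"{reg}: {score:+.2f}")
```

With these invented figures the conversation texts receive positive scores and the academic texts negative ones, which is the kind of contrast a dimension is meant to capture.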
5 An outline of the volume

This volume is subdivided into three thematic parts, each introduced by general
remarks on the respective section topic and by a summary of the individual articles:
the first part, specialised registers, is dedicated to the description of individual
registers, namely web registers (Biber and Egbert), medical texts (Dorgeloh), Avi-
ation English (Bieswanger), hip-hop (Kreyer) and crossword puzzles (Pham). The
second part, cross-register comparison, builds upon that basis by providing
register-transcending studies which compare individual registers. More specifi-
cally, it contrasts comics and academic texts (Sanchez-Stockhammer), academic
prose and minimalist poetry (Lampert) as well as academic writing, administra-
tive writing, timed exams, conversations and broadcast discussions (Neumann
and Fest). The third part, regional, contrastive and diachronic register
variation, widens the perspective by investigating register variation along inter-
national, contrastive-linguistic and historical dimensions. It is dedicated to met-
aphors in the New Englishes of India, Hong Kong and Singapore (Güldenring) as
well as noun phrase structure in Indian English, Jamaican English, Hong Kong
English and Canadian English (Schaub). Online text commentaries are analysed
contrastively in British and German sports reports (Werner). The diachronic
perspective is considered in the discussion of developments of word order from
Middle English to Late Modern English (Pérez-Guerra). The paper by Neumann
and Fest functions as an apt link between Sections II and III, since it combines
cross-register comparisons with regional variation. Although the various contri-
butions to the volume take different research perspectives, all deal with frequent
and recurrent linguistic features throughout texts supporting specific
superordinate functions. Taken together, the papers cover theoretical considerations, case
studies and reflections on presently employed methods, suggesting approaches
and topics for future research on variational text linguistics in English.
Bibliography
Alonso-Almeida, Francisco. 2008. The Middle English medical charm: Register, genre and text
type variables. Neuphilologische Mitteilungen 109(1). 9–38.
Andersen, Gisle & Kristin Bech (eds.). 2013. English corpus linguistics: Variation in time, space
and genre. Amsterdam: Rodopi.
Balasubramanian, Chandrika. 2009a. Register variation in Indian English. Amsterdam:
Benjamins.
Balasubramanian, Chandrika. 2009b. Circumstance adverbials in registers of Indian English.
World Englishes 28(4). 485–508.
Bao, Zhiming & Huaqing Hong. 2006. Diglossia and register variation in Singapore English.
Barron, Anne. 2012. Public information messages: A contrastive genre analysis of state-citizen
communication. Amsterdam: Benjamins.
Bartsch, Sabine. 2009. Corpus studies of register variation: An exploration of academic
registers. Anglistik: International Journal of English Studies 20(1). 105–124.
Battarbee, Keith. 2010. Shifts in the language of the law: Reading the registers of official-
language statutes. Text & Talk 30(6). 637–655.
Bex, Tony. 1996. Variety in written English: Texts in society – societies in text. London:
Routledge.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge UP.
Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison.
Cambridge: Cambridge UP.
Biber, Douglas & Edward Finegan. 2001. Diachronic relations among speech-based and
written registers in English. In Susan Conrad & Douglas Biber (eds.). Variation in English:
Multi-dimensional studies, 66–83. Harlow: Pearson Education.
Biber, Douglas. 2006. University language: A corpus-based study of spoken and written
registers. Amsterdam: Benjamins.
Biber, Douglas. 2007. Towards a taxonomy of web registers and text types: A multidimensional
analysis. In Marianne Hundt, Nadja Nesselhauf & Carolin Biewer (eds.). Corpus linguistics
and the Web, 109–131. Amsterdam: Rodopi.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge UP.
Biber, Douglas. 2012. Register as a predictor of linguistic variation. Corpus linguistics and
linguistic theory 8(1). 9–37.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999.
Longman grammar of spoken and written English. London: Longman.
Biber, Douglas & Bethany Gray. 2013. Being specific about historical change: The influence of
sub-register. Journal of English Linguistics 41(2). 104–134.
Cole, Debbie. 2012. Uptake (un)limited: The mediatization of register shifting in US public
discourse. Language in Society 41(4). 449–470.
Cortés de los Ríos, Ma Enriqueta. 2010. A combined genre-register approach in texts of
business English. LSP Journal 1(1). 13–28.
Crespo García, Begoña. 2004. The scientific register in the history of English: A corpus-based
study. Studia Neophilologica 76(2). 125–139.
Csomay, Eniko. 2002. Variation in academic lectures: Interactivity and level of instruction. In
Randi Reppen, Susan M. Fitzmaurice & Douglas Biber (eds.). Using corpora to explore
linguistic variation, 203–224. Amsterdam: Benjamins.
Davies, Mark. 2009. Word frequency in context: Alternative architectures for examining related
words, register variation and historical change. In Dawn Archer (ed.). What’s in a word-list?
Investigating word frequency and keyword extraction, 53–68. Surrey: Ashgate.
De Beaugrande, Robert-Alain. 1993. ‘Register’ in discourse studies: A concept in search of a
theory. In Mohsen Ghadessy (ed.). Register analysis: Theory and practice, 7–25. London:
Pinter Publishers.
De Beaugrande, Robert-Alain & Wolfgang Ulrich Dressler. 1981. Introduction to text linguistics.
London: Longman.
Dittmar, Norbert. 2010. Register. In Mirjam Fried, Jan-Ola Östman & Jef Verschueren (eds.).
Variation and change: Pragmatic perspectives, 221–233. Amsterdam: Benjamins.
Diwersy, Sascha, Stefan Evert & Stella Neumann. 2014. A weakly supervised multivariate
approach to the study of language variation. In Benedikt Szmrecsanyi & Bernhard Wälchli
(eds.). Aggregating dialectology, typology, and register analysis: Linguistic variation in
text and speech, 174–204. Berlin: de Gruyter.
Dorgeloh, Heidrun & Anja Wanner. 2010. Introduction. In Heidrun Dorgeloh & Anja Wanner
(eds.). Syntactic variation and genre, 1–26. Berlin: De Gruyter Mouton.
Egbert, Jesse. 2012. Style in nineteenth century fiction: A multi-dimensional analysis. Scientific
Study of Literature 2(2). 167–198.
Esser, Jürgen. 2009. Introduction to English text-linguistics. Frankfurt/Main: Peter Lang.
Freddi, Maria. 2005. From corpus to register: The construction of evaluation and argumentation
in linguistics textbooks. In Elena Tognini-Bonelli & Gabriella Del Lungo Camiciotti (eds.).
Strategies in academic discourse, 133–151. Amsterdam: Benjamins.
Fryer, Daniel Lees. 2013. Exploring the dialogism of academic discourse: Heteroglossic
engagement in medical research articles. In Gisle Andersen & Kristin Bech (eds.). English
corpus linguistics: Variation in time, space and genre, 183–207. Amsterdam: Rodopi.
Fuoli, Matteo. 2013. Texturing a responsible corporate identity: A comparative analysis of
appraisal in BP’S and IKEA’S 2009 corporate social reports. In Gisle Andersen & Kristin
Bech (eds.). English corpus linguistics: Variation in time, space and genre, 209–235.
Amsterdam: Rodopi.
Gardner, Sheena. 2012. Genres and registers of student report writing: An SFL perspective on
texts and practices. Journal of English for Academic Purposes 11(1). 52–63.
Geisler, Christer. 2002. Investigating register variation in nineteenth-century English: A
multi-dimensional comparison. In Randi Reppen, Susan M. Fitzmaurice & Douglas Biber
(eds.). Using corpora to explore linguistic variation, 249–271. Amsterdam: Benjamins.
Gilquin, Gaëtanelle. 2008. Too chatty: Learner academic writing and register variation. English
Text Construction 1(1). 41–61.
Giltrow, Janet. 2010. Genre as difference: The sociality of linguistic variation. In Heidrun
Dorgeloh & Anja Wanner (eds.). Syntactic variation and genre, 29–51. Berlin: De Gruyter
Mouton.
Gotti, Maurizio. 2012. Variation in academic texts. In Maurizio Gotti (ed.). Academic identity
traits: A corpus-based investigation, 23–42. Bern: Peter Lang.
Gray, Bethany. 2013. Interview with Douglas Biber. Journal of English Linguistics 41(4).
359–379.
Gut, Ulrike & Christoph Schubert. 2012. Approaches to language variation: Introduction. In
Monika Fludernik & Benjamin Kohlmann (eds.). Anglistentag 2011 Freiburg: Proceedings,
3–9. Trier: WVT.
Halliday, Michael A. K. 1978. Language as social semiotic: The social interpretation of language
and meaning. London: Arnold.
Halliday, Michael A. K. & Ruqaiya Hasan. 1976. Cohesion in English. London: Longman.
Han, Huabing. 2010. On the methodology employed in ESP teaching under register theory. The
1st Asian ESP conference. [Special edition]. Asian ESP Journal, 158–163.
Hardy, Jack A. 2012. Filipino and American online communication and linguistic variation. World
Englishes 31(2). 143–161.
Hilbert, Michaela & Manfred Krug. 2012. Progressives in Maltese English: A comparison with
spoken and written text types of British and American English. In Marianne Hundt &
Ulrike Gut (eds.). Mapping unity and diversity world-wide, 103–136. Amsterdam: John
Benjamins.
Lukin, Annabelle. 2010. ‘News’ and ‘register’: A preliminary investigation. In Ahmar Mahboob &
Naomi K. Knight (eds.). Appliable linguistics, 92–113. London: Continuum.
Matthiessen, Christian M. I. M. 1993. Register in the round: Diversity in a unified theory of
register analysis. In Mohsen Ghadessy (ed.). Register analysis: Theory and practice,
221–292. London: Pinter Publishers.
Moore, Nick. 2006. Advanced language for intermediate learners: Corpus and register analysis
for curriculum specification in English for academic purposes. In Heidi Byrnes (ed.).
Advanced language learning: The contribution of Halliday and Vygotsky, 246–264.
London: Continuum.
Neumann, Stella. 2012. Applying register analysis to varieties of English. In Monika Fludernik &
Benjamin Kohlmann (eds.). Anglistentag 2011 Freiburg: Proceedings, 75–94. Trier: WVT.
Neumann, Stella. 2013. Contrastive register variation: A quantitative approach to the
comparison of English and German. Berlin: Mouton de Gruyter.
Nkemleke, Daniel A. 2006. Some characteristics of expository writing in Cameroon English.
English World-Wide 27(1). 25–44.
Painter, Clare. 2001. Understanding genre and register: Implications for language teaching.
In Anne Burns & Caroline Coffin (eds.). Analysing English in a global context, 167–180.
London: Routledge.
Pollner, Clausdirk. 2005. English 0 – and drugs galore: Varieties and registers in Irvine Welsh’s
Trainspotting. In Gisela Hermann-Brennecke & Wolf Kindermann (eds.). Anglo-american
awareness: Arpeggios in aesthetics, 193–202. Münster: LIT.
Quinto-Pozos, David & Sarika Mehta. 2010. Register variation in mimetic gestural complements
to signed language. Journal of Pragmatics 42(3). 557–584.
Renkema, Jan. 2004. Introduction to discourse studies. Amsterdam: John Benjamins.
Reppen, Randi. 2001. Register variation in student and adult speech and writing. In Susan
Conrad & Douglas Biber (eds.). Variation in English: Multi-dimensional studies, 187–199.
London: Longman.
Rühlemann, Christoph. 2008. A register approach to teaching conversation: Farewell to
Standard English? Applied Linguistics 29(4). 672–693.
Sardinha, Tony Berber & Marcia Veirano Pinto (eds.). 2014. Multi-dimensional analysis, 25
years on: A tribute to Douglas Biber. Amsterdam: John Benjamins.
Schneider, Edgar W. 2007. Postcolonial English: Varieties around the world. Cambridge:
Cambridge UP.
Schneider, Klaus P. & Anne Barron (eds.). 2008. Variational pragmatics: A focus on regional
varieties in pluricentric languages. Amsterdam/Philadelphia: Benjamins.
Schubert, Christoph. 2012. Englische Textlinguistik: Eine Einführung. 2nd edn. Berlin: Erich
Schmidt.
Schutz, Natassia. 2013. How specific is English for academic purposes? A look at verbs
in business, linguistics and medical research articles. In Gisle Andersen & Kristin
Bech (eds.). English corpus linguistics: Variation in time, space and genre, 237–257.
Amsterdam: Rodopi.
Summers, Della et al. (eds.). 2005. Longman dictionary of contemporary English. Harlow:
Pearson Education Limited.
Syrquin, Anna F. 2006. Registers in the academic writing of African American college students.
Written Communication 23(1). 63–90.
Szmrecsanyi, Benedikt & Bernhard Wälchli (eds.). 2014. Aggregating dialectology, typology,
and register analysis: Linguistic variation in text and speech. Berlin: de Gruyter.
Taavitsainen, Irma. 2001. Language history and the scientific register. In Hans-Jürgen Diller
& Manfred Görlach (eds.). Towards a history of English as a history of genres, 185–202.
Heidelberg: Winter.
Teich, Elke. 2003. Cross-linguistic variation in system and text. Berlin: Mouton de Gruyter.
Teich, Elke. 2009. Scientific registers in contact: An exploration of the lexico-grammatical
properties of interdisciplinary discourses. International Journal of Corpus Linguistics
14(4). 524–548.
Teich, Elke. 2010. Exploring a corpus of scientific texts using data mining. In Stefan Th. Gries,
Stefanie Wulff & Mark Davies (eds.). Corpus-linguistic applications: Current studies, new
directions, 233–247. Amsterdam: Rodopi.
Trudgill, Peter. 2000. Sociolinguistics: An introduction to language and society. 4th edn.
London: Penguin.
Trumble, William R. & Angus Stevenson (eds.). 2002. Shorter Oxford English dictionary on
historical principles. 2 vols. Oxford: Oxford UP.
Van Rooy, Bertus, Lize Terblanche, Christoph Haase & Joseph Schmied. 2010. Register
differentiation in East African English: A multidimensional study. English World-Wide
31(3). 311–349.
Venour, Chris, Graeme Ritchie & Chris Mellish. 2011. Dimensions of incongruity in register
humour. In Marta Dynel (ed.). The pragmatics of humour across discourse domains,
125–144. Amsterdam: Benjamins.
Volden, Joanne. 2009. Bossy and nice requests: Varying language register in speakers with
autism spectrum disorder (ASD). Journal of Communication Disorders 42(1). 58–73.
Wälchli, Bernhard & Benedikt Szmrecsanyi. 2014. Introduction: The text-feature-aggregation
pipeline in variation studies. In Benedikt Szmrecsanyi & Bernhard Wälchli (eds.).
Aggregating dialectology, typology, and register analysis: Linguistic variation in text and
speech, 1–25. Berlin: de Gruyter.
Wardhaugh, Ronald. 2002. An introduction to sociolinguistics. 4th edn. Oxford: Blackwell.
Warner, Anthony. 2005. Why DO dove: Evidence for register variation in Early Modern English.
Language Variation and Change 17(3). 257–280.
Xiao, Richard. 2009. Multidimensional analysis and the study of World Englishes. World
Englishes 28(4). 421–450.
Zlatnar Moe, Marija. 2010. Register shifts in translations of popular fiction from English into
Slovene. In Daniel Gile, Gyde Hansen & Nike K. Pokorn (eds.). Why translation studies
matters, 125–136. Amsterdam: Benjamins.
Section I:
Specialised registers
The volume opens with five contributions discussing the lexico-grammatical
features of previously underdescribed registers, which are situated on different
levels in the hierarchy of specificity: web registers, medical discourse, Aviation
English, hip-hop and crossword puzzles. The first two registers comprise
heterogeneous sub-registers, as, for instance, a distinction is made among the web
registers between interviews, discussion forums, encyclopedia articles, adver-
tisements and recipes, while Aviation English is a twofold construct and hip-hop
and crossword puzzles constitute relatively uniform categories. All studies can be
situated within the analytical register framework described in Biber and Conrad
(2009) and examine to what extent their object of inquiry can be considered a
register or where the boundaries between more general categories and sub-
registers may be drawn. In addition, Dorgeloh’s contribution extends the model
by including the genre perspective in the analyses.
The first paper in the volume, Douglas Biber and Jesse Egbert’s study
“Towards a user-based taxonomy of web registers”, stands out from the other
papers’ corpus-based approaches by its use of a bottom-up design in which inter-
net users were asked to identify basic situational characteristics of web docu-
ments. These characteristics were then used to construct a hierarchical decision
tree, which permitted the successful categorisation of most internet texts by the
same type of informants in the next step. Among the most important results of
this study are the finding that some sub-registers might be easier to identify than
their superordinate category and the observation that a relatively large propor-
tion of registers on the internet can be considered hybrid with regard to their
communicative purposes.
Hybridity of either form, discourse function or both is also observed by
Heidrun Dorgeloh in her study “The interrelationship of register and genre in
medical discourse”, which finds hybridity in the three medical registers under
consideration: illness blogs, medical case reports and medical case presenta-
tions. She argues that the correlations between form and function in medical dis-
course are less linked to the communicative situation than to the type of activity
and concludes that the notion of genre should be given primacy over that of
(sub-)registers.
Markus Bieswanger, by contrast, applies a classical Biberian register analy-
sis to the field of air traffic communication in his paper “Aviation English: Two dis-
tinct specialised registers?”. While the term Aviation English is generally used to
designate both the standardised phraseology promoted by the International Civil
Aviation Organization and the plain English used in exceptional situations where
communicative needs transcend the routine repertoire, Bieswanger’s analysis of
authentic air traffic communication material manages to demonstrate that these
are actually two distinct registers and not just one register with two sub-registers.
While Dorgeloh’s and Bieswanger’s material-based approaches place a particular
focus on the qualitative analysis of their data in order to explore the boundaries
of their particular register(s), the remaining two studies represent quantitative
corpus-based studies of specialised corpora.
Rolf Kreyer’s contribution, “‘Now niggas talk a lotta Bad Boy shit’: The reg-
ister hip-hop from a corpus-linguistic perspective”, targets a question similar to
Bieswanger’s, namely whether hip-hop lyrics should be considered a sub-register
of pop song lyrics. Based on a corpus of lyrics from the top albums in the US
album charts in 2003 and 2011, Kreyer contrasts a hip-hop sub-corpus with lyrics
by rappers and hip-hoppers to the lyrics from the remaining albums. His analyses
yield differences regarding the semantically annotated content and some non-
standard spellings but particularly regarding the absence of the copula. Kreyer
therefore concludes that the language used in hip-hop can be considered a regis-
ter in its own right.
The section closes with Teresa Pham’s corpus analysis entitled “The register
of English crossword puzzles: Studies in intertextuality”, in which she reaches
the conclusion that cryptic and non-cryptic puzzles constitute sub-registers of
the general register of crossword puzzles. The differences with regard to the use
of intertextuality between the two types of crossword puzzle suggest the addition
of intertextuality to the list of linguistic features that can be used to distinguish
registers from each other in the Biberian framework.
Douglas Biber and Jesse Egbert
Towards a user-based taxonomy of web
registers
Abstract: There is a well-established need for a comprehensive taxonomy of
English web registers grounded in the actual experiences of end-users. In this
paper, we introduce a new grant-funded initiative aimed at filling this gap. We
first describe the methods used to develop a hierarchical web register framework
and introduce our bottom-up, user-based method of web register classification.
Using a hierarchical decision tree, a large sample of webpage URLs (N = 1,000)
was classified into register and sub-register categories by four raters each. The
results indicate that the approach can be effectively used to identify the register
category for most internet texts, although the results also show that many texts
belong to ‘hybrid’ registers. The primary goals of the paper are to present the
overall distribution of internet texts across general registers, sub-registers and
‘hybrid’ registers, and to discuss some of the key characteristics of the major reg-
ister categories. We conclude with a discussion of challenges and future direc-
tions for web register research.
1 Introduction
There is a mind-boggling amount of information available on the World Wide
Web. For example, Fletcher (2012: 1) estimates that Google indexes about 40
billion webpages. Although not its intended purpose, the WWW also provides
a tremendous resource for linguists, who can use the web as a corpus to investi-
gate linguistic patterns of use. This approach has become so prevalent that the
acronym WAC (Web-as-Corpus) has now become commonplace among research-
ers who explore ways to mine the WWW for linguistic analysis.
One of the major challenges for WAC research is that a typical web search
usually provides us with no information about the kinds of texts investigated. For
example, Fletcher notes that a linguistic search of the Web-as-Corpus will tell us
nothing about:
Douglas Biber, Northern Arizona University

Jesse Egbert, Brigham Young University
For whom and what purpose is the text intended? What […] target audience does it repre-
sent? Was it written carefully or carelessly by a native speaker, or is it an unreliable transla-
tion by man or machine? Is the document authoritative – accurate in content and represent-
ative in linguistic form? (2012: 1341)
Similar problems were noted a decade earlier by Kilgarriff and Grefenstette (2003)
in their introduction to a special issue of Computational Linguistics on WAC. Thus,
they write:
“Text type” is an area in which our understanding is, as yet, very limited. Although further
work is required irrespective of the Web, the use of the Web forces the issue. Where research-
ers use established corpora, such as Brown, the BNC, or the Penn Treebank, researchers and
readers are willing to accept the corpus name as a label for the type of text occurring in it
without asking critical questions. Once we move to the Web as a source of data, and our
corpora have names like “April03-sample77,” the issue of how the text type(s) can be char-
acterized demands attention. (2003: 343)
These concerns are shared widely among WAC researchers, and as a result, there
has been a surge of interest over the last several years in Automatic Genre Identi-
fication (AGI): computational methods using a wide range of descriptors to auto-
matically classify web texts into genre classes. The typical methodology used in
an AGI study is to manually identify the genre (or register) of selected internet
texts and to then test the extent to which computer programs can automatically
place those texts into the same categories. However, although some studies have
achieved high accuracy rates (e.g., Lindemann and Littig 2010; Santini 2010),
serious questions have been raised about the validity of those results. First, some
scholars raise doubts about the representativeness of the web corpora analysed
in previous AGI studies: researchers often disregard the question of whether the
sample used in an AGI study represents the full population of internet texts (see
discussion in Santini and Sharoff 2009).
There have also been questions raised about the actual genre/register cate-
gories that we are trying to predict. Most studies have followed the same general
procedure: they first begin with a list of possible genre categories; then internet
texts are manually classified into those categories by an ‘expert’; and then com-
putational methods are used to determine whether those genre categories can
be automatically predicted. This approach is based on two assumptions: 1) that
researchers have identified the ‘correct’ set of possible genre/register categories
found on the web, based on a priori intuitive consideration of internet texts; and
2) that a single expert user is able to ‘correctly’ identify the genre/register cat-
egory of individual internet texts. Unfortunately, neither assumption seems to
be warranted. The few cases where inter-rater reliability is reported have shown
that it tends to be quite low, even for linguists. This is especially true for corpora
composed of randomly extracted web texts (see discussion in Sharoff, Wu, and
Markert 2010). Given the problems that ‘experts’ have identifying web genre cat-
egories, it is not surprising that non-expert web users also vary in their under-
standing of genre labels (see Crowston, Kwaśnik, and Rubleske 2010) and that
reliability among lay users is often unacceptably low (Rosso and Haas 2010).
More importantly, though, it is not clear that the genre categories being pre-
dicted in AGI studies are actually valid. This problem has been recognised and
discussed in previous research; thus, for example, Rehm et al. (2008: 352) note:
One of the most important problems concerns the elusiveness of the concept of genre. The
consequence is that, in practical terms, genre researchers usually have different ideas of
what a genre is, how genres should be defined and identified and, therefore, they use dif-
ferent genre labels in their approaches.
A few years ago, there was considerable effort to agree on a standard set of
register/genre categories for AGI research, as part of a wiki-based collaboration
among Web-as-Corpus experts (http://www.webgenrewiki.org/). That
collaborative effort resulted in a list of 78 register/genre distinctions, but the initiative
appears to have faded out in the last few years, with little consensus regarding the
relative status of those categories. As a result, there is still no generally agreed-on
set of register/genre categories used in current AGI research. (In the remainder
of this paper, we use the term ‘register’ rather than ‘genre’ to refer to situational-
ly-based textual distinctions, following the research tradition developed in Biber
1995, Biber et al. 1999, Biber and Conrad 2009, etc.).
In the present study, we tackle this problem with a completely different
approach: instead of relying on expert coders, we recruit typical end-users of
the web for our register analyses, assessing the degree of agreement among
those users. Most importantly, we do not force users to choose directly from a
pre-defined set of register categories. Rather, we ask users to identify basic
situational characteristics of each web document, coded in a hierarchical manner
(see below). Those situational characteristics lead to general register categories,
which in turn allow users to select a specific sub-register category. By working
through a hierarchical decision tree, users are able to identify the register
category of most internet texts with a high degree of reliability.
In Section 2 below, we briefly document the methodological procedures used
for this project. (Readers are referred to Egbert and Biber 2013 for more detailed
discussions.) In Section 3, we introduce the register framework used for our study.
In Section 4, then, we describe the overall prevalence of different types of regis-
ters on the web and briefly describe and illustrate some of the major web regis-
ters identified in the study. Section 5 discusses a more specialised type of register
identified by users in this study: ‘hybrid registers’. Finally, in the conclusion we
outline our on-going research to extend this methodological approach to a large
representative corpus of web documents.
2 Methods
2.1 Corpus for analysis
The corpus used for our study was extracted from the Corpus of Global
Web-based English (GloWbE), constructed by Mark Davies (see
http://corpus2.byu.edu/glowbe/). The entire corpus contains ca. 1.9 billion words and 1.8 million
web pages, collected by using the results of Google searches of highly frequent
English 3-grams (e.g., is not the, and from the). The use of n-grams as search
engine seeds is an approach that has been used in the past by many WAC schol-
ars (see, e.g., Baroni and Bernardini 2004; Baroni et al. 2009; Sharoff 2005, 2006).
Our decision to use 3-grams (rather than 2-grams or 4-grams) was based largely
on empirical evidence from the Longman Grammar of Spoken and Written English
(Biber et al. 1999). 2-grams are generally collocations that are semantically-based
and likely to result in topic-driven Google search results. 4-grams, on the other
hand, are much less frequent than 3-grams and were thus not likely to offer us
a broad enough sample of n-grams to choose from. To create the actual corpus,
the web pages identified through these random searches were downloaded
using HTTrack (http://www.httrack.com). Our ultimate goal in this project is to
carry out linguistic analyses of internet texts from the range of web registers. To
prepare the corpus for such analyses, non-textual material was removed from all
web pages (HTML scrubbing and boilerplate removal) using JusText
(http://code.google.com/p/justext). Finally, for the present pilot study, we randomly extracted
1,000 web pages from the larger corpus (with URLs from the US, UK, CA, AU, NZ).
Roughly 7 % of the web pages in this initial sample were dropped from the register
analysis: 33 of the 1,000 web pages in the corpus were no longer available at
the time of coding and an additional 36 web pages consisted mostly of photos or
graphics. Consequently, the results reported below are based on a corpus of 931
web pages.
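The sample-size figures above can be checked with a few lines of Python. The counts are taken from the text; the `draw_sample` helper and its seed are purely illustrative and do not reproduce the sampling procedure actually used for the pilot study.

```python
import random

INITIAL_SAMPLE = 1000      # web pages randomly extracted for the pilot study
UNAVAILABLE = 33           # pages no longer available at the time of coding
MOSTLY_NON_TEXTUAL = 36    # pages consisting mostly of photos or graphics


def draw_sample(urls, k, seed=0):
    """Hypothetical helper: draw k URLs at random without replacement."""
    return random.Random(seed).sample(urls, k)


dropped = UNAVAILABLE + MOSTLY_NON_TEXTUAL
retained = INITIAL_SAMPLE - dropped
dropped_share = dropped / INITIAL_SAMPLE

print(f"dropped: {dropped} ({dropped_share:.1%}), retained: {retained}")
```

The 69 dropped pages correspond to 6.9 % of the initial sample, which matches the “roughly 7 %” reported above and leaves 931 pages for the register analysis.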
2.2 Overview of procedures
The study described here is part of a larger project, designed to identify the reg-
isters found on the web, document the extent to which each of those registers is
actually used and ultimately undertake comprehensive linguistic analyses of those
register categories as the basis for automatic register and genre identification.
The first step required to reach these goals was to establish a set of regis-
ter distinctions that end-users actually recognise and can reliably identify. This
step turned out to be highly challenging, requiring several rounds of pilot testing
with end-users. In the process, we reconsidered our basic approach, developing
a decision tree of situational characteristics rather than asking users to directly
identify the register category of a given internet text. We discuss these register
distinctions, and the development of a web classification tool, in Section 3 below.
Once we had developed this tool, and verified that end-users were able to
reliably identify the register distinctions built into the tool, we moved on to the
larger pilot study to explore the types and distributions of registers found on the
web. We recruited 85 raters (typical end-users of the web) to analyse the 1,000
web pages in our pilot corpus. Raters were recruited through Mechanical Turk.
Mechanical Turk is an Amazon-based online crowd-sourcing utility that connects
Requesters – or people who need small tasks completed by human raters – with
Workers – or people who are willing to complete those small tasks for money.
Each web page was coded by four independent raters, so we were able to analyse
the reliability of the coding. We determined that four was the optimal number of
raters as a result of several rounds of pilot research. The choice to use 1,000 URLs
was based mostly on practicality and the money available to us. While there was
consensus on the coding of the majority of pages, this approach also allowed us
to identify the existence of ‘hybrid registers’ (see Section 5 below). Finally, we
compiled distributional results from the coding, providing the basis for our pre-
liminary description of register variation on the web (Sections 4–5).
3 Register categories distinguished in the study

Before undertaking empirical investigation of the registers found on the web, we
needed to decide on a set of register categories to be used for the coding. For this
purpose, we began with the 78 register/genre categories identified through the
wiki-based collaboration of Web-as-Corpus experts (http://www.webgenrewiki.org/;
see also the discussion in Rehm et al. 2008). We catalogued the underlying
situational characteristics of those 78 categories (e.g., mode, interactivity, commu-
nicative purpose; see Biber and Conrad 2009, Chapter 2), and based on that anal-
ysis, we developed a framework with the eight general registers shown in Table 1.
Table 1: General web register categories distinguished in the study
A. Internet texts that originated in the spoken mode
   (e.g., transcripts of speeches or interviews)
B. Internet texts that originated in the written mode
   1. Interactive written internet texts
   2. Non-interactive written internet texts
      2.a. Narratives
      2.b. Informational descriptions or explanations
      2.c. Overt opinions
      2.d. Information presented with the intent to persuade
      2.e. How-to procedures or instructions
      2.f. Lyrical discourse
In our early pilot studies, we asked non-expert users of the internet to categorise
web pages by directly identifying the register category of each page. However, this
approach proved problematic, in some cases achieving agreement rates below
50 %. As a result, we developed a more bottom-up approach involving a deci-
sion tree with basic situational characteristics. At the top level, we asked users to
make a 2-way decision about the mode of production:
1. Internet texts that originated in the spoken mode (e.g., transcripts of speeches
or interviews)
2. Internet texts that originated in the written mode
Then, for the written texts, we asked users to distinguish between interactive dis-
cussions (e.g., discussion forums) versus non-interactive internet texts. Even this
simple distinction is often not clear-cut on the web, because authored web documents
are often followed by reader comments. We thus made it clear to coders
that ‘written interactive discussions’ are distinct from written documents fol-
lowed by reader comments, and that coders would be able to note the existence
of reader comments for non-interactive texts later in the process. These reader
comments are common in web documents. While we do not currently have plans
to classify documents with reader comments differently than those without com-
ments, coding for their presence makes this a possibility for future analyses.
For the first two general categories above (spoken and interactive written), we
immediately asked coders to identify a specific sub-register (see Table 2 below).
In both cases, users could select ‘other’ if the page did not fit clearly into one of
the existing categories.
For the third general category – written non-interactive internet texts – we
asked users to distinguish among general registers based on communicative
purpose:
– to narrate or report on EVENTS [past, present, or future]
– to describe or explain INFORMATION
– to express OPINION
– to describe or explain FACTS WITH INTENT TO PERSUADE
– to explain HOW-TO or INSTRUCTIONS
– to express oneself through LYRICS
Then, once a user had selected one of those general categories (2.a.–2.f. in the list
above), we asked them to identify the specific sub-register. The full list of general
register and specific sub-register distinctions in our framework is listed in Table
2 below.
Table 2: Web registers and sub-registers distinguished in the study
1. Internet texts that originated in the SPOKEN mode

– interview
– formal speech
– transcript of video/audio recording
– TV/movie script
– other (spoken)
2. INTERACTIVE internet texts that originated in the WRITTEN mode

– question/answer forum
– discussion forum
– reader/viewer responses
– other (discussion)
3.–8. Non-interactive internet texts that originated in the written mode
3. NARRATIVES or reports of events [past, present or future]

– news report/blog
– sports report/blog
– personal/diary blog
– historical article
– short story
– novel
– biographical story/history
– magazine article
– memoir
– obituary
– travel blog
– other (narrative)
4. INFORMATIONAL DESCRIPTION or EXPLANATION

– description (place, product, organisation, program, job, etc.)
– description of a person (including celebrity profiles)
– frequently asked questions (FAQ) about information
– encyclopedia article
– abstract
– research article
– course materials
– informational blog
– legal terms and conditions
– technical report
– other (informational)
5. express OPINION
– opinion blog
– review (product, service, movie, etc.)
– advice
– religious blog/sermon
– advertisement
– self-help
– letter to the editor
– other (opinion)
6. describe or explain FACTS WITH INTENT TO PERSUADE

– description with intention to sell
– editorial
– persuasive article or essay
– other (informational persuasion)
7. explain HOW-TO or INSTRUCTIONS

– instructions
– frequently asked questions (FAQ) about how to do something
– how-to
– technical support
– recipe
– other (instructions)
8. express oneself through LYRICS

– poem
– prayer
– song lyrics
– other (lyrical)
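The decision procedure described in this section can be pictured as a short sequence of questions leading to one of the eight general registers. The following Python sketch is offered only as an illustration under our own assumptions: the function name `classify_general_register`, its arguments and the string labels are hypothetical and do not reproduce the coding tool that was actually presented to raters.

```python
# Hypothetical sketch of the decision tree described in Section 3.
# Register labels follow Table 2; all function and argument names are
# illustrative assumptions, not part of the original coding tool.

PURPOSE_TO_REGISTER = {
    "narrate events": "Narrative",
    "describe or explain information": "Informational Description/Explanation",
    "express opinion": "Opinion",
    "describe facts with intent to persuade": "Informational Persuasion",
    "explain how-to or instructions": "How-to/Instructional",
    "express oneself through lyrics": "Lyrical",
}


def classify_general_register(mode, interactive=False, purpose=None):
    """Return the general register implied by a rater's answers."""
    # Step 1: mode of production (spoken vs. written).
    if mode == "spoken":
        return "Spoken"
    if mode != "written":
        raise ValueError(f"unknown mode: {mode!r}")
    # Step 2: interactive discussion vs. non-interactive written text.
    if interactive:
        return "Interactive Discussion"
    # Step 3: communicative purpose of a non-interactive written text.
    if purpose not in PURPOSE_TO_REGISTER:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return PURPOSE_TO_REGISTER[purpose]
```

For instance, `classify_general_register("written", purpose="express opinion")` yields the general register Opinion, after which a rater would go on to choose a specific sub-register from the corresponding list in Table 2.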
4 Distribution of registers on the web

Applying the register classification scheme outlined in the last section, we asked
85 raters to code the register characteristics of 1,000 web pages, with each text
being coded by four different raters. As noted above, ca. 7 % of the web pages
in our initial sample were dropped from the register analysis (pages that were
no longer available or consisted mostly of photos or graphics). Thus, the results
reported below are based on a corpus of 931 web pages.
As Table 3 shows, at least three raters were able to agree on the general regis-
ter category for 62.7 % of the web pages in our corpus. All four
raters agreed on the classification of ca. 34 % of the texts, while three of the four
raters agreed on the classification of an additional ca. 29 % of the texts. For 11 %
of the texts, raters showed a 2-2 split in their classifications. It turned out, though,
that many of the specific classifications in these splits occurred repeatedly in the
corpus. As a result, we explored the possibility that these common 2-2 splits repre-
sent ‘hybrid registers’ on the web. We return to that possibility in Section 5 below.
Table 3: Agreement results for the general register classification of 931 webpages
4 agree    3 agree    2-2 split    2-1-1 split    No agreement    Total
315        269        104          173            70              931
33.8 %     28.9 %     11.1 %       18.6 %         7.6 %           100 %
Table 4 shows that the levels of agreement were somewhat lower for the coding of
specific sub-register categories: raters were able to agree on the sub-register for
ca. 43 % of the web pages (with 3 or all 4 raters in agreement), while an additional
ca. 8 % of these pages were coded with a 2-2 split.
Table 4: Agreement results for the specific sub-register classification of 931 webpages
4 agree    3 agree    2-2 split    2-1-1 split    No agreement    Total
171        231        73           90             366             931
18.3 %     24.8 %     7.8 %        9.8 %          39.3 %          100 %
Taken together, the distributional results from the pilot study show that non-
expert web users can, to a large extent, reliably classify web pages into general
register categories, and that there is substantial agreement even for specific
sub-register categories.
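The agreement categories reported in Tables 3 and 4 follow mechanically from the four codes assigned to each page. As a minimal sketch, assuming that the codes for one page are available as a list of four labels, the pattern could be derived as follows; the function name `agreement_pattern` is our own illustrative choice and is not taken from the study's tooling.

```python
from collections import Counter


def agreement_pattern(codes):
    """Classify the four codes given to one page into a Table 3 category."""
    if len(codes) != 4:
        raise ValueError("expected exactly four codes per page")
    # Sorted vote counts, largest first, e.g. (2, 1, 1) for a 2-1-1 split.
    counts = tuple(sorted(Counter(codes).values(), reverse=True))
    patterns = {
        (4,): "4 agree",
        (3, 1): "3 agree",
        (2, 2): "2-2 split",
        (2, 1, 1): "2-1-1 split",
        (1, 1, 1, 1): "no agreement",
    }
    return patterns[counts]
```

Under these assumptions, a page coded as Narrative by two raters and as Opinion by the other two would be assigned to the '2-2 split' column.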
The data obtained from this coding process allow us to begin to explore the
content of the web, asking what registers are especially prevalent and which ones
are relatively rare. Thus, Table 5 shows the breakdown of general register cate-
gories (presented in order of frequency) for all 931 texts in our corpus (see Table
3 above). Table 6 shows the breakdown of specific sub-registers within each of
these general register categories.
Table 5: Frequency information for general register categories
General Register                          #      %
Narrative                                 177    19.0
Informational Description/Explanation     140    15.0
Opinion                                   121    13.0
Interactive Discussion                    79     8.5
How-to/Instructional                      27     2.9
Lyrical                                   19     2.0
Informational Persuasion                  15     1.6
Spoken                                    6      0.6
Hybrid (see Section 5)                    277    29.7
No agreement                              70     7.5
Total                                     931    100
Table 6: Frequency information for sub-register categories
Register # %
Narrative 177
News report/blog 99 55.9

Sports report/blog 19 10.7
Personal/diary blog 7 4.0
Historical article 4 2.3
Short story 3 1.7
Novel 2 1.1
Biographical story/history 1 0.6
Joke 0 0
Magazine article 0 0
Memoir 0 0
Obituary 0 0
Other factual narrative 0 0
Other fictional narrative 0 0

Other personal narrative 0 0
Travel blog 0 0
No agreement on sub-register 42 23.7
Informational Description/Explanation 140
Description of a thing 34 24.3

Description of a person 9 6.4
Research article 7 5.0
Abstract 5 3.6
Legal terms and conditions 4 2.9
FAQ about information 2 1.4
Encyclopedia article 2 1.4
Informational blog 2 1.4
Course materials 1 0.7
Technical report 1 0.7
Opinion 121
Opinion blog 57 47.1

Review 23 19.0
Advice 9 7.4
Religious blog/sermon 5 4.1
Self-help 1 0.8
Advertisement 0 0
Letter to the editor 0 0
Interactive Discussion 79
Question/answer forum 46 58.2

Other forum 7 8.9
Other discussion 1 1.3
Reader/viewer responses 0 0
How-to/Instructional 27
How-to 13 48.1
Technical support 2 7.4
Recipe 1 3.7
Instructions 0 0
FAQ 0 0
Lyrical 19
Song lyrics 17 89.5

Other 1 5.2
Poem 0 0
Prayer 0 0
Informational Persuasion 15
Description with intent to sell 8 53.3

Persuasive article or essay 2 13.3
Editorial 0 0
Other 0 0
Spoken 6
Interview 5 83.3
Transcript of video/audio 1 16.7
TV/movie script 0 0
No agreement on sub-register 0 0
Based on the data in our pilot corpus, the most common general internet register
is Narrative (19 % of the texts in our corpus; see Table 5). Table 6 shows that ca.
65 % of the texts in this general register were classified as either News report/
blogs or Sports reports/blogs. Many of these texts are examples of registers found
in print media that have simply been transferred to the web. At first we planned
to distinguish news/sports blogs, which have their origin on the web, from news/
sports reports that have their origin in print media. In practice, though, it proved
nearly impossible to determine whether a news/sports report was originally pub-
lished in a print newspaper or whether it had been written specifically for a web
blog. As a result, we treat these reports and blogs as a single category (although
it was generally easy for raters to distinguish between news reports/blogs versus
sports reports/blogs, based on the topic of the text).
The second most frequent general register is Informational Description/
Explanation (15 % of the texts in our corpus; see Table 5). However, as Table 6
shows, raters often failed to agree on the specific sub-register for this general
category (52 % of the total texts). In future research, we plan to investigate the
possibility of hybrid registers at the sub-register level to better understand the
nature of these texts.
Opinion web pages were nearly as common as description pages (see Table 5).
Nearly half of these were classified as Opinion blogs (47 %), while another 19 %
were classified as Reviews. In general, there was much higher agreement about
these sub-register categories of Opinion than there was for the general category
of Informational Description/Explanation.
The Interactive Discussion general register was also used relatively fre-
quently, and the majority of these texts were classified as Question/Answer
forums. Similar to blogs, these are specialised web registers not found in print
media.
The other four general register categories – Lyrical, How-to/Instructional,
Informational Persuasion and Spoken – occurred much less frequently than the
major categories of Narration, Informational Description/Explanation, Opinion
and Interactive Discussion. However, it is clear that these registers each comprise
one or two important sub-register categories. For example, the specific sub-regis-
ters of song lyrics and spoken interviews were especially prevalent.
While some of these general registers and sub-registers are very similar to
traditional print registers (e.g., News reports, Sports reports, Reviews, Research
articles, Song lyrics), many of them are unique to the domain of the internet. For
example, the sub-registers of Personal/diary blogs and Opinion blogs, as well
as the general register of Interactive Discussion, are distinctive to the internet.
Furthermore, some of the web registers that appear to be traditional are actu-
ally quite different from their printed, non-internet counterparts. This is due to
several factors, including the relative ease of ‘publishing’ on the internet and
decreased attention to pre-planning and editing common in many internet regis-
ters. In future research, we plan to explore these innovative registers in consider-
ably more detail (see Section 6 below).
5 Hybrid registers
At the beginning of Section 4, we noted that many web pages were coded with a
2-2 split. For example, two raters might have coded a given page as a ‘narrative’,
while two other raters classified the same page as an ‘informational description/
explanation’. One interpretation of these splits is that they simply show a lack of
agreement among raters, reflecting a lack of reliability in the register framework.
However, the actual distribution of these pairings suggests a different interpreta-
tion.
In theory, there are 28 different 2-2 categories that could be formed by com-
bining the 8 general register categories in our framework. So, for example, there
are 7 different 2-2 categories that could have been formed by combining ‘narra-
tive’ with one of the other categories (narrative-spoken, narrative-interactive
discussion, narrative-informational description, narrative-opinion, narrative-in-
formation presented with the intent to persuade, narrative-how-to, narrative-lyr-
ical). Similarly, there are 21 other pairings of general registers that are theoreti-
cally possible.
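The count of theoretically possible pairings can be checked by enumerating all unordered pairs of the eight general registers. The short Python sketch below does so with the standard library; the list name `GENERAL_REGISTERS` is simply our own label for the categories in Table 1.

```python
from itertools import combinations
from math import comb

# The eight general register categories of the framework (see Table 1).
GENERAL_REGISTERS = [
    "Spoken",
    "Interactive Discussion",
    "Narrative",
    "Informational Description/Explanation",
    "Opinion",
    "Informational Persuasion",
    "How-to/Instructional",
    "Lyrical",
]

# Every unordered pair of distinct registers is one possible 2-2 category.
pairs = list(combinations(GENERAL_REGISTERS, 2))
with_narrative = [pair for pair in pairs if "Narrative" in pair]

# The enumeration agrees with the binomial coefficient "8 choose 2".
assert len(pairs) == comb(len(GENERAL_REGISTERS), 2)

print(len(pairs))                         # 28 possible pairings in total
print(len(with_narrative))                # 7 pairings that involve Narrative
print(len(pairs) - len(with_narrative))   # 21 remaining pairings
```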
Given this fact, it is surprising that only four combinations of general registers
commonly occurred in 2-2 splits (see Table 7): Narrative+Informational Descrip-
tion, Narrative+Opinion, Informational Description+Opinion and Informational
Persuasion+Opinion. Other combinations occur in 2-1-1 splits (see Table 8). This
restricted set of commonly occurring register combinations suggests an alterna-
tive explanation for the lack of agreement among raters: rather than reflecting a
problem with the coding rubric, these common 2-2 combinations (and 2-1-1 com-
binations) can be interpreted as evidence that these texts belong to ‘hybrid’ reg-
isters – registers that combine the communicative purposes and other situational
characteristics of two or more general registers.
Evidence for this interpretation comes from the fact that these combina-
tions were identified by coders much more often than others. In particular, the
frequent hybrid combinations are restricted to four general register categories:
Narrative, Informational Description/Explanation, Opinion and Informational
Persuasion. These four general register categories are distinguished primarily by
their communicative purposes. For example, Table 7 shows that Narrative+Infor-
mational Description occurred 43 times, accounting for ca. 41 % of all 2-2 splits.
Table 8 shows that Narrative+Description+Other also accounts for ca. 56 % of 2-1-1
splits, further supporting the existence of a hybrid register that combines these
purposes.
Table 7: General register 2+2 hybrid combinations
Hybrid Combination (2+2)                                  Count
Narrative + Informational Description/Explanation         43
Narrative + Opinion                                       27
Informational Description/Explanation + Opinion           17
Informational Persuasion + Opinion                        11
Informational Description + Informational Persuasion      6
Informational Description + How-to/Instructional          4
Interactive Discussion + Opinion                          4
Informational Description + Interactive Discussion        3
How-to/Instructional + Opinion                            3
TOTAL                                                     118
Table 8: General register 2+1+1 hybrid combinations
Hybrid Combination (2+1+1)                                Count
Narrative + Description + Opinion                         56
Description + Informational Persuasion + Opinion          40
Narrative + Description + Informational Persuasion        28
Informational Persuasion + Narrative + Opinion            24
Description + How-to/Instructional + Opinion              15
Other combinations                                        10
TOTAL                                                     173
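Counts of the kind shown in Table 7 can be obtained by tallying which pair of registers is involved in each 2-2 split. The following sketch illustrates one way of doing this in Python. The helper names `hybrid_pair` and `tally_hybrids` are hypothetical, and the toy data are invented purely for illustration; they are not drawn from the corpus.

```python
from collections import Counter


def hybrid_pair(codes):
    """Return the alphabetically sorted register pair of a 2-2 split, else None."""
    counts = Counter(codes)
    if len(codes) == 4 and sorted(counts.values()) == [2, 2]:
        return tuple(sorted(counts))
    return None


def tally_hybrids(pages):
    """Count how often each register pair occurs among the 2-2 splits."""
    pairs = (hybrid_pair(codes) for codes in pages)
    return Counter(pair for pair in pairs if pair is not None)


# Invented toy data (NOT from the study): four codes per page.
toy_pages = [
    ["Narrative", "Narrative", "Opinion", "Opinion"],
    ["Opinion", "Narrative", "Opinion", "Narrative"],
    ["Narrative", "Narrative", "Narrative", "Opinion"],  # 3 agree: not a hybrid
    ["Lyrical", "Opinion", "Lyrical", "Opinion"],
]

for pair, count in tally_hybrids(toy_pages).most_common():
    print(" + ".join(pair), count)
```

A tally of this kind makes it easy to see whether a small number of pairings dominates the 2-2 splits, which is the pattern that motivates the hybrid-register interpretation.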
Text Sample 1 illustrates a web page from the Daily Mail with combined Narra-
tive+Informational Description communicative purposes. Two raters coded the
sub-register of this text as a news report/blog and two other raters coded it as
a description of people. This text occurs online as a single web page (which is
still available on the web, despite its dated content). However, the text comprises
a series of topics, demarcated only by the use of ALL-CAPS. (The formatting of
the 8th paragraph is corrupted in the original version of the page online, since
THURSDAY nights and THE fashionable residents seem to begin new topics.) The
title of the page (It’s King Tony to see you, ma’am) seemingly relates only to the
first of these embedded topics. Such pages are common on the web (and perhaps
becoming more common in print media). They have no single topic or commu-
nicative purpose, except perhaps to present a collection of information that the author
happens to find interesting or amusing. The information in the page is sometimes
descriptive and sometimes narrative, resulting in the hybrid nature of such texts.
Text Sample 1:
<http://www.dailymail.co.uk/debate/columnists/article-316674/Its-King-Tony-maam.html>
<h> It’s King Tony to see you, ma’am

 Tony and Cherie Blair arrived at Balmoral last night for their annual get-together with
the Queen and the Duke of Edinburgh.
 The Blairs have spent the summer touring the West Indies, Italy and Greece, hobnob-
bing with celebrities and world leaders, barely spending a penny of their own money. A
Royal tour in all but name.
 The Windsors spent most of the summer pottering unnoticed around Britain.
 One can’t help wondering why Her Majesty doesn’t just hand over the key to the castle.
 BORIS JOHNSON is in big trouble with Commons speaker and former sheet-metal worker
Michael ‘Gorbals Mick’ Martin. The Tory MP’s new novel features a Commons Speaker who
is a “buttockclenching, fat, tactless, Left-wing Scot who eats the traditional sheet-metal
worker’s breakfast of black pudding”. Order! Order!
 DON’T be taken in by claims that Tory chairman Liam Fox patched up the row over the
warning by Karl Rove --George Bush’s aide – that Michael Howard will never be allowed to
meet the President. Rove was “too busy” even to speak to Fox at the Republican convention,
let alone sit next to him during Bush’s speech, as was claimed.
 CHERIE BLAIR’S new job as ambassador for Britain’s 2012 Olympic bid has surprised
friends who cannot recall her interest in sport. She is being ‘coached’ by her new spin doctor
Jo Gibbons, a former Football Association aide.
 Gibbons is best friends with Jo Moore, the Labour aide who “coached” the former Trans-
port THURSDAY nights at London disco, Base 1, situated in a basement beneath the Tory
Party’s new HQ in Victoria Street, Westminster, are booming. The club has been “adopted”
by smart preppy males who work for the Conservatives and pop downstairs for a sweaty
session of high-energy dancing once a week. THE fashionable residents of Suffolk resort
Walberswick – including film-maker Richard Curtis and his partner Emma Freud, daughter
of ex-MP Clement – may be alarmed to learn the least fashionable member of the Cabinet
has moved in. Defence Secretary Geoff Hoon, the kind of man who wears knee-length socks
with open-toed sandals on his hols, is a new neighbour. Somehow he mingled with them
unnoticed at last week’s summer fete.
 THE death of spin has been greatly exaggerated. Labour HQ has sent out invitations to
MPs summoning them to a series of three all-day training sessions on how to ‘spin’ stories
to the media.
It is perhaps not surprising that such texts also often include opinionated pur-
poses. (Even Text Sample 1 could be interpreted in that way, although there are
few overt lexico-grammatical expressions of stance.) In particular, personal
blogs commonly combine narrative and opinionated purposes. For example,
Text Sample 2 was coded by two raters as a narrative-personal blog, and by two
raters as an opinion blog. A quick read through this text shows both purposes: it
begins with a narrative, but it also includes considerable discussion that could be
regarded as overt opinion (e.g., my gut is; Here’s one good reason to do that; But
I’m already on-side with that argument. It’s time to convince people…; ‘Making the
internet happen’ shouldn’t be magic).
Text Sample 2:
<http://matthewsheret.com/2011/08/26/time-to-get-out-more/>
<h> Time to get out more

 So, I’ve been thinking about something else that Laptops and Looms threw up for me.
 At one point someone -- I think it was Alice Taylor -- remarked that we’re really good at
talking about post-digital stuff to one another, but that it’s time to talk to other people. And
while many people at the event seemed to think about that in the context of reaching out to
manufacturers and discussing new ways of grokking production, my gut is that we should
talk more to people totally uninvolved with the whole thing.
 Here’s one good reason to do that. It was fascinating, hearing what a bunch of people
might do if given the opportunity to turn old mills and factories built a hundred and fifty
years ago into things that operate in the space between digital interfaces and traditional
manufacture. But I’m already on-side with that argument. It’s time to convince people
who’ll have to live with those products and live alongside the places that produce them.
 Here’s another. Russell jokingly mentioned the ‘Google apprenticeship’ as a means of
answering some of the questions floating around the room to do with aspiration, but my
gut feeling is that you get people engaged with working in companies like Google when
you demystify the whole process. ‘Making the internet happen’ shouldn’t be magic that
someone else does anymore, it should be something we show off.
<h> Find me at
<h> Email me
Finally, informational/descriptive texts often incorporate evaluative language,
but they are not uniformly regarded as ‘opinionated’. Text Sample 3 presents an
extreme case: a business report on a corporation that begins with an explicit dis-
claimer that the blog represents ‘personal opinions’. However, this text is mostly
presented as a simple report of information. It overtly identifies ‘strengths’ and
‘weaknesses’, but the information provided appears to be mostly factual descrip-
tion. Reflecting these combined purposes, two raters coded this text as an opinion
blog, one rater coded it as descriptive information and one coded it as a news
report/blog.
Text Sample 3:
<http://beta.fool.com/leglamp/2012/11/09/get-a-leg-up-on-the-market/16123/>
<h> Get a LEG Up on the Market

 AnnaLisa is a member of The Motley Fool Blog Network -- entries represent the personal
opinions of our bloggers and are not formally edited.
 Leggett & Platt (NYSE: LEG ) , the diversified bedspring, automotive, and industrial
manufacturer, just announced it would pay its dividend early so that shareholders wouldn’t
see a big tax on the dividend usually paid out in January. The early Christmas present goes
ex-dividend on Dec. 10, with the dividend to be paid out on Dec. 27. Leggett & Platt seems to
be one of the first companies to react to an anticipated tax increase on dividends come 2013.
This Standard & Poor’s dividend aristocrat is certainly shareholder attentive, but let’s drill
down on this company’s strengths, weaknesses, opportunities, and threats.
 STRENGTHS
 The company is extremely shareholder friendly, with dividends paid since 1987, and has
more than 25 consecutive years of increasing the dividend.
 An EPS growth rate of 15 %, and a P/E that currently stands at 21.59.
 The company is diversified across many industries besides their original status as a
bedspring company. It also manufactures retail store fixtures and display units, industrial
parts (especially for automotive and aviation), and parts for office and residential furniture.
 Their latest 10-K states the company plans to maintain a 4-5 % growth rate.
 The company repurchased 10 million shares in 2011.
 Their latest Q3 earnings release on Oct. 29 beat with EPS rising 45 % over the same
quarter a year ago and reflected strong volume and expanding margins.
 The yield now stands at 4.20 %
 WEAKNESSES
 The payout ratio on the yield is 90 % , very high for a company that is not a REIT or a
master limited partnership.
 Their P/E is higher than the industry average and higher than the 15.63 P/E of competitor
Genuine Parts Company (NYSE: GPC )
 While they manufacture most of their steel wire in house, steel is their number one raw
material and fluctuations in steel prices are a continuing concern, according to their 10-K.
 Revenue from international operations dropped due to currency fluctuations. […]
Three-way splits, summarised in Table 8 above, suggest that there might be
hybrid registers that combine multiple communicative purposes. The most fre-
quent 3-way hybrid is Narrative+Opinion+Description. Text Sample 3 above gives
one example of this type. Another example of a 3-way hybrid was coded as a
News report/blog (2 raters), a Description of a person (1 rater) and an Opinion
blog (1 rater). The title of this text is enough by itself to demonstrate the triad
of characteristics recognised by raters: ‘On the road: Bradley Wiggins and Team
Sky have made Tour de France history – it’s been emotional’. This text is a blog
post that recounts a recent news story (Narrative), describes a team of athletes
(Description), and recounts the emotions and attitudes of the author (Opinion).
A different kind of hybrid register is extremely common on the web: pages
that present a text followed by reader comments. Table 9 shows that this type
of hybrid can occur with any of the non-interactive written registers.¹ However,
it is interesting to note that reader comments are much more likely with some
registers than others. In particular, pages expressing opinions or persuasion are
especially likely to include reader comments: ca. 60 % of opinion pages and 80 %
of informational persuasion pages are followed by reader comments.
¹ This option is not applicable to written interactive discussions, which incorporate reader com-
ments by definition. We are not sure why transcribed texts of spoken events are not followed by
reader comments in our sub-corpus.
Table 9: Frequency information for texts containing reader comments
Register                    Count    % of register with comments
Narrative                   87       49.1 %
Opinion                     86       61.4 %
Description                 37       30.6 %
Informational Persuasion    12       80.0 %
How-to/Instructional        8        29.6 %
Lyrical                     4        21.1 %
Spoken                      0        0
Discussion                  0        0
Total                       234      --
6 Summary and future directions

The approach for register classification adopted here – a bottom-up hierar-
chical framework based on underlying situational characteristics – allows us
to describe the register characteristics of most web pages. Raters agree on the
general register category of ca. 63 % of the web pages included in our corpus (see
Table 3 above). Approximately another 25 % of these texts were coded as ‘hybrid’
registers belonging to a few combinations that occur commonly on the web (e.g.,
Narration + Information Description; Narration + Opinion; see Tables 7 and 8).
Taken together, these results indicate that approximately 88 % of web pages can
be reliably described for their singular or hybrid register characteristics.
An alternative perspective is to consider the register categories themselves,
regarding the extent to which general registers occur in their ‘simple’ state, rather
than as hybrids in combination with some other register category. At one extreme,
Table 10 shows that interactive discussions (e.g., question-answer forums) and
lyrical texts (e.g., songs or poems) usually occur as ‘simple’ registers, with only
ca. 30 % of those texts being coded as hybrids in combination with some other
register category, a relatively small proportion in comparison with several of the
other register categories. At the opposite extreme, Informational Persuasion was
almost never identified as the simple register of a web text. However, it was com-
monly selected by at least one of the raters, suggesting that this communicative
priority frequently occurs in hybrid combinations with other general register
categories.
Table 10: Extent to which each register category was identified as a simple register (3 or 4 raters
in agreement), as a hybrid category (2-2 or 2-1-1 splits), or by only 1 rater
General Register                         3-4 raters    2 raters      1 rater       Total (100 %)
Narrative                                177 (47 %)    109 (29 %)    91 (24 %)     377
Informational Description/Explanation    140 (30 %)    97 (21 %)     231 (49 %)    468
Opinion                                  121 (50 %)    114 (47 %)    8 (3 %)       243
Interactive Discussion                   79 (69 %)     14 (12 %)     22 (19 %)     115
How-to/Instructional                     27 (33 %)     23 (28 %)     33 (40 %)     83
Lyrical                                  19 (68 %)     3 (11 %)      6 (21 %)      28
Informational Persuasion                 15 (8 %)      38 (21 %)     125 (70 %)    178
Spoken                                   6 (43 %)      8 (57 %)      0 (0 %)       14
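The percentages in Table 10 are row percentages: each count is divided by the total number of texts for which at least one rater selected the register in question. As a brief worked check, the following Python sketch recomputes three rows from the counts given in the table; the names `TABLE_10` and `row_percentages` are our own and serve only this illustration.

```python
# Counts copied from Table 10: (3-4 raters, 2 raters, 1 rater).
TABLE_10 = {
    "Narrative": (177, 109, 91),
    "Interactive Discussion": (79, 14, 22),
    "Informational Persuasion": (15, 38, 125),
}


def row_percentages(counts):
    """Return each count as a whole-number percentage of the row total."""
    total = sum(counts)
    return tuple(round(100 * count / total) for count in counts)


for register, counts in TABLE_10.items():
    print(register, sum(counts), row_percentages(counts))
```

The contrast between the rows is the point of the comparison: Interactive Discussion is identified by 3 or 4 raters in 69 % of the cases in which it is selected at all, whereas Informational Persuasion reaches that level of agreement in only 8 % of cases.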
Narration, description, exposition and argumentation have long been regarded
as core textual categories distinguished by their communicative purposes (cor-
responding to the rhetorical ‘modes’ of discourse; see Connors 1981). In the reg-
ister framework developed here, we divided these distinctions up in a somewhat
different way, based on our survey of the kinds of texts found on the web and
our early pilot studies to investigate the distinctions that end-users could reliably
make (see Sections 2 and 3 above). Thus, we ended up combining ‘exposition’ and
‘description’ into our category of Informational Description/Explanation, while
we split ‘persuasion’ into two categories: Opinion (expressing attitudes with little
supporting evidence) and Informational Persuasion (a type of exposition with a
clear intent to sell or persuade).
However, our preliminary results, summarised in Table 10, indicate that
these general register categories are not equally well-defined for end-users. For
example, about half of the texts in our corpus (468 of the 931 texts) were coded
as Informational Description/Explanation by at least one rater, suggesting that
most texts can be regarded as presenting some kind of description/explanation of
information. Texts were also commonly coded as having narrative purposes (377
texts), often in hybrid combinations with other registers.
The results for opinionated/persuasive texts are especially interesting here.
On the one hand, the category of simple opinion seems to be relatively well
defined: half of the texts classified as such in some way were categorised as
simple opinion by 3 or 4 raters. In most other cases, if a text was coded as opinion
by two raters, it was coded as narration or description by the other raters. By con-
trast, the category of Informational Persuasion seems especially problematic: it
was almost never identified as the simple register of a text, but there were many
instances where one rater noted this communicative priority. Over half of those
texts were coded as simple opinion by other raters, suggesting that these two
general registers are especially difficult to distinguish. Results like this point to
the need for more detailed future research focused on these categories.
In our on-going research, we are applying the framework and analytical
approach outlined here to a much larger corpus, with over 50,000 texts randomly
sampled from the web. That research effort will allow us to investigate the extent
to which the patterns described in Sections 4 and 5 above are typical of the web
more generally and to undertake more detailed analysis of specific patterns
(especially regarding sub-registers and sub-register hybrids). Beyond that, we
plan to analyse the lexico-grammatical characteristics of those texts and eventu-
ally undertake predictive research for the purposes of automatic register (genre)
identification.
One of the major limitations of the hierarchical approach used for these
analyses is that specific sub-registers are restricted to a single general register
category on an a priori basis. For example, sports blogs are listed only as a sub-
register of Narrative; reviews are listed only as a specific sub-register of Opinion;
editorials are listed only as a specific sub-register of Informational Persuasion.
This approach was motivated by two considerations: 1) previous research had
indicated that end-users become overwhelmed when they are required to directly
choose from a massive list of specific sub-registers and 2) we therefore believed
that general register categories – isolating specific situational characteristics –
would be easier to identify than specific sub-registers. However, review of our
findings here suggests the need to further explore these decisions.
As a result, we also plan to explore the possibility that some sub-register dis-
tinctions might be easier to directly identify than general register distinctions.
For example, a particular text might be a clear instance of a sports blog. However,
given the design of our coding framework at present, an end-user might never
be given the chance to make that simple classification. For example, if a user
decided that a text was primarily opinionated rather than narrative, there would
be no possibility of subsequently identifying the text as a ‘sports blog’ (see Table
2 above).
To explore this possibility, we plan to recode a set of web pages from our
corpus, asking users to directly choose a specific sub-register category. Then, the
results of the hierarchical coding will be compared to the results of the direct
sub-register coding for those texts. Our expectation is that the two approaches
will uncover complementary patterns. For example, we expect to find some texts
that clearly belong to a single specific sub-register but combine multiple general
registers (e.g., a sports blog with both narrative and opinionated purposes).
We also expect to find some common hybrid sub-register categories that bridge
general registers (e.g., a personal blog + opinion blog hybrid; or an editorial +
review hybrid). We would not argue that one or the other of these approaches is
correct, but taken together, our hope is that we will be able to offer a more com-
prehensive description of the incredible range of register variation found on the
web.
Acknowledgements
This material is based upon work supported by the National Science Foundation
under Grant No. 1147581. We also thank Anna Gates and Rahel Oppliger for their
help with the pilot testing of register classification schemes.
References
Baroni, Marco & Silvia Bernardini. 2004. BootCaT: Bootstrapping corpora and terms from the
web. Proceedings of LREC 2004, 1313–1316. Lisbon: ELDA.
Baroni, Marco, Silvia Bernardini, Adriano Ferraresi & Eros Zanchetta. 2009. The WaCky wide
web: A collection of very large linguistically processed web-crawled corpora. Language
Resources and Evaluation 43(3). 209–226.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University
Press.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999.
Longman grammar of spoken and written English. Harlow: Longman.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge
University Press.
Connors, Robert J. 1981. The rise and fall of the modes of discourse. College Composition and
Communication 32(4). 444–455.
Crowston, Kevin, Barbara Kwaśnik & Joseph Rubleske. 2010. Problems in the use-centered
development of a taxonomy of web genres. In Alexander Mehler, Serge Sharoff & Marina
Santini (eds.), Genres on the Web: Computational models and empirical studies, 69–84.
New York: Springer.
Egbert, Jesse & Douglas Biber. 2013. Developing a user-based method of web register
classification. In Stefan Evert, Egon Stemle & Paul Rayson (eds.), Proceedings of the 8th
Web as Corpus Workshop (WAC-8) @Corpus Linguistics 2013, 16–23.
Fletcher, William H. 2012. Corpus analysis of the World Wide Web. In Carol A. Chapelle (ed.),
Encyclopedia of applied linguistics, 1339–1347. Hoboken, NJ: Wiley-Blackwell.
Kilgarriff, Adam & Gregory Grefenstette. 2003. Introduction to the special issue on the Web
as Corpus. Computational Linguistics 29. 333–347.
Lindemann, Christoph & Lars Littig. 2010. Classification of Web sites at super-genre level.
In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the Web:
Computational models and empirical studies, 211–235. New York: Springer.
Rehm, Georg, Marina Santini, Alexander Mehler, Pavel Braslavski, Rüdiger Gleim, Andrea
Stubbe, Svetlana Symonenko, Mirko Tavosanis & Vedrana Vidulin. 2008. Towards a
reference corpus of Web genres for the evaluation of genre identification systems.
In Proceedings of the 6th Language Resources and Evaluation Conference, 351–358,
Marrakech, Morocco.
Rosso, Mark A. & Stephanie W. Haas. 2010. Identification of Web genres by user warrant.
In Alexander Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the Web:
Computational models and empirical studies, 47–68. New York: Springer.
Santini, Marina. 2007. Characterizing genres of Web pages: Genre hybridism and
individualization. In Proceedings of the 40th Hawaii International Conference on System
Sciences (HICSS-40). Hawaii.
Santini, Marina. 2008. Zero, single, or multi? Genre of Web pages through the users’
perspective. Information Processing and Management 44(2). 702–737.
Santini, Marina & Serge Sharoff. 2009. Web genre benchmark under construction. Journal for
Language Technology and Computational Linguistics 25(1). 125–141.
Santini, Marina. 2010. Cross-testing a genre classification model for the Web. In Alexander
Mehler, Serge Sharoff & Marina Santini (eds.), Genres on the Web: Computational models
and empirical studies, 87–127. New York: Springer.
Sharoff, Serge. 2005. Creating general-purpose corpora using automated search engine
queries. In Marco Baroni & Silvia Bernardini (eds.), WaCky! Working papers on the Web
as Corpus, 63–98. Bologna: Gedit.
Sharoff, Serge. 2006. Open-source corpora: Using the net to fish for linguistic data.
International Journal of Corpus Linguistics 11(4). 435–462.
Sharoff, Serge, Zhili Wu & Katja Markert. 2010. The Web library of Babel: Evaluating genre
collections. In Proceedings of the Seventh Language Resources and Evaluation Conference,
LREC 2010. Malta.
Vidulin, Vedrana, Mitja Luštrek & Matjaž Gams. 2009. Multi-label approaches to Web genre
identification. Journal for Language Technology and Computational Linguistics 24(1).
97–114.
Heidrun Dorgeloh
The interrelationship of register and genre
in medical discourse
Abstract: This chapter is concerned with medical discourse which is produced
beyond the established roles of doctors and patients. The text varieties inves-
tigated are all somewhat hybrid, either in form, discourse function, or both. A
study based on a small corpus of these texts investigates the presence of features
from a narrative discourse mode and finds variable relationships of textual form
and textual function, which are then discussed from a genre as well as from a
register perspective. While it turns out that the presence of a narrative register
crosscuts over specific discourse activities, the genre perspective can explain the
nature of this textual variation. It accounts for the pervasiveness of linguistic fea-
tures but, more importantly, for the variant discourse functions which apply to
the verbalisation of medical experience. In such cases, it is argued, a genre analysis
logically subsumes and pre-determines a register analysis.
1 Introduction
Medicine uses a variety of texts since it is both an “area of knowledge […] and the
applied practice of that knowledge to medical praxis” (Gotti and Salager-Meyer
2006: 9). Accordingly, most linguistic research on medical discourse focuses
either on written genres of the medical profession, such as case reports or medical
research articles, or on the speech of medical practitioners and their patients, i.e.
on medical encounters or interviews. By contrast, the present study is concerned
with text varieties in medicine which are produced beyond the established roles
of both speaker groups. It deals with illness blogs, on the one hand, and medical
case presentations, including some innovative forms, on the other. These consti-
tute, in line with the purpose of the present volume (cf. Schubert, this volume),
less established and more hybrid forms of medical case writing and thus provide
good cases in point for illustrating new directions in register research. In particu-
lar, I will argue for a close interrelationship between register and genre as well as
for a primacy of the notion of genre, rather than (sub-)register.
Heidrun Dorgeloh, Düsseldorf University

As laid down in the introduction to this volume, register and genre are dif-
ferent perspectives for analysing text variety: the register perspective considers
functional correlations of linguistic co-occurrence patterns with variables from
the situation of use while the genre perspective refers to properties of entire texts
and has a conventional basis (Biber and Conrad 2009: 15; also Schubert, this
volume). It results from this distinction that a register analysis rests upon quan-
titative co-occurrence patterns in a given situation whereas genre characteristics
can actually be quite rare. They contribute to the rhetorical organisation of a text,
often occurring only once or in a particular position (Biber and Conrad 2009: 16).
Since textual variation can in principle refer to any level of text classification
(Biber 2006: 12), other approaches to register and genre point out that the concepts
also differ in the level of generality at which they determine situational varieties
(Giltrow 2010; Dorgeloh and Wanner 2010). The concept of a genre focuses
primarily on the discourse goals and purposes (e.g. Martin and Rose 2003; Swales
2004), on the kind of “social action” (Miller 1984); therefore the classification
is typically more specific for genres than for registers (Giltrow 2010: 30). More
specialised text varieties are also referred to as “sub-registers” (cf. Biber and Gray
2013), but genre studies have emphasised that the textual or social event is an
important basis for text classification, thus subsuming in one category a co-pat-
terning of setting, structure, and function (Richards and Schmidt 2002: 224). I will
argue here that for text varieties of medical discourse, which are often marked by
“discourse hybridity” (Sarangi and Roberts 1999; Sarangi 2001; also cf. Biber and
Egbert, this volume), a genre perspective in line with these approaches covers
the relevant linguistic patterns at a sufficient level of specificity. In particular, I
will show that the form-function correlations that one finds have more to do with
activity types, such as covered by the concept of genre, than with general situa-
tional parameters.
The case studies presented below contrast with more recently developing
medical genres. The aim of the analysis is to show that, on the one hand, there
are general discourse goals and purposes within medical discourse, notably
narration, which crosscut over all the texts investigated. The resulting language
variation is covered by the register perspective, since it defines a rather general,
presumably universal, register pattern (Biber and Conrad 2009: 259). On the other
hand, this pattern serves in a given genre more specific discourse goals, which
are expressed by features which need be neither frequent nor pervasive. For example,
the interactional hybridity of a medical encounter includes a narrative discourse
type, but this type is embedded within a more complex social event, in which a
doctor fulfils several tasks such as data gathering, relationship building, and edu-
cating the patient about diagnoses and treatment (Frankel 2000: 85; also Maseide
2003). This variation within one activity produces more hybrid registers. In such
cases, the genre perspective has clear advantages over the register perspective,
since it focuses on the social activities going on and hence provides text classifi-
cation at a rather low level of generality. However, this means that the concept of
genre must be taken beyond the limits of rhetorical conventions.
The chapter is structured as follows: in Section 2, I offer a more detailed
consideration of the concepts of register and genre as categories for text clas-
sification from a theoretical point of view. Section 3 introduces three varieties
of medical discourse: on the one hand, it describes how they are situated with
regard to a general narrative dimension of textual variation (level of form); on the
other, the texts are discussed as instantiating different genres (level of discourse
function and social activity). The resulting profiles of the three functional varie-
ties show that the sample texts investigated are all hybrid in either form, function
or both. This complex picture is typical for the domain of medicine, and it can be
best understood from the genre perspective. Based on these profiles, an analysis
of characteristic form-function-relations within the medical register, in particular
with regard to narrative features, is provided in Section 4, followed by a conclud-
ing discussion in Section 5.
2 Some theoretical issues on register and genre

This section will discuss the concepts and positions relevant for the analysis of
the medical text varieties in Sections 3 and 4.
2.1 Register and genre in the context of the study of language variation1
Language variation is conditioned by a variety of social and pragmatic factors.
When studied by way of quantitative, corpus-based methodology, there are in
principle two research goals that can be pursued: the first is “to describe the vari-
ants and use of a word or linguistic structure” and the second “to describe differ-
ences among texts and text varieties, such as registers […]” (Biber 2012: 12). While
the former approach is variationist in nature, i.e. it presupposes the existence of
“formal alternatives which can be considered optional variants, in the sense that
they are nearly equivalent in meaning” (Biber et al. 1999: 14), register variation in
1 Cf. also the introduction to Dorgeloh and Wanner (2010).

principle also involves “different ways of saying different things” (Halliday 1978:
35; emphasis added). As a result, the study of textual variation deals with “varia-
tion in verbalization [which] is not occasional [… but] UBIQUITOUS” (Croft 2010:
10; emphasis in the original).
This difference allows for some insights regarding the nature of both regis-
ters and genres. Rosenbach (2002: 77) proposes the attribute “choice-based” for
this type of linguistic variation, in contrast to the “variation-based” perspective,
which concentrates on sets of formal variants. The study presented here, and
in fact the entire volume, belongs to the choice-based, “text-linguistic” tradi-
tion (Biber 2012: 12), which means that the texts themselves are the target of the
description and not a predictor for the occurrence of formal variants.2 It results
from this approach that register and genre differences are typically “not categor-
ical (such that one variety has a certain grammatical element or syntactic con-
struction which another has not)” (Kortmann 2006: 603); instead, the choices
motivated and reflected by them are “meaningful choices”, in the sense of
serving “the […] needs of the language user” (Schulze 1998: 7). As shown below,
this applies not only to the occurrence of individual linguistic features, but also
to entire patterns of textual form, which can be shared by what are nonetheless
distinct text varieties.
Another consequence of the “polyvalent” nature of “grammatical structure in
discourse” (Sankoff 1988: 141, emphasis in the original) is that genres, but not reg-
isters, are in principle formally “underdetermined” (Giltrow and Stein 2009: 3).
Only by virtue of their being “typified responses to situations” (Salmon 2010: 219)
do users of a genre generally know what to expect and infer “both the stable and
variable aspects of form” (Salmon 2010: 223). For the linguistic variation taking
place within them this means that the genre perspective includes both frequently
occurring features and patterns that occur less pervasively; i.e. the genre
perspective logically subsumes, rather than opposes, the register perspective.
2.2 Genre in relation to register and discourse type
Textual variation is “normal in individuals’ linguistic performance” (Honeybone
2011: 167): speakers show “shifts in usage levels” for features associated with the
situation, i.e. they switch into specific registers, but they also switch “into and
out of genres” (Schilling-Estes 2002: 375). While a register is “associated with
a particular situation of use” (Biber and Conrad 2009: 6), the concept of genre
2 A detailed account of the distinction can be found in Biber (2012).

focuses primarily on the discourse goals and purposes, including “culturally rec-
ognized” patterns (Coupland 2007: 15) for realising them. As a result, the level of
genre classification tends to be lower, i.e. more specific, suggesting that genres
can, and typically do, contrast in registers, for example when requiring a certain
level of formality or technicality. Use of a certain register is therefore a function
of, but not a sufficient condition for, a genre, i.e. the genre perspective is the more
encompassing one.
In the text-linguistic tradition, discourse goals and purposes have also led
to the establishment of text typologies, which often integrate basic rhetorical
types (e.g. Kinneavy 1971; Werlich 1976). The text or discourse type here refers to
entire texts; but this tradition is still rather separate from genre analysis, if only
due to the fact that they “feature in different studies” (Virtanen 2010: 55). By
contrast, corpus-linguistic work (e.g. Biber 1988, 1989) understands text types
as “co-occurrence variables” (Eckert and Rickford 2001: 5), i.e. these text types
are, much like registers, the outcome of a classification based on linguistic form
(Biber 1988: 170). It is a central insight from this corpus-based tradition that genre
distinctions do not “adequately represent the underlying text types” (Biber 1989:
6). This finding is further support for the position that genres are to a certain
extent underdetermined by, and hence independent of, their form.
The category of discourse type, in contrast to text type, refers more directly to
the function of a discourse (Virtanen 2010: 57), but, in contrast to the discourse
goal pertaining to a genre, this has traditionally meant a discourse classification
based on a limited set of functions; for instance, on a classification of illocutions
(e.g. Brinker 2005). It is an important insight from this kind of work that the func-
tional discourse types are related in different ways to their linguistic form, since
a discourse type can express its function more or less directly (Virtanen 1992a,
2010). Narrative structures, in particular, have been noted to have primary or sec-
ondary uses, i.e. they are a textual pattern that “can be put to use in very different
genres” (Virtanen 2010: 76).3
The analysis of medical texts presented here rests upon such a principled
separation of linguistic form, i.e. register features and text structure, and dis-
course function. A classification by discourse function leads, at a more general
level, to the identification of the discourse type; at a more specific level, it results
in genres. The analysis is also based on the assumption that the category of “nar-
rative” refers both to a very basic and presumably universal register and text type
(Virtanen 1992a; Biber and Conrad 2009) as well as to a widely used discourse
type or meta-genre (Fludernik 1996; Smith 2003). In the domain of medicine,
3 Werner (this volume), for example, notes the narrative properties of online text commentaries.
both narrative form and function play a prominent role, since knowledge in this
discipline is not just expertise, i.e. “relevant biological and pathological infor-
mation”, but is primarily evidence based on human experience (Hunter 1991: 8).
It is interesting to note in this context that recent discussions on medical dis-
course have argued quite explicitly in favour of a more “narrative” kind of med-
icine (e.g. Charon 2006), emphasising the importance of the individual patient
and his or her experience. As a result, there are now genres within the medical
register which are innovative particularly with respect to the role of narration.
While proper storytelling is absent in professional medical reporting, there are
now other types of medical discourse which are more open to narration. This dif-
ference, however, does not primarily manifest itself in a more or less extensive
use of narrative features. Looking at three different genres from the medical reg-
ister in this study, I therefore hypothesise here that 1) a narrative discourse func-
tion correlates only insufficiently with a narrative form, and that 2) a discourse
purpose other than narration does not necessarily result from the absence of nar-
rative form. This in turn suggests that the function or goal of a discourse is not
primarily something to be observed in the form of frequencies of occurrence. On a
more theoretical level, these findings will lead me to the claim that, with respect
to the specific discourse goals and purposes typical of the context of medicine,
the target of the description should be the genre, rather than the register.
3 Types of medical discourse
3.1 Sources and voices in medicine
The instances of medical discourse which I will cover in my analysis come from
three different sources: illness blogs written by patients, case reports written by
doctors, and texts from a special section termed “Clinical Crossroads” of The
Journal of the American Medical Association (JAMA). Each of these text varieties
is characterised more closely in Sections 3.2 to 3.4. Before discussing these genre
profiles, I will first comment on the general nature of the relation between their
situational characteristics, in particular the discourse function, and their linguis-
tic form.
The three text varieties represent discourse with different perspectives on the
topic of disease or illness; i.e. the medical topic is the only situational variable
which they share. The texts differ, not only in the different speaker roles of doctor
and patient, but, more specifically, in that these groups of authors assume, by
different ways of speaking throughout their own discourse, different “voices”
(Mishler 1984: 103). In the professional medical discourse “of disease” (Fleischman
2001: 475), such as in case reports, doctors primarily use the voice of medicine;
however, they also have a doctor’s voice when they occur in the discourse as
a participant, for example, when concerned with “information about the patient’s
current health condition, […] patient compliance, and […] test results” (Murawska
2012: 71). Patients, by contrast, have primarily a voice of health-related storytell-
ing, but over time they also develop a medical competence of their own (Cordella
2004: 119). At some point, diagnosis and further treatment become a collabora-
tive effort, which is when patients also use elements of a voice of medicine. The
interactional hybridity of medical discourse referred to above is thus primarily a
hybridity of voices and it is one of the central variables that guide linguistic vari-
ation across all medical text varieties.
By contrast, illness blogs, professional case reports and the discourse jointly
produced by doctors and patients for “Clinical Crossroads” (for details, cf.
Section 3.4) differ in a variety of other situational variables, especially those per-
taining to production circumstances and setting (cf. Biber and Conrad 2009: 40).
The text varieties under investigation are therefore not easily subsumed as one
single register. However, instead of taking up a principled position about where a
register ends, and a new (sub-)register starts, the analysis below rests upon two
observations. On the one hand, the verbalisation of a disease or illness leads to
a concern with medical case histories, which cuts across general communicative
purposes, such as to narrate or to report (cf. Biber and Conrad 2009: 40). Linguis-
tically, this is marked by a pervasive presence of linguistic features such as “past
tense, communication verbs, third person pronouns, and time adverbials”, i.e.
the characteristic features of a narrative dimension of linguistic variation (Biber
and Conrad 2009: 259). It is with regard to these features, which arise out of the
topic of illness, that the texts share the same register.
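To make the idea of a pervasive narrative register more concrete, the following Python sketch shows one way in which normalised frequencies of some of these features could be counted in a text. It is a minimal sketch under stated assumptions: the word lists are short, hypothetical samples rather than complete inventories, the function name is invented for illustration, and past tense is left out because it cannot be identified reliably without a part-of-speech tagger.

```python
import re

# Hypothetical, deliberately incomplete word lists (assumptions for illustration).
THIRD_PERSON_PRONOUNS = {"he", "she", "it", "they", "him", "her", "them",
                         "his", "hers", "its", "their", "theirs"}
TIME_ADVERBIALS = {"then", "now", "today", "often", "finally", "suddenly", "again"}
COMMUNICATION_VERBS = {"say", "said", "tell", "told", "ask", "asked",
                       "talk", "talked"}


def narrative_feature_rates(text, per=1000):
    """Return the frequency of each feature set per `per` word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}

    def rate(wordset):
        # Count tokens that belong to the set and normalise by text length.
        return per * sum(token in wordset for token in tokens) / total

    return {
        "third_person_pronouns": rate(THIRD_PERSON_PRONOUNS),
        "time_adverbials": rate(TIME_ADVERBIALS),
        "communication_verbs": rate(COMMUNICATION_VERBS),
    }


rates = narrative_feature_rates("He told me that it would not do any good")
print(rates)
```

Normalising per 1,000 words keeps texts of different lengths comparable, which matters when short blog entries are set against longer professional reports.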
On the other hand, although there are recognisably different discourse
goals involved in the verbalisation of a case history, the difference between
“private” and “public” medicine has always been gradual, as the evolution of
medical research writing has also shown (Atkinson 1992: 361–363). While profes-
sional medicine has long drifted away from the “rhetoric of immediate experi-
ence” (Atkinson 1992: 359), and while published case reports are professional
and public, only illness blogs constitute real narratives of personal experience.
However, nowadays, with the movement towards a narrative medicine, there are
also professional texts which aim at being more “patient-focused” again (Winker
2006: 2888).
Genre categories grasp this mixing of purposes and voices present in such
developments, not only due to the level of specificity they refer to, but also
because genres are often formally underdetermined and may therefore be composed
of hybrid form. This is illustrated in Figure 1, which shows the three text
varieties as three different genres, with distinctly different discourse goals and
purposes, as the discussion has just shown. On the level of the general commu-
nicative purpose, i.e. at a high level of generality, these discourse functions can
be described as being narrative, non-narrative, or hybrid. This categorisation
links up the genre classification to register variation, because the narrative as dis-
course mode (Georgakopoulou and Goutsos 2004: 43–47) is an important aspect
of the register in all three cases. As the analysis below will illustrate in detail,
the narrativisation of the events (Georgakopoulou and Goutsos 2004: 43) which
have to do with the course of an illness is a major source of hybrid form across
the three text varieties and therefore explains some pervasive register features.
Before turning to the linguistic features and their interrelationship with the genre
category in Section 4, the next three subsections will introduce each text variety
and the sample texts used in more detail.
[Figure 1 shows the three text varieties: the patients’ tale (hybrid form and narrative function), the medical case report (hybrid form and non-narrative function), and Clinical Crossroads in JAMA (hybrid form and hybrid function).]
Figure 1: Narrative form and function in medical text varieties
3.2 Illness blogs: The patients’ tale
Medical topics are among the ubiquitous contents on the internet (Döring 2003:
19). When patients tell their stories on the web, i.e. when they produce narratives
of illness (cf. McCullough 1989: 124), this constitutes, not “a solitary occupation”,
but one which is shaped by the context of “the community of web users” (Page
2012: 45). Patients’ tales in illness blogs are thus more interactive than when elic-
ited in medical interviews, and they establish a particularly strong relation to the
audience: “the primary function of the comments on the […] blogs is to provide
or seek support in the form of shared experience, advice, and encouragement”
(Page 2012: 45).
From the point of view of this interactive function, illness blogs qualify as
patients’ tales, i.e. proper stories, but not in the first place from a structural point
of view. Narrative discourse, in essence, “attempts to sweep narrator and audi-
ence into a community of rapport”, i.e. the aim is to move, rather than to inform
(Georgakopoulou and Goutsos 2004: 53; also Tannen 1989). This means that,
although patients’ tales typically employ a “narrative syntax” (Labov 1997: 3),
they show the narrative mode primarily due to the “function of personal inter-
est” (Labov and Waletzky 1967: 13; emphasis added). This function rests upon
the sharing of the individual experience of illness (Dorgeloh 2012: 263) and dis-
tinguishes a patient’s tale, as any other kind of story, from a report, which “is
most typically elicited by the recipient […] or in response to circumstances which
require an accounting of what went on” (Polanyi 1985: 10–11).
The examples of the variety of illness blogs come from a website where
patients share their stories about a rare neurological disease [SPS: The Real
Stories4]. Note that, as its title suggests, the website focuses primarily on the publication
of the stories and not, like other types of illness blogs, on the discussion
of and commenting on postings about illness (cf. Page 2012). As sample (1) illustrates,
the typical structure is that the patients introduce themselves and then turn to the
chronology of the events:
(1) Hi my name is Ann. I was officially diagnosed in Sept of last year. I have had symptoms
for the past several years that got worse as the years went on. I was exercising and
swimming three times a week and then I started getting more muscle cramps. I went to
the doctor and he just told me to take calcium and magnesium and drink more water.
It took him a long time to understand that the muscle cramp were extremely painful
happening several time a day. I would have abdominal muscle cramps that felt like i
was in full-blown labor. They would come on suddenly when I was startled or when I
coughed. They would ease up for a few seconds and then just get worse again. Several
times my feet and hands would cramp up until they were fully distorted. I did go to a
neurologist who seemed to have an idea of what I had but made no effort to diagnosis
what I had. He told me that it would not do any good to try to diagnosis my disease
and instead gave me all kinds of different pills and most of them did not work well and
also caused several side effects. Often when I went to see him I did not feel like he even
4 http://www.stiffpersonsyndrome.net; last accessed on March 30, 2015.

remembered me. I did finally request a new doctor, which has been a Godsend to me
and now is treating me with IVIG, which is working well. My symptoms still get worse
at times but they are manageable. I am eager to talk to people that have the same syn-
drome. Most people do not understand the pain and all the other symptoms. I found
your web site today and am eager to learn more. (http://www.stiffpersonsyndrome.net,
accessed March 17, 2011)
The proper narrative contained in (1) ends when the course of the events reaches
its most recent state. This description of the current situation (My symptoms still
get worse at times but they are manageable) serves as a coda and is followed by
an explicit mention of the story point. This point relates to the ill person him- or
herself, as in (1), or it centres on the social function of the blog by addressing the
readers’ interests, as in (2) and (3):
(2) If in any way I can contribute to bringing awareness to this insidious disease I throw in
my hat. (Wendy’s story; http://www.stiffpersonsyndrome.net)
(3) I must tell you that neither my wife nor myself ever gave up hope, In fact just the oppo-
site. We were very pro active in the treatment of our diseases. […] My prayer is for all
of you to see your journey through SMS with the knowledge that there is hope for all.
Stay the course, keep the faith, and fight on. (John’s story;
http://www.stiffpersonsyndrome.net)
The story point expressed in (3) shows that the verbalisation of the experience
of illness has a strong component of self-reflection and evaluation. Many illness
blogs have such properties of “reflective anecdotes” (Page 2012: 58–59) and in
that respect tend towards less purely narrative text forms. It is highly typical that, instead
of the completeness of the recount and the degree of detail which one can expect
of more trivial narration (Georgakopoulou and Goutsos 2000: 125), patients’ tales
often limit themselves to “remarkable event[s], characterized by an evaluative
punch line” (Page 2012: 59). As was illustrated in Figure 1, a patient’s tale there-
fore possesses hybridity in its narrative form, since it limits the experience which
is shared to the main points of interest.
3.3 The medical case report
Case presentations in the form of published case reports are used by medical pro-
fessionals “to communicate the salient details of patient cases to one another”
(Schryer et al. 2003: 63; also Hurwitz 2006: 217), which means that the texts
pursue a predominantly professional discourse goal. On a more general level, the
discourse function is thus to inform, i.e. state “verifiable events”, rather than to
move. This function contrasts with the point of personal interest which applies to
proper storytelling, which is why the discourse mode in case reports is essentially
non-narrative (cf. Georgakopoulou and Goutsos 2004: 53).
The central component of a case report is the case presentation itself. It
begins “ritualistically with a brief account of a patient’s complaint as translated
by the doctor” (Hurwitz 2006: 234; emphasis added), followed by an account of
the examinations, findings, diagnosis and suggestions for treatment. Text (4)
exemplifies such an initial case presentation, referring to the same disease as
text (1):
(4) A 27-year-old Hispanic woman presented to the University Medical Center Emergency
Department in Las Vegas, Nevada with a sudden onset of shortness of breath and
increased difficulty in moving her right arm. She reported that during the evening
prior to her presentation, she was lying down when she began to experience shortness
of breath with worsening right-arm weakness. She also reported that for the past two
months her arm weakness was characterized as having limited strength and range of
motion. She also complained of chest pains that were localized behind her sternum.
The pain was characterized as a pressure sensation that was non-radiating. She did not
have any aggravating or relieving factors. Pertinent positive findings included nausea,
palpitations and lightheadedness. Pertinent negative symptoms included no loss of
consciousness, headache, vomiting, diarrhea, or vertigo. (Journal of Medical Case
Reports 4, 2010)
It has been noted that case reports published in journals “reorganize clinical
data using a variety of narrativising techniques” (Hurwitz 2006: 217; also Hunter
1991). However, as one can see in (4), from a narratological viewpoint this is
only a “degree-zero” narrativity (Fludernik 1996: 358); i.e. although a sequence
of events is verbalised, it is “translated” by a medical professional. The result is
a discourse which deals with a disease, i.e. which foregrounds the medical facts
and assigns “the sufferer […] the experiencer role” (Fleischman 2001: 476). In
such a text, the chronology lacks “experientiality” as the central component of
narrativity (Fludernik 1996) and is therefore only a hybrid narrative form.
3.4 ‘Clinical Crossroads’ in JAMA
In 1995, JAMA launched the publication of various types of medical discourse
within a section titled “Clinical Crossroads”. The contributions in this section
follow the organisation of a “Grand Round” in clinical departments, where case
presentations are given from various perspectives. These case presentations are
later edited and published in the journal. The full process is described as follows
(cf. also Dorgeloh 2014):
The Grand Round begins with the case history of a patient and that patient’s firsthand
account of the medical decision he or she faced, occasionally along with the patient’s
primary care physician’s perspective. These accounts are followed by questions for the
Grand Rounds discussant, which the discussant, usually a well-recognized authority on
the clinical topic, addresses based on available evidence in the literature, and, where no
evidence exists, clinical experience. Following the presentation, the discussant drafts the
manuscript for submission to JAMA, including the case description, the patient’s perspec-
tive, the discussion (including references and pertinent tables and figures), and the ques-
tion-and-answer session that occurred at the end of the Grand Rounds. The manuscript
then undergoes editorial evaluation, external peer review, and revision. If the manuscript
is revised satisfactorily and determined to have a level of quality appropriate for JAMA, the
manuscript is accepted and published in JAMA and usually is featured in Clinician’s Corner.
(Winker 2006: 2888)
The idea behind this more innovative medical text variety is to approach a case
from various perspectives, including that of the patient. The purpose is not only
to offer and exchange information, but to improve medical decisions, which is
to be achieved by “aligning the goal of the patient and physician” (Winker 2006:
2888). Since its foundation, the section has been re-structured several times, but
the core idea, a joint context for doctors and patients, who contribute different
perspectives, has essentially remained unchanged. (5) and (6) are text samples of
a patient and a doctor presenting on the same case:
(5) After I had bladder surgery […], my doctor told me, “I have good news and bad news
and good news; it’s not bladder cancer, but the bad news is that it’s something else.”
I accepted the complete hysterectomy, which at my age was not disturbing news. But
in terms of the treatment and how it was going to affect me, the thing that worried me
most was that I kept hearing about nausea, exhaustion, and that I wouldn’t be able to
do things. As a result of that, I canceled my teaching for that fall.
I remembered being very anxious the first day of chemotherapy because I just didn’t
know what to expect. I decided to do the intraperitoneal chemotherapy because it
made spatial logic to me. If you are aiming a treatment at the area of the cancer, it was
going to get there more rapidly. I probably had some benefit from having had this mode
of treatment before I went back to complete the treatment with the IV.
Now, I have CAT [computed tomography] scans every 3 to 4 months. I don’t like to go to
doctors, my mother never went until she was 80, but I go now because I’ve learned to
trust the process, so I keep my appointments. The last time I chatted with the oncolo-
gist, I asked him if we could talk about the kinds of symptoms I should look for going
forward. What should I expect for myself? (Journal of the American Medical Associa-
tion, 4 April 2010; Ms W)
(6) Ms W is a 75-year-old woman with epithelial ovarian cancer. She first developed lower
abdominal pain in 2008. After workup for a genitourinary origin of the pain, she was
found to have a 13.5 × 11 × 15.5–cm complex right adnexal mass. She had an optimal
cytoreductive surgery, with less than 1 cm of peritoneal disease remaining at the end of
the procedure. The pathologic findings were consistent with epithelial ovarian cancer
of mixed endometrioid/clear-cell histology. Her uterus, fallopian tubes, and omentum
were free of disease. Metastatic adenocarcinoma was noted in the left paracolic gutter
and she was diagnosed as having stage IIIC disease. She then started intraperitoneal
and intravenous (IV) cisplatin/paclitaxel chemotherapy, which was switched to IV car-
boplatin/paclitaxel because of an infection of the intraperitoneal catheter. She was
in complete clinical remission after 6 cycles of platinum-based chemotherapy and
was then registered in a clinical trial of maintenance abagovomab vs placebo. She is
currently not receiving any treatment and is questioning her prognosis and how she
should be followed up in the long term. (Journal of the American Medical Association,
4 April 2010; Dr Tess)
The texts in (5) and (6), although from a highly professional medical journal, illus-
trate that the discourse is intended for the narrative kind of medicine described in
Section 2.2. This situation makes for text varieties that show a more mixed char-
acter than illness narratives, as exemplified by (1), as well as case reports, such as
(4). On the one hand, both (5) and (6) have a chronological structure, i.e. “degree-
zero” narrativity; on the other hand, the patient in (5) shows a degree of expertise
and professional competence, a voice of medicine (cf. Section 3.1), which makes
the register in the text more similar to professional medical discourse, like (4) and
(6). As a result, (5) and (6) possess hybridity in form, i.e. they combine narrative
and non-narrative register features.
By contrast, considering the discourse function, the doctor’s motivation in
this context is not limited to presenting a case to colleagues. Instead, there is a
more personal, though third-party, point in telling the patient’s story, as expressed
by She is […] questioning her prognosis and how she should be followed up in the
long term. Although throughout the main body of the presentation the doctor uses
the voice of medicine, the main purpose is collaboration and a joint effort; the
presentation thus comes from the doctor’s voice and carries an indirect, and
ultimately more hybrid, function for a narrative. As the analysis of linguistic features
in the corpus study will show, this complex relationship of form and function is
reflected by the genre perspective.
4 Register and genre profile of the three types of medical discourse
4.1 Data and research aims
The analysis which follows is based on a small corpus of texts, covering in roughly
equal shares the three genres under investigation and amounting to a total of
3,777 words. The exact proportions are included in Table 1. The analysis is intended as
a pilot study and rests upon a limited database, but it will demonstrate how the
interpretation of findings on register features benefits from a genre perspective.
Numerous studies already document the co-occurrence of features from a nar-
rative dimension of variation on a quantitative basis (starting with Biber 1988,
1989), among which, most notably, the presence of past tense forms, pronominal
reference, and time adverbials. The claim here is that these features, which are
pervasive to varying extents in the texts investigated, on the one hand testify to the
formal hybridity of the genres as illustrated in Figure 1 but, on the other, do not
determine the text variety at a sufficient level of specification.
The more integrated genre analysis will be presented in two steps: in Section 4.2, the
register features indicative of a chronology, i.e. past tense narration and time
adverbials, are functionally re-interpreted from the point of view of the genre in
which they occur. This part of the analysis illustrates that in medical discourse
high frequencies of narrative features may in fact correlate with a non-narrative
discourse mode. It is argued, in particular, that the dominance of such a narrative
text form goes beyond the presence of narrative episodes, which is something
that applies to many kinds of discourse (e.g. Csomay 2006, 2007; also Werner,
this volume), but is specifically motivated by the “object-oriented” discourse goal
of the genres investigated here. Section 4.3 then looks at features reflecting the
expression of human experience: pronoun usage and choice of subjects. The aim
of this section is to show that genres with a narrative as opposed to a non-narrative
purpose differ in a characteristic way in their use of semantic categories rather
than in a grammatical form such as pronoun usage. The more general claim behind
both analyses is that genre categories, in the sense of referring to discourse at a
relatively low level of generality, are effective beyond both register features and
textual conventions and lead to patterns at several levels of analysis. Complex
discourse goals, such as the verbalisation of medical experience, are therefore
better accounted for from a genre, rather than from a register perspective.
4.2 Degree-zero narrativity in different medical genres
A narrative discourse mode is primarily associated with events that happened in
the past and with their temporal sequencing (Georgakopoulou and Goutsos 2000:
125, 2004: 43). For this reason, the primary narrative register features indicative of
this degree-zero narrativity (cf. Section 3.4) are the use of past tense narration and
of time adverbials (cf. Biber and Conrad 2009: 119).
Table 1 gives the proportion of the overall text, measured by word count, that is
in the narrative, past tense mode, as opposed to text passages containing
other tenses.5 The second feature is the use of time adverbials, which situate the
events in their temporal sequence. For example, text (1), shown here as (7), has
non-narrative passages (printed in italics) in the beginning and in the closing
evaluative comment, serving as a coda, while the main body of the narration is
structured in episodes marked by explicit temporal reference (in bold print).
(7) Hi my name is Ann. I was officially diagnosed in Sept of last year. I have had symp-
toms for the past several years that got worse as the years went on. I was exercising
and swimming three times a week and then I started getting more muscle cramps. I
went to the doctor and he just told me to take calcium and magnesium and drink more
water. It took him a long time to understand that the muscle cramp were extremely
painful happening several time a day. I would have abdominal muscle cramps that
felt like i was in full-blown labor. They would come on suddenly when I was startled
or when I coughed. They would ease up for a few seconds and then just get worse
again. Several times my feet and hands would cramp up until they were fully dis-
torted. I did go to a neurologist who seemed to have an idea of what I had but made
no effort to diagnosis what I had. He told me that it would not do any good to try to
diagnosis my disease and instead gave me all kinds of different pills and most of them
did not work well and also caused several side effects. Often when I went to see him
I did not feel like he even remembered me. I did finally request a new doctor, which
has been a Godsend to me and now is treating me with IVIG, which is working well. My
symptoms still get worse at times but they are manageable. I am eager to talk to people
that have the same syndrome. Most people do not understand the pain and all the other
symptoms. I found your web site today and am eager to learn more.
(http://www.stiffpersonsyndrome.net, accessed March 17, 2011)
Table 1 shows the proportion of text passages in the narrative mode across the
three genres. While case presentations from the medical case report contain only
past tense passages, patients’ tales from the blog, i.e. from a medium that encour-
ages reflection and relation-building (cf. Section 3.2), have a lower proportion of
the narrative mode. The texts from “Clinical Crossroads” contain the lowest pro-
portion of proper narration, which is in line with a discourse goal that consists
not only in sharing information but also in preparing an adequate decision.
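The word-based measure described in footnote 5 can be illustrated with a minimal Python sketch. It assumes that each passage has already been annotated by hand as past tense or non-past. The three passages are taken from text (7), but their labels, the function name `past_tense_share` and the whitespace-based word count are illustrative assumptions, not the procedure actually used for Table 1.

```python
from typing import List, Tuple

# Hypothetical hand-annotated passages: (text, is_past_tense_passage).
# The wording comes from text (7); the labels are an illustrative assumption.
passages: List[Tuple[str, bool]] = [
    ("Hi my name is Ann.", False),
    ("I went to the doctor and he just told me to take calcium "
     "and magnesium and drink more water.", True),
    ("My symptoms still get worse at times but they are manageable.", False),
]


def past_tense_share(annotated: List[Tuple[str, bool]]) -> float:
    """Return the percentage of words that fall inside past tense passages."""
    total = sum(len(text.split()) for text, _ in annotated)
    past = sum(len(text.split()) for text, is_past in annotated if is_past)
    # Guard against an empty sample to avoid division by zero.
    return past / total * 100 if total else 0.0


share = past_tense_share(passages)
print(f"{share:.0f} % of the words are in past tense passages")
```

Measuring by passage length rather than by counting verb forms means that a long evaluative comment weighs more heavily than a short one, which fits the interest in how much of a text is given over to narration.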
5 Instead of counting verb forms, the proportion of narrative as opposed to non-narrative mode
is measured in the relative length of the text passages in which past as opposed to non-past
tenses are used.
Table 1: Narrative features in three medical genres6

                                                            Illness blog   Case report   Clinical Crossroads
total no. of words                                          1,126          1,391         1,260
proportion of past tense text passages (by no. of words7)   80 %           100 %         59 %
time adverbials per 100 words (absolute frequency)          3.11 (35)      1.44 (20)     2.78 (35)
The results are almost opposite for the occurrence of time adverbials: their fre-
quency is high in the patients’ tales, including the case presentations in “Clinical
Crossroads”, and much lower in case reports. Explicit temporal reference thus
seems to be directly related to more personal accounts, i.e. to a narrative, or at
least to a partly narrative (hybrid) function (cf. Figure 1). The finding is in line
with research which has shown that in proper stories time adverbials do not only
carry temporal meaning, but are also text-strategic devices (cf. Virtanen 1992b).
Note, however, that this applies, in particular, to time adverbials in sentence-in-
itial position, where they mark temporal shifts in the progression of a narrative
strategy (Virtanen 2004). For example, in (7) then I started getting more muscle
cramps marks the beginning of a new episode, whereas uses of the same tempo-
ral adverb in (6) (She then started […] chemotherapy, […]; She […] was then regis-
tered in a clinical trial) do not mark a text structure based on temporal sequence
and are thus placed sentence-medially. This means that the point of departure
is the patient as medical case, and not as a character. The lower number of time
adverbials in case reports thus reflects their “topic-oriented strategy” focussing
on the medical case, turning them into an expository, rather than a narrative, text
(Virtanen 2010: 66–67).
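The normalised frequencies in Table 1 follow from the absolute counts and the size of each genre sample. A minimal Python sketch, assuming the usual normalisation (count divided by total words, multiplied by 100) and using a hypothetical helper name `per_100_words`, reproduces the reported rates for time adverbials:

```python
# Corpus sizes and absolute frequencies as reported in Table 1.
TOTAL_WORDS = {"Illness blog": 1126, "Case report": 1391, "Clinical Crossroads": 1260}
TIME_ADVERBIALS = {"Illness blog": 35, "Case report": 20, "Clinical Crossroads": 35}


def per_100_words(count: int, total_words: int) -> float:
    """Normalise an absolute frequency to a rate per 100 words (two decimals)."""
    return round(count / total_words * 100, 2)


rates = {
    genre: per_100_words(TIME_ADVERBIALS[genre], TOTAL_WORDS[genre])
    for genre in TOTAL_WORDS
}

for genre, rate in rates.items():
    print(f"{genre}: {rate:.2f} time adverbials per 100 words")
```

Rounded to two decimals, the sketch yields the values given in Table 1 (3.11, 1.44 and 2.78), and the three sample sizes add up to the 3,777 words of the corpus.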
These two findings together suggest that a differentiated look is necessary
when interpreting quantitative results about pervasive linguistic features in their
discourse context. In particular, a narrative form and a narrative text function
need to be distinguished, as the outline in Sections 3.2 to 3.4 and the illustration
in Figure 1 have shown. In the texts investigated, the non-narrative function of the
case reports, in the sense of a lack of personal story-point, goes together with an
exclusive use of past tense forms, showing that the verbalisation of a chronology
of events has a variety of uses (cf. Section 2.2). A narrative function, by contrast,
also involves passages in which the narrative mode is absent, since it is evaluative
comments, particularly the coda, which verbalise the point of a proper story. In
this way, although dominated less by past tense narration, illness blogs as well
as case presentations from “Clinical Crossroads” gain their narrative or hybrid
function from passages in the non-narrative mode – a form-function complexity
which a genre perspective makes understood.
6 Besides the individual sample texts discussed for illustration in Section 3, the corpus consists
of other texts from the same genre, totaling to the amount of words as indicated.
7 As reflected by the use of past tense verbs. As in texts (1) and (5), this also includes the use of
the so-called “habitual conditional” (cf. Haiman and Kuteva 2002: 120).
4.3 From register feature to genre feature: Exploring reference in medical discourse
While past tense forms and time adverbials have to do with the past temporality
of the events reported, the pervasiveness of pronominal reference, as opposed
to more explicit forms of expression, arises from the fact that a narrative verbal-
ises human experience (Biber and Conrad 2009: 259, also cf. Neumann and Fest,
this volume). The presence of a narrator allows “readers to immerse themselves
in a different world and in the life of the protagonists” (Fludernik 2009: 6). The
main protagonists in a medical process are the doctor and the patient, reference
to them being made, in particular, when the doctor’s voice and the patient’s sto-
rytelling are used (cf. Section 3.1). By contrast, the voice of medicine tends to
de-focus human experience, turning the language of medical discourse into a
more scientific register, which is “object”- rather than “agent”-oriented (Atkinson
1999). It is therefore expected that reference to these different components of an
illness correlates in significant ways with the genre of medical discourse.
As Table 2 shows, the frequency of pronouns8 is higher in the text samples
with a narrative or hybrid function, i.e. in the patients’ tales and in “Clinical
Crossroads”. It is lower, though not very low, in the case reports. This reflects the
non-narrative, object-oriented discourse goal of professional medical discourse,
although the main object of investigation is nonetheless a human agent. The
hybrid narrative form of medical case reports is thus also confirmed by the use of
pronominal reference.
Since the use of pronouns as a register feature distinguishes the three genres
only insufficiently, Table 2 also presents results of an alternative analysis of the
referential patterns one finds in the texts. Looking at the subjects of all (finite and
non-finite) clauses in the corpus, the instances of the (explicit or implicit) sub-
jects were categorised as referring to the patient, the doctor, or to the domain of
medicine.9 Subjects being the unmarked point of departure of the English clause
and therefore more often than not the topic (e.g. Börjars and Burridge 2010: 226),
it was assumed that their reference is likely to indicate which voice is talking (cf.
Section 3.1) and to what extent the discourse truly focuses on human experience.
8 This feature includes personal and possessive (including reflexive) pronouns as well as rela-
tive pronouns referring to a noun phrase (and not to a clause).
Table 2: Pronominal reference and reference of topics in the three genres

                                    Illness blog   Case report   Clinical Crossroads
personal pronouns per 100 words     10.48 (118)    7.55 (111)    10.64 (134)
clausal topics per 100 words        12.79 (144)    8.63 (120)    13.02 (164)
patient as topic                    5.60 (63)      3.45 (48)     8.50 (107)
doctor as topic                     1.51 (17)      0.86 (12)     0.56 (7)
topic from the domain of medicine   5.16 (58)      4.10 (57)     2.70 (34)
other topics                        0.98 (11)      0.22 (3)      1.27 (16)
While the overall frequencies of clausal topics per text category differ mainly for
reasons of sentence length, the semantic sub-categorisation contained in Table
2 yields some notable similarities and differences. In particular, illness blogs
and case reports are quite similar with respect to their reference to the domain
of medicine, and neither reaches the extent of reference made to the patient in
“Clinical Crossroads”. Although they pursue opposite, i.e. narrative as opposed
to non-narrative, discourse goals and are produced by opposite speaker roles,
illness blogs and case reports, which otherwise differ in their use of narrative
register features, reveal a striking similarity in this respect.
9 Assuming that every lexical verb gives rise to a clause, each explicit or implicit subject belong-
ing to a lexical verb was categorised semantically. The category “patient” includes reference to
the person as well as to body parts. The category “medicine” covers symptoms (weakness, pain),
reference to the disease, as well as to elements from the diagnosis (tests, findings) or therapy
(e.g. medication or treatment). In the majority of cases, these categories were distinct; there were
only two instances of a subject referring to both patient and doctor, as in: The last time I chatted
with the oncologist, I asked him if we could talk. Subjects like these were counted towards both
categories.
By contrast, the texts from “Clinical Crossroads” show a pattern of reference to
topics which reflects the discourse goal of aligning the perspectives of the patient
and the doctor (cf. Section 3.4). The focus of this discourse is not so much on the
domain of medicine, nor on the role of the doctor, but in line with the objective of
a “narrative medicine” it represents a true expression of patient-centred medical
care (Gerteis et al. 1993). That such a discourse context provides in fact for a new
genre becomes particularly evident if one looks at the proportions of topics as
used by the patients, as opposed to ones used by the doctors, in Figure 2.
[Figure 2: stacked bar chart showing the proportions (0–100 %) of the topic categories other, patient, doctor and medicine for four speaker groups: patient in blog, doctor in case report, patient in CC, doctor in CC]
Figure 2: Use of medical topics10 by both speaker groups
The results from Table 2 and Figure 2 make obvious that genres with opposite
functions, i.e. illness blogs and medical case reports, can in fact be more
similar than the ones with a related function, such as patients communicating
their illness in different situations. The reason is that different voices are used
for communicating illness (cf. Section 3.1), which highlight different aspects of
the course of the events. While due to general situational parameters, such as
speaker or discourse function, illness blogs and “Clinical Crossroads” are similar
in their register usage, they nonetheless differ in their choice of topics. It is this
interrelationship of form (register), function, and social context, which for the
analysis of medical discourse suggests a primacy of the notion of genre.
10 Percentages show the proportion of the four semantic categories in relation to the total of
topics as given in Table 2.
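The kind of percentage described in footnote 10 can be sketched in Python from the absolute frequencies in Table 2. Note the assumptions: Table 2 reports counts per genre only, so the sketch works at genre level and does not reproduce the split by speaker group shown in Figure 2; the function name `topic_shares` and the rounding to one decimal are likewise illustrative choices.

```python
from typing import Dict

# Absolute frequencies of clausal topics per genre, as reported in Table 2.
TOPIC_TOTALS: Dict[str, int] = {
    "Illness blog": 144,
    "Case report": 120,
    "Clinical Crossroads": 164,
}
# Category counts may add up to more than the total for a genre, possibly
# because subjects referring to two categories were counted twice (footnote 9).
TOPIC_COUNTS: Dict[str, Dict[str, int]] = {
    "Illness blog": {"patient": 63, "doctor": 17, "medicine": 58, "other": 11},
    "Case report": {"patient": 48, "doctor": 12, "medicine": 57, "other": 3},
    "Clinical Crossroads": {"patient": 107, "doctor": 7, "medicine": 34, "other": 16},
}


def topic_shares(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Express each semantic category as a percentage of all clausal topics."""
    return {category: round(n / total * 100, 1) for category, n in counts.items()}


shares = {
    genre: topic_shares(TOPIC_COUNTS[genre], TOPIC_TOTALS[genre])
    for genre in TOPIC_COUNTS
}

for genre, by_category in shares.items():
    print(genre, by_category)
```

Expressed this way, the patient accounts for a markedly larger share of topics in “Clinical Crossroads” than in either illness blogs or case reports, which is the contrast the discussion above draws on.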
5 Conclusion
My analysis of text varieties from medical discourse was intended to show that
investigating linguistic variation with a view to genre adds an important perspec-
tive to the understanding of form-function relationships in text-linguistic studies.
While these commonly rest upon the assumption that “linguistic co-occurrence
reflects shared function” (Biber 1989: 5) and present corpus-linguistic evidence
for this, the interrelationship of register and genre can only be made explicit by
combining the perspectives. Since a genre classifies discourse at a rather low
level of generality, especially with regard to the purpose and goal of a discourse,
it determines both pervasive linguistic features and the choice of discourse
topics and semantic categories. Hence, I have argued here that a genre analysis
logically subsumes and pre-determines a register analysis.
Genres, especially in the domain of medicine, make regular use of the nar-
rative discourse type with its attested register features. This is not surprising,
given the acknowledged role of the narrative as a basic text type or meta-genre
(cf. Section 2.2). A similar interrelationship underlies the observation that the
dividing line between lay and professional communication is also one between
narrative and non-narrative discourse (Georgakopoulou and Goutsos 2000). The
discussion here has added to this view that one needs to distinguish between nar-
rative form and narrative discourse function, and that more professional social
and cognitive activities typically go together with more complex (in the sense of
more indirect) uses of narrative register variation. Text varieties of this kind are
best understood from a genre perspective, which can account for their mixed pur-
poses and voices and, thus, their hybridity in register.
References
Atkinson, Dwight. 1992. The evolution of medical research writing from 1735 to 1985. Applied
Linguistics 13. 337–374.
Atkinson, Dwight. 1999. Scientific discourse in sociohistorical context: The Philosophical
Transactions of the Royal Society of London 1675–1975. Mahwah, NJ: Lawrence Erlbaum.
Press.
Biber, Douglas. 1989. A typology of English texts. Linguistics 27. 3–43.
Longman grammar of spoken and written English. Harlow: Longman.
registers. Amsterdam & Philadelphia: John Benjamins.
Biber, Douglas. 2012. Register as a predictor of linguistic variation. Corpus Linguistics and
Linguistic Theory 8(1). 9–37.
University Press.
Biber, Douglas & Bethany Gray. 2013. Being specific about historical change: The influence of
sub-register. Journal of English Linguistics 41(2). 104–134.
Börjars, Kersti & Kate Burridge. 2010. Introducing English grammar. London: Arnold.
Brinker, Klaus. 2005. Linguistische Textanalyse: Eine Einführung in Grundbegriffe und
Methoden. Berlin: Schmidt.
Charon, Rita. 2006. Narrative medicine: Honoring the stories of illness. Oxford & New York:
Oxford University Press.
Cordella, Marisa. 2004. The dynamic consultation: A discourse analytical study of doctor-
patient communication. Amsterdam & Philadelphia: John Benjamins.
Coupland, Nikolas. 2007. Style: Language variation and identity. Cambridge: Cambridge
University Press.
Croft, William. 2010. The origins of grammaticalization in the verbalization of experience.
Csomay, Eniko. 2006. Academic talk in American university classrooms: Crossing the
boundaries of oral‐literate discourse? Journal of English for Academic Purposes 5(2).
117–135.
Csomay, Eniko. 2007. A corpus-based look at linguistic variation in classroom interaction:
Teacher talk versus student talk in American University classes. Journal of English for
Academic Purposes 6(4). 336–355.
Dorgeloh, Heidrun. 2012. Arztbericht vs. Patientengeschichte: Story point als Genremerkmal
im medizinischen Internetdiskurs. In Ansgar Nünning, Jan Rupp, Rebecca
Hagelmoser & Jonas Ivo Meyer (eds.), Narrative Genres im Internet: Theoretische
Bezugsrahmen, Mediengattungstypologie und Funktionen (WVT-Handbücher zum
literaturwissenschaftlichen Studium), 261–276. Trier: WVT.
Dorgeloh, Heidrun. 2014. ‘If it didn’t work the first time, we can try it again’: Conditionals as a
grounding device in a genre of illness discourse. Communication & Medicine 11(1). 55–67.
Dorgeloh, Heidrun & Anja Wanner. 2010. Syntactic variation and genre. Berlin & New York: de
Gruyter Mouton.
Döring, Nicola. 2003. Sozialpsychologie des Internet. Göttingen: Hogrefe.
Eckert, Penelope & John R. Rickford (eds.). 2001. Style and sociolinguistic variation. Cambridge:
Cambridge University Press.
Fleischman, Suzanne. 2001. Language and medicine. In Deborah Schiffrin, Deborah Tannen &
Heidi E. Hamilton (eds.), The handbook of discourse analysis, 470–502. Malden, Mass.:
Blackwell.
Fludernik, Monika. 1996. Towards a ‘natural’ narratology. London: Routledge.
Frankel, Richard M. 2000. The (socio)linguistic turn in physician-patient communication
research. In James E. Alatis, Heidi E. Hamilton & Ai-Hui Tan (eds.), Linguistics, language,
and the professions, 81–103. Georgetown: Georgetown University Press.
Georgakopoulou, Alexandra & Dionysis Goutsos. 2000. Mapping the world of discourse: The
narrative vs. non-narrative distinction. Semiotica 131(1–2). 112–141.
Georgakopoulou, Alexandra & Dionysis Goutsos. 2004. Discourse analysis: An introduction.
Edinburgh: Edinburgh University Press.
Gerteis, Margaret, Susan Edgman-Levitan, Jennifer Daley & Thomas L. Delbanco (eds.). 1993.
Through the patient’s eyes: Understanding and promoting patient-centered care. San
Francisco: Jossey-Bass.
Giltrow, Janet. 2010. Genre as difference: The sociality of linguistic variation. In Heidrun
Dorgeloh & Anja Wanner (eds.), Syntactic variation and genre, 29–52. Berlin & New York:
de Gruyter Mouton.
Giltrow, Janet & Dieter Stein. 2009. Genres in the internet. Amsterdam & Philadelphia: John
Benjamins.
Gotti, Maurizio & Françoise Salager-Meyer. 2006. Introduction. In Maurizio Gotti & Françoise
Salager-Meyer (eds.), Advances in medical discourse analysis: Oral and written contexts,
9–16. Bern: Peter Lang.
Haiman, John & Tania Kuteva. 2002. The symmetry of counterfactuals. In Joan Bybee & Michael
Noonan (eds.), Complex sentences in grammar and discourse: Essays in honor of Sandra
A. Thompson, 101–124. Amsterdam & Philadelphia: John Benjamins.
Halliday, Michael A. K. 1978. Language as social semiotic: The social interpretation of language
and meaning. London: Edward Arnold.
Honeybone, Patrick. 2011. Variation and linguistic theory. In Warren Maguire & April McMahon
(eds.), Analysing variation in English, 151–177. Cambridge: Cambridge University Press.
Hunter, Kathryn M. 1991. Doctors’ stories: The narrative structure of medical knowledge.
Princeton, NJ: Princeton University Press.
Hurwitz, Brian. 2006. Form and representation in clinical case reports. Literature and Medicine
25(2). 216–240.
Kinneavy, James Louis. 1971. A theory of discourse: The aims of discourse. Englewood Cliffs, NJ:
Prentice-Hall.
Kortmann, Bernd. 2006. Syntactic variation in English: A global perspective. In Bas Aarts & April
McMahon (eds.), Handbook of English linguistics, 603–624. Oxford: Blackwell.
Labov, William. 1997. Some further steps in narrative analysis. The Journal of Narrative and Life
History 7. 395–415.
Labov, William & Joshua Waletzky. 1967. Narrative analysis: Oral versions of personal
experience. In June Helm (ed.), Essays on verbal and visual arts, 12–44. Seattle: University
of Washington Press.
Martin, James Robert & David Rose. 2003. Working with discourse: Meaning beyond the clause.
London: Continuum.
Maseide, Per. 2003. Medical talk and moral order: Social interaction and collaborative clinical
work. Text 23(3). 369–403.
McCullough, Laurence B. 1989. The abstract character and transforming power of medical
language. Soundings 72(1). 111–125.
Mishler, Elliot G. 1984. The discourse of medicine: Dialectics of medical interviews. Norwood,
NJ: Ablex.
Miller, Carolyn R. 1984. Genre as social action. Quarterly Journal of Speech 70. 151–167.
Murawska, Magdalena. 2012. The many narrative faces of medical case reports. Poznan Studies
in Contemporary Linguistics 48(1). 55–75.
Page, Ruth. 2012. Stories and social media: Identities and interaction. New York: Routledge.
Polanyi, Livia. 1985. Telling the American story: A structural and cultural analysis of
conversational storytelling. Norwood: Ablex.
Richards, Jack C. & Richard W. Schmidt. 2002. Longman dictionary of language teaching and
applied linguistics. Harlow, UK: Longman.
Rosenbach, Annette. 2002. Genitive variation in English: Conceptual factors in synchronic
and diachronic studies (Topics in English linguistics 42). Berlin & New York: Mouton de
Gruyter.
Salmon, William N. 2010. Formal idioms and action: Toward a grammar of genres. Language &
Communication 30(4). 211–224.
Sankoff, David. 1988. Sociolinguistics and syntactic variation. In Frederick J. Newmeyer (ed.),
Linguistics: The Cambridge survey, 140–161. Oxford: Blackwell.
Sarangi, Srikant & Celia Roberts. 1999. Introduction: Discourse hybridity in medical work. In
Srikant Sarangi & Celia Roberts (eds.), Talk, work, and institutional order: Discourse in
medical, mediation, and management settings, 61–74. Berlin: Mouton de Gruyter.
Sarangi, Srikant. 2001. Activity types, discourse types and interactional hybridity: The case of
genetic counseling. In Srikant Sarangi & Malcolm Coulthard (eds.), Discourse and social
life, 1–27. Harlow: Longman.
Schilling-Estes, Natalie. 2002. Investigating stylistic variation. In Jack K. Chambers, Peter
Trudgill & Natalie Schilling-Estes (eds.), The handbook of variation and change, 374–401.
Oxford: Blackwell.
Schmid, Hans-Jörg. 2013. Is usage more than usage after all? The case of English not that.
Linguistics 51(1). 75–116.
Schryer, Catherine, Lorelei Lingard, Marlee Spafford & Kim Garwood. 2003. Structure and
agency in medical case presentations. In Charles Bazerman & David R. Russell (eds.),
Writing selves/writing societies, 92–96. Fort Collins: WAC.
Schulze, Rainer (ed.). 1998. Making meaningful choices in English: On dimensions,
perspectives, methodology, and evidence. Tübingen: Gunter Narr.
Smith, Carlota S. 2003. Modes of discourse: The local structure of texts (Cambridge Studies in
Linguistics 103). Cambridge: Cambridge University Press.
Swales, John M. 2004. Research genres: Explorations and applications. Cambridge: Cambridge
University Press.
Tannen, Deborah. 1989. Talking voices: Repetition, dialogue and imagery in conversational
discourse. Cambridge: Cambridge University Press.
Virtanen, Tuija. 1992a. Issues of text typology: Narrative – a ‘basic’ type of text? Text 12(2).
293–310.
Virtanen, Tuija. 1992b. Given and new information in adverbials: Clause-initial adverbials of
time and place. Journal of Pragmatics 17(2). 99–115.
Virtanen, Tuija. 2010. Variation across texts and discourses: Theoretical and methodological
perspectives on text type and genre. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic
variation and genre, 53–84. Berlin & New York: de Gruyter Mouton.
Werlich, Egon. 1976. A text grammar of English. Heidelberg: Quelle & Meyer.
Winker, Margaret A. 2006. Clinical crossroads: Expanding the horizons. The Journal of the
American Medical Association 295(24). 2888–2889.
Markus Bieswanger
Aviation English: Two distinct specialised registers?
Abstract: The communication between air traffic controllers and pilots via voice
radio is regularly referred to as Aviation English in the literature. Responding to
growing international air travel after the Second World War and in reaction to
several accidents and incidents at least partly caused by controller-pilot miscom-
munication, the International Civil Aviation Organization (ICAO) developed a set
of standards and recommended practices concerning language use in air traffic
control communication. These ICAO guidelines permit the use of two different
and precisely defined varieties of Aviation English: standardised phraseology
in most routine situations and plain Aviation English when standardised phra-
seology is insufficient to serve an intended transmission. Based on the official
ICAO recommendations and the analysis of text excerpts from authentic air traffic
control communication, this paper addresses the question whether the two vari-
eties currently referred to as Aviation English are distinct registers in the sense
of Biber and Conrad (2009). The relationship between the two different inter-
pretations of Aviation English in actual controller-pilot communication and the
linguistic characteristics of these varieties are investigated and compared. The
analysis shows that the two varieties in question are indeed distinct specialised
registers and supports the main objective of the volume by demonstrating that
adequate register choice is a prerequisite for successful communication, in this
case in aviation contexts.
1 Introduction
For several decades, aviate – navigate – communicate has been widely known as
the axiomatic set of any pilot’s duties, particularly during non-routine and emer-
gency situations, but also in everyday routine flying. From the point of view of pri-
oritisation of tasks in high workload situations, the order implies that the primary
concern of any flight crew must be to maintain control over their aircraft, the
second most important duty is to make sure that the aircraft moves in the direction
it is supposed to fly and the third priority is to communicate the intentions
of the flight crew to air traffic control and to receive instructions from it. However, this
order does not mean that communication plays an unimportant role in aviation.
Despite the highly plausible prioritisation of tasks, it should also be noted that
communication is included in the set of the three most important duties of pilots
(cf. Kostecka 2007: 13).
Markus Bieswanger, University of Bayreuth
As a result of a number of incidents and accidents associated with commu-
nication problems as well as several decades of continuous growth of air traffic
around the globe, communication issues in air traffic control contexts are cur-
rently taken very seriously by the aviation authorities and play a heightened role
in pilot and air traffic controller training. The International Civil Aviation Organ-
ization explains this as follows:
With mechanical failures featuring less prominently in aircraft accidents, more attention
has been focused in recent years on human factors that contribute to accidents. Communi-
cation is one human element that is receiving renewed attention. (ICAO 2010: vii)
The renewed interest in air traffic control communication also shows in the
desire for an exchange of ideas and expertise between aviation professionals and
linguists, as illustrated by the recent volume entitled Aviation Communication:
Between Theory and Practice (Hansen-Schirra and Maksymski 2013). Voice-based
communication between pilots and air traffic controllers, so-called radiotelephony,
is regularly referred to as Aviation English or at least constitutes a central
part of even the broadest definitions of Aviation English. Moder (2013: 227) pro-
vides such a broad definition:
Aviation English describes the English used by pilots, air traffic controllers and other per-
sonnel associated with the aviation industry. Although the term may encompass a wide
variety of language use situations, including the language of airline mechanics, flight
attendants, or ground service personnel, most research and teaching focus on the more
specialized communication between pilots and air traffic controllers, often called radiote-
lephony.
Linguistic publications indeed often adopt a more focused definition of Aviation
English as “the language used by pilots and air traffic controllers” (Intemann
2008: 21). The present article follows this definition of Aviation English as the
English used in voice-based air traffic control communication, but differs from
most previous work in that it does not aim to analyse Aviation English primarily
to investigate the reasons for miscommunication in air traffic control and the
contribution of communication problems to incidents and accidents (cf., e.g.,
Bieswanger 2013), but to assess the status of Aviation English from the perspec-
tive of register research. In the following, this article will give a short overview
of the history of English in air traffic control contexts and then go on to answer
the question whether the two varieties currently referred to by the term Aviation
English are distinct registers which can be categorised as specialised registers in
the sense of Biber and Conrad (2009).
2 English in Air Traffic Control

In 1944, 52 states signed the Chicago Convention, i.e. the first international convention
on civil aviation. The convention resulted in the foundation of the International
Civil Aviation Organization (ICAO), which became a United Nations Agency
in 1947. One of the purposes of the ICAO is to provide international standards for
air traffic control and safe flight operations, which includes recommendations
on language use in pilot-controller communication. These provisions concerning
language use and language requirements are primarily defined in Volume II of
Annex 10 to the Convention on International Civil Aviation on Aeronautical Communications
(ICAO 2001); additional language recommendations are defined in
Annexes 1, 6, and 11. The requirements are further specified in the Manual of
Radiotelephony (ICAO 2007a) and the Procedures for Air Navigation Services: Air
Traffic Management (ICAO 2007b).
It is mainly as a result of World War II that English was chosen as the basis
of the world-wide aviation communication language. It has to be noted that the
ICAO recognises national languages and does not forbid the use of languages
other than English for local air navigation purposes, provided that all persons
involved share that other language. In international aviation, by contrast, the
use of English is the rule. Crystal (2003: 108) sums up the reasons for this choice
as follows: “[…] they agreed that English should be the international language
of aviation when pilots and controllers speak different languages. This would
have been the obvious choice for a lingua franca. The leaders of the Allies were
English-speaking; the major aircraft-manufacturers were English-speaking; and
most of the post-war pilots in the West (largely ex-military personnel) were English-speaking.”
Given the economic, technological, and military dominance
of Great Britain and especially the USA at that time, other languages were not a
realistic option.
The Chicago Convention granted “complete and exclusive sovereignty over
the airspace above its territory” (Convention on International Civil Aviation 1944)
to each of the contracting states, but also demanded that all contracting states
provide adequate regulations for the safety of aviation. The original language of
the document is English, but it was translated into French and Spanish as the
two other languages of “equal authenticity” (cf. Convention on International Civil
Aviation 1944). Today, there are also translations of the document into Russian,
Chinese and Arabic, since these are official languages of the United Nations. Cur-
rently the ICAO has 190 member states.
The first version of the Convention on International Civil Aviation (1944) does
not include any statements on the question of an international air traffic commu-
nication language, but it promises further regulations. Today’s air traffic man-
agement procedures are the result of an ongoing evaluation and revision of the
first document provided in 1946 by the Air Traffic Control Committee of the Inter-
national Conference on North Atlantic Route Service Organisation (cf. ICAO 2007b:
vii). This bias towards North Atlantic air traffic has definitely also contributed to
the choice of English.
Responding to the constantly growing international air travel after the
Second World War and in reaction to several accidents and incidents at least
partly caused by controller-pilot miscommunication (cf., e.g., Cushing 1994;
Jones 2003), the ICAO developed a set of standards and recommended practices
(SARPs) concerning language use in general and the use of English in particular
in air traffic control communication (cf. ICAO 2001; ICAO 2007a; ICAO 2007b),
which has been adopted by most countries world-wide. For several decades, until
about the turn of the century, these SARPs were almost exclusively devoted to the
definition of the so-called “ICAO standardized phraseology” (ICAO 2001: 5-1; for
a detailed description cf. Section 3.2 below), which is supposed to “provide the
tools for communication in most of the situations encountered in the daily prac-
tice of ATC [= air traffic control] and flight” (ICAO 2010: 3-5).
More recently, the ICAO has added SARPs concerning the proficiency in plain
Aviation English of all pilots and air traffic controllers involved in international
aviation (cf. Mathews 2004; Mitsutomi and O’Brian 2004; ICAO 2010). Experi-
ence with standardised phraseology had shown that in unusual and unexpected
“cases, where phraseology provides no ready made form of communication,
pilots and controllers must resort to plain language” (ICAO 2010: 3-5). The moti-
vation for the demand of a certain level of proficiency in plain Aviation English
by all stakeholders in air traffic control communication was similar to the reasons
that had earlier led to the development of the standardised phraseology:
Over 800 people lost their lives in three major accidents […]. In each of these seemingly
different types of accidents, accident investigators found a common contributing element:
insufficient English language proficiency on the part of the flight crew or a controller had
played a contributing role in the chain of events leading to the accident. In addition to these
high-profile accidents, multiple incidents and near misses are reported annually as a result
of language problems, instigating a review of communication procedures and standards
worldwide. (ICAO 2010: 1-1)
As a result of accidents and incidents more or less intimately connected to communication
problems, currently all pilots and air traffic controllers involved in
international aviation have to demonstrate proficiency in plain aviation-related
English or plain Aviation English; the required level of proficiency is at least level
4 “operational” on a scale from level 1 “pre-elementary” to level 6 “expert” (ICAO
2010: A-7 and A-8).
To sum up, two varieties of English used for communication between pilots
and air traffic controllers are presently referred to by the term Aviation English,
namely standardised phraseology, on the one hand, and plain Aviation English,
on the other. In this paper, Aviation English will be used as the umbrella term,
while standardised phraseology and plain Aviation English will be used to refer
to the two individual varieties. The following section will apply
the classification of Biber and Conrad (2009) to these varieties and investigate
whether we are concerned with two distinct specialised registers referred to by
the same designation.
3 Registers of Aviation English

According to Biber and Conrad (2009: 6), “a register is a variety associated with a
particular situation of use (including particular communicative purposes).” Biber
and Conrad (2009: 6) identify three components of a register analysis: firstly, the
situational context of use, i.e. the unique situational characteristics of a certain
variety of language use. Secondly, the linguistic analysis, i.e. the description of
“typical lexical and grammatical features” (Biber and Conrad 2009: 6) that are
pervasive in a variety. Thirdly, the interpretation of the functions of these per-
vasive linguistic features in the situational context specified earlier. Section 3.1
will be devoted to a situational analysis of the two registers in question, while
Sections 3.2 and 3.3 will describe their linguistic characteristics and their specific
functions.
3.1 Situational analysis
As already mentioned, Aviation English consists of standardised phraseology and
the use of plain English in aeronautical radiotelephony communication. When
applying Biber and Conrad’s (2009: 39) “framework for analyzing situational
characteristics,” many similarities and some crucial differences concerning the
situational context of these two varieties of Aviation English can be identified.
According to Biber and Conrad (2009: 40), the major situational characteristics
of registers are: participants, relations among participants, channel, production
circumstances, setting, communicative purposes and topic (cf. also Schubert,
this volume).
Participants
The participants in both varieties of Aviation English are identical. The stake-
holders in aeronautical radiotelephony communication, i.e. pilots and control-
lers engaging in air traffic control communication, are both addressors producing
text as well as intended listeners referred to as addressees (cf. Biber and Conrad
2009: 41). Depending on national regulations, it may or may not be legal for out-
siders to listen to air traffic control communication, but there is no difference
between the two varieties concerning what Biber and Conrad (2009: 42) call
“on-lookers”. Since all parameters concerning participants and participation are
identical, differences between the use of standardised phraseology and plain Avi-
ation English cannot be attributed to this situational characteristic.
Relations among participants

There are no differences between the two varieties of Aviation English in the rela-
tions among participants either. The participants in air traffic control communi-
cation directly interact with each other. Usually, one member of the flight crew
interacts with one air traffic controller in a dialogue at any given point in time. In
both varieties, the social roles of the interlocutors are identical; there are usually
no personal relationships between them and all participants share considerable
background knowledge about aviation.
Channel
By channel, Biber and Conrad (2009: 43) mean the binary distinction between the
physical modes of speech and writing and what they call the “specific mediums
of communication.” Both types of Aviation English are voice-based and thus
clearly spoken registers. Written air traffic control communication with the help
of a so-called controller-pilot data link is still in its infancy and faces a number of
disadvantages that seem to inhibit its more widespread use, such as the ensuing
lack of situational awareness of all pilots of surrounding aircraft when messages
are exchanged bilaterally between one pilot and one air traffic controller. The
specific medium of communication for transmitting speech in air traffic control
communication is voice radio. Unlike face-to-face communication, Aviation
English thus generally belongs to the types of mediated spoken communication
(cf. also setting below).
Production circumstances
As both kinds of Aviation English are spoken registers, there is typically not much
time for speakers to plan what to say next and no possibility to “edit or erase
language once it is spoken” (Biber and Conrad 2009: 43). As in all spoken conver-
sations, there are certain expectations as to when a speaker has to say something
as well as limitations with respect to the length of pauses. Since all pilots a par-
ticular air traffic controller is responsible for are tuned to the same frequency and
since aviation radio technology does not allow more than one pilot to address the
controller at the same time, efficient communication is one of the main concerns
in air traffic control communication.
Setting
According to Biber and Conrad (2009: 44), “the setting refers to the physical
context of the communication – the time and place” (original emphasis). As with
most spoken communication, the time is shared by the interlocutors in air traffic
control communication, as the messages are transmitted instantaneously. Avia-
tion English, however, is generally mediated communication and thus the situ-
ation is special with respect to place. The participants have a certain knowledge
about the place of production of their interlocutor’s speech but do not share the
place of production as in face-to-face communication. The quality of transmis-
sion in air traffic control communication is one of the reasons for the implemen-
tation of SARPs, as it can be adversely affected by weather, distance and other
circumstances.
Communicative purposes
The two varieties of Aviation English show their biggest differences in relation to
the communicative purposes. It could be argued that both share what Biber and
Conrad (2009: 45) call the “general purpose”, i.e. the aim to ensure efficient and
effective communication between pilots and controllers, and differ only in the
specific purpose. If register status were decided by the general purpose alone, the
two varieties of Aviation English could be termed specific “subregister[s]” (Biber
and Conrad 2009: 45) of one register. However, according to the ICAO (2001: 5-1),
there should be no overlap between these two varieties: “ICAO standardized
phraseology shall be used in all situations for which it has been specified. Only
when standardized phraseology cannot serve an intended transmission, plain
language shall be used.” Considering the fundamentally different and comple-
mentary situations of use – routine versus non-routine air traffic control com-
munication (cf. ICAO 2010: 3-4, 3-5) – and the considerable linguistic differences
between the two varieties, as shown below, it can be argued that we are con-
cerned with two distinct, albeit related, registers.
Topic
The situation concerning the factor topic resembles the differentiation of com-
municative purposes: the shared general topic of both varieties is aviation, but
the specific topics covered are different. While standardised phraseology is con-
cerned with the fairly restricted aspects of routine air traffic control issues, plain
Aviation English covers a broader range of topics in non-routine situations, such
as emergencies as well as other unusual or unexpected contexts. “Topic is the
most important situational factor influencing vocabulary choice” (Biber and
Conrad 2009: 46) and so it is not surprising that standardised phraseology and
plain Aviation English should differ to a large extent at the lexical level (cf. also
Sections 3.2 and 3.3).
Summary
With respect to the situational characteristics of the two varieties of Aviation
English, many of Biber and Conrad’s (2009: 40) parameters such as participants,
relations among participants, channel, production circumstances and setting
are shared by both registers. However, there are clear differences in the commu-
nicative purposes and the range of topics covered by standardised phraseology
and plain Aviation English respectively, which leads to the conclusion that we
are not concerned with sub-registers of a single register. From the perspective of
situational characteristics, which “can be definitely specified” (Biber and Conrad
2009: 33) for both registers, standardised phraseology and plain Aviation English
can be categorised as two distinct specialised registers.
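The comparison summarised above can be sketched as a small data structure. The following Python snippet is merely an illustration: the name SITUATION and the short value labels are paraphrases introduced here for the purpose of the sketch and are not terminology taken from Biber and Conrad (2009) or from the ICAO documents.

```python
# Illustrative sketch only: each entry pairs the situational characteristic
# with (standardised phraseology, plain Aviation English). The value labels
# paraphrase the analysis in Section 3.1 and are assumptions of this sketch.
SITUATION = {
    "participants": ("pilots and controllers", "pilots and controllers"),
    "relations among participants": ("direct interaction", "direct interaction"),
    "channel": ("speech via voice radio", "speech via voice radio"),
    "production circumstances": ("real time", "real time"),
    "setting": ("shared time, separate places", "shared time, separate places"),
    "communicative purposes": ("routine situations", "non-routine situations"),
    "topic": ("routine air traffic control issues", "broader range of topics"),
}


def differing_characteristics(situation):
    """Return the characteristics on which the two varieties differ."""
    return [
        name
        for name, (phraseology, plain) in situation.items()
        if phraseology != plain
    ]


print(differing_characteristics(SITUATION))
```

Under these assumptions, only communicative purposes and topic are returned, which mirrors the argument that the two varieties share most situational characteristics but differ in the two that matter for register status.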
3.2 Standardised phraseology
In this section, the linguistic features of standardised phraseology and their
functions will be presented and discussed. In contrast to many other registers,
the functions of the linguistic features of this variety are clearly and explicitly
defined. The register that is officially referred to as “ICAO standardized phrase-
ology” (ICAO 2001: 5-1) is a variety of English that is used in a precisely defined
situational context and characterised by prescribed and pervasive linguistic fea-
tures used for a specific function, mainly “for the purpose of ensuring uniformity
in RTF [= radiotelephony] communications” (ICAO 2007a: 3-1) and “to provide
maximum clarity, brevity and unambiguity” (ICAO 2007a: 3-2). This variety thus
fulfils all the criteria of a “specialized register” in the sense of Biber and Conrad
(2009).
The ICAO standardised phraseology is precisely defined in several official
documents published by the ICAO. The second volume of Annex 10 to the Convention
on International Civil Aviation (ICAO 2001) describes “Aeronautical Communications”,
chapter 12 of ICAO Document 4444 on Air Traffic Management
(ICAO 2007b) is devoted entirely to “Phraseologies” and ICAO Document 9432,
the Manual of Radiotelephony (ICAO 2007a), provides a collection of illustrations
of the recommendations given in the other two documents.
Recommendations exist for all levels of language, including lexicon,
grammar and pronunciation. According to Biber and Conrad (2009: 6), “[r]egis-
ters are described for their typical lexical and grammatical characteristics” and
they state that their “linguistic features are always functional”. Pronunciation
features are not included in the list of linguistic features of registers by Biber and
Conrad (2009: 6), but since the pronunciation features of the ICAO phraseology
are strictly functional and since Biber mentioned phonological features as reg-
ister features in an earlier study (cf. Biber 1995: 29), they will also be considered
linguistic features of this register and thus be presented in this section.
Lexical characteristics
Standardised phraseology is probably best known for its characteristics at the
lexical level. At the heart of this register is a reduced vocabulary consisting of a
limited number of words and fixed phrases, each with a single precise meaning in
the situational context of routine air traffic control communication.
Section 5.2.1.5.8 of Annex 10 to the Convention on International Civil Aviation
(ICAO 2001) contains a brief list of words and phrases that “shall be used in radiotelephony
communications as appropriate and shall have the meaning ascribed
hereunder.” The list contains key terms of radiotelephony communication, such
as affirm for ‘yes’, cleared (cf. Transcript 1) for ‘authorised to proceed [with the air-
craft] under the conditions specified’, go ahead (cf. Transcript 4) meaning ‘proceed
with your message’ but not ‘proceed with your aircraft’, monitor (Transcript 3) for
‘listen out on (frequency)’ and maintain (cf. Transcript 2) for ‘continue in accord-
ance with the condition(s) specified’. Section 12.3 of ICAO Document 4444 on Air
Traffic Management (ICAO 2007b) provides a more comprehensive collection of
words and phrases to be used in specific circumstances. For example, climb (cf.
Transcript 2) is prescribed as the phonetically dissimilar opposite of descend in
standardised phraseology, ruling out the use of ascend, which is regularly listed
as an antonym of descend in dictionaries of plain English (cf. OALDO 2014). The
recommendations even explicitly include words and phrases that should not be
used at all. For example, Section 3.1.4 of the Manual of Radiotelephony (ICAO
2007a: 3-1) suggests that “the use of courtesies should be avoided” altogether;
however, courtesies such as greeting and parting expressions are often used and
tolerated in non-urgent contexts (cf. Transcript 3). Standardised phraseology is
thus not among the many text varieties native speakers of a language acquire
“without explicitly studying them” (cf. Biber and Conrad 2009: 2) but has to be
learned by both native as well as non-native speakers of English with explicit
instruction.
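The one-word-one-meaning principle described above can be illustrated with a minimal lookup sketch in Python. It contains only the expressions glossed in this section; the names PHRASEOLOGY, REPLACEMENTS and gloss are hypothetical and serve purely as an illustration, not as a rendering of any official ICAO resource.

```python
# Meanings as glossed in the text above (after ICAO 2001). The dictionary
# and function names are hypothetical and exist only for this sketch.
PHRASEOLOGY = {
    "affirm": "yes",
    "cleared": "authorised to proceed under the conditions specified",
    "go ahead": "proceed with your message",
    "monitor": "listen out on (frequency)",
    "maintain": "continue in accordance with the condition(s) specified",
}

# Plain-English word that the text mentions as ruled out, mapped to the
# prescribed expression.
REPLACEMENTS = {"ascend": "climb"}


def gloss(expression):
    """Return the single prescribed meaning of an expression, if listed."""
    key = expression.lower()
    if key in REPLACEMENTS:
        return f"not prescribed; use '{REPLACEMENTS[key]}'"
    return PHRASEOLOGY.get(key, "not in this excerpt")


print(gloss("go ahead"))
print(gloss("ascend"))
```

Each key maps to exactly one value, which is the point of the sketch: in contrast to a general-purpose dictionary entry, no expression carries a list of alternative senses.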
From the lexical perspective, two main characteristics of the special regis-
ter referred to as standardised phraseology can be identified. First, in contrast to
most other varieties of English – where it is the rule rather than the exception for
words to have multiple meanings – each word and phrase has just one specific
and precisely defined meaning in aviation phraseology. Other meanings of words
which are polysemous in plain English are thus explicitly excluded from this reg-
ister and some of the defined meanings of words and phrases in aviation phra-
seology do not occur outside of this specialised register. Meanings of words and
phrases that do not occur in other registers are called “register markers” (Biber
and Conrad 2009: 53). Unlike register markers in many other registers, however,
these unique characteristics are strictly functional in standardised phraseology
(cf. Biber and Conrad 2009: 55). The second main lexical characteristic of this
register is the fact that words and phrases are carefully selected to avoid con-
fusion and misunderstandings due to phonetically similar expressions, since
“maximum clarity, brevity and unambiguity” (ICAO 2007a: 3-2) are considered
the most important aims of the prescription of aviation phraseology.
Grammatical characteristics
At the grammatical level, standardised phraseology is also characterised by a
number of pervasive and frequent “register features” (Biber and Conrad 2009:
53).
With respect to the use of verbs in aviation phraseology, the prescription to
use most verbs in the list of essential “words and phrases” in the imperative only
is certainly striking (cf. ICAO 2001: 5-6 and 5-7). According to the definitions in
this list, verbs such as cancel ‘annul the previously transmitted clearance’, check
‘examine a system or procedure’, contact (cf. Transcript 2) ‘establish communica-
tions with …’, disregard ‘ignore’, monitor (Transcript 3) ‘listen out on frequency’,
maintain (cf. Transcript 2) ‘continue in accordance with the condition(s) speci-
fied’, report ‘pass me the following information …’, and many more can only be
used in imperatives, which is certainly a register feature of this variety. Aviation
phraseology even prescribes the use of certain words as verbs in the imperative
which are not commonly used as verbs and thus not listed in this part of speech in
general-use dictionaries, e.g. the verbal use of standby (cf. Transcript 4) meaning
‘wait and I will call you’ (ICAO 2001: 5-7).
Another grammatical feature characteristic of aviation phraseology is the
specific prescribed order of elements in an utterance and the high frequency of
ellipses, as illustrated by the following authentic example:
Transcript 1:
Aerogal seven hundred heavy Kennedy Tower (.) winds calm (.) runway one three left (.)
cleared to land
(JFK Tower, own transcript, 2010)
In line with the recommendation in Section 5.2.1.6 “Composition of messages”
of Annex 10 to the Convention on International Civil Aviation (ICAO 2001), the
message uttered by the air traffic controller at JFK International Airport consists
of two main parts, the “call” made up of the call sign of the addressee Aerogal
seven hundred heavy and the call sign of the originator Kennedy Tower, and the
“text” winds calm (.) runway one three left (.) cleared to land, which provides
information concerning the weather and contains the instruction that the plane
is cleared to land on runway one three left. The fixed structure permits elliptical
constructions and the reduction of function words “to a small number of prep-
ositions” (Moder 2013: 229; cf. also ICAO 2010: 3-4), as illustrated by the above
example.
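The two-part composition of a message can be made explicit in a small data structure. The following Python sketch is purely illustrative and not part of any ICAO document: the class name Transmission, its field names and the use of " (.) " as a separator (borrowed from the pause notation of the transcripts in this chapter) are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List

PAUSE = " (.) "  # pause marker as used in the transcripts (assumed separator)


@dataclass
class Transmission:
    """Hypothetical container for the 'call' and 'text' parts of a message."""
    addressee: str    # call sign of the station being addressed
    originator: str   # call sign of the station transmitting
    text: List[str]   # elements of the message proper, in prescribed order

    def call(self) -> str:
        # The call consists of addressee first, originator second.
        return f"{self.addressee} {self.originator}"

    def render(self) -> str:
        # The call is followed by the text elements, separated by pauses.
        return PAUSE.join([self.call(), *self.text])


transcript_1 = Transmission(
    addressee="Aerogal seven hundred heavy",
    originator="Kennedy Tower",
    text=["winds calm", "runway one three left", "cleared to land"],
)
print(transcript_1.render())
```

Rendering the parts of Transcript 1 in this way reproduces the transcribed utterance, which suggests how much of the message is carried by fixed positions rather than by function words.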
Overall, the grammatical characteristics of standardised phraseology reflect
the dominant functions of pilot-controller communication identified by Mell
(2004: 13), which are sharing of information (cf. information on the wind condi-
tions in Transcript 1 above), triggering actions, management of the pilot-controller
relationship and managing the dialogue. For example, the frequent use of imper-
atives is directly linked to the category “triggering actions” as “the core function
of pilot-controller communications” (Mell 2004: 13) and the prescribed structure
reduces the number of words needed for managing the dialogue between pilot
and air traffic controller. Transcript 2 illustrates the importance of imperatives
for triggering immediate actions (cf. also the imperatives continue, follow and
monitor in Transcript 3), in this case right after the decision of the pilots to abort
the landing and initiate a go-around:
Transcript 2:
Lufthansa four two four heavy climb [to and] maintain 3000 [feet] (.) fly runway heading
[…] contact Boston Departure […]
(Boston Tower, own transcript, 2015; imperatives in bold)
Pronunciation characteristics
The ICAO publications on standardised phraseology also make specific recommendations
on pronunciation, which give rise to additional linguistic features of this register. For
example, there are recommendations concerning the pronunciation of numbers
and letters. The “Radiotelephony Spelling Alphabet” defines the “desired pro-
nunciation” (ICAO 2001: 5-4) of the words representing letters when spelling out
“names, service abbreviations and words of which the spelling is doubtful” (ICAO
2001: 5-3). According to the ICAO (2001: 5-4), for example, the letter <z> has to be
pronounced as zulu /'zu:lu:/ and <k> has to be realised as kilo /'ki:lo/ (cf. Tran-
script 3).
Transcript 3:
Delta four twenty-seven (.) good day (.) continue down to kilo kilo [= taxiway KK] (.) follow
company [= another Delta jet] seven three seven (.) monitor tower one two three point niner
(JFK Ground, own transcript, 2008, my emphasis)
The pronunciation of numbers, which under most circumstances have to be pronounced
as single digits, is also specified in the recommendations for standardised
phraseology. Section 5.2.1.4.3 “Pronunciation of Numbers” of Annex 10 to
the Convention on International Civil Aviation (ICAO 2001) provides a description
of the desired pronunciation of numbers including recommended stress place-
ments:
(ICAO 2001: 5-5)
To avoid misunderstandings in radiotelephony communication, some of the
recommended pronunciations of numbers are deliberately different from the
common pronunciation of these numbers in many varieties of English spoken by
native speakers. The prescribed pronunciation features thus have to be learned
by native and non-native speakers of English alike. For example, dental frica-
tives are regularly replaced by alveolar stops – a recommendation in line with
Jenkins’ (2008: 146) recommendations for the so-called “Lingua Franca Core” of
English – and so the initial sounds in thousand and three are supposed to be
realised as /t/. Unfortunately, these recommendations are “often not adopted by
native speakers of English, who typically pronounce ‘3’ and ‘5’ in the usual plain
English way” (Moder 2013: 229–230). This is illustrated by Transcript 3, in which
the air traffic controller at JFK International Airport in New York City, most likely
a native speaker of English, pronounces <3> “in the usual plain English way”
(Moder 2013: 230) but realises <9> as niner.
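The digit-by-digit rendering of numbers can be sketched in a few lines of Python. The sketch is a simplified illustration only: the function name spell_out is hypothetical, the mapping uses ordinary English digit words except for niner, and the word point for the decimal separator follows the controller's usage in Transcript 3 rather than any prescribed form. It does not reproduce the ICAO pronunciation table.

```python
# Digit words for illustration. Only "niner" is a prescribed form attested
# in Transcript 3; the remaining entries are plain English (assumption).
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "niner",
    ".": "point",  # as spoken in Transcript 3 (assumption, not a prescription)
}


def spell_out(number: str) -> str:
    """Render a numeric string one character at a time, e.g. a frequency."""
    try:
        return " ".join(DIGIT_WORDS[ch] for ch in number)
    except KeyError as exc:
        raise ValueError(f"unsupported character: {exc.args[0]!r}") from None


print(spell_out("123.9"))  # the tower frequency mentioned in Transcript 3
```

Applied to the frequency 123.9, the function yields the sequence heard at the end of Transcript 3.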
Unlike for most other registers, there are even provisions concerning the
speed of delivery of utterances in Aviation English. The ICAO recommends “an
even rate of speech not exceeding 100 words per minute” (ICAO 2001: 5-5) and
an even slower rate “[w]hen it is known that elements of the message will be
written down by the recipient” (ICAO 2007a: 2-1). Studies, however, have shown
that particularly native speakers tend to use a much higher speech rate, often
over 200 words per minute, which can lead to misunderstandings and the need
for time-consuming clarifications (cf. Bieswanger 2013: 19–20). Silberstein and
Dietrich (2003: 9) quote an air traffic controller who admits: “I talk faster, a lot
faster – I talk so fast that they have to slow me down because they don’t under-
stand me anymore.” Since the speech rate is obviously crucial in Aviation English,
all pilots and air traffic controllers have to be trained to develop an awareness of
the importance of their speed of delivery.
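How the recommended limit relates to an individual transmission can be illustrated with a short calculation. The Python sketch below is a rough approximation under stated assumptions: the function names are hypothetical, words are counted by splitting on whitespace, the pause marker (.) is ignored, and the durations are invented for the example, since the transcripts in this chapter carry no timing information.

```python
ICAO_MAX_WPM = 100  # recommended upper limit quoted above (ICAO 2001: 5-5)

TRANSCRIPT_1 = (
    "Aerogal seven hundred heavy Kennedy Tower (.) winds calm (.) "
    "runway one three left (.) cleared to land"
)


def words_per_minute(transmission: str, seconds: float) -> float:
    """Speech rate of one transmission; pause markers are not counted."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    words = [tok for tok in transmission.split() if tok != "(.)"]
    return len(words) * 60 / seconds


def exceeds_recommended_rate(transmission: str, seconds: float) -> bool:
    """True if the transmission is faster than the recommended limit."""
    return words_per_minute(transmission, seconds) > ICAO_MAX_WPM


# Hypothetical durations: the same 15 words spoken in 9 and in 4.5 seconds.
for duration in (9.0, 4.5):
    rate = words_per_minute(TRANSCRIPT_1, duration)
    print(f"{duration} s: {rate:.0f} wpm, too fast: "
          f"{exceeds_recommended_rate(TRANSCRIPT_1, duration)}")
```

On these assumed durations, the 15 words of Transcript 1 would have to take at least nine seconds to stay within the recommendation; delivered in half that time, they would reach the rate of about 200 words per minute reported for some native speakers.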
Ever since its introduction after the Chicago Convention more than half a
century ago, the ICAO standardised phraseology has been refined and expanded.
The continuous development of standardised phraseology has been based on
pilots’ and controllers’ experiences and the analysis of language-related acci-
dents, in order to cover more areas of language use in aviation, to adopt new
procedures and technologies, and to deal with previously unknown or rare sit-
uations. For example, in reaction to recent events, the 15th edition of the ICAO
Procedures for Air Navigation Services: Air Traffic Management (ICAO 2007b: xv)
adds, among other regulations, new “pilot procedures in the event of unlawful
interference” and “procedures related to volcanic ash”.
Pilots and air traffic controllers are constantly urged to use standardised phra-
seology and to avoid non-standard communication whenever possible (cf., e.g.,
ICAO 2001: 5-1; ICAO 2007a: 3-2; ICAO 2010: 2-3; Prinzo et al. 2010: 15). Despite all
efforts to regularly update the standardised phraseology, the ICAO also acknowledges
that “[i]t is not possible, however, to develop phraseologies to cover every
conceivable situation” (ICAO 2010: 4-2) and that “plain language shall be used”
(ICAO 2001: 5-1) when standardised phraseology is not available to cover the com-
municative needs of the stakeholders in air traffic control communication. The
following section will describe the use of plain language in such situations and
show that plain Aviation English can also be considered a specialised register.
3.3 Plain Aviation English
Plain language has never been excluded from pilot-controller communication
but, on the contrary, has always been permitted and
used in clearly defined situations in which “standardized phraseology cannot
serve an intended transmission” (ICAO 2001: 5-1). As a result of this precise situ-
ational context, however, plain Aviation English is fundamentally different from
everyday conversations in several respects:
Plain language in aeronautical radiotelephony communications means the spontaneous,
creative and noncoded use of a given natural language, although constrained by the
functions and topics (aviation and non-aviation) that are required by aeronautical
radiotelephony communications, as well as by specific safety-critical requirements for intelligibility,
directness, appropriacy, non-ambiguity and concision. (ICAO 2010: 3-5)
Plain Aviation English is thus characterised by features that result from the func-
tion it has to fulfil with respect to safety and the topics covered in air traffic control
communication. These constraints are the reason for distinctive register features
at all linguistic levels, described and illustrated in the following subsections.
Lexical characteristics
The lexicon of plain Aviation English is less precisely defined than the words and
phrases used in standardised phraseology, but at the same time more restricted
than, for example, the lexicon of everyday conversation in what could be called
plain English. The ICAO recommendations make it very clear that the obvious
need for plain language in non-routine situations “should in no way be inter-
preted as permission to chat” (ICAO 2010: 4-3). At the lexical level, plain Avia-
tion English is thus characterised by words and phrases corresponding to topics
related to pilot-controller communication. These topics, which are also addressed
in textbooks and courses on plain Aviation English (cf., e.g., Emery and Roberts
2008), include, among others, fields such as technology, health, animals, fire
and weather (for a detailed list of domains, cf. ICAO 2010: B5-B8). For example,
in-flight medical emergencies often make the use of plain Aviation English neces-
sary (cf. Transcript 4). In Transcript 4, standardised phraseology is used in the
first two transmissions to establish contact but then turns out to be insufficient
to serve all of the communicative needs of the pilots. Hence a code-switch takes
place and the further three transmissions are carried out in plain Aviation English.
The vocabulary in these transmissions, however, is different from plain everyday
English in that it is characterised by aviation-related terms such as diversion,
declaring emergency and met report.
Transcript 4:
American 182 Tokyo Control American one eight two
Tokyo Control American one eight two (.) go ahead
American 182 Yes sir (.) we are (.) have a possible diversion to Narita [=Tokyo Narita
International Airport] (.) we are not declaring emergency yet but would
like Narita weather
[…] Narita airport is closed, Tokyo Haneda is suggested for a possible diver-
sion
Tokyo Control American one eight two (.) do you need met report [=weather report] of
Haneda?
American 182 Yes sir (.) request met report for Haneda
Tokyo Control Okay, standby
(Tokyo Control, own transcript, 2014)
Grammatical characteristics
The grammatical structure of plain Aviation English is similar to that of plain English
and only characterised by some tendencies which constitute functionally ori-
ented register features. Of the factors mentioned in the quotation above, “conci-
sion” (ICAO 2010: 3-5) is certainly one of the main driving forces responsible for
these characteristics. Concision is defined as ‘giving only the information that is
necessary, using few words’ in the OALDO (2014). In the context of plain Aviation
English, this means that the utterances produced by pilots and air traffic control-
lers have to be as brief as possible and simply structured. According to Prinzo
et al. (2010: 15), the rate of readback errors is affected by “both message length
and complexity” and they claim that “controllers should transmit less informa-
tion more often.” With reference to concision, it has also been reported that the
desire for brevity leads to an influence of standardised phraseology on plain
Aviation English, showing in the deletion of function words such as determiners
even when not using phraseology (ICAO 2010: 3-6). The last two transmissions in
Transcript 4 illustrate this claim, as the determiner the is omitted in both trans-
missions before met report.
Pronunciation characteristics
At the level of pronunciation, plain Aviation English is less restricted than stand-
ardised phraseology, as there are no specific recommendations concerning the
realisation of individual words and phrases. Other ICAO recommendations con-
cerning pronunciation, however, also apply to the use of plain language and
make plain Aviation English more restricted than plain English in many other
situations. For example, the recommended speech rate of 100 words or less per
minute (ICAO 2001: 5-5; cf. above) is also valid for plain Aviation English, which
aims for maximum “intelligibility” (cf. ICAO 2010: 3-5), just like standardised
phraseology.
This necessity for maximum mutual intelligibility in pilot-controller communication
is also the reason for another requirement concerning the pronunciation
of plain Aviation English, namely the demand that all pilots and air traffic
controllers “must take care to acquire an internationally understood accent or
dialect” (ICAO 2010: 5-6). The ICAO does not specify more precisely what is meant
by “internationally understood accent” and does not name any recommended
accents in particular, but this fairly vaguely defined rule applies to both native
speakers and non-native speakers of English. From a functional perspective, such
an accent or dialect is a register feature of plain Aviation English and necessary
for efficient and effective communication in air traffic control contexts.
4 Conclusion
The above sections have shown that Aviation English is not monolithic and that
there are not one but two varieties referred to as Aviation English, namely standardised
phraseology and plain Aviation English. Both varieties occur in precisely
defined and complementary situations in pilot-controller communication:
standardised phraseology covers most routine situations, whereas plain Aviation
English is only permitted in non-routine situations. Both varieties share many
of the situational characteristics Biber and Conrad (2009: 39) consider “relevant
for describing and comparing registers”. They are employed by the same participants,
i.e. pilots and air traffic controllers, with identical relations between the
participants, use the same channel, face the same production circumstances and
share the same setting. The main differences with regard to the situational char-
acteristics can be found in the communicative purposes and the topics covered.
While both varieties share their general purpose, namely to facilitate efficient
and effective air traffic control communication, standardised phraseology is
restricted to a limited set of frequently used communicative purposes in routine
situations, whereas plain Aviation English covers a whole range of less frequently
used and non-routine communicative purposes such as emergencies. A similar
pattern can be identified concerning the topics covered by these two varieties:
while standardised phraseology covers a restricted but very frequently used set of
topics in routine air traffic control communication, plain Aviation English covers
a much broader range of air traffic related topics in non-routine situations.
Resulting from the partially different situational contexts, both varieties of
Aviation English are characterised by pervasive linguistic features that fulfil spe-
cific functions in each of the situations. Standardised phraseology is character-
ised by a very precisely defined reduced set of words and phrases, each with a
single prescribed meaning, a grammar marked by ellipsis, short utterances and
a frequent use of imperatives, and a prescribed pronunciation of numbers and
letters as well as recommendations concerning the speech rate. Reflecting the
wider range of communicative purposes and topics covered by plain Aviation
English, the lexical, grammatical and pronunciation characteristics are less pre-
cisely specified than for standardised phraseology. There are, however, character-
istics at all linguistic levels that distinguish plain Aviation English from conver-
sations in plain English, such as a reduced lexicon resulting from the restriction
of plain Aviation English to the topics related to aeronautical radiotelephony, a
grammar determined by the fundamental need for concision and non-ambiguity,
and ICAO recommendations concerning the speech rate and the intelligibility of
accents and dialects.
In conclusion, considering situational, linguistic and functional character-
istics, the analysis presented in this paper shows that both varieties of Aviation
English used in pilot-controller communication can be categorised as specialised
registers in the sense of Biber and Conrad (2009: 10; 32–33). They are both fun-
damentally different from the very general register of conversation, and they are
distinct because they differ in their degree of specificity. Compared to plain Avi-
ation English, the situational, linguistic and functional characteristics of stand-
ardised phraseology can be much more precisely specified. Standardised phrase-
ology thus represents one extreme of a continuum of specificity of registers, while
conversations would be at the other end. Plain Aviation English could be placed
somewhere in between, although certainly in the range of specialised registers
and closer to aviation phraseology than to everyday conversations.
In air traffic control communication, routine and non-routine situations
alternate constantly, meaning that changes in communicative purpose and the
switching between the two specialised registers described in this article are the
rule rather than the exception in the working life of pilots and air traffic controllers
(cf. Biber and Conrad 2009: 45). The two specialised registers, standardised phra-
seology and plain Aviation English, however, are the only choices permitted in
English-language air traffic control situations; plain English – as used in everyday
face-to-face or mediated conversations – is not an option and is explicitly discouraged
by the ICAO. Both native speakers and non-native speakers of English
have to learn these two specialised registers through explicit instruction, as neither
of these specialised registers is among the many registers native speakers acquire
“automatically” without any extra effort. The need for situation-specific register
selection in air traffic control communication provides yet another example of
the fact that the use of the appropriate register in a given situation is the
prerequisite for successful communication.
References
Cambridge: Cambridge University Press.
University Press.
Bieswanger, Markus. 2013. Applied linguistics and air traffic control: Focus on language
awareness and intercultural communication. In Silvia Hansen-Schirra & Karin Maksymski
(eds.), Aviation communication: Between theory and practice, 15–30. Frankfurt am Main:
Peter Lang.
Convention on International Civil Aviation. 1944. Convention on international civil aviation
done at the 7th day of December 1944. Original version available at
http://www.icao.int/publications/Documents/7300_orig.pdf (accessed 31 January 2014).
Crystal, David. 2003. English as a global language. 2nd edn. Cambridge: Cambridge University
Press.
Cushing, Steven. 1994. Fatal words: Communication clashes and aircraft crashes. Chicago: The
University of Chicago Press.
Emery, Henry & Andy Roberts. 2008. Aviation English: For ICAO compliance. Oxford: Macmillan.
Hansen-Schirra, Silvia & Karin Maksymski (eds.). 2013. Aviation communication: Between
theory and practice. Frankfurt am Main: Peter Lang.
ICAO (International Civil Aviation Organisation). 2001. Annex 10: Aeronautical
telecommunications. Volume II. 6th edn.
ICAO (International Civil Aviation Organisation). 2007a. Manual of radiotelephony. 4th edn.
ICAO Document 9432-AN/925.
ICAO (International Civil Aviation Organisation). 2007b. Procedures for air navigation services:
Air traffic management. 15th edn. ICAO document 4444-ATM/501.
ICAO (International Civil Aviation Organisation). 2010. Manual on the implementation of ICAO
language proficiency requirements. 2nd edn. ICAO Document 9835-AN/453.
Intemann, Frauke. 2008. ‘Taipei ground, confirm your last transmission was in English … ?’ – An
analysis of Aviation English as a world language. In Claus Gnutzmann & Frauke Intemann
(eds.), The globalisation of English and the English language classroom, 76–93. 2nd edn.
Tübingen: Narr.
Jenkins, Jennifer. 2008. Teaching pronunciation for English as a Lingua Franca: A sociopolitical
perspective. In Claus Gnutzmann & Frauke Intemann (eds.), The globalisation of English
and the English language classroom, 145–158. 2nd edn. Tübingen: Narr.
Jones, R. Kent. 2003. Miscommunication between pilots and air traffic control. Language
Problems and Language Planning 27(3). 233–248.
Kostecka, Robert. 2007. Aviate—Navigate—Communicate. Transport Canada: Aviation safety
letter 2/2007, 12–14.
Live-atc.net. www.live-atc.net. (accessed 19 February 2015)
Mathews, Elizabeth. 2004. New provisions for English language proficiency are expected to
improve aviation safety. ICAO Journal 59(1). 4–6, 27.
Mell, Jeremy. 2004. Language training and testing in aviation need to focus on job-specific
competencies. ICAO Journal 59(1). 12–14, 27.
Mitsutomi, Marjo & Kathleen O’Brien. 2004. Fundamental aviation language issues addressed
by new proficiency requirements. ICAO Journal 59(1). 7–9, 26–27.
Moder, Carol Lynn. 2013. Aviation English. In Brian Paltridge & Sue Starfield (eds.), The
handbook of English for specific purposes, 227–242. Malden: John Wiley & Sons.
OALDO (Oxford advanced learner’s dictionary online). 2014.
http://oald8.oxfordlearnersdictionaries.com/ (accessed 31 January 2014).
Prinzo, Veronika O., Alan Campbell, Alfred M. Hendrix & Ruby Hendrix. 2010. U.S. airline
transport pilot international flight language experiences. Report 5: Language experiences
in native English-speaking airspace/airports. Technical report DOT/FAA/AM-10/18.
Washington, DC: Federal Aviation Administration, Office of Aerospace Medicine.
Silberstein, Dagmar & Rainer Dietrich. 2003. Cockpit communication under high cognitive
workload. In Rainer Dietrich (ed.), Communication in high risk environments (Special issue
12 of Linguistische Berichte), 9–56. Hamburg: Buske.
Rolf Kreyer
‘Now niggas talk a lotta Bad Boy shit’:
The register hip-hop from a corpus-
linguistic perspective
Abstract: The present paper aims to provide a first corpus-based analysis of one
of the most successful kinds of popular music, namely hip-hop. In particular, the
paper explores to what extent hip-hop can be regarded as a register in its own
right, analysing data drawn from a 200,000-word corpus of the most successful
hip-hop albums in 2003 and 2011. Taking Biber and Conrad’s (2009) register-defining
triad of situation of use, linguistic features, and associated functions as
a descriptive framework, it is argued that hip-hop indeed warrants the status of
a register in its own right.
1 Introduction
In Western societies, pop songs are an integral part of everyday life: we are sur-
rounded by pop songs in the supermarket, in the elevator or when driving a car.
Moreover, listening to pop songs is one of the (if not the) most popular pastimes
among adolescents in America or Western Europe (cf., for instance, Schwartz and
Fouts 2003). Given the pervasiveness of pop songs, it is surprising that the scien-
tific study of this register does not figure very prominently in linguistics, although
pop songs have been given a considerable amount of attention in fields like cul-
tural studies.
In this respect, it is telling that none of the major corpora of the English lan-
guage provide any lyrics of pop songs. The linguistic analysis of this register is
still in its infancy and corpus-linguistic studies are few and far between. An early
corpus-based analysis of pop songs is Murphey (1989; cf. also 1990 and 1992). He
provides both quantitative and qualitative data from a 13,000-word corpus
of pop-song lyrics. His main focus, however, does not lie in the description of
a register but in the exploitation of pop songs for the learning and teaching of
English as a foreign language. A much more ambitious project is the BLUR (Blues
Lyrics collected at the University of Regensburg) corpus, which contains 7,341 song
Rolf Kreyer, University of Marburg

texts comprising roughly 1.5 million words (Miethaner 2001, 2005; Schneider and
Miethaner 2006). However, this corpus consisting of recordings from the 1920s
to the 1940s was compiled as evidence for earlier African American Vernacular
English and, accordingly, is only of limited value for the study of pop songs as an
important present-day register. More detailed analyses of modern pop songs can
be found in Kreyer and Mukherjee (2007) and Kreyer (2012). The former provide a
first attempt at describing the major linguistic properties of the register at issue,
such as deviant spellings (also cf. Mukherjee 2000) and lexical/lexico-grammat-
ical aspects. One focus of their research is on the degree to which pop songs can
be considered a written or spoken register. The data show that the register is more
spoken-like in general, as is shown in similarities in average word length or the
high frequency of the personal pronouns you and I. Interestingly, other features
that are typical of spoken language, such as the frequent use of you know as a dis-
course marker, were shown not to be that important in pop songs. Kreyer (2012)
explores the use of love-related metaphors in pop songs within the framework of
conceptual metaphor theory (e.g. Lakoff and Johnson 1980; Kövecses 2002). He
finds that, despite the (perhaps) popular assumption that pop songs are clichéd,
metaphors in pop songs are quite varied and creative. The most recent register-
related study of pop songs is Werner (2012). Since he is interested in small-scale
diachronic as well as varietal aspects of pop songs, his corpus consists of two
subcorpora, one with British lyrics and the other with American lyrics. The 1,128
songs included in the corpus span the years 1952–2008 and 1946–2005, totalling
171,968 and 170,234 words, respectively (Werner 2012: 23). Werner’s findings also
confirm earlier claims about the informal and conversational nature of pop song
lyrics. However, he argues convincingly that subsuming pop song lyrics under the
conversational register would go too far. Rather, the low frequencies of typical
spoken features such as interjections or non-standard morphosyntactic elements
call for a more careful analysis: “the picture of pop-song lyrics as exemplars of
spoken/informal register […] had to be […] altered to be thought of as a ‘special’
register” (Werner 2012: 43).
The present paper aims to contribute further to our understanding of
pop song lyrics from a register perspective by exploring hip-hop as a potential
sub-register. A question that comes to mind is whether pop songs can be regarded
as one single monolithic register or whether it makes sense to assume more spe-
cific registers covered by the umbrella term ‘pop songs’. Biber and Conrad (2009:
10) claim that “[t]here is no one correct level on which to identify a register” and
“that registers can be studied on many different levels of specificity”.
The present paper aims at providing a first corpus-based analysis of one of
the most successful (musical) genres among pop songs, namely hip-hop. The
label ‘genre’ is also to be understood in its linguistic sense at this point, since
we cannot yet be sure that hip-hop constitutes a register. Based on data from an
updated pilot version of the Giessen-Bonn corpus of Popular music – GBoP (cf.
Kreyer and Mukherjee 2007), the paper explores Biber and Conrad’s (2009: 50)
three criteria for register analysis (situational characteristics, linguistic charac-
teristics and function; cf. Schubert, this volume) and shows that with regard to all
of these, hip-hop must be regarded as a register in its own right.
2 The data
The data for the present study is taken from an extended pilot version of GBoP. It
contains lyrics from the top albums from the US album charts of the years 2003
and 2011¹. More specifically, for 2003, 48 of the top 52 albums were included. Four
albums had to be ignored because they either did not contain any lyrics at all
or only contained non-English lyrics. The 2003 lyrics were taken from internet
lyric archives or from CD booklets (cf. Kreyer and Mukherjee 2007 for details). The
2003 material has been supplemented by the (English) lyrics of the top 50 albums
from 2011. These lyrics were primarily taken from A-Z lyrics (www.azlyrics.com).
This site is particularly suitable, since the lyrics it provides are usually reviewed
by a number of different users, resulting in a fairly ‘reliable’ version of the texts.
In some cases, other archives like metrolyrics (www.metrolyrics.com) or lyrics-
freak (www.lyricsfreak.com) had to be consulted.
From this compilation of albums, a subcorpus was compiled of albums that
would usually be considered as representative of hip-hop. Of course, the decision
whether to include an album or not is not an easy one. The criterion applied was
whether the featured artist was primarily considered a rapper/hip-hopper (infor-
mation taken from www.discogs.com). Nelly, for instance, is primarily regarded
as a rapper, which is why his album Nellyville was included in the corpus, even
though it contains tracks that might rather be considered R&B. Stripped by Chris-
tina Aguilera, by contrast, was not included, since the performer is not primarily
regarded as a rapper or hip-hopper, although some of the songs in her album
would fall under that category. Compilation albums were excluded if they fea-
tured more than one artist. All in all, the hip-hop corpus contains the lyrics from
18 albums; 9 from 2003 and 9 from 2011. Table 1 shows the composition of the
corpus.
1 My first explorations of the development of pop music registers started in 2012 when the data
from 2011 was the most recent data available.
Table 1: The corpus analysed in the present study.
Album                                      # words
2Pac – Better Dayz                          20,349
50 Cent – Get Rich or Die Tryin’            13,711
Chingy – Jackpot                            10,475
Eminem – The Eminem Show                    13,049
Ja Rule – The Last Temptation                8,425
Missy Elliott – Under Construction           7,360
Nelly – Nellyville                          13,424
Outkast – Speakerboxxx/The Love Below       15,043
Sean Paul – Dutty Rock                      10,163
Total 2003                                 111,999
Bad Meets Evil – Hell: The Sequel            9,246
Eminem – Recovery                           15,694
Jay Z & Kanye West – Watch the Throne        7,529
Kanye West – My Beautiful Dark …             8,407
Lil’ Wayne – I am not a Human Being          7,218
Lil’ Wayne – Tha Carter IV                  11,520
Nicki Minaj – Pink Friday                    9,492
The Black Eyed Peas – The Beginning          7,750
Wiz Khalifa – Rolling Papers                 7,564
Total 2011                                  84,420
Total 2003 + 2011                          198,387
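The arithmetic of Table 1 can be retraced with a short Python sketch. The word counts are copied from the table in the order printed; the variable and function names are hypothetical. As printed, the two yearly subtotals are reproduced exactly, whereas the per-album figures add up to 196,419 rather than the 198,387 given in the final row, a difference of 1,968 words whose source cannot be determined from the table itself.

```python
# Word counts per album, copied from Table 1 in the order printed.
WORDS_2003 = [20349, 13711, 10475, 13049, 8425, 7360, 13424, 15043, 10163]
WORDS_2011 = [9246, 15694, 7529, 8407, 7218, 11520, 9492, 7750, 7564]

# Totals as printed in Table 1.
PRINTED_TOTALS = {"2003": 111_999, "2011": 84_420, "2003 + 2011": 198_387}


def computed_totals() -> dict:
    """Recompute the three totals from the per-album counts."""
    total_2003 = sum(WORDS_2003)
    total_2011 = sum(WORDS_2011)
    return {
        "2003": total_2003,
        "2011": total_2011,
        "2003 + 2011": total_2003 + total_2011,
    }


for label, value in computed_totals().items():
    printed = PRINTED_TOTALS[label]
    status = "matches" if value == printed else f"differs by {printed - value:,}"
    print(f"{label}: computed {value:,}, printed {printed:,} ({status})")
```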
Since “[t]he analysis of register characteristics […] will generally focus on the
comparison of two or more registers” (Biber and Conrad 2009: 36), the hip-hop
data will be contrasted with the data from the remaining albums, in the follow-
ing referred to as ‘non-hip-hop corpus’ or ‘control corpus’ (cf. Appendix 1 for its
composition). Although the number of albums in this control corpus is almost
four times as large, the number of words is comparatively small, namely slightly
below 350,000.
In all the texts, the original punctuation and spelling deviations were
retained. This is particularly important for hip-hop, as spelling conventions are
an important means of creating identity (cf. Morgan 2001, 2002 and Olivio 2001).
Metatextual comments like verse, chorus or bridge or the identity of the singer in
duets, for example, were removed from the text. Choruses were spelt out any time
they appeared in the text, i.e. a comment like Chorus [2x] was replaced by a repe-
tition of the lines of the chorus. In those cases where it was not clear from the text
layout which words are still part of the chorus and which are part of the verse, an
audio version of the song was consulted. Other kinds of repetition were spelt out
if they contained words, e.g. a line like She (When she loves) [3x] was represented
three times in the corpus (without the [3x], of course). However, if repetitions con-
sisted of non-lexical material only, they were not made explicit, e.g. Oooooh oooh
ooohohhh [x2]. All texts were stored in .txt format. An example of a text is given
in (1) below (note that <Z>, from German Zeilenumbruch, stands for line break).
(1) G-Unit (What) <Z> We in here (What) <Z> We can get the drama popping <Z> We don’t
care (What, what, what) <Z> It’s going down (What) <Z> ’Cause I’m around (What) <Z>
50 Cent, you know how I gets down (Down) <Z> What up, Blood? (What) <Z> What
up, Cuz? (What) <Z> What up, Blood? (What) <Z> What up, Gangstaaa? <Z> What up,
Blood? (What) <Z> What up, Cuz? (What) <Z> What up, Blood? (What) <Z> What up,
Gangstaaa? <Z>
(50 Cent – What Up Gangsta?)
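The treatment of repetition markers described above can be illustrated with a minimal sketch. It is not the procedure actually used for the corpus; the function names, the regular expressions and the test for ‘non-lexical material’ (here simply: only the letters o and h) are assumptions made for the purpose of illustration.

```python
import re

# Matches a trailing repetition marker such as "[3x]" or "[x2]".
REPEAT_MARKER = re.compile(r"\s*\[(?:(\d+)x|x(\d+))\]\s*$", re.IGNORECASE)
# Hypothetical heuristic for non-lexical material: only o/h letters and spaces.
NON_LEXICAL = re.compile(r"^[oh\s]+$", re.IGNORECASE)


def expand_repetitions(line: str) -> list[str]:
    """Spell out a line as often as its repetition marker demands."""
    match = REPEAT_MARKER.search(line)
    if match is None:
        return [line]
    count = int(match.group(1) or match.group(2))
    text = line[: match.start()]
    if NON_LEXICAL.match(text):
        # Non-lexical repetitions are not made explicit: keep the line as it is.
        return [line]
    return [text] * count


def to_corpus_text(lines: list[str]) -> str:
    """Join the expanded lines with the <Z> line-break marker shown in (1)."""
    expanded = [out for line in lines for out in expand_repetitions(line)]
    return " <Z> ".join(expanded) + " <Z>"


print(expand_repetitions("She (When she loves) [3x]"))
print(to_corpus_text(["It's going down (What)", "Hey now [2x]"]))
```

In practice, cases such as unclear chorus boundaries would still require manual checking against the audio version, as described above.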
All analyses of the corpus material were conducted by using AntConc 3.2.4
(Anthony 2011) and Wmatrix (Rayson 2003, 2009).
3 Hip-hop – a register in its own right?

Following the definition of ‘register’ provided in Biber and Conrad (2009; cf.
Schubert, this volume), hip-hop can be regarded as a register in its own right if we
can specify a particular situation of use, a particular set of linguistic features and
a particular function of these features vis-à-vis the situation of use. This section
will discuss the first two of these three aspects. By way of conclusion, possible
functions will be explored.
3.1 Situation of use
In many respects, hip-hop and pop songs in general share situational features.
For instance, in both cases the channel is identical: the primary mode is (sung)
speech and the speech event is captured on a permanent medium (apart from a
live concert, of course). Similarly, the settings are identical, e.g. different times
and places of communication for the participants. Features of addresser and
addressee can be regarded as similar as well, at least on a general level. Pro-
duction circumstances might be described as ‘revised and edited’ in both cases,
although spontaneous rapping plays an extremely important role in hip-hop
culture (e.g. during battlin’ or cypha, i.e. rap competitions).
Alongside these similarities, there are two aspects in which hip-hop and other
pop songs diverge, namely topic and relations among participants. To explore
topic-related differences, the corpus-analysis tool Wmatrix
(Rayson 2003, 2009) was used. Wmatrix provides web access to the UCREL
Semantic Analysis System (USAS), which automatically assigns semantic catego-
ries to all of the lexical items in a given corpus. On the whole, the semantic tagger
employs 21 broad semantic categories, which are shown in Figure 1.
A  general and abstract terms
B  the body and the individual
C  arts and crafts
E  emotion
F  food and farming
G  government and public
H  architecture, housing and the home
I  money and commerce in industry
K  entertainment, sports and games
L  life and living things
M  movement, location, travel and transport
N  numbers and measurement
O  substances, materials, objects and equipment
P  education
Q  language and communication
S  social actions, states and processes
T  time
W  world and environment
X  psychological actions, states and processes
Y  science and technology
Z  names and grammar
Figure 1: The semantic categories of USAS (Archer et al. 2002: 2).
On the highest level of specificity a total of 232 category labels is provided. The
category E ‘Emotion’, for instance, contains six subcategories, one of these being
subdivided into two further sub-classes. Figure 2 shows the structure of the cat-
egory ‘Emotion’:
Category     Subcategory I                 Subcategory II          Example
E: Emotion   E1: General                                           emotion, hysterical
             E2: Liking                                            adore, beloved
             E3: Calm/Violent/Angry                                gentle, infuriated
             E4: Happy/Sad                 E4.1: …: Happy          amused, cheerful
                                           E4.2: …: Contentment    dismay, humour
             E5: Fear/Bravery/Shock                                amazed, dread
             E6: Worry/Concern/Confident                           anxious, edgy
Figure 2: The semantic category ‘Emotion’ in USAS (Archer et al. 2002: 10–11).
An example of the semantic tagging can be seen in (2), which shows a few words
from Tupac Shakur’s Still Ballin’.
(2) 0000002 510 VV0 Blame Q2.2/G2.2- G2.1
    0000002 520 PPH1 it Z8
    0000002 530 II on Z5
    0000002 540 APPGE my Z8
    0000002 550 NN1 mama S4f
The verb blame is tagged as a ‘speech act term’ (Q2.2) and, alternatively, as either
‘general ethics’ (G2.2) or ‘Crime, law and order: Law & order’ (G2.1). The minus
sign following G2.2 indicates the lack of ethics. Note that the tags are not given
in alphanumerical order; their sequence depends on the likelihood that USAS
assigns to each tag. The following three words, it, on, and my are either tagged as
‘pronoun’ (Z8) or ‘grammatical bin’ (Z5). The tag ‘S4f’ for mama tells us that we
are dealing with a kinship term, more specifically, female kin.
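To make the layout of the tagged output in (2) more concrete, the following sketch reads one such line into its fields. The field names (sentence_id, position and so on) are assumptions inferred from the layout of the example, not names taken from the Wmatrix documentation.

```python
from typing import NamedTuple


class TaggedToken(NamedTuple):
    sentence_id: str          # assumed meaning of the first column
    position: int             # assumed meaning of the second column
    pos_tag: str              # part-of-speech tag, e.g. NN1
    word: str
    semantic_tags: list[str]  # candidate tags, most likely one first


def parse_tagged_line(line: str) -> TaggedToken:
    """Split a whitespace-separated line of tagger output into its fields."""
    parts = line.split()
    if len(parts) < 5:
        raise ValueError(f"unexpected line format: {line!r}")
    return TaggedToken(parts[0], int(parts[1]), parts[2], parts[3], parts[4:])


def top_level_category(token: TaggedToken) -> str:
    """Return the letter of the broad category of the most likely tag."""
    return token.semantic_tags[0][0]


token = parse_tagged_line("0000002 510 VV0 Blame Q2.2/G2.2- G2.1")
print(token.word, token.semantic_tags, top_level_category(token))
```

Counting the values returned by top_level_category over a whole corpus would yield frequencies for the 21 broad categories of Figure 1.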
Like all automatic annotation, semantic annotation is not fully accurate. In
particular, hip-hop, with its idiosyncratic spelling and use of words, can lead to
problems. For instance, the frequencies of individual semantic categories showed
‘Food and Farming’ (category F) to be a topic of particular relevance for rappers
and hip-hoppers – a somewhat counter-intuitive finding. A closer look at the data
quickly revealed that this was due to the ambiguity of the string hoe, which denotes a
farming tool but is also used as a slang term in the sense of ‘promiscuous woman’.
Another problem became apparent with the tag G1.2, ‘Politics’: the Patois per-
sonal pronoun form dem, which is highly frequent in the lyrics by Sean Paul, was
obviously understood as an abbreviation for democrat or related words. Similarly,
the form dat (that), presumably misinterpreted as the acronym for digital audio
tape, led to a very high frequency of the semantic category K3, ‘Recorded Sound’,
which as a consequence has also been ignored.
Such problematic cases aside, semantic annotation can give us an idea about
topics that are comparatively frequent or rare in hip-hop as opposed to other
pop songs. To this end, all semantic categories that showed relative frequencies
higher than 0.02 % in the hip-hop corpus were checked against the respective
categories in the control corpus, i.e. the non-hip-hop corpus. Table 2 provides
an overview of some semantic categories that seem especially suited to paint a
particular picture of the artists.
Table 2: A sample of semantic categories that are particularly frequent in the hip-hop corpus.
Semantic category                         Rel. freq. in   Rel. freq. in       r1/r2
                                          hip-hop (r1)    non-hip-hop (r2)
F3, ‘Cigarettes and Drugs’                0.1 %           0.02 %              5
G2.1, ‘Crime, Law and Order’              0.16 %          0.05 %              3.2
G3, ‘Warfare, Defence, Weapons, Army’     0.34 %          0.12 %              2.83
I1, ‘Money: Generally’                    0.27 %          0.07 %              3.86
I1.1+, ‘Money: Affluence’                 0.03 %          0.01 %              3
I2.1, ‘Business: Generally’               0.04 %          0.01 %              4
An example of a semantic category that is overrepresented in hip-hop is F3,
‘Cigarettes and Drugs’. While the hip-hop corpus contains 193 tokens (0.1 %) that
are assigned to that category, other pop songs only show 50 cases in 293,410 words
(0.02 %); i.e. in hip-hop there are five times as many words relating to cigarettes
and drugs as in other pop songs. An arguably related category is G2.1, ‘Crime,
Law and Order’, whose relative frequency in the hip-hop corpus is 3.2 times that
of the control corpus, namely 0.16 % as opposed to 0.05 %. Another compara-
tively frequent hip-hop category is G3, ‘Warfare, Defence, Weapons, Army’, which
is over 2.8 times more frequent in hip-hop than in other pop songs, namely 0.34 %
as opposed to 0.12 %. In addition to topics related to crime, drugs and weapons,
questions of wealth and money seem to play an important role in hip-hop: the
categories ‘Money: Generally’ (I1), ‘Money: Affluence’ (I1.1+) and ‘Business: Gen-
erally’ (I2.1) all are at least three times more frequently attested here than in the
control corpus.
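The figures in Table 2 rest on two simple calculations, which can be sketched as follows. The function names are chosen for illustration only; the hip-hop corpus size is taken from Table 1 and the size of the control corpus from the running text.

```python
def relative_frequency(tokens: int, corpus_size: int) -> float:
    """Relative frequency of a category in per cent of all words."""
    return 100.0 * tokens / corpus_size


def frequency_ratio(r1: float, r2: float) -> float:
    """Ratio of two relative frequencies, rounded as in Table 2."""
    return round(r1 / r2, 2)


# F3, 'Cigarettes and Drugs': raw token counts as reported in the text.
print(round(relative_frequency(193, 198_387), 2))  # hip-hop corpus
print(round(relative_frequency(50, 293_410), 2))   # control corpus

# Ratios computed from the rounded percentages given in Table 2.
print(frequency_ratio(0.34, 0.12))  # G3
print(frequency_ratio(0.27, 0.07))  # I1
```

Note that the ratios in Table 2 are based on the rounded percentages, so that ratios computed from the raw counts would differ slightly.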
It can be argued that the overrepresentation of the above categories serves to
paint a particular picture of the hip-hop artist as an independent, successful and
rich person that is involved in (gun) fights and crime. This image that emerges
from the semantic categories is in line with analyses from rap and hip-hop videos.
Jones (1997: 353), for instance, claims that rap music shows a large amount of
“socially questionable behaviors [… like] guntalk, drugtalk, the presence of
alcohol, bleeping of profanity, and gambling” (cf. also DuRant et al. 1997;
Smith and Boysen 2002; Kreyer 2015). On the whole, it could be argued
that the topics explored in hip-hop promote a ‘bad boy’ image of the artist.
In addition to topic-related contrasts between pop songs and hip-hop, another
major difference seems to lie in the relations among the participants, which, in
turn, has a bearing on the communicative purpose of hip-hop as opposed to
other pop songs. In Biber and Conrad’s (2009) approach, relations among participants
are described along four dimensions, namely interactiveness, social roles, personal
relationship, and shared knowledge. With regard to this variable, hip-hop seems to
have a special status. Spady et al. (1999: 67) provide the following quote from the
rapper Method Man: “The streets is where you get you
stripes at”. This hints at the important role of street credibility, i.e. a hip-hopper’s
being close to his or her cultural backgrounds in ‘the streets’. Alim (2006: 113)
writes: “Hip-hop Culture not only began in the streets of Black America, but the
streets continue to be a driving force in contemporary Hip-hop Culture.” Although
successful hip-hop artists, like any other kind of successful pop singer, mostly
interact with a displaced audience, “[t]he members of the Black American Street
Culture, to whom the artists are directing their lyrics, are not physically present,
yet they are in conversation” (Alim 2006: 123). This hints at a relatively high level
of (maybe abstract) interactiveness that might not be typical of other pop songs.
Similarly, the artists’ focus on street identity and group solidarity seems to have
important consequences for the other three dimensions of participant relations:
artists assume a relation with the members of their audience that can be char-
acterised by relative similarity of status, a huge amount of shared knowledge
(which has been gained on the streets) and a personal relationship that would be
described as friends or brothas and sistas, rather than that of star and fan as in
many other pop music genres. This special relation of artist and audience leads
to an additional communicative purpose, namely that of “staying street”, i.e. of
staying connected to the streets and to their cultural background. Hip-hoppers
use their art to “represent ‘the streets’” but at the same time “to connect with
the streets as a space of culture, creativity, cognition, and consciousness” (Alim
2006: 124). A particularly impressive example of this is provided by Ja Rule’s
Connected from the album The Last Temptation.
(3) We world wide connected, and ya’ll don’t want to fuck with us
In the streets we respected, so ya’ll don’t want to fuck wit us
World wide connected nigga, ya’ll don’t want to fuck wit us
We gangster ass niggas and we hard to hit
Murder Inc in the role who could fuck wit this
On the whole, then, the differences in situational characteristics between hip-hop
and other pop songs warrant the status of hip-hop as a register in its own right.
3.2 Linguistic features
This section discusses orthographical, lexical and grammatical phenomena as
possible register features/markers.
3.2.1 Orthographic features – -er/-a and -s/-z
Non-standard spelling is a common feature in written hip-hop culture, which
according to Beers Fägersten (2008: 227) “permeate[s] nearly all word types”. Of
the 10 most frequent words in her corpus, all of them grammatical, of course,
seven have non-standard alternatives, including the pairs the/da, you/u and that/
dat. In addition, she finds final orthographic -a as a substitution for both mor-
phemic and non-morphemic -er, as in rappa, younga and holla, neva, respectively.
Whereas this example of idiosyncratic spelling represents non-standard phonol-
ogy, the frequently occurring word-final -z is usually used as a spelling variant
that represents standard phonology more precisely than the standard spelling -s.
In his study on spelling conventions in rap music, Olivio (2001: 73) distin-
guishes between two types of non-standard orthography, namely spelling var-
iants that represent “distinctive features of AAVE [African American Vernacu-
lar English] phonology and syntax” and those that do not. He argues “that the
meaning of the non-standard orthographic choices depends on its contrast with
standard forms” (Olivio 2001: 73). This hints at a conscious decision on the part
of the writer to use non-standard orthography. After all, writers seem to be aware
of their deviation from the standard, as Olivio argues convincingly. In his corpus,
an instance like fo’ shows the awareness of the final consonant that we find in the
standard variant for. Likewise, the fact that bombers occurs as bombas in his data
shows an awareness of the standard silent b in the middle of this word. In line
with what was discussed above, Olivio (2001: 72) interprets these choices as:
another way of addressing the particular audience […]. In other words, rap artists construct
themselves as ‘authentic’ through the use of language […,] through the use of locally signif-
icant images, sounds, and written texts.
He, too, reports on the ‘r-lessness’ of AAVE, as in the two examples above or in
cases like gangsta, rida, murda etc. In some cases, stressing the AAVE-pronun-
ciation leads to a decisive shift in meaning, as the late Tupac Shakur points out
regarding nigga: “Niggers was the ones on the rope, hanging off the thing; Niggas
is the ones with gold ropes, hanging out at clubs” (Lazin 2003). In the following
we will take a look at two idiosyncratic spelling features, namely orthographic -a
instead of -er and word-final -z as a plural marker. Table 3 shows the frequency of
these two non-standard spelling variants in the hip-hop corpus and the
non-hip-hop control corpus.²
Table 3: ‘r-less’ forms in the hip-hop corpus and the non-hip-hop control corpus.
(two entries per row; columns for each entry: Token, Hip-hop -a, Hip-hop -er, Non-hip-hop -a)
anotha 11 83 0 mutha 1 0 0
balla 0 10 1 Muthafucka 1 0 0
betta 12 43 0 muthafucka 7 0 0
bigga 1 16 0 muthafuka 1 0 0
brotha 1 16 3 neitha 1 7 0
Crossova 1 0 0 neva 13 463 0
deala 1 6 0 nigga 613 0 12
docka 1 0 0 Numba 1 49 0
Exploda 1 0 0 otha 2 101 0
figga 1 19 0 Ova 5 188 0
fucka 1 2 0 playa 18 26 3
gangsta 41 6 4 Rida 3 4 0
2 The frequencies shown here are not entirely unproblematic because the texts were primarily
taken from lyrics archives (i.e. are most likely transcribed by fans) and not from official booklets.
To some extent, then, the numbers represent the audience rather than the artists themselves.
However, they still provide us with an idea of the use of non-standard spelling within the hip-
hop community, of which the artists want and claim to be a part.
Table 3 (continued)
(two entries per row; columns for each entry: Token, Hip-hop -a, Hip-hop -er, Non-hip-hop -a)
Gangstaa 3 0 0 rocka 2 2 0
Harda 1 12 0 stoppa 1 0 0
hotta 1 13 0 stunna 1 1 0
killa 2 15 0 sucka 1 6 0
lova 0 3 12 Sucka 2 0 0
mobsta 1 1 0 supa 1 26 0
motha 1 33 16 swagga 2 6 0
mothafucka 7 51 0 trigga 3 6 0
Motherfucka 1 7 0 wanksta 8 0 0
muhfucka 1 0 0 whateva 4 37 0
Murda 4 112 0
Table 3 provides the frequencies of ‘r-less’ forms in hip-hop and non-hip-hop
songs (columns ‘Hip-hop -a’ and ‘Non-hip-hop -a’, respectively). In addition, it
gives the frequencies with which regularly spelt forms occur in the hip-hop texts
(‘Hip-hop -er’). In the corpus we find a total of 45 ‘r-less’ types, 43 of which are
attested in the hip-hop corpus. The control corpus, by contrast, only shows seven
types of this particular kind of idiosyncratic spelling. With regard to type fre-
quency, we see clearly that this spelling phenomenon is a feature highly typical
of hip-hop. It is not surprising that this huge difference in type frequency results
in a huge difference in token frequency, namely 785 in hip-hop texts as opposed
to 51 in non-hip-hop texts. It is interesting to note, though, that the number of
regularly spelt forms is usually higher than that of non-standard forms, even in
hip-hop (see below for an explanation), notable exceptions being nigga, gangsta,
muthafucka and wanksta, whose spelling is predominantly non-standard. Still,
‘r-less’ forms are a pervasive feature in hip-hop, much more so than in other pop
songs: although the number of 51 tokens is fairly substantial, 28 of these occur
in merely two pop songs, namely the 12 instances of lova and the 16 instances
of motha. The former are all found in the song Eenie Meenie by Sean Kingston
featuring Justin Bieber and all instances of motha occur in Girls by Beyoncé. Inter-
estingly, in both cases these unconventional forms are part of the chorus in other-
wise rather conventionally spelt songs.
(4) Shawty is a eenie meenie miney mo lova (Eenie Meenie)

(5) Who run this motha? Girls! (Girls!)
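One conceivable way of retrieving candidates for such spelling variants automatically is sketched below. This is an assumption for illustration, not the retrieval procedure used in the present study: the hypothetical function variant_candidates only counts a token if its regularly spelt counterpart is attested in the same token list, so that forms without an attested counterpart would be missed and the output would still need manual checking.

```python
from collections import Counter


def variant_candidates(tokens: list[str], nonstandard: str, standard: str) -> Counter:
    """Count tokens ending in `nonstandard` whose `standard` twin is attested."""
    lowered = [token.lower() for token in tokens]
    vocabulary = set(lowered)
    counts: Counter = Counter()
    for token in lowered:
        if token.endswith(nonstandard):
            counterpart = token[: -len(nonstandard)] + standard
            if counterpart in vocabulary:
                counts[token] += 1
    return counts


sample = ["Gangsta", "gangster", "neva", "never", "neva", "mama", "boyz", "boys"]
print(variant_candidates(sample, "a", "er"))  # 'r-less' candidates
print(variant_candidates(sample, "z", "s"))   # word-final -z candidates
```

Type frequency then corresponds to the number of keys in the returned counter and token frequency to the sum of its values.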
Table 4: Word-final orthographic -z as plural marker in hip-hop and non-hip-hop pop songs.
Token Hip-hop -z Hip-hop -s Non-hip-hop -z
Boyz 21 41 0
Dredz 1 0 0
gangstaz 0 8 1
Gunnerz 1 0 0
Gunz 2 0 0
Hoez 1 148 0
Killaz 1 6 0
Niggaz 178 380 0
Outlawz 6 0 0
Ridaz 8 5 0
Word-final orthographic -z is considerably less frequent both as far as types and
tokens are concerned. In the data we find ten different types all in all
(‘hypercorrect’ tokens like beatz or nutz, in which the voiced sibilant is not the correct plural
allophone, were excluded), nine of which are only attested in the hip-hop corpus,
totalling 219 tokens. The single type that occurs in the control corpus is gangstaz
with the frequency of 1. Interestingly, this one occurrence appears in the song
That’s how you like it by Beyoncé, featuring the rapper Jay-Z, who uses this form
in the line shown below:
(6) I know you’ve heard I’m a gangsta

They say “Stay away from them gangstaz”
They never change up, or pull they pants up (Beyoncé: That’s how you like it)
A comparison of the use of non-standard and standard variants (both in the
case of ‘r-less’ forms and orthographic -z) quickly reveals that in most cases the
standard still is the preferred version of spelling even in the hip-hop corpus. This
finding hints at a twofold function of spelling in hip-hop lyrics, as Olivio (2001:
72) points out:
[…] the use of non-standard orthographic choices may be another way of addressing the
particular audience, while these forms appear alongside standard orthographic forms
which are available to be consumed by a more general audience. In other words, rap artists
construct themselves as ‘authentic’ through the use of language and accounts of the social
and economic realities in late-capitalist society, and the effects of this reality on the lives of
rap artists and their communities; but they also construct an ‘authentic’ audience through
the use of locally significant images, sounds, and written texts.³
The only consistent use of non-standard spelling in the present corpus is shown
in the texts by the Jamaican rapper Sean Paul. His texts seem to be primarily
addressed at a specific audience consisting of speakers of Patois. Consider the
example below:
(7) So how can they waan big up dem chest

But they dun know Dutty Cup we deh ya rated as di best
A wouldn’t they love diss this is Sean-A-Paul this
We nuh cater fi nuh guy and only girls we a request (Sean Paul: Like Glue)
The generally mixed occurrence of standard and non-standard spelling in hip-hop
notwithstanding, the data presented above show that the word-final -a instead
of -er as well as -z as plural marker can be regarded as a register feature of written
hip-hop lyrics (of course, partly influenced by attempts to mirror pronunciation
while recording or in the actual performance).
3.2.2 Lexical aspects
Other possible register features or even register markers can, of course, be found
in the lexis of hip-hop, particularly taboo expressions. Beers Fägersten (2008)
reports on the frequency of taboo terms as a feature of hip-hop. In her analysis
of a 100,000-word corpus of postings on a hip-hop message board she found
that the frequency of “swear words, profanity or taboo terms” such as shit, fuck,
ass, nigga and bitch “suggests that such linguistic behaviour is in fact character-
istic of the hip-hop community” (Beers Fägersten 2008: 223–224). These taboo
words “serve to discursively represent the hip-hop individual, and subsequently
the community as well, by virtue of their recognisability as taboo words” (Beers
Fägersten 2006: 29).
With some uses of these taboo words we see what Morgan (2002: 121) refers
to as inversion, where “an AAE [African American English] word means the opposite
of at least one definition of the word in dominant culture”. The word shit,
for instance, “can refer to almost anything – positions, events, etc.” (Smitherman
2000: 257). The shit is “a person who is the ultimate; most powerful; above
all others; top dog” (Smitherman 2000: 257). Another example is the form nigga,
where the idiosyncratic spelling signals a decisive shift in meaning, as discussed
above.
3 Of course, orthographic choices play a comparatively minor role since the main way of
addressing the audience is through the auditory channel.
Table 5 shows the 30 words with the highest keyness values (according to AntConc) in the
hip-hop corpus when compared to the non-hip-hop control corpus.
Table 5: The top 30 key word forms in hip-hop when compared to the non-hip-hop corpus.
(columns: Rank; Token; Freq. hip-hop; Rel. freq. hip-hop (%); Freq. non-hip-hop; Rel. freq. non-hip-hop (%); Keyness of token in hip-hop)
1 nigga 606 0.31 12 0.00 1006.88
2 shit 626 0.32 43 0.01 874.36
3 fuck 504 0.26 45 0.02 660.20
4 niggas 380 0.19 10 0.00 615.11
5 bitch 432 0.22 35 0.01 580.43
6 dem 232 0.12 7 0.00 370.02
7 ass 274 0.14 27 0.01 349.05
8 wit 207 0.11 4 0.00 344.62
9 ya 618 0.32 260 0.09 333.13
10 niggaz 177 0.09 0 0.00 325.09
11 Zoop 142 0.07 0 0.00 260.81
12 yo 253 0.13 52 0.02 239.10
13 hoes 148 0.08 5 0.00 232.88
14 gon’ 163 0.08 13 0.00 219.86
15 bitches 150 0.08 9 0.00 215.50
16 fucking 161 0.08 14 0.00 212.40
17 gettin’ 167 0.09 21 0.01 196.50
18 em 170 0.09 25 0.01 188.35
19 murder 112 0.06 2 0.00 187.61
20 they 925 0.47 705 0.24 187.39

21 get 1036 0.53 834 0.28 182.07
22 ai 867 0.44 655 0.22 179.48
23 di 124 0.06 8 0.00 175.54
24 y’all 183 0.09 39 0.01 169.49
25 yuh 90 0.05 0 0.00 165.30
26 u 145 0.07 20 0.01 164.82
27 pussy 109 0.06 5 0.00 164.25
28 them 431 0.22 238 0.08 163.16
29 money 287 0.15 115 0.04 163.03
30 up 643 0.33 454 0.15 155.53
The frequent use of taboo words and profanity that is reported in Beers Fägersten
(2006 and 2008) can also be observed in the present corpus, the top five key-
words being nigga, shit, fuck, niggas and bitch. Inflectionally related forms occur
at rank 10 (niggaz), at rank 15 (bitches) and rank 16 (fucking). In addition, we see
a strong preference for terms with strong sexual connotations, such as ass, hoes
and pussy. Some of the items in the above list might even be considered register markers. The
forms niggaz, Zoop, and yuh do not occur at all in the control corpus. The form
Zoop, however, cannot be regarded as indicative of the register, as it lacks the
pervasiveness necessary for register features/markers: it only occurs in one song,
CG by Nelly.
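The keyness values in Table 5 were produced by AntConc; the statistic behind them is not specified above. Assuming that it is the log-likelihood measure commonly used for keyword analysis, the calculation can be sketched as follows. The corpus sizes are those given in Table 1 and in the running text; since the tool may count tokens differently, the results only approximate the values in Table 5.

```python
import math


def log_likelihood(freq_target: int, size_target: int,
                   freq_ref: int, size_ref: int) -> float:
    """Log-likelihood keyness of a word in a target vs. a reference corpus."""
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected frequencies if the word were spread evenly over both corpora.
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    result = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero frequency contributes nothing to the sum
            result += observed * math.log(observed / expected)
    return 2.0 * result


HIP_HOP_SIZE = 198_387   # Table 1
CONTROL_SIZE = 293_410   # size of the control corpus as given in the text

print(round(log_likelihood(606, HIP_HOP_SIZE, 12, CONTROL_SIZE), 2))  # nigga
print(round(log_likelihood(177, HIP_HOP_SIZE, 0, CONTROL_SIZE), 2))   # niggaz
```

A word that is absent from the control corpus, such as niggaz, still receives a finite value, because only the observed hip-hop frequency enters the sum.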
3.2.3 Grammatical features – copula absence
Anyone who has ever listened to hip-hop and has seen hip-hop videos is well
aware of the fact that it is an art form which is dominated by African Americans,
at least in the US. Are, then, the linguistic features of hip-hop merely a
consequence of the AAVE dialect? If that were the case, one would be hard put to argue
that these linguistic features fulfil a particular function in a particular situation.
An answer to that question is provided by Alim (2009: 117–123) in an analysis
of the absence of the present tense copular forms is and are. He compares the
frequencies of absence from the language of two hip-hoppers, Juvenile and Eve,
in two kinds of texts: an interview and their lyrics. For both artists, Alim (2009:
121–122) finds
an increase in the frequency of absence […] when moving from the interview data to the
lyrical data. […] it is clear that both of these artists display the absent form more frequently
in their lyrical data than in their interview speech data. […] the data suggest that the more
attention the artists pay to their speech (comparing interviews to lyrics) the more ‘non-
standard’ their speech becomes […].
His claim “that Hip-hop artists are indeed in conscious control of their copula
variability” (Alim 2009: 123) suggests that hip-hoppers deliberately make use of
AAVE features to achieve a particular (yet to be identified) effect. It makes sense,
therefore, to regard idiosyncratic linguistic features as exponents of register.
We will now look at patterns where a personal pronoun is either followed or
not followed by a present tense form of BE (in the past the copula is not absent;
cf. Alim 2006: 117) followed by either an NP (with definite or indefinite article) or
an ing-form of a verb, as in the examples below.
(8) PersProN + BEpres/ø + a/an
I am a pitbull off his leash
a nigga that think he a cracker
(9) PersProN + BEpres/ø + the
I am the baddest bitch in the petstore
I the designated driver Chuck never the rider
(10) PersProN + BEpres/ø + …ing/…in’
the world is falling and I am rising
Nigga you fucking with a changed man
Originally, it was planned to conduct an automatic search for the above patterns.
Since Wmatrix provides us with the means to tag corpora, a query for strings of
parts of speech seemed to be the method of choice. However, it was soon found
that the accuracy of the CLAWS tagger suffered from idiosyncratic syntax and
from idiosyncratic spelling conventions, particularly in the hip-hop corpus. As a
consequence, the patterns above were identified on the basis of lexical queries, for
instance ‘I a/an’, ‘I’m a/an’ or ‘I am a/an’ as the possible instantiations of pattern
(8) with the first person singular personal pronoun. The resulting concordances
were post-edited to weed out non-target hits, such as those shown below.
As can be seen in example (11), a query that is only based on lexical infor-
mation will also find tokens that end in -ing although they are not progressive
forms. Example (12) shows a written representation of an extremely reduced
variant of I am going to. The example under (13) shows how problems can arise
because of Patois transcription and grammar: a is not the indefinite article in this
case. Rather, it seems to be an equivalent to an emphatic do in British English.⁴
Example (14) is particularly challenging, since the text alone would allow two
readings, namely as an instance of the pattern we are interested in or as an appos-
itive construction. The only way to resolve the ambiguity was to listen to the track,
which showed that the second reading is the more plausible one.
(11) I’m everything you love (Kid Rock: I’m Wrong But You Ain’t Right)
(12) I’m a call you as soon as I land (Wiz Khalifa: Top Floor)
(13) We nuh cater fi nuh guy and only girls we a request (Sean Paul: Like Glue)
(14) We the people / Are we the people? (Metallica: Some Kind of Monster)
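A lexical query of the kind described above can be approximated with a regular expression. The sketch below covers only pattern (8) with the first person singular pronoun; the expression and the function name classify_copula are assumptions for illustration and do not reproduce the actual AntConc queries. As examples (11) to (14) show, the hits of such a query still have to be post-edited manually.

```python
import re
from typing import Optional

# 'I a/an', 'I'm a/an' (straight or curly apostrophe) or 'I am a/an'.
QUERY = re.compile(r"\bI(?:(?P<contr>['’]m)|(?P<full>\s+am))?\s+an?\b")


def classify_copula(line: str) -> Optional[str]:
    """Classify the first hit in a line as absent, contracted or full."""
    match = QUERY.search(line)
    if match is None:
        return None
    if match.group("contr"):
        return "contracted"
    if match.group("full"):
        return "full"
    return "absent"


print(classify_copula("I am a pitbull off his leash"))
print(classify_copula("I’m a call you as soon as I land"))
print(classify_copula("I a fool"))  # invented line, for illustration only
```

The second line is classified as a contracted form although it is a non-target hit of the type shown in (12), which illustrates why the concordances could not be used without post-editing.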
The results of our analysis concerning the absence or presence of the copula
in the present tense are shown in Tables 6 and 7, which provide a detailed account
of the distribution of the individual variants in hip-hop and non-hip-hop, respec-
tively. More specifically, for each personal pronoun the tables provide the fre-
quency of absent, contracted or full form of copula BE either in front of the indef-
inite article, the definite article or the progressive form (in various realisations) of
a verb. Note that for Table 6 an additional row was inserted to include the idiosyn-
cratic written form ya for you. This row was not needed for Table 7, since the form
ya could not be found in those songs that were not hip-hop.
Table 6: Copula be and copula absence in the hip-hop corpus (‘abs.’, ‘contr.’ and ‘full’ refer to
absent, contracted and full form of the copula, respectively).
Pattern                  Indef. article      Def. article        …ing/…in’/…in       Total
                         abs. contr. full    abs. contr. full    abs. contr. full    abs. contr. full
I am/ø a/the/…ing           0    233    1       1     88   11       0    715   12       1   1036   24
You are/ø a/the/…ing       43     23    1      15     22    3     155     79    1     213    124    5
ya are/ø a/the/…ing         0      0    0       0      0    0      15      0    0       6      0    0
He is/ø a/the/…ing          2      5    0       1      7    0       6     14    0       9     26    0
She is/ø a/the/…ing         4      3    0       4      3    0      20     10    0      28     16    0
It is/ø a/the/…ing          0     76    1       0     29    1       3     28    4       3    133    6
We are/ø a/the/…ing         0      0    0       8      1    0     136     15    2     144     16    2
They are/ø a/the/…ing       0      0    0       0      0    0      61      4    0      61      4    0
Total                                                                                 465   1355   37
4 I am grateful to André Sherriah for his information on Patois.
Table 7: Copula be and copula absence in the non-hip-hop control corpus (‘abs.’, ‘contr.’ and
‘full’ refer to absent, contracted and full form of the copula, respectively).
Pattern                  Indef. article      Def. article        …ing/…in’/…in       Total
                         abs. contr. full    abs. contr. full    abs. contr. full    abs. contr. full
I am/ø a/the/…ing           0    152   14       0     73   20       2    895   16       2   1120   50
You are/ø a/the/…ing        9     60    3       3     98   14      47    290    7      59    448   23
He is/ø a/the/…ing          0     14    9       0      7    0       3     19    1       3     40   10
She is/ø a/the/…ing         1     34    7       0      6    3      18     50    2      19     90   12
It is/ø a/the/…ing          0     83    0       0     50    2       0    118    0       0    251    2
We are/ø a/the/…ing         0      0    0       4     11    1      48     67    2      52     78    3
They are/ø a/the/…ing       0      0    0       0      1    0       4     22    0       4     23    0
Total                                                                                 139   2050  100
A summary of the results shown in the two tables is provided in Figure 3, which
compares the relative frequency of absent copula BE in hip-hop as opposed to
non-hip-hop lyrics.
As can be seen, the data show a very pronounced preference for copula
absence in the hip-hop corpus compared to the non-hip-hop corpus. The largest
proportion of copula absence in non-hip-hop songs is found with the personal
pronoun we. A closer look at the data shows that, to a large extent, this exception
can be explained by the African-American R&B artist R. Kelly. In particular, we
find that a total of 17 tokens are found in one song only, namely Ignition. If we
ignore this particular song, the relative frequency of copula absence in non-hip-
hop already drops to 30 %. All in all, these results suggest that copula absence
is indicative of hip-hop. Future research will have to show to what extent this
particular feature is also pervasive in other possible sub-registers of pop songs,
such as R&B.
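The proportions discussed here follow directly from the ‘Total’ columns of Tables 6 and 7, as the following sketch shows. It assumes that all 17 tokens from Ignition are instances of we with absent copula; the function name is chosen for illustration only.

```python
def absence_rate(absent: int, contracted: int, full: int) -> float:
    """Share of absent copula forms in per cent of all variants."""
    return 100.0 * absent / (absent + contracted + full)


# Overall totals (abs., contr., full) from Tables 6 and 7.
print(round(absence_rate(465, 1355, 37), 1))   # hip-hop corpus
print(round(absence_rate(139, 2050, 100), 1))  # control corpus

# Pronoun 'we' in the control corpus, with and without the 17 tokens
# from Ignition (assumed to be absent-copula tokens throughout).
print(round(absence_rate(52, 78, 3), 1))
print(round(absence_rate(52 - 17, 78, 3), 1))
```

On these figures, copula absence accounts for roughly a quarter of all variants in the hip-hop corpus as opposed to about 6 % in the control corpus.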
[Figure 3 (chart): percentage of absent copula (0–100 %) for the pronouns I, You, Ya, He, She, It, We and They; data series: absent_hip-hop, absent_other]
Figure 3: Copula absence in the hip-hop and the non-hip-hop control corpus.
4 Conclusion: The functional dimension

The concept ‘register’ rests on the assumption that a particular group of texts
exhibits a set of features that are frequent and pervasive within this group, while
at the same time being more or less rare in other groups of texts. In addition,
these features are supposed to fulfil a function vis-à-vis the situation in which the
texts at issue are used. Having explored the linguistic features above, this section
concludes the paper by providing some remarks on the functional dimension of
hip-hop lyrics.
In one (maybe two) word(s), the function of hip-hop lyrics may best be
described by the term street credibility. Already in our discussion of the situation
of use it has become clear that hip-hop artists and their audience partake in a
very special kind of relationship. This can be characterised by a high degree of
(displaced) interactiveness, not between a star and a fan but between brothaz and
sistaz of the same street culture from which hip-hop evolved. One major function
of hip-hop lyrics is to demonstrate the artists’ authenticity and to show that they
are ‘staying street’. All of the features discussed in the preceding sections can be
interpreted along these lines: the major topics as evidenced in the comparatively
high frequency of some semantic domains (‘Cigarettes and Drugs’, ‘Warfare …’,
‘Crime, Law and Order’ and money- or business-related concepts) mirror aspects
of street life in African American neighbourhoods in the US, where hip-hop
evolved. At the same time, idiosyncratic spelling (word-final -a and plural marker
-z), lexical features (the frequent use of taboo expressions and profanity often
with a significant change of meaning) and grammatical characteristics (copula
absence) focus on the common language background of the artist and his or her
audience. So, when “niggas talk a lotta Bad Boy shit”, as the late Tupac Shakur
raps, they portray themselves as representatives of ‘the streets’, while at the same
time connecting back to the streets and the people living there.
Appendix 1: The non-hip-hop corpus
Top 50 Albums – Non-hip-hop 2003
3 Doors Down – Away From The Sun
Aaliyah – I care 4 U
Alan Jackson – Greatest Hits II …
Audioslave – Audioslave
Avril Lavigne – Let Go
Beyoncé – Dangerously In Love
Celine Dion – One Heart
Cher – The Very Best Of Cher
Christina Aguilera – Stripped
Coldplay – A Rush Of Blood To The Head
Dixie Chicks – Home
Elvis Presley – 30 #1 Hits
Evanescence – Fallen
Faith Hill – Cry
Good Charlotte – The Young And …
Hilary Duff – Metamorphosis
Jennifer Lopez – This Is Me … Then
John Mayer – Room For Squares
Justin Timberlake – Justified
Kelly Clarkson – Thankful
Kenny Chesney – No Shoes, …
Kid Rock – Cocky
Linkin Park – Meteora
Luther Vandross – Dance With My Father
Matchbox Twenty – More Than You …
Metallica – St. Anger
R. Kelly – Chocolate Factory
Rascal Flatts – Melt
Rod Stewart – It Had To Be You …
Santana – Shaman
Shania Twain – Up!
Tim McGraw – Tim McGraw And …
Toby Keith – Unleashed

Top 50 Albums – Non-hip-hop 2011
Brad Paisley – This Is Country Music
Adele – 19
Adele – 21
Beyoncé – 4
Bon Jovi – Greatest Hits
Britney Spears – Femme Fatale
Bruno Mars – Doo-Wops And Hooligans
Chris Brown – F.A.M.E.
Coldplay – Mylo Xyloto
Florence and the Machine – Lungs
Foo Fighters – Wasting Light
Glee – The Music; Season 2
Glee – The Music, The Christmas …
Jackie Evancho – Dream With Me
Jackie Evancho – O Holy Night
Jason Aldean – My Kinda Party
Josh Groban – Illuminations
Justin Bieber – My World 2.0
Justin Bieber – My World’s Acoustic
Justin Bieber – Never Say Never …
Katy Perry – Teenage Dream
Keith Urban – Get Closer
Kenny Chesney – Hemingway’s Whiskey
Kid Rock – Born Free
Lady Antebellum – Need You Now
Lady Antebellum – Own the Night
Lady Gaga – Born This Way
Mumford and Sons – Speak Now
P!nk – Greatest Hits … So Far!!!
R. Kelly – Loveletter
Rascal Flatts – Nothing Like This
Rihanna – Loud
Sugarland – The Incredible Machine
Susan Boyle – The Gift
Taylor Swift – Speak Now
The Band Perry – The Band Perry
The Black Keys – Brothers
Tony Bennett – Duets 2
Zac Brown Band – You Get What You Give
Teresa Pham
The register of English crossword puzzles:
Studies in intertextuality
Abstract: Despite their popularity, crossword puzzles have so far been neglected
in text-linguistic publications. Therefore, this paper provides a detailed analysis
of crosswords. As a textual variety related to a specific situation, fulfilling specific
functions and displaying pervasive, frequent linguistic and formal features, this
type of linguistic riddle must be regarded as an independent register according to
the framework by Biber and Conrad (2009). Moreover, a detailed linguistic analysis
establishes non-cryptic and cryptic crosswords as two distinct sub-registers.
For the purpose of exploring the role of intertextuality in those two sub-registers,
a corpus of 270 intertextual non-cryptic and cryptic clue-answer pairs from The
Sun (N.N. 2009) and The Times (Browne 2009) was compiled. A quantitative analysis
of this corpus reveals that intertextual references in cryptic puzzles primarily
target classical mythology, Shakespeare and the Bible, whereas non-cryptic
puzzles additionally require knowledge of Anglo-American popular culture. The
qualitative analysis of the corpus discusses the particular forms and functions of
intertextuality in non-cryptic and cryptic puzzles (Stocker 1998), providing also
an explanation for their use from a cognitive linguistic perspective (Geeraerts &
Cuyckens 2007) as well as a comparison with intertextuality in other registers.
The paper shows that intertextual references and their particular forms and func-
tions may be distinctive features of certain registers. Intertextuality is context-
dependent and used with a particular communicative function and should thus
be incorporated as one possible feature into the linguistic analysis of registers
according to the framework by Biber and Conrad (2009).
Teresa Pham, University of Vechta
1 Introduction
Crossword puzzles (or simply crosswords) are the most popular type of linguistic
puzzle today (cf. Augarde 2003: 57) and hold a permanent place in most British
and American newspapers. Given this prominence in regular, if not everyday
language use, their marginalisation as a register in text linguistic analysis and
the resulting scarcity of relevant linguistic publications are surprising. Most
publications on crosswords belong to the discipline of psychology (e.g. Hambrick,
Salthouse, and Meinz 1999; Nickerson 2011; Underwood, Deihim, and Batt 1994;
Witte and Freund 1995) or examine crosswords from the perspective of didactics
(e.g. Mollica 2007; Weisskirch 2006) or cultural studies (e.g. Cornell and Cornell
1980; Stratmann 1995). Other types of word games have been studied in detail
(e.g. Dienhart 1998; Fix 2011; Pepicello 1980) and crosswords have been analysed
even from a general linguistic (though not specifically register-based) perspec-
tive (e.g. Coffey 1998; Mok 1987). Furthermore, some text linguistic publications
explicitly refer to puzzles or even crosswords as a register (e.g. Heinemann 2000:
610–611; Furthmann 2006: 133; Rolf 1993: 258). Hence, while their status as a dis-
tinct register is largely uncontested, the field of register studies still lacks specific
analyses of crosswords.
Therefore, this paper first provides a register analysis of crosswords follow-
ing the framework by Biber and Conrad (2009; see also Schubert’s introduction
to this volume). It then reports the results of a corpus study on English-language
crosswords, focusing on the role of intertextuality in the constitution of this reg-
ister.
2 Crossword puzzles as a register

The OED (Simpson and Weiner 2015) defines crosswords as “puzzle[s] in which a
pattern of chequered squares has to be filled in from numbered clues”. Accord-
ingly, crosswords are a type of word game in which answers to clues have to be
inserted into a grid of boxes.
2.1 Situational analysis
I. Participants: The clues of crosswords are provided by a setter or compiler, who
usually remains anonymous or works under a pseudonym. Puzzles are addressed
to a plural, yet un-enumerated set of solvers, who, in most cases, work individ-
ually and neither interact with setters nor are in direct, personal contact with
them. Furthermore, there is some disagreement on the social status of setters
and solvers. Since solving crosswords requires thorough general and sometimes
even expert or ‘esoteric’, i.e. uncommon or specialist knowledge, Partridge (1992:
504) draws the following sociolinguistic profile of typical setters and solvers:
“humanistically educated speakers of Standard English, with a reasonably deep
basis of Western culture, a general knowledge of literature, history, geography
and current affairs, familiar with and perhaps active in what have been classed as
middle-class sports”. However, since certain strategies of codification, chunks of
knowledge and even clues are recurrent, crossword experience is also a major pre-
dictor of crossword proficiency (cf. Hambrick, Salthouse, and Meinz 1999: 140).
From the cognitive linguistic perspective, this correlation, like the phenomenon
of agenda-setting (cf. Scheufele and Tewksbury 2007), is due to the fact that fre-
quent activation makes cognitive representations more easily retrievable. There-
fore, others (e.g. Scott and O’Donnell 1998: 237) claim that the knowledge and
skills necessary for crosswords can be acquired by everyone and consequently
regard crosswords as democratic.
II. Production circumstances and channel: With their close interde-
pendence between clues and answers, crosswords result from a careful and
time-consuming process of planning and editing. The reception process may
be equally time-consuming and non-linear. Therefore, the written mode is one
essential characteristic of crosswords – even in the digital age, where puzzles can
be downloaded from websites or generated by computer programmes or applica-
tions on mobile devices.
Furthermore, what can be considered as a marker of crosswords and what is
equally dependent on their appearance in writing is their physical layout on the
page. Answers must be inserted, letter by letter, into a grid available either on
paper or digitally and consisting of white (generally lights; cf. Scott and O’Don-
nell 1998: 219) and black squares (blocks; cf. Moorey 2008: 5). The corresponding
numbering of clues and squares indicates into which light the first grapheme of
the respective answer has to be inserted. Subsequent letters of the answer are
inserted either horizontally or vertically into the grid, depending on whether the
clue was labelled Across or Down. Answers are interdependent by their inter-
secting in so-called crosslights or checked letters (cf. Biddlecombe 2009). Con-
sequently, each correct answer will simplify the search for subsequent intersect-
ing answers to a greater or lesser extent (cf. Nickerson 2011; Goldblum and Frost
1987). The number of letters which are part of only one answer (unchecked letters
or unches) is an indicator of the difficulty of a crossword (cf. Augarde 2003: 63;
Scott and O’Donnell 1998: 219).
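The distinction between checked and unchecked letters described above can be made concrete with a short sketch. The following Python fragment is an illustration only and not part of the original study; the 5×5 grid, the symbols `#` (block) and `.` (light), and all function names are assumptions introduced here. It treats every horizontal or vertical run of at least two lights as an answer slot and counts a square as checked if it belongs to both an Across and a Down slot.

```python
# Hypothetical 5x5 grid: '#' marks a block (black square), '.' a light (white square).
GRID = [
    "....#",
    ".#.#.",
    ".....",
    ".#.#.",
    "#....",
]

def runs(line):
    """Yield (start, length) for each uninterrupted run of lights in a row or column."""
    start = None
    for i, ch in enumerate(line + "#"):  # trailing block closes the last run
        if ch != "#" and start is None:
            start = i
        elif ch == "#" and start is not None:
            yield start, i - start
            start = None

def checked_cells(grid):
    """Return (checked, unchecked) sets of (row, column) positions."""
    rows, cols = len(grid), len(grid[0])
    across, down = set(), set()
    for r, row in enumerate(grid):
        for start, length in runs(row):
            if length >= 2:  # a single isolated light is not an Across answer
                across.update((r, c) for c in range(start, start + length))
    for c in range(cols):
        column = "".join(grid[r][c] for r in range(rows))
        for start, length in runs(column):
            if length >= 2:
                down.update((r, c) for r in range(start, start + length))
    checked = across & down               # squares shared by two answers
    unchecked = (across | down) - checked  # 'unches': part of one answer only
    return checked, unchecked

checked, unchecked = checked_cells(GRID)
print(len(checked), len(unchecked))  # 7 checked and 12 unchecked squares
```

On this assumed grid, nearly two thirds of the lights are unchecked, which, following the reasoning above, would make the puzzle comparatively hard, since most letters receive no help from intersecting answers.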
III. Setting: Setters and solvers do not share the physical context of com-
munication. As already mentioned, crosswords (as well as their solutions) are
usually originally printed in newspapers, i.e. in a public space, but are typically
solved in private. Heinemann (2000: 610–611) therefore assigns them to the
(semi-)official, public domain.
IV. Purposes: Crosswords are devoid of the usual purpose of language use,
which is communication (cf. Schlepper 1981: 63). On the contrary, the primary
purpose of crosswords is to entertain and delight the addressee: they allow
setters and solvers alike to manipulate language irrespective of established rules
and conventions and thus “provide an opportunity of handling at one’s whim
a medium which in other situations very much has a will of its own” (Schlep-
per 1981: 78; cf. Augarde 2003: vii). However, crosswords may also provide social
pleasure when they are solved cooperatively or competitively. Finally, crosswords
may be completed to test or consolidate one’s knowledge, to maintain or to boost
one’s cognitive capacities (e.g. one’s memory capacity or mental flexibility).
Medical research even suggests that such mental exercise reduces the risk of
certain diseases like dementia (cf. Moorey 2008: 3).
In order for crosswords to fulfil these functions, it is essential that they,
despite answers being encoded, are devised to be solvable by the target ‘solver-
ship’. While an unsolvable puzzle causes frustration, the ability to solve a puzzle
is experienced as a success and provides the pleasurable feeling of being part of
the intellectual elite.
2.2 Analysis of linguistic features
2.2.1 General features of crossword puzzles
From a discourse analytic perspective, the basic building blocks of crosswords
are adjacency pairs, each consisting of a clue and an answer. Each clue encodes
its respective answer more or less strongly. A figure in brackets at the end of
the clue usually indicates the number of letters of the answer. For answers to
intersect in the grid, crosswords require a plural, yet variable number of such
clue-answer pairs. The first turn is provided by the setter, whereas the second
turn is provided by the solver. Since crosswords are intended to be solvable, only
one answer is indubitably correct (sometimes also taking into consideration the
number of lights or using the crosslights already filled in the grid). However,
since clue-answer pairs function independently, there are usually no cohesive
ties between them. On the contrary, linguistic means which are usually cohesive
(e.g. articles, personal or demonstrative pronouns) may be employed to encode
answers according to certain conventions. In some crosswords, the personal pro-
nouns he or she, for example, may not function as anaphoric or cataphoric refer-
ences to preceding or subsequent noun phrases, but may point to the fact that the
words man or girl (or their letters) are part of the answer (cf. Skinner 2008: 25). In
rare cases, the adjacency pairs of a puzzle are linked by a common topic, which
may be indicated (more or less directly) by its title. Cohesion between adjacency
pairs may also be established by an explicit “Cross-reference” (Partridge 1992:
501). Thus, in example (1) the clue requires prior identification of the answer to
clue number 11:
(1) Line also transported 11 to shore (9) – LANDWARDS (Browne 2009: 124)
Apart from that, the only links between clues usually are their appearing together
with one uniform layout and the combinatory interdependence of the respective
answers in the grid. If, following Halliday and Hasan (1976: 1), a text is defined
as “a unit of language in use” whose texture arises from inter-sentential cohesive
ties on the surface, crosswords do not normally constitute texts.
Besides cohesion, further standards of textuality according to de Beaugrande
and Dressler (1981) are not or only partially met: clues are thematically inde-
pendent and there is no continuity of or even connection between underlying
concepts (coherence). Furthermore, even if clues need to be new and creative to
be intellectually challenging for solvers, crosswords do not have the function of
transmitting information (informativity). However, the setter’s primary inten-
tion of entertaining solvers is evident (intentionality) and, although most clues
would be unacceptable and irrelevant in usual communicative situations, cross-
word initiates accept these linguistic inconsistencies as being part of this type of
puzzle (acceptability, situationality). Thus, if we define a text as a passage of lan-
guage which “functions as a unity with respect to its environment” (Halliday and
Hasan 1976: 1) and consider cohesion, informativity (cf. Schubert 2012: 23) and
also coherence as frequent, but non-obligatory features of texts, then crosswords
must certainly be regarded as texts.
2.2.2 Features of non-cryptic and cryptic crossword puzzles
There are two basic types of English-language crosswords, generally called
non-cryptic or primitive and cryptic puzzles (cf. Schlepper 1981: 61). In the latter
type, clues are more obscure than in the former and encode the answers more
strongly according to certain conventions (see below). Non-cryptic puzzles, which
have been published since 1913 (cf. Stephenson 2007: 7), are common in most
European and non-European countries. Cryptic crosswords emerged in England
towards the end of the 1930s (cf. Scott and O’Donnell 1998: 211). Today they are
an integral part of British culture and are regularly published (often alongside
non-cryptic puzzles) in most British magazines and newspapers (quality as well
as popular, national as well as regional and local). Cryptic crosswords have even
influenced puzzles outside Great Britain: cryptic clues occur in some American
dailies such as The New York Times (Variety puzzle) and some French newspapers
(e.g. Le Figaro, Le Nouvel Observateur; cf. Mok 1987: 98). Since the 1970s,
Die Zeit, a weekly national German quality paper, has been publishing a type of
crossword puzzle which combines cryptic and straightforward clues (Um die Ecke
gedacht, literally ‘thought around the corner’, roughly ‘thinking outside the
box’). However, the British cryptic crossword
remains unique: “Although traces of the cryptic crossword can be found in
some European countries, it is nowhere developed to anything like the extent it
has now reached in the UK […]. German-language puzzles are those which come
closest to the British model […]. By and large, however, these are all relatively
modest by British standards” (Scott and O’Donnell 1998: 211–213).
A quantitative analysis performed on 20 puzzles (523 clue-answer pairs) from
The Times (cryptic puzzles; Browne 2009), The Guardian (non-cryptic puzzles;
Rusbridger 11.–16.05.2013) and The Sun (two-speed crosswords giving a non-cryp-
tic and a cryptic clue for each answer; N.N. 2009) confirms the basic distinction
between the two types of puzzle:
Table 1: Quantitative analysis of non-cryptic and cryptic puzzles

                                 Non-cryptic puzzles    Cryptic puzzles
Length of clues                  The Sun: 2.1           The Sun: 6.1
(orthographic units delimited    The Guardian: 3.3      The Times: 6.8
by blanks)                       Average: 2.7           Average: 6.5

Length of answers                The Sun: 6.2           The Sun: 6.2
(letters)                        The Guardian: 6.7      The Times: 7.5
                                 Average: 6.4           Average: 6.9
Despite variability within each type, clues and answers are considerably shorter
in non-cryptic than in cryptic puzzles. Furthermore, both turns are morpho-
syntactically more complex in the latter type. Non-cryptic clues are usually very
simple phrases, often consisting of a head only as in (2), sometimes in combina-
tion with a simple pre- or postmodifier (3), whereas the corresponding answers
are mostly single content words or proper names:
(2) Flowery (6) – FLORAL

(3) Mediterranean volcano (4) – ETNA (N.N. 2009: 105, 103)
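The two measures reported in Table 1 (clue length in orthographic units delimited by blanks, answer length in letters) can be illustrated with examples (2) and (3). The Python sketch below is not the procedure used in the study; the function names are hypothetical, and it assumes that the bracketed letter count at the end of a clue is not counted as part of the clue.

```python
from statistics import mean

def clue_length(clue: str) -> int:
    """Number of orthographic units delimited by blanks (assumed: enumeration excluded)."""
    return len(clue.split())

def answer_length(answer: str) -> int:
    """Number of letters in the answer; blanks in multi-word answers are ignored."""
    return sum(ch.isalpha() for ch in answer)

# Clue-answer pairs (2) and (3) from the text.
PAIRS = [("Flowery", "FLORAL"), ("Mediterranean volcano", "ETNA")]

mean_clue = mean(clue_length(clue) for clue, _ in PAIRS)
mean_answer = mean(answer_length(answer) for _, answer in PAIRS)
print(mean_clue, mean_answer)
```

Applied to a whole sample of puzzles, averages of this kind would yield figures comparable to those in Table 1, provided that the same counting conventions are used.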
Cryptic clues, by contrast, resemble block language headlines. When they are
constituted by phrases, these are typically more complex, containing for example
longer prepositional phrases or (finite or nonfinite) clauses as postmodifiers (4).
Cryptic clues may also have an often elliptical clause structure, taking the form
of simple or complex, mainly declarative sentences (cf. Quirk et al. 1985: 40, 803)
as in (5). In addition to single content words and proper names, the answers to
cryptic clues often comprise morphologically complex lexemes (e.g. idioms as
in (5), compound nouns or multi-word verbs) as well as function words (6) or
phrases (4).
(4) Bloomer made by top performer in nativity scene? (4,2,9) – STAR OF BETHLEHEM
(5) Find a lovely partner to share a seasonal moment (4,1,7) – PULL A CRACKER
(6) Jarring we hear’s in contrast (7) – WHEREAS (Browne 2009: 122, 110, 124)
Furthermore, the relationship between the turns of the same non-cryptic adja-
cency pair is overtly governed by the “Rule of Inflection” and the “Rule of Iden-
tity” (Schlepper 1981: 67). The former prescribes that clue and answer must “be
able to fulfil the same syntactic function” (Schlepper 1981: 67). Therefore, they
usually have the same inflection (7) and/or belong to the same formal syntactic
category. However, a prepositional phrase may also point to an adverb or a non-
finite clause to an adjective (8).
(7) Least cooked (6) – RAREST (N.N. 2009: 111)

(8) Lacking injury (6) – UNHURT (Rusbridger 11.–16.05.2013)
The latter rule dictates that clue and answer have to be semantically equivalent,
allowing (absolute or near) synonymy (9), negated antonymy (10), hyponymy (11)
as well as paraphrases and definitions of variable precision (12).
(9) Applaud (5) – CHEER

(10) Not dead (5) – ALIVE
(11) Hairdo (4) – PERM
(12) Short-tempered person (7) – HOTHEAD (N.N. 2009: 9, 25, 69, 27)
Therefore, according to Greimas (1970: 287), crosswords work like a reverse dic-
tionary, where only the definitions are given and the appropriate lemmata have
to be provided by the solver. Yet to complicate matters, solving a non-cryptic clue
may require considering polysemy, homonymy and proper names. In addition,
the relationship between clues and answers may also be syntagmatic, being
based on phraseological units such as idioms or collocations.
The aforementioned rules apply less overtly to cryptic crosswords. The
reason for this opacity is that cryptic clues have a binary structure. It is only the
definition (underlined in the following examples of cryptic clues) that is syn-
tactically and semantically equivalent to the answer. The subsidiary indication,
however, encodes the same answer a second time semantically, phonologically
or orthographically. Thus, in example (13) the definition huge is a synonym of the
answer, whereas the remaining subsidiary indication encodes the answer again,
orthographically.
(13) Huge mines exploded around me (7) – IMMENSE (Browne 2009: 28)
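The orthographic half of example (13) can be checked mechanically. The following Python sketch is an illustration under stated assumptions rather than an account of how solvers or setters proceed: it reads exploded as an anagram indicator for mines and around as an instruction to place the result around me, and the function names are hypothetical.

```python
def is_anagram(a: str, b: str) -> bool:
    """True if a and b consist of exactly the same letters."""
    return sorted(a.lower()) == sorted(b.lower())

def anagram_around(outer: str, inner: str, answer: str) -> bool:
    """True if the answer is some anagram of `outer` wrapped around `inner`."""
    answer, inner = answer.lower(), inner.lower()
    # `inner` must lie strictly inside the answer, so skip the first and last position.
    for i in range(1, len(answer) - len(inner)):
        if answer[i:i + len(inner)] == inner:
            rest = answer[:i] + answer[i + len(inner):]
            if is_anagram(rest, outer):
                return True
    return False

print(anagram_around("mines", "me", "IMMENSE"))  # True: IM + ME + NSE
```

The semantic half of the clue (huge as a synonym of immense) is not modelled here; it would require a lexical resource that this sketch does not assume.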
Only two clue types deviate from this basic structure: In so-called all-in-one or
& lit clues (‘and literally true clues’; cf. Moorey 2008: 22), which are sometimes
marked by exclamation marks, the definition and the subsidiary indication are
merged (14). Cryptic definition clues (cf. Moorey 2008: 27), by contrast, consist of
a misleading definition or paraphrase of the answer (15). They frequently rely on
homonymy or a morphological reinterpretation of lexemes or idiomatic expres-
sions and may be marked by question marks. Non-cryptic clues were banned
when the rules for cryptic puzzles were reformulated by setters in the 1930s and
1940s (cf. Scott and O’Donnell 1998: 236).
(14) Hood’s resort few disturbed (8,6) – SHERWOOD FOREST

(15) One may move on to another American story (9) – ESCALATOR (Moorey 2008: 148, 106)
The different types of clue-answer relationship typical of non-cryptic and cryptic
puzzles are illustrated schematically in Figure 1.
Figure 1: Clue-answer relationship in non-cryptic and cryptic crosswords (CWPs)
A cryptic clue thus offers two approaches to the answer and points to it unam-
biguously, if interpreted correctly. Some crossword initiates therefore insist that
cryptic crosswords are easier to solve than non-cryptic ones (cf. Skinner 2008: 7;
Schlepper 1981: 75). However, a solver may encounter several difficulties in inter-
preting cryptic clues. First, the definition and the subsidiary indication are inte-
grated into a stretch of language which seemingly permits literal interpretation.
Yet the sole purpose of the surface structure of the clue is to mislead the solver. Its
meaning, however, is exhausted once the clue has been solved. Therefore, clues
have to be regarded as a succession of fragments which correspond to neither
morpho-syntactic nor orthographic units, since word boundaries may be shifted
and punctuation marks overruled: “A cryptic clue is a sentence or phrase, appear-
ing to make some kind of sense and putting ideas into the solver’s head. These
often have little or nothing to do with the answer, which can be derived by inter-
preting all or part of the clue in ways which are less obvious” (Biddlecombe 2009).
Second, the definition and the subsidiary indication are unmarked, may
occur in variable order and may even overlap. There may also be words or phrases
which are superfluous for solving the clue (cf. Schlepper 1981: 66), added solely
for enhancing the coherence of the surface structure. Third, even when the defi-
nition has been identified, it may be a zero-derivation, polyseme or homonym
and thus, due to the absence of any context, syntactically and/or semantically
ambiguous. Fourth, the subsidiary indication may contain several operations of
codification not necessarily indicated by signal words (for lists of such indicators
cf. Stephenson 2007: 35–63; indicators will be underlined by a broken line in the
following examples of cryptic clues).
Cryptic clues whose subsidiary indication encodes the answer semantically,
so-called double or multiple definition clues (for the names of clue types used here
cf. Moorey 2008: 13–31; Biddlecombe 2009), contain a second definition. They
are usually based on polysemy, homonymy, homography or the metaphorical or
literal reinterpretation of one or several lexemes in the clue and/or the answer
(16).
(16) Poorly educated and characterless? (10) – UNLETTERED (Moorey 2008: 154)
By contrast, homophone clues encode the answer phonologically and are based
on the phonological similarity (homeophony) or identity (homophony) of lexemes
such as whale and wail in (17).
(17) Marine beast’s audible cry (4) – WAIL (Stephenson 2007: 55)
Most frequently, however, a solver has to recompose the answer orthographically.
The easiest case of an orthographic codification is a hidden clue, explicitly
containing the graphemes of the answer. In the surface structure of the subsidiary
indication, these graphemes are either dispersed or contained consecutively,
often across word boundaries. Furthermore, it may be necessary to reverse the
order of the graphemes contained in or encoded by the subsidiary indication
(anadrome or reversal clues) or to rearrange them (anagram clues). Thus, in (18)
the graphemes of live, a synonym of quick, have to appear in inverted order to
form a synonym of sin, while in (19) the answer is an anagram of remote:
(18) Quick to return to sin (4) – EVIL (Stephenson 2007: 48)

(19) Unusually remote shooting star (6) – METEOR (Skinner 2008: 18)
In addition, graphemes may also be substituted (substitution clues) or deleted
(take away, apocopative or deletion clues). This is illustrated in (20), where the
first letter of gown, a synonym of dress, must be deleted.
(20) Possess a topless dress (3) – OWN (Moorey 2008: 20)
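The orthographic operations in examples (18) to (20) are simple string manipulations, as the following Python sketch shows. It is offered as an illustration only: the function names are hypothetical, and the intermediate words (live for quick, gown for dress) are supplied by hand, since finding such synonyms is the part of solving that this sketch does not attempt.

```python
def reverse(word: str) -> str:
    """Reversal (anadrome) clue: read the graphemes in inverted order."""
    return word[::-1]

def is_anagram(source: str, answer: str) -> bool:
    """Anagram clue: the answer rearranges exactly the graphemes of the source."""
    return source != answer and sorted(source) == sorted(answer)

def behead(word: str) -> str:
    """Deletion clue of the 'topless' kind: drop the first grapheme."""
    return word[1:]

print(reverse("live"))                 # evil   (18)
print(is_anagram("remote", "meteor"))  # True   (19)
print(behead("gown"))                  # own    (20)
```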
In crosswords of a certain complexity, however, answers may be cut into several
chunks, which, theoretically, may consist of single letters. These orthographic
chunks are then encoded separately, linearly in charade or additive clues and
non-linearly in content or container clues. In (21), the graphemes <arat> have to be
inserted into a synonym of cat, namely lion. Dec, the abbreviation for December,
the last month of the year, is added by a charade operation.
(21) Statement: Last month, a cat swallowed a rat (11) – DECLARATION (Biddlecombe 2009)
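The difference between the linear charade operation and the non-linear container operation in (21) can be sketched as follows. The Python code is a minimal illustration; the function names and the explicit insertion position are assumptions made here, and the chunks dec, lion and arat are taken directly from the explanation above.

```python
def charade(*chunks: str) -> str:
    """Charade (additive) clue: chunks are joined linearly, one after the other."""
    return "".join(chunks)

def container(outer: str, inner: str, position: int) -> str:
    """Container (content) clue: `inner` is inserted into `outer` at `position`."""
    return outer[:position] + inner + outer[position:]

# lion 'swallows' a rat: L + ARAT + ION
swallowed = container("lion", "arat", 1)
# Dec (the last month) is then prefixed by a charade operation
answer = charade("dec", swallowed).upper()
print(answer, len(answer))  # DECLARATION 11
```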
For these operations of codification, all kinds of abbreviations or acronyms may
be used, such as of military ranks (e.g. Lt for lieutenant), chemical elements (e.g.
Ag for silver) or terms from chess, music or cricket (e.g. W for wicket). Other letter
sequences constitute foreign-language articles (e.g. le/la for the, un for one), pro-
nouns (e.g. she for girl) or Roman numerals (e.g. I for one). Therefore, despite the
fact that crosswords do not show grammatical cohesion, they may still contain
lexemes which otherwise have a cohesive function.
Finally, to further complicate the solving of clues, the aforementioned oper-
ations of codification can also be combined (complex clues). Thus, three opera-
tions are included in (22): heartless indicates the deletion of the central grapheme
of the. By a charade operation (see the explanation of charade clues above), R (for
Latin rex ‘king’) is added to <te>. This letter sequence is then inserted into inn, a
synonym of public house.
(22) Confine the heartless king in a public house (6) – INTERN (Gilbert 2001: 64)
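The three operations combined in (22) can be traced step by step. The sketch below is an illustration under the assumption that heartless removes exactly one central grapheme; the helper names are hypothetical.

```python
def remove_heart(word: str) -> str:
    """Delete the central grapheme of a word with an odd number of letters."""
    if len(word) % 2 == 0:
        raise ValueError("word has no single central grapheme")
    middle = len(word) // 2
    return word[:middle] + word[middle + 1:]


def container_candidates(outer: str, inner: str) -> list[str]:
    """All ways of placing the inner chunk strictly inside the outer chunk."""
    return [outer[:i] + inner + outer[i:] for i in range(1, len(outer))]


# Clue (22): deletion (the -> te), charade (te + R), container (inside inn).
chunk = remove_heart("the") + "r"
candidates = [c.upper() for c in container_candidates("inn", chunk)]
print(chunk, candidates)
```

The second candidate, INTERN, matches the definition confine.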
2.3 Functional analysis
In view of the purposes of crosswords, their language is shaped by two diametrically
opposed requirements: it must, on the one hand, encrypt the answers, yet, on the other
hand, point to them unambiguously.
In non-cryptic puzzles, in which the syntactic and semantic relationship
between clues and answers is straightforward, a solver’s proficiency depends
mainly on his or her factual declarative, encyclopaedic as well as metalinguis-
tic knowledge. Only when clues can activate chunks of knowledge which are
stored as cognitive representations in the solver’s memory or when appropriate
cognitive representations can be constructed in the process of solving the puzzle
(e.g. by consulting an encyclopaedia) can those clues be solved. The language
of non-cryptic puzzles mirrors this. Most non-cryptic clues permit a literal, syn-
tactically and semantically unambiguous interpretation of the surface structure.
Furthermore, they are characterised by structural simplicity and shortness. What
primarily accounts for the difficulty of primitive puzzles are, consequently, the
currency of the lexemes functioning as answers among the target ‘solvership’ and
the extent to which esoteric knowledge is targeted. In addition, non-cryptic clues
may constitute semantically unspecific paraphrases, pointing to several answers
such as in (23). Such ambiguity can only be resolved by intersecting answers and
thus imposes a specific approach to solving the respective puzzle.
(23) Atlantic county of Eire (5) – SLIGO (N.N. 2009: 11)
To procure even greater entertainment, cryptic puzzles, by contrast, take playing
with words, testing mental flexibility and encoding answers to extremes. Their
solution requires not only general knowledge but also expert knowledge, abilities
or solution strategies. These may concern the specific conventions of codifica-
tion, the frequency of certain letters, the completion of incomplete lexemes or
the solution of anagrams. Cryptic crosswords thus often rely on the various syn-
tagmatic and paradigmatic as well as coincidental formal relationships within
the English language, which are largely irrelevant for everyday language use.
Besides knowledge, they consequently depend on “fluid cognition” (Hambrick,
Salthouse, and Meinz 1999: 131) or “lateral thinking” (Schlepper 1981: 79), i.e.
creativity, mental flexibility and logical, abstract reasoning. This focus on a more
complex codification of the answers and a more complex reasoning process in
cryptic puzzles is mirrored in their language. The structurally more complex and
longer surface structure of cryptic clues only seemingly permits literal interpretation
but deliberately aims at misleading the solver. Since operations of codification
are not necessarily indicated and since the definition, the subsidiary indication
and possible indicators are not marked, the surface structure of cryptic clues
permits multiple interpretations, semantically as well as morpho-syntactically.
As with non-cryptic puzzles, the difficulty of cryptic puzzles increases when rare
lexemes or specialised or esoteric knowledge are targeted. In contrast to non-cryptic
clues, however, once the structure underlying the clue has been recognised
and the operations of codification have been identified, well-constructed cryptic
clues can be answered unambiguously, even without resorting to crosslights in
the grid.
The previous analysis showed that crosswords are associated with a particu-
lar situation and particular purposes, which are reflected in pervasive formal as
well as linguistic features. Consequently, crosswords must clearly be regarded
as an independent register according to Biber and Conrad’s definition (2009:
31; see also Schubert’s introduction to this volume). Furthermore, the detailed
semantic and morpho-syntactic analysis of crosswords revealed that non-cryptic
and cryptic puzzles, despite their being based on the same linguistic building
blocks, have developed different strategies for fulfilling their primary purpose
as entertainment. They codify answers to a different extent and therefore require
different skills on the part of the solver. Since, due to this, non-cryptic and cryptic
puzzles differ linguistically, those two types of crosswords must be regarded as
distinct sub-registers of the register of crosswords.
3 Intertextuality in crossword puzzles: A corpus study
Intertextuality, the seventh standard of textuality according to de Beaugrande
and Dressler (1981), implies that knowledge of one or several individual texts
or groups of texts (pre-texts) may influence the production and/or reception
of another text (the post-text). In registers like newspaper articles or advertise-
ments, intertextuality most frequently takes the form of (unmodified or modified)
quotations. Numerous studies have shown that these may have for example the
representational function of introducing additional components of meaning into
a post-text, the expressive function of supporting the author’s argumentation
and/or the conative function of guiding the reader’s reception (cf. Bühler [1934]
1982: 24–33). For an intertextual reference to fulfil (most of) its functions, (more or
less extensive) knowledge of the pre-text is required (cf. Schulte-Middelich 1985;
Stocker 1998: 73–92). However, since intertextual references are normally doubly
referential, pointing to pre-texts as well as to the extra-linguistic world (cf. Pham
2014: 472), most post-texts equally permit a literal, non-intertextual interpreta-
tion. So far, however, it has never been studied how intertextuality contributes to
the characteristics and purposes of crosswords and to what extent the analysis of
intertextual references can contribute to establishing crosswords as a register or
non-cryptic and cryptic puzzles as distinct sub-registers.
3.1 Working definitions
The term intertextuality was coined in the late 1960s by the Bulgarian linguist
and literary critic Julia Kristeva (1968). Yet, although intertextual references occur
particularly frequently in texts from the 20th and 21st centuries, intertextuality is
by no means an exclusively modern or postmodern phenomenon. On the con-
trary, references to previous texts or utterances may be regarded as an intrinsic
property of human language. Consequently, the study of intertextual references,
especially in the fields of rhetoric and literary theory, can be traced back to classi-
cal antiquity, albeit under different labels such as parody, quotation or imitation.
Today, there are two principal tendencies in research on intertextuality. The
theory of intertextuality is historically rooted in post-structuralist literary criti-
cism, which deconstructs the traditional concept of text. Post-structuralists like
Kristeva, Barthes and Derrida furthermore regard intertextuality as a character-
istic of all texts and consequently contest the autonomy of any text. Thus, inter-
textuality does not refer back to individual, identifiable pre-texts, but to a “texte
infini [infinite text]” (Barthes 1973: 59) or a “texte général [general text]” (Derrida
1972: 125), which is extended to comprise even the ‘social’, ‘cultural’ or ‘historical
text’ (cf. Barthes [1968] 1977: 146). However, this ontological conception of inter-
textuality has never developed a feasible method for textual analysis.
Consequently, for actual textual analysis as in the present paper, scholars
revert to the second, narrower conception of intertextuality. It regards intertex-
tual references as a gradable feature of some, yet not all texts, examines the forms
and functions of such references and, being related to structuralism, approves
of the traditional concept of text. For structuralists like Genette (1982) or Riffa-
terre (1981) intertextuality theoretically refers back to isolated, identifiable pre-
texts (or groups of pre-texts). It is this narrow conception of intertextuality that
was adopted by linguistics in the 1980s. Linguists usually distinguish between
typological intertextuality, i.e. the relationships between post-texts and groups of
texts (registers, genres, styles or textual patterns), and referential intertextuality,
i.e. the relationships between post-texts and individual, identifiable pre-texts.
The previous section showed that crosswords should be regarded as an inde-
pendent register comprising two sub-registers. Typical examples of crossword
puzzles thus follow certain conventions and are necessarily characterised by
typological intertextuality. Consequently, for the present study, analyses were
limited to referential intertextuality. The term intertextuality was thus understood
to comprise only the relationships between a post-text and one or more indi-
vidual and identifiable pre-texts. The intertextual subcategory of interfigurality
(cf. Müller 1991) includes the mention or appearance of figures and authors of
pre-texts in a post-text (“re-used figures [and] authors”, Helbig 1996: 115). There-
fore, references to pre-textual figures and authors were equally considered in
the present study. Moreover, a text was defined broadly as a formally delimited
communicative act which usually exists in written or spoken form but may also
consist of other visual or acoustic signs.
3.2 Methodology
A corpus study on intertextuality in crossword puzzles was conducted for this
paper. Its primary aim was to investigate the particular forms and functions of
intertextual references in this type of word game in order to evaluate their impor-
tance for crosswords as a register as well as for non-cryptic and cryptic puzzles
as sub-registers.
In the first half of the 20th century, so-called quotation clues were still used in
cryptic puzzles. A citation listed in the Oxford Dictionary of Quotations (Parting-
ton 1992) was reproduced literally, explicitly marked by quotation marks, italics,
the name of the pre-text and/or the name of the author. One part of the original
wording was elided and had to be recovered by the solver as in (24), where the
quotation is accompanied by a definition:
(24) Consumed. “But answer came there none And this was scarcely odd because They’d
____ every one” (Carroll’s Through the Looking-Glass) (5) – EATEN (Gilbert 2001: 12)
Thus, for devising such clues, the setters relied on their knowledge of those pre-
texts. In order to identify the answers, solvers had to be able to access similar
knowledge of the pre-texts by activating (or constructing) appropriate cognitive
representations (cf. Geeraerts and Cuyckens 2007: 170–187). In 1995, however,
quotation clues like (24) were forbidden because they were not strictly cryptic and
because some puzzles had devoted too much attention to literary background
knowledge (cf. Biddlecombe 2009). By contrast, quotation clues like (25) are still
to be found in non-cryptic puzzles.
(25) “A Nightmare on ____ Street” – ELM (Parker 15.04.2013)

This suggests that today references to works of literature or popular culture are
considerably more frequent in non-cryptic than in cryptic puzzles and that less
knowledge of existing texts is required to solve the latter. Hence, one further aim
of the empirical study was to investigate this assumption comparatively by exam-
ining intertextual references in the two sub-registers of crosswords as to their fre-
quency, pre-texts, forms and functions.
For the corpus, two collections of crosswords were analysed, both published
in 2009, i.e. well after the abolition of quotation clues in cryptic puzzles. In total,
80 non-cryptic puzzles (2080 clue-answer pairs) from The Sun (N.N. 2009) and
80 cryptic puzzles (2372 clue-answer pairs) from The Times (Browne 2009) were
scrutinised for intertextual references according to the above definitions. When
several references occurred in one clue-answer pair or when references pointed
to several pre-texts, those were counted separately. This yielded a corpus of 270
intertextual clue-answer pairs (The Sun: 112; The Times: 158) and 295 intertextual
references (The Sun: 112; The Times: 183; 38.0 % vs. 62.0 %), which were manually
classified into five categories according to their respective pre-text(s).
Category (1) comprises references to folkloristic and mythological texts, orig-
inally transmitted orally. Clue-answer pairs requiring knowledge of literary texts
produced by individual authors according to aesthetic standards are summarised
in category (2). References to the visual arts are subsumed under category (3) and
subdivided into (a) painting/drawing/sculpture and (b) broadcasting/TV series/
film. For references to music, category (4) was created with the subcategories (a)
classical music (both orchestral and vocal) and (b) popular music. Remaining ref-
erences to religious, philosophical or other theoretical texts constitute category
(5). In some cases, the distinction between these (sub-)categories is not clear-
cut. Thus, further criteria were introduced. For example, popular music, in con-
trast to classical music, was regarded as being typically commercially oriented,
addressed to large audiences and distributed by the music industry.
In addition, each group was analysed according to the provenances of the
pre-texts or their authors. Thus, texts from Greek and Roman antiquity are classi-
fied as Classical, British is the label for pre-texts from the UK and the Republic of
Ireland, American denotes pre-texts from the USA, etc. Provenances relevant for
fewer than four intertextual references per category were subsumed under Other.
Due to their importance for intertextuality, Shakespeare and the Bible are listed
separately (cf. Table 2).
3.3 Quantitative analysis of the corpus
The first conclusion we can draw from the quantitative analysis of the corpus is
that, on the whole, and contrary to the previous assumption, intertextual refer-
ences are relatively more frequent in the cryptic puzzles published in The Times
than in the non-cryptic ones from The Sun. While crosswords in The Sun contain
1.4 intertextual references on average, puzzles in The Times contain 2.3 intertex-
tual references. Even if references to different pre-texts occurring in the same
clue-answer pair as in example (32) are not counted separately, this distributional
difference remains obvious (1.4 vs. 2.0 intertextual clue-answer pairs/puzzle). A
comparison with the frequency of intertextual references in non-cryptic puzzles
from another quality paper, The Guardian (Rusbridger 11.–16.05.2013; 0.8 refer-
ences or intertextual clue-answer pairs/puzzle), shows that this difference actu-
ally depends on the type of crossword and not on the journalistic standards or the
addressed readership of the respective newspapers. Consequently, despite there
being considerable variability in the frequency of intertextuality within the same
sub-register, cryptic puzzles generally require more knowledge of other texts than
non-cryptic puzzles. The qualitative analysis of the corpus will shed light on how
intertextual references are incorporated into cryptic puzzles, despite quotation
clues having been banned.
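The averages and shares quoted above follow directly from the counts given in the methodology section (80 puzzles per newspaper; 112 and 183 intertextual references; 112 and 158 intertextual clue-answer pairs). The following sketch merely retraces this arithmetic; the variable and function names are illustrative.

```python
PUZZLES = {"The Sun": 80, "The Times": 80}
REFERENCES = {"The Sun": 112, "The Times": 183}  # intertextual references
PAIRS = {"The Sun": 112, "The Times": 158}       # intertextual clue-answer pairs


def per_puzzle(count: int, puzzles: int) -> float:
    """Average number of items per puzzle, rounded to one decimal place."""
    return round(count / puzzles, 1)


def share(count: int, total: int) -> float:
    """Percentage share of a total, rounded to one decimal place."""
    return round(100 * count / total, 1)


total_references = sum(REFERENCES.values())
for paper, puzzles in PUZZLES.items():
    print(
        paper,
        per_puzzle(REFERENCES[paper], puzzles),       # references per puzzle
        per_puzzle(PAIRS[paper], puzzles),            # pairs per puzzle
        share(REFERENCES[paper], total_references),   # share of all references
    )
```

For The Sun this yields 1.4, 1.4 and 38.0, for The Times 2.3, 2.0 and 62.0, in line with the figures reported in the text.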
Table 2: Composition of the corpus of intertextual clue-answer pairs

Categories and provenances of pre-texts                   THE SUN  THE TIMES  AVERAGE
                                                              (%)        (%)      (%)
(1) Folkloristic and mythological texts                      14.3        9.3     11.2
        Classical                                             8.0        4.9      6.1
        British                                               4.5        2.7      3.4
        Other                                                 1.8        1.6      1.7
(2) Literature                                               28.6       51.9     43.0
        British (excluding Shakespeare)                      16.1       31.1     25.4
        Shakespeare                                           2.7        8.2      6.1
        American                                              3.6        4.9      4.4
        French                                                2.7        2.7      2.7
        Other                                                 1.8        2.2      2.0
(3) Visual arts:                                             19.6        6.6     11.5
    (a) Painting/drawing/sculpture                            3.6        3.8      3.7
        Italian                                               1.8        1.1      1.4
        Other                                                 1.8        2.7      2.4
    (b) Video/broadcasting/TV series/films                   16.1        2.7      7.8
        British                                               8.9        0.5      3.7
(4) Music:                                                   22.3       13.1     16.6
    (a) Classical music                                       8.0       10.9      9.8
        British                                               1.8        3.3      2.7
        Italian                                               3.6        2.7      3.1
        Other                                                 2.7        4.9      4.1
    (b) Popular music                                        14.3        2.2      6.8
        British                                               8.0        1.1      3.7
        Other                                                 0.9        0.0      0.3
(5) Religious, philosophical and other theoretical texts     15.2       19.1     17.6
        British                                               0.9        5.5      3.7
        The Bible                                            14.3        7.1      9.8
        Other                                                 0.0        3.3      2.0

Note: All values are percentages and are calculated based on the number of intertextual
references in the crosswords from The Sun (112), The Times (183) or both newspapers (295;
labelled Average). Differences, for example, between percentage sums (shaded cells)
corresponding to (sub-)categories and respective individual percentage values (white cells)
corresponding to provenances result from rounding to one decimal place.
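As the note indicates, the Average column is based on the 295 references of both newspapers together and is therefore not the simple mean of the two other columns. A minimal sketch, assuming that the pooled value can be approximated from the rounded percentages (the function names are illustrative, and the last decimal may deviate for some rows because the inputs are already rounded):

```python
N_SUN, N_TIMES = 112, 183  # intertextual references per newspaper


def pooled_percentage(sun_pct: float, times_pct: float) -> float:
    """Percentage relative to all 295 references, weighted by newspaper size."""
    weighted = sun_pct * N_SUN + times_pct * N_TIMES
    return round(weighted / (N_SUN + N_TIMES), 1)


def simple_mean(sun_pct: float, times_pct: float) -> float:
    """Unweighted mean of the two columns, shown for comparison only."""
    return round((sun_pct + times_pct) / 2, 1)


# Category (1), folkloristic and mythological texts: 14.3 % vs. 9.3 %
print(pooled_percentage(14.3, 9.3), simple_mean(14.3, 9.3))
# The Bible: 14.3 % vs. 7.1 %
print(pooled_percentage(14.3, 7.1), simple_mean(14.3, 7.1))
```

The pooled values (11.2 and 9.8) agree with the Average column, whereas the simple means (11.8 and 10.7) do not, because The Times contributes more references than The Sun.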
Table 2 discloses the most popular pre-textual categories in crosswords in
general. Works of literature are by far the most important ones (43.0 %), followed
by religious, philosophical or other theoretical texts (17.6 %) and folkloristic and
mythological texts (11.2 %). If provenance is considered as well, British literature
(including Shakespeare; 31.5 %), the Bible (9.8 %) and myths of classical antiquity
(6.1 %) are the most important pre-texts. In addition, Shakespeare is the individual
author who is by far most often referred to (6.1 %). This result might be surprising,
since it has often been claimed that, at least since the mid-20th century,
the traditional pre-texts of the Victorian Age have declined in importance in
Anglo-American culture: “until recently Classical mythology, the works of Shake-
speare and the Bible were regular sources for compilers” (Scott and O’Donnell
1998: 207; cf. also Hebel 1991: 149). Consequently, the predominance of these pre-
texts in crosswords may have been even clearer in the first half of the 20th century.
This result supports Partridge’s assumption that typical solvers are thoroughly
and “humanistically educated” (Partridge 1992: 504).
Furthermore, it is equally revealing to compare the favourite pre-texts of the
two sub-registers of crosswords. Thus, clues in non-cryptic crosswords from The
Sun require knowledge of literary works in general (28.6 %), British literature
(excluding Shakespeare; 16.1 %) and Shakespeare (2.7 %) less frequently than
clues in cryptic crosswords from The Times (51.9 %, 31.1 % and 8.2 %). By contrast,
puzzles from The Sun refer to the Bible (14.3 %) and to the oral tradition (14.3 %),
especially to classical mythology (8.0 %), relatively more frequently than puzzles
from The Times (7.1 %, 9.3 % and 4.9 %). The reason for these different preferences
especially with regard to the traditional pre-texts of the Victorian Age might be
that a British solver with an average education can be expected to possess more
extensive general knowledge of the Bible and all texts of classical mythology
than of the 38 plays and 154 sonnets commonly attributed to Shakespeare (cf.
Greenblatt 1997: 65–66, 1923–1976). The most striking distributional differences
between the two sub-registers can, however, be found in categories (3b) and (4b).
Knowledge of (especially Anglo-American) video, broadcasting, TV series, films
and popular music is necessary for the solution of nearly one third of all inter-
textual non-cryptic clues (30.4 %) but is hardly relevant for cryptic puzzles at all
(4.9 %). Cryptic crosswords of the corpus thus primarily target traditional pre-texts
like classical mythology, Shakespeare and the Bible, whereas non-cryptic puzzles
focus on Shakespeare to a smaller, yet on classical mythology and the Bible to a
greater extent and additionally require knowledge of texts of popular, especially
Anglo-American, culture. However, only a corpus including non-cryptic and
cryptic clue-answer pairs from further (popular and quality) newspapers could
reveal whether these preferences for certain pre-textual categories are correlated
to the respective sub-register of crosswords or to the expected knowledge of the
target solvership (or to both).
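The combined shares for categories (3b) and (4b) quoted above (30.4 % vs. 4.9 %) are sums of the corresponding rows of Table 2. The following sketch simply reproduces this addition; the dictionary layout is an illustrative assumption.

```python
# Rows (3b) and (4b) of Table 2, in per cent of all intertextual references.
TABLE_2_ROWS = {
    "(3b) video/broadcasting/TV series/films": {"The Sun": 16.1, "The Times": 2.7},
    "(4b) popular music": {"The Sun": 14.3, "The Times": 2.2},
}


def combined_share(paper: str) -> float:
    """Sum of the selected sub-category percentages for one newspaper."""
    return round(sum(row[paper] for row in TABLE_2_ROWS.values()), 1)


print(combined_share("The Sun"), combined_share("The Times"))
```

The sketch prints 30.4 and 4.9.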
3.4 Qualitative analysis of the corpus
In both sub-registers of crosswords, most intertextual references (95.2 %) involve
proper nouns (including titles). Due to their fixed extension but particularly
complex intension as well as their high selectivity and explicit markedness (cf.
Pfister 1985: 28; Karrer 1985: 106–108), proper nouns contribute to the codifica-
tion of answers as well as the unequivocal solution of clues. Hence, they are well-
suited for intertextual references in crosswords.
In more than two thirds of all intertextual non-cryptic adjacency pairs of the
corpus (67.9 %), proper nouns referring to the same pre-text occur in both the clue
and the answer, usually in combination with common nouns providing further
information on the referent (26). Thus, although these references are unmarked,
proper nouns can usually activate the necessary cognitive representations
unequivocally even without the grid.
(26) Writer of 1984 (6) – ORWELL (N.N. 2009: 77)
In about one third of the non-cryptic clue-answer pairs of the corpus, proper
nouns occur either in the answer as in (27) (23.2 %) or, more rarely, in the clue
as in (28) (6.3 %), whereas the other component of the pair gives a semantically
equivalent common noun or noun phrase. Since only one selective proper noun is
involved, more pre-textual knowledge is required for correctly associating clue
and answer. Furthermore, the solver may encounter a certain ambiguity, which is
resolved only when the number of letters of the answer is considered or crosslights
are already given in the grid:
(27) Opera composer (7) – PUCCINI

(28) Puccini work (5) – OPERA (N.N. 2009: 45, 53)
A comparison with non-cryptic puzzles from The Guardian (Rusbridger
11.–16.05.2013) shows that these two types are particularly typical of this sub-register.
In the corpus, only three non-cryptic clues (2.7 %) require exact knowledge of the
wording of a pre-text and may thus, despite their featuring no explicit markers,
be classified as quotation clues. Interestingly, all three refer to texts of popular
culture: the catchphrase of a British comedian and the beginnings of two nursery
rhymes. These pre-texts can be expected to be common knowledge among British
solvers.
(29) Tommy Cooper’s catchphrase (4,4,4) – JUST LIKE THAT

(30) Ride a cock horse to here (7,5) – BANBURY CROSS
(31) Silver-buckled sailor (5,7) – BOBBY SHAFTOE (N.N. 2009: 71, 147, 155)
In cryptic puzzles, by contrast, proper nouns are used with greater variation as
intertextual references. One major difference between the two sub-registers in the
corpus is that intertextual proper nouns may occur in the subsidiary indication
of cryptic clues, i.e. as an intermediate step in the solution of the clue (32.9 %).
From a cognitive linguistic point of view, especially well-known proper nouns
automatically activate easily accessible pre-textual frames. Whereas the frames
activated by intertextual references in non-cryptic puzzles are directly relevant
for the answers, this is not always the case in cryptic puzzles. Only lexemes in the
definition need to be interpreted literally. Intertextual references in the subsid-
iary indication, however, usually require no pre-textual knowledge at all. They
activate frames which mislead the solver and inhibit finding the answer, espe-
cially when knowledge of a completely different pre-text is required. Thus, in (32)
no knowledge of Lewis or the Lake poets is necessary because the answer, the
name of a different poet, is an anagram of the letters <TV CS Lewis Lake> given in
the subsidiary indication.
(32) TV broadcast with C S Lewis and Lake poet (9-4) – SACKVILLE-WEST (Browne 2009: 52)
Cryptic clues whose definitions and answers contain intertextual proper nouns
(usually referring to the same pre-text; 15.9 %) resemble the first type of non-
cryptic clue discussed before: an intertextual name in the definition is often
sufficient for an unequivocal solution and only basic pre-textual knowledge is
required. Whereas the additional subsidiary indication first complicates the acti-
vation of the necessary cognitive representations, once identified, it indicates the
correctness or falsehood of the supposed answer. In (33) the name of a Shake-
spearean spirit also results from the insertion of the Roman numeral for one into
an anagram of Lear. Equally, the answer in (34) is not only indicated by the defini-
tion but is also confirmed by the subsidiary indication: for the mythological place
name the graphemes of no and lava, paraphrased by sign of volcanic activity, are
reversed.
(33) Shakespearean spirit – one into Lear possibility (5) – ARIEL

(34) No sign of volcanic activity about Arthur’s Seat (6) – AVALON (Browne 2009: 132, 56)
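How the subsidiary indication confirms or rejects a supposed answer in (33) and (34) can be illustrated by a small check. This is a sketch under the assumption that the solver has already identified the relevant chunks (I and Lear; no and lava); the function names are hypothetical.

```python
def letters(text: str) -> str:
    """Keep letters only and upper-case them."""
    return "".join(ch for ch in text.upper() if ch.isalpha())


def fits_insertion_into_anagram(answer: str, inserted: str, fodder: str) -> bool:
    """True if the answer is an anagram of the fodder with `inserted` placed inside it."""
    answer, inserted = letters(answer), letters(inserted)
    target = sorted(letters(fodder))
    for start in range(1, len(answer) - len(inserted)):
        end = start + len(inserted)
        rest = answer[:start] + answer[end:]
        if answer[start:end] == inserted and sorted(rest) == target:
            return True
    return False


def fits_reversal(answer: str, *chunks: str) -> bool:
    """True if the answer is the reversed concatenation of the chunks."""
    return letters("".join(chunks))[::-1] == letters(answer)


print(fits_insertion_into_anagram("ARIEL", "I", "Lear"))  # clue (33): True
print(fits_reversal("AVALON", "no", "lava"))              # clue (34): True
print(fits_insertion_into_anagram("PUCK", "I", "Lear"))   # rejected: False
```

A competing Shakespearean spirit such as PUCK satisfies the definition in (33) but fails the check, which is precisely the confirming function of the subsidiary indication.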
When intertextual proper nouns occur in the answer (41.1 %) or, more rarely, in
the definition only (5.1 %) and the corresponding counterpart is constituted by a
semantically equivalent common noun or noun phrase, as with the second type of
non-cryptic clue discussed above, the answer can usually not be inferred unam-
biguously from the definition alone. However, in these cryptic clues, the subsid-
iary indication may resolve the ambiguity. Furthermore, such clues require more
detailed knowledge of pre-texts than the previous categories. While the definition
in (35) does not unambiguously identify the intertextual answer, the subsidiary
indication requires the formation of an anagram of relies on. By contrast, splitting
a couple, i.e. a lady and a man, by S (from succeeded) results in a synonym of the
intertextual eponym Casanova in (36).
(35) Relies on horribly haunted castle? (8) – ELSINORE

(36) Casanova succeeded splitting couple? (5,3) – LADY’S MAN (Browne 2009: 40, 72)
Moreover, seven cryptic clues (4.4 %) require knowledge of the exact wording of
pre-textual passages. Thus, although they do not follow the traditional pattern of
quotation clues (featuring e.g. quotation marks and a gap which has to be recov-
ered), they must be classified as quotation clues. Not only is their share larger
than in non-cryptic puzzles, but they also refer to a different category of pre-texts.
While only two clues, (37) and (38), refer to popular culture (an English nursery
rhyme and a musical based on poems by Eliot), the others require knowledge
of works of well-known British and international authors: Shakespeare (39), but
only seemingly (40) and (41), Shelley (42), Carroll (43), Gray (41) and Plutarch
(40).
(37) When Grundy was christened, 48 hours before Chesterton’s man (7) – TUESDAY
(38) Reason for Macavity’s lack of presence (5) – ALIBI
(39) Underworld scam over shelter – it blighted Gloucester’s winter (10) – DISCONTENT
(40) Composer includes girl in second act of Julius Caesar (7) – VIVALDI
(41) Hamlet’s rude ancestor heard warning priest (10) – FOREFATHER
(42) Lovely old piece describing Shelley’s traveller’s land (7) – ANTIQUE
(43) Giving nasty looks? Alice never heard of such a thing! (12) – UGLIFICATION (Browne
2009: 156, 104, 92, 58, 90, 44, 42)
Finally, three cryptic clues (1.9 %) are based on idioms derived from individual
pre-texts. For these, the activation of pre-textual frames may be helpful, yet is by
no means essential. The idiomatic collocation representing the answer in (44) is
derived from Shakespeare’s Antony and Cleopatra (1.5.72). The subsidiary indication
instructs the solver to insert lad (‘boy’) into sad (‘blue’) and to add ays
(‘votes’).
(44) Boy in blue votes for Green term (5,4) – SALAD DAYS (Browne 2009: 58)
The qualitative analysis of the corpus revealed that intertextual references in
crosswords differ drastically from those in other registers, formally as well as
functionally. Whereas intertextuality e.g. in newspapers or advertisements most
frequently takes the form of quotations (cf. Pham 2014), interfigural relation-
ships are the predominant formal category in the present corpus. Furthermore,
intertextual references in other registers are usually doubly referential (cf. Pham
2014: 472), referring to both the extralinguistic world and the respective pre-texts.
Thus, an advertising slogan like “To smoke or not to smoke” for cigarettes (Mieder
1985: 126) can be interpreted as a statement about the world, expressing that the
consumer has to decide between two alternative actions, or as an intertextual
reference to Shakespeare’s Hamlet, additionally suggesting that the decision is
essential to the consumer. By contrast, a literal, non-intertextual interpretation
of references in non-cryptic clues as well as in the definition of cryptic puzzles
does not lead to the answer, whereas intertextual references in the subsidiary
indication of cryptic clues must be interpreted literally only. In both cases, the
clues’ meaning is exhausted as soon as the answer has been identified. Intertex-
tual references in puzzles can thus not be regarded as doubly referential.
The analysis of the corpus and the comparison with non-cryptic intertextual
clues from a quality newspaper further identified various types of intertextual
clue-answer pairs in non-cryptic and cryptic puzzles. These types typically estab-
lish intertextual relationships of different intensity and occur more frequently or
even exclusively in one or the other sub-register of crosswords. Cryptic puzzles
not only use intertextuality more often to encode the answer; intertextual
clue-answer pairs in cryptic puzzles also tend to require the activation of more
comprehensive pre-textual knowledge than in non-cryptic puzzles. Furthermore,
cryptic puzzles require knowledge of a greater variety of pre-texts and also of pre-
texts which cannot be regarded as part of popular culture. Finally, well-known
pre-texts like Shakespeare’s Hamlet are referred to for misleading the solver by
activating easily accessible frames of knowledge.
4 Conclusion
While crosswords had never been studied in detail from a text linguistic perspec-
tive, the present paper established and analysed crossword puzzles as an inde-
pendent register with non-cryptic and cryptic puzzles as distinct sub-registers.
In addition, neither had referential intertextuality been investigated as a charac-
teristic of crosswords, nor had it been considered as a linguistic feature relevant
for register analysis. Thus, Biber and Conrad only mention references to previous
scientific publications or postings in chatgroups (2009: 68, 289), but no other
types of intertextuality. However, since intertextual clue-answer pairs occur on
average more than once in every crossword in the present corpus (1.7 intertextual
clue-answer pairs/puzzle), this paper has shown intertextuality to be one important
strategy of codification in this type of word game. Furthermore, intertextuality is
used in a manner differing radically from other texts, formally as well as func-
tionally. As a pervasive, frequent and distinctive linguistic feature of crosswords
which is related to the purposes and the communicative situation characteristic
of this register, intertextuality must be included in a register analysis of this type
of puzzle according to the framework by Biber and Conrad (2009). It might also
turn out to be relevant for the analysis of other registers.
Moreover, the present corpus study revealed considerable differences in the
way non-cryptic and cryptic puzzles employ intertextual references. It thus con-
firmed the distinction between two sub-registers of crosswords. In non-cryptic
clues, intertextuality typically supports the unambiguous solution of the clue
and demands only superficial pre-textual knowledge. In cryptic crosswords, by
contrast, intertextual references and even quotation clues are more frequent,
despite the latter having been officially banned in 1995. Twenty-three cryptic clue-answer
pairs of the corpus, such as (40) or (32), even contain references to two or three
pre-texts. Thus, cryptic puzzles more frequently require the activation of pre-tex-
tual frames than non-cryptic puzzles and these frames need to be more detailed.
Cryptic crosswords also feature references which are formally more variable and,
at least initially, lead to ambiguities which account for part of the cryptic charac-
ter of this sub-register. What is specific to cryptic puzzles is the reference to well-
known pre-texts in the subsidiary indication for misleading the solver. However,
the present corpus permits no conclusion as to whether the pre-textual categories
targeted by crosswords are dependent on the type of sub-register or the expected
knowledge of the target readership of the newspapers in which these puzzles are
published (or both). Thus, further corpus studies should be undertaken to specif-
ically examine this correlation.
Bibliography
Augarde, Tony. 2003. The Oxford guide to word games. Oxford: Oxford University Press.
Barthes, Roland. [1968] 1977. The death of the author. In Roland Barthes, Image music text,
142–148. London: Fontana Press.
Barthes, Roland. 1973. Le plaisir du texte. Paris: Editions du Seuil.
Beaugrande, Robert-Alain de & Wolfgang Ulrich Dressler. 1981. Introduction to text linguistics.
London & New York: Longman.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge
University Press.
Biddlecombe, Peter. 2009. Yet another guide to cryptic crosswords.
http://www.biddlecombe.demon.co.uk/yagcc/ (accessed 27 January 2015).
Browne, Richard. 2009. The Times crossword book 13. London: Times Books.
Bühler, Karl. [1934] 1982. Sprachtheorie: Die Darstellungsfunktion der Sprache. Stuttgart &
New York: Gustav Fischer Verlag.
Coffey, Steve. 1998. Linguistic aspects of the cryptic crossword. English Today 14(1). 14–18.
134 Teresa Pham
Cornell, Alan & Marion Cornell. 1980. Fragen und Antworten im englischen Kreuzworträtsel. In
Ernst Burgschmidt (ed.), Beiträge zu einer Linguistischen Landeskunde und Sprachpraxis,
44–63. Braunschweig: Verlag E. Burgschmidt.
Derrida, Jacques. 1972. Positions: Entretiens avec Henri Ronse, Julia Kristeva, Jean-Louis
Houdebine, Guy Scarpetta. Paris: Les Editions de Minuit.
Dienhart, John M. 1998. A linguistic look at riddles. Journal of Pragmatics 31. 95–125.
Fix, Ulla. 2011. Das Rätsel: Bestand und Wandel einer Textsorte. Oder: Warum sich die
Textlinguistik als Querschnittsdisziplin verstehen kann. In Ulla Fix (ed.), Texte und
Textsorten – sprachliche, kommunikative und kulturelle Phänomene, 185–214. 2nd edn.
Berlin: Frank & Timme.
Furthmann, Katja. 2006. Die Sterne lügen nicht: Eine linguistische Analyse der Textsorte
Pressehoroskop. Göttingen: V&R unipress.
Geeraerts, Dirk & Hubert Cuyckens (eds.). 2007. The Oxford handbook of cognitive linguistics.
Oxford: Oxford University Press.
Genette, Gérard. 1982. Palimpsestes: La littérature au second degré. Paris: Éditions du Seuil.
Gilbert, Val. 2001. The Daily Telegraph: How to crack the cryptic crossword. London: Pan Books.
Goldblum, Naomi & Ram Frost. 1987. The crossword puzzle paradigm: The effectiveness of
different word fragments as cues for the retrieval of words. Haskins laboratories status
report on speech research SR-89/90. 133–146.
Greenblatt, Stephen (ed.). 1997. The Norton Shakespeare. Based on the Oxford Edition. London:
W. W. Norton & Company.
Greimas, Algirdas Julien. 1970. L’écriture cruciverbiste. In Algirdas Julien Greimas (ed.), Du sens:
Essais sémiotiques, 285–307. Paris: Éditions du Seuil.
Hambrick, David Z., Timothy A. Salthouse & Elizabeth J. Meinz. 1999. Predictors of crossword
puzzle proficiency and moderators of age–cognition relations. Journal of Experimental
Psychology: General 128(2). 131–164.
Hebel, Udo J. 1991. Towards a descriptive poetics of allusion. In Heinrich F. Plett (ed.),
Intertextuality, 135–164. Berlin & New York: Walter de Gruyter.
Heinemann, Margot. 2000. Textsorten des Alltags. In Klaus Brinker, Gerd Antos, Wolfgang
Heinemann & Sven F. Sager (eds.), Text- und Gesprächslinguistik. Ein internationales
Handbuch zeitgenössischer Forschung, 604–614. Berlin & New York: Walter de Gruyter.
Helbig, Jörg. 1996. Intertextualität und Markierung: Untersuchungen zur Systematik und
Funktion der Signalisierung von Intertextualität. Heidelberg: Universitätsverlag C. Winter.
Karrer, Wolfgang. 1985. Intertextualität als Elementen- und Struktur-Reproduktion. In Ulrich
Broich & Manfred Pfister (eds.), Intertextualität: Formen, Funktionen, anglistische
Fallstudien, 98–116. Tübingen: Niemeyer.
Kristeva, Julia. 1968. Le texte clos. Langages 12. 103–125.
Mieder, Wolfgang. 1985. Sprichwort, Redensart, Zitat: Tradierte Formelsprache in der Moderne.
Bern, Frankfurt am Main & New York: Peter Lang.
Mok, Quirinus Ignatius Maria. 1987. Mots croisés et ambiguïté. In Brigitte Kampers-Manhe &
Co Vet (eds.), Études de linguistique Française offertes à Robert de Dardel par ses amis et
collègues, 97–108. Amsterdam: Éditions Rodopi B. V.
Mollica, Anthony. 2007. Crossword puzzles and second-language teaching. Italica 84(1). 59–78.
Moorey, Tim. 2008. How to master the Times crossword: The Times cryptic crossword
demystified. London: Harper Collins Publishers.
Müller, Wolfgang G. 1991. Interfigurality: A study on the interdependence of literary figures. In
Heinrich F. Plett (ed.), Intertextuality, 101–121. Berlin & New York: Walter de Gruyter.
Nickerson, Raymond S. 2011. Five down, absquatulated: Crossword puzzle clues to how the
mind works. Psychonomic Bulletin & Review 18. 217–241.
N.N. 2009. The Sun two-speed crossword book 10. London: Harper Collins.
Parker, Timothy. 15.04.2013. Universal crossword. New York Post. New York: News Corporation.
Partington, Angela (ed.). 1992. The Oxford dictionary of quotations. 4th edn. Oxford & New
York: Oxford University Press.
Partridge, John G. 1992. Linguistic reflections on the English crossword puzzle. In Claudia Blank
(ed.), Language and civilization. A concerted profusion of essays and studies in honour of
Otto Hietsch, 495–504. Frankfurt am Main: Peter Lang.
Pepicello, William J. 1980. Linguistic strategies in riddling. Western Folklore 39(1). 1–16.
Pfister, Manfred. 1985. Konzepte der Intertextualität. In Ulrich Broich & Manfred Pfister (eds.),
Intertextualität: Formen, Funktionen, anglistische Fallstudien, 1–30. Tübingen: Niemeyer.
Pham, Teresa. 2014. Intertextuelle Referenzen auf Shakespeare. Eine kognitiv-linguistische
Untersuchung. Münster: LIT Verlag.
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive
grammar of the English language. Harlow: Longman.
Riffaterre, Michael. 1981. Interpretation and undecidability. New Literary History 12(2).
227–242.
Rolf, Eckard. 1993. Die Funktion der Gebrauchstextsorten. Berlin & New York: de Gruyter.
Rusbridger, Alan (ed.). 11.–16.05.2013. Quick crossword No. 13,418–13,422. London: Guardian
Media Group.
Scheufele, Dietram A. & David Tewksbury. 2007. Framing, agenda setting, and priming: The
evolution of three media effects models. Journal of Communication 57. 9–20.
Schlepper, Wolfgang. 1981. Confusing poet makes fine stuff (5): The “wrestle with words
and meanings” in the crossword puzzle. In Hans-Jürgen Diller, Stephan Kohl, Joachim
Kornelius, Erwin Otto & Gerd Stratmann (eds.), anglistik & englischunterricht. Vol. 15,
61–80. Trier: WVT Wissenschaftlicher Verlag Trier.
Schubert, Christoph. 2012. Englische Textlinguistik. Eine Einführung. 2nd edn. Berlin: Erich
Schmidt.
Schulte-Middelich, Bernd. 1985. Funktionen intertextueller Textkonstitution. In Ulrich Broich
& Manfred Pfister (eds.), Intertextualität: Formen, Funktionen, anglistische Fallstudien,
197–242. Tübingen: Niemeyer.
Scott, W. T. & H. O’Donnell. 1998. Recovering meaning from chaos? Word play and the challenge
of sense. In William Pencak & J. Ralph Lindgren (eds.), New approaches to semiotics and
the human sciences: Essays in honor of Roberta Kevelson, 203–239. New York: Peter Lang
Publishing.
Simpson, John A. & Edmund S. C. Weiner (eds.). 2015. Oxford English dictionary online. Oxford:
Oxford University Press. http://oed.com/ (accessed 27 January 2015).
Skinner, Kevin. 2008. How to solve cryptic crosswords. London: Right Way, Constable &
Robinson.
Stephenson, Hugh. 2007. Secrets of the setters. How to solve the Guardian crossword. London:
Atlantic Books.
Stocker, Peter. 1998. Theorie der intertextuellen Lektüre: Modelle und Fallstudien. Paderborn:
Ferdinand Schöningh.
Stratmann, Gerd. 1995. Kreuzworträtsel. In Rüdiger Ahrens, Wolf-Dietrich Bald & Werner Hüllen
(eds.), Handbuch Englisch als Fremdsprache (HEF), 192–195. Berlin: Erich Schmidt.
Underwood, Geoffrey, Caroline Deihim & Viv Batt. 1994. Expert performance in solving word
puzzles: From retrieval cues to crossword clues. Applied Cognitive Psychology 8. 531–548.
Weisskirch, Robert S. 2006. An analysis of instructor-created crossword puzzles for student
review. College Teaching 54(1). 198–201.
Witte, Kenneth L. & Joel S. Freund. 1995. Anagram solution as related to adult age, anagram
difficulty, and experience in solving crossword puzzles. Aging, Neuropsychology, and
Cognition 2(2). 146–155.
Section II:
Cross-register comparison
While the studies in Section I concentrated on single registers, Section II provides
cross-register comparisons, in which the distinctive features and markers of reg-
isters can be identified with great accuracy and perspicuity by means of juxtapo-
sition. As the contributions will show, such comparisons are particularly reveal-
ing when the registers under discussion are from clearly divergent domains. The
fact that each of the three papers in Section II includes academic writing demon-
strates that this register is highly distinctive and therefore well-suited as a yard-
stick for text-linguistic collation.
Christina Sanchez-Stockhammer’s study “Punctuation as an indica-
tion of register: Comics and academic texts” establishes a link to the papers by
Rolf Kreyer and Teresa Pham in Section I, since it also analyses a register from
popular culture, in this case the language of comics. At the same time, this con-
tribution enters uncharted linguistic territory by focusing on punctuation as a
register marker, which has been widely neglected so far despite its pervasive-
ness in written discourse. The study is based on two small-scale corpora, namely
AcadText, a corpus of journal articles, and CoCo, a comic corpus, both of which
were designed and compiled for register comparison by the author. It is shown
that different punctuation marks have varying functions and deviant frequencies
in relation to the written or spoken mode prominent in the registers. As a result,
features of punctuation are suggested as a valid and necessary extension of Biber
and Conrad’s (2009) model of register analysis.
In her paper “Linking up register and cognitive perspectives: Parenthetical
constructions in academic prose and experimentalist poetry”, Martina Lampert
chooses a specific linguistic feature as the standard of register comparison. By
concentrating on the syntactic construction of parenthesis, she draws an analogy
between a minimalist poem by E. E. Cummings and a scientific research paper
within the framework of a microscopic qualitative analysis. She picks two regis-
ters which are located at the opposite ends of a continuum of written discourse
and pays attention to punctuation marks as well, in this case to parenthetical
round brackets (so-called lunulae). Since situational features of register descrip-
tion are closely linked to cognitive principles, a correspondence is established
between Biber’s register analysis and Leonard Talmy’s cognitive semantic
approach. Lampert concludes by arguing that parenthesis should be included in
Biber and Conrad’s (2009) list of lexico-grammatical features relevant to register
investigation.
The study “Cohesive devices across registers and varieties: The role of
medium in English” by Stella Neumann and Jennifer Fest combines the compar-
ative analysis of academic writing, administrative writing, broadcast discussions,
conversations and exams with regional variation. The term “regional” is used
here and in the paper in the broader Hallidayan sense, which groups variation by the
speakers’ geographical background (variation by user), as opposed to functional variation,
which depends on the context of use. Based on data from the International Corpus of English,
functional variation is investigated within the six L1 and L2 Englishes of Singa-
pore, Hong Kong, India, Canada, Jamaica and New Zealand. An examination of
the lexico-grammatical features of pronouns, conjunctions and lexical density
sheds new light on the use of cohesive ties across both varieties and registers.
In particular, quantitative surveys show that there are significant differences in
the frequency of the cohesive items between spoken and written registers. Along
these lines, it becomes obvious that an exhaustive discussion of any regional or
national variety of English needs to take into account register variation as well, so
that text linguistics is shown to be an indispensable complement to sociolinguis-
tics. Moreover, this paper builds a bridge to Section III, in which the interrelation
between regional and register variation is further elucidated.
Punctuation as an indication of register:
Comics and academic texts
Abstract: The currently most established definition of a register is the one devel-
oped by Douglas Biber in numerous publications (e.g. Biber 1988, 1995, 2006),
namely “a variety associated with a particular situation of use” (Biber and Conrad
2009: 6). The delimitation of individual registers such as telephone conversations
or newspaper editorials is based on their situational context, their lexical and
grammatical characteristics and the functional relationship obtaining between
context and language.
While Biber’s multidimensional approach already considers a multitude
of different lexico-grammatical features as potential indicators of register, this
paper adds a new perspective by exploring a feature type which has not been
taken into account so far in the different versions of his model, namely punctu-
ation.
After discussing the functions of various punctuation marks, the paper pre-
sents the corpus-based evidence of a small-scale study on two registers tending
towards the extremes of the spoken – written dimension, namely academic texts
and comics. To this end, the corpus AcadText was compiled for the present study
by analogy to the comic component of the comic corpus CoCo (described in
Sanchez-Stockhammer 2012), which comprises excerpts from Superman, Batman
and Uncle Scrooge and considers the text occurring in text boxes with narration,
inside speech bubbles, as onomatopoeia superimposed on the pictures etc.
The results show that some punctuation marks (such as exclamation marks
and round brackets) correlate strongly with spoken and written style respectively
and barely occur in the contrasting register. Furthermore, even in those cases
where the results are quantitatively similar, differences in usage become obvious
upon closer consideration – e.g. the dominant use of commas after introduc-
tory interjections or proper nouns with vocative function in comics as compared
to more varied uses of that punctuation mark in academic texts. These results
suggest that punctuation is indicative of register indeed, and that it makes sense
to introduce punctuation as an additional category in Biber’s register model.
140 Christina Sanchez-Stockhammer
1 Introduction: Punctuation and register

Many features of language occur in both speech and writing, but some are spe-
cific to one of these two modalities: thus phonetic assimilation phenomena and
intonation are by necessity restricted to the spoken modality, since they concern
auditory phenomena, whereas punctuation immediately comes to mind as a
visual linguistic feature that only occurs in writing. While it is sometimes claimed
that punctuation acts as a substitute for prosody and pauses in the written modal-
ity, Meyer (1987: 69) notes that “punctuation is at best a rather crude reflection
of the complexities of prosody” and that the relation between the two is unsys-
tematic. Thus commas are sometimes but not always used in contexts where one
would expect a pause in speech – and sometimes they occur in contexts without
a prosodic juncture: for example, the sentence
(1) Those who are fond of sleeping late make unreliable workers.
is usually spoken with a pause after late, but it does not contain a comma if
common spelling conventions are adhered to (Meyer 1987: 70). By contrast, the
sentence
(2) A couple of the males made good comedy, too.
is realised with a comma but arguably not produced with a pause in speech
(Meyer 1987: 71). This raises the question whether the reverse relation between
punctuation as the primary feature and prosody as its realisation in speech can
also be postulated. One of the few exceptions where it is claimed that a feature of
the written modality is rendered in oral communication is the use of so-called air quotes,
which are drawn into the air manually while speaking and which “intermodally”
refer the listeners/receivers to the printed source of a spoken quote (cf. Lampert
2013). However, air quotes rely on visual gestures rather than prosody. By con-
trast, punctuation marks are produced orally on some occasions, such as when
separating whole numbers from decimals, e.g. in
(3) one point five.
However, in such cases it is actually the terms referring to the punctuation marks
that are realised in the spoken modality rather than the corresponding function of
the punctuation mark. A third conceivable option regarding the relation between
punctuation and prosody is that there is none: for instance, Nunberg (1990: 7)
argues that punctuation has no correspondence in speech and that it exploits
“the particular expressive resources that graphical presentation makes available”
in order to serve the requirements of written communication. Yet whatever
the relation between speech and writing is recognised to be, what remains is that
punctuation constitutes a characteristic feature of written language. This raises
the question whether it is possible to recognise general principles of punctuation
underlying all written language use, which are common to all written registers,
or whether it is more appropriate to consider more specific tendencies in the use
of punctuation marks in particular communicative situations. For instance, the
combination <…???!!!> seems acceptable in the sentence
(4) She did what…???!!!
whereas one is extremely unlikely to encounter an example such as
(5) The increasing evidence that language processing is sensitive to lexical and structural
co-occurrences at different levels of granularity and abstraction has led to the hypothesis
that lexical and structural processing may be unified…???!!!
in actual language usage – at least not in the original context of use.1 Various
explanations can be advanced for this: for instance, the first sentence is very
short and therefore lends itself to the incredulous intonation associated with
such a cluster of punctuation marks far better than the second sentence with its
complex structure. Possibly even more importantly, the second sentence contains
information situating it in the register of written academic language (it has been
adapted from Snider’s 2009 article “Similarity and structural priming”), and it
would seem that the punctuation above is unusual for an academic text to say
the least. This discrepancy between the constructed example above and readers’
expectations suggests that language users tend to expect particular types of punc-
tuation mark and their combination in some types of text rather than in others.
If that is the case, then it should also be possible to use punctuation marks as
an indication or even marker of individual registers – a hypothesis that will be
explored in the remainder of this paper.
1 By contrast, it is conceivable to encounter the example in an online discussion forum or blog
with reference to unclear academic writing. (I am grateful to my anonymous reviewer for pointing
this out to me.) In that case, however, the sentence (which is a quotation) and the punctuation
marks (which represent a comment) are situated on different linguistic levels. This is yet
another example of the more general observation that texts with a metalinguistic function in the
sense of Jakobson (1985) may depart from common usage. As a consequence, texts on linguistics
should ideally be avoided in the compilation of general-language corpora.
Following Peters (2004: 447), the present contribution distinguishes between
word punctuation (comprising e.g. hyphens and apostrophes occurring within
unspaced sequences of letters) and sentence punctuation, and it concentrates on
the latter. Sentence punctuation is usually characterised by the use of a space on
that side of the punctuation mark which is not directly attached to a preceding or
following sequence of letters and comprises
full stop .
question mark ?
exclamation mark !
comma ,
semicolon ;
colon :
dash –
slash /
suspension dots …
single quotation marks ‘’
double quotation marks “”
round brackets ()
square brackets [].
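The distinction between word punctuation and sentence punctuation can be illustrated with a minimal Python sketch. It rests on a simplifying assumption that is not part of Peters’ (2004) account: a mark counts as word punctuation when a letter is directly attached to it on both sides, and as sentence punctuation otherwise. The function name classify_marks and the example sentence are hypothetical. The heuristic is deliberately crude: a slash as in listeners/receivers would be treated as word punctuation although the slash is listed among the sentence punctuation marks above, and a plural possessive apostrophe as in speakers’ would be treated as sentence punctuation.

```python
import re

# Any character that is neither a word character nor whitespace is
# treated as a punctuation mark (assumption made for this sketch only).
MARK = re.compile(r"[^\w\s]")


def classify_marks(text):
    """Label each mark as 'word' or 'sentence' punctuation.

    Heuristic (an assumption, not a rule from the literature): a mark
    with a letter directly on both sides is word punctuation.
    """
    result = []
    for match in MARK.finditer(text):
        i = match.start()
        before = text[i - 1] if i > 0 else " "
        after = text[i + 1] if i + 1 < len(text) else " "
        kind = "word" if before.isalpha() and after.isalpha() else "sentence"
        result.append((match.group(), kind))
    return result


# Hypothetical example sentence
print(classify_marks("Can’t you find a well-suited yardstick, then?"))
# prints: [('’', 'word'), ('-', 'word'), (',', 'sentence'), ('?', 'sentence')]
```

Only the marks labelled 'sentence' would enter a count of sentence punctuation; a fuller treatment would need mark-specific rules for the cases that the heuristic misclassifies.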
Register as the second concept which needs to be defined for the empirical study
presented here is used with different meanings in the literature (cf. Schubert,
this volume). The most commonly used definitions of register are based on the
work of Douglas Biber. In numerous publications (e.g. Biber 1988, 1995, 2006),
his use of the term has developed from what might be called a synonym of genre
(Biber 1995: 9–10 about Biber 1988) to “a variety associated with a particular
situation of use” (Biber and Conrad 2009: 6), i.e. a concept comprising all
situation-dependent variation in language use, regardless of the level of specialisation
(Biber and Conrad 2009: 32), but with specific sub-registers displaying less variation
than more general registers (Biber and Conrad 2009: 33). In Biber’s model
(for a summary cf. Schubert, this volume), register features occur throughout texts
from a particular register and are more frequent in the target register than in most
other registers. Thus the passive voice is not restricted to academic writing and
may occur in different types of text, but it is particularly frequent in that register.
Register features can be structures on any linguistic level, from words to syntactic
constructions. The occurrence of specific lexico-grammatical features in regis-
ters is attributed to their functionality (Biber 2006: 11): they are believed to be
“particularly well suited to the purposes and situational context of the register”
(Biber and Conrad 2009: 6). The co-occurrence of features is therefore interpreted
as reflecting their shared functions (Biber 1995: 30). With regard to the features
under consideration, Biber’s approach has evolved in the course of time:
– Both Biber (1988: 73–75) and Biber (1995: 94–96) consider 16 major categories
comprising 67 linguistic features:
1) Tense and aspect markers
2) Place and time adverbials
3) Pronouns and pro-verbs
4) Questions
5) Nominal forms
6) Passives
7) Stative forms
8) Subordination features
9) Prepositional phrases, adjectives and adverbs
10) Lexical specificity
11) Lexical classes
12) Modals
13) Specialised verb classes
14) Reduced forms and dispreferred structures
15) Co-ordination
16) Negation.
– These are reduced to seven major categories in Biber (2006: 241):
1. vocabulary distributions (e.g., the number of different words in classroom teaching
versus textbooks), including the distributional classifications of words from the four
content word classes (e.g., common vs. rare nouns, common vs. rare verbs);
2. grammatical part-of-speech classes (e.g., nouns, verbs, first and second person pro-
nouns, prepositions);
3. semantic categories for the major word classes (e.g., activity verbs, mental verbs, exist-
ence verbs);
4. grammatical characteristics (e.g., nominalizations, past tense verbs, passive voice
verbs);
5. syntactic structures (e.g., that relative clauses, to complement clauses);
6. lexico-grammatical associations (e.g., that-complement clauses and to-complement
clauses controlled by communication verbs vs. mental verbs);
7. lexical bundles – i.e. recurrent sequences of words.
– Biber and Conrad (2009: 78–82), by contrast, classify their 75 subcategories
(some of which can be split up further) into 15 major categories:
1) Vocabulary features
2) Content word classes
3) Function word classes
4) Derived words
5) Verb features
6) Pronoun features
7) Reduced forms and dispreferred structures
8) Prepositional phrases
9) Coordination
10) Main clause type
11) Noun phrases
12) Adverbials
13) Complement clauses
14) Word order choices
15) Special features of conversation.
Without going into detail about what precisely these various categories represent, it
is immediately obvious that punctuation or other orthographic characteristics
(such as capitalisation) do not figure among the distinctive features treated
in any of Biber’s approaches, in spite of the fact that Biber (1995: 29) maintains
that “[a]ny linguistic feature having a functional or conventional association can
be distributed in a way that distinguishes among registers”.
This raises the question whether there are any arguments supporting the
deliberate exclusion of punctuation as a distinctive feature. Based on Biber’s defi-
nition above, one might consider arguing that punctuation does not constitute a
linguistic feature – but this is hard to maintain: while punctuation is restricted
to the written modality, it is used nonetheless to represent linguistic meaning
(cf. below). Punctuation marks may even reverse the meaning of a sentence com-
pletely; compare
(6) The Democrats say the Republicans are sure to win the next election.
in which the Republicans are the assumed victors, as against
(7) The Democrats, say the Republicans, are sure to win the next election.
In the second example, the Democrats are expected to be victorious (cf. Runkel
and Runkel 1984: 34). In view of its meaning-distinguishing function, punctua-
tion should consequently be considered a linguistic feature.
If punctuation had no conventional or functional association, as required by
the definition of linguistic features above, it should be possible to use all punctu-
ation marks interchangeably. This is, however, not the case (cf. the next section).
Since punctuation is restricted to writing, using it as a feature would seem
to have the disadvantage of disregarding all registers belonging to spoken lan-
guage. This is, however, only true to a certain extent, since spoken texts may be
transcribed (e.g. in interviews for magazines or in corpora), and punctuation is
conventionally inserted for the convenience of the reader in such cases. The rela-
tion between the two dimensions is clarified by Söll and Hausmann (1985: 17),
who distinguish between the medium of realisation (auditory vs. visual code) as
opposed to the characteristics of conception (spoken vs. written style). Punctuation
is thus only present in the visual code but may be used in texts belonging
to either the spoken or the written style. Söll and Hausmann’s distinction is thus useful
e.g. in view of the possibilities offered by computer-mediated communication,
which may combine the visual code with some kind of spoken style. Note also that Biber
and Conrad’s (2009: 78–82) long list of linguistic features includes a subcategory
“Special features of conversation”, which is restricted to a subgroup of registers
with a tendency towards oral realisation and includes e.g. pauses, fillers and
backchannels. As a consequence, the addition of a subcategory “Punctuation”,
which applies to registers in the visual code only, would appear to be legitimate.
Furthermore, one should not overlook the fact that Biber and Conrad
(2009: 63) speak of a “list of features that you might consider” in register anal-
ysis, which means that they do not claim completeness. They also state that
“[C]onsulting a corpus-based reference grammar is useful for deciding which fea-
tures to study”. Since punctuation is only marginally treated in such grammars,
possibly in view of written language’s widely assumed status as a secondary
system (cf. e.g. Bloomfield 1933: 21), this may have led to its omission from the
most influential model of register so far.
To conclude, there are no convincing reasons for excluding punctuation as
a possible register feature. Instead, it is argued in the following that there are
several good reasons for considering it.
2 Functions of the punctuation marks

Biber’s approach is based on the premise that “linguistic features co-occur in
texts because they reflect shared functions” (Biber 1995: 30). This means that it
should be possible to establish a link between the punctuation marks occurring
in texts (and their functions) and the various lexico-grammatical register features
discussed in the previous literature (with their corresponding functions linking
them to the situational context and the communicative purpose of the respective
register). If that were indeed the case, it should be possible to make an informed
guess about (or even recognise) the register of a text based solely on the punctua-
tion marks occurring in that text. The following illustrative passages are extracts
from example texts used in Biber and Conrad (2009). Since these “illustrate the
linguistic patterns found in previous large-scale analyses of these registers”
(Biber and Conrad 2009: 64), they can be considered prototypical representatives
of the corresponding registers and should also fulfil that role with regard to punc-
tuation.
: . . . : ? : . : . : ? [ ] : ? : .
Figure 1: Punctuation from text A
Figure 1 constitutes a sequence of punctuation marks which were extracted from
a short text (cf. below) by deleting everything except the punctuation marks.
Spaces were then added to make the punctuation marks more clearly discernible.
Even in this reduced format, which is void of any lexical or syntactic content, it
is possible to form some idea about the communicative situation of the text. The
task is made easier if paragraph breaks are conserved as well:
: . . .
: ?
: .
: .
: ? [ ]
: ?
: .
Figure 1a: Punctuation from text A with paragraph breaks
The most striking feature is presumably the occurrence of a colon at the begin-
ning of every line, which is followed by either full stops or a question mark,
thereby suggesting an interactive communicative situation. Indeed, the text is
part of a conversation between a group of friends walking to a restaurant, which
is included in the Longman Spoken and Written English Corpus:
Judith: Yeah I just found out that Rebekah is going to the University of Chicago to get
her PhD. I really want to go visit her. Maybe I’ll come out and see her.
Eric: Oh is she?
Judith: Yeah.
Eric: Oh good.
Elias: Here, do you want one? [offering a candy]
Judith: What kind is it?
Elias: Cinnamon.
Text A: Text sample 1.1 from the LSWE Corpus (Biber and Conrad 2009: 7–8)
The colons in the full text are actually not line-initial but follow the names of the
speakers, just as they would in the scripted version of a play. Following the same
type of convention, the information referring to the extra-linguistic context has
been added in square brackets at the end of one line. The punctuation marks are
thus strongly indicative of conversation.
: ! . : ? : ! : ! : < . > . ! : < . > ? : !
Figure 2: Punctuation from text B
The same is true of Figure 2. The large number of exclamation marks, colons,
question marks and (this time angled) brackets in Text B makes it highly unlikely
that the text should be a tax declaration document or newspaper article. While
the fact that it is an excerpt from a drama – i.e. scripted speech – and no transcript
of a conversation cannot be deduced from punctuation alone, the oral dimension
of the text emerges by analogy to Text A.
RUTH: I want to go! I promised Chris Burns I’d meet him.
BEATRICE: Can’t you understand English?
RUTH: I’ve got to go!
BEATRICE: Shut up!
RUTH: <Almost berserk.> I don’t care. I’M GOING ANYWAY!
BEATRICE: <Shoving RUTH hard.> WHAT DID YOU SAY?
TILLIE: Mother!
Text B: Text sample 1.7 from Biber and Conrad (2009: 20): Paul Zindel’s 1970 drama The
Effect of Gamma Rays on Man in the Moon Marigolds
This raises the question of which typical register features are linked to punctuation.
For example, the large number of first and second person pronouns typical of
spoken conversation (Biber and Conrad 2009: 7–8) – which is supported by the
prototypical extracts above – cannot be derived from punctuation. By contrast,
another characteristic linguistic feature can: the pervasiveness of questions,
which are usually marked by sentence-final question marks in many (but not all)
transcripts of spoken language, e.g. in Text A, and in texts that are written to be
spoken (e.g. Text B). The presence of question marks can thus be linked to the
presence of questions: both are indicative of interaction (cf. Biber and Conrad
2009: 7–8). Since questions favour the production of answers as the privileged
second pair part (Levinson 1983: 307), full stops following question marks are
likely to represent not only statements but answers. This assumption is supported
by Texts A and B above. According to Biber (1988: 227), questions “indicate a
concern with interpersonal functions and involvement with the addressee”. It
follows from this that they should be more frequent in registers involving that
function, occurring e.g. more frequently in riddles2 than in front-page newspaper
articles (cf. Biber and Conrad 2009: 7–8) and also in scripted or transcribed
conversation.
Comparing the four main types of pragmatic discourse function and the
syntactic sentence types in Quirk et al. (1985: 803–804) with the punctuation
marks used in the examples of that grammar reveals that this correlation is no
coincidence: there is a strong link between
– statements (which mainly convey information),
declaratives (in which the subject usually precedes the verb) and
full stops,
e.g. The Prime Minister resigned.
– questions (which usually seek information),
interrogatives (characterised by inversion, e.g. of subject and operator, or
sentence-initial wh-question words) and
question marks,
e.g. Did the Prime Minister resign? or What did the Prime Minister do?
– directives (which are mainly used to instruct someone to do something),
imperatives (which have no subject and whose verb is in the base form) and
exclamation marks,
e.g. Leave me alone!
– exclamations (in which speakers express the extent to which they are
impressed),
exclamatives (which begin with what/how and usually have no subject-verb
inversion) and
exclamation marks,
e.g. What a funny hat!
It therefore seems safe to claim that the punctuation marks closing sentences
follow a prototype-based distribution (cf. e.g. Rosch 1973, 1975) with an ideal
exemplar in the centre of the category and fuzzy boundaries in its periphery. The
latter would include less typical uses, such as
(8) I’d love a cup of tea.
which is a declarative from the perspective of syntax but pragmatically a directive,
inciting the hearer to serve a hot drink (Quirk et al. 1985: 804). The punctuation
with a full stop in Quirk et al. (1985) for this particular example seems to
suggest that in doubtful cases, punctuation follows the syntactic rather than the
pragmatic perspective. While the use of an exclamation mark does not seem to be
entirely excluded in this particular example (even if an informal internet search
confirms the full stop as the norm), other indirect speech acts such as Searle’s
(1975: 73) famous
(9) Can you pass the salt?
which is syntactically a question but pragmatically a directive, definitely require the
syntactically-based question mark. By contrast, the use of an exclamation mark
making
(10) Can you pass the salt!
slightly more explicitly directive would seem quite unusual. As a consequence,
we may conclude that there is a strong correlation between punctuation marks
and particular grammatical structures – stronger even than with discourse
functions, although in direct speech acts both aspects will often coincide.
2 Note, however, that puzzles need not necessarily be phrased as questions, e.g. in the case of
crosswords (cf. Pham, this volume), which tend not to use question marks.
The communicative purposes of a register determine its discourse functions
and the syntactic structures associated with these – which are in turn linked to
particular prototypical punctuation marks. However, some registers may simply
not require particular types of expression: for instance, instruction manuals do
not usually engage in mutual interaction with their readers. As a consequence,
one would not expect them to contain any questions and consequently no ques-
tion marks (except, possibly, the occasional rhetorical question to guide their
readers more vividly).
Note, however, that the conventions of particular registers may require the use
of particular punctuation marks in spite of communicative purposes or favoured
syntactic sentence types which would prototypically result in the use of a different
punctuation mark: thus recipes are directive and use a considerable number
of verbs in the imperative (cf. Arendholz et al. 2013), but they rarely contain any
exclamation marks. This would seem to imply that the conventions associated
with particular registers can override more general punctuation tendencies.
The next extract of punctuation also belongs to a highly conventionalised
register.
( ). ( ) , ( . , . , . ). , ( ; . ). , ( , ; . ; . ; . ; ). ( ) , ( . ). . . ( ) .
Figure 3: Punctuation from text C

This sample is not only characterised by its complete lack of question marks and
exclamation marks but also by a large proportion of full stops and brackets, many
commas and even some semicolons. It comes from the introduction to a scientific
research article and is thus situated clearly towards the extreme of the written
dimension of language conception.
Hybridization between species can severely affect a species status and recovery (Rhymer &
Simberloff 1996). Threatened species (and others) may be directly affected by hybridization
and gene flow from invasive species, which can result in reduced fitness or lowered genetic
variability (Gilbert et al. 1993, Gottelli et al. 1994, Wolf et al. 2001). In other cases, hybridiza-
tion may provide increased polymorphisms that allow for rapid evolution to occur (Grant &
Grant 1992; Rhymer et al. 1994). Species can also be influenced indirectly, because hybrid-
ization may affect the conservation status of threatened species and their legal protection
(O’Brien & Mayr 1991a, 1991b; Jones et al. 1995; Allendorf et al. 2001; Schwartz et al. 2004;
Haig & Allendorf 2005). The Northern Spotted Owl (Strix occidentalis caurina) is a threat-
ened subspecies associated with rapidly declining, late-successional forests in western
North America (Gutierrez et al. 1995). Listing of this subspecies under the U.S. Endangered
Species Act (ESA) attracted considerable controversy because of concern that listing would
lead to restrictions on timber harvest.
Text C: Text sample 6.13 from Biber and Conrad (2009: 163): Scientific research article
(Genetic identification of Spotted Owls … , Conservation Biology, 2004).
While scientific research attempts to answer research questions, these are usually
formulated indirectly, with the consequence that the number of direct questions
and the ensuing question marks is relatively low (although not necessarily zero).
Exclamation marks, by contrast, seem to be practically excluded in this register.
This is presumably because the discourse functions usually associated with that
punctuation mark (cf. Quirk et al. 1985: 803–804 above) contradict the general
principles of academic research: it is neither directive (at least not overtly) nor
concerned with the expression of emotions such as being impressed. These con-
ventions are communicated between researchers, e.g. by supervisors marking
their students’ papers or by means of style guides.3
3 Note, however, that very popular style guides giving advice on academic research, such as
Booth et al. (2008), do not mention punctuation (merely style), and others, such as Swales and
Feak (2010: 27), limit themselves to the discussion of semicolons, colons, dashes and commas.
The occurrence of large numbers of full stops in this particular passage is due
not only to the focus of research papers on transmitting information but also to
the frequent occurrence of the abbreviation et al., which is rarely found outside
academia. The use of brackets is also highly conventionalised: with few exceptions
containing additional explanations, most brackets contain references to
other texts. This supports the view that particular punctuation marks tend to
correlate with particular registers, and that some punctuation marks are employed
following register-specific conventions which are particularly adequate for the
communicative needs of the register in question. In academic research, this
includes the need to refer to previous research in a clear and unobtrusive way.
If we take all of the above into account, a question that emerges is whether
there are any general functions of punctuation marks which may be put to spe-
cific ends in individual registers. According to Huddleston and Pullum (2002:
1729–1730), punctuation can be ascribed four main functions from a general per-
spective:
– indicating boundaries (e.g. full stops mark the end of sentences)
– indicating status (e.g. question marks indicate that a sentence is a question)
– indicating omission (e.g. …)
– indicating linkage (e.g. commas mark that units belong together).4
4 For a more detailed theoretical account of the guide functions of punctuation cf. Patt (2013).
A more specific but nonetheless brief overview of the functions of individual
punctuation marks is provided by Seely (2007: 16–124): the
● full stop
  ○ marks the ends of sentences
  ○ marks complete groups of words
  ○ ends abbreviations
  ○ acts as a separator in e-mail and website addresses
● question mark
  ○ marks the end of a question
  ○ marks statements as doubtful or questionable, e.g. in brackets
● exclamation mark
  ○ ends exclamations
  ○ ends loud or shouted direct speech
  ○ ends sentences expressing amusement
  ○ is used in brackets to express amusement or irony
● comma
  ○ separates items in lists
  ○ encloses sentence parts parenthetically
  ○ marks the divisions between the clauses in complex sentences
  ○ separates sections of sentences or numbers consisting of more than four digits to make
    them easier to read
  ○ introduces or ends direct speech
● semicolon
  ○ lists items which are very long
  ○ marks a break between two parts of a sentence, which are usually finite clauses that
    could stand on their own, in order to show the close link between them
● colon
  ○ introduces lists
  ○ introduces direct speech or quotations
  ○ separates two parts of a sentence of which the first leads on to the second
● dash
  ○ encloses sentence parts parenthetically
  ○ introduces something which further develops or exemplifies what has been written before
  ○ introduces asides by the writer
  ○ shows interruptions or break-offs in mid-sentence (in direct speech)
● slash
  ○ indicates alternatives
  ○ shows a range
  ○ is used in some abbreviations (e.g. c/o)
● suspension dots
  ○ reduce the length of quotations
  ○ show incompleteness in direct speech
● quotation marks
  ○ separate direct speech, titles or quotations or ideas marked as not being the author’s
● brackets
  ○ indicate that the words enclosed within are not essential to the meaning of the sentence
    but provide supplementary information.
Even if this account necessarily simplifies a more complex situation, it provides
a good point of departure for the consideration of more specific uses of the
punctuation marks.
Since full stops are used at the end of statements, they seem to represent a
relatively unmarked punctuation mark. They do, however, change their function
and become more marked as soon as they are combined into suspension dots,
which signal omission.
Question marks are apparently only placed at the end of direct questions,
and direct questions always end with a question mark. Even the seemingly excep-
tional sceptical use listed above can be interpreted as shorthand for a question
such as “Is that true?”, e.g. in
(11) There is no such thing as a free lunch. (?)

In most other cases, however, the relation is not as unequivocal, because the
punctuation marks have several functions (some of which may overlap with the
functions of other punctuation marks): as we have seen, colons can be used to
set off the names of characters in a play from their text, but very frequently, they
are followed by explanations or specifications and they can therefore commonly
be found in registers with an argumentative function, such as academic papers.
Alternatively, additional information may be included in brackets or follow-
ing a dash,5 but different degrees of formality are associated with the various
punctuation marks. According to Seely (2007: 84), brackets are “the most formal
(and most obvious) way of showing parenthesis”, commas are “less forceful”
and dashes “the least formal”. This seems to imply that a superficial analysis of
punctuation marks does not suffice: it is not enough to simply count the number
of commas, question marks etc. (not even if the number of words in the texts
is taken into consideration), but it is also necessary to consider their individual
functions and possibly even their stylistic value. This is the only means of iden-
tifying highly conventionalised register-specific uses, such as initial exclamation
marks expressing negation (e.g. !interesting = not interesting) in “hacker-influ-
enced interactions” (Crystal 2001: 90) or the specialised use of double quotation
marks in comics (cf. below).
3 Punctuation in comics vs. academic texts

In order to confirm or reject the hypothesis that punctuation can serve as an indi-
cation of register and to identify register-specific usage of punctuation, a small-
scale empirical study was conducted. Since register characteristics become most
obvious if very different registers are analysed contrastively (Biber and Conrad
2009: 8), a register with a relatively strong tendency towards spoken conceptu-
alisation (namely comics) was contrasted with a register tending towards the
written extreme (namely academic texts). For the first of these, the comic compo-
nent of CoCo, the Comic Corpus described in Sanchez-Stockhammer (2012), was
used.
5 Cf. Lampert (this volume) for a detailed treatment of parenthesis.

Table 1: The Comic Corpus (CoCo) texts (cf. Sanchez-Stockhammer 2012: 68)
Text             Words   Sentences   Words per sentence
Batman             868         153                 5.67
Superman           744         101                 7.37
Uncle Scrooge      774         140                 5.53
The language in comics considered in the compilation of CoCo occurs in headings,
text boxes with narration, speech bubbles, thought bubbles and subtitles
(common particularly in cartoons – which are part of CoCo but were not consid-
ered in the present study), as onomatopoeia superimposed on the pictures and as
written language within the picture (e.g. inscriptions on signs; cf. Sanchez-Stock-
hammer 2012: 58–59). Combinations of punctuation marks were also encoded –
notably suspension dots, which can also be considered a complex punctuation
mark. Neither emoticons (e.g. < :-) >) nor obscenicons (e.g. <!?#*&>) as emotion-
ally loaded combinations of punctuation marks occurred in the dataset.6 Non-
linguistic semiotic means (such as the shapes of bubbles used to indicate that
their content is spoken, thought, shouted etc.) were not taken into consideration,
either.
The corpus of academic texts AcadText was compiled specifically for the
present study. It contains three research articles from high-quality journals: one
theoretical text (Schneider 2003), one empirical study (Juhasz et al. 2003) and
one text by Biber and two co-authors, namely Susan Conrad and Randi Reppen
(Biber et al. 1994).7
6 While the absence of emoticons can be explained by the fact that the multimodality of comics
permits the representation of facial expression in a more detailed manner by the drawn faces
of the interlocutors, the absence of obscenicons from the corpus is presumably due to chance.
However, since the expression of anger in comic strips seems to use mainly question marks and
exclamation marks from the set of the punctuation marks, while frequently using symbols (e.g.
<@>, <#>, <$>, <%>, <&> and <*>) and also drawings of spirals etc. (cf. Law 2010), the treatment
of obscenicons belongs to the periphery of the use of punctuation marks anyway.
7 Since academic English is a register with a particularly strong lingua franca element and since
all articles in AcadText come from high-quality journals and have consequently undergone
intense editing, the native language of the authors was expected to play only a marginal role. While
the individual author Schneider has a German-language background, either all or the majority
of the authors of the jointly written articles were working at universities in English-speaking
countries at the time of publishing.
Following the same approach as in the compilation of the comic corpus wherever
possible, all full sentences (including footnotes) and tables were taken from
the first two pages with numbers ending in zero from each article. End-of-line
hyphens were deleted and m-dashes flanked by spaces. Word-internal bracketing,
e.g. in (semi-)automatic, was deleted so as not to skew the automated
counts. While full stops, question marks and quotation marks counted as sentence
endings, colons and semicolons were considered sentence-internal. Headings
and rows in tables counted as one sentence each. It becomes immediately
obvious that the number of words per sentence is considerably larger in the
academic texts than in the comics.
Table 2: The Corpus of Academic Texts (AcadText)
Text                   Words   Sentences   Words per sentence
Biber et al. (1994)      892          35                25.49
Juhasz et al. (2003)   1,037          40                25.93
Schneider (2003)       1,103          25                44.12
Since language in comics is heavily constrained by spatial restrictions and mainly
contains the written representation of spoken-style language from conversations
between speakers, comics as a register should contrast with what is already
known from previous research about more prototypically written registers – such
as academic texts. From a statistical perspective, the hypothesis H1 is therefore
that comics and academic texts should differ in their use of the punctuation
marks. H0 is consequently that comics and academic texts do not differ in this
respect.
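With only three texts per register, what a test of H0 can show is limited. The following sketch, which is an illustration in plain Python and not the SPSS procedure reported later in the chapter, computes an exact two-sided Mann–Whitney U p-value by enumerating every way of splitting the six pooled values into two groups of three. The function names are assumptions; the input values are the normalised question-mark frequencies from Table 3.

```python
from itertools import combinations

def u_statistic(a, b):
    """Number of pairs (x, y) with x from a and y from b where x > y; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

def exact_mann_whitney_p(a, b):
    """Exact two-sided p-value obtained by enumerating all regroupings of the pooled data."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    mean_u = n_a * n_b / 2
    observed = abs(u_statistic(a, b) - mean_u)
    indices = range(len(pooled))
    hits = total = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Count regroupings at least as extreme as the observed one.
        if abs(u_statistic(group_a, group_b) - mean_u) >= observed:
            hits += 1
    return hits / total

comics = [20, 22, 14]     # question marks per 1,000 words (Table 3)
academic = [0, 1, 0]
print(exact_mann_whitney_p(comics, academic))  # 0.1
```

There are only 20 ways of dividing six texts into two groups of three, so even a complete separation of the two registers yields p = 2/20 = 0.1, which lies above the conventional 0.05 threshold.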
In view of the assumed register-specifics, we can formulate the following
more specific expectations regarding punctuation in comics: one may expect
1. a relatively large proportion of question marks and exclamation marks
(due to the spoken character of this register)
2. no quotation marks
(because direct speech is already marked as such by its inclusion in speech
bubbles)
3. few commas
(because the sentences in comics are presumably relatively short due to
spatial restrictions)
4. few semicolons
(for the same reason as for the commas)
5. few colons
(due to spatial restrictions and the fact that the speakers in a conversation are
indicated by the pointed side of speech bubbles in contrast to usual scripted
conversation)
6. fewer brackets than dashes
(because these represent the most and least formal punctuation marks indi-
cating parenthesis according to Seely 2007: 84)
7. a certain number of suspension dots
(in order to permit longer sentences to continue in the following speech
bubble).
By contrast, academic texts as a written register are expected to contain

1. a very small proportion of question marks and exclamation marks
(due to the written character of this register)
2. a certain proportion of quotation marks
(in order to mark passages that were taken over verbatim from another author)
3. many commas
(because the sentences in academic texts are presumably relatively long due
to the complexity of the subjects treated)
4. many semicolons
(for the same reason as for the commas)
5. many colons
(because these provide links between sentences and are also used to refer to
precise pages in references)
6. more brackets than dashes
(because these represent the most and least formal punctuation marks indi-
cating parenthesis according to Seely 2007: 84)
7. a few suspension dots
(signalling omission in quotations).
For the quantitative analysis of the punctuation marks, all letters and numbers in
the original corpus texts were deleted, and the punctuation marks were counted
semi-automatically by using the “replace” function in Microsoft Word. The
results in Table 3 were normalised by dividing the absolute results by the number
of words in the respective texts, then multiplying them by a thousand (in order to
increase readability) and finally rounding them up or down to yield full numbers.
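The normalisation can be written out as a short function. This is a sketch under two assumptions: the function name is hypothetical, and halves are rounded upwards, which the text does not specify. The first call uses the single question mark in the 1,037-word text by Juhasz et al.; the absolute count in the second call is hypothetical.

```python
import math

def per_thousand_words(count, n_words):
    """Normalise an absolute count to a whole-number rate per 1,000 words."""
    rate = count / n_words * 1000
    # Rounding halves upwards is an assumption of this sketch.
    return math.floor(rate + 0.5)

print(per_thousand_words(1, 1037))   # 1
print(per_thousand_words(68, 868))   # 78 (hypothetical count of 68 in an 868-word text)
```

Normalising in this way makes texts of different lengths comparable, but rounding to whole numbers means that a rate below 0.5 per 1,000 words is reported as 0.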
Table 3: Normalised results (divided by the number of words per text, multiplied by 1,000 and
rounded)
                                  Comics                           Academic texts
                                  Batman  Superman  Uncle Scrooge  Biber et al.  Juhasz et al.  Schneider
Full stops                            78        50              4            53             69         24
Question marks                        20        22             14             0              1          0
Exclamation marks                     53        36            134             0              0          0
Commas                                60        62             37            62             72         71
Semicolons                             0         0              0             4              1          2
Colons                                 0         1              0             1              0         11
Dashes                                16         3              0             1              0          1
Slashes                                0         0              0             4              0          0
Suspension dots                       40        43             18             0              0          1
Single quotation marks (pairs)         0         0              0             3              0          9
Double quotation marks (pairs)         2         5              1             0              0          1
Round brackets (pairs)                 0         0              0            18             41         12
Square brackets (pairs)                0         0              0             0              0          0
Apostrophes                           58        43             71             1              4          5
For each line (i.e. for each punctuation mark), shaded cells indicate intra-group
similarity and inter-group dissimilarity between comics and academic texts. This
is either based on a very obvious difference in the results (e.g. for the suspension
dots) or, in some cases, on the presence of at least two values larger than zero in
one type of register as against all-zero in the three texts from the other register
(e.g. for the semicolons).
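The second criterion is mechanical enough to be stated as code. The sketch below covers only that criterion (at least two non-zero values in one register against all-zero values in the other); the "very obvious difference" criterion is a matter of judgement and is not modelled. The function name is an assumption, and the rows are taken from Table 3.

```python
def presence_contrast(group_a, group_b):
    """True if one group is all-zero while the other has at least two non-zero values."""
    def nonzero(values):
        return sum(1 for v in values if v > 0)
    return (nonzero(group_a) == 0 and nonzero(group_b) >= 2) or \
           (nonzero(group_b) == 0 and nonzero(group_a) >= 2)

# Rows from Table 3 as (comics, academic texts)
rows = {
    "Semicolons": ([0, 0, 0], [4, 1, 2]),
    "Colons": ([0, 1, 0], [1, 0, 11]),
    "Round brackets (pairs)": ([0, 0, 0], [18, 41, 12]),
}
for name, (comics, academic) in rows.items():
    print(name, presence_contrast(comics, academic))
# Semicolons True
# Colons False
# Round brackets (pairs) True
```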
Note that the number of quotation marks and brackets corresponds to the
number of pairings of these punctuation marks. This is because it obligatorily
takes two exemplars to set off parentheses – in contrast to dashes or commas,
which may open a parenthesis closed by the final punctuation mark in a sen-
tence, e.g. a full stop (cf. Lampert 2011: 91–92). While an alternative single-punc-
tuation-mark use of brackets can be imagined, namely when a single closing
bracket is employed to set off the introductory ordering letters in lists, such as
a) xx
b) yy
c) zz,
the fact that this type of usage did not occur in the corpus made it unnecessary to
establish a more detailed distinction here. If the results from Table 3 are analysed
in relation to the hypotheses formulated above, the following findings emerge:
(i) As expected, there is a marked difference in the use of question marks
and exclamation marks in comics and academic texts: only one academic text
contains a single question mark at the end of the sentence
(12) What function do beginning and ending lexemes assume in compound recognition?
and no text from this register uses any exclamation marks. This is in line with
the usual correlation of these two punctuation marks with conceptually spoken
language: all the comic texts contain both question and exclamation marks,
although the proportion varies considerably, with normalised results ranging from
14 to 134 instances per 1,000 words.
(ii) The discussion of quotation marks requires a distinction between single
and double quotation marks. As for the distribution of the single quotation marks,
their analysis made it necessary to distinguish manually between single quotation
marks and the formally identical apostrophes. Since apostrophes are word-inter-
nal punctuation marks, they were only included in the analyses because of this
necessary distinction, but they actually yielded interesting results: while both
academic texts and comic texts contain a small number of stylistically neutral
genitives (4 in Superman, 3 in Batman, 2 in Uncle Scrooge), the majority of the
large number of apostrophes in the comic texts mark either informal contractions
(e.g. won’t) or omissions or shortenings characteristic of informal language
usage, e.g.
(13) With a swoop to his left an’ a peck to th’ right, he catches rat finks way out west!
However, it seems that there is currently a tendency for an increasing number of
academic texts to use contractions, too, e.g. Moore and Notz (2006: 236, Let’s) or
Mithun (2012: 53, I’m).
No pairs of single quotation marks were used in the comic texts, as expected,
but they occasionally occur in the academic writing (12 pairs in two texts). This
result may also be variety-dependent to a certain extent: according to Seely (2007:
60–62), there is a tendency for British English usage to prefer single quotation
marks over double quotation marks, whereas American English has the opposite
tendency – codified e.g. in The MLA Style Manual (Achtert and Gibaldi 1985: 80).
Note, however, that the article by Schneider, which uses single quotation marks,
appeared in Language, which is an American journal.
The analysis of the article by Biber et al. beyond the passage included in the
corpus shows that a considerable proportion of single quotation marks enclose
no quotations but paraphrases of meaning, e.g. in
(14) an analysis of adjectives marking ‘certainty’
or words which are used metalinguistically, e.g.
(15) any global characterizations of ‘General English’ should be regarded with caution
Contrary to expectations, double quotation marks are almost nonexistent in the
AcadText corpus, with only one pair in one text:
(16) we need to remember that ‘nations are mental constructs, “imagined communities” ’
which are constructed discursively […] (Wodak et al. 1999:4).
and it becomes clear that these are merely used to mark quotation marks within
a quotation whose reference is given later in the text; the convention being that
single quotation marks are doubled in this case and vice versa (cf. Achtert and
Gibaldi 1985: 80; Sanchez-Stockhammer, forthcoming).
While this quasi-absence of double quotation marks from AcadText may be
attributed to the small size of the random sample or the conventions of individual
publishers, chance cannot explain the other unexpected finding, namely the rel-
ative frequency of double quotation marks in the comic corpus (8 pairs; at least
one per text). Since direct speech is already marked as such by its inclusion in
speech bubbles, the double quotation marks must have a different function here:
indeed, the quotation marks in the comics are used in their general (academic)
function and serve to quote the speech of others. Thus the utterance
(17) Maybe next time, master Bruce.
is countered by
(18) Not “maybe”, Alfred.
Double quotation marks are also employed in the comics to refer to the metalin-
guistic use of words, e.g.
(19) Funny, I didn’t think you even knew the word “honest,” Penguin.
In Superman, double quotation marks are additionally used on some occasions in
narrative boxes to indicate the direct speech or thought of a character not shown
in the current panel itself, but whose identity can be deduced from context or
from the fact that suspension dots are linking the end of an utterance marked
with quotation marks to its beginning in a panel on the previous page (cf. below).
(iii) Contrary to expectations, no marked difference was observable in the
use of commas: while the figures are lower for comics overall, they are still sur-
prisingly close to the results obtained for the academic texts. However, a more
detailed text-based analysis reveals that commas are mainly used with very spe-
cific functions in comics: a very large proportion separate off proper nouns with
vocative function from the remainder of the sentence, e.g. in
(20) Toyman, you maniac!
This use is completely missing in the academic texts. Alternatively, commas occur
after introductory interjections in the comics, e.g. in
(21) Man, would you look at THAT!
in another use that was not found in the academic writing. These register-spe-
cific uses explain why commas occur relatively frequently in the comic texts. The
most frequent use of commas in comics which is also to be expected in academic
texts (but is not too frequent in the sample) is the delimitation of sentence-initial
adverbials, e.g. in
(22) According to the contract, they are RABBIT eggs for your children, King!
(iv) Semicolons, by contrast, only occur in the academic writing, e.g. in Schneider
(2003):
(23) traces of the previous stage will still be found; that is, some insecurity remains
Since they are absent from the sample of comic texts – presumably due to the fact
that most of their uses require relatively long sentences – they can generally be
used as an indication of register with regard to the spoken/written dimension.
(v) Surprisingly, the number of colons does not differ
greatly between the comics and the academic texts considered. Only
Schneider (2003) stands out, since it is the only one among the three academic
texts to indicate the precise pages in text-internal references that do not affect
quotations.
(vi) Neither sample contained any square brackets. As expected, not a single
pair of round brackets was used in the comic corpus – in contrast to the academic
texts, where brackets are commonly used to indicate references. The extremely
large proportion in Juhasz et al. (2003), with 41 pairs of round brackets per 1,000 words, is due to the
fact that a large part of the passage randomly included in the AcadText corpus
is constituted by the results section, in which relevant figures and examples are
added in brackets, e.g. in
(24) high-frequency beginning lexemes were responded to quicker than low-frequency begin-
ning lexemes, t1(27) = ± 3.78, p < .01, t2(18) = ± 2.02, p = .059 .
While the quasi-absence of dashes from the academic papers in contrast to a
larger proportion in CoCo seems to support the view that there is a difference in
formality between brackets and dashes, the quantitative difference is
not as marked as one might have expected. Furthermore, the analysis of the texts
reveals that dashes are frequently used in consecutive pairs in the Batman comics
and also in Superman, which raises the number of dashes. In many cases, the
combination < -- > seems to indicate a longer pause, e.g. in
(25) But her insides are all right -- no bleeding there.
This use of dashes represents a function which is not usually required in aca-
demic texts.
(vii) The difference in frequency between the use of suspension dots in
comics and academic texts is far more pronounced than expected: the only aca-
demic text using them is Schneider (2003) in one instance where omission in a
quoted passage is indicated:
(26) ‘the discursive constructs of nations and national identities … primarily emphasize
national uniqueness and intra-national uniformity but largely ignore intra-national dif-
ferences’ (Wodak et al. 1999:4).
This is a use which is highly unlikely to occur in comics. However, the low fre-
quency of suspension dots in the sample of academic texts seems to suggest
that quotations are usually extracted in shorter portions and that omissions are
avoided. This is supported by the quotations in AcadText, all of which represent
extracts from individual sentences only, e.g. the following series of quotations
from Schneider (2003):
(27) a case of ‘identity revision’ triggered by the insight that one’s traditional identity turns
out to be ‘manifestly untrue’ or at least ‘consistently unrewarding’ (Jenkins 1996:95)
Comics, by contrast, use suspension dots very frequently (all texts employ them
between 18 and 43 times) and often in order to create cohesion by their occur-
rence not only at the end of an utterance which is interrupted in one panel, e.g. in
(28) You might be stronger and faster than I am right now, Parasite…
but also at the beginning of the continued speech or thought in the next panel:
(29) …but you’ve barely had forty-eight hours to practice using my powers.
Such interruptions are not merely attributable to spatial restrictions, it seems, but
also to the fact that the picture in the new panel corresponds more closely to the
action indicated in the second part, such as a punch with a fist in the Superman
example above.
The differences between the use of punctuation marks in the texts from
the comic corpus and the academic texts are even more striking if considered
graphically. Figure 4 summarises the features which are characteristic of comics
(question marks, exclamation marks, suspension dots and apostrophes); Figure
5 those which are more typical of academic writing (semicolons, single quotation
marks and round brackets).
Figure 4: Punctuation marks occurring more frequently in comics than in academic texts
It may therefore come as a surprise that this striking difference between the
two registers cannot be backed statistically: non-parametric statistical tests for
independent samples were carried out in SPSS in order to compare the medians
between groups (i.e. comics vs. academic texts), but even the Mann–Whitney U
test yielded no significant results for any of the variables (e.g. question marks)
due to the small number of texts considered. Nonetheless, the difference between
comics and academic texts that is immediately visible in Figures 4 and 5
permits the tentative conclusion that punctuation can be employed as a register
feature. At the same time, these results call for further empirical research on
larger samples, which is likely to provide statistical backing for the tendencies
observed in this explorative study.
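The role of sample size here can be made concrete. The following is a minimal sketch, not the SPSS procedure used in the study: it assumes three texts per group (as the corpus lists for CoCo and AcadText suggest), no tied values, and purely hypothetical counts; the function names are illustrative. It enumerates every way of splitting six observations into two groups of three and derives the exact two-sided p-value of the Mann–Whitney U statistic.

```python
from itertools import combinations


def u_statistic(x, y):
    """Count the pairs (a, b), a from x and b from y, in which a exceeds b."""
    return sum(1 for a in x for b in y if a > b)


def exact_two_sided_p(x, y):
    """Exact two-sided permutation p-value of U (assumes no tied values)."""
    pooled = list(x) + list(y)
    centre = len(x) * len(y) / 2  # expected U under the null hypothesis
    observed = abs(u_statistic(x, y) - centre)
    extreme = total = 0
    # Re-assign the pooled values to the two groups in every possible way
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [v for i, v in enumerate(pooled) if i not in chosen]
        total += 1
        if abs(u_statistic(group_x, group_y) - centre) >= observed:
            extreme += 1
    return extreme / total


# Hypothetical, perfectly separated counts (NOT the study's data)
comics = [10, 11, 12]
academic = [1, 2, 3]
print(exact_two_sided_p(comics, academic))  # 0.1
```

Even with complete separation of the two groups, the smallest attainable two-sided p-value is 2/20 = 0.1, so under these assumptions no result can fall below the conventional 0.05 threshold, however large the observed difference.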
Figure 5: Punctuation marks occurring more frequently in academic texts than in comics
4 Conclusion
Punctuation is a completely underresearched feature in register studies at the
time of writing: thus Barbieri’s extensive annotation of major register and genre
studies in Biber and Conrad’s Appendix A (2009: 271–295) does not mention
punctuation a single time in the column “features under investigation”. It is only
in Barbieri’s summary of Crystal’s (2001) major findings that there is a minor ref-
erence to it, when “minimal punctuation” is found to be one of the “common
characteristics of internet registers” (Biber and Conrad 2009: 289).
However, the empirical analysis of two register-specific corpora in the present
study – one of comics and one of academic texts – suggests that certain types of
punctuation tend to occur more frequently in certain types of register and that
punctuation can therefore be employed as an indication of register. For instance,
some punctuation marks correlate strongly with spoken and written style respec-
tively and barely occur in the contrasting register. While question marks, excla-
mation marks, suspension dots and apostrophes are far more frequent in comics
than in academic texts, the latter use a larger proportion of semicolons, single
quotation marks and round brackets. Furthermore, even in those cases where the
results are similar from a quantitative perspective, differences in usage emerge
upon closer consideration: for instance, comics tend to use commas after intro-
ductory interjections or proper nouns with vocative function, whereas academic
texts make more varied use of that punctuation mark. Further research into this
topic is required to establish the register-distinctive functions of the punctuation
marks in more detail and for a larger number of registers.
Biber’s distinction between different registers is “based on the premise that
most formal differences reflect functional differences” (Biber 1995: 136). None-
theless, he claims that his multidimensional approach differs from the studies
of his predecessors in that he does not conduct a functional analysis in the first
place so as to identify characteristic linguistic features. Instead, he states that he
“first identifies groups of co-occurring features and subsequently interprets them
in functional terms” (Biber 1988: 24). While this seems to contradict an approach
such as the one used in the present study at first sight, one should not forget
that Biber’s analyses presuppose a list of linguistic features which were then
subjected to statistical analyses. Taking into account that he reviewed “previous
research to identify potentially important linguistic features” in his preliminary
analysis (Biber 1988: 64) and that these are understood as features “that have
been associated with particular communicative functions and therefore might be
used to differing extents in different types of text” (Biber 1988: 71–72), it becomes
clear that he is not correlating random phenomena but only the results of previ-
ous functional analyses – even if these were carried out by other researchers. In
this sense, the present study can be regarded as a legitimate suggestion for the
extension of the original model.
Within such a framework, punctuation is on a level with the 15 other major
categories such as “Special features of conversation” (Biber and Conrad 2009:
82). “Punctuation” is thus tentatively suggested as category 16 with the following
subordinate features (some of which did not prove distinctive for comics vs. aca-
demic texts but may play a more important role with regard to the differentiation
between other registers):
1. full stop
2. question mark
3. exclamation mark
4. comma
5. semicolon
6. colon
7. dash
8. slash
9. quotation marks (single, double)
10. brackets (e.g. round, square, angled)
11. word-internal punctuation (apostrophes, hyphens)
12. combinations of punctuation marks (e.g. suspension dots, emoticons).
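How such a category might be operationalised can be illustrated with a minimal sketch. It is not the counting procedure of the present study: the feature names, the regular expression and the treatment of < -- > as a single dash are assumptions made for illustration, and apostrophes and quotation marks are left out because the same character can serve as either.

```python
import re
from collections import Counter

# Hypothetical feature inventory. Combinations are listed before single
# marks so that "..." counts once as suspension dots, not as three full stops.
PUNCTUATION = re.compile(
    r"""
    (?P<suspension_dots>\.{3}|…)
  | (?P<dash>--|[–—])
  | (?P<full_stop>\.)
  | (?P<question_mark>\?)
  | (?P<exclamation_mark>!)
  | (?P<comma>,)
  | (?P<semicolon>;)
  | (?P<colon>:)
  | (?P<slash>/)
  | (?P<round_bracket>[()])
    """,
    re.VERBOSE,
)


def count_punctuation(text):
    """Return a Counter mapping feature names to raw frequencies in text."""
    return Counter(match.lastgroup for match in PUNCTUATION.finditer(text))


# Examples (25) and (28) from the comic corpus, as quoted above
sample = (
    "But her insides are all right -- no bleeding there. "
    "You might be stronger and faster than I am right now, Parasite…"
)
counts = count_punctuation(sample)
print(counts["dash"], counts["suspension_dots"])  # 1 1
```

Raw counts of this kind would still have to be normalised for text length before registers are compared, and the functional distinctions discussed above (e.g. brackets for references vs. brackets for asides) require manual inspection.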
In a very wide reading, the division of a text into paragraphs could also be con-
sidered as punctuation (cf. Huddleston and Pullum 2002: 1725). According to
Nunberg (1990: 17), “punctuation must be considered together with a variety of
other graphical features of the text, including font- and face-alternations, capi-
talization, indentation and spacing”, all of which are said to fulfil a similar func-
tion. To this can be added the use of italics and bold print. At first sight, these
features seem to go beyond the purely linguistic means and to unduly emphasise
the visual and multimodal aspect of written language – but they sometimes find
a correspondence in spoken language in pauses, stress, intonation etc., even if it
is not completely systematic (cf. above).
What makes the proposed category 16 special is the fact that the register features
listed therein are not lexico-grammatical, unlike the other features included
in Biber’s models up to the time of writing. Some of the punctuation features
correlate with lexico-grammatical features (e.g. question marks with syntactic
questions), which are in turn typical of specific registers (e.g. conversations).
However, this does not mean that punctuation is a secondary register feature.
Many other punctuation marks correlate with more abstract categories; e.g. quo-
tation marks with quotations, which may take practically any lexical or syntactic
form. Furthermore, it is normal that “linguistic features co-occur in texts because
they reflect shared functions” (Biber 1995: 30). This does not necessarily imply
that one should receive more weight than the other. As a consequence, punctua-
tion is considered a register feature in its own right.
Biber (1988: 71–72) states for register analysis that “the goal is to include the
widest possible range of potentially important linguistic features”. The empirical
analysis presented here clearly suggests punctuation as such a feature. However,
the proposed addition of punctuation to the set of categories is not to be regarded
as any form of criticism of the original model, but merely as the suggestion of a
valuable category to add to the long list of previously used features.
5 References
Achtert, Walter S. & Joseph Gibaldi. 1985. The MLA style manual. New York: The Modern
Language Association of America.
Arendholz, Jenny, Wolfram Bublitz, Monika Kirner & Iris Zimmermann. 2013. Food for thought –
or, what’s (in) a recipe? A diachronic analysis of cooking instructions. In Cornelia Gerhardt,
Maximiliane Frobenius & Susanne Ley (eds.), Culinary linguistics: The chef’s special,
Press.
registers. Amsterdam: Benjamins.
University Press.
Bloomfield, Leonard. 1933. Language. New York: Holt.
Booth, Wayne C., Gregory G. Colomb & Joseph M. Williams. 2008. The craft of research. 3rd edn.
Chicago: University of Chicago Press.
Crystal, David. 2001. Language and the internet. Cambridge: Cambridge University Press.
Halliday, Michael A.K. 1978. Language as social semiotic: The social interpretation of language
and meaning. London: Arnold.
Huddleston, Rodney & Geoffrey K. Pullum. 2002. The Cambridge grammar of the English
language. Cambridge: Cambridge University Press.
Jakobson, Roman. 1985. Closing statement: Linguistics and poetics. In Robert E. Innis (ed.),
Semiotics: An introductory anthology, 145–175. Bloomington: Indiana University Press.
Lampert, Martina. 2011. Attentional profiles of parenthetical constructions: Some thoughts
on a cognitive-semantic analysis of written language. International Journal of Cognitive
Lampert, Martina. 2013. Say, be like, quote (unquote), and the air-quotes: Interactive
quotatives and their multimodal implications. English Today 29(4). 45–56.
Law, Gwillim. 2010. Grawlixes past and present. http://www.statoids.com/comicana/grawlist.
html (accessed 15 July, 2014).
Levinson, Stephen C. 1983. Pragmatics. Cambridge: Cambridge University Press.
Meyer, Charles F. 1987. A linguistic study of American punctuation. Frankfurt am Main: Peter
Lang.
Mithun, Marianne. 2012. The deeper regularities behind irregularities. In Thomas Stolz et al.
(eds.), Irregularity in morphology (and beyond), 39–59. Berlin: Akademie.
Moore, David S. & William I. Notz. 2006. Statistics: Concepts and controversies. New York: W.H.
Freeman.
Nunberg, Geoffrey. 1990. The linguistics of punctuation. Menlo Park, CA: CSLI.
Patt, Sebastian. 2013. Punctuation as a means of medium-dependent presentation structure in
English: Exploring the guide functions of punctuation. Tübingen: Narr.
Peters, Pam. 2004. The Cambridge guide to English usage. Cambridge: Cambridge University
Press.
grammar of the English language. London: Longman.
Rosch, Eleanor. 1973. On the internal structure of perceptual and semantic categories. In
Timothy E. Moore (ed.), Cognitive development and the acquisition of language, 111–144.
New York: Academic Press.
Rosch, Eleanor. 1975. Cognitive representations of semantic categories. Journal of Experimental
Psychology, General 104(3). 192–233.
Runkel, Philip Julian & Margaret Runkel. 1984. A guide to usage for writers and students in the
social sciences. Totowa, New Jersey: Rowman & Allanheld.
Sanchez-Stockhammer, Christina. 2012. Comicsprache – leichte Sprache? In Daniela Pietrini
(ed.), Die Sprache(n) der Comics, 55–74. Munich: Meidenbauer.
Sanchez-Stockhammer, Christina. Forthcoming. The transformative power of copying in
language. In Corinna Forberg & Philipp W. Stockhammer (eds.), The transformative power
of the copy: A transcultural and interdisciplinary approach. Heidelberg: Heidelberg
Publishing.
Searle, John. 1975. Indirect speech acts. In Peter Cole & Jerry L. Morgan (eds.), Syntax and
semantics. Vol. 3: Speech acts, 59–82. New York: Academic Press.
Seely, John. 2007. Oxford A–Z of grammar and punctuation. Oxford: Oxford University Press.
Snider, Neal. 2009. Similarity and structural priming. In Niels Taatgen & Hedderik van Rijn
(eds.), Proceedings of the 31st annual conference of the Cognitive Science Society,
815–820. Austin, TX: Cognitive Science Society.
Söll, Ludwig & Franz Josef Hausmann. 1985. Gesprochenes und geschriebenes Französisch. 3rd
edn. Berlin: Erich Schmidt.
Swales, John M. & Christine B. Feak. 2010. Academic writing for graduate students: Essential
tasks and skills. 2nd edn. Ann Arbor: The University of Michigan Press.
Trudgill, Peter. 2000. Sociolinguistics: An introduction to language and society. 4th edn.
London: Penguin.
Wardhaugh, Ronald. 2002. An introduction to sociolinguistics. 4th edn. Oxford: Blackwell.
Corpora:
Comic Corpus (CoCo):
Re-print: Englisch lernen mit Batman. Bad Guys Gallery. 2007. Munich: Berlitz.
Re-print: Englisch lernen mit Superman. Up, up and away! 2007. Munich: Berlitz.
Walt Disney’s Uncle $crooge. No. 376. April 2008. York (PA): Gemstone.
Corpus of Academic Texts (AcadText):
Biber, Douglas, Susan Conrad & Randi Reppen. 1994. Corpus-based approaches to issues in
applied linguistics. Applied Linguistics 15. 169–189.
Juhasz, Barbara, Matthew S. Starr, Albrecht W. Inhoff & Lars Placke. 2003. The effects of
morphology on the processing of compound words: Evidence from naming, lexical
decisions and eye fixations. British Journal of Psychology 94. 223–244.
Schneider, Edgar. 2003. The dynamics of new Englishes: From identity construction to dialect
birth. Language 79. 233–281.
Martina Lampert
Linking up register and cognitive
perspectives: Parenthetical constructions in
academic prose and experimentalist poetry
Abstract: This paper will explore the possibility of linking up Biber’s register
analysis and Talmy’s cognitive semantics, based on the assumption that some
fundamental cognitive principles inform situational features and hence would,
in part, determine linguistic characteristics. As one case in point, two samples
of parenthetical constructions from opposite written registers, academic science
writing and minimalist poetry, are scrutinised in an initial qualitative analysis.
The study identifies both a general structural and functional similarity in the
examples selected for illustration, suggesting that no significant register distinc-
tion will ensue, while the parenthetical pattern is likely to exhibit a substantial
cross-medial difference between speech and writing. These preliminary findings
invoke properties of the human cognitive architecture as well as evolutionary spe-
cifics of the language modalities as critical parameters of influence and would
speak for their recognition as potential determinants of register and, in turn, for
a principled compatibility of the two linguistic approaches.
1 Introduction
In this paper, I will present some arguments for linking up Douglas Biber’s register
analysis with a recent (re)conceptualization of register as a cognitive construct
framed in Leonard Talmy’s cognitive semantics, suggesting that the traceable
principled compatibility of these two major approaches to linguistic analysis
might open up some promising insights.
In his forthcoming The Attention System of Language¹, Talmy advances the
view that register, generally couched in terms of “types of speech situations”,
may allow for a consistent re-analysis as speaker attitude, for instance, “toward
[a lexical item’s] core meaning itself; toward the speech participants (the speaker
himself, the addressee, or the relation between the two); or toward the current
circumstance”. That is, in a cognitive semantics perspective, register distinctions
would become conceivable as backgrounded speaker role, or attitude, for
that matter, which are introjected into the minds of participants, thus inevitably
involving attention and memory as relevant cognitive categories. To illustrate:
what might best be treated at root as a speaker’s attitude of respect toward the addressee –
or a speaker’s attitude of solemnity about the circumstance – could also be interpreted as
the presence of a formal situation that triggers the use of a formal register.
1 As always, I am grateful to Len Talmy for the privilege of granting me access to a very substantial
current draft version of this forthcoming book; unless otherwise indicated, all quotes are
from this work, and the references to this unformatted draft lack page numbers.
Martina Lampert, Johannes Gutenberg University Mainz
The fundamental significance of register for any appropriate analysis of any lin-
guistic item that surfaces in Talmy’s explication ties in with Biber’s belief that
“all linguistic descriptions”, such as, for instance, “collocational studies of par-
ticular words […] must include consideration of register differences as a central
organizing parameter, if they hope to achieve an accurate account of the patterns
of use” (Gray 2013: 361). Accordingly, “register differences should be an essen-
tial component of any investigation of language use” (Gray 2013: 369). These two
statements, then, concur on the view that, in general, any linguistic construction
carries an inherent register ‘signature’.
Moreover, Biber’s and Talmy’s approaches might in fact be read as suggestive
of such a link-up, precisely as they are seen to converge in acknowledging the major
role of both medial and cognitive determinants of linguistic patterns: introjected
in participants’ minds, cognitive parameters appear to effectively constrain perti-
nent situational characteristics, as, e.g., Biber’s (1988: 160) remark tracing medi-
al-distinctive effects back to “different cognitive constraints on the speakers and
writers” unambiguously demonstrates – apart from and additional to the hard-
wired effectors of the medium and the tangible properties of the setting in their
specific interdependence. Capitalizing on their essentially evolutionary ‘design’,
Talmy (2007b) furthermore recognises the prime significance of the options and
constraints of both the production and reception circumstances, while attention
proves the single most decisive determinant among the situational specifics in
communicative interactions to shape a linguistic item’s representational format
and its functional potential.
As a case in point, I will focus on a much neglected though highly pervasive
phenomenon in language – what I have proposed to call parenthetical constructions
(cf. Lampert 1992: 16 and Section 2 below). To give a cursory impression of
the pattern’s range in structural variability, the following examples, exclusively
from academic writing, are in order. It should be noted that they are all in line
with the formal prototype, as demarcated by parentheses in the schematic illustration
(7) below. Examples (1) and (2) are taken from Nunberg (1999) and demonstrate
a typical sub-clausal as well as an allegedly marginal sentential instance.
The two sub-morphemic exemplars (3) and (4), found in titles of scholarly arti-
cles, are likewise deemed to be peripheral members of the category, while (5) and
(6), retrieved from the academic sub-corpus of the COCA, testify to the principled
unconstrainedness of the format even in the formal register.
(1) Yet for all these changes, there is a continuity here, too, in the way that change is
(sometimes heatedly) debated and (sometimes grudgingly) accommodated.
(2) And there is a large number of common words for talking about the language itself, for
example slang, usage, jargon, succinct, and literate. (It is striking how many of these
words are particular to English. No other language has an exact synonym for slang, for
example, or a single word that covers the territory that literate covers in English, from
“able to read and write” to “knowledgeable or educated”.)
(3) Robertson, John M., Chi-Wei Linn, Joyce Woodford, Kimberly K. Danos, and Mark A.
Hurst. 2001. The (Un)Emotional Male: Physiological, Verbal, and Written Correlates of
Expressiveness. The Journal of Men’s Studies 9, 393–412.
(4) Widdowson, Peter. 1990. W(h)ither English? In Martin Coyle, Peter Garside, Malcolm
Kelsall & John Peck (eds.), Encyclopaedia of literature and criticism, 1221–1236. London:
Routledge.
(5) He took pianists, guitarists and harpists in stride, but expressed shock at “13 young
lady violinists (!), 1 young lady violist (!!), 4 violoncellists (!!!) and 1 young lady contra-
bassist (!!!!).
(6) While ego orientation did not emerge as a significant predictor of likelihood to aggress
in any of the three groups, significant correlations were found between ego orienta-
tion and likelihood to aggress for boys, r (????) =.20, p <.005, and girls in the all-girls
league, r (???) =.40, p <.005.
Along the general lines sketched in the introductory paragraphs of this paper,
I will thus probe into parenthetical constructions’ common cognitive basis,
arguing that attention direction turns out to be a relevant consistent principle for
the explanation of parenthetical constructions’ usage profile, which would then
have to be added, as a principal determinant of the participants’ cognitive make-
up, to the list of situational features defining a register (cf. Biber and Conrad
2009: 40; for a similar suggestion, though more global and including punctuation
marks in general, see Sanchez-Stockhammer, this volume).
Why parenthetical constructions – and why attention? Apart from the general
neglect of attentional effects as ubiquitous phenomena in language (cf. Lampert
2009: 20–25), the pattern has somehow – vaguely, intuitively, anecdotally – been
associated with reduced attention and, in consequence, been dismissed as an
informational and textual ‘aside’ at least since the beginning of research on par-
enthetical constructions in Schwyzer’s (1939) seminal study. The central issue is,
however, whether it is justified to generalise over an attenuation effect as a distinctive
characteristic in the construction’s spoken realisation in the first place²
and, further, unaltered, to its functional equivalent in the written mode, thus
tacitly presupposing lowered salience as an unequivocal property of parentheti-
cal constructions across the board.
My key objective, then, is first and foremost a conceptual concern: elabo-
rating on two previous studies (Lampert 1992, 2011), I will outline, in an initial
(qualitative and microscopic) analysis, a common usage profile of parenthetical
constructions, which might ultimately be added to the list of linguistic features
offered in Biber and Conrad (2009: 78–82), paying due respect to constraints
exerted by situational characteristics that give rise to modality-sensitive register
variation.
Beyond this proposal, I will, however, address a ‘classical’ target of register
analysis – the options and constraints of medium-specific properties that give
rise to parenthetical constructions’ register profile. Repeatedly, Biber has drawn
attention to the prime significance of mode-induced variation, qualifying it as
one likely candidate “for universal parameters of register variation: a dimension
associated with oral versus literate discourse” (Gray 2013: 367), as it is this “oral/
literate opposition […] which emerges as the very first dimension in nearly all
MD studies” (Gray 2013: 367). And it is particularly the language-external medial
properties of the articulatory-organic-auditory and motor-instrumental-visual
channels (cf. Bredel 2008: 11) underlying this dimension that turn out to be the
major determinants of variation also in this case, setting parenthetical construc-
tions in the spoken mode apart from their written/printed counterparts.
To provide some evidentiary support for my line of argument, I will confine
my analyses to printed text, and to the construction’s presumed representational
and positional prototype,³ as illustrated through its generalised schematic template:
(7) xxxxx (xxxxx) xxxxx
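Template (7) lends itself to a simple operationalisation for retrieving candidate instances from printed text. The sketch below is an illustration only, not a procedure used in this study: the function name is hypothetical, and the pattern assumes non-nested round brackets, so embedded parentheticals would be missed or split.

```python
import re

# One opening bracket, any run of characters that are not brackets,
# one closing bracket; the inner run is captured.
PARENTHESISED = re.compile(r"\(([^()]*)\)")


def parenthesised_sequences(text):
    """Return the content of every non-nested (...) span in text."""
    return PARENTHESISED.findall(text)


# Excerpts from examples (1) and (3) above
example_1 = (
    "there is a continuity here, too, in the way that change is "
    "(sometimes heatedly) debated and (sometimes grudgingly) accommodated."
)
example_3 = "The (Un)Emotional Male"

print(parenthesised_sequences(example_1))
# ['sometimes heatedly', 'sometimes grudgingly']
print(parenthesised_sequences(example_3))
# ['Un']
```

A pattern of this kind retrieves sub-morphemic, sub-clausal and sentential instances alike; it also returns bracketed references and statistics, so any hits would still need manual classification.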
In view of mode-specific differences as situational variants, I have selected as test
cases for illustration two random samples, representing two extreme written registers,
science writing and experimentalist poetry, which quite reasonably may
be seen to occupy the opposite ends of a conceived register continuum (cf. my
analysis in Section 6). As a general caveat, I am all too aware of the obvious limitations
of this sketch. Nevertheless, this study, which suggests a form-to-function
correlation on a minor parameter, explicitly draws on the strengths of such functional
analysis as an essential component in any register analysis (cf. Biber 1988:
52–53, 55 and 62), especially regarding the identification of individual functions
of textual dimensions that are potentially relevant to register distinctions.
2 Some initial empirical evidence challenging this view was offered in a plenary talk at an international
workshop on “Cognitive Motivations of Second(ary) Voices: A Multimodal Perspective
on Parentheses and Quotations” (Bamberg 12/06/2014).
3 In this case study, I will disregard, for space limitations, dashes or commas as principal competitors,
which exhibit some distinctive constraints on the syntactic patterns they tolerate.
The structure of my paper is then as follows: in Section 2, I will briefly intro-
duce the attention ‘theme’ vis-à-vis parenthetical constructions, in fact as a
long-standing issue, from the traditional point of view; Section 3 will then spot
major correspondences between the situational factors of register analysis and
cognitive parameters pertaining to attention as advanced in cognitive semantics.
Section 4 will detail some relevant mechanisms underlying parenthetical con-
structions in the spoken mode, which then serves as a basis for the attentional
profile of its written counterpart elaborated on in Section 5. Probing into the
sample illustrations, Section 6 will scrutinise their register-specific commonali-
ties and differences, and the final section addresses some critical issues in view of
a more descriptively adequate analysis of parenthetical constructions following
from the proposed link-up of register analysis and cognitive semantics.
2 (In)Attention for Parenthetical Constructions

To begin with, a note on terminology is in order: as is evident from the above
template, alphabetical orthographic systems have conventionalised figural ele-
ments, typically the pairwise occurring delimiters, “with distinct opening and
closing characters” (Huddleston and Pullum 2002: 1731) and originally pertaining
to a non-alphanumeric representational system. To avoid any ambiguities asso-
ciated with parenthesis, which is found to refer to both the figural markers and
the overall pattern, I will instead borrow Lennard’s (1991: 1) term lunulae as an
unequivocal label for the crescent-shaped round brackets that set a (sequence
of) alphabetical elements (graphemes, numerals, punctuation marks) off from
their linguistic environment (cf. also Brown 2009 in his introductory paragraph,
“Some terms”, who follows Lennard’s suggestion). Parenthetical construction will
generally serve as a cover term for the whole structural ensemble instantiating
the concept parentheticity, and parenthesised sequence is used whenever it
appears relevant to make exclusive reference to its ‘content’ (cf. Lampert 2011 for
some details).
In stark contrast to parenthetical constructions’ pervasiveness and structural
versatility in the written medium in general, and in the more formal registers in
particular,⁴ the absence of any in-depth study on the range of both their structural
variability and functional potential manifests a general inattention to parentheticity
as an object of research in its own right. This observation may seem
quite iconic to the pattern’s presupposed communicative function as conveying
a secondary, defocussed and/or incidental aside to the allegedly primary, true
and essential message of the non-parenthesised text. Typically confined to the
sentence as their host domain in both pertinent reference grammars but also in
most linguistic research, parenthetical constructions, especially in the written
modality, thus prove under-researched, or effectively un-researched, in (recent)
linguistics. If they have become the topic of current research at all, it has been
with exclusive reference to the spoken modality (cf., e.g., Dehé 2014, the most
recent publication).
It may be interesting to note, however, that attention has been invoked as a
framing concept ever since Eduard Schwyzer’s (1939) study, which is arguably
the first serious (cross-linguistic) investigation of what may be referred to as the
parenthetical construction. Remaining entirely intuitive and vague, attention
appears to echo William James’ (1950/1890: 403) famous dictum, which dismisses the
notion as suitable only as a presupposed allusion at best: “Every one knows what
attention is.” Schwyzer (1939: 32–33) qualifies parenthesised sequences as ‘aside
meanings’ (Zwischengedanke or Nebengedanke) and thus perpetuates, in fact, a
long-cherished bias advanced and actually codified, for the English language, at
the latest in the Late Modern English grammarians’ accounts (cf. Lennard 1991:
84–113). “Alien” to the primary layer of information and incidental to the current
(text) topic, parenthetical constructions “disrupt” both the syntactic structure
and the line of argumentation of their environment. Hence, in rhetoric or stylis-
tics and in the notorious usage guides, but also in some grammars, parentheti-
cal constructions are usually considered as either undesirable or as negatively
connoted meaningless fillers or wilful digressions – as the “obstinate” title But
I Digress of Lennard’s remarkable study on The Exploitation of Parentheses in
English Printed Verse indicates. They are widely conceived to testify to authors’
caprice and/or lack of clarity or perspicuity in organizing their texts, a stereotype
that is already present in Schwyzer (1939: 5–7 and 27) and has survived until this
day, testifying to a poor understanding, if not an actual misconception, of the pattern’s
functional versatility in terms of sophisticated information management
and elaborate discourse structuring.
4 It comes as no surprise that register-specific details are available neither on the total frequencies
nor on the relative proportions of parenthetical constructions; however, on a cursory and informal
inspection, frequencies of occurrence and especially variation in structural complexity seem
to increase toward the written end of a conceived spoken-written continuum, i.e., those registers
that are at a considerable distance from casual and spontaneous conversation where recurrent formulaic
patterns like comment clauses dominate.
Along the same lines, major current reference grammars of English locate the
content(s) of parenthetical constructions “in the shade as background”, as “addi-
tional” and “related”, providing “supplementary information” which is “not part
of the main message” (Biber et al. 1999: 137). And, iconically, the pertinent termi-
nology appeals to the concept of (less or lesser) attention: parenthetical construc-
tions qualify as peripheral elements of clause grammar (cf. Biber et al. 1999); or
when specified in terms of the syntactic patterns they allow, comment and com-
plement clauses along with appositive or non-restrictive modifiers are the catego-
ries regularly reoccurring in the literature. In more idiosyncratic, and at the same
time presumably more general terminology, parenthetical constructions emerge
as non-dependent, disintegrated supplements (cf. Huddleston and Pullum 2002:
1350) – all of which imply the connotation of minor (structural) relevance.
I will, however, (hope to) demonstrate that this received view, which une-
quivocally associates the parenthetical pattern with “only” incidental or back-
ground information of low conceptual import, may well derive from, or be attrib-
uted to, a general misconception tacitly presupposing a “simple” equivalence
of the two language modalities: following from a deep-rooted structuralist bias
toward or ideology of the spoken language as the primary and “true” medium of
communication (cf., e.g., Biber 1988: 5–9 and, very pronouncedly with reference
to punctuation, Nunberg 1990: 1–7 or Bredel 2008: 2–11), the attenuating effect
of parenthetical delivery is indiscriminately imposed on its written counterpart;
and even this tacitly presupposed assumption fails to be confirmed by empirical
evidence, at least in two different settings, reading out subclausal quotes in an
experiment and public speeches (cf. Kasimir 2008 and Lampert 2014). Such an
(over-)generalising approach to parentheticity, however, reveals a principled dis-
regard of fundamental, though never simplistic and binary, characteristics intrin-
sically associated with mode and medium as well as properties that derive from
the human cognitive make-up (cf. Biber 1988: 22, 26, 160–161).
This rather deplorable state of the art may, in part, have been due to the lack
of an adequate analytical tool that is sufficiently explicit to capture (all) relevant
characteristics, calling on attention as a critical explanatory construct for paren-
theticity as a linguistic category. Before Section 4 sketches the baseline of such an
approach, the brief remarks to follow are intended to address some fundamental
preconditions of writing, as they become manifest in situational characteristics
and are identified in register analysis. Capitalising on an evolutionary argument,
some cursory notes on the options and constraints of the production and recep-
tion circumstances are advanced from a cognitive semantics perspective, which
might again support the sensibility of the cross-framework alignment proposed
in this paper.
3 Situational context and cognitive determinants

Register analysis, as an articulate perspective of linguistic practice, gives prece-
dence to situational characteristics of linguistic events like participants, includ-
ing their specific social relationships, the particulars of the mode, as well as the
setting of the communicative interaction, paying due respect also to the import
of communicative purposes and topics. These situational categories are “more
basic”, since they “cannot be derived from any linguistic phenomena”, that is,
they functionally govern the choice of medially admissible and physically possi-
ble linguistic patterns, as their pervasive and conventionalised instantiations in
a given context. As a result, “registers differ in their characteristic distribution”
(Biber and Conrad 2009: 9) of a particular selection and pattern of lexical and
grammatical features that emerge as common determinants of registers and sub-
registers along continuous dimensions of linguistic variation (cf. Biber 1988: 9).
To comment on some basic determinants of register and outline the con-
ceptual compatibility of register analysis and cognitive semantics as fundamen-
tally comparative perspectives,5 I will spot the most relevant correspondences
between situational and linguistic features in the permanent medium of print (cf.
Biber 1988: 36–42; Biber and Conrad 2009: 40–47) as they become manifest in
the samples selected for analysis: an experimental report from scientific writing,
published online in the summer issue of Brain and Language 2013, and a spec-
imen of experimentalist poetry, E. E. Cummings’ famous untitled poem of 1958,
“l(a” (cf. Section 6).
Instantiating, as printed documents, the same physical mode, the poem
features a single author, the American poet E. E. Cummings, while the scientific
article is co-authored by five US scientists specialising in brain studies, neurol-
ogy and (cognitive) neuroscience. As professionals in their respective field of
expertise, they are likely to exhibit similar general social characteristics as their
readers – a feature that is, however, less predictable for the poem. Both represent
unequivocal instances of texts with un-enumerated (typically) unknown readers,
most likely without any significant amount of interaction, though addressors
and addressees will share, to different degrees, specialist background knowledge
(colleagues in the case of the academic text6 or ‘fans’ in the case of Cummings),
whereas their relative social statuses would perhaps vary more with the poem.
Likewise, the samples are presumably identical regarding their principled pro-
duction and reception circumstances as planned, scripted, revised and (multi-
ply) edited texts lacking any indication as to the actual extent of editing; also,
the setting will not significantly differ between the two: the participants neither
share time nor place, with the readers typically in private (though some public
place is possible, as when reading the poem or the article in class) as well as in
complete control over the text; and both samples feature the same specific setting
as parts of a published book being relatively contemporary.
5 For register analysis, see Biber (1988: 20) and Biber and Conrad (2009: 36); Talmy’s attention
factors are essentially framed in terms of same- and cross-venue comparison.
Regarding ideational properties, such as topic and communicative purpose,
however, the two instances are significantly distinct: while the general and origi-
nal purpose of the poem is entertainment with no further specification suggesting
itself7, the article’s communicative intent may be specified – inform, describe and
report; likewise, the factuality status proves a discriminating feature: an imagis-
tic (rather than narrative) poem vs. a factual academic fragment of non-opinion-
ated statements. And whereas the poem does not display any overt stance, the
academic text features epistemic stance expressions, for instance, purpose and
approach in the samples selected for illustration. The general topic, again, distin-
guishes between the poem’s entertaining function through a picturesque image
and the article’s scientific import, which may be specified as a report on a con-
trolled experiment using an “electronic device designed to alleviate stuttering by
manipulating auditory feedback via time delays and frequency shifts” (Foundas
et al. 2013: 141).
Contrary, however, to their immediate and decisive impact on lexical choices,
it should be noted that, perhaps against expectation, it is not “topical differ-
ences [that] are […] influential for determining grammatical differences” (Biber
and Conrad 2009: 46). As has been repeatedly emphasised, the key relevance
in shaping the overall linguistic appearance is accredited to language-external
factors, as “the pervasive grammatical characteristics of a register are mostly
determined by the physical situational context and the communicative purposes”
(Biber and Conrad 2009: 46; see already Biber 1988: 11 and 38 as well as very
explicitly Nunberg 1990: 3–4, 7 and 14–15).
6 Typically, academic prose is contextualised by shared background knowledge (cf. Biber 1988:
48).
7 Disregarding some marginal cases as when the poem serves as an exercise in literary discourse
or, like in the present context, register and attention analysis.
Regarding these two major determinants, I will, in the following, elaborate
on the compatibility of Biber’s and Talmy’s approaches as they become relevant
for the subsequent analysis of the samples: for a potential mapping of register
analysis’ situational features, two factors suggest themselves in cognitive seman-
tics.
First, Talmy (2007b) acknowledges the substantial import of the channel-re-
lated situational features (including production circumstances as it were), which
rank high as major determinants of linguistic variation. He in fact refers to the
fundamental nature of the two modes’ production and reception circumstances
inherited from evolution and giving rise to their characteristic modality-related
reflexes – a view that would correspond to Biber privileging them as more deci-
sive. More specifically, it is categorical physical differences in the representational
format that essentially separate the analogous, coextensive and simultaneous
spoken modality (which in principle allows for gradient and relative distinctions
as in vocal dynamics) from the exclusively digital and discrete written system of
representation. It disallows gradient and relative distinctions and is characteris-
tically confined by two-dimensional space (see Section 5 below for some details
on the constraints imposed by conventionalised print).
Second, in Talmy’s cognitive semantics, situational features of register analysis
may be conceived as inbuilt in lexical items’ associated meaning sectors. To
illustrate: participant-related characteristics like (encyclopaedic and shared)
knowledge or epistemic, affective and attitudinal stance become accessible via
the conceptual complexes of linguistic items themselves, which, in turn, are
notably shaped by another language-external general principle, a language
user’s cognitive state (including attention resources and memory capacity). Such
cognitive reflexes are at the heart of Talmy’s (forthcoming) The Attention System
of Language, and they may be captured, quite generally, as a linguistic item’s
attentional profile, critically determining its usage (cf. Sections 4 and 5).
Such salience effects, I would argue on a more general level, comply with
register analysis in many respects: Talmy (forthcoming), in fact, proposes to (re-)
analyse the contextual components of lexical items as part of their associated
meaning; and the “central notion of a speaker’s particular attitude can then –
through a backgrounding of the role of the speaker – be interpreted instead as a
type of speech situation”, which, in turn, accommodates the concept of regis-
ter. Accordingly, “any speaker attitude or register pertaining to the core meaning
that is lexicalised in a morpheme” as well as targeting “the speech participants
(the speaker himself, the addressee, or the relation between the two) or […] the
current circumstance” would then appear as “introjecting” register distinctions
into their “minds” and thus be subject to the fundamental attentional processes
of activation and attenuation. Under such analysis, “register can always be traced
back upstream to speaker attitude”, incorporating specifics of the communicative
setting in the contextual sector of an item’s meaning for that matter; and
what [for example] might best be treated at root as a speaker’s attitude of respect toward
the addressee – or a speaker’s attitude of solemnity about the circumstance – could also
be interpreted as the presence of a formal situation that triggers the use of a formal register
(Talmy forthcoming).
4 An attentional analysis of parenthetical delivery
Following this sketch of a situational analysis, this section will focus on and
contextualise one meta-linguistic attentional mechanism from Leonard Talmy’s
(forthcoming) The Attention System of Language8 that specifically accounts for
the pattern of parenthetical delivery in the spoken mode.
In this model, each individual attention-specifying device is seen to increase
or decrease the relative attentional weight of a particular linguistic representa-
tion’s (semantic) component or (surface) constituent, which, irrespective of its
linguistic format or structural category, thus coherently accounts for the linguis-
tic variation in terms of attentionally specified, discriminate usage profiles. It is
this functionality of linguistic choices to which “skilled speakers and writers can
devote considerable meta-cognitive attention [in] their options for setting an enti-
ty’s degree of salience” (Talmy forthcoming) and which arguably again invokes
fundamental issues likewise systematically addressed in Biber’s register, genre
and style perspectives.
For the present analysis, I will only selectively and cursorily call on two such
basic attention factors: one that captures attentional properties of an individual
morpheme and one that specifies attentional effects of one entity on another; that
is, they assign “different degrees of salience to the parts of an expression or of its
reference or of the context” (Talmy 2007a: 264).9
8 Talmy’s forthcoming book introduces a coherent theoretical and powerful analytical factor
model of linguistic attention, informed by a sophisticated theory of language-specific attentional
parameters and accounting for a wide range of attentional effects in language, so far privileging
the (more basic) spoken modality. The individual basic attention factors successively integrate
as component mechanisms, or Areas, in (hierarchically organised) Domains: Domain A, Atten-
tional properties of an individual morpheme, Domain B, Attentional properties of a morpheme
combination, Domain C, Attentional effects of one entity on another.
The attentional mechanisms relevant for parenthetical constructions are
framed in terms of meta-linguistic causal triggers and targets with two distin-
guishable attentional effects: for one, as an immediate effect of a target’s “desig-
nation as the relevant entity out of the entities co-present in the environment”,
its activation level is raised, thus increasing its salience; as a second effect, the
conceptual or referential content of the respective entity will either be activated
or attenuated, lending this selectional target its specific “dual character” that,
in turn, calls for its “differential attentional treatment”, which depends on the
actual impact on the referent, i.e., foregrounding or backgrounding it (Talmy
forthcoming).
The factor addressing attentional effects of parentheticity in the spoken
modality identifies a prosodic device as trigger that first highlights the parenthe-
sised sequence’s referential content as the selected-out target, whose salience
is subsequently attenuated via its prosodically differential realisation. The widely
assumed medium-specific mechanism induces an “expression-spanning loud-
ness reduction and pitch lowering” that together “seems in general to reduce a
hearer’s attention on the expression’s meaning”; and such parenthetical delivery
would then “trigger attentional decrease in a target – in particular, to attenuate
the expression’s reference”, in effect instructing the addressee to consider the
target’s referential content as incidental (Talmy forthcoming).
Accordingly, the parenthesised clause in example (8), “if pronounced as just
described, seems to encourage a hearer to treat its content as merely incidental
information, readily disregarded” (Talmy forthcoming):
(8) My cousin Sue (who happened to be visiting at the time) wanted to go to the museum.
This “attenuative effect of reduced loudness over an expression derives readily
from the attentional principle of quantity” and involves a general cognitive prin-
ciple: “the smaller the magnitude of some perceptual dimension of a form – here,
its loudness – the less salient its referent”. In the spoken modality, then, this
triggering device would uniformly decrease the target’s salience through reduced
physical parameters. According to the received view, this very mechanism is
assumed to be also operative in the written modality and to directly “translate”
into the attentional effect attributed to the lunulae. I would, however, argue
instead that a corresponding, yet distinct device is called for to accommodate
the representational format of its visual counterpart, whose cognitive effects on
readers in processing parenthesised sequences seem to be essentially constrained
by the physical characteristics of the medium (cf. Lampert 2011 for some sugges-
tions to this end). Notably, the presupposed uniform and iconic attenuation is
readily available only to the analogue channel of vocal dynamics (but absent from
the written modality in principle), where such reduction of prosodic parameters
allows for gradience. As the following section may well demonstrate, the written
mode, however, deprives the pattern’s attentional profile of its characteristic dual
nature, owing to its representational design features of digital discreteness.
9 This is a substantially simplified version of the actual attentional analysis, abstracting away
from intriguing details in the description and largely avoiding the usage of Talmy’s terminology.
5 Toward an attentional profile of parenthetical constructions
In view of the general comparative perspective, I will now address major medial
differences as well as cross-medium correspondences in the “attentional behav-
iour” of the parenthetical pattern: like the supposed prosodic signature in the
spoken medium, the selected-out parenthesised sequence will, in print, undergo
(some) activation as an effect of being marked off as different from the adjacent
text; but unlike parenthetical delivery in the spoken mode, which allows for
gradience along a quantitative parameter (i.e., activation and attenuation from
minimal to substantial) and is essentially “fluid” in character as well as subject
to individual variation (hence, probably less discriminately effective), the lunulae
will attract (some) attention to themselves by virtue of their qualitative differ-
ence in figural shape as against their linguistic environment. Though members in
the inventory of alphabetical script, the crescent-shaped delimiters are perceiv-
ably distinct from their graphemic vicinity on account of their physical make-up
instantiating a non-alphanumeric representational system.10
Quite analogous to parenthetical delivery in the spoken modality, the lunulae
will attract attention to themselves as well as direct attention to another entity,
the parenthesised item(s); however, the figural elements themselves, categor-
ically different from the analogue gradient parameters of vocal delivery, lack
any perceptual quality to iconically induce an attention attenuating effect on
the target and will instead initiate an(other) activation process. The digital and
discrete lunulae do not readily support gradience in a quantitative dimension,
and the vision-based linguistic subsystem in alphabetical languages exclusively
relies on the principle of categorical (figural) difference, having conventionalised
only discrete, all-or-none devices – with no perceptual gradient feasible to indi-
cate reduced salience in some physical parameter.11 The lunulae are essentially
separative, ‘point-like’ spatial delimiters, unequivocally signalling the begin-
ning and end of what is considered the prototype of parenthetical constructions
in print. Again, contrary to their functional equivalent of parenthetical overlay
delivery in the spoken modality, they are not coextensive with the parenthesised
sequence: by their curved shape, wide at their centres and pointed at their two
ends, lunulae – iconically speaking – “embrace” a sequence that in this way
receives an identity of its own, both separated from and integrated into its envi-
ronment, and effective in delimiting the item(s) “inside”. Accordingly, any poten-
tial attenuation that may be associated with the parenthetical pattern does not
derive from perceptual stimuli but would exclusively have to be understood as
a mere convention that has been negotiated in the literate community. It is thus
ultimately an effect of (prescriptive) formal instruction or cultural practice exhib-
iting the view that these characters signal the reader to treat the parenthesised
target as an aside deserving lesser attention.
10 Nunberg (1990: 6–7), for instance, emphasizes the independence of this figural “linguistic
subsystem” as relatively autonomous and sets it on a par with non-linguistic graphical-rep-
resentational systems (cf. also Biber 1988: 7 and 9; Bredel 2008: 10–14).
In conclusion: while the parenthetical pattern cross-medially shares the
essentially dual character of attentional selection and weighting, the outcome
is different: discrete lunulae do not allow for the unequivocal attenuating effect
of parenthetical delivery. Readers encounter a visual stimulus whose categorical
difference both in the type of triggering device and its attentional impact is at the
mercy of the written medium’s essentially digital nature; and with the parenthe-
sised sequence being perceptually non-distinct from the previous and subsequent
typographical environment, no perceptual effect in either attentional direction is
reasonably to be expected. This cross-modal variance in parenthetical construc-
tions’ fundamental characteristics ultimately results in a categorical difference of
the same functional pattern: it derives from the tangible features pertaining to the
production and reception circumstances and gives rise to the profound (though
not absolutely but continuously quantifiable) divide into speech and writing by
major situational parameters (cf. Biber 1988: 38–45).
11 In principle, the written modality would not prohibit attentional gradience in the target,
though: light fonts, e.g., in a context of regular fonts might conventionally correspond to the re-
duced loudness over an expression in the parenthetical delivery and would in fact implement the
attentional principle of reduced quantity in a physical parameter. Exploiting such attenuating
potential has obviously never been considered as a possible general strategy.
6 Tracing the parenthetical pattern across written registers
Whereas the preceding section focused on mode-dependent differences, the
analysis to follow will now highlight commonalities across registers: my prelimi-
nary findings from an initial small-scale case study of arguably the extreme ends
of the register continuum, (scientific) academic prose and experimental poetry,
seem to speak in favour of a common cognitive principle underlying the pattern –
despite its enormous range of structural variability (cf. Lampert 2011 for some
exemplification). Parenthetical constructions may then qualify as an integration
feature (cf. Biber 1988: 43), significantly sharing both the structural pattern and
the communicative function across the two samples. In fact, scrutinising a novel
candidate for inclusion in the pool of register-indicating characteristics is not
unlikely to produce unexpected results, as when “certain linguistic features will
occur more frequently […] than […] expected” (Biber and Conrad 2009: 10).
A case in point apparently are the lunulae – a pertinent and prominent signa-
ture feature of E. E. Cummings’ minimalist poetry, in combination with an uncon-
ventional use and expressive functionalisation of not only punctuation marks
in general but also of deviant orthography and innovative typography. Though
clearly a major issue of style analysis12 and typically “associated with aesthetic
preferences” rather than being functional (Biber and Conrad 2009: 18), Cum-
mings’ usage of lunulae indeed appears to sensibly allow for, or even invite, a
comparative analysis regarding the parenthetical pattern, all the more so in light
of Biber’s (1988: 13) emphasis that any such function must not be “posited on an
a priori basis; rather [it is] required to account for co-occurrence patterns among
linguistic features”. Following a similar rationale and giving the linguistic dimen-
sion priority for the time being, I would indeed suggest that, across written regis-
ters (perhaps even including styles), parenthetical constructions are more likely
to share an essentially cognitive function rather than exhibiting a great extent of
variation; and it might turn out that it is “only” their contextual co-occurrence
features that would conceivably discriminate more specialised (sub)register
functions, while not “separating”, or telling apart, even opposite written regis-
ters, in the face of the most critical defining characteristics of a register: shared
communicative functions (cf. Biber and Conrad 2009: 16). Now, what would this
commonality across the “extreme” registers of scientific article and experimental
poetry mean in light of the assumption that, according to Biber (1988: 16 and
19–20), differences (due to situational factors) are more likely to be expected?
12 Biber and Conrad (2009: 18) note that, while style analyses are “similar to register perspective” in
that “typical linguistic features [are] associated with a collection of text samples from a variety”,
they characteristically differ “in the underlying reasons for the observed linguistic patterns”.
To begin with one randomly chosen article from neuroscience, Foundas,
Mock, Corey, Golob and Conture’s “The SpeechEasy device in stuttering and
nonstuttering adults: Fluency effects while speaking and reading”, I will address
some major issues with respect to the overall line of argumentation in this paper.
For reasons of greatest comparability, I have only selected instances of paren-
thetical constructions from the article that match those in the poem regarding
their structural type, i.e., the parenthesised sequences in example (9) exclusively
feature lunulae but lack any verbal specification of the relation between their
own referent and the referents of the outside linguistic environment (but cf.
examples 11 and 12 below).
I will first comment on the three excerpts selected from science writing and
spot some major salience-related effects:13
(9) a. In the case of DAF, the speech is amplified and delayed (alteration in the time
domain), whereas FSF shifts the whole spectrum of speech.
Gloss: DAF abbreviates externally-delayed auditory feedback, and FSF replaces fre-
quency-shifted feedback.
(9) b. Three speech tasks (Reading Aloud, Monologue, Conversation) were used to examine
speech fluency at baseline and in each condition repeated independently for each
participant with the device in the left and right ear.
(9) c. For purposes of this study, attention was measured by a computerized version of the
CPT with this measure approaching significance with the PWS having higher scores
(more impaired attention) compared to controls.
Gloss: CPT abbreviates Conners’ Continuous Performance Test and PWS substitutes
people who stutter.
Following the linearity constraints of the reading process,14 a reader of the above
samples will, after an uninterrupted sequence of graphemes (delayed, tasks and
scores) and an obligatory blank space, encounter the opening character, which
would – according to the attentional analysis presented in the previous section –
first attract their attention to the character itself, resulting in its activation. The
lunula is then immediately succeeded by another uninterrupted sequence of
graphemes (alteration, Reading, and more) – the first constituents of the respec-
tive parenthesised sequences, all separating, by blank spaces, the word forms
in their specific linear sequences. Directly attached to the final items domain,
Conversation, and attention, another such figural element, the complementary
closing lunula, signals the end of the parenthetical construction, which is imme-
diately followed by a comma as well as another blank space in (9) a., while (9) b.
and c. only feature a blank space preceding the word forms of the non-parenthe-
sised text: whereas, were and compared.
13 In this description, I do not imply any claim whatsoever about the actual on-line processing.
14 These constraints testify to the strict(er) principle of linearity in written language; cf. Biber
(1988: 38), Bredel (2008: 9 and 30–31).
Example (10), Cummings’ untitled experimentalist poem, exhibits only some
variations on the same theme:
(10) l(a
le
af
fa
ll
s)
one
l
iness
Note, first, that all instances of the shape <l> have to be considered typograph-
ically ambiguous between the lower case grapheme of the corresponding
lateral approximant /l/, the numeral one, and the first person singular pronoun, I.
In addition to the deviant vertical arrangement of the characters, a reader
is confronted with the homograph I, which adds to the effect of alienation ini-
tiated by the unconventional assembly of symbols and is likely to arouse sur-
prise, irritation or delight in the recipient. Note that the integration of a paren-
thetical construction into a morpheme is, in principle, admissible beyond the
poetic context, as in formal academic registers, for instance: cf. as one example
W(h)ither English, the title of a 1990 article by Peter Widdowson15. Deviating,
however, from conventionalised practice, the poem incorporates a sequence of
dissociated graphemes (assembled in four pairs) that have to be reconstructed as
the complete simple clause a leaf falls. Different from a canonical, horizontally
arranged running text, line breaks and three space lines replace its regular blank
spaces, while the closing lunula expectedly follows the final letter <s> of falls
directly. Without venturing a final decision on the parenthetical pattern’s degree
of alienation in the poem, I would nevertheless argue that the gestalt remains
perfectly decodable against its conventional form – in fact, I would suggest that
it is the lunulae in the first place that, apart from the two lexical items, high-fre-
quency one and the transparent nonce iness, render the sequence of graphemes
“readable” as a clause.
15 See Lampert (2011) for further exemplification of how deliberate academic writers exploit the
device to create the ambivalence crucial for their intended reading.
In terms of discourse functions, and irrespective of any structural specifics16
of the parenthesised sequence, (9) a. illustrates one out of only few principal
options: the parenthesised sequence (alteration in the time domain) represents
an instance of generalisation over the subcategory of DAF (to be spelled out as
externally-delayed auditory feedback) as one of its specimens with respect to the
time dimension of speech (relevant for an analysis of stuttering); that is, the
parenthesised sequence links the more specific information in the preceding text
to a superordinate category of temporal changes. The most likely communica-
tive purpose underlying this author-causal strategy is to offer the reader a more
general reference system for the information to integrate, in the service of safe-
guarding an approximation of the shared knowledge base between author and (a
less specialist) reader.
(9) b., by contrast, suggests the opposite textual and informational relation
between the preceding environment and the parenthesised sequence, identify-
ing the three concrete test items: the parenthetical construction (Reading Aloud,
Monologue, Conversation) specifies the subcategories of tasks to establish a base-
line of a test person’s speech fluency profile; again, the exact relational specifica-
tion will have to be reconstituted through inferencing processes on account of the
knowledge base available to a reader.
A third option is instantiated in (9) c., where the parenthesised sequence
(more impaired attention) most plausibly establishes a same-level category rela-
tion of higher scores […] compared to controls, essentially reformulating the same
referential content from another perspective, i.e., framed with reference to the
study’s objective; it may, however, be plausible to conceive this case as a spec-
ification as well, since more detail is added to the text’s informational import,
ultimately leaving us with two major principled and (logically) complementary
relations: generalisation and specification.
16 It may be worth noticing that the clausal pattern in (10) would comply with an expected gen-
eral preference in non-academic texts, but see (6), while (9) a. through c. represent the phrasal/
nominal prototype of informational writing (cf. Gray 2013: 368). Both, however, document refer-
entially non-explicit structures (without any indication of the relation), which would, following
Biber (cf. 1988: 145), go against the stereotype of academic samples.
Interestingly, the poem indeed appears to instantiate both the very same rela-
tional pattern and discourse function suggested for the examples from academic
prose: E. E. Cummings’ text features the same delimiter and the same selected-out
structure, with no categorically different principle inside and outside the lunulae
(even though, admittedly, the non-parenthesised sequence incorporates only two
conventional lexemes, one and the nonce iness): hence, it follows the same cognitive
principle. Unlike the science article, however, the poem plays on the
option of simultaneously meaningful processing alternatives, that is, between
specification and generalisation as ‘legitimate’ reading variants of the text –
either privileging the non-parenthesised sequence, i.e., specifying the abstract
concept loneliness through an example, or generalising over the parenthesised
sequence as a (metaphorical) specimen of loneliness; and, like in (9) c., there is
even the option to conceive the two component texts as complementary perspec-
tives, ‘melted’ into one ‘statement’.17
It should be added that, in all instances, it is up to the reader to reconsti-
tute the presupposed relation, with the risk of their misconceiving its actual
meaning (cf. Lampert 2011 for some detail). To explain: unlike the cases in (9),
which do not explicitly instruct the reader how to exactly process the respective
information, the structurally variant examples (11) and (12) from the same article
below feature optional, conventionalised triggers that maximally control for
the author-intended processing of the relational specification. While in (11) i.e.
asserts the relation of semantic equivalence between laterality and handedness,
the e.g. in (12) indicates that changes in speech rate and amplification represent
exemplars of the general category other factors (cf. Foundas et al. 2013: 146–147).
(11) No significant associations were observed between device effect and any of the three
measures of motor (manual) laterality (i.e., handedness) (all Bonferroni-corrected
p-values = 1.00).
(12) It should be noted, however, that findings regarding the influence of the device on
stuttering must be interpreted with caution as other factors (e.g., changes in speech
rate, amplification) may contribute to enhanced fluency with this treatment.
17 Any claim to do justice to the literary intricacies involved is explicitly beyond the purpose of
these few remarks; my concern is solely with a demonstration of a common cognitive principle.
188 Martina Lampert
As argued in the previous section, the parenthesised sequences, though
attentionally made more salient as the selected-out targets, are neither different
in quality nor in quantity in any typographical feature(s): without any difference
in a perceptual dimension of form, only separated off from the environment, no
attenuating or activating effect is feasible; and this perceptual non-distinctness
between parenthesised and non-parenthesised text may – plausibly and readily –
invoke ambiguity, or irritation, in readers as to how to process the respective information
in terms of priority – an (attention) effect that has been around ever since
parentheses have been used (cf., e.g., Lennard 1991: 5). It is this perceptual (or
cognitive) dimension that appears to be the source of the ambivalence,18 with
targets perceptually indiscriminate from their surrounding and quite iconically
inviting alternative attributions of salience to either the textual environment or
the parenthesised sequence: in form, the lunulae both separate and integrate;
and in function, the perceptual non-distinctness opens up reading alternatives
between “aside and drama” as there is “nothing […] to prevent [the lunulae] from
being […] emphatic” (Lennard 1991: 5). They only delimit a portion of (the same)
text, or in Cummings’ case, two “texts” melted into each other as two layers of the
same poem (cf. Tartakovsky 2009: 228–229).
This very effect of simultaneously available options among which to choose
is well known from another cognitive system: in vision (which is the natural per-
ceptual domain of print), the fundamental (per)ceptual phenomenon of Gestalt
psychology’s figure-ground distinction gives rise to the reflexive cognitive ambiv-
alence of visual illusions or bi-stable images as represented in Edgar Rubin’s clas-
sic.19 Either the vase is attended to as figure against two faces as (back)ground or
the two faces as foregrounded figure against the vase as (back)ground.20 Trans-
lated into the medium of print, this effect, emerging from the same general cog-
nitive principle, may lend lunulae their modality-specific profile: as a meta-lin-
guistic/cognitive device, they instruct the reader to simultaneously attend to
linguistic alternatives outside and inside of them; and attention will have to be
divided to select among different processing options – in an attempt to “liber-
ate” the text from its spatial linearity, or unidimensionality (cf. footnote 16), and
creating, through competing readings, the illusion of conveying more than one
message at a time. Accordingly, an individual reader may choose to either focus
their processing capacity on the parenthesised or the non-parenthesised information
– depending, in the case of the academic text, presumably on a particular
reader’s expertise and knowledge; hence, the actual processing, then, proves a
question of adequacy of understanding. In the poetic context, in contrast, the
choice appears to be(come) an issue of preference, with the ensuing effect of surprise
or delight, and thus, a matter of propensity for playfulness.
18 Cf. also Brown’s (2009) “ambivalent nature of the parenthesis” and his repeated appeal to the
concept of attention, i.e., “foregrounded” vs. “unimportant” and attention vs. importance in the
introductory lines of his “dissertation”.
19 On the associated implications of figure-ground reversal in vision see Palmer (1999: 280–287).
20 Perceptual psychology abounds in experiments that confirm a robust effect: having realised
this ambivalence, a (per/con)ceiver’s perceptual system is likely to switch to and fro, and it is
hard to keep attention on only one “interpretation” or effect (see Palmer 1999). By analogy, the
two alternatives are accessible for a reader, who may, however, choose one option as primary.
A final word goes to the poem, which, according to literary critic Alistair
Brown (2009), stretches the edges of the pattern (too far?); he expresses his
“verdict” that “the example […] is extreme […], useful for illustrating the range of
possibility in the lunulae but hardly representative of the general use.” I would,
however, argue that this conclusion is justified only at first sight: in attentional
terms, the poem ‘only’ employs the principle of divided attention, which quite
naturally invites strategic instrumentalisation – playing on the systematic ambi-
guity of what to attend to more. This option is indeed exploited by Cummings,
yet entirely within the ‘legitimate’ confines admissible in the visual medium of
print: “The synthesis of two different possibilities occurs here, but visually rather
than metaphorically” (Brown 2009). My conjecture is rather that both the figural
elements and the structural pattern – even in such an allegedly extreme case –
“just” perform their general cognitive “task” along with its well-known effect,
generating (per)ceptual ambiguity – which, if sensible, may indeed be quite dis-
illusioning in the context of an avantgarde piece of art.
Against the distinctive dual nature of parenthetical delivery’s attentional
profile with its (potential) selective activation and attenuation in the spoken
mode, the pattern in the vision-based written modality rather suggests divided
attention as an appropriate reference concept to capture parenthetical con-
structions’ modality-specific effect: well-known from Gestalt psychology’s fig-
ure-ground-distinction, it entails that attention should be divided between two
possible readings and thus creates the illusion of conveying two “messages” at a
time. In particular, with respect to the device’s predictive potential I would argue
that the difference in impact across the two sample registers selected for this
paper “boils down” to an after-effect of surprise in the poem, resulting from the
non-conformity to reader expectations associated with its conventionalised genre
norms that are prevalent in formal (academic) writing.
7 New vistas: Balancing out cognitive
determinants and situational constraints
on parentheticity
Though definitely limited in both scope and variation of the constructional type,
with only two samples scrutinised, this outline account may nevertheless have
given at least a sense of the central argument, spelling out a principal cognitive
determinant of parenthetical constructions in general in its medium-specific
profile of print in particular: first, the nature of human cognition imparts, and
allows for, certain forms of implementation that are shared across cognitive
systems and is largely controlled by attention; as a second major effector, the
tangible properties of the production and reception circumstances impose their
constraints on the pattern’s structural options, manifesting a fundamental divide
between the language modalities and giving rise to two distinct medium-sensitive
attentional profiles of the parenthetical construction.
Based on the general principle of divided (visual) attention, parenthetical
constructions emerge, in the written mode, as a register and genre-independent
phenomenon. The pattern’s perceptual ambivalence “naturally” follows from the
lunulae’s attentional activation as non-graphemes, on the one hand, and from
the parenthesised sequence’s non-difference to its graphemic environment, on
the other. With any palpable perceptual attenuation effect missing in the con-
struction’s formal representation, neither a more nor a less activated sequence
may be identified, unless one would permit, once more, that the bias toward the
spoken modality be conceptually imposed on print, perpetuating the false ideol-
ogy that writing is dependent on speech.
Thus, the crucial question is why parenthetical constructions exist in the first
place, and why they are so pervasive in the written registers, given the principled
option of this production circumstance for multiple revisability. One reasonable
suggestion may be that, as a meta-cognitive device, the pattern allows for a con-
ceived additional (or separate) information level, hence fictively circumventing
the linearity constraint of the linguistic medium’s spatial two-dimensionality.
With their general cognitive parameter of divided attention, parenthetical con-
structions convey two “messages” at a time, or, as Nunberg (1990: 115–116) has
it, they – like quotations – “depart from a presumptive text”. If attenuation as a
general feature were retained, equivalent processing of the alternatives regarding
a specific ideational content in the linear form of progression available in print is
precluded in principle, thus sacrificing a text’s adaptability to the mind-sets and
expectations of individual readers that will play a decisive role in determining
the preferred reading. Globally, then, parenthetical constructions indeed instan-
tiate a convenient textual strategy that may both support and be (consciously)
exploited for specific communicative purposes.
Apart from these general (register and genre-independent) cognitive impli-
cations, however, it proves an entirely empirical issue whether the suggested
discourse function(s) – here restricted to the complementary logical relations
of generalisation and specification (plausible as they may be for the selected
cases) – might be hypothesised to hold across registers. In this vein, an in-depth
corpus-based register analysis of representative samples that will pay respect to
higher-level discourse functions and their expected complex, systematic inter-
action with situational characteristics is essential to ultimately determine the
range of variation across the textual dimensions – whether a limited set of func-
tional relations possibly constrains parenthetical constructions or whether any
determination will rest on the unique interaction between the given text and an
individual reading experience, probably with few or no a priori generalisations
possible.
What the commonality of the two samples from disparate written registers
may, however, indeed suggest is the significance of the communicative “task”,
which perhaps proves a – or: the – decisive criterion of register variation (cf.
Gray 2013: 364), being largely independent of the type of structural integration:
while Cummings’ text with its clausal specimen rather invokes a presumptive
oral written register, the academic article conforms to the stereotype of phrasal
modification characteristic of formal writing; but in both cases the parenthetical
construction manifests itself as a “cognitive ‘marker’ of written discourse, which
can only be produced in circumstances that allow planning and manipulation
of the text” (cf. Gray 2013: 368). Register analysis will certainly contribute its
findings to provide insights into detailing the exact distribution of frequencies
of distinctive and salient co-occurrence patternings across (sub)registers, paying
respect to functional differences between registers in terms of their “internal
coherence”, i.e., the degree of variation that they tolerate (Biber 1988: 26). Con-
verging on the same observation of non-linearity or multi-layeredness (cf. Biber
1988: 21), the cognitive semantics view might offer a sensible motivation for the
abstracted underlying functional dimension – effects resulting from the cognitive
constraints that divided attention (dis)allows.
References
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: CUP.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: CUP.
Bredel, Ursula. 2008. Die Interpunktion des Deutschen. Tübingen: Niemeyer.
Brown, Alistair. 2009. Parentheses and ambiguity in poetry of the twentieth century. http://
www.thepequod.org.uk/essays/litcrit/parenthe.htm (accessed 30 January 2015).
Corpus of Contemporary American English (COCA). corpus.byu.edu/coca (accessed 29
September 2015).
Cummings, E. E. 1973. Complete poems, 1904–1962. George J. Firmage (ed.). New York: Liveright
Publishing Corporation.
Dehé, Nicole. 2014. Parentheticals in spoken English: The syntax-prosody relation. Cambridge:
CUP.
Foundas, Anne L., Jeffrey R. Mock, David M. Corey, Edward J. Golob & Edward G. Conture. 2013.
The SpeechEasy device in stuttering and nonstuttering adults: Fluency effects while
speaking and reading. Brain and Language 126(2). 141–150.
Gray, Bethany. 2013. Interview with Douglas Biber. Journal of English Linguistics 41(4).
359–379.
Huddleston, Rodney & Geoffrey K. Pullum. 2002. The Cambridge grammar of the English
language. Cambridge: CUP.
James, William. 1950 [1890]. The principles of psychology. New York: Dover Publications.
Kasimir, Elke. 2008. Prosodic correlates of subclausal quotation marks. ZAS Papers in
Lampert, Martina. 1992. Die parenthetische Konstruktion als textuelle Strategie. Zur kognitiven
und kommunikativen Basis einer grammatischen Kategorie. München: Otto Sagner.
Lampert, Martina. 2009. Attention and recombinance: A cognitive-semantic investigation into
morphological compositionality in English. Frankfurt am Main: Peter Lang.
Lampert, Martina. 2011. Attentional profiles of parenthetical constructions: Some thoughts
on a cognitive-semantic analysis of written language. International Journal of Cognitive
Lampert, Martina. Forthcoming. Cognitive motivations of second(ary) voices: A multimodal
perspective on parentheses and quotations. [Conference proceedings of the international
workshop on secondary syntax: Parentheticals, vocatives, quotations. University of
Bamberg, 6 December 2014].
Lennard, John. 1991. But I digress: The exploitation of parentheses in English printed verse.
Oxford: Clarendon Press.
Nunberg, Geoffrey. 1990. The linguistics of punctuation. Stanford: CSLI.
Nunberg, Geoffrey. 1999. Introductory Essay to the Norton Anthology of English Literature,
Seventh Edition. http://people.ischool.berkeley.edu/~nunberg/norton.pdf [1–22]
(accessed 29 September 2015).
Palmer, Stephen E. 1999. Vision science: Photons to phenomenology. Cambridge, MA: MIT
Press.
Patt, Sebastian. 2013. Punctuation as a means of medium-dependent presentation structure in
English: Exploring the guide functions of punctuation. Tübingen: Narr.
Schwyzer, Eduard. 1939. Die Parenthese im engern und im weitern Sinne. Berlin: de Gruyter.
Talmy, Leonard. 2003. The representation of spatial structure in spoken and signed language.
In Karen Emmorey (ed.), Perspectives on classifier constructions in sign language,
169–195. Mahwah, NJ: Erlbaum.
Talmy, Leonard. 2007a. Attention phenomena. In Dirk Geeraerts & Hubert Cuyckens (eds.), The
Oxford handbook of cognitive linguistics, 264–293. Oxford: OUP.
Talmy, Leonard. 2007b. Recombinance in the evolution of language. Proceedings of the 39th
annual meeting of the Chicago Linguistic Society: The panels. Chicago: Chicago Linguistic
Society. 26–60.
Talmy, Leonard. Forthcoming. The attention system in language. Cambridge, MA: MIT Press.
[draft version from 2010]
Tartakovsky, Roi. 2009. E. E. Cummings’s parentheses: Punctuation as poetic device. Style
43(2). 215–247.
Widdowson, Peter. 1990. W(h)ither English? In Martin Coyle, Peter Garside, Malcolm Kelsall &
John Peck (eds.), Encyclopaedia of literature and criticism, 1221–1236. London: Routledge.
Stella Neumann and Jennifer Fest
Cohesive devices across registers and
varieties: The role of medium in English
Abstract: The present paper aims at analysing varieties of English from a func-
tional as well as regional perspective, arguing that these two parameters of varia-
tion differ, but are closely related in the way they influence and shape language.
For that purpose, the six regional varieties of Singapore, Hong Kong, India,
Canada, Jamaica and New Zealand are examined in a corpus-based approach
drawing on the data from the International Corpus of English (ICE). All regional
varieties are represented in the study by the same five registers: academic writing,
administrative writing, broadcast discussions, conversations and exams.
The analysis focuses on the dimension of medium, which is examined in
terms of three concrete linguistic markers: the use of pronouns, conjunctions and
lexical density. The results clearly show differences along both regional and func-
tional lines which allow comparative conclusions about the speech societies in
question.
1 Introduction
English has a peculiar status amongst the languages found in different parts
of the world. For numerous reasons it developed along diverse lines in various
regions, resulting in a large number of different varieties spoken in almost all
corners of the world (for an overview of 76 varieties including pidgins and creoles
cf. Kortmann and Lunkenheimer 2013). These different regional varieties show
particularities which depend on the socio-cultural background and history of
the respective speech communities and the status English has in that context.
New varieties continue to evolve, arguably because the role of English is still
growing. One interesting question in this context is how to determine whether a
speech community’s use of English can be categorised as a new variety with its
own set of linguistic features (or whether the observed peculiarities are mistakes
and the putative variety is simply learner language). Amongst the criteria that
have been mentioned to determine whether a given use of English has emerged
as a new variety is the development of a distinct set of registers (cf. Mollin 2007).
Stella Neumann, RWTH Aachen University

Jennifer Fest, RWTH Aachen University
The notions of functional variation, i.e. register, and regional variation are thus
closely related (cf. Schubert, this volume). It should be noted that throughout this
paper the notion of regional variation is used drawing on Halliday’s distinction
between variation according to the language user versus variation according to
language use (cf. Halliday 1978: 183): regional variation in this sense refers to
speaker-related variation based on his/her geographical provenance in contra-
distinction to functional variation capturing context-related variation independ-
ent of the speakers’ personal background. ‘Regional’ in this sense is being used in
a way that is thus broader than the more specific reference to dialects and related,
more local varieties, which is more commonly used in variational linguistics.
Varieties of English have been described extensively both individually and
comparatively (e.g. Kortmann and Szmrecsanyi 2004; cf. Section 2), and the same
is true for register variation (e.g. Biber 1995; Neumann 2013). What is still largely
missing is a systematic account of register variation across varieties of English, a
notable exception being Xiao (2009; cf. Section 2). Apart from that, even though
Systemic Functional Linguistics prioritises paradigmatic relations, i.e. the lan-
guage user’s choice depending on the meaning s/he wants to express and accord-
ing to different contexts, register-based research across varieties of English is
still at its beginning and often focuses on individual linguistic features (cf. e.g.
Güldering, this volume; Schaub, this volume).
This study presents a partial analysis of registers across different varieties of
English as part of an ongoing research project that aims at taking stock of the dif-
ferences and similarities in terms of register variation across varieties of English.
In the framework of this corpus-based project, we examine six components of the
International Corpus of English (ICE; Nelson, Wallis and Aarts 2002)1 and cur-
rently five of its text categories in order to collect findings for the different regis-
ter parameters drawing on systemic functional register theory (e.g. Halliday and
Hasan 1989). A previous study (Neumann 2012) provided a first set of findings
on one subcategory for each of the three register parameters “field”, “tenor” and
“mode” of discourse respectively, namely experiential domain, social distance
and medium. This study takes up medium again, this time concentrating on cohe-
sion, since the choice and frequency of cohesive devices reflect some interesting
specificities of the spoken versus written medium.
Since linguistic features are polyfunctional, it should not come as a surprise
that a register study re-analyses the same features in the light of different register
parameters. In the case of this study, this is true for two of the three indicators
that will be examined in Section 4, which have also been discussed by Neumann
(2012) in the context of other register parameters.
1 <http://ice-corpora.net/ice/> (accessed on 28 April 2015)
The remainder of the paper is organised as follows: Section 2 discusses the
relationship between register and variety in more detail, thus motivating the
approach chosen for this study. We will go on to summarise the corpus method-
ology including the operationalisation of medium as well as the concrete quan-
titative measures used in this study in Section 3 before discussing the results of
the corpus analysis in Section 4. The paper closes with some concluding remarks
in Section 5.
2 Variation across registers and varieties

Initially, “variety” is a cover term for different ways of using language. In an early
paper, Gregory (1967), for instance, points out that the term covers as heteroge-
neous types of dialectal variety as idiolect (“Miss Y’s English”), temporal dia-
lects (“Old English”), geographical dialects (“American English”), social dialects
(“Upper Class English”) and what he tentatively calls standard dialects (“Stand-
ard English”). These types are closely related to the language user’s situation
in time, space and society. Gregory continues to distinguish “diatypic variety”,
which has since come to be known as “register”. This type of variety is based on
use, depending on the situational context, regardless of the dialectal background
of the language user in Gregory’s above sense. The term “variety” has become
established as a term covering geographical and social dialects (acknowledging
the increasing interaction between these two types of variation), and “register” is
used to refer to functional variation, which is due to the recurring characteristics
of the situational context. In the course of this development, the interrelation
between variety and register has been neglected and scholars specialising in one
of the two areas mention a potential impact of the other area only in passing – if
at all (e.g. Biber 1995; Neumann 2013: 1).
When exploring register from the point of view of varieties of English, we are
actually looking at the question of whether there is one English language with a
certain range of registers and varieties as some kind of generalised dialects, as it
were, or whether what we label as “English” is actually a loose collection of vari-
eties more closely related to what one might call different languages.
In linguistic theorising, the notion of language is typically used to refer to
some kind of abstract system which – depending on the theoretical stance – con-
tains a collection of rules or the options to express different meanings. Functional
language theories such as Systemic Functional Linguistics describe the relation-
ship between the abstract system and concrete instances of language use as
mediated by probabilities of (co-)occurrence of linguistic features (e.g. Nesbitt
and Plum 1988; Halliday and James 1993; cf. also similar ideas in usage-based
accounts of the cognitive family of theories). This entails a skewed distribution
of features across different types of instances. Systemic Functional Linguistics
specifically argues that this probabilistic distribution results in subsystems
which filter the available features depending on the requirements or conventions
of recurring situations (e.g. Matthiessen 1993; cf. from a sociolinguistic point of
view Berruto 2004). The specific constellation of features in a given situational
context is called register. Given this link between register and situation types, it
appears plausible to assume that registers are constrained by the cultural context
in which they occur: given the range of variation in terms of cultural contexts
across varieties of English, it is unlikely for registers (i.e. situational contexts) to
be congruent across varieties.
Sociolinguistics usually focuses on the description of varieties and often does
not emphasise more general claims about language – even though current the-
orising in the cognitive family of theories tends to interact with sociolinguistics
and in particular investigates areas traditionally associated with sociolinguis-
tics (cf., for instance, Kristiansen and Dirven 2008). Descriptive linguistics, on
the other hand, tends to ignore variety-specific features and take an immediate
shortcut to general claims about the language system. A general account of a lan-
guage system organised by register is still not the norm.2 A further stratification
of language according to varieties is even less common, even though it may make
sense to insert variety as an intermediate category between language and register
(Berruto 2004). Significant differences in the type and number of registers as well
as in their linguistic characterisation should affect the general description of the
English language.
On the basis of this line of reasoning, we can conceive of the relationship
between language, variety and register as follows: a language may consist of
several varieties, and an established variety is partly identifiable as such because
it has its own set of registers. More specifically, this means that the particular cul-
tural context of a speech community gives rise to a specific set of situation types
which are linked to specific linguistic registers.
2 Examples of how the register perspective can be integrated in general descriptions of English
are the fourth edition of Halliday’s introduction to functional linguistics (Halliday and Matthiessen
2013) and the Longman Grammar of Spoken and Written English (Biber et al. 1999).
One way of verifying this model and – if shown to be viable – of using it for
systematic descriptions is to analyse corpora covering a broad range of situations
across a broad range of (potential) varieties. Existing related corpus-based investigations
either examine a range of linguistic features suitable for claims about
registers (e.g. van Rooy et al. 2010) but are restricted to one or at least a narrow
range of varieties, or, if comparative, focus on only a few features across a wider
range of varieties (e.g. Kortmann and Szmrecsanyi 2004; Nelson 2006; Sand
2004, 2008). A notable exception is Xiao (2009), who adopts Biber’s Multi-Di-
mensional Approach (e.g. Biber 1995) to compare register variation across five
components of the International Corpus of English (ICE). His study is particularly
welcome, since it addresses a central shortcoming of Biber’s (1995) study, namely
the re-analysis of features developed originally for the study of variation between
spoken and written English for a general analysis of register variation in English.
By drawing on Biber’s methodology, however, Xiao (2009) also encounters the
same methodological limitations of a strongly inductive approach (cf. Neumann
2012; for a critique of factor analysis as a standard statistical technique, cf. also
Diwersy, Evert and Neumann 2014).
Although the call for a corpus-linguistic approach to investigating the rela-
tionship between variety and register appears straightforward, there are some
hard methodological problems which need to be kept in mind when attempting
this approach. Assuming – as we do here – that registers differ across diverging
cultural contexts, strictly speaking, a variety-specific corpus design is required
in order to obtain evidence of differences in registers. However, a corpus design
reflecting these diverging cultural contexts may result in the collection of regis-
ters which are incomparable across varieties, thus not allowing us to make spe-
cific claims about the deviation of similar registers. A corpus design common to
the different varieties, as used for the components of the International Corpus of
English, avoids this problem by using a fixed set of text categories. This approach,
in turn, is at least problematic, because it privileges the analysis of comparable
language use and blurs divergences between seemingly comparable registers.3 In
the worst case, it may lead to artefacts of the corpus analysis, because texts were
collected as specimens of a category according to the common corpus design
which do not represent a recurring situation type, and hence a register, in a given
variety. Against this background, the results obtained from the analysis of the
International Corpus of English, which will be described in the following sec-
tions, need to be treated with caution. If it is possible to show differences between
corpus texts for a comparable text category across varieties, this, at least, indi-
cates that there could be underlying differences between the populations.
3 This is exactly the problem that Biber’s (1995) comparative corpus design resolves.
3 Methodology
3.1 Data
As already mentioned in the introduction, the texts that were used for our analysis
were extracted from the International Corpus of English, a comparative corpus of
English worldwide, which contains spoken and written data from a whole range
of varieties in the form of different components. Several additional components
are currently being collected.
This study adopts the approach to the corpus analysis introduced by
Neumann (2012) and thus analyses five different text categories from six ICE com-
ponents. It should be noted that “text category” is a notion of the corpus compil-
ers which is taken here to provide a rough estimate of what could turn out to be
registers. The same is true for the notion of component. Again this is a label for
sub-corpora representing what the compilers identified as a variety of English. In
what follows, we will assume that the text categories in the ICE components can
be roughly equated to registers in varieties. The registers examined are:
AcWrit: Academic writing from the natural sciences (file numbers W2A-021 – W2A-030 of
the original corpus design)
AdWrit: Administrative writing (W2D-001 – W2D-010)
BCDiscs: Broadcast discussions (S1B-021 – S1B-040)
Conv: Conversations (S1A-001 – S1A-030)
Exams: Timed exams (W1A-011 – W1A-020)
Altogether, the set totals 80 files per component, with 50 files for two spoken and
30 for three written categories. This roughly mirrors the design of the ICE collec-
tion, which contains more spoken than written data, although to a slightly lesser
degree. Note that the standard ICE design uses a fixed set of 500 files, where
the individual file may, depending on the text category, consist of several texts.
Usually, the different texts in one file are not identified by individual IDs but are
simply marked by the tag in the internal structure of the file. This poses a
problem for register studies, where the unit of analysis is the text (not the file).
As it is consequently impossible to compute frequencies per text, we lump all frequencies
per file in each text category together in one value in Section 4.
Both spoken categories are classified as dialogic and unscripted, the only
difference being that conversations are identified as private, whereas broadcast
discussions are marked as public. They are distinguished from broadcast news
and broadcast talks, which are marked as monologic and scripted.
The three written registers, too, represent a certain amount of diversity.
Timed exams are classified as non-printed writing produced by students. Aca-
demic and administrative writing are classified as printed, which can be taken
to entail a non-spontaneous nature of the texts. Academic writing is narrowed
down for this study to texts from the natural sciences. It is different from popular
writing, which also includes texts from the natural sciences, yet can be assumed
to aim at a different audience. Lastly, administrative writing is categorised as a
type of instructional writing.
The components selected for analysis represent
Canadian English (CAN)
Hong Kong English (HK)
Indian English (IND)
Jamaican English (JA)
New Zealand English (NZ)
Singapore English (SIN).
These varieties represent sufficiently different socio-cultural situations and cover
different types of variety. Drawing on the categories used by eWAVE, the elec-
tronic World Atlas of Varieties of English (Kortmann and Lunkenheimer 2013),
these can be classified as high contact L1 (NZ, CAN) and indigenised L2 varieties
(HK, IND, JA, SIN; for details of the classification cf. Neumann 2012). We use the
corpus version annotated with part-of-speech information based on the CLAWS7
tagset using the Wmatrix interface (Rayson 2009) as provided by the ICE Corpus
team.
Neumann (2012) documents a number of technical difficulties which can
be summarised in the following three categories: firstly, mark-up is not entirely
standardised (some types of encoding are optional; cf. Wong, Cassidy and Peters
2011). This leads to a fair amount of incomparability between components, which
is harmful to the type of register studies discussed here. Secondly, sloppy imple-
mentation of the mark-up in some components leads to errors in query output
and furthermore is inherited by all later annotation stages (including the part-of-
speech annotation we use here). This is also true for the third type of difficulty,
the mark-up approach to extra-corpus text. Enclosing extra-corpus text such
as editorial comments by tags will lead to the content being included in corpus
queries instead of treating it as additional information. Since the present study
draws on the non-revised corpus version, its findings are liable to inaccuracies
due to these three areas of difficulties.
3.2 Operationalising cohesion
The notion of register, clearly distinctive as functional language variation against
variation based on social or regional factors, is identified by the three rather
abstract parameters of field, mode and tenor of discourse. Since these register
parameters are too general to allow the formulation of concrete corpus queries,
derivation of more precise linguistic features is a necessary step in the analysis.
Register analysis in the systemic functional framework draws on a stepwise oper-
ationalisation of indicators for the abstract parameters by way of intermediate
categories, thus avoiding the risk of overgeneralisation by using individual and
fairly shallow linguistic features to make far-reaching claims about groups of
texts. An example for the stepwise operationalisation could be channel as a spec-
ification of mode of discourse, the parameter concerned with the way language
is typically structured in a given situational context. Channel refers to the phys-
ical way in which language is transmitted in a given register. It could be phonic
or graphic with ensuing restrictions for the linguistic features used in the given
context. If, for instance, language is transmitted only via the graphic channel,
prosody cannot be used to foreground (or background) information, but rather
syntactic means such as cleft constructions in English. This means that phono-
logical and syntactic features used for structuring information may be interpreted
as operationalisations of channel to decide which particular channel is charac-
teristic of a given register, thus, at the next level of interpretation, also character-
ising the mode of discourse of this register. While channel would appear a fairly
straightforward category which can be determined without extensive linguistic
analysis, computer-mediated communication, such as, for example, chat com-
munication, seems to defy traditional classifications of channel, thereby making
a linguistic analysis of the specifics of the electronic channel appear useful (for a
more detailed introduction to the derivation of register indicators, cf. Neumann
2013).
For this study, the intermediate category of medium, or, more precisely, the
way in which spoken and written medium affects the organisation of language
in texts (Halliday and Hasan 1989), and its underlying phenomenon of cohesion
are selected from the area of mode of discourse. Cohesion refers to the linguistic
means which make a text hang together (Halliday and Hasan 1976).
Spoken and written language can be said to vary – amongst other indica-
tors – in the preferred type of cohesive devices and their frequency. For instance,
as a corollary of their context dependence, spoken registers tend to draw more on
pronominal means to link clauses (Biber et al. 1999: 237); written texts, in particu-
lar those where the audience is unknown to the writer, will spell out more lexical
information. The registers, independent of the variety they originate from, can be
supposed to have these indicators in common at least to a certain degree. Taking
the perspective of comparing varieties, we can expect variation in the reliance on
specific cohesive devices depending on the extent of register variation within the
variety as well as specific socio-cultural characteristics of the speech community
such as literacy.
Halliday and Hasan (1976) describe five cohesive devices, namely reference,
ellipsis, substitution, conjunction and lexical cohesion. Our analysis presents a
closer examination of features related to reference, conjunction and lexical cohe-
sion in the form of pronoun frequency, frequency of conjunctions and lexical
density. An example from the corpus visualises these linguistic features quite
clearly:
Ivan Pavlov, was the first person studying about classical conditioning in 1903. His demon-
stration was about the salivating of dog. He noticed that dogs accustomed to the proce-
dure would start salivating before the meat powder was presented. These are considered
as unconditioned stimulus and unconditioned response and they occur without previous
conditioning. Ivan then used the ringing of a bell as another stimulus to be paired as a pres-
entation with the meat powder. After a number of times, he found that the dogs responded
by salivating to the sound of bell alone. (ICE Hong Kong, text W1A-004, Student Essays)
Personal pronouns serve as a basic form of (personal) reference, i.e. the realisa-
tion of the same or similar referential meaning by different linguistic expressions.
Typically, a full lexical item (including phrases) as an antecedent is taken up in
the ensuing text by a pro-form, especially personal pronouns, articles or demon-
stratives. The excerpt given above includes several such references: Ivan Pavlov,
who is the sole human actor in this paragraph, is referred to using the pronouns
he and his. Furthermore, the cause and effect of his experiments are referred to
as these and they.
In accordance with previous studies and as mentioned above, pronominal
reference is considered a characteristic typical of spoken registers (cf. Halliday
and Hasan 1976; Biber et al. 1999). Lexical cohesion refers to links between text
chunks by repeating a previously used lexical item or by replacing it by a semanti-
cally related one. Typical indicators include various types of sense relations; this
study, however, examines a summary indicator, namely lexical density, which
summarises the overall role of the vocabulary in texts and is said to be higher in
written language (cf. Biber et al. 1999: 62; Halliday 2001). In the example from the
corpus, there are 43 function words and 49 content words (based on the part-of-speech
tagging included in ICE HK), and the 92-word paragraph therefore shows
a lexical density of 53.26 % (49 of 92 words).
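The arithmetic behind this figure can be sketched in a few lines of Python. This is an illustrative sketch only: the helper name `lexical_density` is hypothetical, the counts are those reported above for the Hong Kong excerpt, and the step in which the tagger assigns each word to the content or function class is not reproduced.

```python
def lexical_density(content_words: int, function_words: int) -> float:
    """Share of content words among all words, in percent."""
    total = content_words + function_words
    if total == 0:
        raise ValueError("cannot compute lexical density of an empty text")
    return 100 * content_words / total


# Counts reported for the excerpt from ICE Hong Kong (W1A-004).
density = lexical_density(content_words=49, function_words=43)
print(f"{density:.2f} %")  # 53.26 %
```

The same function applies unchanged to counts pooled over all files of a text category, which is the level at which the values in Section 4 are reported.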
Conjunctions represent a third type of cohesive device: they mark the log-
ico-semantic relationships between linguistic units, rather than operating by
replacing linguistic units. Consequently they represent transitions between
messages (cf. Halliday and Matthiessen 2013: 655). In their discussion of linking
adverbials, i.e. adverbials serving to mark logico-semantic relationships, Biber
et al. (1999: 884–887) report some clear differences between spoken and written
registers, not just with respect to the specific items chosen but also to the frequency
of linking devices, with a higher frequency in spoken registers. The
example contains two very obvious instances of the conjunction “and”, which operates once
as a phrase-level and once as a clause-level connector in “These are considered
as unconditioned stimulus and unconditioned response and they occur without
previous conditioning.”
Halliday and Matthiessen (2013: 657) rightly distinguish clause complex-
ing (in the realm of grammar) from conjunction (in the realm of cohesion) and
point out how they complement each other across spoken and written regis-
ters. This raises the question of how well a quantitative corpus study can dis-
tinguish between grammatical and cohesive features. The approach to the cor-
pus-based analysis of cohesion taken in this paper is liable to a methodological
caveat. Although cohesive devices may also operate within the clause, grammar
is the main locus of linking elements inside the clause. Consequently, cohesion
obtains mainly between clauses. This study does not inspect the cohesiveness of
each individual occurrence of the pronouns and conjunctions retrieved from the
corpus query. The reported results therefore have to be seen as providing a first
indication of certain register characteristics only.
3.3 By all means: Measures for the comparison of registers and varieties
The approach of studying language variation both from a functional as well as a
regional point of view requires several successive steps in the analysis. A search
for register features in only one variety would lead to narrow results and not fully
take into account functional and regional aspects as separate, yet related aspects
of language variation. We therefore first examine variation between registers,
before focusing on the variation between varieties and ultimately combining
individual results. All frequencies are given as percentages relative to the overall
number of tokens per file.
In a first step described in detail in Section 4.1, the three cohesive devices
relevant for this study are examined within each register across all six varieties
against the grand mean, i.e. the arithmetic mean of the relative frequencies of the
respective feature across varieties and registers (Neumann 2012: 84). This grand
mean functions as a reference value for the distribution of features across all
varieties and registers, thus describing the respective feature without any restric-
tions. It is a necessary benchmark in order to put the register- and variety-specific
results into perspective. We thus give the magnitude of difference between the
grand mean for the respective feature and the specific value for one register in
one variety by subtracting the grand mean across varieties and registers from the
specific value, so that positive values lie above the grand mean.
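A minimal sketch of this calculation in Python follows. The frequencies are invented for illustration, the labels "A", "B", "spoken" and "written" are placeholders, and the function name is hypothetical. The sign convention follows the tables in Section 4.1, where positive values lie above the grand mean.

```python
from statistics import mean

# Hypothetical relative frequencies (in %) of one feature, keyed by
# (variety, register); these numbers are invented for illustration only.
frequencies = {
    ("A", "spoken"): 12.0,
    ("A", "written"): 2.0,
    ("B", "spoken"): 8.0,
    ("B", "written"): 2.0,
}


def differences_from_grand_mean(freqs):
    """Return the grand mean and each cell's signed distance from it."""
    grand_mean = mean(freqs.values())
    diffs = {cell: value - grand_mean for cell, value in freqs.items()}
    return grand_mean, diffs


grand_mean, diffs = differences_from_grand_mean(frequencies)
print(grand_mean)  # 6.0
for cell, diff in diffs.items():
    print(cell, f"{diff:+.2f}")
```

Because every cell is measured against the same reference value, the differences sum to zero across the whole table, which offers a quick consistency check on reported values.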
In contrast to this register-oriented characterisation, a second cycle analy-
ses the cohesive devices based on their occurrence within each variety, again as
the magnitude of difference from the grand mean, but now disregarding regis-
ter. Neumann (cf. 2012: 84) describes the related variety mean, i.e. the arithmetic
mean of the relative frequency of a feature within a variety across all registers.
This study compares other types of descriptive statistics represented by boxplots
(cf. Section 4.2).
The major purpose of the final combinatory step is to compare the range of
variation that register features display in a variety, thus allowing conclusions
about register-specific characteristics. The cohesive features of pronominal ref-
erence, conjunctions and lexical density will be analysed in terms of the range
of variation across registers within one variety. These range values for the three
cohesion indicators are then processed to obtain the mean range of variation,
showing the overall degree of register variation in the variety.
As stated in Section 3.1 above, it is impossible to calculate occurrences per
text given the structure of the components of the International Corpus of English
used for this study. As a consequence, we cannot subject the data set to meaning-
ful mean-based inferential statistics, even though this would allow us to examine
the interaction between register and variety statistically. In Section 4, all results
will therefore be reported in the form of descriptive statistics.
4 Analysis
4.1 Comparison of the registers
The variation in our corpus which was found on the basis of register showed results
that were, for the most part, in line with what has been found about spoken and
written language previously (e.g. Biber et al. 1999). This section will look at the
results for the individual linguistic features, namely pronouns, conjunctions and
lexical density, in more detail and examine their distribution within registers.
The relative frequency of pronouns given as the difference from the grand
mean (cf. Section 3.3 for the calculation of the values) in Table 1 is clearly higher
in the spoken registers of broadcast discussions and conversations. In these two
categories, the values throughout all varieties are above the grand mean, while
the written registers of academic writing, administrative writing and timed exams
generally display values below the grand mean, with the sole exception of timed
exams in Canadian English.
Table 1: Personal pronouns per tokens presented as the difference from the grand mean in
percentage points
Pronouns across varieties and registers (grand mean): 5.59 %
              Academic  Administrative  Broadcast    Conversations  Exams
              Writing   Writing         Discussions
Canada        –4.80     –2.93           4.01         8.70           0.47
Hong Kong     –4.87     –4.27           2.97         6.56           –2.13
India         –4.94     –4.06           1.51         5.98           –2.29
Jamaica       –5.21     –4.64           4.63         8.47           –2.64
New Zealand   –4.89     –2.98           1.96         5.15           –2.33
Singapore     –4.44     –3.50           4.37         8.92           –2.78
These values, always in comparison to the grand mean, are not unexpected. They
do, however, give a first indication of the differences between the registers when
compared across the varieties. Although pronouns are frequent in spoken regis-
ters in general, Canadian English, Jamaican English and Singapore English stand
out in both broadcast discussions and conversations as containing considerably
more instances of pronouns than the other three varieties.
Although these variety-based differences are less distinct in the written cat-
egories, here, too, Jamaican English stands out as deviating most from the grand
mean in all but the timed exams, where Singapore English displays a slightly
clearer divergence from the mean. Canadian English, on the other hand, demon-
strates less distinctive pronoun frequency in the written registers and stands out
as the only variety to contain more pronouns in timed exams than indicated by
the grand mean.4
4 This particularity should be treated with caution. The texts in this category in the Canadian
ICE component show some striking similarities in topic, which suggests a glitch in the corpus
compilation.
Figure 1: Boxplot5 of the percentage of personal pronouns per tokens for each variety in the five
registers
The boxplots in Figure 1 show striking register differences, with academic writing
and administrative writing having a low frequency and broadcast discussions
and conversations displaying high frequencies. Exams display slightly higher
frequencies than the other written registers. Interestingly, the range of variation
across varieties is higher in the spoken registers with negligible variation in aca-
demic writing. Writers in this register seem to align to some common convention
in the use of pronouns (cf. Section 4.4).
Similarly, interesting observations can be made when looking at the registers
separately. As shown by the diverging extension of the boxes for each text cate-
gory, which indicates the range of variation of the middle 50 % of all observations,
there is much less variation in academic writing and exams across varieties than
in the other text categories (with the mentioned outlier of Canadian English). The
other three registers show a broader range of the usage of this linguistic feature,
most strikingly so the spoken fields of broadcast discussions and conversations.
In the first case, the difference from the grand mean varies between 1.51 in India
and 4.63 in Jamaica, while in the latter the range is from 5.15 in New Zealand to
8.92 in Singapore.
5 Boxplots contain information on the smallest and largest observation in a category (by the
whiskers), the interquartile range, i.e. the middle 50 % of all observations (by the box) and the
median, i.e. the value separating the higher half of the observations from the lower half (by the
line in the box). Outliers, i.e. observations clearly distant from the other observations, are plotted
as points.
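The quantities summarised by a boxplot can be computed directly, as in the following sketch for the broadcast discussion values from Table 1. It is an illustration under stated assumptions rather than the procedure used to draw the figures: the function name is hypothetical, outlier detection is left out, and the quartiles follow the default method of Python's `statistics.quantiles`, which may differ slightly from the convention of a given plotting package.

```python
from statistics import median, quantiles

# Differences from the grand mean for broadcast discussions (Table 1),
# one value per variety.
broadcast = {
    "Canada": 4.01, "Hong Kong": 2.97, "India": 1.51,
    "Jamaica": 4.63, "New Zealand": 1.96, "Singapore": 4.37,
}


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    data = sorted(values)
    # Assumption: Python's default ("exclusive") quartile method; plotting
    # software may use a different convention and give other box edges.
    q1, _, q3 = quantiles(data, n=4)
    return {"min": data[0], "q1": q1, "median": median(data),
            "q3": q3, "max": data[-1]}


summary = five_number_summary(broadcast.values())
print(summary)
print("interquartile range:", round(summary["q3"] - summary["q1"], 2))
```

The minimum and maximum returned here correspond to the 1.51 (India) and 4.63 (Jamaica) cited above for broadcast discussions.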
Figure 2 containing the boxplot of lexical density shows an almost completely
complementary picture to Figure 1 with only small differences in the range of
variation. Again, timed exams are situated between academic writing and admin-
istrative writing, at the one end, and broadcast discussions and conversations at
the other end.
Figure 2: Boxplot of the lexical density for each variety in the five registers
Like the use of pronouns, the values for lexical density displayed in the texts meet
the expectations depending on their register, as can be seen in Table 2.
Table 2: Lexical density for each register presented as the difference from the grand mean in
percentage points
Lexical density across varieties and registers (grand mean): 51.00 %
              Academic  Administrative  Broadcast    Conversations  Exams
              Writing   Writing         Discussions
Canada        7.45      5.30            –6.90        –9.43          –1.27
Hong Kong     7.36      3.58            –5.01        –8.34          2.23
India         7.37      5.90            –3.96        –8.26          6.43
Jamaica       7.23      6.57            –6.84        –9.69          2.72
New Zealand   6.01      4.87            –5.45        –9.74          1.19
Singapore     6.41      6.03            –5.45        –8.43          2.33
All varieties show the highest degree of lexical density in the field of academic
writing, and there is no clear variation within the register. Administrative writing,
too, yields only values above the grand mean. In this register, however, Hong
Kong English stands out as having a rather low lexical density, especially in com-
parison to the higher value in Jamaican English. The difference between the two
varieties amounts to 2.99 percentage points.
In contrast to academic writing and administrative writing, the last written
category, which contains texts from timed exams, appears surprisingly diverse.
Here, Hong Kong English, Jamaican English and Singapore English are closest,
ranging only between 2.23 and 2.72 percentage points. New Zealand English
shows a less distinct lexical density than these three. In contrast, Indian English
shows a value of 6.43, making it the only variety to nearly reach the degree of
lexical density it displays in the register of academic writing and surpassing that
of administrative writing. Indian English shows the highest average of lexical
density in written language, as opposed to Hong Kong English with the lowest.
The spoken registers, on the other hand, exclusively present degrees of
lexical density below the grand mean. The register of conversations yields the
lowest values for lexical density in all varieties, with Canadian English, Jamai-
can English and New Zealand English with nearly identical values at the high
end of the range, and Hong Kong English, Indian English and Singapore English
grouped around a lower range value. In total, however, the range between the
two most distant components is no more than 1.48. In comparison, the register
of broadcast discussions shows slightly more internal variation. Here, Canadian
English and Jamaican English are furthest below the grand mean with values
of 6.90 and 6.84, and Hong Kong English, New Zealand English and Singapore
English cluster between 5.01 and 5.45, while Indian English again stands out with
a rather high lexical density (3.96). This variety seems to rely fairly strongly on
lexical means to create cohesion.
While the analyses of the usage of pronouns and lexical density proved to
comply with earlier studies of spoken and written registers (cf. Biber et al. 1999:
65, 333–334), the frequency of conjunctions, the last linguistic feature in our
research, provides many unexpected values for the six varieties. Neither spoken
nor written registers appear consistent in displaying values above or below the
grand mean, with the sole exception of the category of broadcast discussions,
which, however, still contains a considerable amount of intra-register variation,
as Table 3 shows.
Table 3: Conjunctions per tokens presented as the difference from the grand mean in
percentage points
Conjunctions across varieties and registers (grand mean): 6.05 %
              Academic  Administrative  Broadcast    Conversations  Exams
              Writing   Writing         Discussions
Canada        0.09      –0.14           0.72         0.45           0.76
Hong Kong     –0.55     –0.98           0.43         0.48           –0.72
India         –0.64     –0.39           0.69         0.18           –1.15
Jamaica       –0.84     –0.63           1.21         1.01           0.07
New Zealand   –0.20     0.76            0.53         0.60           0.23
Singapore     –0.44     –1.51           0.91         –0.72          –0.02
Like the use of pronouns and the degree of lexical density, the use of conjunc-
tions in the spoken registers is particularly pronounced in Jamaican English. But
while other varieties, mainly Canadian English and Singapore English, show an
almost identical distinction for the first two features, Jamaican English stands out
regarding conjunctions.
Canadian, Hong Kong, Indian and New Zealand English do not show any
extraordinary values in the spoken registers, varying only slightly between
broadcast discussions and conversations and displaying values in both catego-
ries above the grand mean. Singapore English, in contrast, is the only variety that
yields a frequency of conjunctions clearly below the grand mean in the register
of conversations. This makes broadcast discussions the only one of the five reg-
isters that complies with what is usually observed in spoken language, namely
an above-average use of conjunctions in comparison to the grand mean and
especially to written texts. Given the setting of broadcast discussions involving
several speakers who all contribute to a particular topic, the register is likely to
have some argumentative character. The frequent use of conjunctions appears
particularly suitable for linking the arguments across clauses.
The written registers in this study each display exceptions, too. In academic
writing, Jamaican English is again the variety which deviates furthest from the
grand mean, but Canadian English is the only example of a variety using more
conjunctions than on average, if only marginally so. In administrative writing,
it is New Zealand English that deviates clearly; the use of conjunctions in this
variety may reflect some argumentative style in presenting the administrative
contents. In contrast, Singapore English shows the strongest tendency towards
the written medium by using clearly fewer conjunctions in comparison to the
grand mean. The most peculiar values, however, can be found in the register of
timed exams. Indian English and, with quite some distance, Hong Kong English,
rely much less on conjunctions, both showing a frequency of this cohesive device
below the average given by the grand mean. Singapore English almost equals the
grand mean. New Zealand English, Jamaican English and Canadian English, on
the other hand, contain more conjunctions in exams than the average. Although
this makes timed exams the most diverse register, it has to be kept in mind that
exams are written by students. Depending on the age and educational degree of
the examinees as well as the topic of the exam, their styles of writing will thus
differ from each other due to factors other than their language variety alone.
Figure 3: Boxplot of the percentage of conjunctions per tokens for each variety in the five
registers
While the boxplots for pronouns and lexical density (Figure 1 and Figure 2)
display clear differences between the registers, the differences are much less
distinctive for conjunctions (see Figure 3). Academic writing and administrative
writing have an almost identical median in terms of the relative frequency, i.e. not
in comparison to the grand mean. Only the range of variation across varieties is
larger in administrative writing. The other three registers display higher medians.
4.2 Summing up varieties
While the tables and figures above show values per register and for all six varieties
within them, this section sums up the register variation which the varieties
display for every one of the linguistic features. So far, every variety rendered
five values, one per register, for the use of pronouns, conjunctions and lexical
density. Every one of these values determines the register-specific distinction of
this feature by comparing it to the grand mean. The distribution of each linguistic
feature within a variety will be examined with the help of boxplots displaying the
descriptive statistics for each variety for a linguistic feature across text categories.
As can be seen in Figure 4, Jamaican English renders the highest range of
variation in terms of relative pronoun frequency, closely followed by Singapo-
rean and Canadian English. The latter is clearly distinct from the other varieties
because it displays the highest median, whereas all other varieties have an almost
identical median. New Zealand and Indian English show the smallest range for
the use of pronouns.
Figure 4: Percentage of pronouns per tokens represented as variation of text categories per
variety
The picture looks a little more diverse for lexical density of the different varie-
ties (as shown in Figure 5). Here, Canadian English and again Jamaican English
display the highest overall range, yet their medians differ greatly, showing that
Canadian English, as it did for pronouns, stands out, this time with the lowest
median. In terms of range of variation, Jamaican English shows considerable
variation across text categories as visible from the widest interquartile range.6
Indian English has the highest median of all varieties, reflecting what was said
from the register perspective in Section 4.1. Hong Kong, Jamaican and Singapo-
rean English are almost identical in terms of the median, whereas New Zealand
English displays a slightly lower median in comparison with these three varieties.
This suggests a tentative interpretation of a similarity between the two L1 varieties,
as Canadian English and New Zealand English are the two varieties with the
lowest median value for lexical density. Apparently these two L1 varieties have
a slightly reduced tendency to draw on lexical means to create cohesion in the
registers under investigation.
6 The interquartile range visualises the middle 50 % of all observations.
Figure 5: Lexical density represented as variation of text categories per variety
The most diverse results were yielded for conjunctions (Figure 6). This is hardly
surprising, as the closer analysis in the previous section already pointed in this
direction; yet an overview over the varieties makes this even more obvious. The
two L1 varieties Canadian and New Zealand English are again clearly similar:
both display narrow ranges here and the median is rather high with 6.5 (CAN)
and 6.6 (NZ). Similar to the findings for lexical density, Jamaican English again
has the highest range for frequency of conjunctions, in particular when consid-
ering the interquartile range. Its median is relatively high, at least in comparison
to the other L2 varieties. Compared to the other L2 varieties, Indian English has
a relatively small interquartile range. Singaporean English, despite most values
being clustered around a median of 5.7 %, contains two outliers with considera-
bly low as well as high deviations from the median. The two East-Asian varieties
(Hong Kong and Singapore English) are similar in displaying the lowest median
for conjunctions.
Figure 6: Percentage of conjunctions per tokens represented as variation of text categories per
variety
4.3 Synopsis across indicators
The last step to reach a value that represents the register variation within a variety
is the combination of the ranges obtained for the individual linguistic features
described in the previous section. The absolute range from the highest to the
lowest observation across text categories is calculated for every feature, and the
three values are added up and their mean obtained, as shown in Table 4.
Table 4: Ranges of variation – Cohesion
                 Canada  Hong Kong  India  Jamaica  New Zealand  Singapore
Pronouns         13.51   11.42      10.92  13.68    10.04        13.36
Lexical density  16.88   15.69      15.63  16.92    15.75        14.84
Conjunctions     0.91    1.46       1.84   2.05     0.96         2.42
Sum of ranges    31.30   28.57      28.39  32.65    26.76        30.62
Mean of ranges   10.43   9.52       9.46   10.88    8.92         10.21
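The steps leading to the Jamaica column of Table 4 can be retraced with the values given in Tables 1 to 3. The following Python sketch is illustrative only, with hypothetical variable and function names. Since the tables report rounded differences, ranges recomputed in this way for other varieties may deviate from Table 4 in the last digit.

```python
from statistics import mean

# Differences from the grand mean for Jamaican English, in the register
# order AcWrit, AdWrit, BCDiscs, Conv, Exams (Tables 1 to 3).
jamaica = {
    "pronouns": [-5.21, -4.64, 4.63, 8.47, -2.64],
    "lexical density": [7.23, 6.57, -6.84, -9.69, 2.72],
    "conjunctions": [-0.84, -0.63, 1.21, 1.01, 0.07],
}


def range_of_variation(values):
    """Absolute distance between the highest and the lowest observation."""
    return max(values) - min(values)


ranges = {feature: range_of_variation(vals) for feature, vals in jamaica.items()}
sum_of_ranges = sum(ranges.values())
mean_of_ranges = mean(ranges.values())

for feature, value in ranges.items():
    print(f"{feature}: {value:.2f}")
print(f"sum: {sum_of_ranges:.2f}, mean: {mean_of_ranges:.2f}")
```

Because the range is a difference between two values, it is the same whether it is computed on the relative frequencies themselves or on their differences from the grand mean.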
Since this study only analysed three linguistic features, observations in the varie-
ties do not differ much from each other, yet some tendencies can be observed that
allow some conclusions. As could already be deduced from the previous discus-
sion, Jamaican English displays the highest register variation for the three indica-
tors as represented by the sum and mean of ranges, if only slightly so. Canadian
and Singapore English are almost even and close to Jamaican English; all three
display a reasonable amount of variation in their use of cohesive devices, which
can be traced back mainly to distinct deviations in the spoken registers. Indian
and Hong Kong English, on the other hand, show less variation. The different
registers, representing spoken and written language, are therefore less distant
from each other in terms of cohesive devices, a pattern which, when looking back
at the more detailed results, originates both from the spoken and written parts of
the analysis. New Zealand English displays the least variation for the three indi-
cators across registers.
4.4 Discussion
The analysis in the previous sections gives insight into the distribution and usage
of cohesive devices from different perspectives, variation across registers as well
as regional variation. While the combination of these points of view makes the
analyses comparable, the explanatory power of the three features alone is limited:
this methodology only reaches its full potential when applied to a broader range
of features and registers.
When looking at registers, there is a notable, if unsurprising, difference
between the spoken and written registers. Lexical density and the frequency
of pronouns behave complementarily: whenever a register across all varieties
displays a low value for one feature, it will invariably display a high value for
the other feature. This confirms the findings from the literature mentioned in
Section 3. The frequency of conjunctions also confirms the distribution described
for other varieties (cf. Biber et al. 1999: 81); the differences between registers are,
however, much less pronounced than for the other two features. The registers can
be organised along a scale of orality with conversations and broadcast discus-
sions at one end and academic writing and administrative writing at the other
end. Interestingly, timed exams always take up a middle position: apparently,
while written, they still show a clear influence of the spoken medium as far as
cohesive devices are concerned, which might be traced back to the incomplete
development of a register repertoire of the students sitting the exams.
However, within these two major groups we can also find particularities and
variation. The register of conversations displays more extreme deviations from
the grand mean than that of broadcast discussions, which might be due to the
fact that broadcast talk is more formal in terms of social distance (cf. below) and
often prepared to a considerable degree in terms of medium. And, of course, it is
public. By contrast, conversations are as spontaneous as can be. When looking
at pronouns, especially, the difference between these two registers is clear and
reflects their respective functions. While conversations might vary with respect
to their goal, the number and identity of the participants can be assumed to be
rather stable. Therefore, using pronominal reference poses no problem. Radio
audiences, however, are subject to substantial fluctuation, and pronominal ref-
erence might be lost on some listeners if they enter the program at random times
during its course, making it advisable for radio moderators to avoid or at least min-
imise the usage of this feature. Furthermore, the purely phonic channel in which
the radio broadcasts are transmitted requires more reliance on auto- instead of
syn-semantic reference. These functional assumptions are also reflected in the
lexical density, which is higher in broadcast discussions than in conversations.
While the latter contain a considerable amount of function words (not least of
which are pronouns) and are also described as featuring more intricate sen-
tence structures (cf. Halliday 2001), broadcast information has to be designed to
be understood easily while at the same time not taking too long in order not to
strain the listeners’ attention span. The picture of conversations and broadcast
discussions showing similar tendencies but displaying some striking differences
has already been shown in the study of social distance on the same data set (cf.
Neumann 2012).7 While both registers showed an above average use of contrac-
tions and interjections, which can certainly be considered typical for spoken dis-
course in general, the less spontaneous and most of all more anonymous register
of broadcast talk showed a higher use of titles, especially in the L2 varieties of
Indian, Jamaican, Singaporean and Hong Kong English. This certainly is a means
of creating or maintaining a distance that is rare in conversations and indicates
the fact that in broadcast discussions, the participants might not know each other
very well or use the title as a piece of information for the radio audience.
7 The numbers discussed in this paper are updated from those reported in Neumann (2012).
Nevertheless, all tendencies reported there remain unchanged and consequently the
interpretation is also maintained.
In contrast to the spoken registers, the written categories of academic writing,
administrative writing and exams show less variation. The former two are very
close in their use of cohesive devices, which might be traced back to their rather
high degree of standardisation and norms. Administrative writing, especially, can
be assumed to follow strict guidelines or even use pre-fabricated forms or text
blocks depending on the topic, which arguably draw more on lexical means than
on pronominal reference. Lexical density is above average, which relates to
Neumann’s (2012) finding of the register being content-oriented and rather neutral in
social distance. Academic writing even shows slightly more extreme tendencies
regarding cohesive devices; the reasons for this, however, are surely very different.
The register shows very little internal variation when separated according to
the regional variety it comes from, which hints at the international character of
the research community. Furthermore, the texts for this study were taken from the
same general thematic area, namely natural sciences, and thus render the sample
even more homogeneous. This observation is confirmed in the study of social dis-
tance, where academic writing stands out as the most content-oriented register.
In contrast to this register-based perspective, differences between varieties
are less clear-cut. Although there are many small divergences among the
six regional varieties, no groupings emerge that would allow strong
claims about patterns across the varieties; rather, individual trends stand out which
hint at particularities in some of the varieties. The most striking of these can
be found in Jamaican English, which displays extreme values for most features
as well as most registers. While other varieties show peaks in the usage of one
feature or in single registers, cohesive devices are notably high or low through-
out most categories in Jamaican English. Singaporean English, too, is set apart
in some aspects, most obviously regarding the use of conjunctions. Especially in
administrative writing, conjunctions are by far rarer than in the other varieties,
and it is the only variety which displays a below-average use of conjunctions in
conversations.
Even though there is much less variation across varieties than across regis-
ters – an observation which can hardly be surprising given that we are looking at
varieties of English in contexts where a sufficient amount of functional variation
is required –, some interesting observations can be made when comparing the
boxplots displaying the register variation within each variety. Canadian English
diverges from the other varieties in the median for all three indicators. The other
L1 variety, New Zealand English, behaves similarly for lexical density and conjunctions.
The four L2 varieties display more variation, so some tendency of the
L1 varieties to display more homogeneous patterns than the L2 varieties seems to
emerge. Reasons for this could be found in the exonormative character of these
L2 varieties; the fact that the standard towards which the language is oriented
originates in a very different part of the world makes an adaptation to a certain
degree inevitable. The language is made to fit the needs of the new speech com-
munity with regard to societal, regional and geographical contexts, which can
be expected to be mirrored in the registers in use. Furthermore, as part of this
adaptation, L2 varieties often come into more contact with other languages and
might thus display linguistic characteristics originating from these interferences.
More generally, one might speculate that this coarse patterning into L1 and L2
varieties reflects exactly this: the status of the respective types of varieties with
a long history of (transplanted) mother tongue speakers in the case of Canadian
and New Zealand English on the one hand and indigenized non-native varieties
with reduced exposure to native English on the other hand. At the same time,
the two L1 varieties might also betray more interaction with a standard variety.
However, only a multivariate study of the type reported by Szmrecsanyi and Kort-
mann (2009), one which is based entirely on corpus findings rather than intro-
spective data, can tell whether these assumptions hold true across a wide range
of indicators and varieties.
5 Conclusion
The analysis presented in this paper aimed at determining the degrees of cohe-
sion within a variety, based on the particularities that are displayed by differ-
ent registers. By examining the distribution of cohesive devices as indicators of
medium, the distinctiveness of individual registers can be observed and com-
pared across different regional varieties of English. In order to obtain a broader
and more representative overview of the register variations within different vari-
eties of English, more linguistic features representing other register parameters
have to be analysed. At the same time, more varieties would ensure a more even
coverage of the English language, which would be a benefit particularly for the
calculation of the comparative value of the grand mean – not only would it then
represent more varieties and thus become ever more general or ‘grand’, but every
new variety taken into the framework of the study would automatically be drawn
on for comparisons by being included in this value.
Similar thoughts of course hold true also for the inclusion of more registers.
Both for spoken and written language, the ICE components hold many more files
than those of the five registers analysed here, and an augmentation of the data in
this way would allow more universal statements about spoken and written texts
and their differences. This distinction, then, apart from insights into individual
registers, would show most clearly which functions a variety mainly serves in
a community by laying open whether written or spoken registers have devel-
oped a more distinct character. The present study also showed, however, that
despite the usefulness of the investigation of the interaction between register and
variety, the International Corpus of English, independent of its undisputed value
for other types of varieties-related research questions, may not be the best data
set to investigate this interaction. The recently compiled GloWbE Corpus (Davies
2013), a collection of English web texts from 20 countries, cannot be used as an
alternative because it does not provide the information needed to distinguish reg-
isters. Currently, research is under way (Fest, forthcoming) that will afford a more
detailed and statistically more robust analysis of the interaction between register
and variety in English based on a corpus compiled for this specific purpose.
The combination of functional and regional variation thus still leaves a wide
field to be explored and many questions to be answered. Even with the limited
varieties and registers analysed so far, however, it becomes apparent that reg-
isters function as a very suitable gateway to understanding and describing the
development and status of a variety, determining its particularities and putting it
into perspective among Englishes worldwide at the same time.
References
Berruto, Gaetano. 2004. Sprachvarietät – Sprache (Gesamtsprache, Historische Sprache).
Linguistic Variety – Language (Whole Language, Historical Language). In Ulrich Ammon,
Norbert Dittmar, Klaus J. Mattheier & Peter Trudgill (eds.), Sociolinguistics/Soziolinguistik.
An International Handbook of the Science of Language and Society/Ein Internationales
Handbuch zur Wissenschaft von Sprache und Gesellschaft, 188–195. Berlin, New York: de
Gruyter.
Cambridge: CUP.
Biber, Douglas, Geoffrey Leech, Stig Johansson, Susan Conrad & Edward Finegan. 1999.
Longman grammar of spoken and written English. Harlow: Longman.
Davies, Mark. 2013. Corpus of Global Web-Based English: 1.9 billion words from speakers in 20
countries. Available online at http://corpus.byu.edu/glowbe/
Diwersy, Sascha, Stefan Evert & Stella Neumann. 2014. A weakly supervised multivariate
approach to the study of language variation. In Benedikt Szmrecsanyi & Bernhard Wälchli
(eds.), Aggregating dialectology, typology, and register analysis: Linguistic variation in
text and speech, 174–204. Berlin: de Gruyter.
Fest, Jennifer. Forthcoming. “News language in varieties of English: A corpus-based analysis of
newspaper reports.” PhD Thesis, Department of English, American and Romance Studies,
RWTH Aachen University.
Gregory, Michael. 1967. Aspects of varieties differentiation. Journal of Linguistics 3(2). 177–198.
Halliday, Michael A. K. 1978. Language as Social Semiotic: The Social Interpretation of
Language and Meaning. London: Arnold.
Halliday, Michael A. K. 2001. Literacy and linguistics: Relationships between spoken and
written language. In Anne Burns & Caroline Coffin (eds.), Analysing English in a global
context, 181–193. London: Routledge.
Halliday, Michael A. K. & Christian M. I. M. Matthiessen. 2013. Halliday’s introduction to
functional grammar. 4th ed, rev. Abingdon: Routledge.
Halliday, Michael A. K. & Ruqaiya Hasan. 1989. Language, context, and text: Aspects of
language in a social-semiotic perspective. Oxford: OUP.
Halliday, Michael A. K. & Zoe L. James. 1993. A quantitative study of polarity and primary tense
in the English finite clause. In John McHardy Sinclair (ed.), Techniques of description:
Spoken and written discourse, 3–35. London: Routledge.
Kortmann, Bernd & Benedikt Szmrecsanyi. 2004. Global synopsis: Morphological and syntactic
variation in English. In Bernd Kortmann, Edgar W. Schneider, Kate Burridge, Rajend
Mesthrie & Clive Upton (eds.), A handbook of varieties of English, 1142–1202. Berlin:
Mouton de Gruyter.
Kortmann, Bernd & Kerstin Lunkenheimer (eds.). 2013. eWAVE – The electronic world atlas of
varieties of English. Leipzig: Max Planck Institute for Evolutionary Anthropology.
http://ewave-atlas.org/. (Accessed on 2014-03-08.)
Kristiansen, Gitte & René Dirven. 2008. Cognitive sociolinguistics: Language variation, cultural
models, social systems. Berlin: de Gruyter.
Matthiessen, Christian M. I. M. 1993. Register in the round: Diversity in a unified theory of
register analysis. In Mohsen Ghadessy (ed.), Register analysis: Theory and practice,
221–292. London: Pinter Publishers.
Mollin, Sandra. 2007. New variety or learner English? Criteria for variety status and the case of
Euro-English. English World-Wide 28(2). 167–185.
Nelson, Gerald. 2006. The core and periphery of World Englishes: A corpus-based exploration.
Nelson, Gerald, Sean Wallis & Bas Aarts. 2002. Exploring natural language: Working with the
British component of the International Corpus of English. Amsterdam: Benjamins.
Nesbitt, Christopher & Günter Plum. 1988. Probabilities in a systemic grammar: The clause
complex in English. In Robin P. Fawcett & David J. Young (eds.), New Developments in
Systemic Linguistics, 6–33. London: Pinter Publishers.
Neumann, Stella. 2012. Applying register analysis to varieties of English. In Monika Fludernik &
Benjamin Kohlmann (eds.), Anglistentag 2011 Freiburg: Proceedings, 75–94. Trier: WVT.
Neumann, Stella. 2013. Contrastive register variation. A quantitative approach to the
comparison of English and German. Berlin: de Gruyter Mouton.
Rayson, Paul. 2009. Wmatrix: A web-based corpus processing environment. Website. Lancaster.
http://ucrel.lancs.ac.uk/wmatrix/. (Accessed on 2014-03-09.)
Sand, Andrea. 2004. Shared morpho-syntactic features of contact varieties: Article use. World
Englishes 23(2). 281–298.
Sand, Andrea. 2008. Angloversals? Concord and interrogatives in contact varieties of English.
In Terttu Nevalainen, Irma Taavitsainen, Päivi Pahta & Minna Korhonen (eds.), The
dynamics of linguistic variation: Corpus evidence on English past and present, 183–202.
Amsterdam: Benjamins.
Szmrecsanyi, Benedikt & Bernd Kortmann. 2009. The morphosyntax of varieties of English
worldwide: A quantitative perspective. Lingua 119(11). 1643–1663.
Van Rooy, Bertus, Lize Terblanche, Christoph Haase & Joseph Schmied. 2010. Register
differentiation in East African English: A multidimensional study. English World-Wide
31(3). 311–349.
Wong, Deanna, Steve Cassidy & Pam Peters. 2011. Updating the ICE annotation system:
Tagging, parsing and validation. Corpora 6(2). 115–144.
Xiao, Richard. 2009. Multidimensional analysis and the study of World Englishes. World
Englishes 28(4). 421–450.
Section III: Regional, contrastive and diachronic register variation
The final section of the present volume broadens the analytical perspective in
order to include further issues that need to be addressed in a comprehensive dis-
cussion of variational text linguistics. While Section I gave a detailed analysis of
selected registers and Section II provided a juxtaposition of registers, Section III
offers a synchronic investigation of regional and contrastive register variation
as well as a diachronic study. The contributions will show that these different
approaches are by no means mutually exclusive but represent different facets of
one common research paradigm.
Both Barbara Güldenring’s paper “Metaphors in New English academic
writing” and Steffen Schaub’s contribution “The influence of register on noun
phrase complexity in varieties of English” deal with international varieties of
global English on the basis of the International Corpus of English, each focusing
on one particular linguistic category. In this way, Güldenring and Schaub con-
tinue Neumann and Fest’s comparative approach that concluded Section II,
although they place more emphasis on variational and sociolinguistic aspects.
While Güldenring discusses the semantic phenomenon of metaphor, Schaub
concentrates on the syntactic structure of the noun phrase. Güldenring deals
with English as a Second Language exclusively in Asia (India, Hong Kong and
Singapore), whereas Schaub includes Englishes in Asia (India, Hong Kong and
Singapore), the Caribbean (Jamaica) and North America (Canada), covering first-
and second-language use. Since it is not feasible to compare all these regional
varieties per se, Güldenring focuses on academic discourse, as previous research
has shown that this register is particularly rich in metaphor. In particular, she
compares metaphors in academic writing from New Englishes with more tradi-
tional varieties of English and examines the occurrence of metaphorical domains
in the sub-registers Humanities, Natural Science and Social Science with the help
of Conceptual Metaphor Theory. By contrast, Schaub takes into account not only
academic writing but also the registers of conversation, unscripted speeches and
social letters. He argues that an investigation of noun phrase complexity based
on modification types sheds new light on both internal register consistency and
regional variability, especially with respect to the situational features of commu-
nicative purpose and production circumstances. Both contributions demonstrate
that such multivariate approaches open manifold new research possibilities with
each potential parameter shift.
In contrast to the articles in Section II, Valentin Werner’s paper “Real-time
online text commentaries: A cross-cultural perspective” does not compare differ-
ent registers but studies one particular register across the two linguacultures of
German and British English. In this way, Werner’s contribution transcends the
monolingual English viewpoint and relates English to an adjacent language in the
Germanic family tree. Since it discusses computer-mediated communication, the
paper links up with Biber and Egbert’s study of web registers at the beginning of
Section I. The medium-dependent register of online text commentaries is further
narrowed down by concentrating on the subject matter of sports, with data drawn
from the online versions of widely read British and German newspapers. Having
established online text commentaries as an emergent register with specific lin-
guistic features, the paper highlights quantitative tendencies of cross-cultural
diversification on the basis of communicative intentions and the respective target
group of internet users. Hence, apart from purely linguistic deviations, the results
of the study also have significant repercussions on cross-cultural differences.
The volume closes with Javier Pérez-Guerra’s article “Word order is in
order here: A diachronic register analysis of syntactic markedness in English”,
which discusses grammatical developments of word order in written registers
from Middle English to Late Modern English. Thus, the paper stands out not only
by virtue of its diachronic approach but also because of its distinct focus on the
three syntactic constructions of left dislocation, topicalisation and extraposition,
which are here considered as register markers. By concentrating on these fea-
tures, the paper is able to give a comparative account of an exceptionally large
number of registers, including handbooks, law, philosophy, science, trials, trav-
elogues, romance, fiction, diaries, drama, history, letters, biography, education,
religious treatises and sermons. It is thus demonstrated that some constructions
have undergone a significant change in usage, such as left dislocation, which
used to be a feature of literate registers but today is a common marker of conver-
sation. Thus, with the help of historical corpora such as the Penn-Helsinki Parsed
Corpus of Early Modern English, it is possible to examine the historical background
of present-day English registers. The volume comes full circle when Pérez-Guerra
finally mentions the hybridity of historical registers, thereby creating a link to
hybrid web registers, as postulated by Biber and Egbert in their opening chapter.
Barbara Güldenring
Metaphors in New English academic writing
Abstract: In recent decades, heightened academic interest in World Englishes has
led to a growing body of research surrounding institutionalised second-language
varieties of English, often referred to as New Englishes. This paper aims at con-
tributing to this research by exploring metaphorical variation in New English aca-
demic writing as represented by three components of the International Corpus
of English (ICE), namely India, Hong Kong and Singapore. It asks two major
questions: What kinds of metaphor occur across New English academic texts?
and How are these metaphors distributed across the academic sub-registers of
Humanities, Natural Science and Social Science? While generally suggesting
that metaphor can be viewed as a characteristic feature of academic writing, this
study considers metaphors that are ubiquitous to all varieties and academic dis-
ciplines under investigation as well as potentially variety- or discipline-specific
conceptualisations, which leads into a brief discussion about metaphorical func-
tion in New English academic writing.
1 Introduction
The field of World Englishes, including research devoted to the study of New
Englishes1, has grown into a prominent linguistic discipline within the last thirty
years. In addition to this body of research, an increasing number of studies
devoted to metaphor in authentic discourse have been exploring the relation-
ship between metaphor and register (Goatly 1997; Cameron 2003; Skorczynska
and Deignan 2006; Semino 2008; Steen et al. 2010; Semino et al. 2013). Steen
et al. (2010: 203) have found that the academic register “is characterized by the
highest proportion of metaphor-related words” of the registers they investigated,
including news, fiction and conversation. Corroborating this finding, Krennmayr
(2011: 322) concludes that “news texts contain a larger proportion of
metaphorically used words than fiction and conversation but a smaller proportion than
academic texts”. This not only indicates a significant metaphoricity of academic
texts vis-à-vis other registers, but it also marks a departure from older views on
the value of metaphor in the academic realm:
To draw attention to a philosopher’s metaphors is to belittle him – like praising a logician
for his beautiful handwriting. Addiction to metaphor is held to be illicit, on the principle
that whereof one can speak only metaphorically, thereof one ought not to speak at all.
(Black 1954: 273, also qtd. in Römer 2000: 353).
Nevertheless, Black (1954: 294) comes to the conclusion that there is “[n]o doubt
metaphors are dangerous and perhaps especially so in philosophy. But prohibition
against their use would be a wilful and harmful restriction upon our powers of
inquiry”. This last statement in particular comes across as a reluctant admission of
the inescapable presence of metaphor, particularly in academic texts. Nowadays,
after decades of metaphor research, most prominently in the vein of Conceptual
Metaphor Theory (henceforth CMT), a negative stance, such as the one described
above, has been largely dispelled. This is due to the observations that academic
texts display a multitude of metaphors that are closely connected to the phenom-
ena they are used to describe (cf. Römer 2000: 353) and that metaphors, in fact,
constitute a large part of expert (as well as everyday) discourse (cf. Jäkel 1997:
284). Furthermore, this understanding of metaphor has led to important insights
about the function of metaphor in academic discourse, including its role in the
acquisition as well as imparting of knowledge (cf. Drewer 2003, Cameron 2003).
1 While “World Englishes” in reference to the academic discipline usually refers to the study of
any kind of English variety worldwide, “New Englishes”, a term attributed to Platt, Weber, and
Ho (1984), denotes those varieties that have grown in direct consequence of English’s spread
around the world (prominently via British colonialism) and, thus, have developed as nativised
varieties in areas in which English was not traditionally the native language of the population,
fulfilling various (institutionalised) societal functions. In a descriptive sense New Englishes are
often characterised by variation on all linguistic levels due to substrate influence.
Barbara Güldenring, Philipps-Universität Marburg
The present pilot study aims at contributing to the growing understanding
of the nature of metaphor and register by introducing a varietal perspective on
the two. It also aims at contributing to the cognitive approach to
World Englishes, which has only recently developed from the merging of these
two previously isolated paradigms (cf. Wolf and Polzenhagen 2009: 1). With these
particular goals in mind, I am primarily concerned with the following questions
concerning New English metaphor and academic writing:
1) What kinds of metaphor occur across New English academic texts in the first
place?
2) In view of their functional contributions, how are these metaphors distrib-
uted across the academic sub-registers of Humanities, Natural Science and
Social Science?
The first question will be largely devoted to issues pertaining to metaphor distri-
bution on the basis of several conceptual mappings for the target domain concept
IDEAS in New English academic texts. In addition to this, I will briefly consider to
what extent the varieties under investigation differ in terms of the various entail-
ments or elaborations apparent in shared mappings. Addressing the second ques-
tion will involve scrutiny of the distribution of IDEAS metaphors according to the
academic sub-registers and, consequently, discussion of their functional roles.
However, before delving into these issues, the following outlines the theoretical
construct and analytic framework to which the present study adheres.
2 Theoretical Background
In their immensely influential book Metaphors We Live By, Lakoff and Johnson
(2003 [1980]: 3) present a theory of metaphor, widely known as CMT (Conceptual
Metaphor Theory): “metaphor is pervasive in everyday life, not just in language
but in thought and action. Our ordinary conceptual system, in terms of which we
both think and act, is fundamentally metaphorical in nature”. They continue by
asserting that “[t]he essence of metaphor is understanding and experiencing one
kind of thing in terms of another” (Lakoff and Johnson 2003: 5). Thus, one impor-
tant claim of Conceptual Metaphor Theory is an experiential basis for metaphor,
which can be clearly seen by how an abstract concept like IDEAS is understood in
terms of more concrete concepts, with which we have more direct (bodily) expe-
rience:
(1) a. IDEAS ARE FOOD
       What he said left a bad taste in my mouth.
    b. IDEAS ARE PEOPLE
       The theory of relativity gave birth to an enormous number of ideas in physics.
    c. IDEAS ARE PLANTS
       That idea died on the vine.
    d. IDEAS ARE PRODUCTS
       He produces new ideas at an astounding rate.
    e. IDEAS ARE COMMODITIES
       That idea just won’t sell.
    f. IDEAS ARE RESOURCES
       Don’t waste your thoughts on small projects.
    g. IDEAS ARE MONEY
       He’s rich in ideas.
    h. IDEAS ARE CUTTING INSTRUMENTS
       That cuts right to the heart of the matter.
    i. IDEAS ARE FASHIONS
       That idea went out of style years ago.
    j. IDEAS ARE LIGHT-SOURCES
       That’s an insightful idea.
(Lakoff and Johnson 2003 [1980]: 46–48)
This should by no means be considered an exhaustive list of IDEAS metaphors,
largely due to the fact that these examples were intuitively formulated and were,
at the time, not corroborated with authentic data. However, they do make clear
the relationship between the target domain (the abstract concept, e.g. IDEAS)
and the source domain (the more concrete, structured concept, e.g. FOOD) as
postulated by Conceptual Metaphor Theory. That is, our knowledge of the source
domain serves to achieve better comprehension of the target domain of which our
knowledge is less structured. Thus, a linguistic metaphor2 such as What he said
left a bad taste in my mouth relates our experience with a sensory-bound dislike
of certain food to the more abstract dislike of a certain idea.
2 Conceptual metaphors can be distinguished from linguistic metaphors, metaphorically used
words (following the terminology by Steen et al. 2010) or metaphorical linguistic expressions.
Kövecses (2010: 4) makes this distinction and defines a conceptual metaphor as consisting of
“two conceptual domains, in which one domain is understood in terms of another”, while defin-
ing linguistic metaphor as “words or other linguistic expressions that come from the language or
terminology of the more concrete conceptual domain”. By way of example, reconsider (1c) above:
IDEAS ARE PLANTS. That idea died on the vine is the linguistic instantiation of the mapping
between IDEAS and PLANTS. A conceptual metaphor makes clear that IDEAS are understood as
behaving in a way similar to PLANTS in order to express something about IDEAS via this analogy,
and the linguistic metaphor codifies this. Thus, died on the vine not only points to the existence
of the IDEAS – PLANTS mapping, it also helps to efficiently communicate that some ideas are not
fully developed and thus can be discarded, much like a dead plant.
In terms of academic discourse, Zichler (2010: 97–98) affirms the suitability
of conceptual metaphor, because academic discourse often involves the investi-
gation of phenomena that escape our direct experience; metaphors can be seen
as an opportunity to reveal the unknown through the known, i.e. the abstract
through the concrete. Nevertheless, Partington (1998: 107–108) criticises Lakoff
and Johnson for not making room for genre in their theory, because “[c]ertain
metaphors may well be much more prevalent in one kind of writing than another,
in fact, one of the characterising features of a genre is probably the kind of meta-
phor generally to be found therein.” By extending this argument to New English
academic writing from the register perspective, the present study does suggest
that metaphor can be viewed as a characteristic register feature in the sense of
Biber and Conrad (2009):
The register perspective combines an analysis of linguistic characteristics that are common
in a text variety with the analysis of the situation of use of the variety. The underlying
assumption of the register perspective is that core linguistic features like pronouns and
verbs are functional, and, as a result, particular features are commonly used in association
with the communicative purposes and situational context of the texts.
With the aim of extending the notion of “core linguistic features”, this paper takes
the position that metaphors can be viewed as features of the text with which a
communicative function is linked.³ In terms of being “linguistic”, metaphors, due
to their pervasiveness, are accessible and can be located by their relation to the
lexico-grammatical, that is, linguistic realisations of underlying conceptual met-
aphors. Lakoff and Johnson (2003: 7) maintain that “[s]ince metaphorical expres-
sions in our language are tied to metaphorical concepts in a systematic way, we
can use metaphorical linguistic expressions to study the nature of metaphorical
concepts” and this has found application in various methods for metaphor iden-
tification. Therefore, I will explore metaphors as core functional features that,
while being conceptual in nature, provide evidence of their existence via the
linguistic expressions that point to them. This, in turn, allows for a more flexi-
ble notion of linguistic feature or characteristic in the register perspective. Simi-
larly, Wolf and Polzenhagen (2009: 16–17) make the case for including metaphor
in research on varieties of English, otherwise dominated by studies describing
variation of the more traditional structural elements of language, i.e. phonology,
morphology, and syntax:
From a CL perspective, the core of the descriptive approach advocates, however, a far too
narrow understanding of ‘form’ and of what counts as ‘linguistic peculiarities.’ This narrow
understanding deliberately excludes important dimensions of variation […] Unaddressed
are also crucial aspects of relations between linguistic units, beyond standard structuralist
formal and semantic parameters. Specifically, little or no attention is paid to the fact that
linguistic material from various domains is systematically linked through metaphoric and
metonymic mappings, which constitutes a key dimension of relatedness.
3 For a similar position, cf. Krennmayr (2011).

Thus, the present paper is devoted to the investigation of metaphor as a core
feature of the academic register, as constructed by New English varieties. The
following sections provide some detail about the corpus data and serve to sketch out
the method used to access metaphors via their linguistic realisations encountered
in New English academic writing.
3 Corpus data
The data in the present study stems from those components of the International
Corpus of English (henceforth ICE) which are representative of the New English
varieties associated with Hong Kong, India and Singapore. The subcategory of
academic writing was explored for metaphors pertaining to the target domain
IDEAS, since it was assumed that this domain would feature prominently in
academic writing of all kinds. True to the overall design of the ICE project, for each
component under investigation, academic writing includes ten 2,000-word
published texts for each of the disciplines Humanities, Natural Science and Social
Science.⁴
Nelson (1996: 32) provides a description of ICE academic writing in terms of
intended readership and mode of composition:
Printed material is written for a large, unrestricted audience that the writer does not know.
[…] Academic writing reaches a smaller, more well-defined readership [as compared to
newspapers, popular writing or fiction], but the exact individual readership is unknown
to the writer at the time of composition. […] writers of printed works are usually required to
follow the house style of the publisher […] for which they are writing. Printed material may
have been edited by a number of different people, and the final version is often a product
of several earlier revisions. […] Learned writing is produced by specialists for specialists. In
the humanities, for example, it may include journal articles by academic historians written
for other academic historians.
Integrating this into Biber and Conrad’s (2009) framework for the analysis of situ-
ational characteristics of a register, New English academic writing, as represented
by ICE, can be described as involving most commonly single or plural authors
addressing an un-enumerated audience. The author-reader relationship is on a
professional, specialist level that is characterised most significantly by shared
knowledge. Furthermore, as Nelson (1996) points out above, ICE academic texts,
because of their printed status, most likely entail highly revised and edited
production circumstances, whereas the place of communication is public, but not in
a shared setting. The communicative purposes are to describe, explain, summarise
and report on research pertaining to a specific topic. Based on this description,
New English academic writing shares the characteristics that define academic
writing associated with other varieties of English.
4 The ICE project also includes texts from the field of Technology.
4 Methodology
In order to identify metaphors in ICE academic writing, a corpus-based study
(which has become more prominent in metaphor research in general) was the
logical choice. Berber Sardinha (2007: 12) summarises the distinct advantages of
studying metaphor with the help of corpora, such as ICE, over intuition-based
studies:
corpus-based studies can offer reliable information about the use of metaphors in language.
Another [advantage] is that corpora typically include large amounts of data, which can be
searched to provide information about the frequency of known metaphorical expressions.
Yet another is that genre or register-specific corpora can be explored to indicate metaphors
that are typical of certain fields or subject areas.
4.1 Retrieving metaphor candidates
Before reporting on the methodological details of the present study, it is important
to determine what type of method can be best employed to retrieve potential
metaphor candidates from the corpora for further analysis. Berber Sardinha
(2012: 21–22) sorts previous methods into two overriding groups: 1) “sampling
techniques”, which involve a pre-selection of lexical units with which to approach
the corpus, and 2) “census techniques”, which involve examining each unit of the
corpus as a whole. Since the present study makes use of both of these techniques
to a certain extent, it is worth briefly considering previous research utilising these
methods.
A prominent example of a sampling technique for metaphor retrieval can
be found in Deignan (2005: 27), who uses the Bank of English corpus to study
metaphors pertaining to horse-racing and gambling and, before consulting the
corpus, establishes “the key lexical items in the field [of horse-racing and gam-
bling] using intuition, thesauri, dictionaries and collocational information from
concordances”. This method elicits linguistic metaphors such as At 48 he is too
young to be in the running for Prime Minister or For months, polls showed the two
main parties neck and neck.
A similar sampling technique is used in Stefanowitsch’s (2006) Metaphor
Pattern Analysis (MPA), which is a corpus-based approach that aims at investigat-
ing metaphorical target domains by pre-determining lexical items that represent
those domains. Yet, in contrast to Deignan (2005), Stefanowitsch (2006: 66) is
more specific about identifying metaphor on the basis of what he calls metaphor-
ical patterns: “A metaphorical pattern is a multi-word expression from a given
source domain (SD) into which one or more specific lexical item [sic] from a given
target domain have been inserted”. Furthermore, metaphorical patterns “do not
merely instantiate general mappings between two semantic domains […]. [T]hey
establish specific paradigmatic relations between target domain lexical items
and the source domain items that would be expected in their place in a non-met-
aphorical use” (Stefanowitsch 2006: 67). This can be illustrated by a linguistic
metaphor such as That idea went out of style years ago, for which Lakoff and
Johnson (2003 [1980]) formulated the conceptual metaphor IDEAS ARE FASH-
IONS. According to Metaphor Pattern Analysis, we can clearly see how the meta-
phorical pattern establishes a paradigmatic relationship between idea and words
denoting clothing, which are also expected to fill the same slot, such as in That
shirt went out of style years ago. Stefanowitsch (2006: 71) investigates EMOTION
metaphors, like ANGER, by pre-defining a set of lexical items that correspond to
this domain, e.g. anger, fury, rage, wrath, etc., which help to retrieve metaphors
such as ANGER IS AN OPPONENT IN A STRUGGLE (X wrestle with anger, X protect
Y from anger, etc.) (Stefanowitsch 2006: 76).
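The sampling logic behind such approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name find_target_hits, the toy sentences and the simple whole-word matching are hypothetical and are not part of Stefanowitsch's (2006) procedure, in which every hit still has to be inspected manually for a metaphorical pattern.

```python
import re

# Pre-defined target-domain items for ANGER, as listed by Stefanowitsch (2006: 71).
ANGER_ITEMS = ["anger", "fury", "rage", "wrath"]


def find_target_hits(sentences, items):
    """Return (sentence_index, matched_item, sentence) for every hit.

    Hypothetical helper: whole-word, case-insensitive matching only.
    Deciding whether a hit is metaphorical remains a manual step.
    """
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, items)) + r")\b", re.IGNORECASE
    )
    hits = []
    for index, sentence in enumerate(sentences):
        for match in pattern.finditer(sentence):
            hits.append((index, match.group(1).lower(), sentence))
    return hits


# Toy data, invented for illustration only.
SAMPLE = [
    "She wrestled with her anger for days.",
    "The report was published in March.",
    "He tried to protect the children from his rage.",
]

hits = find_target_hits(SAMPLE, ANGER_ITEMS)
```

The list of lexical items is fixed before the corpus is consulted, which is the defining property of a sampling technique in this sense.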
The present study recognises the value of such methods and draws from
them by involving, in essence, a sampling technique of a similar kind. However,
the particular method used here departs from these types of sampling technique
in two important ways. Firstly, Deignan (2005: 92) claims that “the direction of
investigation in corpus studies is from the linguistic form through to meaning.
It is not possible to use the corpus to proceed in the other direction”. This is cer-
tainly valid from the perspective of an approach that relies on pre-formulated lists
of lexical items or strings of words. However, the present study aims at challeng-
ing this unidirectional view by taking a cue from Hardie et al. (2007) and attempt-
ing to approximate the direction of meaning to form. Secondly, the present study
automates the initial step of establishing key or representative lexical items for
investigating a specific target domain by prompting the corpus itself to provide
all lexis related to that domain. In the interest of incorporating both aspects into
the present method, I employed the web-based corpus analysis software Wmatrix
(Rayson 2009).
In order to approach this corpus-based metaphor study from meaning to
form, in a first step, I uploaded the respective ICE texts for each variety-specific
sub-register, e.g. India academic Natural Science, to Wmatrix, which semantically
annotates corpus texts with the aid of USAS (UCREL Semantic Annotation
System).⁵ “The semantic fields automatically annotated by USAS can be seen as
roughly corresponding to the domains of metaphor theory” (Hardie et al. 2007).
One such semantic field can be illustrated by the semantic tag used in this study
to extract metaphors involved in conceptualising the domain of IDEAS, namely
the tag X4.1 (Mental object: Conceptual object). It was assumed that this tag
would feature prominently in academic texts from various sub-registers. There-
fore, this particular tag was selected to query for IDEAS metaphors, which can be
accomplished by simply concordancing for X4.1 and, thus, setting meaning as the
starting point and not form.
However, Wmatrix also provides frequency lists of words tagged with X4.1
(e.g. idea, concept, notion, theory, etc.), which give an indication of which linguis-
tic forms are associated with this domain in the corpus texts before concordanc-
ing. Therefore, in this second step, we can see an automated version of Deignan’s
(2005) or Stefanowitsch’s (2006) pre-determination of lexical items with which to
approach the corpus. This, of course, has the distinct advantage that there is no
reliance on pre-selected linguistic metaphors, which may or may not be present
in the corpus. Thus, the present study was less restricted in terms of which met-
aphorical data could be uncovered. In addition, the corpus texts acted as their
own reference by supplying lexical information from this domain that otherwise
would have involved a certain degree of guesswork. Furthermore, with recourse
to Metaphor Pattern Analysis, already established advantages remained intact,
such as the “retrieval of large number of instances of a target domain item” as
well as the potential to quantify certain metaphorical instances of lexis from a
particular domain and “to make generalizations concerning the importance of
the conceptual metaphors underlying these patterns” (Stefanowitsch 2006: 66).
Finally, in a third step, I systematically examined the concordance lines for each
X4.1 item with the aid of AntConc (Anthony 2012).
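The first two steps described above (querying by semantic tag, then listing the word forms that carry it) can be sketched as follows. The sketch assumes that the text is already available as (word, tag) pairs; it does not reproduce the interfaces of Wmatrix or AntConc, and the function names, the window size and the placeholder tag OTHER are hypothetical.

```python
from collections import Counter

TARGET_TAG = "X4.1"  # USAS tag: Mental object: Conceptual object


def tagged_frequency(tokens, tag=TARGET_TAG):
    """Count the word forms that carry the given semantic tag."""
    return Counter(word.lower() for word, t in tokens if t == tag)


def concordance(tokens, tag=TARGET_TAG, window=4):
    """Return (left context, keyword, right context) for each tagged token."""
    words = [word for word, _ in tokens]
    lines = []
    for i, (word, t) in enumerate(tokens):
        if t == tag:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            lines.append((left, word, right))
    return lines


# Example sentence from ICE-SIN:W2A-017#32:1. "OTHER" is a placeholder for
# every tag that is not X4.1; it is not real USAS output.
SENTENCE = ("These early ideas form the foundation of the modern idea "
            "of corporate social responsibility").split()
TOKENS = [(w, TARGET_TAG if w in ("ideas", "idea") else "OTHER")
          for w in SENTENCE]

freq = tagged_frequency(TOKENS)   # which forms realise the domain
kwic = concordance(TOKENS)        # candidates for manual inspection
```

Starting from the tag rather than from a word list is what sets meaning, not form, as the point of departure; the frequency list then supplies the lexical items automatically.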
All in all, the retrieval of metaphor candidates from the corpora involved a
more automated sampling technique than has been previously employed. Since
at this point, we are still talking about “metaphor candidates,” this method is
in need of a separate step that identifies these candidates as metaphorical or
non-metaphorical. This step takes on characteristics of the “census technique”
and will be outlined in the following.
5 For a more extensive overview, cf. Archer et al. (2002).

4.2 Identifying metaphors
Once all potential IDEAS metaphors were retrieved from the corpus data, these
candidates were manually analysed and marked as being instances of linguistic
metaphors or not. In order to do away with as much analyst intuition as pos-
sible, for this step I relied on MIP (the Metaphor Identification Procedure), ini-
tially developed by the Pragglejaz Group (2007) and further refined as MIPVU
(Metaphor Identification Procedure Vrije Universiteit) (Steen et al. 2010),⁶ which
assumes that “[m]etaphorical meaning in usage is indirect meaning: it arises
out of a contrast between the contextual meaning of a lexical unit and its more
basic meaning, the latter being absent from the actual context but observable
in others.” This procedure has been assessed by Berber Sardinha (2012: 22) as
belonging to the census technique. Nonetheless, the present study did not fully
adhere to the “census” quality of this technique because the texts comprising
the ICE academic writing under investigation were not all read in their entirety.
However, for making decisions on many of the metaphors, it was often necessary
to undertake a closer reading of the greater context to the extent that a contextual
meaning for the metaphor could be established.
To provide further support for the metaphor decisions made in this
study, I often additionally consulted the VU Amsterdam Metaphor Corpus Online,
which is the largest corpus annotated for metaphorical language according to
MIPVU (Steen et al. 2010), in order to tackle uncertain cases and to benefit from
the insight of multiple analysts. If a linguistic metaphor was identified in the VU
Amsterdam Metaphor Corpus Online for the uncertain case I was investigating,
then I also considered it metaphorical.⁷ For instance, advocates of a theory or
advocating a theory were considered metaphorical because similar formulations in
the VU Amsterdam Metaphor Corpus Online were also judged to be
metaphorical: Most advocates of biological theories was identified as an indirect
metaphor there (Steen et al. 2010).
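The decision logic of this identification step can be summarised in a small sketch. It is a simplification under stated assumptions: the names Decision and classify_candidate are hypothetical, the analyst's judgement and the corpus look-up are passed in as plain values, and the individual steps of MIP/MIPVU themselves are not modelled.

```python
from enum import Enum


class Decision(Enum):
    METAPHOR = "metaphor"
    NON_METAPHOR = "non-metaphor"
    DISCARDED = "discarded"


def classify_candidate(analyst_judgement, attested_in_reference_corpus=None):
    """Sketch of the decision for one metaphor candidate.

    analyst_judgement: True if the contextual meaning contrasts with a more
        basic meaning, False if it does not, None if the analyst is uncertain.
    attested_in_reference_corpus: for uncertain cases only, whether a
        comparable expression is annotated as metaphorical in the
        VU Amsterdam Metaphor Corpus Online.
    """
    if analyst_judgement is True:
        return Decision.METAPHOR
    if analyst_judgement is False:
        return Decision.NON_METAPHOR
    # Uncertain case: defer to the reference corpus, otherwise discard.
    if attested_in_reference_corpus:
        return Decision.METAPHOR
    return Decision.DISCARDED


# "advocates of a theory": uncertain, but attested in the reference corpus.
example = classify_candidate(None, attested_in_reference_corpus=True)
```

Discarding unattested uncertain cases makes the resulting counts conservative, since borderline candidates never enter the totals on the strength of one analyst's intuition alone.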
6 For details of the individual steps of MIPVU, cf. Pragglejaz Group (2007) and Steen et al. (2010).
7 If it was not found in the VU Amsterdam Metaphor Corpus Online, I discarded the uncertain
case under investigation in order to avoid relying solely on my own intuition.
4.3 Formulating conceptual metaphors
After establishing which linguistic metaphors were present in the data, a final step
was undertaken to formulate potential conceptual mappings underlying these
metaphors. Harkening back to Lakoff and Johnson’s (2003) list of IDEAS metaphors,
a few of these mappings were found in the data, but not all were present.
Therefore, the data warranted the consideration of other conceptual mappings,
and due to this circumstance, the following (broadly formulated) source domains
have been suggested as a means of categorising and, thus, quantifying the
metaphors encountered in the data:
Table 1: Overview of IDEAS metaphors in New English Academic Writing

ORGANISMS: ideas are … Total: 178
  PEOPLE: The role of theory is to give quantitative predictions <ICE-SIN:W2A-027#17:1>
  PLANTS: Language and thoughts have different genetic roots. <ICE-SIN:W2A-005#56:1>
  UNSPECIFIED: thought […] and language […] assume a unitary existence <ICE-SIN:W2A-005#58:1>

OBJECTS: ideas are … Total: 131
  OBJECTS IN CONTAINERS: Medical personnel […] should keep this concept in mind <ICE-HK:W2A-027#104:1>
  CONTAINERS: philosophy does not confine to one particular subject matter <ICE-IND:W2A-001#18:1>
  POSSESSIONS: the parties […] would take the view <ICE-HK:W2A-014#52:1>
  LANDMARKS: feminists also turn to the phenomenological reflection on the body, especially to the idea […] <ICE-HK:W2A-003#86:1>

ARTEFACTS: ideas are … Total: 58
  TOOLS: countries choose to use environmental issues to spark trade wars <ICE-HK:W2A-011#27:2>
  MIRRORS: Through an ideological mirror, individuals are constituted as subjects. <ICE-HK:W2A-002#59:1>
  CLOTHS: The common thread linking these two ideals <ICE-HK:W2A-004#41:1>
  GOODS: straight thinking, therefore, is at a discount <ICE-IND:W2A-012#57:1>

STRUCTURES: ideas are … Total: 42
  BUILDINGS: These early ideas […] form the foundation of the modern idea of corporate social responsibility <ICE-SIN:W2A-017#32:1>
  PARTS OF BUILDINGS: These early ideas […] form the foundation of the modern idea of corporate social responsibility <ICE-SIN:W2A-017#32:1>

EVENTS / ACTIVITIES: ideas are … Total: 29
  JOURNEYS: the safe and well trodden areas of basic and general principles and practices <ICE-SIN:W2A-002#5:1>
  (VIOLENT) CONFLICTS: the concept […] came to be challenged exceedingly in Supreme Court <ICE-IND:W2A-005#32:1>
  GAMES: The succeeding tales also […] play off the themes <ICE-IND:W2A-008#59:1>
  COMMUNICATIVE EVENTS: The notion of an individual text develops […] to the historical and cultural dialogue <ICE-HK:W2A-002#110:1>

MATTER / ENERGY / OTHER NATURAL PHENOMENA: ideas are … Total: 13
  PRECIOUS METAL: The touchstone, of all ideas, should be not their novelty <ICE-IND:W2A-012#97:1>
  LIGHT/LIQUID: Christian theology/philosophy also absorbs the idea of process philosophy <ICE-HK:W2A-005#46:1>

IMAGES: ideas are … Total: 7
  He regards the political perspective of understanding as the absolute horizon of all reading and interpretation <ICE-HK:W2A-002#67:1>
With this framework in place, it is now possible to pinpoint variation between
varieties and sub-registers of ICE academic writing. Although the formulation
of conceptual metaphors can be a tricky business at times (and I acknowledge
that they are, in fact, plausibility offerings), for the purposes of this study, it was
necessary to establish the grounds of comparison, because it makes evaluating
the distribution of the various metaphors possible. Nevertheless, these categories
were first formulated after analysis of the whole data set and on the basis of their
most salient conceptual features, as expressed by the linguistic metaphors. Addi-
tionally, taken together, these categories can be viewed as supplying a metaphor-
ical profile for the way the New English varieties, as represented by ICE-Hong
Kong, ICE-India and ICE-Singapore, conceptualise the IDEAS domain.
On a final note, due to the nature of the academic register and the highly con-
ventionalised informational language it contains, it was assumed that the bulk of
metaphors encountered in the corpora would be of the conventional type, which
is in line with previous studies (cf. Jäkel 1997; Steen et al. 2010) and confirmed
by the present results. With this in mind, we turn to some specific findings in the
following section.
5 Results
The method described above elicited a total of 458 metaphors for the target
domain IDEAS across all varieties (Hong Kong, India and Singapore) and aca-
demic sub-registers (Humanities, Natural Science and Social Science) with the
aid of a total of 1,011 X4.1 lexical items as provided by Wmatrix. Therefore, an
initial finding is that a good portion of the words used to talk about IDEAS in New
English academic writing show up in metaphors, namely 45.3 %.
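The reported share follows directly from the two totals. The short check below assumes, as the text implies, that each identified metaphor corresponds to one X4.1 lexical item; the variable names are illustrative.

```python
TOTAL_METAPHORS = 458    # IDEAS metaphors identified across all texts
TOTAL_X41_ITEMS = 1011   # lexical items tagged X4.1 by Wmatrix

# Share of X4.1 items that occur in a metaphor, in per cent.
share = TOTAL_METAPHORS / TOTAL_X41_ITEMS * 100
print(f"{share:.1f} %")  # prints "45.3 %"
```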
Considering the basic distribution across varieties as well as across sub-reg-
isters of New English academic writing, as illustrated in Table 2, Hong Kong and
Academic Humanities emerge as the most metaphorical variety and the most
metaphorical sub-register, respectively, in terms of conceptualising the IDEAS
domain.
Table 2: Distribution of IDEAS metaphors by variety and sub-register

Varieties:    Frequency:    Sub-registers:              Frequency:
Hong Kong     199           Academic Humanities         336
India         124           Academic Natural Science     30
Singapore     135           Academic Social Science      92
Taking account of the distributional patterns of IDEAS metaphors according to
the source domains involved, some clear preferences can be observed from both
the variety as well as the sub-register perspective. This is demonstrated by the
results in Table 3.
Table 3: Distribution of IDEAS metaphors by sub-register and variety according to source domains
(Hum = Humanities, NS = Natural Science, SS = Social Science)

                                       Hong Kong                India                    Singapore
Source domain                          Hum   NS   SS  Total     Hum   NS   SS  Total     Hum   NS   SS  Total
ORGANISMS                               70    4    8     82      25    3   13     41      37    2   16     55
OBJECTS                                 47    2   14     63      27    4    5     36      21    1   10     32
ARTEFACTS                               16    0    3     19      14    3    3     20      14    1    4     19
STRUCTURES                              16    3    1     20       4    4    0      8      10    0    4     14
EVENTS/ACTIVITIES                        3    1    0      4       8    1    3     12      11    0    2     13
MATTER/ENERGY/OTHER NATURAL PHENOMENA    4    0    3      7       2    0    2      4       1    0    1      2
IMAGES                                   4    0    0      4       2    1    0      3       0    0    0      0
Total:                                 160   10   29    199      82   16   26    124      94    4   37    135
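As a consistency check, the marginal totals of Table 3 can be recomputed from its cells and compared with Table 2. The sketch below assumes that the three count columns for each variety are ordered Humanities, Natural Science, Social Science, an ordering inferred from the totals and from the Humanities counts discussed in the text; the function names are illustrative.

```python
SUB_REGISTERS = ("Humanities", "Natural Science", "Social Science")

# Cell counts from Table 3, ordered as in SUB_REGISTERS (assumed column order).
TABLE_3 = {
    "Hong Kong": {
        "ORGANISMS": (70, 4, 8), "OBJECTS": (47, 2, 14),
        "ARTEFACTS": (16, 0, 3), "STRUCTURES": (16, 3, 1),
        "EVENTS/ACTIVITIES": (3, 1, 0),
        "MATTER/ENERGY/OTHER NATURAL PHENOMENA": (4, 0, 3),
        "IMAGES": (4, 0, 0),
    },
    "India": {
        "ORGANISMS": (25, 3, 13), "OBJECTS": (27, 4, 5),
        "ARTEFACTS": (14, 3, 3), "STRUCTURES": (4, 4, 0),
        "EVENTS/ACTIVITIES": (8, 1, 3),
        "MATTER/ENERGY/OTHER NATURAL PHENOMENA": (2, 0, 2),
        "IMAGES": (2, 1, 0),
    },
    "Singapore": {
        "ORGANISMS": (37, 2, 16), "OBJECTS": (21, 1, 10),
        "ARTEFACTS": (14, 1, 4), "STRUCTURES": (10, 0, 4),
        "EVENTS/ACTIVITIES": (11, 0, 2),
        "MATTER/ENERGY/OTHER NATURAL PHENOMENA": (1, 0, 1),
        "IMAGES": (0, 0, 0),
    },
}


def variety_totals(table):
    """Sum all cells for each variety."""
    return {variety: sum(sum(cells) for cells in rows.values())
            for variety, rows in table.items()}


def sub_register_totals(table):
    """Sum each sub-register column across varieties and source domains."""
    totals = [0] * len(SUB_REGISTERS)
    for rows in table.values():
        for cells in rows.values():
            for i, count in enumerate(cells):
                totals[i] += count
    return dict(zip(SUB_REGISTERS, totals))


by_variety = variety_totals(TABLE_3)
by_sub_register = sub_register_totals(TABLE_3)
```

Under this column order, the recomputed totals match the frequencies given in Table 2 for both varieties and sub-registers.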
One tendency that is apparent from the data in Table 3 is the reliance on the
source domain ORGANISMS to conceptualise IDEAS. As outlined above, this
broadly formulated category groups together more specific domains such as
PEOPLE and PLANTS. Of the domains in this category, the most prominent for
all varieties and sub-registers is, in fact, PEOPLE, which accounts for
the bulk of IDEAS metaphors with the source domain ORGANISMS. In Hong Kong
academic Humanities the PEOPLE domain is used 84.3 % of the time (59 out of 70),
in India academic Humanities 96 % (24 out of 25) and in Singapore academic
Humanities 81 % (30 out of 37). In Academic Natural Science texts, no matter
what variety, PEOPLE is the sole domain involved for the ORGANISMS category.
This type of metaphor clearly
involves personification, which is in turn “the most obvious ontological meta-
phor” (Lakoff and Johnson 2003: 33). This finding is consistent with other studies
that have pinpointed personification as a characteristic feature of academic texts,
especially of the type “when a non-human entity (referring to some discourse
entity, such as a text) is the subject with a verb that requires a human agent”
(Steen et al. 2010: 108). This type was found in the data across varieties as well as
across sub-registers, as the following briefly illustrate:
Hong Kong Academic Humanities:
(2) the body in contemporary thought that may be regarded as legacies of the Cartesian
view, which treat the body as primarily an object <ICE-HK:W2A-003#67:1>
India Academic Natural Science:
(3) Actually JuddOfelt theory works less satisfactorily <ICE-IND:W2A-025#16:1>
Singapore Academic Social Science:
(4) the concept of dialect groups is too embracing to be able to take care of internal
segmentations <ICE-SIN:W2A-016#58:1>
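The PEOPLE shares reported for academic Humanities can be recomputed from the counts given in the text. The variable names below are illustrative, and the values are rounded to one decimal place.

```python
# (PEOPLE metaphors, all ORGANISMS metaphors) in academic Humanities.
PEOPLE_IN_ORGANISMS = {
    "Hong Kong": (59, 70),
    "India": (24, 25),
    "Singapore": (30, 37),
}

# Percentage of ORGANISMS metaphors that draw on the PEOPLE domain.
people_share = {
    variety: round(100 * people / organisms, 1)
    for variety, (people, organisms) in PEOPLE_IN_ORGANISMS.items()
}
```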
Furthermore, as Table 3 shows, there is also a prominence of another type of
ontological metaphor that I have subsumed under the category of OBJECTS. Because
PEOPLE and OBJECTS metaphors make up the bulk of all metaphors collected,
their analysis merits special attention, which I will turn to below. However,
beforehand, it is worthwhile to consider certain focal points for interpreting the
data from the perspectives of variety and sub-register.
From the cross-discipline perspective, it is perhaps not a very surprising
result that IDEAS metaphors in general are more present in Humanities (336
metaphors in total), as compared to the other academic sub-registers (30 in Natural
Science and 92 in Social Science; cf. Table 2), considering the nature of the domain
itself and its informational contribution to the Humanities texts. IDEAS, as
represented in the corpus by lexical items with the tag X4.1, occur in general more
frequently in the Humanities as compared to Social Sciences and Natural Sciences (624
items in Humanities, 114 in Natural Sciences and 273 in Social Sciences). This,
of course, relates to the specific topical domain(s) of the Humanities texts that
warrant discussion of the respective histories of ideas, at least more often than
in Natural Science or even Social Science writing, which can be exemplified by a
glance at a sample of titles from the Hong Kong corpus texts:
(5) Academic Humanities:
(a) “Re(-)presenting the Unconscious: From Sigmund Freud to Fredric” <ICE-HK:W2A-002>
(b) “Chinese-Western Comparative Drama in Perspective” <ICE-HK:W2A-007>
(c) “Anthropology and Christology in Christian-Confucian Dialogue” <ICE-HK:W2A-005>
(6) Academic Natural Science:
(a) “Infections of the Central Nervous System” <ICE-HK:W2A-021>
(b) “Old stone walls as an ecological habitat for urban trees in Hong Kong” <ICE-HK:W2A-022>
(c) “Patterns of referral to the paediatric specialist clinic of a regional hospital:
descriptive study” <ICE-HK:W2A-023>
(7) Academic Social Science:
(a) “A Strategy for Hong Kong Industries, Inc.” <ICE-HK:W2A-012>
(b) “The prospect of mediation in resolving construction disputes” <ICE-HK:W2A-014>
(c) “The Rehabilitation Development Coordinating Committee and the Future of Services
Concerning People with Disabilities in Hong Kong” <ICE-HK:W2A-019>
Moreover, IDEAS metaphors, based on this expected topical diversity,⁸ occur with
varying elaborations from discipline to discipline and also functionally contrib-
ute to academic writing in different ways, as will be demonstrated below.
From the cross-varietal perspective, it is difficult to establish what variety
presents itself as most metaphorical on the basis of data concerning one concep-
tual domain. Additionally, although Hong Kong is clearly characterised by the
highest frequency of IDEAS metaphors, these metaphors still show up in com-
parable numbers in Indian and Singaporean academic writing. Therefore, what
is of greater interest here is the consideration of metaphorical variation beyond
frequency. Kövecses (2010: 216) states that “two languages may share the same
conceptual metaphor, but the metaphor is elaborated differently in the two lan-
guages”. For instance, the conceptual metaphors THE BODY IS A CONTAINER
FOR THE EMOTIONS and ANGER IS FIRE have an attested existence in both Hun-
garian and English; in Hungarian the body with fire inside is often elaborated
as a pipe – an elaboration that does not appear to be at work in conventional
English metaphors of this kind (Kövecses 2010: 216). By extending this notion to
the study of varieties, it is possible to establish variation along the lines of this
kind of elaboration. For instance, IDEAS were conceptualised in Hong Kong
academic Humanities as MORAL GUIDES, illustrated by (8) to (11) in the following
section, a conceptualisation that was not found to be part of the mappings for the
other varieties.
8 In the study of metaphor, we should not underestimate the problematic aspect of topical
diversity, which is related to the design of the ICE corpora. As far as the author of the present
paper is aware, the ICE texts, despite being carefully selected as representative examples of the
text types comprising the general design of the ICE project, were not selected on the basis of
topic similarity. Thus, ICE-based research into metaphor may run into the problem of absence of
a domain, not because a variety does not make use of this domain, but because it just so happens
that the topics of the texts selected do not make use of it. This factor, along with the small size
of the ICE components, does in the long run present difficulties for more extensive research into
metaphor variation, for which higher frequencies for a particular domain may be required.
However, in terms of register research, ICE’s design is still the best option for comparative
study of varieties and thus has been used in the present study.
Whether or not these differences have an overall characterising role for the
study of varieties remains to be seen and would require more extensive research.
However, this does indicate a starting point from which to consider overall meta-
phorical variation along the variety divide. Specifically, it helps to create a basis
for separating those metaphors and metaphorical expressions that are ubiqui-
tous to all varieties from potentially variety-specific conceptualisations or at the
very least variety-specific domain preferences. This will be briefly considered in
the next section, which is followed by a closer look at metaphor variation and
function from the sub-register perspective on New English academic writing.
6 Discussion
6.1 Metaphor across New English varieties
In distributional terms, it is clear that the IDEAS domain is conceptualised by all
categories in all three varieties, with the exception of the IMAGES category (no
instances in Singapore academic writing), which did not contribute many meta-
phors in general (four for Hong Kong and three for India). Due to the fact that all
varieties make use of nearly all source domain categories to conceptualise IDEAS,
I conclude that there is no great difference between the varieties, especially in
regard to the non-presence of a certain domain.
Nevertheless, the similarities in domain exploitation for IDEAS metaphors
do not necessarily exclude potentially variety-specific conceptualisations. If we
consider differences in terms of the various entailments or elaborations appar-
ent in shared domain mappings, it becomes clear that varieties, in fact, display a
certain degree of variation. Consider (8) to (11) below from Hong Kong academic
Humanities:
(8) identity as a woman depends on the specific social regulatory ideals by which female
bodies are trained and formed <ICE-HK:W2A-003#26:1>
(9) it is widely accepted that general principles serve to guide moral conduct and decisions
<ICE-HK:W2A-004#114:1>
(10) Ethical behaviour is guided by the ethical ideal of caring and not by principles or rules.
<ICE-HK:W2A-004#125:1>
(11) we are under the guidance of the ethical ideal, that vision of the best self.
<ICE-HK:W2A-004#103:1>
These extracts illustrate an elaboration that could be represented by the conceptual
metaphor IDEAS ARE MORAL GUIDES, which occurs a total of 11 times in the
Hong Kong corpus. While (8) to (11) display a personification of IDEAS (repre-
sented by principles, ideal and vision) in that they are pursuing a uniquely human
activity, that is, serving as a good example of moral behaviour or actively guiding
and training, this elaboration is not present in Indian and Singaporean academic
writing and, thus, has the potential to be variety-specific. IDEAS ARE MORAL
GUIDES belongs to a more general metaphor, IDEAS ARE DOMINANT PEOPLE,
which, by contrast, has been attested for all varieties and shows no major tendency
towards a certain elaboration:
(12) general principles do not always determine what is appropriate <ICE-HK:W2A-004#139:1>
(13) writers […] are very much influenced by the theories of black Aesthetics <ICE-IND:W2A-009#29:1>
(14) Whereas once EAP was dominated by the concept of registers <ICE-SIN:W2A-007#39:1>
Another example of a variety-specific elaboration comes from Singapore, which
is the only variety to conceptualise IDEAS as a TEACHER, illustrated by (15):
(15) a controversial issue could be either a good or bad teacher by affecting learning through
its contents or through its dynamics. <ICE-SIN:W2A-002#6:1>
Although this TEACHER conceptualisation is unique to the Singaporean corpus,
the notion that IDEAS can impact an individual or a society in a positive or negative
manner, as illustrated in (15), is still part of metaphors found in all varieties,
for instance:
(16) IDEAS ARE PEOPLE WHO HELP
(a) translation as the ideological handmaid of imperialism <ICE-HK:W2A-009#8:1>
(b) they [principles] may all work together to facilitate the use of language <ICE-IND:W2A-002#32:1>
(c) What would soothe her is […] the thought that his action in comforting her is a
response to her need <ICE-SIN:W2A-004#29:1>
(17) IDEAS ARE PEOPLE WHO HARM
(a) many forms of oppressive ideologies <ICE-HK:W2A-003#16:1>
(b) Ayer’s notion of philosophy deprives philosophy of its empirical content <ICE-IND:W2A-001#17:1>
(c) understanding of human phenomena are sometimes distorted by […] political
beliefs, ideology and sheer ethnocentrism. <ICE-SIN:W2A-005#12:1>
All in all, these metaphors show that, despite the potential for individual prefer-
ence for certain elaborations, such as IDEAS AS MORAL GUIDES or TEACHERS,
New English varieties, specifically in academic writing, tend to draw from the
same conceptual pool, that is, their metaphors display more conceptual similar-
ities than differences. This is perhaps not so different from varieties tradition-
ally conceived of as more “standard”, such as British or American English, which
would also speak to the strong conventional nature of the academic register, to
which I turn in the following.
6.2 Discussion: Metaphor across academic sub-registers
Distributional differences in IDEAS metaphors can also be identified from the
sub-register perspective. One obvious observation relates to the distribution
of PEOPLE metaphors. Humanities emerges as the most clearly metaphorical
sub-register for the conceptualising of this domain, followed by Social Science
and Natural Science. Incidentally, each variety individually displays the same
tendency, as portrayed in Table 4.
Table 4: Distribution of IDEAS ARE PEOPLE metaphors

             Academic      Academic          Academic
             Humanities    Natural Science   Social Science
Hong Kong    59            4                 7
India        24            3                 11
Singapore    30            2                 12
Total:       113           9                 30
All varieties consistently place Humanities on the more metaphorical side and
Natural Science on the less metaphorical side of the continuum, with Social
Science somewhere in between. This is a general trend for most other categories9,
illustrated for the second most prominent ontological metaphor, IDEAS ARE
OBJECTS, by Table 5.
9 The exceptions are 1) India academic Natural Science and Social Science, which both contain
three IDEAS ARE ARTEFACTS metaphors; 2) Hong Kong and India academic Natural Science, which
have more IDEAS ARE ARTEFACTS metaphors than Social Science; and 3) India academic Natural
Science, which has one IDEAS ARE IMAGES metaphor, whereas India Social Science has none. However,
the frequencies involved here are very small and do not necessarily detract from the general trend.
Table 5: Distribution of IDEAS ARE OBJECTS metaphors

             Academic      Academic          Academic
             Humanities    Natural Science   Social Science
Hong Kong    47            2                 14
India        27            4                 5
Singapore    21            1                 10
Total:       95            7                 29
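The ordering reported in Tables 4 and 5 can be checked mechanically. The following Python snippet is a minimal sketch for illustration only and is not part of the original analysis: the names `PEOPLE`, `OBJECTS`, `column_totals`, `follows_trend` and `shares` are assumptions introduced here, and the figures are simply the raw frequencies from the two tables. It recomputes the column totals, tests whether each variety individually shows the ordering Humanities > Social Science > Natural Science, and expresses the totals as percentage shares.

```python
# Raw frequencies transcribed from Tables 4 and 5.
# Column order: Humanities, Natural Science, Social Science.
SUB_REGISTERS = ("Humanities", "Natural Science", "Social Science")

PEOPLE = {  # Table 4: IDEAS ARE PEOPLE
    "Hong Kong": (59, 4, 7),
    "India": (24, 3, 11),
    "Singapore": (30, 2, 12),
}
OBJECTS = {  # Table 5: IDEAS ARE OBJECTS
    "Hong Kong": (47, 2, 14),
    "India": (27, 4, 5),
    "Singapore": (21, 1, 10),
}


def column_totals(table):
    """Sum each sub-register column across the three varieties."""
    return tuple(sum(column) for column in zip(*table.values()))


def follows_trend(table):
    """True if every variety has Humanities > Social Science > Natural Science."""
    return all(hum > soc > nat for hum, nat, soc in table.values())


def shares(counts):
    """Percentage share of each sub-register within one row of counts."""
    total = sum(counts)
    return tuple(round(100 * n / total, 1) for n in counts)


for name, table in (("PEOPLE", PEOPLE), ("OBJECTS", OBJECTS)):
    totals = column_totals(table)
    print(name, dict(zip(SUB_REGISTERS, totals)), shares(totals), follows_trend(table))
```

These are raw counts. The sketch makes no assumption about the relative size of the sub-register samples and performs no normalisation or significance testing, so it can only confirm the direction of the tendency, not its strength.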
Despite these frequency differences, the New English academic register as a
whole displays a common characteristic, in that it makes use of well-established
conventional metaphors, of the kind we would expect to find in other academic
English varieties. Examples (18) to (20) represent conventional metaphors from
the OBJECTS category that feature in all academic sub-registers:
(18) IDEAS ARE CONTAINERS
(a) The interesting aspect […] lies in its intra-Asian comparative literature perspective
<ICE-HK:W2A-007#51:1> (Humanities)
(b) one of the major problem in the lanthanide f-f intensity theory <ICE-IND:W2A-025#28:1>
(Natural Science)
(c) Such a view hides in it subtle dangers. <ICE-HK:W2A-019#88:2> (Social Science)
(19) IDEAS ARE OBJECTS IN CONTAINERS
(a) the eternal world become latent dream-thoughts stored in the unconscious psyche
<ICE-HK:W2A-002#17:1> (Humanities)
(b) Medical personnel […] should keep this concept in mind <ICE-HK:W2A-027#104:1>
(Natural Science)
(c) the word “wealth” […] now occupies the vacated slot in the dirt dictionary as an
unworthy concept <ICE-IND:W2A-012#56:1> (Social Science)
(20) IDEAS ARE POSSESSIONS
(a) Full cognizance should be given to the influences on the curriculum planner
<ICE-SIN:W2A-002#82:1> (Humanities)
(b) the panoramic view gives the idea that it is more slopy and undulating
<ICE-IND:W2A-030#42:1> (Natural Science)
(c) The argument […] was given another perspective <ICE-SIN:W2A-017#63:1> (Social
Science)
However, in view of the topical diversity sketched out above, the mere fact that
these metaphors can occur in all academic sub-registers (albeit for some in small
numbers) is not necessarily an indication of similarity in the way these meta-
phors are elaborated. Just as we entertained the notion of variety-specific concep-
tualisations, we can consider discipline-specific conceptualisations, or at least
preferences, by examining those metaphors from the OBJECTS category that are
not found in all academic sub-registers.
When IDEAS are conceptualised as OBJECTS in general, there is more of a
tendency in the Humanities, first and foremost, and in Social Science, secondly,
to highlight certain qualities, whereas in Natural Science no specific qualities are
attributed to IDEAS as OBJECTS. To exemplify this, consider the following quali-
ties, which can be formulated as individual mappings:
(21) IDEAS ARE MOVEABLE OBJECTS
(a) Wittgenstein has replaced Kant’s concept of mind by language
<ICE-IND:W2A-002#7:1> (Humanities)
(b) One view, advanced in the 1920s and 1930s <ICE-SIN:W2A-017#23:1> (Social Science)
(22) IDEAS ARE VISIBLE OBJECTS
(a) Although the concept was never defined formally, it is clear on the basis of these
answers <ICE-HK:W2A-004#57:1> (Humanities)
(b) financial statements that show a “true and fair view” <ICE-IND:W2A-020#32:1>
(Social Science)
Examples (21) to (22) may not illustrate metaphors in the strictest “discipline-spe-
cific” sense due to their occurrence in two separate sub-registers, or they may be
an indication of Social Science containing texts of a more “Humanities” nature
than a “Natural Science” one. Nevertheless, when considering the frequency of
these metaphors, it becomes apparent that Humanities shows a slight preference
for them over Social Science, since IDEAS ARE MOVEABLE OBJECTS occurs 8
times in the Humanities and 6 times in Social Science, while IDEAS ARE VISIBLE
OBJECTS occurs 18 times in the Humanities and only 4 times in Social Science.
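To make the size of this preference concrete, the reported frequencies can be converted into proportions. The snippet below is a hedged illustration rather than part of the original study: the dictionary `COUNTS` and the helper `humanities_share` are hypothetical names introduced here, and the only data used are the four counts given in the preceding paragraph.

```python
# Frequencies reported in the text for two elaborations of IDEAS ARE OBJECTS.
COUNTS = {
    "IDEAS ARE MOVEABLE OBJECTS": {"Humanities": 8, "Social Science": 6},
    "IDEAS ARE VISIBLE OBJECTS": {"Humanities": 18, "Social Science": 4},
}


def humanities_share(by_register):
    """Percentage of all occurrences that fall in the Humanities texts."""
    total = sum(by_register.values())
    return round(100 * by_register["Humanities"] / total, 1)


for metaphor, by_register in COUNTS.items():
    print(metaphor, humanities_share(by_register))
```

On these figures the Humanities account for a little over half of the MOVEABLE OBJECTS instances but roughly four fifths of the VISIBLE OBJECTS instances. With counts this small, the percentages should be read as a descriptive summary only.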
In fact, by taking a closer look at the latter category, we can see a perhaps
more suitable candidate for a discipline-specific elaboration, because IDEAS are
not only VISIBLE OBJECTS in the Humanities, but are also represented as VISIBLE
OBJECTS that were previously hidden from view and, by being revealed, have
attained the VISIBLE quality:
(23) IDEAS ARE VISIBLE OBJECTS (PREVIOUSLY HIDDEN FROM VIEW)
(a) the article is a legitimate attempt at establishing rapports de fait […] shedding light
on certain issues <ICE-HK:W2A-007#44:1>
(b) Subsequently they have exposed this notion as a historical and ideological con-
struct <ICE-HK:W2A-009#6:1>
(c) This paper aims to put a step towards that by highlighting certain pragmetic [sic]
principles, some of which may go otherwise unnoticed <ICE-IND:W2A-002#26:1>
This particular elaboration makes up 72.2 % of IDEAS ARE VISIBLE OBJECTS (13
out of 18) in the Humanities texts and perhaps points to a functional role for this
metaphor in this academic sub-register. Humanities texts, often in introductory
sections, typically inform the reader about the history of ideas involved in the
discussion of the topic at hand. For instance, if we consider the greater context of
(23c), a linguistic paper from the India corpus entitled “Pragmatic Principles and
Language”, it becomes clear that IDEAS ARE VISIBLE OBJECTS (PREVIOUSLY
HIDDEN FROM VIEW) functions to locate the paper within these previous ideas
and accentuate its contribution to these ideas:
(24) Philosophers have found pragmatics to be quite close to what they have called “ordi-
nary language analysis”. They have often used isolated insights about the working of
language in solving philosophical riddles without paying much attention to many of
the underlying pragmatic principles of the language that they are using. As they have
primarily concerned themselves with the theories of meaning, rules, and other related
issues, they were forced to study pragmatics of language incidentally without which
they would not have found it possible to explain, for example, what is “meaning”.
A fuller understanding of pragmatic aspects of the working of language is yet to be
achieved despite numerous attempts by philosophers and linguists. This paper aims
to put a step towards that by highlighting certain pragmetic [sic] principles, some of
which may go otherwise unnoticed. <ICE-IND:W2A-002#23:1-26:1>
By conceptualising IDEAS (principles) as becoming VISIBLE OBJECTS in need of
revelation (highlighting, go otherwise unnoticed), it becomes obvious to the reader
that the present article’s aim is to fill those knowledge gaps left by previous
“philosophers and linguists”. Incidentally, the other 12 metaphors of this kind found
in the Humanities texts function in exactly the same way. This is not necessarily
evidence that Natural Science or Social Science texts are completely devoid of this
metaphorical function, although the data indicates a clear preference for it in the
Humanities, which is most likely due to the nature of the topics that these texts
comprise.
At this point, we could entertain the possibility that metaphors of this kind
act systematically as metaphorical “register features” (cf. Biber and Conrad 2009;
Schubert, this volume; Sanchez-Stockhammer, this volume) due to a register’s
or, in this case, sub-register’s preference for this particular mapping and function.
Furthermore, as the relationship between metaphor and register is investigated
more extensively within the study of varieties of English, we could conceive
of the existence of metaphorical “register markers” (cf. Biber and Conrad 2009;
Schubert, this volume; Sanchez-Stockhammer, this volume), whose uniqueness
is not only determined by the register in which they prominently feature, but also
perhaps by the extent to which a variety is nativised.10
Nevertheless, the present data provides insight into another preference and,
thus, another potential metaphorical register feature that can be seen in IDEAS
ARE PEOPLE, particularly those that stretch beyond the sentence boundary over
a larger portion of the text. Consider (25) below, which serves as an example of
how metaphors can influence textual structuring, that is, how they contribute to
the cohesion as well as coherence of a text more significantly in Humanities than
in Natural Science and Social Science:
(25) It is time for courses to introduce controversial issues in management studies. A con-
troversial issue covers new grounds. It enhances the learning process. It could facil-
itate further the practice of examining, analyzing and deciding skills. However, if not
carefully introduced, controversial issues could generate a disproportionate degree
of confusion, and result in demotivating the students. As such, the introduction of a
controversial issue in the curriculum would have to be properly managed because a
controversial issue could be either a good or bad teacher by affecting learning through
its contents or through its dynamics. <ICE-SIN:W2A-002#6:1-11>
We have encountered this metaphor before as IDEAS ARE TEACHERS (15) and
determined that it is a metaphor specific to the Singapore corpus. However, in
(25) we see that it functions to promote the coherence of the text, because an IDEA
(issue) is portrayed as having all those teacher-like qualities one could expect
when encountering a real teacher: A good teacher covers new grounds (top-
ic-wise), enhances the learning process, facilitates the practice of skills, while
a bad teacher can generate confusion and demotivate students. These qualities
are attributed to IDEAS via the repeated presence of the metaphor IDEAS ARE
TEACHERS, which is then directly stated at the end of the passage, acting as a
summary of sorts.
Here, it is also conceivable to consider this metaphor’s function in creating
cohesion, since almost every instantiation of IDEAS (issue(s), it) is
embedded in the same metaphor throughout the passage and all are linked by
language pertaining to both helpful attributes of a teacher (e.g. enhancing learning
and facilitating practice of skills) and negative attributes (e.g. generating
confusion and demotivating). This is different for Natural Science and Social
Science texts, which do not give such prominence to IDEAS metaphors and,
in doing so, leave little room for them to structure their respective texts in this
manner. Again, from this perspective, it seems to make more sense to talk about
potential metaphorical “register features” over “register markers” (cf. Schubert,
this volume).
10 For extensive discussion about nativisation and the extent to which a variety, as it is developing,
orientates itself towards the English input variety, cf. Schneider’s “Dynamic Model” (Schneider
2007, 2003). Furthermore, research is currently being completed by the author of the present
paper exploring the relationship between metaphor and nativisation and, thus, considering to
what extent a variety, e.g. Indian English, behaves metaphorically differently from its traditional
input variety, British English, for certain target domains, e.g. EMOTIONS.
7 Conclusions
The assumption behind the present study is that metaphor is a characteristic and
functional feature of the academic register. Although this study focuses on meta-
phors conceptualising a single domain, it shows that, despite traditional notions
of the metaphorical poverty of this register, academic writing is by no means
devoid of metaphorical language, which, in turn, indicates the presence of conceptual
metaphors. In particular, New English academic writing, as represented by
the ICE components under investigation, makes use of conventional metaphors
that can be encountered in academic writing associated with more traditional
varieties of English. This is perhaps the result of the highly revised and edited
production circumstances and international reach of this register, which, taken
together, may discourage more variety-specific conceptualisations in favour of
conventional metaphors intelligible to speakers of all varieties and non-native
speakers alike. Despite this conventionality, it is nevertheless possible to point
out potentially variety-specific conceptualisations by taking a finer-grained look
at how a variety elaborates on a more general metaphor. In fact, it is perhaps on
this level of analysis that metaphorical variation across varieties can be encoun-
tered in general. In order to provide more evidence for this, research on other
domains and with other varieties is required.
From the sub-register perspective, it is possible to pinpoint the most meta-
phorical discipline for a specific domain, e.g. Humanities as most metaphorical
for the IDEAS domain. Nevertheless, if other domains were examined, it could
very well be the case that a completely different academic sub-register emerges
as the most metaphorical. Furthermore, for metaphorical variation across the
disciplines in this study of New English academic writing, at this stage it is pos-
sible to identify potential candidates for metaphorical “register features” rather
than metaphorical “register markers” due to the fact that none in the data were
exclusive to one specific academic sub-register, although a preference for certain
metaphors can be determined. This also requires more research, which would
most certainly benefit from the inclusion of other sub-registers or comparison
with metaphorical data from popular texts pertaining to the Humanities, Natural
Sciences and Social Sciences, which the ICE corpora also provide. In terms of
their functional properties, a metaphor conceptualising a certain domain may
exhibit functional features that can only be demonstrated for a particular
sub-register, like signalling a paper’s contribution to a body of research in the
Humanities. However, here again, further research can improve on the study of
metaphorical function by adhering more strictly to a “census” technique, such as
MIPVU, as well as relying on texts that do not display such a topical diversity, as
the ICE components do. Additionally, recent work on metaphorical variation and
the varieties11 exploits the advantages of using a significantly larger corpus, like
Davies’ (2013) Corpus of Global Web-Based English (GloWbE), in order to make
more extensive frequency-based claims about variety-specific domain preferences
as well as to contribute to research into web registers (cf. Biber and Egbert,
this volume) from the cross-variety perspective12. All things considered, employing
metaphor as a feature to investigate both variety-based and register variation
has the potential to provide many more insights into the nature of these highly
relevant fields of study.
11 Cf. Díaz-Vera’s (2015) study on various conceptualisations of LOVE in India, Pakistan and
Nigeria.
12 GloWbE provides an opportunity to efficiently compare 20 distinct varieties of English worldwide,
of which the bulk could be categorised as belonging to the “New Englishes”.
References
Anthony, Laurence. 2012. AntConc (Version 3.3.5) [Computer Software]. Tokyo, Japan: Waseda
University. http://www.antlab.sci.waseda.ac.jp/
Archer, Dawn, Andrew Wilson & Paul Rayson. 2002. Introduction to the USAS Category System.
http://ucrel.lancs.ac.uk/usas/usas%20guide.pdf (accessed 5 May 2011).
Berber Sardinha, Tony. 2012. An assessment of metaphor retrieval methods. In Fiona
MacArthur, José Luis Oncins-Martínez, Manuel Sánchez-García & Ana María Piquer-Píriz
(eds.), Metaphor in use: Context, culture, and communication, 21–50. Amsterdam &
Philadelphia: John Benjamins.
Berber Sardinha, Tony. 2007. Metaphor in corpora: A corpus-driven analysis of Applied
Linguistics dissertations. Rev. Brasileira de Lingüística Aplicada 7(1). 11–35.
Black, Max. 1954. Metaphor. Proceedings of the Aristotelian Society 55. 273–294.
Cameron, Lynne. 2003. Metaphor in educational discourse. London: Continuum.
Davies, Mark. 2013. Corpus of global web-based English. http://corpus.byu.edu/glowbe/.
Deignan, Alice. 2005. Metaphor and corpus linguistics. Amsterdam & Philadelphia: John
Benjamins.
Díaz-Vera, Javier E. 2015. Love in the time of corpora. Preferential conceptualizations of love in
world Englishes. In Vito Pirrelli, Claudia Marzi & Marcello Ferro (eds.), Word structure and
word usage. Proceedings of the NetWordS final conference, 161–165.
http://ceur-ws.org/Vol-1347/paper37.pdf (accessed 13 May 2015).
Drewer, Petra. 2003. Die kognitive Metapher als Werkzeug des Denkens. Zur Rolle der Analogie
bei der Gewinnung und Vermittlung wissenschaftlicher Erkenntnisse. Tübingen: Narr.
Goatly, Andrew. 1997. The language of metaphors. London & New York: Routledge.
Hardie, Andrew, Veronika Koller, Paul Rayson & Elena Semino. 2007. Exploring a semantic
annotation tool for metaphor analysis. In Matthew Davies, Paul Rayson, Susan Hunston &
Pernilla Danielsson (eds.), Proceedings of the Corpus Linguistics 2007 Conference, 1–12.
http://corpus.bham.ac.uk/corplingproceedings07/paper/49_Paper.pdf (accessed 19
August 2011).
Jäkel, Olaf. 1997. Metaphern in abstrakten Diskurs-Domänen. Eine kognitiv-linguistische
Untersuchung anhand der Bereiche Geistestätigkeit, Wirtschaft und Wissenschaft.
Frankfurt am Main: Peter Lang.
Kövecses, Zoltán. 2010. Metaphor: A practical introduction, 2nd edn. Oxford: OUP.
Krennmayr, Tina. 2011. Metaphor in newspapers. Utrecht: LOT.
Lakoff, George & Mark Johnson. 2003 [1980]. Metaphors we live by, 2nd edn. Chicago & London:
Chicago UP.
Nelson, Gerald. 1996. The design of the corpus. In Sidney Greenbaum (ed.), Comparing English
worldwide: The International Corpus of English, 27–35. Oxford: Clarendon.
Partington, Alan. 1998. Patterns and meanings: Using corpora for English language research
and teaching. Amsterdam & Philadelphia: John Benjamins.
Platt, John, Heidi Weber & Ho Mian Lian. 1984. The New Englishes. London: Routledge.
Pragglejaz Group. 2007. A practical and flexible method for identifying metaphorically-used
words in discourse. Metaphor and Symbol 22(1). 1–39.
Rayson, Paul. 2009. Wmatrix: A web-based corpus processing environment, Computing
Department, Lancaster University. http://ucrel.lancs.ac.uk/wmatrix/
Römer, Christine. 2000. Metaphern in der Wissenschaftssprache: Bildfelder der
sprachwissenschaftlichen Fachkommunikation. In Josef Bayer & Christine Römer (eds.),
Von der Philologie zur Grammatiktheorie, 353–365. Tübingen: Max Niemeyer.
Schneider, Edgar W. 2007. Postcolonial English: Varieties around the world. Cambridge: CUP.
Schneider, Edgar W. 2003. The dynamics of New Englishes: From identity construction to dialect
birth. Language 79(2). 233–281.
Semino, Elena. 2008. Metaphor in discourse. Cambridge: CUP.
Semino, Elena, Alice Deignan & Jeannette Littlemore. 2013. Metaphor, genre, and
recontextualization. Metaphor and Symbol 28(1). 41–59.
Skorczynska, Hanna & Alice Deignan. 2006. Readership and purpose in the choice of
economics metaphors. Metaphor and Symbol 21(2). 87–104.
Steen, Gerard J., Aletta G. Dorst, J. Berenike Herrmann, Anna Kaal, Tina Krennmayr & Trijntje
Pasma. 2010. A method for linguistic metaphor identification. From MIP to MIPVU.
Amsterdam & Philadelphia: John Benjamins.
Stefanowitsch, Anatol. 2006. Words and their metaphors: A corpus-based approach. In Anatol
Stefanowitsch & Stefan Th. Gries (eds.), Corpus-based approaches to metaphor and
metonymy, 63–106. Berlin & New York: Mouton de Gruyter.
Wolf, Hans-Georg & Frank Polzenhagen. 2009. World Englishes: A cognitive sociolinguistic
approach. Berlin & New York: Mouton de Gruyter.
Zichler, Csilla. 2010. Metaphern in der Wissenschaftssprache. Sprachtheorie und
germanistische Linguistik 20(1). 95–112.
Steffen Schaub
The influence of register on noun phrase complexity in varieties of English
Steffen Schaub, University of Marburg
Abstract: This study explores noun phrase (NP) complexity variation in registers
of regional varieties of English. The focus is on the description of NP complexity
in four registers (academic writing, conversation, unscripted speeches and
social letters) across five regional varieties of English (Canada, Hong Kong, India,
Jamaica, Singapore). For that, noun phrases are extracted from a register-stratified
subsample of the International Corpus of English and annotated for NP complexity
based on a four-way categorisation system: i) unmodified, ii) premodified
only, iii) postmodified only, iv) pre- and postmodified. The results corroborate the
strong influence of register on NP complexity, depending on two situational
characteristics: communicative purpose (informational vs. interactional) and mode
(written vs. spoken). Finally, it is assessed whether NP complexity is a viable
marker of regional variation in comparative varieties research.
1 Introduction
This study explores noun phrase (NP) complexity variation in registers of regional
varieties of English. There are three motivations for pursuing this particular
research topic: the lack of descriptive work on the noun phrase in varieties of
English, a growing interest in register variation in English varieties research and
awareness of the strong influence of register on NP structure. These motivations
are discussed in more detail in the following.
Descriptive work on the regional varieties of English has developed a focus
on comparison. With the emergence of comparable linguistic corpora, such as the
International Corpus of English (ICE), linguists have compared individual varieties
against a normative ‘yardstick’ (usually British English) or against each other.
Most of the attention has been devoted to phonology, lexis and morphosyntax.
Interest in the latter was mainly guided by investigations of ‘non-standard’ features,
i.e. features reported to occur in Englishes around the world that do not
occur in the norm-providing standard varieties. The task is to re-evaluate early
feature reports based on anecdotal observation (e.g. Platt, Weber and Ho 1984)
and to confirm their validity using empirical means. With regard to the noun
phrase across regional varieties of English, three ‘non-standard’ features are
frequently mentioned in surveys and grammatical descriptions: noun pluralisa-
tion (Platt, Weber and Ho 1984; Ahulu 1998; Hall, Schmidtke and Vickers 2013),
use of the article system (Sand 2004; Lamidi 2007; Wahid 2013; Sand forthc.),
and subject-verb concord (Asante 1995; Ahulu 1998; Blair and Collins 2001; Sand
forthc.). Other, less frequently reported phenomena include variation in the
pronoun system (Lamidi 2007; Kortmann and Lunkenheimer 2013), the expres-
sion of possession (Kortmann and Lunkenheimer 2013) and adjective comparison
(Kortmann and Lunkenheimer 2013).
More recently, interest in the noun phrase across varieties of English has
moved beyond the investigation of isolated morphosyntactic features. Brunner
(2014) introduces NP modification patterns as a marker of regional variation
across varieties of English. He compares NP structures in British, Kenyan and
Singapore English and finds that “[i]n Singapore English, premodified NPs are
significantly overrepresented [while] in Kenyan English, postmodifiers are more
frequent than premodifiers” (Brunner 2014: 44). He attributes these preferences
to contact influence from the indigenous languages of the respective areas, based
on their typological profiles (head-final vs. head-initial word order). These find-
ings are drawn from the register of spontaneous spoken conversation, which is
“arguably the least stylized and can therefore be expected to be susceptible to
contact-induced language change” (Brunner 2014: 30). In order to substantiate
the claim that preferences in NP modification are the result of language contact,
it is necessary to study more registers to see if these tendencies can be confirmed.
The notion of ‘register’ is a relatively recent addition to research into vari-
eties of English. Register is defined here, in accordance with Biber and Conrad
(2009: 6), as “a variety associated with a particular situation of use (including
particular communicative purposes)”. So far, English varieties have mainly been
handled as homogeneous entities conveniently defined by the borders of political
nation-states rather than linguistic criteria, but this is not due to a lack of aware-
ness. Already in early reports we find observations that take register variation
into consideration. Platt, Weber and Ho (1984: 49), for instance, frequently dif-
ferentiate between written and spoken as well as formal and colloquial language
when discussing individual features, e.g.: “It is common in some New Englishes
to mark the plural of the noun more often in writing and in more formal speech.
There would be less marking in colloquial speech”. Nevertheless, for much of
World Englishes research, the nation-state variety remained the preferred level of
comparison. Macro-scale projects such as the Electronic World Atlas of Varieties
of English (Kortmann and Lunkenheimer 2013) show that demarcating varieties
even at this general level produces a large number of distinct entities.1
An essential component of register research is the analysis of grammatical
features and their function in particular registers (see Schubert, this volume).
With the emergence of comparable, computer-readable corpora in the 1990s,
which are also subdivided into genres, it is possible to move beyond anecdotal
observation and to verify hypotheses about register variation systematically. For
instance, Sand (2004) compares article use across varieties and concludes that
“differences across text types are observable and genre differences within one
variety are practically always more pronounced than overall variation across vari-
eties” (Sand 2004: 294–295). A growing number of studies extend Biber’s (1988)
multidimensional approach to register variation to the study of regional varie-
ties of English (see, for instance, Balasubramanian 2009; Xiao 2009; Neumann
2012; Neumann and Fest, this volume). Balasubramanian (2009: 4) specifically
addresses register variation in Indian English, arguing that
just as traditionally recognized ‘native’ varieties of English are recognized for the variation
within them, so too, should the emerging new varieties. The ‘native’ varieties of English are
recognized for the differences within them stemming from region, social status, and reason
for use or register […] to name just a few variables. […] Any study of a new variety of English,
then, should focus on identifying the variation within it, (and not just on describing a set
of features that characterize the national variety), and provide detailed descriptions of the
national variety […].
Xiao (2009) explores variation across twelve registers and five varieties using the
multidimensional analysis (MDA) approach developed by Biber (1988). The study
encompasses 141 grammatical and semantic features. Xiao concludes that “var-
iations in language use involve regional varieties as well as variants in different
registers and along different dimensions” (Xiao 2009: 447). In sum, register dif-
ferences are increasingly addressed in English varieties research, and it becomes
clear that the influence of register on the overall structural variation of regional
English varieties must be taken into account.
1 The eWAVE database covers 76 mostly national varieties of English, including, however, a
number of localised dialectal varieties, for instance East Anglian English or Appalachian English
(Kortmann and Lunkenheimer 2013).
The connection between register and NP complexity has been demonstrated
repeatedly. Aarts (1971) analyses NP complexity across four different text types
and concludes that NP complexity correlates with syntactic function: While the
subject slot prefers ‘light’ noun phrases, the object slot prefers ‘heavy’ ones. In
addition, Aarts found a tendency for heavy noun phrases to be much less frequent
in spoken than in written texts. The latter point is taken up by de Haan
(1993), who confirms Aarts’ (1971) hunch about the relation between NP complex-
ity and text type. De Haan (1993) further investigates the combined influence of
text type and syntactic function on NP complexity, and finds that, in some cases,
the two reinforce each other, while in other cases they cancel each other out.
Halliday (1989) argues that spoken language is no less complex than written lan-
guage, but that the complexity is located differently. While spoken language has
a more elaborate clausal structure, in written language, the complexity lies in
the constituents below the clausal level, foremost in what he calls the nominal
group. Nominals, in writing, carry “the meat of the message” (Halliday 1989: 72).
Schäpers (2009), using a corpus of spoken and written British English, confirms
that “[n]oun phrases are more complex in written language with regard to pre-
modification, postmodification, and both pre- and postmodification” (2009: 153).
On the level of registers, Biber et al. (1999) find that almost 60 % of noun phrases
in academic prose have a modifier, while only 15 % of noun phrases in conversa-
tion are modified (Biber et al. 1999: 578). In general, academic prose is character-
ised by a more frequent use of nouns than conversation (Biber and Conrad 2009:
116–117). The linguistic differences between these two registers, Biber and Conrad
argue, can be explained on the basis of their different situational characteristics:
while the purpose of conversation is to develop personal relationships, academic
prose focuses on communicating information (Biber and Conrad 2009: 109). To
sum up, the strong connection between NP complexity and register has been con-
firmed in various studies of British and American English.
The present study combines the three interconnected research interests
outlined above. NP complexity is systematically compared across five varieties
of English (Canadian English, Indian English, Jamaican English, Hong Kong
English and Singapore English) and four registers (academic writing, conversa-
tion, unscripted speeches and social letters). The regional varieties reflect diverse
sociocultural and linguistic backgrounds. The registers were selected as counter-
parts based on two situational characteristics, namely mode (spoken vs. written)
and communicative purpose (information vs. interaction).2
2 Although the texts are meant to represent the extremes of these two situational characteristics,
a strict line cannot be drawn. For example, social letters may also be used to inform, for instance
in work-related exchange between colleagues. Likewise, unscripted speeches contain interac-
tional elements, as will be evident from the discussion of personal pronouns below.
Table 1: Situational characteristics of registers in sample
Mode / Communicative purpose    informational          interactional
written                         academic writing       social letters
spoken                          unscripted speeches    conversation
Based on the discussion above, a number of tentative hypotheses can be formulated.
First, it is expected that register exerts a strong influence on NP complexity.
Matched with the two situational characteristics mode and communicative
purpose, NP complexity is likely to increase a) from interactional to informational
texts, and b) from spoken to written texts. For our four registers, this yields the
following:
– Academic writing is expected to show the highest frequency of complex
noun phrases. This is mainly due to the informational character, the high
level of formality and the careful planning and revision during the produc-
tion process.
– Conversation is a highly interactive face-to-face exchange between two or
more parties. Due to these situational characteristics, a higher frequency of
pronouns, particularly personal pronouns, is expected. Furthermore, conver-
sation is expected to contain the lowest frequency of complex noun phrases
of all four registers, both due to mode and communicative purpose.
– Unscripted speeches are expected to show a higher degree of NP complexity
than conversation. This is due to the formal and informational character of
unscripted speeches. However, complexity is expected to be lower than in
academic writing because of the spoken mode.
– Social letters are expected to contain more complex noun phrases than con-
versation because they are written and are planned and possibly revised
during production. The level of NP complexity, however, is expected to be
lower than in academic writing, because the communicative purpose of
social letters is to interact.
A second motivation of the present study is to further explore the potential of NP
complexity as a marker of regional variation, especially in the light of a register-sensitive
comparison (see the discussion in Section 4). Due to the exploratory
nature of the study, the results are not tested for statistical significance.
2 Methodology
The present section describes the data and the annotation process used in the
following analysis. Section 2.1 discusses various categorisation systems used to
mark NP complexity and introduces the system used in the analysis to follow.
Section 2.2 describes the corpus data and the annotation process.
2.1 Categorising NP complexity
There are a number of methods for categorising NP complexity. The simplest is a
binary distinction into ‘simple’ and ‘complex’ noun phrases, although the line is
drawn differently by different authors. The most common understanding of this
two-way distinction distinguishes between the presence and the absence of mod-
ification; in other words, all pre- and/or postmodified noun phrases are ‘com-
plex’,3 while the remaining are ‘simple’. Some authors (de Haan 1993; Biber et al.
1999: 573–655) distinguish four classes of complexity (unmodified, premodified,
postmodified, pre- and postmodified), with determination being optional for
all four types. A more elaborate system is used in Jucker (1992: 259–260), whose
annotation scheme not only specifies the type of head noun and modification(s),
but also records the structural depth of the noun phrase, i.e. the degree of embed-
ding in the modification.
The present study makes use of the categorisation system developed in de
Haan (1993), which is also used in Biber et al. (1999: 573–655). It distinguishes four
classes of NP complexity: class 1 comprises all noun phrases that lack modifica-
tion, including pronouns, proper nouns, as well as unmodified common nouns.
In the analysis to follow, class 1 is further subclassified: personal pronouns have
been identified as a word class that is highly sensitive to register, so that a finer
distinction of class 1 into personal pronouns on the one hand and other types
of NP heads on the other is desirable. Class 2 includes all noun phrases that are
premodified only. Class 3 includes all noun phrases that are postmodified only.
Finally, class 4 includes all noun phrases that are both pre- and postmodified.
As a slight modification to de Haan (1993) and Biber et al. (1999), class 4 here
also includes multi-head coordinated constructions, e.g. the men and women. All
four classes optionally contain determination. Although four classes are distinguished,
the discussions below make occasional reference to the binary simple–complex
distinction referred to above. The former is identical with class 1, while
the latter comprises classes 2–4. The system is summarised in Table 2.
3 In the present paper, determination is not treated as modification.
Table 2: Categorisation system for NP complexity (based on de Haan 1993); the (+) symbol
indicates possible multiple instances
Simple NPs    Class 1   (DET)   –         HEAD      –
Complex NPs   Class 2   (DET)   PREM(+)   HEAD      –
              Class 3   (DET)   –         HEAD      POSTM(+)
              Class 4   (DET)   PREM(+)   HEAD(+)   POSTM(+)
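The decision logic behind Table 2 can be made explicit in a few lines of code. The following is only a minimal sketch: the record type `NounPhrase` and its fields are hypothetical names introduced for illustration and are not part of the annotation actually used, which was carried out manually. Determiners are deliberately not represented, since determination is not treated as modification.

```python
from dataclasses import dataclass


@dataclass
class NounPhrase:
    """Hypothetical record for one annotated top-level noun phrase."""
    premodifiers: int = 0   # number of premodifying elements (PREM)
    postmodifiers: int = 0  # number of postmodifying elements (POSTM)
    heads: int = 1          # more than 1 for coordinated multi-head NPs


def complexity_class(np_: NounPhrase) -> int:
    """Return the complexity class (1-4) following the scheme in Table 2."""
    if np_.heads > 1:
        return 4  # coordinated multi-head NPs are counted as class 4 here
    if np_.premodifiers and np_.postmodifiers:
        return 4  # both pre- and postmodified
    if np_.postmodifiers:
        return 3  # postmodified only
    if np_.premodifiers:
        return 2  # premodified only
    return 1      # unmodified: pronouns, proper nouns, bare common nouns


def is_complex(np_: NounPhrase) -> bool:
    """Binary simple/complex distinction: classes 2-4 count as complex."""
    return complexity_class(np_) >= 2
```

Under these assumptions, a phrase such as the men and women would be encoded with `heads=2` and fall into class 4, while an NP consisting of a determiner and a head only remains in class 1.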
2.2 Corpus and annotation
The analysis to follow in Section 3 is based on a sample of 8,000 noun phrases
taken from five components of the International Corpus of English: Canadian
English (CAN), Indian English (IND), Jamaican English (JA), Hong Kong English
(HK) and Singapore English (SIN). The varieties were selected in order to repre-
sent both traditional and ‘new’ Englishes, while at the same time covering differ-
ent regions of the world. For each variety, texts from four registers were included:
academic writing (from the sub-register ‘humanities’), conversation, social
letters and unscripted speeches. For each register in each variety, three text units
of 2,000 words each were selected at random. The resulting sub-corpus is a selection of
60 text units stratified across four registers and five varieties, totalling approxi-
mately 120,000 words.
In the following, I will describe the annotation process in more detail. First,
the noun phrases are marked in the raw data using a simple bracket-and-label
system. Only top-level noun phrases are marked; in other words, noun phrases
that are embedded in larger noun phrases are not marked separately. As an
illustration of the marking system, consider the sample sentence in (1a) and its
marked version in (1b). Note how the embedded NP the line in the larger NP the
other end of the line is not marked individually.
(1a) Was a pleasant surprise to hear your voice again from the other end of the line.
(1b) Was [NP a pleasant surprise] to hear [NP your voice] again from [NP the other end of
the line].
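The bracket-and-label format in (1b) lends itself to simple automatic extraction. The sketch below is an illustration only and assumes that top-level noun phrases are marked exactly as `[NP ...]` without nested brackets, as in the example; the function names are hypothetical. It also shows how length in orthographic words can be counted by splitting on whitespace.

```python
import re
from typing import List

# Matches one top-level "[NP ...]" span; nested brackets are assumed not to occur.
NP_PATTERN = re.compile(r"\[NP\s+([^\[\]]+)\]")


def extract_top_level_nps(marked: str) -> List[str]:
    """Return the text of every marked noun phrase, with whitespace normalised."""
    return [" ".join(match.split()) for match in NP_PATTERN.findall(marked)]


def length_in_words(np_text: str) -> int:
    """Length of a noun phrase in orthographic (whitespace-separated) words."""
    return len(np_text.split())


marked_sentence = (
    "Was [NP a pleasant surprise] to hear [NP your voice] again "
    "from [NP the other end of the line]."
)
noun_phrases = extract_top_level_nps(marked_sentence)
lengths = [length_in_words(np_text) for np_text in noun_phrases]
```

Because the embedded NP the line carries no brackets of its own, it is counted only as part of the larger phrase, in keeping with the marking convention described above.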
Randomisation was introduced at two steps in the annotation process. First, as
stated above, for each register–variety combination, three textual units were
picked at random. In these textual units, all noun phrases were marked. Second,
a sample of 400 NPs for each register–variety combination was extracted randomly,
adding up to a total of 8,000 NPs. In the second step, the extracted noun
phrases were annotated in a spreadsheet: the annotation includes the variables
complexity, based on the four-way categorisation system outlined in Section 2.1,
as well as variety, register and length (in orthographic words).
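The two randomisation steps can be summarised as a stratified sampling routine. The following is a minimal sketch under stated assumptions: the input dictionaries `text_units` and `marked_nps` and the function `draw_sample` are hypothetical, and the study does not report which tool was used for random selection.

```python
import random

VARIETIES = ["CAN", "HK", "IND", "JA", "SIN"]
REGISTERS = ["academic writing", "conversation", "social letters", "unscripted speeches"]
UNITS_PER_CELL = 3   # text units drawn per register-variety combination
NPS_PER_CELL = 400   # noun phrases drawn per register-variety combination


def draw_sample(text_units, marked_nps, rng):
    """Two-step random sample.

    text_units: dict mapping (variety, register) to a list of text unit ids
    marked_nps: dict mapping a text unit id to the list of NPs marked in it
    rng:        a random.Random instance (seeded for reproducibility)
    """
    sample = {}
    for variety in VARIETIES:
        for register in REGISTERS:
            cell = (variety, register)
            # Step 1: pick three text units at random for this combination.
            units = rng.sample(text_units[cell], UNITS_PER_CELL)
            # All NPs marked in the selected units form the sampling pool.
            pool = [np_ for unit in units for np_ in marked_nps[unit]]
            # Step 2: extract 400 NPs at random from the pool.
            sample[cell] = rng.sample(pool, NPS_PER_CELL)
    return sample
```

With five varieties and four registers, the routine yields 20 cells of 400 noun phrases each, i.e. the 8,000 NPs of the sample.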
3 Results
Table 3 shows the frequencies of the four complexity classes across the four reg-
isters for all five varieties combined. In general, simple NPs without modification
(class 1) are most frequent overall (5,084 tokens or 64 %). Complex NPs (classes
2 to 4) are considerably less frequent: NPs with premodification only (13 %) and postmodification
only (14 %) are about equally frequent, while NPs with both pre- and
postmodification are the least frequent class (9 %).
Table 3: NP complexity across registers (class 1 = unmodified NPs incl. pronouns; class 2 =
premodified NPs; class 3 = postmodified NPs; class 4 = pre- and postmodified NPs and
coordinated multi-head NPs)
          Conversation      Unscripted speeches   Social letters    Academic writing   Total
Class 1   1,559 (77.95 %)   1,291 (64.55 %)       1,402 (70.10 %)   832 (41.60 %)      5,084 (63.55 %)
Class 2   209 (10.45 %)     249 (12.45 %)         249 (12.45 %)     334 (16.70 %)      1,041 (13.01 %)
Class 3   159 (7.95 %)      300 (15.00 %)         209 (10.45 %)     466 (23.30 %)      1,134 (14.18 %)
Class 4   73 (3.65 %)       160 (8.00 %)          140 (7.00 %)      368 (18.40 %)      741 (9.26 %)
Total     2,000 (100 %)     2,000 (100 %)         2,000 (100 %)     2,000 (100 %)      8,000 (100 %)
The frequencies of the four classes vary with regard to register: simple NPs (class
1) are frequent in conversation (78 %), unscripted speeches (65 %) and social
letters (70 %), but relatively infrequent in academic writing (42 %). Analogously,
complex NPs (classes 2–4) are relatively infrequent in conversation (22 %) and
highly frequent in academic writing (58 %). Taking into consideration the two
situational characteristics of the registers as defined in the introduction (mode
and communicative purpose), NP complexity increases from spoken to written
mode: social letters have a higher mean NP complexity4 than conversation (1.54
compared to 1.37), and academic writing has a higher mean NP complexity than
unscripted speeches (2.18 compared to 1.66). In addition, NP complexity increases
from interactional to informational communicative purpose: unscripted speeches
have a higher mean NP complexity than conversation (1.66 compared to 1.37),
while academic writing has a higher mean NP complexity than social letters (2.18
compared to 1.54).
Table 4: NP complexity across varieties
          CAN     HK      IND     JA      SIN     Total
Class 1   1,034   989     1,001   997     1,063   5,084
Class 2   193     251     216     173     208     1,041
Class 3   210     219     227     267     211     1,134
Class 4   163     141     156     163     118     741
Total     1,600   1,600   1,600   1,600   1,600   8,000
Table 4 shows the distribution of complexity classes across the varieties for all
registers combined. Class 1 is the most frequent and class 4 is the least frequent
in all varieties (with a relatively low value in Singapore English). Looking at
classes 2 and 3, the frequencies are differently balanced across varieties: while
most varieties have a higher frequency of class 3, Hong Kong English shows a
tendency towards class 2. Furthermore, the frequencies of classes 2 and 3 are rel-
atively balanced in some varieties (Indian English, Canadian English, Singapore
English), while in others there is greater divergence (Jamaican English, Hong
Kong English).
Both Table 3 and Table 4 provide a general overview of NP complexity distri-
butions across register and variety. They allow the formulation of first tentative
conclusions, such as variety-specific tendencies towards particular classes (e.g.
pre- or postmodified NPs). As a second step, it is necessary to look at the distribu-
tion of NP classes across both varieties and registers simultaneously.
4 Mean NP complexity is defined here as a numeric value ranging from 1.0 to 4.0. It is the sum of
complexity values of n noun phrases divided by n. The higher the mean value, the more frequent-
ly we find ‘complex’ noun phrases, i.e. classes 2–4.
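As a worked illustration of this definition, the mean values cited in the text can be recomputed from the class frequencies in Table 3. The sketch below is not the original calculation procedure (the annotation was kept in a spreadsheet); the function names are chosen for illustration only.

```python
# Class frequencies per register, copied from Table 3.
TABLE_3 = {
    "conversation":        {1: 1559, 2: 209, 3: 159, 4: 73},
    "unscripted speeches": {1: 1291, 2: 249, 3: 300, 4: 160},
    "social letters":      {1: 1402, 2: 249, 3: 209, 4: 140},
    "academic writing":    {1: 832,  2: 334, 3: 466, 4: 368},
}


def mean_complexity(counts):
    """Sum of the complexity values of n noun phrases, divided by n."""
    n = sum(counts.values())
    return sum(cls * freq for cls, freq in counts.items()) / n


def complex_share(counts):
    """Proportion of complex NPs (classes 2-4) among all NPs."""
    n = sum(counts.values())
    return (n - counts[1]) / n


for register, counts in TABLE_3.items():
    print(f"{register:20s} mean = {mean_complexity(counts):.3f}  "
          f"complex = {complex_share(counts):.3f}")
```

For conversation, for instance, the weighted sum is 1,559 + 418 + 477 + 292 = 2,746, which divided by 2,000 gives the value of roughly 1.37 reported above.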
Table 5: NP complexity in conversation across all varieties
          CAN   HK    IND   JA    SIN
Class 1   314   298   303   312   332
Class 2   36    59    43    27    44
Class 3   33    31    34    47    14
Class 4   17    12    20    14    10
Table 6: NP complexity in unscripted speeches across all varieties
          CAN   HK    IND   JA    SIN
Class 1   286   234   263   233   275
Class 2   33    78    41    45    52
Class 3   56    51    67    82    44
Class 4   25    37    29    40    29
Table 7: NP complexity in social letters across all varieties
          CAN   HK    IND   JA    SIN
Class 1   275   293   258   301   275
Class 2   55    47    54    36    57
Class 3   31    41    48    37    52
Class 4   39    19    40    26    16
Table 8: NP complexity in academic writing across all varieties
          CAN   HK    IND   JA    SIN
Class 1   159   164   177   151   181
Class 2   69    67    78    65    55
Class 3   90    96    78    101   101
Class 4   82    73    67    83    63
Tables 5 to 8 show the distribution of the complexity classes for each individual
register across all varieties. In the following sections, the registers are discussed
separately.
3.1 Academic Writing
Academic writing yields the highest frequency of complex NPs across all classes
(2–4). This is expected, as academic writing is characterised by dense informa-
tion packaging (due to its informational communicative purpose) and carefully
planned and revised production, both of which facilitate the use of complex NPs.
In academic writing, NPs contain elaborate pre- and postmodification, and they
typically contain the majority of lexical content of a sentence. Examples (2) to (6)
illustrate typical uses of noun phrases in academic writing (NPs are emphasised
in bold).
(2) The left side of Ayearst’s diptych reproduces in painstaking detail, and with close
attention to seventeenth-century techniques of glazing, Rembrandt’s frag-
mentary Anatomy Lesson of Dr Joan Deijman of 1656, now in the Rijksmuseum,
Amsterdam. (ICE-CAN:W2A-001#10:1)
(3) The integration of these two perspectives can form a more comprehensive picture
of the person of Jesus Christ. (ICE-HK:W2A-005#14:1)
(4) The whole misunderstanding about Hume’s philosophical position is the
outcome of his treatment of causation that is often misunderstood.
(ICE-IND:W2A-001#58:1)
(5) The casual centrality of the ‘supernatural’ in Brodber’s fiction is also an excellent
example of the writer’s adaptation of marginalised thematic concepts from the
oral tradition which she legitimises in the very process of ‘writing them up’. (ICE-
JA:W2A-005#X14:1)
(6) Though Wittgenstein was mainly concerned with the problem of philosophical
explanation, his writings on the relation between language and thought and
language and meaning have tremendous implications for both the theory and
practice of linguistic science. (ICE-SIN:W2A-005#48:1)
Analogously, academic writing has the lowest frequency of class-1 (or ‘simple’)
NPs in our sample (832 tokens or 41.6 %). The relatively low frequency of unmodi-
fied noun phrases can likewise be accounted for by the informational character of
the register: unmodified noun phrases carry less information than modified ones.
Personal pronouns are particularly uncommon: only 225 tokens (11 % of all NPs
in academic writing) are realised by personal pronouns, the most frequent being
it (61 tokens) and I (33 tokens). 1st and 2nd person pronouns are rare, which can
be attributed to the fact that interaction in academic texts is uncommon. The 2nd
person pronoun you is particularly rare, since no specific addressee is involved.
With regard to regional variation, I find that academic writing is largely
homogeneous across varieties. Few differences appear to exist with regard to
pronouns, although two exceptions are worth a brief discussion here. The first
person singular pronoun I occurs more frequently in some varieties (Hong Kong
English: 15; Jamaican English: 10) than in others (Canadian English: 2; Indian
English: 4; Singapore English: 2). However, it would be premature to attribute
a more personal writing style to the Hong Kong and Jamaican English varieties
based on such low absolute frequencies. Secondly, looking at the frequencies
of you, it is noteworthy that the sample contains six occurrences in Singapore
English, while the remaining varieties have zero occurrences. A closer look at
the data reveals that all occurrences of you in Singapore English originate from
one text unit, which is not an academic text in the traditional sense, but instead
could best be described as a guide to real estate investment in Singapore. This
text unit is characterised by a much more interactive style of writing; it frequently
addresses the reader directly and makes use of imperatives, e.g. Take advantage
of this law (ICE-SIN:W2A-001#48:1), or Invest your CPF savings in property (ICE-
SIN:W2A-001#49:1). Whether such a text constitutes an instance of academic
writing, much less in the humanities, is debatable. Nevertheless, the text could
be clearly distinguished from other texts of the same register on the basis of one
grammatical feature.
There are slight indications of regional variation in the distribution of the
complex NP classes, for instance the relative overuse of class 2 and underuse of
class 3 in Indian English. Overall, however, there appears to be little variation
in academic writing across varieties. This can be interpreted in two ways. First,
there may be no discernible difference between regional varieties for this register. An
argument in favour of this interpretation would be that the homogeneity of the
register, and by extension its conformity on an international level, is guaranteed
by the publication process. A second interpretation is that the level of abstrac-
tion in categorising NP complexity, as it is used in this analysis, is too superficial
to bring to light any discernible differences; in other words, although there may
be no differences across regional varieties on the superficial level of abstraction
assumed here, significant distributional differences might be observed when, for
instance, specifying the types of modification involved. At this point, however,
we have to conclude that we cannot find regional variation with regard to NP
complexity in academic writing.
3.2 Conversation
Conversation has the highest frequency of simple noun phrases of all registers in
the study (78 %). This is in line with Biber et al., who find that ca. 85 % of all NPs
in their conversation data have no modifier (Biber et al. 1999: 578). Of the class-1
NPs in conversation, more than half are personal pronouns (857 tokens, or 55 %).
This also confirms Biber et al.’s finding that “pronouns are slightly more common
than nouns in conversation” (Biber et al. 1999: 235). The relatively frequent reli-
ance on pronouns is due to the “shared situation and personal involvement of the
participants” (Biber et al. 1999: 235).
Class-2 NPs are the most common type of modified noun phrase in conversa-
tion. They account for 10 % of the NPs. With regard to premodification, Biber et
al. find that the vast majority of premodification sequences in noun phrases does
not exceed two words (Biber et al. 1999: 597). This is confirmed in the present
analysis: the average length (in orthographic words) of class-2 NPs in conversa-
tion is 3.2 (including head and any determiners). This means that premodification
amounts to 1–2 words on average. The most common type of premodification is
by adjective or noun, optionally including a determiner, as the examples below
illustrate.
(7) Uhm because David does say that hiking boots make an enormous difference not
slide on anything (ICE-CAN:S1A-001#3:1:A)
(8) Sometimes uhm the people uh sorry people of India they are they belong to different
communities and they have their separate cultures (ICE-IND:S1A-005#62:1:B)
Longer class-2 NPs (>3 words) are uncommon and usually the result of correction
or coordination, as can be seen in examples (9) and (10) below. Proper cases of
multiple premodification, as in examples (11) and (12), are rare. This is because
the real-time analysis of longer premodification sequences places a heavy cogni-
tive burden on the listener, rendering spoken communication ineffective.5
(9) I know because I I can’t talk to an answering machine telephone answering
machine <unc> three-words </unc> (ICE-HK:S1A-009#4:1:D)
(10) nine hundred but on average about four hundred five hundred dollars both lah the
reception and the sanctuary (ICE-SIN:S1A-001#33:1:A)
(11) A very bright cheerful smiling face (ICE-IND:S1A-001#108:1:A)
(12) We are entirely functional loving human beings (ICE-CAN:S1A-009#54:1:B)
Postmodified NPs (class 3) are relatively uncommon in conversation (8 %). Postmodification
tends to be longer than premodification. The mean length of class-3 NPs
is 7.1 words (as compared to 3.2 for class-2 NPs). This value is relativised to some
extent when looking at the median, which is 5. Subtracting head and optional
determiner, this means that the length of postmodification averages between 3 and 4
words. The slightly higher mean value (7.1) is caused by rare instances of complex
postmodification, as in examples (13) and (14).
5 See Quirk et al. (1985: 1039): “Considerable left-branching is possible in the noun phrase, […]
although comprehension becomes more difficult as the complexity of left-branching increases”.
(13) Uhh I remember my friend Mendela that beautiful millionaire meatpacker from
Saskatoon who was so nice to me when I was a young man […] (ICE-CAN:S1A-
009#85:1:A)
(14) Naturally if Mitterand President Mitterand [sic] can run his government for a period
of ten years uh why India cannot have a government consisting of some <uh> party
<uh> national party national party representing the national capital or some pro-
gressive elements <uh> <uh> in some some political parties like Congress-I Con-
gress-S or even Janata Dal with some <uh> radical members belonging to <uh>
communist party or socialist party (ICE-IND:S1A-005#19:1:A)
Finally, class-4 NPs are extremely rare in conversation, accounting for only 4 %
of all noun phrases in the data. The most frequent type is a combination of a one-
word (nominal or adjectival) premodification plus postmodification by a short
prepositional phrase (usually with of), as the following examples illustrate:
(15) But what is after the road No the other side of the road (ICE-SIN:S1A-001#88:1:B)
(16) I said I behave as if this might be the last day of my life […] (ICE-CAN:S1A-009#88:1:A)
(17) […] and you would have seen a different spin to the thing (ICE-JA:S1A-009#X67:1:A)
Orthographically longer class-4 NPs are often the result of multiple coordina-
tion or performance phenomena, including repetitions, repairs and hesitations.
Example (18) is a coordinated list of postmodified NPs, which contains several
repairs and repetitions as well as a hesitation marker (uh).
(18) Political exchange <uh> tourist exchange tourist exchange or scholars exchange
of scholars or exchange of technocrats (ICE-IND:S1A-005#37:1:A)6
Comparing the frequencies of the conversation data across varieties, we observe
distributional differences, which are mainly the result of individual varieties over-
or underusing certain complexity classes. We can pinpoint a) a relative overuse of
class-2 NPs in Hong Kong English, b) an underuse of class 2 in Jamaican English,
c) an overuse of class 3 in Jamaican English, and d) an underuse of class 3 in Sin-
gapore English. Looking at the data, however, it is difficult to identify a pattern
which explains the over- or underuse (see discussion in Section 4).
6 The example in (18) is assigned the complexity value 4, as it is a coordinated (multi-head) con-
struction (see Section 2.1). A ‘cleaned-up’ version of the noun phrase could be political exchange,
tourist exchange or exchange of scholars or exchange of technocrats.
3.3 Unscripted speeches
Unscripted speeches are characterised by their spoken mode, a spontaneous,
conversation-like production situation and the informational and/or persuasive
communicative purpose. With regard to NP complexity, unscripted speeches rank
between conversation and academic writing. On the one hand, the register’s informational
communicative purpose favours high NP complexity; on the other hand, its unscripted,
spoken production favours low NP complexity. The result is an intermediate level
of NP complexity with slightly higher frequencies in the three complex noun
phrase types, as compared to conversation.
Unscripted speeches have the third-highest frequency of class-1 NPs in the
sample (1,291 tokens or 65 %). Personal pronouns constitute about half of the
class-1 NPs (683 tokens or 53 %). The most frequent personal pronouns are I (162),
you (126) and it (109). The reliance on personal pronouns can be related to the
setting, since speeches usually take place in public in front of an audience and
speakers use personal pronouns to create an impression of interaction between
themselves and the audience. Furthermore, speeches frequently have the purpose
of persuading the audience, which is facilitated by direct references, such as I
and you. Examples (19) and (20) illustrate the kind of direct addressing typically
found in speeches.
(19) Okay don’t think that they’re going to give you time okay after your job interview Don’t
think they’re going to take care of you in a very big way okay (ICE-CAN:S2A-021#29–
30:1:A)
(20) You have to vote more opposition strong opposition not only to establish opposition in
parliament Make opposition part of our political culture not only that but also an effec-
tive an effective hammer over the head of PAP If you don’t do that what will happen
You can bet your last dollar after this election prices will sure to go up (ICE-SIN:S2A-
021#34–37:1:A)
With regard to complex noun phrases, unscripted speeches have the second-high-
est overall frequency in the sample (35 %). This is due to the informational com-
municative purpose of speeches, which necessitates the use of modified noun
phrases to convey information. The overall level of NP complexity is higher in
speeches than in social letters, despite the latter being written. In direct comparison,
unscripted speeches and social letters make equally frequent use of premodification,
while in classes 3 and 4, unscripted speeches surpass social letters. As
in conversation, the tendency for a stronger reliance on postmodification instead
of premodification in unscripted speeches can be explained on the basis of easier
comprehensibility of right-branching (see Quirk et al. 1985: 1039).
Comparing the results across varieties, the following observations are note-
worthy: assuming an even distribution, the frequency of premodified NPs (class
2) is relatively low in Canadian English (33 tokens) and high in Hong Kong English
(78 tokens). Furthermore, postmodified NPs (class 3) are relatively frequent in
Jamaican English (82 tokens), but infrequent in Singapore English (44).
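The notion of relative over- and underuse under the assumption of an even distribution can be illustrated with the figures from Table 6. The sketch below is one possible operationalisation and is labelled as an assumption: it divides each class total equally among the five varieties and expresses the observed frequency as a proportional deviation from that expected value. The function name is hypothetical, and no threshold for what counts as noteworthy is implied.

```python
# Frequencies of class-2 and class-3 NPs in unscripted speeches (Table 6).
TABLE_6 = {
    2: {"CAN": 33, "HK": 78, "IND": 41, "JA": 45, "SIN": 52},
    3: {"CAN": 56, "HK": 51, "IND": 67, "JA": 82, "SIN": 44},
}


def relative_deviation(row):
    """Deviation of each variety from an even split of the class total.

    Positive values indicate relative overuse, negative values underuse.
    """
    expected = sum(row.values()) / len(row)
    return {variety: (observed - expected) / expected
            for variety, observed in row.items()}


for cls, row in TABLE_6.items():
    deviations = relative_deviation(row)
    for variety, value in sorted(deviations.items(), key=lambda item: item[1]):
        print(f"class {cls}  {variety:4s} {value:+.2f}")
```

For class 3, the expected value is 300 / 5 = 60 tokens per variety, so that the 82 tokens in Jamaican English lie well above and the 44 tokens in Singapore English well below the even split.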
3.4 Social letters
Class-1 NPs are by far the most frequent noun phrase class in social letters, constituting
between 65 % and 75 % of all NPs in each 400-NP variety sample. Personal
pronouns form the majority of class-1 NPs (ranging from 52 % to 61 % across
varieties). This can be attributed to the interactional character of social letters,
which mainly rests on the frequent use of I and you.
The frequencies of class-2 and class-3 NPs are relatively balanced, with a slight
preference for class 2. Class 4 is the least frequent noun phrase type in this reg-
ister across all varieties, with the exception of Canadian English. Constructions
in this category show a range of variation. A typical kind of class-4 construction
is the multi-head NP coordinated with and or or. Class-4 NPs which are not coordinated
are often nouns premodified by one adjective or noun and postmodified by
a prepositional phrase, as in the examples (21) to (23). Complex noun phrases in
social letters are very similar to those found in conversation and form a contrast
to the lexically heavy class-4 NPs found in academic writing.
(21) I hope that I will be able to come to Kolhapur in the first week of Jan.
(ICE-IND:W1B-002#47:1)
(22) My point is that if one can love the other person without calculate what one can get back
from the relationship, this will be the greatest love of all. (ICE-HK:W1B-001#144:5)
(23) The team is still waiting for a final reply from the administration of this university
but I’m not optimistic. (ICE-SIN:W1B-001#148:2)
More complex examples are rare in social letters. Long, heavily modified noun
phrases clearly originate from letters with an academic background, as example
(24) illustrates.
(24) I would need a formal invitation from you for collaboration with specific refer-
ence to the project & [sic] that it would not involve financial liabilities for the
University. (ICE-IND:W1B-005#7:1)
In general, the register category of ‘social letters’ in ICE contains heterogeneous
content, with some letters discussing everyday activities (e.g. basketball
practice, reports from an exchange year) and others clearly coming from an aca-
demic context (correspondence between students and professors). NP complex-
ity is higher in the latter. It remains debatable whether one text category should
include both subtypes.
Comparing NP complexity across varieties, there is relative underuse of pre-
modified NPs in Jamaican English, overuse of postmodified NPs in Singapore
English, and overuse of pre- and postmodified NPs in Canadian English and
Indian English.
4 NP complexity across varieties

In this section, I review the potential of NP complexity as a marker of variation
across regional varieties of English. As discussed in the introduction, the field is
currently shifting away from studies of regional nation-state varieties
as holistic entities and towards acknowledging register variation. Any comparative
study of varieties of English, it is argued, must take register into account. The
inclusion of register leads to a more discriminating picture of structural prefer-
ences in regional varieties of English. Such preferences may occur in
a) one or more varieties and one specific register,
b) one or more varieties and several registers (with shared situational character-
istics), and
c) one or more varieties as a whole (i.e. in all registers).
The preceding sections already isolated the first, namely variety-plus-register-specific
preferences, such as the relative overuse of premodified NPs in Hong Kong
conversational data. Regarding the second (registers that share one situational
characteristic), we can identify a number of variety-specific tendencies. Again
assuming even distribution within NP classes across varieties, the following
tendencies can be observed:
– relative overuse of premodified (only) NPs in spoken Hong Kong English
– relative overuse of postmodified (only) NPs in spoken Jamaican English
– relative underuse of postmodified (only) NPs in spoken Singapore English
– relative underuse of premodified (only) NPs in interactional Jamaican
English.
These preferences can be matched with descriptions of varieties of English: for
instance, the relative overuse of premodification in spoken Hong Kong English
is in line with the description in Setter, Wong and Chan (2010: 61). Although this
approach enables us to isolate structural preferences of NP complexity for particular
varieties and register situations, these tendencies do not take into account
other factors influencing NP complexity, such as syntactic function, and have to
be interpreted with caution.
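The notion of relative over- and underuse applied to the tendencies listed above can be made concrete with a small sketch. It assumes, as the text does, an even distribution of one NP class across varieties, so that the expected count for each variety is the class total divided by the number of varieties. The counts, the variety labels, the function name `relative_use` and the tolerance band are hypothetical illustrations, not figures or procedures taken from the study.

```python
def relative_use(counts, tolerance=0.10):
    """Label each variety's use of one NP class relative to an even distribution.

    counts: mapping of variety label -> observed count for a single NP class.
    tolerance: hypothetical band around the expected count that still counts as "even".
    """
    # Under an even distribution every variety is expected to contribute equally.
    expected = sum(counts.values()) / len(counts)
    labels = {}
    for variety, observed in counts.items():
        ratio = observed / expected
        if ratio > 1 + tolerance:
            labels[variety] = "overuse"
        elif ratio < 1 - tolerance:
            labels[variety] = "underuse"
        else:
            labels[variety] = "even"
    return labels


# Invented counts of premodified (only) NPs in one spoken register.
example_counts = {"CAN": 100, "HK": 130, "IND": 100, "JA": 70, "SIN": 100}
print(relative_use(example_counts))
```

A ratio of this kind only flags candidates; as noted above, factors such as syntactic function would still have to be weighed before a preference is confirmed.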
The explanation most commonly offered for the emergence of structural
innovations in varieties of English (in particular, postcolonial or New Englishes)
is language contact (also called ‘transfer’ or ‘cross-linguistic influence’). Gut
(2011: 105) points out that “as yet there exists no reliable method of quantifying
the relative contribution of cross-linguistic influence on any structure produced
by language learners”. This is especially true for NP modification patterns, which
are strongly influenced by other factors, such as register and syntactic function.
In addition to that, NP modification patterns can only be identified in the form
of (statistical) preferences and are thus not directly identifiable as the result of
contact-induced change (unlike, for instance, loanwords). The approach in the
present study is suitable for detecting candidates for such structural tendencies.
However, more factors need to be included and weighed against each other in
order to confirm these preferences (see Schilk and Schaub forthc.).
5 Conclusion and outlook

This study systematically compared NP complexity in a selection of registers and
across a range of regional varieties of English. The results, based on data from
five varieties of English, corroborate the strong connection between NP complex-
ity and register. Across all regional varieties, NP complexity correlates with two
situational register characteristics:
– communicative purpose: NP complexity increases from interactional to infor-
mational registers, and
– mode: NP complexity increases from real-time and spoken to planned and
written registers.
Overall, NP complexity is largely homogeneous within registers across the regional
varieties. Consistency is higher in registers which have stricter codification
(e.g. academic writing). Nevertheless, assuming even distribution across all
varieties, it is possible to isolate individual varieties which show relative over- or
underuse of particular NP structures. Furthermore, it is possible to match such
preferences for pairs of registers that share situational characteristics. NP com-
plexity has already been established as a register marker, and, it is argued here,
is a viable marker of regional variation on the register level.
There are numerous ways in which subsequent research can improve on the
study presented here. First, the database of noun phrases has to be extended to
provide a more solid empirical foundation. Second, by adding further annotation
to the data, such as syntactic function, type of head noun, and type of modifi-
cation, more fine-grained statements about differences in NP complexity across
varieties are possible. This study has also shown that random selection of text
units from the International Corpus of English for the purposes of a register anal-
ysis is not desirable. The texts included in some of the register categories in ICE
are too heterogeneous. Instead, text units have to be carefully selected in order to
ensure compatibility across varieties. Finally, any variety-specific structural pref-
erences have to be matched against the typological inventory found in the sub-
strate languages. Only then is it possible to draw any connections to the possible
origin of such preferences, and to substantiate claims about structural transfer.
6 References
Aarts, Flor G. A. M. 1971. On the distribution of noun-phrase types in English clause-structure.
Lingua 26. 281–293.
Ahulu, Samuel. 1998. Grammatical variation in international English. English Today: The
International Review of the English Language 14(4). 19–25.
Asante, Mabel Yeboah. 1995. Ghanaian English: Motivation for divergence from the standard
in certain grammatical categories. Tübingen: Eberhard Karls University Tübingen
dissertation.
Asante, Mabel Yeboah. 2012. Variation in subject-verb concord in Ghanaian English. World
Englishes 31(2). 208–225.
Balasubramanian, Chandrika. 2009. Register variation in Indian English. Amsterdam:
John Benjamins.
Biber, Douglas, Stig Johansson, Geoffrey N. Leech, Susan Conrad & Edward Finegan. 1999.
Longman grammar of spoken and written English. 9th impr. (2011). Harlow: Longman.
Blair, David & Peter Collins. 2001. English in Australia. Amsterdam: John Benjamins.
Brunner, Thomas. 2014. Structural nativization, typology and complexity: Noun phrase
structures in British, Kenyan and Singaporean English. English Language and Linguistics
18. 23–48.
Fludernik, Monika & Bernd Kortmann (eds.). 2012. Proceedings: Anglistentag 2011 Freiburg.
Trier: Wissenschaftlicher Verlag Trier.
Gut, Ulrike. 2011. Studying structural innovations in new English varieties. In Joybrato
Mukherjee & Marianne Hundt (eds.), Exploring second-language varieties of English and
learner Englishes: Bridging a paradigm gap (Studies in corpus linguistics 44), 101–124.
Amsterdam: John Benjamins.
Haan, Pieter de. 1993. Noun phrase structure as an indication of text variety. In Andreas H.
Jucker (ed.), The noun phrase in English: Its structure and variability, 85–106. Heidelberg:
Winter.
Hall, Christopher J., Daniel Schmidtke & Jamie Vickers. 2013. Countability in World Englishes.
Halliday, Michael A. K. 1989. Spoken and written language. 2nd edn. Oxford: OUP.
Jucker, Andreas H. 1992. Social stylistics: Syntactic variation in British newspapers (Topics in
English Linguistics 6). Berlin: Mouton de Gruyter.
Jucker, Andreas H. (ed.). 1993. The noun phrase in English: Its structure and variability.
Heidelberg: Winter.
Kortmann, Bernd & Kerstin Lunkenheimer (eds.). 2013. The electronic world atlas of varieties of
English. Leipzig: Max Planck Institute for Evolutionary Anthropology. http://ewave-atlas.
org (accessed 28 February 2015).
Lamidi, Mufutau T. 2007. The noun phrase structure in Nigerian English. Studia Anglica
Posnaniensia: An International Review of English Studies 43. 237–250.
Mukherjee, Joybrato & Marianne Hundt (eds.). 2011. Exploring second-language varieties of
English and learner Englishes: Bridging a paradigm gap (Studies in corpus linguistics 44).
Amsterdam: John Benjamins.
Neumann, Stella. 2012. Applying register analysis to varieties of English. In Monika Fludernik
& Bernd Kortmann (eds.), Proceedings: Anglistentag 2011 Freiburg, 75–94. Trier:
Wissenschaftlicher Verlag Trier.
Platt, John, Heidi Weber & Mian Lian Ho. 1984. The new Englishes. London: Routledge and
Kegan Paul.
grammar of the English language. 4th edn. London: Longman.
Sand, Andrea. 2004. Shared morpho-syntactic features in contact varieties of English: Article
use. World Englishes 23(2). 281–298.
Sand, Andrea. forthc. Angloversals? Shared morpho-syntactic features in contact varieties of
English. Amsterdam: John Benjamins.
Schäpers, Uta Katharina Elisabeth. 2009. Nominal versus clausal complexity in spoken and
written English: Theory and description (English Corpus Linguistics 8). Frankfurt: Peter
Lang.
Schilk, Marco & Steffen Schaub. forthc. Noun phrase complexity across varieties of English:
Focus on syntactic function and text type. English World-Wide 37(1).
Setter, Jane, Cathy Wong & Brian Chan. 2010. Hong Kong English. Edinburgh: Edinburgh UP.
Wahid, Ridwan. 2013. Definite article usage across varieties of English. World Englishes 32(1).
23–41.
Xiao, Richard. 2009. Multidimensional analysis and the study of world Englishes. World
Englishes 28(4). 421–450.
Valentin Werner
Real-time online text commentaries:
A cross-cultural perspective
Abstract: In the area of electronically-mediated communication, real-time online
text commentaries (OTCs) as a new specialised register have become popular as an
alternative to traditional broadcasting. OTCs have been recognised as “mediated
quasi-interaction” (Chovanec 2010) and a hybrid genre showing characteristics
of spoken discourse within a written mode (Jucker 2006), as well as a character-
istic combination of simultaneous information and entertainment (“infotain-
ment”), where familiarity or “pseudo-intimacy” (O’Keeffe 2006; cf. Chovanec
2008) between commentator and the audience is created. This contribution helps
to situate this emerging register from a cross-cultural perspective. I use OTCs by
English and German media outlets from the EURO 2012 football championship to
tackle the following issues with the help of a corpus-linguistic approach: (i) What
are register-specific structural features of OTCs? (ii) Are there any culture-specific
aspects along language boundaries or the dimension “intended readership”? I
also consider the interaction of layout and content, production circumstances,
and the influence of recent developments (such as the incorporation of Twitter
messages) on reporting styles.
0 Introduction
Real-time online text commentaries (henceforth OTCs)1 have become more and
more popular2 and represent an alternative to traditional live TV and radio
broadcasting, reporting and commenting on live events controlled for duration,
location and topic (cf. Siever 2011: 171), in particular major sports events. As the
name implies, they are usually categorised as a written form of web communication
(cf. Biber and Egbert, this volume) and are similar to (we)blogs in that they consist
of individual consecutive postings (cf. Grieve et al. 2010: 303).
1 Alternative labels are live text commentary (LTC), live blogging, live ticker, news ticker and the
more sport-specific minute-by-minute report (MBM) or live match tracker.
2 According to a recent survey, OTCs have become “the default format for covering major breaking
news stories, sports events, and scheduled entertainment news”, even surpassing online
articles and picture galleries in popularity (Thurman and Walters 2013: 82; cf. also Wells 2011).
The growing importance of the format is revealed both by the sheer number of OTCs (almost 150
per month for The Guardian) and also in terms of page view counts, which are at least twice as
high for OTCs compared to articles and galleries. User reports seem to confirm that OTCs are the
online format par excellence to track stretches of live events, as more than 35 % of respondents
follow OTCs continuously. Nearly two fifths of all OTCs are sports-related (Thurman and Walters
2013: 82–95). Data for Der Spiegel are in line with these general findings as OTC football reportage
receives more than 1 million clicks per match (see
<www.spiegelgruppe.de/spiegelgruppe/home.nsf/0/CEF3A44164AED9BBC1256F720034CBAC>,
accessed 20 April 2013).
Valentin Werner, University of Bamberg

While previous research has recognised the narrative properties and analysed
the vocabulary and morphosyntax of football reportage in general (Brandt
and Quentin 1983; Ghadessy 1988; Hennig 2000; Krone 2005; Müller 2007; Levin
2008), others have noted that OTCs as “mediated quasi-interaction” (Fairclough
1995: 40) constitute a hybrid register: They show characteristics of spoken discourse
within a written mode (Jucker 2006; cf. Lakeberg, this volume) and are an
interesting combination of simultaneous information and entertainment
(“infotainment”). Thus, familiarity or “pseudo-intimacy” (O’Keeffe 2006: 92; cf.
Chovanec 2008, 2010; Jucker 2010) between the commentator and the audience is
created.
Two further issues are important for establishing OTCs as a register, defined
(following Biber 1988) as language variety by situational (i.e. non-linguistic)
characteristics (see also Schubert, this volume). First, “situational context tends
to exert functional pressures on linguistic output” (Grieve et al. 2010: 315), which
implies there should be common linguistic features traceable across different
OTCs, particularly if they report on the same matches. Second, there is the
contrastive view. It was hypothesised, albeit for other types of football reportage, that
“[t]ypological differences between […] two languages are expected to be neutralised
to a certain degree” (Krone 2005: 51) when texts from two languages fulfil the
same function (in our case, football match reportage). Others (e.g. Müller 2007:
44), however, have emphasised that cultural differences may lead to noticeable
stylistic differences.
Starting from these observations, this paper will address the following
aspects with the help of a corpus-linguistic approach:
(i) What are register-specific structural features of OTCs on different levels of
linguistic analysis?
(ii) Are there any culture-specific aspects along language boundaries or the
dimension “intended readership”; or do OTCs rather form a relatively uniform
cross-linguistic/cross-cultural register?
Further aspects addressed are the interaction of the layout of OTCs with their
content as well as the influence of very recent developments (such as the incor-
poration of Twitter messages) on the style of reporting.
After a few notes on data and methodology, the present study first sets out to
locate OTCs as a register in general terms in Section 2. Section 3 provides an anal-
ysis of the language of OTCs, focusing on vocabulary and collocations and related
semantic aspects, discourse features and potential implications of the interaction
of format and textual commentary. A discussion of OTCs as a cross-cultural reg-
ister follows in Section 4, while Section 5 sums up the results and presents some
generalisations as well as avenues of further research.
1 Data and methodology

While previous research on OTCs has been dedicated almost exclusively to foot-
ball reportage from The Guardian (Chovanec 2008, 2009, 2010, 2011; Perez-Sa-
bater et al. 2008; but cf. Jucker 2006, 2010), the present analysis is based on
OTCs of two English (The Sun, henceforth SUN; The Guardian, henceforth GUAR)
and two German (Bild, henceforth BILD; Der Spiegel (online), henceforth SPON)
media outlets, all stemming from the coverage of the UEFA 2012 EURO Champi-
onship Finals. This facilitates the comparison of the reportage along language
boundaries as well as along the dimension of intended readership.
The print versions of both SUN and BILD can be categorised as tabloids, pre-
dominantly aimed at a working-class readership (see also Höke 2007). Another
common feature is their circulation, with approximately 2.4 million (SUN) and
2.5 million (BILD) copies sold on a daily basis, making them the most popular
papers in England and Germany respectively. In contrast, GUAR and SPON can be
viewed as quality-press products, primarily catering to a middle-class readership.
Their circulation, with 0.2 million daily (GUAR) and 1.0 million weekly (SPON), is
less extensive. The online versions of all four sources are amongst the news sites
visited most often nationwide (Press Gazette 2013; see also <daten.ivw.eu/index.
php>). For the present analysis it is presumed that the intended readership of
the online version roughly corresponds to the intended readership of the printed
version (cf. Newsworks 2013a, 2013b).
To have a comparable dataset, the corpus includes a total of 36 match reports
for the English and the German squads (amounting to a token count of 120,414
words; see the appendix for a detailed list). In the first instance, the main focus
of the structural analysis lies on the English OTCs, which are organised in a linear
way, usually in reverse order and time-stamped (see further Section 2).
For the extraction of the data, the running text chunks with the correspond-
ing time stamps were manually copied and saved as plain text files in order to
exclude unwanted meta-data and to make them machine-readable. Subse-
quently, these text files were loaded into Wmatrix (Rayson 2008; see <ucrel.lancs.
ac.uk/wmatrix/>). This online annotation tool provides automatic part-of-speech
tagging with the CLAWS 7 tagset (<ucrel.lancs.ac.uk/claws/>) as well as semantic
annotation with USAS (<ucrel.lancs.ac.uk/usas/>). In addition, it offers various
concordancing, wordlist and keyword functions. For the analysis of n-grams, for
keyword analyses and for concordance searches, AntConc 3.3.5w (<www.antlab.
sci.waseda.ac.jp/antconc_index.html>) was used both for the English and the
German data.
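To illustrate what an n-gram count over the plain-text files involves, the following minimal sketch tokenises a text and counts contiguous word sequences. It is a stand-in written for illustration only: the study itself used AntConc and Wmatrix, and the function names `tokenize` and `ngram_counts`, the simple regular-expression tokeniser and the sample sentence are all assumptions rather than part of the original procedure.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    # The character class also covers German umlauts and ß, since both languages are analysed.
    return re.findall(r"[a-zäöüß0-9']+", text.lower())


def ngram_counts(tokens, n):
    """Count contiguous n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Invented sample of two short posts; not taken from the corpus.
sample = "Corner to England. Corner to Germany, cleared."
bigrams = ngram_counts(tokenize(sample), 2)
print(bigrams.most_common(1))
```

Note that this sketch lets n-grams run across sentence and post boundaries; a closer approximation of a concordancer's behaviour would have to decide how such boundaries are treated.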
In addition, the webpages containing the OTCs were saved in order to access
paratextual features such as Twitter feeds, tables, graphs or integrated videos and
to assess their potential influence on the main text. The rationale behind includ-
ing this data is the growing trend in linguistics that “[e]ver more phenomena that
would previously have been termed paralinguistic, in the sense of accompanying
but only weakly influencing linguistic form and expression, are now being moved
into the center of concern” (Bateman 2012: 3990). Therefore, the present corpus
can be seen as multimodal.
2 OTCs as a register
2.1 Electronically-mediated communication and sports reportage
Broadly speaking, in the scarce amount of work available to date, the style of
football reportage has been described as resembling conversation (cf. Ferguson
1983: 156–157), but some have highlighted its monologic quality, emphasising its
narrative properties where the commentator acts as mediator and filter (Brandt
and Quentin 1983: 21; Hennig 2000: 44). A comparison of OTCs and traditional
types of live reportage in terms of a summary overview of results from previous
analyses (Perez-Sabater et al. 2008; Chovanec 2008, 2010; Jucker 2010; Thurman
and Walters 2013) yields the picture displayed in Table 1.
Table 1: Comparison of traditional registers of sports/football live reportage with OTCs

                                                      Radio     TV          OTC
STRUCTURAL FEATURES
  Event-related versus non-event-related sections     ✓         ✓           ✓
  Unscripted                                          ✓         ✓           ✓
  Channels (visual/aural/textual)                     ✗/✓/✗     ✓/✓/(✓)     (✓)/✗/✓
  Temporal limitation                                 ✓         ✓           ✓
LINGUISTIC FEATURES
  Narrative style                                     ✓         ✓           ✓
  Monologic structure (one-to-many)                   ✓         ✓           (✓)
  Orality/informality/casual tone                     ✓         ✓           (✓)
  Jargon/slang/idioms                                 ✓         ✓           ✓
  Formulaic language                                  ✓         ✓           ✓
  Ellipsis                                            ✓         ✓           ✗

Table 1 shows the shared characteristics of both OTCs and traditional live report-
age as events in mass communication, while humour is another broad commu-
nication strategy characteristically used in all types. Owing to the features listed
above, sports reportage generally has been described as some kind of “enter-
tainment” genre, even though its primary function arguably is to report factual
content (Brandt and Quentin 1983: 20; Chovanec 2011: 253–254).
However, a number of differences emerge owing to the channel of distribution
(web page mainly with textual content + interactive elements) and to the particular
properties of electronic communication (e.g. the staging of familiarity,3 see
further Section 3.2 and Jucker 2010: 66). Above all, a point worth noting
is the way in which the recipients consume media forms such as OTCs. They are
produced fairly quickly and without many corrections as the commentator is
under time pressure due to the co-extensive nature of the event described and its
description (Jucker 2010: 64).4 Likewise, the consumption is quick and cursory, as
is the case with many other electronic offerings (Dürscheid 1999: 21). These findings
suggest that there are areas of both overlap and divergence between OTCs
and traditional forms of sports reportage. In addition to the aspects mentioned in
the foregoing, it will be shown in the following how OTCs can be further related
to the domains of sports and news reportage, and why they should nevertheless be
categorised as a separate, fairly institutionalised, register serving a discourse
community (O’Keeffe 2006: 19, 29).
3 According to Dürscheid (1999: 23), the staging of familiarity (and the resulting
“pseudo-intimacy” between participants; O’Keeffe 2006; see further Section 3.2) in written
electronic communication is characterised by an apparent closeness of those involved in such a
communicative situation. This is due to the immediacy of the exchanges via the electronic
medium, which is supported by the use and acceptance of features typically occurring in the
spoken mode.
4 Indeed, typos, interpretable as a typical feature of online production under time constraints,
repeatedly occur in all of the OTCs analysed (see e.g. examples (41), (56) and (59) below).
2.2 Layout and production
The fundamental difference between commentators in traditional and in electronic
media (including OTCs) is the loss of their ‘gatekeeping’ function. With the
advent of internet communication, reporters are supposed to transfer, modularise
and visualise information without any prioritising (Jucker 2005: 17). OTCs seem to
be a nearly perfect format to achieve this, while another of the defining properties
is their immediacy and speed and a particular ‘live’ atmosphere, highly valued
by the online audience (Simons 2011: 180; Thurman and Walters 2013: 95). That
OTCs in practice actually represent a new form of journalism can also be deduced
from the fact that the task of creating the input is more often than not assigned
to a freelance journalist or intern rather than to a regular editorial staff member.
Economic considerations also play a role here, of course. OTCs as a rule are com-
posed in an editorial office in front of a TV screen and only rarely in the foot-
ball stadium (Holger Müller, personal communication). In the majority of cases a
single commentator is responsible for the coverage, who acts as the voice in the
OTC. That means he introduces himself and refers to himself in the first person.
At times, however, a person mirroring and choosing readers’ mailings for inclu-
sion in the commentary may support the commentator. This person may also be
responsible for taking care of any technical issues occurring during the reportage
(Thurman and Walters 2013: 91–92; see below for other interactive elements).5
5 The corpus even contains a few meta-comments on technical issues during production, after
the conventional layout and the technical platform apparently had been changed: Yes, yes this
looks a bit different to our usual minute-by-minute reports, but rather than moan about change,
why not embrace it? Or moan about it privately. I’m just a drone who’s following orders and doing
what he’s told. And besides, I quite like it, because I can put in big red quotation marks… (ukr_
eng_1906_guar); I do love this new headline facility… (ukr_eng_1906_guar)
Figure 1: Commentary and overview section of the SUN OTC (from swe_eng_1506_sun;
<www.thesun.co.uk/sol/homepage/sport/football/match_centre/article3670013.ece>,
accessed 12/07/2012, 10:21)
Jucker defines OTCs as a “complex combination of visual and textual features […]
giv[ing] the recipient not only a narrative account of the events so far, but also an
overview of the situation at present” (2010: 59). Typically, the textual information
is shown in reverse chronological order, with the most recently added post
appearing at the top of the page (see Figure 1 for an example).6 This post-by-post
(or minute-by-minute) reporting style is supposedly a fairly recent development
illustrating the influence of structure on activity (O’Keeffe 2006: 31). This means
that the special properties of OTCs as a form of electronically mediated communication
have an impact on the style of reporting. In fact, OTCs surprisingly resemble
a certain type of after-match report which appeared in printed publications as
early as the 1950s (see Figure 2).
6 The content management system of a media outlet may allow reversing the anti-chronological
order once the event has finished, so that the report appears as a kind of article readable from
top to bottom. For instance, this is the case with GUAR (Thurman and Walters 2013: 92) but does
not apply to the other OTCs explored in this study. Occasionally, earlier postings are corrected or
altered in order to make them more readable after the description of the actual event (e.g. during
half-time breaks or before the order is reversed; Simons 2011: 181). Thus, OTCs are a register that
is both dynamic and static (Chovanec 2010: 239).
Figure 2: Excerpt from Kicker FUSSBALL-ILLUSTRIERTE (1954) adapted from Burkhardt (2010: 11)
What is new, however, are the opportunities offered by the technology to use a
similar reporting style for live reportage, and the additional options the electronic
medium offers. Sometimes the readers have the choice to filter the textual data to
catch up quickly on the most important events in the match (i.e. goals, fouls and
substitutions). Other elements that could be added (usually outside the frame or
area where the main commentary appears) are links and embedded audiovisual
content (Thurman and Walters 2013: 83). In football reportage in particular, the
majority of OTCs offer sections, tabs or links on the score (also of simultaneous
matches) and scorers, current and starting team line-up and on general statistics
(shots on goal, cards, ball possession, etc.). One of the most intricate OTCs is
offered by SPON, where readers can also retrieve the real-time statistics for each
individual player. This OTC further includes “heatmaps” (see Figure 3) showing
the positions/operating range of the individual player or of the full team on the
pitch.
Figure 3: Heatmap of the English team (left) and Italian central midfielder Andrea Pirlo (right) in
SPON (from ita_eng_2806_spon; <www.spiegel.de/sport/fussball/em-2012-liveticker-
spielplan-und-alle-statistiken-a-836448.html>, accessed 02/07/2012, 10:28)
The presence of all of these elements appears to suggest a secondary importance
of the textual data of the commentary (cf. Jucker 2005: 17). Actually, the
paratextual elements are also mainly textual (that is, they encode information
orthographically) and present factual information. This might determine the style
and content of the commentary, as factual information is constrained to the para-
textual elements (Perez-Sabater et al. 2008: 251; cf. Bateman 2012: 3985). Occa-
sionally (this mainly applies to GUAR), these additional elements are used for
mere entertainment purposes without any direct relation to the event described
(Thurman and Walters 2013: 85). In any case, it is necessary to consider the com-
bination and interplay between these two categories in a linguistic analysis (see
Section 3.3 below).
Generally speaking, we can describe OTCs as examples of mash-ups of dif-
ferent journalistic styles (reporting, commenting, glossing; cf. Simons 2011: 179).
Turning to the common layout of the commentary, we can establish the following
simplified scheme (in chronological order), abstracted from the four OTC types
investigated:
Table 2: OTC phases and their typical content

Phase                              Typical content
“Appetiser” (published a few       Statements on the relevance of the match
days or hours in advance)
Preamble/preview                   Self-introduction of the commentator, welcoming the readers,
                                   match-related interview passages
Background information             Team line-ups, tactics, referees, results in previous
                                   encounters, description of atmosphere, jersey colours,
                                   national anthems
Commentary                         Play-by-play description and comment, half-time summary
                                   and preview (readers’ comments)
Summary and overall match          Consequences for teams, naming goal scorers and order of
comment                            scoring
Outlook                            Next fixture of the team(s)
Goodbye

This highly structured layout in large parts corresponds to the progression in tra-
ditional football reportage, but OTCs usually finish shortly after the actual match
coverage and lack post-match comments and interviews commonly found in
radio and especially on TV (cf. Ferguson 1983: 154). Note the differences between
the individual OTCs: while the posts of some media (e.g. from SUN) are always
organised in the same fixed way (preview – early team news – head to head – the
ref – etc.) and are apparently prepared in advance (cf. Simons 2011: 180–181), the
data from the other media outlets suggest that they take a more liberal approach
and leave the exact arrangement of the posts (particularly in the phases before
the actual commentary begins) up to the commentator.7
The length of the individual phases may vary. For example, the length of the
pre-match coverage ranges between 176 (swe_eng_1506_spon) and 2,969 words
(ita_eng_2406_guar), GUAR overall being most verbose in this respect (see Figure
4) and particularly when matches of the English squad are reported (for further
quantitative assessment of OTCs, see Section 3.3 below).
7 Boundaries between the (idealised) phases are blurred at times, so that information typically
found in one phase may also appear somewhere else. For example, information on jersey colours
may appear within the first minutes of the actual match commentary, as illustrated by 1’ KICK
OFF Germany, in their all-white kit, start the game kicking from right to left (ger_den_1706_sun).
[Bar chart; vertical axis: word count (0 to 1800). Values shown below the chart:]

            GUAR      SUN       BILD      SPON
AVG         1258.3    696.9     606.4     558.9
AVG ENG     1708.5    771.25    534.25    298.5
AVG GER     898.2     637.4     664.2     767.2

Figure 4: Length (in words) of pre-match commentary (AVG = overall average; AVG ENG =
average of England match reports; AVG GER = average of Germany match reports)
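The relation between the three rows of values in Figure 4 can be checked with a short calculation. The sketch below recomputes each overall average as a weighted mean of the England and Germany averages. The weights are an assumption: four England and five Germany reports per outlet, which is consistent with the total of 36 reports but is not stated in this section (the appendix gives the detailed list). The function name `weighted_average` is likewise only illustrative.

```python
# Per-outlet averages as given in Figure 4 (words of pre-match commentary).
avg_eng = {"GUAR": 1708.5, "SUN": 771.25, "BILD": 534.25, "SPON": 298.5}
avg_ger = {"GUAR": 898.2, "SUN": 637.4, "BILD": 664.2, "SPON": 767.2}
avg_all = {"GUAR": 1258.3, "SUN": 696.9, "BILD": 606.4, "SPON": 558.9}

# Assumption: 4 England and 5 Germany reports per outlet (4 outlets x 9 reports = 36).
N_ENG, N_GER = 4, 5


def weighted_average(eng, ger, n_eng=N_ENG, n_ger=N_GER):
    """Combine two sub-corpus averages into an overall average, weighted by report count."""
    return (n_eng * eng + n_ger * ger) / (n_eng + n_ger)


for outlet in avg_all:
    recomputed = weighted_average(avg_eng[outlet], avg_ger[outlet])
    # Print the recomputed value next to the figure's overall average for comparison.
    print(outlet, round(recomputed, 1), avg_all[outlet])
```

Under this assumed split the recomputed values agree with the AVG row to one decimal place, which suggests that the overall averages are report-weighted rather than simple means of the two sub-averages.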
The phases before the match actually starts serve at least two important commu-
nicative functions. First, the ‘appetiser’ section is a device to incite interest in
readers and to emphasise the relevance of the match. (1) and (2) can be seen as
typical posts.
(1) A titanic clash awaits. (ger_ita_2806_sun)

(2) Deutschland gegen Niederlande, das ist der Klassiker, das Non-Plus-Ultra im
Fußball, ach, was sag ich, der heilige Gral bei dieser EM. Ich begrüße Sie herzlich zu
diesem Top-Event (ger_ned_1306_spon)8
A second function, also applicable to the background information phase, is to
address the readers directly, draw them into the spectacle and make them
part of the match. In this regard OTCs are quite similar to traditional mass media,
which aim at linking “the significant and the mundane” (Gerhardt 2006: 131),
that is, the allegedly spectacular match and the allegedly ordinary everyday life
of the readers. (3) and (4) nicely illustrate this point.
8 Translation: Germany versus the Netherlands, that’s the classic, the non-plus-ultra of football –
what am I saying, the Holy Grail of this European Championship. A warm welcome to this top event.
(3) Good evening, everybody. Are ya nervous? Are ya? (ukr_eng_1906_guar)

(4) Die Nationalhymnen. Gänsehaut für jeden Fußballfan. Was für eine Stimmung.
(ukr_eng_1906_bild)9
The commentary can be viewed as the core part, with the main communicative
function of conveying factual information, although further functions, such as
entertainment (see below), should not be discounted. OTCs usually finish with a
summary and overall match comment, potentially aimed at members of the audi-
ence who only look for a quick round-up of the match and who do not want to
read the full coverage.
2.3 Audience participation
Studies of internet communication have always recognised its multimedial nature
in the sense that textual data rarely appears in isolation (Dürscheid 1999: 28–29),
and the same naturally applies to OTCs. Another dimension of multimediality is
the opportunity of interacting with commentators before and while the match
coverage is in progress. The question is whether this has ramifications for the
structure and content of OTCs.
On the one hand, Chovanec (e.g. 2008) has convincingly shown that audience
mail-ins constitute an essential element of OTC football reportage. In addition, he
has found that readers’ comments and their citing by the commentator are rarely
directly related to the gameplay and thus constitute a second layer of “gossip”
with a social rather than an informative function. This considerably extends
the scope of the OTC beyond the provision of factual information (as its primary
purpose) and is testimony to the entertainment function OTCs can carry. As only a
selection of readers’ mails are presented and addressed and, more often than not,
reduced to clichés (Chovanec 2008: 260), he labels this type of discourse “qua-
si-conversational interactions” (Chovanec 2011: 252). Readers may participate in
the creation of the content of the OTC, but only at the discretion of the commenta-
tors (or their aides; see above). Given that commentator and contributing readers
usually do not know each other personally, casual conversation is only simulated
to a certain extent. However, the general applicability of Chovanec’s findings is
limited as his analyses are restricted to GUAR data only (see also Thurman and
Walters 2013: 85).
9 Translation: The national anthems. Goosebumps for every football fan. What an atmosphere.
On the other hand, the advent and growing popularity of genuinely inter-
active internet applications (the so-called “web 2.0” technologies) could have
led to a widespread integration of these into OTCs as another “webby” form of
communication, creating dynamic content. The most popular application, and potentially
also the one best suited to OTCs as another immediate form of journalism (cf.
Chovanec 2010: 239), is the microblogging service Twitter (<www.twitter.com>).
Despite its presence on the market since 2006, only one of the OTCs considered
in the present study, SPON, has reserved some space for Tweets (that is, Twitter
posts). This area (called “Live-Fanblock”, ‘live fan section’) is placed prominently
next to the main commentary box (see Figure 5).
Figure 5: Main commentary and Tweets in SPON (from ger_ita_2406_spon;
<www.spiegel.de/sport/fussball/em-2012-liveticker-spielplan-und-alle-statistiken-a-836448.html>,
accessed 02/07/2012, 10:29)
Commentators actively encourage readers to participate, as in (5), but they do
not cite readers’ Tweets in the main commentary. The one exception to this rule
is presented below as (6).
(5) Jetzt ist es amtlich – Klose, Schürrle und Reus spielen von Beginn an. Twittert der
DFB. Sollten Sie auch den Drang verspüren, ihren Kommentar via Twitter in den
Live-Fanblock rechts nebenan zu Tickern, so benutzen Sie bitte den Hashtag #gergre
(ger_gre_2206_spon)10
10 Translation: Now we know for sure – Klose, Schürrle and Reus are in the starting line-up. Twit-
ters the DFB (= the German football association). Should you also feel the urge to post your com-
ments to the Live-Fanblock to the right, please use the hashtag #gergre
(6) PS: Mein Tweet des Abends: Dehnen ist gut für die Bänder, Bender ist schlecht für die
Dänen – @wintersjon! In diesem Sinne, gute Nacht! (ger_den_1706_spon)11
Therefore, rather than engaging in quasi-conversation in the sense defined above,
Tweets in SPON should be viewed as truly parallel comment, where readers can
express their (unfiltered) opinion and post links.
Although OTCs in GUAR do not comprise a formalised way of incorporating
Twitter comparable to the “Live-Fanblock” of SPON, commentators refer to
Tweets in much the same way as they refer to mails (that is, with added
comment), albeit rarely in the present data (see (7)).
(7) Over on the Twitter @ianapplegate has this suggestion. Maybe they should at least give
Esperanto a go? Can anyone even speak Esperanto? (ger_den_1706_guar)
It emerges from the analysis that, at present, no unequivocal answer can be given
to the question as to whether interactive elements influence OTC commentary.
However, it could be shown (i) that the extent to which reader-generated
content influences the style and content of OTCs varies considerably and (ii) that
different OTCs have different approaches towards interactivity. While two (SUN,
BILD) do not provide any opportunity for the readers to get involved, OTC report-
age in GUAR provides extensive, though filtered, reader-generated content and
related comments, and thus yields a quasi-conversational structure as defined
above. The most direct approach arguably is taken by SPON, where Tweets are
displayed unfiltered as a by-commentary right next to the commentator’s text.
However, the latter does not usually refer to the former in any way, so audience
participation could be viewed as constrained in another way.12
11 Translation: PS: My Tweet of the night: Stretching is good for the ligaments, Bender is bad for
the Danes – @wintersjon! In this spirit, good night! Note: In the German version, the author of
the Tweet exploits the homophony /bendɐ/ between Bänder (‘ligaments’) and Bender (player’s
name) for a comic effect.
12 Even if the extent of filtering varies, both ways of incorporating interactive elements pre-
sumably take account of a point made in audience studies of other media types. To be precise,
Gerhardt (2006: 129) maintains that the audience consists of “active social agents whose lives do
not come to a halt when they are exposed to a mass medium”. Accordingly, it could be argued
that OTCs with interactive elements take a socially more adequate approach towards their read-
ers. This view is also supported by Simons (2011: 156), asserting that modern audiences have
developed a feeling of being entitled to participation and interaction. Therefore, it is argued that
state-of-the-art journalistic practice is liable to incorporate social media in order to render mass
media production and use a shared experience. A related point of minor importance is that OTCs
sometimes also serve as some kind of by-medium to TV broadcasts where a commentator adds
“colour commentary” to the “action” on the screen. This is especially salient in designated OTCs
on particular shows, such as the regular SPON OTC on “Tatort”, a popular German
crime series.
3 The language of OTCs
3.1 Vocabulary, collocations and semantics
3.1.1 General picture
Like traditional types of sports reportage (cf. Ghadessy 1988: 19), OTCs can be
expected to contain a substantial amount of technical vocabulary to describe the
gameplay. An exploration of the most frequent content words reveals that items
can be broadly categorised into what is shown in Table 3.
Table 3: Categories of content words amongst the top 100 wordlist created with AntConc
Category | Examples for GUAR + SUN | Examples for SPON + BILD
Names of teams (geographical location) | England, Germany, Sweden, France, Portugal, Ukraine, Italy | England, Deutschland (‘Germany’), Italien (‘Italy’), Portugal, deutschen (‘German’)
Temporal location | min, time, (first/second) half, after | Minute (‘minute’), jetzt (‘now’), dann (‘then’), heute (‘today’), nach (‘after’)
Sports-/game-related terms | ball, goal, shot, corner, side, kick, area, cross, chance, post, team, game | Ball (‘ball’), Tor (‘goal’), Ecke (‘corner’), (gelbe) Karte (‘(yellow) card’), Strafraum (‘penalty area’), Flanke (‘cross’), Spiel (‘game’), Wechsel (‘substitution’)
Names of players and coaches | Hart, Rooney | Hart, Gomez, Klose, Özil, Neuer, Löw
Overall, the comparison between the most frequent content words in English and
German OTCs reveals some striking similarities (especially as regards the first
three categories in Table 3), but with a slight change in national focus (as regards
the players’ names). Note also that the expression of movement, location and
direction figures prominently in terms of function words – mainly prepositions –
amongst the highly frequent lexical items (e.g. right, up, left, down, back, over,
against, to, in, for, from, on, at, by, into vs. in, auf, mit, von, zu, im, aus, an, bei,
gegen, vor, nach, zum, ab, am, über, durch, zur, ins).
These findings can be closely related to a semantic keyword analysis in
Wmatrix, where the English OTC data are compared against the spoken and
written BNC sampler. In this quantitative perspective, salient semantic areas
emerge. These are ‘competition’, ‘numbers’ (usually related to spatial and tem-
poral orientation), ‘warfare, defence and the army; weapons’, ‘violent/angry’,
‘chance, luck’, ‘long, tall and wide’, ‘success’, ‘failure’, ‘anatomy and physiol-
ogy’, illustrated by examples (8) to (15) respectively.13
(8) As it stands, Portugal will go through with a better head-to-head record. (ger_
den_1706_sun)
(9) And how England love that decision, because the second effort is sent right onto
Lescott’s head, eight yards out, level with the left-hand post. (fra_eng_1106_guar)
(10) That was Klose’s 64th goal for Germany four off Gerd Muller’s record and he almost
made it 65 moments later, following up a loose ball and sweeping in a low shot that
was kicked behind at the near post by the besieged Sifakis. (ger_gre_2206_guar)
(11) Evra whips a cross into the England area from the left. (fra_eng_1106_guar)
(12) It’s high-stakes major-championship Holland versus Germany. (ger_ned_1306_guar)
(13) Germany also prevailed in the third-place play-off at World Cup 2006, winning 3-1 in
Stuttgart. (ger_por_0906_sun)
(14) Designated scapegoat for when it all goes wrong: Pedro Proenca (Portugal). (ita_
eng_2406_sun)
(15) He curls a cross onto the head of Gomez, but the big striker’s header is weak and
wafted miles to the left of the target. (ger_ita_2806_guar)
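The keyword comparison described above rests on a keyness statistic. The chapter does not say which statistic was used; the sketch below assumes the log-likelihood measure that is commonly used for this kind of corpus comparison. The function name `log_likelihood` and all frequencies are hypothetical and serve only to illustrate the calculation.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one item: study corpus vs. reference corpus.

    Assumption: this is the two-corpus log-likelihood formula often used in
    keyword analysis; it is not confirmed as the measure used in the chapter.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the item were spread evenly over both corpora
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Invented example: an item occurring 100 times in 10,000 study-corpus tokens
# and 50 times in 10,000 reference-corpus tokens
score = log_likelihood(100, 10_000, 50, 10_000)
print(round(score, 2))
```

Higher scores indicate that an item (or semantic tag) is more unevenly distributed between the two corpora; the score itself does not show in which corpus the item is overrepresented, so relative frequencies have to be compared as well.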
The analysis of highly frequent content items and the semantic keyword analysis
suggest that OTCs do not fundamentally differ from other forms of football
reportage, particularly radio reportage, as “good playing, moments of risk, significant
points of heightened competition” (Ferguson 1983: 156–157) receive most
extensive coverage. This can be deduced for example from the high salience of
‘success’ and ‘failure’ semantic tags or the high frequencies of players’ names
usually involved when chances in a game occur; that is, strikers/offensive players
(Rooney, Özil, Klose, Gomez) and goalkeepers (Hart, Neuer).
Levin (2008: 146) has pointed out that “traditions developed in sports commentary
are often unintelligible to the uninitiated”, one reason being that commentators
rely on formulaic language with specialised meanings. In order to test
this claim, I compared the ten most frequent 4-grams in the material for both
languages, as shown in Table 4.
13 Some of the findings of the corpus software may be due to the metaphorical processes involved
(cf. also the usage of the terms shot, target and squad, captain, etc.). It is controversial
whether “football is war” metaphors still apply or whether they have conventionalised (see also
Section 3.1.2).
Table 4: The ten most frequent 4-grams extracted with AntConc
Rank | GUAR+SUN Freq. | GUAR+SUN 4-gram | SPON+BILD Freq. | SPON+BILD 4-gram
1 | 41 | the edge of the | 13 | Meter vor dem Tor (‘meters before the goal’)
2 | 25 | edge of the area | 11 | auf der anderen Seite (‘on the other side’)
3 | 25 | on the edge of | 11 | aus der zweiten Reihe (‘from the second row’)
4 | 19 | down the inside-right | 10 | Tooor für Deutschland, X:X (‘goal for Germany, X:X’)
5 | 16 | the inside-right channel | 8 | auf dem rechten Flügel (‘on the right wing’)
6 | 14 | down the right and | 7 | in der zweiten Hälfte (‘in the second half’)
7 | 14 | from the edge of | 6 | da war mehr drin (‘there was more in it’)
8 | 13 | down the inside-left | 6 | doch der Ball geht (‘but the ball goes’)
9 | 13 | in the first half | 6 | im Strafraum an den (‘in the penalty area at the’)
10 | 12 | for the first time | 6 | Meter vor dem Kasten (‘meters before the goal’)
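A frequency list of 4-grams like the one in Table 4 can be approximated with a few lines of code. The following is a minimal sketch, not a reconstruction of AntConc's procedure: the tokenisation rule, the helper name `top_ngrams` and the sample text are all assumptions made for illustration.

```python
from collections import Counter
import re

def top_ngrams(text, n=4, k=10):
    """Return the k most frequent n-grams in text as (n-gram, frequency) pairs."""
    # Assumed tokenisation: lower-cased runs of word characters, apostrophes
    # and hyphens, so that "inside-right" counts as one token
    tokens = re.findall(r"[\w'-]+", text.lower())
    # Slide a window of n tokens over the text; sentence boundaries are ignored
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams).most_common(k)

# Invented sample text
sample = ("He shoots from the edge of the area. "
          "Rooney waits on the edge of the area again.")
result = top_ngrams(sample, n=4, k=3)
print(result)
```

Because the counts are absolute, comparing them across sub-corpora of different sizes (as in the two language columns of Table 4) would require normalisation, for instance per 10,000 tokens.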
According to the absolute usage frequencies, English OTCs apparently use formulaic
expressions much more than the German ones. A particularly common
collocation (see ranks 1, 2 and 3 in Table 4), better represented as a 6-gram,
is on the edge of the X.14 Levin’s (2008) findings can be confirmed insofar as
somebody reading OTC reportage has to have (i) knowledge of conventions
and a mental image of the layout of a football pitch and (ii) knowledge of
football-related jargon. Point (i) is especially illustrated by the English data, where
the majority of the 4-grams describe movement and/or position, and point (ii)
especially by the German data, where technical terms (partly also related to position)
such as Strafraum (‘penalty area’), Flügel (lit. ‘wing’; ‘outer part of the pitch’) or
aus der zweiten Reihe (lit. ‘from the second row’; ‘from far away’) appear. The
present data therefore suggest that it is not merely “goal scoring and measuring
time” (Levin 2008: 146) where formulaic language is employed, although some
of the items included in Table 4 (e.g. in the first half; for the first time; Tooor für
Deutschland, X:X; in der zweiten Hälfte) support Levin’s claim.
14 Realisations for X occurring in the data are D, six yard box, England box, Italy penalty area,
Sweden penalty area, penalty area.
A related aspect is the extended reliance on informal and slang items
(Perez-Sabater et al. 2008: 242; cf. Ferguson 1983: 156–157), exemplified by Kasten
(‘goal’, lit. ‘box, case’) in Table 4. A recent study on informality (Burkhardt 2010:
14–15) has identified a long-standing tradition of dialectal and informal influence
as regards (German) football language, and a similar situation in English appears
highly likely. Indeed, the OTC data from both languages confirm a general ten-
dency towards informal usage, as examples (16) to (19) show (see also below):
(16) Neat turn from Ozil who twists in the box before feeding Khedira for a low 20-yarder,
which Sifakis parries. (ger_gre_2206_sun)
(17) (…) on the sideline Joachim Low is waving his hands around in frustration like an eejit.
(ger_gre_2206_guar)
(18) Huiuiui, dieser Reus hat sich einiges vorgenommen. Diesmal rutscht ihm das Spiel-
gerät über den Schlappen und fliegt zwei Meter am rechten Außenpfosten vorbei.
(ger_gre_2206_bild)15
(19) Fortakis hält einfach mal drauf. Neuer hält einfach mal fest. (ger_gre_2206_spon)16
3.1.2 Intended readership
Lexical differences along the dimension “intended readership” are harder to
determine. First of all, a quantitative assessment of the lexical density of OTCs
(see Table 5) shows only marginal differences between languages and individual
OTCs (SD = 1.50) and standardised type/token ratio values approximating values
normally found in written data (e.g. of the written components of the International
Corpus of English).
15 Translation: Huiuiui, this Reus guy is up for something. This time, the playing device (infml.)
slides over his worn-out shoe/slipper and misses the right outer post by two meters.
16 Translation: Fortakis just shoots. Neuer just saves.
Table 5: Standardised type/token ratios (TTR) calculated with frequencies from AntConc
GUAR SUN BILD SPON
std. TTR 45.88 42.50 45.51 42.93
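For readers unfamiliar with the measure, a standardised type/token ratio is usually obtained by computing the plain ratio over consecutive chunks of equal length and averaging the results. The sketch below assumes a chunk size of 1,000 tokens (the chapter does not state the window used) and a hypothetical helper name `standardised_ttr`. It also recomputes the spread of the four values in Table 5; the population standard deviation appears to match the reported SD = 1.50.

```python
from statistics import mean, pstdev

def standardised_ttr(tokens, chunk_size=1000):
    """Mean type/token ratio (in per cent) over consecutive full chunks.

    Assumption: chunk_size=1000 is a common convention, not a value
    taken from the chapter. A trailing partial chunk is discarded.
    """
    starts = range(0, len(tokens) - chunk_size + 1, chunk_size)
    chunks = [tokens[i:i + chunk_size] for i in starts]
    if not chunks:
        raise ValueError("text is shorter than one chunk")
    return mean(100 * len(set(chunk)) / chunk_size for chunk in chunks)

# Values as reported in Table 5
table5 = {"GUAR": 45.88, "SUN": 42.50, "BILD": 45.51, "SPON": 42.93}
spread = pstdev(table5.values())  # population standard deviation
print(round(spread, 2))

# Toy illustration with a chunk size of 3 tokens
toy = standardised_ttr("a b a b c d".split(), chunk_size=3)
```

Averaging over fixed-size chunks makes the ratio comparable across texts of different lengths, which matters here because the OTCs differ considerably in token counts.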
In fact, keyword analyses contrasting the vocabulary of the two OTCs in each language
(GUAR vs. SUN and SPON vs. BILD) yield a very diverse picture. First, a look at
the top 100 keyness words of GUAR vs. SUN (and vice versa) reveals some (groups
of) characteristic items. Commentators for GUAR seem to have a preference for
technical terms such as tiki-taka or its ad-hoc (mock) variant (das) bundestikiund-
taka17 to describe the particular playing style the Spanish and German teams are
known for. On a related note, the acronym TBOF (‘two banks of four’), referring
to the traditional tactical formation of the England squad, reaches a high keyness
rating. Another conspicuous item in the GUAR data is beard. Here, an idiosyn-
cratic use of the GUAR commentator, again from the Germany vs. Greece match,
is responsible for its salience. While at the beginning of the coverage the player
Salpingidis is introduced with the metonymic nickname beard to be feared, as in
example (20), at a later point in the match, we can witness a process of personifi-
cation and the reference merely by a physiological feature is taken as established,
as can also be seen from the capitalisation of the term in example (21).18
(20) Gekas will go up front, with the beard to be feared, Salpingidis moving to the right of
midfield. (ger_gre_2206_guar)
(21) The Beard To Be Feared slides a cool low penalty to the right as Neuer goes the other
way. (ger_gre_2206_guar)
17 Burkhardt (2010: 14) presents an overview of the genesis of the term tikitaka. Consider also
the word formations das bundestikiundtakafussball (ger_por_0906_guar); I fell asleep after 63
minutes and have only just woken up from a tiki-taka-induced snooze (ita_eng_2406_guar) or Because
over-intellectualising Spain’s tiki-totalitarianism isn’t going to be enough when you try to big
this up in ten years’ time, I can tell you that for nothing (ger_ita_2806_guar).
18 Cf. the following references to England striker Wayne Rooney: Dicke Chance für Mister Haupthaar!
(‘Big opportunity for Mister scalp hair!’; ita_eng_2406_bild); Wieder kommt das lebende
Haartransplantat Rooney angeflogen, doch sein Kopfball ist eher eine Rettungstat denn ein Torversuch.
(‘Again the living hair transplant Rooney is approaching, but his header is more of a save
than an attempt on target.’; ita_eng_2406_spon).
In contrast, we can generalise from the SUN vs. GUAR keyness list that SUN commentators
more often than not refer to players by their first names (Mario, Bastian,
Antonio, Manuel, Mesut, Cristiano, Miroslav, etc.) and employ more war-/aggression-related
terminology (e.g. fires, impact, strike, shot, kill, onslaught) – although
it might be argued that some of these items have become conventionalised metaphors.
Puns on players’ names and ad-hoc formations are a common feature of all
OTCs and illustrate creative language use in this type of sports commentary (see
also Section 3.2 on discourse features below; cf. Golebiowski 2012: 58):
(22) It’s Robben-esque at times from Ibrahimovic (…) (swe_eng_1506_guar)

(23) “It’s Goetzille.” Who needs Xaviesta? (ger_gre_2206_guar)
(24) Super Mario was brilliant at times for Manchester City this season (…) (ita_eng_2406_
sun)
(25) Immer wieder Mad Mario. (ita_eng_2406_spon)
(26) THE LAHM BELLS ARE RINGING (ger_ned_1306_sun)
(27) LACKING in KLAAS (ger_ned_1306_sun)
(28) Schewagol (ukr_eng_1906_bild)
(29) Kjaer has the ball toe-poked pass him by Muller with the result that Muller is mullered
to the ground by the Dane. (ger_den_1706_guar)
The keyword analysis of the German OTCs shows that BILD is much more prone
to using dialectal and jargon words than SPON. Two illustrative instances are
references to Ball (‘ball’) and Tor (‘goal’). While the standard variants (i.e. Ball
and Tor) rank high in the keyness list of SPON, within the top 100 keyness items
of BILD a variety of informal terms both for the former (e.g. Kugel ‘bowl’, Leder
‘leather’, Pille ‘pill’, Murmel ‘marble’)19 and the latter (e.g. Kasten ‘box’, Hütte
‘shed’) occur. On a related note, other salient items worth mentioning due to their
high keyness in BILD are Schlappen (‘foot’; lit. ‘worn-out shoe/slipper’) or Dampfhammer
(‘fast shot on goal’; lit. ‘steam hammer’). This does not mean, however,
that SPON commentators do not use informal or jargon items, as the occurrence
of some other words listed in Burkhardt (2010) shows (see examples (30) to (32)) –
they are just used less frequently.
(30) Also Balotelli sollte heute besser keinen Elfer mehr schießen (ita_eng_2406_spon)20
(31) De Rossi schießt, Hart lässt prallen, Balotelli feuert aus kurzer Distanz drauf, wieder
Hart und dann muss Monotolivo das Ding im Nachschuss machen (ita_eng_2406_
spon)21
19 The Kicktionary (<www.kicktionary.de>; Schmidt 2007), a multilingual dictionary of football
terms, includes Kugel and Leder (in addition to Spielgerät (‘the thing to play with’)); cf. Neuer
faustet das Spielgerät weg (‘Neuer punches the ball away’; ger_por_09_06_bild), but not Pille
and Murmel.
20 Translation: Well, Balotelli rather shouldn’t shoot any more penalties (infml.) today.
21 Translation: De Rossi shoots, Hart rebounds the ball, Balotelli fires from a short distance, again
Hart and then Montolivo must score [lit. make the thing] in the follow-up.
(32) Garmash wagt einen Distanzschuss und knallt aus 30 Metern vom linken Flügel aus
auf das Tor. (ukr_eng_1906_spon)22
3.2 Discourse features
Again relating to in-group knowledge (see also Gerhardt 2006: 140; O’Keeffe
2006: 155) required by the audience, an earlier analysis has identified “British-
ness” (Chovanec 2008: 261) as common ground of the cross-references in GUAR
OTCs. Some of these findings can be extended to OTCs from other media outlets.
In-group knowledge is required by the reader whenever commentators refer or
allude to particular players, coaches or commentators not part of the current
game or action (and their alleged characteristics, statements or achievements).
Examples (33) to (38) illustrate that this happens in OTCs of all kinds.
(33) Call it the Crouch Effect, if you will. (swe_eng_1506_guar)

(34) The full-back likes attacking more than defending, apparently, so appears to be the
Portuguese equivalent of Glen Johnson. (ger_por_0906_sun)
(35) Gomes slides in, Gascoigne at the Euro 96 semi style, but can’t get his boot to the ball.
(ger_ned_1306_guar)
(36) Aber Kroos mit einer Christian-Rahn-Gedächtnis-Ecke. (ger_ita_2806_spon)23
(37) Pirlo kommt trotzdem an den Ball, macht aber den Robben. (ger_ita_2806_spon)24
(38) Balotelli will den Ibrahimovic machen. (ita_eng_2406_bild)25
In the GUAR data, this is also often observable in the readers’ comments included
in the actual OTC. A similar effect is created by numerous references to scenes
from other games and to other teams, as shown in examples (39) to (43).
(39) Mellberg produces a tackle not too dissimilar to Bobby Moore’s famous one on Jair-
zinho in the 1970 World Cup. (swe_eng_1506_guar)
(40) He makes it to penalty area before old hand Mellberg stops him in his tracks with a
challenge akin to Moore on Pele, 1970. (swe_eng_1506_sun)
(41) I just had a horrible premonition of Balotelli making this match his Maradona ’86
moment and crushing us single-handledly [sic] because he feels like it (ita_eng_2406_
guar)
22 Translation: Garmash tries a distance shot and rifles the ball from 30 meters from the left wing
towards the goal.
23 Translation: But Kroos with a Christian-Rahn-memorial corner.
24 Translation: Pirlo gets the ball anyway, but does the Robben.
25 Translation: Balotelli wants to do the Ibrahimovic.
(42) Doch im Gegensatz zum FC Bayern nimmt keiner Reißaus oder zeigt auf den Anderen.
(ita_eng_2406_bild)26
(43) Schlecht war die deutsche Mannschaft gegen Portugal eigentlich nur im Jahr 2000.
Damals setzte es ein 0:3. Aber die Abwehrspieler hießen auch Rehmer oder Nowotny.
(ger_por_0906_spon)27
While these intertextual28 references as listed above are not restricted to OTCs
from GUAR, these are the ones where they occur most frequently (see Table 6).
Table 6: Average number of intertextual references per match report
GUAR SUN BILD SPON
cross references 6.67 2.78 2.44 4.78
This is also due to another unique feature of GUAR OTCs, which is reference to
popular culture (e.g. actors, movie titles etc.) by both commentators and audi-
ence comments, as exemplified in (44) or (45):
(44) See you in 10 minutes for more of the same, or the most dramatic twist since The Crying
Game/The Usual Suspects/Fight Club/Turner & Hooch. (ger_gre_2206_guar)
(45) Now that Walcott has replaced Ron Perlman England might actually win. (ita_
eng_2406_guar)
All this nicely illustrates the extensive additional knowledge required to become
an actual part of the game, or rather its mediated presentation (see also Gerhardt
2006: 140). In other types of media, commentators deliberately employ intertex-
tual references as one way to create “pseudo-intimacy”, that is, “some sense of
common identity and nationality or some other familiarity built up through fre-
quent ‘contact’” (O’Keeffe 2006: 92)29 and this seems to be the case also in OTC
reportage, most clearly in the GUAR data.
26 Translation: But in contrast to Bayern Munich nobody runs away or points to somebody else.
27 Translation: The only time the German team actually was bad against Portugal was in the year
2000. They got defeated 0:3. But the defenders were called Nowotny and Rehmer.
28 Intertextuality is conceived of in broad terms, including e.g. previous matches, scenes, other
players etc. as (non-linguistic) pre-texts. In addition, this intertextuality may also comprise ste-
reotyped (national) clichés requiring generalised cultural knowledge, such as “[…] but Andreas
Brehme has to be the best Left Back,” says John Duffy. “He had a few problems in the hairstyle
department, mind, but what German doesn’t?” (swe_eng_1506_guar).
29 Cf. also Ferguson’s term “dialog on stage” (1983: 156).
Another remarkable discourse feature already extensively covered by Jucker
(2006: 128) is what he labels “parlando prosodics”: in the written medium the
commentator imitates “spoken language through exclamations, capitalisation,
graphical indication of vowel lengthening […] and hesitations”.30 For reasons of
space, suffice it to say that the current dataset also yields a range of examples
and that these realisations can be found in OTCs of any provenance (see examples
(46) to (50)).
(46) Gooooooooooooooooal! but in the other game. (ger_den_1706_guar)

(47) They couldn’t, could they??? (ger_ita_2806_sun)
(48) Peeeeeeep! Peeeeeeep! Peeeeeeeeeeeeeeep! Nothing more to report here folks.
(ger_den_1706_guar)
(49) Aber gut, es bedeutet immerhin: GLEICH GEHT ES LOS! (ger_ger_2206_spon)31
(50) Rooooooooney zahlt zurück. (ukr_eng_1906_bild)32
Therefore, Perez-Sabater et al.’s (2008: 255) finding that prosody is usually not
typographically marked in OTCs from British newspapers has to be revised. In
addition, commentators indicate spoken modes of discourse by other means such
as (i) question tags, (ii) interjections and (iii) hesitation markers (or combinations
of these), all typically found in speech (cf. Chovanec 2008). Examples (51) to (54)
illustrate the first type and are commonly used as rhetorical questions or as a
means to convey surprise.
(51) You’d fancy that run continuing this year, no? (ger_por_0906_guar)
(52) Motta reißt Kroos um, Italien bekommt Freistoß. Häh? (ger_ita_2806_spon)33
(53) Oh no they didn’t! Football eh? (ger_gre_2206_sun)
(54) Wenn man sowas übersteht, kann doch nichts mehr schiefgehen, oder? (ger_
por_0906_spon)34
The wide range of interjections found in the data fulfils a similar function of sim-
ulating spoken discourse. Again, they occur across all OTCs, as examples (55) to
(59) show.
30 Expressive punctuation, exemplified in (47), could also be added to the list of parlando pro-
sodics and may thus be seen as a characteristic register feature (cf. Sanchez-Stockhammer, this
volume).
31 Translation: But well, at least this means: IT’S ABOUT TO START!
32 Translation: Rooooooooney pays back.
33 Translation: Motta knocks Kroos down, Italy gets a free kick. Eh?
34 Translation: If you get over such a thing, nothing can go wrong, right?
(55) Blimey, Liberopoulos is a man on a mission. (ger_gre_2206_sun)

(56) Oooooooooh. A ball as delicious as your mother’s Sunday roast is swung into the box
from Ozil but it goes out for a corer [sic]. (ger_den_1706_guar)
(57) Boah! Kann man das bitte nochmal in Zeitlupe sehen? (ger_ned_1306_bild)35
(58) Drei Minuten gibt es obendrauf! Puuh, das ist viel! (ger_por_0906_spon)36
(59) Oh Gott, was macht den [sic] Müller da? (ger_den_1706_spon)37
In the above instances, medium determines content, or at least its typographical
representation, and many of the discourse features listed contribute to the creation
of “pseudo-intimacy”, also meaning that both commentator and audience
“pretend the relationship is not mediated and is carried on as though it were face-
to-face” (O’Keeffe 2006: 92).
3.3 Interaction of text and other elements
Another aspect that has largely escaped researchers’ attention is the interaction
between formal layout/paralinguistic phenomena and textual/linguistic content.
For the four OTCs under investigation, this indeed plays a role. It was already indicated
above that some of the OTCs come with many additional features such as
team statistics, heatmaps, etc. Thus, it could be hypothesised that the more paralinguistic
material is present, the shorter the individual OTCs are.38 This potential
interaction can be measured quantitatively by considering absolute token counts
(average number of words per match reported) and relating these values to the
presence of further structural elements.
Table 7: Average token number per match report
GUAR SUN BILD SPON
Average token number 4,646 3,105 2,580 3,047
35 Translation: Boah! Can we see this in slow motion again?

36 Translation: Three minutes of additional time! Phew, that’s a lot!
37 Translation: Oh my god, what’s Müller doing there?
38 The present analysis applies a “micro-level approach” (Santini et al. 2010: 11); that is, only el-
ements reachable within one click and which are part of the actual OTC are included (excluding
ads and general navigation tabs, etc.).
Table 7 shows the relevant frequencies, and a “wordiness hierarchy” along the
lines GUAR > SUN > SPON > BILD emerges, which suggests that the German OTCs
are shorter on average. Two aspects are worth considering here: in addition to the
textual commentary, the different OTCs rely on various other forms of presenta-
tion of match-related information, all allocated to different areas on the page or
reachable by clicking on a tab (see Section 2.2 above). Table 8 gives an overview
of presence or absence of these features.
Table 8: Comparative overview of presence/absence of paratextual features.
GUAR SUN BILD SPON
Textual commentary ✓ ✓ ✓ ✓
Match score and goal scorers ✓ ✓ ✓ ✓
Parallel matches and scores ✗ ✗ ✗ ✓
Team line-ups ✓ ✓ ✓ ✓
Live table ✗ ✓ ✓ ✓
Tactical formations ✗ ✓ ✓ ✗
“Event” filter or timeline (goals, cards, substitutions) ✗ ✓ ✓ ✓
Team and player statistics ✓ ✓ ✓ ✓
Player positions/“heatmaps” ✗ ✓ ✗ ✓
Player ratings ✓ ✗ ✓ ✗
Referee statistics ✗ ✗ ✓ ✗
Area for Tweets ✗ ✗ ✗ ✓
While Table 8 shows that there are some basic elements for all OTCs (match score
and goal scorers, team line-ups, statistics), it also illustrates a fundamental structural
split between GUAR and the remaining three OTCs. GUAR emerges as the
one with the fewest additional informational elements, necessitating, in turn, a more
explicit, or “wordy”, style of reportage. The other OTCs, in contrast, rely more on
iconographic and tabular representations (see also Figures 6 and 7), which provides
a first explanation for the lower number of tokens in these.
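The relation between paratextual features and text length can be made explicit by tallying the ticks per OTC in Table 8 and setting the totals against the token averages in Table 7. The sketch below simply transcribes both tables ("y" for ✓, "n" for ✗); the variable names are hypothetical, and with only four data points the tally is an illustration rather than a statistical test.

```python
OUTLETS = ("GUAR", "SUN", "BILD", "SPON")

# Average token number per match report, as given in Table 7
TOKENS = {"GUAR": 4646, "SUN": 3105, "BILD": 2580, "SPON": 3047}

# Presence ("y") or absence ("n") per outlet, transcribed from Table 8
# in the column order GUAR, SUN, BILD, SPON
TABLE8 = {
    "Textual commentary": "yyyy",
    "Match score and goal scorers": "yyyy",
    "Parallel matches and scores": "nnny",
    "Team line-ups": "yyyy",
    "Live table": "nyyy",
    "Tactical formations": "nyyn",
    "Event filter or timeline": "nyyy",
    "Team and player statistics": "yyyy",
    "Player positions/heatmaps": "nyny",
    "Player ratings": "ynyn",
    "Referee statistics": "nnyn",
    "Area for Tweets": "nnny",
}

# Count the features present for each outlet
feature_counts = {
    outlet: sum(row[i] == "y" for row in TABLE8.values())
    for i, outlet in enumerate(OUTLETS)
}

# List outlets from wordiest to least wordy alongside their feature totals
for outlet in sorted(OUTLETS, key=TOKENS.get, reverse=True):
    print(outlet, TOKENS[outlet], feature_counts[outlet])
```

On this tally GUAR combines the fewest features with the highest token average, while BILD and SPON share the highest feature total but differ in length, so the feature count alone does not account for the full ordering.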
Figure 6: Team line-up, statistics and heatmap from SPON (fra_eng_1106_spon;
<www.spiegel.de/sport/fussball/em-2012-liveticker-spielplan-und-alle-statistiken-a-836448.html>,
accessed 02/07/2012, 10:30)
A second decisive point is that GUAR focuses on the entertainment aspect (Cho-
vanec 2010: 242), whereas the other three OTCs are more informational in the
sense that they provide an extended range of factual information and statistics.
This might also be the reason why the individual entries in the commentary
are short, as noted by Jucker (2010: 58–60).39 GUAR, in contrast, not only has
longer individual entries than the other OTCs, but relies extensively on readers’
comments and replies by the commentator, comprising up to one third of the
textual material (in number of words). Another characteristic feature of GUAR
is the incorporation of pictures, video clips and links only indirectly related to
the actual match, which rather serve to support the entertainment function.
The other OTCs do not incorporate audience participation at all (SUN, BILD) or
do so in a more direct manner, via Twitter messages displayed next to the main
commentary (SPON), thus creating another layer of commentary (see Section 2.3
above), which breaks the uni-directionality of the communication.
39 However, the span (in terms of word length) across the OTCs is considerable and can range
from just a few words (e.g. Ecke Deutschland ‘corner Germany’; ger_por_0906_bild) to more than
125 tokens.
Figure 7: Timeline, statistics and commentary from SUN (ita_eng_2406_sun;
<www.thesun.co.uk/sol/homepage/sport/football/match_centre/article3670013.ece>,
accessed 02/07/2012, 10:20)
4 Discussion: Cross-cultural aspects

Having considered some linguistic and structural aspects of OTCs, this section
addresses the question as to whether OTCs should be seen as a cross-cultural
register or whether differences are salient along the dimensions of regional prov-
enance or intended readership. Based on the findings from the previous sections,
a diverse picture emerges.
A first area with considerable overlap is the general structure of reportage.
Many elements (e.g. an “appetiser” section; see Section 2.2) occur universally, and
the other components of a textual match report are also similar in principle. This
is determined to some extent by the fact that all OTCs report on the same event
with a fixed duration and thematic focus (Siever 2011: 171) – a football match –,
so that a certain congruence could be expected. However, with respect to content,
GUAR is more extensive in its pre-match coverage of England matches, while the
German OTCs use more words to describe Germany playing. Word counts in SUN,
however, are relatively indifferent to the type of match reported (see Section 2.2).
The picture changes slightly when we consider the average word counts for the
full reports on matches by either England or Germany, as shown in Figure 8.
[Bar chart: average word count (y-axis) by OTC (x-axis); plotted values:]
         GUAR    SUN     BILD    SPON
AVG      4746.5  3147.2  3059.8  2528.0
AVG ENG  5646    3525.5  3132.8  2054.5
AVG GER  3847    2768.8  2986.8  3001.4
Figure 8: Overall average word count and according to team playing (AVG = overall average;
AVG ENG = average of England match reports; AVG GER = average of Germany match reports)
Both GUAR and SPON are more extensive in their coverage of the “home” team
(commentaries on the respective other team are roughly one third shorter),
while this tendency is less clear for SUN (approximately one quarter more
words for England matches) and even slightly reversed for BILD. Thus, despite
claims that audiences of new media are “potentially global” (O’Keeffe 2006: 16),
this finding indicates some kind of persisting “national allegiances”.
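The relative differences discussed here can be recomputed directly from the averages in Figure 8. The following Python sketch is an illustration only: the dictionary layout, the assignment of a “home” team to each OTC and the function name are assumptions introduced for this example and are not part of the original analysis. It reports the difference relative to both possible bases, since statements about “more” or “fewer” words depend on which average serves as the denominator.

```python
# Average word counts per OTC and team, copied from Figure 8.
AVERAGES = {
    "GUAR": {"ENG": 5646.0, "GER": 3847.0},
    "SUN": {"ENG": 3525.5, "GER": 2768.8},
    "BILD": {"ENG": 3132.8, "GER": 2986.8},
    "SPON": {"ENG": 2054.5, "GER": 3001.4},
}

# Assumed "home" team per OTC (English outlets: England; German outlets: Germany).
HOME_TEAM = {"GUAR": "ENG", "SUN": "ENG", "BILD": "GER", "SPON": "GER"}


def home_vs_other(otc):
    """Return (surplus, shortfall) for one OTC.

    surplus   = extra words in home-team reports, relative to other-team reports
    shortfall = words missing in other-team reports, relative to home-team reports
    """
    home = HOME_TEAM[otc]
    other = "GER" if home == "ENG" else "ENG"
    home_avg = AVERAGES[otc][home]
    other_avg = AVERAGES[otc][other]
    difference = home_avg - other_avg
    return difference / other_avg, difference / home_avg


if __name__ == "__main__":
    for otc in AVERAGES:
        surplus, shortfall = home_vs_other(otc)
        print(f"{otc}: {surplus:+.1%} relative to other, {shortfall:+.1%} relative to home")
```

A negative value, as for BILD with these figures, signals that reports on the other team are longer on average.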
Turning to the lexicon and collocations, the analysis above revealed that
content and function vocabulary are broadly comparable across languages.
Equally, OTCs of all types rely on formulaic language, which could be expected
in light of earlier research on football discourse. From a quantitative
perspective, however, English OTCs tend to use these combinations more than
German OTCs, in particular when referring to the location of the action on the pitch.
Other commonalities are, first, the usage of slang terms and informal items typical
of football language in general. Second, a comparison of the type-token ratios
did not yield any significant differences. Thus, one of the points mentioned above,
namely the restricted lexical range of this particular register and that especially
OTCs associated with yellow press papers (SUN, BILD) are “simple” as regards
lexical content, has to be qualified to a certain extent.
An area where the OTCs clearly diverged along the dimension “intended
audience” emerged in the keyness analysis. Both the English and the German
OTCs yielded some inner differentiation – the former as to a higher salience of
war-related metaphors in SUN, the latter as to a higher salience of dialectal and
jargon vocabulary in BILD. Given the quantitative evidence, it is highly unlikely
that this is a chance finding. Rather, it may be interpreted as an adaptation of
the SUN and BILD commentators to the alleged language use of their intended
readership. Whether this adaptation is deliberate or intuitive remains a matter
of speculation. Puns on players’ names and creative ad-hoc formations can be
found across all OTCs, however.
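Keyness is commonly quantified with the log-likelihood statistic, which compares the observed frequency of an item in two corpora with the frequencies expected if the item were equally likely in both. The sketch below assumes this measure; the function name and the counts in the usage lines are invented for illustration and do not come from the present data.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood keyness score for one item in two corpora.

    freq_a, freq_b: occurrences of the item in corpus A and corpus B
    size_a, size_b: total number of tokens in corpus A and corpus B
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the item were equally likely in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2.0 * score


if __name__ == "__main__":
    # Hypothetical item: 40 hits in one 30,000-word corpus, 10 in another.
    print(round(log_likelihood(40, 30000, 10, 30000), 2))
```

With one degree of freedom, scores above 3.84 correspond to p < 0.05, so higher scores indicate that a frequency difference is less likely to be due to chance.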
Discourse features represent a further area where differences and similarities
could be observed. On the one hand, the salience of football- and culture-related
intertextual references as identified by Chovanec (2008) for GUAR OTCs could
also be traced in the other OTCs considered, thus representing another uniting
feature. However, these references are most frequent in GUAR and SPON, sug-
gesting that both the creation of an in-group atmosphere and the often-related
entertainment aspect are more important in the quality-press related OTCs. On
the other hand, the present study confirmed and extended earlier research pos-
iting the staging of orality as a trademark feature of OTCs, showcasing creative
manipulation of restrictions of the written medium, while no cultural specificity
of this phenomenon can be claimed on the basis of the present data (see Perez-Sa-
bater et al. 2008: 256 for a comparison of English, Spanish and French).
Finally, with regard to the interaction between the textual commentary and
other elements of the OTCs, it was evident that all OTCs apart from GUAR rely on
an extended range of supplementary features (mainly tabular and iconographic),
while GUAR may compensate for this lack of factual information with a more
extensive description in the textual commentary. In addition, GUAR and, with
qualifications, SPON can be viewed as more “entertaining” or “fan-like”, while
SUN and BILD are more factual (although the latter pair uses more jargon). This
division reproduces Jucker’s (2010: 69) categorisation of OTCs.
By way of summary, we can posit that there are indeed many commonalities
transcending borders (set by cultural specificity and intended readership), but
there is also room for variability both within and across language boundaries.
This highlights the flexibility of the register despite the formal constraints of the
electronic medium.
5 Summary and conclusion

Above all, OTCs emerged from the analysis as a “webby” genre that has gained
prominence within the last decade as an immediate form of online journalism,
particularly suited to live coverage of sports events. Production circumstances
were established to be markedly different from those of traditional sports report-
age and it was shown that OTCs can be viewed as an amalgamation of different
journalistic, or, speaking more broadly, discursive styles (narration, description,
opinion, quasi-conversation, etc.; see further Biber and Egbert, this volume).
Some OTCs relied on an extended number of paratextual elements and the data
suggested a split picture as regards the potential influence of audience partici-
pation (both in terms of “web 2.0” applications and via other channels) on the
reporting. While two (SUN, BILD) did not take account of readers’ contributions,
SPON had a designated paratextual element (the “Live-Fanblock” containing
Tweets), where the audience could express their views as some kind of parallel
comment, and GUAR occupied an intermediate position as comments (usually
emails sent in by readers) were frequently quoted and referred to, albeit in a mediated
and filtered form. An overall comparison of OTCs and traditional forms of sports
reportage indicated that the former should be identified as a new and specific register.
At the same time, this showcased the “interweaving of old and new formats” as
posited by O’Keeffe (2006: 27) as one of the general properties of newly emerging
registers.
Turning to language-related aspects, the present study first showed by way of
a lexical and semantic analysis that OTCs do not fundamentally differ from other
types of football reportage in their use of technical vocabulary. Second, the explo-
ration of n-grams revealed the importance of position-related collocations and
furthermore of informal and slang vocabulary, while differences between the indi-
vidual OTCs, especially along the dimension “intended readership” were clearly
evident. In contrast, the consideration of discourse features showed a remarkable
overlap between the four OTCs, while intertextual references were found to be
most salient in OTCs with “entertainment” as a communicative function (GUAR
and SPON). However, there were some instances with limitations posed by the
electronic (written) format, in particular as regards the staging of orality promi-
nent in OTCs. While all OTCs shared a similar general structure, GUAR emerged
as “the odd one out”. It was the one using the most words but the fewest paratextual
elements, one potential explanation being that there the entertainment function is
strongest, while the other OTCs provided more factual information, supported
through tabular and iconographic elements. This highlighted the need to con-
sider the interaction between format and content and the communicative aim of
the individual OTCs as well as the tension between information and entertain-
ment emblematic of modern media discourse (cf. Fairclough 1995: 10).
No definitive answer could be given to the second guiding research question
as to whether OTCs can be seen as a cross-cultural register. Rather, OTCs emerged
as a highly diversified form of reportage. Formal constraints and the similar struc-
ture of the matches reported determined similarity to a certain extent. However,
the present analysis revealed (mostly quantitative) diversity and flexibility, both
across (e.g. as regards length of the coverage of the “home” team) and within
(e.g. as to reliance on informal and slang items) languages. I suggest this is again
mainly due to the communicative aim of the individual OTCs and adaptation
towards their intended audience.
For a future exploration, it would be desirable to obtain a better insight into
the receptive dimension,40 for instance in terms of eye-tracking experiments
establishing how fast users read the OTC text and which elements (statistics,
textual commentary, icons etc.) they focus on. From a linguistic point of view,
further areas worth considering in more detail are creative language use (see
example (60)) as well as metonymies (see examples (20) and (21) above) and met-
aphors (see example (61) for a musical metaphor; cf. also Burkhardt 2010; Küster
2010: 32; Lewandowski 2012).
(60) The German fans are ole-ing. (ger_gre_2206_guar)

(61) Martin Olsson setzt sich auf links mit einem tollen Solo gegen Walcott und Johnson
durch […] (swe_eng_1506_bild)41
40 This could also include a case study focusing on the linguistic properties and functions of the
“twitterese” mentioned above.
41 Translation: Martin Olsson prevails against Walcott and Johnson on the left with a great solo.
While the present study offered a select comparison of German and English
OTCs, an analysis including even more OTCs from other languages and intended
audiences may help to establish a more fine-grained typology of OTCs world-
wide, potentially also considering diachronic developments. In this connection,
it remains to be seen whether audience participation, found to be relatively
restricted in the present study, will play a more important role in the future and
whether further technological developments (e.g. in terms of an integration of TV
and OTC reportage) will have an impact on the style of reporting.
References
Bateman, John A. 2012. Multimodal corpus-based approaches. In Carol A. Chapelle (ed.), The
encyclopedia of applied linguistics, 3983–3991. Oxford: Wiley-Blackwell.
Brandt, Wolfgang & Regina Quentin. 1983. Zeitstruktur und Tempusgebrauch in
Fussballreportagen des Hörfunks [Temporal structure and tense use in radio football
reportage]. Marburg: Elwert.
Burkhardt, Armin. 2010. Abseits, Kipper, Tiqui-Taca: Zur Geschichte der Fußballsprache in
Deutschland [Offside, keeper, tiki-taka: The history of football language in Germany]. Der
Deutschunterricht 62(3). 2–16.
Chovanec, Jan. 2008. Enacting an imaginary community: Infotainment in on-line minute-
by-minute sports commentaries. In Eva Lavric, Gerhard Pisek, Andrew Skinner & Wolfgang
Stadler (eds.), The linguistics of football, 255–268. Tübingen: Narr.
Chovanec, Jan. 2009. ‘Call Doc Singh’: Textual structure and coherence in live text sports
commentaries. In Olga Dontcheva-Navratilova & Renata Povolná (eds.), Coherence and
cohesion in spoken and written discourse, 124–137. Newcastle: Cambridge Scholars.
Chovanec, Jan. 2010. Online discussion and interaction: The case of live text commentary. In
Leonard Shedletsky & Joan E. Aitken (eds.), Cases on online discussion and interaction:
Experiences and outcomes, 234–251. Hershey: IGI Global.
Chovanec, Jan. 2011. Humor in quasi-conversations: Constructing fun in online sports
journalism. In Marta Dynel (ed.), The pragmatics of humour across discourse domains,
Dürscheid, Christa. 1999. Zwischen Mündlichkeit und Schriftlichkeit: Die Kommunikation
im Internet [Between speech and writing: Communication on the Internet]. Papiere zur
Linguistik 60(1). 17–30.
Fairclough, Norman. 1995. Media discourse. London: Arnold.
Ferguson, Charles A. 1983. Sports announcer talk: Syntactic aspects of register variation.
Language in Society 12(2). 153–172.
Gerhardt, Cornelia. 2006. Moving closer to the audience: Watching football on television.
Revista Alicantina de Estudios Ingleses 19. 125–148.
Ghadessy, Mohsen. 1988. The language of written sports commentary: Soccer – a description.
In Mohsen Ghadessy (ed.), Registers of written English: Situational factors and linguistic
features, 17–51. London: Pinter.
Golebiowski, Adam. 2012. Wortverschmelzungen und Sportsprache: Zur Kreativität im
Wortbildungsbereich [Blends and the language of sport: Creativity in word formation]. In
Janusz Taborek, Artur Tworek & Lech Zielinski (eds.), Sprache und Fußball im Blickpunkt
linguistischer Forschung [Language and football in the view of linguistic analysis], 51–61.
Hamburg: Kovač.
Grieve, Jack, Douglas Biber, Eric Friginal & Tatjana Nekrasova. 2010. Variation among blogs: A
multi-dimensional analysis. In Alexander Mehler, Serge Sharoff & Marina Santini (eds.),
Genres on the web: Computational models and empirical studies, 303–322. Dordrecht:
Springer.
Hennig, Mathilde. 2000. Tempus und Temporalität in geschriebenen und gesprochenen Texten
[Tense and temporality in written and spoken texts]. Tübingen: Niemeyer.
Höke, Susanne. 2007. Sun vs. Bild: Boulevardpresse in Großbritannien und Deutschland [Sun
vs. Bild: Yellow press in Great Britain and Germany]. Saarbrücken: VDM.
Jucker, Andreas. 2005. News discourse: Mass media communication from the seventeenth
to the twenty-first century. In Janne Skaffari, Matti Peikola, Ruth Carroll, Risto Hiltunen
& Brita Warvik (eds.), Opening windows on texts and discourses of the past, 7–21.
Amsterdam: Benjamins.
Jucker, Andreas. 2006. Live text commentaries: Read about it while it happens. In Jannis
K. Androutsopoulos, Jens Runkehl, Peter Schlobinski & Torsten Siever (eds.), Neuere
Entwicklungen in der linguistischen Internetforschung [Recent developments in linguistic
internet research], 113–131. Hildesheim: Olms.
Jucker, Andreas. 2010. ‘Audacious, brilliant!! What a strike!’ Live text commentaries on the
internet as real-time narratives. In Christian R. Hoffmann (ed.), Narrative revisited: Telling
a story in the age of new media, 57–77. Amsterdam: Benjamins.
Krone, Maike. 2005. The language of football: A contrastive study of syntactic and semantic
specifics of verb usage in English and German match commentaries. Stuttgart: Ibidem.
Küster, Rainer. 2010. ‘Im Tabellenkeller brennt noch Licht’: Metaphern in der Fußballsprache
[At the bottom of the table there’s still some light: Metaphors in football language]. Der
Deutschunterricht 62(3). 26–37.
Levin, Magnus. 2008. ‘Hitting the back of the net just before the final whistle’: High-frequency
phrases in football reporting. In Eva Lavric, Gerhard Pisek, Andrew Skinner & Wolfgang
Stadler (eds.), The linguistics of football, 143–155. Tübingen: Narr.
Lewandowski, Marcin. 2012. Football is not only war: Non-violence conceptual metaphors in
English and Polish soccer language. In Janusz Taborek, Artur Tworek & Lech Zielinski
(eds.), Sprache und Fußball im Blickpunkt linguistischer Forschung [Language and football
in the view of linguistic analysis], 79–96. Hamburg: Kovač.
Müller, Torsten. 2007. Football, language and linguistics: Time-critical utterances in unplanned
spoken language, their structures and their relation to non-linguistic situations and
events. Tübingen: Narr.
Newsworks. 2013a. The Guardian. http://www.newsworks.org.uk/The-Guardian (accessed 20
April 2013).
Newsworks. 2013b. The Sun. http://www.newsworks.org.uk/The-Sun (accessed 20 April 2013).
O’Keeffe, Anne. 2006. Investigating media discourse. London: Routledge.
Perez-Sabater, Carmen, Gemma Pena-Martinez, Ed Turney & Begona Montero-Fleta. 2008. A
spoken genre gets written: Online football commentaries in English, French, and Spanish.
Written Communication 25(2). 235–261.
Press Gazette. 2013. UK national newspaper sales: Relatively strong performances from Sun
and Mirror. http://www.pressgazette.co.uk/uk-national-newspaper-sales-relatively-strong-performances-sun-and-mirror
(accessed 21 May 2013).
Rayson, Paul. 2008. From key words to key semantic domains. International Journal of Corpus
Santini, Marina, Alexander Mehler & Serge Sharoff. 2010. Riding the rough waves of genre on
the web: Concepts and research questions. In Alexander Mehler, Serge Sharoff & Marina
Santini (eds.), Genres on the web: Computational models and empirical studies, 3–30.
Dordrecht: Springer.
Schmidt, Thomas. 2007. The Kicktionary: A multilingual resource of the language of football.
In Georg Rehm, Andreas Witt & Lothar Lemnitzer (eds.), Data structures for linguistic
resources and applications, 189–196. Tübingen: Narr.
Siever, Torsten. 2011. Texte i. d. Enge: Sprachökonomische Reduktion in stark raumbegrenzten
Textsorten [Constricted texts: Language-economical reduction in heavily space-constrained
text types]. Frankfurt am Main: Lang.
Simons, Anton. 2011. Journalismus 2.0 [Journalism 2.0]. Konstanz: UVK.
Thurman, Neil & Anna Walters. 2013. Live blogging: Digital journalism’s pivotal platform. Digital
Journalism 1(1). 82–101.
Wells, Matt. 2011. How live blogging has transformed journalism: The benefits and the
drawbacks of the open-to-all digital format. http://www.guardian.co.uk/media/2011/mar/28/live-blogging-transforms-journalism
(accessed 13 April 2013).
Appendix
File name codes: guar = The Guardian; sun = The Sun; bild = Bild; spon = Der Spiegel
Match                   Match day   Commentators (if available)  Associated files
Germany – Portugal      09/06/2012  GUAR: N/A                    ger_por_0906_xx
                                    SUN: N/A
                                    BILD: N/A
                                    SPON: Christian Paul
Germany – Netherlands   13/06/2012  GUAR: N/A                    ger_ned_1306_xx
                                    SUN: N/A
                                    BILD: N/A
                                    SPON: Jan Reschke
Germany – Denmark       17/06/2012  GUAR: Ian McCourt            ger_den_1706_xx
                                    SUN: N/A
                                    BILD: N/A
                                    SPON: Mike Glindmeier
Germany – Greece        22/06/2012  GUAR: Rob Smyth              ger_gre_2206_xx
                                    SUN: N/A
                                    BILD: N/A
                                    SPON: Lukas Rilke
Germany – Italy         28/06/2012  GUAR: N/A                    ger_ita_2806_xx
                                    SUN: N/A
                                    BILD: N/A
France – England        11/06/2012  GUAR: Scott Murray           fra_eng_1106_xx
                                    SUN: N/A
                                    BILD: N/A
                                    SPON: Christian Paul
Sweden – England        15/06/2012  GUAR: Jacob Steinberg        swe_eng_1506_xx
                                    SUN: N/A
                                    BILD: N/A
                                    SPON: N/A
Ukraine – England       19/06/2012  GUAR: Barry Glendenning      ukr_eng_1906_xx
                                    SUN: N/A
                                    BILD: N/A
                                    SPON: N/A
Italy – England         24/06/2012  GUAR: N/A                    ita_eng_2406_xx
                                    SUN: N/A
                                    BILD: N/A
Javier Pérez-Guerra
Word order is in order here: A diachronic
register analysis of syntactic markedness
in English
Abstract: In line with multidimensional proposals under which registers can be
stylistically and/or situationally defined by paying attention to the frequency
of a selection of linguistic features, this study explores the connection between
syntactic markedness at the level of the clause and stylistic characterisation in
a number of registers in the history of English. In particular, this chapter inves-
tigates three syntactic constructions leading to syntactically marked clausal
designs which do not conform to subject-verb-complement word order: left dis-
location, topicalisation and subject-inversion/extraposition. The data, retrieved
from multi-register parsed corpora, show that the distribution of these construc-
tions correlates with the degree of stylistic specificity and conventionalisation of
the registers. In particular, those registers in which these constructions are par-
ticularly frequent feature more specific situational or stylistic choices related to
literacy or subject-/participant-involvement. As a matter of fact, out of the three
constructions, topicalisation has proved to have less radical consequences for the
syntax of the clause, and this correlates with its even distribution across registers.
1 Introduction1
1 I am grateful to the following institutions for generous financial support: the Spanish
Ministry of Economy and Competitiveness and the European Regional Development Fund (grant no.
FFI2013-44065-P), and the Autonomous Government of Galicia (grant no. GPC2014/060).
Javier Pérez-Guerra, University of Vigo
The linguistic analysis of registers/genres/text types in a language has always
been controversial, possibly because of the intangible status of such key concepts
(see Schubert, this volume). As Swales (1990: 33) points out when he refers
specifically to genres, “[t]he word [‘genre’] is highly attractive – even to the Parisian
timbre of its normal pronunciation – but extremely slippery”. A first terminological
remark seems thus in order here as regards the definition of ‘register’,
which constitutes the research topic in this study. Following, for example, Taavitsainen
(2001), who maintains that genres are based on “external evidence in the
context of culture” (140; my italics), where “external evidence” refers to the conventions
that have become institutionalised “so that they can function […] as ‘horizons
of expectation’ for readers to know what to expect and models of writing
for authors” (141), I will use ‘genre’ when I refer exclusively to the cultural and/
or social dimension of a given textual category. ‘Register’ will be used here with
a focus on the way in which the internal linguistic features of texts are codified
in a given text or category of texts, which matches Taavitsainen’s (2001: 141) term
‘text type’. Even though text types and genres commonly go hand in hand since
the linguistic characterisation of a textual category prototypically leads to the
latter’s conventionalisation and specialisation in fulfilling a certain discursive,
communicative or social function, Taavitsainen herself recalls Fairclough’s (1992:
126) claim that a “genre [on occasions] implies not only a particular text type, but
also particular processes of producing, distributing and consuming texts”, which
broadens the notion of genre and covers elements which lie beyond the scope of
this chapter.
Such lack of definition of concepts such as register, genre or text type has led
to multi-faceted studies in this area, adopting a number of different theoretical
frameworks. On some occasions, linguists have addressed the linguistic analy-
sis of registers by focusing on the core or prototypical communicative purposes
attributed to these in (quite often traditional) stylistics. For example, Swales
(1990: 46) notes that “[t]he principal criterion that turns a collection of commu-
nicative events into a [register] is some shared set of communicative purposes”.
In Halliday’s (1978: 122) Systemic Functional Grammar, registers (genres, in their
terminology) are analysed in terms of three variables: their content (or ‘field’), the
participants (‘tenor’) and the channel of communication (‘mode’), that is, three
dimensions which focus on the communicative elements and purposes involved
in a given register. On other occasions, in an approach that will be used in the
present chapter, the study of registers has been addressed through focusing on
empirically-observable stylometric features (e.g. type-token ratios, length of syllables,
words, sentences, paragraphs) which are themselves said to reflect
higher-level concepts such as lexical or syntactic complexity, lexical richness
and ornamentation, etc. In Biber and Conrad (2009) the two basic approaches
just summarised, which I refer to, respectively, as the ‘communicative’ and the
‘language-based’ views, are embodied in a taxonomy which identifies three
perspectives on text varieties (see, for a brief overview, their Table 1.1): (i) style,
which analyses aesthetic and authorial preferences in a given text or group of
texts; (ii) genre, which focuses on the conventional linguistic devices specific to
a text variety (e.g. ‘genre markers’ such as Dear Sir in a letter); and (iii) register,
which, as already pointed out, deals with the linguistic characteristics common
within a text variety ‒ and also with the situation of use of the variety as will be
argued later. The taxonomy is described in more detail in Dorgeloh (this volume;
Section 2 in particular) and Schubert (this volume).
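As a rough illustration of the language-based view, stylometric features of the kind mentioned above can be computed with a few lines of Python. The tokeniser and sentence splitter below are deliberately crude assumptions (regular expressions rather than a linguistic tagger), and the function name is hypothetical; the sketch is not the procedure used in any of the studies cited.

```python
import re


def stylometric_profile(text):
    """Compute a few simple stylometric measures for a text."""
    # Crude sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Crude tokenisation: runs of letters and apostrophes, lower-cased.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens or not sentences:
        raise ValueError("text contains no countable words")
    types = set(tokens)
    return {
        "tokens": len(tokens),
        "types": len(types),
        "type_token_ratio": len(types) / len(tokens),
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
        "mean_sentence_length": len(tokens) / len(sentences),
    }


if __name__ == "__main__":
    print(stylometric_profile("The cat sat. The dog sat too!"))
```

Because type-token ratios fall as texts get longer, comparisons of this measure are only meaningful for samples of similar size.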
So far I have equated register with the language-based characterisation of
a given textual category. In this scenario, a further dimension of register must
be brought into play. In line with previous proposals couched in the multidimensional
tradition, Biber and Conrad (2009: 6) claim that the linguistic characteristics
of the textual categories, materialised by means of pervasive and frequent
linguistic features, are “well suited to the purposes and situational context of the
register”. That said, this chapter adheres to such a two-fold view of text varie-
ties, that is, both language-based and situational, and, within a register-centred
approach (as suggested in, for example, Biber 1995a: 1), focuses on the study of a
number of texts in an attempt to explore register variation over the course of the
history of English. On the one hand, I will describe a number of textual categories
by exploring their dependency on a list of structural features, thus adhering to
what is commonly understood by ‘text type’, that is, “grouping of texts that are
similar in their linguistic form” (Biber 1988: 170) or, in other words, codifications
of linguistic features (Taavitsainen 2001: 141). On the other hand, I will connect
the language-based characteristics of the texts with their situational interpretation,
thus accepting, for example, Virtanen’s (2010: 57) claim that such linguistic
features “clearly relate to the form that [discourse functions] will take through
aggregates of linguistic exponents of the particular text strategies that are asso-
ciated with them”. The situational interpretation (better said, the functional
interpretation) of the linguistic characteristics of a given text type will lead to the
latter’s status as a ‘register’, in Biber’s terminology. This approach departs from,
for example, Dorgeloh and Wanner’s (2010: 10) terminological account, summa-
rised in Figure 1, where ‘register’ is used as a cover term for text type, genre and
style, and sticks to a twofold characterisation of register which mainly comprises
both text type and genre in Dorgeloh and Wanner’s sense.
Figure 1: Register, text type, genre and style in Dorgeloh and Wanner (2010)
This chapter will focus on register variation and, more specifically, on the rel-
evance of syntax for this issue. In this respect, Dorgeloh and Wanner (2010)
observe that register is “language variation beyond the limits of semantic equivalence,
which is why syntax […] provides a promising area of study” (8) and that
“[i]t is form, and here morphosyntactic form in particular, that constitutes ‘a prior
condition for reasoning about [register]’” (9). In this scenario, under the philos-
ophy of Biber’s (1988, 1995a) groundbreaking multifactorial multidimensional
model, this study will combine the main approaches to the analysis of registers
already mentioned, that is, communicative and more language-based (syntactic)
standpoints, in that findings from the latter will be associated with a correspond-
ing functional interpretation (or dimensional interpretation, as Biber puts it).
In other words, by investigating the spread of a number of objectively identified
linguistic constructions in a selection of registers, and by interpreting the statis-
tical results of (co-)occurrence, this study will not only shed some light on the
functional interpretation of registers but also detect diachronic variation across
them. Furthermore, this chapter will suggest some kind of link between syntactic
markedness and the degree of (functional) conventionalisation or specialisation
of registers.
This paper, then, focuses on the analysis of registers in English while also
describing variation in the recent history of the language. It also aims to con-
sider the application of some of the assumptions of Biber’s model to syntactic
strategies at a supra-phrasal level. In Section 2, I will very briefly summarise the
features of the multidimensional model which inspired the study, and then present
the case study and its specific methodology. The results are discussed in
Section 3. Section 4 offers a summary of the investigation plus some suggestions
for further avenues for research.
2 The case study

Biber’s model, which has inspired this study, is based on three theoretical
assumptions, summarised in Schubert (this volume) and recapped here only for
introductory purposes: (i) the distinctive characteristics of a register are derived
from inherent tendencies affecting the statistical productivity of a number of lin-
guistic features; (ii) the patterns of these (co-)occurring features portray under-
lying dimensions of variation on which texts differ significantly; and (iii) these
dimensions can be interpreted in terms of the social, situational and text-func-
tional roles that their constitutive features have been found to play in previous
research. As summarised in Biber (1995b), the sixty-seven features used in the
first applications of the model belonged to different fields of linguistic analysis:
syntactic (causal subordination, coordination, deletion of complementiser that,
wh subject relativisers, pied-piped prepositions, stranded prepositions, particip-
ial adverbial clauses), grammatical (morphosyntactic categories such as nouns,
adjectives, prepositions, demonstratives) and lexical categories (hedges, ampli-
fiers, emphatics), as well as other metrics such as word length and type/token
ratio. As noted above, the factors or families of features lead to the dimensions
which are interpreted situationally.
Biber and Conrad (2009: 51) established the pillars of the methodology: the
need for a comparative approach, for quantitative analysis and for a representa-
tive sample. First, as regards the comparative approach, this study investigates
three syntactic constructions, described in Section 2.1, by assessing two variables
which will allow for comparison and contrast: diachrony and register. Second,
the need for quantitative analysis is met by the empirical
methodology described in Section 2.2. Third, the whole survey is driven by data
retrieved from multi-register balanced corpora, as a means of attaining empirical
representativeness and significance.
2.1 The linguistic variables
The study outlined in this chapter reports on a construction-driven analysis of
historical registers in English by looking at supra-phrasal variables or features
which have not thus far been explored in the literature. As pointed out in Section
1, early studies by Biber and his colleagues – and practically all subsequent
studies derived from these – are based on counts of lexical features. In fact,
even those syntactic features which operate at the clause- or sentence-level were
singled out by computing the frequency of lexical items such as complementiser
that, specific (causal) conjunctions, members of the closed set of English
prepositions, relativisers which, who, etc. In this chapter, and this makes this study
particularly innovative, I concentrate on syntactic supra-phrasal variables, spe-
cifically word-order phenomena, which cannot be determined by focusing on the
occurrence ratios of specific lexical elements. Following the multidimensional
model, these will be given so-called social or functional interpretations which
will pave the way for the detection of diachronic variation in English as far as
sentence linearisation is concerned.
As regards the variables to be analysed here, I have focused on syntactic
markedness at the level of the clause. From (at least) a statistical standpoint,
the default organisational schema of a declarative clause in English is subject-
verb-(complement), this being the most versatile design of the clause from the
point of view of information structure and processing. Deviation from such a
schema implies some degree of markedness. In particular, in what follows I will
focus on three syntactic strategies which, first, lead to marked designs as far as
word order is concerned and, second, involve elements other than the subjects in
sentence-initial position. Since this methodology aims to determine not only strictly
linguistic but also social or situational variation in the language, I will follow
Virtanen (2004: 12) in her claim that “the sentence-initial slot itself constitutes a
rich source of discourse meanings precisely because of its cognitive relevance for
our processing capacities and memory constraints”. The three constructions are:
(i) Topicalisation (TOP), in which a (marked) constituent is in sentence-ini-
tial position ‒ example (1) below illustrates the topicalisation of the that-clause
object that I had received such from Edward,
(ii) Left dislocation (LFD), with a (marked) non-argument constituent in sen-
tence-initial position ‒ in (2), the constituent he that thynkethe it a harde thynge
to agre to the conclusion is a left-dislocated noun phrase which corefers with the
pronominal object hym in the ensuing main clause,
(iii) What I call other ‘subject-last’ strategies (SUBJ-LAST), which contain
(marked) non-subject constituents in sentence-initial pre-verbal position. The
SUBJ-LAST strategy comprises basically those examples of subject-verb inver-
sion and subject-extraposition ‒ example (3) below illustrates subject-verb inver-
sion, with the subject complement very great in sentence-initial position and the
subject following the verb; example (4), in which the that-clause that for x. yeres
then next folowyng sevãll Comyssions of Sewers shuld be made to dyv~s p~sones
functions as the (logical) subject of the sentence and occurs in sentence-final
position, involving the insertion of expletive it in sentence-initial (preverbal)
position, exemplifies subject-extraposition.2
(1) [That I had received such from Edward]i also I need not mention ∆i (Austen-180X,187.621)
[TOP]
(2) […] but [he that thynkethe it a harde thynge to agre to the conclusion,]i it behoueth
hymi to shew eyther that some false thynge hath gone before, (BOETHCO-E1-H,99.610)
[LFD]
(3) […] and very great was [my pleasure in going over the house and grounds]Subject.
(Austen-180X,168.182) [SUBJ-LAST, subject inversion]
(4) yt was enacted ordeigned and graunted by auctorite of the same p~liament, [that
for x. yeres then next folowyng sevãll Comyssions of Sewers shuld be made to dyv~s
p~sones]Subject, (Statutes(II):524) [SUBJ-LAST, subject extraposition]
As already pointed out, the strategies TOP, LFD and the so-called SUBJ-LAST
constructions investigated here have been chosen because they are syntactically
marked since they do not comply with the default subject-verb(-complement)
design. In particular, their markedness is basically due to the location occupied
by the subjects, which are not clause-initial when constituents are topicalised
(TOP) or left-dislocated (LFD), when verbs and subjects swap positions (sub-
ject-verb inversion, a type of SUBJ-LAST) or when the subjects are placed in
clause-final position (subject-extraposition, another instance of the SUBJ-LAST
construction). Since subject placement is the trigger for these constructions, in
line with the above-mentioned consequences which the unmarked placement of
the subject has for the processing and interpretation of clauses and sentences, in
what follows I will provide a very brief overview of the informative and/or com-
municative properties of the strategies TOP, LFD and SUBJ-LAST.
First, TOP merits attention in register analysis because this syntactic strat-
egy involves a specific not only syntactic but also informative arrangement of the
clause. Following Virtanen (2004: 80–82) [my italics],
Starting points are assumed to be light, small in size, and consist of given information. The
reader’s main inferencing effort is expected to take place later in the sentence […]. Secondly,
elements placed at the outset of a sentence also help readers anticipate what is to come as
they pinpoint what the sentence is about and how it relates to the discourse topic (…).
Furthermore, it is occasionally profitable to start with what is regarded as ‘crucial information’
[…] Sentence-initial adverbials […] tend to form chains of text-strategic markers which have
two basic functions in the discourse. They help create coherence and at the same time they
signal text segmentation.
2 TOP, LFD and the constructions within the frame of the SUBJ-LAST strategy have been
approached from different perspectives in, for example, Virtanen’s (2010) qualitative scrutiny of
sentence openers in narrative texts and in Kreyer’s (2010) paper on sentence-initial locatives in
inversion constructions, in which a qualitative perspective on the description of the so-called
‘immediate-observer effect’ function is adopted.
Virtanen thus summarises the informative function of TOP, that is, introduc-
ing constituents which do not convey given information in a position which is
reserved for given elements according to the given-new principle. This analysis
of TOP is in keeping with Prince (1981: 128), who highlights the salient status
of topicalised constituents. Prince claims that TOP implies “inference on the
part of the hearer that the entity represented by the initial NP stands in a salient
partially-ordered set relation to some entity or entities already evoked in the dis-
course-model”. Furthermore, she contends that “if the entity evoked by the left-
most NP represents an element of some salient set, make the set-membership
explicit”.
Second, the discourse functions which have been attributed to LFD in the
literature can be reduced to two: (i) a ‘simplifying’ function, according to which a
constituent conveying discourse-new information can be placed in sentence-in-
itial position, and (ii) a ‘poset’ function. As for the simplifying function, Prince
(1997: 138–139) contends that LFD can “simplify discourse processing by removing
a Discourse-new entity from a position in the clause which favors Discourse-old
entities, replacing it with a Discourse-old entity (i.e. a pronoun)”. In the same
vein, Gundel (1985) and Geluykens (1992) claim that LFD introduces a new topic
into discourse. On the other hand, Prince (1997: 138–139) maintains that sen-
tences containing left-dislocated phrases “trigger an inference that the entity rep-
resented by the initial NP stands in a salient partially-ordered set relation to some
entities already in the discourse-model”, and that this favours the so-called poset
function. In other words, the left-dislocated constituent resumes a number of
referents previously evoked in the sentence by introducing a new expression which
activates earlier (thus, informatively given or old) referents. In short,
like TOP, LFD implies the placement of a new constituent in sentence-initial posi-
tion, the main difference between TOP and LFD being that the former selects an
extralinguistic referent already evoked in discourse and marks it as informatively
salient, whereas LFD constituents seldom refer to topics which have already been
introduced in the discourse.
Third, as already stated, the SUBJ-LAST constructions involve examples of
subject-verb inversion and subject-extraposition, illustrated, respectively, in (3)
and (4) above. As regards subject-verb inversion, it is commonly acknowledged
in the literature (e.g. Green 1980: 583; Birner 1994: 241; Dorgeloh 1997: 46) that
the informative principle given-new is not at work in subject-verb inversion, since
the preverbal constituent conveys information which is salient in the discourse,
whereas the subject is informatively anti-prominent or, in other words,
materialises referents which have already been evoked. In fact, Takahashi (1992: 138)
contends that subject-verb inversion fulfils a “Subtopically-Presentational-Focus-
emphasizing function”, that is, it accommodates (discourse-new) presenta-
tional constituents in sentence-initial position and relegates to sentence-final or
postverbal position discourse-given grammatical subjects. Bolinger (1992: 294)
emphasises the focusing or presentational effect of inversion when he says that
it locates the informatively non-prominent subject almost physically ‘on-stage’.
The second SUBJ-LAST construction considered in this chapter is subject-
extraposition. Its function is claimed to be different from that of subject-verb
inversion (see, for instance, McCawley 1988), since, as a newness device, sub-
ject-extraposition accommodates informatively new subjects in final position,
thus keeping track of given-new. However, the empirical analysis of extraposed
subjects from Late Middle to Present-Day English in Pérez-Guerra (2005: 349–350)
shows that information structure is not a decisive factor in explaining subject-
extraposition since 60 to 70 percent of the extraposed subjects in this study are
informatively referring and the information conveyed by sentence-medial con-
stituents (mostly subject complements) in the examples of subject-extraposition
is less referring in nature than that carried by the extraposed subjects.3 In conse-
quence, it can be concluded that both subject-verb inversion and subject-extra-
position are mostly new-given constructions and can be subsumed under SUBJ-
LAST in the present approach.
This section has provided a basic characterisation of LFD, TOP and SUBJ-
LAST in terms of information structure. The syntactic marked organisation of
these constructions correlates with their deviation from entrenched informative
rules such as given-new. In short, informatively new and/or salient constituents
are placed sentence-initially in LFD, TOP and SUBJ-LAST structures, where one
would expect elements conveying given information, and informatively given
subjects are preferred in postverbal and/or final position in the SUBJ-LAST con-
struction type.
3 The data in Pérez-Guerra (2005: 350) confirm that the determinant of subject-extraposition
is not end-focus but end-weight. The strategy of extraposition is, then, redistributional in the
sense that its main role is to place long clausal subjects in final position and thus preserve the
unmarked subject-verb(-complement) pattern from having non-prototypical material in sen-
tence-initial position.
2.2 The data and the methodology
The data for the present study were retrieved from the following corpora:
– the Penn-Helsinki Parsed Corpus of Middle English, second edition (1150–
1500; henceforth PPCME2; Kroch and Taylor 2000),
– the Penn-Helsinki Parsed Corpus of Early Modern English (1500–1710;
PPCEME; Kroch et al. 2004), and
– the Penn Parsed Corpus of Modern British English (1700–1914; PPCMBE;
Kroch et al. 2010).
The periods to be investigated are Middle (ME), Early Modern (EModE) and Late
Modern English (LModE), that is, the periods following the initiation of the
process of word-order syntacticisation or fixation in English around the default
pattern subject-verb(-complement) in declarative clauses. These corpora were
selected because, first, they are multi-register and, as noted above, this accom-
modates the need for representativeness. Second, they are parsed corpora follow-
ing (almost) identical parsing conventions. These make use of part-of-speech and
syntactic tagsets based on what we might call a shallow version of Principles-
and-Parameters. To give an example from the corpora, (5’) gives a graphical
adaptation of the parsed version of the sentence in (5) from PPCMBE:
(5) a serious cheerfulness; that is the right mood in this as in all cases. (CARLYLE-
1835,2,278.374)
(5’) ( (1 IP-MAT (2 NP-LFD (3 D a) (5 ADJ serious) (7 N cheerfulness))
                 (9 , ;)
                 (11 NP-SBJ-RSP (12 D that))
                 (14 BEP is)
                 (16 NP-OB1 (17 D the) (19 ADJ right) (21 N mood))
                 (23 PP (24 P in)
                        (26 NP (27 D this)
                               (29 PP (30 P as)
                                      (32 PP (33 P in)
                                             (35 NP (36 Q all) (38 NS cases))))))
                 (40 . .)))
(5’) includes part-of-speech tagging (e.g. lexical morphosyntactic categories such
as D(eterminer), ADJ(ective), N(oun) or P(reposition)) and syntactic annotation
(e.g. phrasal categories such as IP for Inf(lection) phrase ‒ basically correspond-
ing in the Principles-and-Parameters model to the category clause ‒, NP for noun
phrase and PP for prepositional phrase, as well as functional labels such as OB1
for object, LFD for left-dislocated constituent and RSP for resumptive, that is, the
proform which corefers in the clause with the left-dislocated material).
LFD is parsed as such in the corpora, which means that the data can be
retrieved automatically by means of specific software. In this case, the raw empir-
ical results of the search had to undergo extensive manual revision. Thus LFD was
retrieved by means of the (CorpusSearch) query in (6), which identifies clauses
(or IPs) dominating left-dislocated constituents.
(6) node: IP*
    query: (IP* Doms *-LFD)
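The logic of the query in (6) can be illustrated with a short Python sketch. This is not CorpusSearch itself but a hand-written approximation: it reads the bracketed parse in (5’), with the outer bracket closed, and reports every IP node which dominates, at any depth, a node whose label ends in -LFD. The function names (tokenize, parse, descendants, clauses_with_lfd) and the ROOT label are assumptions introduced here for illustration only.

```python
import re
from fnmatch import fnmatchcase

# Bracketed parse of sentence (5), following (5') with the outer bracket closed.
PARSE = """
( (1 IP-MAT (2 NP-LFD (3 D a) (5 ADJ serious) (7 N cheerfulness))
            (9 , ;)
            (11 NP-SBJ-RSP (12 D that))
            (14 BEP is)
            (16 NP-OB1 (17 D the) (19 ADJ right) (21 N mood))
            (23 PP (24 P in)
                   (26 NP (27 D this)
                          (29 PP (30 P as)
                                 (32 PP (33 P in)
                                        (35 NP (36 Q all) (38 NS cases))))))
            (40 . .)))
"""


def tokenize(text):
    """Split a bracketed parse into '(', ')' and atom tokens."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(tokens, pos=0):
    """Parse the node opening at tokens[pos]; return (node, next position)."""
    atoms, children = [], []
    pos += 1  # skip the opening bracket
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
            children.append(child)
        else:
            atoms.append(tokens[pos])
            pos += 1
    if atoms and atoms[0].isdigit():
        atoms = atoms[1:]  # drop the numeric node index
    label = atoms[0] if atoms else "ROOT"  # hypothetical label for the wrapper
    return {"label": label, "children": children}, pos + 1


def descendants(node):
    """Yield every node dominated by the given node, at any depth."""
    for child in node["children"]:
        yield child
        yield from descendants(child)


def clauses_with_lfd(root):
    """Labels of IP nodes dominating a node whose label ends in -LFD."""
    return [
        node["label"]
        for node in [root, *descendants(root)]
        if fnmatchcase(node["label"], "IP*")
        and any(fnmatchcase(d["label"], "*-LFD") for d in descendants(node))
    ]


tree, _ = parse(tokenize(PARSE))
print(clauses_with_lfd(tree))
```

The sketch only mirrors the two wildcard patterns of (6); the manual revision of the hits described above is not modelled.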
A very small number of examples of LFD in my database are not nominal,4 as is
the case in (7) below, which contains a left-dislocated prepositional phrase and a
resumptive pronoun governed by a preposition in the main clause:
(7) But of the tree of the knowledge of good and euill, thou shalt not eate of it: (AUTHOLD-
E2-H,II,1G.155)
By contrast, many of the examples parsed as LFD in the corpora which contain
non-(pro)nominal resumptives have not been considered in this study. Exam-
ples of such constructions are given in (8) to (10), in which the resumptives are,
respectively, then, yet and so:
(8) […] but if it worke vpon it selfe, as the Spider worketh his webbe, then it is endlesse,
(BACON-E2-H,1,20R.49)
(9) […] and though he suffer’d only the name of a slave, and had nothing of the toil and
labour of one, yet that was sufficient to render him uneasy; (BEHN-E3-H,193.231)
(10) And as these Languages ought to be well understood, so they shou’d be learn’d in as
short a Time as may be. (ANON-1711,3.6)
4 A few examples from the database contain TOP and LFD of that-clauses. As regards LFD, since
such that-clauses are resumed by a (pro)nominal copy, they fit the concept of LFD as established
in this study. An example of a left-dislocated that-clause is given in (i):
(i) [That false Locks as they call them of some Hair, being by curling or otherwise brought to a
certain degree of driness, or of stiffness, will be attracted by the flesh of some persons, or
seem to apply themselves to it, as Hair is wont to do to Amber or Jet excited by rubbing.]i Of
thisi I had a Proof in such Locks worn by two very Fair Ladies that you know. (BOYLE-E3-
H,27E.93)
As regards TOP, which was not specifically tagged in the corpora used here, the
CorpusSearch queries in (11) and (12) were used to retrieve examples, respectively,
of topicalised complements (more specifically, nominal objects, subject predicatives5
and prepositional/adverbial complements6 occurring before nominal subjects) and
adjuncts (prepositional and adverb phrases preceding nominal subjects). As already
pointed out, some of the examples retrieved by the queries had to be excluded
manually, since they were not correct instantiations of TOP.
(11) node: IP-MAT*
     query: (IP-MAT* iDoms NP-OB*|NP-SPR)
            AND (IP-MAT* iDoms NP-SBJ*)
            AND (NP-OB*|NP-SPR precedes NP-SBJ*)
(12) node: IP-MAT*
     query: (IP-MAT* iDoms PP*|ADVP*)
            AND (IP-MAT* iDoms NP-SBJ*)
            AND (PP*|ADVP* precedes NP-SBJ*)
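The three conditions shared by (11) and (12) amount to a linear-order check on the immediate constituents of a matrix clause. A minimal Python sketch of that check follows, assuming that each clause has already been reduced to the ordered list of labels of its immediate constituents. The label sequences in the usage lines are invented for illustration and are not taken from the corpora, and the helper names are hypothetical.

```python
from fnmatch import fnmatchcase

QUERY_11 = ("NP-OB*", "NP-SPR")  # fronted nominal patterns named in (11)
QUERY_12 = ("PP*", "ADVP*")      # fronted PP/ADVP patterns named in (12)


def first_match(labels, patterns):
    """Index of the first label matching any wildcard pattern, else None."""
    for index, label in enumerate(labels):
        if any(fnmatchcase(label, pattern) for pattern in patterns):
            return index
    return None


def fronted_before_subject(labels, patterns):
    """True if a constituent matching `patterns` precedes the first subject NP."""
    front = first_match(labels, patterns)
    subject = first_match(labels, ("NP-SBJ*",))
    return front is not None and subject is not None and front < subject


# Hypothetical immediate-constituent sequences of three matrix clauses.
print(fronted_before_subject(["NP-OB1", "NP-SBJ", "MD", "VB"], QUERY_11))
print(fronted_before_subject(["NP-SBJ", "VBD", "NP-OB1"], QUERY_11))
print(fronted_before_subject(["PP", "NP-SBJ", "VBP", "NP-OB1"], QUERY_12))
```

A check of this kind overgenerates in the same way as the queries themselves, which is why the hits still require manual inspection.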
Finally, with respect to SUBJ-LAST, the CorpusSearch query in (13) retrieved
matrix IPs or clauses containing at least the following two immediate
constituents: sentence-final noun phrases functioning as subjects and pronominal
(expletive) subjects.
(13) node: IP-MAT*
     query: (IP-MAT* iDomsLast NP-SBJ)
            AND (NP-SBJ iDoms !PRO)
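The two conditions of (13) as written can be sketched in Python along the following lines. The sketch assumes that each matrix clause is given as an ordered list of (label, daughter labels) pairs and that punctuation nodes are disregarded when the last constituent is identified; both the data layout and the punctuation set are assumptions made here, and the example clauses use invented labels.

```python
PUNCTUATION = {",", "."}  # assumed to be skipped when locating the last node


def is_subj_last(constituents):
    """Check the two conditions of query (13) on one matrix clause.

    `constituents` is an ordered list of (label, daughter_labels) pairs for
    the immediate constituents of the clause.
    """
    content = [c for c in constituents if c[0] not in PUNCTUATION]
    if not content:
        return False
    label, daughters = content[-1]
    # Condition 1: the last immediate constituent is the subject NP.
    # Condition 2: that NP immediately dominates something other than PRO.
    return label == "NP-SBJ" and any(d != "PRO" for d in daughters)


# Hypothetical clause modelled on example (3): complement, verb, full subject.
inverted = [("ADJP", ["ADV", "ADJ"]), ("BED", []),
            ("NP-SBJ", ["PRO$", "N", "PP"]), (".", [])]
# Hypothetical clause with a postverbal pronominal subject.
pronominal = [("ADVP", ["ADV"]), ("VBD", []), ("NP-SBJ", ["PRO"])]

print(is_subj_last(inverted), is_subj_last(pronominal))
```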
Table 1 provides the raw figures of the distribution of the three constructions
under analysis (the TOP data in Table 1 include only topicalised complements
for reasons which will be explained below). Figure 2 sets out the frequencies for
LModE normalised to 1,000 clauses (or IPs):
5 An (archaic) illustration of a clause introduced by a topicalised object predicative is Male and
female created he them (ERV-OLD-1885,1,20G.66).
6 My database includes only a small number of examples of topicalised prepositional comple-
ments (in (i)) and adverbial complements (in (ii)):
(i) To them may be applied what St. James says on a like occasion (BURTON-1762,2,5.116)
(ii) In the inward Frame the various Passions, Appetites, Affections, stand in different Respects
to each other. (BUTLER-1726,235.69)
Table 1: Totals of LFD, TOP and SUBJ-LAST constructions from ME to LModE
Corpus     LFD     TOP     SUBJ-LAST   Clauses
PPCME2     1,638   1,878   2,989        74,092
PPCEME       575     359     611        34,896
PPCMBE       369     352     677        60,100
Total      2,582   2,589   4,277       169,088
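The normalisation to 1,000 clauses (or IPs) can be made explicit with the raw totals in Table 1. The following Python sketch is illustrative only: the dictionary layout and the function names are assumptions introduced here, and the figures it produces are pooled rates per corpus.

```python
# Raw totals copied from Table 1; the layout is an illustrative choice.
TABLE_1 = {
    "PPCME2": {"LFD": 1638, "TOP": 1878, "SUBJ-LAST": 2989, "clauses": 74092},
    "PPCEME": {"LFD": 575, "TOP": 359, "SUBJ-LAST": 611, "clauses": 34896},
    "PPCMBE": {"LFD": 369, "TOP": 352, "SUBJ-LAST": 677, "clauses": 60100},
}


def per_thousand(count, clauses):
    """Frequency of a construction per 1,000 clauses (IPs)."""
    return 1000 * count / clauses


def normalise(table):
    """Normalise every construction count in the table to 1,000 clauses."""
    return {
        corpus: {
            name: per_thousand(value, row["clauses"])
            for name, value in row.items()
            if name != "clauses"
        }
        for corpus, row in table.items()
    }


for corpus, rates in normalise(TABLE_1).items():
    print(corpus, {name: round(rate, 2) for name, rate in rates.items()})
```

Pooled rates of this kind need not coincide with the Mean rows of Tables 2 to 4, which are computed over registers rather than over whole corpora.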
Figure 2: Normalised frequencies of LFD, TOP and SUBJ-LAST constructions in LModE
Since, as Figure 2 shows, the frequencies of topicalised adjuncts (TOP_adj), as in
(14) below, and of complements (TOP_compl), in (1) above, differ greatly, I have
opted for focusing exclusively on topicalised complements, whose proportion
is closer and thus comparable to that of the LFD and the SUBJ-LAST construc-
tions. In this vein, since the criterion for the distinction between complement
and adjunct is syntactic (and semantic) selection by the verb, in what follows I
will consider only those examples of topicalised constituents which are subcate-
gorised by the verb (e.g. objects, prepositional complements, adverbial comple-
ments, predicative complements).
(14) [After that a childe is come to seuen yeres of age,]Adjunct I holde it expedient that he be
taken from the company of women (ELYOT-E1-H,23.27)
The proportions of LFD, TOP and SUBJ-LAST were analysed in all the registers
in the corpora, namely Biography, Diary, Drama, Education, Fiction, Handbook,
History, Law, Letters, Philosophy, Science, Sermon, Religious treatises, Trave-
logue, Trials and Romance. Due to their archaic style and clausal organisation, I
did not include Bible texts. Also, given that comparison with other Fiction texts
in the later periods is impracticable, the Fiction material in ME was not
analysed. Following Culpeper and Kytö’s (2010: 16–18) typology of registers, those
listed above can be argued to provide an overall view of the English language in
its recent history: (i) writing-related registers such as Science, Law, Education,
Religious treatises, that is, registers which are primarily attested in the written
form; (ii) speech-purposed registers, designed to be articulated orally (either read
out or performed), like Drama and Sermons; (iii) speech-like texts in the Diaries,
Letters and Biographies, which contain features of “communicative immediacy”
(Culpeper and Kytö 2010: 17); and (iv) speech-based registers, based on actual
real-life speech events, here illustrated by the Trials.
The normalised frequencies of the three constructions in all the registers are
given in Tables 2, 3 and 4 for ME, EModE and LModE respectively.
Table 2: Normalised frequencies (/1,000 IPs) of LFD, TOP and SUBJ-LAST in ME
Register            LFD     TOP     SUBJ-LAST
Biography           15.11   34.62    41.93
Handbook            16.65   11.10    19.26
History              9.25   13.26    36.30
Law                 30.70   38.28    17.54
Philosophy          44.18   16.83    21.71
Religious treat.    29.32   31.95    35.12
Romance              5.76   17.10   109.40
Sermons             29.05   31.57    23.38
Travelogue          14.79   15.09    72.72
Mean                21.65   23.31    41.93

Table 3: Normalised frequencies (/1,000 IPs) of LFD, TOP and SUBJ-LAST in EModE
Register            LFD     TOP     SUBJ-LAST
Biography           30.33   13.31    19.22
Diary                4.07    5.63    25.42
Drama                4.18    8.12    19.77
Education           24.29   10.05     9.12
Fiction              8.77   10.96    59.90
Handbook            33.44    7.17     5.30
History             19.55   17.46    21.03
Law                 91.65    8.15     3.64
Letters             14.65    8.99     4.50
Philosophy          26.20   16.16     3.50
Science             41.50    9.94    16.89
Sermon              32.95    6.71    18.17
Travelogue           6.47    5.55    29.29
Trials               6.28    4.63    12.42
Mean                24.59    9.49    17.73
Table 4: Normalised frequencies (/1,000 IPs) of LFD, TOP and SUBJ-LAST in LModE
Register            LFD     TOP     SUBJ-LAST
Biography            4.34    3.67     5.67
Diary                1.61    5.37     5.55
Drama                1.84    2.61    15.96
Education            8.97    5.12     7.04
Fiction              5.99    5.99    54.17
Handbook             6.87    5.62     5.31
History              5.29    4.70    15.88
Law                 11.94    3.47    13.86
Letters              2.42    3.70     4.12
Philosophy          16.27   15.18     5.42
Science              3.49    2.33     3.72
Sermon              29.07   10.65    13.10
Travelogue           1.13    4.75     9.73
Trials               1.85    4.51     0.21
Mean                 7.22    5.55    11.41
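The Mean rows in Tables 2 to 4 are consistent with an unweighted average over the registers listed in each table. The following Python sketch checks this for the ME figures in Table 2; the variable and function names are assumptions introduced here, and the same check could be run on the other two tables.

```python
from statistics import mean

CONSTRUCTIONS = ("LFD", "TOP", "SUBJ-LAST")

# Normalised frequencies (/1,000 IPs) for ME, copied from Table 2.
ME = {
    "Biography": (15.11, 34.62, 41.93),
    "Handbook": (16.65, 11.10, 19.26),
    "History": (9.25, 13.26, 36.30),
    "Law": (30.70, 38.28, 17.54),
    "Philosophy": (44.18, 16.83, 21.71),
    "Religious treat.": (29.32, 31.95, 35.12),
    "Romance": (5.76, 17.10, 109.40),
    "Sermons": (29.05, 31.57, 23.38),
    "Travelogue": (14.79, 15.09, 72.72),
}


def column_means(table):
    """Unweighted mean of each construction across the registers."""
    columns = zip(*table.values())
    return {name: mean(column) for name, column in zip(CONSTRUCTIONS, columns)}


print({name: round(value, 2) for name, value in column_means(ME).items()})
```

Because the average is unweighted, a small register counts as much as a large one, which is one reason why these means differ from rates pooled over a whole corpus.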
With a view to determining the statistical role of each construction in the periods
under investigation, Figure 3 below displays the frequencies of the three con-
structions and reveals that, in line with the syntacticisation of subject-verb(-com-
plement) word order in English, they all decrease considerably over time. More
specifically, whereas LFD accounted for approximately 20 to 25 examples (per
1,000 IPs) in ME and EModE, its normalised frequency is 7 clauses in LModE. As
regards TOP, around 23 clauses per 1,000 contained topicalised complements in
ME, this normalised frequency falling to between 5 and 6 in LModE (Table 4). Finally,
sentence-final subjects are also rare in LModE, when approximately 11 clauses
(per 1,000 IPs) belong to the SUBJ-LAST construction type, whereas this was the
preferred pattern at a normalised frequency of 42 in ME. These proportions evince
the statistically marked condition of the three syntactic strategies and thus their
potential status as markers of other functional or situational roles. I will return
to the connection between markedness and situational delimitation in Section 3.
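The size of the diachronic change can be expressed as a percentage of the ME value. The Python sketch below does so on the basis of the Mean rows of Tables 2 to 4; the function names are hypothetical, and the calculation is offered as an illustration rather than as part of the original analysis.

```python
# Mean normalised frequencies (/1,000 IPs) from the Mean rows of Tables 2 to 4.
MEANS = {
    "LFD": {"ME": 21.65, "EModE": 24.59, "LModE": 7.22},
    "TOP": {"ME": 23.31, "EModE": 9.49, "LModE": 5.55},
    "SUBJ-LAST": {"ME": 41.93, "EModE": 17.73, "LModE": 11.41},
}


def percent_change(start, end):
    """Relative change from start to end, as a percentage of start."""
    return 100 * (end - start) / start


def change_me_to_lmode(means):
    """Percentage change of each construction between ME and LModE."""
    return {
        name: percent_change(periods["ME"], periods["LModE"])
        for name, periods in means.items()
    }


for name, change in change_me_to_lmode(MEANS).items():
    print(f"{name}: {change:.1f}%")
```

On these figures all three constructions fall between ME and LModE, although the mean for LFD rises slightly between ME and EModE before dropping.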
Figure 3: Frequencies of LFD, TOP and SUBJ-LAST in ME (PPCME2), EModE (PPCEME) and LModE
(PPCMBE)
3 Analysis of the data

In this section I employ what Biber (2013) would call both a ‘linguistic variationist’
approach, in which the register itself is taken as a variable, and a ‘text-linguistic’
perspective, according to which the registers or the texts are the research objects.
In other words, the small-scale multifeature analysis which is developed in this
chapter aims, first, at describing register variation across time and, second, at
profiling the situational or functional roles played by three marked word-order
designs in the various registers from which the data were extracted.
The section is organised as follows: 3.1 deals with the distribution of the LFD
data. Section 3.2 focuses on the analysis of the TOP examples from the database.
Finally, Section 3.3 considers the diachronic progression of the SUBJ-LAST con-
structions under investigation.
3.1 Left dislocation and register
Figures 4, 5 and 6 contain the normalised frequencies (per 1,000 clauses) of LFD
in, respectively, ME, EModE and LModE. Table 5 provides an overview of the
frequency of LFD per register.7
Figure 4: LFD in ME (the dotted line plots the mean normalised frequency)
7 In the columns containing the registers with lower/higher proportions of LFD, TOP and SUBJ-
LAST in, respectively, Tables 5, 6 and 7 I have included a selection of the registers occurring
either before (lower proportions) or after (higher proportions) the dotted line expressing the
mean normalised frequency of the distribution in the figures preceding the tables. As the figures
reveal, the grouping of registers into those exhibiting more or fewer
examples of the constructions under investigation is not neat and, in consequence, in order to
determine the connection between register and syntactic markedness I have considered only
those registers which are more representative for that purpose.
Figure 5: LFD in EModE
Figure 6: LFD in LModE

Table 5: LFD and registers across time
Period   LOWER PROPORTIONS   HIGHER PROPORTIONS
ME       Romance             Religious treatises
                             Law
                             Philosophy
EModE    Diary               Science
         Drama               Law
         Trials
         Letters
LModE    Travelogue          Philosophy
         Diary               Law
         Trials
         Letters
In light of the proportions of sentences containing left-dislocated constituents in
initial position in ME, the following conclusions can be reached: first, the regis-
ters which are stylistically less literate (Biography, Romance, Travelogue), that is,
those which demand on the reader’s part fewer technical understanding skills
and linguistic abilities, contain a lower number of examples of LFD and, second,
the registers which are stylistically more literate (Law, Philosophy, Religious trea-
tises) contain more examples of LFD.8 The fact that Sermons (and possibly this
can also be applied to the type of texts contained in the Philosophy historical
registers, with predominant speech-related/purposed status due to the inclusion
of the dialogues in Boethius’ De Consolatione Philosophiae) are grouped with the
more literate registers implies that the distribution of LFD is conditioned by reg-
ister literacy (the more literate the register is, the greater the frequency of LFD)
and not by the production circumstances associated with either the spoken or the
written medium. As for EModE and LModE, the relative proportions of LFD per
register are quite similar and reinforce the view that stylistic literacy also seems
to be the significant factor in these periods. As shown in Table 5, this tendency is
relatively stable across time.
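The grouping of registers relative to the mean normalised frequency can be reproduced from the LFD column of Table 2. The Python sketch below is a minimal illustration, assuming that the dividing line is the unweighted mean over registers; the names ME_LFD and split_by_mean are introduced here and are not part of the original study.

```python
# LFD frequencies (/1,000 IPs) in ME, copied from Table 2.
ME_LFD = {
    "Biography": 15.11,
    "Handbook": 16.65,
    "History": 9.25,
    "Law": 30.70,
    "Philosophy": 44.18,
    "Religious treat.": 29.32,
    "Romance": 5.76,
    "Sermons": 29.05,
    "Travelogue": 14.79,
}


def split_by_mean(frequencies):
    """Rank registers by frequency and split them at the unweighted mean."""
    threshold = sum(frequencies.values()) / len(frequencies)
    ranked = sorted(frequencies, key=frequencies.get)
    lower = [name for name in ranked if frequencies[name] < threshold]
    higher = [name for name in ranked if frequencies[name] >= threshold]
    return lower, higher


lower, higher = split_by_mean(ME_LFD)
print("Below the mean:", lower)
print("Above the mean:", higher)
```

The split places Sermons with Law, Philosophy and Religious treatises, in line with the discussion above, and also places History and Handbook below the mean, which the selective grouping in Table 5 does not list.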
8 The assignment of the historical registers under investigation to the more/less literate options
is based on stylistic pervasiveness within the text types. Even though the degree of stylistic hy-
bridity is noteworthy in some of the registers (see my comments in Section 4), in order to de-
termine connections between register and productivity of LFD, I have adhered to the taxonomy
±literate by relying on the style which is dominant in the texts studied.
From a theoretical perspective, LFD is a strategy which disrupts the unmarked
organisation of the clause. First, as already pointed out, subjects are not sen-
tence-initial in contexts of LFD. Second, the constituents in sentence-initial posi-
tion in LFD contexts (that is, the constituents which are left-dislocated) do not
fulfil a syntactic function within the clause or, in other words, cannot be
syntactically integrated with the ensuing clause. In fact, LFD is possibly the only
syntactic strategy in English which enables the allocation in a clause of a constituent
which is semantically connected with the clause and yet syntactically untethered
to it. Consequently, the syntax of LFD leads to the characterisation of this con-
struction as a highly marked syntactic device in English. From this perspective,
I will argue below, and at greater length in Section 4, that linguistic markedness
can be claimed to be closely connected with functional specificity in register
analysis, and that this paves the way for the consideration that LFD is a formal
indicator of stylistic literacy, at least in the recent history of English. Couched
in the terminology of multidimensional register analysis, LFD can be taken as a
linguistic feature which positively contributes to the minus-plus dimension ‘less
literate versus more literate’.
3.2 Topicalisation and register
Following the outline in Section 3.1, Figures 7, 8 and 9 show the distribution of
TOP in, respectively, ME, EModE and LModE in the database. Table 6 summarises
the results by classifying the registers into those in which TOP is frequent and
those with low levels of TOP.
Figure 7: TOP in ME
Figure 8: TOP in EModE

Figure 9: TOP in LModE
Table 6: TOP and registers across time
Period   LOWER PROPORTIONS   HIGHER PROPORTIONS
ME       Handbook            Religious treatises
         History             Biography
         Travelogue          Law
EModE    Trials              Biography
         Travelogue          Philosophy
         Diary               History
         Sermon              Fiction
LModE    Science             Philosophy
         Drama               Sermon
         Law                 Diary
         Biography           Fiction
The distribution of TOP over different registers in ME is considerably more
complicated than the partitioning of registers according to the frequencies of LFD,
since the families of registers resulting from the grouping in Figure 7 do not lead
to an easy explanation in terms of, for example, narrative versus expository
status, written versus speech-based nature or dialogic versus monologic charac-
ter. The binomial condition of less formal versus more formal/literate could pos-
sibly constitute the baseline for the assessment of the cline in Figure 7, with less
formal registers (for example, Handbook, Travelogue and the speech-purposed
Philosophy texts) in the group of registers containing fewer examples of TOP, and
more formal registers (Religious treatises and Law) with many more instances of
TOP. Nonetheless, Figures 8 and 9, which provide the information corresponding
to, respectively, EModE and LModE, and Table 6, with an overview of the pre-
vailing trends over time, reveal that TOP is no longer a textual marker in Modern
English, in that it is a frequent syntactic device found in registers like Law and
History, commonly classified as formal registers, and in Fiction or Diary, which
are indisputably less formal. The data thus make clear the textually unmarked
status of TOP as a functional or situational marker.
As mentioned in Section 3.1, in an attempt to give weight to the connection
between the distribution of formal linguistic features and the situational or
functional status of registers, I would like to establish a link between the
textually unmarked condition of TOP, which results from the analysis of the
data, and the linguistic characterisation of TOP as a syntactic device in
English. Syntactically, TOP involves the promotion of a constituent (either
complement or modifier/adjunct) to sentence-initial position, which does not
violate the unmarked subject-verb design of the English declarative clause. In
Section 4 I will hold the position that if a given linguistic feature (in this
research, a construction type) does not trigger a significant level of
linguistic (here, syntactic) markedness, then the occurrence of the feature
will not necessarily give rise to a clear-cut functional or situational
interpretation. What I will be hypothesising later, although I am aware that
this demands further research, is that linguistic markedness runs parallel to
consistent functional specificity. If this is indeed the case, it would further
emphasise the empirical relevance of multidimensional approaches.
3.3 Subject-last constructions and register
This section provides statistical information on the SUBJ-LAST constructions
analysed in this chapter, namely subject-inversion and subject-extraposition.
Figures 10, 11 and 12 display the distribution of SUBJ-LAST across time, and
Table 7 groups the registers according to the frequency of SUBJ-LAST.
Figure 10: SUBJ-LAST in ME
Figure 11: SUBJ-LAST in EModE
Figure 12: SUBJ-LAST in LModE
Table 7: SUBJ-LAST and registers across time
ME Handbook Travelogue
Law Romance
Philosophy
EModE Philosophy Fiction
Law Diary
Travelogue
Drama
LModE Science Drama
Trials History
Handbooks Fiction
Both the previous figures and Table 7 show that the frequencies of the SUBJ-LAST
constructions investigated in this study are connected to the degree of
subject-involvement evinced by the registers in the database. In registers such
as Law and Science (and many Handbooks in the LModE corpus), which
prototypically avoid speaker/writer- or hearer/reader-oriented linguistic
features, one finds fewer examples of SUBJ-LAST constructions. By contrast,
practically all the registers in the rightmost column in Table 7 (Travelogue,
Romance, Fiction, Diary, Drama) would be described as subject-oriented registers
in the traditional stylometric literature and do contain many examples
classifiable as SUBJ-LAST in this study. Furthermore, this functional
characterisation of the registers in which SUBJ-LAST is most frequent is
strikingly stable across the periods explored here.
The finding reported in the previous paragraph reinforces the connection
between, on the one hand, the highly marked syntax of a construction and, on the
other, its substantive functional defining role. Two remarks seem in order here.
First, SUBJ-LAST constructions by definition disrupt the unmarked syntactic
design of English clauses, since their syntactic subjects are placed in final
postverbal position. Second, the data reflect that the frequency of SUBJ-LAST is
a strong indicator of the degree of participant-involvement of a given register.
Briefly, then, syntactic markedness and functional priming have been shown to
go hand in hand in the recent history of English also as far as
subject-inversion and subject-extraposition are concerned.
4 Summary and concluding remarks

This study has drawn on the multidimensional assumption that registers are
(basically) linguistic units which can be associated with specific functional,
textual and stylistic interpretations. This is in line with Biber and Conrad’s
(2009: 1) well-known ‘register perspective’, which “combines an analysis of
linguistic characteristics that are common in a text variety with analysis of
the situation of use of the variety”. In this study I have explored the premise
that a set of linguistic constructions, in particular three syntactic strategies
with marked word-order designs in English, can be taken as markers of the
functional, textual and stylistic characterisation of registers. The three
constructions investigated here are left dislocation (LFD), topicalisation (TOP)
and the subject-last (SUBJ-LAST) constructions, that is, subject-inversion and
subject-extraposition.
This study has shown, first, that LFD is a linguistic strategy which has been
associated with literate registers from ME to LModE. This is a weighty finding,
since the connection between LFD and textual literacy is not in keeping with the
conversational character which the literature attributes to LFD in Present-Day
English. To give an example, Biber et al. (1999: 957–958) claim that “Prefaces
[LFD] […] are almost exclusively conversational features […] Prefaces are a sign
of the evolving nature of conversation”. Second, it was found that TOP can be
described as a literacy strategy in ME which has become progressively more
textually unmarked in Modern English. Finally, the so-called SUBJ-LAST
constructions investigated in this chapter are claimed to signal subject- or
participant-involvement.
I have also suggested that the data serve to illustrate the link between
linguistic markedness and situational definition. It was proposed that those
constructions which are syntactically most marked as far as word order is
concerned constitute hallmarks of well-defined situational interpretations of
the registers in which they occur at an appropriate frequency. In this respect,
since TOP does not significantly alter the unmarked subject-verb(-complement)
organisation of the English clause, it has thus been shown not to trigger a
register-specific situational interpretation and, as already reported, has been
defined as a textually unmarked linguistic device. By contrast, the occurrence
of LFD and SUBJ-LAST in sentences which end up exhibiting syntactically marked
word-order designs has been related to specific situational interpretations:
LFD evinces register literacy and SUBJ-LAST is a marker of subject- or
participant-involvement in a register.
The study concludes that word-order strategies can be added to the list of
linguistic features, units or variables on which register analysis can rely.
This notwithstanding, a final remark is in order here to acknowledge the high
level of heterogeneity in the registers which the statistical analysis of the
texts has identified. First, hybridity in registers is sometimes a formal or a
linguistic issue. In this respect, Biber and Finegan (1988: 3) recognise that
for some registers “greater linguistic differences exist among texts within the
categories than across them”; to give some examples, in this chapter I noted
both the speech-related status of some Philosophy texts and the differences in
subject-involvement among modern Handbooks. Second, as contended by writers such
as Virtanen (2010: 58), who says that “texts are seldom unitype; text types
usually appear in embedded hybridized forms, resulting in multiple texts”, the
multidimensional model must be able to encompass the existence of texts and even
text types which are not prototypical indicators of a given situational or
textual interpretation. Finally, as recognised in Biber and Conrad (2009:
Chapter 7), hybridity also underlies the classification of texts into registers;
see also Biber & Egbert (this volume) for an experiment on the classification of
(mostly) hybrid internet registers. Virtanen (2010: 76) also refers to this when
she says that “[o]ne and the same text type can be put to use in very different
genres [registers], and one and the same genre easily manifests texts that can
be related to very different types”. The model would thus benefit from the
statistical analysis of individual texts by means of factorial or logistic
regression techniques.
To conclude, two issues have been left for further research. On the one hand,
the validity of the findings in this study should be tested by extending the
time span of the investigation to include Present-Day English data. In this
respect, parsed corpora of contemporary English would provide empirical evidence
on the issues raised in this chapter. On the other hand, a key issue in
historical register variation, one pointed out in Biber and Conrad (2009: 166),
is the distinction between language change and register variation. As recognised
in Lijffijt et al. (2012), the null assumption in diachronic textual studies has
usually been that a single-register corpus provides homogeneous linguistic data
over time with regard to unique functional or situational implications. Were
this the case, variation in corpus studies would lead straightforwardly to the
observation of general diachronic change in language. By contrast, if the
defining linguistic and/or stylistic features of registers were claimed to be
subject to change over time, then linguistic register variation would not
necessarily imply diachronic change of the language’s grammar. This leads us to
the conclusion that corpus-based register analysis will benefit from
fine-grained analyses of the data in order to detect qualitative inconsistencies
which are, on occasion, blurred by the statistical results.
References
Press.
Biber, Douglas. 1995a. Dimensions of register variation. A cross-linguistic comparison.
Biber, Douglas. 1995b. On the role of computational, statistical, and interpretive techniques in
a multi-dimensional analysis of register variation. A reply to Watson. Text 15(3). 341–370.
Biber, Douglas. 2013. Register as a predictor of linguistic variation. Paper presented at
‘Register revisited: New perspectives on functional text variety in English’ International
Conference, University of Vechta, 27–29 June.
University Press.
Biber, Douglas & Jesse Egbert. This volume. Towards a user-based taxonomy of web registers.
Birner, Betty J. 1994. Information status and word order: An analysis of English inversion.
Language 70(2). 233–259.
Bolinger, Dwight. 1992. The role of accent in extraposition and focus. Studies in Language
16(2). 265–324.
Culpeper, Jonathan & Merja Kytö. 2010. Early Modern English dialogues: Spoken interaction as
writing. Cambridge: Cambridge University Press.
Dorgeloh, Heidrun. 1997. Inversion in modern English: Form and function. Amsterdam: John
Benjamins.
Dorgeloh, Heidrun. This volume. The interrelation of register and genre in the medical register.
Dorgeloh, Heidrun & Anja Wanner. 2010. Introduction. In Heidrun Dorgeloh & Anja Wanner
(eds.), Syntactic variation and genre, 1–26. Berlin: Mouton de Gruyter.
Fairclough, Norman. 1992. Discourse and social change. Cambridge: Cambridge University
Press.
Geluykens, Ronald. 1992. From discourse process to grammatical construction: On
left-dislocation in English. Amsterdam: John Benjamins.
Green, Georgia M. 1980. Some wherefores of English inversion. Language 56. 582–601.
Gundel, Jeanette K. 1985. ‘Shared knowledge’ and topicality. Journal of Pragmatics 9(1).
83–107.
Halliday, Michael A. K. 1978. Language as social semiotic. London: Edward Arnold.
Kreyer, Rolf. 2010. Syntactic constructions as a means of spatial representation in fictional
prose. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic variation and genre, 277–303.
Berlin: Mouton de Gruyter.
Kroch, Anthony & Ann Taylor. 2000. Penn-Helsinki Parsed Corpus of Middle English, second
edition.
Kroch, Anthony, Beatrice Santorini & Lauren Delfs. 2004. Penn-Helsinki Parsed Corpus of Early
Modern English.
Kroch, Anthony, Beatrice Santorini & Ariel Diertani. 2010. Penn-Helsinki Parsed Corpus of
Modern British English.
Lijffijt, Jefrey, Tanja Säily & Terttu Nevalainen. 2012. CEECing the baseline: Lexical stability and
significant change in a historical corpus. In Jukka Tyrkkö, Matti Kilpiö, Terttu Nevalainen
& Matti Rissanen (eds.), Studies in variation, contacts and change in English. Vol. 10:
Outposts of historical corpus linguistics: From the Helsinki Corpus to a proliferation of
resources. Helsinki: University of Helsinki (Research unit for Variation, Contacts and
Change in English).
http://www.helsinki.fi/varieng/series/volumes/10/lijffijt_saily_nevalainen
(accessed 9 February 2015).
McCawley, James D. 1988. The syntactic phenomena of English. Vols. 1, 2. Chicago: The
University of Chicago Press.
Pérez-Guerra, Javier. 2005. Word order after the loss of the verb-second constraint or the
importance of Early Modern English in the fixation of syntactic and informative
(un-)markedness. English Studies 86(4). 342–369.
Prince, Ellen F. 1981. Topicalization, focus-movement, and Yiddish-movement: A pragmatic
differentiation. In Danny K. Alford, Karen Ann Hunold & Monica A. Macaulay (eds.),
Proceedings of the Seventh Annual Meeting of the Berkeley Linguistics Society, 249–264.
Berkeley: Berkeley Linguistics Society.
Prince, Ellen F. 1997. On the functions of left-dislocation in English discourse. In Akio Kamio
(ed.), Directions in functional linguistics, 117–144. Philadelphia: John Benjamins.
Schubert, Christoph. This volume. Introduction: current trends in register research.
Swales, John M. 1990. Genre analysis: English in academic and research settings. Cambridge:
Cambridge University Press.
Taavitsainen, Irma. 2001. Changing conventions of writing: The dynamics of genre, text types,
and text traditions. European Journal of English Studies 5(2). 139–150.
Takahashi, Kunitoshi. 1992. Constructionally presentational sentences. Lingua 86. 119–148.
Virtanen, Tuija. 2004. Point of departure: Cognitive aspects of sentence-initial adverbs. In
Tuija Virtanen (ed.), Approaches to cognition through texts and discourse, 78–97. Berlin:
Mouton de Gruyter.
Virtanen, Tuija. 2010. Variation across texts and discourses: Theoretical and methodological
perspectives on text type and genre. In Heidrun Dorgeloh & Anja Wanner (eds.), Syntactic
variation and genre, 53–84. Berlin: Mouton de Gruyter.