Ethnomethodological Conversation Analysis in Motion: Emerging Methods and New Technologies. Edited by Pentti Haddington, Tiina Eilittä, Antti Kamunen, and Laura Kohonen-Aho. Routledge, 2023. ISBN 9781032544410.
ETHNOMETHODOLOGICAL
CONVERSATION ANALYSIS
IN MOTION
Tiina Eilittä is a doctoral researcher at the Research Unit for Languages and
Literature at the University of Oulu, Finland.
Antti Kamunen is a postdoctoral researcher at the Research Unit for Languages
and Literature at the University of Oulu, Finland.
PART 1
Exploring “being a member” 19
PART 2
Broadening the analyst’s access to a member’s
perspective by using various video materials 83
PART 3
Augmenting analyses of the member’s perspective with
multiple research materials and methods 151
PART 4
Enhancing transparency of analytical processes 219
Index 265
LIST OF CONTRIBUTORS
Iuliia Avgustis is a doctoral researcher at the Research Unit for Languages and
Literature and the Research Unit INTERACT: Human Computer Interaction and
Human Centred Development at the University of Oulu, Finland, and a member of
the project Smart Communication.
Her interdisciplinary video-based research combines methodological insights
from ethnomethodology, multimodal conversation analysis, phenomenology, soci-
ology, and human-computer interaction. In her doctoral dissertation, she focuses
on collocated and collaborative smartphone use in the context of everyday face-
to-face interactions.
Tobias Boelt Back is Assistant Professor in the Department of Culture and Learning
at Aalborg University, Denmark, and a member of the Centre for Discourses in
Transition (C-DiT).
His research focuses on risk and inequality as barriers to transitions towards
more environmentally sustainable mobilities. He uses graphic transcription and
ethnomethodological membership categorisation analysis to explicate the catego-
rial moral order of public transport settings as captured by 360-degree recording
devices. Back is part of the Travelling Together project (2021–2023).
Tiina Eilittä is a doctoral researcher at the Research Unit for Languages and
Literature at the University of Oulu, Finland.
She uses conversation analysis to study everyday adult–child interactions. Her
research data consist of Finnish and English video recordings in families and
early childhood education. She has published in the Journal of Pragmatics and
Gesprächsforschung, works as an editor of the Finnish Journal of Linguistics, and
co-edits the book Complexity of Interaction.
He is co-founder of the Video Research Lab (VILA), the QuiViRR journal, and
the BigSoftVideo team, which develops software tools for enhancing qualitative video
research.
Tuire Oittinen is a postdoctoral researcher at the Research Unit for Languages and
Literature at the University of Oulu, Finland.
She uses video-recorded data and multimodal conversation analysis to inves-
tigate social interaction in multilingual work and educational settings. Currently,
she analyses inclusive teamwork practices in remote crisis management train-
ing. Her work has been published in academic journals, such as the Journal of
Pragmatics and Social Interaction: Video-Based Studies of Human Sociality.
Iira Rautiainen is a postdoctoral researcher at the Research Unit for Languages and
Literature at the University of Oulu, Finland.
She combines ethnomethodology, conversation analysis, and ethnography to
study social interaction in multinational crisis management training. Currently,
she examines interactional practices in collaborative situations. She has pub-
lished in the Journal of Pragmatics, co-edited the Finnish Association of Applied
Linguistics (AFinLA) yearbook 2020, and is currently co-editing the book
Complexity of Interaction.
We are grateful to the publisher Routledge and especially Senior Publisher Louisa
Semlyen for her faith in the book from the very beginning. We also thank Senior
Editorial Assistant Talitha Duncan-Todd for her vital help, support, and advice
during the typesetting process. It has been a pleasure to work with both of you.
We also thank the contributing authors for their hard work and patience with the
process and us editors. We could not have done this without you.
This book would not have been possible without the help of many colleagues
around the world. We want to thank Liz Stokoe for a critical piece of advice and
collegial support at a moment when it was most needed. We also want to thank
our colleagues in the COACT research community at the University of Oulu for
providing an inspirational environment for exploring questions related, among
so many other things, to methods and methodologies for studying social interac-
tion. We are extremely grateful to our colleagues – both authors in the book and
external readers – who have generously given their time and support when reading
and reviewing the chapters and providing critical feedback and advice. Your help
has been invaluable. We are grateful also to three scholars who reviewed the book
proposal and made helpful suggestions that have developed the book in many
ways. We also want to thank Melisa Stevanovic and Sylvaine Tuncer for carefully
reading the book’s introductory chapter and giving critical remarks and sugges-
tions for improving it.
This book would not have been possible without the funding we have received
for two projects, iTask: Linguistic and Embodied Features of Interactional
Multitasking and PeaceTalk: Talk and Interaction in Multinational Crisis
Management Training. Both projects have been funded by the Academy of
Finland (decision numbers 287219 and 322199) and the Eudaimonia Institute at
the University of Oulu. We gratefully acknowledge the support of both.
We are privileged to belong to the worldwide EMCA community
and explore the world from the perspectives of talk and interaction, and we
acknowledge our debt to the EMCA community, whose research continues to be
a constant inspiration and source of joy to us. We hope that this volume will be a
source of new ideas and inspiration for the study of social interaction in the EMCA
community. We look forward to the future of EMCA. Lastly, we want to thank our
families, friends, and other close ones for their interest, support, and understand-
ing along the way.
1
ETHNOMETHODOLOGICAL
CONVERSATION ANALYSIS IN MOTION
An introduction
Introduction
Ethnomethodology (EM; e.g., Arminen, 2006; Garfinkel, 1967, 2002; Heritage,
1984) and Conversation Analysis (CA; e.g., Sacks, 1992; Schegloff, 2007; Sidnell
& Stivers, 2013) are both approaches to the study of social action. Their theoreti-
cal and methodological backgrounds can be traced back to a common historical
and intellectual origin, particularly to Garfinkel’s re-reading of Talcott Parsons’s
sociological theory and the idea of actors’ subordination to the social system, as
well as the phenomenological philosophy of Edmund Husserl, Aron Gurwitsch and
Alfred Schütz (e.g., Heritage, 1984; Housley, 2021; Maynard, 2013). EM is not
considered a method, nor does it use a specific method to achieve its analytical
objectives; rather, it uses and relies on different qualitative and descriptive tools to
help access and analyse participants’ reasoning procedures and practical actions
that constitute and order their everyday lives.
EM has had a strong influence on the inception and development of CA. CA is
often considered a rigorous method for the study of the organisation of social inter-
action as it is accomplished by participants through talk and multimodal conduct.
While CA focuses on social actions as they are produced in sequences of social
interaction and relies on audio-video recordings of naturally occurring interaction,
EM also uses methods such as (auto-)ethnography, observation and re-enactments,
and studies the perception of the social world (e.g., Coulter & Parsons, 1990), rule-
following and practical reasoning (e.g., Livingston, 1987; Sormani, 2014), and tech-
nology (e.g., Dourish, 2001), among others. For scholars in both EM and CA, the
intelligibility and accountability of action emerge from the actions and activities
themselves and can be traced back to and evidenced by the participants’ own con-
duct (e.g., natural language use or embodied behaviour). Through this approach,
DOI: 10.4324/9781003424888-1
This book takes the above as its starting point, but it also explores possible new
methods, solutions, and proof procedures for uncovering participants’ common-
sense knowledge, practices, and reasoning procedures that they use to accomplish
activities in interaction, which to us is in line with ethnomethodological thinking.
This book has its origins in an affiliated edited volume that explores the constitu-
tion of joint action and shared understanding in complex interactions (Haddington
et al., in press). While preparing the volume, we learned that studying the com-
plexity of interaction involves new questions about how participants’ backgrounds
and the rich material and multimodal contexts contribute to the joint constitu-
tion of action and social order. We were also struck by how responding to the
demands of studying social action in previously understudied settings involved
the use of new methods and solutions for capturing and analysing interactions, and
that video corpora were becoming exceedingly rich and diverse. Along with the
richer datasets and more detailed focus on the multimodal details of interaction,
the analysis and visual representations of video data were taking new forms. The
affiliated volume taught us that there are unanswered questions and new methodi-
cal approaches that seem to be pushing the boundaries of EMCA. These issues and
questions are now explored in this book.
At the outset, we approached colleagues who were using video-based methods
and exploring new avenues in EMCA and asked: What aspects of their research
had guided them towards creative methodological thinking? What traditional and
new solutions had they used, and to what ends? As a result, the following ques-
tion began to emerge and unite the current book’s chapters: What can be treated
as evidence for analytic claims in EMCA, and how can such evidence be analysed
and represented? This question connects with EMCA’s analytic mentality and ties
firmly with Garfinkel’s notion of “unique adequacy requirement”, referring to the
analyst’s “vulgar” competence in the studied activity (Garfinkel, 2002; Garfinkel
& Wieder, 1992). In EMCA, the member’s perspective becomes evident and is
mobilised in a technical sense with the next-turn proof procedure.
This volume studies how EMCA's robust approach to analysing social action and
activity could be supported with new methods, without losing or compromising
its strictly empirical roots. It explores how unique adequacy and common-sense
knowledge could be uncovered, for example, by accompanying the next-turn
proof procedure with other “proof procedures”, such as ethnographic tools and
knowledge about the studied community. From quite a different methodological
perspective, the chapters also discuss the possibility of acquiring access to mem-
bers’ private actions and how it could contribute to the analysts’ understanding
of members’ reasoning procedures and the joint constitution of action, activity,
and social order, and what implications it may have for analysis. This book is a
conversation opener, inviting critical and constructive dialogue on how EMCA’s
methodology and toolbox could be developed for the purpose of acquiring richer
perspectives on endogenous social action. The next section positions this book
alongside the history and development of EMCA and its evolution from the
Ishino, 2018; Rossano, this volume; Wagner et al., 2018). A longitudinal perspective
has been used to explore such topics as socialisation processes and second and
foreign language learning (Pekarek Doehler & Balaman, 2021; Pekarek Doehler
& Berger, 2016). This is illustrative of the richness and flexibility of EMCA as a
methodology and the possibility to develop it in new directions.
Moreover, new video data and methodological developments have inspired
EMCA scholars to improve existing or design completely new ways to anno-
tate, visualise, and represent research materials. This need has been prompted
by the richness of multimodal data, which often defy simple and straightforward
modes of representation. Recent excellent examples include the Laurierian comic
strip visualisations (Laurier, 2013, 2019; Laurier & Back, this volume; see also
Skedsmo, 2021), the combination of traditional transcripts and affective comput-
ing annotations (Rollet & Clavel, 2020), and the embedding of audio or video
clips into online publications (Blythe et al., this volume; Mortensen & Due, 2018;
Greer, 2018; Nevile, 2018). The new annotation and visualisation methods com-
plement the commonly used systems, such as the Jeffersonian system (Jefferson,
2004), which is used to transcribe the details of talk, and the Mondadian system
(Mondada, 2016a), which is used to transcribe multimodal conduct. New digital
solutions for analysis have also had a profound impact on the EMCA methodol-
ogy. A good example is a system by McIlvenny and Davidsen (2017; McIlvenny,
2019; McIlvenny & Davidsen, this volume) that allows new forms of collaborative
analysis across distances and institutional barriers.
An important step within EMCA has been to extend its analytic scope beyond
video recordings and the field itself as a single-method approach and to explore the
value of other methods for the analysis of social action. This step has been partly
inspired by the increasing interest in encounters involving, for example, babies
(e.g., Kidwell, 2009) and people living with intellectual disabilities (e.g., Antaki
& Crompton, 2015), non-human participants (primates or other animals; e.g.,
Mondémé, 2022; Rossano & Liebal, 2014), or non-sentient participants (robots;
e.g., Due, 2021, 2022; Pelikan, this volume; Pelikan & Broth, 2016; Pitsch, 2020;
artificial intelligence systems; e.g., Porcheron et al., 2018; Reeves et al., 2018). The
interaction order in the groups and communities involving these participants may
not only lie beyond what research has described; the analysis and inter-
pretation of social conduct and endogenous action in them may also require new forms
of data. Important methodological questions follow: How much can the analyses
involving the above kinds of participants be claimed to be based on the partici-
pants’ own orientations? What does being a member (or a researcher) in these set-
tings mean? What is required to achieve the “unique adequacy requirement” for
conducting analysis in these settings (Garfinkel, 2002; Garfinkel & Wieder, 1992;
Jenkings, 2018)?
Furthermore, research on workplace activities (e.g., Heath & Luff, 2000; Luff
et al., 2000; Whalen & Vinkhuyzen, 2000), influenced by Lucy Suchman’s (1987)
work, has relied on ethnographic knowledge gleaned from the studied settings.
The chapters in this volume discuss whether and how creative ways of studying
social actions and activities could, in fact, be part of EMCA’s questions and sup-
port the achievement of EMCA’s fundamental aim: the study of the intelligibility
of situated social action from the perspective of the participants themselves. The
volume probes the boundaries of EMCA, what the methodology can be consid-
ered to encompass, and the various ways in which it can be used.
2021). The chapters extend the notion of members’ perspective and discuss what
it means to study interaction in under-researched settings or with non-traditional
participants.
Rossano explores the importance of interactional histories in encounters
between non-human animals. By focusing on sign–meaning relationships, he
identifies change in the communicative practices of mother-infant bonobo dyads.
The chapter shows how bonobo infants’ gestures change over time and highlights
how such a signal change can be investigated systematically. The chapter expands
the notion of interactional histories to social interactions between non-human
animals.
Similarly, Pelikan examines participants whose perspective has previously
been unattainable. She studies robots as participants in social interaction and dis-
cusses how human interlocutors support and scaffold robots in their role as par-
ticipants. Pelikan reflects on how the examination of human-robot interactions
transforms aspects of the EMCA transcription process and illustrates how detailed
transcriptions can make perspicuous the robots’ participation and perspective in
interaction.
Due focuses on interactions involving both seeing and visually impaired par-
ticipants (VIPs), highlighting the role of ocularcentrism in the situated organi-
sation of actions. The resources the VIPs use to engage in joint activity and to
achieve membership in a participation framework are also discussed and raised
as methodological issues when conducting analyses of these settings. Overall,
the chapters in Part 1 contribute to a better understanding of the affordances and
limitations of the traditional EMCA methodology and how the (lack of) access to
a member’s perspective can be addressed in the overall research process.
not only the sequential but also the contextual environment of actions and their
situated interpretation, as well as keeping up with societal changes, such as digi-
talisation and the ubiquitous use of technological devices, have been recognised as
methodological challenges and are raised as key issues in the contributions. Furthermore,
the contributions illustrate how an increasing amount of EMCA research may benefit
from, and be motivated by, a more open, innovative, and collaborative research process.
This volume discusses methodological and analytical questions that cut through
the phases in the EMCA research process. It invites scholars to rethink EMCA
concepts and terminology, such as participation, embodiment, perspective, and
member. The contributions address changes and challenges in data collection.
Not only is the researcher’s role shifting from a mere observer to a participant
in the field, but also the tools for making recordings are constantly developing.
Furthermore, new types of access to interactional environments and phenomena
create fresh, more versatile perspectives on human conduct and the organisation
of social actions. Is the methodology keeping up with such rapid changes? The
same question applies to the processes of transcribing, annotating, and visualising
video data. The practice of transcribing is an integral part of EMCA’s analytical
process, but as video data become more and more complex, new challenges (and,
possibly, biases) are recognised and require solutions. What new ways for data
analysis and representation can be brought forth through adaptive and innovative
solutions for transcribing?
Finally, the chapters explore diverse topics concerning data analysis. Given
the EMCA researchers’ expanding access to new research environments, and thus
to new, context-specific interactional phenomena, the discussion of the validity
of tools (both methodological and technical) that scholars in the same field can
use (e.g., statistics, ethnographic observations, satellite geolocation) is timely and
necessary. How can current methods in EMCA be complemented and supported
with new methods without losing sight of EMCA’s methodological principles and
core strengths?
We understand that some of the views expressed in the chapters of this book
may and will trigger debate. Our aim, however, has been to explore possible new
methods and analytic practices for doing EMCA without forgetting its
main tenet and objective: that the focus must be on the accomplishment of human
activity and social order from the member's own perspective. Consequently,
we see this volume as an opportunity to review the methodology and mentality of
EMCA, calling for constructive dialogue on its varying possibilities and poten-
tials. Along with digitalisation and (other) societal and global changes, new con-
texts for EMCA exploration will continue to emerge, and we need to address the
demands for methodological development in the future as well. In this respect, this
book takes one step towards considering the many opportunities we have for refin-
ing our thinking as scholars in the analysis of human sociality and conduct. The
future will show which ideas and solutions presented in this book will stand the
test of time and EMCA’s principles, and take the field further.
References
Alač, M. (2011). Handling digital brains: A laboratory study of multimodal semiotic
interaction in the age of computers. MIT Press.
Antaki, C., & Crompton, R. J. (2015). Conversational practices promoting a discourse of
agency for adults with intellectual disabilities. Discourse and Society, 26(6), 645–661.
Arminen, I. (2006). Ethnomethodology and conversation analysis. In C. Bryant & D. Peck
(Eds.), The handbook of the 21st century sociology (pp. 8–16). Sage.
Arminen, I., Licoppe, C., & Spagnolli, A. (2016). Respecifying mediated interaction.
Research on Language and Social Interaction, 49(4), 290–309.
Bennerstedt, U., & Ivarsson, J. (2010). Knowing the way: Managing epistemic topologies
in virtual game worlds. Computer Supported Cooperative Work (CSCW), 19(2),
201–230.
Broth, M., & Mondada, L. (2013). Walking away: The embodied achievement of activity
closings in mobile interaction. Journal of Pragmatics, 47(1), 41–58.
Button, G., Lynch, M., & Sharrock, W. (2022). Ethnomethodology, conversation analysis
and constructive analysis. Routledge.
Carlin, P. (2020). Sacks’ plenum: The inscription of social orders. In R. J. Smith, R.
Fitzgerald, & W. Housley (Eds.), On Sacks: Methodology, materials, and inspirations
(pp. 32–46). Routledge.
Cekaite, A. (2015). The coordination of talk and touch in adults’ directives to children:
Touch and social control. Research on Language and Social Interaction, 48(2), 152–175.
Coulter, J., & Parsons, E. D. (1990). The praxiology of perception: Visual orientations and
practical action. Inquiry, 33(3), 251–272.
Deppermann, A. (2013). Multimodal interaction from a conversation analytic perspective.
Journal of Pragmatics, 46(1), 1–7.
Deppermann, A., & Pekarek Doehler, S. (2021). Longitudinal conversation analysis -
Introduction to the special issue. Research on Language and Social Interaction, 54(2),
127–141.
Dourish, P. (2001). Where the action is: The foundations of embodied interaction. The
MIT Press.
Due, B. L. (2021). Distributed perception: Co-operation between sense-able, actionable,
and accountable semiotic agents. Symbolic Interaction, 44(1), 134–162.
Due, B. L. (2022). Guide dog versus robot dog: Assembling visually impaired people with
non-human agents and achieving assisted mobility through distributed co-constructed
perception. Mobilities, 18(1), 148–166.
Due, B. L., Lange, S. B., Nielsen, M. F., & Jarlskov, C. (2019). Mimicable embodied
demonstration in a decomposed sequence: Two aspects of recipient design in
professionals’ video-mediated encounters. Journal of Pragmatics, 152, 13–27.
Edmonds, R. (2021). Balancing research goals and community expectations: The
affordances of body cameras and participant observation in the study of wildlife
conservation. Social Interaction: Video-Based Studies of Human Sociality, 4(2). https://
doi.org/10.7146/si.v4i2.127193
Garfinkel, H. (1967). Studies in ethnomethodology. Prentice-Hall.
Garfinkel, H. (2002). Ethnomethodology’s program: Working out Durkheim’s aphorism.
Rowman & Littlefield.
Garfinkel, H., & Wieder, D. L. (1992). Two incommensurable, asymmetrically alternate
technologies of social analysis. In G. Watson & R. M. Seiler (Eds.), Text in context:
Contributions to ethnomethodology (pp. 175–206). Sage.
Sidnell, J., & Stivers, T. (2013). The handbook of conversation analysis. Wiley-Blackwell.
Simone, M., & Galatolo, R. (2020). Climbing as a pair: Instructions and instructed body
movements in indoor climbing with visually impaired athletes. Journal of Pragmatics,
155, 286–302.
Simone, M., & Galatolo, R. (2021). Timing and prosody of lexical repetition: How repeated
instructions assist visually impaired athletes’ navigation in sport climbing. Research
on Language and Social Interaction, 54(4), 397–419.
Skedsmo, K. (2021). How to use comic-strip graphics to represent signed conversation.
Research on Language and Social Interaction, 54(3), 241–260.
Sormani, P. (2014). Respecifying lab ethnography: An ethnomethodological study of
experimental physics. Ashgate.
Stevanovic, M. (2016). Keskustelunanalyysi ja kokeellinen vuorovaikutustutkimus. In
M. Stevanovic & C. Lindholm (Eds.), Keskustelunanalyysi: Kuinka tutkia sosiaalista
toimintaa ja vuorovaikutusta (pp. 390–409). Vastapaino.
Stevanovic, M., Himberg, T., Niinisalo, M., Kahri, M., Peräkylä, A., Sams, M., & Hari, R.
(2017). Sequentiality, mutual visibility, and behavioral matching: Body sway and pitch
register during joint decision making. Research on Language and Social Interaction,
50(1), 33–53.
Stivers, T. (2015). Coding social interaction: A heretical approach in conversation analysis?
Research on Language and Social Interaction, 48(1), 1–19.
Streeck, J., Goodwin, C., & LeBaron, C. (Eds.). (2011). Embodied interaction: Language
and body in the material world. Cambridge University Press.
Stukenbrock, A., & Dao, A. N. (2019). Joint attention in passing: What dual mobile eye
tracking reveals about gaze in coordinating embodied activities at a market. In E. Reber
& C. Gerhardt (Eds.), Embodied activities in face-to-face and mediated settings (pp.
177–213). Palgrave Macmillan.
Suchman, L. (1987). Plans and situated action: The problem of human-machine
communication. Cambridge University Press.
Voutilainen, L., Henttonen, P., Kahri, M., Kivioja, M., Ravaja, N., Sams, M., & Peräkylä,
A. (2014). Affective stance, ambivalence, and psychophysiological responses during
conversational storytelling. Journal of Pragmatics, 68, 1–24.
Wagner, J., Pekarek Doehler, S., & González-Martínez, E. (2018). Longitudinal research
on the organization of social interaction: Current developments and methodological
challenges. In S. Pekarek Doehler, J. Wagner, & E. González-Martínez (Eds.),
Longitudinal studies on the organization of social interaction (pp. 3–35). Palgrave
Macmillan.
Whalen, J., & Vinkhuyzen, E. (2000). Expert systems in (inter)action: Diagnosing
document machine problems over the telephone. In P. Luff, J. Hindmarsh & C. Heath
(Eds.), Workplace studies: Recovering work practice and informing systems design (pp.
92–140). Cambridge University Press.
Wooffitt, R. (2007). Communication and laboratory performance in parapsychology
experiments: Demand characteristics and the social organization of interaction. British
Journal of Social Psychology, 46(3), 477–498.
PART 1
Exploring “being a
member”
2
HOW TO STUDY INTERACTIONAL
HISTORY IN NON-HUMAN ANIMALS?
CHALLENGES AND OPPORTUNITIES
Federico Rossano
There is a special map at the Austrian National Library in Vienna, hidden from the
public and protected by UNESCO: the Tabula Peutingeriana (the Peutinger Map).
It is the only remaining map representing the ancient road network in the Roman
Empire between the Atlantic Ocean and India. It is a parchment scroll one foot
high and 22 feet long (0.3 × 6.7 metres) with an iconic resemblance to the shape of
a road. It is a precursor of topological maps such as subway maps, in that it does not
represent any detailed geographical information but rather depicts routes and dis-
tances between cities connected by roads. The proportions and actual geographi-
cal orientation are not precise (e.g., Italy is presented as elongated horizontally,
west to east, rather than north to south). Moreover, the map does not represent the
state of the road network at any specific moment in time. For example, on the map,
one can simultaneously find Pompeii (destroyed in 79 AD) and Constantinople
(founded in 328 AD). Yet the tabula does serve its purpose: to provide information
about the roads that connect different cities. What it does not show is the process
through which those roads were built and the cities got connected, and how those
roads might have changed over time. In other words, it reminds us that maps are
static and timeless. Time is not part of the picture.
Similarly, several disciplines investigating human communication have tradi-
tionally investigated communicative practices as having unchangeable, timeless
relationships with the social actions they stand for. Conversation Analysis is no
exception (though see Deppermann & Pekarek Doehler, 2021 and Pekarek Doehler
et al., 2018 for recent work on longitudinal conversation analysis). Variations are
accounted for via individual differences, community/cultural differences, or con-
textual factors that might have modified how a communicative act should be
interpreted. The way communicative practices change over time is usually not
DOI: 10.4324/9781003424888-3
obsession with language and linguistic structures had masked the importance of
what was achieved via language: communication and social interaction.
Conversation Analysis (CA) emerged in sociology in the 1960s as a micro-
analytical approach to the organisation of social action in social interaction (Sacks
et al., 1974; Schegloff, 2007; Sidnell & Stivers, 2012). The initial goal of CA was
to create an account of how two people who do not know each other (especially in
institutional settings, e.g., a therapist and a patient, a 911 dispatcher and a caller, a
physician and a patient) could manage to successfully engage in social interaction
and make sense of each other. Accordingly, the assumption was that human com-
munication is orderly and that participants engage in social interaction by relying
on a machinery that can be studied partly independently of its users.
In CA, the key units are the social actions produced by participants in interac-
tion (e.g., requesting, offering, complaining, inviting). The production of social
actions usually makes relevant responsive behaviour. Even remaining silent, when
responding would be relevant, can be considered a responsive action (Schegloff,
2007). The intelligibility of social action is required for the accomplishment of
mutual understanding, which provides for successful engagement in cooperative
interactions.
In analysing and labelling what social actions are being produced during a con-
versation, conversation analysts tend to adopt an emic perspective (a participant’s
perspective, see Pike, 1966), and, thus, have developed a procedure for it that has
been called the “next-turn proof procedure” (Sacks et al., 1974). The claim is that
the interactional nature of conversation provides an obligation among participants
to display to each other their understanding of the previous conversational turn.
Given the obligation for B to convey how they have understood A’s prior turn, if
B’s turn conveys a misunderstanding of A’s turn, then A can correct herself, and
if no correction occurs, then the assumption should be that B has correctly under-
stood A, and, therefore, that A’s turn was aimed at eliciting the kind of response
that B produced. This procedure has been labelled the “central methodological
resource for the investigation of conversation” (Sacks et al., 1974, p. 728).
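The inferential logic of this proof procedure can be rendered as a toy sketch, with entirely hypothetical turn representations (the dictionaries and the function name below are illustrative assumptions, not part of CA practice): B's response displays an understanding of A's prior turn, and unless A corrects that understanding in third position, the displayed understanding stands as the analysis of A's action.

```python
def infer_action(a_turn, b_response, a_third_turn=None):
    """Toy model of the next-turn proof procedure (Sacks et al., 1974).

    B's response displays an understanding of A's turn. If A does not
    correct that understanding in third position, the analyst treats
    B's displayed understanding as the action A's turn implemented.
    """
    if a_third_turn and a_third_turn.get("repairs_prior"):
        # A's third-position repair shows that B misunderstood the turn.
        return a_third_turn["corrected_action"]
    # No correction occurred: B's understanding stands as the analysis.
    return b_response["understands_prior_as"]

# A: "It's cold in here." -- B treats it as a request (e.g., shuts a window),
# and A does not correct B, so the turn counts as having done a request.
a = {"speaker": "A", "text": "It's cold in here."}
b = {"speaker": "B", "understands_prior_as": "request"}
print(infer_action(a, b))  # request
```

Had A instead produced a third-position repair (e.g., “No, I was just commenting”), the corrected action would override B's displayed understanding in the same way.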
Erving Goffman famously rejected the idea that this proof procedure would
be sufficient to account for the design and structure of social action in social
interaction by raising an interesting issue: “[A]n account of second utterances in
terms of their contingency on a first leaves unexplained how there could be any
firsts; after all, from where could they draw their design? Conversation could
never begin” (1983, p. 50). To put it differently, while one major concern in CA
is to provide evidence intrinsic to the interaction that members were treating
practice X as implementing social action Y, Goffman was asking the question:
how did practice X come to be associated with such social action so that a par-
ticipant can recognise it and make sense of it in context? How do we choose the
words that we choose when we calibrate a request? Is there a specific relation-
ship between the format of a specific communicative practice and the social
action it is implementing? Goffman’s concern is even more poignant when we
24 Federico Rossano
More recently, Hobaiter and Byrne (2014) have focused on the meaning of ges-
tural signals by relying on what they have called an “apparently satisfactory out-
come” (ASO). The idea is that if the response to a signal was not satisfactory, the
signaller would pursue the original intended goal and convey that the response
obtained was inadequate. The outcome of the gesture (i.e., the response of the
recipient) becomes the meaning of the gesture. The focus is on the signaller, who
has to appear satisfied by the response. This approach also does not consider that
a lack of a response might count as a sufficient response on some occasions (i.e.,
it could count as a “no” signal). More generally, this approach does not consider
whether the behaviour of the other participant is simply stochastic (i.e., it coin-
cidentally just happens next, like it would be in human interaction if B started
drinking a glass of water after A had said “I am tired”) or whether it is an appro-
priate second pair part to the initial first pair part (i.e., like it would be in human
interaction if B provided a glass of water after A’s “I am thirsty”) (see Mondémé,
2022, and Schegloff, 2007, on the issue of “nextness”).
This simplification arises when starting from forms and inferring functions,
rather than from the specific social activities primates engage in and the social
goals they try to achieve, and then identifying how those activities are initiated
and negotiated via embodied communicative behaviour. Recent work has begun to
move in the latter direction by focusing on primates’ behaviour within a singular
activity: co-locomotion via ventral or dorsal carry (Fröhlich et al., 2016a; Halina
et al., 2013; Hutchins & Johnson, 2009), engagement in social play (Fröhlich et al.,
2016b), or reciprocal greetings (Luef & Pika, 2017). Yet there has been no longitudinal
investigation of gesture development, specifically assessing signal change
over time, or the developmental relationship of different signals (i.e., whether sig-
nals are independent of each other, or whether a signal developed through the
modification of another one). While signal repertoires for each social action have
been identified and data has been collected over weeks, or even months, the gen-
eral goal of these studies has not been to investigate how individuals modify their
signals over time. Rather, the aim of extended data collection has been to give the
authors more chances to capture the full repertoire of the individuals observed.
There is an open debate about how gestures develop in great apes and whether
most of their gestures are innate or learnt. Scholars who support the
biological inheritance hypothesis suggest that the gestural repertoire of great apes
is genetically pre-determined (Byrne et al., 2017; Genty et al., 2009; Graham et al.,
2016; Hobaiter & Byrne, 2011), while those who support the ontogenetic rituali-
sation hypothesis posit that gestures are progressively ritualised through repeated
dyadic interactions and that individual repertoires may vary (Halina et al., 2013;
Tomasello et al., 1994; Tomasello et al., 1997). A third hypothesis, the cultural
learning hypothesis, suggests that gestures are acquired by imitating those of oth-
ers in the group (Russon & Galdikas, 1993; Tanner et al., 2006); however, there
is no clear empirical support for this claim given the poor imitative skills of non-
human primates (Tennie et al., 2006).
Data
The following examples come from a larger project developed by Halina et al.
(2013) that analysed 1173 infant carries in 410 hours of video recordings of ten
different mother–infant bonobo dyads from six different zoos, during the infants’
second year of life. Specifically, Halina et al. investigated these data in terms of
actions and gestures leading to mother–infant carries and showed that, in each
dyad, infants had developed a variable number of gesture types through ontoge-
netic ritualisation to get the mothers to engage in a carry. This study provided
three key pieces of evidence in support of ontogenetic ritualisation:
Crucially, Halina et al.’s study did not provide evidence that the gestural signals
change over time. Providing such evidence of change over time is the goal of this chapter.
Interactional history in non-human animals 27
Carries in bonobos
Engaging in a carry in non-human primates can be considered a joint action
(Clark, 1996), whereby each participant plays a specific role in bringing the action
about. For example, the infant usually holds onto the mother’s body while the
mother moves, unlike what happens with human infants. Nonetheless, a carry does
not always play out as a ratified joint action. Indeed, mothers can force infants
into a carry by grabbing them and holding them onto their venter with one arm.
The opposite situation is not possible; that is, an infant cannot force the mother
into a carry. This makes it particularly interesting to investigate asymmetric roles
in these interactions, because if the infant wants to move, then s/he is faced with
the problem of how to get the mother to engage in a carry. In general, carries are
a frequent means for co-locomotion through the environment (on average, carries
occur about five times per hour, see Halina et al., 2013).
Considering the body configuration that allows the baby to be carried around
by the mother, either ventrally or dorsally, there are a few key features relevant to
the following analysis:
1) The baby tends to hold on by grabbing the mother’s hair on both sides of
her body, a few inches below the armpits.
2) The baby’s arms are usually open wide and extended while holding onto
the mother’s body, in a hugging position.
3) The baby’s legs are similarly open wide and wrapped around the mother’s
body.
In this paper, I show examples of infant requests to be carried that are performed
by two female bonobos (L and K) in their second year of life. (They are repre-
sentative of practices identified in other dyads.) All infants were born and live in
captivity (L at Leipzig Zoo and K at San Diego Zoo) and were raised by their own
mothers. As to L, the extracts come from 36 hours of data (122 carry events), and
K’s come from 29 hours of data (137 carry events). In the following analysis, the
sequence-initiating move by the infant (i.e., the first sequential attempt to elicit
a carry) will be considered the base First Pair Part (FPP). Correspondingly, the
behavioural response by the mother that makes the carry possible will be consid-
ered the Second Pair Part (SPP). Usually, the SPP involves the mother approaching
the infant, placing a hand/arm behind the infant’s back, slightly scooping her up,
and getting up from the ground before the carry is actually performed.
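As a rough arithmetic check (an illustrative addition, using only the figures reported above), the per-dyad carry rates implied by these observation counts are broadly consistent with the five-per-hour average reported by Halina et al. (2013):

```python
# Carry events and observation hours reported in the chapter.
dyads = {"L": (122, 36), "K": (137, 29)}

for name, (carries, hours) in dyads.items():
    rate = carries / hours
    print(f"{name}: {rate:.1f} carries per hour")
# L comes out at roughly 3.4 and K at roughly 4.7 carries per hour,
# in line with the ~5/hour average for the larger dataset.
```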
All instances have been identified not by searching for specific gestures but
by identifying instances of successful carries and then analysing the behavioural
moves by both participants that led to the successful achievement of the joint
activity. From a CA perspective, what I am suggesting here corresponds to what
some scholars have done by focusing on social actions and attempting to iden-
tify different ways in which these actions can be accomplished in conversation
(e.g., Golato [2005] on compliments; Fox & Heinemann [2017] on requests; Pillet-
Shore [2012] on greetings).
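This outcome-first search strategy can be sketched schematically: rather than scanning for candidate gestures, one filters an event log for successful carries and then inspects the behavioural moves that preceded each one. The event-log format, window size, and behaviour labels below are entirely hypothetical.

```python
# Hypothetical time-stamped event log: (seconds, actor, behaviour).
log = [
    (10.0, "infant", "touch side"),
    (12.5, "mother", "scoop up"),
    (13.0, "dyad", "carry"),
    (40.0, "infant", "scratch"),
    (55.0, "infant", "arm raise"),
    (57.0, "mother", "approach"),
    (58.5, "dyad", "carry"),
]

WINDOW = 10.0  # seconds to look back from each successful carry

def moves_before_carries(events):
    """Map each carry's time to the (actor, behaviour) moves preceding it."""
    preludes = {}
    for t, actor, behaviour in events:
        if behaviour == "carry":
            preludes[t] = [
                (a, b)
                for (s, a, b) in events
                if t - WINDOW <= s < t and b != "carry"
            ]
    return preludes

print(moves_before_carries(log))
```

Starting from the outcome (the carry) and looking backwards in this way makes no prior commitment to which behaviours count as gestures; that classification emerges from what recurrently precedes the successful joint activity.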
In what follows, I present examples of two mechanisms that lead to the forma-
tion of communicative signals by which baby bonobos request to be carried. They
are:
FIGURE 2.1 The infant spreads her legs in response to the mother’s present venter gesture.
it is used both while moving through space to get to the mother faster, and most
importantly it is the last behavioural move implemented in order for the infant to
wrap her legs around the mother’s body while holding her hands onto a rope.
In Excerpt 2, we can see how L initiates a carry sequence through an arm
raise gesture but opens her legs wide in anticipation of the mother’s arrival to
pick her up.
FIGURE 2.2 Infant spreads her legs in anticipation of the mother’s arrival (to then carry her away).
FIGURE 2.3 The infant spreads her legs in first position and waits until the mother can perceive the signal and respond with a carry.
Excerpts 1–3 show how behaviour that was originally part of the responsive
actions performed by the infant bonobo evolved into a first position move across
time. The transformation occurred when the mother began to anticipate and orient
to L’s responsive behaviour, and then L realised that she herself could initiate and
elicit a carry by producing, in first position, her responsive behaviour. The time
that passed between Excerpts 1 and 3 was three weeks. Thus, in less than a month,
a behaviour that was not used to elicit a carry became a reliable intentional signal
within this dyad. Notably, in the 36 hours of footage investigated for this dyad,
there is no evidence of the existence of the gesture produced in Excerpt 3 until the
date on which Excerpt 3 was documented. Interestingly, the steps through which
this process occurred mirror those that have been claimed to take place for human
infants’ gesture of raising their arms to be picked up (Lock, 1978).
FIGURE 2.4 Infant K moves out, turns, returns to the mother, and begins the touch side gesture to elicit a carry.
FIGURE 2.5 Infant K moves out, turns, but does not return to the mother and rather waits for the mother to pick her up.
Before K moves further, the mother begins moving towards K and grabs her
back, pulling K towards herself (i.e., conveying that the mother is going to carry
her). K gets on the mother’s venter (Figure 2.5e), and the mother follows the path
of the adult female that was previously sitting beside them. In this excerpt, the
infant’s original moving out by three steps (as seen in Excerpt 4) and then return-
ing towards the mother to touch/grab her side is now reduced to moving out only
one step and not moving back towards the mother. The key behaviour that is so
far conserved across time, and that signals to the mother the infant’s wish to be
carried, is the initial moving out in front of the mother and then turning.
Excerpt 6 shows how this two-step process (first moving and then turning) is
further transformed and condensed.
FIGURE 2.6 Infant K turns while moving out and produces a slow spin.
The articulation of the signal is not yet completely smooth: the moving out is still
noticeable, but the turning already begins while the infant throws herself out. The mother
moves her arm to tap the infant’s back before the infant has completed the spin
(6c and 6e), and the infant gets onto the mother’s venter while the mother begins
the carry.
In Excerpt 7, we can see the final gesture emerging from the shortening and
speeding up of the steps initially present in Excerpt 4. Only 25 days have gone by
between Excerpts 4 and 7.
The spin, therefore, is not an arbitrary signal developed by K but a contraction and
speeding up of the previous steps that she had used to solicit a successful carry.
This process is analogous to what Deppermann (2018a) described for instructions
in driving lessons, which over time become shorter, less complex, and more
condensed. Critically, the contraction of the signal is not done by
the infant in a vacuum; rather, she relies on the mother’s ability to project what
she is trying to accomplish/communicate and to cooperatively respond by granting
such a request (Rossano, 2018). Without the mother’s inferential projection, the
FIGURE 2.7 Infant K looks at the mother, produces a spin, and then waits for a response from the mother.
communicative practice could not be compressed because the recipient would sys-
tematically fail to respond to any behaviour different from the original one. Similar
to the first practice, the time that the process of ritualisation takes in bonobos is
rather brief. The recordings of Excerpts 4 and 7 are 25 days apart, and Excerpts
5 and 7 only 11 days apart. In other words, the transformative process through
which the spin signal emerges does not require an extensive amount of time.
Discussion
In this paper, I have relied on sequence organisation and the next-turn proof
procedure as two key tools to analyse not only how animals communicate, but
also how their communicative signals might change over time. As shown, they
provide access to the participant’s perspective in primate data as well. By applying
a CA perspective, I have shown how the communicative gestures that infant
bonobos use to request their mothers to carry them change over time through
repeated interactions, becoming part of the dyad’s interactional history. In
particular, I have outlined
two mechanisms through which behavioural practices can change while retaining
some similarities with the original behaviour produced by the same individuals: 1)
Responsive action produced in first position, and 2) Shortening and speeding
up. This is not an exhaustive list and most likely other mechanisms exist.
I have also shown how the process through which a responsive behaviour trans-
forms into an initiating action does not require months or years, but rather a few
weeks at most. This is likely an overestimate, since we did not collect data 24
hours a day; therefore, it is possible that the signal became ritualised even faster.
The finding that ritualisation takes three weeks or less within the context of carry
is critical because it addresses recent criticisms (e.g., Byrne et al., 2017) about the
implausibility of ontogenetic ritualisation as a mechanism to acquire new ges-
tures, given how long it would take to do so.
This study does not meet the criteria for investigating interactional histories
(Deppermann, 2018a), that is, capturing interactions “from the beginning without
a gap”. Yet the analysis builds on a significantly broader dataset on the same
individuals than any prior study of gesture development in non-human primates.
The current standard of collecting primate data has made it prohibitive to fully
capture the process of behavioural change. Primate data is often biased
towards live observations and the use of ethograms (at the expense of video record-
ings), and even when video recordings are made, they are usually not collected on a
daily basis but sporadically, for limited durations. Thus, it should not be too surpris-
ing that the methodological constraints of the current standard for data collection
have hindered the possibility of fully capturing behaviour that is malleable and quickly
transforming. As we have long learned from the notion of recipient design (Sacks et
al., 1974), the psycholinguistic work on lexical choice by Brennan and Clark (1996),
and the work by Schegloff (1986) on the routine as achievement, repeated interac-
tions with familiar others affect the way we communicate and calibrate our signals.
The knowledge of who we are for each other and what we know about each other
affects the signal selection, especially if the interaction is a routine one.
This study proposes a new perspective on the study of interactional histories.
Instead of detecting a practice in earlier interactions and tracking how it changes
over time, here we propose to start from activities (i.e., social actions) and look
backwards to identify the behavioural practices that could have been used to imple-
ment them (for a similar type of analysis, see Skogmyr Marian, 2021 on complaints
in L2). This is in contrast to the strategy of starting from behavioural forms, and
then inferring which social actions they are implementing (relying on the next-turn
proof procedure). Taking seriously Goffman’s (1983) concern about the origin of
the design of first turns means identifying all possible signals that a non-human
primate can rely on to elicit a specific response, and figuring out what factors affect
their implementation. Taking Goffman’s remark seriously also means attempting to
identify the history of a signal, its origins and development, and integrating it with
the relational history of the interacting individuals. This would allow us to see and
identify the origins of the signal’s design and what is necessary for its development.
Furthermore, this work has important implications for the study of communi-
cative development in human infants. Apart from the work by Lock (1978) and
Clark (1978) and the original work on gesture acquisition in young infants (Bates
et al., 1975), the focus of psychological research on human infants’ remarkable
imitative abilities (Gergely et al., 2002; Meltzoff, 1995) has led to a general neglect
of alternative ways of learning and developing communicative signals in children
(see, however, Cekaite, 2007; Gardner & Forrester, 2009; and Wootton, 1997, for
exceptions). A recent article by Marentette and Nicoladis (2012), for example, has
failed to find evidence for ontogenetic ritualisation in the gestural development
of human infants less than one year old. Unfortunately, their study shares the
methodological limitations of existing research on non-human primate gestures: they used
infrequent and short recordings collected every few weeks and a coding scheme
that looks for fully formed gestures to begin with, rather than starting from social
activities and looking backwards at the non-gesture-like behaviours that the
infants produced to elicit the social activities.
As to current questions in the CA community, this study reminds us that build-
ing typologies of social actions or practices should not come at the cost of detect-
ing how actions and practices change. They might change because the individuals
change (e.g., individuals grow older and their ability to communicate transforms)
or because the relational history between interactional partners affects the recog-
nisability of specific behaviours, facilitating ritualisation and the use of idiosyn-
cratic expressions. The ability to make sense of each other while using modified
communicative practices is one of the key reasons investigating interactional his-
tories is gaining popularity within the field (e.g., Deppermann & Pekarek Doehler,
2021). We can now aim at tracking their role in the calibration of communicative
practices also in non-human primates.
Acknowledgements
I want to thank the Wolfgang Köhler Primate Research Centre and Christine
Johnson for generously allowing me to access their video data for the purpose of
this project and Marike Schreiber, Catherine Eng and Paulina Lee for producing
the drawings contained in this paper. I also want to thank Marta Halina for the
many helpful discussions about gestures in primates and carries in bonobos.
References
Bates, E., Camaioni, L., & Volterra, V. (1975). The acquisition of performatives prior to
speech. Merrill-Palmer Quarterly of Behavior and Development, 21(3), 205–226.
Brennan, S. E., & Clark, H. H. (1996). Conceptual pacts and lexical choice in conversation.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 22(6), 1482.
Broth, M. (2017). Starting out as a driver: Progression in instructed pedal work. In Å.
Mäkitalo, P. Linell, & R. Säljö (Eds.), Memory practices and learning: Interactional,
institutional and sociocultural perspectives (pp. 115–152). Information Age Publishing.
Byrne, R. W., Cartmill, E., Genty, E., Graham, K. E., Hobaiter, C., & Tanner, J. (2017).
Great ape gestures: Intentional communication with a rich set of innate signals. Animal
Cognition, 20, 755–769.
Call, J., & Tomasello, M. (2007). The gestural communication of apes and monkeys.
Lawrence Erlbaum Associates.
Cartmill, E. A., & Byrne, R. W. (2010). Semantics of primate gestures: Intentional
meanings of orangutan gestures. Animal Cognition, 13(6), 793–804.
Cekaite, A. (2007). A child’s development of interactional competence in a Swedish L2
classroom. The Modern Language Journal, 91(1), 45–62.
Cekaite, A., Keisanen, T., Rauniomaa, M., & Siitonen, P. (2021). Human-assisted mobility
as an interactional accomplishment. Gesprächsforschung - Online-Zeitschrift zur
Verbalen Interaktion, 2, 469–475.
Clark, H. H. (1996). Using language. Cambridge University Press.
Clark, R. A. (1978). The transition from action to gesture. In A.E. Lock (Ed.), Action,
gesture and symbol (pp. 231–257). Academic Press.
Crockford, C., Wittig, R. M., Mundry, R., & Zuberbuehler, K. (2012). Wild chimpanzees
inform ignorant group members of danger. Current Biology, 22(2), 142–146.
Deppermann, A. (2018a). Changes in turn-design over interactional histories - The case
of instructions in driving school lessons. In A. Deppermann & J. Streeck (Eds.), Time
in embodied interaction: Synchronicity and sequentiality of multimodal resources (pp.
293–324). John Benjamins.
Deppermann, A. (2018b). Instruction practices in German driving lessons: Differential
uses of declaratives and imperatives. International Journal of Applied Linguistics,
28(2), 265–282.
Deppermann, A., & Pekarek Doehler, S. (2021). Longitudinal conversation analysis:
Introduction to the special issue. Research on Language and Social Interaction, 54(2),
127–141.
Fox, B. A., & Heinemann, T. (2017). Issues in action formation: Requests and the problem
with x. Open Linguistics, 3(1), 31–64.
Fröhlich, M., Wittig, R. M., & Pika, S. (2016a). Should I stay or should I go? Initiation of
joint travel in mother–infant dyads of two chimpanzee communities in the wild. Animal
Cognition, 19(3), 483–500.
Fröhlich, M., Wittig, R. M., & Pika, S. (2016b). Play-solicitation gestures in chimpanzees
in the wild: Flexible adjustment to social circumstances and individual matrices. Open
Science, 3(8), 160278.
Gardner, H., & Forrester, M. (Eds.). (2009). Analysing interactions in childhood: Insights
from conversation analysis. John Wiley & Sons.
Garfinkel, H. (2002). Ethnomethodology’s program: Working out Durkheim’s aphorism.
Rowman & Littlefield.
Genty, E., Breuer, T., Hobaiter, C., & Byrne, R. W. (2009). Gestural communication of
the gorilla (Gorilla gorilla): Repertoire, intentionality and possible origins. Animal
Cognition, 12(3), 527–546.
Gergely, G., Bekkering, H., & Király, I. (2002). Developmental psychology: Rational
imitation in preverbal infants. Nature, 415(6873), 755–755.
Goffman, E. (1983). Felicity’s condition. American Journal of Sociology, 89(1), 1–53.
Golato, A. (2005). Compliments and compliment responses: Grammatical structure and
sequential organization. John Benjamins.
Graham, K. E., Furuichi, T., & Byrne, R. W. (2016). The gestural repertoire of the wild
bonobo (Pan paniscus): A mutually understood communication system. Animal
Cognition, 20(2), 171–177.
Halina, M., Rossano, F., & Tomasello, M. (2013). The ontogenetic ritualization of bonobo
gestures. Animal Cognition, 16(4), 653–666.
Hinde, R. A. (1966). Animal behavior: A synthesis of ethology and comparative psychology.
McGraw-Hill.
Hobaiter, C., & Byrne, R. W. (2011). The gestural repertoire of the wild chimpanzee.
Animal Cognition, 14(5), 745–767.
Hobaiter, C., & Byrne, R. W. (2014). The meanings of chimpanzee gestures. Current
Biology, 24(14), 1596–1600.
Hutchins, E., & Johnson, C. M. (2009). Modeling the emergence of language as an
embodied collective cognitive activity. Topics in Cognitive Science, 1(3), 523–546.
Liebal, K., & Call, J. (2012). The origins of non-human primates’ manual gestures.
Philosophical Transactions of the Royal Society of London Series B, 367(1585), 118–128.
Lock, A. (1978). The emergence of language. In A. Lock (Ed.), Action, gesture and symbol:
The emergence of language (pp. 1–21). Academic Press.
Luef, E. M., & Pika, S. (2017). Reciprocal greeting in chimpanzees (Pan troglodytes) at the
Ngogo community. Journal of Neurolinguistics, 43, 263–273.
Marentette, P., & Nicoladis, E. (2012). Does ontogenetic ritualization explain early
communicative gestures in human infants? Developments in Primate Gesture Research,
6, 33.
Meltzoff, A. N. (1995). Understanding the intentions of others: Re-enactment of intended
acts by 18-month-old children. Developmental Psychology, 31(5), 838.
Mondémé, C. (2022). Why study turn‐taking sequences in interspecies interactions?
Journal for the Theory of Social Behaviour, 52(1), 67–85.
Pekarek Doehler, S., Wagner, J., & González-Martínez, E. (Eds.). (2018). Longitudinal
studies on the organization of social interaction. Palgrave Macmillan.
Pika, S., Liebal, K., & Tomasello, M. (2005). Gestural communication in subadult bonobos
(Pan paniscus): Repertoire and use. American Journal of Primatology, 65(1), 39–61.
Pike, K. L. (1966). Etic and emic standpoints for the description of behaviour. In A. G.
Smith (Ed.), Communication and culture (pp. 152–163). Holt Rinehart & Winston.
Pillet-Shore, D. (2012). Greeting: Displaying stance through prosodic recipient design.
Research on Language and Social Interaction, 45(4), 375–398.
Rossano, F. (2018). Social manipulation, turn-taking and cooperation in apes: Implications
for the evolution of language-based interaction in humans. Interaction Studies, 19(1–2),
151–166.
Rossano, F., & Liebal, K. (2014). ‘Requests’ and ‘offers’ in orangutans and human infants.
In P. Drew & E. Couper-Kuhlen (Eds.), Requesting in social interaction (pp. 333–362).
John Benjamins.
Hannah Pelikan
Introduction
Transcription is a crucial part of the workflow in the ethnomethodological and
conversation analytic (EMCA) tradition, as it supports re-inspection and re-
analysis of the data (Laurier, 2014). Besides standard practices for transcrib-
ing speech (Jefferson, 2004; Selting et al., 2011) and multimodal and embodied
actions (Mondada, 2019), EMCA researchers have developed practices for rep-
resenting interaction with material artefacts and interaction in complex (tech-
nological) environments (see e.g., Laurier & Reeves, 2014; Licoppe et al., 2017;
Luff et al., 2013). While robots are increasingly gaining interest within the EMCA
community (see e.g., Alač et al., 2011; Due et al., 2019; Fischer, 2016; Pelikan
& Broth, 2016; Pitsch, 2016; Tuncer et al., 2023; Yamazaki et al., 2010), to date
there is no systematic discussion of how to transcribe human–robot interaction.
Should robot behaviour be transcribed like that of humans or of material objects?
What adjustments and considerations may be necessary to capture the differences
between humans and robots? What does transcription tell us about the interac-
tional status of a robot?
Since “transcribing is an analytic process” (Roberts, 2012, p. 1), it is necessar-
ily shaped and informed by the theoretical and subjective goals of the researcher
(see e.g., Bucholtz, 2000; Jenks, 2013; Ochs, 1979). In the case of robots, it may
matter considerably whether one is interested in how humans interact in the pres-
ence of a robot or whether one is focusing on the robot as a potential participant
(Goffman, 1981; Goodwin & Goodwin, 2005) in interaction. In line with com-
mon definitions in robotics (see e.g., Bekey, 2005), I define robots as physically
embodied machines that can sense and act in the material world with a degree of
autonomy. As this opens opportunities for joint actions, this chapter will focus
DOI: 10.4324/9781003424888-4
Transcribing human–robot interaction 43
participants and EMCA analysis can nuance what type of participant an animal
is (Mondémé, 2016; see also Rossano, this volume). Multimodal transcription can
uncover changes in participation status not only for humans and animals but also
for robots (see also Pelikan et al., 2022).
With their ability to speak and act autonomously, robots are not only potential
participants but may even fall under the ethnomethodological concept of a “mem-
ber”, as introduced by Garfinkel and Sacks (1970, p. 342, emphasis in original):
The notion of member is at the heart of the matter. We do not use the term
to refer to a person. It refers instead to mastery of natural language, which
we understand in the following way. We offer the observation that persons,
because of the fact that they are heard to be speaking a natural language, some-
how are heard to be engaged in the objective production and objective display
of commonsense knowledge of everyday activities as observable and report-
able […] that is account-able phenomena
The data
The chapter features three different robots that I recorded in different settings.
Table 3.1 offers an overview of the data.
46 Hannah Pelikan
TABLE 3.1 The robots referred to in this chapter, including information about the setting, participants, and further details about the respective studies

Nao (Aldebaran). A 60 cm tall humanoid robot that can perform complex gestures. Setting: lab (charade game). Participants: 13 university students. Country/language: Sweden/English (native & non-native). Study: Pelikan and Broth (2016).

Cozmo (Anki). A palm-sized toy robot with a forklift that can pick up toy cubes and an animated display face. Setting: home (free play). Participants: 6 families, 4 adult pairs. Country/language: Sweden/Swedish, Germany/German. Study: Pelikan et al. (2020).

Autonomous shuttle bus (Navya). An autonomous vehicle for public transport (15 passengers). Setting: public roads. Participants: safety drivers, traffic participants. Country/language: Sweden/Swedish (native & non-native). Study: Pelikan (2021).
usually do not describe robot sound by reporting that the robot emitted a one-sec-
ond tone at 440 Hz, but by putting it into words. Consider for instance the follow-
ing transcript in which a safety driver on an autonomous shuttle bus formulates a
rendering of the bus’ sounds in anticipation of braking. The transcription style is
adapted from Jefferson (2004) and Mondada (2019).
The bus is driving a few metres behind a pair of pedestrians. When they slow
down (l. 01), the safety driver who is monitoring the situation utters “pling pling”
(l. 02). A moment later, the bus indeed produces two metallic bell sounds (l. 03),
which are usually triggered before braking, when the bus detects an obstacle. The
example demonstrates that such sounds are relevant for participants who demon-
strate their understanding by producing onomatopoeic renderings of the sounds.
This can also guide the analyst in capturing these sounds.
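For contrast, the kind of physical description that analysts rarely use (“a one-second tone at 440 Hz”) can itself be produced concretely. The following minimal sketch, which is an illustrative aside rather than anything from the studies discussed here, writes such a tone to a WAV file using only Python’s standard library:

```python
import math
import struct
import wave

RATE = 44100      # samples per second
FREQ = 440.0      # Hz: the physical description analysts rarely use
DURATION = 1.0    # the "one-second tone" mentioned above

# Generate one second of a 440 Hz sine wave as 16-bit mono samples.
frames = b"".join(
    struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * FREQ * n / RATE)))
    for n in range(int(RATE * DURATION))
)

with wave.open("tone.wav", "wb") as f:
    f.setnchannels(1)    # mono
    f.setsampwidth(2)    # 16-bit samples
    f.setframerate(RATE)
    f.writeframes(frames)
```

The gap between this physical rendering and an onomatopoeic one (“pling pling”) is precisely what makes transcription an analytic, rather than mechanical, choice.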
When trying to put sounds into words, we may imitate a car engine with a
“vroom”, and a honk with a “beep” (Laurier et al., 2020). However, as these
examples illustrate, by rendering a written version of them, we present them in a
particular language: The same sounds above would be written as “brumm” and
“tuut”, respectively, in German for example. This highlights that as soon as we
decide to transcribe sounds, we are “translating” them into a particular language
and its sign system. This can pose a challenge for transcription: Should robot
sounds be transcribed in the language of the participants, the native language of
the analyst, or in the language of the design team? Differences in orthographic
representation may make it impossible to compare transcripts across languages.
Since transcripts should be understandable, I found it most practical to transcribe robot sound in the language that the paper is written in (if only to avoid translation lines).
literature. Although all three studies include the sounds in their transcripts, there
are differences in spelling, and two studies distinguish the sounds with rising intonation (“da↑dup”/“da↑dap”) from those with falling intonation (“da↓dap”). The following
transcript illustrates a typical moment in which the robot plays these sounds.
Participant Rachel is seated on a carpet in a robot lab and is about to play a
charade game with humanoid robot Nao. The robot just introduced itself and the
game and is now asking Rachel whether she is ready to play.
In the excerpt, the robot asks whether Rachel is ready (l. 01) and she immedi-
ately produces the preferred response, indicating that she is indeed ready (l. 02).
Her utterance overlaps with a sound that the robot plays (l. 03). After a significant
amount of silence (l. 04), she repeats and extends her utterance, saying “yes, I am
ready” (l. 05). The robot plays another sound (l. 07) before responding “good” (l.
09). Transcribing these sounds (in whatever lexicalised form) enables a temporal
analysis, demonstrating for instance that the first sound (l. 03) occurs in overlap
with Rachel’s utterance.
Analysing the sounds more carefully, I noticed that they typically occur after
the robot has asked a question and that there are usually two sounds occurring
in close proximity: A sound with rising intonation (l. 03) will be followed by
a sound with falling intonation (l. 07). Human utterances that occur in overlap
with or before the robot’s sound will not be reacted to by the robot. In fact, the
sounds mark a time window during which the robot is recognising the participant’s speech. Distinguishing between the sounds that indicate that the robot is starting or stopping to listen thus informed my subsequent analysis. While I
sometimes include the intended meaning in a comment such as ((speech recogni-
tion on/off)), participants may not necessarily have been aware of what the sounds
meant. Since analysts cannot look inside participants’ heads to verify whether
they understand the underlying technological function of a sound, transcribing
sound in an onomatopoeic way is a more neutral option.
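The listening-window pattern described above can be sketched as a toy state machine: utterances are registered only between the “recognition on” and “recognition off” sounds. This is a hypothetical illustration of the observed pattern, assuming nothing about Nao’s actual implementation; all class and method names are invented.

```python
# Toy sketch of the listening-window pattern described above.
# Hypothetical illustration only, not Nao's actual implementation.

class ListeningWindowRobot:
    """Registers speech only between the rising-intonation sound
    (speech recognition on) and the falling-intonation sound
    (speech recognition off)."""

    def __init__(self):
        self.listening = False
        self.heard = []

    def play_sound(self, contour):
        # The rising sound opens the window, the falling sound closes it.
        if contour == "rising":
            self.listening = True
        elif contour == "falling":
            self.listening = False

    def hear(self, utterance):
        # Talk produced in overlap with or before the opening sound
        # is simply lost, as in the excerpts discussed above.
        if self.listening:
            self.heard.append(utterance)


robot = ListeningWindowRobot()
robot.hear("yes")              # overlaps the sound: not registered
robot.play_sound("rising")     # speech recognition on
robot.hear("yes, I am ready")  # inside the window: registered
robot.play_sound("falling")    # speech recognition off
robot.hear("hello?")           # too late: not registered
print(robot.heard)             # -> ['yes, I am ready']
```

Run against a sequence like the one with Rachel, such a sketch shows why talk overlapping the first sound goes unregistered while the repeated answer inside the window succeeds.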
While “dadap” is clearly robotic, sounds inspired by human vocalisations raise
different challenges for transcription. They require careful balancing between
transcribing in a way that is realistic (staying true to how it actually sounds,
50 Hannah Pelikan
capturing the details of the robotic production) and recognisable (making the
transcript readable and preserving the associated meaning potentials) (ten Have,
2002). Excerpt 3 illustrates this through an early transcript draft, which I include
here to highlight problems with capturing key elements of the sound in line 28. A
German couple, Petra and Robert, are playing with toy robot Cozmo at the dinner
table in their home. Making use of its face recognition component, Cozmo just
learnt to recognise Petra’s face and is now able to greet her by name whenever her
face appears in its camera view. With some effort, she has managed to orient the
palm-sized robot away from her own face towards Robert, who is leaning towards
it, waiting for the robot to also learn his name. While the researcher manually
typed both names into a phone app that launches the activity, Cozmo now autono-
mously performs a sequence of moves.
22 ROB • ha•llo
hello
coz •turns away•
23 • (0.5) •(1.2)• (0.3) •(.)
coz •turns back• •drives forward•
24 ROB [m]
25 COZ [o]wa:odidi
26 ROB mog[st mi ned
don’t (you) like me? ((Bavarian dialect))
27 COZ [dr::r::r::r::
28 COZ ä:::
uh
29 ((Petra and researcher laugh))
Just when Robert greets the robot, Cozmo turns away again (l. 22). When
the robot turns back (l. 23) and plays a squeaky beep sound (l. 25), Robert asks
“don’t you like me” (l. 26). Cozmo produces a cog-like machine sound in overlap
(l. 27), followed by a human-inspired vocalisation uttered in a robot voice (l. 28),
which evokes associations with a hesitating “uh”. Robert’s wife and the researcher
acknowledge this interpretation with a few brief laughter particles (l. 29).
While I initially transcribed Cozmo’s sound in line 28 with the German
hesitation marker “ä”, the sound is extensively stretched, which makes it more
ambiguous than a typical human version. It was challenging to strike a balance
between capturing the meaning potential (Linell, 2009) while also highlight-
ing the robotic-ness of the sound. This is especially relevant as I am working
on collections of repeated sequences, in which the sound may or may not be
oriented to as carrying a particular meaning. In later stages of transcription, I
rendered the sound as “uUUUUUUHH” following English spelling conventions
(see Figure 3.1 image 3). This version captures the potential hearing as “uh” while it also highlights the difference from a typical human hesitation marker and a slight shift in how it sounds at its onset, as compared to the prolonged duration.
FIGURE 3.1 Cozmo’s face learning script (Pelikan & Hofstetter, 2022).
Excerpt 4 Face learning. Cozmo-FAM5. Day 1 (Pelikan & Hofstetter, 2022). Cozmo’s
pre-programmed turns are highlighted in grey.
can also be heard as “uh”, its meaning potential as hesitation marker seems to be
slightly weaker.
A recording of the robot in a silent environment proved particularly helpful
when creating a reference transcription. Tools for prosodic analysis like Praat and
Audacity supported scrutiny and comparison of the individual sounds. Sometimes
adjustments to situated hearing were necessary, for instance when a sound lin-
gered on longer (cf. Excerpt 2, l. 03 for an example), or when the onset of the sound
was drowned out in overlapping talk. Similarly, dealing with many repetitions of
the same multimodal behaviour across different recordings, I noticed that certain
elements (e.g., a small movement paired with a sound) may be treated as relevant
in some situations and ignored in others. One way to present this is to provide a
detailed multimodal transcription of the “idealised” robot animation (see Figure
3.1) and to use a version that is reduced to what is relevant for the situated scene
in the transcripts (Pelikan et al., 2020; Pelikan & Hofstetter, 2022). In other cases,
I decided to keep certain elements in all transcripts, even if a clear orientation to
them is absent, as one would perhaps do with babbling babies. For instance, the
listening sounds described in Extract 2 with Nao provide an important insight into
what the robot is doing – which is available to those with prior experience of talk-
ing to machines and those with programming knowledge.
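As a minimal sketch of this kind of acoustic scrutiny (here using NumPy’s FFT in place of Praat or Audacity, with a synthesised one-second 440 Hz tone standing in for a recording of the robot in a silent environment):

```python
# Estimate the dominant frequency of a (synthesised) robot beep.
# A minimal sketch with NumPy's FFT; in practice one would load a
# reference recording made in a silent environment instead.
import numpy as np

SAMPLE_RATE = 44_100  # samples per second

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) with the largest spectral magnitude."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

# Stand-in for a reference recording: a one-second 440 Hz tone.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
beep = np.sin(2 * np.pi * 440.0 * t)

print(round(dominant_frequency(beep, SAMPLE_RATE)))  # -> 440
```

Comparing such estimates across repetitions of a sound can support the kind of reference transcription discussed above, for instance when deciding whether two beeps are tokens of the same pre-programmed sound.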
and robots have implications for the applicability of the next-turn proof proce-
dure (Sacks et al., 1974), in which the recipient’s understanding of a current turn
becomes observable for speaker and analyst in the next turn. Taking the term
“participant understanding” seriously, a robot is also displaying its understanding,
but the particulars may differ significantly from those of the human participants.
Consider for instance Excerpt 5 below, in which participant Gary is just start-
ing to interact with Nao. While the initial turns do not appear problematic, the
question in line 10 points to a mismatch between how Gary understood the previ-
ous turns and how they are treated by the robot.
Nao greets Gary (l. 01), and Gary produces the second pair part to the greeting
(l. 03). Nao then proceeds by introducing its name (l. 04) and again, Gary fills the
slot with a relevant next action (l. 06). Nao continues and provides more infor-
mation (l. 08–09). However, when the robot asks, “what’s your name?” (l. 10), it
becomes evident that the robot has not registered Gary’s name and Gary has to
repeat it (l. 12), speaking in the window marked by the previously analysed sounds
(l. 11 and l. 13).
From Nao’s perspective, no input was registered outside its audio record-
ing time frame, so the robot does not respond to Gary until after the beep
sounds (l. 14). The utterance “I’m Nao” (l. 04) is not displaying understanding
of Gary’s previous turn as a return greeting. Instead, at that moment, the robot
shares similarities with automata like washing machines, juke boxes, or vend-
ing machines that, once triggered, perform a series of actions no matter what.
Relying on its sensors and processing capabilities, the robot’s understanding is
severely limited. This raises questions about who displays what kind of under-
standing at what point in time and highlights the asymmetry between Gary and
the robot.
While initially orienting to Nao as a competent member, Gary repeats his
name calmly and without asking for an account. He thereby treats the robot like
a machine that requires certain input, like answering machines that only register
what is said after a beep or websites that only accept text strings of a certain
length. The excerpt highlights that while a robot may be momentarily treated as a
member that displays common sense knowledge, it may lose that status in the next
moment when it demonstrates a lack of that knowledge, for instance while following through a typical greeting sequence. Studying interaction with robots as potential (though
limited) members also raises questions about how we describe human practices,
and what it means to be a competent member in interaction.
Excerpt 6
Cozmo at the dinner table. A2 [2:20–2:28].
01 ULR hallo antworten
hello, respond!
02 COZ me↓oooooo
03 ULR ja (.) was is denn das für ne antwort
well what kind of a response is that
04 (0.9)
05 RES wollt ihr dass er eure namen lernt?
would you like him to learn your names?
06 COZ oai?
07 (0.9)
08 HOS? [ja: ]
yes
09 ULR [ich h]eiße ulrich
my name is ulrich
Participant Ulrich (ULR) commands a response (l. 01) and subsequently dis-
plays dissatisfaction with Cozmo’s reaction (l. 02–03). Since the robot does not
have speech recognition, it has no means to appropriately answer. During an
ensuing silence (l. 04), which might have ended with participants abandoning the
robot, the researcher proposes a new activity (l. 05), saving Cozmo from a moment
in which it is unable to provide further relevant actions. At the same time, she
frames a new context in which the robot can contribute competently during the
clearly structured face learning sequence.
While the scaffolding role of researchers requires special scrutiny, this way of
supporting the robot is common even in their absence. In my second round of data
collection, Cozmo stayed with families for several days. Parents and siblings led
each other through activities, and sometimes selected specific actions in the app to
compensate for missing responses from the robot. Naturally, technologically more
experienced participants took the lead, shaping how the others experienced the
robot (for an example, see Pelikan et al., 2022). The recorded data consequently
reveals not only the scaffolded status of the robot, but also the variable degrees of
the participants’ robotic expertise.
Scaffolding occurs not only in groups but can also be observed in one-on-one
interaction between humans and robots. Adapting to the robot and producing
utterances that can be handled by the machine (see Excerpt 5, Pelikan & Broth,
2016), humans permit the robot to smoothly advance through the interactional
sequence that it is capable of – scaffolding by removing complexities that the robot
may not be able to handle. These examples highlight that robots can be consid-
ered participants, albeit with limited capabilities. Their participation status may
dynamically change from one moment to the next.
Concluding discussion
As I have illustrated in this chapter, transcribing robot behaviour in detail sheds
light on the role robots take in interaction. However, since robot action repertoires
differ from those of humans, transcription practices need to be adjusted, par-
ticularly with respect to non-lexical sound. Transcribing non-human utterances
in letter combinations enables deep analysis but also poses challenges, as one
necessarily renders them in a specific language. Capturing ambiguity and mean-
ing potentials (Linell, 2009) that participants may or may not orient to requires
careful consideration of transcription choices. Further, when dealing with repeti-
tions of complex robot behaviour, creating a reference transcription may prove
valuable. This may be created for individual utterances/interactional moves or
for longer sequences. Aggregating situated transcriptions into a generalised script
pays off particularly when one aims to demonstrate a robot’s interactional flaws.
Suggestions for improvement of the robot’s interaction design can be demonstrated
by adjusting the identified script, pointing out alternative sequential trajectories
that will be easier to follow from the perspective of human users (see e.g., Pelikan
& Broth, 2016; Pelikan et al., 2020; Pelikan & Hofstetter, 2022).
In the second part of the chapter, I have argued for paying particular attention
to the robot’s understanding in later stages of the analysis. EMCA analysis heav-
ily draws on the next-turn proof procedure (Sacks et al., 1974), identifying how
a next turn displays understanding of the previous. To analyse what understand-
ing a robot is displaying at a given moment, it may be important to determine
whether a robot is actually in a reactive mode. Suchman (1987/2007) for instance
notes for each human action in her transcripts whether it is registered by the
copying machine. Engaging with the way a robot locally produces its actions can
be compared with gaining vulgar competence (Garfinkel & Wieder, 1992) in the
perspective of the robot. If we want to treat or develop robots as potential partici-
pants whose perspective is different from that of humans, we may want to con-
sider carefully what researchers need to know about this perspective. Ultimately,
looking at what behaviour is unavailable to the machine and where the machine is
scaffolded provides a sobering perspective on the state of the art of human–robot
interaction.
Finally, empirically analysing video recordings enables EMCA researchers
to shed light onto the role that robots take in interaction. Transcription supports
documentation of embodied actions and changing participation frameworks. As
robots are increasingly able to dynamically react to people’s actions, such as their
movement in space (cf. Excerpt 4, in which the robot is extending the scanning
process in response to the moving participant), one may argue that robots can be
considered participants in specific moments. Careful transcription can uncover
how experienced co-participants project and scaffold a robot’s next actions, creat-
ing contexts in which the robot’s actions appear as competent. This chapter has
demonstrated the continued relevance of early EMCA work on machines as inter-
actants (McIlvenny, 1990; Norman & Thomas, 1991). However, in line with con-
temporary work (Pitsch, 2016), it also presents robots as participants that require
(more or less) interactional support. Human co-participants treat robots as partici-
pants (and potential members) and robots can be analysed as such, but at the same
time, one should not overlook the human scaffolding work that goes into achieving
this participation status.
Acknowledgements
I am indebted to Leelo Keevallik and Mathias Broth for fruitful discussions and
would like to thank Jenny Fu for commenting on an earlier draft. This work is
funded by the Swedish Research Council (2016-00827).
Reference list
Alač, M. (2016). Social robots: Things or agents? AI and Society, 31(4), 519–535. https://doi.org/10.1007/s00146-015-0631-6
Alač, M., Movellan, J., & Tanaka, F. (2011). When a robot is social: Spatial arrangements
and multimodal semiotic engagement in the practice of social robotics. Social Studies
of Science, 41(6), 893–926.
Arend, B., Sunnen, P., & Caire, P. (2017). Investigating breakdowns in human robot
interaction: A conversation analysis guided single case study of a human-robot
communication in a museum environment. International Journal of Mechanical and
Mechatronics Engineering, 11(5), 949–955. https://doi.org/10.5281/zenodo.1130169
Barth-Weingarten, D., Couper-Kuhlen, E., & Deppermann, A. (2020).
Konstruktionsgrammatik und Prosodie: OH in englischer Alltagsinteraktion. In W.
of the 4th ACM/IEEE international conference on human robot interaction (pp. 61–68).
Association for Computing Machinery. https://doi.org/10.1145/1514095.1514109
Nevile, M. (2015). The embodied turn in research on language and social interaction. Research on Language and Social Interaction, 48(2), 121–151. https://doi.org/10.1080/08351813.2015.1025499
Norman, M. A., & Thomas, P. J. (1991). Informing HCI design through conversation
analysis. International Journal of Man-Machine Studies, 35(2), 235–250.
Ochs, E. (1979). Transcription as theory. In E. Ochs & B. B. Schieffelin (Eds.),
Developmental pragmatics (pp. 43–72). Academic Press.
Panariello, C., Sköld, M., Frid, E., & Bresin, R. (2019). From vocal sketching to sound
models by means of a sound-based musical transcription system. In Proceedings of the
16th sound and music computing conference. https://www.smc2019.uma.es/articles/S2/
S2_05_SMC2019_paper.pdf
Pelikan, H. R. M. (2021). Why autonomous driving is so hard: The social dimension of traffic. In Companion of the 2021 ACM/IEEE international conference on human-robot interaction (pp. 81–85). Association for Computing Machinery. https://doi.org/10.1145/3434074.3447133
Pelikan, H. R. M., & Broth, M. (2016). Why that Nao? How humans adapt to a conventional
humanoid robot in taking turns-at-talk. In Proceedings of the 2016 CHI conference
on human factors in computing systems (pp. 4921–4932). Association for Computing
Machinery. https://doi.org/10.1145/2858036.2858478
Pelikan, H. R. M., Broth, M., & Keevallik, L. (2020). “Are you sad, Cozmo?”: How humans make sense of a home robot’s emotion displays. In Proceedings of the 2020 ACM/IEEE international conference on human-robot interaction (pp. 461–470). Association for Computing Machinery. https://doi.org/10.1145/3319502.3374814
Pelikan, H. R. M., Broth, M., & Keevallik, L. (2022). When a robot comes to life: The
interactional achievement of agency as a transient phenomenon. Social Interaction.
Video-Based Studies of Human Sociality, 5(3). https://doi.org/10.7146/si.v5i3.129915
Pelikan, H., & Hofstetter, E. (2022). Managing delays in human–robot interaction. ACM Transactions on Computer–Human Interaction. Just Accepted (October 2022). https://doi.org/10.1145/3569890
Pitsch, K. (2016). Limits and opportunities for mathematizing communicational conduct for social robotics in the real world? Toward enabling a robot to make use of the human’s competences. AI and Society, 31(4), 587–593. https://doi.org/10.1007/s00146-015-0629-0
Porcheron, M., Fischer, J. E., Reeves, S., & Sharples, S. (2018). Voice interfaces in everyday life. In Proceedings of the 2018 CHI conference on human factors in computing systems (pp. 640:1–640:12). Association for Computing Machinery. https://doi.org/10.1145/3173574.3174214
Roberts, F. (2012). Transcribing and transcription. In The international encyclopedia of communication. John Wiley & Sons, Ltd. https://doi.org/10.1002/9781405186407.wbiect056.pub2
Robinson, F. A., Bown, O., & Velonaki, M. (2022a). Designing sound for social robots: Candidate design principles. International Journal of Social Robotics, 14(6), 1507–1525. https://doi.org/10.1007/s12369-022-00891-0
Robinson, F., Velonaki, M., & Bown, O. (2022b). Crafting the language of robotic agents:
A vision for electroacoustic music in human–robot interaction. Organised Sound, 1–13.
https://doi.org/10.1017/S1355771822000358
Brian L. Due
Introduction
This chapter addresses the following research question: What methods do sighted
members of society apply when trying to achieve intersubjectivity with and adopt
a visually impaired member’s perspective in situ? By exploring this question
through EMCA analysis of video excerpts, I seek to revisit the canonical concept
of the “participation framework” by adding the taken-for-granted aspect – that
these frameworks are ordinarily ocularcentric in their organisation, with vision as
an anticipated common resource for joint activities. I propose the concept “ocular-
centric participation framework” to describe how participants deal with embodied
and spatial reconfigurations in relation to aspects of vision/non-vision in the pur-
suit of accomplishing a current activity.
The community of Visually Impaired Persons (VIPs) configures what
Garfinkel called “natural experiments” (Rawls et al., 2020, p. 8ff), i.e., people
who, through natural actions, simultaneously exhibit aspects of human social-
ity and perception that are otherwise unnoticed and taken for granted. Thus, the
study of VIPs not only provides insights concerning a minority’s everyday lives
but also unveils visual practices routinely accomplished and taken for granted
by sighted people (cf. Garfinkel, 1963) in an ocularcentric world (Due & Lange,
2018c; Jay, 1994). The chapter analyses a single case in which a VIP is guided
towards a robot – which she eventually will use as a “guide dog” in an experi-
ment – to reflect on two issues of methodological relevance for EMCA: a) The
observable troubles participants encounter and manage in situ when adopting
another visually impaired member’s perspective; and b) the paradoxes embedded
in undertaking a visual analysis of visual impairment – i.e. the problems a seeing
DOI: 10.4324/9781003424888-5
making co-participants see what the viewer is seeing (Nishizaka, 2018) or what a
participant wants somebody else to see (Due et al., 2019), or different perceptions
of what other participants are seeing (Goodwin, 1995). However, these kinds of
studies anticipate sight as a common sensory resource. Organising participation
frameworks in which one or more individuals are recognised as VIPs prompts a
reconfiguration of the social, spatial, multimodal, and multisensorial organisation.
As I will also reflect upon: Providing analytical insights on such types of configu-
rations can be troublesome for any researcher who might also anticipate vision as
a common resource.
Participation frameworks are built around participants, whom ethnomethod-
ology considers members of a cultural society. Ten Have (2002) describes how
“members use and rely on a corpus of practical knowledge which they assume
is shared at least in part with others”. As Garfinkel (1967) originally conceived,
this membership knowledge is a “seen but unnoticed” resource in interaction. A
member is not an individual or person but a collection of a specific set of cultural,
epistemic, and language competences involved in being a member of a particu-
lar group (Garfinkel & Sacks, 1970). Participants in interaction mostly take for
granted visual resources, just as most EMCA researchers assume the member’s
visual perspective when collecting and analysing data. As such, ocularcentrism
has an invisible moral orderliness that becomes apparent in troublesome cases
(cf. Garfinkel’s study of a “transsexual” (1967, p. 116ff)). In this chapter, I adopt
Garfinkel’s approach to studies of perspicuous settings for close examination of
sense-making activities, where the taken-for-granted vision becomes prominent.
IMAGE 1 The setting, consisting of nine human agents and two non-human agents (dog and robot).
Ocularcentric participation frameworks 67
a role as a human “obstacle”. RES4 is the software developer who operates Spot.
RES7 also works in public relations for the institute and is taking close-up record-
ings for the project. This analysis follows the unfolding of this single case as the
experiment is about to begin. The example is divided into four fragments: 1) get-
ting ready to move as a blind person alone; 2) taking – and accounting for the lack
of – a member’s perspective; 3) verbally including the VIP in the ocularcentric
participation framework; and 4) fracturing the ocularcentric participation frame-
work by shifting to a haptic sociality. The guiding analytical questions concern
problematic aspects associated with taking a member’s perspective, as well as
the construction and deconstruction of the ocularcentric participation framework.
Analysis
It is observable how PAR1, PAR2, and the dog form a micro-territory (Scheflen,
1975), with the dog as the focal point (Figures 13–15), and that they are standing
slightly apart from the others. While PAR1, who owns the dog, is standing with
her face and front towards the robot and the other participants, PAR2 is squatting
at an intimate distance with the dog (Hall, 1976). PAR1 is bodily displaying an
orientation to the main activity: the other participants and the robot. She has just
handed the dog’s leash to PAR2, and it is noticeable how she minimally initiates a
movement with her left leg toward the operator and the robot (Figure 13). It is not
a complete movement as she is still holding the cane in a resting position with a
closed fist, and the stick falls perpendicularly to the ground. By comparing Figure
13 with 14, it is noticeable how she reorients her body away from the dog and
FRAGMENT 1
Getting ready to move as a blind person alone.
PAR2 and towards the robot, but she then stands still in a waiting position (Figure
15). This reorientation can be seen as a response to the more formalised initiation
of the experiment produced by RES4: “test number one with the spot robot” (l.
18). This turn is a response to a prior interaction between RES4 and RES3 con-
cerning the recording equipment. These sighted members of the situation – and
I as the analyst – recognise how the turn in line 18 is directed to the recording
device as a kind of “start-the-test-documentary”, as there is no visual orienta-
tion to another human recipient (Figure 14–16). However, without this access to
RES4’s visual orientation, the turn can also be interpreted as a common initiation
of the experiment, in which PAR1 is supposed to walk with the robot. This situ-
ation is a manifestation of the ocularcentric participation framework, where a
sighted participant produces a turn with no extra verbal references or other indexi-
cal cues or multisensorial signs that make its recipiency understandable by sen-
sory means other than sight alone. The intelligibility lies solely in the visibility of
the speaker’s body, head, and gaze direction that is not projected towards a human
recipient (RES4, Figure 15). So far, the visually impaired person is reorienting and
displaying “getting ready to move” (Broth & Keevallik, 2014) but is left standing
alone in a waiting position with no display of understanding of whether she was
projected as the next person to do something or not.
From a visual perspective, however, the VIP seems marginalised as she orients
away from the dog and PAR2 and still is not bodily included in the new configura-
tion. But from a visually impaired perspective, this might not be the case since no
one accounts for the identity of the “true” recipient of RES4’s utterance. As such,
dealing with a blind member’s perspective seems to be infused with a paradox:
namely, that the visual analysis shows how a blind person is left in a waiting posi-
tion with no clear understanding of the next actions due to a “wrong” understand-
ing of recipiency, but at the same time the VIP herself does not account for this
as a problem, as she has no perception of it. PAR1 is displaying an orientation
towards the robot but is standing still in a waiting position and exhibiting a bodily
monitoring stance, probably waiting for a more directive instruction. In the next
fragment, we see how this isolated waiting becomes accountable.
RES6 seems to recognise PAR1’s bodily orientation towards the robot, as she
now produces a request to “someone” (l.20): “I am thinking someone should give
you a clue about where it is (.) Tina” (l. 20–22). RES6’s verbal action can be
taken as an accountable response to the fact that PAR1 is displaying readiness
but has received no verbal direction or guidance of any sort. Thus, RES6 is tak-
ing another member’s perspective by verbally addressing the need for guidance,
something she presumably infers from PAR1’s bodily orientation, which projects
the trajectory. However, while PAR1 is standing in an isolated waiting position,
RES4 is engaged in a different conversation. He is addressing the other partici-
pants (see image 1) with task-specific talk regarding the experiment (l. 21-26).
The talk is produced in overlap, and there is no response to RES6’s request to
assist PAR1. RES4 is bodily orienting away from the VIP and toward RES2, and
FRAGMENT 2
Taking – and accounting for the lack of – a member’s perspective.
then RES3 and RES5 (Figure 18b), as he explicates his understanding of what
he will do in the experiment: “the first experiment here I will just walk behind
are we agreeing on that” (l. 23–24). RES2 responds to this question design with
a minimal aligning “yes, yes”. It is also noticeable that PAR1, the VIP, responds with a “yes” (l. 25). A purely linguistic analysis could lead to the interpretation that this is an aligning response to RES4, similar to RES2. However, a multimodal analysis encourages the interpretation that PAR1’s response is a confirmation of RES6’s prior turn. Figures 17b–18b show how PAR1 is turning her head
and directing her response designed with a smile to the person standing behind
her (RES6), which can be interpreted as a visual display of PAR1’s recipiency of
RES6’s previous turn (l. 22). This “yes” is thus not just a minimal response but
rather displaying emotional affiliation and strong agreement with the suggestion
that “someone should give you a clue about where it is” (l. 20–22). So, two distinct
participation frameworks appear to be active concurrently, one including RES6
and PAR1 and the other including RES4, RES3 and RES5. The talk about aspects
of the experiment does not seem to be produced with PAR1 as a recipient. The
other seeing participants recognise this from RES4 turning his torso, head, and gaze to project them as recipients (Figures 16–18), and I, in the role of analyst,
recognise this from observing the video recordings. At the same time, RES6 and
PAR1 are co-constructing the need for guidance. We will see how this is done in
the next excerpt.
RES6 seems to recognise the visually impaired person’s (PAR1) orientation
to the main activity. By taking the blind member’s non-visual perspective, she
produces further verbal actions aimed at including PAR1 in the participation
knowledge of PAR1 being blind, and thus the difficulties of understanding spatial
relations and accomplishing “simple” navigational tasks such as walking toward
the robot.
At this point in the analysis, we must include prior knowledge of the differ-
ent participants’ experiences and epistemic stances (Heritage, 2012) in relation to
visual impairment. Whereas RES6 displays a transportable identity (Zimmerman,
1998) of being knowledgeable about blindness, as is observable in her adopting
the blind member’s perspective (she is an instructor in a blindness organisation),
RES4 displays a situated and transportable identity of being a novice in terms of
blindness. Therefore, it seems obvious that RES6 is enacting the membership cat-
egory “blindness-knowing” by practically orienting toward issues of assistance,
mobility, and navigation for PAR1, whereas RES4, being a novice in blindness,
but an expert in robotics, orients to the procedures of the experiment.
As this upgraded moral obligation (“you should”) is built into the design of the
turn produced by RES6, it is unsurprising that RES4 responds immediately and in
overlap. When RES6 finishes her turn (“guide Tina towards Spot” (l. 27)), RES4
simultaneously produces a strong affiliation with the moral obligation, which can
be heard in the prompt response “YES”, produced with high volume, fol-
lowed by a request to Tina to approach the Spot robot: “Tina (.) come with me (.)
it is just here” (l. 28).
If PAR1 were a sighted person with visual access to the robot and the socio-
technical ecology, there would be no problems with this simple verbal request,
which is designed as a declarative, using deixis to produce an indexical reference
to a point in space (“it is just here”). This is produced while RES4 simultane-
ously seems to reach out to Tina with his right hand, as if projecting a haptic
guidance (Fig. 20/20b), but such embodied action is aborted because immediately
afterwards Tina turns to PAR2 to give him a dog treat following his overlapping
request (l. 29). Then, RES6 uses the index finger to produce a simple pointing
gesture to the robot (“it is just here”) (Figure 21). As we know from previous stud-
ies of pointing practices (e.g. Goodwin, 2003), to point is to project something as
understandable from a visual point of view. Pointing is a basic aspect of human
sociality and the evolution of language (Tomasello, 2008) but loses its concrete
meaning in the context of blindness (Saerberg, 2010). Pointing is a prime exam-
ple of the organisation of ocularcentric participation frameworks, as its semiotic
meaning is purely visually indexical.
RES4 then changes action formation toward a more explanatory description of
the object’s position in space. While pointing (Figure 25), he produces a spatial
description, the pragmatic features of which could also be an instruction: “you
have the robot on your left side” (l. 31). Compared to the prior turn, this turn is sig-
nificantly more indexical in terms of preparing for the VIP’s “perception-related
actions” (Due, 2021). This is recognisable through the reversed point of view (“on
your”) and the location-specific indexical word choice (“left side”). Such words
possible direction given by RES4 (“you have the robot on your left side” (l. 31)).
In this environment, the embodied action of reaching out the arm both displays
an understanding of RES4’s prior turn as an instruction to grab the robot and an
invitation to be haptically guided by RES4.
In many situations within ordinary taken-for-granted ocularcentric par-
ticipation frameworks, verbal descriptions work smoothly as “seen but unnoticed”
(Garfinkel, 1967) guiding actions, but they are revealed as accountable phenom-
ena when visually impaired people are part of the framework in contexts where
co-participants do not have a member’s knowledge of how to act in coordination
with visually impaired people. Figure 27 illustrates an uncooperative moment.
The VIP reaches out, either to grab the robot or to receive bodily guidance
and assistance, thus seeking to break the configuration of the participation
framework as solely being visually organised. However, instead of responding
with the body or with further detailed verbal instructions, RES4 points toward
the robot with an open palm. There is no fracture of the ocularcentric organisation,
which prompts PAR1 to produce a verbal account that almost works as an excuse
for not being able to find the robot: “yes (.) I only have my cane with me today”
(l. 34). This turn, in this sequential environment, functions as an explanation of
the trouble involved in locating the whereabouts of the robot and the failure to
receive bodily guidance based on verbal descriptions alone. Only a member of
“the culture of visual impairment” would know the difference between a long
white cane and a short guide cane. A long white cane is the “norm” used for navi-
gation. The short guide cane is used to identify any immediate obstacles, and it
is not used for navigation in the way that the white cane is used. The differences
between the canes are made interactionally relevant in this context as a kind
of excuse for not easily navigating towards and locating the robot. This indexi-
cal membership knowledge is used as an explanation that makes the category of
being blind explicitly and morally relevant, but there is no uptake from RES4.
This may be because he has no knowledge of the differences between canes, and
their consequences for navigation and the practices of achieving understanding
of object–space relations in the situation, and thus his inability to adopt that kind
of membership perspective.
While PAR1 produces the account (l. 34), she simultaneously makes an observ-
able change in her bodily stance, from reaching out (Figure 27), thereby bodily
recruiting assistance (Kendrick & Drew, 2016), to standing still, with her arm
bent (Figure 28). One of the “official” practices for guiding visually impaired peo-
ple is to let them hold on to the elbow or forearm (EverydaySight, 2018), which
PAR1 can be seen to invite in Figure 27. As RES4 does not orient to this practical
morality of attending to the consequences of visual impairment, PAR1 changes
her bodily position, now more clearly offering her left arm (elbow) to be grabbed –
an action that seems to be mirrored by RES4 (Figure 28). As RES4 still does not
engage in a haptic sociality (Cekaite et al., 2020) with PAR1, she reaches her arm
out towards the presumed position of the robot, while RES4 continues to produce
verbal descriptions of the robot’s position in space (l. 35).
For sighted people, and in the visual analysis, it seems that PAR1 is almost
touching the robot. However, as the robot is producing no non-visual sensory
signs, its position in space is not recognisable for PAR1. In effect, she can be close
to touching it while at the same time being “far away” from succeeding. Finally,
RES4 seems to recognise that verbal descriptions are insufficient in guiding PAR1
into the midst of the participation framework and toward a physical encounter
with the robot – specifically, grabbing the harness attached to it. Consequently,
RES4 first touches PAR1 on the left elbow (Figure 30) and then accounts for it:
“may I can you feel your hand here (.) yes” (l. 36). Touching and achieving an
intercorporeal (Meyer et al., 2017) relation with PAR1 is thus not unnoticeably
accomplished but treated as something that requires permission. The first part of
the turn, “may I”, is a question associated with politeness, which displays an orien-
tation toward the morality of attending to the sensory impairment as a reason for
crossing the boundary of another person’s intimate space. However, as RES4 has
already touched PAR1, the turn-constructional unit has no real pragmatic effect.
This is observable because PAR1 does not respond, and RES4 continues, with no
intraturn space produced for a response. As RES4 has already touched PAR1, he
continues with the bodily directed description “can you feel” (l. 36) while observ-
ably grabbing PAR1’s arm more firmly and pulling it down (Figure 31), and then
says, “your hand here” (l. 36), as he pulls her arm all the way down to the harness,
which is hanging from the robot (Figure 32). The deictic word “here” is produced
at the exact moment PAR1 touches the harness (Figure 32), and RES4 lets go of
PAR1’s arm (Figure 33). He accounts for the fact that PAR1 has now touched the
robot and grabbed the harness with a confirming verbal attachment: “yes” (l. 36).
PAR1 responds with affiliation (Steensig, 2013): “there it was yes” (l. 37, Figure
34). The deictic “there”, referring to the harness’s spatial position, combined with
the affiliation, confirms that common ground and a mutual perceptual field have
been established. This excerpt neatly illustrates the possible “awkwardness” and
“uncertainty” participants may experience and display in interaction with people
with disabilities. These kinds of social encounters require attention to what is
otherwise taken for granted, and the analysis has shown how this requires extra
communicative work to fracture the otherwise taken-for-granted ocularcentric
participation framework.
Conclusion
For a visually impaired person, moving two metres towards a robodog and grab-
bing its harness requires a complex organisation of participation status, requests,
bodily displays of recruitment, verbal descriptions and instructions, and the final
organisation of a haptic sociality as an apparatus for fracturing the ocularcen-
tric organisation of the participation framework. Orienting to the robot is, as the
Ocularcentric participation frameworks 75
analysis has shown, only a “simple task”, for all practical purposes, when partici-
pants can easily see and have a shared perception of the object to which attention
is directed. However, the situation is very different when one of the participants is
visually impaired, which makes this a perspicuous case for exploring the seen but
unnoticed visual aspects of ocularcentric participation frameworks. The analysis
showed the overall sequential process of 1) getting ready to move as a blind per-
son alone, and 2) how a competent member (RES6) then verbally adopts a mem-
ber’s perspective and accounts for the possible social exclusion. The analysis then
showed 3) how RES4 reorients and takes a member’s perspective and displays
recognition for the need for guiding actions, but then produces these as verbal
descriptions, using ocularcentric indexical terminology. As such, ocularcentric
participation frameworks seem to be defined by actions that privilege vision and
the use of deictics in face-formations over other senses and thus practices of see-
ing, looking, and gazing as a member’s taken-for-granted resource for ordinary
interactional projects. Finally, 4) the analysis showed how the ocularcentric par-
ticipation framework becomes an observable, accountable form of contextual con-
figuration when the visual primacy is fractured, and intersubjectivity is achieved
through other sensory resources – specifically, haptics and touch, as the key sen-
sory resource for distribution of perception-related actions (Due, 2021). The shift
in sensory resource was enacted as a change from deictic terms (“here”, “there”)
to touching the VIP, but only after the VIP (PAR1) produced recruiting actions
(Drew & Kendrick, 2018) (bodily positioning and verbal excuses). There was then
a stepwise transition from indexical verbal descriptions with pointing practices to
a haptic sociality. Cekaite and Mondada have shown how touch can be used as “a
communicative resource to coordinate social interaction and various courses of
action” (2021, p. 10). In this analysis, steering and coordination through the use of
“body techniques” (Mauss, 1935) are not treated interactionally as an intervention
but as effective means for accomplishing the activity.
This chapter described the shift in sensory resources for communication as
not just joint attention but a morality of attention towards sensory impairment,
most explicitly accounted for by the word “should” (l. 22 and 24). This directive
turn design, in that sequential environment, carries a moral obligation toward guiding.
Belonging to the category of being visually impaired involves aspects of help and
assistance. These category devices have a cultural association, which Sacks calls
standardised relational pairs. Housley (2021, p. 211) states that
would then have missed precisely what blind people often miss – namely, the vis-
ual organisation that excludes blind people, even though they might not observe it
or account for it. This observation is not reproducing the problem of constructing
blindness as a deficit, as sometimes found in disability studies, but argues that prac-
tical interactions with VIPs require different organisations of sensory and bodily
practices. Future studies of not just visually impaired people in interactions with
seeing participants but also ordinary ocularcentric participation frameworks could
pay more attention to multisensoriality and moderate the usually strong focus on
the audio–visual organisation of interaction and the taken-for-granted nature of the
individual’s own membership perspective.
References
Abrahamson, D., Flood, V. J., Miele, J. A., & Siu, Y.-T. (2019). Enactivism and
ethnomethodological conversation analysis as tools for expanding universal design for
learning: The case of visually impaired mathematics students. ZDM, 51(2), 291–303.
https://doi.org/10.1007/s11858-018-0998-1
Avital, S., & Streeck, J. (2011). Terra incognita: Social interaction among blind children. In
J. Streeck, C. Goodwin, & C. D. LeBaron (Eds.), Embodied interaction: Language and
body in the material world (pp. 169–181). Cambridge University Press.
Broth, M., & Keevallik, L. (2014). Getting ready to move as a couple: Accomplishing
mobile formations in a dance class. Space and Culture, 17(2), 107–121. https://doi.org/10.1177/1206331213508483
Cekaite, A., & Mondada, L. (2021). Towards an interactional approach to touch in
social encounters. In A. Cekaite & L. Mondada (Eds.), Touch in social interaction:
Touch, language, and body (pp. 1–27). Routledge.
Cekaite, A., & Mondada, L. (Eds.). (2020). Touch in social interaction: Touch,
language, and body. Routledge. https://doi.org/10.4324/9781003026631
Drew, P., & Kendrick, K. H. (2018). Searching for trouble: Recruiting assistance through
embodied action. Social Interaction: Video-Based Studies of Human Sociality, 1(1).
https://doi.org/10.7146/si.v1i1.105496
Due, B. L. (2021). Distributed perception: Co-operation between sense-able, actionable,
and accountable semiotic agents. Symbolic Interaction, 44(1), 134–162. https://doi.org/10.1002/symb.538
Due, B. L. (2022a). The haecceity of assembling by distributing perception. In
Proceedings of the 17th annual ACM/IEEE international conference on human-robot
interaction (HRI 2022). Online (originally Sapporo, Hokkaido, Japan).
Due, B. L. (2022b). Guide dog versus robot dog: Assembling visually impaired
people with non-human agents and achieving assisted mobility through distributed
co-constructed perception. Mobilities. Advance online publication. https://doi.org/10.1080/17450101.2022.2086059
Due, B. L. (2023). A walk in the park with Robodog: Navigating around pedestrians using
a Spot robot as a ‘guide dog.’ Space and Culture. Advance online publication. https://doi.org/10.1177/12063312231159215
Due, B. L., Kupers, R., Lange, S. B., & Ptito, M. (2017). Technology enhanced vision in
blind and visually impaired individuals. Synoptik Foundation Research project. Circd
Working Papers in Social Interaction, 3(1), 1–31.
Due, B. L., & Lange, S. B. (2018a). Semiotic resources for navigation: A video ethnographic
study of blind people’s uses of the white cane and a guide dog for navigating in urban
areas. Semiotica, 222, 287–312. https://doi.org/10.1515/sem-2016-0196
Due, B. L., & Lange, S. B. (2018b). The Moses effect: The spatial hierarchy and joint
accomplishment of a blind person navigating. Space and Culture, 21(2), 129–144.
https://doi.org/10.1177/1206331217734541
Due, B. L., & Lange, S. B. (2018c). Troublesome objects: Unpacking ocular-centrism
in urban environments by studying blind navigation using video ethnography and
ethnomethodology. Sociological Research Online, 24(4), 475–495. https://doi.org/10.1177/1360780418811963
Due, B. L., Lange, S. B., Nielsen, M. F., & Jarlskov, C. (2019). Mimicable embodied
demonstration in a decomposed sequence: Two aspects of recipient design in
professionals’ video-mediated encounters. Journal of Pragmatics, 152, 13–27. https://
doi.org/10.1016/j.pragma.2019.07.015
Edmonds, D. M., & Greiffenhagen, C. (2020). Configuring prospective sensations:
Experimenters preparing participants for what they might feel. Symbolic Interaction,
Advance online publication. https://doi.org/10.1002/symb.485
EverydaySight. (2018, November 27). How to guide a person who is visually impaired or
blind: 12 tips for sighted guide. Everyday Sight. https://www.everydaysight.com/how-to-guide-a-person-who-is-visually-impaired/
Fele, G., & Liberman, K. (2020). Some discovered practices of lay coffee drinkers. Symbolic
Interaction. Advance online publication. https://doi.org/10.1002/symb.486
Garfinkel, H. (1963). A conception of, and experiments with, “trust” as a condition of
stable concerted actions. In O. J. Harvey (Ed.), Motivation and social interaction:
Cognitive determinants (pp. 187–238). The Ronald Press Company.
Garfinkel, H. (1967). Studies in ethnomethodology. Prentice Hall.
Garfinkel, H., & Sacks, H. (1970). On formal structures of practical actions. In J.
C. McKinney & E. A. Tiryakian (Eds.), Theoretical sociology: Perspectives and
developments (pp. 338–366). Appleton Century Crofts.
Garfinkel, H., & Wieder, D. L. (1992). Two incommensurable, asymmetrically alternate
technologies of social analysis. In G. Watson & R. M. Seiler (Eds.), Text in context:
Studies in ethnomethodology (pp. 175–206). Sage.
Goffman, E. (1971). Relations in public: Microstudies of the public order. Harper and Row.
Goffman, E. (1981). Forms of talk. University of Pennsylvania Press.
Goodwin, C. (1980). Restarts, pauses, and the achievement of a state of mutual gaze at
turn-beginning. Sociological Inquiry, 50(3–4), 272–302.
Goodwin, C. (1994). Professional vision. American Anthropologist, 96(3), 606–633.
Goodwin, C. (1995). Seeing in depth. Social Studies of Science, 25(2), 237–274.
Goodwin, C. (2003). Pointing as situated practice. In S. Kita (Ed.), Pointing: Where
language, culture and cognition meet (pp. 217–241). Erlbaum.
Goodwin, C. (2007). Participation, stance and affect in the organization of activities.
Discourse and Society, 18(1), 53–74.
Goodwin, C., & Goodwin, M. H. (2005). Participation. In A. Duranti (Ed.), A companion
to linguistic anthropology (pp. 222–244). Blackwell.
Hall, E. T. (1976). Beyond culture. Anchor.
Have, P. ten (2002). The notion of member is the heart of the matter: On the role of membership
knowledge in ethnomethodological inquiry. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research,
5
COLLECTING AND ANALYSING MULTI-SOURCE VIDEO DATA
Grasping the opacity of smartphone use in face-to-face encounters
Introduction
This chapter will discuss how additional types of recording hardware and soft-
ware (that is, wearable cameras and screen capture) can reveal – or sometimes
modify – analytical objects when investigating smartphone use in face-to-face
interactions. Within the framework of ethnomethodological conversation analy-
sis, the observability of social actions in naturally occurring interactions is the
basis of social order, making that order available both to the participants themselves
and to the researchers recording them (Sacks, 1984). The increasing interest in
the coordination of talk, embodied actions, object use, and interactional space in
recent decades has gone hand in hand with the systematic use of video recordings,
leading to the currently well-established multimodal approach to social interac-
tion (e.g., Goodwin, 1981; Streeck et al., 2011). It has been widely acknowledged
that different recording set-ups can reveal different aspects of the temporal and
sequential organisation and unfolding of social interaction (Mondada, 2013; see
also Kohonen-Aho & Haddington, this volume; McIlvenny & Davidsen, this
volume; Raudaskoski, this volume). The ubiquity of mobile devices, typically
smartphones, in a large variety of contemporary interactional settings represents
a further challenge to the observability of participants’ actions. Due to their inher-
ent multi-functionality and mobility, smartphones can be used in a variety of indi-
vidual and joint actions, which are often of low visibility to co-present others.
While mundane technology use does not usually suffer from “opaqueness” in the
overall setting (Goodwin, 2000, p. 1508), there is a twofold opacity that is related
to the technological object itself, namely participant opacity (or “bystander igno-
rance”, Raudaskoski et al., 2017) and analytical opacity.
DOI: 10.4324/9781003424888-7
86 Iuliia Avgustis and Florence Oloff
Our focus is on the latter, analytical opacity, as a situation that researchers may
encounter during analysis, and which can potentially become a hindrance to mak-
ing analytical claims. This problem has been addressed via the use of additional
recording equipment in previous interactional research on mobile device use. In
the next three sub-sections, we will discuss how different recording set-ups have
been motivated by different analytical foci. The opacity of smartphones has been
acknowledged as being a challenge for both the participants in the interaction
and for researchers. However, additional recording equipment might aggravate
the long-standing and familiar issue regarding the naturalness of the recorded
data. This theoretical consideration will be reflected in examples taken from video
recordings of everyday encounters in Russian1, which combine static cameras,
individual wearable cameras, and dynamic screen captures on the mobile devices.
In the following sub-sections, we will show how this combination provides access
to additional details concerning how smartphone-related multiactivity unfolds
and provides a different perspective on the sequential embeddedness of individual
smartphone use. Moreover, this way of recording can lead to a better understand-
ing of how multimodal actions relate to onscreen digital objects, leading to a
refinement of well-known interactional phenomena such as assessments or point-
ing gestures. We will then discuss whether more invasive recording equipment
can “contaminate” interactional phenomena by demonstrating that this mainly
depends on the type of action under scrutiny (non-technologically related actions
versus smartphone-related phenomena). Finally, we will address both the advan-
tages and the challenges of complex recording set-ups for the analysis of mundane
mobile device use in face-to-face interactions.
and contingencies, which could potentially be opaque for a researcher using only
one angle of vision.
situations of both convergent device use (that is, being connected to the on-going
interaction) and divergent ones (that is, being separate from the joint interaction)
(see Brown et al., 2013). This opacity can potentially problematise our understand-
ing of members’ sense-making practices. In this chapter, we will show how addi-
tional hardware and software used to record interactions around smartphones can
overcome this analytical opacity, at least partially. However, the use of additional
equipment might raise questions regarding the naturalness of recorded interactions.
Excerpt 1 (181222_OwnWay_Ru)
Figure 1.a-b.
Figure 2.a-b.
Figure 3.a-b.
06 (1.0)
07 DAN: +perenjat' to vpechatlenie na ljudej
to make the (same) impression on people
mih: +searches for the chat->>
08 kotoroe eti ljudi proizvodjat
that these people make
The excerpt begins after MIH has found the YouTube channel he wants to share
with his co-participants. He has been looking at his screen during the entire
searching process and continues to do so at the beginning of the transcript. In
line 1, he selects the “share” option on his phone; when his phone prompts options
for sharing, he chooses the Russian social network Vkontakte (VK in the tran-
script). He then selects the option “send as a personal message” (Figure 1.b), and
his phone’s screen becomes white for several moments before the VK interface
appears. During this interface switch, MIH raises his gaze from the phone, looks
at DAN, and provides a minimal response token (“mhm”, l. 05, Figures 2.a–b). He
then quickly switches his gaze back to the smartphone on which the VK interface
has appeared (Figures 3.a–b).
Collecting and analysing multi-source data 93
The temporal organisation of gaze switches from and to mobile devices has been
studied previously in a non-interactional context. In their study of single commut-
ers, Licoppe and Figeac (2018) showed that phone users switched their gaze from
the phone to other activities at particular moments, for example, when a “progress
bar icon” appeared on the screen. Thus, the “sequential texture” of the phone inter-
faces provides opportunities for changes in the phone user’s focus of attention. In
our excerpt, MIH carries out his smartphone use and participation in the on-going
conversation as parallel, nonconflicting activities (Haddington et al., 2014), but he
only shifts his gaze to DAN when his smartphone provides a sequential opportu-
nity to switch to another activity; that is, when it is temporarily not usable. Hence,
this can be described as “shifting the attention away from the phone” rather than
“shifting the attention to the co-participant”. It is at this moment that MIH pro-
vides a response to DAN, which demonstrates his commitment to the co-present
interaction despite his involvement in an individual activity. However, his response
is late in relation to DAN’s prior turn suspension and is positioned in the middle
of her next turn-constructional unit: MIH provides a pro forma rather than a fit-
ted response (Oloff, 2021, p. 222). MIH’s smartphone use is topically related to
the on-going talk, but follows a different temporal order. Both activities have the
potential to affect each other’s progressivity; therefore, both should be recorded and
considered in the analysis. While screen captures make the temporal order to which
MIH orients clearly visible, the static camera captures the switches in his displayed
attention.
In contrast to the previous excerpt in which the smartphone use is recognis-
ably occasioned by talk, this only becomes evident in Excerpt 2 due to the screen
capture recording. Prior to the excerpt, Maria (MAR) and Ekaterina (EKA) had
been discussing when they were going to finish their lunch. In line 01, EKA
tells MAR that they could finish lunch earlier, to which MAR responds with the
Russian change-of-state token “a” (Heritage, 1984), indicating that this informa-
tion is new. EKA then immediately starts a new sequence and topic by showing
and commenting on her dish (l. 04–05). At the same time, MAR puts her fork
down, picks up her phone (Figure 4), and unlocks it. MAR does not account for
this action and it is not commented upon further. The screen capture provides
empirical proof that MAR’s phone use was occasioned by the previous talk and
is linked to a practical concern, namely scheduling the next meeting. Therefore,
having access to MAR’s screen provides a better understanding of her perspec-
tive, as well as of the way in which she makes sense of and displays her sense-
making of the situation.
Excerpt 2 (191227_Cilantro_RU)
Figure 4.
06 +(0.6)
mar: >+unlocks SP->
07 MAR: >da +da da.<@
>yeah yeah yeah.<
mar: ->+gaze spm->
spm: @Instagram home page on the screen->
08 (0.9)+@(0.3)
mar: >+opens side bar
spm >@
09 MAR: u:hm+
mar: >+closes side bar->
10 (0.7)+ (1.0) + (0.8) + (0.5)
mar: >+opens messages+opens chat+opens keyboard->
11 EKA: eto# +↑krevetka, eto kal'mar?
is this a ↑shrimp, is this a squid?
mar: ->+types "Oh, no, I will be free earlier"->
fig: #fig.5.a-c
Figure 5.a-c.
12 (1.5)
13 EKA: ja vizhu zdes' ↓kinzu.
I see ↓cilantro here.
14 (2.5)+ (0.5) +
mar: ->+sends the message+
15 MAR: +ki:nzu+# v tom jame?
ci:lantro in tom yum?
mar: >+.....+gaze EKA's soup->>
fig: #fig.6.a-c
Figure 6.a-c.
After having picked up her phone, MAR provides a late-timed and somewhat
unengaged response to EKA’s comment regarding the food (“yeah yeah yeah”,
l. 07), thereby aiming to halt the course of EKA’s action (Stivers, 2004). In fact,
MAR starts to look at her smartphone after the first response token, while EKA
begins to gaze down at her soup, and a lapse occurs (l. 08). Maria then continues
to be involved in her smartphone-related activity until the end of the excerpt. Her
“uhm” (l. 09) is related to her navigation process on the screen and simultane-
ously displays to EKA that she is involved in a phone-related activity. After MAR
opens the chat in which she previously wrote that she would be free at 15:40 (l.
10, Figure 5.a–c), she begins to type (“Oh, no, I will be free earlier”), while EKA
resumes her commentary on the food (l. 11). When preparing and sending the
message, Maria does not show any audible or visible engagement with EKA, and
several long periods of silence occur (l. 12, 14).
Without the screen capture, one would not be able to state definitively whether
MAR’s smartphone use is connected to the previous talk or not. However, the
screen capture reveals that MAR’s phone use is related to the practical task of
planning her day, and that this message is clearly based on the new information
(l. 01) that she has received (l. 03). MAR also returns to the co-present interac-
tion as soon as the message has been sent (l. 15, Figures 6.a–c), which shows
that her device-related activity was restricted to a precise task. Porcheron et al.
(2016) analysed both conversation-related and conversation-unrelated occasions
of mobile device use in their study of everyday conversations in pubs. However,
they excluded situations in which the device use was not accounted for or topical-
ised. This might have been related to the fact that the authors only used fixed cam-
eras to record the overall view of the interaction. Unaddressed device use remains
analytically opaque in such a recording set-up, and it is not possible to determine
whether such use is related or unrelated to the conversation. Accordingly, smart-
phone use that is not topicalised or explicitly accounted for remains largely under-
investigated. The use of additional hardware and software could allow researchers
to explore when and how the participants prefer to conceal their smartphone-
related activities (that is, relating the device use to “impression management”), or
when they find it unnecessary to topicalise or account for their device use; that is,
concerning the morality of smartphone use (see Robles et al., 2018).
This section has illustrated how a combination of stationary and wearable cam-
eras/screen capture provides an opportunity to better understand the organisation
of smartphone-related multiactivity. While wearable cameras and/or screen cap-
ture provide access to the temporal organisation of a smartphone user’s individual
activity, the static camera provides access to shifts in the smartphone user’s gaze
orientation. A multi-source recording can be used to understand not only how the
progression of the interaction is affected by onscreen events (Excerpt 1), but also
why the smartphone use was originally initiated (Excerpt 2). Even if the partici-
pants do not address a smartphone-based activity explicitly, the activity can still
be relevant from an analytical point of view, as individual courses of action can
become recognisably connected to previous or future joint courses of action.
Excerpt 3 (181225_FrenchDog_Ru)
Digital content has to be retrieved on a smartphone prior to being able to show it.
In this case, DAR first needs to locate the folder (l. 01), followed by searching for
the picture she wants to show within the collection of pictures (l. 02, Figure 7.b).
IGO looks at the display during this process and can therefore perceive the picture
that Daria is selecting (Figures 7.a–b). He then begins to lean towards her but, as
the interface does not respond to DAR’s first tapping gesture, his first assessment
appears to be premature (l. 04, Figures 8.a–b). This is also observable in DAR’s
next action: She taps on the picture a second time (l. 04) and, a micro-second
Excerpt 4 (191227_Merlin_Ru)
Figure 10.a–c.
09 (0.6---)+(0.6)
eka: >touches+..moves pen to display->
10 EKA: tut pod+#°p(h)is(h)+ano [bl(h)ja:t'°+]
here (it's) wr°itten [dammit° ]
11 MAR: [°mhehe° ]
eka: >......+ppp w/pen--+,,,,
eka: >gaze display-----------------------+...looks up to MAR->
fig: #fig.12
12 MAR: nu eto v fi- eto novyj fi:l'm tipa,
well it’s in the fi- this is a new fi:lm kinda,
The palm-down position of MAR’s hand and her use of the local deictic eto “this”
reveal her pointing to have the aim of identifying a specific object (Kendon, 2004,
pp. 207–209). However, the lack of tension in her pointing finger and the hovering
movement of her finger and hand over the display (l. 02) hint at a possible uncer-
tainty, which is reflected in her incomplete turn and the hesitation particle. EKA
then identifies the actor as having played the role of Merlin (l. 03). As this infor-
mation is not fully endorsed by MAR (l. 04–05), EKA then moves her right hand
and index finger to the display (l. 06) and points to the caption “Merlin” below
the actor’s picture (l. 08, Figures 11.a–b). MAR explicitly questions this informa-
tion in overlap with this turn (l. 07). Shortly after the overlap has been resolved,
EKA grasps the pen that is lying in the middle of the table (Figure 11.a), moves it
towards the display, and uses it to point at the “Merlin” caption again (l. 09–10,
Figure 12), thus reformulating and placing emphasis on her previous turn (“here
it’s written”, l. 10).
Excerpt 4 illustrates that pointing to displays can be carried out in a variety of ways that, as with pointing to non-digital objects, display how the participants treat the object in question: for example, as an object about which a question is formulated, as something that can substantiate a claim (by referring to a visible detail such as the caption), or as something that substantiates a claim once again.
Different types of resources are used in these instances, such as a finger versus
a pen, tensed versus relaxed fingers, immobile versus hovering pointing, and so
forth. The types and shapes of digital pointing are thus not deployed according to
the objectively small size of the objects to which they refer, but are adapted to the
contingencies of potentially static or dynamic objects and foci of attention, and to
the assumed relevance of the object for the recipient.
While an overall view is indispensable for researchers in order to access the
participants’ bodily orientation towards the device and each other, the details of
Collecting and analysing multi-source data 101
the pointing hands and the things that they are referring to generally remain invisible. Wearable cameras – ideally in combination with screen capture or otherwise accessible files of what is on the screen – can provide a valuable basis for investigating pointing gestures, as well as assessments of the digital content on the screen. This allows for an understanding of how the establishment of joint attention, speaking turns, gestures, and digital objects are interconnected, and which aspects of referring to and assessing digital objects are similar to or different from those in non-digital settings. Moreover, it allows for the discovery
of practices that appear to be specifically adapted to the affordances of the device
and the handling thereof, such as in the event of double assessments. Partially
overcoming the analytical opacity of smartphone use can thus contribute to revis-
iting basic social actions in light of new material features and affordances of the
setting.
Excerpt 5 (181222_AllCameras_Ru)
Figure 13.
Figure 16.
During DAN’s turn, MIH leans towards TIN’s phone, apparently to include him-
self in TIN’s picture (l. 03, Figure 14). Shortly afterwards, DAN shifts her gaze
to TIN’s smartphone, then abandons her turn (l. 04), and also leans towards the
phone (Figure 15). These embodied actions display an interest in joint picture
taking, and TIN and DAN simultaneously suggest taking a picture in which the
wearable cameras are visible (l. 06–07). After the excerpt ends, the participants
continue to pose with and comment on the cameras as props. When the cameras
are finally visible in the frame, TIN points to the camera and presses the
shutter with her other hand (Figure 16, taken after the end of the excerpt). MIH
then asks TIN to send the picture to their group chat, and DAN reinitiates her
previous topic.
This excerpt shows one way in which participants can orient towards the pres-
ence of wearable cameras. Even though wearable cameras are now widely used in
a variety of different settings in daily life, they are still associated with exciting
and difficult activities (Chalfen, 2014). By taking pictures with wearable cam-
eras, or “zooming in” on them when recording videos, the participants show their
orientation towards this equipment as a potentially exceptional object in a cafe.
However, as the wearable camera is simply another photographable object in this
scenario, this excerpt could be included amongst other instances of situated pic-
ture taking with smartphones.
In the present data set, all of the participants agreed to use integrated or freely
downloadable screen capture software. The participants usually activated the
screen capture at the beginning of the event, and received a reminder about the
on-going recording either via notifications or as a small icon on the display (see
Excerpt 4, Figure 11.b). Unlike wearable cameras, the screen capture was rarely
topicalised once the application had been installed and activated. However, the
participants experienced problems with the software (the recording stopped every
time the phone was locked) during one of the events, and decided to activate the
screen capture manually every time they used their smartphones. Excerpt 6 shows
an instance of Daria (DAR) reminding Nina (NIN) to activate the screen capture.
DAR, a psychologist, is listing the mental disorders affecting the people whom she
had previously counselled. This list construction, as part of an extended storytell-
ing, has been going on for some time, and NIN picks up her smartphone at the
beginning of the excerpt.
Excerpt 6 (181229_PanicAttacks_Ru)
NIN
DAR
12 (1.4)
13 NIN: °°hi hi ↑hih°°
14 (2.5)*(0.5)
nin: ->*opens messages->
15 DAR: °mhm°
16 (0.6)
17 DAR: @*.h:: i vo:t.+#
.h:: and we:ll.
spn: @screen capture is recording->>
nin: >*opens a message window and reads->
dar: >+looks away->
fig: #fig.19
18 (2.7)
19 DAR: .ts +tam+ byla takaja shtuka*+ kak+ panicheskie ataki.#
.ts there was such thing as panic attacks.
dar: >+...+looks at NIN-------+SP--+looks at NIN->
nin: >*opens keyboard and types->>
fig: #fig.20
After picking up her phone, NIN first checks her notifications and then scrolls
through her applications (l. 01). In the meantime, DAR continues to talk about
the disorders she had treated, but also shifts her gaze towards NIN’s phone
(l. 03, Figure 17). After closely monitoring NIN’s phone, DAR asks NIN if she
has turned on the screen capture (l. 09). NIN answers with a slightly modified
other-repeat and laughs quietly (l. 11–13, Figure 18). She then locates the screen
capture application and turns it on (l. 11). DAR’s question about the screen capture
demonstrates her momentary self-categorisation as a research participant. DAR
also waits for confirmation that the screen capture has been activated before she
resumes her suspended turn. DAR acknowledges this waiting process (“mhm”,
l. 15), observes the beginning of the recording, and only then returns to the previ-
ous topic (l. 17). A participant’s orientation towards “doing being a research par-
ticipant” can also be used to accomplish certain actions, such as to make jokes or
to moralise about others’ device use (see Robles et al., 2018). In this sense, DAR’s
reminder about the recording might also serve to emphasise that the suspension of
her storytelling has been caused by NIN’s smartphone use; also see DAR’s gaze
back to NIN (l. 19) and her facial expression (Figure 20).
By initiating a side sequence related to the recording application, DAR makes
both the suspension of her telling (also see the gesture-hold of her left hand,
Figures 17–19) and her monitoring of NIN’s device-related activity publicly avail-
able. Apart from the fact that equipment-related sequences might implement mul-
tiple action types, this episode sequentially functions in the same way that any
other side sequence would do in this environment, as it suspends the previously
on-going telling. In this regard, this excerpt could be treated analytically as an
instance of suspension and resumption practices (Helisten, 2017) in general, with-
out having to specifically take its topical relationship to the recording set-up into
account. However, if we observe the manipulation of the device itself closely, it
can be seen that NIN switches from reading the notifications (l. 02) to searching
for and activating the screen recording (l. 05), then goes back to open a messaging
application (l. 14). This makes it more difficult to reconstruct the initially pro-
jected trajectory of NIN’s phone use. Consequently, the activation of the recording
application may interfere with the unrecorded use, as it might lead to different
temporalities and types of navigation on the interface.
This section has illustrated how wearable cameras and screen capture can
appear in the data as additional objects, topics, or activities. On the one hand,
these can be considered not to contaminate the data, as the basic actions and
activities are still accomplished in systematic and recognisable ways (“everything
is a natural something”, Hofstetter, 2021, p. 14). On the other hand, supplemen-
tary recording devices can also modify the sequential organisation of smart-
phone-based actions. A possible superimposed orientation towards the recording
equipment can therefore render an occurrence of this specific action less typical.
Accordingly, the analytical adequacy of certain types of video data depends on
the phenomena under scrutiny, with some phenomena and actions being less pro-
totypical. However, such topicalised interferences of the recording equipment are
rare in our data. Furthermore, as they are largely traceable in the interactions, the
specificity of these situations can be taken into account when analysing the data
with regard to a particular phenomenon.
Note
1 The data were collected as part of the project “Smart Communication: The situ-
ated practices of mobile technology and lifelong digital literacies” (funded by the
Eudaimonia Institute, University of Oulu 2018–2022, and the Academy of Finland,
2019–2023, project number: 323848).
References
Asplund, S.-B., Olin-Scheller, C., & Tanner, M. (2018). Under the teacher’s radar: Literacy
practices in task-related smartphone use in the connected classroom. L1-Educational
Studies in Language and Literature, 18, 1–26. https://doi.org/10.17239/L1ESLL-2018.18.01.03
Bolden, G. (2004). The quote and beyond: Defining boundaries of reported speech in
conversational Russian. Journal of Pragmatics, 36(6), 1071–1118. https://doi.org/10
.1016/j.pragma.2003.10.015
Brown, B., McGregor, M., & Laurier, E. (2013). iPhone in vivo: Video analysis of mobile
device use (pp. 1031–1040). https://doi.org/10.1145/2470654.2466132
Brown, B., McGregor, M., & McMillan, D. (2015). Searchable objects: Search in everyday
conversation (pp. 508–517). https://doi.org/10.1145/2675133.2675206
Brown, B., O’Hara, K., McGregor, M., & McMillan, D. (2018). Text in talk: Lightweight
messages in co-present interaction. ACM Transactions on Computer-Human
Interaction, 24(6), 42:1–42:25. https://doi.org/10.1145/3152419
Chalfen, R. (2014). ‘Your panopticon or mine?’ Incorporating wearable technology’s Glass
and GoPro into visual social science. Visual Studies, 29(3), 299–310. https://doi.org/10
.1080/1472586X.2014.941547
DiDomenico, S. M., & Boase, J. (2013). Bringing mobiles into the conversation: Applying
a conversation analytic approach to the study of mobiles in co-present interaction. In
D. Tannen & A. Trester (Eds.), Discourse 2.0: Language and new media (pp. 119–131).
Georgetown University Press.
DiDomenico, S. M., Raclaw, J., & Robles, J. S. (2020). Attending to the mobile text
summons: Managing multiple communicative activities across physically copresent
and technologically mediated interpersonal interactions. Communication Research,
47(5), 669–700. https://doi.org/10.1177/0093650218803537
Francis, D., & Hester, S. (2004). An invitation to ethnomethodology: Language, society
and social interaction. Sage Publications. https://doi.org/10.4135/9781849208567
Goodwin, C. (1981). Conversational organization: Interaction between speakers and
hearers. Academic Press.
Goodwin, C. (2000). Action and embodiment within situated human interaction. Journal
of Pragmatics, 32(10), 1489–1522. https://doi.org/10.1016/S0378-2166(99)00096-X
Gordon, C. (2013). Beyond the observer’s paradox: The audio-recorder as a resource for
the display of identity. Qualitative Research, 13(3), 299–317. https://doi.org/10.1177
/1468794112442771
Haddington, P., Keisanen, T., Mondada, L., & Nevile, M. (2014). Towards multiactivity as
a social and interactional phenomenon. In P. Haddington, T. Keisanen, L. Mondada, &
M. Nevile (Eds.), Multiactivity in social interaction: Beyond multitasking (pp. 3–32).
John Benjamins. https://doi.org/10.1075/z.187.01had
Hazel, S. (2016). The paradox from within: Research participants doing-being-observed.
Qualitative Research, 16(4), 446–467. https://doi.org/10.1177/1468794115596216
Heath, C., Hindmarsh, J., & Luff, P. (2010). Video in qualitative research: Analysing social
interaction in everyday life. Sage. https://doi.org/10.4135/9781526435385
Helisten, M. (2017). Resumptions as multimodal achievements in conversational (story)
tellings. Journal of Pragmatics, 112, 1–19. https://doi.org/10.1016/j.pragma.2017.01.014
Hellermann, J., Thorne, S. L., & Fodor, P. (2017). Mobile reading as social and embodied
practice. Classroom Discourse, 8(2), 99–121. https://doi.org/10.1080/19463014.2017
.1328703
Hendry, G., Wiggins, S., & Anderson, T. M. (2016). Are you still with us? Managing
mobile phone use and group interaction in PBL. Interdisciplinary Journal of Problem-
Based Learning, 10(2). https://doi.org/10.7771/1541-5015.1600
Heritage, J. (1984). A change-of-state token and aspects of its sequential placement. In J.
M. Atkinson & J. Heritage (Eds.), Structures of social action (pp. 299–345). Cambridge
University Press. https://doi.org/10.1017/CBO9780511665868.020
Hofstetter, E. (2021). Analyzing the researcher-participant in EMCA. Social Interaction:
Video-Based Studies of Human Sociality, 4(2), Article 2. https://doi.org/10.7146/si.v4i2
.127185
Hutchby, I., O’Reilly, M., & Parker, N. (2012). Ethics in praxis: Negotiating the presence
and functions of a video camera in family therapy. Discourse Studies, 14(6), 675–690.
https://doi.org/10.1177/1461445612457487
Kendon, A. (2004). Gesture: Visible action as utterance. Cambridge University Press.
https://doi.org/10.1017/CBO9780511807572
Laurier, E., Brown, B., & McGregor, M. (2016). Mediated pedestrian mobility: Walking
and the map app. Mobilities, 11(1), 117–134. https://doi.org/10.1080/17450101.2015
.1099900
Laurier, E., & Philo, C. (2006). Natural problems of naturalistic video data. In H.
Knoblauch, B. Schnettler, J. Raab, & H.-G. Soeffner (Eds.), Video-analysis
methodology and methods, qualitative audiovisual data analysis in sociology (pp.
183–192). Peter Lang.
Licoppe, C., & Figeac, J. (2018). Gaze patterns and the temporal organization of multiple
activities in mobile smartphone uses. Human–Computer Interaction, 33(5–6), 311–334.
https://doi.org/10.1080/07370024.2017.1326008
Lomax, H., & Casey, N. (1998). Recording social life: Reflexivity and video methodology.
Sociological Research Online, 3(2), 121–146. https://doi.org/10.5153/sro.1372
Mantere, E., & Raudaskoski, S. (2017). The sticky media device. In A. R. Lahikainen,
T. Mälkiä, & K. Repo (Eds.), Media, family interaction and the digitalization
of childhood (pp. 135–154). Edward Elgar Publishing. https://doi.org/10.4337
/9781785366673.00018
Mantere, E., Raudaskoski, S., & Valkonen, S. (2018). Parental smartphone use and
bystander ignorance on child development. In M. Loicq, S. Aude, & I. Féroc Dumez
(Eds.), Les cultures médiatiques de l’enfance et de la petite enfance (pp. 98–113).
Editions du Centre d’études sur les Jeunes et les Médias.
Mondada, L. (2013). The conversation analytic approach to data collection. In J. Sidnell &
T. Stivers (Eds.), The handbook of conversation analysis (pp. 32–56). Wiley-Blackwell.
https://doi.org/10.1002/9781118325001.ch3
Mondada, L. (2014). The temporal orders of multiactivity: Operating and demonstrating in
the surgical theatre. In P. Haddington, T. Keisanen, L. Mondada, & M. Nevile (Eds.),
Multiactivity in social interaction: Beyond multitasking (pp. 33–76). John Benjamins.
https://doi.org/10.1075/z.187.02mon
Mondada, L. (2018). Multiple temporalities of language and body in interaction: Challenges
for transcribing multimodality. Research on Language and Social Interaction, 51(1),
85–106. https://doi.org/10.1080/08351813.2018.1413878
Mondada, L. (2022). Conventions for multimodal transcription. Online resource. https://www.lorenzamondada.net/multimodal-transcription
Oloff, F. (2019). Das Smartphone als soziales Objekt: Eine multimodale Analyse von
initialen Zeigesequenzen in Alltagsgesprächen. In K. Marx & A. Schmidt (Eds.),
Interaktion und Medien: Interaktionsanalytische Zugänge zu medienvermittelter
Kommunikation (pp. 191–218). Universitätsverlag Winter.
Oloff, F. (2021). Some systematic aspects of self-initiated mobile device use in face-to-face
encounters. Journal für Medienlinguistik, 2(2), 195–235. https://doi.org/10.21248/jfml
.2019.21
Porcheron, M., Fischer, J. E., & Sharples, S. (2016). Using mobile phones in pub talk. In
CSCW ’16: Proceedings of the 19th ACM conference on computer-supported cooperative
work & social computing (pp. 1649–1661). https://doi.org/10.1145/2818048.2820014
Porcheron, M., Fischer, J. E., & Sharples, S. (2017). “Do animals have accents?” Talking
with agents in multi-party conversation. In CSCW ’17: Proceedings of the 20th ACM conference on computer-supported cooperative work & social computing (pp. 207–219).
https://doi.org/10.1145/2998181.2998298
Potter, J. (2002). Two kinds of natural. Discourse Studies, 4(4), 539–542. https://doi.org/10
.1177/14614456020040040901
Raclaw, J., Robles, J. S., & DiDomenico, S. M. (2016). Providing epistemic support for
assessments through mobile-supported sharing activities. Research on Language and
Social Interaction, 49(4), 362–379. https://doi.org/10.1080/08351813.2016.1199089
Raudaskoski, S. (2009). Tool and machine: The affordances of the mobile phone. Tampere University Press.
Raudaskoski, S., Mantere, E., & Valkonen, S. (2017). The influence of parental smartphone
use, eye contact and ‘bystander ignorance’ on child development. In A. R. Lahikainen,
T. Mälkiä, & K. Repo (Eds.), Media, family interaction and the digitalization of
childhood (pp. 173–184). Edward Elgar. https://doi.org/10.4337/9781785366673.00021
Robles, J. S., DiDomenico, S., & Raclaw, J. (2018). Doing being an ordinary technology
and social media user. Language and Communication, 60, 150–167. https://doi.org/10.1016/j.langcom.2018.03.002
Sacks, H. (1984). Notes on methodology. In J. M. Atkinson & J. Heritage (Eds.), Structures
of social action: Studies in conversation analysis (pp. 21–27). Cambridge University
Press.
Sahlström, F., Tanner, M., & Valasmo, V. (2019). Connected youth, connected classrooms:
Smartphone use and student and teacher participation during plenary teaching.
Learning, Culture and Social Interaction, 21, 311–331. https://doi.org/10.1016/j.lcsi.2019
.03.008
Speer, S. A. (2002). ‘Natural’ and ‘contrived’ data: A sustainable distinction? Discourse
Studies, 4(4), 511–525. https://doi.org/10.1177/14614456020040040601
Speer, S. A., & Hutchby, I. (2003). From ethics to analytics: Aspects of participants’
orientations to the presence and relevance of recording devices. Sociology, 37(2), 315–
337. https://doi.org/10.1177/0038038503037002006
Stivers, T. (2004). “No no no” and other types of multiple sayings in social interaction.
Human Communication Research, 30(2), 260–293. https://doi.org/10.1111/j.1468-2958
.2004.tb00733.x
Streeck, J., Goodwin, C., & LeBaron, C. (Eds.). (2011). Embodied interaction. Language
and body in the material world. Cambridge University Press.
Tuncer, S. (2016). The effects of video recording on office workers’ conduct, and the validity
of video data for the study of naturally-occurring interactions. Forum Qualitative
Sozialforschung/Forum: Qualitative Social Research, 17(3), Article 3. https://doi.org/10.17169/fqs-17.3.2604
6
FROM DISTRIBUTED ECOLOGIES TO
DISTRIBUTED BODIES IN INTERACTION
Capturing and analysing “dual
embodiment” in virtual environments
Introduction
The human body is an important resource for producing and ascribing mean-
ings to social actions in face-to-face interaction. It constitutes an element – or a
“semiotic field” – among others (talk with its language structure, the shapes and
structures of the material environment, etc.) within emerging “contextual con-
figurations” that participants use to build and interpret actions (Goodwin, 2000, p.
1490, 2018, p. 134). Embodied participation frameworks – that is, different kinds
of constellations of human bodies in interaction – are embedded elements in contextual configurations. In them, the human body is used to accomplish joint focus on a referent or to establish mutual orientation between participants. Hence, they are
also frames for producing and interpreting social actions (Goodwin, 2018, p. 235).
Technology-mediated interactions provide their own contextual configurations for establishing embodied participation frameworks. This is because the
technological platforms that are used for interaction provide setting-specific
affordances for using embodied resources (e.g., eye contact, gestures, and body
movements) and establishing joint focus and mutual orientation. This can lead to
co-participants’ fragmented and limited access to shared multimodal (embodied
or material) resources, which in turn affects their use as resources for producing
and interpreting social actions. For example, Heath and Luff (1992) and Luff et al.
(2003) use the terms “communicative asymmetry” and “fractured ecologies” to
analyse the problems involved in building a shared perspective in video-mediated
interactions, when co-participants do not share a physical space.
Current graphical three-dimensional virtual environments (VEs), such as vir-
tual worlds (VWs) and immersive virtual realities (VRs), are spaces where users
can interact in shared virtual spaces and form embodied participation frameworks
DOI: 10.4324/9781003424888-8
112 Laura Kohonen-Aho and Pentti Haddington
with others by using a virtual body, an avatar. In contrast to the “living physical body”, the avatar is an artificial virtual character. Still, participants orient to avatar bodies as interactional resources, for example, when (re-)opening encounters
in VEs (Kohonen-Aho & Vatanen, 2021). In VEs, the participants inhabit two
worlds: the physical world and the virtual world. They also operate with two bod-
ies: the physical and the avatar body. This is a unique feature of VE interaction
(Cleland, 2010; Strain, 1999). In this chapter, we use the term “dual embodiment”
to refer to this phenomenon.
This chapter explores how dual embodiment features in the organisation of
action in VEs. First, we analyse a case in which a participant’s virtual body is
partly and momentarily inaccessible to the participant themselves while it remains
accessible to the co-participants. Second, we analyse a case in which a partici-
pant’s physical body is invisible and not accessible to the co-participants. We
show how fragmented or limited joint access to embodied resources caused by
dual embodiment is consequential for the intelligibility and coordination of joint
action. We also show how establishing joint access to an embodied resource for
this or that social end may require the use of other resources, such as talk or the
participants’ private physical actions.
Finally, we highlight the importance of studying dual embodiment as a fea-
ture of joint action in VEs. We show how recording the participants’ actions and
interactions both in the virtual and physical environments provides the analyst
access to new participant or “member’s” perspectives. This introduces important
theoretical and methodological questions for EMCA research, which are discussed
at the end of the chapter.
to refer to the extension of one’s physical body and the interplay of the actions of
the two bodies in VE contexts, rather than as the virtual body being separate from
the physical body (Cleland, 2010; Strain, 1999). In this chapter, we argue that in
VEs a social action’s intelligibility can be contingent on dual embodiment; either
manifestation of the body can be used as a resource for producing and ascribing
meanings to social action.
The virtual bodies in VEs are called avatars, and they are operated by human
users in real time. Depending on the VE, avatars have different appearances and
embodied capabilities. In most VEs, users can modify their avatar’s appearance,
for example, by changing its skin and hair colour, body shape, and clothing. Most
avatars can move, gesticulate, and change their facial expressions. Common ava-
tar movements include walking, flying, jumping, and turning around.
Depending on the VE, avatar movements and gestures are produced with
different devices. In desktop VWs, such as Second Life, typical avatar gestures
include pointing, waving, and clapping. In some cases, users can use pre-defined
gestures, “emblems”, or facial expressions (e.g., smiling) that are selected from a
library of avatar features. One challenge involved in the use of pre-defined ges-
tures is the difficulty of controlling their timely production and duration (Moore
et al., 2006). The avatar’s movement is controlled by the mouse and specific keys
on the keyboard. In immersive VR, gesticulation and movement are different. The
system detects in real time the user’s head, body, and hand movements (e.g., point-
ing and waving) from a head-mounted display (HMD) and hand-held controllers
that the user is wearing and translates them into avatar movements. Consequently,
they tend to appear more accurate in terms of direction, timing, and duration
than in VWs. The controllers are also used to activate and manipulate virtual
objects. Facial expressions, on the other hand, are often generated automatically
by the system; for example, the avatar’s lips move when the system recognises
that the user speaks. More sophisticated motion-tracking gear, such as full-body
motion capture suits, are also being developed. All in all, the user’s actions with
the physical devices usually have consequences for their avatar actions in the VE
(Gamberini & Spagnolli, 2003).
Embodied avatar interaction has been a popular research topic since the first
“blockie”-shaped avatars (Bowers et al., 1996). Even though the design purpose of
most VEs has long been to develop avatars that realistically resemble human conduct, the (in)accuracy and (un)intelligibility of avatar actions have still raised the interest of researchers. Moore et al. (2006) note that despite the increasing visual
realism of VEs, avatars still lack interactional sophistication in displaying the
users’ intentions or current state. For example, the way in which an avatar’s gaze
appears to others can differ from its user’s actual view of the environment (Moore
et al., 2006). This may have consequences for, or even disrupt, the accountability
and intelligibility of actions (Robinson, 2016) in VEs.
These observations about interaction in VE were already raised by Hindmarsh
et al. (2006). They analysed how the embodied features of avatars affect or
disrupt the organisation of interaction in VEs. First, they showed how VE inter-
action can become fragmented because, depending on the interactants’ current
body and gaze direction, the VE is available to them in different ways. Second, they showed that participants easily presuppose mutual accessibility and availability when interacting, while, in fact, their perspectives on one another and the
surrounding world are different. Third, they showed that because the VE tech-
nology fails to display the complete trajectory of an avatar’s embodied action,
and part of the trajectory remains hidden, the interpretability of that action is
compromised. Finally, participants assume that their gestures are available to their co-participants when appearing in their own visual field, while the co-participants may not see the gesture at all. In sum, witnessing, recognising, and ascribing meanings
to the visible features of co-participants’ conduct in VEs seems to depend on
the avatar’s exact body position and gaze direction (Hindmarsh et al., 2006).
Thus, it can be difficult for one participant to assess whether a co-participant
sees or does not see the gesture they have produced.1 The next section discusses
the collection of parallel videos from the virtual and the physical environment
and argues for its importance for identifying and analysing dual embodiment in
VE interaction.
to behave in a particular way but where interaction can unfold naturally within
the given context, potentially providing novel insights into social interaction
(Kendrick, 2017).
Yet another aspect of collecting video from VE interaction relates to the partic-
ipants’ simultaneous presence in parallel physical and virtual spaces. Gamberini
and Spagnolli (2003) argue that it is necessary to capture all the digital, physi-
cal, and local resources that participants use when they operate technologies.
EMCA researchers frequently use multiple cameras to capture the events of an
interactional situation from different perspectives. However, multiple cameras are
used within the same contextual configuration. For VEs, on the other hand,
Gamberini and Spagnolli (2003) propose the use of a “split-screen technique”
where the video recording of the avatar is merged and synchronised with a record-
ing of the user in the physical environment. The merged videos make it possible
to analyse the connection between the users’ physical movements with a keyboard
or controllers and the avatar actions in the VE. In other words, they offer two
perspectives of the “same” action in different contextual configurations which
are not similarly accessible to all participants, thus offering the analyst a richer
perspective to the shape and delivery of multimodal social action in VEs (for
more examples of similar recording setups, see Hindmarsh et al., 2006; Kohonen-
Aho & Alin, 2015; Steier, 2020). Next, we introduce the settings and data that
were used to study dual embodiment. Then, we illustrate with two excerpts how
the participants orient to their own physical body and each other’s virtual bod-
ies in VE interaction. We show how dual embodiment can be consequential for
understanding bodily appearances and for producing and ascribing meanings to
embodied actions in VEs.
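Gamberini and Spagnolli (2003) do not prescribe a particular tool for synchronising the parallel recordings before merging them. As a purely illustrative sketch of one common approach, and not part of the method described in this chapter, the temporal offset between two parallel recordings can be estimated by cross-correlating their audio tracks; the function name and the toy signals below are hypothetical:

```python
import numpy as np

def estimate_offset(track_a, track_b, sample_rate):
    """Return the time (in seconds) by which track_b lags track_a,
    found as the peak of the cross-correlation of the two signals."""
    # Normalise both tracks so level differences do not bias the peak.
    a = (track_a - track_a.mean()) / (track_a.std() + 1e-9)
    b = (track_b - track_b.mean()) / (track_b.std() + 1e-9)
    corr = np.correlate(b, a, mode="full")    # correlation at every possible lag
    lag = int(corr.argmax()) - (len(a) - 1)   # peak index converted to a lag in samples
    return lag / sample_rate

# Toy check: track_b is track_a delayed by 0.5 s at a 100 Hz "sample rate".
rate = 100
rng = np.random.default_rng(0)
track_a = rng.standard_normal(400)
track_b = np.concatenate([np.zeros(50), track_a])[:400]
offset = estimate_offset(track_a, track_b, rate)
```

The estimated offset can then be used to trim one recording before the two videos are placed side by side in editing software.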
private island. Each team member used an avatar that was created for them by
the researchers. All three avatars in a team had a standard male appearance.
During the collaboration session, in the physical world, the three participants sat
in adjacent rooms where they could not see or hear each other; they could only see
one another’s avatars on their computer displays and speak through an audio con-
nection. The team members did not know each other, and they met face-to-face
for the first time after the session. Before the session began, the participants were
given instructions on how to move with the avatar.
Figure 6.1 illustrates the data collected in Second Life. We used screen capture
software to record one VE video from the researcher’s computer, recording all
three avatars from a third-person perspective. This was made possible by placing
the researcher’s avatar in the virtual space with the participants but putting it out
of sight under the floor. The three participants in the physical space were video
recorded with remote-controlled cameras in three adjacent rooms. The cameras
were positioned to the side, at an angle that captured the computer displays in
the video frame. An alternative solution for getting the participants’ perspective on
the VE would have been to use screen capture software for recording the events as
they unfolded on their displays. However, this would have resulted in six record-
ings (three VE videos and three PE videos), making the analysis arduous. As a
compromise, the PE videos provided access to each participant and their private
perspective on the VE through the computer display, which turned out to be suf-
ficient for the analysis.
Rec Room is a multiplayer virtual reality game environment with possibilities
to play, for example, paintball, charades, or disc golf. The users can customise
FIGURE 6.1 Parallel videos in the VW data in Second Life.
Dual embodiment in virtual environments 117
FIGURE 6.2 Parallel videos in the VR data in Rec Room.
their avatar appearance, but unlike in Second Life, the avatar body consists of a
head, an upper body, and hands, with no lower body (see Figure 6.2). The players interact
by moving and gesturing with their avatars, activating and moving objects, and
talking in real-time through an audio connection. Facial expressions are generated
automatically: The avatars’ lips move when the user talks and the avatars frown
when something hits their body or face. The avatars can be operated only from a
first-person perspective.
The data were recorded as part of a university course on interaction analy-
sis. Twelve volunteers were recruited (not students in the course), and they were
divided into six pairs and instructed to stay together and freely interact and explore
the environment. Both participants’ first-person perspectives in the VE (see the top
frames in Figure 6.2) and the in-game stereoscopic sound were recorded sepa-
rately with screen-capture software from two computers that ran the Rec Room
game. The participants’ talk was recorded from the microphones on the HMDs.
Since the two participants were physically in the same space, only one PE record-
ing was made with a 360-degree video camera and high-quality microphones. The
participants move in the virtual space either by using a teleportation feature or
the game’s menu to select and move to a specific place in the game. In the physi-
cal space, the participants can move within a small and delimited area defined by
the VR gear. Approaching the boundaries of the area triggers a grid on the HMD
warning the user not to step over the boundary. In principle, this prevents the par-
ticipants from entering each other’s physical space. The HMDs were connected
to a computer with a cable, which constrained the participants’ movements. The
users wore earphones to hear one another and the in-game sounds in Rec Room.
Some participants chose to wear only one of the earphones to hear both the game
sounds and the talk in the lab.
FIGURE 6.3 Hugo’s avatar turns to Antti’s avatar and takes a few steps back.
FIGURE 6.4 Hugo is controlling his avatar’s movements.
token (Heritage, 1984), treating the answer as news. It suggests that Hugo was
expecting a different answer, for example, that the others also see their own shirts
as black (lines 5–7). Then Hugo asks another question about the appearance of
his shirt in line 8: lukeeks täs (.) mustas paidas ((yliopisto)) “does this black shirt
read ((university))”. The question is motivated by the fact that while Hugo can see
and read the text on his co-participants’ shirts, he cannot see the front of his own
shirt. This is a dual embodiment problem: Even though he is controlling the vir-
tual body, he does not have visual access to the front of the avatar’s body. With the
question, he checks whether the same text appears on all their shirts even though
their colours differ. As he asks the question, to solve the dual embodiment
problem, he positions Hugo-A’s body so that it faces Antti-A, making the text on
the shirt available to him (Figure 6.3). Antti confirms that the shirt has the uni-
versity logo on it. The turn-final ja- “and-” suggests that he plans to continue the
turn, but he then cuts it off (line 8). Hugo acknowledges Antti’s description with
okei “okay” in line 9, after which Iiris also confirms seeing the logo on Hugo-A’s
shirt (line 11).
Next, in overlap with Iiris’ confirmation, Hugo moves on to turning his avatar
around 360 degrees, possibly to see its front side. This private
action is another attempt at solving the dual embodiment problem.
Simultaneously, he verbalises his attempts in lines 10–12. Hugo’s voice fades dur-
ing the turn, indicating that the turn is produced as self-talk, indeed reflecting his
private attempt to solve the issue, rather than as a genuine question to the others.
This is also how Iiris and Antti interpret Hugo’s turn, because they do not respond
to it. After a 0.5-second pause (line 13), Iiris continues to talk about Hugo’s shirt,
stating that the text on it is small (lines 14–15). At the same time, Hugo finds a
solution to see the front side of his own avatar and solves the dual embodiment
problem. The PE video shows that by pressing the ALT-key and simultaneously
moving his mouse, he is able to turn Hugo-A around so that it faces Hugo on the
computer display (Figure 6.4). This confirms that his aim earlier was to face his
avatar. At this point, Hugo has gained visual access to the text on his avatar’s
shirt, which is evidenced by his noticing turn ai kato lukee siin ((yliopisto)) “oh
look, it does read ((university))” in line 16. Iiris then confirms Hugo’s discovery
(line 17).
Excerpt 1 shows how in VE interaction a participant may lose visual access
to (the full appearance of) their avatar’s body. In VE interactions, this usually
occurs with the first-person perspective but can also occur with the third-person
perspective active, for example, when viewing one’s avatar from behind. The
analysis shows how Hugo attempts to solve the problem of not seeing a part of
his avatar body by using verbal and social resources (i.e., asking the others for
more information about the avatar’s appearance) and by using private, embodied
resources, to figure out the correct keyboard and mouse configurations to see the
front of the avatar’s body. The PE video shows why Hugo does not have access
to his own avatar shirt: His own computer display does not show it. Additionally,
the PE video reveals Hugo’s orientation when he moves from asking his team-
mates for help to trying to solve the dual embodiment problem himself by using
the mouse and the keyboard to control his avatar. These actions are not visible in
the VE video.
In sum, Excerpt 1 shows how dual embodiment in VE can become evident as
limited visual access to one’s own avatar body; the person steering the avatar can-
not see their own avatar body in the same way as their co-participants can. This res-
onates with the analysis on the intelligibility of pointing gestures and fragmented
interaction in VR by Hindmarsh et al. (2006): The user’s perspective may limit or
prevent visual access to (parts of) their own avatar body, which in turn may lead to
fragmented interaction. This has consequences for interaction: One cannot assume
that one’s co-participants have full visual access to the details and appearance of
their own avatar even though that avatar is present and in use.
ribs and life vest. At the beginning of the excerpt, Pertti changes his strategy: He
throws away the existing drawings and produces a gesture next to his waist and
verbally directs Heikki’s attention to his pants.
FIGURE 6.5 Pertti produces the gesture around his waist while at the same time the virtual pen and hand keep appearing and disappearing.
02 HEI: ba[dge.]
03 PER: [pan ]ts. (.) yeah*
per: -*
per-A: -*
04 (0.4)
05 HEI: pants? (.) oh, pants.
06 PER: I’ve got (.) uhh.
07 *(2.4) *
per: *shakes hands in the air, palms up*
per-A: *hands shaking--------------------*
08 PER: *mhh
per: *drops hands down
per-A: *drops hands down
09 HEI: I think I know what you mean,
10 but I can’t remember the English word
pocket? Heikki’s turn in line 18, =that didn’t look like a pocket, is important,
because it verbalises the problem of ascribing meaning to Pertti’s gesture. After
laughing and shrugging, Pertti explains the strategy he used to describe the target
word I was trying to highlight the areas and reproduces the gesture he made ear-
lier in front of his waist.
Excerpt 2 illustrates the possible interactional challenges involved in producing
recognisable and intelligible embodied actions in VR. It shows how such chal-
lenges may be occasioned by the co-participants’ orientation to different manifes-
tations of the distributed bodies, the virtual or the physical. When Pertti uses his
physical body as a reference point for the gesture, he builds on assumed equiva-
lence between the appearances of his physical and virtual bodies. The form of
Pertti’s gesture, as shown in the PE video, reveals that it is designed with respect
to his physical body (waist/pants). However, this is not intelligible to Heikki partly
because the avatar lacks a lower body. Indeed, as Moore et al. (2006, p. 272) argue,
in VR, it is not what one intends to do that matters but what is publicly displayed
and becomes evident as a social action.
Moore et al. (2006) also suggest that avatar movements are less intelligible than
physical ones because they are less sophisticated. However, Excerpt 2 shows that
the challenge is not only related to the crudeness of the avatar body but is also
deeply rooted in dual embodiment; despite being co-present in the physical and
the virtual environments, the participants orient to different manifestations of their
bodies: Pertti to his physical body and Heikki to Pertti’s avatar body. Pertti’s physi-
cal body remains inaccessible to Heikki, although it is precisely the physical body
that would be needed for ascribing meaning to the gesture. Simultaneously, the
motion capture technology used in VR seems to make it effortless for the partici-
pants to use their physical bodies, in rich and subtle ways, to produce embodied
actions. Nevertheless, as Excerpt 2 shows, they can also lose access to how the
technology replicates their movements and actions to the avatar.
Finally, the use of both VE and PE video made it possible to analyse the subtle
features of embodied social action. PE video provided access to Pertti’s physical
body that produced the gesture within its “private” contextual configuration, while
for Heikki, Pertti’s gesture became intelligible only within the virtual contextual con-
figuration and embodied participation framework that they both shared (Goodwin,
2000, 2007). In the following, we discuss how the above observations invite us to
rethink some theoretical and methodological questions and principles in EMCA.
Discussion
From embodied participation frameworks and distributed
ecologies to distributed bodies in VE interaction
New technologies provide new environments for human–human interaction and,
therefore, new forms of “contextual configurations” (Goodwin, 2000) and new
possibilities for establishing embodied participation frameworks (Goodwin, 2018;
see also Haddington & Oittinen, 2022). Moreover, in VEs, one participant inhab-
its two bodies simultaneously, the virtual and the physical. We showed how the
two bodies function variously as interactional resources for action production and
action ascription in VE interaction. Thus, we argue for the importance of consid-
ering “dual embodiment” and its relevance for understanding participants’ pro-
cedures for making social action intelligible (i.e., accountable) in VEs. This said,
dual embodiment cannot be treated as something that by default characterises VE
interactions. Rather, VEs afford dual embodiment; it is a contingent phenomenon
that may become relevant for the participants and consequential for the coordina-
tion of joint action. Dual embodiment may lead to divergent orientations of the
body as a resource and influence a social action’s intelligibility and accountability,
requiring further interactional work to coordinate action.
We analysed two excerpts that illustrate how participants in VE may – due to
dual embodiment – occasionally lose shared access to a body (part) that is relevant
for the intelligibility of action. Excerpt 1 showed how participants may lose access
to their own virtual body, which may in turn become an issue that needs a joint
resolution. Excerpt 2 showed how the participants attended to the same embodied
action (an iconic and indexical gesture) but oriented to different manifestations of
the body when producing the action and ascribing a meaning to it: One of them to
the physical body and the other to the virtual body. In the excerpts, dual embodi-
ment disrupts the contextual configuration and the embodied participation frame-
work between the participants, affecting the accountability and intelligibility of the
embodied action, which is later resolved through talk.
These observations resonate more broadly with EMCA research on video-
mediated technologies that afford interaction and collaboration across remote
environments (e.g., Heath & Luff, 1992; Luff et al., 2003). The challenge that these
environments – and their distributed ecologies – present for the intelligibility of
embodied action originates in them being produced in a scene (physical environ-
ment) that is different from the scene in which they are received and interpreted (a
computer display). They make possible situations where participants do not have
equal access to the resources needed for joint action. In other words, they have
divergent or fragmented contextual configurations (Goodwin, 2000) which can
lead to communicative asymmetry and fractured interaction.
VEs add a new layer to this asymmetry and invite us to enrich the con-
cept of “distributed ecology”. In contrast to video-mediated settings, VEs create
a shared ecology for distributed participants and provide them with an array of
common resources that they can orient to and use for joint action. However, this
shared ecology can be misleading; the fact that the participants still inhabit dis-
tributed ecologies becomes evident, as our analysis shows, in the form of dual
embodiment. VEs provide an additional virtual body to the users that is controlled
and operated by the physical body, sometimes from different perspectives (first-
person and third-person perspectives). As a result, in VEs, actions and activities
are produced and interpreted with distributed bodies. This can fragment, as we
analyst to not alienate themselves from the participants’ social realities (Arminen
et al., 2016).3
The second methodological question concerns the quasi-experimental nature
of the studies in which the interactions were recorded and whether they reflect the
endogenous organisation of social interaction between people who would meet
“naturally” to play. The settings in both cases were partly the source of the partici-
pants’ problems. In Excerpt 1, especially, the avatar appearances and the colours
of the shirts were defined by the researchers. Thus, it is possible to argue that
the interactional problem concerning the virtual body would not have arisen had
the participants been recorded in a naturally occurring situation and created their
own avatars. However, at the same time, the quasi-experiments showed the par-
ticipants’ orientation to their physical and virtual bodies, allowing the analysis to
reveal the existence of dual embodiment and its relevance for joint social action.
As different VE environments become more common in the future, these analyses
can be complemented and verified with recordings made in naturally occurring
situations.
Third, EMCA scholars who work with video are generally familiar with observ-
ing the participants from a third-person perspective, that is, “from the outside”.
In VEs – and especially in immersive VR (see Excerpt 2) – the recordings are
made from a participant’s first-person perspective. This has consequences for the
analysis. On the one hand, the first-person perspective provides straightforward
access to the recorded participant’s perspective; the analyst shares the view of the
recorded participant. At the same time, however, with the first-person perspective,
the analyst may miss information about the broader context that a recording made
from a third-person perspective would offer.4 As to the VR data in this study,
however, it is important to note that the recordings used for analysis are two-
dimensional videos captured from computer screens. They differ greatly from, and
cannot capture, the participant’s full experience of being immersed in VR
(McIlvenny, 2020). An advanced solution for the analyst to access the participant’s
immersive experience is AVA360VR, a software tool designed for collecting and
analysing “volumetric” data (see McIlvenny, 2019, 2020; McIlvenny & Davidsen,
this volume). AVA360VR allows the analyst to “enter” a 360-degree recording
with virtual glasses and experience the recorded situation from a first-person per-
spective that the participants had during their interaction.
To conclude, as long as existing and emerging VE technologies offer interac-
tional affordances that diverge from the ones that are familiar to us from face-
to-face interaction, such as the possibility of having two bodies (of which the
physical one remains private to its user), the users of VEs are likely to continue
encountering situations where they cannot rely on the mechanisms of face-to-face
interaction and hence come up with new setting-specific strategies to ensure the
intelligibility of joint action (see, e.g., Bennerstedt & Ivarsson, 2010; Kohonen-
Aho & Vatanen, 2021). Capturing these requires the development of the EMCA
methodology in a direction such as the one we have proposed. Although parallel
videos should be used with caution because they offer the analyst more informa-
tion than the participants have themselves, we argue that they are necessary for
complementing the analysis of multimodal interaction as well as for exploring the
participants’ orientation and mutual action in VEs.
Notes
1 One possible reason for not seeing a co-participant’s embodied action may be related to
the system: the field of vision with HMDs is narrower than the human field of vision, which
makes it more difficult to see from the corner of one’s eye.
2 This opens a broader question related to what the participants can be expected to
“know” about how the features of the virtual environment affect their actions. It is
true that the more experience an individual has of operating in VEs, the more they
may be able to design their actions in view of the specific affordances in it. However,
“knowledge” or “experience” does not necessarily translate into action. As to the dual
embodiment puzzle in particular, even if one “knows” that their avatar does not have
the lower part of the body, it does not mean that the individual stops using their real
lower body as a resource for producing an embodied action in VE.
3 There is also another feature related to the information gap between analysis and par-
ticipants that we have not addressed. In the Second Life data, the participants sat in
separate rooms and did not have any sensory access to each other’s bodies. In the Rec
Room data, however, the participants were co-present in the same room. Although the
HMD blocked their visual access to the physical appearance of their co-participant, it
is impossible to establish, unless made explicit, whether and when the participants could hear or
sense each other’s physical presence. For example, the participants may be able to hear
talk and sounds from the physical environment through the earphones. Some partici-
pants also deliberately wore only one of the earphones to gain better auditory access to
the physical environment.
4 One way to overcome this problem is that the researcher enters the same virtual
environment with the recorded participants and follows their actions through the
virtual glasses that are also used for capturing their interaction from a third-person
perspective.
References
Arminen, I., Licoppe, C., & Spagnolli, A. (2016). Respecifying mediated interaction.
Research on Language and Social Interaction, 49(4), 290–309. https://doi.org/10.1080/08351813.2016.1234614
Bennerstedt, U., & Ivarsson, J. (2010). Knowing the way: Managing epistemic topologies
in virtual game worlds. Computer Supported Cooperative Work (CSCW), 19(2), 201–230. https://doi.org/10.1007/s10606-010-9109-8
Blascovich, J., Loomis, J., Beall, A. C., Swinth, K. R., Hoyt, C. L., & Bailenson, J. N.
(2002). Immersive virtual environment technology as a methodological tool for social
psychology. Psychological Inquiry, 13(2), 103–124.
Boellstorff, T. (2008). Coming of age in second life. Princeton University Press.
Bowers, J., Pycock, J., & O’Brien, J. (1996). Talk and embodiment in collaborative
virtual environments. In Proceedings of the SIGCHI conference on human factors in
computing systems (pp. 58–65).
Cleland, K. (2010). Prosthetic bodies and virtual cyborgs. Second Nature: International
Journal of Creative Media, 2, 74–101.
Gamberini, L., & Spagnolli, A. (2003). Display techniques and methods for cross-medial
data analysis. PsychNology Journal, 1(2), 131–140.
Goodwin, C. (2000). Action and embodiment within situated human interaction. Journal
of Pragmatics, 32(10), 1489–1522. https://doi.org/10.1016/S0378-2166(99)00096-X
Goodwin, C. (2003). The body in action. In J. Coupland & R. Gwyn (Eds.), Discourse, the
body and identity (pp. 19–42). Palgrave Macmillan.
Goodwin, C. (2007). Environmentally coupled gestures. In S. D. Duncan, J. Cassell, & E.
T. Levy (Eds.), Gesture and the dynamic dimension of language: Essays in Honor of
David McNeill (pp. 195–212). John Benjamins Publishing.
Goodwin, C. (2018). Co-operative action. Cambridge University Press.
Haddington, P., & Oittinen, T. (2022). Interactional spaces in stationary, mobile, video-
mediated and virtual encounters. In A. H. Jucker & H. Hausendorf (Eds.), Pragmatics of
space (pp. 317–362). De Gruyter Mouton. https://doi.org/10.1515/9783110693713-011
Heath, C., & Luff, P. (1992). Media space and communicative asymmetries: Preliminary
observations of video-mediated interaction. Human–Computer Interaction, 7(3), 315–346. https://doi.org/10.1207/s15327051hci0703_3
Heritage, J. (1984). A change-of-state token and aspects of its sequential placement. In J.
M. Atkinson & J. Heritage (Eds.), Structures of social action: Studies in conversation
analysis (pp. 299–345). Cambridge University Press.
Hindmarsh, J., Heath, C., & Fraser, M. (2006). (Im)materiality, virtual reality and
interaction: Grounding the ‘virtual’ in studies of technology in action. The Sociological
Review, 54(4), 795–817. https://doi.org/10.1111/j.1467-954X.2006.00672.x
Kendrick, K. H. (2017). Using conversation analysis in the lab. Research on Language and
Social Interaction, 50(1), 1–11. https://doi.org/10.1080/08351813.2017.1267911
Kitzinger, C. (2013). Repair. In J. Sidnell & T. Stivers (Eds.), The handbook of conversation
analysis (pp. 229–256). Wiley-Blackwell.
Kohonen-Aho, L., & Alin, P. (2015). Introducing a video-based strategy for theorizing
social presence emergence in 3D virtual environments. Presence: Teleoperators and
Virtual Environments, 24(2), 113–131. https://doi.org/10.1162/PRES_a_00222
Kohonen-Aho, L., & Vatanen, A. (2021). (Re-)Opening an encounter in the virtual world
of second life: On types of joint presence in avatar interaction. Journal for Media
Linguistics – Journal für Medienlinguistik, 4(2), 14–51. https://doi.org/10.21248/jfml.2021.30
Levinson, S. (2013). Action formation and ascription. In J. Sidnell & T. Stivers (Eds.), The
handbook of conversation analysis (pp. 103–130). Wiley-Blackwell.
Luff, P., Heath, C., Kuzuoka, H., Hindmarsh, J., Yamazaki, K., & Oyama, S. (2003).
Fractured ecologies: Creating environments for collaboration. Human-Computer
Interaction, 18(1), 51–84. https://doi.org/10.1207/S15327051HCI1812_3
McIlvenny, P. (2019). Inhabiting spatial video and audio data: Towards a scenographic turn
in the analysis of social interaction. Social Interaction: Video-Based Studies of Human
Sociality, 2(1), Article 1. https://doi.org/10.7146/si.v2i1.110409
McIlvenny, P. (2020). The future of ‘video’ in video-based qualitative research is
not ‘dumb’ flat pixels! Exploring volumetric performance capture and immersive
performative replay. Qualitative Research, 20(6), 800–818. https://doi.org/10.1177/1468794120905460
Moore, R. J., Ducheneaut, N., & Nickell, E. (2006). Doing virtually nothing: Awareness
and accountability in massively multiplayer online worlds. Computer Supported
Cooperative Work (CSCW), 16(3), 265–305. https://doi.org/10.1007/s10606-006-9021-4
DOI: 10.4324/9781003424888-9
360-cameras used in a mobile gathering 133
knowledge in their sayings and doings and in that the analyst gains access to a new,
enlarged range of possibilities to produce new knowledge in a field of research. The
nature hike in focus in the present chapter involved six guides and 14 participants
visiting a local forest. The guides explained what could be seen or heard in the sur-
roundings. The participants were also given a printed “bingo sheet quiz” to help
them spot specific plants. While on the move, the participant-researchers spread
out among the stretched-out group, typically with one of them in front, one around
the middle, and one accompanying the slowest group. An analytically interesting
situation was spotted in the footage when two guides offered participants two dif-
ferent sensory orientations (seeing, feeling) to inspecting the contents of a scientific
tool: a rod with a soil sample. The chance to touch a sample of soil was generally
welcomed by the participants. The “why that now” question surfaced when, step-
ping closer to the rod like those who wanted to touch it, one participant did not take
a soil sample, but instead asked about its invisible contents (bacteria, acidity). This
occurred while another participant standing next to him was touching two samples
and discussing the feel of them with the guides. I looked through the combined
footage from the entire hike to see whether they resorted to similar active partici-
pation in other situations. The chapter does not concentrate on the actual analysis
(Raudaskoski, 2019) but uses the analytic interest to illustrate the benefits of the
setup used to document the mobile and stretched-out group. My claim is that when
the outputs of several camera and audio channels are collected as raw data on one
screen (Figure 7.3 below) to allow closer study of the rendered versions of the
360-camera footage, this increases the possibility to follow the EMCA principle of
attending to “the demonstrable indigenous import of the events and of their con-
text for their participants” (Schegloff, 1997, p. 184). The concluding remarks of the
chapter also relate the study reported here to recent developments in ethnography.
It is important to have a provisional division of labour for the camera crew, espe-
cially for a moving, often stretched-out group. When it stops, the camerapersons
have to monitor each other to decide where to place themselves such that the crowd
in its entirety is captured. With handheld 2D cameras, that work also includes zoom-
ing in and out to record the details of artefacts used (camera operator). In complex
situations, for instance with a moving group, that epistemic work becomes diffi-
cult as it is hard to follow the practice from the various participants’ continuously
changing perspectives (see Broth & Lundström, 2013). In complex data-gathering
situations such as the ones reported in this chapter, a cameraperson will necessarily
switch from the role of (detached) observer of an evolving action who is trying to
make sense of it, to the role of a participant who orients to the situation with another
type of instrumental stance (Goodwin, 2007). Since 360-degree cameras render the
camerapersons’ practices visible in the footage, it is possible to analyse their embodied participation. As mentioned, that participation should not be seen as a problematic influence but as one aspect of how the situation evolves as a specific configuration. The camerapersons monitor their concrete placement without seeing the exact
result through a viewfinder, sometimes holding the camera at arm’s length, if not
further away with the help of a pole, or on a pole touching the ground next to them
(cf. Figure 7.1). An almost continuously changing mobile formation requires an agile
FIGURE 7.1
A one-lens 360-camera view from above
136 Pirkko Raudaskoski
camera team. The cameraperson without a viewfinder view is fairly free to visibly
orient to the ongoing situation, to become available to other participants, and thus
to play a part in the ongoing action. They cannot see the exact camera view but concentrate on the action. In this way they behave like Garfinkel’s ethnomethodologists. From the footage, analysts can retrace the camerapersons’ instrumental and epistemic choices but also observe more than they did in situ.
In Figure 7.1, the group has stopped and gathered around an object. Both R2 and R3 came to the scene later than R1, who seemed to have found the instrumentally best position to monitor what was going on with the earth sample. R2 and R3 first stopped to film a couple of metres to the left of R1, but after G1 introduced the rod with a soil sample, asking what the group could see there, several participants moved towards the rod, and so did R2 and R3. Even after settling on a position, they moved around,
deciding where to place themselves in the changing configuration. This happened, in
fact, when the group members’ action trajectory changed from monitoring a guide’s
(G1) explanation of what they see to following another guide’s (G2) offer to touch
the soil sample (see below). As most group members moved to the rod, two camera
team members had to adjust their positions, too: R2 moved around and closer to the
rod and R3 moved to a squat position to get a better close-up of the hands and fingers
at the rod. R2’s footage shows that he glanced at the camera on the forward-tilting pole, estimating what its view covered, while R3 looked through the viewfinder to get the best possible shot. R1 stayed put, as she and the camera had a direct view
to the rod; her own field of vision was approximately the same as the camera view.
Being a cameraperson was no less challenging when the group was a mov-
ing gestaltic configuration or Goffmanian plastic vehicular unit (cf. Mondada,
2014): The shape of the moving group would change continuously as participants
adjusted their speed in the stretched-out mobile formation (McIlvenny, 2013a)
while engaging in incipient talk. The only cameraperson orienting to filming “an
entire recognizable configuration” (Mondada, 2014, p. 52) was the 2D camera
operator, who concentrated on capturing the tail of the group.
because recent developments in online data sessions (McIlvenny, 2020) mean that
you can join as an avatar (head) in a non-VR mode, too. In the data session, you
can also examine the data clip at your own pace during individual observation
time without the others present. Data sessions and data inspection both afford
experiences that approximate visiting the place, not just as “formerly present”
(Laurier & Philo, 2006) but, rather, as something “situated”, that is, more “here”
and therefore also “now”. The analyst can come closer to the ongoing documen-
tary method of interpretation. In my case, I could engage in different ways of
encountering and making sense of nature. For me, being able to track the partici-
pants throughout the hike occasioned an analytic stance to individual participants’
orientations to nature in guided nature hikes.
My analytical EMCA observations of the data grew out of general interest in
affect and agency studies (Raudaskoski, 2017a) and the usefulness of Goodwin’s
approach for socio-material analysis (Raudaskoski, 2017b). In both approaches,
the analytical focus remained on situations where the guides invited participants
to inspect the contents of a soil sample, offering a choice between what might be deemed two distinct referential practices: embodied perception through vision, on the one hand, or through touch, on the other. In addition, what is visible in Figure 7.1 is also of
interest: Two participants standing next to each other both take a step towards the
soil sample but engage in different activities. P1 joins others in touching the soil to
feel it, whereas P2 asks a question about its contents. The following cartoon strips
(cf. Laurier, 2014; Laurier & Back, this volume) show how the footage was used to
visualise the analytical interest in transcripts. (Movement is indicated with arrows,
and transcribed talk in Danish lasts approximately as long as the action visible in
the frame; talk corresponding to the duration of movement is highlighted).
G1: så skal vi prøve og se hvad vi har her=I kan se igen vi har noget organisk stof heroppe
so shall we try and see what we have here=you can see again we have some organic stuff up here
This camera view comes from the Panasonic 2D camera. The picture quality
is high, which is important for closeups, showing the usefulness of zoomable 2D
cameras. At this point in the data analysis, it was not just the invitation to watch
and listen that was of interest, but also how the reference “organic stuff” gets
constituted by the guide: The object is built while attending to it (cf. Smith, 2019),
as opposed to reference or description acting as a separate representation of the
object in the world (cf. Potter, 1996).
G2: (vil) I mærke på det? G3: he he he G2: det er sand, >d- det< det sjove er jo at mærke forskellen
you (want to) feel it? it is sand, t- the the funny thing is to feel the difference
The rod in this cartoon looks bent due to the chosen view from the one-lens
360-degree camera footage. The first frame also shows a closeup insert from the
Panasonic footage which shows the detail of the guide’s fingers on the rod.
The rendered version of the one-lens 360-camera footage also made it possible
to zoom in on the group, as can be seen in the upper picture of Figure 7.2. The
images below show the rendered version of R1’s camera view at the same exact
moment, starting from the view to the front. Under it are the images when the
analyst turns the image to the left or right to inspect what is happening around R1
at that moment: to her left and behind, or to her right and behind. The blob in the
left-hand side image at the bottom is the top of her head, showing how the camera
placement slightly above her head did not block the view. (The images from R1’s camera are cut-outs from an otherwise full 360-degree view – only the camera itself and what was below it is invisible.)
Figure 7.2 shows the different orientations of the two participants of interest; P1
is touching the soil, and P2 is asking about it with a pointed finger. Both have just
taken a step towards the rod, showing an interest in the topic. These two distinct
orientations to the soil sample have made me think again about building reference
and knowledge; while “subjective” haptic knowledge is based on life experiences
and is available to all those who want to touch the soil (cf. Goodwin & Smith,
2020), “objective” scientific knowledge hints at “book” knowledge. Did resort-
ing to the latter provide a way for P2 to do action-wise self-deselection (cf. Hoey,
2021) while still being active?
Instead of going through the analysis more closely (see Raudaskoski, in press,
2023), I have chosen to explain how the footage made it possible to follow the two
participants throughout the hike. I did this to see if the participants showed similar
“natural attitudes” to the material world during the rest of the hike. Raw footage
from all five cameras was combined (see Figure 7.3), together with all the audio
data that was available from the hike. The composite arrangement allowed com-
parison of the different camera views simultaneously to select views on which to
concentrate, that is, the views which showed the participants who were the focus
of interest. It also made it possible to choose the audio recording that was most
relevant or clear. P1 and P2 were visible in all the cameras used when P1 was feeling the soil and P2 was asking about it. In the rest of the footage, especially when they
360-cameras used in a mobile gathering 141
FIGURE 7.2
Above: zoomed-in capture from the 360-camera on the pole (the 8-lens camera marked on the right); below: the 8-lens camera view straight ahead, to the left and behind, and to the right and behind
FIGURE 7.3
Simultaneous five camera views
were walking between sites, P1 and P2 were in separate subgroups, but they could be followed across the different camera footage covering the entire stretched-out group
(Figure 7.3). The 360-camera view from above helped in the detection of rele-
vant occasions in the moving vehicular units (Goffman, 1971) and the view from
the eight-lens camera enabled the detection of potentially interesting phenomena
involving singles or participation units (Goffman, 1971; Figure 7.2).
The combined footage from the nature hike provided additional insight into
the differences in orientation of the two participants. For instance, when the 2D
camera operator was walking with the last grouping, which included P1 as a par-
ticipant, P1 suddenly ran into the woods to fetch a twig from a beech tree, ate a
leaf, and offered the twig to the rest of the group. This became a traceable action
for the camera operator as, just seconds earlier, the group had visibly noticed
(pointed at, looked at) the big “shiny” tree visible about 20 metres into the for-
est. Nevertheless, had shadowing a participant been the task, many actions and
activities might have gone unnoticed by the 2D camera. This was the case when
P1 shared another plant, this time in connection with the bingo sheet quiz. He
was standing by the side of the path and then walked to another participant in the
middle of the crowd, handing her a specimen. This action took place behind R1,
but because she was bearing the eight-lens 360-degree camera in the middle of the
group, it was detectable from her footage (P1 marked by white rings in Figure 7.3)
and analysable from the rendered version. The 2D cameraperson did not have to
worry about whether or not P1’s movements represented an important focus to fol-
low. P2 is visible in the one-lens raw footage (marked with a black ring) in Figure
7.3. He was visibly surveying both sides of the path for specimens and could be
detected at different places in the stretched-out mobile group, mostly in the mid-
dle. His only physical contact with specimens under scrutiny occurred when a
guide instructed everybody to touch a fallen tree trunk. While at the rod, the same
guide had only suggested that participants might touch it.
Summary
The general setup of the nature hike involved a visit to a forest habitat with
experts. After an initial analytical “why that now?” question, the coverage from
the multiple camera footage made it possible to follow two participants, P1 and
P2, throughout the whole nature hike. The referential practices regarding a soil sample (vision, touch) that sparked the initial interest turned out to exemplify the participants’ individual “preference structures” in relation to the natural phenomena they encountered: P2 observed, whereas P1 chose direct engagement.
Whether the participants are following instruction or making their own
choices, the combined views capture how the participants’ encounters with speci-
mens constitute the objects, that is, how they “build the objects” while attending
to them. The footage documents reference-making in both the “primary move-
ment” and “secondary movement” (Broth et al., 2014, p. 17), providing materials
to study the perspective of both movements and throughout the event. McIlvenny
and Davidsen (this volume) show how the iterative co-analysis of the materials can
also be recorded as practical work in the footage, providing analysts with ample
opportunities not just to go back and check previous analytical observations, but
also to immerse themselves and study EMCA (or other) analytical practices in
vivo (cf. Greiffenhagen et al., 2015). For those researchers who suffer from nausea in VR, the important message of the present chapter is that, given successful camera teamwork, the combined footage makes it possible to study the unfolding of the many scattered practices constituting an overall activity even outside VR.
The opportunities for analytical unmotivated looking also multiplied: Instead of
only studying how different types of actions are formed through collections of
similar cases (an important area of study, e.g., Levinson, 2012), analysis could
trace the types of participants and – without a cameraperson having decided to
shadow them – their orientations to each other and the natural habitat.
A comparison can be drawn between the two types of orientation (P1 and P2) and those that the situation afforded the camerapersons with the 2D and 360-degree cameras: while the 2D camera operator concentrated on epistemological concerns about collecting data for research (observing, finding the best point of view to capture; cf. P2), the other camerapersons were also able to participate or engage more freely in the ongoing event. When the group stops, the task becomes to ensure a
configuration of camera positions that cover the group’s practices. The work of
the 360-cameraperson without a viewfinder requires skill in envisioning what the camera, on a pole or next to them, captures of the participant perspectives of members of a group. In a moving group, a 360-camera can ensure that the footage also covers
what goes on behind the cameraperson’s field of vision. In this way, camerawork
becomes more subtle as the cameraperson does not need to start walking backwards
(with a 2D-camera) should their protoanalysis capture something interesting behind
them. When the group stops, the pole allows the 360-cameraperson to stand with others
in a circular standing formation instead of (maybe awkwardly) in the middle, thus
making it possible to participate through double membership. An added benefit is
that the work of the camerapersons themselves becomes part of the footage, that is,
the data collection work becomes directly documented (cf. Mondada, 2014).
The only footage that could be analytically equated with the cameraperson’s potential viewpoint comes from the eight-lens 360-degree camera, as it captures everything but the camera itself and what is below it; the cameraperson could turn around and look up or down while holding the camera located slightly above her head. The “out of body” (Raudaskoski, 2017a) version of the one-lens camera suggests an imagined viewpoint that the cameraperson has while holding the pole above the
group. In any case, the camerapersons’ practical work, their “practices of represent-
ing” (Barad, 2003, p. 804) while within the phenomenon they are studying, is availa-
ble as part of the research data. The cameraperson holding a pole can quickly change
from an overall “neutral” view (pole straight up) by turning the pole so that the
camera faces a specific direction to capture a focused interaction (Goffman, 1963).
Discussion
Contemporary scholarship on ethnographic research methods criticises repre-
sentationalism and advocates for reflexivity in qualitative research (e.g., Lynch
& O’Mara, 2019). Macbeth (2001) suggests a division of the types of reflexivity
encouraged in this movement into positional and textual reflexivities. Whereas
positional reflexivity concentrates on the researcher’s epistemic ponderings, tex-
tual reflexivity treats written discourses as isolated entities. Macbeth goes on to
explain the ethnomethodological take on reflexivity as a practical constitutive
phenomenon, where texts are part of the accomplishment that can be analysed.
The present chapter has built on the ethnomethodological point of departure,
highlighting the possibilities that multiple data gathering affords to enlarge the
analytical potentials of those constitutive practices and, importantly, to make the
camerapersons’ reflexive participation publicly available. This opens up new ana-
lytical possibilities to study how researcher-participants manage to accomplish
truly joint partnerships (see the ethnography of nexus analysis in Ingold, 2018;
Raudaskoski, 2021; Scollon & Scollon, 2007). The practices covered by 360-cam-
era data provide access to the concrete locatedness of researchers’ positionings
and enable analysts to “revisit” their situated positions. For the camera crew using
multiple cameras in this study, the practical work was similar to the practices of
any other person on the move: “the very observability of a scene is embedded
in, endogenous to, and mutually elaborated with and for the observing members’
ongoing activities, location, and context” (Smith, 2019, p. 38).
The group effort captures the participants’ reflexive constitution of socio-mate-
rial ontology during a (thoroughly documented) nature hike, enabling the analyst to
shadow the actions and epistemologically revisit different aspects of that reflexive
work. The contrastive pair formed by two participants orienting to a soil sample
led to an analytical focus on types of participant-in-interaction, rather than solely
on types of (multimodal) talk-in-interaction; in other words, the footage allowed a
different type of EMCA collection. Here, it has allowed us to follow the constitu-
tion of referents as members’ practice, and the data has supported the sensation of
revisiting events, rather than merely studying a closely corresponding 2D represen-
tation. Since the camerapersons are visible in the footage, their ethno-methods or
analytical work (both as participants and as researchers) is available for scrutiny.
This allows access not just to the analytical practices, but also to the data collection
aspect of “research as a practical enterprise” (Greiffenhagen et al., 2015, p. 480).
Conclusion
This chapter has addressed various issues involved in using a multi-cam crew
(McIlvenny & Davidsen, 2017) and, more particularly, the benefits of a constella-
tion of one 360-degree camera (with one-lens and with a small 2D camera attached
to the pole), one (eight-lens) 360-degree camera, and a 2D camera (with the possi-
bility of a picture-in-picture from an additional adjustable lens). The participation
Notes
1 I will refer to the traditional (60/90-degree) video cameras as 2D-cameras in the rest of
the chapter.
2 The one-lens camera was, in fact, a 220-degree one, but the sky that could not be seen
was not of analytical interest.
References
Arminen, I. (2005). Institutional interaction. Ashgate.
Barad, K. (2003). Posthumanist performativity. Signs, 28(3), 801–831.
Broth, M., Laurier, E., & Mondada, L. (2014). Introducing video at work. In M. Broth, E.
Laurier, & L. Mondada (Eds.), Studies of video practices (pp. 1–29). Routledge.
Broth, M., & Lundström, F. (2013). A walk on the pier: Establishing relevant places in
mobile instruction. In P. Haddington, L. Mondada, & M. Nevile (Eds.), Interaction and
mobility (pp. 91–122). De Gruyter.
Büscher, M., Urry, J., & Witchger, K. (Eds.). (2011). Mobile methods. Routledge.
Cekaite, A., & Goodwin, M. H. (2021). Researcher participation, ethics, and cameras in the
field. Social Interaction, 4(2). DOI: https://doi.org/10.7146/si.v4i2.127215
Childers, S. M. (2013). The materiality of fieldwork: An ontology of feminist becoming.
International Journal of Qualitative Studies in Education, 26(5), 599–609.
Emirbayer, M., & Maynard, D. W. (2011). Pragmatism and ethnomethodology. Qualitative
Sociology, 34(1), 221–261.
Erickson, F. (2004). Origins: A brief intellectual and technological history of the emergence
of multimodal discourse analysis. In P. Levine & R. Scollon (Eds.), Discourse and
technology (pp. 196–207). Georgetown University Press.
Erickson, F. (2021). Co-operative participation, social ecology, and ethics in video-based
ethnography. Social Interaction, 4(2). DOI: https://doi.org/10.7146/si.v4i2.127210
Garfinkel, H. (1967). Studies in ethnomethodology. Prentice Hall.
Gibson, W., & vom Lehn, D. (2021). The senses in social interaction. Special Issue.
Symbolic Interaction, 44(1), 3–9.
Goffman, E. (1961). Encounters: Two studies in the sociology of interaction. Bobbs-Merrill.
Goffman, E. (1963). Behavior in public places. The Free Press.
Goffman, E. (1971). Relations in public. Penguin.
Goodwin, C. (2000). Action and embodiment within situated human interaction. Journal
of Pragmatics, 32(10), 1489–1522.
Goodwin, C. (2007). Participation, stance and affect in the organization of activities.
Discourse and Society, 18(1), 53–72.
Goodwin, C., & Goodwin, M. H. (1996). Seeing as situated activity. In Y. Engeström & D.
Middleton (Eds.), Cognition and communication at work (pp. 61–95). Cambridge University Press.
Goodwin, C., & Smith, M. S. (2020). Calibrating professional perception through touch in
geological fieldwork. In L. Mondada & A. Cekaite (Eds.), Touch in social interaction
(pp. 269–287). Routledge.
Greiffenhagen, C., Mair, M., & Sharrock, W. (2015). Methodological troubles as problems
and phenomena. British Journal of Sociology, 66(3), 460–485.
Haddington, P., Mondada, L., & Nevile, M. (Eds.). (2013). Interaction and mobility. De
Gruyter.
Haraway, D. (1988). Situated knowledges. Feminist Studies, 14(3), 575–599.
Hoey, E. M. (2021). Sacks and silence. In R. Smith, R. Fitzgerald, & W. Housley (Eds.), On
Sacks. Routledge.
Hofstetter, E. (2021). Analyzing the researcher-participant in EMCA. Social Interaction,
4(2). DOI: https://doi.org/10.7146/si.v4i2.127185
Ingold, T. (2018). Anthropology and/as education. Routledge.
Kamunen, A., Haddington, P., & Rautiainen, I. (2022). “It seems to be some kind of an
accident”: Perception and team decision-making in time critical situations. Journal of
Pragmatics, 195, 7–30.
Keisanen, T. (2012). ‘Uh-oh, we were going there’: Environmentally occasioned noticings
of trouble in in-car interaction. Semiotica, 191(1/4), 197–222.
Lafruit, G., & Teratani, M. (2022). Virtual reality and light field immersive video
technologies for real-world application. The Institution of Engineering and
Technology.
Laurier, E. (2014). The graphic transcript. Geography Compass, 8(4), 235–248.
Laurier, E., & Philo, C. (2006). Natural problems of naturalistic video data. In H.
Knoblauch, B. Schnettler, J. Raab, & H.-G. Soeffner (Eds.), Video analysis (pp. 183–
192). Peter Lang.
Levinson, S. C. (2012). Action formation and ascription. In J. Sidnell & T. Stivers (Eds.),
The handbook of conversation analysis (pp. 101–130). Blackwell.
Liberman, K. (2013). More studies in ethnomethodology. SUNY Press.
Lynch, J., & O’Mara, J. (2019). Morphologies of knowing. In J. Lynch, J. Rowlands, T.
Gale, & S. Parker (Eds.), Practice methodologies in education research (pp. 166–186).
Taylor & Francis.
Macbeth, D. (1999). Glances, traces, and their relevance for a visual sociology. In P.
L. Jalbert (Ed.), Media studies: Ethnomethodological approaches (pp. 135–170).
University Press of America.
Macbeth, D. (2001). On “reflexivity” in qualitative research: Two readings, and a third.
Qualitative Inquiry, 7(1), 35–68.
McIlvenny, P. (2011). Video interventions in “everyday life”: Semiotic and spatial practices
of embedded video as a therapeutic tool in reality TV parenting programmes. Social
Semiotics, 21(2), 259–288.
McIlvenny, P. (2013a). The joy of biking together. Mobilities, 10(1), 55–82.
McIlvenny, P. (2013b). Interacting outside the box: Between social interaction and
mobilities. In P. Haddington, L. Mondada, & M. Nevile (Eds.), Interaction and mobility
(pp. 409–417). De Gruyter.
McIlvenny, P. (2020). New technology and tools to enhance collaborative video analysis
in live ‘data sessions’. QuiViRR: Qualitative Video Research Reports, 1(December),
a0001. https://doi.org/10.5278/ojs.quivirr.v1.2020.a0001
McIlvenny, P., Broth, M., & Haddington, P. (2009). Communicating place, space and
mobility. Special Issue. Journal of Pragmatics, 41(10), 1879–1886.
McIlvenny, P., & Davidsen, J. (2017). A big video manifesto: Re-sensing video and audio.
Nordicom Information, 39(2), 15–21.
Mondada, L. (2014). Shooting as a research activity: The embodied production of video
data. In M. Broth, E. Laurier, & L. Mondada (Eds.), Studies of video practices (pp.
33–62). Routledge.
Mondada, L. (2021). Orchestrating multi-sensoriality in tasting sessions: Sensing bodies,
normativity, and language. Symbolic Interaction, 44(1), 63–86.
Mondada, L., & Cekaite, A. (Eds.). (2020). Touch in social interaction. Routledge.
PART 3
Augmenting analyses of the member’s perspective with multiple research materials and methods

8
INDUCTIVE APPROACH IN EMCA
The role of accumulated ethnographic knowledge and video-based observations in studying military crisis management training
Introduction
Ethnomethodological conversation analysis (EMCA) has a long-standing tradition
of investigating people’s methods of “doing” and “being” in the world with each
other (Arminen, 2005, p. 9). It is based on the ethnomethodological aspiration to
understand human social conduct and on the systematic study of interaction as
the “primordial site of human sociality” (Schegloff, 1996). The latter forms the
tenets of conversation analysis (CA), which has two underlying assumptions: 1) all interaction is orderly, and 2) this orderliness is oriented to, and made visible,
by the interlocutors themselves through their situated conduct and sense-making
practices (e.g., Stivers & Sidnell, 2005, p. 2). What has formed the core of EMCA
research is detecting micro-level interactional phenomena with a special focus on
the sequential organisation of actions. This focus has been taken as illustrative of the ways
in which interlocutors construct mutual understanding about their social reali-
ties and the situation in which they are engaging. The overarching aspiration has
been to unravel how situations unfold, and are experienced, from a member’s per-
spective. The theoretical and methodological underpinnings of EMCA have been
employed by researchers working on both mundane and institutional interactions,
such as medical settings, courtrooms, classrooms, and diverse workplace con-
texts. Whereas the former has been characterised as the “purest” kind of CA, as it
tends to focus on the basic operations of sociality (e.g., Drew & Heritage, 1992; ten
Have, 2007), the latter has often been referred to as “applied CA”, since it typically
deals with more specialised uses of language and other conduct with the potential
aim to improve some aspect(s) of institutional practice. However, the boundary
between these lines of work is not straightforward, since it is foremost a matter of
DOI: 10.4324/9781003424888-11
154 Antti Kamunen et al.
the overall research aims and the stance and viewpoint taken by scholars, that is,
their researcher positionality (e.g., Day & Kjaerbeck, 2013). The purpose of this
chapter is to discuss and reflect on our process of studying the institutional setting
of crisis management training and how it has been informed by a close collabora-
tion with the studied community and the decisions we have made along the way.
We illustrate how carrying out research in this context has required us to rethink
our position as researchers and to apply EMCA in ways that partly differ from its
traditional starting points.
The EMCA methodological procedure is in many ways unique and differ-
ent from other approaches that study social interaction, such as (socio)linguis-
tics, mainstream sociology, and cultural anthropology. First, it typically entails
working with video (or audio) recorded data from naturally occurring encounters
from situations that take place regardless of the ongoing research. The start of the
research process is, rightfully, the selection of a research site and data collection.
The focus in the latter is often on the equipment and decisions regarding their
placement in order to get the best possible quality data. Second, a key feature in
EMCA is the role of the researcher in the field, which is to be an observer of the
events without an attempt to participate in them. Third, the analysis itself has mul-
tiple stages: it starts when the data is transcribed (i.e., the first-stage analysis) and
with what is called “unmotivated looking” into the recordings (ten Have, 2007).
The purpose of this “empirical bite” (Clift, 2016) is to remain as objective as pos-
sible and refrain from making deductions based on anything but the recorded
data. Originally, “purist” CA scholars were sceptical about the role of contextual
knowledge in all stages of the work (ten Have, 2007). However, when conducting
research on work communities with distinctive routines and characteristics, the
reasoning of what occurs in an interactional event may need to be complemented
and enriched with additional information. Gaining ethnographic knowledge from
study participants that adds to what can be learned from the video recordings has
been found beneficial, if not crucial, in some cases: It is “required for entry into
settings that are generally inaccessible, for the understanding of local activities,
for the identification of what has to be recorded, and also for the arrangement/
positioning of the recording device(s)” (Mondada, 2013, p. 37; see also Maynard,
2006). In addition, as Lindholm (2016) notes, researchers who have collected their
own data may begin to identify the focal phenomena of their research already
during the data collection and write about them in their field notes. In applied CA
in particular, the role of observation and ethnographic knowledge has been recog-
nised and appreciated during the past years (e.g., Antaki, 2011).
Researcher positionality, including the use of additional data and the research-
er’s role during fieldwork, has been addressed by many EMCA scholars (e.g.,
Hoey & Kendrick, 2017; Maynard, 2006; Peräkylä, 2004; ten Have, 2007). The
most recent work has touched upon participation in the interactional events of the
recording, challenging the way the researcher’s role has been traditionally enacted
Inductive approach in EMCA 155
(see, e.g., Hofstetter, 2021; Katila et al., 2021; Pehkonen et al., 2021). One aspect
that has not been comprehensively described or discussed in previous EMCA literature is how the researcher’s contextual knowledge, deriving from experiences and observations during the data collection phase and the relationships built
with the community members, informs the identification of research topics and
the formation of research questions, and how it helps to analyse the data (but see
Lindholm, 2016; Waring et al., 2012). Peräkylä (2004), for example, highlights the
importance of gathering additional information on the research site through eth-
nographic observation, interviews, and questionnaires. This information can then
be used “to contextualise the CA observations, in terms of the larger social system
of which the tape-recorded interactions are a part”, as well as to “offer information
without which also the understanding of tape-recorded interactions may remain
insufficient” (Peräkylä, 2004, p. 169).
In this chapter, our aim is to reflect on our own experiences of conducting research in the context of UN Military Observer training courses. We discuss
how studying this unique setting has required us to take a more participatory
approach to EMCA, including a multi-phase data collection process, upholding
continuous dialogue with the studied community, and gaining a unique kind
of membership in it. In this chapter, we introduce the ways in which collecting
and using complementary data have become constitutive elements at different
stages of the research project and helped us refine our research questions and
objectives along the way. Furthermore, we discuss the practical, theoretical,
and methodological aspects of this process and of our way of adopting
the “conversation analytic mentality” (Schenkein, 1978), what it has meant
to us, and how we have reflexively developed it. We revisit the core EMCA con-
cepts of inductivity and unmotivated looking regarding the data collection and
analysis and consider how they can, or sometimes cannot, be adhered to in the
investigation of all contexts. We also highlight the significance of accumulated
ethnographic knowledge in reaching a sufficient level of competency, or “unique
adequacy” (see Garfinkel, 2002; Garfinkel & Wieder, 1992; Jenkings, 2018),
which is needed to understand the study participants’ locally produced actions.
Finally, we propose that such ethnographic knowledge and researchers’ lived
experiences during fieldwork can comprise what we call proto-data. Proto-data
informs our in-situ analyses of the observed interactional situations and can help
to identify and establish topics and foci even before the start of the video-
based micro-analysis. The chapter is ordered chronologically: We first explain
the initial steps of the project and the work done to understand the setting, and
then move on to discuss the different stages of data collection, and how we got
from the point of unmotivated looking to a more informed data collection and
analysis. Lastly, we illustrate how the knowledge gained during the whole pro-
cess has helped us make valid arguments about the video-recorded data, which
would not have been possible without it.
156 Antti Kamunen et al.
FIGURE 8.1
A realistic UN base in the simulated operation area.
knowledge, such as extensive observation, field notes, and the continuous dialogue
with the studied community, as elements of proto-data that can contribute to iden-
tifying research foci even before the start of the video analysis (cf. Lindholm,
2016). During the video analysis phase, these same observations and experiences
are used as ethnographic data alongside the recordings. The role of accumulated
knowledge has been fundamental, since it has guided us to see potential research
topics, refine our research questions and carry out initial analyses based on what we
observed around us during the preliminary stages of data collection. Furthermore,
access to such ethnographic data has been key in making valid deductions based
on the video recordings, which would otherwise be rather impenetrable. In this
section, we will further elaborate on the key concepts and the overall progression
of our work (Figure 8.2).
FIGURE 8.2
Visualisation of the research process.
Preparing for data collection: from first contact to first visit to the field
EMCA research on institutional contexts may have different kinds of starting
points and motivations, which are often driven by the researcher’s own interests.
However, they may also be affected or even instigated by the needs of the studied
community, which has increasingly been the case with research on professional
settings (see e.g., Heath et al., 2003; Hindmarsh & Pilnick, 2007; Mondada, 2013).
In our case, the collaborating party raised matters that our research could address
from the beginning and at different organisational levels. In 2018, two of us
took part in a management-level meeting, the purpose of which was to
strengthen the role of scientific research in the development of crisis management
training. It was during this meeting that the then-Commandant of FINCENT
first brought up the military observer course. As part of the course, the trainees
rehearse car patrolling in independently operating pairs or groups of three, and at
certain points in the exercise many of them end up “getting killed.” Since in that
specific part of the course the teams operate independently without an instructor
present, the instructors have no access to what happens inside the vehicles prior
to these incidents, apart from the trainees’ accounts afterwards. Our video-based
research was seen as a possible solution to this problem. Thus, already at this
time, the focus of the first data collection and a possible wider research question
started to be composed based on the course organisers’ “substantive concerns”
(Maynard, 2006, p. 70).
In his seminal paper on the interplay between CA and ethnography, Maynard
(2006, p. 59) describes how he got interested in the phenomenon of delivering
diagnostic news in clinical settings through unmotivated listening of pre-recorded
audio tapes he had received, and reading of the transcripts that came with them. In
an endnote, he compares his approach to the data collection with the ethnographic
strategy of “hanging out” (Dingwall, 1997, p. 53 as cited in Maynard, 2006) in a
setting “to experience the people and the social situation, avoiding prior questions
and letting the situation pose its own questions” (Maynard 2006, p. 83, fn1). This
made it possible for him to begin collecting new data with a focused phenomenon
in mind. In our case, the first discussion led to the first researcher visit in the
autumn of 2018, where the objective was to get to know the course and the setting
and make observations for planning the data collection on the following course.
This also functioned as an important first experience and a step towards gaining
the knowledge needed about the practices of the studied community.
The visit took place during the field exercise part when the trainees were con-
ducting their first simulated tasks. Although we had previously been presented
with one question to seek answers to, we had also been given more or less free
rein to study whatever we might find interesting. Thus, we were invited to take
part in the course at first as observers, in order to get to know the course, its sur-
roundings, and contents, and identify potentially interesting parts of the course
that might include topics for our research. In other words, our aim during the first
visit was to hang out and look for different “interactional hot spots” (Jordan &
Henderson, 1995, p. 43). The course director introduced us to the basic structure
and objective of the course and presented the various tasks and exercises the train-
ees perform during the course. We had the possibility to inspect and document
the vehicles and the bases, which helped us in planning and, later, building the
camera systems. By that time, we had already decided to record in-car interac-
tion, and therefore the focus was on what possibilities and limitations we had for
installing the equipment. Once we saw the activities that took place in the bases
and other buildings, such as negotiation and mediation exercises, lessons, brief-
ings, and planning, we also started to plan different options to record in some of
those spaces. We were also given some suggestions about where to look by the
course teachers and instructors, who already had a good professional understand-
ing of the different interactionally challenging tasks and exercises of the course,
and who were also just interested in hearing interaction researchers’ – that is, our
– views on some course-related issues. During the visit we also got to sit in the
cars during patrolling exercises, which gave us insights into the practicalities and
contents of the tasks and was an eye-opener into the various events and challenges
that the trainees face during patrolling. We were provided with yellow “invisibility”
vests that the instructors also used, which indicated that we did not exist in the
training scenario.
The first visit gave us several preliminary ideas for possible research topics,
especially related to the patrolling exercises, and it afforded us the first opportunity
to gain and begin to utilise proto-data for future data collection and analyses.
In this respect, the preparation and observation phases were key in the research
process, as they set the basis for the data collection on the next course. By tak-
ing an inductive and open-minded approach, we were able to reflexively identify
spaces and activities that could become sites for interesting interactional practices
and phenomena. Our approach thus resembles the ethnography of nexus analysis
(Scollon & Scollon, 2004; see also Raudaskoski, 2010) in that our aspiration was
to engage and create the kind of zone of identification that enabled us to see
which phenomena were the most interesting.
meeting, had led us researchers there. As “invisible” passengers in the patrol cars,
we had no idea when to turn on our cameras and record, nor did we yet have port-
able power supplies with us. As a result of the inductive approach to the first data
collection, we could not record much, but instead, the recording was more reac-
tive, and the cameras were turned on only when something that we perceived as
potentially interesting happened. Although this reactiveness often meant a lack
of footage of the build-up to the recorded event, the gained first-person perspective
(Edmonds, 2021) allowed us to learn from the recordings, which then helped with
the preparation for the next part of our work. As opposed to the first visit, when
we took part (as passengers) in the patrolling exercises and the
instructors gave us a discreet heads-up to indicate good moments to start recording,
this time we were more independent and took comprehensive field
notes to which we could return later.
During the second visit, the instructors approached us frequently with topics
related to their work and posed specific questions for us to consider, which indi-
cated that they now treated us as members of the community (see, e.g., Marttila,
2018). This mainly happened when the cameras were not recording, showing the
participants’ orientation to our varying roles in the course. This first data col-
lection could be characterised as an extended observation phase during which
we recorded anything and everything we deemed interesting through unmoti-
vated looking. Based on our observations during the course, we were then able to
develop even more refined ideas of what we could focus on more in depth.
FIGURE 8.3
Collecting data in the field (image from recording a task outside the
vehicle).
data collection was crucial in building more comprehensive collections for our
respective studies. In addition, it afforded us a deeper understanding of the course
and its contents, which helped validate our arguments in the upcoming analyses.
come in many forms, displaying the range of meanings that the noticings may
construct. Sometimes they are verbal acknowledgements, such as expressions that
build a context for continued scanning of the environment but with no reportable
details, or actions that actively progress the observation activity, such as listing
observations for reporting purposes. When studying noticings, access to knowl-
edge about the organisation of these moments and about the sociomaterial and
semiotic fields (Goodwin, 2000) around or within the car (e.g., maps, notebooks,
aide-mémoires, compasses, satellite navigators), namely what the trainees see and
experience, provided us with crucial information about the consequentiality of the
noticing actions: how they are produced and interpreted in the moment.
FIGURE 8.4
Illustration of the mine incident case (from Kamunen, Haddington, &
Rautiainen, 2022).
layout. While this detail is also available in the data, it is rather far away from
the mine incident, and the connection between these two setups is not explicitly
verbalised by the participants in any of the cases. Watching the recordings of these
moments evoked a memory of the first-aid task from having been present in the exercise
throughout the patrol route, and helped us quickly make the connection to it with-
out the need to comb through the tapes in order to identify the referent incident.
Overall, having knowledge not only about the interactional context under scrutiny
but also about past events proved invaluable for us researchers in identifying the
problem and potential reasons for its reoccurrence in the teams.
observing the teams caused one of us to focus on navigation and navigational talk.
The significance of these routines became highlighted later when examin-
ing recordings of other teams, specifically when coming across a team that
struggled with navigating and with forming and maintaining its routines.
Our understanding of the routines and practices of navigation developed out
of the perception that navigation and the talk related to it take up a notable amount
of time in the patrolling vehicle (Rautiainen, 2021). Even when everything goes
smoothly, patrolling is a continuous progression from waypoint to waypoint, and
navigating and following the route structures all other task-related activities, mak-
ing navigation and navigational talk ubiquitous for patrolling. The experiences
with and observations of the first followed team were the starting point that led
to examining other teams and their actions. The objective was to find out how
the teams “do navigating” and how they achieve smooth progress that enables
performing various other tasks, such as documenting and reporting events, simul-
taneously with navigating.
To conclude, accumulated ethnographic knowledge, aspects of “proto-data”,
and developing unmotivated looking throughout the process, namely prior to
and during the recording phase, and before the actual analysis, have formed key
elements in all stages of our work. Their role has also been substantial, or even
indispensable, in gaining “unique adequacy” and in identifying and analysing the
interactional phenomena: noticings, decision making, and navigation talk, which
cannot be fully understood in their respective environments and as part of the
military observer training context without complementary information.
as equally valid data with photographs and video-recordings, in EMCA, these phe-
nomena and questions will nevertheless have to be studied objectively through
what the method treats as the primary data: video recordings of naturally occurring
interactions (see, though, Stevanovic, this volume).
We hope the experiences presented in this chapter become an incentive for
further discussion in EMCA and thereby contribute to a possible review, or even
redefinition, of the conversation analytic research process. Whereas previous litera-
ture has included many studies regarding complex settings, utilising longitudinal
data and incorporating ethnographic aspects (e.g., Maynard, 2006), there are only a
few methodological descriptions of “pre-analytic” processes, and how they have,
or have not, impacted the analyses or the results of the studies (e.g., Finlay et al.,
2011). We have aimed to show how gaining a member’s perspective on locally
produced social actions may sometimes require additional work to gain suffi-
cient knowledge of the setting and the phenomena, and thereby to meet
the “unique adequacy requirement” (Garfinkel, 2002; Garfinkel & Wieder, 1992;
Jenkings, 2018). To reach this goal, our study has looked into methodological
procedures and practices that go beyond traditional EMCA. Furthermore, it has
meant the utilisation of other research methods, such as ethnography, and taking
an inductive approach to a multilevel and multiphase data collection and analysis.
Our approach resembles prior research in which EMCA has been complemented
with more participatory methods or practices (e.g., Hofstetter, 2021; Katila et
al., 2021; Maynard, 2006), as well as some aspects of nexus analysis (Scollon &
Scollon, 2004), but it has its distinctive characteristics. Whereas Maynard (2006)
discusses his ethnography from the starting point of limited affinity, referring
to the use of other methods as something that complements or deepens under-
standing of what happens in video data, our experience has been more immersive.
Instead of merely taking advantage of ethnographic knowledge in order to gain
a better understanding of the studied context, our ethnographic experiences as
researchers had a more profound impact on our research process.
A part of our ethnographic approach has been finding the balance between ana-
lytic control and reaching “the right kind of” inductivity. The accumulating knowl-
edge gained through ethnographic observations has helped us outline our focus
and extend unmotivated looking into the phase of fieldwork. At the same time,
we have been interacting with the studied community, which has also shaped
our views of what should or could be investigated. Along with our multi-layered,
multiphase approach to carrying out EMCA research, involving multiple occa-
sions of data collection and developing an in-depth understanding of the course
and its contents, we have had to define which topics or ideas have been identified
in ways that comply with the research mentality and, indeed, which topics or ideas
can be studied through EMCA. This has led us to situations where we have had to
negotiate with the course organisers and instructors about some of their wishes for
the research and the questions they would want answers to, and to build a shared
understanding of what can and what cannot be achieved through our method.
The purpose of this discussion is by no means to say that the traditional under-
standing and definition of the conversation analytic process is outdated or wrong.
On the contrary. We are all currently going through the various, previously
unviewed data (of which there is plenty) in an unmotivated way to find new research
topics. Nevertheless, some contexts, such as the military observer course, are
too complex and multifaceted by their nature to be understood without the accu-
mulated knowledge and researcher participants’ perspective. However, there are
some methodological “risks” involved in the kind of fieldwork described in this
chapter. Leaning too far into subjective observation of the studied environ-
ment can lead to jumping to conclusions: what might seem and feel like one thing
in the moment can prove to have been something else entirely once
examined in the recordings (which is, of course, one of the main
arguments for using EMCA in institutional and professional settings). Another
possible risk factor is the bias that can come from having gotten to know the
participants during the fieldwork. While having information on, for example,
individual participants’ personalities, expertise, and past experiences can give us
explanations for some specific moments or actions, this knowledge can only be
allowed to inform the analyses when it becomes evident in the recorded interac-
tions. Remaining conscious of these risks, we nevertheless see value in broaden-
ing the conversation analytic scope by combining in-situ observations and the
information gained through ethnographic knowledge gathering and incorporating
them as part of the analyses that show how the situation at hand developed for all
its participants, including those who, for some of the time, were researchers.
Acknowledgments
We wish to thank all the course participants, especially the trainees and the instruc-
tors, for letting us get a glimpse of their important and valuable work. We also thank
FINCENT and the Finnish National Defence University, the Academy of Finland (pro-
ject numbers 287219 and 322199), and the Eudaimonia Institute at the University of Oulu
for their help and support.
LeaF infrastructure, who worked as our technician in the data collection process.
Note
1 In our overall research process, we recognise the connection with the ethnography of
nexus analysis (Scollon & Scollon, 2004) which takes into account the participants’
individual experience, skills and capacities (the historical body), as well as the shared
social space in which the interactions take place (discourses in place).
References
Antaki, C. (2011). Six kinds of applied conversation analysis. In C. Antaki (Ed.), Applied
conversation analysis: Intervention and change in institutional talk (pp. 1–14). Springer.
Arminen, I. (2005). Institutional interaction: Studies of talk at work. Ashgate.
Katila, J., Gan, Y., Goico, S., & Goodwin, M. H. (2021). Researchers’ participation roles
in video-based fieldwork: An introduction to a special issue. Social Interaction: Video-
Based Studies of Human Sociality, 4(2). https://doi.org/10.7146/si.v4i2.127184
Lindholm, C. C. (2016). Keskustelunanalyysi ja etnografia. In T. M. Stevanovic & C.
C. Lindholm (Eds.), Keskustelunanalyysi. Kuinka tutkia sosiaalista toimintaa ja
vuorovaikutusta (pp. 331–348). Vastapaino.
Marttila, A. (2018). Tutkijan positiot etnografisessa tutkimuksessa – Kentän ja
kokemuksen dialoginen rakentaminen. In P. Hämeenaho & E. Koskinen-Koivisto
(Eds.), Moniulotteinen etnografia (pp. 362–392). Ethnos-toimite 17.
Maynard, D. W. (2006). Ethnography and conversation analysis. In S. N. Hesse-Biber & P.
Leavy (Eds.), Emergent methods in social research (pp. 55–94). Sage.
Mondada, L. (2013). The conversation analytic approach to data collection. In J. Sidnell &
T. Stivers (Eds.), The handbook of conversation analysis (pp. 32–57). Wiley-Blackwell.
Nevile, M. (2013a). Collaboration in crisis: Pursuing perception through multiple
descriptions (how friendly vehicles became damn rocket launchers). In A. De Rycker
& Z. Mohd. Don (Eds.), Discourse and crisis: Critical perspectives (pp. 159–183).
Benjamins.
Nevile, M. (2013b). Seeing on the move: Mobile collaboration on the battlefield. In P.
Haddington, L. Mondada, & M. Nevile (Eds.), Interaction and mobility: Language and
the body in motion (pp. 153–177). Walter de Gruyter.
Pehkonen, S., Rauniomaa, M., & Siitonen, P. (2021). Participating researcher or researching
participant? On possible positions of the researcher in the collection (and analysis) of
mobile video data. Social Interaction: Video-Based Studies of Human Sociality, 4(2).
https://doi.org/10.7146/si.v4i2.127267
Peräkylä, A. (2004). Conversation analysis. In C. Seale, D. Silverman, J. Gubrium, & G.
Gobo (Eds.), Qualitative research practice (pp. 165–179). Sage.
Potter, J. (1997). Discourse analysis as a way of analysing naturally occurring talk. In
D. Silverman (Ed.), Qualitative research: Theory, method and practice (pp. 144–160).
Sage.
Raudaskoski, P. (2010). “Hi Father”, “Hi Mother”: A multimodal analysis of a significant,
identity changing phone call mediated on TV. Journal of Pragmatics, 42(2), 426–442.
Rautiainen, I. (2021). Talk and action as discourse in UN Military Observer Course:
Routines and practices of navigation. In I. Chiluwa (Ed.), Discourse and conflict:
Analysing text and talk of conflict, hate and peace-building (pp. 381–412). Palgrave.
Schegloff, E. A. (1996). Turn organization: One intersection of grammar and interaction.
In E. Ochs, E. A. Schegloff, & S. A. Thompson (Eds.), Interaction and grammar (pp.
52–133). Cambridge University Press.
Schegloff, E. A. (2007). Sequence organization in interaction: A primer in conversation
analysis. Cambridge University Press.
Schenkein, J. N. (Ed.). (1978). Studies in the organisation of conversational interaction.
Academic Press.
Scollon, R., & Scollon, S. W. (2004). Nexus analysis: Discourse and the emerging Internet.
Routledge.
Stivers, T., & Rossano, F. (2010). Mobilizing response. Research on Language and Social
Interaction, 43(1), 3–31.
Stivers, T., & Sidnell, J. (2005). Introduction: Multimodal interaction. Semiotica, 156,
1–20. https://doi.org/10.1515/semi.2005.2005.156.1
ten Have, P. (2007). Doing conversation analysis. Sage.
Waring, H. Z., Creider, S., Tarpey, T., & Black, R. (2012). A search for specificity in
understanding CA and context. Discourse Studies, 14(4), 477–492.
9
A SATELLITE VIEW OF SPATIAL
POINTS IN CONVERSATION1
Joe Blythe, Francesco Possemato, Josua Dahmen,
Caroline de Dear, Rod Gardner, and Lesley Stirling
The video recordings of the extracts in this chapter may be accessed through Figshare
using the QR below or the following link: https://doi.org/10.25949/18133682
Introduction
When using video to record conversational interaction, we often observe par-
ticipants pointing to locations that are not within their immediate vicinity; the
targets of these points extend beyond the scene being captured by the video
cameras. This is not a members’ concern for participants who share knowledge
of their local environment, but for the external analysts unacquainted with the
local topography, it can pose a methodological challenge. We try to level this
imbalance by considering the wider spatial context in which locational points
are produced.
DOI: 10.4324/9781003424888-12
172 Joe Blythe et al.
FIGURE 9.1
Main point components (adapted from Le Guen, 2011, p. 272).
This chapter demonstrates how GPS-derived information and GIS visual rep-
resentations of the geospatial environment can be integrated into multimodal
studies of interaction in order to capture and visualise the directionality of loca-
tional pointing gestures, particularly when studying conversational place refer-
ence and spatial language and cognition. The geospatial approach offers a new
analytic tool for the investigation of pointing practices in interaction. While the
multimodal approach to conversational data (cf. Goodwin, 2012; Mondada, 2014;
Mondada, 2019b; Stivers & Sidnell, 2005) allows analysts to uncover the rela-
tionship between the multisemiotic resources that participants recruit when doing
locational reference, the geospatial framework enhances and expands the domain
of talk-in-interaction by facilitating the incorporation of the wider spatial context
in which the locational points are produced. The methodological procedures are
intended to provide researchers with a systematic procedure for the analysis of point-
ing gestures relative to parts of speech, sequences of actions in interaction, and the
wider geographic environment in which the pointing gestures occur.
The chapter is structured as follows. The next section offers an outline of
the existing research on locational pointing and the different methodologies
employed in the literature. We then illustrate the step-by-step procedures that are
used to identify, code, and visually represent pointing gestures in order to verify
their directional accuracy. The following section demonstrates how geospatial
considerations are integrated into the interactional analysis by presenting a geo-
spatial analysis of pointing gestures in four excerpts of conversation conducted
in typologically distinct languages from the CIARA corpus: the Aboriginal lan-
guages Murrinhpatha, Gija, and Jaru, as well as Australian English spoken by
non-Aboriginal people in Halls Creek in the north of Western Australia (see
Figure 9.2). We conclude with a discussion of the prospects and implications
for the study of naturally occurring locational pointing practices and spatial
language.
FIGURE 9.2
The CIARA Project field sites and languages discussed in this chapter.
two speech communities (Yucatec Maya and French), Le Guen examines pointing
gestures used for path indication and signalling the location of distant entities.
Central to the development of our framework, he recommends that:
instance, Enfield (2009) and Enfield et al. (2007) investigate pointing gestures
in semi-structured “locality interviews”, as well as in naturally occurring Lao
conversations. Similarly, Kendon (1992, 1995, 2004) draws on informal conversa-
tions, (semi-)institutional interactions, and elicited talk, to closely examine the
interactional and pragmatic aspects of pointing. In another study, Kendon and
Versante (2003) describe and compare six distinctive types of deictic manual ges-
tures produced during occasions of place and person reference using naturally
occurring video recorded Italian conversations. Socio-interactional research on
the coordination of pointing gestures and speech has also combined various meth-
ods – i.e., experimental and naturalistic data – in an effort to complement psycho-
linguistic conceptualisations of pointing gestures with interactional perspectives
gleaned from the analysis of everyday talk (e.g., Kita, 2003b).
Interactional research informed by Ethnomethodology (Garfinkel, 1967)
and CA has systematically investigated interactants’ embodied conduct and its
relationship with turns-at-talk (Schegloff, 1984), with a focus on the sequential
organisation of actions in interaction (for an overview, cf. Deppermann, 2013;
Nevile, 2015) and on the relationships between grammar and embodied actions
(e.g., Couper-Kuhlen, 2018; Iwasaki, 2009, 2011; Keevallik, 2018). Over the past
thirty years conversation analytic research has demonstrated the complexities of
pointing and its finely tuned coordination with talk. Conversation analysts have
explored pointing in a variety of contexts, including work meetings (Mondada,
2007), archaeological field excavation (Goodwin, 2003), parent–child interactions
(Filipi, 2009), interactions involving aphasic participants (Goodwin, 2003; Klippi,
2015), and children with Down’s syndrome (Wootton, 1990). Moreover, a recent
special issue of Open Linguistics on place reference in interaction (Enfield & San
Roque, 2017) includes analyses of pointing behaviour (Blythe et al., 2016; Sicoli,
2016; Williams, 2017).
Although the fine details of interaction and matters such as temporal and
sequential relations have traditionally been at the centre of conversation analytic
research (e.g., Deppermann & Streeck, 2018; Mondada, 2016; Mushin & Doehler,
2021; Streeck et al., 2011), the relationship between locational gestures, the wider
space, and geography has regularly been overlooked (however, see Auer et al.,
2013). With some exceptions (e.g., Stukenbrock, 2014), conversation analytic stud-
ies have generally considered the immediate interactional context, describing the
ways in which participants may orient to specific interactional affordances offered
by the surrounding environment, artefacts, and other local referents. While our
geospatial framework aligns with the temporal and sequential conventions of
orthodox CA, the GIS and GPS technologies allow us to incorporate the wider
spatial context into the analysis, affording a more holistic view of the environ-
ment, extending it beyond the very proximal setting in which face-to-face interac-
tion transpires. This allows the external analyst to visualise, from space, the local
topography, and thus gain familiarity with participants’ backyards, which reduces
the epistemic disparity between co-participants and external analysts. The utility
of this framework lies not so much in establishing whether or not points are accu-
rate (although our procedures are very helpful in this regard) but in understanding
where events under discussion are alleged to have transpired. Conversational nar-
ratives often commence with place references and locational points may be part
of how place references are formulated (Dingemanse et al., 2017). When place or
person references are designed to be elliptical (especially for reasons of taboo)
(Blythe et al., 2016), a geospatial view of locational points can help external ana-
lysts understand where participants are talking about and therefore what they are
talking about, all of which is requisite information for most EMCA analyses.
Visualising geographically enriched conversational data
This section discusses the data collection procedures adopted in our framework
(de Dear et al., 2021; Possemato et al., 2021; Stirling et al., 2022), and illustrates
how the acquisition of geospatial information can yield spatial insight to support
interactional analyses of locational points in conversation.
A central aspect of collecting conversational data for geospatial investigations
is the geolocalisation of the recording location. Accurately determining the site
of the recording session is critical for later establishing the directionality of
pointing gestures, and for positioning the interactional
scene within the wider topogeographical context. While exact coordinates can
be gathered through a GPS unit4, GPS-derived information can also be imported
and utilised for visualisation purposes in various Geographic Information System
(GIS) software, such as Google Earth and QGIS. Once the geographical data are
instrumentally acquired, the recording location can be found on a map in GIS
software such as Google Earth, and a placemark or pin can be added to the
satellite imagery (Figure 9.3). The recording
location can also be saved and the associated spatial data can be stored, exported,
and imported from and into other GIS software programs that can be later used for
the mapping and visualisation of geospatial data.
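For readers who prefer to script this step, a minimal sketch of writing a GPS-logged recording location out as a KML placemark that Google Earth or QGIS will open directly; the session name and coordinates below are hypothetical, not taken from the corpus:

```python
# A minimal sketch: write a GPS-logged recording location as a KML placemark.
# The coordinates and session name are hypothetical.
def placemark_kml(name, lat, lon, description=""):
    # KML lists coordinates in lon,lat[,alt] order
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>{name}</name>
      <description>{description}</description>
      <Point><coordinates>{lon},{lat},0</coordinates></Point>
    </Placemark>
  </Document>
</kml>
"""

kml = placemark_kml("Recording session A (hypothetical)", -14.3855, 129.0930,
                    "Camera 1 position")
# open("recording_location.kml", "w").write(kml)  # then drag the file into Google Earth
```

The resulting file can be stored alongside the session metadata and re-imported into any GIS program that reads KML.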
An equally crucial element for interactional analyses of locational pointing is
the annotation of the relative bearings of the camera(s) used in the recording ses-
sion. The alignment of the camera(s), expressed in degrees from true north, can
be acquired through the use of a compass, a handheld GPS device, or a dedicated
mobile application5. It is important to record the initial bearings of each camera, as
well as any subsequent alignment changes during the session in order to accurately
infer the pointing gestures’ directionality. If the location is known but the bearing
of the camera has not been recorded, it is possible to estimate the bearing (admit-
tedly with less precision) based on the rooflines of any buildings that are visible
within the satellite imagery, particularly in urban settings. The orientation of build-
ings can be measured using Google Earth’s ruler tool (see below).
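The roofline-based estimate described above can be made concrete: given the camera position and the coordinates of a building corner read off the satellite imagery, the standard forward-azimuth formula returns a bearing in degrees from true north. A sketch in Python, with all coordinates hypothetical:

```python
import math

def initial_bearing(lat1, lon1, lat2, lon2):
    """Forward azimuth from point 1 to point 2, degrees clockwise from true north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0

# hypothetical camera position and a building corner visible in the imagery
camera = (-14.3855, 129.0930)
corner = (-14.3860, 129.0945)
estimated_bearing = initial_bearing(*camera, *corner)
```

The same function applies whether the second point is a roofline corner used to estimate a missing camera bearing or any other landmark identifiable in the imagery.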
A satellite view of spatial points 179
FIGURE 9.3 The placemark window.
FIGURE 9.4 A still frame of a pointing gesture.
FIGURE 9.5 The scene reproduced from a bird's-eye perspective.
scene. In the graphic in Figure 9.5 the inferred pointing vector is represented by
the long arrow, the camera position and its bearing are represented by the short
arrow, and the pointing gesture is circled by a dotted line. It should be noted that
the participants’ spatial arrangement is inferred. By closely examining the video-
recorded interaction and/or pictures of the scene, we are now able to extrapolate
participants’ relative positioning, which allows us to produce a sufficiently accu-
rate visual representation.
FIGURE 9.6 Using Google Earth’s ruler tool to orient the landscape with respect to the absolute bearing of the video camera.
Next, the satellite imagery is rotated according to the camera bearing recorded
in the appropriate metadata file (for this example, the satellite imagery is rotated
110° east-south-east)7. A method to ensure that this rotation corresponds with the
camera bearing(s) is to use the ruler tool in Google Earth (cf. Figure 9.6). This will
make it possible to draw a semi-axis with a precise heading measured in degrees,
which should correspond with the bearing of the camera recorded in the metadata.
In other words, the line projected from the pinned recording location represents
the direction toward which the camera is pointing. After saving the line, the com-
pass on the top right-hand side of the screen (circled) can be rotated until it is
aligned vertically (see Figure 9.7).
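The ruler-tool semi-axis can also be generated computationally. On a spherical-Earth approximation, the endpoint of a line with a given heading and length from the pinned recording location follows from the standard direct-geodesic formula; the coordinates and the 500 m length below are illustrative (the 110° heading echoes the rotation example above):

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation

def destination(lat, lon, bearing_deg, distance_m):
    """Endpoint of a semi-axis of the given length and heading from (lat, lon)."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    return math.degrees(phi2), math.degrees(lam2)

# a 500 m line at a 110 degree heading from a hypothetical recording location
end_lat, end_lon = destination(-14.3855, 129.0930, 110.0, 500.0)
```

The two endpoints can then be written into a KML LineString, giving the same heading line that the ruler tool draws by hand.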
A snapshot of the rotated satellite image displaying the recording location is
then captured and imported into digital illustration software, where the origo (O)
of the point as well as its intended targets (Tn) are marked (in this case the target
communities are Kununurra, WA, and Wadeye, NT). A compass is also included
for reference (Figure 9.8).
A still frame of the stroke of the pointing gesture, including an arrow indicat-
ing the direction of the point and the names of participants, can then be overlaid
onto the map. The extrapolated vector is signified by a solid arrow that has been
pivoted to correspond with the point from the still frame of the pointing gesture.
This arrow can be duplicated and re-sized to use as a reference on the compass.
When participants point somewhat inaccurately to a target location, an idealised
vector connecting the origo with the true target(s) – represented here by a barred
line – is then added to the image, as shown in Figure 9.9.
FIGURE 9.7 The landscape is rotated until the camera bearing is aligned vertically toward the top of the image.
FIGURE 9.8 The rotated satellite imagery with Origo and Targets placemarked.
FIGURE 9.9 The satellite imagery showing actual and ideal pointing vectors.
angular discrepancy between Mabel’s actual point (A) and the intended target(s)
(T) to be 2°.
This process has illuminated the remarkable accuracy of interactants’ locational
pointing gestures, which is something that has continued to emerge in our research
on spatial reference and pointing in conversations conducted in remote Aboriginal
and non-Aboriginal Australian communities (de Dear et al., 2021; Stirling et al.,
2022). It has also highlighted interactants’ preference to maintain the relative dis-
tances between intended targets even when their points are somewhat inaccurate.
Graphically representing ideal and actual vectors has rendered the acuity of loca-
tional points analytically accessible. Moreover, this procedure has the potential to
yield significant quantitative insights, especially when applied to large-scale cross-
linguistic studies on the acuity of locational pointing gestures.
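Computationally, the angular discrepancy between an actual point and its intended target reduces to a signed difference between the bearing coded from the still frame and the great-circle bearing from origo to target. A sketch under the same spherical assumptions, with made-up coordinates and bearings:

```python
import math

def bearing_to(lat1, lon1, lat2, lon2):
    """Great-circle bearing from origo to target, degrees clockwise from true north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0

def angular_discrepancy(actual_bearing_deg, origo, target):
    """Signed error of an actual point against the ideal vector, in (-180, 180]."""
    ideal = bearing_to(*origo, *target)
    diff = (actual_bearing_deg - ideal + 180.0) % 360.0 - 180.0
    return 180.0 if diff == -180.0 else diff

# hypothetical: a point coded as 250 degrees, against a target WSW of the origo
err = angular_discrepancy(250.0, origo=(-14.39, 129.09), target=(-14.55, 128.60))
```

Run over a coded collection of points, per-point errors of this kind would feed directly into the cross-linguistic quantitative comparisons envisaged above.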
while hashtags indicate the moment where the screenshots – all representing the
stroke phase of the points – were taken with reference to the talk (or absence
thereof). It should also be noted that the morphological description of the
pointing gesture, along with its directionality, has been incorporated in the
corresponding figure caption. The coding of the pointing gestures was also informed by EMCA,
and by recent work within Pragmatic Typology (e.g., Dingemanse & Enfield,
2015; Floyd et al., 2020). We do not discuss these here; however, see Possemato
et al. (2021) for details.
The next section will show how geospatial methods can be employed to enrich
interactional analyses of place reference and locational points. The four excerpts
discussed here come from the CIARA corpus.
TABLE 9.1 The languages from the CIARA Project corpus
like if white people were to settle in the area and develop a city or a town. They
recall an old man at a meeting speaking out against possible development of the
area.
At lines 1 and 2 Rob recounts the old man’s words as he urged his countrymen to
protect the country from destruction. From lines 4 to 7 Ray and Rob display their
agreement about the old man being a good man (and wise). At lines 8, 10 and 12
Rob again reports the man’s oratory, where he pointed out that no bush medicine
would be growing in the area. He goes on to point out that the unspoilt country
has spirits in it (lines 14 and 16). At lines 17 and 18 Ray makes the claim that if
white people had settled in this area, they would have ruined the country, as they
have done where they found natural gas. As he does this he points back over his
shoulder (Figure 9.10) in a west-south-westerly direction to the onshore gas pro-
cessing plant near Yeltjerr beach where a pipeline brings in gas from a gas field in
the Joseph Bonaparte Gulf. Jim agrees with Ray’s assertion at line 20. At lines 22
and 24 Rob then makes the seemingly justified claim that white people (if left to
their own devices) will destroy the whole country.
Ray’s reference to natural gas in line 18 is vaguely composed of the “residue”
nominal-classifier nandji plus the proximal demonstrative kanyi (“this stuff”).
FIGURE 9.10 Ray produces a sagittally oriented index-finger point behind his left shoulder, due WSW, towards the gas processing plant near Yeltjerr beach.
11 (0.5)
12 Phy ja- (0.2) janganyji boo:rroo:rn, hh
jang-wanyji boorr-oorn
eat -maybe 3NS.S-say/do_PRES
may- maybe they eat hh
13 Shi Mm.
Mm.
While gazing to the east (inset 1, Figure 9.11), Shirley produces a place refer-
ence at line 1 which is composed of a Kriol distal deictic tharrei (“that way/over
there”) plus two (apparently contradictory) Gija geocentric terms related to a river-
drainage system: gendoowa (“upstream”) and yoorloo (“downstream”). Phyllis,
FIGURE 9.11 Shirley gazes and lip-points E.
who does not appear to have noticed Shirley’s eastward gaze, initiates repair at line
3 with the Kriol question word, wijeyi (“where”). Shirley specifies the location at
line 6, as “up here at Cattle Creek”. This is composed of a proximal demonstrative
(ngenengga), an elevational term (gerloorr, “up”), the English name of the creek,
plus a lip-point to the east (inset 2, Figure 9.11). Helen displays apparent recogni-
tion of this location at line 7. Phyllis then goes on (at lines 12 and 14) to suggest
that “these frogs” might account for the absence of fish in the creek. It is likely that
Phyllis is here referring to Cane Toads (Bufo marinus), a highly poisonous feral
species that is having a devastating impact on the local fauna. Shirley’s gaze is quite
accurately directed toward Cattle Creek, a tributary of the Bow River that lies some
25 km from where the women are seated.
In the next extract from the Jaru corpus, the three participants, Juanita, Nida, and
Ruby are sitting near the abandoned Gordon Downs homestead in the East Kimberley,
where Nida and Ruby used to live when they were young. Just before the exchange in
Extract (3), Nida explained that planes used to land at a nearby aerodrome to trans-
port sick people. This prompts Juanita to ask where kids were born back in the day.
In this question–response sequence, Juanita asks Nida where most of the children
from Gordon Downs were born (lines 1–2), to which Nida responds that some of
them were born in Wyndham (line 4). Wyndham is the northernmost town of the
Kimberley region, located about 370 km from the recording location as the crow flies.
As Nida utters the place name Windum (“Wyndham”), she gazes to her left side and
points backwards with her thumb in a north-westerly direction to indicate the loca-
tion of Wyndham (Figure 9.12). The direction of Nida’s thumb-point lies within 35
degrees of the target location, which is reasonably accurate. At line 5 Juanita nods
and repeats the place name Wyndham in sequential third position to overtly regis-
ter Nida’s response (cf. Schegloff, 1996, pp. 178–179). After 1.9 seconds of silence,
Nida then adds that some children were also born at the Gordon Downs homestead,
which is close to the recording location and within sight. At lines 7 through 11,
Nida points with a fluttering open hand to the area of the homestead, followed by
an index-finger point with a series of six pulses in the direction of an abandoned
building that sits 240 m away in an east-south-easterly direction (Figure 9.13).
FIGURE 9.12 Nida thumb-points backwards NW to Wyndham and gazes sideways.
FIGURE 9.13 Nida points to Gordon Downs with an elevated index-finger point ESE, producing six consecutive pulses.
Although Nida’s place references to Wyndham and Gordon Downs both involve
a pointing gesture in combination with a place name, there is a crucial difference:
The place reference to Wyndham is combined with a backward pointing gesture
that does not invite the recipient(s) to re-direct their gaze; it is what has been
described as a “secondary point” (Enfield et al., 2007). On the other hand, the
reference to Gordon Downs indicates a location in the participants’ vicinity and is
combined with spatial deictic expressions, murlawana (“around here”, line 7), rait
dea (“right there” in Kriol, line 8), and murlangga (“here”, line 11). The accom-
panying canonical pointing gesture to Gordon Downs conveys crucial locational
information and invites the recipient(s) to look in the direction of the pointing
gesture. While satellite imagery as presented in this chapter is especially powerful
for the analysis of secondary points to distant locations, it can also provide analysts
with geospatial information about more proximal locations that are the target of point-
ing gestures.
Our final extract comes from the Australian English corpus. The four partici-
pants Dave, Warren, Malcolm, and Jamie are seated near Halls Creek, in the East
Kimberley. Dave asks Warren whether he knows a way to transport his timber
factory to Adelaide “for nothing”, hinting at the possibility of Warren helping him
move the factory on one of his trips.
1 Dave warr^en,
2 (0.5)
3 Dave you know a bit about- (0.4) you know a lot about (.) truck=an’
4 things like that don’t [you.
5 Warr [o:h not really˘ .hmf °°i’m no-°°
6 Dave $come o[n$]
7 Warr [i-] i’m not a truckie mpf. he he h. hh.
8 (0.7)
9 Dave nah but [shif]ting stuff around the place
10 Warr [wh- ]
11 Warr what do you- (.) what do you w^ant
12 Dave w-
13 Warr >what’s happe[ning<]
14 Dave [if i ] want to *shift the timber factory, (1.0)
Dave *thumb-points behind --->>
15 from here* (0.9) #*down- (0.5) f- down*
Dave -->* #1 Figure 9.14-------*
16 *(0.2) to adel#aide o- o- yeah adelaide will do,* (0.7) uhm f’r
Dave *-------------#2 Figure 9.14--------------------*
17 nothing, >how would i do it<?
18 Warr h. ha [ha ha
19 Malc [ha [ha ha ha ha
20 Jam [hh. ah ah [ah
21 Warr [>put it in< a wheelbarrow.
22 (0.2)
FIGURE 9.14 Dave points SSE to Adelaide (1) and (2).
After summoning Warren’s attention (line 1), Dave produces a pre-request over
lines 3, 4, and 9, to which Warren issues a go-ahead at lines 11 and 13. Between
lines 14 and 17 Dave produces a seemingly tongue-in-cheek request for haulage
advice – perhaps masking an actual request for haulage. After initially pointing
with his thumb to the timber factory (lines 14 and 15), Dave produces a small
index finger point south-south-east as he launches a word-search at line 15 (inset
1, Figure 9.14). At line 16 Dave again points in the same direction (inset 2, Figure
9.14), this time accompanied by the place name, Adelaide. Dave’s points are
remarkably accurate (to within 5 degrees). Adelaide is a city of 1.3 million people,
located 2,133 km away from where the participants are seated.
It is worth noting that despite the enormous distance, Dave’s points are small in
scale and neither point is elevated. These gestures are anti-iconic in that they are
inversely proportional in magnitude to the vast distance being indicated (cf. Bauer,
2014; Le Guen, 2011; Levinson, 2003; Wilkins, 2003, inter alia). Warren, Jamie,
and Malcolm do not need enlightening that Adelaide is a long way from Halls
Creek. The tiny points may instead downplay the imposition of the request (“It’s
only a small way to go!”) and thereby contribute toward prompting the co-partici-
pants’ joint laughter at lines 18–20. This laughter, along with Warren’s dismissive
suggestion that Dave haul his own factory in a wheelbarrow (line 21), demonstrates
that Dave’s multimodal reference to this distant location is more than adequate for
his communicative objectives.
Conclusion
Place reference in conversation needs to be designed in a way that enables recipients
to recognise where a speaker is referring to. An EMCA approach to interaction
Notes
1 The methodological framework discussed here was first published in Possemato et al.
(2021). This chapter reworks the procedures discussed in that article, re-presenting
them for an EMCA audience – with altogether different data.
2 Conversation Analysis in Aboriginal and Remote Australia (CIARA) is a collabo-
rative research project funded by the Australian Research Council (DP180100515,
www.ciaraproject.com) that uses Conversation Analytic/Interactional Linguistic tech-
niques to compare conversational interactions across different languages, cultures and
geographic locations within the Australian outback.
3 For a critique of FoR models and Whorfian assumptions in studying naturally occur-
ring points in interaction cf. de Dear et al. (2021).
4 It is worth noting that some cameras can mount GPS receivers or have in-built GPS.
Various mobile phone applications, such as iOS Compass, and other GPS trackers can
also be used for geolocalization purposes.
5 Compass sensors are usually integrated in GPS units and in GPS mobile applications.
6 Research on gesture describes points as generally articulated into three core distinct
phases, namely preparation, stroke, and retraction (e.g., McNeill, 1992; Kita et al.,
1998).
References
Auer, P., Hilpert, M., Stukenbrock, A., & Szmrecsanyi, B. (Eds.). (2013). Space in
language and linguistics: Geographical, interactional, and cognitive perspectives.
De Gruyter.
Bauer, A. (2014). The use of signing space in a shared sign language of Australia. De
Gruyter Mouton; Ishara Press.
Blythe, J., Mardigan, K. C., Perdjert, M. E., & Stoakes, H. (2016). Pointing out directions in
Murrinhpatha. Open Linguistics, 2(1), 132–159. https://doi.org/10.1515/opli-2016-0007
Couper-Kuhlen, E. (2018). Finding a place for body movement in grammar. Research on
Language and Social Interaction, 51(1), 22–25. https://doi.org/10.1080/08351813.2018
.1413888
Dahmen, J. (2021). Bilingual speech in Jaru-Kriol conversations: Codeswitching,
codemixing, and grammatical fusion. International Journal of Bilingualism, 26(2),
198–226. https://doi.org/10.1177/13670069211036925
de Dear, C. (2019). Place reference and pointing in Gija conversation. Macquarie
University Master of Research. http://hdl.handle.net/1959.14/1284209
de Dear, C., Blythe, J., Possemato, F., Gardner, R., Stirling, L., Mushin, I., & Kofod, F.
(2021). Locational pointing in Murrinhpatha, Gija and English conversations. Gesture,
20(3), 417–452. https://doi.org/10.1075/gest.20035.dea
de Dear, C., Possemato, F., & Blythe, J. (2020). Gija (east Kimberley, Western Australia)
– Language snapshot. Language Documentation and Description, 17, 134–141.
http://www.elpublishing.org/PID/189
Deppermann, A. (2013). Multimodal interaction from a conversation analytic perspective.
Journal of Pragmatics, 46(1), 1–7. https://doi.org/10.1016/j.pragma.2012.11.014
Deppermann, A., & Streeck, J. (Eds.). (2018). Time in embodied interaction. John
Benjamins Publishing Company.
Dingemanse, M., & Enfield, N. J. (2015). Other-initiated repair across languages: Towards
a typology of conversational structures. Open Linguistics, 1(1), 96–118.
https://doi.org/10.2478/opli-2014-0007
Dingemanse, M., Rossi, G., & Floyd, S. (2017). Place reference in story beginnings: A
cross-linguistic study of narrative and interactional affordances. Language in Society,
46(2), 129–158. https://doi.org/10.1017/S0047404516001019
Enfield, N. J. (2009). The anatomy of meaning: Speech, gesture, and composite utterances.
Cambridge University Press.
Enfield, N. J. (2013). Reference in conversation. In J. Sidnell & T. Stivers (Eds.), The
handbook of conversation analysis (pp. 433–454). John Wiley & Sons, Ltd.
Enfield, N. J., Kita, S., & de Ruiter, J. P. (2007). Primary and secondary pragmatic functions
of pointing gestures. Journal of Pragmatics, 39(10), 1722–1741. https://doi.org/10.1016
/j.pragma.2007.03.001
Enfield, N. J., & San Roque, L. (2017). Place reference in interaction. Open Linguistics,
3(1). https://doi.org/10.1515/opli-2017-0029
Filipi, A. (2009). Toddler and parent interaction: The organisation of gaze, pointing and
vocalisation (Vol. 192). John Benjamins.
Floyd, S., Rossi, G., & Enfield, N. J. (Eds.). (2020). Getting others to do things: A pragmatic
typology of recruitments. Language Science Press. https://doi.org/10.5281/zenodo.4017493
Garfinkel, H. (1967). Studies in ethnomethodology. Prentice-Hall.
Goodwin, C. (2003). Pointing as situated practice. In S. Kita (Ed.), Pointing: Where
language, culture, and cognition meet (pp. 217–242). L. Erlbaum Associates.
Goodwin, C. (2006). Human sociality as mutual orientation in a rich interactive
environment: Multimodal utterances and pointing in aphasia. In N. J. Enfield & S. C.
Levinson (Eds.), Roots of human sociality (pp. 96–125). Berg.
Goodwin, C. (2012). The co-operative, transformative organization of human action and
knowledge. In Proceedings of the 14th ACM international conference on multimodal
interaction (ICMI ’12), 1–2. ACM.
Green, J. (2014). Signs and space in Arandic sand narratives. In M. Seyfeddinipur & M.
Gullberg (Eds.), From gesture in conversation to visible action as utterance (pp. 219–
243). John Benjamins Publishing Company.
Green, J., & Wilkins, D. P. (2014). With or without speech: Arandic Sign Language from
Central Australia. Australian Journal of Linguistics, 34(2), 234–261. https://doi.org/10
.1080/07268602.2014.887407
Haviland, J. B. (1993). Anchoring, iconicity, and orientation in guugu Yimithirr pointing
gestures. Journal of Linguistic Anthropology, 3(1), 3–45. https://doi.org/10.1525/jlin
.1993.3.1.3
Haviland, J. B. (1998). Guugu Yimithirr cardinal directions. Ethos, 26(1), 25–47. https://doi
.org/10.1525/eth.1998.26.1.25
Haviland, J. B. (2000). Mental maps and gesture spaces. In D. McNeill (Ed.), Language
and gesture: Window into thought and action (pp. 13–46). Cambridge University
Press.
Haviland, J. B. (2003). How to point in Zinacantán. In S. Kita (Ed.), Pointing: Where
language, culture, and cognition meet (pp. 139–169). Lawrence Erlbaum Associates.
Hepburn, A., & Bolden, G. (2017). Transcribing for social research. Sage Publications.
Iwasaki, S. (2009). Initiating interactive turn spaces in Japanese conversation: Local
projection and collaborative action. Discourse Processes, 46(2–3), 226–246. https://doi
.org/10.1080/01638530902728918
Iwasaki, S. (2011). The multimodal mechanics of collaborative unit construction in
Japanese conversation. In J. Streeck, C. Goodwin, & C. LeBaron (Eds.), Embodied
interaction: Language and body in the material world (pp. 106–120). Cambridge
University Press.
Jefferson, G. (2004). Glossary of transcript symbols with an introduction. In G. H. Lerner
(Ed.), Conversation analysis: Studies from the first generation (pp. 13–23). John
Benjamins.
Keevallik, L. (2018). What does embodied interaction tell us about grammar? Research
on Language and Social Interaction, 51(1), 1–21. https://doi.org/10.1080/08351813.2018
.1413887
Kendon, A. (1992). Some recent work from Italy on “quotable gestures (emblems).” Journal
of Linguistic Anthropology, 2(1), 92–108. https://doi.org/10.1525/jlin.1992.2.1.92
Kendon, A. (1995). Gestures as illocutionary and discourse structure markers in Southern
Italian conversation. Journal of Pragmatics, 23(3), 247–279. https://doi.org/10.1016
/0378-2166(94)00037-F
Kendon, A. (2004). Gesture: Visible action as utterance. Cambridge University Press.
Kendon, A., & Versante, L. (2003). Pointing by hand in Neapolitan. In S. Kita (Ed.),
Pointing: Where language, culture, and cognition meet (pp. 109–137). Lawrence
Erlbaum Associates.
Kita, S. (2003a). Pointing: Where language, culture, and cognition meet. L. Erlbaum
Associates.
Kita, S. (2003b). Interplay of gaze, hand, torso, and language. In S. Kita (Ed.), Pointing:
Where language, culture, and cognition meet (pp. 307–328). L. Erlbaum Associates.
Kita, S. (2009). Cross-cultural variation of speech-accompanying gesture: A review.
Language and Cognitive Processes, 24(2), 145–167. https://doi.org/10.1080
/01690960802586188
Kita, S., van Gijn, I., & van der Hulst, H. (1998). Movement phases in signs and co-speech
gestures, and their transcription by human coders. In I. Wachsmuth & M. Fröhlich
(Eds.), Gesture and sign language in human-computer interaction (Lecture Notes in
Computer Science) (pp. 23–35). Springer. https://doi.org/10.1007/BFb0052986
Klippi, A. (2015). Pointing as an embodied practice in aphasic interaction. Aphasiology,
29(3), 337–354. https://doi.org/10.1080/02687038.2013.878451
Le Guen, O. (2011). Modes of pointing to existing spaces and the use of frames of reference.
Gesture, 11(3), 271–307. https://doi.org/10.1075/gest.11.3.02leg
Levinson, S. C. (1996). Language and space. Annual Review of Anthropology, 25(1), 353.
Levinson, S. C. (1997). Language and cognition: The cognitive consequences of spatial
description in guugu Yimithirr. Journal of Linguistic Anthropology, 7(1), 98–131.
https://doi.org/10.1525/jlin.1997.7.1.98
Levinson, S. C. (2003). Space in language and cognition: Explorations in cognitive
diversity. Cambridge University Press.
Levinson, S. C., Kita, S., Haun, D. B. M., & Rasch, B. H. (2002). Returning the tables:
Language affects spatial reasoning. Cognition, 84(2), 155–188. https://doi.org/10.1016
/S0010-0277(02)00045-8
Levinson, S. C., & Wilkins, D. P. (Eds.). (2006). Grammars of space: Explorations in
cognitive diversity. Cambridge University Press.
Majid, A., Bowerman, M., Kita, S., Haun, D. B. M., & Levinson, S. C. (2004). Can language
restructure cognition? The case for space. Trends in Cognitive Sciences, 8(3), 108–114.
https://doi.org/10.1016/j.tics.2004.01.003
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. University of
Chicago Press.
Mesh, K., Cruz, E., van de Weijer, J., Burenhult, N., & Gullberg, M. (2021). Effects of scale
on multimodal deixis: Evidence from Quiahije Chatino. Frontiers in Psychology, 11,
3183. https://doi.org/10.3389/fpsyg.2020.584231
Mondada, L. (2007). Multimodal resources for turn-taking: Pointing and the emergence
of possible next speakers. Discourse Studies, 9(2), 194–225. https://doi.org/10.1177
/1461445607075346
Mondada, L. (2014). The local constitution of multimodal resources for social interaction.
Journal of Pragmatics, 65, 137–156. https://doi.org/10.1016/j.pragma.2014.04.004
Mondada, L. (2016). Challenges of multimodality: Language and the body in social
interaction. Journal of Sociolinguistics, 20(3), 336–366. https://doi.org/10.1111/josl.1_12177
Mondada, L. (2019a). Conventions for multimodal transcription. https://www.lorenzamondada.net/_files/ugd/ba0dbb_986ddd4993a04a57acf20ea06e2b9a34.pdf
Wilkins, D. P. (2003). Why pointing with the index finger is not a universal (in sociocultural
and semiotic terms). In S. Kita (Ed.), Pointing: Where language, culture and cognition
meet (pp. 171–215). Lawrence Erlbaum Associates.
Williams, N. (2017). Place reference in Kula conversation. Open Linguistics, 3(1).
https://doi.org/10.1515/opli-2017-0028
Wootton, A. J. (1990). Pointing and interaction initiation: The behaviour of young children
with Down’s syndrome when looking at books. Journal of Child Language, 17(3), 565–
589. https://doi.org/10.1017/S0305000900010886
Abbreviations
ALL: allative, ANAPH: anaphoric demonstrative, APPL: applicative, CAT: cata-
lyst, CL: clitic, DIST: distal demonstrative, DO: direct object, EMPH: emphatic,
EXIST: existential, FOC: focus, FUT: future, GEN: genitive, IMPV: imperfective,
INTS: intensifier, INTJ: interjection, IRR: irrealis, LOC: locative, M: masculine,
NC:ANM: ‘animate’ noun classifier, NC:PERS: ‘person’ noun classifier, NC:PL/T:
‘place/time’ noun classifier, NC:RES: ‘residue’ noun classifier, NEG: negator,
NFUT: non-future, NMLZ: nominalizer, NSIB: non-sibling, NS: non-singular,
PC: paucal, PERL: Perlative, PL: plural, PRES: present tense, PROX: proximal,
PST: past, S: subject, SG: singular, TEMP: temporal adverbial.
10
EMCA INFORMED EXPERIMENTATION
AS A WAY OF INVESTIGATING
(ALSO) “NON-ACCOUNTABLE”
INTERACTIONAL PHENOMENA
Melisa Stevanovic
DOI: 10.4324/9781003424888-13
in which a researcher could have compelling reasons to assume that the partici-
pants’ relationship is in some way fundamentally unequal or unbalanced (e.g.,
sexual harassment and violence), in which case the sole focus on the participants’
publicly displayed orientations leaves the researcher at risk of disregarding
those aspects of interaction that are particularly relevant for the participants them-
selves (Wetherell, 1998; Billig, 1999).
In this chapter, I will describe EMCA informed research that deviates from
the ontological muteness characteristic of this field of inquiry. I will describe
EMCA informed research on two interactional phenomena that go beyond peo-
ple’s publicly displayed, accountable conduct – the prereflective human mirroring
mechanisms and the physiological underpinnings of social interaction. Although
these phenomena are likely to evade EMCA analytic tools, I still assume that they
play a key role in what EMCA is generally interested in – that is, in
how social interaction is organised as actions and sequences of action.
Dale (2013) found significantly less bodily synchrony within a dyad during argu-
mentative settings, compared to affiliative ones. Fusaroli & Tylén (2012) found
that when dyads were making joint decisions in a psychophysical task, the degree
to which the participants matched each other’s task-relevant expressions corre-
lated positively with their task performance, whereas the indiscriminate matching
of all expressions had a negative effect on the task performance. Findings such
as these suggests that people’s behaviours vary with respect to when they mirror
each other’s behaviours and when not. Some researchers have even suggested that
it is precisely the alternation between mirroring and non-mirroring that drives
the interaction and makes it interesting (Beebe & Lachman, 2002; Fuchs & De
Jaegher, 2009). These ideas highlight the relevance of prereflective mirroring from
the EMCA perspective (Stevanovic, Himberg et al., 2017; Stevanovic & Himberg,
2021). If people do not mirror each other’s behaviours all the time, the EMCA
researcher asks: “Why that now?” – that is, what it is in the situation at the moment
that makes mirroring more or less relevant. Even if the prereflective mirroring
behaviours go beyond people’s publicly displayed accountable behaviours, it is still
possible that these behaviours somehow contribute to people’s interpretations of
actions and sequences of action.
In our EMCA informed experimental study on dyadic joint decision-making
interaction (see Stevanovic, Himberg et al., 2017), we compared the degree of
similarity in the participants’ body sway during sequential continuations and
sequential transitions, finding that the instances of highest body-sway synchrony
occurred during the sequential transitions. In my view, this finding suggests that it
is specifically at those moments of interaction when a close coordination is critical
– for example, when the participants need to reach and display a common under-
standing that a joint decision has been reached – that the prereflective mirroring
mechanisms can become consequential for the sequential organisation of action.
From this perspective, the phenomenon becomes a topic of EMCA informed
inquiry.
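Body-sway synchrony of this kind is typically quantified by correlating two movement time series within short sliding windows. The following Python sketch is a hypothetical illustration of such a windowed synchrony index, not the pipeline of the study cited above; the signals and parameter values are invented.

```python
import numpy as np

def windowed_synchrony(x, y, win=100, step=50):
    """Pearson correlation between two movement time series,
    computed in sliding windows, as a simple synchrony index."""
    scores = []
    for start in range(0, min(len(x), len(y)) - win + 1, step):
        r = np.corrcoef(x[start:start + win], y[start:start + win])[0, 1]
        scores.append(r)
    return np.array(scores)

# Two simulated participants sharing a slow oscillation plus individual noise
rng = np.random.default_rng(0)
t = np.linspace(0, 20, 1000)
shared = np.sin(t)
a = shared + 0.3 * rng.standard_normal(1000)
b = shared + 0.3 * rng.standard_normal(1000)

sync = windowed_synchrony(a, b)
print(len(sync), round(float(sync.mean()), 2))
```

In an actual study, the window length would be chosen relative to the sampling rate of the movement signal, and the window-wise scores could then be compared across sequential continuations and transitions.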
linked to various social and emotional stimuli (e.g., DiMascio et al., 1957; Khalfa
et al., 2002; Marci et al., 2007; Stark et al., 2005). Other physiological response
variables that have been associated with specific social and affective processes
are heart rate (Konvalinka et al., 2011), breathing (McFarland, 2001), and facial
electromyographic activity (Deschamps et al., 2012). Measuring these physiological
responses requires technical equipment other than video recordings, which is new
to EMCA but has played a central role in psychological research.
One set of studies on the physiological underpinnings of social interaction has
focused on the degree of synchrony in the physiological changes in the interacting
participants (Feldman et al., 2011; Konvalinka et al., 2011; Marci et al., 2007). In
these non-EMCA studies, physiological synchrony has come across as a feature
of intense social interaction, which may range from competitive computer games
(Spapé et al., 2013) to fire-walking rituals (Konvalinka et al., 2011).
Consistent with this insight, our own EMCA informed experimental study on
joint decision-making interaction (Stevanovic et al., 2021) showed physiological
synchrony to be higher during proposal sequences, compared to the other types of
sequences constituting the participants’ conversational activity.
Another related set of studies has focused on how social-interactional events
relate to increases or decreases in the physiological indicators of participants’
arousal. For example, in an early study, DiMascio and colleagues (1957) exam-
ined psychotherapy sessions with reference to the categories of Bales’ Interaction
Process Analysis (Bales, 1950), finding that the categories “showing tension”,
“showing tension release”, and “showing antagonism” were reflected in the par-
ticipants’ heart rates in systematic ways. This type of focus has also characterised
some recent EMCA informed experimental studies on the physiological under-
pinnings of interaction (Koskinen et al., 2021; Peräkylä et al., 2015; Stevanovic
et al., 2019b, 2021, 2022; Voutilainen et al., 2014, 2018a). For example, focusing
on storytelling and story reception, Peräkylä and colleagues (2015) found that an
increased level of affiliative story reception is associated with a decrease in the
storyteller’s arousal and an increase in the story recipient’s arousal, as indicated in
the participants’ SC (skin conductance) response during the storytelling episodes
(see also Stevanovic et al., 2019b). The authors interpreted this finding by drawing
on the dyadic systems theory by Beebe and Lachmann (2002), which postulates
that the system by which participants regulate their affective arousal is bidirec-
tionally connected to the system by which participants regulate the unfolding of
social interaction.
From the perspective of EMCA, the dyadic systems theory by Beebe and
Lachmann (2002) offers an important insight. It suggests that, even though one
half of the picture (the self-regulation of arousal) escapes publicly displayed
moral orientations, an understanding of the whole system is necessary for a
deeper understanding of its other half – the regulation of interaction, which is
governed by the mechanism of moral accountability and constitutes the focus of
traditional EMCA inquiry.
204 Melisa Stevanovic
video recordings alone, we may need to use some measuring equipment attached
to the participants’ bodies (see e.g., Stevanovic, Himberg et al., 2017; Holler
& Kendrick, 2015; Torreira et al., 2015). This is also the case for physiological
measurements, which typically limit the participants’ physical freedom (see
e.g., Koskinen et al., 2021; Peräkylä et al., 2015; Stevanovic et al., 2019b, 2021;
Voutilainen et al., 2014, 2018a). Hence, whether and how experimentation would
make sense as a means of investigation must always be assessed in the light of the
specific questions that one wants to study.
EMCA informed experimental research may be motivated by various reasons.
First, one might want to assess the generalisability of certain findings obtained in
the qualitative data-driven scrutiny of naturally occurring interactions. The pos-
sibility of inducing repeated instances of those actions and sequences of action
that one has previously studied on a case-by-case basis and subjecting the obser-
vations to quantification and statistical analysis might come across as a natural
next step in one’s attempts to better understand these interactional phenomena.
Second, one might be interested in comparing the interactional practices of differ-
ent participant groups (e.g., participants with various clinical conditions). Given
that such differences cannot be considered as absolute and static but as subject to
high interindividual and intraindividual variation, analytic claims must be
based on a larger collection of parallel cases than would be possible to obtain by
using naturally occurring data only.
Third, as has been highlighted in this chapter, one might want to examine
social-interactional phenomena, such as the prereflective human mirroring mech-
anisms and the physiological underpinnings of interaction, which can best be tech-
nically measured in laboratory conditions. This is not only due to the possibilities
of using measurement technologies in the lab, but also due to the nature of the
phenomena as going beyond publicly displayed accountable behaviours. Accepting
the possibility that the construction of actions and sequences of action may
also be informed by interactional phenomena that the participants are not
reflexively aware of makes it relevant for the researcher to also accept that
not every single case in the data can be accounted for in similar terms. In other words,
the EMCA principle involving the need to account for all the so-called “deviant
cases” must be relaxed and replaced by effective ways of separating the generic
patterns of action construction from what may now be regarded as “noise”. Such
separation necessitates larger amounts of comparable data (with parallel actions
and sequences of action) than one might be able to obtain in natural settings.
radically different from the inductive EMCA enterprise with “unmotivated look-
ing” of naturally occurring interactions as a starting point. The EMCA informed
way of conducting experiments must therefore be conceived as a compromise
between these two entirely different ways of carrying out research.
I suggest that the research process associated with EMCA informed experimentation
encompasses the following five steps: (1) theorising the interactional
target phenomenon, (2) inventing the social interaction tasks, (3) running the
experiments, (4) coding or rating and checking for inter-coder reliability, and (5)
statistical analysis and the interpretation of results. In what follows, I will briefly
discuss each step separately, pointing to certain complications and concerns that
an EMCA researcher might experience during them.
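For step (4), a standard agreement statistic is Cohen’s kappa (Cohen, 1960), which corrects raw coder agreement for chance agreement. A minimal Python sketch, using hypothetical codes for sequential transitions and continuations:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's (1960) kappa for two coders' nominal labels."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed proportion of agreement
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["transition", "continuation", "continuation", "transition", "continuation"]
b = ["transition", "continuation", "transition", "transition", "continuation"]
print(round(cohens_kappa(a, b), 2))  # → 0.62
```

A kappa of 1 indicates perfect agreement; by common convention, values above roughly 0.6 are treated as substantial, though which threshold to apply remains a matter of judgment.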
2013), and deontics (Stevanovic, 2018), although not much has yet been done in
this respect.
Notably, theorising interactional phenomena must not be exclusively based
on EMCA literature. In addition, the EMCA informed theorising of actions and
sequences of action may be augmented by using theories from other fields, such
as psychology (De Ruiter & Albert, 2017). It is from this perspective that
questions about “non-accountable” interaction phenomena also surface. As traditional
EMCA operates on ontological and epistemological assumptions that are not com-
patible with this view, some researchers might be inclined to deny the EMCA
informed nature of research on such questions. However, the very idea of social
action, with its multiple resources and configurations in chains of initiating and
responsive actions, is essentially informed by EMCA. In my view, it is only right
and just to acknowledge this source of inspiration when theorising social interac-
tion, as it motivates asking questions that have not been asked by researchers from
other fields of inquiry.
L, M, N, O), and then asked to select eight adjectives that would start with these
letters, and describe a fictional target (e.g., Donald Duck). As a motivation for
the task, the participants were told to imagine being editors of a children’s book,
teaching the alphabet by featuring the target character with a series of adjectives.
The participants carried out the task entirely without experimenter intervention.
Once a decision for one letter was reached, the dyad moved on to the next letter in
the alphabet, which means that the typical practice by which a transition to a new
sequence was constructed was the mentioning of the next relevant letter (e.g., “and
then K”). As indications of transitions, such utterances were easily and reliably
identifiable from the data.
in the ways in which the task is explained to the participants may reflect on their
entire conversation. Also, if the experiment consists of several tasks to be
conducted during a single laboratory session, it is important to counterbalance
the order of the tasks across the different sessions.
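A simple way to counterbalance is to cycle through the permutations of the task set when assigning orders to sessions. The sketch below is a hypothetical illustration (the task names are invented); for larger task sets, where the number of permutations explodes, a balanced Latin square is commonly used instead.

```python
from itertools import cycle, islice, permutations

def counterbalanced_orders(tasks, n_sessions):
    """Assign each session a task order, cycling through all
    permutations so that orders are balanced across sessions."""
    orders = list(permutations(tasks))
    return list(islice(cycle(orders), n_sessions))

# Three hypothetical tasks over six sessions: each order occurs exactly once
for session, order in enumerate(
        counterbalanced_orders(["decision", "story", "quiz"], 6), start=1):
    print(session, order)
```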
Conclusions
In this chapter, I have described EMCA informed experimentation as a way of
addressing social-interactional phenomena that go beyond people’s publicly
displayed accountable behaviours, but which may still be ontologically real and
somehow relevant for how social interaction ends up becoming organised as actions
and sequences of action. I have given two examples of such phenomena: the pre-
reflective human mirroring mechanisms and the physiological underpinnings of
social interaction.
The investigation of “non-accountable” interactional phenomena is challeng-
ing from the EMCA perspective. Such investigation emphasises the need to use
quantification to distinguish between the basic and non-basic patterns of interac-
tion and involves a relaxation of the requirement to account for “deviant cases”.
While researchers in other fields are used to dealing with “noise” in the data, for
an EMCA researcher such an idea is hard to reconcile with the notion of “order
at all points” (Sacks, 1984, p. 22). EMCA informed experimental research thus
involves various complications and concerns that are likely to arise during the
research process. In addition, publishing the results may involve a further risk:
I once had a paper under review at a journal for nine months, after which the
editor returned the submission stating that they had been unable to find reviewers
for the paper.
“Non-accountable” interactional phenomena 211
References
Austin, J. L. (1962). How to do things with words. Harvard University Press.
Bales, R. F. (1950). Interaction process analysis: A method for the study of small groups.
University of Chicago Press.
Bavelas, J., Coates, L., & Johnson, T. (2000). Listeners as co-narrators. Journal of
Personality and Social Psychology, 79(6), 941–952.
Bavelas, J., Gerwing, J., & Healing, S. (2014). Effect of dialogue on demonstrations: Direct
quotations, facial portrayals, hand gestures, and figurative references. Discourse
Processes, 51(8), 619–655.
Bavelas, J., Gerwing, J., Sutton, C., & Prevost, D. (2008). Gesturing on the telephone:
Independent effects of dialogue and visibility. Journal of Memory and Language, 58(2),
495–520.
Beebe, B., & Lachmann, F. (2002). Infant research and adult treatment: Co-constructing
interactions. The Analytic Press.
Berger, P. L., & Luckmann, T. (1967). The social construction of reality: A treatise in the
sociology of knowledge. Penguin Books.
Billig, M. (1999). Whose terms? Whose ordinariness? Rhetoric and ideology in
conversation analysis. Discourse and Society, 10(4), 543–558. https://doi.org/10.1177
/0957926599010004005
Boden, D. (1994). The business of talk: Organizations in action. Polity Press.
Bögels, S., Kendrick, K. H., & Levinson, S. C. (2015). Never say no… How the brain
interprets the pregnant pause in conversation. PLOS One, 10(12), e0145474. https://doi
.org/10.1371/journal.pone.0145474
Burr, V. (2015). Social constructionism (3rd ed.). Routledge.
Chartrand, T. L., & Bargh, J. A. (1999). The chameleon effect: The perception–behavior
link and social interaction. Journal of Personality and Social Psychology, 76(6), 893–
910. https://doi.org/10.1037//0022-3514.76.6.893
Clark, H. H., & Krych, M. A. (2004). Speaking while monitoring addressees for
understanding. Journal of Memory and Language, 50(1), 62–81.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and
Psychological Measurement, 20(1), 37–46. https://doi.org/10.1177/001316446002000104
Condon, W. S., & Sander, L. W. (1974). Synchrony demonstrated between movements of
the neonate and adult speech. Child Development, 45(2), 456–462. https://doi.org/10
.2307/1127968
Couper-Kuhlen, E., & Selting, M. (2001). Introducing interactional linguistics. In M.
Selting & E. Couper-Kuhlen (Eds.), Studies in interactional linguistics (pp. 1–22). John
Benjamins.
Cross, I. (2005). Music and meaning, ambiguity and evolution. In D. Miell, R. MacDonald,
& D. J. Hargreaves (Eds.), Musical communication (pp. 27–43). Oxford University Press.
De Ruiter, J. P., & Albert, S. (2017). An appeal for a methodological fusion of conversation
analysis and experimental psychology. Research on Language and Social Interaction,
50(1), 90–107. https://doi.org/10.1080/08351813.2017.1262050
De Ruiter, J. P., Mitterer, H., & Enfield, N. J. (2006). Projecting the end of a speaker’s turn:
A cognitive cornerstone of conversation. Language, 82(3), 515–535. https://doi.org/10
.1353/lan.2006.0130
Deschamps, P. K. H., Schutte, I., Kenemans, J. L., Matthys, W., & Schutter, D. J. L. G.
(2012). Electromyographic responses to emotional facial expressions in 6–7 year olds.
International Journal of Psychophysiology, 85(2), 195–199. https://doi.org/10.1016/j
.ijpsycho.2012.05.004
DiMascio, A., Boyd, R., & Greenblatt, M. (1957). Physiological correlates of tension
and antagonism during psychotherapy: A study of “Interpersonal Physiology”.
Psychosomatic Medicine, 19(2), 99–104.
Drew, P., & Heritage, J. (1992). Analyzing talk at work: An introduction. In P. Drew
& J. Heritage (Eds.), Talk at work: Interaction in institutional settings (pp. 3–65).
Cambridge University Press.
Edwards, D., & Potter, J. (1992). Discursive psychology. Sage.
Feldman, R., Magori-Cohen, R., Galili, G., Singer, M., & Louzoun, Y. (2011). Mother
and infant coordinate heart rhythms through episodes of interaction synchrony. Infant
Behavior and Development, 34(4), 569–577. https://doi.org/10.1016/j.infbeh.2011.06
.008
Frith, C. D., & Frith, U. (2006). How we predict what other people are going to do. Brain
Research, 1079(1), 36–46. https://doi.org/10.1016/j.brainres.2005.12.126
Fuchs, T., & De Jaegher, H. (2009). Enactive intersubjectivity: Participatory sense-making
and mutual incorporation. Phenomenology and the Cognitive Sciences, 8(4), 465–486.
https://doi.org/10.1007/s11097-009-9136-4
Fusaroli, R., & Tylén, K. (2012). Carving language for social coordination: A dynamical
approach. Interaction Studies, 13(1), 103–124. https://doi.org/10.1075/is.13.1.07fus
Garfinkel, H. (1967). Studies in ethnomethodology. Prentice Hall.
Garrod, S., & Anderson, A. (1987). Saying what you mean in dialogue: A study in
conceptual and semantic co-ordination. Cognition, 27(2), 181–218. https://doi.org/10
.1016/0010-0277(87)90018-7
Goffman, E. (1967). Interaction ritual: Essays on face-to-face interaction. Anchor
Books.
Goodwin, C., & Heritage, J. (1990). Conversation analysis. Annual Review of Anthropology,
19(1), 283–307. https://doi.org/10.1146/annurev.an.19.100190.001435
Gorisch, J., Wells, B., & Brown, G. J. (2012). Pitch contour matching and interactional
alignment across turns: An acoustic investigation. Language and Speech, 55(1), 57–76.
Hamilton, D. L., & Sherman, S. J. (1996). Perceiving persons and groups. Psychological
Review, 103(2), 336–355. https://doi.org/10.1037/0033-295X.103.2.336
Hammersley, M. (2003). Conversation analysis and discourse analysis: Methods
or paradigms? Discourse and Society, 14(6), 751–781. https://doi.org/10.1177
/09579265030146004
Heritage, J. (1984). Garfinkel and ethnomethodology. Polity Press.
Heritage, J. (1987). Ethnomethodology. In A. Giddens & J. Turner (Eds.), Social theory
today (pp. 224–272). Polity Press.
Heritage, J. (2009). Conversation analysis as social theory. In B. Turner (Ed.), The new
Blackwell companion to social theory (pp. 300–320). Blackwell.
Marci, C. D., Ham, J., Moran, E., & Orr, S. P. (2007). Physiologic correlates of perceived
therapist empathy and social-emotional process during psychotherapy. Journal
of Nervous and Mental Disease, 195(2), 103–111. https://doi.org/10.1097/01.nmd
.0000253731.71025.fc
Marsh, K. L., Richardson, M. J., & Schmidt, R. C. (2009). Social connection through joint
action and interpersonal coordination. Topics in Cognitive Science, 1(2), 320–339.
https://doi.org/10.1111/j.1756-8765.2009.01022.x
McFarland, D. H. (2001). Respiratory markers of conversational interaction. Journal of
Speech, Language, and Hearing Research, 44(1), 128–143. https://doi.org/10.1044/1092
-4388(2001/012)
Miles, L. K., Louise, K. N., & Macrae, C. N. (2009). The rhythm of rapport: Interpersonal
synchrony and social perception. Journal of Experimental Social Psychology, 45(3),
585–589. https://doi.org/10.1016/j.jesp.2009.02.002
Paxton, A., & Dale, R. (2013). Frame-differencing methods for measuring bodily synchrony
in conversation. Behavior Research Methods, 45(2), 329–343. https://doi.org/10.3758/
s13428-012-0249-2
Peräkylä, A., Henttonen, P., Voutilainen, L., Kahri, M., Stevanovic, M., Sams, M., & Ravaja,
N. (2015). Sharing the emotional load: Recipient affiliation calms down the storyteller.
Social Psychology Quarterly, 78(4), 301–323. https://doi.org/10.1177/0190272515611054
Rabinowitch, T.-C., Cross, I., & Burnard, P. (2013). Long-term musical group interaction
has a positive influence in empathy in children. Psychology of Music, 41(4), 484–498.
https://doi.org/10.1177/0305735612440609
Reddish, P., Bulbulia, J., & Fischer, R. (2014). Does synchrony promote generalized
prosociality? Religion, Brain and Behavior, 4(1), 3–19. https://doi.org/10.1080
/2153599X.2013.764545
Richardson, M. J., Marsh, K. L., Isenhower, R. W., Goodman, J. R. L., & Schmidt, R.
C. (2007). Rocking together: Dynamics of intentional and unintentional interpersonal
coordination. Human Movement Science, 26(6), 867–891. https://doi.org/10.1016/j
.humov.2007.07.002
Roberts, F., Francis, A. L., & Morgan, M. (2006). The interaction of inter-turn silence
with prosodic cues in listener perceptions of “trouble” in conversation. Speech
Communication, 48(9), 1079–1093. https://doi.org/10.1016/j.specom.2006.02.001
Roberts, F., Margutti, P., & Takano, S. (2011). Judgments concerning the valence of inter-
turn silence across speakers of American English, Italian, and Japanese. Discourse
Processes, 48(5), 331–354. https://doi.org/10.1080/0163853X.2011.558002
Sacks, H. (1984). Notes on methodology. In J. M. Atkinson & J. Heritage (Eds.), Structures of
social action: Studies in conversation analysis (pp. 21–27). Cambridge University Press.
Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the
organisation of turn-taking for conversation. Language, 50(4), 696–735. https://doi.org
/10.1016/B978-0-12-623550-0.50008-2
Schegloff, E. A. (1991). Reflections on talk and social structure. In D. Boden & D. H.
Zimmerman (Eds.), Talk and social structure: Studies in Ethnomethodology and
Conversation Analysis (pp. 44–70). Polity Press.
Schegloff, E. A. (1996b). Issues of relevance for discourse analysis: Contingency in
action, interaction and co-participant context. In E. H. Hovy & D. R. Scott (Eds.),
Computational and conversational discourse (pp. 3–35). Springer. https://doi.org/10
.1007/978-3-662-03293-0_1
Schegloff, E. A. (1997). Whose text? Whose context? Discourse and Society, 8(2), 165–
187. https://doi.org/10.1177/0957926597008002002
Schegloff, E. A. (2004). Experimentation or observation? On the self alone or the natural
world? Brain and Behavioral Sciences, 27(2), 271–272. https://doi.org/10.1017/
S0140525X0431006X
Schegloff, E. A. (2007). Sequence organization in interaction: A primer in conversation
analysis. Cambridge University Press.
Schegloff, E. A., Jefferson, G., & Sacks, H. (1977). The preference for self-correction in
the organization of repair in conversation. Language, 53(2), 361–382. https://doi.org/10
.1353/lan.1977.0041
Schmidt, R. C., & O’Brien, B. (1997). Evaluating the dynamics of unintended
interpersonal coordination. Ecological Psychology, 9(3), 189–206. https://doi.org/10
.1207/s15326969eco0903_2
Scott-Phillips, T. (2014). Speaking our minds: Why human communication is different, and
how language evolved to make it special. Palgrave Macmillan.
Shockley, K., Baker, A. A., Richardson, M. J., & Fowler, C. A. (2007). Articulatory
constraints on interpersonal postural coordination. Journal of Experimental
Psychology: Human Perception and Performance, 33(1), 201–208. https://doi.org/10
.1037/0096-1523.33.1.201
Shockley, K., Santana, M.-V., & Fowler, C. A. (2003). Mutual interpersonal postural
constraints are involved in cooperative conversation. Journal of Experimental
Psychology: Human Perception and Performance, 29(2), 326–332. https://doi.org/10
.1037/0096-1523.29.2.326
Spapé, M. M., Kivikangas, J. M., Järvelä, S., Kosunen, I., Jacucci, G., & Ravaja, N. (2013).
Keep your opponents close: Social context affects EEG and fEMG linkage in a turn-
based computer game. PLOS One, 8(11), e78795. https://doi.org/10.1371/journal.pone
.0078795
Speer, S. A. (2002). “Natural” and “contrived” data: A sustainable distinction? Discourse
Studies, 4(4), 511–525. https://doi.org/10.1177/14614456020040040601
Stark, R., Walter, B., Schienle, A., & Vaitl, D. (2005). Psychophysiological correlates of
disgust and disgust sensitivity. Journal of Psychophysiology, 19(1), 50–60. https://doi
.org/10.1027/0269-8803.19.1.50
Stevanovic, M. (2018). Social deontics: A nano‐level approach to human power play.
Journal for the Theory of Social Behaviour, 48(3), 369–389. https://doi.org/10.1111/jtsb
.12175
Stevanovic, M., Henttonen, P., Koski, S., Kahri, M., & Voutilainen, L. (2019a). Affiliation
and dominance in female and male dyads: When discoordination makes happy. Gender
Issues, 36(3), 201–235. https://doi.org/10.1007/s12147-018-9218-0
Stevanovic, M., Henttonen, P., Koski, S., Kahri, M., Voutilainen, L., Koskinen, E.,
Nieminen-von Wendt, T., Tani, P., & Peräkylä, A. (2017). On the Asperger experience
of interaction: Interpersonal dynamics in dyadic conversations. Journal of Autism, 4(2).
http://doi.org /10.7243/2054 -992X- 4-2
Stevanovic, M., Henttonen, P., Koskinen, E., Peräkylä, A., Nieminen-von Wendt, T.,
Sihvola, E., Tani, P., Ravaja, N., & Sams, M. (2019b). Physiological responses to
affiliation during conversation: Comparing neurotypical males and males with
Asperger syndrome. PLOS One, 14(9), e0222084. https://doi.org/10.1371/journal.pone
.0222084
Stevanovic, M., & Himberg, T. (2021). Movement synchrony as a topic of empirical social
interaction research. In J. Lindström, R. Laury, A. Peräkylä, & M.-L. Sorjonen (Eds.),
Intersubjectivity in action (pp. 229–246). Benjamins.
Stevanovic, M., Himberg, T., Niinisalo, M., Kahri, M., Peräkylä, A., Sams, M., & Hari, R.
(2017). Sequentiality, mutual visibility, and behavioral matching: Body sway and pitch
register during joint decision-making. Research on Language and Social Interaction,
50(1), 33–53. https://doi.org/10.1080/08351813.2017.1262130
Stevanovic, M., & Kahri, M. (2011). Puheäänen musiikilliset piirteet ja sosiaalinen
toiminta. [Social action and the musical aspects of speech.]. Sosiologia, 48, 1–24.
Stevanovic, M., Tuhkanen, S., Järvensivu, M., Koskinen, E., Lindholm, C., Paananen,
J., Savander, E., Valkeapää, T., & Valkia, K. (2022). Making food decisions together:
Physiological and affective underpinnings of relinquishing preferences and reaching
decisions. SAGE Open. https://doi.org/10.1177/21582440221078010
Stevanovic, M., Tuhkanen, S., Järvensivu, M., Koskinen, E., Savander, E., & Valkia,
K. (2021). Physiological responses to proposals during dyadic decision-making
conversations. PLOS One, 16(1), e0244929. https://doi.org/10.1371/journal.pone.0244929
Stevanovic, M., Valkeapää, T., Weiste, E., & Lindholm, C. (2020). Joint decision making
in a mental health rehabilitation community: The impact of support workers’ proposal
design on client responsiveness. Counselling Psychology Quarterly. https://doi.org/10
.1080/09515070.2020.1762166
Stivers, T. (2015). Coding social interaction: A heretical approach in conversation analysis?
Research on Language and Social Interaction, 48(1), 1–19. https://doi.org/10.1080
/08351813.2015.993837
Svennevig, J., & Skovholt, K. (2005). The methodology of conversation analysis –
Positivism or social constructivism? Paper presented at the 9th International Pragmatics
Conference, July 10–15, 2005.
ten Have, P. (2007 [1999]). Doing conversation analysis: A practical guide. Sage.
Torreira, F., Bögels, S., & Levinson, S. C. (2015). Breathing for answering: The time
course of response planning in conversation. Frontiers in Psychology, 6, 284. https://
doi.org/10.3389/fpsyg.2015.00284
Valdesolo, P., & DeSteno, D. (2011). Synchrony and the social tuning of compassion.
Emotion, 11(2), 262–266. https://doi.org/10.1037/a0021302
Voutilainen, L., Henttonen, P., Kahri, M., Ravaja, N., Sams, M., & Peräkylä, A. (2014).
Affective stance, ambivalence, and psychophysiology. Journal of Pragmatics, 68, 1–24.
https://doi.org/10.1016/j.pragma.2014.04.006
Voutilainen, L., Henttonen, P., Kahri, M., Ravaja, N., Sams, M., & Peräkylä, A. (2018a).
Empathy, challenge, and psychophysiological activation in therapist–client interaction.
Frontiers in Psychology, 9, 530. https://doi.org/10.3389/fpsyg.2018.00530
Voutilainen, L., Henttonen, P., Stevanovic, M., Kahri, M., & Peräkylä, A. (2018b). Nods,
vocal continuers, and the perception of empathy in storytelling. Discourse Processes,
56(4), 310–330. https://doi.org/10.1080/0163853X.2018.1498670
Watson, R. (2000). The character of institutional talk: A response to Hester and Francis.
Text and Talk, 20(3), 377–389. https://doi.org/10.1515/text.1.2000.20.3.377
Wetherell, M. (1998). Positioning and interpretative repertoires: Conversation analysis and
post-structuralism in dialogue. Discourse and Society, 9(3), 387–412. https://doi.org/10
.1177/0957926598009003005
Wiggins, S. (2017). Discursive psychology: Theory, method, and applications. Sage.
Wiltermuth, S. S., & Heath, C. (2009). Synchrony and cooperation. Psychological Science,
20(1), 1–5. https://doi.org/10.1111/j.1467-9280.2008.02253.x
Wowk, M. T. (2007). Kitzinger’s feminist conversation analysis: Critical observations.
Human Studies, 30(2), 131–155. https://doi.org/10.1007/s10746-007-9051-z
Zivotofsky, A. Z., & Hausdorff, J. M. (2007). The sensory feedback mechanisms enabling
couples to walk synchronously: An initial investigation. Journal of NeuroEngineering
and Rehabilitation, 4, 28. https://doi.org/10.1186/1743-0003-4-28
PART 4
Enhancing transparency
of analytical processes
11
BEYOND VIDEO
Using practice-based VolCap analysis to
understand analytical practices volumetrically
Introduction
In this chapter, we propose a general approach we call practice-based volumetric
capture analysis (PBVCA) that uses Virtual Reality (VR) technology to better
understand and support the practices by which scholars collaborate over time to
develop an analysis of a complex event recorded with multiple cameras. What
is novel in this approach is the virtualisation of two sets of practices, namely (a)
the viewing and manipulation of mediated representations of complex time-based
audio-visual data, and (b) the ethno-scenography of analytical performances of
observation, demonstration, and analysis.1 To virtualise both, we combine a pow-
erful digital tool (AVA360VR) for immersion within a virtual spatial environment
and another tool (VolCap) to capture and re-enact live actions volumetrically.
This chapter explores the practices in which these digital tools and virtualisations
come to make sense for the participants (for us as analysts) as they work towards
an analysis of socio-interactional phenomena. We also consider whether the
“heuristic handicap” of collaboratively working with the digital tools entails a
methodological bonus, or whether these tools are mere supplements, conveniences,
or distractions, from the perspective of ethnomethodology (EM) and
ethnomethodological conversation analysis (EMCA).
DOI: 10.4324/9781003424888-15
222 Paul McIlvenny and Jacob Davidsen
might seduce one into thinking that they are, de facto, more sensitive to inter-
subjectively-based interaction than are projects based on other forms of data.
In fact, unless the conceptual approach that informs this video-based analysis
adopts a coherent alignment towards members’ in situ, in vivo, intersubjec-
tively-based practices, video-data only operates as part of the theory, corrobo-
ratively reproducing any incoherences the theory itself espouses.
software, such as 360-degree cameras or VR. Indeed, there are new ways of col-
lecting audio-visual data that require purpose-built digital tools to support immer-
sive qualitative analysis of that data. Those tools should support a richer method
of archiving the embodied business of doing analysis for the purposes of aid-
ing critical reflection in the future. AVA360VR (Annotate, Visualise, Analyse 360
Video in VR) is a Windows-based software package that we developed to support
the analysis of complex video archives consisting of synchronised 360-degree
video (see Raudaskoski, this volume), 2D video, and spatial audio recordings in
an immersive virtual environment.3 It offers a range of tools affording sophisti-
cated re-visualisations and annotations of simultaneous video and audio streams
(McIlvenny, 2018). Most importantly for this chapter, it also supports the volu-
metric capture (VolCap) of an analyst’s use of the tool for performative immersive
RePlay at a later date (McIlvenny, 2020b).4 This is a unique feature built into
AVA360VR for the express purpose of asynchronously collaborating spatially with
others. Later, we document how we used the software package AVA360VR to share
and expand our emerging analyses of social, embodied, and spatial interaction.
Below we summarise the aims of, and support for, volumetric capture with soft-
ware in VR, which AVA360VR implements within its own interactive scenography.
Volumetric capture describes a set of tools that attempt to capture the complete
volume in which physical or virtual events take place, so that they can be reconstructed
from any spectator standpoint in the scene in terms of visual and/or aural events. Besides
the implementations of some form of capture in video games (e.g., Machinima), early
scholarly approaches to capture and replay of computer-supported activity can be
found in pioneering systems such as Digital Replay System (DRS) (Crabtree et al.,
2015). VolCap in AVA360VR is our implementation of the volumetric capture of an
analyst (in our case, an EMCA researcher using the virtual tools to construct an analy-
sis) in VR, so that the event of analysis can be reconstructed and RePlayed (reacti-
vated) immersively from any position in the scene captured. This includes the voice of
the analyst, as well as the tracked head and hand movements logged by the VR headset
and controllers. When a VolCap is RePlayed at a later point, the RePlayer should
experience immersively what the original analyst (VolCapper) was doing virtually
in three dimensions with six degrees of freedom. Hence, RePlay is not a passive spectatorial
experience, such as going to the cinema or watching a video clip.5
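As a rough illustration of what a VolCap must log, the sketch below models one tracked frame: a timestamp plus six-degrees-of-freedom poses (position and orientation) for the headset and two hand controllers, with a helper for the capture's duration. All class and field names here are our own assumptions for illustration; they are not the actual AVA360VR data format.

```python
# Hypothetical sketch of per-frame tracking data a volumetric capture
# (VolCap) might log: 6-DoF poses for head and hands plus a timestamp.
# This is NOT the real AVA360VR format; names and fields are illustrative.
from dataclasses import dataclass


@dataclass
class Pose6DoF:
    """Position (metres) and orientation (unit quaternion)."""
    x: float
    y: float
    z: float
    qx: float
    qy: float
    qz: float
    qw: float


@dataclass
class VolCapFrame:
    """One logged tracking frame of a volumetric capture."""
    t: float              # seconds since capture start
    head: Pose6DoF        # headset pose
    left_hand: Pose6DoF   # controller poses
    right_hand: Pose6DoF


def duration(frames: list[VolCapFrame]) -> float:
    """Length of the capture, from first to last logged frame."""
    return frames[-1].t - frames[0].t if frames else 0.0
```

Because each frame carries a full pose rather than a rendered image, a RePlayer can reconstruct the VolCapper's conduct from any standpoint in the scene, which is what distinguishes RePlay from watching a flat video.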
Figure 11.1 shows the complex relationship between the embedded layers of PBVCA.
The left side indicates the relationship between the original event, the
recording of the event, and the VolCap of the ongoing analysis of the event via the
recording (step 1). On the right, that VolCap becomes fodder for the meta-analyst
to reactivate and re-enact the analysis-at-hand (steps 2–3). Step 4 involves reflect-
ing from the perspective of the right side as a “heuristic handicap” to uncover the
“missing whatness” of the ongoing analysis on the left side. Indeed, each VolCap
is “another next first time” with the data, which is also evident in any reactiva-
tion of the VolCap.6 It is in the re-enactments of the analysis in the RePlays of
VolCaps (steps 2–3) that the “missing whatness” is recoverable, not in the VolCaps
FIGURE 11.1
Diagram showing four steps of PBVCA
themselves. Rather than just watching the recorded analysis, the meta-analyst can
re-enact the analysis and pose alternative analyses as an embodied experience.
Moreover, with two analysts sharing VolCaps, PBVCA becomes collaborative
and incremental.
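The embedded layers in Figure 11.1 can be caricatured as a provenance chain: each artefact documents the one before it, so a RePlay is a re-enactment of an analysis (a VolCap) of a recording of an event. The sketch below is our own simplification for illustration, not software described in the chapter.

```python
# Our own caricature of the PBVCA layering in Figure 11.1:
# Event -> Recording(Event) -> VolCap(analysis of Recording)
#       -> RePlay(re-enactment of VolCap).
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    kind: str                  # "event", "recording", "volcap", "replay"
    source: Optional["Layer"]  # the layer this one documents or re-enacts


def provenance(layer: Layer) -> list[str]:
    """Trace a layer back to the original event."""
    chain = []
    current: Optional[Layer] = layer
    while current is not None:
        chain.append(current.kind)
        current = current.source
    return chain


event = Layer("event", None)
recording = Layer("recording", event)
volcap = Layer("volcap", recording)   # step 1 in Figure 11.1
replay = Layer("replay", volcap)      # steps 2-3 in Figure 11.1
print(provenance(replay))  # ['replay', 'volcap', 'recording', 'event']
```

The chain makes plain why each RePlay is "another next first time": every layer adds a new occasion of analysis while remaining accountable to the layers beneath it.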
Data
In this section, we give some information on our video data collected in a specific
setting, as well as our recording practices to document the event in question. We
also outline how we captured and archived VolCaps and RePlays of researchers
analysing this video data using AVA360VR.8
Arch1 data
The Arch1 data stems from a large archive of recordings following architecture
and design students as they make physical models of their designs. It was recorded
with three different 360-degree cameras and two GoPro 2D cameras positioned
statically at different locations in the workshop area (see Figure 11.2).
FIGURE 11.2
Aerial plan of the participants in Arch1
The student group consists of six members, who are working in a distributed
manner on different smaller parts of their shared presentation. They are working
together in clusters (e.g., EM/DA, BS/MT and UN/KT) to put together the physical
materials into three different versions of their design. In Figure 11.2, an aerial view
shows the location of the group members, their gaze direction, and the movements
of KT, as well as the positions of the relevant cameras. In Figure 11.3, the talk and
embodied actions of the main participants in Arch1, which feature predominantly
in all our VolCaps, are represented much simplified in a comic panel sequence (see
Laurier & Back, this volume). The panels contain images from the anonymised
original video recordings of the event (e.g., shots from 360-1, 360-2, and GoPro-2).
In the nine-panel comic transcript, the key participants and actions are shown.
In panels 1–3, we see EM and DA engage with UN about their newly discovered
problem, while KT works at the other side of the room. In panel 4, UN responds
to the report of the problem, after which KT turns around and responds verbally in
panel 5. In panel 6, UN turns to look at KT while BS turns to glance at KT in panel
7. In panel 8, KT starts to move towards UN and arrives next to UN in panel 9.
The phenomenon that is the locus of the researcher’s analytical attention with
respect to the original event described above is what we might call a “huddle”.
FIGURE 11.3
Comic panel sequence for short excerpt from Arch1
VolCaps as data
For us as qualitative analysts, VolCapping has become an important tool to demon-
strate an observation or an analysis in the act of doing an observation or analysis.
We do that analysis using the range of tools offered by AVA360VR within the 3D
space that the original recorded data inhabits spatially. However, VolCaps
are data of a different order than legacy video recordings, such as Arch1. A VolCap
FIGURE 11.4
A rendering of a VolCap RePlay scene in VR
The RePlayer is currently positioned behind the VolCapper but can teleport to any
position on a horizontal plane. The RePlayer can record the view from a selection of
virtual cameras simultaneously, such as the VolCapper’s and the RePlayer’s view-
ports (see the figures in Example 2 that show the two viewports side-by-side).9
Some readers may ask why we did not just use a standard video camera to
record the EMCA analysts together in the same room analysing the event recorded
above on a 2D display. That would have been easier, but it would have missed the
volumetric phenomena that we analyse below. Because we capture volumetrically
the work of analysing video recordings in VR, we can repeatedly revisit those prac-
tices volumetrically. Alternatively, we would have loved to capture volumetrically
co-present analysts at work, but this is not yet technically feasible in natural set-
tings. We did record ourselves using a live collaborative prototype of AVA360VR
that allowed us to annotate, visualise, and analyse the data in VR simultaneously,
but it is not yet possible to do a VolCap of that scenario for later analysis in RePlay.
Note that we are not claiming that the analysis using VolCaps in VR is equivalent to
(or a proxy for) co-present analytical work; both are legitimate analytical practices.
With several cases of VolCap and one case of RePlay, we illustrate some
aspects of our analytical “workflow” as we pursue an analysis of how “huddles”
are socially accomplished in our Arch1 data.
Notes on transcripts
In this chapter, we use a specific set of multimodal transcription conventions.10
Our modifications include an action sub-tier type naming system for explicitness,
as well as a transcoding of numbered time intervals into strings made up of sym-
bols (each representing 0.1 second, e.g. ◘◘◘◘ = 0.4) to better show visually the
passing of time in a manner that is comparable to the graphemic representation of
the temporal stream of speech.11 The shaded boxes in the transcripts indicate the
talk and actions in the video recording of the students at work (Arch1), whereas
the non-shaded sections are related to the talk and actions of the VolCapper and
RePlayer in VR.12
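The transcoding of numbered time intervals into symbol strings described above can be sketched as a small helper, where each ◘ glyph stands for one 0.1-second unit (so 0.4 becomes ◘◘◘◘). The function name and the rounding policy are our own assumptions for illustration; only the glyph and the 0.1-second unit come from the conventions described in the text.

```python
# Sketch of the interval-to-symbols transcoding described above:
# each symbol stands for 0.1 second, so (0.4) becomes ◘◘◘◘.
# The function name and rounding policy are illustrative assumptions.

def interval_to_symbols(seconds: float, symbol: str = "\u25d8") -> str:
    """Render a pause/interval duration as a run of symbols,
    one per 0.1-second unit (e.g. 0.4 -> '◘◘◘◘')."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    units = round(seconds * 10)  # number of 0.1 s units
    return symbol * units


print(interval_to_symbols(0.4))  # ◘◘◘◘
```

Rendering time as a run of fixed-width glyphs makes the length of a silence visually comparable to the graphemic length of the surrounding speech, which is the stated aim of the convention.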
Analysis
The analysis that follows focuses on a practice-based VolCap analysis (PBVCA)
of our own collaborative practices of assembling tangible observations and proto-
analyses of complex data. This involves inspecting the digital traces of our work-
ing practices in the acts of documenting, archiving, observing, and analysing that
VolCapping facilitates (mainly the right side of Figure 11.1).
Analysis of Arch1-RePlay1
The first case we explore is a specific segment (12:45–15:53) of a rendering of
a RePlay by Paul (P) of a VolCap created by Jacob (J) in May 2021. The source
VolCap that is RePlayed is identical to Arch1-VolCap5, which is analysed later in this
chapter, though the segment of the VolCap that is RePlayed here is 8:04–9:16. J is
observing the scene of the recorded event from the perspective of the 360-degree
camera (360-2) in the middle of the room (see Figure 11.2). It is possible for J to
see KT, UN, and BS, but a large foam block on the table in the middle of the room
is obstructing J’s view of EM, DA, and MT.
In Example 1, we see how J provides an instructed viewing and hearing for a
future RePlayer (the RePlayer in this case is P).
VolCappers are continually faced with giving accounts for their past, present,
and future actions and observations in relation to the video data that is presented
scenographically and temporally in AVA360VR. In this case, J has already started
playing the 360-degree video and gives a running commentary that anticipates
what KT will do and say next. He is orienting to what a RePlayer must attend to in
order to observe the same phenomenal details of an unfolding action – an instructed
viewing for a future RePlayer. In a delicate dance with the recalcitrant video play-
back, J says “when she is orienting orienting towards the group and she says is it
annoying” (lines 8, 11, 14 and 17). However, in the viewable playback so far, KT
has neither turned to orient to the group nor spoken. Therefore, we can see and hear
that J’s embodied talk and action is prospectively oriented to a known, but not yet
shared, seeable and hearable whose occurrence and course of action are unfold-
ing. During this example, P shifts gaze to KT (line 8) and then back to J (line 18).
Often when RePlaying a VolCap, it becomes difficult to follow the pace of
the VolCapper’s reorientations – such as pointing, glancing, and checking – in
360-degrees. This is a practical problem facing a RePlayer who is trying to follow
a VolCapper, which is exacerbated by spatial positioning. Hence, with the PBVCA
method, we can enquire into how the immersive practices of using AVA360VR
and VolCapping reshape how the RePlayer comes to see a spatial referent (in the
360-degree video) in common with the VolCapper. Moreover, we can experience
this phenomenon for ourselves by RePlaying the VolCap, an embodied experience
of a practice that is not available with a video recording of a RePlaying of a VolCap,
nor in a video recording of someone trying to follow an analysis by someone else.
In the examples above, and below, both the VolCapper and RePlayer accomplish
the scenic and sonic intelligibility of action in 360-degrees volumetrically.
In Example 2, we see a more complex case of the practical work of following
an instructed viewing.
Example 2 – Following BS or the mirror cam.
In the course of his instructed viewing, J has primed the seeability of some
phenomenal detail which is as yet unspecified: “we can see” (line 40). Just prior,
J adjusted the mirror cam and then repositioned the window slightly to the left
so that KT is no longer occluded from his point of view (see J’s viewport in the
left frame of #fig1). Unbeknownst to J, who made the VolCap in the past relative
to P, this causes an occlusion problem in the RePlay for P now. The reposition-
ing of the mirror cam by J results in P no longer being able to see KT on the
360-degree video (see P’s viewport in the right frame of #fig1). As P glances
down (to his controller?) in line 41, J restarts playback of the 360-degree video
from his vantage point. P then teleports to a position almost directly behind J to
see from the same angle as J sees the scene unfolding with KT visible again (see
P’s viewport in the right frame of #fig2). P has achieved a co-viewing (in lines
42–43) of what transpires to be a demonstration by J using the mirror cam of
BS’s non-reciprocated glance at KT (see gaze shifts by BS, UN and KT on line
42) after her question.
Analysis of Arch1-VolCaps
With the help of four interconnected VolCaps (Arch1-VolCap2 to VolCap5), the
authors (P and J) developed the initial analysis of how KT comes to notice, and
announce her noticing of, trouble resulting from the public work of EM and
DA (see Figure 11.3). To reach this stage of proto-analysis, P and J iteratively
and collaboratively developed a set of observations of the event by reactivating
each other’s VolCaps. Each VolCap is a new capture by J or P created after
RePlaying prior VolCaps.
Arch1-VolCap2
In Example 4 from Arch1-VolCap2, while creating the VolCap in AVA360VR, J is
observing the scene of the recorded event from the location of the 360-1 camera
on the table near the cluster EM and DA (see Figure 11.2).
From this camera position, it is possible for J to see everyone except BS because
a large foam block on the table in the middle of the room is obstructing the camera’s
view. J focuses on the troubles identified by the cluster BS and MT and suggests
that KT is also searching for what their trouble might be (lines 11–12). In earlier
research, not in this chapter, we primarily analysed how the cluster EM and DA
tried to solve their problem, but we did not focus on the cluster BS and MT. Both
clusters at the table are experiencing troubles with their task at hand, and they are
hearably stating that they are experiencing troubles. In this VolCap, J plays the “uh
oh” sequence and provides a running commentary. Just after KT finishes her turn
(line 3), J pauses the video to mark this sequence as an observable. Then J plays
a little more while commenting on what KT is doing (lines 6–7). There is a sense
of surprise (“oh”) in the way that J is commenting on the interaction (e.g., “she’s
actually putting something back on the table oh and then” on lines 6–7) to do with
a noticing in the video as it was replaying. J is scanning for visible evidence of a
phenomenon in the playback that the VolCap documents. J continues playback and
observes that KT is walking to the main table (line 10). J pauses the video, and
makes a claim that KT is at first concerned with what the cluster BS and EM are
saying and then shifts to another focus while standing in the middle of the room
(lines 11–13). This claim in J’s VolCap2 is what P’s argument in VolCap3 below
focuses on (made after P has RePlayed J’s VolCap2).
Arch1-VolCap3
In Example 5, P makes a new VolCap to provide a counter-observation to J’s claim
in VolCap2 above using the different 360-degree views afforded by the locations
of the 360-1 and 360-2 cameras that recorded the Arch1 event.
Arch1-VolCap5
In a new VolCap4, not analysed here, J begins to integrate the counter-observation
made by P in VolCap3. The final VolCap5 is also by J, and it completes the col-
laborative proto-analysis. In Example 6, the commentary by J is spatially condi-
tioned by the different views of the event afforded by the 360-degree cameras (see
Figure 11.2). At an earlier point in the VolCap, J transitioned from the 360-2 to the
360-1 camera view.
When J says “we will just change position” (line 1) and transitions from the
360-1 to the 360-2 camera view, he orients to a future RePlayer who will also be
automatically transported to the viewpoint of the new camera in their RePlay of
this VolCap. Moreover, after the transition, he says “now we are back where we
started” (line 3), orienting to the scenic and scenographic intelligibility of a visual
sequence that is spatially located in the same scene (“back there”), yet at the cur-
rent frame in the timeline (not “back then”). For J, the paused 360-degree video
(line 3) provides sufficient resources for an observation to be partially formulated
about the anticipated movement and interaction that is about to be re-activated
shortly when playback resumes. The hesitation by J before completing the obser-
vation of the direction of movement indicates the practical ambiguity of the static
video. It might happen now or later; it might be that she steps back or forward (at
some point she does both). The precise timing and direction of the movement are
only seeable once the video is playing. After playback resumes (line 6), the move-
ment of KT is punctuated by J’s syntactic addition of the complement, namely
(walking) “into the room”. Playback continues and the “uh oh” sequence is seen
and heard in full. After the video is paused at line 8, J re-enacts the verbal interac-
tion, prefaced by “and then she’s saying” (line 10) in reference to an earlier ambig-
uous English translation in the already open transcript window. With VolCaps in
AVA360VR, the source video(s) can be played and reactivated repeatedly to make
bodily actions visible for a RePlayer. In this case, it is used by J to re-enact a play-
through and instructed viewing that aligns for another next first time with the
ongoing proto-analysis from VolCap2 to VolCap4.
Conclusion
The chapter has documented the emerging analyses of a video-recorded event
within practices that use the functionality of AVA360VR, specifically volumetric
capture (VolCap) and RePlay. Using practice-based VolCap analysis (a version of
PBVCA), we have discovered some of the missing phenomenal details of “bodies
of practices” of “doing analysis for another member” by attempting to follow “the
analysis” volumetrically in and as the artful production of a spatial field unfolding
in RePlay. We documented the crucial role of the body and space in resolving for-
all-practical-purposes the indexicality of instructed viewing, such as the practices
of accomplishing co-viewing in 360-degrees, the instructed viewing of apposite
alternative perspectives on the 360-degree video, and spatial repositioning to
achieve co-viewing. We have also discovered some of the missing phenomena
of practices of “doing analysis” by re-enacting “the analysis” by another mem-
ber in and through reactivating it in RePlay, such as scanning for a phenomenon,
zooming in on bodies of practices and bodily actions, and achieving the scenic
and scenographic intelligibility of a visual sequence. This can only be undertaken
volumetrically by reactivating and re-enacting an earlier VolCapped analysis or
by viewing the RePlay and proactively redoing the analysis and creating alterna-
tive analyses by VolCap. We argue that VolCap – and the analytical possibili-
ties afforded by taking a “scenographic turn” to video analysis (McIlvenny &
Davidsen, 2017) – expands the range of abductive–inductive analytical potentials
that lie between collecting data and finalising a robust analysis. We have dem-
onstrated how the combination of VolCap and RePlay is negotiated as an ana-
lyst’s resource for the performance of enhanced visual and spatial argumentation,
accountable in terms of a praxeology of evidential adequacy and critical reflexiv-
ity. As an alternative to traditional EMCA data sessions, it leads to a complemen-
tary mode of performing, engaging, sharing, collaborating, and archiving with
respect to mixed video data recorded at sites of social conduct.
This chapter has gone some way to determining the contingencies of the pro-
duction of a VolCap that have to be erased to enable the VolCap to act as the
“data” (with steps towards its naturalisation). We have also learnt from docu-
menting how we come to share viewings, seeings, and hearings of audio-vis-
ual “data” within specific interactional scenographic practices that we need to
reflect more on the relationships between (i) the original Event, (ii) the recordings
of the Event, (iii) replaying audio-visual recordings of the Event, (iv) RePlaying
VolCaps of the Event of analysis of the recordings of the original Event, and
(v) the reflexive practices of analysis itself (see Figure 11.1). This is especially
true if we are aiming to demonstrate a specific enhanced practice of working
with complex data using AVA360VR, which is designed to support an alternative
infrastructure for practice-based qualitative research beyond video (McIlvenny,
2020a).
In relation to the praxeology of VolCap reactivation, we can see interesting
correspondences between Garfinkel’s articulation of “another next first time”
(Garfinkel, 2002) and (a misreading or at least an oversimplification of) Derrida’s
conceptual discussion of “signature event context” (Derrida, 1988). For Derrida,
for a signature as an act to be authentic, for instance, it must, of necessity, be
both repeatable and perceptually different on every event of its reduplication;
without this iterability of difference, it cannot function as a signature.
Derrida writes: “In order to function, that is, to be readable, a signature must
have a repeatable, iterable, imitable form; it must be able to be detached from the
present and singular intention of its production” (Derrida, 1988, p. 20). Likewise,
for Garfinkel every action is both undertaken as if for a first time, but inevitably it
must be recognisable from earlier occasions as a similar or same action. Replays
of a VolCap are reactivations that are recognisably of the same event as earlier
reactivations, and yet they are understood in practice as different or unique for the
specific project under way. While each next RePlay may have unique properties,
the “same” VolCap event must continually reproduce recognisably the “same”
properties. In this chapter, we hope to have given some insight into how VolCaps
as “signatures” are in practice accomplished each next first time.
Notes
1 By virtualisation we mean that each and every activation of a video record or a volu-
metric capture is an abstract distillation of the event captured. With complex video
data (and VolCaps), each virtualisation is more and more obviously never identical.
2 In performance studies, Auslander (2009: 85) suggests that “each reactivation dis-
closes the original, but discloses it under different circumstances.”
3 AVA360VR was officially released on 20th May 2021 by the BigSoftVideo team after
several years of development and beta-testing. The software is free to use and is avail-
able to download from GitHub: github.com/BigSoftVideo/AVA360VR/.
4 We use the terms VolCap and RePlay to distinguish the specific implementation of
volumetric capture and replay in AVA360VR.
References
Antaki, C., Biazzi, M., Nissen, A., & Wagner, J. (2008). Accounting for moral judgments
in academic talk: The case of a conversation analysis data session. Text and Talk, 28(1),
1–30. https://doi.org/10.1515/text.2008.001
Ashmore, M., & Reed, D. (2000). Innocence and nostalgia in conversation analysis: The
dynamic relations of tape and transcript. Forum: Qualitative Social Research, 1(3).
https://doi.org/10.17169/FQS-1.3.1020
Auslander, P. (2009). Reactivation: Performance, mediatization and the present moment.
In M. Chatzichristodoulou, J. Jefferies, & R. Zerihan (Eds.), Interfaces of performance
(pp. 81–93). Ashgate.
Crabtree, A., Tennent, P., Brundell, P., & Knight, D. (2015). Digital records and the digital
replay system. In P. Halfpenny & R. Proctor (Eds.), Innovations in digital research
methods (pp. 193–220). Sage.
Derrida, J. (1988). Limited Inc. Northwestern University Press.
Eisenmann, C., & Lynch, M. (2021). Introduction to Harold Garfinkel’s ethnomethodological
‘misreading’ of Aron Gurwitsch on the phenomenal field. Human Studies, 44(1), 1–17.
https://doi.org/10.1007/s10746-020-09564-1
Evans, B., & Lindwall, O. (2020). Show them or involve them? Two organizations of
embodied instruction. Research on Language and Social Interaction, 53(2), 223–246.
https://doi.org/10.1080/08351813.2020.1741290
Garfinkel, H. (2002). Ethnomethodology’s program: Working out Durkheim’s aphorism.
Rowman & Littlefield Publishers.
Goffman, E. (1963). Behavior in public places. Free Press.
Heinemann, T., & Möller, R. L. (2016). The virtual accomplishment of knitting: How
novice knitters follow instructions when using a video tutorial. Learning, Culture and
Social Interaction, 8, 25–47. https://doi.org/10.1016/j.lcsi.2015.11.001
Johansson, E., Lindwall, O., & Rystedt, H. (2017). Experiences, appearances, and
interprofessional training: The instructional use of video in post-simulation debriefings.
International Journal of Computer-Supported Collaborative Learning, 12(1), 91–112.
https://doi.org/10.1007/s11412-017-9252-z
Katila, J., & Raudaskoski, S. (2020). Interaction analysis as an embodied and interactive
process: Multimodal, co-operative, and intercorporeal ways of seeing video data as
complementary professional visions. Human Studies, 43(3), 445–470. https://doi.org/10.1007/s10746-020-09553-4
Kovács, A. B., & McIlvenny, P. (2020). BreachingVR. QuiViRR: Qualitative video research
reports (Vol. 1). QuiViRR. https://doi.org/10.5278/ojs.quivirr.v1.2020.a0002
Macbeth, D. (1999). Glances, trances, and their relevance for a visual sociology. In P.
L. Jalbert (Ed.), Media studies: Ethnomethodological approaches (pp. 135–170).
University Press of America.
McIlvenny, P. (2018). Inhabiting spatial video and audio data: Towards a scenographic turn
in the analysis of social interaction. Social Interaction: Video-Based Studies of Human
Sociality, 2(1). https://doi.org/10.7146/si.v2i1.110409
McIlvenny, P. (2020a). New technology and tools to enhance collaborative video analysis
in live ‘data sessions’. QuiViRR: Qualitative Video Research Reports, 1, a0001. https://doi.org/10.5278/ojs.quivirr.v1.2020.a0001
McIlvenny, P. (2020b). The future of ‘video’ in video-based qualitative research is
not ‘dumb’ flat pixels! Exploring volumetric performance capture and immersive
performative replay. Qualitative Research, 20(6), 800–818. https://doi.org/10.1177/1468794120905460
McIlvenny, P., & Davidsen, J. (2017). A big video manifesto: Re-sensing video and audio.
Nordicom Information, 39(2), 15–21. https://www.nordicom.gu.se/sites/default/files/
kapitel-pdf/mcilvenny_davidsen.pdf
Mondada, L. (2019). Practices for showing, looking, and videorecording: The interactional
establishment of a common focus of attention. In E. Reber & C. Gerhardt (Eds.),
Embodied activities in face-to-face and mediated settings (pp. 63–104). Springer.
Rawls, A. W. (2002). Editor’s introduction. In H. Garfinkel & A. W. Rawls (Eds.),
Ethnomethodology’s program: Working out Durkheim’s aphorism (pp. 1–64). Rowman
& Littlefield Publishers.
Smith, R. (2020). Seeing the trouble: A mountain rescue training scenario in its
circumstantial and situated detail in three frames. Ethnographic Studies, 17, 41–59.
https://doi.org/10.5281/ZENODO.4050536
Sormani, P. (2014). Respecifying lab ethnography: An ethnomethodological study of
experimental physics. Routledge.
Sormani, P. (2016). Practice-based video analysis: A position statement.
SocietàMutamentoPolitica, 7(14), 103–120. https://doi.org/10.13128/SMP-19698
Tuncer, S., Lindwall, O., & Brown, B. (2020). Making time: Pausing to coordinate video
instructions and practical tasks. Symbolic Interaction, 44(3), 603–631. https://doi.org/10.1002/symb.516
Tutt, D., & Hindmarsh, J. (2011). Reenactments at work: Demonstrating conduct in data
sessions. Research on Language and Social Interaction, 44(3), 211–236. https://doi.org/10.1080/08351813.2011.591765
Watson, R. (1999). Driving in forests and mountains: A pure and applied ethnography.
Ethnographic Studies, 3, 50–60.
12
RECURRENT PROBLEMS AND RECENT
EXPERIMENTS IN TRANSCRIBING VIDEO
Live transcribing in data sessions
and depicting perspective
DOI: 10.4324/9781003424888-16
246 Eric Laurier and Tobias Boelt Back
standard characters, text-based format that she became better known for. In the
digital realm, the current generation of EMCA researchers have used screen grabs
as their way of tracing embodied action from video screens.
In this chapter, we will return to Jefferson’s warrants for changing existing
transcription formats and reflect on the abiding problems of transcribing in EMCA
studies. We will compare the Jefferson format with the graphic transcript and
present our experiments with using the graphic transcript for transcribing “live”
during data sessions, as well as for documenting social phenomena in final pub-
lications. The graphic transcript foregrounds other phenomena and in different
ways than the Jeffersonian system, particularly visually and spatially available
phenomena of order: bodily actions, objects, movements, environmental features,
etc. As Jefferson did, we will raise the problems that we have faced in transcribing
while also working through cases of phenomena that show how and why we shape
and reshape the forms of our transcripts.
01 ^(0.6)^ (2.6)%
dri: ^inspects instruments^
pas: >> brushing trousers -----------%
03 (0.3)*(0.7)+ (0.6)*(0.5)+
dri: +gz rsm------+
pas: *gz F window-* *gz PAS window----
> (5.8)
04 (0.9)+ (1.8) +
dri: +gz instruments+
to render the relative timings of actions so that we can recover just when actions
start, their duration, and their temporal trajectory.
However, to only focus on Mondada’s supplement to the existing Jeffersonian
format would lead us to miss that, like Jefferson, she has led EMCA’s experi-
ments and creativity in transcribing outside of that format. For example, she
hosted a workshop in Rome in 2017 bringing together artists, graphic design-
ers, and EMCA researchers, including Eric, to experiment with distinct diagram-
matic, graphic, and other ways of transcribing action that extended beyond ASCII
text and figures. In her work, she continues to experiment with alternatives that
would help us register phenomena in other ways than our conventional text-based
transcripts.
Alongside bodily actions, the Mondada orthography deals with what we might
broadly call the material environment or, as Mondada sometimes calls it, the local
ecology of objects, technologies, architecture, vegetation, etc. Accompanying
the addition of the local ecology is a further choice about what features to ren-
der, which “are potentially infinite because they exceed conventional forms and
include situated, ad hoc resources that depend on the type of activity and its spe-
cific ecology” (Mondada, 2018, p. 95). There is always a tension between the infi-
nite elements, the plenum as Garfinkel might put it, the absences of the recording,
and the desire to transcribe what the recording has preserved in a “similar, coher-
ent and robust way that is essential for systematic analyses” (p. 95).
Our second reflection concerns how the speaker labels in a transcript formulate categories that may precede or diverge from the very activities transcribed. For example, for events happening in a vehicle
we label its occupants in the transcript unchangingly even though, as the actions
unfold, various pairings will become relevant: adult + child, aunt + nephew,
driver + passenger, driver + navigator, etc. The salience of category identifica-
tion is particularly marked in workplace settings, yet, of course, applies in other
settings as well. As Watson puts it, these speaker formulations are “categorical-
incumbency-as-transcribed” and “[t]he very transcription procedures for such
occasions of speech exchange indicate a background reliance on the provision
of membership categories” (1997, pp. 51–52). As such, the speaker’s categori-
sation as e.g., “TV producer” precedes the transcribed action and thereby the
transcript steers its readers to “hear” each turn-of-talk as tied to the category in
the identification.
Our third reflection builds on Watson’s categorisation problems, but in relation
to transcribing non-verbal action, given that each non-verbal activity requires a
formulation in order to be transcribed. Again, the formulation steers the reader toward what kind of action they
should understand the person, or other agent, as doing. For example, in Transcript
1, Eric transcribed the driver’s action as “inspecting instruments” rather than
“glances at radio” or “looks down”, etc. Equally, in line 2, the use of “gz” (an
abbreviated “gaze”) selects amongst the possible practices of looking. It occurs to
us that we have come to use “gz” commonly in transcribing because it serves as a
placeholder term for practices of looking, which leaves what kind of looking it is
to be specified in the paired text. Trying to use more neutral or generic identifica-
tions of non-verbal actions does not solve the problem. Inevitably they lose the
local recognisability and availability of the participants’ ongoing production of
their actions with their eyes, heads, hands etc.
Our fourth reflection on non-verbal action is around the granularity we use in
describing embodied actions in transcripts where “gz” is itself a case in point. In
their study of instructional demonstrations, Lindwall and Lymer (in press) build
on Schegloff’s (2000) analysis of the granularity of descriptions as a member’s
concern and resource. According to Schegloff, “[k]nowing how granularity works
matters then not just substantively, but methodologically” (p. 719) because EMCA
descriptions of practices stand in contrast to other social sciences’ representations,
in their greater attentiveness to identifying locally relevant details. Schegloff’s
point is that what details to include and when to be more or less detailed, should be
tied to the level of detail being used by participants. Lindwall and Lymer contrast
videos of instructions by medical professionals for students with YouTube instruc-
tional videos for a general audience. Via that comparison, they show that stand-
ardising the level of detail in the transcriber’s descriptions misses the divergent
levels of detail that the instructed actions provide and that characterise them. In short,
transcripts that have built on Jefferson’s pioneering work are not without problems
that are internal and of abiding interest to EMCA. They should not be seen as the
endpoint for our efforts to depict practices, document action, and do justice to the
voices of others.
Problems & experiments transcribing video 251
Phonetic transcripts are not accessible to most readers. And the sort of “comic
book” orthography I use (e.g., for “What are you doing?”, “Wutche doin?”) is
considered objectionable in that it makes the speakers look “stupid”; it seems to
caricature them rather than illuminate features of their talk.
(Jefferson, 1983, p. 3)
Jefferson’s (1985) response was to demonstrate that such particulars, like the
details of laughter particles, are deeply consequential for what is meant and
done by participants in interaction. In publications, the seriousness of what the
graphic transcript is showing is similarly vouched for in the accompanying anal-
ysis, which reshapes first impressions of it as trivialising human affairs. Though,
as we noted earlier, the association between comics and the inconsequential, like
readers’ expectations and conventions for reading them, varies across cultures
of reading. At this point, it might still seem as if comic strips will be more
fun to read, but the graphic transcript is by no means as captivating and vivid as
classic comics. As with any type of transcribing, there is a meeting between the
need for under-appreciated detail and the potential lack of imagination in depict-
ing that detail. Part of our criteria in producing graphic transcripts, keeping in
mind Bogen’s complaint, is to show rather than obscure what is happening ahead
of returning to it, as a members’ accomplishment, in our analysis. There is a famil-
iar grammar of comic strips for the EMCA scholar to draw on and adapt (see
McCloud (1993) for an exemplary and fun guide). It is more than a visual gram-
mar; it is what Grennan (2017) calls a lexicogrammar, which has distinct ways of
representing speech, timing, time, actions, motion, etc., yet one that is familiar
from not just its use in comic strips but also the many advertisements, instruction
manuals, etc. that share its grammar.
Perhaps the most central problem of graphic transcripts is the loss of a singu-
lar sense of, and measure of, the relative timing of actions. As mentioned above,
a unitary measure of timing is implicit and explicit in the Jeffersonian format.
The length of the line of ASCII text provides a measure of its timing, with vari-
ations on that measure marked in slower or faster delivery (e.g., ‘<slower>’ and
‘>faster<’). There are ways to try and establish a graphic transcript with a compa-
rable measure of timing, for example, you can add time markers explicitly in cap-
tions (see further experiments in showing time measures in Back, 2020; Laurier,
2019). In reflecting on the problem of showing a measure of timing, it is not that
a sense of timing is missing from the comic strip form. The temporality of, and
across, panels has always been multiple because they have the timing associated
with the actions depicted in the image, the duration added by speech bubbles or
other sounds, and the caption box’s use in both timing of the panel as well as plac-
ing it in the past or future (see Laurier, 2014a, 2019). The lack of a single measure
is also a possibility. Multiple temporalities are produced, drawn upon, recognised,
etc. by members in and as part of multiple courses of action. Those multiple tim-
ings are a challenge in transcribing that Mondada (2018) also responds to.
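One way to supply such an explicit measure, sketched here as a hypothetical helper rather than anything the authors describe using, is to compute caption time markers mechanically from the frame numbers grabbed for each panel. A minimal Python sketch (the function name and default frame rate are our invention):

```python
# Illustrative sketch: derive explicit time markers for graphic-transcript
# panels from the frame numbers grabbed out of the source video.
def panel_time_markers(frame_indices, fps=25.0):
    """Return one caption marker per panel: the elapsed time of the first
    grabbed frame, then the gap (+s) to each following panel."""
    if not frame_indices:
        return []
    seconds = [i / fps for i in frame_indices]
    markers = [f"{seconds[0]:.1f}s"]
    for prev, cur in zip(seconds, seconds[1:]):
        markers.append(f"+{cur - prev:.1f}s")
    return markers

# Three panels grabbed at frames 0, 60 and 145 of a 25 fps recording:
print(panel_time_markers([0, 60, 145]))  # ['0.0s', '+2.4s', '+3.4s']
```

Markers of this kind could be placed in caption boxes, restoring a single measure of timing while leaving the panels' multiple temporalities intact.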
The problem of timing takes a further twist in the graphic transcript. To pro-
vide for their intelligibility to readers as singular images in a panel, actions are
routinely captured from video recordings at a moment in their course that makes
them most recognisable as what they are seen to be by the analyst (Jayyusi, 1993).
Actions’ trajectories are broken into stages, which is more often in relation to their
iconic recognisability than the shape of their trajectory, or they may be selected
to represent other features of analytic interest visible in their course. However,
in trying to show trajectories, they are captured at the points when they start (for
example, at their “home position”), at a mid-point, then again at their completion
(for example, a return to home position). In selecting for their representativeness
in any panel, they run into Jefferson’s original complaint over stereotyping in tran-
scription. Yet, one development of the comic strip EMCA has pursued is using
distinctive images, the equivalent of Jefferson’s nyem, to help show a thing found
in the recording that a comic strip writer would discard as ambiguous or unrec-
ognisable. The way those ambiguities of the screen capture are routinely resolved
is by captioning the image (e.g., “inspects instruments” in Transcript 2) in order
to instruct the reader on how to see specific features of the image (Lindwall &
Lymer in press). Such a caption, of course, leads us back to the categorisation and
formulation problem of embodied actions that besets transcripts, so reflection is
required on the warrants for categorisations formulated by the analyst.
To balance out what will likely sound like an overly cautious introduction,
we will now offer succour by considering our recent experiments in transcrib-
ing. Firstly, we will introduce a data-session transcribing technique that displaces
the typical pairing of a Jeffersonian or Mondada transcript with playing video
recordings. Secondly, we will describe how a graphic transcript has served us
as well as, if not better than, a textual transcript in showing phenomena related
to visual perspectives.
Although our set-up did not exclude transcribing by drawing from the video or,
in a nod to Jefferson, laying tracing paper over video screens, the fastest process was to frame-grab from the
video recording (which is also the easiest way to re-grab and replace an image).
For making the images, providing individual copies of the video was the only
practical solution we could find.1 We shared templates in Comic Life, PowerPoint,
and Keynote formats for making comic strips. Participants, once they had made
a preliminary noticing, would depict the thing which they noticed using three or
more panels.
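To give a concrete sense of what such a three-panel noticing consists of as data, here is a minimal sketch in Python (all names and captions invented for illustration; the templates we actually shared were Comic Life, PowerPoint, and Keynote documents) of panels pairing grabbed frames with captioned formulations, including the re-grab-and-replace move:

```python
from dataclasses import dataclass, replace

# Illustrative sketch of a data-session graphic transcript: each panel pairs
# a frame grabbed from the video (identified here only by its timestamp)
# with an optional crop and a captioned formulation of the action.
@dataclass(frozen=True)
class Panel:
    grabbed_at: float          # seconds into the recording
    caption: str               # the analyst's formulation of the action
    crop: str = "full frame"   # e.g. "focus on gaze", "focus on feet"

def regrab(panel: Panel, new_time: float) -> Panel:
    """Replace a panel's image by grabbing a differently timed frame,
    keeping its caption and crop (the easy re-grab-and-replace move)."""
    return replace(panel, grabbed_at=new_time)

strip = [
    Panel(12.0, "passenger gestures toward window seat"),
    Panel(13.1, "co-passenger glances at sticker", crop="focus on gaze"),
    Panel(14.6, "both sit, leaving marked seat empty"),
]
strip[1] = regrab(strip[1], 13.4)  # a better-fitting frame for panel 2
print([p.grabbed_at for p in strip])  # [12.0, 13.4, 14.6]
```

The sketch only models the selection work; the depictional work of layout, bubbles, and drawing remains in the template documents themselves.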
In one data session, we used video data of public transport during the corona-
virus pandemic. Transcript 3 was made by a participant in that session. The ses-
sion was used to explore how passengers select seats on a train as part of the joint
accomplishment of following pandemic restrictions. Every other seat on the train
had a non-seating sticker on it, serving as a resource and reminder for passengers
of distancing guidelines. The sketch filter was applied to the original video for
anonymisation purposes rather than during the transcribing in the data session.
The transcript was used by the participant to support their describing of how seat-
selecting and seat-proposing is divided between the passengers.
In the data session, each analyst usually only had time to produce one graphic
transcript. These were sketches toward analyses. Even in that short time, the
sketching routinely involved several grabs at the video: not just to find a rep-
resentative frame for each individual panel, but also to re-grab images for their
fit in building sequences across the panels, thereby creating the narrative of the
practice. Participants cropped to focus (e.g., panel 3 in Transcript 3), they broke up
actions into shorter or longer durations across and within the panels, as ways of
attending to different kinds of details and temporalities.
The participants reported that the depictional work of grabbing frames and
captioning them shifted their attention toward the embodied, visual, and spatial
aspects of the video recorded event. From the outset, in roughing out the tran-
scripts, participants were inspired to pursue more adequate renderings of the
embodied practices that they had become interested in. There was a stimulating
dissatisfaction with what they could render. At the same time, and in a limited
time, the very process of producing the three-panel renderings became an occa-
sion for the participants’ initial noticings to foster more noticings at the point of
documenting, as is so often the case in transcribing. Indeed, transcribing was
fruitful rather than mechanical. For example, when Transcript 3 was produced,
one researcher noted how, in cropping the video-grabs to focus on gaze and hand
gesture, his attention was drawn to the literal foot work of the three passengers as
part of proposing where to sit. To shift perspective to the feet was going to require
cropping the image to foreground the feet of the passengers while also grabbing a
differently timed set of frames from the video. Thus, the data-session graphic tran-
script was not an endpoint; instead it pushed the participants onward by recurrently
revealing what was missing. Live graphical transcribing altered other practices
of the data session as well. Participants shifted away from replaying the source
video, using their transcript to show the phenomena that they were talking about.
This sharing of the document rather than replaying the video, lessened the time
taken for each person to present in our online data sessions, though this will vary
dramatically depending on local ecologies of room hardware and software for
sharing screens, projecting video, etc. However, the change in presenting material
was at the expense of staying with the video, with the attendant risk that tran-
scripts supplanted the finding of details in the video recording.
One of the overlooked qualities of transcripts in data sessions is that they are
usually passed back by members of the data session to the presenter as part of
the sharing of collective analyses. The passing back shows the organisation and
accountability of the workplace in terms of whose data it is and who should one
day make something from the session. The multiple graphic transcripts continued
to support this practice of passing back, yet as a different sort of resource. In a
simple sense they were digital documents rather than paper, which fitted well
with the fact that the sessions were online during the COVID pandemic. More signifi-
cantly, they provided parallel, sometimes convergent, sometimes divergent, itera-
tions of the very transcription process, rather than the one stabilised transcript
with each member’s annotation. The destabilisation fell on a different side
of the tension between standardisation and inventiveness in transcribing. The
presenter, at the end of a data session, had not only community noticings to draw
on but also a portfolio of ways of representing the thing being analysed.
Depicting perspectives
When narratives of comic strips are drawn, rather than created with screen grabs,
shifts of perspective are used to both show a perspective on things, people, and
places and, relatedly, to show perspective as the perspectives of particular charac-
ters in the narrative. When we transcribe from the given perspectives of cameras,
our ability to show different perspectives is limited. In what follows we will, in
that Jeffersonian spirit, use an analysis of perspectives to consider how it is more
than simply the angle and location of the camera. The recognising, sharing, and producing of perspectives is, we will show, a member’s concern and practical accomplishment. Transcript 4 was grabbed from the recording of a dashcam, which
256 Eric Laurier and Tobias Boelt Back
has two lenses, one showing the occupants of the car (panel 1) and one showing
part of the view out of the front of the car (panels 2–6). In a sense it prefigures the
360-camera given that many 360-cameras end up being edited or rendered to pro-
duce dashcam grammars of forward and backward perspectives (see CAVA360VR
in McIlvenny, 2021). The dashcam has a fixed position within the vehicle on the
dashboard, which means that the forward camera perspective is moving when the
vehicle itself is moving. The graphic transcript was produced to show an event that
helps us understand how the beeps from car horns are produced to be hearable by
other road users.
The transcript uses a single image of the occupants from the rear-facing cam-
era in panel 1 to provide the reader with a sense of whose perspective they will
come upon in the next frame. As we have noted earlier, perspective provides more
than locating where we are looking from; it also tells us to whom this view belongs. The
pairing of perspectives of a scene with its members looking, is routinely used
in comic strips and film editing to show the subject and afterwards show what
they are looking at. The graphic transcript builds on that adjacently-paired per-
spective relationship. Panel 1 provides more, an excess, yet one that is a relevant
resource. It is the driver’s perspective, but also visible in the image are a front-seat
passenger and a child in the rear, all looking ahead. In the following sequence of
video frames, it shows the visual perspective as a member’s concern and practical
accomplishment.
The analytic phenomenon that Eric sought to show in creating Transcript 4, for
the publication on driving-in-traffic, was what he had from the recording: the use
of the car itself as a perspective-producing device. The auto-rickshaw in front of
the car is positioned, for the driver, as an obstacle to looking ahead. When the
driver alters the car position as shown in panel 3, we can see, as the driver can see,
a lack of vehicles ahead on the pavement side. In panel 5 we can see, through the
graphic transcript, how returning to the middle of the dual lanes shows the view
ahead for the slot to be overtaken into but that the view of the pavement side has
been obscured again. The transcript is seeking to show the driver’s work of finding
perspectives into the traffic ahead. Eric selected the video-stills in panel 2 and 3
to show that gap, and for a certain perspective to be visible in its pairing with the
image before and after it.
The sequencing in Transcript 5 is similar to that in Transcript 4. We see a pas-
senger leave the train station platform, walk down a staircase into an underground
tunnel while being interviewed about his daily commuting during COVID-19. The
frames are grabbed from a 360-degree camera recording, which allowed Tobias
to crop from a full sphere. The aim of the transcript is to show, from the passenger’s perspective, the distribution of co-passengers in a shared space.
What we want to remind you of, before concluding, is that the transcripts in this
section were, of course, the result of a number of versions where different grabbed
images were tried, various layouts, speech bubbles shifted around, etc.
Conclusion
In opening this chapter, we examined, at some length, the reasons for and some of
the abiding problems of transcription in EMCA studies. We considered the intro-
duction of “live” graphic transcribing in data sessions as a novel way of showing
what each analyst had noticed. We underlined how rendering a video recording as
a comic strip shifted the analysts’ attention toward the embodied, visual, and spa-
tial aspects of members’ practices and how it changed the sharing of noticings with
the presenter. In a further reflection on our recent experiments in transcribing, we
demonstrated how, in finalising our transcripts for publication, we responded to
the phenomena of visual perspective and the intimately related problem of repro-
ducing members’ perspectives from the perspective(s) of our cameras.
As we argued, in the comic strip form, we have distinct possibilities to respond
to the action formulation and actor categorisation problems: For example, in rela-
tion to gaze, by showing direction of view, and to whom a specific view belongs,
rather than merely describing who is looking/gazing at what when. We have shown
how transcribing using comic strip conventions hybridised with transcription cri-
teria was shaped around particular analytic foci. We used the case of perspective
as one such spur to us, as part of participants’ sense-making and action produc-
tion, finding perspectives into traffic, guiding a co-passenger through a crowded
tunnel, distributing seats on a train, etc. As Jefferson herself showed, this required
both drawing upon and disrupting existing conventions for exhibiting phenomena.
Transcribing, in the comic strip form, shares EMCA’s commitment to tran-
scription as a way of highlighting and pointing up practices. As John Heritage
recalls of the 1970s, when Jefferson first circulated her recordings and transcripts:
[T]he transcripts highlighted features of the recordings that we might have oth-
erwise overlooked, and pointed up practices in the talk which turned out to be
highly relevant to our analyses. Because the tapes and transcripts were quite
widely circulated, they contributed to a ‘culture of transcribing’, to the develop-
ment of a set of common standards that we learned to share, and tried to live up
to. The standards allowed us to recognize failure – our own and other people’s
– as well as success.
(Heritage quoted in Hepburn & Bolden, 2017, p. 181)
This culture of transcribing, however, was not a matter of assuming Jefferson’s rightness ahead of inquiries that are “utterly obscure and unstable”.
Indeed, we do not wish to propose a fixed set of standards for the many elements
of the graphic transcript (e.g., panels, captions, textualised speech, layout of bub-
bles). For us, the Jeffersonian breakthrough emerged from questioning the existing
and dominant traditions of representation of hearable and consequential speech.
Even as Jefferson was building a new system because existing representations of
talk were ignoring the hearable and accountable features of speech, she was wary
around how speakers might then be caricatured in her transcribing. There is no
existing standard for graphic transcript orthography in EMCA that we wish to
depart from; the graphic transcript is itself already a departure from the Jeffersonian
and Mondada formats. Building graphic transcripts for EMCA inquiries remains
instead, as it should not surprise EMCA scholars to learn, an ad hoc endeavour.
Our aim here has not been to pull you into a puzzling transcription procedure
as a tutorial in a primary problem of EMCA studies: that any recording and any tran-
script misses the haecceities of the thing. Garfinkel (2002) taught us that lesson in
his summoning phones tutorial, where transcribing was instructively impossible
and the tutorial in transcribing was at best a secondary one for his students and,
more likely, a useful ruse. Our aim instead has been to help EMCA research-
ers develop their craft in good-enough representations of events, weighing the
balance between their identifying details and gratuitous granularity. Elsewhere,
Garfinkel did not abandon attempts to show phenomena via vivid ethnographic
descriptions, diagrams, photographs, and re-use of members’ own representations
of a phenomenon. Our warrant for experimenting with graphic transcripts is to
maintain the Jeffersonian spirit of showing what is missed or otherwise carica-
tured. Using the comic strip’s forms of panels, speech bubbles, etc. is a different
way of picking up Jefferson’s pencil and tracing paper to settle upon what we are
attending to.
Note
1 Depending on the sensitivity of the materials, this may not be possible and/or may
require trusting members to destroy their copies of the video at the end of the session.
References
Albert, S., Heath, C., Skach, S., Harris, M. T., Miller, M., & Healey, P. G. T. (2019).
Drawing as transcription: How do graphical techniques inform interaction analysis?
Social Interaction: Video-Based Studies of Human Sociality, 2(1). https://doi.org/10
.7146/si.v2i1.113145
Antaki, C., Biazzi, M., Nissen, A., & Wagner, J. (2008). Accounting for moral judgments
in academic talk: The case of a conversation analysis data session. Text and Talk, 28(1),
1–30. https://doi.org/10.1515/TEXT.2008.001
Ayaß, R. (2015). Doing data: The status of transcripts in conversation analysis. Discourse
Studies, 17(5), 505–528. https://doi.org/10.1177/1461445615590717
Back, T. B. (2020). One more time with feeling: Resemiotising boundary affects for doing
‘emotional talk show’ interaction for another next first time (ISBN 978-87-7210-627-4)
[PhD Thesis, Aalborg University]. Aalborg University Press.
Back, T. B. (2021). Building a passenger’s perspective. [Unpublished transcript from the
project ‘Travelling Together’]. Aalborg University. Copies available from the author.
Bogen, D. (1992). The organization of talk. Qualitative Sociology, 15(3), 273–295. https://doi.org/10.1007/BF00990329
Bogen, D. (1999). Order without rules. SUNY Press.
Deppermann, A., Laurier, E., Mondada, L., Broth, M., Cromdal, J., De Stefani, E.,
Haddington, P., Levin, L., Nevile, M., & Rauniomaa, M. (2018). Overtaking as an
interactional accomplishment. Gesprächsforschung - Online-Zeitschrift zur verbalen
Interaktion, 19, 1–131.
Garfinkel, H. (2002). Ethnomethodology’s program. Rowman & Littlefield.
Glenn, P. J. (2010). Laughter in interaction. Cambridge University Press.
Goodwin, C. (2000). Practices of seeing: Visual analysis - An ethnomethodological
approach. In T. van Leeuwen & C. Jewitt (Eds.), Handbook of visual analysis (1st ed.,
pp. 157–182). Sage Publications.
Goodwin, C., & Salomon, R. (2019). Not being bound by what you can see now. Charles
Goodwin in conversation with René Salomon. Forum: Qualitative Social Research,
20(2). https://doi.org/10.17169/fqs-20.2.3271
Grennan, S. (2017). A theory of narrative drawing. Springer.
Have, P. t. (2007). Doing conversation analysis: A practical guide. Sage.
Heath, C., Hindmarsh, J., & Luff, P. (2010). Video in qualitative research: Analysing social
interaction in everyday life. Sage Publications.
Hepburn, A. (2004). Crying: Notes on description, transcription, and interaction.
Research on Language and Social Interaction, 37(3), 251–290. https://doi.org/10.1207/s15327973rlsi3703_1
Hepburn, A., & Bolden, G. B. (2017). Transcribing for social research. Sage Publications.
Jayyusi, L. (1993). The reflexive nexus: Photo-practice and natural history. Continuum:
The Australian Journal of Media & Culture, 6(2), 25–52. https://doi.org/10.1080
/10304319309359397
Jefferson, G. (1978). What’s in a ‘Nyem’? Sociology, 12(1), 135–139. https://doi.org/10.1177
/003803857801200109
Jefferson, G. (1983). Issues in the transcription of naturally-occurring talk: Caricature
versus capturing pronunciational particulars. Tilburg Papers in Language and
Literature, 34.
Jefferson, G. (1985). An exercise in the transcription and analysis of laughter. In T. A. Van
Dijk (Ed.), Handbook of discourse analysis (pp. 25–34). Academic Press.
Laurier, E. (2014a). The graphic transcript: Poaching comic book grammar for inscribing
the visual, spatial and temporal aspects of action. Geography Compass, 8(4), 235–248.
https://doi.org/10.1111/gec3.12123
Laurier, E. (2014b). The lives hidden by the transcript and the hidden lives of the transcript
[Unpublished manuscript].
Laurier, E. (2019). The panel show: Further experiments with graphic transcripts and
vignettes. Social Interaction: Video-Based Studies of Human Sociality, 2(1). https://doi
.org/10.7146/si.v2i1.113968
Laurier, E., Muñoz, D., Miller, R., & Brown, B. (2020). A bip, a Beeeep, and a beep
beep: How horns are sounded in Chennai traffic. Research on Language and Social
Interaction, 53(3), 341–356. https://doi.org/10.1080/08351813.2020.1785775
Lindwall, O., & Lymer, G. (in press). Detail, granularity, and laic analysis in instructional
demonstrations. In M. Lynch & O. Lindwall (Eds.), Instructed and instructive actions.
Routledge.
Luff, P., & Heath, C. (2012). Some “technical challenges” of video analysis: Social actions,
objects, material realities and the problems of perspective. Qualitative Research, 12(3),
255–279. https://doi.org/10.1177/1468794112436655
McCloud, S. (1993). Understanding comics: The invisible art. Kitchen Sink Press.
McIlvenny, P. (2021). New Technology and Tools to Enhance Collaborative Video Analysis
in Live ‘Data Sessions’. QuiViRR: Qualitative Video Research Reports, 1. https://doi
.org/10.5278/ojs.quivirr.v1.2020.a0001
Mondada, L. (2014). Shooting as a research activity. In M. Broth,
E. Laurier, & L. Mondada (Eds.), Studies of video practices: Video at work (pp. 33–63).
Routledge.
Mondada, L. (2018). Multiple temporalities of language and body in interaction: Challenges
for transcribing multimodality. Research on Language and Social Interaction, 51(1),
85–106. https://doi.org/10.1080/08351813.2018.1413878
Schegloff, E. (2000). On granularity. Annual Review of Sociology, 26(1), 715–720. https://
doi.org/10.1146/annurev.soc.26.1.715
Watson, R. (1997). Some general reflections on ‘categorization’ and ‘sequence’ in the
analysis of conversation. In S. Hester & P. Eglin (Eds.), Culture in action: Studies in
membership categorization analysis (pp. 49–76). University Press of America.
INDEX
aboriginal languages 10, 172, 174, 176, 184–90
accountability 1, 4, 77, 113, 115, 126, 203, 247, 255
affordance 8, 96–101, 106, 111, 128–9, 133, 138, 160, 177, 222
avatars 112–29, 139, 229, 233–4
camera: 360-degree 5, 10, 117, 132–46, 224, 227, 231–42, 256–8; handheld 134–5, 162; static 9, 86–8, 90, 93, 96, 106, 226; wearable 9, 85–87, 90, 96, 101–107
camera perspective 5, 106, 115, 231, 234, 240, 255–7
camera view 87, 96, 100, 106, 135–46, 172, 187, 233
captioning 252–4, 259
coding 7, 38, 183–4, 206–7, 209
comic strips 6, 11, 227–8, 251–7, 260–1
communicative gestures 9, 24, 37, 173
contextual configuration 64, 75, 111–12, 115, 125–7, 145
co-presence 85, 87–8, 91, 93, 95, 98, 125, 129, 229–30
distributed bodies 112, 125–6
double membership 132, 143, 145
dual embodiment 111–15, 118–22, 125–9
embodiment 5, 11–12, 47, 112, 229
ethnographic knowledge 12, 77, 132, 154–7, 163, 166–8
ethnomethodology 4, 65, 76, 146, 177, 221–2, 245
experimentation 7, 22, 109, 205–6, 210–11, 256
geospatial framework 172, 174, 176–7
GIS 172, 174, 176–8
GPS-derived data 172–4, 176–8
graphic transcripts 246–9, 251–61
human-robot interaction 9, 42–4, 47, 58
individual activity 90–1, 93, 96
inductivity 2, 153, 155–6, 159, 161, 166–7
involvement 88, 93, 98
locational pointing 171–5, 178–9, 183–4, 193
longitudinal data 26, 31, 146, 167
member’s perspective 3, 8–11, 24, 26, 63–4, 67–9, 75, 77, 153, 167
mirroring 11, 199, 201–2, 205, 210
mobile groups 132–5, 138, 142, 145–6
mobility 65, 71, 76, 85, 106, 134
motion capture 113, 125, 204, 223
multiactivity 68, 90, 96, 106, 259
naturalness 86, 89, 101, 106
next-turn proof procedure 2–3, 23–4, 37–8, 43, 55, 57, 127
non-accountability 10, 199, 207–10
non-human interaction 6, 8–9, 21–2, 24, 26–7, 37–8, 47, 57, 66
observability 45, 47, 55, 63, 67, 71, 73–6, 85–6, 88, 97, 134, 136, 144, 199–200, 237
ocularcentrism 8–9, 63–8, 70–8
participation 8–9, 12, 43–5, 57–8, 64, 74, 93, 132–5, 137, 144, 146, 154, 160, 209
participation framework 9, 58, 63–78, 87, 111–12, 125–7, 238
physiological signal analysis 11, 199, 201–3, 205, 208–10
place reference 172, 174, 176–7, 184, 187–8, 191, 192–3
pointing 10, 71, 75, 86, 96, 98–101, 113, 122–3, 171–184, 191, 192–3, 232, 259
practice-based video analysis 223, 225
practice-based VolCap analysis 224–5, 230, 240
private actions 3, 7–8, 10, 96, 127
proto-data 10, 155–7, 158, 161, 163, 166
rating 206, 209
recording equipment 68, 86–7, 89, 101, 105–7
researcher positionality 154
robots 6, 8–9, 42–58, 63, 66–74
screen capture 85–7, 90, 93, 95–6, 101, 103–5, 107, 116–17, 253
sequence organisation 28, 37, 206
smartphone 85–8, 90–8, 101, 103, 105–7
social action 1, 3–4, 6–8, 22–3, 25, 113, 115, 125–6, 128, 163, 199–200, 207
space 4, 29, 47, 58, 71, 74, 111, 116, 127, 172–5, 179, 222, 240, 258; virtual 111–2, 115–7, 226, 229
speech bubbles 252, 259–61
statistical analysis 205–6, 209–10
temporality 90–1, 98, 252
topography 171–2, 176–7
transcription 9, 42–3, 45, 47–54, 57–8, 230, 245–251, 253, 255, 260–1; Jeffersonian 6, 11, 183, 246–9, 251–3, 261; Mondada 48, 90, 183, 246–8, 251–3, 261
unique adequacy 3, 6, 22, 77, 155, 166–7
unmotivated looking 134, 143, 145, 154–6, 161, 163, 166–7, 206
VE see virtual environment
virtual: environment 5, 10, 111–12, 114–15, 118, 121–2, 125, 127, 129, 221, 224; reality 11, 112, 116, 122–3, 138, 221, 229
visual impairment 63, 65–6, 71, 73, 76
volumetric capture 221, 223–5, 240–1
VR see virtual reality
vulgar competence 3, 58, 77