Auditory Phonetics
Herbert Pilch
To cite this article: Herbert Pilch (1978) Auditory Phonetics, Word, 29:2, 148-160, DOI: 10.1080/00437956.1978.11435657
To link to this article: https://doi.org/10.1080/00437956.1978.11435657
Auditory Phonetics*
* This is the slightly revised text of a plenary session paper presented at the International Congress of
Phonetic Sciences, Miami Beach, December 19th, 1977.
1
G. Ungeheuer, "Kommunikative und Extrakommunikative Gesichtspunkte in der
Phonetik," Proc. 6th Int. Cong. Phonetic Sciences (1967), ed. B. Hala, M. Romportl, and P.
Janota, Prague: 1970, pp. 73-85.
2
Ed. G. Fant and M.A. A. Tatham, London 1975.
Instituut voor Perceptieonderzoek of Eindhoven. The ear is modelled (for
this purpose) as a frequency analyzer, an inbuilt spectrograph.
Consequently, auditory phonetics appears to be a special branch of
acoustic phonetics. I certainly see no reason to belittle this work, but I do
wish to use the term auditory phonetics for something wider, perhaps
something different: not just for the study of acoustically predefined
stimuli, but generally for the auditory perception of linguistic stimuli,
which are not necessarily predefined acoustically, thereby putting auditory
phonetics on a par with articulatory and acoustic phonetics. Auditory
phonetics (in this sense) should be closer to the way we communicate: not
through acoustically predefined noises, but just through (linguistically
structured) noises.
The type of auditory category I have in mind does occur in the
established canon of phonetic learning, but only in a few isolated instances,
not as a comprehensive network. What I have in mind are categories like
the HISS and the HUSH, the difference between /s/ and /ʃ/. The hiss and the
hush are auditory terms; they describe what we hear. But we often prefer
the articulatory labels, talking about apico-alveolar vs. lamino-alveolar
fricatives, and we like to believe the articulatory specification is more
"objective" than our "subjective hearing" can ever be. Little do we realize
that the articulatory parameters are, in such cases, mere imitation labels,
not objective, but speculative. 3 To clinch the argument, let us consider one
of the well-established facts about aphasia. I affirm, with confidence, that in
paradigmatic aphasia the hiss-hush distinction is one of those lost early, 4
because I have heard this happen with enough aphasic patients. Can I say
as much for the apico-alveolar and the lamino-alveolar fricatives? I
cannot; though I have heard many aphasic patients, I have seen very little
of their tongue movements. So I should conclude the auditory specification
is more reliable, more "objective" than the articulatory imitation labels.
The job of auditory phonetics is, then, not only to make auditory
parameters available for isolated events such as the hiss and the hush, but
to make available a comprehensive network of auditory parameters which
will ideally cover all phonetic events. Once this has been achieved with the
necessary precision, we will be able to dispense with a great deal of
speculation, as in the current distinctive feature theories, whose feature
3
I wholeheartedly agree with Bertil Malmberg: "Perceptual facts are just as 'objective',
just as 'real', just as 'measurable' (provided the methods of measurement are adequate) as
physical facts" ("Changement de perspective en phonétique," in Nouvelles perspectives en
phonétique, Brussels: 1970, p. 12).
4
As discovered by R. Jakobson, Kindersprache, Aphasie und allgemeine Lautgesetze,
Uppsala: 1943, rpt. Selected Writings, I, The Hague: 1962, pp. 328-401.
7
Witness the experiments conducted by Eli Fischer-Jørgensen, op. cit. (fn. 2 above),
pp. 153-176.
8
The auditory network is spelled out in depth, and applied to English, in my Manual of
English Phonetics, ch. 7 (Munich, Fink, in press).
9
See Richard C. Berry, "A three-feature system for English vowels," Proc. Seventh Int.
Cong. Phonetic Sciences, ed. A. Rigault and R. Charbonneau, The Hague 1972, pp. 452-459.
Not necessarily in the simple, complementary sense that the investigator
mistook for /i/ a "factual /e/", but that he was mistaken in his belief that
there IS such a thing as a "factual /i/" versus a "factual /e/"? Things like /i/
and /e/ have, after all, been abstracted from a continuously changing
sound flow. What is so "factual" about them as to leave no room for
legitimate disagreement? Sure, when a properly trained listener listens to a
properly trained speaker producing a minimal pair like pit-pet, no
uncertainty is to be expected. But this is a highly specialized laboratory
situation. I do not think we can safely generalize from this situation, assuming
that ordinary communication also works with unambiguous /i/'s and /e/'s
all the time. In fact, all of us have probably experienced the ambiguity
involved in transcribing some spontaneous conversation recorded on tape.
Let us call such a tape an AUDITORY TEXT 10 (in contrast with the TEST
UTTERANCE produced ad hoc). In an auditory text, it is by no means always
easy to determine which phoneme is "factually there". The reason is, I
submit, that the different phonemes of a given language do not factually
sound different from each other, in auditory texts, as neatly as we would
like to believe. On the contrary, there is a great deal of PHONEMIC
INDETERMINACY. Sometimes this can be resolved by the context. For
instance, I hear someone talk about "the Publishing House of Macmillan"
with the /ʃ/ of publishing sounding much more like a [kʰ]. But of course I
know the word is publishing house, not publicking house, so I conclude the
speaker did not pronounce properly. But do speakers, ordinarily,
pronounce "properly"? True, the speaker concerned could probably be
induced to pronounce the minimal pair finishing-finicking. Does he carry
over this distinction from the test situation into ordinary communication?
Obviously he does not. The fact is that some instances of the phoneme /ʃ/ sound
so much like /k/ that the two cannot be distinguished.
Some instances of phonemic indeterminacy cannot be resolved even
through prior morphemic interpretation. I hear an utterance which is,
presumably, either and you wouldn't feel confident or I just wouldn't feel
confident, with the "main stress" on wouldn't. The proclitic section
preceding wouldn't sounds like [aiʤi]. If this represents and you, then [i] is the
denasalized vocoid allophone of the phoneme /n/. In fact, both nasalized
and denasalized vocoids are heard, commonly enough, for the phoneme /n/
in English, as finally in government /ˈgʌvmət ~ ˈgʌ̃vmə̃t/. On the other hand
/ʤi/ could, perhaps, be a proclitic allomorph of just, and this is more
plausible in the present context, as it carries on a rhetorical anaphora.
Which phonemes are "factually present"? /ænʤi/? or /aiʤi(st)/? We cannot
answer this question uniquely. It is, I conclude, an improper question in
this instance. The transcripts can, in the nature of things, be no more than
rough approximations. How do people communicate with so much
indeterminacy? The fact is they do, and they couldn't care less. In the
present instances (both are from Canadian radio programs), no listener has
been puzzled except the phonetician. And even the phonetician was not
puzzled till he had done a good deal of re-listening to the tape.
10
This is E. Zwirner's Abhörtext, cf. his Grundfragen der Phonometrie, 2nd ed. rev.
Bibliotheca phonetica 3, Basel 1966, pp. 173-178; H. Bluhme in his English translation uses
phonetic text (Principles of Phonometrics, The University of Alabama Press 1970, pp. 130-135).
Dialectologists often call on a native speaker to help them transcribe an
auditory text. Even though they know the phonemic system involved, they
are puzzled by many passages. 11 This has been explained by the native-speaker
myth, which attributes to the native speaker a sixth sense enabling
her or him to recognize phonemes which the non-native does not. The
rational explanation lies, I submit, in the phonemic indeterminacy of the
auditory text. As the indeterminacy is phonemic, it cannot be resolved by
phonemic criteria; it requires the skill of an editor who, by virtue of
cultural experience, knows not only how the community concerned usually
expresses itself, but what it usually talks about. The dialectologist, for all
his learning, cannot rival the cultural experience of the untutored native
speaker. This is also why automatic speech recognition is so difficult.
Though we may program into the computer all our acoustic knowledge, we
cannot yet do the same with the cultural knowledge on which the human
listener also relies for recognition.
III. PHONETIC HEARING
The assumption that there IS such a thing as "factual [e]" vs. "factual [i]"
presupposes a UNIQUE PARTITION of auditory space and, conversely, a
single, standard list of different "sounds". A crude version of this idea is
what we were taught in grade school: there ARE five (or six, or seven, or
eight ... ) vowels a, e, i, o, u, ... 12 The more sophisticated versions of the
standard list claim to be "universal phonetic alphabets". They differ from
the crude version in the number of vowels, but the basic idea remains the
same: the unique partition of auditory space, not in terms of just five or six
vowels, but (say) of twenty, or fifty, or two hundred, or in terms of a single
list of distinctive features (twelve, fourteen, twenty, or suchlike). The
phonemic system (or sound pattern) of any particular language is then
taken to constitute a selection from the big standard list, in such a way
that certain specific "sounds" are selected from the list and all others
deleted: "In the beginning God created the heavens and the earth and the
International Phonetic Alphabet." 13
11
As discussed by E. Zwirner, Grundfragen, pp. 169-173 (Phonometrics, pp. 133-135); F.
Hedblom, "Recording in Dialect Investigation in Sweden," Phonetica, 3 (1959), 95-108.
12
In fact, one of my (flunking) students at the University of Massachusetts argued thus the
other day. Noticing my disapproval, he explained: "The teacher told us so in High School."
Now this is, I submit, a bad theory. Its justification is not empirical, but
cultural. It is extrapolated from the Latin alphabet, which our culture has
been using and adapting to new languages for many centuries. So the Latin
alphabet appears "natural" to us, either the crude version with five vowels
or one of the more sophisticated versions. 14 Other cultures, say the
Chinese, use, for their version of phonemic analysis, a different partition of
auditory space, one in which the question "How many vowels?" simply
does not apply. 15 The idea that phonemic systems constitute subsets from
some universal list of sounds is, I submit, irreconcilable with the empirical
evidence:
1. If every phonemic system were, in fact, a subset from a standard
"universal phonetic alphabet", then we should be able to learn how to
pronounce an unknown language from a sufficiently narrow transcription.
We all know from experience that this cannot be done. The learner cannot,
in fact, get away with a selection from a known partition of auditory
space; he must learn a new partition, a new AUDITORY CATEGORIZATION,
with every new language he learns.
2. Conversely, if the universal phonetic alphabet did, in fact, apply to
every language, then phoneticians should be able to transcribe every
unknown language properly at the first attempt. The expert who knows "all
the sounds of the human species" must know every sound in any particular
subset. Experience shows that the phonetician can, at the first attempt,
achieve no more than a rough approximation, because she or he has not yet
learned the specific auditory categorization which is peculiar to the new
language. We categorize what we hear in terms of what we already know
(such as some phonetic alphabet). As the two categorizations are
incommensurate, we often find that the native speaker to whom we read our
transcripts does not understand them.
3. Certain aphasic patients can hear in the sense that they have normal
hearing under audiological examination, but they do not recognize
phonemes. 16 For instance, I had a Welsh patient who had lost all but the
most elementary pitch patterns of Welsh, yet he could sing. 17 What he had
lost was not the sense of hearing, but the auditory categorization of what he
heard in terms of the pitch patterns of Welsh.
13
Note Charles Hockett's caustic comment on "the assumption that there is a large, but
strictly finite, stock of distinct humanly possible articulations or speech sounds, from which
each language selects a small subset as its phonemes." As Hockett remarks, "In fact, of course,
the set of all humanly possible articulations forms a multidimensional continuum, from which
discretely contrasting ranges are quarried by quantization, and the ways in which this is done
in two different languages need show no congruence." (Amer. Speech, 47 [for 1972; belatedly
published 1975], 243n.) In the same vein, this writer "rejects transcriptions which allege some
particular topological structuring of phonetic space as universal" (Phonemtheorie, 3rd ed.,
Basel 1974, p. xi).
14
This was first pointed out by H. Lüdtke: "The notion that modern transcription systems
can capture the entire range of phonetic variation in all languages and dialects, provided only
that the repertoire of symbols is suitably extended, is ultimately an extrapolation of the
historical development of the Latin script" (Folia Linguistica, 5 (1972),
336). Conversely, alphabetic writing has been taken to confirm the validity of our transcripts:
"The segmentability of the speech chain into discrete and global phonemes is firmly
established by alphabetic writing systems" (O. S. Achmanova, Proceedings of the Seventh
International Congress of Phonetic Sciences, p. 170). I question not the segmentability as such
(see section (4) below), but the general validity of particular segmentations such as the
International Phonetic Alphabet.
15
A convenient summary statement of the Chinese way of phonemic analysis is offered by
Tung-Ho Tung, "Bipartite Division of Syllables in Chinese Phonology," Proc. Ninth Int. Cong.
of Linguists, p. 203. The reason for the different cultural modes of phonemic analysis does
not necessarily lie in differences of linguistic structure. Chinese can be analyzed with vowels, as
has been done by C. F. Hockett, "Peiping Phonology," J. Amer. Oriental Soc., 67 (1947), 253-267;
"Peiping Morphophonemics," Language, 26 (1950), 63-85. Conversely, some European
languages, such as Swedish and Norwegian, have pitch patterns that are distinctive on the
level of lexical meaning, but most dictionaries omit them.
It appears then that auditory perception is not simply a matter of hearing
something the way it "is," but it necessarily involves different auditory
categorizations. These auditory categorizations are, in principle, at the
discretion of the listener. It is the listener who chooses to treat a particular
bit of noise as just noise or as an auditory text and who decides whether it
can plausibly be treated as a text. Analytically speaking, this decision is
what constitutes phonemic analysis. The particular phonological model A,
B, C, or D in terms of which this analysis is couched (Chinese or
European) is a matter of detail. 18 For the purposes of auditory phonetics,
I propose to distinguish between three modes of auditory perception: 19
1. The AUDIOLOGICAL hearing, which is a biological property of the
normal human ear as such: for instance, the pitch ranges and loudness
thresholds within which our ears perceive noise, feel pain, etc.
2. The PHONEMIC listening, which is an acquired faculty of those people
who have learned a given language. It involves, for example, the recognition
not just of pitch, but of the pitch patterns of a given language. It is lost
under phonemic deafness.
3. The EDITORIAL understanding, which involves hunches and guesses
and is best practised by people who have a great deal of cultural experience
with a given language. It is lost (or partially lost) in sensory aphasia.
16
This is the syndrome known as PHONEMIC DEAFNESS (Lurija) or LAUTTAUBHEIT (Kleist).
17
See my account in "Aphasische Intonationsstörungen," Saggi, 2 (1976), 33-42; "Aphasia
in Welsh," Word, 28 (1976), 207-229.
18
Cf. H. Pilch, Phonemtheorie, 3rd ed., Basel: 1974, p. 157.
19
This is close to Bertil Malmberg's views, Introduktion till fonetiken som vetenskap,
Stockholm: 1969. Malmberg distinguishes between the same three modes of perception, but
he recognizes the third not as crucial to intelligibility, but only as a crosscheck on signals
already recognized phonemically: "Factor 2 is decisive for the receiver's reaction. If he has
used the right code, he identifies the sound wave as an utterance in a known language". I
have presented more extensive data in support of my alternative view in "La langue et la
compréhension: expérience d'un aphasique," La linguistique, 10 (1974), 79-90. Most
phoneticians appear to attach even less importance to the editorial mode of perception than
does Malmberg. They assume a "mini-interpreter" or "precategorical auditory store" which
recognizes messages by a "matching procedure", comparing the input signal, on a purely
phonetic basis, with the phonemic units somewhere in the brain.
J. C. Lafon distinguishes between the audiological mode (audition) and the phonemic mode
(integration) of perception (Message et phonétique, Paris: 1966).
IV. AUDITORY ANALYSIS
What, then, is the job of auditory phonetics? The answer I proposed
above (end of introduction) appears oversimplified from our present
vantage point. Let us revise it: auditory phonetics should specify, in a
coherent manner, the partitions of auditory space imposed by different
phonemic systems. As I do not believe in "the natural partition" of
auditory space, I adapt to my purposes culturally pre-established
partitions, such as noise vocabularies, alphabets, audiological test scales, etc.
This is my justification for drawing on the noise vocabulary of English in
section (i) above. The empirical motivation for adopting any particular
category is provided by phonemic experience with different languages. It
will determine whether or not we need auditory parameters to describe
(say) different voice qualities (see end of section (i) above). 20
Consider, as an example, the category of pitch. This is a pre-established
category familiar to musicians and audiologists. We use it in order to
specify the pitch patterns of languages, even though pitch in languages
sounds remarkably different from musical pitch. Phonemic experience,
however, convinces us we do not need the musician's elaborate
classification of pitches in terms of the diapason. 21 All we need is a simple high-
20
Cf. D. Fry: "there seems to be no record of a language in which voice quality differences
operate independently as a prosodic feature." (Manual of Phonetics, 2nd ed., ed. B.
Malmberg, Amsterdam: 1968, p. 370.)
21
It is true there have been linguistic investigations using the diapason framework, such as
J. E. Buning and C. H. van Schooneveld, The Sentence Intonation of Contemporary Standard
Russian as a Linguistic Structure, The Hague: 1961. Such investigations are unrealistic in the
sense that they contain more detail than can be distinguished by ear in terms of the recurrent
differences (see below).
22
On French cf. F. Carton, Introduction à la phonétique du français, Paris 1974. On
English cf. fn. 8 above; on Welsh cf. my article "Advanced Welsh Phonemics," Zeitschrift für
celtische Philologie 34 (1975) pp. 60-102; on German cf. my article "Baseldeutsche
Phonologie auf Grundlage der Intonation," Phonetica, 34 (1977), 165-190.
23
G. Fant, Den akustiska fonetikens grunder, Stockholm: 1957.
24
The inventory of Lettish has been described by L. K. Ceplitis, Analiz rečevoj intonacii,
Riga: 1974, p. 113; the inventory of Finnish by A. Sovijärvi, Olli-Matti Ronimuksen eräistä
runoista laadittuja lausunto-analyysi-harjoituksia (Fonetiikan laitos, Helsinki: 1963).
25
For an acoustic specification see J. M. Howie, Acoustical Studies of Mandarin Vowels
and Tones, Cambridge and New York 1976.
hopeless. The pitches keep drifting in all directions. Still, the specification is
good. How does one ever learn to recognize these tones? More generally,
how can we ever analyze a phonemic system which we do not know
already?
The way to tackle the job is, first, to forget about the auditory
specification. What we do instead is, in principle, listen to several sequences
of at least two tones (the segmentation presents no particular difficulty
most of the time) and decide not whether they rise or fall, but whether they
sound the same or different. This is the most elementary phonetic
judgment (as we saw above). We do, of course, hear subtle differences
everywhere. We may thus be tempted never to judge any two given tones
(or other phonetic events) to be the same. If we yield to this temptation, we
will fail to arrive at any partition of the auditory space, that is, fail at
phonemic analysis. The way to escape this pitfall is to take into account
only such differences as we can hear again and again consistently (not just
once or twice or any limited number of times). Thus we listen ultimately not
just for given phonetic events, but for recurrent auditory differences
between classes of phonetic events. In this way one eventually learns to
recognize the tones of Chinese, or the pitch patterns of intonational
languages, 26 and even phonetic parameters other than pitch. This is at least
one way to learn, by trial and error, those "new partitions of auditory
space" which characterize unknown languages. In every case it is essential
not to project one's auditory impressions onto a "general phonetics chart",
as was done by the old "Ohrenphonetik" ("ear phonetics"), but to compare them with each
other, listening for recurrent differences. The specification of these
differences in terms of auditory parameters is a subsequent step. First
recognize them, then characterize them. It is after we recognize the set of
four tones that we can apply to them such parameters as level, rise, fall-rise
and fall, and we can then argue whether or not some fall-rises might not be
better described as rise-falls, low-levels, and so on.
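The same-or-different procedure described above can be pictured, very schematically, as a grouping step. The Python sketch below is an editorial illustration, not a procedure given in this paper, and it rests on assumptions of its own: the token labels and the `judgments` table are hypothetical, a difference counts as recurrent only if it was heard on every repetition (a crude stand-in for hearing it "again and again consistently"), and the resulting classes are the connected components of all remaining "same" pairs. It models only the recognition step; characterizing the classes as level, rise, fall and so on comes afterwards.

```python
from itertools import combinations


def recurrent_classes(tokens, judgments):
    """Group tokens into classes, counting only recurrent differences.

    Hypothetical data model (an assumption, not from the paper):
      tokens:    labels for recorded tokens, e.g. tones in two-tone sequences
      judgments: maps frozenset({a, b}) to a list of booleans, one per
                 hearing; True means the pair was heard as "different"
    A difference counts only if it was heard on every hearing. Every other
    compared pair is treated as "same", and classes are the connected
    components of that "same" relation.
    """
    parent = {t: t for t in tokens}

    def find(t):
        # Union-find root lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(tokens, 2):
        heard = judgments.get(frozenset((a, b)))
        if not heard:
            continue  # pair never compared: no evidence either way
        if not all(heard):
            # Not heard as different every time: not a recurrent difference.
            parent[find(a)] = find(b)

    classes = {}
    for t in tokens:
        classes.setdefault(find(t), []).append(t)
    return sorted(sorted(group) for group in classes.values())


# Usage with invented judgments over four hypothetical tone tokens.
J = {
    frozenset(("t1", "t2")): [False, False, False],  # heard as same
    frozenset(("t3", "t4")): [False, True, False],   # a one-off difference
    frozenset(("t1", "t3")): [True, True, True],     # recurrent difference
    frozenset(("t1", "t4")): [True, True, True],
    frozenset(("t2", "t3")): [True, True, True],
    frozenset(("t2", "t4")): [True, True, True],
}
print(recurrent_classes(["t1", "t2", "t3", "t4"], J))
# -> [['t1', 't2'], ['t3', 't4']]
```

Making sameness transitive is a design choice of the sketch: two tokens each heard as "same" as a third end up in one class even if they were never compared directly. A stricter variant could require every pair within a class to have been heard as the same.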
This procedure reflects the two basic assumptions of phonetics, 27 namely
that phonetic events are (i) distinguishable, and (ii) classifiable.
Both assumptions imply the discrete character of phonetic elements. If they
were not distinguishable, we would be unable to recognize them. If they were
not classifiable, we would be unable to judge them as same or different. These
assumptions constitute the essential difference between audiology and auditory
phonetics, acoustics and acoustic phonetics, physiology and articulatory
phonetics. Without these assumptions (or their equivalents) no phonetic work
26
Lexical and semantic criteria are, of course, helpful in the case of tones, but not in the case
of intonations which, by definition, do not involve lexical and semantic differences; see my
article "Intonation in Discourse Analysis," Phonetica, 34 (1977), 81-92.
27
As formulated by E. Zwirner, Grundfragen, p. Ill.
Albert-Ludwigs-Universität
Freiburg i.Br. and
University of Massachusetts at Amherst