Auditory Phonetics


Word

ISSN: 0043-7956 (Print) 2373-5112 (Online) Journal homepage: www.tandfonline.com/journals/rwrd20

Auditory Phonetics

Herbert Pilch

To cite this article: Herbert Pilch (1978) Auditory Phonetics, Word, 29:2, 148-160, DOI:
10.1080/00437956.1978.11435657
To link to this article: https://doi.org/10.1080/00437956.1978.11435657

Published online: 16 Jun 2015.

HERBERT PILCH

Auditory Phonetics*

The most elementary judgments in phonetics are auditory judgments.


They are judgments of auditory sameness and difference. Such judgments
precede even the most sophisticated experimental research.
The initial sections of the English words pill and pin sound the same, but
they sound different from the initial sections of Bill and bin. The difference
between pin and bin sounds the same as that between tin and din. It is on the
basis of such elementary auditory judgments that we set up the English
stops and their distinctive features and that the Haskins group conducts its
research into voice onset time.
Elementary phonetic information is passed on, necessarily, through oral
transmission. Other kinds of knowledge may be gleaned from books, but
did a phonetician ever learn from a textbook things like glottal fry (creaky
voice), the lateral fricative of Welsh, or the tonelag of Norwegian?
We like to think of phonetics as a communication-oriented discipline. 1
Now how do we communicate? We communicate by listening. And yet,
when we describe phonetic events, do we talk about the way we hear them?
No, we talk about the output of our sophisticated machinery, about high-
level abstractions such as distinctive features and phonemes. For this
purpose, we have built up comprehensive networks of articulatory
parameters, and comprehensive networks of acoustic parameters. But
there is nothing comparable in the domain of auditory parameters.
The very question which I am posing is not generally put in this way;
rather auditory phonetics is understood as the study of test responses to
acoustically defined stimuli. "Auditory analysis and the perception of
speech" (in this sense) was the title of a symposium recently held at
Leningrad. 2 "Perceptual phonetics" (in this sense) is the research of the

* This is the slightly revised text of a plenary session paper in the International Congress of
Phonetic Sciences, Miami Beach, December 19th, 1977.
1 G. Ungeheuer, "Kommunikative und Extrakommunikative Gesichtspunkte in der
Phonetik," Proc. 6th Int. Cong. Phonetic Sciences (1967), ed. B. Hala, M. Romportl, and P.
Janota, Prague: 1970, pp. 73-85.
2 Ed. G. Fant and M. A. A. Tatham, London 1975.
Instituut voor Perceptieonderzoek of Eindhoven. The ear is modelled (for
this purpose) as a frequency analyzer, an inbuilt spectrograph.
Consequently, auditory phonetics appears to be a special branch of
acoustic phonetics. I certainly see no reason to belittle this work, but I do
wish to use the term auditory phonetics for something wider, perhaps
something different-not just for the study of acoustically predefined
stimuli, but generally for the auditory perception of linguistic stimuli,
which are not necessarily predefined acoustically-thereby putting audi-
tory phonetics on a par with articulatory and acoustic phonetics. Auditory
phonetics (in this sense) should be closer to the way we communicate-not
through acoustically predefined noises, but just through (linguistically
structured) noises.
The type of auditory category I have in mind does occur in the
established canon of phonetic learning, but only in a few isolated instances,
not as a comprehensive network. What I have in mind are categories like
the HISS and the HUSH, the difference between /s/ and /ʃ/. The hiss and the
hush are auditory terms; they describe what we hear. But we often prefer
the articulatory labels, talking about apico-alveolar vs. lamino-alveolar
fricatives, and we like to believe the articulatory specification is more
"objective" than our "subjective hearing" can ever be. Little do we realize
that the articulatory parameters are, in such cases, mere imitation labels,
not objective, but speculative. 3 To clinch the argument, let us consider one
of the well established facts about aphasia. I affirm, with confidence, that in
paradigmatic aphasia the hiss-hush distinction is one of those lost early, 4
because I have heard this happen with enough aphasic patients. Can I say
as much for the apico-alveolar and the lamino-alveolar fricatives? I
cannot; though I have heard many aphasic patients, I have seen very little
of their tongue movements. So I should conclude the auditory specification
is more reliable, more "objective" than the articulatory imitation labels.
The job of auditory phonetics is, then, not only to make auditory
parameters available for isolated events such as the hiss and the hush, but
to make available a comprehensive network of auditory parameters which
will ideally cover all phonetic events. Once this has been achieved with the
necessary precision, we will be able to dispense with a great deal of
speculation, as in the current distinctive feature theories-whose feature
3 I wholeheartedly agree with Bertil Malmberg: "Les faits perceptifs sont aussi 'objectifs',
aussi 'réels', aussi 'mesurables'-si les méthodes de mesure sont adéquats-que les faits
physiques" ("Changement de perspective en phonétique," in Nouvelles perspectives en
phonétique, Brussels: 1970, p. 12).
4 As discovered by R. Jakobson, Kindersprache, Aphasie und allgemeine Lautgesetze,
Uppsala: 1943, rpt. Selected Writings, I, The Hague: 1962, pp. 328-401.

specifications usually prove a little fanciful once they are subjected to
serious articulatory and acoustic investigation. 5
I. AUDITORY CATEGORIES
I wish to report about my attempt to invent a network of auditory
parameters. It is based on the noise vocabulary of English. The lexicon of
any language will embody a great deal of auditory categorization. Like all
colloquial vocabularies, noise vocabularies are fuzzy at many points, and
they have plenty of gaps. For the purposes of a technical vocabulary, the
fuzzy meanings need straightening out, and the gaps need filling. This is
done by laying down precise technical meanings and coining new terms ad
hoc. All this is standard procedure in the creation of technical vocabularies.
As one dimension of auditory space, let us consider noise words with
different time characteristics:
(i) INSTANTANEOUS NOISE: bang, burst, click, crack, pop, snap, tap, thud,
thunk.
(ii) BRIEF NOISE: beep, creak, peep, squeak, swish, zoom.
(iii) CONTINUOUS NOISE: buzz, drone, hiss, hum, hush, rasp, rustle.
(iv) FLUTTERY NOISE: 6 clatter, gurgle, patter, pocketa-pocketa, rattle,
rumble, sizzle, sputter.
The words of the first class refer to SINGLE, MOMENTARY noises, such as
the crack of a whip, the pop of a cork, the thud of a book falling on a
carpeted floor, the thunk of a car door slamming shut.
The words of the second class refer to SINGLE, BRIEF noises. They are brief
but not so brief as to be momentary-such as the beep of a radio
transmitter, the creak of a wooden floor, the swish of a scythe cutting the
air, the zoom of a jet plane passing us overhead. If the swish of a whip is too
short to be anything but momentary, it is no longer called a swish, but a
crack.
The words of the third class mean EVEN, CONTINUOUS noise. It is
5 This has been admitted even by some M.I.T. phoneticians, cf. C. M. Bush: "The acoustic
specification as presently detailed in the distinctive feature analysis is either inadequate or
inappropriate for the four English fricatives under investigation [i.e. /f v θ ð/]." (Phonetic
Variation and Acoustic Distinctive Features, The Hague: 1964, Janua linguarum series practica
12, p. 136). It has been known for a long time to readers of E. Zwirner, M. Joos and H. M.
Truby. Mario Rossi has highlighted the inbuilt contradiction of a distinctive feature theory
which claims to be adequate both to the abstract linguistic units (phonemes) and the acoustic
observation. He advocates distinctive features on the perceptual level in the psychoacoustic
sense-"Les faits acoustiques," La linguistique 13 (1977), 63-82.
6 This term is taken from the technical vocabulary of flute playing where it is a loan
translation of German Flatterzunge.


homogeneous and of indefinite duration. The buzz of a bluebottle may be
heard briefly or for a long time; the swish of a scythe is necessarily brief.
Otherwise it would no longer be called swish but perhaps whistle-like the
whistle of an approaching artillery shell.
The words of the fourth class mean UNEVEN, CONTINUOUS noise. It swells
up and down, like the fluttery tone of a flute. The pattering of feet on the
ground consists of many individual "patters"-sit venia verbo-but no
individual "patter" is called this, it is called a thump or tap (or by some
other instantaneous-noise word).
These four classes of noise words are, in relation to each other,
ANTONYMOUS and EXHAUSTIVE. They are antonymous in the sense that each
class differs in meaning from the three other classes. True, any one noise
word of the colloquial vocabulary may have multiple meaning-witness
the swish with its current slang meaning of "homosexual". If this bothers
us, we can make it unambiguous by allowing only one of these meanings in
the technical vocabulary.
The four classes of noise words are EXHAUSTIVE in the sense that every
noise word of English (hopefully) belongs to at least one of them. They thus
do for us what we have asked auditory phonetics to do. They are clear, and
they are comprehensive (as far as time characteristics are concerned). They
apply to the noise classes of phonetics fairly smoothly.
The single instantaneous noises are our old friends the stops. Within this
class the BURSTS (explosives), SNAPS (implosives), FLAPS and DOUBLE STOPS
form well known subclasses. A less well known subclass has friction noise
between the onset (snap) and release (burst), as the diaphones of /þ/ and /ð/
used in Ireland and Newfoundland. The single brief noises are the glides.
The even, continuous noises are the continuants, the fluttery noises are the
trills.
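The correspondence just described can be written down as a small lookup table. The following is only a minimal sketch in Python: the word lists are those of classes (i) to (iv) above, while the names TimeClass, NOISE_WORDS, PHONETIC_CLASS, time_class_of and phonetic_class_of are hypothetical labels introduced for illustration and are not part of the proposed terminology.

```python
from enum import Enum


class TimeClass(Enum):
    """The four time classes of noise words (hypothetical identifiers)."""
    INSTANTANEOUS = "single, momentary noise"
    BRIEF = "single, brief noise"
    CONTINUOUS = "even, continuous noise"
    FLUTTERY = "uneven, continuous noise"


# Word lists copied from classes (i)-(iv) in the text.
NOISE_WORDS = {
    TimeClass.INSTANTANEOUS: ["bang", "burst", "click", "crack", "pop",
                              "snap", "tap", "thud", "thunk"],
    TimeClass.BRIEF: ["beep", "creak", "peep", "squeak", "swish", "zoom"],
    TimeClass.CONTINUOUS: ["buzz", "drone", "hiss", "hum", "hush", "rasp",
                           "rustle"],
    TimeClass.FLUTTERY: ["clatter", "gurgle", "patter", "pocketa-pocketa",
                         "rattle", "rumble", "sizzle", "sputter"],
}

# Phonetic noise class that the text assigns to each time class.
PHONETIC_CLASS = {
    TimeClass.INSTANTANEOUS: "stop",
    TimeClass.BRIEF: "glide",
    TimeClass.CONTINUOUS: "continuant",
    TimeClass.FLUTTERY: "trill",
}


def time_class_of(word):
    """Return the time class listing `word`, or None if it is not listed."""
    for time_class, words in NOISE_WORDS.items():
        if word in words:
            return time_class
    return None


def phonetic_class_of(word):
    """Map a listed noise word to its phonetic class via its time class."""
    time_class = time_class_of(word)
    return PHONETIC_CLASS[time_class] if time_class is not None else None
```

In this sketch each technical term sits in exactly one class, which mirrors the requirement that the classes be antonymous; a word outside the lists simply yields None, since the table covers only the examples quoted here.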
Is this more than just a set of new terms? Does auditory phonetics teach
us anything, in other words, that is not handled at least equally well by
acoustic and articulatory phonetics? Yes, it does, in that the auditory
categories cut right across the articulatory categories in quite a few
instances. Consider the phonemes of English. Auditorily, not only are the
semi-vowels /h y w/ glides, but also the so-called "lax-vowels", the
retroflex vowels and the "vocalized" diaphone of /l/, as widely heard in
American English. All these are glides in the sense specified above. They are
of more than momentary duration, but not of indefinite duration.
For a quick cross-check try to draw out a retroflex vowel indefinitely. It soon
no longer sounds "r-colored"-even though it may still be retroflex in terms of
tongue-position. Try the same with a "vocalized /l/". Or draw out a lax vowel
indefinitely-say the stressed /ɪ/ of the German word bitter. It soon sounds like
the tense vowel /e/ of Beter-whatever its articulatory laxness may be. 7
The auditory specifications provide, at the same time, the phonetic
motivation why "lax" vowels must always be followed by a final consonant
(in the stressed syllables of those Germanic languages in which lax and
tense vowels contrast). As they are glides, they are not of indefinite
duration and so cannot occupy by themselves the nucleus of the intonation
contour (which demands lengthening).
Thus auditory phonetics, far from being a mere terminological in-
novation, enables us to solve at one stroke two riddles of long standing
(which have proven recalcitrant to even the most advanced methods of
acoustic and physiological investigation), the phonetic specification of the
tense-lax distinction, and the phonetic link between the lax vowels and
the obligatory presence of final consonants. Why should these answers
be found in the auditory (more readily than in the acoustic or articula-
tory) domain? Because the tense-lax distinction functions in communica-
tion not by virtue of its acoustic properties, but by virtue of the way we
hear it.
The two major auditory dimensions other than time are RESONANCE and
TIMBRE. Timbre is, in the colloquial vocabulary of English, described in
terms of the antonymous pairs bright-dark, dull-clear, thin-full, soft-hard.
These parameters apply, inter alia, to voice quality, a field which
articulatory and acoustic phonetics have not clarified so far. Changes of
voice quality can be HEARD in connection with certain pitch patterns. For
instance, a change from hard to soft voice is associated, in English, with the
extra-high pitch on a concave-fall nucleus. I cannot, at this point, go into
further detail. 8

II. THE AUDITORY TEXT


The crosscheck on the validity of the auditory network is provided by the
well known confusion matrices. The lax vowels of English, for instance, are
more readily confused with each other than they are with the tense vowels. 9
The confusion matrices are customarily taken to indicate that the listener is
in error. A given lax vowel (say /i/, as in pit) was "factually present", but the
listener "mistook it" for another lax vowel (say /e/, as in pet). Now could it
not be that it wasn't the listener who was mistaken, but the investigator?

7 Witness the experiments conducted by Eli Fischer-Jørgensen, op. cit. (fn. 2 above),
pp. 153-176.
8 The auditory network is spelled out in depth, and applied to English, in my Manual of
English Phonetics, ch. 7 (Munich, Fink, in press).
9 See Richard C. Berry, "A three-feature system for English vowels," Proc. Seventh Int.
Cong. Phonetic Sciences, ed. A. Rigault and R. Charbonneau, The Hague 1972, pp. 452-459.
Not necessarily in the simple, complementary sense that the investigator
mistook for /i/ a "factual /e/", but that he was mistaken in his belief that
there IS such a thing as a "factual /i/" versus a "factual /e/"? Things like /i/
and /e/ have, after all, been abstracted from a continuously changing
soundflow. What is so "factual" about them as to leave no room for
legitimate disagreement? Sure, when a properly trained listener listens to a
properly trained speaker producing a minimal pair like pit-pet, no
uncertainty is to be expected. But this is a highly specialized laboratory
situation. I do not think we can safely generalize on this situation, assuming
that ordinary communication also works with unambiguous /i/'s and /e/'s
all the time. In fact, all of us probably have experienced the ambiguity
involved in transcribing some spontaneous conversation recorded on tape.
Let us call such a tape an AUDITORY TEXT 10-in contrast with the TEST
UTTERANCE produced ad hoc. In an auditory text, it is by no means always
easy to determine which phoneme is "factually there". The reason is, I
submit, that the different phonemes of a given language do not factually
sound different from each other, in auditory texts, as neatly as we would
like to believe. On the contrary, there is a great deal of PHONEMIC
INDETERMINACY. Sometimes this can be resolved by the context. For
instance, I hear someone talk about "the Publishing House of Macmillan"
with the ʃ of publishing sounding much more like a [kʰ]. But of course I
know the word is publishing house, not publicking house, so I conclude the
speaker did not pronounce properly. But do speakers, ordinarily, pro-
nounce "properly"? True, the speaker concerned could probably be
induced to pronounce the minimal pair finishing-finicking. Does he carry
over this distinction from the test situation into ordinary communication?
Obviously he does not. The fact is that some ʃ-phonemes sound so much
like k that the two cannot be distinguished.
Some instances of phonemic indeterminacy cannot be resolved even
through prior morphemic interpretation. I hear an utterance which is,
presumably, either and you wouldn't feel confident or I just wouldn't feel
confident with the "main stress" on wouldn't. The proclitic section preceding
wouldn't sounds like [aiǰi]. If this represents and you, then [i] is the
denasalized vocoid allophone of the phoneme /n/. In fact, both nasalized
and denasalized vocoids are heard, commonly enough, for the phoneme /n/
in English, as finally in government /'gʌvmə̃t ~ 'gʌvmət/. On the other hand
[ǰi] could, perhaps, be a proclitic allomorph of just, and this is more
10 This is E. Zwirner's Abhörtext, cf. his Grundfragen der Phonometrie, 2nd ed. rev.
Bibliotheca phonetica 3, Basel 1966, pp. 173-178; H. Bluhme in his English translation uses
phonetic text (Principles of Phonometrics, The University of Alabama Press 1970, pp. 130-135).
plausible in the present context, as it carries on a rhetorical anaphora.
Which phonemes are "factually present"? /ænǰi/? or /aiǰi(st)/? We cannot
answer this question uniquely. It is, I conclude, an improper question in
this instance. The transcripts can, in the nature of things, be no more than
rough approximations. How do people communicate with so much
indeterminacy? The fact is they do, and they couldn't care less. In the
present instances (both are from Canadian radio programs), no listener has
been puzzled except the phonetician. And even the phonetician was not
puzzled till he had done a good deal of re-listening to the tape.
Dialectologists often call on a native speaker to help them transcribe an
auditory text. Even though they know the phonemic system involved, they
are puzzled by many passages. 11 This has been explained by the native
speaker myth, attributing to the native speaker a sixth sense which enables
her or him to recognize phonemes which the non-native does not. The
rational explanation is, I submit, in the phonemic indeterminacy of the
auditory text. As the indeterminacy is phonemic, it cannot be resolved by
phonemic criteria, but it requires the skill of an editor who, by virtue of
cultural experience, knows not only how the community concerned usually
expresses itself, but what it usually talks about. The dialectologist, for all
his learning, cannot rival the cultural experience of the untutored native
speaker. This is also why automatic speech recognition is so difficult.
Though we may program into the computer all our acoustic knowledge, we
cannot yet do the same with the cultural knowledge on which the human
listener also relies for recognition.
III. PHONETIC HEARING
The assumption that there IS such a thing as "factual [e]" vs. "factual [i]"
presupposes a UNIQUE PARTITION of auditory space and, conversely, a
single, standard list of different "sounds". A crude version of this idea is
what we were taught in grade school: there ARE five (or six, or seven, or
eight ... ) vowels a, e, i, o, u, ... 12 The more sophisticated versions of the
standard list claim to be "universal phonetic alphabets". They differ from
the crude version in the number of vowels, but the basic idea remains the
same: the unique partition of auditory space-not in terms of just five or six
vowels, but (say) of twenty, or fifty or two hundred, or in terms of a single
list of distinctive features-twelve, fourteen, twenty, or suchlike. The
phonemic system (or sound pattern) of any particular language is then
11 As discussed by E. Zwirner, Grundfragen, pp. 169-173 (Phonometrics, pp. 133-135); F.
Hedblom, "Recording in Dialect Investigation in Sweden," Phonetica, 3 (1959), 95-108.
12 In fact, one of my (flunking) students at the University of Massachusetts argued thus the
other day. Noticing my disapproval, he explained: "The teacher told us so in High School."
taken to constitute a selection from the big standard list-in such a way
that certain specific "sounds" are selected from the list and all others
deleted: "In the beginning God created the heavens and the earth and the
International Phonetic Alphabet." 13
Now this is, I submit, a bad theory. Its justification is not empirical, but
cultural. It is extrapolated from the Latin alphabet, which our culture has
been using and adapting to new languages for many centuries. So the Latin
alphabet appears "natural" to us-either the crude version with five vowels
or one of the more sophisticated versions. 14 Other cultures, say the
Chinese, use, for their version of phonemic analysis, a different partition of
auditory space, one in which the question "How many vowels?" simply
does not apply. 15 The idea that phonemic systems constitute subsets from
some universal list of sounds is, I submit, irreconcilable with the empirical
evidence:
1. If every phonemic system were, in fact, a subset from a standard
"universal phonetic alphabet", then we should be able to learn how to
pronounce an unknown language from a sufficiently narrow transcription.
We all know from experience that this cannot be done. The learner can, in
13 Note Charles Hockett's caustic comment on "the assumption that there is a large, but
strictly finite, stock of distinct humanly possible articulations or speech sounds, from which
each language selects a small subset as its phonemes." As Hockett remarks, "In fact, of course,
the set of all humanly possible articulations forms a multidimensional continuum, from which
discretely contrasting ranges are quarried by quantization, and the ways in which this is done
in two different languages need show no congruence." (Amer. Speech, 47 [for 1972; belatedly
published 1975], 243n.) In the same vein, this writer "rejects transcriptions which allege some
particular topological structuring of phonetic space as universal" (Phonemtheorie, 3rd ed.
Basel 1974, p. xi).
14 This was first pointed out by H. Lüdtke: "Der Gedanke, daß man mit den modernen
Transkriptionssystemen die Gesamtheit lautlicher Variation aller Sprachen und Mundarten
erfassen könne, sofern man nur das Symbolrepertoire angemessen erweitert, ist letztlich eine
Extrapolation der Entwicklungsgeschichte der Lateinschrift" (Folia Linguistica, 5 [1972],
336). Conversely, alphabetic writing has been taken to confirm the validity of our transcripts:
"The segmentability of the speech chain into discrete and global phonemes is firmly
established by alphabetic writing systems" (O. S. Achmanova, Proceedings of the Seventh
International Congress of Phonetic Sciences, p. 170). I question not the segmentability as such
(see section (4) below), but the general validity of particular segmentations such as the
International Phonetic Alphabet.
15 A convenient summary statement of the Chinese way of phonemic analysis is offered by
Tung-Ho Tung, "Bipartite Division of Syllables in Chinese Phonology," Proc. Ninth Int. Cong.
of Linguists, p. 203. The reason for the different cultural modes of phonemic analysis does
not necessarily lie in differences of linguistic structure. Chinese can be analyzed with vowels, as
has been done by C. H. Hockett, "Peiping Phonology," J. Amer. Oriental Soc., 67 (1947),
253-267; "Peiping Morphonemics," Language, 26 (1950), 63-85. Conversely, some European
languages, such as Swedish and Norwegian, have pitch patterns that are distinctive on the
level of lexical meaning, but most dictionaries omit them.

fact, not get away with the selection from a known partition of auditory
space, but he must learn a new partition, a new AUDITORY CATEGORIZATION
with every new language he learns.
2. Conversely, if the universal phonetic alphabet did, in fact, apply to
every language, then phoneticians should be able to transcribe every
unknown language properly at the first attempt. The expert who knows "all
the sounds of the human species" must know every sound in any particular
subset. Experience shows that the phonetician can, at the first attempt,
achieve no more than a rough approximation, because she or he has not yet
learned the specific auditory categorization which is peculiar to the new
language. We categorize what we hear in terms of what we already know
(such as some phonetic alphabet). As the two categorizations are incom-
mensurate, we often find that the native speaker to whom we read our
transcripts does not understand them.
3. Certain aphasic patients can hear in the sense that they have normal
hearing under audiological examination, but they do not recognize
phonemes. 16 For instance, I had a Welsh patient who had lost all but the
most elementary pitch patterns of Welsh, yet he could sing. 17 What he had
lost was not the sense of hearing, but the auditory categorization of what he
heard in terms of the pitch patterns of Welsh.
It appears then that auditory perception is not simply a matter of hearing
something the way it "is," but it necessarily involves different auditory
categorizations. These auditory categorizations are, in principle, at the
discretion of the listener. It is the listener who chooses to treat a particular
bit of noise as just noise or as an auditory text and who decides whether it
can plausibly be treated as a text. Analytically speaking, this decision is
what constitutes phonemic analysis. The particular phonological model A,
B, C, or D in terms of which this analysis is couched-Chinese or
European-is a matter of detail. 18 For the purposes of auditory phonetics,
I propose to distinguish between three modes of auditory perception: 19
1. The AUDIOLOGICAL hearing which is a biological property of the
16 This is the syndrome known as PHONEMIC DEAFNESS (Lurija) or LAUTTAUBHEIT (Kleist).
17 See my account in "Aphasische Intonationsstörungen," Saggi, 2 (1976), 33-42; "Aphasia
in Welsh," Word, 28 (1976), 207-229.
18 Cf. H. Pilch, Phonemtheorie, 3rd ed., Basel: 1974, p. 157.
19 This is close to Bertil Malmberg's views, Introduktion till fonetiken som vetenskap,
Stockholm: 1969. Malmberg distinguishes between the same three modes of perception, but
he recognizes the third not as crucial to intelligibility, but only as a crosscheck on signals
already recognized phonemically: "Faktor 2 är avgörande för mottagarens reaktion. Om
denne brukat den riktiga koden, identifierar han ljudvågen som en utsaga på ett känt språk". I
have presented more extensive data in support of my alternative view in "La langue et la
compréhension: expérience d'un aphasique," La linguistique, 10 (1974), 79-90. Most
normal human ear as such, for instance the pitch ranges and loudness
thresholds within which our ears perceive noise, feel pain etc.
2. The PHONEMIC listening which is an acquired faculty of those people
who have learned a given language. It involves for example the recognition
not just of pitch, but of the pitch patterns of a given language. It is lost
under phonemic deafness.
3. The EDITORIAL understanding which involves hunches and guesses
and is best practised by people who have a great deal of cultural experience
with a given language. It is lost (or partially lost) in sensory aphasia.
IV. AUDITORY ANALYSIS
What, then, is the job of auditory phonetics? The answer I proposed
above (end of introduction) appears oversimplified from our present
vantage point. Let us revise it: Auditory phonetics should specify, in a
coherent manner, the partitions of auditory space imposed by different
phonemic systems. As I do not believe in "the natural partition" of
auditory space, I adapt to my purposes culturally pre-established par-
titions, such as noise vocabularies, alphabets, audiological test scales, etc.
This is my justification for drawing on the noise vocabulary of English in
section (i) above. The empirical motivation for adopting any particular
category is provided by phonemic experience with different languages. It
will determine whether or not we need auditory parameters to describe
(say) different voice qualities (see end of section (i) above). 20
Consider, as an example, the category of pitch. This is a pre-established
category familiar to musicians and audiologists. We use it, in order to
specify the pitch patterns of languages, even though pitch in languages
sounds remarkably different from musical pitch. Phonemic experience,
however, convinces us we do not need the musician's elaborate classifi-
cation of pitches in terms of the diapason. 21 All we need is a simple high-
phoneticians appear to attach even less importance to the editorial mode of perception than
does Malmberg. They assume a "mini-interpreter" or "precategorical auditory store" which
recognizes messages by a "matching procedure", comparing the input signal, on a purely
phonetic basis, with the phonemic units somewhere in the brain.
J. C. Lafon distinguishes between the audiological mode (audition) and the phonemic mode
(integration) of perception (Message et phonétique, Paris: 1966).
20 Cf. D. Fry: "there seems to be no record of a language in which voice quality differences
operate independently as a prosodic feature." (Manual of Phonetics, 2nd ed., ed. B.
Malmberg, Amsterdam: 1968, p. 370.)
21 It is true there have been linguistic investigations using the diapason framework, such as
J. E. Buning and C. H. van Schooneveld, The Sentence Intonation of Contemporary Standard
Russian as a Linguistic Structure, The Hague: 1961. Such investigations are unrealistic in the
sense that they contain more detail than can be distinguished by ear in terms of the recurrent
differences (see below).

low dichotomy, or at most four or five different (relative) pitches. When a
pitch changes, it either rises or falls, or it drifts indeterminately. Thus we
have the rising, falling and level pitches. Phonemic experience suggests a
further subdivision by the speed of the pitch movement, either accelerating
or slowing down. Thus we have (besides the level pitch) the accelerating fall,
the accelerating rise, the decelerating fall, and the decelerating rise. This
auditory specification is motivated by the pitch patterns of languages such
as English, French, Welsh and German. 22
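The five-way pitch classification can be illustrated with a short sketch. It is not a procedure given in the text: it assumes, purely for illustration, that a pitch movement is available as a list of equally spaced numerical samples (say in Hz), that "level" means a net change within a chosen tolerance, and that "accelerating" means the later part of the movement changes faster than the earlier part. The function name classify_pitch_movement and the parameter level_tolerance are hypothetical.

```python
def classify_pitch_movement(samples, level_tolerance=1.0):
    """Label a pitch movement as level, or as an accelerating or
    decelerating rise or fall.

    `samples` holds pitch values at equal time steps (assumed input format).
    `level_tolerance` is an assumed threshold: a net change no larger than
    this counts as level (the pitch merely drifts).
    """
    if len(samples) < 3:
        raise ValueError("need at least three pitch samples")

    net_change = samples[-1] - samples[0]
    if abs(net_change) <= level_tolerance:
        return "level"

    direction = "rise" if net_change > 0 else "fall"

    # Compare the average rate of change before and after the midpoint.
    mid = len(samples) // 2
    early_rate = abs(samples[mid] - samples[0]) / mid
    late_rate = abs(samples[-1] - samples[mid]) / (len(samples) - 1 - mid)

    if late_rate > early_rate:
        return "accelerating " + direction
    if late_rate < early_rate:
        return "decelerating " + direction
    # A perfectly steady movement has no counterpart among the five
    # categories; the sketch returns the bare direction in that case.
    return direction


print(classify_pitch_movement([200, 195, 180, 150]))  # accelerating fall
print(classify_pitch_movement([100, 130, 145, 150]))  # decelerating rise
print(classify_pitch_movement([120, 121, 120]))       # level
```

The thresholds are arbitrary. In the argument above the categories are established by ear from phonemic experience, so a numerical rule of this kind can at best approximate them.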
Working with pre-established categories and modifying them to suit our needs is
what we do in articulatory and acoustic phonetics, too. For instance, the
acoustic measure of gravity as
[formula in the formant frequencies F1, F2, F3; garbled in this copy]
is surely motivated not by some natural partition of acoustic space, but by
phonemic experience. 23

Auditory specifications are valid if they characterize at least the phonemic
distinctions within the language concerned. They usually go beyond the
level of distinctive features, but they stop short of a full specification of our
auditory impressions. This restraint is necessary in order to insure both the
coherence of our framework and the typological classifiability of languages
on the basis of their auditory properties. Finnish and Lettish both have, for
instance, the same inventory of five pitch patterns: [1] accelerating fall, [2]
accelerating rise, [3] rise-fall, [4] fall-rise, and [5] two levels with pitch break
between them. 24 Yet, Finnish and Lettish SOUND remarkably different.
This, however, does not detract from the validity of the specification.
The impressionistic incompleteness of the specification implies, con-
versely, that a given auditory specification does not necessarily enable us to
recognize the phonemic units concerned (unless we know them before-
hand). Recently, this author struggled to learn the tones of Mandarin
Chinese. They are auditorily specified as [1] level, [2] rise, [3] fall-rise, [4]
fall. 25 But to recognize them in an auditory text by just this specification is

22 On French cf. F. Carton, Introduction à la phonétique du français, Paris 1974. On
English cf. fn. 8 above; on Welsh cf. my article "Advanced Welsh Phonemics," Zeitschrift für
celtische Philologie, 34 (1975), pp. 60-102; on German cf. my article "Baseldeutsche
Phonologie auf Grundlage der Intonation," Phonetica, 34 (1977), 165-190.
23 G. Fant, Den akustiska fonetikens grunder, Stockholm: 1957.
24 The inventory of Lettish has been described by L. K. Ceplitis, Analiz rečevoj intonaciji,
Riga: 1974, p. 113; the inventory of Finnish by A. Sovijärvi, Olli-Matti Ronimuksen eräistä
runoista laadittuja lausunto-analyysi-harjoituksia (Fonetiikan laitos, Helsinki: 1963).
25 For an acoustic specification see J. M. Howie, Acoustical Studies of Mandarin Vowels
and Tones, Cambridge and New York 1976.
AUDITORY PHONETICS 159

hopeless. The pitches keep drifting in all directions. Still, the specification is
good. How does one ever learn to recognize these tones? More generally,
how can we ever analyze a phonemic system which we do not know
already?
The way to tackle the job is, first, to forget about the auditory
specification. What we do instead is, in principle, listen to several sequences
of at least two tones (the segmentation presents no particular difficulty
most of the time) and decide not whether they rise or fall, but whether they
sound the same or different. This is the most elementary phonetic
judgment (as we saw above). We do, of course, hear subtle differences
everywhere. We may thus be tempted never to judge any two given tones
(or other phonetic events) to be the same. If we fall to this temptation, we
will fail to arrive at any partition of the auditory space, that is, fail at
phonemic analysis. The way to escape this pitfall is to take into account
only such differences as we can hear again and again consistently (not just
once or twice or any limited number of times). Thus we listen ultimately not
just for given phonetic events, but for recurrent auditory differences
between classes of phonetic events. In this way one eventually learns to
recognize the tones of Chinese, or the pitch patterns of intonational
languages 26 and even phonetic parameters other than pitch. This is at least
one way to learn, by trial and error, those "new partitions of auditory
space" which characterize unknown languages. In every case it is essential
not to project one's auditory impressions onto a "general phonetics chart",
as was done by the old "Ohrenphonetik," but to compare them with each
other, listening for recurrent differences. The specification of these
differences in terms of auditory parameters is a subsequent step. First
recognize them, then characterize them. It is after we recognize the set of
four tones that we can apply to them such parameters as level, rise, fall-rise
and fall, and we can then argue whether or not some fall-rises might not be
better described as rise-falls, low-levels, and so on.
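The listening procedure just described can be sketched as a small program, under strong simplifying assumptions. Each listening trial is reduced to a recorded same/different judgment on a pair of tokens, and "again and again consistently" is approximated by a fixed number of trials, although the text explicitly rejects any fixed limit. The function name, the token labels, and the `min_trials` cut-off are all hypothetical.

```python
from collections import defaultdict


def partition_by_recurrent_difference(tokens, judgments, min_trials=3):
    """Group tokens into classes from repeated same/different judgments.

    judgments: iterable of (token_a, token_b, heard_different) tuples, one
    per listening trial; both tokens must occur in `tokens`.
    A pair counts as different only if it was heard as different on every
    one of at least `min_trials` trials (a hypothetical cut-off). Every
    other judged pair is treated as same. Pairs never judged stay apart.
    """
    trials = defaultdict(list)
    for a, b, heard_different in judgments:
        trials[frozenset((a, b))].append(heard_different)

    # Union-find: tokens judged "same" end up under one representative.
    parent = {t: t for t in tokens}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for pair, outcomes in trials.items():
        recurrent = len(outcomes) >= min_trials and all(outcomes)
        if not recurrent and len(pair) == 2:
            a, b = pair
            parent[find(a)] = find(b)

    classes = defaultdict(set)
    for t in tokens:
        classes[find(t)].add(t)
    return sorted(sorted(c) for c in classes.values())


judgments = (
    [("a", "b", False)] * 3                                       # never heard as different
    + [("c", "d", True), ("c", "d", False), ("c", "d", False)]    # heard as different once only
    + [("a", "c", True)] * 3                                      # recurrent difference
    + [("b", "d", True)] * 3                                      # recurrent difference
)
print(partition_by_recurrent_difference(["a", "b", "c", "d"], judgments))
# [['a', 'b'], ['c', 'd']]
```

The sketch yields classes without naming them, which matches the order argued for above: recognition first, characterization in terms of level, rise, fall-rise and fall afterwards. It merges classes transitively and does not detect conflicting judgments, which a real analysis would have to resolve by further listening.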
This procedure reflects the two basic assumptions of phonetics, 27 namely
that phonetic events are (i) distinguishable, and (ii) classifiable.
Both assumptions imply the discrete character of phonetic elements. If they
were not distinguishable, we would be unable to recognize them. If they were
not classifiable, we would be unable to classify them as same or different. These
assumptions constitute the essential difference between audiology and audi-
tory phonetics, acoustics and acoustic phonetics, physiology and articulatory
phonetics. Without these assumptions (or their equivalents) no phonetic work
would be possible; we would go on forever being baffled by the infinitely
variable soundflow.

26 Lexical and semantic criteria are, of course, helpful in the case of tones, but not in the case
of intonations which, by definition, do not involve lexical and semantic differences; see my
article "Intonation in Discourse Analysis," Phonetica, 34 (1977), 81-92.
27 As formulated by E. Zwirner, Grundfragen, p. 111.

Has anybody ever successfully proceeded the other way, the way that is
laid down in many of our textbooks: first fully analyze the soundflow in
terms of a very narrow transcription, then single out the distinctive
transcription signs and throw away the redundant ones? Has everybody
not, in fact, always used so-called shortcuts, meaning less unrealistic
procedures than those of the textbook? Let us now recognize those
"shortcuts" for what they are and accord them their proper epistemologi-
cal status. First of all let us recognize the primacy of auditory phonetics as
our most convenient and, indeed, inevitable gateway to phonetic
investigation.

Albert-Ludwigs-Universität
Freiburg i.Br. and
University of Massachusetts at Amherst
