In my 2007 article in Birding magazine, I distinguished three methods of describing a bird sound in words: transliteration, analogy, and analytic description. But the article was so focused on promoting the third of those methods that it gave the other two short shrift. I was particularly scathing in my condemnation of phonetic transcription, bemoaning its “limited capacity to carry information,” with “little or no regard to pitch, tone quality, variation, or any other crucial components of birdsong.”
However, the more I study phonetic transcriptions, the more I become convinced that they tend to be more informative than I thought. In fact, even though the people writing the transcriptions may be completely unaware of it, their choice of vowels almost always follows a consistent set of rules for indicating the pitch and inflection of the bird sound. Consonants are another story, but the consistency of the vowel rules is extraordinary once you learn to see it.
The first principle is this: different vowels represent different pitches in bird sounds. The following list arranges common (American) English vowels — plus the semivowels r, w, and y — from highest transcription pitch (at the top of the list) to lowest transcription pitch (at the bottom of the list).
- ee and y (as in feet and yes)
- ih (as in pit)
- eh (as in pet)
- ah (as in father)
- oh (as in pole)
- er and r (as in herd and rip)
- oo and w (as in boot and win)
The lowest-pitched bird sounds in North America are those of doves and owls — and it’s common knowledge that they coo and hoot, respectively. Meanwhile, if I tell you to listen for a “seet” or a “peep,” you’ll expect something high-pitched. Only medium-pitched sounds get filled in with other vowels, like the nasal notes from this Cooper’s Hawk, which Sibley’s guide describes as “pek-pek” notes and the old Audubon Society guides transliterate as “cack-cack-cack” (using vowels from the middle of the chart):
But what we’ve just seen isn’t the cool part. The cool part is how people use diphthongs — that is, vowel combinations — to consistently, systematically represent changes in the pitch of bird sounds. The correspondence between vowels and bird sounds isn’t random — it’s based on the acoustic properties of the human voice and the way it produces vowels and semivowels. The rules are simple:
- Monotone sounds are transliterated by single vowels, never diphthongs: for example, “tseet,” “peep,” “hoo.”
- Upslurred sounds are transliterated by two consecutive vowels, the first one lower on the chart, the second one higher. For example: w + ee = “whee.”
- Downslurred sounds are transliterated by two consecutive vowels, the second one lower on the chart than the first. For example: ee + r = “eer.”
- Overslurred sounds are transliterated by three consecutive vowels, the middle one highest on the chart. For example: w + ee + oo = “wheeoo.”
- Underslurred sounds are transliterated by three consecutive vowels, the middle one lowest on the chart. For example: ee + oo + ee = “eeyoowee,” where the y and w glides sit at the same chart positions as the ee and oo beside them.
Believe it or not, despite the huge variety of ways to transcribe bird sounds, most transcriptions follow these principles — and for good reason, as we shall see.
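For readers who like to see rules written out mechanically, here is a minimal sketch of the chart and the five slur rules in Python. It is only an illustration: the names `CHART`, `RANK`, and `contour` are my own inventions, and the function assumes the transcription has already been split by hand into symbols from the chart (it makes no attempt to parse English spelling, so “wheep” has to be supplied as w + ee). Anything the rules above do not cover comes back as "unclassified".

```python
# Vowel chart from the list above, highest transcription pitch first.
# Each semivowel shares a row with its paired vowel (y/ee, r/er, w/oo).
CHART = [
    ("ee", "y"),
    ("ih",),
    ("eh",),
    ("ah",),
    ("oh",),
    ("er", "r"),
    ("oo", "w"),
]

# Map each symbol to a rank; a larger number means a higher transcription pitch.
RANK = {
    symbol: len(CHART) - row
    for row, symbols in enumerate(CHART)
    for symbol in symbols
}


def contour(vowels):
    """Classify a hand-segmented vowel sequence using the slur rules.

    `vowels` is a list of chart symbols, e.g. ["w", "ee"] for "whee".
    This is a sketch under the assumptions stated in the text, not a parser.
    """
    if not vowels:
        raise ValueError("need at least one vowel symbol")
    ranks = [RANK[v] for v in vowels]

    # Collapse neighbours with the same rank, so a glide next to its
    # paired vowel (such as y next to ee) counts as a single step.
    steps = [ranks[0]]
    for r in ranks[1:]:
        if r != steps[-1]:
            steps.append(r)

    if len(steps) == 1:
        return "monotone"
    if len(steps) == 2:
        return "upslur" if steps[1] > steps[0] else "downslur"
    if len(steps) == 3:
        if steps[1] > steps[0] and steps[1] > steps[2]:
            return "overslur"
        if steps[1] < steps[0] and steps[1] < steps[2]:
            return "underslur"
    return "unclassified"


print(contour(["ee"]))              # "peep"
print(contour(["w", "ee"]))         # "whee"
print(contour(["ee", "r"]))         # "eer"
print(contour(["w", "ee", "oo"]))   # "wheeoo"
print(contour(["ee", "oo", "ee"]))  # "eeyoowee"
```

The only design decision worth noting is the collapsing step: because y, r, and w share chart rows with ee, er, and oo, a spelling like “eeyoowee” reduces to the same three steps as ee + oo + ee.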
The commonest call of the Great Crested Flycatcher is a classic upslur, and like all upslurred sounds, it rises on the spectrogram from lower left to upper right:
The Sibley, National Geographic, and Audubon field guides all transliterate this sound as “wheep,” which starts with an “oo” sound (w) and changes to an “ee” sound, thus spanning the entire range of the pitch table from bottom to top. This oo + ee combination, like the closely related oo + ih, virtually always denotes an upslurred sound, as in the “whit” calls of Empidonax flycatchers, the “squeet” call of Sprague’s Pipit, and the “kwit” flight call of Type 4 Red Crossbill. A look at the human voice on the spectrogram shows the reason:
Whenever the human voice pronounces a diphthong that begins low on the chart and ends high, the resulting spectrogram shows a distinct dark band that runs from lower left to upper right — in other words, an upslur. However, this dark band does not represent the pitch of the person’s voice — note that the darkness moves independently of the harmonics (the underlying horizontal lines), which change with the voice’s pitch. The dark bands are called formants, and they differentiate vowel sounds. (For a detailed explanation of spectrograms as they relate to the human voice, see this page.)
Like all downslurred sounds, the call of the Olive Warbler traces from upper left to lower right on the spectrogram:
Sibley transliterates this “teew” or “tewp,” National Geographic “phew,” and Audubon “kew.” In all cases, the vowel combination is ee + oo = “ew,” as it is in Sibley’s transcriptions of the American Robin’s “tseeew” alarm call and the “kewp” flight call of Type 2 Red Crossbill. Another classic vowel combination for downslurred sounds is ee + r = “eer,” as in the “pdeeer” call of Say’s Phoebe, the “veer” call of Veery, and the “cheer” call of Carolina Wren. As you might expect, spectrograms of the human voice saying “eer,” “eew,” and “yo” show downward-sweeping formants:
The burry overslurred call of the Couch’s Kingbird can be seen three times on the spectrogram below, mixed with shorter calls:
Sibley transliterates this as “kweeeerz” and Audubon as “queer”; in both cases, the vowel combination is oo + ee + er = “weer.” National Geographic takes almost the same route, with er + ee + er = “breeeer.” Similar patterns occur in Sibley’s “hweeeeeew” for Dusky-capped Flycatcher and “urrREEErrr” for Common Pauraque. And, of course, a similar pattern can be seen in spectrograms of the human voice pronouncing these combinations:
The possibility of standardization
So far, I have merely tried to describe the way people do describe bird sounds, not the way that they should do it. However, it may be possible to go one step further and create a standardized system in which transcriptions communicate the basic properties of a sound unambiguously to any audience, and in which two people hearing the same sound transcribe it the same way. Much more work would need to be done, particularly on consonants, but it might be well worth doing. I expect to explore some of the possibilities in future posts.