Bird sounds vary on many levels. Naturally, they vary from one species to the next. But they can also vary within a species — say, between birds from different geographic regions. And they can also vary within a geographic region — say, between one individual bird and the next. And they can also vary within an individual bird — say, if that individual happens to have a repertoire of multiple songtypes. And they can even vary within one songtype of one individual bird — say, if the bird fails to reproduce that same songtype exactly each time it sings.
When talking about variation, it would be nice to be able to distinguish exactly which kind of variation we mean. I propose a vocabulary like the following.
Differences between individuals: variable vs. uniform
I’ve started using the word variable to describe sounds that show high levels of individual variation, and uniform to describe sounds with low levels of individual variation. In other words, variable sounds differ from one bird to the next; uniform sounds are the same in all members of a species.
Variable sounds (differing between individuals)
A good example of a variable sound is the mewing call of the Spotted Towhee. This sound takes a great many forms across the range of the species. Even in the same location, the calls of different individuals may differ sharply.
Uniform sounds (the same between individuals)
The song of the Chuck-will’s-widow is highly uniform: that is, it varies little across the entire geographic range of the species. If you’ve heard one Chuck-will’s-widow, you’ve pretty much heard ’em all.
Differences within individuals: plastic vs. stereotyped
Plastic sounds differ slightly each time they are produced, even by the same bird. Stereotyped sounds are always the same when made by the same bird.
Plastic songs are typical of young birds, but some sounds of certain species remain plastic into adulthood, like the “Vreet” call of the House Finch. In adult birds, plasticity may vary with situation and season. Highly stereotyped versions of song tend to be associated with courtship. Plastic versions may be heard in winter, when levels of breeding hormones are lower.
Plastic sounds (differing within individuals)
Notice how these calls are almost never the same twice:
Stereotyped sounds (the same within individuals)
Most bird songs are stereotyped, at least during the breeding season: each time the individual repeats a given songtype, it produces a precise copy of the previous rendition. Note that sounds can be variable and still stereotyped; Eastern Meadowlark songs vary greatly across their range, and even within individuals (in the sense that each individual has multiple songtypes). But consecutive renditions of the same songtype tend to be identical down to the smallest details — the songtype never changes when the same bird repeats it.
The four vocabulary words I’ve defined here (variable, uniform, plastic, stereotyped) aren’t quite sufficient to describe all the different types of variation. Things get complicated when you start talking about repertoires of songtypes, truncations of stereotyped songs, or mosaics of geographic variation of different song features. But maybe this is a start. I’ll be happy to hear what people think.
Nothing has created more confusion about how to describe sounds than tone quality.
Tone quality is the distinctive voice of a sound — the thing that allows you to tell the difference between a violin and a trumpet when they’re both playing the same note. It comes in very handy when identifying birds by sound, but people have tended to differ in their notions of how to describe it. Today, we’re going to break sounds down into just seven basic qualities, which in combination make up the huge variety of sounds that birds can create.
And here they are:
Whistled sounds
Whistles are the most basic and common type of bird sound. They appear on the spectrogram as simple nonvertical lines. Non-bird sounds with a whistled tone quality include typical human whistling, and the sounds of flutes and piccolos. Bird examples are plentiful:
Hooting sounds
Hoots and coos are just low-pitched whistles, less than 1 kHz in frequency, that appear at the very bottom of the spectrogram. They resemble the sound made by blowing across the top of an open bottle, and they are typical of the voices of doves and large owls.
Clicking sounds
Instantaneous bursts of noise sound like clicks, pops, or taps, and appear on the spectrogram as vertical lines. The ticking of a clock, the drumming of a woodpecker’s bill against a tree, the bill snap of an angry flycatcher, and the ticking song of Yellow Rail all fall into this category.
Burry and buzzy sounds
When a whistle rises and falls very rapidly in pitch, it forms a squiggly line on the spectrogram and sounds trilled, like a referee whistle. If the squiggles are tall and fast enough, they sound less musical, more like an electric buzzer. What all burrs and buzzes have in common is the presence of very rapid repeated elements, resulting in audible “beats”. This makes them very similar to (and in some cases indistinguishable from) trills.
The beats in burry and buzzy sounds are often so rapid that they are not individually visible on the spectrogram. The result is a well-defined shape on the spectrogram that is vertically thicker than the thin line of a whistle. Such sounds often have a hoarse, grating quality to the ear.
Noisy sounds
Noisy sounds contain noise: random sound at many frequencies at once, which looks like television static on the spectrogram and sounds like static to the ear. Unlike buzzes, noisy sounds tend to have faded, blurry edges on the spectrogram, and they often stretch almost all the way from the bottom of it to the top.
Non-bird sources of noise include rushing streams and waterfalls, and the English speech sounds “s” and “sh”. Noisy bird sounds tend to be described as “rough” or “harsh,” like the alarm chatters of wrens and the hissing of angry swans and geese.
Nasal sounds
Many bird sounds are actually combinations of multiple simultaneous whistles on different pitches that the human brain typically perceives as a single sound (because of the mathematical relationship between the frequencies of the different whistles). This is characteristic of the sounds we identify as having a nasal tone quality. The individual whistles are called partials. Non-bird examples include police sirens, the whine of mosquito wings, and the sounds of oboes and violins.
Polyphonic sounds
Many birds can produce two separate sounds simultaneously, one from each side of the syrinx (the avian vocal organ). When birds use this ability, the two original sounds blend into one polyphonic sound. Polyphonic sounds are diverse, encompassing a number of different tone qualities, but with practice, they can be consistently distinguished from all other types of sounds by ear.
On the spectrogram, polyphonic sounds may look like nasal sounds, with stacks of partials, but if the spectrogram is high enough in quality, they can usually be distinguished by having partials that are dissimilar in shape, irregularly spaced, or simultaneously rising and falling.
The quality of most polyphonic sounds is either distinctively metallic or distinctively whiny.
metallic: If the polyphonic notes are very brief or contain monotone segments, they tend to sound metallic, like certain versions of the Hooded Oriole call, some versions of the “squeaky gate hinge” songs of Brewer’s Blackbird and Common Grackle, and the shimmering melodies of thrushes like the Veery.
whiny: If the polyphonic notes do not contain any monotone segments, they tend to have a whiny quality, like the Pine Siskin and Blue-gray Gnatcatcher calls, as well as the common calls of House Finch and the flight calls of meadowlarks.
Obviously, these seven tone qualities are very broad categories. Some of them grade into one another, and some of them occur in combination — e.g., a note may be simultaneously burry, noisy, and nasal. But this is the basic vocabulary we’ll use to start discussing the qualities of sounds. More to come!
Changes in Speed and Pitch, and Multi-noted Series
One of the basic questions we ask of any bird sound is, “Are the notes slow enough to count, or too fast to count?” Sometimes, the answer is both.
Some bird sounds change in speed. If the elements in a series are more closely spaced on the spectrogram as you move from left to right, then they are growing more closely spaced in time, which means that the series accelerates. If the elements grow farther apart, the series decelerates.
Here are a couple of examples. The song of the Wrentit is a series of notes that accelerates into a trill, while the drum of a Yellow-bellied Sapsucker starts as a trill of tapping notes, and slows into a series.
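If you have measured where each note starts (say, by reading onset times off a spectrogram), the same judgment can be made with a little arithmetic: shrinking gaps mean acceleration, growing gaps mean deceleration. The sketch below is only an illustration. The function name, the choice to compare just the first and last gaps, and the 10% tolerance are all assumptions made for the example, not an established method.

```python
def speed_trend(onsets, tolerance=0.1):
    """Classify a series as accelerating, decelerating, or steady.

    onsets: note start times in seconds, in increasing order.
    tolerance: assumed fractional change below which the speed
               is treated as steady (an arbitrary choice).
    """
    if len(onsets) < 3:
        raise ValueError("need at least three notes to compare two gaps")

    # Time between each note and the next one.
    gaps = [later - earlier for earlier, later in zip(onsets, onsets[1:])]

    # Crude comparison: how much did the gap change from start to end?
    change = (gaps[-1] - gaps[0]) / gaps[0]

    if change < -tolerance:
        return "accelerating"   # notes grow closer together
    if change > tolerance:
        return "decelerating"   # notes grow farther apart
    return "steady"
```

A song like the Wrentit’s would be expected to come out as "accelerating" and a sapsucker drum as "decelerating", though real recordings are messier than any single comparison of two gaps can capture.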
Changes in pitch
Phrases, series, warbles, and trills can also change in pitch. For example, a warble might sound upslurred if it shows an overall trend towards higher notes. Similarly, a series might fall in pitch if each note starts slightly lower than the last, even though each individual note may be upslurred.
Overslurred series are quite common among bird sounds. Here are two examples:
Changes in both speed and pitch
Many sounds change in speed and pitch at the same time. A quick glance at the spectrogram of the Sora’s whinny shows us that it’s an overslurred, decelerating series with an early peak:
Here’s a decelerating, downslurred series of upslurred whistles:
And here’s a phrase accelerating into an upslurred warble:
Multi-noted Series
Sometimes the repeated elements in a series may themselves consist of multiple notes. A two-noted series sounds like a two-syllable word repeated, such as “peter peter peter”; a three-noted series, like a three-syllable word repeated, such as “teakettle teakettle teakettle.”
Examples of two-noted series
Examples of three-noted series
Yes, there are four-noted series too
With the basic vocabulary that I’ve introduced in these three posts, we can describe the pattern of almost any bird sound. But there’s more to bird sounds than just pattern. Stay tuned for the next installment in the series.
In the last post, I covered the five basic pitch patterns, introducing some vocabulary to help distinguish between different types of individual notes. Today I’m going to introduce some vocabulary to help distinguish between different types of groups of notes — that is, different types of songs.
The four song patterns are based on two simple questions:
Does the bird ever sing the same thing twice?
Are the notes slow enough to count, or too fast to count?
Together, these two questions delineate four basic patterns: phrases, series, warbles, and trills. These four simple patterns, individually and in combination, give us a precise way to describe almost any type of complex bird sound.
Phrases are clusters of unique notes that are slow enough to count;
Series are clusters of repeated notes that are slow enough to count;
Warbles are clusters of unique notes that are too fast to count;
Trills are clusters of repeated notes that are too fast to count.
“Too fast to count” is a somewhat subjective criterion, but as a general rule of thumb, it’s any speed over about 8 notes per second. Here are some examples of each pattern, so you can practice hearing the differences.
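For readers who like to see the logic spelled out, the two questions and the rule of thumb reduce to a small decision table. This is a sketch only: the function and constant names are invented for illustration, and the threshold is the rough "about 8 notes per second" figure from above, not a hard boundary.

```python
# Rule-of-thumb threshold from the text: faster than this is "too fast to count".
TOO_FAST_TO_COUNT = 8.0  # notes per second (approximate, somewhat subjective)


def song_pattern(notes_repeat, notes_per_second):
    """Name the basic pattern from the answers to the two questions.

    notes_repeat: True if the bird sings the same note over and over,
                  False if each note is different from the one before.
    notes_per_second: approximate delivery rate of the notes.
    """
    too_fast = notes_per_second > TOO_FAST_TO_COUNT
    if notes_repeat:
        return "trill" if too_fast else "series"
    return "warble" if too_fast else "phrase"
```

So repeated notes at 4 per second make a series, and unique notes at 20 per second make a warble.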
Phrases
In these examples, each note is different from the one before, and the notes are slow enough to count.
Series
In these examples, each note is the same as the one before, and the notes are slow enough to count.
Warbles
In these examples, the notes are all different, and too fast to count.
Trills
In these examples, the notes are all the same, and too fast to count.
In the next installment in this series, we’ll look at some ways to combine and extend this vocabulary to cover almost any type of bird sound pattern.
The “How to Read Spectrograms” section of this blog is in desperate need of an upgrade, so today I’m starting a series of posts to help people describe and visualize sounds as simply and clearly as possible. Our first topic: pitch patterns.
To identify birds, you don’t need musical training. You don’t have to name the notes that a bird is singing. You only have to recognize whether the pitch of a sound is rising, falling, or staying the same. Five simple patterns allow us to describe most sounds:
Monotone sounds do not change in pitch, and appear horizontal on the spectrogram.
Upslurred (or rising) sounds rise in pitch, and appear tilted upward.
Downslurred (or falling) sounds fall in pitch, and appear tilted downward.
Overslurred sounds rise and then fall in pitch, appearing and sounding highest in the middle.
Underslurred sounds fall and then rise, appearing and sounding lowest in the middle.
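The same five definitions can be applied to a traced pitch contour, that is, a list of frequencies read off the spectrogram from left to right across one note. The sketch below is a rough illustration under stated assumptions: the function name is hypothetical, and the 50 Hz tolerance for "no real change" is an arbitrary value chosen for the example.

```python
def pitch_pattern(freqs, tolerance_hz=50.0):
    """Name the pitch pattern of one note from its frequency contour.

    freqs: frequencies in Hz, sampled left to right across the note.
    tolerance_hz: assumed size of change that is too small to hear
                  as a rise or fall (an arbitrary choice).
    """
    if len(freqs) < 2:
        raise ValueError("need at least two frequency samples")

    start, end = freqs[0], freqs[-1]
    high, low = max(freqs), min(freqs)

    # Horizontal line on the spectrogram: no meaningful change in pitch.
    if high - low <= tolerance_hz:
        return "monotone"

    # Peak clearly above both ends, or trough clearly below both ends.
    peaks_in_middle = high - start > tolerance_hz and high - end > tolerance_hz
    dips_in_middle = start - low > tolerance_hz and end - low > tolerance_hz

    if peaks_in_middle and dips_in_middle:
        return "complex"  # not one of the five basic patterns
    if peaks_in_middle:
        return "overslurred"
    if dips_in_middle:
        return "underslurred"
    if end > start:
        return "upslurred"
    if end < start:
        return "downslurred"
    return "monotone"  # small wobble that ends where it began
```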
Let’s listen to some examples.
Monotone sounds
These sounds are characterized by horizontal lines on the spectrogram. Even if those lines are very short (as with the Townsend’s Solitaire call), it’s still easy to hear that the sound isn’t going up or down, just remaining on the same pitch from start to end.
Upslurred sounds
Listen to these sounds and practice hearing how they rise in pitch.
Downslurred sounds
Listen to these sounds and practice hearing how they fall in pitch.
Overslurred sounds
One of the most common pitch patterns in bird sounds is the overslur, which rises and then falls in pitch. These sounds are often mistaken for upslurs or downslurs, so listen carefully to hear both the initial rise and the ending fall.
Underslurred sounds
This is not a common pattern, but when heard, it is distinctive.
These five basic pitch patterns are the starting point for talking about the different types of notes that we hear from birds, but there’s much more to discuss. Next up, we’ll discuss the four basic patterns of repetition and speed.
As a kid, I began learning to identify bird sounds by listening to the old Peterson Birding By Ear tapes (one of the best learning aids in existence for bird song, still on the market in CD form). One part of the tape eventually wore through because I listened to it so often — the part with the Veery.
What made that part so special was that Birding By Ear played the Veery song several times: first at normal speed, and then slowed down to half and quarter speed. At full speed, the song was incredible: a shimmering swirl of notes spiralling downward, ethereal and metallic. Slowed down, it was more incredible still. The bird’s voice rolled up and down arpeggios like someone playing pan pipes. Two people playing pan pipes, actually, because the Veery is a polyphonic singer; it sings simultaneously with both sides of its syrinx. The bird literally has two voices, one from each side of that vocal organ, and it can control them separately. A single Veery sings a duet, and when you slow the song down, you can hear the bird actually harmonize with itself.
Today, I can recreate those slowed-down Veery songs on the computer. And I can take it one step further: I can undo the duet. I can edit the sound file so as to listen to one Veery voice at a time.
The original
Here’s one strophe of a Veery song from Colorado. I’ve cleaned up the spectrogram to show how the two voices overlay one another. (Not my best photo editing, but it’ll have to do.)
If you’re familiar with the Veery’s song from the eastern United States, you might find this example slightly less ethereal, slightly more jangling, and slightly less shimmery than the versions you’re used to hearing. For the most part, that’s not actually due to geographic differences in Veery song (although there are some of those as well). It’s mostly due to the fact that eastern Veeries almost always sing in hardwood forests, where their voices bounce off of innumerable trunks and leaves, smearing the sound with echo. The Veery I’ve chosen, like most in Colorado, sings in willow carrs at medium-high elevation — a much more open habitat that lacks a forest’s echo. It may make the Veery a little less evocative, but it makes it much easier for me to do the sound editing necessary to separate the voices from one another.
Now let’s slow the Veery down, so you can hear it harmonizing with itself:
half speed:
quarter speed:
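For anyone who wants to try this at home, one simple way to slow a recording down is to keep every sample and lower the playback rate. Like a slowed tape, this also lowers the pitch (by an octave at half speed). The sketch below uses Python’s standard wave module and works only on uncompressed WAV files. It is not necessarily how the clips in this post were made, and the function name is just a placeholder.

```python
import wave


def slow_down(src, dst, factor):
    """Write a copy of a WAV file that plays `factor` times slower.

    src, dst: file paths or binary file-like objects.
    factor: 2 for half speed, 4 for quarter speed, and so on.
    The audio samples are untouched; only the declared sample rate
    changes, so the pitch drops along with the speed.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(params.nframes)

    with wave.open(dst, "wb") as writer:
        writer.setparams(params)
        # Lower the playback rate; everything else stays the same.
        writer.setframerate(round(params.framerate / factor))
        writer.writeframes(frames)
```

Calling this with a factor of 2 on a 44,100 Hz recording produces a file declared at 22,050 Hz that lasts twice as long.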
Separating the voices
Here’s what the spectrogram of the Veery song looks like if we make the two voices different colors:
And here they are, separated to the best of my ability. (The first note, the rising single-voiced burr, is on both recordings.)
Upper voice (red):
Lower voice (cyan):
Having trouble following along? Try listening to both voices at half speed:
Upper voice (red) at half speed:
Lower voice (cyan) at half speed:
What can we learn from this exercise? First, the upper voice dominates the original song. It’s carrying the melody; the lower voice is softer and just provides the harmony. Second, the level of detail in each voice is immense, and can be difficult to follow even at half speed. Third, both voices are needed to bring out the jangling, metallic quality that is so typical of Veery and its relatives. That metallic sound is an emergent property of the two voices mixing. More on that in a future post.
Finally, and most importantly: bird sounds are really, really freakin’ cool. But I bet most of you knew that already.
Recently a friend alerted me to a post on the “ID-Frontiers” listserv by Christopher Hill in which he made a statement very dear to my heart:
In this day and age, I’m always surprised at the contrast between the level at which many advanced birders discuss plumage cues and the much more primitive way a lot of us approach sounds. I doubt I could convince many people on this forum of the identity of a vagrant by saying “but it looked just like the picture in my field guide!” (maybe if I repeated it?) but that type of argument is offered much more often, even routinely, in discussions of sounds.
He then apologizes for sounding like a preachy blowhard. (Hoo boy! If those are the words of a preachy blowhard, then I’ve got a lot to apologize for!)
I couldn’t agree more with his argument, which neatly summarizes the raison d’être of this entire blog. I also wrote a Birding magazine article a few years ago that created a conceptual framework intended to help people describe sounds better. But reading Chris’s comments, I realized that a conceptual framework may not be of immediate use to people hearing bird sounds in the field. What they need is a set of instructions. So I decided to write a few.
How to Describe A Bird Sound in Six Easy Steps
1. If you can, make an audio recording. Use your cell phone. Use your camera on the video setting. Use a cheap voice recorder. Use your laptop. Use any device that can possibly record sound. If you don’t have one, that’s OK, but if you have any audio recording capability whatsoever, don’t proceed to Step 2 until you’ve done Step 1!
2. Count the notes. (If they are too fast or too many to count, make a note of that.)
3. Figure out which notes are repeated, if any. (Remember: trills are made of notes that are repeated, too fast to count.)
4. Write down nonsense words that sound like what the bird is saying (that is, onomatopoeia). Try not to use real words or phrases, as you’re likely to get closer to the original sound if you let yourself break the rules of English. Spend some time on this, and try to get the transcription as close to the original as possible.
5. Compare the sound you’re hearing to similar sounds. These could be bird sounds or non-bird sounds: for example, “like a robin song, but without any pauses”; “like the squeak of a shoe on a gym floor”; “like an electronic video game.” Spend some time on this also, and come up with multiple comparisons if at all possible.
6. Sketch the sound. If the pitch of the sound goes up, draw a line that goes up. If it then goes down, draw a line that goes down. You get the idea. Put each note on the page, the way it sounds to your ear.
So, for the record, that’s
Audio
Count
Repeat
Onomatopoeia
Similar
Sketch
Or ACROSS for short.
OK, I know, that’s cheesy. But seriously, these are the steps to follow in the field. Don’t just settle for one of the steps — do them all! Try them with common birds. Try them with birds you don’t know (I’ll be happy to help you identify the results). Try them when documenting rarities.
You don’t need to know any fancy terminology, have any musical training, or use any “conceptual frameworks” when describing bird sounds — you just need to sit down and take the time to do each step carefully. It will change the way you listen, and it will change the way you talk about what you hear.
In my 2007 article in Birding magazine, I distinguished three methods of describing a bird sound in words: transliteration, analogy, and analytic description. But the article was so focused on promoting the third of those methods that it gave the other two short shrift. I was particularly scathing in my condemnation of phonetic transcription, bemoaning its “limited capacity to carry information,” with “little or no regard to pitch, tone quality, variation, or any other crucial components of birdsong.”
However, the more I study phonetic transcriptions, the more I become convinced that they tend to be more informative than I thought. In fact, even though the people writing the transcriptions may be completely unaware of it, their choice of vowels almost always follows a consistent set of rules for indicating the pitch and inflection of the bird sound. Consonants are another story, but the consistency of the vowel rules is extraordinary once you learn to see it.
The first principle is this: different vowels represent different pitches in bird sounds. The following list arranges common (American) English vowels — plus the semivowels r, w, and y — from highest transcription pitch (at the top of the list) to lowest transcription pitch (at the bottom of the list).
ee and y (as in feet and yes)
ih (as in pit)
eh (as in pet)
ah (as in father)
oh (as in pole)
er and r (as in herd and rip)
oo and w (as in boot and win)
The lowest-pitched bird sounds in North America are those of doves and owls — and it’s common knowledge that they coo and hoot, respectively. Meanwhile, if I tell you to listen for a “seet” or a “peep,” you’ll expect something high-pitched. Only medium-pitched sounds get filled in with other vowels, like the nasal notes from this Cooper’s Hawk, which Sibley’s guide describes as “pek-pek” notes and the old Audubon Society guides transliterate as “cack-cack-cack” (using vowels from the middle of the chart):
But what we’ve just seen isn’t the cool part. The cool part is how people use diphthongs — that is, vowel combinations — to consistently, systematically represent changes in the pitch of bird sounds. The correspondence between vowels and bird sounds isn’t random — it’s based on the acoustic properties of the human voice and the way it produces vowels and semivowels. The rules are simple:
Monotone sounds are transliterated by single vowels, never diphthongs: for example, “tseet,” “peep,” “hoo.”
Upslurred sounds are transliterated by two consecutive vowels, the first one lower on the chart, the second one higher. For example: w + ee = “whee.”
Downslurred sounds are transliterated by two consecutive vowels, the second one lower on the chart than the first. For example: ee + r = “eer.”
Overslurred sounds are transliterated by three consecutive vowels, the middle one highest on the chart. For example: w + ee + oo = “wheeoo.”
Underslurred sounds are transliterated by three consecutive vowels, the middle one lowest on the chart. For example: ee + oo + ee = “eeyoowee.”
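These rules are mechanical enough to write down as a lookup. The sketch below takes a transcription that has already been broken into sounds from the chart (it does not attempt to parse English spelling, which is the hard part) and reports the pitch pattern the rules imply. The numeric ranks and the function name are assumptions for illustration; only the ordering of the chart comes from the list above.

```python
# Position of each sound on the chart: a higher number means a higher
# transcription pitch. Only the ordering matters, not the exact values.
VOWEL_RANK = {
    "ee": 7, "y": 7,
    "ih": 6,
    "eh": 5,
    "ah": 4,
    "oh": 3,
    "er": 2, "r": 2,
    "oo": 1, "w": 1,
}


def implied_pattern(sounds):
    """Return the pitch pattern implied by a sequence of chart sounds.

    sounds: list such as ["w", "ee"] for "whee". Sounds that are not
            on the chart raise a KeyError.
    """
    ranks = [VOWEL_RANK[s] for s in sounds]

    if len(ranks) == 1:
        return "monotone"
    if len(ranks) == 2:
        if ranks[1] > ranks[0]:
            return "upslurred"
        if ranks[1] < ranks[0]:
            return "downslurred"
    if len(ranks) == 3:
        first, middle, last = ranks
        if middle > first and middle > last:
            return "overslurred"
        if middle < first and middle < last:
            return "underslurred"
    return "unclear"  # the rules above do not cover this combination
```

For example, w + ee comes out as upslurred and ee + r as downslurred, matching the rules.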
Believe it or not, despite the huge variety of ways to transcribe bird sounds, most transcriptions follow these principles — and for good reason, as we shall see.
Upslurred sounds
The commonest call of the Great Crested Flycatcher is a classic upslur, and like all upslurred sounds, it rises on the spectrogram from lower left to upper right:
The Sibley, National Geographic, and Audubon field guides all transliterate this sound as “wheep,” which starts with an “oo” sound (w) and changes to an “ee” sound — thus, spanning the entire range of the pitch table from bottom to top. This oo + ee combination virtually always denotes an upslurred sound, as it does in the “whit” calls of Empidonax flycatchers, the “squeet” call of Sprague’s Pipit, and the “kwit” flight call of Type 4 Red Crossbill. A look at the human voice on the spectrogram shows the reason:
Whenever the human voice pronounces a diphthong that begins low on the chart and ends high, the resulting spectrogram shows a distinct dark band that runs from lower left to upper right — in other words, an upslur. However, this dark band does not represent the pitch of the person’s voice — note that the darkness moves independently of the harmonics (the underlying horizontal lines), which change with the voice’s pitch. The dark bands are called formants, and they differentiate vowel sounds. (For a detailed explanation of spectrograms as they relate to the human voice, see this page.)
Downslurred sounds
Like all downslurred sounds, the call of the Olive Warbler traces from upper left to lower right on the spectrogram:
Sibley transliterates this “teew” or “tewp,” National Geographic “phew,” and Audubon “kew.” In all cases, the vowel combination is ee + oo = “ew,” as it is in Sibley’s transcriptions of the American Robin’s “tseeew” alarm call and the “kewp” flight call of Type 2 Red Crossbill. Another classic vowel combination for downslurred sounds is ee + r = “eer,” like in the “pdeeer” call of Say’s Phoebe, the “veer” call of Veery, and the “cheer” call of Carolina Wren. As you might expect, spectrograms of the human voice saying “eer,” “eew,” and “yo” show downward-sweeping formants:
Overslurred sounds
The burry overslurred call of the Couch’s Kingbird can be seen three times on the spectrogram below, mixed with shorter calls:
Sibley transliterates this as “kweeeerz” and Audubon as “queer” — in both cases, the vowel combination is oo + ee + er = “weer.” National Geographic goes almost the same route, with er + ee + er = “breeeer.” Similar patterns occur in Sibley’s “hweeeeeew” for Dusky-capped Flycatcher and “urrREEErrr” of Common Pauraque. And, of course, a similar pattern can be seen in spectrograms of the human voice pronouncing these combinations:
The possibility of standardization
So far, I have merely tried to describe the way people do describe bird sounds, not the way that they should do it. However, it may be possible to go one step further and create a standardized system by which transcriptions could communicate the basic properties of a sound to any audience unambiguously, and two people hearing the same sound would transcribe it the same way. Much more work would need to be done, particularly on consonants, but it might be well worth doing. I expect to explore some of the possibilities in future posts.