Speech from Another: The Mechanized Pursuit of the Human Voice
Davis, Speech from Another May 5, 2015
Few arguably trivial endeavors have been fraught with more fear, failure, and potential than
the search to replicate human speech. Whether for the novelty of hearing those familiar words
through simulacra or to achieve connection and unity via long-distance communication, both
rumors and reality of the achievement of replicating human speech are found throughout our
history.
It is no coincidence that the invention of a working “speaking” machine happened squarely
toward the end of the Age of Enlightenment. Concerning technology, timing is everything, and
inventions tend to manifest for specific reasons at specific moments. Society, an ever-evolving
organism, reaches certain points in its progress that seem to necessitate a particular novel idea
and/or contraption that express the needs and concerns of the age. Numerous devices were
created throughout the 18th and 19th centuries that replicated the human voice more or less
effectively for purposes of both entertainment and science. While voice replication was
somewhat well-received by the 18th century, this wasn’t always the case in earlier ages. There is one
famous story of the friar-philosopher Thomas Aquinas dashing apart a speaking machine
that St. Albertus Magnus had built over a 30-year period, believing “the Devil to be in her.”1
Thanks to the Enlightenment, superstition began to be replaced by science, and mankind
rediscovered itself as a capable, potent, and intellectual being that was imbued with great
understanding and astounding feats of creation. Immanuel Kant wrote about the necessity to
understand the reasons for the various phenomena of our world through a comprehension of
how they actually worked, physically. This pursuit, known as teleology, is closely linked to
theology. Just as theology attempts to explain God, teleology’s goal was to understand nature
by studying, quite literally, how it worked. Man, regarded at the time as the ultimate goal of nature,
was thought to hold the keys to understanding, if only we could know how our bodies
functioned. In short, one could comprehend the meaning of life if one understood how it
worked – how we worked. The first efforts to replicate human speech can be seen as an
attempt to understand the reasons for our very existence and what meaning could be
discerned from speech itself. This period of enlightenment became the engine that drove the
pioneering days of voice synthesis. Later it would be driven by pursuits split equally between a
desire for scientific discovery and frivolous entertainment. Twentieth and 21st century artists
would take up the technology for their own purposes, in both fine art and music, culminating in
a social commentary on our ever-increasing relationship with technology.
To begin at the beginning would be to delve into myth and rumor. While many stories about
“talking” machines go back more than a millennium, none of the oldest rumored objects still
exists, nor do any reliable documents describe how they achieved their feats. In most cases
it should be assumed that apparent voice replication was achieved through some sort of
trickery, such as hidden speaking tubes or persons concealed within the apparatus.
The first documented, genuinely manipulable speaking machine was made by
Wolfgang von Kempelen in 1791.1 An aristocrat in the court of Vienna and a city official, he was
also widely known at the time for his mechanical abilities. Having become famous for the
creation of “The Turk,” a chess-playing automaton (which, incidentally, turned out to be a
fraud), he actually did succeed in creating a machine capable of producing human speech based
on scientific research of human physiology and phonetics. Deciding that our vocal organs most
closely resembled a musical instrument in terms of form and function, he found the bagpipe
the most suitable instrument on which to base his machine. After nearly 20 years of effort and
three prototypes, in the end, the machine comprised a box containing a functional glottis as a
moveable reed (such as that found in a bagpipe) fed by an elbow-operated kitchen bellows that
acted as lungs. The end of the machine, where the sound came out, acted as the mouth and
was covered by the user’s hand (as lips), which was formed in various ways to mimic the various
shapes of the mouth and tongue; the effect was not unlike a trumpeter playing contemporary
jazz with a toilet plunger over the end of the instrument. Kempelen’s Speaking Machine is said
to have best spoken in French or Latin and was famous for its pronunciation of words such as
“papa,” “mama,” “Marianna,” and “astronomie,” and even short phrases such as “Romanum
Imperator semper Augustus” and “Maman aimez-moi.”1 Goethe himself heard the machine in
1797 and said of it, “The speaking machine of Kempelen… is in truth not very loquacious, but it
pronounces certain childish words very nicely.”1 Although Kempelen’s original machine is lost
to us, his work was carried on by later innovators, most notably German physician and physicist
Hermann von Helmholtz in the mid-19th century. His contribution was to focus on vowels as
a primary vehicle for the formation of comprehensible speech. He discovered that each vowel
contained combinations of very specific frequencies (formants) that were invariably reproducible
from person to person as understandable transmissions, akin to musical chords in a piece of
music.2 While he never created a speaking machine per se, he did use tuning forks and
electromagnetic solenoids to create a machine called the Tuning Fork Vowel Synthesizer, a
device that sustained “chords” of pure human formants to mimic vowel sounds in pursuit of the
science of phonetics. Phonetics was but a small diversion for Helmholtz, and he did not carry his
research to its ultimate end of a speech synthesizer.
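Helmholtz’s insight, that a vowel is recognizable as a sustained “chord” of formant frequencies, is easy to demonstrate in modern terms. The short Python sketch below sums sine tones at approximate formant frequencies for the vowel /a/. The frequency and amplitude values are illustrative assumptions loosely based on published phonetics tables, not figures from Helmholtz’s own apparatus.

```python
import math

# Illustrative first three formants (Hz) and relative amplitudes for the
# vowel /a/; these values are assumptions drawn from typical phonetics
# tables, not measurements from Helmholtz's tuning-fork synthesizer.
FORMANTS_A = [(730, 1.0), (1090, 0.5), (2440, 0.25)]

def vowel_chord(formants, duration=0.5, sample_rate=8000):
    """Sum sustained sine tones, one per formant, into a single waveform,
    the digital analogue of sounding several tuning forks at once."""
    n = int(duration * sample_rate)
    total = sum(amp for _, amp in formants)
    samples = []
    for i in range(n):
        t = i / sample_rate
        s = sum(amp * math.sin(2 * math.pi * freq * t) for freq, amp in formants)
        samples.append(s / total)  # normalize into [-1, 1]
    return samples

tone = vowel_chord(FORMANTS_A)  # 4000 samples of an /a/-like "chord"
```

Played back at the given sample rate, the summed tones produce a buzzy but vowel-like timbre, much as Helmholtz’s electromagnetically sustained tuning forks did.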
Up to this historical point, all speaking machines and their ilk either used a clockwork or
electrical means of operation to reproduce human speech in a repetitive or sustained way (i.e.,
without manipulation or variation) or, as was the case with Kempelen’s Speaking Machine, a
direct physical manipulation with the hand to form various words (not unlike the natural way
we manipulate the air and vibrations from our chest). It would be Joseph Faber and his
invention of the Euphonia in 1846 that would bring about a speaking machine that could be
played, as a piano, to produce human speech. A true technical marvel of the time, it consisted
of sixteen keys – similar in appearance and arrangement to that of a piano – that operated the
jaw, lips, and tongue of a carved wooden head at the end of the machine, and a seventeenth
key that operated the glottis while bellows and an ivory reed made up the lungs and larynx.1 A
skilled operator could produce a wide variety of words and sentences. The variation of “notes”
was in effect a predetermined program. The machine’s capabilities were limited by its design, with
only sixteen keys to choose from, but the alphabet and language are themselves finite. The blending
of these options could be considered as mechanized and predetermined in the keys and pedals,
but the operator was able to choose the words and pitch, and could choose from three
different speaking modes: normal, whisper, or song. Sadly, the machine and its inventor
became objects of derision and satire for a press writing for a society that was becoming
increasingly exposed to technical marvels almost daily. Perhaps more than that, the voice
produced by the machine was unsettling; “ghostly” and “disembodied” were common
descriptors. Notably, John Hollingshead, a London theater manager, said of the machine: “One
keyboard, touched by [Faber], produced words which, slowly and deliberately in a hoarse
sepulchral voice came from the mouth of the figure, as if from the depths of a tomb.”1 As is the
case with many a failed lifelong endeavor, Faber eventually took his own life, his machine
sharing his fate as a financial and social failure. Even this final act would be
derided by Hollingshead1:
He disappeared quietly from London, and took his marvel to the provinces, where it
was even less appreciated. The end came at last, and not the unexpected end. One day…
he destroyed himself and his figure. The world went on just the same, bestowing as little
notice on his memory as it had on his exhibition. As a reward for this brutality, the world,
thirty years afterwards, was presented with the phonograph.
It would seem that by failing to choose either science or entertainment as his ultimate purpose,
he was a disappointment to both communities. Faber was neither the showman people
expected nor the noble scientist in pursuit of a greater truth. His handling of his machine was
lackluster, and he produced no papers on the subject of mechanics or phonetics. Furthermore,
the hollow disembodied voice that emerged from the Euphonia was a step too far. People were
not ready for a voice from another, and they eviscerated Faber, leaving him forlorn and suicidal.
Faber could not have known it, but his machine, while being exhibited in London,
was seen by the young Alexander Graham Bell, the future inventor of the telephone. While Euphonia
and its technology would not prove vital to what made a telephone work, it nonetheless
inspired the young Bell to study phonetics and to create his own speaking machine. It would be
during his own pursuit of a speaking machine that he would stumble upon the technology
necessary to transmit the human voice over great distances.3
In the 20th century, replication of the human voice was passed over in favor of recording and
transmitting it instead. Much progress was made with electromagnetism and electricity, from
recording technology to the transmission of radio waves around the planet. The
phonograph and telephone became common, and it was no longer odd to sit in one’s home and
listen to distant voices emerging from a wooden box in the living room. The telephone
became so commonplace, crucial even, to daily life and business that bandwidth became a real
problem. One can only fit so many signals into a wire, and there simply was too much
demand for the limited cables of the time. Research began into technology that would allow
more headroom in the available infrastructure. Questions such as just how much of the actual
voice needed to be transmitted were pursued, and in the early 1930s, Homer Dudley, an
engineer at Bell Laboratories, developed the Vocoder (VOice enCODER).4 The Vocoder was a
voice synthesizer that analyzed speech by both filtering the human voice and encoding it in
order to transmit it more efficiently. A fortuitous side-effect of this encoding process was that
one could send secure radio transmissions that could only be decoded by select individuals on
the other end. Thus, the Vocoder mainly served as a secure transmission device during World
War II. One notable aspect of the Vocoder’s operation was that it took an input from one
source and, through filtering, output quite a different sound. It became clear that other
inputs besides the original human’s voice could be used. Potentially other sources could be
employed as the foundational sound for a new synthesized and articulated speech. Dudley
imagined that, in time, the Vocoder could fully replace a poor singer’s voice with a good one
from another.5 Naturally there were applications for this in the movie industry, and had it been
developed earlier, all those silent movie actors put out of work when sound was introduced
might have found a job in the talkies. Interestingly, because of its peculiar
and distinctive aesthetic altering of the human voice, the Vocoder has since become most widely
known for its application in the music industry, as discussed in more detail below.
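Dudley’s two-stage idea can be sketched in a few lines of modern code. The simplified Python below is not Bell Labs’ circuit: it stands in for the analog band-pass filter bank with a single-bin DFT (Goertzel-style) energy measurement per band. But it shows the essence of a channel vocoder: measure the voice’s energy envelope in each frequency band, then re-excite those bands with any carrier. The band frequencies and frame size here are arbitrary illustrative choices.

```python
import math

def band_envelopes(signal, sample_rate, bands, frame=256):
    """Analysis stage: for each frame of the modulator signal, estimate the
    energy in each frequency band via a single-bin DFT at the band center."""
    envs = []
    for start in range(0, len(signal) - frame + 1, frame):
        chunk = signal[start:start + frame]
        row = []
        for f in bands:
            re = sum(s * math.cos(2 * math.pi * f * i / sample_rate)
                     for i, s in enumerate(chunk))
            im = sum(s * math.sin(2 * math.pi * f * i / sample_rate)
                     for i, s in enumerate(chunk))
            row.append(math.hypot(re, im) / frame)  # per-band envelope value
        envs.append(row)
    return envs

def resynthesize(envs, sample_rate, bands, frame=256):
    """Synthesis stage: excite each band with a carrier tone scaled by the
    transmitted envelope; the carrier need not be the original voice."""
    out = []
    for row in envs:
        for _ in range(frame):
            t = len(out) / sample_rate
            out.append(sum(e * math.sin(2 * math.pi * f * t)
                           for f, e in zip(bands, row)))
    return out
```

Because only the slowly varying envelopes need to be transmitted, not the waveform itself, the scheme saves bandwidth; and because the synthesis carrier is independent of the analysis input, a buzz, a noise source, or an instrument can “speak” the envelopes, which is exactly the property musicians would later exploit.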
Toward the end of the 1930s, using this early work as a springboard, Dudley also invented a
device called the Voder (VOice operating DEmonstratoR).4 With its debut at the 1939 World’s Fair in
San Francisco, the world was introduced to the first bona fide voice synthesizer. It was a
complicated machine that required a year’s worth of training to operate well, and the most
skilled telephone operators were drafted just for this exhibition. The Voder achieved true voice
synthesis through a combination of analog and electronic technologies: pedals and bars that
operated gas-discharge tubes, and buttons and switches that controlled oscillators, band-pass filters,
tones, and amplifiers. The result was fully recognizable and understandable speech, even if
robotic sounding. Dudley imagined that the Voder could replace the transmission of an actual
voice over the telephone wires by simply speaking for a person on the other end with its (the
Voder’s) own voice.5
Around the same time the Vocoder was being developed, another inventor stumbled upon an
interesting phenomenon while shaving. Gilbert Wright noticed that when using an electric razor
near his Adam’s apple, the vibrations would emanate from his open mouth. If he mouthed
words with his lips and tongue, the razor’s sound would emerge from his lips as articulated
speech – words from a razor. He ran with the idea and in 1939 developed the Sonovox. Not
exactly a voice synthesizer, it was rather a curious synthesis of another type: that of human
and machine. The Sonovox worked by placing two small transducers on either side of the larynx
on the throat and the subject mouthing the words he or she wanted to speak. The sound
emitted by the transducers was transmitted through the vocal tract and emerged from one’s
mouth as the source sound, but articulated as speech. In this way, train whistles, car engines,
trumpet notes, or even the wind could become the surrogate voice for the user. In one
notable quote, the effect of the Sonovox on an unsuspecting audience was described: “The
audience was enjoying what they thought was a pipe organ playing ‘The Bells of St. Mary’s,’
when they suddenly ‘sat bolt upright… Some dug at their ears, certain their hearing was playing
them false. Others sat in puzzled wonder. For the pipe organ suddenly burst out singing the
words of the chorus.’”5 Shortly after this event unfolded, an electric hum became audible
and it started saying “I am power: I light your houses. I run your street cars. I work for you. I am
Power!” While the Sonovox, a novelty, was eventually abandoned in favor of the practical usefulness
of the Vocoder and Voder, it did earn a few notable distinctions. In the 1940s, several
films showcased the Sonovox’s abilities. In what almost amounts to a cinematic advertisement
for the technology, the 1940 movie You’ll Find Out featured several extended scenes in which
a big band’s instruments suddenly began speaking the words they were accompanying. In
succession from a trumpet to clarinet to oboe, the scene ended with the lead singer performing
a duet with a woman, with her own voice, singing alongside his (as the voice of the entire
band), complete with him holding the Sonovox to his own throat for all to see. More subtly, in
the 1947 movie Possessed, the Sonovox was used to haunt Joan Crawford as the voice of her
new husband’s dead wife via the sound of a buzzer or a piano. In the interim between these
two films, it would seem the Sonovox did serve a more practical function – that of wartime
propaganda against Nazi Germany. Utilizing short-wave radio, Allied broadcasts to German
soldiers in Russia would make the wind howl “You’ll never win.” It was even used to broadcast
messages to Allied troops saying (as whispering wind) “Give us revenge,” as factory whistles
chanting “Planes, guns, tanks,” or as a screaming bomb saying “Kill, kill, kill.” Even NBC’s
distinctive chimes would implore Americans to “Buy-WAR-Bonds.” 5
Again, the novelty of the robotic voice would become passé and serious effort returned to the
exploration of practical voice synthesis. Two devices emerged in the 1950s that were capable of
using light to read the spectral patterns of speech and play back facsimiles through various
including vacuum tube oscillators. Frank Cooper’s Pattern Playback Device and Walter
Lawrence’s PAT (Parametric Artificial Talker) emerged as the ultimate predigital achievements
in voice synthesis.6 As the 20th century drew on, vacuum tubes were replaced with
faster and smaller transistors and microchips, and voice synthesis finally split into two distinct
paths: those of art and science. The time came in the 1970s when a computer could read
text with an artificial voice. In 1978, Texas Instruments developed and marketed the Speak &
Spell, a device meant to teach children how to read by either challenging them to spell
the words it spoke, or reading the words they typed into it. It was a wild commercial success
and can be seen as a pivotal device in making the acceptance of artificial speech a universal
reality. This cultural and scientific acceptance would come just in time for perhaps the most
famous astrophysicist in history, Stephen Hawking. In 1985, during a trip to CERN, Hawking
developed life-threatening pneumonia. An emergency tracheotomy was performed, but in
saving his life it also took his voice. Using artificial speech technology developed by Bell Labs
with software written by a company called Speech Plus, Hawking was able to communicate
efficiently once again. Interestingly, even as more natural voice synthesis technologies have
been developed in the years since he adopted his first speaking device, he has
refused to “upgrade,” considering the robotic voice truly his – distinctive and recognizable.
Since that early time, speech synthesis has made great leaps forward in helping those with
disabilities and has found greater acceptance in society.
During the computer-boom of the late 1970s and 1980s, interest in technology and its place in
our society peaked. People were obsessed with the “world of tomorrow,” and characters such
as Max Headroom appeared in popular culture as spokesmen of the future. Max was a
computer-generated TV personality that spoke using an actor’s voice through a Vocoder, giving
it a distinctive robotic coloring. Other cinematic robots had emerged complete with seemingly
synthesized voices, such as C-3PO from Star Wars or Johnny Five from Short Circuit. As the use
of artificial (or faux-artificial) speech synthesis began to inundate society, many artists and
musicians began to use the technology in their work as a direct reflection of its impact in our
culture. The Vocoder became an indispensable technology for musicians and performance
artists alike. Laurie Anderson, experimental instrument maker and performance artist, used it in
her 1981 hit O Superman (for Massenet), and more mainstream acts such as Kraftwerk and
Imogen Heap are famous for using the Vocoder in their music. Many visual artists since the
1980s also began working with various forms of speech synthesis, in direct or indirect ways.
Martin Riches, a German sound sculptor, debuted The Talking Machine in 1991. The sculpture
has 32 voice pipes, wind chests, valves, bellows, and a blower – all driven by a computer. Each of
the voice pipes (resonators) is made of wood and modeled on actual X-ray photographs of the
artist while speaking. Each resonator is dedicated to a single vowel or consonant, and the computer
controls them in concert to form words and sounds.7 In 2014, Japanese performance and sound
artist Tomomi Adachi collaborated with Riches to perform with his Talking Machine live on
stage. It would seem that an early promotional photograph depicting Riches mimicking the
action of “teaching” his machine to speak (as one would a child) led the young Adachi to believe
this is how the machine achieved its feat. The resulting performance is achieved by exposing
that falsity, and acting it out on the stage with the Talking Machine itself, both artist and
machine finishing the spectacle in stuttering unintelligibility.8 Other artists have made a more
direct use of modern computerized speech synthesis, including Ken Feingold with his talking
heads. In a variety of compositions, he utilizes two or more prosthetic-like heads that have
conversations or arguments with each other. The statements are more or less random but
sampled from actual conversations and, as generated, will counter one another ad infinitum in
a flat robotic monotone. Mark Hansen and Ben Rubin collaborated in a work called Listening
Post that uses text fragments in real time from thousands of unrestricted Internet chat rooms,
bulletin boards, and other public forums. The texts are read by a voice synthesizer and
simultaneously displayed across a suspended grid of more than two hundred small LCD screens.
These real contextual statements become disassociated from their sources even more when
reanimated by a dispassionate electronic voice.
It is hard to say what the future of voice synthesis will be. While artists will no doubt continue
to use it to speak in ways that reflect on our contemporary society, for better or worse, it will in
all practicality continue to be integrated more and more into our day-to-day lives. In 2011, Apple
introduced Siri, a voice-activated, interactive software program for the iPhone that not only
listens to you but responds with a natural voice. Siri, while still a novelty for some, has
become a very real participant in our lives. I would argue that it isn’t only the practical usefulness of
an interactive hands-free program but also the human-like qualities of Siri’s voice and
personality that have made it an acceptable component of daily life. Since the Enlightenment, we
have sought ways to explain life through scientific study, and that study has manifested as both
practical tools and entertainment. We need it to be both. As the future of voice synthesis
continues to be forged, we can look to parallel paths as laid out by two films featuring
prominent synthesized (simulated) voices. The 1968 film 2001: A Space Odyssey presents
us with a cautionary future in which the soulless, logical computer HAL endangers its crew
because of a malfunction. The almost passive and arguably pleasant voice of HAL makes our fears
of a future spiraling out of our control quite visceral. Conversely, in the 2013 film Her, the
protagonist is presented with the first truly artificially intelligent operating system. Resistant at
first, he quickly succumbs to the intuitiveness of the program, its easy way of anticipating his
needs, and its soulful and seductive voice. The two fall in love, and they aren’t the only
ones. Her presents a world where, drawn in by a synthesized, natural voice made in our own
image, we succumb to technology completely. In perhaps the only future that makes sense
(considering the predictions of Moore’s Law), we finally find a place where the perfect artificial
voice has a home – as the spokesperson of our future.
References
1 Hankins, Thomas L., and Robert J. Silverman. Instruments of the Imagination. Princeton, NJ: Princeton University Press, 1995.
2 Helmholtz, Hermann. On the Sensations of Tone. New York, NY: Dover Publications, 1954.
3 Sterne, Jonathan. The Audible Past: Cultural Origins of Sound Reproduction. Durham, NC: Duke University Press, 2003.
4 Mills, Mara. “Media and Prosthesis: The Vocoder, the Artificial Larynx, and the History of Signal Processing.” Qui Parle: Critical Humanities and Social Sciences, Vol. 21, No. 1, Fall/Winter 2012.
5 Smith, Jacob. “Tearing Speech to Pieces: Voice Technologies of the 1940s.” Music, Sound, and the Moving Image, Vol. 2, Issue 2, Autumn 2008.
6 “PAT Does the Talking.” Popular Electronics, December 1958.
7 Schulz, Bernd. Resonanzen: Aspekte der Klangkunst = Resonances: Aspects of Sound Art. Heidelberg: Kehrer, 2002.
8 Adachi, Tomomi. Martin Riches Website. Available at http://martinriches.de/tomomi.html. Accessed May 5, 2015.