the Space between the Spoken and the Written tara michelle graber rosenberger The advent of automated speech recognition opens up new possibilities for design of new typographic forms. Graphic designers have long been designing text to evoke the sound of a
PROSODIC FONT
the Space between the Spoken and the Written
Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning, in Partial Fulfillment of the Requirements for the Degree of Master of Media Arts and Sciences at the Massachusetts Institute of Technology.
M.S. Rensselaer Polytechnic Institute 1995
B.A. University of Waterloo 1993
tara michelle graber rosenberger
August 1998
Program in Media Arts and Sciences
August 7, 1998
Ronald L. MacNeil
Principal Research Associate
MIT Media Laboratory
Accepted by: Stephen A. Benton
Chair, Departmental Committee on Graduate Students
Program in Media Arts and Sciences
Massachusetts Institute of Technology
PROSODIC FONT
the Space between the Spoken and the Written
tara michelle graber rosenberger
ABSTRACT
The advent of automated speech recognition opens up new possibilities for design of new typographic forms. Graphic designers have long been designing text to evoke the sound of a voice saying the words. Some have even used sound to animate word units within a computational environment. Yet, there is opportunity to use the expressiveness of a voice, found within the speech signal itself, in the design of basic typographic forms. These typographic forms would inherently assume a temporal, dynamic form.
Prosody in this thesis represents the melody and rhythm people use in natural speech. Even unintentionally, prosody expresses the emotional state of the speaker, her attitude towards whom she’s talking with and what she’s talking about, resolves linguistic ambiguity, and points towards any new focus of linguistic information.
Prosodic Font is an experiment in designing a font that takes its temporal form from continuous and discrete phonetic and phonological speech parameters. Each glyph – the visual form of an alphabetic letter – is composed of one or more font primitives called strokes. These strokes are placed within a grid space using two of four possible basic constraints: independence or dependence, and simultaneity or consecutiveness. Over time and in systematic accordance with parameters from a piece of speech, these stroke primitives transform shape, size, proportions, orientation, weighting and shade/tint.
Prosodic Font uses a combination of machine and human recognition techniques to create text descriptions of prosodic parameters from a sound corpus developed expressly for this thesis. The sound corpus is excerpted from two speakers – one male and one female – who are telling stories about four different emotional experiences. Because affective extremes produce prosodic extremes, the corpus involves great prosodic variety and voice range.
According to preliminary user testing results, people are able to identify systems of graphic transforms as representative of systems of prosodic variation. I found that rhythmic variation and variations in vocal stress are extremely important in people’s ability to match Prosodic Font files to speech audio files.
Thesis Supervisor: Ronald L. MacNeil
Principal Research Associate
MIT Media Laboratory
This work was performed at the MIT Media Laboratory. Support for this work was provided by the National Endowment for the Arts and the Digital Life and News in the Future corporate sponsor consortiums. The views expressed herein do not necessarily reflect the views of the supporting sponsors.
MASTERS THESIS COMMITTEE
Ronald L. MacNeil
Principal Research Associate
MIT Media Laboratory
Stephanie Seneff
Principal Research Scientist
Laboratory for Computer Science, MIT
Maribeth Back
Creative Documents Initiative
Sound Designer
Xerox Corporation @ PARC
CONTENTS
Abstract
Masters Thesis Committee
Contents
Acknowledgements
Introduction
Motivation
1. Why do this at the Media Laboratory?
Background
2. Prosody and Affect
2.1 Feature Set
2.1.1 Song
2.1.2 Rhythm
2.2 Techniques in Feature Identification
2.2.1 Intonation
2.2.2 Pitch Range
2.2.3 Duration Patterns
2.3 Models of Prosody
2.3.1 Auto-Segmental Metrical School of Phonology
2.3.2 Phonetic Models of Prosody
2.4 Discourse and Affective Function
2.4.1 The Emotional Speaker
2.4.2 Syntax, Information Structure, and Mutual Belief
3. Typography
3.1 Typographic Style
3.1.1 Perception of Glyph Balance and Proportion
Prosodic Font Design
4. Typographic Design System
4.1 Four Stroke System
4.2 Expanded Stroke System: Consecutiveness/Simultaneity and Dependence/Independence
5. Prosodic Features
5.1 Speech Corpus Development
5.2 Labeling Prosody in Speech
5.2.1 Tilt Phonological-Phonetic System
5.2.2 Linguistic Labeling
5.2.3 Phonemic Realization
5.2.4 Voice Quality
6. Mapping Relationships
6.1 System Design
6.2 Parameter Match Appropriateness
Results
7. User Test
Related Work
Future Work
Appendix A: Tilt file example
Appendix B: Word file example
Appendix C: Font file
Appendix D: Questionnaire
Bibliography
ACKNOWLEDGEMENTS
Many people share in the credit of everything good that comes of Prosodic Font. All mistakes are, of course, my own responsibility.
Certain colleagues at the Media Lab were collaborators and innovators in the course of my study. To these people who took the time to develop an ongoing intellectual, aesthetic conversation with me, I thank Janet Cahn, Kevin Brooks, Dave Small, Stefan Agamanalis, Brygg Ullmer, Peter Cho, Pushpinder Singh, Maggie Orth, Arjan Schutte, Phillip Tiongson, Nick Montfort, Tom Slowe, and Max VanKleek, my amazing and talented UROP. Thanks to Bill Keyes, Fernanda Viegas, and Tim McNerney, for the camaraderie during our short-lived internship in IG. Special thanks to Kevin Brooks, Maggie Orth, Nick Montfort, Laurie Hiyakumoto, Janet Cahn, Bill Keyes, and Richie Rivetz for contributions made to this Prosodic Font work.
Many thanks are due to Ron MacNeil, a designer and my advisor for two years at the Media Lab, for not only allowing me to think and plan in abstract, wild terms, but for encouraging me to widen the purview of any stake I resolutely planted. Ron gave me freedom to learn and research.
Stephanie Shattuck-Hufnagel and Samuel J. Keyser placed me on my feet initially in the vast and overwhelming field of prosody, rhythm and intonation. Stephanie Seneff was instrumental in helping me focus the work and understand arcane technicalities involved in speech recognition. Maribeth Back directed me towards a body of literature that dealt with mapping relationships between sound and image, as well as inspired me with ideas of Prosodic Font instruments. Suguru Ishizaki’s own design work and feedback at nascent points in Prosodic Font work focused and motivated me.
Glorianna Davenport and Justine Cassell welcomed me into their respective research groups at various points and gave me the benefits of their creative and scientific perspectives. John Maeda taught me the conceptual and technical tools in his new course, Typography, that enabled me to write Prosodic Font.
My love and appreciation to Mom and Dad, for continuously putting my life into perspective during the most difficult and busiest of times. Your gifts to me are more than I can ever realize.
And to Samarjit, who transformed my thesis experience and my life, my love.
INTRODUCTION
When most words are written, they become, of course, a part of the visual world. Like most of the elements of the visual world, they become static things and lose, as such, the dynamism which is so characteristic of the auditory world in general, and of the spoken word in particular. They lose much of the personal element...They lose those emotional overtones and emphases...Thus, in general, words, by becoming visible, join a world of relative indifference to the viewer – a world from which the magic ‘power’ of the word has been abstracted.
Marshall McLuhan in The Gutenberg Galaxy (1962), quoting J.C. Carothers, writing in Psychiatry, November 1959.
Compared to the richness of speech, writing is a meager system. A speaker uses stress, pitch, rate, pauses, voice qualities, and a host of other sound patterns not even vaguely defined to communicate a message as well as attitudes and feelings about what he is saying. Writing can barely achieve such a repertoire.
Gibson and Levin, from the Psychology of Reading (1975).
This thesis is about writing. Or rather, what writing might become when one is writing by speaking. What does the introduction of software that can translate speech into written symbols do to the nature of writing, of reading? Does the message itself, the written object, change in appearance from what we now know, and from what it appears to be at first glance? Does it encode just the words that we write now by hand? Or does it also encode the emotional overtones, the lyric melody, the subtle rhythms of our speech into the written symbology? What, then, does typography become?
Figure 1: A system overview of a prosodic font system. A speech recognizer paired with a prosody recognizer feeds descriptions of the voice signal and words uttered into a Prosodic Font. A Prosodic Font is an abstract description of letter forms with algorithms for motion. It uses a descriptive vocal model of the particular speaker, developed over time. A speaker might also make certain aesthetic decisions, such as basic font shapes and colors, about prosodic font appearance through a graphic user interface.
Prosodic typography uses the active recognition of speech and prosody – the song and rhythm of ordinary talk – in the design of a font. Further, the temporal and dynamic characteristics of speech are to some extent transferred to font representation, lending written representations some of talk’s transitory, dynamic qualities. A prosodic font is designed for motion, not static print. Prosodic typography is the electronic intervention between speech and text. It represents the contextual, individual aspects of speech that printed typography does not capture.
Prosodic Font is a project that explores what becomes possible when speech recognition merges with dynamic forms of typography. Already, writing is no longer a kinesthetic exercise, but a vocal one. Next, speech recognition will recognize not just the word itself but how the word was said, and how long it lasted, and how quickly the next word followed. Even vocal events like inhaling and exhaling, sounds which are particularly explosive, and speech errors like words left only half-begun can have visual correlates. These prosodic characteristics can be mapped onto the structural architecture of a letterform, called a glyph. In this dynamic context, word presentation takes on some of the temporal quality of speech: words are presented one at a time rather than appearing as beads on a visual string.
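The word-by-word presentation described above can be illustrated with a minimal Python sketch. It assumes that a recognizer supplies each word with an onset time and a duration, and it derives when each word would appear, when it would disappear, and how long the pause before it was. The `Word` record, its field names, and the timing values are hypothetical and chosen only for illustration; they are not the file format or the code used by Prosodic Font.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str        # orthographic form (assumed to come from a recognizer)
    onset: float     # seconds from the start of the utterance
    duration: float  # seconds the word lasted


def presentation_schedule(words):
    """Return (text, show_at, hide_at, pause_before) tuples, one per word."""
    schedule = []
    previous_end = 0.0
    for w in sorted(words, key=lambda w: w.onset):
        pause = max(0.0, w.onset - previous_end)  # silence before this word
        schedule.append((w.text, w.onset, w.onset + w.duration, pause))
        previous_end = w.onset + w.duration
    return schedule


# Hypothetical timings for the first words of an utterance.
utterance = [
    Word("I'm", 0.00, 0.25),
    Word("not", 0.25, 0.50),
    Word("working", 1.00, 0.50),
]

for text, show_at, hide_at, pause in presentation_schedule(utterance):
    print(f"{text:8s} show {show_at:.2f}s  hide {hide_at:.2f}s  pause {pause:.2f}s")
```

In this sketch the quarter-second pause before "working" survives as a gap in the display schedule, which is the kind of rhythmic information a static line of text discards.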
Text has long been considered one of the least rich mediums of communication, and face-to-face conversation the richest, because it involves speech, facial expression, gesture and temporal forms (Daft and Lengel, 1987). Non-rich forms of communication admit greater ambiguity into the cycle of interpretation between people; hence, richer forms of communication are the preferred modes of interaction in highly volatile business communications, as well as intimate personal relationships, where subtle innuendoes are read deeply by participants. By introducing indications of prosodic expression into written text, text as a medium may develop greater communicative richness. A prosodic font would be situated in the continuum of rich mediums between telephony (voice alone) and textual communication as we currently understand it.
Speech is a medium of emotional communication as well as a medium of semantic communication. After the face, vocal inflection is the second most expressive modality of emotion we possess (Picard, 1997). Research into emotion and speech has found that people can recognize affect with 60% reliability when context and meaning are obscured (Scherer, 1981). Humans can distinguish arousal in the voice (angry versus sad) but frequently confuse valence (angry versus enthusiastic). Scherer believes this confusion would be mitigated with contextual features (1981). Because the voice is a vehicle of emotional expression with measurable – and often continuous – vocal characteristics, a prosodic font can use these continuous vocal measurements in the design of temporal typographic forms. Writing a Prosodic Font with one’s voice assures that one’s current emotional state will be invested into the font representation. Each mark, each letter would be signed by the author’s current emotional tone of voice.
The concept of voice has been used to symbolize the externalization of one’s internal state. To have voice within feminist and psychoanalytic literature is to have power, agency and character. This metaphor of voice derives from our experience of producing sound, an act of making what is internal – the air in our lungs – an external, public object. Voice is an act of expression that moves what is internal, private and undifferentiated into an external, public and particular environment. Unlike a static font, a prosodic font does not forget the instant of emergence from the body. The prosodic font captures the emergence and unfolding of sound from the body, recording also the physical part of communication that has not had a place within textual communication.
Figure 2: Frame selections from a Prosodic Font performance of a speaker saying angrily, “I’m not working for my own education here.”
MOTIVATION
The motivation for creating a prosodic font comes from a number of current disciplinary trends: the too narrowly focused research in speech recognition, design for computational environments, and a growing need for richer and transformational communication mediums in the increasingly casual Internet traffic.
Some designers today have embraced computer technology and code as the very medium they work with, like paints and canvas. Computers allow the exploration of forms and mediums that have heretofore not existed. I consider prosodic font work to contribute to this exploratory design. I ask, “How can the letters of the English alphabet be represented, differentiated and animated? When the exchange of text occurs through a computer interface rather than a non-electronic paper interface, how can the nature of font representation change? What additional information can a font convey when the font represents a speaking voice rather than a hand-manipulated pen?”
Trends in speech recognition and synthesis have been narrowly focused upon recognizing semantic word units only. The influence of prosody upon the interpretation of semantics and speaker intention has been neglected. Furthermore, research in prosody recognition proceeds largely outside of and separate from speech recognition research efforts. Commercially available speech recognition packages do not even consider that third party developers might be interested in something aside from semantic content. IBM’s ViaVoice and Dragon Systems’ NaturallySpeaking do not include external code libraries to permit third party developers to further process the raw speech signals. Speech recognition is largely a black-boxed procedure. Although this state of affairs is a testament to the difficulty of prosody recognition and interpretation, it may also be attributed to the fact that there are few compelling applications that use prosody and vocal expression in conjunction with semantic speech recognition. Prosodic Font can begin to demonstrate the commercial viability of corporate prosody and speech recognition, widening the scope of what qualifies currently as speech recognition. Prosodic Font contributes to the field of speech generation by developing discrete textual descriptions of emotionally charged segments of speech. This work points to prosodic features of interest, and how one might describe them in text.
Prosodic Font could also be useful to researchers in prosody and speech as a tool to help recognize and identify prosodic and voice quality variation. Currently researchers learn how to read prosodic variation from sequences of numbers and spectrograms of speech data. Prosodic Font could be a visual, temporal tool to help researchers identify the success or failure of the algorithms they develop to extract prosody and affective features from speech.
Prosodic fonts are becoming a social need. Writing has seldom been used as a communication medium in environments in which people are spatially co-located, sometimes even in neighboring offices. The influence of electronic mail has made writing a tool of everyday management, conversation, and even romantic courting. Yet, writing email is done differently than writing on paper has been done (Ferrara, Brunner, and Whittemore, 1991). The email register (i.e. “tone of voice”) is decidedly more informal, even shorthand-ish, than writing that is used in other written contexts. This informal register, added to the lack of richness and the level of spontaneity that the email medium allows, has led to many terrible misunderstandings in which readers judge the writer’s intent to be much different from what the writer intended. In face-to-face conversation, prosody is central among human communication tools for conveying psychological-emotional state, intentions, and the point of information focus. When writing provides little context for the hapless reader, such as in email, there is a need for cues to the speaker’s intention and emotional state to be provided along with the semantics of the message.
In the world of portable technology, there is a need for seamless translation between mediums such as voice and text, depending upon the sender’s and recipient’s current social needs. A prosodic font provides such an interface that does not compromise an audio message to the extent that semantic speech recognition would. Further, a prosodic font’s design potential for emerging through time might be easily adapted to very small displays. For example, imagine you are ensconced within a formal situation that should not be interrupted, such as an important business meeting. You receive notice through one of your portables that someone important to you has sent you a message. You want to hear it, but you don’t want to risk interrupting the meeting, nor do you want others around you to hear your message. You select “visual” output. The message plays in a prosodic font, reflecting the sender’s tone of voice, rhythm, loudness, and forcefulness in the systematic movement of the syllables over time. You can see in the words how the sender expresses emotion vocally, and you understand more deeply what she meant to convey to you by seeing how the words change relative to each other. In this way, translation from audio to text may occur without losing speech information. The written message is individual, contextual and expressive.
1. WHY DO THIS AT THE MEDIA LABORATORY?
Arriving at the concept of prosodic typography is a product of having been at the Media Laboratory and stepping into the midst of many streams of research that flow within the same channel here. The on-going work in prosody, affect, and design of textual information, in addition to the unique convergence of creativity, science and technology has made it possible to dream about prosodic type.
This work builds upon work completed in the Visual Language Workshop (VLW). Researchers and students designed computer interfaces to textual information that involve many notions of time. It is VLW students, particularly Yin Yin Wong, who transferred the idea of Rapid Serial Visual Presentation (RSVP) to message design. The Aesthetics and Computation Group (ACG), chasing Professor John Maeda’s vision of how computer technology transforms design, is an intoxicating trajectory with no clear ending. Janet Cahn’s work in emotive, intonational speech generation – and Janet Cahn herself – have provided me with direction into an amorphous and distributed body of prosody and emotion literature. And, lastly, the spirit of curiosity and art that envelops even the most scientific of inquiries here has allowed me to learn the technical skills I needed to accomplish this work.
BACKGROUND
Prosodic Font draws upon work done in phonetic and phonological linguistics research. In particular, I use the work of auto-segmental metrical phonologists who believe that intonation and prosody are not linguistic systems per se, but that the stream of prosody can be understood in linear segments. The Prosody and Affect section thus draws a distinction between linguistic and paralinguistic speech features, how we might locate paralinguistic features perceptually and computationally, and communication.
Typographic History describes the historical features of typographic space and perceptual issues of font design. I discuss the migration of some of these historical graphic features to temporal design, and introduce new features.
2. PROSODY AND AFFECT
The current task of speech recognition is only to decode the orthographic representation of phonetic sound units. Prosodic Font requires the linguistic function of language only insofar as obtaining the orthographic representation. Prosodic Font’s focus continues beyond to that of prosody – the paralinguistic features of speech that convey a multiplicity of emotional, informational and situated meanings.
Prosody is a paralinguistic category that describes the song (or intonation), rhythm, and vocal timbre (or voice quality) found in all spoken utterances of all languages. Prosody functions above the linguistic function of language; that is, prosodic meaning does not bear a one-to-one relationship to semantic meaning. It is a non-arbitrary use of vocal features to convey the way we feel about what we are saying, as well as how we are feeling when we say anything. A number of primitive features interact within any spoken utterance to create a uniquely phrased and emphasized utterance. A spoken utterance, then, conveys two simultaneous channels of communication – the linguistic and paralinguistic. Written language represents the linguistic channel. Prosodic Font goes further to represent the paralinguistic channel on top of the visual linguistic representations.
Dr. Robert Ladd describes the coordination of the paralinguistic and linguistic:
“The central difference between paralinguistic and linguistic messages resides in the quantal or categorical structure of linguistic signalling and the scalar or gradient nature of paralanguage. In linguistic signalling, physical continua are partitioned into categories, so that close similarity of phonetic form is generally of no relevance for meaning: that is /th/ and /f/ are different phonemes in English, despite their close phonetic similarity, and pairs of words like thin and fin are not only clearly distinct but also semantically unrelated. In paralinguistic signalling, by contrast, semantic continua are matched by phonetic ones. If raising the voice can be used to signal anger or surprise, raising the voice a lot can signal violent anger or great surprise. Paralinguistic signals that are phonetically similar generally mean similar things.... The difference between language and paralanguage is a matter of the way the sound-meaning relation is structured” (1996, p. 36).
Defining prosody is a difficult and contentious task since there is no common agreement. Further, each discipline places different vocal features into the prosodic feature set. Computational linguists and speech communication researchers identify intonation and prominence as the major prosodic feature set items, while poets and poetry critics associate prosody with rate of speaking and metrical rhythm. Experimental psychologists have studied vocal prosody for how it can inform research on emotion. Some findings go so far as to integrate prosodic parameters of voice quality, range, and speaking duration differences along axes of emotion; however, there are fundamental disagreements about how emotional space is defined. Some anthropologists have looked at how vocal timbre changes across context, building upon the work of linguistic anthropologist John Gumperz in contextualized vocal prosody (1982). Yet this work is neither complete nor systematized.
Not only are the definition of prosody and the makeup of the prosodic feature set in question, but the basic function of prosody within and across languages is in dispute. Prosody may have universal import to humans, irrespective of which language is spoken. The universality of prosody is often borne out in psychological tests in which subjects identify the primary emotion in a voice speaking a language unknown to them (Scherer 1981). Intonational phonology’s primary goal is to discover the universal functions of prosody. On the other hand, linguists often relegate prosody to the status of a linguistic amplifier, believing that prosody is used by speakers to foreground certain linguistic items introduced into the conversation, amongst other things.
The field of prosody varies across three dimensions:
Affective versus Syntactic Ontology: those who hold that intonation and patterns of prominence developed as an extension of grammar and discourse structure versus those that believe prosody has non-linguistic roots in affect and emotion that develop in conventionally understood ways, dependent upon sociological and linguistic factors.
Phonetic versus Phonological Goals: those who use low-level descriptions of the voice signal versus those who characterize the signals in universal terms that enable comparison and generation of phonological rules across individual speakers’ production. (Another way of describing this difference is low to mid-level descriptions versus high-level descriptions.)
Linear versus Layered Descriptions: those who believe that prosody is constructed of a linear sequence of events versus those who believe that prosody consists of layers of signals of greater or lesser range which interact to produce a composite effect.
My approach to Prosodic Font combines these positions. Prosodic Font uses low- to mid-level signal characterizations of voice in order to represent individual differences between speakers. However, these events are understood as linear sequences of meaningful events in order to capture the emotional intention of the song and rhythm apart from the pronunciation requirements of particular words. This serves to smooth the low-level signals and foreground higher level changes and trends. For example, Prosodic Font does not represent the spectral differences between an /a/ phoneme and an /i/ phoneme, but it would represent a general increase in volume and fall in pitch.
Prosodic Font does not require that speech be labeled as an instance of any categorical emotion or syntactical construction. Although vocal characteristics of some basic emotions have been identified, correctly identifying affect in a voice signal is fraught with the potential for misidentification. To avoid this, I built Prosodic Font with an implicit understanding that prosody functions primarily as an instrument of emotional communication, but that the best way to represent affect is to use interpretations of low- to mid-level voice signals.
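The idea of smoothing low-level signals so that higher-level trends come forward can be sketched roughly as follows. This is only an illustration under stated assumptions: the centered moving average and the endpoint comparison are stand-ins chosen for brevity, not the technique used in this thesis, and the frame-by-frame pitch and energy values are invented.

```python
def moving_average(values, window=3):
    """Smooth a contour with a centered moving average (shorter at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def trend(values):
    """Classify the overall direction of a contour from its two endpoints."""
    delta = values[-1] - values[0]
    if delta > 0:
        return "rise"
    if delta < 0:
        return "fall"
    return "level"


# Hypothetical frame-by-frame measurements for one phrase.
pitch_hz = [220, 228, 210, 205, 212, 190, 185, 170]
energy_db = [60, 61, 59, 64, 66, 65, 69, 70]

print("pitch: ", trend(moving_average(pitch_hz)))    # fall
print("volume:", trend(moving_average(energy_db)))   # rise
```

The local wobble in each contour, which might reflect the pronunciation of particular phonemes, is averaged away, while the general increase in volume and fall in pitch remain.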
Prosodic Font is interested in more speech data than is currently described in most syntactical, linguistic research. Casual speech is not often used as an object of analysis. As such, speech errors such as false starts and mispronunciations, non-linguistic exclamations and the like are not described as significant events in syntactic research, whereas Prosodic Font would find these meaningful, expressive vocal events. Certainly, if Prosodic Font were ever generated from text, syntax and discourse structure would be central, as they are in speech generation. But in terms of speaking Prosodic Font, syntax emerges as a by-product of a speaker using proper grammatical forms. Syntax, per se, does not affect the visuals.
Prosodic Font assumes that people intuitively understand intonation as a relative system of contrasts and similarities, and that people will still understand the semantic intention of prosody if the parameters that comprise its system are mapped onto a completely alternative medium. This assumes that there is nothing essential or hard-wired about people’s use and understanding of sound, except that it is an extremely flexible instrument particularly well-suited to a system as elastic and diverse as prosody. Hence, if there were a correspondingly flexible medium, such as computational fonts, there could be many mapping relationships established between the parameter sets that would be expressively meaningful to readers. This assumes a competency on the part of readers, that they can and will be able to read and understand the prosodic relationships conveyed via fonts. It also assumes a competency on the part of the font designers, speech and prosody recognition systems, that they will select signals to map and mapping relationships that implicitly have semantic, expressive, and affective meanings to people.
First, I define the prosodic feature set, in terms of song and rhythm. Secondly, I describe the perceptual and computational techniques for finding these features within spontaneous speech. Next, I describe methods of describing prosodic features according to relevant theories within the phonetic and phonological fields, and specify which ones are most productive in a Prosodic Font context. And finally, I review provocative functions of prosody, and argue that prosody must be understood first as a situated, emotional expression that interacts closely with linguistic structure.
2.1 FEATURE SET
2.1.1 Song
Song designates those prosodic features that are centrally involved in the production and perception of tone and pitch. These features are the intonational contour, pitch accents and final phrasal tones, as well as pitch range.
2.1.1.1 Intonation
Intonation is the psychological perception of the change in pitch during a spoken utterance. It can also be called the tune of an utterance. Intonation is the perception of the physical signal, fundamental frequency (F0). F0 is a measurable signal produced in voiced speech by glottal vibration, as is evident in the phone /v/ as opposed to the unvoiced phone /f/. The excitation for voiced speech sounds is produced through periodic
vibration at the glottis, which in turn produces a pulse train spaced at regular intervals. This is the source of the perceived pitch.
Figure 3: The intonation, or tune, of the utterance “He won’t be going will he” is represented here as a continuously curved line.
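The link between a regularly spaced pulse train and F0 described above can be illustrated with a small sketch. It is not the pitch tracker used in this thesis; it assumes a simple autocorrelation method, a synthetic pulse train standing in for glottal excitation, and hypothetical names and search limits (`make_pulse_train`, `estimate_f0`, 60 to 400 Hz).

```python
def make_pulse_train(f0_hz, sample_rate, duration_s):
    """Synthetic stand-in for glottal excitation: one unit pulse per period."""
    period = round(sample_rate / f0_hz)  # samples between pulses
    n = int(sample_rate * duration_s)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def estimate_f0(samples, sample_rate, min_hz=60.0, max_hz=400.0):
    """Estimate F0 as the lag with the strongest autocorrelation.

    The 60-400 Hz search range is an assumption chosen for illustration.
    Returns None when no periodicity is found (treated here as unvoiced).
    """
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        return None
    return sample_rate / best_lag


voiced = make_pulse_train(200, 8000, 0.1)
print(estimate_f0(voiced, 8000))         # pulses every 40 samples at 8 kHz
print(estimate_f0([0.0] * 800, 8000))    # silence has no period
```

The shortest lag at which the signal best matches a shifted copy of itself is the spacing of the pulses, and the sample rate divided by that spacing gives the frequency that a listener would hear as pitch.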
Intonation occurs in units called intonational phrases. The intonational phrase can be distinguished by the presence of an ending tone that signals its closure and by a duration of silence that follows the utterance. The duration of the silence and the height or depth of the ending tone that follows an intonational phrase may be indicative of the intended strength of the ending (Ladd, 1996) or a speaker’s intention of continuing (Pierrehumbert and Hirschberg, 1990). The ending tone, or boundary tone, forms a tonal tail on the utterance that is high, equal, or low relative to the utterance.
Figure 4: The ending tone, or boundary tone, of the intonational phrase “He won’t be going will he” falls approximately within the circled region.
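As a rough illustration of a boundary tone being high, equal, or low relative to the utterance, the sketch below compares the average F0 of the tail of a phrase with the average F0 of what precedes it. The function name, the 20% tail length, and the 5% tolerance are all assumptions made for this example, not values taken from the thesis or the cited literature.

```python
from statistics import mean


def classify_boundary_tone(f0_contour, tail_fraction=0.2, tolerance=0.05):
    """Label the ending tone of one intonational phrase.

    f0_contour: per-frame F0 values in Hz; None or 0 marks unvoiced frames.
    tail_fraction and tolerance are illustrative assumptions.
    Returns "high", "equal", "low", or None if too little voiced speech.
    """
    voiced = [f for f in f0_contour if f is not None and f > 0]
    if len(voiced) < 2:
        return None
    # Keep at least one frame in the tail and at least one in the body.
    tail_len = max(1, int(len(voiced) * tail_fraction))
    tail_len = min(tail_len, len(voiced) - 1)
    body, tail = voiced[:-tail_len], voiced[-tail_len:]
    ratio = mean(tail) / mean(body)
    if ratio > 1 + tolerance:
        return "high"
    if ratio < 1 - tolerance:
        return "low"
    return "equal"


rising = [120] * 8 + [160, 170]   # tail climbs above the body of the phrase
falling = [120] * 8 + [90, 80]    # tail drops below the body of the phrase
print(classify_boundary_tone(rising), classify_boundary_tone(falling))
```

Because the comparison is a ratio against the rest of the same phrase, the label is relative to the speaker's own utterance rather than to any absolute pitch.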
Intonation in particular, relative to other prosodic features, can convey very fine shades of meaning. Intonation researcher Dwight Bolinger defines intonation as, “all uses of fundamental pitch that reflect inner states...” (1989, p. 3). There is evidence that speakers intone with a high degree of precision. Subtle intonational changes can radically affect the hearer’s interpretation of the words, as well as provide a window onto the speaker’s affective state. Three examples illustrate this difference.
Figure 5: The intonation of “You might have told me” can imply anything from indignation [1, left] to doubt [2, right]. Example after Bolinger (1989).
Figure 6: The common greeting “Hello” can convey speaker mood and intentions in a very short linguistic sound-unit. Three pitch contours are shown on a vertical scale from Low to High. The examples might be interpreted as such: [1] cheery; [2] a response to an initial sexual attraction; and [3] expressing indifference, or no desire to continue the social meeting.
Figure 7: Intonation is independent enough from linguistic structure to imply distinct affective meanings even when accompanied by a non-linguistic sound-unit “mmmhmm”. Four pitch contours are shown on a vertical scale from Low to High. Non-linguistic sound-units are often used as a backchannel comment from hearer to speaker, to give feedback while the other holds the conversational floor. Affective-semantic meanings might range from interpretations of: [1] vigorous agreement; [2] confusion; [3] final comprehension; [4] boredom and disdain.
An intonational phrase does not imply any degree of well-formedness. For example, if a person stops suddenly during an utterance, even half-way through a word, and begins again on a different subject, or coughs or burps, the presence of silence should be sufficient reason to mark the end of an intonational phrase. Therefore, an intonational phrase is not beholden to any syntactical-grammatical notion of completeness or well-formedness. And, in fact as we shall see later, vocal disturbances and so-called speech “errors” can be revealing of the speaker’s affective state. Hence, Prosodic Font should seek to convey these non-linguistic vocal sounds as well as the linguistic.
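The idea that silence alone is sufficient to close an intonational phrase can be sketched as a segmentation over per-frame energy values. This is a minimal illustration and not the thesis's implementation: the function name, the energy threshold, and the minimum silence length are hypothetical, and no syntactic information is consulted.

```python
def segment_phrases(frame_energy, silence_threshold=0.1, min_silence_frames=3):
    """Split a sequence of frame energies into phrases separated by silence.

    A frame counts as silent when its energy is below silence_threshold.
    A phrase closes once min_silence_frames silent frames have followed the
    last sounding frame. Both parameters are illustrative assumptions.
    Returns a list of (start, end) frame indices, with end exclusive.
    """
    phrases = []
    start = None       # first frame of the phrase currently being built
    last_sound = None  # most recent non-silent frame
    for i, energy in enumerate(frame_energy):
        if energy >= silence_threshold:
            if start is None:
                start = i
            last_sound = i
        elif start is not None and i - last_sound >= min_silence_frames:
            phrases.append((start, last_sound + 1))
            start = None
    if start is not None:
        # The recording ended before a long enough silence was observed.
        phrases.append((start, last_sound + 1))
    return phrases


energy = [0, 0.5, 0.6, 0, 0.7, 0, 0, 0, 0.4, 0.5, 0]
print(segment_phrases(energy))
```

A short pause, such as the single silent frame at index 3, stays inside the phrase, while a longer one ends it. Any vocal sound above the threshold, including a cough or a false start, is kept as part of a phrase, which matches the position that such events are worth conveying.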
2.1.1.2 Pitch Accent
During the course of any utterance, a speaker speaks certain syllables with greater prominence than others. There are two kinds of prominence within English, lexical prominence and prosodic prominence. Lexical prominence is the preferred placement of accentuation within any given word item, as in the citation form of /LEX-i-cal/. Lexical prominence is often called syllabic stress, or just stress. Prosodic Font addresses lexical stress as an element of rhythm.
Prosodic prominence is created through intonational contours; hence, it is an accent conveyed as an aspect of the utterance’s tune. It is also called intonational accent, pitch accent, or just accent. Accent is placed upon syllables that are often, but not exclusively, found within the class of lexically prominent syllables.
A pitch accent is achieved through distinctive changes in the F0 contour. These changes can be classified as either High or Low. A number of prosodic features often coincide
with an accent, such as increased duration, increased loudness, and vowel fullness (i.e. not reduced phonetic form).