the Space between the Spoken and the Written tara michelle graber rosenberger The advent of automated speech recognition opens up new possibilities for design of new typographic forms. Graphic designers have long been designing text to evoke the sound of a

PROSODIC FONT

the Space between the Spoken and the Written

Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning, in Partial Fulfillment of the Requirements for the Degree of Master of Media Arts and Sciences at the Massachusetts Institute of Technology.

M.S. Rensselaer Polytechnic Institute 1995

B.A. University of Waterloo 1993

tara michelle graber rosenberger

August 1998

Program in Media Arts and Sciences

August 7, 1998

Ronald L. MacNeil

Principal Research Associate

MIT Media Laboratory

Stephen A. Benton

Chair, Departmental Committee on Graduate Students

Program in Media Arts and Sciences

Massachusetts Institute of Technology

PROSODIC FONT

the Space between the Spoken and the Written

tara michelle graber rosenberger

ABSTRACT

The advent of automated speech recognition opens up new possibilities for design of new typographic forms. Graphic designers have long been designing text to evoke the sound of a voice saying the words. Some have even used sound to animate word units within a computational environment. Yet, there is opportunity to use the expressiveness of a voice, found within the speech signal itself, in the design of basic typographic forms. These typographic forms would inherently assume a temporal, dynamic form.

Prosody in this thesis represents the melody and rhythm people use in natural speech. Even unintentionally, prosody expresses the emotional state of the speaker, her attitude towards whom she’s talking with and what she’s talking about, resolves linguistic ambiguity, and points towards any new focus of linguistic information.

Prosodic Font is an experiment in designing a font that takes its temporal form from continuous and discrete phonetic and phonological speech parameters. Each glyph – the visual form of an alphabetic letter – is comprised of one or more font primitives called strokes. These strokes are placed within a grid space using two of four possible basic constraints: independence or dependence, and simultaneity or consecutiveness. Over time and in systematic accordance with parameters from a piece of speech, these stroke primitives transform shape, size, proportions, orientation, weighting and shade/tint.

Prosodic Font uses a combination of machine and human recognition techniques to create text descriptions of prosodic parameters from a sound corpus developed expressly for this thesis. The sound corpus is excerpted from two speakers – one male and one female – who are telling stories about four different emotional experiences. Because affective extremes produce prosodic extremes, the corpus involves great prosodic variety and voice range.

According to preliminary user testing results, people are able to identify systems of graphic transforms as representative of systems of prosodic variation. I found that rhythmic variation and variations in vocal stress are extremely important in people’s ability to match Prosodic Font files to speech audio files.

Thesis Supervisor: Ronald L. MacNeil

Principal Research Associate, MIT Media Laboratory

This work was performed at the MIT Media Laboratory. Support for this work was provided by the National Endowment for the Arts, the Digital Life and News in the Future corporate sponsor consortiums. The views expressed herein do not necessarily reflect the views of the supporting sponsors.

MASTERS THESIS COMMITTEE

Ronald L. MacNeil

Principal Research Associate

MIT Media Laboratory

Stephanie Seneff

Principal Research Scientist

Laboratory for Computer Science, MIT

Maribeth Back

Creative Documents Initiative

Sound Designer

Xerox Corporation @ PARC

CONTENTS

Abstract
Masters Thesis Committee
Contents
Acknowledgements
Introduction
Motivation
1. Why do this at the Media Laboratory?
Background
2. Prosody and Affect
2.1 Feature Set
2.1.1 Song
2.1.2 Rhythm
2.2 Techniques in Feature Identification
2.2.1 Intonation
2.2.2 Pitch Range
2.2.3 Duration Patterns
2.3 Models of Prosody
2.3.1 Auto-Segmental Metrical School of Phonology
2.3.2 Phonetic Models of Prosody
2.4 Discourse and Affective Function
2.4.1 The Emotional Speaker
2.4.2 Syntax, Information Structure, and Mutual Belief
3. Typography
3.1 Typographic Style
3.1.1 Perception of Glyph Balance and Proportion
Prosodic Font Design
4. Typographic Design System
4.1 Four Stroke System
4.2 Expanded Stroke System: Consecutiveness/Simultaneity and Dependence/Independence
5. Prosodic Features
5.1 Speech Corpus Development
5.2 Labeling Prosody in Speech
5.2.1 Tilt Phonological-Phonetic System
5.2.2 Linguistic Labeling
5.2.3 Phonemic Realization
5.2.4 Voice Quality
6. Mapping Relationships
6.1 System Design
6.2 Parameter Match Appropriateness
Results
7. User Test
Related Work
Future work
Appendix A: Tilt file example
Appendix B: Word file Example
Appendix C: Font file
Appendix D: Questionnaire
Bibliography

ACKNOWLEDGEMENTS

Many people share in the credit of everything good that comes of Prosodic Font. All mistakes are, of course, my own responsibility.

Certain colleagues at the Media Lab were collaborators and innovators in the course of my study. To these people who took the time to develop an ongoing intellectual, aesthetic conversation with me, I thank Janet Cahn, Kevin Brooks, Dave Small, Stefan Agamanalis, Brygg Ullmer, Peter Cho, Pushpinder Singh, Maggie Orth, Arjan Schutte, Phillip Tiongson, Nick Montfort, Tom Slowe, and Max VanKleek, my amazing and talented UROP. Thanks to Bill Keyes, Fernanda Viegas, and Tim McNerney, for the camaraderie during our short-lived internship in IG. Special thanks to Kevin Brooks, Maggie Orth, Nick Montfort, Laurie Hiyakumoto, Janet Cahn, Bill Keyes, and Richie Rivetz for contributions made to this Prosodic Font work.

Many thanks are due to Ron MacNeil, a designer and my advisor for two years at the Media Lab, for not only allowing me to think and plan in abstract, wild terms, but for encouraging me to widen the purview of any stake I resolutely planted. Ron gave me freedom to learn and research.

Stephanie Shattuck-Hufnagel and Samuel J. Keyser placed me on my feet initially in the vast and overwhelming field of prosody, rhythm and intonation. Stephanie Seneff was instrumental in helping me focus the work and understand arcane technicalities involved in speech recognition. Maribeth Back directed me towards a body of literature that dealt with mapping relationships between sound and image, as well as inspired me with ideas of Prosodic Font instruments. Suguru Ishizaki’s own design work and feedback at nascent points in Prosodic Font work focused and motivated me.

Glorianna Davenport and Justine Cassell welcomed me into their respective research groups at various points and gave me the benefits of their creative and scientific perspectives. John Maeda taught me the conceptual and technical tools in his new course, Typography, that enabled me to write Prosodic Font.

My love and appreciation to Mom and Dad, for continuously putting my life into perspective during the most difficult and busiest of times. Your gifts to me are more than I can ever realize.

And to Samarjit, who transformed my thesis experience and my life, my love.

INTRODUCTION

When most words are written, they become, of course, a part of the visual world. Like most of the elements of the visual world, they become static things and lose, as such, the dynamism which is so characteristic of the auditory world in general, and of the spoken word in particular. They lose much of the personal element... They lose those emotional overtones and emphases... Thus, in general, words, by becoming visible, join a world of relative indifference to the viewer – a world from which the magic ‘power’ of the word has been abstracted.

Marshall McLuhan in The Gutenberg Galaxy (1962), quoting J.C. Carothers, writing in Psychiatry, November 1959.

Compared to the richness of speech, writing is a meager system. A speaker uses stress, pitch, rate, pauses, voice qualities, and a host of other sound patterns not even vaguely defined to communicate a message as well as attitudes and feelings about what he is saying. Writing can barely achieve such a repertoire.

Gibson and Levin, from the Psychology of Reading (1975).

This thesis is about writing. Or rather, what writing might become when one is writing by speaking. What does the introduction of software that can translate speech into written symbols do to the nature of writing, of reading? Does the message itself, the written object, change in appearance from what we now know, and from what it appears to be at first glance? Does it encode just the words that we write now by hand? Or does it also encode the emotional overtones, the lyric melody, the subtle rhythms of our speech into the written symbology? What, then, does typography become?

Figure 1: A system overview of a prosodic font system. A speech recognizer paired with a prosody recognizer feeds descriptions of the voice signal and words uttered into a Prosodic Font. A Prosodic Font is an abstract description of letter forms with algorithms for motion. It uses a descriptive vocal model of the particular speaker, developed over time. A speaker might also make certain aesthetic decisions about prosodic font appearance, such as basic font shapes and colors, through a graphic user interface.

Prosodic typography uses the active recognition of speech and prosody – the song and rhythm of ordinary talk – in the design of a font. Further, the temporal and dynamic characteristics of speech are to some extent transferred to font representation, lending written representations some of talk’s transitory, dynamic qualities. A prosodic font is designed for motion, not static print. Prosodic typography is the electronic intervention between speech and text. It represents the contextual, individual aspects of speech that printed typography does not capture.

Prosodic Font is a project that explores what becomes possible when speech recognition merges with dynamic forms of typography. Already, writing is no longer a kinesthetic exercise, but a vocal one. Next, speech recognition will recognize not just the word itself but how the word was said, and how long it lasted, and how quickly the next word followed. Even vocal events like inhaling and exhaling, sounds which are particularly explosive, and speech errors like words left only half-begun can have visual correlates. These prosodic characteristics can be mapped onto the structural architecture of a letterform, called a glyph. In this dynamic context, word presentation adopts some of the temporal quality of speech: words are presented one by one through time rather than appearing as beads on a visual string.
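As a rough illustration of word-by-word presentation, the sketch below turns a list of timed, prosodically annotated words into a sequence of display frames. It is only a sketch under stated assumptions: the names `SpokenWord`, `GlyphFrame` and `to_frames` are hypothetical, and the particular mappings (pitch to glyph height, loudness to stroke weight) are chosen for illustration rather than taken from the mappings developed later in this thesis.

```python
from dataclasses import dataclass


@dataclass
class SpokenWord:
    """One recognized word with assumed prosodic annotations."""
    text: str
    start_s: float   # onset time in seconds
    end_s: float     # offset time in seconds
    pitch_hz: float  # mean pitch over the word
    energy: float    # relative loudness, assumed to lie in [0, 1]


@dataclass
class GlyphFrame:
    """How and when a word is drawn (hypothetical display parameters)."""
    text: str
    show_at_s: float  # when the word appears
    hold_s: float     # how long it stays on screen
    scale: float      # glyph height multiplier
    weight: float     # stroke weight multiplier


def to_frames(words, pitch_floor_hz, pitch_ceiling_hz):
    """Map each word to a frame, relative to the speaker's own pitch range."""
    span = pitch_ceiling_hz - pitch_floor_hz
    frames = []
    for w in words:
        # Position of this word's pitch within the speaker's range, clamped to [0, 1].
        rel_pitch = min(1.0, max(0.0, (w.pitch_hz - pitch_floor_hz) / span))
        frames.append(GlyphFrame(
            text=w.text,
            show_at_s=w.start_s,
            hold_s=w.end_s - w.start_s,
            scale=1.0 + rel_pitch,   # assumed mapping: higher pitch, taller glyph
            weight=1.0 + w.energy,   # assumed mapping: louder word, heavier stroke
        ))
    return frames


words = [SpokenWord("not", 0.0, 0.5, 200.0, 0.5)]
frames = to_frames(words, pitch_floor_hz=100.0, pitch_ceiling_hz=300.0)
```

Because each frame carries its own onset and duration, the words appear in time as they were spoken instead of being laid out all at once.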

Text has long been considered one of the least rich mediums of communication, and face-to-face conversation the richest, because it involves speech, facial expression, gesture and temporal forms (Daft and Lengel, 1987). Non-rich forms of communication admit greater ambiguity into the cycle of interpretation between people; hence, richer forms of communication are the preferred modes of interaction in highly volatile business communications, as well as intimate personal relationships, where subtle innuendoes are read deeply by participants. By introducing indications of prosodic expression into written text, text as a medium may develop greater communicative richness. A prosodic font would be situated in the continuum of rich mediums between telephony (voice alone) and textual communication as we currently understand it.

Speech is a medium of emotional communication as well as a medium of semantic communication. After the face, vocal inflection is the second most expressive modality of emotion we possess (Picard, 1997). Research into emotion and speech has found that people can recognize affect with 60% reliability when context and meaning are obscured (Scherer, 1981). Humans can distinguish arousal in the voice (angry versus sad) but frequently confuse valence (angry versus enthusiastic). Scherer believes this confusion would be mitigated with contextual features (1981). Because the voice is a vehicle of emotional expression with measurable – and often continuous – vocal characteristics, a prosodic font can use these continuous vocal measurements in the design of temporal typographic forms. Writing a Prosodic Font with one’s voice assures that the current emotional state one has will be invested into the font representation. Each mark, each letter would be signed by the author’s current emotional tone of voice.

The concept of voice has been used to symbolize the externalization of one’s internal state. To have voice within feminist and psychoanalytic literature is to have power, agency and character. This metaphor of voice derives from our experience of producing sound, an act of making what is internal – the air in our lungs – an external, public object. Voice is an act of expression that moves what is internal, private and undifferentiated into an external, public and particular environment. Unlike a static font, a prosodic font does not forget the instant of emergence from the body. The prosodic font captures the emergence and unfolding of sound from the body, recording also the physical part of communication that has not had a place within textual communication.

Figure 2: Frame selections from a Prosodic Font performance of a speaker saying angrily, “I’m not working for my own education here.”

MOTIVATION

The motivation for creating a prosodic font comes from a number of current disciplinary trends: the overly narrow focus of research in speech recognition, design for computational environments, and a growing need for richer and transformational communication mediums in increasingly casual Internet traffic.

Some designers today have embraced computer technology and code as the very medium they work with, like paints and canvas. Computers allow the exploration of forms and mediums that have heretofore not existed. I consider prosodic font work to contribute to this exploratory design. I ask, “How can the letters of the English alphabet be represented, differentiated and animated? When the exchange of text occurs through a computer interface rather than a non-electronic paper interface, how can the nature of font representation change? What additional information can a font convey when the font represents a speaking voice rather than a hand-manipulated pen?”

Trends in speech recognition and synthesis have been narrowly focused upon recognizing semantic word units only. The influence of prosody upon the interpretation of semantics and speaker intention has been neglected. Furthermore, research in prosody recognition proceeds largely outside of and separate from speech recognition research efforts. Commercially available speech recognition packages do not even consider that third party developers might be interested in something aside from semantic content. IBM’s ViaVoice and Dragon Systems’ NaturallySpeaking do not include external code libraries to permit third party developers to further process the raw speech signals. Speech recognition is largely a black-boxed procedure. Although this state of affairs is a testament to the difficulty of prosody recognition and interpretation, it may also be attributed to the fact that there are few compelling applications that use prosody and vocal expression in conjunction with semantic speech recognition. Prosodic Font can begin to demonstrate the commercial viability of corporate prosody and speech recognition, widening the scope of what qualifies currently as speech recognition. Prosodic Font contributes to the field of speech generation by developing discrete textual descriptions of emotionally charged segments of speech. This work points to prosodic features of interest, and how one might describe them in text.

Prosodic Font could also be useful to researchers in prosody and speech as a tool to help recognize and identify prosodic and voice quality variation. Currently researchers learn how to read prosodic variation from sequences of numbers and spectrograms of speech data. Prosodic Font could be a visual, temporal tool to help researchers identify the success or failure of the algorithms they develop to extract prosody and affective features from speech.

Prosodic fonts are becoming a social need. Writing has seldom been used as a communication medium in environments in which people are spatially co-located, sometimes even in neighboring offices. The influence of electronic mail has made writing a tool of everyday management, conversation, and even romantic courting. Yet, writing email is done differently than writing on paper has been done (Ferrara, Brunner, and Whittemore, 1991). The email register (i.e. “tone of voice”) is decidedly more informal, even shorthand-ish, than writing that is used in other written contexts. This informal register, added to the lack of richness and the level of spontaneity that the email medium allows, has led to many terrible misunderstandings in which the reader judges the writer’s intent to be very different from what the writer intended. In face-to-face conversation, prosody is central among human communication tools for conveying psychological-emotional state, intentions, and the point of information focus. When writing provides little context for the hapless reader, such as in email, there is a need for cues to the speaker’s intention and emotional state to be provided along with the semantics of the message.

In the world of portable technology, there is a need for seamless translation between mediums such as voice and text, depending upon the sender’s and recipient’s current social needs. A prosodic font provides such an interface that does not compromise an audio message to the extent that semantic speech recognition would. Further, a prosodic font’s design potential for emerging through time might be easily adapted to very small displays. For example, imagine you are ensconced within a formal situation that should not be interrupted, such as an important business meeting. You receive notice through one of your portables that someone important to you has sent you a message. You want to hear it, but you don’t want to risk interrupting the meeting, nor do you want others around you to hear your message. You select “visual” output. The message plays in a prosodic font, reflecting the sender’s tone of voice, rhythm, loudness, and forcefulness in the systematic movement of the syllables over time. You can see in the words how the sender expresses emotion vocally, and you understand more deeply what she meant to convey to you by seeing how the words change relative to each other. In this way, translation from audio to text may occur without losing speech information. The written message is individual, contextual and expressive.

1. WHY DO THIS AT THE MEDIA LABORATORY?

Arriving at the concept of prosodic typography is a product of having been at the Media Laboratory and stepping into the midst of many streams of research that flow within the same channel here. The on-going work in prosody, affect, and design of textual information, in addition to the unique convergence of creativity, science and technology, has made it possible to dream about prosodic type.

This work builds upon work completed in the Visual Language Workshop (VLW). Researchers and students designed computer interfaces to textual information that involve many notions of time. It is VLW students, particularly Yin Yin Wong, who transferred the idea of Rapid Serial Visual Presentation (RSVP) to message design. The Aesthetics and Computation Group (ACG), chasing Professor John Maeda’s vision of how computer technology transforms design, is an intoxicating trajectory with no clear ending. Janet Cahn’s work in emotive, intonational speech generation – and Janet Cahn herself – have provided me with direction into an amorphous and distributed body of prosody and emotion literature. And, lastly, the spirit of curiosity and art that envelopes even the most scientific of inquiries here has allowed me to learn the technical skills I needed to accomplish this work.

BACKGROUND

Prosodic Font draws upon work done in phonetic and phonological linguistics research. In particular, I use the work of auto-segmental metrical phonologists who believe that intonation and prosody are not linguistic systems per se, but that the stream of prosody can be understood in linear segments. The Prosody and Affect section thus distinguishes linguistic from paralinguistic speech features, and discusses how we might locate paralinguistic features perceptually and computationally and how they function in communication.

Typographic History describes the historical features of typographic space and perceptual issues of font design. I discuss the migration of some of these historical graphic features to temporal design, and introduce new features.

2. PROSODY AND AFFECT

The current task of speech recognition is only to decode the orthographic representation of phonetic sound units. Prosodic Font requires the linguistic function of language only insofar as obtaining the orthographic representation. Prosodic Font’s focus continues beyond to that of prosody – the paralinguistic features of speech that convey a multiplicity of emotional, informational and situated meanings.

Prosody is a paralinguistic category that describes the song (or intonation), rhythm, and vocal timbre (or voice quality) found in all spoken utterances of all languages. Prosody functions above the linguistic function of language; that is, prosodic meaning does not bear a one-to-one relationship to semantic meaning. It is a non-arbitrary use of vocal features to convey the way we feel about what we are saying, as well as how we are feeling when we say anything. A number of primitive features interact within any spoken utterance to create a uniquely phrased and emphasized utterance. A spoken utterance, then, conveys two simultaneous channels of communication – the linguistic and paralinguistic. Written language represents the linguistic channel. Prosodic Font goes further to represent the paralinguistic channel on top of the visual linguistic representations.

Dr. Robert Ladd describes the coordination of the paralinguistic and linguistic:

“The central difference between paralinguistic and linguistic messages resides in the quantal or categorical structure of linguistic signalling and the scalar or gradient nature of paralanguage. In linguistic signalling, physical continua are partitioned into categories, so that close similarity of phonetic form is generally of no relevance for meaning: that is /th/ and /f/ are different phonemes in English, despite their close phonetic similarity, and pairs of words like thin and fin are not only clearly distinct but also semantically unrelated. In paralinguistic signalling, by contrast, semantic continua are matched by phonetic ones. If raising the voice can be used to signal anger or surprise, raising the voice a lot can signal violent anger or great surprise. Paralinguistic signals that are phonetically similar generally mean similar things.... The difference between language and paralanguage is a matter of the way the sound-meaning relation is structured” (1996, p. 36).
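Ladd’s contrast can be made concrete with a small sketch. The code below is an illustration only, not part of the Prosodic Font system: the lexicon, the function names, and the decibel bounds are all assumptions. It sets a categorical lookup, in which similar-sounding words carry unrelated meanings, beside a gradient mapping, in which a louder voice yields a proportionally stronger signal.

```python
# Hypothetical illustration of the categorical/gradient distinction.
# None of these names or values come from the thesis.

# Linguistic (categorical) signalling: phonetically close words map to
# unrelated meanings, so similarity of sound tells us nothing about meaning.
LEXICON = {
    "thin": "not thick",
    "fin": "part of a fish",
}


def linguistic_meaning(word: str) -> str:
    """Categorical lookup: a word either matches a category or it does not."""
    return LEXICON[word]


def paralinguistic_intensity(loudness_db: float,
                             quiet_db: float = 50.0,
                             loud_db: float = 80.0) -> float:
    """Gradient mapping: a louder voice gives a proportionally stronger signal.

    The result is clamped to [0, 1]. The dB bounds are assumed for the
    example, not measured from any speaker.
    """
    scaled = (loudness_db - quiet_db) / (loud_db - quiet_db)
    return max(0.0, min(1.0, scaled))
```

In the gradient case, nearby inputs give nearby outputs, which is the property a prosodic font relies on when it maps continuous vocal measurements onto continuous graphic transforms.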

Defining prosody is a difficult and contentious task since there is no common agreement. Further, each discipline places different vocal features into the prosodic feature set. Computational linguists and speech communication researchers identify intonation and prominence as the major prosodic feature set items, while poets and poetry critics associate prosody with rate of speaking and metrical rhythm. Experimental psychologists have studied vocal prosody for how it can inform research on emotion. Some findings go so far as to integrate prosodic parameters of voice quality, range, and speaking duration differences along axes of emotion; however, there are fundamental disagreements about how emotional space is defined. Some anthropologists have looked at how vocal timbre changes across context, building upon the work of linguistic anthropologist John Gumperz in contextualized vocal prosody (1982). Yet this work is neither complete nor systematized.

Not only are the definition of prosody and the makeup of the prosodic feature set in question, but the basic function of prosody within and across languages is in dispute. Prosody may have universal import to humans, irrespective of which language is spoken. The universality of prosody is often borne out in psychological tests in which subjects identify the primary emotion in a voice speaking a language unknown to them (Scherer 1981). Intonational phonology’s primary goal is to discover the universal functions of prosody. On the other hand, linguists often subjugate prosody to the status of a linguistic amplifier, believing that prosody is used by speakers to foreground certain linguistic items introduced into the conversation, amongst other things.

The field of prosody varies across three dimensions:

Affective versus Syntactic Ontology: those who hold that intonation and patterns of prominence developed as an extension of grammar and discourse structure versus those that believe prosody has non-linguistic roots in affect and emotion that develop in conventionally understood ways, dependent upon sociological and linguistic factors.

Phonetic versus Phonological Goals: those who use low-level descriptions of the voice signal versus those who characterize the signals in universal terms that enable comparison and generation of phonological rules across individual speakers’ production. (Another way of describing this difference is low to mid-level descriptions versus high-level descriptions.)

Linear versus Layered Descriptions: those who believe that prosody is constructed of a linear sequence of events versus those who believe that prosody consists of layers of signals of greater or lesser range which interact to produce a composite effect.

My approach to Prosodic Font involves a combination of approaches. Prosodic Font uses low- to mid-level signal characterizations of voice in order to represent individual differences between speakers. However, these events are understood as linear sequences of meaningful events in order to capture the emotional intention of the song and rhythm apart from the pronunciation requirements of particular words. This serves to smooth the low-level signals and foreground higher level changes and trends. For example, Prosodic Font does not represent the spectral differences between an /a/ phoneme and an /i/ phoneme, but it would represent a general increase in volume and fall in pitch.

Prosodic Font does not require that speech be labeled as an instance of any categorical emotion or syntactical construction. Although vocal characteristics of some basic emotions have been identified, correctly identifying affect in a voice signal is fraught with the potential of mis-identification. To avoid this, I built Prosodic Font with an implicit understanding that prosody functions primarily as an instrument of emotional communication, but the best way to represent affect is to use interpretations of low- to mid-level voice signals.
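The idea of smoothing low-level signals so that broader trends stand out can be sketched in a few lines. This is a minimal illustration, assuming a simple centered moving average and made-up sample values; it is not the Tilt-based labeling that this thesis actually uses, and the function names `smooth` and `trend` are hypothetical.

```python
from statistics import mean


def smooth(track, window=3):
    """Centered moving average; near the edges it averages the neighbours that exist."""
    half = window // 2
    return [mean(track[max(0, i - half): i + half + 1])
            for i in range(len(track))]


def trend(track, window=3, tolerance=1.0):
    """Classify the overall movement of a track as 'rise', 'fall', or 'level'.

    Compares the first and last smoothed values; differences smaller than
    `tolerance` (an assumed threshold) count as level.
    """
    smoothed = smooth(track, window)
    delta = smoothed[-1] - smoothed[0]
    if delta > tolerance:
        return "rise"
    if delta < -tolerance:
        return "fall"
    return "level"


# Made-up frame-by-frame measurements for one short phrase.
pitch_hz = [220, 231, 214, 205, 212, 190, 182]   # jittery, but falling overall
energy_db = [60, 58, 63, 66, 64, 69, 71]         # jittery, but rising overall

summary = (trend(pitch_hz), trend(energy_db))
```

The frame-to-frame jitter, which partly reflects the pronunciation of particular phonemes, is averaged away, while the general fall in pitch and increase in volume survive as the events a prosodic font would represent.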

Prosodic Font is interested in more speech data than is currently described in most syntactical, linguistic research. Casual speech is not often used as an object of analysis. As such, speech errors such as false starts and mispronunciations, non-linguistic exclamations and the like are not described as significant events in syntactic research; whereas, Prosodic Font would find these meaningful, expressive vocal events. Certainly, if Prosodic Font were ever generated from text, syntax and discourse structure would be central as it is in speech generation. But in terms of speaking Prosodic Font, syntax emerges as a by-product of a speaker using proper grammatical forms. Syntax, per se, does not affect the visuals.

Prosodic Font assumes that people intuitively understand intonation as a relative system of contrasts and similarities, and that people will still understand the semantic intention of prosody if the parameters that comprise its system are mapped onto a completely alternative medium. This assumes that there is nothing essential or hard-wired about people’s use and understanding of sound, except that it is an extremely flexible instrument particularly well-suited to a system as elastic and diverse as prosody. Hence, if there were a correspondingly flexible medium, such as computational fonts, there could be many mapping relationships established between the parameter sets that would be expressively meaningful to readers. This assumes a competency on the part of readers, that they can and will be able to read and understand the prosodic relationships conveyed via fonts. It also assumes a competency on the part of the font designers, speech and prosody recognition systems, that they will select signals to map and mapping relationships that implicitly have semantic, expressive, and affective meanings to people.

First, I define the prosodic feature set, in terms of song and rhythm. Secondly, I describe the perceptual and computational techniques for finding these features within spontaneous speech. Next, I describe methods of describing prosodic features according to relevant theories within the phonetic and phonological fields, and specify which ones are most productive in a Prosodic Font context. Finally, I review provocative functions of prosody and argue that prosody must be understood first as a situated, emotional expression that interacts closely with linguistic structure.

2.1 FEATURE SET

2.1.1 Song

Song designates those prosodic features that are centrally involved in the production and perception of tone and pitch. These features are the intonational contour, pitch accents and final phrasal tones, as well as pitch range.

2.1.1.1 Intonation

Intonation is the psychological perception of the change in pitch during a spoken utterance. It can also be called the tune of an utterance. Intonation is the perception of the physical signal, fundamental frequency (F0). F0 is a measurable signal produced by voiced speech, a glottal vibration such as is evident in the phone /v/ as opposed to the unvoiced phone /f/. The excitation for voiced speech sounds is produced through periodic vibration at the glottis, which in turn produces a pulse train spaced at regular intervals. This is the source of the perceived pitch.

Figure 3: The intonation, or tune, of the utterance “He won’t be going will he” is represented here as a continuously curved line.
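Because F0 arises from a pulse train spaced at regular intervals, it can in principle be estimated by finding the spacing at which a short frame of the signal best matches a shifted copy of itself. The following is a minimal sketch of that idea using autocorrelation. It is an illustrative stand-in, not the pitch tracker used in this thesis. The function name, the 75 to 400 Hz search range, and the 0.3 voicing threshold are assumptions.

```python
def estimate_f0(samples, sample_rate, f0_min=75.0, f0_max=400.0):
    """Estimate F0 in Hz for one frame by picking the autocorrelation peak.

    Returns None when the frame shows no clear periodicity, which this
    sketch treats as unvoiced speech or silence.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]          # remove any constant offset
    energy = sum(v * v for v in x)
    if energy == 0:
        return None

    # Candidate periods, in samples, that correspond to the search range.
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), n - 1)

    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr

    # Assumed voicing threshold: weak peaks are treated as unvoiced.
    if best_lag is None or best_corr < 0.3:
        return None
    return sample_rate / best_lag

# Synthetic pulse train: one pulse every 40 samples at 8 kHz,
# which is 200 pulses per second.
sample_rate = 8000
frame = [1.0 if i % 40 == 0 else 0.0 for i in range(400)]
print(estimate_f0(frame, sample_rate))  # prints 200.0
```

Running such an estimator frame by frame over an utterance would yield the kind of F0 track from which an intonation contour is drawn, with gaps wherever the speech is unvoiced.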

Intonation occurs in units called intonational phrases. The intonational phrase can be distinguished by the presence of an ending tone that signals its closure and by a duration of silence that follows the utterance. The duration of the silence and the height or depth of the ending tone that follows an intonational phrase may be indicative of the intended strength of the ending (Ladd, 1996) or a speaker’s intention of continuing (Pierrehumbert and Hirschberg, 1990). The ending tone, or boundary tone, forms a tonal tail on the utterance that is high, equal, or low relative to the utterance.

Figure 4: The ending tone, or boundary tone, of the intonational phrase falls approximately within the circled region.
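The two cues that delimit an intonational phrase, a following silence and a boundary tone that is high, equal, or low relative to the utterance, can be sketched over a frame-wise F0 track. This is a simplified illustration under stated assumptions, not the segmentation method of this thesis. It assumes 10 ms frames, treats 20 consecutive frames without pitch as a closing silence, compares the median of the last five pitched frames against the phrase median, and uses a 10% tolerance. All names and thresholds are hypothetical.

```python
from statistics import median

def split_phrases(f0_track, min_gap=20):
    """Split a frame-wise F0 track into phrases at pauses.

    f0_track: list of F0 values in Hz, with None for frames without pitch.
    min_gap:  number of consecutive None frames treated as a closing silence
              (an assumed value; shorter gaps are kept as unvoiced sounds).
    """
    phrases, current, gap = [], [], 0
    for f0 in f0_track:
        if f0 is None:
            gap += 1
            if gap == min_gap and current:
                phrases.append(current)
                current = []
        else:
            gap = 0
            current.append(f0)
    if current:
        phrases.append(current)
    return phrases

def boundary_tone(phrase, tail=5, tolerance=0.10):
    """Label the phrase-final tone as high, equal, or low relative to the phrase."""
    reference = median(phrase)          # typical pitch of the whole phrase
    ending = median(phrase[-tail:])     # pitch of the tonal tail
    ratio = ending / reference
    if ratio > 1 + tolerance:
        return "high"
    if ratio < 1 - tolerance:
        return "low"
    return "equal"

# Hypothetical track: a phrase ending on a rise, a pause, then a phrase ending on a fall.
track = ([200] * 30 + [None] * 5 + [200] * 10 + [260] * 5
         + [None] * 25
         + [180] * 30 + [140] * 5)

for phrase in split_phrases(track):
    print(len(phrase), boundary_tone(phrase))
```

The short five-frame gap inside the first phrase is kept as an unvoiced sound, while the longer pause closes the phrase. A fuller treatment would also record the length of each pause, since the text notes that the duration of the silence may itself be meaningful.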

Intonation in particular, relative to other prosodic features, can convey very fine shades of meaning. Intonation researcher Dwight Bolinger defines intonation as, “all uses of fundamental pitch that reflect inner states...” (1989, p. 3). There is evidence that speakers intone with a high degree of precision. Subtle intonational changes can radically affect the hearer’s interpretation of the words, as well as provide a window onto the speaker’s affective state. Three examples illustrate this difference.

Figure 5: The intonation of “You might have told me” can imply indignation [left] to doubt [right]. Example after Bolinger (1989).

Figure 6: The common greeting “Hello” can convey speaker mood and intentions in a very short linguistic sound-unit. The examples might be interpreted as such: [1] cheery; [2] a response to an initial sexual attraction; and [3] expressing indifference, or no desire to continue the social meeting.

Figure 7: Intonation is independent enough from linguistic structure to imply distinct affective meanings even when accompanied by a non-linguistic sound-unit “mmmhmm”. Non-linguistic sound-units are often used as a backchannel comment from hearer to speaker, to give feedback while the other holds the conversational floor. Affective-semantic meanings might range from interpretations of: [1] vigorous agreement; [2] confusion; [3] final comprehension; [4] boredom and disdain.

An intonational phrase does not imply any degree of well-formedness. For example, if a person stops suddenly during an utterance (even half-way through a word) and begins again on a different subject, or coughs or burps, the presence of silence should be sufficient reason to mark the end of an intonational phrase. Therefore, an intonational phrase is not beholden to any syntactical-grammatical notion of completeness or well-formedness. And, in fact as we shall see later, vocal disturbances and so-called speech “errors” can be revealing of the speaker’s affective state. Hence, Prosodic Font should seek to convey these non-linguistic vocal sounds as well as the linguistic.

2.1.1.2 Pitch Accent

During the course of any utterance, a speaker speaks certain syllables with greater prominence than others. There are two kinds of prominence within English, lexical prominence and prosodic prominence. Lexical prominence is the preferred placement of accentuation within any given word item, as in the citation form of /LEX-i-cal/. Lexical prominence is often called syllabic stress, or just stress. Prosodic Font addresses lexical stress as an element of rhythm.

Prosodic prominence is created through intonational contours; hence, it is an accent conveyed as an aspect of the utterance’s tune. It is also called intonational accent, pitch accent, or just accent. Accent is placed upon syllables that are often, but not exclusively, found within the class of lexically prominent syllables.

A pitch accent is achieved through distinctive changes in the F0 contour. These changes can be classified as either High or Low. A number of prosodic features often coincide with an accent, such as increased duration, increased loudness, and vowel fullness (i.e. not reduced phonetic form).
