8 Narrator Device / Writing to the Narrator Device


You write to the narrator device by passing a narrator_rb I/O request to
the device with CMD_WRITE set in io_Command, the number of bytes to be
written set in io_Length, and the address of the write buffer set in
io_Data.

    VoiceIO->message.io_Command = CMD_WRITE;
    VoiceIO->message.io_Offset  = 0;
    VoiceIO->message.io_Data    = PhonBuffer;
    VoiceIO->message.io_Length  = strlen(PhonBuffer);
    DoIO((struct IORequest *)VoiceIO);

You can control several characteristics of the speech, as indicated in the
narrator_rb struct shown in the device interface section.

Generally, the narrator device attempts to speak in a non-regional dialect
of American English.  With pre-V37 versions of the device, the user could
change only a few of the more basic aspects of the speaking voice such as
pitch, male/female, speaking rate, etc.  With the V37 and later versions
of the narrator device, the user can now change many more aspects of the
speaking voice.  In addition, in the pre-V37 device, only mouth shape
changes could be queried by the user.  With the V37 device, the user can
also receive start of word and start of syllable synchronization events.
These events can be generated independently, giving the user much greater
flexibility in synchronizing voice to animation or other effects.

The following describes the fields of the narrator_rb structure:

message.io_Data
    Points to a NULL-terminated ASCII phonetic input string.  For
    backwards compatibility, the string may also be terminated with a
    "#" symbol.  See the "How to Write Phonetically for Narrator"
    section of this chapter for details.

message.io_Length
    Length of the input string.  The narrator device will parse the
    input string until either a NULL or a "#" is encountered, or until
    io_Length characters have been processed.
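
The stopping rule above can be sketched in portable C.  This is only an
illustration of the rule as described, not the device's actual parser;
spoken_length() is a hypothetical helper name and the strings in the usage
note are arbitrary sample data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many characters of buf would be consumed if
 * parsing stops at a NUL, at a '#', or after io_length characters,
 * whichever comes first (the rule described for io_Data/io_Length). */
size_t spoken_length(const char *buf, size_t io_length)
{
    size_t n = 0;

    assert(buf != NULL);
    while (n < io_length && buf[n] != '\0' && buf[n] != '#')
        n++;
    return n;
}
```

For example, spoken_length("ABC#DEF", 7) stops at the "#" and returns 3,
while spoken_length("ABCDEF", 4) is cut off by the length and returns 4.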

rate
    The speaking rate in words/minute.  Range is from 40 to 400 wpm.

pitch
    The baseline pitch of the speaking voice.  Range is 65 to 320 Hertz.

mode
    The F0 (pitch) mode.  ROBOTICF0 produces a monotone pitch, NATURALF0
    produces a normal pitch contour, and MANUALF0 (new for V37 and later)
    gives the user more explicit control over the pitch contour by
    creative use of accent numbers.  In MANUALF0 mode, a given accent
    number will have the same effect on the pitch regardless of its
    position in the sentence and its relation to other accented
    syllables. In NATURALF0 mode, accent numbers have a reduced effect
    towards the end of sentences (especially long ones).  In addition,
    the proximity of other accented syllables, the number of syllables in
    the word, and the number of phrases and words in the sentence all
    affect the pitch contour.  In MANUALF0 mode these things are ignored
    and it's up to the user to do the controlling.  The advantage is
    that the pitch can be made more expressive.  The F0enthusiasm field
    will scale the effect.

sex
    Controls the sex of the speaking voice (MALE or FEMALE).  In
    actuality, only the formant targets are changed.  The user must still
    change the pitch and speaking rate of the voice to get a convincing
    male or female voice.  See the include files for default pitch and
    rate settings.

ch_masks
    Pointer to a set of audio allocation maps.  See the "audio device"
    chapter for details.

nm_masks
    Number of audio allocation maps.  See the "audio device" chapter
    for details.

volume
    Sets the volume of the speaking voice.  Range 0 - 64.
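
The rate, pitch, and volume fields each have a stated range.  A caller may
want to force values into range before filling in the request.  The sketch
below assumes only the limits quoted above; the EX_ names and the clamp
helpers are made up for this example (the include files define the real
limit constants), and how the device itself reacts to out-of-range values
is not covered here.

```c
#include <assert.h>

/* Limits as quoted in the field descriptions.  EX_ names are hypothetical. */
enum {
    EX_MINRATE  = 40,  EX_MAXRATE  = 400,   /* words per minute */
    EX_MINPITCH = 65,  EX_MAXPITCH = 320,   /* Hertz            */
    EX_MINVOL   = 0,   EX_MAXVOL   = 64
};

/* Force v into the inclusive range lo..hi. */
static long ex_clamp(long v, long lo, long hi)
{
    assert(lo <= hi);
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

long ex_clamp_rate(long wpm)   { return ex_clamp(wpm, EX_MINRATE, EX_MAXRATE); }
long ex_clamp_pitch(long hz)   { return ex_clamp(hz, EX_MINPITCH, EX_MAXPITCH); }
long ex_clamp_volume(long vol) { return ex_clamp(vol, EX_MINVOL, EX_MAXVOL); }
```

The clamped results would then be cast to the field types and stored in the
rate, pitch, and volume fields of the request.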

sampfreq
    The synthesizer is "tuned" to a sampling frequency of 22,200 Hz.
    Changing sampfreq affects pitch and formant tunings and can be used
    to create unusual vocal effects.  For V37 and later, it is
    recommended that F1adj, F2adj, and F3adj be used instead to achieve
    this effect.

mouths
    If set to a non-zero value, directs the narrator device to generate
    mouth shape changes and send this data to the user in response to
    read requests.  See the "Reading from the Narrator Device" section
    for more details.

chanmask
    Used internally by the narrator device. The user should not modify
    this field.

numchan
    Used internally by the narrator device.  The user should not modify
    this field.

flags (V37)
    Used to specify V37 features of the device.  Possible bit settings
    are:

        NDB_NEWIORB  - I/O request block uses V37 features.
        NDB_WORDSYNC - Device should generate start of word sync events.
        NDB_SYLSYNC  - Device should generate start of syllable sync events.

    These bit definitions and their corresponding field definitions
    (NDF_NEWIORB, NDF_WORDSYNC, and NDF_SYLSYNC) can be found in the
    include files.
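
The relationship between a bit-number definition (NDB_) and its field
definition (NDF_) can be sketched as follows.  The EX_ constants are
stand-ins so the fragment compiles anywhere; their values are placeholders,
and real code should use the NDB_/NDF_ names from the include files.  The
sketch also assumes that a request asking for sync events should set the
new-IORB bit, since sync events are a V37 feature.

```c
#include <assert.h>

/* Placeholder bit numbers; take the real ones from the include files. */
#define EX_NDB_NEWIORB   0
#define EX_NDB_WORDSYNC  1
#define EX_NDB_SYLSYNC   2

/* Each field (mask) definition is 1 shifted left by the bit number. */
#define EX_NDF_NEWIORB   (1 << EX_NDB_NEWIORB)
#define EX_NDF_WORDSYNC  (1 << EX_NDB_WORDSYNC)
#define EX_NDF_SYLSYNC   (1 << EX_NDB_SYLSYNC)

/* Build a flags value for a request using V37 features. */
unsigned char ex_build_flags(int want_word_sync, int want_syl_sync)
{
    unsigned char flags = EX_NDF_NEWIORB;

    if (want_word_sync)
        flags |= EX_NDF_WORDSYNC;
    if (want_syl_sync)
        flags |= EX_NDF_SYLSYNC;
    return flags;
}

/* Test one bit by number rather than by mask. */
int ex_flag_is_set(unsigned char flags, int bit)
{
    assert(bit >= 0 && bit < 8);
    return (flags >> bit) & 1;
}
```

Word and syllable sync are requested independently, so either, both, or
neither bit may be set.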

F0enthusiasm (V37)
    The value of this field controls the scaling of pitch (F0) excursions
    used on accented syllables and has the effect of making the narrator
    device sound more or less "enthusiastic" about what it is saying.
    It is calibrated in 1/32s with unity (32) being the default value.
    Higher values cause more F0 variation, lesser values cause less.
    This feature is most useful in manual F0 mode.
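
The 1/32 calibration works out as simple fixed-point arithmetic.  The
sketch below only illustrates that calibration; the device's internal
pitch computation is not documented here, and ex_scale_excursion() and the
excursion units are assumptions made for the example.

```c
#include <assert.h>

#define EX_UNITY 32   /* an F0enthusiasm of 32 means "scale by 1.0" */

/* Scale a pitch excursion by enthusiasm/32 (integer arithmetic). */
long ex_scale_excursion(long excursion, int enthusiasm)
{
    assert(enthusiasm >= 0);
    return excursion * enthusiasm / EX_UNITY;
}
```

With these assumptions an excursion of 20 units stays 20 at the default
setting of 32, doubles to 40 at a setting of 64, and halves to 10 at a
setting of 16.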

F0perturb (V37)
    Non-zero values in this field cause varying amounts of random
    low-frequency modulation of the pitch (F0).  In other words, the
    pitch shakes in much the same way as an elderly person's voice does.
    Range is 0 to 255.

F1adj, F2adj, F3adj (V37)
    Changes the tuning of the formant frequencies. A formant is a major
    vocal tract resonance, and the frequencies of these formants move
    continuously as we speak.  Traditionally, they have been given the
    abbreviations of F1, F2, F3... with F1 being the one lowest in
    frequency.  Moving these formants away from their normal positions
    causes drastic changes in the sound of the voice and is a very
    powerful tool in the creation of character voices.  This adjustment
    is in ±5% steps.  Positive values raise the formant frequencies and
    vice versa. The default is zero.  Use these adjustments instead of
    changing sampfreq.

A1adj, A2adj, A3adj (V37)
    In a parallel formant synthesizer, the amplitudes of the formants
    need to be specified along with their frequencies.  These fields bias
    the amplitudes computed by the narrator device.  This is useful for
    creating different tonal balances (bass or treble), and listening to
    formants in isolation for educational purposes.  The adjustments are
    calibrated directly in ±1dB (decibel) steps.  Using negative values
    will cause no problems; use of positive numbers can cause clipping.
    If you want to raise an amplitude, try cutting the others the same
    relative amount, then bring them all up equally until clipping is
    heard, then back them off.  This should produce an optimum setting.
    These fields have a +31 to -32 dB range and the value -32 dB is
    equivalent to -infinity, shutting that formant off completely.
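
The advice to cut the other amplitudes rather than boost one can be
expressed as a small normalization step.  This is a sketch under stated
assumptions: ex_cut_instead_of_boost() is a hypothetical helper, and it
stops at -31 rather than -32 so that no formant is switched off by
accident.

```c
#include <assert.h>

#define EX_ADJ_FLOOR (-31)   /* stay above -32, which turns a formant off */

/* Shift a set of desired dB adjustments down so that the largest one
 * becomes 0.  The relative balance is kept (until the floor is hit),
 * but no value is left positive, which avoids the clipping risk. */
void ex_cut_instead_of_boost(int adj[], int count)
{
    int i, max;

    assert(adj != 0 && count > 0);
    max = adj[0];
    for (i = 1; i < count; i++)
        if (adj[i] > max)
            max = adj[i];
    if (max <= 0)
        return;               /* nothing is boosted; leave as is */
    for (i = 0; i < count; i++) {
        adj[i] -= max;
        if (adj[i] < EX_ADJ_FLOOR)
            adj[i] = EX_ADJ_FLOOR;
    }
}
```

A wish of +6, 0, -3 dB for the three formants becomes 0, -6, -9 dB.  From
there all three can be raised together by ear, as the text suggests.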

articulate (V37)
    According to the popular theories of speech production, we move our
    articulators (jaw, tongue, lips, etc.) smoothly from one "target"
    position to the next.  These articulatory targets correspond to
    acoustic targets specified by the narrator device for each phoneme.
    The device calculates the time it should take to get from one target
    to the next and this field allows you to intervene in that process.
    Values larger than the default will cause the transitions to be
    proportionately longer and vice versa.  This field is calibrated in
    percent with 100 being the default.  For example, a value of 50 will
    cause the transitions to take half the normal time, with the result
    being "sharper", more deliberate sounding speech (not necessarily
    more natural).  A value of 200 will cause the transitions to be
    twice as long, slurring the speech.  Zero is a special value: the
    narrator device will take special measures to create no transitions
    at all, and each phoneme will simply be abutted to the next.
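
The percent calibration of articulate amounts to the following arithmetic.
The helper name and the use of milliseconds are assumptions for the
example; the device's own time units are not given here.

```c
#include <assert.h>

/* Scale a nominal transition time by articulate percent.
 * 100 leaves it unchanged; 0 yields no transition at all. */
long ex_transition_time(long nominal_ms, int articulate)
{
    assert(nominal_ms >= 0 && articulate >= 0);
    return nominal_ms * articulate / 100;
}
```

A nominal 40 ms transition becomes 20 ms at a setting of 50 and 80 ms at a
setting of 200.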

centralize (V37)
    This field together with centphon can be used to create regional
    accent effects by modifying vowel sounds.  centralize specifies the
    degree (in percent) to which vowel targets are "pulled" towards the
    targets of the vowel specified by centphon.  The default value of 0%
    indicates that each vowel in the utterance retains its own target
    values.  The maximum value of 100% indicates that each vowel's
    targets are replaced by the targets of the specified vowel.
    Intermediate values control the degree of interpolation between the
    utterance vowel's targets and the targets of the vowel specified by
    centphon.
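
The interpolation can be pictured as a blend between two target values.
The sketch assumes a straight linear blend, which the text suggests but
does not spell out; ex_centralize() and the sample target numbers in the
usage note are invented for illustration.

```c
#include <assert.h>

/* Pull a vowel's own target toward the centphon vowel's target by
 * pct percent: 0 keeps the own target, 100 replaces it entirely. */
long ex_centralize(long own_target, long cent_target, int pct)
{
    assert(pct >= 0 && pct <= 100);
    return own_target + (cent_target - own_target) * pct / 100;
}
```

With an own target of 700 and a centphon target of 500, a setting of 50
gives 600, halfway between the two.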

centphon (V37)
    Pointer to an ASCII string specifying the vowel whose targets are
    used in the interpolation specified by centralize.  The vowels which
    can be specified are: IY, IH, EH, AE, AA, AH, AO, OW, UH, ER, UW.
    Specifying any other value will result in an error code being
    returned.
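
An application can check a candidate centphon string against the vowel
list before issuing the write.  ex_is_valid_centphon() is a hypothetical
helper; it assumes an exact, upper-case match, which may be stricter than
what the device accepts.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if s names one of the vowels listed for centphon, else 0. */
int ex_is_valid_centphon(const char *s)
{
    static const char *const vowels[] = {
        "IY", "IH", "EH", "AE", "AA", "AH", "AO", "OW", "UH", "ER", "UW"
    };
    size_t i;

    assert(s != NULL);
    for (i = 0; i < sizeof vowels / sizeof vowels[0]; i++)
        if (strcmp(s, vowels[i]) == 0)
            return 1;
    return 0;
}
```

Checking up front lets the program report a bad vowel name itself instead
of decoding the error code from the failed write.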

AVbias, AFbias (V37)
    Controls the relative amplitudes of the voiced and unvoiced speech
    sounds.  Voiced sounds are those made with the vocal cords vibrating,
    such as vowels and some consonants like y, r, w, and m.  Unvoiced
    sounds are made without the vocal cords vibrating and use the sound
    of turbulent air, such as s, t, sh, and f.  Some sounds are
    combinations of both such as z and v.  AVbias and AFbias change the
    default amplitude of the voiced and unvoiced components of the sounds
    respectively.  (AV stands for Amplitude of Voicing and AF stands for
    Amplitude of Frication).  These fields are calibrated in ±1dB steps
    and have the same range as the other amplitude biases, namely +31 to
    -32 dB.  Again, positive values may cause clipping.  Negative values
    are the most useful.

priority (V37)
    Task priority while speaking.  When the narrator device begins to
    synthesize a sentence, the task priority remains unchanged while it
    is calculating acoustic parameters.  However, when speech begins at
    the end of this process, the priority is bumped to 100 (the default
    value). If you wish, you may change this to anything you want.
    Higher values will tend to lock out most anything while speech is
    going on, and lower values may cause audible breaks in the speech
    output.

The following example shows how to issue a write request to the narrator
device.  The first write is done with the default parameter settings.  The
second write is done after modifying the first and third formant loudness
and using the centralization feature.

     speak_narrator.c