About CAPE-V


General Description of the Tool

The CAPE-V indicates salient perceptual vocal attributes, identified by the core consensus group as commonly used and easily understood. The attributes are: (a) Overall Severity; (b) Roughness; (c) Breathiness; (d) Strain; (e) Pitch; and (f) Loudness. The CAPE-V displays each attribute accompanied by a sliding scale forming a visual analog scale (VAS). The clinician indicates the degree of perceived deviance from normal for each parameter on this scale, using a tick mark. For each dimension, scalar extremes are unlabeled. Judgments may be assisted by referring to general regions indicated below each scale on the CAPE-V: “MI” refers to “mildly deviant,” “MO” refers to “moderately deviant,” and “SE” refers to “severely deviant.” Importantly, the regions indicate gradations in severity rather than discrete points, so the clinician may place tick marks at any location along the line. Ratings are based on the clinician’s direct observations of the patient’s performance during the evaluation, rather than patient report or other sources.

To the right of each scale are two radio buttons, “C” and “I.” “C” represents “consistent” and “I” represents “intermittent” presence of a particular voice attribute. The rater clicks the letter that best describes the consistency of the judged parameter. A judgment of “consistent” indicates that the attribute was continuously present throughout the tasks. A judgment of “intermittent” indicates that the attribute occurred inconsistently within or across tasks. For example, an individual may consistently exhibit a strained voice quality across all tasks, which include sustained vowels and speech. In this case, the rater would click “C” to the right of the strain scale. In contrast, another individual might exhibit consistent strain during vowel production, but intermittent strain during one or more connected speech tasks. In this case, the rater would click “I” to the right of the strain scale.
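The “C”/“I” rule above can be sketched in code. This is only an illustration: the function name, the per-task dictionary, and the use of None for “attribute not perceived in this task” are assumptions made for the example, not part of the CAPE-V form.

```python
def overall_consistency(per_task):
    """Combine per-task judgments into one "C" or "I" rating for a scale.

    per_task maps a task name to one of:
      "C"  - attribute continuously present in that task
      "I"  - attribute came and went within that task
      None - attribute not perceived in that task (hypothetical encoding)
    Returns None when the attribute was not perceived in any task.
    """
    judgments = list(per_task.values())
    if not judgments:
        raise ValueError("at least one task is required")
    for judgment in judgments:
        if judgment not in ("C", "I", None):
            raise ValueError(f"unexpected judgment: {judgment!r}")

    if all(judgment is None for judgment in judgments):
        return None  # nothing to rate for this attribute
    if all(judgment == "C" for judgment in judgments):
        return "C"   # continuously present throughout the tasks
    return "I"       # inconsistent within or across tasks


# Consistent strain on vowels, intermittent strain in sentences: rated "I".
rating = overall_consistency(
    {"vowels": "C", "sentences": "I", "running speech": "C"}
)
```

Under this encoding, an attribute heard in some tasks but absent in others also comes out as “I,” which matches the “within or across tasks” wording.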

Definitions of Vocal Attributes:

Overall Severity: Global, integrated impression of voice deviance.
Roughness: Perceived irregularity in the voicing source.
Breathiness: Audible air escape in the voice.
Strain: Perception of excessive vocal effort (hyperfunction).
Pitch: Perceptual correlate of fundamental frequency. This scale rates whether the individual's pitch deviates from normal for that person's gender, age, and referent culture. The direction of deviance (high or low) should be indicated in the blank provided above the scale.
Loudness: Perceptual correlate of sound intensity. This scale indicates whether the individual's loudness deviates from normal for that person's gender, age, and referent culture. The direction of deviance (soft or loud) should be indicated in the blank provided above the scale.


Blank scales and additional features:

The six standard vocal attributes included on the CAPE-V are considered the minimal set of parameters for describing the auditory-perceptual characteristics of disordered voices. The form also includes two unlabeled scales. The clinician may use these to rate additional prominent attributes required to describe a given voice. The clinician may indicate the presence of other attributes or “positive signs” not noted elsewhere under “Additional features.” If an individual is aphonic, this should be noted under “Additional features,” and no additional marks should be made on the scales.

Data collection:

The individual should be seated comfortably in a quiet environment. The clinician should audio record the individual’s performance on three tasks: vowels, sentences, and conversational speech. Standard recording procedures should be used that incorporate a condenser microphone placed 45 degrees off from the front of the mouth and a 4 cm microphone-to-mouth distance. Recordings should be made onto a computer with at least 16 bits of resolution and a signal-sampling rate of no less than 20 kHz.
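If recordings are saved as uncompressed PCM WAV files, the two numeric recommendations above (at least 16 bits, at least 20 kHz) could be checked with Python's standard wave module. This is a minimal sketch under that file-format assumption; the function names are hypothetical, and the silent test signal exists only to demonstrate the check.

```python
import io
import wave

MIN_SAMPLE_WIDTH_BYTES = 2   # 16 bits of resolution
MIN_SAMPLE_RATE_HZ = 20_000  # no less than 20 kHz


def meets_recording_spec(wav_source):
    """Return True if a PCM WAV meets the bit-depth and sampling-rate minimums.

    wav_source may be a file path or a binary file-like object.
    """
    with wave.open(wav_source, "rb") as wav:
        return (
            wav.getsampwidth() >= MIN_SAMPLE_WIDTH_BYTES
            and wav.getframerate() >= MIN_SAMPLE_RATE_HZ
        )


def make_silent_wav(sample_rate_hz, sample_width_bytes, seconds=1):
    """Build an in-memory mono WAV of zero-valued samples (demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width_bytes)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00" * sample_width_bytes * sample_rate_hz * seconds)
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


ok = meets_recording_spec(make_silent_wav(44_100, 2))       # 16-bit, 44.1 kHz
too_slow = meets_recording_spec(make_silent_wav(16_000, 2))  # 16-bit, 16 kHz
```

The check covers only the file's stored format. It says nothing about microphone type, placement, or room noise, which still have to be verified at recording time.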

Task 1:

Sustained vowels: Two vowels were selected for this task. One is considered a lax vowel (/a/) and the other tense (/i/). In addition, the vowel, /i/, is the sustained vowel used during videostroboscopy. Thus, the use of this vowel during this task offers an auditory comparison to that produced during a stroboscopic exam.

The clinician should say to the individual, “The first task is to say the sound, /a/. Hold it as steady as you can, in your typical voice, until I ask you to stop.” (The clinician may provide a model of this task, if necessary.) The individual performs this task three times for 3-5 sec each. The clinician then says, “Next, say the sound, /i/. Hold it as steady as you can, in your typical voice, until I ask you to stop.” The individual again performs the task three times for 3-5 sec each.

Task 2:

Sentences: Six sentences were designed to elicit various laryngeal behaviors and clinical signs. The first sentence provides production of every vowel sound in the English language, the second sentence emphasizes easy onset with the /h/, the third sentence is all voiced, the fourth sentence elicits hard glottal attack, the fifth sentence incorporates nasal sounds, and the final sentence is weighted with voiceless plosive sounds.

The clinician should give the person being evaluated flash cards, which progressively show the target sentences (see below) one at a time. The clinician says, “Please read the following sentences one at a time, as if you were speaking to somebody in a real conversation.” (Individual performs task, producing one exemplar of each sentence.) If the individual has difficulty reading, the clinician may ask him or her to repeat sentences after verbal examples. This should be noted on the CAPE-V form. The sentences are: (a) The blue spot is on the key again; (b) How hard did he hit him? (c) We were away a year ago; (d) We eat eggs every Easter; (e) My mama makes lemon jam; (f) Peter will keep at the peak.

Task 3:

Running speech: The clinician should elicit at least 20 seconds of natural conversational speech using standard interview questions such as, “Tell me about your voice problem.” or “Tell me how your voice is functioning.”
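The timing targets in Tasks 1 and 3 (three productions of each vowel at 3-5 sec each, and at least 20 seconds of running speech) can be collected into a simple checklist. The sketch below is hypothetical: the function name and input layout are assumptions, and treating 3-5 sec as hard limits is a simplification, since in practice the clinician decides when to stop the individual.

```python
VOWEL_REPETITIONS = 3
VOWEL_SECONDS = (3.0, 5.0)         # target range for each sustained vowel
MIN_RUNNING_SPEECH_SECONDS = 20.0  # minimum for conversational speech


def protocol_problems(vowel_durations, running_speech_seconds):
    """List timing problems in a session; an empty list means none were found.

    vowel_durations maps "a" and "i" to a list of production lengths in seconds.
    """
    problems = []
    low, high = VOWEL_SECONDS
    for vowel in ("a", "i"):
        durations = vowel_durations.get(vowel, [])
        if len(durations) != VOWEL_REPETITIONS:
            problems.append(
                f"/{vowel}/: expected {VOWEL_REPETITIONS} productions, "
                f"got {len(durations)}"
            )
        for seconds in durations:
            if not low <= seconds <= high:
                problems.append(f"/{vowel}/: {seconds} sec is outside 3-5 sec")
    if running_speech_seconds < MIN_RUNNING_SPEECH_SECONDS:
        problems.append(
            f"running speech: {running_speech_seconds} sec is under 20 sec"
        )
    return problems


issues = protocol_problems({"a": [3.5, 4.0, 5.0], "i": [3.0, 3.2, 4.1]}, 25.0)
```

Sentence reading (Task 2) is not timed in the protocol, so it is left out of this check.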

Data scoring:

The clinician should have the individual perform all voice tasks (vowel prolongation, sentence production, and running speech) before completing the CAPE-V form. If performance is uniform across all tasks, the clinician should mark the ratings indicating overall performance for each scale. If the clinician notes a discrepancy in performance across tasks, he or she should rate performance on each task separately, on a given line. Only one CAPE-V form is used per individual being evaluated.

Each of the individual tasks is scored separately on the relevant sliders. Click the bar labeled with the task to show the sliders for that task. When a slider is adjusted, the numerical score (measurement) appears to the right of the scale.


After the clinician has completed all ratings, he or she should measure the rating on each scale by physically measuring the distance in mm from the left end of the scale to the tick mark. The mm score should be written in the blank space to the far right of the scale, so that the result is expressed as a proportion of the line's total 100 mm length. The results can be reported in two possible ways. First, results can indicate distance in mm to describe the degree of deviancy, for example “73/100” on “strain.” Second, results can be reported using descriptive labels that are typically employed clinically to indicate the general amount of deviancy, for example “moderate-to-severe” on “strain.” We strongly suggest using both forms of reporting.
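For an on-screen scale, the same measurement amounts to rescaling the mark's offset to the 100 mm line. The sketch below assumes the offset and the line length are available in the same unit (for example, pixels); the function names are hypothetical. Descriptive labels are deliberately left out, because this document gives no numeric boundaries for the “MI,” “MO,” and “SE” regions.

```python
SCALE_LENGTH_MM = 100  # total length of each CAPE-V line


def mark_to_mm(mark_offset, line_length):
    """Convert a tick-mark offset from the left end into a score in mm.

    mark_offset and line_length must share a unit (e.g., screen pixels).
    """
    if line_length <= 0:
        raise ValueError("line_length must be positive")
    if not 0 <= mark_offset <= line_length:
        raise ValueError("mark_offset must lie on the line")
    return round(mark_offset * SCALE_LENGTH_MM / line_length)


def format_score(attribute, score_mm):
    """Format a score in the 'NN/100' style used for reporting."""
    return f"{score_mm}/{SCALE_LENGTH_MM} on {attribute}"


# A mark 365 px from the left end of a 500 px line.
score = mark_to_mm(365, 500)
report = format_score("strain", score)  # "73/100 on strain"
```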

It is strongly recommended that for all rating sessions following the initial one, the clinician have a paper or electronic copy of the previous CAPE-V ratings available for comparison purposes. He or she should also rate subsequent examinations based on direct comparisons between earlier and current audio recordings. Such an approach should optimize the internal consistency/reliability of repeated sequential ratings within a patient, particularly for purposes of assessing treatment outcomes. Although it is difficult, clinicians are encouraged to make every effort to minimize bias in all ratings. We acknowledge that this solution is imperfect.

Other procedures:

The clinician can indicate prominent observations about resonance phenomena under “Comments about resonance.” Examples include, but are not limited to, hyper- or hyponasality and cul-de-sac resonance.


Data available on the reliability of all rating scales for voice assessment indicate that both intra- and inter-judge agreement vary widely. Although we have attempted to limit sources of variability in the present tool, its reliability and validity have not yet been assessed. Special Interest Division 3 is in the process of field-testing the validity of the CAPE-V in a multicenter clinical trial. Details will be reported as data are analyzed and interpreted. Future editions are projected to include referent voice recordings as “anchors” as well as training modules.