Seeing Speech: How UTI works

How ultrasound tongue imaging (UTI) works

Ultrasound is sound with a frequency above 20 kHz, the approximate upper limit of human hearing. The ultrasound used in our ultrasound tongue imaging system has a frequency of 5 MHz.

The tongue

The tongue is a large, flexible, muscular organ that stretches from its root at the top of the throat to its tip, which can be protruded from the mouth. Most of the tongue's mass is hidden from view inside the oral and pharyngeal cavities. The tongue is crucial to speech production: many consonants and all vowels are produced by changing the shape of the tongue. Consonants are often produced by using the tongue to block or restrict airflow in the oral tract, while vowels and sonorant consonants such as [j], [l], [ɹ] and [w] are produced by shaping pockets of air inside the vocal tract so that these pockets of air resonate at specific frequency ranges.
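The idea that shaped pockets of air resonate at particular frequencies can be illustrated with the simplest textbook model: the vocal tract treated as a uniform tube, closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/4L. This is a sketch of an idealisation, not a model used by the UTI system. The function name and the 17 cm tract length are assumptions; 340 m/s is the speed of sound in air quoted elsewhere on this page.

```python
def uniform_tube_resonances(length_m, c_air=340.0, n_resonances=3):
    """Resonant frequencies (Hz) of a uniform tube closed at one end, open at the other.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    Hypothetical helper for illustration only.
    """
    return [(2 * n - 1) * c_air / (4.0 * length_m) for n in range(1, n_resonances + 1)]

# Assumed example: a 17 cm vocal tract in a neutral (schwa-like) configuration.
for n, f in enumerate(uniform_tube_resonances(0.17), start=1):
    print(f"resonance {n}: {f:.0f} Hz")
```

Changing the tongue's shape breaks this uniform tube into cavities of different sizes, which shifts these resonances; that shift is what distinguishes one vowel from another.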

Figure 1: A midsagittal MRI image of the vocal tract

Using ultrasound to create a visual image of the tongue

The ultrasound probe contains an array of piezoelectric ceramic elements that deform when an electrical voltage is applied to them and, conversely, generate an electrical signal when they are deformed by sound pressure. Applying an electrical signal makes the elements vibrate at a very high frequency, producing ultrasound waves that travel out from the curved probe surface. Ultrasound waves propagate through the soft tissue of the chin, throat and tongue much faster (about 1540 m/s on average) than sound travels through air (about 340 m/s).
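The figures above fix the physical scale of the waves. Using the standard relation wavelength = speed / frequency, a 5 MHz pulse in soft tissue has a wavelength of roughly a third of a millimetre. The sketch below only does this arithmetic; the constant and function names are illustrative, and the numbers are the ones given in the text.

```python
SPEED_TISSUE_M_S = 1540.0  # average speed of ultrasound in soft tissue (from the text)
SPEED_AIR_M_S = 340.0      # speed of sound in air (from the text)
PROBE_FREQ_HZ = 5e6        # 5 MHz probe frequency (from the text)

def wavelength_m(speed_m_s, freq_hz):
    """Wavelength = propagation speed / frequency."""
    return speed_m_s / freq_hz

lam = wavelength_m(SPEED_TISSUE_M_S, PROBE_FREQ_HZ)
print(f"wavelength in tissue: {lam * 1000:.3f} mm")                    # 0.308 mm
print(f"tissue/air speed ratio: {SPEED_TISSUE_M_S / SPEED_AIR_M_S:.2f}")  # 4.53
```

A pulse of "a few wavelengths" is therefore only around a millimetre long in tissue.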

Figure 2: Recording tongue movement with a handheld ultrasound probe

An ultrasound machine transmits pulses, each consisting of a few ultrasonic wavelengths, one at a time from the probe in discrete directions called scan lines (see Figure 3). When a pulse reaches any tissue boundary, a proportion of the wave energy is reflected; at the boundary between the tongue surface and the air above it, almost all of the wave energy is reflected. Depending upon the angle and smoothness of the tongue surface in relation to the probe, some of the waves reflected from each tissue boundary will return to the probe. The ultrasound machine's CPU uses the time delay between sending the pulse and receiving each reflection to calculate the distance to the corresponding tissue boundary, using the formula d = ½ct, where t is the round-trip time delay and c is set by the machine operator to estimate the speed of sound through the tissue being imaged. The factor of ½ accounts for the pulse travelling to the boundary and back again. The strength of each reflection is represented on the ultrasound image by the level of image brightness.

Figure 3 shows a sagittal view of the tongue made up of 69 scan lines. These discrete scan lines are interpolated to produce a continuous image of the tongue surface (see Figure 4). The near-100% reflection at the tongue-air boundary often produces a bright image of the tongue surface. Since virtually no wave energy is transmitted into the air above the tongue, the hard palate is usually not visible in the ultrasound image unless the tongue is pressed against it or fluid is swallowed.
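The pulse-echo steps described above can be sketched for a single scan line: a boundary at a known depth produces an echo after a round-trip delay, and the machine turns that delay back into a distance with d = ½ct. To show why the tongue-air boundary is so bright, the sketch also uses the standard normal-incidence intensity reflection coefficient ((Z2 - Z1) / (Z2 + Z1))², where Z = density × speed is the acoustic impedance. The densities (about 1060 kg/m³ for soft tissue, 1.2 kg/m³ for air), the 6 cm depth and all function names are assumptions for illustration; a real tongue surface is neither flat nor always perpendicular to the beam.

```python
def round_trip_time_s(depth_m, c_true_m_s):
    """Time for a pulse to reach a boundary at depth_m and return to the probe."""
    return 2.0 * depth_m / c_true_m_s

def depth_from_delay_m(delay_s, c_assumed_m_s=1540.0):
    """d = 1/2 * c * t, where c is the operator's estimate of the speed of sound."""
    return 0.5 * c_assumed_m_s * delay_s

def intensity_reflection(z1, z2):
    """Fraction of incident intensity reflected at a flat boundary hit head-on."""
    return ((z2 - z1) / (z2 + z1)) ** 2

# Acoustic impedance Z = density * speed (approximate values, assumed).
Z_TISSUE = 1060.0 * 1540.0  # soft tissue, about 1.63e6 kg/(m^2 s)
Z_AIR = 1.2 * 340.0         # air, about 408 kg/(m^2 s)

# Hypothetical tongue surface 6 cm from the probe along one scan line.
t = round_trip_time_s(0.06, 1540.0)
print(f"echo delay: {t * 1e6:.1f} us")                               # 77.9 us
print(f"recovered depth: {depth_from_delay_m(t) * 100:.1f} cm")      # 6.0 cm
print(f"reflected at tongue-air boundary: {intensity_reflection(Z_TISSUE, Z_AIR):.4f}")  # 0.9990

# If the operator's c differs from the true speed, every depth is scaled by c_assumed / c_true.
print(f"depth with c set to 1600 m/s: {depth_from_delay_m(t, 1600.0) * 100:.2f} cm")
```

With only about 0.1% of the intensity crossing into the air, almost nothing is left to reach the hard palate, which is why it rarely appears in the image.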

Figure 3: Noninterpolated ultrasound tongue image made up of 69 scan lines
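A non-interpolated image like Figure 3 simply plots each echo at its position along its own scan line. Because the lines of a curved probe fan out, neighbouring lines drift further apart with depth, leaving gaps that the image must fill. The sketch below places echoes on a fan of 69 lines (the count from the text); the 90° field of view, the 1 cm probe radius and the function names are hypothetical, not settings of the system described here.

```python
import math

N_LINES = 69  # number of scan lines in Figure 3 (from the text)

def scan_line_angles(n_lines=N_LINES, fov_deg=90.0):
    """Evenly spaced steering angles (radians) centred on the probe axis.

    fov_deg is a hypothetical field of view.
    """
    half = math.radians(fov_deg) / 2.0
    step = 2.0 * half / (n_lines - 1)
    return [-half + i * step for i in range(n_lines)]

def echo_position(angle_rad, depth_m, probe_radius_m=0.01):
    """Cartesian (x, y) of an echo on a line radiating from the probe's centre of curvature.

    x is lateral, y points away from the probe face; probe_radius_m is hypothetical.
    """
    r = probe_radius_m + depth_m
    return (r * math.sin(angle_rad), r * math.cos(angle_rad))

def gap_between_lines_m(depth_m, n_lines=N_LINES, fov_deg=90.0, probe_radius_m=0.01):
    """Approximate arc distance between neighbouring scan lines at a given depth."""
    step = math.radians(fov_deg) / (n_lines - 1)
    return (probe_radius_m + depth_m) * step

angles = scan_line_angles()
print(echo_position(angles[34], 0.06))  # middle scan line, 6 cm deep
print(f"gap at 6 cm: {gap_between_lines_m(0.06) * 1000:.1f} mm")
```

Under these assumptions, adjacent lines are about 1.6 mm apart at 6 cm depth, so the raw image looks like a set of separate rays rather than a continuous surface.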

Figure 4: Ultrasound tongue image with interpolation
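The text does not say which interpolation the system uses to turn scan lines into a continuous image. One common approach, sketched here as an assumption, is bilinear interpolation in (angle, depth): for each screen pixel, find where it falls between the two nearest scan lines and the two nearest samples, then blend their brightness values by distance. The function name, data layout and probe radius are hypothetical.

```python
import math

def interpolate_pixel(lines, angles, sample_spacing_m, x, y, probe_radius_m=0.01):
    """Bilinearly interpolate brightness at the Cartesian point (x, y).

    lines[i][j] is the brightness of sample j (at depth j * sample_spacing_m)
    on scan line i, whose steering angle is angles[i] (ascending, evenly spaced,
    at least two lines and two samples). Returns 0.0 outside the scanned fan.
    """
    r = math.hypot(x, y) - probe_radius_m   # depth below the probe face
    theta = math.atan2(x, y)                # angle from the probe axis
    a_step = angles[1] - angles[0]
    fi = (theta - angles[0]) / a_step       # fractional scan-line index
    fj = r / sample_spacing_m               # fractional sample index
    n_lines, n_samples = len(angles), len(lines[0])
    if fi < 0 or fj < 0 or fi > n_lines - 1 or fj > n_samples - 1:
        return 0.0
    i0 = min(int(fi), n_lines - 2)
    j0 = min(int(fj), n_samples - 2)
    di, dj = fi - i0, fj - j0
    # Interpolate along depth on each neighbouring line, then across the lines.
    near = lines[i0][j0] * (1 - dj) + lines[i0][j0 + 1] * dj
    far = lines[i0 + 1][j0] * (1 - dj) + lines[i0 + 1][j0 + 1] * dj
    return near * (1 - di) + far * di

# Hypothetical data: two neighbouring scan lines 2 degrees apart, three samples each.
angles = [math.radians(-1.0), math.radians(1.0)]
lines = [[0.0, 0.0, 0.0],        # dark scan line
         [100.0, 100.0, 100.0]]  # bright scan line
# A pixel on the probe axis, midway between the two lines, 1 cm below the probe face.
print(interpolate_pixel(lines, angles, 0.01, 0.0, 0.02))
```

The pixel midway between a dark and a bright line comes out at about 50, which is how the gaps between discrete scan lines are filled in to give a smooth tongue contour.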
