SEEING SPEECH

Comparing MRI and UTI

Unlike MRI video and still frames, UTI does not allow us to see the bony and soft tissue structures of the vocal tract beyond the tongue. Figures 1 and 2 below show still frames from UTI and MRI, respectively, of the same person producing a /t/ sound on different occasions.

In the MRI image it is possible to see the mandible, part of the larynx, the epiglottis, the lips, part of the nose and nasal cavities, the hard and soft palates and part of the spine. In the UTI image, only the tongue and chin fat are visible.

Although it may seem that MRI is a far superior method for looking at tongue movements, there are issues associated with obtaining speech articulatory recordings using MRI, including low temporal resolution, noise pollution and intrusiveness. When recording with MRI, the speaker must lie down, which may change the centre of gravity of the tongue. Recording protocols that deliver faster frame rates are in development, but at present frame rates for MRI are much lower than they are for UTI. To be clearly imaged, speakers may have to produce sustained articulations during MRI recording (the method used for the MRI recordings of speech on this website); alternatively, a composite video is created from a number of repeated articulations to improve temporal resolution. Sound quality must also be borne in mind. The scanner is loud, so noise-cancelling fibre-optic microphones are required to make MRI audio recordings, and even then there will always be some background noise in the recordings.

In comparison, speakers being recorded with UTI can sit or stand in an upright position, and recordings can be sampled at hundreds of frames per second. The recordings on this website were sampled at 121 fps, which is more than enough to identify the individual closures of an alveolar trill. Our ultrasound tongue imaging set-up keeps all noise-making equipment, such as the ultrasound machine and computer, in an adjacent room, which allows speech recordings to be obtained without background noise.
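
To make the 121 fps figure concrete, the short Python sketch below works out the time between frames and the number of frames that fall within one cycle of a trill. The trill rate of 25 closures per second is an assumption chosen for illustration and does not come from the recordings on this website; the function names are also hypothetical.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between successive frames, in milliseconds."""
    return 1000.0 / fps


def frames_per_cycle(fps: float, cycle_hz: float) -> float:
    """Number of frames captured during one cycle of a repeating event."""
    return fps / cycle_hz


UTI_FPS = 121.0           # frame rate stated for the recordings on this website
ASSUMED_TRILL_HZ = 25.0   # ASSUMPTION: illustrative trill rate, not a measured value

if __name__ == "__main__":
    print(f"Frame interval: {frame_interval_ms(UTI_FPS):.2f} ms")
    print(f"Frames per trill cycle: {frames_per_cycle(UTI_FPS, ASSUMED_TRILL_HZ):.2f}")
```

Under that assumption, several frames fall within each open-close cycle of the trill, which is why individual closures can be picked out.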

The following video shows how the ultrasound tongue image and palate trace relate to an MRI image:

With both UTI and MRI, there is a trade-off between temporal resolution, image resolution and scope of area imaged. The higher the UTI sample rate, the fewer scan lines are obtained and the greater the area that must be interpolated between adjacent scan lines. UTI recordings with high frame rates and fewer scan lines will therefore have a more smeared look. Better image quality at high frame rates can be achieved if the field of view is reduced, focusing on either the tongue root or the tongue tip.
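
The trade-off for UTI can be sketched numerically. The Python example below uses a simplified model that is an assumption, not a description of the equipment used for this website: each scan line needs one ultrasound pulse to travel to the imaging depth and back, and sound travels at roughly 1540 m/s in soft tissue. The imaging depth, scan-line counts and field-of-view angles are hypothetical values chosen for illustration.

```python
SPEED_OF_SOUND_M_PER_S = 1540.0  # conventional approximate value for soft tissue


def max_frame_rate(scan_lines: int, depth_m: float) -> float:
    """Upper bound on frames per second, assuming one pulse per scan line."""
    round_trip_s = 2.0 * depth_m / SPEED_OF_SOUND_M_PER_S
    return 1.0 / (scan_lines * round_trip_s)


def line_spacing_deg(scan_lines: int, field_of_view_deg: float) -> float:
    """Angular gap between adjacent scan lines; larger gaps need more interpolation."""
    return field_of_view_deg / scan_lines


if __name__ == "__main__":
    depth = 0.08  # ASSUMPTION: 80 mm imaging depth, for illustration only
    for lines, fov in [(128, 120.0), (64, 120.0), (64, 60.0)]:
        print(
            f"{lines} lines over {fov:.0f} degrees: "
            f"up to {max_frame_rate(lines, depth):.0f} fps, "
            f"{line_spacing_deg(lines, fov):.2f} degrees between lines"
        )
```

In this model, halving the number of scan lines doubles the achievable frame rate but also doubles the gap between lines, while narrowing the field of view brings the line spacing back down at the same frame rate.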