Laboratoire Parole et Langage


THE PHYSIOLOGIA SYSTEM

PHONEDIT


The PHYSIOLOGIA system is a tool for research on the physiological mechanisms of speech production. It consists of a workstation driven by a PC-compatible computer and designed to record, edit, and process acoustic speech signals in relation to the corresponding physiological signals. These signals are obtained from various sources (flow-rate transducers, pressure transducers, position and movement gauges, electrodes, microphones, laryngophones, etc.), which vary in bandwidth.

The PHYSIOLOGIA workstation consists of an IBM-compatible PC (from a 486 DX33 to a Pentium), an acquisition system equipped with various transducers, and the signal editing and processing software PHONEDIT.


1. Data Capturing Specifications.

The movements of the articulatory organs can be studied on three different levels. Analysis can deal with the neuromotor control of the muscles involved in speech, their actual movements, and the phenomena induced. The first level is essentially represented by electromyography; the second consists of direct movement analysis using videocinematographic imagery techniques or displacement transducers; the third includes aerodynamic and acoustic phenomena, which evolve as we move through the vocal tract, and which, in their final state, produce the information-conveying speech signal.

Our objective is to analyze the greatest possible number of parameters which influence the production of speech. The difficulty of this task lies in acquiring data from different sources simultaneously. Such acquisition depends on the quantity of information needed for an adequate description of the various parameters.

For electromyographic signals, a bandwidth of 1 kHz is known to be sufficient, especially if the signals are integrated beforehand. Signals from kinesiographic transducers fluctuate at the same rate as the movements of the articulatory organs, i.e. slowly, so a bandwidth of 1 kHz is also quite sufficient in this case. Aerodynamic parameters require approximately the same bandwidth as kinesiographic signals, even for special fast video recorders.

Electropalatography (EPG) is a special case in this respect, since the palatal contacts must be multiplexed within a short observation time interval. For both theoretical and practical reasons, a 1 ms time resolution between palatal frames is sufficient. Synchronization with video image acquisition systems is achieved by means of a 50 frames per second synchronization signal (European video standard). The 1 kHz bandwidth is sufficient here also.

A bandwidth of 10 kHz is available for speech, in compliance with the recommendations of the European ESPRIT SAM project. A 12-bit A/D converter resolution is sufficient for the dynamic range of the physiological parameters. It is also sufficient for the usual dynamic range of speech, but a 16-bit resolution is preferable, mainly for consistency with the SAM project and with "digital sound".
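As a rough illustration of the resolution trade-off, the ideal dynamic range of an N-bit linear converter is 20*log10(2^N), i.e. about 6 dB per bit. The short Python sketch below only evaluates this textbook formula; it ignores converter noise and is not a measurement of the PHYSIOLOGIA hardware.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ideal dynamic range, in dB, of an N-bit linear A/D converter."""
    return 20.0 * math.log10(2 ** bits)

for bits in (12, 16):
    print(f"{bits}-bit converter: about {dynamic_range_db(bits):.1f} dB")
```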


2. Data Acquisition System

2.1. Acquisition interface

The acquisition interface can record data from up to 6 modules, each of which is either one acoustical input, one group of 8 physiological inputs, or one electropalatograph (fig. 1 - 14K). Only one EPG is allowed, but there may be more than one of the other types of modules. Each module has the bandwidth of one acoustical input. A physiological module has 8 inputs, each with one eighth of the acoustical bandwidth. The EPG module has 8 inputs of 8 contacts.

The standard version of the interface has 2 acoustical inputs, 16 physiological inputs (2 modules of 8), and an EPG. A third acoustical input or a further group of 8 physiological inputs can be added to this configuration. In an extreme case, there could be as many as 6 acoustical inputs or 48 physiological inputs. The interface is used with an Intelligent Instrumentation PCI 200041C-3A acquisition board, with a 12-bit analog input module PCI 20019M-1A, a 12-bit analog output module PCI 20003M and an expander/sequencer module PCI 20031M-1. A 16-bit resolution can be obtained with the modules PCI20341M-1 and PCI20006M-2.

Thanks to the random programming feature of the input multiplexer and the possibility of varying the sampling frequency as a function of the number of acoustical and physiological inputs, several variants of the acquisition configuration can be set up. The acquisition system is thus highly flexible, and can be adapted to fit each new experiment without overloading the processing system.
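The module rules above (at most 6 modules, at most one EPG, 8 inputs per physiological or EPG module sharing one acoustical bandwidth) can be sketched in Python as follows. The names `validate` and `scan_frames` are hypothetical, and the round-robin rotation through the 8 inputs is an assumption made for illustration; the actual multiplexer programming is not described here.

```python
MAX_MODULES = 6
KINDS = ("acoustic", "physio", "epg")

def validate(modules):
    """Check a list of module kinds against the rules stated in the text."""
    if not 1 <= len(modules) <= MAX_MODULES:
        raise ValueError("between 1 and 6 modules are allowed")
    if any(kind not in KINDS for kind in modules):
        raise ValueError("unknown module kind")
    if modules.count("epg") > 1:
        raise ValueError("only one EPG module is allowed")

def scan_frames(modules, n_frames):
    """Return, for each frame, the (module index, input index) pairs scanned.

    Assumption: every module gets one slot per frame; physiological and EPG
    modules rotate through their 8 inputs, so each input is visited once
    every 8 frames (one eighth of the acoustical bandwidth).
    """
    validate(modules)
    frames = []
    for frame_no in range(n_frames):
        frame = []
        for index, kind in enumerate(modules):
            input_no = 0 if kind == "acoustic" else frame_no % 8
            frame.append((index, input_no))
        frames.append(frame)
    return frames

# Standard version: 2 acoustical inputs, 2 physiological modules, 1 EPG.
standard = ["acoustic", "acoustic", "physio", "physio", "epg"]
for frame in scan_frames(standard, 3):
    print(frame)
```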


2.2 Acoustical inputs

The speech signal can be recorded via acoustical input:

-from an electrodynamic microphone (symmetric or asymmetric), an Electret microphone (with a phantom power supply furnished by the system), or a Bruel and Kjaer series 4000 studio microphone (with a 120 volt power supply furnished by the system). This microphone and one other type of microphone can be calibrated for sound level measurements.

-from a line input, for signals recorded on a tape recorder or DAT, or generated by an external transducer such as an electrolaryngograph.

The acoustical inputs are equipped with an anti-aliasing filter (8-pole Cauer) with selectable cutoff frequencies of 4, 5, 8, 10, 16 and 20 kHz. They are also equipped with a peak VU meter and a threshold detector for automatic recording.


2.3 Physiological Inputs

The physiological inputs have a frequency bandwidth equal to one eighth of the acoustical input bandwidth. Three of them can be equipped with anti-aliasing filters identical to the one used for the acoustical inputs, with a fixed cutoff frequency of 1 kHz. Their maximum level is plus or minus 10 volts.


2.4 Electropalatograph (EPG)

The EPG system consists of a coupler device between the acquisition interface and a "Reading" EPG marketed by the Millgrants Company (UK). It can be used to visualize the points where the tongue touches the palate. Tongue contact is detected by 62 electrodes placed in various positions on a palatal plate. The coupler device converts the EPG digital data into eight 8-bit analog signals. This data is thus synchronous with the data corresponding to the acoustical and physiological parameters. Regardless of the configuration of the multiplexer, the synchronisation error is 1 ms for an 8 kHz acoustical bandwidth. Artificial palates are supplied by the Millgrants Company (UK).
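To make the "eight 8-bit signals" idea concrete, the Python sketch below packs one palate frame into eight values between 0 and 255 and counts the contacts. The row layout (a front row of 6 electrodes followed by seven rows of 8, giving 62) and the bit order are assumptions made for illustration; the coupler's real coding is not specified in this document.

```python
# Assumed layout: 8 rows, the front row with 6 electrodes, the others with 8.
ROW_SIZES = [6, 8, 8, 8, 8, 8, 8, 8]  # 62 electrodes in total

def pack_frame(rows):
    """Pack 8 rows of 0/1 contact flags into 8 integers in the range 0..255."""
    if len(rows) != len(ROW_SIZES):
        raise ValueError("a frame must have 8 rows")
    packed = []
    for row, size in zip(rows, ROW_SIZES):
        if len(row) != size:
            raise ValueError("wrong number of electrodes in a row")
        value = 0
        for contact in row:  # leftmost electrode becomes the highest bit
            value = (value << 1) | (1 if contact else 0)
        packed.append(value)
    return packed

def contact_count(packed):
    """Total number of electrodes in contact in a packed frame."""
    return sum(bin(value).count("1") for value in packed)

# Hypothetical frame: full contact on the front row, lateral contact at the back.
frame = [[1] * 6] + [[0] * 8 for _ in range(6)] + [[1, 0, 0, 0, 0, 0, 0, 1]]
print(pack_frame(frame), contact_count(pack_frame(frame)))
```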


2.5 Remote control system

The remote control system can be used to acquire physiological and acoustical parameters on the computer by:

- Starting and stopping acquisition in mode AB or C (international instrumentation standard) upon receipt of a signal from a manual action, from a programmable acoustic signal threshold, or from a device outside the interface.

- Remote control of tape recorders (DAT) or devices outside the interface.

The remote control system can also be used for the automatic real-value calibration of physiological input signals obtained from our custom-made measurement devices.

Video recordings can be synchronised with the acoustical and physiological signals through an extension of the remote control system.


2.6 Acoustical output

The interface has one acoustical output for listening to speech signal files. It is equipped with an amplifier for use with a loudspeaker or earphones. The output signal reconstructing filter is identical to the one used for the acoustical inputs and has the same frequency cutoffs.

As an option, two acoustical outputs can be provided to generate dichotic stimuli for hearing tests.


3. Acquisition driver

This set of programs handles the acquisition of the speech signal and of the various articulatory parameters. Prior to recording, a program is used to create the corpus of sentences to be pronounced. This program displays the corpus on a screen. Sentence display is triggered during the experiment by the remote control system.

The acquisition program includes a data acquisition module in which the user defines the instrumental configuration. The experimenter selects a channel for each parameter and indicates to the system the input numbers and the types of signals used (acoustical or physiological, oral and nasal airflow, oral phonogram, etc.). The level of each signal is adjusted, and bargraphs inform the user if any saturation occurs. A special on-screen window is used to adjust the EPG threshold. Automatic calibration is performed. The sampling frequency is chosen according to the frequency bandwidth of the acoustical signal (4, 5, 8, 10, 16, or 20 kHz) and the number of channels.
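The dependence of the sampling frequency on bandwidth and channel count can be sketched as below. The exact rule used by the acquisition program is not given in this document: the oversampling factor of 2.5 times the filter cutoff and the function name `plan_rates` are assumptions, chosen only to show how the aggregate converter rate grows with the number of modules.

```python
CUTOFFS_KHZ = (4, 5, 8, 10, 16, 20)  # filter cutoffs listed in the text

def plan_rates(cutoff_khz, n_modules, oversampling=2.5):
    """Estimate sampling rates for a configuration (illustrative rule only).

    oversampling is a hypothetical ratio between the per-module sampling
    rate and the anti-aliasing cutoff frequency.
    """
    if cutoff_khz not in CUTOFFS_KHZ:
        raise ValueError("unsupported cutoff frequency")
    if not 1 <= n_modules <= 6:
        raise ValueError("between 1 and 6 modules are allowed")
    per_module = cutoff_khz * 1000 * oversampling
    return {
        "per_module_hz": per_module,
        "aggregate_hz": per_module * n_modules,
        # Each physiological input gets one eighth of a module's bandwidth.
        "per_physio_input_hz": per_module / 8,
    }

print(plan_rates(10, 5))
```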

The acquisition file headers are in RIFF format (a tagged-chunk file structure derived from the EA IFF 85 interchange format). The file headers contain several fields, some of which are optional. The fields include:

- Acquisition specifications (optional): number of channels, number of samples, sampling frequency, resolution (8, 12 or 16-bit), maximum value, minimum value, zero.

- Signal specifications (mandatory): signal code, signal name, number of samples, sampling frequency, largest value, smallest value, maximum calibration, zero calibration, unit of measurement.

- The signal (mandatory).

- Additional information (optional): creation date, comments on corpus, key words, signal source, acquisition software, copyright, recording and storage place, user's name, etc.

- Subject data (optional): name, age, sex, native language, etc.
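For readers unfamiliar with RIFF, a file is a sequence of chunks, each made of a 4-byte identifier, a 32-bit little-endian size, and a payload padded to an even length. The Python sketch below builds and parses such a container. The chunk identifiers (`acq `, `note`) and the form type `PHYS` are hypothetical placeholders; the identifiers actually used by PHYSIOLOGIA files are not given in this document.

```python
import struct

def make_chunk(chunk_id: bytes, data: bytes) -> bytes:
    """Build one RIFF chunk: 4-byte ID, little-endian size, data, even padding."""
    if len(chunk_id) != 4:
        raise ValueError("chunk ID must be 4 bytes")
    pad = b"\x00" if len(data) % 2 else b""
    return chunk_id + struct.pack("<I", len(data)) + data + pad

def make_riff(form_type: bytes, chunks) -> bytes:
    """Wrap chunks in a RIFF container with the given 4-byte form type."""
    body = form_type + b"".join(chunks)
    return b"RIFF" + struct.pack("<I", len(body)) + body

def parse_riff(blob: bytes):
    """Return (form_type, [(chunk_id, data), ...]) for a RIFF byte string."""
    if blob[:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    (size,) = struct.unpack("<I", blob[4:8])
    form_type, pos, end = blob[8:12], 12, 8 + size
    chunks = []
    while pos < end:
        chunk_id = blob[pos:pos + 4]
        (length,) = struct.unpack("<I", blob[pos + 4:pos + 8])
        chunks.append((chunk_id, blob[pos + 8:pos + 8 + length]))
        pos += 8 + length + (length % 2)  # skip the pad byte if any
    return form_type, chunks

# Hypothetical content: 5 channels at 25000 Hz, plus a short comment.
acq = make_chunk(b"acq ", struct.pack("<HI", 5, 25000))
note = make_chunk(b"note", b"abc")
print(parse_riff(make_riff(b"PHYS", [acq, note])))
```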

Acquisitions are generally started and stopped manually by the experimenter or the speaker. The acquisition duration can be very long; it is limited only by the size of the memory. The raw acquisition data file, which contains interleaved segments of data pertaining to the various parameters, is split into as many files as there are physiological signals. The acoustic signals require an additional mixing operation. The parameter files are then saved on disk.
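The splitting step can be illustrated with the simplest possible case, in which every channel is sampled once per scan so that the raw buffer is a plain round-robin interleave. This is an assumption made for the sketch; the real raw file layout, and the mixing step for the acoustic signals, are not detailed here.

```python
def demultiplex(raw, n_channels):
    """Split a round-robin interleaved sample list into one list per channel."""
    if n_channels < 1:
        raise ValueError("at least one channel is required")
    if len(raw) % n_channels:
        raise ValueError("raw buffer ends with an incomplete scan")
    return [raw[channel::n_channels] for channel in range(n_channels)]

# Three channels, three scans: samples arrive as c0, c1, c2, c0, c1, c2, ...
raw = [0, 10, 20, 1, 11, 21, 2, 12, 22]
for channel, samples in enumerate(demultiplex(raw, 3)):
    print(channel, samples)
```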

In practice, acquisition is done one sentence at a time. However, it is possible to do real-time acquisition of long signals directly on disk. In this case, the maximum sampling frequency depends on the type of PC used and several parameters such as the type of bus (ISA or EISA), local bus (VESA or PCI), the disk interface (SCSI or IDE), etc...

In that case, the split and mixing operations are performed on disk and take much longer. The parameter files can then be saved on magnetic tape or on some other storage device (magneto-optical disk).


4. Signal Editing and Data Processing

The signal editing module is the PHONEDIT software system, which runs in the Microsoft WINDOWS (3.11 and 95) environment. It is used to visualize, segment, mark off, measure, and process the recorded parameters. PHONEDIT also recognizes most current types of files: ACCOR Edit System (Reading), SAM (GERSON, BDSONS, etc.), ILS, Microsoft WAVE (Multimedia on PC), KAY CSL, SIGNALYZE, SOUND WAVE, ASCII format, RAW and RAW UNIX. A special program (DLL) makes it possible to support custom file formats.

Icons, pull-down menus, and numerous utility programs make the system very easy to use.

PHONEDIT runs under WINDOWS 3.1 or 95, so a spreadsheet such as QUATTRO PRO (Borland) or EXCEL (Microsoft), or a database management system such as PARADOX (Borland) or ACCESS (Microsoft), can be used directly to process the data output by the PHYSIOLOGIA system.

For labelling and data reduction operations, the PHONEDIT software can be used without the acquisition interface. In this case, PHYSIOLOGIA files can be played back via WINDOWS multimedia boards such as the SOUNDBLASTER. These cards can also be used to record acoustical signals alone (on two channels).

You can download the shareware version of PHONEDIT (1.2 MB).

4.1 Measurement operations

One-cursor operations:

By placing a single cursor on a curve, the user can measure linear or logarithmic amplitudes, calculate the spectrum (FFT or otherwise), visualize tongue-palate contacts (EPG) or organ movements (Movetrack), or insert a label or segmentation marker (alphanumeric, symbolic, or phonetic alphabet) on a line reserved for that purpose (fig. 2 - 600K).

Parameter amplitudes can be calibrated (1) by the calibration signal (zero and maximum) generated by certain measuring devices, (2) by specifying the parameter maximums and units to the computer, or (3) automatically, if our custom-made instruments equipped with a calibration system are used.
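Calibration from a zero level and a known maximum amounts to a two-point linear mapping from raw converter values to physical units. The Python sketch below shows that mapping; the function name and the example numbers (a 12-bit converter reading 2048 at zero flow and 3072 at 1 dm3/s) are hypothetical, and a linear transducer response is assumed.

```python
def make_calibration(raw_zero, raw_max, physical_max, physical_zero=0.0):
    """Return a function converting raw values to physical units.

    raw_zero and raw_max are the raw readings recorded for the zero and
    maximum calibration signals; a linear response is assumed in between.
    """
    if raw_max == raw_zero:
        raise ValueError("zero and maximum readings must differ")
    gain = (physical_max - physical_zero) / (raw_max - raw_zero)

    def to_physical(raw):
        return physical_zero + gain * (raw - raw_zero)

    return to_physical

# Hypothetical airflow channel: 2048 counts = 0 dm3/s, 3072 counts = 1 dm3/s.
airflow = make_calibration(2048, 3072, 1.0)
print(airflow(2560))
```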

Two-cursor operations:

Between two cursors, the user can zoom and scroll, measure durations, count events, listen to the signal, and calculate the linear or logarithmic difference, the integral, mean, standard deviation, or coefficient of variation. It is also possible to create a cursor of variable length (zone cursor) which defines a moving window. The same statistical treatments can be applied within this window as between two cursors.
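The two-cursor measurements can be sketched as follows for a uniformly sampled curve. This is an illustration under stated choices, not PHONEDIT's own code: the integral uses the trapezoidal rule, the standard deviation is the sample (n - 1) form, and the function name is hypothetical.

```python
import math

def between_cursors(samples, fs, t1, t2):
    """Measurements on the part of a curve lying between two cursor times (s)."""
    i1, i2 = sorted((round(t1 * fs), round(t2 * fs)))
    segment = samples[i1:i2 + 1]
    n = len(segment)
    if n < 2:
        raise ValueError("the cursors must enclose at least two samples")
    mean = sum(segment) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in segment) / (n - 1))
    # Trapezoidal rule: average of neighbouring samples times the time step.
    integral = sum((a + b) / 2 for a, b in zip(segment, segment[1:])) / fs
    return {
        "duration_s": (i2 - i1) / fs,
        "difference": segment[-1] - segment[0],
        "integral": integral,
        "mean": mean,
        "std": std,
        "cv": std / mean if mean else float("nan"),
    }

ramp = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # sampled at 10 Hz
print(between_cursors(ramp, 10, 0.2, 0.6))
```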


4.2 External functions

External functions are operations applied to parameter curves. They are implemented as a dynamic link library (DLL) organized into three modules.

Computation module:

This module executes the following operations: sum or difference of two curves, absolute value, integration (variable-length steps), powers, roots, logarithms, RMS, quadratic spline, level quantification, variable time shifts, phase inversion, synchronous mean, EPG sum elevation and translation, and interpretation of EMA (Movetrack or Articulograph) movements (fig. 3 - 600K). A statistical library providing shimmer and jitter measurements can be used in addition to the treatments available with the zone cursors.
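Jitter and shimmer have several definitions, and this document does not say which one the statistical library uses. The Python sketch below uses one common form, the mean absolute difference between consecutive cycles divided by the mean value, applied to periods for jitter and to peak amplitudes for shimmer.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean."""
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

def jitter(periods_s):
    """Cycle-to-cycle perturbation of the glottal period (ratio, not percent)."""
    return local_perturbation(periods_s)

def shimmer(amplitudes):
    """Cycle-to-cycle perturbation of the peak amplitude (ratio, not percent)."""
    return local_perturbation(amplitudes)

periods = [0.010, 0.011, 0.010, 0.011]  # hypothetical periods in seconds
print(f"jitter = {100 * jitter(periods):.2f} %")
```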

A detailed source file in the form of a DLL skeleton is available to allow users to develop their own specific operations.

Acoustic analysis module:

This module performs frequency-amplitude analysis (FFT, LPC, 1/3 octave, critical bands, long-term spectrum) and time-frequency-amplitude analysis (wide or narrow band sonograms followed by formants) (fig. 4 - 600K).
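As a minimal illustration of frequency-amplitude analysis, the Python sketch below computes the magnitude spectrum of one Hann-windowed frame. It uses a direct discrete Fourier transform written with the standard library, which gives the same values as an FFT but more slowly; it is not the module's own algorithm, and the test tone is hypothetical.

```python
import cmath
import math

def magnitude_spectrum(frame, fs):
    """Return (frequency in Hz, magnitude) pairs for one windowed frame.

    A direct O(N^2) DFT is used for clarity; an FFT yields the same result.
    """
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]
    x = [sample * w for sample, w in zip(frame, window)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append((k * fs / n, abs(acc)))
    return spectrum

# Hypothetical test signal: a 1000 Hz tone sampled at 8000 Hz, 64 samples.
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(64)]
peak_frequency, _ = max(magnitude_spectrum(tone, fs), key=lambda pair: pair[1])
print(peak_frequency)
```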

Pitch analysis module:

This module offers two methods of pitch detection, based on the comb method and on the AMDF (average magnitude difference function) (fig. 5 - 600K).
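The AMDF principle can be sketched in a few lines of Python: the mean absolute difference between the signal and a delayed copy of itself dips towards zero when the delay equals the pitch period. The search range, the 10 % threshold and the rule of taking the first dip (to avoid picking a multiple of the period) are choices made for this sketch, not details of the PHONEDIT module.

```python
import math

def amdf_pitch(frame, fs, f_min=60.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) of one frame with the AMDF."""
    lag_min, lag_max = int(fs / f_max), int(fs / f_min)
    if lag_max >= len(frame):
        raise ValueError("frame too short for the requested f_min")
    lags = range(lag_min, lag_max + 1)
    values = []
    for lag in lags:
        n = len(frame) - lag
        values.append(sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n)
    # Take the first dip that falls near the global minimum, then walk down
    # to the bottom of that dip; this avoids selecting a period multiple.
    threshold = min(values) + 0.1 * (max(values) - min(values))
    i = next(index for index, value in enumerate(values) if value <= threshold)
    while i + 1 < len(values) and values[i + 1] < values[i]:
        i += 1
    return fs / lags[i]

# Hypothetical test signal: a 200 Hz tone sampled at 8000 Hz.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * t / fs) for t in range(400)]
print(amdf_pitch(tone, fs))
```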

The melodic curves can be modelled by spline and automatic detection of target points.

All measurement results (time, amplitude, frequency, statistics, etc.) can be saved to a backup table that can be read by commercial spreadsheets (EXCEL, QUATTRO, etc.) (fig. 6 - 600K). A utility program allows the different screen windows to be printed; they can also be moved by "drag and drop" to editing programs such as WORD.

The screens can be printed in colour or black and white on any printer supported by WINDOWS.


5. Measurement devices

The PHYSIOLOGIA environment offers a set of transducers and measurement devices which are particularly suited to the study of the mechanisms of speech production. They are normally used on the EVA medical workstation for speech and voice pathologies.

5.1 Aerodynamic transducers

The aerodynamic transducers perform what we call the aerophonometric function, i.e. they measure the air pressure levels in the vocal tract and the resulting inhaled and exhaled, oral and nasal airflow rates, as a function of the movement of the articulators.

Oral airflow transducer:

The oral airflow rate is measured by a grid pneumotachograph (PTG) with a low dead volume, good linearity, and a wide range on 6 scales: 5, 2, 1, 0.5, 0.2, and 0.1 dm3 per second (or liters/second). The interface between the subject's face and the transducer is achieved via a flexible silicone rubber mouthpiece.

Nasal airflow transducer:

The nasal airflow transducer is identical to the oral airflow transducer. Its location and shape were designed to guarantee maximum nasal air evacuation and measurement accuracy. The nasal airflow rate is measured at the nostrils using silicone nosepieces of various sizes. The measurement scales and ranges are the same as for the oral airflow transducer.

Intraoral and subglottic pressure:

These parameters are measured with pressure transducers on 6 scales: 100, 50, 20, 10, 5, and 2 hPa (mbar).


5.2 Instantaneous pitch meter

The pitch meter measures the instantaneous vibration frequency of the larynx with great accuracy, period by period, based on the speech signal acquired by microphone or by the laryngeal transducer. It operates in real time, and has four measurement scales: 250, 500, 1000, and 2000 Hz.
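"Period by period" measurement means that each glottal cycle yields its own frequency value, the reciprocal of that cycle's duration. The Python sketch below shows this computation and picks the smallest of the four scales that covers the result. How the device detects cycle boundaries is not described here, so the cycle onset times are taken as given, and both function names are hypothetical.

```python
SCALES_HZ = (250, 500, 1000, 2000)  # measurement scales listed in the text

def instantaneous_f0(cycle_times_s):
    """One frequency value (Hz) per cycle, from successive cycle onset times."""
    return [1.0 / (t2 - t1) for t1, t2 in zip(cycle_times_s, cycle_times_s[1:])]

def smallest_scale(f0_values):
    """Smallest measurement scale that covers the highest frequency."""
    peak = max(f0_values)
    for scale in SCALES_HZ:
        if peak <= scale:
            return scale
    raise ValueError("frequency exceeds the largest scale")

onsets = [0.000, 0.010, 0.020, 0.028]  # hypothetical cycle onsets in seconds
f0 = instantaneous_f0(onsets)
print(f0, smallest_scale(f0))
```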


5.3 Intensity detector (sonometer)

The intensity detector is a sound level meter. It measures the logarithm of the RMS value of the speech signal. Its integration time constant is 10 ms (50 ms can be selected for very low male voices), and its bandwidth is 20 kHz.

Standard "A" frequency weighting is also available. The detector range is 100 dB on a single scale (20 to 120 dB). It is calibrated for Bruel and Kjaer 4000 microphones, and can also be calibrated for one other microphone.
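A software analogue of such a detector can be sketched in Python as a running mean of the squared signal with an exponential time constant, converted to decibels. This is an illustration only: the device itself is hardware, the first-order averaging is an assumption, and `ref` is a hypothetical calibration value (a real sound level in dB SPL requires the microphone calibration mentioned above). No frequency weighting is applied.

```python
import math

def level_db(samples, fs, tau=0.010, ref=1.0):
    """Running level in dB with an exponential integration time constant tau (s).

    The level is 10*log10 of the averaged squared signal, which equals
    20*log10 of the RMS value, relative to the reference amplitude ref.
    """
    alpha = 1.0 - math.exp(-1.0 / (tau * fs))
    mean_square = 0.0
    levels = []
    for x in samples:
        mean_square += alpha * (x * x - mean_square)
        # A small floor avoids log10(0) during silence.
        levels.append(10.0 * math.log10(max(mean_square, 1e-20) / (ref * ref)))
    return levels

# Hypothetical test signal: a unit-amplitude 1000 Hz tone, 0.2 s at 20 kHz.
fs = 20000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(4000)]
print(round(level_db(tone, fs)[-1], 1))
```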

