The Binaural Properties Of Masking

 

An under-explored area of psychoacoustics is the effect, if any, that binaural higher-order processing has on masking. Much has been written on auditory masking in general (Zwicker & Fastl, 1990) and on binaural audio perception, particularly with regard to sound localization in three-dimensional space (Begault, 1994). Is there a relationship between the two areas of study, and if so, what does it tell us about our ability to discriminate between audio objects in our environment?

The present investigation focused on testing the perception of tones masked by white noise of two types. In one half of the experiments, the masking noise was monophonic, that is, presented identically to each ear. In the other half, the noise was digitally generated separately for each ear, with no simple mathematical relationship between the two sources beyond the fact that both were randomly generated by the same algorithm.
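The difference between the two masker types can be illustrated with a minimal sketch. The sample rate, the seeds, the uniform distribution and the function names below are assumptions made for illustration; the noise generator actually used in the experiments is not specified here.

```python
import random

def make_noise(n_samples, seed):
    """Return n_samples of uniform white noise in the range -1 to 1."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

N = 44100  # one second at an assumed 44.1 kHz sample rate

# Monophonic masker: the same samples are sent to both ears.
mono_left = make_noise(N, seed=1)
mono_right = list(mono_left)

# Binaural masker: same algorithm, but an independent random stream per ear.
stereo_left = make_noise(N, seed=1)
stereo_right = make_noise(N, seed=2)

mono_corr = correlation(mono_left, mono_right)
stereo_corr = correlation(stereo_left, stereo_right)
print(f"mono masker interaural correlation:   {mono_corr:.3f}")
print(f"stereo masker interaural correlation: {stereo_corr:.3f}")
```

In this sketch the monophonic masker is perfectly correlated between the ears, while the separately generated masker has an interaural correlation close to zero.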

Another issue was that of the masked tone itself. Towards the end of the tests, subjects might perceive imaginary masked tones due to auditory fatigue (Roederer, 1995). To avoid this potential listener bias, a varying tone was used to promote interest. The tone was a pure sine wave varying across the octave from 1 kHz to 2 kHz every second. The resulting warble is much like a police siren, an attention-grabbing sound even at very low levels. Although the level of the masker was constant throughout the tests, the level of the siren faded gradually from -22 dB (0 dB representing the maximum for a 16-bit audio sample) to minus infinity. The fade-out method was deemed superior to having the sound fade in over the noise, as it gave the subjects an immediate and clear initial reference for the tone they were listening for.
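A rough sketch of such a siren tone follows. The sample rate, the sinusoidal shape of the frequency sweep and the linear amplitude fade are all assumptions; only the 1 kHz to 2 kHz range, the one-second cycle, the -22 dB starting level and the 15-second length come from the description of the samples.

```python
import math

SAMPLE_RATE = 44100      # assumed; only 16-bit audio is specified
DURATION_S = 15.0        # sample length given in the text
START_LEVEL_DB = -22.0   # starting level relative to 16-bit full scale

def db_to_amplitude(db):
    """Convert a level in dB (0 dB = full scale) to a linear amplitude."""
    return 10.0 ** (db / 20.0)

def siren(sample_rate=SAMPLE_RATE, duration_s=DURATION_S):
    """Return a warbling sine tone that fades from START_LEVEL_DB to silence."""
    n = int(sample_rate * duration_s)
    start_amp = db_to_amplitude(START_LEVEL_DB)
    samples = []
    phase = 0.0
    for i in range(n):
        t = i / sample_rate
        # Assumed sweep shape: a sinusoidal glide between 1 kHz and 2 kHz,
        # completing one cycle per second.
        freq = 1500.0 + 500.0 * math.sin(2.0 * math.pi * t)
        # Assumed fade shape: linear in amplitude, reaching zero at the end.
        amp = start_amp * (1.0 - i / n)
        samples.append(amp * math.sin(phase))
        # Accumulate phase so the frequency changes without discontinuities.
        phase += 2.0 * math.pi * freq / sample_rate
    return samples

tone = siren()
print(f"{len(tone)} samples, starting amplitude {db_to_amplitude(START_LEVEL_DB):.4f}")
```

A level of -22 dB corresponds to a linear amplitude of roughly 0.079 of full scale, so the siren starts well below the maximum sample value even before the fade begins.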

Completeness demanded testing not only the effect of monaural vs. binaural masking of a monophonic tone, but also the possibilities inherent in making the tone binaural as well. However, the same type of stereo spatialization used to differentiate the left and right noise sources could not be used here; this would require that the masked tone also be a noise source, which would defeat the experiment! Instead, I used the tone modulation facility of the Cool Edit sound editing program to vary the phase of the siren tone between the left and right ears. This binaural phase modulation, hereafter referred to as the "Stereo Phase Differential" (SPD) of the masked tone, swept through 360 degrees 0, 1, 10 or 100 times per second, depending on the sample under test. The zero case is the same as a monophonic tone: there is no difference in phase between the left and right ears for the masked sound over the course of the sample (15 seconds).
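The idea behind the SPD can be sketched as follows. This is not the Cool Edit implementation, whose internals are not described here. It assumes the interaural phase offset ramps linearly from 0 to 360 degrees and wraps once per SPD cycle, and it uses a fixed-frequency tone rather than the siren to keep the example short. The function names are hypothetical.

```python
import math

def spd_offset(t, rate_hz):
    """Interaural phase offset in radians at time t (assumed linear ramp)."""
    return 2.0 * math.pi * ((rate_hz * t) % 1.0)

def stereo_tone(freq_hz, rate_hz, duration_s, sample_rate=44100):
    """Return (left, right) channels of a sine tone with an SPD applied."""
    left, right = [], []
    n = int(sample_rate * duration_s)
    for i in range(n):
        t = i / sample_rate
        base = 2.0 * math.pi * freq_hz * t
        left.append(math.sin(base))
        # Only the right channel is shifted; the left is the reference.
        right.append(math.sin(base + spd_offset(t, rate_hz)))
    return left, right

# SPD of 0 Hz: both channels are identical, i.e. a monophonic tone.
mono_l, mono_r = stereo_tone(1000.0, 0, 0.1)

# SPD of 10 Hz: the offset passes through 180 degrees halfway through
# each 0.1-second cycle, where the two channels are in antiphase.
spd_l, spd_r = stereo_tone(1000.0, 10, 0.1)
max_diff = max(abs(l - r) for l, r in zip(spd_l, spd_r))
print(f"largest left/right difference at 10 Hz SPD: {max_diff:.2f}")
```

Under this linear-ramp assumption, a steady SPD of r Hz is equivalent to shifting the frequency of one channel by r Hz, which may be one reason the higher rates sound so different from the lower ones.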

The SPD sound treatment was a compromise made for the sake of a simplified presentation method, one that would require no hardware beyond a personal computer with a stereo 16-bit soundcard and headphones. The perception of binaural delays, as in the Haas effect (Begault, Pohlmann, 1995), represents a continuum of experience, not merely an on/off effect. In this regard, the ideal situation would have been for subjects to control the binaural relationship of the masked siren tone themselves, varying it while keeping it just barely masked (or audible). This would test whether or not a delay relationship alone is sufficient to make a masked tone hide or reveal itself.

With the Stereo Phase Differential effect, the phase modulation "stereoizes" the masked tone by sweeping the phase of the signal between the left and right ears from 0 to 360 degrees at a rate of 0 to 100 times per second. Four sweep rates of different orders of magnitude (0, 1, 10 and 100 Hz) were used, in an attempt to cover the broadest range of experience with a minimum number of samples. In all, each subject listened to a suite of 8 samples of 15 seconds each, with 3 seconds of silence between samples. Details of the sample suite, along with generalizations as to their effect, are presented in Appendix II.
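The eight samples follow from crossing the two masker types with the four SPD rates, which can be enumerated in a short sketch. The ordering shown is hypothetical (the actual presentation order is given in Appendix II), and the total running time assumes silence only between samples, not before the first or after the last.

```python
from itertools import product

MASKERS = ("mono", "stereo")
SPD_RATES_HZ = (0, 1, 10, 100)
SAMPLE_S = 15  # length of each sample in seconds
GAP_S = 3      # silence between consecutive samples in seconds

# Every combination of masker type and SPD rate: 2 x 4 = 8 samples.
suite = list(product(MASKERS, SPD_RATES_HZ))

# Assumed layout: gaps occur only between samples, so there are 7 of them.
total_s = len(suite) * SAMPLE_S + (len(suite) - 1) * GAP_S

for masker, rate in suite:
    print(f"{masker:6s} masker, {rate:3d} Hz SPD")
print(f"total running time: {total_s} s")
```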

Taken collectively, the masked tone durations gathered from the subjects vary widely, which is to be expected. Just as frequency response differs from individual to individual, masking perceptions, familiarity with the test, and even hand-eye coordination during the evaluation process all have an effect on individual performance. Table 2, for example, shows that the perceived tone duration at 0 Hz SPD with a mono noise mask ranged between 8.975 and 13.777 seconds. Graphically, there is little value to be gained from analyzing the data at the absolute level (Figures 1, 2); it is only when the performances of individuals are analyzed over the entire test suite and plotted against those of others that a pattern begins to emerge (Figures 3, 4, 5).

In the Figures, one can see an overall downward slope in aural sensitivity to the masked tone, reflected as a shortening of the durations as the SPD of the samples increases. This is true for both binaural and monophonic masks, and is best summed up in Table 4 and Figure 5. With a mono mask, the average perceived tone duration declines from a high of 10.60 seconds at 0 Hz SPD to a low of 8.59 seconds at 100 Hz SPD. For stereo masks the decline is similar, although the mask is in general less effective: from a high of 11.05 seconds at 0 Hz SPD to a low of 10.10 seconds at 100 Hz SPD.
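The size of these declines can be worked out directly from the averages quoted above. The short sketch below uses only those four figures; the dictionary layout and function name are illustrative.

```python
# Average perceived tone durations in seconds, as quoted in the text.
averages = {
    "mono": {"0 Hz": 10.60, "100 Hz": 8.59},
    "stereo": {"0 Hz": 11.05, "100 Hz": 10.10},
}

def decline(masker):
    """Return the drop in seconds and in percent from 0 Hz to 100 Hz SPD."""
    start = averages[masker]["0 Hz"]
    end = averages[masker]["100 Hz"]
    drop_s = start - end
    return drop_s, 100.0 * drop_s / start

for masker in averages:
    drop_s, drop_pct = decline(masker)
    print(f"{masker:6s} masker: {drop_s:.2f} s shorter ({drop_pct:.1f}% decline)")
```

The mono mask shortens the average audible duration by 2.01 seconds (about 19%), against 0.95 seconds (about 8.6%) for the stereo mask, so the gap between the two maskers widens from 0.45 seconds at 0 Hz SPD to 1.51 seconds at 100 Hz SPD.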

At this point, some general conclusions can be drawn. Essentially, monaural masks appear to be more effective in masking tones than their binaural counterparts. Just how much more effective depends on the content of the masked tone, with monaural masks masking heavily stereo-processed sounds the best (Table 4: Mono Masker vs. 100 Hz SPD), and binaural masks masking monaural tones the least (Stereo Masker vs. 0 Hz SPD). What does this say about our ability to focus on sounds that are unstable in the stereo field?

It certainly appears that, in this respect, some combination of our physical auditory mechanisms and brain processing discriminates against sounds that are binaurally unstable. One theory might be that, at the level of the brain where the left and right audio sensory inputs combine, a reinforcement/cancellation effect arises from the phase differences between the signals, and that this promotes the greater audibility (or maskability, in the case of noise) seen with monaural tones. At higher SPD rates, the phase cancellation effects may be so rapid that higher-order processing centers of the brain do not have sufficient time to "lock on" to the tones during the periods when they are in phase when summed to mono, which would explain the increasing ability of such tones to be masked. Similarly, the binaural masking noise would cancel itself out randomly to a certain degree when summed to mono, which could account for its lower effectiveness. Perhaps when the two situations are combined (binaural noise with high-SPD tones), the listening situation is most fully handicapped, as can be seen in Table 4 and Figure 5. A subject for further study would be to perform these tests again, replacing the SPD-modulated siren tone with one spatialized instead by a suite of fixed short delays.
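The reinforcement/cancellation idea can be illustrated numerically. The sketch below assumes a simple sum-to-mono model of the two ear signals, which is a deliberate simplification and not a claim about how the auditory system actually combines them. The Gaussian noise, seeds and function names are likewise assumptions.

```python
import math
import random

def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def summed_tone_amplitude(phase_offset_rad):
    """Peak amplitude of sin(a) + sin(a + offset), i.e. 2*|cos(offset/2)|."""
    return 2.0 * abs(math.cos(phase_offset_rad / 2.0))

N = 50000
rng_left = random.Random(1)
rng_right = random.Random(2)
left = [rng_left.gauss(0.0, 1.0) for _ in range(N)]
right = [rng_right.gauss(0.0, 1.0) for _ in range(N)]

# Identical noise in both ears adds coherently when summed.
identical_sum = [2.0 * v for v in left]
# Independent noise in each ear partially cancels when summed.
independent_sum = [l + r for l, r in zip(left, right)]

gain_identical_db = 20.0 * math.log10(rms(identical_sum) / rms(left))
gain_independent_db = 20.0 * math.log10(rms(independent_sum) / rms(left))
print(f"identical noise summed:   +{gain_identical_db:.2f} dB")
print(f"independent noise summed: +{gain_independent_db:.2f} dB")

# A tone summed with a phase-shifted copy of itself.
for degrees in (0, 90, 180):
    amp = summed_tone_amplitude(math.radians(degrees))
    print(f"tone with {degrees:3d} degree offset sums to amplitude {amp:.2f}")
```

Under this model, identical noise gains about 6 dB when summed while independent noise gains only about 3 dB, and a tone with a 180 degree interaural offset cancels entirely. Both results are consistent with the direction of the effects described above, though they do not establish the mechanism.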

 

Appendix I