One of the most important aspects of soundscape management is the maintenance of communication capabilities. Achieving stable communications is particularly challenging, as communication both contributes to and competes with the soundscape in which it takes place. The types of communication required may be verbal or nonverbal, two-way or broadcast, face-to-face or remote, emergency or routine.
Effective communication requires that a message’s content, delivery mechanism, sound characteristics, and receiver be compatible. To design an effective communication system, due consideration must be given to the sender (e.g. speaker), the receiver (listener), and everything in between.
This installment of the “Occupational Soundscapes” series explores characteristics of and interactions among ambient sound, messages or signals, and auditory capabilities to provide the conceptual background needed to establish communication system requirements. There is an emphasis on speech communication, given its prevalence and challenges in workplaces.
Auditory Fitness for Duty
Auditory Fitness for Duty (AFFD) standards are most often associated with military service and related occupations. An AFFD evaluation is an assessment of one’s auditory capabilities with respect to the safe and efficient performance of one’s duties. AFFD is being supplanted in military applications, but it continues to serve as a cautionary tale.
Even without details of the test and scoring procedures, serious flaws can be seen in the AFFD recommendation chart shown in Exhibit 1. Examples include:
Experience tends to improve test scores. Some improvement can be attributed to genuine increases in task performance via learning curves. However, experience with a test procedure can also inflate scores; in such cases, the true performance of the less-experienced test-taker may actually be higher than the scores suggest. Unless length of service can be shown to improve task performance beyond that demonstrated by the test, it should not be a factor in the recommendation. Overreliance on arbitrary criteria can lead to bizarre and dangerous results.
The concept of functional hearing is more useful in determinations of fitness for duty. It refers to auditory capability sufficient to maintain situational awareness and speech communication and to perform other tasks that require audition. An assessment of functional hearing requires testing or monitoring an individual’s task performance in the setting of concern. Results obtained in a laboratory setting or using special controls may not be representative of performance in the task environment. However, other data, such as audiometric and soundfield measurements, are complementary and may have diagnostic value.
Whether subject to an AFFD protocol or less-formal evaluation, an organization must ensure that “hearing-critical” tasks are assigned only to those with the auditory capabilities necessary to perform them successfully. A hearing-critical task is one with the following three characteristics:
Signal-to-Noise Ratio (S/N or SNR) is a fundamental concept in communication system design. The series, thus far, has focused on the noise, but effective communication requires that attention also be paid to the signal and to a key relationship between the two.
Conceptually, S/N is the detectability of a signal in the presence of noise. Mathematically, it is the ratio of signal power to noise power: S/N = Wsig/Wnoise. Following the convention of sound levels (see Part 3), S/N is typically expressed in decibels (dB) using pressure values: S/N = 20 log (psig/pnoise) dB. Because SPL = 20 log (p/pref), the reference pressure cancels when one level is subtracted from another; measurements recorded in decibels therefore yield the simple expression
S/N = (SPLsig – SPLnoise) dB.
Positive S/N values (ratios > 1.0) identify signals whose intensity exceeds that of the accompanying noise. For example, an S/N of 10 dB indicates that the signal is 10 dB “louder” than the noise. Conversely, negative S/N values (ratios < 1.0) identify signals of lower intensity than the noise. An S/N of 0 dB (ratio = 1.0) indicates a signal and noise of equal intensity.
One possible framework for the use of S/N is to treat the intensity of noise, with any controls (explored further in future installments) active, as an independent variable and signal intensity as a dependent variable. Target S/N is the parameter used to determine the appropriate intensity of a given signal. This simple relation yields a series of parallel lines, as shown in Exhibit 2.
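To make these relationships concrete, consider the minimal Python sketch below; the function and variable names are illustrative, not part of the series. It computes S/N from two measured levels and the signal level required to achieve a target S/N above a given noise level:

```python
def snr_db(spl_signal_db: float, spl_noise_db: float) -> float:
    """S/N in decibels is simply the difference of the two sound pressure levels."""
    return spl_signal_db - spl_noise_db

def required_signal_spl(spl_noise_db: float, target_snr_db: float) -> float:
    """Signal SPL needed to maintain a target S/N above the measured noise."""
    return spl_noise_db + target_snr_db

# Hypothetical example: noise measured at 72 dB with a +10 dB S/N target.
print(snr_db(82.0, 72.0))               # 10.0 dB
print(required_signal_spl(72.0, 10.0))  # 82.0 dB
```

Plotting required_signal_spl against noise level for several target S/N values reproduces the parallel lines of Exhibit 2.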
S/N is a simple but important concept, relevant to all forms of communication. As such, it can be cited in reference to verbal (i.e. speech) and nonverbal communication, with or without the use of electronic equipment (e.g. telephone, radio, amplifier, loudspeaker). When referencing speech communication, S/N may be called the speech-to-noise ratio. The adjustment in terminology serves only to specify the type of signal under scrutiny; definitions and application do not change.
Masking is a phenomenon that causes a signal or message to be more difficult to hear or decipher in the presence of other sounds. Standard audiometric tests determine one’s absolute threshold – the lowest intensity at which a sound is audible in quiet. The lowest intensity at which a sound is audible in the presence of other sounds is called the masked threshold. The difference between the two – that is, the magnitude of the increase in hearing threshold – is the amount, or level, of masking caused by extraneous sound.
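Expressed as a calculation (a trivial sketch; the threshold values below are hypothetical):

```python
def masking_db(masked_threshold_db: float, absolute_threshold_db: float) -> float:
    """Level of masking = masked threshold minus absolute (quiet) threshold."""
    return masked_threshold_db - absolute_threshold_db

# Hypothetical: a tone audible at 15 dB in quiet requires 42 dB in noise,
# so the extraneous sound produces 27 dB of masking.
print(masking_db(42.0, 15.0))  # 27.0
```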
“Extraneous sound” is often referred to as “noise” to simplify the presentation. However, the common definition of noise – “unwanted sound” – may not be fully applicable. In fact, several coincident sounds may be necessary, or “wanted,” such as warning signals or other feedback sounds. In this context, a modified definition of “noise” is helpful: “any sound other than the sound of current interest.” This definition accounts for individual analysis of multiple sounds that cannot or should not be eliminated from the soundscape. The “other” sound is also called the masking sound or masker.
Masking of a signal can occur in several ways. The most prevalent is direct masking, which occurs when the signal and masker have similar frequencies. The area of the cochlea needed to process the signal (see Part 2) is preoccupied with the masker and has no capability for “attention shift.” The signal cannot, therefore, be perceived as a distinct input.
Whether pure tone, narrowband, or broadband, a masker’s influence extends beyond its constituent frequencies. Frequencies lower than the masker are masked to some degree, but to a much lesser extent than frequencies higher than the masker. This phenomenon is called the upward spread of masking and is the reason that high frequencies are more susceptible to masking than are low frequencies.
Example masking curves, for a range of pure tone frequencies, are shown in Exhibit 3; the value on each curve is the level of the masking tone (frequency shown at top of each plot) above its threshold. Several characteristics of the masking phenomenon can be seen in these plots of masking vs. frequency, including:
A high-frequency band of noise, at high intensity (> 80 dB), can mask pure tones at low frequency. This phenomenon is known as remote masking. It is believed to be a result of low-frequency distortion caused by overstressing the auditory system. This effect can be reduced by filtering.
Interaural masking occurs when one ear receives the signal while the other receives the noise. A masking sound at a level at least 50 dB greater than the signal is needed for significant masking of this type to occur.
Central masking occurs when sound incident on one ear raises the threshold of the opposite ear. It is believed to be negligible and, accordingly, receives little attention.
Adding noise is counterintuitive, but can provide a benefit in certain conditions. If signal and noise are received in one ear, presenting the other with a 100-Hz-wide band of noise, unrelated to the signal or noise in the first ear, provides ~1 dB “release from masking.” A release from masking is a lowering of masked threshold.
Once signal intensity exceeds that of its masker by a few decibels, it seems as loud as it would in the absence of the masker. Loudness of a signal increases more rapidly above its masked threshold in noise than it does in quiet. These points are demonstrated by the converging curves in Exhibit 5.
The preceding discussion of masking focused on various effects of frequency on the audibility of signals, but there is also a temporal component. Forward and backward masking refer to an increase in the threshold of a signal caused by a sound occurring before or after it, respectively.
Forward masking – when a signal follows a masker – is somewhat intuitive. The cochlea must be “freed” from its prior stimulation in order to process the next. Though brief, this refractory period should not be ignored.
Backward masking – when a signal precedes a masker – is much more difficult to comprehend. It involves complex interactions in the auditory system, the exploration of which is beyond the scope of this series. For purposes of this discussion, it is accepted as a genuine phenomenon supported by research detailed in cited references.
A graphical representation of forward and backward masking is provided in Exhibit 6. The break in the graph represents a 5-ms-duration “probe tone” (signal). Backward masking of the tone is presented, in “negative time,” to the left of the break and forward masking to the right. The smaller threshold shift experienced with dichotic presentation (signal in one ear, masker in the other) further demonstrates the advantage of binaural listening.
When the sequence of auditory inputs is important, sufficient delay must exist between signals to allow the listener to determine which occurred first. With a 2 – 3 ms delay, two distinct signals can be recognized, but the sequence is indeterminable. A 10 – 20 ms delay is required to correctly identify the sequence of two sounds received.
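These timing requirements can be captured in a small helper (a sketch; the category labels are mine, and the boundaries use the upper ends of the ranges quoted above to be conservative):

```python
def sequence_resolution(delay_ms: float) -> str:
    """Classify what a listener can resolve for two brief signals separated by delay_ms.
    Boundaries are conservative values from the ranges quoted in the text."""
    if delay_ms < 3:    # below ~2-3 ms, the signals may fuse into one percept
        return "may be heard as a single sound"
    if delay_ms < 20:   # distinct, but order is unreliable until ~10-20 ms
        return "two sounds, order uncertain"
    return "two sounds, order identifiable"

print(sequence_resolution(5.0))   # two sounds, order uncertain
print(sequence_resolution(25.0))  # two sounds, order identifiable
```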
While pure tones can be generated for use as warnings and other auditory signals, they are not the norm in naturally-occurring soundscapes or occupational settings. Bands of noise are more-common competitors for listeners’ “auditory attention.” The most complex, and often most important, signals are contained in speech communication. Speech is subject to S/N and masking concerns, as are pure tones and other signals, but experiences additional challenges; these are explored in the following sections.
Audibility and Intelligibility
For many sounds, such as pure tones or narrowband sounds used as warning signals, mere audibility is sufficient to serve their intended purpose. If there is a relatively large number of signals to be monitored, rapid discrimination among them becomes more challenging. When speech communication is needed, there is a much higher bar to be cleared; in addition to being audible and discriminable, speech must also be intelligible to serve its purpose.
The expansion of telephony from commercial enterprises to personal use and its subsequent proliferation provided great impetus for the study of speech intelligibility. Over the past century, several test procedures, media sets, and evaluation schemes have been developed to quantify performance of communication technologies. Though the study of intelligibility originated in telephony, face-to-face communication is subject to similar challenges and can be assessed in similar fashion. Use of an electronic or other intermediary device may improve or degrade intelligibility, but it does not alter the requirements for effective communication.
Some methods of assessment and scoring are rather sophisticated and complex. Reproduction of lengthy procedures is not warranted; readers are encouraged to consult cited references or other sources for additional detail. In lieu of comprehensive instructions, some prominent indices are introduced to provide conceptual understanding of intelligibility testing and scoring. Conceptual understanding is sufficient to recognize the influence of a soundscape on communication system design choices and vice versa. For those who choose to perform calculations, dedicated software and formatted spreadsheets are available to assist in this effort from sources such as the Acoustical Society of America (ASA).
Articulation Index (AI) is the benchmark to which other intelligibility indices are typically compared. Calculation of AI is a laborious process, requiring a series of data plots and correction factors; it follows a choice among several methods, based on the data available and the precision desired. The complex calculation process and the limited value of additional precision in most occupational settings prompt a focus on alternative methods to estimate AI.
AI ranges from 0 to 1.0, expressing the proportion of a speech signal that is audible or “available to” a listener. The portion of a speech signal that is available to a listener is that which contributes to the listener’s understanding of the message. The relationship of AI to the proportion of signals correctly understood is not a 1:1 correlation, however. As seen in Exhibit 7, an AI of 0.5 can yield comprehension rates at or near 100%, provided the signal content (vocabulary) is sufficiently limited or additional cues are provided. The high rate of sentence comprehension is afforded by contextual clues inherent in extended messages, even when unfamiliar to the listener (i.e. first presentation). Performance for all media sets shown in Exhibit 7 exceeds 50% comprehension by significant margins at AI = 0.5.
The “overperformance” of speech comprehension, relative to AI, is attributed to the amazing powers of the human brain. With knowledge of the language in use, the brain can extrapolate small portions of the message that were not received clearly. This is not faultless, of course, or comprehension scores would consistently be 100%. In casual conversation, where the consequences of misunderstanding are minimal, these extrapolations can lead to rather humorous exchanges. In consequential communications, however, messages should be crafted such that any extrapolations necessary have a high probability of correctness.
To give AI values intuitive meaning, a qualitative guideline is often used. A typical example is as follows:
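One commonly cited guideline can be sketched as a small helper; the breakpoints below are approximate values from the acoustics literature and should be treated as an assumption rather than this article’s original table:

```python
def ai_rating(ai: float) -> str:
    """Map an Articulation Index value (0 to 1.0) to a qualitative rating.
    Breakpoints are approximate values commonly cited in the literature."""
    if ai < 0.3:
        return "unsatisfactory to marginal"
    if ai < 0.5:
        return "acceptable"
    if ai < 0.7:
        return "good"
    return "very good to excellent"

print(ai_rating(0.45))  # acceptable
```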
Speech Interference Level (SIL) is less precise than an AI calculation; it is used to predict intelligibility of speech in face-to-face communications. SIL is the maximum noise level in which a listener correctly understands 75% of phonetically balanced (PB) words or ~98% of sentences; this comprehension rate is equivalent to AI ≈ 0.5. PB words are those included in a test set such that various speech sounds occur in the same proportion as “normal” speech.
Mathematically, SIL is the arithmetic average of ambient SPLs in the octave bands 600 – 1200 Hz, 1200 – 2400 Hz, and 2400 – 4800 Hz. The maximum SIL compatible with reliable communication varies with the speaker’s vocal effort and distance from the listener; several combinations of these variables are tabulated in Exhibit 8.
Preferred Speech Interference Level (PSIL) is used to predict the likely level of difficulty using speech to communicate in various circumstances. PSIL is the arithmetic average of ambient SPLs in the octave bands with center frequencies of 500, 1000, and 2000 Hz. Exhibit 9 provides a graphical reference relating PSIL, distance between speaker and listener, and vocal effort to anticipated speech communication difficulty. It also includes a convenient cross-reference to SIL, A- and C-weighted SPLs, and perceived noisiness values as alternative metrics. Estimates of AI at each level of vocal effort are also tabulated, providing additional predictive insight. The chart indicates where noise-reduction efforts may need to be focused or communication system upgrades implemented.
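Both metrics are simple averages of octave-band measurements; a minimal sketch follows (the band values in the example are hypothetical):

```python
def sil_db(band_600_1200: float, band_1200_2400: float, band_2400_4800: float) -> float:
    """Speech Interference Level: arithmetic average of three octave-band SPLs (dB)."""
    return (band_600_1200 + band_1200_2400 + band_2400_4800) / 3.0

def psil_db(band_500: float, band_1000: float, band_2000: float) -> float:
    """Preferred SIL: average of octave bands centered at 500, 1000, and 2000 Hz."""
    return (band_500 + band_1000 + band_2000) / 3.0

# Hypothetical octave-band SPLs (dB):
print(round(psil_db(68.0, 64.0, 59.0), 1))  # 63.7; compare to Exhibit 9
```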
Speech Intelligibility Index (SII) is the most sophisticated index commonly available. It is defined in the ANSI S3.5-1997 (R2020) standard, which outlines four calculation methods. The details of SII calculations will not be reproduced here; readers are referred to the ANSI standard, available software, and other resources for that information.
Interpretation of SII and AI values is comparable; both range from 0 to 1.0, though SII is often cited as a percentage. Both indices are “outperformed” by speech comprehension over much of this range. Using comparable test sets, SII and AI results are approximately equal. For example, the ~98% comprehension rate of sentences at AI = 0.5 is duplicated at SII = 0.5. This can be seen in Exhibit 10, as well as the comprehension rates as a function of SII for other test sets. As seen in Exhibit 7 for AI, Exhibit 10 shows that simpler vocabulary and the additional clues provided by sentences improve comprehension at lower SII values.
In lieu of intensive calculations, visual estimation procedures have been developed. Killion and Mueller’s revised “count-the-dots” method incorporates research on the importance of frequencies outside the 500 – 4000 Hz range that is often the focus of speech communication studies. It has also been adjusted to correlate with SII calculations (1/3 octave importance function) and is now titled “The SII-Based Method for Estimating the Articulation Index.”
The procedure is as follows: plot the listener’s hearing thresholds on the count-the-dots audiogram form; count the dots that remain audible (those falling above the plotted threshold line); and divide the count by 100, as each of the form’s 100 dots represents approximately 1% of total speech importance.
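The counting logic can be sketched as follows; the dot coordinates below are placeholders for illustration, not the actual chart data, which would be transcribed from the published form:

```python
# Each dot is (frequency_hz, level_db_hl) on the count-the-dots form.
# The real form contains 100 dots; these few are placeholders only.
DOTS = [(500, 40), (1000, 35), (2000, 30), (4000, 25)]

def estimate_ai(thresholds_db_hl: dict, dots=DOTS) -> float:
    """Fraction of dots audible: a dot counts if the listener's threshold at that
    frequency is at or below (better than) the dot's level in dB HL."""
    audible = sum(1 for freq, level in dots
                  if thresholds_db_hl.get(freq, 999.0) <= level)
    return audible / len(dots)

# Hypothetical audiogram (dB HL):
print(estimate_ai({500: 20, 1000: 30, 2000: 45, 4000: 60}))  # 0.5
```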
It should be clear by now that intensive calculations are often unwarranted in occupational settings. Variability is introduced by changes in personnel and daily operations; estimates may be the only data available in a reasonable timeframe. AI and similar indices are typically used as indicators, where approximations and trends are more useful than precise values. This in no way diminishes the importance of understanding the concept of intelligibility and how it influences communication system design; it is merely an acknowledgment that a more-practical approach is needed to accommodate the resource limitations that exist in most workplaces.
Factors Related to Intelligibility
The previous sections provided background information related to challenges involved in communicating in occupational soundscapes. The presentation now turns to examples that connect these concepts to practical application in system design.
The general “rule” for signal-to-noise ratio is “higher is better”; however, there are limitations. In general, those with existing hearing loss (HL) require higher S/N to match the comprehension rates of those with normal hearing. In “low-noise” situations, however, higher signal intensity may be unnecessary and can become annoying or otherwise detrimental. For example, high-intensity sound induces distortion in the ear, decreasing intelligibility for all listeners.
SII and other indices were developed for listeners with normal hearing. The influence of HL on intelligibility can vary greatly, depending on the nature and severity of the hearing loss, the makeup and intensity of the soundscape, and the characteristics of the speech signals.
The vocabulary used in speech communication can have a profound impact on its effectiveness (see Exhibit 7 and Exhibit 10). Variables that influence vocabulary effectiveness include the number of words in use (i.e. standardized or free-form), the number of syllables in each word, the uniqueness of words used (e.g. rhymes), and the context in which they are spoken.
Similar words can be difficult to differentiate in random noise due to “consonant confusion.” The “confusion tree” in Exhibit 12 shows the S/Ns at which various consonant sounds become indistinguishable. Two adjacent lines indicate that, at S/Ns below their level of convergence, the corresponding consonant sounds are easily confused. Filtering the speech signal alters the confusion tree; all components of a communication system must be considered in conjunction to achieve desired results.
Dialects add an interesting variable to speech communications. Imagine a meeting with one attendee from each of the following cities: Boston, Houston, London, Dublin, Sydney, and Mumbai. All are fluent in English, the native language of each. Each speaks the language differently, however, stressing different speech sounds, pronouncing words differently, and defining words differently. Add to this scenario high-intensity noise, poor reproduction of vocal inputs to an electronic communication device, and speakers of English as a second language (ESL) and the value of a limited, standardized vocabulary becomes self-evident.
Communication at large gatherings can be difficult. While “listening” to one voice, other voices in the vicinity create masking noise. The effect on intelligibility is shown in Exhibit 13 for a voice of interest held constant at a level of 94 dB. With one masking voice, “selective attention” facilitates relatively high comprehension rates – nearly 80% at S/N = 0 (vertical dashed line). Additional voices degrade comprehension at significantly higher rates. The data on masking voices provides empirical evidence of the productivity-crushing effects of sidebar conversations and unmoderated “debates” in meetings (see “Meetings: Now Available in Productive Format!” [18Dec2019]).
When a speaker must increase vocal effort to be heard above noise, intelligibility can suffer. Increasing vocal effort to shouting levels (> ~80 dB) can result in 20% lower comprehension rates at constant S/N = 0. At lower S/N, shouting degrades comprehension more rapidly despite starting at a lower baseline rate. The decline in comprehension rates when low vocal effort (< ~50 dB) is used is essentially a mirror image.
Acoustic properties of a room in which communication takes place can exacerbate other difficulties. Reverberant properties can cause echoes or hamper the dissipation of sound energy required to “free” a listener’s auditory system to process a new signal.
Face-to-face communication can enhance intelligibility relative to the same message recorded or transmitted electronically. Vocal inflections are undistorted by reproduction and may aid comprehension of the message. In addition, visual cues are readily available, such as facial expressions or “body language.” The additional signals, in some cases, can convey more information than the message itself, particularly among highly-familiar or highly-skilled communicators. The ability to see a speaker’s lips, even if the listener is not a skilled lip-reader, has been found to improve intelligibility significantly in negative-S/N conditions.
Much of the research conducted on speech communication, hearing, and related topics has involved only men. Differences between male and female speech and hearing are believed to be significant, but the details are not well-established. This serves as yet another reminder that every environment is unique, requiring validation of systems within each.
Vast amounts of research have been conducted on the influences of noise on communication, particularly speech communication. Sharing details of this research yields diminishing returns as explorations become more peripheral or less practical to employ in an occupational setting. The preceding presentation is akin to a high-speed flyover of the subject matter, highlighting only the most-relevant and practically-applicable information. However, readers are encouraged to explore the literature on this interesting and valuable subject. “The Third Degree,” meanwhile, will proceed to a presentation of recommendations for design of effective communication systems.
For additional guidance or assistance with Safety, Health, and Environmental (SHE) issues, or other Operations challenges, feel free to leave a comment, contact JayWink Solutions, or schedule an appointment.
For a directory of “Occupational Soundscapes” volumes on “The Third Degree,” see Part 1: An Introduction to Noise-Induced Hearing Loss (26Jul2023).
[Link] The Noise Manual, 6ed. D.K. Meinke, E.H. Berger, R.L. Neitzel, D.P. Driscoll, and K. Bright, eds. The American Industrial Hygiene Association (AIHA); 2022.
[Link] Noise Control in Industry – A Practical Guide. Nicholas P. Cheremisinoff. Noyes Publications; 1996.
[Link] The Effects of Noise on Man. Karl D. Kryter. Academic Press; 1970.
[Link] Human Engineering Guide to Equipment Design (Revised Edition). Harold P. Van Cott and Robert G. Kinkade (Eds). American Institutes for Research; 1972.
[Link] Kodak's Ergonomic Design for People at Work. The Eastman Kodak Company (ed). John Wiley & Sons, Inc.; 2004.
[Link] Fundamentals of Industrial Ergonomics, 2ed. B. Mustafa Pulat. Waveland Press; 1997.
[Link] Engineering Noise Control – Theory and Practice, 4ed. David A. Bies and Colin H. Hansen. Taylor & Francis; 2009.
[Link] An Introduction to Acoustics. Robert H. Randall. Addison-Wesley; 1951.
[Link] “Protection and Enhancement of Hearing in Noise.” John G. Casali and Samir N. Y. Gerges. Reviews of Human Factors and Ergonomics; April 2006.
[Link] “On the Masking Pattern of a Simple Auditory Stimulus.” James P. Egan and Harold W. Hake. The Journal of the Acoustical Society of America; September 1950.
[Link] “Methods for the Calculation and Use of the Articulation Index.” Karl D. Kryter. The Journal of the Acoustical Society of America; November 1962 and Errata [Link].
[Link] “Pediatric Audiology: A Review.” Ryan B. Gregg, Lori S. Wiorek, and Joan C. Arvedson. Pediatrics in Review, July 2004.
[Link] “Signal-to-noise ratio.” Wikipedia.
[Link] “An Easy Method for Calculating the Articulation Index.” H. Gustav Mueller and Mead C. Killion. The Hearing Journal; September 1990.
[Link] “Twenty years later: A NEW Count-The-Dots method.” Mead C. Killion and H. Gustav Mueller. The Hearing Journal; January 2010.
[Link] “The Speech Intelligibility Index: What is it and what's it good for?” Benjamin Hornsby. The Hearing Journal; October 2004.
[Link] “SII Predictions of Aided Speech Recognition.” Susan Scollie. The Hearing Journal; September 2004.
Jody W. Phelps, MSc, PMP®, MBA
JayWink Solutions, LLC
© JayWink Solutions, LLC