ABX test

An ABX test is a method of comparing two choices of sensory stimuli to identify detectable differences between them. A subject is presented with two known samples (sample A, the first reference, and sample B, the second reference), followed by one unknown sample X that is randomly selected from either A or B. The subject is then required to identify X as either A or B. If X cannot be identified reliably with a low p-value in a predetermined number of trials, then the null hypothesis cannot be rejected and it cannot be proven that there is a perceptible difference between A and B.

ABX tests can easily be performed as double-blind trials, eliminating any possible unconscious influence from the researcher or the test supervisor. Because samples A and B are provided just prior to sample X, the difference does not have to be discerned using long-term memory or past experience. Thus, the ABX test answers whether or not, under the test circumstances, a perceptual difference can be found.

ABX tests are commonly used in evaluations of digital audio data compression methods; sample A is typically an uncompressed sample, and sample B is a compressed version of A. Audible compression artifacts that indicate a shortcoming in the compression algorithm can be identified with subsequent testing. ABX tests can also be used to compare the degrees of fidelity loss between two different audio formats at a given bitrate. ABX tests can be used to audition input, processing, and output components as well as cabling: virtually any audio product or prototype design.

History

The history of ABX testing and its name dates back to a 1950 paper by two Bell Labs researchers, W. A. Munson and Mark B. Gardner, titled "Standardizing Auditory Tests".[1]
The test has since evolved into other variations, such as giving the subject control over the duration and sequence of testing. One such example was the hardware ABX comparator built in 1977 by the ABX company in Troy, Michigan, and documented by one of its founders, David Clark.[2]
The ABX company is now defunct, and hardware comparators in general are extinct as commercial offerings. Myriad software tools exist, such as the Foobar ABX plug-in, for performing file comparisons, but hardware equipment testing requires building custom implementations.

Hardware tests

ABX test equipment utilizing relays to switch between two different hardware paths can help determine whether there are perceptual differences in cables and components. Video, audio, and digital transmission paths can be compared. If the switching is microprocessor-controlled, double-blind tests are possible. Loudspeaker-level and line-level audio comparisons could be performed on an ABX test device offered for sale as the ABX Comparator by QSC Audio Products from 1998 to 2004. Other hardware solutions have been fabricated privately by individuals or organizations for internal testing.

Confidence

If only one ABX trial were performed, random guessing would incur a 50% chance of choosing the correct answer, the same as flipping a coin. In order to make a statement having some degree of confidence, many trials must be performed. Increasing the number of trials raises the likelihood of statistically demonstrating a person's ability to distinguish A and B at a given confidence level. A 95% confidence level is commonly considered statistically significant.[2] The company QSC, in the ABX Comparator user manual, recommended a minimum of ten listening trials in each round of tests.[3]
QSC recommended that no more than 25 trials be performed, as subject fatigue can set in, making the test less sensitive (less likely to reveal one's actual ability to discern the difference between A and B).[3] However, a more sensitive test can be obtained by pooling the results from a number of such tests, using separate individuals or tests from the same subject conducted between rest breaks. For a large number of total trials N, a significant result (one with 95% confidence) can be claimed if the number of correct responses exceeds N/2 + √N. Important decisions are normally based on a higher level of confidence, since an erroneous significant result would be claimed in one of 20 such tests simply by chance.

Software tests

The foobar2000 and Amarok audio players support software-based ABX testing, the latter via a third-party script. Lacinato ABX is a cross-platform audio testing tool for Linux, Windows, and 64-bit Mac. Lacinato WebABX is a web-based, cross-browser audio ABX tool. The open-source aveX was developed mainly for Linux and also provides test monitoring from a remote computer. ABX patcher is an ABX implementation for Max/MSP. More ABX software can be found at the archived PCABX website.

Codec listening tests

A codec listening test is a scientific study designed to compare two or more lossy audio codecs, usually with respect to perceived fidelity or compression efficiency.

Potential flaws

ABX is a type of forced-choice testing. A subject's choices can be made on merit: the subject may honestly try to identify whether X seems closer to A or B. But uninterested or tired subjects might choose randomly without even trying. If not caught, this may dilute the results of subjects who took the test attentively and subject the outcome to Simpson's paradox, producing false summary results. Simply looking at the outcome totals of the test (m out of n answers correct) cannot reveal occurrences of this problem.
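The arithmetic behind these significance thresholds can be sketched in a few lines of Python. This is a minimal illustration, not part of any ABX tool; the function names and the simulated guessing subject are assumptions for the example.

```python
import math
import random

def binomial_p_value(correct: int, trials: int) -> float:
    """One-sided p-value: the chance of getting at least `correct`
    answers right out of `trials` by pure guessing (probability 0.5)."""
    return sum(math.comb(trials, k)
               for k in range(correct, trials + 1)) / 2 ** trials

def run_abx_round(trials: int, subject) -> int:
    """Present `trials` ABX trials; `subject(x)` returns a guess
    ('A' or 'B') for the hidden sample x. Returns the correct count."""
    return sum(subject(x) == x
               for x in (random.choice("AB") for _ in range(trials)))

# QSC's recommended minimum round of ten trials: nine correct answers
# reach 95% confidence, eight do not.
print(binomial_p_value(9, 10))   # ≈ 0.011 -> significant
print(binomial_p_value(8, 10))   # ≈ 0.055 -> not significant

# A subject guessing at random, for comparison:
print(run_abx_round(10, lambda x: random.choice("AB")))
```

The large-N rule of thumb quoted above can be checked the same way: for N = 100 trials, N/2 + √N = 60, and `binomial_p_value(61, 100)` indeed falls below 0.05.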
This problem becomes more acute if the differences are small: the subject may get frustrated and simply aim to finish the test by voting randomly. In this regard, forced-choice tests such as ABX tend to favor negative outcomes when differences are small, unless proper protocols are used to guard against this problem. Best practices call for both the inclusion of controls and the screening of subjects.[5]
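The dilution effect described above can be made concrete with a small numerical sketch. The per-subject scores below are invented for illustration: one attentive subject scores significantly, but pooling with chance-level guessers hides that result in the summary totals.

```python
import math

def p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value against chance-level (0.5) guessing."""
    return sum(math.comb(trials, k)
               for k in range(correct, trials + 1)) / 2 ** trials

# One attentive subject who reliably hears the difference...
attentive = (14, 16)                      # 14 of 16 correct
# ...pooled with three fatigued subjects answering at chance.
fatigued = [(8, 16), (7, 16), (9, 16)]

correct = attentive[0] + sum(c for c, _ in fatigued)   # 38
trials  = attentive[1] + sum(n for _, n in fatigued)   # 64

print(p_value(*attentive))        # ≈ 0.002 -> significant on its own
print(p_value(correct, trials))   # > 0.05  -> pooled result is not
```

This is why per-subject analysis (or subject screening) matters: the pooled m-out-of-n total alone cannot distinguish this scenario from four mediocre listeners.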
Other flaws include a lack of subject training and familiarization with the test procedure and the selected content.
Other problems might arise from the ABX equipment itself, as outlined by Clark,[2] where the equipment provides a tell, allowing the subject to identify the source. Lack of transparency of the ABX fixture creates similar problems. Since auditory tests and many other sensory tests rely on short-term memory, which lasts only a few seconds, it is critical that the test fixture allow the subject to select short segments that can be compared quickly. Pops and glitches in the switching apparatus must likewise be eliminated, as they may dominate or otherwise interfere with the stimuli held in the subject's short-term memory.

Alternatives

Algorithmic Audio Compression Evaluation

Since ABX testing requires human beings to evaluate lossy audio codecs, it is time-consuming and costly. Cheaper approaches have therefore been developed, e.g. PEAQ, an algorithm that produces an objective difference grade (ODG) without human listeners.

MUSHRA

In MUSHRA, the subject is presented with the reference (labeled as such), a certain number of test samples, a hidden version of the reference, and one or more anchors. A 0–100 rating scale makes it possible to rate very small differences, and the hidden reference still provides a discrimination check.

Discrimination testing

Alternative general methods are used in discrimination testing, such as paired comparison, duo–trio, and triangle testing. Of these, duo–trio and triangle testing are particularly close to ABX testing. Schematically:

duo–trio: one known reference (A), then two unknowns (one is A, one is B); the subject picks the unknown that matches the reference.
triangle: three unknowns (two of one stimulus, one of the other, in random order); the subject picks the odd one out.
ABX: two known references (A and B), then one unknown X; the subject identifies X as A or B.
In this context, ABX testing is also known as "duo–trio" in "balanced reference" mode: both knowns are presented as references, rather than one alone.[6]