A very simple equivalence testing approach is the ‘two one-sided t-tests’ (TOST) procedure.11 In the TOST procedure, an upper (Δ_U) and a lower (–Δ_L) equivalence bound are specified based on the smallest effect size of interest (e.g., a positive or negative difference of d = 0.3). Two composite null hypotheses are tested: H_01: Δ ≤ –Δ_L and H_02: Δ ≥ Δ_U. When both of these one-sided tests can be statistically rejected, we can conclude that –Δ_L < Δ < Δ_U, that is, the observed effect falls within the equivalence bounds, is statistically smaller than any effect deemed worthwhile, and can be considered practically equivalent.12 Alternatives to the TOST procedure have been developed as well.13 A recent modification to TOST makes the approach feasible for repeated measures and for assessing multiple variables.14
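The logic of the two one-sided tests can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes two independent samples, takes the equivalence bounds in raw units rather than as standardized d, and uses a large-sample normal approximation in place of the t distribution so that only the standard library is needed. The function name `tost_two_sample` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def tost_two_sample(x, y, low, high, alpha=0.05):
    """Two one-sided tests for the mean difference delta = mean(x) - mean(y).

    low and high are the lower and upper equivalence bounds in raw units.
    Assumption: a normal approximation replaces the t distribution, which
    is only reasonable for fairly large samples.
    """
    diff = mean(x) - mean(y)
    # Standard error of the difference without assuming equal variances.
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = NormalDist()
    # H01: delta <= low, rejected when diff lies far enough above low.
    p_lower = 1 - z.cdf((diff - low) / se)
    # H02: delta >= high, rejected when diff lies far enough below high.
    p_upper = z.cdf((diff - high) / se)
    # Equivalence is claimed only if BOTH one-sided tests reject,
    # so the larger of the two p-values decides.
    p_value = max(p_lower, p_upper)
    return diff, p_value, p_value < alpha
```

For small samples, the two p-values would instead be taken from a t distribution with the appropriate degrees of freedom. Taking the maximum of the two p-values reflects that both composite null hypotheses must be rejected before equivalence is concluded.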
The equivalence test can be induced from the t-test.15 Consider a t-test at the significance level α_{t-test} with a power of 1 − β_{t-test} for a relevant effect size d_r. If Δ = d_r, α_{equiv.-test} = β_{t-test} and β_{equiv.-test} = α_{t-test}, i.e. the error types (type I and type II) are interchanged between the t-test and the equivalence test, then the t-test will yield the same results as the equivalence test. To achieve this for the t-test, either the sample size calculation needs to be carried out correctly, or the t-test significance level α_{t-test} needs to be adjusted, which is referred to as the revised t-test.16 Both approaches have difficulties in practice, since sample size planning relies on unverifiable assumptions about the standard deviation, and the revised t-test yields numerical problems.17 These limitations can be removed, while preserving the test behavior, by using an equivalence test.
The figure below allows a visual comparison of the equivalence test and the t-test when the sample size calculation is affected by differences between the a priori standard deviation σ and the sample's standard deviation σ̂, which is a common problem. Using an equivalence test instead of a t-test additionally ensures that α_{equiv.-test} is bounded, which the t-test does not do when σ̂ > σ, in which case its type II error can grow arbitrarily large. On the other hand, σ̂ < σ results in the t-test being stricter than the d_r specified in the planning, which may randomly penalize the sample source (e.g., a device manufacturer). This makes the equivalence test safer to use.
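The sensitivity of the t-test to a misjudged standard deviation can be illustrated numerically. The following sketch is an assumption-laden illustration, not the method of the cited works: it considers a two-sided one-sample test, uses the normal approximation for both the sample size formula and the power, ignores the negligible far tail, and uses hypothetical function names and example values.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()


def planned_n(d_r, sigma, alpha=0.05, beta=0.2):
    """Sample size planned from the a priori sigma (normal approximation)."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(1 - beta)
    return ceil(((z_a + z_b) * sigma / d_r) ** 2)


def type_ii_error(d_r, sigma_true, n, alpha=0.05):
    """Approximate chance of missing a true effect d_r with n observations."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    # Power of the two-sided test; the opposite rejection tail is ignored.
    power = Z.cdf(d_r * sqrt(n) / sigma_true - z_a)
    return 1 - power


# Plan with an assumed sigma of 1.0, then vary the actual standard deviation.
n = planned_n(d_r=0.5, sigma=1.0)
for sigma_true in (0.5, 1.0, 2.0):
    print(sigma_true, round(type_ii_error(0.5, sigma_true, n), 3))
```

With the sample size fixed at the planning stage, the type II error of the t-test stays near the planned β only when the actual standard deviation matches the assumed one. It rises when the actual value is larger and shrinks when it is smaller, which corresponds to the test becoming stricter than planned.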
Snapinn, Steven M. (2000). "Noninferiority trials". Current Controlled Trials in Cardiovascular Medicine. 1 (1): 19–21. doi:10.1186/CVM-1-1-019. PMC 59590. PMID 11714400. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC59590
Rogers, James L.; Howard, Kenneth I.; Vessey, John T. (1993). "Using significance tests to evaluate equivalence between two experimental groups". Psychological Bulletin. 113 (3): 553–565. doi:10.1037/0033-2909.113.3.553. PMID 8316613.
Statistics applied to clinical trials (4th ed.). Springer. 2009. ISBN 978-1402095221.
Piaggio, Gilda; Elbourne, Diana R.; Altman, Douglas G.; Pocock, Stuart J.; Evans, Stephen J. W.; CONSORT Group, for the (8 March 2006). "Reporting of Noninferiority and Equivalence Randomized Trials" (PDF). JAMA. 295 (10): 1152–60. doi:10.1001/jama.295.10.1152. PMID 16522836. https://researchonline.lshtm.ac.uk/id/eprint/12069/1/Reporting%20of%20Noninferiority%20and%20Equivalence%20Randomized%20Trials.pdf
Piantadosi, Steven (28 August 2017). Clinical trials : a methodologic perspective (Third ed.). John Wiley & Sons. p. 8.6.2. ISBN 978-1-118-95920-6.
Lakens, Daniël (2017-05-05). "Equivalence Tests". Social Psychological and Personality Science. 8 (4): 355–362. doi:10.1177/1948550617697177. PMC 5502906. PMID 28736600. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5502906
Siebert, Michael; Ellenberger, David (2019-04-10). "Validation of automatic passenger counting: introducing the t-test-induced equivalence test". Transportation. 47 (6): 3031–3045. arXiv:1802.03341. doi:10.1007/s11116-019-09991-9. ISSN 0049-4488. https://doi.org/10.1007%2Fs11116-019-09991-9
Schnellbach, Teresa (2022). Hydraulic Data Analysis Using Python. doi:10.26083/tuprints-00022026. http://tuprints.ulb.tu-darmstadt.de/22026/
Jahn, Nico; Siebert, Michael (2022). "Engineering the Neural Automatic Passenger Counter". Engineering Applications of Artificial Intelligence. 114. arXiv:2203.01156. doi:10.1016/j.engappai.2022.105148. https://www.sciencedirect.com/science/article/pii/S0952197622002652
Mazzolari, Raffaele; Porcelli, Simone; Bishop, David J.; Lakens, Daniël (March 2022). "Myths and methodologies: The use of equivalence and non‐inferiority tests for interventional studies in exercise physiology and sport science". Experimental Physiology. 107 (3): 201–212. doi:10.1113/EP090171. ISSN 0958-0670. PMID 35041233. S2CID 246051376. https://onlinelibrary.wiley.com/doi/10.1113/EP090171
Schuirmann, Donald J. (1987-12-01). "A comparison of the Two One-Sided Tests Procedure and the Power Approach for assessing the equivalence of average bioavailability". Journal of Pharmacokinetics and Biopharmaceutics. 15 (6): 657–680. doi:10.1007/BF01068419. ISSN 0090-466X. PMID 3450848. S2CID 206788664. https://zenodo.org/record/1232484
Lakens, Daniël (May 2017). "Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses". Social Psychological and Personality Science. 8 (4): 355–362. doi:10.1177/1948550617697177. ISSN 1948-5506. PMC 5502906. PMID 28736600. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5502906
Wellek, Stefan (2010). Testing statistical hypotheses of equivalence and noninferiority. Chapman and Hall/CRC. ISBN 978-1439808184.
Rose, Evangeline M.; Mathew, Thomas; Coss, Derek A.; Lohr, Bernard; Omland, Kevin E. (2018). "A new statistical method to test equivalence: an application in male and female eastern bluebird song". Animal Behaviour. 145: 77–85. doi:10.1016/j.anbehav.2018.09.004. ISSN 0003-3472. S2CID 53152801.