Multiple comparisons

From Wikipedia, the free encyclopedia.

In statistics, the multiple comparisons problem arises when several statistical tests are carried out simultaneously, for example when testing null hypotheses stating that the means of several disjoint populations are equal to each other (homogeneous).

Of great concern to statisticians is the problem of multiple testing, that is, the potential increase in Type I error that occurs when statistical tests are used repeatedly: if n comparisons are performed, the experimentwise significance level α (alpha) is given by

α_experimentwise = 1 − (1 − α_per comparison)^n,

and it rapidly approaches 1 as the number of comparisons increases.
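As a quick illustration of how fast the experimentwise error rate grows, the formula above can be evaluated for a few values of n (a minimal sketch; the function name and chosen values are illustrative, not from the original):

```python
# Illustrative sketch: experimentwise significance level for n
# comparisons, each tested independently at the same per-comparison alpha.
def experimentwise_alpha(alpha_per_comparison, n):
    """Probability of at least one Type I error across n comparisons."""
    return 1 - (1 - alpha_per_comparison) ** n

for n in (1, 5, 10, 20):
    print(n, round(experimentwise_alpha(0.05, n), 3))
# With alpha = 0.05 per comparison, n = 10 already gives roughly 0.40.
```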

Thus, in order to retain the same overall rate of false positives in a test involving more than one comparison, the standards for each comparison must be more stringent. Intuitively, dividing the allowable error (alpha) for each comparison by the number of comparisons yields an overall alpha that does not exceed the desired limit, and this can be proved mathematically. For instance, to obtain the usual overall alpha of 0.05 with ten comparisons, requiring an alpha of 0.005 for each comparison can be shown to result in an overall alpha that does not exceed 0.05.
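This adjustment can be checked numerically. In the sketch below (function name illustrative), dividing alpha by the number of comparisons keeps the overall error rate just under the target:

```python
# Illustrative check: with ten comparisons each tested at
# 0.05 / 10 = 0.005, the overall (experimentwise) alpha stays below 0.05.
def overall_alpha(per_comparison_alpha, n):
    return 1 - (1 - per_comparison_alpha) ** n

n = 10
adjusted = 0.05 / n                  # 0.005 per comparison
print(overall_alpha(adjusted, n))    # about 0.0489, below 0.05
```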

However, it can also be demonstrated that this technique is overly conservative; in practice it yields a true alpha significantly below 0.05, thereby raising the rate of false negatives and failing to identify an unnecessarily high percentage of the genuinely significant differences in the data. This can have important real-world consequences: for instance, it may lead to the failure to approve a drug that is in fact superior to existing drugs, both depriving the world of an improved therapy and causing the drug company to lose its substantial investment in research and development up to that point. For this reason, a great deal of attention has been paid to developing better techniques for multiple comparisons, so that the overall rate of false positives can be maintained without inflating the rate of false negatives unnecessarily. Such methods can be divided into three general categories:

  • methods where total alpha can be proved to never exceed .05 (or other chosen value) under any conditions
  • methods where total alpha can be proved not to exceed .05 except under certain defined conditions
  • methods which seem empirically to keep total alpha below .05, but there is no proof

The advent of computerized resampling methods, such as bootstrapping and Monte Carlo simulations, has given rise to many techniques in the latter category. However, where absolute certainty is required, such as in New Drug Applications to the US Food and Drug Administration, they are not considered acceptable.
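The Monte Carlo idea can be illustrated with a small simulation (a purely illustrative sketch; the function name and parameters are assumptions, not from the original): simulate many experiments of n tests under a true null hypothesis, where each p-value is uniform on [0, 1], and count how often at least one test comes out "significant":

```python
import random

# Illustrative Monte Carlo estimate of the experimentwise error rate:
# under a true null hypothesis a p-value is uniform on [0, 1], so each
# test is a false positive with probability alpha.
def simulate_familywise_rate(n_comparisons, alpha, n_trials, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # One simulated experiment: n independent tests, all nulls true.
        if any(rng.random() < alpha for _ in range(n_comparisons)):
            hits += 1  # at least one false positive in this experiment
    return hits / n_trials

estimate = simulate_familywise_rate(10, 0.05, 20_000)
print(estimate)  # close to the theoretical 1 - 0.95**10, about 0.401
```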

Post hoc testing of ANOVAs

Multiple comparison procedures are commonly used after obtaining a significant ANOVA F-test. A significant ANOVA result suggests rejecting the global null hypothesis H0 that all the means are equal. Multiple comparison procedures are then used to determine which means differ from each other.

Comparing K means involves K(K − 1)/2 pairwise comparisons.
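For example (an illustrative sketch with hypothetical group labels), the pairwise comparisons among K group means can be enumerated directly, and their count matches K(K − 1)/2:

```python
from itertools import combinations

# Illustrative: enumerate all pairwise comparisons among K group means.
K = 5
groups = [f"group{i}" for i in range(1, K + 1)]  # hypothetical labels
pairs = list(combinations(groups, 2))

print(len(pairs))   # K*(K-1)//2 = 10 pairs for K = 5
print(pairs[0])     # ('group1', 'group2')
```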

Methods

General methods of alpha adjustment for multiple comparisons

Single-step procedures

Two-step procedures

Multi-step procedures based on Studentized range statistic

Bayesian methods

Key concepts

  • Comparisonwise error rate
  • Experimentwise error rate
  • Familywise error rate
  • False discovery rate (FDR) - see Benjamini and Hochberg (1995)
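As a sketch of the false-discovery-rate idea, the Benjamini–Hochberg step-up procedure can be written as follows (the p-values below are made up for illustration; the function name is an assumption):

```python
# Sketch of the Benjamini-Hochberg (1995) step-up procedure: sort the
# m p-values, find the largest rank k with p_(k) <= (k/m) * q, and
# reject the k hypotheses with the smallest p-values.
def benjamini_hochberg(pvalues, q=0.05):
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= (rank / m) * q:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

# Made-up p-values for illustration:
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060]
print(benjamini_hochberg(pvals, q=0.05))  # rejects only the two smallest
```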

Bibliography

  • Miller, R. G. (1966). Simultaneous Statistical Inference.
  • Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57:289-300.