Multiple Comparisons

Alex Bäcker, May 12, 1998

Clarified slightly June 2000

 

Suppose we are performing an experiment in which we are trying to decide whether any of a set of measurements made under experimental condition A differs from the corresponding set of measurements made under the negative control condition B. For example, we might be trying to tell whether a spike train in response to a stimulus differs from spike trains under a control condition with no stimulus; one measurement might be the number of spikes within a time window, and the set might be given by a series of successive windows. The null hypothesis is that the two sets are indistinguishable: that the response to A is no different from that to B.
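To make the setup concrete, here is a minimal sketch of such data, under the purely illustrative assumption that the spike count in each of N successive time windows is Poisson distributed; the rates, window count, and function names are hypothetical choices, not part of the original note.

    # Minimal sketch of the setup: spike counts in successive time windows under
    # condition A (stimulus) and condition B (no-stimulus control). The Poisson
    # assumption and the rates below are illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)

    N_WINDOWS = 20        # number of measurements (time windows) per trial
    RATE_CONTROL = 5.0    # assumed mean spike count per window under condition B
    RATE_STIMULUS = 5.0   # under the null hypothesis, condition A has the same rate

    def simulate_trial(rate, n_windows=N_WINDOWS):
        """Return one trial: spike counts in n_windows successive windows."""
        return rng.poisson(rate, size=n_windows)

    trial_A = simulate_trial(RATE_STIMULUS)   # one trial under the experimental condition
    trial_B = simulate_trial(RATE_CONTROL)    # one trial under the negative control
    print(trial_A)
    print(trial_B)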

If we make many measurements and report any deviation from the value expected under the null hypothesis, then the probability of finding a value equal to or greater than X under the null hypothesis is not the probability of finding that value in a single measurement -- even if one uses the probability for the particular measurement that actually produced the deviation.

P = p(any of N measurements >= X) = 1 - p(all N measurements < X)

If all N measurements are independent, we can write

P = 1 - p(M1 < X) · p(M2 < X) · … · p(MN < X)

For example, for N = 2 independent measurements, P = 1 - p(M1 < X) · p(M2 < X).
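As a quick numerical check of the product formula, the following sketch simulates N independent Poisson measurements (the distribution, rates, and threshold X are illustrative assumptions) and compares the directly estimated probability that any measurement reaches X with the value given by the formula.

    # Monte Carlo check of: P = 1 - p(M1 < X) · p(M2 < X) · ... · p(MN < X)
    import numpy as np
    from scipy.stats import poisson

    rng = np.random.default_rng(1)
    N, X, TRIALS = 5, 9, 200_000
    rates = np.array([3.0, 4.0, 5.0, 6.0, 7.0])   # one (illustrative) rate per measurement

    counts = rng.poisson(rates, size=(TRIALS, N))          # TRIALS sets of N independent measurements
    p_any_empirical = np.mean((counts >= X).any(axis=1))   # p(any of N measurements >= X), estimated

    p_below_X = poisson.cdf(X - 1, rates)                  # p(Mi < X) for each measurement
    p_any_analytic = 1 - np.prod(p_below_X)                # the product formula

    print(p_any_empirical, p_any_analytic)                 # the two values should roughly agree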

Furthermore, if all N measurements are drawn from the same distribution, and thus have the same p-value, then

P = 1 - [1 - p(one measurement >= X)]^N

Thus, if we are making two independent measurements and wish to be as strict as if we were doing a single measurement with a p-value threshold of 0.05, we must set P above equal to 0.05:

0.05 = 1 - (1 - p)^N

where p is the p-value a single comparison must reach to be reported as significant given the multiple comparisons. Solving for p,

p = 1 - (1 - 0.05)^(1/N)

and thus, for N = 2, we should report a measurement as significant only when its single-comparison p-value satisfies p < 1 - (1 - 0.05)^(1/2), i.e. p < 0.0253.
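The corrected threshold can be computed and checked empirically. The sketch below (Gaussian measurements and one-sided p-values are illustrative assumptions) verifies that using the corrected per-comparison threshold keeps the overall probability of a false positive near 0.05 when the null hypothesis holds for all N independent measurements.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    N, ALPHA, TRIALS = 2, 0.05, 200_000

    p_single = 1 - (1 - ALPHA) ** (1 / N)     # corrected per-comparison threshold; ~0.0253 for N = 2
    print(f"per-comparison threshold: {p_single:.4f}")

    # Simulate TRIALS experiments, each with N independent null measurements,
    # convert each measurement to a one-sided p-value, and count how often at
    # least one of the N p-values falls below the corrected threshold.
    z = rng.standard_normal(size=(TRIALS, N))
    p_values = norm.sf(z)
    false_positive_rate = np.mean((p_values < p_single).any(axis=1))
    print(f"overall false-positive rate: {false_positive_rate:.4f}")   # should be close to 0.05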

 

But what if we do not know whether the measurements are independent, or suspect that they are not?

There are at least two possible empirical solutions:

  1. If one has plenty of experiments under the experimental condition (condition A), one can use a subset of them (e.g. half) to identify measurements that appear to be significant, and then formulate the specific hypothesis that those measurements are significant. That hypothesis can be tested with the remaining experiments, yielding a p-value that needs no correction for multiple comparisons, since only one hypothesis is being tested (a sketch of this procedure follows the list).
  2. If one has access to plenty of negative controls, but few experiments under the experimental condition (A), so that partitioning the set of condition-A experiments is impractical (a sketch of this procedure also follows the list):
     a. Calculate the p-value, without accounting for multiple comparisons, for the measurement whose multiple-comparisons (MC)-corrected p-value we wish to obtain, i.e. the probability that the negative control yields a value at least as extreme (as far from that expected under the null hypothesis) as the one obtained in the experiment.
     b. For each of the other measurements, calculate, without accounting for multiple comparisons, the level of that measurement that corresponds to the same p-value calculated in (a).
     c. By analyzing a large number of negative-control trials, compute the probability that a set of measurements (one of each type, e.g. one in each time window) in the negative control yields any value more extreme than the corresponding levels calculated in (b). This is the MC-corrected p-value for the result obtained in the original measurement used in (a).
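Below is a sketch of solution 1 under the same hypothetical Poisson setup as above. For illustration, both the condition-A and the control trials are split in half (the procedure itself only requires splitting the condition-A trials), and the Mann-Whitney U test is an arbitrary choice of single-comparison test.

    import numpy as np
    from scipy.stats import mannwhitneyu

    rng = np.random.default_rng(3)
    N_WINDOWS, N_TRIALS, RATE = 20, 40, 5.0

    # Simulated data (here the null hypothesis is true: both conditions have the same rate).
    trials_A = rng.poisson(RATE, size=(N_TRIALS, N_WINDOWS))
    trials_B = rng.poisson(RATE, size=(N_TRIALS, N_WINDOWS))

    explore_A, confirm_A = trials_A[:N_TRIALS // 2], trials_A[N_TRIALS // 2:]
    explore_B, confirm_B = trials_B[:N_TRIALS // 2], trials_B[N_TRIALS // 2:]

    # Exploratory half: pick the window whose mean deviates most from the control.
    candidate = int(np.argmax(np.abs(explore_A.mean(axis=0) - explore_B.mean(axis=0))))

    # Confirmatory half: a single pre-specified hypothesis, so a single uncorrected p-value.
    stat, p = mannwhitneyu(confirm_A[:, candidate], confirm_B[:, candidate])
    print(f"window {candidate}: p = {p:.3f}")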
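And a sketch of solution 2, again under the hypothetical Poisson setup: the uncorrected p-value of one chosen window in a single condition-A trial is converted into an equivalent level for every other window, and a large set of negative-control trials gives the probability that any window exceeds its level, i.e. the MC-corrected p-value.

    import numpy as np

    rng = np.random.default_rng(4)
    N_WINDOWS, N_CONTROL_TRIALS, RATE = 20, 100_000, 5.0

    controls = rng.poisson(RATE, size=(N_CONTROL_TRIALS, N_WINDOWS))  # negative-control trials
    trial_A = rng.poisson(RATE, size=N_WINDOWS)                       # one condition-A trial (null here)

    j = int(np.argmax(trial_A))            # window whose corrected p-value we want
    x_j = trial_A[j]

    # Step (a): uncorrected p-value for window j (one-sided: at least as large).
    p_uncorrected = np.mean(controls[:, j] >= x_j)

    # Step (b): for every window, the level whose exceedance probability in the
    # control matches p_uncorrected (the corresponding quantile of that window's
    # null distribution).
    levels = np.quantile(controls, 1 - p_uncorrected, axis=0)

    # Step (c): corrected p-value = probability that a control trial exceeds the
    # level in any window.
    p_corrected = np.mean((controls >= levels).any(axis=1))

    print(f"uncorrected p = {p_uncorrected:.4f}, MC-corrected p = {p_corrected:.4f}")

Note that with discrete counts the quantile-matching in step (b) is only approximate (np.quantile can return fractional levels), so the per-window exceedance probabilities only roughly equal p_uncorrected; with a continuous measurement the match would be exact.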
