Package MKpower

Introduction

The package includes functions for power analysis and sample size calculation for Welch, Hsu (Hedderich and Sachs 2018) and Xiao (Xiao 2018) tests as well as permutation (Janssen 1997) and bootstrap (Davison and Hinkley 1997) t tests including Monte-Carlo simulations of empirical power and type-I-error. Monte-Carlo simulations of empirical power and type-I-error for conditional two-sample tests. In addition, power and sample size calculation for Wilcoxon rank sum and signed rank tests via Monte-Carlo simulations can be performed. Moreover, power and sample size required for the evaluation of a diagnostic test(-system) (Flahault et al. 2005; Dobbin and Simon 2007) as well as for a single proportion (Fleiss et al. 2003), comparing two negative binomial rates (Zhu and Lakkis 2014), ANCOVA (Shieh 2020), reference ranges (Jennen-Steinmetz and Wellek 2005), and multiple primary endpoints (Sozu et al. 2015).

We first load the package.

library(MKpower)

Single Proportion

The computation of the required sample size for investigating a single proportion can be determined via the respective test or confidence interval (Fleiss et al. 2003). First, we consider the asymptotic test.

## with continuity correction
power.prop1.test(p1 = 0.4, power = 0.8)
## 
##      Power calculation for testing a given proportion (with continuity correction) 
## 
##               n = 203.7246
##           delta = 0.1
##              p1 = 0.4
##              p0 = 0.5
##       sig.level = 0.05
## exact.sig.level = 0.04205139
##           power = 0.8
##     exact.power = 0.8008049
##     alternative = two.sided
## 
## NOTE: n = total sample size
## without continuity correction
power.prop1.test(p1 = 0.4, power = 0.8, cont.corr = FALSE)
## 
##      Power calculation for testing a given proportion 
## 
##               n = 193.8473
##           delta = 0.1
##              p1 = 0.4
##              p0 = 0.5
##       sig.level = 0.05
## exact.sig.level = 0.05228312
##           power = 0.8
##     exact.power = 0.8067065
##     alternative = two.sided
## 
## NOTE: n = total sample size

Next, we compute the sample size via the respective asymptotic confidence interval.

## with continuity correction
ssize.prop.ci(prop = 0.4, width = 0.14)
## 
##      Sample size calculation by method of wald-cc 
## 
##               n = 202.1865
##            prop = 0.4
##           width = 0.14
##      conf.level = 0.95
## 
## NOTE: Two-sided confidence interval
## without continuity correction
ssize.prop.ci(prop = 0.4, width = 0.14, method = "wald")
## 
##      Sample size calculation by method of wald 
## 
##               n = 188.1531
##            prop = 0.4
##           width = 0.14
##      conf.level = 0.95
## 
## NOTE: Two-sided confidence interval

Welch Two-Sample t-Test

For computing the sample size of the Welch t-test, we consider the situation of equal as well as unequal group sizes (balanced and unbalanced designs).

## identical results as power.t.test, since sd = sd1 = sd2 = 1
power.t.test(n = 20, delta = 1)
## 
##      Two-sample t test power calculation 
## 
##               n = 20
##           delta = 1
##              sd = 1
##       sig.level = 0.05
##           power = 0.8689528
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.welch.t.test(n = 20, delta = 1)
## 
##      Two-sample Welch t test power calculation 
## 
##               n = 20
##           delta = 1
##             sd1 = 1
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.8689528
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.t.test(power = 0.90, delta = 1)
## 
##      Two-sample t test power calculation 
## 
##               n = 22.0211
##           delta = 1
##              sd = 1
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.welch.t.test(power = .90, delta = 1)
## 
##      Two-sample Welch t test power calculation 
## 
##               n = 22.0211
##           delta = 1
##             sd1 = 1
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.t.test(power = 0.90, delta = 1, alternative = "one.sided")
## 
##      Two-sample t test power calculation 
## 
##               n = 17.84713
##           delta = 1
##              sd = 1
##       sig.level = 0.05
##           power = 0.9
##     alternative = one.sided
## 
## NOTE: n is number in *each* group
power.welch.t.test(power = .90, delta = 1, alternative = "one.sided")
## 
##      Two-sample Welch t test power calculation 
## 
##               n = 17.84713
##           delta = 1
##             sd1 = 1
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.9
##     alternative = one.sided
## 
## NOTE: n is number in *each* group
## sd1 = 0.5, sd2 = 1
power.welch.t.test(delta = 0.5, sd1 = 0.5, sd2 = 1, power = 0.9) # 54+54 = 108
## 
##      Two-sample Welch t test power calculation 
## 
##               n = 53.86822
##           delta = 0.5
##             sd1 = 0.5
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.welch.t.test(n = 54, delta = 0.5, sd1 = 0.5, sd2 = 1)
## 
##      Two-sample Welch t test power calculation 
## 
##               n = 54
##           delta = 0.5
##             sd1 = 0.5
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.9007114
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.welch.t.test2(delta = 0.5, sd1 = 0.5, sd2 = 1, power = 0.9) # 22+87 = 109
## 
##      Two-sample Welch t test power calculation 
## 
##              n1 = 21.60259
##              n2 = 86.72645
##           delta = 0.5
##             sd1 = 0.5
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: unequal sample sizes per groups (allowed)
power.welch.t.test2(n = c(22, 87), delta = 0.5, sd1 = 0.5, sd2 = 1)
## 
##      Two-sample Welch t test power calculation 
## 
##              n1 = 22
##              n2 = 87
##           delta = 0.5
##             sd1 = 0.5
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.9032172
##     alternative = two.sided
## 
## NOTE: unequal sample sizes per groups (allowed)
ssize.welch.t.test(delta = 0.5, sd1 = 0.5, sd2 = 1, power = 0.9) # 35+62 = 97
## 
##      Two-sample Welch t test sample size calculation 
## 
##              n1 = 35
##              n2 = 62
##           delta = 0.5
##             sd1 = 0.5
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.9004175
##     alternative = two.sided
## 
## NOTE: minimum total sample size (sum of group sample sizes)
power.welch.t.test2(n = c(35, 62), delta = 0.5, sd1 = 0.5, sd2 = 1)
## 
##      Two-sample Welch t test power calculation 
## 
##              n1 = 35
##              n2 = 62
##           delta = 0.5
##             sd1 = 0.5
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.9004175
##     alternative = two.sided
## 
## NOTE: unequal sample sizes per groups (allowed)

The last example demonstrates that in case of unequal variances unequal group sizes (i.e. unbalanced design) may lead to a clearly smaller total sample size (sum of group sample sizes) than equal group sizes (i.e. balanced design).

Hsu Two-Sample t-Test

For computing the sample size of the Hsu t-test (Hedderich and Sachs 2018), we only consider the situation of equal group sizes (balanced design).

## slightly more conservative than Welch t-test
power.t.test(n = 20, delta = 1)
## 
##      Two-sample t test power calculation 
## 
##               n = 20
##           delta = 1
##              sd = 1
##       sig.level = 0.05
##           power = 0.8689528
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.hsu.t.test(n = 20, delta = 1)
## 
##      Two-sample Hsu t test power calculation 
## 
##               n = 20
##           delta = 1
##             sd1 = 1
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.8506046
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.hsu.t.test(power = .90, delta = 1)
## 
##      Two-sample Hsu t test power calculation 
## 
##               n = 23.02186
##           delta = 1
##             sd1 = 1
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.hsu.t.test(power = .90, delta = 1, alternative = "one.sided")
## 
##      Two-sample Hsu t test power calculation 
## 
##               n = 18.5674
##           delta = 1
##             sd1 = 1
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.9
##     alternative = one.sided
## 
## NOTE: n is number in *each* group
## sd1 = 0.5, sd2 = 1
power.hsu.t.test(delta = 0.5, sd1 = 0.5, sd2 = 1, power = 0.9)
## 
##      Two-sample Hsu t test power calculation 
## 
##               n = 54.49421
##           delta = 0.5
##             sd1 = 0.5
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.hsu.t.test2(delta = 0.5, sd1 = 0.5, sd2 = 1, power = 0.9) # 44+59 = 103
## 
##      Two-sample Hsu t test power calculation 
## 
##              n1 = 43.57135
##              n2 = 58.88832
##           delta = 0.5
##             sd1 = 0.5
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: unequal sample sizes per groups (allowed)
power.hsu.t.test2(n = c(44, 59), delta = 0.5, sd1 = 0.5, sd2 = 1)
## 
##      Two-sample Hsu t test power calculation 
## 
##              n1 = 44
##              n2 = 59
##           delta = 0.5
##             sd1 = 0.5
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.9012334
##     alternative = two.sided
## 
## NOTE: unequal sample sizes per groups (allowed)
ssize.hsu.t.test(delta = 0.5, sd1 = 0.5, sd2 = 1, power = 0.9) # 39+62 = 101
## 
##      Two-sample Hsu t test sample size calculation 
## 
##              n1 = 39
##              n2 = 62
##           delta = 0.5
##             sd1 = 0.5
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.9006363
##     alternative = two.sided
## 
## NOTE: minimum total sample size (sum of group sample sizes)
power.hsu.t.test2(n = c(39, 62), delta = 0.5, sd1 = 0.5, sd2 = 1)
## 
##      Two-sample Hsu t test power calculation 
## 
##              n1 = 39
##              n2 = 62
##           delta = 0.5
##             sd1 = 0.5
##             sd2 = 1
##       sig.level = 0.05
##           power = 0.9006363
##     alternative = two.sided
## 
## NOTE: unequal sample sizes per groups (allowed)

Xiao Two-Sample t-Test

The (optimal) sample sizes can be determined via Monte-Carlo simulations where the starting point is the (optimal) sample sizes of the Welch t-test. Since the computation time is quite high, we don’t run the following code.

sim.ssize.xiao.t.test(delta = 0.5, sd1 = 0.5, sd2 = 1, power = 0.9) # 35+59 = 94

Permutation t-Tests

The required sample size for permutation t-tests can be computed via simulations. Since the computation time is quite high, we don’t run the following code.

rx <- function(n) rnorm(n, mean = 0, sd = 1) 
ry <- function(n) rnorm(n, mean = 0.5, sd = 1) 
sim.ssize.perm.t.test(rx = rx, ry = ry, n.min = 50, n.max = 100, iter = 1000, 
                      var.equal = TRUE)

More examples can be found in section Examples of function sim.ssize.perm.t.test.

Bootstrap t-Tests

The required sample size for bootstrap t-tests can be computed via simulations. Since the computation time is quite high, we don’t run the following code.

rx <- function(n) rnorm(n, mean = 0, sd = 1) 
ry <- function(n) rnorm(n, mean = 0.5, sd = 1) 
sim.ssize.boot.t.test(rx = rx, ry = ry, n.min = 50, n.max = 100, iter = 1000, 
                      var.equal = TRUE)

More examples can be found in section Examples of function sim.ssize.boot.t.test.

ANCOVA

With function power.ancova one can compute power and sample size in ANCOVA designs (Shieh 2020). The default matrix of contrasts used in the function is

## 3 groups
cbind(rep(1,2), -diag(2))
##      [,1] [,2] [,3]
## [1,]    1   -1    0
## [2,]    1    0   -1
## 4 groups
cbind(rep(1,3), -diag(3))
##      [,1] [,2] [,3] [,4]
## [1,]    1   -1    0    0
## [2,]    1    0   -1    0
## [3,]    1    0    0   -1

We use the example provided in Table 9.7 of Maxwell and Delaney (2004).

## Example from Maxwell and Delaney (2004) according to Shieh (2020)
power.ancova(mu=c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.8)
## 
##      ANCOVA power calculation 
## 
##              ns = 14.29481, 14.29481, 14.29481
##              mu = 7.5366, 11.9849, 13.9785
##             var = 29.0898
##         nr.covs = 1
##       sig.level = 0.05
##           power = 0.8
## 
## NOTE: Total sample size: 45
power.ancova(n = rep(45/3, 3), mu=c(7.5366, 11.9849, 13.9785), var = 29.0898)
## 
##      ANCOVA power calculation 
## 
##              ns = 15, 15, 15
##              mu = 7.5366, 11.9849, 13.9785
##             var = 29.0898
##         nr.covs = 1
##       sig.level = 0.05
##           power = 0.8219894
## 
## NOTE: Total sample size: 45
power.ancova(mu=c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.9)
## 
##      ANCOVA power calculation 
## 
##              ns = 18.32456, 18.32456, 18.32456
##              mu = 7.5366, 11.9849, 13.9785
##             var = 29.0898
##         nr.covs = 1
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: Total sample size: 57
power.ancova(n = rep(57/3, 3), mu=c(7.5366, 11.9849, 13.9785), var = 29.0898)
## 
##      ANCOVA power calculation 
## 
##              ns = 19, 19, 19
##              mu = 7.5366, 11.9849, 13.9785
##             var = 29.0898
##         nr.covs = 1
##       sig.level = 0.05
##           power = 0.9115114
## 
## NOTE: Total sample size: 57

Based on the reported adjusted group means and error variance, a (total) sample size of 45 is required to achieve a power of at least 80%. The calculated power is 82.2%. With a sample size of 57 the power will be at least 90%, where the calculated power is 91.2%.

Introducing additional covariates (random covariate model) will increase the required sample size.

power.ancova(mu=c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.8,
             nr.covs = 2)
## 
##      ANCOVA power calculation 
## 
##              ns = 14.65509, 14.65509, 14.65509
##              mu = 7.5366, 11.9849, 13.9785
##             var = 29.0898
##         nr.covs = 2
##       sig.level = 0.05
##           power = 0.8
## 
## NOTE: Total sample size: 45
power.ancova(mu=c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.8,
             nr.covs = 3)
## 
##      ANCOVA power calculation 
## 
##              ns = 15.01403, 15.01403, 15.01403
##              mu = 7.5366, 11.9849, 13.9785
##             var = 29.0898
##         nr.covs = 3
##       sig.level = 0.05
##           power = 0.8
## 
## NOTE: Total sample size: 48
power.ancova(mu=c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.8,
             nr.covs = 5)
## 
##      ANCOVA power calculation 
## 
##              ns = 15.72837, 15.72837, 15.72837
##              mu = 7.5366, 11.9849, 13.9785
##             var = 29.0898
##         nr.covs = 5
##       sig.level = 0.05
##           power = 0.8
## 
## NOTE: Total sample size: 48
power.ancova(mu=c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.8,
             nr.covs = 10)
## 
##      ANCOVA power calculation 
## 
##              ns = 17.4965, 17.4965, 17.4965
##              mu = 7.5366, 11.9849, 13.9785
##             var = 29.0898
##         nr.covs = 10
##       sig.level = 0.05
##           power = 0.8
## 
## NOTE: Total sample size: 54

We can also calculate the per group sample sizes for an imbalanced design.

power.ancova(mu=c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.8, 
             group.ratio = c(1, 1.25, 1.5))
## 
##      ANCOVA power calculation 
## 
##              ns = 12.25390, 15.31738, 18.38085
##              mu = 7.5366, 11.9849, 13.9785
##             var = 29.0898
##         nr.covs = 1
##       sig.level = 0.05
##           power = 0.8
## 
## NOTE: Total sample size: 48
power.ancova(n = c(13, 16, 19), mu=c(7.5366, 11.9849, 13.9785), var = 29.0898,  
             group.ratio = c(1, 1.25, 1.5))
## 
##      ANCOVA power calculation 
## 
##              ns = 13, 16, 19
##              mu = 7.5366, 11.9849, 13.9785
##             var = 29.0898
##         nr.covs = 1
##       sig.level = 0.05
##           power = 0.8266307
## 
## NOTE: Total sample size: 48
power.ancova(mu=c(7.5366, 11.9849, 13.9785), var = 29.0898, power = 0.8, 
             group.ratio = c(1, 0.8, 2/3))
## 
##      ANCOVA power calculation 
## 
##              ns = 16.87449, 13.49959, 11.24966
##              mu = 7.5366, 11.9849, 13.9785
##             var = 29.0898
##         nr.covs = 1
##       sig.level = 0.05
##           power = 0.8
## 
## NOTE: Total sample size: 43
power.ancova(n = c(17, 14, 12), mu=c(7.5366, 11.9849, 13.9785), var = 29.0898,  
             group.ratio = c(1, 0.8, 2/3))
## 
##      ANCOVA power calculation 
## 
##              ns = 17, 14, 12
##              mu = 7.5366, 11.9849, 13.9785
##             var = 29.0898
##         nr.covs = 1
##       sig.level = 0.05
##           power = 0.8034654
## 
## NOTE: Total sample size: 43

We get a smaller total sample size with an imbalanced design (43 instead of 45).

Wilcoxon Rank Sum and Signed Rank Tests

For computing the sample size of these tests, we offer a function that performs Monte-Carlo simulations to determine their (empirical) power.

rx <- function(n) rnorm(n, mean = 0, sd = 1) 
ry <- function(n) rnorm(n, mean = 0.5, sd = 1) 
## two-sample
sim.ssize.wilcox.test(rx = rx, ry = ry, n.max = 100, iter = 1000)
## 
##      Wilcoxon rank sum test 
## 
##               n = 10, 20, 30, 40, 50, 60, 70
##              rx = rnorm(n, mean = 0, sd = 1)
##              ry = rnorm(n, mean = 0.5, sd = 1)
##       sig.level = 0.05
##       emp.power = 0.159, 0.318, 0.439, 0.558, 0.695, 0.766, 0.825
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
sim.ssize.wilcox.test(rx = rx, ry = ry, n.min = 65, n.max = 70, 
                      step.size = 1, iter = 1000, BREAK = FALSE)
## 
##      Wilcoxon rank sum test 
## 
##               n = 65, 66, 67, 68, 69, 70
##              rx = rnorm(n, mean = 0, sd = 1)
##              ry = rnorm(n, mean = 0.5, sd = 1)
##       sig.level = 0.05
##       emp.power = 0.794, 0.813, 0.802, 0.816, 0.814, 0.836
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.t.test(delta = 0.5, power = 0.8)
## 
##      Two-sample t test power calculation 
## 
##               n = 63.76576
##           delta = 0.5
##              sd = 1
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
## one-sample
sim.ssize.wilcox.test(rx = ry, n.max = 100, iter = 1000, type = "one.sample")
## 
##      Wilcoxon signed rank test 
## 
##               n = 10, 20, 30, 40
##              rx = rnorm(n, mean = 0.5, sd = 1)
##       sig.level = 0.05
##       emp.power = 0.295, 0.562, 0.747, 0.858
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
sim.ssize.wilcox.test(rx = ry, type = "one.sample", n.min = 33, 
                      n.max = 38, step.size = 1, iter = 1000, BREAK = FALSE)
## 
##      Wilcoxon signed rank test 
## 
##               n = 33, 34, 35, 36, 37, 38
##              rx = rnorm(n, mean = 0.5, sd = 1)
##       sig.level = 0.05
##       emp.power = 0.784, 0.799, 0.791, 0.820, 0.814, 0.827
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.t.test(delta = 0.5, power = 0.8, type = "one.sample")
## 
##      One-sample t test power calculation 
## 
##               n = 33.3672
##           delta = 0.5
##              sd = 1
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## two-sample
rx <- function(n) rgamma(n, scale = 2, shape = 1.5)
ry <- function(n) rgamma(n, scale = 4, shape = 1.5) # equal shape
## two-sample
sim.ssize.wilcox.test(rx = rx, ry = ry, n.max = 100, iter = 1000)
## 
##      Wilcoxon rank sum test 
## 
##               n = 10, 20, 30
##              rx = rgamma(n, scale = 2, shape = 1.5)
##              ry = rgamma(n, scale = 4, shape = 1.5)
##       sig.level = 0.05
##       emp.power = 0.348, 0.617, 0.855
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
sim.ssize.wilcox.test(rx = rx, ry = ry, n.min = 25, n.max = 30, 
                      step.size = 1, iter = 1000, BREAK = FALSE)
## 
##      Wilcoxon rank sum test 
## 
##               n = 25, 26, 27, 28, 29, 30
##              rx = rgamma(n, scale = 2, shape = 1.5)
##              ry = rgamma(n, scale = 4, shape = 1.5)
##       sig.level = 0.05
##       emp.power = 0.739, 0.758, 0.772, 0.786, 0.811, 0.819
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

For practical applications we recommend to use a higher number of iterations. For more detailed examples we refer to the help page of the function.

Two Negative Binomial Rates

When we consider two negative binomial rates (Zhu and Lakkis 2014), we can compute sample size or power applying function power.nb.test.

## examples from Table III in Zhu and Lakkis (2014)
power.nb.test(mu0 = 5.0, RR = 2.0, theta = 1/0.5, duration = 1, power = 0.8, approach = 1)
## 
##      Power calculation for comparing two negative binomial rates 
## 
##               n = 22.37386
##              n1 = 22.37386
##             mu0 = 5
##              RR = 2
##           theta = 2
##        duration = 1
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n = sample size of control group, n1 = sample size of treatment group
power.nb.test(mu0 = 5.0, RR = 2.0, theta = 1/0.5, duration = 1, power = 0.8, approach = 2)
## 
##      Power calculation for comparing two negative binomial rates 
## 
##               n = 21.23734
##              n1 = 21.23734
##             mu0 = 5
##              RR = 2
##           theta = 2
##        duration = 1
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n = sample size of control group, n1 = sample size of treatment group
power.nb.test(mu0 = 5.0, RR = 2.0, theta = 1/0.5, duration = 1, power = 0.8, approach = 3)
## 
##      Power calculation for comparing two negative binomial rates 
## 
##               n = 20.85564
##              n1 = 20.85564
##             mu0 = 5
##              RR = 2
##           theta = 2
##        duration = 1
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n = sample size of control group, n1 = sample size of treatment group

Diagnostic Test

Given an expected sensitivity and specificity we can compute sample size, power (assurance probability), delta or significance level (error probability) of diagnostic test (Flahault et al. 2005). The calculation is based on a one-sided confidence interval.

## see n2 on page 1202 of Chu and Cole (2007)
ssize.sens.ci(sens = 0.99, delta = 0.14, power = 0.95) # 40
## 
##      Diagnostic test exact power calculation 
## 
##            sens = 0.99
##               n = 40
##              n1 = 40
##           delta = 0.14
##       sig.level = 0.05
##           power = 0.95
##            prev = 0.5
## 
## NOTE: n is number of cases, n1 is number of controls
ssize.sens.ci(sens = 0.99, delta = 0.13, power = 0.95) # 43
## 
##      Diagnostic test exact power calculation 
## 
##            sens = 0.99
##               n = 43
##              n1 = 43
##           delta = 0.13
##       sig.level = 0.05
##           power = 0.95
##            prev = 0.5
## 
## NOTE: n is number of cases, n1 is number of controls
ssize.spec.ci(spec = 0.99, delta = 0.12, power = 0.95) # 47
## 
##      Diagnostic test exact power calculation 
## 
##            spec = 0.99
##               n = 47
##              n1 = 47
##           delta = 0.12
##       sig.level = 0.05
##           power = 0.95
##            prev = 0.5
## 
## NOTE: n is number of controls, n1 is number of cases

Given an expected AUC we can compute sample size, power (assurance probability), delta or significance level (error probability) of the AUC based on a one-sided confidence interval.

ssize.auc.ci(AUC = 0.9, delta = 0.1, power = 0.95)
## 
##      Sample size calculation for AUC confidence interval 
## 
##             AUC = 0.9
##               n = 55.75181
##              n1 = 55.75181
##           delta = 0.1
##       sig.level = 0.05
##           power = 0.95
##            prev = 0.5
## 
## NOTE: n is number of cases, n1 is number of controls

The sample size planning for developing binary classifiers in case of high dimensional data, we can apply function ssize.pcc, which is based on the probability of correct classification (PCC) (Dobbin and Simon 2007).

## see Table 2 of Dobbin et al. (2008)
g <- 0.1
fc <- 1.6
ssize.pcc(gamma = g, stdFC = fc, nrFeatures = 22000)
## 
##      Sample Size Planning for Developing Classifiers Using High Dimensional Data 
## 
##           gamma = 0.1
##            prev = 0.5
##      nrFeatures = 22000
##              n1 = 21
##              n2 = 21
## 
## NOTE: n1 is number of cases, n2 is number of controls

Reference Range

We can apply function ssize.reference.range to determine the sample size required for a study planned to establish a reference range. The parametric approach assumes a normal distribution whereas the non-parametric approach only assumes a continuous distribution.

ssize.reference.range(delta = 0.01, ref.prob = 0.95, conf.prob = 0.9, 
                      method = "parametric", exact = TRUE)
## 
##      Reference range sample size calculation 
## 
##               n = 271.6691
##           delta = 0.01
##        ref.prob = 0.95
##       conf.prob = 0.9
##     alternative = two.sided
## 
## NOTE: Exact method for normal distribution
ssize.reference.range(delta = 0.01, ref.prob = 0.95, conf.prob = 0.9, 
                      method = "parametric", exact = FALSE)
## 
##      Reference range sample size calculation 
## 
##               n = 269.9241
##           delta = 0.01
##        ref.prob = 0.95
##       conf.prob = 0.9
##     alternative = two.sided
## 
## NOTE: Approximate method for normal distribution
ssize.reference.range(delta = 0.01, ref.prob = 0.95, conf.prob = 0.9, 
                      method = "nonparametric", exact = TRUE)
## 
##      Reference range sample size calculation 
## 
##               n = 635.2654
##           delta = 0.01
##        ref.prob = 0.95
##       conf.prob = 0.9
##     alternative = two.sided
## 
## NOTE: Exact non-parametric method
ssize.reference.range(delta = 0.01, ref.prob = 0.95, conf.prob = 0.9, 
                      method = "nonparametric", exact = FALSE)
## 
##      Reference range sample size calculation 
## 
##               n = 659.4762
##           delta = 0.01
##        ref.prob = 0.95
##       conf.prob = 0.9
##     alternative = two.sided
## 
## NOTE: Approximate non-parametric method

We can also calculate the sample size for one-sided reference ranges.

ssize.reference.range(delta = 0.015, ref.prob = 0.95, conf.prob = 0.9, 
                      method = "parametric", exact = TRUE, 
                      alternative = "one.sided")
## 
##      Reference range sample size calculation 
## 
##               n = 301.617
##           delta = 0.015
##        ref.prob = 0.95
##       conf.prob = 0.9
##     alternative = one.sided
## 
## NOTE: Exact method for normal distribution

Multiple Primary Endpoints (MPE)

We demonstrate how to calculate the sample size for a trial with two co-primary endpoints with known covariance.

Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
power.mpe.known.var(K = 2, delta = c(0.25, 0.4), Sigma = Sigma, 
                    sig.level = 0.025, power = 0.8)
## 
##  Power calculation for multiple co-primary endpoints (covariance known) 
## 
##               n = 251.2079
##           delta = 0.25, 0.40
##              SD = 1, 1
##             rho = 0.8
##       sig.level = 0.025
##           power = 0.8
## 
## Sigma =
##      [,1] [,2]
## [1,]  1.0  0.8
## [2,]  0.8  1.0
## 
## NOTE: n is number in *each* group
## equivalent: specify SDs and correlation rho
power.mpe.known.var(K = 2, delta = c(0.25, 0.4), SD = c(1,1), rho = 0.8, 
                    sig.level = 0.025, power = 0.8)
## 
##  Power calculation for multiple co-primary endpoints (covariance known) 
## 
##               n = 251.2079
##           delta = 0.25, 0.40
##              SD = 1, 1
##             rho = 0.8
##       sig.level = 0.025
##           power = 0.8
## 
## Sigma =
##      [,1] [,2]
## [1,]  1.0  0.8
## [2,]  0.8  1.0
## 
## NOTE: n is number in *each* group

Next, we show how to calculate the sample size for a trial with two co-primary endpoints with unknown covariance. Here, we follow three steps to determine the sample size.

  • Step 1: As we need starting values for our algorithm that computes the sample size in this case, we first act as if the covariance would be known and compute the sample size by applying our function power.mpe.known.var.

  • Step 2: The resulting value of n is considered as lower bound for the sample size in case of unknown covariance and is used as n.min in function power.mpe.unkown.var. Moreover, we specify a reasonable n.max, which must be larger than n.min.

  • Step 3: Finally, by using the arguments from the step 2, we can compute the sample size for the situation with unknown covariance.

## Step 1:
power.mpe.known.var(K = 2, delta = c(0.5, 0.4), Sigma = Sigma, 
                    sig.level = 0.025, power = 0.8)
## 
##  Power calculation for multiple co-primary endpoints (covariance known) 
## 
##               n = 100.0826
##           delta = 0.5, 0.4
##              SD = 1, 1
##             rho = 0.8
##       sig.level = 0.025
##           power = 0.8
## 
## Sigma =
##      [,1] [,2]
## [1,]  1.0  0.8
## [2,]  0.8  1.0
## 
## NOTE: n is number in *each* group
## Step 2 + 3:
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
power.mpe.unknown.var(K = 2, delta = c(0.5, 0.4), Sigma = Sigma, 
                      sig.level = 0.025, power = 0.8, 
                      n.min = 105, n.max = 120)
## Current precision:    -0.005071627 
## Current precision:    0.0523889 
## Current precision:    0.001421647 
## Current precision:    -0.0002714107 
## Current precision:    0.0006813584 
## Current precision:    -0.0007883937 
## Current precision:    -0.0001274897 
## Current precision:    0.0003797331 
## Current precision:    0.000433887 
## Current precision:    0.0002532183 
## Current precision:    4.031605e-07 
## Current precision:    -0.0002485883
## 
##  Power calculation for multiple co-primary endpoints (covariance unknown) 
## 
##               n = 106.064
##           delta = 0.5, 0.4
##              SD = 1, 1
##             rho = 0.5
##       sig.level = 0.025
##           power = 0.8
## 
## Sigma =
##      [,1] [,2]
## [1,]  1.0  0.5
## [2,]  0.5  1.0
## 
## NOTE: n is number in *each* group
## equivalent: specify SDs and correlation rho
power.mpe.unknown.var(K = 2, delta = c(0.5, 0.4), SD = c(1,1), rho = 0.5, 
                      sig.level = 0.025, power = 0.8, 
                      n.min = 105, n.max = 120)
## Current precision:    -0.004784479 
## Current precision:    0.05237533 
## Current precision:    0.001404638 
## Current precision:    -0.0001453167 
## Current precision:    9.38062e-05 
## Current precision:    -0.0007421143 
## Current precision:    -0.0005028624 
## Current precision:    -0.0004892789 
## Current precision:    -0.0001706202 
## Current precision:    5.929183e-05
## 
##  Power calculation for multiple co-primary endpoints (covariance unknown) 
## 
##               n = 105.9973
##           delta = 0.5, 0.4
##              SD = 1, 1
##             rho = 0.5
##       sig.level = 0.025
##           power = 0.8
## 
## Sigma =
##      [,1] [,2]
## [1,]  1.0  0.5
## [2,]  0.5  1.0
## 
## NOTE: n is number in *each* group

We finally demonstrate how to calculate the sample size for a trial with two primary endpoints with known covariance, where the trial is designed to find a significant difference for at least one endpoint.

Sigma <- matrix(c(1, 0.3, 0.3, 1), nrow = 2)
power.mpe.atleast.one(K = 2, delta = c(0.2, 0.3), Sigma = Sigma,
                      power = 0.8, sig.level = 0.025)
## 
##  Power calculation for multiple primary endpoints for at least one endpoint 
## 
##               n = 146.6651
##           delta = 0.2, 0.3
##              SD = 1, 1
##             rho = 0.3
##       sig.level = 0.025
##           power = 0.8
## 
## Sigma =
##      [,1] [,2]
## [1,]  1.0  0.3
## [2,]  0.3  1.0
## 
## NOTE: n is number in *each* group
## equivalent: specify SDs and correlation rho
power.mpe.atleast.one(K = 2, delta = c(0.2, 0.3), SD = c(1,1), rho = 0.3,  
                      power = 0.8, sig.level = 0.025)
## 
##  Power calculation for multiple primary endpoints for at least one endpoint 
## 
##               n = 146.6651
##           delta = 0.2, 0.3
##              SD = 1, 1
##             rho = 0.3
##       sig.level = 0.025
##           power = 0.8
## 
## Sigma =
##      [,1] [,2]
## [1,]  1.0  0.3
## [2,]  0.3  1.0
## 
## NOTE: n is number in *each* group

Power and Type-I-Error Simulations

There are quite some discussions and various proposals, which test is the best under which circumstances when we want to compare the location (mean, median) of two groups (Fagerland and Sandvik 2009; Fagerland 2012; Sezer et al. 2017). In addition, it is questioned whether pre-testing of assumptions is appropriate/useful at least from a practical point of view (Zimmerman 2004; Rasch et al. 2011; Rochon et al. 2012; Hoekstra et al. 2012).

We provide functions to simulate power and type-I-error of classical (Gosset 1908), Welch (Welch 1947), Hsu (Hsu 1938) and Xiao (Xiao 2018) t-tests as well as of conditional tests (conditional to pre-tests for normality and equal variance) and Wilcoxon-Mann-Whitney tests (Wilcoxon 1945; Mann and Whitney 1947).

## functions to simulate the data
## group 1
rx <- function(n) rnorm(n, mean = 0, sd = 1)
rx.H0 <- rx
## group 2
ry <- function(n) rnorm(n, mean = 3, sd = 3)
ry.H0 <- function(n) rnorm(n, mean = 0, sd = 3)
## theoretical results
power.welch.t.test(n = 10, delta = 3, sd1 = 1, sd2 = 3)
## 
##      Two-sample Welch t test power calculation 
## 
##               n = 10
##           delta = 3
##             sd1 = 1
##             sd2 = 3
##       sig.level = 0.05
##           power = 0.7794139
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.hsu.t.test(n = 10, delta = 3, sd1 = 1, sd2 = 3)
## 
##      Two-sample Hsu t test power calculation 
## 
##               n = 10
##           delta = 3
##             sd1 = 1
##             sd2 = 3
##       sig.level = 0.05
##           power = 0.7611553
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
## simulation
res.t <- sim.power.t.test(nx = 10, rx = rx, rx.H0 = rx.H0,
                          ny = 10, ry = ry, ry.H0 = ry.H0, 
                          methods = c("student", "welch", "hsu"))
res.t
## 
##     Simulation Set-up
##              nx = 10
##              rx = function (n) , rnorm(n, mean = 0, sd = 1)
##           rx.H0 = function (n) , rnorm(n, mean = 0, sd = 1)
##              ny = 10
##              ry = function (n) , rnorm(n, mean = 3, sd = 3)
##           ry.H0 = function (n) , rnorm(n, mean = 0, sd = 3)
##       sig.level = 0.05
##              mu = 0
##     alternative = two.sided
##            iter = 10000
## 
##     Student Two-sample t-Test
##        emp.power = 0.8102
## emp.type.I.error = 0.0571
## 
##     Welch Two-sample t-Test
##        emp.power = 0.7822
## emp.type.I.error = 0.0498
## 
##     Hsu Two-sample t-Test
##        emp.power = 0.7667
## emp.type.I.error = 0.0437
## low number of iterations to reduce computation time
res.w <- sim.power.wilcox.test(nx = 10, rx = rx, rx.H0 = rx.H0,
                               ny = 10, ry = ry, ry.H0 = ry.H0)
res.w
## 
##     Simulation Set-up
##              nx = 10
##              rx = function (n) , rnorm(n, mean = 0, sd = 1)
##           rx.H0 = function (n) , rnorm(n, mean = 0, sd = 1)
##              ny = 10
##              ry = function (n) , rnorm(n, mean = 3, sd = 3)
##           ry.H0 = function (n) , rnorm(n, mean = 0, sd = 3)
##       sig.level = 0.05
##              mu = 0
##     alternative = two.sided
##            iter = 10000
##        conf.int = FALSE
##     approximate = FALSE
##            ties = FALSE
## 
##     Exact Wilcoxon-Mann-Whitney Test
##        emp.power = 0.7462
## emp.type.I.error = 0.0613
## 
##     Asymptotic Wilcoxon-Mann-Whitney Test
##        emp.power = 0.7462
## emp.type.I.error = 0.0613
res.c <- sim.power.cond.test(nx = 10, rx = rx, rx.H0 = rx.H0,
                             ny = 10, ry = ry, ry.H0 = ry.H0)
res.c
## 
##     Simulation Set-up
##              nx = 10
##              rx = function (n) , rnorm(n, mean = 0, sd = 1)
##           rx.H0 = function (n) , rnorm(n, mean = 0, sd = 1)
##              ny = 10
##              ry = function (n) , rnorm(n, mean = 3, sd = 3)
##           ry.H0 = function (n) , rnorm(n, mean = 0, sd = 3)
##       sig.level = 0.05
##              mu = 0
##     alternative = two.sided
##            iter = 10000
## 
##     Conditional Two-sample Test
##        emp.power = 0.7811
## emp.type.I.error = 0.0551

For further investigation of the results, we provide some diagnostic plots.

## t-tests
hist(res.t)

qqunif(res.t, alpha = 0.1)

volcano(res.t, hex = TRUE)
## Warning: Computation failed in `stat_binhex()`.
## Computation failed in `stat_binhex()`.
## Computation failed in `stat_binhex()`.
## Computation failed in `stat_binhex()`.
## Computation failed in `stat_binhex()`.
## Computation failed in `stat_binhex()`.
## Caused by error in `compute_group()`:
## ! The package "hexbin" is required for `stat_bin_hex()`.

##  Wilcoxon-Mann-Whitney tests
hist(res.w)

qqunif(res.w, alpha = 0.05)

## conditional test
hist(res.c)

qqunif(res.c, alpha = 0.1)

We can also generate a volcano plot for the Wilcoxon-Mann-Whitney test, but this would require setting argument conf.int to TRUE, which would lead to a much higher computation time, hence we omitted it here. Furthermore, it is also possible to compute an approximate version of the test by setting argument approximate to TRUE (Hothorn et al. 2008) again by the cost of an increased computation time.

sessionInfo

sessionInfo()
## R version 4.6.1 (2026-06-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 26.04 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] MKpower_1.2    rmarkdown_2.31
## 
## loaded via a namespace (and not attached):
##   [1] tidyselect_1.2.1      libcoin_1.0-13        dplyr_1.2.1          
##   [4] farver_2.1.2          matrixTests_0.2.3.1   S7_0.2.2             
##   [7] bitops_1.0-9          fastmap_1.2.0         TH.data_1.1-5        
##  [10] pracma_2.4.6          MKdescr_1.0           digest_0.6.39        
##  [13] rpart_4.1.27          lifecycle_1.0.5       arrangements_1.1.10  
##  [16] survival_3.8-6        magrittr_2.0.5        compiler_4.6.1       
##  [19] rlang_1.2.0           sass_0.4.10           tools_4.6.1          
##  [22] yaml_2.3.12           knitr_1.51            labeling_0.4.3       
##  [25] RColorBrewer_1.1-3    multcomp_1.4-30       withr_3.0.3          
##  [28] MKinfer_1.4           purrr_1.2.2           sys_3.4.3            
##  [31] stats4_4.6.1          nnet_7.3-20           grid_4.6.1           
##  [34] opdisDownsampling_1.6 jomo_2.7-6            caTools_1.18.3       
##  [37] mice_3.19.0           ggplot2_4.0.3         scales_1.4.0         
##  [40] iterators_1.0.14      MASS_7.3-65           mvtnorm_1.4-1        
##  [43] cli_3.6.6             reformulas_0.4.4      generics_0.1.4       
##  [46] exactRankTests_0.8-37 elliptic_1.5-1        twosamples_2.0.1     
##  [49] robustbase_0.99-7     minqa_1.2.8           DBI_1.3.0            
##  [52] cachem_1.1.0          modeltools_0.2-24     splines_4.6.1        
##  [55] parallel_4.6.1        matrixStats_1.5.0     mitools_2.4          
##  [58] vctrs_0.7.3           sandwich_3.1-1        boot_1.3-32          
##  [61] glmnet_5.0            Matrix_1.7-5          jsonlite_2.0.0       
##  [64] contfrac_1.1-12       mitml_0.4-5           pbmcapply_1.5.1      
##  [67] maketools_1.3.2       hypergeo_1.2-14       foreach_1.5.2        
##  [70] qqplotr_0.0.7         tidyr_1.3.2           jquerylib_0.1.4      
##  [73] glue_1.8.1            DEoptimR_1.2-0        nloptr_2.2.1         
##  [76] pan_2.0               codetools_0.2-20      shape_1.4.6.1        
##  [79] gtable_0.3.6          lme4_2.0-1            gmp_0.7-5.1          
##  [82] tibble_3.3.1          pillar_1.11.1         miceadds_3.20-10     
##  [85] htmltools_0.5.9       deSolve_1.42          R6_2.6.1             
##  [88] Rdpack_2.6.6          doParallel_1.0.17     evaluate_1.0.5       
##  [91] lattice_0.22-9        rbibutils_2.4.1       backports_1.5.1      
##  [94] broom_1.0.13          qqconf_1.3.2          bslib_0.11.0         
##  [97] Rcpp_1.1.1-1.1        nlme_3.1-169          coin_1.4-4           
## [100] xfun_0.59             zoo_1.8-15            buildtools_1.0.0     
## [103] pkgconfig_2.0.3

References

Davison, A. C., and D. V. Hinkley. 1997. Bootstrap Methods and Their Applications. Cambridge University Press.
Dobbin, K. K., and R. M. Simon. 2007. Sample size planning for developing classifiers using high-dimensional DNA microarray data.” Biostatistics 8 (1): 101–17.
Fagerland, M. W. 2012. t-tests, non-parametric tests, and large studies–a paradox of statistical practice? BMC Med Res Methodol 12 (June): 78.
Fagerland, M. W., and L. Sandvik. 2009. Performance of five two-sample location tests for skewed distributions with unequal variances.” Contemp Clin Trials 30 (5): 490–96.
Flahault, A., M. Cadilhac, and G. Thomas. 2005. Sample size calculation should be performed for design accuracy in diagnostic test studies.” J Clin Epidemiol 58 (8): 859–62.
Fleiss, Joseph L., Bruce Levin, and Myunghee Cho Paik. 2003. Statistical Methods for Rates and Proportions. Wiley Series in Probability; Statistics.
Gosset, William Sealy. 1908. “The Probable Error of a Mean.” Biometrika 6 (1): 1–25.
Hedderich, Jürgen, and Lothar Sachs. 2018. Angewandte Statistik: Methodensammlung mit R. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-662-56657-2.
Hoekstra, R., H. A. Kiers, and A. Johnson. 2012. Are assumptions of well-known statistical techniques checked, and why (not)? Front Psychol 3: 137.
Hothorn, Torsten, Kurt Hornik, Mark A. van de Wiel, and Achim Zeileis. 2008. “Implementing a Class of Permutation Tests: The coin Package.” Journal of Statistical Software 28 (8): 1–23. https://doi.org/10.18637/jss.v028.i08.
Hsu, P. 1938. “Contribution to the Theory of Student’s’ t-Test as Applied to the Problem of Two Samples.” Statistical Research Memoirs 2: 1–24.
Janssen, Arnold. 1997. “Studentized Permutation Tests for Non-i.i.d. Hypotheses and the Generalized Behrens-Fisher Problem.” Statistics & Probability Letters 36 (1): 9–21. https://doi.org/10.1016/s0167-7152(97)00043-6.
Jennen-Steinmetz, C., and S. Wellek. 2005. A new approach to sample size calculation for reference interval studies.” Stat Med 24 (20): 3199–212.
Mann, H. B., and D. R. Whitney. 1947. “On a Test of Whether One of Two Random Variables Is Stochastically Larger Than the Other.” Ann. Math. Statist. 18 (1): 50–60. https://doi.org/10.1214/aoms/1177730491.
Maxwell, S. E., and H. D. Delaney. 2004. Designing Experiments and Analyzing Data: A Model Comparison Perspective. 2nd ed. Mahwah, NJ: Lawrence Erlbaum Associates.
Rasch, Dieter, Klaus D. Kubinger, and Karl Moder. 2011. “The Two-Sample t Test: Pre-Testing Its Assumptions Does Not Pay Off.” Statistical Papers 52: 219–31.
Rochon, J., M. Gondan, and M. Kieser. 2012. To test or not to test: Preliminary assessment of normality when comparing two independent samples.” BMC Med Res Methodol 12 (June): 81.
Sezer, Ahmet, Evren Ozkip, and Berna Yazici. 2017. “Comparison of Confidence Intervals for the Behrens-Fisher Problem.” Communications in Statistics - Simulation and Computation 46 (4): 3242–66. https://doi.org/10.1080/03610918.2015.1082587.
Shieh, G. 2020. Power Analysis and Sample Size Planning in ANCOVA Designs.” Psychometrika 85: 101–20. https://doi.org/10.1007/s11336-019-09692-3.
Sozu, Takashi, Tomoyuki Sugimoto, Toshimitsu Hamasaki, and Scott R. Evans. 2015. Sample Size Determination in Clinical Trials with Multiple Endpoints. Springer Cham. https://doi.org/10.1007/978-3-319-22005-5.
Welch, B. L. 1947. “The Generalisation of Student’s Problems When Several Different Population Variances Are Involved.” Biometrika 34 (1-2): 28–35.
Wilcoxon, Frank. 1945. “Individual Comparisons by Ranking Methods.” Biometrics Bulletin 1 (6): 80–83.
Xiao, Yongshun. 2018. “On the Solution of a Generalized Behrens-Fisher Problem.” Far East Journal of Theoretical Statistics 54 (1): 21–140. https://doi.org/10.17654/TS054010021.
Zhu, H., and H. Lakkis. 2014. Sample size calculation for comparing two negative binomial rates.” Stat Med 33 (3): 376–87.
Zimmerman, D. W. 2004. A note on preliminary tests of equality of variances.” Br J Math Stat Psychol 57 (Pt 1): 173–81.