Package MKinfer

Introduction

Package MKinfer includes a collection of functions for the computation of various confidence intervals (Altman et al. 2000; Hedderich and Sachs 2018) including bootstrapped versions (Davison and Hinkley 1997) as well as Hsu (Hedderich and Sachs 2018), permutation (Janssen 1997), bootstrap (Davison and Hinkley 1997), intersection-union (Sozu et al. 2015) and multiple imputation (Barnard and Rubin 1999) t-test; furthermore, computation of intersection-union z-test as well as multiple imputation Wilcoxon tests. Graphical visualization by volcano and Bland-Altman plots (Bland and Altman 1986; Shieh 2018).

We first load the package.

library(MKinfer)

Confidence Intervals

Binomial Proportion

There are several functions for computing confidence intervals. We can compute 12 different confidence intervals for binomial proportions (DasGupta, Cai, and Brown 2001); e.g.

## default: "wilson"
binomCI(x = 12, n = 50)

## 
##  wilson confidence interval
## 
## 95 percent confidence interval:
##          2.5 %    97.5 %
## prob 0.1429739 0.3741268
## 
## sample estimate:
## prob 
## 0.24 
## 
## additional information:
## standard error of prob 
##             0.05896867

## Clopper-Pearson interval
binomCI(x = 12, n = 50, method = "clopper-pearson")

## 
##  clopper-pearson confidence interval
## 
## 95 percent confidence interval:
##          2.5 %    97.5 %
## prob 0.1306099 0.3816907
## 
## sample estimate:
## prob 
## 0.24

## identical to 
binom.test(x = 12, n = 50)$conf.int

## [1] 0.1306099 0.3816907
## attr(,"conf.level")
## [1] 0.95

For all intervals implemented see the help page of function binomCI. One can also compute bootstrap intervals via function boot.ci of package boot (Davison and Hinkley 1997) as well as one-sided intervals.

Difference of Two Binomial Proportions

There are several functions for computing confidence intervals. We can compute different confidence intervals for the difference of two binomial proportions (independent (Newcombe 1998a) and paired case (Newcombe 1998b)); e.g.

## default: wilson 
binomDiffCI(a = 5, b = 0, c = 51, d = 29)

## 
##  wilson confidence interval (independent proportions)
## 
## 95 percent confidence interval:
##                                             2.5 %  97.5 %
## difference of independent proportions -0.03813715 0.19256
## 
## sample estimate:
## difference of proportions 
##                0.08928571 
## 
## additional information:
## proportion of group 1 proportion of group 2 
##            0.08928571            0.00000000

## default: wilson with continuity correction
binomDiffCI(a = 212, b = 144, c = 256, d = 707, paired = TRUE)

## 
##  wilson-cc confidence interval (paired data)
## 
## 95 percent confidence interval:
##                                              2.5 %      97.5 %
## difference of proportions (paired data) -0.1141915 -0.05543347
## 
## sample estimate:
## difference of proportions 
##               -0.08491281 
## 
## additional information:
## proportion of group 1 proportion of group 2 
##             0.2699014             0.3548143

For all intervals implemented see the help page of function binomDiffCI. One can also compute boostrap intervals via function boot.ci of package boot (Davison and Hinkley 1997) as well as one-sided intervals.

Mean and SD

We can compute confidence intervals for mean and SD Davison and Hinkley (1997).

x <- rnorm(50, mean = 2, sd = 3)
## mean and SD unknown
normCI(x)

## 
##  Exact confidence interval(s)
## 
## 95 percent confidence intervals:
##          2.5 %   97.5 %
## mean 0.5794082 2.401203
## sd   2.6773824 3.994062
## 
## sample estimates:
##     mean       sd 
## 1.490306 3.205164 
## 
## additional information:
## SE of mean 
##  0.4532786

meanCI(x)

## 
##  Exact confidence interval(s)
## 
## 95 percent confidence interval:
##          2.5 %   97.5 %
## mean 0.5794082 2.401203
## 
## sample estimates:
##     mean       sd 
## 1.490306 3.205164 
## 
## additional information:
## SE of mean 
##  0.4532786

sdCI(x)

## 
##  Exact confidence interval(s)
## 
## 95 percent confidence interval:
##       2.5 %   97.5 %
## sd 2.677382 3.994062
## 
## sample estimates:
##     mean       sd 
## 1.490306 3.205164 
## 
## additional information:
## SE of mean 
##  0.4532786

## SD known
normCI(x, sd = 3)

## 
##  Exact confidence interval(s)
## 
## 95 percent confidence interval:
##          2.5 %   97.5 %
## mean 0.6587634 2.321848
## 
## sample estimate:
##     mean 
## 1.490306 
## 
## additional information:
## SE of mean 
##  0.4242641

## mean known
normCI(x, mean = 2, alternative = "less")

## 
##  Exact confidence interval(s)
## 
## 95 percent confidence interval:
##    0 %     95 %
## sd   0 3.851717
## 
## sample estimate:
##       sd 
## 3.205164

## bootstrap
meanCI(x, boot = TRUE)

## 
##  Bootstrap confidence interval(s)
## 
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 9999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot.out, conf = 1 - alpha, type = bootci.type)
## 
## Intervals : 
## Level      Normal              Basic             Studentized     
## 95%   ( 0.619,  2.368 )   ( 0.610,  2.369 )   ( 0.608,  2.430 )  
## 
## Level     Percentile            BCa          
## 95%   ( 0.612,  2.371 )   ( 0.636,  2.392 )  
## Calculations and Intervals on Original Scale
## 
## sample estimates:
##     mean       sd 
## 1.490306 3.205164

Difference in Means

We can compute confidence interval for the difference of means (Altman et al. 2000; Hedderich and Sachs 2018; Davison and Hinkley 1997).

x <- rnorm(20)
y <- rnorm(20, sd = 2)
## paired
normDiffCI(x, y, paired = TRUE)

## 
##  Confidence interval (paired)
## 
## 95 percent confidence interval:
##                         2.5 %    97.5 %
## mean of differences -1.379561 0.7990519
## 
## sample estimates:
## mean of differences   sd of differences 
##          -0.2902546           2.3275064 
## 
## additional information:
## SE of mean of differences 
##                 0.5204463

## compare
normCI(x-y)

## 
##  Exact confidence interval(s)
## 
## 95 percent confidence intervals:
##          2.5 %    97.5 %
## mean -1.379561 0.7990519
## sd    1.770046 3.3994898
## 
## sample estimates:
##       mean         sd 
## -0.2902546  2.3275064 
## 
## additional information:
## SE of mean 
##  0.5204463

## bootstrap
normDiffCI(x, y, paired = TRUE, boot = TRUE)

## 
##  Bootstrap confidence interval (paired)
## 
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 9999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot.out, conf = 1 - alpha, type = bootci.type)
## 
## Intervals : 
## Level      Normal              Basic             Studentized     
## 95%   (-1.2778,  0.6980 )   (-1.1907,  0.7667 )   (-1.9636,  0.5814 )  
## 
## Level     Percentile            BCa          
## 95%   (-1.3472,  0.6102 )   (-1.6098,  0.4923 )  
## Calculations and Intervals on Original Scale
## 
## sample estimates:
## mean of differences   sd of differences 
##          -0.2902546           2.3275064

## unpaired
y <- rnorm(10, mean = 1, sd = 2)
## classical
normDiffCI(x, y, method = "classical")

## 
##  Classical confidence interval (unpaired)
## 
## 95 percent confidence interval:
##                         2.5 %      97.5 %
## difference in means -1.974127 -0.08137768
## 
## sample estimate:
## difference in means 
##           -1.027752 
## 
## additional information:
## SE of difference in means           Cohen's d (SMD) 
##                 0.4620052                -0.8615635 
## 
##  mean of x    SD of x  mean of y    SD of y 
## 0.03557994 0.97625065 1.06333237 1.55404620

## Welch (default as in case of function t.test)
normDiffCI(x, y, method = "welch")

## 
##  Welch confidence interval (unpaired)
## 
## 95 percent confidence interval:
##                         2.5 %    97.5 %
## difference in means -2.192557 0.1370519
## 
## sample estimate:
## difference in means 
##           -1.027752 
## 
## additional information:
## SE of difference in means           Cohen's d (SMD) 
##                 0.5377353                -0.5600082 
## 
##  mean of x    SD of x  mean of y    SD of y 
## 0.03557994 0.97625065 1.06333237 1.55404620

## Hsu
normDiffCI(x, y, method = "hsu")

## 
##  Hsu confidence interval (unpaired)
## 
## 95 percent confidence interval:
##                         2.5 %    97.5 %
## difference in means -2.244194 0.1886893
## 
## sample estimate:
## difference in means 
##           -1.027752 
## 
## additional information:
## SE of difference in means           Cohen's d (SMD) 
##                 0.5377353                -0.5600082 
## 
##  mean of x    SD of x  mean of y    SD of y 
## 0.03557994 0.97625065 1.06333237 1.55404620

## bootstrap: assuming equal variances
normDiffCI(x, y, method = "classical", boot = TRUE, bootci.type = "bca")

## 
##  Bootstrap confidence interval (equal variances, unpaired)
## 
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 9999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot.out, conf = 1 - alpha, type = bootci.type)
## 
## Intervals : 
## Level       BCa          
## 95%   (-2.125, -0.118 )  
## Calculations and Intervals on Original Scale
## 
## sample estimate:
## difference in means 
##           -1.027752 
## 
## additional information:
## SE of difference in means           Cohen's d (SMD) 
##                 0.4620052                -0.8615635 
## 
##  mean of x    SD of x  mean of y    SD of y 
## 0.03557994 0.97625065 1.06333237 1.55404620

## bootstrap: assuming unequal variances
normDiffCI(x, y, method = "welch", boot = TRUE, bootci.type = "bca")

## 
##  Bootstrap confidence interval (unequal variances, unpaired)
## 
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 9999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot.out, conf = 1 - alpha, type = bootci.type)
## 
## Intervals : 
## Level       BCa          
## 95%   (-2.161, -0.139 )  
## Calculations and Intervals on Original Scale
## 
## sample estimate:
## difference in means 
##           -1.027752 
## 
## additional information:
## SE of difference in means           Cohen's d (SMD) 
##                 0.5377353                -0.5600082 
## 
##  mean of x    SD of x  mean of y    SD of y 
## 0.03557994 0.97625065 1.06333237 1.55404620

In case of unequal variances and unequal sample sizes per group the classical confidence interval may have a bad coverage (too long or too short), as is indicated by the small Monte-Carlo simulation study below.

M <- 100
CIhsu <- CIwelch <- CIclass <- matrix(NA, nrow = M, ncol = 2)
for(i in 1:M){
  x <- rnorm(10)
  y <- rnorm(30, sd = 0.1)
  CIclass[i,] <- normDiffCI(x, y, method = "classical")$conf.int
  CIwelch[i,] <- normDiffCI(x, y, method = "welch")$conf.int
  CIhsu[i,] <- normDiffCI(x, y, method = "hsu")$conf.int
}
## coverage probabilies
## classical
sum(CIclass[,1] < 0 & 0 < CIclass[,2])/M

## [1] 0.73

## Welch
sum(CIwelch[,1] < 0 & 0 < CIwelch[,2])/M

## [1] 0.98

## Hsu
sum(CIhsu[,1] < 0 & 0 < CIhsu[,2])/M

## [1] 0.98

Coefficient of Variation

We provide 12 different confidence intervals for the (classical) coefficient of variation (Gulhar, Kibria, and Ahmed 2012; Davison and Hinkley 1997); e.g.

x <- rnorm(100, mean = 10, sd = 2) # CV = 0.2
## default: "miller"
cvCI(x)

## 
##  Miller (1991) confidence interval
## 
## 95 percent confidence interval:
##        2.5 %    97.5 %
## CV 0.1640792 0.2194265
## 
## sample estimate:
##        CV 
## 0.1917529 
## 
## additional information:
## standard error of CV 
##           0.01411946

## Gulhar et al. (2012)
cvCI(x, method = "gulhar")

## 
##  Gulhar et al (2012) confidence interval
## 
## 95 percent confidence interval:
##        2.5 %    97.5 %
## CV 0.1683603 0.2227546
## 
## sample estimate:
##        CV 
## 0.1917529

## bootstrap
cvCI(x, method = "boot")

## 
##  Bootstrap confidence interval
## 
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 9999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot.out, conf = 1 - alpha, type = bootci.type)
## 
## Intervals : 
## Level      Normal              Basic         
## 95%   ( 0.1658,  0.2199 )   ( 0.1657,  0.2196 )  
## 
## Level     Percentile            BCa          
## 95%   ( 0.1639,  0.2178 )   ( 0.1681,  0.2230 )  
## Calculations and Intervals on Original Scale
## 
## sample estimate:
##        CV 
## 0.1917529

For all intervals implemented see the help page of function cvCI.

Quantiles, Median and MAD

We start with the computation of confidence intervals for quantiles (Hedderich and Sachs 2018; Davison and Hinkley 1997).

x <- rexp(100, rate = 0.5)
## exact
quantileCI(x = x, prob = 0.95)

## 
##  exact confidence interval
## 
## 95.1 percent confidence interval:
##                 2.45 % 97.55 %
## 95 % quantile 4.570694  8.8396
## 
## sample estimate:
## 95 % quantile 
##      6.500321

## asymptotic
quantileCI(x = x, prob = 0.95, method = "asymptotic")

## 
##  asymptotic confidence interval
## 
## 95 percent confidence interval:
##                  2.5 %   97.5 %
## 95 % quantile 4.570694 9.292469
## 
## sample estimate:
## 95 % quantile 
##      6.500321

## boot
quantileCI(x = x, prob = 0.95, method = "boot")

## 
##  bootstrap confidence interval
## 
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 9999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot.out, conf = 1 - alpha, type = bootci.type)
## 
## Intervals : 
## Level      Normal              Basic         
## 95%   ( 4.212,  8.548 )   ( 4.333,  8.684 )  
## 
## Level     Percentile            BCa          
## 95%   ( 4.317,  8.667 )   ( 4.581,  8.840 )  
## Calculations and Intervals on Original Scale
## 
## sample estimate:
## 95 % quantile 
##      6.500321

Next, we consider the median.

## exact
medianCI(x = x)

## 
##  exact confidence interval
## 
## 95.2 percent confidence interval:
##          2.4 %   97.6 %
## median 1.04356 1.815473
## 
## sample estimate:
##   median 
## 1.529209

## asymptotic
medianCI(x = x, method = "asymptotic")

## 
##  asymptotic confidence interval
## 
## 95 percent confidence interval:
##           2.5 %   97.5 %
## median 1.177274 1.846582
## 
## sample estimate:
##   median 
## 1.529209

## boot
medianCI(x = x, method = "boot")

## 
##  bootstrap confidence interval
## 
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 9999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot.out, conf = 1 - alpha, type = bootci.type)
## 
## Intervals : 
## Level      Normal              Basic         
## 95%   ( 1.228,  1.818 )   ( 1.212,  1.866 )  
## 
## Level     Percentile            BCa          
## 95%   ( 1.192,  1.847 )   ( 1.177,  1.831 )  
## Calculations and Intervals on Original Scale
## 
## sample estimate:
##   median 
## 1.529209

Finally, we take a look at MAD (median absolute deviation) where by default the standardized MAD is used (see function mad).

## exact
madCI(x = x)

## 
##  exact confidence interval
## 
## 95.2 percent confidence interval:
##        2.4 % 97.6 %
## MAD 1.027227 1.4651
## 
## sample estimate:
##     MAD 
## 1.31684

## aysymptotic
madCI(x = x, method = "asymptotic")

## 
##  asymptotic confidence interval
## 
## 95 percent confidence interval:
##       2.5 %   97.5 %
## MAD 1.04543 1.502433
## 
## sample estimate:
##     MAD 
## 1.31684

## boot
madCI(x = x, method = "boot")

## 
##  bootstrap confidence interval
## 
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 9999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot.out, conf = 1 - alpha, type = bootci.type)
## 
## Intervals : 
## Level      Normal              Basic         
## 95%   ( 0.977,  1.704 )   ( 0.965,  1.717 )  
## 
## Level     Percentile            BCa          
## 95%   ( 0.917,  1.669 )   ( 0.957,  1.733 )  
## Calculations and Intervals on Original Scale
## 
## sample estimate:
##     MAD 
## 1.31684

## unstandardized
madCI(x = x, constant = 1)

## 
##  exact confidence interval
## 
## 95.2 percent confidence interval:
##         2.4 %    97.6 %
## MAD 0.6928552 0.9881963
## 
## sample estimate:
##       MAD 
## 0.8881964

Hsu Two-Sample t-Test

The Hsu two-sample t-test is an alternative to the Welch two-sample t-test using a different formula for computing the degrees of freedom of the respective t-distribution (Hedderich and Sachs 2018). The following code is taken and adapted from the help page of the t.test function.

t.test(1:10, y = c(7:20))      # P = .00001855

## 
##  Welch Two Sample t-test
## 
## data:  1:10 and c(7:20)
## t = -5.4349, df = 21.982, p-value = 1.855e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.052802  -4.947198
## sample estimates:
## mean of x mean of y 
##       5.5      13.5

t.test(1:10, y = c(7:20, 200)) # P = .1245    -- NOT significant anymore

## 
##  Welch Two Sample t-test
## 
## data:  1:10 and c(7:20, 200)
## t = -1.6329, df = 14.165, p-value = 0.1245
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -47.242900   6.376233
## sample estimates:
## mean of x mean of y 
##   5.50000  25.93333

hsu.t.test(1:10, y = c(7:20))

## 
##  Hsu Two Sample t-test
## 
## data:  1:10 and c(7:20)
## t = -5.4349, df = 9, p-value = 0.0004137
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.329805  -4.670195
## sample estimates:
## mean of x mean of y   SD of x   SD of y 
##   5.50000  13.50000   3.02765   4.18330

hsu.t.test(1:10, y = c(7:20, 200))

## 
##  Hsu Two Sample t-test
## 
## data:  1:10 and c(7:20, 200)
## t = -1.6329, df = 9, p-value = 0.1369
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -48.740846   7.874179
## sample estimates:
## mean of x mean of y   SD of x   SD of y 
##   5.50000  25.93333   3.02765  48.32253

## Traditional interface
with(sleep, t.test(extra[group == 1], extra[group == 2]))

## 
##  Welch Two Sample t-test
## 
## data:  extra[group == 1] and extra[group == 2]
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.3654832  0.2054832
## sample estimates:
## mean of x mean of y 
##      0.75      2.33

with(sleep, hsu.t.test(extra[group == 1], extra[group == 2]))

## 
##  Hsu Two Sample t-test
## 
## data:  extra[group == 1] and extra[group == 2]
## t = -1.8608, df = 9, p-value = 0.09569
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.5007773  0.3407773
## sample estimates:
## mean of x mean of y   SD of x   SD of y 
##  0.750000  2.330000  1.789010  2.002249

## Formula interface
t.test(extra ~ group, data = sleep)

## 
##  Welch Two Sample t-test
## 
## data:  extra by group
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -3.3654832  0.2054832
## sample estimates:
## mean in group 1 mean in group 2 
##            0.75            2.33

hsu.t.test(extra ~ group, data = sleep)

## 
##  Hsu Two Sample t-test
## 
## data:  extra by group
## t = -1.8608, df = 9, p-value = 0.09569
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.5007773  0.3407773
## sample estimates:
## mean of x mean of y   SD of x   SD of y 
##  0.750000  2.330000  1.789010  2.002249

Bootstrap t-Test

One and two sample bootstrap t-tests with equal or unequal variances in the two sample case (Efron and Tibshirani 1993).

boot.t.test(1:10, y = c(7:20)) # without bootstrap: P = .00001855

## 
##  Bootstrap Welch Two Sample t-test
## 
## data:  1:10 and c(7:20)
## number of bootstrap samples:  9999
## bootstrap p-value < 1e-04 
## bootstrap difference of means (SE) = -8.025385 (1.401167) 
## 95 percent bootstrap percentile confidence interval:
##  -10.75714  -5.30000
## 
## Results without bootstrap:
## t = -5.4349, df = 21.982, p-value = 1.855e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.052802  -4.947198
## sample estimates:
## mean of x mean of y 
##       5.5      13.5

boot.t.test(1:10, y = c(7:20, 200)) # without bootstrap: P = .1245

## 
##  Bootstrap Welch Two Sample t-test
## 
## data:  1:10 and c(7:20, 200)
## number of bootstrap samples:  9999
## bootstrap p-value = 0.0222 
## bootstrap difference of means (SE) = -20.39653 (9.993624) 
## 95 percent bootstrap percentile confidence interval:
##  -46.433333  -5.966667
## 
## Results without bootstrap:
## t = -1.6329, df = 14.165, p-value = 0.1245
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -47.242900   6.376233
## sample estimates:
## mean of x mean of y 
##   5.50000  25.93333

## Traditional interface
with(sleep, boot.t.test(extra[group == 1], extra[group == 2]))

## 
##  Bootstrap Welch Two Sample t-test
## 
## data:  extra[group == 1] and extra[group == 2]
## number of bootstrap samples:  9999
## bootstrap p-value = 0.07421 
## bootstrap difference of means (SE) = -1.57632 (0.8009566) 
## 95 percent bootstrap percentile confidence interval:
##  -3.12  0.01
## 
## Results without bootstrap:
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.3654832  0.2054832
## sample estimates:
## mean of x mean of y 
##      0.75      2.33

## Formula interface
boot.t.test(extra ~ group, data = sleep)

## 
##  Bootstrap Welch Two Sample t-test
## 
## data:  extra by group
## number of bootstrap samples:  9999
## bootstrap p-value = 0.07861 
## bootstrap difference of means (SE) = -1.583426 (0.7983694) 
## 95 percent bootstrap percentile confidence interval:
##  -3.170000e+00  2.220446e-17
## 
## Results without bootstrap:
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.3654832  0.2054832
## sample estimates:
## mean in group 1 mean in group 2 
##            0.75            2.33

Permutation t-Test

One and two sample permutation t-tests with equal (Efron and Tibshirani 1993) or unequal variances (Janssen 1997) in the two sample case.

perm.t.test(1:10, y = c(7:20)) # without permutation: P = .00001855

## 
##  Permutation Welch Two Sample t-test
## 
## data:  1:10 and c(7:20)
## number of permutations:  9999
## (Monte-Carlo) permutation p-value < 1e-04 
## permutation difference of means (SE) = -8.023334 (2.253311) 
## 95 percent (Monte-Carlo) permutation percentile confidence interval:
##  -12.400000  -3.657143
## 
## Results without permutation:
## t = -5.4349, df = 21.982, p-value = 1.855e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.052802  -4.947198
## sample estimates:
## mean of x mean of y 
##       5.5      13.5

## permutation confidence interval sensitive to outlier!
perm.t.test(1:10, y = c(7:20, 200)) # without permutation: P = .1245

## 
##  Permutation Welch Two Sample t-test
## 
## data:  1:10 and c(7:20, 200)
## number of permutations:  9999
## (Monte-Carlo) permutation p-value < 1e-04 
## permutation difference of means (SE) = -20.61166 (15.31447) 
## 95 percent (Monte-Carlo) permutation percentile confidence interval:
##  -36.86667   1.80000
## 
## Results without permutation:
## t = -1.6329, df = 14.165, p-value = 0.1245
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -47.242900   6.376233
## sample estimates:
## mean of x mean of y 
##   5.50000  25.93333

## Traditional interface
with(sleep, perm.t.test(extra[group == 1], extra[group == 2]))

## 
##  Permutation Welch Two Sample t-test
## 
## data:  extra[group == 1] and extra[group == 2]
## number of permutations:  9999
## (Monte-Carlo) permutation p-value = 0.07491 
## permutation difference of means (SE) = -1.589631 (0.9018378) 
## 95 percent (Monte-Carlo) permutation percentile confidence interval:
##  -3.32  0.16
## 
## Results without permutation:
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.3654832  0.2054832
## sample estimates:
## mean of x mean of y 
##      0.75      2.33

## Formula interface
res <- perm.t.test(extra ~ group, data = sleep)
res

## 
##  Permutation Welch Two Sample t-test
## 
## data:  extra by group
## number of permutations:  9999
## (Monte-Carlo) permutation p-value = 0.07561 
## permutation difference of means (SE) = -1.589131 (0.901691) 
## 95 percent (Monte-Carlo) permutation percentile confidence interval:
##  -3.32  0.16
## 
## Results without permutation:
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.3654832  0.2054832
## sample estimates:
## mean in group 1 mean in group 2 
##            0.75            2.33

In case of skewed distributions, one may use function p2ses to compute an alternative standardized effect size (SES) as proposed by Botta-Dukat (Botta-Dukát 2018).

p2ses(res$p.value)

## [1] 1.754212

Intersection-Union z- and t-Test

We compute the multivariate intersection-union z- and t-test.

library(mvtnorm)
## effect size
delta <- c(0.25, 0.5)
## covariance matrix
Sigma <- matrix(c(1, 0.75, 0.75, 1), ncol = 2)
## sample size
n <- 50
## generate random data from multivariate normal distributions
X <- rmvnorm(n=n, mean = delta, sigma = Sigma)
Y <- rmvnorm(n=n, mean = rep(0, length(delta)), sigma = Sigma)
## perform multivariate z-test
mpe.z.test(X = X, Y = Y, Sigma = Sigma)

## 
##  Intersection-union z-test
## 
## data:  x and y
## z = 1.1588, p-value = 0.1233
## alternative hypothesis:  true difference in means is larger than 0 for all endpoints 
## 97.5 percent confidence intervals:
##           0.025   1
## EP.1 -0.1602377 Inf
## EP.2 -0.1515455 Inf
## sample estimates:
##              X         Y
## EP.1 0.4448950 0.2131399
## EP.2 0.4784242 0.2379769

## perform multivariate t-test
mpe.t.test(X = X, Y = Y)

## 
##  Intersection-union t-test
## 
## data:  x and y
## t = 1.1734, df = 98, p-value = 0.2435
## alternative hypothesis:  true difference in means is larger than 0 for all endpoints 
## 97.5 percent confidence intervals:
##           0.025   1
## EP.1 -0.1934471 Inf
## EP.2 -0.2260308 Inf
## sample estimates:
##              X         Y
## EP.1 0.4448950 0.2131399
## EP.2 0.4784242 0.2379769

Multiple Imputation t-Test

Function mi.t.test can be used to compute a multiple imputation t-test by applying the approch of Rubin (Rubin 1987) in combination with the adjustment of Barnard and Rubin (Barnard and Rubin 1999).

## Generate some data
set.seed(123)
x <- rnorm(25, mean = 1)
x[sample(1:25, 5)] <- NA
y <- rnorm(20, mean = -1)
y[sample(1:20, 4)] <- NA
pair <- c(rnorm(25, mean = 1), rnorm(20, mean = -1))
g <- factor(c(rep("yes", 25), rep("no", 20)))
D <- data.frame(ID = 1:45, response = c(x, y), pair = pair, group = g)

## Use Amelia to impute missing values
library(Amelia)

## Loading required package: Rcpp

## ## 
## ## Amelia II: Multiple Imputation
## ## (Version 1.8.3, built: 2024-11-07)
## ## Copyright (C) 2005-2025 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##

res <- amelia(D, m = 10, p2s = 0, idvars = "ID", noms = "group")

## Per protocol analysis (Welch two-sample t-test)
t.test(response ~ group, data = D)

## 
##  Welch Two Sample t-test
## 
## data:  response by group
## t = -6.9214, df = 33.69, p-value = 5.903e-08
## alternative hypothesis: true difference in means between group no and group yes is not equal to 0
## 95 percent confidence interval:
##  -2.628749 -1.435125
## sample estimates:
##  mean in group no mean in group yes 
##        -1.0862469         0.9456901

## Intention to treat analysis (Multiple Imputation Welch two-sample t-test)
mi.t.test(res, x = "response", y = "group")

## 
##  Multiple Imputation Welch Two Sample t-test
## 
## data:  Variable response: group no vs group yes
## number of imputations: 10
## t = -6.6457, df = 25.142, p-value = 5.63e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.668011 -1.405862
## sample estimates:
##  mean (no)    SD (no) mean (yes)   SD (yes) 
## -1.0809584  0.9379070  0.9559786  1.0203219

## Per protocol analysis (Two-sample t-test)
t.test(response ~ group, data = D, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  response by group
## t = -6.8168, df = 34, p-value = 7.643e-08
## alternative hypothesis: true difference in means between group no and group yes is not equal to 0
## 95 percent confidence interval:
##  -2.637703 -1.426171
## sample estimates:
##  mean in group no mean in group yes 
##        -1.0862469         0.9456901

## Intention to treat analysis (Multiple Imputation two-sample t-test)
mi.t.test(res, x = "response", y = "group", var.equal = TRUE)

## 
##  Multiple Imputation Two Sample t-test
## 
## data:  Variable response: group no vs group yes
## number of imputations: 10
## t = -6.5736, df = 26.042, p-value = 5.656e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.673828 -1.400046
## sample estimates:
##  mean (no)    SD (no) mean (yes)   SD (yes) 
## -1.0809584  0.9379070  0.9559786  1.0203219

## Specifying alternatives
mi.t.test(res, x = "response", y = "group", alternative = "less")

## 
##  Multiple Imputation Welch Two Sample t-test
## 
## data:  Variable response: group no vs group yes
## number of imputations: 10
## t = -6.6457, df = 25.142, p-value = 2.815e-07
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##     -Inf -1.5135
## sample estimates:
##  mean (no)    SD (no) mean (yes)   SD (yes) 
## -1.0809584  0.9379070  0.9559786  1.0203219

mi.t.test(res, x = "response", y = "group", alternative = "greater")

## 
##  Multiple Imputation Welch Two Sample t-test
## 
## data:  Variable response: group no vs group yes
## number of imputations: 10
## t = -6.6457, df = 25.142, p-value = 1
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -2.560374       Inf
## sample estimates:
##  mean (no)    SD (no) mean (yes)   SD (yes) 
## -1.0809584  0.9379070  0.9559786  1.0203219

## One sample test
t.test(D$response[D$group == "yes"])

## 
##  One Sample t-test
## 
## data:  D$response[D$group == "yes"]
## t = 4.5054, df = 19, p-value = 0.0002422
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.5063618 1.3850184
## sample estimates:
## mean of x 
## 0.9456901

mi.t.test(res, x = "response", subset = D$group == "yes")

## 
##  Multiple Imputation One Sample t-test
## 
## data:  Variable response
## number of imputations: 10
## t = 4.6847, df = 18.494, p-value = 0.0001725
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.5280752 1.3838820
## sample estimates:
##      mean        SD 
## 0.9559786 1.0203219

mi.t.test(res, x = "response", mu = -1, subset = D$group == "yes",
          alternative = "less")

## 
##  Multiple Imputation One Sample t-test
## 
## data:  Variable response
## number of imputations: 10
## t = 9.5851, df = 18.494, p-value = 1
## alternative hypothesis: true mean is less than -1
## 95 percent confidence interval:
##      -Inf 1.309328
## sample estimates:
##      mean        SD 
## 0.9559786 1.0203219

mi.t.test(res, x = "response", mu = -1, subset = D$group == "yes",
          alternative = "greater")

## 
##  Multiple Imputation One Sample t-test
## 
## data:  Variable response
## number of imputations: 10
## t = 9.5851, df = 18.494, p-value = 6.655e-09
## alternative hypothesis: true mean is greater than -1
## 95 percent confidence interval:
##  0.6026297       Inf
## sample estimates:
##      mean        SD 
## 0.9559786 1.0203219

## paired test
t.test(D$response, D$pair, paired = TRUE)

## 
##  Paired t-test
## 
## data:  D$response and D$pair
## t = -1.3532, df = 35, p-value = 0.1847
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.6976921  0.1395993
## sample estimates:
## mean difference 
##      -0.2790464

mi.t.test(res, x = "response", y = "pair", paired = TRUE)

## 
##  Multiple Imputation Paired t-test
## 
## data:  Variables response and pair
## number of imputations: 10
## t = -1.0455, df = 39.974, p-value = 0.3021
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.5680741  0.1807237
## sample estimates:
## mean of difference   SD of difference 
##         -0.1936752          1.2426515

Function mi.t.test also works with package mice.

library(mice)

## 
## Attaching package: 'mice'

## The following object is masked from 'package:stats':
## 
##     filter

## The following objects are masked from 'package:base':
## 
##     cbind, rbind

res.mice <- mice(D, m = 10, print = FALSE)
mi.t.test(res.mice, x = "response", y = "group")

## 
##  Multiple Imputation Welch Two Sample t-test
## 
## data:  Variable response: group no vs group yes
## number of imputations: 10
## t = -7.2111, df = 29.735, p-value = 5.289e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.6351 -1.4716
## sample estimates:
##  mean (no)    SD (no) mean (yes)   SD (yes) 
## -1.0771373  0.8585436  0.9762127  1.0261920

Multiple Imputation Wilcoxon Tests

Function mi.wilcox.test can be used to compute multiple imputation Wilcoxon tests by applying the median p rule (MPR) approach; see Section 13.3 in (Heymans and Eekhout 2019).

## Per protocol analysis (Exact Wilcoxon rank sum test)
library(exactRankTests)

##  Package 'exactRankTests' is no longer under development.
##  Please consider using package 'coin' instead.

wilcox.exact(response ~ group, data = D, conf.int = TRUE)

## 
##  Exact Wilcoxon rank sum test
## 
## data:  response by group
## W = 12, p-value = 7.444e-08
## alternative hypothesis: true mu is not equal to 0
## 95 percent confidence interval:
##  -2.664671 -1.329111
## sample estimates:
## difference in location 
##              -1.989152

## Intention to treat analysis (Multiple Imputation Exact Wilcoxon rank sum test)
mi.wilcox.test(res, x = "response", y = "group")

## 
##  Multiple Imputation Exact Wilcoxon rank sum test
## 
## data:  Variable response: group no vs group yes
## number of imputations: 10
## W = 20, p-value = 1.712e-09
## alternative hypothesis: true mu is not equal to 0
## 95 percent confidence interval:
##  -2.489225 -1.317255
## sample estimates:
## difference in location 
##              -1.882414

## Specifying alternatives
mi.wilcox.test(res, x = "response", y = "group", alternative = "less")

## 
##  Multiple Imputation Exact Wilcoxon rank sum test
## 
## data:  Variable response: group no vs group yes
## number of imputations: 10
## W = 20, p-value = 8.562e-10
## alternative hypothesis: true mu is less than 0
## 95 percent confidence interval:
##       -Inf -1.424056
## sample estimates:
## difference in location 
##              -1.882414

mi.wilcox.test(res, x = "response", y = "group", alternative = "greater")

## 
##  Multiple Imputation Exact Wilcoxon rank sum test
## 
## data:  Variable response: group no vs group yes
## number of imputations: 10
## W = 20, p-value = 1
## alternative hypothesis: true mu is greater than 0
## 95 percent confidence interval:
##  -2.371565       Inf
## sample estimates:
## difference in location 
##              -1.882414

## One sample test
wilcox.exact(D$response[D$group == "yes"], conf.int = TRUE)

## 
##  Exact Wilcoxon signed rank test
## 
## data:  D$response[D$group == "yes"]
## V = 197, p-value = 0.0001678
## alternative hypothesis: true mu is not equal to 0
## 95 percent confidence interval:
##  0.4856837 1.3892099
## sample estimates:
## (pseudo)median 
##      0.9151479

mi.wilcox.test(res, x = "response", subset = D$group == "yes")

## 
##  Multiple Imputation Exact Wilcoxon signed rank test
## 
## data:  Variable response
## number of imputations: 10
## V = 306, p-value = 1.83e-05
## alternative hypothesis: true mu is not equal to 0
## 95 percent confidence interval:
##  0.5338009 1.3042666
## sample estimates:
## (pseudo)median 
##      0.9033367

mi.wilcox.test(res, x = "response", mu = -1, subset = D$group == "yes",
               alternative = "less")

## 
##  Multiple Imputation Exact Wilcoxon signed rank test
## 
## data:  Variable response
## number of imputations: 10
## V = 325, p-value = 1
## alternative hypothesis: true mu is less than -1
## 95 percent confidence interval:
##      -Inf 1.406019
## sample estimates:
## (pseudo)median 
##      0.9710047

mi.wilcox.test(res, x = "response", mu = -1, subset = D$group == "yes",
               alternative = "greater")

## 
##  Multiple Imputation Exact Wilcoxon signed rank test
## 
## data:  Variable response
## number of imputations: 10
## V = 325, p-value = 2.98e-08
## alternative hypothesis: true mu is greater than -1
## 95 percent confidence interval:
##  0.6708086       Inf
## sample estimates:
## (pseudo)median 
##      0.9710047

## paired test
wilcox.exact(D$response, D$pair, paired = TRUE, conf.int = TRUE)

## 
##  Exact Wilcoxon signed rank test
## 
## data:  D$response and D$pair
## V = 244, p-value = 0.1663
## alternative hypothesis: true mu is not equal to 0
## 95 percent confidence interval:
##  -0.6467313  0.1005468
## sample estimates:
## (pseudo)median 
##     -0.2795692

mi.wilcox.test(res, x = "response", y = "pair", paired = TRUE)

## 
##  Multiple Imputation Exact Wilcoxon signed rank test
## 
## data:  Variables response and pair
## number of imputations: 10
## V = 436, p-value = 0.3641
## alternative hypothesis: true mu is not equal to 0
## 95 percent confidence interval:
##  -0.5120206  0.2049652
## sample estimates:
## (pseudo)median 
##     -0.1683673

Function mi.wilcox.test also works with package mice.

mi.wilcox.test(res.mice, x = "response", y = "group")

## 
##  Multiple Imputation Exact Wilcoxon rank sum test
## 
## data:  Variable response: group no vs group yes
## number of imputations: 10
## W = 16, p-value = 5.3e-10
## alternative hypothesis: true mu is not equal to 0
## 95 percent confidence interval:
##  -2.533659 -1.452234
## sample estimates:
## difference in location 
##              -2.002101

Repeated Measures One-Way ANOVA

We provide a simple wrapper function that allows to compute a classical repeated measures one-way ANOVA as well as a respective mixed effects model. In addition, the non-parametric Friedman and Quade tests can be computed.

set.seed(123)
outcome <- c(rnorm(10), rnorm(10, mean = 1.5), rnorm(10, mean = 1))
timepoints <- factor(rep(1:3, each = 10))
patients <- factor(rep(1:10, times = 3))
rm.oneway.test(outcome, timepoints, patients)

## 
##  Repeated measures 1-way ANOVA
## 
## data:  outcome, timepoints and patients
## F = 6.5898, num df = 2, denom df = 18, p-value = 0.007122

rm.oneway.test(outcome, timepoints, patients, method = "lme")

## 
##  Mixed-effects 1-way ANOVA
## 
## data:  outcome, timepoints and patients
## F = 7.3674, num df = 2, denom df = 18, p-value = 0.004596

rm.oneway.test(outcome, timepoints, patients, method = "friedman")

## 
##  Friedman rank sum test
## 
## data:  outcome, timepoints and patients
## Friedman chi-squared = 12.8, df = 2, p-value = 0.001662

rm.oneway.test(outcome, timepoints, patients, method = "quade")

## 
##  Quade test
## 
## data:  outcome, timepoints and patients
## Quade F = 7.2906, num df = 2, denom df = 18, p-value = 0.004795

Volcano Plots

Volcano plots can be used to visualize the results when many tests have been applied. They show a measure of effect size in combination with (adjusted) p values.

## Generate some data
x <- matrix(rnorm(1000, mean = 10), nrow = 10)
g1 <- rep("control", 10)
y1 <- matrix(rnorm(500, mean = 11.75), nrow = 10)
y2 <- matrix(rnorm(500, mean = 9.75, sd = 3), nrow = 10)
g2 <- rep("treatment", 10)
group <- factor(c(g1, g2))
Data <- rbind(x, cbind(y1, y2))
## compute Hsu t-test
pvals <- apply(Data, 2, function(x, group) hsu.t.test(x ~ group)$p.value,
               group = group)
## compute log-fold change
logfc <- function(x, group){
  res <- tapply(x, group, mean)
  log2(res[1]/res[2])
}
lfcs <- apply(Data, 2, logfc, group = group)
volcano(lfcs, p.adjust(pvals, method = "fdr"), 
        effect.low = -0.25, effect.high = 0.25, 
        xlab = "log-fold change", ylab = "-log10(adj. p value)")

Bland-Altman plots

Bland-Altman plots are used to visually compare two methods of measurement. We can generate classical Bland-Altman plots (Bland and Altman 1986; Shieh 2018).

data("fingsys")
baplot(fingsys$fingsys, fingsys$armsys, 
       title = "Approximative Confidence Intervals", 
       type = "parametric", ci.type = "approximate")

## $`mean of differences`
## estimate    2.5 %   97.5 % 
## 4.295000 2.261102 6.328898 
## 
## $`lower LoA (2.5 %)`
##  estimate     2.5 %    97.5 % 
## -24.32967 -27.81137 -20.84797 
## 
## $`upper LoA (97.5 %)`
## estimate    2.5 %   97.5 % 
## 32.91967 29.43797 36.40137

baplot(fingsys$fingsys, fingsys$armsys, 
       title = "Exact Confidence Intervals", 
       type = "parametric", ci.type = "exact")

## Warning in qt(conf.alpha/2, df, zp * sqrt(n)): full precision may not have been
## achieved in 'pnt{final}'

## Warning in qt(1 - conf.alpha/2, df, zp * sqrt(n)): full precision may not have
## been achieved in 'pnt{final}'
## Warning in qt(1 - conf.alpha/2, df, zp * sqrt(n)): full precision may not have
## been achieved in 'pnt{final}'

## $`mean of differences`
## estimate    2.5 %   97.5 % 
## 4.295000 2.261102 6.328898 
## 
## $`lower LoA (2.5 %)`
##  estimate     2.5 %    97.5 % 
## -24.32967 -28.07305 -21.09776 
## 
## $`upper LoA (97.5 %)`
## estimate    2.5 %   97.5 % 
## 32.91967 29.68776 36.66305

baplot(fingsys$fingsys, fingsys$armsys, 
       title = "Bootstrap Confidence Intervals", 
       type = "parametric", ci.type = "boot", R = 999)

## $`mean of differences`
## estimate    2.5 %   97.5 % 
## 4.295000 2.219930 6.250611 
## 
## $`lower LoA (2.5 %)`
##  estimate     2.5 %    97.5 % 
## -24.32967 -29.53826 -20.48664 
## 
## $`upper LoA (97.5 %)`
## estimate    2.5 %   97.5 % 
## 32.91967 29.37463 37.50405

In the following nonparametric Bland-Altman plots, the median is used instead of the mean in combination with empirical quantiles for the lower and upper limit of agreement.

data("fingsys")
baplot(fingsys$fingsys, fingsys$armsys, 
       title = "Approximative Confidence Intervals", 
       type = "nonparametric", ci.type = "approximate")

## $`median of differences`
## estimate    2.5 %   97.5 % 
##      4.5      2.0      8.0 
## 
## $`lower LoA (2.5 %)`
## estimate.2.5%         2.5 %        97.5 % 
##       -24.025       -48.000       -20.000 
## 
## $`upper LoA (97.5 %)`
## estimate.97.5%          2.5 %         97.5 % 
##             34             28             60

baplot(fingsys$fingsys, fingsys$armsys, 
       title = "Exact Confidence Intervals", 
       type = "nonparametric", ci.type = "exact")

## $`median of differences`
## estimate    2.5 %   97.5 % 
##      4.5      2.0      8.0 
## 
## $`lower LoA (2.5 %)`
## estimate.2.5%         2.5 %        97.5 % 
##       -24.025       -41.000       -18.000 
## 
## $`upper LoA (97.5 %)`
## estimate.97.5%          2.5 %         97.5 % 
##             34             27             45

baplot(fingsys$fingsys, fingsys$armsys, 
       title = "Bootstrap Confidence Intervals", 
       type = "nonparametric", ci.type = "boot", R = 999)

## $`median of differences`
## estimate    2.5 %   97.5 % 
##      4.5      2.0      8.0 
## 
## $`lower LoA (2.5 %)`
## estimate.2.5%         2.5 %        97.5 % 
##       -24.025       -30.275       -18.000 
## 
## $`upper LoA (97.5 %)`
## estimate.97.5%          2.5 %         97.5 % 
##         34.000         27.025         38.175

Imputation of Standard Deviations for Changes from Baseline

The function imputeSD can be used to impute standard deviations for changes from baseline adopting the approach of the Cochrane handbook (Higgins et al. 2019, sec. 6.5.2.8).

SD1 <- c(0.149, 0.022, 0.036, 0.085, 0.125, NA, 0.139, 0.124, 0.038)
SD2 <- c(NA, 0.039, 0.038, 0.087, 0.125, NA, 0.135, 0.126, 0.038)
SDchange <- c(NA, NA, NA, 0.026, 0.058, NA, NA, NA, NA)
imputeSD(SD1, SD2, SDchange)

##     SD1   SD2 SDchange       Cor SDchange.min SDchange.mean SDchange.max
## 1 0.149 0.149       NA        NA   0.04491608    0.05829769   0.06913600
## 2 0.022 0.039       NA        NA   0.01915642    0.02050235   0.02176520
## 3 0.036 0.038       NA        NA   0.01132754    0.01460887   0.01727787
## 4 0.085 0.087    0.026 0.9545639   0.02600000    0.02600000   0.02600000
## 5 0.125 0.125    0.058 0.8923520   0.05800000    0.05800000   0.05800000
## 6    NA    NA       NA        NA           NA            NA           NA
## 7 0.139 0.135       NA        NA   0.04148755    0.05374591   0.06368696
## 8 0.124 0.126       NA        NA   0.03773311    0.04894677   0.05803262
## 9 0.038 0.038       NA        NA   0.01145511    0.01486787   0.01763200

Correlations can also be provided via argument corr. This option may particularly be useful, if no complete data is available.

SDchange2 <- rep(NA, 9)
imputeSD(SD1, SD2, SDchange2, corr = c(0.85, 0.9, 0.95))

##     SD1   SD2 SDchange Cor SDchange.min SDchange.mean SDchange.max
## 1 0.149 0.149       NA  NA   0.04711794    0.06663483   0.08161066
## 2 0.022 0.039       NA  NA   0.01935975    0.02146159   0.02337520
## 3 0.036 0.038       NA  NA   0.01186592    0.01666133   0.02035682
## 4 0.085 0.087       NA  NA   0.02726720    0.03850974   0.04714340
## 5 0.125 0.125       NA  NA   0.03952847    0.05590170   0.06846532
## 6    NA    NA       NA  NA           NA            NA           NA
## 7 0.139 0.135       NA  NA   0.04350287    0.06139218   0.07513654
## 8 0.124 0.126       NA  NA   0.03957777    0.05593568   0.06849234
## 9 0.038 0.038       NA  NA   0.01201666    0.01699412   0.02081346

Pairwise Comparisons

Function pairwise.fun enables the application of arbitrary functions for pairwise comparisons.

pairwise.wilcox.test(airquality$Ozone, airquality$Month, 
                     p.adjust.method = "none")

## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties

## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  airquality$Ozone and airquality$Month 
## 
##   5       6       7       8      
## 6 0.19250 -       -       -      
## 7 3e-05   0.01414 -       -      
## 8 0.00012 0.02591 0.86195 -      
## 9 0.11859 0.95887 0.00074 0.00325
## 
## P value adjustment method: none

## To avoid the warnings
pairwise.fun(airquality$Ozone, airquality$Month, 
             fun = function(x, y) wilcox.exact(x, y)$p.value)

##       5 vs 6       5 vs 7       5 vs 8       5 vs 9       6 vs 7       6 vs 8 
## 1.897186e-01 1.087205e-05 6.108735e-05 1.171796e-01 1.183753e-02 2.333564e-02 
##       6 vs 9       7 vs 8       7 vs 9       8 vs 9 
## 9.528659e-01 8.595683e-01 5.281796e-04 2.714694e-03

The function is also the basis for our functions pairwise.wilcox.exact and pairwise.ext.t.test, which in contrast to the standard functions pairwise.wilcox.test and pairwise.t.test generate a much more detailed output. In addition, pairwise.wilcox.exact is based on wilcox.exact instead of wilcox.test.

pairwise.wilcox.exact(airquality$Ozone, airquality$Month)

## 
##  Pairwise Exact Wilcoxon rank sum tests
## 
## data:  airquality$Ozone and airquality$Month
## alternative hypothesis: true location shift is not equal to 0
## 
##    groups     W diff. in location 2.5% 97.5%      p-value adj. p-value
## 1  5 vs 6 152.0               6.5   -5    18 1.897186e-01 0.5691559000
## 2  5 vs 7 566.5              37.5   21    51 1.087205e-05 0.0001087205
## 3  5 vs 8 548.5              31.5   14    53 6.108735e-05 0.0005497862
## 4  5 vs 9 470.0               5.5   -2    13 1.171796e-01 0.4687184000
## 5  6 vs 7 182.5              28.5    8    51 1.183753e-02 0.0710251900
## 6  6 vs 8 176.5              23.5    2    53 2.333564e-02 0.1166782000
## 7  6 vs 9 128.5              -0.5  -12    10 9.528659e-01 1.0000000000
## 8  7 vs 8 328.0              -2.5  -21    19 8.595683e-01 1.0000000000
## 9  7 vs 9 176.5             -30.5  -44   -13 5.281796e-04 0.0042254370
## 10 8 vs 9 202.0             -23.5  -46    -7 2.714694e-03 0.0190028600

Furthermore, function pairwise.ext.t.test also enables pairwise Student, Welch, Hsu, bootstrap and permutation t tests.

pairwise.t.test(airquality$Ozone, airquality$Month, pool.sd = FALSE)

## 
##  Pairwise comparisons using t tests with non-pooled SD 
## 
## data:  airquality$Ozone and airquality$Month 
## 
##   5       6       7       8      
## 6 1.00000 -       -       -      
## 7 0.00026 0.01527 -       -      
## 8 0.00195 0.02135 1.00000 -      
## 9 0.86321 1.00000 0.00589 0.01721
## 
## P value adjustment method: holm

pairwise.ext.t.test(airquality$Ozone, airquality$Month)

## 
##  Pairwise Welch Two Sample t-tests
## 
## data:  airquality$Ozone and airquality$Month
## alternative hypothesis: true difference in means is not equal to 0
## 
##    groups           t       df diff. in means       SE       2.5%     97.5%
## 1  5 vs 6  0.78010090 16.93764      5.8290598 7.472187  -9.940299  21.59842
## 2  5 vs 7  4.68198923 44.84296     35.5000000 7.582247  20.227090  50.77291
## 3  5 vs 8  4.07487966 39.27916     36.3461538 8.919565  18.308730  54.38358
## 4  5 vs 9  1.25274697 52.95697      7.8328912 6.252572  -4.708419  20.37420
## 5  6 vs 7  3.41859846 24.79227     29.6709402 8.679270  11.788050  47.55383
## 6  6 vs 8  3.09220573 29.98945     30.5170940 9.869037  10.361530  50.67265
## 7  6 vs 9  0.26556793 17.61281      2.0038314 7.545457 -13.873600  17.88127
## 8  7 vs 8  0.08501814 47.63564      0.8461538 9.952627 -19.168900  20.86121
## 9  7 vs 9 -3.61450643 46.58247    -27.6671088 7.654464 -43.069550 -12.26467
## 10 8 vs 9 -3.17483051 40.37577    -28.5132626 8.981035 -46.659350 -10.36718
##         p-value adj. p-value
## 1  4.460968e-01 1.0000000000
## 2  2.646679e-05 0.0002646679
## 3  2.168566e-04 0.0019517090
## 4  2.158016e-01 0.8632062000
## 5  2.181685e-03 0.0152717900
## 6  4.269318e-03 0.0213465900
## 7  7.936555e-01 1.0000000000
## 8  9.326033e-01 1.0000000000
## 9  7.361032e-04 0.0058888260
## 10 2.868290e-03 0.0172097400

pairwise.ext.t.test(airquality$Ozone, airquality$Month,
                    method = "hsu.t.test")

## 
##  Pairwise Hsu Two Sample t-tests
## 
## data:  airquality$Ozone and airquality$Month
## alternative hypothesis: true difference in means is not equal to 0
## 
##    groups           t df diff. in means       SE       2.5%     97.5%
## 1  5 vs 6  0.78010090  8      5.8290598 7.472187 -11.401830  23.05995
## 2  5 vs 7  4.68198923 25     35.5000000 7.582247  19.884070  51.11593
## 3  5 vs 8  4.07487966 25     36.3461538 8.919565  17.975970  54.71634
## 4  5 vs 9  1.25274697 25      7.8328912 6.252572  -5.044523  20.71031
## 5  6 vs 7  3.41859846  8     29.6709402 8.679270   9.656507  49.68537
## 6  6 vs 8  3.09220573  8     30.5170940 9.869037   7.759053  53.27514
## 7  6 vs 9  0.26556793  8      2.0038314 7.545457 -15.396020  19.40369
## 8  7 vs 8  0.08501814 25      0.8461538 9.952627 -19.651670  21.34397
## 9  7 vs 9 -3.61450643 25    -27.6671088 7.654464 -43.431770 -11.90245
## 10 8 vs 9 -3.17483051 25    -28.5132626 8.981035 -47.010050 -10.01648
##         p-value adj. p-value
## 1  0.4577885000  1.000000000
## 2  0.0000849598  0.000849598
## 3  0.0004086781  0.003678103
## 4  0.2218896000  0.887558600
## 5  0.0091067660  0.054640600
## 6  0.0148398900  0.074199430
## 7  0.7972873000  1.000000000
## 8  0.9329242000  1.000000000
## 9  0.0013234440  0.010587550
## 10 0.0039521140  0.027664800

pairwise.ext.t.test(airquality$Ozone, airquality$Month, 
                    method = "boot.t.test")

## 
##  Pairwise Bootstrap Welch Two Sample t-tests
## 
## data:  airquality$Ozone and airquality$Month
## alternative hypothesis: true difference in means is not equal to 0
## 
##    groups diff. in means       SE      2.5%     97.5%    p-value adj. p-value
## 1  5 vs 6      5.9575517 6.965299  -7.72265  20.56880 0.43324330   1.00000000
## 2  5 vs 7     35.4748436 7.389274  20.42308  50.07692 0.00020002   0.00200020
## 3  5 vs 8     36.3571088 8.668791  19.53846  53.65577 0.00040004   0.00360036
## 4  5 vs 9      7.8721992 6.025929  -4.40630  19.80564 0.23522350   0.94089410
## 5  6 vs 7     29.6692870 8.267833  13.05214  45.43248 0.00420042   0.02400240
## 6  6 vs 8     30.4539454 9.443246  11.50214  49.38932 0.00400040   0.02400240
## 7  6 vs 9      1.9831826 7.050480 -12.91341  15.68238 0.82248220   1.00000000
## 8  7 vs 8      0.8754337 9.705470 -17.46346  19.88654 0.92589260   1.00000000
## 9  7 vs 9    -27.7096602 7.465314 -42.40723 -12.95146 0.00060006   0.00480048
## 10 8 vs 9    -28.4483592 8.739157 -46.08362 -11.52062 0.00220022   0.01540154

pairwise.ext.t.test(airquality$Ozone, airquality$Month, 
                    method = "perm.t.test")

## 
##  Pairwise Permutation Welch Two Sample t-tests
## 
## data:  airquality$Ozone and airquality$Month
## alternative hypothesis: true difference in means is not equal to 0
## 
##    groups diff. in means        SE       2.5%     97.5%    p-value adj. p-value
## 1  5 vs 6      5.9171567  7.859164  -7.345085  23.62393 0.49304930   1.00000000
## 2  5 vs 7     35.4156570  9.003950  17.996150  52.92308 0.00010001   0.00100010
## 3  5 vs 8     36.2758122 10.191180  16.538460  56.15385 0.00010001   0.00100010
## 4  5 vs 9      7.7036415  6.314074  -4.539788  19.75066 0.22332230   0.89328930
## 5  6 vs 7     29.5471551 12.051430   4.897436  52.31197 0.00520052   0.03120312
## 6  6 vs 8     30.5021455 14.372630   1.205128  57.29487 0.00970097   0.04850485
## 7  6 vs 9      2.0813652  8.435290 -15.938700  17.11111 0.79797980   1.00000000
## 8  7 vs 8      0.8314293  9.854760 -18.461540  20.15385 0.93169320   1.00000000
## 9  7 vs 9    -27.7240971  8.365606 -43.954910 -11.27586 0.00100010   0.00800080
## 10 8 vs 9    -28.3760056  9.501787 -46.814320 -10.12334 0.00190019   0.01330133

sessionInfo

sessionInfo()

## R version 4.4.3 (2025-02-28)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] exactRankTests_0.8-35 mice_3.17.0           Amelia_1.8.3         
## [4] Rcpp_1.0.14           mvtnorm_1.3-3         MKinfer_1.2          
## [7] rmarkdown_2.29       
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6       shape_1.4.6.1      xfun_0.51          bslib_0.9.0       
##  [5] ggplot2_3.5.1      lattice_0.22-6     vctrs_0.6.5        tools_4.4.3       
##  [9] Rdpack_2.6.3       generics_0.1.3     tibble_3.2.1       pan_1.9           
## [13] MKdescr_0.9        pkgconfig_2.0.3    jomo_2.7-6         Matrix_1.7-3      
## [17] arrangements_1.1.9 lifecycle_1.0.4    farver_2.1.2       compiler_4.4.3    
## [21] munsell_0.5.1      mitools_2.4        codetools_0.2-20   htmltools_0.5.8.1 
## [25] sys_3.4.3          buildtools_1.0.0   miceadds_3.17-44   sass_0.4.9        
## [29] yaml_2.3.10        glmnet_4.1-8       gmp_0.7-5          pillar_1.10.1     
## [33] nloptr_2.2.1       jquerylib_0.1.4    tidyr_1.3.1        MASS_7.3-65       
## [37] cachem_1.1.0       reformulas_0.4.0   iterators_1.0.14   rpart_4.1.24      
## [41] boot_1.3-31        foreach_1.5.2      mitml_0.4-5        nlme_3.1-168      
## [45] tidyselect_1.2.1   digest_0.6.37      dplyr_1.1.4        purrr_1.0.4       
## [49] labeling_0.4.3     maketools_1.3.2    splines_4.4.3      fastmap_1.2.0     
## [53] grid_4.4.3         colorspace_2.1-1   cli_3.6.4          magrittr_2.0.3    
## [57] survival_3.8-3     broom_1.0.8        withr_3.0.2        foreign_0.8-90    
## [61] scales_1.3.0       backports_1.5.0    nnet_7.3-20        lme4_1.1-37       
## [65] evaluate_1.0.3     knitr_1.50         rbibutils_2.3      rlang_1.1.5       
## [69] glue_1.8.0         DBI_1.2.3          minqa_1.2.8        jsonlite_2.0.0    
## [73] R6_2.6.1

References

Altman, Douglas G., David Machin, Trevor N. Bryant, and Martin J. Gardner, eds. 2000. Statistics with Confidence: Confidence Intervals and Statistical Guidelines. BMJ Books.

Barnard, J, and D B Rubin. 1999. “Miscellanea. Small-Sample Degrees of Freedom with Multiple Imputation.” Biometrika 86 (4): 948–55. https://doi.org/10.1093/biomet/86.4.948.

Bland, J. M., and D. G. Altman. 1986. “Statistical methods for assessing agreement between two methods of clinical measurement.” Lancet 1 (8476): 307–10.

Botta-Dukát, Z. 2018. “Cautionary Note on Calculating Standardized Effect Size (SES) in Randomization Test.” Community Ecology 19 (1): 77–83. https://doi.org/10.1556/168.2018.19.1.8.

DasGupta, Anirban, T. Tony Cai, and Lawrence D. Brown. 2001. “Interval Estimation for a Binomial Proportion.” Statistical Science 16 (2): 101–33. https://doi.org/10.1214/ss/1009213286.

Davison, A. C., and D. V. Hinkley. 1997. Bootstrap Methods and Their Applications. Cambridge: Cambridge University Press. https://statwww.epfl.ch/davison/BMA/.

Efron, Bradley, and Robert J Tibshirani. 1993. An Introduction to the Bootstrap (Chapman & Hall/CRC Monographs on Statistics and Applied Probability). Chapman; Hall/CRC.

Gulhar, Monika, B M Golam Kibria, and Nasar Ahmed. 2012. “A Comparison of Some Confidence Intervals for Estimating the Population Coefficient of Variation: A Simulation Study.” SORT 36 (January): 45–68.

Hedderich, Jürgen, and Lothar Sachs. 2018. Angewandte Statistik: Methodensammlung Mit r. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-662-56657-2.

Heymans, Martin W., and Iris Eekhout, eds. 2019. Applied Missing Data Analysis With SPSS and (R)Studio. Self-publishing. https://bookdown.org/mwheymans/bookmi/.

Higgins, Julian P. T., James Thomas, Jacqueline Chandler, Miranda Cumpston, Tianjing Li, Matthew J. Page, and Vivian A. Welch, eds. 2019. Cochrane Handbook for Systematic Reviews of Interventions. Wiley. https://doi.org/10.1002/9781119536604.

Janssen, Arnold. 1997. “Studentized Permutation Tests for Non-i.i.d. Hypotheses and the Generalized Behrens-Fisher Problem.” Statistics & Probability Letters 36 (1): 9–21. https://doi.org/10.1016/s0167-7152(97)00043-6.

Newcombe, R. G. 1998a. “Interval Estimation for the Difference Between Independent Proportions: Comparison of Eleven Methods.” Statistics in Medicine 17 (8): 873–90. https://doi.org/10.1002/(sici)1097-0258(19980430)17:8<873::aid-sim779>3.0.co;2-i.

———. 1998b. “Improved confidence intervals for the difference between binomial proportions based on paired data.” Stat Med 17 (22): 2635–50.

Rubin, Donald B. 1987. Multiple Imputation for Nonresponse in Surveys (Wiley Series in Probability and Statistics). Wiley.

Shieh, G. 2018. “The appropriateness of Bland-Altman’s approximate confidence intervals for limits of agreement.” BMC Med Res Methodol 18 (1): 45.

Sozu, Takashi, Tomoyuki Sugimoto, Toshimitsu Hamasaki, and Scott R. Evans. 2015. Sample Size Determination in Clinical Trials with Multiple Endpoints. Springer Cham. https://doi.org/10.1007/978-3-319-22005-5.