Title: | Statistical Classification |
---|---|
Description: | Performance measures and scores for statistical classification such as accuracy, sensitivity, specificity, recall, similarity coefficients, AUC, GINI index, Brier score and many more. Calculation of optimal cut-offs and decision stumps (Iba and Langley (1991), <doi:10.1016/B978-1-55860-247-2.50035-8>) for all implemented performance measures. Hosmer-Lemeshow goodness of fit tests (Lemeshow and Hosmer (1982), <doi:10.1093/oxfordjournals.aje.a113284>; Hosmer et al (1997), <doi:10.1002/(SICI)1097-0258(19970515)16:9%3C965::AID-SIM509%3E3.0.CO;2-O>). Statistical and epidemiological risk measures such as relative risk, odds ratio, number needed to treat (Porta (2014), <doi:10.1093%2Facref%2F9780199976720.001.0001>). |
Authors: | Matthias Kohl [aut, cre] (0000-0001-9514-8910) |
Maintainer: | Matthias Kohl <[email protected]> |
License: | LGPL-3 |
Version: | 0.5 |
Built: | 2024-11-10 05:00:06 UTC |
Source: | https://github.com/stamats/mkclass |
library(MKclass)
Matthias Kohl https://www.stamats.de
Maintainer: Matthias Kohl [email protected]
The function computes AUC.
AUC(x, y, group, switchAUC = TRUE, na.rm = TRUE)
x | numeric vector. |
y | numeric vector. If missing, |
group | grouping vector or factor. |
switchAUC | logical value. Switch AUC; see Details section. |
na.rm | logical value, remove NA values. |
The function computes the area under the receiver operating characteristic curve (AUC under ROC curve).
If AUC < 0.5, a warning is printed and 1-AUC is returned. This behaviour can be suppressed by using switchAUC = FALSE.
The implementation uses the connection of AUC to the Wilcoxon rank sum test; see Hanley and McNeil (1982).
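The rank-sum connection can be illustrated with a short base R sketch. The helper `auc_rank()` is hypothetical and not the MKclass implementation; it assumes that `pos` names the group expected to have larger values and mimics the switching behaviour described above.

```r
# Hypothetical helper (not MKclass code): AUC from the Mann-Whitney U statistic,
# i.e. the Wilcoxon rank sum connection described by Hanley and McNeil (1982).
auc_rank <- function(x, group, pos, switchAUC = TRUE) {
  is_pos <- group == pos
  n1 <- sum(is_pos)                          # size of the positive group
  n0 <- sum(!is_pos)                         # size of the negative group
  r <- rank(x)                               # ties receive average ranks
  U <- sum(r[is_pos]) - n1 * (n1 + 1) / 2    # Mann-Whitney U of the positive group
  auc <- U / (n1 * n0)
  if (switchAUC && auc < 0.5) {
    warning("AUC < 0.5; returning 1 - AUC")
    auc <- 1 - auc
  }
  auc
}

x <- c(0.1, 0.4, 0.35, 0.8)
g <- c(0, 0, 1, 1)
auc_rank(x, g, pos = 1)
# The same value from the W statistic of base R's rank sum test
wilcox.test(x[g == 1], x[g == 0])$statistic / (2 * 2)
```

Ties are handled through average ranks, which corresponds to counting tied (positive, negative) pairs as one half.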
AUC value.
Matthias Kohl [email protected]
J. A. Hanley and B. J. McNeil (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29-36.
set.seed(13)
x <- rnorm(100) ## assumed as log2-data
g <- sample(1:2, 100, replace = TRUE)
AUC(x, group = g)
## avoid switching AUC
AUC(x, group = g, switchAUC = FALSE)
Performs tests for one and two AUCs.
AUC.test(pred1, lab1, pred2, lab2, conf.level = 0.95, paired = FALSE)
pred1 | numeric vector. |
lab1 | grouping vector or factor for pred1. |
pred2 | numeric vector. |
lab2 | grouping vector or factor for pred2. |
conf.level | confidence level of the interval. |
paired | not yet implemented. |
If pred2 and lab2 are missing, the AUC for pred1 and lab1 is tested using the Wilcoxon signed rank test; see wilcox.test.
If pred1 and lab1 as well as pred2 and lab2 are specified, the Hanley and McNeil test (cf. Hanley and McNeil (1982)) is computed.
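As a rough illustration, the standard error proposed by Hanley and McNeil (1982) and a z test for two AUCs from independent samples can be written in a few lines of base R. The helpers `hm_se()` and `auc_diff_test()` are hypothetical; we assume, but have not verified, that AUC.test uses this standard error.

```r
# Hanley and McNeil (1982) standard error of an AUC A estimated from
# n1 positive and n0 negative observations.
hm_se <- function(A, n1, n0) {
  Q1 <- A / (2 - A)
  Q2 <- 2 * A^2 / (1 + A)
  sqrt((A * (1 - A) + (n1 - 1) * (Q1 - A^2) + (n0 - 1) * (Q2 - A^2)) / (n1 * n0))
}

# Hypothetical helper: two-sided z test for two AUCs from independent samples
# (the unpaired case; paired comparisons would need a covariance term).
auc_diff_test <- function(A1, n1a, n0a, A2, n1b, n0b) {
  se <- sqrt(hm_se(A1, n1a, n0a)^2 + hm_se(A2, n1b, n0b)^2)
  z <- (A1 - A2) / se
  c(z = z, p.value = 2 * pnorm(-abs(z)))
}

hm_se(0.75, 50, 50)
auc_diff_test(0.80, 40, 60, 0.70, 45, 55)
```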
A list with AUC, SE and confidence interval as well as the corresponding test result.
Matthias Kohl [email protected]
J. A. Hanley and B. J. McNeil (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29-36.
set.seed(13)
x <- rnorm(100) ## assumed as log2-data
g <- sample(1:2, 100, replace = TRUE)
AUC.test(x, g)
y <- rnorm(100) ## assumed as log2-data
h <- sample(1:2, 100, replace = TRUE)
AUC.test(x, g, y, h)
The function computes the confusion matrix of a binary classification.
confMatrix(pred, pred.group, truth, namePos, cutoff = 0.5, relative = TRUE)
pred | numeric values that shall be used for classification; e.g. probabilities to belong to the positive group. |
pred.group | vector or factor including the predicted group. If missing, |
truth | true grouping vector or factor. |
namePos | value representing the positive group. |
cutoff | cutoff value used for classification. |
relative | logical: compute absolute and relative values (TRUE) or only absolute values (FALSE). |
The function computes the confusion matrix of a binary classification consisting of the number of true positive (TP), false negative (FN), false positive (FP) and true negative (TN) predictions.
In addition, their relative counterparts true positive rate (TPR), false negative rate (FNR), false positive rate (FPR) and true negative rate (TNR) can be computed.
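A minimal base R sketch of these counts and rates follows. The helper `conf_counts()` is hypothetical, and classifying values equal to `cutoff` as positive is an assumption, not documented behaviour of confMatrix.

```r
# Hypothetical helper (not confMatrix itself): counts and rates of a binary
# confusion matrix.
conf_counts <- function(pred, truth, namePos, cutoff = 0.5) {
  pred_pos <- pred >= cutoff        # assumption: values equal to the cutoff count as positive
  true_pos <- truth == namePos
  counts <- c(TP = sum(pred_pos & true_pos), FN = sum(!pred_pos & true_pos),
              FP = sum(pred_pos & !true_pos), TN = sum(!pred_pos & !true_pos))
  # rates are relative to the number of truly positive / truly negative cases
  rates <- c(TPR = counts[["TP"]] / sum(true_pos), FNR = counts[["FN"]] / sum(true_pos),
             FPR = counts[["FP"]] / sum(!true_pos), TNR = counts[["TN"]] / sum(!true_pos))
  list(counts = counts, rates = rates)
}

pred <- c(0.9, 0.2, 0.6, 0.4, 0.7)
truth <- c(1, 0, 0, 1, 1)
conf_counts(pred, truth, namePos = 1)
```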
matrix or list of matrices with respective numbers of true and false predictions.
Matthias Kohl [email protected]
Wikipedia contributors. (2019, July 18). Confusion matrix. In Wikipedia, The Free Encyclopedia. Retrieved 06:00, August 21, 2019, from https://en.wikipedia.org/w/index.php?title=Confusion_matrix&oldid=906886050
## example from dataset infert
fit <- glm(case ~ spontaneous+induced, data = infert, family = binomial())
pred <- predict(fit, type = "response")
## with group numbers
confMatrix(pred, truth = infert$case, namePos = 1)
## with group names
my.case <- factor(infert$case, labels = c("control", "case"))
confMatrix(pred, truth = my.case, namePos = "case")
## on the scale of the linear predictors
pred2 <- predict(fit)
confMatrix(pred2, truth = infert$case, namePos = 1, cutoff = 0)
## only absolute numbers
confMatrix(pred, truth = infert$case, namePos = 1, relative = FALSE)
The function computes a decision stump for binary classification, also known as a one-level decision tree or 1-rule.
decisionStump(pred, truth, namePos, perfMeasure = "YJS", MAX = TRUE, parallel = FALSE, ncores, delta = 0.01, ...)
pred | numeric values that shall be used for classification; e.g. probabilities to belong to the positive group. |
truth | true grouping vector or factor. |
namePos | value representing the positive group; i.e., the name of the category where one expects higher values for pred. |
perfMeasure | a single performance measure computed by function perfMeasures. |
MAX | logical value. Whether to maximize or minimize the performance measure. |
parallel | logical value. If |
ncores | integer value, number of cores that shall be used to parallelize the computations. |
delta | numeric value for setting up grid for optimization; start is minimum of |
... | further arguments passed to function |
The function can compute a decision stump for all performance measures implemented in function perfMeasures. For several of them, however, such as sensitivity or specificity, the computation is not really useful, since one obtains trivial decision rules.
In addition, a decision stump will only give a meaningful result if there is a monotone relationship between the two categories and the numeric values given in pred. In such a case the name of the category where one expects higher values should be given in namePos.
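The idea can be sketched as follows, assuming higher values of `pred` indicate the positive group and using Youden's J statistic (the default perfMeasure "YJS"). The helper `fit_stump()` and the choice of midpoints between neighbouring values as candidate thresholds are illustrative assumptions; decisionStump itself sets up its grid via `delta`.

```r
# Youden's J statistic for dichotomized predictions
youden <- function(pred_pos, true_pos) {
  sens <- sum(pred_pos & true_pos) / sum(true_pos)
  spec <- sum(!pred_pos & !true_pos) / sum(!true_pos)
  sens + spec - 1
}

# Hypothetical decision stump: one threshold, one rule (not MKclass code)
fit_stump <- function(pred, truth, namePos) {
  u <- sort(unique(pred))
  cand <- (head(u, -1) + tail(u, -1)) / 2      # midpoints between neighbouring values
  true_pos <- truth == namePos
  J <- vapply(cand, function(t) youden(pred >= t, true_pos), numeric(1))
  thr <- cand[which.max(J)]
  nameNeg <- setdiff(unique(truth), namePos)[1]
  # higher values are assumed to indicate the positive group (monotone relationship)
  list(threshold = thr, YJS = max(J),
       predict = function(newdata) ifelse(newdata >= thr, namePos, nameNeg))
}

pred <- c(0.1, 0.2, 0.3, 0.6, 0.7, 0.9)
truth <- c(0, 0, 0, 1, 1, 1)
stump <- fit_stump(pred, truth, namePos = 1)
stump$threshold
stump$predict(c(0.4, 0.5))
```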
Object of class decisionStump.
Matthias Kohl [email protected]
W. Iba and P. Langley (1992). Induction of One-Level Decision Trees. In: Machine Learning Proceedings 1992, pages 233-240. URL: https://doi.org/10.1016/B978-1-55860-247-2.50035-8
R.C. Holte (1993). Very simple classification rules perform well on most commonly used datasets. In: Machine Learning, pages 63-91. URL: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.67.2711
## example from dataset infert
fit <- glm(case ~ spontaneous+induced, data = infert, family = binomial())
pred <- predict(fit, type = "response")
res <- decisionStump(pred, truth = infert$case, namePos = 1)
predict(res, newdata = seq(from = 0, to = 1, by = 0.1))
The function computes Hosmer-Lemeshow goodness of fit tests for C and H statistic as well as the le Cessie-van Houwelingen-Copas-Hosmer unweighted sum of squares test for global goodness of fit.
HLgof.test(fit, obs, ngr = 10, X, verbose = FALSE)
fit | numeric vector with fitted probabilities. |
obs | numeric vector with observed values. |
ngr | number of groups for C and H statistic. |
X | covariate(s) for le Cessie-van Houwelingen-Copas-Hosmer global goodness of fit test. |
verbose | logical, print intermediate results. |
Hosmer-Lemeshow goodness of fit tests are computed; see Lemeshow and Hosmer (1982).
If X is specified, the le Cessie-van Houwelingen-Copas-Hosmer unweighted sum of squares test for global goodness of fit is additionally determined; see Hosmer et al. (1997). A more general version of this test is implemented in function residuals.lrm in package rms.
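For orientation, the C statistic (groups formed from quantiles of the fitted probabilities) can be sketched in base R. The helper `hl_C()` is hypothetical and handles ties in the quantiles only crudely by merging duplicate breaks; the degrees of freedom follow the usual choice of number of groups minus 2.

```r
# Hypothetical sketch of the Hosmer-Lemeshow C statistic (not HLgof.test itself)
hl_C <- function(fit, obs, ngr = 10) {
  brks <- unique(quantile(fit, probs = seq(0, 1, length.out = ngr + 1)))
  grp <- droplevels(cut(fit, breaks = brks, include.lowest = TRUE))
  O <- tapply(obs, grp, sum)       # observed events per group
  E <- tapply(fit, grp, sum)       # expected events per group
  n <- tapply(fit, grp, length)    # group sizes
  # (O - E)^2 / (n * pbar * (1 - pbar)) with E = n * pbar
  stat <- sum((O - E)^2 / (E * (1 - E / n)))
  df <- length(O) - 2
  c(statistic = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

set.seed(111)
x1 <- factor(sample(1:3, 50, replace = TRUE))
x2 <- rnorm(50)
obs <- sample(c(0, 1), 50, replace = TRUE)
fit <- glm(obs ~ x1 + x2, family = binomial)
hl_C(fitted(fit), obs)
```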
A list of test results.
Matthias Kohl [email protected]
S. Lemeshow and D.W. Hosmer (1982). A review of goodness of fit statistics for use in the development of logistic regression models. American Journal of Epidemiology, 115(1), 92-106.
D.W. Hosmer, T. Hosmer, S. le Cessie, S. Lemeshow (1997). A comparison of goodness-of-fit tests for the logistic regression model. Statistics in Medicine, 16, 965-980.
set.seed(111)
x1 <- factor(sample(1:3, 50, replace = TRUE))
x2 <- rnorm(50)
obs <- sample(c(0,1), 50, replace = TRUE)
fit <- glm(obs ~ x1+x2, family = binomial)
HLgof.test(fit = fitted(fit), obs = obs)
HLgof.test(fit = fitted(fit), obs = obs, X = model.matrix(obs ~ x1+x2))
The function computes the optimal cutoff for various performance measures for binary classification.
optCutoff(pred, truth, namePos, perfMeasure = "YJS", MAX = TRUE, parallel = FALSE, ncores, delta = 0.01, ...)
pred | numeric values that shall be used for classification; e.g. probabilities to belong to the positive group. |
truth | true grouping vector or factor. |
namePos | value representing the positive group. |
perfMeasure | a single performance measure computed by function perfMeasures. |
MAX | logical value. Whether to maximize or minimize the performance measure. |
parallel | logical value. If |
ncores | integer value, number of cores that shall be used to parallelize the computations. |
delta | numeric value for setting up grid for optimization; start is minimum of |
... | further arguments passed to function |
The function can compute the optimal cutoff for all performance measures implemented in function perfMeasures. For several of them, however, such as sensitivity or specificity, the computation is not really useful, since one obtains trivial cutoffs.
Optimal cutoff and value of the optimized performance measure based on a simple grid search.
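A simple grid search of this kind might look as follows. The helper `opt_cutoff()` is hypothetical, and the grid from `min(pred)` to `max(pred)` in steps of `delta` is an assumption (the description of `delta` above is incomplete). Any measure computed from the dichotomized predictions can be plugged in, and `MAX` selects maximization or minimization.

```r
# Hypothetical grid-search cutoff optimizer (not optCutoff itself).
# Assumption: the grid runs from min(pred) to max(pred) in steps of delta.
opt_cutoff <- function(pred, truth, namePos, measure, MAX = TRUE, delta = 0.01) {
  grid <- seq(min(pred), max(pred), by = delta)
  true_pos <- truth == namePos
  vals <- vapply(grid, function(t) measure(pred >= t, true_pos), numeric(1))
  best <- if (MAX) which.max(vals) else which.min(vals)   # first optimum on the grid
  c(cutoff = grid[best], value = vals[best])
}

# Two example measures: Youden's J (maximize) and error rate (minimize)
yjs <- function(pred_pos, true_pos) {
  sum(pred_pos & true_pos) / sum(true_pos) + sum(!pred_pos & !true_pos) / sum(!true_pos) - 1
}
er <- function(pred_pos, true_pos) mean(pred_pos != true_pos)

pred <- c(0.1, 0.2, 0.3, 0.6, 0.7, 0.9)
truth <- c(0, 0, 0, 1, 1, 1)
opt_cutoff(pred, truth, namePos = 1, measure = yjs)
opt_cutoff(pred, truth, namePos = 1, measure = er, MAX = FALSE)
```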
Matthias Kohl [email protected]
## example from dataset infert
fit <- glm(case ~ spontaneous+induced, data = infert, family = binomial())
pred <- predict(fit, type = "response")
optCutoff(pred, truth = infert$case, namePos = 1)
The function transforms a given odds-ratio (OR) to the respective relative risk (RR).
or2rr(or, p0, p1)
or | numeric vector: OR (odds-ratio). |
p0 | numeric vector of length 1: incidence of the outcome of interest in the nonexposed group. |
p1 | numeric vector of length 1: incidence of the outcome of interest in the exposed group. |
The function transforms a given odds-ratio (OR) to the respective relative risk (RR). It can also be used to transform the limits of confidence intervals.
The formulas can be derived by combining the formulas for RR and OR; see also Zhang and Yu (1998).
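Combining RR = p1/p0 with OR = [p1/(1 - p1)] / [p0/(1 - p0)] gives RR = OR / (1 - p0 + p0 * OR) when p0 is known and RR = p1 + (1 - p1) * OR when p1 is known. The hypothetical helper `or_to_rr()` below is a sketch of these two formulas, not the code of or2rr.

```r
# Hypothetical helper: convert an odds ratio into a relative risk
# given the incidence in the nonexposed (p0) or the exposed (p1) group.
or_to_rr <- function(or, p0, p1) {
  if (!missing(p0)) return(or / (1 - p0 + p0 * or))
  if (!missing(p1)) return(p1 + (1 - p1) * or)
  stop("either p0 or p1 must be specified")
}

or_to_rr(14.1, p0 = 0.05)
or_to_rr(c(14.1, 7.8, 27.5), p0 = 0.05)   # vectorized, e.g. for confidence limits
or_to_rr(14.1, p1 = 0.426)
```

Because the conversion is monotone in OR, applying it to the limits of a confidence interval for OR yields limits for RR.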
relative risk.
Matthias Kohl [email protected]
Zhang, J. and Yu, K. F. (1998). What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA, 280(19):1690-1691.
## We use data from Zhang and Yu (1998)
## OR to RR using OR and p0
or2rr(14.1, 0.05)
## compute p1
or2rr(14.1, 0.05)*0.05
## OR to RR using OR and p1
or2rr(14.1, p1 = 0.426)
## OR and 95% confidence interval
or2rr(c(14.1, 7.8, 27.5), 0.05)
## Logistic OR and 95% confidence interval
logisticOR <- rbind(c(14.1, 7.8, 27.5),
                    c(8.7, 5.5, 14.3),
                    c(27.4, 17.2, 45.8),
                    c(4.5, 2.7, 7.8),
                    c(0.25, 0.17, 0.37),
                    c(0.09, 0.05, 0.14))
colnames(logisticOR) <- c("OR", "2.5%", "97.5%")
rownames(logisticOR) <- c("7.4", "4.2", "3.0", "2.0", "0.37", "0.14")
logisticOR
## p0
p0 <- c(0.05, 0.12, 0.32, 0.27, 0.40, 0.40)
## Compute corrected RR
## helper function
or2rr.mat <- function(or, p0){
  res <- matrix(NA, nrow = nrow(or), ncol = ncol(or))
  for(i in seq_len(nrow(or)))
    res[i,] <- or2rr(or[i,], p0[i])
  dimnames(res) <- dimnames(or)
  res
}
RR <- or2rr.mat(logisticOR, p0)
round(RR, 2)
## Results are not completely identical to Zhang and Yu (1998),
## which is probably caused by the fact that the logistic OR values
## provided in the table are rounded and are not exact values.
The function computes pairwise AUCs.
pairwise.auc(x, g)
x | numeric vector. |
g | grouping vector or factor. |
The function computes pairwise areas under the receiver operating characteristic curves (AUC under ROC curves) using function AUC. The implementation is in certain aspects analogous to pairwise.t.test.
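The pairwise computation can be sketched with combn(). The helper `pairwise_auc()` is hypothetical: for each pair of levels it returns the AUC for the second level's values against the first level's values, without the switching that AUC applies by default.

```r
# Hypothetical helper (not pairwise.auc itself): rank-based AUC for every pair of levels
pairwise_auc <- function(x, g) {
  g <- factor(g)
  prs <- combn(levels(g), 2)                   # one column per pair of levels
  res <- apply(prs, 2, function(p) {
    xa <- x[g == p[1]]
    xb <- x[g == p[2]]
    r <- rank(c(xb, xa))                       # average ranks for ties
    nb <- length(xb)
    (sum(r[seq_len(nb)]) - nb * (nb + 1) / 2) / (length(xa) * nb)
  })
  names(res) <- paste(prs[1, ], "vs.", prs[2, ])
  res
}

pairwise_auc(c(1, 2, 3, 4, 5, 6), c("a", "a", "b", "b", "c", "c"))
```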
Vector with pairwise AUCs.
Matthias Kohl [email protected]
set.seed(13)
x <- rnorm(100)
g <- factor(sample(1:4, 100, replace = TRUE))
levels(g) <- c("a", "b", "c", "d")
pairwise.auc(x, g)
The function computes various performance measures for binary classification.
perfMeasures(pred, pred.group, truth, namePos, cutoff = 0.5, weight = 0.5, wACC = weight, wLR = weight, wPV = weight, beta = 1, measures = "all")
pred | numeric values that shall be used for classification; e.g. probabilities to belong to the positive group. |
pred.group | vector or factor including the predicted group. If missing, |
truth | true grouping vector or factor. |
namePos | value representing the positive group. |
cutoff | cutoff value used for classification. |
weight | weight used for computing weighted values. Must be in [0,1]. |
wACC | weight used for computing the weighted accuracy, where sensitivity is multiplied by |
wLR | weight used for computing the weighted likelihood ratio, where PLR is multiplied by |
wPV | weight used for computing the weighted predictive value, where PPV is multiplied by |
beta | beta coefficient used for computing the F beta score. Must be nonnegative. |
measures | character vector giving the measures that shall be computed; see details. Default |
The function perfMeasures can be used to compute various performance measures. For computing specific measures, the abbreviations given in parentheses have to be specified in argument measures. Single measures can also be computed by respective functions, whose names are identical to the abbreviations given in parentheses.
The measures are: accuracy (ACC), probability of correct classification (PCC), fraction correct (FC), simple matching coefficient (SMC), Rand (similarity) index (RSI), probability of misclassification (PMC), error rate (ER), fraction incorrect (FIC), sensitivity (SENS), recall (REC), true positive rate (TPR), probability of detection (PD), hit rate (HR), specificity (SPEC), true negative rate (TNR), selectivity (SEL), detection rate (DR), false positive rate (FPR), fall-out (FO), false alarm (rate) (FAR), probability of false alarm (PFA), false negative rate (FNR), miss rate (MR), false discovery rate (FDR), false omission rate (FOR), prevalence (PREV), (positive) pre-test probability (PREP), (positive) pre-test odds (PREO), detection prevalence (DPREV), negative pre-test probability (NPREP), negative pre-test odds (NPREO), no information rate (NIR), weighted accuracy (WACC), balanced accuracy (BACC), (bookmaker) informedness (INF), Youden's J statistic (YJS), deltap' (DPp), positive likelihood ratio (PLR), negative likelihood ratio (NLR), weighted likelihood ratio (WLR), balanced likelihood ratio (BLR), diagnostic odds ratio (DOR), positive predictive value (PPV), precision (PREC), (positive) post-test probability (POSTP), (positive) post-test odds (POSTO), Bayes factor G1 (BFG1), negative predictive value (NPV), negative post-test probability (NPOSTP), negative post-test odds (NPOSTO), Bayes factor G0 (BFG0), markedness (MARK), deltap (DP), weighted predictive value (WPV), balanced predictive value (BPV), F1 score (F1S), Dice similarity coefficient (DSC), F beta score (FBS), Jaccard similarity coefficient (JSC), threat score (TS), critical success index (CSI), Matthews' correlation coefficient (MCC), Pearson's correlation (r phi) (RPHI), Phi coefficient (PHIC), Cramer's V (CRV), proportion of positive predictions (PPP), expected accuracy (EACC), Cohen's kappa coefficient (CKC), mutual information in bits (MI2), joint entropy in bits (JE2), variation of information in bits (VI2), Jaccard distance (JD), information quality ratio (INFQR), uncertainty coefficient (UC), entropy coefficient (EC), proficiency (metric) (PROF), deficiency (metric) (DFM), redundancy (RED), symmetric uncertainty (SU), normalized uncertainty (NU).
These performance measures have in common that they require a dichotomization of the computed predictions (classification function). For measuring the performance without dichotomization one can apply function perfScores.
The prevalence used is the prevalence observed in the data, which often differs from the prevalence in the population. Hence, it might be better to compute PPV and NPV (and derived measures) by applying function predValues, where one can specify the assumed prevalence. This holds in general for all measures that depend on the prevalence.
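To make a few of the abbreviations concrete, the following sketch computes some measures directly from the four confusion-matrix counts using their standard definitions. The helper `some_measures()` is hypothetical and not part of MKclass.

```r
# Hypothetical helper: a handful of the listed measures from TP, FN, FP, TN
some_measures <- function(TP, FN, FP, TN) {
  SENS <- TP / (TP + FN)
  SPEC <- TN / (TN + FP)
  c(ACC  = (TP + TN) / (TP + FN + FP + TN),
    SENS = SENS,
    SPEC = SPEC,
    PPV  = TP / (TP + FP),
    NPV  = TN / (TN + FN),
    BACC = (SENS + SPEC) / 2,
    YJS  = SENS + SPEC - 1,
    F1S  = 2 * TP / (2 * TP + FP + FN),
    MCC  = (TP * TN - FP * FN) /
      sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)))
}

round(some_measures(TP = 2, FN = 1, FP = 1, TN = 1), 3)
```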
data.frame with names of the performance measures and their respective values.
Matthias Kohl [email protected]
K.H. Brodersen, C.S. Ong, K.E. Stephan, J.M. Buhmann (2010). The balanced accuracy and its posterior distribution. In Pattern Recognition (ICPR), 20th International Conference on, 3121-3124 (IEEE, 2010).
J.A. Cohen (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37-46.
T. Fawcett (2006). An introduction to ROC analysis. Pattern Recognition Letters 27, 861-874.
T.A. Gerds, T. Cai, M. Schumacher (2008). The performance of risk prediction models. Biom J 50, 457-479.
D. Hand, R. Till (2001). A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45, 171-186.
J. Hernandez-Orallo, P.A. Flach, C. Ferri (2012). A unified view of performance metrics: Translating threshold choice into expected classification loss. J. Mach. Learn. Res. 13, 2813-2869.
B.W. Matthews (1975). Comparison of the predicted and observed secondary structure of t4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure 405, 442-451.
D.M. Powers (2011). Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness and Correlation. Journal of Machine Learning Technologies 1, 37-63.
N.A. Smits (2010). A note on Youden's J and its cost ratio. BMC Medical Research Methodology 10, 89.
B. Wallace, I. Dahabreh (2012). Class probability estimates are unreliable for imbalanced data (and how to fix them). In Data Mining (ICDM), IEEE 12th International Conference on, 695-704.
J.W. Youden (1950). Index for rating diagnostic tests. Cancer 3, 32-35.
confMatrix, predValues, perfScores
## example from dataset infert
fit <- glm(case ~ spontaneous+induced, data = infert, family = binomial())
pred <- predict(fit, type = "response")
## with group numbers
perfMeasures(pred, truth = infert$case, namePos = 1)
## with group names
my.case <- factor(infert$case, labels = c("control", "case"))
perfMeasures(pred, truth = my.case, namePos = "case")
## on the scale of the linear predictors
pred2 <- predict(fit)
perfMeasures(pred2, truth = infert$case, namePos = 1, cutoff = 0)
## using weights
perfMeasures(pred, truth = infert$case, namePos = 1, weight = 0.3)
## selecting a subset of measures
perfMeasures(pred, truth = infert$case, namePos = 1,
             measures = c("SENS", "SPEC", "BACC", "YJS"))
The function computes various performance scores for binary classification.
perfScores(pred, truth, namePos, wBS = 0.5, scores = "all", transform = FALSE)
pred | numeric values that shall be used for classification; e.g. probabilities to belong to the positive group. |
truth | true grouping vector or factor. |
namePos | value representing the positive group. |
wBS | weight used for computing the weighted Brier score (BS), where positive BS is multiplied by |
scores | character vector giving the scores that shall be computed; see details. Default |
transform | logical value indicating whether the values in |
The function perfScores can be used to compute various performance scores. For computing specific scores, the abbreviations given in parentheses have to be specified in argument scores. Single scores can also be computed by respective functions, whose names are identical to the abbreviations given in parentheses.
The available scores are: area under the ROC curve (AUC), Gini index (GINI), Brier score (BS), positive Brier score (PBS), negative Brier score (NBS), weighted Brier score (WBS), balanced Brier score (BBS), Brier skill score (BSS).
If the predictions (pred) are not in the interval [0,1], the various Brier scores are not valid. By setting argument transform to TRUE, a simple logistic regression model is fit to the provided data and the predicted values are used for the computations.
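The Brier-type scores can be sketched as below. The helper `brier_scores()` is hypothetical; it assumes that the positive Brier score is multiplied by `wBS` and the negative one by `1 - wBS`, and that the Brier skill score uses the observed prevalence as the reference forecast. These are common definitions, not verified against perfScores.

```r
# Hypothetical helper (not perfScores itself): Brier-type scores for 0/1 outcomes
brier_scores <- function(pred, truth, namePos, wBS = 0.5, transform = FALSE) {
  y <- as.numeric(truth == namePos)
  # optional mapping to [0, 1] via a simple logistic regression on pred
  if (transform) pred <- fitted(glm(y ~ pred, family = binomial()))
  PBS <- mean((1 - pred[y == 1])^2)   # Brier score among positives
  NBS <- mean(pred[y == 0]^2)         # Brier score among negatives
  BS <- mean((pred - y)^2)
  prev <- mean(y)
  c(BS = BS, PBS = PBS, NBS = NBS,
    WBS = wBS * PBS + (1 - wBS) * NBS,
    BBS = (PBS + NBS) / 2,
    BSS = 1 - BS / (prev * (1 - prev)))  # reference: constant forecast prev
}

brier_scores(c(0.9, 0.2, 0.6, 0.4, 0.7), c(1, 0, 0, 1, 1), namePos = 1)
```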
data.frame with names of the scores and their respective values.
Matthias Kohl [email protected]
G.W. Brier (1950). Verification of forecasts expressed in terms of probability. Mon. Wea. Rev. 78, 1-3.
T. Fawcett (2006). An introduction to ROC analysis. Pattern Recognition Letters 27, 861-874.
T.A. Gerds, T. Cai, M. Schumacher (2008). The performance of risk prediction models. Biom J 50, 457-479.
D. Hand, R. Till (2001). A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45, 171-186.
J. Hernandez-Orallo, P.A. Flach, C. Ferri (2011). Brier curves: a new cost-based visualisation of classifier performance. In L. Getoor and T. Scheffer (eds.) Proceedings of the 28th International Conference on Machine Learning (ICML-11), 585-592 (ACM, New York, NY, USA).
J. Hernandez-Orallo, P.A. Flach, C. Ferri (2012). A unified view of performance metrics: Translating threshold choice into expected classification loss. J. Mach. Learn. Res. 13, 2813-2869.
B.W. Matthews (1975). Comparison of the predicted and observed secondary structure of t4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure 405, 442-451.
## example from dataset infert
fit <- glm(case ~ spontaneous+induced, data = infert, family = binomial())
pred <- predict(fit, type = "response")
## with group numbers
perfScores(pred, truth = infert$case, namePos = 1)
## with group names
my.case <- factor(infert$case, labels = c("control", "case"))
perfScores(pred, truth = my.case, namePos = "case")
## on the scale of the linear predictors
pred2 <- predict(fit)
perfScores(pred2, truth = infert$case, namePos = 1)
## using weights
perfScores(pred, truth = infert$case, namePos = 1, wBS = 0.3)
The function computes the positive (PPV) and negative predictive value (NPV) given sensitivity, specificity and prevalence (pre-test probability).
predValues(sens, spec, prev)
sens | numeric vector: sensitivities. |
spec | numeric vector: specificities. |
prev | numeric vector: prevalence. |
The function computes the positive (PPV) and negative predictive value (NPV) given sensitivity, specificity and prevalence (pre-test probability).
It is a simple application of Bayes' formula.
One can also specify vectors of length larger than 1 for sensitivity and specificity.
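Bayes' formula gives PPV = sens * prev / (sens * prev + (1 - spec) * (1 - prev)) and NPV = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev). A hypothetical base R helper `pred_values()` implementing them is sketched below; with the ELISA numbers from the example, PPV is only 1/3 despite sensitivity and specificity above 99.8%, because the prevalence is very low.

```r
# Hypothetical helper (not predValues itself): predictive values via Bayes' formula
pred_values <- function(sens, spec, prev) {
  PPV <- sens * prev / (sens * prev + (1 - spec) * (1 - prev))
  NPV <- spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
  cbind(PPV = PPV, NPV = NPV)   # one row per sensitivity/specificity pair
}

pred_values(sens = 0.999, spec = 0.998, prev = 0.001)
```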
Vector or matrix with PPV and NPV.
Matthias Kohl [email protected]
## Example: HIV test
## 1. ELISA screening test (4th generation)
predValues(sens = 0.999, spec = 0.998, prev = 0.001)
## 2. Western blot confirmation test
predValues(sens = 0.998, spec = 0.999996, prev = 1/3)
## Example: connection between sensitivity, specificity and PPV
sens <- seq(0.6, 0.99, by = 0.01)
spec <- seq(0.6, 0.99, by = 0.01)
ppv <- function(sens, spec, pre) predValues(sens, spec, pre)[,1]
res <- outer(sens, spec, ppv, pre = 0.1)
image(sens, spec, res, col = terrain.colors(256), main = "PPV for prevalence = 10%",
      xlim = c(0.59, 1), ylim = c(0.59, 1))
contour(sens, spec, res, add = TRUE)
The function computes relative risk (RR), odds ratio (OR), and several other risk measures; see details.
risks(p0, p1)
p0 | numeric vector of length 1: incidence of the outcome of interest in the nonexposed group. |
p1 | numeric vector of length 1: incidence of the outcome of interest in the exposed group. |
The function computes relative risk (RR), odds ratio (OR), relative risk reduction (RRR) or relative risk increase (RRI), absolute risk reduction (ARR) or absolute risk increase (ARI), and number needed to treat (NNT) or number needed to harm (NNH).
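The measures follow from the two incidences with textbook definitions, sketched below in the hypothetical helper `risk_measures()`. We assume that risks() reports the reduction-type measures when p1 does not exceed p0 and the increase-type measures otherwise.

```r
# Hypothetical helper (not risks() itself): epidemiological risk measures
risk_measures <- function(p0, p1) {
  out <- c(RR = p1 / p0, OR = (p1 / (1 - p1)) / (p0 / (1 - p0)))
  if (p1 <= p0) {
    # exposure reduces the risk: reduction-type measures and NNT
    c(out, RRR = (p0 - p1) / p0, ARR = p0 - p1, NNT = 1 / (p0 - p1))
  } else {
    # exposure increases the risk: increase-type measures and NNH
    c(out, RRI = (p1 - p0) / p0, ARI = p1 - p0, NNH = 1 / (p1 - p0))
  }
}

risk_measures(p0 = 0.4, p1 = 0.1)
risk_measures(p0 = 0.4, p1 = 0.5)
```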
Vector including several risk measures.
Matthias Kohl [email protected]
Porta, M. (2014). A Dictionary of Epidemiology. Oxford University Press. Retrieved 3 Oct. 2020, from https://www.oxfordreference.com/view/10.1093/acref/9780199976720.001.0001/acref-9780199976720
## See worked example in Wikipedia
risks(p0 = 0.4, p1 = 0.1)
risks(p0 = 0.4, p1 = 0.5)
The function computes an approximate confidence interval for the relative risk (RR).
rrCI(a, b, c, d, conf.level = 0.95)
a | integer: events in exposed group. |
b | integer: non-events in exposed group. |
c | integer: events in non-exposed group. |
d | integer: non-events in non-exposed group. |
conf.level | numeric: confidence level. |
The function computes an approximate confidence interval for the relative risk (RR) based on the normal approximation; see Jewell (2004).
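A common form of this normal approximation works on the log scale with SE(log RR) = sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d)). The hypothetical helper `rr_ci()` below sketches it; we assume, without having checked the package source, that rrCI uses the same variance formula. For the first Wikipedia example it gives RR = 0.25 with a 95% interval of roughly 0.151 to 0.414.

```r
# Hypothetical helper (not rrCI itself): Wald-type CI for RR on the log scale
rr_ci <- function(a, b, c, d, conf.level = 0.95) {
  rr <- (a / (a + b)) / (c / (c + d))
  se <- sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))   # SE of log(RR)
  z <- qnorm(1 - (1 - conf.level) / 2)
  # the numeric argument `c` does not hide base::c(), since R looks up functions separately
  list(estimate = rr, conf.int = exp(log(rr) + c(-1, 1) * z * se))
}

rr_ci(a = 15, b = 135, c = 100, d = 150)
```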
A list with class "confint" containing the following components:
estimate | the estimated relative risk. |
conf.int | a confidence interval for the relative risk. |
Matthias Kohl [email protected]
Jewell, Nicholas P. (2004). Statistics for epidemiology. Chapman & Hall/CRC.
Relative risk. (2016, November 4). In Wikipedia, The Free Encyclopedia. Retrieved 19:58, November 4, 2016, from https://en.wikipedia.org/w/index.php?title=Relative_risk&oldid=747857409
## See worked example in Wikipedia
rrCI(a = 15, b = 135, c = 100, d = 150)
rrCI(a = 75, b = 75, c = 100, d = 150)