Title: | Omics Data Analysis |
---|---|
Description: | Similarity plots based on correlation and median absolute deviation (MAD); adjusting colors for heatmaps; aggregate technical replicates; calculate pairwise fold-changes and log fold-changes; compute one- and two-way ANOVA; simplified interface to package 'limma' (Ritchie et al. (2015), <doi:10.1093/nar/gkv007> ) for moderated t-test and one-way ANOVA; Hamming and Levenshtein (edit) distance of strings as well as optimal alignment scores for global (Needleman-Wunsch) and local (Smith-Waterman) alignments with constant gap penalties (Merkl and Waack (2009), ISBN:978-3-527-32594-8). |
Authors: | Matthias Kohl [aut, cre] (0000-0001-9514-8910) |
Maintainer: | Matthias Kohl <[email protected]> |
License: | LGPL-3 |
Version: | 0.8 |
Built: | 2024-11-09 05:38:51 UTC |
Source: | https://github.com/stamats/mkomics |
Similarity plots based on correlation and median absolute deviation (MAD); adjusting colors for heatmaps; aggregate technical replicates; calculate pairwise fold-changes and log fold-changes; computate one- and two-way ANOVA; simplified interface to package 'limma' (Ritchie et al. (2015), <doi:10.1093/nar/gkv007>) for moderated t-test and one-way ANOVA; Hamming and Levenshtein (edit) distance of strings as well as optimal alignment scores for global (Needleman-Wunsch) and local (Smith-Waterman) alignments with constant gap penalties (Merkl and Waack (2009), ISBN:978-3-527-32594-8).
Package: | MKomics |
Type: | Package |
Version: | 0.8 |
Date: | 2022-09-06 |
Depends: | R(>= 3.5.0) |
Imports: | stats, utils, graphics, grDevices, RColorBrewer, robustbase, limma, grid, circlize, ComplexHeatmap |
Suggests: | knitr, rmarkdown |
License: | LGPL-3 |
URL: | https://www.stamats.de/ |
library(MKomics)
Matthias Kohl https://www.stamats.de
Maintainer: Matthias Kohl [email protected]
The function computes and returns the correlation and absolute correlation distance matrix computed by using the specified distance measure to compute the distances between the rows of a data matrix.
corDist(x, method = "pearson", diag = FALSE, upper = FALSE, abs = FALSE, use = "pairwise.complete.obs", ...)
corDist(x, method = "pearson", diag = FALSE, upper = FALSE, abs = FALSE, use = "pairwise.complete.obs", ...)
x |
a numeric matrix or data frame |
method |
the correlation distance measure to be used. This must be one of
|
diag |
logical value indicating whether the diagonal of the distance matrix should be printed by 'print.dist'. |
upper |
logical value indicating whether the upper triangle of the distance matrix should be printed by 'print.dist'. |
abs |
logical, compute absolute correlation distances |
use |
character, correponds to argument |
... |
further arguments to functions |
The function computes the Pearson, Spearman, Kendall or Cosine sample correlation
and absolute correlation; confer Section 12.2.2 of Gentleman et al (2005). For more
details about the arguments we refer to functions dist
and
cor
.
Moreover, the function computes the minimum covariance determinant or the
orthogonalized Gnanadesikan-Kettenring estimator. For more details we refer to
functions covMcd
and covOGK
,
respectively.
corDist
returns an object of class "dist"
; cf. dist
.
A first version of this function appeared in package SLmisc.
Matthias Kohl [email protected]
Gentleman R. Ding B., Dudoit S. and Ibrahim J. (2005). Distance Measures in DNA Microarray Data Analysis. In: Gentleman R., Carey V.J., Huber W., Irizarry R.A. and Dudoit S. (editors) Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer.
P. J. Rousseeuw and A. M. Leroy (1987). Robust Regression and Outlier Detection. Wiley.
P. J. Rousseeuw and K. van Driessen (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212-223.
Pison, G., Van Aelst, S., and Willems, G. (2002), Small Sample Corrections for LTS and MCD, Metrika, 55, 111-123.
Maronna, R.A. and Zamar, R.H. (2002). Robust estimates of location and dispersion of high-dimensional datasets; Technometrics 44(4), 307-317.
Gnanadesikan, R. and John R. Kettenring (1972). Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28, 81-124.
## only a dummy example M <- matrix(rnorm(1000), ncol = 20) D <- corDist(M)
## only a dummy example M <- matrix(rnorm(1000), ncol = 20) D <- corDist(M)
Plot of similarity matrix. This function is a slight modification of function
plot.cor
of the archived package "sma"
.
corPlot(x, new = FALSE, col, minCor, labels = FALSE, lab.both.axes = FALSE, labcols = "black", title = "", cex.title = 1.2, protocol = FALSE, cex.axis = 0.8, cex.axis.bar = 1, signifBar = 2, ...) corPlot2(x, new = FALSE, col, minCor = 0.5, labels = FALSE, row.width = 6, column.height = 6, lab.both.axes = TRUE, fontsize.axis = 12, title = "", fontsize.title = 16, signifBar = 2, use_raster = FALSE)
corPlot(x, new = FALSE, col, minCor, labels = FALSE, lab.both.axes = FALSE, labcols = "black", title = "", cex.title = 1.2, protocol = FALSE, cex.axis = 0.8, cex.axis.bar = 1, signifBar = 2, ...) corPlot2(x, new = FALSE, col, minCor = 0.5, labels = FALSE, row.width = 6, column.height = 6, lab.both.axes = TRUE, fontsize.axis = 12, title = "", fontsize.title = 16, signifBar = 2, use_raster = FALSE)
x |
data or correlation matrix, respectively |
new |
If |
col |
colors palette for image. If missing, the |
minCor |
numeric value in [-1,1], used to adjust |
labels |
vector of character strings to be placed at the tickpoints,
labels for the columns of |
lab.both.axes |
logical, display labels on both axes |
labcols |
colors to be used for the labels of the columns of |
title |
character string, overall title for the plot. |
cex.title |
numerical value giving the amount by which plotting text
and symbols should be magnified relative to the default;
cf. |
fontsize.title |
numerical value giving the fontsize of the title. |
protocol |
logical, display color bar without numbers. |
cex.axis |
The magnification to be used for axis annotation relative to the
current setting of 'cex'; cf. |
fontsize.axis |
numerical value giving the fontsize of the axis labels. |
cex.axis.bar |
The magnification to be used for axis annotation of the color
bar relative to the current setting of 'cex'; cf.
|
signifBar |
integer indicating the precision to be used for the bar. |
row.width |
numerical value giving width of the row in centimeters; i.e., can be used to change space available for the labels. |
column.height |
numerical value giving the height of the column in centimeters; i.e., can be used to change space available for the labels. |
use_raster |
see |
... |
graphical parameters may also be supplied as arguments to the
function (see |
This functions generates the so called similarity matrix (based on correlation) for a microarray experiment.
If min(x)
, respectively min(cor(x))
is smaller than minCor
,
the colors in col
are adjusted such that the minimum correlation value
which is color coded is equal to minCor
.
The newer function corPlot2
is based on function Heatmap
of
Bioconductor package ComplexHeatmap.
invisible()
A first version of function corPlot
appeared in package SLmisc.
Matthias Kohl [email protected]
Sandrine Dudoit, Yee Hwa (Jean) Yang, Benjamin Milo Bolstad and with
contributions from Natalie Thorne, Ingrid Loennstedt and Jessica Mar.
sma: Statistical Microarray Analysis.
http://www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html
## only a dummy example M <- matrix(rnorm(1000), ncol = 20) colnames(M) <- paste("Sample", 1:20) M.cor <- cor(M) corPlot(M.cor, minCor = min(M.cor)) corPlot(M.cor, minCor = min(M.cor), lab.both.axes = TRUE) corPlot(M.cor, minCor = min(M.cor), protocol = TRUE) corPlot(M.cor, minCor = min(M.cor), signifBar = 1) corPlot2(M.cor, minCor = min(M.cor)) corPlot2(M.cor, minCor = min(M.cor), lab.both.axes = FALSE) corPlot2(M.cor, minCor = min(M.cor), signifBar = 1)
## only a dummy example M <- matrix(rnorm(1000), ncol = 20) colnames(M) <- paste("Sample", 1:20) M.cor <- cor(M) corPlot(M.cor, minCor = min(M.cor)) corPlot(M.cor, minCor = min(M.cor), lab.both.axes = TRUE) corPlot(M.cor, minCor = min(M.cor), protocol = TRUE) corPlot(M.cor, minCor = min(M.cor), signifBar = 1) corPlot2(M.cor, minCor = min(M.cor)) corPlot2(M.cor, minCor = min(M.cor), lab.both.axes = FALSE) corPlot2(M.cor, minCor = min(M.cor), signifBar = 1)
This function modifies a given color vector as used for heatmaps.
heatmapCol(data, col, lim, na.rm = TRUE)
heatmapCol(data, col, lim, na.rm = TRUE)
data |
matrix or data.frame; data which shall be displayed in a heatmap; ranging from negative to positive numbers. |
col |
vector of colors used for heatmap. |
lim |
constant colors are used for data below |
na.rm |
logical; remove |
Colors below and above a specified value are kept constant. In addition, the colors are symmetrizised.
vector of colors
A first version of this function appeared in package SLmisc.
Matthias Kohl [email protected]
data.plot <- matrix(rnorm(100*50, sd = 1), ncol = 50) colnames(data.plot) <- paste("patient", 1:50) rownames(data.plot) <- paste("gene", 1:100) data.plot[1:70, 1:30] <- data.plot[1:70, 1:30] + 3 data.plot[71:100, 31:50] <- data.plot[71:100, 31:50] - 1.4 data.plot[1:70, 31:50] <- rnorm(1400, sd = 1.2) data.plot[71:100, 1:30] <- rnorm(900, sd = 1.2) nrcol <- 128 require(RColorBrewer) myCol <- rev(colorRampPalette(brewer.pal(10, "RdBu"))(nrcol)) heatmap(data.plot, col = myCol, main = "standard colors") myCol2 <- heatmapCol(data = data.plot, col = myCol, lim = min(abs(range(data.plot)))-1) heatmap(data.plot, col = myCol2, main = "heatmapCol colors")
data.plot <- matrix(rnorm(100*50, sd = 1), ncol = 50) colnames(data.plot) <- paste("patient", 1:50) rownames(data.plot) <- paste("gene", 1:100) data.plot[1:70, 1:30] <- data.plot[1:70, 1:30] + 3 data.plot[71:100, 31:50] <- data.plot[71:100, 31:50] - 1.4 data.plot[1:70, 31:50] <- rnorm(1400, sd = 1.2) data.plot[71:100, 1:30] <- rnorm(900, sd = 1.2) nrcol <- 128 require(RColorBrewer) myCol <- rev(colorRampPalette(brewer.pal(10, "RdBu"))(nrcol)) heatmap(data.plot, col = myCol, main = "standard colors") myCol2 <- heatmapCol(data = data.plot, col = myCol, lim = min(abs(range(data.plot)))-1) heatmap(data.plot, col = myCol2, main = "heatmapCol colors")
Compute MAD between colums of a matrix or data.frame. Can be used to create a similarity matrix for a microarray experiment.
madMatrix(x)
madMatrix(x)
x |
matrix or data.frame |
This functions computes the so called similarity matrix (based on MAD) for a microarray experiment; cf. Buness et. al. (2004).
matrix of MAD values between colums of x
A first version of this function appeared in package SLmisc.
Matthias Kohl [email protected]
Andreas Buness, Wolfgang Huber, Klaus Steiner, Holger Sueltmann, and Annemarie Poustka. arrayMagic: two-colour cDNA microarray quality control and preprocessing. Bioinformatics Advance Access published on September 28, 2004. doi:10.1093/bioinformatics/bti052
plotMAD
## only a dummy example madMatrix(matrix(rnorm(1000), ncol = 10))
## only a dummy example madMatrix(matrix(rnorm(1000), ncol = 10))
Plot of similarity matrix based on MAD between microarrays.
madPlot(x, new = FALSE, col, maxMAD = 3, labels = FALSE, labcols = "black", title = "", protocol = FALSE, ...) madPlot2(x, new = FALSE, col, maxMAD = 3, labels = FALSE, row.width = 6, column.height = 6, lab.both.axes = TRUE, fontsize.axis = 12, title = "", fontsize.title = 16, signifBar = 2)
madPlot(x, new = FALSE, col, maxMAD = 3, labels = FALSE, labcols = "black", title = "", protocol = FALSE, ...) madPlot2(x, new = FALSE, col, maxMAD = 3, labels = FALSE, row.width = 6, column.height = 6, lab.both.axes = TRUE, fontsize.axis = 12, title = "", fontsize.title = 16, signifBar = 2)
x |
data or correlation matrix, respectively |
new |
If |
col |
colors palette for image. If missing, the |
maxMAD |
maximum MAD value displayed |
labels |
vector of character strings to be placed at the tickpoints,
labels for the columns of |
labcols |
colors to be used for the labels of the columns of |
title |
character string, overall title for the plot. |
fontsize.title |
numerical value giving the fontsize of the title. |
protocol |
logical, display color bar without numbers |
lab.both.axes |
logical, display labels on both axes |
fontsize.axis |
numerical value giving the fontsize of the axis labels. |
signifBar |
integer indicating the precision to be used for the bar. |
row.width |
numerical value giving width of the row in centimeters; i.e., can be used to change space available for the labels. |
column.height |
numerical value giving the height of the column in centimeters; i.e., can be used to change space available for the labels. |
... |
graphical parameters may also be supplied as arguments to the
function (see |
This functions generates the so called similarity matrix (based on MAD) for
a microarray experiment; cf. Buness et. al. (2004). The function is similar
to corPlot
.
A first version of this function appeared in package SLmisc.
Matthias Kohl [email protected]
Sandrine Dudoit, Yee Hwa (Jean) Yang, Benjamin Milo Bolstad and with
contributions from Natalie Thorne, Ingrid Loennstedt and Jessica Mar.
sma: Statistical Microarray Analysis.
http://www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html
Andreas Buness, Wolfgang Huber, Klaus Steiner, Holger Sueltmann, and Annemarie Poustka. arrayMagic: two-colour cDNA microarray quality control and preprocessing. Bioinformatics Advance Access published on September 28, 2004. doi:10.1093/bioinformatics/bti052
corPlot
## only a dummy example set.seed(13) x <- matrix(rnorm(1000), ncol = 10) x[1:20,5] <- x[1:20,5] + 10 madPlot(x, new = TRUE, maxMAD = 2.5) madPlot2(x, new = TRUE, maxMAD = 2.5) ## in contrast corPlot2(x, new = TRUE, minCor = -0.5)
## only a dummy example set.seed(13) x <- matrix(rnorm(1000), ncol = 10) x[1:20,5] <- x[1:20,5] + 10 madPlot(x, new = TRUE, maxMAD = 2.5) madPlot2(x, new = TRUE, maxMAD = 2.5) ## in contrast corPlot2(x, new = TRUE, minCor = -0.5)
Performs moderated 1-Way ANOVAs based on Bioconductor package limma.
mod.oneway.test(x, group, repeated = FALSE, subject, adjust.method = "BH", sort.by = "none")
mod.oneway.test(x, group, repeated = FALSE, subject, adjust.method = "BH", sort.by = "none")
x |
a (non-empty) numeric matrix of data values. |
group |
an optional factor representing the groups. |
repeated |
logical indicating whether there are repeated-measures. |
subject |
factor with subject IDs; required if |
adjust.method |
see |
sort.by |
see |
, where "logFC"
corresponds to difference in means.
The function uses Bioconductor package limma to compute moderated 1-way ANOVAs.
For more details we refer to ebayes
.
A data.frame with the results.
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47.
oneway.test
, mod.t.test
set.seed(123) X <- rbind(matrix(rnorm(5*20), nrow = 5, ncol = 20), matrix(rnorm(5*20, mean = 1), nrow = 5, ncol = 20)) gr <- factor(c(rep("A1", 5), rep("B2", 5), rep("C3", 5), rep("D4", 5))) mod.oneway.test(X, gr) ## Welch 1-Way ANOVA (not moderated) ow.test <- function(x, g){ res <- oneway.test(x ~ g) c(res$statistic, res$p.value) } ow.res <- t(apply(X, 1, ow.test, g = gr)) colnames(ow.res) <- c("F", "p.value") ow.res ## repeated measures X <- rbind(matrix(rnorm(6*18), nrow = 6, ncol = 18), matrix(rnorm(6*18, mean = 1), nrow = 6, ncol = 18)) gr <- factor(c(rep("T1", 6), rep("T2", 6), rep("T3", 6))) subjectID <- factor(c(rep(1:6, 3))) mod.oneway.test(X, gr, repeated = TRUE, subject = subjectID)
set.seed(123) X <- rbind(matrix(rnorm(5*20), nrow = 5, ncol = 20), matrix(rnorm(5*20, mean = 1), nrow = 5, ncol = 20)) gr <- factor(c(rep("A1", 5), rep("B2", 5), rep("C3", 5), rep("D4", 5))) mod.oneway.test(X, gr) ## Welch 1-Way ANOVA (not moderated) ow.test <- function(x, g){ res <- oneway.test(x ~ g) c(res$statistic, res$p.value) } ow.res <- t(apply(X, 1, ow.test, g = gr)) colnames(ow.res) <- c("F", "p.value") ow.res ## repeated measures X <- rbind(matrix(rnorm(6*18), nrow = 6, ncol = 18), matrix(rnorm(6*18, mean = 1), nrow = 6, ncol = 18)) gr <- factor(c(rep("T1", 6), rep("T2", 6), rep("T3", 6))) subjectID <- factor(c(rep(1:6, 3))) mod.oneway.test(X, gr, repeated = TRUE, subject = subjectID)
Performs moderated t-tests based on Bioconductor package limma.
mod.t.test(x, group = NULL, paired = FALSE, subject, adjust.method = "BH", sort.by = "none", na.rm = TRUE)
mod.t.test(x, group = NULL, paired = FALSE, subject, adjust.method = "BH", sort.by = "none", na.rm = TRUE)
x |
a (non-empty) numeric matrix of data values. |
group |
an optional factor representing the groups. |
paired |
a logical indicating whether you want a paired test. |
subject |
factor with subject IDs; required if |
adjust.method |
see |
sort.by |
see |
, where "logFC"
corresponds to difference in means.
na.rm |
logical. Should missing values (including NaN) be omitted from the calculations of group means? |
The function uses Bioconductor package limma to compute moderated t-tests.
For more details we refer to ebayes
.
A data.frame with the results.
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47.
## One-sample test X <- matrix(rnorm(10*20, mean = 1), nrow = 10, ncol = 20) mod.t.test(X) ## corresponds to library(limma) design <- matrix(1, nrow = ncol(X), ncol = 1) colnames(design) <- "A" fit1 <- lmFit(X, design) fit2 <- eBayes(fit1) topTable(fit2, coef = 1, number = Inf, confint = TRUE, sort.by = "none")[,-4] ## Two-sample test set.seed(123) X <- rbind(matrix(rnorm(5*20), nrow = 5, ncol = 20), matrix(rnorm(5*20, mean = 1), nrow = 5, ncol = 20)) g2 <- factor(c(rep("group 1", 10), rep("group 2", 10))) mod.t.test(X, group = g2) ## corresponds to design <- model.matrix(~ 0 + g2) colnames(design) <- c("group1", "group2") fit1 <- lmFit(X, design) cont.matrix <- makeContrasts(group1vsgroup2="group1-group2", levels=design) fit2 <- contrasts.fit(fit1, cont.matrix) fit3 <- eBayes(fit2) topTable(fit3, coef = 1, number = Inf, confint = TRUE, sort.by = "none")[,-4] ## Paired two-sample test subjID <- factor(rep(1:10, 2)) mod.t.test(X, group = g2, paired = TRUE, subject = subjID)
## One-sample test X <- matrix(rnorm(10*20, mean = 1), nrow = 10, ncol = 20) mod.t.test(X) ## corresponds to library(limma) design <- matrix(1, nrow = ncol(X), ncol = 1) colnames(design) <- "A" fit1 <- lmFit(X, design) fit2 <- eBayes(fit1) topTable(fit2, coef = 1, number = Inf, confint = TRUE, sort.by = "none")[,-4] ## Two-sample test set.seed(123) X <- rbind(matrix(rnorm(5*20), nrow = 5, ncol = 20), matrix(rnorm(5*20, mean = 1), nrow = 5, ncol = 20)) g2 <- factor(c(rep("group 1", 10), rep("group 2", 10))) mod.t.test(X, group = g2) ## corresponds to design <- model.matrix(~ 0 + g2) colnames(design) <- c("group1", "group2") fit1 <- lmFit(X, design) cont.matrix <- makeContrasts(group1vsgroup2="group1-group2", levels=design) fit2 <- contrasts.fit(fit1, cont.matrix) fit3 <- eBayes(fit2) topTable(fit3, coef = 1, number = Inf, confint = TRUE, sort.by = "none")[,-4] ## Paired two-sample test subjID <- factor(rep(1:10, 2)) mod.t.test(X, group = g2, paired = TRUE, subject = subjID)
This function is a slight modification of function Anova
of
package genefilter.
oneWayAnova(cov, na.rm = TRUE, var.equal = FALSE)
oneWayAnova(cov, na.rm = TRUE, var.equal = FALSE)
cov |
The covariate. It must have length equal to the number of
columns of the array that the result of |
na.rm |
a logical value indicating whether |
var.equal |
a logical variable indicating whether to treat the variances
in the samples as equal. If |
The function returned by oneWayAnova
uses oneway.test
to perform a one-way ANOVA, where x
is the set of gene expressions.
The F statistic for an overall effect is computed and the corresponding
p-value is returned.
The function Anova
of package genefilter instead compares the computed
p-value to a prespecified p-value and returns TRUE
, if the computed p-value
is smaller than the prespecified one.
oneWayAnova
returns a function with bindings for cov
that will
perform a one-way ANOVA.
The covariate can be continuous, in which case the test is for a linear effect for the covariate.
A first version of this function appeared in package SLmisc.
Matthias Kohl [email protected]
R. Gentleman, V. Carey, W. Huber and F. Hahne (2006). genefilter: methods for filtering genes from microarray experiments. R package version 1.13.7.
set.seed(123) af <- oneWayAnova(c(rep(1,5),rep(2,5))) af(rnorm(10))
set.seed(123) af <- oneWayAnova(c(rep(1,5),rep(2,5))) af(rnorm(10))
This function computes pairwise fold changes. It also works for logarithmic data.
pairwise.fc(x, g, ave = mean, log = TRUE, base = 2, mod.fc = TRUE, ...)
pairwise.fc(x, g, ave = mean, log = TRUE, base = 2, mod.fc = TRUE, ...)
x |
numeric vector. |
g |
grouping vector or factor |
ave |
function to compute the group averages. |
log |
logical. Is the data logarithmic? |
base |
If |
mod.fc |
logical. Return modified fold changes? (see details) |
... |
optional arguments to |
The function computes pairwise fold changes between groups, where
the group values are aggregated using the function which is
given by the argument ave
.
The fold changes are returned in a slightly modified form if mod.fc = TRUE.
Fold changes FC
which are smaller than 1
are reported as
to -1/FC
.
The implementation is in certain aspects analogously to pairwise.t.test
.
Vector with pairwise fold changes.
Matthias Kohl [email protected]
set.seed(13) x <- rnorm(100) ## assumed as log2-data g <- factor(sample(1:4, 100, replace = TRUE)) levels(g) <- c("a", "b", "c", "d") pairwise.fc(x, g) ## some small checks res <- by(x, list(g), mean) 2^(res[[1]] - res[[2]]) # a vs. b -1/2^(res[[1]] - res[[3]]) # a vs. c 2^(res[[1]] - res[[4]]) # a vs. d -1/2^(res[[2]] - res[[3]]) # b vs. c -1/2^(res[[2]] - res[[4]]) # b vs. d 2^(res[[3]] - res[[4]]) # c vs. d
set.seed(13) x <- rnorm(100) ## assumed as log2-data g <- factor(sample(1:4, 100, replace = TRUE)) levels(g) <- c("a", "b", "c", "d") pairwise.fc(x, g) ## some small checks res <- by(x, list(g), mean) 2^(res[[1]] - res[[2]]) # a vs. b -1/2^(res[[1]] - res[[3]]) # a vs. c 2^(res[[1]] - res[[4]]) # a vs. d -1/2^(res[[2]] - res[[3]]) # b vs. c -1/2^(res[[2]] - res[[4]]) # b vs. d 2^(res[[3]] - res[[4]]) # c vs. d
The function computes pairwise log-fold changes.
pairwise.logfc(x, g, ave = mean, log = TRUE, base = 2, ...)
pairwise.logfc(x, g, ave = mean, log = TRUE, base = 2, ...)
x |
numeric vector. |
g |
grouping vector or factor |
ave |
function to compute the group averages. |
log |
logical. Is the data logarithmic? |
base |
If |
... |
optional arguments to |
The function computes pairwise log-fold changes between groups, where
the group values are aggregated using the function which is
given by the argument ave
.
The implementation is in certain aspects analogously to pairwise.t.test
.
Vector with pairwise log-fold changes.
Matthias Kohl [email protected]
set.seed(13) x <- rnorm(100) ## assumed as log2-data g <- factor(sample(1:4, 100, replace = TRUE)) levels(g) <- c("a", "b", "c", "d") pairwise.logfc(x, g) ## some small checks res <- by(x, list(g), mean) res[[1]] - res[[2]] # a vs. b res[[1]] - res[[3]] # a vs. c res[[1]] - res[[4]] # a vs. d res[[2]] - res[[3]] # b vs. c res[[2]] - res[[4]] # b vs. d res[[3]] - res[[4]] # c vs. d
set.seed(13) x <- rnorm(100) ## assumed as log2-data g <- factor(sample(1:4, 100, replace = TRUE)) levels(g) <- c("a", "b", "c", "d") pairwise.logfc(x, g) ## some small checks res <- by(x, list(g), mean) res[[1]] - res[[2]] # a vs. b res[[1]] - res[[3]] # a vs. c res[[1]] - res[[4]] # a vs. d res[[2]] - res[[3]] # b vs. c res[[2]] - res[[4]] # b vs. d res[[3]] - res[[4]] # c vs. d
Performs pairwise moderated t-tests (unpaired) based on Bioconductor package limma.
pairwise.mod.t.test(x, group, adjust.method = "BH", sort.by = "none")
pairwise.mod.t.test(x, group, adjust.method = "BH", sort.by = "none")
x |
a (non-empty) numeric matrix of data values. |
group |
an optional factor representing the groups. |
adjust.method |
see |
sort.by |
see |
, where "logFC"
corresponds to difference in means.
The function uses Bioconductor package limma to compute pairwise moderated
t-tests. For more details we refer to ebayes
.
A data.frame with the results.
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47.
oneway.test
, mod.t.test
set.seed(123) X <- rbind(matrix(rnorm(5*20), nrow = 5, ncol = 20), matrix(rnorm(5*20, mean = 1), nrow = 5, ncol = 20)) gr <- factor(c(rep("A1", 5), rep("B2", 5), rep("C3", 5), rep("D4", 5))) mod.oneway.test(X, gr) pairwise.mod.t.test(X, gr)
set.seed(123) X <- rbind(matrix(rnorm(5*20), nrow = 5, ncol = 20), matrix(rnorm(5*20, mean = 1), nrow = 5, ncol = 20)) gr <- factor(c(rep("A1", 5), rep("B2", 5), rep("C3", 5), rep("D4", 5))) mod.oneway.test(X, gr) pairwise.mod.t.test(X, gr)
Compute mean of replicated spots where additionally spot flags may incorporated.
repMeans(x, flags, use.flags = NULL, ndups, spacing, method, ...)
repMeans(x, flags, use.flags = NULL, ndups, spacing, method, ...)
x |
matrix or data.frame of expression values |
flags |
matrix or data.frame of spot flags; must have same dimension as |
use.flags |
should flags be included and in which way; cf. section details |
ndups |
integer, number of replicates on chip. The number of rows of
|
spacing |
the spacing between the rows of 'x' corresponding to
replicated spots, |
method |
function to aggregate the replicated spots. If missing, the mean is used. |
... |
optional arguments to |
The incorporation of spot flags is controlled via argument use.flags
.
NULL
: flags are not used; minimum flag value of replicated
spots is returned
"max"
: only spots with flag value equal to the maximum flag value of
replicated spots are used
"median"
: only spots with flag values larger or equal to median of
replicated spots are used
"mean"
: only spots with flag values larger or equal to mean of replicated
spots are used
LIST with components
exprs |
mean of expression values |
flags |
flags for mean expression values |
A first version of this function appeared in package SLmisc.
Matthias Kohl [email protected]
## only a dummy example M <- matrix(rnorm(1000), ncol = 10) FL <- matrix(rpois(1000, lambda = 10), ncol = 10) # only for this example res <- repMeans(x = M, flags = FL, use.flags = "max", ndups = 5, spacing = 20)
## only a dummy example M <- matrix(rnorm(1000), ncol = 10) FL <- matrix(rpois(1000, lambda = 10), ncol = 10) # only for this example res <- repMeans(x = M, flags = FL, use.flags = "max", ndups = 5, spacing = 20)
Plot of similarity matrix.
simPlot(x, col, minVal, labels = FALSE, lab.both.axes = FALSE, labcols = "black", title = "", cex.title = 1.2, protocol = FALSE, cex.axis = 0.8, cex.axis.bar = 1, signifBar = 2, ...) simPlot2(x, col, minVal, labels = FALSE, row.width = 6, column.height = 6, lab.both.axes = TRUE, fontsize.axis = 12, title = "", fontsize.title = 16, signifBar = 2)
simPlot(x, col, minVal, labels = FALSE, lab.both.axes = FALSE, labcols = "black", title = "", cex.title = 1.2, protocol = FALSE, cex.axis = 0.8, cex.axis.bar = 1, signifBar = 2, ...) simPlot2(x, col, minVal, labels = FALSE, row.width = 6, column.height = 6, lab.both.axes = TRUE, fontsize.axis = 12, title = "", fontsize.title = 16, signifBar = 2)
x |
quadratic data matrix. |
col |
colors palette for image. If missing, the |
minVal |
numeric, minimum value which is display by a color; used to adjust |
labels |
vector of character strings to be placed at the tickpoints,
labels for the columns of |
lab.both.axes |
logical, display labels on both axes |
labcols |
colors to be used for the labels of the columns of |
title |
character string, overall title for the plot. |
cex.title |
A numerical value giving the amount by which plotting text
and symbols should be magnified relative to the default;
cf. |
fontsize.title |
numerical value giving the fontsize of the title. |
protocol |
logical, display color bar without numbers |
cex.axis |
The magnification to be used for axis annotation relative to the
current setting of 'cex'; cf. |
fontsize.axis |
numerical value giving the fontsize of the axis labels. |
cex.axis.bar |
The magnification to be used for axis annotation of the color
bar relative to the current setting of 'cex'; cf.
|
signifBar |
integer indicating the precision to be used for the bar. |
row.width |
numerical value giving width of the row in centimeters; i.e., can be used to change space available for the labels. |
column.height |
numerical value giving the height of the column in centimeters; i.e., can be used to change space available for the labels. |
... |
graphical parameters may also be supplied as arguments to the
function (see |
This functions generates a so called similarity matrix.
If min(x)
is smaller than minVal
, the colors in col
are
adjusted such that the minimum value which is color coded is equal to minVal
.
invisible()
The function is a slight modification of function corPlot
.
Matthias Kohl [email protected]
Sandrine Dudoit, Yee Hwa (Jean) Yang, Benjamin Milo Bolstad and with
contributions from Natalie Thorne, Ingrid Loennstedt and Jessica Mar.
sma: Statistical Microarray Analysis.
http://www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html
## only a dummy example M <- matrix(rnorm(1000), ncol = 20) colnames(M) <- paste("Sample", 1:20) M.cor <- cor(M) simPlot(M.cor, minVal = min(M.cor)) simPlot(M.cor, minVal = min(M.cor), lab.both.axes = TRUE) simPlot(M.cor, minVal = min(M.cor), protocol = TRUE) simPlot(M.cor, minVal = min(M.cor), signifBar = 1) simPlot2(M.cor, minVal = min(M.cor)) simPlot2(M.cor, minVal = min(M.cor), lab.both.axes = FALSE) simPlot2(M.cor, minVal = min(M.cor), signifBar = 1)
## only a dummy example M <- matrix(rnorm(1000), ncol = 20) colnames(M) <- paste("Sample", 1:20) M.cor <- cor(M) simPlot(M.cor, minVal = min(M.cor)) simPlot(M.cor, minVal = min(M.cor), lab.both.axes = TRUE) simPlot(M.cor, minVal = min(M.cor), protocol = TRUE) simPlot(M.cor, minVal = min(M.cor), signifBar = 1) simPlot2(M.cor, minVal = min(M.cor)) simPlot2(M.cor, minVal = min(M.cor), lab.both.axes = FALSE) simPlot2(M.cor, minVal = min(M.cor), signifBar = 1)
The function can be used to compute distances between strings.
stringDist(x, y, method = "levenshtein", mismatch = 1, gap = 1)
stringDist(x, y, method = "levenshtein", mismatch = 1, gap = 1)
x |
character vector, first string |
y |
character vector, second string |
method |
character, name of the distance method. This must be
|
mismatch |
numeric, distance value for a mismatch between symbols |
gap |
numeric, distance value for inserting a gap |
The function computes the Hamming and the Levenshtein (edit) distance of two given strings (sequences).
In case of the Hamming distance the two strings must have the same length.
In case of the Levenshtein (edit) distance a scoring and a trace-back matrix are computed
and are saved as attributes "ScoringMatrix"
and "TraceBackMatrix"
.
The characters in the trace-back matrix reflect insertion of a gap in string y
(d
: deletion), match (m
), mismatch (mm
),
and insertion of a gap in string x
(i
).
stringDist
returns an object of S3 class "stringDist"
inherited
from class "dist"
; cf. dist
.
The function is mainly for teaching purposes.
For distances between strings and string alignments see also Bioconductor package Biostrings.
Matthias Kohl [email protected]
R. Merkl and S. Waack (2009). Bioinformatik Interaktiv. Wiley.
x <- "GACGGATTATG" y <- "GATCGGAATAG" ## Levenshtein distance d <- stringDist(x, y) d attr(d, "ScoringMatrix") attr(d, "TraceBackMatrix") ## Hamming distance stringDist(x, y)
x <- "GACGGATTATG" y <- "GATCGGAATAG" ## Levenshtein distance d <- stringDist(x, y) d attr(d, "ScoringMatrix") attr(d, "TraceBackMatrix") ## Hamming distance stringDist(x, y)
The function can be used to compute similarity scores between strings.
stringSim(x, y, global = TRUE, match = 1, mismatch = -1, gap = -1, minSim = 0)
stringSim(x, y, global = TRUE, match = 1, mismatch = -1, gap = -1, minSim = 0)
x |
character vector, first string |
y |
character vector, second string |
global |
logical; global or local alignment |
match |
numeric, score for a match between symbols |
mismatch |
numeric, score for a mismatch between symbols |
gap |
numeric, penalty for inserting a gap |
minSim |
numeric, used as required minimum score in case of local alignments |
The function computes optimal alignment scores for global (Needleman-Wunsch) and local (Smith-Waterman) alignments with constant gap penalties.
Scoring and trace-back matrix are computed and saved in form of attributes
"ScoringMatrix"
and "TraceBackMatrix"
.
The characters in the trace-back matrix reflect insertion of a gap in
string y
(d
: deletion), match (m
), mismatch (mm
),
and insertion of a gap in string x
(i
). In addition stop
indicates that the minimum similarity score has been reached.
stringSim
returns an object of S3 class "stringSim"
inherited
from class "dist"
; cf. dist
.
The function is mainly for teaching purposes.
For distances between strings and string alignments see also Bioconductor package Biostrings.
Matthias Kohl [email protected]
R. Merkl and S. Waack (2009). Bioinformatik Interaktiv. Wiley.
x <- "GACGGATTATG" y <- "GATCGGAATAG" ## optimal global alignment score d <- stringSim(x, y) d attr(d, "ScoringMatrix") attr(d, "TraceBackMatrix") ## optimal local alignment score d <- stringSim(x, y, global = FALSE) d attr(d, "ScoringMatrix") attr(d, "TraceBackMatrix")
x <- "GACGGATTATG" y <- "GATCGGAATAG" ## optimal global alignment score d <- stringSim(x, y) d attr(d, "ScoringMatrix") attr(d, "TraceBackMatrix") ## optimal local alignment score d <- stringSim(x, y, global = FALSE) d attr(d, "ScoringMatrix") attr(d, "TraceBackMatrix")
Function computes an optimal global or local alignment based on a trace back
matrix as provided by function stringDist
or stringSim
.
traceBack(D, global = TRUE)
traceBack(D, global = TRUE)
D |
object of class |
global |
logical, global or local alignment |
Computes one possible optimal global or local alignment based on the trace back
matrix saved in an object of class "stringDist"
or "stringSim"
.
matrix: pairwise global/local alignment
The function is mainly for teaching purposes.
For distances between strings and string alignments see Bioconductor package Biostrings.
Matthias Kohl [email protected]
R. Merkl and S. Waack (2009). Bioinformatik Interaktiv. Wiley.
x <- "GACGGATTATG" y <- "GATCGGAATAG" ## Levenshtein distance d <- stringDist(x, y) ## optimal global alignment traceBack(d) ## Optimal global alignment score d <- stringSim(x, y) ## optimal global alignment traceBack(d) ## Optimal local alignment score d <- stringSim(x, y, global = FALSE) ## optimal local alignment traceBack(d, global = FALSE)
x <- "GACGGATTATG" y <- "GATCGGAATAG" ## Levenshtein distance d <- stringDist(x, y) ## optimal global alignment traceBack(d) ## Optimal global alignment score d <- stringSim(x, y) ## optimal global alignment traceBack(d) ## Optimal local alignment score d <- stringSim(x, y, global = FALSE) ## optimal local alignment traceBack(d, global = FALSE)
This function is a slight modification of function Anova
of
package genefilter.
twoWayAnova(cov1, cov2, interaction, na.rm = TRUE)
twoWayAnova(cov1, cov2, interaction, na.rm = TRUE)
cov1 |
The first covariate. It must have length equal to the number of
columns of the array that the result of |
cov2 |
The second covariate. It must have length equal to the number of
columns of the array that the result of |
interaction |
logical, should interaction be considered |
na.rm |
a logical value indicating whether 'NA' values should be stripped before the computation proceeds. |
The function returned by twoWayAnova
uses lm
to fit
a linear model of the form lm(x ~ cov1*cov2)
, where x
is the set
of gene expressions. The F statistics for the main effects and the interaction are
computed and the corresponding p-values are returned.
twoWayAnova
returns a function with bindings for cov1
and
cov2
that will perform a two-way ANOVA.
A first version of this function appeared in package SLmisc.
Matthias Kohl [email protected]
R. Gentleman, V. Carey, W. Huber and F. Hahne (2006). genefilter: methods for filtering genes from microarray experiments. R package version 1.13.7.
set.seed(123) af1 <- twoWayAnova(c(rep(1,6),rep(2,6)), rep(c(rep(1,3), rep(2,3)), 2)) af2 <- twoWayAnova(c(rep(1,6),rep(2,6)), rep(c(rep(1,3), rep(2,3)), 2), interaction = FALSE) x <- matrix(rnorm(12*10), nrow = 10) apply(x, 1, af1) apply(x, 1, af2)
set.seed(123) af1 <- twoWayAnova(c(rep(1,6),rep(2,6)), rep(c(rep(1,3), rep(2,3)), 2)) af2 <- twoWayAnova(c(rep(1,6),rep(2,6)), rep(c(rep(1,3), rep(2,3)), 2), interaction = FALSE) x <- matrix(rnorm(12*10), nrow = 10) apply(x, 1, af1) apply(x, 1, af2)