Title: | Tools to Analyse RFLP Data |
---|---|
Description: | Provides functions to analyse DNA fragment samples (i.e. derived from RFLP-analysis) and standalone BLAST report files (i.e. DNA sequence analysis). |
Authors: | Fabienne Flessa [aut], Alexandra Kehl [aut], Mohammed Aslam Imtiaz [aut], Matthias Kohl [aut, cre] |
Maintainer: | Matthias Kohl <[email protected]> |
License: | LGPL-3 |
Version: | 2.0 |
Built: | 2024-11-17 04:10:06 UTC |
Source: | https://github.com/cran/RFLPtools |
RFLPtools provides functions to analyse DNA fragment samples (i.e. derived from RFLP-analysis) and standalone BLAST report files (i.e. DNA sequence analysis).
Package: | RFLPtools |
Version: | 2.0 |
Date: | 2022-02-07 |
Depends: | R(>= 4.0.0) |
Imports: | stats, utils, graphics, grDevices, RColorBrewer |
Suggests: | knitr, rmarkdown, lattice, MKomics |
License: | LGPL-3 |
Fabienne Flessa [email protected],
Alexandra Kehl [email protected],
Mohammed Aslam Imtiaz,
Matthias Kohl [email protected]
Maintainer: Matthias Kohl [email protected]
Local Blast download: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
Blast News: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastNews
Ian A. Dickie, Peter G. Avis, David J. McLaughlin, Peter B. Reich. Good-Enough RFLP Matcher (GERM) program. Mycorrhiza 2003, 13:171-172.
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
Matsumoto, Masaru; Furuya, Naruto; Takanami, Yoichi; Matsuyama, Nobuaki. RFLP analysis of the PCR-amplified 28S rDNA in Rhizoctonia solani. Mycoscience 1996 37:351-356.
Persoh, D., Melcher, M., Flessa, F., Rambold, G.: First fungal community analyses of endophytic ascomycetes associated with Viscum album ssp. austriacum and its host Pinus sylvestris. Fungal Biology 2010 Jul;114(7):585-96.
Poussier, Stephane; Trigalet-Demery, Danielle; Vandewalle, Peggy; Goffinet, Bruno; Luisetti, Jacques; Trigalet, Andre. Genetic diversity of Ralstonia solanacearum as assessed by PCR-RFLP of the hrp gene region, AFLP and 16S rRNA sequence analysis, and identification of an African subdivision. Microbiology 2000 146:1679-1692.
T. A. Saari, S. K. Saari, C. D. Campbell, I. J Alexander, I. C. Anderson. FragMatch - a program for the analysis of DNA fragment data. Mycorrhiza 2007, 17:133-136
    data(RFLPdata)
    res <- RFLPdist(RFLPdata)
    plot(hclust(res[[1]]), main = "Euclidean distance")

    par(mfrow = c(1,2))
    plot(hclust(RFLPdist(RFLPdata, nrBands = 3)), cex = 0.7)
    RFLPplot(RFLPdata, nrBands = 3, mar.bottom = 6, cex.axis = 0.8)

    data(RFLPref)
    RFLPrefplot(RFLPdata, RFLPref, nrBands = 6, cex.axis = 0.8)

    library(MKomics)
    data(BLASTdata)
    res <- simMatrix(BLASTdata, sequence.range = TRUE, Min = 500)
    myCol <- colorRampPalette(brewer.pal(8, "RdYlGn"))(128)
    simPlot(res, col = myCol, minVal = 0,
            labels = colnames(res), title = "(Dis-)Similarity Plot")
This is an example data set for BLAST data generated with standalone BLAST from NCBI.
data(BLASTdata)
A data frame with 737 observations on the following twelve variables
query.id
character: sequence identifier.
subject.id
character: subject identifier.
identity
numeric: identity between sequences (in percent).
alignment.length
integer: number of nucleotides.
mismatches
integer: number of mismatches.
gap.opens
integer: number of gaps.
q.start
integer: query sequence start.
q.end
integer: query sequence end.
s.start
integer: subject sequence start.
s.end
integer: subject sequence end.
evalue
numeric: evalue.
bit.score
numeric: score value.
The data were generated with standalone BLAST from NCBI. Pairwise similarities were calculated among all DNA sequences under analysis by running standalone BLAST with the parameters -m 8 -r 2 -G 5 -E 2.
Alternatively, data can be generated with "local BLAST" as implemented in BioEdit v7.0.9, using the additional parameters -m 8 -r 2 -G 5 -E 2 and selecting "open output" and "tabular output".
The data set was generated by F. Flessa.
Standalone Blast download: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
Blast News: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastNews
BioEdit: https://bioedit.software.informer.com/
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
    data(BLASTdata)
    str(BLASTdata)
This function computes and returns the distance matrix obtained by applying the specified distance measure to the rows of a data matrix. In contrast to dist, the successive differences of the row values are used instead of the row values themselves.
diffDist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
x | a numeric matrix, data frame or |
method | the distance measure to be used. This must be one of |
diag | logical value indicating whether the diagonal of the distance matrix should be printed by |
upper | logical value indicating whether the upper triangle of the distance matrix should be printed by |
p | The power of the Minkowski distance. |
diffDist is a simple wrapper function around dist. For more details about the distances we refer to dist.
The function may be helpful if there is a shift in the measured bands, e.g. c(550, 500, 300, 250) vs. c(510, 460, 260, 210).
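The effect of using successive differences can be sketched in base R. The helper below is hypothetical and only assumes that the differences are taken within each row before dist is applied; it is not the package's own code.

```r
# Hypothetical helper illustrating the idea; not the code of diffDist itself.
diff_dist_sketch <- function(x, ...) {
  # successive differences within each row (one row per sample)
  d <- t(apply(as.matrix(x), 1, diff))
  # ordinary distance on the differenced rows
  dist(d, ...)
}

M <- rbind(c(550, 500, 300, 250),
           c(510, 460, 260, 210),
           c(550, 500, 300, 200))
dist(M)              # rows 1 and 2 are separated by the constant shift of 40
diff_dist_sketch(M)  # rows 1 and 2 coincide once the shift is differenced out
```

A constant shift cancels in the differences, so two band patterns that differ only by such a shift end up at distance zero.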
diffDist returns an object of class "dist"; cf. dist.
Matthias Kohl [email protected]
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
    ## assume a shift in the measured bands
    M <- rbind(c(550, 500, 300, 250),
               c(510, 460, 260, 210),
               c(550, 500, 300, 200))
    dist(M)
    diffDist(M)
Compute matches for RFLP data using FragMatch - a program for the analysis of DNA fragment data.
FragMatch(newData, refData, maxValue = 1000, errorBound = 25, weight = 1, na.rm = TRUE)
newData | data.frame with new RFLP data; see |
refData | data.frame with reference RFLP data; see |
maxValue | numeric: maximum value for which the error bound is applied. Can be a vector of length larger than 1. |
errorBound | numeric: error bound corresponding to |
weight | numeric: weight for weighting partial matches; see details section. |
na.rm | logical: indicating whether NA values should be stripped before the computation proceeds. |
The algorithm is rather simple: it counts the number of matches, where a value counts as a match if it lies within a range of +/- errorBound.
If there is more than one enzyme, one can use weights to give the partial perfect matches for a certain enzyme a higher (or lower) weight.
A character matrix with entries of the form "a_b", which means that there were a out of b possible matches.
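The counting idea can be illustrated with a minimal base R sketch. The helper count_matches and its arguments are hypothetical; the sketch assumes a single enzyme and a single error bound, and it ignores the maxValue and weight arguments of FragMatch.

```r
# Hypothetical helper: count reference bands that have a new band
# within +/- error_bound, and report the result as "a_b".
count_matches <- function(new_mw, ref_mw, error_bound = 25) {
  hits <- vapply(ref_mw,
                 function(r) any(abs(new_mw - r) <= error_bound),
                 logical(1))
  paste(sum(hits), length(ref_mw), sep = "_")
}

new_mw <- c(510, 300, 120)   # bands of a new sample (made-up values)
ref_mw <- c(500, 310, 200)   # bands of a reference sample (made-up values)
count_matches(new_mw, ref_mw)
```

Here two of the three reference bands have a new band within 25 units, so the entry reads as two out of three possible matches.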
Mohammed Aslam Imtiaz, Matthias Kohl [email protected]
T. A. Saari, S. K. Saari, C. D. Campbell, I. J Alexander, I. C. Anderson. FragMatch - a program for the analysis of DNA fragment data. Mycorrhiza 2007, 17:133-136
    data(refDataGerm)
    data(newDataGerm)
    res <- FragMatch(newDataGerm, refDataGerm)
Compute matches for RFLP data using the Good-Enough RFLP Matcher (GERM) program.
germ(newData, refData, parameters = list("Max forward error" = 25, "Max backward error" = 25, "Max sum error" = 100, "Lower measurement limit" = 100), method = "joint", na.rm = TRUE)
newData | data.frame with new RFLP data; see |
refData | data.frame with reference RFLP data; see |
parameters | list of the four program parameters of GERM; see details section. |
method | matching and ranking method used for computation; see details section. |
na.rm | logical: indicating whether NA values should be stripped before the computation proceeds. |
There are four matching and ranking methods: "joint", "forward", "backward", and "sum". For more details see Dickie et al. (2003).
The parameters of the GERM software are:
"Max forward error": Used if "matching and ranking method" is set to "forward" or "joint".
"Max backward error": Used if "matching and ranking method" is set to "backward" or "joint".
"Max sum error": Used for matching if "matching and ranking method" is set to "sum".
"Lower measurement limit": The lower bound of measurements (often 100 or 50, depending on ladder used).
A named list with the results.
Mohammed Aslam Imtiaz, Matthias Kohl [email protected]
Ian A. Dickie, Peter G. Avis, David J. McLaughlin, Peter B. Reich. Good-Enough RFLP Matcher (GERM) program. Mycorrhiza 2003, 13:171-172.
    data(refDataGerm)
    data(newDataGerm)

    ## Example 1
    res1 <- germ(newDataGerm[1:7,], refDataGerm)

    ## Example 2
    res2 <- germ(newDataGerm[8:15,], refDataGerm)

    ## Example 3
    res3 <- germ(newDataGerm[16:20,], refDataGerm)

    ## all three examples in one step
    res.all <- germ(newDataGerm, refDataGerm)
This function computes linear combinations of distances.
linCombDist(x, distfun1, w1, distfun2, w2, diag = FALSE, upper = FALSE)
x | object which is passed to |
distfun1 | function used to compute an object of class |
w1 | weight for result of |
distfun2 | function used to compute an object of class |
w2 | weight for result of |
diag | see |
upper | see |
This function computes two distance matrices from x, one with distfun1 and one with distfun2, and returns their linear combination with weights w1 and w2.
linCombDist returns an object of class "dist"; cf. dist.
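A minimal base R sketch of such a linear combination is given below. The helper lin_comb_sketch is hypothetical and assumes that the result is simply w1 times the first distance plus w2 times the second; it uses two methods of dist so that it runs without the package.

```r
# Hypothetical helper: weighted sum of two distance matrices for the same data.
lin_comb_sketch <- function(x, distfun1, w1, distfun2, w2) {
  d1 <- as.matrix(distfun1(x))
  d2 <- as.matrix(distfun2(x))
  # convert the weighted sum back to an object of class "dist"
  as.dist(w1 * d1 + w2 * d2)
}

M <- rbind(c(550, 500, 300, 250),
           c(510, 460, 260, 210),
           c(700, 650, 450, 400))
res <- lin_comb_sketch(M,
                       distfun1 = dist, w1 = 0.5,
                       distfun2 = function(x) dist(x, method = "manhattan"),
                       w2 = 0.5)
res
```

With w1 + w2 = 1 and non-negative weights this is a convex combination; other weights simply rescale the contribution of each distance.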
Matthias Kohl [email protected]
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
    ## assume a shift in the measured bands
    M <- rbind(c(550, 500, 300, 250),
               c(510, 460, 260, 210),
               c(700, 650, 450, 400),
               c(550, 490, 310, 250))
    dist(M)
    diffDist(M)

    ## convex combination of dist and diffDist
    linCombDist(M, distfun1 = dist, w1 = 0.5, distfun2 = diffDist, w2 = 0.5)

    ## linear combination
    linCombDist(M, distfun1 = dist, w1 = 2, distfun2 = diffDist, w2 = 5)

    ## maximum distance
    linCombDist(M, distfun1 = function(x) dist(x, method = "maximum"), w1 = 0.5,
                distfun2 = function(x) diffDist(x, method = "maximum"), w2 = 0.5)

    data(RFLPdata)
    distfun <- function(x) linCombDist(x, distfun1 = dist, w1 = 0.1, distfun2 = diffDist, w2 = 0.9)
    par(mfrow = c(2, 2))
    plot(hclust(RFLPdist(RFLPdata, nrBands = 3, distfun = distfun)), cex = 0.7, cex.lab = 0.7)
    RFLPplot(RFLPdata, nrBands = 3, distfun = distfun, mar.bottom = 6, cex.axis = 0.8)
    plot(hclust(RFLPdist(RFLPdata, nrBands = 3)), cex = 0.7, cex.lab = 0.7)
    RFLPplot(RFLPdata, nrBands = 3, mar.bottom = 6, cex.axis = 0.8)
This is the example data of new (unknown) samples taken from the GERM software.
data(newDataGerm)
A data frame with 20 observations on the following six variables
Sample
character: sample identifier.
Enzyme
character: enzyme used.
Band
integer: band number.
MW
integer: molecular weight.
Genus
character: genus of sample.
Species
character: species of sample.
See GERM software.
The data set was taken from the GERM software (table 'Example Unknowns').
Ian A. Dickie, Peter G. Avis, David J. McLaughlin, Peter B. Reich. Good-Enough RFLP Matcher (GERM) program. Mycorrhiza 2003, 13:171-172.
    data(newDataGerm)
    str(newDataGerm)
Computes groups based on the number of bands per sample in an RFLP data set. Each group comprises RFLP samples with an equal number of bands.
nrBands(x)
x | data.frame with RFLP data; see |
The function computes groups based on the number of bands per sample in an RFLP data set. Each group comprises RFLP samples with an equal number of bands.
Number of bands per RFLP sample.
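The grouping can be sketched in base R with table and split. The toy data frame below is made up and only mimics the Sample, Band and MW columns of RFLPdata; it is an illustration, not the implementation of nrBands.

```r
# Made-up RFLP-like data: samples A and C have two bands, sample B has three.
toy <- data.frame(
  Sample = c("A", "A", "B", "B", "B", "C", "C"),
  Band   = c(1, 2, 1, 2, 3, 1, 2),
  MW     = c(500, 300, 620, 410, 150, 510, 290),
  stringsAsFactors = FALSE
)

# number of bands per sample
n_bands <- table(toy$Sample)
# groups of samples sharing the same number of bands
groups <- split(names(n_bands), as.integer(n_bands))
groups
```

Each list element is named by a band count and holds the identifiers of the samples with that many bands.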
Fabienne Flessa [email protected],
Alexandra Kehl [email protected],
Matthias Kohl [email protected]
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
    data(RFLPdata)
    nrBands(RFLPdata)
Function to read BLAST data generated with standalone BLAST from NCBI.
read.blast(file, sep = "\t")
file | character: BLAST file to read in. |
sep | the field separator character. Values on each line of the file are separated by this character. Default |
The function reads data which was generated with standalone BLAST from NCBI; see ftp://ftp.ncbi.nih.gov/blast/executables/release/.
Possible steps:
1) Install NCBI BLAST
2) Generate and import database(s)
3) Apply BLAST with options outfmt and out; e.g.
    blastn -query Testquery -db Testdatabase -outfmt 6 -out out.txt
or
    blastn -query Testquery -db Testdatabase -outfmt 10 -out out.csv
One can also call BLAST from inside R by using function system
    system("blastn -query Testquery -db Testdatabase -outfmt 6 -out out.txt")
4) Read in the results
    test.res <- read.blast(file = "out.txt")
or
    test.res <- read.blast(file = "out.csv", sep = ",")
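The reading step can be approximated in base R. The sketch below assumes that tabular BLAST output has twelve tab-separated columns in the order of the variables documented for read.blast; the single record written to a temporary file is made up for illustration.

```r
# Column names as documented for the returned data.frame.
blast_cols <- c("query.id", "subject.id", "identity", "alignment.length",
                "mismatches", "gap.opens", "q.start", "q.end",
                "s.start", "s.end", "evalue", "bit.score")

# Write one made-up tabular record to a temporary file.
tmp <- tempfile(fileext = ".txt")
record <- c("seq1", "seq2", "98.5", "600", "9", "0",
            "1", "600", "1", "600", "0.0", "1050")
writeLines(paste(record, collapse = "\t"), tmp)

# Read it back with the documented column names.
blast <- read.table(tmp, sep = "\t", col.names = blast_cols,
                    stringsAsFactors = FALSE)
str(blast)
```

For comma-separated output (outfmt 10) the same call would use sep = ",".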
A data.frame with variables
query.id
character: sequence identifier.
subject.id
character: subject identifier.
identity
numeric: identity between sequences (in percent).
alignment.length
integer: number of nucleotides.
mismatches
integer: number of mismatches.
gap.opens
integer: number of gaps.
q.start
integer: query sequence start.
q.end
integer: query sequence end.
s.start
integer: subject sequence start.
s.end
integer: subject sequence end.
evalue
numeric: evalue.
bit.score
numeric: score value.
Fabienne Flessa [email protected],
Alexandra Kehl [email protected],
Matthias Kohl [email protected]
Standalone Blast download: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
Blast News: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastNews
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
    Dir <- system.file("extdata", package = "RFLPtools") # input directory
    filename <- file.path(Dir, "BLASTexample.txt")
    BLAST1 <- read.blast(file = filename)
    str(BLAST1)
Function to read RFLP data (e.g. generated with software package Gene Profiler 4.05 (Scanalytics Inc.)) for DNA fragment analysis and genotyping, and exported to a text file.
read.rflp(file)
file | character: RFLP file to read in. |
The function reads data from a text file which was generated e.g. with the software package Gene Profiler 4.05 (Scanalytics Inc.) for DNA fragment analysis and genotyping. The data file contains sample identifier (Sample), band number (Band), molecular weight (MW) and gel identifier (Gel) (see RFLPdata).
If gel identifier Gel is missing it is extracted from the sample identifier Sample.
A data.frame with variables
Sample
character: sample identifier.
Band
integer: band number.
MW
integer: molecular weight.
Gel
character: gel identifier.
Fabienne Flessa [email protected],
Alexandra Kehl [email protected],
Matthias Kohl [email protected]
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
    Dir <- system.file("extdata", package = "RFLPtools") # input directory
    filename <- file.path(Dir, "RFLPexample.txt")
    RFLP1 <- read.rflp(file = filename)
    str(RFLP1)

    filename <- file.path(Dir, "AZ091016_report.txt")
    RFLP2 <- read.rflp(file = filename)
    str(RFLP2)
This is the reference data taken from the GERM software.
data(refDataGerm)
A data frame with 250 observations on the following six variables
Sample
character: sample identifier.
Enzyme
character: enzyme used.
Band
integer: band number.
MW
integer: molecular weight.
Genus
character: genus of sample.
Species
character: species of sample.
See GERM software.
The data set was taken from the GERM software (table 'Example Data').
Ian A. Dickie, Peter G. Avis, David J. McLaughlin, Peter B. Reich. Good-Enough RFLP Matcher (GERM) program. Mycorrhiza 2003, 13:171-172.
    data(refDataGerm)
    str(refDataGerm)
Function to combine an arbitrary number of RFLP data sets.
RFLPcombine(...)
... | two or more data.frames with RFLP data. |
The data sets are combined using rbind.
If data sets with identical sample identifiers are given, the identifiers are made unique using make.unique.
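How rbind and make.unique can work together is sketched below in base R. The helper combine_sketch is hypothetical and assumes that identifiers are made unique per sample (not per row) across all supplied data sets; the actual behaviour of RFLPcombine may differ in detail.

```r
# Hypothetical helper: stack RFLP-like data frames and relabel clashing samples.
combine_sketch <- function(...) {
  sets <- list(...)
  # distinct sample identifiers of each data set
  ids <- lapply(sets, function(d) unique(as.character(d$Sample)))
  # make the identifiers unique across all data sets
  new_ids <- make.unique(unlist(ids))
  idx <- rep(seq_along(ids), lengths(ids))
  for (i in seq_along(sets)) {
    map <- setNames(new_ids[idx == i], ids[[i]])
    sets[[i]]$Sample <- unname(map[as.character(sets[[i]]$Sample)])
  }
  do.call(rbind, sets)
}

a <- data.frame(Sample = c("S1", "S1", "S2"), Band = c(1, 2, 1),
                MW = c(500, 300, 450), Gel = "g1", stringsAsFactors = FALSE)
res <- combine_sketch(a, a)
unique(res$Sample)
```

The second copy of each sample receives a numeric suffix, so all bands of one sample keep a common identifier.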
A data.frame with variables
Sample
character: sample identifier.
Band
integer: band number.
MW
integer: molecular weight.
Gel
character: gel identifier.
Fabienne Flessa [email protected],
Alexandra Kehl [email protected],
Matthias Kohl [email protected]
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
    data(RFLPdata)
    res <- RFLPcombine(RFLPdata, RFLPdata, RFLPdata)
    RFLPplot(res, nrBands = 4)
This is an example data set for RFLP data.
data(RFLPdata)
A data frame with 737 observations on the following four variables
Sample
character: sample identifier.
Band
integer: band number.
MW
integer: molecular weight.
Gel
character: gel identifier.
The molecular weight was determined using the software package Gene Profiler 4.05 (Scanalytics Inc.) for DNA fragment analysis and genotyping, and exported to a text file.
The data set was generated by F. Flessa.
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
    data(RFLPdata)
    str(RFLPdata)
Within each group of RFLP samples exhibiting an equal number of bands, the distance between the molecular weights is computed.
RFLPdist(x, distfun = dist, nrBands, LOD = 0)
x | data.frame with RFLP data; see |
distfun | function computing the distance with default |
nrBands | if not missing, then only samples with the specified number of bands are considered. |
LOD | threshold for low-bp bands. |
For each number of bands the given distance between the molecular weights is computed. The result is a named list of distances where the names correspond to the number of bands which occur in each group.
If nrBands is specified only samples with this number of bands are considered.
If LOD > 0 is specified, all values below LOD are removed before the distances are calculated.
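The procedure described above can be sketched in base R as follows. The helper rflp_dist_sketch and the toy data are hypothetical; the sketch assumes that each sample's molecular weights, ordered by band number, form one row of the matrix passed to the distance function.

```r
# Hypothetical sketch: one distance object per group of equal band count.
rflp_dist_sketch <- function(x, distfun = dist, LOD = 0) {
  if (LOD > 0) x <- x[x$MW >= LOD, ]       # remove low-bp bands first
  x <- x[order(x$Sample, x$Band), ]
  n_bands <- table(x$Sample)
  groups <- split(names(n_bands), as.integer(n_bands))
  lapply(groups, function(samples) {
    # one row of molecular weights per sample
    mw <- do.call(rbind, lapply(samples, function(s) x$MW[x$Sample == s]))
    rownames(mw) <- samples
    distfun(mw)
  })
}

toy <- data.frame(
  Sample = c("A", "A", "B", "B", "C", "C", "C"),
  Band   = c(1, 2, 1, 2, 1, 2, 3),
  MW     = c(500, 300, 510, 290, 700, 400, 200),
  stringsAsFactors = FALSE
)
res <- rflp_dist_sketch(toy)
names(res)     # band counts that occur in the data
res[["2"]]     # distance between the two-band samples A and B
```

Passing another function as distfun, for example function(x) dist(x, method = "manhattan"), changes the distance used within every group.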
A named list with the distances; see dist.
In case nrBands is not missing, an object of S3 class dist.
Fabienne Flessa [email protected],
Alexandra Kehl [email protected],
Matthias Kohl [email protected]
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
Poussier, Stephane; Trigalet-Demery, Danielle; Vandewalle, Peggy; Goffinet, Bruno; Luisetti, Jacques; Trigalet, Andre. Genetic diversity of Ralstonia solanacearum as assessed by PCR-RFLP of the hrp gene region, AFLP and 16S rRNA sequence analysis, and identification of an African subdivision. Microbiology 2000 146:1679-1692
Matsumoto, Masaru; Furuya, Naruto; Takanami, Yoichi; Matsuyama, Nobuaki. RFLP analysis of the PCR-amplified 28S rDNA in Rhizoctonia solani. Mycoscience 1996 37:351 - 356
    ## Euclidean distance
    data(RFLPdata)
    res <- RFLPdist(RFLPdata)
    names(res) ## number of bands
    res$"6"

    RFLPdist(RFLPdata, nrBands = 6)

    ## Other distances
    res1 <- RFLPdist(RFLPdata, distfun = function(x) dist(x, method = "manhattan"))
    res2 <- RFLPdist(RFLPdata, distfun = function(x) dist(x, method = "maximum"))
    res[[1]]
    res1[[1]]
    res2[[1]]

    ## cut dendrogram at height 50
    clust4bd <- hclust(res[[2]])
    cgroups50 <- cutree(clust4bd, h=50)
    cgroups50

    ## or
    library(MKomics)
    res3 <- RFLPdist(RFLPdata, distfun = corDist)
    res3$"9"

    ## hierarchical clustering
    par(mfrow = c(2,2))
    plot(hclust(res[[1]]), main = "Euclidean distance")
    plot(hclust(res1[[1]]), main = "Manhattan distance")
    plot(hclust(res2[[1]]), main = "Maximum distance")
    plot(hclust(res3[[1]]), main = "Pearson correlation distance")

    ## Similarity matrix
    library(MKomics)
    myCol <- colorRampPalette(brewer.pal(8, "RdYlGn"))(128)
    ord <- order.dendrogram(as.dendrogram(hclust(res[[1]])))
    temp <- as.matrix(res[[1]])
    simPlot(temp[ord,ord], col = rev(myCol), minVal = 0,
            labels = colnames(temp), title = "(Dis-)Similarity Plot")

    ## or
    library(lattice)
    levelplot(temp[ord,ord], col.regions = rev(myCol),
              at = do.breaks(c(0, max(temp)), 128),
              xlab = "", ylab = "",
              ## Rotate label of x axis
              scales = list(x = list(rot = 90)),
              main = "(Dis-)Similarity Plot")

    ## multidimensional scaling
    loc <- cmdscale(res[[5]])
    x <- loc[,1]
    y <- -loc[,2]
    plot(x, y, type="n", xlab="", ylab="", xlim = 1.05*range(x),
         main="Multidimensional scaling")
    text(x, y, rownames(loc), cex=0.8)
If gel image quality is low, faint bands may be disregarded and
may lead to wrong conclusions. This function computes the distance
between the molecular weights of RFLP samples, including samples
containing one or more additional bands. Thus, failures during
band detection could be identified. Visualisation of band patterns
using this method can be done by RFLPplot using the argument nrMissing.
RFLPdist2(x, distfun = dist, nrBands, nrMissing, LOD = 0, diag = FALSE, upper = FALSE)
x | data.frame with RFLP data; see |
distfun | function computing the distance with default |
nrBands | samples with number of bands equal to |
nrMissing | number of bands that might be missing. |
LOD | threshold for low-bp bands. |
diag | see |
upper | see |
For a given number of bands the given distance between the molecular weights is computed. It is assumed that a number of bands might be missing. Hence all samples with number of bands in nrBands, nrBands+1, ..., nrBands+nrMissing are compared.
If LOD > 0 is specified, it is assumed that missing bands can only occur for molecular weights smaller than LOD. As a consequence, only samples which have nrBands bands with molecular weight larger than or equal to LOD are selected.
To compute the distance between a sample S1 with x bands and a sample S2 with x+y bands, the distances between the molecular weights of S1 and the molecular weights of all possible subsets of S2 with x bands are computed. The distance between S1 and S2 is then defined as the minimum of all these distances.
If LOD > 0 is specified, only combinations of values below LOD are considered.
This option may be useful, if gel image quality is low, and the detection of bands is doubtful.
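The minimum over all subsets can be sketched in base R with combn. The helper min_subset_dist is hypothetical, uses the Euclidean distance only, and does not implement the LOD restriction; the band values are made up.

```r
# Hypothetical helper: minimum Euclidean distance between s1 and every
# subset of s2 that has as many bands as s1.
min_subset_dist <- function(s1, s2) {
  stopifnot(length(s2) >= length(s1))
  # each column holds the indices of one subset of s2
  subsets <- combn(length(s2), length(s1))
  d <- apply(subsets, 2, function(idx) sqrt(sum((s1 - s2[idx])^2)))
  min(d)
}

s1 <- c(550, 300, 250)        # sample with three detected bands
s2 <- c(550, 500, 300, 250)   # sample with one additional band
min_subset_dist(s1, s2)
```

In this example the subset of s2 without the band at 500 reproduces s1 exactly, so the two samples are treated as identical up to one missing band.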
An object of class "dist" is returned; cf. dist.
Fabienne Flessa [email protected],
Alexandra Kehl [email protected],
Matthias Kohl [email protected]
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
Ian A. Dickie, Peter G. Avis, David J. McLaughlin, Peter B. Reich. Good-Enough RFLP Matcher (GERM) program. Mycorrhiza 2003, 13:171-172.
RFLPdata, nrBands, RFLPdist, dist
    ## Euclidean distance
    data(RFLPdata)
    nrBands(RFLPdata)
    res0 <- RFLPdist(RFLPdata, nrBands = 4)
    res1 <- RFLPdist2(RFLPdata, nrBands = 4, nrMissing = 1)
    res2 <- RFLPdist2(RFLPdata, nrBands = 4, nrMissing = 2)
    res3 <- RFLPdist2(RFLPdata, nrBands = 4, nrMissing = 3)

    ## assume missing bands only below LOD
    res1.lod <- RFLPdist2(RFLPdata, nrBands = 4, nrMissing = 1, LOD = 60)

    ## hierarchical clustering
    par(mfrow = c(2,2))
    plot(hclust(res0), main = "0 bands missing")
    plot(hclust(res1), main = "1 band missing")
    plot(hclust(res2), main = "2 bands missing")
    plot(hclust(res3), main = "3 bands missing")

    ## missing bands only below LOD
    par(mfrow = c(1,2))
    plot(hclust(res0), main = "0 bands missing")
    plot(hclust(res1.lod), main = "1 band missing below LOD")

    ## Similarity matrix
    library(MKomics)
    myCol <- colorRampPalette(brewer.pal(8, "RdYlGn"))(128)
    ord <- order.dendrogram(as.dendrogram(hclust(res1)))
    temp <- as.matrix(res1)
    simPlot(temp[ord,ord], col = rev(myCol), minVal = 0,
            labels = colnames(temp), title = "(Dis-)Similarity Plot")

    ## missing bands only below LOD
    ord <- order.dendrogram(as.dendrogram(hclust(res1.lod)))
    temp <- as.matrix(res1.lod)
    simPlot(temp[ord,ord], col = rev(myCol), minVal = 0,
            labels = colnames(temp),
            title = "(Dis-)Similarity Plot\n1 band missing below LOD")

    ## or
    library(lattice)
    levelplot(temp[ord,ord], col.regions = rev(myCol),
              at = do.breaks(c(0, max(temp)), 128),
              xlab = "", ylab = "",
              ## Rotate label of x axis
              scales = list(x = list(rot = 90)),
              main = "(Dis-)Similarity Plot")

    ## Other distances
    res11 <- RFLPdist2(RFLPdata, distfun = function(x) dist(x, method = "manhattan"),
                       nrBands = 4, nrMissing = 1)
    res12 <- RFLPdist2(RFLPdata, distfun = corDist, nrBands = 4, nrMissing = 1)
    res13 <- RFLPdist2(RFLPdata, distfun = corDist, nrBands = 4, nrMissing = 1, LOD = 60)
    par(mfrow = c(2,2))
    plot(hclust(res1), main = "Euclidean distance\n1 band missing")
    plot(hclust(res11), main = "Manhattan distance\n1 band missing")
    plot(hclust(res12), main = "Pearson correlation distance\n1 band missing")
    plot(hclust(res13), main = "Pearson correlation distance\n1 band missing below LOD")
Function to compute distance between RFLP data and RFLP reference data.
RFLPdist2ref(x, ref, distfun = dist, nrBands, LOD = 0)
x | data.frame with RFLP data; e.g. RFLPdata. |
---|---|
ref | data.frame with RFLP reference data; e.g. RFLPref. |
distfun | function computing the distance with default dist. |
nrBands | only samples and reference samples with this number of bands are considered. |
LOD | threshold for low-bp bands. |
For each sample with nrBands bands the distance to each reference sample with nrBands bands is computed. The result is a matrix with the corresponding distances where rows represent the samples and columns the reference samples.
If LOD > 0 is specified, all values below LOD are removed before the distances are calculated. This applies to x and ref.
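The computation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names band_matrix and cross_dist are hypothetical, the toy values are invented, and the sketch assumes that bands are counted after the LOD filter has been applied.

```r
# Toy data in the Sample/Band/MW layout (values invented for illustration)
x <- data.frame(Sample = rep(c("s1", "s2"), each = 3),
                Band = rep(1:3, times = 2),
                MW = c(100, 300, 500, 110, 290, 520))
ref <- data.frame(Sample = rep(c("r1", "r2"), each = 3),
                  Band = rep(1:3, times = 2),
                  MW = c(100, 300, 500, 200, 400, 600))

# Hypothetical helper: one row per sample that has exactly nrBands bands
# left after dropping all bands below LOD (assumed order of operations)
band_matrix <- function(d, nrBands, LOD = 0) {
  d <- d[d$MW >= LOD, ]
  per_sample <- split(d$MW, d$Sample)
  keep <- per_sample[lengths(per_sample) == nrBands]
  do.call(rbind, lapply(keep, sort))
}

# Hypothetical helper: rows are samples, columns are reference samples
cross_dist <- function(x, ref, nrBands, distfun = dist, LOD = 0) {
  mx <- band_matrix(x, nrBands, LOD)
  mr <- band_matrix(ref, nrBands, LOD)
  d <- as.matrix(distfun(rbind(mx, mr)))
  d[seq_len(nrow(mx)), nrow(mx) + seq_len(nrow(mr)), drop = FALSE]
}

res <- cross_dist(x, ref, nrBands = 3)
print(res)
```

The sketch does not handle the case where no sample has the requested number of bands; a real implementation would need to check for that.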
A matrix with distances.
Fabienne Flessa, Alexandra Kehl, Matthias Kohl
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
## Euclidean distance
data(RFLPdata)
data(RFLPref)
nrBands(RFLPref)
RFLPdist2ref(RFLPdata, RFLPref, nrBands = 4)
RFLPdist2ref(RFLPdata, RFLPref, nrBands = 6)
Dir <- system.file("extdata", package = "RFLPtools") # input directory
filename <- file.path(Dir, "AZ091016_report.txt")
RFLP1 <- read.rflp(file = filename)
RFLP2 <- RFLPqc(RFLP1)
nrBands(RFLP2)
RFLPdist2ref(RFLP1, RFLPref, nrBands = 4)
RFLPdist2ref(RFLP1, RFLPref, nrBands = 5)
Function to exclude bands below a given LOD.
RFLPlod(x, LOD)
x | data.frame with RFLP data. |
---|---|
LOD | threshold for low-bp bands. |
Low-bp bands may be regarded as unreliable. Function RFLPlod can be used to exclude such bands, which are likely to be absent in some other samples, before further analyses.
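The effect of such a filter can be illustrated with plain subsetting. This is only a sketch on invented toy data; it assumes that bands with MW equal to LOD are kept, and it does not reproduce any renumbering of bands the package function may perform.

```r
# Toy data in the Sample/Band/MW layout (values invented for illustration)
d <- data.frame(Sample = rep(c("s1", "s2"), each = 4),
                Band = rep(1:4, times = 2),
                MW = c(600, 300, 120, 55, 580, 310, 130, 70))
LOD <- 60

# Keep only bands at or above the limit of detection (assumed boundary rule)
d_lod <- d[d$MW >= LOD, ]

# Number of bands per sample before and after filtering
print(table(d$Sample))
print(table(d_lod$Sample))
```

In the toy data, sample s1 loses its 55 bp band, so the two samples no longer have the same number of bands. This is why such a filter is applied before samples are grouped by band count.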
A data.frame with variables
Sample | character: sample identifier. |
---|---|
Band | integer: band number. |
MW | integer: molecular weight. |
Gel | character: gel identifier. |
Fabienne Flessa, Alexandra Kehl, Matthias Kohl
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
data(RFLPdata)
## remove bands with MW smaller than 60
RFLPdata.lod <- RFLPlod(RFLPdata, LOD = 60)
par(mfrow = c(1, 2))
RFLPplot(RFLPdata, nrBands = 4, ylim = c(40, 670))
RFLPplot(RFLPdata.lod, nrBands = 4, ylim = c(40, 670))
title(sub = "After applying RFLPlod")
Given RFLP data is plotted where the samples are sorted according to the corresponding dendrogram.
RFLPplot(x, nrBands, nrMissing, distfun = dist, hclust.method = "complete", mar.bottom = 5, cex.axis = 0.5, colBands, xlab = "", ylab = "molecular weight", ylim, ...)
x | data.frame with RFLP data; see RFLPdata. |
---|---|
nrBands | if not missing, then only samples with the specified number of bands are considered. |
nrMissing | if not missing, then it is assumed that some bands may be missing. That is, all samples with number of bands in nrBands, nrBands+1, ..., nrBands+nrMissing are considered. |
distfun | function computing the distance with default dist. |
hclust.method | method used for hierarchical clustering; see hclust. |
mar.bottom | bottom margin of the plot; see par. |
cex.axis | size of the x-axis annotation. |
colBands | color for the bands. Has to be of length 1 or number of samples. If missing, default colors are used. |
xlab | passed to the underlying plot function. |
ylab | passed to the underlying plot function. |
ylim | passed to the underlying plot function. |
... | additional arguments passed to the underlying plot function. |
RFLP data is plotted. The samples are sorted according to the corresponding dendrogram, which is computed via function hclust.
The option to specify nrMissing may be useful if gel image quality is low and the detection of bands is doubtful.
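How a dendrogram induces a sample ordering can be shown with stats::hclust alone. The following is a small sketch on an invented band matrix (one row per sample), not the plotting code of the package.

```r
# Invented band matrix: one row per sample, one column per band
m <- rbind(s1 = c(100, 300),
           s2 = c(400, 700),
           s3 = c(105, 310),
           s4 = c(410, 690))

# Cluster the samples and read off the left-to-right leaf order
hc <- hclust(dist(m), method = "complete")
m_sorted <- m[hc$order, ]

# Samples with similar band patterns end up next to each other
print(rownames(m_sorted))
```

Plotting the rows of m_sorted side by side places similar patterns in adjacent lanes, which is the purpose of sorting by the dendrogram.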
invisible
Fabienne Flessa, Alexandra Kehl, Matthias Kohl
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
data(RFLPdata)
par(mfrow = c(1,2))
plot(hclust(RFLPdist(RFLPdata, nrBands = 3)), cex = 0.7)
RFLPplot(RFLPdata, nrBands = 3, mar.bottom = 6, cex.axis = 0.8)
par(mfrow = c(1,2))
plot(hclust(RFLPdist2(RFLPdata, nrBands = 9, nrMissing = 1)), cex = 0.7)
RFLPplot(RFLPdata, nrBands = 9, nrMissing = 1, mar.bottom = 6, cex.axis = 0.8)
distfun <- function(x) dist(x, method = "maximum")
par(mfrow = c(1,2))
plot(hclust(RFLPdist(RFLPdata, nrBands = 3, distfun = distfun), method = "average"), cex = 0.7, cex.lab = 0.7)
RFLPplot(RFLPdata, nrBands = 3, distfun = distfun, hclust.method = "average", mar.bottom = 6, cex.axis = 0.8)
Function to perform quality control for RFLP data based on a comparison between the total length of the digested PCR amplification product and the sum of the fragment lengths. If the sum is smaller or larger than the PCR amplification product (within a certain range to define), the samples can be excluded from further analyses. This function is helpful for data sets containing faint or uncertain bands. It is necessary to include the total length of the PCR amplification product for each sample as largest fragment in the data set, see RFLPdata.
RFLPqc(x, rm.band1 = TRUE, QC.lo = 0.8, QC.up = 1.07, QC.rm = FALSE)
x | data.frame with RFLP data. |
---|---|
rm.band1 | logical: remove first band. |
QC.lo | numeric: a real number in (0,1). |
QC.up | numeric: a real number larger than 1. |
QC.rm | logical: remove samples with insufficient quality. |
In case the first band corresponds to the total length of the fragment one can perform a quality control comparing the length of the first band with the sum of the lengths of the remaining bands for each sample. If the sum is smaller than QC.lo times the length of the first band or larger than QC.up times the length of the first band, respectively, a text message is printed.
If rm.band1 = TRUE band 1 of all samples is removed and the remaining band numbers are reduced by 1.
If QC.rm = TRUE samples of insufficient quality are entirely removed from the given data and the resulting data.frame is returned.
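The check described above amounts to a simple comparison per sample. The following base R sketch uses invented toy data and a hypothetical helper qc_flags; it mirrors the rule as stated in the text and is not the package's code.

```r
# Toy data: band 1 is the undigested PCR product (values invented)
d <- data.frame(Sample = rep(c("ok", "bad"), each = 3),
                Band = rep(1:3, times = 2),
                MW = c(500, 300, 190, 500, 200, 100))

# Hypothetical helper: TRUE if the fragment sum falls outside
# [QC.lo * total, QC.up * total] for a sample
qc_flags <- function(d, QC.lo = 0.8, QC.up = 1.07) {
  sapply(split(d, d$Sample), function(s) {
    total <- s$MW[s$Band == 1]
    frag_sum <- sum(s$MW[s$Band != 1])
    frag_sum < QC.lo * total || frag_sum > QC.up * total
  })
}

flags <- qc_flags(d)
print(flags)

# Analogue of QC.rm = TRUE and rm.band1 = TRUE: drop flagged samples,
# drop band 1 and shift the remaining band numbers down by one
kept <- d[!d$Sample %in% names(flags)[flags], ]
kept <- kept[kept$Band != 1, ]
kept$Band <- kept$Band - 1L
print(kept)
```

For sample "ok" the fragments sum to 490, which lies between 0.8 * 500 = 400 and 1.07 * 500 = 535, so it passes. For sample "bad" the sum is 300, which is below 400, so it is flagged.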
A data.frame with variables
Sample | character: sample identifier. |
---|---|
Band | integer: band number. |
MW | integer: molecular weight. |
Gel | character: gel identifier. |
Fabienne Flessa, Alexandra Kehl, Matthias Kohl
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
Dir <- system.file("extdata", package = "RFLPtools") # input directory
filename <- file.path(Dir, "AZ091016_report.txt")
RFLP1 <- read.rflp(file = filename)
str(RFLP1)
RFLP2 <- RFLPqc(RFLP1, rm.band1 = FALSE) # identical to RFLP1
identical(RFLP1, RFLP2)
RFLP3 <- RFLPqc(RFLP1)
str(RFLP3)
RFLP4 <- RFLPqc(RFLP1, rm.band1 = TRUE, QC.rm = TRUE)
str(RFLP4)
This is an example data set for RFLP reference.
data(RFLPref)
A data frame with 35 observations on the following five variables
Sample | character: sample identifier. |
---|---|
Band | integer: band number. |
MW | integer: molecular weight. |
Taxonname | character: taxon name. |
Accession | character: accession number. |
This example data set for RFLP reference consists of seven RFLP reference samples. Taxon names are assigned by sequence comparison with GenBank database (https://www.ncbi.nlm.nih.gov/BLAST/), and supplemented with imaginary accession numbers.
The data set was generated by F. Flessa.
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
data(RFLPref)
str(RFLPref)
Given RFLP samples are plotted together with reference samples and sorted by their distance to the reference sample.
RFLPrefplot(x, ref, distfun = dist, nrBands, mar.bottom = 5, cex.main = 1.2, cex.axis = 0.5, devNew = FALSE, colBands, xlab = "", ylab = "molecular weight", ylim, ...)
x | data.frame with RFLP data; e.g. RFLPdata. |
---|---|
ref | data.frame with RFLP reference data; e.g. RFLPref. |
distfun | function computing the distance with default dist. |
nrBands | if not missing, then only samples with the specified number of bands are considered. |
mar.bottom | bottom margin of the plot; see par. |
cex.main | size of the plot title. |
cex.axis | size of the x-axis annotation. |
devNew | logical. Open new graphics device for each plot. |
colBands | color for the bands. Has to be of length 1 or number of samples. If missing, default colors are used. |
xlab | passed to the underlying plot function. |
ylab | passed to the underlying plot function. |
ylim | passed to the underlying plot function. |
... | additional arguments passed to the underlying plot function. |
Given RFLP samples are plotted together with reference samples and sorted by their distance to the reference sample.
invisible
Fabienne Flessa, Alexandra Kehl, Matthias Kohl
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
data(RFLPdata)
data(RFLPref)
dev.new(width = 12)
RFLPrefplot(RFLPdata, RFLPref, nrBands = 4, cex.axis = 0.5)
dev.new()
RFLPrefplot(RFLPdata, RFLPref, nrBands = 6, cex.axis = 0.8)
RFLPrefplot(RFLPdata, RFLPref, nrBands = 9, cex.axis = 0.8)
RFLPrefplot(RFLPdata, RFLPref[RFLPref$Sample == "Ni_29_A3",], nrBands = 4, cex.axis = 0.7)
Dir <- system.file("extdata", package = "RFLPtools") # input directory
filename <- file.path(Dir, "AZ091016_report.txt")
RFLP1 <- read.rflp(file = filename)
RFLP2 <- RFLPqc(RFLP1)
dev.new(width = 12)
RFLPrefplot(RFLP1, RFLPref, nrBands = 4, cex.axis = 0.8)
dev.new()
RFLPrefplot(RFLP1, RFLPref, nrBands = 5, cex.axis = 0.8)
Function to convert similarity matrix to object of S3 class "dist".
sim2dist(x, maxSim = 1)
x | symmetric matrix: similarity matrix. |
---|---|
maxSim | maximum similarity possible. |
Similarity is converted to distance by maxSim - x. The resulting matrix is converted to an object of S3 class "dist" by as.dist.
Object of S3 class "dist" is returned; see dist.
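The conversion can be reproduced with stats::as.dist on a small invented similarity matrix. This sketch only illustrates the formula maxSim - x given above.

```r
# Invented symmetric similarity matrix with maximum similarity 1
ids <- c("a", "b", "c")
sim <- matrix(c(1.0, 0.9, 0.2,
                0.9, 1.0, 0.4,
                0.2, 0.4, 1.0),
              nrow = 3, byrow = TRUE, dimnames = list(ids, ids))
maxSim <- 1

# Similarity -> dissimilarity, then to an object of class "dist"
d <- as.dist(maxSim - sim)
print(d)

# The result can be passed directly to hierarchical clustering
hc <- hclust(d)
```

The most similar pair (a, b) receives the smallest distance, 0.1, and is therefore merged first by hclust.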
Fabienne Flessa, Alexandra Kehl, Matthias Kohl
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
data(BLASTdata)
## without sequence range
## Not run: 
res <- simMatrix(BLASTdata)
## End(Not run)
## with sequence range
range(BLASTdata$alignment.length)
res1 <- simMatrix(BLASTdata, sequence.range = TRUE, Min = 100, Max = 450)
res2 <- simMatrix(BLASTdata, sequence.range = TRUE, Min = 500)
## visualize similarity matrix
library(MKomics)
simPlot(res2, minVal = 0, labels = colnames(res2), title = "(Dis-)Similarity Plot")
## or
library(lattice)
library(RColorBrewer) # provides brewer.pal
myCol <- colorRampPalette(brewer.pal(8, "RdYlGn"))(128)
## the scales argument rotates the labels of the x axis
levelplot(res2, col.regions = myCol, at = do.breaks(c(0, max(res2)), 128), xlab = "", ylab = "", scales = list(x = list(rot = 90)), main = "(Dis-)Similarity Plot")
## convert to distance
res.d <- sim2dist(res2)
## hierarchical clustering
plot(hclust(res.d))
Function to compute similarity matrix for all-vs-all BLAST results of rDNA sequences generated with standalone BLAST from NCBI or local BLAST implemented in BioEdit.
simMatrix(x, sequence.range = FALSE, Min, Max)
x | data.frame with BLAST data; see BLASTdata. |
---|---|
sequence.range | logical: use sequence range. |
Min | minimum sequence length. |
Max | maximum sequence length. |
The given BLAST data is used to compute a similarity matrix using the following algorithm: First, the length of each sequence (LS) comprised in the input data file is extracted. If there is more than one comparison for one sequence including different parts of the respective sequence, the one with maximum base length is chosen. Subsequently, the number of matching bases (mB) is calculated by multiplying two variables comprised in the BLAST output, the identity between sequences (%) and the number of nucleotides, and dividing the product by 100. The resulting value is rounded to an integer. The similarity is then calculated by dividing mB by LS. Finally, the similarity matrix including all sequences is built. If the similarity of a combination is not shown in the BLAST report file (because the similarity was lower than 70%), this comparison is included in the similarity matrix with the result zero.
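The arithmetic of the algorithm can be followed for a single comparison. The numbers and variable names below are invented for illustration; the sketch is not the package's implementation and does not address which of the two sequence lengths is used as LS.

```r
# One invented BLAST hit between seqA and seqB
identity_pct <- 97.5   # identity between the sequences in percent
aln_len <- 480         # number of aligned nucleotides
LS <- 500              # length of the sequence

# Matching bases, rounded to an integer, and the resulting similarity
mB <- round(identity_pct * aln_len / 100)
similarity <- mB / LS
print(c(mB = mB, similarity = similarity))

# Pairs that are not reported by BLAST keep the value zero
ids <- c("seqA", "seqB", "seqC")
sim <- matrix(0, nrow = 3, ncol = 3, dimnames = list(ids, ids))
sim["seqA", "seqB"] <- similarity
print(sim)
```

Here mB = 97.5 * 480 / 100 = 468 and the similarity is 468 / 500 = 0.936, while the unreported pair (seqA, seqC) stays at 0.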
Similarity matrix.
Fabienne Flessa, Alexandra Kehl, Matthias Kohl
Standalone Blast download: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
Blast News: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastNews
BioEdit: https://bioedit.software.informer.com/
Persoh, D., Melcher, M., Flessa, F., Rambold, G.: First fungal community analyses of endophytic ascomycetes associated with Viscum album ssp. austriacum and its host Pinus sylvestris. Fungal Biology 2010 Jul;114(7):585-96.
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
data(BLASTdata)
## without sequence range
## code takes some time
## Not run: 
res <- simMatrix(BLASTdata)
## End(Not run)
## with sequence range
range(BLASTdata$alignment.length)
res1 <- simMatrix(BLASTdata, sequence.range = TRUE, Min = 100, Max = 450)
res2 <- simMatrix(BLASTdata, sequence.range = TRUE, Min = 500)
Simulates RFLP data for comparisons of algorithms.
simulateRFLPdata(N = 10, nrBands = 3:12, bandCenters = seq(100, 800, by = 100), delta = 50, refData = FALSE)
N | integer: number of samples which shall be simulated per number of bands. |
---|---|
nrBands | integer: vector of number of bands. |
bandCenters | numeric: vector of band centers. |
delta | numeric: half-width of the intervals around the band centers from which uniform random numbers are drawn. |
refData | logical: if TRUE, additional columns Taxonname and Accession are added. |
The function can be used to simulate RFLP data. For every number of bands specified in nrBands a total number of N samples are generated.
First the band centers are randomly selected (with replacement) from bandCenters; they form the centers of intervals of length 2*delta. From these intervals uniform random numbers are drawn, leading to randomly generated RFLP data.
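The two sampling steps can be sketched for a single sample in base R. The helper simulate_one is hypothetical, and rounding and sorting the molecular weights are assumptions made for this illustration; the package function may differ in such details.

```r
set.seed(1)  # reproducible toy example

bandCenters <- seq(100, 800, by = 100)
delta <- 50

# Hypothetical helper: simulate the molecular weights of one sample
simulate_one <- function(n, centers = bandCenters, delta = 50) {
  # Step 1: draw band centers with replacement
  picked <- sample(centers, size = n, replace = TRUE)
  # Step 2: draw uniformly from [center - delta, center + delta]
  mw <- runif(n, min = picked - delta, max = picked + delta)
  sort(round(mw))
}

mw <- simulate_one(5)
print(mw)
```

With the default band centers and delta = 50, every simulated value lies between 50 and 850.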
A data frame with N*length(nrBands) observations on the following four variables is generated.
Sample | character: sample identifier. |
---|---|
Band | integer: band number. |
MW | integer: molecular weight. |
Enzyme | character: enzyme name. |
If refData = TRUE then the following two additional variables are added.
Taxonname | character: taxon name. |
---|---|
Accession | character: accession number. |
Mohammed Aslam Imtiaz, Matthias Kohl
simData <- simulateRFLPdata()
The tree obtained by a hierarchical cluster analysis is cut into groups by using cutree and the results are exported to a text file.
write.hclust(x, file, prefix, h = NULL, k = NULL, append = FALSE, dec = ",")
x | object of class hclust. |
---|---|
file | either a character string naming a file or a connection open for writing. |
prefix | character. Information about the cluster analysis. |
h | numeric scalar or vector with heights where the tree should be cut. |
k | an integer scalar or vector with the desired number of groups. |
append | logical. Only relevant if file is a character string. |
dec | the string to use for decimal points in numeric or complex columns: must be a single character. |
The results are written to file by a call to write.table where the columns in the resulting file are separated by tabs (i.e. sep="\t") and no row names are exported (i.e. row.names = FALSE).
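The combination of cutree and write.table can be tried out with base R only. In the sketch below the band matrix is invented and the column names of the exported table (Sample, Cluster, Prefix) are hypothetical; the layout of the file written by write.hclust may differ.

```r
# Invented band matrix: one row per sample
m <- rbind(s1 = c(100, 300),
           s2 = c(105, 310),
           s3 = c(400, 700),
           s4 = c(410, 690))
cl <- hclust(dist(m))

# Cut the tree at height 50 to obtain group memberships
grp <- cutree(cl, h = 50)

# Hypothetical export layout
out <- data.frame(Sample = names(grp), Cluster = grp, Prefix = "demo")

# Tab-separated, no row names, comma as decimal mark
tmp <- tempfile(fileext = ".txt")
write.table(out, file = tmp, sep = "\t", row.names = FALSE, dec = ",")

# Read the file back to check the result
back <- read.delim(tmp, dec = ",")
print(back)
```

Writing to a temporary file keeps the example free of side effects in the working directory.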
Fabienne Flessa, Alexandra Kehl, Matthias Kohl
Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.
data(RFLPdata)
res <- RFLPdist(RFLPdata, nrBands = 4)
cl <- hclust(res)
## Not run: 
write.hclust(cl, file = "Test.txt", prefix = "Bd4", h = 50)
## End(Not run)
res <- RFLPdist2(RFLPdata, nrBands = 4, nrMissing = 1)
cl <- hclust(res)
## Not run: 
write.hclust(cl, file = "Test.txt", append = TRUE, prefix = "Bd4_Mis1", h = 60)
## End(Not run)