Package 'RFLPtools' reference manual

Title:	Tools to Analyse RFLP Data
Description:	Provides functions to analyse DNA fragment samples (i.e. derived from RFLP-analysis) and standalone BLAST report files (i.e. DNA sequence analysis).
Authors:	Fabienne Flessa [aut], Alexandra Kehl [aut], Mohammed Aslam Imtiaz [aut], Matthias Kohl [aut, cre]
Maintainer:	Matthias Kohl
License:	LGPL-3
Version:	2.0
Built:	2025-02-15 04:55:10 UTC
Source:	https://github.com/cran/RFLPtools

Tools To Analyse RFLP-Data

Description

RFLPtools provides functions to analyse DNA fragment samples (i.e. derived from RFLP-analysis) and standalone BLAST report files (i.e. DNA sequence analysis).

Details

Package:	RFLPtools
Version:	2.0
Date:	2022-02-07
Depends:	R(>= 4.0.0)
Imports:	stats, utils, graphics, grDevices, RColorBrewer
Suggests:	knitr, rmarkdown, lattice, MKomics
License:	LGPL-3

Author(s)

Fabienne Flessa,
Alexandra Kehl,
Mohammed Aslam Imtiaz,
Matthias Kohl

Maintainer: Matthias Kohl

References

Local Blast download: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download

Blast News: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastNews

Ian A. Dickie, Peter G. Avis, David J. McLaughlin, Peter B. Reich. Good-Enough RFLP Matcher (GERM) program. Mycorrhiza 2003, 13:171-172.

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Matsumoto, Masaru; Furuya, Naruto; Takanami, Yoichi; Matsuyama, Nobuaki. RFLP analysis of the PCR-amplified 28S rDNA in Rhizoctonia solani. Mycoscience 1996 37:351-356.

Persoh, D., Melcher, M., Flessa, F., Rambold, G.: First fungal community analyses of endophytic ascomycetes associated with Viscum album ssp. austriacum and its host Pinus sylvestris. Fungal Biology 2010 Jul;114(7):585-96.

Poussier, Stephane; Trigalet-Demery, Danielle; Vandewalle, Peggy; Goffinet, Bruno; Luisetti, Jacques; Trigalet, Andre. Genetic diversity of Ralstonia solanacearum as assessed by PCR-RFLP of the hrp gene region, AFLP and 16S rRNA sequence analysis, and identification of an African subdivision. Microbiology 2000 146:1679-1692.

T. A. Saari, S. K. Saari, C. D. Campbell, I. J Alexander, I. C. Anderson. FragMatch - a program for the analysis of DNA fragment data. Mycorrhiza 2007, 17:133-136

Examples

data(RFLPdata)
res <- RFLPdist(RFLPdata)
plot(hclust(res[[1]]), main = "Euclidean distance")

par(mfrow = c(1,2))
plot(hclust(RFLPdist(RFLPdata, nrBands = 3)), cex = 0.7)
RFLPplot(RFLPdata, nrBands = 3, mar.bottom = 6, cex.axis = 0.8)

data(RFLPref)
RFLPrefplot(RFLPdata, RFLPref, nrBands = 6, cex.axis = 0.8)


library(MKomics)
library(RColorBrewer) # provides brewer.pal
data(BLASTdata)
res <- simMatrix(BLASTdata, sequence.range = TRUE, Min = 500)
myCol <- colorRampPalette(brewer.pal(8, "RdYlGn"))(128)
simPlot(res, col = myCol, minVal = 0, 
        labels = colnames(res), title = "(Dis-)Similarity Plot")

Example data set for BLAST data

Description

This is an example data set for BLAST data generated with standalone BLAST from NCBI.

Usage

data(BLASTdata)

Format

A data frame with 737 observations on the following twelve variables

query.id: character: sequence identifier.
subject.id: character: subject identifier.
identity: numeric: identity between sequences (in percent).
alignment.length: integer: number of nucleotides.
mismatches: integer: number of mismatches.
gap.opens: integer: number of gap openings.
q.start: integer: query sequence start.
q.end: integer: query sequence end.
s.start: integer: subject sequence start.
s.end: integer: subject sequence end.
evalue: numeric: expect value (E-value).
bit.score: numeric: bit score.

Details

The data was generated with standalone BLAST from NCBI. Pairwise similarities among all DNA sequences to be analysed were calculated with standalone BLAST using the parameters -m 8 -r 2 -G 5 -E 2.

Alternatively, the data can be generated with "local BLAST" implemented in BioEdit v7.0.9 using the additional parameters -m 8 -r 2 -G 5 -E 2 and by selecting "open output" and "tabular output".

Source

The data set was generated by F. Flessa.

References

Standalone Blast download: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

Blast News: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastNews

BioEdit: https://bioedit.software.informer.com/

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

data(BLASTdata)
str(BLASTdata)

Distance Matrix Computation

Description

This function computes and returns the distance matrix obtained by applying the specified distance measure to the rows of a data matrix. Unlike dist, it uses the successive differences of the row values instead of the row values themselves.

Usage

diffDist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)

Arguments

`x`	a numeric matrix, data frame or `"dist"` object.
`method`	the distance measure to be used. This must be one of `"euclidean"`, `"maximum"`, `"manhattan"`, `"canberra"`, `"binary"` or `"minkowski"`. Any unambiguous substring can be given.
`diag`	logical value indicating whether the diagonal of the distance matrix should be printed by `print.dist`.
`upper`	logical value indicating whether the upper triangle of the distance matrix should be printed by `print.dist`.
`p`	The power of the Minkowski distance.

Details

It is a simple wrapper function around dist. For more details about the distances we refer to dist.

The function may be helpful if there is a shift in the measured bands, e.g. c(550, 500, 300, 250) vs. c(510, 460, 260, 210).
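Since diffDist is described as a wrapper around dist, its idea can be sketched in base R as follows. This is a minimal sketch assuming that the successive differences are taken between neighbouring columns of each row; the function name diffDistSketch is hypothetical and not part of RFLPtools.

```r
## Hypothetical helper illustrating the idea behind diffDist:
## distances are computed on the successive differences of the row values.
diffDistSketch <- function(x, method = "euclidean", diag = FALSE,
                           upper = FALSE, p = 2) {
  x <- as.matrix(x)
  ## column j of d holds x[, j + 1] - x[, j]; drop = FALSE keeps a matrix
  d <- x[, -1, drop = FALSE] - x[, -ncol(x), drop = FALSE]
  dist(d, method = method, diag = diag, upper = upper, p = p)
}

## Two samples whose bands are shifted by a constant 40 bp
M <- rbind(c(550, 500, 300, 250), c(510, 460, 260, 210))
dist(M)           # large, caused by the shift alone
diffDistSketch(M) # zero, because the band spacings are identical
```

A constant shift cancels out in the differences, which is why such a distance is insensitive to it.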

Value

diffDist returns an object of class "dist"; cf. dist.

Author(s)

Matthias Kohl

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

## assume a shift in the measured bands
M <- rbind(c(550, 500, 300, 250), c(510, 460, 260, 210),
           c(550, 500, 300, 200))
dist(M)
diffDist(M)

Compute matches for RFLP data via FragMatch.

Description

Compute matches for RFLP data using FragMatch - a program for the analysis of DNA fragment data.

Usage

FragMatch(newData, refData, maxValue = 1000, errorBound = 25,
          weight = 1, na.rm = TRUE)

Arguments

`newData`	data.frame with new RFLP data; see `newDataGerm`.
`refData`	data.frame with reference RFLP data; see `refDataGerm`.
`maxValue`	numeric: maximum value for which the error bound is applied. Can be a vector of length larger than 1.
`errorBound`	numeric: error bound corresponding to `maxValue`. Can be a vector of length larger than 1.
`weight`	numeric: weight for weighting partial matches; see details section.
`na.rm`	logical: indicating whether NA values should be stripped before the computation proceeds.

Details

The algorithm is rather simple: it counts the number of matches, where a value is considered a match if it lies within a range of +/- errorBound.

If there is more than one enzyme, weights can be used to give the partial perfect matches for a certain enzyme a higher (or lower) weight.
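The counting rule can be illustrated for a single enzyme and a single pair of samples. This is a hypothetical sketch, not the FragMatch implementation: it assumes one constant errorBound (ignoring maxValue and weight), treats the range as inclusive, and takes b to be the number of bands of the new sample. The function name countMatchesSketch is made up for illustration.

```r
## Hypothetical sketch of FragMatch-style counting for one enzyme:
## a band of the new sample counts as a match if some reference band
## lies within +/- errorBound of it. Returns "a_b" (a matches of b bands).
countMatchesSketch <- function(newBands, refBands, errorBound = 25) {
  newBands <- newBands[!is.na(newBands)]
  refBands <- refBands[!is.na(refBands)]
  hit <- vapply(newBands,
                function(b) any(abs(refBands - b) <= errorBound),
                logical(1))
  paste(sum(hit), length(newBands), sep = "_")
}

## 520 and 310 lie within 25 of a reference band, 150 does not
countMatchesSketch(c(520, 310, 150), c(500, 300, 90))
```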

Value

A character matrix with entries of the form "a_b" which means that there were a out of b possible matches.

Author(s)

Mohammed Aslam Imtiaz, Matthias Kohl

References

T. A. Saari, S. K. Saari, C. D. Campbell, I. J Alexander, I. C. Anderson. FragMatch - a program for the analysis of DNA fragment data. Mycorrhiza 2007, 17:133-136

Examples

  data(refDataGerm)
  data(newDataGerm)
  
  res <- FragMatch(newDataGerm, refDataGerm)

Compute matches for RFLP data via GERM.

Description

Compute matches for RFLP data using the Good-Enough RFLP Matcher (GERM) program.

Usage

germ(newData, refData, parameters = list("Max forward error" = 25,
                                         "Max backward error" = 25,
                                         "Max sum error" = 100,
                                         "Lower measurement limit" = 100), 
     method = "joint", na.rm = TRUE)

Arguments

`newData`	data.frame with new RFLP data; see `newDataGerm`.
`refData`	data.frame with reference RFLP data; see `refDataGerm`.
`parameters`	list of the four program parameters of GERM; see details section.
`method`	matching and ranking method used for computation; see details section.
`na.rm`	logical: indicating whether NA values should be stripped before the computation proceeds.

Details

There are four matching and ranking methods which are "joint", "forward", "backward", and "sum". For more details see Dickie et al. (2003).

The parameters of the GERM software are:
"Max forward error": used if the matching and ranking method is set to "forward" or "joint".
"Max backward error": used if the matching and ranking method is set to "backward" or "joint".
"Max sum error": used for matching if the matching and ranking method is set to "sum".
"Lower measurement limit": the lower bound of measurements (often 100 or 50, depending on the ladder used).

Value

A named list with the results.

Author(s)

Mohammed Aslam Imtiaz, Matthias Kohl

References

Ian A. Dickie, Peter G. Avis, David J. McLaughlin, Peter B. Reich. Good-Enough RFLP Matcher (GERM) program. Mycorrhiza 2003, 13:171-172.

Examples

  data(refDataGerm)
  data(newDataGerm)
  
  ## Example 1
  res1 <- germ(newDataGerm[1:7,], refDataGerm)
  
  ## Example 2
  res2 <- germ(newDataGerm[8:15,], refDataGerm)
  
  ## Example 3
  res3 <- germ(newDataGerm[16:20,], refDataGerm)
  
  ## all three examples in one step
  res.all <- germ(newDataGerm, refDataGerm)

Linear Combination of Distances

Description

This function computes linear combinations of distances.

Usage

linCombDist(x, distfun1, w1, distfun2, w2, diag = FALSE, upper = FALSE)

Arguments

`x`	object which is passed to `distfun1` and `distfun2`.
`distfun1`	function used to compute an object of class `"dist"`.
`w1`	weight for result of `distfun1`.
`distfun2`	function used to compute an object of class `"dist"`.
`w2`	weight for result of `distfun2`.
`diag`	see `dist`
`upper`	see `dist`

Details

This function computes the linear combination w1 * distfun1(x) + w2 * distfun2(x) of two distance matrices and returns it as a distance matrix.
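The weighted sum can be sketched in base R by converting both "dist" objects to full matrices, combining them and converting back. This is a minimal sketch, not the package code; the name linCombDistSketch is hypothetical.

```r
## Hypothetical sketch: weighted sum of two "dist" objects computed on x.
linCombDistSketch <- function(x, distfun1, w1, distfun2, w2,
                              diag = FALSE, upper = FALSE) {
  d1 <- as.matrix(distfun1(x))
  d2 <- as.matrix(distfun2(x))
  as.dist(w1 * d1 + w2 * d2, diag = diag, upper = upper)
}

M <- rbind(c(550, 500, 300, 250), c(510, 460, 260, 210))
manhattan <- function(x) dist(x, method = "manhattan")
## Euclidean distance 80, Manhattan distance 160, convex combination 120
res <- linCombDistSketch(M, dist, 0.5, manhattan, 0.5)
res
```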

Value

linCombDist returns an object of class "dist"; cf. dist.

Author(s)

Matthias Kohl

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

## assume a shift in the measured bands
M <- rbind(c(550, 500, 300, 250), c(510, 460, 260, 210),
           c(700, 650, 450, 400), c(550, 490, 310, 250))
dist(M)
diffDist(M)

## convex combination of dist and diffDist
linCombDist(M, distfun1 = dist, w1 = 0.5, distfun2 = diffDist, w2 = 0.5)

## linear combination
linCombDist(M, distfun1 = dist, w1 = 2, distfun2 = diffDist, w2 = 5)

## maximum distance
linCombDist(M, distfun1 = function(x) dist(x, method = "maximum"), w1 = 0.5, 
            distfun2 = function(x) diffDist(x, method = "maximum"), w2 = 0.5)
            
data(RFLPdata)
distfun <- function(x) linCombDist(x, distfun1 = dist, w1 = 0.1, distfun2 = diffDist, w2 = 0.9)
par(mfrow = c(2, 2))
plot(hclust(RFLPdist(RFLPdata, nrBands = 3, distfun = distfun)), cex = 0.7, cex.lab = 0.7)
RFLPplot(RFLPdata, nrBands = 3, distfun = distfun, mar.bottom = 6, cex.axis = 0.8)
plot(hclust(RFLPdist(RFLPdata, nrBands = 3)), cex = 0.7, cex.lab = 0.7)
RFLPplot(RFLPdata, nrBands = 3, mar.bottom = 6, cex.axis = 0.8)

Example data set from GERM software

Description

This is the example data of unknown samples taken from the GERM software.

Usage

data(newDataGerm)

Format

A data frame with 20 observations on the following six variables

Sample: character: sample identifier.
Enzyme: character: enzyme used.
Band: integer: band number.
MW: integer: molecular weight.
Genus: character: genus of sample.
Species: character: species of sample.

Details

See GERM software.

Source

The data set was taken from the GERM software (table 'Example Unknowns').

References

Ian A. Dickie, Peter G. Avis, David J. McLaughlin, Peter B. Reich. Good-Enough RFLP Matcher (GERM) program. Mycorrhiza 2003, 13:171-172.

Examples

data(newDataGerm)
str(newDataGerm)

Function to compute number of bands.

Description

Computes groups based on the number of bands per sample in an RFLP data set. Each group comprises RFLP samples with an equal number of bands.

Usage

nrBands(x)

Arguments

`x`	data.frame with RFLP data; see `RFLPdata`.

Details

The function computes groups based on the number of bands per sample in an RFLP data set. Each group comprises RFLP samples with an equal number of bands.
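The grouping can be illustrated in base R on a small data frame with the columns of RFLPdata. This is a sketch only: the toy data are hypothetical, and the exact form of the object returned by nrBands may differ.

```r
## Hypothetical toy data in the layout of RFLPdata (Sample, Band, MW, Gel)
toy <- data.frame(Sample = c("S1", "S1", "S1", "S2", "S2", "S3", "S3", "S3"),
                  Band   = c(1, 2, 3, 1, 2, 1, 2, 3),
                  MW     = c(550, 300, 120, 610, 200, 540, 310, 115),
                  Gel    = "G1")

## number of bands per sample (one row per band)
bandsPerSample <- table(toy$Sample)
bandsPerSample

## group the samples by their number of bands
groups <- split(names(bandsPerSample), as.integer(bandsPerSample))
groups
```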

Value

The number of bands per RFLP sample.

Author(s)

Fabienne Flessa,
Alexandra Kehl,
Matthias Kohl

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

data(RFLPdata)
nrBands(RFLPdata)

Read BLAST data

Description

Function to read BLAST data generated with standalone BLAST from NCBI.

Usage

read.blast(file, sep = "\t")

Arguments

`file`	character: BLAST file to read in.
`sep`	the field separator character. Values on each line of the file are separated by this character. Default `"\t"`.

Details

The function reads data which was generated with standalone BLAST from NCBI; see ftp://ftp.ncbi.nih.gov/blast/executables/release/.

Possible steps:
1) Install NCBI BLAST.
2) Generate and import the database(s).
3) Run BLAST with the options -outfmt and -out, e.g.
   blastn -query Testquery -db Testdatabase -outfmt 6 -out out.txt
   or
   blastn -query Testquery -db Testdatabase -outfmt 10 -out out.csv
   BLAST can also be called from inside R using the function system:
   system("blastn -query Testquery -db Testdatabase -outfmt 6 -out out.txt")
4) Read in the results:
   test.res <- read.blast(file = "out.txt")
   or
   test.res <- read.blast(file = "out.csv", sep = ",")
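Reading the tabular output essentially amounts to reading a delimited file without a header and assigning the twelve column names listed under Value. The following base R sketch assumes exactly this layout; readBlastSketch and the two example lines are hypothetical and only illustrate the format.

```r
## Column names as listed in the Value section
blastCols <- c("query.id", "subject.id", "identity", "alignment.length",
               "mismatches", "gap.opens", "q.start", "q.end",
               "s.start", "s.end", "evalue", "bit.score")

## Hypothetical reader for tabular BLAST output (-outfmt 6 or 10)
readBlastSketch <- function(file, sep = "\t") {
  ## quote = "" and comment.char = "" so that identifiers are read verbatim
  read.table(file, sep = sep, header = FALSE, col.names = blastCols,
             quote = "", comment.char = "", stringsAsFactors = FALSE)
}

## Write a small made-up tabular result to a temporary file and read it back
tmp <- tempfile(fileext = ".txt")
writeLines(c("seq1\tseq2\t98.5\t500\t7\t0\t1\t500\t1\t500\t0\t900",
             "seq1\tseq1\t100\t520\t0\t0\t1\t520\t1\t520\t0\t960"), tmp)
res <- readBlastSketch(tmp)
str(res)
```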

Value

A data.frame with variables

query.id: character: sequence identifier.
subject.id: character: subject identifier.
identity: numeric: identity between sequences (in percent).
alignment.length: integer: number of nucleotides.
mismatches: integer: number of mismatches.
gap.opens: integer: number of gap openings.
q.start: integer: query sequence start.
q.end: integer: query sequence end.
s.start: integer: subject sequence start.
s.end: integer: subject sequence end.
evalue: numeric: expect value (E-value).
bit.score: numeric: bit score.

Author(s)

Fabienne Flessa,
Alexandra Kehl,
Matthias Kohl

References

Standalone Blast download: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

Blast News: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastNews

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

Dir <- system.file("extdata", package = "RFLPtools") # input directory 
filename <- file.path(Dir, "BLASTexample.txt")
BLAST1 <- read.blast(file = filename)
str(BLAST1)

Read RFLP data

Description

Function to read RFLP data (e.g. generated with software package Gene Profiler 4.05 (Scanalytics Inc.)) for DNA fragment analysis and genotyping, and exported to a text file.

Usage

read.rflp(file)

Arguments

`file`	character: RFLP file to read in.

Details

The function reads data from a text file which was generated e.g. with the software package Gene Profiler 4.05 (Scanalytics Inc.) for DNA fragment analysis and genotyping. The data file contains sample identifier (Sample), band number (Band), molecular weight (MW) and gel identifier (Gel) (see RFLPdata).

If the gel identifier Gel is missing, it is extracted from the sample identifier Sample.

Value

A data.frame with variables

Sample: character: sample identifier.
Band: integer: band number.
MW: integer: molecular weight.
Gel: character: gel identifier.

Author(s)

Fabienne Flessa,
Alexandra Kehl,
Matthias Kohl

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

Dir <- system.file("extdata", package = "RFLPtools") # input directory 
filename <- file.path(Dir, "RFLPexample.txt")
RFLP1 <- read.rflp(file = filename)
str(RFLP1)

filename <- file.path(Dir, "AZ091016_report.txt")
RFLP2 <- read.rflp(file = filename)
str(RFLP2)

Example data set from GERM software

Description

This is the reference data taken from the GERM software.

Usage

data(refDataGerm)

Format

A data frame with 250 observations on the following six variables

Sample: character: sample identifier.
Enzyme: character: enzyme used.
Band: integer: band number.
MW: integer: molecular weight.
Genus: character: genus of sample.
Species: character: species of sample.

Details

See GERM software.

Source

The data set was taken from the GERM software (table 'Example Data').

References

Ian A. Dickie, Peter G. Avis, David J. McLaughlin, Peter B. Reich. Good-Enough RFLP Matcher (GERM) program. Mycorrhiza 2003, 13:171-172.

Examples

data(refDataGerm)
str(refDataGerm)

Combine RFLP data sets

Description

Function to combine an arbitrary number of RFLP data sets.

Usage

RFLPcombine(...)

Arguments

`...`	two or more data.frames with RFLP data.

Details

The data sets are combined using rbind.

If data sets with identical sample identifiers are given, the identifiers are made unique using make.unique.
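The combination via rbind and make.unique can be sketched as follows. This is a hypothetical sketch, not the RFLPtools code: it assumes that only sample identifiers which already occurred in an earlier data set are renamed, while all bands of a renamed sample keep a common new identifier. The names combineSketch, a and b are made up for illustration.

```r
## Hypothetical toy data sets in the layout of RFLPdata
a <- data.frame(Sample = c("S1", "S1", "S2"), Band = c(1, 2, 1),
                MW = c(550, 300, 610), Gel = "G1")
b <- a  # second data set with the same sample identifiers

combineSketch <- function(...) {
  dfs <- list(...)
  seen <- character(0)
  for (i in seq_along(dfs)) {
    samples <- as.character(dfs[[i]]$Sample)
    ids <- unique(samples)
    ## make.unique() keeps earlier names and renames later duplicates
    newIds <- make.unique(c(seen, ids))[length(seen) + seq_along(ids)]
    dfs[[i]]$Sample <- newIds[match(samples, ids)]
    seen <- c(seen, newIds)
  }
  do.call(rbind, dfs)
}

res <- combineSketch(a, b)
res$Sample  # samples of the second data set become "S1.1" and "S2.1"
```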

Value

A data.frame with variables

Sample: character: sample identifier.
Band: integer: band number.
MW: integer: molecular weight.
Gel: character: gel identifier.

Author(s)

Fabienne Flessa,
Alexandra Kehl,
Matthias Kohl

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

data(RFLPdata)
res <- RFLPcombine(RFLPdata, RFLPdata, RFLPdata)
RFLPplot(res, nrBands = 4)

Example data set for RFLP data

Description

This is an example data set for RFLP data.

Usage

data(RFLPdata)

Format

A data frame with 737 observations on the following four variables

Sample: character: sample identifier.
Band: integer: band number.
MW: integer: molecular weight.
Gel: character: gel identifier.

Details

The molecular weight was determined using the software package Gene Profiler 4.05 (Scanalytics Inc.) for DNA fragment analysis and genotyping, and exported to a text file.

Source

The data set was generated by F. Flessa.

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

data(RFLPdata)
str(RFLPdata)

Compute distances for RFLP data.

Description

Within each group of RFLP samples exhibiting an equal number of bands, the distance between the molecular weights is computed.

Usage

RFLPdist(x, distfun = dist, nrBands, LOD = 0)

Arguments

`x`	data.frame with RFLP data; see `RFLPdata`.
`distfun`	function computing the distance with default `dist`; cf. `dist`.
`nrBands`	if not missing, then only samples with the specified number of bands are considered.
`LOD`	threshold for low-bp bands.

Details

For each number of bands the given distance between the molecular weights is computed. The result is a named list of distances where the names correspond to the number of bands which occur in each group.

If nrBands is specified, only samples with this number of bands are considered.

If LOD > 0 is specified, all values below LOD are removed before the distances are calculated.
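The grouping and per-group distance computation described above can be sketched in base R. This is a minimal sketch under stated assumptions, not the RFLPtools implementation: bands are ordered by band number within each sample, bands with MW below LOD are dropped before grouping, and rflpDistSketch as well as the toy data are hypothetical.

```r
## Hypothetical toy data in the layout of RFLPdata (Sample, Band, MW, Gel)
toy <- data.frame(Sample = c("S1", "S1", "S1", "S2", "S2", "S3", "S3", "S3"),
                  Band   = c(1, 2, 3, 1, 2, 1, 2, 3),
                  MW     = c(550, 300, 120, 610, 200, 540, 310, 115),
                  Gel    = "G1")

rflpDistSketch <- function(x, distfun = dist, LOD = 0) {
  ## remove all values below LOD before computing distances
  if (LOD > 0) x <- x[x$MW >= LOD, ]
  x <- x[order(x$Sample, x$Band), ]
  mw <- split(x$MW, x$Sample)   # molecular weights per sample
  nb <- lengths(mw)             # number of bands per sample
  groups <- split(mw, nb)       # samples with an equal number of bands
  ## one distance object per group, named by the number of bands
  lapply(groups, function(g) distfun(do.call(rbind, g)))
}

res <- rflpDistSketch(toy)
names(res)   # "2" "3"
res[["3"]]   # distance between S1 and S3
rflpDistSketch(toy, LOD = 150)  # all samples now have 2 bands
```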

Value

A named list with the distances; see dist.

If nrBands is specified, an object of S3 class dist is returned instead.

Author(s)

Fabienne Flessa,
Alexandra Kehl,
Matthias Kohl

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Matsumoto, Masaru; Furuya, Naruto; Takanami, Yoichi; Matsuyama, Nobuaki. RFLP analysis of the PCR-amplified 28S rDNA in Rhizoctonia solani. Mycoscience 1996 37:351-356.

Examples

## Euclidean distance
data(RFLPdata)
res <- RFLPdist(RFLPdata)
names(res) ## number of bands
res$"6"

RFLPdist(RFLPdata, nrBands = 6)

## Other distances
res1 <- RFLPdist(RFLPdata, distfun = function(x) dist(x, method = "manhattan"))
res2 <- RFLPdist(RFLPdata, distfun = function(x) dist(x, method = "maximum"))
res[[1]]
res1[[1]]
res2[[1]]

## cut dendrogram at height 50
clust4bd <- hclust(res[[2]])
cgroups50 <- cutree(clust4bd, h=50)
cgroups50

## or
library(MKomics)
res3 <- RFLPdist(RFLPdata, distfun = corDist)
res3$"9"

## hierarchical clustering
par(mfrow = c(2,2))
plot(hclust(res[[1]]), main = "Euclidean distance")
plot(hclust(res1[[1]]), main = "Manhattan distance")
plot(hclust(res2[[1]]), main = "Maximum distance")
plot(hclust(res3[[1]]), main = "Pearson correlation distance")


## Similarity matrix
library(MKomics)
library(RColorBrewer) # provides brewer.pal
myCol <- colorRampPalette(brewer.pal(8, "RdYlGn"))(128)
ord <- order.dendrogram(as.dendrogram(hclust(res[[1]])))
temp <- as.matrix(res[[1]])
simPlot(temp[ord,ord], col = rev(myCol), minVal = 0, 
        labels = colnames(temp), title = "(Dis-)Similarity Plot")


## or
library(lattice)
levelplot(temp[ord,ord], col.regions = rev(myCol),
          at = do.breaks(c(0, max(temp)), 128),
          xlab = "", ylab = "",
          ## Rotate label of x axis
          scales = list(x = list(rot = 90)),
          main = "(Dis-)Similarity Plot")

## multidimensional scaling
loc <- cmdscale(res[[5]])
x <- loc[,1]
y <- -loc[,2]
plot(x, y, type="n", xlab="", ylab="", xlim = 1.05*range(x), main="Multidimensional scaling")
text(x, y, rownames(loc), cex=0.8)

Compute distances for RFLP data.

Description

If gel image quality is low, faint bands may be missed, which can lead to wrong conclusions. This function computes the distance between the molecular weights of RFLP samples, including samples that contain one or more additional bands. In this way, failures during band detection can be identified. Band patterns computed with this method can be visualised by RFLPplot using the argument nrMissing.

Usage

RFLPdist2(x, distfun = dist, nrBands, nrMissing, LOD = 0,
          diag = FALSE, upper = FALSE)

Arguments

`x`	data.frame with RFLP data; see `RFLPdata`.
`distfun`	function computing the distance with default `dist`; cf. `dist`.
`nrBands`	samples with number of bands equal to `nrBands` are to be considered.
`nrMissing`	number of bands that might be missing.
`LOD`	threshold for low-bp bands.
`diag`	see `dist`
`upper`	see `dist`

Details

For a given number of bands the given distance between the molecular weights is computed. It is assumed that a number of bands might be missing. Hence all samples with number of bands in nrBands, nrBands+1, ..., nrBands+nrMissing are compared.

If LOD > 0 is specified, it is assumed that missing bands can only occur for molecular weights smaller than LOD. As a consequence, only samples which have nrBands bands with molecular weight greater than or equal to LOD are selected.

To compute the distance between the molecular weights of a sample S1 with x bands and a sample S2 with x+y bands, the distances between the molecular weights of S1 and those of all possible subsets of S2 with x bands are computed. The distance between S1 and S2 is then defined as the minimum of all these distances.

If LOD > 0 is specified, only those subsets are considered in which the omitted bands of S2 have molecular weights below LOD.

This option may be useful, if gel image quality is low, and the detection of bands is doubtful.
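The minimum-over-subsets rule for a single pair of samples can be sketched with combn. This is a hypothetical sketch that ignores the LOD restriction and assumes bands are aligned after sorting the molecular weights in decreasing order; the helper name minSubsetDist is made up for illustration.

```r
## Hypothetical helper: distance between sample s1 (x bands) and sample s2
## (x + y bands) as the minimum over all x-band subsets of s2.
minSubsetDist <- function(s1, s2, distfun = dist) {
  stopifnot(length(s2) >= length(s1))
  s1 <- sort(s1, decreasing = TRUE)
  s2 <- sort(s2, decreasing = TRUE)
  ## each column of 'subsets' holds the (increasing) indices of one subset
  subsets <- combn(length(s2), length(s1))
  d <- apply(subsets, 2, function(idx) {
    as.numeric(distfun(rbind(s1, s2[idx])))
  })
  min(d)
}

## S2 has one additional band (400); the best subset is c(505, 298, 199)
minSubsetDist(c(500, 300, 200), c(505, 400, 298, 199))
```

The number of subsets grows as choose(x + y, x), so this exhaustive search is only practical for the small band numbers typical of RFLP patterns.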

Value

An object of class "dist" is returned; cf. dist.

Author(s)

Fabienne Flessa,
Alexandra Kehl,
Matthias Kohl

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Ian A. Dickie, Peter G. Avis, David J. McLaughlin, Peter B. Reich. Good-Enough RFLP Matcher (GERM) program. Mycorrhiza 2003, 13:171-172.

Examples

## Euclidean distance
data(RFLPdata)
nrBands(RFLPdata)
res0 <- RFLPdist(RFLPdata, nrBands = 4)
res1 <- RFLPdist2(RFLPdata, nrBands = 4, nrMissing = 1)
res2 <- RFLPdist2(RFLPdata, nrBands = 4, nrMissing = 2)
res3 <- RFLPdist2(RFLPdata, nrBands = 4, nrMissing = 3)

## assume missing bands only below LOD
res1.lod <- RFLPdist2(RFLPdata, nrBands = 4, nrMissing = 1, LOD = 60)

## hierarchical clustering
par(mfrow = c(2,2))
plot(hclust(res0), main = "0 bands missing")
plot(hclust(res1), main = "1 band missing")
plot(hclust(res2), main = "2 bands missing")
plot(hclust(res3), main = "3 bands missing")

## missing bands only below LOD
par(mfrow = c(1,2))
plot(hclust(res0), main = "0 bands missing")
plot(hclust(res1.lod), main = "1 band missing below LOD")

## Similarity matrix
library(MKomics)
library(RColorBrewer) # provides brewer.pal
myCol <- colorRampPalette(brewer.pal(8, "RdYlGn"))(128)
ord <- order.dendrogram(as.dendrogram(hclust(res1)))
temp <- as.matrix(res1)
simPlot(temp[ord,ord], col = rev(myCol), minVal = 0, 
        labels = colnames(temp), title = "(Dis-)Similarity Plot")

## missing bands only below LOD
ord <- order.dendrogram(as.dendrogram(hclust(res1.lod)))
temp <- as.matrix(res1.lod)
simPlot(temp[ord,ord], col = rev(myCol), minVal = 0, 
        labels = colnames(temp), title = "(Dis-)Similarity Plot\n1 band missing below LOD")


## or
library(lattice)
levelplot(temp[ord,ord], col.regions = rev(myCol),
          at = do.breaks(c(0, max(temp)), 128),
          xlab = "", ylab = "",
          ## Rotate label of x axis
          scales = list(x = list(rot = 90)),
          main = "(Dis-)Similarity Plot")


## Other distances
res11 <- RFLPdist2(RFLPdata, distfun = function(x) dist(x, method = "manhattan"),
                 nrBands = 4, nrMissing = 1)
res12 <- RFLPdist2(RFLPdata, distfun = corDist, nrBands = 4, nrMissing = 1)
res13 <- RFLPdist2(RFLPdata, distfun = corDist, nrBands = 4, nrMissing = 1, LOD = 60)
par(mfrow = c(2,2))
plot(hclust(res1), main = "Euclidean distance\n1 band missing")
plot(hclust(res11), main = "Manhattan distance\n1 band missing")
plot(hclust(res12), main = "Pearson correlation distance\n1 band missing")
plot(hclust(res13), main = "Pearson correlation distance\n1 band missing below LOD")

Compute distance between RFLP data and RFLP reference data.

Description

Function to compute distance between RFLP data and RFLP reference data.

Usage

RFLPdist2ref(x, ref, distfun = dist, nrBands, LOD = 0)

Arguments

`x`	data.frame with RFLP data; e.g. `RFLPdata`.
`ref`	data.frame with RFLP reference data; e.g. `RFLPref`.
`distfun`	function computing the distance with default `dist`; cf. `dist`.
`nrBands`	only samples and reference samples with this number of bands are considered.
`LOD`	threshold for low-bp bands.

Details

For each sample with nrBands bands, the distance to each reference sample with nrBands bands is computed. The result is a matrix of the corresponding distances, where rows represent the samples and columns represent the reference samples.

If LOD > 0 is specified, all bands below LOD are removed from both x and ref before the distances are calculated.
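The following base-R sketch illustrates this computation for the default Euclidean distance. It is not the RFLPtools implementation: the helper name dist2ref_sketch, the toy data, the ordering of bands by decreasing MW and the choice to apply LOD before counting bands are assumptions.

```r
## Hypothetical sketch of a sample-vs-reference distance matrix (not package code)
dist2ref_sketch <- function(x, ref, nrBands, LOD = 0) {
  ## x, ref: data frames with one row per band and columns Sample and MW
  if (LOD > 0) {
    ## assumption: drop bands below LOD in both data sets before counting bands
    x <- x[x$MW >= LOD, ]
    ref <- ref[ref$MW >= LOD, ]
  }
  ## one row per sample with exactly nrBands bands, MW sorted decreasingly
  toMatrix <- function(d) {
    bands <- split(d$MW, d$Sample)
    bands <- bands[lengths(bands) == nrBands]
    do.call(rbind, lapply(bands, sort, decreasing = TRUE))
  }
  X <- toMatrix(x)
  R <- toMatrix(ref)
  ## Euclidean distance; rows = samples, columns = reference samples
  D <- matrix(NA_real_, nrow(X), nrow(R),
              dimnames = list(rownames(X), rownames(R)))
  for (i in seq_len(nrow(X))) {
    for (j in seq_len(nrow(R))) {
      D[i, j] <- sqrt(sum((X[i, ] - R[j, ])^2))
    }
  }
  D
}

x <- data.frame(Sample = c("a", "a", "a", "b", "b", "b", "c", "c"),
                MW = c(500, 300, 100, 480, 310, 90, 400, 200))
ref <- data.frame(Sample = c("r1", "r1", "r1"), MW = c(500, 300, 100))
D <- dist2ref_sketch(x, ref, nrBands = 3)
D
```

In this toy data, sample c is skipped because it has only two bands, and with LOD = 95 sample b would drop to two bands as well.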

Value

A matrix with distances.

Author(s)

Fabienne Flessa [email protected],
Alexandra Kehl [email protected],
Matthias Kohl [email protected]

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

## Euclidean distance
data(RFLPdata)
data(RFLPref)
nrBands(RFLPref)
RFLPdist2ref(RFLPdata, RFLPref, nrBands = 4)
RFLPdist2ref(RFLPdata, RFLPref, nrBands = 6)

Dir <- system.file("extdata", package = "RFLPtools") # input directory 
filename <- file.path(Dir, "AZ091016_report.txt")
RFLP1 <- read.rflp(file = filename)
RFLP2 <- RFLPqc(RFLP1)
nrBands(RFLP2)
RFLPdist2ref(RFLP1, RFLPref, nrBands = 4)
RFLPdist2ref(RFLP1, RFLPref, nrBands = 5)

Remove bands below LOD

Description

Function to exclude bands below a given LOD.

Usage

RFLPlod(x, LOD)

Arguments

`x`	data.frame with RFLP data.
`LOD`	threshold for low-bp bands.

Details

Low-bp bands may be regarded as unreliable and are likely to be absent in some other samples. The function RFLPlod can be used to exclude such bands before further analyses.
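A minimal sketch of such a filter in base R, assuming a data frame with columns Sample, Band and MW as in RFLPdata. The helper lod_sketch is hypothetical, and renumbering the remaining bands consecutively is an assumption, not documented behaviour of RFLPlod.

```r
## Hypothetical sketch: drop bands below LOD, then renumber the remaining bands
lod_sketch <- function(x, LOD) {
  x <- x[x$MW >= LOD, ]                              # keep bands with MW >= LOD
  x <- x[order(x$Sample, x$Band), ]                  # keep band order per sample
  x$Band <- ave(x$Band, x$Sample, FUN = seq_along)   # assumption: renumber 1, 2, ...
  rownames(x) <- NULL
  x
}

x <- data.frame(Sample = rep(c("s1", "s2"), c(4, 3)),
                Band = c(1:4, 1:3),
                MW = c(500, 300, 80, 40, 450, 55, 50))
res <- lod_sketch(x, LOD = 60)
res
```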

Value

A data.frame with variables

Sample: character: sample identifier.
Band: integer: band number.
MW: integer: molecular weight.
Gel: character: gel identifier.

Author(s)

Fabienne Flessa [email protected],
Alexandra Kehl [email protected],
Matthias Kohl [email protected]

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

data(RFLPdata)
## remove bands with MW smaller than 60
RFLPdata.lod <- RFLPlod(RFLPdata, LOD = 60)
par(mfrow = c(1, 2))
RFLPplot(RFLPdata, nrBands = 4, ylim = c(40, 670))
RFLPplot(RFLPdata.lod, nrBands = 4, ylim = c(40, 670))
title(sub = "After applying RFLPlod")

Function to plot RFLP data.

Description

The given RFLP data are plotted with the samples sorted according to the corresponding dendrogram.

Usage

RFLPplot(x, nrBands, nrMissing, distfun = dist, 
         hclust.method = "complete", mar.bottom = 5, 
         cex.axis = 0.5, colBands, xlab = "", 
         ylab = "molecular weight", ylim, ...)

Arguments

`x`	data.frame with RFLP data; see `RFLPdata`.
`nrBands`	if not missing, then only samples with the specified number of bands are considered.
`nrMissing`	if not missing, then it is assumed that some bands may be missing. That is, all samples whose number of bands is nrBands, nrBands+1, ..., nrBands+nrMissing are considered.
`distfun`	function computing the distance with default `dist`; see `dist`.
`hclust.method`	method used for hierarchical clustering; see `hclust`.
`mar.bottom`	bottom margin of the plot; see `par`.
`cex.axis`	size of the x-axis annotation.
`colBands`	color for the bands. Must be of length 1 or equal to the number of samples. If missing, `"Set1"` of RColorBrewer is used; see `ColorBrewer`.
`xlab`	passed to function `plot`.
`ylab`	passed to function `plot`.
`ylim`	passed to function `plot`. If missing, an appropriate range of y-values is computed.
`...`	additional arguments passed to function `plot` except `xlim`, which is defined inside of `RFLPplot`.

Details

RFLP data is plotted. The samples are sorted according to the corresponding dendrogram which is computed via function hclust.

Specifying nrMissing may be useful if the gel image quality is low and the detection of bands is doubtful.
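The dendrogram-based ordering can be sketched in base R as follows. The helper plot_bands_sketch is hypothetical and assumes the bands are already arranged as a numeric matrix with one row per sample; RFLPplot itself works on the long data.frame format and offers many more options.

```r
## Hypothetical sketch: order samples by the hclust dendrogram and draw their bands
plot_bands_sketch <- function(X, method = "complete") {
  ## X: numeric matrix, one row per sample (row names), one column per band
  hc <- hclust(dist(X), method = method)
  ord <- rownames(X)[hc$order]              # left-to-right order of the leaves
  n <- length(ord)
  plot(NA, xlim = c(0.5, n + 0.5), ylim = range(X), xaxt = "n",
       xlab = "", ylab = "molecular weight")
  for (i in seq_len(n)) {
    ## one short horizontal segment per band of sample ord[i]
    segments(i - 0.3, X[ord[i], ], i + 0.3, X[ord[i], ])
  }
  axis(1, at = seq_len(n), labels = ord, las = 2)
  invisible(ord)
}

X <- rbind(A = c(500, 300, 100), B = c(490, 310, 95), C = c(200, 150, 50))
pdf(NULL)   # null device, nothing is written to disk
ord <- plot_bands_sketch(X)
dev.off()
ord
```

Samples that cluster together (here A and B) end up next to each other, which is what makes the band pattern easy to compare visually.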

Value

invisible

Author(s)

Fabienne Flessa [email protected],
Alexandra Kehl [email protected],
Matthias Kohl [email protected]

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

data(RFLPdata)
par(mfrow = c(1,2))
plot(hclust(RFLPdist(RFLPdata, nrBands = 3)), cex = 0.7)
RFLPplot(RFLPdata, nrBands = 3, mar.bottom = 6, cex.axis = 0.8)

par(mfrow = c(1,2))
plot(hclust(RFLPdist2(RFLPdata, nrBands = 9, nrMissing = 1)), cex = 0.7)
RFLPplot(RFLPdata, nrBands = 9, nrMissing = 1, mar.bottom = 6, cex.axis = 0.8)


distfun <- function(x) dist(x, method = "maximum")
par(mfrow = c(1,2))
plot(hclust(RFLPdist(RFLPdata, nrBands = 3, distfun = distfun), 
            method = "average"), cex = 0.7, cex.lab = 0.7)
RFLPplot(RFLPdata, nrBands = 3, distfun = distfun, hclust.method = "average", 
         mar.bottom = 6, cex.axis = 0.8)

Quality control for RFLP data

Description

Function to perform quality control for RFLP data based on a comparison between the total length of the digested PCR amplification product and the sum of the fragment lengths. If the sum is smaller or larger than the length of the PCR amplification product (beyond a tolerance range defined by QC.lo and QC.up), the samples can be excluded from further analyses. This function is helpful for data sets containing faint or uncertain bands. The total length of the PCR amplification product must be included for each sample as the largest fragment in the data set; see RFLPdata.

Usage

RFLPqc(x, rm.band1 = TRUE, QC.lo = 0.8, QC.up = 1.07, QC.rm = FALSE)

Arguments

`x`	data.frame with RFLP data.
`rm.band1`	logical: remove first band.
`QC.lo`	numeric: a real number in (0,1).
`QC.up`	numeric: a real number larger than 1.
`QC.rm`	logical: remove samples with insufficient quality.

Details

If the first band corresponds to the total length of the PCR amplification product, a quality control can be performed by comparing, for each sample, the length of the first band with the sum of the lengths of the remaining bands. If this sum is smaller than QC.lo times or larger than QC.up times the length of the first band, a text message is printed.

If rm.band1 = TRUE, band 1 of all samples is removed and the remaining band numbers are reduced by 1.

If QC.rm = TRUE samples of insufficient quality are entirely removed from the given data and the resulting data.frame is returned.
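The quality check described above can be sketched in base R. The helper qc_sketch is hypothetical; it only returns TRUE or FALSE per sample instead of printing messages or removing data, and it assumes band 1 holds the total length of the PCR product.

```r
## Hypothetical sketch of the QC rule: QC.lo * total <= sum(fragments) <= QC.up * total
qc_sketch <- function(x, QC.lo = 0.8, QC.up = 1.07) {
  ## x: one row per band (Sample, Band, MW); band 1 = total PCR product length
  sapply(split(x, x$Sample), function(s) {
    s <- s[order(s$Band), ]
    total <- s$MW[1]            # length of the undigested product
    fragSum <- sum(s$MW[-1])    # sum of the restriction fragments
    fragSum >= QC.lo * total && fragSum <= QC.up * total
  })
}

x <- data.frame(Sample = rep(c("s1", "s2", "s3"), each = 3),
                Band = rep(1:3, 3),
                MW = c(600, 300, 250, 600, 200, 100, 600, 400, 300))
ok <- qc_sketch(x)
ok   # s1 TRUE (550/600), s2 FALSE (300/600), s3 FALSE (700/600)
```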

Value

A data.frame with variables

Sample: character: sample identifier.
Band: integer: band number.
MW: integer: molecular weight.
Gel: character: gel identifier.

Author(s)

Fabienne Flessa [email protected],
Alexandra Kehl [email protected],
Matthias Kohl [email protected]

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

Dir <- system.file("extdata", package = "RFLPtools") # input directory 
filename <- file.path(Dir, "AZ091016_report.txt")
RFLP1 <- read.rflp(file = filename)
str(RFLP1)

RFLP2 <- RFLPqc(RFLP1, rm.band1 = FALSE) # identical to RFLP1
identical(RFLP1, RFLP2)

RFLP3 <- RFLPqc(RFLP1)
str(RFLP3)

RFLP4 <- RFLPqc(RFLP1, rm.band1 = TRUE, QC.rm = TRUE)
str(RFLP4)

Example data set for RFLP reference

Description

This is an example data set for RFLP reference.

Usage

data(RFLPref)

Format

A data frame with 35 observations on the following five variables

Sample: character: sample identifier.
Band: integer: band number.
MW: integer: molecular weight.
Taxonname: character: taxon name.
Accession: character: accession number.

Details

This example data set for RFLP reference consists of seven RFLP reference samples. Taxon names were assigned by sequence comparison with the GenBank database (https://www.ncbi.nlm.nih.gov/BLAST/) and supplemented with fictitious accession numbers.

Source

The data set was generated by F. Flessa.

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

data(RFLPref)
str(RFLPref)

Function for a visual comparison of RFLP samples with reference samples.

Description

The given RFLP samples are plotted together with the reference samples and sorted by their distance to the reference sample.

Usage

RFLPrefplot(x, ref, distfun = dist, nrBands, mar.bottom = 5, 
            cex.main = 1.2, cex.axis = 0.5, devNew = FALSE, 
            colBands, xlab = "", ylab = "molecular weight", 
            ylim, ...)

Arguments

`x`	data.frame with RFLP data; e.g. `RFLPdata`.
`ref`	data.frame with RFLP reference data; e.g. `RFLPref`.
`distfun`	function computing the distance with default `dist`; see `dist`.
`nrBands`	if not missing, then only samples with the specified number of bands are considered.
`mar.bottom`	bottom margin of the plot; see `par`.
`cex.main`	size of the plot title.
`cex.axis`	size of the x-axis annotation.
`devNew`	logical: open a new graphics device for each plot.
`colBands`	color for the bands. Must be of length 1 or equal to the number of samples. If missing, `"Set1"` of RColorBrewer is used; see `ColorBrewer`.
`xlab`	passed to function `plot`.
`ylab`	passed to function `plot`.
`ylim`	passed to function `plot`. If missing, an appropriate range of y-values is computed.
`...`	additional arguments passed to function `plot` except `xlim`, which is defined inside of `RFLPrefplot`.

Details

The given RFLP samples are plotted together with the reference samples and sorted by their distance to the reference sample.
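Sorting by distance to a single reference can be sketched as follows, assuming the default Euclidean distance and a band matrix with one row per sample. The helper sort_by_ref_sketch is hypothetical and does no plotting.

```r
## Hypothetical sketch: rank samples by Euclidean distance to one reference sample
sort_by_ref_sketch <- function(X, ref) {
  ## X: one row per sample, one column per band; ref: numeric vector of bands
  d <- sqrt(rowSums(sweep(X, 2, ref)^2))  # distance of each row to ref
  sort(d)                                 # closest samples first
}

X <- rbind(C = c(200, 150, 50), A = c(500, 300, 100), B = c(480, 310, 90))
r <- sort_by_ref_sketch(X, ref = c(500, 300, 100))
r
```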

Value

invisible

Author(s)

Fabienne Flessa [email protected],
Alexandra Kehl [email protected],
Matthias Kohl [email protected]

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

data(RFLPdata)
data(RFLPref)
dev.new(width = 12)
RFLPrefplot(RFLPdata, RFLPref, nrBands = 4, cex.axis = 0.5)

dev.new()
RFLPrefplot(RFLPdata, RFLPref, nrBands = 6, cex.axis = 0.8)
RFLPrefplot(RFLPdata, RFLPref, nrBands = 9, cex.axis = 0.8)

RFLPrefplot(RFLPdata, RFLPref[RFLPref$Sample == "Ni_29_A3",], nrBands = 4, cex.axis = 0.7)

Dir <- system.file("extdata", package = "RFLPtools") # input directory 
filename <- file.path(Dir, "AZ091016_report.txt")
RFLP1 <- read.rflp(file = filename)
RFLP2 <- RFLPqc(RFLP1)

dev.new(width = 12)
RFLPrefplot(RFLP1, RFLPref, nrBands = 4, cex.axis = 0.8)

dev.new()
RFLPrefplot(RFLP1, RFLPref, nrBands = 5, cex.axis = 0.8)

Convert similarity matrix to dist object.

Description

Function to convert similarity matrix to object of S3 class "dist".

Usage

sim2dist(x, maxSim = 1)

Arguments

`x`	symmetric matrix: similarity matrix.
`maxSim`	maximum similarity possible.

Details

Similarities are converted to distances via maxSim - x. The resulting matrix is then converted to an object of S3 class "dist" via as.dist.
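In base R this conversion amounts to a single expression. The helper sim2dist_sketch is a hypothetical stand-in for illustration, not the package source, and the similarity matrix is toy data.

```r
## Hypothetical sketch: distance = maxSim - similarity, stored as a "dist" object
sim2dist_sketch <- function(x, maxSim = 1) {
  as.dist(maxSim - x)
}

S <- matrix(c(1, 0.9, 0.2,
              0.9, 1, 0.3,
              0.2, 0.3, 1), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
d <- sim2dist_sketch(S)
d
hc <- hclust(d)   # the "dist" object can be passed straight to hclust
```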

Value

Object of S3 class "dist" is returned; see dist.

Author(s)

Fabienne Flessa [email protected],
Alexandra Kehl [email protected],
Matthias Kohl [email protected]

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

data(BLASTdata)

## without sequence range
## Not run: 
res <- simMatrix(BLASTdata)

## End(Not run)

## with sequence range
range(BLASTdata$alignment.length)
res1 <- simMatrix(BLASTdata, sequence.range = TRUE, Min = 100, Max = 450)
res2 <- simMatrix(BLASTdata, sequence.range = TRUE, Min = 500)

## visualize similarity matrix
library(MKomics)
simPlot(res2, minVal = 0, 
        labels = colnames(res2), title = "(Dis-)Similarity Plot")


## or
library(lattice)
library(RColorBrewer)
myCol <- colorRampPalette(brewer.pal(8, "RdYlGn"))(128)
levelplot(res2, col.regions = myCol,
          at = do.breaks(c(0, max(res2)), 128),
          xlab = "", ylab = "",
          ## Rotate label of x axis
          scales = list(x = list(rot = 90)),
          main = "(Dis-)Similarity Plot")

## convert to distance
res.d <- sim2dist(res2)

## hierarchical clustering
plot(hclust(res.d))

Similarity matrix for BLAST data.

Description

Function to compute similarity matrix for all-vs-all BLAST results of rDNA sequences generated with standalone BLAST from NCBI or local BLAST implemented in BioEdit.

Usage

simMatrix(x, sequence.range = FALSE, Min, Max)

Arguments

`x`	data.frame with BLAST data; see `BLASTdata`.
`sequence.range`	logical: use sequence range.
`Min`	minimum sequence length.
`Max`	maximum sequence length.

Details

The given BLAST data is used to compute a similarity matrix with the following algorithm. First, the length of each sequence (LS) contained in the input data file is extracted. If there is more than one comparison for a sequence, covering different parts of that sequence, the one with the maximum base length is chosen. Next, the number of matching bases (mB) is calculated by multiplying two variables of the BLAST output, the identity between the sequences (%) and the number of nucleotides, and dividing the product by 100; the resulting value is rounded to an integer. The similarity is then calculated by dividing mB by LS. Finally, the similarity matrix including all sequences is built. If the similarity of a combination is not shown in the BLAST report file (because the similarity was lower than 70%), this comparison is included in the similarity matrix with the value zero.
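The following sketch mirrors these steps in base R on a toy all-vs-all BLAST table. The column names query.id, subject.id, identity and alignment.length, taking LS from each sequence's self-hit, and dividing by the LS of the query sequence are assumptions; the actual simMatrix code may differ in these details.

```r
## Hypothetical all-vs-all BLAST hits (column names are assumptions)
hits <- data.frame(
  query.id = c("s1", "s1", "s1", "s2", "s2", "s3"),
  subject.id = c("s1", "s2", "s2", "s2", "s1", "s3"),
  identity = c(100, 95.5, 99, 100, 95.5, 100),
  alignment.length = c(500, 480, 120, 490, 480, 300)
)

## Hypothetical sketch of the similarity computation (not package code)
blast_similarity <- function(hits) {
  ids <- sort(unique(c(hits$query.id, hits$subject.id)))
  ## assumption: LS is taken from the self-hit (query == subject)
  self <- hits[hits$query.id == hits$subject.id, ]
  LS <- tapply(self$alignment.length, self$query.id, max)
  ## per query/subject pair keep the hit with the longest alignment
  hits <- hits[order(-hits$alignment.length), ]
  hits <- hits[!duplicated(hits[, c("query.id", "subject.id")]), ]
  ## matching bases: identity (%) * alignment length / 100, rounded
  mB <- round(hits$identity * hits$alignment.length / 100)
  ## pairs missing from the report keep similarity 0
  S <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
  S[cbind(hits$query.id, hits$subject.id)] <- mB / as.numeric(LS[hits$query.id])
  S
}

S <- blast_similarity(hits)
round(S, 3)
```

Note that with this convention the matrix need not be symmetric, since s1 and s2 have different lengths.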

Value

Similarity matrix.

Author(s)

Fabienne Flessa [email protected],
Alexandra Kehl [email protected],
Matthias Kohl [email protected]

References

Standalone Blast download: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

Blast News: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastNews

BioEdit: https://bioedit.software.informer.com/

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

data(BLASTdata)

## without sequence range
## code takes some time
## Not run: 
res <- simMatrix(BLASTdata)

## End(Not run)

## with sequence range
range(BLASTdata$alignment.length)
res1 <- simMatrix(BLASTdata, sequence.range = TRUE, Min = 100, Max = 450)
res2 <- simMatrix(BLASTdata, sequence.range = TRUE, Min = 500)

Simulate RFLP data.

Description

Simulates RFLP data for the comparison of algorithms.

Usage

simulateRFLPdata(N = 10, nrBands = 3:12, bandCenters = seq(100, 800, by = 100),
                 delta = 50, refData = FALSE)

Arguments

`N`	integer: number of samples to be simulated per number of bands.
`nrBands`	integer: vector of numbers of bands.
`bandCenters`	numeric: vector of band centers.
`delta`	numeric: half-width of the interval around each band center; a uniform distribution with `min = bandCenter - delta` and `max = bandCenter + delta` is used.
`refData`	logical: if TRUE, the additional columns `Taxonname` and `Accession` are generated.

Details

The function can be used to simulate RFLP data. For every number of bands specified in nrBands, N samples are generated.

First, the band centers are randomly selected (with replacement) from bandCenters; they form the centers of intervals of length 2*delta. Uniform random numbers are then drawn from these intervals, leading to randomly generated RFLP data.
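A base-R sketch of this simulation scheme follows. The helper sim_rflp_sketch is hypothetical; it omits the Enzyme, Taxonname and Accession columns, and the sample names and the sorting of bands by decreasing MW are assumptions.

```r
## Hypothetical sketch of the simulation scheme (not package code)
sim_rflp_sketch <- function(N = 10, nrBands = 3:12,
                            bandCenters = seq(100, 800, by = 100),
                            delta = 50) {
  out <- list()
  for (nb in nrBands) {
    for (i in seq_len(N)) {
      ## draw nb band centers with replacement
      centers <- sample(bandCenters, nb, replace = TRUE)
      ## one uniform draw from [center - delta, center + delta] per band
      mw <- runif(nb, min = centers - delta, max = centers + delta)
      out[[length(out) + 1]] <- data.frame(
        Sample = paste0("Sim_", nb, "_", i),
        Band = seq_len(nb),
        MW = as.integer(round(sort(mw, decreasing = TRUE))))
    }
  }
  do.call(rbind, out)
}

set.seed(1)
sim <- sim_rflp_sketch(N = 2, nrBands = 3:4)
table(sim$Sample)   # 2 samples with 3 bands and 2 samples with 4 bands
```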

Value

A data frame describing N*length(nrBands) samples (one row per band) with the following four variables

Sample: character: sample identifier.
Band: integer: band number.
MW: integer: molecular weight.
Enzyme: character: enzyme name.

is generated. If refData = TRUE then the following two additional variables are added.

Taxonname: character: taxon name.
Accession: character: accession number.

Author(s)

Mohammed Aslam Imtiaz, Matthias Kohl [email protected]

Examples

simData <- simulateRFLPdata()

Cut a hierarchical cluster tree and write cluster identifiers to a text file.

Description

The tree obtained by a hierarchical cluster analysis is cut into groups by using cutree and the results are exported to a text file.

Usage

write.hclust(x, file, prefix, h = NULL, k = NULL, append = FALSE, dec = ",")

Arguments

`x`	object of class `hclust`: result of hierarchical cluster analysis computed via function `hclust`.
`file`	either a character string naming a file or a connection open for writing. `""` indicates output to the console.
`prefix`	character. Information about the cluster analysis.
`h`	numeric scalar or vector with heights where the tree should be cut.
`k`	an integer scalar or vector with the desired number of groups.
`append`	logical. Only relevant if `file` is a character string. If `TRUE`, the output is appended to the file. If `FALSE`, any existing file of the name is destroyed.
`dec`	the string to use for decimal points in numeric or complex columns: must be a single character.

Details

The results are written to file by a call to write.table, where the columns in the resulting file are separated by tabs (i.e. sep = "\t") and no row names are exported (i.e. row.names = FALSE).
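The cut-and-export step can be sketched with cutree and write.table. The helper write_clusters_sketch and its two-column layout (Sample, and Cluster prefixed by prefix) are assumptions for illustration; it also handles only a single cut height or number of groups, whereas write.hclust accepts vectors for h and k.

```r
## Hypothetical sketch: cut a tree and write tab-separated cluster ids
write_clusters_sketch <- function(hc, file, prefix, h = NULL, k = NULL,
                                  dec = ",") {
  grp <- cutree(hc, k = k, h = h)           # named vector of cluster ids
  out <- data.frame(Sample = names(grp),
                    Cluster = paste(prefix, grp, sep = "_"))
  write.table(out, file = file, sep = "\t", row.names = FALSE, dec = dec)
  invisible(out)
}

X <- rbind(A = c(100, 200), B = c(105, 198), C = c(400, 610))
hc <- hclust(dist(X))
tf <- tempfile(fileext = ".txt")
write_clusters_sketch(hc, tf, prefix = "Bd2", h = 50)
read.delim(tf)
```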

Author(s)

Fabienne Flessa [email protected],
Alexandra Kehl [email protected],
Matthias Kohl [email protected]

References

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

Examples

data(RFLPdata)
res <- RFLPdist(RFLPdata, nrBands = 4)
cl <- hclust(res)
## Not run: 
write.hclust(cl, file = "Test.txt", prefix = "Bd4", h = 50)

## End(Not run)

res <- RFLPdist2(RFLPdata, nrBands = 4, nrMissing = 1)
cl <- hclust(res)
## Not run: 
write.hclust(cl, file = "Test.txt", append = TRUE, prefix = "Bd4_Mis1", h = 60)

## End(Not run)
