Title: | Wrangle Phylogenetic Distance Matrices and Other Utilities |
---|---|
Description: | Harriet was Charles Darwin's pet tortoise (possibly). 'harrietr' implements some functions to manipulate distance matrices and phylogenetic trees, making them easier to plot with 'ggplot2' and to manipulate using 'tidyverse' tools. |
Authors: | Anders Gonçalves da Silva [aut, cre] |
Maintainer: | Anders Gonçalves da Silva <[email protected]> |
License: | GPL-3 | file LICENSE |
Version: | 0.2.4 |
Built: | 2024-10-29 03:46:11 UTC |
Source: | https://github.com/andersgs/harrietr |
This takes an alignment, calculates the evolutionary distance between all pairs of sequences, and transforms the distance matrix to long format. It removes the upper triangle and the diagonal elements, so you end up with only (n)*(n-1)/2 rows, where n is the total number of rows in the distance matrix.
dist_long(aln, order = NULL, dist = "N", tree = NULL)
aln |
An alignment, for example an object of class DNAbin such as woodmouse |
order |
A character vector of size n with the order of the columns and rows (default: NULL) |
dist |
A string naming the model to calculate distances (accepted values are those in ape::dist.dna) |
tree |
An object of class phylo |
If a tree is optionally given, a fourth column is returned with the cophenetic distance across all elements of tree. It assumes the tree was generated from the alignment.
A data.frame with three or four columns: (1) iso1; (2) iso2; (3) dist. If a tree is given, a fourth column (evol_dist) containing the distances from the tree is also supplied.
## Not run:
data(woodmouse)
dist_df <- dist_long(woodmouse)
## End(Not run)
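To make the shape of the dist_long output concrete, here is a minimal base R sketch of the same idea on a toy alignment. It is not the harrietr implementation: the alignment is invented, the distance is a plain count of differing sites (the idea behind model "N" in ape::dist.dna), and which member of a pair lands in iso1 versus iso2 is an assumption.

```r
# Toy alignment (hypothetical data): one row per sequence, one column per site.
aln <- rbind(
  s1 = c("a", "c", "g", "t", "a"),
  s2 = c("a", "c", "g", "t", "t"),
  s3 = c("a", "t", "g", "c", "t")
)

# Enumerate every unordered pair once; this gives n * (n - 1) / 2 rows directly.
pairs <- t(combn(rownames(aln), 2))

# Count the sites that differ within each pair.
dist_df <- data.frame(
  iso1 = pairs[, 1],
  iso2 = pairs[, 2],
  dist = apply(pairs, 1, function(p) sum(aln[p[1], ] != aln[p[2], ])),
  stringsAsFactors = FALSE
)
print(dist_df)
```

With three sequences the result has 3 * 2 / 2 = 3 rows, one per pair.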
In IQTREE it is possible to obtain node support values from SH approximate likelihood ratio tests (SH-aLRT) and from ultrafast bootstraps (uBS). Often we do both, which IQTREE encodes in the internal node label as two numbers separated by a '/'. This function returns a data.frame with the number of each internal node and its two support values.
get_node_support(tree)
tree |
An object of type phylo generated using IQTREE |
A data.frame with internal node information, plus two columns: (1) SH-aLRT; and (2) uBS
## Not run:
data(woodmouse_iqtree)
get_node_support(woodmouse_iqtree)
## End(Not run)
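The label format described above can be split with base R alone. The sketch below is illustrative and is not the harrietr source: the labels, the number of tips, and the column names sh_alrt and ubs are all hypothetical. It relies on the 'ape' convention that internal nodes are numbered starting at the number of tips plus one.

```r
# Hypothetical internal node labels in the "SH-aLRT/uBS" form.
# The root usually carries no support, so its label is empty here.
node_labels <- c("", "98.6/100", "75.2/88", "100/100")
n_tips <- 5  # assumed tip count; internal nodes start at n_tips + 1

# Split each label on the slash; an empty label yields no parts at all.
parts <- strsplit(node_labels, "/", fixed = TRUE)

support <- data.frame(
  node    = n_tips + seq_along(node_labels),
  sh_alrt = vapply(parts, function(p) as.numeric(p[1]), numeric(1)),
  ubs     = vapply(parts, function(p) as.numeric(p[2]), numeric(1))
)
print(support)
```

Nodes without a label end up with NA in both support columns, which keeps them easy to filter out before plotting.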
harrietr package
harrietr: Wrangle Phylogenetic Distance Matrices and Other Utilities
See the README on CRAN or GitHub
This function takes the output from dist_long, plus a data.frame with metadata, and attaches the metadata to the dist_long output. It uses a column in the metadata data.frame as a key to join the two data.frames, so that column must contain the same ID labels as those in the pairwise comparison table.
join_metadata(dist, meta, isolate = "ISOLATES", group = "CLUSTER", remove_ind = TRUE, measure_col_contains = "dist")
dist |
A data.frame produced by dist_long function |
meta |
A data.frame with one column of IDs that match the IDs in |
isolate |
A character string with the name of the column in the meta data.frame with the ID data |
group |
A character string with the name of the column containing the grouping variable |
remove_ind |
A boolean indicating whether to remove all non-essential columns |
measure_col_contains |
A character string with a pattern that matches up with the measurement columns you wish to retain in the final output (default: 'dist') |
The output from dist_long with an additional column containing a factor, whose levels are formed by joining the categories in the group column of the metadata data.frame for each pairwise comparison. For example, if one row has the distance between samples id1 and id2, and in the grouping column of the metadata id1 is identified as part of group 'A' and id2 as part of group 'B', then the output for that row will be 'AB'. If they were from the same group, say 'A', the output would be just 'A'. In this way it is easy to distinguish pairs of isolates that are from the same group from pairs of isolates that are from different groups.
## Not run:
data(woodmouse)
data(woodmouse_meta)
dist_df <- dist_long(woodmouse)
join_metadata(dist_df, woodmouse_meta, isolate = 'SAMPLE_ID', group = 'CLUSTER', remove_ind = TRUE)
## End(Not run)
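The group labelling described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the harrietr implementation: both tables are invented, and sorting the two labels before pasting them (so that A/B and B/A produce the same level) is an assumption of the sketch.

```r
# Hypothetical pairwise table, in the shape dist_long returns.
dist_df <- data.frame(
  iso1 = c("id2", "id3", "id3"),
  iso2 = c("id1", "id1", "id2"),
  dist = c(1, 3, 2),
  stringsAsFactors = FALSE
)

# Hypothetical metadata keyed by the isolate ID.
meta <- data.frame(
  SAMPLE_ID = c("id1", "id2", "id3"),
  CLUSTER   = c("A", "B", "A"),
  stringsAsFactors = FALSE
)

# Look up the group of each member of the pair.
g1 <- meta$CLUSTER[match(dist_df$iso1, meta$SAMPLE_ID)]
g2 <- meta$CLUSTER[match(dist_df$iso2, meta$SAMPLE_ID)]

# Same group: keep the single label. Different groups: paste both labels,
# sorted so the result does not depend on which isolate is listed first.
dist_df$CLUSTER <- factor(
  ifelse(g1 == g2, g1, paste0(pmin(g1, g2), pmax(g1, g2)))
)
print(dist_df)
```

Filtering on single-letter levels then selects within-group pairs, and the remaining levels identify between-group pairs.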
This takes a square distance matrix and transforms it into long format. It removes the upper triangle and the diagonal elements, so you end up with only (n)*(n-1)/2 rows, where n is the total number of rows in the distance matrix.
melt_dist(dist, order = NULL, dist_name = "dist")
dist |
An object of class matrix, it must be square |
order |
A character vector of size n with the order of the columns and rows (default: NULL) |
dist_name |
A string to name the distance column in the output (default: dist) |
A data.frame with three columns: (1) iso1; (2) iso2; (3) dist. iso1 and iso2 indicate the pair being compared, and dist indicates the distance between that pair.
## Not run:
data(woodmouse)
dist <- ape::dist.dna(woodmouse, model = 'N', as.matrix = TRUE)
dist_df <- melt_dist(dist)
## End(Not run)
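The lower-triangle reshaping can be sketched with base R indexing. The helper melt_lower below is hypothetical and is not the harrietr source; it only mirrors the order and dist_name arguments described above, and the matrix values are invented.

```r
# Hypothetical helper: melt the strict lower triangle of a square,
# named matrix into a long data.frame.
melt_lower <- function(m, order = NULL, dist_name = "dist") {
  stopifnot(is.matrix(m), nrow(m) == ncol(m))
  if (!is.null(order)) {
    m <- m[order, order]  # reorder rows and columns by name
  }
  idx <- which(lower.tri(m), arr.ind = TRUE)  # excludes the diagonal
  out <- data.frame(
    iso1 = rownames(m)[idx[, "row"]],
    iso2 = colnames(m)[idx[, "col"]],
    stringsAsFactors = FALSE
  )
  out[[dist_name]] <- m[idx]
  out
}

ids <- c("a", "b", "c", "d")
m <- matrix(c(0, 2, 5, 7,
              2, 0, 4, 6,
              5, 4, 0, 3,
              7, 6, 3, 0),
            nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

long <- melt_lower(m, order = c("d", "c", "b", "a"), dist_name = "snps")
n <- nrow(m)
print(nrow(long) == n * (n - 1) / 2)  # one row per unordered pair
```

Reordering before melting changes which member of each pair appears in iso1, which can matter when the long table feeds a plot with ordered axes.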
Woodmouse dataset
woodmouse
An object of class DNAbin with 15 rows and 965 columns.
"ape" package woodmouse
Generated a multiFASTA, and used IQTREE to generate a tree with the following command:
woodmouse_iqtree
An object of class phylo of length 5.
iqtree -s woodmouse.fasta -m TEST -nt 4 -bb 1000 -alrt 1000
The tree was loaded into 'R' using 'ape::read.tree', and saved to a dataset.
"ape" package woodmouse
A dummy metadata table generated to demonstrate the use of join_metadata.
woodmouse_meta
An object of class tbl_df (inherits from tbl, data.frame) with 15 rows and 2 columns.