NeighborJoining

Documentation for NeighborJoining.

This package contains algorithms for neighbor joining

What is currently implemented?

  • regular Neighborjoining regNJ(): Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4, 406-425 (1987).
  • fast NeighborJoining fastNJ(): Li, J. F. A fast neighbor joining method. Genet Mol Res 14, 8733–8743 (2015).
    • uses heuristics, will not always find the additive tree

Installation

add git@github.com:BenjaminDoran/NeighborJoining.jl.git

Examples

# make distance matrix
d = rand(10, 10)^2
for i in 1:size(d,1) d[i,i]=0 end;

# regular neighbor joining
njclusts = regNJ(d)

# or fast neighbor joining
njclusts = fastNJ(d)

# convert to newicktree string for export to other packages
nwstring = newickstring(njclusts)

# adding labels to the leaves
labels = ["leaf $i" for i in 1:10]
nwstring = newickstring(njclusts, labels)

# adding labels to the internal nodes
nwstring = newickstring(njclusts, labels; labelinternalnodes=true)
NeighborJoining.NJClustType
NJClust(merges::Matrix{M}, heights::Matrix{H})

fields:

  • merges is a n-1 x 2 matrix of integers: absolute values of negative integers indicate index into the distance matrix (i.e., leaves). positive integers are the index into the merge list (i.e., the kth internal node)
  • heights is an n-1 x 2 matrix where each value is the distance from the left (1) or right (2) chield from its parent. Specifically heights[i,j] is the jth childs distance to the parent node, row i.
source
NeighborJoining.fastNJMethod
fastNJ(d::AbstractMatrix{<:Number})

fastNJ algorithm finds k independent pairs to merge and merges them for each iteration. This is significantly faster because it is not recalculating the full pairwise Q for each pair joined.

This algorithm is nearly additive, but there are instances where the topology '(a,(b,(c,d)))' is inferred as '((a,b), (c,d))'. This happens when 'b' is not as close to 'c' or 'd' as it is to 'a', but 'b' is closer to the common ancestor '(c,d)' than it is to 'a'.

args:

  • d is an n by n square symetric distance matrix

returns:

  • NJClust struct with fields merges and heights
source
NeighborJoining.newickstringFunction
newickstring(njc::NJClust, tiplabels=AbstractVector{<:String}; labelinternalnodes=false)
newickstring(merges::AbstractArray{<:Integer}, heights::AbstractArray{<:AbstractFloat}, tiplabels::AbstractVector{<:String}; labelinternalnodes=false)

Converts a list of merges into a newicktree formatted string.

args:

  • njc: is a struct that has merges and heights
  • merges and heights from a NJClust struct:
    • merges is a n-1 x 2 matrix of integers: absolute values of negative integers indicate index into the distance matrix (i.e., leaves). positive integers are the index into the merge list (i.e., the kth internal node)
    • heights is an n-1 x 2 matrix where each value is the distance from the left (1) or right (2) chield from its parent. Specifically heights[i,j] is the jth childs distance to the parent node, row i.
  • tiplabels: vector of string labels corresponding to the order of leaves in the distance matrix
  • labelinternalnodes: whether to generate node labels for the internal nodes. defaults to false.

returns:

  • newicktree formatted string

example:

d = rand(10, 10)^2
for i in 1:size(d,1) d[i,i]=0 end;
njclusts = regNJ(d)
nwstring = newicktstring(njclusts)
source
NeighborJoining.regNJMethod
regNJ(d::AbstractMatrix{<:Number})

regNJ algorithm is the traditional NeighborJoining algorithm from

Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4, 406-425 (1987).

This algorithm is guarenteed to infer the tree for additive distance matrices, but it does have an algorithmic complexity of O(n^3), so it can be slow for distance matrices on the order of >10³.

args:

  • d is an n by n square symetric distance matrix

returns:

  • NJClust struct with fields merges and heights
source