Wednesday, April 1, 2015

The first post-Darwinian phylogeny

It is tolerably well known that Alfred Russel Wallace developed the idea of evolution via natural selection quite independently of Charles Darwin, and that, indeed, it was Wallace's revelation of this fact that prompted Darwin to finally publish his ideas (Bannister et al. 2014).

Some people are even aware that Wallace developed the Tree of Life metaphor independently, as well (Wallace 1855), a fact of which Darwin himself was perfectly well aware (e.g. Bradman and Bartlett 1998):
"the analogy of a branching tree [is] the best mode of representing the natural arrangement of species ... a complicated branching of the lines of affinity, as intricate as the twigs of a gnarled oak ... we have only fragments of this vast system, the stem and main branches being represented by extinct species of which we have no knowledge, while a vast mass of limbs and boughs and minute twigs and scattered leaves is what we have to place in order, and determine the true position each originally occupied with regard to the others."
What is less well known is Wallace's contribution to phylogenetic imagery.

The Darwinian version of a phylogenetic tree is, of course, something usually considered to post-date 1859, when Darwin published his best-known book. However, producing such a tree was apparently a rather slow process. For example, in 1863, Franz Hilgendorf wrote a PhD thesis for which he produced a hand-drawn phylogeny, but he did not actually include this in the thesis; and he significantly modified it for its publication in 1866. In 1864 Fritz Müller published a couple of three-taxon trees. Also in 1864, Ernst Haeckel claimed to have started work on his series of phylogenetic trees, but the resulting book was not published until 1866. This means that the first substantial tree to appear in print was that of Mivart (1865).

However, long before this, Wallace was already moving ahead. In 1856 Wallace took the tree imagery from his 1855 publication and applied it to the relationships among bird groups. This publication was his first clearly evolutionary empirical contribution. He adapted the unrooted diagram of Strickland (1841), which represented "the natural system" of bird relationships, and gave it a clearly evolutionary interpretation. So, while Strickland's work was strictly atemporal and non-evolutionary, Wallace produced an evolutionary view of the world, with his two trees representing the end-product of change through time.

Wallace was in South-East Asia at the time of this work, collecting specimens among the islands of what is now Indonesia. He returned to England in 1862, thus having been absent during Darwin's rise to fame. However, he did return before anyone else had tackled Darwin's ideas empirically, and he was in an ideal position to do so himself (Beckenbauer et al. 2010). It would therefore be surprising if he had not done so.

Recently, it has become clear, as a result of the work done for the Wallace Correspondence Project, that Wallace did, indeed, produce a post-Darwinian phylogenetic diagram before any of his contemporaries, although it remained unpublished (Becker and Borg 2014). Not unexpectedly, it also refers to the relationships among birds. What is most interesting for us, however, is that it was a phylogenetic network, not a tree.

You will note that it is an unrooted network, in the same manner as his unrooted bird trees from 1856. In this, his presentation differed from that of Müller, Hilgendorf, Mivart and Haeckel, who all indicated a common ancestor. On the other hand, the branch lengths represent the "relative amount of affinity" between the named taxa, unlike the diagrams of his contemporaries. This means that the diagram can, indeed, be interpreted (in modern terms) as an unrooted phylogenetic network.

In his bird paper, Wallace (1856) had noted that producing the tree diagrams is not easy, as "you will most likely find that you have set down some conflicting affinities, or that you have mistaken some mere analogies for affinities". This seems to be the origin of his interest in the alternative model of a network, rather than a tree (Brabham and Berger 2014), thus making him the first person to use a data-display network to represent conflicting character data.

This post was inspired by the work of Torvill and Dean (1996). Happy April 1.


Bannister RG, Ballesteros-Sota S, Bjørndalen OE (2014) Running, swinging and skiing — the private life of Alfred Russel Wallace. Studia Wallaceana 6: 82-96.

Becker BF, Borg BR (2014) The phylogenetics of A.R. Wallace, and its relation to the science of tennis. Journal of Phylogenetic Inference 13: 101-110.

Beckenbauer FA, Best G, Bruyneel J (2010) Association football as a metaphor for phylogenetics. Is it a sport or a science? Phyloinformatics 7:1.

Brabham JA, Berger G (2014) The speed required to achieve the publication rate of A.R. Wallace. Philosophy and History of Biology 102: 89-92.

Bradman DG, Bartlett KC (1998) Wallace Down Under: the work of Alfred Russel Wallace in the southern hemisphere. Systematic Zoology 47: 767-780.

Haeckel E (1866) Generelle Morphologie der Organismen. Verlag von Georg Reimer, Berlin.

Hilgendorf F (1866) Planorbis multiformis im Steinheimer Süßwasserkalk: ein Beispiel von Gestaltveränderung im Laufe der Zeit. Buchhandlung von W. Weber, Berlin.

Mivart StG (1865) Contributions towards a more complete knowledge of the axial skeleton in the primates. Proceedings of the Zoological Society of London 33: 545-592.

Müller F (1864) Für Darwin. Verlag von Wilhelm Engelmann, Leipzig.

Strickland HE (1841) On the true method of discovering the natural system in zoology and botany. Annals and Magazine of Natural History 6: 184-194.

Torvill J, Dean CC (1996) Skating on thin ice. Systematic Biology 45: 641-650.

Wallace AR (1855) On the law which has regulated the introduction of new species. Annals and Magazine of Natural History 16 (2nd series): 184-196.

Wallace AR (1856) Attempts at a natural arrangement of birds. Annals and Magazine of Natural History 18 (2nd series): 193-216.

Monday, March 30, 2015

Inconsequential splits in NeighborNet graphs

NeighborNet produces splits graphs based on distances between the taxa, rather than using the original character data. This approach can produce what we might call inconsequential splits in the graph — that is, splits that are not explicitly supported by the character data. Here, I present a simple example to illustrate the extent to which this can occur.

The data are taken from: Nanette Thomas, Jeremy J. Bruhl, Andrew Ford, Peter H. Weston (2014) Molecular dating of Winteraceae reveals a complex biogeographical history involving both ancient Gondwanan vicariance and long-distance dispersal. Journal of Biogeography 41: 894-904.

This dataset consists of a set of eight morphological features of the pollen from 31 extant plant taxa plus two fossil samples, as shown in this data matrix:

T_lanceolata        00111011
T_stipitata         00111011
T_purpurescens      00111011
T_xerophila_x       00111011
T_xerophila_r       00111011
T_vickeriana        00111011
T_glaucifolia       00111011
T_membranea         00111011
T_insipida          00111011
----------------------------
T_perrieri          00111010
D_winteri           00111010
D_grenadensis       00111010
----------------------------
B_comptonii         00011010
B_howeana           00011010
B_semicarpoides     00011010
B_whiteana          00011010
B_queenslandiana_q  00011010
B_queenslandiana_1  00011010
----------------------------
P_axillaris         00011011
P_colorata          00011011
Pseudowinterapollis 00011011
----------------------------
B_pancheri          01001011
----------------------------
Harrisipollenites   01001100
----------------------------
Z_acsmithii         01001101
E_stipitatum        01001101
Z_bicolor           01001101
----------------------------
Z_balansae          11001101
----------------------------
C_dinisii           1-111101
C_madagascariensis  1-111101
W_salutaris         1-111101
P_macranthum        1-111101
C_ekmanii           1-111101
C_winterana         1-111101

Note that there are only nine groups of taxa (separated by the dashed lines) — within each group the data are identical. Each character has two states: present / absent.
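The grouping can be checked directly from the matrix. The following is a minimal Python sketch (not part of the original analysis) that collects the taxa by their character string; the variable names are my own, and the matrix is simply copied from above.

```python
from collections import defaultdict

# Pollen character matrix copied from the post: taxon name, then 8 characters.
MATRIX = """
T_lanceolata        00111011
T_stipitata         00111011
T_purpurescens      00111011
T_xerophila_x       00111011
T_xerophila_r       00111011
T_vickeriana        00111011
T_glaucifolia       00111011
T_membranea         00111011
T_insipida          00111011
T_perrieri          00111010
D_winteri           00111010
D_grenadensis       00111010
B_comptonii         00011010
B_howeana           00011010
B_semicarpoides     00011010
B_whiteana          00011010
B_queenslandiana_q  00011010
B_queenslandiana_1  00011010
P_axillaris         00011011
P_colorata          00011011
Pseudowinterapollis 00011011
B_pancheri          01001011
Harrisipollenites   01001100
Z_acsmithii         01001101
E_stipitatum        01001101
Z_bicolor           01001101
Z_balansae          11001101
C_dinisii           1-111101
C_madagascariensis  1-111101
W_salutaris         1-111101
P_macranthum        1-111101
C_ekmanii           1-111101
C_winterana         1-111101
"""

# Tokens alternate between taxon name and character string.
tokens = MATRIX.split()
taxa = dict(zip(tokens[0::2], tokens[1::2]))

# Collect the taxa that share exactly the same character string.
groups = defaultdict(list)
for name, pattern in taxa.items():
    groups[pattern].append(name)

for pattern, names in groups.items():
    print(pattern, len(names), ", ".join(names))
print(len(taxa), "taxa in", len(groups), "groups")
```

Any splits graph built from these data can therefore distinguish at most nine distinct objects, however many labels appear on it.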

The resulting NeighborNet, as produced by default using the SplitsTree4 program, is shown in the first graph.

As expected, the taxa form nine groups. There are a number of apparently well-supported splits (i.e. with long edges) separating these groups. There are also a number of smaller splits, and a whole series of very tiny splits. Neither of these latter two sets of splits is explicitly present in the dataset: the only splits supported by the characters are plotted onto the graph using the character numbers. (Note that character 5 is uninformative.)
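The splits that the characters themselves support can be listed with a short Python sketch. It uses one row per group of identical taxa (numbered 1 to 9 in matrix order, which is my own labelling), and it treats "-" as missing, leaving that group out of the split for that character; both choices are assumptions made for illustration.

```python
# One character string per group of identical taxa, in matrix order.
PATTERNS = [
    "00111011",  # group 1
    "00111010",  # group 2
    "00011010",  # group 3
    "00011011",  # group 4
    "01001011",  # group 5
    "01001100",  # group 6
    "01001101",  # group 7
    "11001101",  # group 8
    "1-111101",  # group 9
]


def character_splits(patterns):
    """Return {character number: (groups with state 0, groups with state 1)}.

    Constant characters are omitted because they define no split.
    Groups scored "-" for a character are left out of that split.
    """
    splits = {}
    for i in range(len(patterns[0])):
        zeros = [g for g, p in enumerate(patterns, start=1) if p[i] == "0"]
        ones = [g for g, p in enumerate(patterns, start=1) if p[i] == "1"]
        if zeros and ones:
            splits[i + 1] = (zeros, ones)
    return splits


splits = character_splits(PATTERNS)
for char, (zeros, ones) in splits.items():
    print(f"character {char}: {zeros} | {ones}")

constant = [c for c in range(1, 9) if c not in splits]
print("constant characters:", constant)
```

Under these assumptions, character 5 is the only constant character, and characters 6 and 7 define the same bipartition (with the states swapped), so the seven variable characters support at most six distinct splits.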

The series of very tiny splits are present throughout the graph as extremely short edges. For example, a detailed view of the bottom left-hand corner of the graph is shown in the next figure.

Note that these six taxa have identical character data, and therefore their separation into four groups is entirely an artifact of the NeighborNet algorithm.
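That identical rows give the algorithm nothing to work with can be seen from the distances themselves. The Python sketch below computes a simple normalised Hamming distance, skipping any position scored "-"; this is an assumed stand-in for whatever distance SplitsTree4 applies by default, not a reproduction of it. The matrix contains two groups of six identical taxa, and both are checked here, since the text alone does not identify which one appears in the figure.

```python
from itertools import combinations

# The two six-taxon groups from the matrix, keyed by their shared characters.
SIX_TAXON_GROUPS = {
    "00011010": ["B_comptonii", "B_howeana", "B_semicarpoides",
                 "B_whiteana", "B_queenslandiana_q", "B_queenslandiana_1"],
    "1-111101": ["C_dinisii", "C_madagascariensis", "W_salutaris",
                 "P_macranthum", "C_ekmanii", "C_winterana"],
}


def hamming(a, b, missing="-"):
    """Proportion of differing characters, ignoring positions scored missing."""
    compared = [(x, y) for x, y in zip(a, b) if missing not in (x, y)]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)


# Map every taxon to its character string.
rows = {name: pattern
        for pattern, names in SIX_TAXON_GROUPS.items()
        for name in names}

# Largest pairwise distance inside each group of identical taxa.
largest_within = {}
for pattern, names in SIX_TAXON_GROUPS.items():
    distances = [hamming(rows[a], rows[b]) for a, b in combinations(names, 2)]
    largest_within[pattern] = max(distances)
    print(pattern, "pairs:", len(distances), "largest distance:", max(distances))

# For contrast, the distance between the two groups.
pattern_b, pattern_c = SIX_TAXON_GROUPS
between = hamming(pattern_b, pattern_c)
print("between groups:", round(between, 3))
```

Every within-group distance is zero, so any edge that separates members of such a group has no basis in the distances either.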

So, one needs to be careful when interpreting small splits in such a graph: they may have biological support, or they may not.