Wednesday, October 22, 2014

Is phylogenomics tree-like?


Phylogenomics, the idea of applying genomic data to phylogenetic studies, has been around for quite a while now (Eisen 1998), although it was probably Rokas et al. (2003) who drew the first widespread attention among phylogeneticists. Molecular phylogenetics started off using the sequence of a single locus (often small-subunit rRNA) as the data, and slowly progressed from there to multiple loci. Currently, it is considered good practice to use half-a-dozen loci, sampling the main genomes (nucleus, mitochondrion, plastid); and genomics offers the possibility of a fast and cost-effective means of generating large amounts of multi-locus sequence data.


Review papers are beginning to appear based explicitly on next-generation sequencing (NGS), such as those of Lemmon & Lemmon (2013) and McCormack et al. (2013), replacing the earlier work of Philippe et al. (2005), and there are suggestions for how phylogenetic analyses might need to change in response to NGS data (Chan and Ragan 2013). These all treat phylogenomics as being very similar to traditional molecular phylogenetics, in the sense that many people are expecting phylogenomics to provide tree-like resolution of questions that remain unresolved with the current smaller datasets. In the words of Rokas et al. (2003), phylogenomics is intent on "resolving incongruence in molecular phylogenies". That is, incongruent gene trees are seen as the major obstacle to be overcome by phylogenetic data analysis (see also Jeffroy et al. 2006).

However, this might be a naive expectation. After all, the existing phylogenetic conflicts are there for a reason. If we cannot resolve certain parts of organismal history in terms of a phylogenetic tree when we use the current levels of multi-locus data (say <10 loci), then there is no real reason to think that this will happen just because we increase the number of loci. There are plenty of other reasons for incongruence among genes, the most obvious one being that the history is not tree-like in the first place. The advantage of phylogenomics, then, would be its ability to clarify the phylogenetic history rather than to resolve incongruence.

There are now quite a few published empirical phylogenomic studies, which allows us to provide a preliminary answer to the question of whether phylogenomic patterns are tree-like or not. There are a few published studies where the authors claim resolution in terms of a tree, at least for part of their phylogeny (e.g. Wang et al. 2012), but it seems to me that there are far more studies where the incongruence remains even with genomic data. Below, I briefly introduce a few arbitrarily chosen examples.

So, complex genealogical problems often remain complex even after using genomic data. We haven’t "solved" any of the so-called genealogy problems; we have simply made clear in what way they are complex. That is, genomic data generally reveal reticulate evolutionary histories, not simple tree-like ones.

This leads me to conclude that phylogenomics is about reticulate evolution, and it is thus time for phylogeneticists to abandon trees as a model for genealogies. We have probably already resolved most of the simple tree-like genealogical patterns, using non-genomic data, and from here on we will be using genomics to study gene flow in addition to parental gene inheritance.

Examples

(1) Galtier and Daubin (2008) were among the earliest researchers to try to "deal with incongruence in phylogenomic analyses", and one of their examples was the long-standing problem of deciphering the relationships among the closest relatives of humans. However, the genomic data make it clear that, while humans share slightly more genes with chimpanzees than with other great apes, we still share some with gorillas but not chimpanzees, and with orangutans but not either chimpanzees or gorillas. Also, chimpanzees share some genes with gorillas that we do not share. The situation is now clearer, but the tree incongruence remains.
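
For readers who like to see the bookkeeping, here is a minimal Python sketch of what "incongruence among gene trees" looks like for three taxa. With three taxa there are exactly three possible rooted trees, each defined by which pair are sisters, so the data can be summarized as the proportion of genes supporting each pair. The function names are my own, and the gene-tree counts are invented purely for illustration; they are not taken from Galtier and Daubin (2008) or any other study.

```python
from collections import Counter
from itertools import combinations


def rooted_triplet_topologies(taxa):
    """Return the three rooted topologies for three taxa, each identified by its sister pair."""
    if len(taxa) != 3:
        raise ValueError("expected exactly three taxa")
    return [frozenset(pair) for pair in combinations(taxa, 2)]


def topology_proportions(gene_tree_pairs, taxa):
    """Proportion of gene trees supporting each of the three possible sister pairs."""
    topologies = rooted_triplet_topologies(taxa)
    counts = Counter(frozenset(pair) for pair in gene_tree_pairs)
    total = sum(counts[t] for t in topologies)
    if total == 0:
        raise ValueError("no gene trees match the given taxa")
    return {tuple(sorted(t)): counts[t] / total for t in topologies}


# Hypothetical counts, made up for illustration only (not real data).
hypothetical_gene_trees = (
    [("human", "chimpanzee")] * 70
    + [("human", "gorilla")] * 15
    + [("chimpanzee", "gorilla")] * 15
)

proportions = topology_proportions(
    hypothetical_gene_trees, ["human", "chimpanzee", "gorilla"]
)
for pair, share in proportions.items():
    print(pair, round(share, 2))
```

Adding more loci to a tally like this narrows the uncertainty on each proportion, but it does not by itself push the minority proportions to zero, which is the sense in which the incongruence remains.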


(2) At the same time, Kuo et al. (2008) looked at the then-available genomes for members of the Apicomplexa, which are unicellular eukaryotic parasites. The genomic data confirmed the current groupings of Haemosporidians, Piroplasmids and Coccidians (shown as branches with high support in the diagram) but completely failed to resolve the relationships between these groups (shown as branches with low support). Things are no better today, when we have at least some data for 35 genomes.


(3) The relationships among mammal superorders, particularly the placentals, have been an ongoing area of debate. I have already covered this in some previous blog posts, notably Conflicting placental roots: network or tree? and Why are there conflicting placental roots? There are three possible ways of resolving a tree at the root of the placental phylogeny, and genomic datasets seem to support all three of them; the different published trees are therefore based on variation in the model used for data analysis. As Hallström and Janke (2010) have noted, there was probably incomplete lineage sorting and hybridization in the early placental mammalian divergences, rather than a truly tree-like history.

(4) Dell'Ampio et al. (2014) looked at the phylogenetic relationships of the wingless insects, and tried to come to grips with the incongruence among genes. They considered three main tree-based hypotheses for the relationships, and found that genomic support was pretty evenly spread among the three topologies. They dryly note that after their hard work the relationships "are still considered unresolved."


(5) Relationships among hominids have been a popular topic of study for many years, and not unexpectedly there has been a burst of activity as a result of genomic data, especially as there are now SNP micro-arrays available to simplify the data collection. I have covered this in previous posts as well, notably Why do we still use trees for the Neandertal genealogy? The bottom line is that the genomic data provide evidence of extensive introgression (or admixture) between humans and their nearest relatives throughout their time of co-existence. This example is from Reich et al. (2011).


References

Chan CX, Ragan MA (2013) Next-generation phylogenomics. Biology Direct 8: 3.

Dell'Ampio E, Meusemann K, Szucsich NU, Peters RS, Meyer B, Borner J, Petersen M, Aberer AJ, Stamatakis A, Walzl MG, Minh BQ, von Haeseler A, Ebersberger I, Pass G, Misof B (2014) Decisive data sets in phylogenomics: lessons from studies on the phylogenetic relationships of primarily wingless insects. Molecular Biology and Evolution 31: 239-249.

Eisen JA (1998) Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Research 8: 163-167.

Galtier N, Daubin V (2008) Dealing with incongruence in phylogenomic analyses. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences 363: 4023-4029.

Hallström BM, Janke A (2010) Mammalian evolution may not be strictly bifurcating. Molecular Biology and Evolution 27: 2804-2816.

Jeffroy O, Brinkmann H, Delsuc F, Philippe H (2006) Phylogenomics: the beginning of incongruence? Trends in Genetics 22: 225-231.

Kuo C-H, Wares JP, Kissinger JC (2008) The Apicomplexan whole-genome phylogeny: an analysis of incongruence among gene trees. Molecular Biology and Evolution 25: 2689-2698.

Lemmon EM, Lemmon AR (2013) High-throughput genomic data in systematics and phylogenetics. Annual Review of Ecology, Evolution, and Systematics 44: 99-121.

McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT (2013) Applications of next-generation sequencing to phylogeography and phylogenetics. Molecular Phylogenetics and Evolution 66: 526-538.

Philippe H, Delsuc F, Brinkmann H, Lartillot N (2005) Phylogenomics. Annual Review of Ecology, Evolution, and Systematics 36: 541-562.

Reich D, Patterson N, Kircher M, Delfin F, Nandineni MR, Pugach I, Ko AM, Ko Y-C, Jinam TA, Phipps ME, Saitou N, Wollstein A, Kayser M, Pääbo S, Stoneking M (2011) Denisova admixture and the first modern human dispersals into Southeast Asia and Oceania. American Journal of Human Genetics 89: 516-528.

Rokas A, Williams BL, King N, Carroll SB (2003) Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425: 798-804.

Wang N, Braun EL, Kimball RT (2012) Testing hypotheses about the sister group of the Passeriformes using an independent 30 locus dataset. Molecular Biology and Evolution 29: 737-750.

Monday, October 20, 2014

Beer family trees


Some time ago I wrote a blog post about The bourbon family forest, which contained a collection of trees that, rather than being genealogical trees, instead showed the corporate ownership of American whiskey.

Here is a similar arrangement for "the six companies that make 50% of the world's beer", produced by David Yanofsky at the Quartz blog. As before, the vertical axis is actually a time scale, but the trees are only marginally family trees in the genealogical sense. Note that there is a reticulation between two of the trees for the "Scottish & Newcastle" entry, although this was apparently followed immediately by a subsequent divergence.


Nevertheless, roughly the same sort of information could actually be presented as proper genealogies. Here is an example from Philip Howard's blog, restricted to American beer. Note that the genealogies refer to the joining of branches through time, rather than their splitting. There are two reticulation events, one of which also refers to the "Scottish & Newcastle" entry.


It is also worth noting the use of other types of network by Philip Howard, to look at: