Wednesday, February 25, 2015

Three years of network blogging


Today is the third anniversary of starting this blog, and this is post number 325. Thanks to all of our visitors over the past three years — we hope that the next year will be as productive as this past one has been.

I have summarized here some of the accumulated data, in order to document at least some of the productivity.

As of this morning, there have been 238,613 pageviews, with a median of 192 per day. The blog has continued to grow in popularity, with a median of 70 pageviews per day in the first year, 189 per day in the second year, and 353 per day in the third year. The range of pageviews was 172-1148 per day during this past year. The daily pattern for the three years is shown in the first graph.

Line graph of the number of pageviews through time, up to today.
The largest values are off the graph. The green line is the half-way mark.
The inset shows the mean (blue) and standard deviation of the daily number of pageviews.

There are a few general patterns in the data, the most obvious one being the day of the week, as shown in the inset of the above graph. The posts have usually been on Mondays and Wednesdays, and these two days have had the greatest mean number of pageviews.
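The day-of-week summary in the inset can be reproduced from a table of daily counts. The following is a minimal sketch only: the function name `weekday_summary` and the two weeks of example counts are hypothetical, and are not the blog's actual data.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean, stdev

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def weekday_summary(daily_counts):
    """Return {weekday name: (mean, standard deviation)} of daily pageviews.

    daily_counts maps a datetime.date to the pageviews counted on that day.
    """
    by_weekday = defaultdict(list)
    for day, views in daily_counts.items():
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        by_weekday[WEEKDAY_NAMES[day.weekday()]].append(views)
    return {
        name: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for name, values in by_weekday.items()
    }


# Hypothetical counts for two weeks, starting on Monday 2 February 2015.
start = date(2015, 2, 2)
example_views = [400, 310, 420, 300, 280, 200, 210,
                 380, 290, 440, 310, 260, 190, 220]
example_counts = {start + timedelta(days=i): views
                  for i, views in enumerate(example_views)}

summary = weekday_summary(example_counts)
for name in WEEKDAY_NAMES:
    avg, sd = summary[name]
    print(f"{name}: mean {avg:.1f}, sd {sd:.1f}")
```

With real data, the dictionary would simply hold one entry per day of the blog's history.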

The more obvious dips occur at times such as the period from Christmas to New Year, and the biggest peaks are associated with mentions of particular blog posts on popular sites.

Unfortunately, the data are also seriously skewed by visits from troll sites. These have come particularly from Ukraine, which is solely responsible for the peak between days 900 and 1000. The smaller following peak represents visits from Taiwan.

The posts themselves have varied greatly in popularity, as shown in the next graph. It is actually a bit tricky to assign pageviews to particular posts, because visits to the blog's homepage are not attributed by the counter to any specific post. Since the two most recent posts are the ones that appear on the homepage, these posts are under-counted until they move off the homepage (after which they can be accessed only by a direct visit to their own pages, and thus always get counted). On average, 30% of the blog's pageviews are to the homepage, rather than to a specific post page, and so there is considerable under-counting.
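To give a rough sense of the scale of the under-counting, the 30% figure can be applied to the overall total quoted earlier. This is only a sketch: applying the average share to the grand total is an assumption, and `split_pageviews` is a hypothetical helper, not part of any counter software.

```python
def split_pageviews(total_pageviews, homepage_share):
    """Split a pageview total into (homepage views, post-page views)."""
    if not 0 <= homepage_share <= 1:
        raise ValueError("homepage_share must be between 0 and 1")
    homepage = round(total_pageviews * homepage_share)
    return homepage, total_pageviews - homepage


# 238,613 total pageviews and a 30% homepage share are the figures in the text;
# treating the average share as exact for the whole period is an assumption.
homepage_views, post_views = split_pageviews(238_613, 0.30)
print(f"Homepage (not attributed to any post): about {homepage_views:,}")
print(f"Attributed to specific posts: about {post_views:,}")
```

The homepage views cannot be divided among individual posts, so this estimates only the total that is missing from the per-post counts.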

Scatterplot of post pageviews through time, up to last week; the line is the median.
Note the log scale, and that the values are under-counted (see the text).

It is good to note that the most popular posts were scattered throughout the years. Keeping in mind the initial under-counting, the top collection of posts (with counted pageviews) have been:
Post   Title                                               Pageviews
129    The Music Genome Project is no such thing           8,347
42     Charles Darwin's unpublished tree sketches          5,271
172    The acoustics of the Sydney Opera House             5,052
10     Why do we still use trees for the dog genealogy?    3,954
181    How do we interpret a rooted haplotype network?     3,644
73     Carnival of Evolution, Number 52                    2,398
58     Who published the first phylogenetic tree?          2,077
188    Phylogenetics with SpongeBob                        2,037
146    Charles Darwin's family pedigree network            2,011
98     Faux phylogenies                                    1,951
49     Evolutionary trees: old wine in new bottles?        1,870
29     Network analysis of scotch whiskies                 1,756
8      Tattoo Monday                                       1,747
This list is not very different to the same time last year. Posts 129 (which is linked in Wikipedia) and 172 continue to receive visitors almost every day.

The audience for the blog continues to be firmly in the USA. Based on the number of pageviews, the visitor data are:
Country              Share of pageviews
United States        40.3%
France               6.8%
Ukraine [spurious]   5.1%
Germany              5.0%
United Kingdom       4.7%
Russia               3.1%
Canada               1.8%
Australia            1.6%
China                1.0%
Turkey               0.7%

Finally, if anyone wants to contribute, then we welcome guest bloggers. This is a good forum to try out all of your half-baked ideas, in order to get some feedback, as well as to raise issues that have not yet received any discussion in the literature. If nothing else, it is a good place to be dogmatic without interference from a referee!

Monday, February 23, 2015

Darwin's Finches, genomics and phylogenetic networks


As a means of motivating his interest in speciation, in The Origin of Species Charles Darwin highlighted the diversity of morphological forms among the finches of the Galápagos Islands, in the south-eastern Pacific Ocean, which he visited while circumnavigating the world in The Beagle. He considered this to be a prime example of biodiversity related to adaptation and natural selection, what we would now call an adaptive radiation.

Recently, the following paper, which provides a genomic-scale study of these birds, has attracted considerable attention:
Lamichhaney S, Berglund J, Almén MS, Maqbool K, Grabherr M, Martinez-Barrio A, Promerová M, Rubin CJ, Wang C, Zamani N, Grant BR, Grant PR, Webster MT, Andersson L (2015) Evolution of Darwin's finches and their beaks revealed by genome sequencing. Nature 518: 371-375.
The authors note:
Darwin's finches are a classic example of a young adaptive radiation. They have diversified in beak sizes and shapes, feeding habits and diets in adapting to different food resources. The radiation is entirely intact, unlike most other radiations, none of the species having become extinct as a result of human activities.
Here we report results from whole genome re-sequencing of 120 individuals representing all Darwin's finch species and two closely related tanagers. For some species we collected samples from multiple islands. We comprehensively analyse patterns of intra- and inter-specific genome diversity and phylogenetic relationships among species. We find widespread evidence of inter-specific gene flow that may have enhanced evolutionary diversification throughout phylogeny, and report the discovery of a locus with a major effect on beak shape.
Sadly, the authors try to study the intra- and inter-specific variation principally using phylogenetic trees. They do this in spite of noting that:
Extensive sharing of genetic variation among populations was evident, particularly among ground and tree finches, with almost no fixed differences between species in each group.
Clearly, this situation requires a phylogenetic network for adequate study, as a network can always display at least as much phylogenetic information as a tree, and usually considerably more. The authors do recognize this:
A network constructed from autosomal genome sequences indicates conflicting signals in the internal branches of ground and tree finches that may reflect incomplete lineage sorting and/or gene flow ... We used PLINK to calculate genetic distance (on the basis of proportion of alleles identical by state) for all pairs of individuals separately for autosomes and the Z chromosome. We used the neighbour-net method of SplitsTree4 to compute the phylogenetic network from genetic distances.
However, this network is tucked away as Fig. 3 in the appendices. It is shown here in the first figure. The authors attribute the gene flow to introgression, but occasionally refer to hybridization and convergent evolution. Indeed, they suggest both relatively recent hybridization as well as the possibility of more ancient hybridization between warbler finches and other finches.
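The distance measure described in the quoted methods (based on the proportion of alleles identical by state) can be illustrated with a short sketch. This is not PLINK's implementation: it assumes biallelic genotypes coded as 0, 1 or 2 copies of one allele, with `None` for missing data, and the sample names and genotypes are invented for illustration.

```python
def ibs_distance(genotypes_a, genotypes_b):
    """Return 1 minus the proportion of alleles identical by state (IBS).

    Each genotype is 0, 1 or 2 (copies of one allele), or None if missing.
    Loci that are missing in either individual are skipped.
    """
    if len(genotypes_a) != len(genotypes_b):
        raise ValueError("both individuals must be typed at the same loci")
    shared = 0
    compared = 0
    for a, b in zip(genotypes_a, genotypes_b):
        if a is None or b is None:
            continue
        # Two diploid genotypes share 2, 1 or 0 alleles by state.
        shared += 2 - abs(a - b)
        compared += 2
    if compared == 0:
        raise ValueError("no loci with genotypes in both individuals")
    return 1 - shared / compared


def distance_matrix(samples):
    """Return {(name1, name2): distance} for all ordered pairs of samples."""
    return {(p, q): ibs_distance(samples[p], samples[q])
            for p in samples for q in samples}


# Hypothetical genotypes at six loci for three individuals.
example_samples = {
    "finch_1": [0, 1, 2, 2, 0, 1],
    "finch_2": [0, 1, 2, 1, 0, 1],
    "finch_3": [2, 1, 0, 0, None, 1],
}

distances = distance_matrix(example_samples)
for (p, q), d in sorted(distances.items()):
    if p < q:
        print(f"{p} vs {q}: {d:.3f}")
```

A pairwise matrix of this kind is the sort of input that a distance-based network method such as neighbour-net takes; the network construction itself is not sketched here.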


Clearly, this network is not particularly tree-like in places, especially with respect to the delimitation of species based on their morphology, as reflected in their current taxonomy. Nevertheless, the authors prefer to present as their main result a:
maximum-likelihood phylogenetic tree based on autosomal genome sequences ... We used FastTree to infer approximately maximum-likelihood phylogenies with standard parameters for nucleotide alignments of variable positions in the data set. FastTree computes local support values with the Shimodaira–Hasegawa test.
This tree is shown in the second figure.


This apparently well-supported tree is not a particularly accurate representation of the pattern shown by the network. Indeed, it makes clear just why it is inadequate to use a tree to study the interplay of intra- and inter-specific variation. Gene flow requires a network for accurate representation, not a tree.

The authors do acknowledge this situation. While they try to date the nodes on their tree, they do note that:
Although these estimates are based on whole-genome data, they should be considered minimum times, as they do not take into account gene flow.
Actually, in the face of gene flow the concept that a node has a specific date is illogical, because the nodes do not represent discrete events (see Representing macro- and micro-evolution in a network). Given the authors' final conclusion, it seems quite inappropriate to rely on trees rather than networks:
Evidence of introgressive hybridization, which has been documented as a contemporary process, is found throughout the radiation. Hybridization has given rise to species of mixed ancestry, in the past and the present. It has influenced the evolution of a key phenotypic trait: beak shape ... The degree of continuity between historical and contemporary evolution is unexpected because introgressive hybridization plays no part in traditional accounts of adaptive radiations of animals.