Monday, May 28, 2018

Comparing reconstruction systems in historical linguistics

The term linguistic reconstruction has a very specific meaning in historical linguistics, usually referring to the techniques used to infer how a given language was originally pronounced, even though it has not been attested in written sources. In previous posts, I have occasionally pointed to reconstructed forms, the so-called proto-forms, which linguists usually mark as such by putting an asterisk in front of them. For example, Indo-European *ph₂tér- is a reconstructed proto-form for the supposed Indo-European word for "father".

While the reconstruction techniques are usually limited to languages for which we have no written record, they can in principle also be applied in order to find out how ancient languages like, for example, Latin and Greek, were pronounced in detail (Sturtevant 1920). For languages like Chinese, whose writing system leaves almost no clues about pronunciation, linguistic reconstruction is the only way to investigate the pronunciation of the oldest stages of the language.

When dealing with different reconstruction systems for Old Chinese phonology, it is quite difficult, even for experienced scholars, to spot the actual differences between the systems. That these differences exist, and that they can be quite substantial, is beyond question — and easy to understand, if one takes into account that Old Chinese is reconstructed with the help of a philological (as opposed to a mainly comparative) approach, by which data from different sources is sifted and individually weighed (cf. Jarceva 1990: 409 and List 2008).

When comparing different reconstruction systems, it is not enough simply to look at the inventories of proto-phonemes proposed by different scholars. Even if two proto-inventories (the sets of the reconstructed sounds) are exactly the same, it is possible that scholars will provide different reconstructions for individual characters. The only way to compare two or more reconstruction systems is therefore to compare the concrete reconstructions for a certain number of characters.

In addition to the sample of words, however, we also need a clear account of which segments (which proto-sounds) should be compared with each other. When comparing proto-forms for Chinese 一 ‘one’ in different Old Chinese reconstruction systems, such as Karlgren (1950) *ʔi̯ĕt, Li (1971) *ʔjit, Wáng (1980) *iet, and Baxter and Sagart (2014) *ʔi[t], we would obviously not compare the medial *i̯ of Karlgren with the initial *ʔ of Baxter and Sagart.

When adding more reconstructions, such as the one for 七 ‘seven’ across the four systems, for which the authors give *ts'i̯ĕt, *tshjit, *tshiet, and *[tsh]i[t], respectively, we can further see that the systems differ not only in the segments they propose for the same positions, but also in how they interpret the words as a whole. Although the authors differ from one another in the medials, main vowels, and finals they propose, each of them is structurally consistent in giving both words the same sound segments for medial, nucleus, and coda.

What we can see from this example is that not every difference in the sound segments, like the choice of initials, or the concrete solution proposed for a problem, necessarily reflects an important difference between the reconstruction systems. If two scholars just choose different symbols for a distinction that they both recognize and acknowledge, this does not render the reconstructions incompatible. Such a difference should therefore not be used as a criterion for dismissing a given reconstruction system, at least not in a first step. If two systems are structurally equivalent, then they have equivalent predictive power for the descendant language(s) they are supposed to account for.
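The point can be illustrated with a minimal Python sketch. The segmentation of the eight proto-forms quoted above into initial, medial, nucleus, and coda is an assumption made here for illustration (it is not taken from the original data), and the names FORMS, consistent, and distinct_rhymes are hypothetical.

```python
# Proto-forms for 一 'one' and 七 'seven' as quoted in the text, hand-segmented
# into (initial, medial, nucleus, coda). The segmentation is an assumption;
# an empty string marks a slot that the system leaves unfilled.
FORMS = {
    "Karlgren 1950": {"一": ("ʔ", "i̯", "ĕ", "t"), "七": ("ts'", "i̯", "ĕ", "t")},
    "Li 1971": {"一": ("ʔ", "j", "i", "t"), "七": ("tsh", "j", "i", "t")},
    "Wáng 1980": {"一": ("", "i", "e", "t"), "七": ("tsh", "i", "e", "t")},
    "Baxter and Sagart 2014": {"一": ("ʔ", "", "i", "[t]"), "七": ("[tsh]", "", "i", "[t]")},
}

# Structural view: within each system, do both words share medial, nucleus, coda?
consistent = {
    system: words["一"][1:] == words["七"][1:] for system, words in FORMS.items()
}

# Substantial view: how many different symbol choices exist for that shared rhyme?
distinct_rhymes = {words["一"][1:] for words in FORMS.values()}

for system, ok in consistent.items():
    print(system, "groups both words together:", ok)
print("different symbolizations of the rhyme:", len(distinct_rhymes))
```

Under this segmentation, all four systems place the two words in the same group, although no two systems use the same symbols for it.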

This abstractionist notion of proto-forms, which can be found in the early work of Saussure (1916) and Meillet (1903), is problematic for the endeavour of linguistic reconstruction, and usually not strictly followed (Lass 2017). Nevertheless, the potentially abstract nature of proto-forms is important to keep in mind when comparing different reconstruction systems. By distinguishing the structural differences (which result from the direct interpretation of the data and the identification of regular sound correspondences) from the substantial differences (which result from a phonetic and phonological interpretation of the identified correspondences), we get a much clearer account of the core of the differences, and of whether they are worth our consideration or not.

But how can we compare reconstruction systems structurally? Firstly, we need to have the data assembled in aligned form, in order to make sure that we only compare like with like (e.g., medial with medial, and vowel with vowel). A sample illustration, showing alignments of the proto-forms for ‘seven’ and ‘one’ produced with the help of the EDICTOR tool (List 2017), is given in the figure below.

Comparing reconstruction proposals with the help of alignments.

Alternatively, we can also select a single aspect, such as, for example, the vowel system proposed in different reconstruction systems. Having assembled a substantial number of different proto-forms in this way, the structural comparison between different reconstruction systems can be modeled as a comparison of different cluster analyses, or, more accurately, partitioning analyses. A partitioning analysis assigns a given number of objects to a certain number of different groups. When dealing only with the vowels proposed by different reconstruction systems, we can say that a given reconstruction, like the one by Karlgren, for example, assigns each Chinese character for which a proto-form is given to a particular group, depending on the main vowel selected for the reconstruction.
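As a rough sketch of this modeling step, the following Python snippet turns a mapping from characters to main vowels into a partition. The data are invented toy values (placeholder items c1 to c5, not real reconstructions), and the function names vowel_partition and as_sets are hypothetical.

```python
from collections import defaultdict


def vowel_partition(reconstructions):
    """Group items by the main vowel a system reconstructs for them."""
    groups = defaultdict(set)
    for item, vowel in reconstructions.items():
        groups[vowel].add(item)
    return dict(groups)


def as_sets(partition):
    """Drop the vowel labels and keep only the grouping itself."""
    return {frozenset(members) for members in partition.values()}


# Invented toy data: three imaginary systems over five placeholder items.
system_a = {"c1": "a", "c2": "a", "c3": "i", "c4": "u", "c5": "u"}
system_b = {"c1": "ɑ", "c2": "ɑ", "c3": "e", "c4": "o", "c5": "o"}
system_c = {"c1": "a", "c2": "ə", "c3": "ə", "c4": "u", "c5": "u"}

same_ab = as_sets(vowel_partition(system_a)) == as_sets(vowel_partition(system_b))
same_ac = as_sets(vowel_partition(system_a)) == as_sets(vowel_partition(system_c))
print("A and B make the same distinctions:", same_ab)
print("A and C make the same distinctions:", same_ac)
```

Systems A and B use different vowel symbols throughout but induce the same grouping, whereas A and C share some symbols but group the items differently.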

If, for a given number of reconstructions, we model each reconstruction system as a partitioning analysis, based on the main vowel proposed by the system, we can use standard metrics from graph theory and Natural Language Processing to compare different reconstruction systems with each other. Very straightforward measures for the comparison of two partitioning analyses are the so-called B-Cubed scores (Amigó et al. 2009), which have proven especially useful for evaluating automatic cognate detection methods in historical linguistics against a gold standard (Hauer and Kondrak 2011, List et al. 2017a).

Being an evaluation measure, B-Cubed scores come in the typical three flavors of precision, recall, and F-score. Precision is lowered by false positives (items grouped together in the tested analysis but not in the reference), and recall is lowered by false negatives (items grouped together in the reference but separated in the tested analysis). For the purpose of comparing reconstruction systems, only the F-score is needed, as it is a symmetric measure, and the notion of false positives and false negatives is meaningless unless we decide to blindly trust one of the given systems. Like precision and recall, the F-score ranges between 0 and 1, with 1 indicating that the two partitioning analyses are identical.
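For readers who want to see the arithmetic, here is a minimal sketch of B-Cubed scores in Python, following the usual per-item definition (for each item, the overlap of its two groups divided by the size of the group in one or the other analysis, averaged over all items). The function name bcubed and the toy data are hypothetical, and this is not the code used for the experiment described in this post.

```python
def bcubed(test, reference):
    """Return B-Cubed (precision, recall, F-score) for two labelings.

    Both arguments map the same items to group labels.
    """
    items = sorted(test)
    if items != sorted(reference):
        raise ValueError("both analyses must cover the same items")

    def group_of(labels):
        # Map every item to the full set of items sharing its label.
        groups = {}
        for item in items:
            groups.setdefault(labels[item], set()).add(item)
        return {item: groups[labels[item]] for item in items}

    in_test, in_ref = group_of(test), group_of(reference)
    overlap = {i: len(in_test[i] & in_ref[i]) for i in items}
    precision = sum(overlap[i] / len(in_test[i]) for i in items) / len(items)
    recall = sum(overlap[i] / len(in_ref[i]) for i in items) / len(items)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Invented toy data: main vowels for five placeholder items in two systems.
system_a = {"c1": "a", "c2": "a", "c3": "i", "c4": "u", "c5": "u"}
system_c = {"c1": "a", "c2": "ə", "c3": "ə", "c4": "u", "c5": "u"}

p, r, f = bcubed(system_a, system_c)
print(f"precision={p:.2f} recall={r:.2f} F-score={f:.2f}")
```

Swapping the two arguments swaps precision and recall but leaves the F-score unchanged, which is why the F-score alone is sufficient when neither system is treated as the gold standard.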

In order to compare more than two reconstruction systems, we can make use of techniques for exploratory data analysis (Morrison 2014). The most straightforward way to do this is, of course, to use the NeighborNet algorithm (Bryant and Moulton 2004), as provided by the SplitsTree package (Huson 1998).

In order to illustrate how data-display networks can be used to study differences among Old Chinese reconstruction systems, I designed a little experiment, based on data taken from List et al. (2017b), who provide Old Chinese reconstructions for all rhyme words in the Shījīng based on eight different reconstruction systems (Baxter and Sagart 2014, Karlgren 1950, Li 1971, Pān 2000, Schuessler 2007, Starostin 1989, Wáng 1980, Zhèngzhāng 2003).

In order to keep the analysis simple, I extracted only the different reconstructions of the main vowel for each character in each system, and carried out a pairwise comparison of all eight systems, computing the B-Cubed F-scores for each pair, omitting characters for which no reconstruction could be found in the data. These scores were then converted to a distance matrix, and fed to the NeighborNet algorithm (the source code can be downloaded here). The resulting network is provided in the figure below.
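The general shape of such a workflow can be sketched in Python as follows. This is not the downloadable source code mentioned above: the conversion of scores into distances as 1 minus the F-score is an assumption, the data are invented toy values, all function names are hypothetical, and the exact options of the NEXUS distances block should be checked against the SplitsTree documentation.

```python
from itertools import combinations


def bcubed_f(a, b):
    """B-Cubed F-score of two labelings, restricted to the items both cover."""
    shared = [item for item in a if item in b]  # omit items missing in either
    if not shared:
        raise ValueError("no shared items to compare")

    def group_of(labels):
        groups = {}
        for item in shared:
            groups.setdefault(labels[item], set()).add(item)
        return {item: groups[labels[item]] for item in shared}

    in_a, in_b = group_of(a), group_of(b)
    n = len(shared)
    precision = sum(len(in_a[i] & in_b[i]) / len(in_a[i]) for i in shared) / n
    recall = sum(len(in_a[i] & in_b[i]) / len(in_b[i]) for i in shared) / n
    return 2 * precision * recall / (precision + recall)


def distance_matrix(systems):
    """Pairwise distances, assuming distance = 1 - F-score."""
    names = sorted(systems)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        dist[n][m] = dist[m][n] = 1.0 - bcubed_f(systems[n], systems[m])
    return names, dist


def to_nexus(names, dist):
    """Serialize the matrix as a NEXUS distances block (format assumed)."""
    lines = ["#NEXUS", "BEGIN taxa;", f"DIMENSIONS ntax={len(names)};", "TAXLABELS"]
    lines += names
    lines += [";", "END;", "BEGIN distances;", f"DIMENSIONS ntax={len(names)};",
              "FORMAT labels=left diagonal triangle=both;", "MATRIX"]
    for n in names:
        lines.append(n + " " + " ".join(f"{dist[n][m]:.4f}" for m in names))
    lines += [";", "END;"]
    return "\n".join(lines)


# Invented toy data; system C has no reconstruction for item c5.
systems = {
    "A": {"c1": "a", "c2": "a", "c3": "i", "c4": "u", "c5": "u"},
    "B": {"c1": "ɑ", "c2": "ɑ", "c3": "e", "c4": "o", "c5": "o"},
    "C": {"c1": "a", "c2": "ə", "c3": "ə", "c4": "u"},
}
names, dist = distance_matrix(systems)
print(to_nexus(names, dist))
```

The resulting text can be saved to a file and opened in a program that computes NeighborNet networks from distance matrices.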

NeighborNet reflecting the closeness of the different reconstruction systems

As one can see, the data roughly clusters into three subgroups, namely Schuessler, Baxter and Sagart, and Starostin vs. Pān and Zhèngzhāng vs. Karlgren, Li, and Wáng. On a larger scale, we can divide the data into all six-vowel systems versus the non-six-vowel systems (Karlgren, Wáng, Li). Given that Pān is a direct student of Zhèngzhāng, the closeness between their reconstruction systems is not surprising.

What may be surprising is the closeness of the Schuessler, Starostin, and Baxter and Sagart systems, given their notable differences with respect to the criterion of vowel purity tested by List et al. (2017b). Even if the network analysis cannot directly explain all of these differences in detail, it seems like a worthwhile enterprise, which should be further expanded by comparing not only the vowels, but fully aligned proto-forms.

Given the straightforwardness of the application, it also seems useful to test it on other language families where there is disagreement similar to that found in the reconstruction of Old Chinese phonology.


Amigó, E., J. Gonzalo, J. Artiles, and F. Verdejo (2009) A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval 12.4. 461-486.

Baxter, W. and L. Sagart (2014) Old Chinese: a new reconstruction. Oxford University Press: Oxford.

Bryant, D. and V. Moulton (2004) Neighbor-Net. An agglomerative method for the construction of phylogenetic networks. Molecular Biology and Evolution 21.2. 255-265.

Hauer, B. and G. Kondrak (2011) Clustering semantically equivalent words into cognate sets in multilingual lists. In: Proceedings of the 5th International Joint Conference on Natural Language Processing. AFNLP 865-873.

Huson, D. (1998) SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics 14.1. 68-73.

Jarceva, V. (1990) Sovetskaja Enciklopedija: Moscow.

Karlgren, B. (1950) The Book of Odes. Chinese text, transcription and translation. Museum of Far Eastern Antiquities: Stockholm.

Lass, R. (2017) Reality in a soft science: the metaphonology of historical reconstruction. Papers in Historical Phonology 2.1. 152-163.

Li Fang-kuei 李方桂 (1971) Shànggǔyīn yánjiū 上古音研究 [Studies on Archaic Chinese phonology]. Qīnghuá Xuébào 清華學報 9.1-2. 1-60.

List, J.-M. (2008) Rekonstruktion der Aussprache des Mittel- und Altchinesischen. Vergleich der Rekonstruktionsmethoden der indogermanischen und der chinesischen Sprachwissenschaft [Reconstruction of the pronunciation of Middle and Old Chinese. Comparison of reconstruction methods in Indo-European and Chinese linguistics]. Magister thesis. Freie Universität Berlin: Berlin.

List, J.-M., S. Greenhill, and R. Gray (2017a) The potential of automatic word comparison for historical linguistics. PLOS One 12.1. 1-18.

List, J.-M. (2017) A web-based interactive tool for creating, inspecting, editing, and publishing etymological datasets. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. System Demonstrations. 9-12.

List, J.-M., J. Pathmanathan, N. Hill, E. Bapteste, and P. Lopez (2017b) Vowel purity and rhyme evidence in Old Chinese reconstruction. Lingua Sinica 3.1. 1-17.

Meillet, A. (1903) Introduction à l’étude comparative des langues indo-européennes. Hachette: Paris.

Morrison, D.A. (2014) Phylogenetic networks: a new form of multivariate data summary for data mining and exploratory data analysis. WIREs Data Mining and Knowledge Discovery 4: 296-312.

Pān Wùyún 潘悟云 (2000) Hànyǔ lìshǐ yīnyùnxué 汉语历史音韵学 [Chinese historical phonology]. Shànghǎi Jiàoyù 上海教育: Shànghǎi 上海.

de Saussure, F. (1916) Cours de linguistique générale. Payot: Lausanne.

Schuessler, A. (2007) ABC Etymological dictionary of Old Chinese. University of Hawai’i Press: Honolulu.

Starostin, S. (1989) Sravnitel’no-istoričeskoe jazykoznanie i leksikostatistika [Comparative-historical linguistics and lexicostatistics]. In: Kullanda, S., J. Longinov, A. Militarev, E. Nosenko, and V. Shnirel’man (eds.): Lingvističeskaja rekonstrukcija i drevnejšaja istorija Vostoka. Materialy k diskussijam na konferencii [Materials for the discussion on the conference]. 1. Institut Vostokovedenija: Moscow. 3-39.

Sturtevant, E. (1920) The pronunciation of Greek and Latin. University of Chicago Press: Chicago.

Zhèngzhāng Shàngfāng 郑张尚芳 (2003) Shànggǔ yīnxì 上古音系 [Old Chinese phonology]. Shànghǎi Jiàoyù 上海教育: Shànghǎi 上海.