The daughters of PIE

J. P. Mallory gave a talk in Copenhagen in 2012 with some fascinating data about Proto-Indo-European (PIE) and its daughter languages. In it, he explains that he was able to reconstruct 1,364 individual lexemes in PIE. So how many cognates of these, he asks, are preserved in each branch? The results are striking. For example, over 900 cognates are found in the Indic languages (the highest count), while the Iranian languages have over 600, even though both descend from the same Indo-Iranian parent.

A little quick arithmetic on his data paints an intriguing picture:

Branch - Percentage of reconstructed PIE lexemes preserved

  1. Indic - 67.8% 
  2. Greek - 56.5%
  3. Germanic - 55.8%
  4. Italic - 51.6%
  5. Iranian - 49.4%
  6. Baltic - 44%
  7. Celtic - 39.5%
  8. Slavic - 38.9%
  9. Tocharian - 34.1%
  10. Anatolian - 26%
  11. Armenian - 21.1%
  12. Albanian - 16.6%
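To get a feel for the absolute numbers behind these percentages, here is a minimal sketch that turns them back into approximate cognate counts. It assumes (my assumption, not something stated in the talk) that each percentage is simply the number of preserved cognates divided by Mallory's 1,364 reconstructed lexemes; the names `PIE_LEXEMES`, `retention_pct` and `approx_cognates` are hypothetical, chosen for illustration only.

```python
# Assumption: each percentage = (cognates preserved / 1364) * 100,
# so multiplying back gives an approximate cognate count per branch.
PIE_LEXEMES = 1364  # number of PIE lexemes Mallory reports reconstructing

# Percentages copied from the list above (already rounded)
retention_pct = {
    "Indic": 67.8,
    "Greek": 56.5,
    "Germanic": 55.8,
    "Italic": 51.6,
    "Iranian": 49.4,
    "Baltic": 44.0,
    "Celtic": 39.5,
    "Slavic": 38.9,
    "Tocharian": 34.1,
    "Anatolian": 26.0,
    "Armenian": 21.1,
    "Albanian": 16.6,
}


def approx_cognates(pct: float, total: int = PIE_LEXEMES) -> int:
    """Back-calculate an approximate cognate count from a rounded percentage."""
    return round(pct / 100 * total)


if __name__ == "__main__":
    for branch, pct in retention_pct.items():
        print(f"{branch:<10} {pct:5.1f}%  ~{approx_cognates(pct)} cognates")
```

Because the percentages are rounded, the recovered counts are only approximate: a value given to 0.1% can be off by about one lexeme (0.05% of 1,364 is roughly 0.7), while the whole-number figures for Baltic and Anatolian could be off by up to about seven.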

Part of the difference has to do with the age of attestation. The Indic branch is best represented by Sanskrit, whose oldest (Vedic) layer is among the earliest attested IE languages. Compare that to Albanian, which barely surfaces in writing before the 15th century CE. A lot of language change can happen in 2,500+ years.

On the other hand, there are some surprises, as Mallory notes. Anatolian contains the oldest attested IE languages (Hittite among them), and yet it performs remarkably poorly. I have two gut reactions to that: first, the Anatolian languages are so ancient (and died out so early) that we have far less written material for them than for the other branches; second, the Anatolian languages were heavily influenced by their geographic neighbors, which resulted in a high degree of lexical loss.