When words become ghosts

Gąsiorowski has an awesome etymology of "fuck," which everyone should go read over at Language Evolution. With the glorious tool of Google Ngram, we can see the taboo-ization of gros mots. We can see the moment a word becomes a cuss, a taboo word. Below are charts of the appearances of words in literature and their phantom disappearances. No, the words didn't exit the English language. People still used them. But taboo eradicated the words from print.

Throwing stones in other people's glass houses.

Review: The Indo-European Controversy: Facts and Fallacies in Historical Linguistics

Asya Pereltsvaig, Martin W. Lewis

Cambridge University Press, 2015

The Indo-European Controversy (TIEC) evaluates modern theories of Indo-European origins in three thematic sections. The first part consists of two chapters on older theories, ranging from theories accepted by mainstream scientists to those on the fringe. The second part consists of five chapters critically evaluating a new theory of Indo-European origins: the Gray & Atkinson model. The third part cautiously advocates for the Revised Kurgan Steppe Hypothesis in four chapters. In addition, this book includes an introduction and over fifty pages of maps, tables, and appendicular notes.

Disclosure: Prof. Pereltsvaig was kind enough to send this book for free.

The Indo-European Controversy will make you angry.

No, not angry with the authors. The book is well written and their analyses well researched. But at some point you’re going to get angry with the media and their addiction to sensationalized news stories, and frustrated with the relative silence from mainstream linguists who could disseminate basic information on language evolution to the public. You will grow upset at how grievously flawed and highly speculative theories on the origins of the Indo-Europeans get published in highly regarded academic journals.

TIEC investigates the putative homeland for the original speakers of the Proto-Indo-European language. The book is epic in scope. Pereltsvaig & Lewis survey a hundred years of scientific inquiry, and no theory is left untouched. Readers wanting a survey of mainstream thought will of course be satisfied, and lovers of pseudo-science, fringe theorists, and general “nut cases” will be happy as well. The Out-of-India model; the white supremacist’s Alpinic Hypothesis; the “cannabis trail” theory; it’s all there.

But what is mainstream in Indo-European studies? Okay, let’s answer that with a quick survey of our own:

Over the last thirty years or so, there have been two competing theories for the Indo-European homeland: the Anatolian Hypothesis and the Revised Kurgan Steppe Hypothesis (more succinctly called the Kurgan Hypothesis). The Anatolian Hypothesis argues that the Proto-Indo-European language was last spoken 9,000 years ago in Anatolia, in what is today Turkey. The Kurgan Hypothesis holds that the language was spoken roughly 7,000 years ago by people in the southwestern Pontic-Caspian steppes of Russia. Most archaeologists and linguists today fall into one camp or the other.

A new contending theory has emerged over the last ten years. The Gray & Atkinson model advocates for a home in Anatolia, ~9,500 years ago. On the surface this looks remarkably similar to the Anatolian Hypothesis, but Gray & Atkinson's methods used to reach such a conclusion are distinct.

The Anatolian Hypothesis used archaeological data to chart the migratory spread of ancient farmers from Anatolia into Europe, and postulated that language diffused throughout Eurasia with them. Gray & Atkinson use an algorithm to map the origin of the Indo-Europeans and internally classify the language family. One method is math-free and archaeology-heavy, the other is archaeology-free and pure math.

Okay, survey over. Got everything?

While virtually no historical linguist has bought into Gray & Atkinson’s theory, those same linguists have been silent in the press. Articles in The New York Times, Nature, Wired, among others, are relatively free of critique. With no other academic touching the Gray & Atkinson model, Pereltsvaig & Lewis blaze their own trail. TIEC represents the first exhaustive criticism of Gray & Atkinson.

I was happy to see the book left no stone unturned. We’ve always known Gray & Atkinson’s reliance on lexical material (more easily described as “words”) for their algorithm is more a problem than a help. Pereltsvaig & Lewis explain why depending on words is a problem with the precision of a surgeon’s scalpel. The math is unable to distinguish true cognate sets between language families from loanwords, which leads the linguist down a dark path of misshapen branches in the language family tree. Thus in Gray & Atkinson we find Albanian located within the Indo-Iranian branch, when Albanian is understood to be an independent branch, and Romani, the language most famous as the tongue of the “gypsies,” breaks from the Indo-Iranian branch thousands of years before linguists consider possible.

As you may imagine, TIEC is most valuable for its criticisms of Gray & Atkinson. The book surveys all theories on Indo-European origins, but Pereltsvaig & Lewis introduce new scholarship only in their review of the Gray & Atkinson model. When addressing other models, the authors paraphrase the consensus of scholars before them. Sometimes, earlier surveys on those older models by writers like James Mallory, Ben Fortson, or David Anthony are sufficient and maybe even better. What makes TIEC special is the new, sharp critique of the Gray & Atkinson model.

I use the adjective "sharp" intentionally, because my concern is not with the quality of the research, but with how acerbic some of their language can be. Or to re-phrase myself, Pereltsvaig & Lewis are critical of Gray & Atkinson, perhaps even to a fault.

For example, when comparing the lead proponent of the Anatolian Hypothesis, Colin Renfrew, with Gray & Atkinson, the authors write, “…Renfrew’s Archaeology and Language is a deeply learned book rooted in an impressive synthesis of traditional archeological methods, historical, and linguistics methods that grapples thoughtfully with complexities and potential contradictions. Unfortunately one cannot say the same in regard to [Gray & Atkinson’s publications].”


Pereltsvaig & Lewis have a reason for their tone. “Had [Gray & Atkinson] framed their findings as suggestive,” they write in the Introduction, “we would have shrugged it off as an intriguing if misguided effort. But by claiming to have resolved a crucial debate, they have crossed a line, veering from inadequately conceptualized science into a pernicious form of scientism that demands firm rebuttal.”

Did Gray & Atkinson’s papers truly cross a delicate line from earnest academic debate into deeply speculative science passed off as conclusive? You can be the judge, because TIEC is comprehensive to the point where the reader is left well-informed and, with Gray & Atkinson’s papers at hand, fully prepared to draw their own conclusions. The book is written at a level any layman or laywoman can understand.

The arguments in TIEC are persuasive and strong. I am greatly interested in any rebuttals Gray & Atkinson may have and I hope they do not ignore this book. Even if TIEC is not a knock-down argument against the Gray & Atkinson model (I believe it is), the arguments Pereltsvaig & Lewis present must be addressed.

The history of the word 'Caucasian'

In 1795, a 43-year-old professor from Göttingen, Germany, was five years into a new and exciting scientific project. His name was Johann Friedrich Blumenbach. Blumenbach may not be a familiar name today, but back then among men of letters he was already established as an eminent physiologist and natural philosopher. He was editor of the Medicinische Bibliothek and author of a widely-circulated anatomy text Institutiones Physiologicae, which explored and explained the little details of animal anatomy. 

The project of Blumenbach's was something unseen in natural science at the time, an idea of his own. Blumenbach scoured the earth for the skulls of humans from different people groups to draw conclusions about racial and ethnographic differences. It was the start of comparative anatomy across human groups.

An 87-year-old Johann Friedrich Blumenbach, drawn by Ludwig Grimm in 1840. Ludwig was the brother of Jacob and Wilhelm Grimm, of "Grimm's Fairy Tales" fame.

Blumenbach began his work in 1790, halfway through a stay in England, and by 1795 he had amassed no fewer than 60 craniums, representing peoples around the globe. The skulls came from as near as Holland and as far as northernmost North America. Most exciting to Blumenbach, however, was a skull from the Georgian people. Why was Georgian most exciting?

Blumenbach had a perverted sense of beauty, where what was beautiful to him was equivalent to some sort of biological truism. Typical of the time, he valued the high vertical forehead of northern Eurasians, pronounced eye ridges, and Roman jawline. Nowhere were these qualities more strikingly found, Blumenbach saw, than among the Kartvelian people - Georgians.

Now Blumenbach was not studying these skulls in an historical vacuum, free from the influence of those before him. Blumenbach's comparisons of skulls were made to advocate that all humans come from a single parentage (the Biblical Adam and Eve) and to refute the earlier argument of Christoph Meiners that different races came from separate sources. Five years into his project, Blumenbach had concluded that the study of skulls was strong evidence that all humans are related, asserting that the differences were gradations of change. In other words, by studying all human skulls, we see a spectrum of change across the world and no single marked difference among races.

But his work of 18th-century science was fraught with 18th-century nonsense. He praised the Georgian skulls as the most beautiful and, because beauty was an indication of perfection, argued in that same work that all humans come from the Caucasus, the homeland of the Georgians. Because white Eurasians (the whole of Europeans but also Turks and Hindustani) best "preserve" the features of their putative ancestors, they were Caucasian. Other large racial groups he called Ethiopian, Mongolian, American, and Malay.

Later that year, Blumenbach's work was published in Latin under the title Collectionis suae craniorum diversarum gentium illustratae decades, and it was well received by the scientific community. His term "Caucasian" as a moniker for all white people stuck. That is why we call all white people Caucasian, despite only a small subset of people truly coming from the Caucasus.

Blumenbach didn't coin the word Caucasian. Christoph Meiners invented it in 1785 (from the Latin name for the region), but it was only when Blumenbach borrowed the word from Meiners that the name spread. 

How are we to judge Blumenbach's work today? Blumenbach's work was a piece of both racism and progressive values. As I discussed, Blumenbach erroneously based his scientific conclusions in part on his conviction that what was beautiful to him was somehow purer. This led Blumenbach down many dark paths over his career. As Bhopal (2007) noted, Blumenbach dismissed diseases that cause a loss of skin pigment like leucoplakia as a fiction, because lighter skin was more beautiful to him. In his mind, leucoplakia was a return to a more natural physical state.

On the other hand, Blumenbach confronted the racist values of the time. He believed all races were equal in intelligence and rationality (Stephen Jay Gould notes that he was remarkably insistent on the fact of equal human worth), albeit not as objectively beautiful. His work refuted the consensus of his contemporaries that whites were completely unrelated to other races. So while Blumenbach's views are disappointingly backwards by today's standards, they were happily forward-thinking for his day and age. It's dangerous to assume that Blumenbach would have held his racialist views had he been born in a time of greater scientific enlightenment.

Later scientists, both "pseudo" and legitimate, hijacked Blumenbach's comparative anatomy of skulls and used the differences in skull size and shape across human ethnic groups to determine differences in intelligence. Samuel Morton was most infamous for this. Of course, it's all hogwash. Outside of rare genetic disorders, skull size and shape are superficial and not an indication of intelligence.

Today, Blumenbach is largely forgotten. But he left a permanent mark upon the English language: the history of the word Caucasian. For more reading, see:

Bhopal, Raj. "The beautiful skull and Blumenbach's errors: the birth of the scientific concept of race." BMJ. (2007).

"Race - The Power of an Illusion." Interview with Stephen Jay Gould. PBS. California Newsreel. (2003). 

Other works mentioned here are:

Blumenbach, Johann Friedrich. Collectionis suae craniorum diversarum gentium illustratae decades. (1790-1828)

Meiners, Christoph. Grundriß der Geschichte der Menschheit. (1785).

Rediscovering Martellus

An interesting article in PhysOrg brings to light new discoveries in a five-century-old map by Henricus Martellus, a German cartographer responsible for one of the most accurate pre-Columbian maps ever. In fact, this may have been the map Christopher Columbus himself studied.

Credit: Mike Cummings, PhysOrg.

Unfortunately, time was not kind to Martellus' map, and by the time it was donated to Yale University in 1962, much of the ink had faded. But thanks to some fancy tech work, photographing in reflective colors - including outside the range of light visible to the human eye - we have been able to discover many new notes and facets of the map that have been unknown to us since Columbus' voyage. 

Notes recently discovered in Southern Africa correspond with the Egyptus Novelo map, affirming that Martellus preferred African sources for African cartography over European. Notes of Martellus' in the text range from descriptions of orca whales to descriptions of mythical Panotii (humans, described by Pliny the Elder, with ears so large they used them as blankets). 

Getting schooled in noun class

Noun classification is a tricky thing. It's so complicated that linguists do not have a full and satisfying explanation for how and why it exists, but it's so important to many languages that even starting language learners cannot avoid it.

Proto-Indo-European had two genders, animate and inanimate. The animate class later split into masculine and feminine, while the inanimate remained more or less intact as the neuter. Only the oldest Indo-European languages preserved the animate/inanimate distinction (e.g., Hittite and Luwian), while later IE tongues preserve the masculine/feminine/neuter (e.g., Sanskrit, Greek, German) or have even more changes (e.g., English, French, Sorani).

Other language families capture different types of noun classifications. At no point in history, to our knowledge, did Basque have grammatical gender. Depending on the language in question, Niger-Congo languages almost always have at least 10 classes. 

There are many ways noun classification is governed, but the key trait is that there is a semantic distinction between classes. It's not always cut and dried. Russian has gender in the singular but plural nouns are genderless. Aikhenvald (2000) divides the government of noun classification as follows:

  1. Pure semantic assignment. Here, classification of the noun is understood on the basis of its meaning.
    1. Tamil divides gender between rational and non-rational nouns. In the former belong humans, gods, and demons.
    2. Dyirbal has four complicated classes: group one covers male humans and non-human animates; group two is female humans, water, fire, and fighting; group three covers non-flesh food; and group four is simply everything else.
  2. Phonological assignment. There is no language that solely relies upon the sound structure of a word to assign a noun class. Some languages do seem to have a tendency to classify a noun on the basis of what sounds are present in a word, even if the classification does not semantically fit. 
    1. Limilngan typically classifies nouns on a semantic basis (like the languages above). However, Limilngan can form "super-classes." If a new word has /l/ or /d/, it is assigned to Class II (animals), while words with /m/ are assigned to Class III (plants), even if the words have nothing to do with animals or plants.
  3. Impure semantic assignment. Some languages have classification principles that fall outside the realm of definitions, though as we discussed before, no known language classifies entirely independently of a noun's semantic value. These reasons can be suprasegmental or morphological.
    1. In the Harar dialect of Oromo, animate masculine and feminine nouns correspond to their biological sex, but inanimate nouns are classified as masculine or feminine on the basis of whether the word ends in a low vowel (masculine) or not (feminine). 
    2. Latin may decline a word in one gender but it will grammatically agree according to its biological sex (e.g., the office of an agricola "farmer" belonged to a man in ancient times; the word is first declension and therefore declines as feminine but is grammatically treated as masculine). 
    3. Iraqw nouns that derive from Class I verbs are treated as masculine and nouns from Class II verbs are feminine, irrespective of semantics or phonology. 
    4. Ket distinguishes masculine and feminine corresponding to biological sex, except that important objects like wood are also masculine. The genders of the sun (masculine) and moon (feminine) are due to their roles in Ket mythology. Everything else is neuter.
  4. Unknown motivation. Sometimes it is not understood why nouns are classified the way they are. Hebrew divides nouns along masculine and feminine lines according to their meaning or their morphology. But a small collection of words like "fire" belong to the feminine gender even though the masculine is expected. There is no explanation for this.
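
For the programmers in the audience, two of the rules above can be written out as a rough sketch. This is only my paraphrase of the descriptions in the list, not a model of the real grammars. The word forms are made-up placeholders rather than actual Oromo or Limilngan vocabulary, and treating /a/ as the only low vowel and checking /l/ and /d/ before /m/ are simplifying assumptions of mine.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: /a/ stands in for "low vowel"; real vowel inventories are richer.
LOW_VOWELS = ("a",)


@dataclass
class Noun:
    form: str                  # placeholder spelling, not real vocabulary
    animate: bool
    sex: Optional[str] = None  # "male" or "female" for animates, else None


def harar_oromo_gender(noun: Noun) -> str:
    """Sex decides for animates; the final vowel decides for everything else."""
    if noun.animate and noun.sex == "male":
        return "masculine"
    if noun.animate and noun.sex == "female":
        return "feminine"
    # Inanimate (or sex unknown): a final low vowel means masculine.
    return "masculine" if noun.form.endswith(LOW_VOWELS) else "feminine"


def limilngan_class(form: str, semantic_class: Optional[str] = None) -> Optional[str]:
    """Meaning comes first; sounds are only a fallback for new words."""
    if semantic_class is not None:
        return semantic_class
    # Assumption: /l/ and /d/ are checked before /m/ when a word has both.
    if "l" in form or "d" in form:
        return "II"
    if "m" in form:
        return "III"
    return None  # the description above does not cover this case


print(harar_oromo_gender(Noun("toka", animate=False)))  # made-up form ending in a
print(limilngan_class("madu"))                           # made-up form with /d/ and /m/
```

The point of the sketch is the ordering: in both languages meaning gets the first say, and sound only steps in where meaning has nothing to offer.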

If you want to learn more about world languages and how we, as humans, divide the world into categories through our language, check out Aikhenvald's fantastic book:

Aikhenvald, Alexandra. Classifiers: A Typology of Noun Categorization Devices. Oxford University Press. 2000.

The Indo-European homeland that wasn't.

Whatever you think of Quentin Atkinson and his theory of the Indo-European expansion, you have to concede that he is great at marketing. A new map made by Business Insider aims to visually present Atkinson's theory (below), which was then re-posted to other science journalism sites like ScienceAlert.

The video description is positively cringe-worthy, demonstrating that the mapmakers themselves have little knowledge of IE languages:

In 2012, a team of evolutionary biologists at the University of Auckland led by Dr. Quentin Atkinson released a study that found all modern IE languages could be traced back to a single root: Anatolian — the language of Anatolia, now modern-day Turkey.

I'm sorry but no one is claiming that the IE languages derive from "Anatolian" (whatever that means), not even Atkinson. The Anatolian languages comprise the Anatolian branch of the Indo-European language family. They derive from a hypothetical common ancestor, Proto-Anatolian. Proto-Anatolian, in turn, comes from Proto-Indo-European, which is the ancestor to the other Indo-European languages as well. 

Atkinson's theory is outside of mainstream historical linguistics. It was sharply criticized by Martin Lewis of Stanford, but for the most part was ignored as unsupported by the evidence. While the origins of the Indo-Europeans are still a point of contention, the most commonly accepted theory is Gimbutas' Kurgan Steppe Hypothesis, as it was supported by subsequent discoveries both linguistically and archaeologically. The other theory with any credibility is Renfrew's Anatolian Hypothesis (advocating for the same location as Atkinson's theory, but arguing for a very different time period), but the Anatolian Hypothesis has fallen on hard times and I very much doubt it will survive another 25 years.

Atkinson's theory is sexier. It simplifies the narrative and is more easily digested by the media. For the most part, the media has eaten it up. Someone on Facebook put it better:

...and not a drop to drink.

Proto-Indo-European had many possible words for water. Most famous is *u̯ód-r (from which we get English water) because of its similarity to Proto-Uralic *weti "water." But no less compelling is a mysterious trio of waterwords that look remarkably similar, yet cannot be reduced to a single etymon. It may be there was a feature in Proto-Indo-European that we cannot reconstruct that changed the phonological nature, it may be that all three etyma arose from a pre-Proto-Indo-European state of the language, or it may be that the similarities are due to chance and happenstance.

*h2ekw- Derivations in PIE *h2ekw-ró-, whence PG *agra- "flood;" *h2ékw-eh2-, PI *akwā- and PG *ahwō- "river;" *h2ekw-ieh2-, PG *aujō- "wetland," "island."

*h2ep- Found in PI *āpā-, PGk. *āp-, PBS *āp- and PT *āp; PIIr. *Hāp-. Derivations in PIE *h2ep-h3on-, whence PC *abon- "river."

*h2ebh- Derivations in PIE *h2ebh-o-, *h2ebh-n-, whence PA *h2ebo-.

PA = Proto-Anatolian; PBS = Proto-Balto-Slavic; PC = Proto-Celtic; PG = Proto-Germanic; PGk. = Proto-Greek; PI = Proto-Italic; PIIr. = Proto-Indo-Iranian; PT = Proto-Tocharian

Of the three, *h2ep- has the strongest attestation, but *h2ebh- has the oldest. Early Proto-Indo-European may have had *h2ebh-, which unexpectedly devoiced after the Anatolian branch departed. On the other hand, if there were an environmental factor, *p may have been voiced (we see this happen in *-ph3- > *-b-), but it's hard to see why it would yield an aspirated *bh instead of *b. So it's easier to see this going from Early PIE *h2ebh- > *h2ep- than the other way around.

Then there's *h2ekw-, which is tantalizingly close to *h2ep-. Biarticulation of a rounded consonant, like *kw and *gw, frequently vacillates with *p and *b respectively. We saw this happen over and over again in Celtic, where *p became *q [kw] and vice-versa several times over thousands of years.

Germanic languages do not preserve *h2ep- in any form (unless Old Icelandic afr "oat drink" comes from *h2ep-ro-, but that is my private theory). But Italic languages have both *h2ep-, in Oscan aapa-, and *h2ekw-, in Latin aqua. Because attestation of *h2ekw- in Italic languages was limited and competed with *h2ep-, and because the only IE cognate was in Germanic, some (like Beekes) have argued it's a loanword from outside. It may be, though it's difficult to believe words for "water" are easily borrowed and replace any native words so thoroughly that Latin no longer has them.

Whatever solution you find, the three etyma are a pernicious mystery. Cranberry Letters out.

Proto-Anatolian from Alwin Kloekhorst, Etymological Dictionary of the Hittite Inherited Lexicon. Brill. 2008.

Proto-Celtic from Ranko Matasović, Etymological Dictionary of Proto-Celtic. Brill. 2009.

Proto-Germanic from Guus Kroonen, Etymological Dictionary of Proto-Germanic. Brill. 2013.

Proto-Italic and Proto-Indo-Iranian from Michiel de Vaan, Etymological Dictionary of Latin and the Other Italic Languages. Brill. 2008.

Proto-Greek, Proto-Balto-Slavic, and Proto-Tocharian etymologies were my own.


The etymology of a common drink

A mysterious word skipped over in most etymological works of Old Icelandic is afr, áfr 'a beverage of some kind' (Zoëga). It's not hard to see why: there aren't any readily observed cognates in any other language, neither Indo-European nor otherwise. As far as I am aware, the only attempt to trace its origin is De Vries, who links it to a Proto-Germanic root *abara-; but research into De Vries' reasoning finds his etymology lacking, for reasons we will explore in a bit. But first, let's take a look at what clues we have in Icelandic for afr.

The word appears in the poetic Eddas of Icelandic lore, but the first attempt to provide a definition was by Árni Magnússon, writing a note on Egil's Saga that the word stands for sorbitio avenacea, Latin for "oat drink." We see a glimpse of this in the Hárbarðsljóð poem, when Thor says:

"...át ek í hvíld, áðr ek heiman fór, síldr ok hafra..." (verse 3)

"...I ate the rest before I went home, herring and porridge..."

Today's translations of Hárbarðsljóð write "porridge" based on the fact that a single copy of the poem has hafra (lit. "oats") instead of afra "oat drink." Cleasby dismisses this as a mistake: the oldest copies have afra. As afra went out of fashion, it is likely that hafra was inserted by a later copyist attempting to correct what he thought was a mistake. But the scribal confusion between hafra and afra is telling, reinforcing, though not proving, the idea that the drink was oat-based.

Finally, we cannot skip Modern Icelandic afir "buttermilk," which Cleasby notes was used in place of beer. 

De Vries is the single scholar to attempt an etymology, focusing on the oats of the drink. Noting that Finnish apara "bierhefe" is probably not native, and likely a loan, he reconstructs Proto-Germanic *abara- with no definition posited. It would not be impossible for *abara- to yield afr (with intermediate *apra- I presume) but *afra- is the more conservative reconstruction (cf. OIc. hafr "goat" < PG *hafra-, Kroonen 2013).

A better etymology, I would argue, is that afr comes from Proto-Germanic *afra- and, from there, Proto-Indo-European *h2ep-ro-: that is, *h2ep- "water" with the *-ro- noun-forming suffix that is prevalent in Proto-Germanic (cf. *wira- "man" < *u̯iH-ró-; *hafra- "goat" < *kap-ro-; *timbra- "timber" < *demH-ro-). As a root, *h2ep- is well attested: Sanskrit áp- "wet;" Old Avestan āfš; Old Prussian ape- "river;" Tocharian B āp "water;" Greek Āpía "the Peloponnesus;" Oscan aapa "water;" Old Irish ab "river." Conspicuously missing, however, is a Germanic representative. While the phonological reconstruction is tight, the evidence suggests that the stem *h2ep- was lost or had changed to *h2ékw- (cf. PG *agra- "flood"). Some believe that *-kw- became *-p- (which is more likely than *-p- > *-kw-, though neither direction of change is rare), but I would argue that Anatolian evidence for the unexpected variant *h2eb- points to an original bilabial stop, and that *h2ep-ro- was formed before the shift to *-kw-. The cluster *-pr- shielded it from that shift.