An introduction.

The following is part of an ongoing attempt to list the conjectured survivals, as found in peer-reviewed literature, of indigenous European languages spoken before the Indo-European expansion. Europe was a land filled with dozens, if not hundreds, of tribes, each with its own dialects and languages. Yet after the Uralic and Indo-European migrations into Europe, only one of those languages continues to be spoken today: Basque. The other native tongues have disappeared. A small handful have left us writings; Linear A, Iberian, and Etruscan are the most famous examples. Not all of these have been deciphered. A greater number of languages are known through neighboring authors who mention their existence. A larger number still are hypothesized on the basis of non-IE words and grammatical structures in living languages. The theory, roughly, is that when the natives learned the new language, they carried bits and pieces of their native language into it.

But why this "Dictionary"? Because we don't have one yet. To this author's knowledge, there exists no easily accessible list of substratum proposals. This leads to redundancy and a great deal of headache for the researcher. Much like the Online Etymology Dictionary, I list peer-reviewed lemmata in alphabetical order. The layout of the Dictionary is simple:

etymon (lexical category) "definition" - Summary of commentary from peer-reviewed sources. [Commentary of my own when necessary]. Citations.

It is important to understand the geographic parameters of this project. For now, this project is concerned with Western Europe, that is, Europe west of the Caspian Sea. The reason for this has more to do with time than anything else: the Caucasus is an intricate and incredibly difficult region to study. Nor does this project, at the moment, concern itself with substrata in Uralic languages; I simply do not have the expertise to comment on them. This may change in time, as Finnish, Hungarian, and the Saami languages are becoming increasingly important to substratum studies.

Finally, there are several self-imposed limitations worth mentioning. This project will not include Basque. Vasconic studies are a rich and thriving field and have no need for another dictionary. This project will also not include the lexicons of already-understood or well-published indigenous languages (Iberian, Linear A, the Tyrrhenian languages, etc.); we already have a plenitude of reports on them. When those languages have been hypothesized as the source of an IE form, however, they are included. What we are missing is a listing of proposed loanwords from paleo-European tongues. Lastly, including Pre-Greek would be a gargantuan and unnecessary task unto itself. Beekes has written extensively on the language, and there is little I could add. Pre-Greek lemmata are introduced when relevant to other substratum discussions, especially as regards arboreal nomenclature.

The author is pleased to note that the Dictionary captures a range of debate and dissension among scholars. No effort is made to harmonize the disagreements, nor should any be made, for debate among academics is the lifeblood of progress. In some instances, different etyma have been reconstructed from the same materials, such as *akr- vs. AKR? (see Tree Names). Many etyma are disputed, such as *alsnos- (Pre-Proto-Germanic), which more conservative linguists trace to Proto-Indo-European *h2éliso/eh2- rather than to a substratum. Several proposed substratum etyma are accepted by virtually no one but their proponent, such as quercus (see Tree Names). Bracketed comments alert the reader to scholarly disagreements or to spurious or generally unaccepted reconstructions.


  1. Iberian Peninsula languages
  2. Pre-Proto-Celtic
  3. Pre-Proto-Germanic
  4. Pre-Insular Celtic
  5. Pre-Proto-Slavic
  6. Pre-Proto-Italic
  7. Old European Hydronomy
  8. Wanderwörter
  9. Appendix A: Substrata in Uralic languages
  10. Appendix B: Tree Names
  11. Appendix C: Bird Names

A note on the notation.

Attempts have been made to limit the use of abbreviations, which can be exceedingly burdensome for the new reader to learn. In rare but frustrating moments, abbreviations needlessly obscure an author's notation. I will abstain from pointing fingers, but let it be known that there are some very "opaque" works of scholarship out there. Two exceptions are PIE for Proto-Indo-European and IE for Indo-European.

Certain lemmata do not have proposed reconstructions. It may be that a reconstruction is impossible or that the cited author did not attempt one. In these cases, the lemma is given as an abbreviated code (e.g. ELM? in Tree Names).

Macrons and other unusual indications of length are very difficult to type out on this processor. Instead, long vowels in proto-languages and several attested languages are represented with < : >. For example, Proto-Celtic *bandno:- "mountain peak" would normally be written with a macron over the <o> rather than with a following colon.