Whence Did They Come?

In a recent episode of Slate’s Lexicon Valley podcast, John McWhorter discussed the history of English personal pronouns. Why don’t we use ye or thee and thou anymore? What’s the deal with using they as a gender-neutral singular pronoun? And where do they and she come from?

The first half, on the loss of ye and the original second-person singular pronoun thou, is interesting, but the second half, on the origins of she and they, missed the mark, in my opinion.

I recommend listening to the whole thing, but here’s the short version. The pronouns she and they/them/their(s) are new to the language, relatively speaking. This is what the personal pronoun paradigm looked like in Old English:

Case        Masculine  Neuter  Feminine  Plural
Nominative  hē         hit     hēo       hīe
Accusative  hine       hit     hīe       hīe
Dative      him        him     hire      him
Genitive    his        his     hire      heora

There was some variation in some forms in different dialects and sometimes even within a single dialect, but this table captures the basic forms. (Note that the vowels here basically have classical values, so hē would be pronounced somewhat like hey, hire would be something like hee-reh, and so on. A macron or acute accent just indicates that a vowel is longer.)

One thing that’s surprising is how recognizable many of them are. We can easily see he, him, and his in the singular masculine forms (though hine, along with all the other accusative forms, has been lost), it (which has lost its h) in the singular neuter forms, and her in the singular feminine forms. The real oddballs here are the nominative singular feminine form, hēo, and the third-person plural forms. They look nothing like their modern forms.

These changes began when the case system started to disappear at the end of the Old English period. Hē, hēo, and hīe began to merge together, which would have led to a lot of confusion. But during the Middle English period (roughly 1100 to 1500 AD), some new pronouns appeared, and then things started settling down into the paradigms we know now: he/him/his, it/it/its, she/her/her, and they/them/their. (Note that the original dative and genitive forms for it were identical to those for he, and it wasn’t until Early Modern English that these were replaced by it and its, respectively.)

The origin of they/them/their is fairly uncontroversial: these were apparently borrowed from Old Norse–speaking settlers, who invaded during the Old English period and captured large parts of eastern and northern England, forming what is known as the Danelaw. These Old Norse speakers gave us quite a lot of words, including anger, bag, egg, get, leg, and sky.

The Old Norse words for they/them/their looked like this:

Case        Masculine  Neuter  Feminine
Nominative  þeir       þau     þær
Accusative  þá         þau     þær
Dative      þeim       þeim    þeim
Genitive    þeirra     þeirra  þeirra

If you look at the masculine column, you’ll notice the similarity to the current they/them/their paradigm. (Note that the letter that looks like a cross between a b and a p is a thorn, which stood for the sounds now represented by th in English.)

Many Norse borrowings lost their final r, and unstressed final vowels began to be dropped in Middle English, which would yield þei/þeim/þeir. (As with the Old English pronouns, the accusative form was lost.) It seems like a pretty straightforward case of borrowing. The English third-person pronouns began to merge together as the result of some regular sound changes, but the influx of Norse speakers provided us with an alternative for the plural forms.

But not so fast, McWhorter says. Borrowing nouns, verbs, and the like is pretty common, but borrowing pronouns, especially personal pronouns, is pretty rare. So he proposes an alternative origin for they/them/their: the Old English demonstrative pronouns—that is, words like this and these (though in Old English, the demonstratives functioned as definite articles too). Since hē/hēo/hīe were becoming ambiguous, McWhorter argues, English speakers turned to the next best thing: a set of words meaning essentially “that one” or “those ones”. Here’s what the plural demonstrative pronouns in Old English looked like:

Case        Plural
Nominative  þā
Accusative  þā
Dative      þǣm/þām
Genitive    þāra/þǣra

(Old English had a common plural form rather than separate plural forms for the masculine, neuter, and feminine genders.)

There’s some basis for this kind of change from a demonstrative to a personal pronoun; third-person pronouns in many languages come from demonstratives, and the third-person plural pronouns in Old Norse actually come from demonstratives themselves, which explains why they look similar to the Old English demonstratives: they all start with þ, and the dative and genitive forms have the -m and -r on the end just like them/their and the Old Norse forms do.

But notice that the vowels are different. Instead of ei in the nominative, dative, and genitive forms, we have ā or ǣ. This may not seem like a big deal, but generally speaking, vowel changes don’t just randomly affect a few words at a time; they usually affect every word with that sound. There has to be some way to explain the change from ā to ei/ey.

And to make matters worse, we know that ā (/ɑː/ in the International Phonetic Alphabet) raised to /ɔː/ (the vowel in court or caught if you don’t rhyme it with cot) during Middle English and eventually raised to /oʊ/ (the vowel in coat) during the Great Vowel Shift. In a nutshell, if English speakers had started using þā as the third-person plural pronoun in the nominative case, we’d be saying tho rather than they today.
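The regularity argument can be sketched in a few lines of Python. This is only a toy model: it assumes the simplified two-step vowel history described above, the IPA transcriptions are rough, and the stān and hām examples (which did become stone and home) are illustrative additions rather than anything cited in the podcast. All of the names in the code are made up for this sketch.

```python
# Two regular vowel changes, simplified as in the paragraph above.
# Each is a (before, after) pair applied to every word that has the sound.
MIDDLE_ENGLISH_RAISING = ("ɑː", "ɔː")
GREAT_VOWEL_SHIFT = ("ɔː", "oʊ")


def apply_changes(ipa, changes):
    """Apply each sound change, in order, to a rough IPA transcription."""
    for before, after in changes:
        ipa = ipa.replace(before, after)
    return ipa


# Rough Old English pronunciations (illustrative transcriptions).
old_english = {
    "stān": "stɑːn",  # became "stone"
    "hām": "hɑːm",    # became "home"
    "þā": "θɑː",      # the demonstrative proposed as the source of "they"
}

modern = {
    word: apply_changes(ipa, [MIDDLE_ENGLISH_RAISING, GREAT_VOWEL_SHIFT])
    for word, ipa in old_english.items()
}

for word, ipa in modern.items():
    print(f"{word} -> /{ipa}/")
```

Because the changes apply to every word with the vowel, þā comes out with the same vowel as stone and home, which is the tho outcome described above and not anything resembling they.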

But the biggest problem is that the historical evidence just doesn’t support the idea that they originates from þā. The first recorded instance of they, according to The Oxford English Dictionary, is in a twelfth-century manuscript known as the Ormulum, written by a monk known only as Orm. Orm is the Old Norse word for worm, serpent, or dragon, and the manuscript is written in an East Midlands dialect, which means that it came from the Danelaw, the area once controlled by Norse speakers.

In the Ormulum we find forms like þeȝȝ and þeȝȝre for they and their, respectively. (The letter ȝ, known as yogh, could represent a variety of sounds, but in this case it represents /i/ or /j/.) Other early forms of they include þei, þai, and thei.

The spread of these new forms was gradual, moving from areas of heaviest Old Norse influence throughout the rest of the English-speaking British Isles. The early-fifteenth-century Hengwrt Chaucer, a manuscript of The Canterbury Tales, usually has they as the subject but retains her for genitives (from the Old English plural genitive form hiera or heora) and em for objects (from the Old English plural dative him). The ’em that we use today as a reduced form of them probably traces back to this, making it the last vestige of the original Old English third-person plural pronouns.

So to make a long story short, we have new pronouns that look like Old Norse pronouns that arose in an Old Norse–influenced area and then spread out from there. McWhorter’s argument boils down to “borrowing personal pronouns is rare, so it must not have happened”, and then he ignores or hand-waves away any problems with this theory. The idea that these pronouns instead come from the Old English þā just doesn’t appear to be supported either phonologically or historically.

This isn’t even an area of controversy. When I tweeted about McWhorter’s podcast, Merriam-Webster lexicographer Kory Stamper was surprised, responding, “I…didn’t realize there was an argument about the ety of ‘they’? I mean, all the etymologists I know agree it’s Old Norse.” Borrowing pronouns may be rare, but in this case all the signs point to yes.

For a more controversial etymology, though, you’ll have to wait until a later date, when I wade into the murky etymology of she.


The Taxing Etymology of Ask

A couple of months back, I learned that task arose as a variant of tax, with the /s/ and /k/ metathesized. This change apparently happened in French before the word was borrowed into English. That is, French had the word taxa, which came from Latin, and then the variant form tasca arose and evolved into a separate word with an independent meaning.

I thought this was an interesting little bit of historical linguistics, and as a side note, I mentioned on Twitter that a similar phonological change gave us the word ask, which was originally ax (or acs or ahs—spelling was not standardized back then). Beowulf and Chaucer both use ax, and we didn’t settle on ask as the standard form until the time of Shakespeare.

But when I said that “it was ‘ax’ before it was ‘ask’”, that didn’t necessarily mean that ax was the original form; history is a little more complicated than that.

The Oxford English Dictionary says that ask originally meant “to call for, call upon (a person or thing personified) to come” and that it comes from the Old English áscian, which comes from the Proto-Germanic *aiskôjan. But most of the earliest recorded instances, like this one from Beowulf, are of the ax form:

syþðan hé for wlenco wéan áhsode

(after he sought misery from pride)

(A note on Old English orthography: spelling was not exactly standardized, but it was still fairly predictable and mostly phonetic, even though it didn’t follow the same conventions we follow today. In Old English, the letter h represented either the sound /h/ at the beginning of words or the sound /x/ [like the final consonant in the Scottish loch] in the middle of or at the end of words. And when followed by s, as in áhsode, it made the k sound, so hs was pronounced like modern-day x, or /ks/. But the /ks/ cluster could also be represented by cs or x. For simplicity’s sake, I’m going to use ask and ax rather than asc or ahs or whatever other variant spellings have been used over the years.)

We know that ask must have been the original form because that’s what we find in cognate languages like Old Saxon, Old Frisian, and Old High German. This means that at some point after Old English became differentiated from those other languages (around 500 AD), the /s/ and /k/ metathesized and produced ax.

Almost all of the OED’s citations from Old English (which lasted to about 1100 AD) use the ax form, as in this translation of Mark 12:34 from the West Saxon Gospels: “Hine ne dorste nan mann ahsian” (no man durst ask him). (As a bonus, this sentence also has a great double negative: it literally says “no man durst not ask him”.) Only a few of the citations from the Old English period are of the ask variety. I’ll discuss this variation between ask and ax later on.

The ax forms continued through Middle English (about 1100 to 1475 AD) and into Early Modern English. Chaucer’s Canterbury Tales (about 1386 AD) has ax: “I axe, why the fyfte man Was nought housbond to the Samaritan?” In Middle English, ask starts to become a little more common in written work, and we also occasionally see ash, though this form peters out by about 1500. (Again, I’ll discuss this variant more below.)

William Tyndale’s Bible, which was the first Early Modern English translation of the Bible, has ax: Matthew 7:7 reads, “Axe and it shalbe geven you.” The Coverdale Bible, published in 1535 and based on Tyndale’s work, also has ax, but the King James Bible, published in 1611, has the now-standard ask. So do Shakespeare’s plays (dating from the late 1500s to the early 1600s). After about 1600, ax forms become scarce, though one citation from 1803 records axe as a dialectal form used in London. And it’s in nonstandard dialects where ax survives today, especially in Southern US English and African American English. (I assume it also survives in other places besides the US, but I don’t know enough about its use or distribution in other countries.)

In a nutshell, ax arose as a metathesized form of ask at some point in the Old English period, and it was the dominant form in written Old English and an acceptable variant down to the 1500s, when it started to be supplanted by the resurgent ask. And at some point, ash also appeared, though it quietly disappeared a few centuries later. So why did ask disappear for so long? And why did it come back?

The simple answer to the first question is that the word metathesized in the dominant dialect of Old English, which was West Saxon. (Modern Standard English descends not from West Saxon but from the dialect around London.) These sorts of changes just happen sometimes. In West Saxon, /sk/ often became /ks/ in the middle or at the end of a word. Sound changes are usually regular (that is, they affect all words with a particular sound or set of sounds), but this particular change apparently wasn’t; metathesized and unmetathesized forms continued to exist side by side, and sometimes there’s variation even within a manuscript. King Alfred the Great’s translation of Boethius’s Consolation of Philosophy switches freely between the two: “Þæt is þæt ic þé ær ymb acsade. . . . Swa is ðisse spræce ðe ðu me æfter ascast.” This is pretty weird. When a change is beginning to happen, there may be some variation among words or among speakers, but variation between different forms of a word used by the same speaker is highly unusual.

As for the second question, it’s not entirely clear how or why ask came back. At first glance, it would seem that ask must have survived in other dialects and started to crop back up in written works during the Middle English period. Or perhaps ax simply remetathesized and became ask again. But it can’t be quite that simple, because /sk/ regularly palatalized to /ʃ/ (the “sh” sound) during the Old English period. You can see the effects of this change in cognate pairs like shirt (from Old English) and skirt (from Old Norse) or ship (from Old English) and skipper (from Middle Dutch).

It’s not entirely clear when this palatalization of /sk/ to /ʃ/ happened, but it must have been sometime after the Angles and Saxons left mainland Europe (starting in the 400s or 500s) but before the Viking invasions beginning in the 800s, because Old Norse words borrowed into English retain /sk/ where English words did not. If palatalization had occurred after the influx of words from Old Norse, we’d say shy and shill instead of sky and skill.
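The dating logic here is a matter of relative chronology, and a small Python sketch can make it concrete. It assumes a single exceptionless palatalization rule and uses rough, spelling-like forms (skip and fisk for Old English scip and fisc); the function and variable names are hypothetical.

```python
def palatalize(form):
    """Regular change: every /sk/ becomes /ʃ/."""
    return form.replace("sk", "ʃ")


# Rough, spelling-like forms used only for illustration.
NATIVE = ["skip", "fisk"]    # Old English scip "ship", fisc "fish"
BORROWED = ["sky", "skill"]  # later arrivals from Old Norse


def history(palatalization_before_borrowing):
    """Return the vocabulary after both events, in the given order."""
    if palatalization_before_borrowing:
        # The rule runs first, so the borrowings arrive too late to change.
        return [palatalize(word) for word in NATIVE] + BORROWED
    # The borrowings are already in the language when the rule runs.
    return [palatalize(word) for word in NATIVE + BORROWED]


print("palatalization first:", history(True))
print("borrowing first:     ", history(False))
```

Only the first ordering leaves sky and skill intact alongside ship and fish; the second turns them into the shy and shill of the counterfactual.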

One thing that makes it hard to pin down the date of this change is that /sk/ was originally spelled sc, and the sc spelling continued to be used even after palatalization must have happened. That means that words like ship and fish were spelled like scip and fisc. Thus a form with sc is ambiguous—we don’t know for certain if it was pronounced /sk/ or /ʃ/, though we can infer from other evidence that by the time most Old English documents were being created, sc represented /ʃ/. (Interestingly, this means that in the quote from Alfred the Great, the two forms would have been pronounced ax-ade and ash-ast.) It wasn’t until Middle English that scribes began using spellings like sch, ssh, or sh to distinguish /ʃ/ from the /sk/ combination.

If ask had simply survived in some dialect of Old English without metathesizing, it should have undergone palatalization and resulted in the modern-day form ash. As I said above, we do occasionally see ash in Middle English, which means that this did happen in some dialects of Old English. But this was never even the dominant form—it just pops up every now and then in the South West and West Midlands regions of England from the 1200s down to about 1500, when it finally dies out.

One other option is that the original ask metathesized to ax, missed out on palatalization, and then somehow metathesized back to ask. There may be some evidence for this option, because some other words seem to have followed the same route. For instance, words like flask and tusk appear in Old English as both flasce/flaxe and tusc/tux. But flask didn’t survive Old English—the original word was lost, and it was reborrowed from Romance languages in the 1500s—so we don’t know for sure if it was pronounced with /sk/ or /ʃ/ or both. Tusk appears in some dialects as tush, so we have the same three-way /sk/–/ks/–/ʃ/ alternation as ask.

But while ash meaning the powdery residue shows the same three-way variation, ash meaning the kind of tree does not—it’s always /ʃ/. Ask, ash, and ash all would have had /sk/ in the early stages of Old English, so why did one of them simply palatalize while the other two showed a three-way variation before settling on different forms? If it was a case of remetathesis that turned /ks/ back into /sk/, then why weren’t other words that originally ended in /ks/ affected by this second round of metathesis? And if /ks/ had turned back into /sk/ at some point, then why didn’t ax ‘a tool for chopping’ thus become ask? Honestly, I have no idea.

If those changes happened in that order, then we should expect to see /ask/ for the questioning word, the tree, and the tool. But there’s no way to reorder these rules to get the proper outputs for all three. Putting palatalization before metathesis gets us the proper output for the tree but also gives us ash for the questioning word, and putting a second round of metathesis at the end gets us the proper output for the questioning word but gives us ask for the chopping tool. And any way you rearrange them, you should never see multiple outputs for the same word, all apparently the products of different rules or at least different rule ordering, used in the same dialects or even by the same speakers.
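The ordering problem can be checked mechanically. The Python sketch below is a toy model under loud assumptions: the three words are flattened to ask, ask, and aks, each rule is treated as a simple exceptionless substitution, and the rule names and orderings are illustrative rather than a reconstruction of any attested sequence.

```python
from itertools import permutations

# Simplified early Old English shapes: the verb and the tree both had /sk/,
# while the tool already had /ks/. (Vowels flattened to "a" for illustration.)
INPUTS = {"ask (verb)": "ask", "ash (tree)": "ask", "ax (tool)": "aks"}
# Modern standard outcomes we would like the rules to produce.
TARGETS = {"ask (verb)": "ask", "ash (tree)": "aʃ", "ax (tool)": "aks"}

RULES = {
    "metathesis": ("sk", "ks"),
    "palatalization": ("sk", "ʃ"),
    "remetathesis": ("ks", "sk"),
}


def derive(form, order):
    """Apply the named rules to one form, in the given order."""
    for name in order:
        before, after = RULES[name]
        form = form.replace(before, after)
    return form


def outputs(order):
    """Derive all three words under one rule ordering."""
    return {word: derive(form, order) for word, form in INPUTS.items()}


def orderings():
    """Every ordering of every non-empty subset of the rules."""
    for size in range(1, len(RULES) + 1):
        yield from permutations(RULES, size)


working = [order for order in orderings() if outputs(order) == TARGETS]

for order in orderings():
    print(" > ".join(order), outputs(order))
print("orderings that give all three modern forms:", working)
```

No ordering survives, and the reason is visible in the inputs: the verb and the tree start out identical, so no sequence of exceptionless rules can send them to different places.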

So how do we explain this?


Maybe the sound changes happened in different orders in different parts of England, and those different dialects then borrowed forms from each other. Maybe some forms were borrowed from or influenced by the Vikings. Maybe there were several other intermediate rules that I’m missing, and those rules interacted in some strange ways. At any rate, the pronunciation ax for ask had a long and noble tradition before falling by the wayside as a dialectal form about four hundred years ago. But who knows—there’s always a chance it could become standard again in the future.


Celtic and the History of the English Language

A little while ago a link to this list of 23 maps and charts on language went around on Twitter. It’s full of interesting stuff on linguistic diversity and the genetic relationships among languages, but there was one chart that bothered me: this one on the history of the English language by Sabio Lantz.

The Origins of English

The first and largest problem is that the timeline makes it look as though English began with the Celts and then received later contributions from the Romans, Anglo-Saxons, Vikings, and so on. While this is a decent account of the migrations and conquests that have occurred in the last two thousand years, it’s not an accurate account of the history of the English language. (To be fair, the bar on the bottom gets it right, but it leaves out all the contributions from other languages.)

English began with the Anglo-Saxons. They were a group of Germanic tribes originating in the area of the Netherlands, northern Germany, and Denmark, and they spoke dialects of what might be called common West Germanic. There was no distinct English language at the time, just a group of dialects that would later evolve into English, Dutch, German, Low German, and Frisian. (Frisian, for the record, is English’s closest relative on the continent, and it’s close enough that you can buy a cow in Friesland by speaking Old English.)

The inhabitants of Great Britain when the Anglo-Saxons arrived were mostly romanized Celts who spoke Latin and a Celtic language that was the ancestor of modern-day Welsh and Cornish. (In what is now Scotland, the inhabitants spoke a different Celtic language, Gaelic, and perhaps also Pictish, but not much is known about Pictish.) But while there were Latin- and Celtic-speaking people in Great Britain before the Anglo-Saxons arrived, those languages probably had very little influence on Old English and should not be considered ancestors of English. English began as a distinct language when the Anglo-Saxons split off from their Germanic cousins and left mainland Europe beginning around 450 AD.

For years it was assumed that the Anglo-Saxons wiped out most of the Celts and forced the survivors to the edges of the island—Cornwall, Wales, and Scotland. But archaeological and genetic evidence has shown that this isn’t exactly the case. The Anglo-Saxons more likely conquered the Celts and intermarried with them. Old English became the language of government and education, but Celtic languages may have survived in Anglo-Saxon–occupied areas for quite some time.

From Old to Middle English

Old English continued until about 1066, when the Normans invaded and conquered England. At that point, the language of government became Old French (or at least the version of it spoken by the Normans) or Medieval Latin. Though peasants still spoke English, nobody was writing much in the language anymore. And when English made a comeback in the 1300s, it had changed quite radically. The complex system of declensions and other inflections from Old English was gone, and the language had borrowed considerably from French and Latin. Though there isn’t a firm line, by the end of the eleventh century Old English is considered to have ended and Middle English to have begun.

The differences between Old English and Middle English are quite stark. Just compare the Lord’s Prayer in each language:

Old English:

Fæder ure þu þe eart on heofonum;
Si þin nama gehalgod
to becume þin rice
gewurþe ðin willa
on eorðan swa swa on heofonum.
urne gedæghwamlican hlaf syle us todæg
and forgyf us ure gyltas
swa swa we forgyfað urum gyltendum
and ne gelæd þu us on costnunge
ac alys us of yfele soþlice

(The character that looks like a p with an ascender is called a thorn, and it is pronounced like the modern th. It could be either voiceless or voiced depending on its position in a word. The character that looks like an uncial d with a stroke through it is also pronounced just like a thorn, and the two symbols were used interchangeably. Don’t ask me why.)

Middle English:

Oure fadir that art in heuenes,
halewid be thi name;
thi kyngdoom come to;
be thi wille don,
in erthe as in heuene.
Yyue to vs this dai oure breed ouer othir substaunce,
and foryyue to vs oure dettis,
as we foryyuen to oure dettouris;
and lede vs not in to temptacioun,
but delyuere vs fro yuel. Amen.

(Note that u and v could both represent either /u/ or /v/. V was used at the beginnings of words and u in the middle. Thus vs is “us” and yuel is “evil”.)

While you can probably muddle your way through some of the Lord’s Prayer in Old English, there are a lot of words that are unfamiliar, such as gewurþe and soþlice. And this is probably one of the easiest short passages to read in Old English. Not only is it a familiar text, but it dates to the late Old English period. Older Old English text can be much more difficult. The Middle English, on the other hand, is quite readable if you know a little bit about Middle English spelling conventions.

And even where the Old English is readable, it shows grammatical inflections that are stripped away in Middle English. For example, ure, urne, and urum are all forms of “our” based on their grammatical case. In Middle English, though, they’re all oure, much like Modern English. As I said above, the change from Old English to Middle English was quite radical, and it was also quite sudden. My professor of Old English and Middle English said that there are cases where town chronicles essentially change from Old to Middle English in a generation.

But here’s where things get a little murky. Some have argued that the vernacular language didn’t really change that quickly—it was only the codified written form that did. That is, people were taught to write a sort of standard Old English that didn’t match what they spoke, just as people continued to write Latin even as they were speaking the evolving Romance dialects such as Old French and Old Spanish.

So perhaps the complex inflectional system of Old English didn’t disappear suddenly when the Normans invaded; perhaps it was disappearing gradually throughout the Old English period, but those few who were literate learned the old forms and retained them in writing. Then, when the Normans invaded and people mostly stopped writing in English, they also stopped learning how to write standard Old English. When they started writing English again a couple of centuries later, they simply wrote the language as it was spoken, free of the grammatical forms that had been artificially retained in Old English for so long. This also explains why there was so much dialectal variation in Middle English; because there was no standard form, people wrote their own local variety. It wasn’t until the end of the Middle English period that a new standard started to coalesce and Early Modern English was born.

Supposed Celtic Syntax in English

And with that history established, I can finally get to my second problem with that graphic above: the supposed Celtic remnants in English. English may be a Germanic language, but it differs from its Germanic cousins in several notable ways. In addition to the glut of French, Latin, Greek, and other borrowings that occurred in the Middle and Early Modern English periods, English has some striking syntactic differences from other Germanic languages.

English has what is known as the continuous or progressive aspect, which is formed with a form of be and a present participle. So we usually say I’m going to the store rather than just I go to the store. It’s rather unusual to use a periphrastic—that is, wordy—construction as the default when there’s a shorter option available. Many languages do not have progressive forms at all, and if they do, they’re used to specifically emphasize that an action is happening right now or is ongoing. English, on the other hand, uses it as the default form for many types of verbs. But in German, for example, you simply say Ich gehe in den Laden (“I go to the store”), not Ich bin gehende in den Laden (“I am going to the store”).

English also makes extensive use of a feature known as do support, wherein we insert do into certain kinds of constructions, mostly questions and negatives. So while German would have Magst du Eis? (“Like you ice cream?”), English inserts a dummy do: Do you like ice cream? These constructions are rare cross-linguistically and are very un-Germanic.

And some people have come up with a very interesting explanation for this unusual syntax: it comes from a Celtic substrate. That is, they believe that the Celtic population of Britain adopted Old English from their Anglo-Saxon conquerors but remained bilingual for some time. As they learned Old English, they carried over some of their native syntax. The Celtic languages have some rather unusual syntax themselves, highly favoring periphrastic constructions over inflected ones. Some of these constructions are roughly analogous to the English use of do support and progressive forms. For instance, in Welsh you might say Dwi yn mynd i’r siop (“I am in going to the shop”). (Disclaimer: I took all of one semester in Welsh, so I’m relying on what little I remember plus some help from various websites on Welsh grammar and a smattering of Google Translate.)

While this isn’t exactly like the English equivalent, it looks close. Welsh doesn’t have present participial forms but instead uses something called a verbal noun, which is a sort of cross between an infinitive and gerund. Welsh also uses the particle yn (“in”) to connect the verbal noun to the rest of the sentence, which is actually quite similar to constructions from late Middle and Early Modern English such as He was a-going to the store, where a- is just a worn-down version of the preposition on.

But Welsh uses this construction in all kinds of places where English doesn’t. To say I speak Welsh, for example, you say Dw’i’n siarad Cymraeg, which literally translated means I am in speaking Welsh. In English the progressive stresses that you are doing something right now, while the simple present is used for things that are done habitually or that are generally true. In Welsh, though, it’s unmarked—it’s simply a wordier way of stating something without any special progressive meaning. Despite its superficial similarities to the English progressive, it’s quite far from English in both use and meaning. Additionally, the English construction may have much more mundane origins in the conflation of gerunds and present participles in late Middle English, but that’s a discussion for another time.

Welsh’s use of do support—or, I should say, gwneud support—even less closely parallels that of English. In English, do is used in interrogatives (Do you like ice cream?), negatives (I don’t like ice cream), and emphatic statements (I do like ice cream), and it also appears as a stand-in for whole verb phrases (He thinks I don’t like ice cream, but I do). In Welsh, however, gwneud is not obligatory, and it can be used in simple affirmative statements without any emphasis.

Nor is it always used where it would be in English. Many questions and negatives are formed with a form of the be verb, bod, rather than gwneud. For example, Do you speak Welsh? is Wyt ti’n siarad Cymraeg? (“Are you in speaking Welsh?”), and I don’t understand is Dw i ddim yn deall (“I am not in understanding”). (This is probably simply because Welsh uses the pseudo-progressive in the affirmative form, so it uses the same construction in interrogatives and negatives, much like how English would turn “He is going to the store” into “Is he going to the store?” or “He isn’t going to the store.” Do is only used when there isn’t another auxiliary verb that could be used.)

But there’s perhaps an even bigger problem with the theory that English borrowed these constructions from Celtic: time. Both the progressive and do support start to appear in late Middle English (the fourteenth and fifteenth centuries), but they don’t really take off until the sixteenth century and beyond, over a thousand years after the Anglo-Saxons began colonizing Great Britain. So if the Celtic inhabitants of Britain adopted English but carried over some Celtic syntax, and if the reason why that Celtic syntax never appeared in Old English is that the written language was a standardized form that didn’t match the vernacular, and if the reason why Middle English looks so different from Old English is that people were now writing the way they spoke, then why don’t we see these Celticisms until the end of the Middle English period, and then only rarely?

Proponents of the Celtic substrate theory argue that these features are so unusual that they could only have been borrowed into English from Celtic languages. They ask why English is the only Germanic language to develop them, but it’s easy to flip this sort of question around. Why did English wait for more than a thousand years to borrow these constructions? Why didn’t English borrow the verb-subject-object sentence order from the Celtic languages? Why didn’t it borrow the after-perfect, which uses after plus a gerund instead of have plus a past participle (She is after coming rather than She has come), or any number of other Celtic constructions? And maybe most importantly, why are there almost no lexical borrowings from Celtic languages into English? Words are the first things to be borrowed, while more structural grammatical features like syntax and morphology are among the last. And just to beat a dead horse, just because something developed in English doesn’t mean you should expect to see the same thing develop in related languages.

The best thing that the Celtic substrate theory has going for it, I think, is that it’s appealing. It neatly explains something that makes English unique and celebrates the Celtic heritage of the island. But there’s a danger whenever a theory is too attractive on an emotional level. You tend to overlook its weaknesses and play up its strengths, as John McWhorter does when he breathlessly explains the theory in Our Magnificent Bastard Tongue. He stresses again and again how unique English is, how odd these constructions are, and how therefore they must have come from the Celtic languages.

I’m not a historical linguist and certainly not an expert in Celtic languages, but alarm bells started going off in my head when I read McWhorter’s book. There were just too many things that didn’t add up, too many pieces that didn’t quite fit. I wanted to believe it because it sounded so cool, but wanting to believe something doesn’t make it so. Of course, none of this is to say that it isn’t so. Maybe it’s all true but there just isn’t enough evidence to prove it yet. Maybe I’m being overly skeptical for nothing.

But in linguistics, as in other sciences, a good dose of skepticism is healthy. A crazy theory requires some crazy-good proof, and right now, all I see is a theory with enough holes in it to sink a fleet of Viking longboats.


The Pronunciation of Smaug

With the recent release of the new Hobbit movie, The Desolation of Smaug, a lot of people have been talking about the pronunciation of the titular dragon’s name. The inclination for English speakers is to pronounce it like smog, but Tolkien made clear in his appendixes to The Lord of the Rings that the combination au was pronounced /au/ (“ow”), as it is in German. A quick search on Twitter shows that a lot of people are perplexed or annoyed by the pronunciation, with some even declaring that they refuse to see the movie because of it. Movie critic Eric D. Snider joked, “I’m calling him ‘Smeowg’ now. Someone please Photoshop him to reflect the change, thanks.” I happily obliged.


I can haz desolashun?

So what is it about the pronunciation of Smaug that makes people so crazy? Simply put, it doesn’t fit modern English phonology. Phonology is the pattern of sounds in language (or the study of those patterns), including things like syllable structure, word stress, and permissible sound combinations. In my undergraduate phonology class, my professor once gave us an exercise: think of all the consonants that can follow /au/, and give an example of each. The first several came easily, but we started to run out quickly: out, house (both as a noun with /s/ and as a verb with /z/), owl, mouth (both as a noun with /θ/ and as a verb with /ð/), down, couch, hour, and gouge. What these sounds all have in common is that they’re coronal consonants, or those made with the front of the tongue.

The coronal consonants in modern Standard English are /d/, /t/, /s/, /z/, /θ/ (as in thin), /ð/ (as in this), /ʃ/ (as in shoe), /ʒ/ (as in measure), /tʃ/ (as in church), /dʒ/ (as in judge), /l/, /r/, and /n/. As far as I know, only two coronal consonants are missing from the list of consonants that can follow /au/—/ʃ/ and /ʒ/, the voiceless and voiced postalveolar fricatives. By contrast, /g/ is a dorsal consonant, pronounced with the back of the tongue. There are some nonstandard dialects (such as Cockney and African American English) that change /θ/ to /f/ and thus pronounce words like mouth as /mauf/, but in Standard English the pattern holds; there are no words with /aup/ or /aum/ or /auk/. (The only exception I know of, howf, is a rare Scottish word that was apparently borrowed from Dutch, and it could be argued that it appears rarely enough in Standard English that it shouldn’t be considered a part of it. It appears not at all in the Corpus of Contemporary American English and only once in the Corpus of Historical American English, but it’s in scare quotes. I only know it as an occasionally handy Scrabble word.)
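If it helps to see the pattern spelled out, here is a minimal Python sketch of it. The names CORONALS, UNATTESTED, and can_follow_au are made up for illustration, and the sets only encode the observations in this post (including /θ/ and /ð/, since the mouth examples are part of the pattern); it is not a full model of English phonotactics.

```python
# Coronal consonants of Standard English, including the dental fricatives
# heard in "mouth" (noun and verb). These sets are illustrative assumptions
# that simply restate the observations in the post.
CORONALS = {"d", "t", "s", "z", "θ", "ð", "ʃ", "ʒ", "tʃ", "dʒ", "l", "r", "n"}

# The two coronals that, as far as the post can tell, never follow /au/.
UNATTESTED = {"ʃ", "ʒ"}


def can_follow_au(consonant: str) -> bool:
    """Return True if the consonant fits the Standard English /au/ pattern."""
    return consonant in CORONALS and consonant not in UNATTESTED


if __name__ == "__main__":
    # /t/ (out), /n/ (down), /dʒ/ (gouge) fit; /g/ (Smaug) and /k/ do not.
    for c in ["t", "n", "dʒ", "ʃ", "g", "k", "m"]:
        print(c, can_follow_au(c))
```

Under these assumptions the /g/ of Smaug fails the check for the same reason /p/, /m/, and /k/ do: it isn’t coronal.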

And this isn’t simply a case like orange or silver, where nothing happens to rhyme with them. Through the accidents of history, the /aug/ combination simply does not occur in modern English. Before the Great Vowel Shift, Middle English /au/ turned into /ɔ:/ (as in caught today). (Note: the : symbol here denotes that a vowel is long.) During the Great Vowel Shift, /u:/ turned into a new /au/, but apparently this /u:/ never occurred before non-coronal consonants. This means that in Middle English, either /u/ lengthened before coronals or /u:/ shortened before non-coronals; I’m not sure which. But either way, it left us with the unusual pattern we see in English today.

What all this technical gibberish means is that, in the absence of a clear pronunciation guide, readers will assume that the “au” in Smaug is pronounced as it is in other English words, which today is almost always /ɔ:/ or /ɑ:/. Thus most Americans will rhyme it with smog. (I can’t speak with authority about other varieties of English, but they would probably opt for one of those vowels or something similar, but not the diphthong /au/.) It’s not surprising that many readers will feel annoyed when told that their pronunciation clashes with the official pronunciation, which they find unintuitive and, frankly, rather non-English.

One final note: Michael Martinez suggests in this post that /smaug/ is not actually Tolkien’s intended pronunciation. After all, he says, the appendixes are a guide to the pronunciation of Elvish, and Smaug’s name is not Elvish. Martinez quotes one of Tolkien’s letters regarding the origin of the name: “The dragon bears as name—a pseudonym—the past tense of the primitive Germanic verb Smugan, to squeeze through a hole: a low philological jest.” He seems to take this as evidence against the pronunciation /smaug/, but this is probably because Tolkien was not as clear as he could have been. Smugan is the infinitive form; the past tense is—surprise—smaug.

Note: the definition given for the Proto-Germanic form doesn’t quite match Tolkien’s, though it appears to be the same verb; the Old English form, also with the infinitive smugan, is defined as “to creep, crawl, move gradually”. The astute student of language will notice that the past tense of the verb in Old English had the form smēag in the first and third person. This is because the Proto-Germanic /au/ became /ēa/ in Old English and /i:/ or /ai/ in modern English; compare the German Auge ‘eye’ and the English eye. This demonstrates once again that English lost the combination /aug/ quite some time ago while its sister languages hung on to it.

So yes, it appears that Tolkien really did intend Smaug to be pronounced /smaug/, with that very un-English (but very Germanic) /aug/ combination at the end. He was a linguist and studied several languages in depth, particularly old Germanic languages such as Old English, Old Norse, and Gothic. He was certainly well aware of the pronunciation of the word, even if he didn’t make it clear to his readers. You can find the pronunciation silly if you want, you can hate it, and you can even threaten to boycott the movie, but you can’t call it wrong.


Hanged and Hung

The distinction between hanged and hung is one of the odder ones in the language. I remember learning in high school that people are hanged, pictures are hung. There was never any explanation of why it was so; it simply was. It was years before I learned the strange and complicated history of these two words.

English has a few pairs of related verbs that are differentiated by their transitivity: lay/lie, rise/raise, and sit/set. Transitive verbs take objects; intransitive ones don’t. In each of these pairs, the intransitive verb is strong, and the transitive verb is weak. Strong verbs inflect for the preterite (simple past) and past participle forms by means of a vowel change, such as sing–sang–sung. Weak verbs add the -(e)d suffix (or sometimes just a -t or nothing at all if the word already ends in -t). So lie–lay–lain is a strong verb, and lay–laid–laid is weak. Note that the subject of one of the intransitive verbs becomes the object when you use its transitive counterpart. The book lay on the floor but I laid the book on the floor.

Historically hang belonged with these pairs, and it ended up in its current state through the accidents of sound change and history. It was originally two separate verbs (the Oxford English Dictionary actually says it was three—two Old English verbs and one Old Norse verb—but I don’t want to go down that rabbit hole) that came to be pronounced identically in their present-tense forms. They still retained their own preterite and past participle forms, though, so at one point in Early Modern English hang–hung–hung existed alongside hang–hanged–hanged.

Once the two verbs started to collapse together, the distinction started to become lost too. Just look at how much trouble we have keeping lay and lie separate, and they only overlap in the present lay and the past tense lay. With identical present tenses, hang/hang began to look like any other word with a choice between strong and weak past forms, like dived/dove or sneaked/snuck. The transitive/intransitive distinction between the two effectively disappeared, and hung won out as the preterite and past participle form.

The weak transitive hanged didn’t completely vanish, though; it stuck around in legal writing, which tends to use a lot of archaisms. Because it was only used in legal writing in the sense of hanging someone to death (with the poor soul as the object of the verb), it picked up the new sense that we’re now familiar with, whether or not the verb is transitive. Similarly, hung is used for everything but people, whether or not the verb is intransitive.

Interestingly, German has mostly hung on to the distinction. Though the German verbs both merged in the present tense into hängen, the past forms are still separate: hängen–hing–gehangen for intransitive forms and hängen–hängte–gehängt for transitive. Germans would say the equivalent of I hanged the picture on the wall and The picture hung on the wall—none of this nonsense about only using hanged when it’s a person hanging by the neck until dead.

The surprising thing about the distinction in English is that it’s observed (at least in edited writing) so faithfully. Usually people aren’t so good at honoring fussy semantic distinctions, but here I think the collocates do a lot of the work of selecting one word or the other. Searching for collocates of both hanged and hung in COCA, we find the following words:



The hanged words pretty clearly all involve hanging people, whether by suicide, as punishment for murder, or in effigy. (The collocations with burned were all about hanging and burning people or effigies.) The collocates for hung show no real pattern; it’s simply used for everything else. (The collocations with neck were not about hanging by the neck but about things being hung from or around the neck.)

So despite what I said about this being one of the odder distinctions in the language, it seems to work. (Though I’d like to know to what extent, if any, the distinction is an artifact of the copy editing process.) Hung is the general-use word; hanged is used when a few very specific and closely related contexts call for it.


No Dice

If you’ve ever had to learn a foreign language, you may have struggled to memorize plural forms of nouns. German, for example, has about half a dozen ways of forming plurals, and it’s a chore to remember which kind of plural each noun takes. English, by comparison, is ridiculously easy. Here’s how it works for nearly every English noun: add -s to the end. Sometimes you need to insert an e before the s, and sometimes you need to change a preceding y to ie, but that’s the rule in a nutshell.
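For the programmers in the audience, the rule in a nutshell can be sketched in a few lines of Python. The function name regular_plural is made up, and the exact triggers (an e after s, x, z, ch, or sh; y changing to ie only after a consonant) are my assumptions about what “sometimes” means here. It only covers the regular spelling pattern.

```python
def regular_plural(noun: str) -> str:
    """Spell the regular English plural of a noun (rough sketch only)."""
    vowels = "aeiou"
    # Insert an e before the s after a sibilant-type ending: box -> boxes.
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Change y to ie when a consonant precedes it: city -> cities.
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in vowels:
        return noun[:-1] + "ies"
    # Otherwise just add -s: cat -> cats, day -> days.
    return noun + "s"


if __name__ == "__main__":
    for word in ["cat", "box", "church", "city", "day"]:
        print(word, "->", regular_plural(word))
```

A sketch like this happily produces regular forms for irregular nouns too (feed it ox or mouse and see), so it says nothing about exceptions.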

Of course, there are still plenty of exceptions: a couple that end in -en (oxen and the strange double plural children), a handful of umlaut plurals (man–men, foot–feet, mouse–mice, etc.), some uninflected plurals (usually for domesticated or game animals, such as sheep, deer, and so on), and a plethora of foreign borrowings (particularly from Latin and Greek) that often follow rules from their donor languages but occasionally don’t. There are a few other oddballs—like person–people, for example—but nearly every English count noun fits into one of these categories.

But there’s one plural that doesn’t fit into any of these categories, because it’s been caught for centuries in a strange limbo between count nouns, which take plural forms, and mass nouns, which don’t. It’s dice. If you need a refresher, mass nouns generally refer to things that are not discrete, such as milk or oil, though some refer to things that are made of discrete pieces “whose individual identities are not usually important to us,” as Arnold Zwicky put it in this Language Log post—words like corn or rice. You could count the individual grains or kernels if you wanted to, but why would you ever want to?

And this is how dice slipped through the cracks of language change. Originally, die was a regular noun that formed its plural by adding an s sound to the end. (For the moment, let’s leave aside the issue of spelling, because Middle and Early Modern English spelling was anything but standard.) At some point in the history of English, the final -s in plurals was voiceless, meaning that it was always pronounced with an s sound, not a z sound. But then that changed, probably sometime in the 1500s, so that the final -s was always voiced—that is, pronounced as a z—unless it followed a voiceless sound. Strangely, this sound change seems to have affected only the plural and possessive -s endings and not other word-final s’s.
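The post-change rule is simple enough to sketch in Python. The names VOICELESS and plural_ending are hypothetical, the set of voiceless sounds is a small illustrative sample, and the sketch ignores nouns ending in sibilants like s or ch, which take an extra vowel before the ending.

```python
# A small, illustrative set of voiceless sounds that can end a noun stem.
# Sibilants are left out on purpose; they take a separate, syllabic ending.
VOICELESS = {"p", "t", "k", "f", "θ"}


def plural_ending(final_sound: str) -> str:
    """Return the sound of plural -s after the sound change described above."""
    # The ending stays voiceless only when it follows a voiceless sound.
    return "s" if final_sound in VOICELESS else "z"


if __name__ == "__main__":
    # cat ends in /t/, dog in /g/, and die in the diphthong /aɪ/.
    for word, final in [("cat", "t"), ("dog", "g"), ("die", "aɪ")]:
        print(word, "-> /" + plural_ending(final) + "/")
```

By this rule the plural of die should end in a z sound, which is what we get in dies; dice, with its s sound, is the form the rule skipped.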

But around that time, we start seeing the plural of die, when referring to those little cubes with pips used for games and whatnot, spelled as dice (and similar forms). In Modern English spelling, the final -s on a plural can be either voiced or voiceless, depending on the preceding sound, but -ce is always voiceless. As the regular plural ending was becoming voiced for many, many words, it remained voiceless in dice. Why?

Well, apparently because people had stopped thinking of it as a plural and started thinking of it as a mass noun, much like corn and rice, so they stopped seeing the s sound on the end as the plural marker and started perceiving it as simply part of the word. Singular dice can be found back to the late 1300s, and when the sound change came along in the 1500s and voiced most plural -s endings, dice was left behind, with its spelling altered to show that it was unequivocally voiceless. In other senses of the word, die was still thought of as a regular count noun, so its plural forms ended up as dies.*

Dice wasn’t the only word passed over in this way, though; truce (originally the plural of true, meaning “pledge” or “oath”), bodice (plural of body), and pence (a contracted plural form of penny) come to us the same way. Speakers subconsciously reanalyzed these words as mass nouns or singular count nouns, so their final s sounds stayed voiceless. Similarly, once, twice, and thrice were originally genitive forms, but they ceased to be thought of as such and consequently retained their voiceless sounds, respelled with ce.

But the strange thing is that whereas the words mentioned above made the transition to mass nouns or new singular count nouns, usage of dice has been split for centuries. We’ve never fully made the switch to thinking of dice as a mass noun, used regardless of the actual number of the things, because, unlike rice or corn, we do frequently care about the number of dice being used. Instead of a true mass noun, it’s become an uninflected count noun—one dice, two dice—for many people, though it exists alongside the original singular die. But singular dice is rare in print, because we’re told that it’s properly one die, two dice, even though some dictionaries note that singular dice is much more frequent in gaming than die.

So where does that leave us? You can go with singular die and possibly be thought of as something of a pedant, or you can go with singular dice and possibly be thought of as a little ignorant. As for me, I usually use singular die and feel twinges of self-loathing when I do so; I haven’t had the heart to correct my boys when they use singular dice.

*For more on the reconstruction of the plural ending in English, see the section on the English plural suffix in the chapter “Reconstruction” in Language History: An Introduction, by Andrew L. Sihler (Philadelphia: John Benjamins, 2000).


Whose Pronoun Is That?

In my last post I touched on the fact that whose as a relative possessive adjective referring to inanimate objects feels a little strange to some people. In a submission for the topic suggestion contest, Jake asked about the use of that with animate referents (“The woman that was in the car”) and then said, “On the flip side, consider ‘the couch, whose cushion is blue.’ ‘Who’ is usually used for animate subjects. Why don’t we have the word ‘whichs’ for inanimate ones?”

Merriam-Webster’s Dictionary of English Usage (one of my favorite books on language; if you don’t already own it, you should buy it now—seriously) says that inanimate whose has been in use from the fourteenth century to the present but that it wasn’t until the eighteenth century that grammarians like Bishop Lowth (surprise, surprise) started to cast aspersions on its use.

MWDEU concludes that “the notion that whose may not properly be used of anything except persons is a superstition; it has been used by innumerable standard authors from Wycliffe to Updike, and is entirely standard as an alternative to of which the in all varieties of discourse.” Bryan A. Garner, in his Garner’s Modern American Usage, says somewhat more equivocally, “Whose may usefully refer to things ⟨an idea whose time has come⟩. This use of whose, formerly decried by some 19th-century grammarians and their predecessors, is often an inescapable way of avoiding clumsiness.” He ranks it a 5—“universally adopted except for a few eccentrics”—but his tone leaves one feeling as if he thinks it the lesser of two evils.

But how did we end up in this situation in the first place? Why don’t we have a whiches or thats or something equivalent? MWDEU notes that “English is not blessed with a genitive form for that or which,” but to understand why, you have to go back to Old English and the loss of the case system in Early Middle English.

First of all, Old English did not use interrogative pronouns (who, which, or what) as relative pronouns. It either used demonstrative pronouns—whence our modern that is descended—or the invariable complementizer þe, which we’ll ignore for now. The demonstrative pronouns declined for gender, number, and case, just like the demonstrative and relative pronouns of modern German. The important point is that in Old English, the relative pronouns looked like this:

Case          Masculine   Neuter     Feminine   Plural
Nominative    se          þæt        sēo        þā
Accusative    þone        þæt        þā         þā
Genitive      þæs         þæs        þǣre       þāra, þǣra
Dative        þǣm         þǣm        þǣre       þǣm, þām
Instrumental  þȳ, þon     þȳ, þon

(Taken from Wikipedia. The þ is a thorn, which represents a “th” sound.)

As the Old English case system disappeared, this all reduced to the familiar that, which you can see comes from the neuter nominative/accusative form. The genitive, or possessive, form was lost. And in Middle English, speakers began to use interrogative pronouns as relatives, probably under the influence of French. Here’s what the Old English interrogative pronouns looked like:

Case          Masculine/Feminine   Neuter   Plural
Nominative    hwā                  hwæt     hwā/hwæt
Accusative    hwone                hwæt     hwone/hwæt
Genitive      hwæs                 hwæs     hwæs
Dative        hwǣm                 hwǣm     hwǣm
Instrumental  hwȳ                  hwȳ      hwǣm

(Wikipedia didn’t have an article or section on Old English interrogative pronouns, so I borrowed the forms from Wikibooks.)

On the masculine/feminine side, we get the ancestors of our modern who/whom/whose (hwā/hwǣm/hwæs), and on the neuter side, we get the ancestor of what (hwæt). Notice that the genitive forms for the two are the same—that is, although we think of whose being the possessive form of who, it’s historically also the possessive form of what.

But we don’t use what as a relative pronoun (well, some dialects do, but Standard English doesn’t); we use which instead. Which also had the full paradigm of case endings, just like who/what and that. But rather than bore you with more tables full of weird-looking characters, I’ll cut to the chase: which originally had a genitive form, but it too was lost when the Old English case system disappeared.

So of all the demonstrative and interrogative pronouns in English, only one survived with its own genitive form, who. (I don’t know why who hung on to its case forms while the others lost theirs; maybe that’s a topic for another day.) Speakers quite naturally used whose to fill that gap—and keep in mind that it was originally the genitive form of both the animate and inanimate forms of the interrogative pronoun, so English speakers originally didn’t have any qualms about employing it with inanimate relative pronouns, either.

But what does that mean for us today? Well, on the one hand, you can argue that whose as an inanimate relative possessive adjective has a long, well-established history. It’s been used by the best writers for centuries, so there’s no question that it’s standard. But on the other hand, this ignores the fact that some people think there’s something not quite right about it. After all, we don’t use whose as a possessive form of which or that in their interrogative or demonstrative functions. And although it has a long pedigree, another inanimate possessive with a long pedigree fell out of use and was replaced.

His was originally the possessive form of both he and it, but neuter his started to fall out of use and be replaced by a new form its in the sixteenth century. After English lost grammatical gender, people began to use he and she only for people and other animate things and it only for inanimate things. They started to feel a little uncomfortable using the original possessive form of it, his, with inanimate things, so they fashioned a new possessive, its, to replace it.

In other words, there’s precedent for disfavoring inanimate whose and using another word or construction instead. Unfortunately, now thats or whiches will never get off the ground, because they’ll be so heavily stigmatized as nonstandard forms. There are two different impulses fighting one another here: the impulse to have a full and symmetrical paradigm and the impulse to avoid using animate pronouns for inanimate things. Only time will tell which one wins out. For now, I’d say it’s good to remember that inanimate whose is frequently used by good writers and that there’s nothing wrong with it per se. In your own writing, just trust your ear.


An Introduction to Historical Linguistics

Historical linguistics is a field that many people don’t know a whole lot about. We all speak a language, and we all know that our words came from somewhere else, but we don’t always have the clearest idea as to where or why. So people speculate and come up with plausible explanations of word origins—what we call folk etymologies.

The problem is that etymologies are quite often not intuitive, nor can they be determined solely through deductive reasoning. Words take very circuitous paths on their way from history to the present. Over the course of a couple thousand years, a word can change so thoroughly that it becomes unrecognizable. Words that look similar aren’t always related, and words that are related don’t always look similar.

Take, for instance, just a few of the Indo-European words for five: fünf (German), cinq (French), pump (Welsh), cóig (Scottish Gaelic), pénte (classical Greek), pyat’ (Russian), pãch (Hindi), and panj (Farsi). Believe it or not, all these words—as disparate as they seem—are related; they come from the same word, *penkwe. They may look very different, but they all changed via systematic sound changes.

Think of it like a Rubik’s cube: you start out with all the colors on their respective sides, and then you start turning the faces. Pretty soon, it’s a complete jumble. There doesn’t seem to be any sort of pattern to the arrangement of the colors now, but they didn’t get that way by chance; you made a specific series of twists to get them to end up where they are.

This means that you can’t assume two words are related just because they look alike, or that two words aren’t related because they don’t look alike. Looking alike is a good start, but that’s all it is; next you have to find the systematic changes that connect the words. Historical linguistics isn’t just guesswork or finding lists of words that have a couple of sounds in common. It’s about knowing where the language has been and how languages change and then filling in the blanks.

There are lots of books and sites out there that purport to show that German comes from Hebrew or that Welsh and Hindi are closely related or any number of other weird claims. However, the thing that these all lack is systematicity. Without a system and without a knowledge of how languages change, historical linguistics is nothing more than a meaningless matching game. You can take the stickers off the Rubik’s cube and rearrange them to look good, but you haven’t really solved the puzzle.

