Arrant Pedantry


Umlauts, Diaereses, and the New Yorker

Several weeks ago, the satirical viral content site Clickhole posted this article: “Going Rogue: ‘The New Yorker’ Has Announced That They’re Going To Start Putting An Umlaut Over Every Letter ‘O’ And No One Can Stop Them”. I’ve long enjoyed poking at the New Yorker for its distractingly idiosyncratic style,* but I had a couple of quibbles with the article, so I took to Twitter to explain the history of those little dots.

First off, those little dots that appear in words like coöperation aren’t umlauts: they’re diaereses. But a few paragraphs into the article, they actually correct the headline with this fake quote from New Yorker editor David Remnick: “We already know some of you don’t like the dots. You probably call them umlauts. Well, you’re wrong: They’re actually called diaeresis, so try thinking twice before trying to correct us on how we use them.” But this just introduced another problem: diaeresis is the singular form. The plural form is diaereses. That is, the second o in coöperate has a diaeresis over it, but you’d say that the New Yorker uses diaereses in words with doubled vowels.

A diaeresis is a pair of dots that appear over a vowel to indicate that the vowel is pronounced separately from an adjacent vowel. For example, in English oo is generally pronounced as a single vowel sound, usually either the /u/ sound in boot or the /ʊ/ in book. The New Yorker puts a diaeresis over the repeated vowel in words like cooperate to show that those two o’s are pronounced as two distinct vowels. This also applies to other words with repeated vowels like reelect.

English doesn’t use very many diacritical marks, and the ones that it does use are almost entirely from foreign borrowings. But the diaeresis is uncommon in English even compared to other diacriticals. It mostly appears in French borrowings like naïveté (though naïve is often simplified to naive), where it serves the same purpose: showing that the two adjacent vowels are pronounced separately and not as a diphthong or a single long vowel. (In French, for example, ai is pronounced with the /ɛ/ sound in bet, so without the diaeresis, naive would be pronounced like Neve Campbell’s first name.)

The diaeresis goes all the way back to Ancient Greek, where it was also used the same way. Its first use, though, was to separate a vowel at the start of a new word from a vowel at the end of a preceding word, because Greek was originally written without any spaces between words. The word diaeresis comes from the Ancient Greek word for ‘division’, from diairein ‘to divide, separate’, from dia– ‘apart’ + hairein ‘take’. That is, it was simply a mark that divided two words or two adjacent vowels. Some later European languages saw the utility of a mark that indicated that two vowels were meant to be pronounced individually, and they adopted it.

But it has never been common in English outside of the pages of the New Yorker. In Confessions of a Comma Queen, former New Yorker copy editor Mary Norris briefly recounts the rationale behind the magazine’s style choice (excerpted here on Merriam-Webster’s website):

Basically, we have three options for these kinds of words: “cooperate,” “co-operate,” and “coöperate.” Back when the magazine was just developing its style, someone decided that the first could be misread and the second was ridiculous, and so adopted the third as the most elegant solution with the broadest application.

Norris also says that the style editor was on the verge of changing his mind on the diaereses back in 1978, but then he died, and “no one has had the nerve to raise the subject since.” Norris herself admits that “most people would not trip over the ‘coop’ in ‘cooperate’ or the ‘reel’ in ‘reelect’” and that diaereses are the number one complaint from readers, but apparently they’re not going anywhere anytime soon. (I think that if you’re afraid to talk about changing your style guide—especially when readers find your style distracting and annoying—then you have either a bad style guide or a bad culture surrounding your style guide or both.)

But on to umlauts. Umlauts look just like diaereses—you could call diaereses and umlauts homoglyphic—but they’re used in a very different way and have a distinct origin. The umlaut symbol originated in German but has been borrowed into other languages, including Swedish, Hungarian, Turkish, and Finnish. But to understand what an umlaut does, you need to understand a little bit about where vowels are produced in the mouth.

The vowel /u/ (the sound in “boot”) is a high back vowel: it’s pronounced with the tongue pulled back and the mouth only slightly open so that the tongue is close to the roof of the mouth. The vowel /i/ (the sound in “beet”), on the other hand, is a high front vowel: it’s similarly pronounced with the mouth only slightly open so that the tongue is close to the roof of the mouth, but the tongue is pushed forward instead. If you alternate between saying “oo” and “ee”, you should be able to feel the difference. The vowel /i/ is pronounced a little behind your top front teeth, while /u/ is pronounced towards the soft palate, also known as the velum. (The vowel /u/ is also pronounced with the lips rounded, which has the effect of enhancing the distinction between it and /i/.) And the vowel /a/ (roughly like “ah”, though not every dialect of English has that exact vowel sound) is pronounced with the tongue in the middle or towards the front of the mouth and with the mouth wide open. The International Phonetic Alphabet considers it a low front vowel, but it’s also sometimes treated as a low central vowel.

What an umlaut symbol does, then, is indicate that a vowel is produced further forward in the mouth (and sometimes also higher in the mouth) than normal. For example, ü is pronounced in the same place as /i/, but it retains the lip rounding of /u/. Try saying /i/ or /ɪ/ (“ee” or “ih”) with your lips rounded, and voilà: you just made the sound of the German ü. An ö, by contrast, is like an /e/ or an /ɛ/ (roughly an “ay” or an “eh”) with lips rounded, while an ä is raised to an /e/ or an /ɛ/. (The vowel /a/ doesn’t have any lip rounding, so neither does the umlauted version.)

The term umlaut, which comes from a German word roughly meaning ‘sound change’, is also used in Germanic linguistics to refer to certain kinds of vowel changes, especially when a vowel moves closer to /i/. Sometimes, when a back or central vowel is followed by a front vowel, we start moving our tongue forward a little early in anticipation of that front vowel. In other words, the frontness of one vowel can spread backwards through the word to the preceding vowel.

English doesn’t use the umlaut mark, but it’s full of words that were produced by the phonological process of umlaut. Plurals like men, geese, feet, and mice were formed by umlaut. In Proto-Germanic, an ancestor of English that was spoken between about 500 BC and the first few centuries AD, the singular form of the word for ‘man’ was mann, and the plural was manniz. That /i/ vowel in the suffix eventually pulled the /a/ up and forward to /ɛ/, yielding the word men in English. At some point the suffix dropped away entirely, leaving only the changed vowel in the stem as evidence that it was there. In the case of geese, feet, and mice, the umlauted vowels also lost their rounding after they moved forward.

Umlaut also shows up in English in some less expected places. Have you ever wondered why words like busy and bury aren’t spelled like they’re pronounced? It’s because those words evolved in different ways in different Old English dialects. In some dialects, the first vowel umlauted and then lost its rounding, ultimately yielding an /ɪ/ or an /ɛ/. But in other dialects, they didn’t undergo umlaut, retaining the original /u/. At some point the two forms mashed up, and we got the spelling of one dialect and the pronunciation of another. The weird alternations in words like bring/brought and teach/taught are also the product of umlaut, with a couple other phonological changes thrown in for good measure.

So if the phonological process of umlaut is common to English, German, and other Germanic languages, why does German use the umlaut character but not English? It’s simply because the writing systems of each language developed separately after many of those sound changes had happened. For example, the modern German word schön was written schoene in Middle High German (around 1050 to 1350 AD). That final -e on the end, which has since dropped off, caused the o to become umlauted. But then, to make it clear that the o was pronounced with an umlaut, people started writing another e after the o too. Then they started writing that e above the o rather than after it to show that it was affecting the vowel but wasn’t really pronounced, and eventually this superscript e simplified to two short vertical strokes or two dots.† And thus the confusion between diaereses and umlauts was born.
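
The confusion carries over into how text is encoded on computers. As a minimal sketch (the helper names `describe` and `decompose` are made up for this illustration), the following Python snippet uses the standard `unicodedata` module to show that the ö in coöperate and the ö in schön are the same Unicode character, and that Unicode names the mark a diaeresis in both cases:

```python
import unicodedata

# Escapes are used so the examples survive copy-and-paste unchanged.
NEW_YORKER = "co\u00f6perate"  # the dots here are a diaeresis
GERMAN = "sch\u00f6n"          # the dots here are an umlaut


def describe(word):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in word
        if ord(ch) > 127
    ]


def decompose(ch):
    """Return the Unicode names of the pieces of a character after NFD decomposition."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]


print(describe(NEW_YORKER))
print(describe(GERMAN))
print(decompose("\u00f6"))
```

Both calls to `describe` report the same code point, U+00F6, so software has no way of telling a diaeresis from an umlaut by looking at the character alone.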

So there you have it: The diaeresis is originally a Greek thing that indicates that two adjacent vowels are pronounced separately. In English, you’ll mostly see it in a few French borrowings or in the pages of the New Yorker. And the umlaut is originally a German thing, though it also represents a phonological process found in English and other languages. There aren’t a lot of German borrowings in English that use umlauts, so you mostly see it in the names of bands that are trying to look a little more metal.

Nöw yöu knöw.

* The New Yorker’s style inspired one of my favorite style-related tweets, from the inimitable Benjamin Dreyer:

† The tilde and cedilla were formed in similar ways. A tilde was originally just a superscript n, while a cedilla was a subscript z. The history of the latter is even right there in its name: a cedilla is a little ceda, an Old Spanish form of zeta.


Cognates, False and Otherwise

A few months ago, I was editing some online German courses, and I came across one of my biggest peeves in discussions of language: false cognates that aren’t.

If you’ve ever studied a foreign language, you’ve probably learned about false cognates at some point. According to most language teachers and even many language textbooks, false cognates are words that look like they should mean the same thing as their supposed English counterparts but don’t. But cognates don’t necessarily look the same or mean the same thing, and words that look the same and mean the same thing aren’t necessarily cognates.

In linguistics, cognate is a technical term meaning that words are etymologically related—that is, they have a common origin. The English one, two, three, German eins, zwei, drei, French un, deux, trois, and Welsh un, dau, tri are all cognate—they and words for one, two, three in many other languages all trace back to the Proto-Indo-European (PIE) *oino, *dwo, *trei.

These sets are all pretty obvious, but not all cognates are. Take, for example, the English four, five, German vier, fünf, French quatre, cinq, and Welsh pedwar, pump. The English and German are still obviously related, but the others less so. Fünf and pump are actually pretty close, but it seems a pretty long way from four and vier to pedwar, and an even longer way from them to quatre and cinq.

And yet these words all go back to the PIE *kwetwer and *penkwe. Though the modern-day forms aren’t as obviously related, linguists can nevertheless establish their relationships by tracing them back through a series of sound changes to their conjectured historical forms.

And not all cognates share meaning. The English yoke, for instance, is related to the Latin jugular, the Greek zeugma, and the Hindi yoga, along with join, joust, conjugate, and many others. These words all trace back to the PIE *yeug ‘join’, and that sense can still be seen in some of its modern descendants, but if you’re learning Hindi, you can’t rely on the word yoke to tell you what yoga means.

Which brings us back to the German course that I was editing. Cognates are often presented as a way to learn vocabulary quickly, because the form and meaning are often similar enough to the form and meaning of the English word to make them easy to remember. But cognates often vary wildly in form (like four, quatre, and pedwar) and in meaning (like yoke, jugular, zeugma, and yoga). And many of the words presented as cognates are in fact not cognates but merely borrowings. Strictly speaking, cognates are words that have a common origin—that is, they were inherited from an ancestral language, just as the cognates above all descend from Proto-Indo-European. Cognates are like cousins—they may belong to different families, but they all trace back to a common ancestor.

But if cognates are like cousins, then borrowings are like clones, where a copy of a word is taken directly from one language to another. Most of the cognates that I learned in French class years ago are actually borrowings. The English and French forms may look a little different now, but the resemblance is unmistakable. Many of the cognates in the German course I was editing were also borrowings, and in many cases they were words that were borrowed into both German and English from French:


Of these, only gold, hand, land, sand, and wind are actually cognates. Maybe it’s nitpicking to point out that the English jaguar and the German Jaguar aren’t cognates but borrowings from Portuguese. For a language learner, the important thing is that these words are similar in both languages, making them easy to learn.

But it’s the list of supposed false cognates that really irks me:

karton/cardboard box
peperoni/chili pepper
beamer/video projector
argument/proof, reasons

The German word is on the left and the English word on the right. Once again, many of these words are borrowings, mostly from French and Latin. All of these borrowings are clearly related, though their senses may have developed in different directions. For example, chef generally means “boss” in French, but it acquired its specialized sense in English from the longer phrase chef de cuisine, “head of the kitchen”. The earlier borrowing chief still maintains the sense of “head” or “boss”.

(It’s interesting that billion and trillion are on the list, since this isn’t necessarily an English/German difference—it also used to be an American/British difference, but the UK has adopted the same system as the US. Some languages use billion to mean a thousand million, while other languages use it to mean a million million. There’s a whole Wikipedia article on it.)

But some of these words really are cognate with English words—they just don’t necessarily look like it. Bad, for example, is cognate with the English bath. You just need to know that the English sounds spelled as <th>—like the /θ/ in thin or the /ð/ in then—generally became /d/ in German.

And, surprisingly, the German Gift, “poison”, is indeed cognate with the English gift. Gift is derived from the word give, and it means “something given”. The German word is essentially just a highly narrowed sense of the word: poison is something you give someone. (Well, hopefully not something you give someone.)

On a related note, that most notorious of alleged false cognates, the Spanish embarazado, really is related to the English embarrassed. They both trace back to an earlier word meaning “to put someone in an awkward or difficult situation”.

Rather than call these words false cognates, it would be more accurate to call them false friends. This term is broad enough to encompass both words that are unrelated and words that are borrowings or cognates but that have different senses.

This isn’t to say that cognates aren’t useful in learning a language, of course, but sometimes it takes a little effort to see the connections. For example, when I learned German, one of my professors gave us a handout of some common English–German sound correlations, like the th ~ d connection above. If you know that the English /p/ often corresponds to a German /f/ and that the English sound spelled <ea> often corresponds to the German /au/, then the relation between leap and laufen “to run” becomes clearer.

Or if you know that the English sound spelled <ch> often corresponds with the German /k/ or that the English /p/ often corresponds with the German /f/, then the relation between cheap and kaufen “to buy” becomes a little clearer. (Incidentally, this means that the English surname Chapman is cognate with the German Kaufmann.) And knowing that the English <y> sometimes corresponds to the German /g/ might help you see the relationship between the verb yearn and the German adverb gern “gladly, willingly”.
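
A handout like that is essentially a lookup table. As a rough sketch only, the Python snippet below records the correspondences and word pairs mentioned above and checks that each pair shows the pattern it is supposed to illustrate. The names `CORRESPONDENCES`, `PAIRS`, and `illustrates` are made up for this example, and the check is a crude match on spelling, not a real derivation of the sound changes.

```python
# English spelling or sound -> the German counterpart named in the text.
CORRESPONDENCES = {"th": "d", "p": "f", "ea": "au", "ch": "k", "y": "g"}

# Word pairs from the text, with the correspondences each one illustrates.
PAIRS = [
    ("bath", "Bad", ["th"]),
    ("leap", "laufen", ["p", "ea"]),
    ("cheap", "kaufen", ["ch", "p"]),
    ("yearn", "gern", ["y"]),
]


def illustrates(english, german, keys):
    """Crude spelling check: the English form contains each listed pattern
    and the German form contains its counterpart."""
    return all(
        key in english.lower() and CORRESPONDENCES[key] in german.lower()
        for key in keys
    )


for english, german, keys in PAIRS:
    print(english, german, keys, illustrates(english, german, keys))
```

The check only confirms that the expected letters are present somewhere in each word. Showing that two words really are cognate takes the kind of step-by-step historical work described earlier.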

You don’t have to teach a course in historical linguistics in order to teach a foreign language like German, but you’re doing a disservice if you teach that obviously related pairs like Bad and bath aren’t actually related. Rather than teach students that language is random and treacherous, you can teach them to find the patterns that are already there. A little bit of linguistic background can go a long way.

Plus, you know, real etymology is a lot more fun.

Edited to add: In response to this post, Christopher Bergmann created this great diagram of helpful cognates, unhelpful or less-helpful cognates, false cognates, and so on:



Book Review: Schottenfreude

German is famous for its compound words. While languages like English are content to use whole phrases to express an idea, German can efficiently pack the same idea into a single word, like Schadenfreude, which means a feeling of joy from watching or hearing of someone else’s miseries. Well, in Schottenfreude: German Words for the Human Condition, Ben Schott has decided to expand on German’s compounding ability and create words that should exist.

Every right-hand page lists three made-up German compounds, along with their pronunciation, their English translation, and a more literal gloss. On the facing left-hand pages are explanatory notes discussing the concepts in more depth. For example, the first word is Herbstlaubtrittvergnügen (autumn-foliage-strike-fun), meaning “kicking through piles of autumn leaves”. The explanatory notes talk about self-reported rewarding events and the metaphorical connection between fallen leaves and human souls in literature.

The rest of the book proceeds much the same way, with funny and surprising insights into the insecurities, frailties, and joys of human life. Who hasn’t at some time or another experienced Deppenfahrerbeäugung (“the urge to turn and glare at a bad driver you’ve just overtaken”), Sommerferienewigkeitsgefühl (“childhood sensation that the summer vacation will last forever”), or Gesprächsgemetzel (“moments when, for no good reason, a conversation suddenly goes awry”)?

You don’t have to be a German speaker to appreciate this book, but it certainly helps. There are a few puns that you can only appreciate if you have a knowledge of both English and German, such as Besserwinzer (“one of those people who pretend to know more about wine than they do”), which is a play on Besserwisser, meaning “know-it-all”, and Götzengeschwätz (“praying to a god you don’t believe in”), which literally means “idol chatter”. And knowing German will certainly help you pronounce the words better; I found the provided pronunciations somewhat unintuitive, and there’s no key. The words also don’t seem to be in any particular order, so it can be a little difficult to find one again, even though there is an index.

Overall, though, it’s a greatly enjoyable little book, great for flipping through when you have a few idle minutes. Word lovers—and especially German lovers—are sure to find a lot of treasures inside.

Full disclosure: I received a free review copy of this book from the publisher. My apologies to the author and publisher for the lateness of this review.


The Pronunciation of Smaug

With the recent release of the new Hobbit movie, The Desolation of Smaug, a lot of people have been talking about the pronunciation of the titular dragon’s name. The inclination for English speakers is to pronounce it like smog, but Tolkien made clear in his appendixes to The Lord of the Rings that the combination au was pronounced /au/ (“ow”), as it is in German. A quick search on Twitter shows that a lot of people are perplexed or annoyed by the pronunciation, with some even declaring that they refuse to see the movie because of it. Movie critic Eric D. Snider joked, “I’m calling him ‘Smeowg’ now. Someone please Photoshop him to reflect the change, thanks.” I happily obliged.


I can haz arkenstone?

So what is it about the pronunciation of Smaug that makes people so crazy? Simply put, it doesn’t fit modern English phonology. Phonology is the pattern of sounds in language (or the study of those patterns), including things like syllable structure, word stress, and permissible sound combinations. In my undergraduate phonology class, my professor once gave us an exercise: think of all the consonants that can follow /au/, and give an example of each. The first several came easily, but we started to run out quickly: out, house (both as a noun with /s/ and as a verb with /z/), owl, mouth (both as a noun with /θ/ and as a verb with /ð/), down, couch, hour, and gouge. What these sounds all have in common is that they’re coronal consonants, or those made with the front of the tongue.

The coronal consonants in modern Standard English are /d/, /t/, /s/, /z/, /θ/ (as in thin), /ð/ (as in then), /ʃ/ (as in shoe), /ʒ/ (as in measure), /tʃ/ (as in church), /dʒ/ (as in judge), /l/, /r/, and /n/. As far as I know, only two coronal consonants are missing from the list of consonants that can follow /au/—/ʃ/ and /ʒ/, the voiceless and voiced postalveolar fricatives. By contrast, /g/ is a dorsal consonant, pronounced with the back of the tongue. There are some nonstandard dialects (such as Cockney and African American English) that change /θ/ to /f/ and thus pronounce words like mouth as /mauf/, but in Standard English the pattern holds; there are no words with /aup/ or /aum/ or /auk/. (The only exception I know of, howf, is a rare Scottish word that was apparently borrowed from Dutch, and it could be argued that it appears rarely enough in Standard English that it shouldn’t be considered a part of it. It appears not at all in the Corpus of Contemporary American English and only once in the Corpus of Historical American English, but it’s in scare quotes. I only know it as an occasionally handy Scrabble word.)
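
The classroom exercise can be written out as a small check. This is a minimal sketch, not a phonological analysis: the names `CORONALS`, `AFTER_AU`, `unattested_coronals`, and `fits_pattern` are made up for illustration, the coronal set includes the dental fricatives of mouth, and the word loud is added here (it is not among the examples above) to cover /d/.

```python
# Coronal consonants of Standard English, in IPA.
CORONALS = {"t", "d", "s", "z", "θ", "ð", "ʃ", "ʒ", "tʃ", "dʒ", "l", "r", "n"}

# Example words and the consonant that follows /au/ in each.
# All are from the text except "loud", added here to cover /d/.
AFTER_AU = {
    "out": "t",
    "loud": "d",
    "house (noun)": "s",
    "house (verb)": "z",
    "owl": "l",
    "mouth (noun)": "θ",
    "mouth (verb)": "ð",
    "down": "n",
    "couch": "tʃ",
    "hour": "r",
    "gouge": "dʒ",
}


def attested_after_au():
    """Consonants that follow /au/ in at least one example word."""
    return set(AFTER_AU.values())


def unattested_coronals():
    """Coronal consonants with no example word after /au/."""
    return sorted(CORONALS - attested_after_au())


def fits_pattern(consonant):
    """True if the consonant is coronal and so could follow /au/ under this pattern."""
    return consonant in CORONALS


print(attested_after_au() <= CORONALS)  # every attested consonant is coronal
print(unattested_coronals())            # the gaps in the pattern
print(fits_pattern("g"))                # the final consonant of Smaug
```

The last call is the point of the exercise: /g/ is not in the coronal set, so a word ending in /aug/ falls outside the pattern.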

And this isn’t simply a case like orange or silver, where nothing happens to rhyme with them. Through the accidents of history, the /aug/ combination simply does not occur in modern English. Before the Great Vowel Shift, Middle English /au/ turned into /ɔ:/ (as in caught today). (Note: the : symbol here denotes that a vowel is long.) During the Great Vowel Shift, /u:/ turned into a new /au/, but apparently this /u:/ never occurred before non-coronal consonants. This means that in Middle English, either /u/ lengthened before coronals or /u:/ shortened before non-coronals; I’m not sure which. But either way, it left us with the unusual pattern we see in English today.

What all this technical gibberish means is that, in the absence of a clear pronunciation guide, readers will assume that the “au” in Smaug is pronounced as it is in other English words, which today is almost always /ɔ:/ or /ɑ:/. Thus most Americans will rhyme it with smog. (I can’t speak with authority about other varieties of English, but they would probably opt for one of those vowels or something similar, but not the diphthong /au/.) It’s not surprising that many readers will feel annoyed when told that their pronunciation clashes with the official pronunciation, which they find unintuitive and, frankly, rather non-English.

One final note: Michael Martinez suggests in this post that /smaug/ is not actually Tolkien’s intended pronunciation. After all, he says, the appendixes are a guide to the pronunciation of Elvish, and Smaug’s name is not Elvish. Martinez quotes one of Tolkien’s letters regarding the origin of the name: “The dragon bears as name—a pseudonym—the past tense of the primitive Germanic verb Smugan, to squeeze through a hole: a low philological jest.” He seems to take this as evidence against the pronunciation /smaug/, but this is probably because Tolkien was not as clear as he could have been. Smugan is the infinitive form; the past tense is—surprise—smaug.

Note: the definition given for the Proto-Germanic form doesn’t quite match Tolkien’s, though it appears to be the same verb; the Old English form, also with the infinitive smugan, is defined as “to creep, crawl, move gradually”. The astute student of language will notice that the past tense of the verb in Old English had the form smēag in the first and third person. This is because the Proto-Germanic /au/ became /ēa/ in Old English and /i:/ or /ai/ in modern English; compare the German auge ‘eye’ and the English eye. This demonstrates once again that English lost the combination /aug/ quite some time ago while its sister languages hung on to it.

So yes, it appears that Tolkien really did intend Smaug to be pronounced /smaug/, with that very un-English (but very Germanic) /aug/ combination at the end. He was a linguist and studied several languages in depth, particularly old Germanic languages such as Old English, Old Norse, and Gothic. He was certainly well aware of the pronunciation of the word, even if he didn’t make it clear to his readers. You can find the pronunciation silly if you want, you can hate it, and you can even threaten to boycott the movie, but you can’t call it wrong.


Hanged and Hung

The distinction between hanged and hung is one of the odder ones in the language. I remember learning in high school that people are hanged, pictures are hung. There was never any explanation of why it was so; it simply was. It was years before I learned the strange and complicated history of these two words.

English has a few pairs of related verbs that are differentiated by their transitivity: lay/lie, rise/raise, and sit/set. Transitive verbs take objects; intransitive ones don’t. In each of these pairs, the intransitive verb is strong, and the transitive verb is weak. Strong verbs inflect for the preterite (simple past) and past participle forms by means of a vowel change, such as sing–sang–sung. Weak verbs add the -(e)d suffix (or sometimes just a -t or nothing at all if the word already ends in -t). So lie–lay–lain is a strong verb, and lay–laid–laid is weak. Note that the subject of one of the intransitive verbs becomes the object when you use its transitive counterpart. The book lay on the floor but I laid the book on the floor.

Historically hang belonged with these pairs, and it ended up in its current state through the accidents of sound change and history. It was originally two separate verbs (the Oxford English Dictionary actually says it was three—two Old English verbs and one Old Norse verb—but I don’t want to go down that rabbit hole) that came to be pronounced identically in their present-tense forms. They still retained their own preterite and past participle forms, though, so at one point in Early Modern English hang–hung–hung existed alongside hang–hanged–hanged.

Once the two verbs started to collapse together, the distinction started to become lost too. Just look at how much trouble we have keeping lay and lie separate, and they only overlap in the present lay and the past tense lay. With identical present tenses, hang/hang began to look like any other word with a choice between strong and weak past forms, like dived/dove or sneaked/snuck. The transitive/intransitive distinction between the two effectively disappeared, and hung won out as the preterite and past participle form.

The weak transitive hanged didn’t completely vanish, though; it stuck around in legal writing, which tends to use a lot of archaisms. Because it was only used in legal writing in the sense of hanging someone to death (with the poor soul as the object of the verb), it picked up the new sense that we’re now familiar with, whether or not the verb is transitive. Similarly, hung is used for everything but people, whether or not the verb is intransitive.

Interestingly, German has mostly hung on to the distinction. Though the German verbs both merged in the present tense into hängen, the past forms are still separate: hängen–hing–gehangen for intransitive forms and hängen–hängte–gehängt for transitive. Germans would say the equivalent of I hanged the picture on the wall and The picture hung on the wall—none of this nonsense about only using hanged when it’s a person hanging by the neck until dead.

The surprising thing about the distinction in English is that it’s observed (at least in edited writing) so faithfully. Usually people aren’t so good at honoring fussy semantic distinctions, but here I think the collocates do a lot of the work of selecting one word or the other. Searching for collocates of both hanged and hung in COCA, we find the following words:



The hanged words are pretty clearly all about hanging people, whether by suicide, as punishment for murder, or in effigy. (The collocations with burned were all about hanging and burning people or effigies.) The collocates for hung show no real pattern; it’s simply used for everything else. (The collocations with neck were not about hanging by the neck but about things being hung from or around the neck.)

So despite what I said about this being one of the odder distinctions in the language, it seems to work. (Though I’d like to know to what extent, if any, the distinction is an artifact of the copy editing process.) Hung is the general-use word; hanged is used when a few very specific and closely related contexts call for it.
