Arrant Pedantry


Black Friday Sale at the Arrant Pedantry Store

It’s Black Friday (ugh), but from now through Sunday, everything at the Arrant Pedantry Store is 15 percent off (yay!). Now’s a great chance to get a word-nerdy shirt for that special someone in your life (or for yourself). Just use the code CYBER18 at checkout. Or if you wait until Monday, you can get 15 percent off and free shipping, which I think is the best sale that Spreadshirt has ever offered. Use the code CYBERSALE on Monday to get that deal.

And don’t forget that you can customize products. Just hit the Customize button, find the design you want, and put it on whatever product you want. You can put Battlestar Grammatica on an iPhone case or I Could Care Fewer on a tote bag.

Check it out!



100,000 Words Whose Pronunciations Have Changed

We all know that language changes over time, and one of the major components of language change is sound change. Many of the words we use today are pronounced differently than they were in Shakespeare’s or Chaucer’s time. You may have seen articles like this one that list 10 or 15 words whose pronunciations have changed over time. But I can do one better. Here are 100,000 words that illustrate how words change.

  1. a: Before the Great Vowel Shift, the name of the first letter of the alphabet was pronounced /aː/, much like when the doctor asks you to open your mouth and say “ah” to look down your throat. In Old English, it was /ɑː/, which is pronounced slightly further back in the mouth. The name of the letter was borrowed from Latin, which introduced its alphabet to much of Europe. The Romans got their alphabet from the Greeks, probably by way of the Etruscans. But unlike the Greeks, the Romans simply called the letters by the sounds they made. The corresponding Greek letter, alpha, got its name from the Phoenician aleph, meaning ‘ox’, because the letter aleph represented the first sound in the word aleph. In Phoenician this was a glottal stop (which is not written in the Latin alphabet). The Greeks didn’t use this sound, so they borrowed it for the /a/ sound instead.
  2. a: This casual pronunciation of the preposition of goes back at least to the 1200s. It doesn’t appear in writing much, except in dialogue, where it’s usually attached to another word, as in kinda. But of itself comes from an unstressed form of the Old English preposition æf. Æf didn’t survive past Old English, but in time a new stressed form of of arose, giving us the preposition off. Of and off were more or less interchangeable until the 1600s, at which point they finally started to diverge into two distinct words. Æf is cognate with the German ab, and these ultimately come from the Proto-Indo-European *h₂epó ‘off, away, from’, which is also the source of the Greek apo (as in apostasy) and the Latin ab (as in abuse). So the initial laryngeal sound in *h₂epó disappeared after changing the following vowel to /a/, the final /o/ disappeared, the /p/ fricatized to /f/, the vowel moved back and reduced, the /f/ became voiced to /v/, and then the /v/ fell away, leaving only a schwa, the barest little wisp of a word.
  3. a: The indefinite article a comes from an unstressed version of the numeral one, which in Old English was ān, though it also inflected for gender, number, and case, meaning that it could look like āne, ānum, ānes, ānre, or ānra. By Middle English those inflections were gone, leaving only an. The /n/ started to disappear before consonants starting in the 1100s, giving us the a/an distinction we have today. But the Old English ān came from an earlier Proto-Germanic *ainaz. The az ending had disappeared by Old English, and the diphthong /ai/ smoothed and became /ɑ:/. In its use as an article, its vowel shortened and eventually reduced to a schwa. But in its use as a numeral, it retained a long vowel, which eventually rose to /o:/ and then broke into the diphthong /wʊ/ and then lowered to /wʌ/, giving us the modern word one. The Proto-Germanic *ainaz goes further back to the Proto-Indo-European *óynos, so between PIE and Proto-Germanic the vowels lowered and the final /s/ became voiced.
  4. aback: This adverb comes from the prefix a- and the noun back. The prefix a- comes from an unstressed form of the preposition on, which lost its final /n/ and reduced to a schwa. This prefix also appears in words like among, atop, awake, and asleep. On comes from the Proto-Germanic *ana, which in turn comes from the Proto-Indo-European *h₂en-, which is also the source of the Greek ana-, as in analog and analyze. As with *h₂epó, the initial laryngeal sound changed the vowel to /a/ and then disappeared. Back, on the other hand, has changed remarkably little in the last thousand years. It was spelled bæc in Old English and was pronounced just like the modern word. It comes from a Proto-Germanic word *baka, though its ultimate origin is unknown.

Hopefully by now you see where I’m going with this. It’s interesting to talk about how words have changed over the years, but listicles like “10 Words Whose Pronunciations Have Changed” can be misleading, because they imply that changes in pronunciation are both random and rare. Well, sound changes are random in a way, in that it’s hard to predict what will change in the future, but they’re not random in the sense that they affect random words. Sound changes are just that—changes to a sound in the language, like /r/ disappearing after vowels or /t/ turning into a flap in certain cases in the middle of words. Words can randomly change too, but that’s the exception rather than the rule.

And sound changes aren’t something that just happen from time to time, like the Great Vowel Shift. They’re happening continuously, and they have been happening since the beginning of language. If you like really deep dives (or if you need something to combat your insomnia), this Wikipedia article details the sound changes that have happened between late Proto-Germanic, spoken roughly 2,000 years ago, and the present day, when changes like th-fronting in England (saying fink for think) and the Northern Cities Shift in the US are still occurring.

So while it’s okay to talk about individual words whose pronunciations have changed, I think we shouldn’t miss the bigger picture: it’s language change all the way down.


I Request You to Read This Post

Several weeks ago, I tweeted about a weird construction that I see frequently at work thanks to our project management system. Whenever someone assigns me to a project, I get an email like the one below:

Hi Jonathon, [Name Redacted] just requested you to work on Editing. It's all yours.

I said that the construction sounded ungrammatical to me—you can ask someone to do something or request that they do it, but not request them to do it. Several people agreed with me, while others said that it makes sense to them if you stress you—they requested me to work on it, not someone else. Honestly, I’m not sure that stress changes anything, since the question is about what kind of complementation the verb request allows. Changing the stress doesn’t change the syntax.

However, Jesse Sheidlower, a former editor for The Oxford English Dictionary, quickly pointed out that the first sense of request in the OED is “to ask (a person), esp. in a polite or formal manner, to do something.” There are citations from around 1485 down to the present illustrating the construction request [someone] to [verb]. (Sense 3 is the request that [someone] [verb] construction, which has been around from 1554 to the present.) Jordan Smith, a linguistics PhD student at Iowa State, also pointed out that The Longman Grammar says that request is attested in the pattern [verb + NP + to-clause], just like ask. He agreed that it sounds odd, though.

So obviously the construction has been around for a while, and it’s apparently still around, but that didn’t explain why it sounds weird to me. I decided to do a little digging in the BYU corpora, and what I found was a little surprising.

The Corpus of Historical American English (COHA) shows a slow decline in the request [someone] to [verb] construction, from 13.71 hits per million words in the 1820s to just 0.2 per million words in the first decade of the 2000s.
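(A quick aside on the unit: “hits per million words” is just the raw number of hits scaled to the size of each decade’s subcorpus, which is what makes decades of very different sizes comparable. Here’s a minimal Python sketch of that normalization; the raw counts below are invented for illustration, since COHA reports the normalized figure for you.)

    # Hypothetical raw counts -- purely illustrative, not real COHA numbers.
    decades = {
        "1820s": {"hits": 95, "corpus_words": 6_900_000},
        "2000s": {"hits": 6, "corpus_words": 29_500_000},
    }

    for decade, counts in decades.items():
        per_million = counts["hits"] / counts["corpus_words"] * 1_000_000
        print(f"{decade}: {per_million:.2f} hits per million words")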

And it isn’t just that we’re using the verb request a lot less now than we were two hundred years ago. Though the verb’s overall use has seen a moderate decline, that decline doesn’t match the curve for that particular construction.

Even if the construction hasn’t vanished entirely, it’s pretty close to nonexistent in modern published writing—at least in some parts of the world. The Corpus of Global Web-Based English (GloWbE) shows that while it’s mostly gone in nations where English is the most widely spoken first language (the US, Canada, the UK, Ireland, Australia, and New Zealand), it’s alive and well in South Asia (the taller bars in the middle are India, Sri Lanka, Pakistan, and Bangladesh). (Interestingly, the only OED citation for this construction in the last fifty years comes from a book called World Food: India.) To a lesser extent, it also survives in some parts of Africa and Southeast Asia (the two smallish bars at the right are Kenya and Tanzania).

It’s not clear why my work’s project management system uses a construction that is all but extinct in most varieties of English but is still alive and well in South Asia. The company is based in Utah, but it’s possible that they employ people from South Asia or that whoever wrote that text just happens to be among the few speakers of American English who still use it.

Whatever the reason, it’s an interesting example of language change in action. Peter Sokolowski, an editor for Merriam-Webster, likes to say, “Most English speakers accept the fact that the language changes over time, but don’t accept the changes made in their own time.” With apologies to Peter, I don’t think this is quite right. The changes we don’t accept are generally the ones made in our own time, but most changes happen without us really noticing. Constructions like request [someone] to [verb] fade out of use, and no one bemoans their loss. Other changes, like the shift from infinitives to gerunds and the others listed in this article by Arika Okrent, creep in without anyone getting worked up about them. It’s only the tip of the iceberg that we occasionally gripe about, while the vast bulk of language change slips by unnoticed.

This is important because we often conflate change and error—that is, we think that language changes begin as errors that gradually become accepted. For example, Bryan Garner’s entire Language Change Index is predicated on the notion that change is synonymous with error. But many things that are often considered wrong—towards, less with count nouns, which used as a restrictive relative pronoun—are quite old, while the rules forbidding their use are in fact the innovations. It’s perverse to call these changes that are creeping in when they’re really old features that are being pushed out. Indeed, the whole purpose of the index isn’t to tell you where a particular use falls on a scale of change, but to tell you how accepted that use is—that is, how much of an error it is.

So the next time you assume that a certain form must be a recent change because it’s disfavored, I request you to reexamine your assumptions. Language change is much more subtle and much more complex than you may think.


Is Change Okay or Not?

A few weeks ago I got into a bit of an argument with my coworkers in staff meeting. One of them had asked our editorial interns to give a brief presentation on the that/which rule, and they did. But one of the interns seemed a little unclear on the rule—she said she had learned the rule in her class on modern American usage, but she had also learned that either that or which is technically fine with restrictive clauses. So of course I asked if I could chime in.

I pointed out that the rule—which states that you should always use that for restrictive clauses (except where that is grammatically impermissible, as when the relative pronoun follows a preposition or the demonstrative pronoun that)—is a relatively recent invention and that it didn’t really start to take hold in American usage until the mid-twentieth century. Many writers still don’t follow it, which means that editors have a lot of opportunities to apply the rule, and it’s generally not enforced outside the US.

My coworkers didn’t really like the perceived implication that the rule is bogus and that we shouldn’t worry about it, and one of them countered by saying that it didn’t matter what people did in 1810—the history is interesting, but we should be concerned about what usage is now. After all, the data clearly shows that the that/which rule is being followed in recent publications. And then she deployed an argument I’ve been seeing more and more lately: we all know that language changes, so why can’t we accept this change? (I’ve also heard variations like “Language changes, so why can’t we make it change this way?”)

These are good questions, and I don’t believe that linguists have good answers to them. (Indeed, I’m not sure that good answers—or at least logically sound answers—are even possible.) In her book Verbal Hygiene, the linguist Deborah Cameron argues that it’s silly for linguists to embrace change from below but to resist change from above. What makes a “natural” change better than an unnatural one? We talk about how language changes, but it’s really people who change language, not language that changes by itself, so is there even a meaningful difference between natural and unnatural change?

Besides, many linguists have embraced certain unnatural changes, such as the movements for gender-neutral and plain language. Why is it okay for us to denounce prescriptivism on the one hand and then turn around and prescribe gender-neutral language on the other?

I haven’t come to a firm conclusion on this myself, but I think it all comes down to whether the alleged problem is in fact a problem and whether the proposed solution is in fact a solution. Does it solve the problem, does it do nothing, or does it simply create a new or different problem?

With gender-specific language, it’s clear that there’s a problem. Even though he is purportedly gender-neutral when its antecedent is indefinite or of unspecified gender, studies have shown that readers are more likely to assume that its antecedent is male. Clearly it’s not really gender-neutral if most people think “male” when they read “he”. Singular they has centuries of use behind it, including use by many great authors, and most people use it naturally and unselfconsciously. It’s not entirely uncontroversial, of course, but acceptance is growing, even among copy editors.

There are some minor thorny issues, like trying to figure out what the gender-neutral forms of freshman or fisherman should be, but writing around these seems like a small price to pay for text that treats people equally.

So what about the that/which rule? What problem does it claim to solve, and does it actually solve it?

The claim is that the rule helps distinguish between restrictive and nonrestrictive relative clauses, which in the abstract sounds like a good thing. But the argument quickly falls apart when you look at how other relative clauses work in English. We don’t need any extra help distinguishing between restrictive and nonrestrictive clauses with who, where, or when—the comma (or, in speech, the intonation) tells you whether a clause is restrictive. The fact that nobody has even recognized ambiguity with restrictive who or where or when as a problem, let alone proposed and implemented a solution, argues against the idea that there’s something wrong with restrictive which. Furthermore, no language I’ve heard of distinguishes between restrictive and nonrestrictive clauses with different pronouns. If it were really an advantage, then we’d expect to see languages all over the world with a grammaticalized distinction between restrictive and nonrestrictive clauses.

I’ve sometimes seen the counterargument that writers don’t always know how to use commas properly, so we can’t trust them to mark whether a clause is restrictive or not; but again, nobody worries about this with other relative clauses. And anyway, if copy editors can always identify when a clause is restrictive and thus know when to change which to that, then it stands to reason that they can also identify when a clause is nonrestrictive and thus insert the commas if needed. (Though it’s not clear if even the commas are really necessary; in German, even restrictive clauses are set off with commas in writing, so you have to rely on context and common sense to tell you which kind of clause it is.)

It seems, then, that restrictive which is not a real problem at all and that insisting on that for all restrictive clauses doesn’t really accomplish anything. Even though Deborah Cameron criticizes linguists for accepting natural changes and rejecting unnatural ones, she also recognizes that many of the rules that copy editors impose, including the that/which rule, go far beyond what’s necessary for effective communication. She even quotes one scholar as saying that the that/which rule’s “sole virtue . . . is to give copy editors more billable hours.”

Some would argue that changing which to that doesn’t take much time, so there’s really no cost, but I don’t believe that’s true. My own research shows that it’s one of the most common usage or grammar changes that editors make. All those changes add up. I also know from experience that a lot of editors gripe about people not following the rule. That griping has a real effect on people, making them nervous about their abilities with their own native language. Even if you think the that/which rule is useful enough to justify the time it takes to impose it, is it worth making so many people feel self-conscious about their language?

Even if you believe that the that/which rule is an improvement, the fact is that English existed for nearly 1500 years without it, and even now it’s probably safe to say that the vast majority of English speakers have never heard of it. Although corpus data makes it appear as though it’s taken hold in American English, all we can really say from this data is that it has taken hold in edited, published American English, which really means that it’s taken hold among American copy editors. I’m sure some writers have picked the rule up from their English classes or from Word’s grammar checker, but I think it’s safe to say that American English as a whole has not changed—only the most visible portion, published writing, has.

So it’s rather disingenuous to say that the language has changed and thus we should accept the that/which rule as a valid part of Standard English. The argument is entirely circular: editors should enforce the rule because editors have been enforcing the rule now for a few decades. The fact that they have been enforcing the rule rather successfully doesn’t tell us whether they should be enforcing the rule.

Of course, that’s the fundamental problem with all prescriptions—sooner or later, you run into the is–ought problem. That is, it’s logically impossible to derive a prescriptive statement (one that tells you what you ought to do) from a descriptive one (one that states what is). Any statement like “This feature has been in use for centuries, so it’s correct” or “Shakespeare and Jane Austen used this feature, so it’s correct” or even “This feature is used by a majority of speakers today, so it’s correct” is technically a logical fallacy.

While acknowledging that nothing can definitively tell us what usage rules we should or shouldn’t follow, I still think we can come to a general understanding of which rules are worth following and which ones aren’t by looking at several different criteria:

  1. Historical use
  2. Modern use
  3. Oral use
  4. Edited written use
  5. Unedited written use
  6. Use by literary greats
  7. Common use

No single criterion is either necessary or sufficient to prove that a rule should be followed, but by looking at the totality of the usage evidence, we can get a good sense of where the rule came from, who uses it and in which contexts they use it, whether use is increasing or decreasing, and so on. So something might not be correct just because Chaucer or Shakespeare or Austen used it, but if something has been in continuous use for centuries by both literary greats and common people in both speech and writing, then it’s hard to maintain that it’s an error.

And if a rule is only followed in modern edited use, as the that/which rule is (and even then, it’s primarily modern edited American use), then it’s likewise hard to insist that this is a valid rule that all English speakers should be following. Again, the fact that editors have been enforcing a rule doesn’t tell us whether they should. Editors are good at learning and following rules, and we’re often good at pointing out holes or inconsistencies in a text or making it clearer and more readable, but this doesn’t mean that we have any special insight into what the grammar of English relative clauses should be, let alone the authority to insist that everyone follow our proposed changes.

So we can’t—or, at least, I think we shouldn’t—simply say that language has changed in this instance and that therefore we should all follow the rule. Language change is not necessarily good or bad, but it’s important to look at who is changing the language and why. If most people are changing the language in a particular way because they find that change genuinely useful, then it seems like a good thing, or at least a harmless thing. But if the change is being imposed by a small group of disproportionately powerful people for dubious reasons, and if the fact that this group has been successful is then used as evidence that the change is justified, then I think we should be skeptical.

If you want the language to change in a particular way, then the burden of proof is on you to demonstrate why you’re right and four hundred million native speakers are wrong. Until then, I’ll continue to tell our intern that what she learned in class was right: either that or which is fine.


Skunked Terms and Scorched Earth

A recent Twitter exchange about the term beg the question got me thinking again about the notion of skunked terms. David Ehrlich said that at some point the new sense of beg the question was going to become the correct one, and I said that that point had already come and gone.

If you’re not familiar with the issue, it’s that begging the question is traditionally a type of circular reasoning. Increasingly, though, it’s being used in the newer sense of ‘raising the question’ or ‘demanding that we ask the question’. A couple of years ago, Stan Carey found that the newer sense makes up about 90 percent of the hits in the GloWbE corpus (and the percentage is even higher if you exclude mentions and only count uses).

On Language Log, Neal Goldfarb wrote that the term should be avoided, either because it’s likely to be misunderstood or because it will incur the wrath of sticklers. On Twitter, many others agreed that the term was skunked, to borrow a term from Bryan Garner.

In his Modern American Usage, Garner writes, “When a word undergoes a marked change from one use to another . . . it’s likely to be the subject of dispute. . . . A word is most hotly disputed in the middle part of this process: any use of it is likely to distract some readers. . . . The word has become ‘skunked.’”

Many people find this a useful idea, but it has always rubbed me the wrong way. On the one hand, it seems helpful to identify usage problems that may attract ire or create confusion. But on the other hand, it’s often used as sort of a trump card in usage debates. It doesn’t matter which use is right or wrong—the word or phrase is now tarnished and can never be used again (at least until the sticklers all die off and everyone forgets what the fuss was about).

And in many cases it feels like a sort of scorched-earth policy: if we can’t use this term the way we think is best, then nobody should use it. Better to ruin the term for everyone than to let it fall into the hands of the enemy. After all, who’s doing the skunking? The people who use a term in its new sense and are usually unaware of the debate, or the people who use it in the old sense and are raising a stink about the change?

In some cases, though, it’s not clear what declaring a word skunked accomplishes. For instance, Garner says that data is skunked because some people object to its use with a plural verb, while others object to its use with a singular. Either way, you might annoy someone. But scientists can’t just stop writing about data—they’re going to have to pick a side.

And sometimes, as with beg the question, it almost seems silly to keep calling a new use skunked. If upwards of 90 percent of the uses of a term are in the new sense (and I suspect it’s even higher in speech), then the battle is all but over. We can’t realistically say that you should avoid using beg the question because it’s ambiguous, because it’s always clear in context. And the new sense certainly isn’t unclear or unfamiliar—how could it be if it’s the one that most people are using? The old sense may be unclear to the uninitiated, but that’s always been the case, because it’s a rather technical term. The new use doesn’t change that.

So what it really comes down to is the fact that a very small but very vocal minority don’t like the new use and would rather say that it’s been ruined for everyone than to admit defeat. The question is, should that be enough reason to declare the term off-limits to everybody? Many editors and usage commentators argue that there’s no harm in avoidance, but Geoff Nunberg calls this rationale “the pedant’s veto”: “It doesn’t matter if you consider a word to be correct English. If some sticklers insist that it’s an error, the dictionaries and style manuals are going to counsel you to steer clear of it to avoid bringing down their wrath.” (Arnold Zwicky, somewhat less charitably, calls this rationale “crazies win”.) Nunberg says that this sort of avoidance can be a wise course of action, but other times it seems a bit ridiculous.

Consider, for example, the Economist style guide, which is often mocked for its avoidance of split infinitives. It reads, “Happy the man who has never been told that it is wrong to split an infinitive: the ban is pointless. Unfortunately, to see it broken is so annoying to so many people that you should observe it.” Who are all these people who find split infinitives so annoying? And even if there are still a few people who cling to this non-rule, why should everybody else change just to make them happy? Indeed, it seems that most other usage guides have moved on at this point.

Perhaps the biggest problem with declaring a term skunked is that it’s not clear what the criteria are. How many sticklers does it take to skunk a term? How contentious does the debate need to be? And how do we know when it stops being skunked?

I have to wonder, though, if the entire notion of skunked terms is ultimately self-defeating. The people who are most likely to heed a warning to avoid a contentious usage are also the people who are most likely to adhere to traditional usage in the first place. The people who use beg the question in the new sense, for example, are most likely unaware not only of the traditional meaning but also of the fact that there’s a debate about its meaning. If the traditionalists all start avoiding the term, then all that will remain will be the new use. By declaring a term skunked and saying it should be avoided, it could be that all we really accomplish is to drive the old use out even faster.

Ultimately, the question is, how much do we care about the opinions of that small but vocal minority? Maybe it’s just the contrarian streak in me, but I hate giving such a small group such disproportionate power over the language we all use. I’d rather spend my efforts trying to change opinions on usage than trying to placate the peevers. But I have to admit that there’s no easy answer. If there were, there’d be no reason to call a term skunked in the first place.


The Whole Truth

A correspondent named Jitendra Pant recently asked me to elaborate on the etymology of whole:

Dear Jonathon, I am wondering why whole has a spelling beginning with ‘w’ and not just ‘hole’. Online checking suggests that ‘hole’ and ‘whole’ did have related origins, but departed around the 15th century, when ‘wh’ was introduced. https://www.etymonline.com/word/whole doesn’t say why. The Am Heritage concurs for hal, hole. And a 5-year-old nephew asked, so I am counting on your reply. Thank you!

I certainly don’t want to disappoint Jitendra’s nephew, so here goes.

It’s true that the word whole didn’t originally have the w, but it’s not actually related to hole. According to the Online Etymology Dictionary, whole comes from the Old English hal and is related to the German heil. Related words without the w can still be seen in heal, hale, and health. These words apparently all go back to a Proto-Indo-European root *kailo-, ‘whole, uninjured, of good omen’.

Hole, on the other hand, goes back to a different Proto-Indo-European root, *kel-, meaning ‘to cover, conceal, save’. Eventually this developed into the ‘cave, hollow place’ sense. Hole was generally spelled hol in Old English, so the two words were not originally homophones. It wasn’t until Middle English that they started to converge in spelling and pronunciation.

So where do we get that unetymological w in whole? In the entry for whole, the Online Etymology Dictionary simply says that the wh- spelling arose in the early 15th century. In the entry for wh-, it says that the wh spelling was sometimes added to borrowed words like whiskey and native words formerly spelled with only w- or h- like whole and whore. It even threatened to spread to words like home and hot. It doesn’t explain why this spelling took off, but The Oxford English Dictionary provides a clue.

Under the entry for whole, it says, “Spellings with initial wh- appear in the mid 15th cent. and reflect development of a w-glide chiefly before long open ǭ (see discussion at wh n.), sometimes followed by loss of the initial h-.” That is, people started spelling it with a w- because they had started saying it with a w.

The entry for wh elaborates on this a little, saying that in the early 15th century, wh- started appearing in a lot of words beginning with ho-, including home, hot, and holy, the last of which appears as wholy in William Tyndale’s 1526 translation of the Bible. The pronunciation of these words with the w survives in some dialects, but it apparently fell out of Standard English fairly quickly, leaving only whole and whore with the modified spelling but the original pronunciation with h.

Interestingly, a similar change happened around the same time to words beginning with o. The word one began to appear with a w around 1450 (Tyndale has it as won), as did oat and oak. Only one kept the pronunciation with the w in Standard English (though it didn’t keep the won spelling), though, again, dialectal pronunciations of the other words with w can still be found.

The older pronunciation of one with a long o and no w can still be found in compounds and derived forms like only, alone, and atone, though the modern descendant of the w-less form of one is the enclitic ’un (as in young’uns).

It’s not clear to me if these two changes—the addition of a w in words beginning with ho and the addition of a w in words beginning with o—are really the same change or are just two related changes that happened around the same time. Either way, it’s interesting to see the way they left their mark on the spelling and pronunciation of a few words, even after they had otherwise vanished from Standard English.


Two Space or Not Two Space

A friend of mine recently posted on Facebook that you could take the second space after a period away from him when you pry it from his cold, dead fingers. I responded with this image of Ben Wyatt from Parks and Recreation.

I don’t even have time to tell you how wrong you are. Actually, it’s gonna bug me if I don’t.

But I said I’d refrain from sharing my thoughts unless he really wanted to hear them. He said he did, so here goes.

Even though the extra space has its defenders, using two spaces between sentences is wrong by today’s standards, but nearly everybody is wrong about why.

The usual argument goes that it’s a holdover from the days of typewriters. Typewriters use monospaced fonts (meaning that each character takes up the same amount of horizontal space, whether it’s an i or a W), which look spacey compared to proportional fonts (where characters have different widths according to the size and shape of the actual character). Since monospaced text looks spacey already, it was decided that an extra space was needed between sentences to make things readable. But since we’re now all writing on computers with proportional fonts, we should all ditch the two-space habit. Case closed!

But not so fast.

You may have been taught in typing class to type two spaces at the end of a sentence, but the practice has nothing to do with typewriters. It’s actually just an attempt to replicate the look of typeset text of the era. There are other blog posts out there that give a much more thorough account of the history of sentence spacing than I’ll give here (and I’ll link to them at the end), but I’ll use some of the same sources.

But before we dive in, some definitions. Spacing in typography is usually based on the em, a relative unit of measurement that’s as wide as a line of type is tall. That is, if type is set at 12 points, then an em is also 12 points. The name derives from the fact that a capital M in many typefaces is about as wide as it is tall. The em dash (—) is so named because it’s 1 em wide. A space the width of an em is called an em space, an em quad, or just an em or a quad.

An en space or en quad is about the width of a capital N, which is half the width of an em space. An en dash, as you might have guessed, is 1 en wide.

A three-em space is not three ems wide but one-third of an em (that is, it’s a three-to-an-em space). Also called a thick space, this is the standard space used between words. There are also smaller spaces like four-em and five-em spaces (known as thin spaces) and hair spaces, but we don’t need to worry about them.
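To make those relative units concrete, here’s a quick sketch of the arithmetic at one example type size (12 points is just an illustration; the relationships between the units are what matter):

    # Relative spacing units at an example type size of 12 points.
    point_size = 12.0      # 1 em is as wide as the type is tall

    em = point_size        # em space (em quad)
    en = em / 2            # en space: half an em
    thick = em / 3         # three-em (three-to-an-em) space: the normal word space
    four_em = em / 4       # four-em space (a thin space)
    five_em = em / 5       # five-em space (another thin space)

    for name, width in [("em", em), ("en", en), ("three-em", thick),
                        ("four-em", four_em), ("five-em", five_em)]:
        print(f"{name:>9} space: {width:.1f} pt")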

Modern typesetting practice is to use a thick space everywhere, but professional practice even just a hundred years ago was surprisingly different. Just take a look at this guide to spacing from the first edition of what would later be known as The Chicago Manual of Style (published in 1906):

Space evenly. A standard line should have a 3-em space between all words not separated by other punctuation points than commas, and after commas; an en-quad after semicolons, and colons followed by a lower-case letter; two 3-em spaces after colons followed by a capital; an em-quad after periods, and exclamation and interrogation points, concluding a sentence.

In other words, the standard spacing was a thick space (one-third of an em) between words (the same as it is today), a little bit more than that (half an em) after semicolons or colons that were followed by a lowercase letter, two thick spaces after a colon followed by a capital, and the equivalent of three thick spaces between sentences. Typesetters weren’t just double-spacing between sentences—they were triple spacing. You can see this extra spacing in the manual itself:

Remember that typewriters were generally monospaced, meaning that the carriage advanced the same amount for every character, including spaces. On a typewriter, there’s no such thing as a thin space, en space, or em space. Consequently, the rules for spacing were simplified a bit: a thick space between words and following semicolons or colons followed by a lowercase letter, and two thick spaces between sentences or after a colon followed by a capital letter.

At this point the two-spacers may be cheering. History is on your side! The extra space is good! But not so fast.

Around the middle of the last century, typesetting practice began to change. That complicated system of spacing takes extra time to implement, and financial and technological pressures eventually pushed typesetters to adopt the current practice of using a single thick space everywhere. But this wasn’t an innovation. English and American typesetters may have used extra space, but French typesetters did not—they used just one space between sentences. Clearly not everyone thought that the extra space was necessary.

And as someone who has done a fair amount of typesetting, I have to say that I’m thankful for the current standard. It’s easy to ensure that there’s only a single space everywhere, but trying to ensure that there’s extra space between sentences—and only between sentences—would be a nightmare even with the help of find-and-replace queries or regular expressions. (I’ve seen some suggestions that typesetting software automatically adds space between sentences, but this isn’t true of any of the typesetting software I’ve ever used, which includes FrameMaker, QuarkXPress, and InDesign. Maybe LaTeX does it, but I’d be curious to see how well it really does.)
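For what it’s worth, here’s a rough Python sketch of why one direction is trivial and the other isn’t. The patterns are mine, not anything built into the typesetting tools mentioned above:

    import re

    draft = "This is a sentence.  Here is another.  Dr.  Smith disagrees."

    # Collapsing runs of spaces down to a single space is easy and safe:
    single_spaced = re.sub(r" {2,}", " ", draft)

    # Going the other way -- adding extra space only between sentences --
    # means guessing where sentences end. This naive pattern wrongly treats
    # the period in "Dr." as the end of a sentence:
    double_spaced = re.sub(r"([.?!])\s+(?=[A-Z])", r"\1  ", single_spaced)

    print(single_spaced)
    print(double_spaced)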

My wife has done a fair amount of editing for doctoral students whose committees seem to think that the APA style requires two spaces between sentences, so she’s spent a lot of time putting all those extra spaces in. (Luckily for her, she charges by the hour.) In its section on spacing following punctuation, the Publication Manual of the American Psychological Association says that “spacing twice after punctuation marks at the end of a sentence aids readers of draft manuscripts.” (APA doesn’t require the extra space in published work, though, meaning that authors are asked to put the spaces in and then editors or typesetters take them right back out.)

Unfortunately, there’s no evidence to back up the claim that the extra space aids readability by providing a little more visual separation between sentences; what few studies have been done have been inconclusive. Inserting extra spacing means extra time for the editor, typesetter, and proofreader, and it’s extra time that doesn’t appear to add any value. (Conversely, there’s also no evidence that the extra space hurts.) I suspect that the readability argument is just a post hoc rationalization for a habit that some find hard to break.

After all, most people alive today grew up in the era of single spacing in professionally set text, so it’s what most people are familiar with. You never see the extra space unless you’re looking at an older text, a typewritten text, or a text that hasn’t been professionally edited and typeset. But most people who use the extra space do so not because of allegedly improved readability but because it’s simply what they were taught or because they say it’s impossible to break the habit of hitting the spacebar twice after a sentence.

And I’m skeptical when people claim that double-spacing is hardwired into their brains. Maybe I just have an easier time breaking bad habits than some people, but when I was taught to type in eighth grade (on typewriters, even—my school didn’t have enough money to stock the typing lab with computers), I was taught the two-space rule. And almost as soon as I was out of that class, I stopped. It took maybe two weeks to break the habit. But I already knew that it was an outdated practice, so I was motivated to abandon it as soon as my grade no longer depended on it.

If you’ve been typing this way for decades, though, or if you were never informed that the practice was outdated, you may be less motivated to try to change. Even if you write for publication, you can rely on your editor or typesetter to remove those extra spaces for you with a quick find-and-replace. You may not even be aware that they’re doing it.

Of course, even some people who should know better seem to be unaware that using two spaces is no longer the standard. When my oldest son was taught to type in school a couple of years ago, his teacher—who is probably younger than me—taught the class to use two spaces after a sentence. Even though typesetters switched to using a single space over fifty years ago, and typewriters have gone the way of the rotary phone, the two-space practice just won’t die.

So the real question is, what should you do? If you’re still using two spaces, either out of habit or because you like how it looks, should you make the switch? Or, put another way, is it really wrong to keep using two spaces half a century after the publishing world has moved on?

Part of me really wants to say that yes, it really is that wrong, and you need to get over yourself and just break the stupid habit already. But the truth is that unless you’re writing for publication, it doesn’t actually matter all that much. If your work is going to be edited and typeset, then you should know that the extra space is going to be taken out anyway, so you might as well save a step by not putting it in in the first place.

But if you’re just writing a text or posting on Facebook or something like that, it’s not that big a deal. At worst, you’re showing your age and maybe showing your inability or unwillingness to break a habit that annoys some people. But the fact that it annoys some people is on us, not you. After all, it’s not like you’re committing a serious crime, like not using a serial comma.

Sources and Further Reading

This post on Creative Pro is fairly exhaustive and rather sensible, but it still concludes that using two spaces is the right thing to do when using monospaced fonts. If the rationale behind using two spaces on a typewriter was to mimic the typeset text of the era, though, then there’s no reason to continue doing it now that typeset text uses a single space.

On a blog called the World’s Greatest Book, Dave Bricker also has a very well-researched and even-handed post on the history of sentence spacing. He concludes, “Though writers are encouraged to unlearn the double-space typing habit, they may be heartened to learn that intellectual arguments against the old style are mostly contrived. At worst, the wide space after a period is a victim of fashion.”


Book Review: Word by Word

Word by Word: The Secret Life of Dictionaries, by Kory Stamper

Disclosure: I received a free advance review copy of this book from the publisher, Pantheon Books. I also consider Kory Stamper a friend.

A lot of work goes into making a book, from the initial writing and development to editing, copyediting, design and layout, proofreading, and printing. Orders of magnitude more work go into making a dictionary, yet few of us give much thought to how dictionaries actually come into being. Most people probably don’t think about the fact that there are multiple dictionaries. We always refer to it as the dictionary, as if it were a monolithic entity.

In Word by Word, Merriam-Webster editor Kory Stamper shows us the inner workings of dictionary making, from gathering citations to defining to writing pronunciations to researching etymologies. In doing so, she also takes us through the history of lexicography and the history of the English language itself.

If you’ve read other popular books on lexicography, like The Lexicographer’s Dilemma by Jack Lynch, you’re probably already familiar with some of the broad outlines of Word by Word—where dictionaries come from, how words get in them, and so on. But Stamper presents even familiar ideas in a fresh way and with wit and charm. If you’re familiar with her blog, Harmless Drudgery, you know she’s a gifted writer. (And if you’re not familiar with it, you should remedy that as soon as possible.)

In discussing the influence of French and Latin on English, for example, she writes, “Blending grammatical systems from two languages on different branches of the Indo-European language tree is a bit like mixing orange juice and milk: you can do it, but it’s going to be nasty.” And in describing the ability of lexicographers to focus on the same dry task day in and day out, she says that “project timelines in lexicography are traditionally so long that they could reasonably be measured in geologic epochs.”

Stamper also deftly teaches us about lexicography by taking us through her own experience of learning the craft, from the job interview in which she gushed about medieval Icelandic family sagas to the day-to-day grind of sifting through citations to the much more politically fraught side of dictionary writing, like changing the definitions for marriage or nude (one of the senses was defined as the color of white skin).

But the real joy of Stamper’s book isn’t the romp through the history of lexicography or the English language or even the self-deprecating jokes about lexicographers’ antisocial ways. It’s the way in which Stamper makes stories about words into stories about us.

In one chapter, she looks into the mind of peevers by examining the impulse to fix English and explaining why so many of the rules we cherish are wrong:

The fact is that many of the things that are presented to us as rules are really just the of-the-moment preferences of people who have had the opportunity to get their opinions published and whose opinions end up being reinforced and repeated down the ages as Truth.

Real language is messy, and it doesn’t fit neatly into the categories of right and wrong that we’re taught. Learning this “is a betrayal”, she says, but it’s one that lexicographers have to get over if they’re going to write good dictionaries.

In the chapter “Irregardless”, she explores some of the social factors that shape our speech—race and ethnicity, geography, social class—to explain how she became one of the world’s foremost irregardless apologists when she started answering emails from correspondents who want the word removed from the dictionary. Though she initially shared her correspondents’ hatred of the word, an objective look at its use helped her appreciate it in all its nuanced, nonstandard glory. But—just like anyone else—she still has her own hangups and peeves, like when her teenage daughter started saying “I’m done my homework.”

In another chapter, she relates how she discovered that the word bitch had no stylistic label warning dictionary users that the word is vulgar or offensive, and she dives not only into the word’s history but also into modern efforts to reclaim the slur and the effects the word can have on those who hear it—anger, shame, embarrassment—even when it’s not directed at them.

And in my favorite chapter, she takes a look at the arcane art of etymology. “If logophiles want to be lexicographers when they grow up,” she writes, “then lexicographers want to be etymologists.” (I’ve always wanted to be an etymologist, but I don’t know nearly enough dead languages. Plus, there are basically zero job openings for etymologists.) Stamper relates the time when she brought some Finnish candy into the office, and Merriam-Webster’s etymologist asked her—in Finnish—if she spoke Finnish. She said—also in Finnish—that she spoke a little and asked if he did too. He replied—again, in Finnish—that he didn’t speak Finnish. This is the sort of logophilia that I can only dream of.

Stamper explodes some common etymological myths—no, posh and golf and the f word don’t originate from acronyms—before turning a critical eye on Noah Webster himself. The man may have been the founder of American lexicography, but his etymologies were crap. Webster was motivated by the belief that all languages descend from Hebrew, and so he tried to connect every word to a Hebrew root. But tracing a word’s history requires poring over old documents (often in one of those aforementioned dead languages) and painstakingly following it through the twists and turns of sound changes and semantic shifts.

Stamper ends the book with some thoughts on the present state and future of lexicography. The internet has enabled dictionaries to expand far beyond the limitations of print books—you no longer have to worry about things like line breaks or page counts—but it also pushes lexicographers to work faster even as it completely upends the business side of things.

It’s not clear what the future holds for lexicography, but I’m glad that Kory Stamper has given us a peek behind the curtain. Word by Word is a heartfelt, funny, and ultimately human look at where words come from, how they’re defined, and what they say about us.

Word by Word: The Secret Life of Dictionaries is available now at Amazon and other booksellers.


Politeness and Pragmatics

On a forum I frequent, a few posters started talking about indirectness and how it can be annoying when a superior—whether a boss or a parent—asks you to do something in an indirect way. My response was popular enough that I thought I might repost it here. What follows is one of the original posts plus my edited and expanded response.

My kids used to get really pissed off when I asked them “Would you please unload the dishwasher”. They said it implied that they had a choice, when they really didn’t.

It’s time for some speech act theory.

The study of the meanings of words and utterances is called semantics, but the study of speech acts—how we intend those utterances to be received and how they’re received in context—is called pragmatics. And a look at pragmatics can reveal why parents say things like “Would you please unload the dishwasher?” when they really mean “Unload the dishwasher.”

Any speech act has three components: the locution (the meaning of the words themselves), the illocution (the intent of the speaker or writer), and the perlocution (the message that is received, or the effect of the speech act). Quite often, all three of these coincide. If I ask “What time is it?”, you can be pretty sure that my intent is to find out the time, so the message you receive is “Jonathon wants me to tell him the time.” We call this a direct speech act.

But sometimes the locution, illocution, and perlocution don’t exactly correspond. If I ask “Do you know what time it is?”, I’m not literally asking if you have knowledge of the current time and nothing more, so the appropriate response is not just “Yes” or “No” but “It’s 11:13” or whatever the time is. I’m still asking you to tell me the time, but I didn’t say it directly. We call this an indirect speech act.

And speech can often be much more indirect than this. If we’re on a road trip and I ask my wife, “Are you hungry?”, what I really mean is that I’m hungry and want to stop for food, and I’m checking to see if she wants to stop too. Or maybe we’re sitting at home and I ask, “Is it just me, or is it hot in here?” And what I really mean is “I’m hot—do you mind if I turn the AC up?”
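If it helps to see those three components side by side, here’s a toy sketch using the examples above (the representation is mine, not anything standard in speech act theory):

    from dataclasses import dataclass

    @dataclass
    class SpeechAct:
        locution: str     # what the words themselves mean
        illocution: str   # what the speaker intends
        perlocution: str  # the message the hearer receives

    direct = SpeechAct(
        locution="What time is it?",
        illocution="Tell me the time.",
        perlocution="Jonathon wants me to tell him the time.",
    )

    # Indirect: the locution is literally a yes/no question, but the
    # illocution and perlocution are the same as in the direct version.
    indirect = SpeechAct(
        locution="Do you know what time it is?",
        illocution="Tell me the time.",
        perlocution="Jonathon wants me to tell him the time.",
    )

    print(direct)
    print(indirect)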

Indirect speech acts are often used to be polite or to save face. In the case of asking a child or subordinate to do something when they really don’t have a choice, it’s a way of downplaying the power imbalance in the relationship. By pretending to give someone a choice, we acknowledge that we’re imposing our will on them, which can make them feel better about having to do it. So while it’s easy to get annoyed at someone for implying that you have a choice when you really don’t, this reaction deliberately misses the point of indirectness, which is to lubricate social interaction.

Of course, different speech communities and even different individuals within a community can have varying notions of how indirect one should be, which can actually cause additional friction. Some cultures rely much more on indirectness, and so it causes problems when people are too direct. On the flip side, others may be frustrated with what they perceive as passive-aggressiveness, while the offender is probably just trying to be polite or save face.

In other words, indirectness is generally a feature, not a bug, though it only works if both sides are playing the same game. Instead of getting annoyed at the mismatch between the locution and the illocution, ask yourself what the speaker is probably trying to accomplish. Indirectness isn’t a means of obscuring the message—it’s an important part of the message itself.


Cognates, False and Otherwise

A few months ago, I was editing some online German courses, and I came across one of my biggest peeves in discussions of language: false cognates that aren’t.

If you’ve ever studied a foreign language, you’ve probably learned about false cognates at some point. According to most language teachers and even many language textbooks, false cognates are words that look like they should mean the same thing as their supposed English counterparts but don’t. But cognates don’t necessarily look the same or mean the same thing, and words that look the same and mean the same thing aren’t necessarily cognates.

In linguistics, cognate is a technical term meaning that words are etymologically related—that is, they have a common origin. The English one, two, three, German eins, zwei, drei, French un, deux, trois, and Welsh un, dau, tri are all cognate—they and the words for one, two, three in many other languages all trace back to the Proto-Indo-European (PIE) *oino, *dwo, *trei.

These sets are all pretty obvious, but not all cognates are. Take, for example, the English four, five, German vier, fünf, French quatre, cinq, and Welsh pedwar, pump. The English and German are still obviously related, but the others less so. Fünf and pump are actually pretty close, but it seems a pretty long way from four and vier to pedwar, and an even longer way from them to quatre and cinq.

And yet these words all go back to the PIE *kwetwer and *penkwe. Though the modern-day forms aren’t as obviously related, linguists can nevertheless establish their relationships by tracing them back through a series of sound changes to their conjectured historical forms.

And not all cognates share meaning. The English yoke, for instance, is related to the Latin jugular, the Greek zeugma, and the Hindi yoga, along with join, joust, conjugate, and many others. These words all trace back to the PIE *yeug ‘join’, and that sense can still be seen in some of its modern descendants, but if you’re learning Hindi, you can’t rely on the word yoke to tell you what yoga means.

Which brings us back to the German course that I was editing. Cognates are often presented as a way to learn vocabulary quickly, because the form and meaning are often similar enough to the form and meaning of the English word to make them easy to remember. But cognates often vary wildly in form (like four, quatre, and pedwar) and in meaning (like yoke, jugular, zeugma, and yoga). And many of the words presented as cognates are in fact not cognates but merely borrowings. Strictly speaking, cognates are words that have a common origin—that is, they were inherited from an ancestral language, just as the cognates above all descend from Proto-Indo-European. Cognates are like cousins—they may belong to different families, but they all trace back to a common ancestor.

But if cognates are like cousins, then borrowings are like clones, where a copy of a word is taken directly from one language to another. Most of the cognates that I learned in French class years ago are actually borrowings. The English and French forms may look a little different now, but the resemblance is unmistakable. Many of the cognates in the German course I was editing were also borrowings, and in many cases they were words that were borrowed into both German and English from French:

bank
drama
form
gold
hand
jaguar
kredit
land
name
park
problem
sand
tempo
wind
zoo

Of these, only gold, hand, land, sand, and wind are actually cognates. Maybe it’s nitpicking to point out that the English jaguar and the German Jaguar aren’t cognates but borrowings from Portuguese. For a language learner, the important thing is that these words are similar in both languages, making them easy to learn.

But it’s the list of supposed false cognates that really irks me:

bad/bath
billion/trillion
karton/cardboard box
chef/boss
gift/venom
handy/cellphone
mode/fashion
peperoni/chili pepper
pickel/zit
rock/skirt
wand/wall
beamer/video projector
argument/proof, reasons

The German word is on the left and the English word on the right. Once again, many of these words are borrowings, mostly from French and Latin. All of these borrowings are clearly related, though their senses may have developed in different directions. For example, chef generally means “boss” in French, but it acquired its specialized sense in English from the longer phrase chef de cuisine, “head of the kitchen”. The earlier borrowing chief still maintains the sense of “head” or “boss”.

(It’s interesting that billion and trillion are on the list, since this isn’t necessarily an English/German difference—it also used to be an American/British difference, but the UK has adopted the same system as the US. Some languages use billion to mean a thousand million, while other languages use it to mean a million million. There’s a whole Wikipedia article on it.)

But some of these words really are cognate with English words—they just don’t necessarily look like it. Bad, for example, is cognate with the English bath. You just need to know that the English sounds spelled as <th>—like the /θ/ in thin or the /ð/ in then—generally became /d/ in German.

And, surprisingly, the German Gift, “poison”, is indeed cognate with the English gift. Gift is derived from the word give, and it means “something given”. The German word is essentially just a highly narrowed sense of the word: poison is something you give someone. (Well, hopefully not something you give someone.)

On a related note, that most notorious of alleged false cognates, the Spanish embarazada, really is related to the English embarrassed. They both trace back to an earlier word meaning “to put someone in an awkward or difficult situation”.

Rather than call these words false cognates, it would be more accurate to call them false friends. This term is broad enough to encompass both words that are unrelated and words that are borrowings or cognates but that have different senses.

This isn’t to say that cognates aren’t useful in learning a language, of course, but sometimes it takes a little effort to see the connections. For example, when I learned German, one of my professors gave us a handout of some common English–German sound correspondences, like the th ~ d connection above. If you know that the English /p/ often corresponds to a German /f/ and that the English sound spelled <ea> often corresponds to the German /au/, then the relation between leap and laufen “to run” becomes clearer.

Or if you know that the English sound spelled <ch> often corresponds with the German /k/ or that the English /p/ often corresponds with the German /f/, then the relation between cheap and kaufen “to buy” becomes a little clearer. (Incidentally, this means that the English surname Chapman is cognate with the German Kaufmann.) And knowing that the English <y> sometimes corresponds to the German /g/ might help you see the relationship between the verb yearn and the German adverb gern “gladly, willingly”.
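Here’s that handout idea in miniature, in code form; the correspondences and word pairs are just the ones mentioned above (and since the discussion mixes spellings and sounds, treat these as rough spelling correspondences, not a real sound-change table):

    # A few rough English-German correspondences with the cognate pairs
    # from the examples above (nowhere near exhaustive).
    correspondences = {
        ("th", "d"):  [("bath", "Bad")],
        ("p", "f"):   [("leap", "laufen"), ("cheap", "kaufen")],
        ("ea", "au"): [("leap", "laufen")],
        ("ch", "k"):  [("cheap", "kaufen"), ("Chapman", "Kaufmann")],
        ("y", "g"):   [("yearn", "gern")],
    }

    for (english, german), pairs in correspondences.items():
        examples = ", ".join(f"{e} ~ {g}" for e, g in pairs)
        print(f"English <{english}> ~ German <{german}>: {examples}")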

You don’t have to teach a course in historical linguistics in order to teach a foreign language like German, but you’re doing your students a disservice if you teach them that obviously related pairs like Bad and bath aren’t actually related. Rather than teach students that language is random and treacherous, you can teach them to find the patterns that are already there. A little bit of linguistic background can go a long way.

Plus, you know, real etymology is a lot more fun.

Edited to add: In response to this post, Christopher Bergmann (www.isoglosse.de) created this great diagram of helpful cognates, unhelpful or less-helpful cognates, false cognates, and so on.

