Prescriptivism and Language Change

Recently, John McIntyre posted a video to the Baltimore Sun’s Facebook page in which he defended the unetymological use of decimate. When he shared it to his own Facebook page, a lively discussion ensued, including this comment:

Putting aside all the straw men, the ad absurdums, the ad hominems and the just plain sillies, answer me two questions:
1. Why are we so determined that decimate, having once changed its meaning to a significant portion of the population, must be used to mean obliterate and must never be allowed to change again?
2. Is your defence of the status quo on the word not at odds with your determination that it is a living language?
3. If the word were to have been invented yesterday, do you really think “destroy” is the best meaning for it?
Putting aside all the straw men in these questions themselves, let’s get at what he’s really asking, which is, “If decimate changed once before from ‘reduce by one-tenth’ to ‘reduce drastically’, why can’t it change again to the better, more etymological meaning?”

I’ve seen variations on this question pop up multiple times over the last few years when traditional rules have been challenged or debunked. It seems that the notions that language changes and that such change is normal have become accepted by many people, but some of those people then turn around and ask, “So if language changes, why can’t I change it in the way I want?” For example, some may recognize that the that/which distinction is an invention that’s being forced on the language, but they may believe that this is a good change that increases clarity.

On the surface, this seems like a reasonable question. If language is arbitrary and changeable, why can’t we all just decide to change it in a positive way? After all, this is essentially the rationale behind the movements that advocate bias-free or plain language. But whereas those movements are motivated by social or cognitive science and have measurable benefits, this argument in favor of old prescriptive rules is just a case of motivated reasoning.

The bias-free and plain language movements are based on the premises that people deserve to be treated equally and that language should be accessible to its audience. Arguing that decimated really should mean “reduced by one-tenth” is based on a desire to hang on to rules that one was taught in one’s youth. It’s an entirely post hoc rationale, because it’s only employed to defend bad rules, not to determine the best meaning for or use of every word. For example, if we really thought that narrower etymological senses were always better, shouldn’t we insist that cupboard only be used to refer to a board on which one places cups?

This argument is based in part on a misunderstanding of what the descriptivist/prescriptivist debate is all about. Nobody is insisting that decimate must mean “obliterate”, only observing that it is used in the broader sense far more often than the narrower etymological sense. Likewise, no one is insisting that the word must never be allowed to change again, only noting that it is unlikely that the “destroy one-tenth” sense will ever be the dominant sense. Arguing against a particular prescription is not the same as making the opposite prescription.

But perhaps more importantly, this argument is based on a fundamental misunderstanding of how language change works. As Allan Metcalf said in a recent Lingua Franca post, “It seems a basic principle of language that if an expression is widely used, that must be because it is widely useful. People wouldn’t use a word if they didn’t find it useful.” And as Jan Freeman has said, “we don’t especially need a term that means ‘kill one in 10.’” That is, the “destroy one-tenth” sense is not dominant precisely because it is not useful.

The language changed when people began using the word in a more useful way, or to put it more accurately, people changed the language by using the word in a more useful way. You can try to persuade them to change back by arguing that the narrow meaning is better, but this argument hasn’t gotten much traction in the 150 years since people started complaining about the broader sense. (The broader sense, unsurprisingly, dates back to the mid-1600s, meaning that English speakers were using it for a full two centuries before someone decided to be bothered by it.)

But even if you succeed, all you’ll really accomplish is driving decimate out of use altogether. Just remember that death is also a kind of change.


Stupidity on Singular They

A few weeks ago, the National Review published a singularly stupid article on singular they. It’s wrong from literally the first sentence, in which the author, Josh Gelernter, says that “this week, the 127-year-old American Dialect Society voted the plural pronoun ‘they,’ used as a singular pronoun, their Word of the Year.” It isn’t from this week; this is a piece of old news that recently went viral again. The American Dialect Society announced its word of the year, as it typically does, at the beginning of the year. Unfortunately, this is a good indication of the quality of the author’s research throughout the rest of the article.

After calling those who use singular they stupid and criticizing the ADS for failing to correct them (which is a fairly serious misunderstanding of the purpose of the ADS and the entire field of linguistics in general), Gelernter says that we already have a gender-neutral third-person pronoun, and it’s he. He cites “the dictionary of record”, Webster’s Second International, for support. His choice of dictionary is telling. For those not familiar with it, Webster’s Second, or W2, was published in 1934 and has been out of print for decades.

The only reason someone would choose it over Webster’s Third, published in 1961, is as a reaction to the perception that W3 was overly permissive. When it was first published, it was widely criticized for its more descriptive stance, which did away with some of the more judgemental usage labels. Even W3 is out of date and has been replaced with the new online Unabridged; W2 is only the dictionary of record of someone who refuses to accept any of the linguistic change or social progress of the last century.

Gelernter notes that W2’s first definition for man is “a member of the human race”, while the “male human being” sense “is the second-given, secondary definition.” Here it would have helped Gelernter to read the front matter of his dictionary. Unlike some other dictionaries, Merriam-Webster arranges entries not in order of primary or central meanings to more peripheral meanings but in order of historical attestation. Man was most likely originally gender-neutral, while the original word for a male human being was wer (which survives only in the word werewolf). Over time, though, wer fell out of use, and man began pulling double duty. (The Online Etymology Dictionary notes that a similar thing happened with the Latin vir, cognate with wer, and homo. Vir fell out of use as homo took over the sense of “male human”.)

So just because an entry is listed first in a Merriam-Webster dictionary does not mean it’s the primary definition, and just because a word originally meant one thing (and still does mean that thing to some extent) does not mean we must continue to use it that way.

Interestingly, Gelernter admits that the language lost some precision when the plural you pushed out the singular thou as a second-person pronoun, though, bizarrely, he says that it was for good reason, because you had caught on as a more polite form of address. The use of you as a singular pronoun started as a way to be polite and evolved into an obsession with social status, in which thou was eventually relegated to inferiors before finally dropping out of use.

The resurgence of singular they in the twentieth century was driven by a different sort of social force: an acknowledgement that the so-called gender-neutral he is not really gender-neutral. Research has shown that gender-neutral uses of he and man cause readers to think primarily of males, even when context makes it clear that the person could be of either gender. (Here’s just one example.) They send the message that men are the default and women are other. Embracing gender-neutral language, whether it’s he or she or they or some other solution, is about correcting that imbalance by acknowledging that women are people too.

And in case you still think that singular they is just some sort of newfangled politically correct usage, you should know that it has been in use since the 1300s and has been used by literary greats from Chaucer to Shakespeare to Orwell. (I once wrote that Orwell didn’t actually use singular they; it turns out that the quote attributed to him in Merriam-Webster’s Dictionary of English Usage was wrong, but he really did use it.) For centuries, nobody batted an eye at singular they, until grammarians started to proscribe it in favor of generic he in the eighteenth and nineteenth centuries. Embracing singular they doesn’t break English grammar; it merely embraces something that’s been part of English grammar for seven centuries.

At the end, we get to the real heart of Gelernter’s article: ranting about new gender-neutral job titles in the armed forces. Gelernter seems to think that changing to gender-neutral titles will somehow make the members of our armed forces suddenly forget how to do their jobs. This isn’t really about grammar; it’s about imagining that it’s a burden to think about the ways in which language affects people, that it’s a burden to treat women with the same respect as men.

But ultimately, it doesn’t matter what Josh Gelernter thinks about singular they or about gender-neutral language in general. Society will continue to march on, just as language has continued to march on in the eight decades since his beloved Webster’s Second was published. But remember that we have a choice in deciding how language will march on. We can use our language to reflect outdated and harmful stereotypes, or we can use it to treat others with the respect they deserve. I know which one I choose.

To Boldly Split Infinitives

Today is the fiftieth anniversary of the first airing of Star Trek, so I thought it was a good opportunity to talk about split infinitives. (So did Merriam-Webster, which beat me to the punch.) If you’re unfamiliar with split infinitives or have thankfully managed to forget what they are since your high school days, it’s when you put some sort of modifier between the to and the infinitive verb itself—that is, a verb that is not inflected for tense, like be or go—and for many years it was considered verboten.

Kirk’s opening monologue on the show famously featured the split infinitive “to boldly go”, and it’s hard to imagine the phrase working so well without it. “To go boldly” and “boldly to go” both sound terribly clunky, partly because they ruin the rhythm of the phrase. “To BOLDly GO” is a nice iambic dimeter, meaning that it has two metrical feet, each consisting of an unstressed syllable followed by a stressed syllable: duh-DUN duh-DUN. “BOLDly to GO” is a trochee followed by an iamb, meaning that we have a stressed syllable, two unstressed syllables, and then another stressed syllable: DUN-duh duh-DUN. “To GO BOLDly” is the reverse, an iamb followed by a trochee, leading to a stress clash in the middle where the two stresses butt up against each other and then ending on a weaker unstressed syllable. Blech.

But the root of the alleged problem with split infinitives concerns not meter but syntax. The question is where it’s syntactically permissible to put a modifier in a to-infinitive phrase. Normally, an adverb would go just in front of the verb it modifies, as in She boldly goes or He will boldly go. Things were a little different when the verb was an infinitive form preceded by to. In this case the adverb often went in front of the to, not in front of the verb itself.

As Merriam-Webster’s post notes, split infinitives date back at least to the fourteenth century, though they were not as common back then and were often used in different ways than they are today. But they mostly fell out of use in the sixteenth century and then roared back to life in the eighteenth century, only to be condemned by usage commentators in the nineteenth and twentieth centuries. (Incidentally, this illustrates a common pattern of prescriptivist complaints: a new usage arises, or perhaps it has existed for literally millennia, it goes unnoticed for decades or even centuries, someone finally notices it and decides they don’t like it (often because they don’t understand it), and suddenly everyone starts decrying this terrible new thing that’s ruining English.)

It’s not particularly clear, though, why people thought that this particular thing was ruining English. The older boldly to go was replaced by the resurgent to boldly go. It’s often claimed that people objected to split infinitives on the basis of analogy with Latin (Merriam-Webster’s post repeats this claim). In Latin, an infinitive is a single word, like ire, and it can’t be split. Ergo, since you can’t split infinitives in Latin, you shouldn’t be able to split them in English either. The problem with this theory is that there’s no evidence to support it. Here’s the earliest recorded criticism of the split infinitive, according to Wikipedia:

The practice of separating the prefix of the infinitive mode from the verb, by the intervention of an adverb, is not unfrequent among uneducated persons. . . . I am not conscious, that any rule has been heretofore given in relation to this point. . . . The practice, however, of not separating the particle from its verb, is so general and uniform among good authors, and the exceptions are so rare, that the rule which I am about to propose will, I believe, prove to be as accurate as most rules, and may be found beneficial to inexperienced writers. It is this :—The particle, TO, which comes before the verb in the infinitive mode, must not be separated from it by the intervention of an adverb or any other word or phrase; but the adverb should immediately precede the particle, or immediately follow the verb.

No mention of Latin or of the supposed unsplittability of infinitives. In fact, the only real argument is that uneducated people split infinitives, while good authors didn’t. Some modern usage commentators have used this purported Latin origin of the rule as the basis of a straw-man argument: Latin couldn’t split infinitives, but English isn’t Latin, so the rule isn’t valid. Unfortunately, Merriam-Webster’s post does the same thing:

The rule against splitting the infinitive comes, as do many of our more irrational rules, from a desire to more rigidly adhere (or, if you prefer, “to adhere more rigidly”) to the structure of Latin. As in Old English, Latin infinitives are written as single words: there are no split infinitives, because a single word is difficult to split. Some linguistic commenters have pointed out that English isn’t splitting its infinitives, since the word to is not actually a part of the infinitive, but merely an appurtenance of it.

The problem with this argument (aside from the fact that the rule wasn’t based on Latin) is that modern English infinitives—not just Old English infinitives—are only one word too and can’t be split either. The infinitive in to boldly go is just go, and go certainly can’t be split. So this line of argument misses the point: the question isn’t whether the infinitive verb, which is a single word, can be split in half, but whether an adverb can be placed between to and the verb. As Merriam-Webster’s Dictionary of English Usage notes, the term split infinitive is a misnomer, since it’s not really the infinitive but the construction containing an infinitive that’s being split.

But in recent years I’ve seen some people take this terminological argument even further, saying that split infinitives don’t even exist because English infinitives can’t be split. I think this is silly. Of course they exist. It used to be that people would say boldly to go; then they started saying to boldly go instead. It doesn’t matter what you call the phenomenon of moving the adverb so that it’s snug up against the verb—it’s still a phenomenon. As Arnold Zwicky likes to say, “Labels are not definitions.” Just because the name doesn’t accurately describe the phenomenon doesn’t mean it doesn’t exist. We could call this phenomenon Steve, and it wouldn’t change what it is.

At this point, the most noteworthy thing about the split infinitive is that there are still some people who think there’s something wrong with it. The original objection was that it was wrong because uneducated people used it and good writers didn’t, but that hasn’t been true in decades. Most usage commentators have long since given up their objections to it, and some even point out that avoiding a split infinitive can cause awkwardness or even ambiguity. In his book The Sense of Style, Steven Pinker gives the example The board voted immediately to approve the casino. Which word does immediately modify—voted or approve?

But this hasn’t stopped The Economist from maintaining its opposition to split infinitives. Its style guide says, “Happy the man who has never been told that it is wrong to split an infinitive: the ban is pointless. Unfortunately, to see it broken is so annoying to so many people that you should observe it.”

I call BS on this. Most usage commentators have moved on, and I suspect that most laypeople either don’t know or don’t care what a split infinitive is. I don’t think I know a single copy editor who’s bothered by them. If you’ve been worrying about splitting infinitives since your high school English teacher beat the fear of them into you, it’s time to let it go. If they’re good enough for Star Trek, they’re good enough for you too.

On a Collision Course with Reality

In a blog post last month, John McIntyre took the editors of the AP Stylebook to task for some of the bad rules they enforce. One of these was the notion that “two objects must be in motion to collide, that a moving object cannot collide with a stationary object.” That is, according to the AP Stylebook, a car cannot collide with a tree, because the tree is not moving, and it can only collide with another car if that other car is moving. McIntyre notes that this rule is not supported by Fowler’s Modern English Usage or even mentioned in Garner’s Modern American Usage.

Merriam-Webster’s Dictionary of English Usage does have an entry for collide and notes that the rule is a tradition (read “invention”) of American newspaper editors. It’s not even clear where the rule came from or why; there’s nothing in the etymology of the word to suggest that only two objects in motion can collide. It comes from the Latin collidere, meaning “to strike together”, from com- “together” + laedere “to strike”.

The rule is not supported by traditional usage either. Speakers and writers of English have been using collide to refer to bodies that are not both in motion for as long as the word has been in use, which is roughly four hundred years. Nor is the rule an attempt to slow language change or hang on to a fading distinction; it’s an attempt to create a distinction and impose it on everyone who uses the language, or at least journalists.

What I found especially baffling was the discussion that took place on Mr. McIntyre’s Facebook page when he shared the link there. Several people chimed in to defend the rule, with one gentleman saying, “There’s an unnecessary ambiguity when ‘collides’ involves <2 moving objects.” Mr. McIntyre responded, “Only if you imagine one.” And this is key: collide is ambiguous only if you have been taught that it is ambiguous—or in other words, only if you’re a certain kind of journalist.

In that Facebook discussion, I wrote,

So the question is, is this actually a problem that needs to be solved? Are readers constantly left scratching their heads because they see “collided with a tree” and wonder how a tree could have been moving? If nobody has ever found such phrasing confusing, then insisting on different phrasing to avoid potential ambiguity is nothing but a waste of time. It’s a way to ensure that editors have work to do, not a way to ensure that editors are adding benefit for the readers.

The discussion thread petered out after that.

I’m generally skeptical of the usefulness of invented distinctions, but this one seems especially useless. When would it be important to distinguish between a crash involving two moving objects and one involving only one moving object? Wouldn’t it be clear from context anyway? And if it’s not clear from context, how on earth would we expect most readers—who have undoubtedly never heard of this journalistic shibboleth—to pick up on it? Should we avoid using words like crash or struck because they’re ambiguous in the same way—because they don’t tell us whether both objects were moving?

It doesn’t matter how rigorously you follow the rule in your own writing or in the writing you edit; if your readers think that collide is synonymous with crash, then they will assume that your variation between collide and crash is merely stylistic. They’ll have no idea that you’re trying to communicate something else. If it’s important, they’ll probably deduce from context whether both objects were moving, regardless of the word you use.

In other words, if an editor makes a distinction and no reader picks up on it, is it still useful?


A Rule Worth Giving Up On

A few weeks ago, the official Twitter account for the forthcoming movie Deadpool tweeted, “A love for which is worth killing.” Name developer Nancy Friedman commented, “There are some slogans up with which I will not put.” Obviously, with a name like Arrant Pedantry, I couldn’t let that slogan pass by without comment.

The slogan is obviously attempting to follow the old rule against stranding prepositions. Prepositions usually come before their complements, but there are several constructions in English in which they’re commonly stranded, or left at the end without their complements. Preposition stranding is especially common in speech and informal writing, whereas preposition fronting (or keeping the preposition with its complement) is more typical of a very formal style. For example, you’d probably say Who did you give it to? when talking to a friend, but in a very formal situation, you might move that preposition up to the front: To whom did you give it?

This rule has been criticized and debunked countless times, but even if you believe firmly in it, you should recognize that there are some constructions where you can’t follow it. That is, following the rule sometimes produces sentences that are stylistically bad if not flat-out ungrammatical. The following constructions all require preposition stranding:

  1. Relative clauses introduced by that. The relative pronoun that cannot come after a preposition, which is one reason why some linguists argue that it’s really a conjunction (a form of the complementizer that) and not a true pronoun. You can’t say There aren’t any of that I know—you have to use which instead or leave the preposition at the end—There aren’t any that I know of.
  2. Relative clauses introduced with an omitted relative. As with the above example, the preposition in There aren’t any I know of can’t be fronted. There isn’t even anything to put it in front of, because the relative pronoun is gone. This should probably be considered a subset of the first item, because the most straightforward analysis is that relative that is omissible while other relatives aren’t. (This is another reason why some consider it not a true pronoun but rather a form of the complementizer that, which is often omissible.)
  3. The fused relative construction. When you use what, whatever, or whoever as a relative pronoun, as in the U2 song “I Still Haven’t Found What I’m Looking For”, the preposition must come at the end. Strangely, Reader’s Digest once declared that the correct version would be “I Still Haven’t Found for What I’m Looking”. But this is ungrammatical, because “what” cannot serve as the object of “for”. For the fronted version to work, you have to reword it to break up the fused relative: “I Still Haven’t Found That for Which I’m Looking”.
  4. A subordinate interrogative clause functioning as the complement of a preposition. The Cambridge Grammar of the English Language gives the example We can’t agree on which grant we should apply for. The fronted form We can’t agree on for which grant we should apply sounds stilted and awkward at best.
  5. Passive clauses where the subject has been promoted from an object of a preposition. In Her apartment was broken into, there’s no way to reword the sentence to avoid the stranded preposition, because there’s nothing to put the preposition in front of. The only option is to turn it back into an active clause: Someone broke into her apartment.
  6. Hollow non-finite clauses. A non-finite clause is one that uses an infinitive or participial form rather than a tensed verb, so it has no overt subject. A hollow non-finite clause is also missing some other element that can be recovered from context. In That book is too valuable to part with, for example, the hollow non-finite clause is to part with. With is missing a complement, which makes it hollow, though we can recover its complement from context: that book. Sometimes you can flip a hollow non-finite clause around and insert the dummy subject it to put the complement back in its place. It’s too valuable to part with that book doesn’t really work, though It’s worth killing for a love is at least grammatical. It’s worth killing for this love is better, but in this case A love worth killing for is still stylistically preferable. But the important thing to note is that since the complement of the preposition is missing, there’s nowhere to move the preposition to. It has to remain stranded.

And that’s where the Deadpool tweet goes off the rails. Rather than leave the preposition stranded, they invent a place for it by inserting the completely unnecessary relative pronoun which. But A love for which worth killing sounds like caveman talk, so they stuck in the similarly unnecessary is: A love for which is worth killing. They’ve turned the non-finite clause into a finite one, but now it’s missing a subject. They could have fixed that by inserting a dummy it, as in A love for which it is worth killing, but they didn’t. The result is a completely ungrammatical mess, but one that sounds just sophisticated enough, thanks to its convoluted syntax, that it might fool some people into thinking it’s some sort of highfalutin form. It’s not.

Instead, it’s some sort of hideous freak, the product of an experiment conducted by people who didn’t fully understand what they were doing, just like Deadpool himself. Unlike Deadpool, though, this sentence doesn’t have any superhuman healing powers. If you ever find yourself writing something like this, do the merciful thing and put it out of its misery.


Historic, Historical

My brother recently asked me how to use pairs of words like historic/historical, mathematic/mathematical, and problematic/problematical. The typical usage advice is pretty straightforward—use historic to refer to important things from history and historical to refer to anything having to do with past events, important or not—but the reality of usage is a lot more complicated.

According to the Oxford English Dictionary, historic was first used as an adjective meaning “related to history; concerned with past events” in 1594. In 1610, it appeared in the sense “belonging to, constituting, or of the nature of history; in accordance with history”; sometimes this was contrasted with prehistoric. It wasn’t until 1756 that it was first used to mean “having or likely to have great historical importance or fame; having a significance due to connection with historical events.” The first edition of the OED called this “the prevailing current sense”, but the current edition notes that the other senses are still common.

The history of historical isn’t much clearer. It first appeared as an adjective meaning “belonging to, constituting, or of the nature of history; in accordance with history” (sense 2 of historic) in 1425, though there aren’t any more citations in this sense until the mid- to late 1500s, about the same time that historic began to be used in this sense. Also in 1425, it appeared in the sense of a work that is based on or depicts events from history, though this sense also didn’t appear again until the late 1500s. In the broader sense “related to history; concerned with past events”, it appeared in 1521, several decades before historic appeared in this sense.

In other words, both of these words have been used in essentially all of these senses for all of their history, and they both appeared around the same time. It’s not as if one clearly came first in one sense and the other clearly came first in the other sense. There is no innate distinction between the two words, though a distinction has begun to emerge over the last century or so of use.

Other such pairs are not much clearer. The OED gives several senses for mathematical beginning in 1425, and for mathematic it simply says, “= mathematical adj. (in various senses)”, with citations beginning in 1402. Apparently they are interchangeable. Problematic/problematical seem to be interchangeable as well, though problematical is obsolete in the logic sense.

But rather than go through every single pair, I’ll just conclude with what Grammarist says on the topic:

There is no rule or consistent pattern governing the formation of adjectives ending in -ic and -ical. . . .

When you’re in doubt about which form is preferred or whether an -ic/-ical word pair has differentiated, the only way to know for sure is to check a dictionary or other reference source.


Overanxious about Ambiguity

As my last post revealed, a lot of people are concerned—or at least pretend to be concerned—about the use of anxious to mean “eager” or “excited”. They claim that since it has multiple meanings, it’s ambiguous, and thus the disparaged “eager” sense should be avoided. But as I said in my last post, it’s not really ambiguous, and anyone who claims otherwise is simply being uncooperative.

Anxious entered the English language in the early to mid-1600s in the sense of “troubled in mind; fearful; brooding”. But within a century, the sense had expanded to mean “earnestly desirous” or “eager”. That’s right: the allegedly new sense of the word was already in use before the United States declared independence.

These two meanings existed side by side until the early 1900s, when usage commentators first decided to be bothered by the “eager” sense. And make no mistake—this was a deliberate decision to be bothered. Merriam-Webster’s Dictionary of English Usage includes this anecdote from Alfred Ayres in 1901:

Only a few days ago, I heard a learned man, an LL.D., a dictionary-maker, an expert in English, say that he was anxious to finish the moving of his belongings from one room to another.

“No, you are not,” said I.

“Yes, I am. How do you know?”

“I know you are not.”

“Why, what do you mean?”

“There is no anxiety about it. You are simply desirous.”

Ayres’s correction has nothing to do with clarity or ambiguity. He obviously knew perfectly well what the man meant but decided to rub his nose in his supposed error instead. One can almost hear his self-satisfied smirk as he lectured a lexicographer—a learned man! a doctor of laws!—on the use of the language he was supposed to catalog.

A few years later, Ambrose Bierce also condemned this usage, saying that anxious should not be used to mean “eager” and that it should not be followed by an infinitive. As MWDEU notes, anxious is typically used to mean “eager” when it is followed by an infinitive. But it also says that it’s “an oversimplification” to say that anxious is simply being used to mean “eager”. It notes that “the word, in fact, fairly often has the notion of anxiety mingled with that of eagerness.” That is, anxious is not being used as a mere synonym of eager—it’s being used to indicate not just eagerness but a sort of nervous excitement or anticipation.

MWDEU also says that this sense is the predominant one in the Merriam-Webster citation files, but a search in COCA doesn’t quite bear this out—only about a third of the tokens are followed by to and are clearly used in the “eager” sense. Google Books Ngrams, however, shows that to is by far the most common word that immediately follows anxious; that is, people are anxious to do something far more often than they’re anxious about something.

This didn’t stop one commenter from claiming that not only is this use of anxious confusing, but she’d literally never encountered it before. It’s hard to take such a claim seriously when this use is not only common but has been common for centuries.

It’s also hard to take seriously the claim that it’s ambiguous when nobody can manage to find an example that’s actually ambiguous. A few commenters offered made-up examples that seemed designed to be maximally ambiguous when presented devoid of context. They also ignored the fact that the “eager” sense is almost always followed by an infinitive. That is, as John McIntyre pointed out, no English speaker would say “I was anxious upon hearing that my mother was coming to stay with us” or “I start a new job next week and I’m really anxious about that” if they meant that they were eager or excited.

Another commenter seemed to argue that the problem was that language was changing in an undesirable way, saying, “It’s clearly understood that language evolves, but some of us might prefer a different or better direction for that evolution. . . . Is evolution the de facto response for any misusage in language?”

But this comment has everything backwards. Evolution isn’t the response to misuse—claims of misuse are (occasionally) the response to evolution. The word anxious changed in a very natural way, losing some of its negative edge and being used in a more neutral or positive way. The same thing happened to the word care, which originally meant “to sorrow or grieve” or “to be troubled, uneasy, or anxious”, according to the Oxford English Dictionary. Yet nobody complains that everyone is misusing the word today.

That’s because nobody ever decided to be bothered by it as they did with anxious. The claims of ambiguity or undesired language change are all post hoc; the real objection to this use of anxious was simply that someone decided on the basis of etymology—and in spite of established usage—that it was wrong, and that personal peeve went viral and became established in the usage literature.

It’s remarkably easy to convince yourself that something is an error. All you have to do is hear someone say that it is, and almost immediately you’ll start noticing the error everywhere and recoiling in horror every time you encounter it. And once the idea that it’s an error has become lodged in your brain, it’s remarkably difficult to dislodge it. We come up with an endless stream of bogus arguments to rationalize our pet peeves.

So if you choose to be bothered by this use of anxious, that’s certainly your right. But don’t pretend that you’re doing the language a service.


No, Online Grammar Errors Have Not Increased by 148%

Yesterday a post appeared on (home of Grammar Girl’s popular podcast) that appears to have been written by a company called Knowingly, which is promoting its Correctica grammar-checking tool. They claim that “online grammar errors have increased by 148% in nine years”. If true, it would be a pretty shocking claim, but the numbers immediately sent up some red flags.

They searched for seventeen different errors and compared the numbers from nine years ago to the numbers from today. From the description, I gather that the first set of numbers comes from a publicly available set of data that Google culled from public web pages. The data was released in 2006 and is hosted by the Linguistic Data Consortium. You can read more about the data here, but this part is the most relevant:

We processed 1,024,908,267,229 words of running text and are publishing the counts for all 1,176,470,663 five-word sequences that appear at least 40 times. There are 13,588,391 unique words, after discarding words that appear less than 200 times.

So the data is taken from over a trillion words of text, but some sequences were discarded if they didn’t appear frequently enough, and you can only search sequences up to five words long. Also note that while the data was released in 2006, it does not necessarily all come from 2006; some of it could have come from web pages that were older than that.

It sounds like the second set of numbers comes from a series of Google searches—it simply says “search result data today”. It isn’t explicitly stated, but it appears that the search terms were put in quotes to find exact strings. But we’re already comparing apples and oranges: though the first set of data came from a known sample size (just over a trillion words) and was cleaned up a bit by having outliers thrown out, we have no idea how big the second sample size is. How many words are you effectively searching when you do a search in Google?

This is why corpora usually present not just raw numbers but normalized numbers—that is, not just an overall count, but a count per thousand words or something similar. Knowing that you have 500 instances of something in data set A and 1000 instances in data set B doesn’t mean anything unless you know how big those sets are, and in this case we don’t.
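To make the point about normalization concrete, here is a minimal sketch in Python. The counts of 500 and 1000 come from the paragraph above, but the two corpus sizes are purely hypothetical numbers chosen for illustration; they do not describe either of the data sets discussed here.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size

# Hypothetical corpus sizes (assumptions for illustration only).
SIZE_A = 2_000_000    # data set A: 2 million words
SIZE_B = 10_000_000   # data set B: 10 million words

rate_a = per_million(500, SIZE_A)    # 500 raw hits in A
rate_b = per_million(1000, SIZE_B)   # 1000 raw hits in B

print(f"A: {rate_a} per million words")
print(f"B: {rate_b} per million words")
```

With these made-up sizes, set B has twice as many raw hits as set A but a lower normalized rate, which is why raw counts alone can't be compared.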

This problem is ameliorated somewhat by looking not just at the raw numbers but at the error rates. That is, they searched for both the correct and incorrect forms of each item, calculated how frequent the erroneous form was, and compared the rates from 2006 to the rates from 2015. It would still be better to compare two similar datasets, because we have no idea how different the cleaned-up Google Ngrams data is from raw Google search data, but at least this allows us to make some rough comparisons. But notice the huge differences between the “then” and “now” numbers in the table below. Obviously the 2015 data represents a much larger set. (I’ve split their table into two pieces, one for the correct terms and one for the incorrect terms, to make them fit in my column here.)

Correct Term
jugular vein
bear in mind
head over heels
chocolate mousse
egg yolk
without further ado
whet your appetite
heroin and morphine
reach across the aisle
herd mentality
weather vane
zombie horde
chili peppers
brake pedal
pique your interest
lessen the burden
bridal shower

Incorrect Term
juggler vein
bare in mind
head over heals
chocolate moose
egg yoke
without further adieu
wet your appetite
heroine and morphine
reach across the isle
heard mentality
weather vein
zombie hoard
chilly peppers
brake petal
peek your interest
lesson the burden
bridle shower

But then the Correctica team commits a really major statistical goof—they average all those percentages together to calculate an overall percentage. Here’s their data again:

Incorrect Term
juggler vein
bare in mind
head over heals
chocolate moose
egg yoke
without further adieu
wet your appetite
heroine and morphine
reach across the isle
heard mentality
weather vein
zombie hoard
chilly peppers
brake petal
peek your interest
lesson the burden
bridle shower

They simply add up all the percentages (1.2% + 1.9% + 6.6% + . . .) and divide by the number of percentages, 17. But this number is meaningless. Imagine that we were comparing two items: isn’t is used 9,900 times and ain’t 100 times, and regardless is used 999 times and irregardless 1 time. This means that when there’s a choice between isn’t and ain’t, ain’t is used 1% of the time (100/(9900+100)), and when there’s a choice between regardless and irregardless, irregardless is used .1% of the time (1/(999+1)). If you average 1% and .1%, you get .55%, but this isn’t the overall error rate.

To get an overall error rate, you need to calculate the percentage from the totals. We have to take the total number of errors and divide it by the total number of opportunities to use either the correct or the incorrect form. This gives us (1+100)/((9900+999)+(100+1)), or 101/11000, which works out to .92%, not .55%.
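The difference between averaging the individual rates and computing one rate from the totals can be checked with a short Python sketch. It uses only the made-up isn’t/ain’t and regardless/irregardless counts from the example above; the variable and function names are my own.

```python
# (correct count, incorrect count) for each pair in the example above
pairs = {
    "isn't / ain't": (9900, 100),
    "regardless / irregardless": (999, 1),
}

def error_rate(correct: int, incorrect: int) -> float:
    """Share of all uses that take the incorrect form."""
    return incorrect / (correct + incorrect)

# The flawed approach: average the per-item rates.
rates = [error_rate(c, i) for c, i in pairs.values()]
mean_of_rates = sum(rates) / len(rates)

# The pooled approach: total errors over total opportunities.
total_correct = sum(c for c, _ in pairs.values())
total_incorrect = sum(i for _, i in pairs.values())
pooled_rate = error_rate(total_correct, total_incorrect)

print(f"mean of rates: {mean_of_rates:.2%}")  # 0.55%
print(f"pooled rate:   {pooled_rate:.2%}")    # 0.92%
```

The unweighted mean treats the rare pair as if it mattered as much as the common one; the pooled rate weights each pair by how often it actually occurs.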

When we count up the totals and calculate the overall rates, we get an error rate of 1.88% for then (not 3.4%) and 2.38% for now (not 8.4%). That means the increase from 2006 to 2015 is not 148.2%, but a much more modest 26.64%. (By the way, I’m not sure where they got 148.2%; by my calculations, it should be 147.1%, but I could have made a mistake somewhere.) This is still a rather impressive increase in errors from 2006 to today, but the problems with the data set make it impossible to say for sure if this number is accurate or meaningful. “Heroine and morphine” occurred 45 times out of over a trillion words. Even if the error rate jumped 141.73% from 2006 to 2015, and even if the two sample sets were comparable, this would still probably amount to nothing more than statistical noise.
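For anyone who wants to check the percentage increases, here is a small Python sketch using the rounded rates quoted above (3.4% and 8.4% for the averaged figures, 1.88% and 2.38% for the pooled ones). Because these inputs are already rounded, the second result comes out at about 26.6% rather than exactly 26.64%, which was presumably calculated from unrounded totals.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100

# Averaged rates (the flawed method), as quoted in the text
averaged = percent_increase(3.4, 8.4)

# Pooled rates (errors over total opportunities), as quoted in the text
pooled = percent_increase(1.88, 2.38)

print(round(averaged, 1))  # 147.1
print(round(pooled, 1))    # 26.6
```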

And even if these numbers were accurate and meaningful, there’s still the question of research design. They claim that grammar errors have increased, but all of the items are spelling errors, and most of them are rather obscure ones at that. At best, this study only tells us that these errors have increased that much, not that grammar errors in general have increased that much. If you’re setting out to study grammar errors (using grammar in the broad sense), why would you assume that these items are representative of the phenomenon in general?

So in sum, the study is completely bogus, and it’s obviously nothing more than an attempt to sell yet another grammar-checking service. Is it important to check your writing for errors? Sure. Can Correctica help you do that? I have no idea. But I do know that this study doesn’t show an epidemic of grammar errors as it claims to.

(Here’s the data if anyone’s interested.)


Fifty Shades of Bad Grammar Advice

A few weeks ago, the folks at the grammar-checking website Grammarly wrote a piece about supposed grammar mistakes in Fifty Shades of Grey. Despite being a runaway hit, the book has frequently been criticized for its terrible prose, and Grammarly apparently saw an opportunity to fix some of the book’s problems (and probably sell its grammar-checking services along the way).

The first problem, of course, is that most of the errors Grammarly identified have nothing to do with grammar. The second is that most of their edits not only fail to fix the clunky prose but actually make it worse.

Mark Allen already took Grammarly to task in a post on the Copyediting blog, saying that their edits “lack restraint”, that “the list is full of style choices and non-errors”, and that “it fails to make a case for the value of proofreading, and, by association, . . . reflects poorly on the craft of copyediting.” I agreed and thought at the time that nothing more needed to be said.

But then Grammarly decided to go even further. In this infographic, they claim to have found “similar gaffes” in the works of authors ranging from Nicholas Sparks to Shakespeare.

The first edit suggests that Nicholas Sparks needs a comma in the sentence “I am a common man with common thoughts and I’ve led a common life.” It’s true that this is a compound sentence, and such sentences typically require a comma between the two independent clauses. But The Chicago Manual of Style says that the comma can be omitted when the clauses are short and closely related. This isn’t an error so much as a style choice.

Incidentally, Grammarly says that “E. L. James is not the first author to include a comma in her work when a semi-colon would be more appropriate, or vice versa.” But the supposed error here isn’t that James used a comma when she should have used a semicolon; it’s that she didn’t use a comma at all. (Also note that “semicolon” is not spelled with a hyphen and that the comma before “or vice versa” is not necessary.)

Error number 2 is comma misuse (which is somehow different from error number 1, which is also comma misuse). Grammarly says, “Many writers forget to include a comma when one is necessary, or include a comma when it is not necessary.” (By the way, the comma before “or include a comma when it is not necessary” is not necessary.) The supposed offender here is Hemingway, who wrote, “We would be together and have our books and at night be warm in bed together with the windows open and the stars bright.” Grammarly suggests putting a comma after “at night”, but that would be a mistake.

The sentence has a compound predicate with three verb phrases strung together with ands. Hemingway says that “We would (1) be together and (2) have our books and (3) at night be warm in bed together with the windows open and the stars bright.” You don’t need a comma between the parts of a compound predicate, and if you want to set off the phrase “at night”, then you need commas on both sides: “We would be together and have our books and, at night, be warm in bed together with the windows open and the stars bright.” But that destroys the rhythm of the sentence and interferes with Hemingway’s signature style.

Error number 3 is wordiness, and the offender is Edith Wharton, who wrote, “Each time you happen to me all over again.” Grammarly suggests axing “all over”, leaving “Each time you happen to me again”. But this edit doesn’t fix a wordy sentence so much as it kills its emphasis. This is dialogue; shouldn’t dialogue sound like the way people talk?

Error number 4, colloquialisms, is not even an error by Grammarly’s own admission—it’s a stylistic choice. And choosing to use colloquialisms—more particularly, contractions—is a perfectly valid stylistic choice in fiction, especially in dialogue. Changing “doesn’t sound very exciting” to “it does not sound very exciting” is probably fine if you’re editing dialogue for Data from Star Trek, but it just isn’t how normal people talk.

The next error, commonly confused words, is a bit of a head-scratcher. Here Grammarly fingers F. Scott Fitzgerald for writing “to-night” rather than “tonight”. But this has nothing to do with confused words, because they’re the same word. To-night was the more common spelling until the 1930s, when the unhyphenated tonight surpassed it. This is not an error at all, let alone an error involving commonly confused words.

The sixth error, sentence fragments, is again debatable, and Grammarly even acknowledges that using fragments “is one way to emphasize an idea.” Once again, Grammarly says that it’s a style choice that for some reason you should never make. The Chicago Manual of Style, on the other hand, rightly acknowledges that the proscription against sentence fragments has “no historical or grammatical foundation.”

Error number 7 is another puzzler. They say that determiners “help writers to be specific about what they are talking about.” Then they say that Boris Pasternak should have written “sent down to the earth” rather than “sent down to earth” in Doctor Zhivago. Where on the earth did they get that idea? Not only is “down to earth” far more common in writing, but there’s nothing unclear about it. Adding the “the” doesn’t solve any problem because there is no problem here. Incidentally, they say the error has to do with determiners, but they’re really talking about articles—a, an, and the. Articles are simply one type of determiner, which also includes possessive determiners, demonstratives, and quantifiers.

I’ll skip error number 8 for the moment and go to number 9, the passive voice. Again they note the passive voice is a stylistic choice and not a grammatical error, and then they edit it out anyway. In place of Mr. Darcy’s “My feelings will not be repressed” we now have “I will not repress my feelings.” Grammarly claims that the passive can cause “a lack of clarity in your writing”, but what is unclear about this line? Is anyone confused about it in the slightest? Instead of added clarity, we get a ham-fisted edit that shifts the focus from where it should be—the feelings—onto Mr. Darcy himself. This is exactly the sort of sentence that calls for the passive voice.

The eighth error is probably the most infuriating because it gets so many things wrong. Here they take Shakespeare himself to task over his supposed preposition misuse. They say that in The Tempest, Shakespeare should have written “such stuff on which dreams are made on” rather than “such stuff as dreams are made on”. The first problem with Grammarly’s correction is that it doubles the preposition “on”, creating a grammatical problem rather than fixing it.

The second problem with this correction is that which can’t be used as a relative pronoun referring to such—only as can do that. Their fix is not just awkward but doubly ungrammatical.

The third is that it simply ruins the meter of the line. Remember that Shakespeare often wrote in a meter called iambic pentameter, which means that each foot contains two syllables with stress on the second syllable and that there are five feet per line. Here’s the sentence from The Tempest:

We are such stuff
As dreams are made on, and our little life
Is rounded with a sleep.

(Note that these aren’t full lines because I’m omitting the text from surrounding sentences that make up part of the first and third lines.) Pay attention to the rhythm of those lines.

we ARE such STUFF
as DREAMS are MADE on, AND our LITtle LIFE
is ROUNDed WITH a SLEEP

Now compare Grammarly’s fix:

we ARE such STUFF
on WHICH dreams ARE made ON and OUR littLE life

The second line has too many syllables, and the stresses have all shifted. Shakespeare’s line puts most of the stresses on nouns and verbs, while Grammarly’s fix puts them mostly on function words—pronouns, prepositions, determiners—and, maybe worst of all, on the second syllable of “little”. They have taken lines from one of the greatest writers in all of English history and turned them into ungrammatical doggerel. It takes some nerve to edit the Bard; it apparently takes sheer blinkered idiocy to edit him so badly.

So, just to recap, that’s nine supposed grammatical errors that Grammarly says will ruin your prose, most of which are not errors and have nothing to do with grammar. Their suggested fixes, on the other hand, sometimes introduce grammatical errors and always worsen the writing. The takeaway from all of this is not, as Grammarly says, that love conquers all, but rather that Grammarly doesn’t know the first thing about grammar, let alone good writing.

Addendum: I decided to stop giving Grammarly such a bad time and help them out by editing their infographic pro bono.


Why Is It “Woe Is Me”?

I recently received an email asking about the expression woe is me, namely what the plural would be and why it’s not woe am I. Though the phrase may strike modern speakers as bizarre if not downright ungrammatical, there’s actually a fairly straightforward explanation: it’s an archaic dative expression. Strange as it may seem, the correct form really is woe is me, not woe am I or woe is I, and the first-person plural would simply be woe is us. I’ll explain why.

Today English only has three cases—nominative (or subjective), objective, and genitive (or possessive)—and these cases only apply to personal pronouns and who. Old English, on the other hand, had four cases (and vestiges of a fifth), and they applied to all nouns, pronouns, and adjectives. Among these four were two different cases for objects: accusative and dative. (The forms that we now think of simply as object pronouns actually descend from the dative pronouns, though they now cover the functions of both the accusative and dative.) These correspond roughly to direct and indirect objects, respectively, though they could be used in other ways too.

For instance, some prepositions took accusative objects, and some took dative objects (and some took either depending on the meaning). Nouns and pronouns in the accusative and dative cases could also be used in ways that seem strange to modern speakers. The dative, for example, could be used in places where we would normally use to and a pronoun. In some constructions we still have the choice between a bare pronoun and to plus a pronoun—think of how you can say either I gave her the ball or I gave the ball to her—but in Old English you could do this to a much greater degree.

In the phrase woe is me, woe is the subject and me is a dative object, something that isn’t allowed in English today. It really means woe is to me. Today the phrase woe is me is pretty fixed, but some past variations on the phrase make the meaning a little clearer. Sometimes it was used with a verb, and sometimes woe was simply followed by a noun or prepositional phrase. In the King James Bible, we find “If I be wicked, woe unto me” (Job 10:15). One example from Old English reads, “Wa biþ þonne þæm mannum” (woe be then [to] those men).

So “woe is me” is not simply a fancy or archaic way of saying “I am woe” and is thus not parallel to constructions like “it is I”, where the nominative form is usually prescribed and the objective form is proscribed. In “woe is me”, “me” is not a subject complement (also known as a predicative complement) but a type of dative construction.

Thus the singular is is always correct, because it agrees with the singular mass noun woe. And though we don’t have distinct dative pronouns anymore, you can still use any pronoun in the object case, so woe is us would also be correct.

Addendum: Arika Okrent, writing at Mental Floss, has also just posted a piece on this construction. She goes into a little more detail on related constructions in English, German, and Yiddish.

And here are a couple of articles by Jan Freeman from 2007, specifically addressing Patricia O’Conner’s Woe Is I and a column by William Safire on the phrase:

Woe Is Us, Part 1
Woe Is Us, Continued

