Arrant Pedantry


Solstices, Vegetables, and Official Definitions

Summer officially began just a few days ago—at least that’s what the calendar says. June 20 was the summer solstice, the day when the northern hemisphere is most inclined towards the sun and consequently receives the most daylight. By this definition, summer lasts until the autumnal equinox, in late September, when days and nights are of equal length. But by other definitions, summer starts at the beginning of June and goes through August. Other less formal definitions may put the start of summer on Memorial Day or after the end of the school year (which for my children coincided this year).

For years I wondered why summer officially began so late into June. After all, shouldn’t the solstice, as the day when we receive the most sunlight, be the middle of summer rather than the start? But even though it receives the most sunlight, it’s not the hottest, thanks to something called seasonal lag. The oceans absorb a large amount of heat and continue to release that heat for quite some time after the solstice, so the hottest day may come a month or more after the day that receives the most solar energy. Summer officially starts later than it should to compensate for this lag.

But what does this have to do with language? It’s all about definitions, and definitions are arbitrary things. Laypeople may think of June 1 as the start of summer, but June 1 is a day of absolutely no meteorological or astronomical significance. So someone decided that the solstice would be the official start of summer, even though the period from June 20/21 to September 22/23 doesn’t completely encompass the hottest days of the year (at least not in most of the United States).

Sometimes the clash between common and scientific definitions engenders endless debate. Take the well-known argument about whether tomatoes are fruit. By the common culinary definition, tomatoes are vegetables, because they are used mostly in savory or salty dishes. Botanically, though, they’re fruit, because they’re formed from a plant’s ovaries and contain seeds. But tomatoes aren’t the only culinary vegetables that are botanical fruits: cucumbers, squashes, peas, beans, avocados, eggplants, and many other things commonly thought of as vegetables are actually fruits.

The question of whether a tomato is a fruit or a vegetable may have entered popular mythology following a Supreme Court case in 1893 that answered the question of whether imported tomatoes should be taxed as vegetables. The Supreme Court ruled that the law was written with the common definition in mind, so tomatoes got taxed, and people are still arguing about it over a century later.

Sometimes these definitional clashes even lead to strong emotions. Consider how many people got upset when the International Astronomical Union decided that Pluto wasn’t really a planet. People who probably hadn’t thought about planetary astronomy since elementary school passionately proclaimed that Pluto was always their favorite planet. Even some astronomers declared, “Pluto’s dead.” But nothing actually happened to Pluto, just to our definition of planet. Astronomers had discovered several other Pluto-like objects and suspected that there might be a hundred or more such objects in the outer reaches of the solar system.

Does it really make sense to call all of these objects planets? Should we expect students to learn the names of Eris, Sedna, Quaoar, Orcus, and whatever other bodies are discovered and named? Or is it perhaps more reasonable to use some agreed-upon criteria and draw a clear line between planets and other objects? After all, that’s part of what scientists do: try to increase our understanding of the natural world by describing features of and discovering relationships among different things. Sometimes the definitions are arbitrary, but they’re arbitrary in ways that are useful to scientists.

And this is the crux of the matter: sometimes definitions that are useful to scientists aren’t that useful to laypeople, just as common definitions aren’t always useful to scientists. These definitions are used by different people for different purposes, and so they continue to exist side by side. Scientific definitions have their place, but they’re not automatically or inherently more correct than common definitions. And there’s nothing wrong with this. After all, tomatoes may be fruit, but I don’t want them in my fruit salad.


Completion Successful

The other day I added some funds to my student card and saw a familiar message: “Your Deposit Completed Successfully!” I’ve seen the similar message “Completion successful” on gas pumps after I finish pumping gas. These messages seem perfectly ordinary at first glance, but the more I thought about them, the more I realized how odd they are. Though they’re intended as concise messages to let me know that everything worked the way it was supposed to, I had to wonder what it meant for a completion to be successful.

The first question is, what is it that’s being completed? Obviously it must be the transaction. But rather than describing the transaction itself as successful or unsuccessful, the message describes the act of completing it. So is it possible to separate the notions of completion and success? In my mind, the fact that the transaction is complete means that it was successful, and vice versa. An incomplete transaction would be unsuccessful. After all, if the transaction were incomplete or unsuccessful, the machine certainly wouldn’t give me a message like “Completion unsuccessful” or, worse yet, “Incompletion successful”.

So saying that the completion is successful is really just another way of saying that the transaction is complete. But as a consumer, I don’t really care that the abstract act of completing the transaction is successful—I just care that the transaction is complete. The message takes what I care about (the completion), nominalizes it, and reports on the status of the nominalization instead.

What I can’t figure out is why the messages would frame the status of the transaction in such an odd way. Perhaps it’s a case of what Geoffrey Pullum calls nerdview, which is when experts frame public language in a way that makes sense to them but that seems odd or nonsensical to laypeople. Perhaps from the perspective of the company processing the credit card transaction, there’s a difference between the completeness of the transaction and its success. I don’t know—I don’t work at a bank, and I don’t care enough to research how credit card transactions are processed. I just want to put some money on my student card so I can buy some donuts from the vending machine.


Hanged and Hung

The distinction between hanged and hung is one of the odder ones in the language. I remember learning in high school that people are hanged, pictures are hung. There was never any explanation of why it was so; it simply was. It was years before I learned the strange and complicated history of these two words.

English has a few pairs of related verbs that are differentiated by their transitivity: lay/lie, rise/raise, and sit/set. Transitive verbs take objects; intransitive ones don’t. In each of these pairs, the intransitive verb is strong, and the transitive verb is weak. Strong verbs inflect for the preterite (simple past) and past participle forms by means of a vowel change, such as sing–sang–sung. Weak verbs add the -(e)d suffix (or sometimes just a -t or nothing at all if the word already ends in -t). So lie–lay–lain is a strong verb, and lay–laid–laid is weak. Note that the subject of one of the intransitive verbs becomes the object when you use its transitive counterpart: The book lay on the floor, but I laid the book on the floor.

Historically hang belonged with these pairs, and it ended up in its current state through the accidents of sound change and history. It was originally two separate verbs (the Oxford English Dictionary actually says it was three—two Old English verbs and one Old Norse verb—but I don’t want to go down that rabbit hole) that came to be pronounced identically in their present-tense forms. They still retained their own preterite and past participle forms, though, so at one point in Early Modern English hang–hung–hung existed alongside hang–hanged–hanged.

Once the two verbs started to collapse together, the distinction started to become lost too. Just look at how much trouble we have keeping lay and lie separate, and they overlap in only one form: lay is both the present tense of lay and the past tense of lie. With identical present tenses, hang/hang began to look like any other word with a choice between strong and weak past forms, like dived/dove or sneaked/snuck. The transitive/intransitive distinction between the two effectively disappeared, and hung won out as the preterite and past participle form.

The weak transitive hanged didn’t completely vanish, though; it stuck around in legal writing, which tends to use a lot of archaisms. Because legal writing used it only in the sense of putting someone to death by hanging (with the poor soul as the object of the verb), it picked up the new, specialized sense that we’re now familiar with, whether or not the verb is transitive. Similarly, hung is used for everything but people, whether or not the verb is intransitive.

Interestingly, German has mostly hung on to the distinction. Though the German verbs both merged in the present tense into hängen, the past forms are still separate: hängen–hing–gehangen for intransitive forms and hängen–hängte–gehängt for transitive. Germans would say the equivalent of I hanged the picture on the wall and The picture hung on the wall—none of this nonsense about only using hanged when it’s a person hanging by the neck until dead.

The surprising thing about the distinction in English is that it’s observed (at least in edited writing) so faithfully. Usually people aren’t so good at honoring fussy semantic distinctions, but here I think the collocates do a lot of the work of selecting one word or the other. A search of COCA for collocates of both hanged and hung turns up the following words:



The collocates of hanged pretty clearly all involve hanging people, whether by suicide, as punishment for murder, or in effigy. (The collocations with burned were all about hanging and burning people or effigies.) The collocates for hung show no real pattern; it’s simply used for everything else. (The collocations with neck were not about hanging by the neck but about things being hung from or around the neck.)

So despite what I said about this being one of the odder distinctions in the language, it seems to work. (Though I’d like to know to what extent, if any, the distinction is an artifact of the copy editing process.) Hung is the general-use word; hanged is used when a few very specific and closely related contexts call for it.


The Enormity of a Usage Problem

Recently on Twitter, Mark Allen wrote, “Despite once being synonyms, ‘enormity’ and ‘enormousness’ are different. Try to keep ‘enormity’ for something evil or outrageous.” I’ll admit right off that this usage problem interests me because I didn’t learn about the distinction until a few years ago. To me, they’re completely synonymous, and the idea of using enormity to mean “an outrageous, improper, vicious, or immoral act” and not “the quality or state of being huge”, as Merriam-Webster defines it, seems almost quaint.

Of course, such usage advice presupposes that people are using the two words synonymously; if they weren’t, there’d be no reason to tell them to keep the words separate, so the assertion that they’re different is really an exhortation to make them different. Given that, I had to wonder how different they really are. I turned to Mark Davies’ Corpus of Contemporary American English to get an idea of how often enormity is used in the sense of great size rather than outrageousness or immorality. I looked at the first hundred results from the keyword-in-context option, which randomly samples the corpus, and tried to determine which of the four Merriam-Webster definitions was being used. For reference, here are the four definitions:

1 : an outrageous, improper, vicious, or immoral act <enormities of state power — Susan Sontag> <enormities too juvenile to mention — Richard Freedman>
2 : the quality or state of being immoderate, monstrous, or outrageous; especially : great wickedness <enormity of the crimes committed during the Third Reich — G. A. Craig>
3 : the quality or state of being huge : immensity <enormity of the universe>
4 : a quality of momentous importance or impact <enormity of the decision>

In some cases it was a tough call; for instance, when someone writes about the enormity of poverty in India, enormity has a negative connotation, but it doesn’t seem right to substitute a word like monstrousness or wickedness. It seems that the author simply means the size of the problem. I tried to use my best judgement based on the context the corpus provides, but in some cases I weaseled out by assigning a particular use to two definitions. Here’s my count:

1: 1
2: 19
2/3: 3
3: 67
3/4: 1
4: 9

By far the most common use is in the sense of “enormousness”; the supposedly correct senses of great wickedness (definitions 1 and 2) are used just under a quarter of the time. So why did Mr. Allen say that enormity and enormousness were once synonyms? Even the Oxford English Dictionary marks the “enormousness” sense as obsolete and says, “Recent examples might perh. be found, but the use is now regarded as incorrect.” Perhaps? It’s clear from the evidence that it’s still quite common—about three times as common as the prescribed “monstrous wickedness” sense.
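For anyone who wants to check the arithmetic, here’s a minimal Python sketch that recomputes the shares from the tally above. The names COUNTS and sense_summary are purely illustrative, not anything from COCA or Merriam-Webster. It gives a low and a high estimate depending on whether the ambiguous 2/3 hits are counted as wickedness; the single 3/4 hit is left out of both sides.

```python
# Tally of the 100 sampled COCA hits for "enormity", keyed by
# Merriam-Webster sense number; "2/3" and "3/4" are the hits that
# couldn't be assigned to a single sense. (Illustrative names only.)
COUNTS = {"1": 1, "2": 19, "2/3": 3, "3": 67, "3/4": 1, "4": 9}


def sense_summary(counts):
    """Compare the 'wickedness' senses (1 and 2) with the 'size' sense (3).

    The low wickedness estimate ignores the ambiguous 2/3 hits;
    the high estimate counts them toward wickedness.
    """
    total = sum(counts.values())
    wicked_low = counts["1"] + counts["2"]
    wicked_high = wicked_low + counts["2/3"]
    size = counts["3"]
    return {
        "total": total,
        "wicked_share_low": wicked_low / total,
        "wicked_share_high": wicked_high / total,
        # How many times more common the size sense is than the wickedness senses
        "size_ratio_low": size / wicked_high,
        "size_ratio_high": size / wicked_low,
    }


if __name__ == "__main__":
    for name, value in sense_summary(COUNTS).items():
        print(f"{name}: {value:.2f}")
```

Either way you slice it, the wickedness senses account for 20 to 23 percent of the sample, and the size sense comes out at roughly three times as common.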

It’s true that the sense of immoderateness or wickedness came along before the sense of great size. The first uses as recorded in the OED are in the sense of “a breach of law or morality” (1477), “deviation from moral or legal rectitude” (1480), “something that is abnormal” (a1513), and “divergence from a normal standard or type” (a1538). The sense of “excess in magnitude”—the one that the OED marks as obsolete and incorrect—didn’t come along until 1792. In all these senses the etymology is clear: the word comes from enorm, meaning “out of the norm”.

As is to be expected, Merriam-Webster’s Dictionary of English Usage has an excellent entry on the topic. It notes that many of the uses of enormity considered objectionable carry shades of meaning or connotations not shown by enormousness:

Quite often enormity will be used to suggest a size that is beyond normal bounds, a size that is unexpectedly great. Hence the notion of monstrousness may creep in, but without the notion of wickedness. . . .

In many instances the notion of great size is colored by aspects of the first sense of enormity as defined in Webster’s Second. One common figurative use blends together notions of immoderateness, excess, and monstrousness to suggest a size that is daunting or overwhelming.

Indeed, it’s the blending of senses that made it hard to categorize some of the uses that I came across in COCA. Enormousness does not seem to be a fitting replacement for those blended or intermediate senses, and, as MWDEU notes, it’s never been a popular word anyway. Interestingly, MWDEU also notes that “the reasons for stigmatizing the size sense of enormity are not known.” Perhaps it became rare in the 1800s, when the OED marked it obsolete, and the rule was created before the sense enjoyed a resurgence in the twentieth century. Whatever the reason, I don’t think it makes much sense to condemn the more widely used sense of a word just because it’s newer or was rare at some point in the past. MWDEU sensibly concludes, “We have seen that there is no clear basis for the ‘rule’ at all. We suggest that you follow the writers rather than the critics: writers use enormity with a richness and subtlety that the critics have failed to take account of. The stigmatized sense is entirely standard and has been for more than a century and a half.”


It’s All Grammar—So What?

It’s a frequent complaint among linguists that laypeople use the term grammar in such a loose and unsystematic way that it’s more or less useless. They say that it’s overly broad, encompassing many different types of rules, and that it allows people to confuse things as different as syntax and spelling. They insist that spelling, punctuation, and ideas such as style or formality are not grammar at all, that grammar is really just the rules of syntax and morphology that define the language.

Arnold Zwicky, for instance, has complained that grammar as it’s typically used refers to nothing more than a “grab-bag of linguistic peeve-triggers”. I think this is an overly negative view; yes, there are a lot of people who peeve about grammar, but I think that most people, when they talk about grammar, are thinking about how to say things well or correctly.

Some people take linguists’ insistence on the narrower, more technical meaning of grammar as a sign of hypocrisy. After all, they say, with something of a smirk, shouldn’t we just accept the usage of the majority? If almost everyone uses grammar in a broad and vague way, shouldn’t we consider that usage standard? Linguists counter that this really is an important distinction, though I think it’s fair to say that they have a personal interest here; they teach grammar in the technical sense and are dismayed when people misunderstand what they do.

I’ve complained about this myself, but I’m starting to wonder whether it’s really something to worry about. (Of course, I’m probably doubly a hypocrite, what with all the shirts I sell with the word grammar on them.) After all, we see similar splits between technical and popular terminology in a lot of other fields, and they seem to get by just fine.

Take the terms fruit and vegetable, for instance. In popular use, fruits are generally sweeter, while vegetables are more savory or bitter. And while most people have probably heard the argument that tomatoes are actually fruits, not vegetables, they might not realize that squash, eggplants, peppers, peas, green beans, nuts, and grains are fruits too, at least by the botanical definition. And vegetable doesn’t even have a botanical definition—it’s just any part of a plant (other than fruits or seeds) that’s edible. It’s not a natural class at all.

In a bit of editorializing, the Oxford English Dictionary adds this note after its first definition of grammar:

As above defined, grammar is a body of statements of fact—a ‘science’; but a large portion of it may be viewed as consisting of rules for practice, and so as forming an ‘art’. The old-fashioned definition of grammar as ‘the art of speaking and writing a language correctly’ is from the modern point of view in one respect too narrow, because it applies only to a portion of this branch of study; in another respect, it is too wide, and was so even from the older point of view, because many questions of ‘correctness’ in language were recognized as outside the province of grammar: e.g. the use of a word in a wrong sense, or a bad pronunciation or spelling, would not have been called a grammatical mistake. At the same time, it was and is customary, on grounds of convenience, for books professedly treating of grammar to include more or less information on points not strictly belonging to the subject.

There are a few points here to consider. The definition of grammar has not been solely limited to syntax and morphology for many years. Once it started branching out into notions of correctness, it made sense to treat grammar, usage, spelling, and pronunciation together. From there it’s a short leap to calling the whole collection grammar, since there isn’t really another handy label. And since few people are taught much in the way of syntax and morphology unless they’re majoring in linguistics, it’s really no surprise that the loose sense of grammar predominates. I’ll admit, however, that it’s still a little exasperating to see lists of grammar rules that everyone gets wrong when the rules in question turn out to be just spelling rules or, at best, misused words.

The root of the problem is that laypeople use words in ways that are useful and meaningful to them, and these ways don’t always jibe with scientific facts. It’s the same thing with grammar; laypeople use it to refer to language rules in general, especially the ones they’re most conscious of, which tend to be the ones that are the most highly regulated—usage, spelling, and style. Again, issues of syntax, morphology, semantics, usage, spelling, and style don’t constitute a natural class, but it’s handy to have a word that refers to the aspects of language that most people are conscious of and concerned with.

I think there still is a problem, though, and it’s that most people generally have a pretty poor understanding of things like syntax, morphology, and semantics. Grammar isn’t taught much in schools anymore, so many people graduate from high school and even college without much of an understanding of grammar beyond spelling and mechanics. I got out of high school without knowing anything more advanced than prepositional phrases. My first grammar class in college was a bit of a shock, because I’d never even learned about things like the passive voice or dependent clauses before that point, so I have some sympathy for those people who think that grammar is mostly just spelling and punctuation with a few minor points of usage or syntax thrown in.

So what’s the solution? Well, maybe I’m just biased, but I think it’s to teach more grammar. I know this is easier said than done, but I think it’s important for people to have an understanding of how language works. A lot of people are naturally interested in or curious about language, and I think we do those students a disservice if all we teach them is never to use infer for imply and to avoid the passive voice. Grammar isn’t just a set of rules telling you what not to do; it’s also a fascinatingly complex and mostly subconscious system that governs the singular human gift of language. Maybe we just need to accept the broader sense of grammar and start teaching people all of what it is.

Addendum: I just came across a blog post criticizing the word funner as bad grammar, and my first reaction was “That’s not grammar!” It’s always easier to preach than to practice, but my reaction has me reconsidering my laissez-faire attitude. While it seems handy to have a catch-all term for language errors, regardless of what type they are, it also seems handy—probably more so—to distinguish between violations of the regulative rules and constitutive rules of language. But this leaves us right where we started.


Relative What

A few months ago Braden asked in a comment about the history of what as a relative pronoun. (For my previous posts on relative pronouns, see here.) The history of relative pronouns in English is rather complicated, and the system as a whole is still in flux, partly because modern English essentially has two overlapping systems of relativization.

In Old English, there were a few different ways to create a relative pronoun, as this site explains. One way was to use the indeclinable particle þe, another was to use a form of the demonstrative pronoun (roughly equivalent to modern English that/those), and another was to use a demonstrative or personal pronoun followed by þe. Our modern relative that grew out of the use of demonstrative pronouns, though unlike the Old English demonstratives, it does not decline for gender, number, and case.

In the late Old English and Middle English periods, writers and speakers began to use interrogative pronouns as relative pronouns by analogy with French and Latin. This use first appeared in texts that were translations from Latin around 1000 AD, but within a couple of centuries it had apparently been naturalized. A number of interrogatives were pressed into service as relatives during this time, including what, who, which, where, when, why, and how. All of these are still used as simple relatives in Standard English except for what.

It’s important to note that the word what is still used as a nominal relative, which means that it does not modify another noun phrase but stands in for a noun phrase and a relative simultaneously, as in We fear what we don’t understand. This could be rephrased as We fear that which we don’t understand or We fear the things that we don’t understand, revealing the nominal and the relative.

But while all the other interrogatives have continued as relatives in Standard English, what as a simple relative pronoun is nonstandard today. Simple relative what is found in the works of Shakespeare and the King James Bible, but at some point in the last three or four centuries it fell out of use in the standard dialect. Unfortunately, I’m not really sure when this happened; the Oxford English Dictionary has citations up through 1740 and then one from 1920 that appears to be dialogue from a novel. Merriam-Webster’s Dictionary of English Usage says that in the US, it’s mainly found in rural areas in the Midland and South. As I told Braden in a response to his comment, I’ve heard it used myself. A couple of months ago I heard a man in church pray for “our leaders what guides and directs us”—not just a beautiful example of relative what, but also an interesting example of nonstandard verb agreement.

As for why simple relative what died out in Standard English, I really have no idea. Jonathan Hope noted that it’s rather unusual for Standard English to allow other interrogatives as relatives but not this one.[1] In some ways, relative what would make more sense than relative which, since what is historically part of the same paradigm as who; what comes from the neuter form of the interrogative or indefinite pronoun in Old English, while who comes from the combined masculine/feminine form, as shown here. And as I said in this post, whose was originally the genitive form for both who and what, so allowing simple relative what would make for a rather tidy paradigm.

Perhaps that’s the problem. Hope and others have argued that standardized languages—or perhaps speakers of standardized languages—tend to resist tidy paradigms. Irregularities creep in and are preserved, and they can be surprisingly resistant to change. Maybe someone reading this has a fuller explanation of just how this particular little wrinkle came to be.

[1] Jonathan Hope, “Rats, Bats, Sparrows and Dogs: Biology, Linguistics and the Nature of Standard English,” in The Development of Standard English, 1300–1800, ed. Laura Wright (Cambridge: Cambridge University Press, 2000).


The Data Is In, pt. 2

In the last post, I said that the debate over whether data is singular or plural is ultimately a question of how we know whether a word is singular or plural, or, more accurately, whether it is count or mass. To determine whether data is a count or a mass noun, we’ll need to answer a few questions. First—and this one may seem so obvious as to not need stating—does it have both singular and plural forms? Second, does it occur with cardinal numbers? Third, what kinds of grammatical agreement does it trigger?

Most attempts to settle the debate point to the etymology of the word, but this is an unreliable guide. Some words begin life as plurals but become reanalyzed as singulars or vice versa. For example, truce, bodice, and to some extent dice and pence were originally plural forms that have been made into singulars. As some of the posts I linked to last time pointed out, agenda was also a Latin plural, much like data, but it’s almost universally treated as a singular now, along with insignia, opera, and many others. On the flip side, the words we now spell cherries and peas come from the singular forms cherise and pease, which were reanalyzed as plurals, giving rise to the new singular forms cherry and pea.

So obviously etymology alone cannot tell us what a word should mean or how it should work today, but then again, any attempt to say what a word ought to mean ultimately rests on one logical fallacy or another, because you can’t logically derive an ought from an is. Nevertheless, if you want to determine how a word really works, you need to look at real usage. Present usage matters most, but historical usage can also shed light on such problems.

Unfortunately for the “data is plural” crowd, both present and historical usage are far more complicated than most people realize. The earliest citation in the OED for either data or datum is from 1630, but it’s just a one-word quote, “Data.” The next citation is from 1645 for the plural count noun “datas” (!), followed by the more familiar “data” in 1646. The singular mass noun appeared in 1702, and the singular count noun “datum” didn’t appear until 1737, roughly a century after the earliest citations. Of course, you always have to take such dates with a grain of salt, because any of them could be antedated, but it’s clear that even from the beginning, data’s grammatical number was in doubt. Some writers used it as a plural, some used it as a singular with the plural form “datas”, and apparently no one used its purported singular form “datum” for another hundred years.

It appears that historical English usage doesn’t help much in settling the matter, though it does make a few things clear. First, there has been considerable variation in the perceived number of data (mass, singular count, or plural count) for over 350 years. Second, the purported singular form, datum, was apparently absent from English for almost a hundred years and continues to be relatively rare today. In fact, in Mark Davies’ COCA, “data point” slightly outnumbers “datum”, and most of the occurrences of “datum” are not the traditional singular form of data but other specialized uses. This is the first strike against data as a plural; count nouns are supposed to have singular forms, though there are a handful of words known as pluralia tantum, which occur only in the plural. I’ll get to that later.

So data doesn’t really seem to have a singular form. At least you can still count data, right? Well, apparently not. Nearly all of the hits in COCA for “[mc*] data” (meaning a cardinal number followed by the word data) are for things like “two data sets” or “74 data points”. It seems that no one who uses data as a plural count noun ever bothers to count their data, or when they do, they revert to using “data” as a mass noun to modify a normal count noun like “points”. Strike two, and this is a big one. The Cambridge Grammar of the English Language gives use with cardinal numbers as the primary test of countability.

Data does better when it comes to grammatical agreement, though this is not as positive as it may seem. It’s easy enough to find constructions like as these few data show, but it’s just as easy to find constructions like there is very little data. And when the word fails the first two tests, the results here seem suspect. Aren’t people simply forcing the word data to behave like a plural count noun? As this wonderfully thorough post by Norman Gray points out (seriously, read the whole thing), “People who scrupulously write ‘data’ as a plural are frequently confused when it comes to more complicated sentences”, writing things like “What is HEP data? The data themselves…”. The urge to treat data as a singular mass noun—because that’s how it behaves—is so strong that it takes real effort to make it seem otherwise.

It seems that if data really is a plural noun, it’s a rather defective one. As I mentioned earlier, it’s possible that it’s some sort of plurale tantum, but even this conclusion is unsatisfying.

Many pluralia tantum in English are words that refer to things made of two halves, like scissors or tweezers, but there are others like news or clothes. You can’t talk about one new or one clothe (though clothes was originally the plural of cloth). You also usually can’t talk about numbers of such things without using an additional counting word or paraphrasing. Thus we have news items or articles of clothing.

Similarly, you can talk about data points or points of data, but at best this undermines the idea that data is an ordinary plural count noun. But language is full of exceptions, right? Maybe data is just especially exceptional. After all, as Robert Lane Greene said in this post, “We have a strong urge to just have language behave, but regular readers of this column know that, as the original Johnson knew, it just won’t.”

I must disagree. The only thing that makes data exceptional is that people have gone to such great lengths to try to get it to act like a plural, but it just isn’t working. Its irregularity is entirely artificial, and there’s no purpose for it except a misguided loyalty to the word’s Latin roots. I say it’s time to stop the act and just let the word behave—as a mass noun.


The Data Is In, pt. 1

Lately there has been a spate of blog posts on the question of whether data is a singular or a plural noun. Surprisingly, most of them come down on the side of saying that it can be singular—except when it’s plural. Although saying that it can be singular is refreshingly open-minded, I’ve still got a few problems with the facts and reasoning that led them to that conclusion, as well as the wishy-washiness of saying that it’s singular except when it isn’t.

The first post, “Is Data Is, or Is Data Ain’t, a Plural?”, came from the Wall Street Journal, and it took what Robert Lane Greene of the Economist blog Johnson called “an unusually fence-sitting position”: although they say that they “hereby join the majority” by accepting it as either singular or plural, they predict that “the plural will continue to dominate in our prose”. And they give this head-scratching reasoning:

Singular verbs now are often used to refer to collections of information: Little data is available to support the conclusions.

Otherwise, generally continue to use the plural: Data are still being collected.

Isn’t all data—whether you think of it as a count or a mass noun—“collections of information”? Just because something’s in a collection doesn’t mean it’s singular. For example, if I had an extensive rock collection, you probably wouldn’t say that I had a lot of rock, though I suppose you could; you’d probably say that I have a lot of rocks. The number really depends on the way we perceive the things in the collection, not on the fact that it’s in a collection. But if that wasn’t confusing enough, they give this unreliable test of data’s number:

As a singular/plural test, try to substitute statistics for data: It doesn’t work in the first case — little statistics is available — so the singular is fails to pass muster. The substitution does work in the second case — statistics are still being collected – so the plural are passes muster. (italics added for clarity)

Doesn’t this test simply tell you that data should always be plural? In what case would the singular is ever pass muster? Either I’m missing something important about how you’re supposed to use this substitution test or it’s simply broken.

Next came this post on the Guardian’s Datablog. Sadly, it’s even more muddled than the Wall Street Journal post, and it’s depressingly light on data. It simply asserts, without examination,

Strictly-speaking, data is a plural term. Ie, if we’re following the rules of grammar, we shouldn’t write “the data is” or “the data shows” but instead “the data are” or “the data show”.

But despite further assertions that data is “strictly a plural”, the Guardian style guide says, “Data takes a singular verb”, though they correctly note that (virtually) “no one ever uses ‘agendum’ or ‘datum’”. But this doesn’t make much sense; if it’s plural, why does it take a singular verb? And if it takes a singular verb, is it really plural?

The Guardian post also linked to this National Geographic post from a few years ago, which says much the same thing but somehow manages to be even more muddled. It starts off badly by saying that “data is often used as a collective noun referring to information, statistics, and the like”. Here they mean “mass noun”, not “collective noun”. Note that the Wikipedia articles each say at the top that these terms should not be confused. But aside from this basic mistake, note how it seems to contradict the Wall Street Journal post, which says that singular verbs are used for collections of information.

I wondered if this was just a simple error in the National Geographic post; from context, I would have expected the so-called “collective” form to use a singular verb. But in the next paragraph they say that their style is to use data as a plural when “referring to a body of facts, figures, and such.”

The post gets even more confusing, pointing out some of National Geographic’s supposed errors and then saying that both the singular and plural are considered standard. If they’re both standard, then how are their examples errors? The post ends with a red herring about avoiding confusion and the bizarre statement, “I’d rather not box writers into a singular form.” So why box them into a plural form? If there’s a distinction to be made, even a subtle one, between data as a mass noun and data as a plural count noun, why not encourage it? Why whitewash over it by insisting that data always be plural?

Ultimately, though, this whole debate rests on one question: how do we know whether a word is plural or singular? And that’s what I’ll tackle next time.

Read part 2 here.


Take My Commas—Please

Most editors are probably familiar with the rule that commas should be used to set off nonrestrictive appositives and that no commas should be used around restrictive appositives. (In Chicago 16, it’s under 6.23.) A restrictive appositive specifies which of a group of possible referents you’re talking about, and it’s thus integral to the sentence. A nonrestrictive appositive simply provides extra information about the thing you’re talking about. Thus you would write “my wife, Ruth,” (because I only have one wife) but “my cousin Steve” (because I have multiple cousins, and one is named Steve). The first tells you that my wife’s name is Ruth, and the second tells you which of my cousins I’m talking about.

Most editors are probably also familiar with the claim that if you leave out the commas after a phrase like “my wife”, the implication is that you’re a polygamist. In one of my editing classes, we would take a few minutes at the start of each class to share bloopers with the rest of the class. One time my professor shared the dedication of a book, which read something like “To my wife Cindy”. Obviously the lack of a comma implies that he must be a polygamist! Isn’t that funny? Everyone had a good laugh.

Except me, that is. I was vaguely annoyed by this alleged blooper, which required a willful misreading of the dedication. There was no real ambiguity here—only an imagined one. If the author had actually meant to imply that he was a polygamist, he would have written something like “To my third wife, Cindy”, though of course he could still write this if he were a serial monogamist.

Usually I find this insistence on commas a little exasperating, but in one instance the other day, the commas were actually wrong. A proofreader had corrected a caption which read “his wife Arete” to “his wife, Arete,” which probably seemed like a safe change to make but which was wrong in this instance—the man referred to in the caption had three wives concurrently. I stetted the change, but it got me thinking about fact-checking and the extent to which it’s an editor’s job to split hairs.

This issue came up repeatedly during a project I worked on last year. It was a large book with a great deal of biographical information in it, and I frequently came across phrases like “Hans’s daughter Ingrid”. Did Hans have more than one daughter, or was she his only daughter? Should it be “Hans’s daughter, Ingrid,” or “Hans’s daughter Ingrid”? And how was I to know?

Pretty quickly I realized just how ridiculous the whole endeavor was. I had neither the time nor the resources to look up World War II–era German citizens in a genealogical database, and I wasn’t about to bombard the author with dozens of requests for him to track down the information either. Ultimately, it was all pretty irrelevant. It simply made no difference to the reader. I decided we were safe just leaving the commas out of such constructions.

And, honestly, I think it’s even safer to leave the commas out when referring to one’s spouse. Polygamy is such a rarity in our culture that it’s usually highlighted in the text, with wording such as “John and Janet, one of his three wives”. Assuming that “my wife Ruth” implies that I have more than one wife is a deliberate flouting of the cooperative principle of communication. This insistence on a narrow, prescribed meaning over the obvious, intended meaning is a problem with many prescriptive rules, but, once again, that’s a topic for another day.

Please note, however, that I’m not saying that anything goes or that you can punctuate however you want as long as the meaning’s clear. I’m only saying that in cases where it’s a safe assumption that there’s just one possible referent, or when it doesn’t really matter, the commas can seem a little fussy and superfluous.


Most Awarded

The other day a friend of mine complained about the use of the phrase “most-awarded” in a commercial for the Jeep Cherokee, which called it the “most-awarded SUV ever.” It bothered him, he said, because “they are saying lots of Cherokees get given away as awards, but that’s not what they mean.” I was surprised—I thought it was pretty clear that it meant “the SUV that has been given the most awards”—but several other people chimed in to say that they read it the other way—the SUV most given as an award. One person suggested that it was just another example of advertisers bastardizing the language, while another thought that it was an attempt to be funny by saying one thing but meaning another. And of course the question came up, “Can you correctly say that something has been ‘awarded’ if it is not the award?”

There’s absolutely nothing incorrect about it, though it is technically ambiguous. The problem is that in this instance, “awarded” is a passive construction (technically a reduced one), meaning that what is normally an object has been moved to subject position. But it’s ambiguous because “awarded” is ditransitive, which means that it can take both a direct and an indirect object. Most transitive verbs (that is, verbs that take objects) can take only one object, as in “The boy kicked the ball,” but some can take two, as in “The boy gave his friend the ball.” In both sentences, the ball is the direct object, but in the second sentence, we also have an indirect object, his friend.

The same holds for the verb award—you award something to someone (or something), like “The committee awarded him (indirect object) the Nobel Prize (direct object)” or “Car and Driver awarded the Cherokee (indirect object) SUV of the Year (direct object).” (I don’t know if they actually did.) To put the sentence in the passive voice, we can move either one of the objects to subject position, giving us either “The Cherokee was awarded SUV of the Year (by Car and Driver)” or “SUV of the Year was awarded to the Cherokee (by Car and Driver).”

The structural ambiguity comes in when you turn a sentence like this into a reduced passive, as in “most-awarded SUV.” The adjectival phrase “most-awarded” derives from the fuller passive clause “The Cherokee was awarded the most.” Structurally speaking, because award is ditransitive, this could derive from something like either “The Cherokee was awarded to people the most” or “The Cherokee was awarded the most awards.” (Ignore the awkward repetition of the latter; we’re just interested in the structure here, not in elegance.)

Put back into the active voice, this could be either “(Someone) awarded the Cherokee to the most people” or “(Someone) awarded the Cherokee the most awards.” (In either case, it’s not relevant who the subject is, especially since it’s presumably multiple someones.) In the first sentence, the Cherokee is being given as an award; in the second, it’s receiving the awards.

At first, my intuition was that there was something strange about giving a car as an award; it could be a reward or a prize, but in my mind an award is something like the Nobel Prize or an Academy Award or some sort of cash prize. But then I remembered the infamous leg lamp from A Christmas Story, which the father repeatedly describes as “a major award.” So obviously an award could be something other than a medal or a cash amount.

Corpus data wasn’t very helpful, either. COCA gives only five hits for “most awarded,” but all of them support my reading, “the SUV that has received the most awards,” by making the subject the recipient of the award, not the thing being awarded to someone. The Google Books corpus provides more hits, and though most of them still use the “has received the most awards” sense, there’s a little more variation here, with some employing the “most given as an award” sense, such as “The Nobel Prize in physics is the most awarded of all the five prize categories.”

Next I turned to Twitter to settle the argument. I wrote, “Help me settle an argument: Does ‘most-awarded SUV’ mean ‘SUV most given as an award’ or ‘SUV that has received the most awards’?” The results were not terribly helpful. Out of five responses, three voted for “most given as an award” and two voted for “has received the most awards,” though one noted that either was possible.

Honestly, I was baffled, though I think there’s something of an answer in here somewhere. In most of the examples I came across in the corpora, it’s very clear from context what the award is and who or what is receiving it. If I tell you that Schindler’s List is the most-awarded movie in history (at least it was in 1994, when one of the corpus examples was written), you know that the movie received awards, not that someone received a movie as an award. And if I tell you that the PhD is the most-awarded degree, you know that someone is receiving the degree, not that the degree is receiving an award.

But with a car, it’s more ambiguous. Cars can receive awards, and people can presumably receive cars as awards. And although I think it’s clear that the “has received the most awards” meaning is intended, a lot of people are irked by it or don’t get the intended meaning at all.

The upshot is that this underscores the importance of researching points of usage before declaring an answer. At first I was convinced that I was clearly right and everyone else was wrong. But though my intuition coincides with the intended meaning, intuition alone isn’t enough to explain what’s going on. You need real-world data for that, and sometimes you find that the answer is not as simple as you thought.
