Arrant Pedantry


The Data Is In, pt. 2

In the last post, I said that the debate over whether data is singular or plural is ultimately a question of how we know whether a word is singular or plural, or, more accurately, whether it is count or mass. To determine whether data is a count or a mass noun, we’ll need to answer a few questions. First—and this one may seem so obvious as to not need stating—does it have both singular and plural forms? Second, does it occur with cardinal numbers? Third, what kinds of grammatical agreement does it trigger?

Most attempts to settle the debate point to the etymology of the word, but this is an unreliable guide. Some words begin life as plurals but become reanalyzed as singulars or vice versa. For example, truce, bodice, and to some extent dice and pence were originally plural forms that have been made into singulars. As some of the posts I linked to last time pointed out, agenda was also a Latin plural, much like data, but it’s almost universally treated as a singular now, along with insignia, opera, and many others. On the flip side, cherries and peas were originally singular forms that were reanalyzed as plurals, giving rise to the new singular forms cherry and pea.

So obviously etymology alone cannot tell us what a word should mean or how it should work today, but then again, any attempt to say what a word ought to mean ultimately rests on one logical fallacy or another, because you can’t logically derive an ought from an is. Nevertheless, if you want to determine how a word really works, you need to look at real usage. Present usage matters most, but historical usage can also shed light on such problems.

Unfortunately for the “data is plural” crowd, both present and historical usage are far more complicated than most people realize. The earliest citation in the OED for either data or datum is from 1630, but it’s just a one-word quote, “Data.” The next citation is from 1645 for the plural count noun “datas” (!), followed by the more familiar “data” in 1646. The singular mass noun appeared in 1702, and the singular count noun “datum” didn’t appear until 1737, roughly a century later. Of course, you always have to take such dates with a grain of salt, because any of them could be antedated, but it’s clear that even from the beginning, data’s grammatical number was in doubt. Some writers used it as a plural, some used it as a singular with the plural form “datas”, and apparently no one used its purported singular form “datum” for another hundred years.

It appears that historical English usage doesn’t help much in settling the matter, though it does make a few things clear. First, there has been considerable variation in the perceived number of data (mass, singular count, or plural count) for over 350 years. Second, the purported singular form, datum, was apparently absent from English for almost a hundred years and continues to be relatively rare today. In fact, in Mark Davies’ Corpus of Contemporary American English (COCA), “data point” slightly outnumbers “datum”, and most of the occurrences of “datum” are not the traditional singular form of data but other specialized uses. This is the first strike against data as a plural; count nouns are supposed to have singular forms, though there are a handful of words known as pluralia tantum, which occur only in the plural. I’ll get to that later.

So data doesn’t really seem to have a singular form. At least you can still count data, right? Well, apparently not. Nearly all of the hits in COCA for “[mc*] data” (meaning a cardinal number followed by the word data) are for things like “two data sets” or “74 data points”. It seems that no one who uses data as a plural count noun ever bothers to count their data, or when they do, they revert to using “data” as a mass noun to modify a normal count noun like “points”. Strike two, and this is a big one. The Cambridge Grammar of the English Language gives use with cardinal numbers as the primary test of countability.

Data does better when it comes to grammatical agreement, though this is not as positive as it may seem. It’s easy enough to find constructions like as these few data show, but it’s just as easy to find constructions like there is very little data. And when the word fails the first two tests, the results here seem suspect. Aren’t people simply forcing the word data to behave like a plural count noun? As this wonderfully thorough post by Norman Gray points out (seriously, read the whole thing), “People who scrupulously write ‘data’ as a plural are frequently confused when it comes to more complicated sentences”, writing things like “What is HEP data? The data themselves…”. The urge to treat data as a singular mass noun—because that’s how it behaves—is so strong that it takes real effort to make it seem otherwise.

It seems that if data really is a plural noun, it’s a rather defective one. As I mentioned earlier, it’s possible that it’s some sort of plurale tantum, but even this conclusion is unsatisfying.
Many pluralia tantum in English are words that refer to things made of two halves, like scissors or tweezers, but there are others like news or clothes. You can’t talk about one new or one clothe (though clothes was originally the plural of cloth). You also usually can’t talk about numbers of such things without using an additional counting word or paraphrasing. Thus we have news items or articles of clothing.

Similarly, you can talk about data points or points of data, but at best this undermines the idea that data is an ordinary plural count noun. But language is full of exceptions, right? Maybe data is just especially exceptional. After all, as Robert Lane Greene said in this post, “We have a strong urge to just have language behave, but regular readers of this column know that, as the original Johnson knew, it just won’t.”

I must disagree. The only thing that makes data exceptional is that people have gone to such great lengths to try to get it to act like a plural, but it just isn’t working. Its irregularity is entirely artificial, and there’s no purpose for it except a misguided loyalty to the word’s Latin roots. I say it’s time to stop the act and just let the word behave—as a mass noun.


No Dice

If you’ve ever had to learn a foreign language, you may have struggled to memorize plural forms of nouns. German, for example, has about half a dozen ways of forming plurals, and it’s a chore to remember which kind of plural each noun takes. English, by comparison, is ridiculously easy. Here’s how it works for nearly every English noun: add -s to the end. Sometimes you need to insert an e before the s, and sometimes you need to change a preceding y to ie, but that’s the rule in a nutshell.

Of course, there are still plenty of exceptions: a couple that end in -en (oxen and the strange double plural children), a handful of umlaut plurals (man–men, foot–feet, mouse–mice, etc.), some uninflected plurals (usually for domesticated or game animals, such as sheep, deer, and so on), and a plethora of foreign borrowings (particularly from Latin and Greek) that often follow rules from their donor languages but occasionally don’t. There are a few other oddballs—like person–people, for example—but nearly every English count noun fits into one of these categories.

But there’s one plural that doesn’t fit into any of these categories, because it’s been caught for centuries in a strange limbo between count nouns, which take plural forms, and mass nouns, which don’t. It’s dice. If you need a refresher, mass nouns generally refer to things that are not discrete, such as milk or oil, though some refer to things that are made of discrete pieces “whose individual identities are not usually important to us,” as Arnold Zwicky put it in this Language Log post: words like corn or rice. You could count the individual grains or kernels if you wanted to, but why would you ever want to?

And this is how dice slipped through the cracks of language change. Originally, die was a regular noun that formed its plural by adding an s sound to the end. (For the moment, let’s leave aside the issue of spelling, because Middle and Early Modern English spelling was anything but standard.) At some point in the history of English, the final -s in plurals was voiceless, meaning that it was always pronounced with an s sound, not a z sound. But then that changed, probably sometime in the 1500s, so that the final -s was always voiced—that is, pronounced as a z—unless it followed a voiceless sound. Strangely, this sound change seems to have affected only the plural and possessive -s endings and not other word-final s’s.

But around that time, we start seeing the plural of die, when referring to those little cubes with pips used for games and whatnot, spelled as dice (and similar forms). In Modern English spelling, the final -s on a plural can be either voiced or voiceless, depending on the preceding sound, but -ce is always voiceless. As the regular plural ending was becoming voiced for many, many words, it remained voiceless in dice. Why?

Well, apparently because people had stopped thinking of it as a plural and started thinking of it as a mass noun, much like corn and rice, so they stopped seeing the s sound on the end as the plural marker and started perceiving it as simply part of the word. Singular dice can be found as far back as the late 1300s, and when the sound change came along in the 1500s and voiced most plural -s endings, dice was left behind, with its spelling altered to show that it was unequivocally voiceless. In other senses of the word, die was still thought of as a regular count noun, so its plural forms ended up as dies.*

Dice wasn’t the only word passed over in this way, though; truce (originally the plural of true, meaning “pledge” or “oath”), bodice (plural of body), and pence (a contracted plural form of penny) come to us the same way. Speakers subconsciously reanalyzed these words as mass nouns or singular count nouns, so their final s sounds stayed voiceless. Similarly, once, twice, and thrice were originally genitive forms, but they ceased to be thought of as such and consequently retained their voiceless sounds, respelled with ce.

But the strange thing is that whereas the words mentioned above made the transition to mass nouns or new singular count nouns, usage of dice has been split for centuries. We’ve never fully made the switch to thinking of dice as a mass noun, used regardless of the actual number of the things, because, unlike rice or corn, we do frequently care about the number of dice being used. Instead of a true mass noun, it’s become an uninflected count noun—one dice, two dice—for many people, though it exists alongside the original singular die. But singular dice is rare in print, because we’re told that it’s properly one die, two dice, even though some dictionaries note that singular dice is much more frequent in gaming than die.

So where does that leave us? You can go with singular die and possibly be thought of as something of a pedant, or you can go with singular dice and possibly be thought of as a little ignorant. As for me, I usually use singular die and feel twinges of self-loathing when I do so; I haven’t had the heart to correct my boys when they use singular dice.

*For more on the reconstruction of the plural ending in English, see the section on the English plural suffix in the chapter “Reconstruction” in Language History: An Introduction, by Andrew L. Sihler (Philadelphia: John Benjamins, 2000).


Less and Fewer

I know this topic has been addressed in detail elsewhere (see goofy’s post here for example), but a friend recently asked me about it, so I thought I’d take a crack at it. It’s fairly straightforward: there are the complex, implicit rules that people have been following for over a thousand years, and then there are the simple, explicit, artificial rules that some people have been trying to inflict on everyone else for the last couple of centuries.

The explicit rule is this: use fewer for count nouns (things that can be numbered), and use less for mass nouns (things that are typically measured). So you’d say fewer eggs but less milk, fewer books but less information. Units of time, money, distance, and so on are usually treated as mass nouns (so you’d say less than ten years old, not fewer than ten years old). One handy (but overly simplistic) way to tell mass nouns and count nouns apart (save for the exception I just noted) is this: if you can make it plural and use a numeral in front of it (five eggs), then it’s a count noun and it takes fewer.

The only problem with this rule is that it was invented by Robert Baker in 1770, and it contradicts historical and present-day usage. In actual practice, fewer has always been restricted to count nouns, but less is often used with count nouns, too, especially in certain constructions like twenty-five words or less, no less than one hundred people, and one less problem to worry about. It used to be that people used less when it sounded natural and nobody worried about it, but then some guy in the eighteenth century got the bright idea that we should always use one word for count nouns and one word for mass nouns, and people have been freaking out about it ever since.

Baker’s rule is appealing because it’s simple and (in my opinion) because it allows people to judge others who don’t know grammar. It makes a certain kind of sense to use one word for one thing and another word for another thing, but the fact is that language is seldom so neat and tidy. Real language is full of complexities and exceptions to rules, and the amazing thing is that we learn all of these rules naturally just by listening to and talking with other people. Breaking Baker’s rule is not a sign of lazy thinking or sloppy grammar or anything else negative—it’s just a sign that you’re a native speaker.

The fact that not everybody follows the simple, explicit rule, nearly 240 years after it was created, shows you just how hard it is to get people to change their linguistic habits. Is there any advantage to following the made-up rule? Probably not, aside from avoiding stigma from people who like to look down their noses at those who they deem to have poor grammar. So if you want to please the fussy grammarian types, be sure to follow Baker’s made-up rule. If you don’t care about those types, use whatever comes naturally to you.
