The Data Is In, pt. 1

Lately there has been a spate of blog posts on the question of whether data is a singular or a plural noun. Surprisingly, most of them come down on the side of saying that it can be singular—except when it’s plural. Although saying that it can be singular is refreshingly open-minded, I’ve still got a few problems with the facts and reasoning that led them to that conclusion, as well as the wishy-washiness of saying that it’s singular except when it isn’t.

The first post, “Is Data Is, or Is Data Ain’t, a Plural?”, came from the Wall Street Journal, and it took what Robert Lane Greene of the Economist blog Johnson called “an unusually fence-sitting position“: although they say that they “hereby join the majority” by accepting it as either singular or plural, they predict that “the plural will continue to dominate in our prose”. And they give this head-scratching reasoning:

Singular verbs now are often used to refer to collections of information: Little data is available to support the conclusions.

Otherwise, generally continue to use the plural: Data are still being collected.

Isn’t all data—whether you think of it as a count or a mass noun—“collections of information”? Just because something’s in a collection doesn’t mean it’s singular. For example, if I had an extensive rock collection, you probably wouldn’t say that I had a lot of rock, though I suppose you could; you’d probably say that I have a lot of rocks. The number really depends on the way we perceive the things in the collection, not on the fact that it’s in a collection. But if that wasn’t confusing enough, they give this unreliable test of data‘s number:

As a singular/plural test, try to substitute statistics for data: It doesn’t work in the first case — little statistics is available — so the singular is fails to pass muster. The substitution does work in the second case — statistics are still being collected – so the plural are passes muster. (italics added for clarity)

Doesn’t this test simply tell you that data should always be plural? In what case would the singular is ever pass muster? Either I’m missing something important about how you’re supposed to use this substitution test or it’s simply broken.

Next came this post on the Guardian‘s Datablog. Sadly, it’s even more muddled than the Wall Street Journal post, and it’s depressingly light on data. It simply asserts, without examination,

Strictly-speaking, data is a plural term. Ie, if we’re following the rules of grammar, we shouldn’t write “the data is” or “the data shows” but instead “the data are” or “the data show”.

But despite further assertions that data is “strictly a plural”, the Guardian style guide says, “Data takes a singular verb”, though they correctly note that (virtually) “no one ever uses ‘agendum’ or ‘datum'”. But this idoesn’t make much sense; if it’s plural, why does it take a singular verb? And if it takes a singular verb, is it really plural?

The Guardian post also linked to this National Geographic post from a few years ago, which says much the same thing but somehow manages to be even more muddled. It starts off badly by saying that “data is often used as a collective noun referring to information, statistics, and the like”. Here they mean “mass noun”, not “collective noun”. Note that the Wikipedia articles each say at the top that these terms should not be confused. But aside from this basic mistake, note how it seems to contradict the Wall Street Journal post, which says that singular verbs are used for collections of information.

I wondered if this was just a simple error in the National Geographic post; from context, I would have expected the so-called “collective” form to use a singular verb. But in the next paragraph they say that their style is to use data as a plural when “referring to a body of facts, figures, and such.”

The post gets even more confusing, pointing out some of National Geographic‘s supposed errors and then saying that both the singular and plural are considered standard. If they’re both standard, then how are their examples errors? The post ends with a red herring about avoiding confusion and the bizarre statement, “I’d rather not box writers into a singular form.” So why box them into a plural form? If there’s a distinction to be made, even a subtle one, between data as a mass noun and data as a singular noun, why not encourage it? Why whitewash over it by insisting that data always be plural?

Ultimately, though, this whole debate rests on one question: how do we know whether a word is plural or singular? And that’s what I’ll tackle next time.

Read part 2 here.