July 30, 2012

The Data Is In, pt. 1

Lately there has been a spate of blog posts on the question of whether data is a singular or a plural noun. Surprisingly, most of them come down on the side of saying that it can be singular—except when it’s plural. Although saying that it can be singular is refreshingly open-minded, I’ve still got a few problems with the facts and reasoning that led them to that conclusion, as well as the wishy-washiness of saying that it’s singular except when it isn’t.

The first post, “Is Data Is, or Is Data Ain’t, a Plural?”, came from the Wall Street Journal, and it took what Robert Lane Greene of the Economist blog Johnson called “an unusually fence-sitting position”: although they say that they “hereby join the majority” by accepting it as either singular or plural, they predict that “the plural will continue to dominate in our prose”. And they give this head-scratching reasoning:

Singular verbs now are often used to refer to collections of information: Little data is available to support the conclusions.

Otherwise, generally continue to use the plural: Data are still being collected.

Isn’t all data, whether you think of it as a count or a mass noun, “collections of information”? Just because something’s in a collection doesn’t mean it’s singular. For example, if I had an extensive rock collection, you probably wouldn’t say that I had a lot of rock, though I suppose you could; you’d probably say that I had a lot of rocks. The number really depends on the way we perceive the things in the collection, not on the fact that they’re in a collection. And as if that weren’t confusing enough, they give this unreliable test of data’s number:

As a singular/plural test, try to substitute statistics for data: It doesn’t work in the first case — little statistics is available — so the singular is fails to pass muster. The substitution does work in the second case — statistics are still being collected – so the plural are passes muster. (italics added for clarity)

Doesn’t this test simply tell you that data should always be plural? In what case would the singular is ever pass muster? Either I’m missing something important about how you’re supposed to use this substitution test or it’s simply broken.

Next came this post on the Guardian’s Datablog. Sadly, it’s even more muddled than the Wall Street Journal post, and it’s depressingly light on data. It simply asserts, without examination:

Strictly-speaking, data is a plural term. Ie, if we’re following the rules of grammar, we shouldn’t write “the data is” or “the data shows” but instead “the data are” or “the data show”.

But despite further assertions that data is “strictly a plural”, the Guardian style guide says, “Data takes a singular verb”, though they correctly note that (virtually) “no one ever uses ‘agendum’ or ‘datum’”. But this doesn’t make much sense; if it’s plural, why does it take a singular verb? And if it takes a singular verb, is it really plural?

The Guardian post also linked to this National Geographic post from a few years ago, which says much the same thing but somehow manages to be even more muddled. It starts off badly by saying that “data is often used as a collective noun referring to information, statistics, and the like”. Here they mean “mass noun”, not “collective noun”. Note that the Wikipedia articles on mass nouns and collective nouns each say at the top that these terms should not be confused. But aside from this basic mistake, note how it seems to contradict the Wall Street Journal post, which says that singular verbs are used for collections of information.

I wondered if this was just a simple error in the National Geographic post; from context, I would have expected the so-called “collective” form to use a singular verb. But in the next paragraph they say that their style is to use data as a plural when “referring to a body of facts, figures, and such.”

The post gets even more confusing, pointing out some of National Geographic’s supposed errors and then saying that both the singular and plural are considered standard. If they’re both standard, then how are their examples errors? The post ends with a red herring about avoiding confusion and the bizarre statement, “I’d rather not box writers into a singular form.” So why box them into a plural form? If there’s a distinction to be made, even a subtle one, between data as a mass noun and data as a singular noun, why not encourage it? Why paper over it by insisting that data always be plural?

Ultimately, though, this whole debate rests on one question: how do we know whether a word is plural or singular? And that’s what I’ll tackle next time.

Read part 2 here.

Jonathon Owen
6 thoughts on “The Data Is In, pt. 1”

    I personally think of “data” as a mass noun – “the data is” but not “one data shows”, rather “some (a piece of) data shows”. This, encountered in a NOVA broadcast, just sounds totally weird to me:

    “We were expecting to receive all of the simulated data,” says Jean-Pierre Lebreton. “Unfortunately, we didn’t receive very many of those data.”

    These articles are so confusing. The Guardian article seems to say “X is grammatically or technically incorrect, but it’s ok to use X anyway”, which I don’t get at all. 

    That NG article is weird. It doesn’t explain anything about when to use the singular and when to use the plural. And they call it Webster’s Dictionary of English Usage instead of Merriam-Webster’s Dictionary of English Usage.

    The Ridger: That sounds very weird to me too. Data is pretty firmly a mass noun to me too, and the mere use of plural verbs or quantifiers doesn’t change it for me. It sounds as ungrammatical to me as “We didn’t have very many of those rice.”

    Goofy: Each article seems confusing on its own, but they’re even worse when taken as a whole, since they occasionally make similar-sounding arguments to make opposite points.

    Lebreton turns out to be French; maybe he just was told “data is plural” and therefore he can treat it as such?

    Got stuck on your rock collection for a bit. Had assumed you meant the Jimi Hendrix variety.

    […] on writing a post about the whole data conundrum at some point, but until that happens, check out parts 1 and 2 of The Data Is In at Arrant Pedantry. Personally, I prefer to consider data a collective noun […]
