Arrant Pedantry

By

Some Exciting News

I meant to have a new post up this week, but things have been a little more hectic than I’d anticipated. I do have an exciting announcement, though: I’ve been asked to take over the column “Grammar on the Edge” in the Copyediting newsletter. Every other month, I’ll cover difficult or obscure points of grammar. Watch for my first article in the December issue.

By

New Product Designer

Yes, it’s another t-shirt-related post. I’ve just launched a new product designer, where you can place designs on whatever products you want—t-shirts, hoodies, laptop sleeves, aprons, iPhone cases, and more. You can even change the colors of the designs and pick your own printing technique. Just click on the Product Designer tab on the menu above to give it a whirl.

By

In Praise of Spreadshirt

Some of you may get tired of me hawking my t-shirts. If you’re one of those, I guess you can go ahead and plug your ears for this post. But I’d like to talk a bit about Spreadshirt and why I’ve chosen them to make my shirts.

I first started designing my own t-shirts several years ago, after I discovered that I could do so on CafePress. But the first time I bought one of my own designs (as a gift for my wife), I was quite disappointed with the quality. The shirt was so thin and lightweight that it was practically gauze, and the print was literally an iron-on graphic. I’m sure I could have made the same thing myself for less money.

At some point CafePress improved its printing techniques and t-shirt quality, but I was still disappointed. The printing on one of my shirts looked so faded, even when it was brand-new, that one of my coworkers thought it was an intentionally vintage look. The last straw was a problem with my account involving some rather inept customer service.

I went with Zazzle for a little while, but then I switched to Spreadshirt after I bought a shirt from another site that was printed through Spreadshirt. The printing was crisp and durable and has resisted fading or peeling even after several years of wear. I was sold when I saw that Spreadshirt allows users to upload vector graphics, which means that complex artwork can be scaled up or down without losing image quality. It has allowed me to create designs like Battlestar Grammatica and Word Nerd without worrying about the final product looking fuzzy or faded.

If you’ve never taken a look at my store, I’d encourage you to do so now. If you like a design but want it on a different product—let’s say you want to put Battlestar Grammatica on a laptop case—you can customize it in the Spreadshirt Marketplace. And of course, you can always create your own custom t-shirts and other products. You don’t even need to set up your own storefront to do so.

I know I’m never going to get rich selling t-shirts aimed at linguists and editors through my own site. But that’s okay. It’s a fun hobby, and earning a few extra bucks here and there is always nice. Oh, and this weekend (today, August 31st, through September 1st), you can get 20 percent off everything with the coupon code NICEWEEKEND. There’s no minimum purchase requirement. If you like what you see, please consider spreading the love.

By

Relative What

A few months ago Braden asked in a comment about the history of what as a relative pronoun. (For my previous posts on relative pronouns, see here.) The history of relative pronouns in English is rather complicated, and the system as a whole is still in flux, partly because modern English essentially has two overlapping systems of relativization.

In Old English, there were a few different ways to create a relative pronoun, as this site explains. One way was to use the indeclinable particle þe, another was to use a form of the demonstrative pronoun (roughly equivalent to modern English that/those), and another was to use a demonstrative or personal pronoun followed by þe. Our modern relative that grew out of the use of demonstrative pronouns, though unlike the Old English demonstratives, that does not decline for gender, number, and case.

In the late Old English and Middle English periods, writers and speakers began to use interrogative pronouns as relative pronouns by analogy with French and Latin. Relative what first appeared in texts that were translations from Latin around 1000 AD, but within a couple of centuries it had apparently been naturalized. Other interrogatives were also pressed into service as relatives during this time, including who, which, where, when, why, and how. All of these are still in common use in Standard English except for what.

It’s important to note that what is still used as a nominal relative, which means that it does not modify another noun phrase but stands in for a noun phrase and a relative simultaneously, as in We fear what we don’t understand. This could be rephrased as We fear that which we don’t understand or We fear the things that we don’t understand, revealing the nominal and the relative.

But while all the other interrogatives have continued as relatives in Standard English, what as a simple relative pronoun is nonstandard today. Simple relative what is found in the works of Shakespeare and the King James Bible, but at some point in the last three or four centuries it fell out of use in the standard dialect. Unfortunately, I’m not really sure when this happened; the Oxford English Dictionary has citations up through 1740 and then one from 1920 that appears to be dialogue from a novel. Merriam-Webster’s Dictionary of English Usage says that in the US, it’s mainly found in rural areas in the Midland and South. As I told Braden in a response to his comment, I’ve heard it used myself. A couple of months ago I heard a man in church pray for “our leaders what guides and directs us”—not just a beautiful example of relative what, but also an interesting example of nonstandard verb agreement.

As for why simple relative what died out in Standard English, I really have no idea. Jonathan Hope noted that it’s rather unusual for Standard English to allow other interrogatives as relatives but not this one.[1] In some ways, relative what would make more sense than relative which, since what is historically part of the same paradigm as who; what comes from the neuter form of the interrogative or indefinite pronoun in Old English, while who comes from the combined masculine/feminine form, as shown here. And as I said in this post, whose was originally the genitive form for both who and what, so allowing simple relative what would make for a rather tidy paradigm.

Perhaps that’s the problem. Hope and others have argued that standardized languages—or perhaps speakers of standardized languages—tend to resist tidy paradigms. Irregularities creep in and are preserved, and they can be surprisingly resistant to change. Maybe someone reading this has a fuller explanation of just how this particular little wrinkle came to be.

Notes

1. Jonathan Hope, “Rats, Bats, Sparrows and Dogs: Biology, Linguistics and the Nature of Standard English,” in The Development of Standard English, 1300–1800, ed. Laura Wright (Cambridge: Cambridge University Press, 2000).

By

The Data Is In, pt. 2

In the last post, I said that the debate over whether data is singular or plural is ultimately a question of how we know whether a word is singular or plural, or, more accurately, whether it is count or mass. To determine whether data is a count or a mass noun, we’ll need to answer a few questions. First—and this one may seem so obvious as to not need stating—does it have both singular and plural forms? Second, does it occur with cardinal numbers? Third, what kinds of grammatical agreement does it trigger?

Most attempts to settle the debate point to the etymology of the word, but this is an unreliable guide. Some words begin life as plurals but become reanalyzed as singulars or vice versa. For example, truce, bodice, and to some extent dice and pence were originally plural forms that have been made into singulars. As some of the posts I linked to last time pointed out, agenda was also a Latin plural, much like data, but it’s almost universally treated as a singular now, along with insignia, opera, and many others. On the flip side, cherries and peas were originally singular forms that were reanalyzed as plurals, giving rise to the new singular forms cherry and pea.

So obviously etymology alone cannot tell us what a word should mean or how it should work today, but then again, any attempt to say what a word ought to mean ultimately rests on one logical fallacy or another, because you can’t logically derive an ought from an is. Nevertheless, if you want to determine how a word really works, you need to look at real usage. Present usage matters most, but historical usage can also shed light on such problems.

Unfortunately for the “data is plural” crowd, both present and historical usage are far more complicated than most people realize. The earliest citation in the OED for either data or datum is from 1630, but it’s just a one-word quote, “Data.” The next citation is from 1645 for the plural count noun “datas” (!), followed by the more familiar “data” in 1646. The singular mass noun appeared in 1702, and the singular count noun “datum” didn’t appear until 1737, roughly a century after the first citations. Of course, you always have to take such dates with a grain of salt, because any of them could be antedated, but it’s clear that even from the beginning, data’s grammatical number was in doubt. Some writers used it as a plural, some used it as a singular with the plural form “datas”, and apparently no one used its purported singular form “datum” for another hundred years.

It appears that historical English usage doesn’t help much in settling the matter, though it does make a few things clear. First, there has been considerable variation in the perceived number of data (mass, singular count, or plural count) for over 350 years. Second, the purported singular form, datum, was apparently absent from English for almost a hundred years and continues to be relatively rare today. In fact, in Mark Davies’ COCA, “data point” slightly outnumbers “datum”, and most of the occurrences of “datum” are not the traditional singular form of data but other specialized uses. This is the first strike against data as a plural; count nouns are supposed to have singular forms, though there are a handful of words known as pluralia tantum, which occur only in the plural. I’ll get to that later.

So data doesn’t really seem to have a singular form. At least you can still count data, right? Well, apparently not. Nearly all of the hits in COCA for “[mc*] data” (meaning a cardinal number followed by the word data) are for things like “two data sets” or “74 data points”. It seems that no one who uses data as a plural count noun ever bothers to count their data, or when they do, they revert to using “data” as a mass noun to modify a normal count noun like “points”. Strike two, and this is a big one. The Cambridge Grammar of the English Language gives use with cardinal numbers as the primary test of countability.

Data does better when it comes to grammatical agreement, though this is not as positive as it may seem. It’s easy enough to find constructions like as these few data show, but it’s just as easy to find constructions like there is very little data. And when the word fails the first two tests, the results here seem suspect. Aren’t people simply forcing the word data to behave like a plural count noun? As this wonderfully thorough post by Norman Gray points out (seriously, read the whole thing), “People who scrupulously write ‘data’ as a plural are frequently confused when it comes to more complicated sentences”, writing things like “What is HEP data? The data themselves…”. The urge to treat data as a singular mass noun—because that’s how it behaves—is so strong that it takes real effort to make it seem otherwise.

It seems that if data really is a plural noun, it’s a rather defective one. As I mentioned earlier, it’s possible that it’s some sort of plurale tantum, but even this conclusion is unsatisfying.

Many pluralia tantum in English are words that refer to things made of two halves, like scissors or tweezers, but there are others like news or clothes. You can’t talk about one new or one clothe (though clothes was originally the plural of cloth). You also usually can’t talk about numbers of such things without using an additional counting word or paraphrasing. Thus we have news items or articles of clothing.

Similarly, you can talk about data points or points of data, but at best this undermines the idea that data is an ordinary plural count noun. But language is full of exceptions, right? Maybe data is just especially exceptional. After all, as Robert Lane Greene said in this post, “We have a strong urge to just have language behave, but regular readers of this column know that, as the original Johnson knew, it just won’t.”

I must disagree. The only thing that makes data exceptional is that people have gone to such great lengths to try to get it to act like a plural, but it just isn’t working. Its irregularity is entirely artificial, and there’s no purpose for it except a misguided loyalty to the word’s Latin roots. I say it’s time to stop the act and just let the word behave—as a mass noun.

By

The Data Is In, pt. 1

Lately there has been a spate of blog posts on the question of whether data is a singular or a plural noun. Surprisingly, most of them come down on the side of saying that it can be singular—except when it’s plural. Although saying that it can be singular is refreshingly open-minded, I’ve still got a few problems with the facts and reasoning that led them to that conclusion, as well as the wishy-washiness of saying that it’s singular except when it isn’t.

The first post, “Is Data Is, or Is Data Ain’t, a Plural?”, came from the Wall Street Journal, and it took what Robert Lane Greene of the Economist blog Johnson called “an unusually fence-sitting position”: although they say that they “hereby join the majority” by accepting it as either singular or plural, they predict that “the plural will continue to dominate in our prose”. And they give this head-scratching reasoning:

Singular verbs now are often used to refer to collections of information: Little data is available to support the conclusions.

Otherwise, generally continue to use the plural: Data are still being collected.

Isn’t all data—whether you think of it as a count or a mass noun—“collections of information”? Just because something’s in a collection doesn’t mean it’s singular. For example, if I had an extensive rock collection, you probably wouldn’t say that I had a lot of rock, though I suppose you could; you’d probably say that I had a lot of rocks. The number really depends on the way we perceive the things in the collection, not on the fact that they’re in a collection. But if that wasn’t confusing enough, they give this unreliable test of data’s number:

As a singular/plural test, try to substitute statistics for data: It doesn’t work in the first case — little statistics is available — so the singular is fails to pass muster. The substitution does work in the second case — statistics are still being collected – so the plural are passes muster. (italics added for clarity)

Doesn’t this test simply tell you that data should always be plural? In what case would the singular is ever pass muster? Either I’m missing something important about how you’re supposed to use this substitution test or it’s simply broken.

Next came this post on the Guardian’s Datablog. Sadly, it’s even more muddled than the Wall Street Journal post, and it’s depressingly light on data. It simply asserts, without examination,

Strictly-speaking, data is a plural term. Ie, if we’re following the rules of grammar, we shouldn’t write “the data is” or “the data shows” but instead “the data are” or “the data show”.

But despite further assertions that data is “strictly a plural”, the Guardian style guide says, “Data takes a singular verb”, though they correctly note that (virtually) “no one ever uses ‘agendum’ or ‘datum’”. But this doesn’t make much sense; if it’s plural, why does it take a singular verb? And if it takes a singular verb, is it really plural?

The Guardian post also linked to this National Geographic post from a few years ago, which says much the same thing but somehow manages to be even more muddled. It starts off badly by saying that “data is often used as a collective noun referring to information, statistics, and the like”. Here they mean “mass noun”, not “collective noun”. Note that the Wikipedia articles each say at the top that these terms should not be confused. But aside from this basic mistake, note how it seems to contradict the Wall Street Journal post, which says that singular verbs are used for collections of information.

I wondered if this was just a simple error in the National Geographic post; from context, I would have expected the so-called “collective” form to use a singular verb. But in the next paragraph they say that their style is to use data as a plural when “referring to a body of facts, figures, and such.”

The post gets even more confusing, pointing out some of National Geographic’s supposed errors and then saying that both the singular and plural are considered standard. If they’re both standard, then how are their examples errors? The post ends with a red herring about avoiding confusion and the bizarre statement, “I’d rather not box writers into a singular form.” So why box them into a plural form? If there’s a distinction to be made, even a subtle one, between data as a mass noun and data as a plural noun, why not encourage it? Why whitewash over it by insisting that data always be plural?

Ultimately, though, this whole debate rests on one question: how do we know whether a word is plural or singular? And that’s what I’ll tackle next time.

Read part 2 here.

By

International T-Shirt Day

Today is International T-Shirt Day! To celebrate, Spreadshirt is offering free shipping on all orders today only with the coupon code T-DAY2012. Check out the Arrant Pedantry Store, and remember that all designs are available for customization in the Spreadshirt Marketplace.

By

Take My Commas—Please

Most editors are probably familiar with the rule that commas should be used to set off nonrestrictive appositives and that no commas should be used around restrictive appositives. (In Chicago 16, it’s under 6.23.) A restrictive appositive specifies which of a group of possible referents you’re talking about, and it’s thus integral to the sentence. A nonrestrictive appositive simply provides extra information about the thing you’re talking about. Thus you would write My wife, Ruth, (because I only have one wife) but My cousin Steve (because I have multiple cousins, and one is named Steve). The first tells you that my wife’s name is Ruth, and the second tells you which of my cousins I’m talking about.

Most editors are probably also familiar with the claim that if you leave out the commas after a phrase like “my wife”, the implication is that you’re a polygamist. In one of my editing classes, we would take a few minutes at the start of each class to share bloopers with the rest of the class. One time my professor shared the dedication of a book, which read something like “To my wife Cindy”. Obviously the lack of a comma implies that he must be a polygamist! Isn’t that funny? Everyone had a good laugh.

Except me, that is. I was vaguely annoyed by this alleged blooper, which required a willful misreading of the dedication. There was no real ambiguity here—only an imagined one. If the author had actually meant to imply that he was a polygamist, he would have written something like “To my third wife, Cindy”, though of course he could still write this if he were a serial monogamist.

Usually I find this insistence on commas a little exasperating, but in one instance the other day, the commas were actually wrong. A proofreader had corrected a caption which read “his wife Arete” to “his wife, Arete,” which probably seemed like a safe change to make but which was wrong in this instance—the man referred to in the caption had three wives concurrently. I stetted the change, but it got me thinking about fact-checking and the extent to which it’s an editor’s job to split hairs.

This issue came up repeatedly during a project I worked on last year. It was a large book with a great deal of biographical information in it, and I frequently came across phrases like “Hans’s daughter Ingrid”. Did Hans have more than one daughter, or was she his only daughter? Should it be “Hans’s daughter, Ingrid,” or “Hans’s daughter Ingrid”? And how was I to know?

Pretty quickly I realized just how ridiculous the whole endeavor was. I had neither the time nor the resources to look up World War II–era German citizens in a genealogical database, and I wasn’t about to bombard the author with dozens of requests for him to track down the information either. Ultimately, it was all pretty irrelevant. It simply made no difference to the reader. I decided we were safe just leaving the commas out of such constructions.

And, honestly, I think it’s even safer to leave the commas out when referring to one’s spouse. Polygamy is such a rarity in our culture that it’s usually highlighted in the text, with wording such as “John and Janet, one of his three wives”. Assuming that “my wife Ruth” implies that I have more than one wife is a deliberate flouting of the cooperative principle of communication. This insistence on a narrow, prescribed meaning over the obvious, intended meaning is a problem with many prescriptive rules, but, once again, that’s a topic for another day.

Please note, however, that I’m not saying that anything goes or that you can punctuate however you want as long as the meaning’s clear. I’m only saying that in cases where it’s a safe assumption that there’s just one possible referent, or when it doesn’t really matter, the commas can sometimes seem a little fussy and superfluous.

By

What Descriptivism Is and Isn’t

A few weeks ago, the New Yorker published what is nominally a review of Henry Hitchings’ book The Language Wars (which I still have not read but have been meaning to) but which was really more of a thinly veiled attack on what its author, Joan Acocella, sees as the moral and intellectual failings of linguistic descriptivism. In what John McIntyre called “a bad week for Joan Acocella”, the whole mess was addressed multiple times by various bloggers and other writers.* I wanted to write about it at the time but was too busy. Then the New Yorker did me a favor by publishing a follow-up, “Inescapably, You’re Judged by Your Language”, which was equally off-base, so I figured that the door was still open.

I suspected from the first paragraph that Acocella’s article was headed for trouble, and the second paragraph quickly confirmed it. For starters, her brief description of the history and nature of English sounds like it’s based more on folklore than fact. A lot of people lived in Great Britain before the Anglo-Saxons arrived, and their linguistic contributions were effectively nil. But that’s relatively small stuff. The real problem is that she doesn’t really understand what descriptivism is, and she doesn’t understand that she doesn’t understand, so she spends the next five pages tilting at windmills.

Acocella says that descriptivists “felt that all we could legitimately do in discussing language was to say what the current practice was.” This statement is far too narrow, and not only because it completely leaves out historical linguistics. As a linguist, I think it’s odd to describe linguistics as merely saying what the current practice is, since it makes it sound as though all linguists study is usage. Do psycholinguists say what the current practice is when they do eye-tracking studies or other psychological experiments? Do phonologists or syntacticians say what the current practice is when they devise abstract systems of ordered rules to describe the phonological or syntactic system of a language? What about experts in translation or first-language acquisition or computational linguistics? Obviously there’s far more to linguistics than simply saying what the current practice is.

But when it does come to describing usage, we linguists love facts and complexity. We’re less interested in declaring what’s correct or incorrect than we are in uncovering all the nitty-gritty details. It is true, though, that many linguists are at least a little antipathetic to prescriptivism, but not without justification. Because we linguists tend to deal in facts, we take a rather dim view of claims about language that don’t appear to be based in fact, and, by extension, of the people who make those claims. And because many prescriptions make assertions that are based in faulty assumptions or spurious facts, some linguists become skeptical of or even hostile to the whole enterprise.

But it’s important to note that this hostility is not actually descriptivism. It’s also, in my experience, not nearly as common as a lot of prescriptivists seem to assume. I think most linguists don’t really care about prescriptivism unless they’re dealing with an officious copyeditor on a manuscript. It’s true that some linguists do spend a fair amount of effort attacking prescriptivism in general, but again, this is not actually descriptivism; it’s simply anti-prescriptivism.

Some other linguists (and some prescriptivists) argue for a more empirical basis for prescriptions, but this isn’t actually descriptivism either. As Language Log’s Mark Liberman argued here, it’s just prescribing on the basis of evidence rather than personal taste, intuition, tradition, or peevery.

Of course, all of this is not to say that descriptivists don’t believe in rules, despite what the New Yorker writers think. Even the most anti-prescriptivist linguist still believes in rules, but not necessarily the kind that most people think of. Many of the rules that linguists talk about are rather abstract schematics that bear no resemblance to the rules that prescriptivists talk about. For example, here’s a rather simple one, the rule describing intervocalic alveolar flapping (in a nutshell, the process by which a word like latter comes to sound like ladder) in some dialects of English:

[Figure: the intervocalic alveolar flapping rule]
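In one common textbook notation (a hedged approximation, not necessarily the form used in the diagram), the rule says that /t/ and /d/ surface as the flap [ɾ] after a vowel and before an unstressed vowel. That is why latter, whose second syllable is unstressed, can sound just like ladder.

```latex
% Flapping: /t, d/ become [ɾ] between a vowel and a following unstressed vowel.
% Assumes amsmath (for bmatrix) and a Unicode-aware engine (XeLaTeX/LuaLaTeX) for ɾ.
\[
  \text{/t, d/} \;\rightarrow\; [\text{ɾ}] \;\big/\; V \;\underline{\quad}\;
  \begin{bmatrix} V \\ -\text{stress} \end{bmatrix}
\]
```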

Rules like these constitute the vast bulk of the language, though they’re largely subconscious and unseen, like a sort of linguistic dark matter. The entire canon of prescriptions (my advisor has identified at least 10,000 distinct prescriptive rules in various handbooks, though only a fraction of these are repeated) seems rather peripheral and inconsequential to most linguists, which is another reason why we get annoyed when prescriptivists insist on their importance or identify standard English with them. Despite what most people think, standard English is not really defined by prescriptive rules, which makes it somewhat disingenuous and ironic for prescriptivists to call us hypocrites for writing in standard English.

If there’s anything disingenuous about linguists’ belief in rules, it’s that we’re not always clear about what kinds of rules we’re talking about. It’s easy to say that we believe in the rules of standard English and good communication and whatnot, but we’re often pretty vague about just what exactly those rules are. But that’s probably a topic for another day.

*A roundup of some of the posts on the recent brouhaha:

“Cheap Shot”, “A Bad Week for Joan Acocella”, “Daddy, Are Prescriptivists Real?”, and “Unmourned: The Queen’s English Society” by John McIntyre

“Rules and Rules” and “A Half Century of Usage Denialism” by Mark Liberman

“Descriptivists as Hypocrites (Again)” by Jan Freeman

“Ignorant Blathering at The New Yorker” by Stephen Dodson, aka Languagehat

“Re: The Language Wars” and “False Fronts in the Language Wars” by Steven Pinker

“The New Yorker versus the Descriptivist Specter” by Ben Zimmer

“Speaking Truth about Power” by Nancy Friedman

“Sator Resartus” by Ben Yagoda

I’m sure there are others that I’ve missed. If you know of any more, feel free to make note of them in the comments.

By

Guest Post at Logophilius

Today I have a guest post about rules and style choices at Andy Hollandbeck’s blog Logophilius. Go take a look, and while you’re there, check out the rest of his site.