Arrant Pedantry

It’s All Grammar—So What?

It’s a frequent complaint among linguists that laypeople use the term grammar in such a loose and unsystematic way that it’s more or less useless. They say that it’s overly broad, encompassing many different types of rules, and that it allows people to confuse things as different as syntax and spelling. They insist that spelling, punctuation, and ideas such as style or formality are not grammar at all, that grammar is really just the rules of syntax and morphology that define the language.

Arnold Zwicky, for instance, has complained that grammar as it’s typically used refers to nothing more than a “grab-bag of linguistic peeve-triggers”. I think this is an overly negative view; yes, there are a lot of people who peeve about grammar, but I think that most people, when they talk about grammar, are thinking about how to say things well or correctly.

Some people take linguists’ insistence on the narrower, more technical meaning of grammar as a sign of hypocrisy. After all, they say, with something of a smirk, shouldn’t we just accept the usage of the majority? If almost everyone uses grammar in a broad and vague way, shouldn’t we consider that usage standard? Linguists counter that this really is an important distinction, though I think it’s fair to say that they have a personal interest here; they teach grammar in the technical sense and are dismayed when people misunderstand what they do.

I’ve complained about this myself, but I’m starting to wonder whether it’s really something to worry about. (Of course, I’m probably doubly a hypocrite, what with all the shirts I sell with the word grammar on them.) After all, we see similar splits between technical and popular terminology in a lot of other fields, and they seem to get by just fine.

Take the terms fruit and vegetable, for instance. In popular use, fruits are generally sweeter, while vegetables are more savory or bitter. And while most people have probably heard the argument that tomatoes are actually fruits, not vegetables, they might not realize that squash, eggplants, peppers, peas, green beans, nuts, and grains are fruits too, at least by the botanical definition. And vegetable doesn’t even have a botanical definition—it’s just any part of a plant (other than fruits or seeds) that’s edible. It’s not a natural class at all. Yet the technical definitions, however rigorous, matter little in the kitchen; the popular senses get along just fine alongside them.

In a bit of editorializing, the Oxford English Dictionary adds this note after its first definition of grammar:

As above defined, grammar is a body of statements of fact—a ‘science’; but a large portion of it may be viewed as consisting of rules for practice, and so as forming an ‘art’. The old-fashioned definition of grammar as ‘the art of speaking and writing a language correctly’ is from the modern point of view in one respect too narrow, because it applies only to a portion of this branch of study; in another respect, it is too wide, and was so even from the older point of view, because many questions of ‘correctness’ in language were recognized as outside the province of grammar: e.g. the use of a word in a wrong sense, or a bad pronunciation or spelling, would not have been called a grammatical mistake. At the same time, it was and is customary, on grounds of convenience, for books professedly treating of grammar to include more or less information on points not strictly belonging to the subject.

There are a few points here to consider. The definition of grammar has not been limited solely to syntax and morphology for many years. Once it started branching out into notions of correctness, it made sense to treat grammar, usage, spelling, and pronunciation together. From there it’s a short leap to calling the whole collection grammar, since there isn’t really another handy label. And since few people are taught much in the way of syntax and morphology unless they’re majoring in linguistics, it’s really no surprise that the loose sense of grammar predominates. I’ll admit, however, that it’s still a little exasperating to see lists of grammar rules that everyone supposedly gets wrong when the rules in question are just spelling rules or, at best, misused words.

The root of the problem is that laypeople use words in ways that are useful and meaningful to them, and these ways don’t always jibe with scientific facts. It’s the same thing with grammar; laypeople use it to refer to language rules in general, especially the ones they’re most conscious of, which tend to be the ones that are the most highly regulated—usage, spelling, and style. Again, issues of syntax, morphology, semantics, usage, spelling, and style don’t constitute a natural class, but it’s handy to have a word that refers to the aspects of language that most people are conscious of and concerned with.

I think there still is a problem, though, and it’s that most people generally have a pretty poor understanding of things like syntax, morphology, and semantics. Grammar isn’t taught much in schools anymore, so many people graduate from high school and even college without much of an understanding of grammar beyond spelling and mechanics. I got out of high school without knowing anything more advanced than prepositional phrases. My first grammar class in college was a bit of a shock, because I’d never even learned about things like the passive voice or dependent clauses before that point, so I have some sympathy for those people who think that grammar is mostly just spelling and punctuation with a few minor points of usage or syntax thrown in.

So what’s the solution? Well, maybe I’m just biased, but I think it’s to teach more grammar. I know this is easier said than done, but I think it’s important for people to have an understanding of how language works. A lot of people are naturally interested in or curious about language, and I think we do those students a disservice if all we teach them is never to use infer for imply and to avoid the passive voice. Grammar isn’t just a set of rules telling you what not to do; it’s also a fascinatingly complex and mostly subconscious system that governs the singular human gift of language. Maybe we just need to accept the broader sense of grammar and start teaching people all of what it is.

Addendum: I just came across a blog post criticizing the word funner as bad grammar, and my first reaction was “That’s not grammar!” It’s always easier to preach than to practice, but my reaction has me reconsidering my laissez-faire attitude. While it seems handy to have a catch-all term for language errors, regardless of what type they are, it also seems handy—probably more so—to distinguish between violations of the regulative rules and constitutive rules of language. But this leaves us right where we started.

The Data Is In, pt. 2

In the last post, I said that the debate over whether data is singular or plural is ultimately a question of how we know whether a word is singular or plural, or, more accurately, whether it is count or mass. To determine whether data is a count or a mass noun, we’ll need to answer a few questions. First—and this one may seem so obvious as to not need stating—does it have both singular and plural forms? Second, does it occur with cardinal numbers? Third, what kinds of grammatical agreement does it trigger?

Most attempts to settle the debate point to the etymology of the word, but this is an unreliable guide. Some words begin life as plurals but become reanalyzed as singulars or vice versa. For example, truce, bodice, and to some extent dice and pence were originally plural forms that have been made into singulars. As some of the posts I linked to last time pointed out, agenda was also a Latin plural, much like data, but it’s almost universally treated as a singular now, along with insignia, opera, and many others. On the flip side, cherries and peas were originally singular forms that were reanalyzed as plurals, giving rise to the new singular forms cherry and pea.

So obviously etymology alone cannot tell us what a word should mean or how it should work today, but then again, any attempt to say what a word ought to mean ultimately rests on one logical fallacy or another, because you can’t logically derive an ought from an is. Nevertheless, if you want to determine how a word really works, you need to look at real usage. Present usage matters most, but historical usage can also shed light on such problems.

Unfortunately for the “data is plural” crowd, both present and historical usage are far more complicated than most people realize. The earliest citation in the OED for either data or datum is from 1630, but it’s just a one-word quote, “Data.” The next citation is from 1645 for the plural count noun “datas” (!), followed by the more familiar “data” in 1646. The singular mass noun appeared in 1702, and the singular count noun “datum” didn’t appear until 1737, roughly a century later. Of course, you always have to take such dates with a grain of salt, because any of them could be antedated, but it’s clear that even from the beginning, data’s grammatical number was in doubt. Some writers used it as a plural, some used it as a singular with the plural form “datas”, and apparently no one used its purported singular form “datum” for another hundred years.

It appears that historical English usage doesn’t help much in settling the matter, though it does make a few things clear. First, there has been considerable variation in the perceived number of data (mass, singular count, or plural count) for over 350 years. Second, the purported singular form, datum, was apparently absent from English for almost a hundred years and continues to be relatively rare today. In fact, in Mark Davies’ COCA, “data point” slightly outnumbers “datum”, and most of the occurrences of “datum” are not the traditional singular form of data but other specialized uses. This is the first strike against data as a plural; count nouns are supposed to have singular forms, though there are a handful of words known as pluralia tantum, which occur only in the plural. I’ll get to that later.

So data doesn’t really seem to have a singular form. At least you can still count data, right? Well, apparently not. Nearly all of the hits in COCA for “[mc*] data” (meaning a cardinal number followed by the word data) are for things like “two data sets” or “74 data points”. It seems that no one who uses data as a plural count noun ever bothers to count their data, or when they do, they revert to using “data” as a mass noun to modify a normal count noun like “points”. Strike two, and this is a big one. The Cambridge Grammar of the English Language gives use with cardinal numbers as the primary test of countability.
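For the curious, here’s a minimal sketch of this countability test in Python, using NLTK’s tagged Brown corpus rather than COCA. The corpus, tags, and counts are stand-ins for illustration; this is not the search reported above.

```python
# A minimal sketch, not the COCA query above: find a cardinal number
# immediately before "data" (the "[mc*] data" pattern) in NLTK's
# tagged Brown corpus. "CD" is the Brown tag for cardinal numbers.
import nltk
from nltk.corpus import brown

nltk.download("brown", quiet=True)

tagged = brown.tagged_words()  # sequence of (word, tag) pairs
hits = [
    (w1, w2)
    for (w1, t1), (w2, t2) in nltk.bigrams(tagged)
    if t1 == "CD" and w2.lower() == "data"
]
print(len(hits), hits[:10])
```

If the pattern behaves the way it does in COCA, most of what turns up will be compounds like “two data sets” rather than genuinely counted plurals.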

Data does better when it comes to grammatical agreement, though this is not as positive as it may seem. It’s easy enough to find constructions like “as these few data show”, but it’s just as easy to find constructions like “there is very little data”. And when the word fails the first two tests, the results here seem suspect. Aren’t people simply forcing the word data to behave like a plural count noun? As this wonderfully thorough post by Norman Gray points out (seriously, read the whole thing), “People who scrupulously write ‘data’ as a plural are frequently confused when it comes to more complicated sentences”, writing things like “What is HEP data? The data themselves…”. The urge to treat data as a singular mass noun—because that’s how it behaves—is so strong that it takes real effort to make it seem otherwise.

It seems that if data really is a plural noun, it’s a rather defective one. As I mentioned earlier, it’s possible that it’s some sort of plurale tantum, but even this conclusion is unsatisfying.

Many pluralia tantum in English are words that refer to things made of two halves, like scissors or tweezers, but there are others like news or clothes. You can’t talk about one new or one clothe (though clothes was originally the plural of cloth). You also usually can’t talk about numbers of such things without using an additional counting word or paraphrasing. Thus we have news items or articles of clothing.

Similarly, you can talk about data points or points of data, but at best this undermines the idea that data is an ordinary plural count noun. But language is full of exceptions, right? Maybe data is just especially exceptional. After all, as Robert Lane Greene said in this post, “We have a strong urge to just have language behave, but regular readers of this column know that, as the original Johnson knew, it just won’t.”

I must disagree. The only thing that makes data exceptional is that people have gone to such great lengths to try to get it to act like a plural, but it just isn’t working. Its irregularity is entirely artificial, and there’s no purpose for it except a misguided loyalty to the word’s Latin roots. I say it’s time to stop the act and just let the word behave—as a mass noun.

The Data Is In, pt. 1

Lately there has been a spate of blog posts on the question of whether data is a singular or a plural noun. Surprisingly, most of them come down on the side of saying that it can be singular—except when it’s plural. Although saying that it can be singular is refreshingly open-minded, I’ve still got a few problems with the facts and reasoning that led them to that conclusion, as well as the wishy-washiness of saying that it’s singular except when it isn’t.

The first post, “Is Data Is, or Is Data Ain’t, a Plural?”, came from the Wall Street Journal, and it took what Robert Lane Greene of the Economist blog Johnson called “an unusually fence-sitting position”: although they say that they “hereby join the majority” by accepting it as either singular or plural, they predict that “the plural will continue to dominate in our prose”. And they give this head-scratching reasoning:

Singular verbs now are often used to refer to collections of information: Little data is available to support the conclusions.

Otherwise, generally continue to use the plural: Data are still being collected.

Isn’t all data—whether you think of it as a count or a mass noun—“collections of information”? Just because something’s in a collection doesn’t mean it’s singular. For example, if I had an extensive rock collection, you probably wouldn’t say that I had a lot of rock, though I suppose you could; you’d probably say that I had a lot of rocks. The number really depends on the way we perceive the things in the collection, not on the fact that they’re in a collection. But if that wasn’t confusing enough, they give this unreliable test of data’s number:

As a singular/plural test, try to substitute statistics for data: It doesn’t work in the first case — little statistics is available — so the singular is fails to pass muster. The substitution does work in the second case — statistics are still being collected – so the plural are passes muster. (italics added for clarity)

Doesn’t this test simply tell you that data should always be plural? In what case would the singular is ever pass muster? Either I’m missing something important about how you’re supposed to use this substitution test or it’s simply broken.

Next came this post on the Guardian’s Datablog. Sadly, it’s even more muddled than the Wall Street Journal post, and it’s depressingly light on data. It simply asserts, without examination,

Strictly-speaking, data is a plural term. Ie, if we’re following the rules of grammar, we shouldn’t write “the data is” or “the data shows” but instead “the data are” or “the data show”.

But despite further assertions that data is “strictly a plural”, the Guardian style guide says, “Data takes a singular verb”, though they correctly note that (virtually) “no one ever uses ‘agendum’ or ‘datum’”. But this doesn’t make much sense; if it’s plural, why does it take a singular verb? And if it takes a singular verb, is it really plural?

The Guardian post also linked to this National Geographic post from a few years ago, which says much the same thing but somehow manages to be even more muddled. It starts off badly by saying that “data is often used as a collective noun referring to information, statistics, and the like”. Here they mean “mass noun”, not “collective noun”. Note that the Wikipedia articles each say at the top that these terms should not be confused. But aside from this basic mistake, note how it seems to contradict the Wall Street Journal post, which says that singular verbs are used for collections of information.

I wondered if this was just a simple error in the National Geographic post; from context, I would have expected the so-called “collective” form to use a singular verb. But in the next paragraph they say that their style is to use data as a plural when “referring to a body of facts, figures, and such.”

The post gets even more confusing, pointing out some of National Geographic’s supposed errors and then saying that both the singular and plural are considered standard. If they’re both standard, then how are their examples errors? The post ends with a red herring about avoiding confusion and the bizarre statement, “I’d rather not box writers into a singular form.” So why box them into a plural form? If there’s a distinction to be made, even a subtle one, between data as a mass noun and data as a singular noun, why not encourage it? Why whitewash over it by insisting that data always be plural?

Ultimately, though, this whole debate rests on one question: how do we know whether a word is plural or singular? And that’s what I’ll tackle next time.

Read part 2 here.

No Dice

If you’ve ever had to learn a foreign language, you may have struggled to memorize plural forms of nouns. German, for example, has about half a dozen ways of forming plurals, and it’s a chore to remember which kind of plural each noun takes. English, by comparison, is ridiculously easy. Here’s how it works for nearly every English noun: add -s to the end. Sometimes you need to insert an e before the s, and sometimes you need to change a preceding y to ie, but that’s the rule in a nutshell.

Of course, there are still plenty of exceptions: a couple that end in -en (oxen and the strange double plural children), a handful of umlaut plurals (man–men, foot–feet, mouse–mice, etc.), some uninflected plurals (usually for domesticated or game animals, such as sheep, deer, and so on), and a plethora of foreign borrowings (particularly from Latin and Greek) that often follow rules from their donor languages but occasionally don’t. There are a few other oddballs—like person–people, for example—but nearly every English count noun fits into one of these categories.

But there’s one plural that doesn’t fit into any of these categories, because it’s been caught for centuries in a strange limbo between count nouns, which take plural forms, and mass nouns, which don’t. It’s dice. If you need a refresher, mass nouns generally refer to things that are not discrete, such as milk or oil, though some refer to things that are made of discrete pieces “whose individual identities are not usually important to us,” as Arnold Zwicky put it in this Language Log post—words like corn or rice. You could count the individual grains or kernels if you wanted to, but why would you ever want to?

And this is how dice slipped through the cracks of language change. Originally, die was a regular noun that formed its plural by adding an s sound to the end. (For the moment, let’s leave aside the issue of spelling, because Middle and Early Modern English spelling was anything but standard.) At some point in the history of English, the final -s in plurals was voiceless, meaning that it was always pronounced with an s sound, not a z sound. But then that changed, probably sometime in the 1500s, so that the final -s was always voiced—that is, pronounced as a z—unless it followed a voiceless sound. Strangely, this sound change seems to have affected only the plural and possessive -s endings and not other word-final s’s.

But around that time, we start seeing the plural of die, when referring to those little cubes with pips used for games and whatnot, spelled as dice (and similar forms). In Modern English spelling, the final -s on a plural can be either voiced or voiceless, depending on the preceding word, but -ce is always voiceless. As the regular plural ending was becoming voiced for many, many words, it remained voiceless in dice. Why?

Well, apparently because people had stopped thinking of it as a plural and started thinking of it as a mass noun, much like corn and rice, so they stopped seeing the s sound on the end as the plural marker and started perceiving it as simply part of the word. Singular dice can be found back to the late 1300s, and when the sound change came along in the 1500s and voiced most plural -s endings, dice was left behind, with its spelling altered to show that it was unequivocally voiceless. In other senses of the word, die was still thought of as a regular count noun, so its plural forms ended up as dies.*

Dice wasn’t the only word passed over in this way, though; truce (originally the plural of true, meaning “pledge” or “oath”), bodice (plural of body), and pence (a contracted plural form of penny) come to us the same way. Speakers subconsciously reanalyzed these words as mass nouns or singular count nouns, so their final s sounds stayed voiceless. Similarly, once, twice, and thrice were originally genitive forms, but they ceased to be thought of as such and consequently retained their voiceless sounds, respelled with ce.

But the strange thing is that whereas the words mentioned above made the transition to mass nouns or new singular count nouns, usage of dice has been split for centuries. We’ve never fully made the switch to thinking of dice as a mass noun, used regardless of the actual number of the things, because, unlike rice or corn, we do frequently care about the number of dice being used. Instead of a true mass noun, it’s become an uninflected count noun—one dice, two dice—for many people, though it exists alongside the original singular die. But singular dice is rare in print, because we’re told that it’s properly one die, two dice, even though some dictionaries note that singular dice is much more frequent in gaming than die.

So where does that leave us? You can go with singular die and possibly be thought of as something of a pedant, or you can go with singular dice and possibly be thought of as a little ignorant. As for me, I usually use singular die and feel twinges of self-loathing when I do so; I haven’t had the heart to correct my boys when they use singular dice.

*For more on the reconstruction of the plural ending in English, see the section on the English plural suffix in the chapter “Reconstruction” in Language History: An Introduction, by Andrew L. Sihler (Philadelphia: John Benjamins, 2000).

However

Several weeks ago, Bob Scopatz asked in a comment about the word however, specifically whether it should be preceded by a comma or a semicolon when it’s used between two clauses. He says that a comma always seems fine to him, but apparently this causes people to look askance at him.

The rule here is pretty straightforward, and Purdue’s Online Writing Lab has a nice explanation. Independent clauses joined by coordinating conjunctions are separated by a comma; independent clauses that are not joined by coordinating conjunctions or are joined by what OWL calls “conjunctive adverbs” require a semicolon.

I’ve also seen the terms “transitional adverb” and “transitional phrase,” though the latter usually refers to multiword constructions like as a result, for example, and so on. These terms are probably more accurate since (I believe) words and phrases like however are not, strictly speaking, conjunctions. Though they do show a relationship between two clauses, that relationship is more semantic or rhetorical than grammatical.

Since however falls into this group, it should be preceded by a semicolon, though it can also start a new sentence. Grammar-Monster.com has some nice illustrative examples:

I am leaving on Tuesday, however, I will be back on Wednesday to collect my wages.
I am leaving on Tuesday; however, I will be back on Wednesday to collect my wages.
I am leaving on Tuesday. However, I will be back on Wednesday to collect my wages.

The first example is incorrect, while the latter two are correct. Note that “however” is also followed by a comma. (But would also work here, though in that case it would be preceded by a comma and not followed by one.)

Bob also mentioned that he sometimes starts a sentence with “however,” and this usage is a little more controversial. Strunk & White and others forbade however in sentence- or clause-initial position, sometimes with the argument that in this position it can only mean “in whatever way” or “to whatever extent.”

It’s true that however is sometimes used this way, as in “However it is defined, the middle class is standing on shaky ground,” to borrow an example from COCA. But this is clearly different from the Grammar-Monster sentences above. In those, the punctuation—namely the comma after “however”—indicates that this is not the “in whatever way” however, but rather the “on the contrary” or “in spite of that” one.

Some editors fastidiously move sentence-initial “howevers” to a position later in the sentence, as in “I will be back on Wednesday, however, to collect my wages.” As long as it’s punctuated correctly, it’s fine in either location, so there’s no need to move it. But note that when it occurs in the middle of a clause, it’s surrounded by commas.

It’s possible that sentence-initial however could be ambiguous without the following comma, but even then the confusion is likely to be momentary. I don’t see this as a compelling reason to avoid sentence-initial however, though I do believe it’s important to punctuate it properly, with both a preceding semicolon or period and a following comma, to avoid tripping up the reader.

In a nutshell, however is an adverb, not a true conjunction, so it can’t join two independent clauses with just a comma. You can either join those clauses with a semicolon or separate them with a period. But either way, however should be set off by commas. When it’s in the middle of a clause, the commas go on both sides; when it’s at the beginning of a clause, it just needs a following comma. Hopefully this will help Bob (and others) stop getting those funny looks.

Comprised of Fail

A few days ago on Twitter, John McIntyre wrote, “A reporter has used ‘comprises’ correctly. I feel giddy.” And a couple of weeks ago, Nancy Friedman tweeted, “Just read ‘is comprised of’ in a university’s annual report. I give up.” I’ve heard editors confess that they can never remember how to use comprise correctly and always have to look it up. And recently I spotted a really bizarre use in Wired, complete with a subject-verb agreement problem: “It is in fact a Meson (which comprise of a quark and an anti-quark).” So what’s wrong with this word that makes it so hard to get right?

I did a project on “comprised of” for my class last semester on historical changes in American English, and even though I knew it was becoming increasingly common even in edited writing, I was still surprised to see the numbers. For those unfamiliar with the rule, it’s actually pretty simple: the whole comprises the parts, and the parts compose the whole. This makes the two words reciprocal antonyms, meaning that they describe opposite sides of a relationship, like buy/sell or teach/learn. Another way to look at it is that comprise essentially means “to be composed of,” while “compose” means “to be comprised in” (note: in, not of). But increasingly, comprise is being used not as an antonym for compose, but as a synonym.

It’s not hard to see why it’s happened. They’re extremely similar in sound, and each is equivalent to the passive form of the other. When “comprises” means the same thing as “is composed of,” it’s almost inevitable that some people are going to conflate the two and produce “is comprised of.” According to the rule, any instance of “comprised of” is an error that should probably be replaced with “composed of.” Regardless of the rule, this usage has risen sharply in recent decades, though it’s still dwarfed by “composed of.” (Though “composed of” appears to be in serious decline; I have no idea why.) The following chart shows its frequency in COHA and the Google Books Corpus.

frequency of "comprised of" and "composed of" in COHA and Google Books

Though it still looks pretty small on the chart, “comprised of” now occurs anywhere from 21 percent as often as “composed of” (in magazines) to a whopping 63 percent as often (in speech) according to COCA. (It’s worth noting, of course, that the speech genre in COCA is composed of a lot of news and radio show transcripts, so even though it’s unscripted, it’s not exactly reflective of typical speech.)

frequency of "comprised of" by genre

What I find most striking about this graph is the frequency of “comprised of” in academic writing. It is often held that standard English is the variety of English used by the educated elite, especially in writing. In this case, though, academics are leading the charge in the spread of a nonstandard usage. Like it or not, it’s becoming increasingly common, and the prestige lent to it by its academic feel is certainly a factor.

But it’s not just “comprised of” that’s the problem; remember that the whole comprises the parts, which means that comprise should be used with singular subjects and plural objects (or multiple subjects with multiple respective objects, as in “The fifty states comprise some 3,143 counties; each individual state comprises many counties”). So according to the rule, not only is “The United States is comprised of fifty states” an error, but so is “The fifty states comprise the United States.”

It can start to get fuzzy, though, when either the subject or the object is a mass or collective noun, as in “youngsters comprise 17% of the continent’s workforce,” to take an example from Mark Davies’ COCA. This kind of error may be harder to catch, because the relationship between parts and whole is a little more abstract.

And with all the data above, it’s important to remember that we’re seeing things that have made it into print. As I said above, many editors have to look up the rule every time they encounter a form of “comprise” in print, meaning that they’re more liable to make mistakes. It’s possible that many more editors don’t even know that there is a rule, and so they read past it without a second thought.

Personally, I gave up on the rule a few years ago when one day it struck me that I couldn’t recall the last time I’d seen it used correctly in my editing. It’s never truly ambiguous (though if you can find an ambiguous example that doesn’t require willful misreading, please share), and it’s safe to assume that if nearly all of our authors who use comprise do so incorrectly, then most of our readers probably won’t notice, because they think that’s the correct usage.

And who’s to say it isn’t correct now? When it’s used so frequently, especially by highly literate and highly educated writers and speakers, I think you have to recognize that the rule has changed. To insist that it’s always an error, no matter how many people use it, is to deny the facts of usage. Good usage has to have some basis in reality; it can’t be grounded only in the ipse dixits of self-styled usage authorities.

And of course, it’s worth noting that the “traditional” meaning of comprise is really just one in a long series of loosely related meanings the word has had since it was first borrowed into English from French in the 1400s, including “to seize,” “to perceive or comprehend,” “to bring together,” and “to hold.” Perhaps the new meaning of “compose” (which in reality is over two hundred years old at this point) is just another step in the evolution of the word.

More on That

As I said in my last post, I don’t think the distribution of that and which is adequately explained by the restrictive/nonrestrictive distinction. It’s true that nearly all thats are restrictive (with a few rare exceptions), but it’s not true that all restrictive relative pronouns are thats and that all whiches are nonrestrictive, even when you follow the traditional rule. In some cases that is strictly forbidden, and in other cases it is disfavored to varying degrees. Something that linguistics has taught me is that when your rule is riddled with exceptions and wrinkles, it’s usually a sign that you’ve missed something important in your analysis.

In researching the topic for this post, I’ve learned a couple of things: (1) I don’t know syntax as well as I should, and (2) the behavior of relatives in English, particularly that, is far more complex than most editors or pop grammarians realize. First of all, there’s apparently been a century-long argument over whether that is even a relative pronoun or actually some sort of relativizing conjunction or particle. (Some linguists seem to prefer the latter, but I won’t wade too deep into that debate.) Previous studies have looked at multiple factors to explain the variation in relativizers, including the animacy of the referent, the distance between the pronoun and its referent, the semantic role of the relative clause, and the syntactic role of the referent.

It’s often noted that that can’t follow a preposition and that it doesn’t have a genitive form of its own (it must use either whose or of which), but no usage guide I’ve seen ever makes mention of the fact that this pattern follows the accessibility hierarchy. That is, in a cross-linguistic analysis, linguists have found an order to the way in which relative clauses are formed. Some languages can only relativize subjects, others can do subjects and verbal objects, yet others can do subjects, verbal objects, and oblique objects (like the objects of prepositions), and so on. For any allowable position on the hierarchy, all positions to the left are also allowable. The hierarchy goes something like this:

subject ≥ direct object ≥ indirect object ≥ object of stranded preposition ≥ object of fronted preposition ≥ possessor noun phrase ≥ object of comparative particle

What is interesting is that that and the wh- relatives, who and which, occupy overlapping but different portions of the hierarchy. Who and which can relativize anything from subjects to possessors and possibly objects of comparative particles, though whose as the genitive form of which seems a little odd to some, and both sound odd if not outright ungrammatical with comparatives, as in “the man than who I’m taller.” But that can’t relativize objects of fronted prepositions or anything further down the scale.

Strangely, though, there are things that that can do that who and which can’t. That can sometimes function as a sort of relative adverb, equivalent to the relative adverbs why, where, or when or to which with a preposition. That is, you can say “the day that we met,” “the day when we met,” or “the day on which we met,” but not “the day which we met.” And which can relativize whole clauses (though some sticklers consider this ungrammatical), while that cannot, as in “This author uses restrictive ‘which,’ which bothers me a lot.”

So what explains the differences between that and which or who? Well, as I mentioned above, some linguists consider that not a pronoun but a complementizer or conjunction (perhaps a highly pronominal one), making it more akin to the complementizer that, as in “He said that relativizers were confusing.” And some linguists have also proposed different syntactic structures for restrictive and nonrestrictive clauses, which could account for the limitation of that to restrictive clauses. If that is not a true pronoun but a complementizer, then that could account for its strange distribution. It can’t appear in nonrestrictive clauses, because they require a full pronoun like which or who, and it can’t appear after prepositions, because those constructions similarly require a pronoun. But it can function as a relative adverb, which a regular relative pronoun can’t do.

As I argued in my previous post, it seems that which and that do not occupy separate parts of a single paradigm but are part of two different paradigms that overlap. The differences between them can be characterized in a few different ways, but for some reason, grammarians have seized on the restrictive/nonrestrictive distinction and have written off the rest as idiosyncratic exceptions to the rule or as common errors (when they’ve addressed those points at all).

The proposal to disallow which in restrictive relative clauses, except in the cases where that is ungrammatical—sometimes called Fowler’s rule, though that’s not entirely accurate—is based on the rather trivial observation that all thats are restrictive and that all nonrestrictives are which. It then assumes that the converse is true (or should be) and tries to force all restrictives to be that and all whiches to be nonrestrictive (except for all those pesky exceptions, of course).

Garner calls Fowler’s rule “nothing short of brilliant,”[1] but I must disagree. It’s based on a rather facile analysis followed by some terrible logical leaps. And insisting on following a rule based on bad linguistic analysis is not only unhelpful to the reader but also a waste of editors’ time. As my last post shows, editors have obviously worked very hard to put the rule into practice, but this is not evidence of its utility, let alone its brilliance. But a linguistic analysis that could account for all of the various differences between the two systems of relativization in English? Now that just might be brilliant.

Sources

Herbert F. W. Stahlke, “Which That,” Language 52, no. 3 (Sept. 1976): 584–610
Johan Van Der Auwera, “Relative That: A Centennial Dispute,” Journal of Linguistics 21, no. 1 (March 1985): 149–79
Gregory R. Guy and Robert Bayley, “On the Choice of Relative Pronouns in English,” American Speech 70, no. 2 (Summer 1995): 148–62
Nigel Fabb, “The Difference between English Restrictive and Nonrestrictive Relative Clauses,” Journal of Linguistics 26, no. 1 (March 1990): 57–77
Robert D. Borsley, “More on the Difference between English Restrictive and Nonrestrictive Relative Clauses,” Journal of Linguistics 28, no. 1 (March 1992): 139–48

Notes

1. Garner’s Modern American Usage, 3rd ed., s.v. “that. A. And which.”

Which Hunting

I meant to blog about this several weeks ago, when the topic came up in my corpus linguistics class with Mark Davies, but I didn’t have time then. And I know the that/which distinction has been done to death, but I thought this was an interesting look at the issue that I hadn’t seen before.

For one of our projects in the corpus class, we were instructed to choose a prescriptive rule and then examine it using corpus data, determining whether the rule was followed in actual usage and whether it varied over time, among genres, or between the American and British dialects. One of my classmates (and former coworkers) chose the that/which rule for her project, and I found the results enlightening.

She searched for the sequences “[noun] that [verb]” and “[noun] which [verb],” which aren’t perfect—they obviously won’t find every relative clause, and they’ll pull in a few non-relatives—but the results serve as a rough measurement of their relative frequencies. What she found is that before about the 1920s, the two were used with nearly equal frequency. That is, the distinction did not exist. After that, though, which takes a dive and that surges. The following chart shows the trends according to Mark Davies’ Corpus of Historical American English and his Google Books N-grams interface.
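Her searches ran against COHA and Google Books, but the same kind of rough query is easy to sketch on any tagged corpus. Here’s a hedged approximation in Python with NLTK’s Brown corpus; it’s a different corpus and tagset, so the numbers are only illustrative, not her data.

```python
# A rough approximation of the "[noun] that [verb]" versus
# "[noun] which [verb]" searches, run on NLTK's tagged Brown corpus.
# Brown noun tags start with "NN"; verb tags start with "VB".
import nltk
from nltk.corpus import brown

nltk.download("brown", quiet=True)

counts = {"that": 0, "which": 0}
for (w1, t1), (w2, t2), (w3, t3) in nltk.trigrams(brown.tagged_words()):
    if t1.startswith("NN") and t3.startswith("VB") and w2.lower() in counts:
        counts[w2.lower()] += 1

print(counts)  # rough relative frequencies of the two relativizers
```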

It’s interesting that although the two corpora show the same trend, Google Books lags a few decades behind. I think this is a result of the different style guides used in different genres. Perhaps style guides in certain genres picked up the rule first, from whence it disseminated to other style guides. And when we break out the genres in COHA, we see that newspapers and magazines lead the plunge, with fiction and nonfiction books following a few decades later, though use of which is apparently in a general decline the entire time. (NB: The data from the first decade or two in COHA often seems wonky; I think the word counts are low enough in those years that strange things can skew the numbers.)

Proportion of "which" by genres

The strange thing about this rule is that so many people not only take it so seriously but slander those who disagree, as I mentioned in this post. Bryan Garner, for instance, solemnly declares—without any evidence at all—that those who don’t follow the rule “probably don’t write very well,” while those who follow it “just might.”[1] (This elicited an enormous eye roll from me.) But Garner later tacitly acknowledges that the rule is an invention—not by the Fowler brothers, as some claim, but by earlier grammarians. If the rule did not exist two hundred years ago and was not consistently enforced until the 1920s or later, how did anyone before that time ever manage to write well?

I do say enforced, because most writers do not consistently follow it. In my research for my thesis, I’ve found that changing “which” to “that” is the single most frequent usage change that copy editors make. If so many writers either don’t know the rule or can’t apply it consistently, it stands to reason that most readers don’t know it either and thus won’t notice the difference. Some editors and grammarians might take this as a challenge to better educate the populace on the alleged usefulness of the rule, but I take it as evidence that it’s just not useful. And anyway, as Stan Carey already noted, it’s the commas that do the real work here, not the relative pronouns. (If you’ve already read his post, you might want to go and check it out again. He’s added some updates and new links to the end.)

And as I noted in my previous post on relatives, we don’t observe a restrictive/nonrestrictive distinction with who(m) or, for that matter, with relative adverbs like where or when, so at the least we can say it’s not a very robust distinction in the language and certainly not necessary for comprehension. As with so many other useful distinctions, its usefulness is taken to be self-evident, but the evidence of its usefulness is less than compelling. It seems more likely that it’s one of those random things that sometimes gets grammaticalized, like gender or evidentiality. (Though it’s not fully grammaticalized, because it’s not obligatory and is not a part of the natural grammar of the language, but is a rule that has to be learned later.)

Even if we just look at that and which, we find a lot of exceptions to the rule. You can’t use that as the object of a preposition, even when it’s restrictive. You can’t use it after a demonstrative that, as in “Is there a clear distinction between that which comes naturally and that which is forced, even when what’s forced looks like the real thing?” (I saw this example in COCA and couldn’t resist.) And Garner even notes “the exceptional which”, which is often used restrictively when the relative clause is somewhat removed from its noun.[2] Furthermore, restrictive which is frequently used in conjoined relative clauses, such as “Eisner still has a huge chunk of stock options—about 8.7 million shares’ worth—that he can’t exercise yet and which still presumably increase in value over the next decade,” to borrow an example from Garner.[3]

Something that linguistics has taught me is that when your rule is riddled with exceptions and wrinkles, it’s usually a sign that you’ve missed something important in its formulation. I’ll explain what I think is going on with that and which in a later post.

Notes

1. Garner’s Modern American Usage, 3rd ed., s.v. “that. A. And which.”
2. S.v. “Remote Relatives. B. The Exceptional which.”
3. S.v. “which. D. And which; but which.”

Distinctions, Useful and Otherwise

In a recent New York Times video interview, Steven Pinker touched on the topic of language change, saying, “I think that we do sometimes lose distinctions that it would be nice to preserve—disinterested to mean ‘impartial’ as opposed to ‘bored’, for example.”

He goes on to make the point that language does not degenerate, because it constantly replenishes itself—a point which I agree with—but that line caught the attention of Merriam-Webster’s Peter Sokolowski, who said, “It’s a useful distinction, but why pick a problematic example?” I responded, “I find it ironic that such a useful distinction is so rarely used. And its instability undermines the claims of usefulness.”

What Mr. Sokolowski was alluding to was the fact that the history of disinterested is more complicated than the simple laments over its loss would indicate. If you’re unfamiliar with the usage controversy, it goes something like this: disinterested originally meant ‘impartial’ or ‘unbiased’, and uninterested originally meant ‘bored’, but now people have used disinterested to mean ‘bored’ so much that you can’t use it anymore, because too many people will misunderstand you. It’s an appealing story that encapsulates prescriptivists’ struggle to maintain important aspects of the language in the face of encroaching decay. Too bad it’s not really true.

I won’t dive too deeply into the history of the two words—the always-excellent Merriam-Webster’s Dictionary of English Usage spends over two pages on the topic, revealing a surprisingly complex history—but suffice it to say that disinterested is, as Peter Sokolowski mildly put it, “a problematic example”. The first definition the OED gives for disinterested is “Without interest or concern; not interested, unconcerned. (Often regarded as a loose use.)” The first citation dates to about 1631. The second definition (the correct one, according to traditionalists) is “Not influenced by interest; impartial, unbiased, unprejudiced; now always, Unbiased by personal interest; free from self-seeking. (Of persons, or their dispositions, actions, etc.)” Its first citation, however, is from 1659. And uninterested was originally used in the “impartial” or “unbiased” senses now attributed to disinterested, though those uses are obsolete.

It’s clear from the OED’s citations that both meanings have existed side by side from the 1600s. So there’s not so much a present confusion of the two words as a continuing, three-and-a-half-century-long confusion. And for good reason, too. The positive form interested is the opposite of both disinterested and uninterested, and yet nobody complains that we can’t use it because readers won’t be sure whether we mean “having the attention engaged” or “being affected or involved”, to borrow the Merriam-Webster definitions. If we can use interested to mean two different things, why do we need two different words to refer to the opposite of those things?

And as my advisor, Don Chapman, has written, “When gauging the usefulness of a distinction, we need to keep track of two questions: 1) is it really a distinction, or how easy is the distinction to grasp; 2) is it actually useful, or how often do speakers really use the distinction.”[1] Chapman adds that “often the claim that a distinction is useful seems to rest on little more than this: if the prescriber can state a clear distinction, the distinction is considered to be desirable ipso facto.” He then asks, “But how easy is the distinction to maintain in actual usage?” (151).

From the OED citations, it’s clear that speakers have never been able to fully distinguish between the two words. Chapman also pointed out to me that the two prefixes in question, dis- and un-, do not clearly indicate one meaning or the other. The meanings of the two words come from different meanings of the root interested, not the prefixes, so the assignment of meaning to form is arbitrary and must simply be memorized, which makes the distinction difficult for many people to learn and maintain. And even those who do learn the distinction do not employ it very frequently. I know this is anecdotal, but it seems to me that disinterested is far more often mentioned than it is used. I can’t remember the last time I spotted a genuine use of disinterested in the wild.

I think it’s time we dispel the myth that disinterested and uninterested epitomize a lost battle to preserve useful distinctions. The current controversy over its use is not indicative of current laxness or confusion, because there was never a time when people managed to fully distinguish between the two words. If anything, disinterested epitomizes the prescriptivist tendency to elegize the usage wars. The typical discussion of disinterested is often light on historical facts and heavy on wistful sighs over how we can no longer use a word that was perhaps never as useful as we would like to think it was.

Notes

1. Don Chapman, “Bad Ideas in the History of English Usage,” in Studies in the History of the English Language 5, Variation and Change in English Grammar and Lexicon: Contemporary Approaches, ed. Robert A. Cloutier, Anne Marie Hamilton-Brehm, William A. Kretzschmar Jr. (New York: Walter de Gruyter, 2010), 151

Till Kingdom Come

The other day on Twitter, Bryan A. Garner posted, “May I ask a favor? Would all who read this please use the prep. ‘till’ in a tweet? Not till then will we start getting people used to it.” I didn’t help out, partly because I hate pleas of the “Repost this if you agree!” variety and partly because I knew it would be merely a symbolic gesture. Even if all of Garner’s followers and all of their followers used “till” in a tweet, it wouldn’t even be a blip on the radar of usage.

But it did get me thinking about the word till and the fact that a lot of people seem to regard it as incorrect and forms like 'til as correct. The assumption for many people seems to be that it’s a shortened form of until, so it requires an apostrophe to signal the omission. Traditionalists, however, know that although the two words are related, till actually came first, appearing in the language about four hundred years before until.

Both words came into English via Old Norse, where the preposition til had replaced the preposition to. (As I understand it, modern-day North Germanic languages like Swedish and Danish still use it this way.) Despite their similar appearances, to and till are not related; till comes from a different root meaning ‘end’ or ‘goal’ (compare modern German Ziel ‘goal’). Norse settlers brought the word til with them when they started raiding and colonizing northeastern Britain in the 800s.

There was also a compound form, until, from und + til. Und was another Old Norse preposition deriving from the noun und, which is cognate with the English word end. Till and until have been more or less synonymous throughout their history in English, despite their slightly different forms. And as a result of the haphazard process of spelling standardization in English, we ended up with two ls on till but only one on until. The apostrophized form 'til is an occasional variant that shows up far more in unedited than edited writing. Interestingly, the OED’s first citation for 'til comes from P. G. Perrin’s An Index to English in 1939: “Till, until, (’til), these three words are not distinguishable in meaning. Since ’til in speech sounds the same as till and looks slightly odd on paper, it may well be abandoned.”

Mark Davies’ Corpus of Historical American English, however, tells a slightly different story. It shows a slight increase in 'til since the mid-twentieth century, though it has been declining again slightly in the last thirty years. And keep in mind that these numbers come from a corpus of edited writing drawn from books, magazines, and newspapers. It may well be increasing much faster in unedited writing, with only the efforts of copy editors keeping it (mostly) out of print. This chart shows the relative proportions of the three forms—that is, the proportion of each compared to the total of all three.

[Chart: relative proportions of till, until, and 'til]
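To make the chart’s metric concrete, here’s a toy calculation of those relative proportions; the counts are invented for illustration and are not COHA’s actual figures.

```python
# Toy illustration of the chart's metric: each form's share of the
# combined total of all three forms. Counts are made up, not COHA data.
counts = {"till": 120, "until": 800, "'til": 30}
total = sum(counts.values())
for form, n in counts.items():
    print(f"{form}: {n / total:.1%}")
```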

As Garner laments, till is becoming less and less common in writing and may all but disappear within the next century, though predicting the future of usage is always a guessing game, even with clear trends like this. Sometimes they spontaneously reverse, and it’s often not clear why. But why is till in decline? I honestly don’t know for sure, but I suspect it stems from either the idea that longer words are more formal or the perception that it’s a shortened form of until. Contractions and clipped forms are generally avoided in formal writing, so this could be driving till out of use.

Note that we don’t have this problem with to and unto, probably because to is one of the most common words in the language, occurring about 9,000 times per million words in the last decade in COHA. By comparison, unto occurs just under 70 times per million words. There’s no uncertainty or confusion about the use or spelling of to. We tend to be less sure of the meanings and spellings of less frequent words, and this uncertainty can lead to avoidance. If you don’t know which form is right, it’s easy to just not use it.

At any rate, many people are definitely unfamiliar with till and may well think that the correct form is 'til, as Gabe Doyle of Motivated Grammar did in this post four years ago, though he checked his facts and found that his original hunch was wrong.

He’s far from the only person who thought that 'til was correct. When my then-fiancée and I got our wedding announcements printed over eight years ago, the printer asked us if we really wanted “till” instead of “'til” (“from six till eight that evening”). I told him that yes, it was right, and he kind of shrugged and dropped the point, though I got the feeling he still thought I was wrong. He probably didn’t want to annoy a paying customer, though.

And though this is anecdotal and possibly falls prey to the recency illusion, it seems that 'til is on the rise in signage (frequently as ‘til, with a single opening quotation mark rather than an apostrophe), and I even spotted a til' the other day. (I wish I’d thought to get a picture of it.)

I think the evidence is pretty clear that, barring some amazing turnaround, till is dying. It’s showing up less in print, where it’s mostly been replaced by until, and the traditionally incorrect 'til may be hastening its death as people become unsure of which form is correct or even become convinced that till is wrong and 'til is right. I’ll keep using till myself, but I’m not holding out hope for a revival. Sorry, Garner.
