Can You Repeat That?

Sat Oct 01 2016

Today I want to talk about patterns of repetition in writing. I apologize in advance for the math that follows (but not really, because I love this kind of stuff). I assure you it's relevant, and I will not be suggesting you do any actual math on your story.

For those who have been following along, this example picks up after the scene from two weeks ago:

Sonja dug through her pack, drawing from its very bottom the oilskin parcel she had carried all the way from Westport. Why was the parcel so important it was worth hiring her to deliver it? Why had the Regent been so insistent both that she deliver the parcel as quickly as possible, and that no one should learn of it?

If not for the Regent's urgency, she would not have taken the shorter but more dangerous forest path. She would not have been bitten by that damnable wereling. She teased open the leather straps tying the parcel shut. Whatever it is, she thought, it had better be worth it.

Untied, she peeled back the parcel's layers, revealing a layer of waxed paper which crinkled softly as she opened it. There in the parcel's heart was a velvet pouch, its drawstring pulled tight. She opened it, and a long, heavy key fell into her hand.

It's not bad, particularly, but do you notice a problem? It's with the words "parcel" and "open".

In those 153 short words, "parcel" appears six times. In the 94 words of the second and third paragraphs, "open" or "opened" appears three times.

That's a lot. But is it a problem?

Natural word frequency

Having been exposed to language your whole life, you have an innate sense of how often various words should occur. You know, without even thinking about it, that "there" is a more common word than "house", and that "trophy" is more common than "badminton". In fact, "there" is the 35th most common word in English, while "house" is number 165. "Trophy" and "badminton" are numbers 5914 and 18660, respectively, according to wordcount.org.

A word's natural frequency turns out to be tied directly to its place on that list: roughly speaking, the word at position n occurs about 1/n as often as the word at the top. "The", which is the most common word in English, accounts for about 6% of all the words we read or hear. "Of", which is second on the list, occurs about 1/2 as often as "the", or 3%. "And" is third on the list, meaning it occurs about 1/3 as often as "the", or 2%. And so on.
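
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: it assumes the 6% figure for "the" and a plain 1/rank falloff, and the name natural_share is made up for this example.

```python
# Assumption: the top-ranked word ("the") makes up about 6% of all words,
# and the word at rank n occurs about 1/n as often as the top word.
TOP_SHARE = 0.06

def natural_share(rank, top_share=TOP_SHARE):
    """Approximate share of all words taken up by the word at this rank."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return top_share / rank

# Ranks taken from the figures quoted in this post.
for word, rank in [("the", 1), ("of", 2), ("and", 3), ("there", 35), ("house", 165)]:
    print(f"{word}: {natural_share(rank):.4%}")
```

Real word counts only follow this curve approximately, so treat the results as ballpark figures.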

Doing the math

In our example passage, "parcel" accounts for 6/153rds of the words, or almost 4%. Once it starts occurring, "open" accounts for 3/94ths, or just over 3%.

Yet those words are, respectively, 8770th and 285th on the list of most common words. Therefore, the natural frequency of "parcel" should be about 6% × 1/8770 ≈ 0.0007%. For "open," it's 6% × 1/285 ≈ 0.02%.

Thus, in the sample passage, the actual 4% frequency of "parcel" is far in excess of its natural 0.0007% frequency. The 3% for "open" is similarly way above 0.02%.
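
For anyone who wants to see that gap as a single number, here is a rough Python sketch of the same arithmetic. It assumes the 6% figure for "the" and the 1/rank rule of thumb; overuse_factor is a hypothetical helper, not a standard measure.

```python
def overuse_factor(count, total_words, rank, top_share=0.06):
    """How many times more often a word appears than its rank would predict."""
    actual = count / total_words   # share of the passage the word takes up
    natural = top_share / rank     # share predicted by the 1/rank rule
    return actual / natural

# Counts and ranks as given in this post.
parcel_factor = overuse_factor(count=6, total_words=153, rank=8770)
open_factor = overuse_factor(count=3, total_words=94, rank=285)

print(f"parcel: roughly {parcel_factor:,.0f} times its natural rate")
print(f"open: roughly {open_factor:,.0f} times its natural rate")
```

Any short passage will push its key nouns well above their natural rate, so the exact figure matters less than the size of the gap.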

So what's the problem?

The problem is that these words occur too often within a short span of text, and because your readers have just as much of an innate sense of word frequency as you do, they will notice the bizarrely high rate at which those words are popping up.

The words become repetitive and draw attention to themselves. This pulls readers' focus out of your story and onto your writing, and not in a good way. As a rule, writers should strive to avoid (mis)uses of language that pull the reader out of their immersive reading experience.

Practical tips

Does this mean you need to count the occurrences of every word in your story and crunch the numbers as above?

Of course not.

It just means that while you write, you should be sensitive to the fact that words can be overused. Since unnaturally frequent words distract readers from your story, you should strive to avoid repeating less-common words too close together.

Some guidelines:

  1. Don’t worry about the little, functional words of English. Articles, prepositions, pronouns, and so forth. These occur with such high frequency anyway that readers are pretty much desensitized to them. You'd have to work really hard to make readers notice an overabundance of "the," for example.
  2. Do worry about the more specific, meaningful words. Nouns, verbs, adjectives and adverbs. Avoid using the same meaningful word twice in a single sentence. Strive, as much as you can, to avoid repeating one within a single paragraph.
  3. The more unusual a word is, the less often you should use it. A word like "paragon" (number 30144 on the list) probably shouldn't occur more than twice in a whole novel.
  4. You can make exceptions for words which acquire specific meanings relative to the story itself. For example, our sample passage above refers to a "Regent". As an ordinary word, that's #10176 on the list, and thus should occur, statistically speaking, about 0.3 times in a 50,000-word novel. But since in Sonja's story "Regent" refers to a specific person, it's fine to use it as often as necessary. Just, you know, don't violate guideline #2.
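
The "0.3 times in a 50,000-word novel" figure in guideline 4 comes from the same rule of thumb. A minimal sketch, again assuming 6% for "the" and a 1/rank falloff, with expected_occurrences as a made-up helper name:

```python
def expected_occurrences(rank, novel_words, top_share=0.06):
    """Rough number of times a word of this rank would turn up by chance."""
    return novel_words * top_share / rank

# Ranks as quoted in the guidelines above.
regent = expected_occurrences(rank=10176, novel_words=50_000)
paragon = expected_occurrences(rank=30144, novel_words=50_000)

print(f"Regent: about {regent:.1f} times")
print(f"paragon: about {paragon:.1f} times")
```

This is a statistical average over all English text, not a quota; guideline 4 still applies to words your story gives a special meaning.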

Fixing overuse problems

Fortunately, a few simple techniques fix most overuse problems.

Here's a revised version of the above passage, with seven changes:

Sonja dug through her pack, drawing from its very bottom the oilskin parcel she had carried all the way from Westport. Why was it so important it was worth hiring her to deliver it? Why had the Regent been so insistent on both speed and secrecy?

If not for his urgency, she would not have taken the shorter but more dangerous forest path. She would not have been bitten by that damnable wereling. She teased out the knots in the leather straps tying the package shut. Whatever it is, she thought, it had better be worth it.

Untied, she peeled back the oilskin, revealing a layer of waxed paper which crinkled softly as she unwrapped it. There in the parcel's heart was a velvet pouch, its drawstring pulled tight. She opened it, and a long, heavy key fell into her hand.

The first change replaces "parcel" with a pronoun. The second one re-thinks the lengthy phrasing about the Regent's concerns into a shorter phrase that essentially means the same thing but avoids referring to the parcel directly. The third one is another pronoun replacement. The fourth one replaces "teased open" with a synonymous phrasal verb, "teased out". However, idiomatically, "teased out" can't relate to the leather straps directly, but has to relate to the knots in them. It's more words, but that's ok; avoiding a re-use of "open" is worth the extra length.

The fifth one replaces "parcel" with a direct synonym, "package". The sixth one replaces "parcel's layers" with "oilskin", which in context is perfectly clear anyway. Readers know what she's doing, and we need not over-explain. And finally, the seventh change replaces "opened" with "unwrapped", which is a more colorful verb anyway.

All in all, that reduces "parcel" to two instances, and "open" to one. That's still statistically too many for "parcel", but it's a vast improvement. And because the story gives "parcel" a specific meaning, much as it does "Regent", that's ok under guideline #4.
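
If you ever do want a quick before-and-after tally, a few lines of Python will do it. This is just a sketch: count_family is a made-up helper that counts any word starting with a given stem, so "open" also catches "opened" (and would catch unrelated words like "openly"). Only the third paragraph of each version is used here to keep the example short.

```python
import re

def count_family(text, stem):
    """Count words that begin with stem, ignoring case (open, opened, ...)."""
    pattern = re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)
    return len(pattern.findall(text))

# Third paragraph of the sample passage, before and after the revision.
original = (
    "Untied, she peeled back the parcel's layers, revealing a layer of waxed "
    "paper which crinkled softly as she opened it. There in the parcel's heart "
    "was a velvet pouch, its drawstring pulled tight. She opened it, and a "
    "long, heavy key fell into her hand."
)
revised = (
    "Untied, she peeled back the oilskin, revealing a layer of waxed paper "
    "which crinkled softly as she unwrapped it. There in the parcel's heart "
    "was a velvet pouch, its drawstring pulled tight. She opened it, and a "
    "long, heavy key fell into her hand."
)

for stem in ("parcel", "open"):
    before = count_family(original, stem)
    after = count_family(revised, stem)
    print(f"{stem}: {before} before, {after} after")
```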

Geeking out

The mathematical basis for all this is Zipf's Law, which relates frequency to commonness rankings not just in language but in a surprisingly diverse range of natural and manmade phenomena.

If you want to learn more, the Wikipedia article is pretty good, while this Vsauce video does a nice job of showing just how widespread Zipf's Law turns out to be.