On an Unscientific Reading Experiment

by F.

My cat can’t read and yet I can, which I find amazing. I can look at a bunch of tiny squiggles on a page, each one no more than an eighth of an inch tall, and discern meaning from them—construct in my mind some sort of story or scene or fact or whatever. This floors me. When I think about the evolution of human beings, I don’t see anything that could have given rise to our ability to read.

This may be (and probably is) just ignorance on my part, but for other talents we have, such as color vision, it’s not as hard to tell an evolutionary story. For instance, color vision may have allowed us to distinguish different kinds of plants, some of which were poisonous. I’m not saying that’s exactly right, but there is some plausible story that can be told about color vision, 3-D vision, the length of our digestive tract (longer than a cat’s due to differences in diet), and so on.

But reading? I just don’t get that. Even the process of reading is amazing, because it seems to involve a lot of guesswork. This is a bit like music. According to one popular theory of music cognition, if you send a stream of music at a human being (let’s say it’s a simple melody), the brain is constantly making guesses about what will come next. The music hits the human being serially. It’s a string of information. The brain has three basic guesses about the next note. Higher? Lower? Same? Previous notes alter the subjective probability of the next note being higher, lower, or the same. So, if you’re in the key of C, you’ll expect (that is, think it likely) that the melody will resolve to C. The pleasure of music comes from its being somewhat predictable (our guesses are right sometimes) but also somewhat unpredictable (we are sometimes wrong). This is a great simplification, but it seems like the right approach.

Reading is the—fish. Didn’t see that coming, did you? It was highly unlikely that I would say “fish” after “the.” The point: reading is like music. You have expectations of what comes next. Subjective probabilities. Bets. Guesses. Hunches. Words like “but” and “although” and “however” make us (the readers) make different guesses or bets. “Oh, now we’ll get the other side of the argument,” or “now things will go from good to bad.” It seems to me that any theory of reading worth a damn must be probabilistic like this. It must describe how a brain makes predictions of what will come next when presented with a stream of information, presented linearly on a page.
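To make the "bets" idea concrete, here is a minimal sketch of a guessing reader: a toy bigram model that counts which words have followed which, and then reports how surprised it is (in bits) by the next word. Everything in it is my own illustration. The function names, the made-up three-sentence corpus, and the add-one smoothing are assumptions, not a claim about how a brain or any real theory of reading works.

```python
import math
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word follows each other word."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def surprisal(follows, prev, nxt, vocab_size, alpha=1.0):
    """Bits of surprise at seeing `nxt` after `prev`.

    Add-alpha smoothing gives unseen words a small, nonzero probability.
    """
    counts = follows.get(prev, Counter())
    total = sum(counts.values())
    p = (counts[nxt] + alpha) / (total + alpha * vocab_size)
    return -math.log2(p)


# A made-up toy corpus, purely for illustration.
corpus = (
    "reading is the act of guessing . "
    "reading is the art of betting . "
    "music is the art of guessing ."
)
follows = train_bigrams(corpus)

# Reserve one extra vocabulary slot for any word the model has never seen.
vocab_size = len(set(corpus.lower().split())) + 1

expected = surprisal(follows, "the", "art", vocab_size)
unexpected = surprisal(follows, "the", "fish", vocab_size)
print(f"'art' after 'the':  {expected:.2f} bits")
print(f"'fish' after 'the': {unexpected:.2f} bits")
```

On this toy corpus, "fish" after "the" costs more bits than "art" after "the," which is the numerical version of "didn’t see that coming." A real reader obviously conditions on far more than the previous word, so treat this as a cartoon of the idea.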

I was reading a book entitled Understanding Expository Text yesterday, which I was hoping would shed some light on this topic. But it was a disappointment. Mostly because the models use old-school sentential logic. I didn’t see a lot of “psychology” in the book, even though it is classed as being on the “psychology of reading.” If the brain is anything, it is a quasi-Bayesian, belief-updating machine. So any theory that just gives static redescriptions of textual phenomena in a new lingo (i.e., sentential logic) is unlikely to be illuminating, in my humble opinion.

My gut suggests to me that “reading is guessing followed by confirmation or infirmation.” Not a groundbreaking thought. At all. But I thought about the following self-experiment. Suppose I am reading, say, The Economist. How much does the order of presentation of the text matter to my comprehension? For instance, suppose I could look at a concordance of the text, along with (a) the section of the magazine containing the piece (i.e., leaders, Business, Finance and Economics, etc.) and (b) the titles (both main and sub). How much of the story would I get at that point compared to after having read the story? Eighty percent? Twenty?

Here’s the story I used. The piece is in the section called “The Americas,” which means Latin America. The title is “The Tough Get Going,” under the topic “Crime in Mexico.” The “deck copy” or “bank” or “standfirst” (that is, the “sub heading” summarizing the story) is “The new president has sent the army after the drug mobs. More importantly, he has started to reform the police.” Now for the concordance. I’ve used a basic text analysis tool from DEVONThink to get the most frequent words (in order from most to least):

  1. the
  2. and
  3. police
  4. The
  5. for
  6. drug
  7. has
  8. that
  9. government
  10. local
  11. Mexico
  12. Calderón
  13. García
  14. was
  15. will
  16. been
  17. federal
  18. forces
  19. they
  20. AFI
  21. army
  22. being
  23. But
  24. control
  25. country
  26. far
  27. first
  28. force
  29. Fox
  30. from
  31. have
  32. Michoacán
  33. ministry
  34. month
  35. murders
  36. says
  37. security
  38. Some
  39. Tijuana
  40. wants

And so on. These are not all of them. But you get the idea. Then I look at the “weight” of the words, which is a special “relevance” property that DEVONThink assigns the words. I don’t know the algorithm, but I can guess that it compares how likely a word is generally (or in the rest of the database) with its frequency in this text sample. Or something like that. Glancing at this list, I can get a pretty good idea of what is in the story.
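Since I don’t know what DEVONThink actually computes, here is only a sketch of my guess: count the words in the sample, then score each one by the log-ratio of its relative frequency in the sample to its relative frequency in some background text. The function names, the add-one smoothing, and both snippets of text are my own assumptions for illustration, and the snippets are invented, not taken from the article.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Case-sensitive word counts, so 'the' and 'The' stay separate.

    The pattern matches runs of letters, including accented ones.
    """
    return Counter(re.findall(r"[^\W\d_]+", text))


def relevance(sample_counts, background_counts, alpha=1.0):
    """Guess at a 'weight': log-ratio of sample frequency to background frequency.

    Positive scores mean the word is more common in the sample than in
    the background; negative scores mean it is less common.
    """
    sample_total = sum(sample_counts.values())
    background_total = sum(background_counts.values())
    vocab_size = len(set(sample_counts) | set(background_counts))
    scores = {}
    for word, count in sample_counts.items():
        p_sample = (count + alpha) / (sample_total + alpha * vocab_size)
        p_background = (background_counts[word] + alpha) / (
            background_total + alpha * vocab_size
        )
        scores[word] = math.log(p_sample / p_background)
    return scores


# Invented snippets standing in for the article and "the rest of the database."
sample = "the police and the army say the police will reform"
background = "the cat and the dog and the bird saw the fish in the pond"

counts = word_counts(sample)
scores = relevance(counts, word_counts(background))

print(counts.most_common(3))
for word in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(word, round(scores[word], 2))
```

By raw count, "the" wins, just as in my list above. By this guessed weight, "police" outranks "the," because "the" is common everywhere and so tells you little about this particular story.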

Then I ask myself some simple “story” questions, the kind of thing a human mind is good at and at which computers suck. For instance, Who wants what from whom and why now? Well, the government wants to crack down on crime and wants to reform the police. The President is “new” so he wants to make a splash (maybe). We got all this info in the standfirst. So that means the cops are corrupt. Probably in league with the gangs. What do the gangs want? Sell drugs, make money, and so on. OK. Got that. Now, since this story is in the magazine this week, it means that there is probably something new, so the new President is trying out some new tactics.

Already, I can get a pretty good picture of what this story is most likely about. Now I just have to read the text and look for surprises. This minimizes the cost of my consuming the story. Basically, I suspect this story will boil down to something like this: President wants to reform the cops and go after the drug gangs, but he faces obstacles X, Y, Z. It’s too soon to tell whether he will succeed (or he is doomed or he will likely succeed). Time will tell. The End.

With this in mind, I read the story. It was pretty much in line with my guesses, which shouldn’t be surprising, because The Economist is “well written” (i.e., easy to understand). But some other facts come into play in the actual text: why the President (Felipe Calderón) actually wants to do this, where he is focusing, and why he thinks the cops won’t become corrupt this time. These details fill out the basic schema I guessed about earlier. In addition, there are a lot of numbers: 7,000 troops, 2,100 drug-related murders last year, 600 killings, and so on. And then there is the word “severed” which, of course, jumped out at me in the concordance. I immediately thought “limb.” I was wrong: “in a notorious case, five severed heads were dumped in a dance hall in Michoacán.” Well, at least I was close.

The point is that my guesses got me, say, 80% of the way to the story’s meaning. The actual reading of the stream of letters on the page got me farther, but I think the first step yielded more than the second. Which makes me think reading is guessing, and good reading is guessing correctly. It follows that good writing is writing that makes it easy for the reader to guess correctly.