On Clouds

by F.

Lexical statistics are highly informative. Unfortunately, a long list of word frequencies is hard for the mind to digest. But now that the “cloud” visualization tool is ubiquitous, I suspect we’ll see more of this kind of analysis:

This screenshot shows an analysis Todd Bishop did of some Microsoft documents.

Even better is Chirag Mehta’s analysis of US Presidential speeches. Notice how large the word “economy” is, along with its related terms. Also notice how large “china” is in the early part of the 20th century. With this sort of analysis, I suspect you could find some surprising patterns in things like legal cases, statutes, tax laws, and other sorts of stuff that are too boring to actually read.
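
For the curious, the idea behind a cloud is simple enough to sketch in a few lines of Python: count the words, keep the most frequent ones, and scale each word's font size by its count. This is only a rough illustration, not how Bishop's or Mehta's tools actually work. The function name `cloud_sizes`, the tiny stop-word list, and the size range are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real tool would use a longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "that", "it"}


def cloud_sizes(text, top_n=10, min_size=10, max_size=48):
    """Map the top_n most frequent words in text to font sizes (in points)."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    top = counts.most_common(top_n)
    if not top:
        return {}

    highest = top[0][1]   # count of the most frequent word kept
    lowest = top[-1][1]   # count of the least frequent word kept
    span = highest - lowest

    sizes = {}
    for word, n in top:
        if span == 0:
            # Every kept word is equally frequent, so give them all the same size.
            sizes[word] = max_size
        else:
            # Linear scaling between min_size and max_size.
            sizes[word] = round(min_size + (n - lowest) * (max_size - min_size) / span)
    return sizes


if __name__ == "__main__":
    sample = "the economy and the economy and china"
    for word, size in cloud_sizes(sample).items():
        print(f"{word}: {size}pt")
```

Linear scaling is the simplest choice here. With real text, where a few words dominate, scaling by the logarithm of the count may give a more readable cloud.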