October 15, 2005

New ways to get lost in Amazon.

Have you noticed the new features in Amazon? I haven't seen any announcements on their main page, but if you go to an individual book now, you can find all kinds of cool new things. I'm not finding them for every book, and it seems they wouldn't be possible for books that aren't open to the "search inside" function. But take, for example, "Dress Your Family in Corduroy and Denim." Let's look at the SIPs -- or "statistically improbable phrases":
Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most distinctive phrases in the text of books in the Search Inside!™ program. To identify SIPs, our computers scan the text of all books in the Search Inside! program. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book.

SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside!. For example, most SIPs for a book on taxes are tax related. But because we display SIPs in order of their improbability score, the first SIPs will be on tax topics that this book mentions more often than other tax books. For works of fiction, SIPs tend to be distinctive word combinations that often hint at important plot elements.
For "Dress Your Family," they've only come up with: "eight black men." If you've read the book, you know what that refers to. If you click on the phrase, you get all the other books with that phrase: here. Useful? Possibly not in this case, though still interesting, in a rather random way.
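Amazon doesn't say how the improbability score is actually computed, so here is only a rough sketch of the idea in the quoted description: count how often a phrase turns up in one book, compare that with how often it turns up across all the books, and rank by the ratio. Everything in it is an assumption made for illustration: the function names, the restriction to two-word phrases, the ratio itself, and the rule that a phrase must appear at least twice.

```python
from collections import Counter


def bigrams(text):
    """Split text into lowercase words and return adjacent word pairs."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def improbable_phrases(book, corpus, top=3, min_count=2):
    """Rank two-word phrases in `book` by how over-represented they are
    relative to `corpus`, a list of texts that includes the book itself.

    The score is a plain frequency ratio, chosen for illustration only.
    """
    book_counts = Counter(bigrams(book))
    corpus_counts = Counter()
    for text in corpus:
        corpus_counts.update(bigrams(text))

    book_total = sum(book_counts.values())
    corpus_total = sum(corpus_counts.values())

    scores = {}
    for phrase, n in book_counts.items():
        if n < min_count:
            continue  # skip phrases that appear only once in the book
        book_rate = n / book_total
        corpus_rate = corpus_counts[phrase] / corpus_total
        scores[phrase] = book_rate / corpus_rate

    return sorted(scores, key=scores.get, reverse=True)[:top]


if __name__ == "__main__":
    book = "eight black men came and eight black men left"
    corpus = [
        book,
        "the men came and the men left",
        "black cats came and black cats left",
    ]
    print(improbable_phrases(book, corpus))
```

A phrase like "came and" shows up everywhere, so it never rises to the top; a phrase that repeats in one book and nowhere else does.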

Then there are the CAPs (capitalized phrases):
Aunt Monie, Monie Changes Everything, Baby Einstein, The Girl Next Door, The Ship Shape, Full House, Nuit of the Living Dead, Blood Work, North Carolina, Slumus Lordicus, Kwik Pik, Great Dane, Puta Lid, Saint Nicholas, Anne Frank, The End of the Affair, Royal Pavilion, Who's the Chef, Apple Pan, The Empire
Each is clickable. You can then see other books that frequently use that capitalized phrase. Well, no one else is using Slumus Lordicus yet, but here are the Anne Frank references.

Then there's the concordance:
Concordance is an alphabetized list of the most frequently occurring words in a book, excluding common words such as "of" and "it." The font size of a word is proportional to the number of times it occurs in the book. Hover your mouse over a word to see how many times it occurs, or click on a word to see a list of book excerpts containing that word.
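The concordance as described is simple enough to sketch: drop the common words, count the rest, keep the most frequent, list them alphabetically, and scale each word's font size by its count. This is not Amazon's code; the stop-word list, the function name, and the 40-point maximum size are all made up for the example.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"of", "it", "the", "a", "an", "and", "to", "in"}


def concordance(text, top=100, max_pt=40):
    """Return (word, count, font_size) tuples in alphabetical order.

    Font size is proportional to the count, with the most frequent
    word rendered at `max_pt` points.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    most_common = counts.most_common(top)
    if not most_common:
        return []
    peak = most_common[0][1]  # count of the most frequent word
    return sorted(
        (word, n, round(max_pt * n / peak)) for word, n in most_common
    )


if __name__ == "__main__":
    for word, n, size in concordance("the dog and the dog of it barked a dog"):
        print(f"{word}: {n} occurrences, {size}pt")
```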
There are also "text stats," showing you how easy or hard the book is to read. Sedaris, it turns out, is awfully easy to read -- 7th grade level. Doesn't say how funny he is, though.

Let me check another, similarly funny, but much darker book I like, "Running With Scissors." Hey, that's even easier to read! Let's try "The Curious Incident of the Dog in the Night-Time." Oh, that's easy too. Hmmmm.... that's got a SIP of "bloody dog," so who else is SIP-ing "bloody dog"?

James Joyce! All right, then. Let's get the text stats for "Ulysses." I see that's easy to read too, so they say. According to the Flesch-Kincaid analysis, it's written at less than a 7th grade level, so if you're not ready to tackle "Dress Your Family in Corduroy and Denim"...

UPDATE: You can check text stats for blogs at this website. In case you're wondering, this blog has the following numbers:
Gunning Fog Index 10.31
Flesch Reading Ease 66.29 (higher is easier, with 100 being the easiest)
Flesch-Kincaid Grade 7.13
The numbers for "Ulysses," in the same order, are: 9.0, 68.1, 6.8. Do you find this confusing? Generally, I think it's a good sign if your stats say the writing is easier to read than you'd expect for the difficulty of the material.
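For anyone wondering how "Ulysses" can come out this easy, it helps to see what the three scores above actually measure. The standard published formulas use only average sentence length and syllables per word (or, for Gunning Fog, the share of words with three or more syllables); nothing in them looks at meaning. Here is a minimal sketch using those formulas. The syllable counter is a crude vowel-group heuristic of my own, so expect its numbers to differ somewhat from what Amazon, MS Word, or any particular website reports.

```python
import re


def flesch_reading_ease(words, sentences, syllables):
    """Higher is easier; scores in the 60s are usually called plain English."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Approximate U.S. school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    """`complex_words` counts words of three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def count_syllables(word):
    """Rough heuristic: one syllable per run of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_stats(text):
    """Compute all three scores for non-empty text, using the heuristic above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllable_counts = [count_syllables(w) for w in words]
    complex_words = sum(1 for n in syllable_counts if n >= 3)
    w, s = len(words), len(sentences)
    return {
        "gunning_fog": gunning_fog(w, s, complex_words),
        "flesch_reading_ease": flesch_reading_ease(w, s, sum(syllable_counts)),
        "flesch_kincaid_grade": flesch_kincaid_grade(w, s, sum(syllable_counts)),
    }


if __name__ == "__main__":
    print(text_stats("The dog ran. The cat sat."))
```

Short sentences made of short words score as easy no matter what they say, which is the gap between these stats and real difficulty.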

8 comments:

Steve Donohue said...

Ulysses is easy to read?

Isn't Ulysses one of the most difficult books in the English language? Maybe the words are simple, but Ulysses is not an easy book to read by any logical measurement.

Ann Althouse said...

Steven: Yeah. It raises interesting questions about the conventional stats used for rating difficulty, don't you think?

miklos rosza said...

The Amazon preview feature for CDs plays much longer excerpts of classical music than of rock, sometimes up to one minute long.

There are a lot of new albums available of tonal late-Romantic stuff that just wasn't on CD five years ago or so. The pianist Marc-Andre Hamelin began digging up obscure but often excellent material that had slid off the radar screen because of the Russian Revolution, WWII, or simple inanition. Stephen Hough, another great pianist, has also dug up some interesting stuff.

I've had wonderful success (after listening to extended excerpts) getting records by Federico Mompou (who's somewhat Satie-esque), Nikolay Roslavets and Sorabji (both of whom to my taste improve somewhat on Scriabin, heretical as that may sound), Karol Szymanowski, and piano concertos by Medtner in particular.

Proust's lover, Reynaldo Hahn, was a composer, and as a curiosity there's a piano concerto by him that's not bad.

The concert repertoire has long been so boringly unimaginative, it's somewhat as if in going back to 60s rock one only knew the Beatles and the Stones. But what about the Kinks, and the Zombies, the Jefferson Airplane, the Doors and the Byrds?

I feel like an archaeologist digging up tombs full of gold.

Learned Fist said...

Gee...according to MS Word, the journal comment that I am (ostensibly) working on has a Flesch Reading Ease of 19.9, and a Flesch-Kincaid Grade Level of 12.0. At least there's one thing that I am better than Joyce at (other than reading an eye chart, of course).

Ann Althouse said...

Learned Fist: It's not better. You need to edit more!

Learned Fist said...

Prof. Althouse:

I know -- I was just making a sarcastic comment on those stats in the first place, and what they can't measure. I used to work around instructional designers who were obsessed with writing at a specific grade level (6th grade for people working in a corporate environment). Many of them confused "reading ease" with "complexity", and complained that they couldn't explain things without getting complex.

Joyce, on the other hand, is an example of the opposite. Just because a 7th grader should be able to understand the words Joyce uses doesn't mean that they'll appreciate the depth in them. Likewise, just because I use long words does not mean that I'm not shallow.

-- Learned Fist
(whose favorite required reading in undergrad was 'A Portrait of the Artist as a Young Man')

Steve Donohue said...

On the other hand, we now have scientific proof that Prof. Althouse is more difficult to read than James Joyce (66.29 to 68.1 on the Flesch reading scale).

Ann Althouse said...

Steven: My blog has a lot of pasted in quoted material. I'll bet I'd come in easier if that stuff were removed. I try to be easy to read!