This data provided one piece of a complex puzzle. The rest of the puzzle hit my inbox with a mighty thud last week. I received an email from an author with advanced coding skills who had created a software program that can crawl online bestseller lists and grab mountains of data. All of this data is public (it’s online for anyone to see), but until now it’s been extremely difficult to gather, aggregate, and organize. This program, however, is able to do in a day what would take hundreds of volunteers with web browsers and pencils a week to accomplish. The first run grabbed data on nearly 7,000 e-books from several bestselling genre categories on Amazon. Subsequent runs have looked at data for 50,000 titles across all genres. You can ask this data some pretty amazing questions, questions I’ve been asking for well over a year [link]. And now we finally have some answers. When Amazon reports that self-published books make up 25% of the top 100 list, the reaction from many is that these are merely the outliers. We hear that authors stand no chance if they self-publish and that most won’t sell more than a dozen copies in their lifetime if they do. (The same people rarely point out that all bestsellers are outliers and that the vast majority of those who go the traditional route are never published at all.) Well, now we have a large enough sample of data to help glimpse the truth. What emerges is, to my knowledge, the clearest public picture to date of what’s happening in this publishing revolution. It’s a lot to absorb, but I believe there’s much here to learn.

This is definitely worth a read. We hope that data crunchers will download the raw data Hugh has provided and do their own analysis. We need more of this.