This data provided one piece of a complex puzzle. The rest of the puzzle hit my inbox with a mighty thud last week. I received an email from an author with advanced coding skills who had created a software program that can crawl online bestseller lists and grab mountains of data. All of this data is public (it’s online for anyone to see), but until now it’s been extremely difficult to gather, aggregate, and organize. This program, however, is able to do in a day what would take hundreds of volunteers with web browsers and pencils a week to accomplish. The first run grabbed data on nearly 7,000 e-books from several bestselling genre categories on Amazon. Subsequent runs have looked at data for 50,000 titles across all genres. You can ask this data some pretty amazing questions, questions I’ve been asking for well over a year. And now we finally have some answers.
When Amazon reports that self-published books make up 25% of the top 100 list, the reaction from many is that these are merely the outliers. We hear that authors stand no chance if they self-publish and that most won’t sell more than a dozen copies in their lifetime if they do. (The same people rarely point out that all bestsellers are outliers and that the vast majority of those who go the traditional route are never published at all.) Well, now we have a large enough sample of data to help glimpse the truth. What emerges is, to my knowledge, the clearest public picture to date of what’s happening in this publishing revolution. It’s a lot to absorb, but I believe there’s much here to learn.
This is definitely worth a read. We hope that data crunchers will download the raw data Hugh has provided and do their own analysis. We need more of this.