Are Ebook Samples Really Useful?
Why Did I Do This?
One of the biggest problems with books these days – and I guess I really mean ebooks – is there’s just too much freaking choice. The rise of self-publishing is undoubtedly a good thing: it means that anyone and everyone can get their words online and into a form you can conveniently download onto your phone, tablet or ereader device. But not everyone and anyone can write, or has something interesting to say, or, apparently, can use a spell-checker. And that’s before we get into issues of taste and preference.
One of the tools that sites like Amazon use to counter this problem – along with ratings and reviews – is the availability of free samples. Basically every ebook available from Amazon also has a sample – usually the first chapter or so – that you can download for free. A try-before-you-buy option with no commitment. Good idea, huh?
Yes. Well, I mean I think so in principle but I seem to almost never use them in practice. This post will be partly about why that is. Maybe.
However, what really inspired this post was the way samples get used in the recurring arguments over the relative quality of indie versus trad-published books. It’s a sub-section of the argument about quality, and it basically says that even if there is a lot of unreadable junk out there, it’s possible to find the “gems” by using, amongst other things, samples.
Let’s just say I’m sceptical about this – surely it simply takes too much time to read samples to use them as anything other than a final filter? But that’s a gut reaction. So I thought I’d test it. Sort of.
What did I do?
I decided to throw a few numbers together and see what came out.
On the 16th August 2012 I went to amazon.co.uk and looked at the available fiction ebooks (I almost never read non-fiction). I read mostly from the following genres (Amazon’s categories): SciFi, Fantasy, Crime & Thrillers and Action & Adventure. I looked for a “comedy” category, but although I found “humour” as a category for paper books, I didn’t find one in the Kindle store. In any case, the paper category included non-fiction humour – books of essays, memoirs and so on – which I’m less inclined to read.
Anyway here’s a list of how many titles there were:
|Action & Adventure||38,375|
|Crime & Thrillers||74,605|
Clearly, even without further analysis, that’s too many books. Fortunately Amazon gives me lots of ways to filter these. I can look at just the ones with a 4star or higher review average (I want to read the good ones, right?), or the ones which came out in the last 30 days (let’s assume I check regularly), or I could look at what’s about to come out. Or combine two or more of these.
|Genre||All titles||4star+||Last 30 days||Upcoming||4star+, last 30 days|
|Action & Adventure||38,375||4,508||1,435||70||70|
|Crime & Thrillers||74,605||12,987||3,035||509||250|
Now some of those numbers look less scary but what do they mean in terms of reading samples?
What did I assume?
I needed to make a mathematical model (i.e. a spreadsheet) and for that I needed some generalisations or assumptions.
First let’s assume that it takes me on average 5 minutes to read a sample. Sample sizes vary and I am a slow reader, so I think this is on the low end, but that favours the proposition that samples are a good way to filter.
So let’s plug that into our model and here’s the time taken to read all those samples:
|Genre||All titles||4star+||Last 30 days||Upcoming||4star+, last 30 days|
|Action & Adventure||133d 5h55m||16d 15h40m||5d 23h35m||5h50m||5h50m|
|Crime & Thrillers||259d 1h05m||45d 2h15m||11d 12h55m||2d 18h25m||1d 20h50m|
|Fantasy||135d 16h20m||22d 3h55m||6d 7h05m||1d 14h15m||11h20m|
|SciFi||118d 17h20m||14d 5h50m||5d 5h22m||8h30m||50m|
|All four||645d 16h50m||97d 3h40m||27d 18h30m||3d 23h35m||2d 14h50m|
|SciFi/Fantasy||252d 9h50m||36d 9h45m||11d 6h00m||1d 23h20m||1d 12h10m|
|All Fiction||1949d 12h50m||235d 0h50m||79d 5h05m||11d 7h05m||5d 21h25m|
Whoops! The power of multiplication has turned what had seemed reasonable book numbers into unreasonable lengths of time. I’m clearly not going to spend days (or months, or years!) reading samples to decide my next “full” book read. About the only thing that seems reasonable is 4star SciFi from the last 30 days.
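For anyone who would rather see that as code than as a spreadsheet, here is a minimal Python sketch of this first model. It is not the original spreadsheet: the function names are made up for illustration, and the only real input is the 5-minutes-per-sample assumption.

```python
MINUTES_PER_SAMPLE = 5  # assumption from the post: average time to read one sample


def format_minutes(total_minutes: int) -> str:
    """Render a whole number of minutes in the tables' style, e.g. '5h50m'."""
    days, rest = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rest, 60)
    if days:
        return f"{days}d {hours}h{minutes:02d}m"
    if hours:
        return f"{hours}h{minutes:02d}m"
    return f"{minutes}m"


def time_to_read_all_samples(titles: int) -> str:
    """Time to read the sample of every book in a list of `titles` books."""
    return format_minutes(titles * MINUTES_PER_SAMPLE)


print(time_to_read_all_samples(38_375))  # every Action & Adventure title -> 133d 5h55m
print(time_to_read_all_samples(70))      # a filtered list of 70 titles   -> 5h50m
```

Nothing clever is going on: the time grows in a straight line with the number of titles, so the only way to get a sensible answer is to start from a much shorter list.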
How did I refine the model? (assumptions #2)
OK so I’ve got some numbers now but are they at all useful? Would any sane person really try to read all the samples from a particular category? Probably not. We can refine the model with a couple of additional assumptions. Let’s say I go to Amazon and look at the list for my particular category – it shows me the books in pages of 12 where I get the covers, titles and authors. Probably what I would do is page through this list, click on a few likely looking ones and read the blurb, and if that didn’t immediately disqualify the book I’d then download the sample.
So let’s assume it takes 5 seconds to scan each page of 12 book titles and covers.
Let’s assume that for any list 10% are worth reading the blurb and that it takes 15 seconds to skim-read the blurb.
Remember this is based on testing the idea that samples are actually the way to go so the blurb-reading is really to confirm that the cover/title has given the correct impression as regards genre and probable content.
Finally let’s assume that we commit to read the samples of half the ones where we read the blurb i.e. 5% of the list overall.
Plugging those numbers into our new model, the overall time taken per list is:
|Genre||All titles||4star+||Last 30 days||Upcoming||4star+, last 30 days|
|Action & Adventure||8d 12h19m||1d 21h11m||6h44m||19m||19m|
|Crime & Thrillers||15d 14h34m||3d 13h01m||1d 14h15m||2h23m||1h10m|
|Fantasy||8d 14h16m||1d 5h59m||8h31m||50m||38m|
|SciFi||7d 15h19m||1d 19h16m||6h42m||28m||2m|
|All four||36d 8h29m||5d 11h28m||2d 12h13m||4h02m||2h11m|
|SciFi/Fantasy||14d 5h35m||2d 1h16m||1d 15h13m||1h18m||41m|
|All Fiction||110d 21h01m||13d 6h04m||4d 11h12m||1d 15h17m||6h37m|
Still a lot of large numbers there. I’m automatically rejecting anything over a day. However an hour and a half to check out upcoming SciFi/Fantasy seems doable, as does a couple of hours to review the 4star+ books in my favourite genres from the past 30 days.
So, whilst the numbers overall confirm my gut instinct, limit the scope a little and it may actually be a viable method.
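The refined model can be sketched the same way. This is a rough Python version under the assumptions listed above, not the original spreadsheet: the names are hypothetical, and it allows fractional pages (a list of 70 titles counts as 5.83 pages), which is my own simplification.

```python
# Assumptions taken from the post
BOOKS_PER_PAGE = 12          # titles shown per page of results
SECONDS_PER_PAGE = 5         # time to scan one page of covers and titles
BLURB_FRACTION = 0.10        # share of titles whose blurb gets read
SECONDS_PER_BLURB = 15       # time to skim-read one blurb
SAMPLE_FRACTION = 0.05       # share of titles whose sample gets read
SECONDS_PER_SAMPLE = 5 * 60  # time to read one sample


def funnel_seconds(titles: int) -> float:
    """Total seconds to scan a list, skim some blurbs and read some samples."""
    scanning = titles / BOOKS_PER_PAGE * SECONDS_PER_PAGE  # fractional pages allowed
    blurbs = titles * BLURB_FRACTION * SECONDS_PER_BLURB
    samples = titles * SAMPLE_FRACTION * SECONDS_PER_SAMPLE
    return scanning + blurbs + samples


def whole_minutes(seconds: float) -> int:
    """Drop the leftover seconds, as the tables do."""
    return int(seconds // 60)


for titles in (70, 509, 1_435):
    print(titles, "titles:", whole_minutes(funnel_seconds(titles)), "minutes")
# 70 titles: 19 minutes
# 509 titles: 143 minutes
# 1435 titles: 404 minutes
```

That works out at about 17 seconds per listed title (5/12 + 1.5 + 15), against 300 seconds when every sample gets read, which is why the times shrink by a factor of roughly 18. Almost all of what remains is still sample reading.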
Hold on a second, your model is wrong because…
I can think of two main reasons someone may object to the way I’ve set this up:
- The numbers in your assumptions are wrong. Obviously it’s true that if we vary these numbers we can come out with different answers. All I can say is I think the assumptions are roughly true for me and I’ve tried to err on the side that would lessen time taken so that I’m giving sampling as a method a fair chance.
- In reality, no-one would do it that way. Clearly when you have a nice simple equation you can plug in whatever numbers you like and get an answer. A human being, however, would react differently given 10 books to sample rather than 10,000. In other words the assumptions don’t scale. I think this is true. I think that the larger the number of books you have, the more you would want to use other filters first OR the more likely you are to simply bail out early, i.e. read the first 25 samples, say, and pick the best of those. However I think the numbers are still useful because they show the difficulty of getting your book read, based on sampling alone, if it’s lower down that list. Which I think just confirms what indie authors already know: the importance of getting as many good reviews and ratings as possible, and of getting as high up those popularity lists as possible.
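Both objections can be poked at by treating the assumptions as parameters and turning the question around: given a fixed amount of time, how far down a list do you get? A rough Python sketch, using the refined model's figures as defaults; the function names and the one-hour budget are mine, purely for illustration.

```python
def seconds_per_title(
    books_per_page: int = 12,
    seconds_per_page: float = 5,
    blurb_fraction: float = 0.10,
    seconds_per_blurb: float = 15,
    sample_fraction: float = 0.05,
    seconds_per_sample: float = 300,
) -> float:
    """Average seconds spent per listed title under the given assumptions."""
    return (
        seconds_per_page / books_per_page
        + blurb_fraction * seconds_per_blurb
        + sample_fraction * seconds_per_sample
    )


def titles_within_budget(budget_minutes: float, **assumptions: float) -> int:
    """How many titles, counted from the top of a list, fit in the time budget."""
    return int(budget_minutes * 60 // seconds_per_title(**assumptions))


print(titles_within_budget(60))                          # post's assumptions -> 212
print(titles_within_budget(60, seconds_per_sample=600))  # 10-minute samples  -> 112
```

With the post's assumptions an hour covers roughly the first 212 titles of a list, of which about 10 get their sample read. Doubling the sample time to 10 minutes cuts that to 112 titles. Either way, a book a few hundred places down the list never gets looked at.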
Have I learnt anything?
I think so. I had assumed that if I wanted to find something new to read I should follow the usual routes – reviews from trusted sources and recommendations from family/friends – methods which haven’t changed since I started reading (well before the advent of ebooks). I hadn’t expected sampling would help because I hadn’t expected that the numbers would ever dip to low enough levels to be reasonable. Turns out that may not be true and scanning the latest 4star books in my chosen genres once a month for samples might be a worthwhile investment.
Or not. Because while intellectually I can see the merit, psychologically an hour reading samples when I could be reading my next book seems like an hour wasted.