
How large should your sample size be?


I read a recent interview with Hadley Wickham. Two things stood out to me. The first was how down-to-earth he seems, even given how well-known he is in the data science community. The second was this quote: "Big data problems [are] actually small data problems, once you have the right subset/sample/summary. Inventing numbers on the spot, I'd say 90% of big data problems fall into this category."

Even if you don't have huge data sets (defined for me personally as anything over 10GB or 5 million rows, whichever comes first), you usually run into issues where even a fast computer will process too slowly in memory (especially if you're using R). To work out how big a sample you need, three inputs are enough: the margin of error you can tolerate, how large you want your confidence level to be (i.e. do you want to be 90% sure that you found all the farms that will cancel Goatly and leave your CEO traveling coach with the peons?), and the size of the population. There are a couple of ways to get close to it in R, but I haven't found anything in the pwr library so far that only requires those three things: margin of error, confidence level, and population.
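The calculation itself is short if you're willing to do it by hand. Here's a minimal sketch (in Python rather than R, and the function name is mine) of one standard approach, Cochran's formula with a finite population correction, which takes exactly those three inputs:

```python
from math import ceil
from statistics import NormalDist

def sample_size(population, margin_of_error, confidence, p=0.5):
    """Cochran's formula with a finite population correction.

    population      -- size of the population (N)
    margin_of_error -- tolerated error, e.g. 0.05 for +/- 5 points
    confidence      -- e.g. 0.90 for "90% sure"
    p               -- assumed proportion; 0.5 is the most conservative
    """
    # two-sided z-score for the requested confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # sample size for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # shrink it to account for a finite population
    return ceil(n0 / (1 + (n0 - 1) / population))

# e.g. 90% confidence, 5% margin of error, 5,000 farms:
print(sample_size(5000, 0.05, 0.90))
```

The `p=0.5` default is the worst-case assumption about the underlying proportion; if you have a prior estimate (say, you know roughly 10% of farms churn), plugging it in will shrink the required sample.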
