In research, the best data you can feed a model is the whole population.

After a decade of working with data, I have found that having the entire population is almost never the case. For most of the analyses we create, we end up working with a sample.

When you select a sample, the method should be systematic and well-defined so that you can draw a valid inference from it.

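There are many different classifications of sampling methods. The most common is probability sampling, in which each individual in the population has a known, nonzero chance of being selected. That chance is not necessarily equal for everyone; equal chances are specific to simple random sampling.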

Within probability sampling, there are several methods. I want to concentrate on the two most common: simple random sampling and stratified random sampling.

Simple Random Sampling

Let’s say you are running a survey for a town of 5,000. The goal is to determine the town’s political affiliation. Because the population is not that large, we can sample 10% (500 people). In simple random sampling, every individual has the same chance of ending up in the sample: 500/5,000, or 10%. (On any single draw, each person has a 1/5,000 chance of being picked.)

In this method, minimal knowledge of the population is required. You select your people and go for it.
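As a minimal sketch of the town example, here is how a simple random sample could be drawn with Python's standard library. The resident IDs and the `simple_random_sample` helper are assumptions made for illustration; in practice you would start from a real list of residents (a sampling frame).

```python
import random

POPULATION_SIZE = 5_000  # town size from the example
SAMPLE_SIZE = 500        # 10% of the town


def simple_random_sample(population, k, seed=None):
    """Draw k distinct members without replacement, all equally likely."""
    rng = random.Random(seed)  # seeded only so the draw is reproducible
    return rng.sample(population, k)


# Hypothetical resident IDs 0..4999 stand in for a real sampling frame.
residents = list(range(POPULATION_SIZE))
sample = simple_random_sample(residents, SAMPLE_SIZE, seed=42)

# Each resident's chance of ending up in the sample.
inclusion_probability = SAMPLE_SIZE / POPULATION_SIZE
print(len(sample), inclusion_probability)
```

The only thing the code needs to know about the population is the list itself, which is what "minimal knowledge" means here.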

Stratified Random Sampling

In stratified sampling, you divide the population into subgroups (strata) that share similar characteristics, such as age, sex, race, income, education, and so on. You then draw a random sample from each stratum.

This allows each stratum to be estimated and compared, and it typically reduces sampling variability relative to simple random sampling when the members of each stratum resemble one another.

Using the same example mentioned above, if you want to find the political affiliations of men and women, stratification is the best way to do so.
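A rough sketch of that idea follows, using proportional allocation (the same 10% drawn from every stratum). The 2,600/2,400 split between women and men is a made-up figure for illustration, and `stratified_sample` is a hypothetical helper, not a library function.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=None):
    """Sample the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)

    # Group the sampling frame into strata by the chosen characteristic.
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)

    # Draw a simple random sample inside each stratum.
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical frame: 2,600 women and 2,400 men (illustrative numbers only).
frame = [{"id": i, "gender": "F" if i < 2600 else "M"} for i in range(5000)]

by_gender = stratified_sample(frame, key=lambda r: r["gender"], fraction=0.10, seed=7)
sizes = {gender: len(members) for gender, members in by_gender.items()}
print(sizes)
```

Because each stratum is sampled separately, both groups are guaranteed to appear in the sample in proportion to their size, which a simple random draw does not promise.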

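Of course, the analysis would be different. When strata are not sampled in proportion to their size, each stratum's result has to be weighted by its share of the population before you can make a proper inference about the town as a whole.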
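To illustrate why weights matter, here is a small sketch in which the same number of people is sampled from two strata of different sizes. Every figure below (stratum sizes, sample sizes, counts of supporters of some party) is invented for the example. The overall estimate weights each stratum's proportion by that stratum's share of the population.

```python
def weighted_proportion(strata):
    """Combine per-stratum sample proportions using population shares as weights."""
    total_population = sum(s["population"] for s in strata.values())
    return sum(
        (s["population"] / total_population) * (s["supporters"] / s["sampled"])
        for s in strata.values()
    )


# Made-up numbers: equal samples of 250 drawn from unequal strata.
strata = {
    "F": {"population": 2600, "sampled": 250, "supporters": 150},
    "M": {"population": 2400, "sampled": 250, "supporters": 100},
}

# Naive estimate that ignores the unequal sampling rates.
unweighted = (150 + 100) / (250 + 250)

# Estimate that weights each stratum by its population share.
weighted = weighted_proportion(strata)
print(unweighted, round(weighted, 3))
```

The two estimates differ because the smaller stratum is over-represented in the raw sample; the weights correct for that.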

Which Method is Best

Neither method is best in every case. It depends on the kind of analysis you are doing. Back to the example, if you are just trying to find the town’s overall political affiliation, simple random sampling will suffice.

Conversely, if you are looking to determine political affiliation by gender, stratification is the way to go.

It is crucial to know these methods and refer back to them when necessary.