Date: Saturday, May 23, 2026
Hello, AEA365 community! Liz DiLuzio here, Lead Curator of the blog. This week is Individuals Week, which means we take a break from our themed weeks and spotlight the Hot Tips, Cool Tricks, Rad Resources and Lessons Learned from any evaluator interested in sharing. Would you like to contribute to future individuals weeks? Email me at AEA365@eval.org with an idea or a draft and we will make it happen.
I’m Gene Shackman, Applied Sociologist and author of the Beginners Guide to Evaluation.
When using surveys as a tool to conduct an experimental or quasi-experimental evaluation, it’s typical to administer the survey to a sample of the population. We then take the results from our sample and apply them to the population.
For example, we ask a sample of people who participated in a Job Training Program: “What were the most useful parts of the Job Training Program?” We then want to be able to say that the responses describe what all the program participants thought.
Ideally, we would want to use some type of probability-based sampling approach, where each person has a known probability of being selected. Examples of probability-based sampling approaches include random sampling and stratified random sampling. With probability samples, we can be reasonably confident that the results of the survey apply to the population.
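To make "known probability of being selected" concrete, here is a minimal sketch of stratified random sampling in Python. The roster, the site labels, the 20% sampling fraction, and the function name are all hypothetical and chosen only for illustration.

```python
import random

def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw the same fraction, at random, from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        n = round(len(members) * fraction)
        # Everyone in this stratum has the same known chance: n / len(members).
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical roster: 60 participants from site A and 40 from site B.
participants = [{"id": i, "site": "A" if i < 60 else "B"} for i in range(100)]
sample = stratified_sample(participants, lambda p: p["site"], fraction=0.2)
print(len(sample), "people sampled")
```

Because we control the draw, we know each participant's chance of selection (here, 12 of 60 at site A and 8 of 40 at site B). That knowledge is what a non-probability sample lacks.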
Unfortunately, the past decades have seen declining response rates, for example by 30 percentage points among Pew Research Center telephone surveys and among some government surveys. These declines mean that, currently, response rates can vary widely depending on the survey, the method, the sponsor, and other factors. Low response rates have the potential to result in non-response bias, where those who respond are different from those who did not respond. Some research, like Rhodes et al., 2025, and Roberts et al., 2020, shows that surveys with low response rates are vulnerable to non-response bias. A study by Heirene et al., 2025, shows very mixed results, where non-response bias depends on the variable, eligibility criteria, sample size, data collection methods, and the setting.
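One way to see why low response rates only have the potential to cause bias is a simple piece of arithmetic: the gap between the respondent average and the full-sample average equals the share of non-respondents times the difference between respondents and non-respondents. The sketch below uses made-up numbers, not figures from the studies cited above, and the function name is hypothetical.

```python
def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Respondent mean minus full-sample mean.

    Follows from: full mean = r * resp_mean + (1 - r) * nonresp_mean.
    """
    return (1 - response_rate) * (mean_respondents - mean_nonrespondents)

# Hypothetical survey with a 20% response rate.
# If respondents and non-respondents would answer alike, there is no bias.
no_gap = nonresponse_bias(0.2, 0.70, 0.70)
# If 70% of respondents but only 50% of non-respondents say "yes",
# the survey overstates the true share by about 16 percentage points.
with_gap = nonresponse_bias(0.2, 0.70, 0.50)
print(no_gap, round(with_gap, 2))
```

In practice we rarely know how non-respondents would have answered, which is why the size of the bias is so hard to pin down.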
At least partly because of declining response rates, surveys have become more expensive due to the increased effort needed to get responses. Because of the increased costs, more organizations are turning to less expensive, non-probability samples. By definition, in a non-probability sample the probability that any given person is selected is unknown. As a result, one can’t clearly say whether non-probability samples are representative of the larger population, and so one also can’t say whether survey results from non-probability samples can be generalized to the larger population.
Researchers have developed several methods to reduce bias in non-probability samples. The most intuitive approach makes the sample look like the population on key characteristics.
Quota sampling is the simplest version. Researchers select participants in proportions that match the population on variables like age, gender, and education. The evidence on whether this works is discouraging. A review of comparison studies (Freese and Jin, 2025) found that quota sampling did not sufficiently reduce bias in non-probability samples across multiple domains, including demographics, voting behavior, health behavior, and technology use.
Murray-Watters and colleagues show how unreliable individual non-probability vendors can be, even when those vendors apply quotas. They commissioned the same questionnaire from eight vendors who each used quotas on age, region, gender, and education at their discretion. Estimates varied widely across vendors and across outcomes. A vendor whose sample produced accurate estimates for one outcome often produced poor estimates for another. Averaging point estimates across all eight vendors reduced the worst-case error. A LASSO-based procedure that selected an optimal subset of vendors reduced error further, though that method requires a probability sample as a benchmark.
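The averaging idea can be sketched in a few lines of Python. All numbers below are invented for illustration and are not taken from the study. The benchmark values are used only to score the estimates, and the sketch covers simple averaging only, not the LASSO-based procedure.

```python
from statistics import mean

# Hypothetical numbers for illustration only (not from the study).
benchmark = {"employed": 62.0, "satisfied": 48.0}
vendor_estimates = {
    "vendor_1": {"employed": 63.0, "satisfied": 40.0},  # close on one, far on the other
    "vendor_2": {"employed": 55.0, "satisfied": 49.0},
    "vendor_3": {"employed": 66.0, "satisfied": 52.0},
}

def abs_errors(estimates):
    """Absolute distance from the benchmark, per outcome."""
    return {k: abs(estimates[k] - benchmark[k]) for k in benchmark}

# Average the point estimates across vendors, outcome by outcome.
averaged = {k: mean(v[k] for v in vendor_estimates.values()) for k in benchmark}
avg_error = abs_errors(averaged)

# Largest error any single vendor makes on each outcome.
worst_single = {
    k: max(abs_errors(v)[k] for v in vendor_estimates.values()) for k in benchmark
}

for k in benchmark:
    print(k, "averaged error:", round(avg_error[k], 2), "worst vendor:", worst_single[k])
```

The error of the average can never exceed the error of the worst vendor, so averaging protects against picking an unlucky vendor. It does not guarantee that the average beats the best vendor.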
Quotas have a structural limit that helps explain these results. As Freese and Jin (2025) note, a quota sample only matches the population on the specific variables used to set the quotas. On any other variable, including the outcome of interest, the sample can still diverge from the population in ways the researcher cannot detect.
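A toy example shows this structural limit. Everything here is hypothetical: the population, the opt-in pool, the age groups, and the "online_daily" outcome are invented, and the pool is deliberately built so that daily internet users are more likely to volunteer.

```python
def share(people, key, value):
    """Fraction of people whose `key` equals `value`."""
    return sum(p[key] == value for p in people) / len(people)

def quota_sample(pool, quotas):
    """Take volunteers in order until each age group's quota is full."""
    counts = {group: 0 for group in quotas}
    chosen = []
    for person in pool:
        group = person["age"]
        if counts[group] < quotas[group]:
            chosen.append(person)
            counts[group] += 1
    return chosen

# Hypothetical population: two equal age groups, half of each online daily.
population = [
    {"age": age, "online_daily": i % 2 == 0}
    for age in ("under_40", "40_plus")
    for i in range(500)
]

# Hypothetical opt-in pool: every daily user joins, few others do.
pool = [p for j, p in enumerate(population) if p["online_daily"] or j % 5 == 0]

sample = quota_sample(pool, {"under_40": 50, "40_plus": 50})

print("under 40:", share(population, "age", "under_40"), share(sample, "age", "under_40"))
print("online daily:", share(population, "online_daily", True), share(sample, "online_daily", True))
```

The sample matches the population exactly on age, the quota variable, while overstating daily internet use by a wide margin. Nothing in the age quotas would alert the researcher to that gap.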
The takeaway for evaluators is practical. Non-probability samples are cheap, fast, and tempting. They are also inconsistent in ways that quotas only partially fix. Use them when the question tolerates that uncertainty. Treat the results with appropriate humility when it does not.
Do you have questions, concerns, kudos, or content to extend this AEA365 contribution? Please add them in the comments section for this post so that we may enrich our community of practice. Would you like to submit an aea365 Tip? Please send a note of interest to AEA365@eval.org. AEA365 is sponsored by the American Evaluation Association and provides a Tip-a-Day by and for evaluators. The views and opinions expressed on the AEA365 blog are solely those of the original authors and other contributors. These views and opinions do not necessarily represent those of the American Evaluation Association, and/or any/all contributors to this site.