Is setting quotas a thing of the past?

Maybe it's time to rethink the use of quotas in online surveys. Learn how to reduce bias and obtain more representative responses.

31 July 2020

Jon Puleston

Vice President, Innovation, Profiles Division

Chris Stevens

Head of Quality, Profiles Division

Quotas are generally established in surveys to ensure a good balance of responses from different groups of people. However, there are many things that can go wrong when setting quotas, due to their misuse or legacy setups that aren’t logical for online research. There can also be general misunderstandings of the impact quotas have on the efficient delivery and representation of a research project.

So, we ask: what quotas do you really need? And is setting quotas in your survey a thing of the past?

Below we answer various questions about quota types and the problems that can arise when implementing them. We’ll also explore alternatives to quota sampling that are better suited to modern-day online fieldwork.

What quotas do you really need?

The most common quotas used today are age, gender and region. Researchers often apply quotas on other factors too, such as social class, education and ethnicity (particularly if there are known skews in the online sample sources compared to the population being measured).

The more quota variables you add, the more complex the field management of a project becomes. You can get to the point where you are looking for the proverbial “needle in a haystack”. Why doesn’t this work? When many variables are combined, the resulting quota cells often look for odd combinations that barely exist in the population. Because of this, you can inadvertently start to include “non-representative” sample in your survey.

You also risk an increase in field time and a reduction in feasibility. With every binary quota dimension added, the sample pool is effectively halved and the time it takes to sample a project increases. Easy-to-fill quotas fill up faster than harder-to-reach quotas, so you face the paradoxical problem that harder-to-reach groups become even harder to reach with each additional quota.

This is why trade-offs are needed when setting up quotas using several variables. The case for the first two we mentioned, age and gender, is clear: we know that people of different ages and genders have different views, opinions and buying behaviors. Can you say the same about region?

Why do we set quotas on region?

Understandably, there will be regional differences in consumption for some product or service types, and these can lead to differences in attitudes. For example, people in red and blue states in the US often have different politically driven opinions. Generally speaking, however, most products are not marketed regionally these days because digital consumption doesn’t have the same regional boundaries.

Quotas are used to handle imbalances in responses; for example, more older people than younger people and more women than men may respond to a survey invitation. But there is generally very little difference in response rates by region.

So why are we wedded to regional quotas? Regional quotas are often carried over from legacy face-to-face research, where it was necessary to spread interviews evenly around the country. With online research, sample is naturally well spread across regions, so it’s not something to worry about.

We recommend using regional quotas only if they are demonstrably important for ensuring the balance of the data: that is, if there are clear regional differences in attitudes or consumption levels that you need to consider, or differences in response rates by region. Otherwise regional quotas are more of a hindrance than a help.

What other quotas do you really need? (Are you trying to set unquotable quotas?)

Age, gender and where you live are usually reliable data points with well-established general population benchmarks to compare against. Social class and education, however, are more subjective and unstable measures. Simply when and how you ask about them in a survey can strongly influence the answers, and they can be subject to significant levels of overclaim. For example, if 60% of your respondents tell you they have degree-level education, but you know only 40% of the population has a degree, setting education quotas would be misguided without first tackling the underlying challenge of measuring what level of education someone actually has.

If you’re looking for a representative sample, we would advise against setting quotas on any measure that is subject to any degree of self-report bias or without a reliable population benchmark to calibrate it against. Instead of getting a representative view, you may be introducing hidden bias.

Why would setting quotas on hard-to-reach groups backfire?

It can be a mistake to use quotas to deal with a natural shortfall of a demographic group in sample. Take, for example, a survey for a product aimed at older people. During fielding, you notice a lack of responses from people aged 75 and older, so you set an additional age quota (of 75 and older) to make the sample more representative.

The question is, did boosting this age group make the sample more representative? Possibly not. It depends on whether those in the quota group who take the survey are representative of the whole age group. It’s likely, however, that over-75s are under-represented because those who do participate are different from those who don’t. They are probably more technologically advanced, wealthier, better educated and more cognitively able.

By forcing a quota on this group it might look like you are making your sample more representative, but instead you risk introducing bias by upweighting the opinions of this group over others.

Do interlocked quotas solve sample representivity problems?

To correct any imbalance of incoming completes, you might be tempted to interlock quotas. For example, interlocking age and gender to get an exact number of men and women in each age group.

The challenge with this approach is that it makes filling quotas even harder from the outset and you end up blocking respondents from the survey. Whilst someone might fit one open quota group (e.g. gender), the other quota (e.g. age) might be full, so they are terminated from the survey. As a result, the time it takes to sample your project grows exponentially, because with each binary increase in quota cells you are effectively halving the sample pool.

Expected time needed to sample:

No quota = 1/2 day
Gender quota = 1 day
Age & gender quotas = 2 days
Age, gender & regional quotas = 4 days
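
The doubling in the figures above can be sketched in a few lines of Python. This only illustrates the halving assumption stated in the text and is not a field-time forecast. The function names and the half-day baseline parameter are hypothetical, and real quota variables such as age or region usually have more than two levels, so the number of interlocked cells grows faster than this.

```python
def expected_field_days(binary_quota_dimensions, base_days=0.5):
    """Field time if every binary quota dimension halves the sample pool.

    base_days is the assumed field time with no quotas (half a day in the
    article's example); it is an illustrative parameter, not a benchmark.
    """
    return base_days * 2 ** binary_quota_dimensions


def interlocked_cells(levels_per_dimension):
    """Number of cells created when quota variables are fully interlocked."""
    cells = 1
    for levels in levels_per_dimension:
        cells *= levels
    return cells


setups = [
    ("No quota", 0),
    ("Gender quota", 1),
    ("Age & gender quotas", 2),
    ("Age, gender & regional quotas", 3),
]
for label, dimensions in setups:
    print(f"{label}: {expected_field_days(dimensions)} days")

# Hypothetical design: 2 genders x 5 age bands x 4 regions.
print("Interlocked cells:", interlocked_cells([2, 5, 4]))
```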

How reliable are the quotas being set in the survey?

Another challenge can be the reliability of the quotas and whether they are feasible. An example is setting quotas based on “national representation” for a survey with less than 50% incidence.

If a certain age group or gender does not qualify at the same rate as others then it could lead to a gap in the data or result in falling short of the needed completes. It is important to understand the source of the quotas and how up to date that information is, particularly for groups like Social Class or Living Standards Measure.

Take the example of targeting the viewers of a popular TV talent show that half the population watch, to find out what they think about the program. You cannot assume that those who watch it are representative of the whole population; there may be more younger families in the audience, for example. So by imposing nationally representative quotas on the sample you are biasing the feedback, as you might be over-representing the views of the types of people who don’t watch it.

So, what are the alternative solutions to ensure balanced sample?
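
As a rough sketch of the alternative, quota targets can be profiled on the audience rather than on the whole population, by scaling each group's population share by the rate at which that group qualifies. The age bands, population shares and incidence rates below are invented purely for illustration, and the function name is hypothetical; in practice the figures would need to come from a reliable benchmark.

```python
def audience_quota_targets(population_share, incidence, total_completes):
    """Split completes in proportion to each group's share of the audience.

    A group's share of the qualifying audience is proportional to
    (population share) x (incidence within that group).
    Rounded targets may not always sum exactly to total_completes.
    """
    qualifying = {
        group: population_share[group] * incidence[group]
        for group in population_share
    }
    total_qualifying = sum(qualifying.values())
    return {
        group: round(total_completes * share / total_qualifying)
        for group, share in qualifying.items()
    }


# Made-up figures: roughly half the population watches the show overall,
# but younger groups watch at a higher rate than older ones.
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
incidence = {"18-34": 0.60, "35-54": 0.55, "55+": 0.35}

targets = audience_quota_targets(population_share, incidence, 1000)
print(targets)
```

With these invented inputs, the 55+ group makes up 35% of the population but only about a quarter of the audience, so a nationally representative quota would over-fill it.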

Using sample invitation quotas can be better than setting quotas in the survey itself

The most efficient way to ensure you get a representative sample in your survey is not by setting quotas in the survey itself but through the sample invitation process and letting the sampling engine manage the process.

Sampling technology has become increasingly sophisticated in recent years and we are now much better able to handle quotas efficiently through the sample invitation process. If you want 50 men and 50 women to complete the survey, the sample engine will send out the number of invites it estimates, through machine learning, that it needs to get 50 males and 50 females. The gender information on the panelist is automatically passed through to the survey via an API and that data is “auto-stamped” into the survey.

By trying to set and manage quotas in the survey itself, you can quickly trip up the whole process. It only takes one person in 100 to accidentally say they are a woman instead of a man for there to be a mismatch between who the sample engine thinks has completed the survey and the data in the survey itself. When this happens on multiple quota cells, the only solution is to continuously monitor the survey and close down quotas manually. Doing that is time-consuming and costly, and it increases the number of quota fails. To overcome the problem, many more people must start the survey than necessary.

This is a major industry-wide issue and we estimate that one in four people trying to do a survey get screened out because of a quota fail issue. The inefficient use of audiences in this way has a knock-on effect and inadvertently costs every researcher more money.
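
The arithmetic behind invitation-side quotas can be illustrated with a deliberately simple sketch. It assumes the expected number of completes per 100 invitations is already known for each group. A real sampling engine would predict this rather than take it as given, so the fixed rates, names and numbers here are hypothetical and do not describe any particular engine.

```python
def invites_needed(target_completes, completes_per_100_invites):
    """Invitations required to expect the target number of completes.

    completes_per_100_invites is an assumed, whole-number completion rate.
    Integer ceiling division avoids floating-point rounding surprises.
    """
    if completes_per_100_invites <= 0:
        raise ValueError("completion rate must be positive")
    return -(-(target_completes * 100) // completes_per_100_invites)


# Target from the example in the text: 50 men and 50 women.
targets = {"male": 50, "female": 50}

# Hypothetical completion rates: women respond slightly more often.
completes_per_100 = {"male": 20, "female": 25}

invite_plan = {
    group: invites_needed(targets[group], completes_per_100[group])
    for group in targets
}
print(invite_plan)
```

Because the lower-responding group simply receives more invitations, nobody has to be screened out inside the survey to hit the targets.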

For this reason, we advocate managing quotas more directly using sampling protocols and the automated pass-through of demographic information into the survey where possible, reducing the need to set any basic in-survey quotas and any manual monitoring.

Use weighting instead of using quotas

An alternative to using quotas, and one that should be strongly considered, is to more carefully weight the data. As a middle ground, you can also set looser quotas and then weight the differences.
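
A minimal sketch of cell weighting, assuming a single weighting variable with known population shares. The sample counts and shares are made up and the function name is hypothetical; production weighting usually balances several variables at once, which this does not attempt.

```python
def cell_weights(sample_counts, population_share):
    """Weight per respondent in each group so the weighted sample
    matches the population shares while keeping the same total size."""
    total = sum(sample_counts.values())
    return {
        group: population_share[group] * total / sample_counts[group]
        for group in sample_counts
    }


# Hypothetical fieldwork outcome under loose quotas: too many women.
sample_counts = {"men": 400, "women": 600}
population_share = {"men": 0.5, "women": 0.5}

weights = cell_weights(sample_counts, population_share)
for group, weight in weights.items():
    print(f"{group}: weight {weight:.3f}")
```

Each man counts for slightly more than one respondent and each woman for slightly less, so the weighted totals come out at 500 each.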

For context, if you have a 10% difference in scores between two quota groups, and one group is under-represented by 10%, then the overall difference in the scores, even before weighting, is likely to be quite small: in the region of +/-1% compared to a fully balanced sample.

Place this into the wider context of errors caused by other common factors (e.g. bad survey question design), which can be +/-50% or more. When chasing fully balanced quotas, you may end up worrying about the wrong problem, so we recommend a more pragmatic perspective.
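
The size of that effect can be checked with a short calculation. The sketch assumes two groups that should each make up half the sample, scores of 50% and 60%, and a 10-percentage-point shortfall in the higher-scoring group. These numbers are one reading of the example, chosen for illustration.

```python
def blended_score(shares, scores):
    """Overall score as the share-weighted average of group scores."""
    return sum(shares[group] * scores[group] for group in shares)


# Assumed group scores, 10 points apart.
scores = {"group_a": 50.0, "group_b": 60.0}

balanced_shares = {"group_a": 0.5, "group_b": 0.5}
# group_b falls 10 percentage points short of its intended share.
skewed_shares = {"group_a": 0.6, "group_b": 0.4}

balanced = blended_score(balanced_shares, scores)
skewed = blended_score(skewed_shares, scores)
shift = skewed - balanced

print(f"Balanced: {balanced:.1f}  Skewed: {skewed:.1f}  Shift: {shift:+.1f} points")
```

Under these assumptions the unweighted result moves by about one point, which is the scale of error the paragraph above describes.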

Interested in learning more? Use the form below to download Kantar’s paper, “6 ways to combat quota problems and unrepresentative sample”.

If you’re looking for personalized support for reviewing or selecting quotas, Kantar can help. We’ve established a global database of nationally representative demographic benchmarks that can be used for setting quotas for any international project. We also have a program to review the complexity of your existing quota setups and provide guidance on improvements and management to support the efficiency and representativity of your surveys. Use the get-in-touch button below to learn more.