One Third of Online Survey Responses Are Useless

Every time you click a link to a survey, you are entering a contract. The researcher promises anonymity. You promise attention. But there is a dirty secret hiding inside every dataset collected online: a huge chunk of respondents never read a single question.

They click. They scroll. They submit.

Ward and Meade (2022) reviewed decades of research on this problem and arrived at a number that should terrify anyone who has ever published a paper based on self-reported data: in many online surveys, between 10 and 30 percent of responses are completely useless. Some estimates go higher. The authors report that in a typical undergraduate sample, you might lose 3 to 10 percent of responses to obvious carelessness. But in online panels like Amazon's Mechanical Turk, that number regularly climbs past 30 percent.

At that rate, one of every three people who complete your survey might as well have been a robot.

This is not a minor statistical inconvenience. It is a structural threat to an entire field of science.

The Problem Nobody Wants to Talk About

Psychologists love surveys. They are cheap, fast, and easy to distribute. A researcher can collect data from 500 people in a single afternoon. But that speed comes with a hidden cost: you have no idea whether those 500 people actually read your questions.

Ward and Meade (2022) define careless responding as occurring when "respondents fail to read item content or give sufficient attention." In practice, this looks like someone clicking "Agree" for every single item in a 100-question battery, regardless of whether the item says "I am happy" or "I am depressed." It looks like someone completing a 20-minute survey in 90 seconds. It looks like someone answering "Strongly Disagree" to both "I enjoy social gatherings" and "I prefer to be alone."

The authors call the first of these a "longstring" pattern, and such patterns are everywhere.

One study cited by Ward and Meade (2022) found that in a sample of over 400,000 survey responses, roughly 11 percent contained obvious longstring patterns. Another found that when researchers embedded simple attention checks like "Please select 'Agree' for this item," between 5 and 15 percent of participants failed them.

The problem is not limited to bad actors. Some people are genuinely trying but get distracted. Others are tired. Some are multitasking. A few are deliberately gaming the system to get paid.

The result is the same: your data is polluted, and you probably did not catch it.

How to Catch a Careless Respondent

Ward and Meade (2022) spent years cataloguing the methods researchers use to detect this problem. The list is surprisingly creative.

### Response Time

The simplest check is timing. If a survey takes 20 minutes for the median respondent, and someone finishes in 2 minutes, they are not reading. But the authors caution that this is a blunt instrument. Some careful respondents are fast readers. Some careless respondents take their time to click random boxes slowly.

The trick is to set a threshold that is too fast to be plausible, not just too fast to be typical.
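The difference between "too fast to be plausible" and "too fast to be typical" can be sketched in a few lines. This is only an illustration: the function names, the two-seconds-per-item floor, and the 60 percent of median cutoff are assumptions made for the example, not values from Ward and Meade (2022).

```python
from statistics import median

def flag_implausibly_fast(durations_sec, n_items, min_sec_per_item=2.0):
    """Return indices of respondents who finished below a plausibility floor.

    min_sec_per_item is a hypothetical floor chosen for illustration.
    """
    floor = n_items * min_sec_per_item
    return [i for i, t in enumerate(durations_sec) if t < floor]

def flag_atypically_fast(durations_sec, fraction_of_median=0.6):
    """Contrast rule: flag anyone well below the median completion time."""
    cutoff = fraction_of_median * median(durations_sec)
    return [i for i, t in enumerate(durations_sec) if t < cutoff]

# Made-up completion times in seconds for a 100-item survey.
durations = [1180, 95, 1320, 640, 1250]
implausible = flag_implausibly_fast(durations, n_items=100)  # floor is 200 s
atypical = flag_atypically_fast(durations)                   # cutoff is about 708 s
```

With these numbers the plausibility floor flags only the 95-second respondent, while the median-based rule also flags the 640-second respondent, who may simply be a fast reader.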

### Longstring Analysis

This is the technical term for "clicking the same answer over and over." If someone selects "Strongly Agree" for 30 consecutive questions, they are not evaluating each item. They are on autopilot.

Ward and Meade (2022) recommend flagging respondents who have a longstring of identical responses that exceeds a certain length, typically 10 to 15 items depending on the survey.
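A longstring check is easy to sketch. The version below assumes each respondent's answers are stored as a list in presentation order; the function names and the default cutoff of 10 are illustrative choices, and the right cutoff depends on the survey.

```python
from itertools import groupby

def longest_run(responses):
    """Length of the longest run of identical consecutive responses."""
    return max((sum(1 for _ in run) for _, run in groupby(responses)), default=0)

def flag_longstring(rows, max_run=10):
    """Return indices of respondents whose longest run exceeds max_run."""
    return [i for i, row in enumerate(rows) if longest_run(row) > max_run]

# Made-up responses on a 1-5 scale.
rows = [
    [4] * 12 + [2, 3],       # twelve identical answers in a row
    [1, 2, 3, 4, 5] * 3,     # varied answers
]
flagged = flag_longstring(rows)
```

Here only the first respondent is flagged. Note that the check depends on item order, so it should run on responses in the order they were shown.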

### Bogus Items

This is the gold standard. You insert a question that has an obvious correct answer. "Please select 'Disagree' for this item." Or "I have never used a computer before." Anyone who answers incorrectly is clearly not paying attention.

The authors report that bogus items are highly effective, but they have a downside: they can annoy legitimate respondents and prime them to think the survey is tricking them.
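Scoring bogus items amounts to comparing each answer against a small answer key. The sketch below assumes responses are stored in a dict keyed by item id; the item ids and the key itself are hypothetical.

```python
def failed_bogus_items(answers, bogus_key):
    """Return the bogus items a respondent answered incorrectly.

    answers:   dict mapping item id -> chosen response
    bogus_key: dict mapping bogus item id -> set of acceptable responses
    A skipped bogus item (missing from answers) counts as a failure here.
    """
    return [item for item, ok in bogus_key.items() if answers.get(item) not in ok]

# Hypothetical key: one instructed item, one item with an obvious answer.
bogus_key = {
    "bogus_select_disagree": {"Disagree"},
    "bogus_never_used_computer": {"Disagree", "Strongly Disagree"},
}

attentive = {"bogus_select_disagree": "Disagree",
             "bogus_never_used_computer": "Strongly Disagree"}
inattentive = {"bogus_select_disagree": "Agree",
               "bogus_never_used_computer": "Agree"}

attentive_failures = failed_bogus_items(attentive, bogus_key)
inattentive_failures = failed_bogus_items(inattentive, bogus_key)
```

Whether one failure is enough to exclude a respondent, or whether several are required, is a judgment call the sketch leaves open.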

### Consistency Checks

If you ask the same question in two different ways, a careful respondent will answer consistently. A careless respondent will contradict themselves. Ward and Meade (2022) recommend embedding pairs of items that are logically opposite, then flagging anyone who agrees with both.
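The opposite-pairs rule can be sketched as follows. The example assumes a 1 to 5 agreement scale on which 4 and 5 count as agreement; the item ids, the pairs, and that coding are all assumptions made for illustration.

```python
AGREE = {4, 5}  # assumed coding: 4 = Agree, 5 = Strongly Agree

def contradictions(answers, opposite_pairs, agree=AGREE):
    """Return the opposite-item pairs where the respondent agreed with both."""
    return [
        (a, b)
        for a, b in opposite_pairs
        if answers.get(a) in agree and answers.get(b) in agree
    ]

# Hypothetical pairs of logically opposite items.
opposite_pairs = [
    ("enjoy_social_gatherings", "prefer_to_be_alone"),
    ("am_happy", "am_depressed"),
]

answers = {
    "enjoy_social_gatherings": 5,
    "prefer_to_be_alone": 4,
    "am_happy": 4,
    "am_depressed": 2,
}
found = contradictions(answers, opposite_pairs)
```

This respondent contradicts themselves on the first pair only. The same structure could flag disagreement with both items by passing a different set as `agree`.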

### Instructional Manipulation Checks

These are more elaborate versions of bogus items. Instead of asking you to select a specific answer, they ask you to do something unusual. "This is an attention check. Please ignore the question below and type 'I read instructions' in the text box." The authors found that failure rates on these checks can exceed 30 percent in unmonitored online samples.
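Scoring a free-text check like the one above usually needs a little tolerance for capitalization, spacing, and trailing punctuation. A minimal sketch, assuming the instructed phrase is "I read instructions" and that exact wording is otherwise required:

```python
def passed_imc(text_response, expected="i read instructions"):
    """True if the typed response matches the instructed phrase.

    Ignores case, repeated whitespace, and surrounding punctuation.
    """
    normalized = " ".join(text_response.lower().split()).strip(" .!\"'")
    return normalized == expected

results = [passed_imc(t) for t in ["  I read   instructions. ", "Agree", ""]]
```

How lenient to be with near misses is a design choice; this version accepts only the full phrase.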

The Damage Careless Data Does

You might think that a few bad responses just add noise. They blur the signal but do not change the conclusion. That is wrong.

Ward and Meade (2022) show that careless responding systematically distorts statistical results. It can inflate or deflate correlations. It can create false positives and hide real effects. It can make a treatment seem effective when it is not, or make an effective treatment look useless.

The authors explain why: careless respondents do not answer randomly. They tend to choose the same response option repeatedly, which artificially inflates the consistency of their answers. This inflates reliability estimates like Cronbach's alpha, making a bad scale look good. At the same time, it deflates correlations between different scales, because the careless responses are not measuring anything real.
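The effect on Cronbach's alpha can be seen in a toy example. The sketch below uses the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), and compares a small made-up sample before and after adding two respondents who give the same answer to every item. The data are invented for illustration and are not from the review.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    items = list(zip(*rows))                      # one tuple of scores per item
    item_var = sum(variance(scores) for scores in items)
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var / total_var)

# Made-up careful respondents on a 3-item scale.
careful = [
    [1, 2, 2],
    [2, 1, 3],
    [3, 3, 2],
    [4, 3, 4],
    [3, 4, 3],
    [2, 3, 1],
]
# Two straightliners: the same answer on every item.
straightliners = [[1, 1, 1], [5, 5, 5]]

alpha_clean = cronbach_alpha(careful)
alpha_mixed = cronbach_alpha(careful + straightliners)
```

In this toy data alpha rises from roughly 0.66 to roughly 0.91 once the two straightliners are included, even though they add no information about the construct.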

In one simulation cited by Ward and Meade (2022), including just 5 percent careless respondents reduced statistical power by enough to miss a real effect. At 15 percent, the false positive rate doubled.

This is not a rounding error. This is a systematic bias that undermines the credibility of published research.

Why Online Surveys Are Especially Vulnerable

The problem has existed as long as surveys have existed. But online data collection has made it dramatically worse.

In a lab, a researcher can see you. They can watch you yawn, check your phone, or stare blankly at the screen. They can ask if you understood the instructions. They can build rapport.

Online, you are a pair of eyeballs behind an IP address. The researcher has no idea if you are reading, watching Netflix, or letting your cat walk across the keyboard.

Ward and Meade (2022) note that the rise of online panels like Mechanical Turk, Prolific, and Qualtrics panels has made data collection faster and cheaper, but it has also created an ecosystem where speed is incentivized. Workers are paid per survey, not per careful response. The faster they finish, the more they earn.

This creates a perverse incentive: the system rewards the exact behavior that destroys data quality.

The authors report that in some Mechanical Turk samples, over 30 percent of respondents fail basic attention checks. The number is lower on platforms like Prolific that screen for attention, but it never drops to zero.

What Researchers Are Doing Wrong

The most alarming finding in Ward and Meade's (2022) review is not the prevalence of careless responding. It is how few researchers try to detect it.

The authors surveyed published studies in top psychology journals and found that fewer than half reported any screening for careless responses. Most simply assumed their data was clean.

This is a problem of incentives. Journals do not require authors to report screening procedures. Reviewers rarely ask. And researchers do not want to admit that some of their data might be garbage.

But the authors argue that ignoring the problem is worse. If you do not screen, you cannot know whether your findings are real or artifacts of carelessness. You are publishing results that might not replicate.

The authors recommend that every study report three things: whether the researchers screened for careless responding, what method they used, and how many participants they excluded. They call this the "minimum reporting standard."
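Those three items are simple to generate alongside the screening itself. The sketch below is one hypothetical way to collect them; the function name, field names, and respondent ids are invented for the example.

```python
from collections import Counter

def screening_report(n_collected, methods, flagged):
    """Summarize screening for a methods section.

    methods: list of screening methods that were applied
    flagged: dict mapping respondent id -> method that excluded them
    """
    return {
        "screened": bool(methods),
        "methods": list(methods),
        "n_collected": n_collected,
        "n_excluded": len(flagged),
        "excluded_by_method": dict(Counter(flagged.values())),
    }

report = screening_report(
    n_collected=500,
    methods=["response_time", "longstring"],
    flagged={"r017": "response_time", "r102": "longstring", "r233": "longstring"},
)
```

Keeping the applied methods separate from the exclusions means a check that flagged nobody still appears in the report.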

What the Research Does Not Prove

This is a fair place to pause and ask: does every careless response ruin a study?

The answer is no.

Ward and Meade (2022) are careful to note that the impact of careless responding depends on the sample size, the effect size, and the type of analysis. In a large sample with a strong effect, a few bad responses might not change the conclusion. In a small sample with a weak effect, they can be devastating.

The authors also acknowledge that some screening methods are too aggressive. If you set your timing threshold too low, you might exclude fast but careful respondents. If you use too many bogus items, you might annoy legitimate participants and introduce a different kind of bias.

There is no perfect solution. Every screening decision is a tradeoff between data quality and sample representativeness.

But the authors argue that the default should be to screen, report, and justify your decisions. The current default is to do nothing. That is worse.

The Ethical Dimension

There is a moral argument here that Ward and Meade (2022) do not fully explore, but it deserves attention.

When researchers publish results based on careless data, they are making claims about the world. Those claims can influence policy, clinical practice, and public understanding. If the data is junk, the claims are suspect.

This is not an abstract problem. In the early 2010s, a series of high-profile replication failures in psychology were partly attributed to careless responding in original studies. The replication crisis was not just about p-hacking and small samples. It was also about data quality.

Ward and Meade (2022) do not say this directly, but their review implies a hard truth: if you do not screen for careless responding, you are not doing rigorous science. You are collecting noise and calling it data.

What This Actually Means

▸ If you run online surveys, you must screen for careless responding. The minimum is a timing check and a longstring analysis. If you do not screen, you cannot trust your results.

▸ Report your screening decisions in every paper. Tell readers how many participants you excluded and why. This should be as standard as reporting sample size and demographic characteristics.

▸ Use bogus items or instructional manipulation checks in every survey. They are the most reliable way to catch inattention. Yes, they annoy some participants. That is a price worth paying for clean data.

▸ Do not assume that a large sample protects you from careless responding. Even 5 percent bad data can distort results. The bigger the sample, the more bad data you can accumulate without noticing.

▸ Be skeptical of any study that does not report screening. The absence of a screening report is not proof of clean data. It is proof that the authors did not check.

References

[1] M.K. Ward, Adam W. Meade (2022). Dealing with Careless Responding in Survey Data: Prevention, Identification, and Recommended Best Practices. Annual Review of Psychology.
