Dec 15, 2022

The RCT Fetish

A curious thing happened in 2016. The federal government quietly dropped flossing from that year’s dietary recommendations for Americans.

The move was prompted by an Associated Press article that found little to no evidence in favor of flossing. No good randomized controlled trials (RCTs) supported flossing. Therefore, flossing was no longer recommended. Internet hot-takes gleefully exposed the big flossing lie.

Five years later, after more than 600,000 deaths from COVID-19 in the U.S., millions of deaths worldwide, and endless debate about the efficacy of masks in preventing transmission, something quite similar has happened. A randomized controlled trial in Bangladesh has established that masks actually *gasp* ARE effective at preventing transmission of COVID-19. Finally. Now we can put the debate to rest.

Both of these narratives rely on the RCT fetish: the idea that only a randomized controlled trial can definitively determine whether an intervention “works” or not.

I like RCTs. No — I love them. They are one of the greatest research innovations ever created. But RCTs are not magic.

The question of “what works” is always more complicated than it first appears. But there are two major components to answering it. First, are there empirical comparisons between the proposed intervention (the thing that supposedly works) and whatever the default choice is (often current practice, or, in some cases, doing nothing)? This is what RCTs tend to address. Is this drug better than a placebo? Do masks reduce viral transmission? Does flossing reduce cavities?

But there is another huge component of the “what works” question: the underlying mechanism that links the intervention to the outcome. How is this drug actually supposed to affect the body? How would masks reduce viral transmission? How does flossing prevent cavities?

It’s this component that is erased by the RCT fetish. The strongest evidence about whether something works comes from integrating rigorous empirical comparisons with a thorough understanding of the intervention mechanism.

Ideally, we have both. But depending on the case, it can be hard to obtain one or the other.

With flossing, the mechanism is clear every time you floss your teeth: little bits of food get stuck on the floss. And there are probably tinier bits that you don’t even notice. Decaying food on teeth lets bacteria thrive, leading to tooth decay.

This dovetails with the everyday experience of dentists, who claim that they can tell when someone flosses regularly (or has recently started flossing). But what about rigorous comparisons? Where are the RCTs on flossing?

Turns out, doing an RCT on flossing is complicated. The first hurdle is whether it’s ethical. Would you want to be in the group that doesn’t floss for a few years to see what happens? The ethical question is bound up with how much we think we know about flossing. The closer we think the comparison is, or the less evidence we have either way, the more ethical the trial becomes.

But there’s a bigger hurdle: logistics. You can test whether asking people to floss decreases cavities. You can test whether paying people to floss decreases cavities. But this is not the same thing as asking whether flossing decreases cavities.

Ask a group not to floss, and many participants will do it anyway. Ask a group to floss, and many might not do it. Ask people to report on how much they floss, and they won’t do so accurately (especially if people are reporting to a dentist). If most people in both groups already floss, and you encourage one group to floss, you might under-estimate the value of flossing. The hypotheticals just go on like this.
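To make that dilution concrete, here is a minimal Python sketch. It assumes a deliberately simple model in which flossing lowers a person's cavity risk by a fixed amount, and every number in it is hypothetical, picked only for illustration. The function name `expected_arm_difference` and its parameters are mine, not taken from any study.

```python
def expected_arm_difference(true_effect, floss_share_treatment, floss_share_control):
    """Expected gap in cavity rates between the two arms of a flossing trial.

    Assumes (hypothetically) that flossing cuts an individual's cavity risk by
    `true_effect`, and that the only difference between the arms is the share
    of people who actually floss, whatever they were asked to do.
    """
    return true_effect * (floss_share_treatment - floss_share_control)


if __name__ == "__main__":
    true_effect = 0.10  # hypothetical: flossing lowers cavity risk by 10 points

    # The comparison we would like: everyone told to floss does, nobody else does.
    ideal = expected_arm_difference(true_effect, 1.0, 0.0)

    # A messier trial: 70% of the "floss" arm flosses, and so does 50% of the
    # "don't floss" arm (both shares are made up).
    realistic = expected_arm_difference(true_effect, 0.7, 0.5)

    print(f"ideal trial sees a gap of {ideal:.3f}")
    print(f"realistic trial sees a gap of {realistic:.3f}")
```

With these made-up shares, the trial would see only a fifth of the effect that flossing has on the people who actually do it, and a real study would also have to cope with noise and inaccurate self-reports on top of that.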

None of these hurdles are insurmountable on their own. But the result is that any RCT has to make big tradeoffs during the research design process. And it means never quite having the comparison we’d like to have.

But given our strong understanding of the mechanism and the (flawed) comparison research, it seems pretty reasonable to floss. And to recommend that people floss. And to teach people to floss properly, so that when they floss it’s maximally effective. As pretty much every dentist tries to do.

Relying only on our (potentially mistaken or incomplete) understanding of how an intervention works, however, can create real problems. In very complicated systems — the human body, an ecosystem, a social system — there are all kinds of things that might be happening. Incomplete mechanistic understandings are usually the cause of “medical reversal”: we believed something to be true, based on our understanding of the body, but when we actually test it, it turns out not to be true.

The mask situation is considerably more complicated than the flossing situation. Certainly wearing an N95 mask and a face shield is going to reduce the chance that you get COVID-19, in an environment where COVID-19 is being spread. Few (if any) researchers would argue with that.

But the disposable paper masks are not as effective. And the effectiveness of wearing a mask is a different question from whether a mask mandate or society-wide encouragement of masks decreases transmission rates. Fitting the mask properly is important. Whether people engage in riskier behaviors because they are wearing a mask (like entering crowded places) is important. Many social behaviors are at play.

Rates of COVID-19 are also important: if it’s everywhere, and masks reduce transmission by 10%, that can be a huge reduction in absolute terms. If there are only a couple of cases each week, then the reduction may be trivial. But the existence of pre-symptomatic and asymptomatic cases makes it extremely challenging to know “who” should be wearing those masks even when transmission events are rare.
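The arithmetic behind that point is simple enough to sketch in a few lines of Python. The case counts below are hypothetical, chosen only to contrast a high-prevalence setting with a low-prevalence one, and the 10% figure is the illustrative number from the paragraph above, not an estimate from any study.

```python
def infections_averted(baseline_infections, relative_reduction):
    """Absolute number of infections prevented for a given relative reduction."""
    return baseline_infections * relative_reduction


if __name__ == "__main__":
    relative_reduction = 0.10  # the illustrative 10% reduction in transmission

    # Hypothetical weekly case counts for two very different settings.
    widespread = infections_averted(100_000, relative_reduction)
    rare = infections_averted(2, relative_reduction)

    print(f"widespread outbreak: about {widespread:,.0f} infections averted per week")
    print(f"a couple of cases: about {rare:.1f} infections averted per week")
```

The same relative effect prevents thousands of infections in one setting and a fraction of a single infection in the other, which is why the background rate matters so much when judging a society-wide policy.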

These complications make it difficult to know exactly how effective society-wide mask-wearing is in any given situation. But to pretend that we didn’t already know that masks reduce the likelihood of COVID-19 transmission, as the article in The Atlantic describing the Bangladesh study does, is to demolish what we mean by “know”.

Even prior to the pandemic, many different forms of evidence suggested that masks would be at least somewhat effective at reducing transmission. And since the pandemic began, there’s been even more research on this question. The latest RCT just adds to our understanding of the mechanisms at play.

The Bangladesh study is not even fundamentally about the efficacy of mask-wearing on transmission. Rather, it’s about the efficacy of various interventions to increase mask-wearing. Like any reasonable researcher, the research team assumes that mask-wearing reduces transmission and is figuring out ways to try to increase mask-wearing, while also measuring the benefit of those interventions.

If mask wearing weren’t so culturally controversial, the finding that wearing masks did actually decrease transmission rates would be utterly unremarkable.

RCTs are very powerful, but they can also be narrow. The researchers in the Bangladesh study paid members of the community to monitor mask-wearing by the public, in addition to providing free masks. Would this strategy be equally effective in cities? In Nigeria? In the U.S.? It’s hard to say. Strictly applying the results to other contexts is a fool’s game. RCTs have to be carefully interpreted, just like every other form of research.

And, often, it’s what RCTs reveal about how interventions work that is the most informative. In the Bangladesh study, many plausible interventions didn’t work: text reminders, monetary incentives, verbal commitments, police presence. Light social sanctions and reminders seemed to be the main driving force behind the increased mask usage. It’s this idea, rather than the exact program, that might be tested in a different setting.

In research, there are no silver bullets; no definitive, broadly applicable answers to our most pressing questions. We are always acting with some degree of uncertainty. But leveraging empirical comparisons — even imperfect ones — while building our understanding of the mechanisms at work is the way to make sound policy decisions.