R-squared doesn’t matter− at least not in Randomized Control Trials

researcher spotlight

Akhter Ahmed is a Senior Research Fellow in the IFPRI’s Poverty, Gender, and Inclusion Unit and Country Representative for IFPRI Bangladesh. He has worked on strategies for agricultural and rural development, social protection, and women’s empowerment to reduce poverty, food insecurity, and undernutrition in developing countries including Bangladesh, China, Egypt, India, Malawi, the Philippines, and Turkey.

blogs

IFPRI Blog:

Markets, Trade, and Institutions (MTID)

R-squared doesn’t matter− at least not in Randomized Control Trials

Open Access | CC-BY-4.0

By Alan de Brauw

September 30, 2015

A few months ago, Marc Bellemare noted that in at least some randomized control trials (RCTs), the treatment effect does not explain a whole lot of variation in the dependent variable (as measured by the R-sqaured). With this post, I want to make a subtle point: an RCT is not actually designed explain the variance of an outcome of interest, and in fact the treatment effect will never explain much of the variation in an outcome of interest. Consequently, we should either ignore it in presentations or just leave it out entirely.

To explain, recall that the R-squared is defined as the ratio of the variation that right-hand side variables explain in a regression to the total amount of variation in that variable. In an RCT, the goal is to understand either the average treatment effect on the treated (ATT) or the “intent to treat” effect, depending upon whether the researchers expect the treatment or intervention to be taken up by all potential beneficiaries or just a portion of them. This goal is very different from trying to explain variation in the outcome of interest, as either quantity is simply the difference in means between the treatment and the control groups. There is no reason to expect that controlling for that difference in means in a regression would explain variance in the dependent variable. The goal of estimating the ATT is therefore not to explain variation in the outcome, but rather to understand how much the outcome changes on average based on an intervention relative to the intervention not occurring.

To illustrate, I conducted a simple Monte Carlo experiment (Stata code here). I first constructed a hypothetical experiment in which the average treatment effect on the treated was 0.1 standard deviations, on a standard normal dependent variable. A power calculation suggested 3,200 observations would be necessary to identify the average treatment effect 80 percent of the time, so I generated 10,000 samples of 3200 observations with an average treatment effect of 0.1 (the treatment effect was generated with a mean of 0.1 and a standard deviation of 0.2). I plot the variance explained by the treatment effect in Figure 1 (see right hand sidebar). As expected, the treatment effect almost always explains less than half a percent of the variation in the dependent variable; otherwise, it performs quite well− the average estimated average treatment effect on the treated is 0.099, and it is significant at the 5 percent level 79.4 percent of the time in the Monte Carlo.

In theory, adding covariates to a regression of the outcome of interest on the treatment effect will not change the estimated average treatment effect because the covariates are orthogonal to the treatment indicator. It will, however, explain some of the variance in the outcome, reducing the overall variance of the regression and the standard error on the treatment effect. Since those covariates are thought to be related to the outcome, they should increase the amount of explained variance in the regression but not otherwise affect the treatment effect. In fact, they should explain much more of the variation in the dependent variable than the treatment effect.

To illustrate, I added a randomly generated covariate with a correlation of 0.2 with the dependent variable in a second Monte Carlo experiment. I regressed the outcome on the treatment variable and the covariate, and measured the proportion of explained variance accounted for by the treatment variable in each experiment. The distribution of the proportion of explained variation in the dependent variable due to the treatment dummy is in Figure 2; it is typically less than 10 percent of explained variance. Otherwise, the estimates work just as expected: the average estimated average treatment effect on the treated is still 0.1, with a slightly lower standard error than in the first experiment, and the treatment effect is statistically significant in more cases− 81.4 percent of the time.

In conclusion, the variation in the outcome explained by the treatment effect is low by design. In an RCT, the treatment effect is meant to measure the difference in means between groups rather than explaining variation in the dependent variable. Not surprisingly, when a variable correlated with the outcome is included in the regression along with the treatment indicator in the Monte Carlo, it accounts for most of the explained variance. Clearly, policy makers worried about improving average outcomes should worry about the impact on the average outcome rather than the R-squared.

Links

No links

Countries

Topics

Donors

No donors listed

Previous Blog Posts

Issue Post

May 21, 2025

2025 Global Report on Food Crises: Rising food insecurity, waning humanitarian assistance

By Sara Gustafson and Rob Vos

The world faces a stark inflection point.
Issue Post

May 20, 2025

A key decade of food system change and looking forward: Per Pinstrup-Andersen on his time as IFPRI Director General

Insights from a World Food Prize laureate.
Issue Post

May 16, 2025

Agrifood science in transition: Building a sustainable architecture in the face of declining development aid

By Steven Were Omamo and David Spielman

Navigating a challenging new environment.

Who we are

What we do

Research topics

Projects and impact

Modeling tools

Project microsites

What’s new

Where we work

IFPRI Regional Programs

IFPRI Country Programs

Research by country and region

2025 Global Report on Food Crises: Rising food insecurity, waning humanitarian assistance

A key decade of food system change and looking forward: Per Pinstrup-Andersen on his time as IFPRI Director General

Agrifood science in transition: Building a sustainable architecture in the face of declining development aid

R-squared doesn’t matter− at least not in Randomized Control Trials

Share

Links

Countries

Topics

Donors

Previous Blog Posts