Thursday, November 6, 2008

Urquiola & Verhoogen on Tainted RDD

After a bit of a hiatus from blogging, we're back, pointing you to some great research on tainted regression discontinuity designs (RDDs) presented by Eric Verhoogen today at the quantitative political science seminar (paper linked here). The paper looks at RDDs that use class size cut-offs as the basis for identifying the effects of class size on student performance. In many settings, when enrollment passes such a cut-off (e.g. 45 students in a classroom), the school is mandated to add a new classroom, thus creating a discontinuous decrease in class sizes relative to schools with classes just below the cap. For example, if the 45-student cut-off is rigid, then when the school reaches 46 students, it has to add a class; if it evenly divides students between the (now) two classes, class size falls to 23 per class. So long as no other important factors change discontinuously at the cut-off, we can compare classes of 45 students to classes of 23 students to identify the effects of class size.
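The arithmetic of the rule above can be sketched in a few lines. This is a minimal illustration (not code from the paper): under a rigid cap, a school opens the minimum number of classrooms the cap allows and splits students evenly among them, which produces the sawtooth jumps in class size that RDDs exploit.

```python
import math

def predicted_class_size(enrollment, cap=45):
    """Class size implied by a rigid cap: enrollment split evenly
    across the minimum number of classrooms the cap permits."""
    n_classes = math.ceil(enrollment / cap)
    return enrollment / n_classes

# Just below and just above the cut-off:
print(predicted_class_size(45))  # 45.0
print(predicted_class_size(46))  # 23.0
```

A school with 45 students keeps one class of 45; one more student forces a second classroom and class size drops to 23, which is exactly the discontinuity the design compares across.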

At least, that's what a naive interpretation of such cut-offs implies. Urquiola and Verhoogen show that things are not so simple. They model interactions between schools and households. As they write in the abstract, they find that
[S]chools at the class-size cap [in this case, a maximum of 45 students] adjust prices (or enrollments) to avoid adding an additional classroom, which generates discontinuities in the relationship between enrollment and household characteristics, violating the assumptions underlying regression-discontinuity research designs.

Some schools will try to find ways to avoid crossing the 45 student threshold, others will not, and these choices are correlated with things like the socio-economic status of the communities that the schools serve and the schools' quality, both of which affect the ability of schools to attract more students to compensate for the need to add new classrooms. They are also correlated with student performance, spoiling the RDD.

The upshot for others using RDDs is that behavior near the cut-offs that is relevant to the outcomes being studied may itself generate discontinuities that spoil the design. Clearly, researchers using RDD should use whatever data they have to check whether other variables also jump at the cut-offs.
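One crude version of that check can be sketched as follows. This is an illustrative example with simulated data (all numbers hypothetical, not from the paper): compare the mean of a pre-treatment covariate, such as household socio-economic status, for schools just below versus just above the cut-off. If the covariate jumps at the threshold, sorting of the kind Urquiola and Verhoogen describe is likely at work.

```python
import random
import statistics

def covariate_jump(enrollments, covariate, cutoff=45, bandwidth=5):
    """Balance check: difference in mean covariate values between
    schools just above and just below the class-size cut-off.
    A large gap suggests sorting around the threshold."""
    below = [c for e, c in zip(enrollments, covariate)
             if cutoff - bandwidth <= e <= cutoff]
    above = [c for e, c in zip(enrollments, covariate)
             if cutoff < e <= cutoff + bandwidth]
    return statistics.mean(above) - statistics.mean(below)

# Simulated data in which richer households sort into schools
# just past the cap, building in a 0.5 jump in SES at the cut-off.
random.seed(0)
enrollments = [random.randint(30, 60) for _ in range(500)]
ses = [1.0 + (0.5 if e > 45 else 0.0) + random.gauss(0, 0.1)
       for e in enrollments]
print(round(covariate_jump(enrollments, ses), 2))  # close to the built-in 0.5
```

A serious application would use local polynomial regression and formal tests rather than raw means, but even this simple comparison would flag the discontinuity in household characteristics that taints the design.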

Nonetheless, there is something funny going on in the paper. Urquiola and Verhoogen use a model to explain why things like the socio-economic status of households and school quality will vary discontinuously at the cut-offs. But of course, one could probably use less formal reasoning to arrive at such hypotheses, and then test them with the data. The model seems like overkill for exposing something that is pretty obvious (although hindsight is always 20/20!). So I wonder: rather than using the model merely to expose a potential taint that is pretty obvious without it, why wouldn't we use the model to actually correct our estimation procedure and obtain our "best guess" estimate? That would seem to carry a Heckman-type approach to its logical conclusion. But maybe we just don't have that much faith in our models?