Parametric and machine learning approaches to examine yield differences between control and treatment considering outliers and statistical biases: The case of insect resistant/herbicide tolerant (IR/HT) maize in Honduras
Robust impact assessment methods need credible estimates of yield, costs, and other production performance parameters. Sample data issues and the realities of producer heterogeneity and markets, including endogeneity, simultaneity, and outliers, can affect such parameters. Methods have continued to evolve that may address the data issues identified in the earlier literature on the impacts of genetically modified (GM) crops, particularly issues associated with conventional field-level surveys. These methods may themselves have limitations, introduce trade-offs, and may not always succeed in addressing such issues. Experimental methods such as randomized control trials have been proposed to address several control/treatment data issues, but they may not suit every situation and issue, and they may be more expensive and complex than conventional field surveys. Furthermore, experimental methods may have the unfortunate outcome of crowding out impact assessors from low- and middle-income countries. The continued search for alternatives that help address the shortcomings of conventional surveys remains critical. Previously, existing assessment methods were applied to the impact assessment of insect resistant and herbicide tolerant maize adoption in Honduras in 2008 and 2012. Results from those assessments identified endogeneity issues, such as self-selection and simultaneity, occurring together with influential outliers. Procedures used to address these issues independently showed trade-offs between addressing endogeneity and addressing outliers. Thus, the need continues to identify methods that address both issues simultaneously while minimizing, as much as possible, the impact of method trade-offs.
We structured this paper as follows. First, we review the literature to delineate data and assessment issues potentially affecting robust performance indicators such as yield and cost differentials. Second, we discuss four types of approaches that can be used to obtain robust performance estimates for yield and cost differentials, and we apply them to the pooled 2008 and 2012 Honduras field surveys: 1) Robust Instrumental Variables, 2) Instrumental Variable Regressions, 3) Control/Treatment, and 4) Machine Learning methods amenable to robust strategies for dealing with outliers, including Random Forest and a Stacking regression approach that allows for a number of “base learners”. Third, we discuss implications for impact assessment results and implementation limitations, especially in low- and middle-income countries. We further discuss and draw some conclusions regarding methodological issues for consideration by impact assessors and stakeholders.
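As a purely illustrative aside on why influential outliers matter for yield differentials, the sketch below compares a mean-based and a median-based treatment/control yield gap. The plot-level yields are hypothetical values invented for this example and are not drawn from the Honduras surveys; the helper name `yield_gap` is likewise an assumption. Neither estimator addresses endogeneity, which the abstract notes must be handled alongside outliers, and this is not one of the paper's four approaches.

```python
from statistics import mean, median

# Hypothetical plot-level yields in t/ha (illustrative only, NOT survey data).
control = [3.0, 3.2, 2.9, 3.1, 3.3]
treatment = [3.6, 3.8, 3.5, 3.7, 9.5]  # the last value is a deliberate outlier


def yield_gap(treated, untreated, center):
    """Treated-minus-untreated difference in a chosen measure of central tendency."""
    return center(treated) - center(untreated)


# The mean is pulled upward by the single extreme plot; the median is not.
mean_gap = yield_gap(treatment, control, mean)
median_gap = yield_gap(treatment, control, median)

print(f"mean-based gap:   {mean_gap:.2f} t/ha")
print(f"median-based gap: {median_gap:.2f} t/ha")
```

With these made-up numbers, one extreme plot nearly triples the apparent yield advantage when means are compared, while the median-based gap is unaffected. This is the kind of sensitivity that motivates outlier-robust estimators.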
Authors
Falck-Zepeda, José B.; Zambrano, Patricia; Sanders, Arie; Trabanino, Carlos Rogelio
Citation
Falck-Zepeda, José B.; Zambrano, Patricia; Sanders, Arie; and Trabanino, Carlos Rogelio. 2025. Parametric and machine learning approaches to examine yield differences between control and treatment considering outliers and statistical biases: The case of insect resistant/herbicide tolerant (IR/HT) maize in Honduras. IFPRI Discussion Paper 2334. Washington, DC: International Food Policy Research Institute. https://hdl.handle.net/10568/174327
Keywords
Latin America and the Caribbean; Central America; Maize; Yields; Impact Assessment; Agriculture; Data; Capacity Building; Machine Learning; Parametric Programming; Herbicide Resistance
Access/Licence
Open Access