What does it mean to do empirical social science?
For many of us, most of the time, what it means is writing and debugging code.
A lot of economics is software engineering.
Whether I am analyzing Environmental Impact Statement completion-time trends, replicating inflation data from the BLS, or graphing examples of Simpson’s paradox with NIH clinical trials, most of my job as an economics researcher is writing code.
I’m not the only one either. In 1963, half of all published papers in economics were “theory,” which means they were solely exploring the properties of mathematical equations, with little connection to outside data. In 2011, theory papers were less than 20% of published work and simulation or empirical work made up the other 80%. Even within the theoretical papers, coding has become more important, but the transition to the far more software-intensive practice of empirical economics has pushed this further.
Jesús Fernández-Villaverde, an economics professor at the University of Pennsylvania explains:
“Computation has become a central tool in economics. From the solution of dynamic equilibrium models in macroeconomics or industrial organization, to the characterization of equilibria in game theory, or in estimation by simulation, economists spend a considerable amount of their time coding and running fairly sophisticated software.”
This shift into software skills has made economists valuable hires for tech companies like Amazon or Uber which in turn has further incentivized economics education and practice to adopt more of the tools of software engineering.
Some economists, like Melissa Dell of Harvard University, are writing state of the art OCR and machine learning software as infrastructure for their research.
Economics has inherited the incredibly powerful tools of software engineering and this has improved both the internal and external validity of the field. However, the transition of economics into software engineering is not yet complete and is currently in an uncanny valley. Economics has the power of software engineering but still retains the ethics and intellectual norms of 1970s academia.
Even though code and data have become by far the most important and resource intensive parts of economics research output, the currency of the profession is still “papers.” These are text-only artifacts that rarely describe the research process in enough to detail to recreate it, instead focusing on displaying the results. The skeuomorphic emulation of actual paper in PDFs decades after digital distribution became the primary source of audience is emblematic of the mismatch between the tools and the culture of economics.
Github repos for papers are rare. The vast majority of economics papers published today are collections of text, code, and data. But very often only the text part of this package is reviewed and published and the other essential pieces are left in the background.
This leads to embarrassingly widespread fraud, non-replication, and p-hacking.
PDFs are no longer sufficient to contain the research outputs of economists. There are too many choices in the code and data to accept the outputs without verification and too many contributions to future economics research to leave “available upon request.” Requiring a zip file that also contains the code and data behind a paper is a bare minimum that some journals do reach, but a structured, fork-able Github repo would be better.
Interactive, web-forward presentations of research like Scaling Monosemanticity from Anthropic or Bartosz Ciechanowski’s incredible blog posts, should be rewarded over basic PDFs. Basic versions of these are not difficult to create using tools like Shiny in R, like this interactive website I made for my senior thesis on cannabis legalization and prescription drug use.
There are also insufficient rewards for economists writing open-source software for other economists. Some do, e.g the DiD package by Callaway and Sant’Anna but it’s not common, even though it’s a massive leverage way to move the field. Looking through top papers in econ journals and you’ll find dozens of papers using the exact color scheme of the graphs produced by this package! There is little to no official reward for writing software like Callaway and Sant’Anna despite it being influential and useful for thousands of economists.
Journal review panels are the clearest enforcers of the mismatched intellectual norms of economics, but the ultimate source of these norms are the individual economics professors and their students. Next time you assign a econ research paper to your class, don’t accept just a pdf as the final product, push your students to submit their research as what it actually is: a software product.
Moving forward, we shouldn’t treat a paper as complete without a fully replicating Github repo. Eventually, this will move beyond simply appending the traditional paper to the other elements of a research project and towards a deeper combination of all the elements into one product. Perhaps something like replication walkthroughs that take readers through annotated and visualized code which verifies the results, highlights the places where important assumptions were made, and provides commentary on the results.
The culture of economics definitely shifting closer to the norms of software engineering but it should be pushed faster. We should have been treating papers as text + code + data packages decades ago. But these cultural norms are not static and change can be accelerated. Let’s escape the uncanny valley between typewriter academia and software engineering and push forward into a new age of economics research!
This is a little off-topic, but I believe that any researcher who receives government funding should be required to publish their dataset and computations online so other researchers could vet the results and potentially use the dataset in their research.
Physics is in a similar spot, where more and more of the everyday work of physics is writing Python code.
I wonder if programming skills will become like literacy. In 50 years will programming just be a necessary skill for any scientist? It’ll be easier to program, too, with tools like ChatGPT. Just like inventions like punctuation and lowercase letters made it easier to be literate.