This is a little off-topic, but I believe that any researcher who receives government funding should be required to publish their dataset and computations online so other researchers could vet the results and potentially use the dataset in their research.
Agree. Maybe to apply for a grant, applicants should be required to first create a GitHub account and link to it in their application.
Physics is in a similar spot, where more and more of the everyday work of physics is writing Python code.
I wonder if programming skills will become like literacy. In 50 years, will programming just be a necessary skill for any scientist? It’ll be easier to program, too, with tools like ChatGPT, just as inventions like punctuation and lowercase letters made it easier to be literate.
That is a very interesting point about the similarities between economics and software engineering. I am not a software engineer, but I have worked in the sector as a UX Designer for 20 years. Two common practices of software engineering may also be applicable to economics:
1) Code review by another developer before completion
2) Quality Assurance testing by third-party specialists, who deliberately try to break the code and identify its outer limits (a sketch of such a test follows below).
I am sure that economists will not like making publishing papers more complex and time-consuming, but it would probably help the profession in the long run.
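To illustrate practice (2), here is a rough sketch of the kind of adversarial test a QA specialist might write. The elasticity function below is an invented stand-in for real replication code, and the edge cases (empty data, zero prices, missing values) are the sorts of inputs a tester would deliberately throw at it rather than re-checking the headline result:

```python
# Rough sketch of adversarial QA tests. The toy function stands in for real
# replication code; the tests probe edge cases, not the published estimate.
# Run with pytest.
import math
import pytest


def estimate_elasticity(prices, quantities):
    """Toy log-log elasticity estimate (illustrative stand-in only)."""
    # NaN compares False against 0, so missing values are dropped here too.
    pairs = [(p, q) for p, q in zip(prices, quantities) if p > 0 and q > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two valid observations")
    xs = [math.log(p) for p, _ in pairs]
    ys = [math.log(q) for _, q in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def test_rejects_empty_input():
    # Degenerate input should fail loudly, not return a silent number.
    with pytest.raises(ValueError):
        estimate_elasticity([], [])


def test_drops_invalid_observations():
    # Zero prices and missing values should be excluded, not propagated.
    result = estimate_elasticity([1.0, 0.0, 2.0, math.nan],
                                 [5.0, 9.0, 4.0, 3.0])
    assert not math.isnan(result)
```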
The big problem is that software engineering is always iterative, but academia is oriented around events like publishing a paper or getting a grant. If someone finds a bug in the code, can you release V2 of the paper? Probably not.
It is understandable that an American software engineer would view economics that way, given the disastrously unhinged condition of our economy and the utter failure of our economists to chart a path out of our current mess (after charting our way in).
Read some Keynes, some Marx or, for that matter, some Xi Jinping, and you'll see that, while software plays an important supporting role in shaping economies, it represents barely 10% of the factors–including the all-important moral dimension–of successful economics.
In my opinion, any "scientific paper" which does not give a knowledgeable person everything necessary to reproduce the results is not a scientific paper.
It's a press release.
And in today's world, where so much of science is software development, part of what you need to give a knowledgeable person so they can reproduce the results is the software developed as part of writing the paper. Preferably as a GitHub repository. Ideally as well-documented code.
To me, though, that extends well beyond the software, and includes the tools and methods used--any apparatus constructed, any measurement equipment used, the way data was gathered. If you review R.A. Millikan's original paper in which he attempts to measure the charge of the electron, you'll find a diagram of the chamber used to run his tests and a detailed description of the way the experiment was conducted, the measurements taken, and the math used to arrive at his final results.
In today's world of 3D printers, that may also mean checking in CAD drawings and shape files of custom components that were created to construct the test apparatus, as well as a description of the pre-manufactured equipment used in it.
And to me, until we start seeing actual scientific papers--including references to GitHub repositories, 3D CAD drawings, GPS locations where ground surveys were conducted, and everything else so that the results can be replicated--we should start rejecting those papers that do not provide everything necessary to understand how the results were achieved.
Excellent
David Donoho and others believe that the combination of open datasets, open code, and competition is what's responsible for the machine learning revolution, not neural networks:
https://arxiv.org/abs/2310.00865
I highly recommend this piece to metascience folks, super interesting perspective. He thinks it could apply to other sciences too, especially computational fields like economics. I talk a little about building this flywheel in other fields here:
https://splittinginfinity.substack.com/p/standardize-science
As an upper bound on the quality of the coding you can expect from the economics community: while it's standard practice in the academic machine learning community to publish GitHub repos with code, it's common for these to be poorly organized, poorly documented, and to take significant effort to build in order to reproduce results. If computer scientists haven't managed to enforce software engineering norms for academic research, it's unlikely that economists will do effective software engineering, and there's probably an underlying reason that it's difficult for academics to provide high-quality code alongside their papers.
Agree. In my experience, the issue is not GitHub versus the replication code journals currently require, as the author suggests, but how well the code is organized and written. Well-written code transparently shows how the authors got their results.
I would consider the ease with which these choices can be subjected to scrutiny one of the criteria on which the validity of a paper ought to be assessed. This can be facilitated by the creation of open-source frameworks which standardise approaches and organisation, and it will also be greatly simplified by AI, which can read and understand entire repos trivially. I think we can reasonably assume that this issue will be solved within 5-10 years.
Economics is a sewer and no one cares.
I once asked the authors of a paper published in Science for the "equations" of their economic model. They said "Software is so good now that we never have to write down the equations." They then refused to share any details.
I asked the editors of Energy Economics to force the author of a paper they published to provide the details of his economic model. One editor replied "[obscenity]. You are trying to steal intellectual property."
Economists publish work that they know contains major errors. Nobody cares.
Jim Heckman, in his role as a JPE editor, told me that he would reject a paper of mine if I criticized one of their editors, even though that criticism was not in the paper. I took my name off the paper because I did not want my junior coauthors to be hurt by Jim's attack. After my name was taken off, Jim quickly accepted the paper.
See my blog, showmethemath.org, for many more examples of how economics is controlled by the "elite" with little concern for scientific standards.
The economists in *Red Plenty* were so ahead of us! Cybernetics and data and linear algebra ALL. THE. WAY. DOWN.
Sounds like a great topic for a research paper. Which does raise the question: what is the most effective way to push towards this future?
At work we use Jupyter Notebooks for all our data science work and for proofing calculations. It's effective because you can intermix text, executable code, and output, and tell a story. https://jupyter.org/
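For anyone who hasn't used them, here is a minimal sketch of what a proofing cell might look like; the compound-interest check is just an invented example, and in a real notebook the surrounding markdown cells would carry the narrative:

```python
# Proofing a calculation two ways inside one notebook cell.
principal = 10_000.0   # initial investment
annual_rate = 0.05     # 5% nominal annual rate
years = 10

# Closed-form compound growth...
compounded = principal * (1 + annual_rate) ** years

# ...cross-checked against an explicit year-by-year loop.
balance = principal
for _ in range(years):
    balance *= 1 + annual_rate

assert abs(compounded - balance) < 1e-6
print(f"Balance after {years} years: {compounded:,.2f}")
```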
I'd like to shamelessly promote my own 2017 article "The Code is the Model" on the topic. In it, I'm applying principles from agile software engineering to the art of building economic simulations.
https://www.microsimulation.pub/articles/00169
Empirical work is fine, and necessary, but perishable, because human behavior generates few or no exploitable numerical constants. The most valuable part of economics is the generalizations and some rudimentary techniques that can be absorbed (though they often are not) by college students.
I remember when I took Ken Judd's class that he made related complaints. One thing he couldn't understand was economists' insistence on coding their own solvers "from scratch." He advocated using professional-grade solvers (written by people who really know what they're doing). Perhaps part of this culture shift requires establishing good norms for 1) writing software that others can build on, and 2) what's okay and not okay when building on existing software. I get the sense that the open-source software world has worked a lot of this out, and we could probably learn a lot from them.
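To make the solver point concrete, here is a rough sketch (the supply and demand curves are made up for illustration): instead of hand-rolling a Newton iteration, a few lines hand the numerical work to SciPy's well-tested brentq root finder.

```python
# Sketch: find a market-clearing price with a professional-grade solver
# (scipy.optimize.brentq) rather than a hand-written iteration.
# The toy demand and supply curves below are illustrative, not from any paper.
from scipy.optimize import brentq


def excess_demand(price: float) -> float:
    demand = 100.0 / price   # toy demand curve
    supply = 2.0 * price     # toy supply curve
    return demand - supply


# brentq only needs a bracketing interval; the hard numerical work is done
# by a solver written and tested by specialists.
clearing_price = brentq(excess_demand, 0.01, 100.0)
print(f"Market-clearing price: {clearing_price:.4f}")
```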
You definitely could. I would argue that the first step towards implementing these norms will be to publish the code in repos so that the coding and architectural decisions can be subject to review, not just the economic decisions and assumptions. Also, Ken Judd has commented below.