How Scientific Incentives Stalled the Fight Against Antibiotic Resistance, and How We Can Fix It
My latest post at Tim Hwang's Macroscience
I am cross-posting an excerpt of a piece I wrote for Tim Hwang’s blog Macroscience about peptides and antibiotic resistance. You can read the entire thing here.
Introduction
For all of human history until the past 100 years or so, infectious diseases were our deadliest foe. Even during the roaring 1920s, nearly one in a hundred Americans would die of an infectious disease every year. To put that in context, even at the height of the COVID-19 pandemic in 2021, the US infectious disease death rate was ten times lower. The glorious relief we enjoy from the ancient specter of deadly disease is due in large part to the development of antibiotics like penicillin.
But this relief may soon come to an end. If nothing is done, antibiotic resistance threatens a return to the historical norm of frequent death from infectious disease. As humans use more antibiotics, we are inadvertently running the world's largest selective breeding program for bacteria that can survive our onslaught of drugs. Already by the late 1960s, 80% of infections caused by Staphylococcus aureus, a common and notorious bacterial pathogen, were resistant to penicillin. Since then, we have discovered many more powerful antibiotics, but our use of these drugs is growing rapidly while our rate of discovery is stagnating at best.
As a result, antibiotic resistance is spreading. Today, some strains of Staphylococcus aureus, such as MRSA (methicillin-resistant Staphylococcus aureus), resist many of our most powerful antibiotics, and these infections cause about 20,000 deaths every year in the US.
The most promising solution to antibiotic resistance comes from dragon blood.
Komodo dragons, native to a few small islands in Indonesia, are the world’s largest lizards. They eat carrion and live in swamps, and their saliva hosts many of the world’s most stubborn and infectious bacteria. But Komodos almost never get infected. Even when they have open wounds, Komodo dragons can trudge happily along through rotting corpses and mud without a worry.
Their resilience is due to an arsenal of chemicals in their blood called antimicrobial peptides. These peptides are short sequences of amino acids, the building blocks of proteins. These chemical chains glom onto negatively charged bacteria (but not neutrally charged animal cells) and force open holes in the membrane, killing the infectious bacterium. Humans have peptides too, and we use them for everything from regulating blood sugar with insulin to fighting infections.
Peptides are especially promising candidates for treating antibiotic-resistant infections, for two reasons. First, they are easy to program and synthesize. Their properties and structure result from chaining amino acids together in a line, so it’s easy to work with them computationally and apply machine learning and bioinformatics. Second, peptides are resistant to resistance. Researchers can use them to target much more fundamental properties of bacteria, whereas conventional antibiotics target particular molecular pathways that a single, small mutation can often close off. For example, bacterial membranes are almost universally negatively charged; this is a feature of their physiology that cannot easily be mutated away. Peptides that use this negative charge to seek out and destroy invading bacteria are therefore difficult to evade, even after the bacteria have endured generations of intense selection pressure from being targeted.
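As a rough illustration of how directly a sequence string maps to the charge property described above, here is a minimal Python sketch. It uses a deliberate simplification, which is an assumption rather than a rigorous electrochemical model: lysine (K) and arginine (R) count as +1, aspartate (D) and glutamate (E) count as -1, and histidine and the peptide termini are ignored. The function names and the +2 cutoff are hypothetical choices for illustration.

```python
# Approximate net charge of a peptide near neutral pH.
# Simplifying assumption: K and R count as +1, D and E as -1;
# histidine and the N-/C-terminal charges are ignored.
CHARGE_AT_NEUTRAL_PH = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(sequence: str) -> int:
    """Sum the approximate per-residue charges of a one-letter sequence."""
    return sum(CHARGE_AT_NEUTRAL_PH.get(residue, 0) for residue in sequence.upper())


def is_cationic(sequence: str, threshold: int = 2) -> bool:
    """Flag peptides positive enough to be drawn toward negatively charged
    bacterial membranes. The +2 threshold is an arbitrary illustration."""
    return net_charge(sequence) >= threshold


if __name__ == "__main__":
    magainin_2 = "GIGKFLHSAKKFGKAFVGEIMNS"  # a well-studied frog antimicrobial peptide
    print(net_charge(magainin_2), is_cationic(magainin_2))
```

For magainin 2, `net_charge` returns 3 (four lysines against one glutamate), so it is flagged as cationic. Real design work uses far richer descriptors, but the point stands: because a peptide is just a string, its properties can be computed, compared, and fed to models cheaply.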
Even though peptides are short, usually fewer than 50 amino acids, the combinatorial space of possible sequences is vast. It’s difficult to search this space for peptides that are effective against the resistant superbugs which threaten to return us to the medieval world of deadly infections. However, the search is a well-defined problem with easy-to-measure inputs and outputs, which leaves it perfectly poised to benefit from rapid advances in computation. The cutting edge of research in this field involves building machine learning models that predict which amino acid sequences will be bioactive against particular pathogens (similar in spirit to DeepMind’s AlphaFold, which predicts protein structure from sequence), then synthesizing the most promising peptides and testing the models’ predictions in the lab.
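The arithmetic behind "vast" is simple: with 20 standard amino acids, there are 20^n possible linear peptides of length n. A short sketch, assuming only canonical residues and no chemical modifications (both simplifications, since real peptide chemistry allows more):

```python
# Size of the peptide search space, assuming 20 standard amino acids
# and no modified or non-canonical residues.
def sequences_of_length(n: int, alphabet_size: int = 20) -> int:
    """Number of distinct linear peptides of exactly n residues."""
    return alphabet_size ** n


def sequences_up_to(max_length: int, alphabet_size: int = 20) -> int:
    """Number of distinct linear peptides of 1..max_length residues."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


if __name__ == "__main__":
    for n in (10, 20, 50):
        print(f"length {n}: {float(sequences_of_length(n)):.2e} sequences")
```

This prints about 1.02e+13, 1.05e+26, and 1.13e+65 sequences for lengths 10, 20, and 50. Exhaustive screening is out of the question, which is why models that generalize from a tested sample are so attractive.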
But progress in this field is slower than we need to meet the challenge of antibiotic resistance. This isn’t just due to inherent difficulties in the science, though those certainly exist. Progress is also slowed by small, scattered, and poorly maintained datasets of peptide sequences paired with experimentally verified properties. Machine learning thrives on big data, but the largest database of peptides has only a few thousand experimentally validated sequences and tracks only three or four properties, like antimicrobial activity and host toxicity. Measurements of those properties are often difficult to compare across sources.
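One way to make measurements comparable across sources is to store every test, failures included, as an explicit record with units baked into the field names. The sketch below is a hypothetical schema: the class and field names are illustrative assumptions, not the format of any existing peptide database. It also shows why a molecular weight has to travel with each record: converting a minimum inhibitory concentration (MIC) from µg/mL to µM requires it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PeptideMeasurement:
    """One peptide tested against one organism (hypothetical schema)."""
    sequence: str                           # one-letter amino acid codes
    target_organism: str
    molecular_weight_da: float              # needed to convert mass units to molar units
    max_tested_ug_per_ml: float             # highest concentration actually tested
    mic_ug_per_ml: Optional[float] = None   # None = no inhibition up to max_tested

    @property
    def is_negative(self) -> bool:
        """A recorded failure: no growth inhibition anywhere in the tested range."""
        return self.mic_ug_per_ml is None

    @staticmethod
    def ug_per_ml_to_micromolar(conc_ug_per_ml: float, mw_da: float) -> float:
        """µM = (µg/mL) * 1000 / MW, with MW in g/mol (numerically equal to Da)."""
        return conc_ug_per_ml * 1000.0 / mw_da

    def mic_micromolar(self) -> Optional[float]:
        """MIC in µM, or None for a negative result."""
        if self.mic_ug_per_ml is None:
            return None
        return self.ug_per_ml_to_micromolar(self.mic_ug_per_ml, self.molecular_weight_da)
```

For instance, an MIC of 8 µg/mL for a 2,000 Da peptide converts to 4 µM. A record with `mic_ug_per_ml=None` is still informative: it says the peptide did nothing up to `max_tested_ug_per_ml`.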
Most importantly, there is almost zero negative data in these sources. Scientists test hundreds or thousands of peptides to find one that is active against some pathogen, and then they publish a paper about the one that succeeded. That success might go into a database, but all of the preceding failures stay in the file drawer, even though, at current margins, they are far more valuable for machine learning models than one more successful data point.
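To make the file-drawer point concrete, here is a toy sketch in plain Python. Everything in it is synthetic: the sequences are random strings, and the "active" label comes from an invented net-charge rule, not from real assays or any published model. It trains the same small logistic regression twice, once on successes only and once on successes plus recorded failures.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def featurize(seq: str) -> list[float]:
    """Bias term plus counts of basic (K, R) and acidic (D, E) residues."""
    basic = sum(seq.count(a) for a in "KR")
    acidic = sum(seq.count(a) for a in "DE")
    return [1.0, float(basic), float(acidic)]


def hypothetical_label(seq: str) -> int:
    """Synthetic stand-in for a lab assay: 'active' if net charge >= +2.
    This rule is invented for the demo, not a real activity criterion."""
    f = featurize(seq)
    return int(f[1] - f[2] >= 2)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=300):
    """Plain stochastic-gradient logistic regression, no libraries."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(len(w)):
                w[j] += lr * (yi - p) * xi[j]
    return w


def accuracy(w, X, y):
    preds = [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5) for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


rng = random.Random(0)
seqs = ["".join(rng.choice(AMINO_ACIDS) for _ in range(15)) for _ in range(400)]
X = [featurize(s) for s in seqs]
y = [hypothetical_label(s) for s in seqs]
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

# "File drawer" scenario: only the successes are kept.
pos_only = [(xi, yi) for xi, yi in zip(X_train, y_train) if yi == 1]
w_pos = train_logistic([xi for xi, _ in pos_only], [yi for _, yi in pos_only])

# Full scenario: successes and failures are both recorded.
w_all = train_logistic(X_train, y_train)

print("positives only:", accuracy(w_pos, X_test, y_test))
print("with negatives:", accuracy(w_all, X_test, y_test))
```

Because every training label in the positives-only run is 1 and every feature is non-negative, its weights can only grow, so it ends up calling every peptide active; its test accuracy is just the share of actives in the test set. The model that also saw failures can actually separate the two classes.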
Making a better dataset is feasible and desirable, but no actor in science today has the incentive to do it. Open datasets are a public good, so private research organizations will tend to underinvest in them. Academia’s non-pecuniary rewards, like publications and prestige, point toward splashy results in big journals, not toward foundational infrastructure like a dataset.
This problem is solvable with public investment in data production. A massive, standardized, and detailed dataset of one million peptide sequences and their antimicrobial properties (or lack thereof) would accelerate progress towards new drugs that can kill antibiotic-resistant pathogens. It would replicate the success of large public data efforts like the Protein Structure Initiative and the Human Genome Project, and put us on track to defeat these drug-resistant diseases before they roll back the clock on a century of medical progress.
Much more at the link.