CRISPR is a breakthrough technology with humble origins. Scientists first discovered the powerful gene editor in bacteria that were using it as a weapon against invading viruses called phages. Phages can wipe out up to a quarter of a bacterial population in a day. Under assault, bacteria have evolved a hefty arsenal of defenses in a relentless arms race.
These bacterial immune systems often chop up the DNA or RNA of invading viruses and are relatively easy to manufacture, making them alluring targets for scientists developing genetic engineering tools. CRISPR is just one example. There are many more. But traditional methods of searching for them are slow and labor-intensive, leaving most CRISPR-like proteins unexplored.
Now, MIT scientists have released an AI called DefensePredictor that can root out new bacterial defense systems in five minutes, instead of weeks or months. As proof of concept, DefensePredictor churned through hundreds of thousands of proteins in multiple strains of Escherichia coli (E. coli). Over 600 proteins not previously linked to immune defense popped up. When added to a vulnerable strain of bacteria, a subset of these proteins protected it against attack.
“E. coli harbors a much broader landscape of antiphage defense than previously realized, expanding the likely number of systems by multiple orders of magnitude,” wrote the team.
These systems might hold secrets about how immunity evolved. And because the proteins may work in different ways, they could be a goldmine for next-generation precision molecular tools.
Unrivaled Success
Nearly four decades ago, Japanese scientists discovered a curious, repetitive DNA sequence in E. coli. Other researchers later realized it was widespread across bacterial species and that stretches within it matched viral DNA sequences, suggesting it could be part of the bacteria’s immunity against phages.
The system now known as CRISPR stores snippets of DNA from past infections and uses protein “scissors” to cut apart matching viral DNA during reinfection. Intrigued by its precision, scientists repurposed CRISPR into a variety of gene editing tools and launched a gene therapy revolution.
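The store-and-match idea described above can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the function names `acquire_spacer` and `find_cut_sites` and the DNA strings are made up for this example, and real CRISPR systems rely on guide RNAs, Cas proteins, and additional sequence signals that are left out here.

```python
# Toy model of CRISPR-style memory: store snippets of invader DNA,
# then look for exact matches in incoming DNA. All names and sequences
# are hypothetical; this is not how any real tool models CRISPR.

def acquire_spacer(memory, phage_dna, start, length=8):
    """Copy a short snippet of invader DNA into the cell's 'memory'."""
    memory.append(phage_dna[start:start + length])


def find_cut_sites(memory, incoming_dna):
    """Return (position, snippet) for every stored snippet found in incoming DNA."""
    sites = []
    for spacer in memory:
        pos = incoming_dna.find(spacer)
        while pos != -1:
            sites.append((pos, spacer))
            pos = incoming_dna.find(spacer, pos + 1)
    return sites


phage = "ATGCGTACCGTTAGCATCGGA"      # made-up phage genome
harmless = "TTTTAAAACCCCGGGG"        # made-up unrelated DNA

memory = []
acquire_spacer(memory, phage, start=4)   # first infection: remember a snippet

print(find_cut_sites(memory, phage))     # reinfection: the stored snippet matches
print(find_cut_sites(memory, harmless))  # unrelated DNA: nothing to cut
```

In this sketch, the stored snippet acts as the memory of a past infection, and a match marks where the protein "scissors" would cut.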
CRISPR is the most famous, but a range of bacterial defense systems have transformed genetic engineering. One, containing an enzyme that cuts specific sequences of foreign DNA, is widely used to add genetic material into cells. Another encodes a balance of toxins and antitoxins that can trigger bacterial death after phage infection. This one has been adapted into a kill switch to prevent engineered microbes or genetically modified crops from spreading uncontrollably.
Researchers are also exploring the use of newly discovered systems, with video game-like names such as Zorya and Thoeris, as molecular sensors and programmable signaling components in synthetic biology.
There are likely more undiscovered tools in the universe of bacterial defense, and scientists have ways of hunting them down. Some defense genes are grouped close to one another, so a known gene could guide the discovery of others. Researchers have also found genes by screening libraries of free-floating circular genome fragments across bacterial populations.
Over 250 systems have been painstakingly validated. But plenty more could escape current detection methods if, for example, their components are spread across the genome.
“The full repertoire of antiphage defense systems in bacteria remains unknown,” wrote the team. “We currently lack the tools to systematically identify systems with high speed, sensitivity, and specificity.”
AI Discoverer
The new DefensePredictor algorithm bridges that gap.
At its core is a protein language model called ESM-2. Proteins are made of 20 molecular “letters” that combine into strings and fold into complex 3D shapes. Similar to large language models, algorithms like ESM-2 learn the language of proteins and can predict their structure and purpose based on sequence alone.
ESM-2 and similar algorithms have already helped scientists decipher mysterious proteins in bacteria, viruses, and other microorganisms previously unknown to science. Researchers hope the proteins' unique shapes could inspire antibiotics and biofuels, or even be used to build synthetic organisms.
To build their AI, the team first established a training ground. With a previous model, DefenseFinder, they screened roughly 17,000 microbial genomes for genes related—and unrelated—to defense systems. They translated these genes into corresponding proteins and built up a database with some 15,000 antiphage proteins and 186,000 proteins unrelated to defense.
These numbers are far too large for a human to tackle by hand, but the AI took the work in stride. Alongside ESM-2, the model used several algorithms to distinguish between defense and non-defense proteins. Eventually DefensePredictor learned some general characteristics that make a protein more likely to be part of the immune system. (As with other language models, it’s hard to fully understand the system’s reasoning, which the team is still trying to unpack.)
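To make the general recipe concrete (turn each protein sequence into numbers, then learn to separate labeled defense proteins from non-defense ones), here is a minimal Python sketch. It is not the team's method. As a stand-in for ESM-2 embeddings it uses simple amino acid composition, and as a stand-in for their classifier it uses a nearest-centroid rule. The function names and the short sequences are invented for illustration and are not real proteins.

```python
# Toy defense/non-defense classifier. Assumptions: amino acid composition
# replaces a real protein language model embedding, and a nearest-centroid
# rule replaces whatever classifier the researchers actually trained.
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard protein "letters"


def embed(sequence):
    """Stand-in embedding: the fraction of each amino acid in the sequence."""
    counts = Counter(sequence)
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]


def centroid(vectors):
    """Average a list of equal-length vectors, position by position."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(defense_seqs, other_seqs):
    """Summarize each labeled class by the average of its embeddings."""
    return {
        "defense": centroid([embed(s) for s in defense_seqs]),
        "non-defense": centroid([embed(s) for s in other_seqs]),
    }


def predict(model, sequence):
    """Assign the label whose class average is closest to the sequence."""
    vector = embed(sequence)
    return min(model, key=lambda label: distance(vector, model[label]))


# Made-up training examples, far smaller than the real database.
defense_examples = ["KKRKHKKRLK", "RKKHKRKKLR"]
other_examples = ["AVLIGAVLIA", "GAVLLIAVGA"]

model = train(defense_examples, other_examples)
print(predict(model, "KRKKHKRKKA"))
print(predict(model, "AVGLIAVLIK"))
```

A real pipeline would replace `embed` with embeddings from a protein language model and train on the thousands of labeled proteins described above. The sketch only shows the shape of the workflow.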
When tested on 69 strains of E. coli, DefensePredictor surfaced a treasure trove of over 600 new defense-related proteins, including more than 100 that were different from any yet discovered. Although some were encoded near one another or in circular DNA, like previous findings, nearly half weren’t. They were instead scattered across the genome yet may still work together.
To test the results, the team engineered a highly vulnerable E. coli strain to express candidate defense proteins (predicted to work either alone or as part of a system) and exposed it to two dozen aggressive phages. Nearly 45 percent of the proteins offered protection against at least one phage.
Beyond E. coli, the scientists expanded their search to 1,000 more microorganisms and found thousands of potential defense proteins unlike anything seen before. “New immune mechanisms remain to be found,” wrote the team.
The race is on. In a study also published this week, a Pasteur Institute team combined multiple AI models to look for antiphage systems in protein sequences. Across over 32,000 bacterial genomes, the models predicted nearly 2.4 million antiphage proteins, most of them previously unknown. The team released an atlas of AI-predicted bacterial immunity proteins for others to explore.
“The diversity of antiphage defense systems is vast and largely untapped,” they wrote.
Microorganisms harbor a colossal repertoire of biological tools we’re only just beginning to uncover at scale. More species are constantly found thriving in diverse environments, from pond scum to boiling sulfuric springs to the crushing pressure of the Mariana Trench. Every new genome scientists discover and pick apart, now with AI’s help, could be hiding the next CRISPR.








