Scientific research is undergoing a fundamental transformation, one that may prove to be the most significant shift in how humanity pursues knowledge since the invention of the scientific method itself. What once took months of painstaking analysis can now be accomplished in minutes. Experiments that required entire teams of specialists working around the clock can now be designed and interpreted by AI systems that never sleep, never forget a detail, and can access the entirety of human scientific knowledge in an instant.
This isn't speculation about a distant future or optimistic projection from technology enthusiasts. It's happening right now, today, in research labs across the world, from the gleaming facilities at Stanford and MIT to pharmaceutical giants and small academic teams pushing the boundaries of what's possible.
The Dawn of a New Scientific Era
For centuries, scientific progress has been constrained by a fundamental bottleneck: human cognitive bandwidth. A brilliant researcher might dedicate their entire career to understanding a single protein, a specific cellular pathway, or one aspect of a disease. The sheer volume of scientific knowledge, now doubling every nine years, has made it impossible for any individual to stay current even within their narrow specialty.
In October 2025, Anthropic launched Claude for Life Sciences, a suite of connectors and specialized capabilities designed to transform AI from a mere tool into a genuine scientific collaborator. The launch represented a watershed moment in the company's strategy, marking its first formal entry into the life sciences sector with purpose-built features for researchers.
Since then, Anthropic has invested heavily in making Claude the most capable model for scientific work. Its latest model, Claude Opus 4.5, demonstrates significant improvements in figure interpretation (correctly parsing complex scientific visualizations), computational biology tasks, and protein understanding benchmarks. On the Protocol QA benchmark, which measures comprehension of experimental procedures, Claude Sonnet 4.5 scored 0.83, exceeding the human expert baseline of 0.79.
But raw benchmarks only tell part of the story. The real revolution lies in how scientists are actually using these tools to reshape the very process of discovery.
Inside the AI for Science Program
Through Anthropic's AI for Science program, which provides free API credits to leading researchers working on high-impact scientific projects around the world, scientists have developed custom systems that go far beyond basic tasks like literature reviews, data visualization, or coding assistance.
In the labs participating in this program, AI has evolved into something unprecedented: a collaborator that works seamlessly across all stages of the research process. These AI systems are helping scientists determine which experiments to run before they invest precious resources, compressing projects that normally take months into mere hours, and identifying patterns in massive datasets that would take human researchers years to recognize, if they noticed them at all.
In many cases, these AI tools are eliminating long-standing bottlenecks, handling tasks that require deep domain knowledge and have historically been impossible to scale. In some instances, they're enabling entirely different research approaches than scientists have traditionally been able to pursue.
The result? Claude is beginning to reshape how these scientists work, and more importantly, it's pointing them toward novel scientific insights and discoveries that might otherwise have remained hidden for decades.
Biomni: Unifying the Fragmented World of Biomedical Research
One of the most significant and often overlooked bottlenecks in biological research is the sheer fragmentation of tools available to scientists. Across the biomedical sciences, there exist hundreds of databases, each with its own query language and data format. There are countless software packages, each designed for specific analyses. There are established protocols developed over years of trial and error, scattered across thousands of published papers and supplementary materials.
Researchers, especially those early in their careers, spend an enormous amount of time simply learning to navigate this landscape. They must select appropriate tools from bewildering options, master various platforms and their idiosyncrasies, and figure out how to chain different analyses together. That's precious time that, in a perfect world, would be spent on running experiments, interpreting data, developing new hypotheses, or pursuing entirely new research directions.
Enter Biomni, an ambitious agentic AI platform developed by a collaborative team spanning Stanford University, Genentech, the Arc Institute, the University of Washington, Princeton University, and the University of California, San Francisco. Biomni represents a fundamentally new approach: collecting hundreds of tools, packages, and datasets into a single unified system through which a Claude-powered agent can navigate intelligently.
Biomni's foundational environment, called Biomni-E1, was constructed through an unprecedented effort: mining tens of thousands of biomedical publications across 25 distinct subfields. From this vast literature, the team extracted 150 specialized tools, 105 software packages, and 59 databases, creating a unified biomedical action space that no single human researcher could ever hope to master.
The system works remarkably simply from the user's perspective. Researchers give it requests in plain English, describing what they want to accomplish. Biomni automatically selects the appropriate resources, chains them together in logical sequences, handles data format conversions, and executes complex analyses. It can form hypotheses about biological phenomena, design detailed experimental protocols, and perform sophisticated analyses across more than 25 biological subfields, from genomics to proteomics to clinical research.
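To make that workflow concrete, here is a minimal sketch of the agentic pattern Biomni exemplifies, written against Anthropic's Messages API: the model is handed a registry of callable tools and decides which to invoke and in what order. The tool names, schemas, and model identifier below are illustrative stand-ins, not Biomni's actual catalog or implementation.

```python
# Minimal agent loop: the model selects and chains tools until it can answer.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical entries standing in for Biomni's 150 tools and 59 databases.
TOOLS = [
    {
        "name": "query_variant_database",
        "description": "Look up known annotations for a genetic variant.",
        "input_schema": {
            "type": "object",
            "properties": {"rsid": {"type": "string"}},
            "required": ["rsid"],
        },
    },
    {
        "name": "run_pathway_enrichment",
        "description": "Test a gene list for over-represented biological pathways.",
        "input_schema": {
            "type": "object",
            "properties": {"genes": {"type": "array", "items": {"type": "string"}}},
            "required": ["genes"],
        },
    },
]

def execute_tool(name: str, args: dict) -> str:
    """Dispatch to real implementations; stubbed out for this sketch."""
    return f"(stub result for {name} with {args})"

messages = [{"role": "user",
             "content": "Which pathways might link the top hits rs12345 and rs67890?"}]

while True:
    response = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model identifier
        max_tokens=1024,
        tools=TOOLS,
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        break  # the model has produced its final answer
    # Record the model's tool calls, run them, and feed the results back.
    messages.append({"role": "assistant", "content": response.content})
    results = [
        {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": execute_tool(block.name, block.input),
        }
        for block in response.content
        if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": results})

print(response.content[0].text)
```

The essential idea is that the chaining logic lives with the model, not in hand-written glue code: adding a new database or package means registering one more tool, not rewriting a pipeline.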
The GWAS Revolution: From Months to Minutes
To understand Biomni's transformative potential, consider the example of a genome-wide association study, commonly known as a GWAS. These studies search for genetic variants linked to specific traits or diseases, and they've revolutionized our understanding of the genetic basis of conditions from diabetes to schizophrenia.
Here's how a GWAS traditionally works: Researchers assemble a very large group of people, some of whom have a particular trait or condition and others who don't. Perfect pitch, for instance, has a strong genetic basis. You might gather thousands of people who can produce a musical note without any reference tone, and compare them to others you would never invite to karaoke. Then you scan their genomes, looking for genetic variants that appear more frequently in one group than the other.
The genome scanning itself is relatively straightforward with modern sequencing technology. It's everything that comes after that consumes months of researcher time:
- Data cleaning: Genomic data arrives in messy, inconsistent formats. Missing values need handling. Quality control must be performed on thousands of genetic markers.
- Statistical analysis: Researchers must control for countless confounding variables, things like population ancestry, age, environmental factors, and other genetic variants that might create spurious associations.
- Hit identification: When statistical signals emerge, researchers need to determine which ones are genuinely significant versus random noise.
- Biological interpretation: For each "hit," researchers must investigate: What gene is nearby? (GWAS identifies genomic locations, not genes directly.) What cell types express that gene? What biological pathway might be affected? Is this mechanism plausible given what we know about the trait?
Each step typically involves different specialized software, different file formats, and extensive manual decision-making based on domain expertise accumulated over years. A single comprehensive GWAS analysis can easily consume three to six months of a skilled researcher's time.
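For a sense of what that pipeline involves, here is a deliberately compressed sketch of the four steps above in Python. The input files, column names, and thresholds are hypothetical; real pipelines typically use dedicated tools such as PLINK and control for population ancestry with principal components, which this toy version omits.

```python
# Toy GWAS downstream pipeline: QC, association testing, hit calling, annotation.
import pandas as pd
from scipy.stats import chi2_contingency

geno = pd.read_csv("genotypes.csv")   # hypothetical: one row per variant, allele counts per group
genes = pd.read_csv("genes.csv")      # hypothetical: columns gene, chrom, start, end

# 1. Data cleaning: drop variants with low call rates or very rare minor alleles.
geno = geno[(geno["call_rate"] > 0.98) & (geno["minor_allele_freq"] > 0.01)]

# 2. Statistical analysis: 2x2 chi-square of allele counts, cases vs. controls.
def assoc_p(row) -> float:
    table = [
        [row["case_alt"], row["case_ref"]],
        [row["ctrl_alt"], row["ctrl_ref"]],
    ]
    return chi2_contingency(table)[1]  # p-value

geno["p"] = geno.apply(assoc_p, axis=1)

# 3. Hit identification: the conventional genome-wide threshold is p < 5e-8,
#    a Bonferroni-style correction for roughly a million independent tests.
hits = geno[geno["p"] < 5e-8].copy()

# 4. Biological interpretation, first pass: map each hit to its nearest gene.
def nearest_gene(chrom: str, pos: int) -> str:
    nearby = genes[genes["chrom"] == chrom].copy()
    nearby["dist"] = nearby[["start", "end"]].sub(pos).abs().min(axis=1)
    return nearby.loc[nearby["dist"].idxmin(), "gene"]

hits["gene"] = [nearest_gene(c, p) for c, p in zip(hits["chrom"], hits["pos"])]
print(hits[["rsid", "chrom", "pos", "p", "gene"]])
```

Even this skeleton hints at why the full process takes months: every step hides judgment calls (which QC thresholds, which covariates, which annotation sources) that an agent like Biomni must make, and justify, on the researcher's behalf.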
In an early trial of Biomni, this entire analytical pipeline was completed in 20 minutes.
Rigorous Validation Across Multiple Domains
Such dramatic claims require rigorous validation, and the Biomni team has delivered exactly that through multiple carefully designed case studies spanning different subfields of biology.
In one validation study, Biomni was tasked with designing a molecular cloning experiment, a fundamental technique in biology used to create copies of specific DNA segments. In a blind evaluation where expert reviewers didn't know whether they were assessing AI or human work, Biomni's protocol and experimental design matched the quality of work produced by a postdoctoral researcher with more than five years of hands-on laboratory experience.
In another case study focusing on wearable health data, Biomni analyzed over 450 data files collected from 30 different individuals. These files contained diverse data streams: continuous glucose monitoring readings tracking blood sugar fluctuations, temperature measurements, and physical activity metrics from fitness trackers. The system processed all of this heterogeneous data, identified relevant patterns, and produced meaningful analysis in just 35 minutes. Human experts estimated this same task would require approximately three weeks of dedicated work.
Perhaps most impressively, Biomni tackled single-cell analysis, one of the most data-intensive and technically demanding areas of modern biology. The system analyzed gene activity data from over 336,000 individual cells extracted from human embryonic tissue. Working through this massive dataset, Biomni not only confirmed regulatory relationships between genes that scientists had already established through years of painstaking research, but also identified novel transcription factors (the proteins that control when genes turn on and off) that researchers hadn't previously connected to human embryonic development.
On standardized benchmarks designed to test AI capabilities in life sciences, Biomni's performance is striking. On the LAB-Bench benchmark, the system achieved 74.4% accuracy in database question-answering tasks and 81.9% accuracy in sequence analysis, roughly matching the human expert baseline of 74.7% on the former and exceeding the 78.8% baseline on the latter. On the more challenging HLE benchmark covering 14 subfields, Biomni scored 17.3%, which might sound modest until you realize this outperforms base large language models by 402.3% and specialized coding agents by 43.0%.
Built-in Safeguards and Expert Collaboration
The Biomni team is careful to acknowledge that their system isn't perfect, and that recognition of limitations is itself a feature. The platform includes sophisticated guardrails designed to detect when Claude might be heading off-track, when an analysis doesn't make biological sense, or when conclusions aren't adequately supported by the data.
Where Biomni encounters gaps in its capabilities, the system supports a collaborative solution: experts can encode their specialized methodology as a "skill," essentially teaching the agent how an experienced researcher in their domain would approach a particular problem, rather than letting the AI improvise from general principles.
This collaborative approach proved essential when the team worked with the Undiagnosed Diseases Network on rare disease diagnosis. The researchers discovered that Claude's default approach to diagnostic reasoning differed substantially from the systematic methodology a skilled clinician would employ. Rather than abandon the effort, they interviewed diagnostic experts, carefully documented their reasoning process step by step, and taught this previously tacit knowledge to Claude. With that new expertise integrated, the agent's diagnostic performance improved dramatically.
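What does "encoding expertise as a skill" look like in practice? One plausible minimal form is the expert's stepwise methodology captured as structured text and injected into the agent's system prompt. The steps below are illustrative paraphrases for the rare-disease example, not the Undiagnosed Diseases Network's actual protocol or Biomni's exact skill format.

```python
# A hypothetical skill: an expert's diagnostic methodology as a prompt fragment.
DIAGNOSTIC_SKILL = """\
Skill: rare-disease diagnostic reasoning
When evaluating an undiagnosed patient, follow these steps in order:
1. Extract phenotypes from the case notes and map them to standard HPO terms.
2. Rank candidate genes by overlap between the patient's phenotypes and known
   gene-phenotype associations, before looking at variant data.
3. For each candidate variant, check the inheritance pattern against family history.
4. Explicitly list findings that argue AGAINST each leading hypothesis.
5. Output a ranked differential with the evidence for and against each entry.
"""

def build_system_prompt(base_role: str, skills: list[str]) -> str:
    """Compose the agent's system prompt from its base role plus loaded skills."""
    return base_role + "\n\n" + "\n\n".join(skills)

system_prompt = build_system_prompt(
    "You are a careful biomedical research assistant.",
    [DIAGNOSTIC_SKILL],
)
```

The point of this pattern is that the expertise lives in an editable, reviewable artifact: clinicians can read and correct the skill directly, without touching any model weights or pipeline code.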
MozzareLLM: Decoding the Language of Gene Knockouts
While Biomni represents a general-purpose platform approach, other research groups are building more specialized AI systems that target specific bottlenecks in their own workflows. One particularly elegant example comes from the Whitehead Institute at MIT.
Iain Cheeseman's laboratory specializes in a technique called optical pooled screening, which leverages the revolutionary gene-editing tool CRISPR. Since CRISPR emerged around 2012, scientists have been able to precisely and efficiently "knock out" specific genes, essentially disabling them to see what breaks. Cheeseman's team has pushed this to an impressive scale: they systematically knock out thousands of different genes across tens of millions of human cells, then capture detailed microscopy images of each cell to document what changed.
The patterns in those images reveal something profound about genetic function. Genes that perform similar jobs in the cell tend to produce similar-looking damage when they're disabled. Software can detect these visual patterns and automatically group genes together into clusters, each cluster potentially representing a shared biological function. Cheeseman's lab built a sophisticated computational pipeline called Brieflow (yes, like the cheese) to handle exactly this analysis.
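As a toy illustration of that clustering step: each knocked-out gene is represented by a feature vector summarizing how its cells look, and genes with similar phenotypes are grouped together. Image feature extraction is assumed to have happened upstream, and the data, cluster count, and distance metric here are illustrative choices, not Brieflow's actual configuration.

```python
# Group genes whose knockout phenotypes look alike, given per-gene image embeddings.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
gene_names = [f"GENE_{i}" for i in range(300)]
features = rng.normal(size=(300, 64))  # stand-in for per-gene phenotype embeddings

X = StandardScaler().fit_transform(features)
labels = AgglomerativeClustering(
    n_clusters=25, metric="cosine", linkage="average"
).fit_predict(X)

clusters: dict[int, list[str]] = {}
for gene, label in zip(gene_names, labels):
    clusters.setdefault(int(label), []).append(gene)
# Each cluster is now a candidate shared-function gene set awaiting interpretation.
```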
But here's the bottleneck that technology alone couldn't solve: interpreting what those gene clusters actually mean. Why do certain genes group together? What biological process might they share? Is this a relationship scientists have documented before, or something entirely new?
The Limits of Human Expertise
For years, Cheeseman performed all this interpretation himself. He's accumulated remarkable expertise over his career; he estimates he can recall the function of approximately 5,000 genes off the top of his head, drawing on decades of reading, experimentation, and intuition. Even with this extraordinary knowledge base, analyzing data from a single large-scale screen still requires hundreds of hours of painstaking work.
The mathematics of the situation are sobering. A single screen can produce hundreds of gene clusters. The human genome contains roughly 20,000 protein-coding genes, each with complex functions that may vary by cell type, developmental stage, or disease state. The scientific literature describing what we know about these genes is scattered across millions of papers. No human, regardless of their brilliance or dedication, can comprehensively review all this information for every cluster that emerges from an experiment.
The result is scientific opportunity cost on a massive scale. Most gene clusters never get thoroughly investigated simply because labs don't have the time, the bandwidth, or the specialized knowledge required to pursue every lead. Potentially important discoveries languish in datasets, waiting for attention that may never come.
Building an AI Extension of Expert Cognition
Matteo Di Bernardo, a PhD student in Cheeseman's lab, set out to solve this problem. Working closely with Cheeseman to understand exactly how the senior scientist approaches cluster interpretation, Di Bernardo mapped out the expert's methodology: What data sources does he consult first? What patterns does he look for? What kinds of information make a finding seem interesting or dismiss it as likely noise?
They encoded this expertise into a Claude-powered system they named MozzareLLM (continuing the lab's cheese-themed naming convention). The system takes a cluster of genes and systematically performs the analysis an expert like Cheeseman would do: identifying what biological processes the genes might share, flagging which genes are well-understood versus poorly studied and thus potentially more interesting, evaluating the strength of evidence for different interpretations, and highlighting which findings might be worth following up with laboratory experiments.
Exceeding Human Performance
The results have exceeded expectations. Cheeseman reports that Claude consistently identifies things he missed in his own analyses:
"Every time I go through I'm like, I didn't notice that one! And in each case, these are discoveries that we can understand and verify."
This isn't AI hallucination or false pattern-matching. These are genuine biological insights that Cheeseman, with his deep expertise, can evaluate and confirm. The AI isn't replacing his judgment; it's extending his capacity to notice things that would otherwise slip past human attention.
What makes MozzareLLM particularly valuable in practice is its ability to communicate uncertainty. The system provides confidence levels for its findings, distinguishing between conclusions that are well-supported by multiple lines of evidence versus those that are more speculative. This calibrated uncertainty helps researchers make informed decisions about where to invest additional time and resources.
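A hedged sketch of this pattern: hand the model one gene cluster and ask for an expert-style interpretation as structured JSON, including an explicit confidence level. The prompt, field names, example genes, and model identifier are assumptions for illustration; the lab's actual prompts encode far more of Cheeseman's methodology.

```python
# Ask the model for a structured, confidence-rated interpretation of one cluster.
import json
import anthropic

client = anthropic.Anthropic()

# Example cluster: members of the m6A RNA modification pathway.
cluster = ["METTL3", "METTL14", "WTAP", "VIRMA", "YTHDF2"]

prompt = f"""You are assisting with interpretation of a CRISPR knockout screen.
These genes clustered together by knockout phenotype: {", ".join(cluster)}.
Respond with JSON only, using these keys:
  shared_process: the biological process the genes most plausibly share,
  confidence: one of "high", "medium", "low",
  evidence: brief justification citing what is known about each gene,
  understudied_genes: members with little published characterization,
  followup: one laboratory experiment that could test the interpretation."""

response = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model identifier
    max_tokens=800,
    messages=[{"role": "user", "content": prompt}],
)
# A production version would validate and repair the JSON before trusting it.
interpretation = json.loads(response.content[0].text)
print(interpretation["shared_process"], "|", interpretation["confidence"])
```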
During development, Di Bernardo tested multiple AI models on the same gene cluster interpretation tasks. Claude consistently outperformed alternatives. In one particularly striking case, Claude correctly identified an RNA modification pathway that other models dismissed as random noise, a finding that subsequent investigation confirmed as genuine.
Democratizing Scientific Discovery
The implications extend beyond Cheeseman's own research. The team envisions making these Claude-annotated datasets publicly available, creating a new kind of scientific resource. A mitochondrial biologist anywhere in the world could dive into mitochondrial-related gene clusters that Cheeseman's lab flagged as interesting but never had time to pursue. Researchers studying rare diseases could explore gene clusters potentially relevant to their conditions. The tacit knowledge encoded in MozzareLLM becomes a shared scientific resource.
As other laboratories adopt similar approaches for their own CRISPR experiments, this could accelerate the functional characterization of genes across the entire genome. Thousands of genes remain poorly understood despite decades of molecular biology research. AI-assisted interpretation could finally bring them into focus.
Lundberg Lab: Revolutionizing How Scientists Choose What to Study
The Cheeseman lab's bottleneck lies in interpretation, in making sense of data they've already generated. But for many research teams, the critical bottleneck comes much earlier in the process: deciding which genes to target in the first place.
The Lundberg Lab at Stanford University faces exactly this challenge. Unlike Cheeseman's optical pooled screening approach, which can examine thousands of genes in a single experiment, the Lundberg team runs smaller, more focused screens targeting specific biological questions. This is necessary for certain cell types and experimental conditions that don't work well with pooled approaches.
The economics of focused screening create difficult trade-offs. A single well-designed screen can cost upwards of $20,000, and costs scale with the number of genes targeted. Labs typically select a few hundred genes they believe are most likely involved in whatever biological process they're studying.
The Spreadsheet Method and Its Limitations
The conventional process for selecting target genes is remarkably low-tech given the sophistication of modern biology. A team of graduate students and postdocs gathers around a shared Google spreadsheet. Each researcher adds candidate genes one by one, accompanied by a sentence of justification or perhaps a link to a relevant paper. Sometimes the evidence is strong: "Gene X was shown to regulate this pathway in mouse studies." Other times it's more speculative: "Gene Y is expressed in the relevant cell type and might be involved."
It's fundamentally an educated guessing game, informed by literature reviews, personal expertise, and scientific intuition, but ultimately constrained by human bandwidth. There are only so many papers a team can read, only so many genes researchers happen to remember or stumble upon.
This approach carries a deeper limitation: it's inherently biased toward genes that have already been extensively studied. If a gene has never been investigated in a particular context, it won't appear in the literature, and it probably won't make it onto the spreadsheet. This creates a self-reinforcing cycle where well-studied genes get studied more, while potentially important genes remain overlooked.
A Fundamentally Different Approach
The Lundberg Lab is using Claude to flip this paradigm entirely. Instead of asking "What guesses can we make based on what researchers have already studied?", their system asks a more fundamental question: "What should be studied, based on molecular properties themselves?"
To enable this approach, the team constructed a comprehensive map of cellular molecular relationships. This knowledge graph includes every known class of molecule in the cell (proteins, RNA transcripts, DNA regulatory elements) and captures how they relate to each other. Which proteins physically bind together? Which genes code for which protein products? Which molecules are structurally similar and might therefore perform similar functions?
Armed with this molecular relationship map, Claude can identify candidate genes through biological reasoning rather than literature mining. Given a research target, such as genes that might regulate a particular cellular structure or process, Claude navigates the relationship network to identify candidates based on their molecular properties, interaction patterns, and structural features, regardless of whether anyone has previously studied them in that context.
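A miniature version of that reasoning, under stated assumptions: build a typed relationship graph, take a handful of "seed" genes already tied to the process of interest, and rank every other gene by how closely it connects to the seeds. The graph content below is fabricated for illustration (IFT88 and IFT52 are real intraflagellar transport genes involved in cilia, the GENE_* entries are placeholders); the lab's real graph spans proteins, transcripts, and regulatory elements at genome scale.

```python
# Rank candidate genes by graph proximity to seeds known to affect the process.
import networkx as nx

G = nx.Graph()
G.add_edge("IFT88", "IFT52", relation="binds")    # known cilia genes interact
G.add_edge("IFT52", "GENE_A", relation="binds")
G.add_edge("IFT88", "GENE_B", relation="structurally_similar")
G.add_edge("GENE_C", "GENE_D", relation="binds")  # unconnected to the seeds

seeds = {"IFT88", "IFT52"}  # genes already tied to cilium formation

def candidate_score(gene: str) -> float:
    """Score a gene by inverse shortest-path distance to the nearest seed."""
    best = min(
        (nx.shortest_path_length(G, gene, s)
         for s in seeds if nx.has_path(G, gene, s)),
        default=None,
    )
    return 0.0 if best is None else 1.0 / best

candidates = sorted(
    (g for g in G.nodes if g not in seeds),
    key=candidate_score,
    reverse=True,
)
print(candidates)  # GENE_A and GENE_B outrank the disconnected GENE_C / GENE_D
```

Crucially, GENE_A and GENE_B surface here purely through their molecular relationships, even if no paper has ever mentioned them alongside cilia, which is exactly the bias the literature-driven spreadsheet cannot escape.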
Putting the Approach to the Test
The Lundberg Lab is currently running a rigorous experiment to validate this methodology. To do so, they needed to select a biological topic where very little previous research existed. If they chose something well-studied, Claude might simply be recapitulating known findings it encountered during training, rather than demonstrating genuine molecular reasoning.
They settled on primary cilia, peculiar antenna-like appendages that protrude from most cell types in the human body. Despite their ubiquity, primary cilia remain poorly understood. We know they're implicated in various developmental disorders and neurological conditions, but the genetic networks controlling their formation and function are largely unmapped. This relative obscurity makes primary cilia an ideal test case.
The experimental design is elegant. First, the team will run a comprehensive whole-genome screen to establish ground truth: which genes actually affect cilia formation? This creates an unbiased answer key.
Then they compare prediction methods. Human experts use the traditional spreadsheet approach, drawing on their training, literature knowledge, and intuition to predict which genes will be involved. Claude uses the molecular relationship map to generate its own predictions through biological reasoning.
If Claude correctly identifies 150 out of 200 true hits while human experts catch only 80 out of 200, that's strong evidence the approach works better. Even if performance is roughly equal, Claude's speed advantage would still transform research efficiency.
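The comparison reduces to a recall calculation against the whole-genome ground truth, as in this small sketch (the hit sets mirror the hypothetical numbers above):

```python
# Recall of each prediction method against the ground-truth screen results.
def recall(predicted: set[str], true_hits: set[str]) -> float:
    return len(predicted & true_hits) / len(true_hits)

true_hits = {f"HIT_{i}" for i in range(200)}              # ground-truth positives
claude_preds = {f"HIT_{i}" for i in range(150)} | {"X1"}  # recovers 150 of 200
expert_preds = {f"HIT_{i}" for i in range(80)} | {"X2"}   # recovers 80 of 200

print(f"Claude recall:  {recall(claude_preds, true_hits):.2f}")  # 0.75
print(f"Experts recall: {recall(expert_preds, true_hits):.2f}")  # 0.40
```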
If validated, this methodology could become a standard first step in perturbation screening across biology. Rather than gambling on human intuition or resorting to expensive brute-force whole-genome approaches, labs could make informed, biology-driven predictions about which genes to target, achieving better results with smaller, more affordable experiments.
The Broader Transformation of Scientific Discovery
These examples from Anthropic's research partners are part of a much larger transformation sweeping through science worldwide.
According to Microsoft Research President Peter Lee, the role of AI in science is approaching a fundamental threshold: "In 2026, AI won't just summarize papers, answer questions and write reports — it will actively join the process of discovery in physics, chemistry and biology. AI will generate hypotheses, use tools and apps that control scientific experiments, and collaborate with both human and AI research colleagues."
The Rise of Multi-Agent Scientific Systems
Google has released AI co-scientist, a sophisticated multi-agent AI system designed specifically to help scientists generate novel research hypotheses. The system coordinates multiple specialized AI components: some focused on literature analysis, others on experimental design, still others on logical reasoning and hypothesis evaluation.
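As a schematic illustration of this coordination pattern (a generic sketch, not Google's actual architecture), a coordinator can loop a hypothesis-generating agent against a critiquing agent, keeping the strongest candidates each round:

```python
# Generic generate-and-critique loop underlying many multi-agent science systems.
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    text: str
    score: float = field(default=0.0)

def generator_agent(question: str, n: int = 3) -> list[Hypothesis]:
    """Stand-in for an LLM call that drafts candidate hypotheses."""
    return [Hypothesis(f"Candidate mechanism {i} for: {question}") for i in range(n)]

def critic_agent(h: Hypothesis) -> float:
    """Stand-in for an LLM call scoring plausibility and testability (0 to 1)."""
    return 0.5  # placeholder; a real critic would reason over the literature

def coordinate(question: str, rounds: int = 2) -> Hypothesis:
    pool = generator_agent(question)
    for _ in range(rounds):
        for h in pool:
            h.score = critic_agent(h)
        pool.sort(key=lambda h: h.score, reverse=True)
        pool = pool[:2] + generator_agent(question)  # keep the best, keep exploring
    return pool[0]

print(coordinate("What drives liver fibrosis progression?").text)
```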
Early results are remarkable. At Stanford, AI co-scientist helped researchers identify existing drugs that could potentially be repurposed to treat liver fibrosis, a finding that emerged from connecting patterns across disparate datasets that no individual researcher could have synthesized. At Imperial College London, researchers working on antimicrobial resistance, one of the most pressing public health challenges of our time, found that AI co-scientist produced the same core hypothesis in days that their human team had spent years developing through traditional research methods.
Compressing Research Cycles
OpenAI reports that their most advanced models are beginning to meaningfully accelerate research cycles. In documented laboratory experiments, AI assistance helped researchers design a new molecular cloning method that proved 79 times more efficient than standard approaches. The AI didn't just suggest optimizations; it reasoned through the molecular biology to identify novel modifications that human researchers hadn't considered.
The broader pattern is consistent: research cycles that once required days or weeks are being compressed to hours. Literature reviews that took weeks now take minutes. Data analyses that required specialized consultants can be performed directly by research teams.
Massive Investment Signals Confidence
The scale of investment flowing into AI for science reflects the perceived magnitude of this transformation. The U.S. federal government invested $3.3 billion in non-defense AI research and development in fiscal year 2025 alone. Private sector investments have been even more dramatic, exceeding $109 billion in 2024.
These aren't speculative bets on distant possibilities. Major pharmaceutical companies like Novo Nordisk are already deploying AI at scale. The company reports using Claude to reduce clinical study documentation from over 10 weeks to just 10 minutes, a transformation in efficiency that ripples through every aspect of drug development timelines.
Implications for the Future of Discovery
The patterns emerging from these early implementations suggest several profound implications for how science will be conducted in the coming years.
The Democratization of Specialized Expertise
Systems like MozzareLLM represent something genuinely new: the encoding of expert tacit knowledge into shareable, scalable tools. A mitochondrial biologist in Brazil can now access the analytical patterns that Cheeseman developed over decades at MIT. A graduate student just beginning their research career can leverage interpretive frameworks that previously existed only in the minds of senior scientists.
This democratization doesn't replace expertise; it amplifies it. The most knowledgeable researchers can now extend their impact far beyond what they could accomplish personally, while researchers with less experience gain access to analytical sophistication that would otherwise take years to develop.
From Literature-Driven to Biology-Driven Research
The Lundberg Lab's approach represents a subtle but potentially revolutionary shift in scientific methodology. Traditional hypothesis generation starts with what humans have already published: we study what we've studied. This creates powerful path dependencies and self-reinforcing biases in scientific attention.
Biology-driven hypothesis generation starts instead with molecular reality: what does the network of cellular relationships suggest might be important? This approach can identify research targets that have been overlooked not because they're unimportant, but simply because historical accidents of scientific attention never brought them into focus.
The Compression of Time
Perhaps the most consistent finding across all these examples is the dramatic compression of research timelines. Analyses that took months now take minutes. Interpretations that required weeks of expert attention can be generated in hours. Projects that demanded large collaborative teams can be accomplished by small groups or even individuals.
This time compression has cascading effects. Researchers can pursue more hypotheses, explore more alternatives, and iterate more rapidly on experimental designs. The limiting factor in science increasingly becomes not analytical capacity but creative imagination: the ability to ask the right questions.
New Research Paradigms Become Possible
Perhaps most importantly, AI isn't merely accelerating existing research approaches; it's enabling entirely different ones. When analysis becomes essentially free, researchers can explore hypotheses they would never have considered before. When literature review becomes instantaneous, connections across distant fields become visible. When expert interpretation scales, comprehensive characterization projects become feasible.
We may be witnessing the emergence of entirely new scientific methodologies, approaches to discovery that were simply impossible before AI assistance became sophisticated enough to serve as a genuine intellectual partner.
The Road Ahead
None of the systems described here are perfect, and the researchers building them are appropriately humble about current limitations.
Biomni includes extensive guardrails to detect when Claude might be generating unreliable outputs. MozzareLLM's conclusions still require expert validation before they inform laboratory decisions. The Lundberg Lab's biology-driven hypothesis generation remains under active experimental evaluation. Every team emphasizes that human judgment remains essential at critical decision points.
But the trajectory is unmistakable. As AI capabilities continue to advance, the tools available to scientists grow more powerful in concert. Where earlier models managed only basic code generation and simple literature summaries, current systems can replicate sophisticated analytical work that previously required years of specialized training.
The researchers building these systems share a consistent observation: each new model release brings noticeable improvements in capability. Tasks that were impossible with last year's models become routine with current ones. What will be possible with next year's models is genuinely difficult to predict.
What seems certain is that AI has crossed a threshold. It's no longer merely a tool that scientists use occasionally for specific tasks. It's becoming a collaborator that fundamentally reshapes how scientific discovery happens, a partner in the oldest human endeavor: the pursuit of understanding.
Getting Started with AI in Research
For researchers interested in exploring AI-powered approaches to scientific work, several resources are now available:
- Anthropic's AI for Science program accepts applications from researchers worldwide working on high-impact scientific projects, providing API credits and support
- Biomni's web platform offers an accessible interface for biomedical research tasks, with documentation and tutorials
- Claude for Life Sciences provides specialized connectors to research platforms including Benchling, BioRender, PubMed, Scholar Gateway by Wiley, and Synapse.org
The transformation of scientific research through AI is accelerating. The examples in this article represent just the beginning of what's becoming possible when human creativity, domain expertise, and tireless curiosity combine with artificial intelligence in the pursuit of discovery.
The next breakthrough in medicine, materials science, or fundamental biology may well emerge from this new kind of collaboration, human insight amplified by artificial intelligence, working together to push the boundaries of what we know and what we can achieve.
For more on how AI agents are transforming various industries, read our comprehensive guide on What Are AI Agents or explore how The Agentic Era Is Reshaping Business and Commerce.