Your AI Can Design a Molecule. It Can't Formulate a Drug.
In 2024, AI drug discovery companies raised $3.8 billion in venture capital. Over 530 companies are focused on molecule design. Almost none address the question that actually kills drugs: how do you turn a molecule into something a patient can take? AI-discovered compounds show 80-90% Phase I success rates but only ~40% Phase II success, indistinguishable from the industry baseline. The formulation gap is where drugs die, and the industry has barely begun to close it.
The $3.8 Billion Blind Spot
In 2024, artificial intelligence drug discovery companies raised $3.8 billion in venture capital. Isomorphic Labs, spun out from Google DeepMind by Nobel Prize winners Demis Hassabis and John Jumper, closed a $600 million Series A, the largest biotech funding round of 2024. Xaira Therapeutics launched with $1 billion in committed capital. Chai Discovery, backed by OpenAI, pushed its valuation to $1.3 billion. Entering 2026, over 530 companies worldwide are focused on AI-powered drug discovery, carrying a nearly 100% valuation premium over broader biopharma.
All of that capital chases the same question: What molecule should we make?
Almost none of it addresses the question that kills drugs in development: How do we turn that molecule into something a patient can take?
The pharmaceutical industry's ~90% clinical failure rate has not improved despite a decade of AI investment. AI-discovered compounds show progression rates similar to traditionally discovered compounds once they enter the clinic. From 2012 to 2024, platform partnerships between AI drug discovery companies and pharmaceutical giants have repeatedly stalled at Phase II. Insilico Medicine's Phase IIa results for rentosertib "fell short on statistically significant efficacy." Recursion's first clinical trial showed "no reportable efficacy."
The formulation is the bottleneck, and the industry is only now starting to confront it.
AI Drug Discovery Investment vs. Reality
The AI Discovery Hype Machine
The current wave of computer-aided and computational drug design platforms has produced real capabilities. AlphaFold 3, the leading system for AI protein folding and structure prediction, can predict the structure and interactions of virtually all biomolecules with at least 50% better accuracy than existing methods. For protein-ligand binding specifically, accuracy doubles compared to previous approaches. Structures that once required months of experimental determination now arrive in hours.
Molecular docking software has matured into a capable industrial toolkit. Schrödinger Glide achieves 90% pose prediction accuracy and processes a compound in approximately 10 seconds, enabling high-throughput virtual screening. AutoDock Vina remains the most widely used free docking platform. Discovery Studio, Surflex, FlexX, DOCK, and MOE-Dock each handle specialized functions within the in silico drug design workflow. The infrastructure for AI drug discovery runs deep.
Generative molecular design has moved from concept to clinic. Eight leading AI drug discovery companies had 31 drugs in human clinical trials as of early 2024, a number that has since grown. The global market for AI in pharmaceuticals reached an estimated $1.94 billion in 2025 and is projected to hit $16.49 billion by 2034. As of this year, 75% of pharmaceutical companies have made generative AI a strategic priority.
The numbers get uncomfortable after Phase I. AI-discovered molecules show an 80-90% success rate in Phase I, substantially higher than historic averages. In Phase II, the success rate drops to approximately 40%, indistinguishable from the industry baseline. Multiple AI-designed drugs were deprioritized, shelved, or showed no efficacy signal in 2025 trials.
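The arithmetic behind that drop can be sketched in a few lines of Python. This is only an illustration using the Phase I and Phase II figures quoted above; the function name `cumulative_success` is hypothetical, and treating the phases as independent multiplicative steps is a simplifying assumption.

```python
def cumulative_success(phase_rates):
    """Multiply per-phase success rates to get the chance of clearing every phase."""
    result = 1.0
    for rate in phase_rates:
        result *= rate
    return result


# Figures quoted in the text: Phase I 80-90%, Phase II roughly 40%.
low = cumulative_success([0.80, 0.40])
high = cumulative_success([0.90, 0.40])
print(f"Chance of clearing Phase I and Phase II: {low:.0%} to {high:.0%}")
```

Under these assumptions, even a very strong Phase I rate leaves roughly a third of candidates standing after Phase II, which is why the Phase II figure dominates the outcome.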
"The AI drug revolution needs a revolution."
Nature's npj Drug Discovery, 2025
Molecular design, regardless of sophistication, is insufficient on its own. AI can compress early discovery timelines by 30-40%. It cannot compress the formulation, manufacturing, and regulatory phases that constitute the majority of the drug development process. AlphaFold still struggles with protein-ligand complexes involving significant conformational changes (>5 angstrom RMSD), but even if that limitation vanished, the formulation gap would persist.
What AI Drug Discovery Does Not Address
Current AI-powered drug discovery platforms do not predict or optimize: how a molecule will crystallize, whether it will dissolve in the GI tract, what pharmaceutical excipients it needs for stability, how it will behave during tableting or lyophilization, whether a formulation will scale from lab bench to commercial manufacturing, or long-term stability under real-world storage conditions. These are the reasons drugs die.
Formulation development remains an empirical discipline running on trial and error. With 3.6 million possible formulation combinations for a single compound, the design space is too large for brute-force experimentation. The $3.8 billion flowing into AI has done nothing to change that.
Where Drugs Actually Fail
Roughly 90% of drug candidates that enter clinical studies fail. The average cost to develop a single new drug reached $2.23 billion in Deloitte's most recent analysis of the 20 largest pharma R&D budgets. The entire drug development process from ideation to market typically takes 10 to 15 years.
Source: Deloitte R&D Returns Analysis
Analyses of clinical trial data from 2010 to 2017 attribute failures to four categories:
| Cause of Failure | Share | Notes |
|---|---|---|
| Lack of clinical efficacy | 40-50% | Often a formulation failure in disguise |
| Unmanageable toxicity | 30% | Dose-dependent; formulation affects exposure |
| Poor drug-like properties | 10-15% | Solubility, drug metabolism, pharmacokinetics, chemical instability |
| Lack of commercial need | 10% | Strategic miscalculation |
The "poor drug-like properties" category deserves a closer look. It currently accounts for 10-15% of failures, down from 30-40% in the 1990s. That apparent improvement is misleading. The reduction came because the drug discovery process got better at killing poor-property molecules earlier in the pipeline, not because the underlying problem was solved. Many efficacy failures are actually formulation failures in disguise: a drug that shows poor bioavailability due to formulation issues will appear to lack efficacy. The molecule may be perfectly effective if properly formulated through bioavailability enhancement techniques. The failure data conflates the two.
The Solubility Crisis
| BCS Class | Solubility | Permeability | % of Drug Candidates |
|---|---|---|---|
| Class I | High | High | ~34% |
| Class II | Low | High | ~17% |
| Class III | High | Low | ~39% |
| Class IV | Low | Low | ~10% |
Up to 70-90% of drug candidates in the development pipeline are poorly soluble (BCS Class II or IV). Roughly 40% of newly discovered chemical entities fail to reach the market specifically due to poor water solubility. Poor solubility is the central challenge of modern drug formulation.
A drug substance is classified as "highly soluble" only if the highest single therapeutic dose is completely soluble in 250 mL or less of aqueous media across the pH range of 1.2-6.8 at 37 °C. For drug absorption to occur, the molecule must dissolve. Without dissolution, there is no absorption. Without absorption, there is no efficacy, and $2.23 billion disappears.
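The 250 mL criterion reduces to a simple ratio: the highest dose divided by the lowest solubility measured across the pH range. The sketch below shows that check in Python and maps the result onto the four BCS classes. The `Candidate` type, the function names, the boolean permeability flag, and both example compounds with their solubility values are hypothetical, chosen only to illustrate the rule.

```python
from dataclasses import dataclass

HIGH_SOLUBILITY_VOLUME_ML = 250.0  # threshold stated in the text


@dataclass
class Candidate:
    name: str
    highest_dose_mg: float
    # Hypothetical solubility measurements (mg/mL) at 37 °C, keyed by pH (1.2-6.8)
    solubility_mg_per_ml_by_ph: dict
    high_permeability: bool


def dose_volume_ml(c):
    """Volume of medium needed to dissolve the highest dose at the worst-case pH."""
    worst = min(c.solubility_mg_per_ml_by_ph.values())
    if worst <= 0:
        return float("inf")
    return c.highest_dose_mg / worst


def bcs_class(c):
    """Map solubility and permeability onto BCS Class I-IV."""
    high_solubility = dose_volume_ml(c) <= HIGH_SOLUBILITY_VOLUME_ML
    if high_solubility:
        return "I" if c.high_permeability else "III"
    return "II" if c.high_permeability else "IV"


# Illustrative values only, not measurements of any real compound.
example_a = Candidate("example-a", 200.0, {1.2: 5.0, 4.5: 0.4, 6.8: 0.1}, True)
example_b = Candidate("example-b", 100.0, {1.2: 2.0, 4.5: 3.0, 6.8: 2.5}, False)

for c in (example_a, example_b):
    print(c.name, f"{dose_volume_ml(c):.0f} mL", "Class", bcs_class(c))
```

Using the minimum solubility across the pH range reflects the "across the pH range" wording: a compound that dissolves well in stomach acid but poorly at intestinal pH still counts as low solubility.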
The Ritonavir Disaster
Case Study: Ritonavir ($900M Formulation Failure)
In 1998, Abbott's HIV protease inhibitor ritonavir became one of the most instructive pharmaceutical formulation development failures in history. The drug had been on the market for two years when lots of the oral capsule formulation began failing dissolution testing.
The cause: a previously unknown crystalline polymorph (Form II) that spontaneously appeared in manufacturing. Form II had drastically lower solubility, compromising bioavailability entirely. Failure rates in affected lots reached up to 50%.
Ritonavir the molecule was a perfectly effective HIV protease inhibitor. The catastrophe was entirely a pharmaceutical chemistry and solid-state problem. No molecular docking software or AI protein folding system would have predicted it. Abbott reformulated at a cost of approximately $900 million.
Nifedipine tells a similar story: low oral bioavailability from poor aqueous solubility, decreasing dissolution rates during storage, 4 polymorphs, a dihydrate, multiple solvates, and an amorphous phase. The amorphous phase degrades at approximately 1.8x the rate of crystalline phases. The formulation solution took years to develop.
The Excipient Problem
Pharmaceutical excipients are not inert bystanders. Drug-excipient compatibility studies are among the most labor-intensive steps in pre-formulation, requiring binary and multi-component physical mixtures, stability testing at accelerated conditions for 1 to 3 months, and analysis using DSC, isothermal microcalorimetry, HSM, SEM, FT-IR, solid-state NMR, and PXRD. Three types of incompatibility must be assessed: physical, chemical, and therapeutic. Results from binary mixtures can differ completely from multi-component systems.
The traditional process takes 3-6 months. Novel approaches can reduce this to 1-2 weeks, but most companies still rely on empirical trial-and-error. The excipient intelligence gap is immense: the global pharmaceutical excipients market is valued at $9.63 billion and projected to reach $15.12 billion by 2032, yet AI penetration in excipient selection remains minimal.
Drug Stability Testing: The Years-Long Lock-In
Under ICH Q1A(R2) guidelines, drug stability testing requires at least three primary batches tested every 3 months during the first year, every 6 months during the second year, and annually thereafter, across long-term, intermediate, and accelerated storage conditions. A company cannot file for regulatory approval without a minimum of 12 months of long-term stability data. A change in excipient, process, or container closure can require restarting the entire stability program.
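The testing frequency described above can be written out as a list of pull points. This Python sketch is illustrative only: the function name `stability_pull_points` is hypothetical, month 0 is included on the assumption that an initial time point is tested, and the schedule covers long-term storage only, not the intermediate or accelerated conditions.

```python
def stability_pull_points(total_months):
    """Long-term testing time points in months.

    Every 3 months in year one, every 6 months in year two,
    and every 12 months after that, as described in the text.
    """
    points = list(range(0, min(total_months, 12) + 1, 3))
    points += list(range(18, min(total_months, 24) + 1, 6))
    points += list(range(36, total_months + 1, 12))
    return points


# A 36-month program for one batch; three primary batches repeat this schedule.
schedule = stability_pull_points(36)
print(schedule)
print("Pull points across three batches:", 3 * len(schedule))
```

Every one of those pull points is a calendar commitment, which is why restarting the program after a late formulation change is so costly.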
Early formulation decisions lock companies into years of commitments. The cost of getting it wrong is financial and temporal. In the drug development stages, time is the resource that cannot be recovered.
Quality by Design: The Regulatory Expectation
Quality by Design (QbD) principles, codified in ICH Q8, Q9, Q10, and Q11, require defining a Quality Target Product Profile, identifying Critical Quality Attributes, establishing a design space, and implementing control strategies. Implementing QbD demands running dozens to hundreds of formulation experiments to map the design space. Multi-objective optimization across solubility, stability, manufacturability, and bioavailability is where drug formulation gets genuinely hard. The formulation scientist's dilemma is that human cognition can hold four variables at once, while the design space demands dozens.
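One common way to frame multi-objective optimization is a Pareto filter: keep only the candidates that no other candidate beats on every objective at once. The Python sketch below is a generic illustration of that idea and not a method taken from the ICH guidelines. The formulation names and their scores (solubility, stability, manufacturability, higher is better) are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return names of candidates that no other candidate dominates."""
    return sorted(
        name
        for name, scores in candidates.items()
        if not any(dominates(other, scores) for other in candidates.values())
    )


# Hypothetical scores: (solubility, stability, manufacturability)
formulations = {
    "F1": (0.9, 0.4, 0.7),
    "F2": (0.6, 0.8, 0.6),
    "F3": (0.5, 0.7, 0.5),
    "F4": (0.8, 0.3, 0.6),
}

print(pareto_front(formulations))
```

With three objectives and four candidates the trade-offs can still be read by eye. With dozens of variables they cannot, which is the dilemma described above.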
Eroom's Law
The inflation-adjusted cost of developing a new drug roughly doubles every nine years. The number of new drugs approved per billion dollars spent on R&D has halved roughly every 9 years since 1950, falling around 80-fold in inflation-adjusted terms. AI drug discovery proponents claim they can break Eroom's Law. But if AI only accelerates the molecular design phase and leaves pharmaceutical formulation development untouched, the overall timeline will not meaningfully compress.
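The two figures in that paragraph, a nine-year halving time and an 80-fold decline, can be checked against each other with simple exponent arithmetic. This Python sketch uses only the numbers quoted above; the function names are hypothetical.

```python
import math

HALVING_TIME_YEARS = 9.0  # figure quoted in the text


def fold_decline(years, halving_time=HALVING_TIME_YEARS):
    """Fold reduction in drugs approved per R&D dollar after a number of years."""
    return 2.0 ** (years / halving_time)


def years_for_fold(fold, halving_time=HALVING_TIME_YEARS):
    """Years of steady halving needed to reach a given fold reduction."""
    return halving_time * math.log2(fold)


print(f"An 80-fold decline takes about {years_for_fold(80):.0f} years of halving")
print(f"60 years of halving gives about a {fold_decline(60):.0f}-fold decline")
```

An 80-fold decline corresponds to a little under six decades of halving every nine years, which is consistent with a trend that starts in 1950.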
The Industry Wake-Up Call
In January 2025, the FDA published its first-ever draft guidance specifically addressing AI across the drug development lifecycle. It was the product of a December 2022 expert workshop, over 800 comments from external parties, CDER's experience reviewing over 500 submissions with AI components from 2016 to 2023, and a public workshop held August 6, 2024. The guidance introduces a risk-based credibility assessment framework for AI models used in manufacturing and quality control, not just discovery.
The EMA and FDA subsequently issued ten joint guiding principles for AI practice across the entire medicine lifecycle, from early pharmaceutical research to clinical trials to manufacturing to safety monitoring. This international alignment on AI principles signals that regulators expect AI to extend beyond discovery into formulation, manufacturing, and quality assurance. AI tools for research that touch only the molecular design phase will not satisfy the regulatory trajectory.
The CDER AI Council, established in 2024, provides oversight of AI activities across the agency. The direction is clear: AI in medicine and AI in research must encompass the full product lifecycle, not just molecule design.
Clinical Failures Expose Discovery-Only AI
The AI drug development track record entering 2026 is sobering. AI has not demonstrably improved the ~90% clinical failure rate. AI-discovered compounds show progression rates similar to traditionally discovered compounds once they enter the clinic. Multiple AI programs were deprioritized or shelved in 2025.
AI can compress early discovery timelines and reduce preclinical candidate development to 13-18 months. But clinical trial duration, regulatory review, and manufacturing scale-up remain bound by biology, patient enrollment, and regulatory requirements. These are formulation and process problems.
The Shift from Discovery-Only to Full-Lifecycle AI
Pharmaceutical industry trends are beginning to shift. A 2024 paper in the Journal of Controlled Release argued that "drugs need to be formulated with scale-up in mind," calling out the systemic failure of preliminary pharmaceutical research to consider manufacturing challenges. A 2025 paper in the Journal of Pharmaceutical Innovation described "AI-Driven Drug Formulation Development" as an emerging paradigm. Emerging tools include ExPreSo (a machine learning algorithm for excipient prediction), FormulationDT (a data-driven AI platform for rational formulation strategy design), and Merck's AI tool for predicting compatible co-formers for co-crystallization.
But the state of artificial intelligence in pharma for formulation work remains nascent:
- 530+ companies: focused on AI drug discovery
- $3.8 billion: in annual VC funding for discovery AI
- A handful: of emerging tools for AI-assisted formulation
- Minimal VC funding: specifically for formulation AI
AI investment in the pharmaceutical industry is almost entirely tilted toward finding molecules, while the science of turning those molecules into medicines remains largely manual.
McKinsey Biopharma Executive Survey (Latest)
Actual AI deployment in formulation and manufacturing lags far behind discovery. Pharmaceutical technology investment follows the hype, not the need.
The FDA's January 2025 guidance on advanced manufacturing explicitly supports real-time quality monitoring, Process Analytical Technology, and continuous manufacturing systems. The AI needed to build these systems is fundamentally different from the AI chemistry tools that optimize molecular structures. It requires domain-specific training data, pharmaceutical-grade validation, and regulatory defensibility, none of which transfer from discovery AI platforms.
The gap between computational ambition and formulation reality remains the industry's biggest blind spot. The digital twin disconnect is a symptom: 79% of pharma companies virtualize their equipment, while fewer than 17% virtualize their formulations. The AI in life sciences community is slowly waking up, but capital allocation has not caught up.
Manufacturing Reality
A formulation that works perfectly in a 100-gram lab batch can fail at 100-kilogram commercial scale. Approximately 1% of process operations lead to a scale-up incident, with:
- 30%: occurring during reaction operations
- 30%: involving crystallization (slower or unwanted nucleation)
- 40%: related to other work-up operations
Severity ranges from simple deviations to entirely unusable product. Mixing that is uniform in a 1-liter beaker becomes stratified in a 10,000-liter vessel. Heat transfer that is instantaneous at small scale becomes a rate-limiting factor at production scale. A powder does not flow the same on R&D equipment as on production-scale tablet presses, directly impacting weight uniformity and leading to tablet rejection.
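The heat transfer point follows from geometry. For vessels of the same shape, volume grows with the cube of the linear dimension while surface area grows with the square, so the surface area available per unit volume shrinks as the batch gets larger. The Python sketch below works through the 1-liter to 10,000-liter example. The function name is hypothetical, and geometric similarity between the two vessels is an assumption that real equipment only approximates.

```python
def surface_to_volume_drop(small_volume_l, large_volume_l):
    """Factor by which surface area per unit volume falls on scale-up.

    Assumes geometrically similar vessels: area ~ L^2 and volume ~ L^3,
    so area/volume ~ volume^(-1/3).
    """
    return (large_volume_l / small_volume_l) ** (1.0 / 3.0)


factor = surface_to_volume_drop(1.0, 10_000.0)
print(f"Surface area per liter falls by a factor of about {factor:.1f}")
```

A 10,000-fold increase in volume leaves each liter with roughly one twentieth of the wall area it had in the beaker, so heat that left quickly at small scale now has far less surface to leave through.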
Beyond a certain stage in drug product development, it is often impossible to make major changes in formulation or process, as such changes may require repeating clinical and stability studies. Early formulation decisions, made with limited data, can lock companies into suboptimal pharmaceutical manufacturing processes for the entire product lifecycle.
Drug Delivery Systems Add Complexity
Advanced drug delivery systems compound these challenges. Liposomal and lipid nanoparticle systems require integration of multiple components into a single nanosized carrier, creating problems for large-scale cGMP production. Specific challenges include sterilization (standard filtration causes membrane clogging and reduced liposome integrity), scale-up reproducibility, quality control, and cost. The FDA has invested in continuous manufacturing research for liposomes and lipid nanoparticles, reflecting the difficulty of these systems. The production technology is evolving toward continuous manufacturing, but it requires AI that understands pharmaceutical engineering, not just molecular design.
Pharmaceutical Process Validation: The Regulatory Burden
Pharmaceutical process validation under FDA 21 CFR Parts 210 and 211 requires written procedures and documented batch records for every manufacturing step, validated processes that consistently produce products meeting predefined quality attributes, trained personnel, environmental controls, equipment calibration, and in-process controls to prevent contamination. A single formulation change, excipient substitution, or process parameter adjustment can trigger revalidation, potentially costing months of effort and millions of dollars.
The pharmaceutical analysis requirements are extensive: physical, chemical, biological, microbiological, preservative content, and functionality testing, all documented with complete traceability. In January 2025, the FDA issued new guidance further emphasizing real-time quality monitoring and the integration of analytical tools into manufacturing workflows.
The Computation-to-Manufacturing Gap
The disconnect between AI drug discovery and real-world pharmaceutical product development is starkest at the manufacturing stage.
AI Can
Predict that a molecule will bind to a target. Optimize a pharmacokinetic profile. Screen millions of virtual compounds in hours. Suggest an optimal molecular structure.
AI Cannot
Predict unexpected polymorphs during spray drying. Ensure uniform dosage units at 200,000 tablets per hour. Compress the 12-month minimum stability program. Account for an excipient supplier changing their manufacturing process.
Pharmaceutical engineering at the manufacturing level deals with variables that are physical, empirical, and scale-dependent. They emerge from interactions between materials, equipment, and environmental conditions that resist purely computational prediction. The polymer selection paradox, the cocrystal screening dilemma, and the viscosity ceiling for biologics are all examples of this daily reality in pharmaceutical product development.
DeepC Builds the Bridge
DeepC occupies a different position in the AI pharmaceutical market. While the $3.8 billion flowing into AI drug discovery chases molecules, DeepC focuses on what happens after a molecule is found: turning it into a deliverable, stable, manufacturable drug product.
DeepC is an AI co-scientist for formulation scientists, purpose-built for the formulation development lifecycle where drugs actually die.
Specialized AI Agents
DeepC deploys purpose-built AI-powered research tools, each engineered for specific formulation science tasks:
- Formulation Agent: Assists with excipient selection, compatibility assessment, and formulation design. Uses pharmaceutical machine learning models trained on real-world formulation data to predict interactions and suggest strategies, reducing the traditional 3-6 month empirical screening process to a prioritized shortlist.
- Research Agent: Mines and synthesizes pharmaceutical literature, regulatory databases, and clinical evidence. Converts labor-intensive literature review into a structured, queryable knowledge base.
- Analytics Agent: Provides pharmaceutical data analytics for interpreting experimental results, identifying trends in stability data, and performing statistical analysis aligned with ICH guidelines and QbD principles.
Purpose-Built Research Tools
SMILES2SPEC
Computational chemistry software that predicts LC-MS/MS spectra from molecular structures, letting formulation scientists anticipate analytical challenges before running expensive experiments.
MiCQ
ML-powered Critical Quality Attribute imputation. Uses KNN, MICE, MissForest, and CatBoost to address incomplete CQA datasets, turning a pervasive formulation development blocker into a solvable problem.
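To show what imputing a missing CQA value involves, here is a minimal pure-Python sketch of k-nearest-neighbour imputation, the simplest of the four methods named above. It is a generic illustration and not MiCQ's implementation. The function name `knn_impute`, the column meanings, and the batch values are all hypothetical, and the features are left unscaled for brevity where a real pipeline would normally standardize them first.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance between two rows uses only the columns observed in both.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            neighbours = []
            for other_index, other in enumerate(rows):
                if other_index == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                neighbours.append((dist, other[j]))
            neighbours.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in neighbours[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical batches: (dissolution %, hardness, assay %); one hardness is missing.
batches = [
    [85.0, 10.0, 99.0],
    [84.0, 10.5, 98.5],
    [60.0, 14.0, 97.0],
    [86.0, None, 99.2],
]
completed = knn_impute(batches, k=2)
print(completed[3])
```

The missing hardness value is filled from the two batches whose other attributes look most similar, rather than from the overall column average, which would be pulled toward the outlying third batch.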
Elute
Extracts structured data from over 50 document formats, including CMC documents, batch records, stability reports, analytical certificates, and regulatory filings.
Grounded in Authoritative Data
| Data Source | Coverage |
|---|---|
| FDA Inactive Ingredient Database | All FDA-approved excipients with maximum potency levels |
| FAERS | Individual case safety reports for excipient-related signal detection |
| DailyMed | 154,834+ drug labeling records with complete formulation data |
| ClinicalTrials.gov | Comprehensive clinical study registry |
| PubMed | Biomedical literature for evidence-based formulation decisions |
Quality by Computational Design
DeepC extends traditional QbD into what it calls Quality by Computational Design (QbCD):
- Predictive CQA identification using ML models trained on pharmaceutical manufacturing data
- Virtual design space exploration that supplements physical DoE experiments
- Risk-based formulation screening that prioritizes experiments most likely to define design space boundaries
- Audit trails and data provenance built into every AI-assisted decision
This approach aligns with the FDA's January 2025 draft guidance on AI credibility assessment, which requires documented validation, transparent methodology, and traceable data lineage for AI models used in regulatory decisions. The AI validation burden is real, and DeepC was designed for it. The commercial urgency is just as pressing: with $300 billion in revenue facing patent expiration, companies need reformulation capabilities that are fast, data-driven, and defensible.
The Market Asymmetry
530+ companies focused on AI drug discovery. $3.8 billion in annual VC funding for discovery AI. A handful of tools for AI-assisted formulation.
DeepC does not compete with Isomorphic Labs, Recursion, or Insilico Medicine for the molecule design market. It builds the platform their molecules will need when they leave the computational domain and enter the physical world of formulation, manufacturing, and regulatory submission.
The data to power this work exists. It has never been assembled for formulation scientists.
The Bottom Line
The pharmaceutical industry has spent $3.8 billion per year teaching AI to find molecules. It has spent almost nothing teaching AI to formulate drugs. The clinical failure data is unambiguous: drugs fail because the formulation was wrong, the stability was wrong, the manufacturing process was wrong, the excipient interaction was missed.
AI-designed molecules show 80-90% Phase I success rates and 40% Phase II success rates, no better than traditional discovery. The gap between Phase I and Phase II is where formulation science lives. Solubility, stability, bioavailability, and manufacturability determine whether a molecule becomes a medicine or a write-off.
Eroom's Law will not break by accelerating one phase of drug development while leaving the rest untouched. The $2.23 billion average cost per new drug will not decline if AI compresses discovery from four years to eighteen months but formulation development, stability testing, and manufacturing scale-up consume the same decade they always have.
The molecule is the beginning. The drug product is the end. DeepC builds the bridge between them.

