Andrew White | FutureHouse
Automating Scientific Discovery with Open Data
CZI Open Science
October 2025
Science is changing independently of AI
arXiv.org; doi:10.6084/m9.figshare.17064419.v3
The number of researchers is growing
International R&D spending
PhD Researchers
NSF - https://ncses.nsf.gov/pubs/nsf24332; UNESCO UISI SDG9
Intellectual bottlenecks are growing
📝 Increasing paper count ($\approx$10M per year)
🧬 Larger data sets from cheaper experiments (genome at $200 per person, $1 / GB of sequencing)
🔍 95% decline in disruptive papers since 1980
Park, M. et al. Nature 613, 138–144 (2023); Scannell, J.W. et al. Nat. Rev. Drug Discov. 11, 191–200 (2012); Deloitte 2025: Pharma innovation returns.
Accelerate Scientific Discovery
What is an agent?
Agent: trained, makes decisions
Environment: untrained, has tools, state
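The split above can be sketched in a few lines of Python. This is a minimal illustration, not FutureHouse's implementation: the class names, the `decide`/`step` methods, and the two toy tools are all hypothetical, and the "agent" here is a fixed rule standing in for what would be a trained model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Environment:
    """Untrained side: owns the tools and the mutable state (hypothetical)."""
    tools: Dict[str, Callable[[str], str]]
    state: List[str] = field(default_factory=list)

    def step(self, tool_name: str, argument: str) -> str:
        # Run the chosen tool and record its observation in the state.
        observation = self.tools[tool_name](argument)
        self.state.append(observation)
        return observation


class Agent:
    """Decision-making side. A real agent would be a trained model;
    this stand-in uses a fixed rule so the sketch runs on its own."""

    def decide(self, question: str, state: List[str]) -> Tuple[str, str]:
        # Search first, then follow citations, then stop.
        if not state:
            return ("search", question)
        if len(state) == 1:
            return ("cite", state[-1])
        return ("stop", "")


def run_episode(agent: Agent, env: Environment, question: str,
                max_steps: int = 5) -> List[str]:
    # The agent picks an action; the environment executes it with a tool.
    for _ in range(max_steps):
        tool_name, argument = agent.decide(question, env.state)
        if tool_name == "stop":
            break
        env.step(tool_name, argument)
    return env.state


def toy_search(query: str) -> str:
    # Placeholder for a literature search tool.
    return f"paper about {query}"


def toy_cite(paper: str) -> str:
    # Placeholder for a citation traversal tool.
    return f"works citing [{paper}]"


env = Environment(tools={"search": toy_search, "cite": toy_cite})
trace = run_episode(Agent(), env, "IL-10 solubility")
print(trace)
```

All learning would live inside `Agent.decide`; the environment only exposes tools and accumulates state, so the same agent loop could be pointed at a different tool set without changing `run_episode`.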
Wet lab validation
| Name | Environment | Key Tools |
|---|---|---|
| Crow/PaperQA | Literature Research | Search, Citation Traversal |
| ProteinCrow | Designing novel proteins | AlphaFold2, Molecular Dynamics |
| ChemCrow/Phoenix | Designing new molecules | Retrosynthesis, self-driving robotic lab |
| Data analysis crow/Finch | Generating discoveries from data | bioinformatics tools, code, file system |
Modify surface residues of IL-10 to increase expression and solubility in E. coli without disrupting dimerization or receptor interaction.
Language agents achieve superhuman synthesis of scientific knowledge
Michael D. Skarlinski, Sam Cox, Jon M. Laurent, James D. Braza, Michaela Hinks, Michael J. Hammerling, Manvitha Ponnapati, Samuel G. Rodriques, Andrew D. White. arXiv:2409.13740, 2024
Overexpression studies of PRMT4 in SW480 UPF1 knockout cells show that which arginine residue in PRMT4 is important for asymmetric di-methylation of UPF1 R433?
Better at answering questions than PhD biology experts
Improving over time
Better than human-written Wikipedia articles
Can be used to check for precedent and disagreement in the literature
API
Model intelligence will continue to increase
Complete cycle of disease to mechanism to target to drug
ROBIN: A Multi-Agent System for Automating Scientific Discovery
Ali Essam Ghareeb*, Benjamin Chang*, Ludovico Mitchener, Angela Yiu, Caralyn J. Szostkiewicz, Jon M. Laurent, Muhammed T. Razzak, Andrew D. White†, Michaela M. Hinks‡, Samuel G. Rodriques